Photo of Mark's face, taken in standard office fluorescent.
Things are calm at the moment, so it seems a good time for me to ruminate on the current state of Dreamwidth's load/capacity/etc. Please let me know if anything is unclear, or if you have any concerns, or whatever -- I'll do my best to answer everything.

The summary -- Dreamwidth has definitely been hit with a lot of extra load in the past few weeks, but it's maybe not as much as you thought. We're over double what we were a month ago, but it's still only double -- we already had a good amount of load. Here's a good graph showing the bump:

[Graph: Dreamwidth Bandwidth Usage]

There are several main "systems" that make up Dreamwidth (or, really, most web sites). They are the frontend, the web servers, the cache, the databases, and the miscellaneous services.

Let's take it one by one... we'll start with the easy things. We're going to talk about the current state of each system and how we scale it. "Scaling" is a term that loosely means "making it handle more traffic". (Where traffic is more users, more features, more whatever.)

Miscellaneous Services


These are pretty straightforward. Scaling these is, typically, a matter of just running more of them. We're nowhere near capacity on most of these, and any that are, we can just run more of the worker processes. I'm not too worried about these -- even if they do get overloaded, they won't stop the site itself from working. People can still read, post, and do stuff.

If these overload, it will affect emails going out, search, payments, and similar things that are considered non-critical services. (I.e., if search goes down for a day, it's frustrating and I will do my best to get it back online ASAP, but it's a lower priority than web servers or databases.)

Cache


We use memcached for most of our caching. Having this service online is crucial for the site to be working -- without our cache, the databases will overload and croak. No good.

Thankfully, though, this system is nearly free to scale. All we have to do is deploy another few instances and update the site config to use them. The downside is that adding more instances will cause the site to slow down for a little while because the entire cache has to be emptied and redistributed across the new, larger cache cluster.
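To see why adding instances effectively empties the cache, here is a minimal sketch. It assumes keys are assigned to instances by simple modulo hashing, which is one common client-side scheme and not necessarily how Dreamwidth's memcached client is configured. The instance counts (4 and 6) and the key names are made up for illustration.

```python
import hashlib

def instance_for(key: str, num_instances: int) -> int:
    """Pick a cache instance by hashing the key (assumed modulo scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_instances

def fraction_remapped(keys, old_count: int, new_count: int) -> float:
    """Share of keys that land on a different instance after resizing."""
    moved = sum(
        1 for k in keys
        if instance_for(k, old_count) != instance_for(k, new_count)
    )
    return moved / len(keys)

# Hypothetical keys and cluster sizes, purely for illustration.
keys = [f"userpic:{i}" for i in range(10_000)]
moved = fraction_remapped(keys, old_count=4, new_count=6)
print(f"{moved:.0%} of keys now map to a different instance")
```

Every key that moves is a cache miss the first time it is requested, which is why the databases see extra load until the cache warms up again. Schemes such as consistent hashing move fewer keys, at the cost of a little more client complexity.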

We're getting close to the point where I want to deploy new cache instances, and I will be doing that when I get the new databases up and ready. I'll schedule it for a low traffic time so it should have minimal impact on the site.

I'm very comfortable with our status here and our ability to scale out for more capacity.

Web servers


These are the actual machines that handle processing the web pages, as you might expect. The nice thing about them, though, is that they are horizontally scalable. This means that adding more of them adds more capacity in a linear fashion. If we have ten web servers and they're overloaded, adding ten more doubles our capacity for this service.

We currently have six machines handling web requests and we can easily add more. It just takes about 48 hours' notice to our hosting provider to get them to spin up and deploy a new machine. As soon as I notice us getting close to capacity on this tier, I submit a request and we get more machines up. Since the big bump of users two weeks ago, we've added two more web servers. If the load holds where it is now, we'll stay at this level -- but again, it's easy to add more.
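Because this tier scales linearly, the capacity math is simple division. The sketch below shows it. The request rates, per-server capacity, and 25% headroom are invented numbers, chosen only so the answer comes out to six servers.

```python
import math

def servers_needed(peak_rps: float, per_server_rps: float,
                   headroom: float = 0.25) -> int:
    """Servers required to carry the peak while keeping `headroom` spare."""
    usable = per_server_rps * (1.0 - headroom)
    return math.ceil(peak_rps / usable)

# Hypothetical figures: each server handles 100 req/s, keep 25% in reserve.
current = servers_needed(peak_rps=450, per_server_rps=100)
doubled = servers_needed(peak_rps=900, per_server_rps=100)
print(current, doubled)
```

Doubling the peak doubles the server count, which is exactly what "horizontally scalable" means here.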

I'm very comfortable with our status here, too.

Databases


We're currently running on MySQL databases. These machines are a lot more expensive than web servers -- more RAM, fancy disks, a RAID card with BBU, etc. -- and they're a lot harder to scale than the webs. Harder, but not impossible.

Physically, we have two machines. Logically, though, there are two types of databases -- the global database cluster and the user clusters. We have to talk about scaling the database in terms of its logical components, since they have different scaling requirements.

The user clusters are effectively horizontally scalable, just like the web servers. We bring two more machines online, create a new user cluster, and then start moving people over to it. We can balance the load on the user databases by increasing or decreasing the number of users that "live" on each cluster. You can see what user cluster you're on with our Where am I? tool.
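Here is a rough sketch of the idea, not Dreamwidth's actual code: a small directory records which cluster each user lives on, and rebalancing means changing that pointer once the user's data has been copied. The usernames, cluster ids, and labels are all hypothetical.

```python
# Directory kept in one place: username -> user cluster id (hypothetical data).
user_cluster = {"alice": 1, "bob": 1, "carol": 2}

# Each cluster id maps to the database pair that serves it (made-up labels).
cluster_hosts = {1: "user-cluster-1", 2: "user-cluster-2"}

def cluster_for(user: str) -> str:
    """Find which user cluster holds this user's journal data."""
    return cluster_hosts[user_cluster[user]]

def move_user(user: str, new_cluster: int) -> None:
    """Repoint a user at another cluster to rebalance load.

    A real move would copy the user's rows to the new cluster first;
    this sketch only flips the pointer.
    """
    if new_cluster not in cluster_hosts:
        raise ValueError(f"unknown cluster {new_cluster}")
    user_cluster[user] = new_cluster

before = cluster_for("bob")
move_user("bob", 2)
after = cluster_for("bob")
print(before, "->", after)
```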

The global cluster is harder to scale. There are some bits of data that have to live in one place because running it in several places makes code very, very hard to get right. Think about it like having two bosses -- if you have two bosses who do the same thing, you're never really sure who to listen to. Jim may tell you to work on project X, but Sally might say work on project Y. How do you decide what to do?

On the plus side, our global cluster is a lot, lot smaller than the user clusters. It only stores things like payments, user login information, and some other data that is pretty small and lightly used. It has a much higher capacity (how much load we can throw at it) before we have to consider scaling it.

Even then, scaling it can be done by adding more machines in as slaves -- i.e., exact copies of the master global database. This will buy us a decent amount of headroom before we have to consider doing something fancier like moving to SSDs instead of rotating disks. We can also add more cache machines to give us even more capacity.
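As a minimal sketch of how extra copies add capacity: writes still go to the single master, while reads are spread across the copies. The class name, host names, and the "starts with SELECT" test are assumptions for illustration, not how the Dreamwidth code decides where to send a query.

```python
import itertools

class GlobalDBRouter:
    """Send writes to the master and rotate reads across its copies."""

    def __init__(self, master: str, replicas: list):
        self.master = master
        # Round-robin iterator over the read-only copies, if there are any.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def host_for(self, sql: str) -> str:
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.master

# Hypothetical host names.
router = GlobalDBRouter("global-master", ["global-copy-1", "global-copy-2"])
first_read = router.host_for("SELECT userid FROM user WHERE user = 'alice'")
second_read = router.host_for("SELECT userid FROM user WHERE user = 'bob'")
write = router.host_for("UPDATE user SET name = 'Alice' WHERE userid = 1")
print(first_read, second_read, write)
```

One caveat with this design: copies can lag slightly behind the master, so a read that must see a row written a moment ago should still go to the master.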

We're hitting close to capacity on our existing databases, but we have two more machines on their way right now. They should be set up pretty soon (in the next day or so) and then we'll have more than double our current capacity. Also, we're still running on a MySQL version that is two years old -- there have been a lot of improvements to MySQL (particularly the Percona branch) since then, and I will be upgrading us soon.

All told, I'm pretty comfortable with our scaling here. Our existing systems are getting loaded but there's a very clear path from here to get us to more than 10x our current size. Once we start getting that big we'll have to do some more interesting work, but if we get to 10x our current size, we should have enough money that it will be no problem at all.

Frontend



Finally, the frontend -- our load balancer -- the machine that handles getting all of the user traffic from the Internet to our web servers. We're running a combination of software on this machine, primarily Pound and Perlbal. (Although soon I will be adding Varnish to help with userpic caching.)

Scaling the frontend is easy up to a certain point, after which it becomes really hard. Thankfully that "certain point" is fairly far off. Right now we're at about 25% capacity on this machine -- this is after the doubled load! -- and adding in a Varnish cache for userpics should help reduce that to about 15%.

When we start getting closer to that point I have a few ideas that will help with the load -- notably offloading the Perlbal instances to another machine -- and that will allow us to go up to the bandwidth limit of the machine. We're doing up to about 25Mbps right now and we can go close to 800Mbps before we start to hit capacity on that front.
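The bandwidth headroom follows directly from the two figures above. This covers bandwidth only, not CPU or any other limit on the machine.

```python
current_mbps = 25   # roughly what the load balancer pushes today
limit_mbps = 800    # approximate ceiling before bandwidth becomes the limit

headroom_factor = limit_mbps / current_mbps
utilization = current_mbps / limit_mbps

print(f"room to grow: {headroom_factor:.0f}x")
print(f"bandwidth in use: {utilization:.1%}")
```

That works out to 32x on bandwidth alone, comfortably beyond the 10x figure used elsewhere in this post.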

In short, then, I believe we're in good shape on this front and have a clear path to scaling this out to more than 10x our current load.

Code/other concerns



Honestly, the part that is most likely to bite us is also one of the easiest to fix -- and that's our code. There are certainly inefficient things in our codebase and we will have to address them as they come up. This is also exactly the kind of thing that has led LJ to temporarily suspend ONTD and similar communities from time to time, because that's the most expedient way to get the service back to normal for everybody else while they isolate and fix the problem triggered by the heavy users.

Dreamwidth will have the same policy, too. If the site goes down and it turns out to be because of a particular community or heavy user, we'll take what action we need to bring the site back -- and then we'll work our tails off to get service restored to that particular user/group. I also promise that we will communicate with anybody affected by this and let you know what's up -- you won't sit and wonder what happened.

Open floor!



All that said... any questions? Fire away, I'll answer them to the best of my ability. (Although I will say that right now I'm going to step away from the computer and go make some bread. It's New Year's Eve and I'd like to spend some time with my partner, [personal profile] aposiopetic. I'll check back in though!)

And, if I haven't said it enough, thank you for using Dreamwidth. It's really gratifying to see people moving in and giving things a whirl. We've worked really hard on this site for the past few years -- this is our baby! -- and I'm so excited to share.

<3

Dec. 27th, 2011 05:20 pm
Photo of Mark's face, taken in standard office fluorescent.
I love you, Dreamwidth.
Something I've been meaning to write about, but there's never really a good time, so hey.

One of the best things about building a really awesome foundation for something is that, eventually, you can take a step back and let things continue to go and grow and feel entirely confident that things aren't going to go off the rails without you. Over the past six months I've been reducing my role on Dreamwidth and have recently reached the point where I'm really not doing a lot on the site day to day.

I'm still helping out when really needed and I will continue to make myself available for middle-of-the-night site breakages and other problems that need immediate redress. However, I am not taking an active role in the management or development of the project on a day-to-day basis. For any of that, you should talk to [staff profile] denise and/or [staff profile] fu. They're awesome and I have complete confidence in the future of Dreamwidth with them at the helm.

I think that's the salient part.
Photo of Mark's face, taken in standard office fluorescent.
There's a bunch of talk going around right now about the whole issue of content security, and trusting the people who host your content and have access to it. I wanted to talk on that for a moment as it's something that is really important to me.

The only people who are authorized to view protected content on Dreamwidth that they don't otherwise have access to are [staff profile] denise and myself. We are the only people with the proper access level. On top of that, it's not automatic -- in order to view the protected content, every time one of us visits a URL we have to edit it and add "viewall=1" to the end of it. It's a very manual process (for good reason). It's also logged -- and I don't know about Denise, but I review the logs regularly, just like every other security log we have.

The second level of access is for people who have access to the production servers that run Dreamwidth. When someone has the ability to log in to our servers, they have full access to the data on the databases and could in theory access protected content. The only people with server access are again myself and Denise, plus our two sysadmins: [personal profile] matthew, who used to work for LJ (before and during Six Apart), and [personal profile] alierak, whom I've known for a decade and trust completely.

That's it. The four of us.

At some point, it comes down to trust. We need the ability to work on the servers, so there are always going to be a set of people who have the ability to see private data. This isn't something that we can feasibly get rid of, either. The data exists on the servers (that's how we can show it to you and the people who are authorized to see it) and we need access to those servers to maintain them. The data isn't just sitting around visible to us, though -- it's tucked away in the database and requires a lot of manual effort to dig out, unzip, and connect to a user account. We never see post content accidentally.

In the end, I think that the best that I can offer anybody is to be explicit about who has access (and what kind of access they have) and to personally watch the security logs. I watch to make sure we don't have unauthorized access to our servers, and I look for unauthorized access to private data as well. It's part of the routine, and it's something I take very seriously. Having dealt with some problems related to this in the past (on other projects, with other people) it's not something I want to see Dreamwidth have to go through.

I'm happy to talk about this, if anybody has any thoughts, comments, or questions.
Photo of Mark's face, taken in standard office fluorescent.
Happy Birthday, Dreamwidth!

Today is April 30th, 2010. One year ago today (at this very time!), [staff profile] denise and I were running around like some imitation of headless chickens (a weird expression, honestly). We were trying to make sure everything was put together, that all of our ducks were in a row (I'm in a fowl mood apparently), and that nothing was going to go wrong.

The gates opened at 9PM Eastern -- just under five hours from now. We had a lot of users sign up, a lot of Seed Accounts bought, and a great time. We worked hard, we were tired as hell the next day, but we made it.

And now today is DW's first real birthday.

This entire project would not have worked without the tons of effort, time, and love our volunteers have put into Dreamwidth. Thank you, all of you. I am so awed every week when I see how much you all care, how much you contribute, everything. Thank you.

A huge amount of thanks to the people who took a chance on us a year ago, too. You put your trust in a little unproven company and here we are -- a year later, through trials and tribulations, trolls and troglodytes, we are still here.

I'm looking forward to the next year -- and many more!
Photo of Mark's face, taken in standard office fluorescent.
More details are here: http://dw-meetups.dreamwidth.org/3418.html

Consider this your official warning! RSVP over on that post, thanks.
Photo of Mark's face, taken in standard office fluorescent.
I'm starting to plan a meetup in the local (to me) area. If you're interested, see the details:

http://dw-meetups.dreamwidth.org/2803.html
Photo of Mark's face, taken in standard office fluorescent.
Happy Second Birthday to the Dreamwidth project!

The project was officially named and rolling on a small scale on March 29th, 2008 (that's when the domain was registered). So -- happy belated birthday, us.

Sometimes the project feels so new, I think that we're just in the beginning. We are definitely in the beginning of the project, but we're not so new as all that. We've been public for coming up on two years, actually, since it was announced June 11th, 2008.

Of course, it took us a year to get to the point where we were ready to take on users, so our first anniversary of Open Beta is coming up soon.

And this is me rambling a bit. Marking things down. I need to go back sometime and dig up the original emails that started it all and put them in here so that they are recorded and don't just disappear. There was some interesting discussion back then.
Photo of Mark's face, taken in standard office fluorescent.
Who's got things to go in the Monday update?
Photo of Mark's face, taken in standard office fluorescent.
Who's got the stuff for Monday?
Photo of Mark's face, taken in standard office fluorescent.
Who's got things for the Monday update?
Photo of Mark's face, taken in standard office fluorescent.
If you are in or near or can get to Sydney, NSW, AU this Friday (the 15th), I would love to meet up with you! I am staying near the Circular Quay in the downtown Sydney area.

If you're interested, please comment! If you have any suggestions on places for a (suspected to be) small group to get dinner and hang out for a while, I'd welcome that, too.

Thanks!
Photo of Mark's face, taken in standard office fluorescent.
Happy Monday -- well, almost, depending on where you are. Who has what for this week's update? What I have on my list:

* Still quiet for the holidays
* Update on me in Sydney and location for get together-ish, also Wellington
* Holiday promotion ending soon
* ...

I know I had more stuff, but being here for Christmas has emptied my mind of most work related topics.
Photo of Mark's face, taken in standard office fluorescent.
Anybody feel like going through the comments to this post:

http://dw-biz.dreamwidth.org/2794.html

I want a summary of features requested, features demanded, and how many people want them. If ten users say "I absolutely must have pink dingos" then that should be noted. Basically, a priority list of what people want, sorted by number of people and the feature itself.
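For anybody taking this on, here is one way the tally could be done once the comments have been read and noted down as (user, feature) pairs. This is only a sketch, and the sample data is invented. It counts each person once per feature, so repeat comments from the same user don't inflate the numbers.

```python
from collections import defaultdict

def tally(requests):
    """Return (feature, distinct user count) pairs, most requested first."""
    users_by_feature = defaultdict(set)
    for user, feature in requests:
        users_by_feature[feature.strip().lower()].add(user)
    counts = [(feature, len(users)) for feature, users in users_by_feature.items()]
    # Sort by popularity, then alphabetically so ties come out in a stable order.
    return sorted(counts, key=lambda pair: (-pair[1], pair[0]))

# Invented sample data: (commenter, feature they asked for).
sample = [
    ("alice", "Pink dingos"),
    ("bob", "pink dingos"),
    ("alice", "pink dingos "),   # same person again, counted once
    ("carol", "more icons"),
]
summary = tally(sample)
for feature, count in summary:
    print(f"{count:3d}  {feature}")
```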

If you want to take this on, just leave a comment saying so! I'm hoping to have the data in the next week or so. Thanks. :)

...

PS, and also, Merry Christmas! I hope all of you have a wonderful day, whether you celebrate this particular holiday or not.

PPS, over the past 2 months, Dreamwidth has averaged 5,611 posts per day. That's based on the number of documents being indexed by our search system. We're up over 7.4 million posts being indexed right now. (Not all of these are public, of course! But we index most posts.)
Photo of Mark's face, taken in standard office fluorescent.
Does anybody have anything for the news post?

I'm planning on keeping it fairly short, wishing everybody Happy Holiday of Choice, and advising that we'll be quiet for the next few weeks. No pushes, few patches, many folks are busy... The usual!
Photo of Mark's face, taken in standard office fluorescent.
If you're looking for me over the next two weeks, I may be scarce. I will be on a plane for most of the 20th and the 28th, and in between I will be at my parents' house doing Christmas. As soon as I get back, we're preparing for New Year's here, and a second Christmas at home.

I will be checking email every day and will be online as much as possible. I will continue to do the weekly news posts, too. But if you're expecting much in the way of patch review or code or fast responses out of me, please keep this in mind. Thanks!
Photo of Mark's face, taken in standard office fluorescent.
What do we have for the Monday update? So far I have:

* Holiday promotion
* Development update (code tour, bug tour?)

Short list on my side. Whatchagot?
Photo of Mark's face, taken in standard office fluorescent.
Today has been slow; I've been doing system updates on the production stack. Taking machines out gracefully so nothing misses them and the site doesn't melt down, upgrading them, rebooting them, verifying them, and putting them back in ... very exciting.
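The loop looks roughly like the sketch below. The `upgrade` and `verify` callables and the host names are placeholders, not the actual commands or machine names used in production.

```python
def rolling_upgrade(pool, upgrade, verify):
    """Upgrade hosts one at a time so the rest keep serving traffic."""
    in_rotation = set(pool)
    for host in pool:
        in_rotation.discard(host)   # take it out gracefully: no new requests
        upgrade(host)               # apply the updates and reboot
        if not verify(host):
            # Stop here and leave the bad host out rather than risk the site.
            raise RuntimeError(f"{host} failed verification")
        in_rotation.add(host)       # healthy again, put it back in
    return in_rotation

# Placeholder hosts and no-op steps, just to show the flow.
hosts = ["web-a", "web-b", "web-c"]
upgraded = []
result = rolling_upgrade(hosts, upgrade=upgraded.append, verify=lambda h: True)
print(sorted(result), upgraded)
```

Only one machine is ever out of rotation at a time, which is why services with a single machine behind them still need a short downtime.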

I still have to do the more exciting ones. Search will have to go offline for 10 minutes because we don't have two machines running that service. I also can't do the primary load balancer or the master database without taking a downtime, so I'll have to schedule that. But most of the other things I can get done pretty quickly and painlessly.

We're not running anything so close to the edge that I'm worried about system updates messing up our performance. I also did the updates on one test webserver last night and let it run for a while: nobody complained of any weirdness, so nothing untoward seems to have happened in the upgrades.

I'm going to start asking people to upgrade their development environments to Karmic, too, and see how that does. I'm in no hurry to update the production environment, but since we're not running LTS I want to try to make sure we don't end up on archaic unsupported security-hole-ridden versions of things...

Okay, sb-web03 has come back, time to go continue the parade.
Photo of Mark's face, taken in standard office fluorescent.
I just decommissioned the last two production slices we had active on Slicehost: the original load balancer slice and the admin slice.

They served us well for the ~6 months we used them. There were a number of hiccups along the way (hence our moving) but all in all it wasn't terrible. Reasonable for a beta service.

Just figured I'd note this down here.
Photo of Mark's face, taken in standard office fluorescent.
I'm priming the pump: this entry is tagged "nnwm09", which means that you will be able to find it on the "find latest posts tagged nnwm09" page:

http://www.dreamwidth.org/latest?tag=nnwm09

Note that this doesn't work with all tags -- only with ones we explicitly allow. For now, that's nnwm09. Even though this entry has nothing to do with nnwm09. ;-)
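The "only tags we explicitly allow" rule amounts to an allowlist check before any lookup happens. Here is a small sketch of that idea. The function name, the post structure, and the sample posts are hypothetical and not taken from the Dreamwidth codebase.

```python
ALLOWED_TAGS = {"nnwm09"}   # tags explicitly enabled for the latest-posts page

def latest_for_tag(tag, posts):
    """Return posts carrying `tag`, newest first, or None if the tag isn't allowed."""
    tag = tag.strip().lower()
    if tag not in ALLOWED_TAGS:
        return None
    matching = [p for p in posts if tag in p["tags"]]
    return sorted(matching, key=lambda p: p["posted"], reverse=True)

# Hypothetical posts; "posted" is just a sortable timestamp string.
posts = [
    {"id": 1, "tags": {"nnwm09"}, "posted": "2009-10-28T10:00"},
    {"id": 2, "tags": {"cats"}, "posted": "2009-10-28T11:00"},
    {"id": 3, "tags": {"nnwm09", "cats"}, "posted": "2009-10-29T09:00"},
]
allowed = latest_for_tag("nnwm09", posts)
blocked = latest_for_tag("cats", posts)
print([p["id"] for p in allowed], blocked)
```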

This is a new feature we're currently working on and testing. It's going through an accelerated roll out because NaNoWriMo starts in a few days. Please let me know if you see any bumps or bruises along the way.