Evergreens

Mark Smith's Journal

Work related musings of a geek.

the state of Dreamwidth: load, capacity, etc

[staff profile] mark
Things are calm at the moment, so it seems a good time for me to ruminate on the current state of Dreamwidth's load/capacity/etc. Please let me know if anything is unclear, or if you have any concerns, or whatever -- I'll do my best to answer everything.

The summary -- Dreamwidth has definitely been hit with a lot of extra load in the past few weeks, but it's maybe not as much as you thought. We're over double what we were a month ago, but it's still only double -- we already had a good amount of load. Here's a good graph showing the bump:

[Graph: Dreamwidth Bandwidth Usage]

There are several main "systems" that make up Dreamwidth (or, really, most web sites). They are the frontend, the web servers, the cache, the databases, and the miscellaneous services.

Let's take it one by one... we'll start with the easy things. We're going to talk about the current state of each system and how we scale it. "Scaling" is a term that loosely means "making it handle more traffic". (Where traffic is more users, more features, more whatever.)

Miscellaneous Services


These are pretty straightforward. Scaling these is, typically, a matter of just running more of them. We're nowhere near capacity on most of these, and any that are, we can just run more of the worker processes. I'm not too worried about these -- even if they do get overloaded, they won't stop the site itself from working. People can still read, post, and do stuff.

If these overload, it will affect emails going out, search, payments, and similar things that are considered non-critical services. (I.e., if search goes down for a day, it's frustrating and I will do my best to get it back online ASAP, but it's a lower priority than web servers or databases.)

Cache


We use memcached for most of our caching. Having this service online is crucial for the site to be working -- without our cache, the databases will overload and croak. No good.

Thankfully, though, this system is nearly free to scale. All we have to do is deploy another few instances and update the site config to use them. The downside is that adding more instances will cause the site to slow down for a little while because the entire cache has to be emptied and redistributed across the new, larger cache cluster.
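To see why adding instances effectively empties the cache, here's a rough sketch. It assumes keys are assigned to instances by plain modulo hashing, which is a common approach for memcached clients but is only an assumption about our setup; the key names and instance counts are made up for illustration.

```python
import hashlib


def instance_for(key: str, num_instances: int) -> int:
    """Pick a cache instance by hashing the key modulo the instance count."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_instances


def fraction_moved(keys, old_count: int, new_count: int) -> float:
    """Fraction of keys whose assigned instance changes after a resize."""
    changed = sum(
        1 for k in keys
        if instance_for(k, old_count) != instance_for(k, new_count)
    )
    return changed / len(keys)


# Hypothetical cache keys and a hypothetical resize from 4 to 6 instances.
keys = [f"userpic:{i}" for i in range(10_000)]
moved = fraction_moved(keys, old_count=4, new_count=6)
print(f"{moved:.0%} of keys now map to a different instance")
```

Under this scheme most keys land on a different instance after the resize, so lookups miss and fall through to the databases until the cache warms up again. That's why it makes sense to do this at a low-traffic time.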

We're getting close to the point where I want to deploy new cache instances, and I will be doing that when I get the new databases up and ready. I'll schedule it for a low traffic time so it should have minimal impact on the site.

I'm very comfortable with our status here and our ability to scale out for more capacity.

Web servers


These are the actual machines that handle processing the web pages, as you might expect. The nice thing about them, though, is that they are horizontally scalable. This means that adding more of them adds more capacity in a linear fashion. If we have ten web servers and they're overloaded, adding ten more doubles our capacity for this service.
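As a back-of-the-envelope sketch of what "linear" means here: the per-server capacity, the spare headroom, and the request rates below are all hypothetical numbers, not measurements from our machines.

```python
import math


def servers_needed(peak_requests_per_sec: float,
                   per_server_capacity: float,
                   headroom: float = 0.25) -> int:
    """Servers required to serve the peak while keeping `headroom` of each server spare."""
    usable = per_server_capacity * (1.0 - headroom)
    return math.ceil(peak_requests_per_sec / usable)


# Hypothetical: each server handles 100 requests/sec and we keep 25% spare.
before = servers_needed(300, 100)
after = servers_needed(600, 100)
print(before, after)
```

Doubling the load doubles the number of servers needed and nothing else changes, which is what makes this tier easy to plan for.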

We currently have six machines handling web requests and we can easily add more. It just takes about 48 hours' notice to our hosting provider to get them to spin up and deploy a new machine. As soon as I notice us getting close to capacity on this tier, I submit a request and we get more up. Since the big bump of users two weeks ago, we've added two more web servers. If the load holds where it is now, we'll stay at this level -- but again, it's easy to add more.

I'm very comfortable with our status here, too.

Databases


We're currently running on MySQL databases. These machines are a lot more expensive than web servers -- more RAM, fancy disks, a RAID card with BBU, etc -- and they're a lot harder to scale than the webs. Harder, but not really impossible.

Physically, we have two machines. Logically, though, there are two types of databases -- the global database cluster and the user clusters. We have to talk about scaling the database in terms of its logical components, since they have different scaling requirements.

The user clusters are effectively horizontally scalable, just like the web servers. We bring two more machines online, create a new user cluster, and then start moving people over to it. We can balance the load on the user databases by increasing or decreasing the number of users that "live" on each cluster. You can see what user cluster you're on with our Where am I? tool.
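Here's a toy sketch of the idea, assuming a simple lookup table that records which cluster each account lives on. The class, the usernames, and the "least populated cluster" rule are invented for illustration and are not our actual code.

```python
from collections import Counter


class ClusterDirectory:
    """Toy directory: remembers which user cluster each account lives on."""

    def __init__(self, cluster_ids):
        self.cluster_ids = list(cluster_ids)
        self.home = {}  # username -> cluster id

    def assign(self, username: str) -> int:
        """Place a new account on the currently least-populated cluster."""
        counts = Counter(self.home.values())
        target = min(self.cluster_ids, key=lambda c: counts[c])
        self.home[username] = target
        return target

    def add_cluster(self, cluster_id: int) -> None:
        """Register a newly deployed cluster so accounts can live on it."""
        self.cluster_ids.append(cluster_id)

    def move(self, username: str, cluster_id: int) -> None:
        """Rebalance by relocating one account (the real data copy is omitted)."""
        if cluster_id not in self.cluster_ids:
            raise ValueError(f"unknown cluster {cluster_id}")
        self.home[username] = cluster_id


# Hypothetical accounts: start with one cluster, then add a second and rebalance.
directory = ClusterDirectory([1])
for name in ["alice", "bob", "carol", "dave"]:
    directory.assign(name)
directory.add_cluster(2)
directory.move("carol", 2)
directory.move("dave", 2)
print(Counter(directory.home.values()))
```

The point is that each account's data lives in exactly one place, so relieving a busy cluster is a matter of changing where some accounts live.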

The global cluster is harder to scale. There are some bits of data that have to live in one place because keeping them in several places makes code very, very hard to get right. Think about it like having two bosses -- if you have two bosses who do the same thing, you're never really sure who to listen to. Jim may tell you to work on project X, but Sally might say work on project Y. How do you decide what to do?

On the plus side, our global cluster is a lot, lot smaller than the user clusters. It only stores things like payments, user login information, and some other data that is pretty small and lightly used. It has a much higher capacity (how much load we can throw at it) before we have to consider scaling it.

Even then, scaling it can be done by adding more machines in as slaves -- i.e., exact copies of the master global database. This will buy us a decent amount of headroom before we have to consider doing something fancier like moving to SSDs instead of rotating disks. We can also add more cache machines to give us even more capacity.
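A minimal sketch of what the extra copies buy us, assuming reads can be spread across the copies while every write still goes to the single master. The hostnames and the SELECT-prefix check are made up for illustration; this is not our actual database code.

```python
import itertools


class ReplicatedDatabase:
    """Toy router: every write goes to the one master, reads rotate across the copies."""

    def __init__(self, master, replicas):
        self.master = master
        # With no replicas configured, reads fall back to the master.
        self._read_cycle = itertools.cycle(list(replicas) or [master])

    def host_for(self, sql: str) -> str:
        """Return the host that should run this statement."""
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._read_cycle) if is_read else self.master


# Hypothetical hostnames.
db = ReplicatedDatabase("global-master", ["global-replica-1", "global-replica-2"])
print(db.host_for("UPDATE user SET name = 'x' WHERE userid = 1"))
```

There's still only one "boss" for writes, so the two-bosses problem doesn't come back. One caveat I'm glossing over: a copy can be slightly behind the master, so a read that must see the very latest write typically still has to go to the master.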

We're hitting close to capacity on our existing databases, but we have two more machines on their way right now. They should be set up pretty soon (in the next day or so) and then we'll have more than double our current capacity. Also, we're still running on a MySQL version that is two years old -- there have been a lot of improvements to MySQL (particularly the Percona branch) since then, and I will be upgrading us soon.

All told, I'm pretty comfortable with our scaling here. Our existing systems are getting loaded but there's a very clear path from here to get us to more than 10x our current size. Once we start getting that big we'll have to do some more interesting work, but if we get to 10x our current size, we should have enough money that it will be no problem at all.

Frontend



Finally, the frontend -- our load balancer -- the machine that handles getting all of the user traffic from the Internet to our web servers. We're running a combination of software on this machine, primarily Pound and Perlbal. (Although soon I will be adding Varnish to help with userpic caching.)

Scaling the frontend is easy up to a certain point, after which it becomes really hard. Thankfully that "certain point" is fairly far off. Right now we're at about 25% capacity on this machine -- this is after the doubled load! -- and adding in a Varnish cache for userpics should help reduce that to about 15%.

When we start getting closer to that point I have a few ideas that will help with the load -- notably offloading the Perlbal instances to another machine -- and that will allow us to go up to the bandwidth limit of the machine. We're doing up to about 25Mbps right now and we can go close to 800Mbps before we start to hit capacity on that front.

In short, then, I believe we're in good shape on this front and have a clear path to scaling this out to more than 10x our current load.
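For anyone who wants to check the arithmetic, here's a tiny sketch that uses only the rough figures quoted above (about 25% of the machine's capacity, and about 25Mbps against a limit of about 800Mbps). The function name is just for illustration.

```python
def growth_multiple(current: float, limit: float) -> float:
    """How many times the current load fits under the limit."""
    return limit / current


machine_multiple = growth_multiple(25, 100)    # machine at ~25% capacity today
bandwidth_multiple = growth_multiple(25, 800)  # ~25Mbps today vs ~800Mbps limit

print(f"machine capacity: about {machine_multiple:.0f}x current load")
print(f"bandwidth limit: about {bandwidth_multiple:.0f}x current load")
```

The machine itself runs out of room first, which is why getting past 10x depends on moving work (like the Perlbal instances) off this box so that bandwidth becomes the limit.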

Code/other concerns



Honestly, the part that is most likely to bite us is also one of the easiest to fix -- and that's our code. There are certainly inefficient things in our codebase and we will have to address them as they come up. This is also exactly the kind of thing that has led LJ to temporarily suspend ONTD and similar communities from time to time, because that's the most expedient way to get the service back to normal for everybody else while they isolate and fix the problem triggered by the heavy users.

Dreamwidth will have the same policy, too. If the site goes down and it turns out to be because of a particular community or heavy user, we'll take what action we need to bring the site back -- and then we'll work our tails off to get service restored to that particular user/group. I also promise that we will communicate with anybody affected by this and let you know what's up -- you won't sit and wonder what happened.

Open floor!



All that said... any questions? Fire away, I'll answer them to the best of my ability. (Although I will say that right now I'm going to step away from the computer and go make some bread. It's New Year's Eve and I'd like to spend some time with my partner, [personal profile] aposiopetic. I'll check back in though!)

And, if I haven't said it enough, thank you for using Dreamwidth. It's really gratifying to see people moving in and giving things a whirl. We've worked really hard on this site for the past few years -- this is our baby! -- and I'm so excited to share.
Page 1 of 2 << [1] [2] >>
01.01.2012 12:15 am (UTC)

(no subject)

ashtoreth: (sunflower)
Posted by [personal profile] ashtoreth
I hope that you and everyone at Dreamwidth have a prosperous and healthy New Year.

Admittedly, I understood about 3/4 of the post, but it sounds like Dreamwidth is in very capable hands. Thank you and [staff profile] denise for creating such a fabulous place to call our on-line home.
Edited 01.01.2012 01:51 am (UTC)
01.01.2012 12:17 am (UTC)

(no subject)

thoitaxh: (Once Upon A Time: Hunter)
Posted by [personal profile] thoitaxh
Not a question, but thank you for keeping us in the loop! Also, have a great year 2012! Best wishes from Germany!
01.01.2012 12:23 am (UTC)

(no subject)

princessofgeeks: (Default)
Posted by [personal profile] princessofgeeks
Thank you so much for putting this in something close to layman's terms. It's been exciting to see you all build this and I have been very very happy here.

Happy New Year to you and yours!!!
Edited 01.01.2012 12:23 am (UTC)
01.01.2012 12:25 am (UTC)

(no subject)

xylohypha: Owl (semyaza_owl2)
Posted by [personal profile] xylohypha
No questions, but thank you for an explanation about scaling which makes me feel like I understand it--even though my actual knowledge of how websites run is very minimal. Thank you also for keeping us updated with what is going on with DW.

Best wishes for a very good New Year.
01.01.2012 12:25 am (UTC)

(no subject)

ilyena_sylph: (Dreamwidth "d", rainbow-colored by Sophie) (Dreamwidth)
Posted by [personal profile] ilyena_sylph
There is a lot of this post that I don't entirely understand (this kind of tech is not my thing), but I followed the gist of it, and I love on you a lot for writing it.

Thank you so much for being this open with us and telling us so much about how things work.
01.01.2012 12:26 am (UTC)

(no subject)

jhumor: (Default)
Posted by [personal profile] jhumor
This is amazing and I'm impressed with how this is presented in plain English.

Thanks for all you do and Happy New Year.
01.01.2012 12:29 am (UTC)

(no subject)

zarhooie: Girl on a blueberry bramble looking happy. Text: Kat (DW powered by Disco)
Posted by [personal profile] zarhooie
Thanks for writing this! I hope your breads are tasty. :)
01.01.2012 12:31 am (UTC)

(no subject)

crosspistols: (Curiosity)
Posted by [personal profile] crosspistols
Just a quick question regarding the code itself. I know LJ used to just pile new code on top of each other, which is what led them to having to completely re-write the system as we saw in Release 88. I'm wondering if this sort of thing is going to happen to DW at some point, or if the code is managed entirely differently and more efficiently?

Also, Happy New Year! It's 2012 here already <3
01.01.2012 12:45 am (UTC)

(no subject)

jumpuphigh: Pigeon with text "jumpuphigh" (Default)
Posted by [personal profile] jumpuphigh
I think this post from [staff profile] denise on technical debt may answer your question.
01.01.2012 12:54 am (UTC)

(no subject)

crosspistols: (;D)
Posted by [personal profile] crosspistols
Ah that link is wonderful! (I didn't see it before due to its date) Thank you! It answers some of my question!
01.01.2012 12:56 am (UTC)

(no subject)

jumpuphigh: Pigeon with text "jumpuphigh" (Default)
Posted by [personal profile] jumpuphigh
You're welcome. If you didn't read the comments, there is more information there as well.
01.01.2012 12:57 am (UTC)

(no subject)

crosspistols: (Happy)
Posted by [personal profile] crosspistols
I'm making my way through them right now o7
Posted by [personal profile] foxfirefey
Yes, sometimes DW is doing things that rewrite entire systems.

One example of this is the Javascript for journal pages; we are reimplementing it in a better way. You can test it out here:

http://www.dreamwidth.org/betafeatures

We have almost finished reimplementing all of the old dynamic behaviors under this new system, and people have been testing it the whole time. But, to the outside observer, there aren't going to be huge differences--most of the changes are happening under the hood.

Another example of a system getting pretty much a complete overhaul, front and back, is the posting page for entries. (You can also check that out using the beta features page.) This posting page has been developing with a LOT of feedback, has been posted about in news many times giving people a chance to check it out, and is going to pave the way for really nice things like draft and scheduled posts.

Another system we are going to totally redo from the base up in the future is the memories system, which is overdue for a rehaul.

So, yes, sometimes Dreamwidth does rewrite entire systems--but if they are user facing, you'll be warned about it before they happen and there will be feedback periods. DW can't guarantee that the changes will always be what everybody wants and that nobody will still prefer the old system, but they definitely consider feedback given.
Posted by [personal profile] crosspistols
Actually, I already have the beta features enabled! I'm glad DW really takes the feedback on board.
Posted by [personal profile] instantramen
*drive-by Beta love*
01.01.2012 12:36 am (UTC)

(no subject)

haruka: (basco-smile)
Posted by [personal profile] haruka
I won't pretend I understood it all, but I'm impressed with all of you and your ability to keep everything running smoothly. :) Happy New Year to you!
01.01.2012 12:38 am (UTC)

(no subject)

kore: (Dreamwidth on the xkcd map)
Posted by [personal profile] kore
Thank you for explaining it all so clearly, and Happy New Year!
01.01.2012 12:44 am (UTC)

(no subject)

belief: (just praying to a god that i don't belie)
Posted by [personal profile] belief
Just saying that I've been a member on dreamwidth with an account since beta, and I'm very excited to see this site come into its own! Also, I'm glad to see that you use MySQL, as well as other programs and implementations I'm familiar with. :) I am with you in being excited and optimistic about the future.
01.01.2012 12:45 am (UTC)

(no subject)

belief: (Default)
Posted by [personal profile] belief
Also, out of curiosity - do you run many VMs? You mentioned that you only have 2 physical machines, so I'm assuming that you do.
01.01.2012 01:13 am (UTC)

(no subject)

alierak: (Default)
Posted by [personal profile] alierak
No, we're only using physical machines right now. Two of them are database servers, six are webservers, etc., for I think a total of ten. We started out the beta period using about 50 VMs, not knowing how many of each kind of server would really be needed at that point.
01.01.2012 02:01 am (UTC)

(no subject)

belief: (Default)
Posted by [personal profile] belief
Ah, very interesting to know! Also, out of curiosity - is there an OS that is preferred at DW headquarters?
01.01.2012 03:26 am (UTC)

(no subject)

exor674: Text: "I survived open beta adn all I got was this lousy icon!" (dreamwidth open beta)
Posted by [personal profile] exor674
Dropping this here so it's on your radar ( or you may have been aware )

The Lucid perl has a segfault bug ( testable by a simple: perl -e "sub M::DESTROY; bless {}, M;" if I recall correctly )
I've successfully backported the maverick perl ( 5.10.1-12ubuntu ) for my Lucid dev env.

Relevant perl -V line:

DEBPKG:fixes/crash-on-undefined-destroy - http://bugs.debian.org/564074 [perl #71952] [1f15e67] Fix a NULL pointer dereference when looking for a DESTROY method
01.01.2012 01:05 pm (UTC)

(no subject)

exor674: Computer Science is my girlfriend (Default)
Posted by [personal profile] exor674
If I recall, I hit it either when doing something involving synsuck or with the importer.

( Or who knows, I don't think the bug was in our code, maybe whatever Perl package had the issue got fixed instead )
01.01.2012 07:01 pm (UTC)

(no subject)

sophie: A cartoon-like representation of a girl standing on a hill, with brown hair, blue eyes, a flowery top, and blue skirt. ☀ (Default)
Posted by [personal profile] sophie
For the record, the Dreamhack machine is using Lucid with some Maverick packages. This is my /etc/apt/preferences file:

Package: *
Pin: release n=maverick
Pin-Priority: 50

Package: perl
Pin: release n=maverick
Pin-Priority: 600

Package: perl-base
Pin: release n=maverick
Pin-Priority: 600

Package: perl-modules
Pin: release n=maverick
Pin-Priority: 600

Package: perl-doc
Pin: release n=maverick
Pin-Priority: 600

Package: libperl5.10
Pin: release n=maverick
Pin-Priority: 600


This lets the Dreamhack machine use Lucid packages for everything except Perl. It also lets us use packages from Maverick that aren't otherwise in Lucid. (Though if Lucid *does* have them, it prefers those over Maverick's.)
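As a simplified sketch of how those priorities play out: this assumes APT's default priority of 500 for versions that no pin matches, and it ignores version-number comparison and the rules for already-installed packages. The non-Perl package names in the usage lines are placeholders, not claims about what either release actually ships.

```python
DEFAULT_PRIORITY = 500  # APT's default when no pin matches and no target release is set

PERL_PACKAGES = ["perl", "perl-base", "perl-modules", "perl-doc", "libperl5.10"]
# Records that name a package explicitly (the 600 pins in the file above).
SPECIFIC_PINS = {(pkg, "maverick"): 600 for pkg in PERL_PACKAGES}
# The catch-all "Package: *" record.
GENERAL_PINS = {"maverick": 50}


def pin_priority(package: str, release: str) -> int:
    """A record naming the package wins over the catch-all, then the default applies."""
    if (package, release) in SPECIFIC_PINS:
        return SPECIFIC_PINS[(package, release)]
    return GENERAL_PINS.get(release, DEFAULT_PRIORITY)


def preferred_release(package: str, available_in) -> str:
    """Pick the release whose version of the package has the highest priority."""
    return max(available_in, key=lambda release: pin_priority(package, release))


print(preferred_release("perl", ["lucid", "maverick"]))           # maverick (600 beats 500)
print(preferred_release("some-package", ["lucid", "maverick"]))   # lucid (500 beats 50)
print(preferred_release("maverick-only-package", ["maverick"]))   # maverick (only candidate)
```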
01.01.2012 05:13 am (UTC)

(no subject)

doldonius: (Default)
Posted by [personal profile] doldonius
Why not straight Debian? I'm using it a lot, so whatever's wrong with it might bite me, too.
01.01.2012 09:54 am (UTC)

(no subject)

doldonius: (Default)
Posted by [personal profile] doldonius
Thanks! And lots of thanks for an outline of DW's internal structure.
01.01.2012 06:39 pm (UTC)

(no subject)

belief: (Default)
Posted by [personal profile] belief
Oh, great! I'm a big fan of Ubuntu. :) Thanks for answering all of my prying questions!
01.01.2012 12:45 am (UTC)

(no subject)

intermezzo: (ILY)
Posted by [personal profile] intermezzo
I had no doubts you guys had everything under control. DW loaded pages a bit slower for me a few hours ago, but it's all back to normal now. So honestly, I don't think you could have made more than what you did. Thanks for everything! ♥

Also? IDK how you do it, Mark, but when you explain things, I understand everything! *___*

Uh, I actually have a question about the importer (mind you, it's not really important, but I'm curious). Say I'm importing a community. If I hit the 'refresh' link, I'm taken back to the importer "main" page, and my main journal is selected. To actually see the status of the import, I need to re-select the comm I'm importing. Is that supposed to happen? I mean, it's no trouble at all to just click a couple of times to get to the import status, but iirc when I hit refresh and I'm re-importing content to my main journal, the import status page stays put.
01.01.2012 01:56 am (UTC)

(no subject)

intermezzo: (stock:pencils)
Posted by [personal profile] intermezzo
Thank you! *sends love and lots of hugs*
01.01.2012 12:48 am (UTC)

(no subject)

widowmaker: (Default)
Posted by [personal profile] widowmaker
Thank you so much for this post, it's been really interesting and informative to read. Happy New Year to you and Denise, as well.

Is the Varnish cache related to the 'Varnish errors' that LJ consistently suffers from? I think everyone who used LJ a lot in the past year has come to kneejerk loathe that word without really knowing what it is.
01.01.2012 03:11 pm (UTC)

(no subject)

kore: (Default)
Posted by [personal profile] kore
HA NO KIDDING. Apparently (their website says) it's a "web application accelerator. You install it in front of your web application and it will speed it up significantly" and not at all bad in and of itself, but I got so tired of not knowing what the hell it meant.
01.01.2012 04:18 pm (UTC)

(no subject)

owl: Stylized barn owl (Default)
Posted by [personal profile] owl
Yet another reason why barfing system error messages untranslated at the users is a bad plan...
01.01.2012 12:55 am (UTC)

Happy New Year!

faere: (icon heart)
Posted by [personal profile] faere
Thank you for such a clear explanation! It's so neat to know which cluster my journal is living on too!

Many thanks to the Dreamwidth staff, volunteers and everyone who keeps making this place more loveable with each day!
01.01.2012 12:59 am (UTC)

(no subject)

subluxate: Sophia Bush leaning against a piano (Default)
Posted by [personal profile] subluxate
Thank you for this post; it was very informative and helpful! I did notice email notifications are somewhat delayed, but that may also be Gmail because an LJ comment was also late getting to me.

I really appreciate you keeping us in the loop. Have a happy new year and some very good bread!
01.01.2012 01:07 am (UTC)

(no subject)

evilawyer: young black-tailed prairie dog at SF Zoo (Default)
Posted by [personal profile] evilawyer
Didn't understand it (which is my thing) so can't formulate questions, but I thank you heartily for giving us all the explanation and putting us at ease as to DW's ability to handle the extra load (something I got worried about last time there was a big LJ-induced spike in usership). Happy New Year to you and yours, and enjoy the bread.
01.01.2012 01:26 am (UTC)

(no subject)

dots: (Default)
Posted by [personal profile] dots
I'm glad to hear everything is looking good! I've been a little worried about how you would stand with the increased traffic. Thank you for keeping us in the loop about where things stand!
01.01.2012 01:33 am (UTC)

(no subject)

backtothelight: Wall-E and Eve. (You light up my life.)
Posted by [personal profile] backtothelight
The moment I get money I can give you guys for my accounts, it's yours. You guys rock.
01.01.2012 01:46 am (UTC)

(no subject)

sinnesspiel: (It's-a me!)
Posted by [personal profile] sinnesspiel
Thanks for the open communication! I'm really impressed that you explained all of this and managed to dumb it down so that non-techies can understand the state of things.

I feel very comfortable in buying services from DW and am super psyched that my RP game is moving here! This is exactly the kind of business I'd like to support with my hobby!
01.01.2012 01:58 am (UTC)

(no subject)

suzette: (i had fun)
Posted by [personal profile] suzette
Thank you so much for explaining this stuff. I don't really understand it (not my field haha!), but I'm glad that it is there. I second [personal profile] widowmaker's question about Varnish in addition to asking another: what are DW's financial capabilities to run all this, especially if traffic does increase on account of community imports from LJ and whatnot? Not only to run and maintain on top of overhead costs?
01.01.2012 02:24 am (UTC)

(no subject)

denise: Image: Me, facing away from camera, on top of the Castel Sant'Angelo in Rome (Default)
Posted by [staff profile] denise
Actually it's less than that! Until the recent server additions, our monthly spend was between $6500 and $7k per month, depending on various bits here and there. I was all excited and looking forward to being able to tell [site community profile] dw_biz that we'd actually started being in-the-black month to month, too. (Then this month happened! And, uh, now we have a little bit less to worry about for a while.)
01.01.2012 02:30 am (UTC)

(no subject)

jumpuphigh: Pigeon with text "jumpuphigh" (Default)
Posted by [personal profile] jumpuphigh
That is so good to hear! I'm looking forward to your next "state of all things money" post.
01.01.2012 02:32 am (UTC)

(no subject)

suzette: (let's get some)
Posted by [personal profile] suzette
Thank you so much for your speedy reply! I admit I had doubts about the financial capabilities required to run everything for x amount of time for the users who have come - and the users who will come in combination with the community importing - but this puts me (and likely others) more at ease.

I'm also amazed that you only have at most four people running this site. Is that a correct assumption? And do you think that with the increased traffic that you will be forced to expand in paid help any time soon? How much would that add to DW's financial strain? (Not asking for absolute amounts, but in general!)
01.01.2012 10:19 am (UTC)

(no subject)

rydra_wong: Dreamsheep holding a hammer; "Dreamwidth Antispam". (dreamwidth -- spamsheep)
Posted by [personal profile] rydra_wong
Just to add to the general info resources in the comments here, here's a post I wrote a while back on the various forms of volunteer activity at DW:

Dreamwidth Volunteering: what it is and how to do it

For example, I'm an anti-spam volunteer. *points at icon*

It's just a few minutes a day, when I have time, but I find it strangely soothing. *g*
01.01.2012 01:59 am (UTC)

(no subject)

inarticulate: a geisha reading in bed (all my favorites have happy endings)
Posted by [personal profile] inarticulate
This was really awesome to read; thank you so much! ♥
01.01.2012 02:26 am (UTC)

(no subject)

leonhart: Rinoa holding Squall with the text "I will follow you into the dark" (you'll find me)
Posted by [personal profile] leonhart
Thank you for writing this! Happy New Year! :)
01.01.2012 02:38 am (UTC)

Thank You!

misstia: (New Year: Fireworks)
Posted by [personal profile] misstia
As someone who just paid for an account here, this post just reaffirms my commitment that DW definitely deserved my money (I'm a LJ #88 refugee). I'm beyond impressed with the level of communication and willingness to work so hard---even on a holiday to not only ensure things run smoothly, but to explain it all to anyone who wants to know.

THANK YOU!! And may you and your family have a wonderful New Year!!
01.01.2012 03:04 am (UTC)

(no subject)

sharpest_asp: the DW logo 'D' surrounded by words (General: Dreamwidth)
Posted by [personal profile] sharpest_asp
Many blessings on you, the support behind the scenes, and the servers for the coming year.
01.01.2012 04:01 am (UTC)

(no subject)

vanessagalore: (Default)
Posted by [personal profile] vanessagalore
So interesting. Thank you for taking the time to write it all down in such clear language. And I also appreciate the transparency about the financial situation here, although I must say I'd like to see you guys making more of a profit. But it's also great that you're working on something that apparently makes you happy, and as they say, that's priceless.
01.01.2012 03:08 pm (UTC)

(no subject)

kore: (Default)
Posted by [personal profile] kore
Yeah, it sounds like you guys are really trying to grow a smaller but stable business (rather than the typical New Shiny Thing that gets lots of cash and hype thrown at it but isn't actually profitable), and that is great. I support smaller business companies whenever I can. I really dislike the big business model (both in terms of Web 2.0 and superstores), and it's great to see DW's approach is different.