Hacker News
Stack Overflow Architecture Update - Now At 95 Million Page Views A Month (highscalability.com)
128 points by ZeroMinx on March 3, 2011 | 61 comments



For those who aren't reading it, the ServerFault blog (http://blog.serverfault.com/) is a treasure trove of information. The work they are doing and blogging about on the systems side is something a lot of people take for granted.


Hm. I love Stack Overflow but architecture only starts to become interesting at 4-5 times their traffic.

95M/mo translates to a mere 36/sec avg. A single web server and DB server will handle that without breaking a sweat, although of course you want some more machines for peaks, redundancy and comfort.
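A quick sanity check on that average (assuming a 30-day month; the exact figure shifts slightly with month length):

```python
# Average request rate implied by 95M page views/month,
# assuming a 30-day month.
page_views_per_month = 95_000_000
seconds_per_month = 30 * 24 * 3600  # 2,592,000 seconds

avg_per_sec = page_views_per_month / seconds_per_month
print(f"{avg_per_sec:.1f} page views/sec on average")  # ~36.7/sec
```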


36/sec, but then they're doing 800 HTTP requests per second. Not sure what could cause such a big difference (lots of ajax stuff?), but I think 800/sec is an interesting thing to see them deal with; that's something in the region of 1bn a month, which fits your requirement for interesting-ness :p


Perhaps the 800 req/second number is a peak figure? 36 requests a second average isn't meaningful when your traffic is spiky.


Probably a combination of other page resources (images, JS, stylesheets, etc), bots and peak access times.


which fits your requirement for interesting-ness

Sorry, the threshold of interestingness for serving static assets is even higher. ;)

A single nginx instance on a moderate host will barely warm up below ~10k reqs/sec; the network tends to be the bottleneck there. Anyway, ~800/s should be doable from a $10 VPS.


I don't think they're paying all this money for the fun of it


Hm, I'm not sure I follow. What money do you mean?


They have 10 web servers. If they could replace all that with a VPS or two, they would.


According to the article, only 3 are dedicated to Stack Overflow, which sounds about right to me.


The VPS figure was for serving static files, and a lot of the Stack Overflow content isn't static.


We (blocket.se) are hitting almost 1bn/mo and I can assure you that it's irrelevant whether the average page views per second can be handled by some basic hardware or not. During peak hours it's an order of magnitude difference. Worth mentioning, though, that we are tightly focused on only one country, and probably have larger variations during peak hours than Stack Overflow does. I'm not sure what our traffic looks like combined with our other local sites.


If your peak hours exceed the average by an order of magnitude then you have oddly shaped traffic. Most sites follow a bell curve where regular peak hours range around a factor of 2-3 above average.

Either way I didn't mean to discount their efforts. Was just trying to point out that their architecture is not very interesting from a performance point of view, yet.


I wish I had to figure out how to deal with a "mere 36/sec avg" number of visitors. I'd sell my company and retire.


architecture only starts to become interesting at 4-5 times their traffic.

Which suggests that even sites with a medium-to-large internet presence, which dominate their niche, can do so with "uninteresting" architecture.


Which suggests that even sites with a medium-to-large internet presence [...] can do so with "uninteresting" architecture

Absolutely. Hardware is evolving so fast (Moore's Law & friends) that we humans have a hard time keeping up mentally.

Go back six years in time and the traffic they're dealing with would have required roughly 24 servers, instead of the 3 that they have today.

This is of course a rough extrapolation and six years seems like an eternity on the internet-calendar.

However, we're quickly approaching a point where there are only two scales left to worry about: "normal" and "web-scale", with only a few hundred sites falling into the latter category.


Really cool to see how they moved from Windows-only to other technologies like Redis.

They didn't jump on the hype for the first version; they started using other things only now, when they really needed them.


T9space.com is up to 42 million pageviews a month (http://i.imgur.com/Vggvl.png): 10 app servers dynamically converting websites to mobile versions + 1 MySQL DB + HAProxy LB + 1 Lucene search.


I use Stack Overflow as the benchmark for success on my own* sites, and we're still winning. We did 145m for February!


What's your website?


It's not even related. I feel bad now because it's irrelevant to the post; it just happens that the first time Stack Overflow announced their traffic I mentioned ours, and Joel (I think?) said something about it, so now I always use them as a benchmark. Anyway: minecraftforum.net and minecraftwiki.net. I don't own them any more (the reason for the * -- curse.com acquired them). http://www.quantcast.com/minecraftwiki.net & http://www.quantcast.com/minecraftforum.net -- combined 145m for Feb.


I believe criticsquid runs the Minecraft wiki...


I understand the tradeoffs between Windows and Linux and choosing one or the other for a particular task, but why run Ubuntu Server on one machine and CentOS on others? Are the tools so distro-specific that it's worth keeping track of two versions, two package managers, etc.?


I question the truth in that article and whether they really use Razor already. It's barely out, and so is MVC 3, and I haven't noticed any changes that would suggest they converted it all from MVC 2 to MVC 3. That's a major task.

They might have some Razor in there because it is backwards compatible, but to have it all in Razor is a stretch.


We're using ASP.NET 4 and MVC 3 in production. New features in both made it very worthwhile to invest the time in conversion.

We are not using Razor yet, but it's feasible they are. Take a look at this: https://github.com/telerik/razor-converter


> All pages accessed by (and subsequently served to) anonymous users are cached via Output Caching.

I'm surprised they don't cache output on logged-in pages as well, using ESI or some kind of client-side replacement. There's not that much personalization going on. Most page content is the same between users.


I would imagine that they calculated it would be cheaper to scale up their application servers and backend caching than pay for the development resource to facilitate this change.


Interesting to see the switch away from the Windows/MS stack. Is there not a decent NoSQL option for a Windows stack?


I'm not sure what switch you are referring to, they've been using HAProxy and Lucene for quite some time.

Also, what would a NoSQL data store give them (for central data storage - not caching) that they aren't already handling with SQL? Keep in mind that they also have relatively easily partitionable data (if necessary they can break off any Stack Exchange site and database to their own dedicated hardware).


Redis is NoSQL, and they are using it on Linux.


Raven DB (http://ravendb.net/) is a pretty good NoSQL option for the .NET/Windows platform.


Wow, I had no idea this project existed. Thanks for the link!

I was looking for something like this a while ago for my startup but we ended up going with Azure and using their Table Storage.


AppFabric Server is closer to Memcached than Redis. However, since their persistence layer is SQL Server, they could have used AppFabric in place of Redis.

I'm guessing they didn't want to use a technology that is so recently out of beta perhaps?


How did the code have to change to read from/write to Redis?


I doubt it changed very much. They're basically using Redis as a much more powerful memcached; SQL Server is still the ultimate owner of all Stack Overflow data, last time I talked to the team.
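For what it's worth, the pattern being described is usually called cache-aside. A minimal sketch, with plain dicts standing in for Redis and SQL Server (all names and data here are illustrative, not Stack Overflow's actual code):

```python
# Cache-aside: try the cache first, fall back to the database,
# and populate the cache on a miss. Dicts stand in for Redis and
# SQL Server; a real version would use a Redis client and an ORM.
redis_cache = {}                      # stand-in for Redis
sql_db = {42: "How do I exit vim?"}   # stand-in for SQL Server

def get_post(post_id):
    post = redis_cache.get(post_id)   # 1. check the cache
    if post is None:
        post = sql_db[post_id]        # 2. miss: hit the real database
        redis_cache[post_id] = post   # 3. populate the cache for next time
    return post

print(get_post(42))   # first call reads SQL and fills the cache
print(get_post(42))   # second call is served from the cache
```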


I guess I just haven't seen memcached integration in .NET. All of the caching examples I've seen use the framework caching components.

I was also wondering if they'd developed any generic custom code to fetch from Redis, or to fetch from SQL and insert into Redis, and how it fits into their application logic (e.g. does it happen before they make a LINQ to SQL query, or does it intercept LINQ to SQL queries and return results from Redis?).


If you're interested, here is a C# Redis Client: https://github.com/ServiceStack/ServiceStack.Redis

With downloads for windows: https://github.com/dmajkic/redis/downloads

Here's an example of an OSS 1-page mini-StackOverflow, written only using Redis: http://www.servicestack.net/RedisStackOverflow/

And this page has caching web services examples (at the bottom), with source code for non-invasively dropping in a Redis cache provider using an IOC:

http://www.servicestack.net/ServiceStack.Northwind/

If you want to learn more about Redis, here is an article on how to build an application using only POCOs and Redis: https://github.com/ServiceStack/ServiceStack.Redis/wiki/Desi...

With more info on the wiki: https://github.com/ServiceStack/ServiceStack.Redis/wiki/Desi...


Great, thanks!!


It's fascinating that a MS loyalist like Joel would end up using so much linux/oss software.

I wonder if Joel regrets not going 100% open source. I guess the cost of Microsoft licensing for all that is still very cheap if viewed as the price of being featured in Bing searches.


This might possibly be the most ridiculous comment I've ever seen on the internet, let alone Hacker News.

* I am not a Microsoft Loyalist

* I don't make the technical decisions. Why would I? I'm the person who knows the least

* It is not necessary to use Microsoft licensing to be featured in Bing searches... that is possibly the most preposterous thing I have ever heard in my life

* Microsoft licensing is a rounding error, with the possible exception of SQL Server licenses

* Your bizarre personification of Stack Overflow as "Joel" is charming and quaint and completely obnoxious to the 20-odd people at Stack Overflow that actually do the work instead of sitting around browsing hacker news like I do


1) I think the word "loyalist" was poorly chosen on my part. I meant it only to describe how Fog Creek has traditionally been a fairly Microsoft-oriented shop... even after (in my opinion) open source languages/platforms began to surpass MS in usability, documentation, and community support.

2) While it may be the case that it is not necessary to use MS technology to be featured in Bing searches, I think you (Joel) are among the more influential developers in the world, and the fact that your firm leverages Microsoft's latest technology was likely a factor in the decision to "feature" the results instead of simply displaying them first when they're relevant. Maybe I'm cynical, but that seems like a smart marketing move on Microsoft's part.

3) My mention of license pricing was intended to include all licensing costs, including SQL Server. I'd guess the cost of your setup as described would be about $150K in licenses... perhaps a rounding error, but still a nice developer salary.

4) No disrespect or insult was meant by my comment. I started out using 100% Microsoft tools and technologies, and I've been intrigued to see open source projects start to offer compelling value and features. Mostly my comment was intended to express surprise that it's not simply a matter of "rounding error" costs to do a 100% Microsoft solution for a major site... and that things like Redis and Ubuntu are part of the reason that StackOverflow is as performant as it is.


Welcome to the modern Internet - where the digital equivalent of the crazy homeless zealot also happens to know your name and career details.


There is a group of people who will get worked up if they think you would say anything neutral or positive about Microsoft products. I'm not sure why, but a sizeable number of people believe that you must be getting bribes from Microsoft to use their software.


That you would reply to this makes me respect you even more.


And I you.


"...instead of sitting around browsing hacker news like I do."

I think I'll redefine my definition of success to include this. :)


I think this is one of the things that a lot of people in the FOSS world don't get. Most people I know on the MS stack use it because it's the best tool for their job, but they will also use FOSS where it makes sense. They're just not as ideological, and are much more technology driven.

I suspect the staff at Stack Overflow never blinked at using FOSS software. I suspect they said, "we need capability X, let's find the best tool to handle it".


I really wish it were the case that most devs on the MS stack don't drink the kool-aid and use all-things-Microsoft. There are still devs using DataSets in VB.NET, pushing the SOAP envelope, using the MS Patterns & Practices stack, and continuing to develop ASP.NET web apps like state-heavy Windows apps, since that's what they were told to do.

I don't blame anyone for forming a stereotype of .NET devs, considering that pre-ASP.NET MVC (years late to the game) there was no competing web framework to the anti-web, state-heavy ASP.NET, simply because, IMHO, there was none sanctioned for use by MS. Sadly there is an inherent Microsoft culture that will only believe and advocate anything as long as they've read it on microsoft.com.

Despite their failed DALs and poor application frameworks, the truth is, IMHO, that Microsoft has created a superior development platform with C#/.NET, and with VS.NET/R# by the C# dev's side it's one of the most productive ones to use.

Fortunately there are free-thinking pro developers, not constrained to think inside the MS toolbox, who just want to use the best software they can, regardless of the culture. Interestingly, it seems a lot of the time C#/FOSS devs will independently come to the same conclusions on best-of-class FOSS software to use: e.g. nginx / Redis / Linux / Nagios -- i.e. the best tool for the job.

Stack Overflow is clearly one such 'pro-dev company', not shackled to the MS platform, nor will it purposely/religiously maintain an anti-MS stance, since they'll continue to use MS tools and products where they think they're the highest quality (i.e. C# / IIS7 / SQL Server).

It's quite clear Stack Overflow has a job to do, and they'll end up choosing the best tools they think will do it. I believe this pro-dev, high-quality culture is why they've become so popular, providing some of the most enjoyable website experiences on the Internet - all without skipping a beat, or a hint of growing pains.


I like your comment, but there's something untold here:

There's value in thinking inside the box - MySQL / PostgreSQL may not be the best in any category, but with inside knowledge of the internals you may fine-tune them to surpass any competitor in performance / scalability.

I can't find the project right now, but there was this one guy who made a plugin for MySQL which allowed him to get past the query parser/optimizer and talk directly to InnoDB... with the result being much better performance than Memcached, using InnoDB as storage. And because the Memcached protocol is public, heck, you could even have memcache clients talking directly to your DB.

Not having loyalty to technologies or companies is pragmatic and kick-ass and makes you better in some ways - but it also gives you a wishy-washy attitude, and this outside-of-the-box thinking is actually inside-the-box from at least one perspective, as you'll end up a jack of all trades, master of none.

And if you are going to master a piece of technology, which would you rather choose?

A proprietary piece of technology that's best in class for what it does, for which you don't have access to the source code, and which might be discontinued after Microsoft invents a new/shiny and backwards-incompatible alternative? Or an open-source alternative that's slightly worse in benchmarks or tools available or features - but that you can make your bitch in any way imaginable?

Pick your poison.


I find the opposite. Back when I was working in .NET, I remember one argument I got into with my boss about software evaluation. MS has a testing framework and a build tool that are basically a generic xUnit and an Ant clone. Now, there are a great many better xUnit implementations on .NET, and I think we can all agree at this point that XML isn't the greatest programming language for build scripts. I was pointing out various choices that beat out the MS ones in terms of community, maturity, and features. His argument was basically that if MS doesn't have the best product, eventually they will make it better and it will become the best.

Every community has strengths and weaknesses. IMO the big weakness of the .NET community is also the worst thing about MS itself from a tech point of view - a real unwillingness to look at what everyone else in the world is doing. Because of that, you see this strange effect where enterprise trends seem to happen about 5 years later in .NET than they do in other communities. For example, it is only extremely recently (like in the last two years) that it started to become normal for .NET shops to use ORMs, and the hot debates over whether or not to unit test are just starting to die down. And it is really only in the last year that the most bleeding-edge shops have started to look at DVCS (the cutting-edge ones are on SVN, and the majority on a CVS-type system).

And for the record, I don't think .net is a bad platform or anything like that. What I do think is that the single thing that could improve most .net shops is being more open to non ms technology.


Because of that, you see this strange effect where enterprise trends seem to happen about 5 years later in .NET than they do in other communities.

I think this is a different issue, and one that I think is true, but actually by design. It's funny, because when I consulted I ran into the opposite problem: I'd go into shops and the devs wanted to do some cutting-edge stuff. I usually came in and convinced them that this was a bad idea. :-)

The reason for this was pretty simple -- usually the stuff they wanted to use wasn't ready, and didn't match the culture of enterprise shops. For example, I remember NHibernate picking up steam and lots of shops wanting to use it, but there were still a lot of holes in it.

And DVCS is another great example. In enterprises there is very little advantage to Git over SVN. SVN's offline support has been great for years now; I can add/revert without touching the network. And in the enterprise you don't have a model of "anyone can pull down a branch" - you want a very controlled version of the code around. Yet, again, a lot of .NET devs want to use Mercurial, and my recommendation, in most cases, is that it's not a good idea.

BUT source control is a great example where .NET devs have clearly not toed the line, given that I rarely see a shop using TFS. Everyone uses SVN or Mercurial, just as no one used SourceSafe in the past. If .NET devs just followed MS blindly we'd see all of these .NET devs using TFS and waiting for MS to make it better. Hasn't happened.


I think you were consulting in fairly good shops, OR it is more a geographical thing, but man, that wasn't my experience. I was working at one of the better .NET companies in Toronto -- we were an MS Gold Partner, our CTO was an MVP and was frequently on the conference circuit, and we sponsored the local user group. I met and worked with a lot of great devs in the MS community, but frustration at that tunnel-vision thing is probably what ultimately got me to jump platforms.

I sort of disagree on the DVCS point, just because the technology is so much better all around. If I start working on something, I make a branch for it. When I'm done, I pull master and rebase my topic branch back on top of the other work people have done. This allows for a few things:

1) If I get interrupted and need to work on something else, not only am I able to, but switching back to master, pulling, and branching again takes under 20 seconds and is 3 commands.

2) If someone else is working on the same part of the code base, I can pull their half-done work from them directly if I need to, while keeping the mainline clean and in working order. Where I work we do pair programming, so this is really huge; any time we want to switch roles it is just a local commit and pull away.

3) No single point of failure. I don't know if this has happened to you before, but if your SVN server gets corrupted, work basically stops completely until it is fixed. Reconstructing a recent version of your code base from what's on dev machines is pretty much asking who was the last one to pull - because, again, dev work doesn't mess with master since branching is so easy, so everyone should have a clean version of the mainline on their machine.

I think the advantages of DVCS are _way_ more obvious for open source projects, but you can see a lot of benefits using it pretty much anywhere.


While the comment might have come off as "ridiculous", I think the underlying theme is quite pertinent.

First, in response to Joel's comment: obviously Stack Overflow is personified as "Joel", the same way Apple is "Steve" and Windows is "Bill". What I find charming is your personification of it as a collection of individuals who are all equally invested... it must be ponies and rainbows over there.

What's pertinent about the comment, and probably at the heart of his "fascination" (mine too), is how this product is held up as a poster child for ASP.NET MVC without giving the whole story. This is very common of the Microsoft evangelists. Let's be clear, I am not accusing Joel of being a "loyalist"; I think he is probably only pinned as one because of these evangelists running around trumpeting the use of ASP.NET MVC for this successful site. The true MS loyalists will never read this article, or will decide that Stack Overflow just "doesn't know about" the correct MS technology stack that could eliminate the need for any open source.

So that's what's "fascinating". That an ASP.NET MVC application did not stay inside the reservation.

As for the right-tool-for-the-job nonsense, let's cut the bullshit. We all make religious decisions when we "start" a software project: you build out an idea you're excited about in a language/platform you're excited about. What's interesting here is that the real world of success has demanded an open source stack... and it could very well be cost and not capability. You are in FANTASY LAND if you think you would be paying a "couple grand" in licensing to scale out to what Stack Overflow must need to run its service. Add about 3 more zeros to that guesstimate.

End of the day, I don't think anyone regrets not going 100% open source; you build it in what you're excited about. The story I would be most interested in is what performance/cost analysis was done that steered them in the technology directions they took. I think we could all learn from that.


People rarely pay sticker price for this stuff. Right now SO is still using BizSpark, but let's pretend we are paying sticker for our NY datacenter:

* 10x Windows Server Standard R2 for the Web Tier (~1k per)

* 4x Enterprise OS (4k) for the DBs (we are going to give SO its own DB soon)

* 8 sockets of SQL Server Enterprise R2 (27k)

So 10x1,000 + 4x4,000 + 8x27,000 ~= 242k sticker price. Again, people don't tend to pay sticker price for this stuff -- but this guesstimate is not $2 million by any means.
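The arithmetic in that guesstimate checks out:

```python
# Sticker-price estimate from the breakdown above.
web_tier = 10 * 1_000   # 10x Windows Server Standard R2 (~1k each)
db_os    = 4 * 4_000    # 4x Enterprise OS (4k each)
sql      = 8 * 27_000   # 8 sockets of SQL Server Enterprise R2 (27k each)

total = web_tier + db_os + sql
print(total)  # 242000 -- roughly 242k, an order of magnitude under $2M
```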

References: http://www.microsoft.com/sqlserver/2008/en/us/pricing.aspx http://www.microsoft.com/windowsserver2008/en/us/pricing.asp...


As for the right tool for the job nonsense, let's cut the bullshit. We all make religious decisions when we "start" a software project.

I think you speak for yourself here. The fact that you can't even imagine that people don't make religious decisions at the start of a project is telling.


There are Java projects that use Oracle as the back end. Since Java is Free Software (technically at least) and Oracle is not, would you consider that to be a failure for Free Software?

Do you use a binary blob NVidia driver on your Linux computer? Would that be a case of not staying in the reservation?

You're a Rails developer, aren't you? Twitter was hailed for a while as validation for Rails and Ruby in general - that is, until they realized it wasn't scaling and had to use something else. Failure for RoR? I guess they weren't religious enough (as you say) to stick with what they had and, gosh darn it, make it work at all costs. After all, you can run your blog on RoR, just like you can on MVC. If you imply that using Redis on Linux to scale invalidates MVC as a viable platform, how is using TokyoCabinet (or whatever) to scale RoR any different? Unless, of course, your argument is purely ideological.

We make decisions on what technology to use based on what we know. On our areas of expertise and knowledge. If we pay X for something, it's because we believe that X is a good price for what we're getting back. If we decide to use Y because it's free, it's because it works the same as Z, which is expensive or limited.

If there are "true" MS loyalists out there who make decisions based on their adoration of the company, then that's their problem. More power to them and all that.

And by the way, if you feel that this represents a form of "capitulation" for the MS stack, then you need to go talk to MS about it, not StackOverflow. Because they made their technical decisions based on practicality, not religion (as you say). And besides, if it's supposed to be so embarrassing, why mention it at all? No one is forcing them to disclose what platform they happen to be running their key value store on.


>You are in FANTASY LAND if you think you would be paying a "couple grand" in licensing to scale out to what Stack Overflow must need to run its service. Add about 3 more zeros to that guess-timate.

Couple grand = $2,000. Add 3 more zeros: $2,000,000.

Stackoverflow is paying $2 million in software licensing fees! Wow, using the Microsoft stack is expensive!!


Ah I meant to upvote your comment and hit the down arrow, sorry.


I'm going to guess it's just a result of using the best tools for the job. His developers know the MS stack, so it makes sense to build the application on top of that. Things like Redis and HAProxy are traditionally better left to run on a *nix server. I use MongoDB on Server 2008 for testing and whatnot, but once that goes to production it's going onto a Linux box.

That the opposite isn't generally true (a mostly FOSS project using Microsoft bits) is a testament to what's available on the FOSS side, of course.

Your use of the word "loyalist" implies some kind of blind, joined-at-the-hip fanboyism is at play in Spolsky's brain. I think that's probably very far from the truth. Those of us who use and take advantage of Microsoft technologies are also quite capable of recognizing that some things are better done on other platforms. I.e., the best tool for the job.


The site states that they are using a WISC stack via BizSpark, which gives companies free use of a TON of tools and software for something like 4 years or until they make over $1 million in revenue. Sure, you could also do this for free using open source tools, but I would GLADLY pay a couple grand in licensing fees if my company was making over $1 million in revenue using the Microsoft stack. I think the fact that Stack Overflow uses a WISC stack is a testament to its reliability and robustness. Hey, whatever gets the job done.


I'd presume that they've broken the million dollar bizspark limit or they're doing something very wrong...



