How Three Guys Rebuilt the Foundation of Facebook

Walk to the back of Building 18, on the edge of Facebook's new headquarters in Menlo Park, California, and you'll find the remnants of The Battle Cave. Today, this room is just another stretch of open office space, where rows of Facebookers work on who-knows-what. But if you look to your right, at the top of the wall, you'll see two metal brackets that once held a pair of flat-screen displays. That's where Joel Pobar and his crew would track their daily progress.
Jason Evans, Keith Adams, and Drew Paroski, three engineers at the heart of a swashbuckling mission to replace the foundation of Facebook -- without changing the site itself. Photo: Alex Washburn/Wired

Pobar oversees a team of engineers charged with rebuilding the very foundation of the world's most popular social network. They've toiled on this project for more than three years now, and for several weeks this past fall -- when progress stalled and it seemed like the thing might never see the light of day -- they hunkered down in this room at the back of Building 18, spending nearly every waking hour writing and re-writing code, struggling to hone their creation to the point where it could run one of the largest websites on the planet.

It was called The Battle Cave for good reason. They were battling software code, but also time. In an echo of The War Room in Stanley Kubrick's Dr. Strangelove, their efforts were mapped out on a pair of displays mounted high on the wall, a constant reminder of just how far they were from finishing what was supposed to be the future of Facebook.

They've since moved out of The Battle Cave, but those two metal brackets remain. They're a small reminder of an enormous bet placed by Facebook -- a technological bet that exemplifies the unique attitude that infuses this nine-year-old company, something founder and CEO Mark Zuckerberg likes to call "The Hacker Way."

"It was a high-risk, high-reward bet," says Jay Parikh, the engineering vice president who oversees the design and operation of the hardware and software that underpins Facebook. "We now operate at such an enormous scale, we have to take enormous risks in order to survive."

To understand this bet, you must first travel back to a moment in late 2003. If you've seen The Social Network, you know the one: That cold, northeastern day when Zuckerberg sat down in his Harvard dorm room and first started work on his social network.

Inside 'The Hacker Way'

There's been more than a little controversy over where the idea for Facebook came from, what Zuckerberg was ultimately trying to do, and whether he began the project just after Caribbean night at AEPi. But one thing's for sure: When he sat down to build the site, he used a computer programming language called PHP.

Among web coders like Zuckerberg, PHP was all the rage in 2003. It gave them a means of building and re-building web software at a particularly fast pace, taking a shortcut around more complicated languages like C++ or Java. But as the months, then the years, passed, PHP's knack for rapid-fire development would become particularly important to Facebook and The Hacker Way, the philosophy of constant iteration that drives Zuckerberg and his entire company. Facebook engineers like to change things, and change them quickly. PHP lets them do that.

It's what's known as a "dynamically typed" programming language, meaning you don't have to take the time to declare the type of each and every variable used in your program. "If you tell a room full of developers to build an application and they use a dynamic language," says Facebook engineer Keith Adams, "they will get it done faster."
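
For readers who want to see the idea, here is a minimal sketch of dynamic typing. It uses Python rather than PHP purely for illustration (both are dynamically typed), and every name in it is hypothetical rather than taken from Facebook's code.

```python
# Illustrative only: Python stands in for PHP here; both are dynamically typed.
# All names below are hypothetical.

def describe(value):
    # No type is declared for `value`; the runtime inspects it on each call.
    return f"{type(value).__name__}: {value}"


count = 42                 # `count` currently holds an integer
first = describe(count)

count = "forty-two"        # the same name can later hold a string, with no declaration
second = describe(count)

print(first)
print(second)
```

Nothing in the program says what `count` is allowed to hold. That is the shortcut Adams is describing, and it is also why the runtime has to work out types while the program runs.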

Today, Facebook is used by more than one billion people worldwide, and more than a thousand engineers are dedicated to building and rebuilding the site. And they're still using PHP.

In one sense, that's surprising. Although PHP is ideal for rapid development, it's in some ways ill-suited to running a website of such enormous size. When you build a site with PHP code -- as opposed to a "statically-typed" language like C++ -- you can build it at a faster clip, but you'll need far more machines to run the thing, and as you reach a billion users, all those servers get mighty expensive.

Yet Zuckerberg and company stick with it. Rather than switch to a new language -- as, say, Twitter has done -- they've invented new ways of running PHP at unusually fast speeds. In essence, they keep replacing the foundation of the site -- without changing the site itself. Such is The Hacker Way.

In 2010, Facebook rolled out a tool called HipHop. This would convert the PHP code into C++ before it was executed on the company's servers, and Facebook engineers eventually honed this tool to the point where it could juggle 500 to 600 percent more traffic than the pure PHP site on the same number of machines.

"There was a moment where, if HipHop hadn't been there, we would have been in hot water. We would probably have needed more machines to serve the site than we could have gotten in time," says Facebook engineer Drew Paroski. "It was a Hail Mary pass that worked out."

But shortly after that pass was thrown, Paroski, Adams, and a third Facebook engineer, Jason Evans, decided they could do better. Standing around the proverbial water cooler one afternoon, the trio agreed that Facebook could take PHP performance to an even higher level if they replaced HipHop with something called a virtual machine -- software that could provide even greater synergy between the site's PHP code and the server hardware that runs it.

Rather than translate PHP into C++, they would convert it into native machine code -- the language spoken by the chips at the heart of the company's servers -- and they would do this as the code was executing. By tracking the way the site executed in real-time, they could get a better idea of how it should be translated into machine code -- and this would ultimately speed things up. "HipHop," Adams remembers, "seemed pretty beatable."
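
The "watch it run, then specialize" idea can be sketched in a few lines. This is a toy under stated assumptions, not how Facebook's virtual machine works internally: a real just-in-time compiler emits machine code, while this Python sketch only mimics the step of observing argument types at run time and switching to a faster path once a pattern looks common. The class, the functions, and the threshold are all hypothetical.

```python
# Toy illustration only: a real just-in-time compiler emits machine code.
# This sketch mimics just the "observe at run time, then specialize" step.
# Every name here is hypothetical; none comes from Facebook's system.
from collections import Counter

HOT_THRESHOLD = 3  # assumed cutoff: int/int calls seen before specializing


def generic_add(a, b):
    # Slow path: accepts numbers or numeric strings, checking types on every call.
    if isinstance(a, str):
        a = float(a)
    if isinstance(b, str):
        b = float(b)
    return a + b


def int_add(a, b):
    # Fast path: only valid when both arguments are already ints.
    return a + b


class ProfilingAdder:
    def __init__(self):
        self.seen = Counter()      # how often each argument-type pair occurred
        self.specialized = False   # True once the int/int fast path is enabled

    def __call__(self, a, b):
        # A cheap guard protects the fast path; anything else falls back.
        if self.specialized and type(a) is int and type(b) is int:
            return int_add(a, b)
        self.seen[(type(a), type(b))] += 1
        if self.seen[(int, int)] >= HOT_THRESHOLD:
            self.specialized = True
        return generic_add(a, b)


add = ProfilingAdder()
results = [add(1, 2), add(3, 4), add(5, 6), add(7, 8), add("1.5", 2)]
print(results, add.specialized)
```

The guard matters as much as the fast path. Because a dynamically typed program can pass a string where integers were seen before, the specialized version has to check its assumption and fall back to the general one when the check fails.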

It was a bold idea -- especially when you consider that HipHop had only just gone live. Building such a virtual machine is a mammoth task typically reserved for software companies like Oracle and Microsoft and VMware -- companies with the sole aim of creating this sort of "systems software," software that runs at the very heart of our computer systems.

Within weeks, they went to work on their virtual machine, and after a few months, they'd made enough progress to get the company behind the project. Eventually, the Facebook brass put another seven engineers to work on the new platform -- and halted the development of HipHop. It's a move that highlights Facebook's hacker-centric culture, but it also shows just how much the company has grown up over the years. "Facebook was able to preserve their cultural contract by solving a very hard problem," says Eli Collins, who worked on such software at tech giant VMware. "But it also marks a shift at the company. It shows that they're really getting serious."

The only trouble is that after Adams, Evans, and Paroski spent two years building this virtual machine, it was slower than the live website.

Facebook engineer Keith Adams.

Photo: Alex Washburn/Wired

The New Facebook

Keith Adams personifies Facebook's recent evolution. The Brown University graduate began his career at VMware, alongside Eli Collins, where he built the most complex of systems software. VMware makes software that lets the world's businesses run a different kind of virtual machine -- a way of treating one computer server like many servers -- and Adams worked on the guts of this "hypervisor" software.

According to Collins, when Intel first built microprocessors designed to work in tandem with VMware's hypervisor, Adams was the lone VMware engineer sent inside the chip giant to ensure the two pieces of technology worked well together. "That shows you how important he was to VMware," Collins says.

It may seem odd that Adams would move from VMware to Facebook, but that's what Facebook has become. Like Google and Amazon and Yahoo and even Twitter, Facebook has grown so large, it needs engineers capable of rethinking the fundamental ways our computers operate. Google is famous for building entirely new hardware and software that can run its worldwide data center network with significantly greater speed and efficiency, and now Facebook has reached the same point.

The company hires people like Amir Michael, who build servers. It employs engineers like Raghu Murthy, who build software capable of juggling data across tens of thousands of machines. And it nabs minds like Adams, who joined the social networking giant in 2009.

In the beginning, Adams worked on the Facebook search engine. But then he ran into Evans and Paroski, two other engineers steeped in the most complex of technologies. As a graduate student studying bioinformatics at the University of Idaho, Evans built a new tool for managing the use of computer memory. It was known as jemalloc, and it soon turned up in Mozilla Firefox, one of the world's most popular web browsers. "He basically helped us halve the amount of memory Firefox was using," says ex-Mozilla man Stuart Parmenter. "He's definitely one of the smartest people I know."

Meanwhile, Paroski had come to Facebook from Microsoft -- another systems software giant -- where he worked on the .NET runtime, what amounts to a virtual machine for Microsoft's C# and VB.NET programming languages. It only makes sense that Adams, Evans, and Paroski would dream up a virtual machine capable of juicing Facebook's PHP code. But actually building the thing is another matter.

According to Pobar and others who worked on the project, it was Adams who pushed hardest on the virtual machine idea, convincing the Facebook brass that it was the best way forward. He's an opinionated sort -- quick to back up his stance with the most reasoned of arguments. "He's a very strong voice," says Collins. "It didn't surprise me that he joined Facebook and immediately became one of their most prominent engineers."

In the end, the company put significant resources behind the effort -- backing Adams, Evans, and Paroski with other engineers like Mark Williams, Owen Yamauchi, Aravind Menon, Brett Simmers, Guilherme Ottoni, and Jordan DeLong -- and it organized the team under Pobar, a seasoned engineering manager who had also come from Microsoft.

Drew Paroski.

Photo: Alex Washburn/Wired

But things took far longer than expected. Part of the problem, Adams says, is that they underestimated the complexity of the task, but the other issue is that HipHop continued to improve. For months, they were chasing a moving target. After two years, they got to the point where the virtual machine could run the entire world of Facebook, but it was still three times slower than the original HipHop system.

When Adams remembers this time, you can hear the emotion creep back into his voice. "For me at least, this was a very scary time," he remembers. "What was really nerve-wracking is that we didn't have any really good theories about where all the extra time was going." Evans and Paroski felt much the same way. "You could see the amount of stress these guys were under," their manager, Pobar, remembers. "Facebook had given these guys so much leash to really go after this -- and then it was like: 'Holy fuck. Are they actually going to get there?'"

They continued to chip away at the speed gap, but by the end of the summer of 2012, the virtual machine was still only 65 percent as fast as the live site. "We were like a pregnant lady. We just wanted this baby out of us," Pobar says. "But it was really unclear how that was going to happen."

So they went into lockdown.

The wall of 'The Battle Cave.'

Image: Facebook

Engineers in Lockdown

Lockdown happens all the time in the world of high-stakes technology. You move an entire development team into its own room and pretty much keep them there until the project is finished. "It's a common thing," says Sam Schillace, who helped build Google Docs at Google and is now the vice president of engineering at Silicon Valley startup Box.com. "We have a couple of these going right now."

Pobar moved Facebook's virtual machine team into a room on the ground floor of Building 18, where there were relatively few engineers, and according to Paroski, they didn't even tell anyone they were there. "It was like we had been lost in the desert waiting for someone to rescue us by helicopter," Adams remembers. "But then we decided we just had to hike out. We didn't know quite where to go, but we had to get there under our own power."

For months, the team -- and Adams in particular -- had been looking for the one magic place in the code where they could fix the speed problem in one fell swoop. But when the team moved into The Battle Cave, there was a change in philosophy. Rather than look for one cure-all, they would buckle down and fix anything and everything.

"The idea was to find small things, where you could quickly do an experiment and find out if changing that code was going to help," says Paroski. "If the experiment failed, you would put it down and quickly pick up something else."

On the far wall of the room, they organized their efforts on a massive white board. It was littered with Post-It Notes, each identifying a possible way of improving the system and each positioned according to how long the improvement would take. If a Post-It Note proved a dead-end, it was summarily moved to the side.

Then, above this white board, the team installed two monitors that tracked the speed of the new system relative to HipHop. At first, the line barely moved. But with Adams taking the early shift, and others, such as Evans and Paroski, working well into the night for a good five weeks, the tiny improvements began to pay off.

According to those who worked on the project, Adams was the main idea man. Evans was the engineer who would crank out vast amounts of code in finding a way to bring these ideas to fruition. And Paroski, the resident PHP expert, was the man who kept those ideas from veering off course, away from what the language was capable of.

Slowly, the speed line started to climb again, and on Election Day -- November 6, 2012 -- it finally passed HipHop. That weekend, Adams went bike riding with Eli Collins, his old colleague from VMware. "I remember it well," Collins says.

In a nod to the company's original PHP converter, they called the system the HipHop Virtual Machine, or HHVM for short, and it was soon installed beneath the live site, where it continues to run today.

HHVM uses what's called just-in-time compilation, which means Facebook's PHP code is converted to machine language as it executes on the server. This is the way the Java programming language runs, but the Java virtual machine was built over many years to serve an entire industry of programmers. The HipHop Virtual Machine was built just for Facebook -- though, as with so many parts of its infrastructure, Facebook has open sourced the system, so that anyone can use it.

Jason Evans, in The Battle Cave.

Photo: Facebook

The Garden State

With the HipHop Virtual Machine, Facebook can run PHP at speeds most developers never thought possible. But some still wonder why the company would go to such extremes. Longtime developer and programming pundit David Pollack doesn't buy the notion that PHP helps Facebook iterate at a faster clip.

"PHP was optimized for doing a quick and dirty website. It's remarkable to me that Facebook has been able to scale a quick-and-dirty language to a site as powerful and as flexible as the Facebook site," he says. "But I just can't see PHP being the best way to build a website. It's kind of like people who live in New Jersey and call it the Garden State."

Even Adams admits his claim that developers are more productive with dynamically typed languages is controversial in certain circles, and Evans acknowledges that although a language like PHP may make you more productive in the short-term, it can make things more difficult in the long run.

"Static typing ends up being really good documentation for what you intended when you wrote the code," he says. "If you write some code in a dynamic language like PHP and then you come back two years later, you lack that documentation."
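
Evans's point about types as documentation can be sketched briefly. The example again uses Python for illustration only, and its type hints are optional annotations that a separate checker would have to enforce, not the compiler-enforced static types of a language like C++. The function names and the discount rule are hypothetical.

```python
# Illustrative only; the function names and the discount rule are hypothetical.

def apply_discount_untyped(price, discount):
    # Is `discount` a fraction (0.2) or a percentage (20)? Is `price` in cents?
    # Nothing in the signature says, so a reader two years later has to guess.
    return price - price * discount


def apply_discount_typed(price_cents: int, discount_fraction: float) -> int:
    # The annotations record the intent: integer cents in, a fraction
    # between 0 and 1 for the discount, and integer cents out.
    return round(price_cents * (1 - discount_fraction))


print(apply_discount_untyped(1000, 0.2))
print(apply_discount_typed(1000, 0.2))
```

Both functions do the same arithmetic. Only the second one tells a later reader what its author meant the arguments to be.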

Certainly, Facebook has stuck with PHP in large part because it would be an even bigger task to rewrite the entire site in another language. It's called The Legacy Problem. "Eventually, you get to a size where it's not feasible to rewrite it all," says Paroski.

But it's clear from talking with people like Adams that Facebook still prefers the feel of PHP, and though building a new virtual machine for the language was quite the risk, this is a company that's willing not only to take risks, but to take risks others wouldn't take.

"Apple is about polish. Google is about scale. Microsoft is about, well, 30 years old," says ex-Googler and Box vice president of engineering Sam Schillace. "But Facebook is about innovation. They're not necessarily optimized for elegance. They're optimized for innovation. The idea is to crush everyone with pure experimentation and velocity."

Others may not understand that. But they don't understand The Hacker Way.

Update: This story originally said that server engineer Amir Michael was employed by Facebook. But Michael left the company three weeks ago.