Facebook Tackles (Really) Big Data With 'Project Prism'

Facebook has built a software platform called Prism that can juggle data across data centers spanning the globe. And at some point, it plans to open source this mystery creation, sharing it with whoever wants it.

Today, countless websites are facing the epic amounts of online data that first hit Facebook a half decade ago. But according to Facebook engineering bigwig Jay Parikh, these sites have it so much easier.

That's because many of the web's largest operations -- including Facebook -- spent the last few years building massive software platforms capable of juggling online information across tens of thousands of servers. And they've shared much of this "Big Data" software with anyone who wants it.

Together with Yahoo, Facebook helped drive the development of Hadoop, a sweeping software platform for processing and analyzing the epic amounts of data streaming across the modern web. Yahoo started the open source project as a way of constructing the index that underpinned its web search engine, but others soon plugged it into their own online operations -- and worked to enhance the code as necessary.

The result is a platform that can juggle as much as 100 petabytes of data -- aka roughly 100 million gigabytes. "Five years ago, when we started on these technologies, there were limitations on what we could do and how fast we could grow. What's happened with the open source community is that a lot of those limitations, those hindrances, have been removed," says Parikh, who oversees the vast hardware and software infrastructure that drives Facebook. "People are now able to go through the tunnel a lot faster than we did."

But now, Facebook is staring down an even larger avalanche of data, and there are new limitations that need fixing. This week, during a briefing with reporters at Facebook's Menlo Park headquarters, Parikh revealed that the company has developed two new software platforms that will see Hadoop scale even further. And Facebook intends to open source them both.

The first is called Corona, and it lets you run myriad tasks across a vast collection of Hadoop servers without running the risk of crashing the entire cluster. But the second is more intriguing. It's called Prism, and it's a way of running a Hadoop cluster so large that it spans multiple data centers across the globe.

"It lets us move data around, wherever we want," Parikh says. "Prineville, Oregon. Forest City, North Carolina. Sweden."

Hadoop is based on research papers describing two massive software platforms that Google built to run its search engine roughly a decade ago: GFS and MapReduce. GFS -- short for Google File System -- was a means of storing data across thousands of servers, while MapReduce let you pool the processing power in those servers and crunch all that data into something useful. Hadoop works in much the same way. You store data in the Hadoop Distributed File System -- aka HDFS -- and you process it with Hadoop MapReduce.
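For a concrete sense of that division of labor, here's a toy, single-machine sketch of the MapReduce programming model: map emits key-value pairs, the framework groups them by key, and reduce folds each group into an answer. It's an illustration of the idea, not Hadoop's or Facebook's actual code -- on a real cluster, the input would live in HDFS blocks and the grouping would happen across thousands of servers.

```python
# Toy sketch of the MapReduce contract -- illustration only, not Hadoop's code.
from collections import defaultdict

def map_step(line):
    # On a real cluster, these lines would be read out of HDFS blocks.
    for word in line.lower().split():
        yield word, 1

def reduce_step(word, counts):
    yield word, sum(counts)

def run_job(lines):
    # Hadoop performs this grouping ("shuffle") across thousands of machines;
    # here it happens in a single dictionary.
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_step(line):
            groups[key].append(value)
    return dict(pair for key, values in groups.items()
                for pair in reduce_step(key, values))

if __name__ == "__main__":
    print(run_job(["the web is big", "the data is bigger"]))
    # -> {'the': 2, 'web': 1, 'is': 2, 'big': 1, 'data': 1, 'bigger': 1}
```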

The two Hadoop platforms have helped run the likes of Yahoo and Facebook for years, but they're not perfect -- and as Facebook expanded to more than 900 million users, their flaws became ever more apparent. Most notably, both were plagued by a "single point of failure." If a master server overseeing the cluster went down, the whole cluster went down -- at least temporarily.

In recent months, Facebook eliminated the single point of failure in the HDFS platform using a creation it calls AvatarNode, and the open source Hadoop project has followed with a similar solution known as the HA NameNode, short for high availability. But that still left the single point of failure in MapReduce. Now, with Corona, Facebook has solved this as well.
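Facebook hasn't published the internals of AvatarNode, but the general shape of a hot-standby fix is easy to sketch: keep a second metadata server with a mirrored copy of the namespace and promote it the moment the active one stops responding. The toy below is an assumption-laden illustration in that spirit, not how AvatarNode or the HA NameNode actually work.

```python
# Toy sketch of hot-standby failover -- assumptions only, not AvatarNode's design.
import time

class MetadataServer:
    def __init__(self, name):
        self.name = name
        self.namespace = {}                # file -> block locations, kept mirrored
        self.last_heartbeat = time.time()

    def heartbeat(self):
        self.last_heartbeat = time.time()

    def alive(self, timeout=5.0):
        return time.time() - self.last_heartbeat < timeout

class FailoverPair:
    def __init__(self):
        self.active = MetadataServer("master")
        self.standby = MetadataServer("standby")

    def lookup(self, path):
        if not self.active.alive() and self.standby.alive():
            # The single point of failure becomes a brief switchover instead
            # of an outage for the whole cluster.
            self.active, self.standby = self.standby, self.active
        return self.active.namespace.get(path)
```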

Traditionally, MapReduce used a single "job tracker" to manage tasks across a cluster of servers. But Corona creates multiple job trackers. "It has helped us scale out the number of jobs we can do on this MapReduce infrastructure. We can get more throughput through the system, so more teams and products here at Facebook can keep moving," Parikh says.

"In the past, if we had a problem with the job tracker, everything kinda died, and you had to restart everything. The entire business was affected if this one thing fell over. Now, there's lots of mini-job-trackers out there, and they're all responsible for their own tasks."

Tomer Shiran, one of the first employees at a Silicon Valley Hadoop startup called MapR, points out that his company offers a version of Hadoop that includes a similar fix, but he says that multiple job trackers have not yet reached the open source version of Hadoop. Shiran has seen a version of Corona, and he says the platform also lets you spin up MapReduce jobs much quicker than before.

Facebook's Jay Parikh provides few details on Corona, but apparently, it's already used inside Facebook -- and it's much needed. The company runs what Parikh calls the world's largest Hadoop cluster, which spans more than 100 petabytes of data, and it analyzes about 105 terabytes every 30 minutes.
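To put that figure in perspective, a quick back-of-the-envelope conversion (decimal units assumed):

```python
# 105 TB analyzed every 30 minutes, expressed in gigabytes per second
# (assuming 1 TB = 1,000 GB).
scanned_tb = 105
window_seconds = 30 * 60
print("~%.0f GB/s sustained" % (scanned_tb * 1000 / window_seconds))  # ~58 GB/s
```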

But Facebook will soon outgrow this cluster. Those 900 million members are perpetually posting new status updates, photos, videos, comments, and -- well, you get the picture. This is why Parikh and crew built Prism, which will let them run a Hadoop cluster across multiple data centers.

Traditionally, Parikh says, you couldn't run Hadoop across geographically separate facilities because network packets couldn't travel between the servers fast enough. "One of the big limitations of Hadoop is that all the servers have to be next to each other," he says. "The system is very tightly coupled, and if you introduce tens of milliseconds of delay between these servers, the whole thing comes crashing to a halt."

But Prism will change that. In short, it automatically replicates and moves data wherever it's needed across a vast network of computing facilities. "It allows us to physically separate this massive warehouse of data but still maintain a single logical view of all of it," Parikh says. "We can move the warehouses around, depending on cost or performance or technology.... We're not bound by the maximum amount of power we wire up to a single data center."
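Facebook hasn't described how Prism does this, so what follows is only a toy sketch of the idea as Parikh frames it: a single logical namespace maps each dataset to replicas in physical facilities, so the data can be re-homed without the jobs that read it ever naming a building. The facility names are simply the sites mentioned in this story, and nothing here reflects Prism's real design.

```python
# Toy sketch of a "single logical view" over many data centers -- not Prism.
class LogicalWarehouse:
    def __init__(self, data_centers):
        self.data_centers = list(data_centers)    # physical facilities
        self.placement = {}                       # dataset -> hosting facilities

    def _load(self, dc):
        return sum(dc in sites for sites in self.placement.values())

    def store(self, dataset, replicas=2):
        # Place replicas in the least-loaded facilities; a real system would
        # also weigh power, cost, and which jobs read the data.
        ranked = sorted(self.data_centers, key=self._load)
        self.placement[dataset] = ranked[:replicas]

    def read(self, dataset):
        # Callers never name a facility; the logical view picks a live replica.
        return self.placement[dataset][0]

    def evacuate(self, failed_dc):
        # If a facility goes dark, drop it and re-replicate from surviving copies.
        self.data_centers.remove(failed_dc)
        for dataset, sites in self.placement.items():
            if failed_dc in sites:
                sites.remove(failed_dc)
                spare = min((dc for dc in self.data_centers if dc not in sites),
                            key=self._load)
                sites.append(spare)

warehouse = LogicalWarehouse(["prineville", "forest-city", "lulea"])
warehouse.store("clicks-2012-08")
warehouse.evacuate("prineville")
print(warehouse.read("clicks-2012-08"))   # data is still readable elsewhere
```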

Prism is reminiscent of a Google platform called Spanner. Little is known about Spanner -- Google tends to keep much of its infrastructure work on the down-low -- but in the fall of 2009, it publicly described the platform as a "storage and computation system that spans all our data centers [and that] automatically moves and adds replicas of data and computation based on constraints and usage patterns." This includes network constraints related to bandwidth, packet loss, power, and "failure modes." If a data center melts down, for instance, Spanner can automatically shift data to another facility.

Google said the platform provided "automated allocation of resources across [the company's] entire fleet of machines" -- which spanned as many as 36 data centers across the globe.

Parikh acknowledges that Prism is analogous to Google Spanner, but he's also careful to point out that he too knows little about Spanner or how it's used. He says that like Spanner, Facebook Prism could be used to instantly relocate data in the event of a data center meltdown.

MapR's Tomer Shiran says such software is not available outside of Google or Facebook, but he points out that you can always run multiple separate clusters instead, and he questions how many companies need such a thing. "Companies like Google operate at a scale no one else operates at," he says.

Facebook has yet to actually deploy Prism, and Parikh declines to say when this will happen. But he does say that at some point, the company hopes to open source the platform. And it plans to do much the same with Corona. Yes, few companies face the avalanche of online data that now hits Google or Facebook. But they will. "These are the next set of scaling challenges," says Parikh.

Photo: Shazoor Mirza/Flickr