Facebook Doesn’t Have Big Data. It Has Ginormous Data.

Jay Parikh, vice president of infrastructure engineering at Facebook

In an A/B test, a company serves two versions of a page or feature to different groups of users and measures which one performs better in terms of click-through rates, purchases, or what have you. Big Web companies like Google A/B test absolutely everything; in one notorious case Marissa Mayer, now CEO of Yahoo, asked her team to test 41 shades of blue for the Google toolbar to see which one elicited the most clicks.
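The arithmetic behind that kind of comparison is simple, even though Facebook's own testing pipeline isn't public. Here is a minimal sketch in Python, using made-up impression and click counts for two hypothetical button variants and a standard two-proportion z-test:

```python
import math

def compare_ctr(clicks_a, views_a, clicks_b, views_b):
    """Compare the click-through rates of two variants with a two-proportion z-test.
    Returns both CTRs and the z statistic; |z| > 1.96 is roughly a 95% signal."""
    ctr_a = clicks_a / views_a
    ctr_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (ctr_b - ctr_a) / se
    return ctr_a, ctr_b, z

# Hypothetical numbers: variant B (say, a blue button) vs. variant A (say, red).
ctr_a, ctr_b, z = compare_ctr(clicks_a=4_210, views_a=100_000,
                              clicks_b=4_530, views_b=100_000)
print(f"A: {ctr_a:.2%}  B: {ctr_b:.2%}  z = {z:.2f}")
```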

Every Facebook user has been an unwitting participant in an A/B test at one time or another. In one small example from my own experience, I’m sometimes offered the ability to edit my status updates and comments after I post them, but other times the Edit option is missing. (I’d like to have the option all the time, but apparently the jury is still out on that one.) Facebook runs so many A/B tests that it has a system called Gatekeeper to make sure that simultaneous tests don’t “collide” and yield meaningless data, according to Andrew Bosworth, a director of engineering at the company. (He’s the guy who invented the News Feed.)
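Facebook hasn't published Gatekeeper's internals, but the usual way to keep simultaneous experiments from colliding is to hash users into disjoint "layers," so that no one's experience is being pulled in two directions by overlapping tests. A minimal sketch of that idea, with hypothetical layer and experiment names:

```python
import hashlib

# Hypothetical layers: experiments inside one layer split the user population
# between them (so nobody lands in two of them at once), while different layers
# use different hash salts, so their tests overlap only at random.
LAYERS = {
    "composer":  {"edit_button_test": range(0, 50), "audience_selector_test": range(50, 100)},
    "feed_rank": {"story_order_test": range(0, 100)},
}

def bucket(user_id, layer, buckets=100):
    """Deterministically map a user to one of `buckets` slots within a layer."""
    digest = hashlib.md5(f"{layer}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % buckets

def experiment_for(user_id, layer):
    """Return the single experiment (if any) this user belongs to in a layer."""
    b = bucket(user_id, layer)
    for name, slots in LAYERS[layer].items():
        if b in slots:
            return name
    return None

def variant(user_id, layer):
    """Split an experiment's users 50/50 into test and control, independently of slots."""
    digest = hashlib.md5(f"{layer}:variant:{user_id}".encode()).hexdigest()
    return "test" if int(digest, 16) % 2 == 0 else "control"

uid = 12345
print(experiment_for(uid, "composer"), variant(uid, "composer"))
```

Because the assignment is a deterministic hash of the user ID, the same person always sees the same version of a feature, and two experiments in the same layer can never quietly contaminate each other's results.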

Sometimes the questions Facebook is testing are mundane—should a new button be red or blue? But other experiments are a great deal more complex, and are intended to suss out “what kinds of content, relationships, and advertisements are important to people,” Agarwal says. The company can break down its answers by country, by age group, and by cohort (how long they’ve been members of Facebook).
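Mechanically, those breakdowns amount to grouping a metric by user attributes. A minimal sketch with invented records (the attribute names and numbers are illustrative, not Facebook's):

```python
from collections import defaultdict
from statistics import mean

# Invented per-user records; a real pipeline would read these out of the
# company's data warehouse rather than a Python list.
users = [
    {"country": "US", "age_group": "18-24", "signup_year": 2009, "minutes_per_day": 41},
    {"country": "US", "age_group": "35-44", "signup_year": 2007, "minutes_per_day": 22},
    {"country": "IN", "age_group": "18-24", "signup_year": 2011, "minutes_per_day": 35},
    {"country": "IN", "age_group": "25-34", "signup_year": 2010, "minutes_per_day": 28},
]

def breakdown(records, key):
    """Average the engagement metric for each value of one user attribute."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r["minutes_per_day"])
    return {k: round(mean(v), 1) for k, v in groups.items()}

for key in ("country", "age_group", "signup_year"):
    print(key, breakdown(users, key))
```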

But the back end isn’t used purely to store data from active experiments. It’s also a rich mine for what medical clinicians might call retrospective studies. “If the amount of time somebody spent on the site changed from one month to another, or from one day to another, why?” says Agarwal. “What was the underlying reason for the change of behavior? At what step in the process did someone decide to pursue or not to pursue a new feature? That is the kind of deep understanding we can get at.”
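Agarwal's "at what step" question is, in data-analysis terms, a funnel: count how many users survive each stage of adopting a feature and see where they drop off. A minimal sketch, with a hypothetical funnel and an invented event log:

```python
from collections import Counter

# Hypothetical adoption funnel for a new feature, in order.
FUNNEL = ["saw_promo", "opened_feature", "completed_setup", "used_again_next_week"]

# Invented event log of (user_id, event_name) pairs.
events = [
    (1, "saw_promo"), (1, "opened_feature"), (1, "completed_setup"),
    (2, "saw_promo"), (2, "opened_feature"),
    (3, "saw_promo"),
    (4, "saw_promo"), (4, "opened_feature"), (4, "completed_setup"), (4, "used_again_next_week"),
]

def funnel_counts(event_log, steps):
    """Count how many users reached each step, requiring every earlier step too."""
    per_user = {}
    for uid, name in event_log:
        per_user.setdefault(uid, set()).add(name)
    counts = Counter()
    for reached in per_user.values():
        for i, step in enumerate(steps):
            if all(s in reached for s in steps[: i + 1]):
                counts[step] += 1
    return [(step, counts[step]) for step in steps]

for step, n in funnel_counts(events, FUNNEL):
    print(f"{step:25} {n}")   # the big drop-offs show where people bail out
```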

The leaders of Facebook's Data Infrastructure team. Left to right: Santosh Janardhan, Sameet Agarwal, and Jay Parikh.

Sometimes, whether out of pure curiosity or for more practical reasons, Facebook’s data scientists even investigate social-science questions, such as whether people’s overall happiness levels correlate with the amount of time they spend on the site. (At least one outside researcher at Stanford has found that there may be a negative correlation: the more upbeat posts you see from your friends, the sadder you feel about your own life.) There’s no question that the back end is mainly designed to support rapid product experimentation. But it’s not solely about “what designs work better and what don’t,” in Agarwal’s words—it’s also about “Facebook as a new medium by which people communicate, and how our social behaviors and social norms are changing.”

Overall, Parikh is certain that Facebook’s ability to experiment, measure, re-optimize, and experiment again—what he calls “A/B testing on super steroids”—is one of its key competitive advantages. It’s “the long pole in the tent,” he says. “It’s critical for running our business.”

But there’s one more type of analysis that may be just as critical. Facebook has enough servers to populate a small country, and it’s constantly collecting data on their performance. By instrumenting every server, rack, switch, and node, and then analyzing the data, the company can identify slowdowns, choke points, “hot spots,” and “problems our users haven’t even reported yet,” says Parikh.
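Parikh doesn't spell out how the flagging works, but the general pattern is familiar: every machine reports timing samples, and anything whose tail latency drifts well above its peers gets surfaced before users ever complain. A minimal sketch, with made-up measurements and an arbitrary threshold:

```python
from statistics import median, quantiles

# Invented per-server response-time samples, in milliseconds; in production these
# would stream in from instrumentation on every server, rack, switch, and node.
samples = {
    "web-001": [12, 13, 14, 14, 15, 15, 16],
    "web-002": [11, 12, 12, 13, 13, 14, 15],
    "web-003": [45, 48, 52, 55, 60, 65, 70],   # an emerging hot spot
}

def p95(values):
    """95th-percentile latency for one server's samples."""
    return quantiles(values, n=20)[-1]

# Baseline = the median server's p95; flag anything more than twice as slow.
baseline = median(p95(v) for v in samples.values())
for host, vals in samples.items():
    if p95(vals) > 2 * baseline:
        print(f"hot spot: {host} (p95 {p95(vals):.0f} ms vs. baseline {baseline:.0f} ms)")
```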

His team recently designed one Web-based tool called Scuba to make it easier to analyze statistics about Facebook’s internal systems, such as how long it’s taking for machines in various countries to serve up requested files. Another program called Claspin shows engineers heat maps representing the health of individual servers in a cluster.
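Scuba and Claspin are internal tools, so the sketch below is meant only to convey the heat-map idea: assign every server a health score and render the cluster rack by rack, with invented scores standing in for the dozens of metrics a real system would aggregate.

```python
# Invented health scores (0.0 = failing, 1.0 = healthy) for servers in one cluster,
# keyed by (rack, slot). A real tool would derive these from many live metrics.
health = {
    ("rack-A", 1): 0.98, ("rack-A", 2): 0.95, ("rack-A", 3): 0.41,
    ("rack-B", 1): 0.99, ("rack-B", 2): 0.73, ("rack-B", 3): 0.97,
}

def cell(score):
    """Map a health score onto a coarse 'color': good, warning, or critical."""
    return "#" if score >= 0.9 else ("+" if score >= 0.6 else "!")

racks = sorted({rack for rack, _ in health})
slots = sorted({slot for _, slot in health})
for rack in racks:
    row = " ".join(cell(health[(rack, slot)]) for slot in slots)
    print(f"{rack}: {row}")
# rack-A: # # !
# rack-B: # + #
```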

That’s the kind of thing Facebook’s infrastructure team usually has to build for itself, because “there is nothing commercially available that can handle our scale,” Parikh says. “So, analytics is something we use not only for product insight but also operational insight.”

Big Data Meets Cheap, Open Hardware

Whatever money Facebook can save by building infrastructure on the cheap goes directly to its bottom line. That’s why the company has, over the past three years, adopted another kind of build-your-own philosophy, one embraced in the past only by the largest of Internet companies—i.e., Google and Amazon. By designing its own servers and storage devices and sending the specifications directly to custom manufacturers in Asia, Facebook can now avoid shelling out for name-brand computing hardware.

In the hopes of further lowering its big data costs, the company is now trying to kindle a wider industry movement around what it calls the Open Compute Project (OCP). Announced in early 2011 and recently spun off as a non-profit corporation, OCP is dedicated to spreading Facebook’s designs for servers, high-density storage devices, power supplies, rack mounting systems, and even whole data centers. The idea is to convince manufacturers and major buyers of data center equipment to adopt the specifications as a common standard, so that everyone will be able to mix and match hardware to meet their own needs, saving money in the process.

So far, companies like Intel, AMD, Dell, Arista, and Rackspace have lent their tentative support. Name-brand makers of servers, storage, and networking devices like Oracle, IBM, EMC, and Cisco have not, for obvious reasons; in a world of disaggregated, commodity data center components, they’ll have an even harder time charging a premium.

Which is exactly the point. In the past, “Either you were so big that you could afford to build this at your own scale, or you were at the mercy of the vendors,” Janardhan says. “Now that we are publishing these specs—even the circuit diagrams for some of the machines we have—you can go to an ODM [original design manufacturer] in Taiwan and get, in some cases, 80 percent off the sticker price.”

Janardhan says he’s been stunned so far by the reaction to the Open Compute Project. “At almost any hardware industry conference I have gone to, that is the only thing people want to talk about,” he says. In a conservative business where both vendors and customers have been slow to

Author: Wade Roush
