adopt new ideas such as open source software, seeing the Facebook switch to generic, low-cost hardware has changed the conversation. “People say, ‘If Facebook can run it, why can’t we run it?’”
To Parikh, the Open Compute Project is a natural extension of Facebook’s immersion in the open source software community. “We started off many years ago doing this with things like Hadoop and Hive, but there are many other pieces of the infrastructure that we have open-sourced,” he says. (Hive is a distributed data warehouse system developed at Facebook and now overseen by the Apache Software Foundation; for more details on that and other tools that Facebook has contributed to the open source community, see our Facebook Big Data Glossary.)
At a January summit hosted by the Open Compute Project, Parikh said companies would have to work together to meet the challenges of big data—especially storing the 40 zettabytes, or 40,000 exabytes, of data expected to be generated worldwide by 2020. “I don’t think we are going to keep up if we don’t work together,” Parikh said.
Move Fast, Break Things
Facebook’s own ever-growing storage needs are never far from Parikh’s mind. Every month, Facebook must add another 7 petabytes of storage for photos alone, Parikh said at the OCP Summit. “The problem here is that we can’t lose any of those photos,” he said. “Users expect us to keep them for decades as they accumulate a lifetime of memories and experiences. So we can’t just put them on tape. ‘That Halloween picture from five years ago? We’ll send it to you in a week?’ That doesn’t work for us.”
At the same time, though, Facebook’s analytics data shows that 90 percent of the traffic a photo will ever get comes in the first four months after it’s been posted. Storing older, less frequently viewed photos indefinitely on the same servers with the newer, hotter photos is simply inefficient, Parikh says.
The company’s solution may be something Parikh calls “cold storage.” It would mean putting older photos into customized racks of hard drives optimized for high storage density and low power consumption, rather than quick retrieval. The less often the photos are needed, in other words, the less speedily they’ll appear. Eventually, Facebook will probably share the specs for its cold storage racks as part of the Open Compute Project.
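A rough sketch of the age-based tiering decision Parikh describes might look like the Python below; the thresholds, function name, and tier labels are illustrative assumptions, not Facebook’s actual logic.

from datetime import datetime, timedelta

# Illustrative assumption: photos older than roughly four months that are
# rarely viewed become candidates for dense, low-power cold storage racks.
HOT_WINDOW = timedelta(days=120)
COLD_VIEW_THRESHOLD = 5  # hypothetical views-per-month cutoff

def choose_tier(uploaded_at, views_last_month, now=None):
    """Return 'hot' or 'cold' for a photo, based on age and recent traffic."""
    now = now or datetime.utcnow()
    if now - uploaded_at <= HOT_WINDOW:
        return "hot"   # new photos get about 90 percent of their lifetime views
    if views_last_month >= COLD_VIEW_THRESHOLD:
        return "hot"   # still popular, so keep it on fast storage
    return "cold"      # archive to high-density, low-power racks

# A five-year-old Halloween photo with no recent views goes to cold storage.
print(choose_tier(datetime(2008, 10, 31), views_last_month=0))  # prints "cold"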
Unfortunately, no similar tradeoffs are feasible when it comes to the activity logs that are the centerpiece of the back-end infrastructure. All of that data needs to be accessible quickly, and over the years that has meant the back end outgrowing one data center after another, with each changeover necessitating a costly migration.
But in the last few months, Parikh’s team has been perfecting an improvement on Hadoop called Prism that could help sidestep that problem. The idea is to provide the illusion that the entire analytics back end is living in one data center, even if it’s distributed across two or more. It’s a no-more-band-aids moment. Prism will “allow people to do arbitrary analyses of arbitrarily large data sets, and prevent us from running out of capacity in a single data center,” explains Janardhan.
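Facebook has not published Prism’s internals, but the general idea of presenting several data centers as a single logical cluster can be sketched roughly as follows; the class names, data center names, and methods are assumptions for illustration, not Prism’s actual design.

# Minimal sketch of a federation layer: one logical namespace, with queries
# fanned out to whichever data centers hold the relevant data.
class DataCenter:
    def __init__(self, name, datasets):
        self.name = name
        self.datasets = datasets  # maps dataset name to rows stored locally

    def run_query(self, dataset, query_fn):
        return [query_fn(row) for row in self.datasets.get(dataset, [])]

class FederatedCluster:
    """Presents many data centers as a single analytics back end."""
    def __init__(self, data_centers):
        self.data_centers = data_centers

    def run_query(self, dataset, query_fn):
        results = []
        for dc in self.data_centers:  # fan out to every site holding the data
            results.extend(dc.run_query(dataset, query_fn))
        return results                # the caller never sees the physical split

# An analyst queries "chat_logs" without knowing where the rows actually live.
prineville = DataCenter("prineville", {"chat_logs": [{"chats": 3}, {"chats": 5}]})
lulea = DataCenter("lulea", {"chat_logs": [{"chats": 2}]})
cluster = FederatedCluster([prineville, lulea])
print(sum(cluster.run_query("chat_logs", lambda row: row["chats"])))  # prints 10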
Facebook is one of the few big-data users that is both building such solutions and talking about them in public. Prism means product managers at Facebook, and perhaps at other companies in the future, will get to ask even bigger questions and run even bigger experiments. “Every time we make [the infrastructure] faster by 10x we get 10x as much usage,” says Janardhan. “People find new things to do with the performance that we had not even thought of.”
One example: a political app within Facebook that allowed users, on Election Day last November, to say whether they’d voted. “You could see how many people were voting in what states and towns, and whether they were male or female, all the demographics, in real time, as soon as it happened,” says Janardhan. That’s the kind of information the TV networks pay exit pollsters good money for—but Facebook didn’t charge a cent. It was just a demonstration of Facebook’s big data chops.
Most organizations innovate more slowly as they get bigger. Thanks in large part to the work of the infrastructure team, Facebook hopes to move in the opposite direction. The company recently started pushing new releases of its front-end user interface twice a day, up from once a day. “Our top priority, beyond keeping the site up and running and fast, is enabling our product teams to move at lightning speed,” says Parikh.
Sometimes that means breaking things. In a blog post last fall, Andrew Bosworth related a story about a tiny change in the way Facebook users scroll through the chat interface’s list of available friends, a change that led to a catastrophic 9 percent drop in the number of chats initiated. “In a system processing billions of chats per day, that’s a significant drop-off and a very bad product,” he wrote. But with the analytics data in hand, Bosworth’s team was able to fix the problem within days; the new version performed 4 percent better than the original.
The key to moving fast, Parikh says, is to “make sure the right guardrails are in place…so when you make a mistake, you mitigate and protect yourself from the fault. We never claim to move fast and never make mistakes.”
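In its simplest form, such a guardrail is just an automated check that compares a key metric against its pre-release baseline and flags a drop like the 9 percent fall in chats; the threshold and numbers below are hypothetical, not Facebook’s actual tooling.

# Minimal sketch of a metric guardrail; the threshold and counts are assumed.
DROP_THRESHOLD = 0.05  # flag anything worse than a 5 percent decline

def check_guardrail(metric_name, baseline, current):
    """Return True if the metric looks healthy, False if the release should be flagged."""
    if baseline == 0:
        return True
    change = (current - baseline) / baseline
    if change < -DROP_THRESHOLD:
        print(f"ALERT: {metric_name} down {abs(change):.0%} since release")
        return False
    return True

# Example: chats initiated fall 9 percent after a change to the chat interface.
healthy = check_guardrail("chats_initiated", baseline=1_000_000_000, current=910_000_000)
# prints "ALERT: chats_initiated down 9% since release" and returns False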
Having the right data on hand, in other words, enables Facebook to take greater risks—but also helps it pull back when necessary. And that, in the end, may be the biggest reason for any business to care about big data.