Greenplum Purchase Gets EMC into the Big Data Game

[Corrected July 8, 2010, 10:20 a.m.; see below] Boston is already a powerhouse in “big data.” It’s home to companies like Netezza, Dataupia, Vertica, and Lightwolf Technologies, which all help enterprises manage and mine the huge databases used in business intelligence applications. It was the site of the first “Boston Big Data Summit” last fall. And now, with the acquisition of San Mateo, CA-based Greenplum by Hopkinton, MA-based EMC, the region will be even bigger into big data.

Greenplum is probably best known as the provider of the multi-petabyte data warehouse that auction site eBay formerly used to analyze the behavior of site visitors. EBay users generate a reported 150 billion individual event records per day as they skim the site and place bids. That’s information eBay can use to optimize the site’s performance and serve customers better—but doing so requires sifting through trillions of records overall. This huge task requires a massively parallel processing approach, which is what Greenplum’s database software, built on top of the open-source Postgres object-relational database system, is optimized to do. [Update and correction: Oliver Ratzesberger, who is in charge of the analytics platform at eBay, wrote to say that the company now uses a different technology for analytics.]

The main difference between Greenplum’s technology and other database software has to do with how data is accessed. In traditional database management systems built by companies like Oracle and Microsoft, different query processing jobs generally share access to the same hard disks, which can slow down individual queries. But Greenplum’s so-called “shared-nothing” system divides data across multiple servers, or segments, each of which has its own connection to a disk drive. That means a single database query can be run against many segments of data simultaneously, which is well suited to the analytics applications run by Greenplum customers like eBay, Fox Interactive Media, NASDAQ, the New York Stock Exchange, Skype, and T-Mobile.
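For readers who want a concrete picture of the shared-nothing idea, here is a minimal Python sketch. It is an illustration under stated assumptions, not Greenplum's actual implementation: the names `segment_for`, `distribute`, `local_count`, and `run_query`, the four-segment count, and the sample event rows are all hypothetical, and threads stand in for what would be separate servers with their own disks. Rows are assigned to segments by hashing a key, each segment answers the query using only its own rows, and a coordinator merges the partial results.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import zlib

NUM_SEGMENTS = 4  # hypothetical segment count, chosen only for illustration


def segment_for(key: str, num_segments: int = NUM_SEGMENTS) -> int:
    """Map a distribution key to a segment using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_segments


def distribute(rows, num_segments: int = NUM_SEGMENTS):
    """Split rows across segments; each row lives on exactly one segment."""
    segments = [[] for _ in range(num_segments)]
    for row in rows:
        segments[segment_for(row["user"], num_segments)].append(row)
    return segments


def local_count(segment_rows):
    """Work done by one segment: count events using only its local rows."""
    return Counter(row["event"] for row in segment_rows)


def run_query(segments):
    """Send the same query to every segment in parallel, then merge."""
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        partials = list(pool.map(local_count, segments))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


# Made-up sample data standing in for site event records.
rows = [
    {"user": "alice", "event": "view"},
    {"user": "bob", "event": "bid"},
    {"user": "alice", "event": "bid"},
    {"user": "carol", "event": "view"},
    {"user": "dave", "event": "view"},
]
segments = distribute(rows)
result = run_query(segments)
print(result)
```

The point of the sketch is that no segment ever reads another segment's rows, so adding segments adds both storage and query capacity; only the small partial counts travel back to the coordinator.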

Announced Tuesday, the all-cash acquisition of Greenplum (terms weren’t given) means that EMC will now have a data computing product division that allows it to compete directly with suppliers of large-scale data warehousing systems like Netezza and Vertica, not to mention database giants like Oracle and Teradata. It’s perhaps surprising that EMC would reach all the way to California for an acquisition in the big-data sector, given that there were several options within Route 128. But it’s easy to understand why EMC would want to be a player in this area, considering that

Author: Wade Roush

Between 2007 and 2014, I was a staff editor for Xconomy in Boston and San Francisco. Since 2008 I've been writing a weekly opinion/review column called VOX: The Voice of Xperience. (From 2008 to 2013 the column was known as World Wide Wade.) I've been writing about science and technology professionally since 1994. Before joining Xconomy in 2007, I was a staff member at MIT’s Technology Review from 2001 to 2006, serving as senior editor, San Francisco bureau chief, and executive editor of TechnologyReview.com. Before that, I was the Boston bureau reporter for Science, managing editor of supercomputing publications at NASA Ames Research Center, and Web editor at e-book pioneer NuvoMedia. I have a B.A. in the history of science from Harvard College and a PhD in the history and social study of science and technology from MIT. I've published articles in Science, Technology Review, IEEE Spectrum, Encyclopaedia Britannica, Technology and Culture, Alaska Airlines Magazine, and World Business, and I've been a guest of NPR, CNN, CNBC, NECN, WGBH and the PBS NewsHour. I'm a frequent conference participant and enjoy opportunities to moderate panel discussions and on-stage chats. My personal site: waderoush.com. My social media coordinates: Twitter: @wroush; Facebook: facebook.com/wade.roush; LinkedIn: linkedin.com/in/waderoush; Google+: google.com/+WadeRoush; YouTube: youtube.com/wroush1967; Flickr: flickr.com/photos/wroush/; Pinterest: pinterest.com/waderoush/