MongoDB Wizards Work to Make 10gen the Red Hat of Databases

you’ve probably heard of: Gilt Groupe, The Business Insider, and ShopWiki. In 2007 Merriman co-founded 10gen—another AlleyCorp company—together with Eliot Horowitz, who is the database company’s chief technology officer.

Their first project was writing MongoDB, which departs in significant ways from the relational databases that Web application developers had always used in the past. “In a way, MongoDB is the kind of database I always wanted to have at DoubleClick,” Merriman says. “There were certain kinds of problems that kept coming up. There were scaling problems with the data, and also it was a pain to write apps using relational databases. It seemed like it could be easier.”

The basic problem with old-fashioned relational databases is that they weren’t designed for the huge datasets, unpredictable loads, or rapid software development cycles commonplace in the Web world. (If you’re a programmer, you might want to skip the next few paragraphs, as I’m about to give a painfully dumbed-down explanation.) For years, the Web world was centered around a set of free, open-source components known as the LAMP stack, for Linux (the underlying operating system running most Web servers), Apache (the open-source Web server program), MySQL (the free relational database management system developed by Swedish startup MySQL AB, now owned by Oracle), and Perl/PHP/Python (the most common Web programming languages). Relational databases can be fast and powerful, and MySQL is still the database of choice for many common Web applications, such as the Drupal, Joomla, and WordPress publishing systems. But more and more developers over the last half-decade have been coming up against MySQL’s limitations.

One of them is scaling. A relational database can consist of thousands of related, or “joined,” tables, where the data in one column of Table A might be defined by the rows and columns of Table B, which might contain columns that only make sense in relation to Table C, et cetera. Unless all of these tables live on one machine, it’s hard to keep them joined, and it’s hard to make sure that each transaction (each query or read-write operation) is completed consistently. That makes it difficult to add server and storage capacity quickly if your Web application happens to become popular. And if you’re rewriting your application and you discover that you need to add a new category of data to the existing tables, well, good luck. Before Craigslist adopted MongoDB, adding one new column to its MySQL database would take three months of continuous computing time, according to 10gen’s Frieberg.
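For readers who did not skip ahead, here is a minimal sketch of what “joined” tables look like in practice. It uses Python’s built-in sqlite3 module rather than MySQL, and the table and column names (sellers, listings, price_cents) are invented for illustration; none of them come from Craigslist or 10gen.

```python
import sqlite3

# Hypothetical two-table schema, held in memory for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sellers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE listings (
        id INTEGER PRIMARY KEY,
        seller_id INTEGER REFERENCES sellers(id),
        title TEXT
    );
    INSERT INTO sellers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO listings VALUES (10, 1, 'Bicycle'), (11, 2, 'Desk'), (12, 1, 'Lamp');
""")

# listings.seller_id only makes sense in relation to the sellers table,
# so reading a complete listing requires joining the two tables.
rows = conn.execute("""
    SELECT listings.title, sellers.name
    FROM listings JOIN sellers ON listings.seller_id = sellers.id
    ORDER BY listings.id
""").fetchall()

# Adding a new category of data means changing the table definition itself.
# On this tiny in-memory table the change is instant; the article's point is
# that on a very large production table it can take far longer.
conn.execute("ALTER TABLE listings ADD COLUMN price_cents INTEGER")
columns = [row[1] for row in conn.execute("PRAGMA table_info(listings)")]
```

The join works here because both tables sit in the same database on the same machine. Once the tables are spread across many servers, the same query has to coordinate across all of them, which is the difficulty described above.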

“It was the imperative to scale that started this space,” says Merriman. “At the app server level, it’s easy to have 100 servers with load balancers distributing the work. But it’s very hard to have a single database running across all 100 servers. Distributed joins becomes a very hard problem, and distributed transactions becomes a very hard problem. The way Big Table and MongoDB solve this problem is by saying we are not going to do those two things. That means some stuff is left out, but you can still cover a very large set of use cases.”
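One way to picture the trade-off Merriman describes is the sketch below. It is an assumption-laden toy, not MongoDB’s or Big Table’s actual mechanism: the hash-based routing rule, the server count, and the function names (server_for, put, get) are all hypothetical. The idea it illustrates is that if each record is self-contained, any read or write touches exactly one server, so no distributed join or distributed transaction is needed.

```python
import hashlib

NUM_SERVERS = 4  # hypothetical cluster size


def server_for(key: str, num_servers: int = NUM_SERVERS) -> int:
    """Pick a server by hashing the record key (an assumed, simplified rule)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


# One dict per server stands in for that server's local storage.
servers = [dict() for _ in range(NUM_SERVERS)]


def put(key: str, record: dict) -> None:
    # The whole record lives on a single server.
    servers[server_for(key)][key] = record


def get(key: str):
    # A lookup consults exactly one server; nothing is joined across servers.
    return servers[server_for(key)].get(key)


# Each record carries everything needed to answer a query about it.
put("user:1", {"name": "Ada", "listings": ["Bicycle", "Lamp"]})
put("user:2", {"name": "Grace", "listings": ["Desk"]})
```

What is left out is exactly what Merriman says is left out: there is no operation here that combines records held on different servers, and no way to update two records atomically.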

In MongoDB, as Merriman explains it, there are no tables, rows, or columns. It’s a document-oriented database, meaning it consists of collections of documents, with each document containing an arbitrary number of fields. Programmers can add fields to documents as needed, without affecting other documents in the collection. With this kind of data storage, chunks of the database can be

Author: Wade Roush

Between 2007 and 2014, I was a staff editor for Xconomy in Boston and San Francisco. Since 2008 I've been writing a weekly opinion/review column called VOX: The Voice of Xperience. (From 2008 to 2013 the column was known as World Wide Wade.) I've been writing about science and technology professionally since 1994. Before joining Xconomy in 2007, I was a staff member at MIT’s Technology Review from 2001 to 2006, serving as senior editor, San Francisco bureau chief, and executive editor of TechnologyReview.com. Before that, I was the Boston bureau reporter for Science, managing editor of supercomputing publications at NASA Ames Research Center, and Web editor at e-book pioneer NuvoMedia. I have a B.A. in the history of science from Harvard College and a PhD in the history and social study of science and technology from MIT. I've published articles in Science, Technology Review, IEEE Spectrum, Encyclopaedia Britannica, Technology and Culture, Alaska Airlines Magazine, and World Business, and I've been a guest of NPR, CNN, CNBC, NECN, WGBH, and the PBS NewsHour. I'm a frequent conference participant and enjoy opportunities to moderate panel discussions and on-stage chats. My personal site: waderoush.com. My social media coordinates: Twitter: @wroush; Facebook: facebook.com/wade.roush; LinkedIn: linkedin.com/in/waderoush; Google+: google.com/+WadeRoush; YouTube: youtube.com/wroush1967; Flickr: flickr.com/photos/wroush/; Pinterest: pinterest.com/waderoush/