you’ve probably heard of: Gilt Groupe, The Business Insider, and ShopWiki. In 2007 Merriman co-founded 10gen—another AlleyCorp company—together with Eliot Horowitz, who is the database company’s chief technology officer.
Their first project was writing MongoDB, which departs in significant ways from the relational databases that Web application developers had traditionally relied on. “In a way, MongoDB is the kind of database I always wanted to have at DoubleClick,” Merriman says. “There were certain kinds of problems that kept coming up. There were scaling problems with the data, and also it was a pain to write apps using relational databases. It seemed like it could be easier.”
The basic problem with old-fashioned relational databases is that they weren’t designed for the huge datasets, unpredictable loads, or rapid software development cycles commonplace in the Web world. (If you’re a programmer, you might want to skip the next few paragraphs, as I’m about to give a painfully dumbed-down explanation.) For years, the Web world was centered around a set of free, open-source components known as the LAMP stack, for Linux (the underlying operating system running most Web servers), Apache (the open-source Web server program), MySQL (the free relational database management system developed by Swedish startup MySQL AB, now owned by Oracle), and Perl/PHP/Python (the most common Web programming languages). Relational databases can be fast and powerful, and MySQL is still the database of choice for many common Web applications, such as the Drupal, Joomla, and WordPress publishing systems. But more and more developers over the last half-decade have been coming up against MySQL’s limitations.
One of them is scaling. A large relational database can consist of hundreds or thousands of related or “joined” tables, where the values in one column of Table A might only make sense when matched against the rows of Table B, which might in turn contain columns that only make sense in relation to Table C, et cetera. Unless all of these tables live on one machine, it’s hard to keep them joined, and it’s hard to make sure that each transaction (a set of reads and writes that must succeed or fail as a unit) is completed consistently. That makes it difficult to add server and storage capacity quickly if your Web application happens to become popular. And if you’re rewriting your application and discover that you need to add a new category of data to the existing tables, well, good luck. Before Craigslist adopted MongoDB, adding one new column to its MySQL database would take three months of continuous computing time, according to 10gen’s Frieberg.
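To make the idea of “joined” tables concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The users/posts schema is hypothetical and purely illustrative: each post row stores only an author ID, which means nothing until it is matched against the users table.

```python
import sqlite3

# Hypothetical two-table schema: posts.user_id only makes sense
# when matched ("joined") against the users table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        title TEXT NOT NULL
    );
    INSERT INTO users (id, name) VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO posts (id, user_id, title) VALUES
        (10, 1, 'Hello'), (11, 2, 'Scaling'), (12, 1, 'Joins');
""")

# Answering "who wrote what?" requires reading both tables together.
# That is cheap when both live in one database on one machine; if the
# tables were spread across servers, the join would need network trips.
rows = conn.execute("""
    SELECT users.name, posts.title
    FROM posts JOIN users ON posts.user_id = users.id
    ORDER BY posts.id
""").fetchall()

for name, title in rows:
    print(f"{name}: {title}")
```

Here everything sits in a single in-memory database, so the join is trivial; the difficulty described above appears only once the tables no longer share a machine.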
“It was the imperative to scale that started this space,” says Merriman. “At the app server level, it’s easy to have 100 servers with load balancers distributing the work. But it’s very hard to have a single database running across all 100 servers. Distributed joins becomes a very hard problem, and distributed transactions becomes a very hard problem. The way Big Table and MongoDB solve this problem is by saying we are not going to do those two things. That means some stuff is left out, but you can still cover a very large set of use cases.”
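A toy illustration of what giving up joins can buy: if each record is self-contained and carries its own key, a router can assign it to one of several servers by hashing that key, and any lookup touches exactly one server. This is a hypothetical sketch, with in-memory dicts standing in for servers and `put`/`get`/`server_for` as made-up helper names; MongoDB’s actual sharding, which divides a collection into chunks by ranges or hashes of a shard key, is considerably more involved.

```python
import zlib
from typing import Optional

NUM_SERVERS = 4  # hypothetical cluster size

# Each "server" is just an in-memory dict in this sketch.
servers = [dict() for _ in range(NUM_SERVERS)]

def server_for(key: str) -> int:
    """Pick a server deterministically from the record's key."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SERVERS

def put(key: str, record: dict) -> None:
    servers[server_for(key)][key] = record

def get(key: str) -> Optional[dict]:
    # Only one server is consulted: no cross-server join is needed,
    # because the record already holds everything about this key.
    return servers[server_for(key)].get(key)

# Self-contained records: the author's name travels with each post
# instead of living in a separate users table.
put("post:10", {"title": "Hello", "author": {"id": 1, "name": "alice"}})
put("post:11", {"title": "Scaling", "author": {"id": 2, "name": "bob"}})

print(get("post:10")["author"]["name"])
```

The trade-off is the “stuff left out” Merriman mentions: if alice changes her name, every record that embeds it has to be updated, a chore a join would have handled automatically.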
In MongoDB, as Merriman explains it, there are no tables, rows, or columns. It’s a document-oriented database, meaning it consists of collections of documents, with each document containing an arbitrary number of fields. Programmers can add fields to documents as needed, without affecting other documents in the collection. With this kind of data storage, chunks of the database can be