Earlier this week, Amazon announced an important new feature in its cloud computing Web service offerings, one that should push those of us who work in technology to reconsider how we build software systems. It’s another exciting little glimpse of the future, and it should give pause to anybody still thinking of Amazon as just a bookseller.
In case you’ve recently emerged from under a rock, cloud computing represents a shift in thinking built on four pillars: open source software, virtualization, cheap commodity hardware and, most importantly, an acceptance of computing capabilities that exist outside the corporate firewall. In recent years, Amazon has transformed itself from an online store to a broad e-commerce platform that includes fulfillment and payment services, all while introducing an advanced suite of Web services that allow other businesses to incrementally scale up their computing needs. They’re now the poster child for cloud computing.
Amazon’s latest announcement was the addition of the Elastic Block Store (EBS) to their Elastic Compute Cloud (EC2), the service that rents out virtualized Linux boxes. In Linux-speak, a block device represents a raw storage device, such as a local disk or a network-attached volume, that can be formatted and mounted on a server. Previously, the local storage on EC2 virtual servers or “instances” was ephemeral, meaning all data on it was lost when the instance terminated or failed. To keep data persistent, one needed to copy it out to a durable store such as Amazon’s S3 service, which allows for reliable long-term storage. EBS makes this much easier: volumes persist independently of any single instance, and you can take point-in-time snapshots of them that are stored on S3. If an instance dies, your volume survives, and the snapshots on S3 protect you even against losing the volume itself (at least up to the point of the last snapshot). This greatly simplifies the previous contortions around data on EC2 and should make it far more attractive to developers.
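To make this concrete, here’s a minimal sketch of the new workflow using the boto Python library. The availability zone, instance ID, and device name below are placeholders of my own, not anything from Amazon’s announcement:

```python
import boto

# Connect to EC2 using credentials from the environment
# (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY).
conn = boto.connect_ec2()

# Create a 100GB EBS volume in the same availability zone
# as the instance that will use it.
volume = conn.create_volume(100, 'us-east-1a')

# Attach it to a running instance; it appears there as a block
# device that can be formatted and mounted like any local disk.
conn.attach_volume(volume.id, 'i-12345678', '/dev/sdf')

# Later: take a point-in-time snapshot, which is stored on S3.
snapshot = conn.create_snapshot(volume.id)
```

Once attached, the volume behaves like any other block device: format it with mkfs, mount it, and carry on.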
The notion of using network-attached storage devices on Linux machines is not new, but something here is. In the Olden Days (pre-2006), attaching a terabyte of highly redundant storage to one of your machines usually required some wrangling. It was an expensive proposition with hefty fixed costs that were hard for any entrepreneurial spirit to swallow. If you needed 100GB today and expected to need 700GB later in the year, your only option was to buy 1TB of disk up front and grow into it gradually. Today, it’s a pay-as-you-go system that can scale up to some serious storage, all accessible as an ordinary Linux filesystem.
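One concrete consequence: growing from 100GB to 700GB doesn’t mean buying hardware, just snapshotting and provisioning a bigger volume. Here’s a rough sketch of that workflow, again with boto; the volume ID, instance ID, zone, and device name are placeholders:

```python
import boto

conn = boto.connect_ec2()

# Snapshot the existing 100GB volume (stored durably on S3).
snap = conn.create_snapshot('vol-11111111')

# Create a 700GB volume seeded from that snapshot, in the same
# availability zone, and attach it alongside the old one.
bigger = conn.create_volume(700, 'us-east-1a', snapshot=snap.id)
conn.attach_volume(bigger.id, 'i-12345678', '/dev/sdg')

# On the instance, grow the filesystem (e.g. resize2fs for ext3)
# to use the new capacity, then detach and delete the old volume.
```

You pay for the 700GB only from the moment you actually need it, which is exactly the fixed-cost problem the Olden Days couldn’t solve.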
It’s not all roses, however. EC2 instances sometimes suffer from erratic or slow disk I/O, although Amazon claims that EBS performs better than EC2’s local disks. You also cannot currently share a single EBS volume between multiple EC2 instances. In other words, you can’t spin up twenty EC2 servers and have them all mount a shared filesystem the way you can with a traditional networked filesystem in your datacenter. This is a hard problem, but Amazon has the technical chops to work on it, and I expect them to look closely at it for future EBS releases.
The question that system architects should be asking is “does my architecture allow me to take advantage of services like EBS?” My company, StyleFeeder, has tens of millions of images hosted on Amazon’s cloud computing services. We were able to migrate our data quite easily, primarily because key parts of our infrastructure were designed with a level of abstraction that required only minimally invasive software surgery. Once you start working with pay-as-you-go cloud computing services, it’s important to think creatively about how you can use them. Removing rigid structures from your systems is no small task, to be sure, but working in this direction can revolutionize what your business is capable of and how quickly you can move it forward.
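Our actual code isn’t public, but the kind of abstraction I mean can be as simple as a narrow storage interface that hides where the bytes actually live. A hypothetical Python sketch, with class and method names of my own invention:

```python
import os
import shutil

class ImageStore(object):
    """Narrow interface the rest of the application codes against."""
    def put(self, key, path):
        raise NotImplementedError
    def get(self, key, path):
        raise NotImplementedError

class LocalDiskStore(ImageStore):
    """Original backend: images on a local or EBS-mounted filesystem."""
    def __init__(self, root):
        self.root = root
    def put(self, key, path):
        shutil.copy(path, os.path.join(self.root, key))
    def get(self, key, path):
        shutil.copy(os.path.join(self.root, key), path)

class S3Store(ImageStore):
    """Cloud backend: same interface, but the bytes live on S3 (via boto)."""
    def __init__(self, bucket_name):
        import boto
        self.bucket = boto.connect_s3().get_bucket(bucket_name)
    def put(self, key, path):
        self.bucket.new_key(key).set_contents_from_filename(path)
    def get(self, key, path):
        self.bucket.get_key(key).get_contents_to_filename(path)
```

With a seam like that in place, moving tens of millions of images into the cloud becomes a configuration change and a data copy, not surgery across the whole codebase.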