If you ate on your best china every night, flew first class even on puddle jumpers, and habitually drove your Mercedes rather than your minivan to the grocery store, it would be a lot like what most big companies do with their data, according to Tom Cook.
More and more of the information that e-commerce companies and other data-intensive businesses collect sits on expensive “primary storage” devices from companies like EMC, Hitachi, and Hewlett-Packard. Those machines make the data immediately accessible to the company’s Web-based applications or enterprise management software. But on average, only about 25 percent of the data in primary storage is actually needed for day-to-day transactions, says Cook, CEO of Cambridge, MA-based Permabit Technology. “If you moved the other 75 percent to a lower-cost tier, you’d get much better efficiency and better cost savings,” he says.
Permabit, you may not be surprised to hear, offers just such a technology: what it calls “enterprise archive storage.” Enterprise archiving isn’t the same as the daily data backups that most companies generate. Those systems, which are often tape-based, are still needed to guarantee that companies can recover from disasters. The difference is that most companies never plan on using the data that goes into their backup systems, whereas Permabit’s systems are built to store the final copies of infrequently used files—just at lower cost than primary storage.
Most companies pay $30 to $50 per gigabyte for primary storage, according to Cook, while Permabit’s systems list for $3.50 per gigabyte. If customers use compression and de-duplication (the weeding out of redundant data) to squeeze even more information onto Permabit’s hard drive arrays, they can get that cost below $1 per gigabyte, he says.
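For a rough sense of that arithmetic, here is a quick sketch; the fourfold data-reduction ratio is a hypothetical illustration, not a published Permabit figure.

```python
# Quick check on the per-gigabyte arithmetic. The fourfold data-reduction
# ratio is a hypothetical illustration, not a published Permabit figure.
primary_cost_per_gb = 40.0         # midpoint of the $30-$50 range Cook cites
permabit_list_per_gb = 3.50
assumed_reduction_ratio = 4        # combined compression + de-duplication (assumed)

effective_cost = permabit_list_per_gb / assumed_reduction_ratio
print(f"effective cost per logical GB: ${effective_cost:.2f}")   # $0.88, below $1
print(f"savings vs. primary storage: about {primary_cost_per_gb / effective_cost:.0f}x")
```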
There’s a technical secret to how Permabit can store all this data cheaply and reliably, in a way that frees customers from having to “migrate” from one generation of storage technology to the next every few years. And there’s a business secret to how the company—which was founded in 2000 but has only begun to see serious market demand for its technology in the last couple of years, according to Cook—has stayed alive so long without an “exit” event for its investors.
The technical secret first. If you’ve ever wandered into a data center, you’ve probably heard of RAID—an acronym for “redundant array of inexpensive [or independent] disks.” This became the dominant technology in the 1990s for splitting up data across lots of PC-class hard drives (as opposed to the huge, expensive drives on 1980s mainframes). RAID is great for storing terabytes of data cheaply, and it’s somewhat fault-tolerant: if one drive fails, it’s usually okay, because each piece of data is either mirrored on, or can be rebuilt from, the other drives in the array.
But RAID has a weakness. If one drive fails and a new one is installed in its place, the data that was on the failed drive has to be reconstructed by reading everything relevant off the remaining drives in the array. If an error occurs during that process—if, say, a storage block becomes corrupted and unreadable—there’s a small but real chance that the original data will be lost forever. And if a second drive fails before the reconstruction is complete—well, let’s just say you’re hosed. (In the case of a 16-drive RAID 6 array with two failed drives, Permabit calculates that there’s a whopping 50 percent chance that reconstruction will fail.)
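Here is a back-of-the-envelope sketch of why rebuilds are so risky; the drive capacity and error rate are generic illustrative figures, not the inputs behind Permabit’s 50 percent calculation.

```python
import math

# Odds that a rebuild hits at least one unrecoverable read error (URE).
# Illustrative assumptions only, not Permabit's model:
#   - 14 surviving 1 TB drives must be read end to end to rebuild the array
#   - the drives are rated at one URE per 1e14 bits read
surviving_drives = 14
bits_per_drive = 1e12 * 8          # 1 TB per drive, in bits
ure_per_bit = 1e-14

expected_ures = surviving_drives * bits_per_drive * ure_per_bit   # about 1.1
p_rebuild_fails = 1 - math.exp(-expected_ures)                    # Poisson approximation
print(f"chance of at least one URE during the rebuild: {p_rebuild_fails:.0%}")
```

With those assumed numbers, the rebuild hits a bad block roughly two times out of three; the exact figure depends entirely on the drive count, capacity, and error rate you plug in.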
To guard against that problem, Permabit’s founder and chief technology officer, Jered Floyd, led the development of an alternative storage approach called RAIN-EC. That stands for “redundant array of independent nodes—erasure coding.” The erasure coding is the key part; it describes how Permabit’s software slices up data during the de-duplication process to make it “erasure resilient.”
The geeky details: For any given chunk of data, RAIN-EC first splits the chunk into four “shards.” It then uses a special algorithm to whip up two additional “protection” shards containing bits and pieces of the first four shards, in such a way that reading back any four of the six shards is enough to reconstruct the original chunk. Each of the six shards is then written to a different storage node in the array. (A node can consist of a single hard drive, or a cluster of them.)
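To make the idea concrete, here is a toy sketch of 4-of-6 erasure coding in Python. It uses exact rational arithmetic so the linear algebra is easy to follow; Permabit hasn’t published its exact scheme, and production systems typically do the same thing with Reed-Solomon-style codes over a finite field such as GF(2^8), so that shards stay compact byte strings.

```python
from fractions import Fraction
from itertools import combinations

K, N = 4, 6   # 4 data shards plus 2 "protection" shards = 6 shards per chunk

def vandermonde(rows, cols):
    # Row i is [1, i, i^2, ..., i^(cols-1)]; any `cols` of these rows form an
    # invertible matrix, which is what makes any-4-of-6 recovery possible.
    return [[Fraction(i) ** j for j in range(cols)] for i in range(rows)]

def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def invert(m):
    # Gauss-Jordan inversion over the rationals.
    n = len(m)
    aug = [row[:] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [x / aug[col][col] for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Encode: one data chunk, split into K values, becomes N shards.
data = [Fraction(x) for x in (42, 17, 99, 3)]
G = vandermonde(N, K)
shards = mat_vec(G, data)          # shard i is written to storage node i

# Decode: any K of the N shards are enough to rebuild the original chunk.
for surviving in combinations(range(N), K):
    G_sub = [G[i] for i in surviving]
    assert mat_vec(invert(G_sub), [shards[i] for i in surviving]) == data
print("every 4-of-6 subset of shards reconstructs the original chunk")
```

The upshot: any two of the six shards can be “erased” by failed drives or unreadable blocks, and the original chunk is still fully recoverable from whatever four remain.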
In this way, very large files get spread across nearly the entire array. If any single node in the array fails, each chunk that had a shard there still has five shards left on other nodes, one more than the four needed to rebuild it.