Oh, for the days when we all thought the Library of Congress was big.
Said Library is estimated to contain about 20 terabytes of information. (A terabyte is 1,024 gigabytes.) And these days, when just about everything is recorded digitally, it seems like you can’t go through a single day without accidentally creating or replicating that much information in the form of work documents, e-mails, digital photos and video, financial records, search engine records, voice mails, medical images, DVR recordings, BitTorrent downloads, surveillance video, and the like.
Okay, I exaggerate. It still takes longer than a day to create 20 terabytes of data. But not as much longer as you might think. According to Framingham, MA-based International Data Corporation (IDC), the market-research subsidiary of International Data Group, yours truly is responsible for creating a “digital shadow” of about 250 gigabytes just since January 1 of this year. So locate just 80 people like me, and together we will already have created a 20-terabyte shadow in 2008. That’s according to the Digital Footprint Calculator, a free program IDC has released in concert with an update to its big 2007 white paper, “The Expanding Digital Universe.”
I downloaded the calculator a few days ago and gave it some estimates about the number of e-mails I send every week, the number of digital photos I take, the hours of TV I record, and the like. It shows a ticker on my desktop, like the old National Debt Clock in Times Square. At last count, I was responsible for creating 271,904,050,609 bytes of information. Oops—that’s 271,904,051,117. Damn! 271,904,051,348.
The truth is that to slow down the ticker, I’d have to stop writing this article. And you’d have to stop reading it. The point of IDC’s updated white paper—the 2008 edition has the more dire title “The Diverse and Exploding Digital Universe”—is that every tiny individual act in a cyber-community creates expanding ripples of data.
Consider IDC’s example of “a day in the life of an e-mail.” Say you send out a message containing 100 kilobytes of text and a 1-megabyte attachment to four people. Now there are 10 copies (the original plus the copy on your e-mail server plus the four copies on your recipients’ computers and the four copies on their e-mail servers), totaling 11 megabytes. Then there are the backups of all of those copies, and all the communications overhead (such as e-mail packet headers) generated as e-mails pass through the network. At the end of the day, IDC estimates, the original 1.1-megabyte e-mail has racked up a 51.5-megabyte shadow.
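For anyone who wants to check that arithmetic, here is a minimal sketch in Python. It assumes decimal units (1 megabyte = 1,000 kilobytes), which is how the rounded figures above add up; the variable names are mine, not IDC's, and the 51.5-megabyte total is simply IDC's estimate taken as given.

```python
# Back-of-the-envelope check of the "day in the life of an e-mail" figures.
# Assumption: decimal units (1 MB = 1,000 KB), matching the article's rounding.
TEXT_KB = 100          # message text
ATTACHMENT_KB = 1000   # 1-megabyte attachment
RECIPIENTS = 4
SHADOW_KB = 51_500     # IDC's end-of-day estimate (51.5 MB), taken as given

message_kb = TEXT_KB + ATTACHMENT_KB   # one copy of the e-mail
# Sender's computer + sender's server + one copy per recipient computer
# + one copy per recipient server.
copies = 2 + 2 * RECIPIENTS
copies_kb = copies * message_kb
# Whatever is left over must come from backups and network overhead.
overhead_kb = SHADOW_KB - copies_kb

print(f"copies: {copies}")
print(f"size of all copies: {copies_kb / 1000} MB")
print(f"backups and overhead: {overhead_kb / 1000} MB")
```

By this reckoning, backups and network overhead account for most of the shadow, not the ten copies themselves.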
But only about half of our digital shadows arise from our individual actions. The other half is what IDC calls “ambient” content: “digital images of you on a surveillance camera and records in banking, brokerage, airline, telephone, and medical databases…information about Web searches and general backup data…copies of hospital scans.” It’s true, there are some scary things about having a digital shadow; the millions of credit and debit card numbers stolen from local retail giant TJX by hackers are a disturbing reminder. But most of the data that institutions have on us really is there so that they can serve us better. I think it’s kinda nice that because I purchased a book called You Are a Dog: Life Through the Eyes of Man’s Best Friend, Amazon believes I might also like Planet Dog: A Doglopedia.
Of course, all that data about us has to be stored somewhere. And in 2007, for the first time, the total amount of information generated worldwide, some 281 exabytes (an exabyte is 1,024 petabytes, and a petabyte is 1,024 terabytes), exceeded the capacity of all the hard drives, tapes, CDs, DVDs, and volatile and non-volatile memory created to hold it, according to IDC. Fortunately, some of the data, like voice mails, doesn’t really need to be stored forever. But the IDC report creates the impression that the race is on between information creation and information storage, and storage is going to lose, unless companies pick up the pace. (The report’s exact words: the “mismatch between creation and storage, plus increasing regulatory requirements for information retention will put pressure on those responsible for developing strategies for storing, retaining, and purging information on a regular basis.”) So maybe it’s not a total coincidence that the sponsor of both the 2007 and 2008 white papers is EMC (NYSE: EMC), the Hopkinton, MA-based maker of corporate data storage networks.
By the way, I never thought I’d need to know what comes after an exabyte, but by 2010 or so, according to IDC, the world will be generating more than 1,000 exabytes of information annually, and IDC says the word for 1,024 exabytes is zettabyte. Which is a terrible word for such a grand amount of data; it sounds like a variety of pasta, but it’s actually 1,180,591,620,717,411,303,424 bytes. Multiply that by just 510 and you’ve got, to a close approximation, Avogadro’s number: the number of atoms in 12 grams of carbon-12. When I was taking high school chemistry, I thought Avogadro’s number was unimaginably large. But it turns out we humans have bigger imaginations than I thought.
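If you don't trust my multiplication, the sketch below redoes it in Python. It assumes the binary definitions used in this article (each unit is 1,024 of the one before it) and the standard value of Avogadro's number, 6.02214076 × 10^23; the constant names are my own.

```python
# Unit ladder used in this article: each step up is a factor of 1,024.
GIGABYTE = 2**30
TERABYTE = 1024 * GIGABYTE
PETABYTE = 1024 * TERABYTE
EXABYTE = 1024 * PETABYTE
ZETTABYTE = 1024 * EXABYTE   # equals 2**70 bytes

# Standard value of Avogadro's number, written as an exact integer.
AVOGADRO = 602_214_076 * 10**15

# How many zettabytes' worth of bytes it takes to reach Avogadro's number.
ratio = AVOGADRO / ZETTABYTE

print(f"one zettabyte = {ZETTABYTE:,} bytes")
print(f"Avogadro's number / zettabyte = {ratio:.2f}")
```

The ratio comes out just a shade over 510, which is why the comparison is only approximate.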