A Call For Good Data Stewardship Before The Digital Deluge

someone to pay the bills – for new media, data transfer, oversight, etc. – and someone to take responsibility for the access, evolution, and preservation of critical data.

Herein lies the problem. We often don’t include the “data bill” as part of our long-term IT cost estimates—especially when that means the costs of maintaining electronic records for decades. It’s easy to think that data is free and resides somewhere on the Internet where others are keeping an eye on it. We depend on longitudinal climate data, electronic medical records, and electronic financial data, but do we know who’s preserving that data for future access?

The economics of the access and preservation of digital materials is emerging as a fundamental challenge of the Information Age. Sustainable support for data preservation must survive the ebb and flow of competing institutional priorities, and cannot withstand gaps in funding. Current economic models for data preservation include advertising, per-use fees, institutional subsidies, consortium funding, subscription, and other approaches. Regulations such as Sarbanes-Oxley are beginning to provide “sticks” (penalties for non-compliance) to go with the traditional “carrots” (recognition of the importance of future investment, increased opportunities) to accelerate the development of viable sustainable economic models for digital access and preservation.

As the co-chair of a task force that recently released an interim report on this problem, I am convinced that we must think of our cyber-infrastructure in the same way we think of our bridges, utilities, and other physical infrastructure. We must integrate digital data (and information technology in general) into our economic models. Only then will our cyber-infrastructure provide the necessary support for our digital world. It cannot be allowed to crumble.

I’ve developed more specific recommendations in a guide to data preservation that I prepared for Communications of the ACM (Association for Computing Machinery), which is available here. Our interim report on the challenges of sustainable digital preservation is available here.

Author: Fran Berman

Dr. Fran Berman, a pioneer in grid computing, is the director of the San Diego Supercomputer Center (SDSC). She is the first holder of the Jacobs School of Engineering High Performance Computing Endowed Chair and a professor in UC San Diego's Computer Science and Engineering Department, and her vision has led SDSC into the Internet Age. She is one of the two founding Principal Investigators of the National Science Foundation's TeraGrid project, and also directed the National Partnership for Advanced Computational Infrastructure (NPACI), a consortium of 41 research groups. Dr. Berman has served on a broad spectrum of national and international leadership groups and committees and is currently co-chair of the international Blue Ribbon Task Force on Sustainable Digital Preservation and Access. Dr. Berman has been recognized by Business Week as one of the top women in technology and by IEEE Spectrum as one of the top technologists.