Microsoft Rolls Out Tools to Help Scientists (and Eventually Companies) Manage Data Deluge

an oceanographer’s workbench. So we said, ‘Let’s get involved in this. Let’s do a proof of concept.’”

Barga got a couple of interns from the University of Washington to work on the project, and by the summer of 2007, they had a working demo. The idea of the software is to help people manage the workflow between data collection and analysis—coordinating a sequence of steps to be taken with the data. It’s not a revolutionary algorithm, but it’s a way to break the process into manageable chunks that can be reused and recombined, so you don’t have to start from scratch or hire a programmer every time you want to manipulate your data in a new way. The tools are built on top of Microsoft’s Windows Workflow Foundation (making use of Microsoft SQL Server and Windows HPC Server cluster technologies), and they include advanced gaming graphics tools to display what’s going on in your data.

The reception in academia has been very positive, says Barga, a 13-year Microsoft veteran whose group now totals seven people. All told, Microsoft has put well over $1 million into Project Trident, counting the researchers’ time. The next step is to get more scientists to use the tools, and to share their work. Currently, ocean researchers from the UW, Monterey Bay Aquarium Research Institute, and other institutions are using Trident tools as part of the Neptune oceanographic project for networking the seafloor, funded by the National Science Foundation. And astronomers at Johns Hopkins University are using the software as part of their Pan-STARRS project to detect objects in the solar system that could pose a threat to Earth.

There are plenty of other efforts to build scientific data management tools, of course. Some examples are Taverna, a UK-based project specialized for bioinformatics, and California-based workflow software projects Kepler and Pegasus, developed by academics. “We collaborate with them,” says Barga. “We wanted to show you don’t have to build these systems from the ground up.”

Barga says other organizations are getting interested in all this too. “We have medical research groups and financial analyst groups talking to us about it,” he says. But many challenges remain when it comes to dealing with data. For example, Barga says, some tasks are just too big for the computer on your desk to handle. You might need to manipulate a whole data center, say. That’s why Microsoft is also announcing a new programming framework, called Dryad, which is specialized for doing high-performance computing across parallel and distributed systems. It could come in handy for large-scale studies that involve searching, filtering, and aggregating data on topics like social networks or broad economic trends.

As researchers and businesses think increasingly globally when it comes to data, you can bet there will be a big role for companies like Microsoft in providing the key tools of the trade. “Our ability to collect data will outpace our ability to analyze it,” Barga says. “Computer science is going to be driven and challenged to visualize and analyze [more] data. It’s a big enabler.”

Author: Gregory T. Huang

Greg is a veteran journalist who has covered a wide range of science, technology, and business. As former editor in chief, he oversaw daily news, features, and events across Xconomy's national network. Before joining Xconomy, he was a features editor at New Scientist magazine, where he edited and wrote articles on physics, technology, and neuroscience. Previously he was senior writer at Technology Review, where he reported on emerging technologies, R&D, and advances in computing, robotics, and applied physics. His writing has also appeared in Wired, Nature, and The Atlantic Monthly’s website. He was named a New York Times professional fellow in 2003. Greg is the co-author of Guanxi (Simon & Schuster, 2006), about Microsoft in China and the global competition for talent and technology. Before becoming a journalist, he did research at MIT’s Artificial Intelligence Lab. He has published 20 papers in scientific journals and conferences and spoken on innovation at Adobe, Amazon, eBay, Google, HP, Microsoft, Yahoo, and other organizations. He has a Master’s and Ph.D. in electrical engineering and computer science from MIT, and a B.S. in electrical engineering from the University of Illinois, Urbana-Champaign.