“Ultimately, this will tie in to the science of urban and metro areas, looking at optimizing entire systems where human beings play a key role, including the power grid, cybersecurity, traffic and transportation systems, communication, development, and the environment,” Jandhyala says.
The NIAC provides a conduit to the “real-world problems” that PNNL is tackling in these and other areas, problems that are critical to training data scientists, Jandhyala says. “Here’s where the connection to PNNL becomes very important, because there’s datasets which nobody else can have access to,” he says.
The lab is participating in some of the most data-intensive projects in the world right now. For example, it is preparing to host data from the Belle II high-energy particle physics experiments, which are scheduled to begin in Japan in 2015.
PNNL’s relationship with the Belle project, hosted at KEK in Japan, deepened after the 2011 Fukushima earthquake and tsunami, when PNNL stepped in to help the international community of scientists carry on with data analysis of the Belle experiment, says Dick Russell, manager of high-performance computing in the energy cluster of PNNL’s Computational Science and Mathematics Division.
Some 240 petabytes of raw data are expected to be stored at PNNL, more than the projected output of the Large Hadron Collider at CERN. “Right now we think it’s one of the biggest envisaged datasets,” Russell says.
Work is going on now at PNNL to help design the experimental apparatus and plan for transferring a full backup copy of the data from Japan via undersea cables.
Meanwhile, both UW and PNNL have been national leaders in advancing data-driven discovery. Coordinated work in this area will be a key focus for the NIAC.
“Most fields of discovery are transitioning from data-poor to data-rich,” says Lazowska, who leads the eScience Institute, which is the center of this work at UW. “The world is full of tiny but powerful sensors—in telescopes, in gene sequencers, in roads and bridges and buildings, in our environment, in the form of Twitter feeds and Web requests. The challenge today is converting all of this data into knowledge, and converting this knowledge into action.”
The goal of the eScience Institute is to make UW a leader in inventing new approaches to data-driven discovery, and also in making these new approaches usable by researchers in a broad range of fields.
Dunning, the new NIAC co-director, also has deep expertise in this area, having helped create the Department of Energy’s Scientific Discovery through Advanced Computing program.
That’s one of several assets he brings to the NIAC, and in turn to the Northwest’s growing big-data cluster.