Monday night, in New York’s Flatiron District, Microsoft Research opened its doors to give the world a glimpse of what its data scientists are up to.
Jennifer Chayes, co-founder of the New York lab and managing director at Microsoft Research New England in Cambridge, MA, cut the ribbon last evening for the latest home of Microsoft Research New York City.
The team of data scientists has been in town since 2012, operating at two prior locations before settling in last August at the current digs on Sixth Avenue. “Hopefully we will not move every eight months going forward,” Chayes joked.
The lab, she said, is part of Microsoft’s ongoing efforts in data-intensive computer science, such as machine learning and information retrieval. That research can include collaborations in social sciences, economics, sociology, and other areas. While building up the lab, Microsoft Research has developed connections, Chayes said, with academic institutions including Cornell Tech, NYU’s Center for Urban Science and Progress, and the Institute for Data Science and Engineering at Columbia University.
This summer Microsoft Research plans to run a data science summer school at the new location, she said. The objective is to encourage more people, especially women and underrepresented minorities, to seek data science education.
In addition to finding a more permanent home, the New York team has also expanded. “In the past two years, we’ve more than doubled in size from less than 15 researchers to about 30 researchers,” Chayes said. One of Microsoft’s principal researchers, danah boyd (who prefers a lowercase spelling of her name), has also founded the Data & Society Research Institute in New York to address technology policy questions.
After the ribbon cutting, a series of Microsoft researchers, including bowtie-loving David Rothschild, discussed their work and the changing ways data is being used. Rothschild is known for using data to predict such things as the outcomes of elections, sporting events, and even winners of the Academy Awards.
He and researcher Justin Rao talked about what they called “medium” data, a convergence of small and big data that also uses new mediums to gather information. Rao said big data often comes from such sources as the Internet of Things, search engines, social media, and GPS information. Meanwhile, the social sciences have relied on small data, he said, such as clinical trials, experimental psychology, and Bureau of Labor Statistics-type surveys.
A way to leverage medium data, Rao said, would be through second-screen devices as people watch television. Those devices, he said, allow for active data collection across a range of activities. “In some sense, second screens are everywhere,” Rao said.
Big data analytics and adaptive questions, Rothschild said, can help turn old-fashioned polling methods into something more meaningful. “These are things you can only do when you have people in front of a computer screen versus being called on a phone or knocking on a door,” he said.
Though some people might try to draw conclusions from raw social media and online data, Rothschild said it can be unclear what is being discussed. For example, he said, data might show that people are talking online about Justin Bieber or the potential for war in Ukraine, but the context of the discussion may be lost. Breaking down the questions people are responding to, he said, can give a better understanding of the relevance of the information. “Online data is noisy, it’s broad, and not necessarily as deep as you would think,” Rothschild said.
After the presentations, Chayes moderated a panel that comprised Kathleen McKeown, director of the Institute for Data Science and Engineering at Columbia University; Dan Huttenlocher, dean and vice provost of Cornell Tech; and Microsoft Research’s boyd.
They discussed how big data has evolved and what it may mean for the future. Recalling the chatter about the big data phenomenon a few years ago, boyd said it raised thoughts of potentially amazing or terrible things that could be done with it. “It was this dichotomous tension we were seeing come down the line,” she said.
The rise of big data, she said, followed a long line of trends in statistical work and computer science. The spread of social media also showed that the public was starting to understand what sociologists already knew about mapping relationships as networks.
The startup sector also evolved in the last decade, she said, from a world where relational databases were used to organize most information to a new understanding of how information could be architected. As perceptions about big data changed, more groups began to see how they might use it. “It’s a cultural logic shift,” boyd said. “I’m spending a lot of time with groups who’d never imagined they’d have any relationship to this.”
And more change is ahead, said Huttenlocher, as new ways to use data are discovered. “We’re in a time in the development of the digital age that’s analogous to the Industrial Age after the first wave of industrialization,” he said.
McKeown said there is a need for new approaches to be developed in machine learning and applied statistics to better accommodate different disciplines of science. “I’m seeing people come together in areas like environmental science and big data,” she said. “They don’t yet know how to apply machine learning methods. There’s a lot of promise there that hasn’t happened yet.”