For years, cyber attackers’ primary aim has been to pilfer sensitive information from businesses and individuals, whether to sell it in the dark corners of the Internet, hold it for ransom, or use it themselves for material gain.
More recently, efforts to connect all manner of machines to the Internet have led to growing worries that hackers will gain control of critical infrastructure, such as the electrical grid or traffic lights, and wreak havoc.
One of the next frontiers in cyber threats, some security investors and technologists say, could be to manipulate data to use it against us in new ways.
“The obvious play today is to disrupt systems and steal data,” says Joseph Witt, vice president of engineering at data management software firm Hortonworks (NASDAQ: HDP) and a former National Security Agency software engineer. “But a much more nefarious and troubling problem is slow, persistent manipulation of systems and enterprises.”
Call this emerging threat the “weaponization of data,” says Bob Ackerman, founder and managing director of AllegisCyber, a bicoastal venture capital firm that backs startups in cybersecurity, data science, and connected devices.
“We see the beginnings of this in the Russian interference in elections by manipulating data,” Ackerman says.
By that, he means adversaries creating fake online identities and distributing disinformation “designed to shape opinion” or to “sow confusion and undermine trust” in American institutions and leaders.
The spread of disinformation is “something that we’ve practiced in warfare for eons—propaganda and electronic warfare,” Ackerman adds. “How do you make the planes [being tracked on the screen] appear some place other than where they are? That’s all disinformation.”
That concept could spread to commercial arenas. One threat Ackerman suggests would be to inject false data into automated financial trading systems, to drive certain stocks up or down. Or, in a worst-case scenario, feeding bad data to a high-frequency trading system could set off a chain reaction that could disrupt or melt down entire financial markets, Ackerman says. Still, that might be difficult for hackers to pull off because they’d have to get past financial firms’ security measures to insert the data. Plus, the stock markets have controls in place to halt trading to avoid such a disaster—although there is some debate about how well such “circuit breakers” work.
Nevertheless, Ackerman says any digital system that involves “data-driven automation” could be vulnerable to such an attack.
“Machine-learning systems are only as good as the data that they are trained with,” Ackerman says, meaning that if trained with disinformation, the systems could go haywire. “In a digital economy, everything we’re processing is ones and zeroes. How do we trust it?”
Witt thinks it’s possible hackers or a business’s competitors could try to infiltrate a company’s IT infrastructure to insert false data that could trip up their operations. In the case of an automaker, for example, perhaps the hacker would create erroneous data about supply chain activity that might cause the company to spend more on inventory.
“It’s one thing just to steal the data of your competitor so you know what they’re going to do,” Witt says. “It’s another to alter their systems in subtle ways that actually change what they do.” That’s a more sophisticated and “potentially more disruptive” kind of attack, Witt adds, although he says he hasn’t seen examples of it happening “in a commercial context yet.”
Greg Dracon, a partner with Boston-based venture capital firm .406 Ventures, which invests in cybersecurity startups (among others), says he has heard of targeted cyber attacks based on manipulating data. One involved changing a company’s financial documents to try to influence negotiations of its acquisition, Dracon says.
But outside of the sort of election interference that Ackerman alluded to, Dracon says he hasn’t heard about widespread cyber attacks involving the spread of disinformation or weaponizing data. One reason for that may be economics.
“It’s harder to monetize that,” Dracon says. “It’s much easier to steal data [and] sell it on the dark Web.”
Ackerman admits that his concerns about weaponizing data are still mostly just the “paranoid reflections of a cybersecurity guy.”
“It’s like, where are those bastards going to go next?” he says of cyber criminals. “This is where they’re going to go next.”
If he’s correct that such attacks will become more common in the next five years, he says, companies and organizations must get better at tracking data and confirming its authenticity. Ackerman thinks a potential technology tool could be a sort of digital wrapper that keeps data secure and helps verify that no one has tampered with it as it travels between different systems. (That idea sounds similar to encryption techniques, but Ackerman says it’s different because hackers could theoretically manipulate data before it gets encrypted, so that the encrypted package delivers bad data to the recipient.)
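To make the "digital wrapper" idea concrete, here is a minimal Python sketch of a tamper-evident envelope built on a keyed hash (HMAC). It is an illustration under stated assumptions, not anything Ackerman or a specific vendor describes: the `wrap` and `verify` helpers, the shared `SECRET_KEY`, and the sample inventory record are all hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical shared key for illustration; real key management is out of scope.
SECRET_KEY = b"demo-shared-key"


def wrap(payload: dict, key: bytes = SECRET_KEY) -> dict:
    """Serialize the payload and attach an HMAC-SHA256 tag computed over it."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    tag = hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


def verify(envelope: dict, key: bytes = SECRET_KEY) -> bool:
    """Recompute the tag and compare it to the stored one in constant time."""
    expected = hmac.new(
        key, envelope["body"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, envelope["tag"])


# A made-up inventory record passing between two systems.
envelope = wrap({"part": "A-100", "quantity": 40})
intact = verify(envelope)

# Someone alters the quantity in transit without knowing the key.
tampered = dict(envelope, body=envelope["body"].replace("40", "400"))
tampered_ok = verify(tampered)

print(intact, tampered_ok)
```

A wrapper like this only shows that the data has not changed since it was wrapped. As Ackerman notes of encryption, it cannot flag data that was already false when it entered the wrapper.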
“I think data provenance is going to turn out to be one of the significant areas of data science innovation going forward,” Ackerman says.
Much of Witt’s software development work during the past decade-plus has been on tools that can help establish data provenance, among other capabilities, he says. He describes data provenance as a “digital chain of custody for data,” beginning at the point where a piece of digital information is created, and following it as it travels through any IT pathway or database.
While at the NSA, Witt was the lead developer of software called Niagarafiles (NiFi), which was aimed at automating the transfer of data between computer networks, even if the data formats and processes weren’t the same. The NSA released an open-source version of the software, called Apache NiFi, in 2014. The following year, Witt left the agency to help start Onyara, a company that developed software tools, powered by Apache NiFi, for managing the flow of data. Ackerman says he was one of Onyara’s investors. Hortonworks bought the startup that same year (2015). Witt’s role at the Santa Clara, CA-based company involves working on its DataFlow product, which uses Apache NiFi.
The software’s capabilities include automatically generating “rich event-level provenance data,” Witt says. Basically, that means the software tracks all the digital systems that touch the data, registers the timing of each data transfer, and validates the authenticity of such logs, he says. The software can help, say, track information about a car’s engine performance as the data gets beamed from an Internet-connected device on board the vehicle, to a cloud database where the manufacturer and its suppliers can access it (after personally identifying information has been scrubbed), Witt says.
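One common way to build that kind of verifiable event log is a hash chain, in which each event record includes a hash of the record before it. The Python sketch below illustrates the general technique only. It is not how Apache NiFi or Hortonworks DataFlow implements provenance, and the `ProvenanceLog` class, the system names, and the sample engine reading are assumptions made up for the example.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List

GENESIS = "0" * 64  # placeholder "previous hash" for the first event


def _digest(prev_hash: str, system: str, action: str,
            timestamp: str, data_hash: str) -> str:
    """Hash one event's fields together with the previous event's hash."""
    record = json.dumps([prev_hash, system, action, timestamp, data_hash])
    return hashlib.sha256(record.encode("utf-8")).hexdigest()


@dataclass
class ProvenanceLog:
    """Hypothetical append-only log of which systems touched a piece of data."""
    events: List[dict] = field(default_factory=list)

    def record(self, system: str, action: str,
               timestamp: str, data: bytes) -> None:
        prev_hash = self.events[-1]["hash"] if self.events else GENESIS
        data_hash = hashlib.sha256(data).hexdigest()
        self.events.append({
            "prev_hash": prev_hash,
            "system": system,
            "action": action,
            "timestamp": timestamp,
            "data_hash": data_hash,
            "hash": _digest(prev_hash, system, action, timestamp, data_hash),
        })

    def verify(self) -> bool:
        """Walk the chain; any edited or reordered event breaks a link."""
        prev_hash = GENESIS
        for e in self.events:
            if e["prev_hash"] != prev_hash:
                return False
            if e["hash"] != _digest(e["prev_hash"], e["system"], e["action"],
                                    e["timestamp"], e["data_hash"]):
                return False
            prev_hash = e["hash"]
        return True


# Made-up engine reading moving from a vehicle to a cloud database.
reading = b'{"rpm": 2400, "coolant_c": 88}'
log = ProvenanceLog()
log.record("vehicle-device", "CREATE", "2017-11-01T12:00:00Z", reading)
log.record("ingest-gateway", "RECEIVE", "2017-11-01T12:00:02Z", reading)
log.record("cloud-db", "STORE", "2017-11-01T12:00:03Z", reading)
chain_ok = log.verify()

# Backdating one event after the fact invalidates the chain.
log.events[1]["timestamp"] = "2017-11-01T11:00:00Z"
chain_ok_after_edit = log.verify()

print(chain_ok, chain_ok_after_edit)
```

In this sketch, anyone who can rewrite the whole log can also recompute every hash, so a real deployment would need to anchor the chain somewhere the attacker cannot reach, for example by signing or externally storing the latest hash.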
Despite all the money spent on data analytics tools and cloud databases, many businesses—especially large global enterprises—still struggle to create a verifiable record of the origin and movements of every piece of data flowing through their IT systems, Witt says.
“At scale, it’s a really hard problem,” he says.
Of course, Hortonworks isn’t the only company developing software that can