The second episode of Xconomy’s new podcast, Xconomy Voices, features Recorded Future co-founder and CEO Christopher Ahlberg. His Somerville, MA-based cybersecurity company monitors both the public, visible Web and the Internet’s darker corners for “threat intelligence” that can help its clients prepare for, and fend off, cyber attacks.
Ahlberg’s background in data analytics and his perspective on cyber threats make him a compelling figure in the national cybersecurity discussion. Here’s the full transcript of our interview, which took place a few months ago. (This Q&A includes a lot of material not found in the podcast.)
Xconomy: Can I get you to start by telling us a little bit about where you come from and what you’re up to now—your background and how you came to Recorded Future. And then we can talk about the company itself.
Christopher Ahlberg: I’m the co-founder and CEO of Recorded Future. I have a background in analytics and data visualization. Originally I grew up in Sweden. I came to the U.S. 20 years ago. I started a company called Spotfire here in the Boston area that we built and sold eventually to a company out on the West Coast, and then started Recorded Future together with two guys. I’m a co-founder. We started that back in early 2010 and have been sort of off and running since then.
Xconomy: What is the central idea at Recorded Future? If you had to say what sets you apart from other companies in the cybersecurity or threat detection sphere, what’s your key idea?
CA: Cybersecurity has become an enormous marketplace in itself. It’s probably a $100 billion market, or soon to be, if you add it all up. But you’re talking about a lot of stuff that adds up to that $100 billion. We play in the market that people would call threat intelligence, probably a billion and a half of the market.
We are basically centered on the idea of being able to pick up the threats before they hit your doorstep. What happens outside your company. We try to detect bad actors before they come to you, before they attack you. We try to understand their intentions, their capabilities, what they’re up to, what they’re doing tomorrow, what they’re doing next week. But we also try to find out who is being targeted.
I would say that the best indication of who’s going to begin breaking into your house is that your neighbor has had a break-in and you have the same lock, or the guy with the same flower delivery service or the same cleaning service, and so on and so forth. So we help people understand what happens outside their firewall, basically, and do that in a way that is very actionable and very impactful to their company.
Xconomy: And what’s the basic underlying technology? What would you say are the tools you’re applying to that problem?
CA: We start off using the Web as one large sensor. Historically, when you had intelligence agents running around on the ground, you might listen in on phone calls or have satellites in the sky. You actually might read newspapers, open source intelligence. There are lots of ways to do intel. We started off with this idea that the world’s information was quickly flowing to the Internet and the Web. And so we started off harvesting the Web at a large scale and doing that not just in English but in basically all the languages that bad news happens in: Chinese, Russian, Farsi, Arabic, Spanish, and French.
So it’s sort of now built up to basically covering some 30, 40 different languages in total. But then also getting into the underground of the Web—people like to call it the Dark Web—as well as into the technical areas of the Web and really being able to pull together from open sources, Dark Web sources, as well as technical sources all into one place where we connect the dots to allow us to find the most imminent threats and help them be very actionable for a company or for an organization.
Xconomy: What is your equivalent of the intelligence agents running around? I’m assuming you’re using software, right, so we’re talking about machine learning?
CA: Machine learning plays a role in all of this, absolutely. But our equivalent of the people running around…that’s a good question, actually, because it turns out that our competitors in this space, they basically all are dependent on humans collecting information, which can be all nice and fine. But the problem is it’s inherently very unscalable. We run hundreds and up towards thousands of servers in the cloud that collect information from all those types of sources I talked about before. Open sources, Dark Web sources, as well as technical sources. And then pre-aggregate and aggregate that information into consumable information or artifacts, whether it’s data visualizations or alerts or reports or API feeds for computers.
Xconomy: And then what are you doing on those sources? Can you say a little bit about the techniques you’re applying, the computer science approaches to analyzing that information, turning it into something actionable?
CA: So you can imagine that we pick up, in a Chinese newspaper, where China states their capabilities for their new offensive cyber capabilities that they’re building. Or we’re in a Russian forum and a series of Dark Web actors discuss how to commit fraud against retail businesses in the U.S. Or an Iranian forum where a bunch of hackers talk about how to [gain] entrance into a facility of some sort.
From there, what we do is natural language processing. So, being able to look at language where it might say, Actor A says to Actor B, “I’m developing this piece of malware that has this capability,” and so on and so forth. We’re able to take that natural language and decompose that into data points, which can range everything from threat actors to their capabilities, their intents, as well as the technical data that goes with it. In other cases we pick up more technical information and organize that. Part of the trick here is being able to marry the data coming from narrative text with the data that comes from more technical sources, and put that together in a place where it can be consumed by either human or machine.
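As an illustration only (this is not Recorded Future's code), the decomposition described above can be sketched in a few lines of Python. The sketch assumes a toy sentence shape, Actor A says to Actor B, "...", and a hypothetical list of technical records keyed by keyword. The names ThreatDataPoint, decompose, and attach_indicators are invented for this example, and real natural language processing involves far more than a single pattern.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ThreatDataPoint:
    """One structured data point pulled out of narrative text (hypothetical schema)."""
    actor: str
    audience: str
    statement: str
    indicators: list = field(default_factory=list)


# Toy pattern for sentences shaped like: Actor A says to Actor B, "..."
SAYS_PATTERN = re.compile(
    r'(?P<actor>[\w ]+?) says to (?P<audience>[\w ]+?), "(?P<statement>[^"]+)"'
)


def decompose(sentence):
    """Turn one narrative sentence into a data point, or None if it does not match."""
    match = SAYS_PATTERN.search(sentence)
    if match is None:
        return None
    return ThreatDataPoint(
        actor=match.group("actor").strip(),
        audience=match.group("audience").strip(),
        statement=match.group("statement").strip(),
    )


def attach_indicators(point, technical_records):
    """Marry narrative and technical data: keep records whose keyword appears in the statement."""
    lowered = point.statement.lower()
    point.indicators = [
        record["indicator"]
        for record in technical_records
        if record["keyword"].lower() in lowered
    ]
    return point


# Hypothetical inputs; the malware name and addresses are placeholders.
sentence = 'Actor A says to Actor B, "I am developing a loader called examplebot"'
technical_records = [
    {"keyword": "examplebot", "indicator": "198.51.100.7"},
    {"keyword": "otherkit", "indicator": "203.0.113.9"},
]

point = attach_indicators(decompose(sentence), technical_records)
print(point)
```

The resulting record carries both the narrative fields (who said what to whom) and the matching technical indicator, which is the "consumed by either human or machine" form mentioned above.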
Xconomy: And then I guess you supply this intelligence to your clients in a form that lets them do something with it. So can you give an example of how this information might be made useful if you become aware of a threat that may or may not materialize? What do you do next?
CA: Assume you’re a large financial institution somewhere here on the East Coast. Lots of different things can happen, and you might find out that your neighbor bank was hacked. You want to know about that. You want to know about it immediately. Because many times the bad guys…will come after you. Not necessarily because they wake up in the morning and say, “I want to go hack Bank X,” but because they found a way in. And once they’ve found a way in, they’re going to go from Target 1 to Target 2 to Target 3 or 4.
So you’re a target of convenience and you just want to make sure that when you see Target 1 being hacked, and if you’re Target 3, you want to patch your systems based on what you learned from that hack on Target 1. So keeping a high degree of visibility into what’s going on in the threat landscape is highly important. That’s number one.
Number two, you might find out, whether it’s in a forum or similar, that the bad guys talk about which software vulnerabilities they’re going to attack. So you’re going to see somebody saying, “Look, I’m building a piece of malware that takes an exploit that takes advantage of this vulnerability,” and they might actually use the same sort of descriptor for the vulnerability that the U.S. government publishes. And then they’ll talk about that and they might actually publish an exploit kit on that and sell that to a fellow bad guy. So again, if you can gain that sort of intelligence, that will tell you what systems you need to patch on your site to get out ahead of any threat like that.
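To make the patching scenario concrete, here is a minimal Python sketch. It assumes the published descriptor is a CVE-style identifier (the interview does not name the format), and the forum posts, identifiers, and system inventory are all made-up placeholders. The function names extract_cve_ids and systems_to_patch are hypothetical and do not refer to any product API.

```python
import re

# CVE-style identifier: "CVE-", a four-digit year, then four or more digits.
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


def extract_cve_ids(text):
    """Return distinct CVE-style identifiers in text, in order of first appearance."""
    seen = []
    for match in CVE_PATTERN.findall(text):
        cve_id = match.upper()
        if cve_id not in seen:
            seen.append(cve_id)
    return seen


def systems_to_patch(posts, inventory):
    """Map each system to the sorted identifiers that appear in any of the posts."""
    mentioned = set()
    for post in posts:
        mentioned.update(extract_cve_ids(post))
    result = {}
    for system, cve_ids in inventory.items():
        hits = sorted(mentioned & set(cve_ids))
        if hits:
            result[system] = hits
    return result


# Hypothetical forum chatter and asset inventory; the identifiers are placeholders.
posts = [
    "Building malware around cve-2099-0001, exploit kit for sale soon",
    "Anyone have a working exploit for CVE-2099-0002?",
]
inventory = {
    "web-frontend": ["CVE-2099-0001", "CVE-2099-0003"],
    "mail-gateway": ["CVE-2099-0004"],
    "vpn-appliance": ["CVE-2099-0002"],
}

print(systems_to_patch(posts, inventory))
```

Only the systems whose known vulnerabilities show up in the chatter are returned, which is the prioritization the answer describes: patch what the attackers are actively discussing first.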
It could be that you find leaked credentials—usernames and passwords that have been stolen from a company. And on and on. There are many of these sorts of scenarios. In fact we probably count some 60 or 70 different core scenarios of this kind that we configure into our product, to help customers be prepared for threats and be able to take action on the scene as soon as they see them.
Xconomy: Would you say that the kind of threat intelligence you guys provide is a necessary complement to a more defensive internal posture? Lots of companies have their own Internet and Intranet security software and you’re not saying that they can do without that. You’re just saying that unless they understand the threat environment, they’re only halfway prepared.
CA: Yeah, exactly. Historically people have invested primarily in defenses of their firewall, and that’s where that other $80 billion has been going, to try to firewall their perimeters, and they’ve been building what people have thought about as higher and higher walls… thicker and thicker walls. The problem is that if you think about this in reality, people haven’t been building thicker or higher walls, they’ve really just been building mazes.
And those mazes are getting more and more complicated. And the interesting dilemma that we end up with here is that we have a high degree of personnel turnover in security departments. It’s one of the highest-turnover sort of places. We need to know all the possible entry points and be able to try to do something about those. The bad guy, he just needs to know one password. So what we have really [is] just sort of a superficial wall, and you need to supplement that with other approaches, threat intelligence being one of the few ways that you actually can supplement that.
The other analogy is, think war. If you were defending a castle or a moat or a bridge or whatever, you’re going to defend it. Whether you were in Roman times or these times, you’re not going to just sit there and wait it out. You’re going to send out people to look, before the bad guy shows up. And there are many ways to send that guy out. And we’d like to think we’re one of those ways. To fight this war without intelligence is really just stupid.
Xconomy: I’d like to hear a bit about the history of Recorded Future. My understanding is that when you guys started out in 2009 you were using these sources on the open Web to try and make predictions about the near future, based on chatter or intel from the open Web from maybe hotspots around the world. What was the original idea, and what does it take, or what did it take then to be able to make a reliable prediction about a world event?
CA: The company got started in more general intelligence. It’s always had a very strong focus on intelligence, and we still have a strong foundation in that. And even in our business there’s sort of a strong foundation in intelligence. And then over time what we’ve done is just sort of build more and more of the business