needs to pull in specific types of knowledge, ranging from linguistics [for example] to how you make servers work well. That was a tremendous exercise in pulling the right resources together and getting very disparate groups to communicate well.
We have a “who knows what” database, so if we have a question about who in the company knows about mechanical engineering [say], we can look it up. It has nothing to do with geography. It has to do with what questions you ask of people, and with having a company culture where it’s conceivable you might have “random question X.”
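[Editor’s note: a minimal Python sketch of the kind of topic-keyed lookup SW describes. The topics and names are invented for illustration; this is not Wolfram Research’s actual system.]

    # Hypothetical "who knows what" lookup: expertise is keyed by topic,
    # not by geography or org chart. All entries here are made up.
    WHO_KNOWS_WHAT = {
        "mechanical engineering": ["employee_a"],
        "linguistics": ["employee_b", "employee_c"],
    }

    def who_knows(topic: str) -> list[str]:
        """Return the people to ask about a given topic, if any."""
        return WHO_KNOWS_WHAT.get(topic.strip().lower(), [])

    print(who_knows("Mechanical Engineering"))  # ['employee_a']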
X: Let’s talk about the genesis of Wolfram Alpha, which was released in May.
SW: I’d been kind of thinking about what makes knowledge computable for a long time. Like many of these things, the idea only becomes clear after you’ve built something serious around it. Looking back, I was sort of embarrassed to find things I was doing when I was 12 years old: gathering scientific information and typing it up on a typewriter. I’d been thinking about how one makes knowledge systematic for a long time.
At the beginning of the ’80s, when I was starting to work on NKS [A New Kind of Science], I had built a [computing] language called SMP. I was wondering how far you could get formalizing knowledge, and how that relates to AI-ish things. At the time, I thought making all knowledge formal was too hard; we couldn’t do it.
After finishing NKS [in 2002], I was thinking: you can get complexity from simple rules. Can we make a large swath of human knowledge computable? I got more serious about that. At the beginning, it was really unclear whether this would be possible. There’s just too much data in the world, too many topics; you can’t understand the linguistics [of queries]; you can’t deliver the stuff fast enough.
For the linguistics, we used NKS methods. For years, people had been trying to do natural language processing, making computers understand written text. It turns out to be really hard. But what do you mean by “understand”? For us, there’s a very clear target: does this relate to something we can compute?
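[Editor’s note: a minimal Python sketch of the “target” SW describes, under the assumption that understanding a query means finding a reading that maps onto a known computation. The domains and entities are hypothetical, not Wolfram Alpha’s actual pipeline.]

    # A query counts as "understood" only if some interpretation of it
    # maps onto an entity we can actually compute with. Toy data only.
    KNOWN_ENTITIES = {
        "city": {"paris", "springfield"},
        "chemical": {"benzene", "caffeine"},
    }

    def interpretations(query: str):
        """Toy linguistic step: propose (domain, entity) readings."""
        word = query.strip().lower()
        return [(domain, word) for domain in KNOWN_ENTITIES]

    def understood(query: str):
        """Keep only the readings that map to something computable."""
        return [(d, e) for d, e in interpretations(query)
                if e in KNOWN_ENTITIES[d]]

    print(understood("Paris"))   # [('city', 'paris')]
    print(understood("xyzzy"))   # [] -- no computable reading, no answer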
X: Can you give more details on how it works? How do you interpret a query?
SW: We’ve had to build our own big edifice of linguistic processing to handle what we want. I wasn’t sure if it was possible. I thought there might be too much ambiguity. You might have to see the person—see if they were dressed in a spacesuit or in surgeon’s garb—to get enough context. As it turns out, it hasn’t been a huge problem. There’s enough sparsity in human expression. By the time someone is asking anything real, you have enough context. The whole thing is full of heuristics. Any sequence [of terms or numbers] could be anything. But if it’s the name of a town with a population of 20, and it’s 6,000 miles away from where the query is being asked, that’s unlikely [to be relevant].
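[Editor’s note: a minimal sketch of the kind of context heuristic SW describes, where ambiguous readings are ranked by plausibility. The scoring formula is invented for illustration.]

    import math

    def town_plausibility(population: int, distance_km: float) -> float:
        """Toy score: large, nearby towns outrank tiny, faraway ones."""
        return math.log10(max(population, 1)) - distance_km / 1000.0

    # A town of 20 people about 6,000 miles (~9,656 km) from the asker
    # scores far below a big city 50 km away:
    print(town_plausibility(20, 9656))         # about -8.4
    print(town_plausibility(2_000_000, 50))    # about  6.3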
X: Where does Wolfram Alpha get the data with which it computes answers?
SW: The truth is, very little of our data comes from the Web. The Web is a great place to find out what’s out there, but in the end, for every one of thousands of domains, we’ve gone to the primary data source and gotten the most original, most useful data. One exception to that, I suppose, is linguistic things. Wikipedia is really useful to us: if we have an entity [a chemical, a movie], what do people actually call it?
The reason that Wolfram Alpha has been at all possible for us is we’re starting with