In a recent article in Science, Cornell professor Hod Lipson and graduate student Michael Schmidt described a new computer system that can discover scientific laws. At first glance, it looks like a fulfillment of the dreams of “computational scientific discovery,” a small field at the intersection of philosophy and artificial intelligence (AI) that seeks to reverse-engineer scientific imagination and create a computer as skilled as we are at constructing theories. But if you look closer, it turns out that the system’s success at analyzing large, complicated data sets, formulating initial theories, and discarding trivial patterns in favor of interesting ones comes not from imitating people, but from allowing a very different kind of intelligence to grow in silico — one that doesn’t compete with humans, but works with us.
Efforts to create computer programs that find scientific laws date back to the 1960s. Pat Langley and Herbert Simon’s BACON programs, for example, were able to rediscover several scientific laws, including Kepler’s Third Law and Ohm’s Law. However, these systems haven’t had much of an effect on science, as they’ve been designed to work in narrow fields or have required cleaned-up, hand-groomed data to work at all. Other programs, designed to make routine tasks more efficient and reliable, have had a much bigger impact. There’s no shortage of creative researchers, but there is a lot more scientific grunt work these days, and a lot more data to crunch.
The problem is, as data accumulates, doing creative things with it gets harder. At a certain point, volume overwhelms understanding. As Schmidt explains, their program solves this problem by “exploiting the computing power that’s available right now” — using the technological advances that created the problem to solve it. Where a scientist might rely on intuition or existing theories, the Cornell system “applies millions or billions of terms and nonlinear equations to the data and looks for ones that make sense of deeper underlying phenomena,” says Schmidt. It starts from zero, but it’s a fast learner.
This approach is computationally intensive because it uses evolutionary computation, in the form of genetic programming, to analyze and theorize. You can think of evolutionary programs (genetic algorithms and their relatives) as ecosystems in which simple programs compete to explain data. The poor performers are discarded; the best recombine, mutate, and spawn a new generation of programs. These are tested against new data, and the process is repeated until a handful of strong theories remains.
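To make the generate, test, and discard cycle concrete, here is a toy sketch of genetic programming applied to symbolic regression, the general family of technique described above. It is not the Cornell code. The operator set, the population sizes, and every function name (`evolve`, `crossover`, `mutate`, and so on) are illustrative assumptions. For simplicity, each generation is scored on the same fixed observations rather than on new data.

```python
import random

# Illustrative primitive set (an assumption): three binary operators,
# the input variable "x", and two constants.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}
TERMINALS = ["x", 1.0, 2.0]

def random_tree(rng, depth):
    """Grow a random expression tree as nested tuples: (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(list(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, x):
    """Compute the value of an expression tree at input x."""
    if tree == "x":
        return x
    if not isinstance(tree, tuple):
        return tree  # a numeric constant
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def show(tree):
    """Render a tree as a readable infix formula."""
    if not isinstance(tree, tuple):
        return str(tree)
    op, left, right = tree
    return f"({show(left)} {op} {show(right)})"

def depth(tree):
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth(tree[1]), depth(tree[2]))

def error(tree, data):
    """Mean squared error of a candidate 'theory' on (x, y) observations."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in data) / len(data)

def subtrees(tree, path=()):
    """Yield (path, subtree) pairs so that variation can target any node."""
    yield path, tree
    if isinstance(tree, tuple):
        yield from subtrees(tree[1], path + (1,))
        yield from subtrees(tree[2], path + (2,))

def replace(tree, path, new):
    """Return a copy of tree with the node at `path` swapped for `new`."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(parts)

def crossover(rng, a, b):
    """Graft a random subtree of parent b onto a random node of parent a."""
    target, _ = rng.choice(list(subtrees(a)))
    _, donor = rng.choice(list(subtrees(b)))
    return replace(a, target, donor)

def mutate(rng, tree):
    """Replace a random node with a freshly grown subtree."""
    target, _ = rng.choice(list(subtrees(tree)))
    return replace(tree, target, random_tree(rng, 2))

def evolve(data, pop_size=100, generations=30, max_depth=6, seed=0):
    rng = random.Random(seed)
    population = [random_tree(rng, 3) for _ in range(pop_size)]
    history = []  # best error seen in each generation
    for _ in range(generations):
        # Rank candidates by how well they explain the data.
        population.sort(key=lambda t: error(t, data))
        history.append(error(population[0], data))
        # Discard the poor performers; the top quarter survives unchanged.
        survivors = population[: pop_size // 4]
        # Survivors recombine and occasionally mutate to spawn the next generation.
        children = []
        while len(children) < pop_size - len(survivors):
            child = crossover(rng, rng.choice(survivors), rng.choice(survivors))
            if rng.random() < 0.2:
                child = mutate(rng, child)
            if depth(child) <= max_depth:  # reject bloated offspring
                children.append(child)
        population = survivors + children
    population.sort(key=lambda t: error(t, data))
    return population[0], history

if __name__ == "__main__":
    # Hypothetical observations generated from the "law" y = x^2 + x.
    data = [(x / 4, (x / 4) ** 2 + x / 4) for x in range(-8, 9)]
    best, history = evolve(data)
    print(show(best), error(best, data))
```

The top quarter of each generation survives unchanged, so the best error recorded in `history` can never get worse from one generation to the next. Whether the loop lands on the exact law within a given budget depends on the seed and the sizes chosen.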
Older AI projects in scientific discovery tried to model the way scientists think. This approach doesn’t try to imitate an individual scientist’s cognitive processes — you don’t need intuition when you have processor cycles to burn — but it bears an interesting similarity to the way scientific communities work. Lipson says it figures out what to look at next “based on disagreement between models, just as a scientist will design an experiment that tests predictions made by competing theories.”
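Lipson's description resembles what machine-learning researchers call query by committee: run the next experiment where the competing models disagree most. The sketch below makes several assumptions. The candidate models are plain Python functions, disagreement is measured as the variance of their predictions, and an experiment returns a noise-free measurement. The names `most_informative_input` and `run_experiments` are hypothetical and are not part of the Cornell system.

```python
from statistics import pvariance


def most_informative_input(models, candidate_inputs):
    """Return the input on which the competing models disagree most.

    Disagreement is scored as the population variance of the models'
    predictions, one simple choice among several (an assumption).
    """
    return max(candidate_inputs, key=lambda x: pvariance([m(x) for m in models]))


def run_experiments(models, candidate_inputs, experiment, rounds=3, tolerance=1e-6):
    """Test the rival models where they disagree most, discarding any model
    whose prediction misses the measured outcome."""
    remaining = list(models)
    untried = list(candidate_inputs)
    for _ in range(rounds):
        if len(remaining) < 2 or not untried:
            break  # a single theory is left, or there is nothing left to try
        x = most_informative_input(remaining, untried)
        untried.remove(x)
        observed = experiment(x)  # assumed noise-free measurement
        remaining = [m for m in remaining if abs(m(x) - observed) <= tolerance]
    return remaining


if __name__ == "__main__":
    def square(x):
        return x * x

    def identity(x):
        return x

    def affine(x):
        return 2 * x - 1

    # Treat square as the "true law" that the experiment measures.
    survivors = run_experiments([square, identity, affine], [0, 0.5, 1, 3], experiment=square)
    print([m.__name__ for m in survivors])  # prints ['square']
```

Here the first experiment runs at x = 3, where the three predictions (9, 3, and 5) spread furthest apart. That single measurement rules out both rivals. At x = 1 all three models predict 1, so an experiment there would teach nothing. This is the intuition behind choosing by disagreement.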
But that doesn’t mean it will replace scientists. Schmidt views it as a tool to see what they can’t: “Something that is not obvious to a human might be obvious to a computer,” he speculates. A program, says Schmidt, may find things “that look really strange and foreign” to a scientist. More fundamentally, the Cornell program can analyze data, build models, and even guess which theories are more powerful, but it can’t explain what its theories mean, and new theories often force scientists to rethink and refine basic assumptions. “E=mc² looks very simple, but it actually encapsulates a lot of knowledge,” Lipson says. “It overturned a lot of older preconceptions about energy and the speed of light.” Even as computers get better at formulating theories, “you need humans to give meaning to what the system finds.”
The Cornell system may validate Carnegie Mellon professor and entrepreneur Raúl Valdés-Pérez’s suggestion that over the long run, computers will be less useful “in the application of textbook knowledge… [than] at the science frontiers,” those areas where, as Schmidt puts it, there is “a lot of data, but very little theoretical knowledge,” and lots of big questions left to answer. But the system also highlights how essential human judgment is in taming those frontiers.
Cardiff University sociologist Harry Collins breaks the issue down this way. “Real science turns on two problems,” he says via e-mail: (A) what counts as data and what as noise, and (B) what counts as a credible outcome of a piece of data analysis. If you know the answer to A, science is easy, but frontier scientists spend most of their time trying to work that answer out. Further, A and B interact: scientists often decide what must have been data and what must have been noise by judging how credible the outcome of the analysis is. Deciding what’s credible often requires intuition, a sense of whether a theory is elegant or powerful. That kind of judgment seems impossible to describe or automate. For all its reliance on instruments, computers, and code, science remains a profoundly human enterprise.
Originally published May 12, 2009