Why We’re Not Obsolete

Signals / by Alex Soojung-Kim Pang

As scientific data accumulates, volume can overwhelm understanding. A new Cornell computer program is using the technological advances that created this data-understanding problem to help solve it.

Credit: Schmidt/AAAS

In a recent article in Science, Cornell professor Hod Lipson and graduate student Michael Schmidt described a new computer system that can discover scientific laws. At first glance, it looks like a fulfillment of the dreams of “computational scientific discovery,” a small field at the intersection of philosophy and artificial intelligence (AI) that seeks to reverse-engineer scientific imagination and create a computer as skilled as we are at constructing theories. But if you look closer, it turns out that the system’s success at analyzing large, complicated data sets, formulating initial theories, and discarding trivial patterns in favor of interesting ones comes not from imitating people, but from allowing a very different kind of intelligence to grow in silico — one that doesn’t compete with humans, but works with us.

Efforts to create computer programs that find scientific laws date back to the 1960s. Pat Langley and Herbert Simon’s BACON programs, for example, were able to rediscover several scientific laws, including Kepler’s Third Law and Ohm’s Law. However, these systems haven’t had much of an effect on science, as they’ve been designed to work in narrow fields or have required cleaned-up, hand-groomed data to work at all. Other programs, designed to make routine tasks more efficient and reliable, have had a much bigger impact. There’s no shortage of creative researchers, but there is a lot more scientific grunt work these days, and a lot more data to crunch.

The problem is, as data accumulates, doing creative things with it gets harder. At a certain point, volume overwhelms understanding. As Schmidt explains, their program solves this problem by “exploiting the computing power that’s available right now” — using the technological advances that created the problem to solve it. Where a scientist might rely on intuition or existing theories, the Cornell system “applies millions or billions of terms and nonlinear equations to the data and looks for ones that make sense of deeper underlying phenomena,” says Schmidt. It starts from zero, but it’s a fast learner.

This approach is computationally intensive because it uses a technique called evolutionary programming to analyze and theorize. You can think of evolutionary programs (or genetic algorithms) as ecosystems in which simple programs compete to explain data. The poor performers are discarded; the best integrate, evolve, and spawn a new generation of programs. These are tested against new data, and the process is repeated until a handful of strong theories remains.
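The loop described above can be sketched in a few dozen lines of Python. This is a toy illustration, not the Cornell system: the expression format, the function names (random_expr, evolve, and so on), the population size, and the mutation rates are all assumptions chosen for brevity. Unlike the process described in the article, this sketch reuses one fixed data set instead of testing each generation against new data.

```python
import operator
import random

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
MAX_SIZE = 31  # hypothetical cap on tree size, to keep candidates small


def random_expr(rng, depth=3):
    """Build a random expression tree over x and small constants."""
    if depth == 0 or rng.random() < 0.3:
        return "x" if rng.random() < 0.5 else float(rng.randint(-2, 2))
    return (rng.choice(list(OPS)),
            random_expr(rng, depth - 1), random_expr(rng, depth - 1))


def evaluate(expr, x):
    """Evaluate a tree: 'x', a float constant, or (op, left, right)."""
    if isinstance(expr, tuple):
        op, left, right = expr
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if expr == "x" else expr


def size(expr):
    """Count the nodes in a tree."""
    if isinstance(expr, tuple):
        return 1 + size(expr[1]) + size(expr[2])
    return 1


def error(expr, data):
    """Mean squared error on (x, y) pairs; lower is better."""
    total = 0.0
    for x, y in data:
        d = evaluate(expr, x) - y
        total += d * d
    # NaN (for example from inf - inf) is treated as the worst possible score.
    return float("inf") if total != total else total / len(data)


def random_subtree(expr, rng):
    """Walk down the tree a random number of steps and return that subtree."""
    while isinstance(expr, tuple) and rng.random() < 0.7:
        expr = expr[rng.choice((1, 2))]
    return expr


def crossover(a, b, rng):
    """Copy a, grafting a random subtree of b at a random position."""
    if not isinstance(a, tuple) or rng.random() < 0.3:
        return random_subtree(b, rng)
    op, left, right = a
    if rng.random() < 0.5:
        return (op, crossover(left, b, rng), right)
    return (op, left, crossover(right, b, rng))


def mutate(expr, rng):
    """Replace one randomly chosen subtree with a fresh random one."""
    if not isinstance(expr, tuple) or rng.random() < 0.3:
        return random_expr(rng, depth=2)
    op, left, right = expr
    if rng.random() < 0.5:
        return (op, mutate(left, rng), right)
    return (op, left, mutate(right, rng))


def evolve(data, generations=20, pop_size=100, seed=0):
    """Return the best expression found and the best error per generation."""
    rng = random.Random(seed)
    population = [random_expr(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda e: error(e, data))
        history.append(error(population[0], data))
        survivors = population[: pop_size // 4]  # poor performers are discarded
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            child = crossover(a, b, rng)          # the best recombine...
            if rng.random() < 0.3:
                child = mutate(child, rng)        # ...and occasionally mutate
            if size(child) > MAX_SIZE:
                child = random_expr(rng)          # replace oversized candidates
            children.append(child)
        population = survivors + children
    best = min(population, key=lambda e: error(e, data))
    return best, history + [error(best, data)]
```

Calling evolve on points sampled from a simple curve, such as y = x*x + x, returns the best-scoring expression and a record of the best error in each generation. Because the survivors are carried over unchanged, that error never gets worse from one generation to the next.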

Older AI projects in scientific discovery tried to model the way scientists think. This approach doesn’t try to imitate an individual scientist’s cognitive processes — you don’t need intuition when you have processor cycles to burn — but it bears an interesting similarity to the way scientific communities work. Lipson says it figures out what to look at next “based on disagreement between models, just as a scientist will design an experiment that tests predictions made by competing theories.”
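Lipson's point about disagreement can be illustrated with a short Python sketch. It assumes, hypothetically, that each competing model is a plain function of one variable and that disagreement is measured as the variance of the models' predictions; the article does not say how the Cornell system actually scores disagreement, and the name most_informative_input is made up for this example.

```python
from statistics import pvariance


def most_informative_input(models, candidate_inputs):
    """Return the candidate input where the models' predictions differ most."""
    return max(candidate_inputs,
               key=lambda x: pvariance([model(x) for model in models]))


# Three hypothetical rival "theories" that all agree at x = 1.
models = [
    lambda x: x,
    lambda x: x * x,
    lambda x: 2 * x - 1,
]
candidates = [0.0, 0.5, 1.0, 2.0]

next_experiment = most_informative_input(models, candidates)
```

Here the three models make identical predictions at x = 1, so measuring there would settle nothing. They differ most at x = 2, so that is the measurement the sketch selects.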

But that doesn’t mean it will replace scientists. Schmidt views it as a tool to see what they can’t: “Something that is not obvious to a human might be obvious to a computer,” he speculates. A program, says Schmidt, may find things “that look really strange and foreign” to a scientist. More fundamentally, the Cornell program can analyze data, build models, and even guess which theories are more powerful, but it can’t explain what its theories mean, and new theories often force scientists to rethink and refine basic assumptions. “E=mc² looks very simple, but it actually encapsulates a lot of knowledge,” Lipson says. “It overturned a lot of older preconceptions about energy and the speed of light.” Even as computers get better at formulating theories, “you need humans to give meaning to what the system finds.”

The Cornell system may validate Carnegie Mellon professor and entrepreneur Raúl Valdés-Pérez’s suggestion that over the long run, computers will be less useful “in the application of textbook knowledge… [than] at the science frontiers,” those areas where, as Schmidt puts it, there is “a lot of data, but very little theoretical knowledge,” and lots of big questions left to answer. But the system also highlights how essential human judgment is in taming those frontiers.

Cardiff University sociologist Harry Collins breaks the issue down this way: “Real science turns on two problems,” he says via e-mail. A) what is data and what is noise? and B) what is a credible outcome of a piece of data analysis? If you know the answer to A, science is easy, but frontier scientists spend most of their time trying to work that answer out. Further, A and B interact; scientists often make decisions about what must have been data and what must have been noise by reference to the credibility of the outcome of the analysis. Deciding what’s credible often requires intuition, a sense of whether a theory is elegant or powerful. That kind of judgment seems impossible to describe or automate. For all its reliance on instruments, computers, and code, science remains a profoundly human enterprise.

Originally published May 12, 2009

SEEDMAGAZINE.COM by Seed Media Group. ©2005-2012 Seed Media Group LLC. All Rights Reserved.
