Illustration: Tyler Lang
Organizing the world’s species into branches on a phylogenetic tree is a major goal of biologists trying to understand how life evolved. DNA-sequencing technologies are providing them with more information than ever with which to accomplish this goal, but with less than 1 percent of all species currently placed in any kind of phylogeny, there is still much work to be done. In a recent paper in Science, researchers at the University of Texas at Austin introduced new tree-building software that could expand the tree of life and change our understanding of evolution.
One way to construct evolutionary trees is with software that compares and interprets discrepancies between the molecular sequences of different species using various statistical techniques. The robustness of the math driving these techniques largely determines the speed and accuracy of a given tree-building method. In practice, a mathematically well-grounded approach tends to be slow, which can limit a method’s scope. “The statisticians who have been developing these methods have been really trying to get the mathematics right,” explains Tandy Warnow, a phylogeneticist at the University of Texas at Austin. “And getting the mathematics right really does tend to limit you to small datasets.” Many programs are only fast enough to handle about 20 molecular sequences at a time, a paltry number considering that the datasets biologists are trying to analyze usually contain anywhere from a few hundred to a few thousand sequences.
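To make the idea of "comparing discrepancies between sequences" concrete, here is a minimal sketch of one of the simplest possible measures: the fraction of aligned positions at which two sequences differ. This is an illustration only, not the statistical method used by SATé or by the programs Warnow describes. The function names, the species labels, and the toy sequences are all made up for the example, and the sequences are assumed to be already aligned to the same length, with "-" marking a gap.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned, gap-free positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for x, y in zip(a, b):
        # Skip any column where either sequence has a gap.
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            mismatches += 1
    return mismatches / compared if compared else 0.0


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise distances for every pair of named sequences."""
    return {
        (m, n): p_distance(seqs[m], seqs[n])
        for m, n in combinations(seqs, 2)
    }


# Hypothetical toy alignment; real datasets hold hundreds or thousands of sequences.
toy = {
    "species_A": "ACGTACGT",
    "species_B": "ACGTACGA",
    "species_C": "ACGAAC-A",
}
distances = distance_matrix(toy)
for pair, d in distances.items():
    print(pair, round(d, 3))
```

Even this crude measure needs one comparison for every pair of sequences, so the work grows quickly as a dataset gets larger. The statistical techniques the article refers to do far more per comparison than counting mismatches.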
To find out just how slow these programs were, Warnow attempted to run them on a dataset of 100 sequences. “They looked like they were not going to complete for months and months and months,” she says. Larger datasets, then, could take decades.
“There is a clear and desperate need for methods that compute phylogenetic trees much faster,” says Antonis Rokas, a biologist at Vanderbilt University, adding that scientists ultimately hope to build trees containing millions of species.
To address the problem, Warnow and her colleagues developed a tree-building program called SATé that is capable of processing 1,000 sequences in 24 hours. She refers to the statistical method used by SATé as “not completely kosher” in her field, because to increase the speed and power of the software, her team used mathematical techniques without solid theoretical grounding. “We’re not following a mathematically rigorous approach,” she says. But the risk paid off: SATé constructed trees with a high degree of accuracy from simulated datasets as well as from a real one whose tree structure had already been determined.
SATé solves another problem common among tree-building programs. Some species’ DNA changes so quickly that their molecular sequences from generation to generation can be quite different and thus more difficult for software to compare. But SATé is able to handle many of these rapidly evolving species and by doing so opens previously impenetrable datasets to new types of phylogenetic analysis.
“This is certainly a big step in the right direction,” Rokas says. “And I expect this software to be used more and more.”
Because Warnow’s team developed SATé using mathematical methods they don’t yet completely understand, it is still a mystery how the program is able to deal with rapidly evolving sequences so successfully. “We have something that works well but doesn’t really yet have an explanation,” Warnow says. According to her, learning how the program works will require the attention of a strong probabilist.
“The difficulty is finding mathematicians interested in the biological problems,” explains Rokas. But if mathematicians can determine why SATé is able to outperform other tree-building methods, Warnow may be able to improve upon the design of SATé to consider even larger sets of data and move closer to the goal of constructing a tree of life containing all species.
Rokas explains that Warnow’s research is so successful because she understands the practical considerations facing biologists. “I want software that is easy and that runs quickly so that I can train my students to use it,” he says. SATé, which works on a laptop computer, was made to do just that. “We designed it so that it was really going to be easy for anyone to use,” Warnow explains.
By taking a step away from the mathematically well-understood approach to molecular phylogenetics, Warnow’s team was able to address the needs of researchers like Rokas. SATé enables scientists to handle real-world databases of a wider range of species, and, according to Warnow, this could lead to new scientific discoveries and have broad implications for evolutionary biology. What remains to be seen is how far Warnow can push mathematicians to solve the problems necessary to move SATé—and our understanding of evolution—forward.
Originally published July 1, 2009