# ICS 269, Spring 1997: Theory Seminar

## 11 Apr 1997:

Constructing Big Trees from Short Sequences

Tandy
Warnow, Dept. Comp. and Inf. Sci., Univ. of Pennsylvania

The construction of evolutionary trees is a fundamental problem
in biology. All current polynomial time methods (to our knowledge)
are unlikely to recover the topology of the true tree from
sequences of realistic lengths (bounded, perhaps, by 10,000
nucleotides) for large sets of widely divergent sequences, even if
such methods are known to be able to reconstruct the correct
topology given long enough sequences. Therefore, all current
polynomial time methods may be incapable of addressing difficult
and important data sets such as the Tree of Life. We address this
problem and present a new polynomial time algorithm for
reconstructing evolutionary trees called the Short Quartets Method.
Our method computes topologies of subtrees induced by "short"
quartets, which avoid the problematic influence of pairs of widely
divergent sequences. We reconstruct a tree on the entire set of
sequences from these inferred subtrees when they are compatible. We
describe a simple implementation and study its convergence rate for
the Cavender-Farris model of evolution. Our analytical results and
experimental study indicates that this method has significantly
greater statistical power on large divergent trees than other
distance methods methods, such as Neighbor Joining and the recent
work of Agarwala et al, Farach & Kannan, and Cohen &
Farach, on approximating the L_{inf}-nearest tree.

Joint work with Péter
L. Erdös, Michael A.
Steel,
László A. Székely and Ken Rice.

Paper available
from Székely's home page.