The construction of evolutionary trees is a fundamental problem in biology. All current polynomial time methods (to our knowledge) are unlikely to recover the topology of the true tree from sequences of realistic lengths (bounded, perhaps, by 10,000 nucleotides) for large sets of widely divergent sequences, even if such methods are known to be able to reconstruct the correct topology given long enough sequences. Therefore, all current polynomial time methods may be incapable of addressing difficult and important data sets such as the Tree of Life. We address this problem and present a new polynomial time algorithm for reconstructing evolutionary trees called the Short Quartets Method. Our method computes topologies of subtrees induced by "short" quartets, which avoid the problematic influence of pairs of widely divergent sequences. We reconstruct a tree on the entire set of sequences from these inferred subtrees when they are compatible. We describe a simple implementation and study its convergence rate for the Cavender-Farris model of evolution. Our analytical results and experimental study indicates that this method has significantly greater statistical power on large divergent trees than other distance methods methods, such as Neighbor Joining and the recent work of Agarwala et al, Farach & Kannan, and Cohen & Farach, on approximating the Linf-nearest tree.
Joint work with Péter
L. Erdös, Michael A.
László A. Székely and Ken Rice.
Paper available from Székely's home page.