The deterministic effects of alignment bias in phylogenetic inference

Simmons MP, Webb CT, Müller KF

Research article (journal)


Alignment of nucleotide and/or amino acid sequences is a fundamental component of sequence-based molecular phylogenetic studies. Most comparisons of alignment programs have focused on how the inferred alignments compared with “known” alignments. In this study, however, our focus was on how different alignment methods affect the gene trees that are inferred from the alignments. We used simulations to determine how alignment errors can lead to systematic biases that affect phylogenetic inference from those sequences. We compared four alternative approaches to sequence alignment: progressive pairwise alignment, as implemented in Clustal W and MUSCLE; simultaneous multiple alignment of sequence fragments, as implemented in DCA; local pairwise alignment, as implemented in DIALIGN 2 and DIALIGN-T; and direct optimization, as implemented in POY. We set out to address three questions in this study. First, when only a single process partition is simulated, how accurately do the gene trees derived from the inferred alignments reconstruct the simulated tree topology? How do they compare to the gene tree derived from the correct alignment? Second, when different process partitions that have different histories are aligned and phylogenetically analyzed together, how congruent are the gene trees derived from the inferred alignments with the tree topology for the process partition with the greatest number of characters? Are they more or less congruent than the gene tree derived from the correct alignment? Third, when sequences without any insertions or deletions are simulated, how many gapped positions are incorrectly inserted by the different alignment methods?

Publication details

Pages: 15
Release year: 2010
Language in which the publication is written: English