Genetics, Vol. 167, 2027-2043, August 2004, Copyright © 2004
doi:10.1534/genetics.103.023226

Estimating the Frequency of Events That Cause Multiple-Nucleotide Changes

* EMBL-European Bioinformatics Institute, Hinxton CB10 1SD, United Kingdom
† Department of Zoology, University of Cambridge, Cambridge CB2 3EJ, United Kingdom

1 Corresponding author: EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom.
E-mail: simon@ebi.ac.uk

Existing mathematical models of DNA sequence evolution assume that all substitutions derive from point mutations. There is, however, increasing evidence that larger-scale events, involving two or more consecutive sites, may also be important. We describe a model, denoted SDT, that allows for single-nucleotide, doublet, and triplet mutations. Applied to protein-coding DNA, the SDT model allows doublet and triplet mutations to overlap codon boundaries but still permits data to be analyzed using the simplifying assumption of independence of sites. We have implemented the SDT model for maximum-likelihood phylogenetic inference and have applied it to an alignment of mammalian globin sequences and to 258 other protein-coding sequence alignments from the Pandit database. We find the SDT model's inclusion of doublet and triplet mutations to be overwhelmingly successful in giving statistically significant improvements in fit of model to data, indicating that larger-scale mutation events do occur. Distributions of inferred parameter values over all alignments analyzed suggest that these events are far more prevalent than previously thought. Detailed consideration of our results and the absence of any known mechanism causing three adjacent nucleotides to be substituted simultaneously, however, lead us to suggest that the actual evolutionary events occurring may include still-larger-scale events, such as gene conversion, inversion, or recombination, or a series of rapid compensatory changes.



