Genetics, Vol. 167, 2027-2043, August 2004, Copyright © 2004
doi:10.1534/genetics.103.023226
Estimating the Frequency of Events That Cause Multiple-Nucleotide Changes
Simon Whelan*,†,1 and Nick Goldman*
* EMBL-European Bioinformatics Institute, Hinxton CB10 1SD, United Kingdom
† Department of Zoology, University of Cambridge, Cambridge CB2 3EJ, United Kingdom
1 Corresponding author: EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom.
E-mail: simon{at}ebi.ac.uk
Existing mathematical models of DNA sequence evolution assume that all substitutions derive from point mutations. There is, however, increasing evidence that larger-scale events, involving two or more consecutive sites, may also be important. We describe a model, denoted SDT, that allows for single-nucleotide, doublet, and triplet mutations. Applied to protein-coding DNA, the SDT model allows doublet and triplet mutations to overlap codon boundaries but still permits data to be analyzed using the simplifying assumption of independence of sites. We have implemented the SDT model for maximum-likelihood phylogenetic inference and have applied it to an alignment of mammalian globin sequences and to 258 other protein-coding sequence alignments from the Pandit database. We find the SDT model's inclusion of doublet and triplet mutations to be overwhelmingly successful in giving statistically significant improvements in fit of model to data, indicating that larger-scale mutation events do occur. Distributions of inferred parameter values over all alignments analyzed suggest that these events are far more prevalent than previously thought. Detailed consideration of our results and the absence of any known mechanism causing three adjacent nucleotides to be substituted simultaneously, however, leads us to suggest that the actual evolutionary events occurring may include still-larger-scale events, such as gene conversion, inversion, or recombination, or a series of rapid compensatory changes.
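To make the distinction between single-nucleotide, doublet, and triplet events concrete, the following minimal sketch labels the smallest single event that could turn one nucleotide triplet into another. It is an illustration only and not the authors' SDT implementation: the function name `classify_change` and the label strings are hypothetical, and the sketch assumes, following the abstract, that a multiple-nucleotide event affects consecutive sites. It ignores events that overlap codon boundaries, rates, and likelihood calculation.

```python
def classify_change(before: str, after: str) -> str:
    """Label the smallest single event that could explain before -> after.

    Hypothetical helper for illustration; not part of the SDT software.
    """
    if len(before) != 3 or len(after) != 3:
        raise ValueError("expected two nucleotide triplets")

    # Positions (0, 1, 2) at which the two triplets differ.
    changed = [i for i in range(3) if before[i] != after[i]]

    if not changed:
        return "none"
    if len(changed) == 1:
        return "single"
    if len(changed) == 3:
        return "triplet"
    # Two sites differ. Assumption: one doublet event can only change
    # two adjacent sites, so positions 0 and 2 need more than one event.
    if changed[1] - changed[0] == 1:
        return "doublet"
    return "multiple events"


if __name__ == "__main__":
    for pair in [("ACG", "ACT"), ("ACG", "ATT"), ("ACG", "TCT"), ("ACG", "TTA")]:
        print(pair, classify_change(*pair))
```

A change at the first and third positions only is reported separately because, under the consecutive-sites assumption, it cannot arise from one doublet event.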