Genetics, Vol. 154, 381-395, January 2000, Copyright © 2000

A New Method for Characterizing Replacement Rate Variation in Molecular Sequences: Application of the Fourier and Wavelet Models to Drosophila and Mammalian Proteins

Pavel Morozov (a), Tatyana Sitnikova (b), Gary Churchill (c), Francisco José Ayala (d), and Andrey Rzhetsky (e)
a Columbia Genome Center, Columbia University, New York, New York 10032,
b Eisai Research Institute, GEFA Biology Group, Boston, Massachusetts 02138,
c The Jackson Laboratory, Bar Harbor, Maine 04609,
d New York, New York 10013
e Department of Medical Informatics, Columbia University, New York, New York 10032

Corresponding author: Andrey Rzhetsky, Columbia Genome Center, Columbia University, 1150 St. Nicholas Ave., Unit 109, New York, NY 10032. E-mail: andrey{at}genome2.cpmc.columbia.edu

Communicating editor: J. HEY

We propose models for describing replacement rate variation in genes and proteins, in which the profile of relative replacement rates along the length of a given sequence is defined as a function of the site number. We consider here two types of functions, one derived from the cosine Fourier series and the other from discrete wavelet transforms. The number of parameters used for characterizing the substitution rates along the sequences can be flexibly changed, and in their most parameter-rich versions both the Fourier and wavelet models become equivalent to the unrestricted-rates model, in which each site of a sequence alignment evolves at a unique rate. When applied to a few real data sets, the new models appeared to fit the data better than the discrete gamma model when compared using the Akaike information criterion and the likelihood-ratio test, although the parametric bootstrap version of the Cox test performed for one of the data sets indicated that the difference in likelihoods between the two models is not significant. The new models are applicable to testing biological hypotheses such as the statistical identity of rate variation profiles among homologous protein families. These models are also useful for determining regions in genes and proteins that evolve significantly faster or slower than the sequence average. We illustrate the application of the new method by analyzing human immunoglobulin and drosophilid alcohol dehydrogenase sequences.
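As a rough illustration of the idea of a rate profile defined as a function of the site number, the sketch below builds relative rates from a truncated cosine series and computes the Akaike information criterion. The abstract does not give the exact parameterization, so the functional form here (a cosine series on the log scale, rescaled to mean 1) and the names cosine_rate_profile and aic are assumptions made for illustration, not the authors' implementation.

```python
import math


def cosine_rate_profile(coeffs, n_sites):
    """Relative rates for sites 0..n_sites-1 from a truncated cosine series.

    ASSUMED form (not taken from the paper): the log-rate at site i is
    sum_k coeffs[k-1] * cos(pi * k * (i + 0.5) / n_sites), k = 1..K.
    Exponentiating keeps rates positive; rescaling makes the mean rate 1.
    """
    raw = []
    for i in range(n_sites):
        log_rate = sum(
            a * math.cos(math.pi * (k + 1) * (i + 0.5) / n_sites)
            for k, a in enumerate(coeffs)
        )
        raw.append(math.exp(log_rate))
    mean = sum(raw) / n_sites
    return [r / mean for r in raw]


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


if __name__ == "__main__":
    # Two coefficients give a smooth profile over 10 sites.
    for site, rate in enumerate(cosine_rate_profile([1.0, -0.5], 10), start=1):
        print(f"site {site:2d}: relative rate {rate:.3f}")
```

With no coefficients the profile is flat (every site has relative rate 1). Under this assumed form, using n_sites - 1 coefficients is enough to represent any positive profile with mean 1, which mirrors the abstract's remark that the most parameter-rich version is equivalent to the unrestricted-rates model.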




