- THIS ARTICLE
-
Abstract
- Full Text (PDF)
- Alert me when this article is cited
- Alert me if a correction is posted
- SERVICES
- Similar articles in this journal
- Similar articles in PubMed
- Alert me to new issues of the journal
- Download to citation manager
- Reprints & Permissions
- CITING ARTICLES
- Citing Articles via Google Scholar
- GOOGLE SCHOLAR
- Articles by Zhang, J.
- Articles by Gu, X.
- Search for Related Content
- PUBMED
- PubMed Citation
- Articles by Zhang, J.
- Articles by Gu, X.
Correlation Between the Substitution Rate and Rate Variation Among Sites in Protein Evolution
Jianzhi Zhanga and Xun Guaa Institute of Molecular Evolutionary Genetics and Department of Biology, Pennsylvania State University, University Park, Pennsylvania 16802
Corresponding author: Jianzhi Zhang, 322 Mueller Laboratory, Institute of Molecular Evolutionary Genetics, Pennsylvania State University, University Park, PA 16802, jxz128{at}psu.edu (E-mail).
| ABSTRACT |
|---|
It is well known that the rate of amino acid substitution varies among different proteins and among different sites of a protein. It is, however, unclear whether the extent of rate variation among sites of a protein and the mean substitution rate of the protein are correlated. We used two approaches to analyze orthologous protein sequences of 51 nuclear genes of vertebrates and 13 mitochondrial genes of mammals. In the first approach, no assumptions of the distribution of the rate variation among sites were made, and in the second approach, the gamma distribution was assumed. Through both approaches, we found a negative correlation between the extent of among-site rate variation and the average substitution rate of a protein. That is, slowly evolving proteins tend to have a high level of rate variation among sites, and vice versa. We found this observation consistent with a simple model of the neutral theory where most sites are either invariable or neutral. We conclude that the correlation is a general feature of protein evolution and discuss its implications in statistical tests of positive Darwinian selection and molecular time estimation of deep divergences.
THE rate of amino acid substitution varies considerably among different proteins as well as different sites of a protein (e.g., ![]()
![]()
![]()
![]()
![]()
![]()
| DATA AND METHODS |
|---|
Characterization of among-site rate variation:
Let us assume that the substitution rate (r) at an amino acid site of a protein sequence is an identically and independently distributed random number following an unknown distribution g(r). Let k be the number of substitutions at a given site during a period of evolutionary time T. Assuming that amino acid substitutions follow the Poisson process when the rate is given, k follows the distribution:
![]() |
(1) |
The expectation of k, E(k), can be obtained by
![]() |
(2) |
![]() |
(3) |
![]() |
(4) |
![]() |
(5) |
The degree of among-site rate variation is better measured by the coefficient of variation, which is obtained from
![]() |
(6) |
It is interesting to see from Equation 4 and Equation 6 that if we can estimate the mean and variance of k, we will be able to study the mean substitution rate and the extent of among-site rate variation without knowing the underlying rate distribution g(r) (![]()
Nevertheless, the gamma distribution has been predominantly used in recent years to characterize the among-site rate variation of proteins (e.g., ![]()
![]()
![]()
![]()
![]()
![]()
![]() |
(7) |
() is the gamma function. There are two parameters in Equation 7: the mean rate µ = E(r) and the shape parameter
. The extent of among-site rate variation can be measured by
. Small values of
indicate a high degree of rate variation among sites, whereas large values indicate a low degree of variation. In fact, under the gamma model, CV(r) =
. Some examples of the gamma distribution with different values of µ and
are given in Figure 1.
|
Genes used:
We randomly chose 51 nuclear genes (see Table 1) for which the orthologous sequences from the human (Homo sapiens), mouse (Mus musculus), chicken (Gallus gallus), and African clawed frog (Xenopus laevis) are available in the GenBank. All the protein sequences were downloaded from the HOVERGEN database (![]()
![]()
![]()
requires at least three sequences, (2) there are many gene sequences available for these species, (3) the phylogeny of the species is known, and (4) the divergence among these species is moderate for most genes so that pairwise distances and substitution rates can be accurately estimated. As shown in Equation 4, to study the mean rate µ, we have to know the total amount of evolutionary time T for a tree. But if we use the same species for all the genes, T will be the same so that we can infer properties of µ from those of µT without knowing the actual value of T.
|
We also studied 13 mitochondrial genes of 9 mammalian species, including the mouse (GenBank accession no.
J01420), rat (Rattus norvegicus;
X14848), human (D38112), gibbon (Hylobates lar;
X97707), cow (Bos taurus;
V00654), whale (Balaenoptera physalus;
X61145), opossum (Didelphis virginiana;
Z29573), wallaroo (Macropus robustus;
Y10524), and platypus (Ornithorhynchus anatinus;
X83427). In this case only mammalian species were used because mammalian mitochondrial genes evolve quickly (![]()
![]()
Sequence analysis:
For the orthologous sequences of each gene, we used the known species tree and estimated the number of substitutions (k) along the tree at each amino acid site by using Equation 7 of GU and ZHANG's (1997) method. We then estimated the mean and variance of k. Estimates of µT and CV(r) were then obtained by using Equation 4 and Equation 6, and they were denoted as D and CV, respectively.
We also estimated the gamma shape parameter (
) for the protein sequences of each gene by using GU and ZHANG's (1997) method, which has been shown to be as accurate as, but much faster than, the likelihood method. Actually, the likelihood method (![]()
is estimated by maximizing the following likelihood function derived from the negative binomial distribution,
![]() |
(8) |
![]() |
(9) |
is the estimated gamma shape parameter.
For each gene, by a likelihood ratio test, we examined whether the null hypothesis that all sites have the same substitution rate can be rejected and whether the alternative hypothesis of the gamma distribution is favored. We computed
![]() |
(10) |
is the estimated gamma shape parameter and L(
) is the likelihood of the null hypothesis. Because the null hypothesis is a special case of the alternative hypothesis with
being infinity, l is approximately
2 distributed with the number of degrees of freedom being 1. | RESULTS |
|---|
Nuclear genes:
Table 1 shows the estimates for the mean substitution rate and the extent of among-site rate variation for the 51 nuclear genes examined. The coefficient of variation (CV) of the substitution rate among sites varies from 0.43 to 2.02 with the mean of 1.09 and the median of 1.00 (Figure 2). Figure 3 shows the relationships between CV and D (average number of substitutions per site along the whole tree). We fitted the data by a linear regression of CV and lnD. The coefficient of correlation (R) is -0.95, which is significantly different from 0 (P < 0.0001). If we fit the data by a linear regression of CV and D, R becomes -0.91 (P < 0.0001). Because D = µT and T (total amount of evolutionary time of the tree) is identical for all the genes, the correlation coefficient between CV and D (or ln D) is the same as that between CV and µ (or lnµ). These results therefore show that the mean substitution rate and the extent of among-site rate variation of a protein are negatively correlated. This observation is independent of any assumption about the underlying distribution of the substitution rate g(r).
|
|
We then examined the correlation under the assumption of a gamma distribution for the rate variation among sites. The gamma shape parameter
of the 51 nuclear genes varies from 0.17 to 3.45 with the median of 0.71 (Table 1 and Figure 4). About two-thirds of the genes have
values <1, and, therefore, their rate distributions are L-shaped (see Figure 1A). Only 12% of the genes have
values >2. Likelihood ratio tests showed that the null hypothesis of no rate variation among sites can be rejected at the 1% significance level for 48 of the 51 genes and can be rejected at the 5% level for the remaining three genes. Therefore, the alternative hypothesis of the gamma distribution is favored. A common feature of the three genes mentioned above is that they are all short (around 150 amino acids), suggesting that the test might not be powerful enough for these genes. The gamma distance d (number of amino acid substitutions per site) between the mouse and chicken varies from 0.057 to 0.703 for the 51 genes (Table 1), which corresponds to the rate of about 0.1 to 1.2 amino acid substitutions per site per billion (109) years if we assume that mammals and birds diverged about 300 mya, which is now generally accepted (![]()
![]()
|
Figure 5 shows the relationship of
and d (between the mouse and chicken) for the 51 nuclear genes. The R value for
and d is 0.68, which is significantly differ-ent from 0 (P < 0.0001). Linear regression of
and lnd does not change the R value very much. Because only orthologous sequences were used, d is proportional to the average substitution rate µ (d = 2µt, where t is the time since the divergence of birds and mammals). Therefore, the correlation coefficient between
and µ is also 0.68. Although only the distances between the mouse and chicken are presented in Figure 5, use of pairwise distances among other species gave similar results. Large estimates of
(e.g.,
> 2) are known to have substantial sampling errors (![]()
< 2 is now 0.86 (P < 0.0001).
|
Influence of estimation biases:
The above analyses showed that the estimated mean substitution rate and the extent of rate variation among sites are highly correlated. However, to prove that our observations reflect the true properties of protein evolution, we have to show that the correlation is not an artifact due to some biases of the estimation methods we have used. In the estimation of CV and D, no assumptions of the distribution g(r) were made. It is, therefore, very difficult to examine the estimation biases quantitatively. In the following paragraphs, we studied possible biases in the estimation of
and d, where g(r) was assumed to be gamma.
A computer simulation study (J. ZHANG, unpublished results) demonstrated that when the number of sequences used is small (e.g., four), estimation of
is subject to asymptotic bias:
is likely to be underestimated for closely related sequences, but overestimated for highly diverged sequences. This is because when the divergence level is low, there is not enough time for substitutions to occur at potentially variable sites, so they are regarded as invariant sites and
is underestimated. When the divergence level is very high, multiple hits cannot be fully corrected. As a result, the numbers of substitutions are underestimated at very fast evolving sites and
is subsequently overestimated. The estimation bias decreases when the number of sequences used increases (![]()
is likely to create a positive correlation between
and d (or µ), it is worthwhile to examine the influence of the estimation bias on the present study quantitatively by using computer simulation.
The tree of the human, mouse, chicken, and frog (Figure 6) was used as a model tree in the simulation. The branch lengths of the tree were proportional to the divergence times of the species, that is, 100 million yr between the human and mouse (![]()
![]()
value from a uniform distribution in the range of 0.2 to 2 as the gamma shape parameter of the gene, and a p value from a uniform distribution in the range of 0.05 to 0.45 as the proportional difference between the mouse and chicken proteins. These ranges represented the observed variations in the 45 genes, excluding the 6 genes with estimated
values >2. The sequence length used was 450 amino acids, because it is approximately the average length of the proteins that we have examined. The simulation procedure was similar to that described in ![]()
, and the amino acid substitution followed the Poisson process. The tree branch lengths (expected numbers of substitutions per site) were computed for each site separately. For example, the expected distance between the mouse and chicken is dM-C, computed from Equation 9 by using the
and p values that were randomly chosen for this gene, and the average substitution rate for the protein is then µ = dM-C/(2 x 300) substitutions per site per mya, where 300 stands for 300 million yr. The substitution rate at a given site is therefore equal to r = rRµ, where rR is the relative rate at the site. The length of the branch from human to the common ancestor of human and mouse is then 100r substitutions for the site, where 100 stands for 100 million yr. Other branch lengths can be computed similarly. In the simulation, after the four protein sequences at the exterior nodes of the tree were generated, the gamma shape parameter
was estimated by the method of ![]()
and p. The simulation results with the estimated
>2 were abandoned. When 45 independent genes were simulated and 45 couples of
(<2) and d were obtained, the correlation coefficient R was computed. We conducted 2500 replications of such computer simulations, and the distribution of the 2500 R values is presented in Figure 7.
|
|
It is seen from Figure 7 that the estimated
and d are positively correlated, though the true
and d have no correlation at all. The correlation coefficient of the estimated
and d is significantly different from 0 because all 2500 simulated R values are positive. The average correlation coefficient is 0.65. However, only 0.32% of the 2500 simulated R values are greater than 0.86, which is the observed R value for the 45 nuclear genes examined. In other words, the observed R value of 0.86 is significantly greater than 0 at the 0.32% significance level, even when the estimation bias of
is considered. Therefore, the observed positive correlation between
and d of the nuclear proteins seems to have underlying causes other than the estimation bias.
In the above simulation,
and p were randomly chosen from a uniform distribution, respectively. But in reality, neither of them follow a uniform distribution (see Figure 4). It is expected, and is confirmed in a small-scale simulation, that the R values obtained from the simulation will be smaller when the true distributions of the
and p are used. Therefore, the probability of observing an R value of 0.86 or larger, given that there is no correlation between
and d, may have been overestimated in the simulation. In other words, the effect of the estimation bias may be exaggerated in the simulation and the confidence level of 0.32% is a conservative estimate. The assumptions of the Poisson model of amino acid substitution for protein evolution does not seem to have affected our simulation result very much. Because only orthologous proteins are used in the analysis, the evolutionary rate of a protein is expected to be more or less constant among lineages unless there is variation of the mutation rate due to the generation time effect (![]()
![]()
) and D (or d).
Mitochondrial genes:
The estimated CV values of the 13 mitochondrial genes are between 0.66 and 2.05, with the mean of 1.14 and the median of 1.10 (Table 2 and Figure 2). Again, we found that CV and D are highly correlated. The R value is -0.98 (P < 0.0001) for the linear regression between CV and ln D (Figure 8). If we fit the data by a linear regression of CV and D, R becomes -0.92 (P < 0.0001). These results show that the mean substitution rate and the extent of among-site rate variation are negatively correlated for the mitochondrial genes as well.
|
|
The estimated
values for the mitochondrial genes vary from 0.19 to 1.68 with the median of 0.56 (Table 2; Figure 4). Similar to the nuclear genes, about two-thirds of the mitochondrial genes have
values <1. The distance d between the human and mouse varies from 0.20 to 1.15 (Table 2), corresponding to the rate of about 1.0 to 5.8 amino acid substitutions per site per billion years under the assumption that the human and mouse diverged 100 mya. It is seen that mammalian mitochondrial genes evolve about 5 to 10 times faster than nuclear genes, as generally believed (![]()
Figure 9 shows the relationship of
and d (of the human and mouse) for the mitochondrial genes. The correlation coefficient between
and d (or µ) is 0.72 (P < 0.01). Because we have used nine species in the estimation of
, and the estimation bias of
is expected to be quite small when the number of sequences used is no less than eight (![]()
and d is unlikely to have been affected by the estimation bias of
. This result was also confirmed by a small-scale computer simulation similar to that performed for the nuclear genes (data not shown). As mentioned before, ![]()
and d for mitochondrial genes is 0.82 by using 1st and 2nd codon positions of the nucleotide sequences. This value is quite close to what we have obtained (0.72), although we used different species and different methods of
estimation. This suggests that the correlation is robust. As in the case of the nuclear genes, the hypothesis of no rate variation among sites was rejected for all 13 mitochondrial genes (P < 0.01).
|
| DISCUSSION |
|---|
Causes of the negative correlation between the mean rate and the extent of among-site rate variation:
Through the analysis of 64 genes in total, we have shown that the mean substitution rate and the extent of among-site rate variation of a protein are negatively correlated. We have also excluded the possibility that the correlation is caused by estimation biases or is dependent on the assumption of the gamma distribution of the rate variation among sites. Because the 51 nuclear genes we examined are scattered all over the genome and they encode proteins with a variety of functions, it is unlikely that the correlation is limited to particular chromosomal regions or is restricted to particular groups of proteins. Rather, the correlation is likely to be a general feature of molecular evolution because both nuclear and mitochondrial genes show similar patterns. Nevertheless, all the data we used are from vertebrates, and our conclusion needs further support from genes of other organisms.
The observation of negative correlation can be explained under the neutral theory of molecular evolution (![]()
![]()
Let
be the mutation rate at an amino acid site, which is assumed to be constant for all sites of all proteins. Then the substitution rate is 0 for an invariant site, between 0 and
for a variable but functionally constrained site, and equal to
for a neutral site. Let us first consider a simple model in which the functional constraints at all sites of a protein are equal. Then the mean rate µ and CV have no correlation because CV is always 0, and µ can be any value from 0 to
. Therefore, this model cannot explain our observation of the correlation between CV and µ. Now let us consider another model, in which there are two types of sites, either invariant or neutral. Then the average substitution rate of the protein becomes
![]() |
(11) |
![]() |
(12) |
![]() |
(13) |
By using Equation 11 and Equation 13, we obtained
![]() |
(14) |
Equation 14 shows that, under this model, µ is negatively correlated with CV and is dependent only on CV, when
is given. Use of more complex models (e.g., ![]()
In reality, the pattern of the variation of functional constraint among sites is somewhere between the above two extreme models, and the observation of the strong correlation between µ and CV or 1/(CV2 + 1) suggests that it is probably much closer to the second model than to the first. This is biologically understandable because, for an amino acid sequence to be functional, there must be some essential sites that are invariant and some other sites that are from mildly variable to almost free of any change. This pattern of functional constraints seems to be the requirement for a functional protein. It is unrealistic that the constraints at all amino acid sites of a protein are more or less equal unless the protein is free of any functional constraint (no function). In Equation 14, we can see that v also influences the relationship of µ and CV. There is evidence that the mutation rate varies among nuclear genes (![]()
![]()
The biological meaning of
:
The gamma distribution is predominantly used to fit the rate variation among sites of proteins. Our results from the likelihood ratio test indicated that the gamma distribution model fit the rate variation significantly better than the uniform rate model. But the reason why rate variation follows the gamma distribution is unknown and, therefore, the biological meaning of the shape parameter
is not clear. According to our observation of the strong positive correlation between
and µ,
can be interpreted as a measure of functional conservation of a protein. Small values of
indicate strong constraint, and large values indicate weak constraint. It is therefore expected that fourfold degenerate sites and intron regions of functional genes and pseudogenes have very large
values. For example, we estimated that the
of the glycerol-3-phosphate dehydrogenase (GPDH, EC 1.1.1.8) of 13 Drosophila and related species (![]()
of the intron 3 and 5 of the Gpdh gene is ~3.0. Actually, the hypothesis of rate constancy over sites cannot be rejected for these intron sequences (P > 0.05, goodness of fit test). However, the above interpretation of
is valid only when the majority of amino acid sites are undergoing neutral evolution (i.e., no positive selection). When strong positive selection is acting on a large number of sites, µ tends to be high and
tends to be small because functionally most important sites remain invariant, whereas positively selected sites have very high substitution rates. So, in this case,
does not measure the functional conservation of a protein. For example, we found that the
value of the sperm lysin of 20 abalone species (![]()
. Here,
was estimated under the assumption that the mutation rate per amino acid is 2.25 times the mutation rate per nucleotide (![]()
It is intriguing that there are no dots sitting in the upper left corner in Figure 5 or the upper right corner in Figure 3. These are the locations where both the mean rate and the among-site rate variation are high, and where we expect to find proteins consisting of a number of conserved sites and a number of directionally selected sites. The simplest explanation would be that such genes are not included in our sample of 64 genes. Because our sample is random and is not small, the result suggests that those genes that are under strong positive selection are rare in the genome. Of course, we cannot distinguish neutral genes from those in which positive selection operates in a very small number of sites, because in the latter, the overall relationship between
and µ is mainly determined by the sites that are neutral or under purifying selection. In fact, there are no statistical methods that can effectively detect positive selection that only acts on one or several sites except in the case of convergent and parallel evolution (![]()
![]()
![]()
It is also interesting that no dots were found in the lower left corner of Figure 3, where both the mean rate and the rate variation are small. This suggests that even in the most conserved proteins it is unlikely that all sites are equally conserved. More probably, there are still a few nearly neutral sites so that CV becomes high (see Equation 13).
Other implications:
Accelerated evolution in some evolutionary lineages has been observed in several proteins (e.g., ![]()
![]()
![]()
![]()
![]()
![]()
may provide us some information, because when µ is high, high values of
are expected if functional constraint is reduced, but small values of
are likely if strong positive selection acts on a large number of sites (as in the case of the abalone sperm lysin).
Using the relationship between µ and
, let us examine an interesting case of fast evolution. ![]()
![]()
values of the SRY proteins of eight primate species (![]()
for the N and C domains, particularly the latter, suggest that the accelerated evolution in SRY is probably due to low functional significance out of the HMG domain rather than adaptation under strong positive selection. Actually, all clinical mutations in human SRY resulting in phenotypic sex reversal are found in the HMG domain (![]()
![]()
![]()
![]()
![]()
Ignoring rate variation among sites causes underestimation of distances between homologous proteins. For estimating dates of early divergences by the molecular clock, only slowly evolving proteins can be used. Because slowly evolving proteins tend to have high degrees of rate variation among sites, consideration of the rate variation and use of gamma distances seems to be necessary for such estimations. This has been highlighted in recent debates on the age of the most recent common ancestor of present-day organisms (![]()
![]()
![]()
![]()
![]()
| ACKNOWLEDGMENTS |
|---|
We are grateful to SUDHIR KUMAR and PATRICK PARKER for providing us aligned protein sequences of the nuclear genes. We thank SUDHIR KUMAR, MASATOSHI NEI, TANYA SITNIKOVA, CHUNG-I WU, and two anonymous reviewers for their comments on an earlier version of this article. This work was supported by the research grants from the National Institutes of Health and the National Science Foundation to M. NEI.
Manuscript received December 10, 1997; Accepted for publication April 7, 1998.
| LITERATURE CITED |
|---|
AHLBERG, P. E. and A. R. MILNER, 1994 The origin and early diversification of tetrapods. Nature 368:507-514.
BENTON, M. J., 1993 The Fossil Record, Vol. 2, Chapman and Hall, New York.
BROWN, W. M., M. GEORGE, JR., and A. C. WILSON, 1979 Rapid evolution of animal mitochondrial DNA. Proc. Natl. Acad. Sci. USA 76:1967-1971
DOOLITTLE, R. F., D.-F. FENG, S. TSANG, G. CHO, and E. LITTLE, 1996 Determining divergence times of the major kingdoms of living organisms with a protein clock. Science 271:470-477[Abstract].
DURET, L., D. MOUCHIROUD, and M. GOUY, 1994 HOVERGEN: a database of homologous vertebrate genes. Nucleic Acids Res. 22:2360-2365
FENG, D.-F., G. CHO, and R. F. DOOLITTLE, 1997 Determining divergence times with a protein clock: update and reevaluation. Proc. Natl. Acad. Sci. USA 94:13028-13033
GOLDING, G. B., 1983 Estimation of DNA and protein sequence divergence: an examination of some assumptions. Mol. Biol. Evol. 1:125-142[Abstract].
GOODFELLOW, P. N. and R. LOVELL-BADGE, 1993 SRY and sex determination in mammals. Annu. Rev. Genet. 27:71-92[Medline].
GU, X., 1997 The age of the common ancestor of eukaryotes and prokaryotes: statistical inferences. Mol. Biol. Evol. 14:861-866[Abstract].
GU, X. and J. ZHANG, 1997 A simple method for estimating the parameter of substitution rate variation among sites. Mol. Biol. Evol. 14:1106-1113[Abstract].
GU, X. and W.-H. LI, 1998 Estimation of evolutionary distances under stationary and nonstationary models of nucleotide substitution. Proc. Natl. Acad. Sci. USA 95:5899-5905
GU, X., Y.-X. FU, and W.-H. LI, 1995 Maximum likelihood estimation of the heterogeneity of substitution rate among nucleotide sites. Mol. Biol. Evol. 12:546-557[Abstract].
HASEGAWA, M., W. M. FITCH, J. P. GOGARTEN, L. OLENDZENSKI, and E. HILARIO et al., 1996 Dating the cenancestor of organisms. Science 274:1750-1753[Medline].
HAWKINS, J. R., A. TAYLOR, P. BERTA, J. LEVILLIERS, and B. VAN DER AUWERA et al., 1992 Mutational analysis of SRY: nonsense and missense mutations in XY sex reversal. Hum. Genet. 88:471-474[Medline].
HEDGES, S. B., P. H. PARKER, C. G. SIBLEY, and S. KUMAR, 1996 Continental breakup and the ordinal diversification of birds and mammals. Nature 381:226-229.
JANKE, A., X. XU, and U. ARNASON, 1997 The mitochondrial genome of the wallaroo (Macropus robustus) and the phylogenetic relationship among Monotremate, Marsupialia, and Eutheria. Proc. Natl. Acad. Sci. USA 94:1276-1281
KIMURA, M., 1979 Model of effectively neutral mutations in which selective constraint is incorporated. Proc. Natl. Acad. Sci. USA 76:3440-3444
KIMURA, M., 1983 The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge.
KIMURA, M. and T. OHTA, 1974 On some principles governing molecular evolution. Proc. Natl. Acad. Sci. USA 71:2848-2852
KUMAR, S., 1996 Patterns of nucleotide substitution in mitochondrial protein coding genes of vertebrates. Genetics 143:537-548[Abstract].
KWIATOWSKI, J., M. KRAWCZYK, M. JAWORSKI, D. SKARECKY, and F. J. AYALA, 1997 Erratic evolution of glycerol-3-phosphate dehydrogenase in Drosophila, Chymomyza, and Ceratitis.. J. Mol. Evol. 44:9-22[Medline].
LEE, Y.-H., T. OTA, and V. D. VACQUIER, 1995 Positive selection is a general phenomenon in the evolution of abalone sperm lysin. Mol. Biol. Evol. 12:231-238[Abstract].
LI, W.-H., 1997 Molecular Evolution. Sinauer Associates, Sunderland, MA.
LI, W.-H. and T. GOJOBORI, 1983 Rapid evolution of goat and sheep globin genes following gene duplication. Mol. Biol. Evol. 1:94-108[Abstract].
LI, W.-H., D. L. ELLSWORTH, J. KRUSHKAL, B. H.-J. CHANG, and D. HEWETT-EMMETT, 1996 Rates of nucleotide substitution in primates and rodents and the generation-time effect hypothesis. Mol. Phylogenet. Evol. 5:182-187[Medline].
MIYAMOTO, M. M. and W. M. FITCH, 1996 Constraints on protein evolution and the age of the eubacteria/eukaryote split. Syst. Biol. 45:568-575[Medline].
NEDBAL, M. A. and J. J. FLYNN, 1998 Do the combined effects of the asymmetric process of replication and DNA damage from oxygen radicals produce a mutation-rate signature in the mitochondrial genome? Mol. Biol. Evol. 15:219-223[Medline].
NEI, M., 1975 Molecular Population Genetics and Evolution. North-Holland Publishing Company, Amsterdam.
NEI, M., 1987 Molecular Evolutionary Genetics. Columbia University Press, New York.
NEI, M., R. CHAKRABORTY, and P. A. FUERST, 1976 Infinite allele model with varying mutation rate. Proc. Natl. Acad. Sci. USA 73:4164-4168
OHTA, T., 1992 The nearly neutral theory of molecular evolution. Annu. Rev. Ecol. Syst. 23:263-286.
OHTA, T., 1995 Synonymous and nonsynonymous substitutions in mammalian genes and the nearly neutral theory. J. Mol. Evol. 40:56-63[Medline].
OTA, T. and M. NEI, 1994 Estimation of the number of amino acid substitutions per site when the substitution rate varies among sites. J. Mol. Evol. 36:642-643.
PONTIGGIA, A., S. WHITFIELD, P. N. GOODFELLOW, R. LOVELL-BADGE, and M. E. BIANCHI, 1995 Evolutionary conservation in the DNA-binding and -bending properties of HMG-boxes from SRY proteins of primates. Gene 154:277-280[Medline].
SHARP, P. M., 1997 In search of molecular Darwinism. Nature 385:111-112[Medline].
STEWART, C.-B., J. W. SCHILLING, and A. C. WILSON, 1987 Adaptive evolution in the stomach lysozymes of foregut fermenters. Nature 330:401-404[Medline].
TAJIMA, T., J. NAKAE, N. SHINOHARA, and K. FUJIEDA, 1994 A novel mutation localized in the 3' non-HMG box region of the SRY gene in 46X, Y gonadal dysgenesis. Hum. Mol. Genet. 3:1187-1189
TAMURA, K. and M. NEI, 1993 Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in human and chimpanzees. Mol. Biol. Evol. 10:512-526[Abstract].
TUCKER, P. K. and B. L. LUNDRIGAN, 1993 Rapid evolution of the sex determining locus in Old World mice and rats. Nature 364:715-717[Medline].
UZZELL, T. and K. W. CORBIN, 1971 Fitting discrete probability distributions to evolutionary events. Science 172:1089-1096
WALLIS, M., 1993 Remarkable high rate of molecular evolution of ruminant placental lactogens. J. Mol. Evol. 37:86-88[Medline].
WALLIS, M., 1996 The molecular evolution of vertebrate growth hormones: a pattern of near-stasis interrupted by sustained bursts of rapid change. J. Mol. Evol. 43:93-100[Medline].
WERNER, M. H., J. R. HUTH, A. M. GRONENBORN, and G. M. CLORE, 1995 Molecular basis of human 46X, Y sex reversal revealed from the three-dimensional solution structure of the human SRY-DNA complex. Cell 81:705-714[Medline].
WHITFIELD, L. S., R. LOVELL-BADGE, and P. N. GOODFELLOW, 1993 Rapid sequence evolution of the mammalian sex-determining gene SRY. Nature 364:713-715[Medline].
WOLFE, K. H., P. M. SHARP, and W.-H. LI, 1989 Mutation rates differ among regions of the mammalian genome. Nature 337:283-285[Medline].
YANG, Z., 1994 Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J. Mol. Evol. 39:306-314[Medline].
ZHANG, J. and S. KUMAR, 1997 Detection of convergent and parallel evolution at the amino acid sequence level. Mol. Biol. Evol. 14:527-536[Abstract].
ZHANG, J. and M. NEI, 1997 Accuracies of ancestral amino acid sequences inferred by the parsimony, likelihood, and distance methods. J. Mol. Evol. 40(Suppl. 1):S139-S146.
ZHANG, J., H. F. ROSENBERG, and M. NEI, 1998 Positive Darwinian selection after gene duplication in primate ribonuclease genes. Proc. Natl. Acad. Sci. USA 95:3708-3713
- THIS ARTICLE
-
Abstract
- Full Text (PDF)
- Alert me when this article is cited
- Alert me if a correction is posted
- SERVICES
- Similar articles in this journal
- Similar articles in PubMed
- Alert me to new issues of the journal
- Download to citation manager
- Reprints & Permissions
- CITING ARTICLES
- Citing Articles via Google Scholar
- GOOGLE SCHOLAR
- Articles by Zhang, J.
- Articles by Gu, X.
- Search for Related Content
- PUBMED
- PubMed Citation
- Articles by Zhang, J.
- Articles by Gu, X.






















