Skip to main content
  • Facebook
  • Twitter
  • YouTube
  • LinkedIn
  • Google Plus
  • Other GSA Resources
    • Genetics Society of America
    • G3: Genes | Genomes | Genetics
    • Genes to Genomes: The GSA Blog
    • GSA Conferences
    • GeneticsCareers.org

Institution: Massachusetts Inst of Technol MIT Libs

  • Log in
Genetics

Main menu

  • HOME
  • ISSUES
    • Current Issue
    • Early Online
    • Archive
  • ABOUT
    • About the journal
    • Why publish with us?
    • Editorial board
    • Early Career Reviewers
    • Contact us
  • SERIES
    • Centennial
    • Genetics of Immunity
    • Genetics of Sex
    • Genomic Prediction
    • Multiparental Populations
    • FlyBook
    • WormBook
    • YeastBook
  • ARTICLE TYPES
    • About Article Types
    • Commentaries
    • Editorials
    • GSA Honors and Awards
    • Methods, Technology & Resources
    • Perspectives
    • Primers
    • Reviews
    • Toolbox Reviews
  • PUBLISH & REVIEW
    • Scope & publication policies
    • Submission & review process
    • Article types
    • Prepare your manuscript
    • Submit your manuscript
    • After acceptance
    • Guidelines for reviewers
  • SUBSCRIBE
    • Why subscribe?
    • For institutions
    • For individuals
    • Email alerts
    • RSS feeds
  • Other GSA Resources
    • Genetics Society of America
    • G3: Genes | Genomes | Genetics
    • Genes to Genomes: The GSA Blog
    • GSA Conferences
    • GeneticsCareers.org

User menu

  • Log out

Search

  • Advanced search
Genetics

Advanced Search

  • HOME
  • ISSUES
    • Current Issue
    • Early Online
    • Archive
  • ABOUT
    • About the journal
    • Why publish with us?
    • Editorial board
    • Early Career Reviewers
    • Contact us
  • SERIES
    • Centennial
    • Genetics of Immunity
    • Genetics of Sex
    • Genomic Prediction
    • Multiparental Populations
    • FlyBook
    • WormBook
    • YeastBook
  • ARTICLE TYPES
    • About Article Types
    • Commentaries
    • Editorials
    • GSA Honors and Awards
    • Methods, Technology & Resources
    • Perspectives
    • Primers
    • Reviews
    • Toolbox Reviews
  • PUBLISH & REVIEW
    • Scope & publication policies
    • Submission & review process
    • Article types
    • Prepare your manuscript
    • Submit your manuscript
    • After acceptance
    • Guidelines for reviewers
  • SUBSCRIBE
    • Why subscribe?
    • For institutions
    • For individuals
    • Email alerts
    • RSS feeds
Previous ArticleNext Article

The Age of Nonsynonymous and Synonymous Mutations in Animal mtDNA and Implications for the Mildly Deleterious Theory

Rasmus Nielsen and Daniel M. Weinreich
Genetics September 1, 1999 vol. 153 no. 1 497-506
Rasmus Nielsen
* Department of Organismic and Evolutionary Biology
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Daniel M. Weinreich
† Museum of Comparative Zoology, Harvard University, Cambridge, Massachusetts 02138
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • Article
  • Figures & Data
  • Info & Metrics
Loading

Abstract

McDonald/Kreitman tests performed on animal mtDNA consistently reveal significant deviations from strict neutrality in the direction of an excess number of polymorphic nonsynonymous sites, which is consistent with purifying selection acting on nonsynonymous sites. We show that under models of recurrent neutral and deleterious mutations, the mean age of segregating neutral mutations is greater than the mean age of segregating selected mutations, even in the absence of recombination. We develop a test of the hypothesis that the mean age of segregating synonymous mutations equals the mean age of segregating nonsynonymous mutations in a sample of DNA sequences. The power of this age-of-mutation test and the power of the McDonald/Kreitman test are explored by computer simulations. We apply the new test to 25 previously published mitochondrial data sets and find weak evidence for selection against nonsynonymous mutations.

ONE of the greatest standing controversies in evolutionary biology is the debate over the causes of molecular variation in natural populations (e.g., Kimura 1983; Gillespie 1991). The neutral theory asserts that the vast majority of variation is selectively neutral and when two or more alleles coexist in a population, natural selection cannot distinguish between the alleles. Under the neutral theory, allelic frequencies are determined by genetic drift. Similarly the theory holds that most allelic fixation events are the consequence of genetic drift and not of natural selection. In contrast, selective theories, including the mildly deleterious model (Ohta 1973), claim that a significant amount of the allelic variation and fixations in natural populations is determined by natural selection.

The McDonald/Kreitman test (McDonald and Kreitman 1991) analyzes the neutral assumption that the same processes determine allelic polymorphism and allelic fixation by comparing the number of synonymous and nonsynonymous sites segregating within and fixed between species in a 2 × 2 contingency table. Recently, neutrality has been rejected by the McDonald/Kreitman test in a number of mitochondrial DNA (mtDNA) data sets (Ballard and Kreitman 1994; Nachman et al. 1994, 1996; Randet al. 1994; Rand and Kann 1996, 1998; Nachman 1998; Hasegawaet al. 1998; Wiseet al. 1998). The direction of deviation from neutrality in each of these data sets is the same: an excess number of polymorphic nonsynonymous sites segregating within species relative to the number of nonsynonymous sites fixed between species. Mildly deleterious alleles may segregate in a population in low or intermediate frequencies but are not as likely to become fixed (Kimura 1983), and thus if some nonsynonymous mutations are mildly deleterious, we expect that they will segregate within populations but rarely reach fixation. This is the interpretation commonly given to the significant McDonald/Kreitman tests observed for animal mtDNA (Ballard and Kreitman 1994; Nachman et al. 1994, 1996; Nachman 1998; Rand and Kann 1996, 1998; Randet al. 1994; Hasegawaet al. 1998; Wiseet al. 1998).

We present a new method for testing the hypothesis that segregating nonsynonymous sites in animal mtDNA are mildly deleterious, based on a comparison of the age of segregating nonsynonymous and synonymous mutations. Throughout we assume that there is no recombination between adjacent nucleotide sites. This implies that nonsynonymous and synonymous sites share the same underlying gene genealogy. We first show that the mean age of a segregating deleterious mutation is less than the mean age of a segregating neutral mutation, even in the absence of recombination. Thus under the hypothesis that nonsynonymous mutations are mildly deleterious, the mean age of nonsynonymous segregating sites should be less than the mean age of synonymous segregating sites despite the fact that the underlying gene genealogy is identical for the two types of sites. We present a test for comparing the mean ages of two arbitrary classes of segregating sites (e.g., synonymous and nonsynonymous) in a sample and determine its power as a function of the strength of purifying selection. We then determine the power of the McDonald/Kreitman test to detect purifying selection. Finally we apply our test to a set of 25 previously published mtDNA data sets.

METHODS

Computer simulations: Throughout, computer simulations were used to test theoretical predictions and to examine the properties of the suggested statistical tests. In these simulations a haploid population of size N chromosomes was simulated under the Wright-Fisher model. To model the mutational process an infinite sites model was used (Watterson 1975). The mutational makeup of each individual chromosome and the generation in which each segregating mutation arose were stored in computer memory. In each generation the population of chromosomes underwent mutation, selection, and genetic drift. The probability of a new mutation occurring in a given generation on a chromosome is μ=μs +μn, where μs is the mutation rate of selected mutations and μn is the mutation rate of neutral mutations. Simple multiplicative fitness was employed; thus the relative fitness of the ith chromosome is given by wi=(1+s)ni∕∑j(1+s)nj, (1) where s is the selection coefficient of selected mutations and ni is the number of selected mutations carried by the ith chromosome. For the sake of simplicity we assume that s does not vary in time or among selected mutations and that there are no epistatic interactions. Selection together with genetic drift is simulated by sampling multinomially between generations among the k segregating chromosomes according to the probabilities specified by the wi, i = 1, 2,.., k given by (1). In general, samples were taken from the population every 10N generations after the first 100N generations. For each sample, some function of the sample was evaluated, such as the mean age of selected or neutral segregating sites or the P value under one of the suggested tests. Denote the obtained value of the evaluated function in the ith sample by fi. Assuming that stationary distributions exist for these functions, the theorem of ergodicity ensures that 1n∑i=1nfi→E(f)asn→∞, where E(f) is the expectation of the function f and n is the number of samples. To examine the power of the McDonald/Kreitman test, we also scored the number of fixed differences between the sequences in each sample and a single sequence sampled t generations in the past. Population sizes of N = 1000 chromosomes were used in the simulations throughout. Except where otherwise noted, the parameter values used in the simulations were chosen to fit the estimated values based on the analyzed data sets (see Data analysis); i.e., the number of sequences in a sample was set to n = 25, θ = 10 (θ = 4Nμ), and the level of intraspecific divergence was set to μt = 180. The ratio of nonsynonymous to synonymous mutations was set to μn/μs = 1, corresponding to assuming that ∼¾ of all new nonsynonymous mutations are immediately lethal to the organism.

Test statistics: We explored several tests of the hypothesis that the mean ages of segregating sites in two distinct classes of sites (e.g., neutral and selected) in a sample are equal. The basics of these tests are the same: a score is calculated for each segregating site. Denote the value of this score for the ith site by ωi. Then, we define the test statistic Ω=∑i(1−Ii)⋅∑iIiωi∑iIi⋅∑i(1−Ii)ωi, (2) where Ii is an indicator function that returns 0 if site i is a neutral site and 1 if site i is selected. Thus Ω is the ratio of the average of ω for selected sites to the average of ω for neutral sites. The null distribution of Ω under the hypothesis that the value of ωi is independent of the class of site i can be approximated by repeatedly recalculating Ω after random relabeling of sites such that the total number of selected sites remains constant and the total number of neutral sites remains constant. A total of 100,000 replicates of this resampling scheme were used to establish critical values of Ω. If we denote the ith value of Ω by Ωi and the observed value of Ω by Ωobs, then a test based on k resampled samples rejects at the 5% level if (1/k)Rki=1I(Ωobs≥Ωi) < 0.05, where I() is an indicator function that returns 1 if the Boolean condition is true and 0 otherwise. We refer to the statistic Ω as the age-of-mutation (AOM) statistic.

The power of such a test to detect differences in mean age between classes is obviously dependent on how strongly ωi is correlated with the age of the mutation responsible for polymorphic site i. The power of the test using the different estimators for ω was evaluated as described in Computer simulations. One obvious candidate for ω is an estimator of the age of a mutation under a neutral equilibrium infinite sites model suggested by Griffiths and Tavaré (1999). This estimator uses E(T | X) (where T is the age of a mutation and X is the observed data) as an estimator of T. E(T | X) is calculated by integrating over all possible gene genealogies (branch lengths and topologies) while weighting each genealogy by its likelihood. The integral is evaluated by the Monte Carlo method described in Griffiths and Tavaré (1995, 1999). This estimator may have desirable statistical properties because all the information in the data regarding T is applied in the estimation of T. However, application of this estimator is computationally very time consuming. Furthermore, in cases where the data do not conform to the infinite sites model, it is necessary to remove conflicting sites or conflicting sequences. For this reason, considerable effort was devoted to developing additional estimators that could be calculated fast and did not require elimination of any data.

Given only b, the observed number of chromosomes bearing a mutation in a sample of n chromosomes, the expected age (a) of the mutation under strict neutrality is given by E(a∣b)=2(n−1b)−1∑j=2n(n−jb−1)n−j+1n(j−1) (3) (Griffiths and Tavaré 1998). Results such as this, based on the marginal frequency of a mutation in a single site, can obviously be applied to estimate ages of individual mutations. However, they were in general found to perform much worse than estimators employing more of the data. Also, estimators based on neutrality such as E(T | X) may not perform well when the assumption of selective neutrality is violated. Instead, a heuristic estimator was devised on the basis of estimation of the structure of the underlying genealogy. In this method, an estimate of the underlying genealogy was obtained by the unweighted pair group method using arithmetic averages (UPGMA; Sokal and Michener 1958). Mutations were subsequently assigned to branches (edges) using Fitch and Wagner parsimony (e.g., Fitch 1971), and the expected age of a mutation was estimated as the average age of the branch on which the mutation occurred, i.e., the average of the time of the start of the branch and the time of the end of the branch. In cases where two or more mutations were needed to account for the observed site pattern, the average of the estimates obtained for the different assignments of the mutation was used as an estimator. We denote this estimator by â. Note that under the infinite sites model, â is the expected age of a mutation conditional on the estimated genealogy.

This estimator (â) is applied in addition to E(T | X) because it can be calculated quickly, making it suitable for simulation purposes, and because it does not require the elimination of data when the assumptions of the infinite sites model are violated. However, we do not suggest that this heuristically derived estimator should be used for more general purposes. It was chosen here solely because it provided more power in the test than estimators based on the marginal site pattern such as estimators derived from Equation 3.

Power analysis: The power of the test statistic given in Equation 2 and of the McDonald/Kreitman test to detect the presence of deleterious mutations was evaluated by computer simulations as described in Computer simulations for different values of the selection coefficient. The proportion of simulated data sets that gave significant test statistic values was tabulated for a range of values of Ns for θ = 10 and for a range of values of θ for Ns = –3.

McDonald/Kreitman tests (McDonald and Kreitman 1991) of simulated data were performed as a G-test in a 2 × 2 contingency table using the Williams correction for small sample size (see, for example, Sokal and Rohlf 1981). The numbers entering the contingency table were the number of fixed selected and neutral mutations and the number of segregating selected and neutral mutations in the sample. Empty cell entries were ignored when calculating the test statistic. This mimics a common method for performing the McDonald/Kreitman test and should not inflate the chosen significance level markedly under biologically reasonable parameter values.

Data analysis: The null hypothesis that we wish to test is that the average age of nonsynonymous mutations is the same as the average age of synonymous mutations. To analyze the data, all polymorphisms were partitioned into nonsynonymous and synonymous polymorphisms. To do this, the following heuristic approach was applied: the most common sequence was chosen as a reference sequence representing a hypothesized ancestral state. This was done only to establish an unambiguous criterion to determine if a mutation was synonymous or nonsynonymous. The identity of a variable site in a codon was determined by comparison to the homologous codon in the reference sequence. In this manner, all polymorphisms could be scored unambiguously as nonsynonymous or synonymous even when there are multiple segregating sites in a single codon. If three nucleotides were segregating in the same site, two mutations were inferred and represented as two variable sites. In most cases only one mutation was observed in each codon, suggesting that the effect of this heuristic approach does not seriously bias the analysis. Also, because the approach was applied blindly with regard to the class of the mutation, it should not inflate the chosen significance level in the resampling test.

For each data set, two resampling tests were performed, one using E(T | X) and one using the heuristically derived â as estimators of the age of a mutation. In each case, the data were resampled 100,000 times to provide a P value. When E(T | X) was applied, the Watterson (1975) estimator of θ was used to drive the simulations, and 100,000 genealogies (runs of the Markov chain) were sampled to provide estimates of the age of each mutation (see Griffiths and Tavaré 1995). Calculation of E(T | X) requires that the data strictly conform to the infinite sites model. For example, Harding et al. (1997) chose to eliminate apparent recombinant sequences. In the present analysis we instead eliminated sites to obtain a subset of sites that are consistent with the infinite sites model. The following heuristic algorithm was implemented for this purpose. Two diallelic sites are in conflict if all of the patterns in Table 1 are observed, where the labeling of state (0, 1) is arbitrary. Let Ci be the number of sites that conflict with site i. Then site i is eliminated from the data set when Ci = max{C1, C2,..., Ch}, where h is the number of sites. This elimination scheme is repeated until C1 = C2 =... = Ch = 0. The test based on E(T | X) was performed on these reduced data sets, whereas the test based on â was performed on the full data sets. Data sets so small that a significant result was not theoretically possible were not included in the analysis. Four data sets were eliminated for this reason.

View this table:
  • View inline
  • View popup
TABLE 1

Conflicting site patterns

Amino acid saturation dynamics: To examine the relative degree of mutational saturation in nonsynonymous and synonymous substitutions in mtDNA, the number of nonsynonymous and synonymous nucleotide differences between pairs of species was plotted for 13 mtDNA genes (NADH1, NADH2, NADH3, NADH4, NADH4L, NADH5, NADH6, ATPase 6, ATPase 8, COI, COII, COIII, Cyt. b). The species included in this analysis were Homo sapiens (Andersonet al. 1981); Pan paniscus, Pongo pygmaeus (Horaiet al. 1995); Mus musculus (Bibbet al. 1981); Gallus gallus (Desjardins and Morais 1990); and Drosophila melanogaster (Garesse 1988). Sequences were aligned by eye and nonsynonymous and synonymous nucleotide differences were counted as described in Data analysis.

RESULTS

Expected ages of selected and linked neutral segregating mutations: Several results regarding the ages of neutral mutations are known. For example, the age of a neutral mutation with frequency b in a sample of size n is given by Equation 3. However, selection will affect the underlying gene genealogy and thus the distribution of coalescence times in those genealogies (e.g., Kaplanet al. 1988; Hudson and Kaplan 1994) and will therefore affect the expected age of a sampled selected mutation and of linked neutral mutations. In addition, the age of a selected mutation may differ from the neutral expectation because selected mutations may not be distributed on the genealogy according to a Poisson process.

We are interested in the ratio of the age of selected to the age of neutral mutations in models without recombination. We show that in the models of interest, selection tends to reduce this ratio below unity and that this property of models with selection can be used to test neutrality. First, consider a haploid model in which there is only one selected site in the population and assume that there are only two possible states for this site (A and a). A haplotype carrying state A has fitness 1 and a haplotype carrying state a has fitness 1 + s. Furthermore, assume that mutations occur from state A to state a at rate υ and from state a to state A at rate υ. Assume that neutral mutations occur according to an infinite sites model at rate μ and that there is no recombination. Following Hudson and Kaplan (1994), we analyze this model in the limit of large population sizes so that the frequency of A is maintained at the equilibrium value. Under this assumption the mutation/selection model can be analyzed as a model of migration between two populations. Mutation between allele A and a is equivalent to migration between two isolated subpopulations with population sizes Nf(A) and Nf(a), respectively, where N is the population size and f(A) and f(a) are the frequencies of A and a, respectively, in the population. The equilibrium frequency of a for biologically plausible values of s and υ is given by f(a)=(s+sυ+2υ−(s+sυ+2υ)2−4sυ(1+s))/(2s) (4) (see, for example, Nagylaki 1992, pp. 9–13). The probability that an A allele arose by mutation in the previous generation is given by PaA=f(a)υf(a)υ+f(A)(1−υ) (5) and the probability that an a allele arose by mutation in the previous generation is given by PAa=f(A)υf(A)υ+f(a)(1−υ). (6) Equations 5 and 6 correspond to Equation 4 of Hudson and Kaplan (1994). Let T(A, a) be the expected coalescence time of a sample of size 2 given that one A allele and one a allele were obtained in the sample, T(2a) be the expected coalescence time of a sample of size 2 given that two a alleles were obtained in the sample, and T(2A) be the expected coalescence time of a sample of size 2 given that two A alleles were obtained in the sample. Assuming that only one mutation or coalescence event can happen in each generation, we can establish the following system of recursive equations for T(A, a), T(2A), and T(2a), T(2A)=1+T(A,a)2PaA2PaA+1∕(Nf(A))T(2a)=1+T(A,a)2PAa2PAa+1∕(Nf(a))T(A,a)=1+T(2a)PAa+T(2A)PaAPAa+PaA, (7) which can easily be solved for T(A, a), T(2A), and T(2a). Then the mean coalescence time can be found as T2=f(a)2T(2a)+f(A)2T(2A)+2f(A)f(a)T(2a) (8) (Hudson and Kaplan 1994). Using this result, the expected age of a segregating neutral mutation in a sample of size 2 is then simply given by T2/2. The expected age of a detectable selected mutation in a sample of size 2 is given by 1PaA+PAa. (9) We can therefore evaluate the ratio of the expected age of a detectable selected mutation to the expected age of a neutral mutation in a sample of size 2 (Figure 1). The absolute ratio of the ages in the figure does not approach 1.0 as Ns approaches 0 because it is a function of υ. However, note that the ratio is a monotonically decreasing function of s. The reason is that selection has only a marginal effect on the distribution of coalescence times but has a strong effect on the age of the selected mutation. That selection has only little effect on the expected coalescence time in a diallelic model at equilibrium has previously been noted by Neuhauser and Krone (1997). The model described above does not easily pertain to real data because of the assumption that the frequency of A is maintained at the equilibrium value and because only one selected site is considered. A more realistic model would assume an infinite sites model for both selected and neutral mutations and would allow the frequency of an allele to be a random variable through time. However, such a model does not lend itself to analytical treatment. Instead, we examined this model using computer simulations as described in methods (Figure 2). Note that the ratio of the mean age of a selected segregating mutation to the age of a segregating neutral mutation is a monotonically increasing function of the selection coefficient in the examined range of Ns and that this relationship extends to positive values of Ns. These results confirm the notion that the differences in the average age of nonsynonymous and synonymous mutations may be used to detect selection against slightly deleterious mutations. If a large fraction of nonsynonymous mutations is slightly deleterious, their average age should be less than the average age of synonymous mutations. However, if the excess of nonsynonymous mutations segregating in mtDNA within populations is caused by other forces such as balancing selection, one would not necessarily expect the age of nonsynonymous mutations to be less than the age of synonymous mutations. Indeed, balancing selection is expected to increase the age of the selected mutation relative to the neutral expectation.

Figure 1.
  • Download figure
  • Open in new tab
  • Download powerpoint
Figure 1.

—The ratio of the expected age of a detectable selected mutation to the age of a neutral mutation (age ratio) in the model described in the text with Nυ = 1.0.

Figure 2.
  • Download figure
  • Open in new tab
  • Download powerpoint
Figure 2.

—The ratio of the average age of nonsynonymous mutations to synonymous mutations in a sample when nonsynonymous mutations are under constant positive or negative selection. Parameter values were n = 25 (sample size), θ = 10, N = 1000, μs/μn = 1. A total of 1000 data sets were obtained to evaluate each point. Simulations were performed as described in methods.

Power analyses: The power of the McDonald/Kreitman test (McDonald and Kreitman 1991) has not received much attention in the literature, although Akashi (1999) recently explored its power under the assumption of free recombination between adjacent sites. We were interested in examining its power against the hypothesis of purifying selection in the absence of recombination. This is the case applicable to mitochondrial DNA. Such analysis may provide a crude estimate of the size of the selection coefficients required to explain the repeated observations of departures from neutrality in animal mtDNA (Ballard and Kreitman 1994; Nachman et al. 1994, 1996; Randet al. 1994; Rand and Kann 1996, 1998; Nachman 1998; Wiseet al. 1998). We examined the power of the test as described in methods (Figure 3) and found that, although it has a maximum at ∼Ns = –10, the test has power >50% over the range –2 > Ns > –40. Thus selection coefficients acting on nonsynonymous mutations in mtDNA should fall in this range if they are to account for the many observed deviations from neutrality.

Figure 3.
  • Download figure
  • Open in new tab
  • Download powerpoint
Figure 3.

—The power (the proportion of tests significant at the 5% level) of the McDonald/Kreitman test (a) and the test of the ratio of mean ages of selected to neutral mutations (b) under negative selection. The parameter values are as in Figure 2; μt = 180.

Having demonstrated that under purifying selection the mean age of segregating selected sites is less than the mean age of segregating neutral sites in a population (Figures 1 and 2), we next turned to the problem of comparing mean ages of selected and neutral sites segregating in a sample of chromosomes drawn from a population. We examined the power of the AOM test (Equation 2) when using â as an estimator of the age of mutations by computer simulations as described in methods and found it to increase as Ns is reduced from 0 to –10 with a maximum in the range –10 > Ns > –50 (Figure 3a). Power analysis of the AOM test using E(T | X) as an computaestimator of age of mutations by simulation is tionally too expensive to perform.

Data analysis:The hypothesis that the ages of nonsynonymous and synonymous polymorphisms are equal was tested for 25 previously published animal mtDNA data sets (Ballard and Kreitman 1994; Nachman et al. 1994, 1996; Rand and Kann 1996; Wiseet al. 1998; and references in Nachman 1998 and Rand and Kann 1998). The results of the tests are described in Table 2. Note that in only 6 cases using â as the estimator of relative age and in only 4 cases using E(T | X) as the estimator of age was the hypothesis of an equal average age of nonsynonymous and synonymous mutations rejected by the AOM test, whereas the McDonald/Kreitman test rejected neutrality in 14 of 25 cases.

DISCUSSION

Application of the McDonald/Kreitman test (McDonald and Kreitman 1991) to animal mtDNA has repeatedly demonstrated that the evolution of this genetic element is not consistent with strict neutrality (Ballard and Kreitman 1994; Nachman et al. 1994, 1996; Randet al. 1994; Rand and Kann 1996, 1998; Nachman 1998; Hasegawaet al. 1998; Wiseet al. 1998). Relative to the number of nonsynonymous sites fixed between species, these data sets all show an excess number of segregating nonsynonymous sites within species. Kimura (1983) has shown that deleterious alleles contribute more to (within-species) heterozygosity than to (between-species) divergence and thus, these observations may be consistent with the notion that most segregating synonymous mutations are roughly neutral but that most segregating nonsynonymous mutations are mildly deleterious. We set out to explicitly test this hypothesis. On the basis of theoretical predictions (Figures 1 and 2) we chose the ratio of the estimated mean age of nonsynonymous mutations to the estimated mean age of synonymous mutations as a test statistic. Two estimators of the age of a mutation were used to analyze published mtDNA data sets: E(T | X) and â (an estimator based on estimation of the underlying gene genealogy using classical phylogenetic methods). E(T | X) has the advantage of using all the information in a sample, but assumes that the data fit the infinite sites model. As shown in Table 2, most of the published data sets (18/25) do not fit this model. Furthermore, the calculation of E(T | X) is very time consuming, making it unsuitable for simulation purposes. This is the motivation for using â as an age estimator of the age of a mutation in addition to E(T | X).

However, note that the selective removal of sites when E(T | X) is used will not lead to falsely significant results. To realize this, note that under the null hypothesis of strict neutrality, both nonsynonymous and synonymous mutations should be distributed according to a Poisson process on the underlying shared gene genealogy. Because the distribution of the times of events in a Poisson process is given by the order statistics of a uniform random variable, the expected age of a mutation in a site is independent of the number of mutations occurring at the site. Thus the expected age of mutations in a data set will not change by the removal of sites with many mutations nor will the ratio of the expected ages of nonsynonymous to synonymous mutations change.

On the basis of an analysis of the power of the McDonald/Kreitman test (Figure 3a), we conclude that if purifying selection alone is responsible for the observed deviations from neutrality in animal mtDNA, selection coefficients must be in the range –2 > Ns > –40. It was surprising to us that the McDonald/Kreitman test retains power over such a wide range and that such large (negative) values of Ns may be responsible for the observations. Akashi (1999) found that little power remained for values of Ns <–10 and that essentially no power remained for values of Ns <–20. The discrepancy between the results can presumably be explained by the effect of linkage, which is known to reduce the efficacy of selection (e.g., Hill and Robertson 1966).

View this table:
  • View inline
  • View popup
TABLE 2

The results of the test of equal age of nonsynonymous and synonymous segregating mutations

The AOM test based on â has less power to reject the hypothesis of neutrality across most of the ranges of Ns examined (Figure 3b) than the McDonald/Kreitman test. Power rises as Ns is reduced from –1 to –10 after which it appears to reach a plateau of ∼65%. It is not surprising that the McDonald/Kreitman test has more power in most of the parameter space because this test employs more data (interspecific comparisons in addition to intraspecific comparisons) than the test based on â. However, the new test based on â uses information in the data not employed by the McDonald/Kreitman test, and it is able to distinguish between hypotheses of balancing selection and selection on deleterious mutations. It should therefore provide additional information regarding selection coefficients when applied in addition to the McDonald/Kreitman test. Another advantage of the new test is that it is quite robust to the model assumptions. Under strict neutrality, the distribution of site patterns in nonsynonymous sites and in synonymous sites should be identical. If the weights assigned to different site patterns are poorly chosen, this will lead to a reduction in power but not to an inflation of the chosen significance level. This is also true even if the population is not at stationarity because of the decoupling of the mutational and the genealogical processes under neutrality.

Six out of 25 tests were significant using â as an estimator of the age of an allele (Table 2). The probability of randomly obtaining 6 or more significant results in 25 tests at the 5% significance level is P = 0.0012. One of the data sets (Mesomys) has a P value so low (P < 0.001) that the null hypothesis of no difference in age of synonymous and nonsynonymous mutations for all data sets can be rejected by a Bonferroni test. It is curious to note that all six significant data sets contain a relatively large number of segregating sites; indeed three of the six are the three largest data sets shown in Table 2 with respect to number of segregating sites. This may suggest that the small number of significant tests is caused by a lack of statistical power when only few polymorphic sites are available. In contrast there appears to be no bias among significant data sets with respect to number of chromosomes examined. Also, in 17/25 cases, the estimated mean age of nonsynonymous mutations is less than the estimated mean age of synonymous substitution (not shown). Together, these results provide weak support for the hypothesis that the average age of nonsynonymous mutations is less than the average age of synonymous mutations.

To test the hypothesis that the small number of significant AOM results in Table 2 could be the consequence of small sample size (polymorphic sites), additional power analyses of both the McDonald/Kreitman and the AOM tests were performed in which Ns was held constant and θ varied (Figure 4). Indeed the reduction in power when θ is small is somewhat more severe for the AOM test than for the McDonald/Kreitman test.

Supposing that purifying selection is acting on amino acid replacement mutations, the next question is which range of values of Ns are consistent with the observations from both the McDonald/Kreitman test and the AOM test presented here. Figure 5 (solid bars) shows the expected frequency distributions of P values for the test based on â derived from simulation with Ns = 3. Note that Ns = 3 is near the bottom of the power curve of the AOM test (Figure 3b); however, if Ns is larger, the expected distribution would be even more skewed toward low P values. In contrast the frequency distribution of P values observed in the 25 data sets is nearly uniform (Figure 5, hatched bars). It appears that Ns must be <3 to account for the distribution of P values obtained in the test based on â. However, if selection against deleterious mutations is to explain the many significant McDonald/Kreitman tests, Ns should be >2. There is therefore only a small window of values of Ns that are consistent with the findings in this article. Given the variability of population sizes in naturally occurring populations, it is difficult to explain why all values of Ns should fall in this range.

Figure 4.
  • Download figure
  • Open in new tab
  • Download powerpoint
Figure 4.

—The power (the proportion of tests significant at the 5% level) of the two tests (□, McDonald/Kreitman test; ▵, AOM test) when Ns = 3 and θ is varied. The remaining parameters are as in Figure 2.

Figure 5.
  • Download figure
  • Open in new tab
  • Download powerpoint
Figure 5.

—The observed distribution of P values (solid bars) for the 25 data sets and the expected distribution (hatched bars) for Ns = 3 and the same parameter values as in Figure 2.

Figure 6.
  • Download figure
  • Open in new tab
  • Download powerpoint
Figure 6.

—Observed nonsynonymous differences per site (dn) and synonymous differences per site (ds) in pairwise comparisons for multiple mtDNA-encoded proteins.

The model applied to describe selection against deleterious mutations is very simple in that it assumes that all nonlethal nonsynonymous mutations have the same selection coefficient and that this selection coefficient is constant in time. Likewise, it is assumed in the simulations that the populations are in stationarity, i.e., a constant population size and no population subdivision. Although there is no reason to assume that a more complex demographic model or a more complex distribution of negative selection coefficients would change the relative power of the McDonald/Kreitman test vs. the AOM test, we cannot rule out this possibility.

If purifying selection alone is not responsible for the repeated observations of deviation from neutrality, what other processes are consistent with the data? One obvious explanation is that the significant results partly are artifacts caused by the simple methods used to estimate the number of nonsynonymous and synonymous mutations (e.g., Hasegawaet al. 1998). Such an explanation would require that saturation occur faster in nonsynonymous than in synonymous sites, i.e., the number of nonsynonymous nucleotide differences should be a convex function of the number of synonymous nucleotide differences (Nielsen 1997). To examine this hypothesis, we plotted the number of nonsynonymous nucleotide differences between pairs of species against the number of synonymous nucleotide differences for several mitochondrial genes (Figure 6). As expected, saturation occurs faster in synonymous sites than in nonsynonymous sites. Thus the use of nucleotide differences instead of the estimated number of substitutions in the McDonald/Kreitman test does not appear to be responsible for the apparent pattern of deviation from neutrality in mtDNA evolution. In fact, saturation by multiple hits should make the McDonald/Kreitman test more conservative against the alternative of segregating deleterious mutations.

Another problem is sequencing errors. Such errors could inflate the significance level of the McDonald/Kreitman test because they would tend preferentially to elevate the number of intraspecific nonsynonymous mutations. Likewise, sequencing errors could be of importance in explaining the significant results from the AOM test presented in this article. However, sequencing errors should also elevate the ratio of nonsynonymous mutations to synonymous mutations for small levels of divergence in interspecific data. Because no such pattern is observed in Figure 6, we do not believe that this explanation is likely. Moreover, elevated intraspecific nonsynonymous mutations have not been observed in nuclear data sets (McDonald and Kreitman 1991; D. M. Weinreich and D. Rand, unpublished results). However, we cannot conclusively rule out that sequencing errors may be of some importance in explaining the large number of significant McDonald/Kreitman tests.

Theoretically, balancing selection acting on mtDNA could maintain levels of nonsynonymous polymorphism within species above that seen between species (Nachmanet al. 1994; Rand and Kann 1996). Balancing selection could act through cytoplasmic-nuclear interaction (McRae and Anderson 1988; Babcock and Asmussen 1996; Goodisman and Asmussen 1997). However, it is difficult to imagine that this type of selection can act on sufficiently many nucleotide sites simultaneously to account for the observed results. Indeed, selection against slightly deleterious mutations appears at the moment to be the most plausible explanation for the significant results of the McDonald/Kreitman test in animal mtDNA. However, a combination of forces cannot be excluded.

Acknowledgments

This study was supported by a fellowship to R.N. from the Danish Research Council. David Rand and Michael Nachman provided several animal mtDNA data sets prior to publication.

Footnotes

  • Communicating editor: A. G. Clark

  • Received December 20, 1998.
  • Accepted June 2, 1999.
  • Copyright © 1999 by the Genetics Society of America

LITERATURE CITED

  1. ↵
    1. Akashi H.
    , 1999 Inferring the fitness effects of DNA mutations from polymorphism and divergence data: statistical power to detect directional selection under stationarity and free recombination. Genetics 151: 221–238.
    OpenUrlAbstract/FREE Full Text
  2. ↵
    1. Anderson S.,
    2. Bankier A. T.,
    3. Barrell B. G.,
    4. deBruijn M. H. L.,
    5. Coulson A. R.,
    6. et al
    ., 1981 Sequence and organization of the human mitochondrial genome. Nature 290: 457–465.
    OpenUrlCrossRefPubMedWeb of Science
  3. ↵
    1. Babcock D. S.,
    2. Asmussen M. A.
    , 1996 Effects of differential selection in the sexes on cytonuclear polymorphism and disequilibria. Genetics 144: 839–853.
    OpenUrlAbstract/FREE Full Text
  4. ↵
    1. Baker R. J.,
    2. VanDenBussche R. A.,
    3. Wright A. J.,
    4. Wiggins L. E.,
    5. Hamilton M. J.,
    6. et al
    ., 1996 High levels of genetic change on rodents of Chernobyl. Nature 380: 707–708.
    OpenUrlCrossRefPubMed
  5. ↵
    1. Ballard J. W. O.,
    2. Kreitman M.
    , 1994 Unraveling selection in the mitochondrial genome of Drosophila. Genetics 138: 875–888.
    OpenUrl
  6. ↵
    1. Bibb M. J.,
    2. Van Etten R. A.,
    3. Wright C. T.,
    4. Walberg M. W.,
    5. Clayton D. A.
    , 1981 Sequence and gene organization of mouse mitochondrial DNA. Cell 26: 167–180.
    OpenUrlCrossRefPubMedWeb of Science
  7. ↵
    1. Bruna E. M.,
    2. Fisher R. N.,
    3. Case T. J.
    , 1996 Morphological and genetic evolution is decoupled in Pacific skinks (Squamata: Scinidae: Emoia). Proc. R. Soc. Lond. Ser. B. 263: 681–688.
    OpenUrlAbstract/FREE Full Text
  8. ↵
    1. Desjardins P.,
    2. Morais R.
    , 1990 Sequence and gene organization of the chicken mitochondrial genome. J. Mol. Biol. 212: 599–634.
    OpenUrlCrossRefPubMedWeb of Science
  9. ↵
    1. Fitch W. M.
    , 1971 Toward defining the course of evolution: minimum change for a specific tree topology. Syst. Zool. 20: 406–416.
    OpenUrlCrossRefWeb of Science
  10. ↵
    1. Garesse R.
    , 1988 Drosophila melanogaster mitochondrial DNA: gene organization and evolutionary considerations. Genetics 118: 649–663.
    OpenUrlAbstract/FREE Full Text
  11. ↵
    1. Gillespie J. H.
    , 1991 The Causes of Molecular Evolution. Oxford University Press, Oxford.
  12. ↵
    1. Goodisman M. A. D.,
    2. Asmussen M. A.
    , 1997 Cytonuclear theory for haplodiploid species and X-linked genes. I. Hardy-Weinburg dynamics and continent-island, hybrid zone models. Genetics 147: 321–338.
    OpenUrlAbstract/FREE Full Text
  13. ↵
    1. Griffiths R. C.,
    2. Tavaré S.
    , 1995 Unrooted genealogical tree probabilities in the infinitely-many-sites model. Math. Biosci. 127: 77–98.
    OpenUrlCrossRefPubMedWeb of Science
  14. ↵
    1. Griffiths R. C.,
    2. Tavaré S.
    , 1998 The age of a mutation in a general coalescent tree. Stoch. Model. 14: 273–295.
  15. ↵
    1. Griffiths R. C.,
    2. Tavaré S.
    , 1999 The ages of mutations in gene trees. Ann. Appl. Prob. (in press).
  16. ↵
    1. Harding R. M.,
    2. Fullerton S. M.,
    3. Griffiths R. C.,
    4. Bond J.,
    5. Cox M. J.,
    6. et al
    ., 1997 Archaic African and Asian lineages in the genetic ancestry of modern humans. Am. J. Hum. Genet. 60: 772–789.
    OpenUrlPubMedWeb of Science
  17. ↵
    1. Hasegawa M.,
    2. Cao Y.,
    3. Yang Z.
    , 1998 Preponderance of slightly deleterious polymorphism in mitochondrial DNA: nonsynonymous/synonymous rate ratio is much higher within species than between species. Mol. Biol. Evol. 15: 1499–1505.
    OpenUrlFREE Full Text
  18. ↵
    1. Hill W. G.,
    2. Robertson A.
    , 1966 The effect of linkage on the limits to artificial selection. Genet. Res 8: 269–294.
    OpenUrlCrossRefPubMedWeb of Science
  19. ↵
    1. Horai S.,
    2. Hayasaka K.,
    3. Kondo R.,
    4. Tsugane K.,
    5. Takahata N.
    , 1995 Recent African origin of modern humans revealed by complete sequences of hominoid mitochondrial DNAs. Proc. Natl. Acad. Sci. USA 92: 532–536.
    OpenUrlAbstract/FREE Full Text
  20. ↵
    1. Golding B.
    1. Hudson R. R.,
    2. Kaplan N. L.
    , 1994 Gene trees with background selection, pp. 140–153 in Non-Neutral Evolution, edited by Golding B.. Chapman and Hall, New York.
  21. ↵
    1. Kaplan N. L.,
    2. Darden T.,
    3. Hudson R. R.
    , 1988 The coalescent in models with selection. Genetics 120: 819–829.
    OpenUrlAbstract/FREE Full Text
  22. ↵
    1. Kimura M.
    , 1983 The Neutral Theory of Molecular Revolution. Cambridge University Press, Cambridge, United Kingdom.
  23. ↵
    1. McDonald J. H.,
    2. Kreitman M.
    , 1991 Adaptive evolution in the ADH locus in Drosophila. Nature 351: 652–654.
    OpenUrlCrossRefPubMedWeb of Science
  24. ↵
    1. McRae A. F.,
    2. Anderson W. W.
    , 1988 Evidence for non-neutrality of mitochondrial DNA haplotypes in Drosophila pseudoobscura. Genetics 120: 485–494.
    OpenUrlAbstract/FREE Full Text
  25. ↵
    1. Moritz C.,
    2. Schneider C. J.,
    3. Wake D. B.
    , 1992 Evolutionary relationships within the Ensatina eschscholtzii complex confirm the ring species interpretation. Syst. Biol. 41: 273–291.
    OpenUrlCrossRef
  26. ↵
    1. Nachman M. W.
    , 1998 Deleterious mutations in animal mitochondrial DNA. Genetica 102/103: 61–69.
  27. ↵
    1. Nachman M. W.,
    2. Boyer S. N.,
    3. Aquadro C. F.
    , 1994 Non-neutral evolution at the mitochondrial NADH dehydrogenase subunit 3 gene in mice. Proc. Natl. Acad. Sci. USA 91: 6364–6368.
    OpenUrlAbstract/FREE Full Text
  28. ↵
    1. Nachman M. W.,
    2. Brown W. M.,
    3. Stoneking M.,
    4. Aquadro C. F.
    , 1996 Nonneutral mitochondrial DNA variation in humans and chimpanzees. Genetics 142: 953–963.
    OpenUrlAbstract/FREE Full Text
  29. ↵
    1. Nagylaki T.
    , 1992 Introduction to Theoretical Population Genetics. Springer-Verlag, Berlin.
  30. ↵
    1. Neuhauser C.,
    2. Krone S. M.
    , 1997 The genealogy of samples in models with selection. Genetics 145: 519–534.
    OpenUrlAbstract/FREE Full Text
  31. ↵
    1. Nielsen R.
    , 1997 The ratio of replacement to silent divergence and tests of neutrality. J. Evol. Biol. 10: 217–231.
    OpenUrlCrossRef
  32. ↵
    1. Ohta T.
    , 1973 Slightly deleterious mutant substitutions in evolution. Nature 246: 96–98.
    OpenUrlCrossRefPubMedWeb of Science
  33. ↵
    1. Rand D. M.,
    2. Kann L.
    , 1996 Excess amino acid polymorphism in mitochondrial DNA: contrasts among genes from Drosophila, mice and humans. Mol. Biol. Evol. 13: 735–748.
    OpenUrlAbstract
  34. ↵
    1. Rand D. M.,
    2. Kann L.
    , 1998 Mutation and selection at silent and replacement sites in the evolution of animal mitochondrial DNA. Genetica 102/103: 393–407.
  35. ↵
    1. Rand D. M.,
    2. Dorfsman M.,
    3. Kann L. M.
    , 1994 Neutral and nonneutral evolution of Drosophila mitochondrial DNA. Genetics 138: 741–756.
    OpenUrlAbstract/FREE Full Text
  36. ↵
    1. Ruvolo M.,
    2. Pan D.,
    3. Zehr S.,
    4. Goldberg T.,
    5. Disotell T. R.,
    6. et al
    ., 1994 Gene trees and hominoid phylogeny. Proc. Natl. Acad. Sci. USA 91: 8900–8904.
    OpenUrlAbstract/FREE Full Text
  37. ↵
    1. Sokal R. R.,
    2. Michener C. D.
    , 1958 A statistical method for evaluating systematic relationship. Univ. Kansas Sci. Bull. 28: 1409–1438.
    OpenUrl
  38. ↵
    1. Sokal R. R.,
    2. Rohlf F. J.
    , 1981 Biometry. W. H. Freeman and Company, New York.
  39. ↵
    1. Summers K.,
    2. Bermingham E.,
    3. Weigt L.,
    4. McCafferty S.,
    5. Dahlstrom L.
    , 1997 Phenotypic and genetic divergence in three species of dart-poison frogs with contrasting parental behavior. J. Hered. 88: 8–13.
    OpenUrlAbstract/FREE Full Text
  40. ↵
    1. Watterson G. A.
    , 1975 On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7: 256–276.
    OpenUrlCrossRefPubMedWeb of Science
  41. ↵
    1. Wise C. A.,
    2. Micheala S.,
    3. Easteal S.
    , 1998 Departure from neutrality at the mitochondrial NADH dehydrogenase subunit 2 gene in humans, but not in chimpanzees. Genetics 148: 409–421.
    OpenUrlAbstract/FREE Full Text
  42. ↵
    1. Wood T. C.,
    2. Krajewski C.
    , 1996 Mitochondrial DNA sequence variation among the subspecies of Sarus Crane (Grus antigone). Auk 113: 655–663.
    OpenUrl
  43. ↵
    1. Zink R. M.,
    2. Blackwell R. C.
    , 1996 Patterns of allozyme, mitochondrial DNA, and morphometric variation in four sparrow genera. Auk 113: 59–67.
    OpenUrlCrossRefWeb of Science
View Abstract
Previous ArticleNext Article
Back to top

PUBLICATION INFORMATION

Volume 153 Issue 1, September 1999

Genetics: 153 (1)

ARTICLE CLASSIFICATION

INVESTIGATIONS
View this article with LENS
Email

Thank you for sharing this Genetics article.

NOTE: We request your email address only to inform the recipient that it was you who recommended this article, and that it is not junk mail. We do not retain these email addresses.

Enter multiple addresses on separate lines or separate them with commas.
The Age of Nonsynonymous and Synonymous Mutations in Animal mtDNA and Implications for the Mildly Deleterious Theory
(Your Name) has forwarded a page to you from Genetics
(Your Name) thought you would be interested in this article in Genetics.
Print
Alerts
Enter your email below to set up alert notifications for new article, or to manage your existing alerts.
SIGN UP OR SIGN IN WITH YOUR EMAIL
View PDF
Share

The Age of Nonsynonymous and Synonymous Mutations in Animal mtDNA and Implications for the Mildly Deleterious Theory

Rasmus Nielsen and Daniel M. Weinreich
Genetics September 1, 1999 vol. 153 no. 1 497-506
Rasmus Nielsen
* Department of Organismic and Evolutionary Biology
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Daniel M. Weinreich
† Museum of Comparative Zoology, Harvard University, Cambridge, Massachusetts 02138
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
del.icio.us logo Digg logo Reddit logo Twitter logo CiteULike logo Facebook logo Google logo Mendeley logo
Citation

The Age of Nonsynonymous and Synonymous Mutations in Animal mtDNA and Implications for the Mildly Deleterious Theory

Rasmus Nielsen and Daniel M. Weinreich
Genetics September 1, 1999 vol. 153 no. 1 497-506
Rasmus Nielsen
* Department of Organismic and Evolutionary Biology
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Daniel M. Weinreich
† Museum of Comparative Zoology, Harvard University, Cambridge, Massachusetts 02138
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site

Citation Manager Formats

  • BibTeX
  • Bookends
  • EasyBib
  • EndNote (tagged)
  • EndNote 8 (xml)
  • Medlars
  • Mendeley
  • Papers
  • RefWorks Tagged
  • Ref Manager
  • RIS
  • Zotero

Related Articles

Cited By

More in this TOC Section

  • Fine-Grained Analysis of Spontaneous Mutation Spectrum and Frequency in Arabidopsis thaliana
  • Mms21: A Putative SUMO E3 Ligase in Candida albicans That Negatively Regulates Invasiveness and Filamentation, and Is Required for the Genotoxic and Cellular Stress Response
  • Glucose Signaling Is Connected to Chromosome Segregation Through Protein Kinase A Phosphorylation of the Dam1 Kinetochore Subunit in Saccharomyces cerevisiae
Show more Investigations
  • Top
  • Article
    • Abstract
    • METHODS
    • RESULTS
    • DISCUSSION
    • Acknowledgments
    • Footnotes
    • LITERATURE CITED
  • Figures & Data
  • Info & Metrics

GSA

The Genetics Society of America (GSA), founded in 1931, is the professional membership organization for scientific researchers and educators in the field of genetics. Our members work to advance knowledge in the basic mechanisms of inheritance, from the molecular to the population level.

Online ISSN: 1943-2631

  • For Authors
  • For Reviewers
  • For Subscribers
  • Submit a Manuscript
  • Editorial Board
  • Press Releases

SPPA Logo

GET CONNECTED

RSS  Subscribe with RSS.

email  Subscribe via email. Sign up to receive alert notifications of new articles.

  • Facebook
  • Twitter
  • YouTube
  • LinkedIn
  • Google Plus

Copyright © 2019 by the Genetics Society of America

  • About GENETICS
  • Terms of use
  • Advertising
  • Permissions
  • Contact us
  • International access