Abstract
McDonald/Kreitman tests performed on animal mtDNA consistently reveal significant deviations from strict neutrality in the direction of an excess number of polymorphic nonsynonymous sites, which is consistent with purifying selection acting on nonsynonymous sites. We show that under models of recurrent neutral and deleterious mutations, the mean age of segregating neutral mutations is greater than the mean age of segregating selected mutations, even in the absence of recombination. We develop a test of the hypothesis that the mean age of segregating synonymous mutations equals the mean age of segregating nonsynonymous mutations in a sample of DNA sequences. The power of this ageofmutation test and the power of the McDonald/Kreitman test are explored by computer simulations. We apply the new test to 25 previously published mitochondrial data sets and find weak evidence for selection against nonsynonymous mutations.
ONE of the greatest standing controversies in evolutionary biology is the debate over the causes of molecular variation in natural populations (e.g., Kimura 1983; Gillespie 1991). The neutral theory asserts that the vast majority of variation is selectively neutral and when two or more alleles coexist in a population, natural selection cannot distinguish between the alleles. Under the neutral theory, allelic frequencies are determined by genetic drift. Similarly the theory holds that most allelic fixation events are the consequence of genetic drift and not of natural selection. In contrast, selective theories, including the mildly deleterious model (Ohta 1973), claim that a significant amount of the allelic variation and fixations in natural populations is determined by natural selection.
The McDonald/Kreitman test (McDonald and Kreitman 1991) analyzes the neutral assumption that the same processes determine allelic polymorphism and allelic fixation by comparing the number of synonymous and nonsynonymous sites segregating within and fixed between species in a 2 × 2 contingency table. Recently, neutrality has been rejected by the McDonald/Kreitman test in a number of mitochondrial DNA (mtDNA) data sets (Ballard and Kreitman 1994; Nachman et al. 1994, 1996; Randet al. 1994; Rand and Kann 1996, 1998; Nachman 1998; Hasegawaet al. 1998; Wiseet al. 1998). The direction of deviation from neutrality in each of these data sets is the same: an excess number of polymorphic nonsynonymous sites segregating within species relative to the number of nonsynonymous sites fixed between species. Mildly deleterious alleles may segregate in a population in low or intermediate frequencies but are not as likely to become fixed (Kimura 1983), and thus if some nonsynonymous mutations are mildly deleterious, we expect that they will segregate within populations but rarely reach fixation. This is the interpretation commonly given to the significant McDonald/Kreitman tests observed for animal mtDNA (Ballard and Kreitman 1994; Nachman et al. 1994, 1996; Nachman 1998; Rand and Kann 1996, 1998; Randet al. 1994; Hasegawaet al. 1998; Wiseet al. 1998).
We present a new method for testing the hypothesis that segregating nonsynonymous sites in animal mtDNA are mildly deleterious, based on a comparison of the age of segregating nonsynonymous and synonymous mutations. Throughout we assume that there is no recombination between adjacent nucleotide sites. This implies that nonsynonymous and synonymous sites share the same underlying gene genealogy. We first show that the mean age of a segregating deleterious mutation is less than the mean age of a segregating neutral mutation, even in the absence of recombination. Thus under the hypothesis that nonsynonymous mutations are mildly deleterious, the mean age of nonsynonymous segregating sites should be less than the mean age of synonymous segregating sites despite the fact that the underlying gene genealogy is identical for the two types of sites. We present a test for comparing the mean ages of two arbitrary classes of segregating sites (e.g., synonymous and nonsynonymous) in a sample and determine its power as a function of the strength of purifying selection. We then determine the power of the McDonald/Kreitman test to detect purifying selection. Finally we apply our test to a set of 25 previously published mtDNA data sets.
METHODS
Computer simulations: Throughout, computer simulations were used to test theoretical predictions and to examine the properties of the suggested statistical tests. In these simulations a haploid population of size N chromosomes was simulated under the WrightFisher model. To model the mutational process an infinite sites model was used (Watterson 1975). The mutational makeup of each individual chromosome and the generation in which each segregating mutation arose were stored in computer memory. In each generation the population of chromosomes underwent mutation, selection, and genetic drift. The probability of a new mutation occurring in a given generation on a chromosome is μ=μ_{s} +μ_{n}, where μ_{s} is the mutation rate of selected mutations and μ_{n} is the mutation rate of neutral mutations. Simple multiplicative fitness was employed; thus the relative fitness of the ith chromosome is given by
Test statistics: We explored several tests of the hypothesis that the mean ages of segregating sites in two distinct classes of sites (e.g., neutral and selected) in a sample are equal. The basics of these tests are the same: a score is calculated for each segregating site. Denote the value of this score for the ith site by ω_{i}. Then, we define the test statistic
The power of such a test to detect differences in mean age between classes is obviously dependent on how strongly ω_{i} is correlated with the age of the mutation responsible for polymorphic site i. The power of the test using the different estimators for ω was evaluated as described in Computer simulations. One obvious candidate for ω is an estimator of the age of a mutation under a neutral equilibrium infinite sites model suggested by Griffiths and Tavaré (1999). This estimator uses E(T  X) (where T is the age of a mutation and X is the observed data) as an estimator of T. E(T  X) is calculated by integrating over all possible gene genealogies (branch lengths and topologies) while weighting each genealogy by its likelihood. The integral is evaluated by the Monte Carlo method described in Griffiths and Tavaré (1995, 1999). This estimator may have desirable statistical properties because all the information in the data regarding T is applied in the estimation of T. However, application of this estimator is computationally very time consuming. Furthermore, in cases where the data do not conform to the infinite sites model, it is necessary to remove conflicting sites or conflicting sequences. For this reason, considerable effort was devoted to developing additional estimators that could be calculated fast and did not require elimination of any data.
Given only b, the observed number of chromosomes bearing a mutation in a sample of n chromosomes, the expected age (a) of the mutation under strict neutrality is given by
This estimator (â) is applied in addition to E(T  X) because it can be calculated quickly, making it suitable for simulation purposes, and because it does not require the elimination of data when the assumptions of the infinite sites model are violated. However, we do not suggest that this heuristically derived estimator should be used for more general purposes. It was chosen here solely because it provided more power in the test than estimators based on the marginal site pattern such as estimators derived from Equation 3.
Power analysis: The power of the test statistic given in Equation 2 and of the McDonald/Kreitman test to detect the presence of deleterious mutations was evaluated by computer simulations as described in Computer simulations for different values of the selection coefficient. The proportion of simulated data sets that gave significant test statistic values was tabulated for a range of values of Ns for θ = 10 and for a range of values of θ for Ns = –3.
McDonald/Kreitman tests (McDonald and Kreitman 1991) of simulated data were performed as a Gtest in a 2 × 2 contingency table using the Williams correction for small sample size (see, for example, Sokal and Rohlf 1981). The numbers entering the contingency table were the number of fixed selected and neutral mutations and the number of segregating selected and neutral mutations in the sample. Empty cell entries were ignored when calculating the test statistic. This mimics a common method for performing the McDonald/Kreitman test and should not inflate the chosen significance level markedly under biologically reasonable parameter values.
Data analysis: The null hypothesis that we wish to test is that the average age of nonsynonymous mutations is the same as the average age of synonymous mutations. To analyze the data, all polymorphisms were partitioned into nonsynonymous and synonymous polymorphisms. To do this, the following heuristic approach was applied: the most common sequence was chosen as a reference sequence representing a hypothesized ancestral state. This was done only to establish an unambiguous criterion to determine if a mutation was synonymous or nonsynonymous. The identity of a variable site in a codon was determined by comparison to the homologous codon in the reference sequence. In this manner, all polymorphisms could be scored unambiguously as nonsynonymous or synonymous even when there are multiple segregating sites in a single codon. If three nucleotides were segregating in the same site, two mutations were inferred and represented as two variable sites. In most cases only one mutation was observed in each codon, suggesting that the effect of this heuristic approach does not seriously bias the analysis. Also, because the approach was applied blindly with regard to the class of the mutation, it should not inflate the chosen significance level in the resampling test.
For each data set, two resampling tests were performed, one using E(T  X) and one using the heuristically derived â as estimators of the age of a mutation. In each case, the data were resampled 100,000 times to provide a P value. When E(T  X) was applied, the Watterson (1975) estimator of θ was used to drive the simulations, and 100,000 genealogies (runs of the Markov chain) were sampled to provide estimates of the age of each mutation (see Griffiths and Tavaré 1995). Calculation of E(T  X) requires that the data strictly conform to the infinite sites model. For example, Harding et al. (1997) chose to eliminate apparent recombinant sequences. In the present analysis we instead eliminated sites to obtain a subset of sites that are consistent with the infinite sites model. The following heuristic algorithm was implemented for this purpose. Two diallelic sites are in conflict if all of the patterns in Table 1 are observed, where the labeling of state (0, 1) is arbitrary. Let C_{i} be the number of sites that conflict with site i. Then site i is eliminated from the data set when C_{i} = max{C_{1}, C_{2},..., C_{h}}, where h is the number of sites. This elimination scheme is repeated until C_{1} = C_{2} =... = C_{h} = 0. The test based on E(T  X) was performed on these reduced data sets, whereas the test based on â was performed on the full data sets. Data sets so small that a significant result was not theoretically possible were not included in the analysis. Four data sets were eliminated for this reason.
Amino acid saturation dynamics: To examine the relative degree of mutational saturation in nonsynonymous and synonymous substitutions in mtDNA, the number of nonsynonymous and synonymous nucleotide differences between pairs of species was plotted for 13 mtDNA genes (NADH1, NADH2, NADH3, NADH4, NADH4L, NADH5, NADH6, ATPase 6, ATPase 8, COI, COII, COIII, Cyt. b). The species included in this analysis were Homo sapiens (Andersonet al. 1981); Pan paniscus, Pongo pygmaeus (Horaiet al. 1995); Mus musculus (Bibbet al. 1981); Gallus gallus (Desjardins and Morais 1990); and Drosophila melanogaster (Garesse 1988). Sequences were aligned by eye and nonsynonymous and synonymous nucleotide differences were counted as described in Data analysis.
RESULTS
Expected ages of selected and linked neutral segregating mutations: Several results regarding the ages of neutral mutations are known. For example, the age of a neutral mutation with frequency b in a sample of size n is given by Equation 3. However, selection will affect the underlying gene genealogy and thus the distribution of coalescence times in those genealogies (e.g., Kaplanet al. 1988; Hudson and Kaplan 1994) and will therefore affect the expected age of a sampled selected mutation and of linked neutral mutations. In addition, the age of a selected mutation may differ from the neutral expectation because selected mutations may not be distributed on the genealogy according to a Poisson process.
We are interested in the ratio of the age of selected to the age of neutral mutations in models without recombination. We show that in the models of interest, selection tends to reduce this ratio below unity and that this property of models with selection can be used to test neutrality. First, consider a haploid model in which there is only one selected site in the population and assume that there are only two possible states for this site (A and a). A haplotype carrying state A has fitness 1 and a haplotype carrying state a has fitness 1 + s. Furthermore, assume that mutations occur from state A to state a at rate υ and from state a to state A at rate υ. Assume that neutral mutations occur according to an infinite sites model at rate μ and that there is no recombination. Following Hudson and Kaplan (1994), we analyze this model in the limit of large population sizes so that the frequency of A is maintained at the equilibrium value. Under this assumption the mutation/selection model can be analyzed as a model of migration between two populations. Mutation between allele A and a is equivalent to migration between two isolated subpopulations with population sizes Nf(A) and Nf(a), respectively, where N is the population size and f(A) and f(a) are the frequencies of A and a, respectively, in the population. The equilibrium frequency of a for biologically plausible values of s and υ is given by
Power analyses: The power of the McDonald/Kreitman test (McDonald and Kreitman 1991) has not received much attention in the literature, although Akashi (1999) recently explored its power under the assumption of free recombination between adjacent sites. We were interested in examining its power against the hypothesis of purifying selection in the absence of recombination. This is the case applicable to mitochondrial DNA. Such analysis may provide a crude estimate of the size of the selection coefficients required to explain the repeated observations of departures from neutrality in animal mtDNA (Ballard and Kreitman 1994; Nachman et al. 1994, 1996; Randet al. 1994; Rand and Kann 1996, 1998; Nachman 1998; Wiseet al. 1998). We examined the power of the test as described in methods (Figure 3) and found that, although it has a maximum at ∼Ns = –10, the test has power >50% over the range –2 > Ns > –40. Thus selection coefficients acting on nonsynonymous mutations in mtDNA should fall in this range if they are to account for the many observed deviations from neutrality.
Having demonstrated that under purifying selection the mean age of segregating selected sites is less than the mean age of segregating neutral sites in a population (Figures 1 and 2), we next turned to the problem of comparing mean ages of selected and neutral sites segregating in a sample of chromosomes drawn from a population. We examined the power of the AOM test (Equation 2) when using â as an estimator of the age of mutations by computer simulations as described in methods and found it to increase as Ns is reduced from 0 to –10 with a maximum in the range –10 > Ns > –50 (Figure 3a). Power analysis of the AOM test using E(T  X) as an computaestimator of age of mutations by simulation is tionally too expensive to perform.
Data analysis:The hypothesis that the ages of nonsynonymous and synonymous polymorphisms are equal was tested for 25 previously published animal mtDNA data sets (Ballard and Kreitman 1994; Nachman et al. 1994, 1996; Rand and Kann 1996; Wiseet al. 1998; and references in Nachman 1998 and Rand and Kann 1998). The results of the tests are described in Table 2. Note that in only 6 cases using â as the estimator of relative age and in only 4 cases using E(T  X) as the estimator of age was the hypothesis of an equal average age of nonsynonymous and synonymous mutations rejected by the AOM test, whereas the McDonald/Kreitman test rejected neutrality in 14 of 25 cases.
DISCUSSION
Application of the McDonald/Kreitman test (McDonald and Kreitman 1991) to animal mtDNA has repeatedly demonstrated that the evolution of this genetic element is not consistent with strict neutrality (Ballard and Kreitman 1994; Nachman et al. 1994, 1996; Randet al. 1994; Rand and Kann 1996, 1998; Nachman 1998; Hasegawaet al. 1998; Wiseet al. 1998). Relative to the number of nonsynonymous sites fixed between species, these data sets all show an excess number of segregating nonsynonymous sites within species. Kimura (1983) has shown that deleterious alleles contribute more to (withinspecies) heterozygosity than to (betweenspecies) divergence and thus, these observations may be consistent with the notion that most segregating synonymous mutations are roughly neutral but that most segregating nonsynonymous mutations are mildly deleterious. We set out to explicitly test this hypothesis. On the basis of theoretical predictions (Figures 1 and 2) we chose the ratio of the estimated mean age of nonsynonymous mutations to the estimated mean age of synonymous mutations as a test statistic. Two estimators of the age of a mutation were used to analyze published mtDNA data sets: E(T  X) and â (an estimator based on estimation of the underlying gene genealogy using classical phylogenetic methods). E(T  X) has the advantage of using all the information in a sample, but assumes that the data fit the infinite sites model. As shown in Table 2, most of the published data sets (18/25) do not fit this model. Furthermore, the calculation of E(T  X) is very time consuming, making it unsuitable for simulation purposes. This is the motivation for using â as an age estimator of the age of a mutation in addition to E(T  X).
However, note that the selective removal of sites when E(T  X) is used will not lead to falsely significant results. To realize this, note that under the null hypothesis of strict neutrality, both nonsynonymous and synonymous mutations should be distributed according to a Poisson process on the underlying shared gene genealogy. Because the distribution of the times of events in a Poisson process is given by the order statistics of a uniform random variable, the expected age of a mutation in a site is independent of the number of mutations occurring at the site. Thus the expected age of mutations in a data set will not change by the removal of sites with many mutations nor will the ratio of the expected ages of nonsynonymous to synonymous mutations change.
On the basis of an analysis of the power of the McDonald/Kreitman test (Figure 3a), we conclude that if purifying selection alone is responsible for the observed deviations from neutrality in animal mtDNA, selection coefficients must be in the range –2 > Ns > –40. It was surprising to us that the McDonald/Kreitman test retains power over such a wide range and that such large (negative) values of Ns may be responsible for the observations. Akashi (1999) found that little power remained for values of Ns <–10 and that essentially no power remained for values of Ns <–20. The discrepancy between the results can presumably be explained by the effect of linkage, which is known to reduce the efficacy of selection (e.g., Hill and Robertson 1966).
The AOM test based on â has less power to reject the hypothesis of neutrality across most of the ranges of Ns examined (Figure 3b) than the McDonald/Kreitman test. Power rises as Ns is reduced from –1 to –10 after which it appears to reach a plateau of ∼65%. It is not surprising that the McDonald/Kreitman test has more power in most of the parameter space because this test employs more data (interspecific comparisons in addition to intraspecific comparisons) than the test based on â. However, the new test based on â uses information in the data not employed by the McDonald/Kreitman test, and it is able to distinguish between hypotheses of balancing selection and selection on deleterious mutations. It should therefore provide additional information regarding selection coefficients when applied in addition to the McDonald/Kreitman test. Another advantage of the new test is that it is quite robust to the model assumptions. Under strict neutrality, the distribution of site patterns in nonsynonymous sites and in synonymous sites should be identical. If the weights assigned to different site patterns are poorly chosen, this will lead to a reduction in power but not to an inflation of the chosen significance level. This is also true even if the population is not at stationarity because of the decoupling of the mutational and the genealogical processes under neutrality.
Six out of 25 tests were significant using â as an estimator of the age of an allele (Table 2). The probability of randomly obtaining 6 or more significant results in 25 tests at the 5% significance level is P = 0.0012. One of the data sets (Mesomys) has a P value so low (P < 0.001) that the null hypothesis of no difference in age of synonymous and nonsynonymous mutations for all data sets can be rejected by a Bonferroni test. It is curious to note that all six significant data sets contain a relatively large number of segregating sites; indeed three of the six are the three largest data sets shown in Table 2 with respect to number of segregating sites. This may suggest that the small number of significant tests is caused by a lack of statistical power when only few polymorphic sites are available. In contrast there appears to be no bias among significant data sets with respect to number of chromosomes examined. Also, in 17/25 cases, the estimated mean age of nonsynonymous mutations is less than the estimated mean age of synonymous substitution (not shown). Together, these results provide weak support for the hypothesis that the average age of nonsynonymous mutations is less than the average age of synonymous mutations.
To test the hypothesis that the small number of significant AOM results in Table 2 could be the consequence of small sample size (polymorphic sites), additional power analyses of both the McDonald/Kreitman and the AOM tests were performed in which Ns was held constant and θ varied (Figure 4). Indeed the reduction in power when θ is small is somewhat more severe for the AOM test than for the McDonald/Kreitman test.
Supposing that purifying selection is acting on amino acid replacement mutations, the next question is which range of values of Ns are consistent with the observations from both the McDonald/Kreitman test and the AOM test presented here. Figure 5 (solid bars) shows the expected frequency distributions of P values for the test based on â derived from simulation with Ns = 3. Note that Ns = 3 is near the bottom of the power curve of the AOM test (Figure 3b); however, if Ns is larger, the expected distribution would be even more skewed toward low P values. In contrast the frequency distribution of P values observed in the 25 data sets is nearly uniform (Figure 5, hatched bars). It appears that Ns must be <3 to account for the distribution of P values obtained in the test based on â. However, if selection against deleterious mutations is to explain the many significant McDonald/Kreitman tests, Ns should be >2. There is therefore only a small window of values of Ns that are consistent with the findings in this article. Given the variability of population sizes in naturally occurring populations, it is difficult to explain why all values of Ns should fall in this range.
The model applied to describe selection against deleterious mutations is very simple in that it assumes that all nonlethal nonsynonymous mutations have the same selection coefficient and that this selection coefficient is constant in time. Likewise, it is assumed in the simulations that the populations are in stationarity, i.e., a constant population size and no population subdivision. Although there is no reason to assume that a more complex demographic model or a more complex distribution of negative selection coefficients would change the relative power of the McDonald/Kreitman test vs. the AOM test, we cannot rule out this possibility.
If purifying selection alone is not responsible for the repeated observations of deviation from neutrality, what other processes are consistent with the data? One obvious explanation is that the significant results partly are artifacts caused by the simple methods used to estimate the number of nonsynonymous and synonymous mutations (e.g., Hasegawaet al. 1998). Such an explanation would require that saturation occur faster in nonsynonymous than in synonymous sites, i.e., the number of nonsynonymous nucleotide differences should be a convex function of the number of synonymous nucleotide differences (Nielsen 1997). To examine this hypothesis, we plotted the number of nonsynonymous nucleotide differences between pairs of species against the number of synonymous nucleotide differences for several mitochondrial genes (Figure 6). As expected, saturation occurs faster in synonymous sites than in nonsynonymous sites. Thus the use of nucleotide differences instead of the estimated number of substitutions in the McDonald/Kreitman test does not appear to be responsible for the apparent pattern of deviation from neutrality in mtDNA evolution. In fact, saturation by multiple hits should make the McDonald/Kreitman test more conservative against the alternative of segregating deleterious mutations.
Another problem is sequencing errors. Such errors could inflate the significance level of the McDonald/Kreitman test because they would tend preferentially to elevate the number of intraspecific nonsynonymous mutations. Likewise, sequencing errors could be of importance in explaining the significant results from the AOM test presented in this article. However, sequencing errors should also elevate the ratio of nonsynonymous mutations to synonymous mutations for small levels of divergence in interspecific data. Because no such pattern is observed in Figure 6, we do not believe that this explanation is likely. Moreover, elevated intraspecific nonsynonymous mutations have not been observed in nuclear data sets (McDonald and Kreitman 1991; D. M. Weinreich and D. Rand, unpublished results). However, we cannot conclusively rule out that sequencing errors may be of some importance in explaining the large number of significant McDonald/Kreitman tests.
Theoretically, balancing selection acting on mtDNA could maintain levels of nonsynonymous polymorphism within species above that seen between species (Nachmanet al. 1994; Rand and Kann 1996). Balancing selection could act through cytoplasmicnuclear interaction (McRae and Anderson 1988; Babcock and Asmussen 1996; Goodisman and Asmussen 1997). However, it is difficult to imagine that this type of selection can act on sufficiently many nucleotide sites simultaneously to account for the observed results. Indeed, selection against slightly deleterious mutations appears at the moment to be the most plausible explanation for the significant results of the McDonald/Kreitman test in animal mtDNA. However, a combination of forces cannot be excluded.
Acknowledgments
This study was supported by a fellowship to R.N. from the Danish Research Council. David Rand and Michael Nachman provided several animal mtDNA data sets prior to publication.
Footnotes

Communicating editor: A. G. Clark
 Received December 20, 1998.
 Accepted June 2, 1999.
 Copyright © 1999 by the Genetics Society of America