Abstract
Artifactual evidence of adaptive amino acid substitution can be generated within a McDonald-Kreitman test if some amino acid mutations are slightly deleterious and there has been an increase in effective population size. Here I investigate the conditions under which this occurs. I show that fairly small increases in effective population size can generate artifactual evidence of positive selection if there is no selection upon synonymous codon use. This problem is exacerbated by the removal of low-frequency polymorphisms. However, selection on synonymous codon use restricts the conditions under which artifactual evidence of adaptive evolution is produced.
The McDonald-Kreitman (MK; McDonald and Kreitman 1991) test is a powerful test of neutral molecular evolution; furthermore, it can be used to infer the proportion of substitutions driven by positive adaptive evolution (Charlesworth 1994; Akashi 1999; Fay et al. 2001; Smith and Eyre-Walker 2002). Under the MK test, the pattern of evolution within a species is compared to that between species, for two different types of site. Typically the data are divided into synonymous and nonsynonymous sites, and this is how the test is phrased throughout this article. However, the test can also be applied to other categorizations of sites; for example, Jenkins et al. (1995) applied the test to protein and nonprotein binding sites within the ftz enhancer element in Drosophila.
Let us imagine that we have multiple alleles of a gene from within a species and a single outgroup sequence, of the same gene, from a different species. We count the number of sites at which we have a synonymous (Ps) or a nonsynonymous polymorphism (Pn) and the number of sites at which there has been a synonymous (Ds) or a nonsynonymous (Dn) substitution. Under the neutral theory of molecular evolution, in which all mutations are either neutral or strongly deleterious, it is not difficult to show that Pn/Ps = Dn/Ds. This forms the basis of the MK test of neutral evolution. In a number of data sets violations of the equality Pn/Ps = Dn/Ds have been demonstrated. Possibly the most interesting of these are those in which Dn/Ds > Pn/Ps, since this is consistent with adaptive amino acid substitution. If there has been adaptive amino acid substitution, the proportion of substitutions that were adaptive can be estimated as

$$\alpha = 1 - \frac{D_s P_n}{D_n P_s}.$$
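The estimator α = 1 - (Ds Pn)/(Dn Ps) can be illustrated with a minimal Python sketch. The function name `mk_alpha` and the counts are hypothetical and chosen only for illustration; they are not data from any of the studies cited here.

```python
def mk_alpha(dn, ds, pn, ps):
    """Estimate the proportion of adaptive substitutions: 1 - (Ds*Pn)/(Dn*Ps)."""
    if dn == 0 or ps == 0:
        raise ValueError("Dn and Ps must be non-zero for alpha to be defined")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts (illustration only, not real data).
dn, ds, pn, ps = 40, 80, 10, 40
print(f"Dn/Ds = {dn / ds:.2f}, Pn/Ps = {pn / ps:.2f}, alpha = {mk_alpha(dn, ds, pn, ps):.2f}")
```

With these counts Dn/Ds exceeds Pn/Ps, so the estimate is positive; when the two ratios are equal, as expected under strict neutrality, the estimate is zero.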
It has been argued that artifactual evidence of adaptive amino acid substitution can be obtained in an MK test if some nonsynonymous mutations are slightly deleterious and the current effective population size is larger than the long-term effective population size (McDonald and Kreitman 1991); this is because slightly deleterious mutations that currently do not segregate in the population may have been fixed in the past. Here I investigate the difference in effective population size that is needed to generate artifactual evidence of adaptive evolution when there are slightly deleterious amino acid mutations.
THE MODEL
Let us consider the divergence between two species and let us imagine that the effective population size was
Figure 1.—The model.
Kimura (1983) has shown that the probability that such a mutation will ultimately become fixed in the population is
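As a numerical illustration of a fixation probability of this kind, the sketch below uses the standard diffusion approximation for a new semidominant mutation, expressed relative to a neutral mutation, S / (1 - exp(-S)) with S = 4Nes. Semidominance, equal census and effective population sizes, and the function name are assumptions of this sketch rather than details taken from the text.

```python
import math


def relative_fixation_probability(S):
    """Fixation probability of a new mutation relative to a neutral one.

    S = 4*Ne*s. Assumes semidominance, the diffusion approximation, and
    census size equal to effective size (assumptions of this sketch).
    """
    if abs(S) < 1e-9:
        return 1.0  # neutral limit
    return S / -math.expm1(-S)  # S / (1 - exp(-S))


for S in (-0.1, -1.0, -2.0, -4.0):
    print(f"4Nes = {S:5.1f}: relative fixation probability = "
          f"{relative_fixation_probability(S):.3f}")
```

Under these assumptions a mutation with 4Nes = -2 fixes at roughly 0.31 times the neutral rate, and the relative rate falls steeply as selection becomes stronger. Because S scales with Ne, a change in effective population size by a factor λ changes S to λS in this expression.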
A simple model: Let us assume for simplicity that all synonymous mutations are neutral and that nonsynonymous mutations are either deleterious or slightly deleterious; then the estimated proportion of amino acid substitutions that have been driven by adaptive evolution is
Let us begin by assuming that the effective population was
Figure 2.—The effect of increasing Ne for deleterious mutations when the increase occurred (a) very recently (γ = 1) and (b) at the time of divergence of the two species being considered (γ = 0.5). (c) The case when the population was bottlenecked for 10% of the divergence time (γ = 0.1). In all cases the sample size was set at 10 sequences and the curves with increasing dash length are 4Nes = -0.1, -1, -2, and -4.
Table 1.—The critical value of λ above which α > 0 for a model in which nonsynonymous mutations are deleterious and synonymous mutations are neutral
Table 1 gives the critical value of λ above which α> 0. The threshold increases as a function of both the strength of selection and the sample size, although it is not heavily dependent upon the latter. The increase in Ne needs to be at least threefold to generate artifactual evidence of positive selection for a sample of 10 sequences.
If we now assume that the increase in effective population size occurred sometime in the past (i.e., γ < 1), then α is again underestimated for deleterious mutations (Figure 2). Small increases in Ne actually decrease α, but larger increases generate artifactual evidence of positive selection. If λ* is the value of λ that gives α > 0 when γ = 1, the threshold for γ < 1 is at least λ*/γ, although it is much greater than this for mutations of small effect (Table 1). Note that a decrease in effective population size can increase α, but it never leads to artifactual evidence of adaptive evolution.
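The lower bound λ*/γ can be made concrete with a short worked example. Here λ* = 3 is taken purely as an illustrative value, being the smallest threshold quoted for a sample of 10 sequences; the actual thresholds depend on the strength of selection and can be considerably larger than this bound.

```latex
\lambda_{\text{threshold}}(\gamma) \;\ge\; \frac{\lambda^{*}}{\gamma}
\qquad\Longrightarrow\qquad
\gamma = 0.5:\ \lambda \ge \frac{3}{0.5} = 6,
\qquad
\gamma = 0.1:\ \lambda \ge \frac{3}{0.1} = 30 .
```

In other words, if the population spent only 10% of the divergence time at the smaller size, at least a 30-fold difference in Ne would be required under this illustrative value of λ*.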
Although I phrased the model in terms of an increase in effective population size, one can equally think of the population as going through a bottleneck during the divergence between the species; in fact, it is more sensible to think in terms of a population bottleneck when γ is small—for example, γ= 0.1 corresponds to a bottleneck of 10% of the total divergence time. Furthermore, it is worth noting that this bottleneck could have occurred at any time during the divergence of the species; it does not have to have been in the lineage leading to the outgroup taxon.
Excluding rare variants: The fact that slightly deleterious mutations can make the McDonald-Kreitman test highly conservative, when the population size has not changed, led Fay et al. (2001; see also Charlesworth 1994) to suggest that polymorphisms segregating below a certain frequency be ignored. They reasoned that this would increase the power of the MK test because slightly deleterious mutations do not segregate at high frequency. Unfortunately this is also likely to make the test more sensitive to changes in effective population size. To investigate, I reevaluated Equation 3, ignoring mutations segregating at a frequency <k, where k was set to 0.1 and 0.2 (Figure 3). The effect is clear: as the cutoff frequency increases, the MK test becomes more sensitive to changes in Ne; the overestimation of α becomes larger, it occurs more readily, and the range of selection coefficients that readily yield an overestimation of α is broader. This last effect is slightly deceptive, because mutations with selection coefficients 4Nes < -4 contribute little to evolution, their fixation probabilities being very low. If we assume that the change in Ne occurred sometime in the past, then larger increases in Ne are needed to generate artifactual evidence of adaptive evolution; exactly the same relationship holds as above: if λ* is the value of λ that gives α > 0 when γ = 1, the threshold for γ < 1 is at least λ*/γ.
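The reasoning that slightly deleterious mutations rarely reach high frequency can be illustrated with a minimal sketch. It uses the standard diffusion-theory density of segregating mutations in a population of constant size, expressed as the ratio of selected to neutral mutations at population frequency x. This is not the sample-based calculation of Equation 3; the constant population size, semidominance, and the function name are assumptions of the sketch.

```python
import math


def selected_to_neutral_density(x, S):
    """Ratio of selected to neutral mutation density at frequency x (0 < x < 1).

    S = 4*Ne*s for a semidominant mutation; constant population size and the
    diffusion approximation are assumed (assumptions of this sketch).
    """
    if abs(S) < 1e-9:
        return 1.0  # neutral mutations: ratio is 1 at every frequency
    # (1 - exp(-S(1 - x))) / ((1 - exp(-S)) * (1 - x))
    return math.expm1(-S * (1.0 - x)) / (math.expm1(-S) * (1.0 - x))


S = -2.0
for x in (0.05, 0.1, 0.2, 0.5, 0.9):
    print(f"x = {x:4.2f}: selected/neutral = {selected_to_neutral_density(x, S):.3f}")
```

For deleterious mutations the ratio is close to 1 at low frequencies and declines steadily toward high frequencies, so discarding variants below a cutoff k leaves a set of polymorphisms in which slightly deleterious mutations are relatively scarcer.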
Selection on synonymous codon use: We have so far assumed that synonymous mutations are neutral, but there is evidence in many species that synonymous codon use is subject to selection (Sharp et al. 1992). To start, let us assume that both nonsynonymous and synonymous mutations are unconditionally deleterious with effects of Sn = 4Nesn and Ss = 4Ness. Under this model
However, assuming that synonymous mutations are unconditionally deleterious is unrealistic. It is more usual to model synonymous codon bias as being a balance among mutation, selection, and genetic drift (Li 1987; Bulmer 1991). Consider a biallelic site at which an individual homozygous for allele A1 has an advantage of +2s over an individual homozygous for A2, where there is semidominance (i.e., the advantage of the heterozygote is +s). We assume that the mutation rate between A1 and A2 is the same in both directions and is equal to u. At equilibrium the proportion of sites occupied by the A1 allele is
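A minimal sketch of this mutation-selection-drift balance follows. It assumes the weak-mutation limit, in which each site is fixed for one allele most of the time, so that the equilibrium is set by the balance between fixations of A1 and of A2. With equal mutation rates the result reduces to the logistic form exp(S)/(1 + exp(S)), where S = 4Nes. The function names and the value S = 1 are hypothetical.

```python
import math


def relative_fixation_probability(S):
    """Relative fixation probability of a semidominant mutation, S = 4*Ne*s."""
    if abs(S) < 1e-9:
        return 1.0
    return S / -math.expm1(-S)  # S / (1 - exp(-S))


def preferred_codon_frequency(S):
    """Equilibrium proportion of sites occupied by the preferred allele A1.

    Weak-mutation limit with equal mutation rates in both directions
    (assumptions of this sketch).
    """
    gain = relative_fixation_probability(S)    # A2 -> A1, advantageous
    loss = relative_fixation_probability(-S)   # A1 -> A2, deleterious
    return gain / (gain + loss)


S_ancestral = 1.0  # hypothetical strength of selection at the ancestral size
for lam in (1, 3, 10):
    freq = preferred_codon_frequency(lam * S_ancestral)
    print(f"lambda = {lam:2d}: equilibrium frequency of A1 = {freq:.3f}")
```

Because S is proportional to Ne, the equilibrium frequency of the preferred allele rises with population size, so a population that has recently expanded sits below its new equilibrium for some time.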
Figure 3.—The effect of removing mutations segregating at frequencies ≤0.1 (b) and 0.2 (c) vs. when all mutations are considered (a). The sample size was set at 10 sequences and the curves with increasing dash length are 4Nes = -0.1, -1, -2, -4, and -6.
Figure 4.—The effect of slightly deleterious synonymous mutations. (a) Sn = -2 and for increasing dash length, starting with the solid lines, Ss = 0.1, 0.5, 1, 2, and 3. (b) Sn = -4 and for increasing dash length, starting with the solid lines, Ss = 0.1, 1, 2, 3, 4, and 5. The sample size was set at 10 sequences.
Now let us consider the case where the ancestral population size was λ
Since it does not seem sensible to consider the case where γ= 1, since synonymous codon bias would not have equilibrated at the population size of λ
DISCUSSION
As McDonald and Kreitman originally pointed out in their seminal paper (McDonald and Kreitman 1991), artifactual evidence of adaptive amino acid substitution can be generated by an increase in effective population size, if some amino acid mutations are slightly deleterious. The conditions under which this can occur appear to be quite permissive if there is no selection upon synonymous codon use—a 3-fold increase in effective population size is sufficient to generate artifactual evidence of positive selection if the change in effective population size occurred very recently. If the change in effective population size occurred sometime in the past, be it an increase to the current population size or a bottleneck during the divergence of the species, the change in the population size needs to be larger, but not very large; for example, if the population was bottlenecked for 10% of the divergence time, then mutations of moderate effect would produce artifactual evidence of adaptive evolution if the reduction in population size was ∼50-fold.
Figure 5.—The effect of selection on synonymous codon use, where synonymous codon use is in a mutation-selection-drift balance equilibrated at a population size of . In each graph Sn = -2 and for increasing dash length, starting with the solid lines, Ss = 0.001, 1, 2, 3, 4, and 5. (a) γ = 1.0, (b) γ = 0.75, and (c) γ = 0.5. The sample size was set at 10 sequences.
Figure 6.—The effect of selection on synonymous codon use, where synonymous codon use is in a mutation-selection-drift balance equilibrated at a population size of . In each graph Sn = -2 and for increasing dash length, starting with the solid lines, Ss = 0.001, 1, 2, 3, and 4. (a) γ = 0.1 and (b) γ = 0.5. The sample size was set at 10 sequences.
The range of selection coefficients over which an increase in effective population size generates artifactual evidence of positive selection may seem rather small, but mutations with 4Nes values <-4 contribute little to substitution anyway, since their fixation probabilities are very low (<6% that of a neutral mutation). The critical quantity is the proportion of mutations with -0.1 > 4Nes > -4 out of the mutations with 0 > 4Nes > -4. This we do not know, but we can make some inferences. The level of constraint in protein-coding genes, as measured by the ratio of the nonsynonymous to the synonymous substitution rate, is <0.3 in most species (e.g., see Eyre-Walker et al. 2002; Keightley and Eyre-Walker 2000), including species like our own that have very low effective population sizes. If all mutations were equally deleterious the strength of selection needed to produce such a level of constraint would be 4Nes ≈ -2 (calculated using Equation 5). However, it seems very likely that some mutations are actually much more deleterious than this, which means that on average 4Nes > -2 for slightly deleterious mutations. In fact, a substantial proportion of mutations could be effectively neutral at all population sizes, in which case they will provide no artifactual evidence of adaptive evolution. To investigate this further, let us assume that the strength of selection is exponentially distributed,
Figure 7.—The effect of variation in the strength of selection acting upon nonsynonymous mutations, which are assumed to be deleterious, but exponentially distributed with S¯ = -6.74. Synonymous mutations are assumed to be neutral. Curves of increasing dash length, starting with the solid lines, are γ = 1, 0.5, and 0.1.
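As a rough numerical companion to the exponentially distributed model, the sketch below averages the relative fixation probability of a semidominant deleterious mutation over an exponential distribution of 4Nes with mean -6.74. The fixation formula S / (1 - exp(-S)), the midpoint-rule integration, and the function names are assumptions of this sketch; it is not the calculation behind the figure.

```python
import math


def relative_fixation_probability(S):
    """Relative fixation probability of a semidominant mutation, S = 4*Ne*s."""
    if abs(S) < 1e-9:
        return 1.0
    return S / -math.expm1(-S)  # S / (1 - exp(-S))


def mean_relative_fixation(mean_S, upper=100.0, steps=200_000):
    """Average relative fixation probability when -S is exponentially
    distributed with mean -mean_S (midpoint-rule integration)."""
    m = -mean_S                  # mean of the positive quantity x = -S
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h        # strength of selection against the mutation
        density = math.exp(-x / m) / m
        total += density * relative_fixation_probability(-x) * h
    return total


print(f"mean relative fixation probability = {mean_relative_fixation(-6.74):.3f}")
```

Under these assumptions the average comes out close to 0.2, which lies within the range of constraint (<0.3) discussed above, even though many individual mutations in the distribution are either nearly neutral or far too deleterious to fix.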
Although it is relatively easy to generate artifactual evidence of positive selection when there is no selection on synonymous codon use, this is generally not the case when there is selection. The behavior depends upon which population size synonymous codon use has equilibrated at—if we assume that synonymous codon bias equilibrated at some ancestral population size and that the population size has subsequently increased in the lineage we have sampled polymorphism from, then selection on synonymous codon use reduces the effect of increasing population size; i.e., if α is overestimated, the bias is not as great as if there was no selection on synonymous codon use. Furthermore, there is often no artifactual evidence of adaptive amino acid substitution for any parameter combination if the change occurred sometime in the past, since under these conditions adaptive evolution occurs at synonymous sites leading to α< 0. Adaptive evolution occurs at synonymous sites because the equilibrium frequency of the preferred codon is lower in the ancestral population size than it will be eventually at the increased population size, and there is therefore a period during which advantageous preferred codons are fixed. In contrast, if synonymous codon bias equilibrated at the current size of the population that has been sampled for polymorphism data, but there have been bottlenecks during the divergence of the species, then synonymous codon bias increases α, but α is still often negative. Only if the strength of selection on synonymous mutations is similar to the strength of selection on nonsynonymous mutations, do we get artifactual evidence of positive selection. Of course, artifactual evidence of adaptive amino acid substitution is produced if the strength of selection is greater upon synonymous mutations than on nonsynonymous mutations, but this is true whether or not the population size has changed.
Do these results have any implications for our estimates of adaptive evolution? The proportion of amino acid substitutions that have been fixed by adaptive evolution has been estimated in Drosophila (Akashi 1999; Fay et al. 2002; Smith and Eyre-Walker 2002) and in humans (Fay et al. 2001). Akashi (1999) suggested that the vast majority of the amino acid substitutions in Drosophila were probably a consequence of adaptive evolution, since in the sample of genes he studied, amino acid variants were present either as very rare polymorphisms or as fixed differences between species, in contrast to synonymous variants that segregated at a variety of frequencies; he therefore inferred that most of the amino acid polymorphisms were slightly deleterious and that the fixed differences were due to adaptive evolution. Subsequently Fay et al. (2002) estimated that 26% of the amino acid substitutions between Drosophila melanogaster and D. simulans had been fixed by positive selection, a number not significantly different from the 45% estimated for the divergence between D. simulans and D. yakuba (Smith and Eyre-Walker 2002). However, it is believed that D. melanogaster and D. simulans have both spread out from Africa relatively recently, so many of the polymorphism data, coming as they do from North America, are from a population that has undergone an increase in size (Begun and Aquadro 1993; Andolfatto 2001). Furthermore, there is no evidence of selection currently acting on synonymous codon use in D. melanogaster (Akashi 1996; Akashi and Schaeffer 1997). In D. simulans there is evidence of current selection on synonymous codon use (Akashi and Schaeffer 1997; Kliman 1999; Begun 2001) but also evidence that selection may have been absent in the past (Begun 2001). The conditions in Drosophila therefore appear to be exactly those most likely to generate an overestimation of adaptive evolution—i.e., an expansion in population size and little, or no, selection on synonymous codon use.
However, there are several reasons for believing that α has not been overestimated. First, although D. simulans and D. melanogaster have expanded out of Africa, the effective population size of the non-African population appears to be lower than that of the African population (Begun and Aquadro 1993; Andolfatto 2001), which means that the non-African population is probably smaller than the ancestral population (i.e., the situation actually corresponds to a recent contraction rather than an expansion). Some caution needs to be exercised because an increase in the population size increases the efficiency of selection, so the effective population size experienced by neutral variation might be lower than that experienced by selected variation (Otto and Whitlock 1997). Second, there is evidence of adaptive amino acid substitution between D. melanogaster and D. simulans even when only African D. melanogaster sequences are used in the analysis (Fay et al. 2002). Third, although the average frequency of nonsynonymous polymorphisms was lower than that of synonymous polymorphisms in the eight genes studied by Akashi (1999), this is not generally the case for genes in either D. simulans (Smith and Eyre-Walker 2002) or D. melanogaster (N. G. C. Smith and A. Eyre-Walker, unpublished results). Furthermore, the ratio of Pn/Ps does not differ significantly, or in a consistent direction, between African and non-African populations of D. melanogaster and D. simulans, despite their different effective population sizes (Andolfatto 2001). So there is no evidence that amino acid mutations are on average more deleterious than synonymous mutations, although neither of these lines of evidence is strong. Fourth, Fay et al. (2002) have argued that evidence of positive selection in Drosophila is unlikely to be a consequence of an expansion in effective population size since the effect should be seen for all genes, whereas positive selection is observed for only a small number of genes. However, genes may vary in the proportion of mutations that are slightly deleterious.
Fay et al. (2001) have estimated that ∼35% of all amino acid substitutions in the divergence between humans and Old World monkeys were adaptive, using single-nucleotide polymorphism data from humans. However, humans have undergone population size expansion and there is no clear evidence of selection on synonymous codon use (Eyre-Walker and Hurst 2001). Furthermore, because nonsynonymous polymorphisms were segregating at lower frequencies than synonymous mutations in their data, Fay et al. removed low-frequency variants. So the estimate of adaptive evolution in humans may be an overestimate. Interestingly, there is an excess of amino acid polymorphism, and not substitution, in human mitochondrial DNA, when human mtDNA is compared to chimpanzee (Nachman et al. 1996). This suggests that although the human population size has expanded, the current effective population size is actually similar to or smaller than the long-term effective population size separating humans and chimpanzees, a view corroborated by the fact that the estimate of the effective population size of humans is lower than that of chimpanzees (Wise et al. 1998; Eyre-Walker et al. 2002) and the ancestor of humans and chimpanzees (Chen and Li 2001). This may mean that the estimate of adaptive substitution in nuclear DNA is not a substantial overestimate.
In summary, an increase in effective population size can generate artifactual evidence of adaptive amino acid substitution if there are slightly deleterious amino acid mutations; the conditions are quite permissive if there is no selection upon synonymous codon use, but the conditions become more restrictive if there is selection at synonymous sites.
Acknowledgments
Thanks to Nicolas Bierne for helpful discussion and the Biotechnology and Biological Sciences Research Council and Royal Society for support.
Footnotes
- Communicating editor: G. B. Golding
- Received May 21, 2002.
- Accepted September 30, 2002.
- Copyright © 2002 by the Genetics Society of America