Departures from the assumption of homogenously interdigitated neutral and putatively selected sites in the McDonald–Kreitman test can lead to false rejections of the neutral model in the presence of intermediate levels of recombination. This problem is exacerbated by small sample sizes, nonequilibrium demography, recombination rate variation, and in comparisons involving more recently diverged species. I propose that establishing significance levels by coalescent simulation with recombination can improve the fidelity of the test in genomewide scans for selection on noncoding DNA.
THE McDonald–Kreitman (MK) test is a widely used statistical test of neutral model of evolution originally proposed for nonrecombining protein sequences (McDonald and Kreitman 1991; Nielsen 2005). The test compares within-species polymorphism and between-species divergence for two distinct classes of sites: synonymous sites, which are assumed to be neutral and nonsynonymous sites, which are putative targets of selection. The null model assumes that some fraction, f, of nonsynonymous sites is strongly deleterious and contributes negligibly to polymorphism and divergence. For the remaining nonsynonymous sites, (1 − f), the ratio of polymorphism to divergence is expected to be identical to that for synonymous sites, if both behave according to the neutral model.
Departures from expectations under neutral model are expected if nonsynonymous sites experience negative or positive selection. If the evolution of (nonlethal) nonsynonymous sites in a particular gene is largely governed by negative (or purifying) selection, this will tend to decrease the level of divergence at nonsynonymous sites more strongly than levels of polymorphism (Kimura 1983). The result will be that nonsynonymous sites will exhibit a higher ratio of polymorphism to divergence than putatively neutral synonymous sites. In contrast, if the evolution of nonsynonymous sites is governed primarily by positive selection, this will tend to decrease the ratio of polymorphism to divergence relative to synonymous sites. A departure from the neutral expectation in either direction can be detected by applying a standard statistical test to a 2 × 2 contingency table of polymorphism and divergence counts for synonymous and nonsynonymous sites.
An attractive feature of the MK test is that the inference of selection, although not necessarily its mode, is remarkably robust to assumptions about nonequilibrium demography (Nielsen 2001; Eyre-Walker 2002) and recombination rates (Sawyer and Hartl 1992). In the absence of recombination, this robustness stems from the fact that all surveyed sites share the same genealogy and thus the entries of the 2 × 2 table are sufficient statistics (Nielsen 2001). In the presence of recombination, the robustness of the MK test is largely owed to the fact that nonsynonymous and synonymous sites are homogenously interdigitated in protein sequences (Figure 1A).
Using synonymous (or other neutral sites) to test for selection in linked noncoding DNA:
Although the MK test was originally applied to protein coding sequences, it can readily be generalized to test for departures from neutrality using any putatively neutral and putatively selected classes of mutations. For example, several recent studies have applied the MK test to putatively functionless (i.e., neutral) vs. putatively functional (i.e., selected) sites in noncoding DNA elements (Jenkins et al. 1995; Ludwig and Kreitman 1995; MacDonald and Long 2005; Ding and Kullo 2006; Casillas et al. 2007). Similarly, a wave of recent studies has used closely linked synonymous sites to test for departures in noncoding DNA (Kohn et al. 2004; Begun et al. 2007; Holloway et al. 2007; Egea et al. 2008; Jeong et al. 2008) or compared synonymous sites and noncoding sites from unlinked genomic regions (Andolfatto 2005; Casillas et al. 2007; Egea et al. 2008). These applications of the MK test are a departure from the standard assumption of the MK test—that neutral and selected sites are homogenously interdigitated (i.e., compare Figure 1A to Figures 1B and 1C).
Departures from the assumption of homogenously interdigitated neutral and selected sites make the MK test sensitive to assumptions about the recombination rate. Intuitively this problem arises because, in the presence of recombination, neutral sites in one part of a sequence no longer share the same genealogical history as selected sites in other parts of the sequence (Hudson 1983). As illustration of the problem, I used Hudson's (2002) coalescent program ms to simulate data that resemble the situation depicted in Figure 1B—a case in which linked synonymous sites are used to test for departures from neutrality in linked noncoding DNA. I simulated genealogies with intralocus recombination and generated MK test contingency tables with polymorphism and divergence counts from the first and second half of the simulated region. Since the MK test lacks power when table cell counts are low, I restricted tests to replicates that had a minimum of six mutations in each marginal row and column count [following Begun et al. (2007) and Holloway et al. (2007)].
Figure 2 shows the performance of the MK test under the standard neutral model (SNM) (a large panmictic population at mutation-drift equilibrium; all sites are neutral). Note that when there is no recombination, or when recombination rates are high (i.e., ρ/θ = 100), the type-I error of the MK test is <5%. However, with recombination rates empirically estimated in humans (ρ/θ ∼ 1, Pritchard and Przeworski 2001) and Drosophila (ρ/θ ∼ 4–10, Andolfatto and Przeworski 2000), there is considerable false rejection of the neutral model when θ for the surveyed genomic region is large (i.e., >3 under the standard neutral model for sample size n = 12 and divergence time T = 8Ne generations ago). The inflation in the type-I error of the test is generally symmetrical on both tails of the test and the median 2 × 2 table is close to that expected under the SNM (results not shown). These features distinguish the linkage effect described here from linkage-selection effects described elsewhere (Smith and Eyre-Walker 2002; Welch 2006; Shapiro et al. 2007).
In Figure 3, I show how several other factors further exacerbate the problem outlined above. Figure 3A shows an effect due to sample size. The type-I error generally decreases with increasing sample size, although the effect is diminished as the number of polymorphisms increases (as suggested by Figure 2). In Figure 3B, I show examples of three nonequilibrium demographic departures from the SNM that increase levels of linkage disequilibrium, such as population structure or a recent bottleneck (or their combined effects). Note that under all demographic models simulated, θ and T vary; however, the average number of observed polymorphic and divergent mutations is the same and correspond to θ = 1.5, T = 2, and n = 12 under the SNM. These results suggest that nonequilibrium demography can considerably elevate type-I errors relative to the SNM. In Figure 3C, I show that the type-I error of the MK test also increases as speciation times become more recent (relative to within-species coalescence times).
In Figure 3D, I show that the type-I error of the test can also be inflated by heterogeneity in recombination rates or the presence of an intervening intron. The elevation in recombination between the two halves of the sequence decreases the extent to which their genealogies are correlated, thus increasing the type-I error of the test (in fact, type-I error is maximized when the two regions compared are unlinked; results not shown). Recombination hotspots are a pervasive feature of yeast and mammalian genomes (Coop and Przeworski 2007) and evidence for small-scale spatial variation in recombination rates, though less extreme, is beginning to be documented in Drosophila (Cirulli et al. 2007). However, even in a uniform recombination environment, the presence of a large intron may effectively increase the recombination distance between two exons (for example one exon containing a UTR and the other synonymous sites to which it is compared) will look similar to recombination rate variation. Similarly, spatial variation in mutation rates, or levels of selective constraint, will also lead to patterns that look similar to recombination rate variation (Andolfatto and Przeworski 2000).
The results in Figures 2 and 3 suggest that caution should be used in interpreting the MK test in comparisons of noncoding DNA to synonymous or other putatively neutral sites. For example, in a genomewide application of this approach to detect noncoding DNA elements rejecting the neutral model, Begun et al. (2007) found that the proportion 5′-UTR, 3′-UTR, intron, and intergenic loci that deviated from neutrality with P < 0.05 was 0.13, 0.13, 0.12, and 0.17, respectively. In that study, the sample size was n = 6 (or smaller) and an average coding region (∼300 synonymous sites) has a ∼ 6–10 while an average 300 bp noncoding region would have θ ∼ 3–6. In addition, synonymous site divergence along the Drosophila simulans lineage is particularly short (Begun et al. 2007). The results in Figures 2 and 3 suggest that the proportions of rejections observed by Begun et al. are not necessarily unexpected under neutrality particularly given estimated levels of recombination D. simulans (ρ/θ ∼ 1, Andolfatto and Przeworski 2000), the small sample size used, the relatively low levels of synonymous site divergence along the D. simulans lineage and evidence for population structure and recent bottlenecks in this species (Wall et al. 2002; Baudry et al. 2006).
Correcting for linkage effects by simulation:
Andolfatto (2005) previously noted that pooling data from unlinked regions (see Figure 1C) can lead to incorrect confidence limits for the MK test. He proposed to correct this problem by simulating data to determine an appropriate critical P-value (P_crit) that accounts for the effects of linkage on the confidence interval of the test. A similar approach can be taken when the MK test is applied in situations resembling Figure 1B. As an example, I use recent findings by Jeong et al. (2008), who used the MK test in a similar way to investigate the mode of selection operating on tMSE, an adult abdominal pigmentation-specific cis-regulatory element (CRE) of the tan locus in Drosophila (see Figure 1D). In this study, Jeong et al. (2008) conclude that there is highly significant evidence for adaptive evolution of tMSE in a D. yakuba–D. teissieri comparison (P = 0.001), but fail to detect evidence for adaptive evolution when using D. santomea, consistent with the recent inactivation of this CRE (associated with a loss of pigmentation) and subsequent accumulation of polymorphism in the latter species.
Table 1 shows a reanalysis of the data for the D. yakuba–D. teissieri comparison that incorporates effects of linkage. When ρ/θ = 1, the type-I error of the MK test is 14.1% and the true P-value (P_true) of the data is ∼0.016 rather than 0.001 as estimated assuming no recombination. If ρ/θ is 4, the type-I error of the test is 10% and the true P-value of the data is ∼0.008. Thus, it appears that, in this case, the conclusion of adaptive evolution is robust to the issue of linkage, particularly given that ρ/θ is estimated to be ∼15 on the basis of levels of intragenic linkage disequilibrium in this species group (Bachtrog et al. 2006).
The MK test assumes that polymorphic and divergent mutations within a locus are homogenously interdigitated. Some applications of the MK test, such as using synonymous sites to test for departures in noncoding DNA violate this assumption and result in spurious rejections of the neutral model in the presence of intralocus recombination. This problem is alleviated to some extent by very high levels of recombination, but can be exacerbated by any factor that increases the scale and the variance in levels of linkage disequilibrium, such as nonequilibrium demography, the presence of introns and small-scale spatial variation recombination rate, mutation rate, or variation in selective constraint. While this article has focused on the MK test, the same issues apply to estimating confidence limits on the fraction of adaptive divergence or the estimation of selection coefficients using similar statistical frameworks (Kohn et al. 2004; Andolfatto 2005; Begun et al. 2007; Casillas et al. 2007; Haddrill et al. 2008).
Perhaps a bigger concern is evaluating evidence for selection at individual genes or genomic regions, relative to others, using the MK test in genomewide scans for selection. Since the type-I error of the MK test (applied as in Figure 1, B or C) increases with increasing size of the regions surveyed (i.e., with increasing θ), type-I error is expected to be higher for tests involving longer genes or noncoding regions. As a result, it may be misleading to rank genomic regions of different sizes by their P-values without accounting for locus-specific propensities for error. Further, if gene ontology (GO) categories are associated with particular gene structures (such as long introns for example) or recombination hotspots (e.g., International HapMap Consortium 2007), this may also make it problematic to compare distributions of MK test P-values by GO category.
While the linkage effect described here is clearly an issue in Drosophila (discussed above), how important is this issue to species, such as primates, with smaller effective population sizes? In humans, an average protein-coding gene with 300 synonymous sites has ∼1.5 polymorphisms (SNPs) on average in a sample of 39 individuals (Bustamante et al. 2005). Figure 2 suggests that, for small genomic region sizes (i.e., on the scale of <6 kb in humans), excessive type-I error is not likely to be an issue in humans under the standard neutral model. On the other hand, Figure 3B suggests nonequilibrium demography can be an issue when average SNP density is as low as 4.5/region (keeping in mind we have limited analyses to 2 × 2 tables that have marginal sums of more than five mutations). This corresponds to ∼1 kb of noncoding DNA in a sample of 39 human individuals (assuming average diversity levels of ∼0.001 per site) (Zhao et al. 2006). Figure 3D further suggests that recombination rate variation may further exacerbate the problem in humans.
The examples outlined here are largely meant to be illustrative rather than quantitative. As illustrated in the above example of the tan CRE tMSE, it is straightforward enough to simulate genealogies with linkage to obtain more appropriate P-values for the MK test in particular cases. One can similarly establish more appropriate confidence intervals for the MK test and related methods to estimate selection parameters, using multilocus data sets (Andolfatto 2005). In the case of genomewide scans for selection using the MK test, one could incorporate more realistic details about how putatively neutral and selected sites are spatially organized across the genome to establish more appropriate critical P-values. Though coalescent programs such as ms and msHOT (http://home.uchicago.edu/∼rhudson1/source/mksamples.html) can implement a wide range of genome structures and demographic scenarios, accounting for uncertainty about the demographic history of populations, and complex sampling schemes used by researchers, present a formidable challenge. From the limited simulations carried out here, it is clear that caution should be used in interpreting the MK test when population samples come from recently bottlenecked or highly structured populations, unless it is clear that neutral and selected polymorphic and divergent mutations are close to evenly interdigitated.
An R script to determine type-I error, critical P-values, and true P-values from a list of MK tables is available from the author by request.
Thanks go to M. Przeworski and K. Thornton for helpful discussions and C. Bustamante, G. Coop, D. Presgraves, J. Jensen, and S. Wright for comments on earlier versions of this manuscript.
Communicating editor: R. Nielsen
- Received May 23, 2008.
- Accepted September 5, 2008.
- Copyright © 2008 by the Genetics Society of America