Simulations of positive directional selection, under parameter values appropriate for approximating human genetic diversity and rates of recombination, reveal that the effects of strong selective sweeps on patterns of linkage disequilibrium (LD) mimic the pattern expected with recombinant hotspots.
In several cases, the local distribution of meiotic recombination in humans is nonuniform and concentrated into small regions, on the order of 1 kbp in size, termed recombinant hotspots (for reviews see Petes 2001; Arnheim et al. 2003; de Massy 2003; Wall and Pritchard 2003; Kauppi et al. 2004). These recombinant hotspots appear to be a common feature in the human genome (Crawford et al. 2004; McVean et al. 2004) and are rarely shared between humans and closely related species (Wall et al. 2003; Ptak et al. 2004, 2005; Winckler et al. 2005). This suggests recombinant hotspots may be rapidly evolving and species specific. These hotspots contribute to the “haplotype block” pattern of genetic regions with high linkage disequilibrium (LD; nonindependent associations between alleles at different positions) separated by boundaries of low LD, which is actively being characterized to optimize marker choice for association mapping studies (International HapMap Consortium 2003, 2005; Tishkoff and Verrelli 2003). Knowledge of the distribution of LD is critical to mapping the genetic basis of complex phenotypes (Weiss and Clark 2002). Methods have been developed to detect recombinant hotspots from DNA sequence data, which utilize patterns of LD to infer the existence of these hotspots (e.g., Chakravarti et al. 1984; Li and Stephens 2003; Zhang et al. 2004; for review see Stumpf and McVean 2003). In particular, Li and Stephens (2003) developed a coalescent-based “product of approximate conditionals” model, which uses the distribution of haplotypes to estimate the likelihood of the underlying recombination rate.
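As a minimal illustration of what "nonindependent associations between alleles at different positions" means in practice, the sketch below computes two standard pairwise LD summaries, D and r², from a small set of haplotypes. It is not the method used by Li and Stephens (2003); the function name `ld_stats` and the encoding of haplotypes as strings of '0' and '1' are assumptions made only for this example.

```python
def ld_stats(haplotypes, i, j):
    """Return (D, r2) for biallelic sites i and j.

    haplotypes: list of equal-length strings of '0'/'1' (hypothetical encoding,
    '1' = derived allele). D = p_AB - p_A * p_B; r2 = D^2 / (pA qA pB qB).
    """
    n = len(haplotypes)
    p_a = sum(h[i] == "1" for h in haplotypes) / n
    p_b = sum(h[j] == "1" for h in haplotypes) / n
    p_ab = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # r2 is undefined when either site is monomorphic in the sample.
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


if __name__ == "__main__":
    # Two sites in complete association, then two sites in linkage equilibrium.
    print(ld_stats(["11", "11", "00", "00"], 0, 1))
    print(ld_stats(["11", "10", "01", "00"], 0, 1))
```

Computed over all pairs of sites in a window, values of this kind give the block-like pattern of high LD separated by low-LD boundaries described above.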
Positive directional selection, in which a new mutant rises in frequency and quickly fixes in a population (i.e., a selective sweep), can be rapid on an evolutionary timescale and/or population specific (for reviews see Andolfatto 2001; Aquadro et al. 2001; Schlötterer 2002a; Bamshad and Wooding 2003). One predicted effect of this type of selection on a sample of DNA sequences is an increase in LD in regions flanking the site undergoing selection (Kim and Stephan 2002; Przeworski 2002), but a reduction of LD across the site of selection (Kim and Nielsen 2004). This dual pattern may not be intuitive at first. A sweep shortens the genealogical history of the sampled sequences, which reduces the number of recombination events that history contains and thus generally increases LD in the region. However, for linked genetic variation to be present immediately after a selective sweep, it must either be a new mutation, and therefore rare and contributing little to overall LD, or have experienced recombination with the target of selection during the sweep. This suggests that haplotypes with LD between polymorphic alleles that span the target of selection will not persist beyond the fixation of the selected allele. Here we do not address gene conversion, which could preserve LD over the site of selection. This pattern of two regions of high LD, separated by low LD, is similar to the pattern of LD expected with a recombinant hotspot. Furthermore, the speed and species specificity of selective sweeps may also mimic the species-specific distribution of recombinant hotspots.
To explore the possible effect of selective sweeps on the inference of recombinant hotspots, we simulated positive directional selection of varying intensities (using SelSim 2.1, Spencer and Coop 2004) and applied hotspot detection software (Hotspotter 1.0, Li and Stephens 2003) to the resulting simulated sequences. The parameter values for the simulations were picked to approximate a 10-kbp sequence of DNA from a human population sample. We found that for strong positive selection (σ ≈ 100, where σ ≡ 2Ns, N is the diploid population size, and s is the relative strength of selection), a locally elevated recombination rate can falsely be inferred in the region of selection with statistical significance 22% of the time, corresponding to a 16% excess over the false positive rate (FPR) from neutral simulations (Figure 1). This selective elevation over the neutral FPR is highly significant (P = 1 × 10⁻⁷; see Figure 1 legend). However, as the strength of selection increases further (σ ≥ 300), there is a rapid drop in the FPR, probably due to a loss in power associated with a paucity of genetic variation remaining immediately after a strong selective sweep. The patterns of LD resulting from positive selection can produce locally elevated estimated rates of recombination that are similar to the relative rates reported in the literature (e.g., Figure 2; cf. Jeffreys and Neumann 2002; Wall et al. 2003; McVean et al. 2004; Ptak et al. 2004; Verrelli and Tishkoff 2004; Winckler et al. 2005). The geometric mean of estimated recombination rates at the site of selection, from 100 replicates at σ = 100, is 13.25 times higher than the background rate. Furthermore, this elevated FPR can persist for up to N generations (0.5 × 2N generations) after the selective sweep has ended (Figure 3). Assuming an effective human population size of 10,000 and an average generation time of 25 years, this corresponds to a maximum persistence time of ∼250,000 years.
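The scaling used above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the value s = 0.005 is simply what σ = 100 implies under the assumed N = 10,000, and the list passed to `geometric_mean` is made-up input showing how such a summary is computed, not data from the simulations.

```python
import math


def scaled_selection(n_diploid, s):
    """sigma = 2 * N * s, with N the diploid population size."""
    return 2 * n_diploid * s


def persistence_years(n_diploid, generation_years, fraction_of_2n=0.5):
    """Convert a time in units of 2N generations into years."""
    return fraction_of_2n * 2 * n_diploid * generation_years


def geometric_mean(values):
    """Geometric mean of positive values, computed on the log scale."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    # Assumed values from the text: N = 10,000 and 25 years per generation.
    print(scaled_selection(10_000, 0.005))   # s implied by sigma = 100
    print(persistence_years(10_000, 25))     # 0.5 x 2N generations, in years
    # Hypothetical relative rates (site of selection / background).
    print(geometric_mean([1.0, 100.0]))
```

A geometric rather than arithmetic mean is the natural summary for ratios of rates, since it treats a 10-fold overestimate and a 10-fold underestimate symmetrically.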
We do not mean to imply that true recombinant hotspots do not exist in humans; they have certainly been verified by experimental means (e.g., Hubert et al. 1994; Cullen et al. 1995; Smith et al. 1998; Yip et al. 1999; Jeffreys et al. 2001). But we do suggest caution when inferring the existence of hotspots solely on the basis of patterns of LD. The transient nature of positive selection, both over time and between populations, may easily mimic the rapidly evolving nature of recombination in primates. When a hotspot is inferred, it may be useful to also address the relative levels of genetic variation compared to levels of divergence (Ptak et al. 2004) to help rule out past positive selection—particularly since recombination may be associated with a mutagenic process (Rattray et al. 2002; Hellmann et al. 2003) and selective sweeps can quickly remove genetic variation. However, recombinant hotspots and selective sweeps may be linked at a more basic level. There is evidence that hotspot crossover asymmetry can result in a form of meiotic drive (Jeffreys and Neumann 2002), which itself is a “selfish” form of positive selection (for review see Reed et al. 2005 and references therein). This crossover asymmetry predicts that a derived recombination-suppressing allele will eventually fix in the population (Jeffreys and Neumann 2002), resulting in the co-occurrence of both a recombinant hotspot and a progressing selective sweep.
The possibility exists that inferred recombinant hotspots in gene regions that also appear to have undergone positive selection (e.g., Verrelli and Tishkoff 2004) are not due to nonuniform densities of meiotic recombination, but may simply be a by-product of positive selection. In the same vein, estimates of the rate of hotspot sharing between species, based on LD analysis (Ptak et al. 2005), may be underestimated, and short-scale LD may be lower than expected (e.g., Pritchard and Przeworski 2001) if recent positive selection plays a significant role. If selective sweeps do make a significant contribution to the patterns of LD in the human genome, then a better understanding of the effects of positive selection may have important implications for projects that characterize LD for association studies, particularly to the extent that selective pressures may have varied among human populations (e.g., Hamblin and Di Rienzo 2000; Schlötterer 2002b; Akey et al. 2004; Storz et al. 2004).
We thank Yuseob Kim and Michael Li for helpful suggestions. We also thank two anonymous reviewers for their feedback on a previous version of this manuscript. This work was supported by a Burroughs Wellcome Fund and David and Lucile Packard Career Awards to S.A.T. F.A.R. was partially supported by the Center for Bioinformatics and Computational Biology, University of Maryland.
Communicating editor: M. Nordborg
- Received October 8, 2005.
- Accepted December 23, 2005.
- Copyright © 2006 by the Genetics Society of America