Abstract
Over the last decade, surveys of DNA sequence variation in natural populations of several Drosophila species and other taxa have established that polymorphism is reduced in genomic regions characterized by low rates of crossing over per physical length. Parallel studies have also established that divergence between species is not reduced in these same genomic regions, thus eliminating explanations that rely on a correlation between the rates of mutation and crossing over. Several theoretical models (directional hitchhiking, background selection, and random environment) have been proposed as population genetic explanations. In this study samples from an African population (n = 50) and a European population (n = 51) were surveyed at the su(s) (3213 bp) and su(wa) (1955 bp) loci for DNA sequence polymorphism, utilizing a stratified SSCP/DNA sequencing protocol. These loci are located near the telomere of the X chromosome, in a region of reduced crossing over per physical length, and exhibit a significant reduction in DNA sequence polymorphism. Unlike most loci previously surveyed, these loci reveal substantial skews toward rare site frequencies, consistent with the predictions of directional hitchhiking and random environment models and inconsistent with the general predictions of the background selection model (or neutral theory). No evidence for excess geographic differentiation at these loci is observed. Although linkage disequilibrium is observed between closely linked sites within these loci, many recombination events in the genealogy of the sampled alleles can be inferred and the genomic scale of linkage disequilibrium, measured in base pairs between sites, is the same as that observed for loci in regions of normal crossing over. We conclude that gene conversion must be high in these regions of low crossing over.
THE causes of the empirically observed reductions in DNA sequence polymorphism in chromosomal regions experiencing a low rate of crossing over per physical length (Aguadé et al. 1989; Stephan and Langley 1989; Aguadé and Langley 1994; Aquadro et al. 1994; Stephan 1994) have been a subject of intense theoretical analysis (Kaplan et al. 1989; Charlesworth et al. 1993; Charlesworth 1994, 1996; Stephan 1994; Braverman et al. 1995; Gillespie 1997). The lack of any parallel correlation between crossing over per physical length and divergence between species firmly establishes that the mechanism(s) creating the association of crossing over per physical length with polymorphism must be operating at the population genetic level (Berry et al. 1991; Begun and Aquadro 1992; Martín-Campos et al. 1992; Langley et al. 1993). Accumulating evidence in Drosophila ananassae (Stephan and Langley 1989; Stephan et al. 1998), Mus (Nachman 1997), Aegilops (wild relatives of wheat; Dvorák et al. 1998), and Lycopersicon (wild relatives of tomato; Stephan and Langley 1998) indicates that this relationship can now be expected in many organisms.
Two dichotomous models based on gene frequency perturbations associated with selection at linked loci were proposed to explain the observations. The first is the extension of Maynard Smith and Haigh’s (1974) original “directional hitchhiking effect” analysis to the comparison of genomic regions with differing levels of crossing over per physical length (Kaplan et al. 1989). This model supposes that rare (perhaps newly arising) highly favored variants occasionally spread rapidly to fixation in a population, dragging with them the haplotypes upon which they originated. The size of the region (measured in morgans) affected by such hitchhiking events is of the same order as the selection coefficient associated with the favored, rare variant. Thus in regions of low crossing over per physical length the impact of each selected substitution is much greater. The few polymorphisms at tightly linked sites that have survived or emerged since the last hitchhiking event are likely to be rare, so a skew in the frequency spectrum is expected under the directional hitchhiking effect (Aguadé et al. 1989; Braverman et al. 1995). While the original survey of the yellow-Achaete Scute Complex (ASC) revealed the expected skew (Aguadé et al. 1989; Martín-Campos et al. 1992), subsequent surveys have yielded equivocal results (Aguadé et al. 1994; Braverman et al. 1995).
The background selection model, proposed as an alternative to the directional hitchhiking model, posits that the deleterious mutation rate at closely linked selected sites is sufficiently great that most chromosomes bear one or more deleterious mutations tightly linked to the selectively neutral sites being studied (Charlesworth et al. 1993; Charlesworth 1994). Since selection is assumed to be sufficiently strong to effectively prevent these mutation-bearing chromosomes from leaving any descendant copies over the long term, the effective population size of deleterious-mutant-free chromosomes can be reduced to a small fraction of that expected in a selectively neutral region recombinationally distant from a selected locus. Obviously in genomic regions of very low crossing over per physical length the total deleterious mutation rate (per unit crossing over) will be higher. The parameter domain in which the background selection model predicts the observed genomic distribution of the levels of polymorphism is circumscribed by the known variation among genomic regions in crossing over per physical length, the experimentally determined limits on the deleterious mutation rate, and the model’s apparent sensitivity to even small levels of crossing over (Hudson and Kaplan 1995; Charlesworth 1996). Hudson and Kaplan (1995) have pointed out that the density of rare (presumably deleterious) transposable elements in regions of low crossing over per physical length in D. melanogaster may be sufficient to cause the background selection effect.
In a superficial and imprecise sense both the hitchhiking effect and background selection models are associated with a reduction in effective population size. Under the background selection model the spectrum of selectively neutral site frequencies for a population of size N is close to that expected for a population of size Nf₀, where f₀ is the proportion of chromosomes free of deleterious mutations (Charlesworth 1994). Negative skew in the frequency spectrum due to background selection is “unlikely to be frequently associated with significant values” of the available test statistics (Charlesworth 1996). Under the directional hitchhiking effect, linked polymorphisms are being episodically “swept out” and then replenished slowly by a combination of mutation and genetic drift. The expected skew in the site frequency spectrum under the directional hitchhiking effect reflects this continuing state of “recovery” of the neutral polymorphism (Aguadé et al. 1989; Braverman et al. 1995). In his extensive simulations Gillespie (1997) has observed such a skew in the frequency spectrum of neutral sites linked to a selected locus undergoing any number of random-environment-selection processes; all have the tendency to strongly reduce standing variation. Here we report substantial reductions in DNA sequence polymorphism at two loci, su(s) and su(wa), located in a region of low crossing over per physical length near the telomere of the X chromosome of D. melanogaster, in large samples of alleles from an African and a European population. Consistent with the directional hitchhiking effect, a distinct skew toward rare sites is evident.
Substantial intragenic recombination is also evident in the inferred history of the sampled alleles. Molecular population genetic models typically fail to incorporate gene conversion. The effects of this recombination process are often assumed to be the same as crossing over or it is assumed that the scales of gene conversion tracts and/or of the rates of gene conversion are negligibly small. Analyses of both the hitchhiking effect and the background selection models have ignored gene conversion. The high level of historical recombination evident among our sampled alleles appears inconsistent with our expectations on the basis of surveys of loci in regions with normal levels of crossing over per physical length and the known reduction in crossing over per physical length at the tip of the X chromosome. We propose that gene conversion is the likely mechanism of this recombination and discuss its implications for the models proposed to explain the correlation of DNA sequence polymorphism with rates of crossing over per physical length.
MATERIALS AND METHODS
Drosophila stocks and DNA preparation: Genomic DNA was isolated from 50 independent isogenic X chromosome lines of D. melanogaster extracted from the population in the Sengwa Wildlife Reserve, Zimbabwe, Africa (Begun and Aquadro 1993) and 51 isogenic X chromosome lines from Barcelona, Catalonia, Spain (Martín-Campos et al. 1992). Genomic DNA was prepared as described previously (Aguadé et al. 1994).
Single strand conformation polymorphism survey: In an earlier report (Aguadé et al. 1994) it was demonstrated that DNA sequence polymorphism could be efficiently surveyed by a stratified approach of first applying single strand conformation polymorphism (SSCP) to PCR fragments, followed by direct DNA sequencing of representative members of each “allelic” SSCP class. The SSCP survey allows for relatively large sample sizes. If the amount of polymorphism per fragment is sufficiently low, only a small proportion of fragments must be sequenced. In genomic regions of low crossing over per physical length in D. melanogaster, where expected heterozygosity per site (or pairwise differences or gene diversity) is low (<0.001), surveys of >3 kb for sample sizes of 50 are practical. This method is not practical in genomic regions of normal crossing over per physical length because the number of polymorphisms within each SSCP fragment (and thus the number of SSCP “classes”) is typically so large that the amount of direct DNA sequencing approaches that associated with sequencing each sampled allele. Figure 1, a and b, shows the PCR-amplified segments of noncoding sequence, which totaled 3213 and 1955 bp for the su(s) and su(wa) regions, respectively, not counting the length of the primers and small segments of overlap (Aguadé et al. 1994). SSCP fragments surveyed ranged in length from 161 to 358 bp. PCR amplifications, sample loading and electrophoresis, silver staining, and scoring were conducted as previously described (Aguadé et al. 1994).
Figure 1. Polymorphism and linkage disequilibria in the African sample. The positions of polymorphic sites detected (ticks on the third level) in the survey of the 5′ portion of the su(s) (a) and of su(wa) (b) in 50 alleles from an African population are indicated below a depiction of the gene structure (the open box is the 5′ untranslated portion, the solid boxes are coding portions of exons, and the thin lines are the introns). Only those polymorphisms at which the least common state occurred twice or more in the sample are shown. Directly below the gene structure are the positions and sizes of the PCR fragments surveyed by SSCP and DNA sequencing. The triangular matrix of squares represents the statistical significance determined by Fisher’s exact test (uncorrected for multiple tests); white, P ≥ 0.05; light gray, P < 0.05; dark gray, P < 0.01; and black, P < 0.005. The dashed lines connect the polymorphic sites to their corresponding columns in the matrix. The shading of the circles at the top of each column increases with the expected heterozygosity of the polymorphic site. Along the left margin of the matrix are the positions in the published sequence of the corresponding polymorphic sites. Along the right margin are the inferred positions of the “minimum” number of recombination events in the history of the sample (Hudson and Kaplan 1985).
DNA sequencing: DNA sequences were determined as described in Aguadé et al. (1994) or using standard protocols on the ABI 377 automated sequencer.
Estimation of DNA sequence variation: Following the estimation procedure in Aguadé et al. (1994), the reported estimates of the average number of pairwise differences (per site) were corrected for failure of our SSCP assay to detect all variants. The estimates of insertion/deletion variation were not corrected. Some alleles were not scored for SSCP but were directly sequenced. These SSCP-unscored alleles were treated as separate classes for purposes of estimation of average number of pairwise differences,
The calculated values for Tajima’s D (Tajima 1989) and for estimates of linkage disequilibrium were based on DNA sequences inferred by assuming each sampled allele within an SSCP class was identical in sequence to that of sequenced members. The few within-class variant sites (detected by direct sequencing) were ignored in the calculations of Tajima’s D and of linkage disequilibria. If the frequency spectrum of within-class variants is similar to that of detected polymorphisms, then no bias in the analysis of the site frequency spectrum should arise. The confidence intervals for Tajima’s D were determined by simulation, as in Braverman et al. (1995). The distributions of Tajima’s D under various rates of hitchhiking were determined as described in Braverman et al. (1995). Estimates of 3Nc (where N is the population size and c is the intragenic rate of recombination) were obtained according to Hudson (1987). The “minimum number of recombination events” was determined by the algorithm in Hudson and Kaplan (1985).
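The “minimum number of recombination events” can be illustrated with a minimal sketch of the four-gamete logic, assuming biallelic sites coded as 0/1 characters and no recurrent mutation. The function names and the toy haplotypes are hypothetical, and the greedy selection of nonoverlapping intervals is one common way to obtain the lower bound rather than a transcription of the published algorithm.

```python
from itertools import combinations

def four_gamete_intervals(haplotypes):
    """Return site-index pairs (i, j) at which all four two-site haplotypes occur."""
    n_sites = len(haplotypes[0])
    intervals = []
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:  # assumes biallelic sites
            intervals.append((i, j))
    return intervals

def min_recombination_events(haplotypes):
    """Lower bound on recombination events: the largest set of
    nonoverlapping intervals that each require at least one exchange."""
    intervals = sorted(four_gamete_intervals(haplotypes), key=lambda ij: ij[1])
    count, last_right = 0, -1
    for left, right in intervals:
        # Intervals that share only an endpoint site do not overlap,
        # because exchanges occur between sites.
        if left >= last_right:
            count += 1
            last_right = right
    return count

# Hypothetical sample of four alleles scored at three sites.
sample = ["000", "011", "101", "110"]
print(min_recombination_events(sample))
```

In this toy sample every pair of sites shows all four haplotypes, so at least one exchange is required between sites 1 and 2 and another between sites 2 and 3.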
RESULTS
Tables 1-4 present the SSCP scoring and the inferred and observed (boldface) DNA sequence state at each of the polymorphic sites in each surveyed X chromosome line at the su(s) and su(wa) loci (see Figures 1 and 2). Including polymorphic insertions, a total of 3220 (3213 + 7) nucleotide positions were surveyed at the su(s) locus; of these, 112 and 54 vary (as substitutions or as part of insertion/deletion variation) in the samples from Africa and Europe, respectively. Of the total of 1983 (1955 + 28) nucleotide positions surveyed at the su(wa) locus, 83 are polymorphic in the African sample, while 89 in the European sample are segregating either as single substitutions or as part of insertion/deletion polymorphisms.
Table 5 presents the estimated average number of pairwise differences,
Table 1. SSCP polymorphism at su(s) and su(wa) in Africa
Table 2. su(s) and su(wa) polymorphisms in Africa
To examine the site frequency spectra, Table 5 presents the calculated Tajima’s D (Tajima 1989) for each locus in each population. Under an equilibrium between random genetic drift and selectively neutral mutation, the expected value of Tajima’s D in samples such as these is close to zero. A skew in the frequency spectrum (excess rare polymorphisms) yields a negative value for Tajima’s D. The distribution of Tajima’s D was simulated as in Braverman et al. (1995) for 10,000 replicates (for the actual sample sizes and the observed number of segregating sites, assuming no intragenic recombination). The proportion of those simulated samples with Tajima’s D values no greater than the observed, P{Tajima’s D ≤ D₀}, is an estimate of the probability of the observed value D₀ or less given the observed number of segregating sites. The observed D values for su(s) from the European sample are significantly negative (P < 0.05) compared to the expectations of the neutral theory and the background selection model, while the corresponding probability for su(wa) is 0.07. Tajima’s D values from the African sample are also negative, but not statistically significant. Wall (1999) has demonstrated that the assumption of no intragenic recombination can lead to excessively conservative tests of Tajima’s D (see also Braverman et al. 1995). Simulations assuming a more realistic level of recombination (3Nc = 5) yield less conservative critical values (see below).
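As an illustration of this test, the following sketch computes Tajima’s D from per-site allele counts and estimates the lower-tail probability by neutral coalescent simulation without recombination. It assumes that conditioning on the number of segregating sites is done by dropping a fixed number of mutations on each simulated genealogy in proportion to branch length; the function names, the seed, and the example counts are hypothetical and do not reproduce the authors’ program.

```python
import math
import random

def tajimas_d(allele_counts, n):
    """Tajima's D from the count of one allele at each segregating site."""
    s = len(allele_counts)
    if s == 0:
        raise ValueError("need at least one segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Average number of pairwise differences, summed over sites.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in allele_counts)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))

def simulate_counts(n, s, rng):
    """One neutral genealogy of n alleles with s mutations placed on it."""
    lineages = [[1, 0.0] for _ in range(n)]  # [descendant leaves, branch length]
    branches = []
    while len(lineages) > 1:
        k = len(lineages)
        t = rng.expovariate(k * (k - 1) / 2.0)  # time to the next coalescence
        for lin in lineages:
            lin[1] += t
        i, j = rng.sample(range(k), 2)
        a, b = lineages[i], lineages[j]
        branches.extend([a, b])
        lineages = [lin for idx, lin in enumerate(lineages) if idx not in (i, j)]
        lineages.append([a[0] + b[0], 0.0])
    weights = [length for _, length in branches]
    hits = rng.choices(branches, weights=weights, k=s)
    return [desc for desc, _ in hits]

def lower_tail_probability(observed_counts, n, replicates=10000, seed=1):
    """Proportion of simulated samples with D no greater than the observed D."""
    rng = random.Random(seed)
    d_obs = tajimas_d(observed_counts, n)
    s = len(observed_counts)
    low = sum(tajimas_d(simulate_counts(n, s, rng), n) <= d_obs
              for _ in range(replicates))
    return low / replicates

# Hypothetical sample: 50 alleles, 10 segregating sites, all singletons.
print(tajimas_d([1] * 10, 50))
print(lower_tail_probability([1] * 10, 50, replicates=1000))
```

A sample made up entirely of singletons gives a negative D, whereas sites at intermediate frequency give a positive D, which is the contrast the test exploits.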
The linkage disequilibrium between sites is depicted in Figures 1a and 2a, which also show the noncoding regions (3220 bp) of the su(s) locus surveyed by SSCP analysis of the indicated fragments for the African and European samples. The ticks on the horizontal line below the SSCP fragments represent the positions of informative sites (at least two of the rarer state observed in the sample). The shading of the circles connected to the ticks by the dashed lines increases with the expected heterozygosity at the site.
The triangular matrix at the bottom presents the statistical significance of the nonrandom associations between the pairs of sites. As expected, closely linked sites with higher expected heterozygosities are more often represented among the pairs showing statistically significant nonrandom association. More than 10% of all pairs of sites at su(s) showed nonrandom associations with P < 0.005 in the African sample (Figure 1a). In the European sample (Figure 2a) 4 of 28 comparisons were significant at the P < 0.01 level. Also shown (small solid triangles) are the positions of the “minimum number” of exchanges in the history of the sample (Hudson and Kaplan 1985): six in the African sample and three in the European sample.
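The pairwise tests summarized in the matrix can be sketched as follows. This is a minimal illustration that assumes sites coded 0/1 and a two-sided Fisher’s exact test, since the text does not state whether one- or two-sided probabilities were used; the function names and the six-allele example are hypothetical.

```python
from math import comb

def two_by_two(site_a, site_b):
    """Counts of the four two-site haplotypes (00, 01, 10, 11)."""
    pairs = list(zip(site_a, site_b))
    return (pairs.count((0, 0)), pairs.count((0, 1)),
            pairs.count((1, 0)), pairs.count((1, 1)))

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact probability for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum over all tables at least as improbable as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)

# Hypothetical pair of sites in complete association in six alleles.
site_1 = [0, 0, 0, 1, 1, 1]
site_2 = [0, 0, 0, 1, 1, 1]
print(fisher_exact_two_sided(*two_by_two(site_1, site_2)))
```

With only six alleles even complete association gives a probability of 0.1, which shows why sites with a very rare state contribute little to such a matrix.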
Figures 1b and 2b present the same types of results for the surveys of several noncoding regions of the su(wa) locus in the African and European samples, respectively. A small block of many highly polymorphic sites between positions 98 and 156 exhibits strong linkage disequilibrium in the sample from Africa (Figure 1b), although at least two exchanges are inferred to have occurred in the ancestry of these sampled alleles within this small fragment. The pattern in the remaining portion of the matrix of nonrandom associations appears similar to that for su(s) from Africa, 24 of 276 pairs significant with P < 0.005. The distribution of these significant comparisons shows little apparent association with the distance between pairs (discussed below). There are a minimum of 10 exchanges in the history of these alleles. The European sample of su(wa) alleles does not exhibit the high level of polymorphism in the first fragment (number 13). The most significant associations are clustered among the tightly linked pairs of sites. A total of 21 of the 153 comparisons are significant with P < 0.005; 9 of these 21 are between adjacent (tightly linked) sites. At least 5 exchanges among the ancestral alleles of those in the European sample can be deduced.
The distribution of linkage disequilibria within and between su(s) and su(wa) is similar to that observed in the survey of the North American population (Aguadé et al. 1994). The estimate of crossing over between su(wa) and su(s) is less than 10⁻³ (see below). As was seen in the North American sample, linkage disequilibria between sites at the two loci are rare. In the European sample none of the 144 pairs of polymorphic sites, 1 at su(s) and the other at su(wa), are in linkage disequilibrium; i.e., P < 0.05. Of the 1008 such pairs in the African sample 80 appear to be in disequilibrium: 7 with P < 0.005, 2 with P < 0.001, and the remainder with P < 0.05; the mean squared correlation coefficient r² = 0.035 (see discussion).
DISCUSSION
This survey brings two new aspects to the study of the reduction in DNA sequence polymorphism in regions of reduced crossing over per physical length in natural populations of D. melanogaster. First, two additional populations (African and European) are surveyed for su(s) and su(wa), providing a more complete view of the pattern of DNA sequence polymorphism at su(wa) and su(s) in D. melanogaster throughout its distribution. Second, the increased overall levels of DNA sequence polymorphism in the African population provide more accurate estimates and more powerful statistical inferences than were available in previous surveys of regions of low crossing over. Africa is thought to be the ancestral home of D. melanogaster (David and Capy 1988). It is not known when this species spread out of Africa to become the cosmopolitan species it is now. It may well have accompanied human ancestors as they migrated out of Africa. In any case, much like their human commensals (Cavalli-Sforza et al. 1994), African D. melanogaster harbors most of the DNA sequence polymorphism found throughout the rest of the world and is significantly more heterozygous than populations on other continents (Begun and Aquadro 1993). It seems probable that African populations have been prehistorically larger and ecologically more stable than populations on other continents that were recently established. African populations are likely to be closer to the idealized populations assumed in theoretical analyses, while non-African populations may still be undergoing more intense adaptive selection to the non-African environments. For these statistical and biogeographical reasons the African sample affords the most straightforward interpretation.
The su(s) and su(wa) region exhibits reduced crossing over per physical length. The map distance to y of genes near this X telomere is the most appropriate available measure of the crossing over per physical length for two reasons. First, it is the most direct and reliable quantitative observation. Second, and more important, is the asymmetry in the pattern of crossing over per physical length in this genomic region. That portion of the genome that is tightly linked to su(s) and su(wa) is toward yellow and extends out to the X telomere. Crossing over between centromere-proximal markers and y continues to decline distally throughout cytological section 1. Crossing over between y (cytological position 1B1; map position 0.0) and su(wa) (1E1-4) is reported to be somewhat <10⁻³ (M. Green, personal communication). R. Voelker and J. Mason (personal communication) report that rare recombinants between y and recessive lethals adjacent to su(s) (1B13) occur with a frequency somewhat <10⁻⁴. And since meiotic crossing over near y and beyond is absent, this terminal region forms a “block” of loci, all at the same (crossing over) distance from su(s) or su(wa). Crossing over is increasing so rapidly in the centromere-proximal direction that the impact of linked selected variation must be much less. For example, the white locus (3C2) crosses over with y ∼3% of the time.
su(s) and su(wa) are thus tightly linked to ∼1% of the genome (cytological section 1) that undergoes little crossing over. This large reduction in crossing over per physical length can be compared to the reductions in polymorphism at the loci in this region. The average number of pairwise differences per site (and
Table 3. SSCP polymorphism at su(s) and su(wa) in Europe
The frequency spectra at loci demonstrating clear reductions in expected heterozygosity associated with reduced crossing over per physical length offer some hope. Braverman et al. (1995) showed that simple directional hitchhiking, which reduces heterozygosity by these observed amounts, should lead to a strong skew in the frequency spectra. This distortion from expectation under selective neutrality should be detectable in samples of the size reported here. Background selection is not expected to produce detectable skew (Charlesworth 1996). While Gillespie’s analyses of various random environment models do not provide sample properties or any indication of statistical power, they do show a clear correlation between the reduction in average heterozygosity and the skew toward an excess of rare polymorphisms (Gillespie 1997). As Table 5 shows, the frequency spectra of nucleotide polymorphisms at both su(wa) and su(s) in both the African and European samples are skewed toward rare sites, unlike those observed previously in the North American sample (Aguadé et al. 1994). The European samples reach nominal statistical significance in the deviations of Tajima’s D from the predictions of the neutral model or background selection. The results from the survey of su(s) and su(wa) from Europe and Africa favor a directional hitchhiking explanation over background selection. But how well do the observations fit the predictions of directional hitchhiking? Using the simulation approach in Braverman et al. (1995), the rates of directional hitchhiking (and selection coefficient) were set to yield the observed reductions in average number of pairwise differences per site from those expected in regions of normal crossing over, 0.007 in Africa and 0.004 in Europe and North America. Figure 3 shows, above the x-axis, the probabilities of the observed Tajima’s D values or less under the neutral or background selection models for the six locus samples.
Depicted below the x-axis in Figure 3 are the proportions of directional hitchhiking simulations in which the Tajima’s D value exceeded that observed. The lack of linkage disequilibrium between these two loci suggests that the similarities are unlikely due to a single recent event.
It is evident in Figure 3 that the North American sample (especially the su(s) result) appears exceptional. Since the African, European, and North American populations cannot be considered strictly independent, it is prudent to take the African results as the most representative, since they are based on many more segregating sites and are from the putative ancestral region. While neither the background selection and neutral models nor the directional hitchhiking model can be rejected by Tajima’s D values from the African sample, these results favor some form of hitchhiking. And, indeed, the European results also support hitchhiking over background selection. The inconsistency of the North American results with this pattern might be attributed to transient hitchhiking associated with the more recent colonization of the Western Hemisphere (David and Capy 1988).
It has been suggested from empirical studies (Stephan and Langley 1989; Begun and Aquadro 1993) and from theoretical modeling (Nordborg et al. 1996; Stephan et al. 1998) that geographic differentiation in DNA sequence polymorphism might be greater in those regions of the genome where crossing over per physical length is reduced. But the inherent paucity of variation and other considerations place considerable limitations on empirically based conclusions (Charlesworth 1998). The consideration of the genetic differentiation among populations of D. melanogaster has two broad aspects: the relatively greater overall level of polymorphism in Africa and the variance in site frequencies among populations. Figure 4, a-d, shows the distributions of FST of individual sites for su(s), Europe and Africa; su(s), North America and Africa; su(wa), Europe and Africa; and su(wa), North America and Africa, respectively. The patterns of these distributions are remarkably consistent in both their shape and the large contribution of polymorphism of the African sample. There is no evidence of “fixed differences” between populations. The mean FST values (Hudson et al. 1992) for su(s) and su(wa) between the African and European (and North American) samples are 0.153 (0.245) and 0.291 (0.343), respectively. Comparable values for loci in regions of normal crossing over have been reported (white, 0.28; vermilion, 0.32; G-6-pdh, 0.30; Pgd, 0.25), while greater differentiation was reported for three loci in regions of low crossing over per physical length [yellow, 0.56; achaete, 0.54; su(f), 0.60; Begun and Aquadro 1993]. This difference in the apparent level of geographic differentiation can be attributed to the small number of polymorphic sites surveyed at the three loci in that study.
Table 4. su(s) and su(wa) polymorphisms in Europe
Figure 2. Polymorphism and linkage disequilibria in the European sample. The positions of polymorphic sites detected (ticks on the third level) in the survey of the 5′ portion of the su(s) (a) and of su(wa) (b) in 51 alleles from a European population are indicated below a depiction of the gene structure (the open box is the 5′ untranslated portion, the solid boxes are coding portions of exons, and the thin lines are the introns). Only those polymorphisms at which the least common state occurred twice or more in the sample are shown. Directly below the gene structure are the positions and sizes of the PCR fragments surveyed by SSCP and DNA sequencing. The triangular matrix of squares represents the statistical significance determined by Fisher’s exact test (uncorrected for multiple tests); white, P ≥ 0.05; light gray, P < 0.05; dark gray, P < 0.01; and black, P < 0.005. The dashed lines connect the polymorphic sites to their corresponding columns in the matrix. The shading of the circles at the top of each column increases with the expected heterozygosity of the polymorphic site. Along the left margin of the matrix are the positions in the published sequence of the corresponding polymorphic sites. Along the right margin are the inferred positions of the “minimum” number of recombination events in the history of the sample (Hudson and Kaplan 1985).
Table 5. Summary statistics
Figure 3. The estimated relative probabilities of Tajima’s D values less than that observed under the neutral or background selection models, “N.T.,” or greater than that observed under the directional hitchhiking model, “H.H.” Neutral coalescent simulations (Hudson 1990) were conducted to generate 10,000 samples of size 50 for each locus in each population. Given the observed numbers of segregating sites the distribution of Tajima’s D was obtained. The proportion of simulations yielding a value less than that observed in the data (Table 5) is plotted above (gray bars for 3Nc = 0.0; hatched bars for 3Nc = 5.0). Simulations of recurrent, random directional hitchhiking (Braverman et al. 1995) were conducted with the hitchhiking intensity sufficient to reduce the average number of pairwise differences per site to the observed values from a neutral expectation of 0.007 for Africa and of 0.004 for Europe and North America. The proportion of 10,000 such simulations yielding a Tajima’s D value greater than the observed is plotted below (open bars). The horizontal dashed lines indicate P = 0.05.
Figure 5 shows the distributions of the linkage disequilibria (measured as squared correlation coefficient, r²) between pairs of sites plotted against distances (base pairs) for su(wa) and su(s), respectively, for the African sample (the sample with the most polymorphism and thus the most linkage disequilibrium information). Also, as argued above from a biogeographic perspective, the African population may be closer to equilibrium with respect to mutation, drift, selection, and recombination. Only sites where the rarer state occurred at least twice in the sample can be considered in the linkage disequilibrium analysis. Clearly most of the large linkage disequilibria (r² > 0.5) occur between sites separated by small distances, <200 bp. Also shown are the means over six contiguous intervals of distance [63 and 105 r² values averaged in each interval for su(s) and su(wa), respectively]. As expected from previous surveys and theoretical predictions, the scatter is large for individual r² values. The distribution of the mean r² values reinforces the view that linkage disequilibrium tends to dissipate quickly and is near that expected from sampling, 0.02 for distances >500 bp.
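The quantity plotted can be sketched as below, assuming sites coded 0/1 and the usual definition of the squared correlation between two biallelic sites, r² = D²/[p_A(1 − p_A)p_B(1 − p_B)] with D = p_AB − p_A p_B. The function names, positions, and haplotype columns are hypothetical.

```python
from itertools import combinations

def is_informative(site, min_count=2):
    """True if the rarer state occurs at least min_count times."""
    ones = sum(site)
    return min(ones, len(site) - ones) >= min_count

def r_squared(site_a, site_b):
    """Squared correlation coefficient between two 0/1-coded sites."""
    n = len(site_a)
    p_a = sum(site_a) / n
    p_b = sum(site_b) / n
    p_ab = sum(1 for x, y in zip(site_a, site_b) if x == 1 and y == 1) / n
    d = p_ab - p_a * p_b
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

def r2_by_distance(positions, sites):
    """(distance in bp, r^2) for every pair of informative sites."""
    keep = [i for i, s in enumerate(sites) if is_informative(s)]
    return [(positions[j] - positions[i], r_squared(sites[i], sites[j]))
            for i, j in combinations(keep, 2)]

# Hypothetical columns for three sites scored in four alleles.
positions = [10, 50, 400]
sites = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]]
for distance, r2 in r2_by_distance(positions, sites):
    print(distance, r2)
```

Sites whose rarer state occurs only once are dropped before pairing, matching the restriction stated above.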
To summarize the distributions of r² an empirical function was developed and fitted. Under the assumption of r = 0.0, the expected value of r² is the reciprocal of the sample size, 0.02 in this case. Under Wright-Fisher sampling the mean value of r² in the absence of any recombination can be estimated by simulation (Hudson 1990). Specifically an estimate of this “intercept” was derived from 1000 neutral, infinite site model coalescent simulations of samples of size 50 with 15 segregating sites and 4Nc = 20, where N is the diploid population size and c is the recombination per generation (Hudson 1990). As in our data analysis from the su(s) and su(wa) loci, only sites where the rarer state occurred at least twice in the sample were considered. The value a = 0.337 was obtained by fitting a and λ in r² = 0.02 + a/(1 + λ × distance) to the 45,434 simulated r² values, where distance is in units of 4Nc (Hill and Weir 1994). The fitted λ for the simulated data was 0.354. The impact of recombination (proportional to distance) can be extended to include random gene conversion events with exponentially distributed tract lengths,
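A fit of this empirical function can be sketched as follows. The text does not state how the fit was performed, so this illustration assumes ordinary least squares: for a fixed λ the function is linear in a, so a has a closed form and λ can be found by a one-dimensional grid search. The function name, the grid limits, and the noiseless test values are hypothetical.

```python
def fit_r2_decay(distances, r2_values, baseline=0.02):
    """Least-squares fit of r2 = baseline + a / (1 + lam * distance).

    Returns (a, lam). lam is searched on a log-spaced grid from 1e-3 to
    1e2 (an arbitrary choice); a is solved in closed form at each lam.
    """
    best = None
    for step in range(2001):
        lam = 10 ** (-3 + 5 * step / 2000)
        x = [1.0 / (1.0 + lam * d) for d in distances]
        a = (sum((y - baseline) * xi for y, xi in zip(r2_values, x))
             / sum(xi * xi for xi in x))
        sse = sum((y - baseline - a * xi) ** 2 for y, xi in zip(r2_values, x))
        if best is None or sse < best[0]:
            best = (sse, a, lam)
    return best[1], best[2]

# Noiseless values generated from the fitted parameters reported in the text.
distances = [0.0, 0.5, 1.0, 2.0, 5.0, 10.0, 20.0, 50.0]
r2_values = [0.02 + 0.337 / (1.0 + 0.354 * d) for d in distances]
a_hat, lam_hat = fit_r2_decay(distances, r2_values)
print(round(a_hat, 3), round(lam_hat, 3))
```

With real data the distances would be in base pairs rather than in units of 4Nc, and the grid limits for λ would need to be rescaled accordingly.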
Distributions of FST values for individual polymorphic sites at the su(s) and su(wa) loci in comparisons of the African sample with the European (a and c) and the North American (b and d). The FST was calculated as 1 - (Hw/Hb), where Hw is the unweighted average expected heterozygosity and Hb is the expected heterozygosity of an unweighted pooled population. The solid portion of each bar reflects the number of sites polymorphic only in the African sample and exhibiting an FST value within the particular range. The open portion shows the number polymorphic only in the non-African sample, while the gray portion indicates the number of sites polymorphic in both samples.
Two aspects of these distributions of linkage disequilibria at su(s) and su(wa) in the African sample are unexpected. First, the scale of linkage disequilibrium is not different between these two loci, despite the fact that the density of crossing over per physical length is estimated to be as much as 10-fold less at su(s). Equally surprising is the fact that this pattern and scale of linkage disequilibrium is the same as that observed at the white locus (Miyashita and Langley 1988; Miyashita et al. 1993). The white locus is in a genomic region of normal crossing over and exhibits much higher levels of polymorphism. It is clear that the rate of crossing over is not the determining parameter of linkage disequilibrium within these loci. Gene conversion is the obvious alternative recombination mechanism. The scale of strong linkage disequilibrium (r2 > 0.3; <400 bp) is approximately that reported for both meiotic and mitotic gene conversion in Drosophila. While quantification of gene conversion rates in Drosophila is limited, the evidence indicates that half or more of all meiotic recombination events among intragenic sites are gene conversions (Finnerty 1976). Thus gene conversion should have at least the same impact on intragenic linkage disequilibrium as crossing over. Most meiotic and mitotic conversion tracts are small (<500 bp), comparable to the estimates above. The reported mean P-element-associated conversion tract is ≈1400 bp (Preston and Engels 1996), while that for meiotic gene conversion tracts is estimated to be ≈400 bp (Hilliker et al. 1994).
From both the distributions of r2 in Figure 5 and the inferred minimum recombination events depicted in Figures 1 and 2, it is evident that these loci experienced a great deal of recombination in the history of these sampled alleles, despite their location in a region of greatly reduced crossing over and despite the drastic reduction in polymorphism (and thus time since the last common ancestor of the sampled alleles). No theoretical analysis of the impact of intragenic recombination (e.g., gene conversion) on the background selection effect or on the directional hitchhiking effect has been reported. If the impact of background selection on linkage disequilibrium is similar to its effect on polymorphism, it can be approximated by r2 = 1/(1 + 4Nf0c), where f0 is the fraction of the deleterious-mutation-free chromosomes in the population, which can be estimated as the relative reduction in expected heterozygosity per nucleotide site (Charlesworth et al. 1993; Charlesworth 1994, 1996). The scale of linkage disequilibrium (in base pairs) should increase by 1/f0, even if gene conversion is the dominant source of intragenic recombination and if the rates of gene conversion are similar in regions of low and normal crossing over per physical length. Similar arguments and preliminary computer simulations extending our previous approach (Braverman et al. 1995) suggest that directional hitchhiking will also increase the scale of linkage disequilibrium, if gene conversion rates are uniform.
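The 1/f0 scaling in the approximation r2 = 1/(1 + 4Nf0c) can be illustrated with a short sketch. It assumes, for illustration only, that c between two sites is a constant per-base-pair rate multiplied by their distance; the parameter `four_n_rate_per_bp` (4N times that per-base-pair rate) and the numerical values are hypothetical and are not estimates from these data.

```python
def r2_background_selection(distance_bp, four_n_rate_per_bp, f0):
    """Approximate r2 = 1 / (1 + 4*N*f0*c), taking c = per-bp rate * distance."""
    return 1.0 / (1.0 + four_n_rate_per_bp * f0 * distance_bp)

def half_decay_distance(four_n_rate_per_bp, f0):
    """Distance (bp) at which the approximated r2 falls to 1/2."""
    return 1.0 / (four_n_rate_per_bp * f0)

rho = 0.01  # hypothetical value of 4N * (recombination rate per bp per generation)

# Scale of linkage disequilibrium without background selection (f0 = 1)
# and with a hypothetical 10-fold reduction in heterozygosity (f0 = 0.1).
neutral_scale = half_decay_distance(rho, f0=1.0)
reduced_scale = half_decay_distance(rho, f0=0.1)

print(neutral_scale, reduced_scale, reduced_scale / neutral_scale)
```

Under these assumptions the ratio of the two scales is 1/f0, so a 10-fold reduction in heterozygosity would be expected to stretch the distance over which r2 decays by roughly 10-fold, whatever the recombination mechanism, provided the underlying rate is the same in both regions.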
Figure 5. Linkage disequilibria vs. genomic distance at su(s) and su(wa) in the African sample. The squared correlation coefficient, r2, for pairs of sites in su(s) (horizontal ticks) and in su(wa) (vertical ticks) is plotted against the number of base pairs separating the sites. The means of r2 over six contiguous intervals (k values each) are also plotted: su(s), ▴, k = 63; and su(wa), ▾, k = 105. Also plotted is a fit for each locus of the equation
dashed line, su(s); and solid line, su(wa) (see text). The parameters, the rate of gene conversion, g (events per base pair per generation in the population), the mean length (base pairs) of a gene conversion tract, t, and the rate of crossing over per base pair, c (events per base pair per generation in the population), were estimated by nonlinear regression to the formula (see text): su(s), ĝ = 0.010, t̂ = 302, and ĉ = 0.0; su(wa), ĝ = 0.006, t̂ = 538, and ĉ = 0.0. The mean r2 for all the between-locus [su(s) - su(wa)] pairs of polymorphic sites is plotted as a large circle at the far right.
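The simulation-derived baseline described in the text, r2 = 0.02 + a/(1 + λ * distance) with distance in units of 4Nc, can be evaluated directly. The sketch below reads 0.337 as the fitted a and 0.354 as the fitted λ, which is an interpretation of the text; the function names are hypothetical. It covers only this crossing-over form, not the extension to gene conversion with exponentially distributed tract lengths.

```python
SAMPLING_FLOOR = 0.02  # reciprocal of the sample size, n = 50 (from the text)
A_FIT = 0.337          # fitted a from the coalescent simulations (from the text)
LAMBDA_FIT = 0.354     # fitted lambda from the coalescent simulations (from the text)

def baseline_r2(scaled_distance, a=A_FIT, lam=LAMBDA_FIT, floor=SAMPLING_FLOOR):
    """Expected r2 at a distance expressed in units of 4Nc."""
    return floor + a / (1.0 + lam * scaled_distance)

def distance_for_r2(target, a=A_FIT, lam=LAMBDA_FIT, floor=SAMPLING_FLOOR):
    """Invert the curve: scaled distance at which the expected r2 equals target."""
    if not floor < target <= floor + a:
        raise ValueError("target must lie in (floor, floor + a]")
    return (a / (target - floor) - 1.0) / lam

# The curve starts at floor + a for completely linked sites and
# approaches the sampling floor as the scaled distance grows.
for x in (0.0, 1.0, 10.0, 100.0):
    print(x, round(baseline_r2(x), 4))
```

The inverse function is a convenience for asking at what scaled distance the expected r2 drops to a chosen threshold; converting that to base pairs requires an estimate of the per-base-pair recombination parameter, which the sketch does not supply.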
The observation of extensive recombination (presumably gene conversion) in the obviously short histories of these alleles at loci in regions of low crossing over per physical length (and low polymorphism) demands a modification or extension of the proposed linked-selected-perturbation models if they are to survive as general explanations of the correlation between standing polymorphism and crossing over per physical length. The rate of gene conversion could go up as the crossing over rate goes down. While there is no evidence to support this idea, it may be that the suppression of crossing over (near centromeres and telomeres) is simply the shunting of incipient crossovers toward gene conversions. Another interesting potential explanation for the high level of recombination in the histories of these alleles is a heterozygosity-dependent rate of gene conversion (see Stephan and Langley 1992). If gene conversion occurs at a higher rate between alleles that differ by fewer sites, the rate of recombination would change dynamically with increasing heterozygosity. Thus as directional hitchhiking or background selection reduced standing polymorphism, the recombination rate would increase. At equilibrium, regions of low crossing over would contain less polymorphism and thus undergo more gene conversion. This hypothesis of a heterozygosity-dependent rate of gene conversion is supported by observations in many systems including Drosophila. Nassif and Engels (1993) concluded that 0.5% sequence heterozygosity reduces the rate of P-element-induced mitotic gene conversion to 25% of that without heterozygosity. Variations in levels of heterozygosity have little or no effect on the rates of crossing over (Rutherford and Carpenter 1988). This is attributed to the strong regulatory processes underlying interference and the interchromosomal effect in crossing over (Lucchesi and Suzuki 1968). 
It is not known if heterozygosity-dependent inhibitions occur for gene conversion arising during meiosis or by other mechanisms in mitotic cells in Drosophila, but it is well known in yeast (see Kirkpatrick et al. 1998; Chen and Jinks-Robertson 1999). If this indication of population genetic consequences of the interaction between DNA sequence heterozygosity and rates of genetic exchange can be corroborated, our theories of the forces shaping genomic polymorphism and divergence will require interesting improvements.
Acknowledgments
We thank the many colleagues whose comments, questions, and criticisms have helped us improve this report. Special thanks to R. Hudson for the suggested extension to random conversion tract length in the fitted equation in the Figure 5 legend and to Peter Andolfatto for queries leading to better data quality. This work was supported by National Science Foundation (NSF) grant DEB 95-09548. E.H. was supported by a postdoctoral fellowship from the Finnish Research Council. J.M.B. was supported by an NSF/Sloan Foundation Postdoctoral Research Fellowship in Molecular Evolution and a postdoctoral research fellowship from the Ministerio de Educación y Ciencia, Spain.
Footnotes
- Communicating editor: R. R. Hudson
- Received June 25, 1999.
- Accepted August 23, 2000.
- Copyright © 2000 by the Genetics Society of America