- THIS ARTICLE
-
Abstract
- Full Text (PDF)
- Alert me when this article is cited
- Alert me if a correction is posted
- SERVICES
- Similar articles in this journal
- Similar articles in PubMed
- Alert me to new issues of the journal
- Download to citation manager
- Reprints & Permissions
- CITING ARTICLES
- Citing Articles via HighWire
- Citing Articles via Google Scholar
- GOOGLE SCHOLAR
- Articles by Andolfatto, P.
- Articles by Wall, J. D.
- Search for Related Content
- PUBMED
- PubMed Citation
- Articles by Andolfatto, P.
- Articles by Wall, J. D.
Linkage Disequilibrium Patterns Across a Recombination Gradient in African Drosophila melanogaster
Peter Andolfattoa,b and Jeffrey D. Wallc,da Department of Zoology, University of Toronto, Toronto, Ontario M5S 3G5, Canada,
b Institute of Cell, Animal and Population Biology, University of Edinburgh, Edinburgh EH9 3JT, United Kingdom,
c Program in Molecular and Computational Biology, University of Southern California, Los Angeles, California 90089
d Department of Human Genetics, University of Chicago, Chicago, Illinois 60637
Corresponding author: Peter Andolfatto, Ramsay Wright Bldg., 25 Harbord St., University of Toronto, Toronto, Ontario M5S 3G5, Canada., pandolfatto{at}zoo.utoronto.ca (E-mail)
Communicating editor: W. STEPHAN
| ABSTRACT |
|---|
Previous multilocus surveys of nucleotide polymorphism have documented a genome-wide excess of intralocus linkage disequilibrium (LD) in Drosophila melanogaster and D. simulans relative to expectations based on estimated mutation and recombination rates and observed levels of diversity. These studies examined patterns of variation from predominantly non-African populations that are thought to have recently expanded their ranges from central Africa. Here, we analyze polymorphism data from a Zimbabwean population of D. melanogaster, which is likely to be closer to the standard population model assumptions of a large population with constant size. Unlike previous studies, we find that levels of LD are roughly compatible with expectations based on estimated rates of crossing over. Further, a detailed examination of genes in different recombination environments suggests that markers near the telomere of the X chromosome show considerably less linkage disequilibrium than predicted by rates of crossing over, suggesting appreciable levels of exchange due to gene conversion. Assuming that these populations are near mutation-drift equilibrium, our results are most consistent with a model that posits heterogeneity in levels of exchange due to gene conversion across the X chromosome, with gene conversion being a minor determinant of LD levels in regions of high crossing over. Alternatively, if levels of exchange due to gene conversion are not negligible in regions of high crossing over, our results suggest a marked departure from mutation-drift equilibrium (i.e., toward an excess of LD) in this Zimbabwean population. Our results also have implications for the dynamics of weakly selected mutations in regions of reduced crossing over.
THE parameters
(= 4Neµ), where Ne is the species effective population size and µ is the mutation rate per generation, and
(= 4Ner), where r is the sex-averaged crossing-over rate per generation, influence the amount and the pattern of genomic sequence variability. Under a Wright-Fisher neutral model, hereafter the standard neutral model, these two parameters are both proportional to Ne. Ne cannot be estimated directly from polymorphism data; instead, it must be estimated indirectly using estimates of
and µ or estimates of
and r. Under the standard neutral model, both methods (i.e., using
and
or
and
) should yield similar estimates of Ne (![]()
Contrary to these expectations, recent studies of human and Drosophila data have revealed that the two methods for estimating Ne yield disparate results. In humans, the estimate of Ne from
and
, over short physical distances (i.e., within several kilobases), is substantially lower than the estimate from
and
. This suggests less intragenic linkage disequilibrium (LD) than expected under the standard neutral model, given estimated rates of crossing over (![]()
![]()
![]()
![]()
. By determining how various population genetic models differentially affect levels of diversity and levels of LD, we might infer which types of models are most plausible for each species (![]()
![]()
The Drosophila data analyzed thus far have consisted predominantly of non-African samples. Both D. melanogaster and D. simulans are human commensals; they are thought to have originated in sub-Saharan Africa and only relatively recently colonized other locations (![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
Interestingly, patterns of variation in human populations share some similarities with Drosophila (![]()
![]()
![]()
![]()
![]()
In addition to the effects of population history on patterns of LD, the nature of the meiotic recombination mechanism itself is an issue. In current models of meiotic recombination (e.g., ![]()
![]()
Frequent gene conversion in humans may account for inflated estimates of
(over small physical scales) relative to large-scale map-based estimates of r in African populations (![]()
![]()
![]()
![]()
In an attempt to distinguish between possible explanations for the discordant estimates of Ne in Drosophila, we analyze sequence polymorphism data at multiple loci in Zimbabwean population samples of D. melanogaster. This population is more likely than non-African populations to be closer to equilibrium since it is nearer to D. melanogaster's ancestral range and may not have experienced drastic changes in size or ecology in its recent history.
The physical scale of LD should increase as the rate of recombination decreases. Interestingly, a recent survey of two gene regions at the tip of the X chromosome of D. melanogaster, where rates of meiotic crossing over are greatly reduced, suggested that the physical scale of LD is not larger than that in other regions of the genome (![]()
![]()
![]()
| METHODS |
|---|
Previously published data:
We consider all sequence polymorphism studies that include a sample size (n) of seven or more D. melanogaster sequences from Zimbabwe and 10 or more segregating sites (S). We adopt these minimal size restrictions because estimates of
are highly biased and inaccurate when both n and S are small (see justification in RESULTS). Fifteen previously published X chromosome data sets fit this size requirement: su(s) and su(wa) (![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
Data collection:
We collected nucleotide variability data for eight additional gene regions: yellow (y, 2017 bp, n = 49); Fasciclin-2 (Fas2, 588 bp, n = 17); spaghetti squash (sqh, 572 bp, n = 16); Hyperkinetic (Hk, 563 bp, n = 21); dusky (dy, 567 bp, n = 20); licorne (lic, 615 bp, n = 23); rutabega (rut, 548 bp, n = 22), and 1049 bp of the white gene (in two regions of 546 and 503 bp, respectively, separated by 3.9 kb, n = 16), which we analyzed as one region. We also expanded the snf1A data set reported in ![]()
Details of the sequencing strategy for each gene region can be found at http://helios.bto.ed.ac.uk/evolgen/andolfatto/zimbabweLD. We PCR amplified each region from genomic DNA of single male flies (one male from each isofemale line), precipitated the products with polyethylene glycol (PEG-8000) to remove primers, and sequenced them on both strands using Big-Dye (Applied Biosystems, Foster City, CA) or DYEnamic ET (Amersham, Arlington Heights, IL) sequencing kits. One of our PCR/sequencing primers overlapped with a 63-bp deletion in region 4 of snf1A in individual zs53. For this region, this allele was amplified with flanking primers, cloned into pCR4-TOPO (Invitrogen, San Diego), and three independent clones were sequenced. For all loci surveyed, the lines used either are the same 13 individuals surveyed in ![]()
Analysis of levels of variability:
Our first goal is to estimate the effective population size (Ne) for each locus independently. For this purpose, we estimate 
=
/4
assuming a constant mutation rate, µ (see below). We multiply 
for loci on the X chromosome by 4/3 to make them comparable to estimates for autosomal loci (which makes a number of assumptions, see below). We employ two estimators of
:
W, which is based on the number of single nucleotide polymorphisms observed in the sample (![]()

, which is the average pairwise divergence between sampled chromosomes per site (cf. ![]()

, we also use the Jukes-Cantor correction for multiple hits as implemented in DnaSP version 3.59 (http://www.ub.es/dnasp/).
A second goal of this analysis is to compare joint estimates of Ne from
(
) to joint estimates of Ne from LD (see below). Since a correlation between variability and recombination rates has been well documented in D. melanogaster (![]()
![]()
![]()
> 1 cM/Mb (i.e., those regions least likely to be affected by selection at linked sites). To do this, we calculated averages of
W and 
for silent sites (weighted by the number of silent sites surveyed). This approach has the advantage that
W and 
are unbiased estimates of
; however, confidence intervals must be obtained by coalescent simulations for each locus. We instead estimated
(
ML) and confidence intervals using a likelihood-based approach based on Equation 12 of ![]()
![]()
![]() |
(1) |
where
![]() |
(2) |
and
![]() |
(3) |
over a range of
values, where n is the sample size and S is the observed number of silent polymorphisms (again including multiple hits). Assuming that each genomic region investigated is evolutionarily independent, likelihoods can be combined to produce a joint likelihood surface for all loci, the maximum of which is
ML. Confidence limits for
ML are estimated using the standard
2 approximation. This approach assumes no recombination within loci, but independence among loci. Simulations confirm that the bias of this estimator is not large and that confidence limits for
estimated by this method are wider than those in the presence of intralocus recombination (S. WRIGHT, personal communication).
Estimating 
from
requires knowledge of the mutation rate for silent sites. By comparing estimates of 
, or estimating 
from a joint estimate of
, we implicitly assume that the mutation rate at silent sites is constant among loci. We assume that µ = 1.5 x 10-8/silent site/year and that D. melanogaster populations undergo an average of 10 generations/year, yielding a
= 1.5 x 10-9/silent site/generation. Our estimate of the mutation rate depends on many assumptions (reviewed in ![]()
![]()
![]()
![]()
Linkage disequilibrium analysis:
The goal of this analysis is to estimate Ne from levels of LD. We choose to summarize LD by an estimate of the population recombination rate
= 4Ner, where Ne is the effective population size, and r is the sex-averaged rate of recombination (cf. ![]()
![]()
We first employ an estimator,
W00, proposed by ![]()
W00, we summarize the data using the observed number of distinct haplotypes (H) and the minimum number of inferred recombination events (RM, cf. ![]()
that maximizes the likelihood of obtaining the observed values of H and RM (cf. ![]()
![]()
for each locus on the basis of comparisons of physical and genetic maps as described in ![]()
![]()

(=
/4
) over increments of 3.3 x 105 or smaller, using a minimum of 2 x 105 replicates for each parameter value for each locus (cf. ![]()

is estimated for each locus separately and also jointly for loci with
> 1 cM/Mb, assuming that each locus is evolutionarily independent (cf. ![]()

for loci on the X chromosome by 4/3 to make them comparable to estimates for autosomal loci. Approximate 95% confidence intervals for joint estimates
W00 are found by making the standard asymptotic maximum-likelihood assumptions. We caution that there is no evidence that these assumptions are appropriate in this context, but the intervals serve as a useful diagnostic.
Several other summary estimators of
have also been proposed (e.g., ![]()
![]()
![]()
![]()
H01, which is a composite-likelihood estimator of
based on pairwise LD among all pairs of polymorphic mutations in a sample. This estimator performs similarly to
W00 and is substantially better than other estimators (see ![]()
![]()
H01 is extremely large (i.e., >10,000). We exclude this locus from correlation tests and thus tests involving
H01 use only 23 of 24 loci.
A considerable fraction of polymorphisms segregating among autosomal genes (43/279) are amino acid replacement substitutions. If these are under weak purifying selection (![]()
. We estimated heterozygosities for each site [
= 2np(1 - p)/(n - 1), where n is the sample size and p is the polymorphic site frequency] and found that the mean heterozygosities of replacement (mean
= 0.29) and silent (mean
= 0.32) polymorphisms are not significantly different (P = 0.25, Student's t-test with unequal variances; P = 0.27, two-tailed Mann-Whitney U-test). We have thus chosen to include replacement polymorphisms; excluding them would considerably reduce the number of polymorphisms for some loci, possibly resulting in an upward bias in estimates of
(see RESULTS). The fraction of replacement polymorphisms among X-linked loci was considerably smaller (9/884) and thus their effect on estimates of
, if any, should be negligible. Note that part of the difference in the number of amino acid replacement polymorphisms detected on X-linked vs. autosomal genes reflects the fact that about two times more coding DNA was surveyed for the latter. The remainder of the difference probably reflects differences in the efficacy of selection against amino acid polymorphisms on the two chromosomes (see ![]()
Assessing the behavior of estimators under neutrality and alternative models:
To test the properties of the two estimators of
under the standard neutral model, we run simulations with sample sizes (n) of 7, 13, or 50, with
= 3
. We consider a range of
values and calculate the average and median values of
divided by the actual value of
. A total of 104 replicates were run for
W00 (and 2 x 103 replicates for
H01) under each parameter combination. Results of similar simulations for n = 50 for
= 4
and
=
are reported by ![]()
/
will vary considerably across the genome, with
< 3
for yellow, su(s), and su(wa) and
> 3
for most loci in regions of high rates of crossing over. In addition to the simulations above, we have also investigated the distribution of
W00 for parameters (n and
) that best match those expected for the yellow locus under
/4 and
3
. These results have been posted at the site http://helios.bto.ed.ac.uk/evolgen/andolfatto/zimbabweLD.
Directional selection at linked sites is known to shape patterns of genetic variation (![]()
![]()
![]()
![]()
![]()
Background selection due to strongly deleterious mutations can be modeled as a simple reduction in effective population size (![]()
(and
) in simulations of the standard neutral model. However, under background selection models that posit more weakly deleterious mutations, this assumption may not be valid (![]()
![]()
We do examine the effects of recurrent hitchhiking due to advantageous mutations by running simulations under a model of recurrent, nonoverlapping selective sweeps (![]()
= 15,
= 45) and determine how the average and median values of 
and 
, as estimated assuming the standard neutral model, change as a function of the rate of selective sweeps (
in ![]()
Incorporating the effects of gene conversion:
Estimates of 
based on rates of crossing over (
) may be systematically underestimated because they ignore gene conversion. Gene conversion was implemented into the standard coalescent with recombination (cf. ![]()
![]()
![]()
and that the distribution of gene conversion tract lengths is geometric with a mean of 352 bp (![]()
We consider three models of how gene conversion (GC) and crossing-over (CO) rates may be associated. In the first model (GC1), we assume that two-thirds of recombination events are GC, and thus the rate of GC,
GC = 2
between adjacent sites, where
is the estimated local rate of crossing over for a particular gene region. In a second model (GC2), we assume that the rate of GC is constant across a chromosome;
GC = 2
MAX, where
MAX is the highest estimated rate of crossing over on the chromosome (
MAX = 2.27 x 10-8/generation between adjacent sites on the X chromosome). In a third model (GC3), we assume that the total rate of recombination between adjacent sites is constant across a chromosome and that the rate of gene conversion between adjacent sites is
GC = (
MAX -
), where
is the estimated rate of crossing over for the locus under study. GC1 and GC2 are equivalent for genomic regions with the highest rates of crossing over and diverge in regions of reduced crossing over. Likewise, GC3 is equivalent to a model with no GC for regions with the highest rates of crossing over. These models are meant to be illustrative rather than quantitative and other possible models of recombination are addressed in the DISCUSSION. 
is estimated as above (from
W00) from simulations with gene conversion to estimate the likelihoods.
How do gene conversion rates in our models compare to data on gene conversion from Drosophila? On the basis of the data from the genes maroon-like and rosy, it is thought that the rate of gene conversion (
, see ![]()
![]()
![]()
10-5, ![]()
![]()
1 at maroon-like (![]()
4 for rosy (![]()
= 0.7 cM/Mb) would be
2 and corresponds to a GC rate of 6 x 10-6/generation. In the GC2 model, the GC:CO ratio at rosy would be
6. For maroon-like (
= 1.36 cM/Mb), the GC:CO ratio = 2 under GC1 (rate
10-5/generation) and would be
0.7 under GC3 (rate
3 x 10-6/generation). In the GC2 model, the GC:CO ratio at maroon-like would be
3. Thus, our assumed rates of gene conversion, and the extent of association between gene conversion and crossing over, are roughly in agreement with the limited data on rates of gene conversion in Drosophila.
| RESULTS |
|---|
Performance of
estimators and choice of data:
With a choice of two similarly performing
estimators,
W00 (![]()
H01, (![]()
, proportional to the number of segregating sites). Both estimators show considerable upward bias when the sample size is small and/or
is small; however, the median of
W00 is generally biased downward under these conditions. Overall,
H01 appears to be more biased than
W00; this is particularly pronounced when the number of segregating sites is small (even for large sample sizes). On the basis of these simulations, we conclude that
W00 should be used cautiously for 7
n
13 unless
is in the vicinity of 10 or larger and probably not at all for samples of <7 alleles. Since
H01 seems to be particularly sensitive to small
, it should be used cautiously when there are <13 alleles unless
10 and probably not at all if
< 5.
|
We have used these results as a guide to select data for our study (see METHODS) and accordingly we have limited the data sets considered here to those that have n
7 and S > 10 (TABLE AA). Only 4 of our selected data sets have n = 7 and for each of these
W > 8 (
W is calculated from Stot; see TABLE AA). The 10 data sets that have 7 < n
13 have
W
6. Possible effects of biases of the estimators on our results are noted in the analyses and discussion that follow.
Fig 1 compares estimates of
W00 and
H01 for all loci fitting the above restrictions.
W00 and
H01 are strongly positively correlated (R = 0.78, P < 0.001, n = 33, Spearman rank correlation test, two-tailed). However,
H01 are also systematically larger than
W00, as might be expected given the results in Table 1. TABLE AA (http://helios.bto.ed.ac.uk/evolgen/andolfatto/zimbabweLD) summarizes correlations among n,
,
W (per locus), and estimates of
(per locus) for the X chromosome data.
|
Differences in sample size and the number of segregating sites are not the only possible causes of the systematic difference between estimators. Regions of reduced recombination in D. melanogaster have lower levels of variability due to hitchhiking and/or background selection (![]()
![]()
estimators will be similar to that under the neutral model (cf. Table 1). However, the observation that these regions also have more low-frequency polymorphisms suggests that recurrent selective sweeps may also influence polymorphism patterns in these regions (![]()
Since one of our objectives is to compare 
and 
estimates across a gradient of crossing-over rates, we investigated the properties of the two
estimators under a recurrent hitchhiking model (Table 2). More frequent hitchhiking is expected to decrease levels of variability at closely linked sites (![]()
![]()

. As can be seen in Table 2, the two
estimators respond differently to increasing rates of hitchhiking.
W00 is strongly positively correlated with
and generally yields 
that are slightly smaller than 
. In contrast, the median
H01 is only weakly correlated with estimates of
, and 
from
H01 are typically much larger than estimates of 
; this difference is larger as the rate of selective sweeps increases.
|
Further investigation of the behavior of these
estimators under alternative models is beyond the scope of this study. In the analyses that follow, we focus on
W00 for several reasons. First,
W00 allows us to easily combine information from multiple loci in a likelihood framework. Second,
W00 is not severely biased upward under parameters that best match our available data (i.e., most are small samples with few segregating sites); in fact, the median estimate is often below the true
. Finally, using
W00 is not likely to result in 
> 
in regions of reduced crossing over if recurrent hitchhiking is reducing levels of variability in these regions.
Comparing levels of diversity and linkage disequilibrium:
In a neutral panmictic population, we expect that 

. Table 3 lists estimates of
for each data set and these are plotted in cytological order in Fig 2. With the exception of the tip of the X chromosome (the first 10 points cover cytological bands 1B1 to 3B3 on the X chromosome), values of 
based on
W00 are slightly lower than the values of 
(13/14 on the X have 
< 
). For 
based on
H01, the pattern is less striking, particularly for the X (7/14 have 
< 
; data not shown); however, the pattern is equally strong for the two estimators on the autosomes (9/9 loci and 8/9 loci have 
< 
, based on
W00 and
H01, respectively). It is difficult to draw solid conclusions from this type of analysis because different sample sizes and numbers of segregating sites for each locus (and different averages for these loci on the X and autosomes) complicate the interpretation of comparisons.
|
|
Further, due to the inherent stochasticity of the evolutionary process, individual estimates of 
and 
are not particularly accurate. We can achieve greater precision by estimating the relative likelihoods for different
values using multiple loci jointly. Fig 3A shows the joint-likelihood curve of 
(based on
W00) for the X-linked and autosomal loci with
> 1 cM/Mb (i.e., loci in areas of high recombination that are least likely to be affected by selection at linked sites). The maximum-likelihood estimate of 
for the X chromosome is 
X = 1.7 million, with an
95% confidence interval of 1.12.3 million. The corresponding estimate of 
X is 4.0 million (based on the
ML), with a 95% confidence interval of 3.15.1 million. For the autosomes, 
A = 0.4 million, with an
95% confidence interval of 0.20.9 million. The corresponding estimate of 
A is 1.8 million with a 95% confidence interval of 1.32.9 million.
|
Several interesting observations emerge from these results. First, the estimate of 
for the X chromosome is larger than that for the autosomes. This trend has been noted before on the basis of levels of nucleotide (![]()
![]()
, we can reject the standard model with equal population sizes for males and females (i.e., an X:autosome ratio of 3/4). Several factors may contribute to this pattern, including a large variance in reproductive success for males (i.e., sexual selection), background selection, and autosome-specific inversion polymorphisms (![]()
![]()
![]()
![]()

X/
A = 2.2 whereas 
X/
A = 4.1. More LD on the autosomes (reflected in the higher 
X/
A ratio) is not predicted under either background selection or sexual selection models and therefore points to possible effects of inversion polymorphisms. Inversions are both much more common and at higher frequencies on the autosomes of D. melanogaster (![]()
![]()
![]()
Interestingly, joint estimates of 
and 
are significantly different from each other on both the X and the autosomes, given our assumed mutation rate (
= 1.5 x 10-9/generation). This suggests that Zimbabwe populations, despite our a priori prediction, show an excess of LD relative to expectations under the neutral model. However, while confidence intervals for 
are overestimated (since we have assumed no intragenic recombination), a trivial explanation for the discrepancy between 
and 
in Zimbabwean D. melanogaster might be error in the estimated mutation rate (e.g., 
would be half as large if we assume the true mutation rate is twofold higher; see METHODS). Given that the mutation rate is uncertain, the apparent LD excess in this population based on estimated rates of crossing over is not entirely convincing.
Joint-likelihood estimates of 
and 
for 15 X-linked and 14 autosomal loci from the California population of D. simulans are shown for comparison (Fig 3B). It is interesting to note in Fig 3B that ratios of 
/
for both the X and the autosomes are much larger for the Zimbabwean population of D. melanogaster (0.4 and 0.2, respectively) than for the California population of D. simulans (0.02 and 0.07, respectively; cf. ![]()
2-fold) may explain the discrepancy between 
and 
in Zimbabwean D. melanogaster, it is unlikely to explain the >10-fold difference between these estimates of N in the D. simulans population in Fig 3B. This suggests that something is fundamentally different in the evolutionary history of these two populations or species. In particular, the Zimbabwe population of D. melanogaster may be closer to mutation-drift equilibrium, while the California population of D. simulans clearly is not.
Behavior of estimates of
as a function of r:
A positive correlation between
and
is expected for two reasons: first,
(an estimate of 4Ner) should increase proportionally with
; second, the gradient in
strongly positively covaries with empirical estimates of diversity (
) in Drosophila (![]()
![]()
![]()

, which under the standard neutral model is inversely proportional to the expected linkage disequilibrium for a given r). Here, we focus on data from the X chromosome because we lack data for autosomal loci in regions of reduced crossing over. For the 24 gene regions on the X chromosome,
W00 (per site) is positively correlated with
in our data set [ Fig 4A;
W00, R = 0.44, P < 0.05, 24 loci;
H01, R = 0.43, P < 0.05, 23 loci (see METHODS); Spearman rank correlation test, two-tailed]. As expected on the basis of previous results,
and 
for the 24 X-linked loci considered here also show a strong positive correlation with
(Fig 4B; based on 
, R = 0.62, P = 0.001, 24 loci; as above). However, a striking aspect of the results presented in Fig 2 is that estimates of 
are rather large near the telomere of the X chromosome (i.e., loci to the far left of the graph), despite the fact that the crossing-over rate and levels of diversity are considerably reduced relative to other regions on the X (TABLE AA and Table 3). Under the standard neutral model, a measure that should be independent of the effective population size is
/
(= 4Ner/4Neµ), which we estimate as
/
(![]()
![]()
/
to positively covary with
. We detect no such correlation (Fig 4C;
W00/
, R = 0.01, 24 loci, P > 0.05;
H01/
, R = 0.13, 23 loci, P > 0.05; as above).
|
The insensitivity of
/
to changes in
(Fig 4C) suggests that the correlation between
and
(Fig 4A) is driven by the relationship between
and
(Fig 4B). In fact, 
and
are significantly negatively correlated [ Fig 4D; 
(
W00), R = -0.51, 24 loci, P = 0.01; 
(
H01), R = -0.27, 23 loci, P > 0.05; as above], the opposite trend to that expected on the basis of the relationship between 
and
. In other words, there is less diversity in regions of low crossing over (Fig 4B), but also less LD than expected given levels of diversity and estimated rates of crossing over. This can also be seen in Fig 4E, which plots 
/
(estimated from
W00 and 
) as a function of
. Under the standard neutral model, N
/N
should be
1 and independent of r. While this prediction is roughly true for the 11 X-linked loci with
> 1 cM/Mb (
/
1), loci with
< 0.5 cM/Mb show considerably higher values of 
/
. Overall, there is a strong negative correlation between 
/
and
(Fig 4E; R = -0.79, 24 loci, P < 0.001; as above), implying that regions of reduced crossing over have considerably less LD than predicted by rates of crossing over under the standard neutral model.
Could a bias in
W00 estimates explain the observed correlations? In simulations of the neutral model (Table 1), we noted a slight upward bias in mean
estimates with decreasing
. However, our correlation result (which employs a nonparametric test) depends more on median estimates of
W00, which were typically lower than the actual
in these simulations (see Table 1). Additional simulations with parameters appropriate for the yellow locus (n = 49, S = 18,
/4) confirmed that the median
W00 is downward biased (see METHODS). Thus, bias in
W00 is not an explanation for the negative correlation between 
/
and
.
Gene conversion and LD patterns in regions of high crossing over:
Our estimates of recombination rate are based on large-scale comparisons of genetic and physical maps that ignore the contribution of gene conversion to the total rate of recombination. At small physical scales (i.e., <1 kb in Drosophila), the contribution of gene conversion to the overall rate of recombination might be substantial (![]()
![]()
Given that gene conversion is a potentially important contributor to the overall rate of recombination, we reexamine the relationship between levels of LD relative to levels of variability (measured as joint
for loci with
> 1 cM/Mb), assuming a model that includes both crossing over and gene conversion (e.g., GC1; see METHODS). In general, we find that, if gene conversion contributes substantially to the overall rate of recombination in regions of high crossing over, there is clear evidence for an excess of LD given levels of diversity in the Zimbabwe data. As an illustration, we present results for gene conversion model GC1 in Fig 5 (this model assumes that two-thirds of recombination events are gene conversions without associated crossing over; see METHODS). Under this model, 
X = 0.33 million for the X chromosome (
95% confidence interval of 0.240.50) and 
A = 0.15 million for the autosomes (
95% confidence interval of 0.040.20). Since 
and 
differ by more than an order of magnitude under this model, this discrepancy is not likely to be explained by uncertainty in the mutation rate (about twofold, see METHODS). Thus, our results suggest either that the level of genetic exchange due to gene conversion is close to negligible in regions of high crossing over or that this Zimbabwean population departs substantially from the predictions of mutation-drift equilibrium in the direction of a excess of







