Constrained Disequilibrium Values and Hitchhiking in a Three-Locus System
Mark N. Grote(a), William Klitz(b), and Glenys Thomson(c)
a Section of Evolution and Ecology, University of California, Davis, California 95616,
b School of Public Health, University of California, Berkeley, California 94720
c Department of Integrative Biology, University of California, Berkeley, California 94720
Corresponding author: Mark N. Grote, Section of Evolution and Ecology, University of California, Davis, CA 95616. E-mail: mngrote{at}ucdavis.edu
Communicating editor: G. B. GOLDING
| ABSTRACT |
|---|
Positive selection on a new mutant allele can increase the frequencies of closely linked alleles (through hitchhiking), as well as create linkage disequilibrium between them. Because this disequilibrium is induced by the selected allele, one may be able to identify loci under selection by measuring the influence of a candidate locus on pairwise disequilibrium values at nearby loci. The constrained disequilibrium values (CDV) method approaches this problem by examining differences in pairwise disequilibrium values, which have been normalized for two- and three-locus systems, respectively. We have investigated in detail the reliability of inferences based on CDV, using simulation and analytical methods. Our main results are (i) in some circumstances, CDV may not distinguish well between a selected locus and a neighboring neutral locus, but (ii) CDV seldom indicates "selection" in neutral haplotypes with moderate to large 4Nc. We conclude that, although the CDV method does not appear to precisely locate selected alleles, it can be used to screen for regions in which hitchhiking is a plausible hypothesis. We present a microsatellite data set from human chromosome 6, in which constrained disequilibrium values suggest the action of selection in a region containing the human leukocyte antigen (HLA)-A and myelin oligodendrocyte glycoprotein (MOG) loci. The connection between hitchhiking and disequilibrium has received relatively little attention, so our investigation presents opportunities to address more general issues.
In the genetic hitchhiking model, positive selection on a new mutant allele increases the frequencies of other alleles physically linked to the mutant, skewing the frequency distributions at the linked loci. Theoretical and empirical studies of hitchhiking generally focus on the reduction in variation at linked neutral loci that can result if the recombination rate is low and the selected mutant is quickly fixed in the population.
Linkage disequilibrium is always created by the appearance of a new mutant, because initially the mutant is found only on an "ancestral" haplotype of closely linked alleles.
Our purpose is to present some recent results that bear upon the use and interpretation of CDV. First, we summarize further simulations of the deterministic model, describing some circumstances under which CDV does, or does not, lead to reliable inferences about the position of the selected locus. In connection with this, we analyze the normalized disequilibrium measures under a selection model with simplifying assumptions and show that inferences with CDV are especially sensitive to allele frequencies at neutral loci closely linked to the selected locus. Second, we apply the CDV method to data sets generated under a stochastic model of neutral haplotypes, using an existing simulation program.
| METHODS |
|---|
Measures of disequilibrium and hitchhiking:
Our attention centers on two normalized measures of pairwise linkage disequilibrium, D' and D'', and in particular on the difference in their magnitudes, δ = |D'| - |D''|.
D' is the familiar normalized pairwise linkage disequilibrium measure (![]()
![]()
![]()
![]()
![]() |
(2) |
![]()
![]() |
(3) |

and

The associations between a and c, and between b and c, enter the calculation of D''ab through m1, m2, M1, and M2.
Like D'ab , D''ab lies between +1 and -1, where the extreme values indicate the strongest possible positive or negative association between alleles a and b, within the constraints imposed by the allele frequencies and pairwise disequilibria of the three-locus system. We write D''ab(c) because D''ab is calculated with reference to a particular allele at the third locus, but in a diallelic system, one can show D''ab(c) = D''ab(C) . Moreover, as with D'ab in a diallelic system, D''ab = D''AB = -D''aB = -D''Ab .
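As a small illustration of the pairwise measure D'ab, the sketch below computes the standard two-locus normalization (disequilibrium divided by its maximum attainable magnitude given the allele frequencies). The function name and argument names are hypothetical, and the three-locus measure D'' is not attempted here because it depends on the bounds m1, m2, M1, and M2.

```python
def d_prime(f_ab, p_a, p_b):
    """Two-locus normalized disequilibrium D' for alleles a and b.

    f_ab is the frequency of the ab haplotype; p_a and p_b are the
    frequencies of alleles a and b (names are illustrative only).
    """
    q_a, q_b = 1.0 - p_a, 1.0 - p_b
    d = f_ab - p_a * p_b                      # raw disequilibrium D_ab
    if d >= 0:
        d_max = min(p_a * q_b, q_a * p_b)     # largest positive D allowed
    else:
        d_max = min(p_a * p_b, q_a * q_b)     # largest negative magnitude
    return 0.0 if d_max == 0 else d / d_max


# A new mutant c found only on haplotypes carrying a gives D' = 1.
print(d_prime(f_ab=0.001, p_a=0.05, p_b=0.001))
```

A new mutant that occurs only with a particular linked allele sits at the two-locus maximum, which is the starting point for the hitchhiking scenarios discussed in the text.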
Assuming Dab > 0 for the moment, δ = |D'| - |D''| is greater than zero when the pairwise measure Dab is more extreme relative to its two-locus maximum than to its positive range in the three-locus system; in this case the pairwise association between a and b appears to be relatively weaker when all of the pairwise associations of the three-locus system are taken into account. Loosely speaking, when δ > 0, the association between a and b is said to be partly accounted for by their mutual association with c. Assuming further that c is a selected mutant, this property of δ is the primary reason for treating δ > 0 as the "footprint" of a hitchhiking event, in which the neutral a and b alleles have hitchhiked with c.
Although the normalized measure D' may change during a hitchhiking event (![]()
![]()
![]()
![]()
δ values, when interpreted appropriately, can make this distinction.
For a given three-locus haplotype, each locus may play the role of the constraining locus, and there are three δ values: δab(c), δa(b)c, and δ(a)bc. In earlier deterministic simulations, δ was often large and positive when the "constraining" allele was increasing in frequency due to positive selection, but the linked alleles were selectively neutral. When a nonselected allele played the "constraining" role, δ tended to be zero or negative. Based on these observations, the following criteria were proposed for interpreting the three δ values:
1. If one of the three δ values is positive and the remaining two are zero or negative, the constraining allele that gives the positive δ is the one that may have experienced recent selection.
2. If more than one of the δ values is positive, but one is much larger than the rest (for this study, more than double the next largest), the constraining allele that gives the large δ is the one that may have experienced recent selection.
3. If all three δ values are ≤ 0, or two are positive but close in value, no conclusion about selection can be drawn.
![]()
δ values under various scenarios, but we first focus simply on which loci the CDV method identifies as candidates for selection, in a large series of deterministic simulations.
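The three criteria for reading a set of δ values can be written as a small decision rule. The sketch below is one possible reading, assuming the "more than double the next largest" threshold stated for this study; the function name, the dictionary layout, and the `ratio` parameter are illustrative and not part of the original method description.

```python
def cdv_signal(deltas, ratio=2.0):
    """Apply the three CDV criteria to {constraining_locus: delta}.

    Returns the label of the locus flagged as possibly selected,
    or None when no conclusion can be drawn (criterion 3).
    """
    # Sort from largest to smallest delta.
    ranked = sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)
    positives = [(locus, d) for locus, d in ranked if d > 0]

    # Criterion 1: exactly one positive value, the others zero or negative.
    if len(positives) == 1:
        return positives[0][0]

    # Criterion 2: several positive values, one more than `ratio` times the next.
    if len(positives) > 1 and positives[0][1] > ratio * positives[1][1]:
        return positives[0][0]

    # Criterion 3: all values <= 0, or positive values close in size.
    return None


print(cdv_signal({"a": -0.10, "b": 0.00, "c": 0.30}))  # criterion 1
print(cdv_signal({"a": 0.05, "b": 0.10, "c": 0.30}))   # criterion 2
print(cdv_signal({"a": 0.20, "b": 0.25, "c": -0.10}))  # criterion 3
```

Keys name the constraining locus, so the value under "c" corresponds to δab(c).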
A deterministic hitchhiking model:
The deterministic simulations are based on a three-locus, diallelic model that evolves via a standard system of algebraic recursions.
The recursion equations describing changes in the haplotype frequencies can be specified by selection and mutation parameters described immediately below, the recombination rates r1 and r2 between the A and B loci and the B and C loci, respectively (where r1 + r2 - 2r1r2 gives the recombination rate between A and C for the "no-interference" model), and a set of initial haplotype frequencies. The latter are determined by specifying initial allele frequencies pa(0), pb(0), pc(0), and a single initial disequilibrium value [e.g., D'ab (0) when c is the new mutant]. In addition, we assume that the haplotype bearing the new mutant has not experienced mutation or recombination before the simulation begins at generation zero [for example, if c is the new mutant, this implies fabc(0) = pc(0)]. The frequency dynamics of a strongly selected allele, once it has left the zero-frequency boundary, are commonly modeled as a deterministic process.
Fitnesses at the selected locus (using genotypes at the C locus for illustration) are given by wcc = 1 - sc, wCc = 1, and wCC = 1 - sC. We have adopted a general framework for hitchhiking studies, as our selection model encompasses both directional selection leading to fixation of the new mutant (e.g., sc ≤ 0 and 0 < sC ≤ 1) and balancing selection (0 < {sC,sc} < 1). Mutation is unidirectional at rates µa = µb = µc = 10^-5 per generation from alleles a, b, and c to A, B, and C, respectively, so that the alleles of interest are transient. We use terms like "equilibrium frequency" loosely, referring to the relatively fast adjustment of allele frequencies that results from the appearance of a new selected mutant. For completeness, we have included the recursion equations in Appendix 1.
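To make the selection scheme concrete, the sketch below iterates a single-locus version of the model: viability selection with the fitnesses wcc = 1 - sc, wCc = 1, wCC = 1 - sC, followed by unidirectional mutation from c to C. It is a simplified illustration only, not the three-locus recursion of Appendix 1; the function names and the example parameter values are assumptions chosen for the demonstration.

```python
def next_freq(p, s_c, s_C, mu=1e-5):
    """One generation of selection, then mutation c -> C, at one locus.

    p is the frequency of allele c.
    """
    q = 1.0 - p
    w_cc, w_Cc, w_CC = 1.0 - s_c, 1.0, 1.0 - s_C
    w_bar = p * p * w_cc + 2.0 * p * q * w_Cc + q * q * w_CC  # mean fitness
    p_sel = p * (p * w_cc + q * w_Cc) / w_bar                  # after selection
    return p_sel * (1.0 - mu)                                  # after mutation


def trajectory(p0, s_c, s_C, mu=1e-5, generations=2000):
    """Frequency of c over time, starting from p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_freq(freqs[-1], s_c, s_C, mu))
    return freqs


# Balancing selection: with mu = 0 the interior equilibrium is s_C / (s_c + s_C).
balanced = trajectory(0.001, s_c=0.3, s_C=0.1, mu=0.0)
# Directional selection (s_c <= 0, s_C > 0): c goes to fixation.
sweep = trajectory(0.001, s_c=-0.05, s_C=0.1, mu=0.0)
print(round(balanced[-1], 4), round(sweep[-1], 4))
```

With mutation switched back on, the balanced case settles slightly below s_C / (s_c + s_C) and then declines very slowly, which is the sense in which "equilibrium frequency" is used loosely.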
Scope of the deterministic simulations:
The parameter space for the deterministic model is large and multidimensional, so we limit our investigation to a relatively narrow subset of parameter values under which measurable linkage disequilibrium is likely to be present. Using simple frequency arguments, one can conclude that most new mutants arise on relatively common haplotypes; but more unusual events, in which mutants appear on rare haplotypes, are actually of greater interest in hitchhiking studies.
In the following simulations, we have (somewhat arbitrarily) set the initial frequency of at least one of the neutral alleles at p(0) = 0.05, to ensure that the ancestral haplotype is sufficiently rare. Table 1 shows parameter values that are typical of the simulations. Here, c is the selected mutant and pa(0) and pb(0) are treated in a symmetric fashion, each assuming the value p(0) = 0.05 while the other takes values between 0.05 and 0.9 in successive runs. Some values of the initial pairwise disequilibrium D'ab (0) rule out certain combinations of pa(0), pb(0), and pc(0) in Table 1, but the same treatments are always applied to the a and b alleles.
Values of the remaining parameters were guided by a few basic rules. Hitchhiking is thought to be a weak force unless selective values are roughly an order of magnitude greater than recombination rates (![]()
![]()
![]()
r2 in Table 1] were left unexamined to keep the number of runs reasonable. More detailed tables are in ![]()
Within these guidelines, our first objectives are to significantly enlarge upon the number of deterministic cases examined in ![]()
δ values is inconsistent with correct inference of the selected locus.
CDV in a stochastic neutral model:
Our second aim is to study the performance of CDV in a neutral, finite-population model, to determine whether or not genetic drift and sampling effects can produce patterns of linkage disequilibrium conforming to criteria 1 or 2 above. ![]()
δ values under genetic drift.
We have modified a computer program of ![]()
![]()
δ values in the neutral model. The program simulates random samples of three-locus haplotypes, generated under the neutral "infinite alleles" model with recombination at equilibrium. The program requires the following input parameters: n, the number of haplotypes per sample; 4Nc, the scaled recombination rate between the A and C loci (the B locus is assumed to be halfway between A and C); and θa, θb, and θc (with, e.g., θa = 4Nµa, where N is the effective population size and µa is the mutation rate to new A-locus alleles). We used the value θ = 0.2 at each locus, corresponding to the approximate numerical solution of

(![]()
θ = 0.2 were segregating exactly two alleles at each locus, so we screened each sample and retained only those with diallelic loci. We further required in each sample a standard minimum level of heterozygosity, H ≥ 0.095 per locus. We then calculated the three values, δ(a)bc, δa(b)c, and δab(c), in each accepted sample. For each of three levels of recombination 4Nc, we generated independent samples until 1000 samples had met the screening criteria; our stochastic simulation results are based on these groups of 1000 samples.
| RESULTS |
|---|
Deterministic simulations:
Figures 1, 2, and 3 show sample runs of a deterministic model in which c is the selected mutant and the A and B loci are neutral. In Figures 1, 2, and 3, recombination rates, initial allele frequencies at the A and C loci, and mutation and selection parameters are the same; only the initial frequency of the b allele varies between the figures.
In the allele frequency plots, pc(t) approaches the equilibrium value of 0.25, then slowly declines due to mutation (not evident in these plots). Frequencies of the a and b alleles both increase due to hitchhiking with the selected mutant c. The frequencies ultimately attained by a and b depend on their initial frequencies, Dab(0), the strength of selection on c, and the recombination rates between these loci.
![]()
is zero. In the deterministic model without selection on c, Dab would remain at zero, whereas Dac and Dbc would decline from their initial values to zero without a transient increase. In these runs, it is only after the disequilibrium measures D have attained relatively large values that deviations from δ = 0 are observed.
In Figure 1, δ values between roughly generations 100 and 300 satisfy criteria 1 or 2 to correctly indicate selection at the C locus. Later in the run δ
values conform to criterion 3, where no conclusions about selection would be made. Figure 2 conforms entirely to criterion 3, having no signal for selection during the run. In Figure 3, both the b and c alleles meet criteria for selection at different times in the run, although only c is under selection. In particular, applying the CDV criteria at any time between generations 320 and 550, we could conclude that the neutral b allele is in fact under positive selection (Figure 3 is similar to pattern II' in Figure 3 of ![]()
The performance of the CDV criteria in a large series of deterministic runs is summarized in Table 1 and Table 2. Following the discussion above, we have classified each run by determining which alleles, if any, the CDV criteria would indicate as "selected." The run of Figure 1 shows a correct signal at the C locus for 100 ≤ t ≤ 300 but gives no signal for selection otherwise, and is counted under the column "signal at c alone" in Table 1. Sampling such a run at an arbitrary time, we might draw no conclusions, but would not incorrectly identify a neutral allele as selected. The run of Figure 2 gives no signal for selection at all and is counted under "no signal" in Table 1. The run of Figure 3 gives, for 320 ≤ t ≤ 550, a misleading signal for selection at the neutral B locus and is counted under "signal at b" in Table 1 (there is a similar column for "signal at a"). Because there are no runs in this series with signals at both neutral loci, each run falls into only one of these categories. We have chosen a conservative classification that emphasizes times during which CDV leads to incorrect inferences. In the text below, we describe broad trends and give some breakdowns of the runs that would not be evident by examining Table 1 and Table 2 alone. We use percentages in the tables and text as convenient summaries, but do not view these as probabilities.
In 26.1% (676/2590) of the runs in Table 1, the only signal identifying an allele under selection correctly points to c as the selected mutant. The CDV criteria identify the selected locus most reliably when the b allele is initially of moderate frequency: 50.5% (283/560) of the runs in Table 1 with pb(0) = 0.3, 0.4, or 0.5 and pa(0) = 0.05 resulted in the c allele being correctly identified, 35.5% (199/560) led to a possible misidentification of the b allele, and the remaining 14.0% (78/560) gave no signal for selection. CDV also performs well when the initial disequilibrium between the neutral alleles is negative: 46.7% (294/630) of the runs with D'ab(0) = -0.25 correctly identified the c allele and only 20.8% (131/630) gave false signals at a or b. CDV performs poorly when the initial frequencies of the neutral loci differ widely, resulting in a false signal for selection at the rarer of the two neutral alleles: 52.1% (219/420) of the runs with pa(0) = 0.75 or 0.9 and pb(0) = 0.05 gave a false signal at b, and 33.3% (140/420) of the runs with pa(0) = 0.05 and pb(0) = 0.75 or 0.9 gave a false signal at a. In general, CDV does a poor job identifying the selected allele when the b allele is rare: 62.1% (956/1540) of the runs with pb(0) = 0.05 gave a false signal for selection at the b allele. When the b allele is initially rare, the CDV criteria do not distinguish well between the new selected mutant c and its closest neutral neighbor.
In these simulations, when sc ≤ 0.0 and sC > 0.0, the new mutant c will be transiently fixed in the population (often called a "selective sweep"). In the selective-sweep runs, 24.8% (257/1036) gave a correct signal from the c allele, 50.5% (523/1036) led to a possible misidentification of the b allele, and 4.2% (43/1036) to a possible misidentification of the a allele. In the next section, we will examine why CDV may not perform especially well in a selective sweep.
As one might imagine, there is a trend in the reliability of inferences associated with the ratio sc/sC: for fixed values of the remaining parameters, with both sc, sC > 0, runs with larger values of sc/sC tend to have no signal, those with smaller values of sc/sC tend to have incorrect signals from the b allele, and those with intermediate values of sc/sC allow the CDV method to perform best. The critical values of the ratio sc/sC depend in a complex way on the remaining parameters and appear to be different in each series of runs.
Table 2 is similar in structure to Table 1, except now b is the new selected mutant and the A and C loci are neutral. Here, by symmetry, there is no need to switch the roles of the neutral loci, and we use pc(0) = 0.05, r2 = 0.001 throughout. When b is the new mutant, a relatively small number of runs have a potentially misleading signal at a neutral locus, and nearly all are cases in which the neutral allele frequencies pa(0) and pc(0) differ widely [i.e., pa(0) = 0.75 or 0.9 and pc(0) = 0.05].
The role of allele frequencies at a closely linked neutral locus:
The most problematic observation in the simulations above was a strong tendency for the CDV method to indicate selection at the B locus when c was the new selected mutant. Using some mathematics and general aspects of the hitchhiking model, it is possible to show that a rare neutral allele on the ancestral haplotype can easily be mistaken for the selected allele, when using the CDV method. The analysis requires some simplifying assumptions, but gives some generality to the results of the deterministic simulations, showing that our observations do not depend strongly on particular choices of parameter values.
An overdominance model:
We examine the behavior of δa(b)c, the δ value that indicates selection at the B locus, during the rapid increase of a new, strongly overdominant c allele. To avoid dealing with the time component explicitly, we focus on δa(b)c at t = 0, t "small" (a few generations) and t "moderate" (on the order of 100 to a few hundred generations). We assume that r1 and r2 are small enough so that recombination in the ancestral haplotype abc can be practically ignored when t is near zero, and further assume that pb(0) is small enough so that b and c are in strong coupling for small-to-moderate t. Low recombination and strong coupling of b and c imply that fab(t), fac(t), and fbc(t) are all approximately equal to pc(t) for small-to-moderate t. We finally assume Dab(0) = 0, but due to hitchhiking, all of Dab, Dac, and Dbc are positive after a few generations of selection.
To characterize δa(b)c, we must study the relationship between D'ac and D''a(b)c for t = 0, t small and t moderate. For convenience, the required definitions when Dac ≥ 0 are

and

where

and

Because all of Dab, Dac, and Dbc are ≥ 0 by assumption, the sign of min*Dac is determined entirely by the relative sizes of the positive and negative terms in m2. When the disequilibria Dab and Dbc in m2 are small relative to the third-order products of allele frequencies (as they will tend to be for t near zero), m2
0 and min*Dac
0. When the disequilibria are large relative to the third-order products (as they tend to be for moderate t), m2 < 0 and min*Dac > 0.
At t = 0, fac = pc, and because the new mutant c is found only with a, Dac = max Dac. Further, the inequality

max*Dac ≤ max Dac

must hold, because the set (paqc, qapc, M1, M2) that determines max*Dac contains the set that determines max Dac. Dac = max Dac then implies max*Dac = max Dac, and therefore

D'ac = D''a(b)c, so that δa(b)c = 0,

for t = 0.
For small t, with the disequilibria of m2 still small relative to the third-order products, the reasoning is very similar. Because the new mutant c is still found almost exclusively on the ancestral haplotype, Dac ≈ max Dac to a good approximation, so it also must be true that max*Dac ≈ max Dac. We then have δa(b)c ≈ 0 for small t.
The situation changes when the disequilibria of m2 are large relative to the third-order products, so that m2 < 0 and min*Dac > 0; here, we must use the second case in the definition of D''ac above. We further observe that when the loci are evenly spaced, recombination begins relatively soon to reduce Dac below its two-locus maximum (compared to Dab and Dbc), although all of the disequilibria may have dropped below earlier large values due to allele frequency constraints. Now consider t moderate, with min*Dac > 0 and Dac < max Dac. To determine the sign of
a(b)c, we must examine as before the relative magnitudes of max Dac and max*Dac. It is convenient to use algebraically equivalent expressions for the terms M1 and M2 in max*Dac:

Under our assumptions, fab ≈ fbc ≈ pc for small-to-moderate t to a reasonable approximation, and therefore M1 ≈ qapc, M2 ≈ paqc. We then may write

max*Dac ≈ min(paqc, qapc) = max Dac.
Along with Dac < max Dac, this implies

which is algebraically equivalent to

or
δa(b)c > 0. Putting the above together, we have shown that δa(b)c ≈ 0 for t = 0 and t small, but δa(b)c > 0 for t moderate.
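The last step of the moderate-t argument can be checked with a short calculation. As an assumption made only for this sketch, suppose that when min*Dac > 0 the constrained measure normalizes Dac by its positive three-locus range, so that D''a(b)c = (Dac - min*Dac)/(max*Dac - min*Dac). Writing D for Dac, m for min*Dac, and M for max Dac (approximately equal to max*Dac), with 0 < m < D < M:

```latex
\[
  D'_{ac} - D''_{a(b)c}
  \;\approx\; \frac{D}{M} - \frac{D - m}{M - m}
  \;=\; \frac{D(M - m) - M(D - m)}{M(M - m)}
  \;=\; \frac{m\,(M - D)}{M\,(M - m)} \;>\; 0 .
\]
```

Every factor on the right is positive under the stated ordering, so the difference of the two normalized measures is positive whenever the lower bound m is positive and Dac has fallen below its two-locus maximum.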
Using very similar arguments, it is possible to show that δab(c) > 0 during the same time interval, so that the same general mechanisms give the "correct" signal at the C locus. The contrasting result δ(a)bc ≈ 0 can be obtained using the same detailed arguments, or more easily can be obtained by noting that Dbc remains very close to max Dbc during the time interval of interest. Taken together, these arguments suggest that under our assumptions the CDV criteria could indicate selection at either the B or C loci, but not at the A locus.
A selective sweep model:
A second basic model may be handled without doing any further analysis. For the selective sweep case, we assume sc ≤
0 and sC > 0, so that the selected mutant c will be fixed, but the remaining assumptions are the same. The transient dynamics of allele and haplotype frequencies are the same as in the overdominance model, with perhaps minor differences in time scale; the main difference is in the endpoint of the selection process. ![]()
δ values, in the sweep model. The basic reasoning of the previous section again suggests that during this time, a signal for apparent selection is possible from either the selected locus or a nearby neutral locus carrying a rare allele.
Stochastic simulations:
We have calculated δ
values in simulated random samples from a stochastic, neutral diallelic model, to informally investigate the "type I" error in the CDV method. The three-locus neutral model is perhaps the simplest null-model that would be considered for data of the type used for CDV. ![]()
Distributions of the three δ values in samples of size n = 100 are shown in Figure 4 for 4Nc = 10, 25, and 100. The histograms of Figure 4 show only the univariate (marginal) distributions of δ values and contain no information about the associations within samples of the three δ values. At each value of 4Nc, the δ = 0 class is by far the most common for each of δ(a)bc, δa(b)c, and δab(c), with δ ≠ 0 relatively uncommon. Negative values of δ are more common than positive values when δ departs from zero. Relative to δ(a)bc and δab(c), δa(b)c is more often different from zero.
The frequencies of apparent "hitchhiking" events, obtained by applying the CDV criteria to the samples in Figure 4, are shown in Table 3. For 4Nc = 10, each locus satisfies the criteria for selection in a small percentage of cases: here one can expect to find a signal for selection at some locus perhaps 8 to 9% of the time, using the CDV criteria in a neutral sample. With 4Nc ≥ 25, however, any apparent signal for "selection" based on the CDV criteria would be unusual. In concordance with the deterministic simulations, although here there is no selection, we obtain false signals for selection at the B locus more often than at A or C (as expected, the A and C loci give similar results). We take this as further evidence of a "position" effect that favors the middle locus.
Marker haplotypes from human chromosome 6:
To illustrate one use of the CDV method, we have calculated δ values in a series of three-locus microsatellite haplotypes in the 6p21.3-22.1 region of human chromosome 6 (see Figure 5). We do not presume any of these marker loci are selected, but suppose instead that perhaps one or more markers could be closely linked to a selected gene.
We used a "sliding window" approach, examining in turn each of the five groups of three adjacent markers among the seven markers shown in Figure 5. Human leukocyte antigen (HLA)-F3' and myelin oligodendrocyte glycoprotein (MOG)c are dinucleotide repeats closely linked to the HLA F locus and the MOG locus, respectively. HLA-A, a major histocompatibility complex class I locus, is located between the D6S265 and HLA-F3' markers shown in Figure 5 (![]()
![]()
![]()
![]()
![]()
![]()
![]()
δ values in each of these 17 haplotypes, converting to dialleles by combining the alleles not under consideration into a single class. All 3 haplotypes of the D6S265/HLA-F3'/MOGc loci had disequilibrium patterns conforming to criteria 1 or 2 (Table 4), but none of the remaining 14 haplotypes met these criteria.
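The two data-handling steps described here, the sliding window over adjacent markers and the reduction of multiallelic markers to dialleles, can be sketched briefly. The marker labels and allele codes below are placeholders, not the actual marker names or repeat lengths from Figure 5, and the helper names are hypothetical.

```python
def sliding_windows(markers, size=3):
    """All groups of `size` adjacent markers, in map order."""
    return [tuple(markers[i:i + size]) for i in range(len(markers) - size + 1)]


def collapse_to_diallelic(alleles, focal):
    """Recode a multiallelic marker as focal allele versus all others."""
    return [focal if allele == focal else "other" for allele in alleles]


# Seven placeholder markers give five windows of three adjacent markers.
markers = ["M1", "M2", "M3", "M4", "M5", "M6", "M7"]
windows = sliding_windows(markers)
print(len(windows), windows[0], windows[-1])

# Microsatellite alleles (placeholder repeat counts) reduced to a diallelic code.
print(collapse_to_diallelic([12, 14, 12, 16, 15], focal=12))
```

Collapsing to dialleles lets the same two-allele disequilibrium measures be applied to each focal haplotype in turn.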
| DISCUSSION |
|---|
Inferences with CDV:
In the deterministic runs, where the new selected mutant appeared at a terminal locus (the C locus), the CDV method did not distinguish well between the selected locus and a neutral neighbor, especially when a relatively rare allele of the neutral locus was initially linked with the selected mutant. In this case a signal for apparent selection could either be detected from the selected locus or the neutral locus. This situation is unfortunate and somewhat paradoxical, because we have argued that selected mutants that form on rare haplotypes create the most significant linkage disequilibrium in a hitchhiking scenario. To some extent, the CDV method is sensitive to each of the parameters of the model, but we discovered in particular a sensitivity to allele frequencies at the middle locus (the B locus). We showed, using an analytical approach under the assumption of strong selection and tight linkage, that a rare neutral allele at the B locus may easily be mistaken by the CDV criteria for the selected mutant c.
In our deterministic simulations, when the middle locus (the B locus) had the new selected mutant, the CDV method gave correct inferences in a large majority of runs. It is difficult to put this attractive result into practice in the inference setting, because a signal for apparent selection at the B locus could indeed reflect selection at the locus or could be a false signal of the type that was commonly observed when c was the selected allele. One remedy might be to confine inferences to terminal loci, perhaps obtaining additional markers that could place any locus of interest at the "A" or "C" positions of our model. This assumes we could be virtually certain about inferences at terminal loci, an assertion that is contradicted by the fraction of deterministic runs of Table 1 and Table 2 in which a signal appears at an unselected terminal locus. It further seems possible that a generalization of our analytical approach, which relaxes assumptions about position, could show that inferences about terminal loci may not be reliable in the presence of rare neutral alleles. We think at present that the CDV method may not allow for high-precision inferences about the location of selected mutants; on this point we depart from ![]()
The stochastic simulations showed that patterns of linkage disequilibrium conforming to criteria 1 or 2 are uncommon for 4Nc ≥ 10 and highly unusual for 4Nc as large as 100. Here, we think there is potential inference value in the CDV method, because a simple neutral model can apparently be ruled out if either criterion 1 or 2 is met in a moderate-sized sample, with 4Nc on the order of 100. At this point, other nonselective alternative hypotheses (such as the neutral model with population structure or migration) cannot immediately be ruled out; this requires work beyond our current scope.
Although we do not think that CDV can very accurately distinguish the particular locus that has the selected allele, we do think that CDV can be used to screen for fairly localized regions that may have a recent history of hitchhiking (in general agreement with ![]()
0.095 at each locus, and that there is moderately strong, but not complete, linkage disequilibrium in the region.
Selected mutations and linked markers at equilibrium:
We now describe a simple model of recurrent selected mutations and address some implications for CDV and similar methods. The simplest model assumes that selected alleles arise at random points in the genome. If such events are rare, the influence of new selected alleles on linked loci is transient: eventually the new mutant reaches equilibrium, and recombination, mutation, and genetic drift again dominate the dynamics of linked loci. Under this simple model, neutral alleles linked to a new overdominant mutant will increase in frequency and may reach high levels of disequilibrium, but do not generally fix (because the overdominance mode tends to preserve extant variation). Two such loci will return to neutral frequency and phase equilibria, respectively, at rates 1 - 1/2N, the rate of loss of heterozygosity at either locus (with N the effective population size; see, e.g., ![]()
![]()
![]()
θ = 4Nµ is small, a majority of linked neutral sites will be monomorphic. We have further claimed that disequilibrium created by hitchhiking is primarily connected to rare events in which selected mutants appear on low-frequency haplotypes. In particular, these impediments suggest that in chromosomal regions thought to be subject to recurrent selective sweeps (![]()
![]()
![]()
For tightly linked loci, patterns of linkage disequilibrium conforming to criteria 1 or 2 persist approximately as long as the time required for the new mutant to reach equilibrium (![]()

generations (![]()
![]()
is the approximate deterministic change per generation in the frequency of a favored allele a, where fitnesses are wAA = 1 - s/2, wAa = 1, waa = 1 + s/2. Using the same reasoning in the symmetric overdominance model, with fitnesses wAA = 1 - s, wAa = 1, waa = 1 - s (so that the fitness differential between the most extreme genotypes is s in both cases), the expected time for the new mutant to reach the interior polymorphism is approximately

generations. These persistence times can be small relative to the times required for the recovery of linkage equilibrium or neutral levels of polymorphism. For example, if N is 10^5, s = 0.01, and µ = 10^-5, the persistence time for CDV-type patterns of linkage disequilibrium is <5000 generations in the selective sweep model, whereas if most extant variation is lost during the sweep, 10^5 generations on average are required to reestablish polymorphism at monomorphic sites, and during this period no new CDV-type patterns could be observed.
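The orders of magnitude in this example can be checked with a rough calculation. The sketch below assumes, as an illustration rather than as the formula used in the text, that a favored allele with additive fitnesses follows the logistic approximation dp/dt = (s/2) p (1 - p) from frequency 1/(2N) to 1 - 1/(2N), and that the waiting time to reestablish polymorphism at a monomorphic site is about 1/µ generations.

```python
import math


def sweep_time(N, s):
    """Generations for p to move from 1/(2N) to 1 - 1/(2N).

    Integrates dp/dt = (s/2) p (1 - p), an assumed logistic approximation.
    """
    p0 = 1.0 / (2.0 * N)
    # The logit of p grows linearly at rate s/2; it must change by 2*ln((1-p0)/p0).
    return (2.0 / s) * 2.0 * math.log((1.0 - p0) / p0)


N, s, mu = 1e5, 0.01, 1e-5
print(round(sweep_time(N, s)))   # persistence time of the sweep, in generations
print(round(1.0 / mu))           # rough waiting time for a new mutation at a site
```

Under these assumptions the sweep takes somewhat under 5000 generations, about a twentieth of the 10^5 generations needed on average to restore polymorphism.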
Human chromosome 6 haplotypes:
In Table 4, we showed three haplotypes of the D6S265/HLA-F3'/MOGc loci that met the CDV criteria for hitchhiking. HLA-F3' and MOGc are physically close, so we must make a rough assessment of 4Nc between these loci if we wish to compare the data with the neutral simulations of Figure 4 and Table 3. Although there are apparently no family data that give precise estimates of the recombination fraction between HLA-F3' and MOGc, the physical distance between these loci is known to be approximately 100-150 kb, based on YAC contig and STS maps (![]()
1.16 cM [obtained by observing that the genome size is equivalently 3200 Mb or 3702 cM in human females (THE HUMAN TRANSCRIPT MAP 1996)], we conclude that 4Nc between HLA-F3' and MOGc is ~9. Thus, the D6S265/HLA-F3'/MOGc haplotype appears to span a distance over which criteria 1 or 2 are not commonly met in the simple neutral model. The setting here is not directly analogous to the null-model calculations of Figure 4 and Table 3 for two main reasons: (i) different three-locus marker haplotypes may share an allele at one or more loci, introducing dependencies not present in the simulated neutral haplotypes; (ii) it is well known that the "infinite alleles" mutation model used for the neutral simulations does not apply to microsatellite loci (see, e.g., ![]()
We conclude that hitchhiking with one or more selected alleles, closely linked to the D6S265/HLA-F3'/MOGc loci, is a plausible explanation for the patterns of linkage disequilibrium observed in these haplotypes. Three apparently distinct haplotypes meet criteria 1 or 2, suggesting that hitchhiking with overdominant alleles is the more likely scenario: the data would seem to require otherwise that several favored alleles in the region are simultaneously being selected for, or that an ancestral haplotype bearing a favored allele has experienced several mutation events. We have also argued that the loss of variation under the selective sweep model poses a serious problem for observing disequilibrium, making it unlikely that disequilibrium created specifically by selectively favored alleles would ever be observed. While we have scaled back previous efforts to infer the precise location at which selection has acted, our results are consistent with other work on selection in this region of the human genome (![]()
| ACKNOWLEDGMENTS |
|---|
We thank C. H. Langley, who read an earlier draft of the manuscript and made suggestions that led to substantial revisions. The human chromosome 6 haplotypes were collected by L. Calandro and G. F. Sensabaugh, who generously allowed us to use them here. We thank D. Cutler and A. D. Long for discussion and suggestions. An anonymous reviewer made suggestions that improved the presentation. This work was supported by National Institutes of Health grants HD-12731, GM-56688, and 5 T32 GM-07127.
Manuscript received February 6, 1998; Accepted for publication August 7, 1998.
| APPENDIX 1 |
|---|
THREE-LOCUS DETERMINISTIC RECURSIONS
In the three-locus diallelic model, the eight haplotypes (gametes) are ABC, ABc, AbC, Abc, aBC, aBc, abC, abc, and their respective frequencies in a given generation are $x_1, \ldots, x_8$, with $\sum_{i=1}^{8} x_i = 1$. Let

$w_i = \sum_{j=1}^{8} w_{ij} x_j$, where $w_{ij}$ is the fitness of the genotype formed from haplotypes i and j, and let $\bar{w} = \sum_{i=1}^{8} w_i x_i$. After selection and recombination, the haplotype frequencies are given by

where

To complete one generation, we need only introduce mutation, which is unidirectional from a, b, and c to A, B, and C, respectively, all at rate µ per generation. After mutation, the haplotype frequencies are

This completes one generation of the recursion.
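A minimal sketch of one such generation, assuming no crossover interference, recombination fractions `r1` (between A and B) and `r2` (between B and C), and independent mutation at each locus. These parameter names, the function names, and the exact parameterization are assumptions for illustration and may differ from the recursions used in this appendix.

```python
from itertools import product

# Haplotype order as in the text: ABC, ABc, AbC, Abc, aBC, aBc, abC, abc.
HAPLOTYPES = ["".join(h) for h in product("Aa", "Bb", "Cc")]
INDEX = {h: i for i, h in enumerate(HAPLOTYPES)}


def gametes(h1, h2, r1, r2):
    """Gamete distribution for a parent carrying h1 and h2, copying from h1 first.

    Assumption: crossovers in the A-B and B-C intervals are independent.
    """
    strands = (h1, h2)
    out = {}
    for cross1, p1 in ((0, 1 - r1), (1, r1)):
        for cross2, p2 in ((0, 1 - r2), (1, r2)):
            at_b = cross1            # strand copied at locus B
            at_c = cross1 ^ cross2   # strand copied at locus C
            g = h1[0] + strands[at_b][1] + strands[at_c][2]
            out[g] = out.get(g, 0.0) + p1 * p2
    return out


def mutate(h, mu):
    """Each of a, b, c mutates independently to A, B, C with probability mu."""
    out = {"": 1.0}
    for allele in h:
        nxt = {}
        for prefix, p in out.items():
            if allele.islower():
                nxt[prefix + allele] = p * (1 - mu)
                nxt[prefix + allele.upper()] = p * mu
            else:
                nxt[prefix + allele] = p
        out = nxt
    return out


def next_generation(x, w, r1, r2, mu):
    """One generation: selection, then recombination, then mutation."""
    n = len(HAPLOTYPES)
    w_marg = [sum(w[i][j] * x[j] for j in range(n)) for i in range(n)]
    w_bar = sum(w_marg[i] * x[i] for i in range(n))
    after_rec = [0.0] * n
    # Ordered pairs (i, j): each heterozygote is visited once per starting strand.
    for i, j in product(range(n), repeat=2):
        weight = x[i] * x[j] * w[i][j] / w_bar
        if weight == 0.0:
            continue
        for g, p in gametes(HAPLOTYPES[i], HAPLOTYPES[j], r1, r2).items():
            after_rec[INDEX[g]] += weight * p
    after_mut = [0.0] * n
    for i, h in enumerate(HAPLOTYPES):
        for target, p in mutate(h, mu).items():
            after_mut[INDEX[target]] += after_rec[i] * p
    return after_mut


# Usage: a neutral population made up of ABC and abc only.
x0 = [0.5, 0, 0, 0, 0, 0, 0, 0.5]
neutral = [[1.0] * 8 for _ in range(8)]
x1 = next_generation(x0, neutral, r1=0.1, r2=0.1, mu=0.0)
print(dict(zip(HAPLOTYPES, x1)))
```

Iterating `next_generation` with a fitness matrix that favors genotypes carrying a new allele at one locus gives deterministic hitchhiking trajectories for the two flanking loci, from which pairwise disequilibria can be computed at each step.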
| LITERATURE CITED |
|---|
AGUADÉ, M., N. MIYASHITA, and C. H. LANGLEY, 1989 Reduced variation in the yellow-achaete-scute region in natural populations of Drosophila melanogaster. Genetics 122:607-615.
BAUR, M. P., and J. A. DANILOVS, 1980 Population analysis of HLA-A, B, C and DR and other genetics markers, pp. 955-993 in Histocompatibility Testing 1980, edited by P. TERASAKI. University of California, Tissue Typing Laboratory, Los Angeles.
BEGUN, D. J., and C. F. AQUADRO, 1994 Evolutionary inferences from DNA variation at the 6-phosphogluconate dehydrogenase locus in natural populations of Drosophila: selection and geographic differentiation. Genetics 136:155-171.
BEGUN, D. J., and C. F. AQUADRO, 1995 Evolution at the tip and base of the X chromosome in an African population of Drosophila melanogaster. Mol. Biol. Evol. 12:382-390.
CROW, J. F., and M. KIMURA, 1970 An Introduction to Population Genetics Theory. Burgess Publishing Co., Minneapolis.
EWENS, W. J., 1973 Conditional diffusion processes in population genetics. Theor. Pop. Biol. 4:21-30.
EWENS, W. J., 1979 Mathematical Population Genetics. Springer-Verlag, Berlin.
FEDER, J. N., A. GNIRKE, W. THOMAS, Z. TSUCHIHASHI, and D. A. RUDDY et al., 1996 A novel MHC class I-like gene is mutated in patients with hereditary haemochromatosis. Nat. Genet. 13:399-408.
FELDMAN, M. W., I. FRANKLIN, and G. J. THOMSON, 1974 Selection in complex genetic systems I. The symmetric equilibria of the three-locus symmetric viability model. Genetics 76:135-162.
GEIRINGER, H., 1944 On the probability theory of linkage in Mendelian heredity. Ann. Math. Stat. 15:25-57.
GOLDING, G. B., 1984 The sampling distribution of linkage disequilibrium. Genetics 108:257-274.
GROTE, M. N., 1996 Models of genetic selection and the Human Leukocyte Antigen loci. Ph.D. Thesis, University of California, Berkeley.
HARTL, D. L., and A. G. CLARK, 1989 Principles of Population Genetics. Sinauer Associates, Inc., Sunderland, MA.
HEDRICK, P. W., 1987 Gametic disequilibrium measures: proceed with caution. Genetics 117:331-341.
HILL, W. G., 1974 Disequilibrium among several linked neutral genes in finite population I. Mean changes in disequilibrium. Theor. Pop. Biol. 5:366-392.
HILL, W. G., 1975 Linkage disequilibrium among multiple neutral alleles produced by mutation in finite populations. Theor. Pop. Biol. 8:117-126.
HILL, W. G., and B. S. WEIR, 1988 Variances and covariances of squared linkage disequilibria in finite populations. Theor. Pop. Biol. 33:54-78.
HUDSON, R. R., 1983 Properties of a neutral allele model with intragenic recombination. Theor. Pop. Biol. 23:183-201.
HUDSON, R. R., 1985 The sampling distribution of linkage disequilibrium under an infinite allele model without selection. Genetics 109:611-631.
HUMAN GENOME DATA BASE, 1997 http://www.gdb.org
THE HUMAN TRANSCRIPT MAP, 1996 http://www.ncbi.nlm.nih.gov/SCIENCE96
KAPLAN, N. L., R. R. HUDSON, and C. H. LANGLEY, 1989 The "hitchhiking effect" revisited. Genetics 123:887-899.
KLITZ, W., and G. THOMSON, 1987 Disequilibrium pattern analysis. II. Application to Danish HLA-A and B locus data. Genetics 116:633-643.
LAUER, P., N. C. MEYER, C. E. PRASS, S. M. STARNES, and R. K. WOLFF et al., 1997 Clone-contig and STS maps of the hereditary hemochromatosis region on human chromosome 6p21.3-p22. Genome Res. 7:457-470.
LEWONTIN, R. C., 1964 The interaction of selection and linkage. I. General considerations; heterotic models. Genetics 49:49-67.
LEWONTIN, R. C., 1988 On measures of gametic disequilibrium. Genetics 120:849-852.
MAYNARD-SMITH, J., and J. HAIGH, 1974 The hitch-hiking effect of a favourable gene. Genet. Res. 23:23-35.
MOSSER, J., A. M. JOUANOLLE, G. GANDON, N. ANDRIEUX, and A. HAMPE et al., 1997 A YAC contig and an STS map spanning at least 3.9 megabasepairs telomeric to HLA-A. Immunogenet. 45:447-451.
OHTA, T., and M. KIMURA, 1975 The effect of a selected linked locus on heterozygosity of neutral alleles (the hitchhiking effect). Genet. Res. 25:313-325.
PARHAM, P., and T. OHTA, 1996 Population biology of antigen presentation by MHC class-I molecules. Science 272:67-74.
ROBINSON, W. P., A. CAMBON-THOMSEN, N. BOROT, W. KLITZ, and G. THOMSON, 1991a Selection, hitchhiking and disequilibrium analysis at three linked loci with application to HLA data. Genetics 129:931-948.
ROBINSON, W. P., M. A. ASMUSSEN, and G. THOMSON, 1991b Three-locus systems impose additional constraints on pairwise disequilibria. Genetics 129:925-930.
SATTA, Y., C. O'HUIGEN, N. TAKAHATA, and J. KLEIN, 1994 Intensity of natural selection at the major histocompatibility complex loci. Proc. Natl. Acad. Sci. USA 91:7184-7188.
SENSABAUGH, G. F., L. CALANDRO, T. THORSEN, L. BARCELLOS, and J. GRIGGS et al., 1996 Commentary. Blood Cells Mol. Dis. 22:194a-194b.
STEPHAN, W., and C. H. LANGLEY, 1989 Molecular genetic variation in the centromeric region of the X chromosome in three Drosophila ananassae populations. I. Contrasts between the vermilion and forked loci. Genetics 121:89-99.
THOMSON, G., 1977 The effect of a selected locus on linked neutral loci. Genetics 85:753-788.
THOMSON, G., and M. P. BAUR, 1984 Third order linkage disequilibrium. Tissue Antigens 24:250-255.
VALDES, A. M., M. SLATKIN, and N. B. FREIMER, 1993 Allele frequencies at microsatellite loci: the stepwise mutation model revisited. Genetics 133:737-749.