## Abstract

Given the relative ease of identifying genetic markers linked to QTL (compared to finding the loci themselves), it is natural to ask whether linked markers can be used to address questions concerning the contemporary dynamics and recent history of the QTL. In particular, can a marker allele found associated with a QTL allele in a QTL mapping study be used to track population dynamics or the history of the QTL allele? For this strategy to succeed, the marker-QTL haplotype must persist in the face of recombination over the relevant time frame. Here we investigate the dynamics of marker-QTL haplotype frequencies under recombination, population structure, and divergent selection to assess the potential utility of linked markers for a population genetic study of QTL. For two scenarios, described as “secondary contact” and “novel allele,” we use both deterministic and stochastic methods to describe the influence of gene flow between habitats, the strength of divergent selection, and the genetic distance between a marker and the QTL on the persistence of marker-QTL haplotypes. We find that for most reasonable values of selection on a locus (*s* ≤ 0.5) and migration (*m* > 1%) between differentially selected populations, haplotypes of typically spaced markers (5 cM) and QTL do not persist long enough (>100 generations) to provide accurate inference of the allelic state at the QTL.

QUANTITATIVE trait locus (QTL) mapping is a method for identifying genomic regions contributing variation to continuously varying traits. It is used to analyze the genetic architecture of ecologically important phenotypic differences in natural populations and of economically important differences in domesticated plants and animals (Lander and Botstein 1989; Tanksley 1993; Falconer and Mackay 1996; Hawthorne 2003). The aim of most QTL mapping efforts is to understand features such as the number and location of genomic regions contributing to the observed phenotypic variation, the direction and magnitude of effect of QTL on the trait in question, and the causes of genetic correlations (Lynch and Walsh 1998; Slate 2005).

Once QTL are delimited by nearby markers on a linkage map, one might seek to understand the history and the contemporary dynamics of the QTL. Ideally, polymorphism in the genes that cause phenotypic variation would be used to study the population genetics and phylogeography of the loci contributing to phenotypic diversity (*e.g.*, Wang *et al.* 1999). Although tools for identifying candidate genes underlying phenotypic variation are available for a few model organisms (Wayne *et al.* 2000), achieving such a level of detail in ecologically or economically important organisms is in most cases prohibitively expensive. Further, it is not clear that sequencing the relevant loci will reveal the phenotypically relevant nucleotide change(s). We are seeking alternative strategies for the population-genetic analysis of QTL using the marker-QTL associations that emerge from a typical QTL mapping effort. In particular we ask, can we infer the evolutionary history or population genetics of QTL using the genetic markers and linkage information available from a QTL mapping experiment?

One important product of a QTL mapping effort is a set of alleles at marker loci that are associated with particular alleles at nearby QTL. Given the relative ease of identifying genetic markers linked to QTL (compared to finding the causative loci), we ask whether they can be used to address questions concerning the contemporary dynamics and recent history of the QTL. Because QTL experiments are most often begun with the cross of a single individual from one population to one individual from another, the experiment will often identify a single marker allele at each linked locus that is associated with a given QTL allele. How accurately can that marker allele be used to infer the allelic state of the QTL in a natural population? Using markers in this way requires the allelic states of nearby QTL to be reliably inferred from the allelic state(s) of flanking marker loci. Because this inference relies on intact marker-QTL haplotypes, it is weakened by recombination between the flanking marker loci and QTL (Maynard Smith and Haigh 1974; Slatkin and Wiehe 1998; Barton 2000). Here we investigate the dynamics of marker-QTL haplotype frequencies under recombination, population structure, and divergent selection, to assess the feasibility of using linked markers for population genetic studies of QTL. In fact, the analyses presented here are not limited to marker-QTL associations. The flanking marker could be associated with any locus causing ecologically important phenotypic variation.

It is well established that selection at one locus causes changes in frequencies of linked neutral loci or “genetic hitchhiking” (Maynard Smith and Haigh 1974; Barton 2000). This phenomenon is used in an alternative to QTL mapping, hitchhiking mapping, to identify genomic regions with characteristics, such as reduced allelic diversity, commonly found in regions under selection (Storz 2005). The questions that we address here could therefore be used to understand the dynamics of a QTL (or selected locus), using markers identified via hitchhiking mapping as well as via QTL mapping.

Theoretical treatments confirm that the effects of recombination on linkage disequilibrium between selected and flanking neutral loci will vary with the recombinational distance between the marker and selected loci (or QTL) and with the time since the adaptive mutation first appeared (Slatkin and Wiehe 1998). To appreciate the dynamics of selected loci and linked markers under selection, it is also important to distinguish the cytological phenomenon of crossing over from the population-genetic phenomenon of recombination. This is because recombination depends not only on the genetic distance between loci but also on genetic variation at the QTL and the marker loci, an observation implicit in the formulas of Maynard Smith and Haigh (1974) and discussed by Charlesworth *et al.* (1997). Without variation at both the marker and the QTL loci, crossing over may occur during meiosis, but no recombination will result. For example, crossing over in a population containing only AB/AB genotypes will result in only AB gametes, regardless of the genetic distance between the loci, and will generate no recombinant genotypes. Strongly structured populations may be genetically depauperate overall because of genetic drift or lack variation locally because of recent strong selection or minimal input of new variation through gene flow. Such reduced variation would increase the difference between crossing over and recombination rates. Thus the details of population structure may influence the hitchhiking process such that in structured populations facing divergent selection or reduced gene flow realized recombination may be less common than crossing over, resulting in relatively wide and persistent hitchhiking regions around selected loci. 
As a recent example of the role of population structure in genetic hitchhiking, early computer simulation studies of hitchhiking regions around human disease genes likely underestimated the width of those regions by a factor of 10, because they neglected the details of population history and structure (Collins *et al.* 1999). We have used mathematical analysis and computer simulation to address the influence of additional population genetic forces on interlocus associations, in order to quantify the influence of selection and gene flow on the durability of marker-QTL or marker-selected locus haplotypes.

We analyze the dynamics of a two-population, two-habitat system in which each population is specialized on one of the habitats and mates randomly with others in that habitat. Our two populations experience divergent selection, in contrast to the uniform selection regime discussed by Maynard Smith and Haigh (1974) and Barton (2000). Although Barton (2000) also discussed the effects of population structure on the features of a hitchhiking region, his analysis considers populations under the same selection regime, so that divergence among the populations in that analysis is caused by genetic drift.

Following development and interpretation of an analytical model, we use both deterministic and stochastic simulations to describe the influence on the durability of a hitchhiking region of such factors as gene flow between habitats, the strength of divergent selection, and the genetic distance between a marker and the QTL. Our results show that the strength of selection, the amount of gene flow among divergent populations, and the genetic distance of the markers from the selected locus (or QTL) have a large influence on the durability of the marker-QTL haplotypes. Our analyses also suggest that under most conditions, associations among moderately spaced markers and QTL will not be durable enough to allow the use of markers to track the underlying allelic states of QTL in population genetic or phylogeographic analyses. There are, however, conditions under which the associations are maintained for many generations.

## ANALYTICAL MODEL

Our model comprises two infinitely large, hermaphroditic diploid subpopulations in different local environments. We consider the association of two alleles (*Q* and *q*) at a selected locus (QTL) with two alleles (*A* and *a*) at a neutral marker locus, with a probability *c* of recombination between the two loci in each generation. Migration between the two environments takes place at a rate *m*; specifically, in each generation a fraction *m* of the population in environment 1 is randomly chosen to migrate to environment 2 and vice versa. Selection, followed by random mating in each environment, takes place after migration. The relative fitnesses of the QTL genotypes in the two environments are given in Table 1. An important feature of these fitnesses is that individuals favored in environment 1 are selected against in environment 2 and vice versa.

Insight into the dynamics of the marker-QTL haplotype frequencies can be gained from a formal analysis of the model's near-equilibrium behavior. To begin, let *x*_{1}(*t*), *x*_{2}(*t*), *x*_{3}(*t*), and *x*_{4}(*t*) be the frequencies in environment 1 and generation *t* of QTL-marker haplotypes *QA*, *Qa*, *qA*, and *qa*, respectively; and let *y*_{1}(*t*), *y*_{2}(*t*), *y*_{3}(*t*), and *y*_{4}(*t*) denote these frequencies in environment 2. We note that since *x*_{1} + *x*_{2} + *x*_{3} + *x*_{4} = *y*_{1} + *y*_{2} + *y*_{3} + *y*_{4} = 1, to describe the system it suffices to track the dynamics of the six-component vector *v* = (*x*_{1}, *x*_{2}, *x*_{3}, *y*_{1}, *y*_{2}, *y*_{3})^{T}.

Under the model assumptions above, it is possible to derive expressions that give the haplotype frequencies in generation *t* + 1 as functions of the frequencies in generation *t*. To do so, we let *x*_{G} (respectively, *y*_{G}) denote the frequency of the (diploid) two-locus genotype *G* in environment 1 (respectively, 2) and generation *t*. Then after migration and selection but before recombination and mating, the *relative* genotype frequencies in each environment are given by expressions such as

$$x'_{G} = w^{(1)}_{G}\left[(1-m)\,x_{G} + m\,y_{G}\right],$$

where *w*^{(1)}_{G} is the relative fitness of genotype *G* in environment 1 (Table 1). The computer algebra package Maple was used to compute these and subsequent steps in the derivation (the Maple 9 worksheet is available upon request). Then after meiosis (including recombination at the rate *c*) and random mating, the haplotype frequencies *x*_{i}(*t* + 1) (*i* = 1, 2, 3) in the next generation are given by

$$x_{i}(t+1) = \frac{1}{b}\sum_{G}\frac{n_{i}(G)}{2}\,x''_{G},$$

where *n*_{i}(*G*) is the number of copies of haplotype *i* in genotype *G* and *b* is a suitable normalization constant. The relative frequencies of the genotypes following recombination and random mating are

$$x''_{QA/qa} = (1-c)\,x'_{QA/qa} + c\,x'_{Qa/qA}, \qquad x''_{Qa/qA} = (1-c)\,x'_{Qa/qA} + c\,x'_{QA/qa},$$

and *x*″_{g} = *x*′_{g} for all other genotypes *g*. Expressions for *y*_{i}(*t* + 1) (*i* = 1, 2, 3) are similar.
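The recursion described above can be sketched in Python. This is a minimal sketch, not the authors' Maple derivation or Matlab code. It makes two assumptions about Table 1:

* in environment 1, *QQ*, *Qq*, and *qq* have relative fitnesses 1, 1 − *hs*, and 1 − *s*;
* environment 2 uses the mirror-image values.

The names `fitness`, `gametes`, `step`, and `REC` are hypothetical. Crossing over is applied to every ordered genotype rather than only to double heterozygotes. This is equivalent, because crossing over in any other genotype simply regenerates the parental haplotypes.

```python
import numpy as np

# Haplotype order used throughout: 0 = QA, 1 = Qa, 2 = qA, 3 = qa
QTL = np.array([0, 0, 1, 1])   # 0 = Q, 1 = q
MARK = np.array([0, 1, 0, 1])  # 0 = A, 1 = a
# REC[i, j]: recombinant haplotype carrying the QTL allele of i and the marker allele of j
REC = 2 * QTL[:, None] + MARK[None, :]


def fitness(s, h):
    """Fitness matrices over ordered genotypes (i, j); an assumed form of Table 1."""
    n_q = QTL[:, None] + QTL[None, :]           # number of q alleles: 0, 1 or 2
    w_env1 = np.array([1.0, 1 - h * s, 1 - s])  # QQ, Qq, qq in environment 1
    return w_env1[n_q], w_env1[::-1][n_q]       # environment 2 is the mirror image


def gametes(S, c):
    """Gamete (haplotype) frequencies from unnormalized post-selection genotype weights S."""
    f = S / S.sum()                                      # normalization (the constant b)
    g = (1 - c) / 2 * (f.sum(axis=1) + f.sum(axis=0))    # non-recombinant gametes
    np.add.at(g, REC, c / 2 * f)                         # recombinant: QTL of i, marker of j
    np.add.at(g, REC.T, c / 2 * f)                       # recombinant: QTL of j, marker of i
    return g


def step(x, y, m, c, s, h):
    """One generation: migration, selection, recombination, and random mating."""
    X, Y = np.outer(x, x), np.outer(y, y)   # Hardy-Weinberg genotype frequencies
    w1, w2 = fitness(s, h)
    S1 = w1 * ((1 - m) * X + m * Y)         # migrate, then select in environment 1
    S2 = w2 * ((1 - m) * Y + m * X)         # ... and in environment 2
    return gametes(S1, c), gametes(S2, c)   # random union of gametes


x0 = np.array([1.0, 0.0, 0.0, 0.0])   # population 1 fixed for QA
y0 = np.array([0.0, 0.0, 0.0, 1.0])   # population 2 fixed for qa
x1, y1 = step(x0, y0, m=0.01, c=0.025, s=0.1, h=0.5)
```

Because the two populations start as mirror images, one call to `step` keeps them mirror images. That is, the haplotype vector of population 2 is that of population 1 reversed.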

The entire process described above may be summarized with a nonlinear function *F*, such that

$$v(t+1) = F\big(v(t)\big) \qquad (1)$$

is the vector of haplotype frequencies in generation *t* + 1 if *v*(*t*) is the corresponding vector in generation *t* (the equations are specified in full in appendix a). Our aim is to find equilibria, *i.e.*, vectors *v** satisfying *F*(*v**) = *v**, and to analyze the behavior of trajectories (*v*(*t*), *F*(*v*(*t*)), *F*^{2}(*v*(*t*)), …) for *v*(*t*) near *v**. This generalizes the work of a number of authors for single populations and pairs of populations; see Levene (1953), Moran (1962), Felsenstein (1976), and references therein.

We obtain approximate expressions for equilibria of *F*. As long as recombination and migration are present (*c*, *m* > 0), at equilibrium we must have

$$\frac{x_{1}}{x_{2}} = \frac{x_{3}}{x_{4}} = \frac{y_{1}}{y_{2}} = \frac{y_{3}}{y_{4}},$$

*i.e.*, the ratios *P*(*QA*)/*P*(*Qa*) and *P*(*qA*)/*P*(*qa*) must be equal both within and between subpopulations. This is demonstrated by making a change of variables so that the dependent variables are:

* the allele frequencies at the QTL;
* the allele frequencies at the marker;
* the within-subpopulation QTL-marker linkage disequilibria.

By the symmetry of the system, at equilibrium the proportion of *Q* alleles in population 1 equals the proportion of *q* alleles in population 2. Using this, one obtains a linear system for the remaining four variables. It is then straightforward to show that the system is at equilibrium if and only if the proportions of *A* alleles are equal in both subpopulations and the within-subpopulation linkage disequilibria are equal to zero. We therefore seek equilibria of the form

$$v^{*} = \big(p\alpha,\; p(1-\alpha),\; (1-p)\alpha,\; (1-p)\alpha,\; (1-p)(1-\alpha),\; p\alpha\big)^{T},$$

where *p* and α are constants in the interval [0, 1] representing equilibrium values in population 1 of *P*(*Q*) and *P*(*A*), respectively.
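As a quick check on this equilibrium form, the following sketch builds the haplotype frequencies of both populations from *p* and α. It then confirms that *P*(*QA*)/*P*(*Qa*) and *P*(*qA*)/*P*(*qa*) share the common value α/(1 − α) within and between populations. The function name `equilibrium` and the chosen values of *p* and α are illustrative only.

```python
import numpy as np


def equilibrium(p, alpha):
    """Haplotype frequencies (QA, Qa, qA, qa) of both populations at v*.

    p     : frequency of Q in population 1 (= frequency of q in population 2)
    alpha : frequency of A in both populations
    """
    x = np.array([p * alpha, p * (1 - alpha), (1 - p) * alpha, (1 - p) * (1 - alpha)])
    y = np.array([(1 - p) * alpha, (1 - p) * (1 - alpha), p * alpha, p * (1 - alpha)])
    v_star = np.concatenate([x[:3], y[:3]])   # (x1, x2, x3, y1, y2, y3)
    return x, y, v_star


x, y, v_star = equilibrium(p=0.98, alpha=0.3)
ratios = np.array([x[0] / x[1], x[2] / x[3], y[0] / y[1], y[2] / y[3]])
print(ratios)  # all equal to alpha / (1 - alpha)
```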

Since the marker locus does not influence fitness, expressions for *p* can be obtained by solving a simpler system including only the QTL. As shown in appendix b, when the migration rate *m* is small compared to *h* and *s*, the value of *p* given by Equation 2 is a stable solution of the “QTL-only” system. Furthermore, if *h <* 1, this value gives the *only* stable solution of that system. Since we are interested in the case of weak migration, we confine our attention to this value of *p*. This specifies the equilibrium values of the allele frequencies at the QTL. The equilibrium values of the marker-QTL haplotype frequencies will be determined by the equilibrium value of α, which in turn will be determined by initial conditions. From this point on we also specify , to simplify the presentation and make necessary computer algebra operations tractable.
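The QTL-only system can be iterated directly to find the stable value of *p*. The sketch below assumes Table 1 fitnesses of 1, 1 − *hs*, and 1 − *s* for *QQ*, *Qq*, and *qq* in environment 1, with mirror-image values in environment 2. It compares the iterated equilibrium with the classical leading-order migration-selection balance *q* ≈ *m*/(*hs*). That approximation is offered only for orientation and is not necessarily the exact form of Equation 2. The function names are hypothetical.

```python
import numpy as np


def qtl_only_step(p1, p2, m, s, h):
    """One generation of the QTL-only model; p1 and p2 are P(Q) in populations 1 and 2.

    Assumed fitnesses: QQ, Qq, qq have 1, 1 - hs, 1 - s in environment 1
    and the mirror-image values in environment 2.
    """
    def genotypes(p):                                  # Hardy-Weinberg QQ, Qq, qq
        return np.array([p * p, 2 * p * (1 - p), (1 - p) ** 2])

    w1 = np.array([1.0, 1 - h * s, 1 - s])
    w2 = w1[::-1]
    g1 = (1 - m) * genotypes(p1) + m * genotypes(p2)   # genotype frequencies after migration
    g2 = (1 - m) * genotypes(p2) + m * genotypes(p1)

    def freq_Q(g, w):                                  # P(Q) among selected parents
        sel = g * w
        return (sel[0] + sel[1] / 2) / sel.sum()

    return freq_Q(g1, w1), freq_Q(g2, w2)


def equilibrium_p(m, s, h, generations=5000):
    """Iterate from the secondary-contact start (p1 = 1, p2 = 0) to equilibrium."""
    p1, p2 = 1.0, 0.0
    for _ in range(generations):
        p1, p2 = qtl_only_step(p1, p2, m, s, h)
    return p1


m, s, h = 0.001, 0.1, 0.5
print(equilibrium_p(m, s, h), 1 - m / (h * s))
```

For *m* = 0.001, *s* = 0.1, and *h* = 0.5, both numbers are close to 0.98. This agrees with the equilibrium *Q* frequency reported for these parameters in the secondary-contact simulations.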

To understand the dynamics of solutions of the full (marker + QTL) system (1) in the neighborhood of the equilibrium *v**, we must find the eigenvalues and eigenvectors of the linearized system

$$w(t+1) = J\,w(t), \qquad (3)$$

where *w = v − v** represents the difference between a solution *v* and the equilibrium *v**. Here *J* is the 6 × 6 matrix of partial derivatives of the components of the transition function *F* with respect to *x*_{1}, *x*_{2}, *x*_{3}, *y*_{1}, *y*_{2}, and *y*_{3}, evaluated at the equilibrium *v**. However, even approximate expressions for these eigenvalues and eigenvectors are quite complicated. Using the computer algebra software Maple, it was possible to compute approximate expressions for the eigenvalues that are valid when both the migration rate *m* and the recombination rate *c* are small (Maple worksheets are available upon request). Specifically, to leading order in *m* and *c* we obtain expressions (4)–(9) for the eigenvalues λ_{1}, …, λ_{6}; in particular,

$$\lambda_{3} = 1, \qquad (6)$$

$$\lambda_{5} \approx 1 - \frac{2mc(1-s)(1-hs)}{hs}. \qquad (8)$$

A few comments on these eigenvalues are in order.

First, we note that the expression λ_{3} = 1 (6) is exact. This is because the equilibrium *v** is not unique, but rather is a member of a set of equilibria parametrized by α. Loosely speaking, the error *w = v − v** remains constant when *v* itself is an equilibrium, so an eigenvalue exactly equal to 1 is inevitable (see Devaney 1992, Chap. 5, for a more precise discussion).

Second, we note that the eigenvalues λ_{1} and λ_{2} in fact have no dependence on the recombination rate *c* at any order. This is because they are identical to the eigenvalues of the simpler QTL-only system linearized around the corresponding equilibrium (*p*, 1 − *p*) (see appendix b). They therefore characterize evolution in which allele frequencies change only at the QTL while remaining constant (*e.g.*, at fixation or loss) at the marker locus.

Further examination of the eigenvalues gives conditions for the stability of the equilibrium *v**. Specifically, the equilibrium will be linearly stable if all its eigenvalues lie strictly between −1 and 1 (except for λ_{3} = 1). Keep in mind that we must have 0 ≤ *c* ≤ 1/2, *h* ≥ 0, and 0 ≤ *s* ≤ 1. Under these constraints, the stability criterion is satisfied provided that selection is sufficiently strong (*i.e.*, *h* and *s* are sufficiently near 1) and/or migration and recombination are sufficiently weak (*i.e.*, *m* and *c* are sufficiently close to 0). The meaning of “sufficiently close” depends on *s* and *h*, because of the powers of 1/*hs* in the expressions for the eigenvalues. We assume henceforth that the values of *h*, *s*, *m*, and *c* indeed give stability of the equilibrium *v**.
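The eigenvalue structure and the stability criterion can be checked numerically without computer algebra. This sketch repeats a compact version of the two-population recursion, assuming Table 1 fitnesses of 1, 1 − *hs*, 1 − *s* in environment 1 and mirror-image values in environment 2. It then:

1. iterates from a linkage-equilibrium start with α = 1/2 to reach *v**;
2. builds *J* by central finite differences in the coordinates (*x*_{1}, *x*_{2}, *x*_{3}, *y*_{1}, *y*_{2}, *y*_{3});
3. compares the two leading eigenvalues with λ_{3} = 1 and with the leading-order value 1 − 2*mc*(1 − *s*)(1 − *hs*)/*hs* for λ_{5} that underlies the four-event interpretation of λ_{5} below.

Agreement with the latter should be only approximate, since higher-order terms in *m* and *c* are dropped. All helper names are hypothetical.

```python
import numpy as np

QTL = np.array([0, 0, 1, 1])                # haplotypes QA, Qa, qA, qa; 0 = Q, 1 = q
MARK = np.array([0, 1, 0, 1])               # 0 = A, 1 = a
REC = 2 * QTL[:, None] + MARK[None, :]      # recombinant: QTL of i, marker of j


def step(x, y, m, c, s, h):
    """Migration, selection (assumed Table 1 fitnesses), recombination, random mating."""
    n_q = QTL[:, None] + QTL[None, :]
    w = np.array([1.0, 1 - h * s, 1 - s])
    out = []
    for own, other, wv in ((x, y, w), (y, x, w[::-1])):
        f = wv[n_q] * ((1 - m) * np.outer(own, own) + m * np.outer(other, other))
        f = f / f.sum()
        g = (1 - c) / 2 * (f.sum(axis=1) + f.sum(axis=0))
        np.add.at(g, REC, c / 2 * f)
        np.add.at(g, REC.T, c / 2 * f)
        out.append(g)
    return out


def F(v, m, c, s, h):
    """Equation 1 in the coordinates v = (x1, x2, x3, y1, y2, y3)."""
    x = np.append(v[:3], 1 - v[:3].sum())
    y = np.append(v[3:], 1 - v[3:].sum())
    x1, y1 = step(x, y, m, c, s, h)
    return np.concatenate([x1[:3], y1[:3]])


def sorted_eigenvalues(m, c, s, h, eps=1e-6):
    v = np.array([0.5, 0.5, 0.0, 0.0, 0.0, 0.5])  # alpha = 1/2, linkage equilibrium
    for _ in range(5000):                          # converge to v*
        v = F(v, m, c, s, h)
    J = np.empty((6, 6))
    for k in range(6):                             # central differences, column k
        dv = np.zeros(6)
        dv[k] = eps
        J[:, k] = (F(v + dv, m, c, s, h) - F(v - dv, m, c, s, h)) / (2 * eps)
    return np.sort(np.linalg.eigvals(J).real)[::-1]


m, c, s, h = 0.01, 0.025, 0.5, 0.5
lam = sorted_eigenvalues(m, c, s, h)
approx_lam5 = 1 - 2 * m * c * (1 - s) * (1 - h * s) / (h * s)
print(lam[0], lam[1], approx_lam5)
```

With *s* = *h* = 0.5, *m* = 0.01, and *c* = 0.025, the leading-order value of λ_{5} is 0.99925. The remaining eigenvalues should lie strictly inside (−1, 1), in line with the stability criterion.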

To draw more detailed conclusions about the behavior of solutions to (1) near the equilibrium *v**, it is necessary to examine the eigenvectors of the linearized system (3) as well as the eigenvalues. We do so here only for the special case *s* = 0.5, *h* = 0.5, *m* = 0.01, and *c* = 0.025. In this case, numerical values of the eigenvalues, to four significant digits, are given by (10). Here λ_{4} ≠ λ_{6} because the numerical values were computed using expressions accurate to second order in *m* and *c*. The corresponding eigenvectors, normalized to have magnitude 1, are given by (11)–(16). The behavior of a solution *w* of (3) is given by

$$w(t) = \sum_{i=1}^{6} w_{i}\,\lambda_{i}^{\,t}\,v_{i},$$

where *w*_{i} is the projection of *w*(0) onto the eigenvector *v*_{i}.

We now show that the form of the eigenvectors *v*_{i} [(11)–(16)] has a clear implication. When selection is strong and migration and recombination rates are low, solutions will approach equilibrium allele frequencies rapidly at the QTL, but only very slowly at the marker locus.

The conclusion regarding QTL allele frequencies follows from analysis of the QTL-only system. The eigenvalues λ_{1} and λ_{2}, which govern the QTL allele frequencies, are close to zero, so these allele frequencies will approach stable equilibrium values rapidly (see appendix b for details).

To obtain the conclusion regarding marker allele frequencies, first suppose that the *Q* and *q* QTL allele frequencies are close to equilibrium. In this case the first two components of the error vector should be approximately equal in magnitude and of opposite sign, because any excess of *QA* haplotypes (relative to equilibrium) must be balanced by a deficit in *Qa* haplotypes. Therefore, the projections *w*_{1} and *w*_{2} of the error *w* onto *v*_{1} and *v*_{2} should be small. The dynamics should instead be dominated by the other projections *w*_{3}, *w*_{4}, *w*_{5}, and *w*_{6}.
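The modal expansion can be illustrated with a small stand-in for *J*. Its entries are hypothetical, chosen for illustration, and are not values from this model. The eigenvectors of a non-symmetric matrix need not be orthogonal. The coefficients *w*_{i} are therefore computed here as the coordinates of *w*(0) in the eigenvector basis, by solving a linear system with the matrix of eigenvectors.

```python
import numpy as np

# Hypothetical 3 x 3 stand-in for J (illustrative values only);
# its eigenvalues are 1.0, 0.998 and 0.1
J = np.array([[0.999, 0.001, 0.0],
              [0.001, 0.999, 0.0],
              [0.05, -0.05, 0.1]])
lam, V = np.linalg.eig(J)            # column V[:, i] is the eigenvector v_i
w0 = np.array([0.02, -0.01, 0.005])  # initial deviation w(0) from equilibrium
coef = np.linalg.solve(V, w0)        # w_i: coordinates of w(0) in the eigenbasis


def w_modal(t):
    """w(t) = sum_i w_i * lam_i**t * v_i."""
    return (V * lam**t) @ coef


direct = np.linalg.matrix_power(J, 50) @ w0
print(np.allclose(w_modal(50), direct))  # True
```

After a few generations, the term along the eigenvector with eigenvalue 0.1 is negligible. The terms whose eigenvalues are at or near 1 persist, which is the pattern the argument above relies on.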

Continuing, we find that in this case nearly all of the error in the frequency *x*_{3} of the *qA* haplotype in population 1 will be projected onto the eigenvectors *v*_{4} and *v*_{6}. Thus, the rate at which *x*_{3} approaches its equilibrium value is controlled by the eigenvalues λ_{4} and λ_{6}. Since λ_{4} and λ_{6} are small (*i.e.*, not close to 1), *x*_{3} will approach its equilibrium value rapidly. Likewise, the rates at which the frequencies *x*_{1} and *x*_{2} in population 1 of the *QA* and *Qa* haplotypes approach their equilibrium values are controlled by λ_{3} and λ_{5}. Since these eigenvalues are close to 1, *x*_{1} and *x*_{2} will approach their equilibrium values extremely slowly. However, the equilibrium value of *x*_{3} + *x*_{4} will be quite small because of strong selection against the *q* allele in population 1. For the same reason, the equilibrium value of *x*_{1} + *x*_{2} will be large. Thus the overall dynamics in population 1 are dominated by the dynamics of *x*_{1} and *x*_{2}, so that equilibrium allele frequencies are approached much more slowly at the marker than at the QTL. Similar considerations apply for population 2.

In fact, we have seen that λ_{3} = 1 exactly, which could appear to indicate that no approach to equilibrium occurs. However, it is possible to show that every solution of the full, nonlinear dynamical system does approach one of a continuum of equilibria. The formal mathematical arguments that demonstrate this behavior rigorously have been developed elsewhere (Wood and Miller, unpublished results).

The eigenvalue λ_{5} can be interpreted as follows. For the system to move closer to equilibrium along the direction of the eigenvector *v*_{5}, one of two transfers must occur (in the case that *P*(*A*) > *P*(*a*) in population 1; the opposite case is similar): an *A* allele must move from population 1 to population 2, or an *a* allele must move from population 2 to population 1. Focusing on the *A* allele, we see that this transfer requires four events:

1. The individual carrying the *A* allele must migrate (with probability *m*).
2. This individual (who is probably a *QQ* homozygote) must survive selection in environment 2 (with probability 1 − *s*).
3. An offspring bearing the *A* allele (who is probably a *Qq* heterozygote) must survive selection (with probability 1 − *hs*).
4. The *A* allele must recombine onto a chromosome bearing the *q* allele (with probability *c*).

The product of these four probabilities, *mc*(1 − *s*)(1 − *hs*), is one-half the magnitude of the numerator of our approximation to λ_{5} − 1 (see Equation 8). It can be viewed as the rate of flow of *A* alleles from population 1 to population 2. Since an equal number of *a* alleles flow from population 2 to population 1, the rate is multiplied by 2. The denominator *hs* in this approximation can be viewed as a reflection of the fact that some *A* alleles in population 1 are already linked to *q* alleles, etc., which speeds up the approach to equilibrium.
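The four-event argument translates directly into arithmetic. This sketch multiplies the four probabilities, doubles the product for flow in both directions, and divides by *hs* to recover the leading-order λ_{5}. The function names are hypothetical.

```python
def marker_flux(m, c, s, h):
    """Per-generation flow of A alleles from population 1 to population 2.

    Product of: migration (m), survival of a QQ migrant in environment 2
    (1 - s), survival of its Qq offspring (1 - hs), and recombination onto
    a q chromosome (c).
    """
    return m * c * (1 - s) * (1 - h * s)


def lambda5_approx(m, c, s, h):
    """Leading-order lambda_5: flux in both directions (factor 2) divided by hs."""
    return 1 - 2 * marker_flux(m, c, s, h) / (h * s)


print(marker_flux(0.01, 0.025, 0.1, 0.5), lambda5_approx(0.01, 0.025, 0.1, 0.5))
```

For *m* = 0.01, *c* = 0.025, *s* = 0.1, and *h* = 0.5, the flux is 2.1375 × 10^{−4} per generation and λ_{5} ≈ 0.99145.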

The eigenvalue λ_{5} can also be used to estimate error in genotyping the QTL via the linked marker. Suppose a *Q* (respectively, *q*) allele is inferred to be present at the QTL whenever an *A* (*a*) allele is present at the marker. An error will then occur whenever a haplotype is *Qa* or *qA*. Thus the probability of error is *P*_{err} = *P*(*Qa*) + *P*(*qA*), where *P*(*H*) denotes the frequency of haplotype *H*.

Focusing on population 1, the equilibrium frequency of the *Q* allele at the QTL is close to 1 (Equation 2), and this equilibrium frequency is approached rapidly. We therefore approximate *P*_{err} by 1 − *P*(*QA*), where the frequency (here and henceforth) is within population 1 only. It is the movement of marker alleles from one population to another, and the formation of recombinant haplotypes (*e.g.*, *Qa* or *qA*), that generate the possibility of genotyping error.

Write *P*_{0}(*QA*) for the initial frequency of *QA*, *P*_{∞}(*QA*) for the equilibrium frequency, and *P*_{t}(*QA*) for the frequency after *t* generations. Then

$$P_{t}(QA) - P_{\infty}(QA) \approx \lambda_{5}^{\,t}\left[P_{0}(QA) - P_{\infty}(QA)\right]. \qquad (17)$$

If population 1 is initially fixed for *QA* and population 2 for *qa*, then *P*_{0}(*QA*) = 1 and *P*_{∞}(*QA*) = *p*α ≈ 1/2, since *p* is close to 1 and, by the symmetry of the system, α = 1/2. Equation 17 then becomes *P*_{t}(*QA*) ≈ (1 + λ_{5}^{t})/2, and using *P*_{err} ≅ 1 − *P*(*QA*) gives

$$P_{\text{err}} \approx \tfrac{1}{2}\left(1 - \lambda_{5}^{\,t}\right). \qquad (18)$$

Solving for *t* gives the approximate number of generations until error in genotyping reaches a specified level *P*_{err},

$$t \approx \frac{\ln\left(1 - 2P_{\text{err}}\right)}{\ln\lambda_{5}} \approx \frac{-\ln\left(1 - 2P_{\text{err}}\right)}{1 - \lambda_{5}}, \qquad (19)$$

with the latter approximation being appropriate when λ_{5} is close to 1 (as it will be under strong selection and low migration). For example, when *m* = 0.01, *c* = 0.025, *s* = 0.1, and *h* = 0.5, Equation 19 predicts that an error rate of 0.1 will be reached after ∼26 generations. If *s* is increased to 0.5, Equation 19 predicts that this error rate will be reached after ∼297 generations.
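Equation 19 is easy to evaluate. The sketch below takes 1 − λ_{5} from the leading-order expression 2*mc*(1 − *s*)(1 − *hs*)/(*hs*) used in the interpretation of λ_{5} above. It returns both the logarithmic form and its approximation for λ_{5} near 1. The function name is hypothetical.

```python
import math


def generations_to_error(p_err, m, c, s, h):
    """Equation 19: generations until genotyping error reaches p_err.

    Returns (logarithmic form, approximation for lambda_5 close to 1).
    """
    gap = 2 * m * c * (1 - s) * (1 - h * s) / (h * s)   # leading-order 1 - lambda_5
    exact = math.log(1 - 2 * p_err) / math.log(1 - gap)
    approx = -math.log(1 - 2 * p_err) / gap
    return exact, approx


for s in (0.1, 0.5):
    exact, approx = generations_to_error(0.1, m=0.01, c=0.025, s=s, h=0.5)
    print(f"s = {s}: {exact:.1f} (log form), {approx:.1f} (approximation)")
```

With the parameters of the example, both forms give about 26 generations for *s* = 0.1 and about 297 for *s* = 0.5, matching the values quoted above.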

Generalizing from this example, in light of the expressions above for the eigenvalues and eigenvectors of the system with general *m*, *c*, *h*, and *s*, one conclusion can be stated as follows: In large populations, when selection is strong (*h* and *s* near 1) and the migration rate is small (*m* near 0), equilibrium allele frequencies will be approached rapidly at the QTL but very slowly at the linked marker locus. As a result, the genotyping error rate *t* generations after migration begins between two divergently fixed populations will be approximately *mc*(1 − *s*)(1 − *hs*)*t*/*hs*, where *m* is the rate of migration, *c* is the QTL-marker recombination rate, 1 − *s* is the fitness of disfavored homozygotes, and 1 − *hs* is the fitness of heterozygotes. Formally, this conclusion applies only for systems already close to equilibrium; in the next section, we examine to what extent this condition can be relaxed.
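The linear error rate quoted above comes from expanding (1 − λ_{5}^{t})/2 for small (1 − λ_{5})*t*. A short sketch compares the two forms; the names are hypothetical. By Bernoulli's inequality the linear form never understates the error. It is accurate only while *t* is small relative to 1/(1 − λ_{5}).

```python
def error_rate(t, m, c, s, h):
    """Return ((1 - lambda_5**t) / 2, linear approximation) after t generations."""
    k = m * c * (1 - s) * (1 - h * s) / (h * s)   # half of the leading-order 1 - lambda_5
    lam5 = 1 - 2 * k
    return 0.5 * (1 - lam5 ** t), k * t


for t in (10, 100, 1000):
    nonlinear, linear = error_rate(t, m=0.01, c=0.025, s=0.5, h=0.5)
    print(f"t = {t}: {nonlinear:.4f} vs linear {linear:.4f}")
```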

## DETERMINISTIC SIMULATIONS

To gain intuition about the dynamics of population subdivision and haplotype frequencies in the system described above, numerical simulations were carried out for a range of system parameter values. Initial conditions were typical of two scenarios: (1) populations making secondary contact after evolution in allopatry or (2) the appearance of a novel allele conferring a significant selective advantage, such as resistance to a xenobiotic. The simulations use Equation 1 to project haplotype frequencies in generation *t* + 1 as functions of those in generation *t*. Simulations were initialized in two ways, one for each scenario, as described below, and iterated for 2000 generations. Both scenarios begin with the maximum possible divergence between populations or the maximum opportunity for such divergence. This provides a very favorable setting for the use of neutral markers to track alleles at the linked QTL.

We used two metrics for describing the association of QTL and marker loci:

* *P*_{err}, the probability that a marker allele is associated with the “wrong” QTL allele;
* *F*_{ST}, the degree of genotypic differentiation among populations at the marker or QTL locus, as calculated in Weir and Cockerham (1984).

*P*_{err} is a useful measure of marker-QTL haplotype durability because it describes how accurately the marker allele predicts the allele at the QTL. As the marker-QTL haplotypes are dissolved through recombination, *P*_{err} increases. When *P*_{err} exceeds some threshold (here we use 5 and 10%), the utility of the marker-QTL inference is in doubt. *F*_{ST} at the marker locus, calculated between two populations facing different selection regimes, describes the degree to which the alleles at the marker locus differ between the populations.
Because QTL alleles are under strong divergent selection, marker locus alleles maintaining strong association with the QTL alleles will have high *F*_{ST} between populations and reduced *F*_{ST} indicates decay of marker-QTL haplotypes. These simulations were carried out using the scientific computation software Matlab (Matlab 7.0.1 programs are available upon request). We now present some of the simulation results and discuss their implications for inferring QTL (or a selected locus) genotypes using linked markers.
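The two metrics can be computed directly from haplotype frequencies. In this sketch, *F*_{ST} is the large-sample ratio of the between-population variance in allele frequency to *p̄*(1 − *p̄*). For a biallelic locus in two equal-sized populations, this ratio coincides with Nei's *G*_{ST}. It is a simplified stand-in for, not an implementation of, the Weir and Cockerham (1984) estimator, which also corrects for sample size. The function names are hypothetical.

```python
def p_err(x):
    """Probability that the marker allele is paired with the 'wrong' QTL allele.

    x holds haplotype frequencies (QA, Qa, qA, qa) in one population.
    """
    return x[1] + x[2]  # P(Qa) + P(qA)


def fst_two_pops(p1, p2):
    """Differentiation at a biallelic locus between two equal-sized populations."""
    pbar = (p1 + p2) / 2
    denom = pbar * (1 - pbar)
    return 0.0 if denom == 0 else ((p1 - p2) / 2) ** 2 / denom


def marker_fst(x, y):
    return fst_two_pops(x[0] + x[2], y[0] + y[2])  # P(A) = P(QA) + P(qA)


def qtl_fst(x, y):
    return fst_two_pops(x[0] + x[1], y[0] + y[1])  # P(Q) = P(QA) + P(Qa)
```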

#### Secondary-contact scenario:

Simulation of two populations in secondary contact after divergent evolution in allopatry began with all individuals in population 1 carrying the *A* allele of the marker locus and all those in population 2 carrying the *a* allele. At the QTL, initially all individuals in population 1 carried the *Q* allele and all those in population 2 carried the *q* allele. A series of simulations was performed with:

* strong yet realistic selection on the QTL (*s =* 0.1) (Rieseberg and Burke 2001);
* an array of migration rates between two divergent populations (*m =* 0.001, 0.01, 0.05, and 0.25);
* an intermediate degree of dominance for the QTL (*h =* 0.5);
* a marker-QTL genetic distance typical of those obtained in QTL mapping studies (*c* = 0.025).

To assess the effects of dominance, the values *h* = 0.1 and 0.9 were also used with *s =* 0.1 and *m* = 0.01; some simulations were also performed with extraordinarily strong selection (*s* = 0.5).

For all conditions tested, allele frequencies at the QTL approached their equilibrium values very quickly (within 100 generations for *s* = 0.1 and 10 generations for *s =* 0.5). The equilibrium value of the *Q* allele frequency in population 1 decreased as *m* increased, *e.g.*, from 0.98 with *m* = 0.001 to 0.625 with *m* = 0.05 for *s* = 0.1. The rate of decay of *F*_{ST} at the marker also depended strongly on migration rates, with higher migration leading to faster decay, as expected (Figure 1). With *s* = 0.1 and *h* = 0.5, *F*_{ST} at the marker decayed from its initial value of 1 as follows (Figure 1):

| *m* | after 100 generations | after 400 generations |
|---|---|---|
| 0.001 | ∼0.86 | 0.61 |
| 0.01 | ∼0.21 | 0.0049 |
| 0.05 | ∼0.0019 | 4.7 × 10^{−9} |

In contrast, equilibrium values (approached within 100 generations) of *F*_{ST} at the QTL for these parameter combinations were ∼0.93, 0.49, and 0.063, respectively.
With extremely strong selection, *s* = 0.5, all values of *F*_{ST} at both the selected and neutral loci were higher at any given time. For example, with *s* = 0.5 and *m =* 0.05, *F*_{ST} at the marker was 0.81 after 100 generations, 0.52 after 400 generations, and 0.050 after 2000 generations.

Our primary objective is to consider these dynamics as they influence marker-QTL association and thereby accuracy in QTL genotyping. With *s =* 0.1, *h* = 0.5, and *m =* 0.001, 0.01, and 0.05, decay of the marker-QTL association led to a high probability of error (*P*_{err} > 0.1) after 376, 42, and 18 generations, respectively (Figure 2); these times were longer than those predicted by Equation 19. *P*_{err} was sensitive to dominance: with *s* = 0.1, *m* = 0.01, and *h* = 0.1 or 0.9, *P*_{err} rapidly climbed above 10% after only 33 and 58 generations, respectively. With very strong selection (*s =* 0.5), the time required for *P*_{err} to climb above 10% was substantially longer: 55, 307, and >2000 generations for *m* = 0.05, 0.01, and 0.001, respectively (data not shown); in these cases, agreement with Equation 19 was good.
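The secondary-contact runs can be approximated with the following self-contained sketch. It repeats a compact version of the two-population recursion, assuming Table 1 fitnesses of 1, 1 − *hs*, 1 − *s* in environment 1 and mirror-image values in environment 2. It records *P*_{err} = *P*(*Qa*) + *P*(*qA*) in population 1 and marker *F*_{ST}, using a simplified large-sample *F*_{ST} rather than the full Weir and Cockerham estimator. Because the fitnesses and metric definitions are assumptions, its crossing times need not reproduce the generation counts reported above. The function names are hypothetical.

```python
import numpy as np

QTL = np.array([0, 0, 1, 1])                # haplotypes QA, Qa, qA, qa; 0 = Q, 1 = q
MARK = np.array([0, 1, 0, 1])               # 0 = A, 1 = a
REC = 2 * QTL[:, None] + MARK[None, :]      # recombinant: QTL of i, marker of j


def step(x, y, m, c, s, h):
    """Migration, selection (assumed Table 1 fitnesses), recombination, random mating."""
    n_q = QTL[:, None] + QTL[None, :]
    w = np.array([1.0, 1 - h * s, 1 - s])
    out = []
    for own, other, wv in ((x, y, w), (y, x, w[::-1])):
        f = wv[n_q] * ((1 - m) * np.outer(own, own) + m * np.outer(other, other))
        f = f / f.sum()
        g = (1 - c) / 2 * (f.sum(axis=1) + f.sum(axis=0))
        np.add.at(g, REC, c / 2 * f)
        np.add.at(g, REC.T, c / 2 * f)
        out.append(g)
    return out


def fst(p1, p2):
    """Simplified two-population F_ST for a biallelic locus."""
    pbar = (p1 + p2) / 2
    return 0.0 if pbar in (0.0, 1.0) else ((p1 - p2) / 2) ** 2 / (pbar * (1 - pbar))


def secondary_contact(m, c, s, h, generations=2000):
    x = np.array([1.0, 0.0, 0.0, 0.0])   # population 1 fixed for QA
    y = np.array([0.0, 0.0, 0.0, 1.0])   # population 2 fixed for qa
    perr, marker_fst = [], []
    for _ in range(generations + 1):
        perr.append(x[1] + x[2])                          # P(Qa) + P(qA), population 1
        marker_fst.append(fst(x[0] + x[2], y[0] + y[2]))  # P(A) in each population
        x, y = step(x, y, m, c, s, h)
    return np.array(perr), np.array(marker_fst)


perr, mfst = secondary_contact(m=0.01, c=0.025, s=0.1, h=0.5)
first = int(np.argmax(perr > 0.1))   # first generation with P_err above 10%
print(first, mfst[100], mfst[400])
```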

The distance between the marker and the QTL or selected locus is perhaps the only researcher-controlled parameter involved. Using an abbreviated set of model parameters, we show that only under rather extreme conditions will typically spaced markers be sufficiently close to the QTL for accurate inference to occur (Table 2). For example, even when the selection coefficient *s* is extreme (*s* > 0.5) linked markers accurately infer the QTL genotype for only 100 generations. If, however, the markers are 10- to 1000-fold closer to the QTL, clearly unusual under current technology, the prospects of accurate inference improve significantly. When the marker-QTL distance is on the order of 0.01 cM, for example, some reasonable combinations of selection and migration retain marker-QTL haplotype integrity for ∼1000 generations.
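To leading order, 1 − λ_{5} is proportional to *c*. The approximate form of Equation 19 therefore predicts that the time until a given error rate is reached scales as 1/*c*. The sketch below tabulates this for recombination fractions from 0.025 (2.5 cM) down to 0.0001 (roughly 0.01 cM). It is an analytic near-equilibrium approximation for illustration only. It is not a reproduction of Table 2, and it ignores the departures from Equation 19 seen in the simulations. The function name is hypothetical.

```python
import math


def generations_until_error(p_err, m, c, s, h):
    """Approximate form of Equation 19 (lambda_5 close to 1)."""
    return -math.log(1 - 2 * p_err) * h * s / (2 * m * c * (1 - s) * (1 - h * s))


# Marker-QTL recombination fractions from 2.5 cM down to roughly 0.01 cM
for c in (0.025, 0.0025, 0.00025, 0.0001):
    t = generations_until_error(0.1, m=0.01, c=c, s=0.5, h=0.5)
    print(f"c = {c}: about {t:,.0f} generations until P_err = 0.1")
```

Under this approximation, moving the marker 10-fold closer to the QTL lengthens the useful lifetime of the haplotype 10-fold.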

#### Novel allele scenario:

The second scenario modeled was one in which a new and strongly favored allele arises in a population. This population exchanges migrants with a second population in which the new allele is strongly selected against. This scenario mimics the selection regimes acting on novel alleles conferring resistance to a xenobiotic such as an insecticide or an antimalarial drug (Yan *et al.* 1998; Wootton *et al.* 2002). We acknowledge that the fitness costs observed in the xenobiotic-free environment are possibly higher than those found in nature; however, this feature acts to increase the difference between the favored genotypes in each environment, leading to a conservative analysis of the ability of linked markers to accurately distinguish the genotypes at the selected locus.

Haplotype frequencies were initialized as follows. In population 2, all QTL alleles were of type *q*. Marker alleles were either evenly split between *A* and *a* (*P*_{0}(*A*) = 0.5) or dominated by one allele (*P*_{0}(*A*) = 0.9), with the alternate allele at frequency 1 − *P*_{0}(*A*). In population 1, initial frequencies were identical to those in population 2, except that *qA* bearers representing 0.001% (*i.e.*, 10^{−5}) of the total population were replaced by *QA* bearers. Parameter values used were:

* *s* = 0.1 and 0.5;
* *h* = 0.5;
* *m =* 0.001, 0.01, and 0.05 (as well as *m* = 0.25 for *s* = 0.5);
* *c* = 0.025.
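The novel-allele starting conditions can be written as a small helper. The function name is hypothetical, and interpreting the replaced fraction of 10^{−5} as a haplotype frequency is an assumption.

```python
import numpy as np


def novel_allele_start(p0_A, founder=1e-5):
    """Initial haplotype frequencies (QA, Qa, qA, qa) for the novel-allele scenario.

    Population 2 carries only q; population 1 is identical except that a
    fraction `founder` of qA haplotypes is replaced by the novel QA haplotype.
    """
    y = np.array([0.0, 0.0, p0_A, 1 - p0_A])
    x = y.copy()
    x[2] -= founder   # remove some qA ...
    x[0] += founder   # ... and add the same amount of QA
    return x, y


x0, y0 = novel_allele_start(0.9)
```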

When the novel QTL allele occurred with a common marker allele, *i.e.*, when the initial frequency of the shared marker allele (*A*) was 0.9, *F*_{ST} at the marker remained below 0.01 at all times for all parameter combinations used. When the novel QTL allele occurred with a less common marker allele, *e.g.*, *P*_{0}(*A*) = 0.5, *F*_{ST} at the marker remained extremely small (<0.001) at all times for *s* = 0.1. When the selection coefficient was much higher (*s* = 0.5), *F*_{ST} at the marker rose quickly to maxima, ranging from 0.0011 (for *m* = 0.25) to 0.061 (for *m* = 0.001), and then decayed (Figure 3). This was in contrast to *F*_{ST} at the QTL, whose behavior was like that described above for the secondary-contact case. Genotyping error rates were uniformly high (>0.1 and often closer to 0.5) for all conditions tested (data not shown).
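The *F*_{ST} estimator is not restated in this section. As an assumption, the sketch below uses Nei's two-population form, *F*_{ST} = (*H*_{T} − *H*_{S})/*H*_{T}, where *H*_{S} is the mean within-population heterozygosity and *H*_{T} the heterozygosity of the pooled allele frequency, and reads allele frequencies at the QTL and the marker off the haplotype frequencies. The helper names are hypothetical and the authors' estimator may differ in detail.

```python
def fst_two_pops(p1: float, p2: float) -> float:
    """Nei-style F_ST for a biallelic locus in two equally sized populations."""
    p_bar = 0.5 * (p1 + p2)
    h_t = 2.0 * p_bar * (1.0 - p_bar)                          # pooled heterozygosity
    h_s = 0.5 * (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2))  # mean within-pop
    return 0.0 if h_t == 0.0 else (h_t - h_s) / h_t


def allele_freqs(pop: dict) -> tuple:
    """(P(Q), P(A)) from haplotype frequencies keyed QA, Qa, qA, qa."""
    return pop["QA"] + pop["Qa"], pop["QA"] + pop["qA"]


def locus_fsts(pop1: dict, pop2: dict) -> tuple:
    """(F_ST at the QTL, F_ST at the marker)."""
    q1, a1 = allele_freqs(pop1)
    q2, a2 = allele_freqs(pop2)
    return fst_two_pops(q1, q2), fst_two_pops(a1, a2)


# Secondary-contact starting point: fixed differences at both loci.
start1 = {"QA": 1.0, "Qa": 0.0, "qA": 0.0, "qa": 0.0}
start2 = {"QA": 0.0, "Qa": 0.0, "qA": 0.0, "qa": 1.0}
print(locus_fsts(start1, start2))
```

For two equally sized populations this reduces to (*p*_{1} − *p*_{2})^{2}/[4*p̄*(1 − *p̄*)], so it depends only on the allele-frequency difference and the pooled frequency.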

## STOCHASTIC SIMULATIONS

To examine the extent to which population size and genetic sampling might lead to observations different from those recorded above, a series of stochastic simulations was conducted for the secondary-contact scenario. The stochastic model included two randomly mating diploid hermaphroditic populations of *N* individuals each (*N* = 20, 100, and 500 were used) and conditions similar to those described above: a single QTL and a neutral marker 2.5 cM apart (assuming a Haldane mapping function), selection coefficient *s* = 0.1, dominance coefficient *h* = 0.5, and a migration rate of *m* = 0.01 per generation. Initially all individuals in population 1 were *QA*/*QA* and all individuals in population 2 were *qa*/*qa* homozygotes. For each population size, 1000 replicate populations were simulated for 2000 generations each, with migration, selection, recombination, and random mating steps in each generation as described above. Haplotype frequencies were saved for generations 1–40 and every 20 generations thereafter, and were used to calculate *F*_{ST} at the neutral and selected loci. Simulations were written in Matlab (programs are available upon request), compiled using the Matlab C compiler, and executed on a Linux cluster.
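As an illustration only (not the authors' Matlab programs), the sketch below is a small individual-based Python version of the same life cycle. Several details are assumptions not stated in the text: migration swaps a binomially distributed number of randomly chosen individuals so both populations stay at size *N*; parents are drawn with replacement in proportion to fitness, so selfing is possible; fitnesses are 1, 1 − *hs*, and 1 − *s* for the favored homozygote, the heterozygote, and the disfavored homozygote; *F*_{ST} uses Nei's two-population formula; and the default *c* = 0.0244 is the Haldane value for 2.5 cM. All function names are hypothetical.

```python
import random

# Haplotypes are (qtl, marker) tuples: qtl 1 = Q, 0 = q; marker 1 = A, 0 = a.
# A genotype is a pair of haplotypes.


def fitness(genotype, favors_Q, s, h):
    """Favored homozygote 1, heterozygote 1 - hs, disfavored homozygote 1 - s."""
    n_Q = genotype[0][0] + genotype[1][0]
    n_favored = n_Q if favors_Q else 2 - n_Q
    return (1.0 - s, 1.0 - h * s, 1.0)[n_favored]


def gamete(genotype, c, rng):
    """A parental haplotype, or with probability c a recombinant one."""
    a, b = genotype if rng.random() < 0.5 else genotype[::-1]
    return (a[0], b[1]) if rng.random() < c else a


def migrate(pop1, pop2, m, rng):
    """Swap a Binomial(N, m) number of random individuals between populations."""
    k = sum(rng.random() < m for _ in range(len(pop1)))
    for i, j in zip(rng.sample(range(len(pop1)), k), rng.sample(range(len(pop2)), k)):
        pop1[i], pop2[j] = pop2[j], pop1[i]


def reproduce(pop, favors_Q, s, h, c, rng):
    """Fitness-weighted parents (selfing allowed) produce N offspring."""
    weights = [fitness(g, favors_Q, s, h) for g in pop]
    parents = rng.choices(pop, weights=weights, k=2 * len(pop))
    return [(gamete(parents[2 * i], c, rng), gamete(parents[2 * i + 1], c, rng))
            for i in range(len(pop))]


def freq(pop, locus):
    """Frequency of allele 1 (Q or A) at locus 0 (QTL) or 1 (marker)."""
    return sum(g[0][locus] + g[1][locus] for g in pop) / (2 * len(pop))


def fst(p1, p2):
    """Nei-style F_ST for two equally sized populations."""
    p_bar = 0.5 * (p1 + p2)
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    h_s = p1 * (1.0 - p1) + p2 * (1.0 - p2)
    return 0.0 if h_t == 0.0 else (h_t - h_s) / h_t


def simulate(n=20, s=0.1, h=0.5, m=0.01, c=0.0244, generations=200,
             record_every=20, seed=1):
    """Return (generation, F_ST at QTL, F_ST at marker) every record_every gens."""
    rng = random.Random(seed)
    pop1 = [((1, 1), (1, 1)) for _ in range(n)]  # QA/QA homozygotes
    pop2 = [((0, 0), (0, 0)) for _ in range(n)]  # qa/qa homozygotes
    history = []
    for t in range(1, generations + 1):
        migrate(pop1, pop2, m, rng)
        pop1 = reproduce(pop1, True, s, h, c, rng)
        pop2 = reproduce(pop2, False, s, h, c, rng)
        if t % record_every == 0:
            history.append((t, fst(freq(pop1, 0), freq(pop2, 0)),
                            fst(freq(pop1, 1), freq(pop2, 1))))
    return history


print(simulate(n=20, generations=100)[-1])
```

Repeating `simulate` with different seeds gives the kind of replicate-level summary plotted in Figures 4 and 5, although a sketch with these assumptions need not reproduce the published curves.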

Plots of the 10th, 50th, and 90th percentiles (of replicate runs) of *F*_{ST} for both loci are given in Figure 4 for *N* = 20 and in Figure 5 for *N* = 500. With *N* = 20, *F*_{ST} at both the selected and marker loci decayed rapidly (90th percentile = 0 after ∼350 generations). With *N* = 500, in contrast, *F*_{ST} at the marker decayed rapidly (median *F*_{ST} close to 0 after ∼350 generations), while median *F*_{ST} at the selected locus remained close to 0.5 after 2000 generations. This behavior is readily understandable, as small population size weakens the effect of selection and increases the effective migration rate (Charlesworth *et al.* 1997). With *N* = 100, the behavior of *F*_{ST} at both loci was similar to that with *N* = 500, but with faster decay at the marker and more variance in *F*_{ST} among the replicate runs. The reduction in variance (across replicate runs) at larger population sizes is also readily understandable, as larger populations more closely approximate the infinite population size that would display deterministic behavior.
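The percentile bands summarize replicate runs at each saved generation. A small sketch of that summary, assuming each replicate's *F*_{ST} trajectory is a list saved at the same generations; percentiles are interpolated with `statistics.quantiles(..., method="inclusive")`, a convention that may not match the authors'. The function name is hypothetical.

```python
import statistics


def percentile_bands(replicates):
    """(10th, 50th, 90th) percentiles of F_ST at each saved generation.

    `replicates` holds one F_ST trajectory per replicate run, all saved at
    the same generations; zip(*replicates) groups values by generation.
    """
    bands = []
    for values in zip(*replicates):
        # n=10 returns the nine deciles: 10th, 20th, ..., 90th percentiles.
        deciles = statistics.quantiles(values, n=10, method="inclusive")
        bands.append((deciles[0], deciles[4], deciles[8]))
    return bands


print(percentile_bands([[0.9, 0.2], [0.8, 0.1], [1.0, 0.0]]))
```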

## DISCUSSION

The results presented here suggest that neutral markers flanking a QTL at intervals typical of QTL mapping studies will not accurately indicate allelic states at linked QTL except under strict conditions of very strong selection, very low gene flow, or very short distances between marker loci and QTL. Consequently, with current technology and marker spacing, neutral markers identified in QTL mapping experiments will not accurately measure the dynamics of QTL alleles.

The present system is a special case, amenable to analysis, of a more general system in which migration rates as well as selective regimes could be asymmetrical between environments (*i.e.*, in which the parameters *m*, *s*, and *h* would be replaced by pairs of parameters *m*_{1}, *m*_{2}, *s*_{1}, *s*_{2}, and *h*_{1}, *h*_{2}). It would also be possible to reframe the equations in terms of genotype rather than haplotype frequencies, which would allow the inclusion of effects like assortative mating in the model.

We are interested in using flanking markers to track selected loci in two scenarios using the present model. The secondary-contact scenario models several natural situations, including, of course, secondary contact of previously diverged types currently found in a hybrid zone, locally adapted populations, and ecologically based speciation following habitat-driven divergence. The QTL and marker loci are fixed for alternative alleles until interaction among populations begins. In this scenario, markers that flank selected QTL would assist the study of the dynamics and spread of ecologically important alleles that perhaps contribute to the populations' divergence. The novel allele scenario models situations such as the evolution of resistance to a xenobiotic by a pest or pathogen population. In this scenario, the favored allele at the selected locus or QTL is initially quite rare, arising via mutation or gene flow, and selection may be relatively strong. We envision this scenario most easily in managed ecosystems such as agricultural and public health contexts (Yan *et al.* 1998; Wootton *et al.* 2002; Hawthorne 2003) or highly disturbed ecosystems such as heavy-metal-laden soils (Antonovics *et al.* 1971). There are, however, likely natural situations to which it applies as well.

The strength of selection acting at the QTL had an important influence on both *P*_{err} and *F*_{ST}. Extremely strong selection increased the persistence of marker-QTL haplotypes, reducing *P*_{err} and increasing *F*_{ST} at the marker locus. Under these conditions, we expect the marker to serve as a reliable surrogate for the selected locus for ∼10^{3} generations. Unfortunately, more realistic values of *s*, even those at the higher end of those reported for QTL in nature, result in quite short-lived marker-QTL haplotypes. Strong selection retards gene flow at selected and linked loci by rapidly eliminating individuals with unfavored alleles. This has the important consequence of reducing opportunities for recombination between the selected locus and flanking marker loci. These effects of selection are reflected in the factor 1 − *s* of our eigenvalue λ_{5} − 1. Dominance (*h* in our model) interacts with selection by altering the probability of recombination between QTL and marker loci. Since recombination changes haplotypes only in double heterozygotes, reducing heterozygote fitness reduces the effective rate of recombination. As a result, marker allele frequencies take longer to approach their equilibrium values, and associations of specific QTL and marker alleles persist for a longer time when heterozygotes are less fit. This effect is seen in the factor *c*(1 − *hs*) of our eigenvalue λ_{5} − 1.
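The role of dominance can be made explicit in a single randomly mating population. Tracking the gametes produced by each genotype after selection gives the standard two-locus recursion below, in which the recombination term is *c*(1 − *hs*)*D*/*w̄*, with *D* = *x*_{QA}*x*_{qa} − *x*_{Qa}*x*_{qA}: only double heterozygotes (*QA*/*qa* and *Qa*/*qA*) change haplotypes when they recombine, and they carry the heterozygote fitness 1 − *hs*. This is a one-population illustration with assumed fitnesses 1, 1 − *hs*, 1 − *s* for *QQ*, *Qq*, *qq*, not the authors' two-population equations, and the function name is hypothetical.

```python
def select_and_recombine(x, s, h, c):
    """One round of selection and recombination in a population favoring Q.

    x = (QA, Qa, qA, qa) haplotype frequencies among uniting gametes.
    Assumed fitnesses: QQ = 1, Qq = 1 - hs, qq = 1 - s (marker neutral).
    """
    x1, x2, x3, x4 = x
    p = x1 + x2                              # frequency of Q
    w_het = 1.0 - h * s                      # fitness of every Qq genotype
    w_Q = p * 1.0 + (1.0 - p) * w_het        # marginal fitness of Q-bearing haplotypes
    w_q = p * w_het + (1.0 - p) * (1.0 - s)  # marginal fitness of q-bearing haplotypes
    wbar = p * w_Q + (1.0 - p) * w_q         # mean fitness
    d = x1 * x4 - x2 * x3                    # linkage disequilibrium
    r = c * w_het * d                        # net haplotype change from recombination
    return ((x1 * w_Q - r) / wbar, (x2 * w_Q + r) / wbar,
            (x3 * w_q + r) / wbar, (x4 * w_q - r) / wbar)


print(select_and_recombine((0.5, 0.0, 0.0, 0.5), s=0.1, h=0.5, c=0.025))
```

With *s* = 0 the recursion reduces to the neutral result *D*′ = (1 − *c*)*D*; with *h* closer to 1, the factor 1 − *hs* shrinks and *D* decays more slowly.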

Similarly, higher migration rates lead to increased formation of heterozygotes and a quicker approach to equilibrium marker allele frequencies. Under conditions of realistic yet strong selection (*s* = 0.1), the migration rate strongly affected the durability of marker-QTL associations. As with strong selection, decreased migration diminishes opportunities for recombination by reducing the frequency of QTL-marker heterozygotes; this is reflected in the factor *m* of λ_{5} − 1. Decreased migration between interacting populations is an effective force maintaining haplotypes in our simulations, with especially strong effects of reduced migration observed when *m* < 0.01. Ecological phenomena such as habitat-mediated mate choice (Rundle *et al.* 2000; Danley and Kocher 2001; Hawthorne and Via 2001), seasonal asynchrony (Wood and Keese 1990; Feder *et al.* 1994), or sexual selection (Panhuis *et al.* 2001; Shaw and Parsons 2002) may facilitate the maintenance of marker-QTL haplotypes through significant reductions in effective migration between habitat or population types.
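A back-of-the-envelope sketch makes the joint role of *m* and 1 − *s* concrete for the first generation after secondary contact, when population 1 is fixed for *QA* and population 2 for *qa*. Recombination can separate *Q* from *A* only in *QA*/*qa* double heterozygotes, and in that first generation these can form only from matings between residents and surviving immigrants. The function name is hypothetical, the life cycle (migration, then selection, then random mating) follows Appendix B, and the calculation ignores all later generations.

```python
def first_double_het_frequency(m: float, s: float) -> float:
    """Frequency of QA/qa zygotes in population 1 one generation after contact.

    Assumes population 1 starts fixed for QA and population 2 for qa, a
    fraction m of adults in population 1 are qa/qa immigrants, immigrants
    have fitness 1 - s relative to residents, and survivors mate at random.
    """
    qa_gametes = m * (1.0 - s) / (1.0 - m * s)  # immigrant share after selection
    return 2.0 * qa_gametes * (1.0 - qa_gametes)


for m in (0.001, 0.01, 0.05):
    print(m, [round(first_double_het_frequency(m, s), 5) for s in (0.1, 0.5)])
```

For small *m* the result is close to 2*m*(1 − *s*), so halving migration roughly halves the initial supply of double heterozygotes in which marker and QTL can be separated.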

We show here that for marker-QTL associations recovered from a typical QTL mapping exercise (2.5 cM), the markers are too far from the target QTL or selected locus to accurately infer the genotype at the target locus (given an acceptable error rate of 10%). Because the marker-QTL distance can be reduced by developing ever-denser linkage maps, the recombination rate (*c*) between the marker and QTL loci is ultimately under the control of the researcher. We show that when markers are much closer to the target locus (*e.g.*, 0.1–0.01 cM from the QTL or target locus), marker-QTL haplotypes may persist for ∼10^{3}–10^{4} generations. We are optimistic that future linkage maps will be sufficiently dense for accurate inference of selected loci. The use of linked markers to track phenotypically important loci is not necessarily rendered moot by greatly increased resolution of markers or even by complete genome sequencing. Even if the QTL-containing region is sequenced, the phenotypically relevant nucleotide substitution(s) are often not obvious (Templeton *et al.* 2005). In such cases, a closely linked neutral marker may be more useful for tracking the dynamics of selected alleles than a possibly nonneutral mutation within the selected locus.
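A crude benchmark helps explain why moving markers from 2.5 cM to 0.1–0.01 cM changes persistence by orders of magnitude. In a single, large, randomly mating population without selection, linkage disequilibrium decays as *D*_{t} = *D*_{0}(1 − *c*)^{t}, so its half-life is ln 2/[−ln(1 − *c*)] ≈ 0.69/*c* generations. This is only a reference point: in the two-population model with selection and limited migration, double heterozygotes are rarer and decay is slower. The function name is illustrative.

```python
import math


def ld_half_life(c: float) -> float:
    """Generations for D to halve if it decays as D_t = D_0 * (1 - c)**t."""
    return math.log(2.0) / -math.log1p(-c)   # log1p(-c) = ln(1 - c), accurate for small c


for c in (0.025, 1e-3, 1e-4):   # roughly 2.5, 0.1 and 0.01 cM
    print(f"c = {c:g}: half-life ~ {ld_half_life(c):.0f} generations")
```

Because the half-life scales roughly as 1/*c*, a 250-fold reduction in marker-QTL distance lengthens it by roughly the same factor, from tens of generations to thousands.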

The conclusions above are derived from analysis of our deterministic model, although they should broadly apply to stochastic models as well. Additionally, our simulations have been carried out for a period of only 2000 generations. On such a short timescale, new mutation at the marker and the QTL can be neglected. However, such mutation must be considered in analysis of processes on longer timescales.

Three empirical analyses of the width of hitchhiking regions around strongly selected loci corroborate our results. Wootton *et al.* (2002) found large regions (5 cM) of hitchhiking-caused linkage disequilibrium (LD) and reduced allelic diversity around chloroquine resistance-conferring loci in the malaria parasite *Plasmodium falciparum*. Similarly, Raymond *et al.* (1991) and Yan *et al.* (1998) have shown that mosquito populations worldwide share a common haplotype around insecticide resistance-causing loci. These two cases share the relatively short timescale, strong selection, and recessivity of the favored allele that would favor maintenance of marker-target locus haplotypes.

Two other studies, Ting *et al.* (2000) and Clark *et al.* (2004), also analyzed hitchhiking regions around genes that had been subject to selection, but found relatively narrow hitchhiking regions, corresponding to narrow blocks in which old marker-QTL haplotypes were likely intact. Ting *et al.* (2000) found narrow hitchhiking regions (1.8 kb ≅ 0.01 cM) around *Odysseus*, a potential “speciation gene” in *Drosophila*. Similarly, Clark *et al.* (2004) found a narrow hitchhiking region (≤100 kb) around *teosinte-branched*, a gene thought to have contributed to domestication of maize. Our analysis suggests that many factors, including a long timescale, weak selection, and relatively high fitness of heterozygotes, may have influenced the width of hitchhiking regions in these studies. For example, the timescale of divergence for the *Drosophila* study is on the order of millions of generations, clearly sufficient to weaken association between nearby loci in our model. For the *teosinte-branched* example, several features of the evolution of modern maize might contribute to the narrow region of marker-*teosinte-branched* associations, including high recombination rates in maize, especially near and within genes, and high potential gene flow between ancient maize and teosinte populations, both of which would cause the quick decay of marker-QTL associations (Clark *et al.* 2004).

Our results indicate that unless selection is very strong, migration is very low, and/or markers are extremely close to the target locus, a neutral marker will not accurately track genotypes at a flanking QTL or target locus. For this reason, markers linked to a QTL at a distance typical of a QTL mapping project may not be especially useful for inferring the QTL genotype in population genetic analyses. This result is especially robust because in several respects, including fixed differences between alternative alleles at the outset and equal, symmetric selection regimes in the alternative habitat types, we set up the simulations to maximize differences between populations, facilitating the persistence of marker-QTL haplotypes. The failure of those haplotypes to persist in simulations with all but the most extreme parameter values suggests that under natural conditions flanking markers at similar distances from the QTL will perform even worse as surrogates for the selected locus.

## APPENDIX A: HAPLOTYPE FREQUENCY EQUATIONS

The haplotype frequencies *x*_{i} and *y*_{i} (*i* = 1, 2, 3) in generation *t* + 1 are given in terms of the frequencies in generation *t* as follows (the argument *t* is omitted on the right-hand sides of the equations), where and

## APPENDIX B: THE QTL-ONLY SYSTEM

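As with the QTL-marker system, we assume two large diploid populations in different local environments with the fitness functions given in Table 1, migration at a rate *m* and recombination at a rate *c*. We let *x*(*t*) and *y*(*t*) denote the proportions *P*(*Q*) of *Q* alleles at generation *t* in populations 1 and 2, respectively. We let *x*_{11}(*t*) = *P*(*QQ*) = *x*^{2}(*t*), *x*_{12}(*t*) = *P*(*Qq*) = 2*x*(*t*)(1 − *x*(*t*)), and *x*_{22}(*t*) = *P*(*qq*) = (1 − *x*(*t*))^{2} denote genotype frequencies in population 1, with *y*_{11}, *y*_{12}, and *y*_{22} denoting the analogous frequencies in population 2.

The genotype frequencies after migration but before selection and mating are

$$\tilde{x}_{ij} = (1-m)\,x_{ij} + m\,y_{ij}, \qquad \tilde{y}_{ij} = (1-m)\,y_{ij} + m\,x_{ij},$$

for *ij* = 11, 12, and 22. The frequencies after selection are

$$\hat{x}_{11} = \tilde{x}_{11}, \quad \hat{x}_{12} = (1-hs)\,\tilde{x}_{12}, \quad \hat{x}_{22} = (1-s)\,\tilde{x}_{22}$$

and

$$\hat{y}_{11} = (1-s)\,\tilde{y}_{11}, \quad \hat{y}_{12} = (1-hs)\,\tilde{y}_{12}, \quad \hat{y}_{22} = \tilde{y}_{22}.$$

Finally, the new allele frequencies after migration, selection, and mating are

$$x(t+1) = \frac{\hat{x}_{11} + \tfrac{1}{2}\hat{x}_{12}}{W_1}, \qquad y(t+1) = \frac{\hat{y}_{11} + \tfrac{1}{2}\hat{y}_{12}}{W_2},$$

where $W_1 = \hat{x}_{11} + \hat{x}_{12} + \hat{x}_{22}$ and $W_2 = \hat{y}_{11} + \hat{y}_{12} + \hat{y}_{22}$ are normalization constants. These three steps can be summarized in a nonlinear function $f\colon (x(t), y(t)) \mapsto (x(t+1), y(t+1))$, giving the allele frequencies in generation *t* + 1 in terms of the allele frequencies in generation *t*.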
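A minimal Python sketch of *f*, written directly from the three steps (Hardy-Weinberg zygotes, symmetric migration, selection, then mating). It assumes the fitness assignment implied by the interior *O*(1) equilibria (*h* − 1)/(2*h* − 1) and *h*/(2*h* − 1): *QQ*, *Qq*, *qq* have relative fitnesses 1, 1 − *hs*, 1 − *s* in population 1 and 1 − *s*, 1 − *hs*, 1 in population 2. The function names are illustrative and this is not the authors' code. For small *m*, iterating from the secondary-contact state (1, 0) settles numerically on the stable equilibrium.

```python
def f(x, y, m, s, h):
    """QTL-only map (x(t), y(t)) -> (x(t+1), y(t+1)); x, y = P(Q) in pops 1, 2."""
    gx = (x * x, 2 * x * (1 - x), (1 - x) ** 2)        # QQ, Qq, qq (Hardy-Weinberg)
    gy = (y * y, 2 * y * (1 - y), (1 - y) ** 2)
    mx = [(1 - m) * a + m * b for a, b in zip(gx, gy)]  # after migration
    my = [(1 - m) * b + m * a for a, b in zip(gx, gy)]
    # Selection: Q favored in population 1, disfavored in population 2 (assumed).
    sx = (mx[0], (1 - h * s) * mx[1], (1 - s) * mx[2])
    sy = ((1 - s) * my[0], (1 - h * s) * my[1], my[2])
    # Random mating: Q frequency among survivors, normalized by W1 and W2.
    return (sx[0] + sx[1] / 2) / sum(sx), (sy[0] + sy[1] / 2) / sum(sy)


def equilibrium(m, s, h, x=1.0, y=0.0, tol=1e-14, max_iter=100_000):
    """Iterate f from (x, y) until successive steps differ by less than tol."""
    for _ in range(max_iter):
        nx, ny = f(x, y, m, s, h)
        if abs(nx - x) + abs(ny - y) < tol:
            return nx, ny
        x, y = nx, ny
    raise RuntimeError("iteration did not converge")


x_star, y_star = equilibrium(m=0.01, s=0.1, h=0.5)
print(x_star, y_star)
```

Because the model is symmetric under swapping the populations and the alleles, the equilibrium reached from (1, 0) satisfies *y** = 1 − *x** up to rounding.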

To find approximate expressions for the equilibria of *f*, we expand *x* and *y* in powers of *m*,

$$x = x_0 + m\,x_1 + m^2 x_2 + \cdots, \qquad y = y_0 + m\,y_1 + m^2 y_2 + \cdots,$$

and solve for the coefficients (*x*_{i}, *y*_{i}) (*i* = 0, 1, 2, …). At *O*(1) there are nine distinct solutions for (*x*_{0}, *y*_{0}), namely (0, 0), (1, 0), (0, 1), (1, 1), ((*h* − 1)/(2*h* − 1), 0), (0, *h*/(2*h* − 1)), (1, *h*/(2*h* − 1)), ((*h* − 1)/(2*h* − 1), 1), and ((*h* − 1)/(2*h* − 1), *h*/(2*h* − 1)). To determine the stability of the solution (*x*, *y*) corresponding to a given (*x*_{0}, *y*_{0}) pair, we first solve for (*x*_{1}, *y*_{1}), giving an approximate solution, $(x, y) \approx (x_0 + m\,x_1,\; y_0 + m\,y_1)$. Next we obtain *O*(*m*^{2}) expressions for the eigenvalues of the Jacobian matrix of partial derivatives

$$J = \begin{pmatrix} \partial f_1/\partial x & \partial f_1/\partial y \\ \partial f_2/\partial x & \partial f_2/\partial y \end{pmatrix}, \qquad f = (f_1, f_2),$$

evaluated at this approximate solution. An equilibrium is stable if both eigenvalues lie between −1 and 1. It turns out that if the relative fitness of heterozygotes (1 − *hs*) is greater than that of the disfavored homozygote (1 − *s*), then all equilibria except that with the *O*(1) term (1, 0) are unstable for *m* sufficiently small. Since we are interested in the case of low migration and no heterozygote advantage or disadvantage, we confine our attention to the sole linearly stable equilibrium, which we denote by $(x^*, y^*)$ and which is given to order *O*(*m*^{2}) by . The eigenvalues of *J* evaluated at $(x^*, y^*)$ are given to order *O*(*m*) by

Thus both λ_{1} and λ_{2} will lie strictly between −1 and 1 provided that migration is sufficiently weak (*i.e.*, that *m* is sufficiently close to 0), where the meaning of “sufficiently weak” depends on the values of *h* and *s*. For the specific values , , *m* = 0.05, by taking *O*(*m*^{2}) terms (not shown) into account we find , .
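The stability statement can also be checked numerically instead of by perturbation expansion: iterate *f* to the stable equilibrium from (1, 0), approximate *J* by central finite differences, and compute its two eigenvalues from the trace and determinant. This is a sketch under the same assumed fitness scheme (*Q* favored in population 1, disfavored in population 2), with hypothetical function names, not the authors' computation. As a quick consistency check, when *m* → 0 each population behaves independently and a rare disfavored allele declines by a factor of roughly 1 − *hs* per generation, so both eigenvalues should approach 1 − *hs*.

```python
import cmath


def f(x, y, m, s, h):
    """QTL-only map (Q favored in population 1, disfavored in population 2)."""
    gx = (x * x, 2 * x * (1 - x), (1 - x) ** 2)
    gy = (y * y, 2 * y * (1 - y), (1 - y) ** 2)
    mx = [(1 - m) * a + m * b for a, b in zip(gx, gy)]
    my = [(1 - m) * b + m * a for a, b in zip(gx, gy)]
    sx = (mx[0], (1 - h * s) * mx[1], (1 - s) * mx[2])
    sy = ((1 - s) * my[0], (1 - h * s) * my[1], my[2])
    return (sx[0] + sx[1] / 2) / sum(sx), (sy[0] + sy[1] / 2) / sum(sy)


def stable_eigenvalues(m, s, h, steps=20_000, eps=1e-7):
    """Equilibrium reached from (1, 0) and the eigenvalues of J there."""
    x, y = 1.0, 0.0
    for _ in range(steps):                      # iterate to the stable equilibrium
        x, y = f(x, y, m, s, h)
    # Central finite differences for J = [[df1/dx, df1/dy], [df2/dx, df2/dy]].
    fxp, fxm = f(x + eps, y, m, s, h), f(x - eps, y, m, s, h)
    fyp, fym = f(x, y + eps, m, s, h), f(x, y - eps, m, s, h)
    j11 = (fxp[0] - fxm[0]) / (2 * eps)
    j21 = (fxp[1] - fxm[1]) / (2 * eps)
    j12 = (fyp[0] - fym[0]) / (2 * eps)
    j22 = (fyp[1] - fym[1]) / (2 * eps)
    tr, det = j11 + j22, j11 * j22 - j12 * j21
    disc = cmath.sqrt(tr * tr - 4 * det)        # complex-safe square root
    return (x, y), ((tr + disc) / 2, (tr - disc) / 2)


(x_eq, y_eq), lams = stable_eigenvalues(m=0.05, s=0.5, h=0.5)
print(x_eq, y_eq, [abs(l) for l in lams])   # stable if every |lambda| < 1
```

The eigenvalues are returned as complex numbers so that the same code covers the oscillatory case; the stability criterion in the text corresponds to every modulus being below 1.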

## Acknowledgments

J.R.M. thanks the University of British Columbia Mathematics Department for hospitality and support provided during part of the writing of this article. The work of J.R.M. was partially supported by National Science Foundation grant DMS-0201173. This work is also supported by a grant to D.J.H. from the U.S. Department of Agriculture-National Research Initiative (2002-35302-12478).

## Footnotes

Communicating editor: D. M. Rand

- Received October 15, 2004.
- Accepted June 28, 2005.

- Copyright © 2005 by the Genetics Society of America