## Abstract

We study the trajectory of an allele that affects a polygenic trait selected toward a phenotypic optimum. Furthermore, conditioning on this trajectory we analyze the effect of the selected mutation on linked neutral variation. We examine the well-characterized two-locus two-allele model but we also provide results for diallelic models with up to eight loci. First, when the optimum phenotype is that of the double heterozygote in a two-locus model, and there is no dominance or epistasis of effects on the trait, the trajectories of selected mutations rarely reach fixation; instead, a polymorphic equilibrium at both loci is approached. Whether a polymorphic equilibrium is reached (rather than fixation at both loci) depends on the intensity of selection and the relative distances to the optimum of the homozygotes at each locus. Furthermore, if both loci have similar effects on the trait, fixation of an allele at a given locus is less likely when it starts at low frequency and the other locus is polymorphic (with alleles at intermediate frequencies). Weaker selection increases the probability of fixation of the studied allele, as the polymorphic equilibrium is less stable in this case. When we do not require the double heterozygote to be at the optimum we find that the polymorphic equilibrium is more difficult to reach, and fixation becomes more likely. Second, increasing the number of loci decreases the probability of fixation, because adaptation to the optimum is possible by various combinations of alleles. Summaries of the genealogy (height, total length, and imbalance) and of sequence polymorphism (number of polymorphisms, frequency spectrum, and haplotype structure) next to a selected locus depend on the frequency that the selected mutation approaches at equilibrium. We conclude that multilocus response to selection may in some cases prevent selective sweeps from being completed, as described in previous studies, but that conditions causing this to happen strongly depend on the genetic architecture of the trait, and that fixation of selected mutations is likely in many instances.

TO improve our understanding of the genetics of adaptation, recent approaches of molecular population genetics and genomics have attempted to detect signatures of positive selection in the genome (selective sweeps) (Stephan 2010). Typically, these studies reveal many genes or gene regions that may have been under positive selection. However, the relationship between the genes under selection and associated traits remains usually unknown. Here we follow the opposite direction by starting with phenotypes and working toward the genotypes. A phenotype may be determined by a multitude of genes as well as the environment. Multilocus population genetics has been developed in the last decades to describe the evolution of multilocus systems and phenotypes (Bürger 2000). Different types of selection, such as directional, stabilizing or disruptive selection, modify the genetic constitution of the population and favor either extreme or intermediate genotypic values of the trait. In this study we focus on stabilizing selection, which drives a trait toward a phenotypic optimum. We investigate the trajectory of an allele affecting the phenotypic trait (from low frequency up to an equilibrium value). We are particularly interested in exploring the parameter range of trajectories that fix and therefore might generate selective sweeps.

Historically, there has been a great interest in the maintenance of genetic variability under stabilizing selection, because stabilizing selection is assumed to operate on traits in various organisms, for example, the coat color in mice (Vignieri *et al.* 2010), human facial features (Perrett *et al.* 1994), plant defense mechanisms (Mauricio and Rausher 1997), enhancer elements in *Drosophila* (Ludwig *et al.* 2000), and vocalization in frogs and toads (Gerhardt 1994); see also Endler (1986, Chap. V) for examples and discussion. Furthermore, it has been suggested that this type of selection exhausts genetic variation (Fisher 1930; Robertson 1956).

By contrast, many quantitative traits exhibit high levels of genetic variability. This contradiction motivated researchers to study the role of mutation (Lande 1975; Turelli 1984; Gavrilets and Hastings 1994; Bürger 1998), overdominance (Bulmer 1973; Gillespie 1984), migration (Tufto 2000), frequency-dependent selection through intraspecific competition for some resource (Bürger 2002; Bürger and Gimelfarb 2004), genotype–environment interaction (Gillespie and Turelli 1989), pleiotropy (Hill and Keightley 1988; Barton 1990; Zhang and Hill 2002), and epistasis (Zhivotovsky and Gavrilets 1992). Additionally, a lot of work has been devoted to exploring the ability of stabilizing selection in maintaining genetic variability of quantitative traits that are controlled by multiple loci in the absence of mutation. Theoretical focus was mainly on two-locus models, but also models of more than two loci have been analyzed.

Surprisingly, predictions about genetic variability depend profoundly on the number of loci. The two-locus model predicts that genetic variability may remain in the population due to stabilizing selection *per se*. On the other hand, in models with more than two loci the amount of genetic variability maintained by stabilizing selection is smaller. The reason is that the optimum can be approached very closely by various homozygous genotypes (Bürger 2000, Chap. VI) when there are more than two loci that control the trait. For the two-locus model, and assuming a symmetric viability model, such that the double heterozygous genotype is optimal and the fitness values of the remaining eight genotypes are symmetric about the optimum (*e.g.*, Bodmer and Felsenstein 1967; Karlin and Feldman 1970), it has been shown that there are nine equilibria (Bürger 2000), seven of which can be stable but not simultaneously. Those seven equilibria split into four classes (Bürger and Gimelfarb 1999): they can be polymorphic for both loci or one of them or totally monomorphic. Because analytical solutions of the two-locus model are available, analysis of this model plays an important role in our study.

To our knowledge, the first effort that bridges quantitative trait evolution and selective sweeps was made by Chevin and Hospital (2008). Their work was based on a seminal article by Lande (1983). Lande’s model focuses on one locus of major effect on the trait and treats the remaining loci of minor effects as genetic background for this locus. It is assumed that heritable background variation is maintained at a constant amount by polygenic mutation and recombination (Lande 1975, 1983); also, the various loci that affect the trait are unlinked and there are no epistatic interactions. Chevin and Hospital (2008) used Lande’s model to infer the deterministic trajectory of a beneficial mutation that affects a quantitative trait in the presence of background genetic variability. They studied both directional and stabilizing selection and showed that fixation needs longer time than in the classical one-locus model (*i.e.*, when genetic variability in the background is absent). In the case of stabilizing selection their approach (based on Lande’s model) suggests that the occurrence of selective sweeps at quantitative trait loci (QTL) is expected to be very rare. In contrast to Chevin and Hospital (2008) the present study assumes an explicit number of loci that determine the trait, as this was done by Bodmer and Felsenstein (1967), Karlin and Feldman (1970), and Bürger (2000, Chap. VI). Therefore, the assumption of constant variability in the genetic background is relaxed since the genetic background is modeled explicitly.

We analyze the evolution of the deterministic multilocus model and also its stochastic analog assuming a finite constant effective population size. The focus is on the properties of the trajectory of a new mutation at a certain locus (called focal locus thereafter) that affects the trait under selection. We also examine the parameters (such as the recombination rate and the contribution of the alleles to the phenotype) that affect the fixation probability of the new mutation. Finally, conditioning on the trajectory we generate coalescent simulations and examine the properties of the genealogy and the associated polymorphism patterns. Results are presented for the classical two-locus two-allele model, but we also extend the analysis up to an eight-locus two-allele model.

## Methods

### The general model

We consider a diploid population of size *N* and a quantitative trait under selection. The quantitative trait is controlled by *l* diallelic loci with no epistatic interactions on the phenotype. There is no dominance of allelic effects on the trait (but there may be for their effects on fitness). The alleles at the locus *i* are labeled as *i*. The optimum for the trait is set to 0 without loss of generality. The recombination fraction between loci *i* and *i* + 1 is *r _{i}* < 0.5. At time

*D*= 0). The trait is assumed to be under a Gaussian fitness function;

*i.e.*, if the phenotypic value of an individual is

*P*, then its fitness is given by

^{2}determines how fast the fitness decreases away from the optimum (the smaller ω

^{2}, the faster the fitness is decreasing). Note that due to the fitness function the effects of the alleles

*on fitness*are not additive. Here, the phenotypic value

*P*is determined solely by the genotype. However, it is straightforward to include environmental noise assuming that the environmental component is normally distributed with mean 0 and variance

^{2}in the Gaussian fitness function by

The population evolves forward in time from *N* matings take place among *N* individuals. Each mating produces one diploid offspring and each individual can participate in multiple matings as a male or female. In each generation, at the zygote phase, the frequencies of the alleles of the locus of interest are recorded and the trajectories are stored. Note that in this model selection and drift act simultaneously in step ii, where a finite number of individuals is chosen as parents for the next generation. Also, random genetic drift acts in steps iii and iv: from a pair of gametes only one recombinant is chosen to pass to the next generation.

### Recursion equations of the multilocus model

As mentioned above, the two-locus two-allele model has been widely used and will serve here as a reference point. We are particularly interested in the work of Willensdorfer and Bürger (2003) who explore the equilibrium properties of the two-locus two-allele model for Gaussian selection under the assumption of a symmetric fitness function for which the double heterozygous genotype is optimal and the fitness values of the remaining eight genotypes are symmetric about the optimum (see also Table S1). The analysis of Willensdorfer and Bürger (2003) provides the existence and stability criteria for the equilibrium points of the model. The fitnesses of the nine possible genotypes are shown in Table S1B. Let *x*_{1}, *x*_{2}, *x*_{3}, and *x*_{4} represent the frequencies of the gametes *e.g.*, Karlin and Feldman 1970; Willensdorfer and Bürger 2003) gives the recursion relations for the frequencies in the next generation as*i* and *i.e.*, random genetic drift is neglected.

Willensdorfer and Bürger (2003) parametrize the model so that the effect of alleles _{i} denote the fitness for the γ_{i} phenotype, *i.e.*, ^{2} quantifies the strength of selection (1/ω^{2} corresponds to *s* in Willensdorfer and Bürger 2003). They show that α_{1} and α_{2} and the recombination fraction *r* determine the equilibrium properties of the model (see also Willensdorfer and Bürger 2003, Equations 3.1, 3.2a, 3.2b, 3.8). We further demonstrate that the initial frequencies of the alleles determine to a large extent whether a new mutation will be fixed. For the symmetric fitness models in this study we use the model of Willensdorfer and Bürger (2003) as it was described above. Note, however, that by assuming that the effects of the alleles of the same locus are opposite (as has been assumed here) we study only a subcategory of symmetric fitness models. In general, when the optimum value is 0, any model with

While Equation 1 describes the evolution of two-locus two-allele models, for more loci the following classical equation (see Gimelfarb 1998 and references therein) describes the evolution of gametic frequencies,*g _{i}*,

*g*,

_{j}*g*are gametes,

_{k}*p*, p′ describe gametic frequencies at two successive generations,

### Lowess procedure

To capture the effect of the parameters on the equilibria in the system, we used the Lowess or Loess (locally weighted scatterplot smoothing) function. Lowess is a method that fits lines to scatterplots (Cleveland 1979). Lowess fits low-degree (usually 1 or 2) polynomial curves to localized subsets of the data. Thus, a global function (and therefore a model) is not required, and great flexibility can be gained. Here, we fit Lowess functions to data that assume two values (binary data): class 0 or class 1.

### Coalescent simulations and SNP summary statistics

Assume a sample of *k* individuals from a present-day population (*t* = 0). Given the trajectory of the *t* = 0 to the most recent common ancestor (MRCA) of the neutral genomic region around the locus *et al.* 1989; Wakeley 2008; Teshima and Innan 2009; Ewing and Hermisson 2010). The population is subdivided into two genetic backgrounds: one class of lineages is linked to the *e.g.*, Teshima and Innan 2009; Ewing and Hermisson 2010). However, as is mentioned in the following sections, this is appropriate only when selection is relatively weak and the loci are weakly linked.

Four statistics of coalescent trees have been used. First, *h* (the height of the coalescent tree) measures the scaled time from the present to the MRCA of the sample; second, *l* is the total length of the coalescent; third, two quantities, *n* is the total number of nodes (excluding the root), *i.e.*, *n* = 2*k*−2, where *k* is the sample size. *e.g.*, Fay and Wu 2000) are based on this perturbation of coalescent trees around the beneficial mutation.

Furthermore, we used SNP summary statistics to describe the polymorphism patterns in a present-day sample, as we move along the sequence alignment away from the *D*, which summarizes the site frequency spectrum. These summary statistics facilitate the comparison between polymorphism patterns that are created by the multilocus model and the one-locus selective sweep. Similarly to the summaries of the genealogies, they can describe perturbations of the polymorphism patterns that are created by the action of recent selection. It is well known that the level of polymorphism and the number of haplotypes are reduced around the target of selection, the site frequency spectrum is shifted toward low- and high-frequency derived variants, which cause negative values of Tajima’s *D*, and the linkage disequilibrium increases on each side of the beneficial mutation (Kim and Stephan 2002; Kim and Nielsen 2004; Stephan *et al.* 2006).

## Results

### Trajectories of the L 1 1 allele at the focal locus *L*_{1}

#### Two-locus two-allele model with symmetric fitness function:

The first goal of this analysis is to illustrate the effect of the parameters of Willensdorfer and Bürger’s (2003) model on the fixation of the focal *N* = 10,000. Then, we relax the assumption of the symmetric fitness matrix and analyze a more general fitness scheme.

##### Deterministic model:

For the deterministic two-locus two-allele model with symmetric fitness matrix, we numerically solve the system of recursions described in Equation 1 and record the frequency of the ^{2}, *i*, *j* = 1, 2, are given as the product *D* is 0.

Fixation of the allele is possible and this fixation may occur fast. These trajectories are similar to the trajectories obtained from the classical selective sweep theory. There is, however, a subset of trajectories that remain polymorphic for the focal locus. Furthermore, there is a class of trajectories that shows nonmonotonic behavior. The frequency initially increases and then may decrease to some equilibrium value.

To construct a trajectory we draw uniformly a value from the six-dimensional space described in Table 1. Since we draw random values for the parameters from the six-dimensional space, trajectories may become extinct, reach a polymorphic equilibrium point, or fix, depending on the parameter values. After 10,000 generations we record the final frequency of the trajectory. The proportion of parameter values that lead to a polymorphic equilibrium with frequency in the intervals (0, 0.5) and (0.5, 1) are 0.113 and 0.029, respectively. Similarly, the proportion of equilibrium points 0, 0.5, and 1 are 0.401, 0.415, and 0.041, respectively. Thus, the vast majority of trajectories lead to extinction or the polymorphic equilibrium value 0.5. Although it is not clear whether the frequencies reached after 10,000 generations represent equilibrium values, the fact that >40% of the trajectories remain polymorphic at frequency 0.5 is not surprising since the double heterozygote genotype is optimal for the symmetric fitness model. Thus, for the given parameter values the majority of trajectories either approach 0 or remain polymorphic at frequency 0.5 (Figure S1A).

To identify the factors that determine the fixation of the *vs.* trajectories that stay at equilibrium frequency 0.5 (polymorphic class), and (ii) fixed trajectories *vs.* trajectories where the allele

Willensdorfer and Bürger (2003) proved that two conditions are required for the fixation or extinction of *Methods*). These two conditions are derived from the stability analysis of Equation 1 at the monomorphic equilibria. An equivalent way to write the first condition is

In addition to

This finding may be explained as follows. Let the initial frequency of the *i.e.*, black points in Figure 1). Then, as illustrated in Figure 1C, these points are located in the proximity of the line *x* = *y*; *i.e.*, the contributions of the

From this initial state, which is suboptimal, the population will move toward the optimal genotypes. Given that allele

Thus in this case there is a competition between alleles similar to the situation in other sweep models with multiple loci (*e.g.*, Kirby and Stephan 1996). Given that the initial frequency of the

Furthermore, comparing the lowest frequencies for the ^{−3} *vs.* 1.1 × 10^{−5}). This means that classical selective sweeps (as described by Maynard Smith and Haigh 1974) may be rare under the symmetric fitness model (where

The parameters

Background genetic variance at time

##### Stochastic model:

Next we study the behavior of the stochastic model when the fitness matrix is symmetric. The population size *N* = 10,000. The simulation parameters are similar to the deterministic two-locus two-allele model with symmetric fitness matrix. We use the average frequency of the last 500 generations, *vs.* trajectories that stay in a polymorphic equilibrium, and (ii) trajectories that result in fixation *vs.* trajectories that result in extinction. The definitions of the three trajectory classes (fixed, polymorphic, extinct) are given above. For the first comparison, the relation of six parameters with the class of the trajectory is depicted in Figure 3. We observe that the initial frequency

The role of *i.e.*, the parameter values for the trajectories that fix are located in the proximity of the two diagonals.

Comparing the fixation class with the extinction class the following results have been obtained under the stochastic model. The roles of

Finally, it should be mentioned that symmetry itself is not solely responsible for the frequency of the trajectory classes we observed in our simulations. Obviously, if the model is symmetric but the fitness of the heterozygote is sub-optimal (this may occur when the genotypic values are *d*, *d* and *d* is the distance of the heterozygote from the optimum), this model would still be symmetric but the fitness of the heterozygote would be less than the fitness of one homozygote. On the other hand, if we relax the symmetry assumption but require fitness advantage of the double heterozygote (the effects of the alleles are drawn uniformly as in previous simulations, but we require that the phenotypic value of the double heterozygote is the closest to the optimum), we do not observe the same patterns as in the symmetric model. In the symmetric model ∼40% of the trajectories reach their equilibrium at frequency 0.5. For the nonsymmetric model, but with heterozygote advantage, the respective proportion of trajectories is ∼10%. For the general model (see below), the respective proportion is ∼0.02 (Figure S2). Thus, our simulations indicate that both symmetry and heterozygote advantage are responsible for the observed final frequencies of trajectories.

#### Two-locus two-allele model with general fitness function:

We relax the assumption of symmetry of the fitness matrix in the deterministic two-locus two-allele model using a general fitness scheme. The parameter space is given in Table 1. Essentially, the difference between this model and the symmetric fitness model is that there is no restriction on the relations between the contributions of the alleles. Thus, the effects

The shape of the trajectories in this model is similar to the symmetric fitness matrix model. The number of trajectories where the

An informative quantity for disentangling trajectories in which

Initial background genetic variance does not appear to have the same effect as in the symmetric model. In the symmetric model, we observed that the proportion of trajectories that reach fixation decreases as the initial genetic variance increases and that for large values of initial background genetic variance the proportion of trajectories that reach fixation diminishes (Figure 2B). In the general model, we observe only a slight decrease of the proportion of fixed trajectories as the initial background genetic variance increases.

The comparison between the symmetric and the general fitness model may be open to discussion because of the different sampling spaces for the allelic effects. In particular, for the two-locus two-allele model, we uniformly sample random values from

#### Biallelic models of up to eight loci with general fitness function:

We study the effect of the number of loci on the trajectory of a focal mutation at locus *l* loci are present, we study the general model where the effects of loci are sampled in

We first analyzed the effect of the number of loci on the percentage of fixed trajectories in the deterministic model. For nonsymmetric fitness matrices, the maximum number of fixed trajectories occurs for two loci and then decreases as the number of loci increases (Figure 4). When the initial frequency of the ^{−3}, 1 × 10^{−4}, and 1 × 10^{−5} to approximate sweeps from new mutations more precisely. Under the general fitness model the effect of the initial frequency of the

The effect of the number of loci on the percentage of fixed trajectories in the stochastic model is as follows. Due to drift the probability of a polymorphic equilibrium is reduced, and the population evolves mostly toward the absorbing states. For small initial frequencies of the

To examine how much the frequency of fixed trajectories is reduced by the presence of other loci under selection (beyond the effect of drift), we compare the proportion of fixed trajectories in multilocus models to a single-locus model with genotypes *AA*, *Aa*, *aa*, and constant selection coefficient *N*= 10,000 is the effective population size, and *p* is the initial frequency of the *A* allele. The probability of fixation is calculated by averaging

Under the general model, for more than two loci the effect of the initial background genetic variance (

#### The ratio of the effect of the focal allele on the phenotype to the mean initial trait value:

Another quantity of interest for multilocus models is the ratio of the effect of the focal allele to the mean initial trait value (henceforth denoted as ϕ). The results shown here refer to the deterministic model. ϕ is interesting because the mean initial phenotypic value may increase with the number of loci (results not shown). For the symmetric fitness model we observe the following patterns for ϕ: for the two-locus two-allele model the variance of ϕ is smaller for trajectories that reach fixation than trajectories that reach an equilibrium at frequency 0.5 or vanish (Figure S3A). The variance of ϕ is smaller for trajectories that reach equilibrium at a frequency in (0.5, 1) than trajectories that reach equilibrium at a frequency in (0, 0.5). Furthermore, ϕ is strictly negative for trajectories that fix and takes small absolute values. Negative ϕ implies that the signs of the initial mean trait value and the effect of

For the general fitness model, under the two-locus two-allele case the variance of ϕ is greater than for the symmetric two-locus two-allele model, and ϕ may be either positive or negative (Figure S4A). Thus, even if the effect of the focal locus on the phenotype is large compared to the initial mean phenotypic value, fixation of the focal allele might still occur. Furthermore, fixation of the focal allele is possible even when its effect is initially deleterious (*i.e.*, ϕ is positive). Finally, trajectories that go to fixation or extinction have greater variance of ϕ than trajectories that stay polymorphic. For models with more than two loci, results are similar to the two-locus two-allele model (Figure S4B).

### Coalescent simulations conditioning on the trajectory of the L^{1}_{1} allele

In this section we describe our coalescent simulations that were performed to obtain (i) the genealogies and (ii) the neutral polymorphism patterns in the neighborhood of the focal locus. The results are approximate because of two reasons. First, conditioning on the frequency of one allele implies that the coalescent rates of all genotypes that carry this allele are equal. However, in the case of multilocus models this is not true. For example, the coalescent rate of the

Given a trajectory, coalescent simulations require specifying the time point from which the backward process is considered. That means, the genealogies will be strikingly different if the backward process initiates 100 or 5000 generations after the onset of the

Backward simulations have been performed using either a modified version of the software mbs (Teshima and Innan 2009) or the msms software (Ewing and Hermisson 2010). Our mbs algorithm implements the infinite site model, in contrast to the original software, and it calculates and outputs statistics related to the coalescent trees, such as the height, the total length, and the balance of the coalescent. msms is more efficient. Both algorithms produce equivalent results. For the coalescent simulations we have used parameters related to human data. Assuming that the mutation rate *e.g.*, Zhao *et al.* 2006), then *N* = 10,000 and remains constant. Simulations are performed for a 0.5-Mb genomic segment. The locus is located in the middle of the simulated segment. The sample size is 50. For a given equilibrium frequency bin (see below), we have chosen randomly one trajectory whose initial frequency is <0.001. This is done to resemble closely a selective event of a new variant. For a given trajectory, 1000 coalescent simulations are performed. Finally, the summary statistics for the coalescent trees are computed at the recombination breakpoints for each simulation, and the results are binned. The following example demonstrates the binning process: if the (arbitrary) positions (in base pairs) *i.e.*, the genealogy may change), and the bin size is set to 100, then

Furthermore, we have used molecular population genetics summary statistics to describe the properties of the polymorphisms in the proximity of the *Methods*. For each window the mean value of each summary statistic is calculated over the simulated data sets.

Tajima’s *D* (Tajima 1989) is negative over the whole region for frequencies >0.9. For fixed trajectories, Tajima’s *D* becomes less negative closely to the *D* obtains its most negative value exactly at the location of the locus *D* with the *e.g.*, Kim and Stephan 2002).

## Discussion

### Overview

In this study, we explore selective sweeps in multilocus two-allele models of a quantitative trait. Selection works on the phenotype based on a Gaussian fitness function. The Gaussian function seems an appropriate choice for many quantitative traits (Endler 1986; Willensdorfer and Bürger 2003), because it naturally formalizes the concept of the evolution toward an optimum value. Furthermore, it is sufficiently flexible to allow for modeling both stabilizing and directional selection. Stabilizing selection is modeled by assuming that the optimal genotypic value is located between the extreme genotypic values that an individual may obtain. Directional selection can be modeled by assuming that the optimum is more extreme than the genotypic values that the individual may have. Therefore, the allele frequencies shift toward the direction of fixation of the most extreme genotype favored by selection.

Previous studies (Bodmer and Felsenstein 1967; Karlin and Feldman 1970) suggest that multiple equilibrium points exist in two-locus two-allele models with a Gaussian fitness function. Furthermore, conditions are provided for their existence and stability. However, the trajectories of the alleles toward the equilibrium points have not been explored. This study focuses on the trajectory of an allele, which initially is in low frequency and moves toward its equilibrium points.

An important result of our analysis shows that selective sweeps that initiate from a very low frequency of the *i.e.*, the beneficial allele at the focal locus) are very rare in the two-locus two-allele model with symmetric fitness matrix. Multiple conditions need to be satisfied to achieve fixation. First, the contribution of one of the alleles at the second locus (*e.g.*, *i.e.*, when the model resembles the one-locus two-allele model. Since fixation of

Relaxing the assumption of symmetric

Assuming an effective population size *N* = 10000 we also explore the effects of random genetic drift. Genetic drift increases the proportion of the trajectories that reach monomorphic states (Figure S1C). This is expected because genetic drift pushes the model toward its absorbing states. Therefore, selection needs to be sufficiently strong to maintain the polymorphic state of the trajectory. This is illustrated clearly in Figure 3, where the

When more than two loci are modeled, the proportion of trajectories that reach fixation decreases as the number of loci increases (Figures 4, 5, and 6). The proportion of trajectories that become extinct increases, whereas that of trajectories that remain polymorphic decreases. This is in agreement with the results of Bürger (2000) who shows that when the trait is determined by more than four loci the monomorphic equilibrium points become more likely.

### Model limitations

We presented a study of selective sweeps in multilocus models based on numerical calculations and computer simulations. We studied the effects of recombination rates between loci, locus contributions to the phenotypic value, selection intensity

Furthermore, by using Lowess smoothing functions we study the effects of parameters independently of each other. Clearly, several parameters interact (*e.g.*, initial allelic frequencies and

Additionally, we demonstrated that a significant percentage of trajectories may reach fixation under our simulation parameter values. This result might be invalid for multilocus models in general, since we omit mutations during simulations and we study only a single trait.

A further limitation of our analysis might be that different simulated models assume different numbers of free parameters. For example, as mentioned above, we sample allelic effects in

### Coalescent trees for trajectories that approach fixation are similar to coalescent trees of classical selective sweeps

Conditioning on the trajectory of the

### Coalescent trees for trajectories that stay polymorphic at intermediate or low frequencies resemble neutrality

When the trajectories do not reach fixation, then a part or all signatures of a selective sweep become invisible, depending on the equilibrium frequency of the trajectory. For example, when the equilibrium frequency is between 0.9 and 1, then the height of the coalescent tree equals the neutral expectations, because ancestral alleles (*e.g.*, 0.3–0.4) both the coalescent and polymorphism summaries resemble the neutral expectations.

Depending on the parameter values, a large fraction of trajectories is maintained at some equilibrium value and does not reach fixation. For these trajectories analysis of incomplete sweeps (Sabeti *et al.* 2002; Voight *et al.* 2006; Tang *et al.* 2007) may be useful. There is, however, an essential difference between incomplete sweeps and sweeps in multilocus models that were studied in this article. Incomplete sweeps are on the way to fixation, whereas the sweeps studied here remain at equilibrium frequency. Therefore, the signatures of selection will be visible only in the cases in which the equilibrium frequency has been reached recently. If the trajectory remained at the equilibrium level (either polymorphic or monomorphic for the focal allele) for too long, then the signatures of selection will fade away due to recombination.

Our results indicate that detection of selection from polymorphism patterns in multilocus models may be hard. When the focal allele fixes in the population, then the statistical tools that are used to detect sweeps in one-locus two-allele models may be useful (*e.g.*, Kim and Stephan 2002; Nielsen *et al.* 2005; Pavlidis *et al.* 2010). This is also true for equilibrium points close to fixation. Even if the patterns appear to be different than those of fixed trajectories, the direction of perturbations is similar to the classical sweep models, and therefore the same statistical tools may be used. However, for smaller equilibrium frequencies some or all signatures of selection studied in this article disappear.

A hallmark of multilocus two-allele models are the nonmonotonic trajectories. This class of trajectories is absent from one-locus two-allele models. These trajectories quickly approach a certain frequency, but eventually they decline either to extinction or to some other equilibrium frequency. The difference between their maximum frequency and the equilibrium frequency may be quite large. In the simulated data sets, we observed differences even larger than 0.5. However, the polymorphism and coalescent patterns seem to be very similar to the neutral expectations. Thus, these trajectories may be completely invisible using the summary statistics studied in this article. Summarizing the results, it may be claimed that the statistical tools that have been developed to detect selective sweeps may detect only a small proportion of the multilocus selection cases, namely only those cases that result in fixed trajectories or equilibrium trajectories close to fixation. Tools that are used for detecting incomplete sweeps may be useful when the trajectory has reached its equilibrium frequency very recently. For trajectories that have reached their equilibrium frequency further in the past, we expect that recombination will destroy the signatures of selection. In fact, the results imply that positive or stabilizing selection may occur at a much higher rate than previous studies that analyze selective sweeps report (*e.g.*, Li and Stephan 2006). However, the majority of the cases remains undetectable since both the coalescent trees (as summarized here) and the polymorphism summary statistics do not deviate from neutrality.

### Comparison of the present study to Chevin and Hospital (2008)

To our knowledge the only study of selective sweeps at QTL was done by Chevin and Hospital (2008). They assume an infinite number of unlinked and independent loci that control a trait. Moreover they assume that the variability in the genetic background remains constant during the selective phase and that the effect of the focal locus on the trait value is small compared to the effect of the genetic background. These assumptions enable them to solve the trajectory of a new allele analytically for linear, exponential, and Gaussian fitness functions. Chevin and Hospital (2008) focus mainly on the trajectories that reach fixation, but they also study trajectories that initially increase and vanish eventually. Under their model polymorphic equilibria are also possible for certain initial conditions (Luis-Miguel Chevin, personal communication). The results of Chevin and Hospital (2008) indicate that trajectories of new alleles evolve slightly slower than classical selective sweeps, and given that the allele will be fixed, selective sweeps look slightly older than the classical one-locus selective sweep.

In our study, we model the genetic basis of a quantitative trait explicitly, taking into account that a finite number of loci makes the model mathematically intractable. Therefore, computer simulations were employed to study the trajectory of a new beneficial allele. The contribution of alleles may be arbitrary as well as the recombination fraction between the loci. We provide information about the role of various parameters on the fixation of the trajectories, but we also study extensively the trajectories that remain polymorphic. The present study may thus be considered complementary to the study of Chevin and Hospital (2008) for finite multilocus models, providing information about the trajectories of new alleles and the polymorphism patterns generated by selective sweeps in multilocus models. Furthermore, our study complements the deterministic analyses of Willensdorfer and Bürger (2003) and Gimelfarb (1998) by including random genetic drift. We expect that the results of this study as well as those of the previous theoretical investigations will be essential for the development of software for the detection of selective sweeps in multilocus models.

## Acknowledgments

We are very grateful to two anonymous reviewers and Luis-Miguel Chevin for their valuable suggestions. This work has been supported by grants from the Deutsche Forschungsgemeinschaft Research Unit 1078 to W.S. (Ste 325/12) and D.M. (Me 3134/3) and a grant from the Volkswagen-Foundation to P.P. (I/824234).

## Footnotes

*Communicating editor: L. M. Wahl*

- Received February 14, 2012.
- Accepted June 10, 2012.

- Copyright © 2012 by the Genetics Society of America