## Abstract

Evolutionary biologists attribute much of the phenotypic diversity observed in nature to the action of natural selection. However, for many phenotypic traits, especially quantitative phenotypic traits, it has been challenging to test for the historical action of selection. An important challenge for biologists studying quantitative traits, therefore, is to distinguish between traits that have evolved under the influence of strong selection and those that have evolved neutrally. Most existing tests for selection employ molecular data, but selection also leaves a mark on the genetic architecture underlying a trait. In particular, the distribution of quantitative trait locus (QTL) effect sizes and the distribution of mutational effects together provide information regarding the history of selection. Despite the increasing availability of QTL and mutation accumulation data, such data have not yet been effectively exploited for this purpose. We present a model of the evolution of QTL and employ it to formulate a test for historical selection. To provide a baseline for neutral evolution of the trait, we estimate the distribution of mutational effects from mutation accumulation experiments. We then apply a maximum-likelihood-based method of inference to estimate the range of selection strengths under which such a distribution of mutations could generate the observed QTL. Our test thus represents the first integration of population genetic theory and QTL data to measure the historical influence of selection.

IDENTIFYING which quantitative traits have been subject to strong selection and which have evolved under neutral or nearly neutral conditions is a challenging and important task for evolutionary biologists (Boake *et al.* 2002). To this end, biologists are devoting increased attention to the genetic basis and evolutionary causes of quantitative variation (Barton and Keightley 2002; Lai *et al.* 2007; Barton and De Vladar 2009). On one hand, substantial progress has been made in revealing the genetic architecture of quantitative traits (Zimmerman *et al.* 2000; Ashton *et al.* 2001; Mackay 2001; Gleason *et al.* 2002; Verhoeven *et al.* 2004; Mackay and Lyman 2005; Brem and Kruglyak 2005; Lai *et al.* 2007; Gleason *et al.* 2009). On the other hand, it has been difficult to connect the action of microevolutionary forces detected in studies of contemporary populations to their macroevolutionary effects (Grant and Grant 2002). Consequently, few attempts to detect natural selection currently exploit the growing body of knowledge about the sizes and directions of quantitative trait locus (QTL) effects (for counterexamples, see Rieseberg *et al.* 2002; Albertson *et al.* 2003; Lexer *et al.* 2005).

Progress toward understanding the basis of quantitative genetic variation is likely to come from studying allelic variation at specific QTL (Barton and Keightley 2002). Identifying the genetic architecture of quantitative traits begins with mapping QTL to broad genomic regions and ends with the molecular definition of quantitative trait loci alleles. This high degree of resolution has been achieved, for instance, for some QTL in Drosophila (Zimmerman *et al.* 2000; Mackay 2001; Palsson and Gibson 2004; Palsson *et al.* 2005). QTL mapping may be coupled with further genetic dissection (*e.g.*, fine-scale mapping, disequilibrium mapping, transgenic manipulation; Mackay 2001) to characterize the specific loci targeted by selection. Weinig *et al.* (2003) demonstrated that the QTL architecture behind herbivory tolerance was one of many loci of small effect and simultaneously elucidated locus-specific evolutionary constraints, demonstrating that linking molecular genetic tools to quantitative genetic analysis and field studies in ecologically relevant settings can clarify the role of specific loci in the evolution of quantitative traits.

These QTL studies not only generate maps from genotype to phenotype, but also yield information that may be applied to the study of phenotypic evolution (Erickson *et al.* 2004). Behind every genetically based phenotypic difference lies an evolutionary history. Despite the evident relevance of QTL data to the study of the evolutionary process, very little theory has been developed to link data from QTL studies to population genetic parameters via the methods of molecular evolution. This lack of theoretical tools has inhibited progress in explaining the process underlying the evolution of phenotypes that are genetically dissected in empirical QTL studies.

While most tests for selection depend directly on molecular data (McDonald and Kreitman 1991; Depaulis *et al.* 2003; Schlenke and Begun 2003; Mousset *et al.* 2004; Schlenke and Begun 2004; Nurminsky *et al.* 2005; Schlenke and Begun 2005), Orr (1998a) presented two innovative attempts to integrate QTL data and population genetic theory into tests for selection: the QTL sign test and the QTL effect size test. These tests employ a null model of the evolutionary process that produces QTL. They have been applied multiple times to empirical data (Rieseberg *et al.* 2002; Albertson *et al.* 2003; Lexer *et al.* 2005; Orr 2010). However, the QTL sign test has been deemed substantially flawed by its high false-positive rate (Anderson and Slatkin 2003) and the QTL effect size test has been shown to provide very low power to detect selection (Anderson and Slatkin 2003). Moreover, the tests neither model selection explicitly nor account for the distribution of mutational effects, both of which are important to accurately model the “filtering” action of natural selection (Orr 1998b).

Accordingly, understanding the evolutionary significance of QTL observations requires evaluation of both the origin and fate of heritable variation. In particular, the evolutionary forces that result in the genetic differences we observe must be estimated (Barton and Keightley 2002). Here we develop a test for a history of directional selection on quantitative traits and a method for estimating the strength of that selection. Using mutation accumulation (MA) data and the QTL effect distribution derived from a cross between two divergent populations, we infer the selective history of the trait. While existing tests (Orr 1998) only model neutral evolution and assume selection to be at work whenever data does not fit the model’s predictions under neutrality, we model selection explicitly and consider neutrality as one potential inference on a continuum. Such a flexible model is a key first step toward unraveling the complex link between molecular evolution and quantitative trait variation.

## Theory

### Overview

Our framework for inference integrates information from mutation accumulation experiment (Figure 1, A and B), population genetic theory on selection (Figure 1, G–J), the consequent expected outcome of selection (Figure 1, K–N), and the empirically obtained QTL effect size distribution (Figure 1, O–R). In our schematic example, mutation accumulation is depicted as resulting in increased variance of the phenotype with no directional bias (Figure 1A) or, alternatively, as resulting in an increased variance in phenotype with a downward bias (Figure 1B). From phenotypic observations obtained along the time course of a mutation accumulation experiment (Figure 1, A and B), the distribution of mutational effects on phenotype can be inferred (Figure 1, C–F). Data as in Figure 1A yield an inferred distribution of mutational effects that is symmetrical around no change (Figure 1, C and D); in contrast, data as in Figure 1B yield asymmetry (Figure 1, E and F).

We multiply the mutation effect distribution by the calculated probability of fixation as a function of the strength of selection corresponding to the phenotypic effect (Figure 1, G–J). For a neutral phenotype, fixation probabilities are equivalent across all phenotypes (Figure 1, G and I), whereas under selection, fixation probabilities depend upon the phenotype conferred (Figure 1, H and J). The product of the mutation effect distribution and the fixation probability distribution is the probability distribution of fixed phenotypic differences resulting from substitutions at quantitative trait loci (Figure 1, K–N). Sampling a number of substitutions from this distribution and summing their effects (*e.g.* Figure 1, O–R, outlines), we compare the cumulative effects with those of the observed QTL (Figure 1, O–R, bars). The accord between the sampled effects and the observed effects varies with the estimated mutation effect distribution and selective regime considered: the accord will be better when the level of selection is appropriate for generating fixed effects whose cumulative effects are close to those of the observed QTL. A likelihood function (Figure 1, S and T) quantifies the fit of the model and provides a test of the hypothesis of neutrality and an estimate of the strength of selection. In our schematic example, the MA and QTL data together could support an effectively neutral model (Figure 1, A, C, D, G, H, K, L, O, P, S). Alternatively, parameterizing with the same QTL data (Figure 1, O–R, bars), but with different MA data (Figure 1, A and B) could yield rejection of neutrality and thus, a conclusion that the quantitative trait has been subject to a history of selection.

To develop this inferential framework, we first present a method for estimating the distribution of mutational effects with data from mutation accumulation experiments, which are a well-established approach to inferring the effects of mutations (Keightley, 1994; Garciadorado 1997; Shaw *et al.* 2002; Halligan and Keightley 2009). In these experiments, one or more lines of the organism under study are kept under unselective conditions, subjected to serial population size bottlenecks (Lopez and Lopez-Fanjul 1993). This enforced genetic drift facilitates the accumulation of mutations with minimal interference from selection. A quantitative trait may be assayed at the end of the experiment or at regular intervals to track its change over time in each population. Using the change in the trait means and the number of generations between observations, we estimate a Gaussian distribution of mutational effects (Turelli 1984; Sawyer *et al.* 2003; Jones *et al.* 2007).

We then derive the phenotypic effect fixation function from components of classical population genetic theory (Kimura 1962). The product of this function with the mutational effect distribution provides a distribution of QTL effect sizes. Because these components are all probabilistic, comparison of their product to estimated QTL effect sizes and the standard errors thereof constitutes a likelihood-based approach to inferring the historical regime of genetic drift or natural selection.

We apply this test to an example trait, sensory bristle number in the fruit fly *Drosophila melanogaster*. The genetic architecture of *Drosophila* sensory bristle number is well studied (Lai *et al.* 1994; Long *et al.* 1995, 1996; Dilda and Mackay, 2002; Robin *et al.* 2002; MacDonald *et al.* 2005; Mackay and Lyman 2005) and the selective history on this trait has been the subject of much debate (Mather 1941; Robertson 1955; Robertson 1967; Kearsey and Barnes 1970; Mackay *et al.* 1994,).

### Estimating the mutational effect distribution

To estimate the distribution of the phenotypic effects of mutations on the trait, we analyze the time course of a mutation accumulation experiment. Although the appropriate formal distribution of the mutational effects for any given trait is unknown (Keightley and Lynch 2003), moments of the distribution of mutation effects on a trait can be inferred from the changes in the trait over time. For analytical tractability, we follow Turelli (1984), Sawyer *et al.* (2003), and Jones *et al.* (2007) and assume that the effects of mutations on a trait follow a Gaussian distribution

The sum of *k* independent normally distributed random variables with the same mean and variance is normally distributed with mean *k* mutations is

The distribution of the total effect of the mutations that accumulate in *g* generations is the sum of the distributions of the effect of *k* mutations, as *k* varies from zero to infinity, weighted by the probability that *k* mutations occur. If mutations are independent, the number of mutations affecting the trait that occur in *g* generations follows a Poisson distribution,*u* is the per generation rate of mutations affecting the trait. The sum over *k* of Equation 2 weighted by Equation 3 is*gu* may be achieved by sums of only a small number of terms normalized by the total of all terms considered. We included terms so that, for a given *gu*, the total Poisson weight of the terms was at least 0.999.

### Inference of historical selection from QTL and MA data

Inference of historical selection from QTL and MA data requires specification of the relationship between the strength of selection and the distribution of effect sizes of mutations that fix in the population. To calculate the probability that a mutation with a particular effect would fix, we apply the solutions of selection–diffusion equations. Assuming that these equations hold, the probability of a new mutation fixing in the population is given by Kimura (1962),*s* is the selection coefficient and *N* is the population size.

Following Lande (1983) and Chevin and Hospital (2008), we assume that the selective value of the mutation is proportional to its quantitative effect on the trait. To avoid making assumptions about the optimal value of the trait, we assume that the fitness gradient did not change as the value of the trait changed. This assumption stands in contrast to some existing models of adaptive walks (Orr 1998; Martin and Lenormand 2006) that use a Gaussian fitness landscape. It is also a conservative assumption for our purposes: a walk toward a nearby optimum on a peaked fitness landscape will fix mutations of smaller effect than in our model, diminishing the signature of selection that our method is sensitive to. Additionally, we assume that adaptation proceeds by successive fixation of beneficial mutations (Atwood *et al.* 1951), as opposed to simultaneous selection at many loci. If many loci are selected simultaneously, the strength of selection on a particular locus will tend to decrease over time (Chevin and Hospital 2008), rather than remain constant. Hence, the assumption of successive fixation is also conservative. Furthermore, we assume that the mutation has no other effects on fitness beyond its effect on the trait in question. Thus, for a mutation with phenotypic effect *x*,*c* is a constant whose sign and magnitude determine the direction and strength of selection, respectively. A positive value of *c* indicates selection for an increase in the trait and a negative value of *c* indicates selection for a decrease in the trait. When *c* = 0, the trait is neutral (*i.e.*, the selection coefficient for a mutation of any effect size is zero). Substituting Equation 6 into Equation 5 yields*x* will fix in the population.

The probability distribution of the phenotypic effects of substitutions is determined by the distribution of mutational effects multiplied by the relative fixation probability of those effects. Accordingly, multiplying Equation 1 by Equation 7 yields the distribution of effect sizes of substitutions, given that they have fixed*f*(*x*) a probability distribution, Equation 8 contains a normalization constant, *C*, chosen to ensure that the integral of *f(x)* over all *x* is equal to one.

## Methods

### Overview of the data required

To infer a mutation effects distribution, we use the time course of a mutation accumulation experiment, specifically the change in trait value and number of generations between each measurement in each line. We also use an independent estimate of the per-generation mutation rate of the trait. To calculate probabilities of fixation of mutants drawn from the effect distribution, we also use the historical effective population size of the population in question. To construct a likelihood function for the parameters underlying the observed change in the quantitative trait, we use the effect size and standard error of QTL identified by crossing individuals from two divergent populations that differ in the trait value.

### Mutation effects distribution

Moments of the distribution of mutation effects on a trait were inferred from the changes in the trait value over the course of a mutation accumulation experiment. Given the change in the trait means, the number of generations between observations, and an independent estimate of the per-generation mutation rate for the trait, a likelihood function was specified for the parameters μ and σ of the distribution of mutation effects as*L*(…) is the log likelihood, *h*(…) is from Equation 4, and *x _{i}* and

*g*are the change in the mean value of the trait and number of generations between each observation. The values of μ and σ for which this function is maximized are the maximum-likelihood estimates of the parameters of the Gaussian distribution of mutational effect sizes.

_{i}### Estimating selection

We also applied a maximum-likelihood framework to estimate the strength of historical selection on the trait. The phenotypic effect of each QTL is the result of one or more underlying substitutions, so we maximize the joint likelihood of *c* and the vector **H** = (*H*_{1}, *H*_{2}, …, *H _{n}*), where

*H*is the number of substitutions at the

_{i}*i*th locus and can be any positive integer, and

*n*is the number of loci. There is no explicit constraint on the number of mutations at each locus, nor on the directions of the underlying mutational effects, only the constraint of the specified total effect. Thus, our method can properly accommodate amalgamations of several QTL that are detected as single QTL by a QTL analysis (

*cf*. Mackay 2001; Mackay and Lyman 2005; Studer and Doebley 2011), even if the cryptic QTL are in opposing directions, just as it accommodates multiple mutations of opposing direction. The likelihood of

*c*and

**H**given the QTL data are the product of the joint likelihood of

*c*and

**H**given each locus individually:

**q**and

**e**are the effects and standard errors of the observed QTL. The joint likelihood of

*c*and

**H**given a single locus is

*g*(…) is the probability density function of a Gaussian distribution with mean

*q*and standard deviation

_{i}*e*, and

_{i}*F*(…) is the probability density function of the sum of

*H*substitutions under strength of selection

_{i}*c*. The latter is calculated by

*H*convolutions of

_{i}*f*(

*x*) from Equation 8 with itself.

The integral in Equation 11 has no closed-form solution and is difficult to evaluate by typical numerical approaches for most values of *H _{i}*. We effectively approximated the likelihood by Monte Carlo sampling from the distribution given by

*F*(

*x*) and calculating average value of

*g*(

*x*) evaluated for each sampled effect:

*x*. As the number of Monte Carlo samples increases, the right-hand side of Equation 12 approaches the true likelihood.

_{j}To calculate the likelihood of a given value of *c*, we fixed *c* at a value and evaluated the right-hand side of Equation 12 for a range of vectors of **H**. For each locus *i*, we started with *H _{i}* = 1 and then increased

*H*until we found its unimodal maximum likelihood value. We took the likelihood of

_{i}*c*to be the product over

*i*of these likelihoods (Equation 10). We calculated the likelihood of a dense grid of values of

*c*in this way to obtain

*ĉ*, the maximum-likelihood value. To test the hypothesis of neutral evolution, we performed a likelihood-ratio test with a null hypothesis of

*c*= 0 and an alternate hypothesis of

*c*=

*ĉ*.

### Sources of data

We applied our method to QTL and MA data for sensory bristle number in *D. melanogaster*. QTL effect size data were extracted from Dilda (2002), who had applied four different artificial selection regimes on bristle traits. Populations HST and LST were selected for high and low numbers of sternopleural bristles, respectively, while populations HAB and LAB were selected for high and low numbers of abdominal bristles. The QTL effects and standard errors from each population are presented in Figures 3A, 4A, 5A, and 6A. These QTL were assayed by crossing the artificially selected populations with a wild-type laboratory strain. Because the QTL for this trait were assayed using artificially selected strains rather than phenotypically divergent strains from wild populations, our example serves to illustrate the use of the test rather than to make an inference about the selective history of the trait in the wild.

Mutation accumulation data for sternopleural bristle number and abdominal bristle number were extracted from Paxman (1957) and Lopez and Lopez-Fanjul (1993), respectively. Paxman (1957) propagated six lines of flies to accumulate mutations for 40 generations and measured the average number of sternopleural bristles in each line at 5 time points. The distribution of trait changes in the mean sternopleural bristle number of each line between each measurement is depicted in Figure 2A. Lopez and Lopez-Fanjul (1993) maintained 93 MA lines for 61 generations and measured the change in the number of abdominal bristles in each line at the end of the experiment. The distribution of trait changes is depicted in Figure 2B.

The per-generation rate of mutation affecting sensory bristle number in *Drosophila* is estimated to be 0.03 (Mackay and Lyman 2005).

### Estimating the distribution of mutational effects

To estimate the mean and standard deviation of the distribution of mutation effects μ and σ, we calculated their maximum-likelihood values for each type of bristle. For each bristle trait, we evaluated Equation 9, letting *u* = 0.03 (Mackay and Lyman 2005). We let *x _{i}* and

*g*be the trait changes and numbers of generations between measurements for all measurements in all lines. Implementing a simple hill-climbing algorithm in MATLAB, we calculated the joint maximum-likelihood estimates for µ and σ (see supporting information, File S1). We estimated the standard deviations of our estimates by finding the values of µ and σ whose log likelihood was equal to the maximum log likelihood minus one (Hudson 1971).

_{i}### Estimating selection

For each of the four QTL data sets, we calculated the likelihood of a range of values of *c* using an effective populations size *N* = 50 (Gurganus *et al.* 1999) and the maximum-likelihood estimates of µ and σ for the appropriate bristle trait. For each value of *c* and *H _{i}*, we generated 6.5 × 10

^{4}Monte Carlo samples from the distribution of substitution effects in MATLAB and calculated the likelihood according to Equation 12 (see File S2). To calculate the significance of the likelihood-ratio test, we computed the test statistic

*D*is approximately χ

^{2}-distributed with 1 d.f. (Hudson 1971). It follows that the 95% confidence interval includes values of

*c*with likelihoods within 2 log units of the maximum.

### False-positive rate

To estimate the false-positive rate of the test, we simulated QTL data under a neutral model of evolution. For each of the four data sets, we found the maximum-likelihood value of **H** for *c* = 0. For the *i*th observed locus, we generated a neutral locus by summing *H _{i}* effects drawn from the distribution of mutation effects for the appropriate bristle trait. Although it entailed extensive computation, we generated 100 sets of simulated neutral QTL for each data set. We applied the likelihood-ratio test for selection to these simulated data and recorded the proportion of neutral data sets for which the neutral hypothesis was rejected.

### Undetected QTL

A well-documented bias in QTL studies is the failure to detect small-effect loci (Beavis 1998; Bost *et al.* 2001; Xu 2003). To assess whether undetected QTL might have influenced our estimate, we applied the analytical approach of Otto and Jones (2000) to estimate the number and average effect of undetected loci in each data set. We then reran our analysis including in each data set the estimated number of additional QTL, with their effect sizes following an exponential distribution with the estimated mean.

### Sensitivity to the mutation effect distribution

To assess the sensitivity of our test to the estimates of the mutation-effect distribution parameters, we perturbed μ by 1 SD in each direction and applied the test. We then similarly perturbed σ and compared the resulting estimates and confidence intervals to our primary results.

### Power

To estimate the power of the test with a diverse set of putatively realistic QTL distributions, we applied the test to all subsets of the QTL from each of the four populations, using the appropriate mutation-effect distribution parameters in each case. We examined how the proportion of times the test rejected neutrality varied with the number of loci included in the subset.

## Results

### The distribution of mutation effects

We estimated the maximum-likelihood mean and standard deviation of the Gaussian distribution of mutational effects (Equation 9) on the basis of the MA data of each type of sensory bristle. For sternopleural sensory bristles (Paxman 1957), the maximum-likelihood estimate (MLE) of μ was −0.08 ± 0.06 (±1 SD) and the MLE of σ was 0.23 ± 0.05. For abdominal bristles (Lopez and Lopez-Fanjul 1993), our estimate of μ was −0.25 ± 0.04 and our estimate of σ was 0.34 ± 0.05. We calculated the MLE *ĉ* for the four QTL data sets, in each case applying the corresponding mutation-effect distribution.

### Selection in the HST population

Based on the nine QTL detected in the HST population (Figure 3A) and the mutation-effect distribution estimated for sternopleural bristle number (Figure 3B), the likelihood-ratio test (χ^{2} = 38.1, *P* < 0.001) rejects a neutral hypothesis for an increase in sternopleural bristle number. The MLE for *c*, *ĉ*, was 0.018, with a 95% confidence interval (CI) ranging from 0.013 to 0.022 (Figure 3C). None of the corresponding one hundred simulated neutral data sets based on the HST data yielded a false positive.

Applying Otto and Jones (2000) to assess whether undetected QTL might have influenced our HST result yielded an estimate that four loci were not detected and that these loci had an average effect of 0.388. Including four simulated QTL following an exponential distribution with mean 0.388, the MLE *ĉ* remained 0.018 and the CI ranged from 0.013 to 0.022 (Figure 3D), the same result as in our main analysis.

To assess the sensitivity of our test to the estimates of the mutation-effect distribution parameters, we perturbed μ by 1 SD in each direction and applied the test. For μ = −0.144, *ĉ* = 0.027 with a CI ranging from 0.021 to 0.031, while for μ = −0.022, *ĉ* = 0.012 with a CI ranging from 0.007 to 0.016 (Figure 3E). Similarly perturbing σ, we found that for σ = 0.183, *ĉ* = 0.029, with a CI ranging from 0.025 to 0.034, and for σ = 0.288, *ĉ* = 0.005, with a CI from 0.001 to 0.01 (Figure 3F). Because confidence intervals for *ĉ* did not include zero, rejection of the hypothesis of neutrality underlying the evolution of the HST QTL data are robust to moderate perturbations in μ and σ.

### Selection in the LST population

On the basis of the four QTL detected in the LST population (Figure 4A) and the mutation-effect distribution estimated for sternopleural bristle number (Figure 4B), a likelihood-ratio test (χ^{2} = 4.88, *P* = 0.027) rejects a neutral hypothesis for a decrease in sternopleural bristle number. In this case, the log-likelihood curve does not have a maximum, but instead increases asymptotically as *ĉ* decreases (Figure 4C). Conservatively using the largest sampled likelihood as the maximum likelihood and applying the likelihood-ratio test, we found that *ĉ* < −0.005 with 95% confidence. None of the 100 simulated neutral data sets based on the LST data yielded a false positive.

We estimated that four loci were not detected and that these loci had an average effect of −0.284. When we included these simulated QTL in the data, we again found *ĉ* < −0.005 (Figure 4D). Perturbing μ, we found that for μ = −0.144, *ĉ* < 0.007, and for μ = −0.022, *ĉ* < −0.017 (Figure 4E). Perturbing σ, we found that for σ = 0.183, *ĉ* < 0.001, and for σ = 0.288, *ĉ* < −0.007 (Figure 4F). Because confidence intervals for *ĉ* included zero, rejection of the hypothesis of neutrality underlying the evolution of the LST QTL data were not robust to moderate perturbations in μ and σ.

### Selection in the HAB population

On the basis of the three QTL detected in the HAB population (Figure 5A) and the mutation-effect distribution estimated for abdominal bristle number (Figure 5B), a likelihood-ratio test (χ^{2} = 34.7, *P* < 0.001) rejects the neutral hypothesis for an increase in the number of abdominal bristles. As in the LST population, the log-likelihood curve does not have a maximum, instead increasing asymptotically as *ĉ* increases (Figure 5C). Conservatively using the largest observed likelihood as the maximum likelihood and applying the likelihood ratio test, we found that *ĉ* > 0.031 with 95% confidence. Only 1 of the 100 simulated neutral data sets based on the HAB data yielded a false positive.

We estimated that five loci were not detected and that these loci had an average effect of 0.245. When we included these simulated QTL in the data, we found that *ĉ* > 0.037 (Figure 5D). Perturbing μ, we found that for μ = −0.294, *ĉ* >0.037, and for μ = −0.215, *ĉ* > 0.026 (Figure 5E). Perturbing σ, we found that for σ = 0.30, *ĉ* > 0.034, and for σ = 0.392, *ĉ* > 0.029 (Figure 5F). Because confidence intervals for *ĉ* did not include zero, rejection of the hypothesis of neutrality underlying the evolution of the HAB QTL data are robust to moderate perturbations of μ and σ.

### Selection in the LAB population

On the basis of the three QTL detected in the LAB population (Figure 6A), and the mutation-effect distribution estimated for abdominal bristle number (Figure 6B), a likelihood-ratio test (χ^{2} = 4.00, *P* = 0.046) rejects a neutral hypothesis for a decrease in abdominal bristle number (Figure 6C). Conservatively using the largest observed likelihood as the maximum likelihood and applying the likelihood-ratio test, we found that *ĉ* < 0 with 95% confidence. Only 1 of the 100 simulated neutral data sets based on the LAB data yielded a false positive.

We estimated that 13 loci were not detected and that these loci had an average effect of −0.240. When we included these simulated QTL in the data, *ĉ* was 0 with a 95% CI ranging from −0.032 to 0.015 (Figure 6D). Perturbing μ, we found that for μ = −0.294, *ĉ* < 0.006, and for μ = −0.215, *ĉ* < −0.004 (Figure 6E). Perturbing σ, we found that for σ = 0.30, *ĉ* < 0.003, and for σ = 0.39, *ĉ* < −0.001 (Figure 6F). Because confidence intervals for *ĉ* included zero, rejection of the hypothesis of neutrality underlying the evolution of the LAB QTL data are not robust to perturbations in μ and σ.

### Power

We examined the power of subsets of the observed QTL to yield a positive result. The proportion of cases in which neutrality could be rejected increased monotonically with the number of QTL in the subset (Figure 7). For single QTL and the observed mutation-effect distribution, the test rejected neutrality in only 43% of cases, while for data sets with six or more of these loci, the test always rejected neutrality.

## Discussion

We have derived a new method for detecting the level of historical selection on quantitative traits. First, we showed how to estimate the parameters of a Gaussian distribution of mutational effects from the change in the trait over the course of a mutation-accumulation experiment. Then, we derived the distribution of the effect sizes of mutations that fix as a function of the regime of selection or neutrality applied to the trait. We demonstrated how to integrate these two results to yield a maximum-likelihood estimate of the strength of historical selection given a set of observed QTL effect sizes and the results of a mutation-accumulation experiment. Finally, we applied the test to a well-studied quantitative trait, detecting a history of artificial selection on four cases of sensory bristle numbers in *D. melanogaster*.

The test we have derived is the first test for selection on phenotype that incorporates information about a trait’s genetic architecture as well as the phenotypic change induced by mutation. It is also the first test to include an explicit model of how neutrality and selection shape the loci that affect a trait. Finally, our test is the first that will allow biologists not only to distinguish between neutrally evolving traits and selected traits but also to estimate the range of strengths of selection that are most likely to have produced the current phenotypic value. As such, it represents a theoretical and practical improvement on existing methods to detect selection on quantitative traits.

Application of the test to sensory bristle number in *D. melanogaster* demonstrated its ability to detect a history of selection. Likelihood-ratio tests demonstrated sufficient power to reject neutrality in four out of four populations that had been subject to artificial selection. For the HST data, we were able to identify a MLE for a coefficient determining the strength of selection, as well as to describe its 95% CI. For the other three data sets, we were able to estimate upper or lower bounds for the strength of selection. Because the HST data contained the largest number of QTL (9), this difference in precision is expected. The HST data set was also the only one that contained some QTL with effects opposite to the actual and inferred direction of selection. Similarly, by applying the test to subsets of the data, we demonstrated that the test’s power increased with an increased number of QTL. We also assessed the false-positive rate by applying the test to 100 simulated neutral data sets based on each set of real QTL. In no case did more than 1 of the 100 simulated data sets yield a false positive.

We assessed the sensitivity of our test to the estimates of the mutation-effect distribution parameters by perturbing μ and σ by 1 SD in each direction and reapplying the test. For the HST and HAB data, rejection of neutrality was robust to these perturbations. However, for the LST data and the LAB data, the result was no longer significant in the case of a more negative μ and in the case of a smaller σ. This loss of significance speaks to a perhaps unsurprisingly higher sensitivity of the test in cases in which selection and mutation are acting in opposite directions (as they are in HST and HAB) than in cases in which selection and mutation are acting in the same direction. Our low false-positive rate across both cases in which selection and mutation are acting in concert or in conflict indicates that this difference in sensitivity is in accord with the actual inferential difficulty and not a problem of a biased test construction.

A well-documented bias in QTL studies is the tendency to fail to detect small-effect loci (Beavis 1998; Bost *et al.* 2001; Xu 2003). To assess whether undetected QTL might have influenced our estimates, we applied the analytical approach of Otto and Jones (2000) to estimate the number and effect of undetected QTL and included simulated loci following these parameters in our analysis. For the three data sets in which the test rejected neutrality, this modification did not change the significance of the result and had little effect on the estimates of *c*. However, when we included the simulated loci in the LAB data set, where we had already bordered on insufficient power to detect the action of selection, the MLE for *c* was shifted to 0. For this data set in particular, the analysis indicated that there were many more undetected loci (13) than observed QTL (5). Furthermore, the simulated loci had a mean of −0.240, almost identical to the estimate of μ for abdominal bristles, making them effectively neutral for the purposes of our test. Therefore, application of Otto and Jones (2000) imputation can have a considerable impact when the expected number of unobserved loci is much larger than the number of detected QTL.

Wider application of this test awaits the availability of more MA and QTL data. Unfortunately, there are few traits for which both MA and QTL analysis have been performed. Practical and ethical considerations limit the range of species for which mutation-accumulation experiments may be performed. On the other hand, numerous examples of applications of Orr’s 1998 test to empirical data (Rieseberg *et al.* 2002; Albertson *et al.* 2003; Lexer *et al.* 2005; Orr 2010) demonstrate that tests for selection on phenotypic traits are in demand. Although both types of experiments can be labor intensive and time consuming, there are many cases in which either MA or QTL data are already available, and only the corresponding partner experiment need be completed to apply this test for historical selection.

Several expansions of our model would improve its scope and precision. First, the model presented here implicitly rather than explicitly incorporates time. Because the generation and fixation of mutations take time, the amount of time a trait has had to evolve is an important factor in how many substitutions are responsible for each QTL effect. Explicitly incorporating time into the model should provide more information for determining in which vector **H** is most likely. Since **H** affects the likelihood of *c*, this additional discriminatory power would yield more precise estimates of the strength of selection. However, explicit rather than implicit incorporation of time in our framework is complex.

Our test assumes that the probability of fixation of a new mutation follows Equation 5, which has been derived from various models of evolution (Kimura 1962; Moran 1960; Gillespie 1974), and is a fairly general result, accommodating a variety of evolutionary scenarios for which the effective population size captures all relevant complications. Examples include cyclically varying population size (Otto and Whitlock 1997), population subdivision with symmetric migration (Pollack 1966), and deleterious mutations at completely linked loci (Charlesworth 1994). Probability of fixation results that could be used in place of Equation 5 under more diverse evolutionary scenarios are reviewed by Patwa and Wahl (2008).

Because we model the sequential mutation and fixation of alleles at different loci, our test applies to QTL identified by crossing related populations and estimates the selective regime responsible for the trait difference between the populations. The method is not applicable, as currently implemented, to QTL identified by crossing individuals from the same population. However, since selection should shape the standing variation within a population, a related test that applies to QTL variation in a single population is conceivable. Accordingly, one potential weakness of the test is that our model assumes that QTL between the divergent populations are constructed from novel mutations, without allowing some proportion of fixed differences to arise from standing variation in the ancestral population. QTL arising from standing variation fix faster and are likely to be of smaller effect size than QTL arising from novel mutations (Barrett and Schluter 2008). Consequently, our model assumption that all variation arises as new mutation is conservative for the purpose of our test: within our framework, smaller QTL effect sizes will be more consistent with neutrality.

Finally, distributions of mutational effect other than the Gaussian distribution used here should be explored. Implementing the model with other distributions would allow assessment of its sensitivity to this assumption and provide flexibility to its use as new information comes to light about the distributions of mutational effects found in nature. The central challenge in this endeavor is that other distributions lack the convenient property of the normal distribution that the sum of normally distributed random variables is also normally distributed. As currently implemented, our approach relies on this fact for tractability (Equation 2).

As a theoretical result, this test for selection on quantitative traits is important because it bridges population genetic theory and genetic architecture as revealed by QTL. Furthermore, the test for selection outlined here provides an improvement over currently available methods for determining the regime of historical selection or neutrality that generated observed quantitative differences in phenotype. Results of the test applied to four data sets known to be the result of artificial selection for sensory bristle number in *D. melanogaster* are highly encouraging, because they indicate that the test has sufficient power to be used on moderately sized QTL data sets and also that the false-positive rate is very low. Applying the test to QTL data generated by crossing two individuals representative of diverged wild populations will yield conclusions about the history of selection responsible for the observed trait differences. Finally, our model provides a novel framework that holds promise for expansion in future work to incorporate such factors as time, varying probabilities of fixation, and diverse shapes of the distribution of mutational effects.

## Acknowledgments

We thank Trudy Mackay and Carlos Lopez-Fanjul for graciously and helpfully digging deep in their files to provide key details of their published experiments. DPR was partially funded while performing this work by a Yale College Dean’s Research Fellowship, a Yale-Howard Hughes Medical Institute Future Scientist Summer Fellowship, and a National Science Foundation Graduate Research Fellowship DGE-1144152. The final computations were run on the Odyssey cluster supported by the Faculty of Arts and Sciences Science Division Research Computing Group at Harvard University.

## Footnotes

Supporting information is available online at http://www.genetics.org/content/suppl/2012/01/31/genetics.111.137075.DC1.

*Communicating Editor: C. D. Jones*

- Received November 23, 2011.
- Accepted January 17, 2012.

- Copyright © 2012 by the Genetics Society of America