# Multiple-Line Inference of Selection on Quantitative Traits

- ¹Corresponding author: Institute for Theoretical Physics, University of Cologne, Zülpicher Straße 77, D-50937 Köln, Germany. E-mail: nriedel{at}thp.uni-koeln.de
- ²Present address: The Francis Crick Institute, Mill Hill Laboratory, The Ridgeway, London NW7 1AA, United Kingdom.

## Abstract

Trait differences between species may be attributable to natural selection. However, quantifying the strength of evidence for selection acting on a particular trait is a difficult task. Here we develop a population genetics test for selection acting on a quantitative trait that is based on multiple-line crosses. We show that using multiple lines increases both the power and the scope of selection inferences. First, a test based on three or more lines detects selection with strongly increased statistical significance, and we show explicitly how the sensitivity of the test depends on the number of lines. Second, a multiple-line test can distinguish between different lineage-specific selection scenarios. Our analytical results are complemented by extensive numerical simulations. We then apply the multiple-line test to QTL data on floral character traits in plant species of the *Mimulus* genus and on photoperiodic traits in different maize strains, where we find a signature of lineage-specific selection not seen in two-line tests.

- quantitative traits
- natural selection
- hypothesis testing

Extensive experimental work has helped to reveal the genetic architecture of quantitative traits (Dilda and Mackay 2002; Brem and Kruglyak 2005; Mackay and Lyman 2005; Mezey *et al.* 2005; Nuzhdin *et al.* 2005; Flint and Mackay 2009), allowing researchers to study the basis of trait variation within and across species. A long-term goal of QTL research is to understand the mapping from genotype to phenotype underlying a particular quantitative trait. Crosses between individuals from different lines are used to identify loci whose states are statistically correlated with a particular trait. However, the ability of QTL studies to identify the molecular basis of quantitative traits is still limited; it is especially difficult to pinpoint genetic loci that influence a trait (Mackay *et al.* 2009). Targeted efforts have been made to resolve loci at the level of single genes or even nucleotides (Pasyukova *et al.* 2000; Fanara *et al.* 2002; De Luca *et al.* 2003; Harbison *et al.* 2004; Moehring and Mackay 2004; Jordan *et al.* 2006), but these cases are still the exception.

In recent years, QTL experiments also have been extended to crosses between multiple lines. Harnessing information from several lines dramatically increases the power and accuracy of QTL identification (Rebai and Goffinet 1993; Steinhoff *et al.* 2011), allowing researchers to test for epistatic interactions (Jannink and Jansen 2001; Blanc *et al.* 2006) and increasing the genetic variability that can be accessed (Blanc *et al.* 2006). For instance, all loci that have the same allele in two lines also have the same allele in all crosses of these lines. In the absence of genetic variance, the effect of such a locus on a trait cannot be determined. Analyzing more than two lines increases the number of loci that differ by state in at least one line, allowing researchers to identify more loci affecting a quantitative trait. Multiple-line pairwise crosses are most common in animal and plant breeding (Blanc *et al.* 2006; Rückert and Bennewitz 2010), where often many different lines are available for crossing. However, the extension to multiple-line crosses also brings new challenges. For instance, choosing the right mating design for the QTL experiments is important for multiple-line crosses (Crepieux *et al.* 2004; Verhoeven *et al.* 2006). Because most statistical methods for QTL identification developed for two-line crosses cannot be easily extended to the multiple-line case, new and more sophisticated methods have been developed (Xie *et al.* 1998). These methods are based on least-squares regression (Rebai and Goffinet 2000), maximum likelihood (Xie *et al.* 1998; Xu 1998), and a Bayesian approach (Yi and Xu 2002) and have been applied to a range of experimental data sets (Blanc *et al.* 2006; Chen 2009; Coles *et al.* 2010; Rückert and Bennewitz 2010; Steinhoff *et al.* 2011).

In the field of evolutionary genetics, information from QTL analysis has been employed to infer the evolutionary forces acting on a particular trait. Here the central question is whether natural selection acted on a trait during its evolutionary history. A more specific question is whether the strength of selection is constant across a phylogeny or the selection has acted in a lineage-specific manner. Several statistical tests make use of the data gained from QTL experiments to detect the effects of natural selection. Orr’s test (Orr 1998) asks whether the statistical distribution of alleles shows an excess of alleles that increase the value of the trait (+ alleles) in one line as a sign of selection. Orr’s test, in turn, was assessed by Anderson and Slatkin (2003), who found that the test statistics are conservative, and Rice and Townsend (2012b), who observed an unusual dependence of the test on the variance of the distribution of additive effects. Based on Orr’s approach, Fraser, Moses, and Schadt (Fraser *et al.* 2010; Fraser 2011) used QTL statistics to detect a signal of nonneutral evolution in gene expression levels in different yeast strains. Rice and Townsend (2012a) used a test that combines QTL analysis with data from mutation accumulation experiments and asks whether mutations seen between two lines tend to affect the trait more than those seen in experiments that accumulate largely neutral mutations. However, currently, no test uses the full statistical information from multiple-line QTL experiments.

In this paper we develop a statistical framework to test different evolutionary hypotheses for multiple QTL lines. Using a systematic likelihood-based approach, we find that a multiple-line test has a higher statistical power to identify selection compared to the two-line test. However, the consequences of multiple-line testing go beyond the mere increase in the number of observed loci. In two lines, the effect of lineage-specific selection turns out to be statistically indistinguishable from the bias introduced by testing traits with the largest phenotypic differences from a pool of traits for selection. In three or more lines, evolutionary scenarios involving lineage-specific selection generally can be distinguished from such bias. We use this effect to search for lineage-specific selection in QTL data on different traits in species of the genus *Mimulus* and in different maize lines.

Our test follows Orr’s test (Orr 1998) in that we use a two-state model at each locus and infer selection from the statistics of + and − alleles. Also, we condition the allele statistics on the phenotypic difference to deal with a potential bias introduced by testing multiple traits. Unlike Orr, we use population genetics models to compare the empirical allele statistics with the statistics observed under different evolutionary scenarios. The approach of Rice and Townsend (2012a) is similar in spirit but uses information from mutation-accumulation experiments, which goes beyond standard QTL analysis. Numerical simulations performed to assess the statistical power of the test are similar to those of Rice and Townsend (2012b) (which, however, focus on the connection to the variance of the distribution of additive effects), and for the multiple-testing simulations, we use a scenario analogous to that of Anderson and Slatkin (2003).

In what follows, we develop a log-likelihood score that quantifies the likelihood of neutral and selective hypotheses in an explicit evolutionary framework. We first explore our approach on artificial data and probe the efficiency of our test in the presence of different confounding factors. We then discuss the bias that trait selection can introduce into the allele statistics and how in three lines or more the effects of natural selection can be distinguished from bias resulting from trait selection. Finally, we apply our test to floral quantitative traits in different *Mimulus* species and to photoperiod traits in maize, finding evidence for lineage-specific selection that is not detectable in two lines.

## An *n*-Line Selection Model

In this section we construct a simple population genetics model of QTL evolving in *n* haploid populations in the weak-mutation regime with full recombination. Trait and fitness are linear functions of the states of the loci. The effects of interlocus epistasis, simultaneous polymorphism, and lack of recombination will be examined in the section entitled *Epistasis and Multiple Segregating Loci*.

Central to our analysis is a quantitative trait *T* affected by *L* loci labeled *l* = 1, …, *L*. Each locus is characterized by a genotype, and the genotype at each locus affects the trait in a particular way. For example, consider a trait affected by a transcription factor. In this case, the regulatory region of a gene may be a sequence locus affecting the trait. We can approximate the relationship between trait and locus by a two-state variable *q* with states “on” (functional binding site in the regulatory region) or “off” (nonfunctional site). We describe each locus *l* by such an effect state (state for short) *q*_{*l*}. The effect state depends on the genotype and describes the effect a particular genotype at a locus has on the trait. In general, different genotypes at a locus correspond to the same state (there are many different sequences with a functioning binding site and even more without). We denote the number of genotypes of a locus corresponding to state *q* by *ω*_{*q*}.

Because of the limitations of QTL mapping, the information on the effect state of a locus is indirect; in most cases it is not known what feature of the genotype determines the state of the locus. Instead, for each allele at a locus, QTL analysis gives the effect a particular allele has on the trait averaged over many crosses. In QTL studies using crosses between four different lines (Blanc *et al.* 2006; Coles *et al.* 2010), most loci show a clear separation between alleles; the different alleles either decrease or increase the trait by a certain amount. For this reason, we restrict ourselves to a two-state model of loci, effectively focusing on the feature of a locus’s genotype with the largest effect on the trait. An extension to more states is straightforward and may be required when analyzing a large number of lines: in a study using crosses between 25 lines, loci harboring alleles with several different effects on the trait have been observed (Buckler *et al.* 2009).

We assume a linear trait model (character model) without trait epistasis (interlocus epistasis); the state at each locus contributes additively to the trait, that is,

$$T(\mathbf{q}) = \sum_{l=1}^{L} a_l\, q_l, \qquad (1)$$

where the additive QTL effect *a*_{*l*} specifies the contribution of locus *l* to the trait. Without loss of generality, we take *a*_{*l*} > 0, so one state of each locus (termed the + *state*) results in a higher trait value than the other (the *− state*). The additive effect of a locus is taken from experiments on multiple crosses between different lines, as is the state of a particular allele. Here **q** = (*q*_{1}, …, *q*_{*L*}) denotes the set of effect states at all *L* loci. We assume a linear Malthusian fitness (log-fitness) landscape

$$F(\mathbf{q}) = s\, T(\mathbf{q}), \qquad (2)$$

with selection strength *s*, resulting in a selection coefficient for each locus proportional to the additive effect *a*_{*l*}. Under this assumption, the effect of the state of a locus on both trait and fitness is independent of the states of other loci. This assumption will be examined and relaxed later. There is no environmental component and (for a diploid population) no dominance.
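The additive trait and linear log-fitness described above can be illustrated with a minimal sketch. The 0/1 encoding of the − and + states, the function names, and the numerical values are assumptions made for this illustration only and are not taken from the paper.

```python
def trait(effects, states):
    """Additive trait: sum over loci of (additive effect) * (state).

    Assumption for illustration: states are encoded as 0 (- state) or 1 (+ state).
    """
    if len(effects) != len(states):
        raise ValueError("effects and states must have the same length")
    return sum(a * q for a, q in zip(effects, states))


def log_fitness(effects, states, s):
    """Linear Malthusian (log-) fitness: selection strength times trait value."""
    return s * trait(effects, states)


# Hypothetical example: three loci with positive additive effects.
effects = [0.5, 1.0, 0.25]
line_a = [1, 0, 1]  # + state at loci 1 and 3
line_b = [1, 1, 1]  # differs from line_a only at locus 2

# Flipping locus 2 changes the log-fitness by s * a_2, whatever the other states are.
delta = log_fitness(effects, line_b, 0.2) - log_fitness(effects, line_a, 0.2)
```

Because both functions are linear in the states, the contribution of each locus is independent of all others, which is the property the text relies on.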

We consider a simple population genetics model describing a haploid population of effective population size *N* in the weak-mutation regime with full recombination. In this regime, mutations appear at some rate and are eventually either fixed or lost from the population. The arrival and fixation of mutations form a stochastic process whose rate depends on the fitness difference of the new allele relative to the preexisting allele, the effective population size *N*, and the mutation rate *μ* (Wright 1931; Kimura 1962).

At low mutation rates, most loci are monomorphic at a given point in time but may differ between lines (owing to mutations that fix in a given population before the next mutation occurs). The statistic of states of a locus describes the probability that this locus in a given line is in state *q*. In the limit of long evolutionary times between lines, this statistic no longer changes with time, so the probability is stationary (equilibrium). Under neutral evolution, the equilibrium probability depends only on the number *ω*_{*q*} of sequence variants of the locus corresponding to state *q*, that is,

$$P_0(q) = \frac{\omega_q}{\omega_+ + \omega_-}. \qquad (3)$$

The shorthand Ω for the relative size of the two multiplicities is called the *multiplicity parameter* of a particular locus. In our example with the transcription factor binding site, the number of sequences with a functioning binding site *ω*_{+} is much lower than the number of sequences without such a site *ω*_{−}, so the − state predominates in the absence of selection. The multiplicity parameter of a locus quantifies the asymmetry between + and − states in the absence of selection and, correspondingly, the relative number of mutations at a locus increasing or decreasing the trait. Under selection, however, the equilibrium-state statistic also depends on the fitness difference between the two states and is given by Equation (4) (Iwasa 1988; Berg *et al.* 2004; Sella and Hirsh 2005). This result is valid in the low-mutation regime but can be generalized (Iwasa 1988; Barton and Coe 2009; Nourmohammad *et al.* 2013b). A brief derivation is given in Appendix B. The key assumption behind this result is that after long times since the last common ancestor, a stationary distribution is reached. This assumption will be examined in the section entitled *Testing for Selection at Different Evolutionary Times*. In Appendix A we derive results valid in the complementary regimes of short times since the last common ancestor.
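To make the role of multiplicity and selection concrete, the following sketch evaluates a two-state equilibrium probability of the Boltzmann type that is standard in the cited literature. The specific parametrization is an assumption of this sketch, not a quotation of the paper's equation: states are encoded as 0/1, the multiplicity parameter is taken as Ω = ln(ω₊/ω₋), and selection enters through a scaled term x = 2Nas. The function names are hypothetical.

```python
import math


def multiplicity_parameter(omega_plus, omega_minus):
    """Assumed definition: log-ratio of the numbers of + and - genotypes."""
    return math.log(omega_plus / omega_minus)


def prob_plus(omega, scaled_selection):
    """Equilibrium probability of the + state under the assumed form
    P(+) / P(-) = exp(omega + scaled_selection), with scaled_selection = 2*N*a*s.
    """
    return 1.0 / (1.0 + math.exp(-(omega + scaled_selection)))


# Neutral evolution with symmetric multiplicities: both states equally likely.
neutral_symmetric = prob_plus(0.0, 0.0)

# Neutral evolution with ten times fewer + genotypes than - genotypes.
neutral_biased = prob_plus(multiplicity_parameter(1, 10), 0.0)

# Selection favoring the + state can offset a negative multiplicity parameter.
selected = prob_plus(multiplicity_parameter(1, 10), 4.0)
```

Under these assumptions, a negative Ω and a positive selection term pull the probability in opposite directions, which is why the two cannot be separated from a single line.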

For *n* lines labeled *i* = 1, …, *n*, the joint probability distribution in the limit of long evolutionary time factorizes over lines, so the statistic of states for a given locus is given by Equation (5), a product of single-line statistics of the form (4) divided by a normalizing factor *Z*. Here we need to consider one subtlety arising from QTL analysis based on crosses between individuals from different lines: in crosses, only the effects of loci differing in their state *q* in at least two lines can be determined. For this reason, the two configurations in which all lines share the same state, (+, +, …, +) and (−, −, …, −), remain unobservable. Thus the sum in the normalizing factor *Z* is over all states of the *n* lines excluding these two cases.

Under the linear fitness model (2), states at different loci are statistically independent, so the statistics of states over several loci are the product of (5) over loci (6)where the number of loci with different states in at least two lines is denoted by . The statistics of states at different loci may differ from this simple form for several reasons. The first is genetic linkage: here we assume free recombination between loci, as is standard in quantitative genetics (see later for an example with full linkage). A second reason is epistasis, which will be discussed in the section entitled, *Epistasis and Multiple Segregating Loci*.
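The structure of this likelihood (a product over lines, conditioned on the locus being observable, then a product over loci) can be sketched as follows. The single-line weights use an assumed parametrization with 0/1 states and exponent (Ω + a·x_i)·q, where x_i stands for the scaled selection strength 2N s_i of line i; this choice and all names are illustrative assumptions.

```python
import math
from itertools import product


def observable_configs(n_lines):
    """All 0/1 state configurations except all-minus and all-plus."""
    return [c for c in product((0, 1), repeat=n_lines) if 0 < sum(c) < n_lines]


def locus_log_prob(states, omega, effect, scaled_s):
    """Log-probability of one locus's states across lines, given it is observable.

    scaled_s[i] is the assumed scaled selection strength 2*N*s_i of line i.
    """
    def log_weight(config):
        return sum((omega + effect * x) * q for q, x in zip(config, scaled_s))

    configs = observable_configs(len(scaled_s))
    states = tuple(states)
    if states not in configs:
        raise ValueError("locus is not observable: all lines share one state")
    # The normalization runs only over observable configurations.
    log_z = math.log(sum(math.exp(log_weight(c)) for c in configs))
    return log_weight(states) - log_z


def log_likelihood(data, omegas, effects, scaled_s):
    """Sum of per-locus log-probabilities (loci treated as independent)."""
    return sum(
        locus_log_prob(q, om, a, scaled_s)
        for q, om, a in zip(data, omegas, effects)
    )
```

For three neutral lines with Ω = 0, each of the six observable configurations has probability 1/6 in this sketch, reflecting the exclusion of the two unobservable ones.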

## Inference and Hypothesis Testing for Different Evolutionary Scenarios

The statistics of states (6) can be used to infer the parameters of this model (selection strengths for different lines and the multiplicity parameters at different loci) from experimental data on the states across lines and loci and on the additive effects. Denoting the position of the maximum of a function *f* over *x* by argmax_{*x*} *f*(*x*), the maximum-likelihood estimates of the free parameters are obtained by maximizing (6) with respect to the free parameters [Equation (7)]. There are two limitations to the inference of multiplicity parameters and selection strengths. The first is that the number of lines *n* limits in particular the inference of multiplicity parameters. For *n* = 2 lines, the only observable loci are in states (+, −) or (−, +), so each observable locus carries exactly one + and one − state. Hence the statistics of states do not depend on the multiplicity parameters, making their inference impossible. For *n* > 2, the statistics of states depend on the multiplicity parameters, and the estimate of these parameters improves with increasing number of lines because the size of the data (*i.e.*, number of loci times the number of lines) increases relative to the number of multiplicity parameters (one per locus). Second, selection strengths can only be determined relative to each other: the likelihood (6) depends on the states only through a per-locus combination of the multiplicity parameter and the selection term. Increasing all selection strengths uniformly by some amount and decreasing each multiplicity parameter by a compensating amount proportional to the additive effect of its locus thus leaves the likelihood unchanged. As a result, for example, a situation where selection strength is uniform over the lines and the multiplicity parameters are all zero is statistically indistinguishable from neutral evolution with suitably chosen nonzero multiplicity parameters. In what follows we will focus on lineage-specific selection and determine selection strengths relative to each other. 
Using further information on multiplicity parameters [*e.g.*, from mutation-accumulation experiments (Rice and Townsend 2012a)] or further assumptions (*e.g.*, that multiplicity parameters are uncorrelated with effect sizes or are, on average, nonnegative), we can also obtain information on absolute selection strengths from (6).
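Both limitations can be checked numerically. The sketch below assumes a likelihood with 0/1 states and per-line exponent (Ω_l + a_l·x_i)·q, where x_i stands for 2N s_i; this parametrization, the data, and all names are assumptions chosen for illustration. Under this form, shifting every x_i by δ while lowering each Ω_l by a_l·δ leaves the likelihood unchanged, and for two lines the likelihood does not depend on Ω at all.

```python
import math
from itertools import product


def log_likelihood(data, omegas, effects, scaled_s):
    """Log-likelihood of observable loci; scaled_s[i] stands for 2*N*s_i (assumed)."""
    n = len(scaled_s)
    configs = [c for c in product((0, 1), repeat=n) if 0 < sum(c) < n]
    total = 0.0
    for states, omega, a in zip(data, omegas, effects):
        def log_w(c):
            return sum((omega + a * x) * q for q, x in zip(c, scaled_s))
        log_z = math.log(sum(math.exp(log_w(c)) for c in configs))
        total += log_w(tuple(states)) - log_z
    return total


# Hypothetical three-line data set with three observable loci.
data = [(1, 0, 0), (1, 1, 0), (0, 1, 0)]
effects = [0.5, 1.0, 2.0]
omegas = [0.2, -0.4, 0.0]
scaled_s = [1.5, 0.0, -0.5]

# Limitation 2: a uniform shift in selection is absorbed by the multiplicity parameters.
shift = 0.8
shifted_s = [x + shift for x in scaled_s]
shifted_omegas = [om - a * shift for om, a in zip(omegas, effects)]
base = log_likelihood(data, omegas, effects, scaled_s)
same = log_likelihood(data, shifted_omegas, effects, shifted_s)

# Limitation 1: with two lines, the multiplicity parameters drop out entirely.
two_line = [(1, 0), (0, 1), (1, 0)]
ll_a = log_likelihood(two_line, [0.0, 0.0, 0.0], effects, [1.0, 0.0])
ll_b = log_likelihood(two_line, [3.0, -2.0, 1.0], effects, [1.0, 0.0])
```

Here `base` equals `same` and `ll_a` equals `ll_b` up to floating-point error, so only differences in selection strength between lines are identifiable without further information.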

When only a few loci for a trait are known, the inference of all parameters may be unreliable because of overfitting. In this case, it is convenient to restrict the parameter space and test specific hypotheses against each other. For example, we can compare a scenario with uniform selection strength on all lines (*s*_{*i*} = *s* for all lines *i*) with a lineage-specific selection pattern. The log-likelihood score

$$S = \ln \frac{Q(\text{data})}{P(\text{data})} \qquad (8)$$

quantifies the evidence for two such evolutionary scenarios *P* and *Q* relative to each other. Both these scenarios are described by statistics of the form (6) but differ in their parameter values. The score (8) is positive if the distribution of states observed in a particular data set is more in agreement with the statistics of states in scenario *Q* than in scenario *P*. For both these scenarios, the remaining selection parameters are estimated together with the multiplicity parameters according to (7).

When two scenarios with different numbers of free parameters are tested against each other, the log-likelihood score is generally biased toward the scenario with more parameters. A simple way to correct this bias is the Bayesian information criterion (BIC) (Schwarz 1978). Under the BIC correction, the score (8) is decreased by an offset (*k*/2) ln *m*, where *k* is the excess number of parameters in model *Q* and *m* is the size of the data.
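As a small worked example of the correction, the sketch below subtracts the standard BIC penalty of half the number of excess parameters times the logarithm of the sample size from a log-likelihood difference. Taking the sample size as the number of loci times the number of lines is an assumption of this sketch, as are the function name and the numbers.

```python
import math


def bic_corrected_score(loglik_q, loglik_p, extra_params, n_data):
    """Log-likelihood score of scenario Q versus P with a BIC penalty.

    extra_params: number of free parameters Q has in excess of P.
    n_data: sample size (assumed here: number of loci times number of lines).
    """
    raw_score = loglik_q - loglik_p
    return raw_score - 0.5 * extra_params * math.log(n_data)


# Hypothetical values: 10 loci in 3 lines, Q has one extra selection parameter.
corrected = bic_corrected_score(-40.0, -45.0, extra_params=1, n_data=30)
```

With these made-up numbers the raw score of 5 is reduced by about 1.7, so scenario Q is still favored after the correction.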

## Increased Statistical Power in More than Two Lines

There is a simple reason why the power of the selection tests increases when more lines are used. Because only loci with different states in at least two lines can be observed, a certain fraction of loci affecting the trait remains hidden from the analysis. For two lines, loci with the states (+, +) and (−, −) cannot be observed. For three lines, there are only two unobserved out of eight possible configurations, and the fraction of unobserved loci decreases further with the number of lines. In general, the probability of a locus remaining unobserved in *n* lines is given by $\prod_{i=1}^{n} P_i(+) + \prod_{i=1}^{n} P_i(-)$, where the single-line statistics of states $P_i$ are given by (4).
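The counting argument above can be checked directly: a locus is unobserved when all lines are in the + state or all are in the − state. The sketch below takes the per-line probabilities of the + state as inputs (how they depend on selection and multiplicity is left open) and uses a hypothetical function name.

```python
import math


def fraction_unobserved(p_plus_by_line):
    """Probability that all lines share the same state at a locus,
    assuming the lines are statistically independent."""
    all_plus = math.prod(p_plus_by_line)
    all_minus = math.prod(1.0 - p for p in p_plus_by_line)
    return all_plus + all_minus


# Neutral, symmetric case: 2 of 4 configurations hidden for two lines,
# 2 of 8 for three lines, and so on.
two_lines = fraction_unobserved([0.5, 0.5])          # 0.5
three_lines = fraction_unobserved([0.5, 0.5, 0.5])   # 0.25

# Strong selection in the same direction on all lines hides most loci,
# even with three lines.
strong_selection = fraction_unobserved([0.99, 0.99, 0.99])
```

The last case illustrates why adding lines does not help if every line is under strong selection in the same direction.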

To probe the log-likelihood score (8) for a varying number of lines, we test selective and neutral hypotheses against each other on artificial data. For lines and loci, additive effects are drawn randomly from a gamma distribution (Zeng 1992; Orr 1998). After choosing the effects , their values are fixed and are taken to be known explicitly (in practice, they are obtained via experiments using QTL crosses). Then we generate artificial QTL data under different scenarios, which we label for easy reference. In the first, neutral scenario , the selection strength on all lines is zero (). In the second scenario , only line 1 is under selection (, ).

In each run, a set of states is drawn from the probability distribution (5) with fixed values of for each line and for each locus corresponding to scenario (see caption to Figure 1 for details). For the subset of loci with different states in at least two lines, the log-likelihood score (8) is computed. To gauge the statistical significance of a given value of this score, we also estimate the probability of reaching the same score or higher under the neutral scenario . This *P*-value measures the rate of false-positive results (type I error rate) and is computed by performing a large number of runs under the scenario to see what fraction of them gives a score matching or exceeding *S*. To gauge how frequently a positive score occurs in favor of scenario with selection on line 1, 2, or 3, the configurations drawn from the null model are sorted according to their trait values .
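The simulation procedure described above can be sketched as follows. This is a simplified illustration, not the authors' code: states are encoded as 0/1, the single-locus weights use an assumed exponent (Ω + a·x_i)·q with x_i standing for 2N s_i, the selection parameters of both scenarios are held fixed rather than re-estimated by maximum likelihood, and the gamma parameters, number of loci, and number of runs are arbitrary choices.

```python
import math
import random
from itertools import product


def config_probs(a, scaled_s, omega=0.0):
    """Probabilities of the observable configurations of one locus."""
    n = len(scaled_s)
    configs = [c for c in product((0, 1), repeat=n) if 0 < sum(c) < n]
    weights = [
        math.exp(sum((omega + a * x) * q for q, x in zip(c, scaled_s)))
        for c in configs
    ]
    z = sum(weights)
    return {c: w / z for c, w in zip(configs, weights)}


def draw_data(effects, scaled_s, rng):
    """Draw one observable configuration per locus under the given scenario."""
    data = []
    for a in effects:
        probs = config_probs(a, scaled_s)
        configs = list(probs)
        data.append(rng.choices(configs, weights=[probs[c] for c in configs])[0])
    return data


def score(data, effects, scaled_s_q, scaled_s_p):
    """Log-likelihood score of scenario Q versus scenario P (fixed parameters)."""
    return sum(
        math.log(config_probs(a, scaled_s_q)[q] / config_probs(a, scaled_s_p)[q])
        for q, a in zip(data, effects)
    )


def empirical_p_value(observed, null_scores):
    """Fraction of null-scenario runs whose score matches or exceeds the observed one."""
    return sum(s >= observed for s in null_scores) / len(null_scores)


rng = random.Random(0)
effects = [rng.gammavariate(1.0, 1.0) for _ in range(10)]  # assumed gamma parameters
neutral = [0.0, 0.0, 0.0]
selective = [2.0, 0.0, 0.0]  # only line 1 under selection

observed = score(draw_data(effects, selective, rng), effects, selective, neutral)
null_scores = [
    score(draw_data(effects, neutral, rng), effects, selective, neutral)
    for _ in range(500)
]
p_value = empirical_p_value(observed, null_scores)
```

The resolution of the estimated *P*-value is limited by the number of null runs, so small *P*-values require correspondingly many simulations.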

As expected, the log-likelihood score for the selective model increases with the number of lines, while the mean *P*-value decreases (Figure 1). This increase in statistical power as a result of the increased number of loci is a simple quantitative effect arising from an increase in the number of loci with different states in at least two lines. The dependence of the score *S* on the number of lines *n* is approximately given by (9)where is the score for two lines. Since we average over many loci to obtain *S*, the multiplicity parameter Ω appearing in has to be understood as an average multiplicity parameter over the loci. is an increasing function of *n*, with the largest increase in score between two and three lines (Figure 1A). The value of saturates for large *n* because all loci become detectable; the exact saturation value is . Nevertheless, the number of detectable QTL, and hence the statistical signal of selection, can remain small, even when the number of lines is large, if selection strength is so high in all *n* lines that all or nearly all QTL have the same state in all lines. This can be seen from the expression for the fraction of unobserved loci , which tends to 1 as all go to . If in all lines (which is the important quantity here), fewer than of the loci are observable (assuming that ). In practice, this particular problem can be remedied by including in the analysis one line with small selection pressure on the trait.

While more lines bring more information, they also increase the experimental effort required to perform pairwise crosses between them. For this reason, we also compare two- and three-line tests while keeping the total number of crosses constant. Given a fixed number of crosses that can be performed, should those crosses be concentrated on two lines, or should pairwise crosses on three lines be performed (with fewer crosses between each pair of lines)?

To compare two- and three-line tests on QTL mapping data at a fixed total number of crosses, we simulated a QTL model for three lines with loci differing in state between these lines under scenario . One-hundred SNP markers were simulated, with every fifth marker being linked to a QTL whose additive effect is drawn from a gamma distribution. We performed crosses between lines 1 and 2 for the two-line mapping and crosses for lines 1 and 2, 1 and 3, and 2 and 3, respectively, for the three-line mapping. The recombination probability between two adjacent markers is set to 0.25 such that the QTL segregate mostly independently. We used the random forest mapping method as described in Michaelson *et al.* (2010) to infer QTL positions and additive effects. We then used the QTL found by the mapping algorithm for our selection test comparing the selective scenario against neutral scenario .

The results in Figure 2 show that the three-line design is more effective at detecting selection given a sufficient number of crosses. However, for a small number of crosses, the two-line test is more effective. The existence of two regimes can be understood as follows: at a large number of crosses between two lines, all or nearly all QTL that differ between these lines have been detected, and further crosses do not yield new QTL. Between three lines, however, the number of diverged QTL is larger, so crosses between three lines can yield more QTL. The effect thus arises from the competition between detecting more QTL among those diverged between two lines and having more diverged QTL available in three lines but fewer crosses per pair of lines. In our simulations, the crossover between the two regimes lies around *M*_{tot} = 200–300 crosses, which is a realistic number in QTL experiments, but of course, this depends on details of the QTL mapping algorithm and simulation parameters.

So far the increase in statistical power in multiple-line tests is due to an increase in the number of diverged loci with the number of lines. In order to address other, qualitative effects arising when the number of lines is increased, is kept fixed for the remainder of this paper.

## Detection of Selection

To probe how well neutral and selective evolutionary scenarios can be distinguished, we applied our test to artificial data generated under different hypotheses. The number of lines is set to and the number of diverged loci to . In addition to the neutral scenario and the lineage-specific selection scenario defined earlier, we also consider a scenario in which all three lines are under selection but in different directions (, , and ). We also test a selective two-line scenario with a relative difference of selection strength between the lines against the neutral scenario (Δ*s* = 0). Analogous to the preceding section, scenarios and , and , and , and and are compared with each other. For the tests of scenario against , the selection strengths are chosen to yield, on average, the same trait difference .

The log-likelihood score (8) clearly distinguishes selective from neutral scenarios, as well as different lineage-specific selection scenarios from one another (Figures 3 and 4). As expected, the sensitivity of the test increases with selection strength. The test works in a reasonable parameter range, allowing us to infer selection strength with only few loci available ( loci for ) and a reasonable selection strength ( corresponds to a probability of 0.88 for a locus to be in the + state for ).

## Epistasis and Multiple Segregating Loci

The statistical framework comprising the equilibrium statistics of states (6), the maximum-likelihood estimates of selection strengths (7), and the log-likelihood scores (8) is built on a very simple population genetics model. In this section we explore how the resulting test performs when specific assumptions behind this model are not fulfilled. To this end, we perform finite-population simulations in a regime with multiple segregating loci and look at two different kinds of epistasis between loci: phenotype epistasis and fitness epistasis.

To model phenotype epistasis (character epistasis), we add a pairwise interaction term to the linear relationship between QTL states and the trait (1), yielding Equation (10), where *J* is a symmetric matrix describing the interactions between loci. The interaction coefficients are drawn from the same gamma distribution as the additive effects (and are assigned random signs); however, the average value of the interaction coefficients is varied relative to that of the additive effects by multiplying them by a factor. At a particular value of this factor, the cumulative contribution to the trait from the epistatic interaction is, on average, as large as the contributions from the linear term. The regime of a large factor corresponds to significant epistasis: in this regime, the trait value *T* can change significantly with the change of state of a single locus. We assume that, as is generally the case, the epistatic interactions are not known.

We perform numerical simulations using a Wright-Fisher model with and without trait epistasis. The Wright-Fisher model does not involve recombination, unlike the assumption of the selection test. Starting from a random initial configuration for loci, a Wright-Fisher model is simulated with three independent populations of 100 individuals each evolving over *M* generations. At the end of each run, the configuration of loci with the largest fraction in the population is used to calculate the score (8). We simulate both the selective scenario and the neutral scenario . We perform simulations both at high mutation rates leading to multiple segregating loci [mutation rate over generations, resulting in (Wilke 2004)] and in a second regime with low mutation rates ( over generations, with ), where there is typically at most a single segregating locus.

As the relative strength of the epistatic interactions is increased, the effect of each locus on the trait becomes coupled to the states of other loci, and the linear trait model (1) becomes increasingly inaccurate. As a result, the power of our test decreases with increasing epistasis (Figure 5). Yet, for weak epistatic interactions, the results of the test are only mildly affected. If the epistatic interactions in (10) were known, the trait model with epistasis (10) could be incorporated into the state statistic (6) to restore the test’s sensitivity. There is no significant difference in the power of the test between the regimes with and without multiple segregating loci in the regimes we examined.

For fitness epistasis, we consider a quadratic fitness function in place of the linear function (2). It has two parameters: the trait value giving maximal fitness and a coefficient that determines how quickly fitness decreases away from the maximum. These two fitness parameters are chosen such that the mean and variance of the distribution of trait values *T* equal those under the model (2) without epistasis at a given selection strength. In this way, the scenarios with and without fitness epistasis can be compared directly. We again perform simulations in regimes with and without multiple segregating loci. Figure 6 shows a very similar performance of the test on data generated under the linear and quadratic fitness landscapes in both cases. This is so because the test evaluates only the probabilities of alleles at individual loci. Correlations between loci depend on the nonlinearities of the fitness landscape (Nourmohammad *et al.* 2013b), but they do not enter the test.

Beyond epistasis, the results of a QTL-based test for selection are potentially limited by pleiotropic effects: a subset of QTL of one trait may affect a second, unknown trait. If this unknown trait is under selection, but not the first, a QTL-based test may erroneously lead to the conclusion that the first trait is under selection (because some of its loci show a signal of selection induced by the second, unknown trait). Hence the evidence for selection from QTL statistics pertains to the trait for which the loci were identified or some unknown trait with substantial overlap of QTL loci with the trait under study. Conversely, the trait under study may be under selection (favoring + states, say), but some of its loci affect another trait also under selection, favoring − states. If the second trait is unknown, the test would infer a selection strength on the first trait that is too low. With a small number of lines or loci, the signal of selection may even be lost altogether.

## Testing for Selection at Different Evolutionary Times

Here we probe the statistical power of the equilibrium test at different evolutionary times. The statistics of states (5) were derived for the steady state, which is reached a long time after the divergence of the different lines. Besides the mutation rate, this equilibration time depends on the strength of selection and the size of mutational targets. In a regime of long evolutionary times, each locus has changed state many times since the last common ancestor. In a regime of short evolutionary times, most loci have not changed their state (and thus are not detected in crosses), and most diverged loci have undergone a single change of state over the phylogeny. With a sufficient number of lines, the two regimes can be distinguished easily on the basis of the QTL states in all lines; in the limit of short times, the states are compatible with a single mutation event in the phylogeny (for each diverged locus).

We performed simulations analogous to the ones described earlier in the section entitled *Increased Power in More than Two Lines*, but instead of drawing configurations from the equilibrium distribution (5), we simulated, over a number *t* of time steps, transitions between states at each locus with substitution rates and for the transition from − to + and vice versa (Kimura 1962) (see also Appendix A). The phylogenetic tree used for three lines is shown in Figure A1. To simulate the transition between short and long evolutionary times, we varied the average number of substitutions per locus but kept selection strength and the number of diverged loci fixed ( is smaller than the total number of mutable QTL loci *L* when the expected number of substitutions per locus is smaller than 1). The score and *P*-value of the test (8) built on the assumption of long evolutionary time are plotted against the average number of substitutions (Figure 7). The statistical power decreases only slightly when going from long to short evolutionary times, and the test retains some of its statistical power even as goes to zero. The statistics of states in this limit of short evolutionary times are derived in Appendix A.

## Multiple Testing

As emphasized by Orr (1998), a large trait difference between two lines alone is not sufficient evidence for lineage-specific selection. Often traits in QTL experiments are picked from a larger pool of traits; among those, traits that diverged markedly between lines are chosen for further analysis because this difference hints at lineage-specific selection. However, in a sufficiently large set of traits, neutral evolution alone would produce traits differing between lines. In such a trait, we also would observe an imbalance of states enhancing the trait value in one line and reducing it in the other. The bias in trait difference and the statistics of states resulting from a nonrandom choice from a set of traits is called *ascertainment bias* (Nielsen and Signorovitch 2003). Ascertainment bias can lead to nonneutral evolution being attributed to a trait that evolved neutrally along with a set of other neutrally evolving traits.
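The effect of picking a strongly diverged trait from a pool of neutral traits can be illustrated with a small simulation. The following is a minimal sketch under assumptions that are not taken from the paper: two lines, a fixed number of diverged loci per trait, additive effects drawn from an exponential distribution, and a trait difference that changes by twice the additive effect per locus. All function names and parameter values are hypothetical.

```python
import random


def neutral_trait(n_loci, rng):
    """Simulate one neutral trait at diverged loci in two lines.

    Returns (r, n_plus): the trait difference line 1 minus line 2 and the
    number of loci at which line 1 carries the + state.
    """
    r = 0.0
    n_plus = 0
    for _ in range(n_loci):
        a = rng.expovariate(1.0)      # assumed effect-size distribution
        if rng.random() < 0.5:        # neutral: + is equally likely in either line
            r += 2.0 * a
            n_plus += 1
        else:
            r -= 2.0 * a
    return r, n_plus


def plus_fraction_in_high_line(r, n_plus, n_loci):
    """Fraction of loci with the + state in the line with the larger trait value."""
    return (n_plus if r >= 0 else n_loci - n_plus) / n_loci


def ascertainment_demo(n_traits=50, n_loci=8, n_repeats=1000, seed=1):
    """Compare the most diverged trait of a pool with an arbitrary trait of the pool."""
    rng = random.Random(seed)
    picked = 0.0
    typical = 0.0
    for _ in range(n_repeats):
        pool = [neutral_trait(n_loci, rng) for _ in range(n_traits)]
        r_max, k_max = max(pool, key=lambda t: abs(t[0]))
        picked += plus_fraction_in_high_line(r_max, k_max, n_loci)
        r0, k0 = pool[0]              # a trait chosen without regard to divergence
        typical += plus_fraction_in_high_line(r0, k0, n_loci)
    return picked / n_repeats, typical / n_repeats


if __name__ == "__main__":
    picked, typical = ascertainment_demo()
    print(f"+ fraction in high line, most diverged trait: {picked:.2f}")
    print(f"+ fraction in high line, arbitrary trait:     {typical:.2f}")
```

In this sketch every trait evolves neutrally, yet the trait selected for its large difference shows a clear excess of + states in the high line compared with an arbitrary trait from the same pool.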

There are two ways to correct for this effect. If the total number of traits from which the observed trait is taken is known explicitly, we are faced with a standard multiple-testing problem. We look at this case first. However, if the trait is chosen from an ill-characterized set of traits, the situation is different. We follow the approach of Orr (1998) and consider the statistics of states conditioned on the observed trait difference. We will see that in this case there is a drastic difference between two and more than two lines.

## Holm–Bonferroni Correction

If the total number of observed traits is known, a standard multiple-testing correction can be applied. An example is gene expression levels, where traits are analyzed on a genome-wide level, and the number of genes is known (Fraser *et al.* 2010). A suitable multiple-testing correction for this case is the Holm–Bonferroni correction (Holm 1979), which has the advantage that no independence of the different hypotheses needs to be assumed. This is particularly important in QTL analysis because different traits can be affected by the same genetic loci. The Holm–Bonferroni correction controls the family-wise error rate (FWER), *i.e.*, the false-positive rate not only for a single trait but for a whole set of traits. If there are *m* traits for which scenario *Q* is tested against the null hypothesis of scenario *P*, we calculate the log-likelihood score (8) and the corresponding *P*-values for all *m* traits. The traits are then ranked according to their *P*-values with the lowest *P*-values first. Next, we search for the first trait *j* in this ranking whose *P*-value exceeds *α*/(*m* − *j* + 1), where *α* is the significance threshold for the FWER. Scenario *P* then can be rejected for the traits 1, …, *j* − 1 but not for traits *j*, …, *m*.
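The step-down procedure of Holm (1979) can be sketched in a few lines. The function name `holm_bonferroni` and the example *P*-values are illustrative only and are not taken from the data sets analyzed here.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm's step-down procedure.

    Returns a list of booleans in the original order of `p_values`;
    True means the null hypothesis is rejected at family-wise error rate alpha.
    """
    m = len(p_values)
    # Indices of the hypotheses, sorted by increasing P-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        # The rank-th smallest P-value is compared with alpha / (m - rank + 1).
        if p_values[idx] > alpha / (m - rank + 1):
            break  # stop at the first failure; all remaining hypotheses are retained
        reject[idx] = True
    return reject


if __name__ == "__main__":
    example = [0.01, 0.04, 0.03, 0.005]  # hypothetical P-values for four traits
    print(holm_bonferroni(example))
```

Because the comparison threshold loosens from *α*/*m* for the smallest *P*-value to *α* for the largest, the procedure is never less powerful than a plain Bonferroni correction at the same FWER.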

## Conditioning on the Trait Difference

Often, however, the size of the pool from which traits are picked is not known. Most traits from this pool remained unnoticed simply because they showed little difference between lines and were not recognized as interesting traits for investigation. Orr’s proposal (Orr 1998) for this case is to use, in place of (6), a statistic of states conditioned on the empirical trait difference between two lines, *i.e.*, to restrict the states to those giving rise to the observed . In so doing, the part of the evidence for selection that comes from the trait difference between two lines is discarded. Orr writes the trait difference as for the case of two lines. We generalize this notion to the case of *n* lines and denote the maximal trait difference across two lines , where the lines are ordered such that line 1 has the largest trait value and line 2 has the smallest trait value .

Our next step is to calculate the statistic of states conditioned on a particular value of *R*_{max}. This statistic then can be used in the log-likelihood score (8) in place of the neutral null model. Our calculation is based on the principle of maximum entropy. This general principle applies to situations with incomplete knowledge of the probability distribution of some variable *x*. This distribution must be consistent with any prior information on *x* we might have (*e.g.*, the mean value of *x*), but otherwise it should be as unbiased as possible. The principle of maximum entropy posits that the distribution that best describes the incomplete state of knowledge is the distribution that maximizes the information entropy with respect to , subject to the constraints resulting from prior information. Stated in this form first by Jaynes (1957), the principle of maximum entropy already appears at the core of statistical physics, where the distribution over configurations *x* of a physical system is constrained by the mean energy . The maximum-entropy distribution in this case turns out to be the Boltzmann (exponential) distribution , where *β* is determined by the mean value of the energy . Other applications of the principle of maximum entropy are in image reconstruction (Narayan and Nityananda 1986), language modeling (Berger *et al.* 1996), and neural networks (Mora and Bialek 2011). In the context of quantitative traits, the principle of maximum entropy and the associated calculus of exponential distributions have been used to estimate unobserved allele frequencies and to infer selection from trait observables (Prügel-Bennett and Shapiro 1994, 1997; Rattray 1995; Berg *et al.* 2004; Mustonen and Lässig 2005; Lässig 2007; Mustonen *et al.* 2008; Barton and De Vladar 2009; De Vladar and Barton 2011; Nourmohammad *et al.* 2013a, b). Here we use the principle of maximum entropy to derive the statistics of states conditioned on the largest trait difference . A pedagogical example is given in Appendix B.
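For the statistical-physics case mentioned above, the maximum-entropy calculation can be written out in a few lines. The notation in this sketch is generic and is an assumption of the sketch rather than the notation of the main text: $p(x)$ is the distribution over configurations, $E(x)$ the energy, $\langle E \rangle$ its prescribed mean, and $\lambda$ and $\beta$ are Lagrange multipliers.

```latex
\begin{aligned}
S[p] &= -\sum_x p(x)\,\ln p(x), \\
\mathcal{L}[p] &= S[p] - \lambda\Big(\sum_x p(x) - 1\Big)
                 - \beta\Big(\sum_x p(x)\,E(x) - \langle E\rangle\Big), \\
\frac{\partial \mathcal{L}}{\partial p(x)} &= -\ln p(x) - 1 - \lambda - \beta E(x) = 0
\;\Longrightarrow\;
p(x) = \frac{e^{-\beta E(x)}}{Z}, \qquad Z = \sum_x e^{-\beta E(x)} .
\end{aligned}
```

The multiplier $\lambda$ is absorbed into the normalization $Z$, and $\beta$ is fixed by requiring $\sum_x p(x)\,E(x) = \langle E \rangle$. The conditioning on a trait difference follows the same pattern, with the trait difference playing the role of the constrained observable and the entropy taken relative to the neutral null model.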

Starting from the neutral null model , we derived the neutral null model conditioned on the trait difference, *i.e.*, with an additional parameter *h* determining the value of . This distribution is obtained by maximizing the information entropy (11) with respect to . Here refers to the neutral null model . The sum over all possible states , , for a given locus again excludes the two unobserved states with . The maximization is subject to two constraints, implemented by Lagrange multipliers: to implement the normalization of and *h* to implement the constraint that the largest trait difference equals the expected value under (see Appendix B). Setting the derivative of the information entropy (11) with respect to equal to zero gives the state statistics of a locus with additive effect *a* and multiplicity parameter Ω as (12). The parameter *h* is set such that the mean trait difference under (12) (summed over all *L* loci) equals the trait difference observed in the data.

The maximum-entropy statistic conditioned on will be used to describe the statistic of states under neutral evolution and with ascertainment bias. The resulting log-likelihood score (13) compares evolution under selection and neutral evolution with ascertainment bias. This score depends on the ascertainment parameter *h*; extremizing the score with respect to *h* sets the expected value of the trait difference under the conditioned model equal to the trait difference observed in the data.

In the case of two lines, it turns out that the probabilities for the two observable states and , , are the same as for the selective model at equilibrium, (the multiplicity parameters cancel for ). When the score is maximized with respect to *h*, the statistics of states with ascertainment bias and under selection are exactly the same. As a result, the log-likelihood score comparing evolution under selection at equilibrium with the neutral statistic conditioned on the observed trait value is exactly zero. Hence, for two lines at equilibrium, it is not possible to statistically distinguish neutral evolution with ascertainment bias from the effect of selection.

A key difference between our log-likelihood score and Orr’s test is that Orr uses not only the empirically observed additive effects {*a*_{i}} available from crossing experiments but also additive effects drawn from a plausible distribution *P*(*a*). Orr’s test can appear to yield significant results when the trait difference *R* is calculated using the additive effects empirically determined from crosses but a different set of additive effects, drawn from some distribution *P*(*a*), is used for the *P*-value computations. Consistent with this, Rice and Townsend (2012b) found that the outcome of Orr’s test strongly depends on the assumptions made on that distribution and that the test can produce nonsensical results in particular cases.

This situation is fundamentally different for more than two lines. For more than two lines, the statistic of states in the selective scenario in equilibrium (12) differs from the neutral scenario, and the score (13) generally gives nonzero results both at equilibrium and at short evolutionary times. (However, again there is a particular selection scenario , , , that is not distinguishable from neutral evolution conditioned on .)

To test these different approaches to the multiple-testing problem, we examined a multiple-testing scenario in which a trait was picked from a larger set of traits. This multiple-testing scenario followed the lines of Anderson and Slatkin (2003). First, states were drawn at random for traits and three lines evolving neutrally. Then the traits were sorted according to the maximal trait difference across lines. The trait with the highest was tested for selection using selective scenario against the neutral scenario. We did this in three ways: by using the score (8) without a multiple-testing correction, by applying the Holm–Bonferroni correction assuming that the number of traits is known, and by conditioning on using (13). Repeating this procedure many times over, we computed the false-positive rate (type I error rate) for all three approaches (Figure 8). Second, we generated the statistic of states of one trait under the selective scenario and for the other traits under the neutral scenario . Then we determined how often the trait under selection was correctly identified by the different approaches (true-positive rate) with a *P*-value < 0.05 ( for Holm–Bonferroni). Figure 8 shows that, as expected, a test without correction yields the highest rate of true positives. Yet it also suffers from the highest false-positive rate because many neutrally evolving traits happen to have a high leading to a high score (8). The Holm–Bonferroni method and the conditioning on both have lower false-positive rates. This result for the conditioning on is in accord with that of Anderson and Slatkin (2003), who found that Orr’s test, which uses a similar correction scheme, also led to conservative test statistics. Because the false-positive rate of the Holm–Bonferroni method is the lowest, it is to be preferred when the size of the pool of traits is known.

While the maximum trait difference is a plausible observable on the basis of which traits can be selected from a larger pool, it is by no means the only one. For instance, with three lines, traits could, in principle, be selected based on the difference between the trait in line 1 and the trait mean in lines 2 and 3, . We would use this observable when looking specifically for traits with lineage-specific selection acting on line 1. For , the fitness (2) can be written as (14), where , and . The maximum-entropy distribution conditioned on is (up to a normalizing constant) and thus again differs from the equilibrium distribution , except in the special case .

## Selection on Plant Quantitative Traits

In this section we apply the multiple-line selection test to data from two studies of plant quantitative traits. Our first example is based on QTL data on corolla (petal) sizes in three different plant species of the genus *Mimulus*. *M. guttatus*, *M. platycalyx*, and *M. micranthus* are labeled lines 1, 2, and 3, respectively. At each locus detected by Chen (2009), it turns out that there are two alleles with very similar effect on the trait (within experimental error) and one allele with a significantly different effect. If there is a single high allele, we assign it the + state, while the two other alleles are assigned the − state, and vice versa. Additive effects for the states are computed by averaging the additive effects listed for different alleles over alleles corresponding to the same state. The resulting states and additive effects for the corolla width and corolla length traits are listed in Table 1.

Log-likelihood ratios for the pairwise comparisons of the evolutionary scenarios , , and are calculated as described in the section entitled *Inference and Hypothesis Testing for Different Evolutionary Scenarios*. These scenarios describe neutral evolution, neutral evolution in the presence of ascertainment bias, and lineage-specific selection, respectively. For each scenario, the multiplicity parameters and (in the case of scenario ) selection strength are calculated according to (7). Where applicable, we use the Bayesian information criterion described earlier to correct scores for different numbers of free parameters of the underlying models. This leaves *P*-values unaffected. When testing against a neutral scenario, we use either scenario (conditioning on *R*) or scenario (Holm–Bonferroni correction). In the first case, we condition the null model on the pair of lines with the highest trait difference for each trait. For the Holm–Bonferroni test, we make the ad hoc choice of *m* = 5 as the total number of traits in this data set because five different traits are analyzed in the QTL experiment in Chen (2009). However, this choice is artificial because we do not know the potentially much larger set of traits from which these five traits were chosen.

We start with the corolla width trait, where seven QTL have been identified along with their additive effects (Chen 2009). Comparing scenario against described by (12) gives a log-likelihood score (13) of in favor of the selective scenario. We test the significance of this score by repeated simulations under scenario at fixed additive effects . The ascertainment parameter *h* is set such that the conditioned neutral model gives, on average, the trait difference observed in the data. For each configuration drawn from , we sort the lines according to their trait values *T*. In this way, we account for the possibility that under neutrality, fluctuations create patterns of lineage-specific selection in any of the lines (rather than only in what is called line 1 here). A *P*-value of 0.13 is obtained.

The unconditioned test together with the Holm–Bonferroni correction yields a similar result. In testing of scenario against , the score corresponding to a *P*-value of 0.05 is obtained. With the Holm–Bonferroni correction, however, a more stringent *P*-value cutoff of less than for the family-wise error and has to be applied.

A preference for a selective model is in agreement with the different reproductive modes of these species (Chen 2009): line 1 reproduces predominantly by outcrossing (so that large floral characters are needed to attract pollinators), whereas lines 2 and 3 are mostly self-pollinating (but still maintain a certain degree of outcrossing). In the latter species, large petals are less indispensable for reproduction but nevertheless require resources to develop and maintain.

Next, we examine the corolla length trait, where six QTL were observed (Table 1) and the maximal trait difference is between lines 1 and 3. Here the comparison to the neutral null model yields the score (*P* = 0.54), so the neutral hypothesis cannot be rejected, and similarly for the Holm–Bonferroni procedure (, *P* = 0.14, implying a substantial family-wise error).

For comparison, we also apply Orr’s sign test (Orr 1998) (not the equal-effects version) to this data set. Because Orr’s test is a two-line test, we apply it to the two lines with the largest trait difference, where one would expect the strongest signal for selection. Following Orr, the additive effects are taken from a gamma distribution whose parameters for each trait are estimated by maximum likelihood. Then the probability of finding at least the observed number of + states in the high line given the observed trait difference *R* or greater is calculated according to equation (4) in Orr’s paper (Orr 1998). For the corolla width trait, five of six diverged loci in lines 1 and 2 have the + state. Here Orr’s test returns a *P*-value of 0.42. For the comparison of lines 1 and 3, the test gives *P* = 0.29. For the corolla length trait, four of five diverged loci are in the + direction between lines 1 and 3, and three of four diverged loci in the + direction between lines 1 and 2. Orr’s test yields *P*-values of 0.48 and 0.72, respectively.

Our second example is based on QTL data on photoperiod response traits of four different maize strains. The photoperiod response of a trait is defined as the trait difference observed between specimens grown in an environment with long days and specimens grown in a short-day environment. We consider the traits *days to anthesis* (the time from planting to full flower development) and *days to silking* (silk emergence in maize), both measured in growing degree-days (daily average temperature above a threshold temperature of 10° cumulated over days of growing). For comparison, we also look at plant height, which is not directly linked to day length. For maize, it has been shown that the architecture of quantitative traits such as flowering time and leaf size accurately follows a model with additive trait effects and only weak epistatic effects (Buckler *et al.* 2009; Tian *et al.* 2011). Coles *et al.* (2010) provided the additive effect of alleles from different QTL and the corresponding experimental errors. For each locus, it is specified which lines harbor an allele with the same effect on the trait (within experimental error). As in *Mimulus* earlier, most of the loci show alleles that have one of two experimentally distinguishable effects on the trait. In such cases, the + and − states can be unambiguously assigned to each line and locus, and the resulting values for and are collected in Table 2. Yet about a third of the loci show more than two significantly different effects on the trait or have one line where the experimental error on the effect on the trait is so large that it cannot be assigned unambiguously to one of the two states. Loci with such unclear assignment of states are excluded from the analysis.

Two of the lines in Coles *et al.* (2010) (B73 and B97) are taken from temperate climates featuring long days in summer and short days in winter, while the other two (CML254 and Ki14) are taken from tropical environments with constant day lengths over the year. Thus we use as the simplest evolutionary scenario (, , , and ), with only a single free parameter *s*. We compare this selective scenario against the null model from (6) with .

We first consider the *growing degree-day to anthesis* (GDDTA) trait, which measures the time to full flower development. For tropical lines, which are not adapted to long day lengths, the flowering time is reduced for specimens grown in temperate latitudes compared to tropical environments (Coles *et al.* 2010). For the temperate lines, no difference in flowering time is observed between the different environments. For this trait, four of seven loci show a clear two-state pattern. We first apply scenario conditioned on . In this case, the straightforward maximum-likelihood estimate of the parameter *h* fails because all states in the high line are + states and all states in the low line are − states, leading to a diverging . We use a lower-bound estimate for *h* by determining the value of *h* for which the probability of seeing this extreme configuration equals . is chosen to obtain a conservative estimate for *h*. For consistency, is determined in the same way. The log-likelihood score (8) then gives (*P* = 0.07) in favor of the selective scenario. The Holm–Bonferroni correction yields a result consistent with this (, *P* = 0.045).

For the *growing degree-day to silking* (GDDTS) trait, with four two-state loci of six, the score (*P* = 0.048) favors the selective scenario over the neutral null model as well. Again, is conditioned on , and the lower bound for *h* is used as described earlier. Using the Holm–Bonferroni correction, one obtains a similar result (, *P* = 0.030). The *plant height* trait, however, with four two-state loci of six, yields the scores (*P* = 0.42) under conditioning and (*P* = 0.022) with the Holm–Bonferroni correction in favor of the neutral model. Here *h* was again determined by maximum likelihood, and the conditioning was on . The other traits investigated in the study (Coles *et al.* 2010) (*i.e.*, growing degree-day anthesis–silking interval, ear height, and total leaf number) have fewer two-state loci (≤3), and none of these traits shows significant support for either of the two hypotheses (data not shown).

Again, we also apply Orr’s test for comparison. We compare the two lines B73 and CML254, which show the largest trait difference both in the GDDTA trait and the GDDTS trait. For the GDDTA trait, six of six diverged loci have the + state, giving a *P*-value of 0.13. For the GDDTS trait, five of five diverged loci go in the + direction with *P* = 0.2. A summary of the results can be found in Table 3.

In both case studies, the statistical significance of the evidence for a particular evolutionary scenario is limited by the number of identified trait loci. With a higher number of crosses in the original studies, identifying more trait loci, we expect a stronger statistical signal.

## Conclusions

In this paper we developed a statistical framework to quantify the evidence for different evolutionary scenarios from QTL data for more than two lines. We find that using more than two lines not only increases the statistical power of selection tests but also increases their scope: for more than two lines, signals of selection can be distinguished from the effects of ascertainment bias. We applied our test to QTL data on floral characters in different *Mimulus* species and photoperiod response traits in maize.

Applying our test to very large numbers of lines poses interesting challenges in connection with the number of alleles per locus and the rapid growth of the number of possible evolutionary scenarios. At the same time, the need for experimental crosses between three or more different lines is a major bottleneck of the multiple-line test. Because of the additional experimental work involved, there are currently few data sets on QTL and their additive effects in more than two lines. However, recent studies employing crosses of 25 maize lines and detecting around 30–40 QTL per trait offer a promising outlook for the future (Buckler *et al.* 2009; Tian *et al.* 2011).

A possible application of this test is the inference of gene expression adaptation using expression QTL (eQTL) (Fraser 2011). Because the number of eQTL is typically small for a single gene, the test could be applied on gene modules, *e.g.*, genes belonging to the same pathway or protein complex, allowing one to infer selection on individual pathways. Another future perspective for this method may arise if genome-wide association studies (GWAS) with fully sequenced organisms enable the inference of causal mutations behind QTL effects (Mackay *et al.* 2009; Manolio *et al.* 2009), allowing one to apply multiple-line tests without the need to perform crosses between different lines (Fraser 2013).

## Acknowledgments

We gratefully acknowledge discussions with Andreas Beyer, Daniel Barker, Mathieu Clément-Ziza, Sinéad Collins, and Michael Nothnagel. This work was supported by the Deutsche Forschungsgemeinschaft under grant SFB 680.

## Appendix A: Short-Time Dynamics

The statistics of states (6) were derived in the limit of long evolutionary times (equilibrium). In general, the statistics of states depend on the lengths of branches of the phylogenetic tree (which we assume to be known). In this appendix we derive the statistics of states in the limit of short evolutionary times and derive the corresponding log-likelihood score. At short evolutionary times, at most one mutation changing the state has fixed at each locus and across the phylogeny.

Again, we consider loci that are monomorphic in each population and identical initially. Then a mutation appears in one population and (with a certain probability) is fixed. The fixation probability depends on fitness, so the relative frequencies of such events at different loci allow in principle the inference of selection. Such short evolutionary times are characterized by ; nevertheless, the total number of diverged loci, characterized by (where *n* is the number of lines and *L* is the total number of mutable loci affecting the trait), still must be at least of order 1. Because our observable is the relative number of times mutations have fixed in one particular line (relative to other lines), the total number of mutable loci does not enter the statistics of states. In the regime of short evolutionary times, the ancestral states of the loci and the phylogeny of the lines affect the statistics of states, so general results for *n* lines are unwieldy. Here we compare the cases of and .

We start with the case of two lines and consider a locus where one line has undergone a single change of state since the last common ancestor. This change can occur in either line; the relative probabilities for the change to occur in a particular line equal the relative rates at which the transition between states occurs in the two lines. The transition rates between states (substitution rates) in a given line are

for the transition from *c* = − to *q*_{i} = + and *c* = + to *q*_{i} = −, respectively (Kimura 1962) (the factor 4 comes about because the phenotype changes by during the transition). In general, the mutation rates (from − to +) and (from + to −) will be different. Yet, for the relative probabilities of a mutation in one of the lines given the ancestral state, a difference in mutation rates does not play a role because both lines start with the same ancestral state. This leads to the probability for the transition to occur in line *i* (A2), where we define the shorthand . (*i* = 1, 2) is the selection strength on the trait in line *i*. Given two lines, both final configurations and can be reached from either ancestor .

If the ancestral states are unknown, one can average over both possible ancestors. Writing the probability of ancestral state *c* as and relative rates as , the dependence on the multiplicity parameter drops out again, and we obtain (A3). We have assumed that the distribution of states in the ancestral line has reached equilibrium under some selection strength , which will be inferred by maximum likelihood.

Considering three lines, four of the six possible diverged configurations can be assigned a unique ancestor: denoting line 3 as the outgroup (Figure A1), configurations and diverged from the ancestral state and configurations and from ancestor . Configurations and can be reached either by a mutation in the ancestor of lines 1 and 2 or by a mutation in line 3. One can write the relative probabilities of the six state configurations excluding as (A4), where the times and account for the different branch lengths of the phylogenetic tree (Figure A1). With these relative probabilities, the statistics of states in the three lines are (A5), where we define the shorthand , and denotes the selection strength of the line with the minority state [*e.g.*, for the configuration ], and . Again, the two states with are excluded from this sum.

Analogous to the equilibrium case, the statistics of states (A5) for different hypotheses and and so on enter a log-likelihood score of the form (8). To compare the resulting tests under different evolutionary scenarios, we perform numerical simulations at short evolutionary times, as in the section entitled *Testing for Selection at Different Evolutionary Times*. No knowledge of the ancestral states is assumed. Under the selective scenario , we find that the statistical power of the short-time test on three lines on short-time data is somewhat lower than that of the three-line equilibrium test applied to data for long evolutionary times at the same number of diverged loci (Figure A2) but still allows us to detect selection. However, for two lines, the test under conditioning on gives hardly any significant results (Figure A2), while the conditioning for three lines, as well as the Holm–Bonferroni correction for two and three lines, allows selection to be inferred in a reasonable parameter range.

## Appendix B: Pedagogical Example for the Maximum-Entropy Principle

Here we give a simple concrete example to illustrate the link between ascertainment bias and the maximum-entropy principle. Consider a uniform distribution on the interval , from which 10 numbers are drawn independently (Figure B1, left). If one repeatedly draws such sets of 10 numbers, the sum over each set will fluctuate from set to set with a mean value of 5. In the next step, we only retain those sets whose sum is close to some value of . The numbers in these sets follow a nonuniform distribution, and for , we find that larger values *x* appear with a higher probability than under the uniform distribution (Figure B1, right). Although each of these numbers was drawn originally from the uniform distribution, retention of sets with a particular mean value introduces a bias in the observed distribution of *x*. This is the ascertainment bias induced by conditioning on the sum of each set. The principle of maximum entropy allows us to determine the exact form of this biased distribution . We maximize the relative information entropy between the distribution and the original (uniform) distribution for

subject to the constraints (B2), where is the size of each set. Here the first constraint ensures the normalization of , and the second constraint fixes the mean value of *x* to . Introducing Lagrange multipliers to maximize (B1) subject to the constraints (B2) leads to maximization of (Jaynes 1957) (B3) with respect to . Differentiating (B3) with respect to *p* and setting the derivative to zero give (B4). Ascertainment bias thus makes *x* exponentially rather than uniformly distributed, with coefficients and determined by the constraints (B2). For and , we obtain and ; the result for shown in Figure B1 agrees perfectly with the histogram of numbers in sets with a constrained sum.
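The example above can be reproduced numerically. The following is a minimal sketch under stated assumptions: the uniform distribution is taken on the unit interval (consistent with a mean sum of 5 for 10 numbers), and the target sum of 6, the tolerance, the number of sets, and all function names are hypothetical choices made only for illustration. The maximum-entropy density on the unit interval is taken to be proportional to exp(*βx*), with *β* fixed by the constrained mean.

```python
import math
import random


def maxent_mean(beta):
    """Mean of the density p(x) proportional to exp(beta * x) on [0, 1]."""
    if abs(beta) < 1e-6:
        return 0.5  # uniform limit
    return -1.0 / math.expm1(-beta) - 1.0 / beta


def maxent_beta(target_mean, lo=-50.0, hi=50.0):
    """Solve maxent_mean(beta) = target_mean by bisection (mean increases with beta)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if maxent_mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def maxent_upper_half(beta):
    """Probability of x > 1/2 under p(x) proportional to exp(beta * x) on [0, 1]."""
    return (math.exp(beta) - math.exp(0.5 * beta)) / math.expm1(beta)


def conditioned_sample(target_sum, n=10, tol=0.05, n_sets=50000, seed=0):
    """Draw sets of n uniform numbers and keep only sets whose sum is near target_sum."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_sets):
        xs = [rng.random() for _ in range(n)]
        if abs(sum(xs) - target_sum) < tol:
            kept.extend(xs)
    return kept


if __name__ == "__main__":
    kept = conditioned_sample(target_sum=6.0)
    beta = maxent_beta(0.6)  # constrained mean = target sum / set size
    empirical = sum(x > 0.5 for x in kept) / len(kept)
    print(f"beta = {beta:.3f}")
    print(f"P(x > 1/2): retained sets {empirical:.3f}, max-entropy {maxent_upper_half(beta):.3f}")
```

For sets of finite size the exponential form is an approximation that becomes exact as the set size grows, so small deviations between the histogram of retained numbers and the maximum-entropy density are expected in such a simulation.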

Suppose that we did not know whether the original distribution from which the data were drawn was uniform or not and we had access only to data subject to the known constraint. If the distribution of the empirical data deviates from or agrees with the maximum-entropy distribution , then this deviation or agreement could be used to quantify the likelihood that the original data came from the uniform distribution (*vs.* an alternative hypothesis). We follow the analogous approach with the score (B1) to tell whether a particular statistic of states more likely comes from neutral evolution in combination with ascertainment bias (*vs.* an alternative scenario involving selection).

Finally, we sketch the derivation of the equilibrium statistics of states , which also follow an exponential form (Iwasa 1988; Berg *et al.* 2004; Sella and Hirsh 2005). For a finite population evolving under genetic drift and selection at low mutation rates, Kimura (1962) gives the rate at which a mutation appears and spreads to fixation as

where is the fitness difference relative to the preexisting allele, and *μ* is the mutation rate. This rate obeys an exact relationship for forward and backward mutations (detailed balance). Approximating by *N*, the equilibrium distribution over alleles is then ∼exp{2*NF*} (Van Kampen 2007), where *F* is the fitness function of alleles. Grouping together alleles corresponding to the same state of a locus yields (4). When *F* is linear in the states of loci, the corresponding probability distribution factorizes over loci.
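The detailed-balance relation can be checked numerically. The sketch below assumes the standard haploid form of Kimura's substitution rate, *μN*(1 − e^{−2Δ*F*})/(1 − e^{−2*N*Δ*F*}), which is consistent with the exp{2*NF*} equilibrium quoted above but is not copied from the expression in this appendix. The function name and parameter values are hypothetical.

```python
import math


def substitution_rate(delta_f, n, mu=1.0):
    """Rate at which a mutation with fitness difference delta_f appears and fixes.

    Assumed form: mu * N * (1 - exp(-2 dF)) / (1 - exp(-2 N dF)).
    """
    if abs(delta_f) < 1e-12:
        return mu  # neutral limit: N new mutants per unit time, each fixing with probability 1/N
    # expm1 keeps the ratio accurate for small fitness differences.
    return mu * n * math.expm1(-2.0 * delta_f) / math.expm1(-2.0 * n * delta_f)


if __name__ == "__main__":
    n, delta_f = 100, 0.01
    ratio = substitution_rate(delta_f, n) / substitution_rate(-delta_f, n)
    print(ratio, math.exp(2.0 * (n - 1) * delta_f))
```

Under this assumed form, the ratio of forward to backward rates equals exp{2(*N* − 1)Δ*F*} exactly, which becomes exp{2*N*Δ*F*} once *N* − 1 is approximated by *N*.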

## Footnotes

*Communicating editor: J. B. Wolf*

- Received December 9, 2014.
- Accepted June 18, 2015.

- Copyright © 2015 by the Genetics Society of America