## Abstract

Selection, mutation, and random drift affect the dynamics of allele frequencies and consequently of quantitative traits. While the macroscopic dynamics of quantitative traits can be measured, the underlying allele frequencies are typically unobserved. Can we understand how the macroscopic observables evolve without following these microscopic processes? This problem has been studied previously by analogy with statistical mechanics: the allele frequency distribution at each time point is approximated by the stationary form, which maximizes entropy. We explore the limitations of this method when mutation is small (4*N*μ < 1) so that populations are typically close to fixation, and we extend the theory in this regime to account for changes in mutation strength. We consider a single diallelic locus either under directional selection or with overdominance and then generalize to multiple unlinked biallelic loci with unequal effects. We find that the maximum-entropy approximation is remarkably accurate, even when mutation and selection change rapidly.

MOST traits of interest have a complex genetic basis depending on very many loci. Quantitative genetics gives a sophisticated statistical description of the components of trait variance that can predict the immediate change due to selection. The present abundance of genetic markers allows us to find some of the loci that affect traits, but such QTL typically account for only a small fraction of the genetic variance (Hill and Kirkpatrick 2010; Yang *et al.* 2010). While we may be able to predict breeding values and estimate the distribution of effects, it does not seem possible—even in principle—to identify the individual alleles responsible for the bulk of heritable variance. Thus, we cannot hope to predict the evolution of quantitative traits by using a direct population genetics approach based on the frequencies of each individual allele.

Here we develop a general method that allows us to closely approximate the evolution of quantitative traits knowing only the distribution of allelic effects and mutation rates but without requiring knowledge of individual allele frequencies. This can be seen as an extension of the classical infinitesimal model to include arbitrary gene interactions and the effects of selection, mutation, and drift on the genetic variance. It also can be viewed as a generalized version of the quasi-steady-state assumption (QSSA) that is often made in dynamical reaction systems (Segel and Slemrod 1989; Goeke and Walcher 2013) to noisy systems described by partial differential equations (PDEs), where the dynamics are approximated using a quasi-stationary distribution assumption (QSDA); here we use the maximum-entropy (MaxEnt) principle to define that distribution.

In physics, the MaxEnt principle has a long history, starting with the seminal work of Jaynes (1957), who interpreted the Boltzmann distribution of statistical physics as the most random distribution subject to a constraint on fixed average energy. In the recent decade, there has been a resurgence of interest in MaxEnt, especially when applied to biophysics problems ranging from the statistics of neural spiking (Schneidman *et al.* 2006; Tkačik *et al.* 2014), bird flocking (Bialek *et al.* 2012), protein structure (Weigt *et al.* 2009), and immunology (Mora *et al.* 2010). These approaches have been generalized to describe temporal dynamics of high-dimensional systems, known collectively as *maximum-caliber* or *dynamical/kinetic maximum-entropy* models (Pressé *et al.* 2013), where the entropy of distribution over temporal paths is maximized given constraints on dynamical variables. Surprisingly, however, MaxEnt distributions have not been used widely as a variational ansatz for cases discussed in this work, where the evolution equation for the distribution might be known *a priori*, *e.g.*, as is the case with the diffusion approximation in population genetics. In such a case, the approach differs from the maximum-caliber methods because it involves a combination of a static MaxEnt ansatz for the stationary microscopic distribution with a quasi-stationary assumption in the diffusion/Fokker-Planck equation, which together extend the static MaxEnt inference to a dynamical approximation.

Prügel-Bennett and Shapiro (1997) and Rattray and Shapiro (2001) introduced the MaxEnt approximation to the dynamics of polygenic systems, predicting the cumulants of the trait distribution under mutation, selection, and drift; their main motivation was to understand evolutionary algorithms rather than natural populations. The same method was described independently in physics (Plastino *et al.* 1997) and used to approximate cosmic-ray fluxes (Hick and Stevens 1987).

However, neither of the two applications of MaxEnt was taken up in their respective fields. Barton and de Vladar (2009) modified Prügel-Bennet, Rattray, and Shapiro’s method so that it could be justified from population genetics principles. With this modification, it gives the stationary distribution exactly and is accurate in the limit of slowly changing conditions. Numerical calculations showed that it is remarkably accurate, even when selection or mutation changes abruptly. However, the method is fully valid only when mutation is stronger than drift (4*N*μ > 1), so populations are almost never fixed for one or other allele. Yet, in nature, mutation is typically weaker than drift, so most sites are fixed; in this case, Barton and de Vladar’s approximation applies only in cases where mutation or population size does not change with time.

We begin by summarizing the stationary MaxEnt approximation and its extension to nonstationary problems. We then extend the MaxEnt approximation so that it applies over the full range of mutation rates and test the accuracy of this approximation for directional selection and for balancing selection that favors heterozygotes. This extension is a combination of the continuous approach of Barton and de Vladar (2009) and a special treatment of the dynamics at the boundaries. A similar approach, where the boundaries have to be treated differently, has been used in the semidiscrete, semicontinuous methods studied in reaction-diffusion systems (Flegg *et al.* 2011; Robinson *et al.* 2014) and also in traveling fitness waves, where the fluctuations can be introduced to the model using a cutoff function (Tsimring *et al.* 1996; Hallatschek 2011). We initially consider the distribution of allele frequencies at a single locus and then extend consideration to multiple loci with a distribution of effects. Throughout, we assume that recombination is fast relative to other processes so that the population is in linkage equilibrium and can be described by its allele frequencies.

## Dynamics of Allele Frequencies

The dynamics of allele frequencies (for biallelic loci) can be described by a diffusion process using a deterministic forward Kolmogorov equation (*i.e.*, the Fokker-Planck or diffusion equation). The evolution of the joint probability density of allele frequencies satisfies (1)where the number of loci that contribute to the trait is *L*.

The second term of (1) equals and captures stochasticity of the allele frequencies arising from random sampling, *i.e.*, the random drift. While in the case of linkage disequilibrium this term would contain a double summation, reflecting correlations between loci, the off-diagonal terms vanish at linkage equilibrium. (The factors of 2 in the brackets of (1) arise because we assume a diploid population of *N* individuals; the corresponding haploid model would be the same, apart from these factors.)

The first term of (1) captures deterministic changes of allele frequencies. We consider (2)where β is the strength of directional selection, *h* denotes a higher-order correction that captures dominance, μ and ν are the forward and backward mutation rates, and and η_{i} are the additive effects of the *i*th locus on the traits under selection. can be written in a potential form where the potential , obtained by inverting this relationship, reflects effects of selection and mutation: (3)with quantities , *U*, *V*, and *H* defined as (4) (5)where , and and η_{i} are the effects of loci on the traits and *H*, respectively.

Selection, captured by the mean fitness , may be an arbitrary function of the distribution of quantitative traits, which, in turn, may be an arbitrary function of the *n* allele frequencies. The log mean fitness may be further decomposed into a sum in which the represent various selection coefficients and various selected traits. In the case of dominance, the terms associated with selection are , where and . Here we assume a weak selection such that .

The deterministic effects on allele frequency can be summarized into a vector of coefficients **α** and a vector of complementary quantities **A**. We study directional selection and dominance with nonsymmetrical mutation and define **α** and **A** as (6)In the sections that follow, we show that **A** and **α** can be understood as constraints and corresponding Lagrange multipliers, respectively, of a particular variational problem.

For and , this represents the simplest scenario of directional selection with symmetrical mutation. Directional selection of strength β acts on a trait , assumed to be additive, while selection of strength *h* acts on heterozygosity *H*. In this work, we consider unequal effects on the trait but equal effects on *H*. This can be easily extended to distribution of effects . A wide variety of other models can be treated in the same way. For example, de Vladar and Barton (2011) studied stabilizing selection on an additive trait.

This diffusion process is known to be an accurate continuous-time approximation to a wide range of specific population genetics models (Kimura 1955a; Ewens 2012); moreover, it corresponds directly to the coalescent process that describes the ancestry of samples taken from the population (Wakeley 2008). In order to represent the population in terms of allele frequencies, we must assume that linkage disequilibria are negligible, which will be accurate if recombination is sufficiently fast. For simplicity, we also assume two alleles per locus.

The stationary distribution of (1) has the form (Wright 1931; Kimura 1955a) (7)where ℤ is a normalization constant, also called a *partition function*. This distribution falls to zero at the boundaries (*p =* 0, 1) provided that 4*N*μ > 1 and 4*N*ν > 1. However, when mutation rates fall below this threshold, the distribution develops singularities at boundaries (if 4*N*μ or 4*N*ν is small, the singularity occurs at *p =* 0 or *p =* 1, respectively), even though the density function is still integrable.

## MaxEnt in Equilibrium Quantitative Genetics

The stationary distribution (7) can be derived from a variational MaxEnt principle. The key assumption is that selection and mutation act only through a set of observable quantities, which can be arbitrary functions of allele frequencies ; the strength of selection and mutation is given by the corresponding set of . Together these define the potential function .

We can define an entropy, relative to a reference measure , as (8)which has a unique maximum at the reference distribution; the entropy is (minus) the Kullback-Leibler divergence from . The key choice is to set the reference distribution as the neutral distribution of allele frequencies in the absence of mutation or selection: (9)Note that is not integrable, but it does provide the neutral probability distribution given that the allele is not fixed and yields the stationary distribution under mutation, selection, and drift when we maximize *S*_{H} subject to a normalization constraint and constraints . The latter condition enforces a constraint on the ensemble averages . These macroscopic quantities represent information that, in principle, could be observed. We refer to as observables, even though this does not necessarily mean that their values over time are known. The constrained maximization of entropy is solved by a method of Lagrange multipliers; for details, see Appendix A. For the example in (6), the constraints include

– normalization constraint,

with Lagrange multiplier 2

*N*β,with Lagrange multiplier 2

*Nh*,with Lagrange multiplier 2

*N*μ, andwith Lagrange multiplier 2

*N*ν.

The normalization condition sets the total probability of the allele frequency distribution to 1 and introduces the partition function ℤ as a constant multiplier in (7).

This variational principle recovers the stationary distribution of the diffusion equation (1); the 2*N* times the mutation rates and selection coefficients can be thought of as the Lagrange multipliers. Iwasa (1988) introduced the same entropy measure but used it in a slightly different way: he showed that the sum of the potential function and S_{H}/2*N* define a free fitness that increases over time, just as in thermodynamics the free energy increases over time. Further connections to thermodynamics have been studied by Sella and Hirsh (2005) for a special case of very small mutations, where most of the alleles are fixed, and in Barton and Coe (2009), including the novel entropy term (analogous to our *U* and *V*) that involves the effects of mutation.

Note that **A** and **α** have a specific meaning both in quantitative genetics and in statistical physics. In quantitative genetics, **A** characterizes properties of a quantitative trait whose means can be observed () and that evolves in response to the evolutionary forces **α**. In statistical physics, **A** and **α** represent conjugate pairs of thermodynamic variables, which can be interpreted as the constraints and Lagrange multipliers in the variational MaxEnt problem, commonly encountered when microscopic states of the system are unobserved but its macroscopic features are known.

Note that there is some flexibility in the choice of the reference distribution , where different choices may lead to the same stationary distribution. For instance, one may take the neutral distribution that involves mutation terms while omitting the constraints on *U* and *V* and assuming that μ and ν are functions of time. In this way, the reference distribution would be normalizable. Because these approaches are equivalent, one can view the constraints on *U* and *V* as conditions that regularize the allele frequency distribution.

## Dynamic MaxEnt Approximation

Our aim is to approximate the dynamics of a high-dimensional system by a small number of variables, which include the quantities that determine fitness. We approximate the real distribution of allele frequencies by the stationary distribution obtained by the MaxEnt method with a small number of constraints and use it as an ansatz in the diffusion equation. This leads to effective dynamical forces **α*** that yield the correct dynamics for the observed quantities. The assumption that the population is perturbed only through the forces **α** is crucial to the success of our approximation. If we could manipulate individual allele frequencies in an arbitrary way, then the long-term evolution would become essentially unpredictable: alleles that are initially rare could increase to cause arbitrary changes as they eventually rose to appreciable frequency (Barton and de Vladar 2009) (Figure 1). The overall strategy of the dynamical MaxEnt (DynMaxEnt) approximation is summarized in Table 1, while the terminology from statistical physics and quantitative genetics is provided in Table 2. Various approximate methods, discussed in our work, are summarized in Table 3.

We first describe the continuous DynMaxEnt method, proposed in Barton and de Vladar (2009), which requires a sufficient number of mutations in every generation. However, a discrete approximation, also used in Barton and de Vladar (2009) and described in Appendix C, is applicable when the mutation rate is small and selection is limited. The dynamics are then formulated in terms of fixed classes of alleles. However, the discrete approximation is not accurate unless the mutation rate is very small, and even then it has a limitation when (Appendix C). Similarly, the continuous method fails for 4*N*μ < 1 (Appendix D). We compare the performance of these methods in Appendix G and find that while the discrete method applies to a very small mutation rate and the continuous method to a large mutation rate, the intermediate regime is not captured by either of them. This is similar to the result of Mustonen and Lässig (2007, 2008), who studied fitness waves in the problem of fluctuating selection and introduced a novel approximation that is accurate for small selection timescales. Figure 2 in Mustonen and Lässig (2008) also shows an intermediate regime in which neither the diffusion theory nor the novel approximation is accurate. In this work, we present a *general DynMaxEnt* approximation that is applicable in all regimes. This approximation is compared to the numerical solution of the diffusion equation. Instead of using individual-based simulations, which are computationally demanding, we consider allele frequencies to hold biologically relevant values , for , and forward iterate an explicit transition matrix, consistent with the diffusion equation. This approach, provided in a Mathematica notebook (Supplemental Material, File S1), is also feasible when a moderate number of loci have different effects on the trait of interest.

### Continuous DynMaxEnt approximation

Any set of forces **α** will cause the population to evolve to a stationary distribution ; this is the distribution that maximizes entropy subject to constraints on , the being the Lagrange multipliers. Now suppose that the forces change abruptly, from to , and no further information about the system is provided. The expected observables will evolve toward the new stationary distribution. At any time, there will be a set of forces **α*** that would produce the current if the population were stationary; we expect that the **α*** will evolve from to as the population evolves from one stationary state to the other. Thus, we can describe the dynamics either by the change in or, equivalently, by the change in **α***.

Under the diffusion approximation, the expectations change as (10)where (11)[Equations 13 and 14 of Barton and de Vladar (2009); note that in their Equation 13 the expectation should be taken over the whole equation, not inside the derivatives as typed]. The expectations that appear on the left-hand side of (10) are not the same as the ones on the right; therefore, the system is not closed. We now introduce the continuous DynMaxEnt approximation, namely, that and in the dynamically changing system are approximated by the values that they would have at the corresponding stationary state that generates the actual ; the stationary distribution coincides with the MaxEnt distribution. If the population were at a stationary state under the forces **α***, chosen to produce the current expectations , then there would be no change: (12)where the asterisks denote values at the stationary distribution. The approximation has form , which gives Equation 15 of Barton and de Vladar (2009) as (13)It may be more convenient to follow the rates of change in **α***, which can be written in terms of the covariance of fluctuations in the **A**. Using matrix notation, (14)where (15)with an initial condition **α***(0) = **α**_{0} and **α** = **α**_{1}. The difference represents the change in evolutionary forces. Because the matrices **B*** and **C*** depend only on the effective forces **α***, as shown in Appendix B, the dynamical system for **α*** is closed. A detailed derivation of the DynMaxEnt method under more general conditions can be found in Appendix E.

Intuitively, one may assume quasi-stationarity in (10), provided that the evolutionary forces **α** change on a slower timescale than the timescales of selection (1/β), mutation (1/μ), and random drift (2*N*). Then the adiabatic approximation in (14) should be accurate not only for the predicted observables but also for the microscopic distribution. However, we will show that even when the evolutionary forces change abruptly, *i.e.*, when is not small, the approximation remains accurate—even though there is no guarantee that the inferred microscopic distribution agrees with the correct one.

The matrix **B** can be seen as a generalization of the additive genetic covariance matrix, where the corresponds to (twice) the marginal effects of the *k*th allele on *A*_{i}. The MaxEnt approximation consists of assuming that this matrix is approximately what one would obtain at equilibrium with the current . Thus, (13) is an extension to the ”breeder’s equation” (Lynch and Walsh 1997), which allows for quantities that can be any function of allele frequencies—not just trait means—and that allows for random fluctuations, mutational bias, and nonadditive selection.

The DynMaxEnt method can be contrasted with the maximum-caliber method, reviewed in Pressé *et al.* (2013). DynMaxEnt uses static observables to infer the correct stationary allele frequency distribution but allows the Lagrange multipliers to change over time in accordance with the known diffusion equation, ensuring that the observables are correct at all times. However, the maximum-caliber method uses constraints on temporal characteristics to arrive at a distribution over the allele frequency paths with constant values of Lagrange multipliers. DynMaxEnt is suitable for our problem because it only assumes knowledge of initial and changed evolutionary parameters and no further information on the properties of the allele frequency paths.

### General DynMaxEnt approximation

When numbers of mutations are small (*i.e.*, 4*N*μ < 1 and 4*N*ν < 1), we face a problem of diverging components in the continuous DynMaxEnt approximation as a result of a U-shaped allele frequency distribution (for simplicity, we will consider a single locus). If , this divergence is caused by diagonal elements of matrix **B** that correspond to *U* and *V*, in particular, for 4*N*μ < 1 and for 4*N*ν < 1 (see Appendices B and D for more detail). Therefore, the continuous DynMaxEnt approximation fails completely in a regime of dynamic selection and mutation when 4*N*μ < 1 or 4*N*ν < 1 simply because the right-hand side of the dynamical system (14) is ill defined. The breakdown of the continuous DynMaxEnt method, when the number of mutations are small (*i.e.*, for small populations), is not a numerical problem but a fundamental limitation of the method itself. However, it can be avoided by considering a modified diffusion problem that does not aim to resolve all details of the allele frequency distribution close to the fixation and loss but instead agrees with the original problem in terms of the probability that the allele frequency is extreme.

We define the boundary layers as and for but finite. The value of the truncation parameter ε is discussed later, but typically, . We then replace the original diffusion equation with solution by a new system of PDEs with solution that agrees with the true dynamics in the following properties:

The stationary distribution in the bulk is the same for both problems: , for .

The stationary probabilities of extreme allele frequencies are the same for both problems:

and

to the lowest order in ε.

As the truncation parameter goes to 0, the problem converges to the original diffusion equation;

*i.e.*, it develops singularities at the boundaries: .

Replacing the original diffusion equation with a set of coupled diffusion equations in different regions of the state space captures an important characteristic of the problem: the presence of multiple timescales in the allele frequency dynamics. When the system is perturbed from stationarity by a change in the Lagrange multipliers (selection, mutation, heterozygosity), *e.g.*, by a dramatic drop in number of mutations, the correct distribution very quickly adjusts to the form at the boundaries. Only then does the mass in the interior slowly transfer to the vicinity of the fixed states to converge to the new stationary distribution. This results in a very fast dynamics of and and a considerably slower dynamics of the trait mean and the mean heterozygosity.

DynMaxEnt can capture these multiscale features only if it can incorporate a low and changing rate of mutation. The quick initial adjustment of the mutation rates is then naturally followed by a slow dynamics of the trait mean and heterozygosity because the speed of the transfer of the mass between the fixed states is limited by the infrequent rate of mutation. Similarly to other observables, the boundary masses also can be expressed as means of functions of allele frequency and thus treated as additional observables. The appropriate function is a characteristic function where (16)Splitting the problem domain augments the degrees of freedom by and , which control the boundary dynamics independently of the bulk dynamics and the corresponding Lagrange multipliers and . The values of κ and ρ will be determined later. The general DynMaxEnt method, derived in Appendix E and summarized next, provides a way to couple the boundary dynamics with the bulk dynamics to account for the multiscale features and resolve the technical problems of the continuous method while keeping the same number of degrees of freedom as the continuous DynMaxEnt.

We first split the allele frequency domain into the bulk part and the boundary part and define the diffusion equation separately in the three regions in Appendix E. We couple the equations by a boundary flux that is consistent with the original diffusion equation, thus leading to the same probability mass at the boundaries as in the continuous DynMaxEnt method. The stationary distribution of the problem (E4) has the form (17)with , where ℤ is the normalization constant, and the relative masses in the three regions are determined by constants κ and ρ. The stationary distribution (17) is not generally continuous at . However, it can still be obtained by maximizing a relative entropy with a bounded base distribution (18)complemented by the following constraints on and Lagrange multipliers 2*N***α**: (19) (20)where instead of *U* and *V* that diverge at the boundaries, we take their truncation to : (21)The two remaining parameters, κ and ρ, are matched to satisfy conditions 2 from the preceding list, leading to (22)This relationship ensures that the approximate stationary distribution has the same proportion of the mass at the boundaries to the stationary solution of the original diffusion. Generalized to multiple loci, the MaxEnt distribution has the form (23)with ℤ such that . We remark that the reference distribution is the stationary distribution in the absence of selection and mutation. Therefore, , where is the state-dependent diffusion coefficient:

The diffusion equation in the split domain can be used to derive a new DynMaxEnt approximation for arbitrary mutation strengths. This is done in a manner similar to the continuous DynMaxEnt method using the ansatz (17) and a quasi-stationarity assumption. The derivation, detailed in Appendix E, leads to the same dynamics as the original method: (25) (26)where (26) is a closed dynamical system for . However, since constant prefactors in the stationary distribution (23) depend on mutation rates, the matrix contains additional terms: (27)where the subtracted terms form a square matrix of the same dimension as . Its leftmost columns, denoted by , represent a contribution to the terms . Because the stationary distribution (23) depends on the selection-related Lagrange multipliers only through , in a similar manner to the continuous DynMaxEnt method, all added terms are zero. In contrast, the presence of the mutation-related Lagrange multipliers in (23) as the scaling factors results in additional terms added to and , forming the remaining two columns in (27). The components of the matrices and are functions of effective forces only, similar to the continuous DynMaxEnt approximation. The difference is that the expectations involve integrals through parts of the domain (bulk part or boundary parts), and thus the terms have to be evaluated numerically.

## Applications of the General DynMaxEnt Method

In the following, we show the performance of the general DynMaxEnt method in the parameter regimes when the continuous DynMaxEnt method is not applicable because the dynamics (14) become singular. These examples involve dynamic selection and mutation in the regime . To illustrate the method step by step, we provide a pedagogical Mathematica notebook with the code for the general DynMaxEnt method (File S1).

Performance of the DynMaxEnt approximation for different scenarios is compared to the discrete and continuous approximations in Appendix G.

### Example 1: Low and changing mutation

Here we consider a single locus under directional selection. As we show in Appendix E, this also corresponds to the case of multiple loci with equal effects. Despite its apparent simplicity, the system still contains degrees of freedom that capture the allele frequency distribution in a population of *N* diploid individuals. The general DynMaxEnt approximation, based on a stationary MaxEnt approximation, reduces the dynamics to a few degrees of freedom that correspond to the Lagrange multipliers . For instance, when , the full dynamics in the 2*N*-dimensional space reduces to a three-dimensional (3D) space .

The general DynMaxEnt method is tested in the most challenging situation when the initially strong mutation suddenly changes to . The continuous DynMaxEnt method does not apply to this example for two reasons. First, when effective mutation drops below or , the components of matrix **B** diverge because of singularities of the stationary allele frequency distribution. Second, even when the mutation rates are kept fixed at their terminal values (, ), resulting in a reduced dynamics, the continuous DynMaxEnt method does not give correct dynamical predictions for small mutation rates. A closer look at the failures of the continuous DynMaxEnt method is provided in Appendix D.

The general method (with ) gives a satisfactory estimate of the trait mean at the beginning of the adaptation process, as shown in Figure 1, while a quick drop in effective mutation, compensated by a change in the effective selection, results in a quick loss of polymorphism. When the equilibration of the effective mutation is too fast, the speed of the dynamics that are limited by effective mutation becomes slower than necessary and leads to underestimation of trait mean and . This can be observed over the medium timescale *t/*2*N* ∼ 1–10.

Over long timescales, the approximate dynamics accurately match the exact dynamics. When we employ an additional constraint on the mean heterozygosity, although no *a priori* selection is acting on it, we increase the number of degrees of freedom from three to four. While a general feature of variational approximations is that an increase in the number of constraints leads to an improvement in the performance of the approximation, this particular choice yields a match that is almost indistinguishable in all four observables (including the added ), as shown in Figure 1.

The dynamics of Lagrange multipliers in Figure 1 (E and F) demonstrates separation of timescales. An initial quick drop in mutation and subsequently to the vicinity of their steady states is compensated by the changes in (and *h**) and followed by a slow relaxation of (and *h**) to equilibrium once the mutation rates are barely changing. It is interesting that the dynamics of the effective forces differs significantly between cases E and F.

### Example 2: Overdominance

Here we consider that the population dynamics is driven not only by directional selection but also by selection on heterozygosity, *i.e.*, overdominance. This yields stationary distributions that may have up to three modes: two modes represent peaks at the boundaries in the low-mutation regime, and an intermediate peak represents an increased probability of heterozygous individuals. We consider an analogous example to Example 1 but add initially strong selection on heterozygosity (trimodal distribution of allele frequency) that switches to a purely directional selection.

Figure 2 shows an almost perfect agreement between the exact solution and the approximation despite the fact that the approximation is not built to capture rapid switches in the forces but, to the contrary, assumes that the forces change slowly. The effective forces do not capture the rapid change in the real forces on the system but show a slow–fast relaxation to their new steady states. Appendix H shows a complementary simulation when the initial () and terminal conditions () are switched. The system follows a different trajectory in the direction and .

In all the simulations presented here, we use ; such a threshold ε corresponds to an allele frequency of a single individual out of a population of size 2*N*. In Appendix F, we provide additional simulations to show that there exists an optimal threshold *N*ε that minimizes the approximation error. Moreover, we show that in the worst-case scenario, where a strong change in selection is coincident with a low and changing mutation, the relative error of the approximation is on the order of 1%.

### Example 3: Multiple loci with different effects

In this example, we show the applicability of the DynMaxEnt approximation for the evolution of quantitative traits that depend on multiple loci with different effects. The state space of the system of *L* independent loci contains essentially degrees of freedom because each locus is characterized by an allele frequency distribution with 2*N* degrees of freedom. The reduction of the dimensionality to three (directional selection) or four (overdominance) degrees of freedom thus offers an immense simplification of the problem in which instead of tracking the full allele frequency distribution we are tracking just the dynamics of the Lagrange multipliers corresponding to the underlying constraints.

How does the method perform for different distributions of effects? A simple case, in which all effects are the same (), coincides with Example 2 because the dynamics become the same as for a single locus (Appendix I). We consider three distinct distributions of effects to demonstrate both the effect size distributions typically assumed and also extreme examples. We chose a uniform distribution in , exponential distribution with mean 1, and a bimodal distribution with many loci of small effect and a few loci of large effect. We chose all distributions to have for an easier comparison.

Figure 3 shows a comparison between the exact dynamics and the general DynMaxEnt approximation. All forces, including forward and backward mutations, were initially perturbed to a state where , the regime where the continuous approximation does not apply. When the distribution of effects was uniform or exponential, the approximation showed almost a perfect match with the true dynamics in all observed moments. Intuitively, one expects that when the distribution of effects is strongly bimodal, with many small effects and a few large effects, the approximation will perform badly. This is so because the large-effect loci dominate the quick response of the system, while the small-effect loci form a second wave of adaptation at a later time that is difficult to capture by the simple approximation schemes. Figure 3 shows these two timescales in the dynamics of Lagrange multipliers. The approximation of the observables still gives a solid match, but the second wave of adaptation due to small-scale effects is not perfectly accurate anymore.

## Discussion

A central result in population genetics is that the stationary distribution of allele frequencies is the product of the neutral distribution and the mean fitness raised to the power of population size: (Wright 1937; Kimura 1955b, 1964). This result can be interpreted via an optimization principle: selection constrains the expected values of the traits that determine fitness, but the allele frequency distribution is distorted as little as possible by this constraint—the distortion being measured by (minus) the relative entropy. Our approximation is to assume that this MaxEnt principle holds even away from equilibrium, with the approximate distribution of allele frequencies having the stationary functional *form* at all times. This provides a variational ansatz for the diffusion equation and results in a set of dynamical equations for the parameters of the MaxEnt distribution.

This MaxEnt approximation can be justified in the limit of slowly changing conditions. Yet, provided that mutation rates are above a critical threshold (), it is remarkably accurate even in the worst case, when parameters change abruptly. Even for a single locus, this approximation gives a substantial reduction in complexity: the whole distribution is described by a small number of dynamical variables that correspond directly to the forces of mutation and selection. The method extends directly to multiple loci such that the joint distribution of allele frequencies can be approximated by a small number of variables.

This paper extends the approximation to low mutation rates (), where the original approximation breaks down completely. The problem is that when , populations are likely to be near fixation—*i.e.*, the distribution is concentrated near the borders. When mutation rate or population size changes, the distribution near the boundaries changes immediately, whereas the distribution in the interior changes much more slowly. To see why this is so, think of the probability that a population carries one or a few copies of an allele. Because the lifetime of such rare alleles is short and is determined primarily by mutation and random reproduction, this probability changes rapidly with those processes and in the short term is independent of selection or of the total population size. In contrast, the distribution of polymorphic alleles in the interior changes slowly with changes in mutation rate because it takes a long time for new mutations to reach high frequency or for polymorphic alleles to be lost by drift.

The MaxEnt approximation for the expectations of a set of traits extends the “breeder’s equation” to include random fluctuations, mutation, and an arbitrary relation between selected traits and the underlying genotype. It takes the form (13), where is a generalization of the additive genetic covariance, **α** are the actual forces of selection and mutation, and are the effective forces that would yield the same expectations . Our extension is simply to modify by truncating the distribution of allele frequencies within a distance of the boundaries, thus suppressing its divergence for low mutation rates. The approximation is insensitive to the location of the truncation threshold ε. We give examples of directional selection and dominance; here we assume additivity across loci, but de Vladar and Barton (2011) show how the method applies to stabilizing selection, which induces pairwise epistasis between loci. We emphasize that the approximation can be applied without detailed knowledge of individual loci: only the distribution of allelic effects is needed. The approximation also can be generalized in principle to traits with linkage disequilibrium. The constraints for such a case also would involve pairwise measures, *e.g.*, the correlations between loci.

Like the MaxEnt approximation, the infinitesimal model (Fisher 1919, p. 403) reduces the dimensionality of the dynamics by following trait values rather than individual alleles. However, these approaches are quite distinct. The infinitesimal model assumes that there are so many genes that the distribution of allele frequencies at each locus is hardly perturbed by selection; thus, the genetic variance that segregates within families remains approximately constant. This is equivalent to assuming that *N*β on each allele is small (Robertson 1960). In contrast, the MaxEnt approximation can be applied to a single locus, with large *N*β, and predicts the change in genetic variance owing to selection. The infinitesimal model is broader, in that it describes the effects of linkage disequilibrium—though, because it assumes free recombination, these effects are only significant when selection on traits is strong; see Bulmer (1974) for an extension to allow linkage. However, if selection is weak enough that the population is at linkage equilibrium, the MaxEnt approximation will be more accurate than the infinitesimal model because it accounts for the effects of selection on the genetic variances. Of course, this does require knowledge of the distribution of effects of alleles and their interactions.

What are the possible applications of our results? First, our results allow us to tractably predict the temporal evolution of interesting macroscopic observables, such as the trait mean or heterozygosity, even when these are determined by an arbitrary function of a single locus or multiple loci under dynamically changing evolutionary forces. This is made possible by the drastic dimensionality reduction of the MaxEnt ansatz. The examples we presented here focused on exploring this “forward prediction” scenario in regimes where previously proposed approximation methods break down.

The MaxEnt approximation predicts the evolution of the allele frequency distribution—yet almost always we have only a single realization of the evolution of any one locus. However, whole-genome sequencing gives us information about the frequencies of alleles at very large numbers of loci. If these can be treated as independent realizations of a process common to all loci (or at least all loci in a functional class), then we can apply our method. Indeed, our assumptions are the same as those typically made in analyzing the distribution of frequencies of synonymous *vs.* nonsynonymous variants: each allele is taken to have an additive effect on fitness, drawn from some specific distribution. Usually, the distribution is assumed to be stationary. However, related species that have different effective population sizes (*e.g.*, Loewe *et al.* 2006) or newly formed sex chromosomes (*e.g.*, Zhou *et al.* 2013) require a time-dependent analysis of the kind proposed here.

Second, our results have consequences for inference of evolutionary forces from genomic and phenotypic data. The success of the MaxEnt approximation suggests that the allele frequency distribution remains close to the stationary form, even when selection, mutation, and population size are changing rapidly. This, in turn, suggests that it may be difficult to detect such changes from sequence data taken at a particular time point; note that the moments of the allele frequency distribution correspond directly to the distribution of genealogies. This is consistent with the finding that unless selection is very strong (), it has only weak effects on genealogical structure (Williamson and Orive 2002; Barton and Etheridge 2004).

However, there are several reasons why this conclusion may be too pessimistic. First, even when the MaxEnt approximation accurately predicts the expectations of the observables, the underlying distribution may not necessarily be close to the stationary form [*e.g.*, from Barton and de Vladar (2009, Figure 10)]. Second, when , sudden changes in population size and mutation rate cause immediate changes at the boundaries so that the distribution deviates from the stationary form—which is the problem that we address here. Indeed, it is relatively straightforward to detect strong population bottlenecks. Third, specific events, such as the sweep to fixation of a single mutation, can be detected. However, such events occur even when the ensemble is at a stationary state, so it still may be hard to find evidence for changing conditions. Tests that use an out-group or examine rates on a phylogeny (*e.g.*, Kimura 1977; McDonald *et al.* 1991; Goldman and Yang 1994) can detect changes in selection but require multiple species and so are not covered by the arguments here. Moreover, our argument from MaxEnt only applies to freely recombining loci: additional information may come from patterns along the genome, which depend on linkage disequilibria and on rates of recombination. Lastly, if instead of data taken at a particular time point we were provided with the temporal profile of changing observables , we could use our results to solve an inference problem and learn about the time courses of evolutionary forces **α**.

Third, our results, together with previous relevant theoretical work, allow us to interpret the evolutionary process in information-theoretic terms. What is the meaning of entropy, beyond being simply a tool for approximation? Minus the relative entropy is the Kullback-Leibler distance, a measure of divergence from the neutral distribution. If we include mutation in the base distribution ϕ (18), then minus the relative entropy measures the degree to which selection concentrates populations around states of high fitness. Following Kimura (1961), entropy changes can be seen as the information about its selective history that the population can transmit (Watkins 2008). Arguably, concentration around fit states is a better measure of adaptation than the increase in mean fitness: though fitness differences determine the rate of adaptation, they do not measure the outcome. Fitness may fluctuate, and absolute fitness must stay close to zero, on average, if populations are to persist. Mustonen and Lässig (2010) derive an intriguing relation between the gain in information (equal to the reduction in entropy) and the *fitness flux* Φ: (28)[(28) corrects a factor 2 error in Mustonen and Lässig (2010)]. This applies to a haploid population at linkage equilibrium, as assumed throughout this paper. Selection may change arbitrarily over time so that this relation gives a lower bound on the fitness flux Φ that is required to achieve a given gain in information (*i.e.*, reduction in entropy ). If selection changes slowly—as required for our MaxEnt approximation to be accurate—then the inequality approaches an equality. The fitness flux can be separated into a component resulting from selection (which must be positive and equal to the additive variance in fitness ) and the remaining components, resulting from mutation and drift. Because forces other than selection are expected, on average, to act against adaptation, the latter component is negative, so the additive variance in fitness should set a bound on the rate of information gain (*i.e.*, ).

Our approximation states that even out of equilibrium, the distribution of allele frequencies minimizes the information gain, subject to constraints on selected traits. By drastically reducing the dimensionality of the system to cover only the expectations of selected quantities, we can simplify expressions for the total fitness flux and variance in fitness over the evolutionary trajectory and therefore may be able to understand how these quantities limit the amount of information that can be accumulated by selection.

## Acknowledgments

We thank Harold de Vladar and Richard Kollár for helpful discussions. The research leading to these results has received funding from the European Research Council under the European Union’s Seventh Framework Programme (FP7/2007-2013)/ERC grant agreement number 250152 (N.B.). This work was supported in part by the Human Frontiers Science Program (grant number RGP-0065/2012 to G.T.).

## Appendix A: Solution of MaxEnt by the Method of Lagrange Multipliers

The variational problem

(A1)with subject to constraints (A2)[*e.g.*, for directional selection and asymmetrical mutation and ] can be solved by the method of Lagrange multipliers, yielding an unconstrained maximization of a Lagrangian (A3)with multipliers λ and **α**. The variational derivative of this function with respect to its argument ψ is (A4)and leads to a solution (A5)where the normalization and the Lagrange multipliers are such that observables are correctly matched. The distribution coincides with the stationary solution of the Fokker-Planck equation (1) provided that the **α** values are the evolutionary forces (selection, mutation) and the **A** values are the traits associated with the underlying processes. The evolutionary forces appear in the constrained optimization as Lagrange multipliers .

## Appendix B: Matrices B and C in the Continuous DynMaxEnt Approximation

When and , the explicit form of matrix **B** is

as implemented in the Mathematica notebook (File S1). Because the expectation is taken over the stationary distribution , which depends on the effective parameters , the matrix is also a function of the effective evolutionary forces. The components of matrix may be calculated using special functions, as in Barton and de Vladar (2009).

Similarly, the components of the matrix can be written aswhere are functions of the microscopic allele frequencies, and after averaging over the stationary distribution , only the dependence on the effective forces remains. Thus, the right-hand side of vector equation (14) is solely a function of effective evolutionary forces , forming a system of ordinary differential equations of dimension equal to the number of observables.

## Appendix C: Discrete Dynamics in the Limit of Small *N*μ

As and become very small, the probability distribution (7) becomes concentrated at the boundaries. The populations then switch between fixation for the favorable and deleterious alleles at a given locus and can be described by the fraction *P*, , of populations fixed (or nearly fixed) for each allele:

The probability of fixation for the favorable allele is (Kimura 1962), and the rates of substitution of alleles by their counterparts, and , are (C2)Hence, the exact dynamics in the regime of small mutations have a form (C3)How does the standard MaxEnt approximation compare in the limit of low mutation rates? We keep mutation rates fixed and follow a single variable ; we define the complementary variable β* as the selection that gives *P* at stationarity. In the limit of low mutation rates, (C4)implying (C5)The same equilibrium formula is given by the ratio of substitution rates . Under the MaxEnt approximation, the rate of change is (C6)Moreover, in the limit of small mutations, (C7)where this expression has been obtained by computing contributions to the integrals separately for each of the boundaries and , for and for . Note that integration by parts has been used, resulting in the presence of the term in the denominator of (C7). Equation (C6) then becomes (C8)Thus, the MaxEnt approximation to the full distribution does not converge to the exact dynamics (C3) as mutation rates become small. Nevertheless, the dynamics are approximated quite well, provided that selection is not too strong (*e.g.*, ). The MaxEnt approximation greatly underestimates the rate of change at the margins and gives no effect of selection at the extreme allele frequencies (see Figure C1, left).

However, the equilibria necessarily agree: the exact and the approximate rates of change are zero at the same point. The right plot in Figure C1 shows the exact and the MaxEnt approximation; these are close to , but the solution becomes poor for (upper pair of curves). The MaxEnt solution is accurate for because *P* remains within the interior [), where the rate of change is approximated well.

## Appendix D: Failure of the Continuous DynMaxEnt for 4*N*μ,4*N*ν < 1

The log mean fitness typically will be a sum over moments of allele frequencies. For example, a selection gradient β on a trait with mean will introduce a component where and ; a model with dominance requires and , and epistasis introduces mixed second-order moments of the allele frequencies. Thus, the matrix **B** is an expectation over polynomial functions of allele frequencies and is well behaved.

In contrast, the elements of **B** that describe mutation diverge when or . To see this, consider a single locus for which , and the elements , , and are , , (B1). Thus, diverges when , and diverges when . If we can assume that mutation rates are fixed, then we can avoid the difficulty either by fixing the mutation rates always at their actual values (*i.e.*, , ) or by choosing a reference distribution that includes mutation, *e.g.*, , and dropping the observables *U* and *V*. These two approaches are equivalent because fixing the mutation rate leads to , leading to the same dynamics for β* and *h** as if the reference distribution included mutation.

We first explore the continuous DynMaxEnt approximation for and compare its accuracy with that for . We study the worst-case scenario when a selection suddenly changes sharply from to ; this is by no means an adiabatic change. Mutations equal , where the first choice () allows a full continuous DynMaxEnt approximation including constraints on and , while the second and third choices require a fixed instantaneous mutation rate and constraints on and dropped to keep the entries of the additive covariance matrix **B** finite. Figure D1 (top row) shows the predicted observables , , and estimated by the continuous DynMaxEnt method for each of the mutation rate alternatives compared with the exact solution, while keeping the heterozygosity fixed throughout the simulation (not employing the constraint on ). The method is accurate for (where the mutation rate also changes dynamically) but shows significant deviations from the true dynamics for .

We also show simulations (Figure D1, D–F) in which the selection on heterozygosity *h* is treated as a dynamical force, employing an additional constraint on . This increases the number of degrees of freedom in the approximate dynamics and affects the accuracy of the approximation. For instance, the decrease in performance for is partially caused by losing two degrees of freedom by fixing μ and ν. However, an increase in accuracy in Figure D1 (bottom row) is caused by introducing additional constraints on . This is visible both for , where the number of Lagrange multipliers was increased from three to four, and for , where the constraint on was complemented by the second constraint on a mean heterozygosity .

Figure D2 presents the dynamics of Lagrange multipliers in the example in Figure D1, where a sudden change in selection from to was applied to three systems with different mutation rates . While in the superthreshold regime , the mutation is allowed to change in the continuous DynMaxEnt approximation; in the subthreshold regime , it is fixed.

Figure D2 suggests that increasing the number of constraints; which increases the dimensionality of the problem, is associated with a slower convergence to a steady state and a separation of timescales. This is visible both for (left), where the number of Lagrange multipliers was increased from three to four, and for , where the constraint on was complemented by the second constraint on a mean heterozygosity .

Next, suppose that the scaled mutation rate is initially high enough that the stationary distribution is concentrated in the interior but then abruptly falls below the threshold at which populations are typically near fixation: falls from 2 to 0.1. (Note that a fall in also could be due to a fall in population size rather than in mutation rate.) We also assume that selection changes abruptly at the same time (*i.e.*, changes from to 1) in order to give the most challenging example: errors in estimating will be reflected in errors in the rate of change of the mean. Immediately after the fall in scaled mutation rate, probability accumulates at the boundary and develops a boundary layer of a form that agrees with the stationary form, but polymorphism decays more slowly, so the full distribution is not from the stationary class. The continuous DynMaxEnt method fails to capture the true dynamics because the mutation rates have to be instantly adjusted to the terminal values.

Figure D3 shows that three measures of diversity () change rapidly, with falling most rapidly because they are more sensitive to the rapid changes near the boundary than . Note that falls until because probability accumulates close to (note the log scale in Figure D3) but then increases again more slowly as favorable mutations substitute, transferring probability away from . The mean changes rapidly and substantially while heterozygosity is still high but then changes more slowly after , when the genetic variance is low and selection is limited by the influx of new mutations.

The changes in effective parameters are shown in Figure D3F. Because we assume that the population is initially in its stationary state, these parameters necessarily begin at their actual values (,, ). After the mutation rate decreases, the probability of being fixed increases rapidly, and therefore, and fall quickly, approaching their new values by; falls first because the probability of being near increases faster than the probability of being near . Over this time, changes even while the mean hardly changes in order to compensate for changing and . At later times, when the effective mutation rates are constant and close to their actual (low) rates, increases as the mean changes.

Fixing the effective mutation rates makes DynMaxEnt unable to capture transient properties of the adaptation process. The dynamics of the mean allele frequency follow , where can be obtained at each time as a value that gives the current mean allele frequency given and . The effective directional selection can be used further to compute the change of observables, as displayed in Figure D3 (dashed blue). This continuous DynMaxEnt approximation fails to capture transient dynamics of the observed quantities but still converges to the correct state. If the exact dynamics are followed for a short time from the switch of the evolutionary forces and only then is the continuous DynMaxEnt approximation initialized, the approximation would more closely agree with the actual observables. This is so because the real dynamics of the trait mean slow down in time as the system gets close to an equilibrium and therefore are better captured by an approximate process whose speed of adaptation is limited by the small fixed mutation rate.

## Appendix E: Derivation of the General DynMaxEnt Approximation for Low Mutation from the Diffusion Equation 1

We consider a directional selection acting on a single locus. The extension to multiple loci with different effects and overdominance is straightforward and discussed later.

The failure of the continuous DynMaxEnt method, described in Appendices C and D, requires a special treatment of the boundaries of the allele frequency domain to avoid divergence of the method owing to singularities in matrix **B**. A naive resolution is to completely disregard the boundaries and simply to truncate the allele frequency distribution. This *ad hoc* method, despite resolving problems with divergence of matrix **B**, would not capture the dynamics at the boundaries very accurately, particularly in the small-mutation regime. Thus, we approach the problem in a more pedantic way.

First, we split the domain to the interior and the boundaries and propose a dynamical model that captures all essential processes. The new system depends on a truncation parameter ε representing the width of the boundary layer. The stationary distribution of the system still can be represented as a solution of the variational problem with two additional constraints and Lagrange multipliers related to the boundary masses. The DynMaxEnt method can be derived for this expanded system. However, the resulting macroscopic dynamics of the effective forces involve inclusion of the coupling terms at , which are microscopic quantities; these are not in general accessible. Therefore, we introduce an approximation that does not directly prescribe the probability fluxes but circumvents this by setting the total mass at the boundaries consistently with the continuous stationary distribution. This step is crucial for our approximation. The effect is that the expanded dynamics are brought back to a dimension where only the mutation and selection forces are followed.

The derivation of the general DynMaxEnt approximation for a system where the stationary solution is a discontinuous function

(E1)consists of the following steps:

Set up a piecewise-defined diffusion dynamics with the appropriate stationary solution.

Use the stationary solution as an ansatz for a solution of the diffusion problem, allowing only the Lagrange multipliers to change in time.

Derive dynamics of the means of observables from this system and reduce it to a closed dynamical system for Lagrange multipliers.

1. Diffusion equation (1) can be written in the flux form (E2)where the flux *J* is defined as (E3)We set up the dynamical system for separately in the three regions of the domain: (E4)where are the fluxes in the system with split domain (E5) (E5)and where the coupling between regions, which influences the probability of extreme allele frequencies, is set as (E6) (E7) (E8) (E9)While the conditions (E6 and E7) at ensure that the probability mass does not dissipate through the boundaries, the conditions at set the fluxes in accordance with the fluxes of the original continuous dynamics (1), which are not explicitly known. This results in the same stationary solution in the bulk as in the original diffusion process. Moreover, the probability masses at the boundaries also will agree with the standard problem because the fluxes are identical. The stationary solution, computed for as a solution of the problem and analogously for subdomains and , has a form (17). Moreover, setting the total mass at each subdomain equal to the corresponding mass of the stationary solution (7) to leading order implies the choice of constant prefactors in (22).

2. The stationary distribution in (17) depends on five parameters (for directional selection), including κ and ρ related to boundary masses. These parameters relate to a set of observables as the corresponding Lagrange multipliers . We derive the full DynMaxEnt method for this extended problem. Only after the derivation of the full dynamics will we incorporate the relationship between and . We use an ansatz (E10)for the solution of (E4) that expresses the nonstationary solution of the equation as a sum of a stationary distribution, with effective Lagrange multipliers changing in time and δ representing the deviation from the stationary form. First, we express the left-hand side of (E4) using a chain rule:Next, we calculate each summand and, for clarity, drop the notation in the exponent of :where arises from differentiating ℤ and from differentiating with respect to . This leads toNext, we express the right-hand side of (E4) by artificially adding an expression . This term represents dynamics where the forces, applied to the system, equal the effective forces. Given stationarity, the term equals zero. The reason for its inclusion is a cancellation of the diffusive part of the equation that does not depend on **α**. This leads to a simple outcome where each term contains the difference between the true and the effective force:Equation (E4) thus becomes (E11)where the function in the bulk and and when and , respectively.

3. We then multiply the equation by and average via allele frequency distribution. The goal is to find effective forces such that the error terms vanish; *i.e.*, the projection of the full dynamics to the space of macroscopic quantities **A** is closed. Because there are *k* constraints and the same number of forces, such **α** values in principle exist and are unique. The most crucial implication is that the approximation of macroscopic quantities is forced to coincide with the exact values **A** while the quasi-stationarity is valid. The equation becomes (E12)Next, we use integration by parts on the right-hand side of the equation. This introduces boundary terms, *i.e.*, terms evaluated at coming from the integration by parts. We neglected these terms by assuming a rapid convergence to the stationary distribution and instantaneous adjustment of the fluxes at the boundaries of the subdomains to their stationary values. This stationarity assumption is adopted to avoid dependence of the approximation on the microscopic details of the distribution. This leads to a relationship between the moment dynamics and dynamics of Lagrange multipliers in the full model: (E13)where and symmetrical matrices are obtained by averaging against the stationary ε-dependent distribution (E14) (E15)with and are defined as **B** and **C** but in the context of the truncated stationary distribution and constraints and with asterisks denoting the symmetrical terms. The boundary masses entering into matrix are defined as (E16)The matrix is padded by zeros because and vanish in the interior of the subdomains. By neglecting the boundary terms coming from the integration by parts in the preceding calculation, we effectively limit the transfer of mass between the bulk and the boundaries so that the boundary masses agree with the continuous stationary distribution.

Note that the stationary allele frequency distribution depends on the allele frequencies but also on the effective forces . Therefore, the matrices and , as well as matrices and , obtained by averaging over the microscopic states are still functions of effective forces as in the continuous DynMaxEnt method. However, because the averages require integration through the interior domain and separately through the boundaries and , the integrals can no longer be written using the special functions and have to be evaluated numerically.

So far we have derived an extended dynamical system for the Lagrange multipliers . One of the key questions is whether the dynamics converge to the target state **α**. Because the system has form (E17)where (note that the standard form of the ODE has a negative sign in front of ), and convergence to the steady-state is captured by the sign of the eigenvalues of . When they are positive (*i.e.*, has negative eigenvalues), the target state is asymptotically stable. It is easy to see that both and are symmetrical and positive semidefinite ( is a covariance matrix and can be written as a square), and thus the matrix has nonnegative eigenvalues. Therefore, the dynamics of Lagrange multipliers should converge to the fixed point. This fixed point is precisely the target point **α** unless some of the eigenvalues converge to zero—then the dynamics may get trapped at a different point of the phase space and be unable to continue all the way to the target state.

Moreover, because the dynamics of Lagrange multipliers are five dimensional, while the dynamics of observables, driven by the **B** matrix, have only three degrees of freedom (because the entries of corresponding to and are zero), the method is in principle underdetermined. This essentially means that Lagrange multipliers μ and κ and ν and ρ may follow strange dynamics that together nevertheless lead to well-approximated observables. This is why in the next step we need to employ the constraints on κ and ρ by slaving their dynamics with dynamics of mutation rates: (E18)Imposing these constraints leads to reduced dynamics with only three degrees of freedom, and .

The full dynamics can be reduced by disregarding the entries of that are identically equal to zero and considering only the submatrix . Intuitively, one expects that the reduced dynamics will be identical to the dynamics (14) governed by 3 × 3 matrices and . However, we show that the **C** matrix involves additional terms. To do that, we start by taking the first three equations from (E11), leaving out the equations for the dynamics of ρ and κ and by substituting ρ and κ with the expressions for μ and ν from (E18):We are left with a system of three ODEs that involve terms from the dynamics of κ and ρ (the second sum on the left). We then substitute values of κ and ρ from (E18) and obtain the correct **C**matrix: (E19)resulting in a reduced dynamics . Note that despite the original intuition, the added terms in the second and third columns must be present to reflect the dynamical character of the prefactors in the stationary distribution. The case with dominance is treated in an analogous way and results in matrices and of dimension 4 × 4.

An important question is whether the dynamics described in the preceding system converge to the state **α**. This is true for the dynamics of observables, driven by matrix , no matter the initial and target state because of the symmetry and positive definiteness. This means that the approximation can, in principle, work, but it remains to be seen if the dynamics of Lagrange multipliers also have good convergence properties. The matrix is no longer symmetrical, and thus is not necessarily positive definite. However, all our simulations suggest that positive semidefiniteness holds, and the only way that the approximation may fail to converge to the fixed point α is when some of the eigenvalues go to zero. In such a case, the trajectory gets trapped at a different state. We have observed this behavior for , where one of the mutation rates approaches zero.

## Appendix F: Dependence on the Truncation Parameter *N*ε

Evolution of quantitative traits depends on the following nondimensional parameters: , , and . The general DynMaxEnt approximation, as well as the piecewise-defined diffusion (E4), depends on an additional nondimensional parameter that influences the quality of the general DynMaxEnt approximation. As this parameter gets very small, the method approaches the continuous method without boundary layers. Because in the limiting case matrix contains diverging components, the error of the general approximation should increase as gets smaller. Similarly, the error will be large when . This is so because the relationship (22), which assumed , no longer matches the boundary masses of the approximated stationary distribution with the correct stationary distribution. Moreover, the boundary is too wide for to disentangle the effects of selection and mutation.

Simulations in Figure F1 applied a sudden change in selection while the small mutation rates remained unperturbed. The figure shows that there exists an optimal value of the threshold that leads to the best approximation. This threshold seems to depend on the mutation rates but also weakly on the selection rate , which is consistent with a representation of as a mutation-dominated regime. The optimal threshold differs when considering error in the trait mean and in the mean heterozygosity.

It is useful to know how the general DynMaxEnt method performs for realistic examples given a sensible fixed choice of . We set such that ε corresponds to a frequency of one in the total of individuals. We took the reference simulation in Example 2 and varied strengths of the parameters , , and one at a time. Figure F2 shows that the relative error in a trait mean is for most parameter settings within 1%. The worst-case scenario occurs when the selection gradient is strong and mutations drop rapidly to a small value. This is expected because strong and rapid change in selection implies a large deviation from adiabatic regime; thus the quasi-stationarity assumption may fail. Additionally, small mutations lead to a separation of timescales in the model that may be too complex to capture by a simple approximation. Surprisingly, stronger selection on heterozygosity does not decrease the accuracy of the method, which is counterintuitive because large leads to more complex allele frequency distributions.

## Appendix G: Comparison of Different Methods

We compare three available methods: the discrete approximation (see Appendix C), which captures responses to a change in selection in the limit ; the continuous DynMaxEnt approximation (see Equation 14), valid for with a possibility to treat also small but static mutation; and the general DynMaxEnt approximation (see Equation 26), which extends the validity to dynamic mutations of an arbitrary magnitude. Note that dynamic Lagrange multipliers also may represent changing population size, not just the evolutionary forces.

Figure G1 shows two scenarios: perturbation of the system via a change of selection (A) and perturbation through changes in all evolutionary parameters (B). In the case of (A), the continuous DynMaxEnt method can be applied even in the case of small mutations by enforcing a fixed mutation rate in time. But even then the continuous method performs worse than the general method, which shows a relative error on the order of a few percent. In comparison, the discrete approximation gives an accurate estimate when mutations drop below , but given its simplicity, it does not capture unequal and dynamic mutation rates.

The power of the general DynMaxEnt is demonstrated in Figure G1B for dynamic mutations. This situation cannot be captured by the simple discrete approximation, and the continuous DynMaxEnt, not valid for , shows divergence that is an inherent property of the matrix **B**, approximating the dynamics of the observables. To show the performance of the continuous method, we included ∞ in the axis denoting the accuracy of the approximation.

## Appendix H: Irreversibility of the Adaptation Dynamics

Figure H1 compares the dynamics of the observables and Lagrange multipliers for Example 2, where the initially trimodal distribution loses polymorphism owing to a decrease in the selection for heterozygous individuals and a simultaneous decrease in the mutation rate and a reversed dynamics with the initial and final conditions switched. The dynamics are irreversible because switching the initial and final conditions leads to very different paths in the space of observables, as well as in the space of the effective forces. The dynamics in the space of observables show a good agreement with the exact solution of the problem. The paths from the initial to the final state are relatively straight. On the contrary, the dynamics of Lagrange multipliers show a complicated response of the system that exhibits a separation of timescales and overshooting of the equilibrium levels; *e.g.*, the effective selection coefficient grows to a level before decreasing back to .

## Appendix I: Equivalence Between Single-Locus Approximation and Multiple Loci with Equal Effects

A trait typically depends on many loci with different effects. If these effects are called , the simplest case is an additive trait . If linkage equilibrium is assumed, the constraints can be written in terms of allele frequencies at the loci where each constraint is additive via loci () with symmetrical expressions in terms of . While the selection-related constraints depend on the distribution of effects γ linearly, the mutation-related constraints do not depend on it. If the effects of the *L* loci on an additive trait are the same, and , thenwhere is the matrix of genetic covariances for *L* loci. This is true even in the approximation for small dynamic mutations because the boundary terms are also additive accross loci. A similar relationship holds for the covariance of fluctuations . Therefore, the matrix defining the MaxEnt dynamics of Lagrange multipliers in case of *L* loci with equal effects is identical to dynamics of Lagrange multipliers in case of a single locus, *i.e.*, . As a result, the trait mean , where *p* represents the allele frequency in the single-locus simulation with other parameters unchanged. However, the calculation of a trait mean for *L* independent loci of equal effects using the continuous model gives(I.1)showing that the relationship follows also from the model description.

## Appendix J: Additional Results for Multiple Loci with Unequal Effects

Figure J1 shows details on the quality of the general DynMaxEnt approximation in Example 3, where a linear trait depending on 100 loci with different effects evolves owing to a combination of a directional selection and selection on heterozygosity, random drift, and mutation. The changes in evolutionary forces are the same as in Example 2.

## Footnotes

Supplemental material is available online at www.genetics.org/lookup/suppl/doi:10.1534/genetics.115.184127/-/DC1.

*Communicating editor: J. Hermisson*

- Received October 28, 2015.
- Accepted February 9, 2016.

- Copyright © 2016 by the Genetics Society of America