## Abstract

Understanding variation in allele frequencies across populations is a central goal of population genetics. Classical models for the distribution of allele frequencies, using forward simulation, coalescent theory, or the diffusion approximation, have been applied extensively for demographic inference, medical study design, and evolutionary studies. Here we propose a tractable model of ordinary differential equations for the evolution of allele frequencies that is closely related to the diffusion approximation but avoids many of its limitations and approximations. We show that the approach is typically faster, more numerically stable, and more easily generalizable than the state-of-the-art software implementation of the diffusion approximation. We present a number of applications to human sequence data, including demographic inference with a five-population joint frequency spectrum and a discussion of the robustness of the out-of-Africa model inference to the choice of modern population.

UNDERSTANDING the role of demography and selection in shaping genetic diversity is a central challenge in population genetics. In recent years, genome sequencing experiments have generated large amounts of data that can be used to test and refine this understanding. Detailed models of human genetic diversity have shed a new light on the origins and history of modern humans and have helped researchers design and interpret biomedical studies.

Present-day genomes are the result of a large number of more or less randomly occurring mutations, recombinations, matings, and deaths. Individual events and their genomic consequences bear limited information about the past. We can, however, learn a lot by statistically integrating information across entire genomes and populations. Classical work has focused on simple summaries of genomic diversity that could be computed analytically, such as the number of pairwise differences between individuals, the number of segregating sites in a sample, or linkage disequilibrium (Crow and Kimura 1970). In recent years, the ability to simulate genomes and the availability of data has led to a wide array of summaries designed to identify different evolutionary forces (Patterson *et al.* 2012; Scheinfeldt and Tishkoff 2013; Schiffels and Durbin 2014). Here we consider the allele frequency spectrum (AFS): We compute the frequency of the derived allele at each site, and build a histogram of the number of sites observed at each frequency.

The AFS is a classical measure of diversity (Kimura 1964) that has seen a surge in popularity recently because of improved computational approaches to estimate it. Back-in-time approaches based on coalescence theory (Excoffier and Foll 2011; Excoffier *et al.* 2013; Kamm *et al.* 2017; Kelleher *et al.* 2016) can be extremely efficient for neutral simulations and can handle large number of populations, but they often become cumbersome or intractable in models with selection. Forward-in-time approaches tend to be more easily generalized to account for selection. Despite recent progress in discrete whole-population methods (Haller and Messer 2016), most current approaches are based on the diffusion approximation initiated by Fisher and Wright in the early 1930s (Fisher 1930; Wright 1931; Kimura 1964; Crow and Kimura 1970). In this approach, the evolution of the AFS is described as a continuous process through a partial differential equation. Several simulating tools based on this diffusion approximation have been implemented and distributed to the population genetics community (Gutenkunst *et al.* 2009; Lukić *et al.* 2011; Lukić and Hey 2012), among which is probably the most commonly used. In the past few years, these numerical tools have been used widely and led to significant results from both historical and biological points of view (Gutenkunst *et al.* 2009; Gravel *et al.* 2011; Schmutz *et al.* 2014).

Despite these successes, computational cost and stability remains an important limitation. For instance, available software can only handle up to four populations simultaneously can handle three). The time needed to perform individual simulations can be prohibitive when many simulations must be run, for example when performing bootstrap analysis for multiple models. Furthermore, biases inherent to Kingman’s coalescent or the diffusion approximation, which are benign in small samples, can become important in contemporary data sets (Bhaskar *et al.* 2014; Spence *et al.* 2016).

Here, we describe a new and efficient method to simulate the AFS that does not require the diffusion approximation. In cases where the diffusion approximation is appropriate, this approach is equivalent to a moment representation of the diffusion equation which is analytically intuitive and numerically tractable. We integrate this moment approach into the inference framework to facilitate inference from allele frequency distributions across a broader range of problems than was previously possible.

## Materials and Methods

### Heterozygosity rate evolution

First consider a large population of *N* diploid individuals evolving under the neutral Wright–Fisher model. Generations are discrete and individuals from generation receive alleles drawn randomly and with replacement from the parental alleles present in generation *k*. Because alleles in a diploid individual are inherited and transmitted independently in the neutral case, we can forget about diploid individuals and think of the population as a set of haploid samples (or *ploids*, for short). We are interested in the expected number of sites where the alternate allele is observed exactly *i* times in a sample of size *n* at generation *k*. We neglect correlations between sites and suppose that each locus is transmitted independently.

In this model, heterozygosity is proportional to the number of heterozygous sites in a sample of ploids. Under neutral Wright–Fisher reproduction, this follows the classical recursion(1)The derivation of this recursion is simple: If two ploids at generation are inherited from the same parental ploid, which happens with probability they are identical and do not contribute to Otherwise [with probability they are drawn randomly without replacement from the previous generation. The probability that they are different is by definition, leading to Equation 1.

### Neutral case AFS

We can similarly derive a recursion equation for the neutral AFS for arbitrary *n* and *i*. This time we draw *n* ploids from the previous generation. The probability that a given pair of ploids has the same parent is still For simplicity, we suppose that at most one ploid pair shares a parent at each generation. The situation with multiple coalescent events per generation, which can occur for large sample sizes (Bhaskar *et al.* 2014), is discussed in Appendix B, *Drift with Multiple Coalescences* (Figure B1).

If there is no coalescence, the *n* ploids at generation are copied from *n* randomly drawn ploids at generation *k* and the distribution of allele frequencies does not change: If there is a coalescence, the *n* descendants are copied from a random set of parental haplotypes. The distribution of possible parental frequencies is described by To treat all possible coalescence cases in a unified way, we imagine that we always draw *n* parental ploids, with the understanding that one selected parental ploid may not leave descendants. By doing this, we can express as a simple linear function of where is a sparse linear operator describing the change in allele frequency due of drift in a single generation. Its coefficients are provided in Appendix A.

*De novo* mutations can change the allele type and therefore also affect the allele frequency. Their effects on the entry of the AFS is described by the linear operator (2)As a first approximation and to easily compare to commonly used simulation tools, we consider the infinite-sites model (Kimura 1969), in which the only mutations allowed are at previously invariant sites, and at most one mutation per site can occur. In this model, the mutation term is a source term, with *u* being the genome-wide mutation rate. In the infinite-sites model, we therefore have(3)A more general form of the mutation operator, accounting for backward mutations, is discussed in Appendix B, *Finite-Genome Model for Mutations*.

Equation 3 is formally equivalent to a numerical discretization of the diffusion approximation. However, it avoids two approximations. The first is the approximation of a discrete Wright–Fisher system by a time- and frequency-continuous diffusion. The second is the approximation of a continuous diffusion by a discrete numerical partial differential equation solver.

For notational and numerical convenience, we approximate by the time derivative and work in continuous time:(4)Discrete-time results can be recovered by interpreting as everywhere.

### Modeling selection and dominance

To obtain similar recursion equations under selection, we consider a model of selection in which ploids are drawn uniformly from the previous generation, but are accepted with different rates depending on the fitness of the parent. Alleles drawn from a diploid parent are accepted with probability 1 for genotype for genotype and for genotype where is the selection coefficient and *h* is the dominance coefficient. If a ploid is not accepted, it must be redrawn at random from the entire population.

In this model, contrary to the neutral case, we may need to draw more than *n* alleles from generation *k* to build In what follows, we assume that so that at most one allele is rejected from the sample at each generation. This assumption is also implicit in the diffusion approximation. Higher-order corrections can also be computed.

Assuming we find(5)where is a sparse linear operator describing the change in sample allele frequencies due to selection. Its coefficients are given in Appendix A.

To compute Equation 5 requires which itself requires and so forth. The evolution equation for is not closed and cannot be solved directly in that form.

### Moment closure

To find a closed approximation to Equation 5 and compute directly from we want to approximate the larger sample AFS in terms of Intuitively, this should be possible for large enough *n*: a sample of size 102 adds little information that is not already contained in a sample of size 100. In fact, the number of variants to be discovered in a sample of size can be estimated accurately from a sample of size *n* by jackknife extrapolation [Gravel and National Heart, Lung, and Blood Institute (NHLBI) GO Exome Sequencing Project 2014].

Here we adapt jackknife extrapolation to perform the moment closure approximation. We consider a third-order jackknife to express entries of the higher-order spectra and as linear combinations of three entries of Higher-order jackknives would increase accuracy at the cost of computational complexity and stability. Since jackknives provide a linear approximation, we can write where is a sparse linear operator whose coefficients are provided in Appendix D. Under this approximation, the selection term becomes where we defined as the closed approximation to the selection operator. The resulting system is a closed and linear system of ordinary linear differential equations:

(6)### Multiple populations and migrations

In practice, we often want to study the distribution of allele frequencies across multiple populations where mating is more common within than across populations. In such a case, we consider the multidimensional AFS where **n** is a vector of sample sizes of the different populations. Thus, is the number of variants that are found times in population 1, times in population 2, and so forth.

Using the same method as for the single-population case, we can derive a system of ordinary equations for the joint AFS (see Appendix C, *Extension to Multiple Dimensions*). Because of migration, however, the evolution of depends on for When creating ploids in population *i* and in population *j*, we may occasionally draw, *e.g.*, ploids from *i* and from *j*. Thus evolution equations for are coupled for all such that We could therefore write evolution equations that are closed on the space of all However, this is numerically costly. For weak migration, with for all we can again use the jackknife to write uncoupled approximations to these equations. A simple way of achieving this is to write the evolution equation for in terms of the slightly larger AFS with and a unit vector with a unit value for the *j*th population. We then have(7)where is the drift operator and is the selection operator for population *j*, is the linear migration operator from populations *j* to population *k*, and the migration rate is the proportion of ploids in population *k* whose parent was in population *j*. Parameters for the migration operator are provided in Appendix A.

We can then use a jackknife approximation as above to writewhere is a sparse and linear jackknife operator (the code distributed with this article uses a slightly more complex jackknife formulation described in Appendix D). We then obtain a closed version of the migration operator, leading to a closed-form evolution equation for (8)So far, we supposed that migration was weak enough that the expected number of migrants per generation in our sample was less than one. However, many populations have received many migrants over few generations. This is the case, for example, in many human populations in the Americas.

In such cases, we compute the resulting AFS directly without using the jackknife approximation. Because migration is a stochastic process, the number of migrant lineages varies across loci. A given sample of 10 ploids may derive ancestry from population 1 at one locus, and from population 2 at another. In the independent-sites model and two-way admixture, the distribution of lineages that trace back to one population is binomial: The expected frequency spectrum is an integral over the possible numbers of migrating lineages. To generate *n* admixed lineages, we can therefore need as many as *n* lineages from each of the source populations. We explain in Appendix B, *Strong Migration and Admixture*, how the number of source population lineages can be chosen to maximize computational efficiency.

### Implementation and performance

We developed a python library, called *Moments*, to simulate multidimensional AFS and infer demographic history using the method described above. The linear set of ordinary differential equations (ODEs) is integrated using an implicit Crank–Nicolson scheme which is well suited to diffusion problems. For three and more populations problems, we use alternating direction implicit methods to speed up computations (Baolin and Wenzhi 1994). From the user perspective, the library is very similar to since we reused the software architecture, including the user interface and data handling methods. We added a number of new convenience features, described in Appendix F, but the most important differences are performance and generality.

Whereas could model up to three populations and MULTIPOP (Lukić and Hey 2012) up to four, *Moments* can handle models with up to five populations with selection, migrations, and population splits. The run time is still exponential in the number of populations, so the direct computation of large sample sizes for more than three populations remains challenging.

To provide a fair comparison between *Moments* and for a comparable problem size, we need to consider both accuracy and computation time. users can choose the number of grid points used for solving the diffusion equation. A large number of points results in slower but more accurate integration. We used the recommended strategy in namely using three different grid sizes and performing Richardson extrapolation.

*Moments* also presents a trade-off between speed and accuracy because the moment closure approximation improves with increasing *n*. *Moments* can therefore be made more accurate by computing the AFS for larger than the desired *n*, and subsampling to *n* after the integration. For the simulations below, we simply integrated with the desired *n*.

Table 1 shows comparisons between the two methods in several test cases. A more extensive set of test cases is provided in Appendix E.

In the case of neutral equilibrium with constant population size, we can compare our computations to the exact solution. For more complex cases, where analytical results are unavailable, we use with a very fine frequency grid and use the spectrum thus obtained as a reference to compare our method and Specifically, the reference AFS is computed with third-order Richardson extrapolations using fine grids—30, 35, and 40 times the sample size for one-dimensional (1D) and two-dimensional (2D) cases and 5, 6, and 8 times the sample size for three-dimensional (3D) simulations. Using this reference as the true AFS can induce a bias in favor of

We consider several configurations with up to three dimensions including selection, migration, and nonconstant populations sizes. We used 30 samples per population. The first five cases are simple integrations starting from a null spectrum: a single population without selection until the equilibrium is reached, two isolated populations under selection, and three populations with selection and migrations. The sixth model is the demographic model of the out-of-Africa expansion described in Gravel *et al.* (2011). Details of the models, and additional benchmarks, are provided in Appendix E.

We report the execution time, the mean relative error compared to the “true spectrum,” and the difference in likelihood where is the probability to observe the AFS *x* assuming that the expected AFS is *y*. Here “truth” is reference AFS and “test” is the AFS computed using each software with regular settings. The likelihood is computed as a product of Poisson likelihoods, one for each frequency spectrum entry (Gutenkunst *et al.* 2009). Finally, we report the number of biologically impossible, negative AFS entries.

In the given examples, *Moments* performs better than for most metrics. In the more general set of benchmarks provided in Appendix E, we find a few instances where outperforms *Moments*. Overall, we find that *Moments* is particularly efficient for integrations over long time periods, and rarely predicts negative frequencies.

For models of four and five populations, we could not compare *Moments* to since does not allow for such high-dimensional simulations. Similarly, comparison with was not possible when modeling reversible mutations in a finite genome. Figure 1 shows the predictions of *Moments* in two finite-genome models with a constant-size population, where analytical solutions are available for comparison.

### Data availability

The authors state that all data necessary for confirming the conclusions presented in the article are represented fully within the article. Software is available at https://bitbucket.org/simongravel/moments.

## Application to Data

### Three populations out of Africa

We used our method to update the out-of-Africa expansion model described in Gutenkunst *et al.* (2009) and Gravel *et al.* (2011). These models are used in a wide variety of medical and evolutionary applications.

We used autosomal synonymous sequence data from the publicly available 1000 Genomes Project (Sudmant *et al.* 2015; 1000 Genomes Project 2015). We first computed the joint AFS for three populations: Yoruba individuals in Ibadan, Nigeria (YRI); Utah residents with northern and western European ancestry (CEU); and Han Chinese from Beijin (CHB). We restricted our analysis to 80 ploids for each population and fit the same 13-parameter demographic model as in Gutenkunst *et al.* (2009) and Gravel *et al.* (2011). However, whereas previous studies had access to capture data for a subset of the exome, here we had access to the entire high-coverage exome data. Moreover, we updated the mutation coefficient and the generation time to more realistic values: respectively (Gravel *et al.* 2013) and *T _{g}* = 29 year (Tremblay and Vézina 2000). The best-fit model is represented in Figure 2.

We used a Powell optimizer to find maximum-likelihood parameters (see Appendix F). Confidence intervals were obtained using recently introduced approaches for uncertainty estimation for composite likelihood (Coffman *et al.* 2016). Maximum-likelihood parameters are presented in Table 2.

### Validation with different trios of populations

Because of data and computational limitations, most previous studies have only considered a single triplet of populations for estimating an out-of-Africa model. We investigated whether other populations would yield consistent results. Here we inferred the parameters with three different trios incorporating data from Luhya from Kenya (LWK), British (GBR), and Kinh Vietnamese (KHV) populations in addition to the three original populations we used. Inference results are presented in Table 3.

The inference is robust to permutation of European and Asian populations, in the sense that inferred parameters across triplets are consistent with the confidence intervals obtained in the YRI-CEU-CHB triplet. However, changing the African population from YRI to LWK changes parameters beyond the confidence intervals. This is because the reported confidence intervals take into account the finite nature of the genome, but not the variation across populations. Parameters inferred from different triplets are not wildly different, reflecting the shared or similar histories of some of the populations, but quantitative details can vary substantially. Perhaps the most interesting difference is the more recent inferred split between LWK and Eurasians, compared to the split between YRI and Eurasians, supporting either population structure within Africa prior to the out-of-Africa population (with LWK ancestors more closely related to the out-of-Africa population), or interaction between LWK and Eurasians after the out-of-Africa event. This naturally begs for more detailed modeling of the relationships between African populations in the 1000 Genomes Project. This would however require a more thorough discussion of the historical and archeological context, and is outside the scope of this manuscript.

### Out-of-Africa model with four and five populations

As *Moments* handles up to five populations, we can simulate more complex models than the out-of-Africa model described in Gutenkunst *et al.* (2009). In this section, we consider a model of out-of-Africa expansion with four populations including one additional Asian population, the Japanese from Tokyo (JPT). The model is identical to that of *Three populations out of Africa* with an additional population split in Asia leading to the Japanese population (see Figure 3 and Table 4).

We also considered a model with five populations by adding a third Asian population, the KHV. To reduce computational time and facilitate comparison with the four-population model, we fixed the parameters inferred from the four-population model and inferred the new parameters (see Table 4) introduced by the split which gives rise to the KHV population.

## Discussion

### Connections with other models

#### Diffusion approximation:

A standard approach to simulate AFS evolution relies on a continuous approximation to the Wright–Fisher process. This approximation leads to an advection-diffusion equation describing the evolution of the population allele frequencies density (*i.e.*, the expected proportion of alternate ploids at frequency in the population and at time *t*).(9)The software uses a finite-difference scheme to solve this equation and approximate the distribution *φ*. The expected AFS entries are then computed via the integral:The AFS entries, which we had computed without reference to an underlying continuous model, can thus be seen as moments of the distribution *φ* computed in a noncanonical basis of polynomials. The idea of deriving evolution equations for the classical moments was proposed in Evans *et al.* (2007). Unfortunately, obtaining the allele frequency distribution from the requires a sum with large fluctuating terms, making it impractical for large sample sizes. Evans *et al.* (2007) pointed out that writing an equation in terms of the as was done here, might be preferable, but apparently did not realize that these equations could be brought to a simple form that avoids alternating coefficients.

In fact, we show in Appendix C, *1D Case*, that by computing a moment equation for the diffusion equation, we obtain precisely Equation 5, which is not more complex than the original diffusion equation and is, as we have seen, more numerically stable. Furthermore, the lack of closure of the evolution equation with selection is easily addressed by a jackknife procedure, at least for the weak selective values consistent with the diffusion approximation.

Despite the alternating sum problem, Živković *et al.* (2015) did show that the (Evans *et al.* 2007) equations for the could be used to compute the frequency spectrum, at least for a single population with piecewise constant size and for up to 200 ploids. They also proposed a truncation approach to resolve the moment closure problem. This truncation approach would not be practical in the moment representation of Equation 5, as setting high-order terms to zero would amount to neglecting selection altogether.

In short, the Evans *et al.* (2007) equations and the ones presented here are algebraically equivalent. The main advantages of the present formulation is numerical stability, ease of generalization, and the availability of the moment closure approach.

#### Particle models:

The existence of closed evolution equations for the AFS in a sample of size *n* suggests that we can learn about the evolution of a large population by tracking only *n* lineages forward in time. Consider for example Equation 3, describing neutral evolution for a large Wright–Fisher population. Equation 3 uses the fact that *n* samples are either copied from *n* distinct parents, or from distinct parents with one parent contributing to two offspring. The evolution of the expected sample frequency spectrum can therefore be emulated as a Moran model for a population of size *n* with a suitably scaled replacement rate, an interpretation followed in Kamm *et al.* (2017). A more general theoretical discussion of the relationship between effective “particle” models and continuous models can be found in Donnelly and Kurtz (1999).

#### Coalescent models:

Three related methods for estimating the neutral, multipopulation allele frequency distribution have been presented recently (Bryant *et al.* 2012; De Maio *et al.* 2015; Kamm *et al.* 2017). These use a transition matrix similar to the one computed here for the single population, neutral, large-*N* limit. Rather than directly integrating a high-dimensional, diffusion-like equation, these methods compute transition matrices for entire epochs during which populations are assumed to be isolated. They then combine these transition matrices using dynamic programming approaches, such as Felsenstein’s tree-peeling algorithm. These approaches can handle many populations because they can compute a subset of the entries of the AFS, reducing the dimensional curse affecting and *Moments*. However, they handle neither selection nor continuous migration. We see no reason why the jackknife-based transition matrices, which we introduced here to simulate the evolution of allele frequencies forward in time, could not also be used to introduce selection, multiple coalescence, and complex mutation models to the coalescent-based dynamic programming framework.

### Conclusions

We described a highly accurate and numerically robust approach to simulate the evolution of allele frequency distributions over time in a discrete Wright–Fisher model. On a practical level, this approach can be used wherever the diffusion approximation is applicable, in which case it typically provides faster, more accurate, and more robust solutions. Our software implementation will provide an easy transition to users: data handling and ancillary methods for *Moments* are largely copied from open-source code from model specification is simplified, and the number of adjustable parameters is reduced.

The applications of our solver to data from the 1000 Genomes Project largely recapitulate and refine worldwide models of genetic diversity. Within Asia, we found that the best model involved a split between CHB, JPT, and KHV ∼9000 years ago. Importantly, we found the choice of a “representative” population in out-of-Africa models can substantially affect the inferred parameters, even ones not directly involving the population. Unsurprisingly, the choice of the African population has the largest impact on the inferred demography, emphasizing that previous out-of-Africa models may be applicable to many Eurasian populations, but not to other African populations. Building models including more than one African population will likely provide much more information about human ancestry both in Africa and across the world.

## Acknowledgments

We thank Ryan Gutenkunst and Yun Song for useful discussions. This research was undertaken, in part, thanks to funding from the Canada Research Chairs program, Canadian Institutes of Health Research award MOP 136855, and Natural Science and Engineering Research Council Discovery grant (to S.G.), and Google Summer of Code (to W.L.).

## Appendix A: Coefficients of the Discrete Operators

In this section, we first present the full form of the discrete drift, selection, and migration operators involved in Equation 7 in the simplest model where selection, migration, and drift are treated linearly, and then present an outline of the derivation.

Remember that the ODE for an AFS in *p* populations, with diploid individuals in population *j* and sequenced ploids per population, is(A1)

where is the parental sample size required to reach sample size after migration from *j* to *k*, and is the vector of the canonical basis of

The drift operator has the form(A2)and is closely related to transition matrices derived in Evans *et al.* (2007) and Spence *et al.* (2016). The selection operator is defined by(A3)and the migration operator is(A4)

### Derivation for the Drift Operator

For simplicity, we derive the drift and selection operators in one dimension, since the multidimensional case is a straightforward generalization with a heavier notation burden. The probability of observing a single two-way coalescence in a sample of size *n* is(A5)where is the probability that a given pair of lineages coalesces, is the number of distinct pairs of lineages that can coalesce, is the probability that there is no additional coalescence to the same ploid, and is the probability that none of the remaining lineage coalesce to the remaining ancestors. If we only keep leading-order terms in we find the classical large-population limitGiven a single two-way coalescence, we then want to compute the frequency distribution in the offspring given the distribution in the parental generation. We imagine that the *n* descendant ploids are copied from *n* parental ploids, with one parental ploid drawn twice, and one parental ploid never drawn. Since the *n* parental ploids were drawn independently, their expected allele frequency distribution corresponds to the expected frequency in the parental generation. To compute the change in expected allele frequency between parental and offspring generation, we must compute the probability of allele frequency changes between parents and offsprings.

The probability of observing a transition from parental frequency to descendant frequency *i* after a two-way coalescence is the probability that the twice-selected ploid carries the reference genotype (which we represent as ), and the unselected ploid is nonreference (which we represent as ):Similarly, the gain of an alternate ploid results from event and has probabilityThe probability of transitioning from *i* alternate ploids to (via ) or (via ) isWe can finally compute the drift operator for a single two-way coalescence:(A6)which simplifies to Equation A2.

### Derivation of the Selection Operator

Here we derive a 1D version of Equation A3. The derivation is elementary but a bit cumbersome. See Appendix C for a different derivation that relies on the diffusion approximation.

We consider the action of selection on a single transmission, when a new ploid is drawn from a diploid individual. We derive the results in the context of negative selection and where we can think of selection as eliminating a proportion of the neutral transmissions. A transmission is eliminated by selection with probability if the parent is a heterozygote, and with probability if the parent is an alternate homozygote. For bookkeeping, we write the state of the parent as , with the empty circle representing a reference allele, solid circle representing the alternate allele, and the vertical line representing the putative initial selected ploid. When selection acts, this ploid is replaced by a new ploid, taken at random from the parental population. Since we assume that at most one selective event occurs per generation, we do not need to know the diploid state of this replacement ploid. In the selective event labeled as , the transmission of an alternate allele from a homozygous ancestor was eliminated and replaced by transmission from a reference allele. To generate *n* descendant ploids with one selective event, there have been a total of relevant ploids: the *n* selected initially, the diploid companion of the rejected ploid, and the replacement ploid. We want to compute the change in allele frequency caused by the selection process relative to the neutral transmission of the *n* selected ploids. We write for the probability that the *n* initial parental ploids included alternate alleles, but the offspring ploids include *i* because of selective process . This is proportional to the number of loci having the required alternate alleles in a sample of size the probability of selecting the correct triplet of ploids [namely the hypergeometric for the correct choice of three ploids times for the correct order], the probability that the selection event occurs for one transmission and the number of transmissions where selection can act (*n*). Putting all these together, we get(A7)The first three terms, of the form describe increases to the number of alleles at frequency *i*. The last three terms, of the form contribute to a decrease relative to the neutral case: Their contribution to the evolution equation will have opposite signs. Other combinations of ploids either do not change the frequency spectrum (*e.g.*, ), or have zero probability of happening in our selective model ().

Under an additive model, the diploid companion to the selected allele plays no role and the evolution should only depend on rather than To make this explicit, we collect terms into an additive term, proportional to *h*, and a dominance term, proportional to The dominance term thus vanishes under genic selection Its coefficient can be obtained by setting in the list of contributions from Equation A7. The dominance term readsThe additive term contains the remaining terms from (A7). We use the fact that and to simplify the additive term:(A8)We combine the first and last term using the downsampling formulaand similarly combine the middle two terms usingto reach Equation A3.

### Derivation of the Migration Operator

We derive the migration operator for populations *j* and *k*, with the probability that a ploid in population *k* has a parent from population *j*. We consider a sample of ploids from population 1 and from population *k* and suppose that the migration rate is small enough that at most one migration occurs per site per generation. Here we consider parental sample sizes to account for the fact that a lineage from *j* may be needed to produce the offspring.

In this case, there are only two ploids involved in the migration, the replaced and the replacement ploids, and two possible configurations that result in changes in the allele frequency, and , where circles represent ploids: a dark circle represents an alternate allele, and the ploid on the left is the migrant replacing the ploid to the right. If is the rate of increase of the number of loci with alternate allele counts because of process , we have two rates of increase:(A9)and two corresponding rates of decrease:(A10)[The indices in represent the allele frequency in parental sample of size The allele frequency would have been after discarding the extra lineage if there had been no migration.]

We can combine these to get the transition rate due to migration:(A11)The code in *Moments* uses a slightly different equation that can be derived from this by the application of the downsampling formulas, namely(A12)This makes it possible to only use the jackknife on the first two terms.

## Appendix B: Generalized Forms of the Operators

### Finite-Genome Model for Mutations

To facilitate comparison with previous work, we focused in this work on the infinite-sites model, assuming that mutations occur at previously invariant loci and neglecting back mutations. However, finite genome and back mutations are easily accommodated by the present model. In this case, the single-population mutation term is:where *μ* and *ν* are the forward and backward mutation rates per base pair.

### Drift with Multiple Coalescences

The standard diffusion approximation and Kingman’s coalescent model neglect the possibility that multiple coalescences may occur during the same generation in the history of a sample. We used a similar approximation to compute the drift operator in Appendix A. Even though multiple coalescences are indeed rare for small sample sizes and large populations, they can have a measurable effect for large sample sizes (Kamm *et al.* 2017).

Here we consider the next-order correction to the drift operator by accounting for three-way and double two-way coalescences (Figure B1), each contributing corrections of order

We can decompose the drift operator in contributions from two-way, three-way, and double two-way coalescenceThe computation of individual terms, outlined below, is cumbersome but elementary. The key result is that the rate of change in is a linear combination of the with *j* ranging from to

#### Single two-way coalescent

The two-way coalescent contribution is also described by Equation A6, but if we consider multiple coalescences we must also account for corrections to of order *i.e.*,

#### Single three-way coalescent

Similarly, the probability of a three-way coalescent, to leading order in *N*, isand we must now compute five transition probabilities.

For example, the transition probability from to *i* isHere is the hypergeometric distribution with *j* trials and *i* successes, sampling from a finite population of size *n* with *k* successes. To derive this result, we note that a triple coalescence can create a transition from to *i* if it involves three parental ploids: one (reference) chosen three times, and two (alternate) never chosen: . The probability of this happening is the product of the probability of drawing two alternate ploids in a sample of three [*i.e.*, times the probability that we picked the reference allele to coalesce (*i.e.*,

Similarly, we have contributions from , , and Finally we want the probability of changing from ancestral frequency *i* to any other offspring frequency. Because the only way to maintain the number of reference ploids under a triple coalescence is to pick either three reference ploids () or three nonreference ploids (), we have(B1)We finally have the corresponding drift operator(B2)with

#### Double two-way coalescent

The probability of drawing a double coalescence isHere we need to consider four parental ploids with two coalescences and two not selected. We have a term corresponding to Similarly, and contribute toWe then have the complementary contributions and toand forThere are now three possibilities for not changing the allele frequency, namely , , and , so that the probability of changing from frequency *i* given a double two-way coalescence isThe drift operator is therefore(B3)Changes to caused by drift are a simple linear function of the for This operator is sparse for large *n* and easy to compute numerically. Higher-order corrections in would simply involve more terms and a progressively denser linear operator.

### Strong Migration and Admixture

To compute the frequency spectrum after admixture, we need to consider the origin of lineages. Under strong recent bidirectional migration, there are loci at which all lineages come from population 1, and other loci at which all lineages come from population 2. Thus depends on the set for all containing lineages. This information is contained in the frequency spectrum

We can therefore writewhere is the probability of observing alternate allele counts when each allele is drawn independently and without replacement from the two ancestral samples with frequency The direct computation of summing over the possible inheritance patterns, is straightforward but computationally demanding for large sample sizes or high-dimensional applications. Here we describe two alternatives implemented in *Moments*.

### Exact Dynamic Programming

One alternative is to use dynamic programming: We can add an additional population to the frequency spectrum, and define a function that migrates a single lineage from populations 1 or 2 into population 3. For example, if we want to implement migration from population 1 into population 2 with migration rate we would need to start with a frequency spectrum of dimension After creating an additional dimension, the spectrum has dimension Then we would iteratively extrude migrant lineages, creating spectra of dimension then until we reach at which point we can simply discard the empty population 2.

The downside of this approach is that it requires the creation of a new population, and operations.

### Approximate Dynamic Programming

To further speed this up, we used a dynamic programming approximation that speeds up computation to and further reduces the number of lineages that must be tracked prior to the admixture event.

The general idea is to first perform migrations in place one ploid at a time, allowing later migrants to replace earlier ones. As we will see, the end result is not the desired frequency spectrum under the conventional admixture model which does not allow for replacement among migrant lineages. However, we will show how we can compute and correct for this difference.

To simulate one-way admixture from population 1 into population 2 at rate we therefore first allow one lineage at a time from population 1 to migrate into population 2. If is the frequency spectrum resulting from migrating one lineage from population 1 to population 2, starting from and changing the sample size from to the element of the resulting frequency spectrum is(B4)Starting from a frequency spectrum we sequentially compute the spectra resulting from migrations. Using projection operator to a sample size we obtain frequency spectra The parameter *ν* is chosen so that all plausible migrant numbers are covered, as explained below. The does not correspond to the desired frequency distribution after bulk admixture, because the number of migrant alleles per site does not have the desired binomial distribution. Our goal is then to choose a linear combination such that(B5)In the standard infinite-sites model, the number of replaced lineages in the admixed sample follows the binomial distribution We can therefore write the frequency spectrum as the sum where is the frequency spectrum that would result if exactly *j* lineages were replaced through migration.

To identify the set of that satisfy Equation B5, we also express as a linear combination of the The probability of observing *j* net replacements after *R* sequential replacements is easily computed numerically. This leaves us with If we express both sides of Equation B5 in terms of the we findWe would then like to choose the such that the coefficients are equal for each *j*:(B6)Provided that *ν* is large enough, this linear problem admits solutions for Unfortunately, exact solutions to this equation are prone to strong oscillations in the which lead to numerical instabilities. Rather than seeking exact but oscillating solutions to Equation B6, we seek nonnegative solutions that minimize the squared error using the active set method implemented in the scipy.optimize.nnls routine. Assuming that we found a solution with an acceptably small error, we finally use Equation B5 to compute the frequency spectrum.

To choose we note that *ν* sequential replacements give an average of net replacements once multiple replacements of the same lineage have been accounted for (*e.g.*, Mendelson *et al.* 2016). We find that we get accurate results if we have enough net replaced lineages to cover most likely cases in binomial sampling:

We compared the exact and approximate dynamic programming approaches to simulate admixture from population 1 into population 2 with target sizes The exact approach required 400 lineages from population 1 and took 70 sec on a 3.5 GHz processor. The approximate approach used 385 lineages from population 1 and took 5 sec on the same processor. The mean squared difference to the exact distribution of allele frequencies was 1 × 10^{−13}, and the largest *relative* error for an entry containing more than one millionth of counts was 6 × 10^{−6}. The approximate approach is therefore much more economical in computational time and the required number of lineages from the source populations.

## Appendix C: ODEs on the Moments

### 1D Case

In this section, we show how the system of ODEs (7) derived in the main text can also be derived by integration by parts from the diffusion approximation to the Wright–Fisher process. Under the diffusion approximation, the evolution of the single-population allele frequencies density follows (*e.g.*, Gutenkunst *et al.* 2009):(C1)The first term of the right-hand side is a diffusion term modeling the effect of genetic drift, and the second term is a transport term that accounts for selection. Finally, the source term models the mutation process under the infinite site assumption.

We are interested in the expected sample frequency spectra We only consider because in the infinite-sites model. The expected sample frequencies can be obtained by projecting the allele frequency density *ϕ* on weights The time derivative of is given by:The δ-function source term in Equation C1 is integrated easily. The leading-order term in is simply this is the rate at which mutations appear on *n* lineages over one generation:

#### Drift term

We want to rewrite the drift term of Equation C2 in terms of the moments To this end, we integrate by parts:As we assume that the first term is 0. We can integrate by parts the second term:As we consider a finite genome, and are finite and, once again, the term in square brackets is zero. Moreover, we have:Thus,Rearranging this expression, we can write it in terms of the which corresponds to the 1D version of Equation A2.

#### Selection term

Here again, we integrate by parts the selection term:We can again assume that the term in square brackets is zero and we rearrange the other integral term to write it in terms of the We finally get:which corresponds to the 1D version of Equation A3.

Bringing these expressions together, we get a system of ODEs on the (C3)This is the 1D, continuous-time equivalent to Equation (A1).

### Extension to Multiple Dimensions

The diffusion model can be generalized to multiple population studies. The drift, selection, and mutation terms are directly derived from the 1D case. We just need to add the effect of the migrations between the populations. The equation on the joint frequency spectrum is:(C4)Now **x** is a vector: It is the same for the parameters **s** and **h**, and is the reference population size for population *j*. The term accounts for the migrations where is the proportion of individuals in population *k* whose parents were born in population *j*.

We are interested in the multidimensional statistics generalizing the where and are vectors. The calculations are a bit more tedious but we can do the same as for the single-population case: integrating by parts each term and writing it in terms of the we recover Equation A1.

## Appendix D: Jackknife Approximation for Moment Closure

The evolution equations for involves higher-order terms, such as or which leads to a moment closure problem. To tackle this, we use a jackknife approach to approximate higher-order terms as linear functions of where is a set of indices chosen so that the frequency is close to the target frequency

Given a set we choose the coefficients so that the jackknife is exact for a given parameterized family of functions

The “order” of the jackknife refers to the number of terms in the more terms, the more general the family of functions can be, but the more numerically unstable the jackknife becomes.

We focus here on the third order jackknife for the terms and we will use the following approximation:where the index is chosen so that the frequency is as close as possible to and to satisfy the following boundary conditions:We choose the approximation so that the interpolation is exact for quadratic allele frequency distribution:Even though this seems like a drastic approximation, the jackknife only requires the quadratic approximation to hold locally: the parameters *a*, *b*, and *c* will be chosen independently for each value of *i*.

The moments become:The integral can be performed analytically to yield:

We then look for the jackknife coefficients and that cancel the approximation error:We replace by the expressions under Equation D1:This equation should hold for all values of (*a*, *b*, *c*), so we can use the particular values and and write the corresponding system of equations:We can compute the jackknife coefficients and by solving the previous linear system. We finally get:withThe same strategy is used to approximate as a linear combination of the entries. We use the same formulation for multidimensional simulations as the moment closure problem can be addressed separately in each dimension. For instance, in the 2D case, the selection in the first population will involve the higher-order term that we will approximate as follows:

## Appendix E: Benchmarks

In this section we present an extensive set of test cases to compare computational performances of *Moments* and in terms of speed and accuracy. Refer to paragraph *Implementation and performance* for more details about the metrics used for the comparisons. The results on test cases with 30, 80, and 200 ploids are given in Table 5, Table 6, and Table 7, respectively. A more explicit description of the test cases is given in Table 8. Unless otherwise specified, population sizes grow linearly: with *t* the time in genetic units. For the exponential growth, we used the function When dealing with several populations, we used the same selection and migration parameters as well as the same population growth functions for all the populations, except for the out-of-Africa model.

## Appendix F: Additional Features

*Moments* is based on interface but incorporates a new computation engine based on the method described in this article. In addition, we added a few convenience features and optimizations. We developed a model plotting module that returns a schematic representation of a given demographic model. This is not only convenient for presenting results, but also proved very useful in troubleshooting. Figure 2 has been generated with this module. Moreover, it is now possible to directly import data and extract the AFS from a vcf file and to generate bootstrapped frequency spectra from the original data set. Finally, we implemented the Powell method for likelihood optimization, which we found to be more robust than the gradient and simplex methods used previously in consistent with the findings of Excoffier *et al.* (2013).

## Footnotes

*Communicating editor: Y. S. Song*

- Received January 25, 2017.
- Accepted April 26, 2017.

- Copyright © 2017 by the Genetics Society of America