## Abstract

The joint and accurate inference of selection and demography from genetic data is considered a particularly challenging question in population genetics, since both process may lead to very similar patterns of genetic diversity. However, additional information for disentangling these effects may be obtained by observing changes in allele frequencies over multiple time points. Such data are common in experimental evolution studies, as well as in the comparison of ancient and contemporary samples. Leveraging this information, however, has been computationally challenging, particularly when considering multilocus data sets. To overcome these issues, we introduce a novel, discrete approximation for diffusion processes, termed *mean transition time approximation*, which preserves the long-term behavior of the underlying continuous diffusion process. We then derive this approximation for the particular case of inferring selection and demography from time series data under the classic Wright–Fisher model and demonstrate that our approximation is well suited to describe allele trajectories through time, even when only a few states are used. We then develop a Bayesian inference approach to jointly infer the population size and locus-specific selection coefficients with high accuracy and further extend this model to also infer the rates of sequencing errors and mutations. We finally apply our approach to recent experimental data on the evolution of drug resistance in influenza virus, identifying likely targets of selection and finding evidence for much larger viral population sizes than previously reported.

DETECTING signatures of past selective events gives insights into the evolutionary history of a species and elucidates the interaction between genotype and phenotype, providing important functional information. Unfortunately, a population’s demographic history is a major confounding factor when inferring past selective events, particularly because demographic events can mimic many of the molecular signatures of selection (Andolfatto and Przeworski 2000; Nielsen 2005). Despite efforts to create statistics robust to demography, all currently available methods to detect selection are prone to misinference under nonequilibrium demography.

Some of these issues can potentially be overcome by using multi-time-point data, as the trajectory of even a single allele contains valuable information about the underlying selection coefficient. Owing to advances in sequencing technologies, such multi-time-point data are becoming increasingly common from experimental evolution (Foll *et al.* 2014), from longitudinal medical or ecological studies (Wei *et al.* 1995; Renzette *et al.* 2014), and through ancient samples (Sverrisdóttir *et al.* 2014; Wilde *et al.* 2014). However, computationally efficient and accurate methods to infer demography and selection jointly from such data sets are still limited.

A natural and common way of modeling such time series data are in a hidden Markov model (HMM) framework, which allows efficient integration over the distribution of unobserved states of the true population frequencies, thus allowing calculation of the likelihood based on the observed samples. Williamson and Slatkin (1999), for instance, developed a maximum-likelihood approach based on such an HMM to infer the population size *N* from samples taken at different time points. More recently, similar approaches have been developed to infer population size along with the selection coefficient of a selected locus for which time series data are available (Bollback *et al.* 2008; Malaspinas *et al.* 2012).

All such approaches, however, are plagued by the problem that the number of hidden frequency states is equal to the population size, which renders HMM applications computationally unfeasible for large populations. Different routes have been taken to overcome this. One approach is to model the underlying Wright–Fisher process as a continuous diffusion process, which is then discretized for numerical integration using a numerical difference scheme (Bollback *et al.* 2008). Since this approach remains computationally expensive, it was later suggested to directly model the diffusion process on a more coarse-grained grid (Malaspinas *et al.* 2012). Under this approach, the generator matrix for the transition between the coarse-grained states is then approximated by fitting the first and second infinitesimal moments. Unfortunately, the minimum number of states required is still computationally prohibitive for large values of (Malaspinas *et al.* 2012). For this reason, the most recent reported method resorted to simulation-based approximate Bayesian computation (ABC), which allowed the joint inference of locus-specific selection coefficients for many loci (Foll *et al.* 2014, 2015). However, this method requires first estimating the population size under the assumption that all loci are neutral and thus may be biased when many loci are under selection.

Here we introduce a novel framework by approximating the Wright–Fisher (WF) process with a coarse-grained Markov model that exactly preserves the expected waiting times for transition between states. This is achieved by exploiting the theory of Green’s function for diffusion processes. Contrary to previous approaches, our approximation matches the WF process closely even when only very few states are considered, regardless of As we show with extensive simulations and a data application from experimental evolution, our method allows for accurate joint inference of both population size and locus-specific selection coefficients even in the presence of pervasive selection. Further, it is readily extended to incorporate population size changes, sequencing errors, or the appearance of novel mutations.

## Models

### Mean transition time approximation

Let be a diffusion process on the state space This is a continuous-time Markov process with continuous sample paths and with infinitesimal generator(1)For general information about diffusion processes we refer to Durrett (2008, Chap. 7) and Etheridge (2011, Chap. 3).

The classical example in population genetics is the Fisher–Wright diffusion, which we discuss below. We seek to find a discrete-state Markov process that approximates For this purpose, we subdivide the unit interval into, not necessarily equidistant, frequenciesThese form the states of For two states consider the transition time to the first visit of when starting at Similarly we define the transition time for the diffusion process We say that is a *mean transition time approximation* of if(2)for all pairs of states (see Figure 1). This condition guarantees that the paths of and exhibit comparable long-term behavior. In the following we show how to construct the Markov process from the diffusion process using the theory of Green’s function.

We begin by recalling some notions for diffusion processes. The natural scale of the process is given by(3)where see Durrett (2008, p. 264). The so-called speed measure is defined by(4)According to theorem 7.16 in Durrett (2008), Green’s function for an interval is given by(5)Denote by or the time to first visit of *u* or *v*, respectively, starting at *x*. Then is the exit time from the interval given the process is at *x* at time One can show (see Durrett 2008, p. 279)(6)Moreover, the probability of exiting at the lower limit *u* is(7)We now want to determine the instantaneous transition rates of the discrete-state Markov process Recall the definitionsandThe sojourn time of state *i.e.*, the time interval of spent in state is an exponential random variable with parameter Since the expectation of this exponential variable is our condition (2) enforceswhere we write instead of etc., to unburden the notation. From this we getandWe can now form the tridiagonal generator matrixThe transition matrix of the Markov process is given by

### Application to Wright–Fisher models

We consider a classic Wright–Fisher Model of two alleles that segregate in a population of size Time *t* is measured in generations of the Wright–Fisher process. In the presence of a nonvanishing dominance coefficient *h* the fitnesses of the three genotypes are given by and Under such a model, the infinitesimal mean, which corresponds to the change in allele frequency, is then given by (Ewens 2004, p. 13)(9)Let be a diffusion process corresponding to the frequency of allele *A*. As shown by Lacerda and Seoighe (2014), an excellent approximation of the Wright–Fisher process is obtained by setting(10)and(11)in the infinitesimal generator (1), where(12)and(13)when

Note that in the standard diffusion approximation the denominator term in is often omitted. But the above choice yields a much more accurate approximation to the WF process (Lacerda and Seoighe 2014).

From (3) and (4) we get(14)and(15)where we have set(16)For the speed measure we obtain(17)Consider three consecutive states and For the probability to exit at the lower state we get(18)The probability for exit at the upper state isObserve that Green’s function is calculated byUsing the quantities calculated above we get for the two parts of Green’s function(19)and(20)With numerical integration we can determineSpecifically, we use the extended Simpson’s rule for the numerical integration (Press 2007), which we found to give accurate results with typically only 8 or 10 intervals.

If is large, we get approximations for Green’s function that allow for analytic expressions of the integrals (see *Appendix*). Similarly, analytic expressions can be found in the special case (see *Appendix*).

### Bayesian inference

Consider that at the times samples of sizes were taken from the population and alleles *A* were observed in these samples. In this section, we describe how the mean transition time approximation introduced above can be embedded into a Bayesian inference scheme to estimate the population size and the locus-specific selection coefficient jointly from time series data.

As has been noted previously (Williamson and Slatkin 1999; Bollback *et al.* 2008; Malaspinas *et al.* 2012; Mathieson and McVean 2013; Lacerda and Seoighe 2014; Steinrücken *et al.* 2014), a natural way of modeling both the underlying evolutionary process and the process of sampling is a HMM. Under the assumption that the population size between two time points and is constant at the transition matrix of such an HMM from state to state is calculated bywhere and the generator matrix is determined as explained above using We note here that this framework allows for instantaneous population size changes to occur at every time *t* during the HMM. However, we henceforth deal only with situations in which the population size is assumed to be constant across the whole sampling period.

Following previous implementations (*e.g.*, Williamson and Slatkin 1999; Bollback *et al.* 2008; Malaspinas *et al.* 2012; Mathieson and McVean 2013; Lacerda and Seoighe 2014; Steinrücken *et al.* 2014), we assume that the sampling of alleles from the underlying population frequency is binomial; *i.e.*,However, for large sample sizes, the few states may be too coarse grained to capture the region of high emission probability. We thus propose to integrate the emission probabilities against a smoothing kernel. We chose to implement a *β*-distribution kernel, which is the conjugate prior to the binomial emission probabilities. As a result, this choice leads to a *β*-binomial emission probability that can be evaluated analytically. Specifically, we chose to use a *β*-kernel with mean and standard deviation such that the interval corresponds to in the case of equidistant states. Under this choice, the emission probabilities are then calculated bywhere is the Beta function and the parameters and are determined via the moment estimators for a *β*-distributionWith both transition and emission probability matrices at hand, we calculate the likelihood of the full data, using the standard forward recursion. To be specific, let us first define for the emission probability matricesDenoting we define the total probabilityThis total probability can be determined efficiently with the forward recursion (Murphy 2012, p. 609)(21)and Then one has(22)where we made explicit the dependence of this probability on the parametersIf we impose priors on the parameters, then we can simulate the posterior probability with the usual MCMC scheme using (22) and the Hastings ratio

### Extension of basic model

#### Sequencing errors:

Generally, sequencing errors are overcome with sufficient coverage. However, in many applications of next-generation sequencing to experimental evolution, the goal of the sequencing is not to infer individual genotypes, but instead allele frequencies directly. Under such a setting, each sequencing read is assumed to be from a different individual. In such cases, sequencing errors may lead to false inference, especially when allele frequencies are very small.

Incorporating sequencing errors into our framework is straightforward. Under the assumption that there are only two alleles present at the locus (achieved by, for instance, pooling all nonselected alleles into one class) and symmetric error rates *ε* between those classes, we can approximate the probability that the *i*th allele surveyed at time *t*, is *A* in the presence of sequencing errors as

#### Mutational input:

We allow for the production of mutant alleles only when the process is in state or The production of new alleles proceeds at a rate of Once a new allele is produced, say when the system is in state it must get from state into state This happens with probability which is calculated according to (7). This yields the transition rateSince is close to 0, we can assume that and see (16). Using (7) and (15) we obtainFor the production of a new allele in state an analogous argument yields the approximations and by (18)In the selection-free case, *i.e.*, in the limit the transition probabilities simplify to

### Implementation

We have implemented the proposed model and the Bayesian inference scheme in an easy-to-use C++ program available on our laboratory website (http://www.unifr.ch/biology/research/wegmann). While we use standard implementations for most aspects, we note the matrix exponentiation in Equation 8, which is a numerically very demanding problem. A classic algorithm for matrix exponentiation is by diagonalization of the matrix (Moler and Van Loan 1978). While computationally efficient, this algorithm may be numerically unstable for matrices with large condition numbers, which are typically observed when becomes large. This was previously observed by Malaspinas *et al.* (2012), who addressed this issue using multiple-precision arithmetics. Unfortunately, such arithmetics are computationally very demanding, leading to slow performance of their implementation.

Here, we propose to alleviate this problem, using the approximationwhich can be calculated by successive quadration. Such matrix multiplications are generally demanding, but can be implemented in a computationally efficient manner for generator matrices that are tridiagonal, as each quadration step adds only two additional diagonals and such band matrices can be multiplied efficiently (see Dahlquist and Björk 2008, Chap. 7.4).

We further mention the choice of frequency bins. Malaspinas *et al.* (2012) report that for their approach, a tighter spacing of frequencies toward the boundaries led to more accurate results, in particular with what they call a “quadratic grid.” We thus chose to implement, apart from a uniformly spaced grid, also a quadratic grid with andscaled such that The major difference of this choice from the quadratic grid proposed by Malaspinas *et al.* (2012) is that we do not force and as this would force us to change the frequency bins as a function of *N* during the MCMC and hence to recalculate emission probabilities.

### Application to influenza data

#### Influenza data:

We analyzed allele frequency data from whole-genome data sets of influenza H1N1 obtained in a recent evolutionary experiment (Renzette *et al.* 2014). While we refer the reader to the original study for a detailed description of the experimental setup, we summarize the key point briefly here: Influenza A/Brisbane/59/2007 (H1N1) was serially amplified on Madin–Darby canine kidney (MDCK) cells for 12 passages of 72 hr each to prevent any freeze–thaw cycles. After the three initial cycles, samples were passed either in the absence of drug or in the presence of increasing concentrations of oseltamivir, a neuraminidase inhibitor, for another nine passages. At the end of each passage, samples were collected for whole-genome high-throughput population sequencing up to a median coverage of >50,000×.

For our analysis here we considered only the time points taken during drug treatment (passages 4–12), but considered all 13,395 sites for which data were available (Foll *et al.* 2014). For each site, we first identified the two alleles having the highest frequencies over all passages and considered the minor allele to be the one with the lower frequency at the beginning of the experiment (passage 0). To avoid any bias, all other alleles were treated collectively as the major allele. We estimated *N* along with locus-specific selection coefficients *s*, the sequencing error rate *ε*, and the per site mutation rate *μ*. We assumed log-uniform priors on *N*, *ε*, and *μ* such that and and a normal prior on the selection coefficients such that Since viruses are haploid, we fixed the dominance coefficient at We then ran an MCMC using 51 states for 25,000 iterations during which each parameter was updated in turn. The first 2000 such iterations were discarded as burn-in phase.

#### Simulations:

To assess the accuracy of our approximation, we simulated trajectories under the discrete Wright–Fisher process and the diffusion process, as well as under the mean transition time approximation and the approximation proposed by Malaspinas *et al.* (2012). All simulations under the discrete Wright–Fisher model and the diffusion process were performed using binomial sampling and the Euler–Maruyama method, respectively. Those under approximations using frequency states were generated by simulating transitions according to the transition matrices calculated under the specific approximations.

To evaluate the power of our method to infer population sizes and selection coefficients, we also simulated data for 20 or 100 unlinked loci with *N* = 100, 1000, or 10,000. For each of these settings, we set either 20% or 80% of the loci to be under selection, with an equal representation of four selection coefficients: −0.1, −0.01, 0.01, and 0.1. All loci, both selected and neutral, had the starting allele frequency set at random. The change in allele frequency from one time point to the next was calculated under the Wright–Fisher model, matching the experimental setup of our application. Specifically, we simulated a total of 117 generations and took a sample of 1000 sequences every 13 generations, unless otherwise stated.

### Data availability

The authors state that all data necessary for confirming the conclusions presented in the article are represented fully within the article.

## Results and Discussion

### Mean transition time approximation

Comparisons of the long-term behavior of the here-introduced mean transition time approximation of the Wright–Fisher process with its discrete realization demonstrate the power of our approximation. In Figure 2 we show the frequency distribution of alleles with an initial frequency of 0.2 after 10 generations of selection and random drift under the discrete Wright–Fisher process for different population sizes and different selection strength. As expected from our assumptions, the distributions obtained under our approximation have identical means and show only a slightly increased variance for large selection coefficients and a small number of states. This finding is further strengthened when comparing this distribution over larger timescales up to 1000 generations (Figure 3), which also illustrates that our approximation leads to accurate loss/fixation ratios. In the situations studied here, all loci correctly fix in the case of strong selection or when *N* is large. In the case of and however, we estimate that 62.00% or 61.99% of all loci will be lost when using 1001 or 21 states, respectively. This is very similar to the proportion of 62.02% obtained among simulations with the diffusion approximated here.

A more direct illustration of our assumption is the comparison of the distribution of waiting times for a specific transition. As shown in Figure 2, our approximation indeed captures the mean transition time perfectly, while again exhibiting an increased variance for large selection coefficients and small number of states. Based on these results, and to keep the computational effort minimal, we use 51 states for all our inference shown below.

### Choice of grid

All results shown above were obtained with a uniform grid of frequency states. Following Malaspinas *et al.* (2012), we also implemented a quadratic grid, but we found the choice of grids not to affect our approximation noticeably. In general, we found the quadratic grid to describe probabilities close to boundaries more accurately, but to be less accurate for intermediate frequencies than a uniform grid. These differences, however, are small, do not inflate with increasing number of generations, and are visible only for a very low number of states (Supplemental Material, Figure S1). This suggests that our approximation is very robust to the choice of the grid and that the differences observed are due to the resolution in characterizing the probability distribution at different frequencies rather than an effect of the approximation itself. We thus continue using a uniform grid in the following.

### Comparison with related methods

Recently, Lacerda and Seoighe (2014) proposed to approximate the probability distribution of allele frequencies after *t* generations by a Gaussian distribution, the mean and variance of which can be obtained iteratively using the delta method. Their approach can easily be applied to the diffusion process studied here (see *Appendix*). As shown in Figure 3, the approximation obtained via the delta method and our approximation are very similar over a large range of the parameter space and also agree well with the diffusion process they both approximate. However, due to the assumption of a Gaussian distribution, the approximation obtained with the delta method is less accurate than our approach in describing allele frequencies close to boundaries. This is particularly true when selection is weak enough such that the probability of fixation is which results in a bimodal distribution (Figure 3A).

A major advantage of the delta approach, however, is its computational speed, which does not depend on the population size or the selection strength. Our method is generally much more demanding due to its reliance on matrix calculations rather than simple recursions. But the benefit of our approach lies in the discretization of allele frequencies, without which any inference from time-series data is computationally impossible whenever *N* is large.

In this regard, our method is closer to that introduced by Malaspinas *et al.* (2012) that also uses a grid of discretized allele frequencies. In contrast to our method, however, their approach approximates the mean and variance of the infinitesimal transition probabilities, rather than those of the resulting waiting times. While Malaspinas *et al.* (2012) derive their approximation for the classic diffusion, it is straightforward to generalize their approach and apply it to the diffusion studied here (see *Appendix*). As shown in Figure 3, their approximation holds generally well for most of the range tested, but allele frequencies appear to rise slightly too fast. More importantly, the approximation introduced by Malaspinas *et al.* (2012) requires substantially more states than our approximation due to the mathematical nature of the approximation. For the case of and shown in Figure 3D, for instance, a minimum of 5813 states are required. In contrast, our approximation is computationally stable even with just a handful of states and thus allows us to balance accuracy and computation effort regardless of *N* or *s*. This difference between the two approaches easily translates into a reduction in computation time of several orders of magnitude when attempting to infer parameters using an HMM and essentially rendering such an analysis unfeasible for large with the approximation introduced by Malaspinas *et al.* (2012), as has been reported recently (Foll *et al.* 2015).

### Power to infer population sizes

While allele trajectories are affected by both selection and drift, we aim here to disentangle these effects by integrating information from multiple loci. We first assessed the power to infer population sizes *N* accurately under ideal conditions, that is, for 100 unlinked loci in the absence of selection. In Figure 4 we show the likelihood surfaces for *N* obtained with a different number of states, for data simulated under different population sizes. While this analysis suggests high power to infer small population sizes accurately, it highlights the general issue of inferring large population sizes from changes in allele frequencies, accentuated when fewer states are used. The issue arises from the fact that in large populations and over the short time course of evolutionary experiments in general, the changes in allele frequencies between time points are so small that they are compatible with almost arbitrarily large populations. While using fewer frequency states further decreases the resolution of detectable allele frequency changes, we note that this issue is more general and expected to affect all methods for inferring population sizes from such data, particularly when a small number of samples is used. The best way to overcome it is to observe changes in allele frequencies over larger intervals. Indeed, when taking samples every 130 generations instead of every 13, population sizes up to *N* = 100,000 can be estimated accurately (Figure 4, bottom row).

### Power to infer selection

To assess the power of our framework to infer locus-specific selection coefficients, we simulated 100 unlinked loci, of which 20% experienced selection at various strengths. As shown in Figure 5, both the population size and the strength of selection affect the power of this inference. For medium to large population sizes, our method infers even small selection coefficients with high accuracy. When the population size is small, however, inference of selection proves more difficult (Figure 5). While this is generally expected due to the much larger effect of drift in small populations ( for the strongly selected alleles), it is accentuated here by our choice to simulate initial frequencies at random. Indeed, when given ideal starting frequencies (0.1 for positively and 0.9 for negatively selected alleles), our method identifies strongly selected alleles accurately even in small populations (Figure S2).

Remarkably, we found the power to infer population sizes as well as locus-specific selection coefficients not to be negatively affected under pervasive selection. This is illustrated by comparing the posterior distributions obtained from simulations where 80% of all loci were targeted by selection (Figure S3) to those shown here where only 20% were affected by selection (Figure 5). More direct evidence is given in Table 1, where we report the posterior probability for for different combinations of population sizes and selection coefficients and actually find higher power to identify selected loci in the case of pervasive selection than when only 20% of all loci were simulated under selection.

For computational efficiency, all results shown here were obtained using 51 states. However, we note a trade-off between power of inference and computational costs. As shown in Figure S4, using very few states (21) may lead to slightly broader posteriors and a small bias toward weaker values of *s*. Both effects are already largely overcome when using 51 states for most loci, but small improvements are still detectable with more states (Figure S4).

### Application to influenza data

We next applied our approach to publicly available sequencing data of influenza H1N1 segment 6, obtained at multiple time points throughout an evolutionary experiment in which the virus was exposed to an antiviral drug (oseltamivir) (Renzette *et al.* 2014). While allele frequencies are generally estimated with high accuracy due to the very high coverage in this experiment (∼50,000×), sequencing error may contribute substantially to the observed low-frequency variants. In addition, many of the observed mutations likely entered the population only during the experiment, but their exact time of origin is blurred by both the sequencing error and sampling. We thus extended our framework to estimate the mutation rate as well as the overall sequencing error rate jointly with the demographic and selection parameters.

We applied our extended method to each of the eight segments of the influenza genome individually, but obtained highly concordant results among all segments. As shown in Figure 6, we infer the effective population size during the experiment to be ∼7000, a mutation rate of ∼ and a sequencing error rate of ∼ While our estimates of the mutation and error rates are consistent with published mutation rates for influenza (Nobusawa and Sato 2006) and RNA viruses in general (Drake *et al.* 1998) and also with the employed quality filters on sequencing reads (Foll *et al.* 2014), our estimate of the population size is substantially larger than previous estimates of ∼225 (Foll *et al.* 2014). While we found our approach to slightly overestimate larger population sizes under the spacing of time points relevant here, there are several arguments supporting a larger population size. First, the original estimates were obtained under the assumption of neutrality at all loci, while our approach infers *N* jointly with selection. Second, the previous estimates were obtained from a small subset of the data, namely the 147 loci with an observed allele frequency after down-sampling to 1000 reads per locus at no less than three time points. In contrast, our inference is based on the raw data at the complete set of 13,395 loci, including those with small frequencies particularly informative about drift. Third, the original inference accounted for neither sequencing errors nor mutations. In summary, our results argue for a much larger effective population size than previously reported.

Our results on selection, on the other hand, are highly concordant with previous estimates. In Figure 6 we report the posterior distributions on the locus-specific selection coefficients for all polymorphic sites for each of the eight segments of the influenza genome. As expected, most mutations were found to be selectively neutral or under slight purifying selection (observe the slight asymmetry toward negative selection coefficients for many loci). For a few mutations, however, we found compelling evidence for them to be the target of positive selection (99% credible interval does not include 0). On segment NA, there were three such mutations, of which two stand out with an estimated selection coefficient ∼0.2. One of these mutations (Y274H) occurred at a locus at which resistance to oseltamivir has been previously described (Collins *et al.* 2008). Many additional mutations were found to be the target of selection throughout the genome, with many of those likely under negative selection. These are mutations that were found at elevated frequencies at the beginning of the experiment, yet at much lower frequencies after a few passages. The complete list of all mutations found to be under selection is given in Table S1.

### Conclusion

Here we present a novel, discrete approximation for diffusion processes. This approximation, which we term mean transition time approximation, is designed to preserve the long-term behavior of the continuous process it approximates, which renders it particularly suitable to study time-series data. Here we derived this approximation for the particular case of inferring selection and demography from such time-series data under the classic Wright–Fisher model. As shown through extensive simulations, our approximation is well suited to describe allele trajectories through time, even when only a few states are used. This allowed us to develop a Bayesian inference approach to jointly infer the population size and locus-specific selection coefficients with high accuracy. We further extended this model to estimate the average sequencing error rate, as well as the per generation mutation rate. The approach is further readily applicable to models of instantaneous population size changes. We finally applied our approach to data from a recent experiment on the evolution of drug resistance in influenza virus, identifying likely targets of selection and finding evidence for much larger viral population sizes than previously reported.

## Acknowledgments

We thank Nicolas Renzette for advice on how to identify the protein changes corresponding to individual mutations. We are grateful to two anonymous reviewers for their very constructive comments on an earlier version of this work. This study was supported by Swiss National Foundation grants PZ00P3_142643 and 31003A_149920 (to D.W.) and grants from the Swiss National Science Foundation and a European Research Council Starting Grant (to J.D.J.).

## Appendix

### Approximation for Large

If is large, we get approximations for Green’s function that allow for analytic expressions of the integrals. More precisely, assume that is large. We can then neglect the −1 terms in the numerator and denominator of (18) and we get the approximation(A1)which will be very small for large *γ*. The probability for exit at the upper state is Inserting the first approximating expression or into (19) and using we get(A2)The exponential term is dominant for *y* close to In the integral we can thus keep the factor of the exponential constant at since it does not vary much when *y* is close to :(A3)From (20) we get the approximationTo get we integrate this approximate expression. Observe that the exponential term becomes important only when *y* gets close to For this reason we can safely keep the factor in front of the exponential term constant when integrating the second term:(A4)Numerical experiments indicate that the approximate formulas (A3) and (A4) are adequate when the conditions(A5)are met. In that case we set andNote that formula (A4) gets singular for since in that case Using the substitution we get for that case from (20) the approximationThe last integral can be written as an exponential integralin the formUsing the approximationwhere is the Euler–Mascheroni constant, we finally arrive at(A6)Similarly, the case deserves special attention because the denominator of (A2) gets singular at Since is small and *y* even smaller, we can set and From (19) we then get the approximationsFrom this we obtain for the downward mean transition timebecause the integrand is very dominant at the upper integration limit. From (A1) we get the approximation and thus

### The Wright–Fisher Process in the Absence of Selection

In the absence of selection (), the expressions for the generator matrix can be explicitly evaluated since (see Equation 11). We have and From this we get(A8)The two parts of Green’s function are given byandThese integrate to(A9)and(A10)As above we determine the transition rates by

### Approximations via the Delta Method

Following the argument of Lacerda and Seoighe (2014), an approximate solution to the diffusion equation can be obtained by the delta method. While their original formulation applies to the discrete Wright–Fisher process, the argument works as well for the diffusion process studied here.

As above (Equation 1), is a diffusion process on the state space with infinitesimal generator(A11)Recall that the infinitesimal moments of the diffusion process are given byThe mean of the process can be approximated iteratively as follows:In the last step, we used the delta approximation Similarly, we apply the delta approximation to get an iterative approximation for the variation:For the case of and by inserting (9), one gets in particular

### Approximations as Proposed by Malaspinas *et al.*

As in Malaspinas *et al.* (2012), we construct the Markov chain with states by matching the infinitesimal mean and infinitesimal variance of *U* and *X*. This allows us to determine the generator matrix Here we generalize their notation for any diffusion processFrom formulas (8) and (9) from (Malaspinas *et al.* 2012) we then getThese can be solved for the infinitesimal generatorswhere we used the abbreviation To apply these general formulas to the particular diffusion studied here we simply use and as given in Equations 10 and 11, respectively.

## Footnotes

*Communicating editor: J. Wakeley*Supplemental material is available online at www.genetics.org/lookup/suppl/doi:10.1534/genetics.115.184598/-/DC1.

- Received November 7, 2015.
- Accepted March 22, 2016.

- Copyright © 2016 by the Genetics Society of America