Abstract
The joint and accurate inference of selection and demography from genetic data is considered a particularly challenging question in population genetics, since both process may lead to very similar patterns of genetic diversity. However, additional information for disentangling these effects may be obtained by observing changes in allele frequencies over multiple time points. Such data are common in experimental evolution studies, as well as in the comparison of ancient and contemporary samples. Leveraging this information, however, has been computationally challenging, particularly when considering multilocus data sets. To overcome these issues, we introduce a novel, discrete approximation for diffusion processes, termed mean transition time approximation, which preserves the long-term behavior of the underlying continuous diffusion process. We then derive this approximation for the particular case of inferring selection and demography from time series data under the classic Wright–Fisher model and demonstrate that our approximation is well suited to describe allele trajectories through time, even when only a few states are used. We then develop a Bayesian inference approach to jointly infer the population size and locus-specific selection coefficients with high accuracy and further extend this model to also infer the rates of sequencing errors and mutations. We finally apply our approach to recent experimental data on the evolution of drug resistance in influenza virus, identifying likely targets of selection and finding evidence for much larger viral population sizes than previously reported.
DETECTING signatures of past selective events gives insights into the evolutionary history of a species and elucidates the interaction between genotype and phenotype, providing important functional information. Unfortunately, a population’s demographic history is a major confounding factor when inferring past selective events, particularly because demographic events can mimic many of the molecular signatures of selection (Andolfatto and Przeworski 2000; Nielsen 2005). Despite efforts to create statistics robust to demography, all currently available methods to detect selection are prone to misinference under nonequilibrium demography.
Some of these issues can potentially be overcome by using multi-time-point data, as the trajectory of even a single allele contains valuable information about the underlying selection coefficient. Owing to advances in sequencing technologies, such multi-time-point data are becoming increasingly common from experimental evolution (Foll et al. 2014), from longitudinal medical or ecological studies (Wei et al. 1995; Renzette et al. 2014), and through ancient samples (Sverrisdóttir et al. 2014; Wilde et al. 2014). However, computationally efficient and accurate methods to infer demography and selection jointly from such data sets are still limited.
A natural and common way of modeling such time series data are in a hidden Markov model (HMM) framework, which allows efficient integration over the distribution of unobserved states of the true population frequencies, thus allowing calculation of the likelihood based on the observed samples. Williamson and Slatkin (1999), for instance, developed a maximum-likelihood approach based on such an HMM to infer the population size N from samples taken at different time points. More recently, similar approaches have been developed to infer population size along with the selection coefficient of a selected locus for which time series data are available (Bollback et al. 2008; Malaspinas et al. 2012).
All such approaches, however, are plagued by the problem that the number of hidden frequency states is equal to the population size, which renders HMM applications computationally unfeasible for large populations. Different routes have been taken to overcome this. One approach is to model the underlying Wright–Fisher process as a continuous diffusion process, which is then discretized for numerical integration using a numerical difference scheme (Bollback et al. 2008). Since this approach remains computationally expensive, it was later suggested to directly model the diffusion process on a more coarse-grained grid (Malaspinas et al. 2012). Under this approach, the generator matrix for the transition between the coarse-grained states is then approximated by fitting the first and second infinitesimal moments. Unfortunately, the minimum number of states required is still computationally prohibitive for large values of (Malaspinas et al. 2012). For this reason, the most recent reported method resorted to simulation-based approximate Bayesian computation (ABC), which allowed the joint inference of locus-specific selection coefficients for many loci (Foll et al. 2014, 2015). However, this method requires first estimating the population size under the assumption that all loci are neutral and thus may be biased when many loci are under selection.
Here we introduce a novel framework by approximating the Wright–Fisher (WF) process with a coarse-grained Markov model that exactly preserves the expected waiting times for transition between states. This is achieved by exploiting the theory of Green’s function for diffusion processes. Contrary to previous approaches, our approximation matches the WF process closely even when only very few states are considered, regardless of As we show with extensive simulations and a data application from experimental evolution, our method allows for accurate joint inference of both population size and locus-specific selection coefficients even in the presence of pervasive selection. Further, it is readily extended to incorporate population size changes, sequencing errors, or the appearance of novel mutations.
Models
Mean transition time approximation
Let be a diffusion process on the state space
This is a continuous-time Markov process with continuous sample paths and with infinitesimal generator
(1)For general information about diffusion processes we refer to Durrett (2008, Chap. 7) and Etheridge (2011, Chap. 3).
The classical example in population genetics is the Fisher–Wright diffusion, which we discuss below. We seek to find a discrete-state Markov process that approximates
For this purpose, we subdivide the unit interval
into, not necessarily equidistant, frequencies
These form the states of
For two states
consider the transition time to the first visit of
when starting at
Similarly we define the transition time for the diffusion process
We say that
is a mean transition time approximation of
if
(2)for all pairs of states
(see Figure 1). This condition guarantees that the paths of
and
exhibit comparable long-term behavior. In the following we show how to construct the Markov process
from the diffusion process
using the theory of Green’s function.
Mean transition time approximation of Markov processes. Shown are the realizations of a continuous diffusion process (black) and a discrete-state Markov process
(red) starting at
until they reach
for the first time. If the expected waiting time for such a transition is the same for both processes for all pairs of states
we say that
is a mean transition time approximation of
We begin by recalling some notions for diffusion processes. The natural scale of the process is given by
(3)where
see Durrett (2008, p. 264). The so-called speed measure is defined by
(4)According to theorem 7.16 in Durrett (2008), Green’s function for an interval
is given by
(5)Denote by
or
the time to first visit of u or v, respectively, starting at x. Then
is the exit time from the interval
given the process is at x at time
One can show (see Durrett 2008, p. 279)
(6)Moreover, the probability of exiting at the lower limit u is
(7)We now want to determine the instantaneous transition rates
of the discrete-state Markov process
Recall the definitions
and
The sojourn time of state
i.e., the time interval of
spent in state
is an exponential random variable with parameter
Since the expectation of this exponential variable is
our condition (2) enforces
where we write
instead of
etc., to unburden the notation. From this we get
and
We can now form the tridiagonal generator matrix
The transition matrix of the Markov process
is given by

Application to Wright–Fisher models
We consider a classic Wright–Fisher Model of two alleles that segregate in a population of size Time t is measured in generations of the Wright–Fisher process. In the presence of a nonvanishing dominance coefficient h the fitnesses of the three genotypes are given by
and
Under such a model, the infinitesimal mean, which corresponds to the change in allele frequency, is then given by (Ewens 2004, p. 13)
(9)Let
be a diffusion process corresponding to the frequency of allele A. As shown by Lacerda and Seoighe (2014), an excellent approximation of the Wright–Fisher process is obtained by setting
(10)and
(11)in the infinitesimal generator (1), where
(12)and
(13)when
Note that in the standard diffusion approximation the denominator term in is often omitted. But the above choice yields a much more accurate approximation to the WF process (Lacerda and Seoighe 2014).
From (3) and (4) we get(14)and
(15)where we have set
(16)For the speed measure we obtain
(17)Consider three consecutive states
and
For the probability to exit at the lower state we get
(18)The probability for exit at the upper state is
Observe that Green’s function is calculated by
Using the quantities calculated above we get for the two parts of Green’s function
(19)and
(20)With numerical integration we can determine
Specifically, we use the extended Simpson’s rule for the numerical integration (Press 2007), which we found to give accurate results with typically only 8 or 10 intervals.
If is large, we get approximations for Green’s function that allow for analytic expressions of the integrals (see Appendix). Similarly, analytic expressions can be found in the special case
(see Appendix).
Bayesian inference
Consider that at the times
samples of sizes
were taken from the population and
alleles A were observed in these samples. In this section, we describe how the mean transition time approximation introduced above can be embedded into a Bayesian inference scheme to estimate the population size
and the locus-specific selection coefficient jointly from time series data.
As has been noted previously (Williamson and Slatkin 1999; Bollback et al. 2008; Malaspinas et al. 2012; Mathieson and McVean 2013; Lacerda and Seoighe 2014; Steinrücken et al. 2014), a natural way of modeling both the underlying evolutionary process and the process of sampling is a HMM. Under the assumption that the population size between two time points and
is constant at
the transition matrix of such an HMM from state
to state
is calculated by
where
and the generator matrix
is determined as explained above using
We note here that this framework allows for instantaneous population size changes to occur at every time t during the HMM. However, we henceforth deal only with situations in which the population size is assumed to be constant across the whole sampling period.
Following previous implementations (e.g., Williamson and Slatkin 1999; Bollback et al. 2008; Malaspinas et al. 2012; Mathieson and McVean 2013; Lacerda and Seoighe 2014; Steinrücken et al. 2014), we assume that the sampling of alleles from the underlying population frequency is binomial; i.e.,However, for large sample sizes, the few states
may be too coarse grained to capture the region of high emission probability. We thus propose to integrate the emission probabilities against a smoothing kernel. We chose to implement a β-distribution kernel, which is the conjugate prior to the binomial emission probabilities. As a result, this choice leads to a β-binomial emission probability that can be evaluated analytically. Specifically, we chose to use a β-kernel with mean
and standard deviation
such that the interval
corresponds to
in the case of equidistant states. Under this choice, the emission probabilities are then calculated by
where
is the Beta function and the parameters
and
are determined via the moment estimators for a β-distribution
With both transition and emission probability matrices at hand, we calculate the likelihood of the full data, using the standard forward recursion. To be specific, let us first define for
the
emission probability matrices
Denoting
we define the total probability
This total probability can be determined efficiently with the forward recursion (Murphy 2012, p. 609)
(21)and
Then one has
(22)where we made explicit the dependence of this probability on the parameters
If we impose priors
on the parameters, then we can simulate the posterior probability
with the usual MCMC scheme using (22) and the Hastings ratio

Extension of basic model
Sequencing errors:
Generally, sequencing errors are overcome with sufficient coverage. However, in many applications of next-generation sequencing to experimental evolution, the goal of the sequencing is not to infer individual genotypes, but instead allele frequencies directly. Under such a setting, each sequencing read is assumed to be from a different individual. In such cases, sequencing errors may lead to false inference, especially when allele frequencies are very small.
Incorporating sequencing errors into our framework is straightforward. Under the assumption that there are only two alleles present at the locus (achieved by, for instance, pooling all nonselected alleles into one class) and symmetric error rates ε between those classes, we can approximate the probability that the ith allele surveyed at time t, is A in the presence of sequencing errors as

Mutational input:
We allow for the production of mutant alleles only when the process is in state or
The production of new alleles proceeds at a rate of
Once a new allele is produced, say when the system is in state
it must get from state
into state
This happens with probability
which is calculated according to (7). This yields the transition rate
Since
is close to 0, we can assume that
and
see (16). Using (7) and (15) we obtain
For the production of a new allele in state
an analogous argument yields the approximations
and by (18)
In the selection-free case, i.e., in the limit
the transition probabilities simplify to

Implementation
We have implemented the proposed model and the Bayesian inference scheme in an easy-to-use C++ program available on our laboratory website (http://www.unifr.ch/biology/research/wegmann). While we use standard implementations for most aspects, we note the matrix exponentiation in Equation 8, which is a numerically very demanding problem. A classic algorithm for matrix exponentiation is by diagonalization of the matrix (Moler and Van Loan 1978). While computationally efficient, this algorithm may be numerically unstable for matrices with large condition numbers, which are typically observed when becomes large. This was previously observed by Malaspinas et al. (2012), who addressed this issue using multiple-precision arithmetics. Unfortunately, such arithmetics are computationally very demanding, leading to slow performance of their implementation.
Here, we propose to alleviate this problem, using the approximationwhich can be calculated by successive quadration. Such matrix multiplications are generally demanding, but can be implemented in a computationally efficient manner for generator matrices that are tridiagonal, as each quadration step adds only two additional diagonals and such band matrices can be multiplied efficiently (see Dahlquist and Björk 2008, Chap. 7.4).
We further mention the choice of frequency bins. Malaspinas et al. (2012) report that for their approach, a tighter spacing of frequencies toward the boundaries led to more accurate results, in particular with what they call a “quadratic grid.” We thus chose to implement, apart from a uniformly spaced grid, also a quadratic grid with and
scaled such that
The major difference of this choice from the quadratic grid proposed by Malaspinas et al. (2012) is that we do not force
and
as this would force us to change the frequency bins as a function of N during the MCMC and hence to recalculate emission probabilities.
Application to influenza data
Influenza data:
We analyzed allele frequency data from whole-genome data sets of influenza H1N1 obtained in a recent evolutionary experiment (Renzette et al. 2014). While we refer the reader to the original study for a detailed description of the experimental setup, we summarize the key point briefly here: Influenza A/Brisbane/59/2007 (H1N1) was serially amplified on Madin–Darby canine kidney (MDCK) cells for 12 passages of 72 hr each to prevent any freeze–thaw cycles. After the three initial cycles, samples were passed either in the absence of drug or in the presence of increasing concentrations of oseltamivir, a neuraminidase inhibitor, for another nine passages. At the end of each passage, samples were collected for whole-genome high-throughput population sequencing up to a median coverage of >50,000×.
For our analysis here we considered only the time points taken during drug treatment (passages 4–12), but considered all 13,395 sites for which data were available (Foll et al. 2014). For each site, we first identified the two alleles having the highest frequencies over all passages and considered the minor allele to be the one with the lower frequency at the beginning of the experiment (passage 0). To avoid any bias, all other alleles were treated collectively as the major allele. We estimated N along with locus-specific selection coefficients s, the sequencing error rate ε, and the per site mutation rate μ. We assumed log-uniform priors on N, ε, and μ such that
and
and a normal prior on the selection coefficients such that
Since viruses are haploid, we fixed the dominance coefficient at
We then ran an MCMC using 51 states for 25,000 iterations during which each parameter was updated in turn. The first 2000 such iterations were discarded as burn-in phase.
Simulations:
To assess the accuracy of our approximation, we simulated trajectories under the discrete Wright–Fisher process and the diffusion process, as well as under the mean transition time approximation and the approximation proposed by Malaspinas et al. (2012). All simulations under the discrete Wright–Fisher model and the diffusion process were performed using binomial sampling and the Euler–Maruyama method, respectively. Those under approximations using frequency states were generated by simulating transitions according to the transition matrices calculated under the specific approximations.
To evaluate the power of our method to infer population sizes and selection coefficients, we also simulated data for 20 or 100 unlinked loci with N = 100, 1000, or 10,000. For each of these settings, we set either 20% or 80% of the loci to be under selection, with an equal representation of four selection coefficients: −0.1, −0.01, 0.01, and 0.1. All loci, both selected and neutral, had the starting allele frequency set at random. The change in allele frequency from one time point to the next was calculated under the Wright–Fisher model, matching the experimental setup of our application. Specifically, we simulated a total of 117 generations and took a sample of 1000 sequences every 13 generations, unless otherwise stated.
Data availability
The authors state that all data necessary for confirming the conclusions presented in the article are represented fully within the article.
Results and Discussion
Mean transition time approximation
Comparisons of the long-term behavior of the here-introduced mean transition time approximation of the Wright–Fisher process with its discrete realization demonstrate the power of our approximation. In Figure 2 we show the frequency distribution of alleles with an initial frequency of 0.2 after 10 generations of selection and random drift under the discrete Wright–Fisher process for different population sizes and different selection strength. As expected from our assumptions, the distributions obtained under our approximation have identical means and show only a slightly increased variance for large selection coefficients and a small number of states. This finding is further strengthened when comparing this distribution over larger timescales up to 1000 generations (Figure 3), which also illustrates that our approximation leads to accurate loss/fixation ratios. In the situations studied here, all loci correctly fix in the case of strong selection or when N is large. In the case of and
however, we estimate that 62.00% or 61.99% of all loci will be lost when using 1001 or 21 states, respectively. This is very similar to the proportion of 62.02% obtained among
simulations with the diffusion approximated here.
Mean transition time approximation of Markov processes. For both small (N = 100, left) and large (N = 10,000, right) population sizes as well as weak (s = 0.01) and strong (s = 0.3) selection, we show the allele frequency distributions after 10 generations of selection and random drift starting from a frequency of 0.2, as well as the waiting times for a transition from a frequency 0.2 to 0.9. Results obtained under the discrete Wright–Fisher process are given in gray, those obtained under the diffusion approximation as black solid lines or open boxes, and those under our approximation in shades of blue, with darkness increasing with higher number of frequency states considered.
Comparison of different approximations. Shown are the cumulative probability density distributions (CDF) of allele frequencies after 10, 100, and 1000 generations (shown in each top left corner) of selection and random drift starting from a frequency of 0.2 and obtained under the Wright–Fisher diffusion (black) and three approximations of it: the approximations introduced by Lacerda and Seoighe (2014) (orange) and Malaspinas et al. (2012) (red) and the mean transition time approximation introduced here (shades of blue for different numbers of frequency states). Results are shown for small ( A and B) and large (
C and D) population sizes and weak (
A and C) and strong (
B and D) selection. For the approximation by Malaspinas et al. (2012), we used a quadratic grid as originally proposed with the minimum number of states mathematically possible, but at least 101 states (101, 101, 326, and 5813 states for A, B, C, and D, respectively). The distributions for the diffusion approximation were obtained from
simulations, using kernel density estimation.
A more direct illustration of our assumption is the comparison of the distribution of waiting times for a specific transition. As shown in Figure 2, our approximation indeed captures the mean transition time perfectly, while again exhibiting an increased variance for large selection coefficients and small number of states. Based on these results, and to keep the computational effort minimal, we use 51 states for all our inference shown below.
Choice of grid
All results shown above were obtained with a uniform grid of frequency states. Following Malaspinas et al. (2012), we also implemented a quadratic grid, but we found the choice of grids not to affect our approximation noticeably. In general, we found the quadratic grid to describe probabilities close to boundaries more accurately, but to be less accurate for intermediate frequencies than a uniform grid. These differences, however, are small, do not inflate with increasing number of generations, and are visible only for a very low number of states (Supplemental Material, Figure S1). This suggests that our approximation is very robust to the choice of the grid and that the differences observed are due to the resolution in characterizing the probability distribution at different frequencies rather than an effect of the approximation itself. We thus continue using a uniform grid in the following.
Comparison with related methods
Recently, Lacerda and Seoighe (2014) proposed to approximate the probability distribution of allele frequencies after t generations by a Gaussian distribution, the mean and variance of which can be obtained iteratively using the delta method. Their approach can easily be applied to the diffusion process studied here (see Appendix). As shown in Figure 3, the approximation obtained via the delta method and our approximation are very similar over a large range of the parameter space and also agree well with the diffusion process they both approximate. However, due to the assumption of a Gaussian distribution, the approximation obtained with the delta method is less accurate than our approach in describing allele frequencies close to boundaries. This is particularly true when selection is weak enough such that the probability of fixation is which results in a bimodal distribution (Figure 3A).
A major advantage of the delta approach, however, is its computational speed, which does not depend on the population size or the selection strength. Our method is generally much more demanding due to its reliance on matrix calculations rather than simple recursions. But the benefit of our approach lies in the discretization of allele frequencies, without which any inference from time-series data is computationally impossible whenever N is large.
In this regard, our method is closer to that introduced by Malaspinas et al. (2012) that also uses a grid of discretized allele frequencies. In contrast to our method, however, their approach approximates the mean and variance of the infinitesimal transition probabilities, rather than those of the resulting waiting times. While Malaspinas et al. (2012) derive their approximation for the classic diffusion, it is straightforward to generalize their approach and apply it to the diffusion studied here (see Appendix). As shown in Figure 3, their approximation holds generally well for most of the range tested, but allele frequencies appear to rise slightly too fast. More importantly, the approximation introduced by Malaspinas et al. (2012) requires substantially more states than our approximation due to the mathematical nature of the approximation. For the case of and
shown in Figure 3D, for instance, a minimum of 5813 states are required. In contrast, our approximation is computationally stable even with just a handful of states and thus allows us to balance accuracy and computation effort regardless of N or s. This difference between the two approaches easily translates into a reduction in computation time of several orders of magnitude when attempting to infer parameters using an HMM and essentially rendering such an analysis unfeasible for large
with the approximation introduced by Malaspinas et al. (2012), as has been reported recently (Foll et al. 2015).
Power to infer population sizes
While allele trajectories are affected by both selection and drift, we aim here to disentangle these effects by integrating information from multiple loci. We first assessed the power to infer population sizes N accurately under ideal conditions, that is, for 100 unlinked loci in the absence of selection. In Figure 4 we show the likelihood surfaces for N obtained with a different number of states, for data simulated under different population sizes. While this analysis suggests high power to infer small population sizes accurately, it highlights the general issue of inferring large population sizes from changes in allele frequencies, accentuated when fewer states are used. The issue arises from the fact that in large populations and over the short time course of evolutionary experiments in general, the changes in allele frequencies between time points are so small that they are compatible with almost arbitrarily large populations. While using fewer frequency states further decreases the resolution of detectable allele frequency changes, we note that this issue is more general and expected to affect all methods for inferring population sizes from such data, particularly when a small number of samples is used. The best way to overcome it is to observe changes in allele frequencies over larger intervals. Indeed, when taking samples every 130 generations instead of every 13, population sizes up to N = 100,000 can be estimated accurately (Figure 4, bottom row).
Power to infer population sizes. Shown are the relative likelihood surfaces obtained via our mean transition time approximation for a particular simulation of 100 neutral loci for different population sizes (vertical dashed lines) and different numbers of frequency states considered (see key). The top row is for the case of 13 generations between time points, and the bottom row is for the case of 130 generations between time points.
Power to infer selection
To assess the power of our framework to infer locus-specific selection coefficients, we simulated 100 unlinked loci, of which 20% experienced selection at various strengths. As shown in Figure 5, both the population size and the strength of selection affect the power of this inference. For medium to large population sizes, our method infers even small selection coefficients with high accuracy. When the population size is small, however, inference of selection proves more difficult (Figure 5). While this is generally expected due to the much larger effect of drift in small populations ( for the strongly selected alleles), it is accentuated here by our choice to simulate initial frequencies at random. Indeed, when given ideal starting frequencies (0.1 for positively and 0.9 for negatively selected alleles), our method identifies strongly selected alleles accurately even in small populations (Figure S2).
Power to infer selection and population size jointly. Here we show the posterior distributions on the population size (left panel) and locus-specific selection coefficients obtained for five replicate simulations for each of three different population sizes. For each replicate we plot the posteriors of all loci simulated under positive selection (blue shades, top row) and under negative selection (red shades, middle row), as well as five neutral loci picked at random (black, bottom row). In all simulations, starting frequencies were chosen randomly for each locus.
Remarkably, we found the power to infer population sizes as well as locus-specific selection coefficients not to be negatively affected under pervasive selection. This is illustrated by comparing the posterior distributions obtained from simulations where 80% of all loci were targeted by selection (Figure S3) to those shown here where only 20% were affected by selection (Figure 5). More direct evidence is given in Table 1, where we report the posterior probability for for different combinations of population sizes and selection coefficients and actually find higher power to identify selected loci in the case of pervasive selection than when only 20% of all loci were simulated under selection.
For computational efficiency, all results shown here were obtained using 51 states. However, we note a trade-off between power of inference and computational costs. As shown in Figure S4, using very few states (21) may lead to slightly broader posteriors and a small bias toward weaker values of s. Both effects are already largely overcome when using 51 states for most loci, but small improvements are still detectable with more states (Figure S4).
Application to influenza data
We next applied our approach to publicly available sequencing data of influenza H1N1 segment 6, obtained at multiple time points throughout an evolutionary experiment in which the virus was exposed to an antiviral drug (oseltamivir) (Renzette et al. 2014). While allele frequencies are generally estimated with high accuracy due to the very high coverage in this experiment (∼50,000×), sequencing error may contribute substantially to the observed low-frequency variants. In addition, many of the observed mutations likely entered the population only during the experiment, but their exact time of origin is blurred by both the sequencing error and sampling. We thus extended our framework to estimate the mutation rate as well as the overall sequencing error rate jointly with the demographic and selection parameters.
We applied our extended method to each of the eight segments of the influenza genome individually, but obtained highly concordant results among all segments. As shown in Figure 6, we infer the effective population size during the experiment to be ∼7000, a mutation rate of ∼ and a sequencing error rate of ∼
While our estimates of the mutation and error rates are consistent with published mutation rates for influenza (Nobusawa and Sato 2006) and RNA viruses in general (Drake et al. 1998) and also with the employed quality filters on sequencing reads (Foll et al. 2014), our estimate of the population size is substantially larger than previous estimates of ∼225 (Foll et al. 2014). While we found our approach to slightly overestimate larger population sizes under the spacing of time points relevant here, there are several arguments supporting a larger population size. First, the original estimates were obtained under the assumption of neutrality at all loci, while our approach infers N jointly with selection. Second, the previous estimates were obtained from a small subset of the data, namely the 147 loci with an observed allele frequency
after down-sampling to 1000 reads per locus at no less than three time points. In contrast, our inference is based on the raw data at the complete set of 13,395 loci, including those with small frequencies particularly informative about drift. Third, the original inference accounted for neither sequencing errors nor mutations. In summary, our results argue for a much larger effective population size than previously reported.
Evolution of drug resistance in influenza. Here we show the posterior distributions on the population size [], sequencing error rate (ε), mutation rate (μ), and locus-specific selection coefficients
estimated independently for each of the six segments of the influenza genome. For the selection coefficients, black solid circles represent posterior medians and gray lines indicate the 99% credible intervals. Loci for which the 99% credible interval does not include
are shown in red and their actual position within the segment is designated.
Our results on selection, on the other hand, are highly concordant with previous estimates. In Figure 6 we report the posterior distributions on the locus-specific selection coefficients for all polymorphic sites for each of the eight segments of the influenza genome. As expected, most mutations were found to be selectively neutral or under slight purifying selection (observe the slight asymmetry toward negative selection coefficients for many loci). For a few mutations, however, we found compelling evidence for them to be the target of positive selection (99% credible interval does not include 0). On segment NA, there were three such mutations, of which two stand out with an estimated selection coefficient ∼0.2. One of these mutations (Y274H) occurred at a locus at which resistance to oseltamivir has been previously described (Collins et al. 2008). Many additional mutations were found to be the target of selection throughout the genome, with many of those likely under negative selection. These are mutations that were found at elevated frequencies at the beginning of the experiment, yet at much lower frequencies after a few passages. The complete list of all mutations found to be under selection is given in Table S1.
Conclusion
Here we present a novel, discrete approximation for diffusion processes. This approximation, which we term mean transition time approximation, is designed to preserve the long-term behavior of the continuous process it approximates, which renders it particularly suitable to study time-series data. Here we derived this approximation for the particular case of inferring selection and demography from such time-series data under the classic Wright–Fisher model. As shown through extensive simulations, our approximation is well suited to describe allele trajectories through time, even when only a few states are used. This allowed us to develop a Bayesian inference approach to jointly infer the population size and locus-specific selection coefficients with high accuracy. We further extended this model to estimate the average sequencing error rate, as well as the per generation mutation rate. The approach is further readily applicable to models of instantaneous population size changes. We finally applied our approach to data from a recent experiment on the evolution of drug resistance in influenza virus, identifying likely targets of selection and finding evidence for much larger viral population sizes than previously reported.
Acknowledgments
We thank Nicolas Renzette for advice on how to identify the protein changes corresponding to individual mutations. We are grateful to two anonymous reviewers for their very constructive comments on an earlier version of this work. This study was supported by Swiss National Foundation grants PZ00P3_142643 and 31003A_149920 (to D.W.) and grants from the Swiss National Science Foundation and a European Research Council Starting Grant (to J.D.J.).
Appendix
Approximation for Large 
If is large, we get approximations for Green’s function that allow for analytic expressions of the integrals. More precisely, assume that
is large. We can then neglect the −1 terms in the numerator and denominator of (18) and we get the approximation
(A1)which will be very small for large γ. The probability for exit at the upper state is
Inserting the first approximating expression or
into (19) and using
we get
(A2)The exponential term is dominant for y close to
In the integral we can thus keep the factor of the exponential constant at
since it does not vary much when y is close to
:
(A3)From (20) we get the approximation
To get
we integrate this approximate expression. Observe that the exponential term becomes important only when y gets close to
For this reason we can safely keep the factor in front of the exponential term constant when integrating the second term:
(A4)Numerical experiments indicate that the approximate formulas (A3) and (A4) are adequate when the conditions
(A5)are met. In that case we set
and
Note that formula (A4) gets singular for
since in that case
Using the substitution
we get for that case from (20) the approximation
The last integral can be written as an exponential integral
in the form
Using the approximation
where
is the Euler–Mascheroni constant, we finally arrive at
(A6)Similarly, the case
deserves special attention because the denominator of (A2) gets singular at
Since
is small and y even smaller, we can set
and
From (19) we then get the approximations
From this we obtain for the downward mean transition time
because the integrand is very dominant at the upper integration limit. From (A1) we get the approximation
and thus

The Wright–Fisher Process in the Absence of Selection
In the absence of selection (), the expressions for the generator matrix can be explicitly evaluated since
(see Equation 11). We have
and
From this we get
(A8)The two parts of Green’s function are given by
and
These integrate to
(A9)and
(A10)As above we determine the transition rates by

Approximations via the Delta Method
Following the argument of Lacerda and Seoighe (2014), an approximate solution to the diffusion equation can be obtained by the delta method. While their original formulation applies to the discrete Wright–Fisher process, the argument works as well for the diffusion process studied here.
As above (Equation 1), is a diffusion process on the state space
with infinitesimal generator
(A11)Recall that the infinitesimal moments of the diffusion process are given by
The mean
of the process can be approximated iteratively as follows:
In the last step, we used the delta approximation
Similarly, we apply the delta approximation
to get an iterative approximation for the variation:
For the case of
and by inserting (9), one gets in particular

Approximations as Proposed by Malaspinas et al.
As in Malaspinas et al. (2012), we construct the Markov chain with states
by matching the infinitesimal mean and infinitesimal variance of U and X. This allows us to determine the generator matrix
Here we generalize their notation for any diffusion process
From formulas (8) and (9) from (Malaspinas et al. 2012) we then get
These can be solved for the infinitesimal generators
where we used the abbreviation
To apply these general formulas to the particular diffusion studied here we simply use
and
as given in Equations 10 and 11, respectively.
Footnotes
Communicating editor: J. Wakeley
Supplemental material is available online at www.genetics.org/lookup/suppl/doi:10.1534/genetics.115.184598/-/DC1.
- Received November 7, 2015.
- Accepted March 22, 2016.
- Copyright © 2016 by the Genetics Society of America