## Abstract

The effective population size () is a major factor determining allele frequency changes in natural and experimental populations. Temporal methods provide a powerful and simple approach to estimate short-term They use allele frequency shifts between temporal samples to calculate the standardized variance, which is directly related to Here we focus on experimental evolution studies that often rely on repeated sequencing of samples in pools (Pool-seq). Pool-seq is cost-effective and often outperforms individual-based sequencing in estimating allele frequencies, but it is associated with atypical sampling properties: Additional to sampling individuals, sequencing DNA in pools leads to a second round of sampling, which increases the variance of allele frequency estimates. We propose a new estimator of which relies on allele frequency changes in temporal data and corrects for the variance in both sampling steps. In simulations, we obtain accurate estimates, as long as the drift variance is not too small compared to the sampling and sequencing variance. In addition to genome-wide estimates, we extend our method using a recursive partitioning approach to estimate locally along the chromosome. Since the type I error is controlled, our method permits the identification of genomic regions that differ significantly in their estimates. We present an application to Pool-seq data from experimental evolution with *Drosophila* and provide recommendations for whole-genome data. The estimator is computationally efficient and available as an R package at https://github.com/ThomasTaus/Nest.

DURING experimental evolution studies, populations are maintained under specific laboratory conditions (Kawecki *et al.* 2012; Long *et al.* 2015; Schlötterer *et al.* 2015). In sexually reproducing organisms, the census population size is typically kept fixed at fairly low numbers, rarely exceeding 2000 individuals. With such small population sizes, genetic drift causes stochastic fluctuations in allele frequencies. Under neutrality, the level of random frequency changes is determined by the effective population size () (Wright 1931). Furthermore, the efficacy of selection is influenced by For weakly selected alleles, the probability of fixation is directly proportional to the product of and the intensity of selection (Fisher 1930; Kimura 1964). As changes in allele frequency are greatly affected by the population size, it is fundamental to estimate accurately to understand molecular variation in experimental evolution studies.

Krimbas and Tsakas (1971) estimated using the standardized variance of allele frequency (*F*, see also Falconer and Mackay 1996) from longitudinal samples in natural populations of olive flies. As *F* was calculated from these samples, they accounted for the sampling variance that also contributed to the true allele frequency variance. This approach was further improved and used by several authors (Nei and Tajima 1981; Pollak 1983; Waples 1989; Jorde and Ryman 2007).

With the widespread availability of powerful computers, also maximum-likelihood-based methods became popular (Williamson and Slatkin 1999; Anderson *et al.* 2000; Wang 2001; Hui and Burt 2015) in addition to the moment-based approaches discussed above. Although these methods show less bias than the moment-based approaches (Wang 2001), they are still computationally demanding, in particular for the large numbers of markers typically obtained with novel sequencing technologies (Foll *et al.* 2015).

Estimating with temporal methods requires samples collected at least at two time points. Alternative methods that use only a single time point are based on linkage disequilibrium (LD) (Hill 1981; Waples and Do 2008, 2010; Waples and England 2011), heterozygote excess (Pudovkin *et al.* 1996), molecular coancestry (Nomura 2008), sibship frequencies (Wang 2009, 2013), or combinations of summary statistics using approximate Bayesian computation (Tallmon *et al.* 2008). LD-based methods are widespread but require haplotype or unphased diploid genotype information, which limits their applicability.

Although the cost for sequencing has dropped considerably, the separate sequencing of thousands of individuals in replicate populations in experimental evolution studies is still out of reach. Sequencing samples in pools (Pool-seq) can provide a cost-effective alternative (Schlötterer *et al.* 2014). Pool-seq has also been shown to outperform individual-based sequencing in estimating allele frequencies and inferring population genetic parameters under several conditions (Futschik and Schlötterer 2010; Zhu *et al.* 2012; Gautier *et al.* 2013). For these reasons, Pool-seq has become the basis of many experimental evolution “evolve and resequence” (E&R) studies (Turner *et al.* 2011; Schlötterer *et al.* 2015). Following the emergence of E&R, many population genetic estimators have been adjusted to handle the properties of Pool-seq data (Futschik and Schlötterer 2010; Kofler *et al.* 2011a,b; Kolaczkowski *et al.* 2011; Boitard *et al.* 2013; Ferretti *et al.* 2013). To the best of our knowledge, no estimators have been developed so far that properly deal with Pool-seq data.

In this article, we present a novel temporal method to estimate from pooled samples. We show that previously proposed estimators can lead to substantial bias, as they neglect the variance component due to pooled sequencing. We introduce a new model accounting for the two-step sampling process associated with Pool-seq data. In the first sampling step individuals are drawn from the population to create pooled DNA samples. In the second step, pooled sequencing is modeled as binomial sampling of reads out of the DNA pool. We show on simulated data that our method outperforms standard temporal estimators. For real data, we also suggest to use a segmentation algorithm, to partition the genome-wide sequence data into stretches of DNA with significantly different estimates. Finally, we present an application to a genome-wide experimental evolution data set from *Drosophila melanogaster* (Franssen *et al.* 2015).

## Materials and Methods

### Sampling schemes

Nei and Tajima (1981) investigated the sampling properties of temporal estimators and proposed two different sampling schemes. Under the first scheme (plan I), individuals are either sampled after reproduction or returned to the population after their genotypes have been examined. In contrast, under the second scheme (plan II) sampling takes place before reproduction and the sampled individuals are permanently removed from the population and their genotypes do not contribute to the next generation. By assuming different sampling distributions, they derived separate estimators under sampling plans I and II.

Waples (1989) unified the calculations under the two plans by assuming binomial sampling out of an infinitely large parental gamete pool for both sampling schemes. He concluded that the measure of variance under the two sampling plans differs only in a covariance term. For plan I, there is a positive correlation between allele frequencies sampled *t* generations apart because they are both derived from the same population at generation 0. In contrast, for plan II, the initial sample and individuals contributing to the next generation can be considered as independent binomial samples; thus sample frequencies at generations 0 and *t* are uncorrelated.

For a typical E&R study, outbred experimental populations are created by mixing a large number of inbred lines (*e.g.*, Turner and Miller 2012; Huang *et al.* 2014; Franssen *et al.* 2015). The populations are then propagated under the desired experimental conditions while keeping the census size of the population controlled through time (Figure 1). However, the experimenter has no direct influence on the effective population size, which is in general lower than the census size. In E&R studies with *Drosophila*, the census size rarely exceeds some hundreds of individuals, and sampling usually takes place after reproduction according to plan I. For organisms maintained at larger sizes, such as yeast, the sample for genetic analysis is not returned to the population (Burke *et al.* 2014). Plan II applies to such cases.

In E&R studies, sampled individuals are often pooled together for DNA extraction (Schlötterer *et al.* 2014). The size of the pool can be as large as the whole population. Depending on the experimental design, it is also possible that only a fraction of the population is sequenced, for instance, only females (Tobler *et al.* 2014; Franssen *et al.* 2015). Pooled individuals are used to create DNA libraries, which are, in turn, subjected to high-throughput sequencing.

We consider two separate sampling steps when estimating from Pool-seq samples (Figure 1). In the first step, we model the sampling of individuals out of the population. This can take place according to either plan I or plan II. In the second step, we model the sequencing of a DNA pool by drawing reads at random with replacement from the first-step sample. The allele frequency variance inferred from the sample is corrected for the additional variance coming from the two-step sampling and used for estimating

### Notation

We assume that the experimental population is propagated at a constant census size *N* and that We use genome-wide single-nucleotide polymorphism (SNP) data sampled *t* generations apart to estimate (Figure 1) and denote the estimated effective population size by Multiallelic sites in populations with low mutation rates, such as *Drosophila*, exist but are rare and likely to be sequencing errors (Burke *et al.* 2010). Therefore we consider only biallelic SNPs at *n* polymorphic sites. At each site *i* () the true population allele frequency is denoted by at time where To obtain allele frequency estimates for an unknown the population is subjected to sampling. We consider two sampling steps (Figure 1). At we first sample individuals out of the population to create a pooled DNA library for sequencing. Note that the number of sampled individuals is constant over the *n* sites, and therefore the index *i* is omitted here. Sampling individuals can take place according to either plan I or plan II, as described above (also shown in Supplemental Material, Figure S1). As the second sampling step, we model Pool-seq by drawing reads out of the pooled DNA sample at each site *i* (). This allows for variation in sequence coverage. Below we derive the variance in allele frequency for a given site. To keep notation simple, we omit again the index *i* and denote the unknown sample allele frequency among the individuals at the first sampling time point () by *x* and the subsequent allele frequency estimate obtained via pool sequencing from reads by Similarly, at some the respective frequencies are denoted by *y* and Note that under pool sequencing only and are observed.

### Estimating from temporal allele frequency changes

Under neutral Wright–Fisher evolution the variance in allele frequency () generated by drift after *t* generations at a single locus in a diploid population is well described by the expression (1)where *p* is the starting allele frequency (Falconer and Mackay 1996). Wright (1931) denoted the standardized variance by which leads to a convenient closed-form expression for Furthermore, if is large enough, and can be calculated as (2)The relation between and allele frequency changes described in Equation 1 was first used by Krimbas and Tsakas (1971) in natural populations of olive flies. They estimated the variance using (3)where and () are the observed allele frequencies in the samples collected *t* generations apart and *a* is the number of alleles at a specific locus. To eliminate the contribution of sampling errors to the variance, the total variance was corrected for the random sampling noise by simply subtracting the corresponding variance. This approach was further investigated and developed by a number of authors (Pamilo and Varvio-Aho 1980; Nei and Tajima 1981; Pollak 1983; Waples 1989).

Possible sources of bias in estimators were later investigated by Jorde and Ryman (2007). The authors pointed out that the expectation over *F* is typically approximated by taking the expected values separately for the numerator and the denominator (Turner *et al.* 2001). They suggested a different weighting scheme of alleles leading to an alternative less-biased estimator to measure temporal frequency change.

### Correction for two-step sampling

We consider a random-mating population with discrete generations. Neutral evolution is assumed with no selection, migration, and mutation. Samples are drawn from the population at generations and *t*. Throughout the derivation we consider diploid populations, and therefore a sample of individuals leads to sequences at times Sampling is assumed to be binomial with parameters and (Waples 1989). In the second sampling step at time sequencing a random pool of reads is also modeled as binomial sampling.

In analogy to Jorde and Ryman (2007), we use the following expression as our measure of the temporal change in allele frequency for biallelic sites, (4)where

The expectation of for a single biallelic locus is approximated by (5)For both plans, we derive expressions for the numerator and denominator in Equation 5 separately under the two-step sampling procedure, described above. Here we summarize our main conclusions; details on the derivation are provided in File S1. With for and *p* denoting the true population allele frequency in the gamete pool at generation 0, we obtain (6)and (7)Note that Equations 6 and 7 differ only in the correction term from that in Waples (1989).

Waples (1989) previously showed that the denominator in Equation 5 reduces to (8)For plan II, (Waples 1989), and corrected for the noise coming from the two-step sampling for a single locus is given by (9)For plan I, on the other hand, the sample allele frequency at generation 0 is positively correlated to the sample allele frequency at *t* because both are derived from the same population at generation 0. This requires us to calculate the sample covariance in Equation 5. It turns out (see File S1 for details) that the covariance of and is equal to (10)where *N* is the census size of the population at generation 0. Equation 10 is in agreement with the corresponding term of the standard methods (Waples 1989). Substituting the inferred covariance into Equation 5 leads to the following corrected variance estimate, for plan I (11)We provide the corresponding formulas of in haploid populations in File S1.

With Pool-seq data, randomness in sequencing and local structures in the genome can lead to different coverage across marker sites, which we denote by for site *i* () and time *j* (). In the genome-wide data set, we calculate across *n* SNPs by summing over all loci in the numerator and denominator separately before carrying out the division in Equation 9, leading to the following weighting scheme for plan II:(12)Similarly, can be calculated for plan I using Equations 4 and 11. Analogous to the single-locus case, our proposed estimators for a diploid population are obtained by plugging into Equation 2.

Long time series have recently become available for some E&R experiments (Barrick *et al.* 2009; Burke *et al.* 2010, 2014). Standard estimators (Krimbas and Tsakas 1971; Nei and Tajima 1981; Waples 1989) assume a small number of generations and approximate using If, however, is larger, using this approximation can lead to severe bias (Figure S2). To avoid such a bias, we use Equation 2 to estimate

### Simulations

We evaluate the performance of our estimator on data simulated under the neutral Wright–Fisher model. With a given population size of we simulate the frequency trajectory of *n* independent SNPs. As we focus on biallelic SNPs, we assume two possible nucleotides (alleles) to be present in the population with given frequencies at the start. To create a new generation, nucleotides are drawn independently at random with a probability given by their respective allele frequencies in the old generation. The population is propagated at a constant size of for *t* nonoverlapping generations. The effective population size is then estimated from allele frequencies inferred from Pool-seq samples taken from the population at the start and after *t* generations. The sampling of individuals to create the pooled DNA library is simulated by using sampling without replacement. To model the uneven coverage of genome-wide sequence data, we simulate a random coverage for each site, using a Poisson distribution with parameter equal to the given mean coverage. For every position, reads are then generated by binomial sampling from the library with sample size equal to the local coverage.

We assess the performance of our estimator for various combinations of pool size, coverage, number of SNPs, and distribution of starting allele frequencies. Additional to these parameters, we also test how the ratio between census and effective population size () affects the accuracy of the proposed estimator. For this purpose, we increment the population size to a desired level of *N* in each generation while keeping the allele frequencies unchanged to avoid introducing additional sampling variance. We simulated each scenario 100 times.

Linkage disequilibrium between loci can reduce the number of independent SNPs, thereby increasing the variance of the estimate. The impact of dependence between SNPs is investigated based on 10 replicated whole-genome forward simulations with recombination, using the software tool MimicrEE (Kofler and Schlötterer 2014). As a starting population for the forward simulations, we sampled 2000 haploid genomes out of 8000 genomes simulated with fastsimcoal v.1.1.2 (Excoffier and Foll 2011; Bastide *et al.* 2013). The parameters used to generate the genomes mimic a wild population of *D. melanogaster* from Vienna (Fiston-Lavier *et al.* 2010; Bastide *et al.* 2013; Kofler and Schlötterer 2014). Allele counts are subjected to binomial sampling to mimic Pool-seq with a given sequence coverage. is estimated in nonoverlapping windows, each containing a fixed number of SNPs.

### Estimating on simulated data

We denote our estimator corrected for the additional sampling step, *i.e.*, pooling, by We compare to the standard estimators and proposed by Waples (1989) and Jorde and Ryman (2007) that correct only for a single sampling step.

We illustrate experimental sampling procedures by considering two major scenarios: (i) The full population is sequenced as one large pool and (ii) only a subset of the population is used to create pooled samples. Under scenario (i) we simulate only a single binomial sampling step to represent sampling reads out of the DNA pool. The pool size is set to be equal to the census size of the population (), and the number of sampled reads () represents the per-site coverage. For estimators that correct only for a single sampling step, we use the coverage () as the sample size. For scenario (ii), the sampled individuals () and the read number () represent the pool size and coverage for The coverage () is taken as the sample size for the and estimators, as these methods consider only one sampling step.

### Change point inference for genome-wide estimates

The effect of genetic drift on the variance in allele frequency specified in Equation 1 holds only under the assumptions of Wright–Fisher neutral evolution. Deviations from the Wright–Fisher model, such as the presence of selection or demography, may cause systematically different changes in allele frequency, affecting the variance and causing locally variable patterns in genetic diversity. Furthermore, the effect of selection on one site of the genome may cause changes in the behavior of variants at nearby sites (Maynard Smith and Haigh 1974; Barton 2000; Comeron *et al.* 2008). As a result, the estimates of at different locations of the genome may deviate from the true number of breeding individuals in the population (Kimura and Crow 1963; Charlesworth 2009). For example, regions under background selection are associated with reduced values that extend to linked sites due to the Hill–Robertson effect (Charlesworth 1996, 2012a; Comeron *et al.* 2008). Similarly, selectively favorable alleles can also drag nearby neutral sites to high frequency (Maynard Smith and Haigh 1974), causing a local reduction in the estimated (Liu and Mittler 2008). Such an event is also known as a selective sweep (Berry *et al.* 1991). On the other hand, we expect the opposite pattern, *i.e.*, a local elevation of for types of selection such as balancing selection that maintain variation in the genome (Baysal *et al.* 2007; Charlesworth 2009).

To detect such patterns in we apply a segmentation algorithm to partition the genome into locally homogeneous stretches. We use a method related to an approach suggested by Futschik *et al.* (2014) for partitioning DNA sequences with respect to GC content. It is based on a statistical multiscale criterion and provides statistical error control, in the sense that the estimated number of windows will not exceed the true one except for a small error probability *α* to be specified by the user. With our estimates, we use a criterion proposed by Frick *et al.* (2014) for normally distributed responses. It is implemented as part of the R package *stepR* (Frick *et al.* 2014). By using simulations with selection we also illustrate that this method is able to capture the signal of locally variable along the chromosome.

### Data availability

We estimated *N*_{e} in an E&R study with *D. melaongaster*, published in Orozco-terWengel *et al.* (2012) and Franssen *et al.* (2015). Pool-seq read libraries from these studies are available at the European Sequence Read Archive at http://www.ebi.ac.uk/ena/ under accession nos. ERP001290 and ERS460611–ERS460613.

## Results and Discussion

### Two-step correction is vital to avoid large bias in with Pool-seq data

Methods that do not correct for the additional sampling step caused by pooling can lead to substantial bias in as illustrated in Figure 2. Using simulated data, we compare our proposed estimator to two commonly used estimators (Waples 1989) and (Jorde and Ryman 2007) that provide highly accurate estimates when only a single sampling event is simulated (Figure S3). Figure 2 shows that the additional correction substantially decreases the bias for almost all scenarios (see also Figure S4, Figure S5, and Figure S6). Under plan I, is nearly unbiased. The plan II version of the estimator has a slight upward bias when applied on data simulated under plan I, if the samples are taken at very close time points.

As an alternative approach, we also estimated separately for each locus, using in Equations 9 and 11. We then calculated the effective population size across the *n* loci as the harmonic mean over the single-locus estimates () (Peel *et al.* 2013). In our simulations, the harmonic mean estimator shows an accuracy similar to that of the original (Figure S7). However, for *t* lying in the midrange of the simulated interval ( 15–40), is slightly more biased under plan I.

Because of the additional sampling variance, both and have a downward bias in particular for small *t*. Furthermore, is upwardly biased for larger values of *t*, probably reflecting that alleles closer to fixation or loss are contributing less to the variance (Waples 1989). The drift variance accumulates with an increasing number of generations, while the sampling variance stays constant, making the initial bias of less pronounced for larger *t*. When samples are collected only a few generations apart, the variance of estimators tends to be larger than that of and under both plans.

Plan I and II estimators differ by a factor resulting from the covariance between the sample frequencies at generations 0 and *t* (Equation 10), which is inversely proportional to the census population size. Consequently, the difference between plans I and II becomes smaller for increasing *N*. Waples (1989) investigated how the ratio between census and effective population size () affects the accuracy of the estimators and concluded that the ratio of is sufficient to reach similar estimates for both sampling schemes. We tested the performance of on simulated data with and ratios of with different coverages and pool sizes (Figure S4, Figure S5, and Figure S6). When the plan I method achieves highly accurate estimates for all time points in contrast to the other methods (Figure S4). If, however, the plan II estimator is applied to data simulated under plan I, we observe an upward bias for small *t*, which improves with an increasing number of generations. This pattern is not unexpected since the missing covariance term becomes less influential in view of the increasing drift variance after several generations. When the entire population is sequenced as a single pool (), the plan II estimators of Waples (1989) and Jorde and Ryman (2007) perform similarly to the plan I estimator because the correction for pooling in cancels out the additional covariance term when making the term used as *F* approximately identical to that of This is a general pattern irrespective of *r*.

For plan I remains highly accurate (Figure S5 and Figure S6). Furthermore, when increasing the census size under a constant (equivalent to increasing *r*), the covariance between sample allele frequencies decreases, making the difference between plans I and II almost negligible (Waples 1989). The sampling variance becomes proportionally smaller compared to the drift variance with an increasing number of generations between the samples. This improves our ability to accurately estimate

Correcting for the additional variance inherent to Pool-seq leads to an improved performance of compared to the standard methods for both plans. In general, with Pool-seq data the extent of the bias of the and estimates depends on the ratio between *N* and *S*, smaller sample sizes (*S*) leading to a larger bias. As we accounted for the sequencing step with these estimators (*Estimating * *on simulated data*), decreasing the coverage at a given pool size does not change the bias much but rather increases the variance of the estimators.

In most of the experimental studies the investigator has control over the census population size; thus requiring the knowledge of *N* for plan I does not restrict the analysis. We illustrate the performance of plan I only when in the main text, but according to Figure S5 and Figure S6, plan I is also highly accurate when

We show the coefficient of variation (CV) of the plan I estimator in Figure 3. The CV is defined as the ratio between the standard deviation and the mean ( where both and are estimated from the sample). It measures the relative dispersion of the distribution of the estimated values. estimators are highly precise in nearly all cases, except when the drift variance is negligible compared to the sampling variance (Figure 3; see also Figure S9 and Figure S11 where and ). The bias is coming from a few outlier estimates, but the median shows more robust results (Figure S13). For plan II estimators, the behavior of the method is similar (Figure S8, Figure S10, Figure S12, and Figure S14). Note that the simulations underlying Figure S8, Figure S10, Figure S12, and Figure S14 have been done under plan I.

### Increasing the number of SNPs reduces the variance of

We test how the number of loci (*n*) used to infer affects the accuracy and the precision of the estimates by gradually increasing the number of independent SNPs from 100 to 10,000 (Figure 4). We observe a larger variance and a slight downward bias for a small number of SNPs (100 SNPs). Both the bias and the variance become smaller with a larger number of SNPs. Some further improvement is obtained when >10,000 SNPs are used (not shown), but the benefit of additional independent SNPs levels off. We conclude that SNPs usually provide sufficient precision and accuracy. However, when linkage disequilibrium is present in a genome-wide data set, the number of truly independent SNPs per window is reduced and a larger number of loci is recommended.

### A skewed starting allele frequency distribution only moderately increases the variance of

In natural populations, the neutral site frequency spectrum is skewed toward allele frequencies close to the boundaries. uses a weighting scheme that is not very sensitive to this skew (see also Jorde and Ryman 2007). This makes it robust with respect to the shape of the starting allele frequency distribution. We illustrate this with a simulated data set having a starting allele frequency distribution that is skewed toward low- and high-frequency variants (Beta(0.2, 0.2)) as expected under neutrality. The estimates of from such data sets are compared to simulated data with matching parameters but uniform starting allele frequency distribution (Figure 5). We observe a very slight upward bias with neutral starting allele frequencies compared to uniform and a moderate increase in the variance given With an increasing number of generations, the difference becomes negligible.

### The presence of linkage disequilibrium does not have a large effect on the precision of

We investigated the sensitivity of our estimator to linkage disequilibrium between loci, using genome-wide neutral simulations with recombination (Kofler and Schlötterer 2014). We simulated data with three different rates of recombination: high, normal, and no recombination. For the first case, the recombination rate is set to mimic the behavior of almost independent SNPs. In the normal recombination rate scenario, we use *D. melanogaster* recombination rates (Fiston-Lavier *et al.* 2010). The effective population size was estimated in nonoverlapping windows with a fixed number of SNPs (Figure 6). Different levels of linkage disequilibrium affect the number of independent loci per window. Nevertheless, we observe only a slight increase in the precision of the estimates with increasing recombination rate (Figure 6).

### Heterogeneous along the genome in an E&R study with *D. melanogaster*

We estimated in a recent E&R study with *D. melanogaster* (Orozco-terWengel *et al.* 2012; Franssen *et al.* 2015). In this experiment replicate populations of 1000 individuals were subjected to a fluctuating hot environment for 59 generations. Allele frequency estimates were obtained for founder and evolved populations, using Pool-seq. was estimated based on the allele frequency changes between founder and latest evolved populations in nonoverlapping windows containing 10,000 SNPs, using under plan I. To determine the number of DNA stretches with different along the genome, we use a segmentation algorithm provided in the software tool by Frick *et al.* (2014). This method requires homogeneity of variances. Since the variance of estimates increases with the estimates were log-transformed before applying the partitioning procedure. The obtained step function was back-transformed to the original scale and is shown for three biological replicates (Figure 7).

The mean and the median estimates for each chromosome arm as well as across the genome are stable across replicates (see Table 1 and Table S1). As experimental evolution studies often aim to find signals that are consistent across replicates, this can be an important check of the experimental setup. On the other hand, we see differences between chromosome arms. For example, the mean is clearly lower for 3R, emphasizing the added value of spatial analysis compared to genome-wide estimates.

In *D. melanogaster* ranges between ∼100 and 400. Around the centromere of chromosome 2, the estimated decreases by two-thirds in replicates 1 and 3, which is in agreement with the expectation of low diversity and, as a consequence, low in regions with reduced recombination (Begun and Aquadro 1992; Presgraves 2005; Haddrill *et al.* 2007; Campos *et al.* 2012). Furthermore, is low on the entire chromosome arm 3R and also on parts of 3L. Overall, these patterns can be attributed to strong LD, caused either by low recombination rates around the centromeres (Chan *et al.* 2012) or by segregating inversions (Kapun *et al.* 2014) in combination with selection potentially on rare variants. The reduction in is also well captured by the segmentation algorithm (Figure 7), which shows a similar pattern when applied on simulated data with selection (Figure S15). These results are consistent with those of Tobler *et al.* (2014), who observed a massive amount of outlier SNPs around the centromere of chromosome 2 and on 3R. Interestingly, certain regions of the genome show extensive differences in between the replicates, which might be reflecting different selection histories or differences in demography, such as replicate-specific bottlenecks.

may also vary as a result of differences in the modes of transmission of different components in the genome. For example, on the X chromosome, is equal to three-quarters of the autosomal population size (Vicoso and Charlesworth 2006, 2009). Interestingly, our estimates in the E&R experiment do not reflect this expectation of reduced effective population size. Instead, we estimate to be as high as on the autosomes. Unequal sex ratio between males and females can be a source of such a pattern (Charlesworth 2009); however, unbalanced sex ratio has not been reported in this experiment. Another possible explanation for increased on the X can be the presence of background selection as suggested by Charlesworth (2012b). He argues that because of the lack of recombination in male *Drosophila*, the effect of background selection is more effective on the autosomes than on the X chromosome. Orozco-terWengel *et al.* (2012) reported differences in the number of putatively selected sites between the X and autosomes. They found that candidate SNPs were underrepresented on the X. Their selection scan identifies signatures of deviation from neutral expectation, which is also reflected in the reduction in on the autosomes, indicating higher selection pressure.

### Recommendations for genome-wide data sets

Most of the methods proposed previously are not designed for genome-wide high-density SNP data sets. However, the method of Jorde and Ryman (2007) was successfully used for genome-wide data by Foll *et al.* (2014). Reed *et al.* (2014) also used a similar approach to estimate for whole-genome data, using sliding windows. We estimated in windows with a fixed number of SNPs. Using windows of fixed lengths in base pairs would affect the variance of the estimator (Figure 4) but does not distort the mean. All these approaches, however, do not account for the ruggedness of the recombination landscape and can lead to windows with different levels of linkage disequilibrium in them. To overcome this problem it would be possible to define windows based on recombination distance. Unfortunately, the lack of haplotype information in the Pool-seq data makes it difficult to infer linkage disequilibrium. One way to infer linkage information from pooled sequence data is provided by the software LDx (Feder *et al.* 2012). For model organisms such as *Drosophila*, readily available recombination maps can also be used as a proxy (Przeworski *et al.* 2001; Kulathinal *et al.* 2008; Fiston-Lavier *et al.* 2010). If only a single genome-wide estimate is required, one can alternatively use a set of randomly distributed SNPs over the genome to obtain

Temporal methods make a number of assumptions, which, if violated, can lead to biased estimates. For example, in our simulations, we considered only effective population sizes that are constant over time. Fluctuating is a frequent phenomenon in natural populations and can be an important component of an experimental design. For example, in repeatedly bottlenecked populations, the smallest population size dominates (Luikart *et al.* 1999; Charlesworth 2009). But even in strictly controlled populations the experimental regime can induce changes in When the population changes in size, the estimated is generally interpreted as the harmonic mean of the effective population sizes over the generations (Wright 1938; Nei and Tajima 1981; Waples 1989). However, if time series allele frequency data are available, such changes can be detected by estimating from pairwise comparisons between consecutive time points.

All evolutionary forces (selection, demography, etc.) that lead to deviations from the neutral expectation will also affect our estimate. Nevertheless, systematic forces that result in locally different values of can be detected with a sliding-window approach, as illustrated with simulations under selection (Figure S15). The *D. melanogaster* data set also illustrates this point; *i.e.*, the hypothesized regions under selection coincide with regions of reduced (Orozco-terWengel *et al.* 2012; Tobler *et al.* 2014; Franssen *et al.* 2015). For this to be detected, however, most of the allele frequency change has to occur over the sampled time span.

In the E&R study with *D. melanogaster*, shown above, the criterion of nonoverlapping generations, assumed by temporal methods, is met (see Orozco-terWengel *et al.* 2012 for details on experimental design). However, for samples from an age-structured population, the resulting can be biased (Waples and Yokota 2007). In these cases, as suggested by Waples and Yokota (2007), larger spacing between samples maximizes the drift signal compared to sampling biases associated with age structure.

### Using a small number of generations can lead to outlier estimates

In general, has a lower bias but larger variance, especially when *t* is small. As pointed out by Jorde and Ryman (2007) our weighting scheme leads to an increased variance but a smaller bias compared to other schemes. We observe outlier estimates among replicates at early generations (generation 5, Figure 2, Figure S4, Figure S5, and Figure S6) for When the sampling variance is large compared to the drift variance ( and Figure S11 and Figure S12), the deviation of the outlier estimates from the true is particularly large. For a few cases, we even observe large negative estimates. Negative estimates, in general, can be interpreted as being infinity, that is, no evidence of genetic drift (Peel *et al.* 2013). In our simulations this is plausible when is large and *t* is small, such that drift has not had a large effect on the population allele frequencies yet. Note that the harmonic mean estimator () has smaller variance for large (Figure S16). This estimator, however, is less accurate than for small as shown in Figure S17.

To eliminate potential outliers and an inflated variance we recommend increasing the signal-to-noise ratio by pooling a sufficient number of individuals. Using later generations or increasing the number of SNPs in the analysis also helps to avoid outlier estimates. When none of these strategies can be applied, we suggest using the genome-wide median estimates or the harmonic mean estimator, as these are more robust to extreme outliers.

### Conclusions

Effective population size is an important parameter for describing evolutionary dynamics, making its accurate estimation essential for population genetic studies. Several methods have been designed to estimate and their performance was comprehensively evaluated on simulated as well as real data (Barker 2011; Serbezov *et al.* 2012; Baalsrud *et al.* 2014; Holleley *et al.* 2014; Gilbert and Whitlock 2015). These studies mainly focused on genetic data collected from natural populations, which usually differ from experimental studies in terms of the census population size and sampling scheme. We designed a method that accurately infers the effective population size in genome-wide data from experimental populations sequenced in pools. Our approach improves temporal methods by explicitly correcting for two stages of sampling introduced by pooling and sequencing. Our results on simulated data confirm that methods that fail to properly account for the two stages of sampling inherent to Pool-seq can lead to severely biased estimates.

Pool-seq data are often considered to be overdispersed, *i.e.*, displaying more variability than is predicted by the binomial sampling model (Yang *et al.* 2012). However, Zhu *et al.* (2012) and Futschik and Schlötterer (2010) validated that the error in allele frequency estimates is reasonably well approximated by binomial sampling given that a large enough number of individuals are pooled. Nevertheless, if overdispersion is present in the data, that will lead to additional variance, which is not modeled in our framework and will result in a downward bias of the estimated If the level of overdispersion can be inferred for the data (see, *e.g.*, Gautier *et al.* 2013; Illingworth 2015), it is possible to introduce a parameter that accounts for the additional between-pool variation (see File S1, Equation S8).

We also illustrate the applicability of our method for estimating from experimental data of *D. melanogaster* and show that in combination with a recursive partitioning method we can infer patterns of local variation in along the genome. Additionally, it is possible to calculate confidence intervals based on the distribution (Waples 1989) or alternatively apply a nonparametric bootstrap approach.

### Software availability

Our proposed estimators along with standard methods from the literature are implemented within the R package *Nest*. The package is currently available at https://github.com/ThomasTaus/Nest.

## Acknowledgments

We thank Mads Fristrup Schou, Susanne U. Franssen, and Neda Barghi for helpful comments on the *Nest* software package and Robin S. Waples and an anonymous reviewer for their constructive comments that greatly improved the manuscript. A.J. and T.T. are members of the Vienna Graduate School of Population Genetics, which is funded by the Austrian Science Fund (FWF, W1225). C.S. is also supported by the European Research Council grant ”ArchAdapt,” and T.T. is a recipient of a Doctoral Fellowship (DOC) of the Austrian Academy of Sciences.

## Footnotes

Supplemental material is available online at www.genetics.org/lookup/suppl/doi:10.1534/genetics.116.191197/-/DC1.

*Communicating editor: M. A. Beaumont*

- Received May 4, 2016.
- Accepted July 30, 2016.

- Copyright © 2016 Jónás
*et al*.

Available freely online through the author-supported open access option.

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.