## Abstract

Comparing allele frequencies among populations that differ in environment has long been a tool for detecting loci involved in local adaptation. However, such analyses are complicated by an imperfect knowledge of population allele frequencies and neutral correlations of allele frequencies among populations due to shared population history and gene flow. Here we develop a set of methods to robustly test for unusual allele frequency patterns and correlations between environmental variables and allele frequencies while accounting for these complications based on a Bayesian model previously implemented in the software Bayenv. Using this model, we calculate a set of “standardized allele frequencies” that allows investigators to apply tests of their choice to multiple populations while accounting for sampling and covariance due to population history. We illustrate this first by showing that these standardized frequencies can be used to detect nonparametric correlations with environmental variables; these correlations are also less prone to spurious results due to outlier populations. We then demonstrate how these standardized allele frequencies can be used to construct a test to detect SNPs that deviate strongly from neutral population structure. This test is conceptually related to *F*_{ST} and is shown to be more powerful, as we account for population history. We also extend the model to next-generation sequencing of population pools—a cost-efficient way to estimate population allele frequencies, but one that introduces an additional level of sampling noise. The utility of these methods is demonstrated in simulations and by reanalyzing human SNP data from the Human Genome Diversity Panel populations and pooled next-generation sequencing data from Atlantic herring. An implementation of our method is available from http://gcbias.org.

THE phenotypes of individuals within a species often vary clinally along environmental gradients (Huxley 1939). Such phenotypic clines have long been central to adaptive arguments in evolutionary biology (Mayr 1942), with diverse examples including latitudinal clines in skin pigmentation in humans (Jablonski 2004), body size and temperature tolerance in *Drosophila* (Hoffmann and Weeks 2007), and flowering time in plants (Stinchcombe *et al.* 2004). Unsurprisingly, comparisons of allele frequencies between populations that differ in environment were among the earliest population genetic tests for selection (Cavalli-Sforza 1966; Lewontin and Krakauer 1973; Endler 1986) and have continued to be central to population genetics to this day (*e.g.*, Coop *et al.* 2009; Akey *et al.* 2010).

The falling cost of sequencing and genotyping means that such comparisons can now be made on a genome-wide scale, allowing us to start to understand the genetic basis of local adaptation across a broad range of organisms. However, such studies need to acknowledge the sampling issues inherent in population genetic studies of natural populations. In assessing correlations between allele frequencies and environmental variables, or in looking for loci with unusually high levels of differentiation, two broad technical issues need to be addressed. First, sample allele frequencies are noisy estimates of the population allele frequency, and this issue is exacerbated when sample sizes differ across populations. Second, population allele frequencies among multiple populations are not statistically independent, as populations vary in their relationship to one another due to varying amounts of shared genetic drift and migration over time. Failure to account for differences in sample size and the shared history of populations can lead to a high rate of false positives and negatives when searching for the signal of local adaptation due to the unaccounted sources of variance and nonindependence among populations (Robertson 1975; Excoffier *et al.* 2009; Bonhomme *et al.* 2010). Therefore, accounting for these potential biases should provide additional precision in the identification of loci responsible for adaptation.

To accommodate these sources of noise, the Bayesian method Bayenv was developed (Hancock *et al.* 2008; Coop *et al.* 2010); Bayenv attempts to account for these two factors while testing for a correlation between allele frequencies and an environmental variable. To control for a general relationship between populations, an arbitrary covariance matrix of allele frequencies is estimated from a set of control markers. This model of covariance is then used as a null model against an alternative model that allows for a linear relationship between the (transformed) allele frequencies at a particular locus and an environmental variable of interest. Inference under these models is performed using Markov chain Monte Carlo (MCMC) to integrate over the posterior of the parameters. Bayenv has been successfully applied to identify loci putatively involved in local adaptation to environmental variables across a range of different species (*e.g.*, Hancock *et al.* 2008, 2010, 2011b,c; Eckert *et al.* 2010; Fumagalli *et al.* 2011; Jones *et al.* 2011; Cheng *et al.* 2012; Fang *et al.* 2012; Keller *et al.* 2012; Limborg *et al.* 2012; Pyhäjärvi *et al.* 2012).

A range of other model-based methods have been developed in parallel to detect environmental correlations. One of the earliest of these was the method of Joost *et al.* (2007), which accounted for differences in sample size but did not account for population structure. Building on this, Poncet *et al.* (2010) developed a fast method based on generalized estimation equations, to allow population structure for correlated sample frequencies within a set of population clusters (but no relatedness between these clusters). A recent simulation study (De Mita *et al.* 2013) has shown that environmental correlation methods that explicitly account for population structure (*e.g.*, Coop *et al.* 2010; Poncet *et al.* 2010) outperform those that do not account for structure, with Bayenv having somewhat higher power and greater robustness to population structure, likely due to it allowing arbitrary relatedness between populations (at the expense of computational speed).

Recently, Guillot (2012) developed an inference method very similar to Bayenv, offering large gains in computational efficiency. These gains are at the expense of constraining the covariance matrix in an explicit isolation-by-distance parametric form. In addition, Frichot *et al.* (2013) presented a latent factor mixed model that estimates the effect of population history and environmental correlations simultaneously. The Frichot *et al.* (2013) method resulted in a slightly higher power than that of Bayenv to detect environmental correlations in simulations, perhaps in part as a result of the simultaneous inference of fixed and random effects reducing the effect of selected loci inflating the covariance matrix.

These methods (Coop *et al.* 2010; Poncet *et al.* 2010; Guillot 2012; Frichot *et al.* 2013) all seem to have similar performance in terms of power, suggesting that the detection of linear correlations in the presence of population structure has reached a reasonable level of development. However, further work is needed to extend the utility and robustness of these methods to enhance their application to population genomic data.

One concern about applications of these methods is that linear models are not robust to outliers, which can lead to spurious correlations. For example, if a single population has both an extreme allele frequency and an extreme environmental variable, while all other populations show no correlation, then the linear model may be misled (see Hancock *et al.* 2011b; Pyhäjärvi *et al.* 2012, for examples). This sensitivity can be overcome by using rank-based nonparametric statistics, such as Spearman’s *ρ*, which may also offer increased power to robustly detect linear and nonlinear relationships. The difficulty is that such tests do not acknowledge the differences in sample size or the covariance in allele frequencies across populations. To address this some authors (Fumagalli *et al.* 2011; Hancock *et al.* 2011a) have used a nonparametric partial Mantel test, which can partially account for this covariance and makes fewer model assumptions. However, the partial Mantel approach is known to perform very poorly when both genotypes and environmental variables are spatially autocorrelated (see Guillot and Rousset 2013, for discussion). As such, we currently lack a framework to perform robust, nonparametric inference of environmental correlations in population genomic data.

To overcome these difficulties we use the framework laid out in Bayenv to provide the user with a set of “standardized” allele frequencies at each SNP. In these standardized allele frequencies the effect of unequal sampling variance and covariance among populations has been approximately removed. This affords users a general framework to utilize statistics of their choosing to investigate environmental correlations or other sources of allele frequency variation. As an application of this we show how these standardized allele frequencies can be used to develop a powerful test that robustly infers environmental correlations in the face of outlier populations. As a further example of how these “standardized allele frequencies” can be used, we construct a global *F*_{ST}-like statistic, which accounts for shared population history and sampling noise, to identify loci with unusually high allele frequency variance among populations (a test that is closely related to that of Bonhomme *et al.* 2010). We demonstrate the utility of these approaches through simulation and reanalyzing SNP genotyping data from the Centre d'Etude du Polymorphisme Hmain (CEPH) Human Genome Diversity Panel (HGDP) (Conrad *et al.* 2006; Li *et al.* 2008).

We also extend Bayenv to deal with some of the statistical challenges posed by next-generation sequencing. Recently, pooled next-generation sequencing (NGS) of multiple individuals from a population has gained in popularity (*e.g.*, Turner *et al.* 2010, 2011; He *et al.* 2011; Kolaczkowski *et al.* 2011; Boitard *et al.* 2012; Fabian *et al.* 2012; Kofler *et al.* 2012; Lamichhaney *et al.* 2012; Orozco-terWengel *et al.* 2012), as it offers a cost-efficient alternative to sequencing of single individuals. However, estimating allele frequencies from read counts sequenced from a pool implies a second level of sampling variance (Futschik and Schlötterer 2010; Zhu *et al.* 2012), which needs to be considered in population genetic analyses. We extend the model of Bayenv to account for the sampling of reads in pooled NGS experiments. We show that this improves the power to detect environmental correlations in pooled resequencing data. We also show the utility of our approach by the reanalysis of a pooled next-generation sequence data set from Atlantic herring (*Clupea harengus*) populations along a salinity gradient (Lamichhaney *et al.* 2012).

The extensions to Bayenv detailed here are schematically depicted in Figure 1, and all of these extensions are implemented in Bayenv2.0, available from http://gcbias.org.

## Methods

### General model of Bayenv

First, we briefly explain the underlying model of Bayenv for the sake of completeness. Further details about the model and the inference method can be found in Coop *et al.* (2010). Consider a biallelic locus *l* with a population allele frequency *p _{jl}* in a population

*j*, where

*n*alleles have been sampled from this population in total. Assuming that each population is reasonably large and approximately at Hardy–Weinberg equilibrium, the observed count of allele 1,

_{j}*k*, in this population is the result of binomial sampling from this population frequency:(1)We follow the model of Nicholson

_{jl}*et al.*(2002) by assuming that a simple transform of the population allele frequency

*p*in subpopulation

_{jl}*j*at locus

*l*represents a normally distributed deviate around an “ancestral” frequency

*ɛ*. Specifically we assume that (2)

_{l}*i.e.*, that the masses <1 and >1 are placed as point masses at 0 and 1, representing the loss or fixation of the allele in population

*j*, respectively. We then assume that that the marginal distribution of

*θ*is normally distributed, around an ancestral mean frequency

_{jl}*ε*with variance proportional to

_{l}*ε*(1 −

_{l}*ε*) (inspired by the model of Nicholson

_{l}*et al.*2002). We denote the vector of transformed population allele frequencies at a locus by

*θ*, where

_{l}*θ*= (

_{l}*θ*

_{1}

*, …,*

_{l}*θ*) when

_{Jl}*J*is the number of populations. As we do not expect that the populations are independent from each other, we assume that

*θ*follows a multivariate normal distribution(3)where Ω is the variance–covariance matrix of allele frequencies among populations. We can write the joint probability of our counts at a locus and the

_{l}*θ*as (4)We place priors on Ω (inverse Wishart) and the

_{l}*ε*at each SNP (symmetric Beta). Assuming that our SNPs are independent, we write the joint probability of all of our loci and parameters as(5)Our posterior is this joint probability normalized by the integral over Ω and the

_{l}*ε*and

_{l}*θ*at all of the loci.

_{l}We then use MCMC to sample posterior draws of the covariance matrix (Ω) from a set of unlinked, putatively neutral control SNPs. Our observations showed that the MCMC converges quickly to a small set of covariance matrices for each data set given a sufficient number of independent SNPs (Coop *et al.* 2010). Given this tight distribution, we use a single draw of Ω, denoted by , after a sufficient burn-in. The entries of the matrix Ω are closely related to the matrix of pairwise *F*_{ST} (Weir and Hill 2002; Samanta *et al.* 2009), and so this model provides a flexible model of population history; for example, Pickrell and Pritchard (2012) use a similar model to infer a tree-like graph of population history and Guillot (2012) uses a related model as a model of isolation by distance.

Next, we formulate an alternative model where an environmental variable *Y*, standardized to have mean 0 and variance 1, has a linear effect *β* on the allele frequencies: (6)To express the support for the alternative model at a locus *l*, Coop *et al.* (2010) calculated a Bayes factor (BF) by taking the ratio of probability of the alternative and the null model given the data and , integrating out the uncertainty in *θ _{l}*,

*ε*, and

_{l}*β*(under a uniform prior on

*β*between −0.2 and 0.2).

### Tests based on standardized allele frequencies

The linear relationship between the transformed allele frequencies (Equation 6) may not be the best fit in all situations, as other monotonic relationships could be viewed as biologically realistic in some cases. Additionally, there may be situations in which a linear model is not robust to outliers and so will spuriously identify loci as strong correlations. Therefore, we provide a general framework to allow investigators to apply statistics of their choice, such as rank-based nonparametric statistics, to detect environmental correlations, while taking advantage of the Bayenv framework. These statistics could in theory be applied to the raw sample frequencies; in practice, however, that can lead to high false-positive and false-negative rates as sample allele frequencies are naturally noisy because of the process of sampling and nonindependent due to the covariance among populations.

The multivariate normal framework employed by Bayenv offers a natural way to attempt to standardize *θ _{l}* to be variates with mean zero, variance one, and no covariance. These allele frequencies allow standard statistics that rely on these assumptions to be applied more directly. We denote the Cholesky decomposition of the covariance matrix

*C*(Ω =

*CC*, where

^{T}*C*is an upper-triangular matrix), which can be thought of as being equivalent to the square root of the matrix and so analogous to the standard deviation of

*θ*. To standardize the

_{l}*θ*for effects of unequal sampling variance and covariance among populations we write(7)If

_{l}*θ*∼ MVN(

_{l}*ε*,

_{l}*ε*(1 −

_{l}*ε*)Ω), then

_{l}*X*∼ MNV(0, ) where is the identity matrix (

_{l}*i.e.*,

_{i}_{,}

*= 1 if*

_{j}*i*=

*j*and

_{i}_{,}

*= 0 otherwise). Note that this transform is not unique, as a range of other factorizations of the covariance matrix are possible.*

_{j}If we wish to test the correlation of our transformed allele frequencies with an environmental variable, we also need to similarly transform our environmental variable, to ensure that our frequencies and environmental variable are in the same frame of reference. Specifically, if our environmental variable is *Y* (standardized to be mean 0, variance 1), then our transformed environmental variable is(8)The fact that we need to do this follows from the fact that we are interested in predicting *θ _{l}* by

*Y*, and so what we are doing is conceptually equivalent to multiplying the left- and right-hand sides of the equation relating the two by

*C*

^{−1}to remove the effect of the covariance in allele frequencies. This rotation by

*C*

^{−1}means that

*X*and

*Y*′ are linear superpositions of

*θ*and

*Y*, respectively, and so the population labels no longer apply to particular entries in these vectors. The transform will tend to exaggerate the environmental variable differences between very closely related populations. Furthermore, if part of the variation in the environmental variable precisely matches the major axis of variation in the genetic data, then applying the transform may remove much of this variation. Both of these effects seem desirable properties, as we are interested in identifying correlations discordant with the patterns expected from drift. However, users should visually inspect

*Y*and

*Y*′ to understand how the transform has altered the environmental variable (see Supporting Information, Figure S1, Figure S2, Figure S3, and Figure S4 for examples).

We do not get to observe *θ _{l}*, so we obtain a representative sample of

*M*draws from the posterior . Given these draws, there is an enormous variety of ways that we could choose to summarize the support for the correlation with our environmental variable

*Y*′. Here we choose to write(9)

*i.e.*,

*ρ*is the mean of the function

_{l}*ρ*() over our posterior draws of

*X*.

_{l}In this article, we calculated Pearson’s and Spearman’s correlation coefficients [as our *ρ*()] as alternative tests to the Bayes factors. To obtain an appropriate sample from the posterior in a computationally efficient manner, these statistics were calculated between *X _{l}* and

*Y*′ every 500 MCMC generations and then averaged over the complete MCMC run. Our draws of

*X*will therefore be weakly autocorrelated, but as

_{l}*ρ*is a mean, this does not affect its expectation.

_{l}While this standardization, for a known , would work perfectly if our *θ _{l}* were really multivariate normal, in reality this is only an approximation, as even under the null model deviations due to drift are only approximately normal over short timescales. Thus, while we model drift at a locus as being multivariate normal (

*i.e.*,

*θ*has a prior given by Equation 3), if the true model is more complex, the joint probability of this along with our count data (and our uncertainty in Ω) may force

_{l}*θ*to not be MVN(0, ). While, under these circumstances,

_{l}*X*will conform to those assumptions better than

_{l}*θ*, we still choose to use the empirical distribution of

_{l}*ρ*across SNPs rather than rely on asymptotic results.

_{l}#### A robust statistic to detect extreme population differentiation:

This transformation to provide a set of *X _{l}* can also be used to propose an

*F*

_{ST}-like statistic. Specifically we can write the variance of

*X*at a locus as(10)Note that while the multiplication by

_{l}*C*

^{−1}is not a unique transform, would be the same no matter which matrix factorization was used. This statistic is closely related to

*F*

_{ST}, which can be expressed as(11)where Var(

*p*) is the variance of our allele frequency across populations (see Nicholson

_{lj}*et al.*2002; Balding 2003, for discussion). Thus

*X*is directly analogous to

^{T}X*F*

_{ST}, but it accounts for the variance–covariance structure of the populations in question. If

*θ*is truly multivariate normal, then is distributed . This suggests that is a natural test statistic to identify loci that deviate away strongly from the multivariate normal distribution,

_{l}*e.g.*, due to selection. Furthermore, this form naturally accounts for hierarchical population structure or other models of population structure that can confound

*F*

_{ST}-style outlier analyses (Excoffier

*et al.*2009). As we do not observe

*X*and rather obtained a posterior distribution of

_{l}*X*, we take the average across the sample of

_{l}*X*from our MCMC (we term this

_{l}*X*).

^{T}XOur *X ^{T}X* statistic is very closely related to the work of Bonhomme

*et al.*(2010), who independently developed a similar test statistic (extended Lewontin and Krakauer test, FLK) for the case of a known population tree (see also Nei and Maruyama 1975; Robertson 1975, for earlier discussion of the effect of a population tree on the Lewontin and Krakauer 1973 test). Specifically, Bonhomme

*et al.*(2010) use an equation analogous to Equation 10, but where Ω is a covariance matrix specified by a neighbor-joining population tree estimated from the data [using Reynolds’ genetic distance between population pairs (Reynolds

*et al.*1983)]. Thus we expect similar performance of

*X*to that of FLK in situations where the covariance matrix is well approximated by a tree and

^{T}X*X*to outperform FLK under models such as isolation by distance and more complex models where population history is poorly represented by a tree.

^{T}X### Sequencing of pooled samples

If genotyping is conducted as sequencing of population pools, an additional step of sampling is included. At a site *l* the total coverage in population *j* is *m _{jl}*, and we observe

*r*reads of allele 1. Assuming that each individual contributed the same number of chromosomes to the pool, the sequenced reads are the result of binomial sampling(12)where is the unknown sample allele frequency in the pooled sample. Summing over this unknown frequency (13)(again assuming Hardy–Weinberg equilibrium) gives us the probability of our sampled reads given the population frequency. This replaces the binomial probability (Equation 1) in the joint probability given by Equation 4. In Coop

_{jl}*et al.*(2010), the Bayes factors were approximated by an importance sampling technique while performing MCMC under the null model;

*i.e.*,

*β*= 0. This allowed the rapid calculation of the Bayes factor for many environmental variables with little extra computational cost. However, Bayes factors calculated by this technique are noisy, and so here we also implement an MCMC to estimate the posterior on

*β*. We place a uniform prior on

*β*(between −0.1 and 0.1) and update

*β*along with

*ε*and

_{l}*θ*. For our update on

_{l}*β*we use a random-walk sampler with a small normal deviate (

*σ*= 0.01) and accept this move with the ratio of the joint posterior of our proposed parameters to that of our current parameters. As a simple summary of the posterior support for

*β*≠ 0, we look at the skew of the posterior away from zero. Specifically we estimate the proportion (

*f*) of the marginal posterior on

*β*that is above 0 and then take

*Z*= |0.5 −

*f*| as a test statistic, with values of

*Z*close to 0.5 showing strong support for

*β*≠ 0.

### Power simulations

The extended model was implemented in Bayenv2.0. Simulations were conducted to evaluate the power of these extensions. To use both a realistic covariance among populations and realistic environmental values, we based these simulations on SNP data from the HGDP populations (Conrad *et al.* 2006; Li *et al.* 2008) and the environmental variables measured at these sampling locations (also used in Hancock *et al.* 2008; Coop *et al.* 2010). These simulations were implemented in R (R Development Core Team 2011). We employed a single Bayenv2.0 estimate of the covariance matrix from the original SNP data (sampled after 100,000 MCMC iterations) to simulate population allele frequencies. For each SNP, an ancestral frequency *ε _{l}* was drawn from a beta distribution (with parameters

*α*= 0.5,

*β*= 3) if not otherwise stated. Then population allele frequencies were drawn from the multivariate normal , with values >1 or <0 replaced with 1 and 0, respectively. Allele counts were then drawn binomially from these frequencies, with a sample size of 20 chromosomes per population (if not otherwise stated).

To construct a null distribution we calculated our test statistics for these simulated SNPs and an environmental variable *Y* during 100,000 MCMC iterations. For a second set of SNPs, an environmental effect was simulated by drawing their population allele frequencies from a multivariate normal , using a range of *β*’s. Again our test statistics were calculated over 100,000 MCMC iterations. Power estimates were based on the proportion of these SNPs that were detected at a certain significance level *α* (5% here), *i.e.*, the fraction of our simulations (with a *β*) in the upper *α* tail of the null distribution. Note that by taking this empirical approach we are setting our false-positive rate to 5%, and as such we do not need to investigate our false-positive rate.

#### Testing differentiation statistics:

To compare the power of differentiation statistics, we simulated data sets based on the HGDP populations, using the minimum winter temperature and latitude as environmental variables (latitude is obviously not an environmental variable, but is taken as a proxy for many environmental variables that are correlated with latitude). To highlight the effects of different sample sizes among populations, we simulated populations using the original sample sizes of the HGDP data (Li *et al.* 2008). Sets of neutral and selected SNPs were simulated as described above, and the differentiation statistics *F*_{ST} [using the pegas R library (Paradis 2010)], *X ^{T}X*, and the related FLK [using an R script provided by the authors (Bonhomme

*et al.*2010)] were calculated. The ancestral frequency was used as an outgroup population for the calculation of FLK. The ancestral allele frequency

*ε*was set to 0.5 for all SNPs simulated for these tests as we discovered noisy power estimates for FLK when the ancestral frequency was drawn from a beta distribution (not shown). All power estimates again used empirical cutoffs based on the neutral simulations.

#### Simulation of pooled samples:

For the simulation of pooled NGS data, we assume that the depth of coverage of a pool follows a negative binomial distribution, which allows for the overdispersion of read depths compared to the Poisson. Coverages for each population and SNP were independently drawn from a negative binomial distribution NB(*r*, *p*), for which we set *r* = 5 and set *p* to obtain the respective coverage mean [*i.e.*, NB(*r*, *p*) has a mean of *pr*/(1 − *p*) and a variance of *pr*/(1 − *p*)^{2}]. This represents a case where the variance–mean ratio increases for higher average coverages. This pattern is generally consistent with observations from pooled next-generation data in *Arabidopsis thaliana* (T. Günther, C. Lampei, I. Barilar, and K. J. Schmid, unpublished results). An environmental effect of |*β*| = 0.06 was simulated when all 52 HGDP populations were used and |*β*| = 0.15 was used for simulations of smaller population subsets. The sign of *β* was assigned to be positive or negative at random.

### Data analysis

Bayenv2.0 was employed to reanalyze a genome-wide data set of 640,698 SNPs from 52 HGDP-CEPH populations (Li *et al.* 2008; Hancock *et al.* 2010), using both Bayes factors and our nonparametric test statistic (*ρ _{l}*). We restricted our analysis to minimum winter temperature, as most winter climate variables show outliers. All environmental variables were normalized to a mean of zero and a standard deviation of one. The covariance matrix for this reanalysis was estimated from a random subset of 5000 SNPs after 100,000 MCMC iterations. Bayes factors and correlation coefficients for each SNP were estimated using 100,000 MCMC iterations. In addition to these test statistics, we sampled

*X*every 500 MCMC generations and averaged

_{l}*X*over these values for each SNP. SNP positions and gene annotations were obtained from Ensembl (Flicek

^{T}X*et al.*2012) and Entrez Gene (Maglott

*et al.*2011).

To explore the use of Bayenv on pooled data we reanalyzed a pooled sequence data set from eight Atlantic herring (*C. harengus*) populations that were sampled along a salinity gradient (Lamichhaney *et al.* 2012). Lamichhaney *et al.* (2012) sequenced eight population pools of 50 individuals and aligned their reads back to a *de novo* assembly of the *C. harengus* exome, with each pool being sequenced to on average 30-fold depth. They identified 440,817 SNPs in these data and concentrated their analyses on SNPs with at least 40 reads in each population to reduce sampling noise. We reanalyzed the data of Lamichhaney *et al.* (2012), using allele count data provided by the authors. We first estimated the covariance matrix, using a random subset of 1000 SNPs, chosen to have ≥40-fold coverage in all populations. To explore the performance of our pooled sequencing extension of Bayenv we calculated the statistic *Z* for the correlation of allele frequencies and salinity across the eight populations. The environmental correlation test was calculated for all SNPs with at least 40 reads in all populations (36,794 SNPs) and for 100,000 SNPs chosen at random irrespective of sequencing depth. All statistics were calculated during 100,000 MCMC iterations.

## Results

### Using tests based on standardized allele frequencies

We explored the performance of tests based on our standardized transformed population frequencies (*X _{l}*). Before we calculated test statistics on our standardized allele frequencies, we examined whether the multivariate standardization (as in Equation 7) had removed the covariance among populations from our standardized

*X*. We first calculated the sample covariance matrix, using the sample frequencies for all 2333 autosomal SNPs of Conrad

_{l}*et al.*(2006) shown in Figure 2A. Specifically, denoting the vector of sample frequencies by

*k*/

_{l}*n*we calculated . As expected, there is substantial structure in this sample covariance matrix between regions, which corresponds to known population structure in humans (Coop

_{l}*et al.*2010). Then we calculated the sample covariance matrix of the

*X*across these SNPs, using Bayenv2.0; specifically, we took a single draw of

_{l}*X*(after a burn-in) for each of these 2333 SNPs and calculated . The resulting sample covariance matrix (shown in Figure 2B) is close to the identity matrix in form, demonstrating that the majority of the covariance between populations has been removed. This suggests that our

_{l}*X*are appropriately standardized for the application of correlation tests, averaging across our uncertainty in

_{l}*X*at each locus.

_{l}To further test the normality of *X _{l}*, we checked whether the

*X*statistic follows a

^{T}X*χ*

^{2}-distribution with 52 d.f. (

*i.e.*, the number of populations). A QQ plot of the

*X*and the expected -distribution shows that the mean of the distribution approximately matches that of the , whereas the variances do not (Figure S5). Therefore, while

^{T}X*X*provides a potentially suitable summary statistic for identifying empirical outliers, we cannot assume a distributional form to those outliers under a null neutral model.

^{T}X#### Detecting extreme population differentiation:

To test the utility of *X ^{T}X* as a statistic to detect overly differentiated loci, we simulated adaptation of the HGDP populations to minimum winter temperature and latitude and explored different ways to detect such environmental effects with an empirical cutoff score determined by simulations of a neutral model. In addition to Bayenv’s Bayes factors, we calculated the differentiation statistics

*F*

_{ST},

*X*, and FLK (Bonhomme

^{T}X*et al.*2010). The environmental correlation Bayes factors clearly outperformed the differentiation statistics for both environments (Figure 3), which is consistent with the intuition that knowing the responsible environmental factor should improve the detection power (De Mita

*et al.*2013). Our simulations show that FLK and

*X*clearly outperform the traditional

^{T}X*F*

_{ST}-based tests, suggesting that accounting for the relationship among populations increases the power of differentiation statistics. Notably, FLK and

*X*have similar power with slightly higher estimates for

^{T}X*X*. The similar level of power of the two tests is perhaps not unsurprising, as the history of the worldwide HGDP populations can be well represented by a tree-like relationship [with a small number of migration events (Pickrell and Pritchard 2012)].

^{T}X#### Correlations using standardized allele frequencies:

To explore the power of standard correlation tests applied to our standardized *X _{l}*, in comparison to the Bayes factors, we again conducted power simulations based on the HGDP data. We calculated both Spearman’s

*ρ*and Pearson’s

*r*between

*Y*′ and our transformed allele frequencies averaged across the posterior on these transformed frequencies. Statistics based on our Bayesian model clearly outperform correlation tests calculated for point estimates from sample allele frequencies (Figure 4, A and B). This improvement in power is due to the fact that the methods based on the sample frequencies fail to incorporate the sampling noise and the relationship among populations. All three tests based on Bayenv performed effectively identically with marginal advantages of the Bayes factors for minimum winter temperature (Figure 4B) and a slightly lower power of Spearman’s

*ρ*, which is not surprising, as all simulated effects are linear. We expect that the relative performance of the rank-based test,

*i.e.*, Spearman’s

*ρ*, may be reduced as the number of populations is decreased. We also tested the power of the

*X*tests incorrectly, using

_{l}*Y*in place of

*Y*′, which led to power curves intermediate between the two sets (data not shown). Overall these results show that correlation tests based on

*X*perform well.

_{l}The alternative model of Bayenv (Equation 6) implies a linear relationship between the transformed allele frequencies and the environmental variable. However, the fitting and significance of this linear model may be misled by populations that are statistical outliers. For instance, linear models might mistakenly identify a SNP as a strong candidate, when a single outlier population features both an extreme environment and an extreme allele frequency. We note that the extreme allele frequency may be due to a component of drift not well modeled by our MVN framework or due to a selection pressure (or response) poorly correlated with our environmental variable of interest. While loci of the latter form are of interest as *genomic* outliers, we believe researchers interested in particular environmental variables would consider such loci spurious and would prefer a set of candidates where many populations support a consistent pattern of correlation with an environmental variable.

To test such a case, we used winter minimum temperature as a climate variable since one population, the Yakuts from northeast Russia, is characterized by a very low minimum temperature (Figure 4C). We simulated allele frequencies for the HGDP populations as described above but to create outliers, we set the allele frequencies of the Yakuts to 0. Bayes factors and Pearson’s correlation coefficient *r*, which both assume a linear relationship, showed an excess of false positives (Figure 4D), while a nonparametric Spearman’s rank correlation coefficient (*ρ*) was much less sensitive to these outliers, with a false-positive rate very close to the expected value of 5% (Figure 4D).

### Robust candidates in the HGDP data

We next explored the use of our standardized *X _{l}* for identifying robust putative candidates for adaptive evolution in a genome-wide data set of 640,698 SNPs from a global sample of 52 human populations [HGDP-CEPH (Li

*et al.*2008; Hancock

*et al.*2010)].

#### Environmental correlation statistics in the HGDP data:

As described above, populations with outliers in terms of allele frequencies and/or environments can potentially lead to spurious correlations. For example, the use of minimum winter temperature as an environmental variable could generate false-positive correlations in analyses of the HGDP data because of the extremely low temperature for the Yakut population. To explore this, we used minimum winter temperature and reanalyzed all SNPs of the HGDP data, calculating both Bayes factors and *ρ _{l}*(

*X*,

_{l}*Y*′) using Spearman’s rank correlation coefficient. Our Bayes factors and |

*ρ*(

_{l}*X*,

_{l}*Y*′)| are correlated across SNPs (Spearman’s

*ρ*= 0.72) and show an overlap of 29 SNPs in their top 100 most extreme SNPs, 142 SNPs in the top 500, and 2.8% in the top 5% of signals. These overlaps are substantial but suggest that our two tests are detecting somewhat different signals, which likely reflects in part the influence of outlier populations.

The 100 strongest signals of the Bayes factor analysis and |*ρ _{l}*(

*X*,

_{l}*Y*′)| are shown in Table S1 and Table S2. Among our top hits for both statistics are multiple SNPs that fall in the gene

*MKL1*(megakaryoblastic leukemia 1), which is a myocardin-related transcription factor that is involved in smooth muscle cell differentiation, mammary gland function, and cytoskeletal signaling (Parmacek 2007; Maglott

*et al.*2011) and has been associated with various disease phenotypes (Ma

*et al.*2001; Hinohara

*et al.*2009; Scharenberg

*et al.*2010).

To exemplify the effect of an outlier in Figure S6, we compare two SNPs that fall in our top 20 Bayes factors, but that have very different Spearman’s *ρ*. While in general, as seen above, there is good agreement between the Bayes factors and |*ρ _{l}*(

*X*,

_{l}*Y*′)|, we suggest that the Bayes factors, or other linear model test statistics, should be used in conjunction with robust test statistics such as those described here to avoid spurious signals due to outliers. As these both can be calculated from the same MCMC run, this should be reasonably computationally efficient.

#### Population differentiation statistics in the HGDP data:

We also explored our test statistic *X ^{T}X*, which is designed to highlight loci that deviate strongly from the expected pattern of population structure, by calculating it for each of the 640,698 HGDP SNPs. These

*X*statistics have been uploaded as a genome browser track to http://hgdp.uchicago.edu/. The empirical distribution is shown in Figure 5. The empirical distribution clearly differs from the expected -distribution, with a higher mean and a lower variance than expected. This again highlights that we do not have a good theoretical expectation for the distribution and so must use the empirical ranks to judge how interesting a signal is.

^{T}XTo examine the relationship between *X ^{T}X* and global

*F*

_{ST}we took per SNP values of global

*F*

_{ST}previously calculated among the population groups as colored in Figure 2 (values from Coop

*et al.*2009; Pickrell

*et al.*2009). The Spearman’s

*ρ*between

*X*and

^{T}X*F*

_{ST}was 0.48. Looking at the extremes of both distributions,

*X*and

^{T}X*F*

_{ST}show an overlap of 6 SNPs in their top 100 most extreme SNPs, 37 SNPs in the top 500, and 1.4% in the top 5% of signals.

In Table S3 we present the top 100 *X ^{T}X* SNPs in the genome, along with their nearest genes and global

*F*

_{ST}values. To briefly explore where known signals fall in our empirical distribution in Figure 5, we also plot as arrows the maximum

*X*for SNPs that fall within 50 kbp up- and downstream of 10 well-known pigmentation genes (list taken from Pickrell

^{T}X*et al.*2009). As these arrows represent maximums across a number of SNPs around the gene, they will necessarily be more extreme than an average draw from this distribution. However, the extreme signals at a number of these genes demonstrate that the method is detecting loci with extreme allele frequency patterns. The SNP with the most extreme value of

*X*in the genome falls close to

^{T}X*SLC24A5*(Nakayama

*et al.*2002; Lamason

*et al.*2005), while a SNP close to

*SLC45A2*is the second-largest signal in the genome. More generally, 5 of these 10 pigmentation genes fall in the top 1% and 9 genes fall in the top 5% of the empirical distribution. A SNP close to the gene

*EDAR*, one of the highest pairwise

*F*

_{ST}’s between East Asia and Western Eurasia HGDP populations, is also in the top 10 SNPs (Sabeti

*et al.*2007).

The weak overlap in the tails of the genome-wide *X ^{T}X* and

*F*

_{ST}means that they are finding different sets of candidate SNPs, presumably due to the reweighting of allele frequencies in

*X*. For example, our 12th highest SNP for

^{T}X*X*falls close to

^{T}X*MCHR1*, with our 21st highest gene being a nonsynonymous variant (rs133072) in the same gene.

*MCHR1*(melanin-concentrating hormone receptor 1) is known to play a role in the intake of food, body weight, and energy balance in mice (Marsh

*et al.*2002), and the effect of the nonsynonymous variant on human obesity has been debated (Wermter

*et al.*2005; Rutanen

*et al.*2007; Kring

*et al.*2008; Speliotes

*et al.*2010). Both of these SNPs are nearly fixed differences between East Asia and the American HGDP populations (Figure S7 and Figure S8). This strong difference between regions that share a recent history and, thus, covariance among allele frequencies (Figure 2) makes these SNPs an interesting pattern for

*X*. However, neither of these two SNPs has an extremely impressive global

^{T}X*F*

_{ST}(falling in only the 5% tail), presumably because East Asia and the American HGDP populations are only two of seven groups in the global

*F*

_{ST}calculation and the other five groups do not show an interesting pattern.

### Analysis of pooled data

Pooled sequencing of multiple individuals has increased in popularity, as it is considerably cheaper than barcoding all individuals and sequencing them separately (but see Cutler and Jensen 2010, for a discussion of drawbacks). The use of allele frequencies estimated from the resulting read counts seems to be a reasonable application of our method. However, it raises the question of how Bayenv behaves for different coverages as increasing sequencing coverage is not the same as increased numbers of sampled individuals.

#### Power simulations:

We simulated data that resemble the HGDP populations and then pooled 10 diploid individuals (*i.e.*, 20 chromosomes) from each population and used the populations’ respective latitudes as our environmental variable. First, we experimented with incorrectly using read counts in place of the chromosome counts (*i.e.*, assuming *r _{jl}* and

*m*were

_{jl}*k*and

_{jl}*n*, respectively) and found that this resulted in an excess of extreme Bayes factors for high coverages under the null (data not shown). We found this inflation to be most pronounced when read depths are greater than the actual sample size, and this is likely due to false certainty about the population frequencies by failing to account for the second level of sampling.

_{jl}We then ran power simulations of Bayenv matched to the HGDP data, with *Z* as a test statistic, using the true sample frequencies (black squares in Figure 6) and the read counts (incorrectly) as the input data for the previous version of Bayenv (Bayenv1.0, black circles in Figure 6). Bayenv2.0, which accounts for both stages of binomial sampling in pooled data (as described above), was also applied to the same read counts (white circles in Figure 6). The true sample frequencies naturally resulted in the best power as there is no additional sampling noise (Figure 6A). For higher mean coverages the power of Bayenv1.0 using the read counts as sample allele frequencies was almost as good as the power using true sample allele frequencies (Figure 6A). As most applications may consist of a smaller number of populations, we additionally sampled two subsets (Figure 6, B and C). On all of these, as expected, Bayenv using the true sample frequencies outperformed Bayenv1.0 using the read counts.

In part, the poor power in pooled studies is unavoidable due to the additional sampling noise. However, the loss of power is likely boosted by failing to properly account for this second stage of sampling, which leads to poor performance due to variation in depth across populations and SNPs. Including the sampling of reads into the model clearly had a positive effect on power in our population subsets, while incorrectly using the read counts as input did not reach similar powers, even with high coverages (Figure 6, B and C). However, the power of Bayenv2.0 was still considerably low for mean coverages <20×, suggesting that such low read depths do not provide enough certainty for reliable frequency estimation. Somewhat surprisingly, we did not observe any advantages of the extended model in detection power if all 52 HGDP populations are simulated (Figure 6A), perhaps because differences in coverage over populations are averaged over so many populations.

We note that both our original implementation and Bayenv2.0, which acknowledges the pooled nature of the data, will outperform tests of association that use the sample frequency computed from the read count data. This follows from the fact that both of these tests more fully acknowledge sampling noise. This is an important issue for next-generation data, as most technologies currently lead to high variation in coverage along the genome, which will be a substantial source of additional noise. Failure to acknowledge this noise across loci will greatly reduce the power of tests of environmental correlations.

#### Analysis of empirical data from Atlantic herring:

Finally, we applied Bayenv2.0 to the data of Lamichhaney *et al.* (2012), which consist of pooled sequence for eight Atlantic herring populations sampled along a salinity gradient, and identified outliers for differentiation using *F*_{ST} and *P*-values of a *χ*^{2}-test on the read counts. For their *F*_{ST} analysis, Lamichhaney *et al.* (2012) selected a subset of their data set consisting of the 36,794 SNPs (of 440,817 SNPs in total) with at least 40 reads in each population; this sampling cutoff was implemented to reduce the impact of sampling noise on *F*_{ST}. While this is a sensible approach, it does discard data. Approaches, like those in Bayenv2.0, that acknowledge sampling should make better use of these data, as they can identify candidate loci in regions with lower coverage while avoiding false positives due to sampling noise. We calculated *Z* for the salinity gradient on the 36,794 SNPs (with coverage >40×) and for a random subset of 100,000 SNPs chosen regardless of their coverage. In addition, we calculated *F*_{ST} and *P*-values of a *χ*^{2}-test on the read counts for these data [to compare to the original study (Lamichhaney *et al.* 2012)]. To demonstrate how these three tests treat sites with different coverages, we show the mean coverage of sites in the upper 10–0.5% tail in Figure 7A. SNPs in the tail of *F*_{ST} show a strong bias toward loci with lower coverages, which in turn implies that many of these loci are likely false positives due to sampling error being mistaken for extreme population differentiation. On the other hand, SNPs in the tails of the *χ*^{2}-test’s *P*-value are enriched for high-coverage positions, reflecting the fact that this significance of the *χ*^{2}-test will be greater for sites with higher coverage, even if those sites do not have particularly high levels of differentiation. Using our *Z* statistic, which accounts for the sampling error while looking for differentiation along an environmental gradient, we find that our outliers seem much less biased toward either extremely high or extremely low coverage. This observation, along with our higher power to detect environmental correlations than differentiation-based approaches (Figure 3 and De Mita *et al.* 2013), suggests that our *Z* statistic is potentially detecting more true signals of loci that are strongly differentiated along the salinity gradient while the number of false positives is reduced.

Among the contigs in which the top 200 SNPs for our *Z* statistic were found, only 8 were not named as candidates by Lamichhaney *et al.* (2012), suggesting good agreement with their extreme outliers. More generally, we find >70% of our top 1% *Z* SNPs among the top 5% of *F*_{ST} and *χ*^{2} (Figure 7B). This enrichment increases with coverage, reassuringly suggesting that the approaches are in stronger agreement about outliers in regions with high coverage. The overlap is not perfect; the tails of the differentiation statistics should also include SNPs involved in adaptation to other environmental variables and outliers caused by selection in single populations. Due to the current poor annotation of herring data available in public databases, we refrain from further speculation about the biological relevance of our top hits.

## Discussion

In this article we have presented a method to more robustly identify loci where allele frequencies correlate with environmental variables. We have also described a method to detect loci that are outliers with respect to genome-wide population structure, while accounting for the differential relatedness across populations.

Many available tests for selection are designed to detect rapid complete sweeps from new mutations; however, such events are likely just a small percentage of adaptive genetic change (Coop *et al.* 2009; Pritchard *et al.* 2010; Cao *et al.* 2011; Hernandez *et al.* 2011). Analyzing allele frequencies across multiple populations offers the opportunity to detect selection acting on standing variation and polygenic phenotypes. The falling cost of genotyping means that typing individuals from many populations is now in reach, a development that will allow us to connect environmental variables to more subtle adaptive genetic variation. However, we stress that loci detected by the approaches discussed above are obviously at best just candidates for being involved in adaptation to a particular climate variable, or set of climate variables, and so additional evidence is needed to build the adaptive case at any locus.

Our use of the covariance matrix of population allele frequencies when looking for environmental correlations is conceptually similar to linear mixed-model (LMM) approaches that account for kinship structure in genome-wide association studies (GWAS) (*e.g.*, Yu *et al.* 2006; Kang *et al.* 2008, 2010; Zhou and Stephens 2012, who use an observed relatedness matrix as the covariance matrix of the random effect). One important difference is that we seek to predict allele frequencies at a locus using the environmental variable, whereas these LMM methods are predicting a phenotype as a function of genotypes at a locus. In our approach, the equivalent of the random-effect matrix is a proxy for a neutral model of allele frequency variation, while in the application to GWAS the kinship matrix accounts for the confounding due to heritable variation in the phenotype elsewhere in the genome. Our model could be used to detect loci that strongly covaried with population mean phenotypes [*e.g.*, phenotypes measured at the breed level in dogs (Boyko *et al.* 2010)]. However, the method used this way would have a high rate of false positives if there are large environmental effects on the phenotype that coincide with the principal axes of the covariance matrix. Similarly, LMM approaches could be used to identify loci that covaried with environmental gradients, but they may be underpowered, because their random-effects model does not attempt to reflect a model of genetic drift.

### Standardized allele frequencies

We introduced a set of tests based on using our model of the covariance of allele frequencies to produce a set of standardized allele frequencies (*X _{l}*). The calculation of standardized allele frequencies allows us to calculate a variety of statistics while taking advantage of the other features of Bayenv2.0’s approach to account for covariance among populations and sampling noise. The removal of covariance is often a standard step in multivariate analysis; here we remove this covariance structure in a way that acknowledges the approximate form of genetic drift and the bounded nature of allele frequencies. By integrating our statistic across the posterior for

*X*, we are averaging across our uncertainty in allele frequencies, which should further increase our power. Since the

_{l}*X*are estimated using a single sample of Ω from the posterior, they are dependent on the quality of this . With a large number of SNPs used to estimate the covariance matrix, MCMC runs converge to a small set of Ω’s with little variance, so in practice this is not a major concern (for smaller numbers of loci, investigators could average the covariance matrix across the MCMC output).

_{l}As an example of the usefulness of the *X _{l}*, we explored their application in identifying robust correlations with environmental variables. While the use of Spearman’s

*ρ*on these transformed allele frequencies results in a small loss of power, it is much less sensitive to outliers and able to detect any monotonic relationship. Therefore, a combined approach that takes a set of SNPs in the intersection of the tail of Bayes factors and in the tail of Spearman’s

*ρ*on our transformed allele frequencies should provide best results.

Our transformed allele frequencies could also be used to detect and distinguish between the effects of multiple environmental variables shaping variation at a locus. This could be accomplished by including the multiple transformed environmental variables (*Y*′) into a linear model to predict the *X _{l}* at a locus or by applying appropriate transformed ecological niche models (ENM) to the

*X*to understand the predictors of allele frequencies at a locus (see Fournier-Level

_{l}*et al.*2011; Banta

*et al.*2012 for applications of ENMs to allele frequencies). However, there is limited information about the effects of even a single environmental variable from contemporary allele frequencies if neutral allele frequencies are autocorrelated on the same scale as environmental variation (as is the case in humans). Therefore, we caution that in many situations there will be very limited power to learn about the effect of multiple environmental variables.

Using our *X _{l}* statistics, we also introduced a method to identify loci that are outliers from the general pattern of population structure (our

*X*statistic). This statistic is closely related to

^{T}X*F*

_{ST}, which can be expressed as Var(

*p*)/(

_{lj}*ε*(1 −

_{l}*ε*)), where Var(

_{l}*p*) is the variance of our allele frequency across populations (see Nicholson

_{lj}*et al.*2002; Balding 2003; Bonhomme

*et al.*2010, for discussion). Our statistic, which is the variance of

*X*, can be written as Equation 10, and so

_{l}*X*can be seen as closely related to

^{T}X*F*

_{ST}calculated on the standardized allele frequencies. Importantly, by removing the covariance, we reweight populations so that a small change shared across many closely related populations is downweighted. This reweighting therefore strongly increases our power to detect unusual allele frequencies compared to that of global

*F*

_{ST}. The fact that we remove the covariance between closely related populations also means that, unlike in

*F*

_{ST}-based methods, we do not have to arbitrarily clump populations to identify globally differentiated SNPs. While in this article we use the 52 HGDP population labels, in principle Bayenv2.0 could be run, treating each individual as a population, allowing

*X*to be calculated without regard to any population label. However, this would be computationally time-consuming with thousands of individuals. In such cases perhaps the sample frequencies and the Cholesky decomposition of the sample covariance matrix could instead be used to mitigate the computational burden. This could also be done for environmental correlations for large numbers of samples.

^{T}XIdeally our *X ^{T}X* statistic would have a parametric distribution under a general null model in which only drift and migration shaped our frequencies. That might allow us to make statements about what fraction of allele frequency change was due to selection. Indeed, as noted above, if our population frequencies were truly multivariate normal, our

*X*statistic would be

^{T}X*χ*

^{2}-distributed if our sample sizes were sufficiently large. This assumption would be approximately met if our levels of drift were sufficiently small, such that the transition density of allele frequencies was well approximated by a normal (see Price

*et al.*2009; Bhatia

*et al.*2011 for recent empirical applications along these lines). However, when levels of drift are higher, our normal approximation will break down, as demonstrated by the poor fit of the

*χ*

^{2}to the transformed HGDP frequencies. The distribution of our statistic could be obtained by simulation if the population history were known. In practice, we are skeptical that our knowledge of population genetic history will be sufficiently accurate to make this feasible, but simulations may be useful in guiding the setting of approximate significance levels.

### Pooled next-generation sequencing

Recent empirical validations have shown that pooled resequencing of populations is a powerful and cost-efficient way to estimate allele frequencies (Zhu *et al.* 2012; but see Cutler and Jensen 2010). The downside of the saving of costs in library preparation and sequencing is the potential for increased sampling noise in the allele frequency estimates (Futschik and Schlötterer 2010; Zhu *et al.* 2012) and the loss of haplotype information [although some haplotypic information can be recovered (Long *et al.* 2011)]. We account for the sampling of sequencing reads as an additional level of binomial sampling in the model of Bayenv2.0. Our power simulations show that accommodating the extra level of sampling in pooled designs can help to improve the power. However, they also highlight the large, unavoidable loss in power due to increased sampling noise when the depth of coverage is low. The only way that this can be circumvented is through increasing sequencing coverage to provide sufficient certainty in the estimated allele frequencies and, thus, sufficient power to detect environmental correlations. Although low fold sequencing of many populations may help to increase power in some situations, it is likely that for some species (notably humans) sampling, and not sequencing, will be the limiting resource in the future.

Our model of pooled resequencing in Bayenv2.0 implies uniform sampling of reads from each individual. Therefore, we do not account for the possibility of an unequal number of chromosomes per individual due to measurement errors, different DNA content per individual, or differences caused during DNA extraction, all of which might cause additional noise in the allele frequency estimation (Cutler and Jensen 2010; Futschik and Schlötterer 2010). This additional noise, if it is constant across loci, should be absorbed into the covariance matrix in Bayenv2.0, which will result in a reduction in power. However, including a sufficient number of individuals in each pool should mitigate this effect (Zhu *et al.* 2012). Furthermore, our model assumes perfectly called bases, as we do not consider quality scores or sequencing errors. Researchers dealing with NGS data should exercise caution with these issues. However, examining multiple-population pools simultaneously provides some straightforward approaches to minimize error rates in SNP calling, such as calling only SNPs supported by a minimum number of reads in at least one population (Futschik and Schlötterer 2010). Such strategies are already good practice in studies of pooled samples and should be used in combination with the Bayenv model. For the application to individual-based NGS data, further possible extensions of our model include acknowledgment of sequencing errors and a probabilistic approach to genotype calling (see Nielsen *et al.* 2011, for a discussion on SNP calling from NGS data).

### Outlook

The population genomic comparison of closely related populations that differ strongly in environmental variables has already yielded many great candidate loci [see, for example, altitude adaptation in Tibetans (Beall *et al.* 2010; Simonson *et al.* 2010; Yi *et al.* 2010)]. The methods developed here and elsewhere are part of realizing the power of these population comparisons. Such empirical studies also highlight the current deficiencies of such methods, as some of the best signals in these studies are not shared across populations with broadly similar environments and instead indicate that adaptation has occurred through independent mutations in the same gene or pathway. For example, high-altitude adaptation seems to have a different genetic basis in highland Ethiopian and Andean populations (Bigham *et al.* 2010; Scheinfeldt *et al.* 2012). Methods based on environmental correlations will fail to detect such cases, unless the data are split into the appropriate geographic subsets (*e.g.*, Hancock *et al.* 2011c) on an appropriate geographic scale (Ralph and Coop 2010). While shared standing variation will surely be part of the adaptive response across geographically separated instances of similar environments, ideally we would have methods that could cluster signals at the level of the gene or pathway to allow putative cases of parallel adaptation to be identified (Daub *et al.* 2013). The development of such techniques poses an important challenge for future method development.

## Acknowledgments

We thank Jeremy Berg, Gideon Bradburd, Yaniv Brandvain, Fabian Freund, Chuck Langley, Joe Pickrell, Jonathan Pritchard, Peter Ralph, Jeffrey Ross-Ibarra, Karl Schmid, and Alisa Sedghifar for helpful discussions and comments on earlier versions of the manuscript. We acknowledge the various comments we received from readers of the preprint of the manuscript available on the arXiv. The *X ^{T}X* estimates for the HGDP data were kindly made available through the HGDP selection browser (hgdp.uchicago.edu/) by Joe Pickrell. We also thank Alvaro Martinez Barrio and Leif Andersson for sharing the herring data. T.G. was supported by the German Federal Ministry for Education and Research (Synbreed, 0315528D) and by a Volkswagen Foundation scholarship (I/84225), affording him a visit to the University of California, Davis. G.C. was supported by a Sloan Foundation fellowship.

## Footnotes

*Communicating editor: M. A. Beaumont*

- Received April 23, 2013.
- Accepted June 17, 2013.

- Copyright © 2013 by the Genetics Society of America