## Abstract

In population genomics studies, accounting for the neutral covariance structure across population allele frequencies is critical to improve the robustness of genome-wide scan approaches. Elaborating on the BayEnv model, this study investigates several modeling extensions (i) to improve the estimation accuracy of the population covariance matrix and all the related measures, (ii) to identify significantly overly differentiated SNPs based on a calibration procedure of the XtX statistics, and (iii) to consider alternative covariate models for analyses of association with population-specific covariables. In particular, the auxiliary variable model allows one to deal with multiple testing issues and, providing the relative marker positions are available, to capture some linkage disequilibrium information. A comprehensive simulation study was carried out to evaluate the performances of these different models. Also, when compared in terms of power, robustness, and computational efficiency to five other state-of-the-art genome-scan methods (BayEnv2, BayScEnv, BayScan, flk, and lfmm), the proposed approaches proved highly effective. For illustration purposes, genotyping data on 18 French cattle breeds were analyzed, leading to the identification of 13 strong signatures of selection. Among these, four (surrounding the KITLG, KIT, EDN3, and ALB genes) contained SNPs strongly associated with the piebald coloration pattern while a fifth (surrounding PLAG1) could be associated to morphological differences across the populations. Finally, analysis of Pool-Seq data from 12 populations of *Littorina saxatilis* living in two different ecotypes illustrates how the proposed framework might help in addressing relevant ecological issues in nonmodel species. Overall, the proposed methods define a robust Bayesian framework to characterize adaptive genetic differentiation across populations. The BayPass program implementing the different models is available at http://www1.montpellier.inra.fr/CBGP/software/baypass/.

CONTRASTING patterns of local genetic variation over the whole genome represent a valuable strategy to identify loci underlying the response to adaptive constraints (Cavalli-Sforza 1966). As further noted by Lewontin and Krakauer (1973, p. 176), “while natural selection will operate differently for each locus and each allele at a locus, the effect of breeding structure is uniform over all loci and all alleles.” Hence, genome-scan approaches to detect footprints of selection aim at discriminating among the global effects of the demographic evolutionary forces (*e.g.*, gene flow, inbreeding, and genetic drift) from the local effect of selection (Balding and Nichols 1995; Vitalis *et al.* 2001). In practice, applications of these methods have long been hindered by technical difficulties in assessing patterns of genetic variation on a whole-genome scale. However, the advent of next-generation sequencing and genotyping molecular technologies now allows researchers to provide a detailed picture of the structuring of genetic variation across populations in both model and nonmodel species (Davey *et al.* 2011). As a result, in the population genomics era, a wide range of approaches have been developed and applied to detect selective sweeps using population data (see Oleksyk *et al.* 2010 and Vitti *et al.* 2013, for reviews). Among these, population differentiation ()-based methods still remain among the most popular, particularly in nonmodel species since they do not require accurate genomic resources (*e.g.*, physical or linkage maps) and experimental designs with only a few tens of genotyped individuals per population are generally informative enough. Also, -based methods are well suited to the analysis of data from Pool-Seq experiments that consist of sequencing pools of individual DNAs (Schlötterer *et al.* 2014) and provide cost-effective alternatives to facilitate and even improve allele frequency estimation at genome-wide markers (Gautier *et al.* 2013).

In practice, assuming the vast majority of the genotyped markers behave neutrally, overly differentiated loci that are presumably subjected to selection might simply be identified from the extreme tail of the empirical distribution of the locus-specific (Akey *et al.* 2002; Weir *et al.* 2005; Flori *et al.* 2009). Even if such a model-free strategy does not rely on any arbitrary assumptions about the (unknown) demographic history of the sampled populations, it prevents one from controlling for false positive (and negative) signals. Conversely, model-based approaches have also been developed and are basically conceived as locus-specific tests of departure from expectation under neutral demographic models (*e.g.*, Gautier *et al.* 2010a). These include, for instance, demographic models under pure drift (Nicholson *et al.* 2002; Gautier *et al.* 2010a) and at migration–drift equilibrium without (Beaumont and Balding 2004; Foll and Gaggiotti 2008; Riebler *et al.* 2008; Guo *et al.* 2009) or with selection (Vitalis *et al.* 2014). Although robust, to some extent, to more complex history (Beaumont and Nichols 1996; Beaumont 2005), these methods remain limited by the oversimplification of the underlying demographic models. In particular, hierarchically structured population histories, produced under tree-shaped phylogenies, have been shown to increase false positive rates (Excoffier *et al.* 2009). To cope with these issues, two kinds of modeling extensions have recently been explored. They either rely on hierarchical island models, thus requiring a prior definition of the sampled population relationships (Gompert *et al.* 2010; Foll *et al.* 2014), or consist of estimating the correlation structure of allele frequencies across the populations that originates from their shared history (Bonhomme *et al.* 2010; Coop *et al.* 2010; Günther and Coop 2013).

Whatever the method used, the main limitation of the indirect genome-scan approaches ultimately resides in the biological interpretation of the footprints of selection identified, *i.e.*, to which adaptive constraints the outlier loci are responding. In species with functionally annotated reference genomes, the characterization of cofunctional relationships among the genes localized within regions under selection might help in gaining insights into the underlying driving physiological pathways (*e.g.*, Flori *et al.* 2009). Although, following a “reverse ecology” approach (Li *et al.* 2008), they may further lead to the definition of candidate adaptive traits for validation studies, such interpretations remain prone to misleading storytelling issues (Pavlidis *et al.* 2012). Alternatively, prior knowledge about some characteristics discriminating the populations under study could provide valuable insights. Focusing on environmental gradients, several approaches have recently been proposed to evaluate association of ecological variables with marker genetic differentiation by extending -based models (Joost *et al.* 2007; Hancock *et al.* 2008, 2011; Coop *et al.* 2010; Poncet *et al.* 2010; Frichot *et al.* 2013; Günther and Coop 2013; Guillot *et al.* 2014; de Villemereuil and Gaggiotti 2015). The rationale is that environmental variables distinguishing the differentiated populations should be associated with allele frequencies differences at loci subjected to the selective constraints they impose (Coop *et al.* 2010). In principle, such population-based association studies may also be more broadly relevant to any quantitative or categorical population-specific covariable. More generally, as for the covariable-free genome-scan approaches, accounting for the neutral correlation of allele frequencies across populations is critical for these methods (De Mita *et al.* 2013; de Villemereuil *et al.* 2014).

Overall, the Bayesian hierarchical model proposed by Coop *et al.* (2010) and implemented in the BayEnv2 software represents a flexible framework to address these issues. It indeed allows one to both identify outlier loci (Günther and Coop 2013) and further annotate the resulting footprints of selection by quantifying their association with population-specific covariables (if available). A key parameter of the model is the (scaled) population covariance matrix across population allele frequencies. Although this matrix might be viewed as purely instrumental, it explicitly incorporates their neutral correlation structure and is in turn highly informative for demographic inference purposes (Pickrell and Pritchard 2012; Lipson *et al.* 2013). Elaborating on the BayEnv model (Coop *et al.* 2010; Günther and Coop 2013), the purpose of this article is threefold. First, we introduce modeling modifications and extensions to improve the estimation accuracy of the population covariance matrix and the different related measures. Second, we propose a posterior checking procedure to identify markers subjected to adaptive differentiation based on a calibration of the XtX statistics (Günther and Coop 2013). Third, we investigate alternative modeling strategies and decision criteria to perform association studies with population-specific covariables. In particular, we introduce a model with a binary auxiliary variable to classify each locus as associated or not. Through the prior distribution on this latter variable, the approach deals with the problem of multiple testing (*e.g.*, Riebler *et al.* 2008). In addition, if information about marker positions is available, this modeling also allows us to account for linkage disequilibrium (LD) between markers via an Ising prior. As a by-product of this study, a user-friendly and freely available program, named BayPass (for Bayesian population association analysis), was developed to implement inferences under the different models. To evaluate the accuracy of the methods, we further carried out comprehensive simulation studies. In addition, two real data sets were analyzed in more detail to illustrate the range of application of the methods. The first one consists of 453 individuals from 18 French cattle breeds genotyped at 42,056 SNPs (Gautier *et al.* 2010b) and the second one consists of Pool-Seq data on 12 *Littorina saxatilis* populations from three distinct geographical regions and living in two different ecotypes (Westram *et al.* 2014).

## Models

In the following we describe the different Bayesian hierarchical models considered in this study and implemented in the BayPass program. Consider a sample made of *J* populations (sharing a common history) with a label, *j*, which varies from 1 to *J*. The data consist of *I* SNP loci, which are biallelic markers with the reference allele arbitrarily defined (*e.g.*, by randomly drawing the ancestral or the derived state). Let be the total number of genes sampled at the *i*th locus () in the *j*th population (), that is, twice the number of genotyped individuals in a diploid population. Let be the count of the reference allele at the *i*th locus in the *j*th sampled population. When considering allele count data, the ’s (and the ’s) are the observations while for Pool-Seq data, read count are observed instead. In this case, the ’s correspond for all the markers within a given pool to its haploid sample size (*i.e.*, twice the number of pooled individuals for diploid species). Let further be the (observed) total number of reads and be the (observed) number of reads with the reference allele. For Pool-Seq data, to integrate over the unobserved allele count, the conditional distribution of the given and the (unknown) is assumed binomial (Gautier *et al.* 2013; Günther and Coop 2013):

Assuming Hardy–Weinberg equilibrium, the conditional distribution of given and the (unknown) allele frequency is also assumed binomial: (1)Note that this corresponds to the first level (likelihood) of the hierarchical model when dealing with allele count data and to the second level (prior) for Pool-Seq data. As previously proposed and discussed (Nicholson *et al.* 2002; Coop *et al.* 2010; Gautier *et al.* 2010a), for each SNP *i* and population *j* an instrumental variable taking value on the real line is further introduced such that As represented in Figure 1, three different subclasses of models are considered (each with their allele and read counts version). They are hereafter referred to as (i) the core model (Figure 1A), (ii) the standard covariate (STD) model (Figure 1B), and (iii) the auxiliary variable covariate (AUX) model (Figure 1C). Note that the core model is nested within the STD model, which is itself nested within the AUX model.

### The core model

The core model (Figure 1A) is a multivariate generalization of the model by Nicholson *et al.* (2002) that was first proposed by Coop *et al.* (2010). For each SNP *i*, the prior distribution of the vector is multivariate Gaussian, (2)where is an all-one vector of length *J*, the precision matrix is the inverse of the (scaled) covariance matrix () of the population allele frequencies, and is the weighted mean reference allele frequency that might be interpreted as the ancestral population allele frequency (Coop *et al.* 2010; Pickrell and Pritchard 2012). The are assumed *β*-distributed: (3)In such models, the parameters and are frequently fixed. For instance, in BayEnv2 (Coop *et al.* 2010), leading to a uniform prior on over the (0, 1) support. However, these parameters may be easily estimated from the model by specifying a prior distribution on the mean and the so-called “sample size” (Kruschke 2014). Hence, a uniform and an exponential prior distribution are respectively considered for these two parameters, (4)and (5)Finally, a Wishart prior distribution is assumed for the precision matrix (6)*i.e.*, ( being the identity matrix of size *J*). For this is strictly equivalent to the parametrization introduced in Coop *et al.* (2010) who eventually came to fix Here, weaker informative priors are also explored with (*e.g.*, Gelman *et al.* 2003, p. 581), leading to so-called singular Wishart distributions. As will become apparent, appears as the best default choice. Note, however, that inspection of the full conditional distribution of (see Supporting Information, File S1) suggests the influence of the prior might become negligible with increasing number of SNPs *I* and populations *J*.

### The STD model

The STD model represented in Figure 1B extends the core model as Coop *et al.* (2010) proposed and allows us to evaluate association of SNP allele frequencies with a population-specific covariable vector Note that is a (preferably scaled) vector of length *J* containing for each population the measures of interest. Under the STD model,the prior distribution of the vector is multivariate Gaussian for each SNP *i*: (7)The prior distribution for the correlation coefficients () is assumed uniform: (8)Unless stated otherwise, and instead of and as in Coop *et al.* (2010).

### The AUX model

The AUX model represented in Figure 1C is an extension of the STD model that consists of attaching to each locus regression coefficient a Bayesian (binary) auxiliary variable In a similar population genetics context, this modeling was also proposed by Riebler *et al.* (2008) to identify markers subjected to selection in a genome-wide scan for adaptive differentiation (under a -model). In the AUX model, the auxiliary variable actually indicates whether a specific SNP *i* can be regarded as associated with the covariable vector () or not (). As a consequence, the posterior mean of may directly be interpreted as a posterior probability of association of the SNP *i* with the covariable, from which a Bayes factor (BF) is straightforward to derive (Gautier *et al.* 2009). Under the AUX model, the prior distribution of the vector is multivariate Gaussian for each SNP *i*: (9)Assuming information about marker positions is available, the ’s auxiliary variables also make it easy to introduce spatial dependency among markers. In the context of high-throughput genotyping data, SNPs associated to a given covariable might indeed cluster in the genome due to LD with the underlying (possibly not genotyped) causal polymorphism(s). To learn from such positional information, the prior distribution of the vector of SNP auxiliary variables, takes the general form of a 1D Ising model with a parameterization inspired from Duforet-Frebourg *et al.* (2014), (10)where (respectively ) are the numbers of SNPs associated (respectively not associated) with the covariable, and is the number of pairs of consecutive markers (neighbors) that are in the same state at the auxiliary variable (*i.e.*, ). The parameter *P* corresponds to the proportion of SNPs associated to the covariable and is assumed β-distributed: (11)Unless stated otherwise, and This amounts to assuming *a priori* that only a small fraction of the SNPs ( = 1%) are associated to the covariable, but within a reasonably large range of possible values (*e.g.*, *a priori*). Importantly, integrating over the uncertainty on the key parameter *P* allows us to deal with multiple-testing issues.

Finally, the parameter called the inverse temperature in the Ising (and Potts) model literature, determines the level of spatial homogeneity of the auxiliary variables between neighbors. When the relative marker position is ignored (no spatial dependency among markers). This is thus equivalent to assuming a Bernoulli prior for the ’s: as in Riebler *et al.* (2008). Conversely, leads us to assume that the ’s with similar values tend to cluster in the genome (the higher the is, the higher the level of spatial homogeneity). In practice, is commonly used and values of are recommended. Note that the overall parametrization of the Ising prior assumes no external field and no weight (as in the so-called compound Ising model) between the neighboring auxiliary variables. In other words, the information about the distances between SNPs is therefore not accounted for and only the relative positions of markers are considered. Hence, marker spacing is assumed homogeneous.

## Materials and Methods

### Markov chain Monte Carlo sampler

To explore the different models and estimate the full posterior distribution of the underlying parameters, a Metropolis–Hastings within Gibbs Markov chain Monte Carlo (MCMC) algorithm was developed (see File S1 for a detailed description) and implemented in a program called BayPass. The software package containing the Fortran 90 source code, a detailed documentation, and several example files is freely available for download at http://www1.montpellier.inra.fr/CBGP/software/baypass/. Unless otherwise stated, a MCMC chain first consists of 20 pilot runs of 1000 iterations each, allowing us to adjust proposal distributions (for Metropolis and Metropolis–Hastings updates) with targeted acceptance rates lying between 0.2 and 0.4 to achieve good convergence properties (Gilks *et al.* 1996). Then MCMC chains are run for 25,000 iterations after a 5000-iterations burn-in period. Samples are taken from the chain every 25 post-burn-in iterations to reduce autocorrelations, using a so-called thinning procedure. To validate the BayPass sampler, an independent implementation of the core model was coded in the bugs language and run in the openbugs software (Thomas *et al.* 2009) as detailed in File S2. Analyses of some (small) test data sets using both implementations gave consistent results (data not shown).

Finally, as a matter of comparison, in the analysis of prior sensitivity in estimation, the BayEnv2 (Günther and Coop 2013) software was also used with default options except the total number of iterations was set to 50,000.

### Estimation and visualization of

For BayPass analyses, point estimates of each element of consisted of their corresponding posterior means computed over the sampled matrices. For BayEnv2 analyses, the first 10 sampled matrices were discarded and only the 90 remaining sampled ones were retained. As a matter of comparison, the frequentist estimate of as proposed by Bonhomme *et al.* (2010) and implemented in the flk package was also considered. Briefly, the flk estimator of the covariance matrix relies on a neighbor-joining algorithm on the Reynolds pairwise population distances matrix to build a population tree from which the covariance matrix is deduced (after midpoint rooting of the tree).

For visualization purposes, a given estimate was transformed into a correlation matrix with elements using the cov2cor() R function (R Core Team 2015). The graphical display of this correlation matrix was done with the corrplot() function from the R package *corrplot* (Wei 2013). In addition, hierarchical clustering of the underlying populations was performed using the hclust() R function, considering as a dissimilarity measure between each pair of populations *i* and *j*. The resulting bifurcating tree was plotted with the plot.phylo() function from the R package *ape* (Paradis *et al.* 2004). Note that the latter representation reduces the correlation matrix into a block-diagonal matrix, thus ignoring gene flow and admixture events.

### Computation of the metric to compare matrices

The metric proposed by Förstner and Moonen (2003) for covariance matrices and hereafter referred to as the FMD (for Forstner and Moonen Distance) was used to compare the different estimates of and to assess estimation precision and robustness in the prior sensitivity analysis. Let and be two (symmetric positive definite) covariance matrices with rank *J*; the FMD distance is defined as (12)where represents the *j*th generalized eigenvalue of the matrices and that were all computed with the R package *geigen* (Hasselman 2015).

### Computation and calibration of the XtX statistic

Identification of SNPs subjected to adaptive differentiation relied on the XtX differentiation measure introduced by Günther and Coop (2013). This statistic might be viewed as a SNP-specific explicitly corrected for the scaled covariance of population allele frequencies. For each SNP *i*, XtX was estimated from the *T* MCMC (post-burn-in and thinned) parameters sampled values, and as (13)To provide a decision criterion for discriminating between neutral and selected markers, *i.e.*, to identify outlying XtX, we estimated the posterior predictive distribution of this statistic under the null (core) model by analyzing pseudo-observed data sets (POD). PODs are produced by sampling new observations (either allele or read count data) from the core inference model with (hyper)parameters and (the most distal nodes in the Directed Acyclic Graph of Figure 1) fixed to their respective posterior means obtained from the analysis of the original data. The sample characteristics are preserved by sampling randomly (with replacement) SNP vectors of ’s (for allele count data) or ’s (for read count data) among the observed ones. For Pool-Seq data, haploid sample sizes are set to the observed ones. The R (R Core Team 2015) function simulate.baypass() available in the BayPass software package was developed to carry out these simulations. The POD is further analyzed using the same MCMC parameters (number and length of pilot runs, burn-in, chain length, etc.) as for the analysis of the original data set. The XtX values computed for each simulated locus are then combined to obtain an empirical distribution. The quantiles of this empirical distribution are computed and are used to calibrate the XtX observed for each locus in the original data: *e.g.*, the 99% quantile of the XtX distribution from the POD analysis provides a 1% threshold XtX value, which is then used as a decision criterion to discriminate between selection and neutrality. Note that this calibration procedure is similar to the one used in Vitalis *et al.* (2014) for the calibration of their SNP

### Population association tests and decision rules

Association of SNPs with population-specific covariables is assessed using BFs or what may be called “empirical Bayesian *P*-values” (eBP). Briefly, for a given SNP, BF compares models with and without association while eBP is aimed at measuring to which extent the posterior distribution of the regression coefficient excludes 0. Note that eBPs are not expected to display the same frequentist properties as classical *P*-values.

Two different approaches were considered to compute BFs. The first estimate (hereafter referred to as ) relies on the importance sampling algorithm proposed by Coop *et al.* (2010) and uses MCMC samples obtained under the core model (see File S3 for a detailed description). The second estimate (hereafter referred to as ) is obtained from the posterior mean of the auxiliary variable under the AUX model, (14)where is the (estimated) posterior odds that the locus *i* is associated to the covariable and is the corresponding prior odds (Gautier *et al.* 2009). Hereby, is derived for the AUX model only with (the prior odds being challenging to compute when ). In practice, to account for the finite number T of MCMC sampled values, is set equal to [respectively, ] when the posterior mean of the = 1 (or 0, respectively). Note that, through the prior on *P*, the computation of explicitly accounts for multiple-testing issues. BFs are generally expressed in deciban (dB) units [via the transformation ]. Jeffreys’ rule (Jeffreys 1961) provides a useful decision criterion to quantify the strength of evidence (here in favor of association of the SNP with the covariable), using the following dB unit scale: “strong evidence” when 10 BF15, “very strong evidence” when 15BF20, and “decisive evidence” when BF20.

For the computation of eBPs, the posterior distribution of each SNP was approximated as a Gaussian distribution where and are the estimated posterior mean and standard deviation of the corresponding The eBPs are further defined as (15)where is the cumulative distribution function of the standard normal distribution. Roughly speaking, a value of *β* might be viewed as “significantly” different from 0 at a level of . Two different approaches were considered to estimate the moments of the posterior distribution of the ’s. The first one, detailed in File S3, relies on an importance sampling algorithm similar to the one mentioned above and thus uses MCMC samples obtained under the core model. The resulting eBP estimates are hereafter referred to as The second approach relies on posterior samples of the MCMC obtained under the STD model. The resulting eBP estimates are hereafter referred to as

Note finally, that for estimating (under the AUX model) and (under the STD model), the value of was fixed to its posterior mean as obtained from an initial analysis carried out under the core model.

### Simulation study

#### Simulation under the inference model:

Simulations under the core or the STD inference models defined above (Figure 1) were carried out using the function simulate.baypass() available in the BayPass software package. Briefly, a simulated data set is specified by the -matrix, the parameters of the *β*-distribution for the ancestral allele frequencies ( and ), and the sample sizes. As a matter of expedience, ancestral allele frequencies <0.01 (respectively above 0.99) were set to 0.01 (respectively 0.99) and markers that were not polymorphic in the resulting simulated data set were discarded from further analyses. For the generation of PODs (see above), the ’s (or the ’s for Pool-Seq data) were sampled (with replacement) from the observed ones and for the power analyses, these were fixed to for all the populations. To simulate under the STD model, the simulated ’s (SNP regression coefficients) were specified and the population covariable vector was simply taken from the standard normal cumulative distribution function such that for the *j*th population (of the *J* ones).

#### Individual-based simulations:

Individual-based forward-in-time simulations under more realistic scenarios were carried out under the Simupop environment (Peng and Kimmel 2005) as described in de Villemereuil *et al.* (2014). Briefly, three scenarios corresponding to (i) a highly structured isolation with migration (HsIMM-C) model, (ii) an isolation with migration (IMM) model, and (iii) a stepping-stone (SS) scenario were investigated. For each scenario, one data set consisted of 320 individuals belonging to 16 different populations that were genotyped for 5000 SNPs regularly spread along 10 chromosomes of 1 M length. Polygenic selection acting on an environmental gradient (see de Villemereuil *et al.* 2014 for more details) was included in the simulation model by choosing 50 randomly distributed SNPs (among the 5000 simulated ones) and assigning them a selection coefficient calculated as a logistic transformation of the corresponding population-specific environmental variable following (with and ). For each individual, the overall fitness was finally derived from their genotypes, using a multiplicative fitness function.

To assess the performance of the AUX model in capturing information from SNP spatial dependency, data sets displaying stronger LD were generated under HsIMM-C (the least favorable scenario, see *Results*) by slightly modifying the corresponding script available from de Villemereuil *et al.* (2014). The resulting HsIMMld-C (for HsIMM-C with LD) data sets each consisted of 5000 SNPs spread on five smaller chromosomes of 4 cM (leading to a SNP density of ∼1 SNP every 4 kb, assuming 1 cM 1 Mb). In the middle of the third chromosome, a locus with a strong effect on individual fitness was defined by two consecutive SNPs strongly associated with the environmental covariable (such that and in the computation of as defined above). Note that for all the individual-based simulations described in this section, SNPs were assumed in complete linkage equilibrium in the first generation.

#### Comparison with different genome-scan methods:

In addition to analyses under the models implemented in BayPass (see above), the HsIMM-C, IMM, and SS individual-based simulated data sets were analyzed with five other popular or recently developed genome-scan approaches. First, these include BayeScan (Foll and Gaggiotti 2008), which is a Bayesian covariate-free approach that identifies overly differentiated markers (with respect to expectation under a migration–drift equilibrium demographic model) via a logistic regression of the population-by-locus on a locus-specific and population-specific effect. The decision criterion was based on a Bayes factor that quantifies the support in favor of a nonnull locus effect. Second, the recently developed BayScenv (de Villemereuil and Gaggiotti 2015) model was also used. It is conceived as an extension of BayeScan, incorporating environmental information by including a locus-specific regression coefficient parameter (noted *g*) in the above-mentioned logistic regression. The decision criterion to assess association with the covariate was based on the estimated posterior probability of *g* being nonnull. In practice, to limit computation burden for both BayeScan (version 2.1) and BayScenv, default MCMC parameter options of the programs were chosen except for the length of the pilot runs (set to 1000), the length of the burn-in period (set to 10,000), and the number of sampled values (set to 2500). A third and covariate-free approach consisted of computing the flk statistics (which might be viewed as the frequentist counterpart of the described above) as described in Bonhomme *et al.* (2010). The fourth considered method relied on latent factor mixed models (LFMM) as implemented in the lfmm (version 1.4) software (Frichot *et al.* 2013) to detect association of allele frequencies differences with population-specific covariables while accounting for population structure via the so-called latent factors. Following de Villemereuil *et al.* (2014), who analyzed the same data sets, the prior number of latent factors required by the program was set to Note also that LFMM analyses were run on individual genotyping data rather than population allele frequencies, which were previously shown to display better performance (de Villemereuil *et al.* 2014). For each data set, the decision criterion to assess association of the SNP with the environmental covariable relied on a *P*-value that was either computed based on a single analysis (denoted as lfmm) or derived after combining *Z* scores from 10 independent analyses (denoted as lfmm-10rep) following the procedure described in the lfmm (version 1.4) manual. Finally, the data sets were also analyzed with BayEnv2 (Coop *et al.* 2010), following a two-step procedure (as required by the program) that was similar to the one performed by de Villemereuil *et al.* (2014). For each data set, a first MCMC of 15,000 iterations was run under default parameter settings and the last sampled covariance matrix was used as an estimate of For each SNP in turn, an MCMC of 30,000 iterations was further run to estimate the corresponding and BF based on this latter matrix. To facilitate automation of the whole procedure, a custom shell script was developed.

Each analysis was run on a single node of the same computer cluster to provide a fair comparison of computation times. To further compare the performances of the different models, the actual (i) true positive rates (TPR) or power, *i.e.*, the proportion of true positives among the truly selected loci; (ii) false positive rates (FPR), *i.e.*, the proportion of false positives among the nonselected loci; and (iii) false discovery rates (FDR), *i.e.*, the proportion of false positives among the significant loci, were computed from the analysis of each data set with the different methods for various thresholds covering the range of values of the corresponding decision criterion. From these estimates, both standard receiver operating curves (ROC) plotting TPR against FPR and precision-recall (PR) curves plotting (1-FDR) against TPR could then be drawn.

### Real data sets

#### The HSA_{snp} data set:

This data set is the same as in Coop *et al.* (2010) and was downloaded from the BayEnv2 software Web page (http://gcbias.org/bayenv/). It consists of genotypes at 2333 SNPs for 927 individuals from 52 human populations of the Human Genome Diversity Project (HGDP) panel (Conrad *et al.* 2006).

#### The BTA_{snp} data set:

This data set is a subset of the data from Gautier *et al.* (2010b) and consists of 453 individuals from 18 French cattle breeds (from 18 to 46 individuals per breed) genotyped for 42,046 autosomal SNPs displaying an overall minor allele frequency (MAF) >0.01. As detailed in File S4, two breed-specific covariables were considered for association analyses. The first covariable corresponds to a synthetic morphology score (SMS) defined as the (scaled) first principal component of breed average weights and wither heights for both males and females (taken from the French BRG Web site: http://www.brg.prd.fr/). The second covariable is related to coat color and corresponds to the piebald coloration pattern of the different breeds that was coded as 1 for pied breed (*e.g.*, Holstein breed) and for breeds with a uniform coloration pattern (*e.g.*, Tarine breed).

#### The LSA_{ps} data set:

This data set was obtained from whole transcriptomes of pooled *L. saxatilis* (LSA) individuals belonging to 12 different populations (Westram *et al.* 2014). These populations originate from three distinct geographical regions (UK, the United Kingdom; SP, Spain; and SW, Sweden) and lived in two different ecotypes corresponding to the so-called “wave” habitat (subjected to wave action) and “crab” habitat (*i.e.*, subjected to crab predation). The *mpileup* file with the aligned RNA-seq reads from the 12 pools (three countries two ecotypes two replicates) onto the draft LSA genome assembly was downloaded from the Dryad Digital Repository, doi: 10.5061/dryad.21pf0 (Westram *et al.* 2014). The *mpileup* file was further processed using a custom awk script to perform SNP calling and derive read counts for each alternative base (after discarding bases with a Base Alignment Quality score <25). A position was considered variable if (i) it had a coverage of >20 and <250 reads in each population, (ii) only two different bases were observed across all five pools, and (iii) the minor allele was represented by at least one read in two different pool samples. Note that triallelic positions for which the two most frequent alleles satisfied the above criteria with the third allele represented by only one read were included in the analysis as biallelic SNPs (after filtering the third allele as a sequencing error). The final data set then consisted of allele counts for 53,387 SNPs. As a matter of expedience, the haploid sample size was set to 100 for all the populations because samples consisted of pools of ∼40 females with their embryos (from tens to hundreds per female) (Westram *et al.* 2014). To carry out the population analysis of association with ecotype and identify loci subjected to parallel phenotypic divergence, the habitat is considered a binary covariable, respectively coded 1 for the wave habitat and for the crab habitat.

## Results

### Performance of the core model for estimation of the scaled population covariance matrix

The scaled covariance matrix of population allele frequencies represents the key parameter of the models considered in this study. Evaluating the precision of its estimation is thus crucial. To illustrate how prior parameterization might influence estimation of we first analyzed the BTA_{snp} (with *J* = 18 French cattle populations) and the HSA_{snp} (with *J* = 52 worldwide human populations) data sets, using both BayPass (under the core model represented in Figure 1A with ) and BayEnv2 (in which and according to Coop *et al.* 2010). Note that the sampled populations in these two data sets have similar characteristics in terms of the overall ( and for the cattle and human sampled populations, respectively). The resulting estimated -matrices are hereafter denoted as and respectively, for the cattle data set and are represented in Figure 2. Similarly, for the human data set, the resulting and are represented in Figure S1. For both data sets, the comparisons of the two different estimates of reveal clear differences that suggest in turn some sensitivity of the model to the prior assumption. Analyses under three other alternative BayPass model parameterizations, (i) and (ii) and (iii) and confirmed this intuition (Figure S2). For the human data set, the FMD between the different estimates of varied from 1.73 (BayPass with *vs.* BayPass with and ) to 31.1 (BayPass with *vs.* BayPass with ). However, for the cattle data set that contains about 20 times as many SNPs for 3 times fewer populations, the four BayPass analyses gave consistent estimates (pairwise FMD always <0.5) that clearly depart from the BayEnv2 one (pairwise FMD always >14). Note also that BayPass estimates were in better agreement with the historical and geographic origins of the sampled breeds (see Figure 2 and Gautier *et al.* 2010b for further details).

Overall these contrasting results call for a detailed analysis of the sensitivity of the model to prior specifications on both (*ρ*-value) and the *β*-distribution parameters ( and ), but also to data complexity (number and heterozygosity of SNPs). To that end we first simulated under the core inference model (Figure 1A) data sets for four different scenarios labeled SpsH1, SpsH2, SpsB1, and SpsB2. In SpsH1 and SpsH2 (respectively SpsB1 and SpsB2), the population covariance matrix was set to (respectively ), and in SpsH1 and SpsB1 (respectively SpsH2 and SpsB2) the ’s were sampled from a uniform distribution over (0, 1) [respectively a distribution]. Note that the two different -distributions lead to quite different SNP frequency spectra, the uniform one approaching (ascertained) SNP chip data (*i.e.*, good representation of SNPs with an overall intermediate MAF), while the one is more similar to those obtained in whole-genome sequencing experiments with an overrepresentation of poorly informative SNPs (see, *e.g.*, results obtained on the LSA_{ps} Pool-Seq data below). To assess the influence of the number of genotyped SNPs, data sets consisting of 1000, 5000, 10,000, and 25,000 SNPs were simulated for each scenario. For each set of simulation parameters, 10 independent replicate data sets were generated, leading to a total of 160 simulated data sets (10 replicates 4 scenarios 4 SNP numbers) that were each analyzed with BayEnv2 (Coop *et al.* 2010) and four alternative BayPass model parameterizations: (i) (ii) and (iii) and (iv) and . As a matter of comparison, the flk frequentist estimate (Bonhomme *et al.* 2010) of the covariance matrices was also computed. FMD distances (averaged across replicates) of the resulting estimates from their corresponding true matrices are represented in Figure 3. Note that for a given simulation parameter set, the FMD distances remained quite consistent (under a given model parametrization) across the 10 replicates (Figure S3).

Except for the BayEnv2 and flk analyses, the estimated matrices converged to the true ones as the number of SNPs (and thus the information) increased. In addition, as observed above for real data sets, the BayEnv2 estimates were always quite different from those obtained with BayPass parameterized under the same model assumptions ( npop and ). It should also be noted that reproducing the same simulation study by using the and matrices in the four different scenarios led to similar patterns (Figure S4). Reasons for this behavior of BayEnv2 (possibly the result of some minor implementation issues) were not investigated further and we hereafter concentrate only on results obtained with BayPass.

As expected, the optimal number of SNPs also depends on their heterozygosity. Hence, when the simulated ’s were sampled from a (Figure 3, B and D) instead of a Unif(0,1) distribution, a higher number of SNPs were required (compare Figure 3, A and B, with Figure 3, C and D, respectively) to achieve the same accuracy. Likewise, all else being equal, the estimation precision was found always lower for the SpsH1 (and SpsH2) than SpsB1 (and SpsB2) scenarios. This shows that the optimal number of SNPs is an increasing function of the number of sampled populations. One might also expect that more SNPs are required when population differentiation is lower (although this was not formally tested here). Regarding the sensitivity of the models to the prior definition, the parametrization with clearly outperformed the more informative one (), most particularly for the smaller number of SNPs and more complex data sets. Naturally, estimating the parameters and compared to setting them to had almost no effect in the estimation precision of for the SpsH1 and SpsB1 scenarios, their resulting posterior means being slightly >1 ( due probably to the simulation SNP ascertainment scheme described in *Materials and Methods*). Interestingly, however, a substantial gain in precision was obtained for the SpsH2 and SpsB2 data sets (for which ). Hence, for the SpsB2 data sets (Figure 3D), the FMD curves reached a plateau with the parameterization (for both and ) as the number of SNPs increased whereas precision kept improving when and were estimated.

We finally investigated to which extent estimation of and might improve robustness to SNP ascertainment. To that end, 10 additional independent data sets of 100,000 SNPs were simulated under both the SspH 1 and SspB1 scenarios. For each of the 20 resulting data sets, 6 subsamples were constituted by randomly sampling 25,000 SNPs with an overall MAF >0, >0.01, >0.025, >0.05, >0.075, and >0.10, respectively. The 120 resulting data sets (2 scenarios 10 replicates 6 MAF thresholds) were analyzed with BayPass (assuming ) by either estimating and or setting Although the estimation precision of was found to decrease with increasing MAF thresholds (Figure S5), estimating and allowed us to clearly improve accuracy in these examples. Note, however, that the effect of the ascertainment scheme remained limited, in particular for small MAF thresholds (MAF <0.05).

### Performance of the XtX statistics to detect overly differentiated SNPs

To evaluate the performance of the XtX statistics to identify SNPs subjected to selection, data sets were simulated under the STD inference model (Figure 1B), *i.e.*, with a population-specific covariable. This simulation strategy was mainly adopted to compare covariable-free XtX-based decision (scan for differentiation) with association analyses (based on covariate models) as described in the next section. Obviously, the XtX is a covariable-free statistic that is powerful to identify SNPs subjected to a broader kind of adaptive constraints, as elsewhere demonstrated (Bonhomme *et al.* 2010; Günther and Coop 2013). Hence, two different (demographic) scenarios, labeled SpaH and SpaB, were considered. In the scenario SpaH (respectively SpaB), was set equal to (respectively ), and the ’s were sampled from a uniform distribution. For each scenario, 25,600 SNPs were simulated of which 25,000 are neutral SNPs (*i.e.*, with a regression coefficient ) and 600 are SNPs associated with a normally distributed population-specific covariable (see *Materials and Methods*) and with regression coefficients (*n* = 100), (*n* = 100), (*n* = 100), (*n* = 100), (*n* = 100), and (*n* = 100). For each scenario, 10 independent replicate data sets, each with a randomized population covariable vector, were generated. The resulting 20 simulated data sets (10 replicates 2 scenarios) were then analyzed with four alternative BayPass model parameterizations corresponding to (i) the core model (Figure 1A) with (ii) the core model by setting (iii) the STD model (Figure 1B) by setting and (iv) the default AUX model (Figure 1C), *i.e.*, with and

As expected, under the core model, the higher is, the higher the estimated XtX on average (Figure S6). As a matter of expedience, for power comparisons, 1% POD thresholds were further defined for each analysis, using the XtX distribution obtained for SNPs with simulated Note that the resulting thresholds were very similar to those obtained using independent data sets (*e.g.*, SpsH1 and SpsB1) that led to FPR close to 1%. As shown in Table 1, the power was optimal () for strongly associated SNPs () in both scenarios but remained small () for weakly associated SNPs. In addition, power was always higher with the SpaH than with the SpaB data probably due to a more informative design (three times as many populations). Likewise, estimating (*i.e.*, including information from the associated SNPs) slightly affected the performance of the XtX-based criterion when compared to setting (see Table 1 and also the ROC curve analyses in Figure S7). Yet the resulting estimated matrices were close to the true simulated ones ( across the SpaH and across the SpaB simulated data sets), suggesting in turn that the core model is also robust to the presence of SNPs under selection (at least in moderate proportion). Conversely, a misspecification of the prior investigated here by similarly analyzing the SpaH (respectively SpaB) data sets under the core, the STD, and the AUX models but setting (respectively ), led to an inflation of the XtX estimates (Figure S8). The XtX mean was in particular shifted away from *J* (number of populations) expected under neutrality (see also figure 5 in Günther and Coop 2013). As a consequence, the overall performances of the XtX-based criterion were clearly affected (see Table S1 and ROC in Figure S7).

Interestingly, under both the STD and AUX models, the distribution of the XtX for SNPs associated to the population covariable was similar to the neutral SNP one, whatever the underlying (Figure S6). Accordingly, the corresponding true positive rates were close to the nominal POD threshold in Table 1. This suggests that both covariate models allow us to efficiently correct the XtX estimates for the (“fixed”) covariable effect of the associated SNPs.

### Performance of the models to detect SNPs associated to a population-specific covariable

The performances of the STD and AUX models to identify SNPs associated to a population-specific covariable were further evaluated using results obtained on the SpaH and SpaB data sets (see above). As shown in Figure 4, the importance sampling estimates of the coefficients (computed from parameter values sampled under the core model) were found less accurate than posterior mean estimates obtained from values sampled under the STD or AUX models. For smaller however, the introduction of the auxiliary variable (AUX model) tended to shrink the estimates toward zero in the SpaB data sets probably due, here also, to a less powerful design (three times fewer populations).

Accordingly, the BFs estimated under the AUX model () had more power to identify SNPs associated to the population-specific covariables than the corresponding (Table 2 and Figure S9). Indeed, although constrained by construction to a maximal value (here 53.0 dB) that depends both on the number of MCMC samples (here ) and on the prior expectation of *P* (here 0.01), at the “decisive evidence” threshold of 20 dB (Jeffreys 1961), the TPR for SNPs with a simulated were, for instance, with for the SpaH data compared to with the -based decision criterion (Table 2). For the SpaH data (but not for the SpaB data) a similar trend was observed when comparing decision criteria based on the (relying on the importance sampling algorithm) and the estimated under the STD model (see Table 2 and Figure S10). In addition, Table 2 shows that the intuitive, but still arbitrary, threshold of 3 on the eBP performed worse than the 20-dB threshold on the BF, particularly for the smallest This suggests that a decision criterion rule relying on the may be the most reliable in the context of these models.

We next explored how a misspecification of the prior affected the estimation of the ’s and the different decision criteria. As in the previous section, we considered results obtained for the SpaH (respectively SpaB) data sets with analyses setting (respectively ). Surprisingly, although the importance sampling estimates of the ’s obtained under the core model clearly performed poorer (particularly for the SpaB data), the estimates obtained under the STD and AUX models were not so affected (Figure S11). Nevertheless, if the resulting TPR and FPR were similar to the previous ones for the SpaH data, and for the SpaB data the power to detect associated SNPs strongly decreased with both the and criteria. Conversely, increased FPR were observed with the (up to ) and -based decision criteria (see Table S2 and compare with Table 2). These results thus suggest that the influence of model misspecification, although unpredictable, may be critical for association studies under the STD and AUX covariate models.

### Comparison of the performances of BayPass with other genome-scan methods under realistic scenarios

To compare the performances of the different approaches implemented in BayPass with other popular or recently developed methods, data sets simulated under three realistic scenarios were considered. Following de Villemereuil *et al.* (2014) (see *Materials and Methods*), these correspond to (i) a HsIMM-C model, (ii) an IMM model, and (iii) a SS scenario with polygenic selection acting on an environmental gradient. In total 300 data sets (100 per scenario), each consisting of genotyping data on 5000 SNPs for 320 individuals belonging to 16 different populations, were analyzed with BayPass under the core model (to estimate XtX, and ), the STD model (to estimate and also the XtX corrected for the fixed covariable effect), and the AUX model (to estimate and also the corrected XtX). These data sets were also analyzed with five other programs (see *Materials and Methods*), two of which, namely BayeScan (Foll and Gaggiotti 2008) and flk (Bonhomme *et al.* 2010), implemented (only) covariate-free approaches, and the three others, namely BayEnv2 (Coop *et al.* 2010), lfmm (Frichot *et al.* 2013), and BayScenv (de Villemereuil and Gaggiotti 2015), allowed us to test for association with a population-specific covariable.

For each scenario, average ROC and PR curves resulting from the analyses of the 100 simulated data sets are plotted for the different methods (and decision criteria) in Figure 5. In addition, area under the ROC curve (AUC) together with averaged computation times are detailed in Table 3. In agreement with previous studies (*e.g.*, de Villemereuil *et al.* 2014), under such complex scenarios with polygenic selection, the association-based methods clearly outperformed covariable-free approaches (BayeScan, flk, and XtX-based criterion). For the latter, however, the BayPass XtX (estimated under the core model) always performed better than BayeScan and flk in all three scenarios. Surprisingly, for the HsIMM-C and the SS scenarios, the BayEnv2 XtX-based criterion led to higher AUC than its BayPass counterpart with a value close to that of the BayEnv2 BF association test (Table 3).

Among the association-based methods, BayPass was found to display similar performances (using and ) to LFMM-10rep, being even slightly better than single-run LFMM analyses for the IMM and SS scenarios. Both methods outperformed BayeScenv and Bayenv2 in all scenarios (except the SS scenario for the latter). It should be noted that LFMM-10rep analyses were based on individual genotyping data (and a balanced design) that represent the most favorable situation (de Villemereuil *et al.* 2014). The criterion displayed similar performances in the PR analysis to the BayPass and criteria. Nevertheless, ROC AUC values were always found lower when considering probably as a result of the inherent correction in the AUX model for multiple-testing issues, which, as expected, affects the power. Interestingly, as expected from previous results, the XtX calculated under the STD model (and to a lesser extent the AUX model) led here to a worthless decision criterion (ROC AUC almost = 0.5), illustrating the efficiency of the correction for the fixed covariable effect (Table 3).

Finally, under the parameter options chosen to run the different programs (see *Materials and Methods*), BayPass analyses were always among the most computationally efficient approaches (Table 3). For instance, under the core model, BayPass was found to run 1.5 times faster than a single LFMM run.

### Performance of the Ising prior to account for SNP spatial dependency in association analyses

To evaluate the ability of the AUX model Ising prior to capture SNP spatial dependency information, 100 data sets simulated under the HsIMMld-C scenario (see *Materials and Methods*) were analyzed under the AUX model with three different parameterizations for the Ising prior: (i) (no spatial dependency), (ii) and (iii) For each data set, analyses with and without the causal variants were carried out and the required estimate of the covariance matrix was obtained from a preliminary analysis performed under the core model. As shown in Figure 6, increasing improved the mapping precision. Indeed, both a noise reduction at neutral position and a sharpening of the 95% envelope (containing 95% of the posterior means across the 100 simulated data sets) around the selected locus can be observed (*e.g.*, compare Figure 6A1 and A3). Interestingly, given the considered SNP density (and level of LD), excluding the causal variants had only a marginal effect on the overall results.

### Analysis of the French cattle SNP data

The XtX estimates were obtained for the 42,046 SNPs of the data (Figure S12) from the previous analysis under the core model with (*e.g.*, Figure 2). In agreement with the above results, setting instead (the estimate of obtained in the latter analysis) gave almost identical XtX estimates (). To calibrate the XtX’s, a POD containing 100,000 simulated SNPs was generated and further analyzed, leading to a posterior estimate of very close to (). Similarly, the posterior means of and obtained on the POD data set ( and respectively) were almost equal to the ones obtained in the original analysis of the data set ( and respectively). This indicated that the POD faithfully mimics the real data set, allowing the definition of relevant POD significance thresholds on XtX to identify genomic regions harboring footprints of selection. To that end, the UMD3.1 bovine genome assembly (Liu *et al.* 2009) was first split into 5400 consecutive 1-Mb windows (with a 500-kb overlap). Windows with at least two SNPs displaying XtX (the 0.1% POD threshold) were deemed significant and overlapping “significant” windows were further merged to delineate significant regions. Among the 15 resulting regions, two regions were discarded because their peak XtX value was <40.0 (the 0.01% POD threshold). As detailed in Table 4, the 13 remaining regions lie within or overlap with a core selective sweep (CSS) as defined in the recent meta-analysis by Gutiérrez-Gil *et al.* (2015). This study combined results of 21 published genome scans performed on European cattle populations, using various alternative approaches. The proximity of the XtX peak allows us to define positional candidate genes (Table 4) that have, for most regions, already been proposed (or demonstrated) either to be under selection or to control genes involved in traits targeted by selection (see *Discussion*).

To illustrate how information provided by population-specific covariables might help in formulating or even testing hypotheses to explain the origin of the observed footprints of selection, characteristics of the 18 cattle populations for traits related to morphology (SMS) and coat pigmentation (piebald pattern) were further analyzed within the framework developed in this study. An across-population genome-wide association study was thus carried out under both the STD and the AUX models (with ), allowing the computation for each SNP of the corresponding and estimates (Figure S12) and and estimates (Figure S13). We hereafter concentrated on results obtained with BF that are more grounded from a decision theory point of view (and roughly lead to similar conclusions to eBP). For both traits, the resulted in larger BF estimates and a higher number of significant association signals (*e.g.*, at the 20-dB threshold) than This trend was confirmed by analyzing the POD. Indeed, the 99.9% (respectively ) quantiles were 24.9 dB (respectively 18.3 dB) for association with SMS and 26.3 dB (respectively 11.7 dB) for association with piebald pattern. Nevertheless, at the BF threshold of 20 dB, the false discovery rate for remained small (0.035%) and similar to the one obtained in the simulation studies (*e.g.*, Table 2). Interestingly, among the 13 regions identified in Table 4, three contained (regions 4, 11, and 12) at least one SNP significantly associated with SMS based on the criterion and none with the criterion (although for the peak of region 12, providing substantial evidence according to Jeffreys’ rule). For the piebald pattern, results were more consistent since of the four regions (regions 3, 7, 8, and 11) that contained at least one SNP with a the of the corresponding peak SNP was also (although lower) for all but region 11 (although for the peak, providing strong evidence according to Jeffreys’ rule). Except for region 7 where both BF peaks lay within the KIT gene (and to a lesser extent for region 11 with SMS), the BF peaks colocalized with (regions 3, 4, and 8) or were very close to (<50 kb) the XtX peaks. Accordingly, the corresponding XtX estimates decreased when estimated under the STD model, *i.e.*, accounting for the covariables (Figure S14). For instance, the SNP under the XtX peak dropped from 76.3 to 50.3 (from 40.7 to 19.3) for region 3 (respectively region 8). Overall, the posterior means of the individual SNP regression coefficients estimated under the STD model ranged (in absolute value) from (respectively ) to 0.166 (respectively 0.233) for SMS (respectively piebald pattern). These estimates remained close to those derived from the importance sampling algorithm, although the latter tended to be lower in absolute value (Figure S15). As expected from the above simulation studies, estimates obtained under the AUX model tended to be shrunk toward 0, which was particularly striking in the case of SMS (Figure S15).

Finally, analyses of association with SMS were conducted under the AUX model with three different Ising prior parameterizations ( and ), focusing on the 1394 SNPs mapping to BTA14 (Figure 7). Under the parameterization (equivalent to the AUX model analysis conducted above on a whole-genome basis), four SNPs (all lying within region 12) displayed significant signals of association at the -dB threshold with a peak value of 28.5 dB at position 24.6 Mb (Figure 7A). These results, obtained on a chromosome-wide basis, provide additional support to the region 12 signal previously observed. They alternatively suggest that power of the computed on a whole-genome basis might have been altered by the small proportion of SNPs strongly associated to SMS due to multiple-testing issues (which computation does not account for). Hence, for SNPs mapping to BTA14, the estimated on the initial genome-wide analysis were almost identical to the () and highly correlated to the () estimated in the chromosome-wide analysis. As expected from simulation results, increasing led us to refine the position of the peak toward a single SNP mapping ∼400 kb upstream the PLAG1 gene (Figure 7, B and C).

### Analysis of the *L. saxatilis* Pool-Seq data

The LSA_{ps} Pool-Seq data set was first analyzed under the core model (with ). In agreement with previous results (Westram *et al.* 2014), the resulting estimate of the population covariance matrix confirmed that the 12 different *Littorina* populations cluster at the higher level by geographical location and then by ecotype and replicate (Figure 8A). This analysis also allowed us to estimate the XtX for each of the 53,387 SNPs that were further calibrated by analyzing a POD containing 100,000 simulated SNPs to identify outlier SNPs (Figure 8). As for the cattle data analysis, the estimate of on the POD was close to the matrix estimated on the original LSA_{ps} data set () although the posterior means of and were slightly higher ( compared to with the LSA_{ps} data set). In total, 169 SNPs subjected to adaptive divergence were found at the 0.01% POD significance threshold. To illustrate how the BayPass models may help in discriminating between parallel phenotype divergences from local adaptation, analyses of association were further conducted with ecotype (crab *vs.* wave) as a categorical population-specific covariable. Among the 169 XtX outlier SNPs, 65 (respectively 75) displayed dB (respectively dB) (Figure 8B). The two BF estimates resulted in consistent decisions (113 SNPs displaying both dB and dB), although at 20 dB more SNPs were found significantly associated under the AUX model (*n* = 176) than with (*n* = 117). Interestingly, several overly differentiated SNPs (high XtX value) were clearly not associated to the population ecotype covariable (small BF). These might thus be responding to other selective pressures (local adaptation) but might also, for some of them, map to sex chromosomes (Gautier 2014). As a consequence, SNP XtX estimated under the AUX model (*i.e.*, corrected for the fixed ecotype effect) remained highly correlated with the XtX estimated under the core model (including for some XtX outliers) with the notable exception of the SNPs significantly associated to the ecotype. For the latter, the corrected XtX dropped to values generally far smaller than the 0.01% POD threshold (Figure 8C). Finally, Figure 8D gives the posterior mean of the SNP regression coefficients, quantifying the strength of the association with the ecotype covariable. It shows that several SNPs displayed strong association signals (), pointing toward candidate genes underlying parallel phenotype divergence. As observed above in the simulation study and in the analysis of the cattle data set, the AUX model estimates tended to be shrunk toward 0, except for the highest values (corresponding to SNPs significantly associated to the covariable) when compared to the estimates obtained under the STD model (Figure S16A). A similar trend for the estimates of the strongly associated SNPs was observed with the importance sampling estimates (Figure S16B).

## Discussion

The main purpose of this study was to develop a general and robust Bayesian framework to identify genomic regions subjected to adaptive divergence across populations by extending the approach first described in Coop *et al.* (2010) and Günther and Coop (2013). Because of the central role played in the underlying models by the scaled population covariance matrix (), a first objective was to improve the precision of its estimation. To that end, instead of defining an inverse-Wishart prior on as in Coop *et al.* (2010), a Wishart prior defined on the precision matrix () was instead considered and equivalently parameterized with an identity scale matrix but varying the number of degrees of freedom (*ρ*). As the extensive simulation study revealed, the most accurate estimates were obtained by setting (instead of the number of populations, which is equivalent to Coop *et al.* 2010), leading to a weaker (and singular) informative Wishart prior. Although flexible, the purely instrumental nature of the prior parameterization considered in our models makes it difficult to incorporate prior and possibly relevant information about the populations under study. For instance, a spatially (Guillot *et al.* 2014) or even phylogenetically explicit prior might represent in some context attractive alternatives, borrowing for the latter on population genetics theory to model the effect of the demographic history on the covariance matrix (Pickrell and Pritchard 2012; Lipson *et al.* 2013). Apart from investigating different prior specification, additional levels in the hierarchical models were also introduced to estimate the parameters of the (*β*) prior distribution on the ancestral allele frequency. Interestingly, estimating these parameters improved robustness to the SNP ascertainment scheme, in particular when the allele frequency spectrum is biased toward poorly informative SNPs as generally obtained with data from whole-genome sequencing experiments (*e.g.*, Pool-Seq data). Simulation results on MAF filtered data sets also suggested that these additional levels might reduce sensitivity of the models to SNP ascertainment bias characterizing genotyping data obtained from SNP chip. Finally, inclusion of a moderate proportion of SNPs under selection did not significantly affect estimation of Overall, it can be concluded that the core model parameterized with a weakly informative Wishart prior () and that includes the estimation of the parameters and provides a general robust and accurate approach to estimate even with a few thousand genotyped SNPs. It should also be noted that it outperforms previous implementations carried out under a similar hierarchical Bayesian framework, as in the BayEnv2 software (Coop *et al.* 2010), or relying on moment-based estimators (Bonhomme *et al.* 2010; Pickrell and Pritchard 2012; Lipson *et al.* 2013) (see, *e.g.*, Figure 3). As the latter are based on sample allele frequencies, they also remain more sensitive to sample size (and coverage for Pool-Seq data) and, more importantly, they do not allow combining estimation of both the ancestral allele frequencies and covariance matrix that represent a serious issue for small and/or unbalanced designs. Finally, as briefly sketched with visualizations based on correlation plot or hierarchical trees in the present study, the estimation procedure implemented in the BayPass core model might be quite relevant for demographic inference purposes since the matrix has already been shown to be informative about the population history (Pickrell and Pritchard 2012; Lipson *et al.* 2013).

Accounting for renders the identification of SNPs subjected to selection less sensitive to the confounding effect of demography (Bonhomme *et al.* 2010; Günther and Coop 2013). To that end the XtX introduced by Günther and Coop (2013) provides a valuable differentiation measure for a genome scan of adaptive divergence. While XtX might be viewed as a Bayesian counterpart of the FLK statistic (Bonhomme *et al.* 2010), its computation allows considering population histories more complex than bifurcating trees (*i.e.*, including migration or ancestral admixture events), not to mention improved precision in the estimation of the underlying For practical purposes, however, defining a significance threshold for the XtX remains challenging. Indeed, although the XtX are expected under the neutral model to be chi-square distributed (Günther and Coop 2013), the Bayesian (hierarchical) model-based procedure leads to shrinking the XtX posterior mean toward the prior mean (Gelman *et al.* 2003). As a consequence, an empirical posterior checking procedure, similar in essence to the one previously used in a similar context (Vitalis *et al.* 2014), was evaluated here. It represents a relevant alternative to an arbitrary threshold although it comes at a cost of some additional computational burden. The procedure indeed consists of analyzing (POD) data simulated under the inference model with hyperparameters and set equal to those estimated on the real data. Comparing the and estimates obtained on the POD to the original ones ensures that the simulated data provide good surrogates to neutrally evolving SNPs under a demographic history similar to that of the sampled populations. More generally, given the efficiency of the simulation procedure, such simulated data sets might also be relevant to investigate the properties of other estimators of genetic diversity or to evaluate the robustness of various approaches to demographic confounding factors. In the context of this study, a better estimation of was hence shown to improve the performance of the XtX-based differentiation test and association studies with population-specific covariables under the STD and AUX covariate models.

Based on the STD model, Coop *et al.* (2010) relied on importance sampling () estimates of the BF to assess association of allele frequency differences with population-specific covariables. A major advantage of this algorithm stems from its computational efficiency, since only parameter samples drawn from the core model are required. However, the simulation study showed that estimating the regression coefficients with this approach tended to bias (sometimes strongly) the estimates toward zero, as opposed to the posterior means from MCMC parameter values sampled under the STD model. Accordingly, the performances of decision criteria based on eBPs that measure to which extent the posterior distribution of the departs from 0 were generally poorer for the than for the In addition, while a POD calibration similar to the XtX one considered above is straightforward to apply in practice, eBP ( and ) and could not *per se* deal with multiple-testing issues. As previously proposed in a similar modeling context (Riebler *et al.* 2008), introducing binary auxiliary variables attached to each SNP to indicate whether they are associated to a given population covariable allows us to circumvent these limitations. The resulting showed indeed improved power at a stringent decision threshold in the simulation study under the inference model compared to In analyses of real data sets, whereas estimates were found similar to the ones in analysis of association with ecotype in the *Littorina* data set, they led to inflated estimates with the cattle data and thus more (possibly false) significant signals. As shown by analyses on data sets simulated in more realistic scenarios, the intrinsic multiple-testing correction (through the prior on the auxiliary variable) might in turn affect the power of the decision-based criterion. This might explain differences between the results obtained with genome-wide and chromosome-wide analyses of association with the morphology trait in cattle for the region surrounding the PLAG1 gene (BTA14). Besides, in the context of dense genomic data, the AUX model might also be viewed as relevant to more focused analyses for validation (*e.g.*, of genome-wide signals) and fine-mapping purposes. Hence, the Ising prior on the SNP auxiliary variable provides a straightforward and computationally efficient modeling option to account for the spatial dependency among the neighboring markers (Duforet-Frebourg *et al.* 2014). Prior definition of the parameter represents, however, a first limitation of the AUX model, as defined in this study, and estimating it via an additional hierarchical level would be computationally demanding due to the handling of the normalizing constant (*e.g.*, Marin and Robert 2014, Chap. 8.3). Comparing the results from different analyses with increasing values of thus appears as a valuable empirical strategy. More importantly, it should also be noted that the Ising prior essentially consists of a local smoothing of the association signals whose similarity stems from a correlation of the underlying allele frequencies (across all the populations). It thus does not fully capture LD information contained in the local haplotype structure. To that end further extensions of the AUX (and STD) model following the hapFLK method (Fariello *et al.* 2013) that directly relies on haplotype information might be particularly appropriate although difficult to envision for data originating from Pool-Seq experiments.

As expected, in both simulated and real data sets, SNPs strongly associated ( > 0.2) with a given covariable tended to be overly differentiated (high XtX value). Interestingly, however, the STD and AUX covariate models remained more powerful to identify SNPs displaying weaker association signal (typically with ) for which the XtX values did not overly depart from that of neutral SNPs. Assuming information on an underlying covariable (or a proxy of it) is available, the STD and AUX models might thus allow us to identify SNPs within soft adaptive sweeps or subjected to polygenic adaptation, these types of selection schemes leading to more subtle population allele frequency differences that are difficult to detect (*e.g.*, Pritchard *et al.* 2010). Conversely, the covariate models were shown to correct the XtX differentiation measure for the fixed effects of the considered population-specific covariables, refining the biological interpretation of the remaining overly differentiated SNPs by excluding these covariables as key drivers. In principle, across-population association analyses could be performed with any population-specific covariable like environmental covariables (Coop *et al.* 2010; Günther and Coop 2013) but also categorical or quantitative traits as illustrated in examples treated in this study. As such, the STD and AUX covariate models might also be viewed as powerful alternatives to *Q*_{ST}–*F*_{ST} comparisons to assess divergence of quantitative traits (see Leinonen *et al.* 2013, for review) by accurately incorporating genomic information to account for the neutral covariance structure across population allele frequencies. Yet, it should be kept in mind that the considered models capture only linear relationships between allele frequency differences and the covariable. Apart from possibly lacking power for more complex types of dependency, the correlative (and not causative) nature of the association signals might be misleading, notably when the (unobserved) causal covariable is correlated with the analyzed trait or with the principal axes of the covariance matrix (Günther and Coop 2013). Nevertheless, increasing the number of populations and (if possible) the number of studied covariables should overcome these limitations. Still, when jointly considering several covariables, this also advocates for an orthogonal transformation (and scaling) step, *e.g.*, using principal components analysis, to better assess their relationships and to further perform analysis of association on an uncorrelated set of covariables (*e.g.*, principal components).

As a proof of concept, analyses were carried out on real data sets from both model and nonmodel species. Results obtained for the French cattle data demonstrated the versatility of the approach and illustrated how association studies could give insights into the putative selective forces targeting footprints of selection. As a matter of expedience we hereby focused only on the 13 strongest differentiation signals. As expected from the importance of coat pigmentation in the definition of breed standards, at least six genomic regions contained genes known to be associated to coat color and patterning variation, in agreement with a previous genome scan for footprints of selection (see Gutiérrez-Gil *et al.* 2015, for review). These include MC1R (region 13) that corresponds to the locus *Extension* with three alleles identified to date in cattle responsible for the red and black (or combination of both) colors (Seo *et al.* 2007). Similarly, variants localized within the KIT (region 7) and PAX5 (region 10) genes were found highly associated to patterned pigmentation (proportion of black) in Holsteins, accounting for respectively 9.4% and 6.0% of the trait variance (Hayes *et al.* 2010). Within region 7, KIT clusters with KDR (closest to the XtX peak) and PDGFRA, two other tyrosine kinase receptor genes that have also been proposed as candidate coloration genes under selection in other studies (Flori *et al.* 2009; Qanbari *et al.* 2014; Gutiérrez-Gil *et al.* 2015). In region 11, the XtX peak was <25 kb upstream of EDN3 that is involved in melanocyte development and within which mutations were found associated to pigmentation defects in mice, humans, and also chickens (Bennett and Lamoreux 2003; Saldana-Caboverde and Kos 2010; Dorshorst *et al.* 2011). Accordingly, Qanbari *et al.* (2014) recently found a variant in the vicinity of EDN3 strongly associated with coat spotting phenotype of bulls (measured as the proportion of their daughters without spotting) in the Fleckvieh breed. The peak in region 2 was 100 kb upstream the KITLG gene, which is involved in the roan phenotype (mixture of pigmented and white hairs) observed in several cattle breeds (Seitz *et al.* 1999). Mutations in this gene have also been found to underlie skin pigmentation diseases in human (Picardo and Cardinali 2011). Finally, region 5 contains the LEF1 gene (100 kb from the XtX peak) that has recently been demonstrated to be tightly involved in blond hair color in (human) Europeans (Guenther *et al.* 2014). Three other regions contained genes that affect cattle body conformation. These include region 1, containing the myostatin gene (MSTN), one of the best-known examples of economically important genes in farm animals since it plays an inhibitory role in the development and regulation of skeletal muscle mass (Stinckens *et al.* 2011). MSTN is in particular responsible for the so-called double-muscling phenotype in cattle (Grobet *et al.* 1997). Region 12 contains PLAG1 that has been demonstrated to influence bovine stature (Karim *et al.* 2011). Similarly, region 6 encompasses the NCAPG-LCORL cluster in which several polymorphisms have been found strongly associated to height in humans (Allen *et al.* 2010), horses (Signer-Hasler *et al.* 2012), and cattle (Pryce *et al.* 2011). However, combining results from a genome scan for adaptive selection with a comprehensive genome-wide association study with milk production traits in the Holstein cattle breed, Xu *et al.* (2015) proposed the LAP3 gene (within which the XtX peak mapped) as the main driver of a selective sweep overlapping with region 12. Regarding the four remaining regions (2, 4, 8, and 9), the retained candidate genes corresponded to the gene within which the XtX peak is located (NUDCD3, RPS26, and VDAC1 for regions 2, 4, and 9, respectively) or is the closest (<15 kb from ALB for region 8). As for RPS26, although NUDCD3 has been highlighted in other studies (*e.g.*, Flori *et al.* 2009; Xu *et al.* 2015), the poorly known function of these genes makes highly speculative any interpretation of the origin of the signals. Conversely, the various and important roles played by ALB (bovine serum albumin precursor) do not allow a clear hypothesis to be formulated about the trait underlying the region 8 signal. More presumably, due to the role of VDAC1 in male fertility (Kwon *et al.* 2013), the footprint of selection observed in region 9 might result from selection for a trait related to reproduction. Overall, association analyses carried out under the covariate models revealed strong association of SNPs within KITLG (region 3), KIT (region 7), and EDN3 (region 11) with variation in the piebald pattern across the populations thereby supporting the hypothesis of selection on coat coloration to be the main driver of the three corresponding signatures of selection. These results also confirm the already well-known key role of these genes in coloration patterning. Interestingly, the observed association signals within ALB (region 8) also suggest that this gene might influence coat coloration in cattle, which, to our knowledge, has not been previously reported. Finally, association studies on the SMS trait suggested that PLAG1 (region 12) has been under strong selection in European cattle and contributes to morphological differences across the breeds. Yet, the strongest association signal was 400 kb upstream of PLAG1, suggesting the existence of some functional variants (possibly in regulatory regions) different from those already reported (Karim *et al.* 2011), although such results need to be confirmed with denser SNP data sets. Conversely, no association signal was found within the selection signature under region 6, adding more credit to selection for milk production (Xu *et al.* 2015) as the main underlying adaptive constraint rather than a morphological trait as previously hypothesized (see above). Analysis of the *L. saxatilis* Pool-Seq data (Westram *et al.* 2014) illustrates how BayPass can be helpful to realize a typology of the markers relative to an ecological covariable in a nonmodel species. In agreement with the original results, several genes represent good candidates to underlie parallel phenotypic divergence in this organism and might deserve follow-up validation studies. From a practical point of view, however, compared to combining several pairwise *F*_{ST} population tests (Westram *et al.* 2014), the approach proposed here greatly simplified the analyses and the biological interpretation of the results while allowing both an optimal use of the data and a better control for multiple-testing issues.

Overall, the models described here and implemented in the software package BayPass provide a general and robust framework to better understand the patterns of genetic divergence across populations at the genomic level. They allow (i) an accurate estimation of the scaled covariance matrix whose interpretation gives insights into the history of the studied populations, (ii) a robust identification of overly differentiated markers by correcting for confounding demographic effects, and (iii) robust analyses of association of SNP with population-specific covariables, giving in turn insights into the origin of the observed footprints of selection. In practice, when compared to BayEnv2, BayPass led to a more accurate and robust estimation of the matrix (and the related measures) and thus improved the performances of the different tests. In addition, various program options were developed to investigate the different modeling extensions, including analyses under the STD and AUX models and exploration of the Ising prior parameters to incorporate LD information. More generally, as demonstrated by the analysis of individual-based simulated data sets, the method developed in this study was found to be among the most efficient in terms of power, robustness, and computational cost when compared to the other state-of-the-art or recently developed genome-scan methods. Moreover, as opposed to most of the currently available approaches, the different decision measures (XtX, eBP, and BF) can be computed for both allele (from standard individual genotyping experiments) and read (from Pool-Seq experiments) count data (while also accommodating missing data). Finally, although computation times scale roughly linearly with the data set complexity (number of populations number of markers), for very large data sets, several strategies might be efficient to reduce computational burden. For instance, because estimation of was found robust to moderate ascertainment bias, one may filter low polymorphic markers (*e.g.*, overall MAF <0.01) since those are not informative for genome-scan purposes and/or consider subsampling of the initial data set (*e.g.*, chromosome-wide analyses).

## Acknowledgments

I thank Anja Westram for providing early access to the *Littorina* data and Pierre de Villemeureuil for providing the polygenic data sets simulated under the HsIMM, IMM, and SS scenarios (and information about the underlying Python/SimuPOP scripts). I also thank the two anonymous reviewers for their valuable comments. I am finally grateful to the genotoul bioinformatics platform Toulouse Midi–Pyrenees for providing computing resources. This work was partly funded by the ERA-Net BiodivERsA2013-48 (EXOTIC), with the national funders Fondation pour la Recherche sur la Biodiversité, Agence Nationale de la Recherche, Ministère de l'Ecologie, du Développement Durable et de l'Energie, BELSPO (for BELgian Science POlicy), PT-DLR, and Deutsche Forschungsgemeinschaft, part of the 2012–2013 BiodivERsA call for research proposals.

## Footnotes

*Communicating editor: J. D. Wall*Supporting information is available online at www.genetics.org/lookup/suppl/doi:10.1534/genetics.115.181453/-/DC1.

- Received July 31, 2015.
- Accepted October 12, 2015.

- Copyright © 2015 by the Genetics Society of America