## Abstract

There is currently large interest in distinguishing the signatures of genetic variation produced by demographic events from those produced by natural selection. We propose a simple multilocus statistical test to identify candidate sites of selective sweeps with high power. The test is based on the variability profile measured in an array of linked microsatellites. We also show that the analysis of flanking markers drastically reduces the number of false positives among the candidates that are identified in a genomewide survey of unlinked loci and find that this property is maintained in many population-bottleneck scenarios. However, for bottlenecks of intermediate severity we find genomic signatures that are very similar to those produced by a selective sweep. While in these worst-case scenarios the power of the proposed test remains high, the false-positive rate reaches values close to 50%. Hence, selective sweeps may be hard to identify even if multiple linked loci are analyzed. Nevertheless, the integration of information from multiple linked loci always leads to a considerable reduction of the false-positive rate compared to a genome scan of unlinked loci. We discuss the application of this test to experimental data from *Drosophila melanogaster*.

The central role of adaptation in the evolution of natural populations is widely accepted. Nevertheless, until very recently no systematic approaches were available to uncover the genetic changes underlying adaptation in natural populations.

One approach relies on population genetic principles to identify beneficial mutations from patterns of natural variation and has been called hitchhiking mapping (Schlötterer 2003). The basic idea of hitchhiking mapping is that beneficial mutations increase in frequency until they become fixed in the population. During such a selective sweep, not only the selected site but also linked neutral variants are affected (Maynard Smith and Haigh 1974). As a consequence of the spread of a beneficial mutation, levels of variability are strongly reduced in the genomic region flanking the selected site. Hence, hitchhiking mapping does not require the exact location of the selected site to be known; it is sufficient to analyze linked neutral markers. This approach has already gained widespread interest, and a number of studies have set out to survey genomewide levels of genetic variability to trace regions in the genome that may have been shaped by natural selection (Schlötterer *et al*. 1997; Payseur *et al*. 2002; Vigouroux *et al*. 2002; Glinka *et al*. 2003; Kauer *et al*. 2003; Kayser *et al*. 2003; Schöfl and Schlötterer 2004; Ihle *et al*. 2006; Pool *et al*. 2006).

In a genome screen for beneficial mutations, a large number of loci are analyzed. Because of their high level of polymorphism, microsatellites are well-suited markers (Schlötterer 2004). Their high informativeness and cost-effective typing have made microsatellites the marker of choice for many hitchhiking mapping studies. The picture emerging from these studies is that genome scans are a suitable tool for the identification of putatively selected genomic regions (Kohn *et al*. 2000; Harr *et al*. 2002; Vigouroux *et al*. 2002; Wootton *et al*. 2002; Kayser *et al*. 2003; Nair *et al*. 2003; Schlenke and Begun 2004; Schöfl and Schlötterer 2004; Ihle *et al*. 2006).

Typical hitchhiking mapping studies use markers that are distributed across the genome. Despite some recent progress toward a high-throughput analysis of markers, the density of characterized markers is still fairly low. Hence, in most cases a candidate region is identified on the basis of a single locus only. The problem of such large surveys is that each locus is tested for deviation from neutral expectations and a large number of tests could result in substantial numbers of false positives. While statistical approaches, such as the Bonferroni correction (Sokal and Rohlf 1995) and false discovery rate (FDR) (Storey 2002), could account for this, the trade-off is that the sensitivity of the mapping strategy can be compromised.

Alternatively, after an initial genomewide survey one can perform a more detailed analysis of additional markers flanking a previously identified candidate locus. The rationale is that, around a selected site, genealogical histories are more uniform than expected under neutrality and resemble the history of the selected site itself. Hence, since flanking loci are also affected by a selective sweep, they should be useful to obtain further confidence that an identified genomic region has been exposed to a recent selective sweep. In agreement with this expectation, experimental work has demonstrated that variability is often reduced in genomic regions flanking a putative selective sweep (Harr *et al*. 2002; Nair *et al*. 2003; Schlenke and Begun 2004). Nevertheless, until recently no statistical tests were available to evaluate the significance of reduced variability at linked microsatellite loci.

Here, we describe a new test statistic for the identification of recent selective sweeps using linked microsatellites.

## METHODS

#### Definition of ln *R*θ:

A selective sweep often reduces variability in the flanking sequences (Charlesworth 1992; Hudson 1994; Schlötterer 2003). Hence, a genome scan for regions affected by selective sweeps could focus on those regions with low levels of variability. If microsatellites are used as genetic markers in such genome scans, a complication arises because microsatellite mutation rates are highly heterogeneous (Schlötterer 2000; Ellegren 2004). Additional information is therefore required to distinguish low variability caused by a low mutation rate from low variability caused by a selective sweep. To overcome this limitation, it has been proposed to consider, for each locus, the ratio of the population variation estimators (θ = 4*N*_{e}μ) from two populations (Schlötterer 2002; Kauer *et al*. 2003). If the mutation rate μ is the same in both populations, it cancels out, and the resulting estimator *R*θ does not explicitly depend on the mutation rate but only on the heterozygosities in the two populations:

$$R_\theta := \frac{\theta_{\mathrm{Pop}_1}}{\theta_{\mathrm{Pop}_2}} = \frac{\left(1/(1-H_{\mathrm{Pop}_1})\right)^2 - 1}{\left(1/(1-H_{\mathrm{Pop}_2})\right)^2 - 1}. \tag{1}$$

In Equation 1, the first equality is the definition of *R*θ and the last equality is derived from Ohta and Kimura's (1973) formula for the expected heterozygosity (*H*) under the stepwise mutation model, *H* = 1 − 1/√(1 + 2θ), which gives θ = [(1/(1 − *H*))^{2} − 1]/2; the factor 1/2 cancels in the ratio.
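As an illustration of Equation 1, the following minimal Python sketch inverts Ohta and Kimura's stepwise-mutation expectation for each population and takes the log ratio. The function names are hypothetical, the heterozygosities are assumed to be already estimated, and monomorphic loci (*H* = 0, hence θ = 0) are simply rejected, which may differ from how the original analyses treated them.

```python
import math


def theta_from_h(h: float) -> float:
    """Invert Ohta and Kimura's SMM expectation H = 1 - 1/sqrt(1 + 2*theta)."""
    if not 0.0 <= h < 1.0:
        raise ValueError("heterozygosity must lie in [0, 1)")
    return ((1.0 / (1.0 - h)) ** 2 - 1.0) / 2.0


def ln_r_theta(h_pop1: float, h_pop2: float) -> float:
    """ln(theta_Pop1 / theta_Pop2); mu and the factor 1/2 cancel in the ratio."""
    t1, t2 = theta_from_h(h_pop1), theta_from_h(h_pop2)
    if t1 == 0.0 or t2 == 0.0:
        # A monomorphic locus gives theta = 0 and an undefined logarithm.
        raise ValueError("ln R_theta is undefined for monomorphic loci")
    return math.log(t1 / t2)
```

For example, `ln_r_theta(0.5, 0.8)` compares θ = 1.5 in population 1 with θ = 12 in population 2, giving ln(0.125).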

For unlinked microsatellites, extensive computer simulations showed that the distribution of the natural logarithm of *R*θ is well approximated by a normal distribution whose mean reflects the ratio of the effective population sizes of the two populations compared. The standard deviation was shown to be affected by mutation rate, sample size, and demography (Schlötterer 2002; Kauer *et al*. 2003; Schlötterer and Dieringer 2005). To test for deviation from neutral expectation, a set of neutrally evolving loci (reference loci) is required to estimate the mean (*m*) of ln *R*θ and its standard deviation (σ). Hence, altogether four data sets are required: the loci to be tested need to be genotyped in the two populations (test loci), and a set of reference loci needs to be genotyped in the same two populations. The ln *R*θ-values *y*_{i} of the test loci are transformed according to

$$z_i = \frac{y_i - m}{\sigma}, \qquad i = 1, 2, \ldots,$$

to obtain standard-normally distributed variates. Throughout this article we assume that selection or a bottleneck occurred in population 1 (Pop_{1}). Thus, negative ln *R*θ-values are expected for loci that are linked to a selected site. Because of the above transformation, the expected value of the standardized ln *R*θ is 0 under a population bottleneck, since the reference loci are expected to be affected by the bottleneck to the same extent as the test loci.
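The standardization step can be sketched as follows, assuming the ln *R*θ-values of test and reference loci have already been computed. The function name is hypothetical, and using the sample standard deviation for σ is an assumption; the text does not specify the estimator.

```python
import statistics


def standardize(test_values, reference_values):
    """z_i = (y_i - m) / sigma, with m and sigma estimated from reference loci."""
    m = statistics.mean(reference_values)
    sigma = statistics.stdev(reference_values)  # sample standard deviation (assumed)
    return [(y - m) / sigma for y in test_values]
```

Because *m* and σ come from reference loci typed in the same two populations, a genomewide demographic shift moves both test and reference values and largely cancels in *z*_{i}.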

#### Principle of the test:

Here, we extend the single-locus test to multiple linked loci. The rationale is that a selective sweep typically affects a genomic region large enough to contain multiple microsatellite loci. Thus, their joint analysis should provide more confidence about deviation from neutrality than an analysis based on a single microsatellite marker. The size of the region affected by a single selective sweep depends mainly on the strength of selection and the local recombination rate. A rough estimate is obtained by calculating the “half-life” (*D*) of a selective sweep using Equation 19 of Stephan *et al*. (1992), where *D* is the size of the region around the selected site in which variability is reduced to 50% or less of its neutral equilibrium value. For instance, if 2*Ns* = 1000 and *r*/*s* = 10^{−6}, the region in which variability is reduced by ≥50% has a size of ∼110 kb. Kaplan *et al*. (1989) obtained qualitatively similar estimates: with the above parameters one would obtain ∼370 kb (see Table 2 in Kaplan *et al*. 1989).
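For intuition about the scale involved, the sketch below uses a common rough hitchhiking approximation, *H*/*H*_{0} ≈ 1 − (2*Ns*)^{−2*rx*/*s*} for a neutral site *x* bp from the selected site (*r* per bp). This approximation is an assumption here and not a reproduction of Equation 19 of Stephan *et al*. (1992); the function name is hypothetical. Setting *H*/*H*_{0} = 1/2 and counting both sides of the selected site gives roughly 100 kb for 2*Ns* = 1000 and *r*/*s* = 10^{−6}, the same order of magnitude as the ∼110 kb quoted above.

```python
import math


def reduced_region_width_bp(two_n_s: float, r_over_s: float) -> float:
    """Approximate total width (bp) of the region with H/H0 <= 1/2.

    Assumes H/H0 ~ 1 - (2Ns)**(-2*r*x/s) for a neutral site x bp away,
    with r the per-bp recombination rate (a rough approximation,
    not Eq. 19 of Stephan et al. 1992).
    """
    # Solve (2Ns)**(-2*r*x/s) = 1/2 for x on one side of the selected site.
    one_side = math.log(2.0) / (2.0 * r_over_s * math.log(two_n_s))
    # The reduced region extends to both sides of the selected site.
    return 2.0 * one_side


print(f"{reduced_region_width_bp(1000, 1e-6) / 1000:.0f} kb")  # prints 100 kb
```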

The experimental design for a survey of linked loci is tricky, as the parameters of a selective sweep, and therefore the size of the affected genomic region, are not known *a priori*. One further complication arises from the variation in microsatellite density among genomic regions (Bachtrog *et al*. 2000). To avoid the problem that the number of analyzed and/or available markers affects the test results, we propose the following multilocus test strategy:

1. Standardize the ln *R*θ-values of the test loci as described above.
2. Identify the locus *x*\* with the lowest ln *R*θ-value.
3. Starting from *x*\*, include all upstream and downstream microsatellites in the test until, on either side of *x*\*, the first locus with a positive ln *R*θ-value has been included or the end of the analyzed sequence is reached. Using this procedure, *K* + 1 loci are chosen (a number that may differ among data sets).
4. Drop locus *x*\* from further analysis; this results in a set of *K* loci to be analyzed.
5. Calculate the test statistic $T(K) = \sum_{i=1}^{K} z_i$, the sum of the standardized ln *R*θ-values of the *K* loci.
6. Determine the *P*-value.
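The locus-selection procedure and the statistic *T*(*K*) can be sketched in Python as below. The sketch assumes the standardized values are ordered by chromosomal position; the function names are hypothetical, ties for the minimum are broken by taking the first such locus (an assumption), and the first positive locus on each side is included in the sum, consistent with the upward bias described in the Results.

```python
def select_flanking(z):
    """Return (index of x*, sorted indices of the K flanking loci).

    z: standardized ln R_theta values ordered by chromosomal position.
    Walk outward from the minimum until a locus with z > 0 has been
    included or the end of the analyzed region is reached.
    """
    if not z:
        raise ValueError("need at least one locus")
    x_star = min(range(len(z)), key=z.__getitem__)  # first minimum if tied
    chosen = []
    for step in (-1, 1):  # -1: upstream, +1: downstream
        i = x_star + step
        while 0 <= i < len(z):
            chosen.append(i)
            if z[i] > 0:  # first positive locus is included, then stop
                break
            i += step
    return x_star, sorted(chosen)


def t_statistic(z):
    """T(K): sum of standardized values of the K flanking loci (x* dropped)."""
    _, flank = select_flanking(z)
    return sum(z[i] for i in flank), len(flank)
```

For `z = [0.3, -0.2, -1.0, -2.5, -0.4, 0.6, -0.1]`, locus 3 is *x*\*, the flanking set is loci 0, 1, 2, 4, and 5 (*K* = 5), and *T*(5) = −0.7; locus 6 lies beyond the first positive value and is not used.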

Under neutral scenarios, either with constant population size or with population bottleneck, and if the individual loci are not tightly linked, the distribution of *T*(*K*) is approximately normal with mean ∼0 and standard deviation √*K*. Tight linkage leads to a violation of the assumption of independence and in this case the distribution of *T*(*K*) is not known analytically.

Two different approaches can be used to determine if *T*(*K*) deviates from neutral expectation. The first one is based on computer simulations and the second relies on the simplifying assumption of independence of linked loci. Both approaches are discussed in detail below.

#### Simulating the distribution of the test statistic:

We use neutral coalescent simulations to determine *T*(*K*) for 10,000 neutral data sets. Assuming that selection generates more extreme (negative) *T*(*K*)-values than expected under neutrality, we determine the lower α-quantile *q*_{sim}(α) of *T*(*K*) (for example, α = 0.05) and consider a test significant if *T*(*K*) falls below it. Note that *q*_{sim}(α) does not depend on *K*, since it is obtained from a large number of data sets (10,000) and *K* may differ among data sets. A test may be significant either because of several slightly negative ln *R*θ-values or because of a small number of strongly negative ln *R*θ-values. The simulation approach is preferable for small populations and closely linked loci (see results).

Our computer simulations were performed with modified versions of the program “ms” of R. Hudson (Hudson 2002) and a program of Y. Kim (Kim and Stephan 2002). Both programs were originally written to simulate the distribution of segregating sites in DNA sequences and had to be modified to account for microsatellite evolution. Furthermore, to avoid memory and runtime problems when simulating a large genomic region (600 kb), we assumed that recombination scales linearly with physical distance. For our purposes it is then equivalent to simulate a region whose length is downscaled by some factor while the recombination rate is upscaled by the same factor. Thus, we simulated a region of 600 bp with a 1000-fold increased recombination rate.

The neutral and bottlenecked data sets **N** and **B** (see below) were generated with the modified version of ms, which is fast and permits the incorporation of demography, in particular, population bottlenecks. This program does not produce the genealogy of individual sites but partitions the unit interval into regions that share the same genealogy. We simulated microsatellite mutations according to an unbiased stepwise mutation model: each mutation either added or removed one repeat unit with equal probability. At the end of the simulation the program provides the number of repeat units for each locus and chromosome. To generate data set **S** (see below), we modified the program of Y. Kim (Kim and Stephan 2002) to simulate the genealogy of a positively selected site embedded in a genomic region of 600 bp. This program generates polymorphism data and provides the character state for each nucleotide in the region. Microsatellites were modeled by reinterpreting the evolution of nucleotides: each site that coincided with a microsatellite position was treated as a microsatellite array, and each mutation either added or removed one repeat unit with equal probability.
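Given simulated neutral values of *T*(*K*), the critical value *q*_{sim}(α) and a one-sided Monte Carlo *P*-value can be computed as sketched below. The quantile convention (the ⌈α*n*⌉-th smallest value) and the (count + 1)/(*n* + 1) correction are common choices assumed here, not necessarily those used for the results reported in this article; the function names are hypothetical.

```python
import math


def q_sim(simulated_t, alpha=0.05):
    """Empirical lower alpha-quantile: the ceil(alpha * n)-th smallest value."""
    ordered = sorted(simulated_t)
    k = max(1, math.ceil(alpha * len(ordered)))
    return ordered[k - 1]


def monte_carlo_p(t_obs, simulated_t):
    """One-sided (lower-tail) P-value with the (count + 1) / (n + 1) correction."""
    hits = sum(1 for t in simulated_t if t <= t_obs)
    return (hits + 1) / (len(simulated_t) + 1)
```

Pooling all simulated data sets, whatever their individual *K*, into one list reflects the point made above that *q*_{sim}(α) does not depend on *K*.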

Both programs produced consistent results for neutral simulations of constant-size populations when corresponding parameters were used. We further validated the software against an independent microsatellite simulation program (Schlötterer 2002) and again obtained consistent results.

#### Simulated data sets:

##### Neutral, constant population size (data set **N**):

Assuming neutrality and a constant population size, we generated various data sets differing in the scaled recombination rate (*R* = 4*Nr*), the scaled mutation rate (θ = 4*N*μ), and sample size (*n*). Each data set consisted of 10,000 replicate simulations.

##### Bottlenecked populations (data set **B**):

Population bottlenecks were simulated assuming a three-phase model: a population of constant size *N* is reduced to constant size *d* · *N* during a time interval δ. The population size before and after the bottleneck was assumed to be identical, and both the reduction and the recovery of population size were assumed to be instantaneous. The population size reduction (*d*) and the duration (δ) of the bottleneck were combined into the bottleneck-severity parameter γ = δ/*d*, which was varied from 0.1 to 10. Furthermore, we also varied the time τ at which the bottleneck was completed (measured in units of 2*N* generations, going backward from the present). The scaled population recombination rate was set to 0.02 per bp and the scaled mutation rate was fixed at θ = 5. Samples of *n* = 60 chromosomes were simulated.
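A minimal sketch of the three-phase bottleneck model and of the severity parameter, with time measured backward in units of 2*N* generations. The half-open interval [τ, τ + δ) and the helper names are assumptions for illustration; the sketch only describes the population-size history and does not simulate genealogies.

```python
def population_size(t, n0, d, delta, tau):
    """Population size at time t (units of 2N generations, backward in time).

    Three-phase model: size n0 before and after the bottleneck and d * n0
    while the bottleneck lasts (tau <= t < tau + delta); changes are
    instantaneous.
    """
    return d * n0 if tau <= t < tau + delta else n0


def severity(delta, d):
    """Bottleneck severity gamma = delta / d."""
    return delta / d


def depth_from_severity(gamma, delta):
    """Depth d implied by a given severity when the duration delta is fixed."""
    return delta / gamma
```

Holding δ fixed and varying γ, as done in the Results, is equivalent to varying the depth *d* = δ/γ.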

##### Selective sweep (data set **S**):

Data sets with selected loci were obtained with the modified version of Kim's program (Kim and Stephan 2002). We varied the scaled selection coefficient 2*Ns* from 10 to 2000. For all simulations we assumed a very recent selective sweep that was completed τ = 0.001 time units ago, where time is measured in units of 2*N* generations. We set ϵ to 10^{−4} (ϵ is a parameter in a sweep scenario that determines from which minimal frequency onward the dynamics of the beneficial allele are treated as deterministic rather than stochastic; see Kim and Stephan 2002 for a detailed discussion on the choice of ϵ).

##### Candidate regions (data sets **N**_{L} and **B**_{L}):

To mimic a typical first-pass hitchhiking mapping study, we considered a subset of either the neutral runs **N** or the bottleneck runs **B**: from these simulations we selected the 5% of the runs with the lowest ln *R*θ-values at a particular locus (say *x*\*). These subsets are called **N**_{L} and **B**_{L}, respectively. The rationale is that a significantly reduced value of ln *R*θ at a locus would be falsely interpreted as the trace of a selective sweep in a one-locus test; *i.e*., for data sets **N**_{L} and **B**_{L} the false-positive rate would be 100%. Our goal is to reduce the false-positive rate in such cases by adding information from flanking loci.
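The construction of **N**_{L} and **B**_{L} amounts to keeping the lowest 5% of runs at a chosen locus, as in the hypothetical helper below (each run is assumed to be a list of ln *R*θ-values indexed by locus). For 10,000 runs this keeps 500, matching the size of the candidate set analyzed in the Results.

```python
def lowest_tail_runs(runs, locus, fraction=0.05):
    """Return the `fraction` of runs with the lowest ln R_theta at `locus`.

    runs: list of runs, each a sequence of ln R_theta values indexed by locus.
    """
    n_keep = max(1, round(fraction * len(runs)))
    # Sort runs by their value at the focal locus and keep the lower tail.
    return sorted(runs, key=lambda run: run[locus])[:n_keep]
```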

#### Approximate test based on the assumption of a normal distribution:

The second approach applies to large neutrally evolving populations and relies on the observation that levels of heterozygosity are correlated only among very tightly linked sites (Figure 1). In contrast, a selective sweep leads to a drastic increase in the correlation of heterozygosity levels among linked microsatellite loci. As a simplifying null model, we therefore treat neutrally evolving microsatellites as independent even if they are linked. Since the sum of *k* independent standard-normally distributed random variables is normal with mean 0 and standard deviation √*k*, it is easy to construct a one-sided test on the basis of this distribution: one rejects the null hypothesis (“no selective sweep”) if *T*(*k*) < *q*_{theo}(α, *k*), where *q*_{theo}(α, *k*) = √*k* Φ^{−1}(α) is the α-quantile of the normal distribution with mean 0 and variance *k*, and Φ is the standard normal distribution function. In contrast to the simulation approach described before, the critical value *q*_{theo}(α, *k*) here depends on the number *k* of loci included in a particular test, which may differ among tests.
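The approximate analytical test can be sketched with Python's standard-library `statistics.NormalDist` (Python 3.8 or later). The helper names are hypothetical; the sketch compares *T*(*k*) with √*k* Φ^{−1}(α) and reports the lower-tail *P*-value under a normal distribution with mean 0 and variance *k*.

```python
import math
from statistics import NormalDist


def q_theo(alpha, k):
    """alpha-quantile of N(0, k): sqrt(k) times the standard normal quantile."""
    return math.sqrt(k) * NormalDist().inv_cdf(alpha)


def analytic_test(t, k, alpha=0.05):
    """One-sided test of T(k) against N(0, k); returns (reject, P-value)."""
    p = NormalDist(mu=0.0, sigma=math.sqrt(k)).cdf(t)
    return t < q_theo(alpha, k), p
```

Unlike the Monte Carlo sketch, no simulated reference distribution is needed; the only input besides *T*(*k*) is the number of loci *k*.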

#### Multiple testing:

A practical consequence of this method is that it can be applied to any number of linked loci. Irrespective of the number of loci surveyed, the locus with the smallest ln *R*θ-value is identified and *k* flanking loci are then chosen for the test. Note that only a single test is performed per region, since the *k* flanking loci are tested jointly. Thus, no adjustment for multiple testing is required. Recall that the approximate analytical test rests upon the assumption that the genealogical histories of the microsatellite loci are independent.

#### Calculation of ln *R*θ and standardization:

Similar to the ln *R*θ test based on a single locus or unlinked loci, the multilocus test also requires data from two populations, population 1 and population 2. They need to be typed for the same set of loci to calculate the ln *R*θ-values. Throughout all analyses population 2 is assumed to evolve neutrally at a constant population size (*N* = 5 × 10^{5}). Once ln *R*θ-values are obtained, they are standardized. This requires two additional data sets from different loci for the same two populations to estimate mean (*m*) and standard deviation (σ) of ln *R*θ. For this we simulated an additional 10,000 unlinked loci with otherwise the same population parameters and calculated *m* and σ. Data set **S** is standardized with values obtained from neutral simulations.

#### Marker spacing:

All test statistics that are based on multiple, linked loci depend on the number of loci, their physical spacing, and the recombination rate. Thus, an almost infinite number of possible parameter combinations could be tested. We have therefore focused on the analysis of some representative examples either to evaluate the influence of one parameter or to indicate a general trend. We considered different marker distributions along a 600-kb genomic region. In one case (marker topology A, see Figure 2) we assumed that the loci were evenly spaced and that the target of selection coincided with the position of one of the analyzed loci. In another set (marker topology B) we assumed the position of the selected site to be located between two adjacent neutral markers, all of them evenly spaced. In a third case (marker topology C) the marker spacing was obtained from an experimental data set of 15 microsatellites (Harr *et al*. 2002).

#### Experimental data from *Drosophila melanogaster*:

We applied the new multilocus test to a recently identified sweep region on the third chromosome of *Drosophila melanogaster* (Harr *et al*. 2002). For the standardization of the data we were particularly cautious to use test and reference loci from the same populations. We used 40 third-chromosomal reference loci (Kauer *et al*. 2003) for one African (Kisoro, Uganda) and one European (Katovice, Poland) population. We genotyped 15 microsatellites covering the sweep region for 15 Kisoro and 30 Katovice females that were first-generation descendants from freshly collected flies, using standard typing protocols (Schlötterer 1998). All loci are located on autosomes. Gene diversity was calculated with the “Microsatellite Analyzer” software (Dieringer and Schlötterer 2003).
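For reference, a common estimator of gene diversity is Nei's unbiased form *n*/(*n* − 1)(1 − Σ*p*_{i}^{2}), sketched below with alleles coded as repeat numbers. Which exact estimator the Microsatellite Analyzer software applies is not stated here, so treat this as an assumption; the function name is hypothetical.

```python
from collections import Counter


def gene_diversity(alleles):
    """Nei's unbiased gene diversity: n / (n - 1) * (1 - sum of p_i**2)."""
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    counts = Counter(alleles)
    homozygosity = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - homozygosity)
```

A monomorphic sample returns 0, the case for which ln *R*θ is undefined.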

## RESULTS

The multilocus test statistic *T*(*K*) rests on the assumption that the pattern of variability at linked loci is more correlated under selection than under neutrality. However, the increase of correlation as well as the reduction of variability produced by a selective sweep is transitory and depends on recombination rate and selection coefficient as well as on the time since the selective sweep was completed. Our approach focuses on the expectation that a recent selective sweep leaves reduced levels of variability in a genomic region flanking the target of selection. Therefore, such a region should harbor more linked loci with reduced variability than expected under a neutral scenario. To account for the large variation in microsatellite density across a genome, we propose a dynamically adjusted, rather than a fixed, number of markers to be tested. The details are described in materials and methods.

Data set **N** provides the (simulated) distribution of *T*(*K*) and the critical value *q*_{sim}(α). Because of the construction of the test, the distribution's mean and median are not equal to zero but are slightly shifted toward a positive value (Figure 3A, shaded solid line). This is due to the way in which multiple markers are selected for the test: both the leftmost and the rightmost markers are required to have positive ln *R*θ-values, thereby creating a slight upward bias in the distribution of *T*(*K*). The false-positive rate of the test for data sets **N**_{L} and **B**_{L} is determined on the basis of *q*_{sim}(α).

While computer simulations yield an accurate estimate of the probability that a given data set is consistent with neutral expectations, they have the disadvantage that *a priori* assumptions must be made about parameters such as the scaled mutation and recombination rates, θ = 4*N*μ and *R* = 4*Nr*. Therefore, we also use an approximate analytical procedure that assumes independence between linked loci and standard normality of the standardized single-locus statistic ln *R*θ. Figure 1 shows the correlation coefficient between the ln *R*θ-values for a pair of loci as a function of their distance. Even for markers as close as 1 kb the correlation coefficient is quite low (∼0.1). This shows that under neutrality only a very small proportion of the variation at a given marker can be explained by a linked one, even if linkage is tight. We determined the power and the false-positive rate of the test with two strategies, one relying on the simulated distribution of *T*(*K*) (Monte Carlo strategy) and one relying on the analytical approximation (Tables 1–3). Both strategies produced highly consistent results. In most cases the theoretical false-positive rate was below the simulated one.

The various data sets and their descriptions together with the symbols used are listed in Table 4.

#### Power estimates:

The statistical power to detect deviation from neutrality is assessed with data set **S**, in which data for a range of selection intensities and recombination rates were simulated. Overall, our test had high statistical power, with many scenarios showing a 100% detection rate of selective sweeps. The most important factor influencing the power of our test is the strength of selection, with small selection coefficients (2*Ns* < 10) resulting in very low power. Similarly, high recombination rates (*R* > 2 × 10^{−2}) also resulted in a loss of power to detect a selective sweep. Consistent with previous results for unlinked loci (Schlötterer and Dieringer 2005), the mutation rate had almost no impact on the statistical power. The same holds for the sample size *n*, except for extremely small samples (Table 1). We also found tests based on gene diversity (heterozygosity) to be more powerful than those based on the variance in repeat number (data not shown). Another parameter that strongly influences the detection rate of selective sweeps is the time τ since a selective sweep was completed. Recent sweeps are easy to detect, while older ones are not. We observe a dramatic loss in power when the ratio *s*/τ < 0.1; sweeps with *s*/τ < 0.01 are virtually undetectable (results not shown).

#### False positives:

A set of linked microsatellites can be analyzed in two different experimental contexts. Either the genomic region was chosen in the absence of *a priori* information (first-pass genome scan) or previous work suggested that a surveyed region might not evolve neutrally and a dense marker analysis was subsequently performed for that region (candidate region analysis). Any statistical test using linked microsatellite data has to distinguish between these two scenarios.

#### False positives in first-pass genome scans:

We determined the false-positive rate for a first-pass genome scan using linked microsatellites by comparing two neutral data sets (data set **N**). In this case the false-positive rate of the Monte Carlo test strategy equals 5% by construction, since the critical value is determined from the same neutral simulations. However, the false-positive rate for data set **N** can be determined for the approximate analytical test strategy. We find that, irrespective of the mutation and recombination rates, the number of false positives is close to the expected value of 5% (Table 2a). Only for very low recombination rates (2 × 10^{−4}) is the false-positive rate slightly above 5%.

#### False positives in a candidate region analysis:

The analysis of linked microsatellite data for a fine-scale candidate region analysis is more complicated. Depending on the choice of the level α of the test, each genome scan will identify a fraction of putatively selected loci, even if both populations have been evolving neutrally. Thus, an important question is whether the analysis of flanking microsatellite loci can reduce the number of false positives. We used a neutral data set consisting of 10,000 simulations of linked microsatellites and selected those runs for which the ln *R*θ-values of one particular locus fell in the lower 5% tail (data set **N**_{L}, as explained in materials and methods). The resulting 500 simulations all had low ln *R*θ-values at the same locus and were subjected to the multilocus test. The results show that the analysis of linked loci drastically reduces the number of false positives. Compared to an analysis of linked loci without *a priori* information (“first-pass genome-scan” scenario), the false-positive rate generally increased less than twofold (Table 2b). Interestingly, even for low recombination the approximate analytical test yielded only 10.6% false positives. Hence, the analysis of flanking markers can be an efficient tool to enhance the specificity of hitchhiking mapping studies.

#### Dependence on the number and spacing of microsatellites:

In the analyses described above, we considered 29 loci evenly spaced over 600 kb of genomic DNA, with the target of selection coinciding with one of the microsatellite loci (Figure 2, marker topology A). We also tested how the number and distribution of microsatellites affect power and false-positive rate. First, we analyzed the case in which the target of selection falls midway between two microsatellites. The region was again 600 kb long and contained 30 microsatellite loci spaced 20 kb apart (Figure 2, marker topology B). We observed a slight reduction in power, which was pronounced only for high recombination rates with the approximate analytical test strategy. On the other hand, the false-positive rate is slightly increased for low recombination rates. In general, the loss in power and the increase in false positives do not severely compromise the test (supplemental Table S3 at http://www.genetics.org/supplemental/). Thus, the position of markers relative to the target of selection has no major effect on the test statistic.

Second, we varied the physical distance between microsatellites. Although this analysis is equivalent to varying the recombination rate, we included it as a guideline for experimental design, because experimentalists often want to know to what extent the additional expense of a higher marker density is offset by an increase in statistical power. Our analysis of different recombination rates already indicated that more densely distributed microsatellites might yield higher power. We investigated this effect in detail by fixing the recombination rate, the selection coefficient, and the number of markers (supplemental Table S2 at http://www.genetics.org/supplemental/). Consistent with our results for different recombination rates, we also observe an increase in power with marker density. The trade-off is an increase in the false-positive rate for tightly spaced markers. However, even for a marker distance as small as 5 kb, the false-positive rate in the candidate region scenario is only 10.8% when the approximate analytical test strategy is applied.

Third, we determined the influence of the number of loci genotyped. We analyzed increasing numbers of loci (5, 10, 15, and 20), keeping the spacing between loci constant at 10 kb. As expected, we observe no reduction in statistical power when fewer loci are available. With the approximate analytical test strategy, the false-positive rate in the candidate region scenario also remains below 7%. For the Monte Carlo test strategy, however, the false-positive rate increases almost fourfold when only five markers are available (supplemental Table S1 at http://www.genetics.org/supplemental/).

Finally, we determined power and false-positive rate of the multilocus test for a set of 15 microsatellites with physical distances as in the experimental data set from Harr *et al*. (2002) (Figure 2, marker topology C). In line with the above results we find that the power of the test depends most critically on the recombination rate, with low power for high recombination rates. The number of false positives remains under 10.6% for all cases investigated (supplemental Tables S4a–S4c at http://www.genetics.org/supplemental/).

#### Impact of demography:

Until now, we were considering only stable populations with no changes in population size. Most genome scans for selected genomic regions are, however, performed in populations that experienced a recent reduction in population size. While genome scans using the ln *R*θ-test statistic for unlinked loci are not strongly affected by changes in population size, an analysis of linked loci needs to account for demographic events.

Typically, three parameters are used to characterize population bottlenecks: the depth of the bottleneck (*d*), the duration of the bottleneck (δ), and the time τ at which the bottleneck was completed, looking backward from the present. We also call the latter the time of onset. The depth is the factor by which the original population size *N* is reduced during the bottleneck. To shrink the parameter space, we use the ratio γ = δ/*d* of bottleneck duration and depth, called severity, and fix the duration at δ = 10^{−5} (in units of 2*N*). This simplification is unproblematic for typical parameter values, but it becomes incorrect for very long and shallow bottlenecks. As far as our test is concerned, however, such cases do not inflate the false-positive rate (see supplemental Figure S1 at http://www.genetics.org/supplemental/). Very recent and very severe bottlenecks result in a high proportion of invariant loci (>15%). For such demographic scenarios the ln *R*θ-statistic is not applicable, since the high number of invariant loci makes ln *R*θ nonnormally distributed (Schlötterer 2002). Nevertheless, such extreme bottlenecks can easily be distinguished from selective sweeps by the large number of invariant loci occurring throughout the entire genome. For old bottlenecks, the multilocus test did not produce an increased number of false positives, irrespective of their severity. Recent bottlenecks with a low severity (γ < 0.2) were also unproblematic. In contrast, bottlenecks with intermediate severities (γ between 0.5 and 2) produce a signal that the multilocus test often misinterprets as a deviation from neutrality. In the worst case (γ = 2, τ = 0.002) we detected 40.2% false positives (Table 3a, Figure 3B, and supplemental Table S4d at http://www.genetics.org/supplemental/).

We also tested to what extent population bottlenecks compromise the ability to reduce the number of false positives in a set of candidate loci derived from a first-pass genome scan. As for the constant-population case described above, we selected those data sets for which the ln *R*θ-value of one particular locus fell in the lower 5% tail. In contrast to the previous analysis, this lower 5% tail was taken from the bottleneck simulations. As expected, for old bottlenecks and low severities the analysis of linked loci resulted in a considerable reduction of false positives. Bottlenecks with intermediate severities remain problematic because they are hard to distinguish from selective sweeps. Nevertheless, even in the worst case the analysis of linked loci reduces the proportion of false positives from 100 to 58.6% (Table 3b, Figure 3B, and supplemental Table S4e at http://www.genetics.org/supplemental/).
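A standard coalescent argument suggests why severity, rather than depth or duration alone, governs the outcome. This illustration is ours and is not the simulation procedure behind Table 3. It assumes that *d* is the relative population size during the bottleneck and that time is measured in units of 2*N* generations. Under these assumptions two lineages coalesce at rate 1/*d* during the bottleneck, so the probability that they coalesce within it is 1 − exp(−δ/*d*) = 1 − exp(−γ). The function name is hypothetical.

```python
import math

def pairwise_coalescence_probability(severity: float) -> float:
    """Probability that two lineages coalesce during a short bottleneck.

    Assumes time in units of 2N generations and a bottleneck of duration delta
    at relative size d, short enough that the coalescence rate 1/d applies
    throughout; the probability is then 1 - exp(-delta / d) = 1 - exp(-gamma).
    """
    if severity < 0:
        raise ValueError("severity must be non-negative")
    return 1.0 - math.exp(-severity)

delta = 1e-5  # fixed bottleneck duration (units of 2N), as in the text
for gamma in (0.2, 0.5, 1.0, 2.0):
    depth = delta / gamma  # depth d implied by gamma = delta / d
    prob = pairwise_coalescence_probability(gamma)
    print(f"gamma={gamma:.1f}  d={depth:.1e}  P(coalesce)={prob:.3f}")
```

For γ between 0.5 and 2 this probability lies between about 0.39 and 0.86. Some genealogies are therefore forced through the bottleneck while others escape it. This mixture of outcomes is one plausible reason why intermediate severities inflate the variance in heterozygosity among loci (Figure 5).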

#### Application:

We analyzed a genomic region that had previously been identified as a putative target of a selective sweep. Notably, although linked microsatellites had been genotyped in that study, only a single locus was found to deviate from neutral expectations. We genotyped the same 15 microsatellites that were analyzed previously by Harr *et al*. (2002) in one additional European (Poland) and one African (Uganda) population. Figure 4 shows the ln *R*θ-values for all 15 loci, which cover a 600-kb region on the third chromosome of *D. melanogaster*. Similar to the results of Harr *et al*. (2002), we found one locus with a pronounced reduction in variability. Our multilocus test statistic was *T*(10) = −5.795. For a level of α = 0.05 the quantile is *q*_{theo}(α, 10) = −5.201, and the *P*-value is *P* = 0.0334 [*i.e*., the cumulative probability of the normal distribution with mean 0 and variance 10 evaluated at *T*(10) = −5.795]. Therefore, the reduction of variability at the 10 loci adjacent to the one with the strongest reduction of ln *RH* provided sufficient information to reject the null hypothesis of neutrality.
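The reported quantile and *P*-value follow from the theoretical null distribution of *T*(*K*), which is a normal distribution with mean 0 and variance *K*. The minimal sketch below reproduces them with Python's standard-library `statistics.NormalDist`. That class is parameterized by the standard deviation, so the code passes √*K*. The function names are our own. Reduced variability yields negative values, so the test is one-sided in the left tail.

```python
from statistics import NormalDist

def multilocus_p_value(t_obs: float, k: int) -> float:
    """Left-tail P-value of T(K) under the theoretical null N(0, K)."""
    return NormalDist(mu=0.0, sigma=k ** 0.5).cdf(t_obs)

def multilocus_quantile(alpha: float, k: int) -> float:
    """Critical value q_theo(alpha, K); neutrality is rejected if T(K) <= q."""
    return NormalDist(mu=0.0, sigma=k ** 0.5).inv_cdf(alpha)

t_obs, k, alpha = -5.795, 10, 0.05  # values reported for the 600-kb region
q = multilocus_quantile(alpha, k)
p = multilocus_p_value(t_obs, k)
print(f"q_theo = {q:.3f}, P = {p:.4f}, reject = {t_obs <= q}")
# prints: q_theo = -5.201, P = 0.0334, reject = True
```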

Given that the habitat expansion of *D. melanogaster* was associated with a pronounced population bottleneck, it is important to consider the impact of this bottleneck on the multilocus test. For instance, the population bottleneck parameters inferred by Haddrill *et al*. (2005) for non-African populations of *D. melanogaster* would yield a false-positive rate of 20.6% in the genome-scan scenario and of 49.6% in the candidate locus scenario. If a similar parameter range applies to European populations of *D. melanogaster*, the odds would be ∼1:2 that the observed data are due to a bottleneck rather than to a selective sweep.

## DISCUSSION

Neutrality tests based on microsatellites focus mainly on unlinked loci and are based on either reduced variability (ln *R*θ) or population differentiation (*F*_{ST}). Tests of linkage disequilibrium are rarely used for microsatellite data because haplotypes are difficult to infer for multiallelic loci. Furthermore, tests of linkage disequilibrium are extremely sensitive to demographic history. In contrast, our proposed test uses ln *R*θ at several linked loci to infer selection. The single-locus ln *R*θ-test is largely independent of the demographic past. The additional power gained from linked loci, however, comes at the cost of an increased dependence on it.

Previously, an alternative method using linked microsatellites to identify selection was proposed (Pollinger *et al*. 2005). The authors showed that a pronounced reduction in variability at three linked microsatellite loci provides a very strong signal of a selective sweep. With this test strategy, however, it is not clear how to obtain a theoretical null distribution. A general method for the identification of selection at linked microsatellites should be more flexible and cope with the following challenges:

- The method should be independent of the number of microsatellites genotyped, as the target of selection is often not known.
- It should be robust to heterogeneity in microsatellite density, which can cluster markers and thus lead to false positives.
- It should filter out mutation-rate differences between loci, which often span orders of magnitude. Low variability caused by selection can then be distinguished from low variability caused by a low mutation rate.
- It should be suitable for a candidate region study that confirms or dismisses a selective sweep detected by a single-locus test in a genome scan.

Our proposed multilocus test is designed to meet these criteria. Its key property is independence from an *a priori* choice of the number of loci to be typed. The number of loci is instead determined dynamically, *a posteriori*. The multilocus test can therefore be applied to any number of available loci and allows for an arbitrary spacing of markers. Nevertheless, if the genotyped markers do not adequately cover the selected region, the power of the test statistic may be reduced, as only a fraction of the loci carrying the signal of the selective sweep are included. Furthermore, because the ratio of variability in two populations is considered for each locus, mutational differences between loci are mostly eliminated (Equation 1), at least as far as first-order moments are concerned. Therefore, the test statistic *T*(*K*), as described above, depends only marginally on the mutation rate θ. We have analyzed heterozygosity (ln *RH* in the notation of Schlötterer 2002) as well as the variance in repeat number at microsatellites (ln *RV*) as measures of variability. We find qualitatively corresponding results for ln *RV* and ln *RH*. However, the multilocus test is generally more powerful and has a similar or even lower false-positive rate for ln *RH* than for ln *RV*. Therefore, we report here only the results for ln *RH*. Finally, as our multilocus test is based only on markers flanking the candidate locus, the false-positive rate can be kept low and is in fact close to the theoretical level α of the test. This strategy is conservative and leads to a minor loss in power compared to an approach that uses all markers. Nevertheless, the power to detect recent sweeps remains close to 100%, except in regions of high recombination. Power does, however, decrease drastically with the age of the selective sweep.
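The sketch below illustrates why a per-locus ratio removes first-order mutation-rate differences. It assumes the stepwise-mutation-model relation θ = ½[1/(1 − *H*)² − 1] between θ and expected heterozygosity *H*, and it computes ln *RH* as the log ratio of the two populations' θ estimates. The choice of estimator is our assumption rather than a restatement of Equation 1. The effective sizes and mutation rates in the example are hypothetical. With θ = 4*N*μ at each locus, μ cancels from θ₁/θ₂.

```python
import math

def theta_from_heterozygosity(h: float) -> float:
    """Stepwise-mutation-model estimate: theta = 0.5 * (1 / (1 - H)**2 - 1)."""
    if not 0.0 < h < 1.0:
        raise ValueError("heterozygosity must lie strictly between 0 and 1")
    return 0.5 * (1.0 / (1.0 - h) ** 2 - 1.0)

def expected_heterozygosity(theta: float) -> float:
    """Inverse of the relation above: H = 1 - 1 / sqrt(1 + 2 * theta)."""
    return 1.0 - 1.0 / math.sqrt(1.0 + 2.0 * theta)

def ln_rh(h_pop1: float, h_pop2: float) -> float:
    """Log ratio of the two populations' theta estimates at one locus."""
    return math.log(theta_from_heterozygosity(h_pop1) / theta_from_heterozygosity(h_pop2))

# Hypothetical effective sizes: population 1 is half the size of population 2.
n_pop1, n_pop2 = 5_000, 10_000
for mu in (1e-5, 1e-3):  # two hypothetical loci with a 100-fold rate difference
    h1 = expected_heterozygosity(4 * n_pop1 * mu)
    h2 = expected_heterozygosity(4 * n_pop2 * mu)
    print(f"mu={mu:.0e}  H1={h1:.3f}  H2={h2:.3f}  ln RH={ln_rh(h1, h2):.4f}")
```

Both hypothetical loci give ln *RH* = ln 0.5 ≈ −0.6931, even though their heterozygosities differ widely. The cancellation holds only for expected values. Sampling variance still depends on θ, which is why the argument is limited to first-order moments.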

As in all tests that use information from linked sites, the most difficult problem remains to disentangle the effects of a selective sweep from those of a recent bottleneck of intermediate severity. If the demographic past and other population parameters are known, the Monte Carlo test strategy can be used to determine the distribution of *T*(*K*) by simulation. As this information is often not available, we focused on an alternative test strategy. Assuming no linkage among the microsatellites, we determined the increase in false positives when one population experienced a change in population size. We found that bottlenecks of intermediate severity resulted in the most pronounced increase in false positives. Interestingly, a recent reanalysis of DNA sequence polymorphism data in *D. melanogaster* (Haddrill *et al*. 2005) also identified a demographic scenario compatible with data in which multiple selective sweeps had previously been detected (Glinka *et al*. 2003). Recent population bottlenecks with a severity roughly between 0.5 and 2 can produce a genomic signature that is easily misinterpreted as selection. Similar results have been obtained with sequence variability data instead of microsatellites. Jensen *et al*. (2005) noted that the composite-likelihood-ratio test of Kim and Stephan (2002) may yield a false-positive rate of up to 90% for certain parameter combinations.
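When the demographic history is known, the Monte Carlo strategy replaces the theoretical N(0, *K*) null with simulated values of *T*(*K*). The sketch below covers only the final step: an empirical left-tail *P*-value computed with the common (1 + count)/(1 + *n*) convention. `simulate_null` and the Gaussian stand-in are hypothetical placeholders. A real analysis would use a coalescent simulator of linked microsatellites under the inferred bottleneck, which is not reproduced here.

```python
import random
from typing import Callable, List

def monte_carlo_p_value(t_obs: float, null_draws: List[float]) -> float:
    """Left-tail Monte Carlo P-value: (1 + #{T_sim <= T_obs}) / (1 + n)."""
    if not null_draws:
        raise ValueError("need at least one simulated replicate")
    extreme = sum(1 for t in null_draws if t <= t_obs)
    return (1 + extreme) / (1 + len(null_draws))

def simulate_null(simulate_t: Callable[[random.Random], float], n: int, seed: int = 1) -> List[float]:
    """Draw n replicate statistics from a (hypothetical) simulator of T(K)."""
    rng = random.Random(seed)
    return [simulate_t(rng) for _ in range(n)]

# Placeholder simulator: T(K) ~ N(0, K) with K = 10, i.e., the theoretical
# no-linkage null. A real analysis would simulate linked loci under the
# inferred demography instead.
K = 10
draws = simulate_null(lambda rng: rng.gauss(0.0, K ** 0.5), n=10_000)
print(f"Monte Carlo P = {monte_carlo_p_value(-5.795, draws):.4f}")
```

With the Gaussian stand-in, the result should lie close to the theoretical *P* = 0.0334. Draws from simulations of a bottleneck of intermediate severity would be expected to widen the left tail. The same observed *T*(10) would then receive a larger *P*-value.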

It can be shown that a high false-positive rate of our multilocus test for the mentioned range of bottleneck severities is associated with a high variance in heterozygosity at the microsatellite loci. Recent population bottlenecks of intermediate severity can lead to a more than fourfold increase of the variance of heterozygosity compared to the variance in a constant-size population (Figure 5). In a very similar manner, selective sweeps, while decreasing average heterozygosity, also lead to an increase of the variance of heterozygosity.

Our test assumes that variability data from two genetically isolated populations are compared. Migration between populations generally results in a more distant common ancestor and increased genetic variability within a population. Therefore, migration might reduce the power of our test but should not increase its false-positive rate. Nevertheless, the impact of extensive gene flow on the false-positive rate and power of the test needs further investigation. Furthermore, classical hitchhiking models, including the one considered here, assume that the selected allele is codominant. As pointed out by Teshima *et al*. (2006), recessivity or dominance may have a nonnegligible effect on tests for selective sweeps.

## Acknowledgments

We thank M. Thomas for many stimulating discussions and two anonymous reviewers for helpful comments. This work has been supported by grants from the Fonds zur Förderung der wissenschaftlichen Forschung to C.S. and from the German Ministry of Education and Research (FK 0312705A) and the German Science Foundation (DFG-SFB680) to T.W.

## Footnotes

Communicating editor: R. Nielsen

- Received July 18, 2006.
- Accepted October 14, 2006.

- Copyright © 2007 by the Genetics Society of America