## Abstract

Neutrality tests based on the frequency spectrum (*e.g*., Tajima's *D* or Fu and Li's *F*) are commonly used by population geneticists as routine tests to assess the goodness-of-fit of the standard neutral model on their data sets. Here, I show that these neutrality tests are specific instances of a general model that encompasses them all. I illustrate how this general framework can be taken advantage of to devise new more powerful tests that better detect deviations from the standard model. Finally, I exemplify the usefulness of the framework on SNP data by showing how it supports the selection hypothesis in the lactase human gene by overcoming the ascertainment bias. The framework presented here paves the way for constructing novel tests optimized for specific violations of the standard model that ultimately will help to unravel scenarios of evolution.

THE standard models of population genetics (*i.e*., the Wright–Fisher model and related ones) constitute null models for which an amazing amount of theory has been developed. Population geneticists have used some aspect of the theory (*e.g.*, summary statistics) to test the goodness-of-fit of the standard model on a given data set. Rejection of the standard model typically suggests that alternative hypotheses, such as selection or demographic history, have to be accounted for. Although they test for more than neutrality, tests that compute the goodness-of-fit of the standard model have been referred to as “neutrality tests.” Since different neutrality tests have varying sensitivity to different violations of the standard model, one typically uses a plethora of tests on the data set of interest. One then hopes that the evolutionary processes that generated the data set will be, at least partially, uncovered by the tests. Although neutrality tests based on population samples exhibit important diversity, they can be assigned to families such as “haplotype tests” (*e.g.*, Fu 1997; Depaulis and Veuille 1998) that use the distribution of haplotypes, “tree shape tests” that try to capture specific tree deformations (*e.g.*, Ramos-Onsins and Rozas 2002), and “frequency spectrum tests” that are based on the frequency spectrum (*e.g.*, Tajima 1989; Fu and Li 1993b; Fay and Wu 2000; Achaz 2008).

In this study, I investigate neutrality tests based on the frequency spectrum (hereafter referred to simply as neutrality tests) and show that they are all specific instances of a general framework. Neutrality tests compare two estimators of the population mutation parameter θ that characterizes the mutation–drift equilibrium. It is defined as θ = 2*pN*_{e}μ, where *p* is the ploidy (1 for haploids and 2 for diploids), *N*_{e} is the effective population size, and μ is the locus neutral mutation rate. When the standard model is true, the expectations of the several unbiased estimators of θ are equal.

Typical estimators of θ, in a sample of *n* sequences, are , where *S* is the number of polymorphic sites and (Watterson 1975), and , where π is the average pairwise difference between all sequences in the sample (Tajima 1983). If an outgroup is available, mutations at frequency *i*/*n* can be distinguished from mutations at frequency 1 − *i*/*n*. Following Fu (1995)'s notations, **ξ** is a vector that represents the unfolded frequency spectrum composed of ξ_{i}, the number of polymorphic sites at frequency *i*/*n* in the sample (*i* ∈ [1, n − 1]). When no outgroup is available, the frequency spectrum is folded and is given by a vector **η**, composed of η_{i}, the number of polymorphic sites at both frequencies *i*/*n* and 1 − *i*/*n*. Accordingly, it has been shown that θ can be estimated from , with ξ_{1} the number of derived singletons (Fu and Li 1993b), from , with η_{1} the total number of singletons (derived and ancestral) (Fu and Li 1993b), and from (Fay and Wu 2000). Recently, it has been suggested that singletons should be ignored when θ is estimated in samples with sequencing errors; this leads to estimators such as , and (Achaz 2008). Other estimators of θ, such as and , were designed to minimize their variance (Fu 1994b), although they can be computed using recursions only for a given value of θ.

Neutrality tests compute the goodness-of-fit of a statistic *T*, which is the difference between two estimators of θ, normalized by its standard deviation:(1)

For a given θ, under the standard model, *T* has a mean of *E*[*T*] = 0 and a variance of Var[*T*] = 1. Lowercase letters (*e.g.*, *t*) denote the absolute difference (*i.e*., the numerator only) and uppercase letters (*e.g.*, *T*) denote the normalized difference (Equation 1) throughout this work. Interestingly, the variance in the denominator is a function of both θ and θ^{2}. Because θ is unknown, the denominator cannot be computed as such. In practice, unbiased estimators of θ and θ^{2} must be used instead. Because the variance of vanishes asymptotically in a very large sample (), θ and θ^{2} are, in practice, substituted by estimators based on *S* (Tajima 1989), which changes the mean and the variance of *T* to *E*[*T*] ≈ 0 and Var[*T*] ≈ 1.

Tajima's *D* (Tajima 1989) is defined by ; the statistics proposed by Fu and Li (1993b) are , , , and . Another classical statistic is (Fay and Wu 2000), even though its variance was not given by the authors. Finally, two other related neutrality tests that are, *a priori*, immune to sequencing errors were proposed: and (Achaz 2008). Other tests based on θ_{ξ} and θ_{η} (which are optimized for a given θ-value) as well as the difference between the observed and the expected values of the frequency spectrum were also proposed (Fu 1996).

Here, I show that when using a general weighted linear combination of (or when no outgroup is available), any estimators of θ [*i.e*., ] and consequently any neutrality tests can be derived. Nawa and Tajima (2008) recently advocated the use of the spectrum, which is expected to be uniform under the standard model, as a visual test for neutrality instead of the classical frequency spectrum. This last proposal is in complete agreement with the current work. Importantly, it has been previously reported that some θ-estimators and neutrality tests could be expressed as specific linear combinations of ξ_{i} or η_{i} (Tajima 1997; Wakeley 2009). Furthermore, Fu (1997) shows that several θ-estimators can be expressed as specific linear combinations of () or in a related framework that uses instead of . was subsequently designed as (Fay and Wu 2000). However, some estimators (like , , or ) cannot be expressed using the Fu (1997) framework. To the best of my knowledge, no previous study has explicitly derived the framework presented here. No work has yet highlighted the striking simplicity of θ-estimators and related tests, when expressed in this framework. I further show how the use of such a simple framework greatly facilitates the study of previous θ-estimators and their related neutrality tests and how it opens the door for constructing yet undiscovered interesting θ-estimators and neutrality tests with enhanced power.

## MODEL

#### With an outgroup:

According to Fu (1995), we know that(2)(3)(4)where σ_{ii} and σ_{ij} depend only on *n* and are given in Equation 2 of Fu (1995). This shows that *E*[*i*ξ_{i}] = θ and therefore that any ξ_{i} can be used to construct an unbiased estimator of θ:(5)

Consequently, a linear combination of the 's (in which the weights sum to 1) is also an unbiased estimator of θ. Mathematically, it is expressed as(6)

where ω_{i} is the weight of each in the combined estimator. Therefore, any estimator based on the frequency spectrum can be solely described by an -vector. Importantly, it should be mentioned that Fu (1997) also proposed a linear combination of *i*ξ_{i}, but in which only a subset of the weight vectors was used. Namely, the proposed weight vectors were restricted to ω_{i} = *i*^{−}* ^{x}*.

Using Equations 3 and 4 the variance of can be shown to be(7)

Following Tajima (1989), using Equation 1, one can compute a normalized statistic that is, in the general framework,(8)which can be expressed as a function of an Ω-vector,(9)with

The Ω-vector results from the difference between two weight vectors normalized to 1. As a consequence, (1) all elements of the Ω-vector sum to 0 and (2) the sum of all positive values cannot be >1 and the sum of all negative values cannot be < −1. Any vector that fits these two constraints can be considered, along with Equation 9, as a neutrality test.

#### Without an outgroup:

If no adequate outgroup is available, the unfolded frequency spectrum and consequently the spectrum, cannot be computed. This implies that one has to use the folded frequency spectrum. Following Fu (1995), we define and therefore we have(10)(11)(12)where δ_{i,n−i} is a Kronecker delta (1 if *i* = *j*, and 0 otherwise) and where

Although, we cannot compute the spectrum (as defined above), we can compute a folded spectrum defined as(13)

This folded spectrum is the visual neutrality test proposed by Nawa and Tajima (2008). Using a similar reasoning to that above, a linear combination of leads to a generic unbiased estimator of θ defined as(14)

whose variance is given by(15)

Consequently, the corresponding neutrality test is(16)with

It is important to mention that Tajima (1997) previously showed that *D*, *F**, and could be expressed as a linear combination of η_{i}. More precisely, the vectors used then correspond in the present framework to . This vector definition emphasizes the weight on each η_{i} rather than on each .

#### With or without an outgroup:

Using both definitions of (Equation 5) and (Equation 13), it is easy to show that we have(17)

As a consequence, the use of an -vector along with the folded frequency spectrum is equivalent to the use of an -vector with the unfolded frequency spectrum only when we have(18)

This makes clear that there is an equivalent -vector for any -vector that adheres to the following constraint:(19)

To fold the frequency spectrum, the weight *i*ω_{i} associated with ξ_{i} (and not with ) has to be the same as the weight (*n* − *i*)ω_{n−i} associated with ξ_{n−i}. This translates into an *i*ω_{i} vector that is symmetric around *n*/2. Furthermore, when the constraint (expressed in Equation 19) is fulfilled, we can write, for any 0 ≤ *f* ≤ 1,which leads interestingly for *f* = (*n* − *i*)/*n* to(20)

The weights on simply result from the sums of the weights on and on that are pooled when the spectrum is folded. In that respect, any -vector complying to Equation 19 can be used without the help of an outgroup. The -vectors are then a subset of all possible values of the -vectors. The former can be computed from the latter by using Equation 18 or 20.

Because is the difference between two normalized -vectors, all relationships between and expressed above also hold for and .

## RESULTS

The model described above shows that all estimators of θ based on the frequency spectrum are linear combinations of , weighted by a specific vector . When no outgroup is available, one can use a linear combination of , weighted by a vector . Consequently, neutrality tests can be expressed as a linear combination of (or ) weighted by a vector (or ), for which a variance can be computed easily. Three applications of the model are developed below. First, I reinvestigate the previous estimators of θ and their corresponding neutrality tests and frame their intrinsic properties in terms of the () spectrum. Then, since previous tests are only specific instances of the framework, I show how the model can be used to build new tests that are more powerful than previous ones. Finally, I exemplify the benefit of the framework on real data that are known to be subject to an ascertainment bias.

#### Previous θ-estimators and neutrality tests:

Using Equation 6, all previously reported θ-estimators are given by an -vector (Table 1). When defined, the corresponding -vectors are also provided (Table 1). A graphical representation of four estimators of θ is shown in Figure 1. Figure 1 highlights that both and emphasize the low-frequency polymorphic sites in their estimation of θ (although not as much as , which is solely based on derived singletons) and that, on the contrary, gives more weight to ancestral polymorphisms. Framed in the folded spectrum, still weights more low plus high frequencies whereas has a uniform weight. Potentially, using other weight vectors, one could express any undiscovered estimator of θ based on the frequency spectrum.

The numerical variances of the previous estimators of θ are reported in Table 1 (for *n* = 30 and θ = 1, 10, 100). They can be computed either by their original derivations or by Equation 7. This clearly shows that, among previous estimators of θ, the variance of is the smallest and that of is the largest. This can be explained by the fact that the variance of increases with *i*. As a consequence , which puts more weight on ancestral alleles, shows a larger variance. Interestingly, estimators without singletons have relatively small variances.

Previous neutrality tests are given in Table 2. A graphical representation of the -vectors (and when defined) used in four previous tests is reported in Figure 2. Figure 2 shows that the sensitivity of the different tests differs although they share some common features. For example, *D* and *F** both are negatively sensitive to both low and high frequencies (although more sensitive to low frequencies). *D* shows opposite sensitivity between medium frequencies and low/high frequency, whereas *F** shows poor sensitivity to medium-frequency polymorphisms. *F* and *F** have opposite effects on doubletons and singletons. Thus, deviations that enhance both will have opposite effects. Finally, *H* is oppositely skewed by low and high frequencies.

One crucial aspect of neutrality tests is their important variance under the neutral model. This variance induces a large confidence interval and therefore decreases their power to detect a deviation. It has been argued that this variance is a consequence of the tree shape variance and that neutrality tests based on the frequency spectrum are doomed to exhibit low power (Felsenstein 1992b).

As a consequence, an ideal neutrality test should minimize its variance under the standard model. The variances of the denominator of previous neutrality tests are given in Table 2 (for *n* = 30 and θ = 1, 10, 100). It is also important to mention that previous derivations of *f*, *f* *, *y*, and *y** variances give different values. Simulations show that the new derivations are the correct ones (supporting information, Table S1). First, it should be noted that the original *D* test has a very low variance when compared to all other tests. This is connected to the low variance of both and . Second, *Y* and *Y* * tests have also a small variance, although they ignore an important fraction of the data (*i.e*., singletons). All other tests have a similar variance.

This predicts that *D* typically will be sensitive to low, medium, and high frequencies and should be more powerful because it has a relatively low variance under neutrality. Therefore, it has the potential to be an excellent neutrality test and it appears that it is often one of the most powerful tests (Simonsen *et al.* 1995; Fu 1997). *H* is sensitive either low or high frequencies; however, its larger variance predicts that it will be useful only when the distortion in the θ-spectrum is very strong. In practice, it is powerful only when there is a large excess of high-frequency polymorphisms. The singleton tests appear to be good candidates to capture an excess of singletons, although they neglect other deviations in the spectrum. The *Y* and *Y* * tests have low variance, although ignoring singletons can lead to low power especially when they are in excess (Achaz 2008).

#### Building new tests:

To design new neutrality tests using this framework I started by analyzing the deviation of the average spectrum, which is expected to be uniform under the standard models. Furthermore, because Fu (1995) showed that the covariance between ξ_{i}'s is weak when compared to their variance, visual inspection of the variance of provides a first approximation to the expected variance of and therefore of their related *T*_{ω} tests. I studied two deviations from the standard model: a severe bottleneck and isolated populations with migration.

The severe bottleneck was simulated as a sudden change of size from *N* chromosomes to *N*/100 that lasts for a time *T*_{l} = 0.1 (in *N* generations). Accordingly, the coalescent rates within the bottleneck are accelerated by 0.01 and the simulations were performed as in Simonsen *et al.* (1995). Sampling was performed after a time *T*_{b} has elapsed after the bottleneck. The mean and the standard deviation of are given in Figure 3a for two times, *T*_{b} = 0.03 and *T*_{b} = 0.3. Figure 3 shows that most of the deviation comes from the sites with low frequency. Therefore, I designed a new test that captures the deviations within low frequencies. In this test, I used a first vector of ω_{1i} = *e*^{−α}* ^{i}*, with α = 0.9 and a second uniform vector ω

_{2i}= 1. This results in an exponentially decreasing weight for low-frequency mutations (Figure 3) that is positive for frequency

*i*/

*n*≤ 0.13. The choice of α = 0.9 was mostly empirical, although using α = 0.8 or α = 1 leads to similar results (data not shown). As stressed in the discussion, this study aims at illustrating how easy it is to create new tests with enhanced power; power optimization deserves an entire new study. A graphical view of the -vector associated with this new

*T*

_{Ω}test is given in Figure 3 and its variance is reported in Table 2. Most of the weight of this test is given to low frequencies and its variance is comparable to those of other neutrality tests. The power of this new test and of

*D*,

*F*, and

*H*is reported in Figure 3. Results show that the new test outperforms the previous tests by 20% and is able to detect the deviation for a longer time.

The 95% confidence intervals were built using coalescent simulations under the standard model, using a fixed number of segregating sites (Hudson 1993; Depaulis and Veuille 1998). Although there has been much debate on how confidence intervals should be set (Depaulis *et al.* 2001; Markovtsova *et al.* 2001; Wall and Hudson 2001), it has been clearly shown that the choice of a particular method does not alter the results in standard models (Ramos-Onsins *et al.* 2007) and therefore is not discussed here.

In the second scenario, I compared the power of neutrality tests in detecting a case of isolation with migration (*e.g.*, Nielsen and Wakeley 2001). In the simulations, the isolation event happened at time *T _{i}* = 3 and both populations were sampled equally (

*n*

_{a}=

*n*

_{b}= 15). The migration rate between the two populations is variable. Similar to the analysis of the bottleneck, I first report the mean and the standard deviation of the spectrum. Figure 4 shows that most of the deviation comes from the sites at frequency 15/30. Additionally, for a small enough migration rate (

*M*= 0.1), there are almost no polymorphisms with frequency >0.5. Although the standard deviations are large, the coefficients of variations (variance/mean) are relatively small. To design a new test, I used for the first -vector the probabilities given by a binomial law, with

*p*= 0.5 and

*n*= 30 and a uniform vector ω

_{2i}= 1 as a second vector.

This was motivated by the idea of designing a test that specifically captures an excess of medium-frequency polymorphisms. A graphical view of the resulting -vector is given in Figure 4 and its variance is given in Table 2. Almost all the weight of this test is given to the 13 < *i* < 17 sites. The variance of this new test is large, and this is related to the large variance of in the sample with even *n*. Despite this large variance, the test clearly outperforms all previous tests (Figure 4).

#### Overcoming the ascertainment bias:

As an example of the power of designing new neutrality tests, I analyzed SNP data (from HapMap) around the Lactase gene (LCT), which has been shown to exhibit a footprint of a recent strong selective sweep in European populations (Bersaglieri *et al.* 2004) as well in eastern African populations (Tishkoff *et al.* 2007). This pattern of recent selection is one of the strongest in the human genome (Nielsen *et al.* 2005). Indeed, it has been advanced that the lactase-persistence phenotype (the ability to digest milk as an adult) has been advantageous in European populations of farmers (especially in Northern European ones). The SNPs that are tightly associated with the selective sweep in Europeans are located at 13–22 kb upstream of the gene start (Bersaglieri *et al.* 2004). From HapMap (release 27, February 2009) I gathered all SNPs in a window of 100 kb centered at the start of the lactase gene. This includes 50 kb upstream and the entire gene. I considered only SNPs whose sample size was at least 85 chromosomes. Because the sample size of all SNPs was not identical, I used the observed frequencies to generate a folded frequency spectrum of 85 chromosomes for the following populations: Utah residents with northern and western European ancestry from the CEPH collection (CEU); Han Chinese in Beijing, China (CHB); Japanese in Tokyo, Japan (JPT); and Yoruban in Ibadan, Nigeria (YRI).

According to the literature, one expects to find a trace of an ongoing selective event in the CEU population only. Without the help of an outgroup, this would translate into an excess of low-frequency polymorphism in the folded frequency spectrum (typically, negative *D*, *F**, and ). Computation of the standard neutrality tests shows a deficit of low-frequency polymorphism rather than an excess. This deficit is even often significant (Table 3). This is clearly caused by the ascertainment bias in the data set. Because the polymorphisms were first screened in a small group and further genotyped in larger groups, rare variants are underrepresented (*e.g.*, Kuhner *et al.* 2000; Clark *et al.* 2005). This ascertainment bias has been subject to various corrections (*e.g.*, Wakeley *et al.* 2001; Nielsen *et al.* 2004). To avoid any correction, I computed a test where the weights of both and vectors were set to 0 for *i* < 8. The remaining two vectors were computed using and . As a consequence, this test is *D*-like in that it considers only polymorphisms with frequencies in the range [0.09, 0.91]. This is reminiscent of ignoring the singletons data set where sequencing errors are suspected (Achaz 2008). Results (Table 3) show that this test significantly deviates from the standard model for the CEU population. Ignoring fewer polymorphisms (*e.g.*, only the 5% that are of low frequency) or changing the minimum sample size leads to similar results (data not shown).

## DISCUSSION

Here I developed a unifying framework for θ-estimators on the basis of the frequency spectrum. Namely, all known estimators of θ are linear combinations of (or ). Because neutrality tests based on the frequency spectrum are simple functions of these θ-estimators, the framework can be used to derive them. All tests (of this family) proposed so far are embedded in the framework. Using the model, I have shown that estimators of θ based on a folded spectrum always have an unfolded equivalent. The reciprocal, however, is not true.

Besides its unifying appeal, the model developed here can be used in several ways. First, I showed how it can be used to compute the variance of all estimators of θ and consequently of statistics such as . All variances of all estimators can be computed either using this framework or from their previous derived analytical formula. The same should be true for all *t*. Importantly, the computation of *f*, *f* *, *y*, and *y** revealed differences between both methods. Simulations demonstrate that the previous formulas were not correct while the new ones are. Besides a minor error in the *f* and *f* * variance (corrected in Simonsen *et al.* 1995), it appears that the Cov[π, ξ_{1}] that was derived by Fu and Li (1993b) is inexact. Therefore the variances of *f* and *f* * (Fu and Li 1993b) as well as the variances of *y* and *y** (Achaz 2008) that were using this covariance carried along the error. Framed within the model presented here, all variances are correct. Finally, it can be used to compute the variance of *h* that was not given by the authors (Fay and Wu 2000).

One potentially interesting development is to find an ω-vector that minimizes variance of the associated estimator of θ. This problem was previously addressed thoroughly (Felsenstein 1992a,b; Fu and Li 1993a; Fu 1994a,b). Indeed, it was shown that phylogenetic estimates have lower variance than estimators based on summary statistics (Felsenstein 1992b; Fu and Li 1993a; Fu 1994b). Moreover, Fu (1994a,b) proposed a general method to find weight vectors that minimize the variance of the estimators and showed that the best vector actually depends on the value of θ itself. Nonetheless, it remains true that some estimators have less variance than others (*i.e*., *vs.* ), whatever is the value of θ. This latter observation suggests that re-exploring this question of minimizing the variance may be of interest.

Nawa and Tajima (2008) recently proposed to use the spectrum instead of the classical frequency spectrum as a visual test for neutrality. This can be extended to the unfolded spectrum if an outgoup is available. The study presented here fully supports this idea. The visual inspection of the spectrum indicates why some tests will reject neutrality. Contrary to what intuition may suggest, when one is interested in θ-estimation, the appropriate representation for weight vectors is the -vector as defined above rather than weights on the ξ_{i} themselves (or on the η_{i} as in Tajima 1997).

When an outgroup is used to unfold the spectrum, the choice of the appropriate outgoup is of critical importance. If the outgroup is not adequate (too distant or too close), misoriented sites will have a disastrous effect on θ-estimations and therefore on related neutrality tests (Baudry and Depaulis 2003). This adds to the difficulty of using tests based on the full ξ-spectrum. However, when low and high frequencies can be sorted apart, much power is gained in terms of choosing the adequate evolutionary scenario. For example, no high frequencies are overrepresented under recent growth or severe bottlenecks.

Specific problems that concern only some area of the spectrum can be handled easily by setting to 0 all weights in the suspicious area. For example, the sequencing errors can be avoided when the singletons are ignored (Achaz 2008). With the current framework, by ignoring the low-frequency polymorphisms, the ascertainment bias can be overcome and the pattern expected from selection at the lactase gene appears. This strategy has endless extensions as long as we have some prior knowledge of the suspicious area.

Finally, I think that this framework opens the door for new estimations of θ and the related neutrality tests. Using simple examples, I show how the power of neutrality tests can easily be improved to detect deviations from the standard model. To optimize the power of the future new tests, one could (1) minimize their variance under the standard model, (2) select their area of sensitivity on the basis of prior knowledge of the impact of specific deviations, and (3) use recombination estimates to compute smaller confidence intervals (Wall 1999) (because recombination results in quasi-independent replicates that lower the variance of the θ-estimators). By building specific tests that will be sensitive to specific deviations, one could envision how several selected tests will be able to help the population geneticist to choose between different possible scenarios for a given data set. Another interesting alternative would be to use the different θ-estimators as summary statistics to infer the best parameters for a given evolutionary scenario (*e.g.*, using ABC analysis).

The source code for this study was designed as a C++ library for the simulations and a C library for sequence analysis and is available upon request. A dedicated web version of the tests is available at http://wwwabi.snv.jussieu.fr/achaz/neutralitytest.html. Furthermore, the tests will be incorporated in a future release of DNAsp.

## Acknowledgments

I thank F. Tajima, E. P. C. Rocha, J. Wakeley, P. Nicolas, and D. Higuet for their interesting comments on the manuscript and T. Treangen for English language improvement. I also thank two anonymous reviewers for their constructive comments. This work was supported by grant 07-GMGE-004-04 from the Agence Nationale de la Recherche.

## Footnotes

Supporting information is available online at http://www.genetics.org/cgi/content/full/genetics.109.104042/DC1.

Communicating editor: N. Takahata

- Received April 16, 2009.
- Accepted June 12, 2009.

- Copyright © 2009 by the Genetics Society of America