## Abstract

We investigate the dependence of the site frequency spectrum on the topological structure of genealogical trees. We show that basic population genetic statistics, for instance, estimators of *θ* or neutrality tests such as Tajima’s *D*, can be decomposed into components of waiting times between coalescent events and of tree topology. Our results clarify the relative impact of the two components on these statistics. We provide a rigorous interpretation of positive or negative values of an important class of neutrality tests in terms of the underlying tree shape. In particular, we show that values of Tajima’s *D* and Fay and Wu’s *H* depend in a direct way on a peculiar measure of tree balance, which is mostly determined by the root balance of the tree. We present a new test for selection in the same class as Fay and Wu’s *H* and discuss its interpretation and power. Finally, we determine the trees corresponding to extreme expected values of these neutrality tests and present formulas for these extreme values as a function of sample size and number of segregating sites.

COALESCENT theory (Kingman 1982; Hein *et al.* 2004; Wakeley 2009) provides a powerful framework to interpret the mutation patterns in a sample of DNA sequences. Grounded in the neutral theory of molecular evolution (Kimura 1985), binary coalescent trees are the dual backward representations of the continuous-forward-time diffusion model of genetic drift. In this view, sequences are related by a genealogical tree where leaf nodes represent the sampled sequences at present time, and internal nodes (coalescent events) represent last common ancestors of the leaves underneath. In particular, the root node represents the most recent common ancestor of the whole sample.

In species phylogeny and epidemiology, tree structure is often used to compare different models of evolution or to fit model parameters (Bouckaert *et al.* 2014). Two summary statistics are routinely used to characterize tree structure: the *γ* statistic relates to the waiting times (Pybus *et al.* 2000) and the *β* statistic to tree balance (Blum and François 2006). Importantly, these statistics can only be computed after the tree structure was independently inferred, typically by phylogenetic reconstruction methods (Felsenstein 2004).

In population genetics, the historical relationship among nonrecombining sequences is represented by a single genealogical tree. The tree is completely determined by the waiting times and the branching order of coalescent events. The waiting times determine branch lengths; the branching order determines tree shape. Population genetic statistics, such as estimates of the scaled mutation rate or tests of the neutral evolution hypothesis (neutrality tests) are sensitive to waiting times and tree shape.

The site frequency spectrum (SFS) is one of the most-used statistics in population genetics. The unfolded SFS of a sample of *n* sequences is defined as the vector of counts of all polymorphic sites with a derived allele (“mutation”) at frequency The SFS is a function of both tree structure and mutational process. For a given mutational process, the SFS carries information on the underlying, but not directly observable, genealogical trees and therefore on the forward process that has generated the trees. For a nonrecombining locus, the SFS carries information on the realized coalescent tree and can be used to estimate tree structure (both waiting times and topology).

Variation over time in the effective population size affects the expected waiting times between coalescent events. In the past, much attention in theoretical works has been paid to the relation between waiting times and population size variation. For example, skyline plots (Pybus *et al.* 2000) are directly used to infer variation of population size (Ho and Shapiro 2011), although care should be taken while using this approach (Lapierre *et al.* 2016). More generally, formulas of the SFS can be generalized to include deterministic changes of population size (Griffiths and Tavaré 1998; Zivkovic and Wiehe 2008; Liu and Fu 2015). In contrast, the influence of tree shape on the SFS has not yet been tackled analytically.

The shape of a tree can range from completely symmetric trees, in which all internal nodes evenly split the lineages; to caterpillar trees, in which each node isolates exactly one lineage. In the standard neutral model, as well as in any other equal-rates Markov or Yule model (Yule 1925), both of these extreme cases are very unlikely to appear by chance (Blum and François 2006). In fact, since the number of binary tree shapes [enumerated by the Wedderburn–Etherington numbers (Sloane and Plouffe 1995)] grows rapidly with the number of sequences *n*, any specific tree shape is arbitrarily improbable if *n* is sufficiently large. Nonetheless, tree topology is a major determinant of the SFS. For example, a caterpillar shape leads to a large excess of singleton mutations, while a completely symmetric tree leads to an overrepresentation of intermediate frequency alleles.

This study aims at a providing a systematic analysis of the impact of the structure of genealogical trees upon the SFS. First, we introduce the theoretical framework for neutrality tests and tree balance. In particular, we develop a new measure of imbalance appropriate for population genetics. Then, we present the decomposition of the SFS in terms of waiting times and tree shape. We discuss the case of a single nonrecombining locus, assuming a single realized tree (fixed topology). As recombination affects mostly lower branches of the tree, this also constitutes an excellent approximation for a locus with a low level of recombination.

We present a mathematically rigorous, yet intuitive, interpretation of neutrality tests in terms of tree topology and branch lengths. We focus on a subclass of tests of special interest and simplicity. A qualitative summary of the results about the interpretation of neutrality tests is given in Table 1. We also propose a new neutrality test, *L*, for selection. Finally, we find the trees corresponding to the maximum and minimum expected values of the tests and provide explicit formulas for these extreme values.

## Methods

Trees can be divided into time segments (“levels”) delimited by the nodes. Each level is unambiguously characterized by its number of lineages *k*, The most recent level has *n* lineages; the most ancient level (from the root to the next internal node) has two lineages. Hereafter, the branches and internal nodes close to the root will be referred to as “upper part” of the tree; conversely, the “lower part” is close to the leaves.

The waiting times between subsequent “binary” coalescent events, *i.e.*, the level heights, are denoted by For trees with coalescent events involving multiple mergers, some of the binary waiting times could be null, *i.e.*, For example, if four lineages would coalesce together in a tree with five lineages, and then the two remaining lineages would coalesce to form the root, then

In a neutral, panmictic population of ploidy *p* (typically or 2) and constant effective population size that can be modeled by the Kingman coalescent, the are exponentially distributed with parameter when the time is measured in generations (Wakeley 2009). Two summary tree statistics are the height which is the time from the present to the most recent common ancestor, and the total tree length Basic coalescent theory states and where is the harmonic number.

### Tree imbalance per level

Following Fu (1995), we define the size of a branch from level *k* as the number of leaves that descend from that branch. Any mutation on this branch is carried by sequences from the present sample. We denote by the probability that a randomly chosen branch of level *k* is of size *i*, given tree *T*. The complete set of distributions for each *i* and *k* determines uniquely the shape of the tree *T*.

The mean number of descendants across all branches from level *k* is This holds for any tree, since all *n* present-day sequences must descend from one of the *k* branches from that level.

In contrast, the size variance, depends on the tree topology: at all levels, it is almost zero in completely balanced trees and maximal in caterpillar trees, where all nodes isolate one leaf from the remaining subtree. For this reason, we propose the variance as the natural measure of imbalance for each level.

The bounds on shown in Figure 1A, vary greatly from level to level: for example, the variance of the uppermost level is whereas (since for all branches). More generally, the maximum variance at a given level *k* is obtained in trees where lineages lead to exactly one leaf and one lineage has descendants. For this case, we compute(1)Minimum variance at level *k* is obtained when all lineages have either (we denote by the floor of *x*, *i.e.*, the largest integer smaller or equal to *x* or descendants) and it is always small (*i.e.*, ):

### Tree imbalance in population genetics *vs.* phylogenetics

Measures of tree topology, and especially tree imbalance, have received considerable attention in the phylogenetic literature (Blum and François 2006). Several measures of imbalance have been proposed, among them the Sackin’s and Colless’ indices (Blum and François 2005; Blum *et al.* 2006), which depend only on tree topology and not on branch lengths. In the context of phylogenies of genes from different species, the divergence is expected to be large enough such that there would be substitutions on all branches to resolve, in principle, the tree topology. Some of these substitutions are expected to have a functional role (*e.g.*, nonsynonymous substitutions, indels). Therefore, almost all splits between lineages should be detectable and correspond to a functional/phenotypic difference between species.

In theory, the same statistics could be applied to the genealogy of a single population or a sample. However, genealogical trees in population genetics are usually much shorter than phylogenetic trees. For short nonrecombining sequences, there could be many branches without any mutation event on them. This raises two issues: the detection of imbalance in trees with short branches, and its evolutionary meaning.

Regarding detection, neither a mutation-free branch nor the split above it could be detected from sequences. Hence, a split should be weighted by the probability of being detected through sequence comparison. For example, let be a measure of imbalance at the *k*th level. Using the probability that there is at least one mutation on level *k*, that is as a weight function, the combined statistics becomes(3)On the other hand, from an evolutionary point of view, the importance of a given branch—and of the adjacent splits in the tree—is related to the number of mutations on the branch. For example, consider a branch that is not supported by any mutation. Its significance for future evolution is null, since there is no selective difference between identical alleles and there is no effect on the genetic variability of the population. This branch could be contracted to zero length, and the splits collapsed into a polytomy of three lineages, without any effect on the present population or on future evolution (even a branch that is supported only by nonepistatic neutral mutations does not affect in any way the future selective processes, even if it has an impact on the genetic diversity of the population). Since mutation-free branches do not have evolutionary significance, their weight in imbalance measures should be low.

Since selective effects and effects on genetic diversity are both proportional to the number of mutations along the branches, it seems reasonable to weight local imbalance measures by the expected number of mutations in the branches supporting them. For example, a measure of imbalance for each level would be weighted by the expected number of mutations at that level. In this case, we obtain the same statistics as in Equation 3 above.

### A new measure of tree imbalance

We propose an informative statistics on tree balance based on and the reasoning in the previous section. We can compute the variance in branch size for each level of the tree, then average it across levels. Fixing a tree *T*, the average variance in branch size across all levels *k* is(4)This summary statistic contains the natural weights which is the fraction of branch lengths at level *k*, discussed in the previous section. Note that this average is different from the total variance in offspring number, *i.e.*, when the variance of sizes is taken across all branches, irrespective of their level. The statistics corresponds instead to the “within-level” component of variance.

To better understand we study the extent to which each level contributes to the statistics. Figure 1A shows contributions per unit time and per whole level. In the first case, are weighted by the number of lineages *k*, while in the second case they are weighted by the length at level *k*, which is for constant population size. Figure 1A shows that the largest contributions to come from the levels close to the root. In particular, for the neutral model at constant population size, the dominant contribution comes from the uppermost level, *i.e.*, from This measure contains the same information as the root balance defined as the smaller of the two root branch sizes:(5)Hence the imbalance measure depends strongly on the root balance which has been previously recognized as a meaningful global measure of tree balance (Ferretti *et al.* 2013; Li and Wiehe 2013), and on the imbalances of the first upper splits as well.

### Estimators of *θ* and neutrality tests

A fundamental population genetic quantity is the scaled mutation rate where *μ* is the mutation rate per generation per sequence. *θ* is the key parameter of the neutral mutation–drift equilibrium. Usually, it cannot be measured directly, but can only be estimated from observable data. For example, under the standard neutral model (*i.e.*, constant population size) an unbiased estimator of *θ* is Watterson’s where *S* is the number of observed polymorphic sites in a sequence sample of size *n* (“segregating sites”) (Watterson 1975).

More generally, it has been shown that many of the well-known *θ* estimators can be expressed as linear combinations of the components of the SFS (Tajima 1983; Achaz 2009; Ferretti *et al.* 2010). For example, or Tajima’s Other estimators are presented in Table 2. Furthermore, the classical neutrality tests (in their nonnormalized version) can be written as a difference between two *θ* estimators, hence as a linear combination of the For instance, the nonnormalized Tajima’s *D* (Tajima 1989) is while Fay and Wu’s *H* (Fay and Wu 2000) is The most common tests are presented in Table 3.

Their expression as linear combinations of the helps to understand discrepancies between these tests through their weight functions. For instance, from the weight functions, it is immediately clear that *H* assigns large negative weight only to with large *i* (high-frequency derived alleles), while *D* assigns negative weight to with small and large *i* (rare alleles).

For each component of the SFS, the product is an unbiased estimator of Hence, given weights the weighted linear combination(6)is also an unbiased estimator of *θ*. For instance, Watterson’s estimator follows from setting in Equation 6; Tajima’s estimator (Tajima 1983) is obtained by letting In fact, one can write all usual *θ* estimators (Tajima 1989; Fu and Li 1993; Fay and Wu 2000) as linear combinations of the SFS with adequate weights (Achaz 2009) detailed in Table 2:(7)where In this expression, *θ* is usually estimated by method of moment as with (Tajima 1989). Hence the general form of is with appropriate coefficients reported in Table 4 for some tests.

### Decomposition of the SFS and its combinations

Here we discuss the dependence of the average spectrum on tree topology and branch lengths.

The SFS is determined by the number of mutations of size *i*, A mutation has size *i* if it appears on a branch of size *i*. We assume that mutations occur along branches according to a homogeneous Poisson process with rate *μ* per unit time. Fixing a tree with respect to shape and branch lengths, we can average over the mutation process. Denoting by the expected value for the mutation process, we obtain for the mean frequency spectrum (Fu 1995)(8)where is the distribution of the number of descendants of the branches of level *k*. These probabilities depend only on the shape of the tree *T* and not on waiting times. The full set of actually gives a complete description of the tree up to permutation of the leaves.

Replacing by their mean according to Equation 8, we obtain the general expression for the mean of SFS-based *θ* estimators(9)and tests(10)where the normalization function is defined by(11)and depends on only, since *S* is a Poisson variable with parameter

It is also possible to condition on *S* as well, obtaining(12)and

### A new subclass of neutrality tests and their decomposition

Interestingly, several common tests (and estimators) are polynomials up to second order in the frequency of mutations, and hence can be written in terms of a general weight function of the form(14)with appropriate values of satisfying(15)For instance, has and while has and hence their difference Tajima’s *D* has and Coefficients for other estimators and tests can be found in Table 2 and Table 3. With this special class of weights, Equation 10 becomes(16)Using and and exchanging the order of the sums, this becomes

### Data availability

The human SNP data are publicly available on the 1000 Genomes Project Web site. The tests discussed in this article are implemented in the mstatspop software, available at https://github.com/cragenomica/mstatspop.

## Results

### Interpretation of neutrality tests for a single locus

We consider the application of neutrality tests to a single locus without recombination, *i.e.*, with a given genealogy. We show that some commonly used test statistics have a simple but rigorous interpretation in terms of tree imbalance and waiting times. The tests are summarized in Table 3 and their interpretation in Table 1. The weight of the different components is illustrated in Figure 1B.

#### Tajima’s *D* test:

This is the most-used neutrality test. It is proportional to the difference If positive, it indicates an excess of common alleles; if negative, an excess of rare alleles.

Watterson’s estimator itself has a simple interpretation. In fact, its average is *i.e.*, it is proportional to the total length of the tree, divided by the mean length. As such, it is independent from the tree topology. In more practical terms, it is independent of mutation frequencies.

Using the result of section *A new subclass of neutrality tests and their decomposition* with the weights in Table 3, we can reexpress the mean Tajima’s *D* as(18)*i.e.*, can be decomposed into two components: one that is a linear combination of tree lengths, independent from the topology, plus a topological component that corresponds to the measure of tree imbalance introduced before.

In qualitative terms, Tajima’s *D* is the sum of an imbalance term with negative sign plus terms that give positive weight to the ancient waiting times, and negative weight to the recent ones:Therefore, Tajima’s *D* is large and positive when there are long branches close to the root. It is strongly negative when the tree is unbalanced and/or when recent branches are long. Tajima’s *D* is thus sensitive to both unbalanced trees and trees with long branches close to the leaves (when negative), and balanced trees with long branches close to the root (when positive). The former are typical trees for recently increasing populations or loci under directional selection, the latter are typical under balancing selection or for structured populations or contractions in population size.

Note that the definition of upper and lower branches, as well as their weighting, depends explicitly on the test and on the sample size *n*.

#### Fay and Wu’s *H* test:

This statistic was specifically designed to detect selective sweeps at partially linked loci, as most weight is given to derived alleles with high frequency. Strongly negative *H* is caused by an excess of high-frequency derived alleles, which is a signature of a locus “hitchhiking” on a nearby sweep locus (Fay and Wu 2000). In this article, we always consider the normalized version of this test (Zeng *et al.* 2006). We can rewrite its mean value as(19)Like Tajima’s *D*, *H* contains the imbalance term with negative sign. However, it has another contribution that weights positively the waiting times close to the leaves, which is opposite to Tajima’s *D*:The only waiting time that has null weight is the root waiting time. Therefore, *H* is strongly negative for (i) large imbalance, and (ii) long branches close to the root. This is precisely the signal expected by hitchhiking in the proximity of strong selective sweeps, *i.e.*, when the sweep locus itself is uncoupled from the locus under consideration by one (or a few) recombination event(s).

#### Zeng’s *E* test statistic:

This is another test designed to detect selective sweeps. However, it is known to be less powerful than *H* (Zeng *et al.* 2006). It is defined by where the estimator also has a simple interpretation: is the height *h* of the tree divided by the expected height. Unsurprisingly, the test is therefore a comparison of height and length of the tree:(20) which can be rephrased asLike Fay and Wu’s *H*, the *E* test is focused on high-frequency alleles. However, it uses no topological information, but depends only on waiting times. This explains its lower power for classical signatures of hitchhiking compared to other tests. Furthermore, since *E* compares upper and lower branches, it can actually be naturally interpreted as a test for star-likeness of a tree. In star-like trees, the length is maximal with respect to the height (), corresponding to strongly negative values of *E*.

#### Fu and Li’s *D*_{FL}:

_{FL}

Finally, we will discuss a common test not included in Equation 14. Fu and Li’s is one of several tests based on singletons. Its mean is hence this test should measure the relative contribution of external branches to total tree length:(21)*i.e.*, negative values of this test should signal extremely unbalanced (caterpillar) trees or star-like trees. However, despite its intuitive interpretation, negative values of Fu and Li’s can be misleading if interpreted in terms of tree shapes. The reason for this is that these values of the test can be a result of purifying selection—nonneutral mutations that decrease fitness and therefore can only reach low frequencies before disappearing from the population. These mutations appear mostly as singletons concentrated on the lower branches. This scenario violates the assumption of mutational homogeneity along the tree and therefore the interpretation of Equation 21 is not valid anymore.

### A new neutrality test for positive selection

The family of tests described by Equation 14 includes Tajima’s *D*, Fay and Wu’s *H*, and Zeng’s *E* among many others. These three tests are built as differences of four different estimators and However, they do not exhaust all combinations of these estimators. There is another combination that has not been studied previously and will be detailed in this section (since the other two combinations and are equivalent to Fay and Wu’s *H*).

The new test is the difference between the Watterson estimator and the Fay and Wu’s estimator We denote this test by *L*:(22)The test compares the amount of high-frequency polymorphisms with the total number of polymorphisms.

The *L* test belongs to the family described by Equation 14, with weights and Its precise definition is(23)where the coefficients and are given in Table 4. Their derivation can be found in Supplemental Material, File S1.

The interpretation of the test can be read from Equation 17:(24)Its qualitative interpretation is different from all previous tests. It is the sum of an imbalance term with negative sign, plus negative weight to the ancient waiting times, and positive weight to the recent ones:This interpretation and the presence of the Fay and Wu’s estimator in the test suggest that this test could be most powerful in selective scenarios.

In fact, simulations of the statistical power of the test in Figure 2 show that the left tail of *L* has a power similar to the normalized Fay and Wu’s *H* test for hitchhiking (but slightly lower for most parameters). On the other hand, the right tail of *L* has a power similar to the left tail of Zeng’s *E*, performing well immediately after fixation and outperforming most other tests for an intermediate range of times after fixation. The test seems therefore to retain some of the advantages of Fay and Wu’s *H*, while being able to detect different selective signals as well.

### Extreme trees and extreme mean values of neutrality tests

Our precise interpretation of the expected values of neutrality tests in terms of tree shape and waiting times allows us to find both the extreme expected values of the tests and the corresponding “extreme” trees.

In this section, we will compute the maximum and minimum value of *i.e.*, the maximum and minimum expected values of the test across all trees *T*, for a given number of mutations *S*, and a given sample size *n*. For large *S*, these values depend only on the sample size. The extreme values are presented in Figure 3 as a function of *n* and for different values of *S*.

The expected value for all tests described by Equation 14 is a linear combination of imbalances with coefficients of the same sign:(25)For this reason, maximum and minimum values correspond to maximally balanced or unbalanced topologies. Hence, to obtain these values, it is sufficient to replace by its maximum or minimum, then maximize/minimize the result over the waiting times (see File S1). The maximum imbalance is given by Equation 1, while the minimum imbalance will be approximated by In the following we will also use the related approximation Both approximations are correct up to for large trees.

#### Tajima’s *D*:

Its maximum corresponds to a tree with maximally balanced topology and length concentrated in the upmost branches (), while its minimum corresponds to all maximally unbalanced trees with length concentrated in the upmost and lowest branches (). The corresponding values are(26)and(27)where the first arrow in each equation represents the limit of a large number of segregating sites; and the second, the asymptotic behavior for a large sample size. The maximum and minimum values of are also the absolute maximum and minimum values of *D* over all possible spectra.

#### Fay and Wu’s *H*:

Its maximum corresponds to a tree with maximally balanced topology and length concentrated (surprisingly) in branches at while its minimum corresponds to a maximally unbalanced tree with length concentrated in the upmost branches (). The corresponding values are(28)and

(29)#### Zeng’s *E*:

Its maximum corresponds to a tree with length concentrated in the upper branches (), while its minimum corresponds to star-like trees (*i.e.*, length concentrated in the lowest branches, ). The corresponding values are(30)and(31)The minimum value of is also the minimum absolute values of *E*.

*L* test:

Its maximum corresponds to a star-like tree with length concentrated in the lowest branches (), while its minimum corresponds to a maximally unbalanced tree with length concentrated in the upmost branches (). The corresponding values are(32)and(33)The maximum value of is also the maximum absolute value of *L*.

The dependence of the extreme values on *n* and *S* is shown in Figure 3 for all the tests discussed above.

These results are useful to interpret the actual strength of the signal given by the tests. The normalization of neutrality tests suggests that values between −1 and 1 fall into the normal range for realizations of the neutral model without recombination. However, there is no indication of which values could be deemed “large” in absolute terms. The extreme values computed above fill this gap, since they give a natural reference in terms of extreme trees. These values can be used to see if a tree is close to one of the extreme trees for the test used, and to understand how large a signal of nonneutrality could be in theory.

As an example, consider the regions of the human genome shown in Figure S1 in File S1, based on data from 1000 Genomes Project Consortium *et al.* (2015). The strong signals of selection in Central Europeans detected by Fay and Wu’s *H* appear much less extreme when compared with the theoretical minimum, which is so low that it does not appear in the plot. On the other hand, the deviations from neutrality shown by Tajima’s *D* ∼136.4 Mbp of chromosome 2 and ∼29.4 and 30.6 Mbp of chromosome 6 in Central Europeans do not look impressive, unless we notice that it is pretty close to the minimum possible value for the test. As another example, the deviations from neutrality of *L* and Fay and Wu’s *H* ∼31.3 Mbp on chromosome 6 in Yoruba look similar, but the minimum of *L* is much closer.

The results of this section could also be used to renormalize neutrality tests in the spirit of Schaeffer (2002) (see File S1). However, our results could not actually show any improvement with respect to the usual normalization.

## Discussion

The ancestry of the sequences in a sample from a single locus, or an asexual population, is described by a single genealogical tree. The same is not true for multi-locus analyses of sexual species: recombination generates different trees along the genome. Inferring these trees is possible only if there are enough mutations per branch. However, in most sexual and asexual populations, lower branches are typically short compared to the inverse mutation rate. Moreover, in many eukaryotic genomes, the mutation and recombination rates are of the same order of magnitude, which means that there are just a few segregating sites in each nonrecombining fragment of the genome. The paucity of mutations, caused by the interplay of genetic relatedness within a population (hence short branches), and recombination, does not allow a full reconstruction of the trees. Therefore, summary statistics are often used for population genetics analysis. These statistics are also computationally useful, since any given configuration of mutations has low probability and it is therefore hard to apply inference methods on the configuration itself. Moreover, they are more robust to details of the model like mutation and recombination rates.

Summary statistics are often more directly related to the mutation pattern of the sequences rather than to their genealogy. In this work, we clarified the precise correspondence between some SFS summary statistics and some features of the genealogical trees.

It is well known that the frequency spectrum is sensitive to tree topology and branch lengths. Interestingly, several estimators and neutrality tests built on the SFS, such as Watterson Tajima’s *D*, and Fay and Wu’s *H*, show quite a simple dependence on tree imbalance and waiting times. A new measure of tree imbalance, the variance in the number of descendants of a mutation at a given level, plays an important role in the interpretation of these neutrality tests. The simplicity of these results stems from the simple weights of these estimators and tests: the SFS is multiplied by functions of the frequency that are constant (Watterson), linear (Zeng), or quadratic polynomials (Fay and Wu and Tajima).

The interpretation of common estimators and tests is summarized in Table 1. This interpretation is rigorous and consistent with intuition. Our results help to understand the peculiarities of the different tests. For example, we reinterpret Zeng’s *E* as a test for star-likeness, and understand its reduced power to detect selection compared to Fay and Wu’s *H* as a consequence of its insensitivity to tree imbalance and of the compressed distributions of its negative values.

The imbalance measure is also related to other balance statistics proposed recently, namely the root balance and the standardized sum (Li and Wiehe 2013), which can also be inferred quite reliably from sequence data. In contrast, balance statistics such as Colless’ index (Colless 1982), which considers the average balance of the tree across all internal nodes, are less suited for population genetic applications, since balance at lower nodes cannot usually be estimated from sequence data, due to the paucity of polymorphisms which separate closely related sequences. Furthermore, recombination affects mostly the lower part of the tree, hence it introduces additional noise, preventing accurate reconstruction of its topology. Further studies of and similar imbalance measures on phylodynamic trees could provide some interesting summary statistics.

The limitation of the approach presented here lies in the assumption that mutations are mostly neutral and the mutation rate is constant, *i.e.*, mutations should occur randomly on the tree. This assumption fails for the case of purifying selection, when deleterious mutations can be more abundant than neutral ones and tend to accumulate on the lower branches of the tree. In fact, for sequences under purifying selection, the topology of the tree itself depends on the deleterious mutations. Therefore, our approach could not work for tests aimed at detecting rare alleles under purifying selection, like Fu and Li’s tests (or extreme negative values of Tajima’s *D*).

Beyond clarifying the interpretation of existing tests, our results open some possibilities for building new neutrality tests to explore different aspects of tree shape. Our new *L* test is a simple test for selection that shows an interesting behavior, with power similar to Fay and Wu’s *H* in the left tail and to Zeng’s *E* in the right tail, and therefore is able to detect deviations from neutrality in hitchhiking and selective scenarios at different times and different recombination rates. This new *L* test is in the same class as Tajima’s *D* and the other tests, hence it is sensitive to the variance New tests in the same class are possible, but one could imagine other tests sensitive, *e.g.*, to different combinations of the variances or to the skewness or kurtosis of as well. While the variance is a direct measure of imbalance and especially to the imbalance of the upper branches, other combinations could be sensitive to different tree features.

While our results help to interpret positive and negative values of the tests, they also provide information about the size of these values. Given the normalization of the tests, it is well known that the typical range of values of the standard neutral model is and confidence intervals can be computed by coalescent simulations, but this says nothing about the size of deviations from this model. Our results on extreme trees and the corresponding extreme test values give some indication on the range of potential deviations from neutrality.

Finally, our approach can be used to understand the average structure of the genealogical trees generated by models for which the expected SFS is known. Some of our results could also find application in phylogenetic studies of closely related species or populations, where the reconstruction of the phylogenetic tree could be difficult or ambiguous.

## Acknowledgments

This work was stimulated by discussions with Michael Blum and Filippo Disanto. We thank an anonymous reviewer for useful comments. A.L. is funded by the United Kingdom National Institute for Health Research, Health Protection Research Unit on Modelling Methodology (grant HPRU-2012-10080). L.F. and G.A. acknowledge support from the grant ANR-12-JSV7-0007 from Agence Nationale de Recherche (France). G.A. acknowledges support from the grant ANR-12-BSV7-0012-04 from Agence Nationale de Recherche (France). T.W. acknowledges support from DFG-SPP1590 by the German Science Foundation. S.E.R.O. is supported by grants CGL2009-09346 (MICINN, Spain), AGL2013-41834-R (MEC, Spain), by the CERCA Programme/Generalitat de Catalunya and acknowledges financial support from the Spanish Ministry of Economy and Competitiveness, through the Severo Ochoa Programme for Centres of Excellence in R&D 2016-2019 (SEV‐2015‐0533).

## Footnotes

Supplemental material is available online at www.genetics.org/lookup/suppl/doi:10.1534/genetics.116.188763/-/DC1.

*Communicating editor: Y. S. Song*

- Received March 1, 2016.
- Accepted May 19, 2017.

- Copyright © 2017 by the Genetics Society of America