Genetics, Vol. 166, 1115-1131, February 2004, Copyright © 2004

The Effect of Selection on Genealogies

N. H. Bartona and A. M. Etheridgeb
a Institute of Cell, Animal and Population Biology, University of Edinburgh, Edinburgh EH9 3JT, United Kingdom
b Department of Statistics, University of Oxford, Oxford OX1 3T6, United Kingdom

Corresponding author: N. H. Barton, Animal and Population Biology, University of Edinburgh, W. Mains Rd., Edinburgh EH9 3JT, Scotland., n.barton{at}ed.ac.uk (E-mail)

Communicating editor: W. STEPHAN


*  ABSTRACT
*TOP
*ABSTRACT
*THE MODEL
*IDENTITIES, TREE LENGTHS, AND...
*A CHANGE OF VARIABLES
*THE NET EFFECT OF...
*APPROXIMATIONS
*DISCUSSION
*APPENDIX
*LITERATURE CITED

The coalescent process can describe the effects of selection at linked loci only if selection is so strong that genotype frequencies evolve deterministically. Here, we develop methods proposed by Kaplan, Darden, and Hudson to find the effects of weak selection. We show that the overall effect is given by an extension to Price's equation: the change in properties such as moments of coalescence times is equal to the covariance between those properties and the fitness of the sample of genes. The distribution of coalescence times differs substantially between allelic classes, even in the absence of selection. However, the average coalescence time between randomly chosen genes is insensitive to the current allele frequency and is affected significantly by purifying selection only if deleterious mutations are common and selection is strong (i.e., the product of population size and selection coefficient, Ns > 3). Balancing selection increases mean coalescence times, but the effect becomes large only when mutation rates between allelic classes are low and when selection is extremely strong. Our analysis supports previous simulations that show that selection has surprisingly little effect on genealogies. Moreover, small fluctuations in allele frequency due to random drift can greatly reduce any such effects. This will make it difficult to detect the action of selection from neutral variation alone.


WE develop a diffusion approximation, first introduced by KAPLAN et al. 1988 Down, which extends the coalescent to take account of arbitrary forms of selection. KINGMAN 1982 Down introduced the coalescent process as a simple description of the genealogical relationships among a set of neutral genes. Although the theory of the coalescent has developed largely independently, it is closely related to the classical concept of identity by descent (NAGYLAKI 1989 Down). The coalescent extends naturally to describe structured populations, in which genes may be found in different places or embedded in different genetic backgrounds. The effects of selection can easily be included, provided that it is so strong relative to random drift that the frequencies of different genetic backgrounds can be approximated as changing deterministically (i.e., the product of population size and selection coefficient, Ns >> 1; KAPLAN et al. 1988 Down; HUDSON 1990 Down).

In some cases, assuming that the genetic or spatial structure of a population changes deterministically is a good approximation. For example, when a single favorable mutation arises and spreads, it carries with it any linked variants. The effects of such "selective sweeps" on genetic variability can be accurately described by assuming that the new allele increases exponentially, even though it is subject to strong random fluctuations in the early generations, when it is present in few copies (MAYNARD SMITH and HAIGH 1974 Down; KAPLAN et al. 1988 Down; BARTON 1998 Down). However, the deterministic approximation plainly fails when selection is weak or absent. Consider the relationships between genes that can be of two allelic types. Even if these alleles are neutral, and so represent an arbitrary labeling of the genes, two genes of the same allelic type are likely to be substantially more closely related than are two genes of different type. Moreover, the average relationship between randomly chosen genes depends on the allele frequency, since an allele that happens to have increased by chance will cause a selective sweep just as if it had increased by selection. Although relationships averaged over the distribution of allele frequencies and over allelic classes must be unaffected by the labeling of neutral alleles, relationships do depend on allelic class and on allele frequency (e.g., SLATKIN 1996 Down).

We usually do not know which alleles are selected and so can observe relationships only among randomly chosen genes in populations with random genotype frequencies. However, with selection, even the average relationships are distorted and can be calculated using the structured coalescent only when selection is much stronger than drift. Weak selection (Ns ~ 1) spread across many loci can have significant cumulative effects (MCVEAN and CHARLESWORTH 2000 Down). Even when Ns is large, fluctuations may still be important. For example, BARTON and NAVARRO 2002 Down showed that the effects of balancing selection at multiple loci are strongly affected by drift even when Ns is extremely large, because each particular genetic background is present in few (if any) copies.

When selection and drift are of comparable strength, a purely coalescent-based approach becomes complicated. NEUHAUSER and KRONE 1997 Down and KRONE and NEUHAUSER 1997 Down have shown that certain kinds of selection can be represented by "ancestral graphs," which are constructed by allowing branching as well as coalescence as one moves back in time, followed by a culling of potential ancestors to generate the genealogy. This method is computationally demanding, especially with strong selection, because of the proliferation of ancestral lineages. SLADE 2000A Down, SLADE 2000B Down, SLADE 2001 Down and FEARNHEAD 2001 Down have introduced modifications that make calculations feasible for stronger selection. Nevertheless, this method does not seem likely to lead to a deeper analytical understanding, which would extend to more general kinds of selection and more complex genetics. DONNELLY and KURTZ 1999A Down and SLADE 2001 Down have also shown how recombination can be included with selection in the algorithm. However, despite these various advances, the method is still computationally intensive for strong selection: for example, with overdominance, only Ns < 10 or so can be simulated (SLADE 2000A Down). Moreover, it is limited to certain kinds of selection: linear frequency dependence or selection on diploids requires branching into three potential ancestors instead of two, and, more generally, a kth order polynomial dependence of haploid fitness on allele frequency requires branching into (k + 2) virtual ancestors. In practice, anything beyond the simplest kind of epistasis or frequency dependence is ruled out.

Other recent simulation techniques follow the state of the whole population backward through time. DONNELLY et al. 2001 Down discuss methods for importance sampling, which start with the well-understood neutral process, and apply a bias that represents the action of selection. This is most efficient when selection is weak. SLATKIN 2001 Down begins by reversing the selective process, which should allow stronger selection to be represented accurately. Because the diffusion is reversible with additive selection, the procedure is exact in this case. With nonadditive selection, SLATKIN 2001 Down uses a procedure that approximates the correct backward diffusion. One can also follow the evolution of population allele frequencies back through time and use the Metropolis-Hastings algorithm to sample the appropriate distribution of frequencies. Under the diffusion approximation, the probability of any particular path is given by a Gaussian distribution of velocities around their deterministic expectation, which approximates the product of Markov transition matrices (SCHULMAN 1981 Down; ROUHANI and BARTON 1987 Down).

KAPLAN et al. 1988 Down introduced a more direct approach to the problem, by following relationships between genes within and between allelic classes, conditional on the frequencies of those classes in the population. However, their approach has not been taken up. This may be partly because KAPLAN et al. 1988 Down did not specify boundary conditions for their equations, so that they could not be solved using standard software; however, DARDEN et al. 1989 Down do provide an alternative numerical algorithm. BARTON et al. 2003 Down give a rigorous justification for KAPLAN et al. 1988 Down diffusion equations, including the necessary boundary conditions. In this article, we show that in the absence of selection, the overall average relationship between random genes in a random population is the same as under simple neutrality. This must of course be the case, since labeling a pair of neutral alleles cannot affect the distribution of genealogies. However, this result extends to give a general expression for the effect of selection on the genealogy, which can be seen as an extension of PRICE's (1970) equation.

The analysis is of a neutral locus that is linked to a single selected locus that carries two alternative alleles. (Setting the recombination rate to zero allows us to follow genealogies at a single selected locus). Essentially the same equations apply to probabilities of identity in state, assuming infinite-allele mutation at the neutral locus, the mean and higher moments of coalescence time, and the full distribution of coalescence times. The equivalence between these can be seen by noting that the identity in state is the generating function for the distribution of coalescence times. Our numerical results are mainly for the distribution of pairwise coalescence times, but in the last part of the article we consider the distribution of the total length of a large genealogy.

We begin by setting out the diffusion approximation for identities in allelic state; the equations for coalescence times are essentially the same. We then change variables to work with (i) the average over randomly chosen pairs of genes, (ii) differences associated with one or the other allele, and (iii) differences within classes vs. between classes. This change of variables leads to a simple formula for the expectation over the stationary distribution and to approximations for strong mixing between classes and for strong selection. Throughout this first part of the article, two simple examples are used to illustrate the derivations (two genes sampled from either a neutral locus or a locus under balancing selection). In the later sections, a wider range of parameters is explored.


*  THE MODEL
*TOP
*ABSTRACT
*THE MODEL
*IDENTITIES, TREE LENGTHS, AND...
*A CHANGE OF VARIABLES
*THE NET EFFECT OF...
*APPROXIMATIONS
*DISCUSSION
*APPENDIX
*LITERATURE CITED

Consider selection on a single locus that carries two alleles, labeled P, Q. This is linked to a neutral second locus, with recombination rate r. Allele frequencies at the selected locus are p, q at the beginning of the generation. For simplicity, assume that selection acts on haploids; however, detailed assumptions about the life cycle do not affect the diffusion approximation. Numerical examples assume purifying selection with fitnesses of Q, P of 1:1 + s; balancing selection is modeled by assuming frequency dependence such that s is replaced by s(p0 - p). This is close to a model of overdominance with diploid fitnesses 1 - sp0:1:1 - sq0, with p0 + q0 = 1. (The relation between models of overdominance and linear frequency dependence is discussed by NEUHAUSER 1999 Down.) Mutation then occurs at a rate µ from Q alleles to P alleles and µ in the opposite direction. (In terms of the actual mutation rates µ = µQ->P + µP->Q, ; the equilibrium under mutation alone is .) Where we consider identity in allelic state, mutation to novel alleles occurs at a rate {nu} at the linked neutral locus. Diploid zygotes are formed by random union and undergo meiosis. Finally, 2N gametes are sampled to found the next generation. (We keep the convention that population size is 2N genomes, corresponding to N diploid individuals.) This discrete time model is defined in more detail by BARTON et al. 2003 Down, who also set out the analogous continuous-time Moran model. Our notation is summarized in Table 1.


 
View this table:
In this window
In a new window

 
Table 1. Summary of notation

In the limit where selection, drift, and mutation are weak, we can take a diffusion approximation to this model. We scale selection, mutation, and recombination relative to N, and time relative to 2N, so that T = t/2N, S = Ns, R = Nr, U = Nµ, and V = N{nu}. [Note that BARTON et al. 2003 Down scale time relative to N generations.] Suppose that we sample n genes. We need to follow the probability fj,k that j genes of type Q, and k genes of type P, are identical in state at the neutral locus. We assume this neutral locus to be subject to infinite-alleles mutation at rate {nu}. fj,k is also the generating function for the distribution of total length of the genealogy and so can be used to find the distribution of the number of segregating sites.

KAPLAN et al. 1988 Down(Equation 20) provide a system of diffusion equations for fj,k without recombination. This system is extended to include recombination by HUDSON and KAPLAN 1988 Down. These give

(1)

with f0,1 = f1,0 = 1 by convention. BARTON et al. 2003 Down give a rigorous derivation for this stationary version, for n = 2.

The terms involving V = N{nu} represent the steady decay in identity due to mutation at the neutral locus. The positive terms (fj-1,k - fj,k)/q, (fj,k-1 - fj,k)/p represent the increase in identity due to coalescence within allelic classes. The terms involving differences in identity (fj-1,k+1 - fj,k), (fj+1,k-1 - fj,k) represent the movement of genes between allelic classes by mutation of allelic classes and by recombination. Note that mutation from allele Q to allele P dominates recombination when allele P is rare [term U(/p) + R in Equation 1], because most copies of P will in that case have arisen as recent mutations from Q. Finally, the last two terms represent the proportion of populations currently at allele frequency p that derive from populations with a different frequency; this process is approximated as a backward diffusion.

In principle, the full distribution of genealogies can be recovered by assigning a notional mutation rate to each node, {{nu}i, {nu}ij, {nu}ijk, ...} say, and by following identities among sets of genes that are either in background Q or P (f{a},{b,c}, say). Then, terms such as j(j - 1)/2 in Equation 1 separate out into distinct terms, corresponding to different permutations of loci over backgrounds. This gives a set of equations in a very large number of variables and, worse, an extremely large number of f's: all possible partitions of the lineages must be tracked separately. So, numerical solution is difficult for even three genes. This approach might be useful, however, for deriving simpler equations that describe particular features of the distribution of genealogies.

As detailed in BARTON et al. 2003 Down, the probabilities of identity are the minimal positive solutions to the equilibrium version of Equation 1. This implicitly specifies the boundary conditions for the system, but to obtain numerical solutions, we require them explicitly. Consider first small p. The right-hand side of Equation 1 is dominated by terms in 1/p. Solving for these terms leads to

(2)

Similarly, as p tends to 1, the terms in 1/q dominate and we obtain analogous boundary conditions to Equation 2.

Numerical methods for solving Equation 1 and Equation 2 are explained in the Appendix. Fig 1 gives a check on these methods for two genes. Three methods for calculating identities within and between neutral allelic classes are compared. First, identities can be calculated conditional on allele frequencies, simulated over 50,000 generations, for a population of 2N = 100 (Fig 1, dots). Allele frequencies were simulated over 50,000 generations, using the exact backward transition matrix calculated for the Wright-Fisher model. Second, the identities can be calculated by solving the discrete equivalent of Equation 1, which gives exact results for the Wright-Fisher model. This involves linear equations for a set of three vectors for f0,2, f1,1, f2,0, each of length 2N + 1. The results fit closely with those estimated by simulation and are indistinguishable in Fig 1. Finally, the diffusion approximation (Equation 1) was used. This fits closely over most of the range (compare solid curves with dots). However, it underestimates identities between genes in rare allelic classes (e.g., f0,2 for p -> 0; Fig 1, top left). The approximation is expected to fail for p ~ 1/N, since only a small number of copies are involved. A similar comparison for 2N = 1000 shows that the discrepancy is then restricted to a narrower region, as expected.



View larger version (15K):
In this window
In a new window
Download PPT slide
 
Figure 1. Comparison among three methods for calculating identities as functions of allele frequency. The dots show identities calculated by simulation of a neutral allele over 50,000 generations and by solution of a matrix recursion; the values are barely distinguishable on this scale. The thin solid lines show the identities calculated using the diffusion approximation, Equation 1. The thick curve shows the stationary distribution. (This distribution is divided by 4 for clarity; the probability of being in the range 0.1 < p < 0.9 is 0.59.) 2N = 100, {nu} = µ = 0.005, ; thus, U = V = 0.25.


*  IDENTITIES, TREE LENGTHS, AND PAIRWISE COALESCENCE TIMES
*TOP
*ABSTRACT
*THE MODEL
*IDENTITIES, TREE LENGTHS, AND...
*A CHANGE OF VARIABLES
*THE NET EFFECT OF...
*APPROXIMATIONS
*DISCUSSION
*APPENDIX
*LITERATURE CITED

First, consider probabilities of identity in allelic state under the infinite-alleles model. There are substantial differences between identities involving different allelic classes: identities between classes are much lower than those within (compare the lower curve for f1,1 with the upper curves for f0,2, f2,0). Identities also vary substantially with allele frequency. Within-class identity decreases from ~1 when the allele is present in a few copies, down to (1 + 4Uf1,1)/(1 + 4U) (Equation 2) when it is frequent enough for the diffusion approximation to hold (p >> 1/N), and then down to a value somewhat greater than the neutral expectation of 1/(1 + 4V) when the allele nears fixation. The between-class identity necessarily approaches that within the commonest class near fixation and decreases in between, because there is then a rapid influx into the rarer class by mutation from the commoner class.

Note that there is considerable variation in the identities, and hence in the distribution of coalescence times, in any particular generation of a simulated population (Fig 2, top). This arises from the random history of allele frequencies (Fig 2, middle) and is an extra source of variation, over and above the intrinsic variation in the actual coalescence time. The latter is a sample from a distribution that itself fluctuates with allele frequencies. The smooth curves shown in Fig 1 are the identities conditional on current allele frequency and are an average over the distribution of past fluctuations in allele frequency. Calculation of the variance in pairwise identity as a function of current allele frequencies would require consideration of associations among sets of four genes.



View larger version (18K):
In this window
In a new window
Download PPT slide
 
Figure 2. An example of the time-course of identities (top), allele frequencies (middle), and average identity (bottom), from simulations of a neutral allele (2N = 100, {nu} = µ = 0.005, , as in Fig 1). In the top, the bottom thick line shows the identity f1,1 between allelic classes, over generations 2000–4000. The top two lines show the within-class identities f2,0, f0,2. Around generation 3000, allele P is lost (middle); then the identity f0,2 is set to 1, and the identities f1,1, f2,0 become equal. The converse pattern is seen when allele Q is lost, around generations 2000 and 4000. Although the identities within and between classes fluctuate greatly with allele frequency, the average identity stays close to the expected value .

Fig 3 shows the same comparison as in Fig 1, but with balancing selection of strength Sb = Nsb = 4 toward an equilibrium point p0 = 0.7. Again, the three methods agree to high accuracy, except for identities within rare allelic classes. Even though selection is much stronger than mutation and drift, it has surprisingly little effect in reducing the identities f2,0, f1,1, f0,2. The stationary distribution is now concentrated around p0 = 0.7; because identities depend strongly on allele frequency, this might be expected to alter the identity between random pairs of genes, averaged over the stationary distribution. However, balancing selection reduces this average to only , relative to the neutral value of 1/(1 + 4V) = 0.5. This is because the average identity is almost independent of allele frequency (e.g., Fig 2, bottom). We consider this issue in more detail below.



View larger version (15K):
In this window
In a new window
Download PPT slide
 
Figure 3. Comparison among three methods for calculating identities as functions of allele frequency, under balancing selection. Parameters are as in Fig 1, except that there is balancing selection with coefficient s(p0 - p), s = 0.08, p0 = 0.7. The stationary distribution is now concentrated around p0; the probability of 0.1 < p < 0.9 is increased to 0.74.

Similar equations can be derived for the probability density of total length of the genealogy, {Phi}j,k. This is a function of the total length, , and current allele frequency, p. After a long time, the density approaches a steady state, which satisfies

(3)

At = 0, we have

(4)

The boundary conditions at = 0 set the rate of coalescence within allelic classes as being inversely proportional to the frequency of the class. (Recall that time has been scaled relative to 2N). The partial differential equations themselves have essentially the same form as for the identities and represent the movement of genes between allelic classes and the diffusion of populations between allele frequency states.

Fig 4 shows the solution to Equation 3, for the same parameters as Fig 3. The rate of coalescence between allelic classes, {Phi}1,1, is necessarily zero, and so the distribution {Phi}1,1 passes through the origin (Fig 4, thick lines). However, when one or the other allele is rare, genes in the rarer class are likely to be descended from genes in the common class relatively recently. Hence, {Phi}1,1 rapidly approaches {Phi}2,0 when P is rare (Fig 4, top left and bottom right). Coalescence times within a rare allelic class are likely to be very short, unless the two genes derive from the common class via recent mutation. Because the mutational flux from common to rare is high, these two possibilities have comparable probability (see two terms ~1/p in Equation 1). Thus, for small p the distribution {Phi}0,2 is a mixture of a singularity at zero and a component proportional to {Phi}2,0 (Fig 4, bottom curve in top left and bottom right). At intermediate frequency, both within-class distributions are close to the neutral expectation, Exp[-T] (Fig 4, middle).



View larger version (24K):
In this window
In a new window
Download PPT slide
 
Figure 4. The distribution of coalescence times, calculated from Equation 3, for a locus under balancing selection. The three parts show distributions at p = 0.01, 0.1, 0.5 (left to right). In each, the thick line shows the between-class distribution {Phi}1,1, the dashed curve shows {Phi}2,0, and the thin solid curve shows {Phi}0,2 (, as in Fig 3). Time is scaled relative to 2N generations.

Fig 5 compares the distribution of coalescence times between randomly sampled pairs of genes, , with the neutral formula Exp[-T], for the same example of balancing selection. When allele P is rare, the coalescence time tends to be shorter, while as the frequency approaches the intermediate value where the population is most likely to be found (p ~ p0 = 0.7), coalescence times become slightly longer than the neutral expectation. As P approaches fixation, coalescence times again become slightly shorter than in the absence of selection and allelic structure. Overall, there is little change: in this example, balancing selection increases mean coalescence time by 13.9%.



View larger version (21K):
In this window
In a new window
Download PPT slide
 
Figure 5. The distribution of coalescence times between randomly chosen pairs of genes (thick line) compared with the neutral expectation Exp[-T] (thin line). Parameters are for a locus under balancing selection, as for Fig 3 and Fig 4.

The relation between Equation 1 and Equation 3 can be understood by noting that the identity in state is the generating function for the distribution of coalescence times, with scaled parameter 4V (i.e., . This can be confirmed by integrating the product of Equation 3 with Exp[-4V] over . The moments of coalescence times can be recovered by taking differentials of f at V = 0. For example, the expected total length is given by

(5)

with J1,0 = J0,1 = 0 by convention. As p -> 0, mean total length tends to

(6)

These limits can be found directly or from Equation 2. Note that for n = 2 genes, the expected total length of the genealogy is twice the mean pairwise coalescence time, which we denote by {tau}.

Fig 6 shows how the mean coalescence time changes with allele frequency, with and without balancing selection. The mean coalescence time between genes in different allelic classes is much greater than that within classes, but approaches the same value as within the commoner class as that class approaches fixation. Within-class coalescence times approach 4UJ1,1/(1 + 4U) as the allele becomes rare. This value is determined by a balance between the rapid rate of coalescence within rare classes and the rapid influx of copies by mutation from the commoner class.



View larger version (17K):
In this window
In a new window
Download PPT slide
 
Figure 6. Mean coalescence time within and between allelic classes, plotted against allele frequency. The thin lines are for neutral alleles, while the thick lines are for balancing selection S = 4, p0 = 0.7; U = 0.25 as before. The top pair of curves are for genes in different allelic classes ({tau}1,1). The bottom two pairs are for mean coalescence time within classes (dashed curves, {tau}2,0; solid curves, {tau}0,2). Mean coalescence time between two P genes, {tau}0,2, decreases to 4U{tau}1,1/(1 + 4U) as p -> 0; conversely, {tau}2,0 -> 4U{tau}1,1/(1 + 4U) as p -> 1.


*  A CHANGE OF VARIABLES
*TOP
*ABSTRACT
*THE MODEL
*IDENTITIES, TREE LENGTHS, AND...
*A CHANGE OF VARIABLES
*THE NET EFFECT OF...
*APPROXIMATIONS
*DISCUSSION
*APPENDIX
*LITERATURE CITED

To make some approximations, and to understand average identity, it is helpful to change variables, as follows. First, consider the pairwise case. Let

(7)

; {Delta} is the difference in identity between a gene associated with P and a random partner and a gene associated with Q and a random partner; and {theta} is the sum of the differences between identities of alleles in the same classes and alleles in distinct classes. Equation 1 transform to

(8a)


(8b)


(8c)

Each variable is augmented by a diffusion term, (). As is shown in Equation 9, the average of this diffusion term over the stationary distribution is zero; thus, it shifts the variable without producing a net change. The first equation shows that average identity is augmented by 4pqS{Delta}[p], which is the product of the change in allele frequency due to selection, and the difference in identity between genes associated with P rather than with Q. Net identity increases if a higher identity is associated with an allele favored by selection. The crucial quantity, then, is {Delta}. The second equation shows that this is reduced by recombination and mutation, especially near the edges, and augmented by a term proportional to {theta}, which is the difference in identity within and between allelic classes. The last equation, for {theta}, shows that it is augmented by coalescence, especially at the boundaries, in proportion to (1 - )/pq. It is also augmented by a term proportional to {Delta}.

The equations for the distribution of coalescence times are obtained by setting V = 0 and dropping the terms that do not involve , {Delta}, or {theta} (i.e., 1 in Equation 8a and Equation 1/pq in Equation 8c). Boundary conditions at {tau} = 0 are that , {Delta} = 0, and {theta} = 1/pq. The expected total length (which is twice the mean pairwise coalescence time for n = 2) is given by the same equations as above, but setting V = 0, and subtracting 1/pq from Equation 8c.

Fig 7 shows the transformation for mean coalescence time, to the variables , {Delta}{tau}, {theta}{tau} (Table 1). The mean coalescence time between randomly chosen pairs of genes varies rather little with allele frequency and increases only moderately with increasing balancing selection (Fig 7, top). This is because the equation for is driven mainly by the term 1 - , which leads to the standard solution . The diffusion terms ({partial}p - 4{Delta}{tau}) redistribute across the range of allele frequencies, but, as we show below, do not alter its average value. The main driving term is small, because under balancing selection both S and {Delta} are zero somewhere in the interior, and so their product is small when allele frequency is concentrated in the interior.



View larger version (19K):
In this window
In a new window
Download PPT slide
 
Figure 7. Transformed representation of the mean coalescence time, for increasing strengths of balancing selection (U = 0.25, p0 = 0.7). Top left, , the average over randomly chosen pairs of genes; bottom left, {Delta}{tau}, the effect on mean coalescence time of association with P rather than with Q; top right, {theta}{tau}, twice the difference in mean coalescence time within relative to between classes; bottom right, {psi}[p], the stationary distribution. The thick curves are for neutral alleles, and the successive thin curves are for S = 1, 2, 4, 8, 16, 32.

Fig 8 shows the same comparisons, but for purifying selection. The overall mean is affected little by weak selection (S <= 1, say). With stronger purifying selection, mean coalescence time is substantially reduced when the favorable allele becomes rare (Fig 8, left side of top left), but is slightly increased when the favorable allele is common. The average identity is insensitive to selection when selection is weak because then {Delta}{tau} changes sign in the center and so its product with Spq also changes sign; the net effect through the driving term 4Spq{Delta}{tau} is thus small. As selection becomes stronger, {Delta}{tau} becomes positive over a wider region, implying that the mean coalescence times involving genes in the favored allelic class become longer. However, the net effect on the pattern of is hard to predict, because the diffusion term is strong. The next section sets out a simple result for the mean coalescence time, averaged over the stationary distribution, which necessarily does not include this diffusion term.



View larger version (16K):
In this window
In a new window
Download PPT slide
 
Figure 8. Transformed representation of the mean coalescence time, for increasing strengths of purifying selection; parameters and notation are as in Fig 7. The thick curves are for neutral alleles, and the successive thin curves are for S = 0.25, 0.5, 1, 2, 4, 8.


*  THE NET EFFECT OF SELECTION
*TOP
*ABSTRACT
*THE MODEL
*IDENTITIES, TREE LENGTHS, AND...
*A CHANGE OF VARIABLES
*THE NET EFFECT OF...
*APPROXIMATIONS
*DISCUSSION
*APPENDIX
*LITERATURE CITED

A remarkably simple result is obtained by taking the expectation of the average identity over the stationary distribution, E[]. We know that the stationary distribution satisfies the forward diffusion . Integrating the first of Equation 8aEquation 8bEquation 8c over the stationary distribution {psi}, we have

(9)

Integrating by parts, the third term vanishes, and we have

(10)

We immediately see that, in the neutral case (S = 0), E[ ] is unaffected by the arbitrary labeling: it is given by the standard formula from the neutral theory, 1/(1 + 4V). Selection will perturb average identity by a proportion that depends on E[Spq{Delta}[p]]; this will be small for purifying selection, since {Delta} changes sign somewhere in the center. However, it will be large and negative for balancing selection, if the null point of selection coincides with the null point for {Delta}. Then, both {Delta} and S will change sign near the center and so E[Spq{Delta}[p]] is negative throughout. Thus, balancing selection is expected to reduce average identity.

The mean coalescence time is given by the same equation as Equation 9, but with V set to zero, and {Delta}{tau} = p(0,2 - 1,1) + q(1,1 - 2,0):

(11)

Similarly, the distribution of coalescence times, averaged over random pairs of genes and over the distribution of allele frequencies, is

(12)

The expected mean coalescence time can be calculated either directly or by using the right side of Equation 11. In numerical calculations, the latter is more accurate, both because it gives what is usually a small deviation from the neutral expectation of 1 and because the regions near fixation (where the stationary distribution diverges for U, U < 1/4) do not contribute significantly to the integral (since pq and {Delta}{tau} tend to zero at the boundaries). The boundaries do not contribute to the deviation from neutrality because in these regions selection is negligible relative to drift, and there is rapid flow between backgrounds.

The relationships of Equation 9Equation 10Equation 11Equation 12 extend to arbitrary numbers of genes in the sample. To be definite, consider the expected total length of the genealogy, Jj,k (Equation 5). However, the same argument applies to other properties of the genealogy, such as the identity in state or the distribution of times to the most recent common ancestor. As mentioned above, one can write down the generating function for the complete density in the same form, and so this result applies for any quantities derived from it.

Summing over the binomial probability density for the number of genes {j, k} sampled from each background, we obtain

(13)

The first term n represents the increase in tree length at a rate n, when n lineages are present. The second term represents coalescence at a rate n(n - 1)/2; when coalescence occurs, the tree length expected among (n - 1) genes replaces that among n genes: (n-1 - n). The third term represents the perturbation to average tree length caused by selection. It is proportional to {Delta}Jn, which is half the regression of average tree length on the number of P alleles in the sample (i.e., of Jj,k on k). The last term represents the effect of changes in allele frequency through time.

Taking the expectation over the stationary distribution, the last term vanishes, and we obtain a recursion:

(14)

We see immediately that in the absence of selection the expected total tree length is given by the neutral formula, . The perturbation due to selection during the time when there are j lineages in the genealogy is proportional to 2/j(j - 1), which is the expected time for which j lineages persist before coalescence. This suggests that the effects of selection will primarily be on the deeper parts of the genealogy (i.e., small j); however, it is not clear how {Delta}j changes with j. Numerical calculations confirm our intuition that the effects of selection decrease rapidly as j increases. Fig 9 shows the successive perturbations due to selection for up to n = 50 genes, compared with the successive contributions 2/j under neutrality. The {Delta}j decrease for j > 10, and so the net perturbation, which is further reduced by the factor 2/j(j - 1), quickly becomes negligible. The sum of the perturbations is -0.29, compared with a neutral expectation of . Thus, purifying selection reduces total tree length by only 3.3% in this example, and more than half of that effect arises during the time when only two or three lineages are extant. (As explained in the Appendix, we use the insensitivity of the genealogy to allele frequency fluctuations in our calculations of terms involving large numbers of genes.)



View larger version (9K):
In this window
In a new window
Download PPT slide
 
Figure 9. The perturbations to total expected tree length caused by purifying selection, plotted against the number of extant genealogies, j (bottom series of points). These are calculated from E[4pqS{Delta}j] (Equation 16). The top series of points show the contribution to total expected tree length expected under neutrality, 2/j. Mutation rate is U = 0.5, , and purifying selection S = 2; there is no recombination.

Equation 14 can be interpreted as an instance of ROBERTSON's (1966) "secondary theorem of natural selection," which states that the increase in any quantity caused by selection is equal to the covariance between that quantity and fitness. [This was independently developed into a general representation of selection by PRICE 1970 Down.] In terms of the original variables, , which is equal to the covariance between the fitness of the sample of n genes and the expected total length of the genealogy that relates these genes. (Here, relative fitness of a sample with k copies of allele P is sk, or 2Nsk = 2Sk when rescaled.) Equation 14 shows that this covariance is exactly equal to the increase in the expected total length caused by selection. Because essentially the same equations apply to the identity f, which can be viewed as the generating function of the genealogy, we can see that relations similar to Equation 14 apply to any quantities that are linear functions of the probability density of genealogies (for example, the kth moments of coalescence times).


*  APPROXIMATIONS
*TOP
*ABSTRACT
*THE MODEL
*IDENTITIES, TREE LENGTHS, AND...
*A CHANGE OF VARIABLES
*THE NET EFFECT OF...
*APPROXIMATIONS
*DISCUSSION
*APPENDIX
*LITERATURE CITED

There does not appear to be an explicit solution to Equation 1 or Equation 8aEquation 8bEquation 8c, even in the absence of selection. We therefore explore several approximations, in the hope that these might extend to more complex situations, which are difficult to investigate numerically. We consider in turn the extremes of no mixing between allelic classes, assuming that change in allele frequency is negligible and assuming that there is rapid mixing between backgrounds by mutation and recombination.

No movement between backgrounds:
If there is no recombination or mutation between alleles (U, R = 0) then the three identities given by Equation 1 change independently. The between-class identity f1,1 clearly tends to zero, while the within-class identities are those of two separate populations of size p, q, which fluctuate according to a diffusion process. (Strong frequency dependence is required to prevent loss of variation at the selected locus in the absence of mutation: near the boundaries, S -> p-{alpha} as p -> 0, {alpha} > 1.) However, even this uncoupling into a set of single-variable equations does not lead to an explicit solution. The identities could be found by a change of timescale in the standard coalescent, but this does not lead to closed-form solutions (DONNELLY and KURTZ 1999B Down).

Weak random drift:
If random drift is weak relative to mutation, recombination, and selection, then allele frequencies will be close to a deterministic limit. Many treatments of the effects of selection on genealogies assume this limit and apply the standard structured coalescent (e.g., KAPLAN et al. 1988 Down, KAPLAN et al. 1989 Down; HUDSON and KAPLAN 1995 Down; WAKELEY 2001 Down). For the model presented here, this limit corresponds to dropping the diffusion terms (i.e., those terms involving {partial}p or {partial}p,p from Equation 1) and thereby assuming that the population has always had the same allele frequency. This approximation is not explicitly dependent on the strength of selection, but does depend on both mutation and recombination rates, which determine the rate of mixing between allelic classes. Under this approximation, the mean pairwise coalescence time is

(15)

A similar expression can be obtained for the identity in allelic state, which allows calculation of higher moments of the distribution of coalescence times. The overall mean, E[], can be estimated either by integrating over the stationary distribution or by fixing p at its deterministic equilibrium value (NEUHAUSER and KRONE 1997 Down).

Fig 10 shows the mean coalescence time as a function of the strength of balancing selection. Here, U = N, µ = 0.5 or 0.25, and so effects are not large: balancing selection has a substantial effect on genealogies only when mutation is low enough that genes rarely move between backgrounds. The mean coalescence time does approach the prediction from Equation 15 for large S (right side of figure), but balancing selection must be extremely strong for the deterministic limit to be accurate. That is, weak random fluctuations in allele frequency can substantially reduce coalescence times. Note that weak disruptive selection (e.g., underdominance) reduces coalescence times slightly, because allele frequencies tend to sweep back and forth between alternative alleles (see left of graph). However, strong underdominance has little effect, because the population is then near fixation.



View larger version (9K):
In this window
In a new window
Download PPT slide
 
Figure 10. The effect of balancing selection on the mean coalescence time, E[]. The strength of balancing selection, Sb, increases to the right; negative values correspond to disruptive selection; p0 = 0.7. The two curves are for mutation rate U = 0.5 (thick curve) and 0.25 (thin curve). There is no recombination (R = 0). The straight lines are the predictions assuming that allele frequency is fixed at p0.

Fig 11 shows an example in which fluctuations significantly reduce mean coalescence time despite strong selection. With S = Ns = 32, allele frequencies cluster around the equilibrium of p0 = 0.7 (bell-shaped curve). The coalescence time is almost independent of allele frequency, simply because populations away from equilibrium are recently derived from populations close to equilibrium (thin solid curve). In contrast, the deterministic prediction (dashed line) ignores the diffusion of populations between different allele frequencies and so depends more strongly on allele frequency. Taking the value of this approximation at p0 = 0.7 gives a substantial overestimate of mean coalescence time, even under such strong selection.



View larger version (12K):
In this window
In a new window
Download PPT slide
 
Figure 11. The mean coalescence time, , as a function of allele frequency; U = 0.5, , R = 0, Sb = 32, p0 = 0.7. The thin solid curve gives the exact solution from Equation 5. This is compared with the prediction from Equation 15 (dashed line), which assumes that allele frequency is fixed; this prediction is independent of selection. The thick curve shows the stationary distribution.

Fig 12 and Fig 13 show similar results, but for mutation/selection balance rather than for balancing selection. The lines to the right in Fig 12 show the prediction for the deterministic limit (Equation 15); as S -> {infty}, mean coalescence time is reduced by a factor (1 - (U/S) + O(1/S2)) (CHARLESWORTH et al. 1993 Down). This effect of "background selection" against deleterious alleles is reduced substantially by random drift for S = Ns < ~3. Fig 13 shows the stationary distribution at S = 3, U/S = µ/s = 1/8; as for balancing selection, a moderate degree of fluctuation around the deterministic expectation substantially reduces the effect of selection in reducing mean coalescence time [here, from 1 - (U/S) = 0.875 at S -> {infty} to 0.908 at S = 3].



View larger version (9K):
In this window
In a new window
Download PPT slide
 
Figure 12. The effect of purifying selection on the mean coalescence time, E[]. The two curves are for equilibrium frequency U/S = 1/8 (thick), 1/4 (thin). There is no recombination (R = 0). The lines on the right are the predictions assuming that allele frequency is fixed at U/S.



View larger version (10K):
In this window
In a new window
Download PPT slide
 
Figure 13. The mean coalescence time, , as a function of allele frequency, at mutation-selection balance, U = 3/8, , R = 0, S = 3. The thin solid curve gives the exact solution from Equation 5. This is compared with the prediction from Equation 15, which assumes that allele frequency is fixed; this prediction is independent of selection. The thick curve shows the stationary distribution.

Rapid mixing:
If mutation and/or recombination are strong relative to drift (U or R >> 1), then there will be little divergence between allelic classes: {Delta} and {theta} will be small, and close to 1/(1 + 4V). First, consider the case with U, S ~ 1 and R >> 1. From Equation 8a HREF="#FD8b">Equation 8bEquation 8c, we see that {theta} is augmented by the term (1 - )/pq and dissipated by recombination, and so is O(1/R). {Delta} is generated by {theta} and dissipated by recombination and also mutation at the boundaries (pq ~ 1/R). Hence, {Delta} ~ 1/R2. The perturbation to is therefore only of order 1/R2. Letting the leading terms are

(16)

Hence

(17)

Note that the term U( - p)(q - p) in the denominator of the expression for {Delta} is negligible except near the boundaries, when Rpq ~ 1. The perturbation {phi} to average identity is not given explicitly. However, its expected value, integrated over the stationary distribution, is given by Equation 10. Hence

(18)

Because the stationary distribution is known explicitly, this perturbation can readily be calculated. Moreover, it has a simple form when seen as a function of V. This allows us to take the inverse Laplace transform, which shows that the distribution of coalescence times is

(19)

Note that the key term E[ ] in Equation 19 is negative for purifying selection and positive for balancing selection.

If balancing selection is strong enough that the stationary density near the boundaries is negligible, and if R >> U, then the term involving mutation in the denominator is negligible, and

(20)

Thus, when linkage is loose the effect of selection is to increase mean coalescence times by an amount proportional to the additive variance in fitness (2S2pq). This can be understood as an inflation in the rate of random genetic drift due to inherited variation in fitness (HILL and ROBERTSON 1966 Down; SANTIAGO and CABALLERO 1995 Down). With balancing selection, the marginal selection coefficient S = Sb(p0 - p) is zero at p = p0. However, allele frequency fluctuates around this expectation with variance var(p) ~ 1/4Sb for large Sb. Hence, E[S2pq/R2] ~ Sbp0q0/4R2. Note that this is not the same as the limit of Equation 15 as R -> {infty}, which is 1/4R. Allele frequency fluctuations have a significant effect that cannot be neglected even for strong selection.

We examine the accuracy of these approximations by considering the effect of increasing recombination. Fig 14 shows the increase in mean coalescence time caused by balancing selection, plotted against recombination rate. Mutation is set to a small positive value (U = 0.05), to ensure that a stationary distribution exists. However, mutation has a negligible effect on the results unless linkage is tight (R ~ U). The increase over the neutral value, E[] - 1, is plotted on a log scale, because when R is large, small effects must be discerned. As selection increases (Sb = 4, 8, 16, 32, bottom to top), mean coalescence time converges to the deterministic limit (thick line; Equation 15) However, convergence is slow for large R. There, the large R approximation of Equation 19 is accurate (dashed lines to right). However, the approximation is good only for R > 10, in which case mean coalescence time is increased by at most 2.5%. The large R approximation is not helpful for parameters that give a large effect.



View larger version (15K):
In this window
In a new window
Download PPT slide
 
Figure 14. The effect of balancing selection on mean coalescence time, plotted against recombination rate, R; U = 0.05. The vertical axis shows increases over the neutral value, E[] - 1, on a logarithmic scale. The top thick curve shows the deterministic limit, in which allele frequency is assumed to be fixed at p0 = 0.7 (Equation 15). The thin solid curves are for Sb = 4, 8, 16, 32 (bottom to top), calculated using Equation 1 and Equation 11. The dashed curves show the high recombination limit (Equation 19).

Fig 15 shows a similar plot for the decrease in mean coalescence time caused by purifying selection. Now, mutation is set to a moderately high level (U = 0.5); with weak mutation, the population would almost always be fixed, and effects on linked variation would be negligible. The deterministic limit of Equation 15 (top thick line) now performs poorly. This is because it is based on the assumption that allele frequency is close to the deterministic equilibrium of 1 - (U/S), which is never the case for U = 0.5, even when selection is strong. This approximation is expected to be accurate only for U >> 1. The large R approximation of Equation 19 is more accurate (dashed lines), but systematically underestimates the effect by ~18%. Examination of the differences in mean coalescence time between backgrounds, {Delta}{tau}, {theta}{tau}, shows that the approximation of Equation 17 breaks down near the margins (q ~ (1/R)). Since the stationary density is appreciable in this region for R = 10, much larger recombination rates would be needed for accuracy to be improved. As for balancing selection, this approximation is accurate only in cases where the effect on coalescence time is small.



View larger version (17K):
In this window
In a new window
Download PPT slide
 
Figure 15. The effect of purifying selection on mean coalescence time, plotted against recombination rate, R; U = 0.5, . The vertical axis shows decreases from the neutral value, 1 - E[], on a logarithmic scale. The top thick curve shows the deterministic limit for S = 8, in which allele frequency is assumed to be fixed at p0 = 1 - (U/S) (Equation 15). The thin solid curves are for S = 0.5, 1, 2, 4, 8 (right side, bottom to top), calculated using Equation 1 and Equation 11. The dashed curves show the high recombination limit (Equation 19).

For tight linkage, the effect of purifying selection at first increases with selection, but then decreases (see Fig 15, left side). This can be seen more clearly in Fig 16, which shows the mean coalescence time as a function of S = Ns, with complete linkage. As selection becomes very strong, the rare allele is driven out of the population, and so the genealogy returns toward its form under the neutral coalescent.



View larger version (10K):
In this window
In a new window
Download PPT slide
 
Figure 16. The effect of purifying selection on mean coalescence time, plotted again