## Abstract

An important clue to the evolutionary history of an allele is the structure of the neighboring region of the genome, which we term the genomic background of the allele. Consider two copies of the allele. How similar we expect their genomic background to be is strongly influenced by the age of their most recent common ancestor (MRCA). We apply diffusion theory, first used by Motoo Kimura as a tool for predicting the changes in allele frequencies over time and developed by him in many articles in this journal, to prove a variety of new results on the age of the MRCA under the simplest demographic assumptions. In particular, we show that the expected age of the MRCA of two copies of an allele with population frequency *f* is just 2*Nf* generations, where *N* is the effective population size. Our results are a first step in running exact coalescent simulations, where we also simulate the history of the population frequency of an allele.

CONSIDER two copies of an allele in a panmictic population. What is the age of the most recent common ancestor (MRCA)? This question is interesting for population genetics, but has also taken on importance for medical genetics, because the age controls the expected length of the region over which the genomic backgrounds of the two copies are identical. If the MRCA is old, then, since the time of the ancestor, many nearby mutations or recombinations will usually have occurred. If the MRCA is young, then most often the genomic background of a copy of the allele will be similar to the background of the most recent ancestor. If the genomic background of a disease allele has little variation over a long region, then detection of genomic association becomes much easier than if linkage with the allele exists only over a short distance. Thus, on average, it is easier in a genomic scan to detect young disease-causing mutations than old, although after successful detection precise mapping will be harder (on average) with a young allele.

The association of the age of the MRCA with the extent of genomic diversity is natural to consider and is not a new idea. For example, it is discussed by Reich and Goldstein (1999), although they are not concerned with the probability distribution of the age. How then can we estimate the age of the MRCA? A quantity that is relatively easy to estimate is the population frequency of an allele. This is correlated with both the age of the mutation and the age of the MRCA. Most rare mutations are young, as has been well understood for a long time and made precise in Kimura and Ohta (1973), and most young mutations are rare. This is related to the central idea of a recently proposed powerful test for selection (Sabeti *et al.* 2002); suppose we have an allele that is frequent, but the genomic background over a long distance lacks diversity. Under a neutral model, we have a mismatch of information—the lack of diversity suggests a young allele that should not be frequent.

It therefore is interesting to consider the *joint* distribution of population frequency and the age of the MRCA. Related ideas are discussed by Slatkin and Rannala (2000) who consider the problem of estimating allele age, using both frequency and linkage disequilibrium information. However, they are chiefly concerned with practical methods of analyzing data, and with the influence of past demographic events, while our aim here is to derive exact probability models and work out the consequences under the simplest possible demography. There are some striking results, including the new result that the expected age of the MRCA of two copies of an allele of population frequency *f* is proportional to *f*.

The two most important tools for the analysis of how mutant alleles spread through a population are diffusion theory, largely developed in a genetic context by Kimura over a series of 25 years from 1955 to 1980 (see Watterson 1996 for a very clear review), and Kingman's coalescent (Kingman 1982; Nordborg 2001). However, these two essential tools have infrequently been used together.

We work through some ideas motivated by the coalescent, but using diffusion techniques, very much in the spirit of Kimura's work. Indeed we generalize some of his results. There are evident relationships with our work and the theory of the coalescent, which deserve future study. The work of Griffiths and Tavaré (1998)(1999, 2003), Wiuf and Donnelly (1999), and Griffiths (2003) could be regarded as applying the theory of the coalescent to the diffusion of alleles. We do the reverse. It seems that some results are much easier in one approach than in the other.

We briefly mention two other relevant recent articles.

Barton

*et al.*(2004): This, like our work, considers coalescence at a single biallelic locus with the allele frequencies varying in time and uses diffusion theory. Comparing their work and ours, although the situations analyzed are similar they are not the same. They consider recurrent mutations while we are interested in alleles where there has been a single mutation event. They also consider the case of selection, and their main technique is to derive a set of coupled ordinary differential equations, which require numerical solution. Our situation is simpler and we can obtain analytic solutions.Innan and Nordborg (2003): This article, which does

*not*use diffusion theory, considers expressions relating to the length of a polymorphism of known frequency. For instance, they use the theory of the coalescent, conditional on the observed number of alleles in the sample, to estimate the probability of no recombination between two sites of distance*L*apart in the history of the sample.

Using our results we can run coalescent simulations backward in time where we simulate jointly both the ancestral history of a small sample of a population and the population frequency.

Even if (as is the case for humans) the past demography is complex, we believe our results will find application. For instance, if the past population size is not constant, one can “stretch time” so that coalescence occurs at a constant rate, but the rate of mutation is varying. Our techniques should still apply and remain a foundation for simulation.

Our main object of study is the probability distribution of the age of the MRCA of two copies of an allele that has known population frequency *f* today. We ignore the possibility of recurrent mutations, so that the polymorphism arose with a unique mutation event.

Two cases naturally arise. The allele may be *ancestral*, which for the remainder of this article we term *old*, or *nonancestral*, which we term *new*. Suppose we are considering a biallelic polymorphic locus, for instance, an {*A*, *C*} SNP. In the absence of other information the prior probability that the *A* allele is new is ^{1}/_{2}. If we observe a set *S* of *A* alleles and study the joint distribution of coalescence times of ancestors of *S* and the population frequency of the *A* allele, then the setup is *not* symmetric between the cases of *A* new or old.

For example, it is obvious that for a new allele all coalescent events must occur after the mutation that gave rise to the allele, but for an old allele some coalescent events can occur before the mutation that made the locus polymorphic and some after. This distinction is important for much of our discussion.

We have here an implicit reverse diffusion process where we consider the frequency of the allele back in time from the present. For a new allele the frequency will at some point be 0, for an old allele the frequency will at some point be 1, and the coalescent process will become the ordinary Kingman coalescent.

We assume our organism is diploid and scale time so that unit time is 2*N* generations, where *N* is the effective population size. This is convenient and has become standard in the literature.

Our results, true for a panmictic population of very large constant size *N*, include:

The MRCA satisfies a diffusion equation

*with killing*(see Karlin and Taylor 1981, for definitions), and the solution can be given as an explicit (infinite) sum of orthogonal polynomials.The expected time to the MRCA of two copies of an allele is

*f*, the frequency of the allele. More generally, if*M*is the expected time to the first coalescent event for*k*copies of an allele of frequency*f*, then and the mean times for the whole coalescent subtree of these*k*alleles are the same as for the ordinary coalescent with fixed population size*Nf*. This is true although the probability density of each coalescent time is*not*the same as for the ordinary coalescent.If an allele is known to be new then over a small time interval back from the present, the probability density of the time

*t*to the MRCA is approximately the constant 1/*f*.If the allele is not known to be new or old, then the probability density (for small time

*t*) is approximately*e*^{−}. [Corollaries 3 and 4 make these statements precise.]^{t}/fIf an allele is known to be new then the expected time

*M*to the MRCA is an expression*M*_{1}(*y*) given in Equation 16 below. (This expression also appears in Equation 9.2 of Griffiths and Tavaré 2003.)For a new allele, as the frequency

*f*tends to 1, the expected time to the MRCA tends to

## METHODS

### Diffusion equations:

Suppose *X*(*t*) is the population frequency of an allele at time *t*. We begin with the basic diffusion which is the Wright-Fisher equation for the propagation of the frequency of an allele under no selection, in a panmictic constant-size population of *N* diploids, where we scale time so that unit time is 2*N* generations.

We call this the *Kimura case*. A key quantity is the *transition function K*(*x*, *y*; *t*), the probability density of *X*(*t*) = *y* conditional on *X*(0) = *x*. Thus setting 1*P*(*t*, *x*) is the probability that the allele is polymorphic at time *t* given that it has frequency *x* at time 0. The transition function *K*(*x*, *y*; *t*) satisfies two important partial differential equations: 23Equation 2 is the *Kolmogorov forward equation* and (3) is the *Kolmogorov backward equation*. Kimura uses these equations creatively in much of his work on diffusions. Kimura (1955) contains the first appearance of these equations in his work, with an extensive justification and discussion given in Kimura (1964). A textbook treatment is in Karlin and Taylor (1981), while the original rigorous treatment is by Kolmogoroff (1931).

We are interested in the time to the MRCA of two copies of an allele with current population frequency *y*. We call this the *coalescent case*.

We discuss the diffusion of the population frequency back in time, conditional on coalescence not having occurred. When coalescence occurs we stop (*kill*) the diffusion.

It is natural to consider the function *C*(*x*, *y*; *t*), which is the joint probability density that *X*(0) = *y* and the MRCA has age greater than *t* conditional on *X*(−*t*) = *x*.

Consider the event *E* that the allele frequency is *x* at time −*t* and that the time to the MRCA (TMRCA) occurs earlier than −*t* so that coalescence has not yet occurred at −*t*. Thus the probability, conditional on the event *E*, that the MRCA occurs in the interval (−(*t* + δ*t*), −*t*) is δ*t*/*x* + *o*(δ*t*). This is for the same reason that in the ordinary coalescent the waiting time for the coalescence of two alleles is exponential with mean 1. That is, the instantaneous probability that our diffusion is killed is 1/*x*.

Now it follows that the function *C* satisfies the backward equation: 4To motivate Equation 4, note that if we consider a Wright-Fisher diffusion in a population of *constant* frequency *x*, then the probability *D*(*t*) that the MRCA of two alleles is greater than *t* is just *e*^{−}^{t}^{/}* ^{x}* and we have Equation 4 also occurs as a special case of Equation 83 in a recent article of Griffiths (2003).

We may consider the forward process where the allele frequency diffuses as in the *K* process but in addition at time *t* when the frequency is *y* the probability in the interval (*t*, *t* + δ*t*) that the process is killed is δ*t*/*y* + *o*(δ*t*). This then yields the forward equation 5Karlin and Taylor (1981)(p. 314) give a simple method of realizing this forward process with killing. Let *X*(*t*) be a sample path in the (forward) Wright-Fisher diffusion with *X*(0) = *x*. As *X*(*t*) ≤ 1 the integral diverges. Let *R* be a random variable, exponentially distributed with mean 1 and now set ζ (the *killing time*) by The path *X* killed at ζ is the realization we seek.

### Solution of the transition function:

In both the Kimura and coalescent cases, we can write the transition function as an explicit sum of polynomials, orthogonal on the unit interval with respect to an appropriate weight function.

#### The Kimura case:

As we give a detailed argument for the coalescent case and there is now a textbook treatment for the easier Kimura case in Karlin and Taylor (1981)(Chap. 16), we just give the result without proof.

Alternative approaches (“lines of descent”) are presented in Griffiths (1980) and Tavaré (1984).

Here is the result, first obtained by Kimura (1955), 6with λ(*i*) given by 7The *J*^{1,1}_{i} are Jacobi polynomials (in this case also called *Gegenbauer* polynomials), and *N*^{1,1}_{i} are normalization constants. The *J _{i}* and

*N*are defined explicitly in the appendix. This result is generalized by Theorem 2 below.

_{i}#### The coalescent case:

The diffusion *C* differs from *K* by considering the extra chance that coalescence has occurred. As a result *C*(*x*, *y*; *t*) < *K*(*x*, *y*; *t*). It follows that *C*(*x, y*; *t*) → 0 as *x* → 0 or *x* → 1. Fix *y* and seek solutions to (4). We seek a solution of the form α(*t*)β(*x*).

We give details of the analysis in the appendix, but here is a summary of the main results.

Theorem 1. *Let* α(*t*)β(*x*) *be a solution to the differential equation* (4). *Then*

*where*λ*=*(*n*+ 2)(*n*+ 3)/2*for some integer n*≥ 0.*For a given choice of n of*1*above, set**then w*(_{n}*x*)*is a polynomial in x.**Substitute x*= (1 +*z*)/2;*then w*=*w*_{n}satisfies*which is a special case of the Jacobi differential equation*, 8*with a*= 1,*b*= 3,*whose solutions are the Jacobi polynomials P*^{a,b}_{n}*, which we define explicitly in the*appendix. (*See, for example,*Birkhoff and Rota 1969,*Chap. 9*,*Equation*32).*The w*It now follows from standard results on orthogonal polynomials that we can write_{n}are orthogonal with respect to the norm*C*(*x*,*y*;*t*) as 9where the coefficient*c*(_{i}*y*) is determined from the orthogonality of the Jacobi polynomials.

#### The general case:

There is a more general diffusion equation for the most recent coalescent event with *k* copies of an allele. What we have been terming the “coalescent case” is the simplest case where *k* = 2.

If we have *k* copies, there is a most recent coalescent time, which is the first time in which the *k* copies have fewer than *k* distinct ancestors. We call this a *coalescent event*. We now define *C _{k}*(

*x*,

*y*;

*t*) as the joint probability that the frequency now (at time 0) is

*y*, and no coalescent event occurred in the time interval (−

*t*, 0), conditional on

*X*(−

*t*) =

*x*. If

*k*= 1, then coalescence is impossible so we take The same argument as before shows that

*C*satisfies the backward equation, 10where .

_{k}We can now generalize Theorem 1. Detailed arguments are found in the appendix.

Theorem 2. *Consider k copies of an allele with population frequency y. Suppose that the diploid population is panmictic with constant size N. Take unit time to be* 2*N generations. Write X*(*t*) *to be the population frequency of the allele at time t, so that X*(0) = *y. Let C _{k}*(

*x*,

*y*;

*t*)

*be the probability density, conditional on X*(−

*t*) =

*x of the event that X*(0) =

*y and no coalescence has occurred by time*−

*t.*(

*So each of the k ancestors is distinct at time*−

*t*.

*If k*= 1,

*coalescence is meaningless and we take C*

_{1}(

*x*,

*y*;

*t*) =

*K*(

*x*,

*y*;

*t*)).

*Then under the diffusion approximation*11

*where*

*The polynomials J*

^{1,2k−1}

_{i}

*and normalizing constants N*

^{1,2k−1}

_{i}

*are defined in the*appendix.

#### The age of an allele—a Bayesian interpretation:

We now discuss our interpretation of the meaning in our diffusion model of the “age of an allele.” We propose first taking a rather natural Bayesian prior and then taking limits. This is an interpretation made in Kimura and Maruyama (1975) and made very explicit in Watterson (1976) and Sawyer (1977), although both of these authors preferred a different interpretation involving reversibility. Take the time now to be 0 and assume for the moment that mutations have taken place in a finite time interval [−*T*, 0] where *T* is very large. We assume that the prior distribution of the mutation time is uniform in [−*T*, 0] and that the probability of a double mutation is so low as to be negligible.

We model a mutation as giving birth to an allele with frequency ε that is very small. (Indeed this is biologically as well as mathematically reasonable; the initial frequency is 1/2*N*, where *N* is the effective population size, and *N* is of course finite.)

We are interested in quantities of the form for various functions *f*(*x*).

We always neglect terms of order ε^{2}. Here is an example. Let *A*(*t*) be the probability that the age of a nonancestral allele of frequency *y* is at least *t*.

Then it follows from our mutation model that without approximation, Since *T* is large this can be approximated by the infinite integral where we use Theorem 3 below.

Now *K*(*x*, *y*; *t*) = *O*(*x*) for small *x*. It follows that the first integral can be neglected and that approximately, Other such integrals in what follows should be interpreted in the same sense.

As *A*(0) = 1, it is easy to determine the constant of proportionality, and we finally obtain

### Green's functions:

We next discuss the *Green's function* of our diffusion, which is a technical tool that has become standard in diffusion theory. Karlin and Taylor (1981) have an extensive treatment. Knowledge of Green's function is the fastest route to deriving a number of the results we will prove. Ewens (1963a)(1964) was the first to make use of Green's function in genetics, and it was used implicitly by Kimura in several of his articles.

#### The Kimura case:

Green's function 12is well known, and a textbook treatment is in Karlin and Taylor (1981).

The result is as follows.

Theorem 3 (Ewens 1963a, 1964). *For x* ≤ *y* *For x* > *y*

Most of the basic results on the Kimura diffusion follow rapidly from knowledge of Green's function. We give two examples.

Corollary 1 (Kimura and Ohta 1973; Watterson 1976). *Given an allele has frequency x, the probability the allele is old is x.*

*Proof*.

Thus

Corollary 2 (Watterson 1962; Ewens 1963b). *Let an allele have frequency x*; *then the expected time to fixation or extinction is*

*Proof*. If *A*(*u*) is a probability density, and (the complement of the cumulative density), then it is well known that Using this, as asserted.

Suppose all we know about a biallelic marker is that it is polymorphic. If *x* is the population frequency of one of the two alleles, what is the natural prior probability distribution *P*(*x*) for *x*? From Green's function and our Bayesian interpretation, we see that the appropriate prior is the *improper* prior,

This prior is often used in the remainder of the article. An interesting consequence, given in Kruglyak and Nickerson (2001)(Box 1), though with a different argument, is that if a SNP is detected from examination of two chromosomes, then the distribution of the allele frequency of the discovered SNP is uniform. This follows because if a SNP has frequency *x*, the probability that two copies of the marker do not agree is proportional to *x*(1 − *x*) and multiplication by the prior gives a constant, independent of *x*.

#### The coalescent case:

Our process now is subject to killing at rate 1/*x*.

Define Green's function: 13

Standard techniques (and some heavy algebra using Maple) yield the following.

Theorem 4. *For x* ≤ *y* *For x* > *y*

Theorem 5. *For an allele of unknown ancestry status and frequency y*:

*Let A*(*t*)*be the probability that the MRCA of two copies is at least t. Then*14, 2.The expected age of the MRCA is y. More generally the expected age of the first coalescent event, for k copies of the allele is y/*S, where*.*The expected age of each coalescent event, in the coalescent tree of k copies of an allele with frequency y is the same as in the ordinary coalescent in a fixed size population of size y.*We give details in the appendix, but briefly sketch the central, rather simple idea to prove 2 and 3. An easy calculation shows that if τ is the time to the coalescent event,

*M*=*E*(τ), and*x̅*is the expected population frequency at time τ, then . But from general principles . This gives statement 2. Statement 3 now follows by induction.

Although the *expectations* are the same as for the ordinary coalescent, the *distributions* of the coalescent times are different, because the population size of the allele varies with time.

Theorem 6. *For a new allele, of frequency y:*

*Let B*(*t*)*be the probability that the MRCA of two copies has age at least t. Then*15, 2.The expected age of the MRCA is M_{1}(*y*),*where*16

*and*

We again give details in the appendix.

If we consider an old allele, rather than a new one, as in Theorem 6, then the expected age of the MRCA of two copies, *M*_{2}(*y*) say, is 17This follows immediately, since by Theorem 5 and Corollary 1:

In Figure 1 we show plots of *M*_{1} and *M*_{2}.

Our formulas have implications for coalescent theory. As an illustration, and a check on our algebra, suppose we observe *n* alleles and of these *i* carry a mutation, known to be nonancestral. Wiuf and Donnelly (1999)(Equation 28) give a formula for the expected age of the MRCA of the *i* alleles, using coalescent arguments.

Take as an example *i* = 2, *n* = 4. Then with the same time normalization as we have been using, the Wiuf-Donnelly formula gives the expected MRCA age as 5/18. On the other hand, if we observe two copies of a new mutation out of four, the posterior distribution of the population frequency *y* is readily seen to be 12*y*(1 − *y*)^{2} and so the expected age of the MRCA is which Maple shows is indeed 5/18. If the mutation is not known to be new or old, we get an easier integral for the expected age of the MRCA, which we checked by coalescent simulations.

### The distribution of the time to the MRCA:

#### Some numerical plots:

Consider an allele of frequency *y* at *t* = 0. We are again interested in coalescence of two copies of the allele. We let *V*(*t*) = *V*(*t*|*y*) be the probability density of the event that the coalescence time is −*t*, which we can compute numerically using Theorem 2 and Equations A22 and A23.

In Figure 2 we show plots of *V*(*t*). We first show a plot for *y* = 0.01, 0.02, 0.05, 0.10. Then we plot *y* = 0.10, 0.20, 0.50, 0.99.

The frequency in the top graph is lower than that in the bottom graph (except we chose to plot frequencies 0.1 and 0.99) by a factor of 10 and the graphs are similar except for the scale. As a rough approximation the probability density *P*(*y*, *t*) that the MRCA of two alleles of population frequency *y* has age *t* is a function of *y*/*t*.

#### A simulation scheme:

We simulate the coalescent with side information—the population frequency of an allele.

In the ordinary coalescent, if we begin with *k* haploids, the next coalescent time *t* is exponentially distributed with mean 1/ and at coalescence we have *k* − 1 distinct ancestors. Thus a simulation procedure can be iterated until *k* = 1.

In our case, if we begin with *k* haploids, each with an allele of frequency *y*, we can proceed analogously by sampling (*t*, *x*), where *x* is the population frequency at coalescence. Just as in an ordinary coalescent simulation we can iterate, until *k* = 1.

The required probability densities are given by Equations A22 and A23 in the appendix. To use these we will need to evaluate *C _{k}*(

*x*,

*y*;

*t*) numerically. The series expansion given by the orthogonal polynomial expansion of Theorem 2 converges very fast unless

*t*is small, but in that case the series has poor numerical behavior. We therefore give alternative formulas for the coalescent times valid for small

*t*. Taken together these give satisfactory results for all

*t*.

#### Taylor series for the coalescent time:

Our orthogonal polynomial expansions are not numerically very stable for small time *t*. The instability is related to the well-known Gibbs phenomenon. (See Gottlieb and Shu 1997, for a modern review and some sophisticated techniques for alleviating the problem.) To develop random sampling procedures valid for all regions of time, we wanted to have high-accuracy solutions to the density of the coalescent time valid for small time.

Voronka and Keller (1975) and Tier and Keller (1978) develop, for the Kimura case, highly accurate asymptotic series when time *t* is small. Their methods use techniques from the theory of wave propagation. We instead use methods that are more elementary and seem sufficient for our needs.

Fix the frequency *y* and let *C*(*x*, *y*; *t*) be defined as before. We term a function *f*(*x*) on the unit interval *well-behaved* if for all derivatives *f*^{(}^{k}^{)}(*x*) of *f* and all partial derivatives

If *f*(*x*) is a rational function, it is well behaved. Let There is an interesting recursion for the Taylor series of *F* as a power series in *t* around *t* = 0.

Theorem 7. *Let f*(*x*) *be a well-behaved function on the unit interval. Let* *Define f*_{0}* = f and iteratively for n =* 1, 2, *…* , *let* *Then f _{n}*(

*y*)

*is the nth partial derivative of F with respect to t, evaluated at t*= 0.

Let an allele have frequency *y*, and let *Y*(*t*) be the probability distribution of the coalescence time of two copies of the allele.

Corollary 3. *If the allele is new, then the Taylor series T*(*t*) *for Y at t* = 0 *is*

Corollary 4. *If the allele is not known to be new or old, then the Taylor series T*(*t*) *for Y at t* = 0 *is*

*Proof*. For Corollary 3, using Green's function, we see that we evaluate the Taylor series by setting *f*_{0} = *f*(*x*) = *y*/*x*^{2} in Theorem 7. But then *f*_{1}(*x*) = 0 and the result follows.

For Corollary 4 set *f*_{0} = *f*(*x*) = *y*(1 − *y*)/*x*^{2}(1 − *x*). Now *f*_{1} = −*f*_{0} and iterating, *f _{n}* = (−1)

^{n}f_{0}. Again the result follows.

## DISCUSSION

Kingman's coalescent has for 20 years been the dominant paradigm for the analysis of ancestral trees. Our work provides convincing evidence that diffusion theory still has much to offer. Suppose, for example, the frequency of some allele in the population is known. One wishes to sample from the probability distribution of the coalescent, for *k* copies of the allele. Our theory gives the joint distribution of the next coalescent time *t* and the population frequency *X*(*t*), at that time. Sample (*t*, *x*) according to this distribution. The procedure may be iterated, just as in an ordinary coalescent simulation. Equations A22 and A23 of the appendix give the required probability densities.

We found it interesting that the theory is much simpler if the ancestral state is unknown than if it is known. The underlying reason is suggested by the proof of our Theorem 5. If the ancestral state is unknown then the expected population frequency in the past is the same as it is now, and this causes substantial simplification.

## APPENDIX

### Proof of Theorem 1:

As we have already argued, *C*(*x*, *y*; *t*) < *K*(*x*, *y*; *t*), and so *C*(*x*, *y*; *t*) → 0 as *x* → 0 or *x* → 1. Fix *y*. We seek solutions to (4) of the form α(*t*)β(*x*). From the equation, α′(*t*)/α(*t*) is independent of *t*. It follows that α(*t*) = *e*^{−λ}* ^{t}* for some λ and A1

We want solutions with β(0) = β(1) = 0. We first show λ ≠ 0. Consider the homogeneous differential equation A2This equation has two independent solutions, as can be verified directly by differentiation: A3A4We find that A5A6A7A8And so no nonzero linear combination gives 0 at both ends of the unit interval. Thus λ ≠ 0.

Next, in contrast to the Kimura case, we show λ ≠ 1. Indeed if λ = 1 Equation A1 is which has the two independent solutions *x*^{2} and 1/*x* and again no linear combination is feasible.

We now seek solutions with a convergent power series in the open interval (0, 1), so that where Equation A1 now gives A9and so A10It follows that if there exists *n* with (*n* + 1)(*n* + 2) > 2λ, and *c _{n}* ≠ 0 that the sequence

*c*converges to a nonzero value,

_{m}*L*say, as

*m*→ ∞. But then as

*x*→ 1,

*x*

^{2}(1 −

*x*)β″(

*x*) →

*L*, and β(

*x*) →

*L*/(2(1 − λ)). This is a contradiction. Therefore β″(

*x*) [and β(

*x*)] is a polynomial, and λ = (

*n*+ 1)(

*n*+ 2)/2 for some integer

*n*≥ 0. As λ ≠ 1 we in fact obtain From (A1) and noting that λ > 1, β(

*x*) is divisible by

*x*

^{2}(1 −

*x*). Set Substituting in (A1) we get after a little algebra A11a hypergeometric equation, A12(See, for example, Birkhoff and Rota 1969, Chap. 9, Equation 18.) Here, Substituting

*x*= (1 +

*z*)/2 we obtain Equation A13.

A13which is a special case of the Jacobi equation (8). Theorem 1 now follows from standard facts about the Jacobi polynomials.

### The general case—Proof of Theorem 2:

We now generalize the analysis to the solution of Equation 10, which is the corresponding equation for considering the coalescence of *k* copies of an allele. As before set . Again we seek solutions of the form α(*t*)β(*x*), and The corresponding equation to (A1) for general *k* is A14Substitute Making the same transformation as before, by setting *x* = (1 + *z*)/2 we obtain the differential equation, generalizing (A13), A15a Jacobi equation with *a* = 1, *b* = 2*k* − 1.

The eigenvalues are A16and the solution to Equation A15 is the Jacobi polynomial *P*^{1,2k−1}_{n}.

Thus the equation generalizing (9) is where λ* _{i}* is given by (A16). Multiply by

*x*

^{k−1}

*J*

^{1,2k−1}

_{i}and integrate. We define and get, using

*C*(

_{k}*x*,

*y*, 0) = δ(

*x*−

*y*), A17This is enough to prove Theorem 2.

In Table A1 we give the first few values of *J _{n}* and

*N*, which are useful for checking a software implementation that uses Theorem 2.

_{n}### Proof of Theorem 5:

Let *A*(*t*) be the probability that the MRCA of two copies of an allele with frequency *y* is at least *t*, where the ancestral status of the allele is unknown. Then where the last term corresponds to the case that the allele is old and that the coalescent event occurs before the mutation event, so that the allele was fixed in the population at the time of coalescence.

As *A*(0) = 1 we get A18which is what we required. For the second part we give a Martingale argument that has its own interest and avoids any heavy computation.

We first give a “reversibility” argument that is now well known (see Watterson 1976, for a detailed description and applications). Write calculated for the Kimura diffusion. First assume that 0 < *x* < 1. It follows from the Kolmogorov equations or directly from Theorem 2 that Under our Bayesian interpretation,

At the boundary we can also check that with our Bayesian interpretation It follows that if we fix *y*, then *R*(*x*, *y*; *t*) and *K*(*y*, *x*; *t*) are equivalent stochastic processes. However, *K*(*y*, *x*; *t*) is evidently a *Martingale* (Karlin and Taylor 1975) and therefore *R*(*x*, *y*; *t*) is also.

Next we give a result, of independent interest, relating the expected coalescent time to the expected allele frequency at coalescence. The result holds in wide generality.

Lemma 1. *Consider k copies of an allele with population frequency f. Do not assume details of the process such as whether the allele is old or has unknown status, selection … Normalize time so that unit time is* 2*N generations. Let the mean time to the first coalescent event be M*(*f*)*. Set* . *Let X*(*f*) *be the expected frequency of the allele at first coalescence. Then* *Proof*. Let *A*(*t*) be the probability of no coalescence by time *t* back from the present. Then by a standard argument A19

Let *P*(*t*, *x*) be the joint probability that coalescence has not occurred at time −*t* and that the frequency at time −*t* is *x*. To make the notation clear we assume 0 < *x* < 1 and define *Q*(*t*) to be the joint probability that coalescence has not occurred at time −*t* and that the frequency at time −*t* is 1. [In general, lim_{x}_{→1}*P*(*t*, *x*) ≠ *Q*(*t*).] Then A20Next define *B*(*t*, *x*) as the joint probability density that coalescence occurs at time −*t* and that the frequency is *x* (0 < *x* < 1). Then Thus where we use Equation A19. This proves the lemma.

Robert Griffiths (personal communication) pointed out that this lemma is a result about any stochastic process *X*(*t*) on the positive real line that is killed at a rate proportional to 1/*X*(*t*).

Continuing with the proof of Theorem 5, we have constant population size, no selection, and the population frequency is *f*. We suppose that the allele status (new or old) is unknown.

Let τ be the first coalescence time given *k* copies of an allele. Now τ is a *stopping time* with finite mean, and it follows from basic Martingale results that Set, as before, . By Lemma 1 we have and it follows that This completes the proof of part 2 of Theorem 5. The last part follows easily by induction.

For numerical work we need expressions for the probability density [so we could write, more explicitly, *P*(*t*, *x*; *y*) for *P*(*t*, *x*)] and for Then by an argument we have already given, for 0 < *x* < 1, A22while A23Here *C _{k}* is given explicitly by the series expansion of Equation 11.

### Proof of Theorem 6:

The same argument as the proof of the first part of Theorem 5 gives our expression for *B*(*t*). For the second part, we have Now using Theorem 4 and Maple we can integrate this in closed form and obtain Equation 16.

### Proof of Theorem 7:

We see that *F*(*y*, *f*; 0) = *f*(*y*). Define A24We just prove the result for *k* = 1. The general case is essentially the same. For the first derivative and so This proves our result.

### Jacobi polynomials—some formulas:

We give here some formulas involving Jacobi polynomials, necessary for calculations using the orthogonal polynomial expansion of Theorem 1. Our main source is Abramowitz and Stegun (1964)(Chap. 22).

The Jacobi polynomials *P*^{a,b}_{n} are orthogonal on the interval [−1, 1] with respect to the weight function *W _{a}*

_{,}

*(*

_{b}*z*) = (1 −

*z*)

*(1 +*

^{a}*z*)

*. We normalize so that A25With this normalization A26where*

^{b}*A*

^{a,b}

_{n}is given by A27and

*P*

^{a,b}

_{n}is defined by A28(see Abramowitz and Stegun 1964, Equation 22.18), where

*a*

_{0,}

*(*

_{n}*z*) is defined recursively by A29A30and A31The above formulas are recommended for numerical work if one wishes to evaluate

*P*

^{a,b}

_{n}.

It is more convenient for us to work with polynomials *J*^{a,b}_{n} defined on the unit interval by which are orthogonal on the unit interval with weight function *w _{a}*

_{,}

*(*

_{b}*x*) =

*x*(1 −

^{b}*x*)

*and A32where A33and*

^{a}*A*

^{a,b}

_{n}is given by Equation A27.

We have *a* = *b* = 1 for the Kimura case and *a* = 1, *b* = 3 for the coalescent case. With *a* = 1 our normalization is .

## Acknowledgments

We thank David Reich for constant advice and encouragement; Pardis Sabeti and Eric Lander, whose comments much improved the presentation; Clifford Taubes for a very useful conversation; and Noam Elkies, who substantially improved an argument. After a first draft of this article had been written Robert Griffiths kindly sent me manuscripts of two articles of his in prepublication and made a number of constructive suggestions. We are extremely grateful to him and as a result made a number of improvements to our draft, especially a much better set of references. Finally we are very grateful to the editor and two anonymous referees, whose careful reading has greatly improved the article. This work would hardly have been possible without the extensive use of the symbolic algebra program MAPLE (2002).

## Footnotes

Communicating editor: S. Tavaré

- Received March 6, 2003.
- Accepted October 23, 2004.

- Genetics Society of America