Estimating the Time to the Most Recent Common Ancestor for the Y chromosome or Mitochondrial DNA for a Pair of Individuals
Bruce Walsh


Bayesian posterior distributions are obtained for the time to the most recent common ancestor (MRCA) for a nonrecombining segment of DNA (such as the nonpseudoautosomal arm of the Y chromosome or the mitochondrial genome) for two individuals given that they match at k out of n scored markers. We argue that the distribution of the time t to the MRCA is the most natural measure of relatedness for such nonrecombining regions. Both an infinite-alleles (no recurring mutants) and stepwise mutation model are examined, and these agree well when n is moderate to large and k/n is close to one. As expected, the infinite alleles model underestimates t relative to the stepwise model. Using a modest number (20) of microsatellite markers is sufficient to obtain reasonably precise estimates of t for individuals separated by 200 or less generations. Hence, the multilocus haplotypes of two individuals can be used not only to date very deep ancestry but also rather recent ancestry as well. Finally, our results have forensic implications in that a complete match at all markers between a suspect and a sample excludes only a modest subset of the population unless a very large number of markers (>500 microsatellites) are used.

MOLECULAR marker information has proven an invaluable tool for assessing the degree of relatedness between individuals. To date, most uses of marker information have been concerned with zero-, one- or two-, or deep-generation relatives. By zero-generation, we mean matching/rejecting a forensic sample and a suspect. One- or two-generation assessment includes paternity testing and assessing the degree of relatedness between individuals in natural populations. Typically, these tests have very low power for detecting relatives more distant than sibs and first cousins (Queller and Goodnight 1989; Lynch and Ritland 1999). Finally, human geneticists have been very successful in using marker information to assess very deep relationships, on the order of hundreds (or more typically thousands) of generations. Many of these deep relationship studies have used the haplotypes of nonrecombining chromosomes, such as the nonpseudoautosomal arm of the Y (e.g., Hammer 1995; Dekaet al. 1996; Skoreckiet al. 1997; Bianchiet al. 1998; Kittleset al. 1998; Wilson and Balding 1998; Thomaset al. 2000) and mitochrondial DNA (e.g., Torroniet al. 1994; Merriwetheret al. 1995; Forsteret al. 1996; Brownet al. 1998; Stone and Stoneking 1998; Torroniet al. 1998). Here we show how haplotype information can also be used to estimate the age of the common ancestor for individuals sharing an intermediate ancestry (tens to hundreds of generations) with reasonable precision. Since our comparison is restricted to the time to the common ancestor for a particular pair of individuals of interest, the fine details of the population history and structure do not enter into the analysis, other than very weakly through the mean of the assumed prior (as discussed below).

For unlinked markers, the product rule (multiplying single-locus genotype probabilities together to obtain a multilocus genotype probability) applied to highly polymorphic loci allows just a few (5-10) unlinked markers to be quite sufficient for identifying individuals that share a common relative in the last generation (such as parent-offspring or sibs). However, because of recombination, unlinked markers have very weak power for distinguishing individuals sharing deeper common ancestry. While there is a growing body of literature on estimating relatedness of two individuals given autosomal marker information (Thompson 1975; Queller and Goodnight 1989; Blouinet al. 1996; Ritland 1996; Marshallet al. 1998; Lynch and Ritland 1999), it is less clear how to proceed when using markers from nonrecombining DNAs. The product rule does not hold for such regions, as the markers are inherited as a single block.

The key to assessing the amount of relatedness using markers on a nonrecombining chromosome is that any two individuals (indeed, the entire population) will have a most recent common ancestor for that region. This follows from coalescence theory (Hudson 1991; Donnelly and Tavaré 1995), which shows that in a demographically stable population the expected time back to this most recent common ancestor (MRCA) follows a geometric distribution with parameter λ = 1/Ne, the inverse of the appropriate measure of effective population size [the effective numbers Nm and Nf of males and females for the Y and mitochondrial (mt)DNAs, respectively]. Using marker information, we can estimate the time t to the MRCA, and it is the distribution of t that provides a natural metric for describing the relatedness (at least for that region) between two individuals. It is important to stress this difference between the MRCA of a region of DNA and the MRCA for two individuals. Two individuals can easily share an ancestor that is more recent than their MRCA for a particular DNA region. Hence, estimates of the time to the MRCA using a particular DNA region provide an upper bound on the time back to the most recent common ancestor shared by two individuals.

Here we develop a Bayesian estimator for t, obtaining the complete posterior distribution for the time t to the MRCA. We assume that n markers are scored on the nonrecombining chromosome of interest (either the Y or mtDNA) and that we observe matches in allelic state at k of these. We start by assuming the infinite alleles model, where each mutation is assumed to be unique. We then modify our results by assuming a (symmetric single-step) stepwise mutational model, which is a more appropriate descriptor for microsatellite markers. As we show, when k/n is close to one and n moderate to large, the two different mutational models give essentially the same distribution for t.


Suppose we score the allelic states at n defined markers on a nonrecombining segment of DNA for two individuals and we wish to estimate the time back to their MRCA (for this segment). We first assume for each marker that (at most) only a single mutation has occurred over both lineages leading from the MRCA to the two individuals being considered. This is not an unreasonable assumption if the individuals match at most markers. We refer to this model as the infinite alleles model, as our development is also exact for the situation where each mutation is unique so that matches occur only when the marker locus in both lines has not mutated. Finally, we assume that the markers are exchangeable in the sense that the per-generation mutation rate μ is the same for each locus. We relax these assumptions below.

Let q(t) be the probability of a match (at any given single marker) between two individuals with a most recent common ancestor t generations ago. The number of matches (k) out of n loci follows a binomial distribution, with Pr(k)=n!(nk)!k!q(t)k[1q(t)]nk. (1) Ignoring matches created by parallel and/or back mutations, the probability that a marker matches after t generations is simply (1 - μ)2t, as the probability of no mutations occurring in the lineage from the ancestor to individual one is (1 - μ)t, with a similar probability from the MRCA to individual two. Hence, q(t)=(1μ)2te2μt=eτ, (2a) where τ=2μt (2b) is the time scaled as total generations of divergence times the mutation rate. Note that the expression given by Equation 2a is a lower bound for the probability of a match, as if back mutations occurred in one (or both) lineages, or, if both lineages experienced parallel mutations, we also observe a match. These types of matches require at least two mutational events, and hence from the first two terms of the Poisson distribution their probability is bounded above by 1 - exp(-2μt)(1 + 2μt), i.e., on the order of (2μt)2exp(-2μt). Thus if 2μt ≪ 1, the effects of such multiple mutations have only a trivial effect on increasing q(t) over the value assuming no mutations (see Figure 4). When a specific value of μ is required, we generally use μ = 1/500 = 0.20%, motivated by estimates for Y chromosome microsatellites of 0.28% (Kayseret al. 2000) and 0.21% (Heyeret al. 1997), which are very similar to the estimated mutation rates of 0.1 to 0.21% for autosomal microsatellites (Weber and Wong 1993; Brinkmannet al. 1998). We note that these mutation rate estimates are generally done by scoring microsatellites already known to be polymorphic, which introduces a slight ascertainment bias. However, since we assume the markers being scored are also chosen because they are known to be polymorphic (in the population as a whole), then these potentially biased estimates of mutation rates are appropriate for our analysis.

From Equations 1 and 2a, the resulting likelihood for the time t back to the MRCA given that we observe k out of n matches is L(tn,k)=n!(nk)!k!ekτ(1eτ)nk. (3) Setting the derivative of ln(L) equal to zero gives the maximum-likelihood estimate (MLE) for τ^=2t^μ as τ^=2t^μ=ln(nk). (4) Hence, the MLE for the time back to the MRCA becomes t^=12μln(nk). (5) Note that the MLE is not especially informative, as the distribution for t is highly positively skewed, resulting in a considerable variance and highly asymmetric confidence intervals about the MLE. In particular, note that the MLE is zero for all values of n when there are no mismatches (k = n), which tells us nothing about the possible restrictions on the maximal time back to the MRCA (see Fu and Li 1996 and Donnellyet al. 1996 for a related discussion).


While the MLE describes one feature of the distribution of t (the mode), the most complete picture is given by the full posterior distribution of t, which can be obtained by a Bayesian analysis (e.g., Lee 1997). Such an analysis proceeds from Bayes’ theorem, with the posterior distribution p(t|k) being proportional to the product of a prior distribution p(t) for t and a likelihood L(t|n, k) given the data (k out of n matches), p(tk)L(tn,k)p(t). (6)

The main objection to a Bayesian analysis raised by non-Bayesians is that the choice of a prior is often very subjective and, as such, this can greatly bias the posterior. For the time back to the MRCA, an appropriate prior naturally follows from standard coalescence theory, as the expected time back to a MRCA under pure drift in an effective population size of Ne follows the geometric distribution with success parameter Ne1 (e.g., Wilson and Balding 1998). The parameter is 1/Ne in this case [as opposed to 1/(2Ne) for an autosomal gene] because the uniparental inheritance means that both mtDNA and the Y chromosome are essentially haploid. As summarized by Hammer (1995), estimates for Ne based on the standing level of variation at presumably neutral markers are on the order of 5000 in humans for both mtDNA and the male-specific region of the Y chromosome.

Treating time as continuous, the geometric prior is equivalent to using an exponential distribution with hyperparameter λ=Ne1 . Thus, the natural prior for the time to MRCA (in the absence of any marker information) is to use p(t)=λexp(λt),whereλ=Ne1. (7) Taking the limit as λ → 0 gives an (improper) flat prior. As we will shortly see, the actual value of λ used has at best a trivial effect on the posterior distribution unless most markers do not match (kn) and the effective population size is extremely small. Thus the prior is both well motivated and the choice of the prior hyperparameter (λ) has very little effect on the final (posterior) distribution in most cases.

Recalling Equation 3, the resulting posterior distribution becomes p(tk)L(tn,k)p(t)=exp[(2μk+λ)t](1exp[(2μt)])nk (8a) so that p(tk)=exp[(2μk+λ)t](1exp[(2μt)])nkI(μ,k,n,λ), (8b) where the normalizing constant is given by I(μ,k,n,λ)=0exp[(2μk+λ)t](1exp[(2μt)])nkdt. (9) The impact of the choice of the hyperparameter λ for the prior immediately follows from Equation 8a. Provided 2μk ≫ λ, alternate choices of λ have little effect on the posterior distribution. Since λ=Ne1 , this rearranges to 2Neμk ≫ 1. For a mutation rate of μ = 1/500, the choice of Ne (and hence λ) has essentially no effect on the prior provided Nek ≫ 250, which is a very mild restriction.

Returning to the posterior distribution, the normalizing constant is easily computed by expanding the (1 - e-2μt)n-k term, noting that we can express the function being integrated as exp[(2μk+λ)t](i=0nk(1)i(nk)!i!(nki)!exp[(2μti)])=i=0nk(1)i(nk)!i!(nki)!exp[(2μ(k+i)+λ)t]. (10) Thus I(μ,k,n,λ)=i=0nk(1)i(nk)!i!(nki)!0exp[(2μ(k+i)+λ)t]dt=i=0nk(1)i(nk)!i!(nki)!12μ(k+i)+λ=2nk(nk)!μnki=0nk[λ+2μ(ni)]. (11a) The last step can be shown either by induction or by using a standard symbolic algebra package (such as Mathematica).

With a flat prior (λ = 0) the normalizing term further simplifies to I(μ,k,n,0)=(nk)!(k1)!(2μ)n!. (11b) Hence, the posterior density becomes p(tk,λ)=(i=0nk[λ+2μ(ni)]2nk(nk)!μnk)(1exp[2μt])nkexp[t(2μk+λ)]. (12)

For zero marker mismatches (k = n), the posterior is simply an exponential distribution with parameter λ + 2nμ, p(tk=n)=(λ+2nμ)exp[(2μn+λ)t]. (13) It immediately follows that the mean (μt) and standard deviation (σt) for the time to MRCA when there are no mismatches are μt=σt=1λ+2nμ12nμ. (14a)

Likewise, the cumulative probability distribution for the time back to the MRCA is just Pr(tT)=0Tp(tk=n)dt=1exp((2μn+λ)T). (14b) In particular, the time Tα satisfying Pr(tTα) = α is given by Tα=ln(1α)2μn+λ. (14c) Assuming a flat prior (λ = 0), if two individuals are identical at all n marker loci, there is a 90% probability they shared a MRCA in the last 1.15/(μn) generations. The values for 50, 95, and 99% are 0.347, 1.498, and 2.303 (μn)-1 generations, respectively.

The posterior distributions with one or more mismatches also follow from Equation 12. For example, for one (k = n - 1) and two (k = n - 2) mismatches, the posteriors are p(tk=n1)=((λ+2nμ)(λ+2μ[n1])2μ)1exp[(2μt)]exp[t(2μ(n1)+λ)] and p(tk=n2)=((λ+2nμ)(λ+2μ[n1])(λ+2μ[n2])8μ2)×(1exp[(2μt)])2exp[t(2μ(n2)+λ)]. The mean and variance for the posterior distribution for any value of k again follow by expanding the (1 - e-2μt)n-k term. Define h(μ,k,n,λ)=0texp[(2μk+λ)t](1exp[2μt])nkdt (15a) and g(μ,k,n,λ)=0t2exp[(2μk+λ)t](1exp[2μt])nkdt. (15b) Expanding and term-by-term integration gives h(μ,k,n,λ)=i=0nk(1)i(nk)!i!(nki)!1(2μ(k+i)+λ)2 (16a) g(μ,k,n,λ)=i=0nk(1)i(nk)!i!(nki)!2(2μ(k+i)+λ)3. (16b) Hence, the mean and variance for the time t to the MRCA, given k of n marker loci match, a prior with hyperparameter λ, and a per marker mutation rate of μ, are given by μt=h(μ,k,n,λ)I(μ,k,n,λ) (17a) and σ2(t)=g(μ,k,n,λ)I(μ,k,n,λ)μt2. (17b)

Figures 1, 2, 3 plot the posterior distributions corresponding to different numbers of matches (k) for n = 5, 10, 20, 50, and 100 markers under the assumption that μ = 1/500 and there is a flat prior (λ = 0). Provided that Ne ≫ 250, the results are the same for any other prior using λ = 1/Ne.

Table 1 summarizes the mean, standard deviation (SD), and mode (the MLE) for the posterior distributions as well as the 2.5, 50, 90, and 97.5% cutoff values. Note that a 95% credible region for t is given by the 2.5 and 97.5% values. As expected, the distribution of t is highly skewed, as mode < median < mean. The distribution becomes increasingly skewed to the right as the number of mismatches increases, which is reflected in not only an increase in the mean but also in the variance. However, note that the mean/SD ratio declines with increasing numbers of mismatches, so that the coefficient of variation declines as k decreases.

Finally, note that the resolution for t offered by using n = 5 markers is very poor, but rather fine precision is offered by using 100 markers. While scoring the latter number of markers may be unrealistic, using 20 markers is both feasible as well as offering reasonable precision.


As our knowledge of the parameters associated with the mutational process continues to improve, it is likely that we may find significant differences in the mutation rates at different markers. Fortunately, it is straightforward to modify the likelihood function (Equation 3) to take this into account. Suppose n markers are examined, generating the matching data x1,..., xn, where the xi are coded as xk={1match at markerk0no match at markerk.} The likelihood becomes L(x1,,xnt)=k=1nqk(t)xk[1qk(t)]1xk, (18a) where qk(t)=(1μk)2te2tμk, (18b) giving L(x1,,xnt)=exp[2tk=1nμkxk]k=1n[1e2tμi]1xk. (18c) Using an exponential prior with hyperparameter λ, the posterior is proportional to p(tx)exp[t(λ+2k=1nμkxk)]k=1n[1e2tμi]1xk. (19) In any particular data set, the normalization constant is easily obtained by expanding the product involving 1 - e-2tμi, which generally involves only a few terms unless there are a significant number of mismatches.

Figure 1.

—The posterior distributions for the time to the most recent common ancestor (MRCA) between two individuals assuming 5 (top) and 10 (bottom) marker loci were scored and k matches are observed. A flat prior (λ = 0) and a mutation rate of μ = 1/500 were assumed. Numbers indicate number of marker alleles k that match between the two individuals. Values for another mutation rate μ* are given by scaling the values by μ*/μ. Time is measured in generations.

For the simplest case of no mismatches (k = n), the posterior for the time to the MRCA becomes p(tno mismatches)=(λ+2nμ)exp[t(λ+2nμ)], (20a) where μ is the mean mutation rate across all markers. Equations 14a, 14b, 14c hold with μ replaced by the mean mutation rate μ . Likewise, for one mismatch (say marker i), the posterior becomes [2(n1)μi+λ][2(nμ)+λ]2μiexp[t(λ+2(n1)μi)][1e2tμi], (20b) where μi is the mean mutation rate for all markers, excluding marker i.


Microsatellites, given their higher mutation rates compared to single nucleotide polymorphisms (SNPs), are clearly the marker of choice for estimating the time to MRCA when the individuals are assumed to be at least modestly related. As microsatellite “alleles” correspond to different lengths of the repeat unit in the microsatellite array, the infinite alleles model assumed previously is not appropriate if multiple mutations are expected in any given marker. Since mutations change the number of repeats (and hence the size) of an array, two (or more) mutations can recover the initial state found in the MRCA. Likewise, parallel mutations in both lineages leading from the MRCA can also lead to the two individuals sharing the same allelic state, even though mutations have occurred. Thus, using an infinite alleles model for microsatellites will tend to underestimate the time to the MRCA. To examine the severity of this underestimation, we consider the divergence, assuming a stepwise mutational model (SMM) assuming equal probabilities of an up (increase array size by one) or down (decrease array size by one) move (the symmetric single-step SMM). The roots of the SMM trace from Ohta and Kimura’s (1973) model of charge differences in electrophoretically scored proteins. A number of workers have shown this model to be a good fit for microsatellites in both indirect studies examining the distribution of array sizes in natural populations (Edwardset al. 1992; Shriveret al. 1993; Valdeset al. 1993; Di Rienzoet al. 1994) and direct studies looking at actual mutations arising in pedigrees (Brinkmannet al. 1998; Kayseret al. 2000). In the latter two studies, the vast majority (35 out of 37) of new mutations were single step, while the remaining (2 out of 37) were two step.

Figure 2.

—Posterior distributions for time to MRCA for 20 (top) and 50 (bottom) markers. Details are as in Figure 1.

Denote the allelic state (array size) in the MRCA as state 0, and let Xt denote the array size at time t. As before we assume a per-generation mutation rate of μ. Under this model, the transition probabilities between states become Pr(X(t+1)=i1X(t)=i)=Pr(X(t+1)=i+1X(t)=i)=μ2Pr(X(t+1)=iX(t)=i)=1μPr(X(t+1)X(t)2X(t)=i)=0. (21)

To apply this model, we need to compute q(t), the probability of a match between two lineages sharing a MRCA t generations ago. A little thought shows that for the one-step model allelic states in two lineages can only match if an even number (2M) of mutations have occurred. The appendix shows that Pr(match2Mmoves)=122M(2MM)=122M(2M)!(M!)2. (22) For example, after a total (between both lineages) of 2, 4, 6, 8, and 10 mutations, the probabilities that the marker allelic states match are 0.5, 0.375, 0.313, 0.273, and 0.246 (respectively). Thus, under this model there is a one in four chance that the two lineages share the same allelic state even after a total of 10 mutations have occurred.

Since the probability of 2M total mutations along both lineages assuming t generations back to the MRCA is given by a Poisson distribution with parameter 2tμ, we have the probability of a match conditioned on t as p(t)=M=0Pr(match2Mmoves)Pr(2Mmovest)=M=0(122M(2M)!(M!)2)((2μt)2M(2M)!)exp(2tμ)=exp(2tμ)(M=0(μt)2M(M!)2). (23) For example, considering only the first 10 mutations, p(t)=exp(2tμ)(1+(tμ)2+(tμ)44+(tμ)636+(tμ)8576+(tμ)1014,400). The general solution to Equation 23 follows by noting that k=0(x)2k(k!)2=I0(2x), (24) where I0 denotes the zero-order modified type I Bessel function (Olver 1964). Hence, under the single-step mutational model, the probability of a match after τ = 2μt generations is q(τ)=exp(τ)I0(τ). (25)

Figure 3.

—Posterior distributions for time to MRCA for 100 markers. Details are as in Figure 1.

Figure 4 compares the probability of a match as a function of τ for the infinite allele and one-step models. The match probabilities under both models are rather similar for τ < 0.5 (125 generations with μ = 1/500), but they diverge rather quickly after that. Note that even after 20τ generations (5000 generations with μ = 1/500), the match probability under the stepwise model is still nontrivial (0.09), reflecting the very slow decrease in the probability of a match as the total number of mutations increases. By contrast, the corresponding match probability is essentially zero (2.06 × 10-9) under the infinite alleles model, as a single mutation causes the allelic states to diverge and subsequent back (and/or parallel) mutations are not allowed.

As shown in Table 2, both the mean and variance (measured by the standard deviation) of the distribution for time to the MRCA are larger under the stepwise model than the infinite alleles model. This is certainly expected, as the effect of the stepwise mutational model is to allow for a match following two (or more) mutations, while a match is assumed to never recover following a mutation under the infinite alleles model. Table 2 gives the ratios of the means and standard deviations for the distribution of t under the stepwise model compared to the same statistic under the infinite alleles model. When the number of markers is large (20 or greater) and the number of mismatches is small, these ratios are close to one. The ratio of standard deviations is always larger than the means ratio, reflecting the longer tail (relative to that for the infinite alleles model) generated under the stepwise model. As the number of mismatches increases, the mean and SD ratios increase, reflecting increasingly larger probabilities for t under the stepwise relative to the infinite alleles model.

View this table:

Summary of the posterior distribution p(t|k, n), where t is the time back to the most recent common ancestor (MRCA) for two individuals that match at k of n markers

Also note from Table 2 that for the same fraction of observed matches (say 4/5, 8/10, 16/20, 40/50, and 80/100), the ratios of the means and standard errors under the stepwise vs. the infinite alleles model decrease toward 1 as the number n of markers scored increases. For example, for 80% observed matches, assuming a flat prior (λ = 0), the mean ratios are 4.8, 1.34, 1.2, 1.1, and 1.1 (for n = 5, 10, 20, 50, and 100 markers). Likewise, the ratios of standard deviations are 34.2, 2.0, 1.3, 1.2, and 1.2. The reason for this can be seen by considering the case where we observe what appears to be a complete match at all n markers. In this case, there is some small chance that two (or more) mutations have occurred in one (or more) markers. However, if indeed a total of two mutations have occurred across both lineages, the probability that both occurred in the same marker is just 1/n (assuming the same mutation rate across markers). Hence, as the number of markers increases, the probability that a multiple mutation is masked decreases due to scoring changes over more loci.

Figure 4.

—The probability of a match in allelic state between two lineages with a MRCA t generations ago under the infinite alleles and stepwise models. Top, generations 0 to τ = 2μt = 1, which corresponds to 250 generations for μ = 1/500. Bottom, up through 20τ (5000 generations under this value of μ).

The final observation from Table 2 is that, unlike for the infinite alleles model, the choice of the hyperparameter λ for the prior can significantly affect the posterior distribution. This is true when the number of markers is very small and/or the fraction of mismatches is nontrivial. The most extreme difference between the two assumed mutational models is seen under a flat prior (λ = 0). As the value of λ increases (corresponding to a decrease in the assumed effective population size as λ = 1/Ne), the ratios of the means and standard deviations under the two models become increasingly similar. Almost all of this difference is due to significant decreases in the mean and variance under the distribution of t under the stepwise model as λ increases. This arises because the effect of increasing λ is to shorten the distributions of times to MRCA in the prior (decreasing Ne decreases the coalescent times), which in turn down-weights matches under the stepwise model from individuals with assumed very long times to the MRCA.

Our analysis under the SMM model considers markers only as showing a match or mismatch, ignoring any additional information on the differences between marker alleles when a mismatch has occurred. Over short time scales (on the order of 1/2μ < 1) we are likely not losing much information, as most markers will likely have at most one mutation and hence scoring a match vs. no match is sufficient. Over longer time scales, we are clearly losing information. In such cases, a logical extension of our model would be replacing the probability of a match with the probability that the two microsatellite alleles in the individuals being compared differ by r repeats. This is accomplished as follows. Consider the probability of an even number (2k) of differences between microsatellite array sizes. Again, a little thought shows that this can occur only with an even number of total mutations (2M). With a total of 2M mutations, the probability that the array sizes differ by 2k is the probability that the number of up (+) mutations is either M + k or M - k (by symmetry, these two probabilities are the same under the SMM). Thus, the probability of a difference (in absolute value) of Δ = 2k given 2M total mutations is Pr(Δ=2k2M)=2(12)2M(2MMk)forkM. (26) Since M follows a Poisson distribution with mean 2μt, the probability of a difference of 2k given the time t to the MRCA is Pr(Δ=2kt)=M=kPr(Δ=2k2M)Pr(2Mt)=M=k2(12)2M(2MMk)e2μt(2μt)2M(2M)!=2e2μtM=k(μt)2M(Mk)!(M+k)!=2e2μtI2k(2μt) (27a) where Is denotes the s-order modified type I Bessel function (Olver 1964).

Using the same logic, for a difference of 2k + 1 an odd number (2M + 1) of total mutations are required, and following the same steps leading to Equation 27a gives Pr(Δ=2k+1t)=2e2μtI2k+1(2μt). (27b) Hence, the probability that the array sizes for two alleles differ by j after τ = 2μt generations is q(j)(τ)=2exp(τ)Ij(τ)forj1. (28) Figure 5 plots the probability for 0, 1, 2, 3, and ≥4 differences in array size as a function of τ. F. Rousset (personal communication) kindly pointed out that Equation 28 can be found buried in Wehrhahn (1975), who obtained this result using the method of generating functions (also see Li 1976; Wilson and Balding 1998).

View this table:

The increase in the mean and standard deviation (SD) for the time to the MRCA under the stepwise mutational model as compared to the infinite alleles model (Table 1)

The likelihood function follows from the multinomial distribution. If a total of n markers are scored, and ni denotes the number of markers differing in size by i (with the largest difference being k), then L(tn0,,nk)=n!n0!n1!nk!j=0k[q(j)(2μt)]nj. (29) Again using an exponential prior, the resulting posterior distribution is proportional to p(tn0,,nk)j=0k[q(j)(2μt)]njeλt=2nn0e(λ+2μn)tj=0k[Ij(2μt)]nj. (30) The full distribution is recovered by numerical integration to normalize the posterior, viz., p(tn0,,nk)=e(λ+2μn)tj=0k[Ij(2μt)]nj0e(λ+2μn)tj=0k[Ij(2μt)]njdt. (31)

As an example of applying Equation 31, consider haplotypes 1 and 3 from Thomas et al.’s (2000) study on the Lemba and the Cohen (Y chromosome) modal haplotype. Six microsatellite markers were scored and, of these, both alleles match at four markers, while one marker differs by one repeat and another by two repeats. In this case, Equation 31 becomes p(t4,1,1)e(λ+2μ6)t[I0(2μt)]4I1(2μt)I2(2μt). Table 3 compares the estimated parameters under this model with those estimated using the infinite alleles and SMM matching models. Figure 6 plots the resulting posterior distributions. We use μ = 0.245% (the average of the Kayseret al. 2000 and Heyeret al. 1997 estimates) and a prior of λ = 1/5000 (from Hammer’s 1995 estimate of Ne for the Y) for these results.

While Equation 31 provides the foundation for a full Bayesian analysis, we caution that its usefulness depends on accurately capturing the mutation model for the markers in question. While the stepwise mutation model seems a reasonable fit, the fine details are still unclear. For example, there are suggestions that mutation rate may increase with array size (Brinkmannet al. 1998; Fu and Chakraborty 1998) but also observations that suggest this is not the case (Valdeset al. 1993). There are also observations suggesting a bias toward increased array size (Kayseret al. 2000; Fu and Chakraborty 1998), and although most mutational steps change array size by one, mutations of two or more steps are seen (Brinkmannet al. 1998; Kayseret al. 2000). Finally, there may be multiple molecular processes operating at microsatellites, such as a majority of small changes against a background of rare major changes (Di Rienzoet al. 1994) or independent deletions (Walsh 1987). Until these details are sorted out, the risk run by using a model to correct for multiple mutations is that at least as much bias could be introduced by assuming an incorrect mutation model as is removed by accounting for multiple hits. One approach would be to use Fu and Chakraborty’s (1998) minimum chi-square approach for estimating the generalized stepwise mutation model and compare the minimal (best) fitting parameters with those for the symmetric single-step model that we have assumed.


Both the Y chromosome and mtDNA have been successfully used for assessing deep ancestry, while unlinked autosomal markers have proven much more valuable for determining very recent ancestry. Here we examine the effectiveness of using multilocus haplotypes from a region of nonrecombining DNA (such as the majority of Y chromosome or mtDNA) to estimate the time t to the MRCA of two individuals for that region. We argue that, to assess the relatedness of individuals on the basis of their haplotypes in a region of nonrecombining DNA, the time to the MRCA is the natural replacement for probability calculations based on the product rule used for unlinked markers.

Figure 5.

—Probabilities that two microsatellite alleles that have been separated for a total of τ = 2μt generations differ in array size by 0, 1, 2, 3, and >3 repeats (computed using Equation 28). The single-step symmetric stepwise mutational model is assumed.

We assume n prechosen markers are examined and are (initially) coded as either matching (agreeing in allelic state) or not matching. Using a Bayesian approach, the resulting posterior distribution for t is a function of n, the number of matches k, the per-marker mutation rate μ, and the hyperparameter λ = 1/Ne (the reciprocal of the effective population size) of the assumed prior distribution of t (the assumed distribution for the time to the MRCA for two random individuals in the absence of any marker information). We examined two rather different mutation models—infinite alleles (each mutation is unique, with a match implying no mutation at that marker since the MRCA) and the SMM (which allows for matches created by several parallel mutations).

View this table:

Estimated time to MRCA between haplotypes 1 and 3 of Thomas et al. (2000)

Figure 6.

—Posterior distributions for t, the estimated time to MRCA, between Y chromosome haplotypes 1 and 3 of Thomas et al. (2000). IAM, infinite alleles model; SMM0, stepwise mutational model only scoring matches vs. no match; SMME, stepwise mutation model scoring number of differences between array sizes. See the Table 3 legend for further details.

Our results show that it is possible to use Y or mtDNA marker information to provide reasonable estimates for t when individuals share a MRCA of intermediate age (tens to hundreds of generations when μ = 10-3), provided a sufficient number of markers are scored (Table 1, Figures 1, 2, 3). Estimates using only 5 markers have distributions that are highly skewed for large t values, especially when the possibility of back mutations regenerating a match is taken into account (i.e., the SMM model). As the number of markers increases, the width of the credible intervals around an estimate of t decreases. While 10 markers give a reasonable interval, 20 markers seems a more workable tradeoff between cost and precision. Our results also suggest that the forensic use of either the Y or mtDNA is rather limited. They can be used for exclusion, but make only weak probability statements about a sample and a suspect when there is a complete match. Using Equation 14c, a complete match at 10 out of 10 markers between a sample and a suspect implies only that the individual generating the sample and the suspect have a 90% chance of sharing a MRCA no more than 58 generations ago (assuming a typical microsatellite mutation rate of μ = 1/500). With a complete match, the number of markers that need to be scored to have a 90% probability that the sample and suspect have a MRCA no more than 1 generation ago is ∼580, an order of magnitude more than the number of currently known Y-linked markers (M. Hammer, personal communication). A 50% probability that the MRCA does not exceed 1 generation requires roughly 340 microsatellite markers.

We obtained the posterior distributions for t given the marker data by using a Bayesian analysis, which requires a prior distribution for t. From standard population-genetics theory, t follows a geometric distribution with success parameter 1/Ne (as the Y and mtDNA behave as haploids). Hence, the functional form of the prior is well motivated, while its exact shape is determined by the hyperparameter λ = 1/Ne used for any particular prior. For the infinite alleles model, the actual value of λ had essentially no effect unless we used a very small value for Ne (<200) and/or there were a very significant number of mismatches (k/n ≪ 1). Thus, we used a flat prior (λ = 0, corresponding to an infinite effective population size) for the general results tabulated, although the equations given cover arbitrary values of λ ≥ 0. Conversely, under the stepwise model, the value of λ often had a significant effect on the posterior, especially when the number of markers n is small and the number of matches is modest or poor (k/n ≪ 1). In such cases, the use of a flat prior gave the largest difference in the posteriors for the two mutational models. As the value of λ is increased (corresponding to decreasing Ne), the posteriors under the two different models become increasingly similar (Table 2). This occurs because, as we decrease Ne, we make the initial assumption that individuals with long times to the MRCA become increasingly unlikely. It is these individuals that still have a modest probability of showing a match under the stepwise model, greatly inflating the estimated times to the MRCA relative to the infinite alleles estimate. In cases where there was a recent population expansion (or contraction) or a selective sweep, the distribution of times to MRCA may deviate from a geometric distribution. However, even in such cases, we expect there to be little dependence on the prior except in cases where changes in λ under a geometric prior have a significant effect on the posterior.

One potential issue of concern is whether our results are somehow biased by scoring only markers known to be polymorphic in the human population as a whole (but not necessarily in the particular pair of individuals being contrasted). When the goal is to estimate the coalescent time for an entire population, using markers known (in advance) to be polymorphic in the population of interest creates an ascertainment bias that needs to be corrected (e.g., Nielsen 2000). In our analysis, μ is the mutation rate for microsatellites conditional on their being polymorphic. However, since the estimates of microsatellite mutation rates used are also ascertained by scoring microsatellites known to be polymorphic, we have corrected for this effect. When considering random microsatellites, the mutation rate μ in our analysis is replaced by cμ, where c > 1 is an ascertainment correction that can be specified only by knowing how the polymorphic markers were ascertained.

While we have implicitly framed much of our discussion in terms of microsatellites [simple tandem repeats (STRs)], SNP data can be included by using the infinite alleles model and the appropriate mutation rates. Equation 18a allows the method to handle mixed (STRs and SNPs) marker data by using the appropriate expression for qk(t), the probability of a match for marker k at time t. For SNPs, the infinite alleles model is used, qk(t) = exp(-2μkt), while microsatellites use Equation 25, qk(t) = exp(-2μkt)I0(2μkt), which corrects for multiple hits under the stepwise mutation model (I0 denoting the zero-order modified type I Bessel function; Olver 1964). In either case, μk is the mutation rate for marker locus k. More generally, if microsatellite comparisons are coded by differences in size (as opposed to match/no match), Equation 28 gives the probability that two individuals differ by j steps at the kth microsatellite as qk(j)(τ)=2exp(2μkt)Ij(2μkt) , where Ij is the jth-order modified type I Bessel function.

Our approach is easily modified to estimate the time between an individual and a particular (known or inferred) ancestral haplotype. When comparing an individual’s haplotype against a fixed standard, we are following only one branch from the MRCA, so that q(t) = (1 - μ)t ∼ exp(-μt), as opposed to q(t) ∼ exp(-2μt) when looking at mutations over both branches. Hence, we simply replace the mutation rate μ by μ/2, and all of our previous results apply.

Estimation of the time to the MRCA for a sample of individuals (as opposed to our simpler setting of just two individuals) has been examined by Fu (1996) and Tavaré et al. (1997), under the assumption of an infinite sites model (Watterson 1975). These analyses assume a Poisson likelihood for the number of segregating sites, while we assumed a binomial likelihood for number of segregating sites under the assumption that the n sites to be scored in the two individuals were fixed in advance. Given that the rough figure is one common polymorphism every 10 kb for the human Y (M. Hammer, personal communication), focusing on specific sites known to be polymorphic in the population as a whole (as opposed to sequencing large regions) is not an unreasonable approach. When n is large and μ small, the two different likelihoods for estimating t should give very similar results.

As we tried to stress in several places, the major caveat to this (or any other) approach for estimating the time t to MRCA is our uncertainty about both the mutational process and rates. The Bayesian framework allows us to incorporate these uncertainties; for example, if p(μ) is some assumed prior for the mutation rates, then the marginal posterior (after integrating out μ) can be used to estimate t, p(tλ,marker information)L(tmarker information)p(λ)p(μ)dμ. However, practical application requires a reasonable prior for μ. An even more serious problem is the exact form the stepwise mutational model used for microsatellite loci. Using an inappropriate model can potentially introduce more bias than it corrects.


Suppose a total of S mutations have occurred over the two lineages, with n mutations in lineage 1 and S - n in lineage 2, where n is a random variable. Let X1 and X2 denote the changes in array size from the common ancestor. What is the distribution for d = X1 - X2 (the difference in array size between the two lineages) conditioned on the total number of mutations S in both lineages? Jay Taylor (Department of Ecology and Evolutionary Biology, University of Arizona) suggested the following approach, which is considerably more elegant than a more brute force method I initially used. Define the random variable Zi as the change (±1) in array size generated by the ith mutation. Under the symmetric single-step model, Pr(Zi = +1) = Pr(Zi = -1) = ½. Further, X1=i=1nZiandX2=i=1SnZn+i. Noting that, in distribution, Zi = -Zi, it immediately follows that, in distribution, X1X2=i=1nZii=1SnZn+i=i=1nZi+i=1SnZn+i=i=1SZi. Hence the probability that the difference in array size is d given a total of S mutations over both lineages is simply the probability that the random walk given by the symmetric single-step model starting from state zero is in state d after S steps.

Using Taylor’s result, the probability that two arrays are the same size given that a total of 2M mutations have occurred between them equals the probability of M up moves and M down moves, or Pr(match2Mmutations)=(2M)!(M!)2122M.


Thanks go to Mike Lynch, Mike Hammer, Monty Slatkin, Francois Rousset, and Yun-Xin Fu for useful comments and advice and to Bennet Greenspan and Max Rothschild for dragging me into this problem in the first place. Special thanks go to Jay Taylor for very useful discussions on the probability of a match in two independent Markov chains. Curses go to Danny Gianola for planting the seed of Bayesian statistics and then adding plenty of fertilizer and to Bruce Southey, Sandra Rodriguez-Zas, and Dan Sorensen for useful tutoring. Two anonymous reviewers provided very useful comments and criticisms.


  • Communicating editor: Y.-X. Fu

  • Received July 22, 2000.
  • Accepted March 22, 2001.


View Abstract