| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |
Corresponding author: Bruce Walsh
Communicating editor: Y.-X. FU
| ABSTRACT |
|---|
Bayesian posterior distributions are obtained for the time to the most recent common ancestor (MRCA) for a nonrecombining segment of DNA (such as the nonpseudoautosomal arm of the Y chromosome or the mitochondrial genome) for two individuals given that they match at k out of n scored markers. We argue that the distribution of the time t to the MRCA is the most natural measure of relatedness for such nonrecombining regions. Both an infinite-alleles (no recurring mutants) and stepwise mutation model are examined, and these agree well when n is moderate to large and k/n is close to one. As expected, the infinite alleles model underestimates t relative to the stepwise model. Using a modest number (20) of microsatellite markers is sufficient to obtain reasonably precise estimates of t for individuals separated by 200 or less generations. Hence, the multilocus haplotypes of two individuals can be used not only to date very deep ancestry but also rather recent ancestry as well. Finally, our results have forensic implications in that a complete match at all markers between a suspect and a sample excludes only a modest subset of the population unless a very large number of markers (>500 microsatellites) are used.
MOLECULAR marker information has proven an invaluable tool for assessing the degree of relatedness between individuals. To date, most uses of marker information have been concerned with zero-, one- or two-, or deep-generation relatives. By zero-generation, we mean matching/rejecting a forensic sample and a suspect. One- or two-generation assessment includes paternity testing and assessing the degree of relatedness between individuals in natural populations. Typically, these tests have very low power for detecting relatives more distant than sibs and first cousins (![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
For unlinked markers, the product rule (multiplying single-locus genotype probabilities together to obtain a multilocus genotype probability) applied to highly polymorphic loci allows just a few (510) unlinked markers to be quite sufficient for identifying individuals that share a common relative in the last generation (such as parent-offspring or sibs). However, because of recombination, unlinked markers have very weak power for distinguishing individuals sharing deeper common ancestry. While there is a growing body of literature on estimating relatedness of two individuals given autosomal marker information (![]()
![]()
![]()
![]()
![]()
![]()
The key to assessing the amount of relatedness using markers on a nonrecombining chromosome is that any two individuals (indeed, the entire population) will have a most recent common ancestor for that region. This follows from coalescence theory (![]()
![]()
=
, the inverse of the appropriate measure of effective population size [the effective numbers Nm and Nf of males and females for the Y and mitochondrial (mt)DNAs, respectively]. Using marker information, we can estimate the time t to the MRCA, and it is the distribution of t that provides a natural metric for describing the relatedness (at least for that region) between two individuals. It is important to stress this difference between the MRCA of a region of DNA and the MRCA for two individuals. Two individuals can easily share an ancestor that is more recent than their MRCA for a particular DNA region. Hence, estimates of the time to the MRCA using a particular DNA region provide an upper bound on the time back to the most recent common ancestor shared by two individuals.
Here we develop a Bayesian estimator for t, obtaining the complete posterior distribution for the time t to the MRCA. We assume that n markers are scored on the nonrecombining chromosome of interest (either the Y or mtDNA) and that we observe matches in allelic state at k of these. We start by assuming the infinite alleles model, where each mutation is assumed to be unique. We then modify our results by assuming a (symmetric single-step) stepwise mutational model, which is a more appropriate descriptor for microsatellite markers. As we show, when k/n is close to one and n moderate to large, the two different mutational models give essentially the same distribution for t.
| THE INFINITE ALLELES MODEL |
|---|
Suppose we score the allelic states at n defined markers on a nonrecombining segment of DNA for two individuals and we wish to estimate the time back to their MRCA (for this segment). We first assume for each marker that (at most) only a single mutation has occurred over both lineages leading from the MRCA to the two individuals being considered. This is not an unreasonable assumption if the individuals match at most markers. We refer to this model as the infinite alleles model, as our development is also exact for the situation where each mutation is unique so that matches occur only when the marker locus in both lines has not mutated. Finally, we assume that the markers are exchangeable in the sense that the per-generation mutation rate µ is the same for each locus. We relax these assumptions below.
Let q(t) be the probability of a match (at any given single marker) between two individuals with a most recent common ancestor t generations ago. The number of matches (k) out of n loci follows a binomial distribution, with
![]() |
(1) |
Ignoring matches created by parallel and/or back mutations, the probability that a marker matches after t generations is simply (1 - µ)2t, as the probability of no mutations occurring in the lineage from the ancestor to individual one is (1 - µ)t, with a similar probability from the MRCA to individual two. Hence,
![]() |
(2a) |
where
![]() |
(2b) |
is the time scaled as total generations of divergence times the mutation rate. Note that the expression given by Equation 2a is a lower bound for the probability of a match, as if back mutations occurred in one (or both) lineages, or, if both lineages experienced parallel mutations, we also observe a match. These types of matches require at least two mutational events, and hence from the first two terms of the Poisson distribution their probability is bounded above by 1 - exp(-2µt)(1 + 2µt), i.e., on the order of (2µt)2exp(-2µt). Thus if 2µt << 1, the effects of such multiple mutations have only a trivial effect on increasing q(t) over the value assuming no mutations (see Fig 4). When a specific value of µ is required, we generally use µ = 1/500 = 0.20%, motivated by estimates for Y chromosome microsatellites of 0.28% (![]()
![]()
![]()
![]()
|
|
|
|
From Equation 1 and Equation 2a, the resulting likelihood for the time t back to the MRCA given that we observe k out of n matches is
![]() |
(3) |
Setting the derivative of ln(L) equal to zero gives the maximum-likelihood estimate (MLE) for
= 2
µ as
![]() |
(4) |
Hence, the MLE for the time back to the MRCA becomes
![]() |
(5) |
Note that the MLE is not especially informative, as the distribution for t is highly positively skewed, resulting in a considerable variance and highly asymmetric confidence intervals about the MLE. In particular, note that the MLE is zero for all values of n when there are no mismatches (k = n), which tells us nothing about the possible restrictions on the maximal time back to the MRCA (see ![]()
![]()
| BAYESIAN POSTERIOR DISTRIBUTIONS FOR TIME TO MRCA |
|---|
While the MLE describes one feature of the distribution of t (the mode), the most complete picture is given by the full posterior distribution of t, which can be obtained by a Bayesian analysis (e.g., ![]()
![]() |
(6) |
The main objection to a Bayesian analysis raised by non-Bayesians is that the choice of a prior is often very subjective and, as such, this can greatly bias the posterior. For the time back to the MRCA, an appropriate prior naturally follows from standard coalescence theory, as the expected time back to a MRCA under pure drift in an effective population size of Ne follows the geometric distribution with success parameter N-1e (e.g., ![]()
![]()
Treating time as continuous, the geometric prior is equivalent to using an exponential distribution with hyperparameter
= N-1e. Thus, the natural prior for the time to MRCA (in the absence of any marker information) is to use
![]() |
(7) |
Taking the limit as
0 gives an (improper) flat prior. As we will shortly see, the actual value of
used has at best a trivial effect on the posterior distribution unless most markers do not match (k << n) and the effective population size is extremely small. Thus the prior is both well motivated and the choice of the prior hyperparameter (
) has very little effect on the final (posterior) distribution in most cases.
Recalling Equation 3, the resulting posterior distribution becomes
![]() |
(8a) |
so that
![]() |
(8b) |
where the normalizing constant is given by
![]() |
(9) |
The impact of the choice of the hyperparameter
for the prior immediately follows from Equation 8a. Provided 2µk >>
, alternate choices of
have little effect on the posterior distribution. Since
= N-1e, this rearranges to 2Neµk >> 1. For a mutation rate of µ = 1/500, the choice of Ne (and hence
) has essentially no effect on the prior provided Nek >> 250, which is a very mild restriction.
Returning to the posterior distribution, the normalizing constant is easily computed by expanding the (1 - e-2µt)n-k term, noting that we can express the function being integrated as
![]() |
(10) |
Thus
![]() |
(11a) |
The last step can be shown either by induction or by using a standard symbolic algebra package (such as Mathematica).
With a flat prior (
= 0) the normalizing term further simplifies to
![]() |
(11b) |
Hence, the posterior density becomes
![]() |
(12) |
For zero marker mismatches (k = n), the posterior is simply an exponential distribution with parameter
+ 2nµ,
![]() |
(13) |
It immediately follows that the mean (µt) and standard deviation (
t) for the time to MRCA when there are no mismatches are
![]() |
(14a) |
Likewise, the cumulative probability distribution for the time back to the MRCA is just
![]() |
(14b) |
In particular, the time T
satisfying Pr(t
T
) =
is given by
![]() |
(14c) |
Assuming a flat prior (
= 0), if two individuals are identical at all n marker loci, there is a 90% probability they shared a MRCA in the last 1.15/(µn) generations. The values for 50, 95, and 99% are 0.347, 1.498, and 2.303 (µn)-1 generations, respectively.
The posterior distributions with one or more mismatches also follow from Equation 12. For example, for one (k = n - 1) and two (k = n - 2) mismatches, the posteriors are

and

The mean and variance for the posterior distribution for any value of k again follow by expanding the (1 - e-2µt)n-k term. Define
![]() |
(15a) |
and
![]() |
(15b) |
Expanding and term-by-term integration gives
![]() |
(16a) |
![]() |
(16b) |
Hence, the mean and variance for the time t to the MRCA, given k of n marker loci match, a prior with hyperparameter
, and a per marker mutation rate of µ, are given by
![]() |
(17a) |
and
![]() |
(17b) |
Fig 1 Fig 2 Fig 3 plot the posterior distributions corresponding to different numbers of matches (k) for n = 5, 10, 20, 50, and 100 markers under the assumption that µ =
and there is a flat prior (
= 0). Provided that Ne >> 250, the results are the same for any other prior using
=
.
Table 1 summarizes the mean, standard deviation (SD), and mode (the MLE) for the posterior distributions as well as the 2.5, 50, 90, and 97.5% cutoff values. Note that a 95% credible region for t is given by the 2.5 and 97.5% values. As expected, the distribution of t is highly skewed, as mode < median < mean. The distribution becomes increasingly skewed to the right as the number of mismatches increases, which is reflected in not only an increase in the mean but also in the variance. However, note that the mean/SD ratio declines with increasing numbers of mismatches, so that the coefficient of variation declines as k decreases.
|
Finally, note that the resolution for t offered by using n = 5 markers is very poor, but rather fine precision is offered by using 100 markers. While scoring the latter number of markers may be unrealistic, using 20 markers is both feasible as well as offering reasonable precision.
| DIFFERENTIAL MUTATION RATES |
|---|
As our knowledge of the parameters associated with the mutational process continues to improve, it is likely that we may find significant differences in the mutation rates at different markers. Fortunately, it is straightforward to modify the likelihood function (Equation 3) to take this into account. Suppose n markers are examined, generating the matching data x1, ... , xn, where the xi are coded as

The likelihood becomes
![]() |
(18a) |
where
![]() |
(18b) |
giving
![]() |
(18c) |
Using an exponential prior with hyperparameter
, the posterior is proportional to
![]() |
(19) |
In any particular data set, the normalization constant is easily obtained by expanding the product involving 1 - e-2tµi, which generally involves only a few terms unless there are a significant number of mismatches.
For the simplest case of no mismatches (k = n), the posterior for the time to the MRCA becomes
![]() |
(20a) |
where
is the mean mutation rate across all markers. Equation 14aEquation 14b HREF="#FD14c">Equation 14cc hold with µ replaced by the mean mutation rate
. Likewise, for one mismatch (say marker i), the posterior becomes
![]() |
(20b) |
where
-i is the mean mutation rate for all markers, excluding marker i.
| CORRECTING FOR MULTIPLE HITS: THE STEPWISE MUTATION MODEL |
|---|
Microsatellites, given their higher mutation rates compared to single nucleotide polymorphisms (SNPs), are clearly the marker of choice for estimating the time to MRCA when the individuals are assumed to be at least modestly related. As microsatellite "alleles" correspond to different lengths of the repeat unit in the microsatellite array, the infinite alleles model assumed previously is not appropriate if multiple mutations are expected in any given marker. Since mutations change the number of repeats (and hence the size) of an array, two (or more) mutations can recover the initial state found in the MRCA. Likewise, parallel mutations in both lineages leading from the MRCA can also lead to the two individuals sharing the same allelic state, even though mutations have occurred. Thus, using an infinite alleles model for microsatellites will tend to underestimate the time to the MRCA. To examine the severity of this underestimation, we consider the divergence, assuming a stepwise mutational model (SMM) assuming equal probabilities of an up (increase array size by one) or down (decrease array size by one) move (the symmetric single-step SMM). The roots of the SMM trace from OHTA and KIMURA's (1973) model of charge differences in electrophoretically scored proteins. A number of workers have shown this model to be a good fit for microsatellites in both indirect studies examining the distribution of array sizes in natural populations (![]()
![]()
![]()
![]()
![]()
![]()
Denote the allelic state (array size) in the MRCA as state 0, and let Xt denote the array size at time t. As before we assume a per-generation mutation rate of µ. Under this model, the transition probabilities between states become
![]() |
(21) |
To apply this model, we need to compute q(t), the probability of a match between two lineages sharing a MRCA t generations ago. A little thought shows that for the one-step model allelic states in two lineages can only match if an even number (2M) of mutations have occurred. The Appendix shows that
![]() |
(22) |
For example, after a total (between both lineages) of 2, 4, 6, 8, and 10 mutations, the probabilities that the marker allelic states match are 0.5, 0.375, 0.313, 0.273, and 0.246 (respectively). Thus, under this model there is a one in four chance that the two lineages share the same allelic state even after a total of 10 mutations have occurred.
Since the probability of 2M total mutations along both lineages assuming t generations back to the MRCA is given by a Poisson distribution with parameter 2tµ, we have the probability of a match conditioned on t as
![]() |
(23) |
For example, considering only the first 10 mutations,

The general solution to Equation 23 follows by noting that
![]() |
(24) |
where I0 denotes the zero-order modified type I Bessel function (![]()
= 2µt generations is
![]() |
(25) |
Fig 4 compares the probability of a match as a function of
for the infinite allele and one-step models. The match probabilities under both models are rather similar for
< 0.5 (125 generations with µ = 1/500), but they diverge rather quickly after that. Note that even after 20
generations (5000 generations with µ = 1/500), the match probability under the stepwise model is still nontrivial (0.09), reflecting the very slow decrease in the probability of a match as the total number of mutations increases. By contrast, the corresponding match probability is essentially zero (2.06 x 10-9) under the infinite alleles model, as a single mutation causes the allelic states to diverge and subsequent back (and/or parallel) mutations are not allowed.
As shown in Table 2, both the mean and variance (measured by the standard deviation) of the distribution for time to the MRCA are larger under the stepwise model than the infinite alleles model. This is certainly expected, as the effect of the stepwise mutational model is to allow for a match following two (or more) mutations, while a match is assumed to never recover following a mutation under the infinite alleles model. Table 2 gives the ratios of the means and standard deviations for the distribution of t under the stepwise model compared to the same statistic under the infinite alleles model. When the number of markers is large (20 or greater) and the number of mismatches is small, these ratios are close to one. The ratio of standard deviations is always larger than the means ratio, reflecting the longer tail (relative to that for the infinite alleles model) generated under the stepwise model. As the number of mismatches increases, the mean and SD ratios increase, reflecting increasingly larger probabilities for t under the stepwise relative to the infinite alleles model.
|
Also note from Table 2 that for the same fraction of observed matches (say 4/5, 8/10, 16/20, 40/50, and 80/100), the ratios of the means and standard errors under the stepwise vs. the infinite alleles model decrease toward 1 as the number n of markers scored increases. For example, for 80% observed matches, assuming a flat prior (
= 0), the mean ratios are 4.8, 1.34, 1.2, 1.1, and 1.1 (for n = 5, 10, 20, 50, and 100 markers). Likewise, the ratios of standard deviations are 34.2, 2.0, 1.3, 1.2, and 1.2. The reason for this can be seen by considering the case where we observe what appears to be a complete match at all n markers. In this case, there is some small chance that two (or more) mutations have occurred in one (or more) markers. However, if indeed a total of two mutations have occurred across both lineages, the probability that both occurred in the same marker is just 1/n (assuming the same mutation rate across markers). Hence, as the number of markers increases, the probability that a multiple mutation is masked decreases due to scoring changes over more loci.
The final observation from Table 2 is that, unlike for the infinite alleles model, the choice of the hyperparameter
for the prior can significantly affect the posterior distribution. This is true when the number of markers is very small and/or the fraction of mismatches is nontrivial. The most extreme difference between the two assumed mutational models is seen under a flat prior (
= 0). As the value of
increases (corresponding to a decrease in the assumed effective population size as
=
), the ratios of the means and standard deviations under the two models become increasingly similar. Almost all of this difference is due to significant decreases in the mean and variance under the distribution of t under the stepwise model as
increases. This arises because the effect of increasing
is to shorten the distributions of times to MRCA in the prior (decreasing Ne decreases the coalescent times), which in turn down-weights matches under the stepwise model from individuals with assumed very long times to the MRCA.
Our analysis under the SMM model considers markers only as showing a match or mismatch, ignoring any additional information on the differences between marker alleles when a mismatch has occurred. Over short time scales (on the order of 1/2µ < 1) we are likely not losing much information, as most markers will likely have at most one mutation and hence scoring a match vs. no match is sufficient. Over longer time scales, we are clearly losing information. In such cases, a logical extension of our model would be replacing the probability of a match with the probability that the two microsatellite alleles in the individuals being compared differ by r repeats. This is accomplished as follows. Consider the probability of an even number (2k) of differences between microsatellite array sizes. Again, a little thought shows that this can occur only with an even number of total mutations (2M). With a total of 2M mutations, the probability that the array sizes differ by 2k is the probability that the number of up (+) mutations is either M + k or M - k (by symmetry, these two probabilities are the same under the SMM). Thus, the probability of a difference (in absolute value) of
= 2k given 2M total mutations is
![]() |
(26) |
Since M follows a Poisson distribution with mean 2µt, the probability of a difference of 2k given the time t to the MRCA is
![]() |
(27a) |
where Is denotes the s-order modified type I Bessel function (![]()
Using the same logic, for a difference of 2k + 1 an odd number (2M + 1) of total mutations are required, and following the same steps leading to Equation 27a gives
![]() |
(27b) |
Hence, the probability that the array sizes for two alleles differ by j after
= 2µt generations is
![]() |
(28) |
Fig 5 plots the probability for 0, 1, 2, 3, and
4 differences in array size as a function of
. F. ROUSSET (personal communication) kindly pointed out that Equation 28 can be found buried in ![]()
![]()
![]()
|
The likelihood function follows from the multinomial distribution. If a total of n markers are scored, and ni denotes the number of markers differing in size by i (with the largest difference being k), then
![]() |
(29) |
Again using an exponential prior, the resulting posterior distribution is proportional to
![]() |
(30) |
The full distribution is recovered by numerical integration to normalize the posterior, viz.,
![]() |
(31) |
As an example of applying Equation 31, consider haplotypes 1 and 3 from ![]()

Table 3 compares the estimated parameters under this model with those estimated using the infinite alleles and SMM matching models. Fig 6 plots the resulting posterior distributions. We use µ = 0.245% (the average of the ![]()
![]()
=
(from ![]()
|
|
While Equation 31 provides the foundation for a full Bayesian analysis, we caution that its usefulness depends on accurately capturing the mutation model for the markers in question. While the stepwise mutation model seems a reasonable fit, the fine details are still unclear. For example, there are suggestions that mutation rate may increase with array size (![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
| DISCUSSION |
|---|
Both the Y chromosome and mtDNA have been successfully used for assessing deep ancestry, while unlinked autosomal markers have proven much more valuable for determining very recent ancestry. Here we examine the effectiveness of using multilocus haplotypes from a region of nonrecombining DNA (such as the majority of Y chromosome or mtDNA) to estimate the time t to the MRCA of two individuals for that region. We argue that, to assess the relatedness of individuals on the basis of their haplotypes in a region of nonrecombining DNA, the time to the MRCA is the natural replacement for probability calculations based on the product rule used for unlinked markers.
We assume n prechosen markers are examined and are (initially) coded as either matching (agreeing in allelic state) or not matching. Using a Bayesian approach, the resulting posterior distribution for t is a function of n, the number of matches k, the per-marker mutation rate µ, and the hyperparameter
=
(the reciprocal of the effective population size) of the assumed prior distribution of t (the assumed distribution for the time to the MRCA for two random individuals in the absence of any marker information). We examined two rather different mutation modelsinfinite alleles (each mutation is unique, with a match implying no mutation at that marker since the MRCA) and the SMM (which allows for matches created by several parallel mutations).
Our results show that it is possible to use Y or mtDNA marker information to provide reasonable estimates for t when individuals share a MRCA of intermediate age (tens to hundreds of generations when µ = 10-3), provided a sufficient number of markers are scored (Table 1, Fig 1 Fig 2 Fig 3). Estimates using only 5 markers have distributions that are highly skewed for large t values, especially when the possibility of back mutations regenerating a match is taken into account (i.e., the SMM model). As the number of markers increases, the width of the credible intervals around an estimate of t decreases. While 10 markers give a reasonable interval, 20 markers seems a more workable tradeoff between cost and precision. Our results also suggest that the forensic use of either the Y or mtDNA is rather limited. They can be used for exclusion, but make only weak probability statements about a sample and a suspect when there is a complete match. Using Equation 14c, a complete match at 10 out of 10 markers between a sample and a suspect implies only that the individual generating the sample and the suspect have a 90% chance of sharing a MRCA no more than 58 generations ago (assuming a typical microsatellite mutation rate of µ =
). With a complete match, the number of markers that need to be scored to have a 90% probability that the sample and suspect have a MRCA no more than 1 generation ago is
580, an order of magnitude more than the number of currently known Y-linked markers (M. HAMMER, personal communication). A 50% probability that the MRCA does not exceed 1 generation requires roughly 340 microsatellite markers.
We obtained the posterior distributions for t given the marker data by using a Bayesian analysis, which requires a prior distribution for t. From standard population-genetics theory, t follows a geometric distribution with success parameter 1/Ne (as the Y and mtDNA behave as haploids). Hence, the functional form of the prior is well motivated, while its exact shape is determined by the hyperparameter
=
used for any particular prior. For the infinite alleles model, the actual value of
had essentially no effect unless we used a very small value for Ne (<200) and/or there were a very significant number of mismatches (k/n << 1). Thus, we used a flat prior (
= 0, corresponding to an infinite effective population size) for the general results tabulated, although the equations given cover arbitrary values of
0. Conversely, under the stepwise model, the value of
often had a significant effect on the posterior, especially when the number of markers n is small and the number of matches is modest or poor (k/n << 1). In such cases, the use of a flat prior gave the largest difference in the posteriors for the two mutational models. As the value of
is increased (corresponding to decreasing Ne), the posteriors under the two different models become increasingly similar (Table 2). This occurs because, as we decrease Ne, we make the initial assumption that individuals with long times to the MRCA become increasingly unlikely. It is these individuals that still have a modest probability of showing a match under the stepwise model, greatly inflating the estimated times to the MRCA relative to the infinite alleles estimate. In cases where there was a recent population expansion (or contraction) or a selective sweep, the distribution of times to MRCA may deviate from a geometric distribution. However, even in such cases, we expect there to be little dependence on the prior except in cases where changes in
under a geometric prior have a significant effect on the posterior.
One potential issue of concern is whether our results are somehow biased by scoring only markers known to be polymorphic in the human population as a whole (but not necessarily in the particular pair of individuals being contrasted). When the goal is to estimate the coalescent time for an entire population, using markers known (in advance) to be polymorphic in the population of interest creates an ascertainment bias that needs to be corrected (e.g., ![]()
While we have implicitly framed much of our discussion in terms of microsatellites [simple tandem repeats (STRs)], SNP data can be included by using the infinite alleles model and the appropriate mutation rates. Equation 18a allows the method to handle mixed (STRs and SNPs) marker data by using the appropriate expression for qk(t), the probability of a match for marker k at time t. For SNPs, the infinite alleles model is used, qk(t) = exp(-2µkt), while microsatellites use Equation 25, qk(t) = exp(-2µkt)I0(2µkt), which corrects for multiple hits under the stepwise mutation model (I0 denoting the zero-order modified type I Bessel function; ![]()
) = 2exp(-2µkt)Ij(2µkt), where Ij is the jth-order modified type I Bessel function.
Our approach is easily modified to estimate the time between an individual and a particular (known or inferred) ancestral haplotype. When comparing an individual's haplotype against a fixed standard, we are following only one branch from the MRCA, so that q(t) = (1 - µ)t
exp(-µt), as opposed to q(t)
exp(-2µt) when looking at mutations over both branches. Hence, we simply replace the mutation rate µ by µ/2, and all of our previous results apply.
Estimation of the time to the MRCA for a sample of individuals (as opposed to our simpler setting of just two individuals) has been examined by ![]()
![]()
![]()
As we tried to stress in several places, the major caveat to this (or any other) approach for estimating the time t to MRCA is our uncertainty about both the mutational process and rates. The Bayesian framework allows us to incorporate these uncertainties; for example, if p(µ) is some assumed prior for the mutation rates, then the marginal posterior (after integrating out µ) can be used to estimate t,

However, practical application requires a reasonable prior for µ. An even more serious problem is the exact form the stepwise mutational model used for microsatellite loci. Using an inappropriate model can potentially introduce more bias than it corrects.
| ACKNOWLEDGMENTS |
|---|
Thanks go to Mike Lynch, Mike Hammer, Monty Slatkin, Francois Rousset, and Yun-Xin Fu for useful comments and advice and to Bennet Greenspan and Max Rothschild for dragging me into this problem in the first place. Special thanks go to Jay Taylor for very useful discussions on the probability of a match in two independent Markov chains. Curses go to Danny Gianola for planting the seed of Bayesian statistics and then adding plenty of fertilizer and to Bruce Southey, Sandra Rodriguez-Zas, and Dan Sorensen for useful tutoring. Two anonymous reviewers provided very useful comments and criticisms.
Manuscript received July 22, 2000; Accepted for publication March 22, 2001.
| APPENDIX |
|---|
Suppose a total of S mutations have occurred over the two lineages, with n mutations in lineage 1 and S - n in lineage 2, where n is a random variable. Let X1 and X2 denote the changes in array size from the common ancestor. What is the distribution for d = X1 - X2 (the difference in array size between the two lineages) conditioned on the total number of mutations S in both lineages? Jay Taylor (Department of Ecology and Evolutionary Biology, University of Arizona) suggested the following approach, which is considerably more elegant than a more brute force method I initially used. Define the random variable Zi as the change (±1) in array size generated by the ith mutation. Under the symmetric single-step model, Pr(Zi = +1) = Pr(Zi = -1) =
. Further,

Noting that, in distribution, Zi = -Zi, it immediately follows that, in distribution,

Hence the probability that the difference in array size is d given a total of S mutations over both lineages is simply the probability that the random walk given by the symmetric single-step model starting from state zero is in state d after S steps.
Using Taylor's result, the probability that two arrays are the same size given that a total of 2M mutations have occurred between them equals the probability of M up moves and M down moves, or

| LITERATURE CITED |
|---|
BIANCHI, N. O., C. I. CATANESI, G. BAILLIET, V. L. MARTINEZ-MARIGNAC, and C. M. BRAVI et al., 1998 Characterization of ancestral and derived Y-chromosome haplotypes of new world native populations. Am. J. Hum. Genet. 63:1862-1871[Medline].
BLOUIN, M. S., M. PARSONS, V. LACAILLE, and S. LOTZ, 1996 Use of microsatellite loci to classify individuals by relatedness. Mol. Ecol. 5:393-401[Medline].
BRINKMANN, B., M. KLINTSCHAR, F. NEUHUBER, J. HÜHNE, and B. ROLF, 1998 Mutation rate in human microsatellites: influence of the structure and length of the tandem array. Am. J. Hum. Genet. 62:1408-1415[Medline].
BROWN, M. D., S. H. HOSSEINI, A. TORRONI, H.-J. BANDLET, and J. C. ALLEN et al., 1998 mtDNA haplogroup X: an ancient link between Europe/Western Asia and North America? Am. J. Hum. Genet. 63:1852-1861[Medline].
DEKA, R., L. JIN, M. D. SHRIVER, L. MEI YU, and N. SAHA et al., 1996 Dispersion of human Y chromosome haplotypes based on five microsatellites in global populations. Genome Res. 6:1177-1184
DI RIENZO, A., A. C. PETERSON, J. C. GARZA, A. M. VALDES, and M. SLATKIN et al., 1994 Mutational processes of simple-sequence repeat loci in human populations. Proc. Natl. Acad. Sci. USA 91:3166-3170
DONNELLY, P. and S. TAVARÉ, 1995 Coalescents and genealogical structure under neutrality. Annu. Rev. Genet. 29:401-421[Medline].
DONNELLY, P., S. TAVARÉ, D. J. BALDING, and R. C. GRIFFITHS, 1996 Estimating the age of the common ancestor of men from the ZFY intron. Science 272:1357-1359
EDWARDS, A. H., A. HAMMOND, L. JIN, C. T. CASKEY, and R. CHAKRABORTY, 1992 Genetic variation at five trimeric and tetrameric tandem repeat loci in four human population groups. Genomics 12:241-253[Medline].
FORSTER, P., R. HARDING, A. TORRONI, and H.-J. BANDELT, 1996 Origin and evolution of Native American mtDNA variation: a reappraisal. Am. J. Hum. Genet. 59:935-945[Medline].
FU, Y.-X., 1996 Estimating the age of the common ancestor of a DNA sample using the number of segregating sites. Genetics 144:829-838[Abstract].
FU, Y.-X. and R. CHAKRABORTY, 1998 Simultaneous estimation of all the parameters of a stepwise mutation model. Genetics 150:487-497
FU, Y.-X. and W.-H. LI, 1996 Estimating the age of the common ancestor of men from the ZFY intron. Science 272:1356-1357[Medline].
HAMMER, M. F., 1995 A recent common ancestry for human Y chromosomes. Nature 378:376-378[Medline].
HEYER, E., J. PUYMIRAT, P. DIELTJES, E. BAKKER, and P. DE KNIJFF, 1997 Estimating Y chromosome specific microsatellite mutation frequencies using deep rooting pedigrees. Hum. Mol. Genet. 6:799-803
HUDSON, R. R., 1991 Gene genealogies and the coalescent process, pp. 144 in Oxford Surveys in Evolutionary Biology, edited by D. J. FUTUYAMA and J. ANTONOVICS. Oxford University Press, Oxford.
KAYSER, M., L. ROEWER, M. HEDMAN, L. HENKE, and J. HENKE et al., 2000 Characteristics and frequency of germline mutations at microsatellite loci from human Y chromosome, as revealed by direct observation in father/son pairs. Am. J. Hum. Genet. 66:1580-1588[Medline].
KITTLES, R. A., M. PEROLA, L. PELTONEN, A. W. BERGEN, and R. A. ARAGON et al., 1998 Dual origins of Finns revealed by Y chromosome haplotype variation. Am. J. Hum. Genet. 62:1171-1179[Medline].
LEE, P. M., 1997 Bayesian Statistics: An Introduction, Ed. 2. Arnold, London.
LI, W.-H., 1976 Electrophoretic identity of proteins in a finite population and genetic distance between taxa. Genet. Res. 28:119-127[Medline].
LYNCH, M. and K. RITLAND, 1999 Estimation of pairwise relatedness with molecular markers. Genetics 152:1753-1766
MARSHALL, T. C., J. SLATE, L. E. B. KRUUK, and J. M. PEMBERTON, 1998 Statistical confidence for likelihood-based paternity inference in natural populations. Mol. Ecol. 7:639-655[Medline].
MERRIWETHER, D. A., F. ROTHHAMMER, and R. E. FERRELL, 1995 Distribution of the four-founding lineage haplotypes in Native Americans suggests a single wave of migration for the New World. Am. J. Phys. Anthropol. 98:411-430[Medline].
NIELSEN, R., 2000 Estimation of population parameters and recombination rates from single nucleotide polymorphisms. Genetics 154:931-942
OHTA, T. and M. KIMURA, 1973 The model of mutation appropriate to estimate the number of electrophoretically detectable alleles in a genetic population. Genet. Res. 22:201-204[Medline].
OLVER, F. W. J., 1964 Bessel functions of integer order, pp. 355434 in Handbook of Mathematical Functions, edited by M. ABRAMOWITZ and I. A. STEGUN. National Bureau of Standards, Washington, DC.
QUELLER, D. C. and K. F. GOODNIGHT, 1989 Estimating relatedness using genetic markers. Evolution 53:258-275.
RITLAND, K., 1996 Estimators for pair-wise relatedness and individual inbreeding coefficients. Genet. Res. 67:175-185.
SHRIVER, M. D., L. JIN, R. CHAKRABORTY, and E. BOERWINKLE, 1993 VNTR allele frequency distributions under the stepwise mutation model: a computer simulation approach. Genetics 134:983-993[Abstract].
SKORECKI, K., S. SELIG, S. BLAZER, R. BRADMAN, and N. BRADMAN et al., 1997 Y chromosomes of Jewish priests. Nature 385:32[Medline].
STONE, A. C. and M. STONEKING, 1998 mtDNA analysis of a prehistoric Oneota population: implications for the peopling of the New World. Am. J. Hum. Genet. 62:1153-1170[Medline].
TAVARÉ, S., D. J. BALDING, R. C. GRIFFITHS, and P. CONNELLY, 1997 Inferring coalescent times from DNA sequence data. Genetics 145:505-518[Abstract].
THOMAS, M. G., T. PARFITT, D. A. WEISS, K. SKORECKI, and J. F. WATSON et al., 2000 Y chromosomes traveling south: the Cohen modal haplotype and the origins of the Lembathe "Black Jews of Southern Africa.". Am. J. Hum. Genet. 66:674-686[Medline].
THOMPSON, E. A., 1975 The estimation of pair-wise relationship. Ann. Hum. Genet. 39:173-188[Medline].
TORRONI, A., J. V. NEEL, R. BARRANTES, T. G. SCHURR, and D. C. WALLACE, 1994 Mitochondrial DNA "clock" for the Amerinds and its implications for timing their entry into North America. Proc. Natl. Acad. Sci. USA 91:1158-1162
TORRONI, A., H.-J. BANDELT, L. D'URBANO, P. LAHERMO, and P. MORLA et al., 1998 mtDNA analysis reveals a major late paleolithic population expansion from southwestern to northeastern Europe. Am. J. Hum. Genet. 62:1137-1152[Medline].
VALDES, A. M., M. SLAKTIN, and N. B. FREIMER, 1993 Allele frequencies at microsatellite loci: the stepwise mutation model revisited. Genetics 133:737-749[Abstract].