Abstract
The molecular clock hypothesis remains an important conceptual and analytical tool in evolutionary biology despite the repeated observation that the clock hypothesis does not perfectly explain observed DNA sequence variation. We introduce a parametric model that relaxes the molecular clock by allowing rates to vary across lineages according to a compound Poisson process. Events of substitution rate change are placed onto a phylogenetic tree according to a Poisson process. When an event of substitution rate change occurs, the current rate of substitution is modified by a gammadistributed random variable. Parameters of the model can be estimated using Bayesian inference. We use Markov chain Monte Carlo integration to evaluate the posterior probability distribution because the posterior probability involves high dimensional integrals and summations. Specifically, we use the MetropolisHastingsGreen algorithm with 11 different move types to evaluate the posterior distribution. We demonstrate the method by analyzing a complete mtDNA sequence data set from 23 mammals. The model presented here has several potential advantages over other models that have been proposed to relax the clock because it is parametric and does not assume that rates change only at speciation events. This model should prove useful for estimating divergence times when substitution rates vary across lineages.
THE molecular clock hypothesis states that the evolutionary rate of a gene is roughly constant among different lineages (Zuckerkandl and Pauling 1962). Zuckerkandl and Pauling (1965) also suggested that the substitution process is approximately Poisson. If the rate of substitution is constant across lineages, then the distances between species should be ultrametric (i.e., all tips are an equal distance from the root of the tree). Furthermore, if the substitutions follow a Poisson process, then the variance and the mean of the number of substitutions that occur on different lineages in the same amount of time should be equal. Since the early 1970s, however, neither prediction has been shown to hold true; the variance to mean ratio of the number of substitutions is generally greater than one, suggesting that the substitution process is overdispersed (Ohta and Kimura 1971; Langley and Fitch 1973, 1974). Moreover, rates of substitution have been shown to vary across lineages (see Gillespie 1991).
Despite the observation that the molecular clock hypothesis does not perfectly explain the substitution process, it remains a powerful conceptual and analytical tool in evolutionary biology. Conceptually, the molecular clock provides a timescale, albeit an imperfect one, for evolution as well as a mechanistic description of the substitution process. As an analytical tool, the molecular clock has also proven useful. Many of the more interesting applications of phylogenies assume that the molecular clock holds. For example, statistical analysis of hostparasite cospeciation often assumes a molecular clock (Huelsenbecket al. 1997). Also, the estimation of divergence times using molecular data depends upon (1) an accurate calibration date for at least one speciation event on the tree and (2) a constant substitution rate among lineages. Several recent studies have attempted to date the divergence times for eubacteria/eukaryotes (Doolittleet al. 1996), metazoa (Wrayet al. 1996), birds (Cooper and Penny 1997), mammalian orders, and major lineages of vertebrates (Kumar and Hedges 1998) using the molecular clock assumption. To a large extent, the validity of any such analysis depends on how well the data conform to the clock assumption.
Several different approaches have been taken to accommodate rate variation among lineages. One method, commonly used for estimating phylogenetic trees, is to assign each branch of a phylogenetic tree its own rate parameter. However, this procedure does not allow estimation of divergence times of clades because rate and time are confounded. Another approach was proposed by Sanderson (1997), who used a nonparametric method for smoothing the rate differences across speciation events on the tree. Sanderson's method allows rates to vary on branches of the tree and also allows divergence times to be estimated. At the other extreme, Thorne et al. (1998) proposed a parametric model for relaxing the clock. Like Sanderson (1997), they assume that rates are autocorrelated across speciation events; their model assigns new rates to descendent lineages from a lognormal distribution with the mean of the distribution equal to the rate of the ancestral lineage. The model we present here differs from the models proposed by Sanderson (1997) and Thorne et al. (1998) by allowing rates to change anywhere on the tree. Yet, the model introduces only two additional parameters over the strict molecular clock.
In this article, we propose a compound Poisson process for introducing rate variation across lineages on a tree. We assume that nucleotide substitutions occur along branches of the tree according to a Poisson process, as in the Markov substitution models widely used for phylogenetic inference. However, an independent Poisson process also generates events of substitutionrate change. At each of these events, the rate of substitution is changed by multiplying the current rate by a gammadistributed random variable. We use Markov chain Monte Carlo when making Bayesian inferences.
METHODS
A compound Poisson process of rate variation: We assume that the phylogeny of a group of species can be represented by a rooted binary tree, an arbitrary example of which is shown in Figure 1. The tips are labeled n_{1} to n_{s}, and the internal nodes are labeled n_{s}_{+1} to n_{2}_{s}_{−1}, where s is the number of sequences. The root of the tree is always labeled n_{2}_{s}_{−1}. The times of the nodes are denoted t = (t_{1}, t_{2}, …, t_{2}_{s}_{−1}). The branches are scaled such that the tips are at time 0 (t_{1} = t_{2} = … = t_{s} = 0) and the root is at time 1 (t_{2}_{s}_{−1} = 1). The sum of the scaled branch lengths is referred to as the total length of the tree, T. The total length of the tree of Figure 1, for example, is T = 4.55. If the branch lengths are multiplied by a parameter m, representing the expected number of substitutions per site on a single branch reaching from the root of the tree to the tips, the branch lengths of the tree are in terms of expected number of substitutions per site. Moreover, the branch lengths on the tree conform to a molecular clock; there has been no variation in rates across lineages with the result that all of the tips can be drawn to lie on a single time line. Figure 2 shows a rescaling of the tree of Figure 1 with m = 0.17.
We use a compound Poisson process to introduce rate variation across lineages of the tree. Events (positions on the tree at which substitution rate changes) occur according to a Poisson process with parameter λ. When an event occurs, the rate of substitution just prior to the event (m) is multiplied by a gammadistributed random variable (taking value r) to produce a new substitution rate above the event (m′; m′ = mr). The gamma distribution has density
A prior model that changes rates multiplicatively with independent multipliers must be carefully chosen so that the rate does not tend to either 0 or infinity in probability. To see this, let r_{i}, i = 1, 2, … be independent and identically distributed positive random variables and let
In this article, we largely circumvent this problem by placing a prior on λ such that the number of change points on the tree does not become large. In addition, we define a onedimensional family of gammadistributed random variables with the property that E[log r_{i}] = 0 by letting β = e^{Ψ(αP)}, where Ψ is the derivative of the logarithm of the gamma function:
In this article, we multiply the rate at an event by a gammadistributed random variable with density
The collection of events is denoted e = (ξ, z, r), where ξ is the number of rate change events, z is the vector of event positions, and r is the vector of rate multipliers. When ξ = 0, z and r are empty. For ξ > 0, z ϵ [0,T]^{ξ} and
The prior distribution of e = (ξ, z, r) given the tree and parameters α_{P} and λ is described by the probability measure
Model of DNA substitution: We assume that DNA sequences from homologous regions are available for species n_{1} to n_{s}. Let X = {x_{ij}} be the aligned nucleotide sequences, where i = 1, 2, …, s; j = 1, 2, …,c; and c is the number of nucleotide sites per sequence. Each column of the data matrix x_{j} = {x_{1}_{j}, …, x_{sj}}′ specifies the nucleotides for the s sequences at the jth site.
As is usual for DNA sequences, we assume that substitutions occur according to a Poisson process with rate matrix Q. A general reversible rate matrix allows a different stationary frequency for the four nucleotides π = (π_{A}, π_{C}, π_{G}, π_{T}) constrained to sum to one and six rates for the 12 substitution types
Amongsite rate variation can be accommodated in several different ways. One potential method partitions the DNA sequence into different regions (e.g., first, second, and third codon positions) and then estimates the rate separately for each partition, assuming that the rate within a partition is homogeneous [i.e., instead of a single substitution rate (m), applying to all sites, there are multiple substitution rates (m_{1}, m_{2}, …)]. Another method assumes that the rate assigned to a site is a random variable. Typically, the gamma distribution, parameterized with the shape parameter equal to the scale parameter, is used to model rate variation across sites (Yang 1993, 1994a). The shape parameter of the gamma distribution for amongsite rate variation is denoted α_{R}. In this article, we assume equal rates across sites or gammadistributed rates across sites. Models that assume gammadistribution rate variation are denoted “Γ.” Our implementation of gammadistributed rate variation uses the discrete gamma approximation with four rate categories (Yang 1994a).
The probability of observing the states present at the ith site (x_{i}) is a sum over all possible assignments of nucleotides to the internal nodes of the tree. Suppose y = {y_{k}} for k = s + 1, …, 2s − 1 is a generic data vector at the internal nodes. Branch k of the tree has length v_{k} expected substitutions per site and ancestral node σ(k). The transition probability from state i to state j along a branch with v expected substitutions is p_{ij}(v). The initial substitution rate at the root is m. Then, the conditional probability of observing the data at the ith site given the tree and rate events is
Assuming independence of the substitutions across sites, the conditional probability of observing the full sequence data set given the tree and rate events is the product of the probabilities of observing the sites:
Estimating m, λ, and α_{P} using Markov chain Monte Carlo: We wish to estimate the rate of molecular evolution at each branch through Bayesian estimation of the parameters λ, α_{P}, and m (where m is the rate of substitution at the base of the tree). The likelihood function for λ, α_{P}, and m is
All Bayesian inference arises from the joint posterior distribution of the parameters of interest (in this case, the posterior distribution of λ, α_{P}, and m). The posterior probability density of λ, α_{P}, and m is
Calculation of the posterior probability density (Equation 8) involves evaluating highdimensional integrals and summations. We use Markov chain Monte Carlo (MCMC) integration to estimate the posterior distributions of interest. Specifically, we used the MetropolisHastingsGreen (MHG) algorithm (Green 1995; see also Geyer 1999), an extension of the MetropolisHastings algorithm for problems in which the dimension of the sample space changes. The MHG algorithm constructs a Markov chain by first proposing a new state and then moving to that state with probability R. The steps of the algorithm are as follows: (1) the current state of the chain is θ; (2)with probability density f(θ*θ), a new state (θ*) is nominated; (3) the acceptance probability of the nominated state is calculated,
Several different mechanisms were used to update the state of the chain. These mechanisms included (1) adding an event to the tree, (2) deleting an event from the tree, (3) changing the position of an event on the tree, (4) changing the gammadistributed random variable associated with an event on the tree, (5) changing the time of an internal node on the tree, (6) changing the substitution rate (m) at the base of the tree, (7) changing the Poisson parameter (λ), (8) changing the gamma parameter (α_{P}), (9) changing the transition/transversion rate ratio (κ), (10) changing the gamma shape parameter for amongsite rate variation (α_{R}), and (11) changing the base frequencies (π).
The acceptance probabilities for the different moves take the form
Move type 1: Adding an event to the tree: The prior ratio for the addition of a single point (z*, r*) to the current state e = (ξ, z, r) is
Move type 2: Deleting an event from the tree: The acceptance probability for the reverse step, deleting an event from the tree, has the same form as Equation 13, but with the ratio terms inverted. Detailed balance between the move types that add and delete an event on the tree is demonstrated in the appendix.
Move type 3: Changing the position of an event: Two different move types changed the position of an event on the tree. The first move type (3a) was chosen with probability ϕ_{3}_{a} and picked at random one of the ξ events on the tree. A new position was chosen uniformly on the interval [0, T]. The second move type (3b) was chosen with probability ϕ_{3}_{b} and picked one of the ξ events on the tree at random and moved its current position by a small amount randomly drawn from the interval [−ε_{3}_{b}, +ε_{3}_{b}]. The acceptance probability for both of these moves is
Move type 4: Changing the rate multiplier associated with an event: Two different move types changed the rate associated with an event on the tree. Both move types pick at random one of the ξ events. This randomly chosen event will have its rate multiplier changed from r to r*. For the first move type (4a) a change to r* is proposed with probability ϕ_{4}_{a} such that log_{e}(r*/r) is uniformly distributed on the interval [−½, +½]. The acceptance probability for this move is then
Move type 5: Changing the time of an internal node on the tree: With probability ϕ_{5}, the time of an internal node was changed. The times of the internal nodes were updated as follows. First, an internal node of the tree was chosen at random (excluding the root node). The time of this node was increased or decreased by adding a uniformly distributed random variable drawn from the interval [−ε_{5}, +ε_{5}]. The acceptance probability for this move is
Move types 6, 7, 8, 9, and 10: Changing m, λ, α_{P}, κ, and α_{R}: The substitution rate (m), Poisson parameter (λ), gamma shape parameter (α_{P}), transition/transversion rate ratio (κ), and gamma shape parameter for amongsite rate variation (α_{R}) were updated with probabilities ϕ_{6}, ϕ_{7}, ϕ_{8}, ϕ_{9}, and ϕ_{10}, respectively, by adding to the current value a uniformly distributed random variable on the interval [−ε_{6}, +ε_{6}], [−ε_{7}, +ε_{7}], [−ε_{8}, +ε_{8}], [−ε_{9},+ε_{9}], and [−ε_{10}, +ε_{10}], respectively. The acceptance probabilities are
Move type 11: Changing the base frequencies: With probability ϕ_{11} a move was attempted that changed the equilibrium base frequencies, π. The sum of the base frequencies is constrained to equal 1 and new values are proposed from a Dirichlet distribution with expected values at the current values. The Dirichlet distribution is the natural conjugate prior for a multinomial distribution and has probability density
Changing parameters near the boundary of the parameter space: Note that for move types 3b, 5, 6, 7, 8, 9, and 10 that the parameter is changed by adding a uniformly distributed random variable from an interval [−ε, +ε]. When the parameter is restricted to an interval (l, h) and the proposed value is outside this interval, we reflect the excess back into the interval. Namely, if θ′ = θ + U, where θ is the current parameter value (either z, t, m, λ, or α_{P}) and U is the uniformly distributed change, the proposed parameter value θ* is
Estimating parameters using Bayesian inference: The chain was sampled every g_{S} generations. The posterior distributions of the parameters are obtained directly by noting the position of the chain and recording the values of m, λ, and α_{P} for each sampled state. The proportion of the time the chain stays in different intervals is an approximation of the posterior distribution.
The chain was burned in by discarding the first g_{B} generations of the Markov chain. The burnin was performed to allow the chain to approach stationarity before states are sampled.
Validation of computer program: A computer program implementing the compound Poisson process for the HKY85 + Γ (Hasegawa et al. 1984, 1985; Yang 1993, 1994a) model of DNA substitution was written in C by one of us (J.P.H.; available via anonymous ftp to brahms.biology.rochester.edu or via the WWW at http://brahms.biology.rochester.edu). The advantage of using MCMC for Bayesian inference is that the sampled points are a valid (albeit dependent) sample from the posterior distribution: the Markov chain law of large numbers (theorem 3; Tierney 1994) states that posterior probabilities can be validly estimated from longrun sample frequencies. However, it is impossible to guarantee that an implementation of MCMC will converge for any particular problem (Geyer 1999). As Geyer (1999, p. 80) states, “MCMC is a complex mixture of computer programming, statistical theory, and practical experience. When it works, it does things that cannot be done any other way, but it is good to remember that it is not foolproof.”
The computer program was validated in several ways: (1) likelihoods were checked against existing computer programs (e.g., PAUP*; Swofford 1998); (2) the acceptance probabilities for the various move types were worked out independently by two of us; and (3) the program was run without any data. When the program is run without any data, then the likelihood ratio equals one and the chain should target the prior distribution. When the chain is run without data, the expectation and variance of the number of events on the tree should both be equal to λT. Moreover, the average rate associated with the events should equal α_{P}/e^{Ψ(αP)} and the variance should equal α_{P}/e^{Ψ(αP)}. When the chain was run without data, the results satisfied these expectations. Also, multiple independent chains were started from different starting values of the parameters. The chains were examined to see if they converged to the same posterior probability distribution.
Analysis of DNA sequence data: We applied the method to a single DNA sequence data set. The data set included complete mitochondrial sequences from 23 mammals (Arnasonet al. 1997). The tree topology was considered fixed in the analysis; the tree topology estimated using UPGMA (Sokal and Michener 1958) with the JukesCantor distance (Jukes and Cantor 1969) was used in the analysis. All analyses were performed assuming the HKY85 model of DNA substitution (Hasegawa et al. 1984, 1985). This model allows different base frequencies as well as a transition/transversion rate bias. Parameters of the substitution model were estimated on the UPGMA tree using maximum likelihood (as implemented in PAUP*; Swofford 1998).
The Markov chain was run for at least 500,000 generations. Every 50th state of the chain was sampled (i.e., g_{S} = 50). The sampled states were used to construct the posterior distribution for the variables m, λ, and α_{p}.
RESULTS
The molecular clock hypothesis of equal rates across lineages could be rejected for the mammalian mtDNA sequence data set. We used a likelihoodratio test to examine the molecular clock hypothesis (Felsenstein 1981). The HKY85 + Γ model was assumed in the analysis. The null hypothesis is that the molecular clock holds; the likelihood is maximized under the constraint of equal rates across lineages (L_{0}). The alternative hypothesis relaxes the clock constraint by assigning a different rate to each of the lineages; the likelihood under the alternative hypothesis is L_{1}. The likelihoodratio test statistic (twice the difference in the log_{e} likelihood, −2 log_{e}Λ; Λ = L_{0}/L_{1}) between the null and alternative models is approximately χ^{2} distributed with s − 2 degrees of freedom. The molecular clock hypothesis was rejected at the 5% level (mammalian mitochondrial sequences: log_{e}L_{0} = −114,514.23, log_{e}L_{1} = −114,431.21, −2 log_{e}Λ = 166.03, P < 0.00001).
We assumed two different priors for λ for the mtDNA data. Figures 6 and 7 show the results of the analysis of the mammalian mitochondrial sequences assuming an exponential prior for λ with mean 1.0. Figure 6 shows the change in the log_{e} likelihood through time. The probability of observing the data increased for the first 10,000 generations of the chain. The log_{e} likelihood then stabilized, fluctuating around a value of ~ −130,240. We discarded the first g_{B} = 50,000 generations as the burnin time of the chain. Figure 7 shows the posterior distributions for the parameters m and α_{P}. The estimates were m = 0.33 (0.32, 0.34) and α_{P} = 5.70 (2.35, 10.82). The credibility interval for each variable was determined by taking the 2.5% tails of the posterior distributions.
Figures 8 and 9 show the results of analyses in which an exponential prior with mean 10.0 was assumed for λ. Estimates of α_{P} (the shape parameter of the compound Poisson process) and m changed when different parameter values of the prior were used [m = 0.33 (0.32, 0.33); α_{P} = 63.89 (30.00, 104.24)]. The posterior distribution of m became broader and the posterior distribution of α_{P} increased (there were more events, each with a smaller effect). However, the branch lengths were very similar regardless of the exact value of the exponential prior on λ. Figure 10 shows the relationship between the maximumlikelihood estimates of the branch lengths obtained assuming the HKY85 model of DNA substitution and the mean of the posterior distribution of each branch obtained under the compound Poisson process. In both cases, the compound Poisson process was able to accommodate rate variation across lineages of the tree (the slope was 1.00, with r^{2} = 0.999 for both priors). We also examined the relationship between the compound Poisson process with exponential priors of means 1 and 10 (Figure 11). Again, there is a close relationship between the branch lengths regardless of the prior used for λ. The posterior distributions of branch lengths are quite robust to the two different priors used for λ.
The compound Poisson process for relaxing the molecular clock should prove practically useful for several applications. For example, estimation of divergence times for clades depends upon a calibration time for at least one speciation event on the tree and equal rates across lineages (though see Sanderson 1997). The compound Poisson process can be used to relax the molecular clock while at the same time allow estimation of divergence times of clades. An application of the method applied to the mammalian mitochondrial data is shown in Figure 12. The general approach is the same as that taken by Thorne et al. (1998), but we used the compound Poisson process discussed in this article instead of a lognormal distribution to relax the clock assumption. The estimation of divergence times required only one modification to the notation described in this article; instead of rescaling branch times between 0 (at the tips) and 1 (at the root), the tips of the tree were considered to be time 0 and the times of other branches on the tree were in terms of millions of years before present. The HKY85 + Γ model of DNA substitution was assumed in the analysis; the gamma model fit the observed sequences better than a model that pooled sites according to codon position and allowed a different rate for each partition (log_{e}L = −114,431.21 for HKY85 + Λ, log_{e}L = −117,091.83 for HKY85 + SS). An exponential prior with mean 0.006 was placed on λ. The estimates of the substitution model parameters were m = 0.010 (0.008, 0.012), α_{P} = 7.58 (2.60, 17.13), κ = 13.67 (13.00, 14.37), and α_{R} = 0.230 (0.228, 0.240). The posterior distribution of the divergence time of ferungulates (carnivores, perissodactyls, artiodactyls, and cetaceans) and Xenartha (Edentata) is shown in Figure 12. Arnason et al. (1997) obtained an estimate of 86 mya for this speciation event using a calibration date of 60 million years for the cowwhale divergence. We used a calibration date of 56.5 million years for the cowwhale divergence that is based on the earliest occurrence of either artiodactyls or cetaceans. The estimate of the FerungulateXenartha divergence is 92.5 mya (83.7, 99.6). The credibility interval overlaps the estimate obtained by Arnason et al. (1997). In Figure 12, we also indicate the divergence time estimates of the other clades on the tree, with the 95% credibility intervals.
Two different priors were used in the analysis of the speciation times for the mammals. We used exponential priors with means 0.06 and 0.006 per million years. These numbers were chosen so that the means of the exponential priors were similar to the priors used earlier (0.06 × 150 mya = 10; 0.006 × 150 = 1). Figure 13 shows the relationship between estimates of speciation times for two different priors on λ for the mammal mtDNA data. We used exponential priors with means 0.06 (x) and 0.006 (y). The xaxis shows that speciation times differed by at most 0.2 for the two priors. The estimates of speciation times appear robust to the choice of prior used for λ.
DISCUSSION
The molecular clock hypothesis, although useful in evolutionary biology, is inaccurate in two important details; rates across lineages appear to vary and the substitution process deviates significantly from a Poisson process (Ohta and Kimura 1971; Langley and Fitch 1973, 1974; also see Gillespie 1991). This article considers only lineage effects on rate variation; that is, we only consider changes in the overall substitution rate that occur along different lineages through processes such as a change in the generation time. We do not relax the assumption that the substitutions on a tree follow a Poisson process. It is important to relax the assumption that rates are constant across lineages because almost any study of the rate of substitution in DNA or amino acid sequences rejects the null hypothesis that rates across lineages are equal through time (Langley and Fitch 1974; Goodmanet al. 1975; Wu and Li 1985). This is especially true now that biologists are using recently available software that allows easy testing of the molecular clock hypothesis (e.g., PAML, Yang 1997; PAUP*, Swofford 1998).
Several approaches have been taken to relax the molecular clock. The most frequently used method for relaxing the rateconstancy assumption is to assign a fixed rate parameter to each branch of a phylogeny. This solution works well for the phylogeny problem where rates of substitution are often treated as nuisance parameters and the parameter of interest is the phylogeny of a group of organisms. However, allowing a rate of substitution for each lineage on a phylogenetic tree does not allow estimation of divergence times or allow comparison of speciation times for different taxonomic groups. Sanderson (1997) made an important advance in the estimation of divergence times using molecular data by (1) showing how all of the sequences of a study can be used simultaneously to estimate the divergence time of a clade and (2) relaxing the molecular clock by smoothing the rate differences across speciation times on a tree. Similarly, Thorne et al. (1998) relaxed the molecular clock by assuming that rates change across speciation events. They assigned rates to descendant lineages by sampling rates from a lognormal distribution. They evaluated the posterior distribution of speciation times using MCMC.
The compound Poisson process considered here models rate variation across lineages on a tree as a step function. There were several reasons we chose to model rate variation as a step function, with a major reason being analytical tractability. However, there are biologically plausible mechanisms that could cause rates to change in a nearly steplike manner. For example, any process that rapidly changes the substitution rate (such as a change in the proofreading mechanisms, a change in selective constraints, or a rapid change in the generation time) can be approximately described using a step function. One of the main advantages of treating rate variation as a compound Poisson process is that the model can introduce rate variation across lineages at any point on the phylogenetic tree; other methods that have been considered assume that substitution rate changes at speciation events (e.g., Sanderson 1997; Thorneet al. 1998). Such models may be sensitive to taxon sampling. On the other hand, these solutions may prove to be more practical for data analysis because there are fewer parameters to estimate. The gamma distribution used to modify ancestral rates, however, is not so easily justified using as an argument biological plausibility. We chose to use a gamma distribution to modify rates because of its analytical tractability and flexibility. Other methods for modifying rates, however, may work just as well.
We uncovered significant rate variation across lineages for the mammalian mitochondrial DNA sequence data sets. The clock hypothesis could be rejected using a likelihoodratio test (Felsenstein 1981). Importantly, the posterior distribution of the gamma parameter (α_{P}) was close to 0, which also suggests that the data were not consistent with the clock hypothesis.
Other models may also prove useful for relaxing the molecular clock assumption. For example, a doubly stochastic (or Cox) process can be used to allow rates to vary through time (Gillespie 1991). In this process, the rate of substitution is drawn from some stationary distribution r (u) (the rate at time u) with the transition probabilities calculated as they were in this article. Importantly, this process has a variance to mean ratio (index of dispersion) greater than one. One of the main difficulties of this process is the choice of the rate distribution. It is not clear what distribution would be most appropriate for modeling rate variation through time. Also, if the rate distribution has too many parameters, it may be difficult to estimate the parameters of the rate distribution while simultaneously allowing estimation of other interesting parameters (such as the divergence time of a clade).
We were able to estimate the divergence times of ferungulates (and other mammalian speciation events) using Bayesian inference under a relaxed molecular clock. In this study, we assumed one calibration date; the date of the speciation event leading to cows and whales was assumed to be exactly 56.5 mya. However, the Bayesian approach makes it easy to accommodate multiple calibration dates each with error. The error in a calibration date might be assumed to be uniformly distributed on some interval. However, this would only provide a starting point for characterizing the error in the calibration dates. Ideally, information on the density of fossil taxa around the calibration date would be used to construct a more realistic probability distribution describing the divergence time of a clade.
One important result of this article is that we demonstrate that events can be placed onto a phylogenetic tree under a Poisson process and estimation performed using MCMC. The Poisson process is one of the most useful models in biology, and there are several possible extensions of the approach we describe in phylogenetics. For example, one could place events onto a tree according to a Poisson process with each event changing the synonymous/nonsynonymous rate ratio, the base frequencies, or whether the site has a positive rate or a rate of zero (i.e., the covarion model). The advantage of this approach is that it allows the rate of nucleotide substitution to change on a tree under a biologically reasonable model with the addition of a small number of parameters.
Acknowledgments
This article benefited from discussions with Rasmus Nielsen and Michael Newton. Mary Kuhner provided suggestions for checking the results of the computer program. The suggestions of two anonymous reviewers also improved this article. J.P.H. and B.L. thank the Isaac Newton Institute for Mathematical Sciences and the National Science Foundation (NSF grant DMS9804739) for their support in Cambridge during the summer of 1998. B.L.'s research was supported by NSF grant DBI9723799.
APPENDIX
Here we demonstrate detailed balance between the move types that add and delete an event from the tree. Following the notation of Green (1995), consider the move m “insert a point into position ξ + 1 in the sequence” and its dual return move “delete the point from position ξ + 1 in the sequence.” We show that (from Green 1995; p. 714)
Let a generic point in A be x = (ξ, z, r), where z_{i} ϵ [0,T] and r_{i} ϵ (0,∞) for i = 1, 2, …, ξ. The point x′ = (ξ +1, (z, z*), (r, r*)) is a generic point in B. Note that q_{m}(x, dx′) = 0 if any of the arguments for the first ξ u's or r's in x′ differ from the corresponding arguments in x (because m adds a point at the end of the sequence). This means that the double integrals will measure 0 over any points in A that are not in B restricted to the smaller dimension or points in B that are not in A expanded to the larger dimension. Therefore, it suffices when specifying A and B to assume that the open intervals for the first ξ dimensions of the z's and r's agree.
The purported acceptance probabilities for traversing the prior are
Let
Note that
Footnotes

Communicating editor: S. Tavaré
 Received March 22, 1999.
 Accepted December 17, 1999.
 Copyright © 2000 by the Genetics Society of America