- THIS ARTICLE
-
Abstract
- Full Text (PDF)
- Alert me when this article is cited
- Alert me if a correction is posted
- SERVICES
- Similar articles in this journal
- Similar articles in PubMed
- Alert me to new issues of the journal
- Download to citation manager
- Reprints & Permissions
- CITING ARTICLES
- Citing Articles via HighWire
- Citing Articles via Google Scholar
- GOOGLE SCHOLAR
- Articles by Nielsen, R.
- Search for Related Content
- PUBMED
- PubMed Citation
- Articles by Nielsen, R.
Mutations as Missing Data: Inferences on the Ages and Distributions of Nonsynonymous and Synonymous Mutations
Rasmus Nielsenaa Department of Biometrics, Cornell University, Ithaca, New York 14853-7801
Corresponding author: Rasmus Nielsen, Department of Biometrics, Cornell University, 439 Warren Hall, Ithaca, NY 14853-7801., rn28{at}cornell.edu (E-mail)
Communicating editor: W. STEPHAN
| ABSTRACT |
|---|
This article describes a new Markov chain Monte Carlo (MCMC) method applicable to DNA sequence data, which treats mutations in the genealogy as missing data. The method facilitates inferences regarding the age and identity of specific mutations while taking the full complexities of the mutational process in DNA sequences into account. We demonstrate the utility of the method in three applications. First, we demonstrate how the method can be used to make inferences regarding population genetical parameters such as
(the effective population size times the mutation rate). Second, we show how the method can be used to estimate the ages of mutations in finite sites models and for making inferences regarding the distribution and ages of nonsynonymous and synonymous mutations. The method is applied to two previously published data sets and we demonstrate that in one of the data sets the average age of nonsynonymous mutations is significantly lower than the average age of synonymous mutations, suggesting the presence of slightly deleterious mutations. Third, we demonstrate how the method in general can be used to evaluate the posterior distribution of a function of a mapping of mutations on a gene genealogy. This application is useful for evaluating the uncertainty associated with methods that rely on mapping mutations on a phylogeny or a gene genealogy.
MAPPING of character changes on a phylogeny using parsimony or other methods is one of the most important and fundamental tools in evolutionary biology. In molecular evolution it is common to map the evolution of nucleotides on a phylogeny or an intraspecific gene genealogy. Inferences regarding the evolution of the molecular characters then proceed by treating the estimated mutational events as the true data. The power of this approach is that it converts sequence data to pseudodata of mutations on a phylogeny/genealogy. It can be used to track the evolution of specific characters, to estimate the ages and distribution of mutations, and in general to examine hypotheses regarding molecular evolution. For example, ![]()
![]()
![]()
![]()
Although the mapping of mutations on trees has been an incredibly powerful tool in molecular evolution, the methodology may be criticized on statistical grounds for at least three reasons.
- These methods do not take into account the uncertainty in the estimation of tree topology.
- Parsimony mapping assigns the smallest possible number of mutations on the tree, which may lead to serious biases in the parameter estimates in some cases.
- The uncertainty in the assignment of mutations on the tree is usually ignored.
For example, even under the parsimony criterion, there are multiple ways of mapping character changes on a tree, e.g., the accelerated transformation (ACCTRAN) and the delayed transformation (DELTRAN; ![]()
In many cases, using one of the available maximum-likelihood methods can alleviate these problems. For example, if we are interested in testing the hypothesis that more nonsynonymous mutations relative to synonymous mutation occur in the internal branches of the tree, this can easily be done using maximum likelihood (![]()
![]()
![]()
![]()
![]()
![]()
![]()
In this article we develop a statistical method for investigating the distribution of mutations on trees. We are especially interested in estimating ages of mutations in population genetical data.
The general approach we take to this problem is to treat mutations as missing data. We devise a Markov chain Monte Carlo (MCMC) algorithm that effectively integrates over the set of possible mappings of mutations on a gene tree and the set of possible gene trees. In this way, it is possible to make inferences regarding mutations, while taking into account both the uncertainty in the mapping of mutations on the genealogy and the uncertainty associated with the estimation of the tree topology. We focus on population genetical data and first develop methods for estimating population genetical parameters such as
= 4Neµ (Ne is the effective population size and µ is the per generation mutation rate). Thereafter, methods for estimating the age of nonsynonymous and synonymous mutations in the genealogy are presented. Finally, we discuss how any function of the mapping of mutations can be estimated with associated measures of uncertainty using this method. Throughout, the new methods are applied to several previously published data sets of DNA sequences.
| MODELS |
|---|
There has recently been a lot of interest in developing statistical methods for estimating population genetical parameters, e.g., ![]()
![]()
![]()
![]()
![]() |
(1) |
where
is the vector of parameters, X is the observed genetic data, G denotes the gene genealogy,
is the set of all possible genealogies, and P
(G) is the probability distribution of G given
. It is necessary to take account of the gene genealogy because it summarizes information regarding the correlation among individuals in the population due to shared common ancestry.
A major breakthrough in population genetics was achieved when it was demonstrated how to derive distributions of gene genealogies, P
(G), from classical population genetical models (![]()
![]()
, a coalescence process then arises (![]()
![]()
![]()
= (
2, ... ,
n), where
i is the time in G in which there are i ancestors of the sample; i.e., the length of the edges in the genealogy are proportional to the time the genes have diverged from each other (Fig 1). Here and in the following, time is measured in number of generations scaled by the mutation rate µ. For example, for a class of neutral population genetical models in which the genes segregating in the population are exchangeable and the distribution of offspring number is constant in time, the probability density function of gene genealogies is given by
![]() |
(2) |
|
(![]()
![]()
![]()
![]()
In these models Pr(X|G,
) is calculated by superimposing a model of mutation on the gene genealogy. The mutation models are usually time-reversible continuous time Markov chains, usually developed for statistical inference in phylogenetics. For example, for DNA sequence data, models such as the F84 model (![]()
![]()
![]() |
(3) |
and diagonal elements determined by the mathematical requirement that the row sums should be zero.
is here the transition/transversion rate ratio.
j is the stationary frequency of nucleotide j and
is chosen such that
![]() |
(4) |
Then, the transition probabilities of this Markov chain along an edge of length t in the genealogy are given by P(t) = {pij(t)} = eQt. In the case of the F84 model these transition probabilities can be calculated analytically and can be found in ![]()
) can then be calculated for any G assuming stationarity of the process by summing over the ancestral states at each node of the genealogy (e.g., ![]()
![]()
![]()
) does not depend on
.
| MUTATIONS AS MISSING DATA |
|---|
In the following we describe an algorithm for simulating genealogies and mappings of mutations from a distribution proportional to the likelihood function. The method is based on the Metropolis-Hastings algorithm (![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
Note that the likelihood can be written as
![]() |
(5) |
where
is an assignment (or mapping) of a set of mutations to G, PG,
(
) is the probability distribution of
on G given
, and I(X, G,
) is an indicator function that returns to 1 if
on G is compatible with X and 0 otherwise.
consists of a vector of mutations for each edge in G, in which mutations are labeled with respect to type (e.g., G
T) and time. PG,
(
) can then easily be calculated for any model such as (3). The calculations of PG,
(
) are best illustrated using an example. Consider the case in Fig 2 of a sample of size 2 in a single site, with genealogy G* and assignment of mutations
*. In this case three mutations occurred: (a
t) at time 0.22, (a
c) at time 0.12, and (c
g) at time 0.38. All times are measured from the time of the root toward the present (t = 0.55). If the current state is i, then the distribution of the time to next mutation is exponentially distributed with mean
, qi = -qii, and the probability that the next mutation is a mutation to nucleotide j, j
i, is qij/qi. So
![]() |
(6) |
where pG*,
(
*) is the density of assignments of mutations on the genealogy G* evaluated at the point
*.
|
Most models of DNA sequence evolution are time reversible, with rate matrices taking the form qij = a(i, j)
j, where a(i, j) = a(j, i) (see ![]()
![]()
(
*) can be calculated as
![]() |
(7) |
In general, for a genealogy in which nucleotide i appears ni times, there are nij mutations from i to j or j to i, and nucleotide i exists during a total of ti time units in the genealogy, we have
![]() |
(8) |
where j
i means that j can take on the values {c, t, g} if i = a, {t, g} if i = c, and {g} if i = t. In reversible mutation models, Pr(X|G,
) is independent of the position of the root (![]()
(
); the calculations of PG,
(
) are identical no matter where the root is placed on the tree.
We are now able to calculate PG,
(
) and P
(G), using Equation 8 and Equation 6, respectively, for specific values of G and
and we can then establish a Markov chain with stationary distribution of (
, G) proportional to Pr(X|
,
, G). In this Markov chain, updates to
and G are proposed according to a proposal kernel h[(
, G)
(
', G')] such that I(X, G,
) = 1. A proposed update is accepted with probability
![]() |
(9) |
Assuming that the proposal kernel is constructed such that the chain is ergodic, the chain will have stationary distribution of (
, G) proportional to Pr(X|
,
, G). The details of the construction of the simulation algorithm are described in the Appendix
| ESTIMATION OF POPULATION GENETIC PARAMETERS |
|---|
In models, such as the finite state space DNA models, evaluation of (1) is possible only by simulation techniques for realistic-sized data sets. Several approaches to this problem have been published, the most successful being the method by ![]()
; i.e., we set
= {
}. We calculate the data probability according to the model described in Equation 3 assuming independence among sites and we use Equation 2 to assign probabilities to genealogies. We assume a uniform prior for
in the interval (0,
max), where
max is chosen to be sufficiently large to provide estimates of the likelihood-posterior distribution for all values of interest. If a maximum-likelihood estimate equal to
max is obtained, the analysis should obviously be repeated with a larger value of
max. Equation 1 is then evaluated stochastically by simulation of a Markov chain, which has stationary distribution of (G,
) equal to P(G,
|X)
Pr(X|G)P
(G)P(
). We then sample values of
from this chain. Since P(
|X)
Pr(X|
) when assuming a uniform prior for
, the likelihood function for
can be approximated by the empirical distribution of values of
sampled from the chain, i.e.,
![]() |
(10) |
where I
i
(a,b) is the indicator for the event that the ith sampled value of
is in the interval (a, b).
The method used for simulating the Markov chain is as described in the Appendix; however, the state space of the chain is expanded to also include
. Updates to
are proposed by choosing a new value of
(
1) uniformly in the interval (
0 -
,
0 +
), where
0 is the current value of
and
is a value that can be adjusted to provide an appropriate intermediate acceptance rate (e.g., ![]()
1 < 0 we set
1 = -
1, and if
1 >
max we set
1 = 2
max -
1. The acceptance probability for this type of update is then
![]() |
(11) |
To check the computer implementation of the new method, results were compared to the results obtained using numerical integration for very small data sets and to the results obtained using the program described in ![]()
Applications:
The performance of the new method was evaluated by repeated analysis of a real data set. It consists of a 360-nucleotide-long region of 63 mtDNA sequences from the Nuu-chah-Nulth tribe. It was published by ![]()
![]()
. The state space of the chain was augmented with this parameter and updates to the
were performed similarly to the updates for
.
max was set to 10
w, and
was set to
w, where
w is the ![]()
. The results of the 10 replicates using different initial values of the parameters are in Fig 3. Note that the likelihood surfaces are roughly similar. We would make essentially the same statistical inferences from each replicate. This suggests that convergence was achieved and demonstrates that the method in fact can be used for population genetical inferences.
|
| ESTIMATION OF THE AGES OF MUTATIONS |
|---|
There has recently been considerable interest in making inferences about ages of mutations in DNA sequence data. There are several reasons for this. First, information regarding the age of mutations may be used in linkage disequilibrium studies (e.g., ![]()
![]()
![]()
Several estimators of the age of a mutation have been proposed (e.g., ![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
For a set of aligned DNA sequences, the most efficient estimators of the age of a mutation should employ all of the genealogical information contained in the sample. The estimator of ![]()
![]()
However, in much of the commonly analyzed data, such as most mammalian mitochondrial data, polymorphisms cannot with certainty be determined to be unique. In much of the available data, especially mitochondrial DNA data, there is a significant probability that more than one mutation is segregating in a particular variable site. The problem of estimating the age of a mutation, therefore, becomes a problem of estimating the age, or average age, of a polymorphism.
The solution to the problem is to integrate over all the possible ways mutations could be distributed on a genealogy to get the average age of a mutation in a site. The posterior expectation of the age can be written as
![]() |
(12) |
Here Ai is the age of a mutation in site i and E(Ai|G,
) is the average age of a mutation in site i given a particular gene genealogy and a distribution of mutations on the gene genealogy. E(Ai|X) is the estimator of the age of a mutation and can be considered a Bayesian estimator, since it is given by the posterior expectation of Ai. The integrals in (3) can easily be solved stochastically by the MCMC method previously discussed because E(Ai|G,
) can be observed directly from the genealogy. A simulation consistent estimator of E(Ai|X) is given by
![]() |
(13) |
where Ai(j,k) is the age of the jth mutation in site i in the kth step of a Markov chain with stationary distribution of G and
given by P
(G,
|X) and where n(i, k) is the number of mutations in site i in the kth cycle of this chain. A Markov chain with this stationary distribution can be established using the methods described in the section regarding the estimation of population genetic parameters. It is exactly the same Markov chain as in the previous case, except now we are interested in the distribution of genealogies themselves for the purpose of making Bayesian inferences regarding mutations and genealogy.
Applications:
The method was applied to two different previously published data sets. The distribution of genealogies and the mutational process was modeled as in the previous section. The first is a mtDNA data set (Cytochrome b) from Mesomys hispidus (spiny tree rat), obtained from GenBank and originally sequenced by ![]()
![]()
![]()
![]()
and
.
To ensure convergence 5,500,000 cycles of the Markov chain were simulated for each gene. The first 500,000 cycles were used as burn-in time. Three runs were completed for each data set using different starting trees, all runs giving essentially identical results. Only the results from the first run are shown for each data set.
The ratios of the average ages of nonsynonymous mutations to the average ages of synonymous mutations were 1.034 for the Influenza data set and 0.682 for the mtDNA data set. To test if the ratios of ages were different from 1, a two-sample bootstrap test for difference in the mean age was performed on each data set using 1,000,000 simulations. To reduce the variance, only sites in which the posterior probability of at least one mutation was larger than 0.2 were included. The P value for the mtDNA data set was
0.000028 and the P value for the Influenza data set was
0.836. In the Mesomys Cytochrome b data set, 25% of sites in which the posterior probability of at least one nonsynonymous mutation is larger than 0.2 have an average age of nonsynonymous mutations less than 0.05 Ne. Only 2.59% of sites in which the posterior probability of at least one synonymous mutation is larger than 0.2 have an average age of synonymous mutations less than 0.05 Ne.
The difference between the two data sets is not surprising. It was previously suggested that the average age of mutations in nonsynonymous sites is less than the average age of synonymous mutations in a smaller data set of Cytochrome b sequences from M. hispidus (![]()
![]()
![]()
![]()
|
| EVALUATING FUNCTIONS OF THE DISTRIBUTION OF MUTATIONS |
|---|
In the previous two sections we demonstrated how the method can be used to make inferences regarding population genetical parameters (
) and to estimate the ages of mutations/polymorphisms. However, the method can be applied quite generally to evaluate the expectation of any function of a mapping mutations. As previously discussed, mapping of mutations on trees has been a powerful tool in molecular evolution despite some skepticism regarding the robustness of these methods to violations of assumptions regarding tree topology and mapping algorithm. However, by stochastically integrating over the set of possible topologies, edge lengths, and mappings, it is possible to alleviate these valid statistical concerns. The posterior distribution of a discrete function of the genealogy and distribution of mutations [h(G,
)] is given by
![]() |
(14) |
where
[h(G,
),y] is Kronecker's delta function returning 1 if h(G,
) = y and 0 otherwise. It is estimated by
![]() |
(15) |
where Gi and
i are the ith of n samples of G and
, respectively, from a Markov chain with stationary distribution of G and
equal to the posterior distribution of G and
. Continuous densities can similarly be evaluated using density estimation methods such as kernel smoothing.
Applications:
As an example, we consider tests of the distribution of the number of mutations among branches. For example, ![]()
![]()
The results are shown in Fig 5. As expected from the previous analysis, the P values for the Influenza data set tend to be rather large whereas the P values from the Mesomys data set are smaller with a majority of P values <0.001. However, also note that there is quite a spread in the P values for the Mesomys data set. This illustrates that the P values obtained from a single mapping of mutations are highly dependent on the particular mapping of mutations and emphasizes that results based on mappings of mutations that consider only one of the possible mappings should be interpreted with caution.
|
| DISCUSSION |
|---|
The new method shows promise as an estimator of population genetical parameters, for Bayesian estimation of properties of the genealogy and the distribution of mutations, and in general as a method for evaluating any function of the distribution of mutations on a gene genealogy. There are already several methods available for estimating population genetical parameters using MCMC and related simulation methods (e.g., ![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
However, the most important application of the method appears to be for Bayesian inferences regarding the distribution of mutations on the gene genealogy. Here we focused especially on inferences regarding the age of nonsynonymous and synonymous mutations. Such inferences are of interest because the ages of mutations are intimately related to models of selection at the DNA level.
One potential problem of concern is the assumption of a neutral coalescent prior. This distribution of genealogies may obviously not be correct under selection and is also likely to be violated under more realistic demographic models. However, if nonsynonymous and synonymous mutations are distributed identically on the tree, the effect of assuming a wrong prior is the same for the two types of mutations. Therefore, a test of equal average ages of nonsynonymous and synonymous mutations, as implemented here, will not give an excess of falsely significant results beyond the chosen significance level, even if a wrong prior has been assumed for the distribution of genealogies. Also, for large samples the distribution of genealogies will be determined primarily by the data and the prior distribution will have a diminishing effect on any inferences made.
We note here that there are many other potential applications of the method, for example, in the analysis of the degree of correlation in the mutation process and in the analysis of the substitution process in interspecific data, i.e., data from multiple different species. In this article the focus was on population genetical models, population genetical data, and population genetical problems. However, analyzing interspecific data would require only a change in the prior distribution of genealogies. For example, ![]()
![]()
![]()
![]()
![]()
A final limitation of the method is the assumption of a single shared genealogy for all sites. This assumption implies that there can be no recombination, and the method is therefore only applicable to mtDNA data, Y-chromosome data, and data from viral strains with little or no recombination when considering population genetical problems. In theory, however, the method could also be implemented for models that include recombination as in ![]()
Manuscript received November 21, 2000; Accepted for publication June 5, 2001.
| APPENDIX |
|---|
This appendix describes the algorithmic details of the MCMC method. The proposal algorithm presented here is based on updating each edge of the genealogy one at a time. In each update, the length of the edge and potentially also the topology of the tree are updated simultaneously with the assignment of mutations to the updated edge. Mutations on the new edge are simulated under the condition I(X, G,
) = 1, using a fast approximation based on a Poisson process that enables easy evaluation of

It is designed to be efficient when the expected number of mutations per site in an edge is small and the mutation process is approximately Poisson. The initial tree is an UPGMA tree, where mutations are assigned using random parsimony mappings (see below). Updates to the tree and assignment of mutations are then performed as follows:
- 1. Moving an edge: Choose an edge uniformly among all edges in the tree. Move the time at which the edge connects to its parent edge a Gaussian-distributed distance (d) with mean 0 and variance
2. If the length of the edge becomes negative, reflect d against the boundary provided by the time of the origin of the edge. When a node is encountered, continue to move the edge along either of the two other edges connecting to the node, each with probability 0.5. - 2. Simulating times of mutations: For each site in the new edge, determine the ancestral state (A; not to be confused with nucleotide a) and the end state (E) under the condition I(X, G,
) = 1. If A = E simulate a Poisson-distributed number of mutations with mean qAt (qA = -qAA), conditional on not observing exactly one mutation. At the same time, simulate the times at which the mutations occur. If A
E simulate a Poisson-distributed number of mutations with mean qAt and the associated times, conditional on observing at least one mutation. These simulations are done by repeatedly simulating times from an exponential density with mean 1/qA. In the case of A = E, simulations are discarded when only one mutation occurred during time t. In the case A
E, the time to the first mutation (t1) is simulated from the distribution 
(A1) to assure that at least one mutation occurs. Simulations from (10) can easily be performed using the inverse transformation method

(A2) where U is a uniform [0, 1] random variable.
- 3. Distributing the mutations on the edge: Mutations are distributed on the edge according to the times simulated in step (2) and according to the relative rates given by q; i.e., if the current nucleotide is i, the probability that the nucleotide after mutation is of type j is qij/qi. However, set qiE = 0 for all i for the second to last mutation in the edge, and qij = 0 for all j
E for the last mutation in the site.
Note that updates to the topology are achieved at step 2 at the same time the length of the edge is updated. Also note that in the present representation, A and E will be given for any site in an edge because of the condition I(X, G,
) = 1. Then,
![]() |
(A3) |
where zs, fs, and us are the functions z, f, and u evaluated in site s and
![]() |
(A4) |
where k is the number of mutations assigned to the site, t is the length of the edge, ti is the time of the ith mutation on the edge since the origin of the edge, and bi is the nucleotide resulting from the ith mutation in the site, and we use the convention t0 = 0, tk+1 = t, b0 = A. c is a constant that cancels when the ratio

is evaluated. The factor f exp(-u) enters because we have simulated k mutations according to a Poisson process with rate qA. However, the rate at which mutations occur after the ith mutation is qbi, according to the prior distribution. Also, if A = E, k was simulated from the distribution Pr(k|k
1) and if A
E, k was simulated from the distribution Pr(k|k > 0). The factor z enters because we simulated mutations assuming that the second to last mutation cannot be a mutation to state E and the last mutation must be a mutation to state E.
PG,
(
)/h[(
', G'), (
, G)] can be found similarly and since P
(G')/P
(G) easily can be evaluated from (2) under the assumption of a standard neutral coalescence model, the Metropolis-Hastings ratio can easily be evaluated under this proposal distribution. The value of
2 will determine the magnitude of the proposed updates and can be adjusted to guarantee an appropriate intermediate ratio of accepted to rejected updates (e.g., ![]()
Example:
To illustrate how the method works, in the following we give an example of a single update for a data set containing four sequences and a single site. The initial state is depicted in Fig 1A. We assume a rate matrix of the form

and
= 1. We choose a random edge, in this case the striped edge (see Fig 1A), and slide it a random distance to a new point in the genealogy. The mutations on the edge are erased and a new set of mutations is simulated. For the site depicted in Fig 1, the new edge must start with state c and terminate in state g. We simulate a random number of mutations (k) and times of their occurrences using a Poisson process with rate -qcct = 0.2, conditional on k
0. In this incident k = 1 and the time is 0.1, and since E = g, we determine that the mutation is a c
g mutation (Fig 1B). We then have

|
The latter two ratios could also have been calculated directly using Equation A4. The acceptance probability for the new update is then min{1, 7.389 x 0.060/0.150} = 1.
Since the chain is aperiodic, the probability of moving an edge to any other position in the genealogy in any particular cycle of the chain is positive, and the probability of changing the distributions of a mutation on an edge to any other supported distribution of mutations on the edge in a single cycle is positive, the chain is ergodic as desired. However, this does obviously not guarantee convergence of the ergodic averages in finite time.
Improving mixing:
One potential concern is that the chain may not mix fast between different mappings with similar posterior probabilities. Especially, if the number of mutations per site is low, the chain might get trapped in a particular mapping corresponding to a parsimony mapping that does not communicate well with other possible parsimony mappings. To alleviate this problem an update algorithm is added to the chain, which, for all sites in the sequence in which the current state (
) is a parsimony mapping, will choose a new parsimony mapping (
') uniformly among all possible parsimony mappings. The new mapping is then accepted with probability
![]() |
(A5) |
Such updates are added to the chain every 50 cycles. New parsimony mappings are simulated by enumerating all possible parsimony mappings and choosing one of these by random.
To illustrate the improvement in mixing resulting from adding this update, the method was applied to a data set containing four sequences and eight variable sites. For this data set, four of the variable sites supported the bipartion of the leaf nodes [(1, 2), (3, 4)] and one (site 1) supported the bipartion [(1, 3), (2, 4)]. The proportion of cycles in the chain in which the number of mutations assigned to site 1 in the edge leading to leaf node 1 changes is used as a proxy for how well the chain mixes. In 5000 updates, only 11 updates resulted in a change in the number of mutations assigned to the site-edge combination. However, if a parsimony update was added each cycle, 2279 updates out of 5000 updates resulted in a change in the number of mutations assigned to the site/edge combination. In the analysis of the real data sets, very little if any improvement in the rate of convergence of the ergodic averages was observed with and without a random parsimony step. One of the possible explanations for this is that the real data sets contained only very few sites with multiple hits.
| LITERATURE CITED |
|---|
BAHLO, M. and R. C. GRIFFITHS, 2000 Inference from gene trees in a subdivided population. Theor. Popul. Biol. 57:79-95[Medline].
BEAUMONT, M. A., 1999 Detecting population expansion and decline using microsatellites. Genetics 153:2013-2029
BEERLI, P. and J. FELSENSTEIN, 1999 Maximum likelihood estimation of migration rates and effective population numbers in two populations using a coalescent approach. Genetics 152:763-773
BUSH, R. M., W. M. FITCH, C. A. BENDER, and N. J. COX, 1999 Positive selection on the H3 hemagglutinin gene of human influenza virus A. Mol. Biol. Evol. 16:1457-1465[Abstract].
DA SILVA, M. N. F. and J. L. PATTON, 1993 Amazonian phylogeography: mtDNA sequence variation in arboreal echimyid rodents (Caviomorpha). Mol. Phyl. Evol. 2:243-255[Medline].
EDWARDS, A. W. F., 1970 Estimation of branch points from a branching diffusion process. J. R. Stat. Soc. Ser. B 32:155-174.
FELSENSTEIN, J., 1981 Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17:368-376[Medline].
FELSENSTEIN, J., 1984 DNAML, computer program. Distributed from http://evolution.genetics.washington.edu.
FELSENSTEIN, J., 1992 Estimating effective population size from samples of sequences: a bootstrap Monte Carlo integration method. Genet. Res. 60:209-220[Medline].
FELSENSTEIN, J. and G. A. CHURCHILL, 1996 A hidden Markov model approach to variation among sites in rate of evolution. Mol. Biol. Evol. 93:93-104.
FISHER, R. A., 1930 The Genetical Theory of Natural Selection, Ed. 1. Clarendon Press, Oxford.
FITCH, W. M., R. M. BUSH, C. A. BENDER, and N. J. COX, 1997 Long term trends in the evolution of H(3) HA1 human influenza type A. Proc. Natl. Acad. Sci. USA 94:7712-7718
GELMAN, A., G. O. ROBERTS and W. R. GILKS, 1995 Efficient Metropolis jumping rules, pp. 599608 in Bayesian Statistics 5, edited by J. M. BERNADO, J. O. BERGER, A. P. DAWID and A. F. M. SMITH. Oxford University Press, Oxford.
GEYER, J. C., 1991 Markov chain Monte Carlo likelihood, pp. 156163 in Computing Science and Statistics: Proceedings of the 23rd Symposium on the Interface, edited by E. M. KERAMIDAS. Interface Foundation, Fairfax Station, VA.
GOLDMAN, N. and Z. YANG, 1994 A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11:725-736[Abstract].
GRIFFITHS, R. C. and S. TAVARÉ, 1994a Simulating probability distributions in the coalescent. Theor. Popul. Biol. 46:131-159.
GRIFFITHS, R. C. and S. TAVARÉ, 1994b Ancestral inference in population genetics. Stat. Sci. 9:307-319.
GRIFFITHS, R. C. and S. TAVARÉ, 1998 The age of a mutation in a general coalescent tree. Stoch. Models 14:273-295.
GRIFFITHS, R. C. and S. TAVARÉ, 1999 The ages of mutations in gene trees. Ann. Appl. Prob. 9:567-590.
HASTINGS, W. K., 1970 Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57:97-109
HUELSENBECK, J. P., B. RANNALA, and B. LARGET, 2000 A Bayesian framework for the analysis of cospeciation. Evolution 54:353-364.
KINGMAN, J. F. C., 1982a The coalescent. Stochast. Proc. Appl. 13:235-248.
KINGMAN, J. F. C., 1982b On the genealogy of large populations. J. Appl. Prob. 19A:27-43.
KISHINO, H. and M. HASEGAWA, 1989 Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order of Hominoidea. J. Mol. Evol. 31:151-160.
KUHNER, M. K., J. YAMATO, and J. FELSENSTEIN, 1995 Estimating effective population size and mutation rate from sequence data using METROPOLIS-HASTINGS sampling. Genetics 140:1421-1430[Abstract].
KUHNER, M. H., J. YAMATO, and J. FELSENSTEIN, 1998 Maximum likelihood estimation of population growth rates based on the coalescent. Genetics 149:429-434
LARA, M. C., J. L. PATTON, and M. N. F. DA SILVA, 1996 The simultaneous diversification of South American echimyid rodents (Hystricognathi) based on complete cytochrome b sequences. Mol. Phylogenet. Evol. 5:403-413[Medline].
LARGET, B. and D. SIMON, 1999 Markov chain Monte Carlo algorithms for the Bayesian























