Understanding the Overdispersed Molecular Clock
David J. Cutler
Center for Population Biology, University of California, Davis, California 95616
Corresponding author: David J. Cutler, Rm. BRB 747B, Case Western Reserve University, 2109 Adelbert Rd., Cleveland, OH 44106-4955. E-mail: djc14@cwru.edu
Communicating editor: G. B. GOLDING
| ABSTRACT |
|---|
Rates of molecular evolution at some protein-encoding loci are more irregular than expected under a simple neutral model of molecular evolution. This pattern of excessive irregularity in protein substitutions is often called the "overdispersed molecular clock" and is characterized by an index of dispersion, R(T) > 1. Assuming an infinite sites, no recombination model of the gene, R(T) is given for a general stationary model of molecular evolution. R(T) is shown to be affected by only three things: fluctuations that occur on a very slow time scale, advantageous or deleterious mutations, and interactions between mutations. In the absence of interactions, advantageous mutations are shown to lower R(T); deleterious mutations are shown to raise it. Previously described models for the overdispersed molecular clock are analyzed in terms of this work, as are a few very simple new models. A model of deleterious mutations is shown to be sufficient to explain the observed values of R(T). Our current best estimates of R(T) suggest that either most mutations are deleterious or some key population parameter changes on a very slow time scale. No other interpretations seem plausible. Finally, a comment is made on how R(T) might be used to distinguish selective sweeps from background selection.
THE most simple version of the neutral theory of molecular evolution (![]()
The first article to demonstrate a deviation from a Poisson number of substitutions occurred early in the history of the neutral theory (![]()
The first attempts to use a phylogeny in an explicit manner came a few years later (![]()
In 1983, Kimura attempted to directly test whether or not the index of dispersion truly equaled one (![]()
In a series of articles, ![]()
Gillespie's solution to lineage effects was to (1) restrict his analysis to three species at a time, thereby guaranteeing a single unrooted phylogeny, and (2) weight the number of substitutions in each lineage by one over the mean number for that lineage, where the mean is taken over all loci examined. This weighting process amounts to regressing out lineage effects from the data. Using these weightings, Gillespie showed that for replacement substitutions in 20 loci, R(T) ranged from 0.13 to 43.82 with a mean of 6.95 (![]()
By 1995 enough data had been gathered to examine 49 mammalian loci. Using Gillespie's weighting factor, ![]()
A recent study of Drosophila (![]()
It now seems clear that mammalian loci are, on average, overdispersed at both silent and replacement sites. In Drosophila, it is likely that silent sites are overdispersed, but replacement sites might not be. In any case, because the most simple neutral theory can never produce an R(T) > 1, it is of interest to know which models of molecular evolution can produce a large index of dispersion. Several models have been suggested. They include episodic selection on a mutational landscape (![]()
In other simulations, ![]()
The goals of this article are threefold: first, to describe the mathematical machinery necessary to analyze the index of dispersion for an infinite site model of the gene; second, using this machinery, to describe which models will produce R(T) > 1, which will produce R(T) < 1, and which will produce R(T) ≈ 1; and finally, given the observation that R(T) appears to be >5 for mammalian data, to discover what we can infer about the nature of mammalian evolution.
| CALCULATION OF R(T) |
|---|
A substitution is a mutation that ultimately fixes in the population. There are two different processes that might be called the substitution process. One process, the origination process (![]()
To derive the index of dispersion of the origination process begin by assuming that the gene contains an infinite number of sites and by assuming that there is no recombination between those sites (![]()
haploid individuals. Let the population reproduce according to a discrete time Moran model (![]()
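$$R(T) = 1 - \rho + 2\sum_{t=1}^{T-1}\left(1 - \frac{t}{T}\right)\bigl(h(t) - \rho\bigr) \tag{1}$$
where ρ is the origination rate, ρ = E{St} = Pr{St = 1}, and h(t) is the conditional intensity function defined by
$$h(t) = \Pr\{S_t = 1 \mid S_0 = 1\} \tag{2}$$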
Equation 1 and Equation 2 are discrete time analogs of results given by ![]()
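$$h(t) = \Pr\{M_0 = 1 \mid M_t = 1\}\,E\{X_t\}\,\phi(t) \tag{3}$$
where E{Xt} is the expected frequency of a mutant t time steps after it enters the population, and φ(t) is the amount of interaction between sites separated by t time units, defined by
$$\phi(t) = \frac{\Pr\{S_t = 1 \mid j_t \text{ on } i_0\}}{p} \tag{4}$$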
where p is the probability of fixation of a new mutant, p = E{St|Mt = 1}, and jt on i0 is the condition of a mutant arising at time t on a piece of DNA containing a mutant that arose at time 0.
If h(t) converges to the origination rate ρ sufficiently quickly, so that $\sum_{t=1}^{\infty} t\,(h(t) - \rho) < \infty$, then for large T, R(T) can be approximated by
$$R(T) \approx 1 - \rho + 2\sum_{t=1}^{\infty}\bigl(h(t) - \rho\bigr) \tag{5}$$
$$\approx 1 + D_s \tag{6}$$
where $D_s = 2\sum_{t=1}^{\infty}(h(t) - \rho)$. The approximation uses the fact that ρ << 1. The sign of Ds determines whether the substitutional process is overdispersed (Ds > 0), underdispersed (Ds < 0), or indistinguishable from a neutral model (Ds = 0). Thus Ds can be thought of as the deviation in R(T) from a simple neutral model. It turns out that Ds can be calculated directly for a few simple models. Even when direct calculation of Ds is difficult, its sign and relative magnitude can often be estimated.
A few comments concerning the conditional intensity function should be made. It is defined to be the product of three terms: the probability there is a mutation at time 0, given a mutation at time t, Pr{M0 = 1|Mt = 1}; the expected frequency of a mutant t time units after it entered the population, E{Xt}; and the amount of interaction between mutants separated by t time steps, φ(t). The amount of interaction between mutants, φ(t), is defined to be the probability of fixation of a mutant, given that it occurred on a piece of DNA containing a mutation that entered the population t time steps earlier, Pr{St = 1|jt on i0}, divided by the unconditional probability of fixation of a mutant, p. A more complete description of φ(t) is given below.
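As a numerical illustration of how the large-T approximation behaves, the following minimal Python sketch compares the exact variance-to-mean ratio of a stationary 0/1 origination sequence with the approximation 1 + Ds. It assumes the standard variance decomposition for such a sequence, with rho = Pr{St = 1} and h(t) = Pr{St = 1|S0 = 1}. The function names and the geometrically decaying intensity are hypothetical choices made only for this example.

```python
def index_of_dispersion(h, rho, T):
    """Exact R(T) = Var/Mean of the number of originations in T steps
    for a stationary 0/1 sequence with rate rho and conditional intensity h."""
    total = 0.0
    for t in range(1, T):
        total += (1.0 - t / T) * (h(t) - rho)
    return 1.0 - rho + 2.0 * total


def large_T_approximation(h, rho, horizon):
    """1 + Ds, with Ds = 2 * sum_{t>=1} (h(t) - rho), truncated at `horizon`."""
    return 1.0 + 2.0 * sum(h(t) - rho for t in range(1, horizon + 1))


# Hypothetical intensity: a small excess over rho that decays geometrically.
RHO, EXCESS, DECAY = 0.001, 0.0005, 0.99


def toy_intensity(t):
    return RHO + EXCESS * DECAY ** t


exact = index_of_dispersion(toy_intensity, RHO, T=100_000)
approx = large_T_approximation(toy_intensity, RHO, horizon=5_000)
print(f"exact R(T) = {exact:.4f}, 1 + Ds = {approx:.4f}")
```

With these toy values the two numbers differ by roughly rho, which is the term the approximation discards.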
| SLOWLY CHANGING ENVIRONMENT |
|---|
Virtually any model containing a key parameter that changes on a sufficiently slow time scale can explain the observed index of dispersion. Other work has shown (![]()
Despite the fact that slowly changing parameters can cause R(T) to be large, simply invoking slow change appears to be an incomplete explanation of R(T). If one assumes that the time between substitutions is measured in millions of years, one must also assume that the environment changes on the time scale of millions of years. At first glance, it is not obvious that any environmental process has this property. Environmental processes that are often considered slow (for instance glaciation) are usually orders of magnitude faster than would be required here. To fully explain R(T), a mechanism would need to be suggested that could cause a key parameter to change so slowly. Without such a mechanism, a slowly changing environment appears a somewhat hollow explanation.
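The dependence on time scale can be illustrated with a toy calculation that is not one of the article's models. The sketch below assumes that counts are Poisson given a per-step rate, and that the rate flips between two values as a symmetric two-state Markov chain. Under those assumptions the law of total variance gives the index of dispersion in closed form. The function name and all parameter values are hypothetical.

```python
def dispersion_with_fluctuating_rate(low, high, switch_prob, T):
    """Index of dispersion of counts over T steps when the per-step rate
    flips between `low` and `high` with probability `switch_prob` per step.

    Counts are Poisson given the rate path, so
    Var(count) = E(total rate) + Var(total rate).
    """
    mean_rate = (low + high) / 2.0
    rate_var = ((high - low) / 2.0) ** 2   # variance of the rate at one step
    memory = 1.0 - 2.0 * switch_prob       # lag-1 autocorrelation of the rate
    # Var(sum of rates) = v * (T + 2 * sum_{t=1}^{T-1} (T - t) * memory**t)
    lagged = sum((T - t) * memory ** t for t in range(1, T))
    total_rate_var = rate_var * (T + 2.0 * lagged)
    return 1.0 + total_rate_var / (T * mean_rate)


# Same two rates and same mean count (20 events); only the time scale differs.
fast = dispersion_with_fluctuating_rate(0.002, 0.018, switch_prob=0.5, T=2000)
slow = dispersion_with_fluctuating_rate(0.002, 0.018, switch_prob=0.0005, T=2000)
print(f"fast fluctuation: R = {fast:.3f}; slow fluctuation: R = {slow:.3f}")
```

In this toy setting a rate that changes every step leaves R close to 1, while a rate that persists for about as long as the interval between events inflates R several-fold.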
Takahata's fluctuating neutral space (FNS) model (![]()
First, for the FNS model to generate large values of R(T), there must be more than two possible mutation rates (![]()
| UNDERSTANDING MODELS WITH SELECTION |
|---|
Many models make the assumption that the mutation process has a constant rate. If the mutation rate ν(t) = ν for all t, then
$$R(T) \approx 1 + 2\nu\sum_{t=1}^{\infty}\bigl(\phi(t)\,E\{X_t\} - p\bigr) \tag{7}$$
If there is little interaction between sites (φ(t) ≈ 1), then (7) further reduces to
$$R(T) \approx 1 + D_{s1} \tag{8}$$
where $D_{s1} = 2\nu\sum_{t=1}^{\infty}(E\{X_t\} - p)$ can be thought of as the deviation of R(T) due to selection in the absence of mutation interactions. In many cases, understanding Ds1 is the key to understanding selection's effect on the index of dispersion.
The expected frequency of a neutral mutation does not change over time; E{Xt1} = E{Xt2} = p for all t1 and t2. Nonneutral mutations do not necessarily have this property. Ds1 measures the effect a changing expected frequency has on the index of dispersion. A simple rule of thumb results. In the absence of site interactions, deleterious mutations cause R(T) > 1, and advantageous mutations cause R(T) < 1. The magnitude of the effect can be made quite large.
The overall sign of Ds1 is determined by the sign of E{Xt} - p. If the expected frequency of mutations does not change over time, then Ds1 = 0. If the expected frequency of sites monotonically declines over time, then Ds1 > 0 [because E{Xt} ≥ E{X∞} = p, E{Xt} - p ≥ 0]. Conversely, if the expected frequency monotonically increases, then Ds1 < 0. An interesting unsolved problem is to describe which models of molecular evolution have the property that the expected frequency of mutations is monotonic over time. A natural conjecture (and one that is consistent with the simulations performed here) is that any stationary model has this property.
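The sign rule can be checked numerically in the simplest possible setting. The sketch below is an illustration only, not the article's simulation: it assumes a two-allele haploid Moran model with a single mutant of relative fitness 1 + s, and iterates the exact distribution of the mutant count to obtain E{Xt}. It then sums E{Xt} - p, which is Ds1 divided by twice the mutation rate. The function names and the values N = 20, s = ±0.1 are hypothetical.

```python
def moran_expected_frequencies(N, s, steps):
    """E{X_t} for t = 0..steps in a two-allele haploid Moran model started
    from one mutant with relative fitness 1 + s."""
    dist = [0.0] * (N + 1)          # dist[i] = Pr{mutant count is i}
    dist[1] = 1.0
    expected = [1.0 / N]
    for _ in range(steps):
        new = [0.0] * (N + 1)
        new[0], new[N] = dist[0], dist[N]   # loss and fixation are absorbing
        for i in range(1, N):
            mass = dist[i]
            if mass == 0.0:
                continue
            mutant_weight = (1.0 + s) * i
            total_weight = mutant_weight + (N - i)
            up = (mutant_weight / total_weight) * ((N - i) / N)  # mutant born, wild type dies
            down = ((N - i) / total_weight) * (i / N)            # wild type born, mutant dies
            new[i + 1] += mass * up
            new[i - 1] += mass * down
            new[i] += mass * (1.0 - up - down)
        dist = new
        expected.append(sum(i * q for i, q in enumerate(dist)) / N)
    return expected


def fixation_probability(N, s):
    """Closed-form fixation probability of a single mutant in this Moran model."""
    if s == 0.0:
        return 1.0 / N
    r = 1.0 + s
    return (1.0 - 1.0 / r) / (1.0 - r ** (-N))


def selection_sum(N, s, steps=5_000):
    """sum_{t>=1} (E{X_t} - p), i.e. Ds1 divided by twice the mutation rate."""
    p = fixation_probability(N, s)
    freqs = moran_expected_frequencies(N, s, steps)
    return sum(x - p for x in freqs[1:])


for s in (-0.1, 0.0, 0.1):
    print(f"s = {s:+.1f}: sum of (E[X_t] - p) = {selection_sum(20, s):+.4f}")
```

In this toy model the sum is positive for the deleterious mutant, zero for the neutral one, and negative for the advantageous one, matching the rule of thumb stated above.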
Apart from a simple one-locus, two-allele Fisher-Wright world, there is some difficulty defining what is meant by a deleterious/advantageous mutation. For the purposes of this article, a particular mutation will be said to be deleterious/advantageous if its expected frequency decreases/increases over time. A model will be said to be a deleterious/advantageous mutation model if, averaged over all possible mutants, the expected frequency of mutants decreases/increases. Note that the definition of deleterious/advantageous mutation model is a property of the mutations, not the originations. Thus, a model will be called a deleterious site model if the majority of mutations decline in frequency, but this statement implies nothing at all about the fitness of the sites that actually fix. It is a statement about the average properties of mutants, not a statement on the properties of those rare mutants who eventually fix.
Thus, we arrive at the conclusion that, in the absence of site interactions, deleterious mutation models have an R(T) > 1, and advantageous mutation models have an R(T) < 1. Although this result is clear as stated, the intuition concerning why it's true may be less obvious.
Mutations do not necessarily fix one at a time. Ds1 can be thought of as measuring the effect the size and frequency of multiple fixations has on the index of dispersion. To see this, write Ds1 as:
$$D_{s1} = 2\left(\sum_{t=1}^{\infty}\nu\,E\{X_t\} - \sum_{t=1}^{\infty}\nu\,p\right) \tag{9}$$
Consider the piece of DNA that reproduces at time step 0. Call this piece of DNA i0. i0 may contain a mutation from time step -1. The probability that there was a mutation at time -1 is the mutation rate, ν. The probability that i0 contains this mutation is E{X1}. Thus νE{X1} is the expected number of mutations on i0 from time step -1. Similarly, νE{X2} is the expected number of mutations from time step -2. In general, the first sum in (9) is the expected number of mutations on i0.
In a neutral model, the expected frequency of a site does not change over time. So, for a neutral model E{Xt} = p for all t. Thus, the second sum in Equation 9 is what the expected number of mutations on i0 would be if this were a neutral model with probability of fixation p. Therefore, Ds1/2 is equal to the expected number of mutations on i0 minus the expected number of mutations under a neutral model. For a deleterious site model E{Xt} > p, so that the first sum is bigger than the second, and, on average, there are "too many" mutations on i0, relative to a neutral model with the same probability of fixation. Conversely, in an advantageous model E{Xt} < p, so that there are "too few" mutations on i0 relative to a neutral model with the same probability of fixation.
Finding Ds1 directly for any particular model is not trivial. Other than for the neutral case, it is not obvious that E{Xt} is ever easy to calculate. For models that may be approximated with a diffusion, finding E{Xt} amounts to solving a Kolmogorov backward equation. If a two-allele model is an adequate approximation, the problem can also be formulated as an ordinary differential equation (![]()
To estimate Ds1 within a simulation, a single extra vector, call it DS[0 ... R], needs to be stored, where R is a number sufficiently large so that all sites are extremely likely to be fixed or lost within R generations (R = 1000N was used in the simulations for this article). Initialize the DS vector to 0. During the simulation, track the frequency of each site in all generations. For each mutation add its frequency t generations after it entered the population to the value stored in DS[t]. When the simulation is done, divide each element of DS by the total number of mutations. Estimate Ds1 by $D_{s1} = 2\nu\sum_{t=0}^{R}(DS[t] - DS[R])$.
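The bookkeeping just described can be sketched in a few lines of Python. This is a minimal illustration that assumes the per-mutation frequency trajectories have already been recorded by some simulation. The function name, the argument names, and the padding rule for short trajectories are assumptions made for this example, and the sum is taken from t = 0 to R as in the text.

```python
def estimate_ds1(trajectories, mutation_rate, R):
    """Estimate Ds1 from simulated frequency trajectories.

    trajectories: one list per mutation; entry t is that mutation's frequency
        t generations after it arose. Shorter lists are padded with their
        final value (a fixed site stays at 1, a lost site at 0).
    mutation_rate: the mutation rate that multiplies the sum.
    R: horizon by which every site is assumed to be fixed or lost.
    """
    DS = [0.0] * (R + 1)
    for freqs in trajectories:
        for t in range(R + 1):
            DS[t] += freqs[t] if t < len(freqs) else freqs[-1]
    DS = [total / len(trajectories) for total in DS]  # mean frequency at each age
    p_hat = DS[R]                                     # estimated fixation probability
    return 2.0 * mutation_rate * sum(DS[t] - p_hat for t in range(R + 1))


# Hypothetical trajectories for four mutations in a population of four.
toy = [
    [0.25, 0.25, 0.0],
    [0.25, 0.25, 0.0],
    [0.25, 0.25, 0.0],
    [0.25, 0.50, 1.0],
]
ds1_hat = estimate_ds1(toy, mutation_rate=0.01, R=4)
print(ds1_hat)
```

Storing only the running sums in DS, rather than every trajectory, is what keeps the memory cost at a single vector of length R + 1 in a real simulation; the list-of-trajectories input here is purely for readability.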
Finally, one might ask if there is any general intuition on the effects of the overall mutation rate and overall strength of selection on Ds1. As is obvious from Equation 8, Ds1 is independent of time. Also, Ds1 appears to be a linear function of the mutation rate. For small mutation rates, this may be roughly true, but for large 2ν the linear dependence must disappear. The reason for this is that E{Xt} and p must also be functions of 2ν, because 2ν affects, among other things, the overall heterozygosity of the population and its mean fitness. As a result, Ds1 is unlikely to depend on 2ν in a simple linear manner.
By analogy to a classical Fisher-Wright model, one can imagine changing the strength of selection. This can have two effects on Ds1. First, it can change the probability of fixation, p, thereby making p closer to/further from the initial frequency of a new mutant (1/N), thereby decreasing/increasing |Ds1|. Second, when the strength of selection changes, the time it takes for the expected frequency to reach p will also change. Increasing selection decreases the time, so that |Ds1| decreases. Decreasing selection increases the time, so that |Ds1| increases. Predicting the net effect is difficult. In all the simulations, increasing selection usually increased |Ds1|, and never significantly decreased it, but for very strong selection, Ds1 generally appeared to approach some asymptote.
| INTERACTION BETWEEN SITES |
|---|
Consider a mutant that enters the population at the current time step, t. Call this mutation jt. The probability that jt ultimately fixes is p. When jt entered the population, it arose on some piece of DNA. The piece of DNA might contain other mutations. Pr{St = 1|jt on i0} is the probability that jt fixes, given that the piece of DNA on which it arose contains another mutant, i0, which entered the population at time zero. If knowing that the piece of DNA contains an earlier mutation does not affect jt's chance of fixation, then Pr{St = 1|jt on i0} will equal p, and φ(t) = 1. When φ(t) = 1 we say there is no interaction between mutants. On the other hand, when the knowledge that jt arose on a piece of DNA containing i0 alters the probability that jt fixes, we say that there is interaction between mutants, and φ(t) ≠ 1.
There are at least two fundamental ways in which mutants can interact. We call these two ways direct and indirect interactions. For many models of natural selection, the fitness of a piece of DNA is proportional to the number of mutations that it contains. For instance, in the negative gamma shift model (described below), when a new mutation enters the population, the fitness of the piece of DNA on which it arose is equal to its fitness prior to the mutation, minus a gamma-distributed random variable. When the fitness of a piece of DNA is a function of the number of mutations contained on the piece of DNA, we say that mutations directly interact with one another.
For other models of evolution, the fitness of a piece of DNA containing a new mutation is independent of the number of previous mutations. In the house of cards model, the fitness of a piece of DNA containing a new mutation is drawn independently from some fixed [often Gaussian (![]()
If one knows that jt arose on a piece of DNA containing i0, one has some information about the state of the population. In particular, one suspects that i0's expected frequency, conditional on jt arising on i0, is higher than its unconditional expected frequency. The knowledge that i0 is expected to be at higher frequency may, in turn, suggest something about the expected mean fitness of the population. The expected mean fitness of the population may, in turn, suggest something about the probability that jt will fix. In general, when i0 affects jt's probability of fixation through one or more intermediaries (like population mean fitness), we say that i0 and jt indirectly interact. Virtually all nonneutral models should have some form of indirect interactions, although we suspect that models that produce relatively constant population mean fitnesses might have relatively negligible indirect interactions.
A simple rule of thumb can be applied to site interactions. In general, direct interactions tend to move R(T) toward one; indirect interactions tend to move R(T) away from one. This can be seen by considering a few simplified cases.
Consider an advantageous mutation model where the fitness of a piece of DNA with k mutations is equal to 1 + ks, s > 0. This is, by definition, a model of direct interactions. If interactions were absent, R(T) would be <1. Direct calculation of φ(t) is hard, but it has to be >1. Because all mutations are advantageous, the probability of a site fixing, given that it arises on a piece of DNA containing another mutant, must be larger than its unconditional probability of fixation, because its fitness is higher. So, φ(t) > 1, which implies that when E{Xt} < p, φ(t)E{Xt} > E{Xt}. Thus, when mutations are beneficial, direct interactions move R(T) toward 1.
The converse is true for the deleterious model with direct interactions. If the fitness of a sequence with k mutations is 1 - ks, s > 0, then E{Xt} > p, and in the absence of interactions, R(T) would be >1. But, because each additional mutation lowers the fitness of a piece of DNA, Pr{St = 1|jt on i0} must be less than p, and φ(t) must be <1. Thus, this form of direct interaction must move R(T) toward 1.
Indirect interactions often have the opposite effect. Consider a mutation jt, which enters the population at time t, on a piece of DNA containing an earlier mutation i0. Conditional on jt landing on a piece of DNA containing i0, i0's expected frequency is likely to be higher than its unconditional expected frequency. If this is a deleterious mutation model, i0's higher conditional expected frequency suggests that the conditional expected population mean fitness is likely to be lower than the unconditional expectation. Given that jt arose at a time when the conditional population mean fitness is expected to be lower than the unconditional expected fitness, jt's probability of fixation is likely to be higher, thereby making φ(t) > 1. Conversely, if this is an advantageous mutation model, jt arising on i0 suggests that the conditional expected population mean fitness may be higher than the unconditional average, making it likely that jt's probability of fixation is lower than the unconditional average, so that φ(t) < 1. Therefore, indirect interactions are likely to increase R(T) for deleterious site models and decrease R(T) for advantageous ones.
| EXCHANGEABLE ALLELES |
|---|
Symmetrical over-/underdominance is characterized by individuals who are homozygous for all sites of the locus having fitness 1. Individuals who are heterozygous for even a single site have fitness 1 + s, where s is fixed and greater than zero for overdominance and less than zero for underdominance. The mutational model is assumed to be Poisson, with constant rate ν.
If mutation interactions were absent, the overdominance model would produce an R(T) < 1, and underdominance would lead to R(T) > 1. When a mutation enters the population, the piece of DNA on which it arose will be in heterozygotes for at least its first few generations. Therefore, this piece of DNA will have higher than average fitness during this time, and E{Xt} will be an increasing function for this time. Similarly, a new underdominant mutant will have a lower than average fitness, and E{Xt} will be a decreasing function at first. Whether this pattern continues (overdominance mutants increase in expected frequency; underdominance mutants decline) for the entire time a mutant segregates in the population remains an open analytical question. It is clear from GILLESPIE's (1994a) simulations, and the ones done here, that E{X∞} > 1/N for overdominance models, and E{X∞} < 1/N for underdominance models. Thus, one suspects that this pattern of expected frequency change may hold the entire time a mutant segregates. What is certain is that, for the parameter values examined in this study, in every simulation E{Xt} ≤ E{Xt+1} for all overdominance models, and E{Xt} ≥ E{Xt+1} for all underdominance ones.
There is no direct interaction between sites in the over-/underdominance model, because each new mutation makes the piece of DNA distinguishable from all other alleles, regardless of the number of previous mutations. Because our intuition suggests that indirect interactions will be often accomplished through changes in population mean fitness, there are likely to be only small amounts of indirect interactions in the overdominance model, because Gillespie has shown that the homozygosity, and as a result mean fitness, changes very little over time. There is likely to be a great deal more indirect interaction in the underdominance model, because this model does not maintain polymorphism, and there are significant changes in mean fitness as sites go to fixation. Indirect interactions should reinforce the effects of advantageous mutants, making R(T) < 1 + Ds1 for the overdominance model (but only slightly because strong interactions are unlikely), and making R(T) > 1 + Ds1 for the underdominance model.
Even though direct calculation of R(T) is difficult for the over-/underdominance model, Ds1 can be estimated from simulation. The basic simulation procedure is described in ![]()
The underdominance model can probably account for R(T) > 5, but this is difficult to show in simulation, because the origination rate goes to zero very rapidly as Ns gets below -4. Interpolating from the graph, there appears to be a narrow range of Ns, perhaps -8 < Ns < -4, with a large, but not astronomical, R(T) that could account for the observed values, but with an overall rate of evolution that is much lower than the neutral rate.
TIM and SAS-CFF are both models of a rapidly fluctuating environment, and understanding their behavior requires slightly more conjecture. There are no direct interactions between sites in either of these models, but the magnitude of indirect interactions is difficult to predict. Gillespie has shown in simulation that E{X∞} > E{X0}, which is consistent with expected site frequencies increasing over time. Gillespie also found that R(T) < 1 for all these simulations, which is also consistent with expected site frequencies increasing over time. Nevertheless, actually demonstrating that expected site frequencies increase over time is a formidable problem. One is, however, fairly convinced of this by comparing simulated R(T) with 1 + Ds1, as estimated in these simulations. Simulation details can be found in ![]()
| HOUSE OF CARDS |
|---|
The house of cards model of molecular evolution is the most thoroughly analyzed (![]()
In its most common form, the HOC model assumes that the fitness of any mutation is picked from a normal distribution with mean 0 and variance σ². Under a small mutation rate assumption, it has been shown that the fitnesses of the most recently fixed sites can be thought of as a Markov process with a stationary distribution that is approximately Gaussian with mean 2Nσ² and variance σ². Because the most recently fixed site has mean fitness 2Nσ², and new mutations have mean fitness 0, we can think of the relative fitness of new mutations as a normally distributed random variable with mean -2Nσ². Therefore, the vast majority of mutations have to be deleterious, so Ds1 > 0, and in the absence of interaction, R(T) > 1.
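To get a feel for how large "the vast majority" is, the following sketch computes the fraction of new mutations that are less fit than the currently fixed allele. It rests on assumptions added for this illustration: the fixed allele's fitness is taken to be normal with mean 2N·sigma² and variance sigma², a new mutation's fitness is normal with mean 0 and variance sigma², and the two are treated as independent. The function name is hypothetical.

```python
import math


def fraction_deleterious(N, sigma):
    """Pr{new mutation is less fit than the currently fixed allele}.

    Illustrative assumption: fixed fitness ~ Normal(2*N*sigma**2, sigma**2),
    new fitness ~ Normal(0, sigma**2), independent of each other.
    """
    mean_diff = -2.0 * N * sigma ** 2      # mean of (new - fixed)
    sd_diff = math.sqrt(2.0) * sigma       # standard deviation of the difference
    z = (0.0 - mean_diff) / sd_diff        # standardized distance to zero
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF at z


for n_sigma in (0.5, 1.0, 2.0):
    frac = fraction_deleterious(N=1000, sigma=n_sigma / 1000)
    print(f"N*sigma = {n_sigma}: fraction deleterious = {frac:.3f}")
```

Under these assumptions the fraction depends only on the product N·sigma and rises quickly toward 1 as that product increases.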
In fact, indirect interactions can lead R(T) to be vastly larger than one. To a first approximation, the mean fitness of the population is equal to 1 + s*, where s* is the selection coefficient of the most recently fixed site. Therefore, the population mean fitness must fluctuate, and these fluctuations must occur slowly (in fact, on the exact same time scale as molecular evolution). Because mean fitness fluctuates, indirect interactions are expected. Because mean fitness fluctuates on the same time scale as molecular evolution, the indirect interaction component must be large. Putting together deleterious sites with large indirect interactions leads to the prediction that R(T) > 1, and perhaps much greater (Fig 7). As expected, 1 + Ds1 > 1, but not much greater: 1 + Ds1 never rises above 1.5, despite R(T) growing to nearly 500. The indirect interaction component is therefore enormous.
The house of cards model can account for an index of dispersion >5, but only when 0.5 < Nσ < 2. This is an incredible parameter sensitivity. For Nσ < 0.5, the house of cards is essentially a neutral model. For Nσ = 2, the index of dispersion is well into the hundreds. It is difficult to simulate Nσ > 3, because the origination rate is so slow.
| OPTIMUM MODEL |
|---|
The optimum model is a simple model of purifying selection. All mutations are assigned a phenotype drawn from a zero mean, unit variance normal distribution. The fitness function is quadratic with a maximum at zero. A single parameter
measures the width of the fitness function [see ![]()
= 0.05 to
= 0.1, R(T) rose by only 1%. Over a wide range of parameter values (results not shown), the optimum model always has difficulty producing an R(T) as large as 5.
| SHIFT MODELS |
|---|
Shift models are a perfect example of why performing simulations in the absence of theory can lead to entirely uninterpretable results. At first glance, the shift story looks simple. Gamma and exponential shifts yield R(T) ≈ 1; normal shifts lead to R(T) < 1 (![]()
The gamma and exponential shift models can be further subdivided into positive and negative shifts. In the negative-shift models (the only kind that has received significant theoretical attention; ![]()
Qualitatively analyzing Ds1, in the absence of site interactions, for gamma or exponential shifts is easy. For negative shifts, each new mutation has, on average, a fitness lower than the mean fitness of the population; hence mutations are on average deleterious and Ds1 > 0. Similarly, positive shifts are advantageous and Ds1 < 0. Direct interactions qualitatively change this picture.
Shift models fundamentally differ in their mode of site interaction from all other models of evolution that we have so far considered. In all other models, the fitness of a sequence is essentially independent of the number of mutations it contains. In shift models, the fitness of a piece of DNA is directly proportional to the number of mutations it contains. Thus, sites directly interact with one another, so R(T) should be closer to one.
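One can attempt to crudely estimate this effect. Consider φ(t) = Pr{St = 1|jt on i0}/p. In words, φ(t) is the probability that a mutant, jt, which entered the population at time t, fixes given it arose on a piece of DNA containing a mutant, i0, that entered at time 0, divided by the unconditional probability that jt fixes. The analysis is done by considering two cases.
In the first case jt arises on a piece of DNA containing i0, and i0 is still segregating in the population. In the second case jt arises only after i0 has been fixed (because jt is on i0, i0 cannot have been lost before time t). Therefore,
$$\phi(t) = \frac{\Pr\{j \text{ fixes} \mid j_t \text{ on } i_0,\ i \text{ segs at } t\}}{p}\,\Pr\{i \text{ segs at } t \mid j_t \text{ on } i_0\} + \frac{\Pr\{j \text{ fixes} \mid i \text{ fixed before } t\}}{p}\,\Pr\{i \text{ fixed before } t \mid j_t \text{ on } i_0\}$$
The first approximation is to assume that Pr{j fixes|i fixed before t} ≈ p. In other words, if i0 has been fixed before jt enters the population, then i0 has little effect on jt. Because this is an attempt to capture direct interactions, the approximation essentially amounts to assuming that if i0 is fixed, it contributes equally to the fitness of all alleles. Using this approximation,
$$\phi(t) \approx \frac{\Pr\{j \text{ fixes} \mid j_t \text{ on } i_0,\ i \text{ segs at } t\}}{p}\,\Pr\{i \text{ segs at } t \mid j_t \text{ on } i_0\} + \Pr\{i \text{ fixed before } t \mid j_t \text{ on } i_0\}$$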
Consider the ratio of probabilities in the first term. This term is the probability that jt fixes, given that it arose on a piece of DNA containing another segregating mutant, i0, divided by the probability that jt fixes. Thus, it is the ratio of the probability of fixation of a piece of DNA with at least two segregating mutants divided by the probability of fixation of a piece of DNA with at least one segregating mutant. Loosely, it is the probability of fixation of a piece of DNA with two mutants divided by the probability of fixation of a piece of DNA with one mutant. The question is, how does the extra mutant affect jt's probability of fixation? Direct calculation appears very difficult, but by considering a two-allele diffusion, an approximation may be found.
Consider a simple two-allele diffusion, where the fitnesses of the genotypes A1A1, A1A2, and A2A2 are 1, 1 + s/2, and 1 + s, respectively. The probability of fixation of a new mutant A2 is given by ![]()
$$\pi(s) = \frac{1 - e^{-s}}{1 - e^{-2Ns}} \tag{10}$$
Under a model of direct interactions, think of the fitness of a piece of DNA with only jt on it as 1 + s. Think of the fitness of a piece of DNA with both jt and i0 on it as 1 + 2s. Therefore, by analogy to the two-allele diffusion, an approximation for Pr{j fixes|jt on i0, i segs at t}/p might be π(2s)/π(s), where π(s) is the fixation probability given by (10). For small values of Ns, π(2s)/π(s) may be further approximated by 2Nπ(s) (see Fig 9). Noting that Equation 1 was derived for a haploid model with population size N, this suggests approximating Pr{j fixes|jt on i0, i segs at t}/p by the very simple Np. Plugging this approximation into (7),
$$R(T) \approx 1 + D_{s1} + D_{s2} \tag{11}$$
where $D_{s2} = 2\nu\sum_{t=1}^{\infty}(E\{X_t\} - \Pr\{X_t = 1\})(Np - 1)$. Quick examination shows that Ds2, at least qualitatively, captures the effect of direct interactions. The sign of Ds2 is determined by the sign of Np - 1, because E{Xt} ≥ Pr{Xt = 1}. If most mutations are advantageous, then Ds1 < 0, but Np > 1, so that Ds2 > 0, and R(T) is restored toward 1. If most mutations are deleterious, then Ds1 > 0, but Np < 1, so that Ds2 < 0, and R(T) is again restored toward 1. So, qualitatively Ds2 behaves as it should. Nevertheless, Ds2 contains two approximations that may affect its quantitative agreement with simulation.
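The quality of the small-Ns approximation used above can be checked numerically. The sketch below assumes the standard diffusion result for the fixation probability of a new mutant under genic selection with genotype fitnesses 1, 1 + s/2, and 1 + s in a diploid population of size N, starting at frequency 1/(2N). The function names and the parameter values are hypothetical.

```python
import math


def fixation_probability(s, N):
    """Diffusion fixation probability of a new mutant (initial frequency 1/(2N))
    with genotype fitnesses 1, 1 + s/2, 1 + s: (1 - e^-s) / (1 - e^-2Ns)."""
    if s == 0.0:
        return 1.0 / (2.0 * N)
    # math.expm1(x) = e^x - 1, which stays accurate when s is tiny.
    return -math.expm1(-s) / -math.expm1(-2.0 * N * s)


def compare_approximations(s, N):
    """Return (ratio of fixation probabilities at 2s and s, 2N times the value at s)."""
    ratio = fixation_probability(2.0 * s, N) / fixation_probability(s, N)
    simple = 2.0 * N * fixation_probability(s, N)
    return ratio, simple


for Ns in (0.1, 0.5, 2.0):
    ratio, simple = compare_approximations(Ns / 1000, N=1000)
    print(f"Ns = {Ns}: ratio = {ratio:.3f}, 2N * fixation probability = {simple:.3f}")
```

Under this formula the two quantities agree closely when Ns is small and drift apart as selection strengthens, which is one reason quantitative agreement with simulation should not be expected for strong selection.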
Ds2, much like Ds1, contains a term, Pr{Xt = 1}, that is difficult to find analytically. However, this term is easy to obtain from simulation. Hence, Fig 10 was produced using simulation to estimate both Ds1 and Ds2. These simulations behave qualitatively as expected. 1 + Ds1 + Ds2 does a much better job of predicting R(T) than does 1 + Ds1 alone, but there is still much room for improvement. Moreover, because R(T) is so nearly equal to 1.0 for the negative gamma shift, one wonders if a much better approximation than Ds2 is easily available.
Normal shifts are similar in structure to gamma shifts. The fitness of a sequence with a newly arising mutation is its parent's fitness plus a normally distributed random variable with mean 0 and variance σ² (instead of a gamma-distributed random variable). Unlike the gamma shifts, where all sequences with new mutations were either uniformly worse than their parents (negative) or uniformly better (positive), mutants under a normal shift have a 50% chance of having higher fitness than their parents and a 50% chance of having a lower fitness. It is a little difficult to predict a priori that under this model, mutations on average increase in frequency, but this is not altogether surprising. New mutants have a higher than average fitness half the time. Thus, one expects new mutants to increase in frequency roughly half the time. Because there is a lot more "space" above 1/N than there is below it, it is not surprising that mutants on average increase in frequency. In any case, from simulation it is clear that mutants do, in fact, increase in frequency, on average, and as a result Ds1 < 0. Given that Ds1 < 0, one expects that Ds2 > 0 because of direct interactions. It is clear from simulation (Fig 11) that 1 + Ds1 + Ds2 does a reasonable job of predicting R(T), but once again there is still considerable room for improvement, particularly for weak selection.
|
| INFINITE ALLELE MODELS |
|---|
![]()
Gillespie's argument can be understood in the terms presented here as well. From (2) and (5), the index of dispersion can be written as
![]() |
(12) |
Under the assumption that the mutation process has a constant rate, (12) can be written as
![]() |
(13) |
Consider Pr{St = 1|S0 = 1, Mt = 1}. This is the probability that a mutation that enters the population at time t fixes, given that a mutation that entered the population at time 0 also fixes. Suppose the mutation from time 0 first reaches frequency 1 at time t*. Because of the structure of an infinite allele model, at time t* there cannot be any segregating sites at this locus. Thus, Pr{St = 1|S0 = 1, Mt = 1} = 0 for all values of t such that t* - N < t < t*. In a Moran model it takes at least N time steps for a mutation to reach frequency 1; therefore any mutant that enters the population in the N time steps before t* is destined to be lost. For values of t slightly smaller than t* - N, Pr{St = 1|S0 = 1, Mt = 1} will be nearly 0. Thus, for all values of t slightly less than t*, Pr{St = 1|S0 = 1, Mt = 1} - p < 0. This suggests that infinite allele models will generally have an R(T) < 1. Note that this conclusion is a direct consequence of the infinite allele assumption, and it is hard to imagine that this result sheds any additional light on infinite site models.
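The lower bound on fixation time used in this argument can be checked with a minimal neutral Moran simulation. The sketch below is an assumption-laden illustration, not the simulation code used in this article: the function name `moran_fixation_time` is hypothetical, the model is haploid and neutral, and the parent and the dying individual are drawn independently. Because the mutant count changes by at most one per step and starts at a single copy, fixation in this sketch needs at least N - 1 steps, which is the order-N bound the argument relies on.

```python
import random
from typing import Optional


def moran_fixation_time(n: int, rng: random.Random) -> Optional[int]:
    """Follow one new neutral mutant in a haploid Moran model of size n.

    Returns the number of time steps until the mutant fixes,
    or None if the mutant is lost.
    """
    count = 1   # the mutant enters as a single copy
    steps = 0
    while 0 < count < n:
        steps += 1
        parent_is_mutant = rng.random() < count / n   # who reproduces
        dead_is_mutant = rng.random() < count / n     # who is replaced
        count += int(parent_is_mutant) - int(dead_is_mutant)
    return steps if count == n else None


rng = random.Random(1)
n = 10
outcomes = (moran_fixation_time(n, rng) for _ in range(5000))
times = [t for t in outcomes if t is not None]

# Every recorded fixation respects the lower bound.
print(min(times) >= n - 1)
```

A mutant entering fewer than N - 1 steps before the current mutation fixes therefore cannot itself fix in time, which is what forces the conditional fixation probability to zero over that window.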
| A DELETERIOUS MUTATIONS MODEL |
|---|
With an understanding of Ds1 and mutation interactions in hand, it is easy to construct a model that ought to produce large values of R(T). Deleterious sites cause Ds1 > 0, so the model must have mostly deleterious mutants. Direct interactions can negate this effect, so there must be no direct interactions between sites. Any model with these two properties ought to produce an R(T) > 1. The following extremely simple model (![]()
Consider a two-allele model with alleles A1 and A2, where A1 has fitness 1 and A2 has fitness 1 minus a positive selection coefficient. When an A1 allele mutates it becomes A2 with probability 1. When an A2 allele mutates it becomes A1 with probability q, q << 1, and stays A2 with probability 1 - q. A1 should be nearly fixed most of the time, and therefore almost all mutations will be deleterious. Even when A2 is nearly fixed, most mutations are neutral, so that Ds1 should be large.
To finish off the model, assume that q = 0.001 and that the population has an additive diploid structure, so the ith sequence, with fitness wi equal to either 1 or 1 minus the selection coefficient and frequency Xi(t) in generation t, has deterministic frequency change

where

is the marginal fitness of the ith sequence, Ch(t) is the number of distinct sequences segregating in the population, and

is the population mean fitness.
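The mutation scheme of this model (A1 always mutates to A2; A2 mutates back to A1 with probability q and otherwise stays A2) is simple enough to state as code. The following Python sketch covers only the mutation step, not selection or drift; the function name `mutate` and the string labels are hypothetical choices made for illustration.

```python
import random


def mutate(allele: str, q: float, rng: random.Random) -> str:
    """Return the allelic state of a sequence after one mutation event."""
    if allele == "A1":
        return "A2"                                  # A1 -> A2 with probability 1
    if allele == "A2":
        return "A1" if rng.random() < q else "A2"    # back mutation with probability q
    raise ValueError(f"unknown allele: {allele!r}")


rng = random.Random(0)
q = 0.001
trials = 100_000
back_mutations = sum(mutate("A2", q, rng) == "A1" for _ in range(trials))

# The observed back-mutation fraction should be close to q.
print(back_mutations / trials)
```

With q this small, nearly every mutation on an A2 background leaves the sequence in the same fitness class, while every mutation on an A1 background is deleterious, which is the asymmetry the model depends on.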
This model should produce indirect interactions, because whenever polymorphism is unusually high, it is extremely likely that at least one A2 allele is at high frequency, which suggests that population mean fitness is unusually low, and subsequent fixations of other A2 alleles are unusually easy (
(t) > 1). Simulations reflect this intuition (Fig 12). Under a weak mutation approximation, ![]()
The deleterious mutant model behaves exactly as expected. For 2N
= 2, R(T) does not quite reach five before the origination rate falls significantly below the neutral level, but because the leading term in Ds1 is
, elevating 2N
ought to increase R(T). It does (Fig 13). This model can create an R(T) as large as one likes, while still maintaining an origination rate within an order of magnitude of the neutral rate, but only for a narrow range of N
values. R(T) can be further elevated (but only slightly) by making the A2 allele recessive (results not shown).
These results are not crucially dependent on choice of q. As long as q is small the conclusions hold. Fig 14 shows this. As long as q stays below 0.1, R(T) remains quite high. Interestingly, and perhaps not surprisingly, large q versions of this model are the only example in this article of a model without a renewal-like appearance. For q > 0.01, one can show that R(T) is statistically different from what its value would be had the model been a renewal process. As a q = 0 limiting case, a slight variant of this model was considered. In this model, A1 mutates to A2 with probability one, and A2 mutates to A2 with probability one, but when a site fixes, the sequence on which that site arose instantaneously becomes an A1 allele. This model can be thought of as a deleterious shift model, analogous to the gamma shift, but with direct site interactions removed.
| DISCUSSION |
|---|
In the absence of site interactions, deleterious mutations cause R(T) to be >1, and advantageous mutations cause R(T) to be <1. Advantageous mutations are shown in simulation to nearly completely explain all previous models that produced an R(T) < 1 (Figs 1-6). Direct interactions (a sequence's fitness is directly proportional to the number of mutations contained in the sequence) tend to make R(T) closer to 1 than it would otherwise be. Indirect interactions (sites interact through an intermediary, usually population mean fitness) generally have the opposite effect.
For mammalian species, the observed index of dispersion of protein-encoding loci is >>1, and our current best estimate suggests that it is >5 (![]()
![]()
![]()
In all the simulations done here, the long-run behavior of R(T) is reported. The reason for this is that virtually any stationary, orderly (no more than one mutation per time step) model of molecular evolution has the property that R(T) will be an increasing function of time. For Equation 1, it is clear that so long as h(t) converges monotonically to
, the longer one allows the process to evolve, the larger R(T) will be. For every model simulated here, R(T) was an increasing function of T, at least for small T. It is also clear from Equation 1 that if one observes a process for only a single time step, R(T) will be exactly equal to 1.0 -
. Thus, as a population evolves, R(T) will start at ~1 and continue to rise, until some long-term asymptotic value is reached. The length of time it takes to approach this asymptote is crucially dependent on the details of the model, but some intuition is possible.
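The single-time-step claim follows from elementary arithmetic for an orderly process, in which a time step holds either zero or one event. The Python sketch below works this out exactly, writing `rho` (a name assumed here) for the per-step probability of an event; the function names are likewise hypothetical. It also shows that if successive time steps were independent, the ratio would stay at the same value for any number of steps.

```python
from fractions import Fraction


def single_step_dispersion(rho: Fraction) -> Fraction:
    """Variance-to-mean ratio of a 0/1 count with success probability rho."""
    mean = rho
    variance = rho * (1 - rho)
    return variance / mean


def independent_steps_dispersion(rho: Fraction, steps: int) -> Fraction:
    """Variance-to-mean ratio of the total over independent 0/1 time steps."""
    mean = steps * rho
    variance = steps * rho * (1 - rho)
    return variance / mean


rho = Fraction(1, 100)
print(single_step_dispersion(rho))             # 99/100, i.e. 1 - rho
print(independent_steps_dispersion(rho, 50))   # still 99/100
```

Under these assumptions, any growth of R(T) with T has to come from correlations between events at different time steps rather than from the marginal behavior of a single step.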
Consider an attempt to estimate R(T) in a highly simplified situation. Suppose there are X1 substitutions in lineage one and X2 substitutions in lineage two. A simple estimate of R(T) might be (X2 - X1)^2/(X1 + X2). Because the number of substitutions can never be negative, this estimate can never be larger than the maximum of {X1, X2}, that is, the larger of the two substitution counts. So, unless there are at least 5 substitutions in one of the lineages, the estimate of R(T) has to be <5, no matter what the long-run value is. Thus, when the level of divergence is very low, there is no possibility of estimating a large value of R(T).
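The bound on this two-lineage estimate is easy to confirm numerically. The sketch below is a minimal Python illustration; the function name `dispersion_estimate` is hypothetical, and the estimator is the simple one described in the text, not a recommended procedure for real data.

```python
def dispersion_estimate(x1: int, x2: int) -> float:
    """Simple two-lineage estimate (X2 - X1)^2 / (X1 + X2)."""
    if x1 < 0 or x2 < 0:
        raise ValueError("substitution counts cannot be negative")
    total = x1 + x2
    if total == 0:
        raise ValueError("estimate is undefined with no substitutions")
    return (x2 - x1) ** 2 / total


# The estimate never exceeds the larger of the two counts.
for x1 in range(30):
    for x2 in range(30):
        if x1 + x2 > 0:
            assert dispersion_estimate(x1, x2) <= max(x1, x2)

# Largest estimate attainable when both lineages have fewer than 5 substitutions.
largest_low_divergence = max(
    dispersion_estimate(x1, x2)
    for x1 in range(5) for x2 in range(5) if x1 + x2 > 0
)
print(dispersion_estimate(0, 5), largest_low_divergence)   # 5.0 4.0
```

The maximum is reached only when one lineage has no substitutions at all, so realistic low-divergence comparisons will usually fall well below even this ceiling.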
In 12 of the 24 proteins examined by ![]()
In any case, it is quite certain that our best estimate of R(T) for mammals is >5 for both silent and replacement sites. The question remains, What evolutionary processes can generate an index of dispersion so large? This work shows that there are fundamentally only two ways to create an R(T) as large as 5. The first way is to force most mutations to be deleterious (create a large Ds1 and indirect interactions). The second way is to create fluctuations (in mutation rate or probability of fixation or both) that occur on a very slow timescale. There appear to be no other ways to create a large R(T).
Virtually any model of molecular evolution with mostly deleterious mutations and no direct site interactions can explain an R(T) as large as 5, but only with sufficiently high mutation rates and selection coefficients. A very simple model of deleterious alleles is examined and shown to easily explain R(T). This model is shown to be reasonably insensitive to rare advantageous mutations.
Although the neutral theory does a poor job of explaining the index of dispersion, it nevertheless does a passable job of predicting the mean rate of originations in a lineage. We know that the per-site origination rate in many proteins in most organisms is not too different from the per-site, per-generation mutation rate for these proteins (![]()
All the current models that can explain a large R(T) suffer from a parameter sensitivity problem. Models, which create a large R(T) by changing some extrinsic parameter on the same timescale as molecular evolution (![]()
![]()
generations (or slower), where
is the mutation rate. Now, this just may be the way the world is, but explaining it as such really just pushes the logical question back one level. The new question now becomes, How is it that some environmental parameter just happens to change at a rate keyed to the mutation rate?
Other models suffer from similar parameter problems. Takahata's fluctuating neutral space model (![]()
![]()
Deleterious site models can often explain large values of R(T), but only when Ns is roughly in the range 2-10. If Ns > 10, evolution stops, and if Ns << 1, the world is neutral. Again, these models just push the logical question back a level. If Ns = 5 explains the index of dispersion, how is it that organisms whose population sizes N vary over many orders of magnitude just happen to have selection coefficients s that match?
This last question is not entirely rhetorical. The notion that the average selective coefficient of a mutation might evolve has been suggested before (![]()
![]()
Finally, much of population genetics theory over the last 15 years has been devoted to understanding within-population sequence variation. In particular, there has been a great deal of work trying to understand the observed levels of nucleotide polymorphism in genomic regions of low recombination (![]()
![]()
| ACKNOWLEDGMENTS |
|---|
I thank John Gillespie, Hiroshi Akashi, Mark Grote, Michael Turelli, and two anonymous reviewers for much advice and many helpful suggestions.
Manuscript received April 23, 1999; Accepted for publication December 2, 1999.
| LITERATURE CITED |
|---|
ARAKI, H. and H. TACHIDA, 1997 Bottleneck effect on evolutionary rate in the nearly neutral mutation model. Genetics 147:907-914.
BIRKY, C. W. and J. B. WALSH, 1988 Effects of linkage on rates of molecular evolution. Proc. Natl. Acad. Sci. USA 85:6414-6418.
BULMER, M., 1989 Estimating the variability of substitution rates. Genetics 123:615-619.
CHARLESWORTH, B., M. T. MORGAN, and D. CHARLESWORTH, 1993 The effect of deleterious mutations on neutral molecular variation. Genetics 134:1289-1303.
CHERRY, J. L., 1998 Should we expect substitution rate to depend on population size? Genetics 150:911-919.
COX, D. R. and V. ISHAM, 1980 Point Processes. Chapman and Hall, London.
CUTLER, D. J., 2000 The index of dispersion of molecular evolution: slow fluctuations. Theor. Popul. Biol. in press.
EWENS, W. J., 1979 Mathematical Population Genetics. Springer-Verlag, Berlin.
GILLESPIE, J. H., 1978 A general model to account for enzyme variation in natural populations. V. The SAS-CFF model. Theor. Popul. Biol. 14:1-45.
GILLESPIE, J. H., 1984a The molecular clock may be an episodic clock. Proc. Natl. Acad. Sci. USA 81:8009-8013.
GILLESPIE, J. H., 1984b Molecular evolution over the mutational landscape. Evolution 38:1116-1129.
GILLESPIE, J. H., 1986a Natural selection and the molecular clock. Mol. Biol. Evol. 3:138-155.
GILLESPIE, J. H., 1986b Variability of evolutionary rates of DNA. Genetics 113:1077-1091.
GILLESPIE, J. H., 1987 Molecular evolution and the neutral allele theory. Oxf. Surv. Evol. Biol. 4:10-37.
GILLESPIE, J. H., 1989 Lineage effects and the index of dispersion of molecular evolution. Mol. Biol. Evol. 6:636-647.
GILLESPIE, J. H., 1991 The Causes of Molecular Evolution. Oxford University Press, New York.
GILLESPIE, J. H., 1993 Substitution processes in molecular evolution. I. Uniform and clustered substitutions in a haploid model. Genetics 134:971-981.
GILLESPIE, J. H., 1994a Substitution processes in molecular evolution. II. Exchangeable models from population genetics. Evolution 48:1101-1113.
GILLESPIE, J. H., 1994b Substitution processes in molecular evolution. III. Deleterious alleles. Genetics 138:943-952.
GILLESPIE, J. H. and C. H. LANGLEY, 1979 Are evolutionary rates really variable? J. Mol. Evol. 13:27-34.
GOLDMAN, N., 1994 Variance to mean ratio, r(t), for Poisson processes on phylogenetic trees. Mol. Phylogenet. Evol. 3:230-239.
HARTL, D. L., D. D. DYKHUIZEN, and A. M. DEAN, 1985 Limits of adaptation: the evolution of selective neutrality. Genetics 111:655-674.
IWASA, Y., 1993 Overdispersed molecular evolution in constant environments. J. Theor. Biol. 164:373-393.
KAPLAN, N. L., R. R. HUDSON, and C. H. LANGLEY, 1989 The hitchhiking effect revisited. Genetics 123:887-899.
KELLY, F. P., 1979 Reversibility and Stochastic Networks. John Wiley & Sons, Chichester, United Kingdom.
KIMURA, M., 1979 Model of effectively neutral mutations in which selective constraint is incorporated. Proc. Natl. Acad. Sci. USA 76:3440-3444.
KIMURA, M., 1983 The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge.
LANGLEY, C. H. and W. M. FITCH, 1973 The constancy of evolution: a statistical analysis of the α and β haemoglobins, cytochrome c, and fibrinopeptide A, pp. 246-262 in Genetic Structure of Populations, edited by N. E. MORTON. Univ. of Hawaii Press, Honolulu.
LANGLEY, C. H. and W. M. FITCH, 1974 An estimation of the constancy of the rate of molecular evolution. J. Mol. Evol. 3:161-177.
MORAN, N. A., 1996 Accelerated evolution and Muller's rachet in endosymbiotic bacteria. Proc. Natl. Acad. Sci. USA 93:2873-2878.
MORAN, P. A. P., 1958 Random processes in genetics. Proc. Camb. Philos. Soc. 54:60-72.
NACHMAN, M. W., S. N. BOYER, and C. F. AQUADRO, 1994 Nonneutral evolution at the mitochondrial NADH dehydrogenase subunit 3 gene in mice. Proc. Natl. Acad. Sci. USA 91:6364-6368.
NIELSEN, R., 1997 Robustness of the estimator of the index of dispersion for DNA sequences. Mol. Phylogenet. Evol. 7:346-351.
OHTA, T., 1977 Extension of the neutral mutation drift hypothesis, pp. 148-167 in Molecular Evolution and Polymorphism, edited by M. KIMURA. National Institute of Genetics, Mishima, Japan.
OHTA, T., 1995 Synonymous and nonsynonymous substitutions in mammalian genes and the nearly neutral theory. J. Mol. Evol. 40:56-63.
OHTA, T. and J. H. GILLESPIE, 1996 Development of the neutral and nearly neutral theories. Theor. Popul. Biol. 49:128-142.
OHTA, T. and M. KIMURA, 1969 Linkage disequilibrium at steady state determined by random genetic drift and recurrent mutation. Genet. Res. 16:165-177.
OHTA, T. and M. KIMURA, 1971 On the constancy of the evolutionary rate of cistrons. J. Mol. Evol. 1:18-25.
OHTA, T. and H. TACHIDA, 1990 Theoretical study of near neutrality. I. Heterozygosity and rate of mutant substitution. Genetics 126:219-229.
SAWYER, S., 1977 On the past history of an allele now known to have frequency p. J. Appl. Prob. 14:439-450.
TACHIDA, H., 1991 A study on a nearly neutral mutation model in finite populations. Genetics 128:183-192.
TACHIDA, H., 1996 Effects of the shape of distribution of mutant effect in nearly neutral mutation models. J. Genet. 75:33-48.
TAKAHATA, N., 1987 On the overdispersed molecular clock. Genetics 116:169-179.
TAKAHATA, N., 1989 Statistical models of the overdispersed molecular clock. Theor. Popul. Biol. 39:329-344.
TAKAHATA, N., K. ISHII, and H. MATSUDA, 1975 Effect of temporal fluctuation of selection coefficient on gene frequency in a population. Proc. Natl. Acad. Sci. USA 72:4541-4545.
WATTERSON, G. A., 1975 On the number of segregating sites in genetic models without recombination. Theor. Popul. Biol. 7:256-276.
ZENG, L.-W., J. M. COMERON, B. CHEN, and M. KREITMAN, 1998 The molecular clock revisited: the rate of synonymous vs. replacement change in Drosophila. Genetica 102(103):369-382.