Abstract
Rates of molecular evolution at some protein-encoding loci are more irregular than expected under a simple neutral model of molecular evolution. This pattern of excessive irregularity in protein substitutions is often called the “overdispersed molecular clock” and is characterized by an index of dispersion, R(T) > 1. Assuming an infinite-sites, no-recombination model of the gene, R(T) is derived for a general stationary model of molecular evolution. R(T) is shown to be affected by only three things: fluctuations that occur on a very slow time scale, advantageous or deleterious mutations, and interactions between mutations. In the absence of interactions, advantageous mutations are shown to lower R(T); deleterious mutations are shown to raise it. Previously described models for the overdispersed molecular clock are analyzed in terms of this work, as are a few very simple new models. A model of deleterious mutations is shown to be sufficient to explain the observed values of R(T). Our current best estimates of R(T) suggest that either most mutations are deleterious or some key population parameter changes on a very slow time scale. No other interpretations seem plausible. Finally, a comment is made on how R(T) might be used to distinguish selective sweeps from background selection.
THE most simple version of the neutral theory of molecular evolution (Ohta and Kimura 1971; Sawyer 1977; Kelly 1979; Kimura 1983) predicts that the number of mutations that arise in a population in T generations, which ultimately become fixed in the population, will be Poisson distributed with mean uT, where u is the per sequence, per generation mutation rate. Therefore, the variance in the number of substitutions will equal the mean under this most simple neutral model. The ratio of the variance in the number of substitutions to the mean number is called the index of dispersion of molecular evolution. Under the most simple neutral theory, the index of dispersion, R(T), should equal 1.
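As a concrete illustration (ours, not from the articles discussed), the index of dispersion is simply the variance-to-mean ratio of substitution counts across independent lineages. The sketch below uses the unbiased sample variance; the data are hypothetical.

```python
from statistics import mean, variance

def index_of_dispersion(counts):
    """Ratio of the (unbiased) sample variance to the sample mean
    of substitution counts; approximately 1 for Poisson counts."""
    return variance(counts) / mean(counts)

# Substitution counts in independent lineages (hypothetical data).
counts = [3, 5, 7]
print(index_of_dispersion(counts))  # 4 / 5 = 0.8
```

Counts drawn from a Poisson distribution give a ratio near 1; overdispersed counts give a ratio above 1.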
The first article to demonstrate a deviation from a Poisson number of substitutions occurred early in the history of the neutral theory (Ohta and Kimura 1971). Ohta and Kimura examined three proteins in several pairwise comparisons in mammals. They showed that for two of the proteins in a few of the pairs, a Poisson substitution rate could be rejected. This result was hard to interpret, as no explicit phylogenetic hypothesis was made, and the effect of phylogeny went unconsidered.
The first attempts to use a phylogeny in an explicit manner came a few years later (Langley and Fitch 1973, 1974). Langley and Fitch examined four proteins in 18 species. The species were assumed to have a known phylogeny. The numbers of amino acid substitutions along all branches in the phylogeny were inferred. Next, the branch lengths and mutation rates were found by a maximum-likelihood method. Finally, a χ^{2} test (Langley and Fitch 1973) and a likelihood-ratio test (Langley and Fitch 1974) were performed to ask whether the observed numbers of mutations on the branches were statistically different from the expected numbers. The neutral Poisson model was rejected with high confidence. Two basic interpretations of Langley and Fitch have been offered. The first interpretation suggested that the rate of molecular evolution changed over time (Langley and Fitch 1974); i.e., the mean number of substitutions was not constant over time (the substitution process is not stationary). The second interpretation (Gillespie and Langley 1979) was that the mean number of substitutions remained constant over time (the substitution process was stationary), but that the variance in the number of substitutions was larger than would be produced by a Poisson process, i.e., the index of dispersion was larger than one.
In 1983, Kimura attempted to directly test whether or not the index of dispersion truly equaled one (Kimura 1983). Kimura considered four different proteins taken from six mammalian lineages. He assumed these six lineages came from a star phylogeny, and therefore the number of substitutions in each lineage was an independent sample, each with mean uT. He calculated R(T) for each of these four proteins and found that R(T) ranged between 1.7 and 3.3. Although R(T) was bigger than predicted, in only two of the proteins was it significantly larger than one.
In a series of articles, Gillespie (1984a, 1986a,b) extended Kimura's test to nine proteins (both nuclear and mitochondrial), but again assumed a mammalian star phylogeny. Gillespie (1986b) found that R(T) ranged from 0.16 to 35.55, and he had more than enough observations to repeatedly reject the most simple neutral theory. Despite the rather clear rejection of the simple neutral model, this analysis was less than completely convincing for several reasons. First, a star phylogeny was assumed. If this assumption were false, individual lineages would have different T's, and the variance would be artificially inflated (Gillespie 1989). Second, under the neutral theory, the substitution rate should only be constant per generation, so different length generations in different lineages should artificially inflate R(T) (Gillespie 1989). Third, an overall increase or decrease in the mutation rate in only some of the lineages (perhaps due to a systemic change in metabolic rate or in DNA repair machinery, etc.) would lead to an artificial inflation of R(T) (Gillespie 1989). Fourth, in certain circumstances use of a correction formula to estimate divergence distances could artificially inflate R(T) (Bulmer 1989; Gillespie 1989). Gillespie solved the first three problems, collectively known as lineage effects, in 1989.
Gillespie's solution to lineage effects was to (1) restrict his analysis to three species at a time, thereby guaranteeing a single unrooted phylogeny, and (2) weight the number of substitutions in each lineage by one over the mean number for that lineage, where the mean is taken over all loci examined. This weighting process amounts to regressing out lineage effects from the data. Using these weightings, Gillespie showed that for replacement substitutions in 20 loci, R(T) ranged from 0.13 to 43.82 with a mean of 6.95 (Gillespie 1991, p. 119). Silent sites at these same loci had an average R(T) of 4.64. Gillespie concluded that R(T) was clearly statistically significant for replacement sites, but was perhaps only marginally significant for silent sites, due to the bias introduced by use of correction formulas.
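The weighting procedure can be sketched as follows. This is an illustrative reconstruction, not Gillespie's actual computation: each lineage's counts are divided by that lineage's mean count across loci (rescaled so the overall mean is preserved), which removes lineage-wide rate differences before R(T) is computed per locus.

```python
from statistics import mean, variance

def lineage_weighted_R(counts):
    """counts[i][j]: substitutions at locus i in lineage j.
    Each lineage is rescaled by the inverse of its mean count over
    loci (a sketch of Gillespie's weighting), then R(T) = var/mean
    is computed per locus from the rescaled counts."""
    n_lineages = len(counts[0])
    lineage_means = [mean(row[j] for row in counts) for j in range(n_lineages)]
    overall = mean(lineage_means)
    weights = [overall / m for m in lineage_means]
    R = []
    for row in counts:
        w = [c * wt for c, wt in zip(row, weights)]
        R.append(variance(w) / mean(w))
    return R

# Two loci, two lineages; the second lineage evolves twice as fast
# at every locus -- a pure lineage effect (hypothetical numbers).
print(lineage_weighted_R([[2, 4], [4, 8]]))  # [0.0, 0.0]
```

When rate variation is purely a lineage-wide scaling, as in this toy example, the weighting removes it entirely and each locus shows no residual dispersion.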
Goldman (1994) quantified the extent of the error lineage effects might have introduced in the early estimates of R(T). He further noted that Gillespie's simulations of his weighting factor solution had not been as extensive as they could have been. Nielsen more than made up for this lack of simulations (Nielsen 1997) and further showed that the fourth problem due to correction formulas was not very large as long as the sequences were not too close to saturation.
By 1995 enough data had been gathered to examine 49 mammalian loci. Using Gillespie's weighting factor, Ohta (1995) showed that, averaged over all these loci, R(T) > 5 for both silent and replacement sites. These values were more than large enough to reject simple neutrality. So taken as a whole, the evidence from mammals is compelling. The observed index of dispersion is larger than one for both silent and replacement sites. This “overdispersion” is not due to lineage effects and is not an artifact of correction formulas. The conclusion is inescapable. For mammals, the most simple neutral theory of molecular evolution does not explain protein divergence data.
A recent study of Drosophila (Zeng et al. 1998) has cast some doubt on whether conclusions drawn from mammalian data should necessarily be applied to all life. Zeng and colleagues examined 24 proteins from three species of Drosophila, Drosophila pseudoobscura, D. subobscura, and D. melanogaster. D. pseudoobscura and D. subobscura are relatively closely related, with D. melanogaster a more distantly related outgroup. They found that, using Gillespie's weighting factor, averaged over these 24 loci, R(T) was 4.37 for silent sites but only 1.64 for replacement sites. The value obtained for silent sites was qualitatively in agreement with Ohta's value for mammals, but the replacement values were much smaller and not statistically different from 1. Unfortunately, interpretation of the replacement site results is confounded by extremely low replacement divergence between D. pseudoobscura and D. subobscura (see discussion below).
It now seems clear that mammalian loci are, on average, overdispersed at both silent and replacement sites. In Drosophila, it is likely that silent sites are overdispersed, but replacement sites might not be. In any case, because the most simple neutral theory can never produce an R(T) > 1, it is of interest to know which models of molecular evolution can produce a large index of dispersion. Several models have been suggested. They include episodic selection on a mutational landscape (Gillespie 1984a,b, 1991), the fluctuating neutral space model (Takahata 1987, 1989), and the house of cards (HOC) model of slightly deleterious mutations (Ohta and Tachida 1990; Tachida 1991; Iwasa 1993; Gillespie 1994b; Tachida 1996; Araki and Tachida 1997). In addition to these particular models, there is an extensive set of simulations by Gillespie (1993, 1994a,b), which tried to characterize R(T) for numerous models. With these simulations, Gillespie showed that fluctuating selection could account for a high index of dispersion, but only if the fluctuations occurred very slowly, roughly at the same rate fixations happened (Gillespie 1993). In addition, he found that symmetric underdominance (Gillespie 1994a), optimizing selection, and the house of cards model (Gillespie 1994b) could all produce R(T) > 1, but only in a very narrow range of parameter values. He found that exponential and gamma shift models of deleterious sites produced R(T) ≈ 1 (Gillespie 1994b) (Table 1).
In other simulations, Gillespie (1994a) showed that rapidly fluctuating selection not only failed to explain a large index of dispersion, but actually produced an R(T) < 1. This result was particularly surprising. Put only a little bit facetiously, Gillespie took a neutral model, added some random fluctuations to it, and derived a process that was less random than the one he started with. Other models also produced R(T) < 1, including symmetric overdominance and normal shift models. Gillespie attempted to develop some insight concerning how a model might produce an R(T) < 1 (Gillespie 1993), but his insight was built by considering an infinite allele, not an infinite site model. It is argued below that infinite site models behave substantially differently from infinite allele models, and his result is only applicable to the latter. No mechanism by which an infinite site model could ever produce an R(T) < 1 has yet been suggested.
The goals of this article are threefold: first, to describe the mathematical machinery necessary to analyze the index of dispersion for an infinite site model of the gene; second, using this machinery, to describe which models will produce R(T) > 1, which will produce R(T) < 1, and which will produce R(T) ≈ 1; and third, given the observation that R(T) appears to be >5 for mammalian data, to discover what can be inferred about the nature of mammalian evolution.
CALCULATION OF R(T)
A substitution is a mutation that ultimately fixes in the population. There are two different processes that might be called the substitution process. One process, the origination process (Gillespie 1994a), is the point process of the times of entry of those mutations that ultimately fix in the population. The other process, the fixation process (Gillespie 1994a), is the point process of the times when mutations, which ultimately fix, first reach frequency one. This article is concerned only with the origination process.
To derive the index of dispersion of the origination process, begin by assuming that the gene contains an infinite number of sites and that there is no recombination between those sites (Watterson 1975). Assume that time is discrete and that population size is constant and equal to N < ∞ haploid individuals. Let the population reproduce according to a discrete time Moran model (Moran 1958). The mutation process and site frequency dynamics are assumed to be stationary, so that translations of the time axis do not affect the origination process. Let M_{t} equal one if there is a mutation at time t, and equal zero otherwise. Let S_{t} equal one if there is an origination at t, and zero otherwise. It can be shown that the ratio of the variance in the number of originations to the mean number of originations in T time steps, R(T), is given by
If h(t) converges to ρ sufficiently quickly, so that
A few comments concerning the conditional intensity function should be made. It is defined to be the product of three terms: the probability that there is a mutation at time 0, given a mutation at time t, Pr{M_{0} = 1 | M_{t} = 1}; the expected frequency of a mutant t time units after it entered the population, E{X_{t}}; and the amount of interaction between mutants separated by t time steps,
SLOWLY CHANGING ENVIRONMENT
Virtually any model containing a key parameter that changes on a sufficiently slow time scale can explain the observed index of dispersion. Other work has shown (Cutler 2000) that if either the mutation rate or the probability of fixation changes on a time scale as slow as, or slower than, the average time between fixations of sites, then the index of dispersion can be elevated significantly above one. Gillespie's (1993) simulations confirm that models of a slowly fluctuating environment can produce large R(T)'s regardless of the details of the model.
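The power of slow fluctuations can be made concrete with a standard doubly stochastic Poisson calculation (a textbook simplification, not taken from the works cited): suppose the substitution rate u is random but stays effectively constant within each observation window of length T, so that the count N satisfies N | u ~ Poisson(uT). The law of total variance then gives

```latex
E\{N\} = E\{u\}\,T, \qquad
\mathrm{Var}\{N\} = E\{u\}\,T + \mathrm{Var}\{u\}\,T^{2},
\qquad\text{so}\qquad
R(T) = 1 + \frac{\mathrm{Var}\{u\}}{E\{u\}}\,T .
```

Thus R(T) grows with T for as long as the rate fluctuations persist across whole windows, whereas fluctuations much faster than T average out within a window and drive R(T) back toward one, consistent with Gillespie's (1993) simulation results.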
Despite the fact that slowly changing parameters can cause R(T) to be large, simply invoking slow change appears to be an incomplete explanation of R(T). If one assumes that the time between substitutions is measured in millions of years, one must also assume that the environment changes on the time scale of millions of years. At first glance, it is not obvious that any environmental process has this property. Environmental processes that are often considered slow (for instance glaciation) are usually orders of magnitude faster than would be required here. To fully explain R(T), a mechanism would need to be suggested that could cause a key parameter to change so slowly. Without such a mechanism, a slowly changing environment appears a somewhat hollow explanation.
Takahata's fluctuating neutral space (FNS) model (Takahata 1987, 1989) could provide a possible mechanism. At the heart of the FNS model is the notion that each mutation changes the subsequent mutation rate for a given piece of DNA. This process is difficult to model exactly [see Cutler (2000) for an attempt], so it is often approximated by a model where mutation rate changes with each substitution (Takahata 1987, 1989; Cutler 2000). Whether this change occurs at the moment a mutant first reaches frequency one (i.e., corresponding to events in the fixation process), or at the moment a mutant destined to fix first enters the population (i.e., corresponding to events in the origination process) is often left obscured by the coarseness of the approximations used in the analysis (Takahata 1987; Cutler 2000). Regardless of the modeling details, the FNS model has the property that the mutation rate must change on the same time scale as molecular evolution. Therefore, the FNS model is capable of generating large R(T) values. Several results on the FNS model have been obtained.
First, for the FNS model to generate large values of R(T), there must be more than two possible mutation rates (Cutler 2000). Second, when new mutation rates are picked independently of previous rates, R(T) = 5 implies that sequences that differ by only a single site will have mutation rates that differ by an order of magnitude 2–5% of the time (depending on the details of the distribution of mutation rates; Cutler 2000). Processes where new mutation rates are not independent of previous rates are difficult to analyze (Takahata 1989), but can produce large values of R(T), if the process has a sufficient amount of time to evolve (Takahata 1989).
UNDERSTANDING MODELS WITH SELECTION
Many models make the assumption that the mutation process has a constant rate. If ν(t) = ν, then
The expected frequency of a neutral mutation does not change over time; E{X_{t_{1}}} = E{X_{t_{2}}} = p for all t_{1} and t_{2}. Nonneutral mutations do not necessarily have this property. D_{s1} measures the effect a changing expected frequency has on the index of dispersion. A simple rule of thumb results. In the absence of site interactions, deleterious mutations cause R(T) > 1, and advantageous mutations cause R(T) < 1. The magnitude of the effect can be made quite large.
The overall sign of D_{s1} is obviously determined by the sign of E{X_{t}} − p. If the expected frequency of mutations does not change over time, then D_{s1} = 0. If the expected frequency of sites monotonically declines over time, then D_{s1} > 0 [because E{X_{t}} ≥ E{X_{∞}} = p, so E{X_{t}} − p ≥ 0]. Conversely, if the expected frequency monotonically increases, then D_{s1} < 0. An interesting unsolved problem is to describe which models of molecular evolution have the property that the expected frequency of mutations is monotonic over time. A natural conjecture (and one that is consistent with the simulations performed here) is that any stationary model has this property.
Apart from a simple one-locus, two-allele Fisher-Wright world, there is some difficulty defining what is meant by a deleterious/advantageous mutation. For the purposes of this article, a particular mutation will be said to be deleterious/advantageous if its expected frequency decreases/increases over time. A model will be said to be a deleterious/advantageous mutation model if, averaged over all possible mutants, the expected frequency of mutants decreases/increases. Note that the definition of deleterious/advantageous mutation model is a property of the mutations, not the originations. Thus, a model will be called a deleterious site model if the majority of mutations decline in frequency, but this statement implies nothing at all about the fitness of the sites that actually fix. It is a statement about the average properties of mutants, not a statement on the properties of those rare mutants who eventually fix.
Thus, we arrive at the conclusion that, in the absence of site interactions, deleterious mutation models have an R(T) > 1, and advantageous mutation models have an R(T) < 1. Although this result is clear as stated, the intuition concerning why it's true may be less obvious.
Mutations do not necessarily fix one at a time. D_{s1} can be thought of as measuring the effect the size and frequency of multiple fixations has on the index of dispersion. To see this, write D_{s1} as:
In a neutral model, the expected frequency of a site does not change over time. So, for a neutral model E{X_{t}} = p for all t. Thus, the second sum in Equation 9 is what the expected number of mutations on i_{0} would be, if this were a neutral model with probability of fixation p. Therefore, D_{s1}/2 is equal to the expected number of mutations on i_{0} minus the expected number of mutations under a neutral model. For a deleterious site model E{X_{t}} > p, so that the first sum is bigger than the second, and, on average, there are “too many” mutations on i_{0}, relative to a neutral model with the same probability of fixation. Conversely, in an advantageous model E{X_{t}} < p, so that there are “too few” mutations on i_{0} relative to a neutral model with the same probability of fixation.
Finding D_{s1} directly for any particular model is not trivial. Other than for the neutral case, it is not obvious that E{X_{t}} is ever easy to calculate. For models that may be approximated with a diffusion, finding E{X_{t}} amounts to solving a Kolmogorov backward equation. If a two-allele model is an adequate approximation, the problem can also be formulated as an ordinary differential equation (Ohta and Kimura 1969), but truncation of higher-order moments is often necessary. Despite these difficulties, D_{s1} can be directly measured in a simulation.
To estimate D_{s1} within a simulation, a single extra vector, call it DS[0 … R], needs to be stored, where R is a number sufficiently large so that all sites are extremely likely to be fixed or lost within R generations (R = 1000N was used in the simulations for this article). Initialize the DS vector to 0. During the simulation, track the frequency of each site in all generations. For each mutation add its frequency t generations after it entered the population to the value stored in DS[t]. When the simulation is done, divide each element of DS by the total number of mutations. Estimate D_{s1} by
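The procedure above can be written out as a short function. The sketch below is our own illustration (the names are hypothetical), using the identity suggested by the surrounding text, D_{s1} = 2 Σ_{t} (E{X_{t}} − p): each mutation contributes its frequency trajectory at lags 1, …, R after entry, with lost mutations contributing 0.0 and fixed mutations contributing 1.0 once absorbed.

```python
def estimate_Ds1(trajectories, p):
    """Estimate D_s1 = 2 * sum_t (E{X_t} - p).

    trajectories[m][t]: frequency of mutation m at lag t+1 after it
    entered the population; lost mutations contribute 0.0 and fixed
    ones 1.0 once absorbed.  p is the probability of fixation.  All
    trajectories are assumed padded to the same length R.
    """
    n = len(trajectories)
    R = len(trajectories[0])
    # DS[t] estimates E{X_t}: the mean frequency over mutations at lag t.
    DS = [sum(traj[t] for traj in trajectories) / n for t in range(R)]
    return 2.0 * sum(x - p for x in DS)

# Toy check: if every mutation either fixes or is lost immediately and
# exactly a fraction p fix, then E{X_t} = p at every lag and D_s1 = 0.
neutral_like = [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]
print(estimate_Ds1(neutral_like, p=0.5))  # 0.0

# "Deleterious-like": mutations linger at elevated frequency before
# loss, so E{X_t} > p at early lags and the estimate is positive.
lingering = [[0.8, 0.4, 0.0], [0.6, 0.2, 0.0]]
print(estimate_Ds1(lingering, p=0.1))
```

In a real simulation the trajectories would come from tracking every site's frequency each generation, exactly as described above, with R chosen large enough that essentially all sites are absorbed.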
Finally, one might ask if there is any general intuition on the effects of the overall mutation rate and overall strength of selection on D_{s1}. As is obvious from Equation 8, D_{s1} is independent of time. Also, D_{s1} appears to be a linear function of the mutation rate. For small mutation rates, this may be roughly true, but for large 2ν the linear dependence must disappear. The reason for this is that E{X_{t}} and p must also be functions of 2ν, because 2ν affects, among other things, the overall heterozygosity of the population and its mean fitness. As a result, D_{s1} is unlikely to depend on 2ν in a simple linear manner.
By analogy to a classical Fisher-Wright model, one can imagine changing the strength of selection. This can have two effects on D_{s1}. First, it can change the probability of fixation, p, thereby making p closer to/further from the initial frequency of a new mutant (1/N), thereby decreasing/increasing D_{s1}. Second, when the strength of selection changes, the time it takes for the expected frequency to reach p will also change. Increasing selection decreases the time, so that D_{s1} decreases. Decreasing selection increases the time, so that D_{s1} increases. Predicting the net effect is difficult. In all the simulations, increasing selection usually increased D_{s1}, and never significantly decreased it, but for very strong selection, D_{s1} generally appeared to approach some asymptote.
INTERACTION BETWEEN SITES
Consider a mutant that enters the population at the current time step, t. Call this mutation j_{t}. The probability that j_{t} ultimately fixes is p. When j_{t} entered the population, it arose on some piece of DNA. The piece of DNA might contain other mutations. Pr{S_{t} = 1 | j_{t} on i_{0}} is the probability that j_{t} fixes, given that the piece of DNA on which it arose contains another mutant, i_{0}, which entered the population at time zero. If knowing that the piece of DNA contains an earlier mutation does not affect j_{t}'s chance of fixation, then Pr{S_{t} = 1 | j_{t} on i_{0}} will equal p, and
There are at least two fundamental ways in which mutants can interact. We call these two ways direct and indirect interactions. For many models of natural selection, the fitness of a piece of DNA is a function of the number of mutations that it contains. For instance, in the negative gamma shift model (described below), when a new mutation enters the population, the fitness of the piece of DNA on which it arose is equal to its fitness prior to the mutation, minus a gamma-distributed random variable. When the fitness of a piece of DNA is a function of the number of mutations contained on the piece of DNA, we say that mutations directly interact with one another.
For other models of evolution, the fitness of a piece of DNA containing a new mutation is independent of the number of previous mutations. In the house of cards model, the fitness of a piece of DNA containing a new mutation is drawn independently from some fixed [often Gaussian (Gillespie 1994b; Tachida 1996)] distribution. Thus, the fitness of a piece of DNA containing a new mutation is independent of the number of earlier mutations it contains. In this case, we say there is no direct interaction between mutations. Nevertheless, knowledge that a piece of DNA contains an earlier mutation can still indirectly affect the mutation's chance of fixation.
If one knows that j_{t} arose on a piece of DNA containing i_{0}, one has some information about the state of the population. In particular, one suspects that i_{0}'s expected frequency, conditional on j_{t} arising on i_{0}, is higher than its unconditional expected frequency. The knowledge that i_{0} is expected to be at higher frequency may, in turn, suggest something about the expected mean fitness of the population. The expected mean fitness of the population may, in turn, suggest something about the probability that j_{t} will fix. In general, when i_{0} affects j_{t}'s probability of fixation through one or more intermediaries (like population mean fitness), we say that i_{0} and j_{t} indirectly interact. Virtually all nonneutral models should have some form of indirect interactions, although we suspect that models that produce relatively constant population mean fitnesses might have relatively negligible indirect interactions.
A simple rule of thumb can be applied to site interactions. In general, direct interactions tend to move R(T) toward one; indirect interactions tend to move R(T) away from one. This can be seen by considering a few simplified cases.
Consider an advantageous mutation model where the fitness of a piece of DNA with k mutations is equal to 1 + ks, s > 0. This is, by definition, a model of direct interactions. If interactions were absent, R(T) would be <1. Direct calculation of
The converse is true for the deleterious model with direct interactions. If the fitness of a sequence with k mutations is 1 − ks, s > 0, then E{X_{t}} > p, and in the absence of interactions, R(T) would be >1. But, because each additional mutation lowers the fitness of a piece of DNA, Pr{S_{t} = 1 | j_{t} on i_{0}} must be less than p, and τ must be <1. Thus, this form of direct interaction must move R(T) toward 1.
Indirect interactions often have the opposite effect. Consider a mutation j_{t}, which enters the population at time t, on a piece of DNA containing an earlier mutation i_{0}. Conditional on j_{t} landing on a piece of DNA containing i_{0}, i_{0}'s expected frequency is likely to be higher than its unconditional expected frequency. If this is a deleterious mutation model, i_{0}'s higher conditional expected frequency suggests that the conditional expected population mean fitness is likely to be lower than the unconditional expectation. Given that j_{t} arose at a time when the conditional population mean fitness is expected to be lower than the unconditional expected fitness, j_{t}'s probability of fixation is likely to be higher, thereby making
EXCHANGEABLE ALLELES
Gillespie (1994a) performed an extensive set of simulations of exchangeable allele models. His results can be summarized as follows: symmetrical overdominance, TIM (Takahata, Ishii, and Matsuda model; Takahata et al. 1975), and SASCFF (Gillespie 1978) all produced R(T) < 1, and symmetrical underdominance produced R(T) > 1. These results should be expected.
Symmetrical over/underdominance is characterized by individuals who are homozygous for all sites of the locus having fitness 1. Individuals who are heterozygous for even a single site have fitness 1 + s, where s is fixed and greater than zero for overdominance and less than zero for underdominance. The mutational model is assumed to be Poisson, with constant rate ν.
If mutation interactions were absent, the overdominance model would produce an R(T) < 1, and underdominance would lead to R(T) > 1. When a mutation enters the population, the piece of DNA on which it arose will be in heterozygotes for at least its first few generations. Therefore, this piece of DNA will have higher than average fitness during this time, and E{X_{t}} will be an increasing function for this time. Similarly, a new underdominant mutant will have a lower than average fitness, and E{X_{t}} will be a decreasing function at first. Whether this pattern continues (overdominance mutants increase in expected frequency; underdominance mutants decline) for the entire time a mutant segregates in the population remains an open analytical question. It is clear from Gillespie's (1994a) simulations, and the ones done here, that E{X_{∞}} > 1 / N for overdominance models, and E{X_{∞}} < 1 / N for underdominance models. Thus, one suspects that this pattern of expected frequency change may hold the entire time a mutant segregates. It is certain that, for the parameter values examined in this study, E{X_{t}} ≤ E{X_{t+1}} held in every simulation of the overdominance models, and E{X_{t}} ≥ E{X_{t+1}} in every simulation of the underdominance ones.
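The initial rise in expected frequency under heterozygote advantage can be illustrated with the deterministic one-locus diploid recursion (a textbook calculation, not part of Gillespie's simulations; the function name is ours): with both homozygotes at fitness 1 and heterozygotes at 1 + s, a rare allele's marginal fitness exceeds the population mean, so its frequency climbs toward the interior equilibrium at 1/2.

```python
def next_freq(p, s):
    """One generation of deterministic selection with symmetric
    overdominance: homozygote fitness 1, heterozygote fitness 1 + s."""
    q = 1.0 - p
    w_mut = 1.0 + q * s             # marginal fitness of the mutant allele
    w_bar = 1.0 + 2.0 * p * q * s   # population mean fitness
    return p * w_mut / w_bar

p, s = 0.01, 0.1
for _ in range(200):
    p = next_freq(p, s)
print(round(p, 3))  # 0.5, the stable interior equilibrium
```

Running the same recursion with s < 0 (underdominance) pushes a rare allele's frequency down instead, matching the opposite pattern of expected-frequency change described above.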
There is no direct interaction between sites in the over/underdominance model, because each new mutation makes the piece of DNA distinguishable from all other alleles, regardless of the number of previous mutations. Because our intuition suggests that indirect interactions will often be accomplished through changes in population mean fitness, there are likely to be only small amounts of indirect interaction in the overdominance model, because Gillespie has shown that the homozygosity, and as a result mean fitness, changes very little over time. There is likely to be a great deal more indirect interaction in the underdominance model, because this model does not maintain polymorphism, and there are significant changes in mean fitness as sites go to fixation. Indirect interactions should reinforce the effects of advantageous mutants, making R(T) < 1 + D_{s1} for the overdominance model (but only slightly, because strong interactions are unlikely), and making R(T) > 1 + D_{s1} for the underdominance model.
Even though direct calculation of R(T) is difficult for the over/underdominance model, D_{s1} can be estimated from simulation. The basic simulation procedure is described in Gillespie (1994a), and D_{s1} is estimated from the simulation as described above. Several conclusions result. First, 1 + D_{s1} does an extraordinarily accurate job of predicting R(T) for the overdominance model, suggesting that indirect interactions are, in fact, very small. Second, 1 + D_{s1} correctly predicts that the underdominance model has R(T) > 1, but underestimates the magnitude of R(T); this underestimate is in the expected direction, given that indirect interactions should exist (Figure 1).
The underdominance model can probably account for R(T) > 5, but this is difficult to show in simulation, because the origination rate goes to zero very rapidly as Ns gets below −4. Interpolating from the graph, there appears to be a narrow range of Ns, perhaps −8 < Ns < −4, with a large, but not astronomical, R(T) that could account for the observed values, but with an overall rate of evolution that is much lower than the neutral rate.
TIM and SASCFF are both models of a rapidly fluctuating environment, and understanding their behavior requires slightly more conjecture. There are no direct interactions between sites in either of these models, but the magnitude of indirect interactions is difficult to predict. Gillespie has shown in simulation that E{X_{∞}} > E{X_{0}}, which is consistent with expected site frequencies increasing over time. Gillespie also found that R(T) < 1 for all these simulations, which is also consistent with expected site frequencies increasing over time. Nevertheless, actually demonstrating that expected site frequencies increase over time is a formidable problem. One is, however, fairly convinced of this by comparing simulated R(T) with 1 + D_{s1}, as estimated in these simulations. Simulation details can be found in Gillespie (1994a). Results are similar to the over/underdominance case. For the TIM model, which will not maintain polymorphism in an infinite population and is therefore more likely to experience significant mean fitness fluctuations and, as a result, more indirect interactions, 1 + D_{s1} does a qualitatively good job of predicting R(T), but there is some room for quantitative improvement. For the SASCFF models with a balancing component to selection (B > 1; note that this also implies a stationary frequency distribution for the finite allele diffusion and is therefore less likely to have a significant indirect interaction component), 1 + D_{s1} is an extremely accurate predictor of R(T) (Figures 2, 3, 4, 5, and 6).
HOUSE OF CARDS
The house of cards (HOC) model of molecular evolution is the most thoroughly analyzed (Ohta and Tachida 1990; Tachida 1991, 1996; Iwasa 1993; Gillespie 1994b; Araki and Tachida 1997) and widely accepted (Nachman et al. 1994; Moran 1996; Ohta and Gillespie 1996) model capable of accounting for a large index of dispersion. The HOC model achieves a high index of dispersion through deleterious sites and an enormous indirect interaction component. There is no direct interaction in this model.
In its most common form, the HOC model assumes that the fitness of any mutation is picked from a normal distribution with mean 0 and variance σ^{2}. Under a small mutation rate assumption, it has been shown that the fitnesses of the most recently fixed sites can be thought of as a Markov process with a stationary distribution that is approximately Gaussian with mean 2Nσ^{2} and variance σ^{2} (Gillespie 1994b; Tachida 1996). Because the fitness of the most recently fixed site has mean 2Nσ^{2}, and new mutations have mean fitness 0, we can think of the relative fitness of new mutations as a normally distributed random variable with mean −2Nσ^{2}. Therefore, the vast majority of mutations have to be deleterious, so D_{s1} > 0, and in the absence of interaction, R(T) > 1.
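The weak-mutation jump chain described above is easy to sketch in code. The following Python snippet is a rough illustration, not the article's simulation: the parameter values are arbitrary, and the fixation probability used is the standard semidominant diffusion formula, which may differ in detail from the expressions used in the work cited. It follows the fitness of the most recently fixed allele and checks that its long-run mean sits well above the zero mean of new mutations, so that the typical new mutation is deleterious relative to the fixed background.

```python
import math
import random

# Hedged sketch of the house-of-cards fixed-background process under weak
# mutation.  N and SIGMA are illustrative values, not estimates.
N = 100          # population size
SIGMA = 0.01     # s.d. of the absolute fitness assigned to a new mutation

def fix_prob(s):
    """Standard semidominant diffusion fixation probability for a new mutant
    with selective advantage s (reduces to the neutral 1/(2N) as s -> 0)."""
    if abs(s) < 1e-12:
        return 1.0 / (2 * N)
    return -math.expm1(-s) / -math.expm1(-2 * N * s)

rng = random.Random(1)
s_fixed = 0.0            # fitness of the currently fixed allele
fixed = []               # fitnesses of successive fixed alleles
for _ in range(500_000):
    s_new = rng.gauss(0.0, SIGMA)   # house of cards: independent of the parent
    if rng.random() < fix_prob(s_new - s_fixed):
        s_fixed = s_new
        fixed.append(s_fixed)

mean_fixed = sum(fixed) / len(fixed)
# The text predicts a stationary mean near 2*N*SIGMA**2 (= 0.02 here), well
# above the zero mean of new mutations.
print(len(fixed), round(mean_fixed, 3))
```

The chain mixes quickly, so even a modest number of fixations shows the fixed background settling above zero, which is the sense in which "the vast majority of mutations have to be deleterious."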
In fact, indirect interactions can lead R(T) to be vastly larger than one. To a first approximation, the mean fitness of the population is equal to 1 + s*, where s* is the selection coefficient of the most recently fixed site. Therefore, the population mean fitness must fluctuate, and these fluctuations must occur slowly (in fact, on the exact same time scale as molecular evolution). Because mean fitness fluctuates, indirect interactions are expected. Because mean fitness fluctuates on the same time scale as molecular evolution, the indirect interaction component must be large. Putting together deleterious sites with large indirect interactions leads to the prediction that R(T) > 1, and perhaps much greater (Figure 7). As expected, 1 + D_{s1} > 1, but not much greater. 1 + D_{s1} never rises above 1.5, despite R(T) growing to nearly 500. The indirect interaction component is enormous, though.
The house of cards model can account for an index of dispersion >5, but only when 0.5 < Nσ < 2. This is a remarkable parameter sensitivity. For Nσ < 0.5, the house of cards is essentially a neutral model. For Nσ = 2, the index of dispersion is well into the hundreds. It is difficult to simulate Nσ > 3, because the origination rate is so low.
OPTIMUM MODEL
The optimum model is a simple model of purifying selection. All mutations are assigned a phenotype drawn from a zero mean, unit variance normal distribution. The fitness function is quadratic with a maximum at zero. A single parameter σ measures the width of the fitness function [see Gillespie (1994b) for further details on this model]. It is obvious that most mutations are deleterious; therefore, in the absence of mutation interaction, one expects R(T) > 1 and D_{s1} > 0. It is also clear that, like the HOC model, there are no direct interactions, but there are indirect interactions caused by the mean fitness of the population changing with each fixation (Figure 8). One sees roughly equal contributions from D_{s1} and indirect interactions. Also, note that R(T) grows very slowly with increasing selection. As selection increased by a factor of 2 from σ = 0.05 to σ = 0.1, R(T) rose by only 1%. Over a wide range of parameter values (results not shown), the optimum model always has difficulty producing an R(T) as large as 5.
SHIFT MODELS
Shift models are a perfect example of why performing simulations in the absence of theory can lead to entirely uninterpretable results. At first glance, the shift story looks simple. Gamma and exponential shifts yield R(T) ≈ 1; normal shifts lead to R(T) < 1 (Gillespie 1994b). Without theory, one might be tempted to say that the gamma and exponential shift models are “neutral”-like, and therefore might be easy to analyze. Nothing could be further from the truth. Shift models are quite complicated and provide the clearest example of how direct interactions move R(T) toward 1.
The gamma and exponential shift models can be further subdivided into positive and negative shifts. In the negative-shift models (the only kind that has received significant theoretical attention; Ohta 1977; Kimura 1979; Gillespie 1987, 1994b) the fitness of a new mutation is equal to the fitness of its parent's sequence, minus a gamma- or exponentially distributed random variable (a gamma-distributed variable gives the gamma shift; an exponentially distributed one, the exponential shift). In the positive-shift models, the random variable is added to, rather than subtracted from, the fitness.
Qualitatively analyzing D_{s1}, in the absence of site interactions, for gamma or exponential shifts is easy. For negative shifts, each new mutation has, on average, a fitness lower than the mean fitness of the population; hence mutations are on average deleterious and D_{s1} > 0. Similarly, under positive shifts mutations are on average advantageous, so D_{s1} < 0. Direct interactions qualitatively change this picture.
Shift models fundamentally differ in their mode of site interaction from all other models of evolution that we have so far considered. In all other models, the fitness of a sequence is essentially independent of the number of mutations it contains. In shift models, the fitness of a piece of DNA is directly proportional to the number of mutations it contains. Thus, sites directly interact with one another, so R(T) should be closer to one.
One can attempt to crudely estimate this effect. Consider,
In the first case j_{t} arises on a piece of DNA containing i_{0}, and i_{0} is still segregating in the population. In the second case j_{t} arises only after i_{0} has been fixed (because j_{t} is on i_{0}, i_{0} cannot have been lost before time t). Therefore,
Consider a simple two-allele diffusion, where the fitnesses of the genotypes A_{1}A_{1}, A_{1}A_{2}, and A_{2}A_{2} are 1, 1 + s/2, and 1 + s, respectively. The probability of fixation of a new mutant A_{2} is given by Ewens (1979, p. 147):
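For concreteness, one standard form of this fixation probability under the diffusion approximation is π(s) = (1 − e^{−s})/(1 − e^{−2Ns}), which reduces to the neutral value 1/(2N) as s → 0; the exact parameterization in Ewens (1979) may differ in detail. A quick numerical check:

```python
import math

# One standard diffusion-approximation fixation probability for a new
# semidominant mutant (genotype fitnesses 1, 1 + s/2, 1 + s); this is an
# illustrative form, not necessarily the article's exact expression:
#     pi(s) = (1 - exp(-s)) / (1 - exp(-2*N*s)),  pi -> 1/(2N) as s -> 0.
def fixation_probability(s, N):
    if abs(s) < 1e-12:
        return 1.0 / (2 * N)
    return -math.expm1(-s) / -math.expm1(-2 * N * s)

N = 1000
neutral = fixation_probability(0.0, N)         # exactly 1/(2N) = 0.0005
deleterious = fixation_probability(-0.005, N)  # 2Ns = -10: strongly suppressed
advantageous = fixation_probability(0.005, N)  # 2Ns = +10: approximately s
print(neutral, deleterious, advantageous)
```

The strong asymmetry between the deleterious and advantageous values, even for the same |s|, is what makes the sign of a mutation's effect matter so much for D_{s1}.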
D_{s2}, much like D_{s1}, contains a term, Pr{X_{t} = 1}, that is difficult to find analytically. However, this term is easy to obtain from simulation. Hence, Figure 10 was produced using simulation to estimate both D_{s1} and D_{s2}. These simulations behave qualitatively as expected. 1 + D_{s1} + D_{s2} does a much better job of predicting R(T) than does 1 + D_{s1} alone, but there is still much room for improvement. Moreover, because R(T) is so nearly equal to 1.0 for the negative gamma shift, one wonders if a much better approximation than D_{s2} is easily available.
Normal shifts are similar in structure to gamma shifts. The fitness of a sequence with a newly arising mutation is its parent's fitness plus a normally distributed random variable with mean 0 and variance σ^{2} (instead of a gamma-distributed random variable). Unlike the gamma shifts, where all sequences with new mutations were either uniformly worse than their parents (negative) or uniformly better (positive), mutants under a normal shift have a 50% chance of having higher fitness than their parents and a 50% chance of having lower fitness. That mutations on average increase in frequency under this model is a little difficult to predict a priori, but it is not altogether surprising. New mutants have a higher than average fitness half the time. Thus, one expects new mutants to increase in frequency roughly half the time. Because there is a lot more “space” above 1/N than there is below it, it is not surprising that mutants on average increase in frequency. In any case, from simulation it is clear that mutants do, in fact, increase in frequency on average, and as a result D_{s1} < 0. Given that D_{s1} < 0, one expects that D_{s2} > 0 because of direct interactions. It is clear from simulation (Figure 11) that 1 + D_{s1} + D_{s2} does a reasonable job of predicting R(T), but once again there is still considerable room for improvement, particularly for weak selection.
INFINITE ALLELE MODELS
Gillespie (1993) presents a proof that a two-allele diffusion with a reflecting barrier below and an absorbing barrier above has an index of dispersion less than one. The proof is formulated as a waiting time problem in a diffusion and is technical. The intuition is as follows. Consider an infinite allele model of the gene. Origination processes in infinite allele models differ from those in infinite site models in at least one fundamental way. Under infinite alleles, an origination occurs only when every individual in the population has exactly the same allele at the locus. Therefore, at the instant when an allele fixes, there must not be any mutations in the entire population's coalescent. With this in mind, one can think of the time between fixations as being composed of two pieces, T_{b} + T_{cf}. T_{b} is the time the population waits until a mutation occurs that will eventually fix, and T_{cf} is the time between when a mutant destined to fix arises in the population and when it actually fixes. Gillespie argues that as the mutation rate gets small, T_{b} looks increasingly like an exponential waiting time under many models. He notes from his simulations that the variance-to-mean ratio of T_{cf} looks no more erratic than an exponential wait; therefore he concludes that T_{b} + T_{cf} is more regular than an exponential waiting time. Recall that the sum of two exponentials is more regular than a single exponential.
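That final point is easy to verify numerically. For a renewal process, the long-run index of dispersion of event counts equals the squared coefficient of variation of the waiting time, and the sum of two independent exponentials always has CV² < 1:

```python
# For a renewal process, the long-run index of dispersion of event counts
# equals the squared coefficient of variation of the waiting time,
# CV^2 = Var(T) / E(T)^2.  A single exponential has CV^2 = 1; the sum of two
# independent exponentials (rates l1 and l2 are arbitrary) has CV^2 < 1.
def cv2_sum_of_exponentials(l1, l2):
    mean = 1.0 / l1 + 1.0 / l2
    var = 1.0 / l1 ** 2 + 1.0 / l2 ** 2
    return var / mean ** 2

print(cv2_sum_of_exponentials(1.0, 1.0))   # Erlang-2 waiting time: CV^2 = 0.5
print(cv2_sum_of_exponentials(1.0, 10.0))  # approaches 1 as the rates diverge
```

Note that when one rate dominates (here 1.0 vs. 10.0), the slow exponential dominates the wait and CV² returns toward 1, which matches the limit in which T_{cf} is negligible relative to T_{b}.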
Gillespie's argument can be understood in the terms presented here as well. From (2) and (5), the index of dispersion can be written as
A DELETERIOUS MUTATIONS MODEL
With an understanding of D_{s1} and mutation interactions in hand, it is easy to construct a model that ought to produce large values of R(T). Deleterious sites cause D_{s1} > 0, so the model must have mostly deleterious mutants. Direct interactions can negate this effect, so there must be no direct interactions between sites. Any model with these two properties ought to produce an R(T) > 1. The following extremely simple model (Iwasa 1993) should produce a large R(T).
Consider a two-allele model with alleles A_{1} and A_{2} with fitnesses 1 and 1 − σ, σ > 0, respectively. When an A_{1} allele mutates it becomes A_{2} with probability 1. When an A_{2} allele mutates it becomes A_{1} with probability q, q ≪ 1, and stays A_{2} with probability 1 − q. A_{1} should be nearly fixed most of the time, and therefore almost all mutations will be deleterious. Even when A_{2} is nearly fixed, most mutations are neutral, so D_{s1} should be large.
To finish off the model, assume that q = 0.001 and an additive diploid population, so that the ith sequence, with fitness w_{i} ϵ {1, 1 − σ} and frequency X_{i}(t) in generation t, has deterministic frequency change
This model should produce indirect interactions: whenever polymorphism is unusually high, it is extremely likely that at least one A_{2} allele is at high frequency, which suggests that population mean fitness is unusually low and that subsequent fixations of other A_{2} alleles are unusually easy. Simulations reflect this intuition (Figure 12). Under a weak mutation approximation, Iwasa (1993) showed that a similar model with more alleles could also produce a large R(T).
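A heavily simplified, weak-mutation caricature of this model already shows how a slowly changing background inflates R(T): the substitution rate depends on which allele is currently fixed, and the background flips only rarely. The sketch below deliberately ignores standing polymorphism (and hence the indirect interactions themselves); all parameter values are illustrative, and the fixation probability is the standard diffusion formula rather than the article's exact expression.

```python
import math
import random

# Weak-mutation caricature of the two-allele deleterious model: an
# alternating-background substitution process.  Illustrative parameters only.
N, sigma, nu, q = 100, 0.03, 0.001, 0.001

def u(s):
    """Standard diffusion fixation probability for selective advantage s."""
    if abs(s) < 1e-12:
        return 1.0 / (2 * N)
    return -math.expm1(-s) / -math.expm1(-2 * N * s)

# Per-generation substitution rates in each background:
rate = {
    "A1": 2 * N * nu * u(-sigma),                    # deleterious A1 -> A2 fixations
    "A2": nu * (1 - q) + 2 * N * nu * q * u(sigma),  # neutral + back substitutions
}
flip = {  # probability that a given substitution flips the background
    "A1": 1.0,                                       # every fixation from A1 lands on A2
    "A2": 2 * N * nu * q * u(sigma) / rate["A2"],
}

def index_of_dispersion(total_gens, window, seed=2):
    rng = random.Random(seed)
    n_windows = int(total_gens / window)
    counts = [0] * n_windows
    t, state = 0.0, "A1"
    while True:
        t += rng.expovariate(rate[state])            # wait for the next substitution
        if t >= total_gens:
            break
        counts[int(t / window)] += 1
        if rng.random() < flip[state]:
            state = "A2" if state == "A1" else "A1"
    mean = sum(counts) / n_windows
    var = sum((c - mean) ** 2 for c in counts) / n_windows
    return var / mean

R = index_of_dispersion(total_gens=2e7, window=5e4)
print(round(R, 1))
```

Because substitutions come slowly while A_{1} is fixed and in rapid neutral bursts while A_{2} is fixed, counts over fixed windows are strongly overdispersed; the full model adds indirect interactions on top of this mechanism.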
The deleterious mutant model behaves exactly as expected. For 2Nν = 2, R(T) does not quite reach five before the origination rate falls significantly below the neutral level, but because the leading term in D_{s1} is ν, elevating 2Nν ought to increase R(T). It does (Figure 13). This model can create an R(T) as large as one likes, while still maintaining an origination rate within an order of magnitude of the neutral rate, but only for a narrow range of Nσ values. R(T) can be further elevated (but only slightly) by making the A_{2} allele recessive (results not shown).
These results are not crucially dependent on the choice of q. As long as q is small the conclusions hold (Figure 14). As long as q stays below 0.1, R(T) remains quite high. Interestingly, and perhaps not surprisingly, large-q versions of this model are the only example in this article of a model without a renewal-like appearance. For q > 0.01, one can show that R(T) is statistically different from what its value would be had the model been a renewal process. As a q = 0 limiting case, a slight variant of this model was considered. In this model, A_{1} mutates to A_{2} with probability one, and A_{2} mutates to A_{2} with probability one, but when a site fixes, the sequence on which that site arose instantaneously becomes an A_{1} allele. This model can be thought of as a deleterious shift model, analogous to the gamma shift, but with direct site interactions removed.
DISCUSSION
In the absence of site interactions, deleterious mutations cause R(T) to be >1, and advantageous mutations cause R(T) to be <1. Simulations show that advantageous mutations account almost entirely for the behavior of all previous models that produced an R(T) < 1 (Figures 1, 2, 3, 4, 5, and 6). Direct interactions (a sequence's fitness is directly proportional to the number of mutations contained in the sequence) tend to make R(T) closer to 1 than it would otherwise be. Indirect interactions (sites interact through an intermediary, usually population mean fitness) generally have the opposite effect.
For mammalian species, the observed index of dispersion of protein-encoding loci is ≫1, and our current best estimate suggests that it is >5 (Ohta 1995), for both silent and replacement sites. In Drosophila a somewhat different picture emerged (Zeng et al. 1998). R(T) for silent sites is statistically >1 (an average of 4.37) and not too dissimilar from the mammalian value. Replacement sites, on the other hand, showed an R(T) (1.64) that is much lower than mammals and not distinguishable from one. This raises the immediate possibility that whatever model of evolution is correct in mammals, something quite different may be happening in flies (Zeng et al. 1998). Unfortunately, interpretation of the Drosophila results is difficult because of the extremely low divergences between D. pseudoobscura and D. subobscura.
In all the simulations done here, the longrun behavior of R(T) is reported. The reason for this is that virtually any stationary, orderly (no more than one mutation per time step) model of molecular evolution has the property that R(T) will be an increasing function of time. For Equation 1, it is clear that so long as h(t) converges monotonically to ρ, the longer one allows the process to evolve, the larger R(T) will be. For every model simulated here, R(T) was an increasing function of T, at least for small T. It is also clear from Equation 1 that if one observes a process for only a single time step, R(T) will be exactly equal to 1.0 − ρ. Thus, as a population evolves, R(T) will start at ~1 and continue to rise, until some longterm asymptotic value is reached. The length of time it takes to approach this asymptote is crucially dependent on the details of the model, but some intuition is possible.
Consider an attempt to estimate R(T) in a highly simplified situation. Suppose there are X_{1} substitutions in lineage one and X_{2} substitutions in lineage two. A simple estimate of R(T) might be (X_{2} − X_{1})^{2}/(X_{1} + X_{2}). Because the number of substitutions can never be negative, R(T) will never be larger than the maximum of {X_{1}, X_{2}}. Thus, the estimate of R(T) can never be larger than the maximum number of substitutions observed in the two lineages. So, unless there are at least 5 substitutions in one of the lineages, the estimate of R(T) has to be <5, no matter what the long run value is. Thus, when the level of divergence is very low, there is no possibility of estimating a large value of R(T).
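The bound on this estimator is easy to verify by brute force; the check below is mine, not part of the original analysis:

```python
# The two-lineage estimator from the text, with a brute-force check of the
# bound: (X2 - X1)^2 / (X1 + X2) never exceeds max(X1, X2), so low divergence
# caps the largest R(T) one can possibly estimate.
def r_estimate(x1, x2):
    return (x2 - x1) ** 2 / (x1 + x2)

worst = 0.0
for x1 in range(51):
    for x2 in range(51):
        if x1 + x2 == 0:
            continue
        r = r_estimate(x1, x2)
        assert r <= max(x1, x2)          # the claimed bound
        worst = max(worst, r)

# The bound is attained when one lineage has zero substitutions, and with at
# most four substitutions in each lineage the estimate can never reach 5.
small = max(r_estimate(a, b) for a in range(5) for b in range(5) if a + b > 0)
print(worst, small)
```

The bound also follows directly: taking x2 ≥ x1, (x2 − x1)² ≤ x2(x1 + x2) reduces to x1² ≤ 3x1x2, which always holds.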
In 12 of the 24 proteins examined by Zeng et al. (1998), both D. pseudoobscura and D. subobscura show fewer than five substitutions. For eight loci, neither species has more than a single substitution. Now, this situation is complicated by the existence of a third species, and the use of Gillespie's weighting factor, but the above intuition suggests that one should be very careful in drawing any inference from these replacement sites. One has to worry that, because of the incredibly low replacement site divergences, this study is simply incapable of estimating the asymptotic behavior of R(T). In contrast, for silent sites only a single locus shows fewer than five substitutions in both lineages, and silent sites exhibit a value of R(T) quite similar to mammals. Thus, the principal difference between Zeng et al.'s and Ohta's results may be that Ohta examined deeper divergences and came closer to observing the longrun behavior of R(T).
In any case, it is quite certain that our best estimate of R(T) for mammals is >5 for both silent and replacement sites. The question remains, What evolutionary processes can generate an index of dispersion so large? This work shows that there are fundamentally only two ways to create an R(T) as large as 5. The first way is to force most mutations to be deleterious (create a large D_{s1} and indirect interactions). The second way is to create fluctuations (in mutation rate or probability of fixation or both) that occur on a very slow timescale. There appear to be no other ways to create a large R(T).
Virtually any model of molecular evolution with mostly deleterious mutations and no direct site interactions can explain an R(T) as large as 5, but only with sufficiently high mutation rates and selection coefficients. A very simple model of deleterious alleles is examined and shown to easily explain R(T). This model is shown to be reasonably insensitive to rare advantageous mutations.
Although the neutral theory does a poor job of explaining the index of dispersion, it nevertheless does a passable job of predicting the mean rate of originations in a lineage. We know that the per-site origination rate in many proteins in most organisms is not too different from the per-site, per-generation mutation rate for these proteins (Kimura 1983). These sorts of mutation rates are extremely hard to measure accurately, but it is certain that the origination rates are not, say, four orders of magnitude higher or lower than the mutation rates for most proteins. Therefore, any model that attempts to explain R(T) must also have the feature that the overall origination rates are at least somewhat similar to the mutation rates.
All the current models that can explain a large R(T) suffer from a parameter sensitivity problem. Models that create a large R(T) by changing some extrinsic parameter on the same timescale as molecular evolution (Gillespie 1984a; Araki and Tachida 1997) are essentially asking one to believe that there exists some key parameter that just happens to change around every 1/ν generations (or slower), where ν is the mutation rate. Now, this just may be the way the world is, but explaining it as such really just pushes the logical question back one level. The new question now becomes, How is it that some environmental parameter just happens to change at a rate keyed to the mutation rate?
Other models suffer from similar parameter problems. Takahata's fluctuating neutral space model (Takahata 1987) can explain a large value of R(T), but if one assumes that new mutation rates are independent of previous rates, sequences that differ by only a single nucleotide will sometimes (2–5% of the time) have mutation rates that differ by an order of magnitude (Cutler 2000). To some, this feature may not seem biologically realistic.
Deleterious site models can often explain large values of R(T), but only when Ns ≈ 2–10. If Ns > 10 evolution stops, and if Ns ≈ 1, the world is neutral. Again these models just push the logical question back a level. If Ns = 5 explains the index of dispersion, how is it that organisms with N's that vary over many orders of magnitude just happen to have s's that match?
This last question is not entirely rhetorical. The notion that the average selective coefficient of a mutation might evolve has been suggested before (Hartl et al. 1985) and has been given a recent serious treatment (Cherry 1998). Moreover, allowing the selective coefficient to evolve might also explain the similarity between R(T) for silent and replacement sites in mammals. If Ns evolves to approximately the same place for both silent and replacement sites, then their respective R(T)'s ought to be approximately the same. Thus, models that allow levels of selection to evolve might explain the parameter sensitivity problem.
Finally, much of population genetics theory over the last 15 years has been devoted to understanding within-population sequence variation. In particular, there has been a great deal of work trying to understand the observed levels of nucleotide polymorphism in genomic regions of low recombination (Kaplan et al. 1989; Charlesworth et al. 1993). Two major theories have been proposed: selective sweeps and background selection. Selective sweeps suggest that advantageous mutations go to fixation regularly and sweep out linked neutral variation in the process. Background selection suggests that recurrent deleterious mutations are being driven out by selection, thereby lowering the effective population size. Put simply, selective sweeps are inconsistent with an R(T) > 1, unless one further contends that something forces these sweeps to occur irregularly (and on a timescale tied to the mutation rate). Background selection, on the other hand, predicts that R(T) ought to be >1. An obvious question is whether there is a model that can simultaneously account for the index of dispersion and levels of polymorphism in natural populations.
Acknowledgments
I thank John Gillespie, Hiroshi Akashi, Mark Grote, Michael Turelli, and two anonymous reviewers for much advice and many helpful suggestions.
Footnotes

Communicating editor: G. B. Golding
 Received April 23, 1999.
 Accepted December 2, 1999.
 Copyright © 2000 by the Genetics Society of America