Originally published as Genetics Published Articles Ahead of Print on February 1, 2006.
Genetics, Vol. 172, 2621-2633, April 2006, Copyright © 2006
doi:10.1534/genetics.105.052175
Coalescent Processes When the Distribution of Offspring Number Among Individuals Is Highly Skewed
Bjarki Eldon and John Wakeley1
Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts 02138
1 Corresponding author: Harvard University, 2102 Biological Laboratories, 16 Divinity Ave., Cambridge, MA 02138.
E-mail: wakeley{at}fas.harvard.edu
We report a complex set of scaling relationships between mutation and reproduction in a simple model of a population. These follow from a consideration of patterns of genetic diversity in a sample of DNA sequences. Five different possible limit processes, each with a different scaled mutation parameter, can be used to describe genetic diversity in a large population. Only one of these corresponds to the usual population genetic model, and the others make drastically different predictions about genetic diversity. The complexity arises because individuals can potentially have very many offspring. To the extent that this occurs in a given species, our results imply that inferences from genetic data made under the usual assumptions are likely to be wrong. Our results also uncover a fundamental difference between populations in which generations are overlapping and those in which generations are discrete. We choose one of the five limit processes that appears to be appropriate for some marine organisms and use a sample of genetic data from a population of Pacific oysters to infer the parameters of the model. The data suggest the presence of rare reproduction events in which
approximately 8% of the population is replaced by the offspring of a single individual.
DNA sequences are variable within species because mutations have occurred between the present day and the time of the most recent common ancestor (TMRCA) at most genetic loci. Mutation rates are quite small: on the order of $10^{-10}$ per base pair per replication event in eukaryotes and $10^{-6}$ to $10^{-10}$ in microbes (DRAKE et al. 1998), while mutation rates measured from sequence differences between species range from approximately $10^{-8}$ to $10^{-10}$ per base pair per generation (LI 1997). The abundance of genetic variation within most species implies that a great number of generations must have elapsed since the MRCA. The occurrence of the MRCA at a locus results from the birth and death of individuals in a population (hereafter synonymous with species). Over time, some genetic lineages are lost and others leave many descendants. Other things being equal, TMRCA should be greater in a large population than in a small one. If we take $10^{-5}$ as a typical mutation rate per locus per generation, then to see any genetic variation, TMRCA must be roughly on the order of the inverse of this, or approximately 100,000 generations.
What, then, do we expect the relationship to be between TMRCA and the population size N? The bulk of work has focused on just one possibility: that TMRCA should be a constant multiple of N generations. Then if N is very large, and the inverse of the mutation rate 1/µ is also large, the level and pattern of genetic diversity in a sample of DNA sequences will depend only on the product Nµ. For example, in the neutral haploid Wright-Fisher model (FISHER 1930; WRIGHT 1931) there is a chance 1/N that two sequences are descended from a common ancestor in the previous generation and a chance $1 - (1 - \mu)^2 \approx 2\mu$ that there is a mutation between them in that generation. The expected number of differences between a pair of sequences is $E[K] \approx 2N\mu$, which is simply the product of the expected number of generations to the common ancestor (N) and the expected number of mutations per generation on the two genetic lineages ($\approx 2\mu$).
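As a minimal numerical sketch of this argument, the product can be evaluated directly. The function name and the example values of N and µ below are illustrative assumptions and do not come from the article.

```python
def expected_pairwise_differences(N, mu):
    """Approximate E[K] for two sequences in a haploid Wright-Fisher population."""
    # The waiting time to a common ancestor is geometric with success
    # probability 1/N, so its expectation is N generations.
    expected_generations = N
    # Chance that at least one of the two lineages mutates in a generation.
    mutation_per_generation = 1.0 - (1.0 - mu) ** 2
    return expected_generations * mutation_per_generation

# Illustrative values only: the result is very close to 2 * N * mu.
example = expected_pairwise_differences(10**6, 1e-8)
```

With these hypothetical values the result differs from 2Nµ = 0.02 only by a term of order Nµ².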
A rigorous formulation of these ideas yields the coalescent (KINGMAN 1982a,b; HUDSON 1983; TAJIMA 1983), which is a continuous-time stochastic process for the ancestry of a sample from the present back to the MRCA. In the limit $N \to \infty$, and with time measured in units proportional to N generations, each pair of ancestral lines reaches a common ancestor, or coalesces, with rate 1. Independently, each ancestral line undergoes mutation with rate $\theta/2$. This holds when Nµ is constant in the limit as N tends to infinity. In the Wright-Fisher model above $2N\mu \to \theta$, so that $E[K] = \theta$, as $N \to \infty$. This scaling between mutation and population size is shared by the various extensions of the coalescent that are reviewed in NORDBORG (2001). The discovery of the coalescent greatly expanded the ways in which genetic data can be used to make inferences about historical events and the characteristics of populations (TAVARÉ 2004). Although the coalescent is a continuous-time process that exists in the limit as $N \to \infty$ with time rescaled by N, it is used to approximate the behavior of gene genealogies in a wide range of species whose population sizes are very large.
It is important to note that the scaling relationship between N and µ in the coalescent is the consequence of a key assumption: that the variance of the number of offspring among individuals in the population is not too large. Specifically, in the original proof of the coalescent KINGMAN (1982a,b) assumed that this variance converged to a constant $\sigma^2$ in the limit $N \to \infty$. This assumption has the additional consequence that only binary mergers of ancestral lines occur in the limit, so the gene genealogy of the sample is a bifurcating tree. Recently, a broader class of ancestral processes has been described in which this assumption about the distribution of offspring number is relaxed. While Kingman's coalescent is robust to many deviations from its assumptions (MÖHLE 1998, 1999), a major shift in the behavior of the ancestral process occurs when the variance of offspring number is large. The general ancestral process allows for multiple mergers of ancestral lines and happens on a timescale that is faster than that of Kingman's coalescent (PITMAN 1999; SAGITOV 1999; SCHWEINSBERG 2000; MÖHLE and SAGITOV 2001).
We consider a simple neutral population model that can exhibit multiple-mergers behavior in the limit as the population size tends to infinity. Depending on parameter values, it can also converge to Kingman's coalescent. Through analysis and simulations, we address two main questions. First, what are the possible behaviors of such a model with respect to the scaling relationship between mutation rate and population size? This question stems from the desire to explain genetic diversity. In short, the model must predict nonzero levels of genetic variation if the coalescent with multiple mergers is to be a viable alternative to Kingman's coalescent for many species. Second, what are the differences between the coalescent with multiple mergers and Kingman's coalescent with respect to predictions about patterns of genetic variation in a sample? Kingman's coalescent is the standard for interpreting genetic variation, but it is not uncommon to reject this null model, even using simple tests (TAJIMA 1989; FU and LI 1993). For many species, the coalescent with multiple mergers might be a better null model than Kingman's coalescent.
The variance of offspring number does appear to be very high in some species, in particular those with type III survivorship curves, which produce very large numbers of offspring in the face of high mortality early in life (HEDGECOCK 1994). This strategy is common among marine organisms but it also occurs in terrestrial species that have large reproductive potential, such as some plants and fungi. HEDGECOCK (1994) proposed that very large $\sigma^2$, or $V_k$ (CROW and KIMURA 1970), is the primary reason that levels of genetic variation in many species are much lower than predictions based on their population sizes (and the assumption that $\theta \propto N\mu$). Estimates of $N_e$ based on temporal variation of allele frequencies or on samples of genetic data from a single time point are often much lower than the estimates, made independently, of the actual population size N. Small values of the ratio $N_e/N$ are taken as evidence of large $V_k$ since $N_e/N \approx 1/V_k$ in many models (CROW and KIMURA 1970; HEDRICK 2005). For example, HEDGECOCK (1994) estimated the ratio $N_e/N$ to be between $10^{-5}$ and $10^{-6}$ in a population of the Pacific oyster (Crassostrea gigas). TURNER et al. (2002) estimated $N_e/N$ to be $< 10^{-3}$ in a commercially important fish, the red drum (Sciaenops ocellatus). HEDGECOCK (1994) also cites the case of the American lobster (Homarus americanus) whose effective population size is estimated to be approximately $10^4$ while some $10^7$ lobsters are harvested annually. ÁRNASON (2004) estimated $N_e/N$ to be $10^{-5}$ to $10^{-6}$ in the Atlantic cod (Gadus morhua).
We suggest that the multiple-mergers coalescent processes might resolve many of the questions raised by HEDGECOCK (1994) and others. These ancestral processes are radically different from Kingman's coalescent: the relationship between $\theta$ and the population size N is less than linear, gene genealogies include multifurcations, and these processes have no effective population size in the usual sense (see the DISCUSSION). We identify one such multiple-mergers coalescent process, of five that are possible under the model we propose in the next section, and we apply it to a sample of genetic data from Pacific oysters in British Columbia (BOOM et al. 1994). The model includes the possibility that the offspring of a single individual replace a substantial fraction of the population and yet still predicts that some genetic variation should be observed. This addresses the point made by ÁRNASON (2004) in his study of Atlantic cod, that "large" reproduction events cannot be too frequent or there would be no genetic variation. For Pacific oysters in British Columbia, we find that the individuals who win HEDGECOCK's (1994) reproduction "sweepstakes" replace approximately 8% of the population.
Mathematical analysis of ancestral limit processes:
Consider a sample of size n taken without replacement from the population. The gene genealogy traces the ancestral lines of the sample back to their MRCA. Time is measured backward into the past, with the time of sampling defined to be time zero. We use the term "x-merger" to denote the event that x ancestral lines are descended from a single member of the population (the parent above) in a single time step. If an x-merger occurs when there are i ancestral lines, then the number of ancestral lines changes from $i$ to $i - x + 1$ in that time step, and $x = 2, 3, \ldots, i$. The probability of an x-merger is given by
$$G_{i,x} = \sum_{u=2}^{N} P_U(u) \binom{i}{x} \frac{(u)_x \, (N-u)_{i-x}}{(N)_i}, \qquad (1)$$
in which $(r)_j = r(r-1)\cdots(r-j+1)$ is the descending factorial, with $(r)_0 = 1$. The events x = 0 and x = 1 are also possible, and we have $\sum_{x=0}^{i} G_{i,x} = 1$, but events in which x < 2 do not lead to mergers. The assumption of a single reproduction event per time step excludes the possibility of simultaneous mergers (SCHWEINSBERG 2000; MÖHLE and SAGITOV 2001).
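The x-merger probability can be sketched numerically as the chance that exactly x of the i sampled lines fall among the u individuals (the parent plus its offspring) involved in one reproduction event, averaged over the distribution of U. The code below is only an illustration under that reading: the function name, the dictionary representation of the distribution of U, and the population size N = 100 are assumptions made for the example.

```python
from math import comb

def merger_probability(i, x, N, offspring_dist):
    """Probability that exactly x of i ancestral lines fall in one family.

    offspring_dist maps each possible value u of U (parent plus offspring,
    2 <= u <= N) to its probability P_U(u).
    """
    total = 0.0
    for u, prob in offspring_dist.items():
        # Hypergeometric term: x lines among the u family members and
        # i - x lines among the N - u bystanders.
        total += prob * comb(u, x) * comb(N - u, i - x) / comb(N, i)
    return total

# Ordinary Moran model (U = 2 always) in a hypothetical population of N = 100.
moran = {2: 1.0}
pair_coalescence = merger_probability(2, 2, 100, moran)
row_sum = sum(merger_probability(5, x, 100, moran) for x in range(6))
```

For the ordinary Moran model the pairwise value reduces to 2/(N(N − 1)), and the probabilities over x = 0, 1, ..., i sum to one.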
A key quantity in assessing convergence to a continuous-time limit process is what MÖHLE (1998) has called the coalescence probability, which he denoted $c_N$, but which in our notation is $G_{2,2}$. From Equation 1, we have
$$G_{2,2} = \sum_{u=2}^{N} P_U(u) \frac{u(u-1)}{N(N-1)}, \qquad (2)$$
and we require that $G_{2,2} \to 0$ as $N \to \infty$ for a continuous-time limit process to exist. One unit of time in the limit process is typically taken to be $1/G_{2,2}$ steps in the discrete-time model. The requirement that $G_{2,2} \to 0$ as $N \to \infty$ excludes certain distributions of U, namely those that have too much weight on values of U of order N. As in Kingman's coalescent, we seek a limit process for use as an approximation to the behavior of gene genealogies in a large population.
Let µ be the probability of mutation for each of the $U - 1$ offspring in each single time step. Mutations to the offspring occur independently, but neither the parent nor the $N - U$ individuals who simply live through each time step can mutate. To capture the fact that mutation rates are very small, we let $\mu \to 0$ as $N \to \infty$, although for the moment we refrain from specifying its rate of approach to zero. With µ infinitesimally small, genetic variation cannot be explained by mutations that co-occur with mergers because only a finite number of mergers occur in the ancestry of any (finite) sample. If the model is to predict realistic (neither zero nor infinite) levels of genetic variation, then in one time step the probability
[…] (3)
The usual way to measure time in these models is to scale it by the inverse of the coalescence probability, so that one unit of time in the limit process is equal to $1/G_{2,2}$ steps in the discrete-time model (PITMAN 1999; SAGITOV 1999; MÖHLE and SAGITOV 2001; BIRKNER et al. 2005). However, as is illustrated below, we find that this choice of timescale makes it difficult to interpret the relative sizes of gene genealogies. Therefore, we scale time by $1/G_{2,2}$ times a constant $\psi^2$, which derives from the simple model for $P_U(u)$ that we adopt in Equation 7 below. We emphasize that the predictions of the model concerning patterns of genetic variation do not depend on which of these timescales is used.
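After scaling by $\psi^2/G_{2,2}$ the rate of x-mergers becomes
$$\lambda_{i,x} = \frac{\psi^2}{G_{2,2}} G_{i,x} = \psi^2 \binom{i}{x} \sum_{u=2}^{N} P_U^*(u) \frac{(u-2)_{x-2} \, (N-u)_{i-x}}{(N-2)_{i-2}}, \qquad (4)$$
in which
$$P_U^*(u) = \frac{u(u-1) P_U(u)}{\sum_{v=2}^{N} v(v-1) P_U(v)}. \qquad (5)$$
The limit process follows from the existence of limits of $\lambda_{i,x}$ as $N \to \infty$ (PITMAN 1999; SAGITOV 1999). Note that $P_U^*(u)$ corresponds to the measure $\Lambda$ invoked in other works (PITMAN 1999; SAGITOV 1999; MÖHLE and SAGITOV 2001; BIRKNER et al. 2005).
Whether a limit process of our model predicts reasonable levels of genetic variation will depend on the value of the scaled rate of mutation per ancestral line,
$$\frac{\theta}{2} = \psi^2 \mu \sum_{u=2}^{N} P_U^*(u) \frac{N-1}{u}, \qquad (6)$$
in the limit $N \to \infty$. In particular, we wish to distinguish cases in which $\theta$ must be zero from those in which $\theta$ could be greater than zero. As discussed above, we assume that $\mu \to 0$ as $N \to \infty$. Therefore, for $\theta$ to be greater than zero, the value of the sum in Equation 6 must grow with N.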
For the remainder of this work, we adopt the following simple model for the distribution of U in our modified Moran model. We assume that
$$P_U(u) = \begin{cases} 1 - N^{-\gamma} & \text{if } u = 2, \\ N^{-\gamma} & \text{if } u = N\psi, \\ 0 & \text{otherwise,} \end{cases} \qquad (7)$$
in which $\psi$ is a constant between 0 and 1, and $\gamma \ge 0$. In seeking a continuous-time limit process, we require that $G_{2,2} \to 0$ and this further restricts us to $\gamma > 0$. In words, most of the time (with probability $1 - N^{-\gamma}$) the parent has the usual number of Moran-model offspring, but occasionally (with probability $N^{-\gamma}$) the parent and its offspring replace a fraction $\psi$ of the population. In the usual Moran model, where $P_U(2) = 1$, coalescence occurs on a timescale of $1/G_{2,2} = N(N-1)/2$ steps. Thus, if $\gamma > 2$ we expect $N\psi$-reproduction events to be too infrequent to shift the ancestral process from Kingman's coalescent. In contrast, the parameter range $0 < \gamma \le 2$, in which large ($U = N\psi$) reproduction events occur at least as frequently as regular ($U = 2$) reproduction events, will be of particular interest. The model requires that $N\psi$ is an integer, and we assume implicitly that this is true.
The constant $\psi$ is a parameter of the limit process and it has a clear biological interpretation. It is the scaled family size, or the scaled number of offspring, of a large reproduction event, measured as a fraction of the total population. In comparison, work on general multiple-mergers coalescent processes occurs in a more abstract mathematical setting (PITMAN 1999; SAGITOV 1999; BIRKNER et al. 2005). More easily interpreted models include the power-law distribution function for family sizes that yields the beta-coalescent (SCHWEINSBERG 2003; BIRKNER et al. 2005) and the models of recurrent selective sweeps (GILLESPIE 2000) that are best approximated by a coalescent with simultaneous multiple mergers (DURRETT and SCHWEINSBERG 2005). Note that our assumption in Equation 7, that the family size of large families is on the order of the population size, is required to produce an ancestral process that is different from Kingman's coalescent given our modified Moran model; see MÖHLE and SAGITOV (2001, p. 1552).
We show in the APPENDIX that five different limit processes of our modified Moran model are possible as N tends to infinity, depending on the value of $\gamma > 0$. These are summarized in Table 1. Consideration of $\lambda_{i,x}$ alone uncovers three different behaviors in the limit. If $\gamma < 2$, the result is a multiple-mergers coalescent process, because the rate of $N\psi$-reproduction events is much greater than the rate of 2-reproduction events. In this case, $P_U^*(N\psi) \to 1$ as $N \to \infty$, and the limit process is of the type described by PITMAN (1999) and SAGITOV (1999) with $\Lambda = \delta_\psi$, i.e., the $\delta$-function at the point $\psi$. Again, note that our timescale differs from theirs by the factor $\psi^2$. If $\gamma = 2$, then the two types of reproduction events occur on the same timescale, and $P_U^*(N\psi) \to \psi^2/(2 + \psi^2)$ as $N \to \infty$. This corresponds to the case where $\Lambda$ has mass $2/(2 + \psi^2)$ at 0 and mass $\psi^2/(2 + \psi^2)$ at $\psi$. Note that the mass at 0 affects only the rate of binary mergers, which is the only possible type of merger when a 2-reproduction event occurs. If $\gamma > 2$, then $P_U^*(N\psi) \to 0$ in the limit, and the ancestral limit process is Kingman's coalescent, as expected.
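These three regimes can be checked numerically. The sketch below assumes that the weight of large reproduction events is the size-biased quantity u(u − 1)P_U(u), normalized over the two possible family sizes (2 and Nψ); that reading, the function name, and the parameter values are assumptions made for illustration.

```python
def size_biased_weight_of_large_events(N, psi, gamma):
    """Share of the pairwise coalescence probability due to large events."""
    big = round(N * psi)                 # family size of a large event (assumed integer)
    p_big = float(N) ** (-gamma)         # probability of a large event per time step
    large = big * (big - 1) * p_big      # u(u-1) P_U(u) at u = N*psi
    regular = 2 * 1 * (1.0 - p_big)      # u(u-1) P_U(u) at u = 2
    return large / (large + regular)

# Hypothetical population of N = 10**8 with psi = 0.5, one value per regime.
weight_multiple_mergers = size_biased_weight_of_large_events(10**8, 0.5, 1.5)
weight_boundary = size_biased_weight_of_large_events(10**8, 0.5, 2.0)
weight_kingman = size_biased_weight_of_large_events(10**8, 0.5, 3.0)
```

For large N the three values approach 1, ψ²/(2 + ψ²), and 0, respectively.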
In the first case above ($0 < \gamma < 2$), $N\psi$-reproduction events are responsible for all mergers in the limit because $P_U^*(N\psi) \to 1$ and $P_U^*(2) \to 0$ as $N \to \infty$. Our consideration of mutation and genetic diversity subdivides $0 < \gamma < 2$ into three different cases. The key point here is that, despite the fact that $P_U^*(2) \to 0$, these infrequent 2-reproduction events can generate many mutations in the ancestry of the sample if $N P_U^*(2) \to \infty$ as $N \to \infty$ (see Equation A4 in the APPENDIX). In the first two cases in Table 1, the rate of $N\psi$-reproduction events is too large ($\gamma \le 1$) and only a finite number of mutations will occur before the sample reaches its MRCA. The scaled mutation parameter becomes a constant times µ, and this will be so small that the predicted level of genetic diversity is zero. In the third case, $1 < \gamma < 2$, multiple mergers can occur, but it is also reasonable to expect some genetic variation to be observed. The scaled mutation parameter is µ times a strictly increasing function of N, so $\theta$ could be appreciable if the population size is large enough. In the final two cases in Table 1, the scaled mutation parameter is a linear function of N, which is the case typically in population genetics.
Among the five possible limit processes, we suggest that the case $1 < \gamma < 2$ might be a good null model for many organisms, namely those with very skewed offspring number distributions and very large population sizes. A large variance in offspring number leads to an ancestral process of coalescence that includes multiple mergers, while a very large population size is needed because the mutation parameter $\theta$ scales less than linearly with N. Specifically, $\theta = 2N^{\gamma-1}\mu$ and $0 < \gamma - 1 < 1$, so depending on the value of $\gamma$, N might have to be very large for the level of genetic variation to be appreciable.
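To give a feel for this sublinear scaling, the following sketch evaluates θ = 2N^(γ−1)µ and inverts it for N. The function names and all numerical values are hypothetical and serve only to illustrate the formula.

```python
def scaled_mutation_parameter(N, gamma, mu):
    """theta = 2 * N**(gamma - 1) * mu, valid for 1 < gamma < 2."""
    if not 1.0 < gamma < 2.0:
        raise ValueError("this scaling applies only when 1 < gamma < 2")
    return 2.0 * float(N) ** (gamma - 1.0) * mu

def population_size_for_theta(theta, gamma, mu):
    """Population size N at which the scaled mutation parameter equals theta."""
    if not 1.0 < gamma < 2.0:
        raise ValueError("this scaling applies only when 1 < gamma < 2")
    return (theta / (2.0 * mu)) ** (1.0 / (gamma - 1.0))

# Illustrative values: gamma = 1.5 and mu = 1e-5 per locus per time step.
theta_example = scaled_mutation_parameter(10**8, 1.5, 1e-5)
size_example = population_size_for_theta(0.2, 1.5, 1e-5)
```

With γ = 1.5, a hundredfold increase in N raises θ only tenfold, which is why N may need to be very large.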
Consider a sample of two DNA sequences at some genetic locus, and let K be the number of mutations on their gene genealogy. If mutations occur according to the infinitely many sites model without recombination (WATTERSON 1975), then every mutation results in a polymorphic site or in a difference between the two sequences at some site. For this limit process with $1 < \gamma < 2$, we have $\lambda_{2,2} = \psi^2$, and $E[\mathrm{TMRCA}] = 1/\psi^2$. Then, since the rate of mutation is $\theta/2$ for each of the two ancestral lines, we have $E[K] = \theta/\psi^2$ and
[…]
Thus, the level of genetic variation in a sample from a population with a larger value of $\psi$ should be smaller than the level of genetic variation in a sample from a population with a smaller value of $\psi$, all other parameters being equal. Note that under the usual time scaling for multiple-mergers coalescent processes (PITMAN 1999; SAGITOV 1999; MÖHLE and SAGITOV 2001), E[TMRCA] = 1 and by analogy with Kingman's coalescent the mutation parameter is defined to be $\theta(N) = 4N^{\gamma-1}\mu/\psi^2$, so that $E[K] = \theta(N)/2$.
Properties of multiple-mergers genealogies in simulations:
The level and pattern of variation in a sample depends on the sample size n, the mutation parameter $\theta$, and the family-size parameter $\psi$. A program, written in C, to simulate the ancestral process for the case $1 < \gamma < 2$ is available from the authors upon request. The program simulates the ancestry, or gene genealogy, of a sample, including the tree relating the members of the sample and all the branch lengths, or coalescent times. It also implements the inference method described in the next section.
Figure 2 shows estimates of the expected total length of the gene genealogy, $T_{tot}$, which is the sum of the lengths of all ancestral lines back to the MRCA, as $\psi$ ranges from 0.05 to 0.95. The result for our timescale is given in Figure 2a, while Figure 2b shows the same results when time is measured using the usual scaling (PITMAN 1999; SAGITOV 1999). Figure 2 should be interpreted as a comparison of different populations, which have the same values of N and $\gamma$, but different values of $\psi$. Under our timescale, of two populations that experience $N\psi$-reproduction events with probability $N^{-\gamma}$ per time step, the population with the larger value of $\psi$ will have shorter gene genealogies. Under the usual timescale, in Figure 2b, the average total tree length increases with $\psi$. This is because, as $\psi$ increases to 1, every sample will likely reach its MRCA at the first $N\psi$-reproduction event in the past, so that E[TMRCA] = 1 under the usual timescale, regardless of sample size, and $E[T_{tot}] = n$. The predictions about levels of polymorphism are the same under both timescales due to the different definitions of $\theta$.
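The following Python sketch is not the authors' C program; it is a minimal stand-in for the case 1 < γ < 2 on the timescale used here. It assumes that large reproduction events arrive at rate 1 and that each ancestral line independently takes part with probability ψ, which gives a pairwise merger rate of ψ². The function name, seed, and parameter values are illustrative.

```python
import random

def simulate_genealogy(n, psi, rng):
    """Return (TMRCA, total branch length) for one simulated genealogy."""
    lines = n
    t_mrca = 0.0
    t_total = 0.0
    while lines > 1:
        # Exponential waiting time to the next large reproduction event.
        wait = rng.expovariate(1.0)
        t_mrca += wait
        t_total += lines * wait
        # Each line is a descendant of the event's parent with probability psi.
        participants = sum(rng.random() < psi for _ in range(lines))
        if participants >= 2:
            # An x-merger replaces x lines by their single parent.
            lines -= participants - 1
    return t_mrca, t_total

# Sample of size two with psi = 0.5: E[TMRCA] should be near 1/psi**2 = 4.
rng = random.Random(1)
pairs = [simulate_genealogy(2, 0.5, rng) for _ in range(20000)]
mean_tmrca = sum(t for t, _ in pairs) / len(pairs)
```

Dividing the simulated times by 1/ψ² converts them to the usual timescale, under which the pairwise TMRCA has mean 1.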
It is also of interest to know how $E[T_{tot}]$ depends on the sample size n. Under Kingman's coalescent, $E[T_{tot}] = 2\sum_{i=1}^{n-1} 1/i$, so the dependence on n is logarithmic. The weak dependence on n when n is large under Kingman's coalescent, and the associated "diminishing return" on further sampling, has shaped discussions of sampling strategies for the measurement of sequence polymorphism (PLUZHNIKOV and DONNELLY 1996). Figure 3 compares the dependence of $E[T_{tot}]$ on n in simulations of the current model with $1 < \gamma < 2$ for a range of values of $\psi$. The logarithmic dependence under Kingman's coalescent is shown for reference (Figure 3, thick curve). When $\psi$ is small, the dependence on n is close to that under Kingman's coalescent, but becomes dramatically different (linear) as $\psi$ increases to 1. To emphasize the dependence on n, rather than the effect of $\psi$ that is shown in Figure 2, the values of $T_{tot}$ in Figure 3 are normalized by the values at n = 5 for each $\psi$.
The shapes of gene genealogies can also be very different under the current model than under Kingman's coalescent. For example, when $\psi$ is large, gene genealogies will tend to be star shaped, with all ancestral lines emanating from the MRCA. One way to measure the shape of a gene genealogy is to compute the total length of all branches that are ancestral to $1, 2, \ldots, n - 1$ members of the sample. Let $T_i$ be the sum of the lengths of all branches in the gene genealogy that have i descendants in the sample. The tests of TAJIMA (1989) and FU and LI (1993), which are often described as tests of selective neutrality, in fact simply detect deviations from the predictions of Kingman's coalescent about $T_i$, under the additional assumption of the infinitely many sites mutation model (WATTERSON 1975).
Figure 4 shows the dependence of $E[T_i]$ on $\psi$, estimated from simulations for a sample of size n = 4. The values are given as fractions of the expected total tree length $E[T_{tot}]$, so that they sum to one for each value of $\psi$. When $\psi$ is small, the allocation to different kinds of branches is similar to that under Kingman's coalescent, in which case $E[T_1]/E[T_{tot}] = 0.55$, $E[T_2]/E[T_{tot}] = 0.27$, and $E[T_3]/E[T_{tot}] = 0.18$ when n = 4. As $\psi$ grows, the genealogy becomes dominated by external branches, and this is of course true for samples of any size. For small samples it is possible to generate analytical predictions for $E[T_i]$ or other quantities by enumerating all possible gene genealogies. The lines in Figure 4 show the predictions for n = 4 derived in the APPENDIX. Although Figure 4 implies that $\psi$ needs to be relatively large for the differences from Kingman's coalescent to become apparent, the application to data below indicates a greater sensitivity to $\psi$ for larger samples.
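The Kingman reference values quoted for n = 4 can be recovered from the standard result that, in coalescent time units, the expected length of branches with i descendants is 2/i. The short sketch below assumes that result; the function name is illustrative.

```python
def kingman_branch_fractions(n):
    """Expected share of total tree length on branches with i = 1..n-1 descendants."""
    # Standard Kingman expectation: E[T_i] = 2 / i in coalescent time units.
    expected = [2.0 / i for i in range(1, n)]
    total = sum(expected)
    return [value / total for value in expected]

# For n = 4 the shares are 6/11, 3/11 and 2/11.
fractions_n4 = kingman_branch_fractions(4)
```

Rounded to two decimals, these shares are the values 0.55, 0.27, and 0.18 given in the text.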
Application to Pacific oyster data:
We used the program described above as the basis for a method of inferring $\psi$ and $\theta$ from samples of genetic data. As noted already, Pacific oysters may have a population structure in which many or most individuals leave few offspring, or none at all, while others may even replace the entire population if conditions are favorable (HEDGECOCK 1994). Our model is a simplified version of this, in which reproduction events are nonoverlapping in time and where large reproduction events are of a single type (an individual replaces a fraction $\psi$ of the population). These $N\psi$-reproduction events occur with probability $N^{-\gamma}$ at each reproduction event and, in basing our method of inference on the program above, we also assume that $1 < \gamma < 2$. Figures 2 and 4 imply that we should be able to estimate $\psi$ and $\theta$ on the basis of information about $T_1, T_2, \ldots, T_{n-1}$ (and/or $T_{tot}$). Note that, just as it is impossible to disentangle N and µ in Kingman's coalescent without independent knowledge of one or the other, we cannot estimate $\gamma$, but only the composite parameter $\theta = 2N^{\gamma-1}\mu$.
We use the data of BOOM et al. (1994), which are the result of a restriction-enzyme digest of mtDNA on a sample of size n = 141 individuals. These data were previously analyzed under a conceptually related model in which some fraction of a Wright-Fisher population produced all the offspring every generation and the other fraction produced no offspring at all (WAKELEY and TAKAHASHI 2003). We adopt the same framework for inference and fit the two parameters of our model by a computational maximum-likelihood method on the total number of segregating sites $S$ and the number of singleton polymorphisms $\eta_1$. Under the infinite-sites mutation model, these are identical to the total number of mutations on the gene genealogy and the total number of mutations on the external branches of the gene genealogy. Then, given a gene genealogy, $S - \eta_1$ and $\eta_1$ are independent and Poisson distributed with parameters $\theta(T_{tot} - T_1)/2$ and $\theta T_1/2$, respectively. As above, $T_{tot}$ is the total branch length of the genealogy and $T_1$ is the total length of external branches. At each point in a grid of values of $\psi$ and $\theta$ we estimated the log-likelihood of the data as the average over a large number of simulated genealogies.
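One way to read this Monte Carlo step is sketched below: for each simulated genealogy the two Poisson probabilities are multiplied, the products are averaged over genealogies, and the logarithm of that average is reported. This is an assumed reading, not the authors' implementation; the function names and the toy genealogy are hypothetical, and the simulated branch lengths are taken as given inputs.

```python
import math

def log_poisson_pmf(k, rate):
    """Log of the Poisson probability of k events with mean `rate` (rate > 0)."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)

def monte_carlo_log_likelihood(S, eta1, theta, genealogies):
    """Log of the likelihood averaged over simulated genealogies.

    `genealogies` is a list of (t_total, t_external) pairs, one per simulated
    tree, in the same time units as theta.
    """
    log_terms = []
    for t_total, t_external in genealogies:
        internal_rate = theta * (t_total - t_external) / 2.0
        external_rate = theta * t_external / 2.0
        log_terms.append(log_poisson_pmf(S - eta1, internal_rate)
                         + log_poisson_pmf(eta1, external_rate))
    # Log-sum-exp keeps the average stable when individual terms are tiny.
    largest = max(log_terms)
    mean_scaled = sum(math.exp(v - largest) for v in log_terms) / len(log_terms)
    return largest + math.log(mean_scaled)

# Toy check with one hand-made genealogy (values are illustrative only).
toy_value = monte_carlo_log_likelihood(5, 2, 2.0, [(10.0, 4.0)])
```

Evaluating this function over a grid of ψ and θ, with genealogies simulated separately for each ψ, would give a likelihood surface of the kind described.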
The data are $S = 50$ and $\eta_1 = 31$ in the sample of size n = 141, and a contour plot of the log-likelihood surface is shown in Figure 5. We estimated the surface by simulating 10,000 gene genealogies for each point in a grid composed of 80 values of $\psi$ and 100 values of $\theta$. Within the constraint of this grid, there were two maximum-likelihood points, whose approximate positions are marked with a single x in the figure. The points are adjacent on the grid and differ only in their values of $\psi$, which were 0.075 and 0.0775, while $\theta = 0.0308$ at both points. We estimated $E[S]$ and $E[\eta_1]$ at these two points using simulations and obtained average values […] and […] at the point ($\psi = 0.075$, $\theta = 0.0308$) and […] and […] at the point ($\psi = 0.0775$, $\theta = 0.0308$). In contrast, Kingman's coalescent, with its single parameter $\theta$, cannot generate expected values close to $S = 50$ and $\eta_1 = 31$. For example, if we estimate $\theta$ using WATTERSON's (1975) moment method, we obtain $\hat{\theta}_W \approx 9$. Under Kingman's coalescent, the expected number of singletons is $E[\eta_1] = \theta = 9$, which is much smaller than the observed value $\eta_1 = 31$.
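The moment estimate mentioned here can be reproduced from the data with the standard form of Watterson's estimator, the number of segregating sites divided by the harmonic sum over 1, ..., n − 1. The function name below is illustrative.

```python
def watterson_theta(S, n):
    """Watterson's moment estimate of theta from S segregating sites in n sequences."""
    # Harmonic sum a_n = 1 + 1/2 + ... + 1/(n - 1).
    a_n = sum(1.0 / i for i in range(1, n))
    return S / a_n

# Data analyzed in the text: S = 50 segregating sites in a sample of n = 141.
theta_w = watterson_theta(50, 141)
```

The result is close to 9, and under Kingman's coalescent this is also the expected number of singletons, well below the 31 observed.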
[…] $\eta_1$ above) observed in some data. Even using a simple method of inference it is possible to estimate the parameters of the model from a sample of genetic data. The results suggest that the ancestral process in the Pacific oyster is a multiple-mergers coalescent in which a single individual may replace a significant fraction (8% by our estimate) of the population with its offspring.
Our results hold for a modified Moran model in which there is a chance that the parent has a large number of offspring. An important feature of this model is that generations are overlapping. Many organisms have overlapping generations, although we do not claim that the details of our model are true for any particular species. In contrast, most work in population genetics is done under the Wright-Fisher model of reproduction (FISHER 1930; WRIGHT 1931), which is an idealized model of nonoverlapping generations. Under the standard assumptions, the Wright-Fisher model and the Moran model have the same ancestral limit process, and that is Kingman's coalescent. Interestingly, analysis of a modified Wright-Fisher model that is comparable to our modified Moran model yields a different range of ancestral limit processes. Consider a Wright-Fisher-type model in which most generations proceed according to the usual dynamics, but where occasionally (with a probability that is a negative power of N each generation) there is a single highly fecund individual. This special individual has chance $\psi$ of being the parent of each individual in the next generation, while the other $N - 1$ individuals share the remaining fraction $1 - \psi$ of reproduction events according to the usual Wright-Fisher sampling process.
We show in the APPENDIX that the range of ancestral processes under this modified Wright-Fisher model is similar, but not identical, to that under our modified Moran model. This is due to the fact that in the modified Wright-Fisher model all individuals die and are replaced by offspring every generation (and thus all N have the potential to mutate), whereas in the modified Moran model only a fraction of individuals are replaced by offspring every time step. Consequently, there is no range of the corresponding exponent in the modified Wright-Fisher model equivalent to $0 < \gamma \le 1$ in the modified Moran model, where the population scaled mutation rate must tend to zero as $N \to \infty$. The behavior of the modified Wright-Fisher model corresponds to that of the modified Moran model with $\gamma > 1$ if the Wright-Fisher exponent equals $\gamma - 1$. When $\gamma > 2$ and the Wright-Fisher exponent is greater than 1, the difference in opportunities for mutation in the two models is perfectly compensated for by the difference in probabilities of coalescence. Thus, consideration of large variances in offspring number uncovers a fundamental difference between models with overlapping vs. models with nonoverlapping generations.
As with any idealization, there are probably many aspects of our modified Moran model that would be unrealistic for a given species. Among other things, one might question whether the population size has been constant over time, whether all genetic variation is selectively neutral, whether the population is well mixed, and whether the age distribution is close to what our model would predict. Given the difference between our model and the modified Wright-Fisher model discussed above, it would be risky to extend the well-known robustness of Kingman's coalescent (MÖHLE 1998, 1999) to multiple-mergers coalescent processes. For example, the model we considered looks superficially similar to Wright-Fisher models with periodic, extreme bottlenecks or with periodic selective sweeps. However, in both these cases the limit process would include simultaneous multiple mergers (see DURRETT and SCHWEINSBERG 2005 for an analysis of periodic selective sweeps) rather than asynchronous multiple mergers (SAGITOV 1999) as we have here.
Robustness results in population genetics are usually described in terms of effective population size. This term has been defined loosely to be the size of an ideal, Wright-Fisher population that would have the same "rate of genetic drift" as the population under consideration. The rate of genetic drift can be defined in several different ways (EWENS 1982), but the essential idea is that the dynamics of a complicated model can in some cases be shown to be identical to those of a simpler model via an effective population size alone. In other cases, the dynamics of a more complicated model cannot be reduced to those of a simpler model, and then there is no effective population size. For example, the well-known result that the effective size of a population whose size fluctuates over time is equal to the harmonic mean of the population sizes over time requires that the fluctuations are not too large and that the average population size does not change over time.
SJÖDIN et al. (2005) recently formalized the concept of the coalescent effective population size, which they argue should supplant all other definitions. The existence of a coalescent effective size means that the ancestral process for a sample from the population converges to Kingman's coalescent in the limit as the actual population size tends to infinity, so that all aspects of genetic variation in samples should conform to the predictions of Kingman's coalescent. In the limit process we apply to the Pacific oyster data of BOOM et al. (1994), in which we have assumed 1 < γ < 2, the ancestral process is drastically different from Kingman's coalescent, so the coalescent effective size does not exist. Other definitions of effective size are similarly inapplicable and uninformative because the dynamics of genetic diversity in the population both forward and backward in time are in no sense equivalent to those of the idealized Wright–Fisher model. From the forward-time perspective, the presence of the ψN-reproduction events would produce jumps in allele frequencies that would invalidate the usual diffusion approximation (EWENS 2004). The modified Moran model and the modified Wright–Fisher model considered here have effective population sizes in the usual sense only when γ > 2 and γ > 1, respectively, and the ancestral limit process is Kingman's coalescent.
Ancestral limit processes with mutation:
Here we consider the limits of Equation 4 and Equation 6 as N → ∞, under the assumption that the number of ancestral lines i is finite and treating the parameters ψ and γ as constants, with 0 < ψ < 1 and γ > 0. We rewrite Equation 4 and Equation 6 as
[equation]
[equation]
[…] should be the rescaled rate of an x-merger and no mutation, but we note that the correction would simply be to multiply […] by 1 + O(µ).
We consider the limits of PU*(u), AN(u, i, x), and BN(u, i) as N → ∞. For PU*(u), depending on the value of γ,
[Equation A1]
[…]. Next, we have AN(2, i, 2) = 1, and AN(2, i, x) = 0 for x > 2, while
[equation]
[equation]
[Equation A2]
[equation]
[Equation A3]
Now consider BN(u, i) as N → ∞. We have the pair of equations
[equation]
[equation]
The scaled mutation parameter becomes
[Equation A4]
[…] 0 and N → ∞, which means that θ can be nonzero only if […] as N → ∞ and if we further assume that µ […] 1/([…]). Therefore, to explore the full range of θ, it is necessary to know the rates at which […](2) and […](ψN) approach their limits given in Equation A1. Analysis of the first line of Equation A4 reveals
[Equation A5]
[…](N) depends on the number of ancestral lines, i. When 0 < γ ≤ 1, mutations on the different lines are not independent because they occur at ψN-reproduction events, where it is possible that several lines mutate at once.
Expected lengths of i-branches:
Here we derive the total length of all branches that have i = 1, 2, 3 descendants in a sample of size four. Let qi,x be the probability of an x-merger among i ancestral lines given that a merger has occurred. Thus,
[Equation A6]
[…]. With respect to the site frequencies, or the total length of branches in the ancestry of the sample that have j = 1, 2, ..., i − 1 descendants in the sample, there are only five possible gene genealogies of a sample of size four, and these are defined by the series of events that takes the sample from the present time back to the MRCA.
Let […] be the probability of a gene genealogy in which a series of k mergers takes the sample back to its MRCA, and where i_j is equal to the number of ancestral lines present between the (j − 1)st and the jth mergers. The probabilities of the five possible gene genealogies of a sample of size four are […]. The first two are the two possible kinds of rooted binary trees for a sample of size four, which differ in the number of tips at either side of the root: 2 and 2 in (a) vs. 3 and 1 in (b).
When there are i ancestral lines, the expected time back to the next merger is equal to […]. The structure of the gene genealogy determines how many branches in the interval are ancestral to one, two, or three members of the sample and thus would contribute to either T1, T2, or T3, respectively. For example, all the branches in the star tree, which has probability p4, are included in T1. Considering each of the five possible trees, we have
[equations for T1, T2, and T3]
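The bookkeeping described above (sum, over the possible genealogies, the time spent with each number of lines, weighted by how many branches subtend one, two, or three sampled sequences) can be sketched numerically. The code below is a sketch under stated assumptions, not a reproduction of the expressions in this appendix. It assumes the merger rates of a multiple-mergers coalescent whose measure is concentrated at ψ, so that each particular set of x out of i lines merges at rate ψ^(x−2)(1−ψ)^(i−x), with time measured so that a pair of lines coalesces at rate 1. That time scaling may differ from the one used in the text by a constant factor, and the function names are hypothetical.

```python
from functools import lru_cache
from itertools import combinations
from math import comb


def expected_branch_lengths(n, psi):
    """Expected total length of branches with j = 1..n-1 sampled descendants.

    Assumption (not taken from the article's equations): each particular
    set of x out of b ancestral lines merges at rate
    psi**(x - 2) * (1 - psi)**(b - x).
    """
    def rate(b, x):
        return psi ** (x - 2) * (1.0 - psi) ** (b - x)

    @lru_cache(maxsize=None)
    def solve(state):
        # state: sorted tuple giving, for each ancestral line, the number
        # of sampled sequences descended from it.
        b = len(state)
        lengths = [0.0] * (n - 1)
        if b == 1:  # reached the MRCA
            return tuple(lengths)
        total = sum(comb(b, x) * rate(b, x) for x in range(2, b + 1))
        # Expected waiting time 1/total is credited to every current line.
        for size in state:
            lengths[size - 1] += 1.0 / total
        # Average over all possible next mergers.
        for x in range(2, b + 1):
            r = rate(b, x)
            if r == 0.0:
                continue
            for idx in combinations(range(b), x):
                merged = sum(state[k] for k in idx)
                rest = [state[k] for k in range(b) if k not in idx]
                future = solve(tuple(sorted(rest + [merged])))
                for j in range(n - 1):
                    lengths[j] += (r / total) * future[j]
        return tuple(lengths)

    return solve(tuple([1] * n))


for psi in (0.0, 0.5, 1.0):
    t1, t2, t3 = expected_branch_lengths(4, psi)
    print(f"psi = {psi}: T1 = {t1:.4f}, T2 = {t2:.4f}, T3 = {t3:.4f}")
```

Under these assumptions, ψ = 0 gives only binary mergers and recovers the Kingman values 2, 1, and 2/3, while ψ = 1 gives only the star tree, so that all branch length falls in T1.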
Overlapping vs. nonoverlapping generations:
Here we compare the results for a sample of size two under the modified Moran model to those under the modified Wright–Fisher model. We consider the probability of coalescence G2,2 and the expected number of pairwise differences E[K].
For the modified Moran model, from Equation 2 and Equation 7 in the main text, we have
[equation]
[Equation A7]
[equation]
[equation]
[…] > 0 is identical to the condition […]. Given the distribution in Equation 7 in the main text, this becomes
[Equation A8]
Compare these results to those for the modified Wright–Fisher model, in which all adults die each generation and are replaced by offspring, all of which can mutate. We assume that with probability N^(−γ) each generation, where γ > 0, a single adult has probability ψ of being the parent of each individual in the next generation. If this happens, then each of the other N − 1 adults has chance (1 − ψ)/(N − 1) of being the parent of each individual in the next generation. With probability 1 − N^(−γ) each generation, the standard Wright–Fisher model holds, in which each adult has chance 1/N of being the parent of each individual in the next generation. Under this model,
[Equation A9]
[Equation A10]
[…] 0 as N → ∞. One can check that the limit process for the modified Wright–Fisher model is a multiple-mergers process in the case 0 < γ ≤ 1, rather than a Kingman coalescent, by showing that […] (MÖHLE and SAGITOV 2001). Note that, similarly to the case 1 < γ < 2 in the modified Moran model, it is necessary to assume that the mutation rate scales less than linearly with population size in the modified Wright–Fisher model when 0 < γ < 1 if the model is to predict any genetic variation.
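The verbal description of the modified Wright–Fisher model is enough to compute the one-generation probability that two individuals share a parent, by conditioning on whether the generation is a skewed one. The following is a sketch derived only from that description; it is not a transcription of Equations A9 and A10. The function names are hypothetical, and the symbols γ and ψ are used as in the text.

```python
def pair_coalescence_probability(N, gamma, psi):
    """One-generation probability that two offspring share a parent.

    Derived from the verbal model description (an assumption of this
    sketch): with probability N**(-gamma) one adult is the parent of each
    offspring with probability psi and each other adult with probability
    (1 - psi) / (N - 1); otherwise each adult has probability 1 / N.
    """
    p_skew = N ** (-gamma)
    # Both offspring choose the favoured adult, or both choose the same
    # one of the other N - 1 adults.
    same_parent_skew = psi ** 2 + (1.0 - psi) ** 2 / (N - 1)
    same_parent_standard = 1.0 / N
    return p_skew * same_parent_skew + (1.0 - p_skew) * same_parent_standard


def skew_share(N, gamma, psi):
    """Fraction of the coalescence probability due to the psi**2 term."""
    return N ** (-gamma) * psi ** 2 / pair_coalescence_probability(N, gamma, psi)


for gamma in (0.5, 1.0, 1.5):
    print(f"gamma = {gamma}: share from skewed reproduction = "
          f"{skew_share(10**8, gamma, 0.1):.3g}")
```

For large N the ψ² term dominates when γ < 1 and becomes negligible when γ > 1, in line with the statement that the limit process is a multiple-mergers process for 0 < γ ≤ 1 and Kingman's coalescent for γ > 1.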
ÁRNASON, E., 2004 Mitochondrial cytochrome b variation in the high-fecundity Atlantic cod: trans-Atlantic clines and shallow gene genealogy. Genetics 166: 1871–1885.
BIRKNER, M., J. BLATH, M. CAPALDO, A. ETHERIDGE, M. MÖHLE et al., 2005 Alpha-stable branching processes and beta-coalescents. Electron. J. Probab. 10: 303–325.
BOOM, J. D. G., E. G. BOULDING and A. T. BECKENBACH, 1994 Mitochondrial DNA variation in introduced populations of Pacific oyster, Crassostrea gigas, in British Columbia. Can. J. Fish. Aquat. Sci. 51: 1608–1614.
CROW, J. F., and M. KIMURA, 1970 Introduction to Population Genetics Theory. Harper & Row, New York.
DRAKE, J. W., B. CHARLESWORTH and D. CHARLESWORTH, 1998 Rates of spontaneous mutation. Genetics 148: 1667–1686.
DURRETT, R., and J. SCHWEINSBERG, 2005 A coalescent model for the effect of advantageous mutations on the genealogy of a population. Stoch. Proc. Appl. 115: 1628–1657.
EWENS, W. J., 1982 On the concept of effective size. Theor. Popul. Biol. 21: 373–378.
EWENS, W. J., 2004 Mathematical Population Genetics I. Theoretical Introduction. Springer-Verlag, Berlin.
FISHER, R. A., 1930 The Genetical Theory of Natural Selection. Clarendon, Oxford.
FU, Y.-X., and W.-H. LI, 1993 Statistical tests of neutrality of mutations. Genetics 133: 693–709.
GILLESPIE, J. H., 2000 Genetic drift in an infinite population: the pseudo-hitchhiking model. Genetics 155: 909–919.
HEDGECOCK, D., 1994 Does variance in reproductive success limit effective population sizes of marine organisms?, pp. 1222–1344 in Genetics and Evolution of Aquatic Organisms, edited by A. BEAUMONT. Chapman & Hall, London.
HEDRICK, P. W., 2005 Large variance in reproductive success and the Ne/N ratio. Evolution 59: 1596–1599.
HUDSON, R. R., 1983 Testing the constant-rate neutral allele model with protein sequence data. Evolution 37: 203–217.
KINGMAN, J. F. C., 1982a The coalescent. Stoch. Proc. Appl. 13: 235–248.
KINGMAN, J. F. C., 1982b On the genealogy of large populations. J. Appl. Probab. 19A: 27–43.
LI, W.-H., 1997 Molecular Evolution. Sinauer Associates, Sunderland, MA.
MÖHLE, M., 1998 Robustness results for the coalescent. J. Appl. Probab. 35: 438–447.
MÖHLE, M., 1999 Weak convergence to the coalescent in neutral population models. J. Appl. Probab. 36: 446–460.
MÖHLE, M., and S. SAGITOV, 2001 A classification of coalescent processes for haploid exchangeable population models. Ann. Probab. 29: 1547–1562.
MORAN, P. A. P., 1958 Random processes in genetics. Proc. Camb. Philos. Soc. 54: 60–71.
MORAN, P. A. P., 1962 Statistical Processes of Evolutionary Theory. Clarendon Press, Oxford.
NORDBORG, M., 2001 Coalescent theory, pp. 179–212 in Handbook of Statistical Genetics, edited by D. J. BALDING, M. J. BISHOP and C. CANNINGS. John Wiley & Sons, Chichester, UK.
PITMAN, J., 1999 Coalescents with multiple collisions. Ann. Probab. 27: 1870–1902.
PLUZHNIKOV, A., and P. DONNELLY, 1996 Optimal sequencing strategies for surveying molecular genetic diversity. Genetics 144: 1247–1262.
SAGITOV, S., 1999 The general coalescent with asynchronous mergers of ancestral lines. J. Appl. Probab. 36: 1116–1125.
SCHWEINSBERG, J., 2000 Coalescents with simultaneous multiple collisions. Electron. J. Probab. 5: 1–50.
SCHWEINSBERG, J., 2003 Coalescent processes obtained from supercritical Galton-Watson processes. Stoch. Proc. Appl. 106: 107–139.
SJÖDIN, P., I. KAJ, S. KRONE, M. LASCOUX and M. NORDBORG, 2005 On the meaning and existence of an effective population size. Genetics 169: 1061–1070.
TAJIMA, F., 1983 Evolutionary relationship of DNA sequences in finite populations. Genetics 105: 437–460.
TAJIMA, F., 1989 Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123: 585–595.
TAVARÉ, S., 2004 Ancestral inference in population genetics, pp. 1–188 in École d'Été de Probabilités de Saint-Flour XXXI–2001 (Lecture Notes in Mathematics, Vol. 1837), edited by O. CANTONI, S. TAVARÉ and O. ZEITOUNI. Springer-Verlag, Berlin.
TURNER, T. F., J. P. WARES and J. R. GOLD, 2002 Genetic effective size is three orders of magnitude smaller than adult census size in an abundant, estuarine-dependent marine fish (Sciaenops ocellatus). Genetics 162: 1329–1339.
WAKELEY, J., and T. TAKAHASHI, 2003 Gene genealogies when the sample size exceeds the effective size of the population. Mol. Biol. Evol. 20: 208–213.
WATTERSON, G. A., 1975 On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7: 256–276.
WRIGHT, S., 1931 Evolution in Mendelian populations. Genetics 16: 97–159.
Communicating editor: N. TAKAHATA
[Figure caption fragment: panels (a) and (b) correspond to the two ways of measuring time in the limit process discussed in the text.]