Abstract
A simple genealogical process is found for samples from a metapopulation, which is a population that is subdivided into a large number of demes, each of which is subject to extinction and recolonization and receives migrants from other demes. As in the migration-only models studied previously, the genealogy of any sample includes two phases: a brief sample-size adjustment followed by a coalescent process that dominates the history. This result will hold for metapopulations that are composed of a large number of demes. It is robust to the details of population structure, as long as the number of possible source demes of migrants and colonists for each deme is large. Analytic predictions about levels of genetic variation are possible, and results for average numbers of pairwise differences within and between demes are given. Further analysis of the expected number of segregating sites in a sample from a single deme illustrates some previously known differences between migration and extinction/recolonization. The ancestral process is also amenable to computer simulation. Simulation results show that migration and extinction/recolonization have very different effects on the site-frequency distribution in a sample from a single deme. Migration can cause a U-shaped site-frequency distribution, which is qualitatively similar to the pattern reported recently for positive selection. Extinction and recolonization, in contrast, can produce a mode in the site-frequency distribution at intermediate frequencies, even in a sample from a single deme.
THE standard neutral coalescent model (Kingman 1982a,c; Hudson 1983; Tajima 1983) assumes a panmictic species with a constant, large, effective population size over time within which no selective differences exist. While this model is often applied in the analysis of genetic data, its assumptions are probably inappropriate for most organisms. That is, many species are subdivided and/or have changed in size over time and/or are subject to natural selection. The study of genealogical processes in populations subject to these forces has been a major part of the recent effort in population genetics. The key parameter of the coalescent is the effective population size, Ne, because this is what determines the time scale of the process. It is important to note that some biologically interesting characteristics of species are manifest only through this effective size. These include the distribution of offspring number among individuals in the population and the details of the age structure of the population. The robustness of the coalescent to these features is generally thought to be a positive aspect of the model. However, it might also be considered disadvantageous, or at least unfortunate, that genetic data from a large population will not contain information about these important biological characteristics of organisms.
The mathematical simplicity of Kingman's coalescent follows from one important assumption about the population: that lineages or alleles are exchangeable. Roughly speaking, exchangeable lineages are ones whose predicted properties are unchanged if they are relabeled or permuted. More precise definitions can be found in Kingman (1982b) and Aldous (1985). In the case of population subdivision, which is the focus of the present work, the genealogy of a sample depends on the locations of the lineages, so the lineages are not exchangeable. For example, very different genealogies result if all n members of a sample are from the same subpopulation or deme than if one is from one deme and the other n − 1 are from another. Even when the entire sample is from a single deme, so that the present-day lineages appear to be exchangeable, the lineages ancestral to the sample will not be exchangeable if they are in different demes. A similar situation holds for natural selection, but in this case the labels are the allelic states of the lineages rather than the geographic locations.
The facility with which extensions to the coalescent can be made depends on this problem of exchangeability. For instance, when the effective size of the population changes over time, the analysis is relatively straight-forward because lineages remain exhangeable (Donnelly and Tavaré 1995; Slatkin 1996). In the case of natural selection, Neuhauser and Krone (1997) recently constructed a “dual process” (Donnelly 1984) for an ancestral selection graph in which lineages are exchangeable. Other approaches to selection assumed that it is strong enough to be treated in a similar manner to the problem of population subdivision (Kaplan et al. 1988, 1989; Hudson and Kaplan 1995; Nordborget al. 1996; Nordborg 1997). Formally, population subdivision is modeled using the structured coalescent (Notohara 1990, 1997; Wilkinson-Herbots 1998), but analytical results are difficult to obtain because lineages are not exchangeable. The structured coalescent applies to populations in which the per-generation migration rates are on the order of the reciprocal of the deme sizes. However, making this assumption does not simplify the analysis much, as exact expressions for samples of size two (Nagylaki 1998) are about as complicated as those coming out of the structured coalescent. If we are confident that the demography of a population is such that the general structured coalescent process is applicable, then simulation-based maximum-likelihood methods, like those of Beerli and Felsenstein (1999) and Bahlo and Griffiths (2000), may provide the best framework for historical inference. However, there are practical issues in the implementation of these methods. Chief among these in relation to the present work, it is unclear how to account for possibly numerous unsampled demes.
When the demography of a subdivided population is such that lineages are exchangeable, genealogical models that are robust to some of the details of demography can be found and applied. For example, when migration rates among demes are high relative to the sizes of demes, the strong-migration limit of Nagylaki (1980, 2000) and Notohara (1993) approximates the behavior of the population. A Kingman-type coalescent describes the genealogy of the sample but with an effective size that depends on the pattern of migration among demes. This result follows from a separation of time-scales between fast migration and slow coalescence (Nagylaki 1980; Nordborg 1997; Möhle 1998b). Because lineages are exchangeable in the strong-migration limit, the structure of the population will only be manifest in the effective size of the coalescent process. In particular, if a sample is taken from such a population, levels of polymorphism within and between demes will be the same. The existence of the strong-migration limit explains why geographic structure is sometimes not observed in samples, even in widely dispersed species that are obviously not panmictic across their entire range.
In fact, it is not uncommon to find evidence of geographic structure in genetic data (Slatkin 1985). With this motivation, the present model provides a framework for historical inference using samples from a metapopulation. A metapopulation is a population subdivided into many different demes among which there is some pattern of migration, extinction, and recolonization. As with the migration-only cases studied previously (Wakeley 1998, 2001), a simple genealogical process exists for samples from a population in which there are a large number of demes. The result holds for a fairly broad class of population structures and has similarities both to the structured coalescent and to the strong-migration limit. In common with the structured coalescent, the scaled rates of demic migration and extinction/recolonization (M = 2Nm and E = 2Ne below) are assumed to be finite. However, as the number of demes in the population becomes large, there is also a strong-migration limit for movement among types of demes (defined below). Lineages become exchangeable in this large-number-of-demes model only after a short burst of within-deme coalescent events. Thus the model predicts higher genetic variation among than within demes. In contrast to the small-number-of-demes structured coalescent (Wilkinson-Herbots 1998) and exact approaches (Nagylaki 1998), in which analytic results are typically confined to samples of size two, here results for arbitrary samples can be obtained. The large-number-of-demes model does not predict isolation by distance in the sense of Wright (1943), but such a pattern could result if the sizes of demes, in numbers of individuals, are positively correlated with their geographic extent; see the discussion.
The term metapopulation was introduced by Levins (1968, 1969) to describe a population that is subdivided into a large number of discrete demes, each of which is subject to random extinction and recolonization. Originally, the concern was for the numbers or fractions of empty and full demes in the population. Later metapopulation models focused on within-deme dynamics and included other processes, such as migration among demes. Over the last 30 years, metapopulation biology has grown into an active subfield of biology as a whole. Most of the emphasis has been on empirical and theoretical ecology. The recent book by Hanski and Gilpin (1997) gives a good overview of the subject. Of course, the study of subdivided populations has, from the beginning, been an important part of population genetics (Wright 1931). The Levins-type metapopulation was promoted by Wright (1940) as a demography that could lead to rapid evolution or speciation. Nevertheless, the rise of metapopulation ecology beginning in the 1970s caused a coincident increase in research on the genetics of metapopulations. Hanski (1998) and Pannell and Charlesworth (2000) provide thorough histories of these developments from the ecological and genetic perspectives, respectively.
Slatkin (1977) described the two fundamental conflicting consequences of extinction and recolonization in a subdivided population. One is the added genetic drift within demes that can occur when extinct demes are recolonized by a small number of individuals. This will tend to increase the level of differentiation among demes. Founder effects, or bottlenecks, such as this can also substantially decrease effective size of the population. The second consequence of extinction and recolonization emphasized by Slatkin (1977) is the increased amount of genetic exchange among demes that results from the movement of colonists across the population. This, like regular migration, will tend to decrease the level of differentiation among demes. These same two forces were envisioned by Wright (1940) to facilitate the fixation of chromosomal rearrangements or other strongly underdominant mutations. Slatkin (1977) illustrated these effects with quantitative predictions about genetic variation within and between demes for two different models of extinction and recolonization: the propagule-pool model and the migrant-pool model.
In the propagule-pool model, the k individuals that recolonize extinct demes come from a single source deme, and each deme in the population has an equal chance of providing these founders. In the migrant-pool model, the k founders come from the migrant pool, which each deme in the population contributes to equally, so all k may have different source demes. Thus, the structure of movement among demes is similar to that in the island model of migration (Wright 1931; Maruyama 1970; Latter 1973). Under the above assumptions, Slatkin (1977) studied recurrence relations for the probabilities of identity-by-descent for two gene copies sampled either from the same deme or from different demes. He also found expressions for the effective number of alleles in the metapopulation. Maruyama and Kimura (1980) studied probabilities of identity under a propagule-pool model and also obtained expressions for the effective size of the metapopulation. Wade and McCauley (1988) reformulated the model in terms of FST. Whitlock and Barton (1997) derived formulas for the effective size of a more general metapopulation in which demes may vary in size and within which there can be selection. Pannell and Charlesworth (1999) recast Slatkin's (1977) model in terms of genetic diversity within and between demes and emphasize the important point that no single measure (e.g., FST) is sufficient to characterize a subdivided population. A full account of research on Slatkin's (1977) model can be found in a recent review by Pannell and Charlesworth (2000).
The model considered here includes variation in the characteristics of demes and allows for structure in the pattern of movement of lineages, by migration or by recolonization, across the population. It is not, therefore, an island model in the strict sense of Wright (1931) or in the general sense of Wakeley (2001). Similar to genealogies in a large migration-only population (Wakeley 1998, 1999), it is shown here that, when the number of demes in the population is large, the genealogy of a sample includes two phases. There is a short recent part of the history, which I have elsewhere called the “scattering” phase (Wakeley 1999), and a more ancient “collecting” phase that dominates the history. Coalescent events during the scattering phase are the source of the within- vs. between-deme structure of genetic variation in a sample. The collecting phase is a Kingman-type coalescent process, with an effective size determined by the rate and pattern of movement across the population and by the distribution of deme sizes and propagule sizes. In addition to this effective size, the parameters that determine the pattern of genetic variation in a sample are the rates of migration and extinction/recolonization and the founding-propagule sizes for each sampled deme. Thus, in a large metapopulation, on the one hand, many interesting aspects of the biology of the species are tied together in the effective size of the collecting phase. On the other hand, we have a robust framework for investigating other phenomena, such as changes in effective size over time, within the context of a metapopulation. While many analytic results are possible, and some are given below, the model is also easy to program. Simulations are used here to show the contrasting effects of migration and extinction/recolonization on site-frequency distribution at polymorphic sites in a sample from a single deme.
THEORY
Large metapopulation model: Consider a population that is subdivided into a large number of local populations or demes. The total number of demes is D, and these are arbitrarily labeled 1 through D. Deme i has diploid size Ni, or, equivalently, haploid size 2Ni. In either case 2Ni copies of each genetic locus reside within deme i. The results presented below apply in a straight-forward way to haploid organisms and to diploid monoecious organisms with the additional assumption that migration and recolonization are gametic rather than zygotic. This leads to an apparent factor of two difference of terms involving k below, relative to results of previous authors, but this is not a meaningful difference. Nagylaki (1998) has shown that results for zygotic migration will be equivalent to those for gametic migration as long as the effective number of migrants each deme accepts each generation is not too small.
Each deme receives migrants from other demes in the population and is also subject to extinction/recolonization. If a deme goes extinct, it is recolonized immediately. Thus, there are no empty habitat patches in this model as there often are in ecological models of metapopulations; e.g., see Hanski (1997). This assumption is unnecessary as long as the total number of extant demes remains constant from one generation to the next (Pannell and Charlesworth 1999). Deme i receives Mi (haploid) migrants each generation. That is, a fraction Mi/(2Ni) of deme i is replaced by migrants every generation. The other portion, 1 − Mi/(2Ni), is derived from the previous generation of deme-i individuals. Reproduction within each deme occurs according to the Wright-Fisher model (Fisher 1930; Wright 1931). The parameter Mi is the scaled backward migration rate for deme i. Correspondingly, Ei is the scaled extinction/recolonization rate, so Ei/(2Ni) is the pergeneration probability that deme i goes extinct. If deme i goes extinct, it is recolonized by ki individuals, which immediately restore the deme to its original size of 2Ni gene copies. This step also occurs according to the Wright-Fisher model. That is, the 2Ni descendants are obtained by sampling with replacement from the ki colonists. It is important to note that the subscripts of N, M, E, and k refer to individual demes, not to the classes of demes introduced below.
An example, with K = 3 classes of demes, of the population structure assumed throughout this work. Within each of the three classes, or regions, there are many demes. Arrows depict the movement of lineages, by migration and/or extinction/recolonization, both within and among regions.
The population is assumed to comprise K different types of demes, which may represent different geographic regions. Demes of type i make up a fraction βi of all demes; thus,
Separation of timescales and the structure of genealogies: As in Wakeley (1998, 1999, 2000, 2001), the results presented here will hold for metapopulations that are composed of a large number of demes. In particular, the sample size must be much smaller than the number of demes in the population (n ⪡ D). This does not appear to be an unrealistic assumption for some metapopulations in nature and is one that is commonly made in theoretical studies of metapopulations (Hanksi 1997). It leads to a separation of timescales in the ancestral process of a sample, which is similar to that found in studies of partial selfing (Nordborg 1997, 1999; Nordborg and Donnelly 1997). A useful convergence theorem, derived in context of partial selfing, was found by Möhle (1998a). Consider the genealogy of a sample from such a population. The separation of timescales is a consequence of the fact that at any given time in the past, the overwhelming majority of demes in the population will not contain any lineages ancestral to the sample. Demes that do contain ancestral lineages are called occupied demes (Wakeley 1999), and the fraction of these in the population is never >n/D. Two kinds of events differ vastly in rate. The first is migration and extinction/recolonization events in which the source deme is occupied. The second is coalescent events within demes and migration or extinction/recolonization events in which the source deme is unoccupied. Events of the second type dominate the history of the sample because they are approximately D times more likely than events of the first type.
Given this, it is necessary to distinguish sample configurations in which every lineage is in a separate deme from those in which at least one deme contains multiple lineages. When at least one deme contains multiple lineages, migration events and extinction/recolonization events will send lineages to unoccupied demes and coalescent events will join together lineages within demes until each remaining lineage is in a separate deme. This scattering phase takes a negligible amount of time compared to the waiting time to the next relevant event, which is a migration or extinction/recolonization event to an occupied deme. At least one event of this type must occur before another coalescent event can happen. In fact, if n ⪡ D, so many will occur that the movement of the lineages among unoccupied demes by migration and extinction/recolonization will reach a statistical equilibrium before two lineages will have the chance to coalesce. This is the essence of Möhle's (1998a) result and of the strong-migration limit (Nagylaki 1980). As shown below, the collecting phase is a Kingman-type coalescent process with a characteristic effective size. Thus, the structure of genealogies is two-fold. First there is a one-time stochastic sample size adjustment, the scattering phase, which results in greater relatedness within than between demes; then the bulk of the history is spent in a collecting phase coalescent process. Figure 2 illustrates the structure of genealogies under this approximation for a sample from one deme.
Scattering phase: Consider the recent history of a sample n = (n1, … , nd) taken from d different demes. The total sample size is
An example of a genealogy of sample size eight from a single deme. In this case, during the scattering phase, there are two migration events (to some unoccupied demes that are not pictured) and then an extinction/recolonization event with k = 2 in which all of the lineages remaining in the deme are descended from a single common ancestor. The coalescent collecting phase of the three resulting lineages is shown above. The relative duration of the scattering phase is greatly exaggerated for purposes of illustration.
To know how many lineages remain at the end of the scattering phase, we must first distinguish histories that involve different numbers of migration events. The probability that x migration events occur and the extinction event occurs when there are j lineages is given by
Given that an extinction event occurs when there are j lineages remaining in deme i, the probability that these have y colonist-parents in the propagule of size ki is given by
When an extinction/recolonization event occurs, as long as D is large, each lineage will have a different source deme and the scattering phase will end for deme i. Thus, this model is a general version of Slatkin's (1977) migrant pool model. At the other extreme is Slatkin's (1977) propagule-pool model in which all y lineages in (5) would have the same source deme, chosen randomly according to some probability function. If this were the case, the scattering phase would continue, but with the scaled coalescent, migration, and extinction/recolonization rates of the source deme. If there was no migration, the propagule-pool model would always give
If there are x migration events and y colonist-parent lineages in the sample from deme i, then the scattering phase for deme i ends with
Collecting phase: The distribution among deme types of the n′ lineages that enter the collecting phase will depend on the particular outcome of the scattering phase for each deme's sample. Let ri be the number of lineages that are in type i demes. The vector (r1, … , rK) then denotes the configuration of the lineages among the different types of demes. The total number of lineages is equal to
Note that, from the perspective of a single lineage in a singly occupied deme, a migration event and an extinction/recolonization event are indistinguishable. Both simply move the lineage to another deme. Therefore, it is sufficient during the collecting phase to consider the combined effect of migration and extinction/recolonization: hij = mij + eij. It is assumed that hij is small, on the order of the reciprocal of the deme size. Thus, squared and higher-order terms in hij, which represent the movement of two or more of the lineages in a single generation, will be ignored. Note also that here there is no difference between migrant-pool and propagule-pool recolonization.
Looking back to the immediately previous generation, there are two kinds of events: changes in the configuration, (r1, … , rK), and the movement of a lineage into an occupied deme. Events of the first kind occur with probability
Consider first the movement of just one lineage among demes in the population. This is determined by the matrix Q, which has off-diagonal entries qij = hij. The diagonal entries are
The stationary distribution of the full configuration (r1, … , rK) is multinomial:
The total rate of events that put two lineages together in a deme of type i is the average of (9) over the stationary distribution of (r1, … , rK), that is,
Because each of the βiD demes of type i is equally likely to be the one that contains the two lineages, the overall chance that the two will have a common ancestor is equal to
This shows that the collecting phase is a Kingman-type coalescent process and is thus independent of the starting distribution of lineages among deme types, (r1, … , rK). The effective size of this coalescent process is given by
Analytic predictions about DNA sequence polymorphisms: Because the collecting phase is a coalescent process, a natural way to incorporate neutral mutations is to define θ = 4Neu, where Ne is given by (23) and u is the neutral mutation rate at some genetic locus. With θ defined in this way, the history of one particular kind of sample (n1 = 1, n2 = 1, … , nd = 1) will conform to the standard neutral coalescent model. All the usual coalescent results, for example, those found in Tavaré (1984), will apply directly to this sample when θ = 4Neu. Predictions about levels of genetic variation for other samples will have to be averaged over the possible outcomes of the scattering phase and weighted by their probabilities as given in (7). If S(n) is the number of segregating sites in the multideme sample, n = (n1, … , nd), then
The most basic prediction of this model, or of any subdivided population model, is that levels of polymorphism will be higher among than within demes. The simplest case, two sequences sampled either from the same deme or from two different demes, illustrates this point. If the two are from different demes, the expected number of pairwise differences will be equal to θ, identically for any pair of demes. A randomly chosen pair will have this same expectation because the chance of randomly sampling the same deme twice will be low when the number of demes is large. The expected number of differences between a pair of sequences from deme i will be equal to this, θ, times the probability that the scattering phase for this sample ends with two lineages. Call these expected values πT and πi, respectively. We can define an inbreeding coefficient for the deme,
Equations 24 and 25 provide results for arbitrary samples. For example, Figure 3 plots (25) for a sample of five sequences from each of two different demes over a broad range of migration and extinction/recolonization rates. Shown are results for two different propagule sizes: k = 1 in Figure 3a and k = 10 in Figure 3b. In both cases, θ = 10. As expected, the model predicts that samples from demes with larger backward migration rates will be more polymorphic. If within each deme all five lineages share a common ancestor by the end of the scattering phase, the collecting phase begins with just two lineages and the expected number of segregating sites will be equal to 10.0. At the other extreme, if, for example, migration is very frequent, the scattering phase could end with 10 sequences all in different demes. The collecting phase would then begin with 20 lineages, and the expected number of segregating sites will be equal to 28.3. These extremes are very nearly realized in Figure 3. In contrast to migration, the effect of changes in the rates of extinction/recolonization in the two demes depends on the propagule size. When the propagule size is small (Figure 3a), increases in the rate of extinction/recolonization make the expected number of segregating sites smaller, whereas when the propagule size is large (Figure 3b), increases in extinction/recolonization rates have a similar effect to increase in the migration rates, which is to increase levels of polymorphism. This is just restatement, within the framework of the coalescent, of Slatkin's (1977) conclusions about the dual roles of extinction/recolonization.
As in Wakeley (1999) it may be possible to find analytic expressions for the expected number of sites segregating at particular joint frequencies among the sampled demes. In addition, because the collecting phase is a coalescent, it is relatively straightforward to incorporate changes in effective population size over time. Again, existing expressions found for changing population sizes in the context of a Kingman-type coalescent, such as those in Slatkin and Hudson (1991) and Hey and Harris (1999), will apply directly to the sample where n = d and can be averaged over (7) for other samples. It should also be possible to model diverging species, each of which conforms to the present metapopulation model, as in Wakeley (2000). Instead of pursuing these ideas any further here, the next section describes how genealogies can be simulated easily so that these and other questions can be addressed computationally.
SIMULATIONS
The coalescent process in the large-number-of-demes metapopulation is simulated as follows. First, the scattering phase is carried out for each sampled deme. This is done by simulating a series of coalescent and migration events, and possibly an extinction/recolonization event,
for each deme according to the relative rates specified above Equation 1. The result for deme i is a number of lineages,
The expected number of segregating sites in a sample of five sequences from each of two demes, plotted as a function of the migration and extinction/recolonization rates. These rates are assumed to be the same for both demes. In a the propagule size is equal to 1 and in b it is equal to 10.
When this instantaneous scattering phase has been completed for every deme, the
One aspect of DNA sequence data that has received a lot of attention in recent years is the distribution of allele frequencies in a sample. These form the basis of many tests of the standard neutral coalescent model, such as Tajima's (1989) D, so it is of interest to know what other models predict about these “site frequencies.” Figure 4 shows the predicted site-frequency distributions under different regimes of migration and extinction/recolonization. These are the “unfolded” site frequency distributions, that is, assuming it is known which is the ancestral and which is the mutant base at each polymorphic site. Figure 4, a–c, shows simulation results for a sample of 10 sequences from a single deme and are averaged over 1 million replicates. Figure 4a shows that as the migration rate becomes small when there is no extinction/recolonization, the site-frequency distribution becomes U-shaped. When migration is infrequent, typically only one or zero migration events will occur during the scattering phase of the sample. The genealogies with a single scattering-phase migration event will have a long internal branch, and it is most likely that this branch separates a single sequence from the rest of the sample. This U-shaped distribution under low migration appears to be a general property of subdivision with migration because it also occurs in a continuous-habitat model (J. F. Wilkins, unpublished results) and when the number of demes in the population is small (results not shown).
Figure 4b shows the site-frequency distribution expected when there is no migration among demes. In this case, as the rate of extinction/recolonization increases, the distribution develops a mode for i > 1. This results from the fact that k = 2 in the simulations that produced Figure 4b. As E increases, the scattering phase becomes equivalent to throwing 10 balls into two boxes, the result being a binomial (n = 10, p = 0.5) distribution for the “sizes” of the two collecting-phase lineages. For other values of k, the sizes will be multinomial, and the mode will be for site frequencies around n/k. While a U-shaped distribution of unfolded site frequencies can result from positive Darwinian selection, e.g., see Fay and Wu (2000) and Kim and Stephan (2000), this appears to be the first report of exchangeable lineages of a mode in the site-frequency distribution for anything other than the singleton-polymorphism (i = 1) class. Figure 4c shows that when both migration and extinction/recolonization occur, a mixture of the two above cases can result, and this can lead to a trimodal distribution of unfolded site frequencies.
The expected site frequency distribution in a sample of 10 sequences from a single population, as a function of the rates of migration and extinction/recolonization. The vertical bars represent the proportion of polymorphic sites expected to be segregating at various frequencies in the sample. In b and c the propagule size is equal to 2.
DISCUSSION
The above work shows that samples of genetic data from a metapopulation that contains a large number of demes will reveal certain aspects of population structure but not others. Genetic variation will be shaped by the overall rates of migration, extinction/recolonization, and the number of colonists for the sampled demes only and by the scaled mutation parameter, θ, which hides all other aspects of demography. In particular, the geographic structure represented by the rates of movement of lineages, mij and eij, among the different classes of demes will not be discernible if the number of demes in each class is large. For this result to hold, the only restrictions on the movement matrix, Q, are that lineages can get from any deme class to any other given enough time and that there is some chance a lineage does not switch deme classes in a single generation, i.e., that Q is ergodic. Even highly structured populations, such as those with stepping-stone movement among deme classes (Kimura and Weiss 1964), meet these simple criteria.
Analysis shows that samples from demes with higher rates of migration will be more polymorphic than samples from demes with lower rates of migration. Demes with higher rates of extinction/recolonization will be more polymorphic if the number of colonists is not small; otherwise they will be less polymorphic. These effects are shown in Figure 3. If demes vary in their characteristics, it is possible to observe a great reduction of variation within sampled demes even if the variability between demes is high. This may explain the recent observation of such a pattern, by Liu et al. (1998), in samples from populations of the plant Leavenworthia. In a large metapopulation, no structure will be visible in between-deme comparisons. This is analogous to the (total) lack of geographic structure in data under the strong-migration limit of Nagylaki (1980) and Notohara (1993). However, the source of the limit here is that there are a large number of demes within each class, while the values of M and E per deme are not necessarily large.
It is interesting to note that a kind of isolation by distance could be observed under the present model. To see how this might develop, imagine a species with two types of demes: small and large. For simplicity, assume that every deme accepts the same fraction of migrants and has the same chance of extinction each generation. Thus, we have mij = m and eij = e for all i and j. The smaller demes will have smaller values of M and E than the larger demes and will thus be less polymorphic. For example, if the smaller demes have M = E = 0.5 and the larger demes are 10 times larger (M = E = 5.0), and if θ = 10.0, then the average number of pairwise differences within small demes will be 4.94, and the average number of pairwise differences within large demes will be 7.74. These values are obtained from (25) with the additional assumption that k = 2 for all demes. The average number of pairwise differences between demes will be equal to θ, or 10.0 in this case. Then all that is needed to create a pattern of isolation by distance would be for the larger demes to occupy larger geographic ranges and for the geographic distances between demes to be larger than the average geographic distance of within-larger-deme pairwise comparisons. Thus, while the model loses some of its structure in the large-number-of-demes limit, a correlation between geographic and genetic distances can develop under, arguably, reasonable biological conditions. This result does not depend strongly on the particular values of M, E, and k used here; e.g., if E = 0.0 for all demes, the only differences are that the within-small-deme average becomes 4.90 and the within-large-deme average becomes 9.0.
It should be possible to estimate the relative contributions of migration vs. extinction/recolonization using DNA sequence data, due to the dramatically different effect these have on the frequencies of the segregating bases at polymorphic sites (Figure 4). The migration site-frequency distribution can be U-shaped, and this could be tested using Fay and Wu's (2000) statistic, which was designed to detect signatures of positive selection. In contrast, the extinction site-frequency distribution can have a mode for middle-frequency polymorphisms, which is quite unusual for exchangeable coalescent models. Note, however, that the colonization events in a metapopulation violate a fundamental premise of the coalescent: that no more than one common ancestor event can happen in a single generation. Thus, we should expect to see an identical middle-frequency mode in the site-frequency distribution in samples from species that recently experienced brief severe bottleneck events. Coalescent models of bottleneck events, in which θ becomes small for some period of time, never predict a nonsingleton mode in the site-frequency distribution. Instead, the distribution simply flattens out as a coalescent bottleneck becomes more severe. Thus, it will sometimes be inadequate to use Kingman's coalescent to model population bottlenecks.
A noteworthy consequence of there being a large number of demes in a population is that if the population mutation parameter, θ, is finite, then the demic mutation rates, 4Niu, will be vanishingly small. Conversely, if the demic mutation rates are not small, so that mutations occur during the scattering phase, then the population mutation rate will be infinite. Metapopulation studies often presume the former (Hanksi 1997). In fact, the latter can often be rejected using data because it predicts complete saturation (i.e., multiple mutations per site) in among-deme samples. When the latter case does hold, then the approach of Slatkin (1982), which assumes an infinite-allele mutation model, could be used. In this model, migration and extinction/recolonization act like mutation, because every lineage that enters the collecting phase is guaranteed to represent a unique allele.
Recombination could easily be included in this metapopulation coalescent model. As with mutation, there will be no recombination events during the scattering phase if the total population recombination rate is finite. That is, in the same way mutations are modeled we assume that the population recombination rate, R = 4Ner, where r is the rate per locus per generation and Ne is given by (23), is finite. Similar to the case of partial selfing (Nordborg 2000), the observable recombination rate will be smaller than this actual recombination rate. When a recombination event occurs in some collecting-phase lineage, it will split it into two lineages. The two lineages will necessarily be in the same deme, so there is some chance that they coalesce and the event is erased. The relationship between the actual and observable recombination rates in this case is
The metapopulation model presented here has so far not addressed some well-known features of such species. The assumption that a recolonized population regains its previous size in a single generation is not realistic. It is known that delayed or slower growth can change some results (Whitlock 1992; Ingvarsson 1997). The effect here would be to increase the number of coalescent events during the scattering phase. Another point is that there is probably an inverse relationship between the size of a deme and its probability of going extinct (Foley 1997). In fact, the present model allows for this possibility already. One would simply have to add the assumption that deme classes with higher rates of extinction/recolonization also contained smaller-sized demes. Finally, one of the major concerns in metapopulation studies is whether the population under study is at equilibrium or not. The present work shows that if there are a large number of demes in the population, changes in demography over time will be manifest simply as changes in the effective size of the population. If the changes occur on a longer timescale than the scattering phase, they will affect only the collecting phase and can be modeled as a change in effective size of that coalescent process. Consequently, using samples of genetic data it will be impossible to distinguish between changes in population number and changes in the rates and patterns of migration and extinction/recolonization as explanations of variable effective size over time.
Acknowledgments
We thank Martin Möhle for continuing helpful discussions of his convergence results and two anonymous reviewers for helpful comments. Nicolas Aliacar participated in this project while visiting the lab for his research training period during April–July 2000 and was supported by l'Ecole Polytechnique, Paris. This work was supported by grant DEB-9815367 from the National Science Foundation.
Footnotes
-
Communicating editor: D. Charlesworth
- Received May 18, 2001.
- Accepted July 18, 2001.
- Copyright © 2001 by the Genetics Society of America