Originally published as Genetics Published Articles Ahead of Print on August 24, 2007.
Genetics, Vol. 177, 1043-1057, October 2007, Copyright © 2007
doi:10.1534/genetics.107.078998
Joint Estimation of Migration Rate and Effective Population Size Using the Island Model
Garrick T. Skalski1
Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, Kansas 66045
1 Corresponding author: 1637 Merion Pl., Lawrence, KS 66047.
E-mail: gt.skalski{at}gmail.com
ABSTRACT
Using the island model of population demography, I report that the demographic parameters migration rate and effective population size can be jointly estimated with equilibrium probabilities of identity in state calculated using a sample of genotypes collected at a single point in time from a single generation. The method, which uses moment-type estimators, applies to dioecious populations in which females and males have identical demography and monoecious populations with no selfing and requires that offspring genotypes are sampled following reproduction and prior to migration. I illustrate the estimation procedure using the infinite-island model with no mutation and the finite-island model with three kinds of mutation models. In the infinite-island model with no mutation, the estimators can be expressed as simple functions of estimates of the F-statistic parameters FIT and FST. In the finite-island model with mutation among k alleles, mutation rate, migration rate, and effective population size can be simultaneously estimated. The estimates of migration rate and effective population size are somewhat robust to violations in assumptions that may arise in empirical applications such as different kinds of mutation models and deviations from temporal equilibrium.
POPULATION geneticists recognize that the demographic characteristics of populations, such as migration rates and population sizes, affect population genetic structure (WRIGHT 1951). Accordingly, many population genetic studies have investigated how demographic properties might be inferred from genetic measurements in populations (e.g., SLATKIN 1985; WAPLES 1989; PUDOVKIN et al. 1996; BEERLI and FELSENSTEIN 2001; VITALIS and COUVET 2001; WANG and WHITLOCK 2003; ROBLEDO-ARNUNCIO et al. 2006). In parallel, the cultivation of genomic resources in species that are amenable to field study has facilitated the application of genetic methodologies to estimate demographic rates in natural populations.
The island model of WRIGHT (1951) is an important model in population genetics. Under a simple version of this model, an infinite number of demes, each having population size N, exchange migrants at rate m under the assumption that migrants into a deme come from any of the other demes with equal probability. In the absence of mutation and other evolutionary forces, genetic polymorphism is maintained within demes via a balance between genetic drift and migration. An important feature of the infinite-island model is that, at temporal equilibrium, the magnitude of genetic differentiation among demes, FST, is approximated by
$$F_{ST} \approx \frac{1}{4Nm + 1}.$$
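As a small numerical illustration, Wright's classical approximation FST ≈ 1/(4Nm + 1) depends on the parameters only through the product Nm, so FST alone cannot separate m from N. The function name `fst_wright` and the parameter pairs below are illustrative choices, not taken from the article.

```python
def fst_wright(N, m):
    """Wright's classical infinite-island approximation F_ST ~ 1 / (4 N m + 1)."""
    return 1.0 / (4.0 * N * m + 1.0)


# Three hypothetical demographies sharing the same product N * m = 1.
pairs = [(10, 0.10), (20, 0.05), (50, 0.02)]
for N, m in pairs:
    print(f"N={N:3d}  m={m:.2f}  Nm={N * m:.2f}  F_ST={fst_wright(N, m):.3f}")
```

All three pairs give the same FST, which is why a second statistic is needed to estimate m and N jointly.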
There is continuing interest in statistical approaches that estimate both migration rate, m, and effective population size, N, from genetic data, including methods that are applicable to a sample taken from a single generation at a single point in time (BEERLI and FELSENSTEIN 2001; VITALIS and COUVET 2001; WANG and WHITLOCK 2003). Here, I report results that show how the island model of dioecious or monoecious populations can be used to simultaneously estimate migration rate, m, and effective population size, N, using a sample of selectively neutral markers taken from a single generation at a single point in time. In particular, at temporal equilibrium under the infinite-island model with no mutation, the demographic parameters m and N can be estimated using data on FIT and FST (WRIGHT 1951). At temporal equilibrium under the finite-island model with a k-allele mutation scheme, the demographic parameters m and N, as well as the mutation rate, u, can be jointly estimated using data on probabilities of identity in state.
THE INFINITE-ISLAND MODEL WITH...
Population genetic structure can be characterized using probabilities of gene identity (e.g., MARUYAMA 1970; MAYNARD SMITH 1970; NEI and FELDMAN 1972; CROW and AOKI 1984; EPPERSON 1999; ROUSSET 2001; VITALIS 2002). Accordingly, let Q1(t), Q2(t), and Q3(t) be the probabilities (summed over k alleles at one locus) that genes within individuals, between individuals within a deme, and between individuals between demes are the same allele at time t, respectively. Hence, Q1(t), Q2(t), and Q3(t) are probabilities of identity in allelic state. A key idea in the following theory is that the interpretation of the probabilities of identity can depend on the timing of the sampling of genotypes within the sequence of demographic events that defines the life cycle (VITALIS 2002). I first assume that the sampling of genotypes follows a premigration census in the sense that genotypes are sampled from offspring immediately following reproduction and prior to migration. This kind of sampling is appropriate for highly fecund organisms with localized mating in which many offspring may be available following reproduction for genotyping. Under the premigration census in the infinite-island model with no mutation, the probabilities of identity at temporal equilibrium satisfy
[Equation 1]
[…]
Equation 14 in VITALIS (2002), assuming no mutation, an infinite number of demes, and no sex-specific dispersal, is the same as the equation for FST given above, but VITALIS (2002) does not report an expression for FIT. Thus, migration rate, m, and effective population size, N, can be expressed in terms of FIT and FST, without approximation, via
[Equation 2]
where $\hat{F}_{IT}$, $\hat{F}_{ST}$, $\hat{Q}_1$, $\hat{Q}_2$, and $\hat{Q}_3$ denote estimates of FIT, FST, Q1(t), Q2(t), and Q3(t), respectively. Methods for estimating FIT, FST, Q1(t), Q2(t), and Q3(t) are discussed by ROUSSET (2001). Equation 17 in VITALIS (2002) gives an estimator for sex-specific dispersal rates similar to the estimator of m in Equation 2, but, importantly, the former requires estimates of FST from a sequence of samples taken pre- and postmigration, rather than estimates of FIT and FST from a single sample as in Equation 2. FONTANILLAS et al. (2004) also give estimators for sex-specific dispersal based on the idea of VITALIS (2002) that require estimates of FST from samples taken pre- and postmigration. VITALIS (2002) and FONTANILLAS et al. (2004) do not report estimators of effective population size.
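The following is a sketch of how estimators of this kind can arise. It is not a transcription of the article's Equations 1 and 2. The recursions are an assumption, reconstructed from the verbal description of the premigration census: the two genes within an offspring come from two distinct adults, who are both nonmigrants with probability (1 − m)^2; two genes in different offspring of a deme descend from the same adult with probability 1/N; and with infinitely many demes and no mutation Q3 does not change. The F-statistics are taken in their usual identity-based form.

```latex
% ASSUMED recursions (infinite-island model, no mutation, premigration census)
\begin{align*}
Q_1(t+1) &= (1-m)^2\,Q_2(t) + \bigl[1-(1-m)^2\bigr]\,Q_3(t),\\
Q_2(t+1) &= \frac{1}{N}\cdot\frac{1+Q_1(t)}{2}
            + \Bigl(1-\frac{1}{N}\Bigr)\,Q_1(t+1),\\
Q_3(t+1) &= Q_3(t).
\end{align*}
% Equilibrium, with F_IT = (Q_1 - Q_3)/(1 - Q_3) and F_ST = (Q_2 - Q_3)/(1 - Q_3)
\begin{align*}
F_{IT} &= (1-m)^2\,F_{ST}, &
F_{ST} - F_{IT} &= \frac{1-F_{IT}}{2N},\\
\hat{m} &= 1 - \sqrt{\hat{F}_{IT}/\hat{F}_{ST}}, &
\hat{N} &= \frac{1-\hat{F}_{IT}}{2\,(\hat{F}_{ST}-\hat{F}_{IT})}.
\end{align*}
```

Under these assumed relations the estimate of N diverges as the estimates of FIT and FST approach each other, and no joint estimate exists when FIT = FST. The expressions should be checked against the published equations before use.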
Estimates of m and N can be calculated from multiple loci by calculating […] and […] over loci (or, equivalently, […] and […] over loci). Interestingly, if offspring genotypes are sampled following migration using a postmigration census scheme (VITALIS 2002), then the recursions for the probabilities of identity are different from those for the premigration census, with the consequence that FIT = FST. Hence, the parameters m and N cannot be jointly estimated in this way using a postmigration census.
To verify the recursions in Equation 1, and thus that the estimators in Equation 2 work as intended, I simulated genotype data under the infinite-island model with no mutation for dioecious and monoecious (with no selfing) populations at temporal equilibrium over a range of migration rates (0.02, 0.05, 0.10, and 0.20) and effective population sizes (10, 20, and 50). In the simulations, individual genotypes were tracked forward in time using a Monte Carlo implementation of the probability model defined by the life cycle using a premigration census. For each replicate simulation, diploid genotypes were initialized using random pairs of alleles, the life cycle was iterated until the system reached temporal equilibrium, and offspring genotypes were sampled prior to migration. I numerically solved the analytical recursions in Equation 1 to identify, in advance of the stochastic simulations, a sufficient number of generations required for the system of probabilities of identity to reach equilibrium to a precision of $10^{-4}$ (equilibrium to four decimal places; 1000 generations is sufficient for all parameter combinations under the infinite-island model). Means of the probabilities of identity calculated over replicate simulations are in close agreement with those calculated from the analytical recursions. The simulations were carried out using 20 independent eight-allele loci (with equally frequent alleles) at which 50 offspring were genotyped from each of 20 demes. Simulated data were combined over loci to calculate […] and […]. Negative estimates of N (equivalent to infinite-valued estimates of N) and m were set equal to 1000 and zero, respectively.
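The step of iterating the recursions numerically to find a sufficient number of generations can be sketched as follows. The recursions coded here are an assumed reconstruction for the infinite-island model with no mutation under a premigration census, not the article's Equation 1, and the stopping rule (largest change between successive generations below a tolerance) is only one possible reading of "equilibrium to four decimal places". The starting value 1/8 corresponds to random pairs drawn from eight equally frequent alleles.

```python
def iterate_recursions(m, N, q0=1.0 / 8.0, tol=1e-4, max_gen=100_000):
    """Iterate the ASSUMED recursions until successive generations differ by
    less than `tol`; return (generations, Q1, Q2, Q3)."""
    a = (1.0 - m) ** 2               # both adults are nonmigrants
    q1 = q2 = q3 = q0                # random pairs of alleles at the start
    for gen in range(1, max_gen + 1):
        # genes within an offspring come from two distinct adults
        new_q1 = a * q2 + (1.0 - a) * q3
        # genes in different offspring share a parent with probability 1/N
        new_q2 = (1.0 + q1) / (2.0 * N) + (1.0 - 1.0 / N) * new_q1
        change = max(abs(new_q1 - q1), abs(new_q2 - q2))
        q1, q2 = new_q1, new_q2      # Q3 is constant with infinitely many demes
        if change < tol:
            return gen, q1, q2, q3
    raise RuntimeError("no convergence within max_gen generations")


for m in (0.02, 0.05, 0.10, 0.20):
    for N in (10, 20, 50):
        gen, q1, q2, q3 = iterate_recursions(m, N)
        print(f"m={m:.2f} N={N:2d} generations={gen} Q1={q1:.4f} Q2={q2:.4f}")
```

Under these assumptions every parameter combination in the grid stops well before 1000 generations, which is in line with the figure quoted in the text.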
The simulations of the infinite-island model show, given sufficient data collected using a premigration census, that estimates of migration rate and effective population size using Equation 2 are close to their true values for both dioecious (Figure 1A; Table 1) and monoecious populations (Figure 1B; Table 1). The precision of the estimates of N decreases with increasing N, and the precision of the estimates of m decreases with increasing m and N. Additional simulation results are in supplemental Table S1 at http://www.genetics.org/supplemental/.
THE FINITE-ISLAND MODEL WITH...
Under the premigration census in the finite-island model with k-allele mutation, the probabilities of identity at temporal equilibrium satisfy
[Equation 3]
[…]
[Equation 4]
where $\hat{u}$, $\hat{m}$, and $\hat{N}$ are the respective moment-based estimators of u, m, and N. The estimates $\hat{u}$, $\hat{m}$, and $\hat{N}$ are calculated assuming that the number of demes, s, and the number of possible alleles, k, are known. Estimates can be calculated from multiple loci by summing the left- and right-hand sides of Equation 4 over loci (k may be locus specific in the U1 and U2 terms in the right-hand side of Equation 4) and calculating the parameter values that solve the moment equations obtained by setting the left-hand sum over loci equal to the right-hand sum over loci.
To verify the recursions in Equation 3, and thus that the estimators $\hat{u}$, $\hat{m}$, and $\hat{N}$ based on Equation 4 work as intended, I simulated genotype data under the finite-island model with k-allele mutation for dioecious and monoecious (with no selfing) populations at temporal equilibrium over a range of migration rates (0.02, 0.05, 0.10, and 0.20) and effective population sizes (10, 20, and 50) at three different mutation rates (0.001, 0.0005, and 0.0001). These values for the mutation rate are consistent with values used in similar simulation studies (e.g., VITALIS and COUVET 2001; WANG and WHITLOCK 2003; EXCOFFIER et al. 2005) and estimates from empirical data (ESTOUP et al. 2001; LAI and SUN 2003; EXCOFFIER et al. 2005). Simulations were executed as described above for the infinite-island model: individual genotypes were tracked forward in time using a Monte Carlo implementation of the probability model defined by the life cycle using a premigration census for s demes. I numerically solved the analytical recursions in Equation 3 to identify a sufficient number of generations for the system to reach equilibrium to a precision of $10^{-4}$ (5000 generations for u = 0.001, 6000 generations for u = 0.0005, and 15,000 generations for u = 0.0001). Models with smaller values of u and m and larger values of N require more generations to reach equilibrium. Means of the probabilities of identity calculated over replicate simulations are in close agreement with those calculated from the analytical recursions. The simulations were carried out using 30 independent eight-allele loci (with equally frequent alleles; hence, k = 8) at which 100 offspring were genotyped from each of 20 demes. Simulated data were combined over loci to calculate […] and […], and the estimators $\hat{u}$, $\hat{m}$, and $\hat{N}$ defined by Equation 4 were calculated numerically using a nonlinear least-squares procedure in the software application MATLAB (The Mathworks). Estimates of u, m, and N were constrained according to $0 \le \hat{u} \le 1$, $0 \le \hat{m} \le 1$, and $2 \le \hat{N} \le 1000$, respectively, to obtain realistic estimates and to account for the possibility of infinite-valued estimates of N (cf. WAPLES 1989; WILLIAMSON and SLATKIN 1999; WANG and WHITLOCK 2003).
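A forward-in-time Monte Carlo life cycle of the kind described above might look like the sketch below. It is not the author's program. It assumes a monoecious population with no selfing, mutation applied when gametes are formed, a mutant allele drawn uniformly from the other k − 1 states, and an effectively unlimited supply of offspring so that every adult slot is filled by a freshly generated offspring. The identity probabilities are estimated by simple pair counting, which is only one of the approaches discussed by ROUSSET (2001). All function names are hypothetical.

```python
import random


def mutate(allele, u, k, rng):
    """k-allele mutation: with probability u switch to one of the other k - 1 alleles."""
    if rng.random() < u:
        new = rng.randrange(k - 1)
        return new if new < allele else new + 1
    return allele


def make_offspring(adults, u, k, rng):
    """One offspring of two distinct random adults (monoecious, no selfing)."""
    first, second = rng.sample(adults, 2)
    return (mutate(rng.choice(first), u, k, rng),
            mutate(rng.choice(second), u, k, rng))


def next_generation(demes, m, u, k, rng):
    """Reproduction then migration: each adult slot receives a fresh offspring of
    the local deme (probability 1 - m) or of a random other deme (probability m)."""
    s = len(demes)
    new_demes = []
    for i in range(s):
        adults = []
        for _ in range(len(demes[i])):
            src = i
            if rng.random() < m:
                src = rng.randrange(s - 1)
                src = src if src < i else src + 1
            adults.append(make_offspring(demes[src], u, k, rng))
        new_demes.append(adults)
    return new_demes


def sample_premigration(demes, n, u, k, rng):
    """Premigration census: n offspring per deme, taken before any migration."""
    return [[make_offspring(adults, u, k, rng) for _ in range(n)] for adults in demes]


def identity_probabilities(sample, k):
    """Pair-counting estimates of (Q1, Q2, Q3) for one locus."""
    q1_same = q2_same = q2_pairs = n_inds = 0
    counts = []                                   # allele counts per deme
    for deme in sample:
        c = [0] * k
        for a, b in deme:
            c[a] += 1
            c[b] += 1
        homozygotes = sum(a == b for a, b in deme)
        genes = 2 * len(deme)
        q1_same += homozygotes
        n_inds += len(deme)
        # identical gene pairs in the deme, minus the within-individual pairs
        q2_same += sum(x * (x - 1) for x in c) // 2 - homozygotes
        q2_pairs += genes * (genes - 1) // 2 - len(deme)
        counts.append(c)
    totals = [sum(c[j] for c in counts) for j in range(k)]
    total_genes = sum(totals)
    # ordered pairs of genes taken from two different demes
    q3_same = sum(t * t for t in totals) - sum(x * x for c in counts for x in c)
    q3_pairs = total_genes ** 2 - sum(sum(c) ** 2 for c in counts)
    return q1_same / n_inds, q2_same / q2_pairs, q3_same / q3_pairs


rng = random.Random(1)
s, N, k, m, u = 20, 20, 8, 0.05, 0.0005
demes = [[(rng.randrange(k), rng.randrange(k)) for _ in range(N)] for _ in range(s)]
for _ in range(200):   # far fewer than the thousands of generations needed for equilibrium
    demes = next_generation(demes, m, u, k, rng)
sample = sample_premigration(demes, 100, u, k, rng)
q1, q2, q3 = identity_probabilities(sample, k)
print(q1, q2, q3)
```

The run above uses a single locus and only 200 generations to keep it quick; a study along the lines of the text would iterate to equilibrium, use many loci, and replicate the whole procedure.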
The simulations of the finite-island model show, given sufficient data collected using a premigration census, that estimates of migration rate and effective population size using Equation 4 are close to their true values for both dioecious (Figure 2, A–C; Table 2) and monoecious populations (Figure 2D; Table 2). The precision of the estimates of N decreases with increasing values of N, and the precision of the estimates of m decreases with increasing values of m and N. The precision of the estimates of m and N decreases with lower mutation rates. Estimates of mutation rate, given sufficient data, are similarly close to their true values (Table 2). The precision of the estimates of u decreases with decreasing values of u and increasing values of N. Additional simulation results are in supplemental Table S2 at http://www.genetics.org/supplemental/.
Examples of the sampling distributions of the parameter estimates are shown in Figure 3 for dioecious populations with u = 0.0005, m = 0.05, and N = 20 for samples of 100 genotypes from 20 demes genotyped at 30 eight-allele loci. The distributions of $\hat{u}$ and $\hat{m}$ are approximately symmetrical (Figure 3, A and B), and the distribution of $\hat{N}$ is positively skewed (Figure 3C). The estimates of m and N are strongly negatively correlated and fall along the line defined by $\hat{m}\hat{N} = mN$ (Figure 3D). The estimates of u and m are positively correlated, and the estimates of u and N are negatively correlated (the correlation between […] and […] is strong, but not as strong as that exhibited by $\hat{m}$ and $\hat{N}$). These results suggest that the product parameter mN is well estimated, but that the individual estimates of u, m, and N are more difficult to identify precisely from data.
In empirical situations, k, the number of possible alleles at a locus, might be expected to vary across loci. In this case, estimates of u, m, and N may be constructed from the moment equations defined by summing the left- and right-hand sides of Equation 4 over loci and setting the left-hand sum equal to the right-hand sum. Simulations of dioecious populations in the finite-island model (100 individuals genotyped from each of 20 demes) with k-allele mutation (u = 0.0005) for 30 loci having a random mixture of 4, 8, or 12 possible alleles show, given sufficient data collected using a premigration census, that estimates of mutation rate, migration rate, and effective population size are close to their true values and have similar properties to those calculated when all loci have the same number of possible allelic states (simulation results are in supplemental Table S3 at http://www.genetics.org/supplemental/).
ROBUSTNESS OF THE ESTIMATION...
I used the k-allele mutation model to describe the mutation process in the estimation procedure described above. The infinite-allele model (KIMURA and CROW 1964) can be employed using Equation 4 by setting k to a large (and effectively infinite) value. An alternative mutation model that may be appropriate for some markers such as microsatellite sequences is the stepwise mutation model (OHTA and KIMURA 1973). The stepwise mutation model assigns a length-based ordering to alleles and posits that mutation occurs between allelic states that are adjacent in the ordering. Accordingly, I applied the estimation procedure to data generated using the simulation approach described above for the finite-island model except that I implemented mutation according to two simple kinds of stepwise mutation models: a stepwise mutation model with no bounds on allele length (unbounded stepwise mutation model; OHTA and KIMURA 1973) and a stepwise mutation model with lower and upper bounds on allele length (bounded stepwise mutation model; constrained to eight allelic states). I simulated genotype data under the finite-island model with dioecious populations over a range of migration rates (0.02, 0.05, 0.10, and 0.20) and effective population sizes (10, 20, and 50) at a mutation rate of u = 0.0005. Hence, each generation every gene mutates to an adjacent allele with probability u, mutating to either neighboring allele with equal probability. In the bounded stepwise mutation model, genes at the lower bound mutate toward the upper bound and genes at the upper bound mutate toward the lower bound. The simulations were carried out using 30 independent eight-allele loci (with equally frequent alleles) at which 100 offspring were genotyped from each of 20 demes after 6000 generations. Simulated data were combined over loci to calculate […] and […]. I set k = 10,000 to mimic the infinite-allele mutation model for parameter estimation using data generated from the unbounded stepwise mutation model, and I set k = 8 for parameter estimation using data generated from the bounded stepwise mutation model.
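The two stepwise mutation rules described above can be sketched as follows. The function names and the integer coding of alleles (0 to 7 for the eight bounded states) are illustrative. The bounded rule reflects one reading of the text, namely that a gene at a bound mutates with the full probability u to its single neighboring state.

```python
import random


def stepwise_unbounded(allele, u, rng):
    """With probability u, move one step up or down with equal probability."""
    if rng.random() < u:
        return allele + rng.choice((-1, 1))
    return allele


def stepwise_bounded(allele, u, rng, lower=0, upper=7):
    """As above, but a gene at a bound can only step back toward the interior.

    ASSUMPTION: a gene at a bound still mutates with probability u.
    """
    if rng.random() < u:
        if allele == lower:
            return allele + 1
        if allele == upper:
            return allele - 1
        return allele + rng.choice((-1, 1))
    return allele


rng = random.Random(42)
u = 0.0005
genes = [3] * 10_000
genes = [stepwise_bounded(g, u, rng) for g in genes]
print(sum(g != 3 for g in genes), "of", len(genes), "genes mutated in one generation")
```

Either function can replace a k-allele mutation step in a forward simulation, leaving the rest of the life cycle unchanged.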
The accuracy (when comparing the medians of replicate estimates to their parametric values) of estimates of migration rate and effective population size for data generated under the stepwise mutation models is similar to that observed under the k-allele mutation model (Figure 4; Table 3), despite the fact that the k-allele mutation model is the assumed mutation process in Equation 4. In particular, the medians of the estimates of m and N are close to their respective parametric values. The estimates of m and N for data generated under the stepwise mutation models (Figure 4; Table 3) exhibit levels of precision similar to, but slightly lower than, the estimates of those parameters for data generated under the equivalent k-allele mutation model (Figure 2B; Table 2). Hence, the estimates of m and N based on Equation 4 are somewhat robust to violations in the assumptions of the mutation model. In contrast, the estimates of mutation rate, when summarized using their medians, are negatively biased (Table 3), suggesting that the mutation rate estimates are sensitive to violations in the assumptions of the mutation model in the estimation procedure. Additional simulation results are in supplemental Table S4 at http://www.genetics.org/supplemental/.
The assumption of temporal equilibrium in the probability of identity recursion equations is used to estimate mutation rate, migration rate, and effective population size. To assess this assumption, I applied the estimation procedure to genotype data simulated under nonequilibrium conditions using the infinite-island model with no mutation and the finite-island model with mutation for dioecious populations with parameter values m = 0.05 and N = 20. For the finite-island model, I simulated data under the k-allele mutation model, the unbounded stepwise mutation model, and the bounded stepwise mutation model using a mutation rate of u = 0.0001 (the value of the mutation rate requiring the most generations to reach temporal equilibrium). At these parameter values, the infinite-island model is very close to temporal equilibrium after 50 generations, whereas the finite-island model with k-allele mutation requires […] 10,000 generations to reach equilibrium to four decimal places. Diploid genotypes were initialized using random pairs of alleles, the life cycle was iterated for 10, 20, 50, 100, or 200 generations, and genotypes were then sampled prior to migration (100 offspring genotyped at 30 eight-allele loci in each of 20 demes).
The accuracy of the estimates of mutation rate, migration rate, and effective population size, measured using the median of replicate estimates relative to their parametric values, increases as the number of generations increases from 10 to 200 for both the infinite- and the finite-island models (Figure 5). The estimates of u under the finite-island models are strongly positively biased at 10–200 generations (Figure 5A). In contrast, the estimates of m under the infinite-island model and the finite-island model with unbounded stepwise mutation are positively biased at 10 generations, but are nearly unbiased after ≥20 generations (Figure 5B). The estimates of m under the finite-island models with k-allele and bounded stepwise mutation models are negatively biased, but the bias is relatively small, especially after ≥100 generations (Figure 5B). The estimates of N exhibit the least bias among the estimated parameters, indicating essentially no bias in the infinite-island model and the finite-island model with unbounded stepwise mutation at ≥10 generations and a small negative bias in the finite-island models with k-allele and bounded stepwise mutation models after 10 and 20 generations and very little bias for ≥50 generations (Figure 5C). The precision (measured via 5th and 95th percentiles in Figure 5) of the estimates of m and N is high relative to estimates of those parameters under equilibrium conditions (see supplemental Table S2 at http://www.genetics.org/supplemental/: u = 0.0001, m = 0.05, N = 20). Hence, the estimation procedures do not require equilibrium conditions to provide reasonable estimates of migration rate and effective population size, and the nonequilibrium conditions explored here actually increase the precision of these estimates. In contrast, the nonequilibrium conditions examined here result in inaccurate estimates of the mutation rate. Simulations suggest that at least 4000 generations under the finite-island model with k-allele mutation are required to obtain accurate estimates of the mutation rate when u = 0.0001, m = 0.05, and N = 20.
The estimation of parameters using Equation 4 assumes that k, the number of possible allelic states at a locus, is known. In practice, k might be estimated using the total number of alleles observed in all of the data for each locus, and there would be uncertainty in its value. Accordingly, using data generated under the finite-island models (100 offspring genotyped at 30 eight-allele loci in each of 20 demes after 6000 generations; hence k = 8) with k-allele mutation, unbounded stepwise mutation, and bounded stepwise mutation (u = 0.0005, m = 0.05, N = 20), I calculated estimates of mutation rate, migration rate, and effective population size using values for k in Equation 4 equal to 2, 4, 8, 16, and 100.
The medians of the estimates of migration rate and effective population size are close to their parametric values over the range of assumed values of k (2, 4, 8, 16, and 100), but the medians of the estimates of mutation rate deviate from the parametric values for most of the assumed values of k, with most cases exhibiting negative bias. The precision (measured via 5th and 95th percentiles) of the estimates of m and N is similar over the range of assumed values of k with the exception that estimates of m are slightly less precise for k = 2 under the k-allele mutation model. Estimates of u for k = 2 are quite variable and some are very near zero; otherwise the precision of the estimates of u is similar across the different assumed values of k. Hence, estimates of m and N are robust to uncertainty in the value of k, and, in contrast, estimates of u are more sensitive to deviations from the parametric value of k.
DISCUSSION
The estimation procedure works because assuming a dioecious population (or monoecious populations with no selfing) in which offspring genotypes are sampled prior to migration has the consequence that Q1(t) ≠ Q2(t) and hence provides additional information that is not available for other mating systems and sampling schemes that result in Q1(t) = Q2(t) (e.g., random selfing resulting in random pairing of gametes from all adults during mating, including pairing of gametes from the same adult; MARUYAMA 1970; NEI and FELDMAN 1972; NAGYLAKI 1983; CROW and AOKI 1984; EPPERSON 1999) or Q1(t) very nearly equal to Q2(t) (e.g., the finite-island model under a postmigration census). Previous studies in population ecology (e.g., CASWELL 2001) and genetics (e.g., NAGYLAKI 1983; WAPLES 1989; VITALIS 2002) have recognized that the mating system and/or timing of sampling can affect the interpretation of demographic quantities, but the application of these ideas to the present scenario of joint estimation of migration rate and effective population size using a sample from a single generation collected at a single point in time seems not to have been analyzed in prior investigations. Several studies have examined recursions for probabilities of identity in state that are similar to those used here, but these studies do not identify the estimation procedures developed here. In the APPENDIX, I outline how various recursions for probabilities of identity that have been studied (MARUYAMA 1970; MAYNARD SMITH 1970; NEI and FELDMAN 1972; NAGYLAKI 1983; CROW and AOKI 1984; EPPERSON 1999; VITALIS and COUVET 2001; VITALIS 2002; BALLOUX et al. 2003) can be derived under the pre- and postmigration census schemes, thus helping to explain the different forms of these equations that occur in the literature. Indeed, VITALIS and COUVET (2001) estimate m and N using probabilities of identity, and their Equation 5 with no selfing is equivalent to the infinite-island model with k-allele mutation under a premigration census as defined here; but VITALIS and COUVET (2001) assume an infinite-island model with mutation among an infinite number of alleles, and they use the approximation […] along with a two-locus identity measure (a fourth-moment quantity) rather than the single-locus quantities FIT and FST or Q1(t), Q2(t), and Q3(t) (all second-moment quantities).
Like other genetic methods for estimating demographic parameters (e.g., WAPLES 1989; PUDOVKIN et al. 1996; WILLIAMSON and SLATKIN 1999; BEERLI and FELSENSTEIN 2001; VITALIS and COUVET 2001; WANG and WHITLOCK 2003), the procedure described here will typically require considerable data to recover accurate and precise estimates. The accuracy and precision achieved will, in general, depend on several factors, including the values of the parameters u, m, and N, as well as the number of demes, number of individuals genotyped, number of loci used, and the appropriateness of the model to the empirical system under investigation. Previous studies (WAPLES 1989; PUDOVKIN et al. 1996; WILLIAMSON and SLATKIN 1999; VITALIS and COUVET 2001; WANG and WHITLOCK 2003) have shown that effective population size is more difficult to estimate as N increases, and my simulation results are consistent with the findings in these earlier studies. Indeed, my results suggest that Q1(t) and Q2(t) become increasingly similar as N increases, with the consequence that estimates of N approach infinity as $\hat{Q}_1$ and $\hat{Q}_2$ become equal. Further, negative estimates of N are possible with Equation 2 if $\hat{Q}_1 > \hat{Q}_2$. Because effective population size influences genetic quantities via the function 1/N in standard models, it is not surprising that larger population sizes are more difficult to estimate with genetic data because this involves estimating an effect of magnitude ~1/N, a small number for even moderately sized N. Hence, genetic-based methods for estimating effective population size work best for small populations, and the numbers of loci and individuals genotyped must increase with increasing N to maintain a given level of precision (WAPLES 1989; PUDOVKIN et al. 1996; WILLIAMSON and SLATKIN 1999; VITALIS and COUVET 2001; WANG and WHITLOCK 2003). Accordingly, the method presented here is most likely to be useful for populations with a metapopulation structure defined by many small demes. Detailed guidance on the accuracy and precision of the estimators for specific empirical scenarios can be obtained using simulations. Source code used to simulate data under the infinite-island model with no mutation and the finite-island model with k-allele mutation is available at http://www.genetics.org/supplemental/.
The simulation results suggest that the estimates of migration rate and effective population size are somewhat robust to violations of the model assumptions. Reasonable estimates of m and N can be obtained for loci exhibiting stepwise mutation (OHTA and KIMURA 1973), under nonequilibrium conditions, or if the number of possible allelic states is not precisely known. In contrast, estimates of the mutation rate, u, are sensitive to violations in the assumptions and can be quite biased in these settings. The precision of the estimates of m and N is higher for data simulated with a high mutation rate and for data simulated under nonequilibrium conditions, suggesting that the procedure works better at higher levels of genetic diversity. The estimation procedure under the finite-island model requires that the number of demes, s, be known, but it does not require that all demes be sampled because all demes are identical in island models. In non-island models, the set of demes that is exchanging migrants generally must be known to estimate migration rates among the demes (BEERLI and FELSENSTEIN 2001; WANG and WHITLOCK 2003; SLATKIN 2005).
Many studies investigate the estimation of demographic parameters from genetic data (SLATKIN 1985; WAPLES 1989; PUDOVKIN et al. 1996; WANG and WHITLOCK 2003; ROBLEDO-ARNUNCIO et al. 2006); however, few methods exist for jointly estimating parameters like migration rate and effective population size from genetic data collected from a sample taken from a single generation at a single point in time (BEERLI and FELSENSTEIN 1999, 2001; VITALIS and COUVET 2001). For example, the product parameter mN can be estimated from single-generation data on FST under the infinite-island model (SLATKIN 1985), and effective population size alone can be estimated from multiple samples on allele frequencies from two or more generations (WAPLES 1989) or from a single sample of offspring assuming unrelated parents using heterozygote excess (PUDOVKIN et al. 1996). VITALIS (2002) and FONTANILLAS et al. (2004) use two samples (both pre- and postmigration samples) to estimate migration rate alone using F-statistics under the infinite-island model with sex-specific dispersal. Extending the idea in WAPLES (1989), if allele frequency data are available from multiple samples from multiple generations from two or more demes, then migration rate and effective population size can be jointly estimated (WANG and WHITLOCK 2003). Using data from a single generation, the method of BEERLI and FELSENSTEIN (2001) estimates the deme-specific product parameters 4uN and m/u for s demes under a general migration scheme under the assumption that effective population size is sufficiently large so that the coalescent model of genetic drift is appropriate and that m and u are sufficiently small so that the quantities mN and uN remain finite as N goes to infinity. 
In a two-deme version of their coalescent procedure, BEERLI and FELSENSTEIN (1999) initially estimate 4uN and m/u using moment estimators based on the probability-of-identity equations of NEI and FELDMAN (1972), which can be derived for randomly mating monoecious populations under a postmigration census scheme. The method of VITALIS and COUVET (2001) uses one- and two-locus probabilities of identity to estimate m and N under the infinite-island model with infinite-allele mutation and random selfing, assuming that u = 0 and m is sufficiently small so that the approximation [expression not recovered] might be valid. In a somewhat different demographic scenario that tackles the same issues, if migration occurs via the dispersal of male gametes and genotype data are available from offspring and their mothers (e.g., pollen dispersal with genotype data from seeds and their mother plant), then the gamete dispersal curve can be estimated independently of effective population density by making use of probabilities of identity, and an approximate estimate of effective population density can also be calculated (ROBLEDO-ARNUNCIO et al. 2006). The method of ROBLEDO-ARNUNCIO et al. (2006) is nonequilibrium in the sense that it does not model mutation and it estimates dispersal in the most recent generation assuming that parents are unrelated. Under a demographic model of admixture of previously separated demes (vs. demes exhibiting continuous migration and drift), computationally intensive Bayesian procedures based on coalescent models have been used to estimate demographic parameters (e.g., the admixture proportion) that are consistent with a set of observed summary statistics, including estimates of FST (ESTOUP et al. 2001; EXCOFFIER et al. 2005). Under the standard coalescent model, only the product parameter uN is estimable unless additional information on u (or N) is available (ESTOUP et al. 2001) or the sampling scheme and demography mimic samples taken from the same deme over different generations (cf. WAPLES 1989; EXCOFFIER et al. 2005). The results from these studies illustrate the challenges of estimating demographic parameters from genetic data.
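The point that FST alone identifies only the product mN can be made concrete with a small sketch. It assumes the familiar infinite-island approximation FST ≈ 1/(4Nm + 1) (the relation named in the title of WHITLOCK and MCCAULEY 1999); the function name `nm_from_fst` is hypothetical and is not part of the method described in this article.

```python
def nm_from_fst(fst):
    """Invert the approximation FST ~ 1 / (4*N*m + 1) for the product N*m.

    Assumption: infinite-island model at equilibrium with small m.
    Only the product N*m is recovered; N and m are not separable here.
    """
    if not 0.0 < fst < 1.0:
        raise ValueError("FST must lie strictly between 0 and 1")
    return (1.0 / fst - 1.0) / 4.0


def fst_from_nm(n, m):
    """Forward version of the same approximation, for comparison."""
    return 1.0 / (4.0 * n * m + 1.0)


# Two very different demographies with the same product N*m = 1
# give the same FST under this approximation.
fst_small_deme = fst_from_nm(20, 0.05)
fst_large_deme = fst_from_nm(200, 0.005)
```

Because both demographies map to the same FST, an estimator built on FST alone cannot distinguish them, which is the gap that joint estimation of m and N is meant to fill.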
The method I describe here requires only a sample from a single generation at a single point in time. It can jointly estimate mutation rate, migration rate, and effective population size; it is relatively simple computationally; and, given the parametric model, it need not make assumptions concerning the values of parameters that might be estimated. At present, however, it has not been developed to accommodate more general demographic situations. It may be possible to extend the method to include other demographic and genetic scenarios, such as a time series of samples (WANG and WHITLOCK 2003), stepping-stone dispersal, more general migration models (e.g., BEERLI and FELSENSTEIN 2001), deme-specific effective population sizes, and other mutation models (e.g., LAI and SUN 2003). More general forms of the model can lead to additional (but still linear) recursions for the probabilities of identity in state, but if the probability of identity within individuals remains different from the probability of identity among individuals within demes in these more general settings, then information may be available to jointly estimate migration rates and effective population sizes in more detailed models.
APPENDIX
Because demographic and genetic measures can depend on the timing of sampling within the life cycle (e.g., WAPLES 1989; CASWELL 2001; VITALIS 2002), I consider two census schemes, a premigration census and a postmigration census. Assuming pre- and postmigration census schemes for dioecious populations with sex-specific dispersal following the infinite-allele mutation model in a finite number of demes, VITALIS (2002) gives recursions for probabilities of identity by descent (premigration, Equation A1.4; postmigration, Equation A1.1; VITALIS 2002) that can be readily modified to obtain the results that follow. Note that the second and third columns in the matrix A following Equation A1.1 in VITALIS (2002) should have terms like (1 – 2/N) consistent with Equation 4 in that article, rather than terms like (1 – 1/N).
First, I consider the premigration census under the infinite-island model. Under the premigration census, the sampling of offspring occurs immediately following reproduction and prior to migration. Let Q1(t), Q2(t), and Q3(t) be the probabilities (summed over k alleles at one locus) that genes within individuals, between individuals within a deme, and between individuals in different demes are the same allele at time t, respectively. Under the premigration census scheme, Equation A1.4 of VITALIS (2002) modified for an infinite number of demes with no mutation and no sex-specific dispersal yields, at temporal equilibrium, the recursions
[Equation images not recovered.]
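As a hedged illustration of how a premigration census can separate m from N, the sketch below uses equilibrium relations that I derive from the verbal description only; they are assumptions and are not the author's printed equations. The assumptions are: monoecious demes of N adults with no selfing, an infinite number of demes (so Q3 is constant), no mutation, and a probability a = (1 - m)^2 that two genes drawn from different adults in a deme both trace to non-migrants. Under these assumptions Q1 = a Q2 + (1 - a) Q3 and Q2 = Q1 + (1 - Q1)/(2N). The function names are hypothetical.

```python
import math


def equilibrium_identities(m, n, q3):
    """Equilibrium Q1 and Q2 under the assumed premigration-census relations.

    Assumed relations (my derivation, not the printed equations):
        Q1 = a*Q2 + (1 - a)*Q3,   with a = (1 - m)**2
        Q2 = Q1 + (1 - Q1) / (2*N)
    """
    a = (1.0 - m) ** 2
    q1 = (a / (2.0 * n) + (1.0 - a) * q3) / (1.0 - a + a / (2.0 * n))
    q2 = q1 + (1.0 - q1) / (2.0 * n)
    return q1, q2


def estimate_m_and_n(q1, q2, q3):
    """Moment-type estimates of m and N from identities in state.

    Uses F_IT = (Q1 - Q3)/(1 - Q3) and F_ST = (Q2 - Q3)/(1 - Q3).
    The assumed relations imply F_IT / F_ST = (1 - m)**2 and
    N = (1 - F_IT) / (2 * (F_ST - F_IT)).
    """
    f_it = (q1 - q3) / (1.0 - q3)
    f_st = (q2 - q3) / (1.0 - q3)
    m_hat = 1.0 - math.sqrt(f_it / f_st)
    n_hat = (1.0 - f_it) / (2.0 * (f_st - f_it))
    return m_hat, n_hat


# Round trip: generate equilibrium identities, then recover m and N.
q1, q2 = equilibrium_identities(m=0.05, n=50, q3=0.3)
m_hat, n_hat = estimate_m_and_n(q1, q2, 0.3)
```

With sample estimates in place of true identities, the ratio F_IT/F_ST can fall outside [0, 1] and F_ST - F_IT can be close to zero, so a practical version would need to handle those cases explicitly.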
Under the postmigration census scheme, Equation A1.1 of VITALIS (2002) modified for an infinite number of demes with no mutation and no sex-specific dispersal yields, at temporal equilibrium, the recursions
[Equation images not recovered.]
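To show why the census scheme matters, the following is a sketch of postmigration-census relations derived under the same assumptions as the verbal description (monoecious demes of N adults, no selfing, infinite demes, no mutation, and a = (1 - m)^2). These are my reconstruction under those stated assumptions and should not be read as the author's printed equations.

```latex
\begin{align}
Q_1(t+1) &= Q_2(t), \\
Q_2(t+1) &= a\left[\frac{1 + Q_1(t)}{2N}
            + \left(1 - \frac{1}{N}\right) Q_2(t)\right]
            + (1 - a)\, Q_3, \qquad a = (1 - m)^2, \\
\text{at equilibrium:}\quad Q_1 &= Q_2
  \;\Longrightarrow\;
  F_{ST} = \frac{Q_2 - Q_3}{1 - Q_3} = \frac{a}{a + 2N(1 - a)} .
\end{align}
```

In this sketch the within-individual and within-deme identities coincide at equilibrium, so the data constrain only the combination 2N(1 - a)/a, which is approximately 4Nm for small m, and m and N cannot be separated.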
Under the finite-island model with a k-allele mutation scheme, the number of demes, s, is finite, and each gene occupies one of k allelic states, mutates with probability u per generation, and, given a mutation event, mutates to each of the other k – 1 alleles with equal probability. Under a premigration census, modifying Equation A1.4 of VITALIS (2002) to assume the finite-island model with a k-allele mutation scheme with no sex-specific dispersal yields, at temporal equilibrium, the recursions
[Equation images not recovered.]
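The k-allele mutation scheme described above has a simple effect on a probability of identity in state, which the sketch below computes directly from the verbal definition: each gene mutates with probability u and, given a mutation, moves to each of the other k - 1 alleles with equal probability. The function names are hypothetical, and the sketch covers only the mutation step, not migration or reproduction.

```python
def identity_after_mutation(q, u, k):
    """Probability two genes are identical in state after one round of
    independent k-allele mutation, given identity q beforehand."""
    # Two identical genes stay identical if neither mutates, or if both
    # mutate and land on the same one of the k - 1 other alleles.
    stay_same = (1.0 - u) ** 2 + u ** 2 / (k - 1)
    # Two different genes become identical if exactly one mutates to the
    # other's allele, or if both mutate to a shared third allele.
    become_same = (2.0 * u * (1.0 - u) / (k - 1)
                   + u ** 2 * (k - 2) / (k - 1) ** 2)
    return q * stay_same + (1.0 - q) * become_same


def identity_after_mutation_closed_form(q, u, k):
    """Equivalent form: identity relaxes toward 1/k at a geometric rate."""
    decay = (1.0 - u * k / (k - 1)) ** 2
    return 1.0 / k + decay * (q - 1.0 / k)


direct = identity_after_mutation(0.4, 0.01, 5)
closed = identity_after_mutation_closed_form(0.4, 0.01, 5)
```

The closed form makes explicit that 1/k is the fixed point of the mutation step, which is one reason the number of allelic states k enters the finite-island estimators.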
In the case of dioecious populations (assuming the locus is not sex-linked), probabilities of identity within and between individuals must be specified for male and female pairs of genes. These are the probabilities that genes are identical in state within female and within male individuals; the probabilities that genes are identical in state between two females, between two males, and between a female and a male within a deme; and the probabilities that genes are identical in state between two females, between two males, and between a female and a male for individuals in different demes (e.g., VITALIS 2002). Under a premigration census, modifying Equation A1.4 of VITALIS (2002) to have the k-allele mutation model with no sex-specific dispersal yields, at temporal equilibrium, the dioecious population recursions
[Equation image not recovered.]
Probabilities of gene identity have been analyzed extensively in the population genetics literature. I briefly summarize previous results in the context of the models that I have presented here. Equation 2-1 of MARUYAMA (1970) and Equation 1 of NEI and FELDMAN (1972) can be derived in the present context by assuming a finite-island model, mutation among an infinite number of alleles, and monoecious populations with random mating [including random selfing; hence, Q1(t) = Q2(t)] under a postmigration census. Equation 78 of NAGYLAKI (1983) with zero selfing can be derived by assuming a finite-island model, mutation among an infinite number of alleles, and monoecious populations with no selfing under a postmigration census. Equation 5 of CROW and AOKI (1984) can be derived by assuming a finite-island model, mutation among k alleles, and monoecious populations with random mating [including random selfing; hence, Q1(t) = Q2(t)] under a postmigration census. Equation 2 of EPPERSON (1999) can be derived by assuming a finite-island model with general between-deme migration rates, no mutation, and monoecious populations with random mating under a postmigration census. Recursions for Q1(t) and Q3(t) [a one-generation recursion for Q2(t) is not presented] in MAYNARD SMITH (1970) can be derived by assuming a finite-island model, mutation among an infinite number of alleles, and monoecious populations with no selfing under a premigration census. Equation 5 of VITALIS and COUVET (2001) with zero selfing can be derived by assuming an infinite-island model, mutation among k alleles, and monoecious populations with no selfing under a premigration census. Equations A1.4 and A1.1 of VITALIS (2002), assuming pre- and postmigration census schemes, respectively, can be derived assuming dioecious populations with sex-specific dispersal following the infinite-allele mutation model in a finite number of demes. Finally, the juvenile life stage recursions of BALLOUX et al. (2003) with no selfing and no clonal reproduction can be derived by assuming a finite-island model, mutation among an infinite number of alleles, and monoecious populations with no selfing under a premigration census. Although many previous studies have analyzed probabilities of gene identity, I am not aware of any study that has identified the connection between the census scheme and the procedures for estimating mutation rate, migration rate, and effective population size as I have outlined them here.
LITERATURE CITED
BALLOUX, F., L. LEHMANN and T. DE MEEÛS, 2003 The population genetics of clonal and partially clonal diploids. Genetics 164: 1635–1644.
BEERLI, P., and J. FELSENSTEIN, 1999 Maximum-likelihood estimation of migration rates and effective population numbers in two populations using a coalescent approach. Genetics 152: 763–773.
BEERLI, P., and J. FELSENSTEIN, 2001 Maximum likelihood estimation of a migration matrix and effective population size in n subpopulations by using a coalescent approach. Proc. Natl. Acad. Sci. USA 98: 4563–4568.
CASWELL, H., 2001 Matrix Population Models: Construction, Analysis, Interpretation. Sinauer Associates, Sutherland, MA.
CROW, J. F., and K. AOKI, 1984 Group selection for a polygenic behavioral trait: estimating the degree of population subdivision. Proc. Natl. Acad. Sci. USA 81: 6073–6077.
EPPERSON, B. K., 1999 Gene genealogies in geographically structured populations. Genetics 152: 797–806.
ESTOUP, A., I. J. WILSON, C. SULLIVAN, J. CORNUET and C. MORITZ, 2001 Inferring population history from microsatellite and enzyme data in serially introduced cane toads, Bufo marinus. Genetics 159: 1671–1687.
EXCOFFIER, L., A. ESTOUP and J. CORNUET, 2005 Bayesian analysis of an admixture model with mutations and arbitrarily linked markers. Genetics 169: 1727–1738.
FONTANILLAS, P., E. PETIT and N. PERRIN, 2004 Estimating sex-specific dispersal rates with autosomal markers in hierarchically structured populations. Evolution 58: 886–894.
HÄNFLING, B., and D. WEETMAN, 2006 Concordant genetic estimators of migration reveal anthropogenically enhanced source-sink population structure in the river sculpin, Cottus gobio. Genetics 173: 1487–1501.
KIMURA, M., and J. CROW, 1964 The number of alleles that can be maintained in a finite population. Genetics 49: 725–738.
LAI, Y., and F. SUN, 2003 The relationship between microsatellite slippage mutation rate and the number of repeat units. Mol. Biol. Evol. 20: 2123–2131.
MARUYAMA, T., 1970 Effective number of alleles in a subdivided population. Theor. Popul. Biol. 1: 273–306.
MAYNARD SMITH, J., 1970 Population size, polymorphism, and the rate of non-Darwinian evolution. Am. Nat. 104: 231–237.
NAGYLAKI, T., 1983 The robustness of neutral models of geographic variation. Theor. Popul. Biol. 24: 268–294.
NEI, M., and M. W. FELDMAN, 1972 Identity of genes by descent within and between populations under mutation and migration pressures. Theor. Popul. Biol. 3: 460–465.
OHTA, T., and M. KIMURA, 1973 A model of mutation appropriate to estimate the number of electrophoretically detectable alleles in a finite population. Genet. Res. 22: 201–204.
PUDOVKIN, A. I., D. V. ZAYKIN and D. HEDGECOCK, 1996 On the potential for estimating the effective number of breeders from heterozygote-excess in progeny. Genetics 144: 383–387.
ROBLEDO-ARNUNCIO, J. J., F. AUSTERLITZ and P. E. SMOUSE, 2006 A new method of estimating the pollen dispersal curve independently of effective density. Genetics 173: 1033–1045.
ROUSSET, F., 2001 Inferences from spatial population genetics, pp. 239–269 in Handbook of Statistical Genetics, edited by D. J. BALDING, M. BISHOP and C. CANNINGS. John Wiley & Sons, New York.
SLATKIN, M., 1985 Gene flow in natural populations. Annu. Rev. Ecol. Syst. 16: 393–430.
SLATKIN, M., 2005 Seeing ghosts: the effect of unsampled populations on migration rates estimated for sampled populations. Mol. Ecol. 14: 67–73.
VITALIS, R., 2002 Sex-specific genetic differentiation and coalescence times: estimating sex-biased dispersal rates. Mol. Ecol. 11: 125–138.
VITALIS, R., and D. COUVET, 2001 Estimation of effective population size and migration rate from one- and two-locus identity measures. Genetics 157: 911–925.
WANG, J., and M. C. WHITLOCK, 2003 Estimating effective population size and migration rates from genetic samples over space and time. Genetics 163: 429–446.
WAPLES, R. S., 1989 A generalized approach for estimating effective population size from temporal changes in allele frequency. Genetics 121: 379–391.
WHITLOCK, M. C., and D. E. MCCAULEY, 1999 Indirect measures of gene flow and migration: FST ≠ 1/(4Nm+1). Heredity 82: 117–125.
WILLIAMSON, E. G., and M. SLATKIN, 1999 Using maximum likelihood to estimate population size from temporal changes in allele frequencies. Genetics 152: 755–761.
WRIGHT, S., 1951 The genetical structure of populations. Ann. Eugen. 15: 323–354.
Communicating editor: R. NIELSEN