## Abstract

Using the island model of population demography, I report that the demographic parameters migration rate and effective population size can be jointly estimated from equilibrium probabilities of identity in state calculated using a sample of genotypes collected at a single point in time from a single generation. The method, which uses moment-type estimators, applies to dioecious populations in which females and males have identical demography and to monoecious populations with no selfing; it requires that offspring genotypes be sampled following reproduction and prior to migration. I illustrate the estimation procedure using the infinite-island model with no mutation and the finite-island model with three kinds of mutation models. In the infinite-island model with no mutation, the estimators can be expressed as simple functions of estimates of the *F*-statistic parameters *F*_{IT} and *F*_{ST}. In the finite-island model with mutation among *k* alleles, mutation rate, migration rate, and effective population size can be simultaneously estimated. The estimates of migration rate and effective population size are somewhat robust to violations in assumptions that may arise in empirical applications, such as different kinds of mutation models and deviations from temporal equilibrium.

POPULATION geneticists recognize that the demographic characteristics of populations, such as migration rates and population sizes, affect population genetic structure (Wright 1951). Accordingly, many population genetic studies have investigated how demographic properties might be inferred from genetic measurements in populations (*e.g*., Slatkin 1985; Waples 1989; Pudovkin *et al*. 1996; Beerli and Felsenstein 2001; Vitalis and Couvet 2001; Wang and Whitlock 2003; Robledo-Arnuncio *et al*. 2006). In parallel, the cultivation of genomic resources in species that are amenable to field study has facilitated the application of genetic methodologies to estimate demographic rates in natural populations.

The island model of Wright (1951) is an important model in population genetics. Under a simple version of this model, an infinite number of demes, each having population size *N*, exchange migrants at rate *m* under the assumption that migrants into a deme come from any of the other demes with equal probability. In the absence of mutation and other evolutionary forces, genetic polymorphism is maintained within demes via a balance between genetic drift and migration. An important feature of the infinite-island model is that, at temporal equilibrium, the magnitude of genetic differentiation among demes, *F*_{ST}, is approximated by $F_{ST} \approx 1/(1+4mN)$, where the approximation is intended to apply for small values of the migration rate, *m* (Wright 1951). An important statistical consequence of the above result is that the product parameter *mN*, the number of individuals migrating and reproducing per generation, may be estimated using data on *F*_{ST} (Slatkin 1985), but the parameters *m* and *N* cannot be estimated individually in this way. Although the indiscriminate application of the infinite-island model to interpret genetic data in terms of demographic rates has been discouraged (Whitlock and McCauley 1999), the island model continues to support a variety of theoretical and empirical investigations (*e.g*., Vitalis and Couvet 2001; Balloux *et al*. 2003; Hänfling and Weetman 2006).
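As a minimal illustration of why *F*_{ST} alone identifies only the product *mN*, the sketch below evaluates Wright's approximation for two hypothetical demographies that share the same *mN* and then inverts the approximation in the manner of a moment estimator of *mN*. The function names are illustrative only.

```python
def fst_wright(m, n):
    """Wright's (1951) small-m approximation F_ST ~ 1 / (1 + 4mN)."""
    return 1.0 / (1.0 + 4.0 * m * n)


def mn_from_fst(fst):
    """Invert the approximation to obtain a moment estimate of the product mN."""
    return (1.0 / fst - 1.0) / 4.0


# Two hypothetical demographies with the same product mN = 1.
fst_a = fst_wright(0.05, 20)
fst_b = fst_wright(0.10, 10)
# Both give the same F_ST, so only mN (not m and N separately) is recoverable.
print(fst_a, fst_b, mn_from_fst(fst_a))
```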

There is continuing interest in statistical approaches that estimate both migration rate, *m*, and effective population size, *N*, from genetic data, including methods that are applicable to a sample taken from a single generation at a single point in time (Beerli and Felsenstein 2001; Vitalis and Couvet 2001; Wang and Whitlock 2003). Here, I report results that show how the island model of dioecious or monoecious populations can be used to simultaneously estimate migration rate, *m*, and effective population size, *N*, using a sample of selectively neutral markers taken from a single generation at a single point in time. In particular, at temporal equilibrium under the infinite-island model with no mutation, the demographic parameters *m* and *N* can be estimated using data on *F*_{IT} and *F*_{ST} (Wright 1951). At temporal equilibrium under the finite-island model with a *k*-allele mutation scheme, the demographic parameters *m* and *N*, as well as the mutation rate, *u*, can be jointly estimated using data on probabilities of identity in state.

## THE INFINITE-ISLAND MODEL WITH NO MUTATION

I first describe key results for the infinite-island model of a selectively neutral locus with no mutation; theoretical details are in the appendix. Under the infinite-island model with no mutation, an infinite number of demes, each having effective population size *N*, exchange migrants at rate *m*. The results that follow apply to monoecious populations with no selfing (*N* adults) and dioecious populations when males and females have identical demography (*N* adults composed of *N*/2 females and *N*/2 males). This model is appropriate for highly fecund organisms with localized mating, including species of invertebrates, amphibians, fishes, and plants.

Population genetic structure can be characterized using probabilities of gene identity (*e.g*., Maruyama 1970; Maynard Smith 1970; Nei and Feldman 1972; Crow and Aoki 1984; Epperson 1999; Rousset 2001; Vitalis 2002). Accordingly, let *Q*_{1(t)}, *Q*_{2(t)}, and *Q*_{3(t)} be the probabilities (summed over *k* alleles at one locus) that genes within individuals, between individuals within a deme, and between individuals in different demes are the same allele at time *t*, respectively. Hence, *Q*_{1(t)}, *Q*_{2(t)}, and *Q*_{3(t)} are probabilities of identity in allelic state. A key idea in the following theory is that the interpretation of the probabilities of identity can depend on the timing of the sampling of genotypes within the sequence of demographic events that defines the life cycle (Vitalis 2002). I first assume that the sampling of genotypes follows a premigration census in the sense that genotypes are sampled from offspring immediately following reproduction and prior to migration. This kind of sampling is appropriate for highly fecund organisms with localized mating in which many offspring may be available following reproduction for genotyping. Under the premigration census in the infinite-island model with no mutation, the probabilities of identity at temporal equilibrium satisfy

$$
\begin{aligned}
Q_{1(t)} &= (1-m)^2\,Q_{2(t)} + \left[1-(1-m)^2\right]Q_{3(t)},\\
Q_{2(t)} &= \frac{1+Q_{1(t)}}{2N} + \left(1-\frac{1}{N}\right)\left\{(1-m)^2\,Q_{2(t)} + \left[1-(1-m)^2\right]Q_{3(t)}\right\},
\end{aligned}
\tag{1}
$$

where *Q*_{3(t)} does not change over time because there are infinitely many demes and no mutation. Equation 1 is the same as Equation A1.4 of Vitalis (2002) when assuming an infinite number of demes with no mutation and no sex-specific dispersal in the latter. Although recursions similar to Equation 1 have been presented and analyzed (Maynard Smith 1970; Vitalis and Couvet 2001; Vitalis 2002; Balloux *et al*. 2003; see the appendix for details), previous work seems to have overlooked the idea that Equation 1 can be used to jointly estimate *m* and *N*.
Indeed, at temporal equilibrium, the parameters *F*_{IT} and *F*_{ST} are distinct and are given by

$$
F_{IT} = \frac{(1-m)^2}{1+(2N-1)\left[1-(1-m)^2\right]} \approx \frac{1-2m}{1+4mN-2m},
\qquad
F_{ST} = \frac{1}{1+(2N-1)\left[1-(1-m)^2\right]} \approx \frac{1}{1+4mN-2m},
$$

where the approximation omits terms proportional to *m*^{2} [the approximation is given here solely to connect these findings to Wright's (1951) classic result that $F_{ST} \approx 1/(1+4mN)$]. Equation 14 in Vitalis (2002) assuming no mutation, an infinite number of demes, and no sex-specific dispersal is the same as the equation for *F*_{ST} given above, but Vitalis (2002) does not report an expression for *F*_{IT}. Thus, migration rate, *m*, and effective population size, *N*, can be expressed in terms of *F*_{IT} and *F*_{ST}, without approximation, via

$$
m = 1-\sqrt{\frac{F_{IT}}{F_{ST}}},
\qquad
N = \frac{1-F_{IT}}{2\left(F_{ST}-F_{IT}\right)}.
$$

Hence, the above expressions can be used to estimate *m* and *N* via the moment-based estimators

$$
\hat m = 1-\sqrt{\frac{\hat F_{IT}}{\hat F_{ST}}} = 1-\sqrt{\frac{\hat Q_{1(t)}-\hat Q_{3(t)}}{\hat Q_{2(t)}-\hat Q_{3(t)}}},
\qquad
\hat N = \frac{1-\hat F_{IT}}{2\left(\hat F_{ST}-\hat F_{IT}\right)} = \frac{1-\hat Q_{1(t)}}{2\left[\hat Q_{2(t)}-\hat Q_{1(t)}\right]},
\tag{2}
$$

where $\hat F_{IT}$, $\hat F_{ST}$, $\hat Q_{1(t)}$, $\hat Q_{2(t)}$, and $\hat Q_{3(t)}$ denote estimates of *F*_{IT}, *F*_{ST}, *Q*_{1(t)}, *Q*_{2(t)}, and *Q*_{3(t)}, respectively. Methods for estimating *F*_{IT}, *F*_{ST}, *Q*_{1(t)}, *Q*_{2(t)}, and *Q*_{3(t)} are discussed by Rousset (2001). Equation 17 in Vitalis (2002) gives an estimator for sex-specific dispersal rates similar to the estimator of *m* in Equation 2, but, importantly, the former requires estimates of *F*_{ST} from a sequence of samples taken pre- and postmigration, rather than estimates of *F*_{IT} and *F*_{ST} from a single sample as in Equation 2. Fontanillas *et al*. (2004) also give estimators for sex-specific dispersal based on the idea of Vitalis (2002) that require estimates of *F*_{ST} from samples taken pre- and postmigration. Vitalis (2002) and Fontanillas *et al*. (2004) do not report estimators of effective population size.
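The step from Equation 1 to these closed forms can be written out explicitly. Taking the identity-based definitions $F_{IT} = (Q_{1(t)}-Q_{3(t)})/(1-Q_{3(t)})$ and $F_{ST} = (Q_{2(t)}-Q_{3(t)})/(1-Q_{3(t)})$ (assumed here), subtracting $Q_{3(t)}$ from both sides of each line of Equation 1, dividing by $1-Q_{3(t)}$, and using $[(1+Q_{1(t)})/2 - Q_{3(t)}]/(1-Q_{3(t)}) = (1+F_{IT})/2$ gives the first line below; the remaining lines follow by substitution.

$$
\begin{aligned}
F_{IT} &= (1-m)^2 F_{ST}, \qquad F_{ST} = \frac{1+F_{IT}}{2N} + \Bigl(1-\frac{1}{N}\Bigr)(1-m)^2 F_{ST},\\
F_{ST}-F_{IT} &= \frac{1+F_{IT}}{2N} - \frac{F_{IT}}{N} = \frac{1-F_{IT}}{2N}
\;\Longrightarrow\; N = \frac{1-F_{IT}}{2\left(F_{ST}-F_{IT}\right)},\\
F_{ST}\Bigl[1-(1-m)^2\Bigl(1-\frac{1}{2N}\Bigr)\Bigr] &= \frac{1}{2N}
\;\Longrightarrow\; F_{ST} = \frac{1}{1+(2N-1)\left[1-(1-m)^2\right]},\\
m &= 1-\sqrt{F_{IT}/F_{ST}}.
\end{aligned}
$$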

Estimates of *m* and *N* can be calculated from multiple loci by computing $\hat F_{IT}$ and $\hat F_{ST}$ over loci (or, equivalently, $\hat Q_{1(t)}$, $\hat Q_{2(t)}$, and $\hat Q_{3(t)}$ over loci). Interestingly, if offspring genotypes are sampled following migration using a postmigration census scheme (Vitalis 2002), then the recursions for the probabilities of identity differ from those for the premigration census, with the consequence that *F*_{IT} = *F*_{ST}. Hence, the parameters *m* and *N* cannot be jointly estimated in this way using a postmigration census.
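A minimal sketch of how pooled multilocus quantities might be computed from genotype data and fed into the infinite-island estimators $\hat m = 1-\sqrt{(\hat Q_{1(t)}-\hat Q_{3(t)})/(\hat Q_{2(t)}-\hat Q_{3(t)})}$ and $\hat N = (1-\hat Q_{1(t)})/\{2[\hat Q_{2(t)}-\hat Q_{1(t)}]\}$. The data layout (`data[locus][deme]` is a list of diploid genotypes) and all function names are hypothetical. It uses naive pairwise counting with ratio-of-sums pooling over loci, an illustrative choice rather than the estimators discussed by Rousset (2001), and applies no truncation to the estimates.

```python
from itertools import combinations
from math import sqrt


def pair_identity(g1, g2):
    """Probability that one gene drawn from each of two individuals is the same allele."""
    return sum(a == b for a in g1 for b in g2) / 4.0


def identity_sums(deme_list):
    """Per-locus sums and pair counts for Q1 (within individuals),
    Q2 (between individuals, same deme), and Q3 (between demes)."""
    s1 = n1 = s2 = n2 = s3 = n3 = 0.0
    for d, deme in enumerate(deme_list):
        for g in deme:
            s1 += g[0] == g[1]  # homozygote: the two genes are identical
            n1 += 1
        for g, h in combinations(deme, 2):
            s2 += pair_identity(g, h)
            n2 += 1
        for other in deme_list[d + 1:]:
            for g in deme:
                for h in other:
                    s3 += pair_identity(g, h)
                    n3 += 1
    return s1, n1, s2, n2, s3, n3


def pooled_q(data):
    """Ratio-of-sums estimates of Q1, Q2, Q3 pooled over loci."""
    totals = [0.0] * 6
    for deme_list in data:
        totals = [t + x for t, x in zip(totals, identity_sums(deme_list))]
    s1, n1, s2, n2, s3, n3 = totals
    return s1 / n1, s2 / n2, s3 / n3


def estimate_m_n(q1, q2, q3):
    """Moment estimators of m and N from identity probabilities (no truncation)."""
    m_hat = 1.0 - sqrt((q1 - q3) / (q2 - q3))
    n_hat = (1.0 - q1) / (2.0 * (q2 - q1))
    return m_hat, n_hat


# Toy example: one locus, two demes, two offspring per deme.
locus_a = [[(1, 2), (1, 2)], [(3, 3), (3, 4)]]
q1, q2, q3 = pooled_q([locus_a])
m_hat, n_hat = estimate_m_n(q1, q2, q3)
print(q1, q2, q3, m_hat, n_hat)
```

Summing numerators and denominators across loci before dividing is what makes this a multilocus estimate; with equal sample sizes at every locus it reduces to averaging the per-locus probabilities.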

To verify the recursions in Equation 1, and thus that the estimators in Equation 2 work as intended, I simulated genotype data under the infinite-island model with no mutation for dioecious and monoecious (with no selfing) populations at temporal equilibrium over a range of migration rates (0.02, 0.05, 0.10, and 0.20) and effective population sizes (10, 20, and 50). In the simulations, individual genotypes were tracked forward in time using a Monte Carlo implementation of the probability model defined by the life cycle using a premigration census. For each replicate simulation, diploid genotypes were initialized using random pairs of alleles, the life cycle was iterated until the system reached temporal equilibrium, and offspring genotypes were sampled prior to migration. In advance of the stochastic simulations, I numerically solved the analytical recursions in Equation 1 to identify the number of generations needed for the probabilities of identity to reach equilibrium to a precision of 10^{−4} (four decimal places); 1000 generations is sufficient for all parameter combinations under the infinite-island model. Means of the probabilities of identity calculated over replicate simulations are in close agreement with those calculated from the analytical recursions. The simulations were carried out using 20 independent eight-allele loci (with equally frequent alleles) at which 50 offspring were genotyped from each of 20 demes. Simulated data were combined over loci to calculate $\hat F_{IT}$ and $\hat F_{ST}$. Negative estimates of *N* (equivalent to infinite-valued estimates of *N*) and *m* were set equal to 1000 and zero, respectively.
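The deterministic pre-simulation check can be sketched by iterating the Equation 1 recursion, $Q_{1(t+1)} = (1-m)^2 Q_{2(t)} + [1-(1-m)^2]Q_{3}$ and $Q_{2(t+1)} = (1+Q_{1(t)})/(2N) + (1-1/N)\{(1-m)^2 Q_{2(t)} + [1-(1-m)^2]Q_{3}\}$ with $Q_3$ fixed, from the random-pairing start ($Q_{1(t)} = Q_{2(t)} = Q_{3(t)} = 1/k$ for *k* equally frequent alleles) and counting generations until every probability of identity is within $10^{-4}$ of its equilibrium value. This is a minimal sketch, not the Monte Carlo simulation itself; function names are hypothetical, and the clamp of a negative ratio inside the square root is a guard added for this sketch only.

```python
from math import sqrt


def step(q1, q2, q3, m, n):
    """One generation of the Equation 1 recursion (infinitely many demes, no mutation)."""
    native = (1.0 - m) ** 2                      # both parental genes from natives of the deme
    parents = native * q2 + (1.0 - native) * q3  # identity of genes from two distinct parents
    new_q1 = parents                             # no selfing: an offspring has two distinct parents
    new_q2 = (1.0 + q1) / (2.0 * n) + (1.0 - 1.0 / n) * parents
    return new_q1, new_q2, q3                    # Q3 stays fixed


def equilibrium(m, n, q3):
    """Closed-form equilibrium (Q1, Q2, Q3) of the same recursion."""
    f_st = 1.0 / (1.0 + (2.0 * n - 1.0) * (1.0 - (1.0 - m) ** 2))
    f_it = (1.0 - m) ** 2 * f_st
    return q3 + f_it * (1.0 - q3), q3 + f_st * (1.0 - q3), q3


def generations_to_equilibrium(m, n, k, tol=1e-4, max_gen=100_000):
    """Generations from random pairing (all Q = 1/k) until every Q is within tol."""
    q = (1.0 / k, 1.0 / k, 1.0 / k)
    target = equilibrium(m, n, 1.0 / k)
    for gen in range(max_gen + 1):
        if all(abs(x - y) < tol for x, y in zip(q, target)):
            return gen
        q = step(*q, m, n)
    raise RuntimeError("recursion did not converge within max_gen generations")


def truncated_estimates(q1, q2, q3, n_cap=1000.0):
    """Moment estimates of m and N with negative N set to n_cap and negative m set to 0."""
    ratio = max((q1 - q3) / (q2 - q3), 0.0)
    m_hat = max(1.0 - sqrt(ratio), 0.0)
    n_hat = (1.0 - q1) / (2.0 * (q2 - q1)) if q2 > q1 else n_cap
    return m_hat, n_hat


for m in (0.02, 0.05, 0.10, 0.20):
    for n in (10, 20, 50):
        print(m, n, generations_to_equilibrium(m, n, k=8))
```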

The simulations of the infinite-island model show, given sufficient data collected using a premigration census, that estimates of migration rate and effective population size using Equation 2 are close to their true values for both dioecious (Figure 1A; Table 1) and monoecious populations (Figure 1B; Table 1). The precision of the estimates of *N* decreases with increasing *N*, and the precision of the estimates of *m* decreases with increasing *m* and *N*. Additional simulation results are in supplemental Table S1 at http://www.genetics.org/supplemental/.

## THE FINITE-ISLAND MODEL WITH *k*-ALLELE MUTATION

I now describe key results for the finite-island model with mutation following the *k*-allele mutation model; theoretical details are in the appendix. Under the finite-island model, *s* demes, each having effective population size *N*, exchange migrants at rate *m*, and genes can mutate into other alleles after gamete production according to the *k*-allele mutation model. The results that follow again apply to monoecious populations with no selfing and dioecious populations when males and females exhibit identical demography (including identical rates of mutation).
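For reference, the symmetric *k*-allele model (a mutating gene moves to each of the other *k* − 1 alleles with probability *u*/(*k* − 1)) acts linearly on any probability of identity when the two genes mutate independently. Writing the single-gene transition matrix as $rI + (1-r)J/k$, with $r = 1 - ku/(k-1)$ and $J$ the all-ones matrix, gives the relation below; the names $U_1$ and $U_2$ for its coefficients are the notation assumed here.

$$
Q' = U_1\,Q + U_2,
\qquad
U_1 = \Bigl(1-\frac{ku}{k-1}\Bigr)^2,
\qquad
U_2 = \frac{1-U_1}{k}.
$$

As $k \to \infty$, $U_1 \to (1-u)^2$ and $U_2 \to 0$, which is the infinite-allele recursion; for example, with $u = 0.0005$ and $k = 10{,}000$, $U_1 \approx 0.9990$ and $U_2 \approx 1.0 \times 10^{-7}$.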

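Under the premigration census in the finite-island model with *k*-allele mutation, the probabilities of identity at temporal equilibrium satisfy

$$
\begin{aligned}
Q_{1(t)} &= U_1\bigl[a\,Q_{2(t)} + (1-a)\,Q_{3(t)}\bigr] + U_2,\\
Q_{2(t)} &= U_1\Bigl\{\frac{1+Q_{1(t)}}{2N} + \Bigl(1-\frac{1}{N}\Bigr)\bigl[a\,Q_{2(t)} + (1-a)\,Q_{3(t)}\bigr]\Bigr\} + U_2,\\
Q_{3(t)} &= U_1\bigl[b\,Q_{2(t)} + (1-b)\,Q_{3(t)}\bigr] + U_2,
\end{aligned}
\tag{3}
$$

where $U_1 = [1-ku/(k-1)]^2$ and $U_2 = (1-U_1)/k$ describe the effect of *k*-allele mutation on identity, and $a = (1-m)^2 + m^2/(s-1)$ and $b = 2m(1-m)/(s-1) + m^2(s-2)/(s-1)^2$ are the probabilities that genes sampled from two distinct individuals in the same deme, or in two different demes, respectively, descend from individuals in the same deme before migration (migrants are drawn with equal probability from the other $s-1$ demes). Equation 3 is the same as Equation A1.4 of Vitalis (2002) when the latter is modified to have *k*-allele mutation (rather than infinite-allele mutation) and no sex-specific dispersal. Although recursions similar to Equation 3 have been presented and analyzed (Maynard Smith 1970; Vitalis and Couvet 2001; Vitalis 2002; Balloux *et al*. 2003; see appendix for details), the subsequent estimation of *u*, *m*, and *N* based on Equation 3 seems not to have been recognized in previous work. Indeed, Equation 3 can be used to jointly estimate the mutation rate, *u*, migration rate, *m*, and effective population size, *N*, by solving the system of equations

$$
\begin{aligned}
\hat Q_{1(t)} &= U_1\bigl[a\,\hat Q_{2(t)} + (1-a)\,\hat Q_{3(t)}\bigr] + U_2,\\
\hat Q_{2(t)} &= U_1\Bigl\{\frac{1+\hat Q_{1(t)}}{2N} + \Bigl(1-\frac{1}{N}\Bigr)\bigl[a\,\hat Q_{2(t)} + (1-a)\,\hat Q_{3(t)}\bigr]\Bigr\} + U_2,\\
\hat Q_{3(t)} &= U_1\bigl[b\,\hat Q_{2(t)} + (1-b)\,\hat Q_{3(t)}\bigr] + U_2
\end{aligned}
\tag{4}
$$

for *u*, *m*, and *N*, where $U_1$, $U_2$, $a$, and $b$ are the functions of *u*, *m*, *s*, and *k* defined above. The values of *u*, *m*, and *N* that satisfy Equation 4, denoted by $\hat u$, $\hat m$, and $\hat N$, respectively, are the respective moment-based estimators of *u*, *m*, and *N*. The estimates $\hat u$, $\hat m$, and $\hat N$ are calculated assuming that the number of demes, *s*, and the number of possible alleles, *k*, are known. Estimates can be calculated from multiple loci by summing the left- and right-hand sides of Equation 4 over loci (*k* may be locus specific in the *U*_{1} and *U*_{2} terms in the right-hand side of Equation 4) and calculating the parameter values that solve the moment equations obtained by setting the left-hand sum over loci equal to the right-hand sum over loci.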
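A minimal sketch of the finite-island equations for a single set of moments, under stated assumptions: mutation enters through $U_1 = [1-ku/(k-1)]^2$ and $U_2 = (1-U_1)/k$, and migration through $a = (1-m)^2 + m^2/(s-1)$ and $b = 2m(1-m)/(s-1) + m^2(s-2)/(s-1)^2$ (migrants drawn equally from the other $s-1$ demes). `equilibrium_q` solves the linear system for given (*u*, *m*, *N*). `estimate_umn` then solves the three moment equations exactly: eliminating $U_1$ between the *Q*_{1(t)} and *Q*_{3(t)} equations leaves one equation in *m* that is monotone on $[0, (s-1)/s]$, so bisection finds $\hat m$, after which $\hat u$ and $\hat N$ follow in closed form. This exact solve is a swapped-in illustration, not the nonlinear least-squares procedure used for the reported results, and it does not handle pooling over loci with locus-specific *k*. All function names, and the settings *s* = 20 and *k* = 8 in the example, are hypothetical.

```python
from math import sqrt


def mutation_coeffs(u, k):
    """U1, U2 for symmetric k-allele mutation, so that Q' = U1 * Q + U2."""
    u1 = (1.0 - u * k / (k - 1.0)) ** 2
    return u1, (1.0 - u1) / k


def migration_coeffs(m, s):
    """a, b: chance that genes from two individuals in one deme (a) or in two
    demes (b) come from a single deme before migration."""
    a = (1.0 - m) ** 2 + m ** 2 / (s - 1.0)
    b = 2.0 * m * (1.0 - m) / (s - 1.0) + m ** 2 * (s - 2.0) / (s - 1.0) ** 2
    return a, b


def det3(x):
    """Determinant of a 3 x 3 matrix given as nested lists."""
    return (x[0][0] * (x[1][1] * x[2][2] - x[1][2] * x[2][1])
            - x[0][1] * (x[1][0] * x[2][2] - x[1][2] * x[2][0])
            + x[0][2] * (x[1][0] * x[2][1] - x[1][1] * x[2][0]))


def equilibrium_q(u, m, n, s, k):
    """Equilibrium (Q1, Q2, Q3) of the finite-island system, by Cramer's rule."""
    u1, u2 = mutation_coeffs(u, k)
    a, b = migration_coeffs(m, s)
    c = 1.0 - 1.0 / n
    mat = [[1.0, -u1 * a, -u1 * (1.0 - a)],
           [-u1 / (2.0 * n), 1.0 - u1 * c * a, -u1 * c * (1.0 - a)],
           [0.0, -u1 * b, 1.0 - u1 * (1.0 - b)]]
    rhs = [u2, u2 + u1 / (2.0 * n), u2]
    d = det3(mat)
    return tuple(
        det3([row[:j] + [rhs[i]] + row[j + 1:] for i, row in enumerate(mat)]) / d
        for j in range(3))


def estimate_umn(q1, q2, q3, s, k, iters=200):
    """Exact moment solution for (u, m, N) from one set of (Q1, Q2, Q3)."""
    d, c1, c3 = q2 - q3, q1 - 1.0 / k, q3 - 1.0 / k

    def h(m):
        # Zero where the Q1 and Q3 equations imply the same U1; decreasing in m.
        a, b = migration_coeffs(m, s)
        return d * (a * c3 - b * c1) - c3 * (q1 - q3)

    lo, hi = 0.0, (s - 1.0) / s
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    m_hat = 0.5 * (lo + hi)
    a, _ = migration_coeffs(m_hat, s)
    x = a * q2 + (1.0 - a) * q3                       # identity of genes from two parents
    u1 = c1 / (x - 1.0 / k)                           # Q1 equation solved for U1
    u_hat = (1.0 - sqrt(u1)) * (k - 1.0) / k          # invert U1 = (1 - ku/(k-1))^2
    n_hat = u1 * ((1.0 + q1) / 2.0 - x) / (q2 - q1)   # Q2 equation solved for N
    return u_hat, m_hat, n_hat


q = equilibrium_q(0.0005, 0.05, 20, s=20, k=8)
print(q, estimate_umn(*q, s=20, k=8))
```

The round trip at the end recovers the generating values, which is a useful check that the equilibrium solver and the estimator encode the same system.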

To verify the recursions in Equation 3, and thus that the estimators $\hat u$, $\hat m$, and $\hat N$ based on Equation 4 work as intended, I simulated genotype data under the finite-island model with *k*-allele mutation for dioecious and monoecious (with no selfing) populations at temporal equilibrium over a range of migration rates (0.02, 0.05, 0.10, and 0.20) and effective population sizes (10, 20, and 50) at three different mutation rates (0.001, 0.0005, and 0.0001). These values for the mutation rate are consistent with values used in similar simulation studies (*e.g*., Vitalis and Couvet 2001; Wang and Whitlock 2003; Excoffier *et al*. 2005) and estimates from empirical data (Estoup *et al*. 2001; Lai and Sun 2003; Excoffier *et al*. 2005). Simulations were executed as described above for the infinite-island model: individual genotypes were tracked forward in time using a Monte Carlo implementation of the probability model defined by the life cycle using a premigration census for *s* demes. I numerically solved the analytical recursions in Equation 3 to identify a sufficient number of generations for the system to reach equilibrium to a precision of 10^{−4} (5000 generations for *u* = 0.001, 6000 generations for *u* = 0.0005, and 15,000 generations for *u* = 0.0001). Models with smaller values of *u* and *m* and larger values of *N* require more generations to reach equilibrium. Means of the probabilities of identity calculated over replicate simulations are in close agreement with those calculated from the analytical recursions. The simulations were carried out using 30 independent eight-allele loci (with equally frequent alleles; hence, *k* = 8) at which 100 offspring were genotyped from each of 20 demes. Simulated data were combined over loci to calculate $\hat Q_{1(t)}$, $\hat Q_{2(t)}$, and $\hat Q_{3(t)}$, and the estimators $\hat u$, $\hat m$, and $\hat N$ defined by Equation 4 were calculated numerically using a nonlinear least-squares procedure in the software application MATLAB (The MathWorks).
Estimates of *u*, *m*, and *N* were constrained according to $0 \le \hat u \le 1$, $0 \le \hat m \le 1$, and $2 \le \hat N \le 1000$, respectively, to obtain realistic estimates and to account for the possibility of infinite-valued estimates of *N* (*cf*. Waples 1989; Williamson and Slatkin 1999; Wang and Whitlock 2003).

The simulations of the finite-island model show, given sufficient data collected using a premigration census, that estimates of migration rate and effective population size using Equation 4 are close to their true values for both dioecious (Figure 2, A–C; Table 2) and monoecious populations (Figure 2D; Table 2). The precision of the estimates of *N* decreases with increasing values of *N*, and the precision of the estimates of *m* decreases with increasing values of *m* and *N*. The precision of the estimates of *m* and *N* decreases with lower mutation rates. Estimates of mutation rate, given sufficient data, are similarly close to their true values (Table 2). The precision of the estimates of *u* decreases with decreasing values of *u* and increasing values of *N*. Additional simulation results are in supplemental Table S2 at http://www.genetics.org/supplemental/.

Examples of the sampling distributions of the parameter estimates are shown in Figure 3 for dioecious populations with *u* = 0.0005, *m* = 0.05, and *N* = 20 for samples of 100 genotypes from 20 demes genotyped at 30 eight-allele loci. The distributions of $\hat u$ and $\hat m$ are approximately symmetrical (Figure 3, A and B), and the distribution of $\hat N$ is positively skewed (Figure 3C). The estimates of *m* and *N* are strongly negatively correlated and fall along the line defined by $\hat m \hat N = mN$ (Figure 3D). The estimates of *u* and *m* are positively correlated, and the estimates of *u* and *N* are negatively correlated (these correlations are weaker than the correlation between $\hat m$ and $\hat N$). These results suggest that the product parameter *mN* is well estimated, but that the individual estimates of *u*, *m*, and *N* are more difficult to identify precisely from data.

In empirical situations, *k*, the number of possible alleles at a locus, might be expected to vary across loci. In this case, estimates of *u*, *m*, and *N* may be constructed from the moment equations defined by summing the left- and right-hand sides of Equation 4 over loci and setting the left-hand sum equal to the right-hand sum. Simulations of dioecious populations in the finite-island model (100 individuals genotyped from each of 20 demes) with *k*-allele mutation (*u* = 0.0005) for 30 loci having a random mixture of 4, 8, or 12 possible alleles show, given sufficient data collected using a premigration census, that estimates of mutation rate, migration rate, and effective population size are close to their true values and have similar properties to those calculated when all loci have the same number of possible allelic states (simulation results are in supplemental Table S3 at http://www.genetics.org/supplemental/).

## ROBUSTNESS OF THE ESTIMATION PROCEDURES

The extent to which a statistical procedure yields useful parameter estimates typically depends on the appropriateness of the assumptions to a given data set. Accordingly, I assessed the properties of the parameter estimates when some of the assumptions of the models are violated.

I used the *k*-allele mutation model to describe the mutation process in the estimation procedure described above. The infinite-allele model (Kimura and Crow 1964) can be employed using Equation 4 by setting *k* to a large (and effectively infinite) value. An alternative mutation model that may be appropriate for some markers such as microsatellite sequences is the stepwise mutation model (Ohta and Kimura 1973). The stepwise mutation model assigns a length-based ordering to alleles and posits that mutation occurs between allelic states that are adjacent in the ordering. Accordingly, I applied the estimation procedure to data generated using the simulation approach described above for the finite-island model except that I implemented mutation according to two simple kinds of stepwise mutation models: a stepwise mutation model with no bounds on allele length (unbounded stepwise mutation model; Ohta and Kimura 1973) and a stepwise mutation model with lower and upper bounds on allele length (bounded stepwise mutation model; constrained to eight allelic states). I simulated genotype data under the finite-island model with dioecious populations over a range of migration rates (0.02, 0.05, 0.10, and 0.20) and effective population sizes (10, 20, and 50) at a mutation rate of *u* = 0.0005. In each generation, every gene mutates to an adjacent allele with probability *u*, moving to either neighboring allele with equal probability. In the bounded stepwise mutation model, genes at the lower bound mutate toward the upper bound and genes at the upper bound mutate toward the lower bound. The simulations were carried out using 30 independent eight-allele loci (with equally frequent alleles) at which 100 offspring were genotyped from each of 20 demes after 6000 generations.
Simulated data were combined over loci to calculate $\hat Q_{1(t)}$, $\hat Q_{2(t)}$, and $\hat Q_{3(t)}$. I set *k* = 10,000 to mimic the infinite-allele mutation model for parameter estimation using data generated from the unbounded stepwise mutation model, and I set *k* = 8 for parameter estimation using data generated from the bounded stepwise mutation model.
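A sketch of the single-gene mutation step for the two stepwise variants, assuming integer allele labels and reading "genes at the lower bound mutate toward the upper bound" as a one-step move back into the allowed range. The function name and signature are hypothetical; a `random.Random` instance is passed in so that runs are reproducible.

```python
import random


def stepwise_mutate(allele, u, rng, lower=None, upper=None):
    """Return the allele after one generation of stepwise mutation.

    With probability u the gene moves one step; otherwise it is unchanged.
    Unbounded model: lower and upper are None. Bounded model: a gene at a
    bound can only step back into the allowed range.
    """
    if rng.random() >= u:
        return allele                  # no mutation this generation
    if lower is not None and allele == lower:
        return allele + 1              # at the lower bound, move toward the upper bound
    if upper is not None and allele == upper:
        return allele - 1              # at the upper bound, move toward the lower bound
    return allele + (1 if rng.random() < 0.5 else -1)  # either neighbor, equally likely


rng = random.Random(2024)
gene = 4
for _ in range(10):
    gene = stepwise_mutate(gene, 0.5, rng, lower=1, upper=8)  # bounded, eight states
print(gene)
```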

The accuracy (when comparing the medians of replicate estimates to their parametric values) of estimates of migration rate and effective population size for data generated under the stepwise mutation models is similar to that observed under the *k*-allele mutation model (Figure 4; Table 3), despite the fact that the *k*-allele mutation model is the assumed mutation process in Equation 4. In particular, the medians of the estimates of *m* and *N* are close to their respective parametric values. The estimates of *m* and *N* for data generated under the stepwise mutation models (Figure 4; Table 3) exhibit levels of precision similar to, but slightly lower than, the estimates of those parameters for data generated under the equivalent *k*-allele mutation model (Figure 2B; Table 2). Hence, the estimates of *m* and *N* based on Equation 4 are somewhat robust to violations in the assumptions of the mutation model. In contrast, the estimates of mutation rate, when summarized using their medians, are negatively biased (Table 3), suggesting that the mutation rate estimates are sensitive to violations in the assumptions of the mutation model in the estimation procedure. Additional simulation results are in supplemental Table S4 at http://www.genetics.org/supplemental/.

The assumption of temporal equilibrium in the probability of identity recursion equations is used to estimate mutation rate, migration rate, and effective population size. To assess this assumption, I applied the estimation procedure to genotype data simulated under nonequilibrium conditions using the infinite-island model with no mutation and the finite-island model with mutation for dioecious populations with parameter values *m* = 0.05 and *N* = 20. For the finite-island model, I simulated data under the *k*-allele mutation model, the unbounded stepwise mutation model, and the bounded stepwise mutation model using a mutation rate of *u* = 0.0001 (the value of the mutation rate requiring the most generations to reach temporal equilibrium). At these parameter values, the infinite-island model is very close to temporal equilibrium after 50 generations, whereas the finite-island model with *k*-allele mutation requires ∼10,000 generations to reach equilibrium to four decimal places. Diploid genotypes were initialized using random pairs of alleles, the life cycle was iterated for 10, 20, 50, 100, or 200 generations, and genotypes were then sampled prior to migration (100 offspring genotyped at 30 eight-allele loci in each of 20 demes).

The accuracy of the estimates of mutation rate, migration rate, and effective population size, measured using the median of replicate estimates relative to their parametric values, increases as the number of generations increases from 10 to 200 for both the infinite- and the finite-island models (Figure 5). The estimates of *u* under the finite-island models are strongly positively biased at 10–200 generations (Figure 5A). In contrast, the estimates of *m* under the infinite-island model and the finite-island model with unbounded stepwise mutation are positively biased at 10 generations, but are nearly unbiased at ≥20 generations (Figure 5B). The estimates of *m* under the finite-island models with *k*-allele and bounded stepwise mutation models are negatively biased, but the bias is relatively small, especially at ≥100 generations (Figure 5B). The estimates of *N* exhibit the least bias among the estimated parameters: they are essentially unbiased in the infinite-island model and the finite-island model with unbounded stepwise mutation at ≥10 generations, and in the finite-island models with *k*-allele and bounded stepwise mutation models they show a small negative bias after 10 and 20 generations and very little bias at ≥50 generations (Figure 5C). The precision (measured via 5th and 95th percentiles in Figure 5) of the estimates of *m* and *N* is high relative to estimates of those parameters under equilibrium conditions (see supplemental Table S2 at http://www.genetics.org/supplemental/: *u* = 0.0001, *m* = 0.05, *N* = 20). Hence, the estimation procedures do not require equilibrium conditions to provide reasonable estimates of migration rate and effective population size, and the nonequilibrium conditions explored here actually increase the precision of these estimates. In contrast, the nonequilibrium conditions examined here result in inaccurate estimates of the mutation rate.
Simulations suggest that at least 4000 generations under the finite-island model with *k*-allele mutation are required to obtain accurate estimates of the mutation rate when *u* = 0.0001, *m* = 0.05, and *N* = 20.

The estimation of parameters using Equation 4 assumes that *k*, the number of possible allelic states at a locus, is known. In practice, *k* might be estimated using the total number of alleles observed in all of the data for each locus, and there would be uncertainty in its value. Accordingly, using data generated under the finite-island models (100 offspring genotyped at 30 eight-allele loci in each of 20 demes after 6000 generations; hence *k* = 8) with *k*-allele mutation, unbounded stepwise mutation, and bounded stepwise mutation (*u* = 0.0005, *m* = 0.05, *N* = 20), I calculated estimates of mutation rate, migration rate, and effective population size using values for *k* in Equation 4 equal to 2, 4, 8, 16, and 100.

The medians of the estimates of migration rate and effective population size are close to their parametric values over the range of assumed values of *k* (2, 4, 8, 16, and 100), but the medians of the estimates of mutation rate deviate from the parametric values for most of the assumed values of *k*, with most cases exhibiting negative bias. The precision (measured via 5th and 95th percentiles) of the estimates of *m* and *N* is similar over the range of assumed values of *k* with the exception that estimates of *m* are slightly less precise for *k* = 2 under the *k*-allele mutation model. Estimates of *u* for *k* = 2 are quite variable and some are very near zero; otherwise the precision of the estimates of *u* is similar across the different assumed values of *k*. Hence, estimates of *m* and *N* are robust to uncertainty in the value of *k*, and, in contrast, estimates of *u* are more sensitive to deviations from the parametric value of *k*.

## DISCUSSION

Population geneticists have actively studied the idea that demographic parameters such as migration rate and effective population size might be estimable from genetic data (*e.g*., Slatkin 1985; Waples 1989; Pudovkin *et al*. 1996; Beerli and Felsenstein 2001; Vitalis and Couvet 2001; Wang and Whitlock 2003; Robledo-Arnuncio *et al*. 2006). Using the classic island model (Wright 1951), I report that migration rate and effective population size can be jointly estimated from probabilities of identity using neutral markers in dioecious or monoecious populations when offspring genotypes are collected prior to migration from a single generation at a single point in time. The life cycle and sampling model are appropriate for highly fecund organisms with localized mating, including species of invertebrates, amphibians, fishes, and plants; hence the method has the potential for broad taxonomic utility.

The estimation procedure works because assuming a dioecious population—or monoecious populations with no selfing—in which offspring genotypes are sampled prior to migration has the consequence that *Q*_{1(t)} ≠ *Q*_{2(t)} and hence provides additional information that is not available for other mating systems and sampling schemes that result in *Q*_{1(t)} = *Q*_{2(t)} (*e.g*., random selfing resulting in random pairing of gametes from all adults during mating, including pairing of gametes from the same adult; Maruyama 1970; Nei and Feldman 1972; Nagylaki 1983; Crow and Aoki 1984; Epperson 1999) or *Q*_{1(t)} very nearly equal to *Q*_{2(t)} (*e.g*., the finite-island model under a postmigration census). Previous studies in population ecology (*e.g*., Caswell 2001) and genetics (*e.g*., Nagylaki 1983; Waples 1989; Vitalis 2002) have recognized that the mating system and/or timing of sampling can affect the interpretation of demographic quantities, but the application of these ideas to the present scenario of joint estimation of migration rate and effective population size using a sample from a single generation collected at a single point in time seems not to have been analyzed in prior investigations. Several studies have examined recursions for probabilities of identity in state that are similar to those used here, but these studies do not identify the estimation procedures developed here. In the appendix, I outline how various recursions for probabilities of identity that have been studied (Maruyama 1970; Maynard Smith 1970; Nei and Feldman 1972; Nagylaki 1983; Crow and Aoki 1984; Epperson 1999; Vitalis and Couvet 2001; Vitalis 2002; Balloux *et al*. 2003) can be derived under the pre- and postmigration census schemes, thus helping to explain the different forms of these equations that occur in the literature. 
Indeed, Vitalis and Couvet (2001) estimate *m* and *N* using probabilities of identity, and their Equation 5 with no selfing is equivalent to the infinite-island model with *k*-allele mutation under a premigration census as defined here; however, Vitalis and Couvet (2001) assume an infinite-island model with mutation among an infinite number of alleles, and they rely on an approximation together with a two-locus identity measure (a fourth-moment quantity) rather than the single-locus quantities *F*_{IT} and *F*_{ST} or *Q*_{1(t)}, *Q*_{2(t)}, and *Q*_{3(t)} (all second-moment quantities).

Like other genetic methods for estimating demographic parameters (*e.g*., Waples 1989; Pudovkin *et al*. 1996; Williamson and Slatkin 1999; Beerli and Felsenstein 2001; Vitalis and Couvet 2001; Wang and Whitlock 2003), the procedure described here will typically require considerable data to recover accurate and precise estimates. The accuracy and precision achieved will, in general, depend on several factors, including the values of the parameters *u*, *m*, and *N*, as well as the number of demes, number of individuals genotyped, number of loci used, and the appropriateness of the model to the empirical system under investigation. Previous studies (Waples 1989; Pudovkin *et al*. 1996; Williamson and Slatkin 1999; Vitalis and Couvet 2001; Wang and Whitlock 2003) have shown that effective population size is more difficult to estimate as *N* increases, and my simulation results are consistent with the findings in these earlier studies. Indeed, my results suggest that *Q*_{1(t)} and *Q*_{2(t)} become increasingly similar as *N* increases, with the consequence that estimates of *N* approach infinity as $\hat Q_{1(t)}$ and $\hat Q_{2(t)}$ become equal. Further, negative estimates of *N* are possible with Equation 2 if $\hat Q_{1(t)} > \hat Q_{2(t)}$. Because effective population size influences genetic quantities via the function 1/*N* in standard models, it is not surprising that larger population sizes are more difficult to estimate with genetic data because this involves estimating an effect of magnitude ∼1/*N*, a small number for even moderately sized *N*. Hence, genetic-based methods for estimating effective population size work best for small populations, and the numbers of loci and individuals genotyped must increase with increasing *N* to maintain a given level of precision (Waples 1989; Pudovkin *et al*. 1996; Williamson and Slatkin 1999; Vitalis and Couvet 2001; Wang and Whitlock 2003).
Accordingly, the method presented here is most likely to be useful for populations with a metapopulation structure defined by many small demes. Detailed guidance on the accuracy and precision of the estimators for specific empirical scenarios can be obtained using simulations. Source code used to simulate data under the infinite-island model with no mutation and the finite-island model with *k*-allele mutation is available at http://www.genetics.org/supplemental/.
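The dependence of the estimate of *N* on the difference between the estimates of *Q*_{1(t)} and *Q*_{2(t)} can be illustrated with a minimal sketch. It assumes the infinite-island model with no mutation under a premigration census, for which the life cycle in the APPENDIX gives *F*_{IT} = (1 − *m*)^{2} *F*_{ST} at equilibrium, with *F*_{IT} = (*Q*_{1(t)} − *Q*_{3(t)})/(1 − *Q*_{3(t)}) and *F*_{ST} = (*Q*_{2(t)} − *Q*_{3(t)})/(1 − *Q*_{3(t)}). Rearranging gives *m* = 1 − (*F*_{IT}/*F*_{ST})^{1/2} and *N* = (1 − *F*_{IT})/[2(*F*_{ST} − *F*_{IT})]. These closed forms are a reconstruction for illustration rather than a quotation of Equation 2, and the function `estimate_from_identities` is hypothetical.

```python
import math


def estimate_from_identities(q1: float, q2: float, q3: float) -> tuple[float, float]:
    """Hypothetical moment estimates of (m, N) from estimated Q1, Q2, and Q3.

    Assumes the infinite-island model with no mutation and a premigration
    census, in which F_IT = (1 - m)**2 * F_ST at equilibrium.
    """
    f_it = (q1 - q3) / (1.0 - q3)  # identity within individuals, relative to Q3
    f_st = (q2 - q3) / (1.0 - q3)  # identity between individuals within demes
    if f_st <= 0.0 or f_it < 0.0:
        raise ValueError("this sketch needs F_ST > 0 and F_IT >= 0")
    m_hat = 1.0 - math.sqrt(f_it / f_st)
    if q1 == q2:
        # F_ST - F_IT = (Q2 - Q1) / (1 - Q3) = 0, so the estimate of N is unbounded
        return m_hat, math.inf
    # The denominator is proportional to Q2 - Q1, so N_hat < 0 whenever Q1 > Q2
    n_hat = (1.0 - f_it) / (2.0 * (f_st - f_it))
    return m_hat, n_hat


# Equilibrium values for m = 0.1 and N = 20, with Q3 = 0.2 chosen arbitrarily
q3 = 0.2
q2 = q3 + (1 - q3) / 8.41          # 8.41 = 2N - (2N - 1)(1 - m)**2
q1 = q3 + (1 - q3) * 0.81 / 8.41   # 0.81 = (1 - m)**2
print(estimate_from_identities(q1, q2, q3))      # close to (0.1, 20.0)
print(estimate_from_identities(0.32, 0.30, q3))  # both estimates are negative here
```

Because the denominator of the estimate of *N* is proportional to the estimated *Q*_{2(t)} − *Q*_{1(t)}, small sampling errors in these two quantities produce large changes in the estimate when *N* is large, and a change of sign when the estimated *Q*_{1(t)} exceeds the estimated *Q*_{2(t)}.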

The simulation results suggest that the estimates of migration rate and effective population size are somewhat robust to violations of the model assumptions. Reasonable estimates of *m* and *N* can be obtained for loci exhibiting stepwise mutation (Ohta and Kimura 1973), under nonequilibrium conditions, or when the number of possible allelic states is not precisely known. In contrast, estimates of the mutation rate, *u*, are sensitive to violations of these assumptions and can be quite biased in these settings. The precision of the estimates of *m* and *N* is higher for data simulated with a high mutation rate and for data simulated under nonequilibrium conditions, suggesting that the procedure works better at higher levels of genetic diversity. The estimation procedure under the finite-island model requires that the number of demes, *s*, be known, but it does not require that all demes be sampled because all demes are identical in island models. In non-island models, the set of demes exchanging migrants generally must be known to estimate migration rates among the demes (Beerli and Felsenstein 2001; Wang and Whitlock 2003; Slatkin 2005).

Many studies investigate the estimation of demographic parameters from genetic data (Slatkin 1985; Waples 1989; Pudovkin *et al*. 1996; Wang and Whitlock 2003; Robledo-Arnuncio *et al*. 2006); however, few methods exist for jointly estimating parameters such as migration rate and effective population size from genetic data collected from a single generation at a single point in time (Beerli and Felsenstein 1999, 2001; Vitalis and Couvet 2001). For example, the product parameter *mN* can be estimated from single-generation data on *F*_{ST} under the infinite-island model (Slatkin 1985), and effective population size alone can be estimated from allele frequencies sampled in two or more generations (Waples 1989) or from heterozygote excess in a single sample of offspring, assuming unrelated parents (Pudovkin *et al*. 1996). Vitalis (2002) and Fontanillas *et al*. (2004) use two samples (one taken before and one after migration) to estimate migration rate alone using *F*-statistics under the infinite-island model with sex-specific dispersal. Extending the idea in Waples (1989), if allele frequencies are sampled over multiple generations in two or more demes, then migration rate and effective population size can be jointly estimated (Wang and Whitlock 2003). Using data from a single generation, the method of Beerli and Felsenstein (2001) estimates the deme-specific product parameters 4*uN* and *m*/*u* for *s* demes under a general migration scheme. It assumes that effective population size is sufficiently large for the coalescent model of genetic drift to be appropriate and that *m* and *u* are sufficiently small for the quantities *mN* and *uN* to remain finite as *N* goes to infinity.
In a two-deme version of their coalescent procedure, Beerli and Felsenstein (1999) initially estimate 4*uN* and *m*/*u* using moment estimators based on the probability-of-identity equations of Nei and Feldman (1972), which can be derived for randomly mating monoecious populations under a postmigration census scheme. The method of Vitalis and Couvet (2001) uses one- and two-locus probabilities of identity to estimate *m* and *N* under the infinite-island model with infinite-allele mutation and random selfing, assuming that *u* = 0 and that *m* is small enough for their approximation to be valid. In a somewhat different demographic scenario that tackles the same issues, if migration occurs via the dispersal of male gametes and genotype data are available from offspring and their mothers (*e.g*., pollen dispersal with genotype data from seeds and their mother plant), then the gamete dispersal curve can be estimated independently of effective population density by using probabilities of identity, and an approximate estimate of effective population density can also be calculated (Robledo-Arnuncio *et al*. 2006). The method of Robledo-Arnuncio *et al*. (2006) is nonequilibrium in the sense that it does not model mutation and estimates dispersal in the most recent generation assuming that parents are unrelated. Under a demographic model of admixture of previously separated demes (*vs*. demes exhibiting continuous migration and drift), computationally intensive Bayesian procedures based on coalescent models have been used to estimate demographic parameters (*e.g*., the admixture proportion) that are consistent with a set of observed summary statistics, including estimates of *F*_{ST} (Estoup *et al*. 2001; Excoffier *et al*. 2005). Under the standard coalescent model, only the product parameter *uN* is estimable unless additional information on *u* (or *N*) is available (Estoup *et al*. 2001) or the sampling scheme and demography mimic samples taken from the same deme over different generations (*cf*. Waples 1989; Excoffier *et al*. 2005). The results from these studies illustrate the challenges of estimating demographic parameters from genetic data.

The method I describe here requires only a sample from a single generation at a single point in time; it can jointly estimate mutation rate, migration rate, and effective population size; it is relatively simple computationally and, given the parametric model, need not make assumptions concerning the values of parameters that might be estimated; but, at present, it has not been developed to accommodate more general demographic situations. However, it may be possible to extend the method to include other demographic and genetic scenarios, such as a time series of samples (Wang and Whitlock 2003), stepping-stone dispersal, more general migration models (*e.g*., Beerli and Felsenstein 2001), deme-specific effective population sizes, and other mutation models (*e.g*., Lai and Sun 2003). More general forms of the model can lead to additional (but still linear) recursions for the probabilities of identity in state, but if the probability of identity within individuals remains different from the probability of identity among individuals within demes in these more general settings, then information may be available to jointly estimate migration rates and effective population sizes in more detailed models.

## APPENDIX

I consider parametric expressions involving *Q*_{1(t)}, *Q*_{2(t)}, and *Q*_{3(t)}, the probabilities of identity in allelic state within one selectively neutral locus at time *t*, in an *s*-deme finite-island model with genetic drift, migration, and mutation. I adopt part of the derivation strategy of Nagylaki (1983) and outline the life cycle for the models that I consider. Starting with *N* diploid, monoecious adults in each of *s* demes, reproduction begins with each adult producing a large (*i.e*., infinite) number of haploid gametes. The allele in each gamete may then mutate into a different allele according to a general mutation model. The reproduction phase is completed by the random pairing of gametes from different adults within demes; no offspring are produced using two gametes from the same adult (*i.e*., no selfing occurs). Offspring then migrate among demes so that, following migration, a fraction *m* of the individuals in a deme are migrants and a fraction 1 − *m* of the individuals are residents. Population regulation completes the life cycle with *N* offspring chosen at random within each deme to compose the adults that will produce the next generation. The life cycle just described is the diploid dispersion life cycle considered by Nagylaki (1983). The equilibrium results that follow also apply to dioecious populations when males and females are equal in number (the populations within demes are regulated to *N*/2 females and *N*/2 males so that the total adult effective population size is *N*), migrate at the same rate, and experience the same mutation model.
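Simulation is the route suggested above for studying accuracy and precision in a particular design. The following is a minimal sketch of the life cycle just described, not the supplemental source code. It assumes that a migrant is born in one of the other *s* − 1 demes chosen uniformly at random, and it draws each adult as a migrant independently with probability *m*, so the number of migrants per deme is binomial rather than exactly *mN*. The helper names (`make_offspring`, `next_generation`, `premigration_sample`, `identity_estimates`) are hypothetical. A premigration sample is drawn from the offspring of the current adults, and *Q*_{1(t)}, *Q*_{2(t)}, and *Q*_{3(t)} are estimated as proportions of identical gene pairs.

```python
import random
from collections import Counter
from itertools import combinations

Genotype = tuple[int, int]  # the two allele labels carried by one diploid individual


def make_offspring(parents: list[Genotype], u: float, k: int,
                   rng: random.Random) -> Genotype:
    """Unite gametes from two distinct adults of one deme (no selfing)."""
    first, second = rng.sample(parents, 2)  # two different adults
    genes = []
    for parent in (first, second):
        allele = rng.choice(parent)  # each gamete carries one of the parent's genes
        if k > 1 and rng.random() < u:
            # k-allele mutation: move to one of the other k - 1 states uniformly
            allele = rng.choice([a for a in range(k) if a != allele])
        genes.append(allele)
    return genes[0], genes[1]


def next_generation(demes: list[list[Genotype]], m: float, u: float, k: int,
                    rng: random.Random) -> list[list[Genotype]]:
    """Reproduction, mutation, migration, then regulation back to N adults per deme."""
    s = len(demes)
    new_demes = []
    for d, adults in enumerate(demes):
        recruits = []
        for _ in range(len(adults)):
            source = d
            if s > 1 and rng.random() < m:
                # Assumption: a migrant is born in one of the other s - 1 demes
                source = rng.choice([j for j in range(s) if j != d])
            recruits.append(make_offspring(demes[source], u, k, rng))
        new_demes.append(recruits)
    return new_demes


def premigration_sample(demes: list[list[Genotype]], n: int, u: float, k: int,
                        rng: random.Random) -> list[list[Genotype]]:
    """Genotype n offspring per deme after reproduction and before migration."""
    return [[make_offspring(adults, u, k, rng) for _ in range(n)] for adults in demes]


def identity_estimates(sample: list[list[Genotype]]) -> tuple[float, float, float]:
    """Estimate Q1 (within individuals), Q2 (between individuals within demes),
    and Q3 (between individuals in different demes) from allele counts."""
    homozygotes = individuals = same2 = pairs2 = 0
    per_deme = []
    for deme in sample:
        counts = Counter(a for ind in deme for a in ind)
        h = sum(a == b for a, b in deme)
        n = len(deme)
        homozygotes += h
        individuals += n
        # identical gene pairs in the deme, minus those within one individual
        same2 += sum(c * (c - 1) // 2 for c in counts.values()) - h
        pairs2 += 2 * n * (n - 1)  # gene pairs between distinct individuals
        per_deme.append((counts, 2 * n))
    same3 = pairs3 = 0
    for (cp, gp), (cq, gq) in combinations(per_deme, 2):
        same3 += sum(c * cq[a] for a, c in cp.items())
        pairs3 += gp * gq
    q3 = same3 / pairs3 if pairs3 else float("nan")
    return homozygotes / individuals, same2 / pairs2, q3


if __name__ == "__main__":
    rng = random.Random(2007)
    s, N, k, u, m = 10, 20, 5, 0.001, 0.1
    demes = [[(rng.randrange(k), rng.randrange(k)) for _ in range(N)]
             for _ in range(s)]
    for _ in range(200):
        demes = next_generation(demes, m, u, k, rng)
    sample = premigration_sample(demes, 20, u, k, rng)
    print(identity_estimates(sample))
```

Regulation is folded into reproduction here: because each deme produces effectively infinitely many offspring, drawing *N* recruits directly, each a migrant with probability *m*, is a convenient shortcut for producing offspring, letting them migrate, and then sampling *N* at random.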

Because demographic and genetic measures can depend on the timing of sampling within the life cycle (*e.g*., Waples 1989; Caswell 2001; Vitalis 2002), I consider two census schemes, a premigration census and a postmigration census. Assuming pre- and postmigration census schemes for dioecious populations with sex-specific dispersal, infinite-allele mutation, and a finite number of demes, Vitalis (2002) gives recursions for probabilities of identity by descent (premigration, Equation A1.4; postmigration, Equation A1.1) that can be readily modified to obtain the results that follow. Note that the second and third columns of the matrix **A** following Equation A1.1 in Vitalis (2002) should contain terms like (1 − 2/*N*), consistent with Equation 4 in that article, rather than terms like (1 − 1/*N*).

First, I consider the premigration census under the infinite-island model. Under the premigration census, the sampling of offspring occurs immediately following reproduction and prior to migration. Let *Q*_{1(t)}, *Q*_{2(t)}, and *Q*_{3(t)} be the probabilities (summed over *k* alleles at one locus) that genes within individuals, between individuals within a deme, and between individuals in different demes are the same allele at time *t*, respectively. Under the premigration census scheme, Equation A1.4 of Vitalis (2002) modified for an infinite number of demes with no mutation and no sex-specific dispersal yields the recursions

$$Q_{1(t+1)} = (1-m)^{2}\,Q_{2(t)} + \left[1-(1-m)^{2}\right]Q_{3(t)},$$

$$Q_{2(t+1)} = \frac{1}{N}\cdot\frac{1+Q_{1(t)}}{2} + \left(1-\frac{1}{N}\right)\left\{(1-m)^{2}\,Q_{2(t)} + \left[1-(1-m)^{2}\right]Q_{3(t)}\right\},$$

with *Q*_{3(t+1)} = *Q*_{3(t)}. At temporal equilibrium, solving for *Q*_{1(t)} and *Q*_{2(t)} yields

$$Q_{1(t)} = Q_{3(t)} + \frac{\left(1-Q_{3(t)}\right)(1-m)^{2}}{2N-(2N-1)(1-m)^{2}}, \qquad Q_{2(t)} = Q_{3(t)} + \frac{1-Q_{3(t)}}{2N-(2N-1)(1-m)^{2}}.$$

Accordingly, the parameters *F*_{IT} and *F*_{ST} (Wright 1951) are given by

$$F_{IT} = \frac{Q_{1(t)}-Q_{3(t)}}{1-Q_{3(t)}} = \frac{(1-m)^{2}}{2N-(2N-1)(1-m)^{2}} \approx \frac{1-2m}{1+2m(2N-1)},$$

$$F_{ST} = \frac{Q_{2(t)}-Q_{3(t)}}{1-Q_{3(t)}} = \frac{1}{2N-(2N-1)(1-m)^{2}} \approx \frac{1}{1+2m(2N-1)},$$

where the approximation omits terms proportional to *m*^{2}. Equation 14 in Vitalis (2002) assuming no mutation, an infinite number of demes, and no sex-specific dispersal is the same as the equation for *F*_{ST} given above, but Vitalis (2002) does not report an expression for *F*_{IT}. Thus, migration rate, *m*, and effective population size, *N*, can be expressed exactly in terms of *F*_{IT} and *F*_{ST} via

$$m = 1-\sqrt{\frac{F_{IT}}{F_{ST}}}, \qquad N = \frac{1-F_{IT}}{2\left(F_{ST}-F_{IT}\right)}.$$

Equation 16 in Vitalis (2002) gives an expression for sex-specific dispersal rates similar to the expression for *m* given here, but, importantly, the former is a function of *F*_{ST} from a sequence of samples taken pre- and postmigration, rather than a function of *F*_{IT} and *F*_{ST} for a single sample as given here. Vitalis (2002) does not report an expression for *N*.
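The premigration equilibrium can be checked numerically. This sketch assumes that the premigration recursions for the infinite-island model with no mutation take the form *x*_{1(t+1)} = (1 − *m*)^{2} *x*_{2(t)} and *x*_{2(t+1)} = (1 + *x*_{1(t)})/(2*N*) + (1 − 1/*N*)(1 − *m*)^{2} *x*_{2(t)}, where *x*_{i(t)} = (*Q*_{i(t)} − *Q*_{3(t)})/(1 − *Q*_{3(t)}) is a hypothetical shorthand, so that the fixed point is (*F*_{IT}, *F*_{ST}). It iterates the recursions from *x*_{1} = *x*_{2} = 0, compares the limit with the closed form 1/[2*N* − (2*N* − 1)(1 − *m*)^{2}] for *F*_{ST}, and then inverts *F*_{IT} and *F*_{ST} to recover *m* and *N*. The function names are hypothetical.

```python
import math


def premigration_equilibrium(m: float, N: float, tol: float = 1e-14,
                             max_iter: int = 1_000_000) -> tuple[float, float]:
    """Iterate the assumed premigration recursions (infinite islands, no mutation).

    x1 = (Q1 - Q3) / (1 - Q3) and x2 = (Q2 - Q3) / (1 - Q3), so the fixed
    point is (F_IT, F_ST).
    """
    a = (1.0 - m) ** 2  # probability that two distinct adults are both residents
    x1 = x2 = 0.0
    for _ in range(max_iter):
        new_x1 = a * x2  # an offspring's genes come from two distinct adults
        # two offspring share a parent with probability 1/N
        new_x2 = (1.0 + x1) / (2.0 * N) + (1.0 - 1.0 / N) * a * x2
        if abs(new_x1 - x1) < tol and abs(new_x2 - x2) < tol:
            return new_x1, new_x2
        x1, x2 = new_x1, new_x2
    raise RuntimeError("recursions did not converge")


def closed_form(m: float, N: float) -> tuple[float, float]:
    """Equilibrium (F_IT, F_ST) obtained by solving the same recursions by hand."""
    a = (1.0 - m) ** 2
    f_st = 1.0 / (2.0 * N - (2.0 * N - 1.0) * a)
    return a * f_st, f_st


def invert(f_it: float, f_st: float) -> tuple[float, float]:
    """Recover (m, N) from equilibrium F_IT and F_ST."""
    m = 1.0 - math.sqrt(f_it / f_st)
    N = (1.0 - f_it) / (2.0 * (f_st - f_it))
    return m, N


f_it, f_st = premigration_equilibrium(0.05, 50)
print(f_it, f_st, closed_form(0.05, 50))
print(invert(f_it, f_st))  # approximately (0.05, 50.0)
```

In this model the round trip from (*m*, *N*) to (*F*_{IT}, *F*_{ST}) and back is exact, which is why a single premigration sample carries information about both parameters.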

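Under the postmigration census scheme, Equation A1.1 of Vitalis (2002) modified for an infinite number of demes with no mutation and no sex-specific dispersal yields the recursions

$$Q_{1(t+1)} = Q_{2(t)},$$

$$Q_{2(t+1)} = (1-m)^{2}\left[\frac{1}{N}\cdot\frac{1+Q_{1(t)}}{2} + \left(1-\frac{1}{N}\right)Q_{2(t)}\right] + \left[1-(1-m)^{2}\right]Q_{3(t)},$$

with *Q*_{3(t+1)} = *Q*_{3(t)}. At temporal equilibrium, solving for *Q*_{1(t)} and *Q*_{2(t)} gives

$$Q_{1(t)} = Q_{2(t)} = Q_{3(t)} + \frac{\left(1-Q_{3(t)}\right)(1-m)^{2}}{2N-(2N-1)(1-m)^{2}}.$$

In this case, because *Q*_{1(t)} = *Q*_{2(t)}, the parameters *F*_{IT} and *F*_{ST} are given by

$$F_{IT} = F_{ST} = \frac{(1-m)^{2}}{2N-(2N-1)(1-m)^{2}} \approx \frac{1-2m}{1+2m(2N-1)},$$

where the approximation omits terms proportional to *m*^{2}. Equation 12 in Vitalis (2002) assuming no mutation and an infinite number of demes is the same as the equation for *F*_{ST} given above, but Vitalis (2002) does not report an expression for *F*_{IT}. Hence, unlike the premigration census, the parameters *m* and *N* cannot be uniquely determined from *F*_{IT} and *F*_{ST} under a postmigration census.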
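The lack of identifiability under a postmigration census can be illustrated numerically. This sketch assumes that the postmigration recursions for the infinite-island model with no mutation take the form *x*_{1(t+1)} = *x*_{2(t)} and *x*_{2(t+1)} = (1 − *m*)^{2}[(1 + *x*_{1(t)})/(2*N*) + (1 − 1/*N*) *x*_{2(t)}], with the hypothetical shorthand *x*_{i(t)} = (*Q*_{i(t)} − *Q*_{3(t)})/(1 − *Q*_{3(t)}). For a chosen *m*, the hypothetical helper `matching_n` returns the *N* that reproduces a given equilibrium value, so two different (*m*, *N*) pairs yield the same *F*_{IT} = *F*_{ST}.

```python
def postmigration_equilibrium(m: float, N: float, tol: float = 1e-14,
                              max_iter: int = 1_000_000) -> tuple[float, float]:
    """Iterate the assumed postmigration recursions (infinite islands, no mutation)."""
    a = (1.0 - m) ** 2  # probability that two sampled offspring are both residents
    x1 = x2 = 0.0
    for _ in range(max_iter):
        new_x1 = x2  # parents are two distinct adults of the same deme
        new_x2 = a * ((1.0 + x1) / (2.0 * N) + (1.0 - 1.0 / N) * x2)
        if abs(new_x1 - x1) < tol and abs(new_x2 - x2) < tol:
            return new_x1, new_x2
        x1, x2 = new_x1, new_x2
    raise RuntimeError("recursions did not converge")


def matching_n(m: float, f: float) -> float:
    """N that reproduces the equilibrium value f for a chosen m (hypothetical helper).

    Solves f = a / (2N - (2N - 1) a) for N, with a = (1 - m)**2.
    """
    a = (1.0 - m) ** 2
    return a * (1.0 / f - 1.0) / (2.0 * (1.0 - a))


f_it, f_st = postmigration_equilibrium(0.1, 20)
n_alt = matching_n(0.2, f_st)  # about 8.34
g_it, g_st = postmigration_equilibrium(0.2, n_alt)
print(f_it, f_st)
print(g_it, g_st)  # same values from a different (m, N) pair
```

Because one equilibrium value constrains only a single combination of *m* and *N*, *F*_{IT} and *F*_{ST} from a postmigration sample cannot separate the two parameters in this model.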

Under the finite-island model with a *k*-allele mutation scheme, the number of demes, *s*, is finite, and each gene occupies one of *k* allelic states, mutates with probability *u* per generation, and, given a mutation event, mutates to each of the other *k* − 1 alleles with equal probability. Under a premigration census, modifying Equation A1.4 of Vitalis (2002) to assume the finite-island model with a *k*-allele mutation scheme and no sex-specific dispersal yields, at temporal equilibrium, a system of recursions for *Q*_{1(t)}, *Q*_{2(t)}, and *Q*_{3(t)}. These equations, being linear, can be solved explicitly for the equilibrium values of *Q*_{1(t)}, *Q*_{2(t)}, and *Q*_{3(t)}; however, because I could not identify a simple form for the resulting expressions, I do not list them here. Under a postmigration census, modifying Equation A1.1 of Vitalis (2002) in the same way yields, at temporal equilibrium, recursions under which *Q*_{1(t)} is nearly equal to *Q*_{2(t)}, because the mutation rate, *u*, is typically very small. Thus, the estimation of *u*, *m*, and *N* under a postmigration census should be difficult.
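Why *Q*_{1(t)} and *Q*_{2(t)} nearly coincide under a postmigration census can be seen without the full recursions. The following derivation is a sketch that assumes mutation acts independently on each gene, as in the *k*-allele scheme just described, and introduces the symbol λ for convenience. A single gene keeps its state with probability 1 − *u* and moves to any given other state with probability *u*/(*k* − 1), so two genes that are identical in state with probability *Q* before mutation are identical afterward with probability *Q*′:

$$\lambda = 1-\frac{ku}{k-1}, \qquad Q' = \frac{1}{k} + \lambda^{2}\left(Q-\frac{1}{k}\right).$$

Under a postmigration census, the two genes of an offspring come from two distinct adults of the same deme, whose genes are identical in state with probability *Q*_{2(t)}, and migration moves whole individuals without changing within-individual identity. Hence

$$Q_{1(t+1)} = \frac{1}{k} + \lambda^{2}\left(Q_{2(t)}-\frac{1}{k}\right),$$

and at temporal equilibrium

$$Q_{2(t)}-Q_{1(t)} = \left(1-\lambda^{2}\right)\left(Q_{2(t)}-\frac{1}{k}\right), \qquad 1-\lambda^{2} = \frac{2ku}{k-1}-\frac{k^{2}u^{2}}{(k-1)^{2}},$$

so the difference between *Q*_{1(t)} and *Q*_{2(t)} is of order *u*.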

In the case of dioecious populations (assuming the locus is not sex-linked), probabilities of identity within and between individuals must be specified separately for female and male genes: two probabilities that genes are identical in state within female and within male individuals; three probabilities that genes are identical in state between two females, between two males, and between a female and a male within a deme; and three corresponding probabilities for two females, two males, and a female and a male in different demes (*e.g*., Vitalis 2002). Under a premigration census, modifying Equation A1.4 of Vitalis (2002) to have the *k*-allele mutation model with no sex-specific dispersal yields, at temporal equilibrium, recursions for these dioecious probabilities of identity. These recursions show that, at temporal equilibrium, the probabilities of identity for dioecious populations are identical to those in monoecious populations (with no selfing) when males and females have identical demography.

Probabilities of gene identity have been analyzed extensively in the population genetics literature. I briefly summarize previous results in the context of the models presented here. Equation 2-1 of Maruyama (1970) and Equation 1 of Nei and Feldman (1972) can be derived in the present context by assuming a finite-island model, mutation among an infinite number of alleles, and monoecious populations with random mating [including random selfing; hence, *Q*_{1(t)} = *Q*_{2(t)}] under a postmigration census. Equation 78 of Nagylaki (1983) with zero selfing can be derived by assuming a finite-island model, mutation among an infinite number of alleles, and monoecious populations with no selfing under a postmigration census. Equation 5 of Crow and Aoki (1984) can be derived by assuming a finite-island model, mutation among *k* alleles, and monoecious populations with random mating [including random selfing; hence, *Q*_{1(t)} = *Q*_{2(t)}] under a postmigration census. Equation 2 of Epperson (1999) can be derived by assuming a finite-island model with general between-deme migration rates, no mutation, and monoecious populations with random mating under a postmigration census. Recursions for *Q*_{1(t)} and *Q*_{3(t)} [a one-generation recursion for *Q*_{2(t)} is not presented] in Maynard Smith (1970) can be derived by assuming a finite-island model, mutation among an infinite number of alleles, and monoecious populations with no selfing under a premigration census. Equation 5 of Vitalis and Couvet (2001) with zero selfing can be derived by assuming an infinite-island model, mutation among *k* alleles, and monoecious populations with no selfing under a premigration census. Equations A1.4 and A1.1 of Vitalis (2002), for pre- and postmigration census schemes, respectively, can be derived by assuming dioecious populations with sex-specific dispersal, infinite-allele mutation, and a finite number of demes.
Finally, the juvenile life stage recursions of Balloux *et al*. (2003) with no selfing and no clonal reproduction can be derived by assuming a finite-island model, mutation among an infinite number of alleles, and monoecious populations with no selfing under a premigration census. Although many previous studies have analyzed probabilities of gene identity, I am not aware of any study that has identified the connection between the census scheme and the procedures for estimating mutation rate, migration rate, and effective population size as I have outlined them here.

## Acknowledgments

I thank Mark Holder, Steve Hudman, John Kelly, Rasmus Nielsen, Bruce Weir, and two anonymous reviewers for assistance, conversations, and/or comments concerning this research. I acknowledge funding from the University of Kansas and the National Science Foundation (DEB 06-09722).

## Footnotes

Communicating editor: R. Nielsen

- Received July 17, 2007.
- Accepted August 13, 2007.

- Copyright © 2007 by the Genetics Society of America