Estimation of Effective Population Size and Migration Rate From One and TwoLocus Identity Measures
 ^{*} Laboratoire Génétique et Environnement, Institut des Sciences de l'Évolution de Montpellier, Université Montpellier II, 34095 Montpellier Cedex 05, France
 ^{†} Station Biologique de la Tour du Valat, Le Sambuc, 13200 Arles, France
 ^{‡} Laboratoire Génome, Populations et Interactions, Université Montpellier II, 34095 Montpellier Cedex 05, France
 ^{§} CRBPOMuseum National d'Histoire Naturelle, 75005 Paris, France
 Corresponding author: Renaud Vitalis, Laboratoire de Génétique et Environnement, C.C. 065, Institut des Sciences de l'Évolution de Montpellier, Université de Montpellier II, Place Eugène Bataillon, 34095 Montpellier Cedex 05, France. Email: vitalis{at}isem.univmontp2.fr
Abstract
Standard methods for inferring demographic parameters from genetic data are based mainly on onelocus theory. However, the association of genes at different loci (e.g., twolocus identity disequilibrium) may also contain some information about demographic parameters of populations. In this article, we define one and twolocus parameters of population structure as functions of one and twolocus probabilities for the identity in state of genes. Since these parameters are known functions of demographic parameters in an infinite island model, we develop momentbased estimators of effective population size and immigration rate from one and twolocus parameters. We evaluate this method through simulation. Although variance and bias may be quite large, increasing the number of loci on which the estimates are derived improves the method. We simulate an infinite allele model and a K allele model of mutation. Bias and variance are smaller with increasing numbers of alleles per locus. This is, to our knowledge, the first attempt of a joint estimation of local effective population size and immigration rate.
IN finite populations, genes undergo a random sampling process, known as genetic drift. As a consequence of this process, genetic variation is continuously lost. A powerful way to illustrate this point is that if we could trace the genealogy of genes, going backward in time, we would observe a continuous decrease in the number of “ancestor” genes. For neutral genes, the dynamic of this process depends mostly on the effective population size. Indeed, the expected time elapsed since two genes diverged from their common ancestor (the coalescence time) increases with the effective population size. The effective population size, noted N_{e}, has been defined as the size of an ideal population, in which a given genetic parameter (e.g., the rate of inbreeding) takes the same expected value as in the population under scrutiny (Wright 1931). This suggests that there may be many different effective sizes depending on which parameter is chosen. Indeed, the inbreeding effective size has been defined from the expected rate of increase in homozygosity, and the variance effective size is derived from the expected rate of increase of variance in allele frequency per generation (Crow and Denniston 1988). Ewens (1979) also defined the eigenvalue effective size as the first nonunit eigenvalue of the transition matrix for allelic changes in a population. In the particular case of the WrightFisher model, inbreeding and variance effective sizes take the same value (Crow and Kimura 1970, chap. 8) as does the eigenvalue effective size (Ewens 1982). This model assumes a finite monoecious population of constant size, nonoverlapping generations, random mating, and equal expected contribution of individuals to the next generation (Fisher 1930; Wright 1931).
The conditions under which the different definitions of N_{e} coincide can be understood by considering the relationship between the rate of coalescence and the effective population size. The asymptotic rate at which two genes coalesce (the probability of coalescence) in a single ideal monoecious diploid population of N individuals is 1/(2N). But consider a structured population with different classes of individuals. Those classes may represent groups of individuals differing by their sex, age, stage, social rank, or geographical position. In contrast with unstructured populations, lines of descent may now coalesce at different rates within classes and among classes.
There is a strict relationship between probabilities of identity by descent and coalescence times (Malécot 1975; Slatkin 1991). Indeed, the leading eigenvalue of the transition matrix describing the increase in identity by descent for pairs of genes gives the longterm rate of coalescence of lineages. It is also equal to the leading nonunit eigenvalue of the transition matrix that gives the change in the distribution of allelic states (Whitlock and Barton 1997). This is true for an arbitrarily structured population. Therefore, the longterm rate of coalescence defines an asymptotic effective size that gives the asymptotic rate of increase in inbreeding, gene frequency variance, and the expected rate of change in the distribution of allelic states (Whitlock and Barton 1997).
Mating systems, variance in the reproductive success of individuals, changes in population size through time, skewed sex ratios, and overlapping generations are some factors expected to make the effective size different from the census size (Caballero 1994). The effective population size also affects the efficiency of natural selection in maintaining advantageous mutations or promoting the spread of new ones. A small effective size also reduces the ability of natural selection to purge slightly deleterious alleles. Slightly deleterious mutations are more likely to be fixed, decreasing the mean fitness of the population and, thus, further reducing the effective population size (Lynch and Gabriel 1990; Gabriel and Bürger 1994). This runaway process, described as the mutational meltdown, eventually leads to extinction (Lande 1994; van Noordwijk 1994; Lynch et al. 1995a,b). But despite its importance in predicting population health or inferring past demography, estimation of effective size in natural populations is still a difficult task (Waples 1989; Schwartzet al. 1999).
Since the discrepancy between the census and the effective size of a population depends mainly on the parameters of the mating system and relative reproductive success of individuals, effective population size can be calculated from the direct evaluation of these parameters (Nunney and Elam 1994). However, the demographic data needed to calculate the effective population size are difficult to obtain practically in natural situations (Crow and Denniston 1988). Moreover, singleseason assessments of these parameters are known to overestimate longterm effective population size, since interannual fluctuations in population size are not taken into account (Vucetichet al. 1997). As an alternative to direct evaluation of effective population size from ecological data, indirect estimates of effective size from genetical data have attracted much attention. So far, indirect estimates have been based on the temporal change in allele frequencies (Nei and Tajima 1981; Pollak 1983; Waples 1989; Williamson and Slatkin 1999), the excess of heterozygotes in progeny (Pudovkinet al. 1996; Luikart and Cornuet 1999), and linkage disequilibrium (Hill 1981; Waples 1991; Bartleyet al. 1992).
It is worth noting that all of these approaches assume a single isolated population. However, in natural situations, it is more likely that populations of the same species are connected, to some extent, by gene flow (Slatkin 1987). Indeed, from an evolutionary and ecological perspective, the only way a species can persist in fragmented and changing habitats is to disperse (Hanski and Gilpin 1997). Moreover, patterns of dispersal are of primary importance in preventing or favoring local adaptation (Maynard Smith and Hoekstra 1980). There have been several attempts to estimate gene flow from genetic data (Slatkin and Barton 1989). However, all the approaches based on F_{ST} or the rare allele method of Slatkin (1985) cannot untangle the effect of drift from the migration pattern (Slatkin and Barton 1989). Indeed, gene flow is estimated as N_{e}m, the product of effective population size and immigration rate, and is referred to as the “effective number of migrants” per generation. Even more powerful methods, based on coalescent theory, have the same drawback (Slatkin and Maddison 1989; Beerli and Felsenstein 1999). Tufto et al. (1996) developed a maximumlikelihood method to estimate patterns of migration from allele frequency distribution. Their method provides estimates for short and long range migration rates, but with the effective population size treated as a known parameter.
Here, we propose a methodofmoments approach to infer effective population size on the one hand and immigration rate on the other hand, based on estimates of functions of one and twolocus identity probabilities. Onelocus parameters are functions of first and second moments of allele frequencies. Twolocus identity probabilities are functions of higherorder moments, up to the fourth (Ohta 1982a,b). Onelocus identity probabilities are influenced primarily by genetic drift and local immigration. Twolocus identity probabilities are also influenced by drift and migration, through the formation of gametic disequilibria (Weir and Cockerham 1969; Cockerham and Weir 1977; Ohta 1982a,b; Tachida and Cockerham 1986). Thus, the joint analysis of one and twolocus identity probabilities provides more information on the parameters of interest than the analysis of only onelocus parameters.
In a companion article, we derived the expected values for twolocus probabilities for identity in state in an island model with partial selfing (Vitalis and Couvet 2001). We defined a twolocus parameter, which we call the “withinsubpopulation identity disequilibrium,” as the excess of twolocus identity probabilities over the product of singlelocus probabilities among individuals within subpopulations. Here, we first recall the definitions of one and twolocus identity probabilities and derive the expectation for onelocus parameters. We then define appropriate statistics for estimating one and twolocus parameters, whose expectations depend on the population parameters of interest (effective size N_{e}, migration rate m) and not on some nuisance parameters, such as the mutation rate or the model of mutation. Then, we present a simple methodofmoments joint estimation of effective size and migration rate. We further explore the reliability of N_{e} and m estimators by means of stochastic simulations.
THEORY
Onelocus population structure: The expectation of any descriptive statistic of population genetic structure can ultimately be expressed as a function of some probabilities of identity in state for pairs of genes taken at different hierarchical levels (within individuals, among individuals within a population, among populations).
Let us consider an infinite island model of population structure (Wright 1940). This model assumes that the whole population (or the species) is subdivided into an infinite number of subpopulations that exchange genes. We may relax the usual assumption of equal subpopulation sizes and immigration rates and then define as many sets of parameters as there are subpopulations. A focal subpopulation i has N_{e}_{i} diploid individuals and receives m_{i} migrant genes per generation. We define Q_{0,i} as the probability that two homologous genes randomly sampled in one individual from subpopulation i are identical in state (IIS), Q_{1,i} as the IIS probability for two genes randomly sampled in subpopulation i, and Q_{2} may also be defined as the IIS probability for two genes randomly sampled in the pool of immigrants.
Let us first assume an infinite allele model (IAM). In this model, mutation always creates a new allelic state in the population. Therefore, genes that are IIS are also identical by descent (IBD), i.e., two exact copies (without mutation) of the same ancestral gene (Malécot 1948). Let a_{i} = (1 — m_{i})^{2} be the frequency of pairs of genes that come from the same subpopulation in the previous generation. Each generation, some offspring are produced by selfing. We define s as the conditional probability that two homologous genes of one individual were produced by the same individual, given that they are copies of genes from one subpopulation in the previous generation. Mutations arise at rate μ, and ν = (1 — μ)^{2}. The recursion equations for IBD probabilities in the IAM are given by
We also derived the recursion equations for IIS probabilities in the K allele model (KAM). In this model, there are a finite number (K) of possible allelic states at a locus. Since the unconditional probability of mutation is μ, each gene mutates toward a particular allele with probability μ/(K — 1). Thus, genes that were IIS in the previous generation are still IIS in the current generation with probability ν′ = (1 — μ)^{2} + μ^{2}/(K — 1). Also, genes that were different in state in the previous generation can become IIS in the current generation with probability ψ = (1 — ν′)/(K — 1). Therefore, recursion equations for the IIS probabilities in the KAM are given by
Twolocus population structure: The onelocus theory for IIS probabilities can be extended to the twolocus case (Cockerham 1984; Goodnight 1987, 1988). In a random mating population, three twolocus IIS probabilities need to be defined (see Whitlocket al. 1993) for pairs, triplets, and quadruplets of sampled haplotypes (we call a haplotype a set of two genes taken at two distinct loci, which is inherited from a single parent, after recombination). However, in any case of departure from random mating, more coefficients must be defined, since IIS probabilities defined for pairs, triplets, or quadruplets of haplotypes may have different expectations whether these haplotypes are sampled in two, three, or four individuals. For this purpose, we defined a total of 10 IIS twolocus probabilities (see Vitalis and Couvet 2001, for further details). Figure 1 depicts the definition of twolocus IIS probabilities.
For example, for twoindividual IIS probabilities, φ is defined as the probability that two haplotypes sampled in two distinct individuals are IIS at the two loci considered. γ is defined as the probability that, when a single haplotype of one individual and the two haplotypes of a second individual are sampled, both pairs of homologous genes taken among the two individuals are IIS. And δ is the probability that, when both haplotypes of each individual are sampled, one pair (across individuals) is IIS at the first locus, and the other (distinct) pair is IIS at the second locus. See Figure 1 and Vitalis and Couvet (2001) for the definition of other IIS probabilities.
In a diploid population, the gametic phase is usually not known. Therefore, we define the compound IIS twolocus probability for two pairs of genes, each taken at two distinct loci, among individuals in a population as
It follows also from this graph that, besides any consideration about the variance of the estimates, large effective sizes would be difficult to estimate, since the amplitude of the variation of η_{S} with effective population size decreases as the effective size increases. Moreover, the amplitude of variation depends on F, suggesting that the efficiency of estimation should vary with the population genetic diversity. The joint analysis of one and twolocus identity probabilities for unlinked loci should thus permit the estimation of effective population size, provided this parameter is not of too high an order. Obtaining reliable estimates for large population sizes would require tightly linked loci (see also Hill and Weir 1994).
ESTIMATION
Onelocus statistics: For any given allele u, we define the indicator variable x_{ijku} to describe the state of the kth gene, with k = (1, 2), of the jth individual in the ith subpopulation. x_{ijku} = 1 if the allelic state is u, x_{ijku} = 0 otherwise. Let
Twolocus statistics: Now, x_{iju} is the indicator variable that describes the state of gene j in individual i at a first locus and y_{ijv} is the indicator variable that describes the state of gene j in individual i at a second, distinct, locus. For the sake of clarity, all subpopulation indices are dropped in this section. x_{iju} = 1 if the allelic state at the first locus is u, x_{iju} = 0 otherwise; and y_{ijv} = 1 if the allelic state at the second locus is v, y_{ijv} = 0 otherwise. Let
However, η_{S} depends on the underlying mutation model (Table 1). There has been some debate in the literature about the dependence on some measures of gametic disequilibrium of the underlying allellic frequencies (Hedrick 1987; Lewontin 1988). At the origin of this dispute is the fact that Hedrick (1987) considered allelic frequencies as fixed parameters. An alternative approach is to consider allelic frequencies as random variables, whose distribution depends on some parameters of the population model. In this sense, we expect that a good measure of disequilibrium shall satisfy the following criteria: (i) The parameter depends on the population parameters of interest, and not on any other parameter; (ii) the expectation of an estimator of this quantity, taken over replicates of the stochastic process of drift, depends only on the parameters of interest (unbiased statistic); and (iii) the distribution of this unbiased estimator depends on the parameters of interest, and not on any other parameter. This allows the measures to be compared across loci or populations, as well as to be pooled over loci.
Various parameters of gametic disequilibrium among loci i and j have been standardized with the product of the HardyWeinberg heterozygosities or nonidentities (Hedrick 1987; Ohta 1980),
Estimators for F and η′_{S} have been evaluated through the simulation of a population in an infinite island model. Several mutation models (from the twoallele model to the infinite allele model) were compared. For five alleles and more, the mean values for the parameters are close to their expectations and do not depend on the mutation model (Table 2). This agrees with the criteria given above. The effect of sample size is more pronounced for identity disequilibrium measures than for onelocus parameter measures. Indeed, increasing the sample size decreases the variance among η′_{S} estimates.
Estimation of N_{e} by the methodofmoments: For a focal population within an infinite island model, we have two statistics F̂ and
We solve numerically a system of two simultaneous
equations, with two unknowns, as follows. For a wide range of N_{e} values starting from N_{e} = 2, Equation 6 is solved for m, with F = F^_{.} Then, for each pair of N_{e} and m values, the expected value of
ASSESSING THE METHODOFMOMENTS ESTIMATOR
Simulation procedure in an infinite island model: We evaluated the method of inference through simulations. We focused our analysis on a focal population receiving migrants from an infinitely large number of populations. In practice, a single local population consisting of N_{e} diploid individuals was simulated. Gametic dispersal occurred prior to reproduction. Under the IAM, at every locus, immigrant individuals carried alleles that were absent from the local population. Therefore, the probability to draw IIS genes from two different populations (Q_{2}) was considered to be zero in the IAM and Q_{2} = 1/K in the KAM. In both cases, Q_{2} was taken as a fixed parameter when calculating statistics for the population parameters. Local random mating was assumed in all the results presented thereafter. For each set of simulations, the initial population was formed with 2N_{e} different alleles at each locus. For each set of parameters, the results from 10 independent simulations were pooled. In each simulation replicate, the first measure was realized after 1000 generations, and then every 100 generations, to avoid temporal correlation across samples, for 100 times. Fifty individuals were sampled for each measure. Indeed, since the identity disequilibrium may be very small in many circumstances, reasonably large sample sizes are required. However, this sample size is representative of those found in the literature (see Williamson and Slatkin 1999, and references therein).
Estimation efficiency was also assessed by examining the percentage of successful inferences as compared with unsuccessful ones. Indeed, among the 1000 identity measures obtained for a set of simulations, there were some cases in which reliable N_{e} estimates could not be inferred. Overall, we distinguished three cases.
Case 1: For a given F value, the
Case 2: The estimate for
Case 3: The estimation of N_{e} is reliable [lying in the interval (2, 10 × N_{e})].
Simulation results: Concerning the estimation of effective population size, a first source of bias is expected to result from the following. One and twolocus identity parameters are inversely related to effective population size. As a consequence, even if estimates of parameters are unbiased, effective population size estimates may be biased, since the expectation of an inverse function is not the inverse of the expectation of the function, as discussed in Cockerham and Weir (1993). This is why we present harmonic rather than arithmetic means of N̂_{e} estimates.
In the IAM, harmonic means for N̂_{e} may be underestimated (43.32 for N_{e} = 50 and 70.02 for N_{e} = 100) when 4 loci are pooled (Table 3). However, for a greater number of loci, estimates are in very close agreement with true values (Figure 3, Table 3). Indeed, increasing the number of loci reduces the confidence interval, as given by the interval between the 5th and 95th percentiles of the overall distribution of estimates. Increasing the number of loci also increases the percentage of successful inferences (Table 3). Estimations of immigration rates were also in close agreement with the truth. Estimates generally have a high variance, as indicated by the coefficients of variation. However, the variance was substantially reduced when at least 8 loci were sampled. The estimation was very efficient in most cases. In all cases, we note that estimating effective size <2 never occurred (Case 1). Conversely, for 8 loci and less, estimating infinite effective size (Case 2) occurred only when the true effective size was large. For 12 loci and more, the percentage of successful estimation (Case 3) was >95% (Table 3).
In the KAM, harmonic means for N_{e} are biased downward for 8 loci and less, in a more significant way than in the infinite allele model (Figure 3, Table 4). This point is also indicated by a lower percentage of successful inferences (Table 4). The variance was smaller in a 10allele rather than in a 5allele model. Again, increasing the number of loci reduces both the variance and the confidence interval for the estimates. The estimation was efficient (>85% of successful estimates) for 12 loci and more, or for 8 loci with at least 10 alleles. As in the IAM, unsuccessful estimates (Case 1) occurred only when the true effective size was small, and too large estimates (Case 2) occurred only when the true effective size was large. Again, estimates of immigration rates were close to the truth, although they were slightly overestimated in all cases.
We also evaluated the method for various migration rates in an infinite allele model with 8 and 16 independent loci (Table 5, Figure 4). Bias and variance were reduced with a higher effective number of migrants per generation, N_{e}m. Indeed, the dispersion around the mean was large for low migration rate. In this model, lower immigration rates imply also lower levels of gene diversity within subpopulations, and therefore a poorer estimation efficiency.
We obtained a confidence region from the joint distribution of N_{e} and m estimates in the following way: All simulations were run to generate a large number of observations, each of which consisted of a pair of N_{e} and m estimates. These observations were binned to a twodimensional array of size 100 × 100. The bins did not overlap, had the same width, and were evenly distributed in both dimensions. All bin counts were standardized by the total number of counts in the twodimensional array, and the discrete probability distribution was derived. Then, the cells were sorted in order of decreasing probability. Finally, starting from the cells with the highest associated probabilities, cells were sequentially added to the confidence region until the cumulative probability of the whole set of cells obtained was less than or equal to the preliminary fixed qvalue. From this procedure, we obtained for each simulation a region within which a fraction q of the data lay. This confidence region was not constrained to be continuous.
ROBUSTNESS TO MODEL ASSUMPTIONS
Finite number of demes: The reliance on the infinite island model might be seen as a serious drawback of our method. However, developing the theory in a finite island model would necessitate as many as 28 distinct twolocus identity coefficients (instead of 10 here) and would be thus far more complicated. In contrast, we assumed that the number of demes had a small effect on our estimators. Therefore, we chose to test our method on data sets generated from a more realistic population model. For this purpose, we simulated a finite island model. Twenty subpopulations were used, each containing the same number of reproducing individuals and receiving the same number of immigrant haplotypes per generation. As in the infinite island model, a single subpopulation was repeatedly sampled.
In contrast to the infinite island model, migration per se cannot maintain polymorphism within a subpopulation in a finite island model. Therefore, in our simulations, mutations will arise at a sufficient rate to maintain polymorphism. We chose the mutation rate so that the product dN_{e}μ is at least equal to 1. With smaller dN_{e}μ values, not enough variation was maintained at equilibrium. In particular, we used two mutation rates, μ = 10^{—3} and μ = 5 × 10^{—3}, giving dN_{e}μ = 1 and dN_{e}μ = 5 (with N_{e} = 50 and d = 20). These two mutation rates remain in the range of realistic values, as given by Estoup and Angers (1998) for microsatellite markers.
As one might expect, the results depend on the level of genetic variation (Table 5). With μ = 10^{—3}, the bias and variance of the N_{e} and m estimates are always larger than those obtained with the infinite island model. With μ = 5 × 10^{—3}, we obtained results that were in very close agreement with those obtained with the infinite island model. With only eight loci, bias and coefficient of variation can even be smaller with the simulations based on the finite island model. In all cases, however, the bias for N_{e} estimates was positive, although it was always negative when an infinite island model was simulated.
Figure 5 shows the joint distributions of N_{e} and m estimates over different numbers of loci and different effective numbers of migrants per generation, for two different mutation rates. As in the infinite island model, we obtained better estimates with larger numbers of migrants per generation (N_{e}m). With μ = 10^{—3}, even with 16 loci, the distributions of pairwise N_{e} and m estimates are always broader than the distributions obtained with data generated from infinite island model simulations (to compare, see Figure 4). With μ = 5 × 10^{—3} (Table 5, Figure 5B), the bias and variance were very close to the values obtained in the infinite island model. Bias and variance were also greatly reduced when 16 loci were scored (Figure 5). Increasing the sample size up to n = 100 also increases the estimation efficiency: more successful inferences are made, all of which give slightly less biased estimates with less variance.
Departure from equilibrium: Among the methods that aim at inferring population parameters, many rely on the hypothesis of equilibrium between mutation, migration, and drift. Our approach is no exception. To test whether our method was sensitive to recent departures from migration drift equilibrium, we conducted the same simulations as before with an infinite island model of population structure. But this time, individuals were sampled 10, 20, 30, 40, or 50 generations after the initial state. This process was repeated 1000 times.
Surprisingly, after only 50 generations, bias and variance of joint N_{e} and m estimates were as small as after 1000 generations (Figure 6). With eight loci sampled among 50 individuals, N̂_{e} = 52.33 [coefficient of variation (c.v.) = 0.60] and m̂ = 0.0216 (c.v. = 0.45). Estimates were more biased and the variance of the distribution was larger for m than for N_{e} estimates. Moreover, after 30–40 generations, N_{e} estimates showed bias and variance as low as after 1000 generations. In all cases, the joint inference was successful in >99% of cases.
DISCUSSION
Effective population size is directly related to the asymptotic rate of coalescence for neutral genes. Thus, this population parameter determines the rate at which neutral genetic variation is lost from the population. Therefore, any attempt to provide a reliable estimate of effective population size should deserve careful evaluation. Our method is, to our knowledge, the first attempt for a joint estimation of (local) effective population size and immigration rate.
An advantage of using this methodofmoments to estimate N_{e} is that it requires only a single sample. Our estimates of effective population size are in general slightly biased (Figure 3). The dispersion around the mean, as shown by the coefficient of variation as well as by the 95% confidence interval, is large and increases with the true population size (Figure 3, Table 3). Bias and variance are also decreased when the local immigration rate is increased (Figure 4). In this latter situation, the genetic diversity within the population is increased, as well as the gametic disequilibrium, making the estimation procedure more efficient.
However, we show that the bias and variance of effective population size estimates can be substantially reduced when F̂ and
We did not directly compare the results obtained with this method to other tentative estimations. The main reason is that alternative methods do not always estimate the same quantities. Another reason is that the number of required samples may be different. Variance of effective population size has been tentatively estimated from temporal changes in allele frequency (Nei and Tajima 1981; Pollak 1983; Waples 1989). These estimates also exhibited high variance (Waples 1989). Williamson and Slatkin (1999) recently proposed a maximumlikelihood method to estimate effective population size from temporal change in allele frequencies. This improvement of the method, although having smaller bias and lower variance, still exhibited a large variance. However, Williamson and Slatkin (1999) stated that the maximumlikelihood estimate of population size from temporal change in allele frequency should be difficult to compute exactly when there are more than two alleles per locus. Maximumlikelihood methods for highly polymorphic loci could be implemented using Monte Carlo Markov chain computations.
Pudovkin et al. (1996) provided an estimate of the effective number of breeders in a population from the observed excess of heterozygotes in the progeny relative to HardyWeinberg expected proportions in the base population. Any heterozygote excess relative to the expected distribution for random mating populations results from a difference in allele frequencies among different gamete pools. Indeed, in finite diploid populations, a difference in the allele frequencies between male and female uniting gametes is expected from the sampling error due to the finite size of male and female gamete pools and from the difference in allele frequencies among male and female parents (Wang 1996). Luikart and Cornuet (1999) further evaluated the accuracy and precision of this method. This procedure provides reliable estimates for very small numbers of breeders, but is sensitive to the mating system. Moreover, this method does not hold for consanguineous reproductive systems.
Effective population size has also been evaluated from the measure of the variance of the correlation of allele frequencies between pairs of loci (Hill 1981; Waples 1991; Bartleyet al. 1992). In an infinite isolated random mating population at mutationdrift equilibrium, the correlation of (neutral) allele frequencies at a pair of loci is zero. In a finite population, however, genetic drift causes the allele frequencies at independent loci to be correlated. Moreover, the magnitude of this correlation depends on the effective population size (Weir and Hill 1980; Hill 1981; Waples 1991). This method gave discouraging results and has not been evaluated through simulations.
Finally, none of these methods allow for migration or population subdivision. They assume single isolated populations that do not receive any migrant individuals from elsewhere. If this hypothesis were false, estimates of effective population size would be biased upward. Moreover, all previous attempts to estimate gene flow could not untangle the effect of local drift from immigration.
In conclusion, we address the issue of obtaining a confidence interval in practical cases. Resampling methods have their drawbacks. For example, the bootstrap does not seem to be adapted to correctly estimating second moments statistics. Estimates of one and twolocus identity probabilities, which are second and fourth moments of allelic frequency, are obtained through the comparison of pairs of haplotypes in a sample. Random sampling with replacement within a sample increases the probability of sampling the same individuals twice. Therefore, bootstrapping over individuals indubitably overestimates the measures of identity among pairs of individuals. So, we suggest obtaining a confidence interval by means of stochastic simulations. Since we suppose the population to be at migrationdrift equilibrium, we can simulate the stochastic process of dispersal and reproduction with estimated values of N_{e} and m used as input parameters. The distribution of new estimates could then be used to build a confidence interval.
Acknowledgments
We thank K. Dawson, G. Luikart, O. Hardy, I. Olivieri, and C. Garza for helpful comments on a previous draft of this manuscript. We thank M. Slatkin and two anonymous reviewers for constructive criticism of this manuscript. R.V. acknowledges financial support from the Training and Mobility of Researchers programme “FRAGLAND” of the European Commission, coordinated by I. Hanski, from contract number BIO4CT961189 of the Commission of the European Communities (DG XII), and from the Fondation Sansouire. This is contribution 2000099 of the Institut des Sciences de l'Évolution de Montpellier.
Footnotes

Communicating editor: M. Slatkin
 Received May 10, 2000.
 Accepted November 9, 2000.
 Copyright © 2001 by the Genetics Society of America