Abstract
The heterozygote-excess method is a recently published method for estimating the effective population size (N_{e}). It is based on the following principle: When the effective number of breeders (N_{eb}) in a population is small, the allele frequencies will (by chance) be different in males and females, which causes an excess of heterozygotes in the progeny with respect to Hardy-Weinberg equilibrium expectations. We evaluate the accuracy and precision of the heterozygote-excess method using empirical and simulated data sets from polygamous, polygynous, and monogamous mating systems and by using realistic sample sizes of individuals (15–120) and loci (5–30) with varying levels of polymorphism. The method gave nearly unbiased estimates of N_{eb} under all three mating systems. However, the confidence intervals on the point estimates of N_{eb} were sufficiently small (and hence the heterozygote-excess method useful) only in polygamous and polygynous populations that were produced by <10 effective breeders, unless samples included > ∼60 individuals and 20 multiallelic loci.
THE effective population size (N_{e}) is an important parameter in evolutionary genetics and conservation biology because it influences the rate of inbreeding and loss of genetic variation. It also influences the efficiency of natural selection in maintaining beneficial alleles and purging deleterious ones. For example, when N_{e} is very small, genetic drift will often be too strong for natural selection to efficiently maintain or purge alleles. Unfortunately, N_{e} has proven very difficult to estimate in natural populations (Waples 1989). Thus, any method with potential for improving our ability to estimate N_{e} deserves thorough evaluation. N_{e} can be estimated using demographic or genetic data. However, demographic methods require information such as variance in reproductive success, which is difficult to obtain for many species. Further, demographic estimates of N_{e} are often higher than the true N_{e} because demographic methods seldom incorporate all of the factors (e.g., skewed sex ratios, change in population size, etc.) that can make N_{e} smaller than the population census size (N_{c}) (Frankham 1995).
The (current) effective population size can be estimated using genetic data and the following four statistical methods: the loss of heterozygosity method (e.g., Harris and Allendorf 1989); the temporal method (Krimbas and Tsaskas 1971; Waples 1989); the linkage disequilibrium method (Hill 1981; Waples 1991); and, most recently, the heterozygote-excess method (Pudovkin et al. 1996). Pudovkin et al. (1996) demonstrated that the heterozygote-excess method estimates the effective number of breeding parents (N_{eb}) with no bias and fair precision when the sample size of progeny is infinite and when gametes combine completely at random, i.e., when all male gametes have an equal probability of combining with all female gametes, as in some polygamous, random-mating species (e.g., marine invertebrates such as shellfish).
Here, we extend the evaluation of Pudovkin et al. (1996) to include finite samples of individuals (n = 15–120), reasonable numbers of polymorphic loci (5–30) with a wide range of allele frequencies, monogamous mating systems where only pair-mating occurs, and polygynous mating systems where only a few males mate with many females (i.e., skewed sex ratios). Our goal is to delineate the conditions under which the heterozygote-excess method will be useful for estimating N_{eb} in natural populations (or in captive populations such as fish hatcheries). In this article, we (i) explain the importance of using an unbiased estimator of the expected heterozygosity (H_{exp}) for calculating Nˆ_{eb} from finite samples of progeny, (ii) quantify the bias of Nˆ_{eb}, (iii) determine the number of loci and individuals that must be sampled to achieve precise estimates of N_{eb}, and (iv) test whether monogamy generates an interfamily Wahlund effect (i.e., heterozygote deficiency) that counteracts the heterozygote excess generated by small N_{eb}. To conduct these evaluations, we use data from simulations and natural populations. We focus on populations with a small N_{eb} (4–100) because a heterozygote excess is generated only when N_{eb} is small.
PRINCIPLE AND METHODS
The principle of the heterozygote-excess method is as follows: When the number of breeders is small, the allele frequencies in males and females will be different due to binomial sampling error. This difference generates an excess of heterozygotes in the progeny relative to the proportion of heterozygotes expected under Hardy-Weinberg equilibrium (Robertson 1965; Rasmussen 1979). The proportion of heterozygotes expected in progeny produced by a small and equal number of females and males can be calculated as (Falconer 1989, p. 67)
Pudovkin et al. (1996) called H′ the proportion of heterozygotes expected to be observed (H_{obs}) in the progeny (given a limited number of parents), whereas the expected proportion of heterozygotes in the base population under Hardy-Weinberg proportions, 2pq, was designated as H_{exp}.
Now Nˆ_{eb} can be estimated as follows:
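The algebra behind such an estimator can be sketched as follows. This is a first-order sketch only: it assumes that the proportional heterozygote excess in the progeny is approximately 1/(2N_{eb}), the symbol D is introduced here purely for illustration, and the published Equations 1 and 2 may contain additional correction terms.

```latex
\begin{align}
H_{obs} &\approx H_{exp}\left(1 + \frac{1}{2N_{eb}}\right), \\
D &= \frac{H_{obs} - H_{exp}}{H_{exp}} \approx \frac{1}{2N_{eb}}, \\
\hat{N}_{eb} &\approx \frac{1}{2D} = \frac{H_{exp}}{2\,(H_{obs} - H_{exp})}.
\end{align}
```

Under this approximation, a deficit of heterozygotes (D < 0) yields a negative estimate, and D close to zero yields a very large one.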
We note that when the sample size of individuals is finite, H_{exp} must be estimated using the following unbiased estimator of 2pq (Nei 1987): 2N(2pq)/(2N − 1), where N is the number of progeny sampled. If 2pq is used instead of the unbiased estimator, Nˆ_{eb} will be a biased estimate of N_{eb} (especially when N is small). In nature, N_{eb} can range from only two to near infinity, and Nˆ_{eb} can be negative (e.g., in the case of an overall deficit of heterozygotes). In our analyses, we treated Nˆ_{eb} as infinite if it was negative or >10,000 (an arbitrary but reasonably large value). If Nˆ_{eb} is infinite, it simply means that the heterozygote-excess signal is obscured by the noise (i.e., sampling error) associated with small samples of loci or individuals.
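A minimal Python sketch of this calculation is given below. It rests on assumptions that are not taken from the text: the first-order approximation Nˆ_{eb} ≈ 1/(2D), with D = (H_{obs} − H_{exp})/H_{exp}, and a simple unweighted mean of D across loci. The function names and the `cap` parameter are hypothetical. Only Nei's correction factor 2N/(2N − 1) and the rule of treating negative or >10,000 estimates as infinite come from the description above.

```python
import math
from collections import Counter


def locus_heterozygosities(genotypes):
    """Return (H_obs, unbiased H_exp) for one locus.

    genotypes: list of (allele, allele) tuples, one per sampled progeny.
    """
    n = len(genotypes)
    h_obs = sum(a != b for a, b in genotypes) / n
    counts = Counter(allele for g in genotypes for allele in g)
    sum_p2 = sum((c / (2 * n)) ** 2 for c in counts.values())
    # Nei's (1987) small-sample correction: multiply by 2N / (2N - 1).
    h_exp = (2 * n) / (2 * n - 1) * (1 - sum_p2)
    return h_obs, h_exp


def estimate_neb(loci, cap=10_000):
    """First-order estimate of N_eb from a list of per-locus genotype lists.

    ASSUMPTION: N_eb ~ 1 / (2 * mean D), with D averaged over loci without
    weighting. Negative estimates and estimates above `cap` become infinity.
    """
    ds = []
    for genotypes in loci:
        h_obs, h_exp = locus_heterozygosities(genotypes)
        if h_exp > 0:  # skip monomorphic loci, which carry no signal
            ds.append((h_obs - h_exp) / h_exp)
    if not ds:
        return math.inf
    d = sum(ds) / len(ds)
    if d <= 0:  # overall heterozygote deficit
        return math.inf
    neb = 1 / (2 * d)
    return math.inf if neb > cap else neb
```

For instance, `estimate_neb([[(1, 2)] * 10])` describes one locus at which all 10 progeny are heterozygous, an extreme excess that gives an estimate below 1.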
The simulation model that we used to evaluate the heterozygote-excess method has three main steps. First, it assigns genotypes to the parental generation using random numbers and a predefined allele frequency distribution. We modeled loci with 2, 3, and 5 alleles and the following three allele frequency distributions: equal (e.g., H = 0.8, for 5 alleles), triangular (e.g., 0.33, 0.30, 0.20, 0.13, 0.07, and H = 0.74; see Pudovkin et al. 1996), and rare (e.g., 0.52, 0.2, 0.10, 0.07, 0.04, with H = 0.67). Second, the simulation model randomly picks a male and female parent and one gamete from each. Under the polygamy model, the gametes from males and females are randomly combined such that one male can potentially mate with several females and vice versa. Under monogamy, the same random male is always paired with the same female. For example, when N_{eb} = 4, one random male is paired with one random female, and then a second random male is paired with a second random female. Then one of the two pairs is randomly chosen and gametes are drawn to make an offspring. This is repeated until 15–120 offspring have been generated. Here, the variance in reproductive success among the four parents follows a Poisson distribution; thus N_{eb} = 4. Under polygyny, only one or two males mate with all the females. For example, if one male mates with 99 females, the number of breeders is 100 but the effective number is only ∼4 [N_{e} = 4(N_{m})(N_{f})/(N_{m} + N_{f}), where N_{m} and N_{f} are the number of breeding males and females, respectively; Crow and Kimura 1970]. Third, the program computes Nˆ_{eb} from a random sample of 15–120 progeny using Equations 1 and 2. These three steps were repeated 500–2000 times per combination of the following parameters: N_{eb}, sample size of individuals and loci, allele number, allele frequency distribution, and mating system.
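The first two steps of the model can be sketched in Python as shown below. This is an illustrative sketch, not the program used for the analyses: the function names, the data layout, and the use of Python's `random` module are all assumptions. The sex-ratio formula is the one quoted above from Crow and Kimura (1970).

```python
import random


def sex_ratio_ne(n_males, n_females):
    """N_e = 4 * N_m * N_f / (N_m + N_f)."""
    return 4 * n_males * n_females / (n_males + n_females)


def random_genotype(freqs, rng):
    """Draw two alleles (indexed 0..k-1) according to the allele frequencies."""
    return tuple(rng.choices(range(len(freqs)), weights=freqs, k=2))


def simulate_progeny(n_males, n_females, freqs, n_loci, n_offspring, rng,
                     monogamy=False):
    """Return a list of offspring, each a list of (paternal, maternal) alleles.

    Polygamy or polygyny: a random father and a random mother per offspring.
    Monogamy: fixed male-female pairs (assumes n_males == n_females); one
    pair is drawn at random for each offspring.
    """
    def make_parents(count):
        return [[random_genotype(freqs, rng) for _ in range(n_loci)]
                for _ in range(count)]

    males, females = make_parents(n_males), make_parents(n_females)
    pairs = list(zip(males, females))
    progeny = []
    for _ in range(n_offspring):
        if monogamy:
            father, mother = rng.choice(pairs)
        else:
            father, mother = rng.choice(males), rng.choice(females)
        # One randomly chosen gamete (allele) from each parent at each locus.
        child = [(rng.choice(father[i]), rng.choice(mother[i]))
                 for i in range(n_loci)]
        progeny.append(child)
    return progeny


# Extreme polygyny: 1 male and 99 females give an effective number near 4.
print(sex_ratio_ne(1, 99))
triangular = [0.33, 0.30, 0.20, 0.13, 0.07]
sample = simulate_progeny(2, 2, triangular, n_loci=10, n_offspring=30,
                          rng=random.Random(1), monogamy=True)
```

Because parents are drawn independently for every offspring, family sizes vary at random, which approximates the Poisson variance in reproductive success described above.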
RESULTS AND DISCUSSION
Bias: Our simulations suggest that the heterozygote-excess method slightly overestimates N_{eb} when using finite samples and Nei’s unbiased estimator of H_{exp}. When N_{eb} was 4, 20, and 100, the harmonic mean estimates of N_{eb} (from 500 simulations) were 4.1, 22.2, and 103.4, respectively, when sampling 30 individuals and 10 loci (five alleles/locus) with triangular allele frequency distributions and a polygamous breeding system (Table 1). When more individuals were sampled, the bias was slightly lower. Harmonic mean estimates of N_{eb} were nearly identical for loci with two, three, and five alleles and with widely different allele frequency distributions (Table 1; Figure 1, horizontal bars inside box plots). The largest bias occurred under the polygynous mating system (e.g., Nˆ_{eb} = 5.0 when true N_{eb} ≅ 4; Figure 1). This bias was not surprising given the assumption of the heterozygote-excess method that there are equal numbers of male and female parents. Still, the bias was small enough not to substantially diminish the usefulness of the heterozygote-excess method.
The harmonic mean N_{eb} estimates were similar for the monogamous and polygamous mating systems (Nˆ_{eb} was 4.5 and 4.1, respectively, when N_{eb} was 4.0; Figure 1). This suggests that there is little impact of an interfamily Wahlund effect on the accuracy of N_{eb} estimates. When only approximately two to three large families exist (e.g., under monogamy with N_{eb} equaling 4–6), sampling across families does not generate a large heterozygote deficiency (i.e., Wahlund effect). However, a Wahlund heterozygote deficiency is expected when many families exist (A. Pudovkin, personal communication). Such a deficiency would at least partially cancel the heterozygote excess caused by small N_{eb}, and thereby cause biased (high) estimates of N_{eb}. Although monogamy did not cause a large bias, it did substantially increase the variance among N_{eb} estimates (see below and Figure 1).
Precision: To quantify the precision of the N_{eb} estimates, we used the Student’s t-distribution to compute a 95% confidence interval for each Nˆ_{eb} (as in Pudovkin et al. 1996). The confidence interval on Nˆ_{eb} contained the true N_{eb} in 92–96% of the simulation estimates of N_{eb} when using loci with five alleles (Table 1). As expected, approximately half of the confidence intervals that did not contain the true N_{eb} were too low (L) and half were too high (H). This suggests that the Student’s t-distribution is useful for computing confidence intervals, even though N_{eb} is not exactly normally distributed. For loci with three alleles or for monogamous mating systems, the confidence intervals also contained the true N_{eb} ∼92–96% of the time. However, when using loci with only two alleles, the confidence intervals were generally too narrow and contained the true N_{eb} in only 83–89% of the simulation estimates of N_{eb} (Table 1). Thus, confidence intervals must be interpreted with caution or computed by alternative methods (e.g., bootstrap resampling) when using loci with only two alleles (e.g., many allozyme loci).
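As an illustration of the bootstrap alternative mentioned above, the following Python sketch resamples loci with replacement and reads a percentile interval from the resulting distribution of estimates. It is not the procedure evaluated here. It assumes that a per-locus proportional excess D = (H_{obs} − H_{exp})/H_{exp} is already available, that Nˆ_{eb} ≈ 1/(2 × mean D), and that a simple percentile interval is adequate. All function and parameter names are hypothetical.

```python
import math
import random


def neb_from_d(d):
    """First-order conversion (ASSUMPTION): N_eb ~ 1/(2D); infinite if D <= 0."""
    return 1 / (2 * d) if d > 0 else math.inf


def bootstrap_neb_ci(d_values, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for N_eb, resampling loci with replacement.

    d_values: one proportional heterozygote excess D per locus.
    Returns (lower, upper); either limit may be math.inf.
    """
    rng = random.Random(seed)
    k = len(d_values)
    estimates = []
    for _ in range(n_boot):
        resample = [rng.choice(d_values) for _ in range(k)]
        estimates.append(neb_from_d(sum(resample) / k))
    estimates.sort()  # infinite estimates sort to the upper end
    alpha = (1 - level) / 2
    lower = estimates[int(alpha * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 - alpha) * n_boot))]
    return lower, upper
```

An upper limit of infinity from this sketch has the same interpretation as in the text: the sample of loci is too small for the heterozygote-excess signal to be distinguished from sampling noise.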
Under extreme polygyny (e.g., one male mating with 99 females), the confidence intervals were often too high. For example, when N_{eb} was four, ∼25% of the 500 simulated confidence intervals were slightly higher than the true N_{eb}, and none were lower than N_{eb}. Although the confidence intervals were often too high, they were also much narrower under polygyny than under monogamy or polygamy (Figure 1). This narrowness substantially increases the usefulness of the heterozygote-excess method under polygyny. Thus, under extreme polygyny, the heterozygote-excess method will be useful for detecting a small N_{eb} but will be less useful for quantifying the exact size of N_{eb}.
To determine the minimum number of loci and individuals that must be sampled to achieve a high probability of obtaining narrow confidence intervals, we plotted the distribution of the (upper and lower) 95% confidence limits obtained from 500 simulation estimates of N_{eb}. When the true N_{eb} is only 4, at least 10 loci (with five alleles) and 30 individuals must be sampled to achieve an 80% probability of obtaining an upper 95% confidence limit < ∼20 (Figure 2b). In other words, the statistical power is 0.80 when testing the null hypothesis that the true N_{eb} = 20 (and when the true N_{eb} is actually only 4). The power will be slightly higher when using a one-tailed test and the null hypothesis that true N_{eb} ≥ 20 (the alternative hypothesis is N_{eb} < 20).
These results show that the heterozygote-excess method is sufficiently powerful for detecting a small N_{eb} when sampling reasonable numbers of individuals and loci with five alleles. Such results are important for conservation biology and the management of captive and natural populations. The precision of Nˆ_{eb} is often increased more by analyzing a larger number of loci than by sampling more individuals. Doubling the number of loci from 10 to 20 (compare the first box plot in Figure 2, b and c) generally reduces confidence intervals more than doubling the number of individuals sampled from 15 to 30 (compare the first two box plots in Figure 2b). However, the benefit from doubling the number of loci depends on the number and frequency of alleles (Figure 1).
When true N_{eb} is 10, we must sample >20 polymorphic loci and 60 individuals to have an 80% probability of obtaining confidence intervals that are <50 (and to have a 95% probability of obtaining confidence intervals <100; Figure 2e). When the true N_{eb} is 100, >80% of the confidence intervals include infinity, even when sampling 120 individuals and 20 loci with five alleles (data not shown). Clearly, when N_{eb} is >10, very large samples of loci and individuals are required to achieve a high probability of obtaining reasonably small confidence intervals. Thus, the main limitation of the usefulness of the heterozygote-excess method is its poor precision, i.e., its wide confidence intervals. The confidence intervals are generally too wide for the method to be useful when using diallelic loci, loci with mostly rare alleles, or when studying strictly monogamous species (Figure 1).
When applied to data from natural populations, the heterozygote-excess method often gave estimates of N_{eb} equal to infinity. For example, Nˆ_{eb} was infinity in 5 of 10 cohorts for which the total number of parents was known (or estimated) to be small (i.e., three to a few dozen). Further, only 2 of the 10 estimates gave 95% confidence intervals that did not include infinity as an upper limit (Table 2). This poor precision is not surprising in that only 5–9 polymorphic loci were analyzed, and only 11–25 progeny were sampled. Additional empirical evaluations are needed, but it is extremely difficult to find large data sets containing individuals produced from a known number of parents.
One potential limitation of the method is the requirement for random, representative sampling. For example, if a sample contains only one or a few families (due to sampling error), then we could obtain a very low N_{eb} estimate, even though many families actually exist and N_{eb} is large. Another obvious limitation is that the method will work only in species with separate sexes. The method will work for haplodiploid species (e.g., Hymenopterans), but will require the derivation of equations different from those presented here.
Four approaches may help circumvent the problem of poor precision. First, one can compute 80% confidence intervals (in addition to 95% confidence intervals). This will reduce the likelihood that the upper confidence limit will include infinity and be uninformative. Second, one could explore alternative methods for computing confidence intervals (e.g., nonparametric methods such as bootstrap resampling of loci). Third, one could combine estimates of N_{eb} from several generations or cohorts by computing the harmonic mean of Nˆ_{eb} over the multiple generations or cohorts. This can reduce the probability of obtaining infinity for Nˆ_{eb} because, when computing a harmonic mean, the low estimates carry far more weight than high ones (e.g., infinity). Finally, one can potentially combine estimates of N_{e} obtained from several independent N_{e} estimators by computing the harmonic mean of the N_{e} estimates. Other promising N_{e} estimators include those based on gametic disequilibrium (Hill 1981) and on temporal variance in allele frequencies (Krimbas and Tsaskas 1971; Waples 1991). These two estimators also suffer from low precision (Waples 1989, 1991; Luikart 1997; Luikart et al. 1998). However, two or more of the estimators may be independent (Waples 1991; Pudovkin et al. 1996) and thus could potentially be used simultaneously to achieve a more precise estimate of N_{e}. More research is needed to evaluate the precision and accuracy that can be achieved by using several N_{e} estimators simultaneously. Any improvement in our ability to estimate N_{e} would be significant in light of both the difficulties in assessing N_{e} and the importance of N_{e} in population genetics and in conservation biology.
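The weighting property of the harmonic mean invoked above can be illustrated with a short Python sketch. The function name is hypothetical, and the sketch assumes that all finite estimates are positive: an infinite estimate contributes a reciprocal of zero, so it raises the combined value without making it infinite.

```python
import math


def harmonic_mean_neb(estimates):
    """Harmonic mean of N_eb estimates, allowing infinite values.

    ASSUMPTION: every finite estimate is positive. An infinite estimate adds
    0 to the sum of reciprocals; if all estimates are infinite, so is the mean.
    """
    reciprocals = [0.0 if math.isinf(x) else 1.0 / x for x in estimates]
    total = sum(reciprocals)
    return math.inf if total == 0 else len(estimates) / total


# Three cohorts with estimates of 5, 20, and infinity.
combined = harmonic_mean_neb([5, 20, math.inf])
```

In this example the combined estimate is finite and is dominated by the smallest cohort estimate, whereas an arithmetic mean of the same three values would be infinite.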
Acknowledgments
We thank I. Till-Bottraud and two anonymous reviewers for helpful comments on earlier versions of this manuscript, M. Schwartz for sharing unpublished simulation data, and especially P. Spruell, F. W. Allendorf, A. Estoup, and M. Brown for providing data sets. Support was provided by the French “Bureau Ressources Génétiques,” a postdoctoral fellowship (for G.L.) from National Science Foundation/North Atlantic Treaty Organization, and the Laboratoire de Biologie des Populations d’Altitude.
Footnotes

Communicating editor: G. B. Golding
 Received June 17, 1998.
 Accepted November 20, 1998.
 Copyright © 1999 by the Genetics Society of America