Abstract
The heterozygote-excess method is a recently published method for estimating the effective population size (Ne). It is based on the following principle: When the effective number of breeders (Neb) in a population is small, the allele frequencies will (by chance) be different in males and females, which causes an excess of heterozygotes in the progeny with respect to Hardy-Weinberg equilibrium expectations. We evaluate the accuracy and precision of the heterozygote-excess method using empirical and simulated data sets from polygamous, polygynous, and monogamous mating systems and by using realistic sample sizes of individuals (15-120) and loci (5-30) with varying levels of polymorphism. The method gave nearly unbiased estimates of Neb under all three mating systems. However, the confidence intervals on the point estimates of Neb were sufficiently small (and hence the heterozygote-excess method useful) only in polygamous and polygynous populations that were produced by <10 effective breeders, unless samples included > ∼60 individuals and 20 multiallelic loci.
The effective population size (Ne) is an important parameter in evolutionary genetics and conservation biology because it influences the rate of inbreeding and loss of genetic variation. It also influences the efficiency of natural selection in maintaining beneficial alleles and purging deleterious ones. For example, when Ne is very small, genetic drift will often be too strong for natural selection to efficiently maintain or purge alleles. Unfortunately, Ne has proven very difficult to estimate in natural populations (Waples 1989). Thus, any method with potential for improving our ability to estimate Ne deserves thorough evaluation. Ne can be estimated using demographic or genetic data. However, demographic methods require information such as variance in reproductive success, which is difficult to obtain for many species. Further, demographic estimates of Ne are often higher than the true Ne because demographic methods seldom incorporate all of the factors (e.g., skewed sex ratios and changes in population size) that can make Ne smaller than the population census size (Nc) (Frankham 1995).
The (current) effective population size can be estimated using genetic data and the following four statistical methods: the loss of heterozygosity method (e.g., Harris and Allendorf 1989); the temporal method (Krimbas and Tsaskas 1971; Waples 1989); the linkage disequilibrium method (Hill 1981; Waples 1991); and, most recently, the heterozygote-excess method (Pudovkin et al. 1996). Pudovkin et al. (1996) demonstrated that the heterozygote-excess method estimates the effective number of breeding parents (Neb) with no bias and fair precision when the sample size of progeny is infinite and when gametes combine completely at random, i.e., when all male gametes have an equal probability of combining with all female gametes, as in some polygamous, random-mating species (e.g., marine invertebrates such as shellfish).
Here, we extend the evaluation of Pudovkin et al. (1996) to include finite samples of individuals (n = 15-120), reasonable numbers of polymorphic loci (5-30) with a wide range of allele frequencies, monogamous mating systems where only pair-mating occurs, and polygynous mating systems where only a few males mate with many females (i.e., skewed sex ratios). Our goal is to delineate the conditions under which the heterozygote-excess method will be useful for estimating Neb in natural populations (or in captive populations such as fish hatcheries). In this article, we (i) explain the importance of using an unbiased estimator of the expected heterozygosity (Hexp) for calculating N̂eb from finite samples of progeny, (ii) quantify the bias of N̂eb, (iii) determine the number of loci and individuals that must be sampled to achieve precise estimates of Neb, and (iv) test whether monogamy generates an interfamily Wahlund effect (i.e., heterozygote deficiency) that counteracts the heterozygote excess generated by small Neb. To conduct these evaluations, we use data from simulations and natural populations. We focus on populations with a small Neb (4-100) because a heterozygote excess is generated only when Neb is small.
PRINCIPLE AND METHODS
The principle of the heterozygote-excess method is as follows: When the number of breeders is small, the allele frequencies in males and females will be different due to binomial sampling error. This difference generates an excess of heterozygotes in the progeny relative to the proportion of heterozygotes expected under Hardy-Weinberg equilibrium (Robertson 1965; Rasmussen 1979). The proportion of heterozygotes expected in progeny produced by a small and equal number of females and males can be calculated as (Falconer 1989, p. 67)
Pudovkin et al. (1996) called H′ the proportion of heterozygotes expected to be observed (Hobs) in the progeny (given a limited number of parents), whereas the expected proportion of heterozygotes in the base population under Hardy-Weinberg proportions, 2pq, was designated as Hexp.
Now Nˆeb can be estimated as follows:
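The two relationships described above can be written out as follows. This is a sketch that assumes the Falconer result takes its standard form for equal numbers of male and female parents, with Hobs estimating H′ and Hexp estimating 2pq; the published estimator may include further correction terms that are not shown here.

```latex
\begin{align}
H' &= 2pq\left(1 + \frac{1}{2N_{eb}}\right) \\
\frac{H_{obs}}{H_{exp}} &= 1 + \frac{1}{2\hat{N}_{eb}}
\quad\Longrightarrow\quad
\hat{N}_{eb} = \frac{H_{exp}}{2\,(H_{obs} - H_{exp})}
\end{align}
```

Under this form, a 12.5% excess of heterozygotes (Hobs/Hexp = 1.125) corresponds to N̂eb = 4, and the estimate becomes negative whenever Hobs < Hexp.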
Table 1. Harmonic mean N̂eb and distribution of 95% confidence limits for N̂eb from 500 simulations
We note that when the sample size of individuals is finite, Hexp must be estimated using the following unbiased estimator of 2pq (Nei 1987): 2N(2pq)/(2N − 1), where N is the number of progeny sampled. If 2pq is used instead of the unbiased estimator, N̂eb will give a biased estimate of Neb (especially when N is small). In nature, Neb can range from only two to near infinity, and N̂eb can be negative (e.g., in the case of an overall deficit of heterozygotes). In our analyses, we considered Neb values as infinite if N̂eb was negative or >10,000 (an arbitrary but reasonably large value). If N̂eb is infinite, it simply means that the heterozygote-excess signal is obscured by the noise (i.e., sampling error) associated with small samples of loci or individuals.
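The small-sample correction and the "infinite" rule can be sketched in a few lines of Python. The function names are hypothetical, the extension from 2pq to 1 − Σp² for multiallelic loci is an assumption, and the estimator is assumed to take the simple form Hexp/[2(Hobs − Hexp)], which may differ in detail from the published equation.

```python
import math

# Arbitrary "effectively infinite" cutoff taken from the text.
INFINITE_CUTOFF = 10_000


def unbiased_hexp(allele_freqs, n_progeny):
    """Nei's (1987) correction 2N/(2N - 1) applied to 1 - sum(p_i^2).

    For two alleles, 1 - sum(p_i^2) equals 2pq; using it for more
    alleles is an assumption of this sketch.
    """
    h = 1.0 - sum(p * p for p in allele_freqs)
    two_n = 2 * n_progeny
    return two_n * h / (two_n - 1)


def neb_hat(h_obs, h_exp):
    """Assumed estimator Hexp / (2 * (Hobs - Hexp)).

    Negative estimates (heterozygote deficit) and estimates above the
    cutoff are reported as infinity, as described in the text.
    """
    excess = h_obs - h_exp
    if excess <= 0:
        return math.inf
    estimate = h_exp / (2.0 * excess)
    return math.inf if estimate > INFINITE_CUTOFF else estimate
```

With 30 progeny and two alleles at frequency 0.5, the corrected Hexp is 60 × 0.5/59 rather than 0.5, which illustrates why the uncorrected 2pq would inflate the apparent excess in small samples.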
The simulation model that we used to evaluate the heterozygote-excess method has three main steps. First, it assigns genotypes to the parental generation using random numbers and a predefined allele frequency distribution. We modeled loci with 2, 3, and 5 alleles and the following three allele frequency distributions: equal (e.g., H = 0.8, for 5 alleles), triangular (e.g., 0.33, 0.30, 0.20, 0.13, 0.07, and H = 0.74; see Pudovkin et al. 1996), and rare (e.g., 0.52, 0.2, 0.10, 0.07, 0.04, with H = 0.67). Second, the simulation model randomly picks a male and female parent and one gamete from each. Under the polygamy model, the gametes from males and females are randomly combined such that one male can potentially mate with several females and vice versa. Under monogamy, the same random male is always paired with the same female. For example, when Neb = 4, one random male is paired with one random female, and then a second random male is paired with a second random female. Then one of the two pairs is randomly chosen and gametes are drawn to make an offspring. This is repeated until 15-120 offspring have been generated. Here, the variance in reproductive success among the four parents follows a Poisson distribution; thus Neb = 4. Under polygyny, only one or two males mate with all the females. For example, if one male mates with 99 females, the number of breeders is 100 but the effective number is only ∼4 [Ne = 4(Nm)(Nf)/(Nm + Nf), where Nm and Nf are the number of breeding males and females, respectively; Crow and Kimura 1970]. Third, the program computes N̂eb from a random sample of 15-120 progeny using Equations 1 and 2. These three steps were repeated 500-2000 times per combination of the following parameters: Neb, sample size of individuals and loci, allele number, allele frequency distribution, and mating system.
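The first two simulation steps and the sex-ratio formula can be illustrated with a minimal single-locus sketch in Python. This is not the authors' program: all function names are hypothetical, only one locus is simulated, and monogamy is modeled by fixing male-female pairs once and then drawing a pair at random for each offspring.

```python
import random


def sex_ratio_ne(n_males, n_females):
    """Ne = 4 * Nm * Nf / (Nm + Nf) (Crow and Kimura 1970)."""
    return 4.0 * n_males * n_females / (n_males + n_females)


def random_genotype(rng, alleles, freqs):
    """Step 1: draw two alleles from the predefined frequency distribution."""
    return tuple(rng.choices(alleles, weights=freqs, k=2))


def simulate_progeny(rng, n_males, n_females, n_progeny, freqs, monogamy=False):
    """Step 2: pick parents, draw one gamete from each, form offspring."""
    alleles = list(range(len(freqs)))
    males = [random_genotype(rng, alleles, freqs) for _ in range(n_males)]
    females = [random_genotype(rng, alleles, freqs) for _ in range(n_females)]
    # Fixed pairs are used only under monogamy (zip assumes equal numbers).
    pairs = list(zip(males, females))
    progeny = []
    for _ in range(n_progeny):
        if monogamy:
            sire, dam = rng.choice(pairs)
        else:
            # Polygamy or polygyny: any male can combine with any female.
            sire, dam = rng.choice(males), rng.choice(females)
        progeny.append((rng.choice(sire), rng.choice(dam)))
    return progeny


def observed_heterozygosity(progeny):
    """Proportion of offspring carrying two different alleles."""
    return sum(a != b for a, b in progeny) / len(progeny)
```

Extreme polygyny corresponds to calling `simulate_progeny` with `n_males=1` and `n_females=99`, for which `sex_ratio_ne(1, 99)` gives 3.96. A full replicate would repeat the draw for each locus and then compute N̂eb from the pooled observed and expected heterozygosities.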
Figure 1. Harmonic mean N̂eb and the distribution of upper and lower 95% confidence limits on N̂eb computed from 500 independent simulation estimates using 10 loci and samples of 30 individuals. The dotted horizontal line (between boxes) shows the true Neb. The solid horizontal line inside each box shows the harmonic mean N̂eb. The top of each box is the 80th percentile of the distribution of 500 estimates of the upper 95% confidence limit on N̂eb. Each line extending upward from a box is the 95th percentile of the distribution of the upper 95% confidence limit. Bottoms of boxes and lines extending downward from boxes represent the 20th and 5th percentiles, respectively, of the distribution of the lower 95% confidence limit. (*) Infinity is the 95th percentile of the distribution of the upper confidence limit. (**) Both the 95th and 80th percentiles of the distribution of upper confidence limits are infinity. Distributions are compared for a polygamous breeding system (i.e., complete random union of gametes between males and females) and for loci with (i) five alleles at equal frequencies (Even-5), (ii) five alleles at triangular frequencies (Tri-5, as in Table 1), (iii) five alleles including rare alleles (Rare-5, see PRINCIPLE AND METHODS), (iv) three alleles with triangular frequencies (Tri-3), (v) two alleles with triangular frequencies (Tri-2), (vi) Tri-5 allele frequencies for a monogamous breeding system in which males and females are pair-mated, and, finally, (vii) Tri-5 for extreme polygyny where one male breeds all 99 females (and thus Neb ≈ 4).
RESULTS AND DISCUSSION
Bias: Our simulations suggest that the heterozygote-excess method slightly overestimates Neb when using finite samples and Nei’s unbiased estimator of Hexp. When Neb was 4, 20, and 100, the harmonic mean estimates of Neb (from 500 simulations) were 4.1, 22.2, and 103.4, respectively, when sampling 30 individuals and 10 loci (five alleles/locus) with triangular allele frequency distributions and a polygamous breeding system (Table 1). When more individuals were sampled, the bias was slightly lower. Harmonic mean estimates of Neb were nearly identical for loci with two, three, and five alleles and with widely different allele frequency distributions (Table 1; Figure 1, horizontal bars inside box plots). The largest bias occurred under the polygynous mating system (e.g., N̂eb = 5.0 when true Neb ≅ 4; Figure 1). This bias was not surprising given the assumption of the heterozygote-excess method that there are equal numbers of male and female parents. Still, the bias was small enough not to substantially diminish the usefulness of the heterozygote-excess method.
The harmonic mean Neb estimates were similar for the monogamous and polygamous mating systems (N̂eb was 4.5 and 4.1, respectively, when Neb was 4.0; Figure 1). This suggests that there is little impact of an interfamily Wahlund effect on the accuracy of Neb estimates. When only approximately two to three large families exist (e.g., under monogamy with Neb equaling 4-6), sampling across families does not generate a large heterozygote deficiency (i.e., Wahlund effect). However, a Wahlund heterozygote deficiency is expected when many families exist (A. Pudovkin, personal communication). Such a deficiency would at least partially cancel the heterozygote excess caused by small Neb, and thereby cause biased (high) estimates of Neb. Although monogamy did not cause a large bias, it did substantially increase the variance among Neb estimates (see below and Figure 1).
Precision: To quantify the precision of the Neb estimates, we used the Student’s t-distribution to compute a 95% confidence interval for each N̂eb (as in Pudovkin et al. 1996). The confidence interval on N̂eb contained the true Neb in 92-96% of the simulation estimates of Neb when using loci with five alleles (Table 1). As expected, approximately half of the confidence intervals that did not contain the true Neb were too low (L) and half were too high (H). This suggests that the Student’s t-distribution is useful for computing confidence intervals, even though Neb is not exactly normally distributed. For loci with three alleles or for monogamous mating systems, the confidence intervals also contained the true Neb ∼92-96% of the time. However, when using loci with only two alleles, the confidence intervals were generally too narrow and contained the true Neb in only 83-89% of the simulation estimates of Neb (Table 1). Thus, confidence intervals must be interpreted with caution or computed by alternative methods (e.g., bootstrap resampling) when using loci with only two alleles (e.g., many allozyme loci).
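One plausible way to obtain such an interval is sketched below in Python. It assumes, without confirmation from the text, that the interval is built on the per-locus relative excess D = (Hobs − Hexp)/Hexp and then transformed with N̂eb = 1/(2D); the function name is hypothetical, and the critical t value is supplied by the caller (e.g., 2.776 for 4 d.f.).

```python
import math
import statistics


def neb_confidence_interval(d_values, t_crit):
    """Return (lower, point, upper) limits for Neb.

    d_values: per-locus D = (Hobs - Hexp) / Hexp (assumed input).
    t_crit:   two-sided 95% Student's t quantile for len(d_values) - 1 d.f.
    """
    k = len(d_values)
    mean_d = statistics.mean(d_values)
    std_err = statistics.stdev(d_values) / math.sqrt(k)
    d_low = mean_d - t_crit * std_err
    d_high = mean_d + t_crit * std_err

    def to_neb(d):
        # A zero or negative excess carries no signal: report infinity.
        return 1.0 / (2.0 * d) if d > 0 else math.inf

    # A large excess means a small Neb, so the limits swap places.
    return to_neb(d_high), to_neb(mean_d), to_neb(d_low)
```

In this sketch the upper limit on Neb is infinite whenever the lower limit on D reaches zero, which is how an interval can "include infinity" even though the point estimate is small.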
Figure 2. Distributions of 500 (95%) confidence limits on N̂eb computed from 500 independent simulation estimates, as in Figure 1. Dotted horizontal lines represent the true Neb being estimated (4 and 10 for a-c and d-f, respectively). (b and e) “80%” shows the 80th percentile of the distribution of the upper 95% confidence limits. This distribution was generated from 500 independent simulation estimates of Neb. In e, 460 is the 95th percentile of the distribution of the upper 95% confidence limits.
Under extreme polygyny (e.g., one male mating with 99 females), the confidence intervals were often too high. For example, when Neb was four, ∼25% of the 500 simulated confidence intervals were slightly higher than the true Neb, and none were lower than Neb. Although the confidence intervals were often too high, they were also much narrower under polygyny than under monogamy or polygamy (Figure 1). This narrowness substantially increases the usefulness of the heterozygote-excess method under polygyny. Thus, under extreme polygyny, the heterozygote-excess method will be useful for detecting a small Neb but will be less useful for quantifying the exact size of Neb.
To determine the minimum number of loci and individuals that must be sampled to achieve a high probability of obtaining narrow confidence intervals, we plotted the distribution of the (upper and lower) 95% confidence limits obtained from 500 simulation estimates of Neb. When the true Neb is only 4, at least 10 loci (with five alleles) and 30 individuals must be sampled to achieve an 80% probability of obtaining an upper 95% confidence limit < ∼20 (Figure 2b). In other words, the statistical power is 0.80 when testing the null hypothesis that the true Neb = 20 (and when the true Neb is actually only 4). The power will be slightly higher when using a one-tailed test and the null hypothesis that true Neb ≥ 20 (the alternative hypothesis is Neb < 20).
These results show that the heterozygote-excess method is sufficiently powerful for detecting a small Neb when sampling reasonable numbers of individuals and loci with five alleles. Such results are important for conservation biology and the management of captive and natural populations. The precision of N̂eb is often increased more by analyzing a larger number of loci than by sampling more individuals. Doubling the number of loci from 10 to 20 (compare the first box plot in Figure 2, b and c) generally reduces confidence intervals more than doubling the number of individuals sampled from 15 to 30 (compare the first two box plots in Figure 2b). However, the benefit from doubling the number of loci depends on the number and frequency of alleles (Figure 1).
When true Neb is 10, we must sample >20 polymorphic loci and 60 individuals to have an 80% probability of obtaining confidence intervals that are <50 (and to have a 95% probability of obtaining confidence intervals <100; Figure 2e). When the true Neb is 100, >80% of the confidence intervals include infinity, even when sampling 120 individuals and 20 loci with five alleles (data not shown). Clearly, when Neb is >10, very large samples of loci and individuals are required to achieve a high probability of obtaining reasonably small confidence intervals. Thus, the main limitation of the usefulness of the heterozygote-excess method is its poor precision, i.e., its wide confidence intervals. The confidence intervals are generally too wide for the method to be useful when using diallelic loci, loci with mostly rare alleles, or when studying strictly monogamous species (Figure 1).
When applied to data from natural populations, the heterozygote-excess method often gave estimates of Neb equal to infinity. For example, N̂eb was infinity in 5 of 10 cohorts for which the total number of parents was known (or estimated) to be small (i.e., three to a few dozen). Further, only 2 of the 10 estimates gave 95% confidence intervals that did not include infinity as an upper limit (Table 2). This poor precision is not surprising in that only 5-9 polymorphic loci were analyzed, and only 11-25 progeny were sampled. Additional empirical evaluations are needed, but it is extremely difficult to find large data sets containing individuals produced from a known number of parents.
Table 2. Estimates of Neb from empirical data sets containing progeny from a known number of parents
One potential limitation of the method is the requirement for random, representative sampling. For example, if a sample contains only one or a few families (due to sampling error), then we could obtain a very low Neb estimate, even though many families actually exist and Neb is large. Another obvious limitation is that the method will work only in species with separate sexes. The method will work for haplo-diploid species (e.g., Hymenopterans), but will require the derivation of equations different from those presented here.
Four approaches may help circumvent the problem of poor precision. First, one can compute 80% confidence intervals (in addition to 95% confidence intervals). This will reduce the likelihood that the upper confidence limit will include infinity and be uninformative. Second, one could explore alternative methods for computing confidence intervals (e.g., nonparametric methods such as bootstrap resampling of loci). Third, one could combine estimates of Neb from several generations or cohorts by computing the harmonic mean of N̂eb over the multiple generations or cohorts. This can reduce the probability of obtaining infinity for N̂eb because, when computing a harmonic mean, the low estimates carry far more weight than high ones (e.g., infinity). Finally, one can potentially combine estimates of Ne obtained from several independent Ne estimators by computing the harmonic mean of the Ne estimates. Other promising Ne estimators include those based on gametic disequilibrium (Hill 1981) and on temporal variance in allele frequencies (Krimbas and Tsaskas 1971; Waples 1991). These two estimators also suffer from low precision (Waples 1989, 1991; Luikart 1997; Luikart et al. 1998). However, two or more of the estimators may be independent (Waples 1991; Pudovkin et al. 1996) and thus could potentially be used simultaneously to achieve a more precise estimate of Ne. More research is needed to evaluate the precision and accuracy that can be achieved by using several Ne estimators simultaneously. Any improvement in our ability to estimate Ne would be significant in light of both the difficulties in assessing Ne and the importance of Ne in population genetics and in conservation biology.
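The weighting property of the harmonic mean mentioned above can be shown with a short Python sketch. The function name is hypothetical, and treating an infinite estimate as contributing a reciprocal of zero is an assumption about how such values would be handled.

```python
import math


def harmonic_mean_neb(estimates):
    """Harmonic mean of Neb estimates; infinite values contribute 1/inf = 0."""
    reciprocal_sum = sum(0.0 if math.isinf(x) else 1.0 / x for x in estimates)
    if reciprocal_sum == 0.0:
        # Every estimate was infinite, so the combined estimate is too.
        return math.inf
    return len(estimates) / reciprocal_sum
```

For two cohorts with estimates of 4 and infinity, the combined value is 8 rather than infinity, so a single uninformative cohort does not erase the signal from the others.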
Acknowledgments
We thank I. Till-Bottraud and two anonymous reviewers for helpful comments on earlier versions of this manuscript, M. Schwartz for sharing unpublished simulation data, and especially P. Spruell, F. W. Allendorf, A. Estoup, and M. Brown for providing data sets. Support was provided by the French “Bureau Ressources Génétiques,” a postdoctoral fellowship (for G.L.) from National Science Foundation/North Atlantic Treaty Organization, and the Laboratoire de Biologie des Populations d’Altitude.
Footnotes
- Communicating editor: G. B. Golding
- Received June 17, 1998.
- Accepted November 20, 1998.
- Copyright © 1999 by the Genetics Society of America