Using Maximum Likelihood to Estimate Population Size From Temporal Changes in Allele Frequencies
 Ellen G. Williamson and
 Montgomery Slatkin
 Corresponding author: Ellen G. Williamson, University of California, Integrative Biology, 3060 VLSB, Berkeley, CA 947203141. Email: ellenw{at}socrates.berkeley.edu
Abstract
We develop a maximumlikelihood framework for using temporal changes in allele frequencies to estimate the number of breeding individuals in a population. We use simulations to compare the performance of this estimator to an Fstatistic estimator of variance effective population size. The maximumlikelihood estimator had a lower variance and smaller bias. Taking advantage of the likelihood framework, we extend the model to include exponential growth and show that temporal allele frequency data from three or more sampling events can be used to test for population growth.
BIOLOGISTS are often interested in the number of reproducing individuals in a population, or in the related quantity, the effective population size. If only small numbers of individuals are reproducing successfully, then loss of genetic diversity through genetic drift and increased homozygosity due to unavoidable inbreeding may harm the longterm potential of the population to survive (Soulé 1986).
The increase in homozygosity in small populations can be quantified by the increase in variance in allele frequencies among identical loci. In a WrightFisher model of diploid size N, the variance in allele frequencies among identical loci with initial allele frequency p is p(1  p)/2N after one generation. For many other models of population structure, one can define a variance effective population size, N_{e}, such that after one generation the variance in allele frequency among initially identical loci is p(1  p)/2N_{e} (Ewens 1979). This variance effective population size is related to the expected value of F, the standardized variance of change in allele frequency. The mathematical relationship between F and the variance effective population size led to the development of a statistic for estimating variance effective population size from temporal samples of allele frequency data (Krimbas and Tsakas 1971; Nei and Tajima 1981; Pollak 1983; Waples 1989a). These methods have been applied to data taken from a wide variety of taxa in both natural and artificial (i.e., hatchery) populations (Waples 1990; Waples and Teel 1990, Hedgecocket al. 1992; Husband and Barrett 1992a,b, Jordanet al. 1992; Lessioset al. 1994; Hedricket al. 1995; Burczyk 1996; Jorde and Ryman 1996; Richards and Leberg 1996; Miller and Kapuscinski 1997, Scribneret al. 1997; Dallaset al. 1998; Laikreet al. 1998; Lehmannet al. 1998).
With the advent of highspeed computers, maximumlikelihood methods for estimating population genetic parameters have become increasingly popular. Although these methods can be computationally intensive, maximumlikelihood estimators have many desirable statistical properties. One advantage of using a maximumlikelihood framework is that it is relatively simple to modify the underlying statistical model. In this study we develop a maximumlikelihood framework for estimating population size from temporal changes in allele frequencies. We then compare estimates of population size using maximum likelihood and the Fstatistic approach on both real and simulated data. Finally we extend the model to include population growth and show that it is possible in principle to estimate both population size and growth rate simultaneously from temporal allele frequency data, although large numbers of loci are needed.
THEORY
Our likelihood method is based on the WrightFisher model of neutral genetic evolution in an isolated population (Ewens 1979). The WrightFisher model assumes discrete, nonoverlapping generations and constant population size, N. Gametes are chosen randomly and with replacement from the previous generation. Sampling from the parental generation with replacement is equivalent to assuming that the gamete pool is infinitely large.
The likelihood of a discrete parameter value given some data is, by definition, the probability of observing the data given the parameter value (Edwards 1992). In the present case, the data consist of the counts of each allelic type at several unlinked neutral genetic markers in a random sample taken with replacement from a population at generations 0, t_{1}, t_{2},... t_{f}. The configuration of alleles at each locus can be represented by vectors n¯_{0}, n¯_{t}_{1}, n¯_{t}_{2},...n¯_{tf}, where the ith element in n¯_{0} is the number of alleles of type i. The likelihood of N, the population size, given the data, is equal to the probability of observing the data given N; L(N n¯_{0}, n¯_{t}_{1}, n¯_{t}_{2},...n¯_{tf}) = P(n¯_{0}, n¯_{t}_{1}, n¯_{t}_{2},...n¯_{tf}N). This N is a parameter in the WrightFisher model and will not always be the same as the variance effective size estimated by the Fstatistic method. However, these quantities are related; when a population is evolving according to the WrightFisher model, the variance effective size and N are the same.
In a WrightFisher population, the configuration of alleles in each generation has a multinomial probability that depends only on the allele frequencies of the previous generation. Similarly, the probability of the allelic configuration n, a sample of size n taken randomly from the gamete pool of a population at a locus with k alleles that has allele frequencies p, is also multinomial and is given by
The joint probability of observing the configuration n¯_{0} in a sample from generation 0, and n¯_{t} in a second sample taken t generations later depends on the allele frequencies at this locus in the population at generations 0 and t, p¯_{0} and p¯_{t}. Because the sampling events are statistically independent,
If sample data are from diallelic dominant markers such as RAPDs, then different equations must be used for P(n¯_{0}p¯_{0}) and P(n¯_{t}p¯_{t}). If the markers are in HardyWeinberg equilibrium, then one can use diploid phenotype frequencies from dominant diallelic markers instead of allele frequencies and
Because the parameter p¯_{0} is unknown and not directly of interest, we treat it as a “nuisance” parameter. Similarly, the unobserved random variable p¯_{t} is unknown and we are not interested in estimating its value. If the joint probability distribution of p¯_{0} and p¯_{t} is known, then we can compute the marginal likelihood of N by summing P(n¯_{0}, n¯_{t}p¯_{0}, p¯_{t}) over all possible values for (p¯_{0}, p¯_{t}) weighted by P(p¯_{t}, p_{0}N). The population size, N, determines the probability of p¯_{t} given p¯_{0}, and by the definition of conditional probability, P(p¯_{t}, p_{0}N) = P(p¯_{t}p¯_{0}, N) · P(p¯_{0}N). Thus the marginal likelihood of N, given the data, is
We now have equations for the probabilities P(n¯_{0}p¯_{0}), P(n¯_{t}p¯_{t}), and P(p¯_{0}N) of Equation 3. The remaining probability, P(n¯_{t}p¯_{0}, N), can be computed using a forward transition matrix, M. The number of rows and columns in the matrix is the number of possible allele configurations, C. The population’s allelic configurations can be enumerated from 1 to C. A population configuration can easily be converted into an allele frequency vector by dividing the elements of the configuration by N. Each element of M, m_{ij}, is the probability of a population going from the ith configuration in the parental generation to the jth configuration in the offspring generation. Because we are using the WrightFisher model, transition probabilities are multinomial. The m_{ij} are given by Equation 1 if the sample configuration n is replaced by the jth population configuration and the allele frequency vector p is replaced by the ith population configuration divided by N. The probability distribution of a population’s allelic configuration can be described by a vector v¯ of length C, where the ith element in v¯ is the probability that a population has the ith configuration. To compute P(p¯_{t}p¯_{0}, N), the transition matrix, M, must be raised to the power t, where t is the number of generations between sampling events. P(p¯_{t}p_{0}, N) is equal to M^{t} multiplied on the left by a row vector
For loci that have k > 2 alleles, the number of possible configurations, C, is (N + k  1)!/(N!(k  1)!) of which C_{poly} = (N  1)!/((N  k)!(k  1)!) are polymorphic (Feller 1950). Unfortunately, working with large transition matrices can be extremely computationally intensive. Because C and C_{poly} increase rapidly with k, we restrict the analysis in this article to diallelic loci. Likelihood methods for loci with k > 2 will probably require simulation approaches such as Monte Carlo integration or Markov Chain Monte Carlo to approximate the likelihood function (E. C. Anderson, unpublished results).
This likelihood function is slightly unusual in that the parameter to be estimated, N, is a discrete parameter that determines the number of summands in the computation of the likelihood (Equations 4 and 6). Thus, it is not practical to differentiate the likelihood function with respect to N either analytically or numerically. The maximumlikelihood estimate of N must be determined on a casebycase basis from the likelihood function, there seems to be no general equation for the maximumlikelihood estimate of N using temporal allele frequency data.
ASSESSING THE MAXIMUMLIKELIHOOD ESTIMATOR
Simulation methods: We compared the maximumlikelihood estimator to the Fstatistic estimator with simulation tests. As noted earlier, these two estimators do not always estimate the same quantity. The Fstatistic method estimates the variance effective size and the maximumlikelihood approach described above estimates the parameter N of a WrightFisher population. In these simulation tests the two estimators estimated the same quantity because our simulated populations evolved according to the WrightFisher model. In each simulated replicate we selected starting frequencies for each locus from a uniform distribution. The simulated populations consisted of 50 alleles (25 diploid individuals). We sampled 100 alleles (50 diploid individuals) from the population, with replacement, according to three different sampling regimes (see Table 1). For each replicate we computed both maximumlikelihood and Fstatistic estimates of the population size. The Fstatistic was computed according to
When there were three sampling events we used the harmonic mean of the Nˆ estimates for the two intervals. This estimator was derived by pooling the two Fstatistic estimates and using this pooled value to compute N. Pollak (1983) developed similar estimators for population sampling without replacement (hypergeometric). In the examples we considered, the three estimators derived by Pollak (1983) are the same as the harmonic mean estimator except larger by multiplicative constants. Because in our simulation tests the Fstatistic estimator was biased upward, the Pollak estimators of N were less accurate than our harmonic mean estimator and are not shown.
In all replicates, the number of sampled individuals in each sampling event was twice as large as the size of the simulated population. When sample sizes are small, sampling error rather than genetic drift may be the primary reason for observed changes in allele frequencies (Waples 1989b). We avoided this problem by assuming large samples could be taken from the population. Fortunately, it is frequently possible to sample large numbers of individuals even when the number of breeding individuals is low. Often, juveniles are sampled from populations with high juvenile mortality, and the number of breeding adults may be orders of magnitude lower than the number of juveniles. Thus, our sampling scheme is representative of those found in the literature (Waples 1990; Waples and Teel 1990; Hedgecocket al. 1992; Husband and Barrett 1992a,b; Jordanet al. 1992; Lessioset al. 1994; Hedricket al. 1995; Burczyk 1996; Jorde and Ryman 1996; Richards and Leberg 1996; Miller and Kapuscinski 1997; Scribneret al. 1997; Dallaset al. 1998; Laikreet al. 1998; Lehmannet al. 1998).
In an application to real data rather than a simulation test, the Fstatistic or the maximumlikelihood estimate can always be computed, but the estimate may be infinity or very large. In our simulation tests, to reduce running time, the program halted all searches for the maximum of the likelihood curve if the maximumlikelihood estimate of population size was determined to be >500. We chose 500 because it is an order of magnitude larger than the true population size, 50. To be fair to both estimators, we discarded all simulation replicates in which either estimator was >500. For each estimator, we computed the mean and standard deviation for the estimates of N using the replicates that were not discarded. As a separate statistic we recorded the number of times each estimator was >500. The number of discarded replicates was the union of these events. These simulation tests were computationally intensive, so we ran many more replicates for the cases in which there were 5, 10, or 15 loci as these are the most realistic values for applications of this method to real data.
Simulation results: Both estimators tended to overestimate population size and had a high variance for samples with few loci. This upward bias and large variance of the estimators would have been much greater if we had not discarded replicates. However, in all of our simulation tests, the mean maximumlikelihood estimate of population size among the replicates that were not discarded was closer to the simulated population size than was the mean Fstatistic estimate (Table 1). Similarly, the variance in the likelihood estimates among these replicates was smaller than the variance in Fstatistic estimates (Table 1). Also, the Fstatistic estimate was more than an order of magnitude larger than the true value about twice as often as the maximumlikelihood estimate (Table 1). The estimates were positively correlated and the Fstatistic estimate was generally >500 whenever the maximumlikelihood estimator was >500. We had similar results when we ran simulation tests with initial allele frequencies drawn from beta distributions with varying parameters (not shown). Thus the maximumlikelihood estimate seems robust to violations of our assumption that allele frequencies are drawn initially from a uniform distribution.
As expected, increasing the number of loci or the number of sampled timepoints reduced both the variance and the bias in both estimators (Table 1). When the number of markers was very large, on the order of 100 to 200 loci, then both estimators performed very well. The difference between the two estimators was more obvious with samples of fewer markers. Increasing the number of sampling times improved the estimates for both methods (Table 1). The dramatic improvement in both estimators with multiple sampling (Table 1) may be explained by the initial rapid increase in accuracy in both estimators, particularly the Fstatistic estimator, as the number of sampled loci increases. Having samples covering two time intervals of the same length is roughly equivalent to sampling twice as many loci over one time interval.
ANALYZING A REAL DATA SET
We chose data from Miller and Kapuscinski (1997) to use as an example of the maximumlikelihood approach to estimating population size. These authors typed seven polymorphic microsatellite loci using a historical collection of fish scales taken from a completely isolated population of Northern pike (Esox lucius). Of the seven loci, five had two alleles and two had three alleles. For our analysis in this article, we combined the two least common allelic classes together for the two loci with three alleles to artificially create a completely diallelic data set. The scales had been removed from individuals in the population in 1961, 1977, and 1993. These dates represent two intervals of approximately four generations each. The samples in 1961 and 1993 were taken with replacement to the population. The 1977 sample was taken without replacement, although probably after reproduction for at least some of the sampled individuals. Miller and Kapuscinski used the Fstatistic method to estimate variance effective population size two ways, first assuming the 1977 data were generated with replacement, then without. They concluded that the two estimates are not significantly different because the census size is orders of magnitude larger than the estimated number of breeders. In our analysis, for simplicity, we assumed that the samples in the Miller and Kapuscinski study were taken with replacement.
Our results agreed with those of Miller and Kapuscinski. Our maximumlikelihood estimate of population size based on their data was 46, with a support interval, based on a decrease of two in the loglikelihood, of 21 to 112 (Figure 1). In their article, Miller and Kapuscinski estimated variance effective population size using the same Fstatistic method that we used for the simulation studies in this article. They estimated the variance effective population size to be 48 with a confidence interval of 19 to 101. The agreement between the two estimators for this data set suggests that the assumptions of the underlying models on which these statistics are based are probably well met by this population, as is discussed by Miller and Kapuscinski in their article.
ESTIMATING POPULATION GROWTH RATE
One advantage of the likelihood framework for parameter estimation is the flexibility to modify the underlying model. Here, we modified the WrightFisher model to include exponential growth. In this model the population size at generation t, N_{t}, is the smallest integer less than or equal to r^{t}N_{0}, where r is the growth rate and N_{0} is the initial population size. Still treating the initial allele frequency as a nuisance parameter, we computed the marginal likelihood of the growth parameter and the starting population size simultaneously. To compute the transition probabilities, the matrix M^{t} of Equation 5 was replaced by M_{1}, M_{2},... M_{t}, where M_{i} is a matrix with N_{i} columns and N_{i}_{1} rows. Computing the likelihood of each combination of N_{0} and r requires different matrices, M_{1}, M_{2},... M_{t}. As a result, this method is computationally intensive, even in the diallelic case, and so we were not able to do extensive simulation tests to evaluate this method.
As an example, we applied the method to the data from the pike population of Miller and Kapuscinski (1997). The maximumlikelihood estimate was a growth rate of 120% per generation with an initial diploid population size of 25 in 1961 (Figure 1b). However, with the data available, the likelihood support interval is so large (entire graph in Figure 2) that it is impossible to determine if the population is growing, shrinking, or stable. The census size of this population, estimated from markrecapture studies, fluctuated without any clear trend in the study interval between 1961 and 1993 (Miller and Kapuscinski 1997, Table 4).
To test the potential utility of our method for a much larger data set, we generated samples with 150 loci from a simulated population. Three samples were taken with replacement, four generations apart, from the simulated population. The simulated population grew at a rate of 120% per generation with an initial size of 25. These parameters are the maximumlikelihood estimates from the Miller and Kapuscinski data. The maximumlikelihood estimates from the simulated data were equal to the true parameter values (Figure 3). The support interval was much smaller with this many markers, and it could be unambiguously determined that the population was growing (Figure 3). Thus, maximum likelihood can be used to estimate growth rate from temporal changes in allele frequencies, but only with extensive data.
DISCUSSION AND CONCLUSIONS
The Fstatistic method of estimating the variance effective population size is known to be biased upward and have a large variance (Pollak 1983; Waples 1989a). Estimates of the number of breeding individuals in a population may be needed to help guide management and conservation decisions for endangered species and populations. Therefore, any method that offers an increase in accuracy in estimating population size from genetic data should be seriously considered. The likelihood estimator for N appears to have a lower variance and smaller bias than the Fstatistic estimator of the variance effective size (Table 1).
A disadvantage of using likelihood to estimate N is that it is more computationally intensive than using Fstatistics. As a consequence, it is difficult to assess the performance of the likelihood method against the Fstatistic approach over a wide range of parameters. Furthermore, even with dramatic improvements in computing speed, it would still be unrealistic to exactly compute the likelihood function for samples of highly polymorphic loci. One approach might be to bin allelic classes together and reduce the number of alleles observed to two or three. Another more desirable approach would be to estimate the likelihood function for the full multiallelic data set by using Monte Carlo simulation or Markov Chain Monte Carlo methods (E. C. Anderson, unpublished results). The appropriateness of either approach would have to be determined on a casebycase basis and would depend on the number of loci, number of alleles, population size, accuracy desired, and limitations in terms of computational resources.
Both the Fstatistic and the maximumlikelihood estimator assume that the study population is isolated and that the markers are selectively neutral and not mutating. Selection, migration, and mutation all have the power to change allele frequencies and would therefore change the estimated value of N using temporal allele frequency data. For example, a population receiving a constant stream of immigrants from a large source population every generation may have very stable allele frequencies in spite of a very small size. In this case, population size would be overestimated using a temporal method. On the other hand, a single rare large pulse of immigration from a population between sampling events could cause the population size to be underestimated. It is therefore important to carefully assess the study population to see if it meets these criteria before using temporal changes in allele frequencies to estimate population size or growth rate.
Fstatistic estimates of variance effective size are generally improved more dramatically if more loci are sampled rather than more alleles. This is in part due to the fact that very rare alleles can be lost within a generation or two, while intervals between sampling events are often longer than this (Laikreet al. 1998; Luikartet al. 1998). Also much of the computational difficulty in estimating N with likelihood derives from the inclusion of multiple alleles. Thus, the best allocation of resources may be to gather samples from many diallelic loci. Diallelic dominant markers such as RAPDs can be easily incorporated into the likelihood model and can also be combined with other types of markers. Studies that use diallelic markers should benefit from the increase in accuracy offered by the maximumlikelihood approach developed here.
A primary advantage of the likelihood framework is its flexibility. For example, if one could generate from preexisting data a prior distribution for N, then one could extend the method developed here to a fully Bayesian approach and then estimate a posterior distribution for N. In this article we demonstrated how temporal allele frequency data could be used to estimate population growth rates and population size simultaneously. Temporal allele frequency data could potentially be used to estimate other parameters with likelihood as well. This would require a model relevant to the study population that can be specified with only a few parameters. For example, if we were interested in estimating variance in family size instead of growth rate, a fecundity distribution that could be parameterized with a single parameter would be ideal. From this fecundity distribution, a transition matrix for population allele configurations would need to be derived. Thus, in addition to a more accurate estimate of the population size, the likelihood framework offers the flexibility to modify the working model and the potential to estimate new parameters.
Acknowledgments
This work was funded by National Institutes of Health grant GM40282 to M. Slatkin. We thank E. Anderson, J. Beilawski, G. Bertorelle, C. Muirhead, R. Neilsen, B. Rannala, S. Tavaré, and T. Turner for their comments and suggestions.
Footnotes

Communicating editor: G. A. Churchill
 Received October 12, 1998.
 Accepted February 22, 1999.
 Copyright © 1999 by the Genetics Society of America