Genetics, Vol. 156, 2109-2118, December 2000, Copyright © 2000

Monte Carlo Evaluation of the Likelihood for Ne From Temporally Spaced Samples

Eric C. Andersona, Ellen G. Williamsonb, and Elizabeth A. Thompsonc
a Interdisciplinary Program in Quantitative Ecology and Resource Management, University of Washington, Seattle, Washington 98195,
b Department of Integrative Biology, University of California, Berkeley, California 94720-3141
c Department of Statistics, University of Washington, Seattle, Washington 98195

Corresponding author: Eric C. Anderson, Department of Statistics, University of Washington, Box 354322, Seattle, WA 98195., eriq{at}cqs.washington.edu (E-mail)

Communicating editor: G. A. CHURCHILL


*  ABSTRACT
*TOP
*ABSTRACT
*FORMULATION OF THE MODEL...
*SIMULATED AND REAL DATASETS
*RESULTS
*DISCUSSION
*APPENDIX
*LITERATURE CITED

A population's effective size is an important quantity for conservation and management. The effective size may be estimated from the change of allele frequencies observed in temporally spaced genetic samples taken from the population. Though moment-based estimators exist, recently Williamson and Slatkin demonstrated the advantages of a maximum-likelihood approach that they applied to data on diallelic genetic markers. Their computational methods, however, do not extend to data on multiallelic markers, because in such cases exact evaluation of the likelihood is impossible, requiring an intractable sum over latent variables. We present a Monte Carlo approach to compute the likelihood with data on multiallelic markers. So as to be computationally efficient, our approach relies on an importance-sampling distribution constructed by a forward-backward method. We describe the Monte Carlo formulation and the importance-sampling function and then demonstrate their use on both simulated and real datasets.


REDUCTIONS in population size can lead to inbreeding, which increases the probability of population extinction in typically outbreeding species (FRANKHAM 1995 Down). Reductions in population size also lead to a loss of genetic diversity, which may restrict a population's ability to adapt to changing conditions (SOULE 1986 Down). To predict the risk to a population from these types of genetic factors, biologists are often interested in knowing the effective population size, Ne. An effective size is defined by comparison to an ideal population model, the Wright-Fisher model. The Wright-Fisher model assumes discrete, nonoverlapping generations of constant size, and it assumes that the gametes that unite to form adults in one generation are randomly sampled with replacement from the previous generation. The variance effective size of a natural population is the size of a Wright-Fisher population that would experience a comparable increase in variance of gene frequency over time. The inbreeding effective size is defined similarly, but is based on the increase in gene identity by descent over time.

It is possible to estimate the variance effective size from observed changes in allele frequencies in a population over time. Moment-based estimators using F-statistics have been developed for this purpose (KRIMBAS and TSAKAS 1971 Down; NEI and TAJIMA 1981 Down; POLLAK 1983 Down; WAPLES 1989 Down; JORDE and RYMAN 1995 Down). Recently, WILLIAMSON and SLATKIN 1999 Down described a method to estimate Ne by the method of maximum likelihood. To find the maximum-likelihood estimate e of Ne, given allele frequencies observed in samples taken from a population at different times, one models the population underlying the samples as a Wright-Fisher population. e is then the size of that underlying, ideal population for which the observed data are most probable. In simulation studies WILLIAMSON and SLATKIN 1999 Down showed that the maximum-likelihood estimator outperformed the moment-based estimators, and they also demonstrated how a likelihood approach may be extended to estimate parameters in more complex population models.

This likelihood method has been restricted to data on diallelic loci, because, with data on multiallelic loci, evaluating the likelihood for Ne exactly is computationally intractable. Here we describe the problem as one of inference from a hidden Markov chain (BAUM et al. 1970 Down). We develop an algorithm for importance sampling, which makes it possible to compute the likelihood by Monte Carlo.


*  FORMULATION OF THE MODEL AND MONTE CARLO
*TOP
*ABSTRACT
*FORMULATION OF THE MODEL...
*SIMULATED AND REAL DATASETS
*RESULTS
*DISCUSSION
*APPENDIX
*LITERATURE CITED

The model:
The data are genetic samples collected at different generations. The first sample is collected at generation 0 and the last sample at generation T. Any samples drawn at intervening generations may be evenly or irregularly spaced in time. For notational simplicity, we assume for now that individuals are genotyped at a single locus, though we describe later the extension to multiple, independently segregating loci. The data include K different allelic types, indexed by k = 1, ... , K. The allele frequencies observed in samples taken from different generations will differ due to genetic drift and sampling variation.

Let Yt = (Yt,1, ... , Yt,K) be the counts of the K different allelic types in the sample at generation t, and let St denote the number of diploid individuals in the sample. We assume that the samples were taken from a Wright-Fisher population of size Ne and denote the unobserved population allele counts at generation t by Xt = (Xt,1, ... , Xt,K), with {Sigma}Kk=1 Xt,k = 2Ne. By the formulation of the Wright-Fisher model, the Xt form a Markov chain in time, with transitions defined by multinomial probabilities depending on Ne,

(1)

The genetic sample at a time t is assumed to be drawn with replacement from the copies of alleles present in the population at time t. This is equivalent to drawing the sample Yt from a very large gamete pool produced by the population at time t: sampling plan II of WAPLES 1989 Down. This type of sampling applies to many organisms, especially those species with high fecundity that may be sampled as juveniles, or those that may be sampled (preferably noninvasively) as adults in populations having census sizes considerably larger than their effective sizes (WAPLES 1989 Down). The sample allele counts Yt, given the latent variable Xt, are conditionally independent of all the other variables and follow the multinomial distribution depending on the parameter Ne, the sample size St, and Xt,

(2)

when St > 0. If there is no sample taken from the population at generation t, then St {equiv} 0, and we define PNe(Yt|Xt) {equiv} 1.

Such a system forms a hidden Markov chain with the dependence structure shown in the directed graph of Fig 1. The allele counts in the population when the first sample is drawn, X0, are nuisance parameters. To avoid estimating X0 and the associated consistency problems with the maximum-likelihood estimator (NEYMAN and SCOTT 1948 Down), we consider the integrated likelihood, assuming a uniform prior distribution, {pi}(X0), on the population allele counts at time 0. The likelihood for Ne is the probability of the data Y = (Y0, ... , YT) given the parameter Ne. The probability of Y is the sum of the joint probability of Y and the latent variables X = (X0, ... , XT) over the space of all X:

(3)



View larger version (22K):
In this window
In a new window
Download PPT slide
 
Figure A1. Figures representing and for 2N* = 20. The normal curve is the density for {theta}. (a) Reflections and translations as described in the Appendix Long-dashed lines represent the curve after reflection through L or H, while the short-dashed lines represent the reflected curve after one or more successive translations. (b) If {delta} = 1 and {kappa} = 1, then X(i) is constrained to be in {1 ... , 2N* - 1}. The shaded regions correspond to those values of {theta} for which X(i) = 13 by . The total shaded area is equal to µ,{sigma}2(X = 13; 10, 1, {gamma}, 1). (c) If {delta} = 0 then X(i) may take the value 0. The shaded area shows µ,{sigma}2(X = 0; 10, 0, {gamma}, {kappa}). (d) If {kappa} = 0 then X may take the value 2N*. The shaded area shows µ,{sigma}2(X = 2N*; 10, {delta}, {gamma}, 0).

For the case of K = 2 and Ne small, the likelihood in (3) may be computed exactly. WILLIAMSON and SLATKIN 1999 Down effected the summation in (3) in terms of multiplication of transition probability matrices. The dimension of the square matrices is (Ne - 1)!/[(Ne - K)!(K - 1)!], which increases rapidly with Ne and K. We note that the hidden Markov form of the system allows a more efficient computation of the likelihood using the algorithm of BAUM 1972 Down. Nonetheless, exact evaluation for multiple alleles would still require prohibitively large amounts of computation and storage. An alternative is to estimate PNe(Y) by Monte Carlo.

Monte Carlo evaluation:
For likelihood inference, we must evaluate PNe(Y) for a number of different values of Ne. Expressing this probability as an expectation with respect to the distribution of X gives

(4)

In this form the expectation would be taken over the marginal probabilities of X, and it could be estimated by Monte Carlo as

(5)

for large m, with X(i) being the ith realization from the marginal distribution of X. Such a naive scheme fails, however, because PNe(Y|X(i)) varies greatly over the values of X realized from their marginal distribution, resulting in enormous Monte Carlo variance.

Instead, we pursue a more efficient Monte Carlo approximation by using importance sampling (HAMMERSLEY and HANDSCOMB 1964 Down). We express PNe(Y) as an expectation with respect to a different distribution P*Ne(X) having the property that P*Ne(X) > 0 for all X such that PNe(Y, X) > 0. Thus, we have

(6)

where *Ne indicates that the expectation is over the space of X weighted by the distribution P*Ne(X). The expectation (6) may be estimated by Monte Carlo, giving

(7)

for large m, where X(i) is the ith realization of X drawn from P*Ne(X). The Monte Carlo variance of Ne(Y) is made small when varies little across the possible values of X and would be minimized if P*Ne(X) were exactly proportional to PNe(Y, X). Such a distribution of X would, by definition, be the conditional distribution PNe(X|Y). Unfortunately, for the same reasons that PNe(Y) cannot be computed exactly, it is infeasible to compute PNe(X|Y). Nonetheless, the Monte Carlo variance of Ne(Y) will be reduced to the extent that P*Ne(X) resembles PNe(X|Y). We now describe a method for rapid simulation of X(i)'s from a distribution P*Ne(X) that is close to PNe(X|Y). As is required for the importance sampling, it is also possible to compute P*Ne(X(i)) quickly for each X(i) generated.

Sampling from P*Ne(X) by a forward-backward method:
BAUM et al. 1970 Down describe a method applicable to general, hidden Markov chains for realizing latent variables, such as X = (X0, ... , XT), from their exact conditional distribution given the observed variables, such as Y = (Y0, ... , YT). Their algorithm first employs a "forward step" in which the conditional probability distributions of each Xt, given the observed variables up to and including Yt, are recursively computed and stored using the relation

(8)

which is normalized by the sum of that quantity over all the values of Xt. The last such conditional distribution computed is P(XT|Y0, ... , YT). The "backward step" begins with simulating a value X(i)T from this distribution (where, as before, the superscript (i) indicates a realized value of a random variable). One then proceeds backward, realizing X(i)T-1 from its conditional distribution given all of the observed variables, Y, and X(i)T. In similar fashion, one realizes X(i)T-2 and so forth back to X(i)0. In this backward phase, each X(i)t is simulated from its conditional distribution given all the data Y and all of the components of X that have been realized so far. That is, X(i)t is drawn from

(9)

Because of the conditional independence structure in a hidden Markov chain, (9) reduces to P(Xt|Y0, ... , Yt, X(i)t+1), which may be computed using the distributions stored during the forward step by the relation

(10)

At the end of the backward step, it is thus clear that the resulting realization (X(i)0, ... , X(i)T) is from the conditional distribution of X given Y.

An approximation for multiple alleles: In our application, with multiple alleles at a locus, since there are so many possible states that each Xt may take, the above procedure is computationally infeasible. However, we make use of the BAUM et al. 1970 Down algorithm in spirit, employing two alterations to make it feasible to simulate from P*Ne(X) and to compute its value. We emphasize that although the method described below involves a series of "approximations" by which P*Ne(X), differs from PNe(X|Y), the final sampling and computation of P*Ne(X) is exactly from the P*Ne(X) as constructed, so its use in (7) gives a true Monte Carlo estimate.

The first approximation is to perform the forward-backward cycle separately for each allele. To describe this, we introduce some more notation. Denote by X(k) the vector (X0,k, ... , XT,k) of latent counts of the kth allele from time t = 0 to t = T. Similarly we define Y(k) = (Y0,k, ... , YT,k). To do the forward-backward cycle separately over alleles we first focus on allele 1, simulating X(i)(1) by the forward-backward mechanism as if the data were on a diallelic locus with observed counts Y(1) from samples of size S0, ... , ST through time. Once we have realized X(i)(1) we update the sizes of the population and the sample. Thus we define the updated population size vector 2N*(2) = (2Ne - X(i)0,1, ... , 2Ne - X(i)T,1) and an updated sample size vector 2S*(2) = (2S*0,2, ... , 2S*T,2) = (2S0 - Y0,1, ... , 2ST - YT,1), in effect removing the first allelic type from the remainder of the data and the population. We then use the forward-backward mechanism again to simulate X(i)(2) as though the data were counts Y(2) from a diallelic locus drawn from a population with sizes that change over time N*(2) and sample sizes S*(2). This continues sequentially over alleles, updating population sizes and sample sizes as above: 2N*(k) <- (2N*(k-1) - X(i)(k-1)) and 2S*(k) <- (2S*(k-1) - Y(i)(k-1)), until X(K-1) has been realized, which also determines that X(K) <- (2N*(K-1) - X(i)(K-1)). (Here and later we use the notation A <- B to mean "the value B is assigned to the variable A.") At the end one has obtained a realized value X(i), which may be used in (7).

P*Ne(X) using a continuous approximation: Although realizing alleles sequentially, as above, greatly reduces the number of terms required to use (8) and (10), the method would still involve a prohibitive amount of summation over binomial probabilities. Thus we construct P*Ne(X), employing a normal approximation to binomial probabilities, which replaces all such sums by analytically tractable integrals. Recall that if W ~ Binomial(n, p), then the transformed variable sin-1(W/n)1/2 is approximately normally distributed with variance 1/(4n). Note that this quantity does not depend on p. Hence we use this transformation to define the quantities {phi}t,k = sin-1[] when S*t,k > 0, and {theta}t,k = sin-1[]. By realizing the continuous values {theta}(i)t,k in a forward-backward framework within a continuous setting, the computational demands are greatly reduced. And then, by transforming each {theta}(i)t,k back into the appropriate discrete X(i)t,k we have a way to realize X(i) from P*Ne(X) and to compute the probability P*Ne(X(i)). The details of this procedure are given in the Appendix. We use it to compute the Monte Carlo estimate Ne(Y) using (7).

Monte Carlo variance and multiple loci:
The quantity Ne(Y) is only an estimate of the true value PNe(Y). By the central limit theorem, for large m, PNe(Y) will be approximately normally distributed (HAMMERSLEY and HANDSCOMB 1964 Down) with mean Ne(Y) and a variance that may be approximated without bias by the quantity

(11)

These facts may be used to obtain a confidence interval estimate around each Ne(Y).

The ability to estimate Ne typically requires data from many loci. The extension to data on J independently segregating loci, indexed by j = 1, ... , J, is straightforward—each locus is treated separately, and the estimated likelihoods from each locus are multiplied together. Thus, let Ne,j(Y) be the Monte Carlo likelihood estimate from the data on the jth locus. The Monte Carlo likelihood estimate using all the loci is then

(12)

This requires that the initial allele counts have independent prior distributions, {pi}(X0). An implicit assumption of (12) is consequently that the loci used are in linkage equilibrium at t = 0. JNe(Y) will also have an approximately normal distribution. An unbiased estimator for its Monte Carlo variance (derived in Equation A9 in the Appendix) is

(13)

This can be used to compute a confidence interval estimate around JNe(Y).

When displaying the Monte Carlo likelihood curve it is preferable to plot the log-likelihood values, log JNe(Y), for different values of Ne. In this case, the endpoints of the confidence intervals may be similarly log-transformed.


*  SIMULATED AND REAL DATASETS
*TOP
*ABSTRACT
*FORMULATION OF THE MODEL...
*SIMULATED AND REAL DATASETS
*RESULTS
*DISCUSSION
*APPENDIX
*LITERATURE CITED

We demonstrate the method by computing log-likelihood curves for Ne from three different datasets. First, to verify that the method gives correct results we apply it to a simple simulated dataset (dataset 1) for which it is possible to compute the likelihood exactly. We simulated samples of 20 diallelic loci from 100 diploid individuals at generations 0, 6, and 12 from a Wright-Fisher population of 25 diploid individuals. For each locus, the initial allele frequency in the population at time zero was an independently drawn uniform real number between 0 and 1. The log-likelihood for Ne given these data was estimated for values between 10 and 52, in steps of 2, using m = 20,000 realizations of X from P*Ne(X) for each locus and each Ne.

We simulated a second dataset (dataset 2) to see how the method performed with multiallelic markers taken from a Wright-Fisher population. The dataset included three samples of 100 diploids for 12 five-allele loci at generations 0, 4, and 8 from a population of 50 diploids. The allele frequencies at each locus in generation 0 for these simulations were independently drawn from a uniform Dirichlet density with five components. For this dataset, the log-likelihood was computed for values of Ne between 20 and 100 in increments of 4 using m = 50,000 realizations of X for each locus and each value of Ne.

Finally, we computed a log-likelihood curve for Ne given data on a population of Drosophila in BEGON et al. 1980 Down. These data were analyzed using F-statistics by BEGON et al. 1980 Down as well as by POLLAK 1983 Down. They observed allele frequencies in three samples at each of nine enzyme loci. The first two samples were taken a little more than 1 yr apart, and the third sample was taken some 8 mo later. Though the natural populations do not have discrete generations, they have been modeled previously by Begon et al. and Pollak as populations with discrete generations. Because of the different growth rates of flies during different seasons, seven generations separate the first two samples, while only two generations separate the second two samples (BEGON et al. 1980 Down). The sample sizes for all loci were the same, with larger sample sizes taken in the latter sampling periods. The sample sizes were S0 = 190, S7 = 250, and S9 = 335 flies. POLLAK 1983 Down notes that since BEGON et al. 1980 Down sampled adult flies, their sampling scheme is closer to what is known in the literature as sampling scheme I than it is to sampling scheme II. However, as discussed by WAPLES 1989 Down, the probability models underlying the two different sampling schemes are very similar when the actual size of the population is much larger than the effective size of the population. This is the case with these Drosophila. BEGON et al. 1980 Down report census sizes in the tens of thousands of flies, while the estimated Ne is orders of magnitude smaller. Because of this, it is still reasonable to analyze the data using the likelihood method we have developed here.

The data appear as allele frequencies in Table 1 of BEGON et al. 1980 Down. Unfortunately the allele frequencies at the Pgm locus are misreported there and fail to sum to one. We thus used only the remaining eight loci. Of these eight, three had three alleles, two had four alleles, two had five alleles, and one had six alleles. We evaluated JNe(Y) at values of Ne between 200 and 1200 in increments of 50, with two more points (Ne = 425 and Ne = 475) included near the peak of the likelihood curve. For each point we used m = 500,000 realizations of X.


*  RESULTS
*TOP
*ABSTRACT
*FORMULATION OF THE MODEL...
*SIMULATED AND REAL DATASETS
*RESULTS
*DISCUSSION
*APPENDIX
*LITERATURE CITED

For each of the datasets, we were able to use our importance-sampling method to compute a log-likelihood curve. Using a program we wrote in C, the runs for datasets 1 and 2 each took ~10 hr on a 266-Mhz laptop computer. The log-likelihood curves from datasets 1 and 2 appear as solid lines in Fig 2. The estimated 90% confidence intervals around each value of log JNe(Y) appear as two dashed lines bordering the log-likelihood curve. Despite the fact that few Monte Carlo replicates (m = 20,000 and 50,000) were used, the Monte Carlo variance is minimal, as indicated by the fact that the dotted lines practically lie on top of the estimated log-likelihood curve. In both cases, the true values of Ne (25 and 50, respectively) are well within 2 units of log-likelihood from the maximum-likelihood estimates, which may be read from the graph as 24 and 56. Since dataset 1 consists only of diallelic loci, it is possible to compute the exact log-likelihood curve. This exact curve has been plotted as a dotted line in Fig 2A. It is impossible to distinguish the exact curve because the Monte Carlo estimate is very accurate in this case.



View larger version (16K):
In this window
In a new window
Download PPT slide
 
Figure 2. Log-likelihood curves estimated by Monte Carlo from datasets 1 and 2. The values of Ne at which the likelihood was computed are indicated by vertical lines above the horizontal axis in each figure. The log-likelihood values are connected by a solid line. Vertical bars intersecting the solid line indicate 90% confidence intervals around log JNe(Y) computed using the Monte Carlo variance estimate (13). The endpoints of the confidence intervals are connected by dashed lines. (a) Dataset 1 is simulated data from 20 diallelic loci. (b) Dataset 2 is simulated data from 12 loci with five alleles each.

The log-likelihood curve computed for the data of BEGON et al. 1980 Down is shown in Fig 3. It took ~54 hr on a 450-Mhz desktop computer to produce the results. As before, the 90% confidence intervals around the Monte Carlo estimates appear as dotted lines. With this dataset, even with m = 500,000 realizations of X, the Monte Carlo variance is not negligible. It is, however, small enough that reliable inferences may be made from the log-likelihood curve. The maximum-likelihood estimate of Ne is 500. Using the values of Ne at which the log-likelihood has decreased 2 units from its maximum gives an estimate of a 95% confidence interval for the true Ne. These points are 250 and 975. By contrast, POLLAK 1983 Down, using an F-statistic method, estimated Ne to be 251 with a standard error of 115. We discuss the discrepancy between the two estimates in the next section. Our results are not comparable to the Ne estimated by BEGON et al. 1980 Down because, at the time, those authors were unable to make a single estimate of Ne using the samples at all three time points.



View larger version (24K):
In this window
In a new window
Download PPT slide
 
Figure 3. Log-likelihood curves from the data of BEGON et al. 1980 Down estimated by Monte Carlo. The format of the plot is as for Fig 2.


*  DISCUSSION
*TOP
*ABSTRACT
*FORMULATION OF THE MODEL...
*SIMULATED AND REAL DATASETS
*RESULTS
*DISCUSSION
*APPENDIX
*LITERATURE CITED

As discussed in WILLIAMSON and SLATKIN 1999 Down, estimating Ne by maximum likelihood has advantages over estimating Ne using F-statistics. Until now, it was impractical to compute the likelihood for Ne using all the data when loci with more than two alleles were available. While it has been suggested that one may bin low-frequency alleles together to turn multiallelic loci into apparently diallelic loci and then apply exact likelihood calculation methods to such reduced data, this invariably throws away some information. Furthermore, different binning strategies lead to different results. Allowing full use of the data, the Monte Carlo likelihood procedure described here is a preferable way to analyze temporal data on multiallelic loci. The method is suitable for multiallelic loci such as the microsatellite markers becoming available in a wide variety of species.

Monte Carlo methods use realizations of random variables to estimate an expectation by a sample average. There are a number of ways one can express the likelihood of Ne as an expectation, and then estimate it by Monte Carlo, but few of those schemes will be successful, because most will have high Monte Carlo variance. We attempted several different schemes before settling on the importance-sampling method presented here. Although these less sophisticated Monte Carlo estimators produced reasonable estimates in very small problems, when applied to data involving loci with many alleles these methods failed to converge reliably, even after many days of computation (E. C. ANDERSON and E. G. WILLIAMSON, unpublished data).

The importance-sampling method we use is successful because our importance sampling function, P*Ne(X), closely resembles PNe(X|Y), the conditional probability of X given Y. This is achieved by recognizing the hidden Markov chain structure of the problem and using the forward-backward algorithm of BAUM et al. 1970 Down. In doing so, we have developed a Monte Carlo estimator with demonstrably small Monte Carlo variance. Though the computational demands of this procedure are substantial, the reduction in Monte Carlo variance obtained makes it worthwhile. Nonetheless, it may be possible to improve the estimates by making additional changes to P*Ne(X) so that it more closely resembles PNe(X, Y), especially in the tails of the distribution. This would further reduce the Monte Carlo variance.

It should be pointed out that while many Monte Carlo problems involving high-dimensional random variables like X make use of Markov chain Monte Carlo (MCMC) methods, our method is not an MCMC method. In MCMC, successive realizations are correlated. In our method each X(i) realized from the distribution P*Ne(X) is independent of the other realized values. As a result, our method does not have the same problems of convergence assessment as does Markov chain simulation (GELMAN 1996 Down).

It is interesting that our maximum-likelihood estimate differs so much from the estimate given by POLLAK 1983 Down for the same data. Though perhaps some of this is attributable to the fact that we chose not to use the incorrectly reported data at the Pgm locus, there are differences between the two estimation methods that could also account for some of the discrepancy. The most notable differences occur when combining information from multiple samples in time. Consider the fact that a better estimate of Ne may be made from two samples taken many generations apart than from two samples separated by fewer generations. Likewise two large samples will yield a better estimate than two small samples. When there are many samples, the relative information content in different intersample intervals will depend on the relative sample sizes and the number of generations between the samples. By its nature, the maximum-likelihood approach will appropriately weight information from different intervals. In contrast, Pollak's F-statistic, FKr, neither includes terms for sample size nor interval length between samples, and, ÑKr, his estimate of Ne based on FKr, includes a term for only the number of generations between the first and the last sample and is invariant to permutations of the sample sizes at different times. Since the data from BEGON et al. 1980 Down span sampling intervals of different lengths and include different sample sizes at different times, differences between our results and those in POLLAK 1983 Down should be expected.

The Monte Carlo variance of our estimate of the likelihood given the data from a natural population of Drosophila was higher than the variance associated with our estimates from simulated data. Although a good estimate was achieved after sufficient computation, it may still be that data generated under a model that differs from the Wright-Fisher model present difficulty for the Monte Carlo likelihood method. For example, it may be that the effective size of the natural Drosophila population was different during the two different sampling intervals. We note that one could extend the likelihood framework to allow for Ne changing over time. For example, if the estimated census size of the population were available and was known to change over time it would be more sensible to estimate directly the ratio, {lambda}, of the effective size of the population to the census size of the population. This ratio would be more useful for the purposes of modeling genetic change in the population than a single estimate of Ne over the entire time period between the first and last samples. The forward-backward approach implemented here could easily be modified to estimate this parameter, {lambda}.

It would also be possible to extend the present approach to compute likelihoods from different stochastic models of allele frequency change. We suspect there would seldom be enough information in the data to estimate accurately models in which allele frequency change is due jointly to genetic drift and some other genetic mechanism such as mutation or selection. However, our methods could be modified to handle different demographic models. For example, consider an alteration of the Wright-Fisher model such that the current generation is formed by sampling genes without replacement from a gamete pool in which each parental allele is represented by M gametes. In such a case, allele frequency changes occur due to multivariate hypergeometric sampling determined by the parameters N and M. For values of M that are not very small, the normal distribution is still a reasonable approximation to the hypergeometric distribution, and importance sampling could proceed as before. The importance-sampling distribution P*N,M(X) should then account for the decrease in variance (by a factor of 2N[M - 1]/[2N M - 1]) of the hypergeometric distribution relative to the binomial distribution. The same sort of sampling model and adjustment in the importance-sampling distribution could be used to accommodate scenarios in which one's genetic samples (the data Y) were assumed drawn without replacement from the population.

Another possible extension would be to stochastic models involving populations of organisms with more complex life histories, for example, overlapping generations or age-structured populations. As it becomes easier to gather extensive genetic data on populations, and as understanding of the structure within those populations improves, it will be possible to specify much richer probability models for temporal population genetic data. The forward-backward method here, and variations of it, should be useful in estimating population parameters from such models using Monte Carlo likelihood.

A software package, MCLEEPS, implementing the algorithm described in this article is available for free download from http://www.stat.washington.edu/thompson/Genepi/Mcleeps.shtml.


*  ACKNOWLEDGMENTS

We thank an anonymous referee for helpful comments on the manuscript and for suggesting the extension to the population model with hypergeometric sampling. This study was supported by National Science Foundation grant BIR-9807747 to E.T. Additionally, E.A. was supported by National Science Foundation grant BIR-9256537, the University of Washington QERM program, and a Burroughs Wellcome Fund PMMB training fellowship. E.W. was supported by National Institutes of Health grant GM40282 to M. Slatkin.

Manuscript received December 27, 1999; Accepted for publication August 7, 2000.


*  APPENDIX
*TOP
*ABSTRACT
*FORMULATION OF THE MODEL...
*SIMULATED AND REAL DATASETS
*RESULTS
*DISCUSSION
*APPENDIX
*LITERATURE CITED

Using {theta}t,k and {phi}t,k in a continuous setting:
We define the random variables {phi}t,k = sin-1[] when S*t,k > 0, and {theta}t,k = sin-1[ when N*t,k > 0. These quantities have an approximate normal distribution, which is independent of their means. We use them in our construction of the importance-sampling function PNe(X). Below, we concentrate on their use for realizing X(i)(k), keeping in mind that if k > 1 then we will have already realized X(i)(k-1), and we use the updated population and sample sizes N*(k) and S*(k). If k = 1 then N*(1) = (Ne, ... , Ne) and S*(1) = (S0, ... , ST), respectively.

The forward step:
Following CAVALLI-SFORZA and EDWARDS 1967 Down, if {theta}t-1,k is normally () distributed with mean µt-1 and variance {sigma}2t-1, then, after a generation of genetic drift in a population of N*t,k diploids, {theta}t,k has an approximate normal distributions with mean µt-1 and variance {sigma}2t = {sigma}2t-1 + . If there are data Yt,k from a sample of size St,k at time t, then {phi}t,k has an approximate normal distribution with mean {theta}t,k and variance , so, given that {theta}t,k ~ t, {sigma}2t), the conditional distribution of {theta}t,k given {phi}t,k is also normal. These relations form the basis of a continuous approximation for doing the forward step. For the purpose of realizing X we assume that the uniform prior on X0 is equivalent to a diffuse prior on {theta}0,k. Therefore {theta}0,k |{phi}0,k ~ 0, {sigma}20) with µ0 = {phi}0,k and {sigma}20 = . With that as a starting point, we work iteratively forward in time, assigning values

(A1)


(A2)

if S*t,k = 0. If S*t,k > 0, however, then one first computes µt and {sigma}2 as in (A1) and (A2), but then further updates the values to reflect the information in the sample at time t:

(A3)


(A4)

This is analogous to computing a posterior distribution from a normal prior and normal data (see, for example, GELMAN et al. 1996 Down, p. 43).

Carrying this out until t = T gives values for the mean and variance of {theta}T,k given {phi}0,k, ... , {phi}T,k, assuming they follow a normal distribution. In fact, for each t, it gives us the parameters for the normal distribution of {theta}t,k conditional on {phi}r,k for all r <= t. We are thus in a position to realize {theta}(i)t,k's in the backward step and transform those {theta}(i)t,k's back into the X(i)t,k's that we need.

The backward step:
The backward step is more complicated than the forward step, because after realizing each value of {theta}(i)t,k we must transform it into the discrete value X(i)t,k that we require. This transformation process requires some extra bookkeeping to ensure that we do not waste time realizing X(i)'s that are incompatible with the data. This is described in the next section of the APPENDIX. We first realize the value {theta}(i)T,k from a T, {sigma}2T) distribution. Then we transform that to the realization X(i)T,k by a many-to-one map , which has two effects: the first is that of folding and translating the distribution of {theta}T,k so that it is bounded between 0 and {pi}/2, mapping {theta}(i)T,k {isin} (-{infty}, {infty}) to a value {theta}*T,k {isin} [0, ]. The second is transforming that {theta}*T,k into the appropriate value X(i)T,k (see the next section).

Working backward, each {theta}(i)T,k for t = T - 1 down to t = 0, is realized from a t, {sigma}2t) distribution and then transformed into the corresponding {theta}*T,k and X(i)t,k by . In keeping with (10), before {theta}(i)t,k is realized, µt and {sigma}2t must be appropriately updated, on the basis of the values of µt and {sigma}2t stored during the forward step and the realized value {theta}(i)t+1,k. This involves making the assignments

(A5)


(A6)

in the order as written.

Computing the probability P *Ne(X(i)):
By carrying out the forward-backward steps above on the first allele, the realization X(i)(1) is obtained. Then, N*(2) and S*(2) are computed and used in the forward-backward steps to obtain X(i)(2). Executing these steps for all the alleles yields the realization X(i), which is used in (7). PNe(Y, X(i)) in (7) is easily computed using the expansion shown between the large parentheses in (3).

It remains only to compute P*Ne(X(i)), which can be done by recording the probability of realizing each component X(i)t,k. Although this probability depends on the values of µt, {sigma}2t, N*(k), and several bookkeeping variables, we denote it here simply by (X(i)t,k). (The actual function is described later in this APPENDIX.) As long as the realization of X(i)(k) over alleles occurs in the same order over k (k = 1, 2, ... , K) for each i, then

(A7)

Details of :
The fact that we are realizing X(i)(k)'s one allele at a time requires that we do some extra bookkeeping to keep our importance-sampling scheme efficient. Primarily, we must avoid realizing X(i)'s for which PNe(Y, X(i)) = 0. Potential problems arise because by the method we use to realize values from P*Ne(X), Xt,k may only take values between 0 and 2N*t,k, inclusive. If 2N *t,k = 0 at any value of t, then for any s > t, X(i)s,k must also be 0. To avoid situations in which this leads to PNe(Y, X(i)) being 0 (as when X(i)t,k = 0 and Yt,k > 0) we introduce the following scheme and additional notation:

(A8)

Knowing the above quantities, we can define the function . In the remainder of this section and in the following one we drop the t and k subscripts for clarity.

With N* and {gamma} positive integers, {delta} {isin} {0, 1}, and {kappa} {isin} {0, 1, ... , min(2N* - {delta}, {gamma} - {delta})}, let ({theta}; N*, {delta}, {gamma}, {kappa}): 1 -> {{delta}, ... , 2N* - {kappa}} x [0, {pi}/2] be the many-to-one map that takes a realization of {theta} {isin} (-{infty}, {infty}) to the ordered pair (X, {theta}*), where X is an integer such that {delta} <= X <= 2N* - {kappa}, and {theta}* is a real number between 0 and {pi}/2, inclusive. may be described by the following pseudocode. We first define the quantities L = sin-1() and

Then,

if ({delta} = 2N* - {kappa} or {delta} = {gamma} - {kappa} = 0) then {theta}* <- 0 else if (L < {theta} < H) then {theta}* <- {theta} else if ({theta} < L)

and if ({delta} = 0) then {theta}* <- {theta}

else if ({delta} = 1) then {theta}[L] <- 2L - {theta} (this is reflection around {theta} = L), and then

if (L <= {theta}[L] < H) then {theta}* <- {theta}[L]

else we know {theta}[L] >= H, and we consider the sequence {theta}[i] = i(L - H) + {theta}[L], i = 1, 2, ... , and we assign {theta}* <- {theta}[i*], where i* is the least i such that L < {theta}[i] < H. (The sequence {theta}[i] represents successive translation leftward.)

else if ({theta} > H)

and if ({kappa} = 0) then {theta}* <- {pi}/2

else if ({kappa} > 1) then {theta}[H] <- 2H - {theta} (this is reflection around {theta} = H), and then

if (L < {theta}[H] < H) then {theta}* <- {theta}[H]

else we know {theta}[H] < L and we consider the sequence {theta}[j] = j(H - L) + {theta}[H], j = 1, 2, ... , and we assign {theta}* <- {theta}[j*], where j* is the least j such that L <= {theta}[j] < H. (The sequence {theta}[j] represents successive translation rightward.)

finally we use {theta}*, making the assignment X <- {lfloor}2N*sin2{theta}* + 0.5{rfloor},

where {lfloor}x{rfloor} denotes the largest integer <=x. The reflections and translations are depicted graphically in Fig 1A.

The probability µ,{sigma}2(X = X(i); N*, {delta}, {gamma}, {kappa}) of realizing X = X(i):
If {theta} ~ (µ, {sigma}2), and (X, {theta}*) = ({theta}; N*, {delta}, {gamma}, {kappa}), then we denote by µ,{sigma}2(X = X(i); N*, {delta}, {gamma}, {kappa}) the marginal probability that X = X(i). The value of µ,{sigma}2(X = X(i); N*, {delta}, {gamma}, {kappa}) can be expressed using the notation from the above section. First, µ,{sigma}2(X = X(i); N*, {delta}, {gamma}, {kappa}) = 0 if X(i) < {delta} or X(i) > 2N* - {kappa}, though such values of X(i) should never occur from anyway. Second, there are cases when constrains X(i) to be either 0 or 1 with probability 1. Hence if 2N* - {kappa} = {delta} or {gamma} - {kappa} = {delta} = 0 then µ,{sigma}2(X = {delta}; N*, {delta}, {gamma}, {kappa}) = 1.

If, on the other hand, 2N* - {kappa} > {delta} and {gamma} - {kappa} > 0, then for X(i) = 0 and X(i) = 2N* we have

while for 0 < X(i) < 2N* - {kappa} we define a = sin-1[] and b = sin-1[] and have

where I{·} is the indicator function and P(a <= {theta} < b) is the probability that a (µ, {sigma}2) random variable is between a and b, namely {int}ba(2{pi}{sigma}2) exp{}d{theta}. We compute this probability by numerical integration in our programs. In practice, the infinite sums are approximated by summing the first several terms of the series, until the contribution of the next term is very small (e.g., <10-7). Values of for different values of {delta} and {kappa} appear as shaded regions in Fig 1B&NDASH;D.

This folding and translating might seem to be a very involved process, but it is computationally much faster than realizing {theta} from a truncated normal distribution and computing the probability of X(i) when {theta} is from such a distribution.

Multilocus Monte Carlo variance calculation:
We derive an expression for the variance of a product of J independent random variables, Wj, j = 1, ... J:

Denoting Ne,j(Y) in (13) by Wj and taking the expectation gives the same result, verifying that the expression in (13) is unbiased for Var(JNe(Y)).


*  LITERATURE CITED
*TOP
*ABSTRACT
*FORMULATION OF THE MODEL...
*SIMULATED AND REAL DATASETS
*RESULTS
*DISCUSSION
*APPENDIX
*LITERATURE CITED

BAUM, L. E., 1972 An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes, pp. 1–8 in Inequalities—III: Proceedings of the Third Symposium on Inequalities Held at the University of California, Los Angeles, September 1–9, 1969, edited by O. SHISHA. Academic Press, New York.

BAUM, L. E., T. PETRIE, G. SOULES, and N. WEISS, 1970  A maximization technique occurring in the statistical analysis of probabilistic functions on Markov chains. Ann. Math. Stat. 41:164-171.

BEGON, M., C. B. KRIMBAS, and M. LOUKAS, 1980  The genetics of Drosophila subobscura populations. XV. Effective size of a natural population estimated by three independent methods. Heredity 45:335-350.

CAVALLI-SFORZA, L. L. and A. W. F. EDWARDS, 1967  Phylogenetic analysis: models and estimation procedures. Evolution 21:550-570.

FRANKHAM, R., 1995  Inbreeding and extinction: a threshold effect. Conserv. Biol. 9:792-799.

GELMAN, A., 1996 Inference and monitoring convergence. pp. 131–143 in Markov Chain Monte Carlo in Practice, edited by W. R. GILKS, S. RICHARDSON and D. J. SPIEGELHALTER. Chapman & Hall, New York.

GELMAN, A., J. B. CARLIN, H. S. STERN and D. B. RUBIN, 1996 Bayesian Data Analysis. Chapman and Hall, New York.

HAMMERSLEY, J. M., and D. C. HANDSCOMB, 1964 Monte Carlo Methods. Methuen & Co. Ltd., London.

JORDE, P. E. and N. RYMAN, 1995  Temporal allele frequency change and estimation of effective size in populations with overlapping generations. Genetics 139:1077-1090[Abstract].

KRIMBAS, C. B. and S. TSAKAS, 1971  The genetics of Dacus oleae. V. Changes of esterase polymorphism in a natural population following insecticide control—selection or drift? Evolution 25:454-460.

NEI, M. and F. TAJIMA, 1981  Genetic drift and estimation of effective population size. Genetics 98:625-640[Abstract/Free Full Text].

NEYMAN, J. and E. L. SCOTT, 1948  Consistent estimates based on partially consistent observations. Econometrica 16:1-32.

POLLAK, E., 1983  A new method for estimating the effective population size from allele frequency changes. Genetics 104:531-548[Abstract/Free Full Text].

SOULÉ, M. E. (Editor), 1986 Conservation Biology: The Science of Scarcity and Diversity. Sinauer and Associates, Sunderland, MA.

WAPLES, R. S., 1989  A generalized approach for estimating effective population size from temporal changes in allele frequency. Genetics 121:379-391[Abstract/Free Full Text].

WILLIAMSON, E. G. and M. SLATKIN, 1999  Using maximum likelihood to estimate population size from temporal changes in allele frequencies. Genetics 152:755-761[Abstract/Free Full Text].




This article has been cited by other articles:


Home page
GeneticsHome page
J. P. Bollback, T. L. York, and R. Nielsen
Estimation of 2Nes From Temporal Allele Frequency Data
Genetics, May 1, 2008; 179(1): 497 - 502.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
P. E. Jorde and N. Ryman
Unbiased Estimator for Genetic Drift and Effective Population Size
Genetics, October 1, 2007; 177(2): 927 - 935.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
R. S. Waples and M. Yokota
Temporal Estimates of Effective Population Size in Species With Overlapping Generations
Genetics, January 1, 2007; 175(1): 219 - 233.
[Abstract] [Full Text] [PDF]