- THIS ARTICLE
-
Abstract
- Full Text (PDF)
- Alert me when this article is cited
- Alert me if a correction is posted
- SERVICES
- Email this article to a friend
- Similar articles in this journal
- Similar articles in PubMed
- Alert me to new issues of the journal
- Download to citation manager
- Reprints & Permissions
- CITING ARTICLES
- Citing Articles via HighWire
- Citing Articles via Google Scholar
- GOOGLE SCHOLAR
- Articles by Anderson, E. C.
- Articles by Thompson, E. A.
- Search for Related Content
- PUBMED
- PubMed Citation
- Articles by Anderson, E. C.
- Articles by Thompson, E. A.
Monte Carlo Evaluation of the Likelihood for Ne From Temporally Spaced Samples
Eric C. Andersona, Ellen G. Williamsonb, and Elizabeth A. Thompsonca Interdisciplinary Program in Quantitative Ecology and Resource Management, University of Washington, Seattle, Washington 98195,
b Department of Integrative Biology, University of California, Berkeley, California 94720-3141
c Department of Statistics, University of Washington, Seattle, Washington 98195
Corresponding author: Eric C. Anderson, Department of Statistics, University of Washington, Box 354322, Seattle, WA 98195., eriq{at}cqs.washington.edu (E-mail)
Communicating editor: G. A. CHURCHILL
| ABSTRACT |
|---|
A population's effective size is an important quantity for conservation and management. The effective size may be estimated from the change of allele frequencies observed in temporally spaced genetic samples taken from the population. Though moment-based estimators exist, recently Williamson and Slatkin demonstrated the advantages of a maximum-likelihood approach that they applied to data on diallelic genetic markers. Their computational methods, however, do not extend to data on multiallelic markers, because in such cases exact evaluation of the likelihood is impossible, requiring an intractable sum over latent variables. We present a Monte Carlo approach to compute the likelihood with data on multiallelic markers. So as to be computationally efficient, our approach relies on an importance-sampling distribution constructed by a forward-backward method. We describe the Monte Carlo formulation and the importance-sampling function and then demonstrate their use on both simulated and real datasets.
REDUCTIONS in population size can lead to inbreeding, which increases the probability of population extinction in typically outbreeding species (![]()
![]()
It is possible to estimate the variance effective size from observed changes in allele frequencies in a population over time. Moment-based estimators using F-statistics have been developed for this purpose (![]()
![]()
![]()
![]()
![]()
![]()
e of Ne, given allele frequencies observed in samples taken from a population at different times, one models the population underlying the samples as a Wright-Fisher population.
e is then the size of that underlying, ideal population for which the observed data are most probable. In simulation studies ![]()
This likelihood method has been restricted to data on diallelic loci, because, with data on multiallelic loci, evaluating the likelihood for Ne exactly is computationally intractable. Here we describe the problem as one of inference from a hidden Markov chain (![]()
| FORMULATION OF THE MODEL AND MONTE CARLO |
|---|
The model:
The data are genetic samples collected at different generations. The first sample is collected at generation 0 and the last sample at generation T. Any samples drawn at intervening generations may be evenly or irregularly spaced in time. For notational simplicity, we assume for now that individuals are genotyped at a single locus, though we describe later the extension to multiple, independently segregating loci. The data include K different allelic types, indexed by k = 1, ... , K. The allele frequencies observed in samples taken from different generations will differ due to genetic drift and sampling variation.
Let Yt = (Yt,1, ... , Yt,K) be the counts of the K different allelic types in the sample at generation t, and let St denote the number of diploid individuals in the sample. We assume that the samples were taken from a Wright-Fisher population of size Ne and denote the unobserved population allele counts at generation t by Xt = (Xt,1, ... , Xt,K), with
Kk=1 Xt,k = 2Ne. By the formulation of the Wright-Fisher model, the Xt form a Markov chain in time, with transitions defined by multinomial probabilities depending on Ne,
![]() |
(1) |
The genetic sample at a time t is assumed to be drawn with replacement from the copies of alleles present in the population at time t. This is equivalent to drawing the sample Yt from a very large gamete pool produced by the population at time t: sampling plan II of ![]()
![]()
![]() |
(2) |
when St > 0. If there is no sample taken from the population at generation t, then St
0, and we define PNe(Yt|Xt)
1.
Such a system forms a hidden Markov chain with the dependence structure shown in the directed graph of Fig 1. The allele counts in the population when the first sample is drawn, X0, are nuisance parameters. To avoid estimating X0 and the associated consistency problems with the maximum-likelihood estimator (![]()
(X0), on the population allele counts at time 0. The likelihood for Ne is the probability of the data Y = (Y0, ... , YT) given the parameter Ne. The probability of Y is the sum of the joint probability of Y and the latent variables X = (X0, ... , XT) over the space of all X:
![]() |
(3) |
|
For the case of K = 2 and Ne small, the likelihood in (3) may be computed exactly. ![]()
![]()
Monte Carlo evaluation:
For likelihood inference, we must evaluate PNe(Y) for a number of different values of Ne. Expressing this probability as an expectation with respect to the distribution of X gives
![]() |
(4) |
In this form the expectation would be taken over the marginal probabilities of X, and it could be estimated by Monte Carlo as
![]() |
(5) |
for large m, with X(i) being the ith realization from the marginal distribution of X. Such a naive scheme fails, however, because PNe(Y|X(i)) varies greatly over the values of X realized from their marginal distribution, resulting in enormous Monte Carlo variance.
Instead, we pursue a more efficient Monte Carlo approximation by using importance sampling (![]()
![]() |
(6) |
where
*Ne indicates that the expectation is over the space of X weighted by the distribution P*Ne(X). The expectation (6) may be estimated by Monte Carlo, giving
![]() |
(7) |
for large m, where X(i) is the ith realization of X drawn from P*Ne(X). The Monte Carlo variance of
Ne(Y) is made small when
varies little across the possible values of X and would be minimized if P*Ne(X) were exactly proportional to PNe(Y, X). Such a distribution of X would, by definition, be the conditional distribution PNe(X|Y). Unfortunately, for the same reasons that PNe(Y) cannot be computed exactly, it is infeasible to compute PNe(X|Y). Nonetheless, the Monte Carlo variance of
Ne(Y) will be reduced to the extent that P*Ne(X) resembles PNe(X|Y). We now describe a method for rapid simulation of X(i)'s from a distribution P*Ne(X) that is close to PNe(X|Y). As is required for the importance sampling, it is also possible to compute P*Ne(X(i)) quickly for each X(i) generated.
Sampling from P*Ne(X) by a forward-backward method:
![]()
![]() |
(8) |
which is normalized by the sum of that quantity over all the values of Xt. The last such conditional distribution computed is P(XT|Y0, ... , YT). The "backward step" begins with simulating a value X(i)T from this distribution (where, as before, the superscript (i) indicates a realized value of a random variable). One then proceeds backward, realizing X(i)T-1 from its conditional distribution given all of the observed variables, Y, and X(i)T. In similar fashion, one realizes X(i)T-2 and so forth back to X(i)0. In this backward phase, each X(i)t is simulated from its conditional distribution given all the data Y and all of the components of X that have been realized so far. That is, X(i)t is drawn from
![]() |
(9) |
Because of the conditional independence structure in a hidden Markov chain, (9) reduces to P(Xt|Y0, ... , Yt, X(i)t+1), which may be computed using the distributions stored during the forward step by the relation
![]() |
(10) |
At the end of the backward step, it is thus clear that the resulting realization (X(i)0, ... , X(i)T) is from the conditional distribution of X given Y.
An approximation for multiple alleles:
In our application, with multiple alleles at a locus, since there are so many possible states that each Xt may take, the above procedure is computationally infeasible. However, we make use of the ![]()
The first approximation is to perform the forward-backward cycle separately for each allele. To describe this, we introduce some more notation. Denote by X(k) the vector (X0,k, ... , XT,k) of latent counts of the kth allele from time t = 0 to t = T. Similarly we define Y(k) = (Y0,k, ... , YT,k). To do the forward-backward cycle separately over alleles we first focus on allele 1, simulating X(i)(1) by the forward-backward mechanism as if the data were on a diallelic locus with observed counts Y(1) from samples of size S0, ... , ST through time. Once we have realized X(i)(1) we update the sizes of the population and the sample. Thus we define the updated population size vector 2N*(2) = (2Ne - X(i)0,1, ... , 2Ne - X(i)T,1) and an updated sample size vector 2S*(2) = (2S*0,2, ... , 2S*T,2) = (2S0 - Y0,1, ... , 2ST - YT,1), in effect removing the first allelic type from the remainder of the data and the population. We then use the forward-backward mechanism again to simulate X(i)(2) as though the data were counts Y(2) from a diallelic locus drawn from a population with sizes that change over time N*(2) and sample sizes S*(2). This continues sequentially over alleles, updating population sizes and sample sizes as above: 2N*(k)
(2N*(k-1) - X(i)(k-1)) and 2S*(k)
(2S*(k-1) - Y(i)(k-1)), until X(K-1) has been realized, which also determines that X(K)
(2N*(K-1) - X(i)(K-1)). (Here and later we use the notation A
B to mean "the value B is assigned to the variable A.") At the end one has obtained a realized value X(i), which may be used in (7).
P*Ne(X) using a continuous approximation:
Although realizing alleles sequentially, as above, greatly reduces the number of terms required to use (8) and (10), the method would still involve a prohibitive amount of summation over binomial probabilities. Thus we construct P*Ne(X), employing a normal approximation to binomial probabilities, which replaces all such sums by analytically tractable integrals. Recall that if W
Binomial(n, p), then the transformed variable sin-1(W/n)1/2 is approximately normally distributed with variance 1/(4n). Note that this quantity does not depend on p. Hence we use this transformation to define the quantities
t,k = sin-1[
]
when S*t,k > 0, and
t,k = sin-1[
]
. By realizing the continuous values
(i)t,k in a forward-backward framework within a continuous setting, the computational demands are greatly reduced. And then, by transforming each
(i)t,k back into the appropriate discrete X(i)t,k we have a way to realize X(i) from P*Ne(X) and to compute the probability P*Ne(X(i)). The details of this procedure are given in the Appendix. We use it to compute the Monte Carlo estimate
Ne(Y) using (7).
Monte Carlo variance and multiple loci:
The quantity
Ne(Y) is only an estimate of the true value PNe(Y). By the central limit theorem, for large m, PNe(Y) will be approximately normally distributed (![]()
Ne(Y) and a variance that may be approximated without bias by the quantity
![]() |
(11) |
These facts may be used to obtain a confidence interval estimate around each
Ne(Y).
The ability to estimate Ne typically requires data from many loci. The extension to data on J independently segregating loci, indexed by j = 1, ... , J, is straightforwardeach locus is treated separately, and the estimated likelihoods from each locus are multiplied together. Thus, let
Ne,j(Y) be the Monte Carlo likelihood estimate from the data on the jth locus. The Monte Carlo likelihood estimate using all the loci is then
![]() |
(12) |
This requires that the initial allele counts have independent prior distributions,
(X0). An implicit assumption of (12) is consequently that the loci used are in linkage equilibrium at t = 0.
JNe(Y) will also have an approximately normal distribution. An unbiased estimator for its Monte Carlo variance (derived in Equation A9 in the Appendix) is
![]() |
(13) |
This can be used to compute a confidence interval estimate around
JNe(Y).
When displaying the Monte Carlo likelihood curve it is preferable to plot the log-likelihood values, log
JNe(Y), for different values of Ne. In this case, the endpoints of the confidence intervals may be similarly log-transformed.
| SIMULATED AND REAL DATASETS |
|---|
We demonstrate the method by computing log-likelihood curves for Ne from three different datasets. First, to verify that the method gives correct results we apply it to a simple simulated dataset (dataset 1) for which it is possible to compute the likelihood exactly. We simulated samples of 20 diallelic loci from 100 diploid individuals at generations 0, 6, and 12 from a Wright-Fisher population of 25 diploid individuals. For each locus, the initial allele frequency in the population at time zero was an independently drawn uniform real number between 0 and 1. The log-likelihood for Ne given these data was estimated for values between 10 and 52, in steps of 2, using m = 20,000 realizations of X from P*Ne(X) for each locus and each Ne.
We simulated a second dataset (dataset 2) to see how the method performed with multiallelic markers taken from a Wright-Fisher population. The dataset included three samples of 100 diploids for 12 five-allele loci at generations 0, 4, and 8 from a population of 50 diploids. The allele frequencies at each locus in generation 0 for these simulations were independently drawn from a uniform Dirichlet density with five components. For this dataset, the log-likelihood was computed for values of Ne between 20 and 100 in increments of 4 using m = 50,000 realizations of X for each locus and each value of Ne.
Finally, we computed a log-likelihood curve for Ne given data on a population of Drosophila in ![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
The data appear as allele frequencies in Table 1 of ![]()
JNe(Y) at values of Ne between 200 and 1200 in increments of 50, with two more points (Ne = 425 and Ne = 475) included near the peak of the likelihood curve. For each point we used m = 500,000 realizations of X.
| RESULTS |
|---|
For each of the datasets, we were able to use our importance-sampling method to compute a log-likelihood curve. Using a program we wrote in C, the runs for datasets 1 and 2 each took
10 hr on a 266-Mhz laptop computer. The log-likelihood curves from datasets 1 and 2 appear as solid lines in Fig 2. The estimated 90% confidence intervals around each value of log
JNe(Y) appear as two dashed lines bordering the log-likelihood curve. Despite the fact that few Monte Carlo replicates (m = 20,000 and 50,000) were used, the Monte Carlo variance is minimal, as indicated by the fact that the dotted lines practically lie on top of the estimated log-likelihood curve. In both cases, the true values of Ne (25 and 50, respectively) are well within 2 units of log-likelihood from the maximum-likelihood estimates, which may be read from the graph as 24 and 56. Since dataset 1 consists only of diallelic loci, it is possible to compute the exact log-likelihood curve. This exact curve has been plotted as a dotted line in Fig 2A. It is impossible to distinguish the exact curve because the Monte Carlo estimate is very accurate in this case.
|
The log-likelihood curve computed for the data of ![]()
54 hr on a 450-Mhz desktop computer to produce the results. As before, the 90% confidence intervals around the Monte Carlo estimates appear as dotted lines. With this dataset, even with m = 500,000 realizations of X, the Monte Carlo variance is not negligible. It is, however, small enough that reliable inferences may be made from the log-likelihood curve. The maximum-likelihood estimate of Ne is 500. Using the values of Ne at which the log-likelihood has decreased 2 units from its maximum gives an estimate of a 95% confidence interval for the true Ne. These points are 250 and 975. By contrast, ![]()
![]()
|
| DISCUSSION |
|---|
As discussed in ![]()
Monte Carlo methods use realizations of random variables to estimate an expectation by a sample average. There are a number of ways one can express the likelihood of Ne as an expectation, and then estimate it by Monte Carlo, but few of those schemes will be successful, because most will have high Monte Carlo variance. We attempted several different schemes before settling on the importance-sampling method presented here. Although these less sophisticated Monte Carlo estimators produced reasonable estimates in very small problems, when applied to data involving loci with many alleles these methods failed to converge reliably, even after many days of computation (E. C. ANDERSON and E. G. WILLIAMSON, unpublished data).
The importance-sampling method we use is successful because our importance sampling function, P*Ne(X), closely resembles PNe(X|Y), the conditional probability of X given Y. This is achieved by recognizing the hidden Markov chain structure of the problem and using the forward-backward algorithm of ![]()
It should be pointed out that while many Monte Carlo problems involving high-dimensional random variables like X make use of Markov chain Monte Carlo (MCMC) methods, our method is not an MCMC method. In MCMC, successive realizations are correlated. In our method each X(i) realized from the distribution P*Ne(X) is independent of the other realized values. As a result, our method does not have the same problems of convergence assessment as does Markov chain simulation (![]()
It is interesting that our maximum-likelihood estimate differs so much from the estimate given by ![]()
![]()
![]()
The Monte Carlo variance of our estimate of the likelihood given the data from a natural population of Drosophila was higher than the variance associated with our estimates from simulated data. Although a good estimate was achieved after sufficient computation, it may still be that data generated under a model that differs from the Wright-Fisher model present difficulty for the Monte Carlo likelihood method. For example, it may be that the effective size of the natural Drosophila population was different during the two different sampling intervals. We note that one could extend the likelihood framework to allow for Ne changing over time. For example, if the estimated census size of the population were available and was known to change over time it would be more sensible to estimate directly the ratio,
, of the effective size of the population to the census size of the population. This ratio would be more useful for the purposes of modeling genetic change in the population than a single estimate of Ne over the entire time period between the first and last samples. The forward-backward approach implemented here could easily be modified to estimate this parameter,
.
It would also be possible to extend the present approach to compute likelihoods from different stochastic models of allele frequency change. We suspect there would seldom be enough information in the data to estimate accurately models in which allele frequency change is due jointly to genetic drift and some other genetic mechanism such as mutation or selection. However, our methods could be modified to handle different demographic models. For example, consider an alteration of the Wright-Fisher model such that the current generation is formed by sampling genes without replacement from a gamete pool in which each parental allele is represented by M gametes. In such a case, allele frequency changes occur due to multivariate hypergeometric sampling determined by the parameters N and M. For values of M that are not very small, the normal distribution is still a reasonable approximation to the hypergeometric distribution, and importance sampling could proceed as before. The importance-sampling distribution P*N,M(X) should then account for the decrease in variance (by a factor of 2N[M - 1]/[2N M - 1]) of the hypergeometric distribution relative to the binomial distribution. The same sort of sampling model and adjustment in the importance-sampling distribution could be used to accommodate scenarios in which one's genetic samples (the data Y) were assumed drawn without replacement from the population.
Another possible extension would be to stochastic models involving populations of organisms with more complex life histories, for example, overlapping generations or age-structured populations. As it becomes easier to gather extensive genetic data on populations, and as understanding of the structure within those populations improves, it will be possible to specify much richer probability models for temporal population genetic data. The forward-backward method here, and variations of it, should be useful in estimating population parameters from such models using Monte Carlo likelihood.
A software package, MCLEEPS, implementing the algorithm described in this article is available for free download from http://www.stat.washington.edu/thompson/Genepi/Mcleeps.shtml.
| ACKNOWLEDGMENTS |
|---|
We thank an anonymous referee for helpful comments on the manuscript and for suggesting the extension to the population model with hypergeometric sampling. This study was supported by National Science Foundation grant BIR-9807747 to E.T. Additionally, E.A. was supported by National Science Foundation grant BIR-9256537, the University of Washington QERM program, and a Burroughs Wellcome Fund PMMB training fellowship. E.W. was supported by National Institutes of Health grant GM40282 to M. Slatkin.
Manuscript received December 27, 1999; Accepted for publication August 7, 2000.
| APPENDIX |
|---|
Using
t,k and
t,k in a continuous setting:
We define the random variables
t,k = sin-1[
]
when S*t,k > 0, and
t,k = sin-1[
when N*t,k > 0. These quantities have an approximate normal distribution, which is independent of their means. We use them in our construction of the importance-sampling function PNe(X). Below, we concentrate on their use for realizing X(i)(k), keeping in mind that if k > 1 then we will have already realized X(i)(k-1), and we use the updated population and sample sizes N*(k) and S*(k). If k = 1 then N*(1) = (Ne, ... , Ne) and S*(1) = (S0, ... , ST), respectively.
The forward step:
Following ![]()
t-1,k is normally (
) distributed with mean µt-1 and variance
2t-1, then, after a generation of genetic drift in a population of N*t,k diploids,
t,k has an approximate normal distributions with mean µt-1 and variance
2t =
2t-1 +
. If there are data Yt,k from a sample of size St,k at time t, then
t,k has an approximate normal distribution with mean
t,k and variance
, so, given that
t,k
(µt,
2t), the conditional distribution of
t,k given
t,k is also normal. These relations form the basis of a continuous approximation for doing the forward step. For the purpose of realizing X we assume that the uniform prior on X0 is equivalent to a diffuse prior on
0,k. Therefore
0,k |
0,k
(µ0,
20) with µ0 =
0,k and
20 =
. With that as a starting point, we work iteratively forward in time, assigning values
![]() |
(A1) |
![]() |
(A2) |
if S*t,k = 0. If S*t,k > 0, however, then one first computes µt and
2 as in (A1) and (A2), but then further updates the values to reflect the information in the sample at time t:
![]() |
(A3) |
![]() |
(A4) |
This is analogous to computing a posterior distribution from a normal prior and normal data (see, for example, ![]()
Carrying this out until t = T gives values for the mean and variance of
T,k given
0,k, ... ,
T,k, assuming they follow a normal distribution. In fact, for each t, it gives us the parameters for the normal distribution of
t,k conditional on
r,k for all r
t. We are thus in a position to realize
(i)t,k's in the backward step and transform those
(i)t,k's back into the X(i)t,k's that we need.
The backward step:
The backward step is more complicated than the forward step, because after realizing each value of
(i)t,k we must transform it into the discrete value X(i)t,k that we require. This transformation process requires some extra bookkeeping to ensure that we do not waste time realizing X(i)'s that are incompatible with the data. This is described in the next section of the APPENDIX. We first realize the value
(i)T,k from a
(µT,
2T) distribution. Then we transform that to the realization X(i)T,k by a many-to-one map
, which has two effects: the first is that of folding and translating the distribution of
T,k so that it is bounded between 0 and
/2, mapping
(i)T,k
(-
,
) to a value
*T,k
[0,
]. The second is transforming that
*T,k into the appropriate value X(i)T,k (see the next section).
Working backward, each
(i)T,k for t = T - 1 down to t = 0, is realized from a
(µt,
2t) distribution and then transformed into the corresponding
*T,k and X(i)t,k by
. In keeping with (10), before
(i)t,k is realized, µt and
2t must be appropriately updated, on the basis of the values of µt and
2t stored during the forward step and the realized value
(i)t+1,k. This involves making the assignments
![]() |
(A5) |
![]() |
(A6) |
in the order as written.
Computing the probability P *Ne(X(i)):
By carrying out the forward-backward steps above on the first allele, the realization X(i)(1) is obtained. Then, N*(2) and S*(2) are computed and used in the forward-backward steps to obtain X(i)(2). Executing these steps for all the alleles yields the realization X(i), which is used in (7). PNe(Y, X(i)) in (7) is easily computed using the expansion shown between the large parentheses in (3).
It remains only to compute P*Ne(X(i)), which can be done by recording the probability of realizing each component X(i)t,k. Although this probability depends on the values of µt,
2t, N*(k), and several bookkeeping variables, we denote it here simply by
(X(i)t,k). (The actual function
is described later in this APPENDIX.) As long as the realization of X(i)(k) over alleles occurs in the same order over k (k = 1, 2, ... , K) for each i, then
![]() |
(A7) |
Details of
:
The fact that we are realizing X(i)(k)'s one allele at a time requires that we do some extra bookkeeping to keep our importance-sampling scheme efficient. Primarily, we must avoid realizing X(i)'s for which PNe(Y, X(i)) = 0. Potential problems arise because by the method we use to realize values from P*Ne(X), Xt,k may only take values between 0 and 2N*t,k, inclusive. If 2N *t,k = 0 at any value of t, then for any s > t, X(i)s,k must also be 0. To avoid situations in which this leads to PNe(Y, X(i)) being 0 (as when X(i)t,k = 0 and Yt,k > 0) we introduce the following scheme and additional notation:
![]() |
(A8) |
Knowing the above quantities, we can define the function
. In the remainder of this section and in the following one we drop the t and k subscripts for clarity.
With N* and
positive integers,
{0, 1}, and
{0, 1, ... , min(2N* -
,
-
)}, let
(
; N*,
,
,
):
1
{
, ... , 2N* -
} x [0,
/2] be the many-to-one map that takes a realization of
(-
,
) to the ordered pair (X,
*), where X is an integer such that
X
2N* -
, and
* is a real number between 0 and
/2, inclusive.
may be described by the following pseudocode. We first define the quantities L = sin-1(
)
and

Then,
if (
= 2N* -
or
=
-
= 0) then
*
0 else if (L <
< H) then
*
else if (
< L)
and if (
= 0) then
*
else if (
= 1) then
[L]
2L -
(this is reflection around
= L), and then
if (L
[L] < H) then
*
[L]
else we know
[L]
H, and we consider the sequence
[i] = i(L - H) +
[L], i = 1, 2, ... , and we assign
*
[i*], where i* is the least i such that L <
[i] < H. (The sequence
[i] represents successive translation leftward.)
else if (
> H)
and if (
= 0) then
*
/2
else if (
> 1) then
[H]
2H -
(this is reflection around
= H), and then
if (L <
[H] < H) then
*
[H]
else we know
[H] < L and we consider the sequence
[j] = j(H - L) +
[H], j = 1, 2, ... , and we assign
*
[j*], where j* is the least j such that L
[j] < H. (The sequence
[j] represents successive translation rightward.)
finally we use
*, making the assignment X
2N*sin2
* + 0.5
,
where
x
denotes the largest integer
x. The reflections and translations are depicted graphically in Fig 1A.
The probability
µ,
2(X = X(i); N*,
,
,
) of realizing X = X(i):
If
(µ,
2), and (X,
*) =
(
; N*,
,
,
), then we denote by
µ,
2(X = X(i); N*,
,
,
) the marginal probability that X = X(i). The value of
µ,
2(X = X(i); N*,
,
,
) can be expressed using the notation from the above section. First,
µ,
2(X = X(i); N*,
,
,
) = 0 if X(i) <
or X(i) > 2N* -
, though such values of X(i) should never occur from
anyway. Second, there are cases when
constrains X(i) to be either 0 or 1 with probability 1. Hence if 2N* -
=
or
-
=
= 0 then
µ,
2(X =
; N*,
,
,
) = 1.
If, on the other hand, 2N* -
>
and
-
> 0, then for X(i) = 0 and X(i) = 2N* we have

while for 0 < X(i) < 2N* -
we define a = sin-1[
]
and b = sin-1[
]
and have

where I{·} is the indicator function and P(a
< b) is the probability that a
(µ,
2) random variable is between a and b, namely
ba(2
2)
exp{
}d
. We compute this probability by numerical integration in our programs. In practice, the infinite sums are approximated by summing the first several terms of the series, until the contribution of the next term is very small (e.g., <10-7). Values of
for different values of
and
appear as shaded regions in Fig 1B&NDASH;D.
This folding and translating might seem to be a very involved process, but it is computationally much faster than realizing
from a truncated normal distribution and computing the probability of X(i) when
is from such a distribution.
Multilocus Monte Carlo variance calculation:
We derive an expression for the variance of a product of J independent random variables, Wj, j = 1, ... J:

Denoting
Ne,j(Y) in (13) by Wj and taking the expectation gives the same result, verifying that the expression in (13) is unbiased for Var(
JNe(Y)).
| LITERATURE CITED |
|---|
BAUM, L. E., 1972 An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes, pp. 18 in InequalitiesIII: Proceedings of the Third Symposium on Inequalities Held at the University of California, Los Angeles, September 19, 1969, edited by O. SHISHA. Academic Press, New York.
BAUM, L. E., T. PETRIE, G. SOULES, and N. WEISS, 1970 A maximization technique occurring in the statistical analysis of probabilistic functions on Markov chains. Ann. Math. Stat. 41:164-171.
BEGON, M., C. B. KRIMBAS, and M. LOUKAS, 1980 The genetics of Drosophila subobscura populations. XV. Effective size of a natural population estimated by three independent methods. Heredity 45:335-350.
CAVALLI-SFORZA, L. L. and A. W. F. EDWARDS, 1967 Phylogenetic analysis: models and estimation procedures. Evolution 21:550-570.
FRANKHAM, R., 1995 Inbreeding and extinction: a threshold effect. Conserv. Biol. 9:792-799.
GELMAN, A., 1996 Inference and monitoring convergence. pp. 131143 in Markov Chain Monte Carlo in Practice, edited by W. R. GILKS, S. RICHARDSON and D. J. SPIEGELHALTER. Chapman & Hall, New York.
GELMAN, A., J. B. CARLIN, H. S. STERN and D. B. RUBIN, 1996 Bayesian Data Analysis. Chapman and Hall, New York.
HAMMERSLEY, J. M., and D. C. HANDSCOMB, 1964 Monte Carlo Methods. Methuen & Co. Ltd., London.
JORDE, P. E. and N. RYMAN, 1995 Temporal allele frequency change and estimation of effective size in populations with overlapping generations. Genetics 139:1077-1090[Abstract].
KRIMBAS, C. B. and S. TSAKAS, 1971 The genetics of Dacus oleae. V. Changes of esterase polymorphism in a natural population following insecticide controlselection or drift? Evolution 25:454-460.
NEI, M. and F. TAJIMA, 1981 Genetic drift and estimation of effective population size. Genetics 98:625-640
NEYMAN, J. and E. L. SCOTT, 1948 Consistent estimates based on partially consistent observations. Econometrica 16:1-32.
POLLAK, E., 1983 A new method for estimating the effective population size from allele frequency changes. Genetics 104:531-548
SOULÉ, M. E. (Editor), 1986 Conservation Biology: The Science of Scarcity and Diversity. Sinauer and Associates, Sunderland, MA.
WAPLES, R. S., 1989 A generalized approach for estimating effective population size from temporal changes in allele frequency. Genetics 121:379-391
WILLIAMSON, E. G. and M. SLATKIN, 1999 Using maximum likelihood to estimate population size from temporal changes in allele frequencies. Genetics 152:755-761
This article has been cited by other articles:
![]() |
J. P. Bollback, T. L. York, and R. Nielsen Estimation of 2Nes From Temporal Allele Frequency Data Genetics, May 1, 2008; 179(1): 497 - 502. [Abstract] [Full Text] [PDF] |
||||
![]() |
P. E. Jorde and N. Ryman Unbiased Estimator for Genetic Drift and Effective Population Size Genetics, October 1, 2007; 177(2): 927 - 935. [Abstract] [Full Text] [PDF] |
||||
![]() |
R. S. Waples and M. Yokota Temporal Estimates of Effective Population Size in Species With Overlapping Generations Genetics, January 1, 2007; 175(1): 219 - 233. [Abstract] [Full Text] [PDF] |
||||
![]() |
T. Kitakado, S. Kitada, Y. Obata, and H. Kishino Simultaneous Estimation of Mixing Rates and Genetic Drift Under Successive Sampling of Genetic Markers With Application to the Mud Crab (Scylla paramamosain) in Japan. Genetics, August 1, 2006; 173(4): 2063 - 2072. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. Wang Estimation of effective population sizes from data on genetic markers Phil Trans R Soc B, July 29, 2005; 360(1459): 1395 - 1409. [Abstract] [Full Text] [PDF] |
||||
![]() |
E. C. Anderson An Efficient Monte Carlo Method for Estimating Ne From Temporally Spaced Samples Using a Coalescent-Based Likelihood Genetics, June 1, 2005; 170(2): 955 - 967. [Abstract] [Full Text] [PDF] |
||||
![]() |
L. Excoffier, A. Estoup, and J.-M. Cornuet Bayesian Analysis of an Admixture Model With Mutations and Arbitrarily Linked Markers Genetics, March 1, 2005; 169(3): 1727 - 1738. [Abstract] [Full Text] [PDF] |
||||
![]() |
G. Achaz, S. Palmer, M. Kearney, F. Maldarelli, J. W. Mellors, J. M. Coffin, and J. Wakeley A Robust Measure of HIV-1 Population Turnover Within Chronically Infected Individuals Mol. Biol. Evol., October 1, 2004; 21(10): 1902 - 1912. [Abstract] [Full Text] [PDF] |
||||
![]() |
I. Goldringer and T. Bataillon On the Distribution of Temporal Variations in Allele Frequency: Consequences for the Estimation of Effective Population Size and the Detection of Loci Undergoing Selection Genetics, September 1, 2004; 168(1): 563 - 568. [Abstract] [Full Text] [PDF] |
||||
![]() |
D. A. Tallmon, G. Luikart, and M. A. Beaumont Comparative Evaluation of a New Effective Population Size Estimator Based on Approximate Bayesian Computation Genetics, June 1, 2004; 167(2): 977 - 988. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. A. Beaumont Estimation of Population Growth or Decline in Genetically Monitored Populations Genetics, July 1, 2003; 164(3): 1139 - 1160. [Abstract] [Full Text] [PDF] |
||||
![]() |
G. Laval, M. SanCristobal, and C. Chevalet Maximum-Likelihood and Markov Chain Monte Carlo Approaches to Estimate Inbreeding and Effective Size From Allele Frequency Changes Genetics, July 1, 2003; 164(3): 1189 - 1204. [Abstract] [Full Text] [PDF] |
||||
![]() |
C. R. Miller and L. P. Waits The history of effective population size and genetic diversity in the Yellowstone grizzly (Ursus arctos): Implications for conservation PNAS, April 1, 2003; 100(7): 4334 - 4339. [Abstract] [Full Text] [PDF] |
||||
![]() |
Estimating Effective Population Size and Migration Rates From Genetic Samples Over Space and Time Genetics, January 1, 2003; 163(1): 429 - 446. |
||||
![]() |
T. F. Turner, J. P. Wares, and J. R. Gold Genetic Effective Size Is Three Orders of Magnitude Smaller Than Adult Census Size in an Abundant, Estuarine-Dependent Marine Fish (Sciaenops ocellatus) Genetics, November 1, 2002; 162(3): 1329 - 1339. [Abstract] [Full Text] [PDF] |
||||
![]() |
P. Berthier, M. A. Beaumont, J.-M. Cornuet, and G. Luikart Likelihood-Based Estimation of the Effective Population Size Using Temporal Changes in Allele Frequencies: A Genealogical Approach Genetics, February 1, 2002; 160(2): 741 - 751. [Abstract] [Full Text] [PDF] |
||||
- THIS ARTICLE
-
Abstract
- Full Text (PDF)
- Alert me when this article is cited
- Alert me if a correction is posted
- SERVICES
- Email this article to a friend
- Similar articles in this journal
- Similar articles in PubMed
- Alert me to new issues of the journal
- Download to citation manager
- Reprints & Permissions
- CITING ARTICLES
- Citing Articles via HighWire
- Citing Articles via Google Scholar
- GOOGLE SCHOLAR
- Articles by Anderson, E. C.
- Articles by Thompson, E. A.
- Search for Related Content
- PUBMED
- PubMed Citation
- Articles by Anderson, E. C.
- Articles by Thompson, E. A.




























