Genetics, Vol. 164, 1139-1160, July 2003, Copyright © 2003

Estimation of Population Growth or Decline in Genetically Monitored Populations

Mark A. Beaumonta
a School of Animal and Microbial Sciences, Reading RG6 6AJ, United Kingdom

Corresponding author: Mark A. Beaumont, Whiteknights, PO Box 228, Reading RG6 6AJ, United Kingdom., m.a.beaumont{at}reading.ac.uk (E-mail)

Communicating editor: G. CHURCHILL


*  ABSTRACT
*TOP
*ABSTRACT
*IMPLEMENTATION OF MARKOV CHAIN...
*INFERENCE IN THE TEMPORAL...
*SIMULATION TESTS
*ANALYSIS OF EXAMPLE DATA...
*DISCUSSION
*APPENDIX
*LITERATURE CITED

This article introduces a new general method for genealogical inference that samples independent genealogical histories using importance sampling (IS) and then samples other parameters with Markov chain Monte Carlo (MCMC). It is then possible to more easily utilize the advantages of importance sampling in a fully Bayesian framework. The method is applied to the problem of estimating recent changes in effective population size from temporally spaced gene frequency data. The method gives the posterior distribution of effective population size at the time of the oldest sample and at the time of the most recent sample, assuming a model of exponential growth or decline during the interval. The effect of changes in number of alleles, number of loci, and sample size on the accuracy of the method is described using test simulations, and it is concluded that these have an approximately equivalent effect. The method is used on three example data sets and problems in interpreting the posterior densities are highlighted and discussed.


THE effect of inbreeding on population fitness is currently the focus of many studies, both empirical (SACCHERI et al. 1998 Down) and theoretical (LYNCH et al. 1995 Down; LANDE 1998 Down). One motivation behind these studies is the need to investigate the genetic component of the threat to endangered species arising from low population size. The rate of inbreeding depends on Ne, which is generally much lower than the census size. If it can be assumed that the ratio of effective to census size is approximately constant, the detection of historical changes in Ne may indicate changes in census size. Similarly, if the ratio of effective to census size can be estimated for one population it can then be used to estimate census sizes in other populations for which only genetic information is available (BEAUMONT 2001 Down).

Estimation of Ne is problematic. There are three general approaches. One way is to estimate it nongenetically from the mating system (CABALLERO 1994 Down). However, this is generally unsatisfactory because detailed life-history information is required, as well as good estimates of census size, which is often unavailable with sufficient precision to make a good estimate of Ne (FRANKHAM 1995 Down). Furthermore, cross-generational effects that are difficult to measure, such as serial correlations in family size, may cause a substantial reduction in Ne from that expected purely from consideration of the variance in reproductive success (AUSTERLITZ and HEYER 1998 Down). An alternative approach is to use information from single genetic samples. For example, using a mutation model, Ne can be estimated from the variability in the sample (e.g., GRIFFITHS and TAVARE 1994A Down, GRIFFITHS and TAVARE 1994B Down, GRIFFITHS and TAVARE 1994C Down; KUHNER et al. 1995 Down; WILSON and BALDING 1998 Down; STORZ and BEAUMONT 2002 Down). A problem with this approach is that the value that is estimated may have little relationship to current rates of inbreeding or any value of Ne that could be estimated from direct observation of the mating system of the population. This is because, over the timescale in which the observed variability is generated by mutation, the unknown details of population history, gene flow, and metapopulation structure will greatly influence estimates of Ne, which is then probably best regarded as simply a scaling coefficient in a coalescent model (DONNELLY and TAVARE 1995 Down; NORDBORG 1997 Down; WAKELEY 1999 Down; WAKELEY and ALIACAR 2001 Down). Alternatively, genotypic disequilibria in single samples can be used to estimate Ne. This can be achieved by measuring departures from either Hardy-Weinberg equilibrium (PUDOVKIN et al. 1996 Down; LUIKART and CORNUET 1999 Down) or linkage disequilibrium (LANGLEY et al. 1978 Down; LAURIE-AHLBERG and WEIR 1979 Down; HILL 1981 Down). These have the advantage that they measure Ne on a more recent timescale, but have generally low power and are susceptible to the influence of many other phenomena.

The most widely used method to estimate Ne from genetic samples is from the difference in gene frequency between serial samples taken from the same population. This is the "temporal method," first introduced by KRIMBAS and TSAKAS 1971 Down. Their method-of-moments estimator has been elaborated by NEI and TAJIMA 1981 Down, POLLAK 1983 Down, and WAPLES 1989 Down. More recently WILLIAMSON and SLATKIN 1999 Down, ANDERSON et al. 2000 Down, WANG 2001 Down, and BERTHIER et al. 2002 Down have developed likelihood-based estimators, which show modest to rather more substantial improvements in accuracy over the method-of-moments estimators. In addition, WILLIAMSON and SLATKIN 1999 Down and WANG 2001 Down have been able to estimate change in population size, further illustrating the flexibility of likelihood-based approaches. WILLIAMSON and SLATKIN 1999 Down estimated likelihoods from a Wright-Fisher model in which any number of serial samples could be analyzed. Their method is practicable only for the biallelic case. More recently ANDERSON et al. 2000 Down used importance sampling to improve the speed of the approach, which makes it practicable to look at multiallelic data. WANG 2001 Down has suggested a further improvement in computational speed by approximating the probability of the data by the product of the marginal probabilities for each allele, thus reducing the problem to that studied by WILLIAMSON and SLATKIN 1999 Down, but solved substantially more efficiently. The method of BERTHIER et al. 2002 Down differs from the other three methods in that likelihoods are estimated from a coalescent model in which two samples are analyzed. Since only two samples are analyzed in their method it is not possible to make inferences about changes in population size.

The above methods all assume that the sampling period is sufficiently short that the effects of mutations can be safely ignored. Recently a strand of research that is independent of that initiated by KRIMBAS and TSAKAS 1971 Down was identified and has been motivated by the need to study human immunodeficiency virus viral dynamics and evolution on the basis of sequence data from serial samples (RODRIGO et al. 1999 Down; FU 2001 Down; DRUMMOND et al. 2002 Down). These methods use a coalescent model with mutations, and that of DRUMMOND et al. 2002 Down allows for full Bayesian estimation of mutational, demographic, and genealogical parameters from sequence data.

This study assumes that the effects of mutations over the sampling period can be ignored and makes three contributions. First, it is shown how the Monte Carlo method of importance sampling can be used to update sets of genealogies in a Markov chain Monte Carlo simulation to estimate posterior distributions of parameters of interest. This method is very general and can be applied to all models of genealogical inference and may lead to increased efficiency in implementation and execution. This computational method is applied to the coalescent-based model of BERTHIER et al. 2002 Down, described above. Second, the model of BERTHIER et al. 2002 Down is generalized to consider any number of samples in a temporal sequence rather than just the two previously considered. Third, the method is further extended to estimate parameters in a model of population growth and decline, similar to that studied in BEAUMONT 1999 Down.


*  IMPLEMENTATION OF MARKOV CHAIN MONTE CARLO WITH INDEPENDENT SAMPLING OF GENEALOGICAL HISTORIES
*TOP
*ABSTRACT
*IMPLEMENTATION OF MARKOV CHAIN...
*INFERENCE IN THE TEMPORAL...
*SIMULATION TESTS
*ANALYSIS OF EXAMPLE DATA...
*DISCUSSION
*APPENDIX
*LITERATURE CITED

Background and motivations:
The potential for genetic data to shed light on the evolutionary history of populations has been well appreciated over the last decade, and in the development of the statistical methodology there has been a general interest in moving away from moment-based methods of estimation to the use of likelihood and Bayesian inference (STEPHENS 2001 Down). Reflecting the youth of this field, the computational and technical details of the different approaches to inference tend to dominate much of the research. There are currently two interrelated computer-intensive methods to statistical inference—Markov chain Monte Carlo (MCMC) and importance sampling (IS). MCMC is a method for generating autocorrelated random samples from probability distributions. IS is generally used to approximate likelihoods based on independent samples. In population genetics these have been combined to give three main groups of methods, as distinguished by STEPHENS and DONNELLY 2000 Down: those in which the genealogical histories are independently sampled using IS, with likelihoods for specific parameters then computed from the sample of genealogical histories (GRIFFITHS and TAVARE 1994A Down; ANDERSON et al. 2000 Down), which are referred to here loosely as "pure IS" methods; those in which autocorrelated genealogical histories are sampled using MCMC, with likelihoods for specific parameters then computed from the sampled genealogical histories (KUHNER et al. 1995 Down; BEERLI and FELSENSTEIN 2001 Down); and those in which there is autocorrelated sampling of genealogical histories and demographic/mutational parameters. The latter approach yields samples from the posterior distribution of parameter values and leads naturally to Bayesian inference or the use of integrated likelihood (e.g., WILSON and BALDING 1998 Down; BEAUMONT 1999 Down; NIELSEN and WAKELEY 2001 Down; DRUMMOND et al. 2002 Down). By contrast, the other two approaches, which use importance sampling to approximate likelihood surfaces, tend to lead to more classical likelihood-based inference.

In the MCMC methods that give autocorrelated samples of parameter values the necessary integration for Bayesian inference is intrinsic, and the only additional computation, if required, is to estimate the posterior densities from the sampled points. By contrast, with importance sampling methods that give approximations of likelihood surfaces directly, further complex procedures are necessary for Bayesian inference. Given that it is generally easier to estimate densities (thereby enabling the choice of classical likelihood-based estimation, integrated likelihoods, or fully Bayesian inference) than to manipulate the approximated-likelihood surfaces, it would seem that methods that give autocorrelated samples of parameter values offer the greatest flexibility. However, these methods have currently two main drawbacks. They involve making small modifications to the genealogical history, and because of this (a) they are generally more difficult to program than pure IS methods, and (b) they can move quite slowly through the space of possible genealogical histories, making them potentially inefficient. This article introduces a method for overcoming these two disadvantages and applies it to a specific problem, the estimation of effective population size, Ne, from temporally spaced genetic samples.

The general motivation behind these computer-intensive methods is that from coalescent theory it is straightforward to calculate the probability p(D, G|{Phi}) = p(D|G)P(G|{Phi}) of any particular genealogical history, G, that gives rise to some data D, as a function of parameters specifying the demographic history and mutation model, {Phi}. There are a number of different representations of the genealogical history (see STEPHENS and DONNELLY 2000 Down), and in this article I consider it to be the timed sequence of coalescent and mutation events in the genealogical history of a sample, so that p(D|G) = 1 if the genealogical history can give rise to the data and 0 otherwise. Any particular data set can be obtained from very many different genealogical histories, and to calculate the likelihood we need to evaluate

(1)

where, following STEPHENS and DONNELLY 2000 Down, the integral denotes a summation over all discrete states (e.g., pattern of coalescences and mutations) and integration over continuous states (e.g., duration of intervals between events). Estimation of p(D|{Phi}) directly is most conveniently made using importance sampling (GRIFFITHS and TAVARE 1994A Down; STEPHENS and DONNELLY 2000 Down). In importance sampling, the equation is rewritten as

and this is estimated by sampling Gj from q(G|{Phi}) and evaluating

(2)

Generally the sampling distribution is chosen such that P(D|Gj) = 1 for all Gj. In the ideal case that q(G|{Phi}) = p(G|D, {Phi}), i.e., the posterior distribution of genealogical histories given the data and parameters, the variance in the estimate of p(D|{Phi}) is zero because each term in (2) evaluates to the likelihood axiomatically. The ratio p(G|{Phi})/q(G|{Phi}) is called the importance ratio, or importance weight.

However, the evaluation or estimation of p(D|{Phi}) is not necessarily an ideal goal for population genetic inference. The problem is that {Phi} often has many components, and generally we wish to make inferences about one component (e.g., growth rate) independent of the others. Furthermore, for most population genetic problems, the likelihood surfaces do not approximate that of a multivariate normal distribution, and therefore asymptotic theory and methods often do not apply. These problems can be side-stepped by taking a Bayesian approach to inference, which also has the advantage that background information can be incorporated into the model (WILSON and BALDING 1998 Down). In this case we estimate the posterior distribution

Inferences on particular parameters can be made from the marginal posterior distribution, where p({Phi}|D) is integrated over all other parameters. If uniform improper priors are used p({Phi}|D) {propto} p(D|{Phi}) and the methods used to obtain marginal posterior distributions will also give the integrated (relative) likelihood surface. Considerations of how best to make inferences on single parameters in multiparameter models has led, for example, NIELSEN and WAKELEY 2001 Down to advocate that there are many advantages to using integrated likelihoods even when a frequentist approach is preferred.

The only method currently used to perform fully Bayesian analyses for population genetic inference has been Metropolis-Hastings sampling of parameter values (e.g., WILSON and BALDING 1998 Down; BEAUMONT 1999 Down). Although the potential to use purely importance-sampling approaches for Bayesian analyses has been discussed (e.g., FEARNHEAD and DONNELLY 2001 Down), no such analysis of genetic data based on importance sampling has yet been published, and there has been no proposal for how this could easily be done for a complex multiparameter model, such as a hierarchical Bayesian model (STORZ and BEAUMONT 2002 Down).

To perform Metropolis-Hastings sampling it is not necessary to evaluate p(D|{Phi}), and we can work with p(D, G|{Phi}), which is easily calculated from coalescent theory. Starting with any Gi such that P(D|Gi) = 1, modify Gi -> Gi+1 [where P(D|Gi+1) = 1] and {Phi}i -> {Phi}i+1 such that it is straightforward to calculate the probability, p(Gi+1, {Phi}i+1|Gi, {Phi}i), of obtaining Gi+1 and {Phi}i+1, conditional on being at Gi, {Phi}i, and the reverse. Then accept Gi+1 and {Phi}i+1, with probability

(3)

otherwise Gi+1 = Gi, and {Phi}i+1 = {Phi}i. The first term in the product is the likelihood ratio, the second is the Hastings term, and the third is the ratio of the priors. The Hastings term is the ratio of the probability of reaching the current state from the proposed state to that of the reverse and ensures a uniform coverage of the parameter space. This simulated Markov chain will then give a (serially autocorrelated) sample from p({Phi}, G|D). Summaries of the marginal posterior density for a particular parameter or an estimate of the density itself can be obtained from the simulated sequence of values realized for that parameter, ignoring the others. The key point here is that it is possible to perform the simulation using p(D, G|{Phi}), which is easy to calculate, by updating the genealogical history G, and then the posterior distribution for the parameters of interest are obtained marginal to the genealogical histories. The price for this convenience is that the search space of the MCMC simulation is greatly increased. If it were possible to evaluateEquation 1, then p({Phi}|D) marginal to G could have been obtained by running the simulation with p(D|{Phi}) and updating {Phi}i -> {Phi}i+1 alone—i.e., accepting {Phi}i+1 with probability

(4)

and thus only {Phi} would have to be explored by the MCMC simulation.

Current methods that use independent sampling of genealogical histories within an MCMC framework:
Hitherto it has been easier to run the MCMC using p(G, D|{Phi}) and hence autocorrelated sampling of genealogical histories, but, as discussed above, there are programming problems and problems of efficiency with this approach. Therefore it is tempting to consider the use of importance sampling to obtain an approximation, (D|{Phi}), which can then be implemented in an MCMC simulation to incorporate prior information, and obtain marginal posterior distributions or integrated likelihoods, as discussed above. One advantage of importance sampling is that it is often very straightforward to implement in a computer program. Also, because the importance-sampling function uses heuristics from coalescent theory to attempt to generate genealogies from their posterior distribution, given the data, it is a potentially more efficient method for sampling genealogical histories in comparison with MCMC.

This approach has been used in a series of articles (O'RYAN et al. 1998 Down; CIOFI et al. 1999 Down; CHIKHI et al. 2001 Down; BERTHIER et al. 2002 Down) to make inferences based on coalescent models of drift without mutations (reviewed in BEAUMONT 2001 Down). A related method has been used by O'NEILL et al. 2000 Down for an epidemiological model. The likelihood ratio

inEquation 4 is replaced by

estimated (in the genealogical analyses) usingEquation 2. Note that in normal MCMC, if the denominator p(D|{Phi}i) in the likelihood ratio were known without error there would be no need to reevaluate it each time that R was evaluated. By contrast, with there is a choice whether to make independent estimates of (D|{Phi}i) when it is evaluated at each update of the MCMC (evaluation ofEquation 4) or to reuse the earlier estimate. Intuitively it seems reasonable, though more time consuming, to reevaluate it each time so that the estimates of are independent of each other and so that unusually large ratios arising by chance do not lead to sticking of the simulated Markov chain. Furthermore, the results in O'NEILL et al. 2000 Down concerning bias correction (discussed below) require independence of the estimates. The reevaluation approach has been taken in all the genealogical models that have used the method and also by O'NEILL et al. 2000 Down. Updates are required only for {Phi} and not for G as in the MCMC methods of WILSON and BALDING 1998 Down and BEAUMONT 1999 Down. This general method, where the MCMC uses an approximation to the likelihood, is abbreviated here as Monte Carlo within Metropolis (MCWM), following the terminology of O'NEILL et al. 2000 Down. A basic algorithm for MCWM is as follows:

  1. Choose initial parameter values {Phi}i with i = 0.

  2. Sample h independent genealogical histories using importance-sampling function and calculate (D|{Phi}i) fromEquation 2.

  3. Draw {Phi}i+1 ~ p({Phi}i+1|{Phi}i).

  4. Sample h independent genealogical histories using importance-sampling function and calculate (D|{Phi}i+1) fromEquation 2.

  5. Accept {Phi}i+1 with probability

    Otherwise {Phi}i+1 = {Phi}i.

  6. Set i = i + 1 and go to 2.

Bias correction:
Clearly, since is based on an approximation of the likelihood ratio the posterior distribution will also be approximate. Simulation tests performed in O'RYAN et al. 1998 Down suggested that an IS size of 500 was sufficient to obtain accurate estimates of posterior distributions, and this number has been used for subsequent articles.

O'NEILL et al. 2000 Down have carried out an analogous procedure where the likelihoods are estimated by a Monte Carlo (MC) method. They show that the method should be exact, independent of the sampling variance, providing that

where R is the true likelihood ratio and is its estimate. They suggest using the estimator R* = 2/[], where [] is an estimate of the expected value of the ratio, to correct for the bias in . Details of how R* has been estimated for the genealogical model considered here are given in the Appendix. In the results below, simulations carried out with this bias correction are referred to here as MCWM with bias correction and the earlier method as MCWM without bias correction.

Independence Metropolis-Hastings simulation:
The methods described above all use importance sampling to approximate p(D|{Phi}) and, with or without bias correction, will lead the MCMC simulation to sample from an approximate posterior distribution. I now show how a small modification to the approach will guarantee that the MCMC will sample from the true posterior distribution.

Consider now an importance sample of size 1 (i.e., h = 1 inEquation 2 above). The importance weight,

is an (admittedly very poor) estimate of p(D|{Phi}) as described above, but is also the ratio of the probability of sampling the genealogy under the coalescent to the probability of sampling the genealogy under the importance-sampling function. Supposing this were used in the Metropolis-Hastings simulation described by (3), the ratio of importance weights for the ith and i + 1th MCMC update is

which, multiplied with the Hastings term for the parameter updates, p({Phi}i|{Phi}i+1)/p({Phi}i+1|{Phi}i), will give (3) above. Note that the Hastings term for updates to G is not conditional on any particular value of G. Thus, at least for G, this method is an example of the well-studied independence Metropolis-Hastings sampler (see, e.g., TIERNEY 1996 Down, pp. 69–70) and has been proposed as a possible approach for genealogical inference by STEPHENS and DONNELLY 2000 Down. The MCMC is sampling the posterior distribution of genealogical histories, as well as parameter values, and inference is performed in much the same way as in WILSON and BALDING 1998 Down and BEAUMONT 1999 Down. If the importance sampling is used in this way, then the MCMC will correctly sample from the posterior distribution of parameters provided that the current genealogical history and associated likelihood are kept like any other parameter rather than resampled at each evaluation of (3). The sampled genealogical history is treated as a parameter on an equal footing with {Phi}, and although it would be possible to update the genealogical history independently of the demographic parameters, they are all updated together in the following simulations. To reiterate, the difference between the independence Metropolis-Hastings sampler and MCWM is that in MCWM the importance weight is viewed as an estimate of p(D|{Phi}i) in (4) and, although it would never be advisable to use it with a sample of size 1, is reevaluated with a new Gi at each evaluation of (4), whereas with the independence Metropolis-Hastings sampler the current Gi is retained at each evaluation of (3).

As is shown in the RESULTS, using a single genealogy in the independence sampler leads to very poor convergence of the MCMC, and, again this is a well-known property of the independence Metropolis-Hastings sampler when the sampling function is a poor approximation of the target density (TIERNEY 1996 Down). Essentially the distribution of importance weights is very skewed so that the simulation will "stick" at the (very rarely obtained) high importance weights and then wait a long time forEquation 3 to be satisfied. By contrast, at the other extreme, if the importance sample size was very large so that independent estimates of the likelihood ratio R had negligible variance then convergence of the MCMC would depend only on {Phi} and would generally be very good. Intuitively, therefore, we should get better convergence if we take larger sample sizes, but then the question arises whether the MCMC will converge to the required target density exactly—i.e., p({Phi}, G|D).

As shown in the Appendix the target density for the MCMC becomes rather more complicated when we consider importance sample sizes greater than one. However, it can be proved (see the Appendix) that ifEquation 2 is used with values of h > 1 then the independence sampling procedure will always give the correct posterior densities for the demographic and genealogical parameters for any importance sample size. Although the sample of h genealogical histories observed at any point in the simulated chain is not drawn from the posterior distribution, if we consider the sample of genealogies simulated by the importance sampling procedure to be ordered and keep a track of, say, the genealogies occupying the jth position throughout the MCMC simulation, then these genealogies will (in the long run) be sampled from the correct posterior distribution and, jointly with the parameters, will be sampled from p({Phi}, G|D). As described below, simulation tests suggest that acceptance rates increase rapidly with larger importance sample sizes, and for adequate importance sample sizes this procedure is, in general, more efficient than the other methods. This method differs from the normal independence sampler because we are using a group of sampled genealogies rather than one and is called grouped independence Metropolis-Hastings (GIMH) to distinguish it from the normal independence Metropolis-Hastings and from MCWM, which involves reevaluation of the likelihood. Since the grouped independence Metropolis-Hastings sampler can be shown to converge to the target densities exactly, whereas this is only approximate in the case of MCWM, with or without bias correction, the bulk of the analyses performed in this article are carried out using this approach.

A basic algorithm for GIMH (or the standard independence sampler when h = 1) is as follows:

  1. Choose initial parameter values {Phi}i with i = 0.

  2. Sample h independent genealogical histories using importance-sampling function and calculate i(D|{Phi}i) fromEquation 2.

  3. Draw {Phi}i+1 ~ p({Phi}i+1|{Phi}i).

  4. Sample h independent genealogical histories using importance-sampling function and calculate i+1 (D|{Phi}i+1) fromEquation 2.

  5. Accept {Phi}i+1 and i+1(D|{Phi}i+1) with probability

    Otherwise {Phi}i+1 = {Phi}i and i+1(D|{Phi}i+1) = i(D|{Phi}i).

  6. Set i = i + 1 and go to 3.

In this algorithm the importance sampling calculations are explicitly indexed for clarity. The essential difference from MCWM is that the iterations start at step 3 and genealogical histories are simulated only for the trial values.


*  INFERENCE IN THE TEMPORAL METHOD BASED ON A COALESCENT MODEL WITH SAMPLES TAKEN AT MANY TIME POINTS
*TOP
*ABSTRACT
*IMPLEMENTATION OF MARKOV CHAIN...
*INFERENCE IN THE TEMPORAL...
*SIMULATION TESTS
*ANALYSIS OF EXAMPLE DATA...
*DISCUSSION
*APPENDIX
*LITERATURE CITED

The data are assumed to be sampled at different times, given by the sequence = (x0, x1, ... , xd). Time is measured in units of generations, and the most recent sample is given subscript 0, and x0 = 0. The population is changing exponentially in size from a previously constant ancestral size NA at time X to size N0 at time 0. Time is taken to be increasing into the past, and terms such as "earlier" and "later" refer, respectively, to times nearer or farther from the most recent sample (see Fig 1). Corresponding to each time point is a sequence of sample sizes (number of chromosomes) = (n0, n1, ... , nd), where lineages are added to the genealogy. The sequence of frequency counts of the different allelic types in each sample is given by = (a0, a1, ... , ad), where the vectors are of length k, the total number of different allelic types observed in the data. For times >x0, at the time each set of lineages is added a number of lineages are present with descendants in earlier samples, lower down the genealogy (i.e., lower down in Fig 1). The allele frequency counts among these base lineages are denoted here as the random variable = (f0, ... , fd), where, to ease the notation below, f0 is defined to be 0. The number of these lineages, also a random variable, H = (h1, ... , hd), depends on the number of coalescences that occur in the intervals between sampling points. These are given by the sequence = (c1, ... , cd). Thus, at the ith sample point, the number of lineages deriving from earlier samples is given by



View larger version (18K):
In this window
In a new window
Download PPT slide
 
Figure 1. Diagram illustrating the terminology used in the text.

The notation used here is summarized in Fig 1.

The likelihood, assuming a model of drift without mutations, can be obtained as a straightforward extension of the two-sample case in BERTHIER et al. 2002 Down and is given by

(5)

where

  • p(ad + fd) is the probability of sampling the gene frequencies in the lineages extant at the earliest sampling time(at the top of Fig 1);

  • p(fi + ai|fi+1, ci+1) is the probability of obtaining the gene frequencies among the lineages extant at sample i given the base lineages at i + 1 and the number of coalescences within the interval;

  • p(ai+1, fi+1|ai+1 + fi+1) is the hypergeometric sampling probability of obtaining the frequencies in the base lineages and the frequencies in the sample lineages, given the frequencies of the combined lineages; and

  • p(ci+1|(xi+1 - xi)/(2Ñi+1)) is the probability of obtaining c coalescences in the sampling interval, over which the harmonic mean effective size is Ñ

(see the Appendix for further details). The sum is over all possible numbers of coalescences between sampling intervals and all possible frequency counts among the base lineages at each interval. In the case of many unlinked loci, the likelihoods can be estimated for each locus separately and then multiplied together. Although in principle the possibilities can be straightforwardly enumerated, allowingEquation 5 to be solved, in practice there are far too many possibilities to make this useful. Instead, the importance sampling approach of GRIFFITHS and TAVARE 1994A Down is applied to this problem, as in BERTHIER et al. 2002 Down.

In this approach S independent sequences of coalescences of lineages are explicitly sampled by simulation (see the Appendix for details), and we obtain

(6)

Thus for the {kappa}th simulated sequence p(fi + ai|fi+1, ci+1) x p(ci+1|(xi+1 - xi)/(2Ñi+1)) in (5) is replaced by {prod}c{kappa}ie=0 w{kappa}i(e+1), which is the ratio of the probability of obtaining the sampled sequence of lineages under the coalescent model, independent of the data, to the probability of obtaining it from the importance-sampling function.

The c{kappa}i coalescences are simulated using the coalescent model (see the Appendix for details of how this was done for a population of varying size). The distribution of the number of coalescences between data-sampling intervals is identical under the coalescent model and the importance-sampling function, and hence this term cancels out. This form of sampling is used for all the analyses described below. However, if importance weights are to be evaluated at parameter values other than those used to generate the samples, the terms in (6) need to be multiplied by a weight reflecting the different probability of obtaining the simulated number of coalescences under the coalescent compared to that under the importance-sampling function. This can be done in two ways. The weight p(ci,(xi+1 - xi)/2Ñi+1)/p(ci,(xi+1 - xi)/2Ñ*i+1) can be used from TAVARÉ's (1984) Equation 6.1 for each interval between samples, where Ñ and Ñ* are calculated for each interval from (A3), and Ñ* is used to generate the importance samples. Alternatively, the simulated coalescence times can be recorded, and an equivalent ratio can be calculated from their joint density under the coalescent compared to their joint density under the importance-sampling function. The advantage of the former is that it is marginal to the coalescence times and should therefore be more efficient; however, it is computationally time consuming to calculate and numerically unstable, and the latter is probably more practicable.

In the results described in the next sectionsEquation 6 has been used on its own to estimate likelihoods and also incorporated into the MCWM procedure with and without bias correction and into the GIMH) sampler. In general when MCWM and GIMH are used in the analyses rectangular priors are assumed for each parameter, as in BEAUMONT 1999 Down. In all of the analyses, X is assumed to be equal to xd and not separately estimated. In the MCMC, the initial values of the parameters are taken uniformly randomly from the priors. They are updated from a lognormal distribution with the median centered on the current value of the parameter and standard deviation (on a log scale) of 0.5, unless otherwise stated. In all the MCMC analyses the parameters are updated simultaneously (with the genealogies, as discussed above). Comparisons among the various approaches are made to demonstrate the superiority of GIMH, and then this method is used for further investigations of the accuracy and coverage properties of the method using simulated data sets. Finally GIMH is applied to three published data sets to illustrate its utility.


*  SIMULATION TESTS
*TOP
*ABSTRACT
*IMPLEMENTATION OF MARKOV CHAIN...
*INFERENCE IN THE TEMPORAL...
*SIMULATION TESTS
*ANALYSIS OF EXAMPLE DATA...
*DISCUSSION
*APPENDIX
*LITERATURE CITED

Comparison of MCWM and GIMH with pure IS estimation:
To compare the accuracy of the three different MCMC approaches a data set was simulated from the model from a diploid population with effective size Ne = 51.2. The population did not change in size over the sampling period, and six samples each of size 20 chromosomes were taken at generations 0, 4, 8, 12, 16, and 20. The data set consisted of 10 loci each with five alleles in the population (although, due to sampling, some data sets had fewer than five alleles). The population frequencies were simulated from a uniform Dirichlet distribution, according to the assumption of the model.

The data set was then analyzed by four different approaches (in all models, Ne = N0 = NA):

  1. The likelihoods for a grid of 81 values of Ne from 20 to 80 were evaluated using (6). The likelihoods were evaluated at each point independently, using an IS size of 40,000. The standard errors were estimated using (A2). The approximate likelihood surface was normalized to have unit volume, and the standard errors were scaled accordingly. The standard deviation was estimated from this distribution.

  2. Nine different simulations using MCWM without bias correction were carried out in which the IS sizes used in the evaluation of (4) were 5000, 1000, 500, 100, 50, 10, 5, 2, and 1. The simulations were run for 20,000 updates, which appeared to give good convergence (as judged by eye from the output traces), and densities and standard deviations of the posterior distribution were estimated from the values of Ne generated by the simulation.

  3. Eight simulations using bias-corrected MCWM were carried out as for MCWM. Simulations using an IS size of one were not performed because SE[(D/{Phi})] cannot be estimated.

  4. Nine simulations were carried out using GIMH as for MCWM. However, with GIMH the rate of convergence is heavily dependent on the IS size used. In particular, with an IS size of one the MCMC procedure tends to mix very poorly because the simulated chain will stick at chance high values of (D/{Phi}). The length of simulation, the thinning interval (the number of MCMC iterations between successive recordings of parameter values), and the standard deviation of the trial parameter updates were varied between simulations to achieve satisfactory convergence, judged by eye from output traces.

Estimated densities using the four approaches are shown in Fig 2. It can be seen that the standard errors for the IS method are still large, even with 40,000 points. However, the posterior distributions estimated by MCWM with and without bias correction are very similar to each other and to the distribution estimated from the pure IS method, despite the variability in the estimates of the likelihood (for an IS size of 500 the standard errors are expected to be around nine times larger than those shown in Fig 2). The distribution for GIMH with an IS size of just 10 per MCMC update (evaluation ofEquation 4) is very close to that of the pure IS method.



View larger version (18K):
In this window
In a new window
Download PPT slide
 
Figure 2. Comparison of the posterior distributions obtained using MCWM without bias correction, MCWM with bias correction, GIMH, and pure importance sampling.

Fig 3 shows how the width of the estimated posterior distribution varies with the IS size.



View larger version (19K):
In this window
In a new window
Download PPT slide
 
Figure 3. The relationship between the IS size and the standard deviation of the posterior distribution for MCWM with and without bias correction and GIMH. The standard deviation estimated from pure IS is shown as a dotted line.

The standard deviation estimated from the pure IS method is 10.4. It can be seen that for MCWM with and without bias correction there is a strong relationship between the width of the distribution and the IS size. Bias correction does appear to ameliorate the problem to some extent, but still leads to inaccurate estimation of the posterior distribution when the IS size is small. For MCWM without bias correction, an IS size of ~500 is the minimum required for accurate estimation, whereas ~100 are needed with bias correction. The rate of convergence of the two MCWM methods appears to be independent of IS size.

GIMH is unaffected by the IS size, as expected from the result in the Appendix. This is not a free lunch, however. The tradeoff is that the amount of mixing is severely reduced when the IS size is low, and the length of time required to achieve convergence is correspondingly increased. For example, with an IS size of 1 the result shown in Fig 3 was obtained by pooling together results from seven independent simulations of 108 MCMC updates, thinned every 10,000 updates. Even in this case, there is still appreciable variability in the results between the independent simulations. The result for an IS size of 10 was obtained from a single simulation of 107 MCMC updates thinned every 200 updates. In this case convergence appears very good, with a very uniform trace for Ne, and the resulting density is plotted in Fig 2.

These points are further illustrated in Fig 4, where the proportion of trial updates that are accepted in the MCMC, divided by IS size, is plotted against IS size. The acceptance rate varied rapidly from 0.0011 with an IS size of 1 to 0.15 with an IS size of 10 and then more slowly to 0.85 with an IS size of 5000. It can be seen in Fig 4 that the scaled acceptance rate has an optimum at an IS size of ~10.



View larger version (13K):
In this window
In a new window
Download PPT slide
 
Figure 4. The proportion of trial updates in the MCMC that are accepted, divided by IS size, is plotted against IS size.

In these simulations the standard deviation of the distribution of parameter updates was kept at 0.1 for all IS sizes, and therefore the scaled acceptance rate is a measure of efficiency—for a given number of accepted trial updates, the total required number of IS evaluations is at a minimum if an IS size of 10 is used. This optimum will vary for different data sets and sizes of the trial parameter updates, and for the remaining analyses described in this article, which mostly used a standard deviation of 0.5 for the trial parameters, unless otherwise stated, GIMH was used with an IS size of 100 and 105 MCMC updates, thinned every 10 updates to give 10,000 points.

Effect of sample size and numbers of alleles and loci on the estimation of NA and N0:
To illustrate the effect of varying aspects of the sample, five independent simulations were performed for each combination of parameters in Table 1. As shown in the table the parameters were also summarized by a composite parameter SSAL = (sample size) x (k - 1) x (number of loci), where k is the number of alleles. Samples were simulated from populations that grew from NA = 20 to N0 = 200 and also populations that contracted from NA = 200 to N0 = 20. The samples were taken at six times, = (0, 2, 4, 6, 8, 10). The population frequencies were simulated from a uniform Dirichlet distribution, as before. The aim of the analysis was to compare the effect of the parameters on the deviation of the joint posterior mode of NA and N0 from the value used in the simulations and also to illustrate typical posterior distributions obtained with different data sets. Ideally, of course, an analysis of the accuracy of estimators should use a larger number of replicates, but the time taken to run the MCMC precludes this. Five replicates are, however, sufficient to illustrate the general trend toward consistency in the estimator, as the amount of information in the data increases. This number was chosen because pairs of sets of five replicates could be run in parallel on a 10-node cluster of 700-Mhz Pentium 3 processors running under Linux. MCMC parameters are as described above for all simulations other than those with SSAL = 8000. In this case an IS size of 500 was used and the standard deviation (on a log scale) of the lognormal used for updating the demographic parameters was 0.1 rather than 0.5. The simulations took ~4 hr for SSAL = 800 and ~6 days for SSAL = 8000 (which has an IS size five times larger).


 
View this table:
In this window
In a new window

 
Table 1. Combinations of numbers of loci, numbers of alleles at each locus, and sample size used

Examples of the joint posterior distributions for NA and N0 are shown in Fig 5. The posterior distributions are illustrated using highest posterior density (HPD) limits (as in BEAUMONT 1999 Down). These are obtained from the simulations of growing and declining populations with SSAL = 800 and SSAL = 8000 (see Table 1). In each case, of the five replicate simulations, that where the mode for NA and N0 is the median distance away from the true value was chosen to be illustrated. It can be seen that there is a tendency for the larger population size to be most poorly estimated, with substantial skew in the posterior density. In addition, there is a tendency (observed generally, as well as in the simulations that are illustrated) for the current population size to be well estimated by the joint mode (25 modes higher than true value and 35 lower out of 60 simulations) and the ancestral population size to be generally overestimated by the modes (44 modes higher and 16 lower).



View larger version (25K):
In this window
In a new window
Download PPT slide
 
Figure 5. Posterior distribution of NA and N0 for four different simulated data sets. The contour levels correspond to the 90, 50, and 10% HPD limits. The line where NA = N0 is shown. The values of NA and N0 used in the simulation are shown as a cross. (a) SSAL = 800, NA = 20, N0 = 200; (b) SSAL = 8000, NA = 20, N0 = 200; (c) SSAL = 800, NA = 200, N0 = 20; (d) SSAL = 8000, NA = 200, N0 = 20.

Using the mode from the joint posterior distribution as an estimator for NA and N0, the square root of the relative square error [defined as ( - )2/N2A + (N0- N0)2/N20], referred to here as the "relative error," was calculated for each simulation and is shown plotted against SSAL in Fig 6, a and b. Each point in the figure is the relative error for a data set plotted against the SSAL value given in Table 1. Although there is substantial variability a general trend toward a reduction of the relative error with increasing SSAL can be seen. Given the variability of the results, the logarithm of the relative errors was analyzed using a linear model. In the model growth/decline was specified as a factor and the covariates (log transformed) were number of loci, number of independent alleles at each locus (k - 1), and sample size. The coefficients and standard errors were 4.96 (1.30), -0.629 (0.217), -0.517 (0.325), -0.794 (0.314), and -0.686 (0.164) for the intercept, effect of growth, number of loci, k - 1, and sample size, respectively. The effect of growth/decline was significant at P = 0.005, and the effect of the three covariates was significant at P = 0.0001. The residuals from this model were roughly normal with no obvious heteroscedasticity, although the limit on the relative errors arising from the rectangular priors has some effect on the residuals. A model with the coefficients for the three covariates forced to be -1 did not fit significantly less well than a model where the three covariates were free to vary (P = 0.21). This simple model gives the equations Relative Error = 2128/SSAL for a declining population and Relative Error = 1135/SSAL for a growing population. The fitted values from this model are shown in Fig 6. Thus the main conclusions of this analysis are: (a) there is, at least at the level of precision in this simulation study, an equivalence between the number of independent alleles, number of loci, and sample size; and (b) for the same value of SSAL the relative error is almost twice as large in a declining population in comparison with a growing population.



View larger version (14K):
In this window
In a new window
Download PPT slide
 
Figure 6. A plot of the relative error in estimation of ancestral and current population size against SSAL, which is a summary of the number of loci, number of alleles, and sample size. Simulations with different configurations in Table 1 but the same SSAL have been shifted slightly so that those with a higher number of loci are on the right. The fitted line is obtained using the model described in the text. (a) Growing population; (b) declining population.

The Bayes factor favoring a model of population growth vs. decline was calculated for each MCMC simulation as the proportion of MCMC iterations where N0 > NA divided by the proportion of iterations where N0 < NA (each model has equal prior probability). The Bayes factor gives the relative likelihood of one model over the other (GELMAN et al. 1995 Down). The logarithm of the Bayes factor is plotted against SSAL in Fig 7, a and b. Each point in the figure is the logarithm of the Bayes factor for a data set plotted against the SSAL value given in Table 1. Although some of the simulations with SSAL <= 1600 have |log(Bayes factor)| < 2 (i.e., would be judged to be nonsignificant by conventional criteria), the great majority of results very strongly support the model under which they were generated. It should be noted that the Bayes factor is sensitive to the priors chosen, and this is discussed in more detail in the context of the example data sets analyzed below.



View larger version (12K):
In this window
In a new window
Download PPT slide
 
Figure 7. A plot of the logarithm of the Bayes factor in supporting a model of population growth against SSAL. Other details are as in Fig 6.

The bias in the joint estimation of NA and N0 apparent in Fig 5 does not appear to be caused by any systematic error in the estimation procedure, as judged by an examination of the coverage properties of the posterior distributions. The critical HPD P values corresponding to the true NA and N0 were estimated for each data set and plotted against the SSAL values given in Table 1 and Fig 8, a and b. If the posterior distribution was the same as the repeated sampling distribution the HPD P values should be uniformly distributed, irrespective of treatment. This will be true asymptotically when the posterior distribution approximates a multivariate normal. It can be seen that the estimated critical P values are broadly uniformly distributed, which is what would be expected under asymptotic theory. A Kolmogorov-Smirnoff one-sample test on the 60 P values shows no departure from a uniform (P = 0.93). There is no trend toward small P values with increasing SSAL, which would be expected if there was an error in the estimation procedure. The estimated critical P values for the examples in Fig 5, a–d, are, respectively, 0.57, 0.29, 0.11, and 0.44. Thus, overall, the method appears to estimate changes in effective population size satisfactorily, but it is preferable to present results for the full posterior distribution rather than rely on the mode as a point estimate.



View larger version (13K):
In this window
In a new window
Download PPT slide
 
Figure 8. A plot of the critical HPD P value of the true NA and N0 against SSAL. Other details are as in Fig 6.


*  ANALYSIS OF EXAMPLE DATA SETS
*TOP
*ABSTRACT
*IMPLEMENTATION OF MARKOV CHAIN...
*INFERENCE IN THE TEMPORAL...
*SIMULATION TESTS
*ANALYSIS OF EXAMPLE DATA...
*DISCUSSION
*APPENDIX
*LITERATURE CITED

To illustrate the behavior of the method on real data sets, three examples have been chosen: data from a population of Drosophila subobscura surveyed by BEGON et al. 1980 Down, data from a population of northern pike (Esox lucius) surveyed by MILLER and KAPUSCINSKI 1997 Down, and data from the Mauritius kestrel surveyed by GROOMBRIDGE et al. 2000 Down.

Drosophila:
The data were sampled from a population on Mount Parnes, 40 km north of Athens. The flies were genotyped for nine allozyme loci. The study site occupied ~20,000 m2 of fir woodland at an elevation of 900 m. BEGON et al. 1980 Down estimated the total suitable habitat to extend at least 107 m2. Thus the population is clearly open, vitiating one of the assumptions of the temporal method. Samples were taken in September 1975 (190 individuals), September 1976 (250 individuals), and May 1977 (335 individuals). BEGON et al. 1980 Down estimated these corresponded to sampling intervals of nine and two generations, respectively. Using mark-release-recapture methods they estimated the census size in their study area to be ~150,000 individuals. These data have also been analyzed by ANDERSON et al. 2000 Down, who noted that the frequencies at one locus (Pgm) appeared to be misreported in BEGON et al. 1980 Down, and used only eight loci. For comparison, these same eight loci are analyzed here (input file kindly provided by Eric Anderson). The number of alleles at each locus varied from three to six. A rectangular prior of (0, 5000) was chosen for both NA and N0. The joint posterior distribution for NA and N0 is shown in Fig 9.



View larger version (18K):
In this window
In a new window
Download PPT slide
 
Figure 9. Posterior distribution of N0 and NA for the fly data of BEGON et al. 1980 Down. The contour levels are at the 0.1, 0.5, and 0.9 HPD limits, as in Fig 5.

In addition, a separate analysis was carried out with Ne = NA = N0 to compare with the results obtained by ANDERSON et al. 2000 Down, who obtained a maximum-likelihood estimate for Ne of 500 with support limits (log-likelihood 2 units less than the maximum) of 250–975. The posterior distribution obtained with the method described here should be directly comparable with the likelihood curve estimated by ANDERSON et al. 2000 Down because the limits of the rectangular priors, (0, 5000), are substantially wider than the posterior distribution obtained. The trace for Ne is illustrated in Fig 10 (with the initial 100 points discarded). The time taken to obtain 10,000 points on a 500-Mhz Pentium was 27 hr, although it can be seen from Fig 10 that good estimates of the posterior distribution can be obtained with substantially fewer points.



View larger version (43K):
In this window
In a new window
Download PPT slide
 
Figure 10. Trace of the realized values of Ne during an MCMC run with the fly data of BEGON et al. 1980 Down. The initial 100 points have been discarded.

The mode of the posterior distribution is 449 with support limits of 253–925 (in this case the support limit corresponds to the 0.922 HPD limit), which is very similar to the result of ANDERSON et al. 2000 Down. Interestingly, as noted by Anderson et al. this result is very different from that obtained by POLLAK 1983 Down, who obtained estimates of 253 (±115) for the first interval and 244 (±123) for the second and an overall estimate of 251 (±115) for both intervals. The reason for the discrepancy between the results from the two likelihood-based approaches and that from the moment-based approach of POLLAK 1983 Down is unclear (see ANDERSON et al. 2000 Down for discussion), although it is possible that omitting Pgm has some effect on the results.

The results for the varying population model are more in line with those of BEGON et al. 1980 Down, who used the original method of KRIMBAS and TSAKAS 1971 Down, and estimated Ne at 268 (±73) for the first interval and {infty} for the second. In Fig 9 it can be seen that there is very little evidence of a change in population size. The joint mode is at NA = 337 and N0 = 890. The line of equal population sizes is well within the 90% HPD limits. The Bayes factor in favor of growth is 5.4. The marginal modes and HPD limits are 196 (57–913) for NA and 726 (112–4138) for N0. As noted above, the Bayes factor is sensitive to the priors chosen. Thus, for example, widening the rectangular bounds equally for both NA and N0 will tend to increase the Bayes factor, and narrowing them will decrease it. In general the tendency for the posterior distributions to reach an asymptote for large NA or N0 will cause sample size to affect the inferences. Considering three samples, as here, if, for example, the most recent sample is smaller than the oldest sample there will be greater uncertainty in N0 and the posterior distributions may be more likely to asymptote, and therefore there will be a tendency to suggest population growth, even if there is none. In fact, for the fly data, it is the oldest sample that is the smallest, and therefore this argument does not explain the broad posterior distribution for N0.

Northern pike:
Fish scale samples taken in 1961, 1977, and 1993 from Lake Escanaba, Wisconsin, were chosen from a collection of scales kept by the Wisconsin Department of Natural Resources and were genotyped at seven microsatellite loci. Five of these loci were biallelic and the remaining two were triallelic. The allele frequency counts used in the following analysis were obtained from the relat