- THIS ARTICLE
-
Abstract
- Full Text (PDF)
- Alert me when this article is cited
- Alert me if a correction is posted
- SERVICES
- Similar articles in this journal
- Similar articles in PubMed
- Alert me to new issues of the journal
- Download to citation manager
- Reprints & Permissions
- CITING ARTICLES
- Citing Articles via HighWire
- Citing Articles via Google Scholar
- GOOGLE SCHOLAR
- Articles by Wang, J.
- Articles by Whitlock, M. C.
- Search for Related Content
- PUBMED
- PubMed Citation
- Articles by Wang, J.
- Articles by Whitlock, M. C.
Estimating Effective Population Size and Migration Rates From Genetic Samples Over Space and Time
Jinliang Wanga and Michael C. Whitlockba Institute of Zoology, Zoological Society of London, London NW1 4RY, United Kingdom
b Department of Zoology, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
Corresponding author: Jinliang Wang, Zoological Society of London, Regent's Park, London NW1 4RY, United Kingdom., jinliang.wang{at}ioz.ac.uk (E-mail)
Communicating editor: J. B. WALSH
| ABSTRACT |
|---|
In the past, moment and likelihood methods have been developed to estimate the effective population size (Ne) on the basis of the observed changes of marker allele frequencies over time, and these have been applied to a large variety of species and populations. Such methods invariably make the critical assumption of a single isolated population receiving no immigrants over the study interval. For most populations in the real world, however, migration is not negligible and can substantially bias estimates of Ne if it is not accounted for. Here we extend previous moment and maximum-likelihood methods to allow the joint estimation of Ne and migration rate (m) using genetic samples over space and time. It is shown that, compared to genetic drift acting alone, migration results in changes in allele frequency that are greater in the short term and smaller in the long term, leading to under- and overestimation of Ne, respectively, if it is ignored. Extensive simulations are run to evaluate the newly developed moment and likelihood methods, which yield generally satisfactory estimates of both Ne and m for populations with widely different effective sizes and migration rates and patterns, given a reasonably large sample size and number of markers.
ONE of the major applications of population genetics theory has been in the estimation of genetic and demographic properties of populations. Quantities such as the mutation rate, the migration rate, and the recombination rate are all in principle estimable from population genetic data, at least as products with the effective population size.
The variance effective population size (Ne) is an important quantity in evolutionary biology, as a description of the rate at which genetic variance changes due to genetic drift. A variety of methods have been developed to estimate Ne (![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
A key assumption made by all these approaches, however, is that the only factor involved in allele frequency change is genetic drift. All systematic forces (selection, mutation, and migration) are assumed to be absent. In our view, it is often legitimate to ignore the effects of mutation on the timescale involved in most empirical studies. It also may be reasonable to neglect the effects of selection because direct selection on most markers may be unlikely to be strong enough to cause substantial changes in their frequencies. (The effects of indirect selection acting on other linked loci on the allele frequency change of the marker locus may arguably be included as a "drift" effect and legitimately included in a Ne term.) However, for most populations the effects of migration are not negligible. A major weakness of these temporal methods is that they ignore the effects of migration, which can substantially bias estimates of Ne, either upward or downward (see below). The purpose of this article is to include the effects of migration in the estimation of demographic parameters from temporal genetic data.
Migration can affect the estimation of Ne in two ways. In the short term, migration can change allele frequencies quite quickly. Given that the temporal methods estimate Ne by measuring the pace at which allele frequencies change, migration causes a population to behave as if strong drift was changing allele frequencies very rapidly, resulting in an estimate of Ne that is too low. In the longer term, the problem is reversed. Constant migration and drift would cause the populations to approach an equilibrium level of genetic differentiation and the rate of change of allele frequencies in a deme to slow down to approach that predicted by the effective size of the whole metapopulation. Thus the effective population size of the deme would be substantially overestimated in the long term. These effects are discussed later in this article in more mathematical detail.
This problem can be avoided by simultaneously estimating Ne and the migration rate, which is the approach taken in this article. It turns out that while both migration and drift affect allele frequency changes in a population, they do so differently in terms of the direction and magnitude of these changes. Under migration, the allele frequency of a focal population would tend to become increasingly similar to that of the source population. Under drift, however, the allele frequency of a focal population changes purely at random, irrespective of that of the source population. Furthermore, the magnitude of allele frequency change of a focal population depends on the allele frequency difference between the focal and source populations under migration. Migration has no effect on alleles that happen to be at the same frequencies both in the deme in question and in the populations providing those migrants, but migration is expected to change allele frequencies substantially in cases when the source and recipient demes have very different allele frequencies. Genetic drift, on the other hand, is not affected by differences in allele frequencies among demes, but depends only on the local allele frequency. These differences in the patterns of changes of allele frequencies due to migration and drift allow us to estimate them separately, so long as enough data are from multiple loci. Fortunately, these kinds of data are often available with modern biochemical methods.
The estimation of migration rate and migrant number has a long history. The oldest and most widely used genetic method assumes that populations are at equilibrium in a system approximated by WRIGHT's (1931) island model and then uses estimates of Wright's FST to calculate 1/(4Ne m + 1), where m is the local immigration rate (see ![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
Here we extend previous temporal methods for estimating the Ne of a single isolated population so that they can be used to estimate Ne and m jointly for a metapopulation. We first develop an expectation based on a moment estimation approach, which allows an approximate description of the behavior of the estimator and the system. We follow this by developing a likelihood method based on that of ![]()
![]()
![]()
![]()
| DEFINITIONS AND ASSUMPTIONS |
|---|
We use temporal data to estimate the effective size (Ne) and immigration rate (m) of a partially isolated population. For brevity, we refer to the population for which parameters are being estimated as the focal population and that from which the migrants come as the source population. Data and parameters for the source population are given the subscript A, and those for the focal population, B. In the first section, we assume that the source population is infinitely large and therefore not changing genetically through the time frame of the study. In the section after that, we assume that both populations are finite and jointly estimate the properties of both. In the latter case, source and focal populations are obviously reciprocal. Throughout we assume diploidy. For data from haploid species or loci, the Ne estimated by these models would be double the actual effective size.
For the infinite-source population model, a minimum of three samples is required to estimate Ne and m of the focal population: two temporally spaced ones from the focal population and one taken at any time from the source. For the two small population models, a minimum of four samples is necessary, two temporally spaced ones taken concurrently from each population. Let us denote the size of these samples S, with subscripts A or B to describe whether the sample is from the source or focal population, respectively, and a subscript t to describe the generation in which the sample is taken. Thus SB,0 is the number of individuals sampled from the focal population in the initial generation. We use x to refer to the relative frequency of a particular allele in a given sample. Therefore x is an estimator of the true frequency (p) of that allele at that time in that population. For convenience, we define q = 1 - p. These quantities, x, p, and q, are subscripted in the same way as S.
Several summary statistics of these quantities turn out to be useful. We define
pAB
pA,0 - pB,0 to be the initial difference between the allele frequencies of the two populations and
pB,t
pB,t - pB,0 to be the change in allele frequency by generation t in deme B. In the derivations of the moment-based estimators, we also use
B,i to refer to the change in generation i due to genetic drift of the allele frequency in population B. Each of the quantities can refer to population A if the subscript B is replaced by A.
We make several assumptions throughout. These are that the alleles under study are not under direct selection, that mutation is a negligible source of allele frequency change, that potential sources of migrants to the focal populations can be identified a priori, that all samples are randomly taken from their populations, and that the sampling event does not change the available pool of reproductive individuals. The latter can be realized by sampling after reproduction, sampling with replacement, or sampling from a population much larger than the sample. This sampling scheme therefore corresponds to WAPLES's (1989) plan 1.
Table 1 summarizes most of the terms used in this article.
|
| IMMIGRATION FROM AN INFINITE-SOURCE POPULATION |
|---|
Theoretical background:
Imagine that the focal population is receiving migrants at rate m per generation from an infinite, unchanging source population. For effectively neutral alleles, changes in allele frequency over a short period of time are a function of the local effective population size and migration rate. Mutations are negligible compared to drift and migration because of their rare occurrence and the short time span. Under these conditions, and assuming that migration precedes population regulation, the allele frequency at generation i is given by
![]() |
(1) |
Note then that
![]() |
(2) |
Expanding across generations, we can find
![]() |
(3) |
so
![]() |
(4) |
Under the assumptions of Wright-Fisher sampling, the
terms are independently binomially distributed. The expected value of all of the
terms is zero, because drift does not change on average the allele frequency. The variance of each of these terms is given by pB,i(1 - pB,i)/Ne for haploids or pB,i(1 - pB,i)/2Ne for diploids. In reality, we do not know pB,i for any generation between samples, and we can only estimate pB,i for generations in which there was a sample. This uncertainty is the root of error in the estimation of statistics derived from these equations. In this article, we first estimate this error by using the approximate moments of the distribution of allele frequencies, and then we describe a likelihood approach to the estimation of Ne and m.
Moment approximations:
Ne and m affect the first and second moments of the change in allele frequency over time. Under certain circumstances, we can use these moments to estimate m and Ne. We do this in a two-step process, first estimating m from the mean change in allele frequencies at all loci and then using this value in the estimation of Ne from the second moment of allele frequency change.
Estimating the migration rate:
The change in allele frequency is given by Equation 4 above. Taking the expectation of
pB,t we find
![]() |
(5) |
because the expected values of all the
terms are zero. Assuming that t is known without error, we can therefore make a one-locus estimate of (1 - (1 - m)t) from
pB,t/
pAB.
To predict the optimal sampling strategy and to approximate standard errors, we estimate the error variance of an estimate of m. The expected error in estimating a ratio is an approximate function of the error variance of estimating the numerator and the denominator and the error covariance between the two (see ![]()
pB,t/
pAB. The error variances (Verr[ ]) and error covariance (COVerr) are given by
![]() |
(6) |
Note that the 2's in the denominators drop out when these equations are applied to haploids. The approximation in the first equation here comes from using the initial value of pq to predict the amount of variance in final allele frequency due to genetic drift.
Using these last equations and Equation A1.19b from ![]()
![]() |
(7) |
If the three sample sizes are roughly equivalent, then the error in estimating m is influenced more by the allele frequency in population B than by that in the source population, especially if the product mt is small. The terms involving only the sample sizes, m, and t are presumably constant across loci, so the error variance is largely proportional to 1/(
pAB)2. The minimum variance possible when combining L loci can be obtained by weighting each locus by the inverse of its expected error variance. The best estimator of m is given approximately by
![]() |
(8) |
where the weight for locus l is wl = (
xAB)2, Fl =
xB,t/
xAB, and W is the sum of wl across loci.
Estimating Ne:
Under the assumption that the total change in allele frequency per generation is not large (i.e., Ne is not very small and m is not very large), each of the terms in the equation describing allele frequency change is roughly independent of the others. The variance in allele frequency change at generation i due to genetic drift within the focal population is V[
B,i] = pB,iqB,i/2Ne,B. If the allele frequency is not changing rapidly, then we can assume that the value of pq is approximately the same each generation. A reasonable estimator of the mean value of pq is
(xB,0(1 - xB,0) + xB,t (1 - xB,t))/2. This is implicitly assuming that the allele frequency on average takes a linear path from the initial to the final value, but in any case is an ad hoc approximation. Then using Equation 4, we can find
![]() |
(9) |
where M = (1 - (1 - m)2t)/((2 - m)m). Again, the coefficient of Ne,B of 2 drops out when this formula is used for haploids. From Equation 9 and using the estimate of m, we obtain an estimator of 1/(2Ne,B),
![]() |
(10) |
where
= (1 - (1 -
)2t)/((2 -
)
).
In general we do not have exact information about the change in population allele frequencies, but only an estimate based on sample data. Because of sampling variance, E[
xB,t2] is biased upward as an estimator of
pB,t2:
![]() |
(11) |
Thus we can estimate
![]() |
(12) |
The error variance per locus for estimating 1/(2Ne) is approximately proportional to 1/
(calculations not shown), so a minimum-variance estimator of 1/(2Ne) can be obtained by weighting alleles by
approximately, estimated by (xB,0(1 - xB,0) + xB,t(1 - xB,t))/2.
Choosing the right value of t:
With a large separation in time between the first and second samples, estimates of m become problematic. The estimate of (1 - (1 - m)t) becomes more and more similar for different values of m as t gets larger (limt
[
(1 - (1 - m)t)/
m] = 0), yet the error variance in estimating this quantity increases linearly with t. Furthermore, as t increases, the allele frequency change approaches a distribution determined largely by the relative sizes of the effective size and m, rather than by the absolute values of either. When t is large enough that the populations reach an equilibrium between genetic drift and migration, then
0, irrespective of the actual true migration rate. This is clear by inspecting the single-locus estimator of m,
= 1 - t
. As t
,
xB,t no longer increases and
xB,t/
xAB is expected to reach a constant smaller than one with time, therefore resulting in
0.
If the time interval is too short and m too small, then there is less power to measure m because too few migration events may have occurred. Larger samples with more informative markers are therefore necessary to obtain a good estimate of m when (1 - m)t is close to one or zero.
Similarly, the estimation of Ne is also biased if the sampling interval is too long. As t
, we have
0 shown above and Equation 10 reduces to Est(1/2Ne,B) = E[
p2B,t]/(t
). If the source population is infinite in size, then E[
p2B,t] tends to a constant as t
, and therefore
e,B
. If the source population has a finite size, then E[
p2B,t] continues to increase as t
(so long as the alleles are not fixed in the population), but at a slower rate corresponding to the total size of the focal and source populations. In both cases, the estimate of Ne,B approaches the effective size of the whole species rather than that of the focal population. Again, there is less power for estimating Ne,B if t and S/(Ne,B) are too small. With a small quantity of tS/(Ne,B), the change in allele frequency due to drift would be overwhelmed by that from sampling.
As a numerical example, Fig 1A shows the average estimates of 1/(2Ne,B) and m over 10,000 replicates plotted against sampling interval t, and Fig 1B shows the coefficients of variation (CV) of the estimates when t = 112 where the estimation is unbiased. (Methods for these simulations are given in the Appendix.) The true values being estimated are 1/(2Ne,B) = 0.005 (Ne,B = 100) and m = 0.2 for the focal population, which is assumed to be in a metapopulation containing 10 other populations of the same size (Ne = 100) with an island migration model. The effective size of the source population is therefore
1000, sufficiently large although not infinite. As can be seen, estimates of both 1/(2Ne,B) and m are close to the true values when t is small, but approach the true values for the whole species, 0.00045 (Ne = 1112, calculated using the known parameters) and 0, with increasing t. The maximum number of generations allowed for valid joint estimates of 1/(2Ne,B) and m and the number of generations required to reach the equilibrium so that the species Ne is estimated depend on m, the effective sizes of the focal and source populations, and the initial genetic differentiation when the first sample was taken (see below).
|
For this specific numerical example, the estimation is unbiased when
t < 12. The precision of the estimators changes considerably over this short sampling period (Fig 1B). As expected, the best estimates are obtained at an intermediate value of t (t = 24 for the example). The optimal sampling interval depends, again, on m, the effective size of the focal population and the initial genetic differentiation when the first sample was taken. With decreasing m and/or increasing initial differentiation, for example, the optimal t is expected to increase.
In experimental design, it is difficult to determine beforehand the optimal interval between sampling for maximum power of the temporal method, because the power depends partially on the parameter being estimated (see e.g., Equation 7). Without any prior knowledge of the demographic history of the populations in question, it is reasonable to consider a short rather than a long sampling interval. While a too long sampling interval results in biased estimates of Ne and m, a too short sampling interval only decreases the precision of the estimates. With powerful statistical methods (e.g., maximum likelihood shown below) and large sample size and marker information, the estimation precision can be improved greatly even if t is as short as one generation.
The consequences of assuming that m = 0: Other methods using temporal data for estimating the effective size have assumed that the immigration rate is zero. We can use the equations above to investigate the effects of making this assumption when it is in fact false.
When m
0, the expected change in allele frequency is zero, M = t, and Equation 12 reduces to
![]() |
(13) |
which is similar to NEI and TAJIMA's (1981) estimator using Fc.
When m > 0, the variance in allele frequency change over the interval between samples is a function of both genetic drift and migration. This can be either greater or smaller than that expected by drift alone, depending on, among other parameters, t. In the short term, the change in allele frequency due to the joint effects of migration and drift will be larger than that expected by drift alone. If t = 1, for example, we have E[(pB,t - pB,0)2] =
pAB2m2 + pB,0(1 - pB,0)/2Ne,B. In this case, therefore, the squared change in allele frequency is increased relative to that due to drift only, which is pB,0(1 - pB,0)/2Ne,B. Falsely assuming that m = 0 would on average cause an overestimate of the amount of drift and an underestimate of Ne,B. In contrast, as the time between samples gets large, the change in allele frequency is constrained by migration to a term only somewhat larger than the squared initial difference in allele frequency between the source and focal populations: limt
E[(pB,t - pB,0)2] =
pAB2 + pB,0(1 - pB,0)/(Ne,Bm(2 - m)). With an increasing sampling interval (t), therefore, a false assumption of m = 0 would cause a downward bias in the estimation of the amount of genetic drift and an upward bias in estimating Ne. As the number of generations between samples continues to get larger, however, the estimate would approach the effective size of the whole species, rather than just the effective size of the focal population, no matter whether the assumption of m = 0 is made or not. These biases can be indefinitely large.
Fig 2 shows the biases of Ne estimation when a false assumption of m = 0 is made. The true parameter values are Ne = 100 and m = 0.1, and the populations are assumed to be either completely differentiated (no previous migration, case A) or at drift-migration equilibrium (case B, where Fst
0.02) when the initial sample is taken. Ignoring migration results in an overestimation of 1/(2Ne) when sampling interval (t) is small and underestimation when t is large. Compared with case A, the initial overestimation of 1/(2Ne) in case B from ignoring migration is much smaller, because the populations are more similar genetically (
pAB small) and therefore migrants have less effect on allele frequency changes. With a larger m and smaller Ne, the initial overestimation of 1/(2Ne) can be dramatic even when the populations are initially at equilibrium (data not shown). For both cases, the estimation is reasonably good for a large range of sampling interval when immigration is accounted for. However, the sampling interval cannot be too long. Otherwise, 1/(2Ne) is increasingly underestimated with t, approaching the true value for the whole species.
|
Likelihood model:
Let us first consider a single locus with two alleles, and we focus on a particular allele. For brevity in this subsection, subscript B is dropped for the focal population. To estimate m and Ne of the focal population jointly, at least two temporally spaced samples from the focal and a single sample from the source population (A) are necessary. Because the three sampling events are independent, the probability of getting a set of samples with allele frequencies denoted by x is
![]() |
(14) |
Each of these samples can be assumed to follow a binomial distribution (see ![]()

where S is the number of diploid individuals sampled. For haploid species, the coefficient 2 of S should be dropped.
The likelihood (L) of a particular set of values of Ne and m is the probability of observing the data given those values. We have, therefore,
![]() |
(15) |
As we can see, the key problem for the likelihood function is to calculate Pr[p0, pt, pA|Ne, m]. By the definition of conditional probability, we have
![]() |
(16) |
Determining the second term in Equation 16, Pr[p0, pA|Ne, m], is potentially quite tricky. If we assume, however, that we do not know anything about how close the focal population is to a migration-drift equilibrium nor about the history of these populations, then this can be given reasonably as a product of the uniform distributions of p0 and pA, Pr[p0, pA|Ne, m] = Pr[p0|Ne] x Pr[pA]. As ![]()
The first term in Equation 16, Pr[pt| p0, pA, Ne, m], can be calculated by using the transition matrix T (![]()

where p*j = (j/2Ne)(1 - m) + mpA and (i, j = 0, ... , 2Ne). This accounts for the deterministic change in allele frequency due to migration as well as the genetic sampling due to drift. Given the initial distribution of allele frequency of the focal population P0 [a column vector of 2Ne + 1 elements with values 1/(2Ne + 1) or 1/(2Ne - 1) as assumed above] and values of pA, Ne, and m, the distribution after t generations, Pt, is calculated by the recurrence equation
![]() |
(17) |
These are all the parts necessary to calculate the likelihood, above. For multiple independent, biallelic loci, the joint likelihood of Ne and m is calculated as the product of the likelihoods for each locus.
For loci with more than two alleles, the calculation of the likelihood function is problematic. The number of possible configurations, C = (2Ne + k - 1)!/(2Ne! (k - 1)!) for a diploid locus with k alleles (![]()
![]()
![]()
![]()
The likelihood method can also use more than two samples from the focal population simultaneously to obtain a mean Ne and an average m during the whole sampling period. If samples are taken from the focal population at generations t0, t1, t2, ... , tg, then the joint likelihood of Ne and m is
![]() |
(18) |
Similarly, a continuous growth model, such as exponential growth, could be fitted to the temporal data and joint estimates of initial effective size, growth rate, and immigration rate can be obtained (![]()
| TWO FINITE DEMES |
|---|
Up to this point, we have considered the demographic properties of a single deme that receives immigrants from an effectively infinite source, that is, a source so large that over the time course of the study it is not changing appreciably in allele frequency. While this is likely to be a reasonable approximation in many cases, there are examples in nature where only a few relatively small demes serve as the only source of migrants to each other. This case is more difficult, both analytically and computationally, but here we consider the case of two small demes as a beginning to such an analysis.
The allele frequency at a neutral locus will change over a single generation in the two populations according to
![]() |
(19) |
Moment estimator:
We can find a moment estimator for the two migration rates for this two-finite-deme case, if we consider the change in the difference in allele frequency between the demes over time. We consider the simple case in which the samples are separated by only a single generation, and so we drop the subscript notation for time. Taking the expectations of Equation 19 above and remembering that the expected values of
A and
B are zero, we find
![]() |
(20) |
The error variance in estimating migration rates would be proportional to 1/
p2AB, so we can find a low variance estimator of the migration rates by weighting information from each allele by
p2AB.
Finding an estimator of the effective size of each deme is also possible. The variance of the change in allele frequency over time due to drift for population A is
![]() |
(21) |
where p'A is the expected allele frequency after migration, which can be estimated from Equation 19 by setting
A = 0 and using the estimated migration rates and initial allele frequencies. Thus the effective size can be estimated in this case in a manner similar to the Waples technique, except instead of using the initial allele frequency, one uses the expected initial frequency after migration. Similar derivations for Ne,B can also be obtained.
For the case of t > 1, moment estimators for both m and Ne become much more complicated (especially the latter) and are not shown here.
Likelihood estimator:
The likelihood of the effective sizes (Ne,A, Ne,B) and immigration rates (mA, mB) of demes A and B given the temporal data (xA,0, xA,t, xB,0, xB,t) is
![]() |
(22) |
In Equation 22, each of the four samples follows a binomial distribution for Pr[x|p], the initial allele frequency of each population is assumed to be in a uniform distribution, and the allele frequency distribution of each population at generation t can be calculated by using the transition matrix.
In principle, the exact likelihood could be worked out for any set of the four parameter values, but in practice this would be computationally intensive for anything but small population sizes and small sampling interval (t). The total number of configuration combinations that need to be considered in Equation 22 is ((2Ne,A + 1) (2Ne,B + 1))t for a biallelic locus, which increases exponentially with sampling interval t. Therefore, we use the transition matrix method for the exact calculation of the likelihood for samples separated by only a single generation (t = 1) and use a Monte Carlo approach to approximate the likelihood for samples with t > 1.
Equation 22 could easily be extended to more than two samples from each deme with little increase in computational demands. It could also be extended to more than two demes, but the computational intensity increases prohibitively with the number of demes, because of the exponential increases in configuration combinations and the number of parameters for joint estimation.
| SIMULATION RESULTS |
|---|
Infinite source:
Methods for these simulations are summarized in the Appendix. For focal populations with wide ranges of effective size and immigration rates, the maximum-likelihood and moment estimates of 1/(2Ne) and m from simulations are summarized in Table 2. The sample sizes used in the simulations are large, but not unrealistic. Indeed, a large sample size is important for the temporal methods; otherwise, the apparent changes in allele frequency would be dominated by sampling error rather than by genetic drift. For most natural populations, the actual census population sizes could be orders of magnitude larger than the effective sizes (![]()
|
The maximum-likelihood estimator gives unbiased estimates of 1/(2Ne) in all cases considered. The coefficient of variation of the estimates is roughly one-quarter, with 95% of the estimates being in the range of
50 and 150% of the true values. The moment estimator underestimates 1/(2Ne) slightly when m is large and sample size is small relative to Ne. Increasing sample size can decrease the bias dramatically. For the case of Ne = 100 and m = 0.5 in Table 2, for example, the mean ± SD of the moment estimates of 1/(2Ne) improves to 0.9 ± 0.3 (relative to the true value) when the sample size is 500 individuals. The estimates from the moment estimator are generally more dispersed than those from the maximum-likelihood estimator.
The maximum-likelihood method overestimates m slightly only when m is small in very small populations (Ne = 10). This is expected because maximum-likelihood estimates have a lower bound at zero. Increasing sample size or marker information (number of loci and alleles) in a sample decreases the bias and increases precision. In contrast to 1/(2Ne), both the moment and the maximum-likelihood methods give similar estimates of m. However, some moment estimates of m are negative when m is small.
In general, both maximum-likelihood and moment methods give reasonable estimates of Ne and m. The accuracy and precision of the two estimators are similar for m, but the likelihood method is better for Ne. This is partially because moment methods must use the estimate of m in estimating Ne. Such a two-step estimation procedure is expected to be inferior to the joint estimation of m and Ne from the likelihood approach, because the estimation error of m is not accounted for in estimating Ne. Also, when m is large and therefore the changes of allele frequencies are rapid, the approximation to the mean value of pq in deriving the moment estimator of Ne may be no longer good enough.
The distribution of the estimates of Ne [or 1/(2Ne)] generally has a long tail, with a significant proportion of the estimates being much larger (smaller) than the true value. This is similar to the observation made on the case of a single isolated population (![]()
![]()
![]()
|
Typically, the likelihood surface is smooth and has a single maximum. This means that a single arbitrary starting point is sufficient to obtain maximum-likelihood estimates using Powell's quadratic convergence method (![]()
|
Multiple samples over time can increase the estimation precision for the average 1/(2Ne) and m during the entire sampling period. Table 3 summarizes the estimates of 1/(2Ne) and m, relative to the true parameter values of 1/(2Ne) = 0.01 and m = 0.2, from maximum-likelihood and moment estimators using two to six temporal samples from the focal population. As expected, increasing the number of samples has no influence on the accuracy, but increases the precision dramatically, as indicated by the decreasing SD and the width between the 2.5 and 97.5 percentiles. The likelihood method gives unbiased estimation of 1/(2Ne) and m, while the moment estimator results in underestimation of 1/(2Ne) and overestimation of m. The poor accuracy of the moment estimator is partially due to the highly polymorphic loci (10 alleles per locus) and small sample sizes, which lead to a high proportion of rare alleles in the samples (![]()
|
Quasi-infinite source, the island model and the stepping-stone model:
In reality, the source population is not infinite in size, although it could be very large. Moreover, there can be different numbers of source populations for a focal population with migration in different patterns. As long as we can identify the source populations a priori, however, the parameters for the focal population can be estimated approximately using the methods developed for the infinite-source model by treating the source populations collectively as the single "infinite" source. Here we explore numerical examples to show that the infinite-source model is very robust and works well for small source populations with various models of migration.
Consider a metapopulation consisting of 21 populations of equal effective size Ne. Each population receives, at each generation, a proportion of m migrants taken equally from each of the remaining 20 populations in the island model or from each of the two neighboring populations in the circular stepping-stone model. For the estimation of m and Ne of a focal population, we take two samples separated by two generations from the focal and each of four source populations (the island model) or each of the two neighboring populations (the stepping-stone model). Each sample had 50 individuals and two alleles per locus for Ne = 10 and 100 individuals and eight alleles per locus for Ne = 100. The estimates of 1/(2Ne) and m relative to their true values are summarized in Table 4 for maximum-likelihood and moment estimators. In all cases considered, maximum likelihood provided reasonable estimates of both 1/(2Ne) and m, while the moment estimator gives satisfactory estimation of 1/(2Ne) and m only when drift is not very weak relative to migration. Otherwise, 1/(2Ne) is underestimated, similar to the situation of an infinite large source population shown in Table 2.
|
We also tested the infinite-source model with the extreme case of two small populations of equal Ne and m, shown in Table 5. Although the source population (A) is small, the estimation of both 1/(2Ne) and m of the focal population (B) is still not bad, at least within a factor of two. The moment estimator gives similar results (not shown). Like the results shown above, the worst case occurs with an extremely high migration rate, where populations can quickly reach the drift-migration equilibrium. In this case, increasing sample size improves the estimation considerably. For the situation of Ne = 100 and m = 0.5 in Table 5, for example, the mean (relative) estimate ± SD is 0.969 ± 0.204 for 1/(2Ne) and 0.809 ± 0.204 for m when the sample size is increased to 400 individuals.
|
Two-deme models:
As a numerical example for the case of two demes with sampling interval t = 1, Fig 5 shows the frequency distributions of the estimates of Ne,A, mA, Ne,B, and mB. As can be seen, both estimators give reasonably good estimates of effective size and migration rate, the precision of the likelihood estimator being always higher than that of the moment estimator.
|
For the case of sampling interval t > 1, the moment estimator for Ne becomes quite complicated and the likelihood estimator becomes computationally intensive. In principle, however, both methods are not limited by the sampling interval, except for the case of very large t where populations are approaching an equilibrium between drift and migration (Fig 1 and Fig 2). At intermediate values of t, both methods should actually be more powerful than at t = 1 (Fig 1B). Because of the computational demands of the Monte Carlo approach, we have run only a few replicates for t > 1 for the likelihood method, yielding reasonable estimates of both effective size and migration rate, as expected (data not shown).
| APPLICATION TO A REAL DATA SET |
|---|
![]()
![]()
1.5 and 2 generations for ponds 1 and 2, respectively (![]()
e = (1.5/2)N'e and
= 1 - e(2/1.5)log(1-m'), respectively. Similar results were obtained by using a sampling interval of 1.
The relative profile log-likelihood curves for the effective size and migration rate of subpopulations in ponds 1 and 2 are shown in Fig 6. The maximum-likelihood estimates are Ne = 16.3 and m = 0.39 for pond 1 and Ne = 18.0 and m = 0.19 for pond 2. If isolation between ponds is assumed, the likelihood Ne estimates are 17.1 and 21.9 for ponds 1 and 2, respectively, both being somewhat larger than those allowing migration.
|
The estimated migration rates between ponds are high. Although newts are found to migrate as juveniles and adults during breeding seasons (![]()
1020. Interspecies introgression may also be partially responsible for the high migration rate estimates. On average
4% of adult newts from these ponds are F1-hybrids between T. cristatus and T. marmoratus (![]()
![]()
![]()
![]()
75 T. cristatus individuals.
| DISCUSSION |
|---|
The estimation of demographic properties of a species from its genetic characteristics is now an old topic. Many methods have been generated by statistical geneticists to estimate population sizes, migration rates, population growth rates, and many other interesting and useful ecological parameters. Some of these methods work quite well most of the time, and some are less successful. All share one property: they make simplifying assumptions about the possible range of demographies that can be considered, and the methods proposed here are no exception. It should be our goal, though



























