Genetics, Vol. 148, 1921-1930, April 1998, Copyright © 1998

Signatures of Population Expansion in Microsatellite Repeat Data

Marek Kimmel^a, Ranajit Chakraborty^b, J. Patrick King^a, Michael Bamshad^c, W. Scott Watkins^c, and Lynn B. Jorde^c
a Department of Statistics, Rice University, Houston, Texas 77251,
b Human Genetics Center, University of Texas Health Science Center, Houston, Texas 77225,
c Eccles Institute of Human Genetics, University of Utah Health Sciences Center, Salt Lake City, Utah 84112

Corresponding author: Ranajit Chakraborty, Human Genetics Center, University of Texas Health Science Center, P.O. Box 20334, Houston, TX 77225, rc{at}hgc9.sph.uth.tmc.edu (E-mail).

Communicating editor: N. TAKAHATA


*  ABSTRACT

To examine the signature of population expansion on genetic variability at microsatellite loci, we consider a population that evolves according to the time-continuous Moran model, with growing population size and mutations that follow a general asymmetric stepwise mutation model. We present calculations of expected allele-size variance and homozygosity at a locus in such a model for several variants of growth, including stepwise, exponential, and logistic growth. These calculations in particular prove that population bottleneck followed by growth in size causes an imbalance between allele size variance and heterozygosity, characterized by the variance being transiently higher than expected under equilibrium conditions. This effect is, in a sense, analogous to that demonstrated before for the infinite allele model, where the number of alleles transiently increases after a stepwise growth of population. We analyze a set of data on tetranucleotide repeats that reveals the imbalance expected under the assumption of bottleneck followed by population growth in two out of three major racial groups. The imbalance is strongest in Asians, intermediate in Europeans, and absent in Africans. This finding is consistent with previous findings by others concerning the population expansion of modern humans, with the bottleneck event being most ancient in Africans, most recent in Asians, and intermediate in Europeans. Nevertheless, the imbalance index alone cannot reliably estimate the time of initiation of population expansion.


TANDEM repeat loci, with repeat motifs 2–6 nucleotides long, called microsatellites (TAUTZ 1993), have been shown to be extremely helpful in evolutionary studies (CHAKRABORTY and JIN 1993A; BOWCOCK et al. 1994; DEKA et al. 1995), forensic identification of individuals (National Research Council 1996), determination of parentage and relatedness of individuals (CHAKRABORTY and JIN 1993B; PENA and CHAKRABORTY 1994), and mapping genes in the genome (COX-MATISE et al. 1994; HANIS et al. 1996). This is because of their abundant distribution in the genome (GYAPAY et al. 1994) and the ease and automation of typing (LIN et al. 1996). The relative efficiency of microsatellites in comparison to the classical genetic markers for all of the above applications arises mainly from their high heterozygosity (WEISSENBACH et al. 1992), as well as the ubiquity of polymorphism, even in inbred populations or species (GILBERT et al. 1990).

The use of microsatellite loci for evolutionary purposes, however, has been a subject of intense research in recent studies because the mechanisms that produce new variation at such loci are unusual in comparison to those of classical loci. While the exact mechanism of mutations at such loci is still not characterized at a molecular level (e.g., JEFFREYS et al. 1994), it is generally believed that the processes and the patterns of mutations at different tandem repeat loci may differ from locus to locus, depending on the motif as well as the size of alleles at each locus (WEBER 1990; WEBER and WONG 1993; JIN et al. 1996A; CHAKRABORTY et al. 1997). Empirical and theoretical studies indicate that for most microsatellite loci, mutations lead to stepwise changes of the repeat size of alleles, although the relative frequencies of mutations leading to expansion may not be equal to those of contraction of allele sizes (DI RIENZO et al. 1994; VALDES et al. 1993; SHRIVER et al. 1993; RUBINSZTEIN et al. 1995).

Therefore, we recently developed a general stepwise mutation model to study the population dynamics of microsatellite loci in which mutations may change the allele size in any arbitrary specified manner that is not necessarily symmetric (KIMMEL et al. 1996; KIMMEL and CHAKRABORTY 1996). Under such models, in accordance with the previous results of simple stepwise mutation models (MORAN 1975), even though the allele size distributions may be fluctuating, mutations and genetic drift will produce a stationary distribution of size differences among randomly chosen alleles from the population, and consequently, the population will have a steady-state value of homozygosity (heterozygosity) that is specified by a composite parameter, θ, the product of the effective size of the population and the rate of mutation at the locus (KIMMEL and CHAKRABORTY 1996).

In such formulations, it is assumed that the population maintains a constant effective size during evolution. In contrast, through the analysis of distributions of nucleotide differences in pairwise comparison of mitochondrial DNA sequences from human populations, ROGERS and HARPENDING (1992), HARPENDING et al. (1993), and ROGERS (1995) have concluded that most human populations have experienced recent expansions. Several authors, however, have argued that natural selection (DI RIENZO and WILSON 1991), high levels of homoplasy associated with hypervariable nucleotide sites (LUNDSTROM et al. 1992), and population structure (MARJORAM and DONNELLY 1994) may also mimic the signature of population expansion on the distribution of nucleotide differences in pairwise comparisons of mtDNA sequence data. More recently, BERTORELLE and SLATKIN (1995) showed that when recurrent mutations at the same site (a more realistic mutation model for the mtDNA sequence data) are considered, the observed number of segregating sites does not always support the population expansion theory from the analysis of the mtDNA sequence data. In other words, specific assumptions of a mutation model may differentially affect different measures of genetic variation, and thus, inference regarding population history from different measures of genetic variation may not always be the same. Thus, because the mutation model for microsatellite loci is different from that of nucleotide sequence variation, it is important to examine the signature of population expansion on the genetic variance at microsatellite loci and to evaluate the effect of population expansion on different measures (e.g., heterozygosity vs. variance of allele sizes) of variability at microsatellite loci.

The purpose of this research is to investigate such problems. Specifically, we present calculations of genetic variance (variance of allele sizes) and homozygosity (probability of size identity of alleles) at a microsatellite locus using a time-continuous Moran model (MORAN 1975) for several variants of population growth possibly preceded by a bottleneck. From the expected variance of allele sizes and homozygosity in the population, we show that if the population growth model is ignored and these population measures are used to estimate the equilibrium value of θ, the variance-based estimator deviates from that based on homozygosity.

To quantify this imbalance of variance- and homozygosity-based estimates of θ, we define their ratio as the imbalance index. Under the assumptions of our model, the parametric value of this imbalance index, β, when >1, is a signature of population expansion preceded by a bottleneck. Under different scenarios of population growth, we provide numerical calculations of such a ratio over time and apply the theory to data on 60 tetranucleotide loci surveyed in three major groups of human populations. Our results indicate that the tetranucleotide loci generally provide evidence of recent population expansion preceded by a bottleneck in all major human populations.


*  DYNAMICS OF MICROSATELLITE LOCI ACCORDING TO THE TIME-CONTINUOUS MORAN MODEL

Statistics used to describe a sample of alleles:
Consider a sample of n haploid individuals or chromosomes and a locus with a denumerable set of alleles indexed by integer numbers. The expectation of the estimator of the within-population component of genetic variance,

$$\frac{\hat{V}}{2} = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 \qquad (1)$$
where Xi is the size of the allele at the locus in the ith chromosome present and X̄ is the mean of the Xi, is equal to V(t)/2, where

$$V(t) = E\left[(X_i - X_j)^2\right] \qquad (2)$$
and Xi and Xj are the sizes of two alleles from the population (KIMMEL et al. 1996). Xi and Xj are time-dependent random variables, i.e., Xi = Xi(t) and Xj = Xj(t), but for notational simplicity, the argument t is frequently suppressed because the time dependence is always clear from the context.

If pk denotes the relative frequency of allele k in the sample, then an estimator of homozygosity has the form

$$\hat{P}_0 = \frac{n\sum_k p_k^2 - 1}{n-1} \qquad (3)$$

Note that the random variables Xi are not independent but only exchangeable. The expected value of P̂0, however, is the true homozygosity; i.e.,

$$E(\hat{P}_0) = P_0(t) = \Pr[X_i(t) = X_j(t)] \qquad (4)$$

The latter equation can be demonstrated by using the definition of pk as the fraction of chromosomes with allele of size k, i.e., pk = nk/n, and further representing nk as the sum of indicator variables δ_{kXi} (= 1 when Xi = k; and = 0 otherwise), i.e., nk = Σ_i δ_{kXi}, substituting into Equation 3, and taking expectation.
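As an illustration of the two sample statistics, the following minimal sketch computes them from a list of allele sizes. It assumes the usual unbiased forms, with n - 1 in the denominator for both the variance estimator and the homozygosity estimator; the function names and the example sample are hypothetical.

```python
from collections import Counter


def variance_estimator(sizes):
    """Sample variance of allele sizes (n - 1 denominator); its expectation is V(t)/2."""
    n = len(sizes)
    mean = sum(sizes) / n
    return sum((x - mean) ** 2 for x in sizes) / (n - 1)


def homozygosity_estimator(sizes):
    """Unbiased homozygosity estimator (n * sum_k p_k^2 - 1) / (n - 1)."""
    n = len(sizes)
    counts = Counter(sizes)  # n_k for every observed allele size k
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return (n * sum_p2 - 1) / (n - 1)


# Hypothetical sample of four chromosomes (allele sizes in repeat units).
sample = [10, 10, 11, 12]
print(variance_estimator(sample), homozygosity_estimator(sample))
```

With this form, the homozygosity estimator equals the fraction of distinct pairs of chromosomes that carry the same allele (one pair out of six in the example), which is why its expectation is the probability that two alleles are identical in size.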

The time-continuous Moran model:
We consider the evolution of joint distributions of allele sizes in a stepwise mutation model with sampling from the finite allele pool. We assume the following:

  1. The population is composed of a constant number of 2N haploid individuals. Each individual undergoes death/birth events according to a Poisson process with intensity 1 (mean length of life of each individual is equal to 1). Upon a death/birth event, a genotype for the individual is sampled with replacement from the 2N chromosomes present at this moment, including the chromosome of the just-deceased individual (time-continuous Moran model, EWENS 1979).

  2. Each individual is independently subjected to a mutation that replaces an allele of size X with an allele of size X + U, where U is an integer-valued random variable with probability generating function (pgf)

    $$\varphi(s) = E(s^U) = \sum_{u=-\infty}^{\infty} \Pr[U = u]\, s^u \qquad (5)$$

defined for s on the unit circle of the complex plane or in its neighborhood. Mutations occur according to a Poisson process with intensity v.
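The two assumptions can be sketched as an event-driven simulation. This is an illustrative sketch only: it assumes a monomorphic starting population and a symmetric single-step mutation (U = +1 or -1 with equal probability), which is one special case of the general U; the function name and parameter values are hypothetical.

```python
import random


def simulate_moran(two_n, nu, t_end, seed=0):
    """Simulate allele sizes of 2N haploid individuals up to time t_end.

    Each individual has death/birth events at rate 1 and mutations at rate nu,
    so the next event in the whole population arrives at rate 2N * (1 + nu).
    """
    rng = random.Random(seed)
    alleles = [0] * two_n  # assumption: monomorphic start, all alleles of size 0
    total_rate = two_n * (1.0 + nu)
    t = 0.0
    while True:
        t += rng.expovariate(total_rate)  # waiting time to the next event
        if t > t_end:
            return alleles
        i = rng.randrange(two_n)  # the individual affected by the event
        if rng.random() < 1.0 / (1.0 + nu):
            # Death/birth: resample from all 2N chromosomes, including i itself.
            alleles[i] = alleles[rng.randrange(two_n)]
        else:
            # Mutation: assumed symmetric single step.
            alleles[i] += rng.choice((-1, 1))


final = simulate_moran(two_n=100, nu=0.01, t_end=50.0, seed=1)
print(sorted(set(final)))
```

The population size stays at 2N by construction, because every death is paired with a birth.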

Suppose that we follow the evolution of the distribution of allele sizes X1(t) and X2(t) of two individuals in the population. We are interested in the distribution of the difference between these two allele sizes. The respective pgf is denoted as follows:

$$R(s,t) = E\left[s^{X_1(t) - X_2(t)}\right]$$

R(s,t) is a pgf of an integer-valued random variable. It is generally defined on the unit circle of the complex plane |s| = 1, or in its neighborhood. Consequently, it might be more appropriate to consider only R̃(φ,t) = R(e^{ιφ},t), φ ∈ (-∞, ∞), which is the characteristic function of the same random variable. For notational simplicity, however, it seems better to adhere to the pgf formalism and to use the characteristic function only when required.

In the next paragraphs, we consider the dynamics of R(s,t) when the population size is changing according to various patterns.

The assumptions above can be used to derive a differential equation for studying the dynamics of the function R(s,t) (our Equation 6 and Equation 17). We omit these calculations, however, in favor of a derivation based on the coalescent representation of the model. This has an advantage of proving that our calculations also are valid for a diffusion approximation of the Wright-Fisher model.

Stepwise change in population size and the disequilibrium index:
The ordinary differential equation that describes the dynamics of the pgf R(s,t) is given by

(6)
where Ṙ(s,t) is the derivative of R(s,t) with respect to t and ψ(s) = [φ(s) + φ(1/s)]/2 is the symmetrized version of the pgf φ(s) of U. This differential equation is analogous to the one used in the analysis of genetic variation at electrophoretically determined protein loci (WEHRHAHN 1975; CHAKRABORTY and NEI 1982; LI 1976) under the stepwise mutation model (SMM; OHTA and KIMURA 1973). In the present formulation, however, the distribution of allele size change caused by mutation (represented by the random variable U) can be general, multistep, and asymmetric.

A formal solution of this differential equation can be obtained,

(7)
where

(8)

For |s| = 1, the solution tends to the equilibrium value

as t → ∞.

The stepwise change of population size is described as

Under this condition, Equation 7 assumes the form

(9)

Based on Equation 9, it is possible to derive expressions for the genetic variance and homozygosity at a given repeat locus. The variance is equal to V(t)/2, where V(t) = E{[X1(t) - X2(t)]²} = ∂²R(s,t)/∂s² evaluated at s = 1, because E[X1(t) - X2(t)] = 0. Consequently,

(10)
,

in which ψ''(1) is the second derivative of ψ(s) evaluated at s = 1. V(t) clearly converges to V(∞) = 4vNψ''(1) = θψ''(1) as t → ∞. If the single-step SMM is assumed, i.e., if ψ(s) = (s + 1/s)/2 and consequently ψ''(1) = 1, we obtain

(11)
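The variance dynamics can be sketched directly from the differential equation for R(s,t). The form of that equation written below is an assumption, chosen to be consistent with the rest of the text: pairs of lineages coalesce at rate 1/(2N) (the coalescence intensity named in the APPENDIX), and each of the two lineages mutates at rate v. Under that assumption, differentiating twice at s = 1 gives a linear equation for V(t) whose limit agrees with V(∞) = 4vNψ''(1).

```latex
% Assumed form of the pgf dynamics for constant N:
\dot{R}(s,t) = -\left\{\frac{1}{2N} + 2v\,[1-\psi(s)]\right\} R(s,t) + \frac{1}{2N}

% Differentiate twice in s and set s = 1, using R(1,t) = 1, \psi(1) = 1,
% \psi'(1) = 0, and V(t) = \partial^2 R(s,t)/\partial s^2 |_{s=1}:
\dot{V}(t) = -\frac{1}{2N}\,V(t) + 2v\,\psi''(1)

% Solution of this linear equation:
V(t) = V(0)\,e^{-t/(2N)} + 4vN\,\psi''(1)\left[1 - e^{-t/(2N)}\right]

% Single-step SMM, \psi''(1) = 1 and \theta = 4vN:
V(t) = V(0)\,e^{-t/(2N)} + \theta\left[1 - e^{-t/(2N)}\right]
```

The relaxation time of the variance is 2N generations in this sketch, so after a large increase in N the variance approaches its new equilibrium very slowly.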

The expression for homozygosity requires evaluation of the zero-order (constant) term in the Laurent series expansion of R(s, t), i.e.,

$$P_0(t) = \frac{1}{2\pi\iota}\oint \frac{R(s,t)}{s}\,ds$$

with the integration path being a closed contour around the singularity at s = 0. It is convenient to choose the unit circle around the origin with the parameterization s = exp(ιφ). If the single-step SMM is assumed, i.e., if ψ(s) = (s + 1/s)/2, using the symmetry properties of the integrand, we obtain

(12)

As t → ∞, P0(t) converges to a limit value that can be explicitly written as

$$P_0(\infty) = (1 + 2\theta)^{-1/2} \qquad (13)$$

Equation 11 and Equation 13 provide two intuitive estimators of the composite parameter θ,

$$\hat{\theta}_V = \hat{V} \qquad (14)$$
called the (allele size) variance estimator of θ, and

$$\hat{\theta}_{P_0} = \frac{1}{2}\left(\frac{1}{\hat{P}_0^{2}} - 1\right) \qquad (15)$$
the homozygosity (heterozygosity) estimator of θ. At equilibrium,

which leads to a parametric definition of an index β(t), given by

$$\beta(t) = \frac{V(t)}{\frac{1}{2}\left[1/P_0(t)^{2} - 1\right]} \qquad (16)$$
which represents an imbalance (caused by population size changes) at a microsatellite locus.
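A minimal numerical sketch of the imbalance index for a stepwise size change from mutation-drift equilibrium under the single-step SMM follows. It rests on assumptions that are spelled out here rather than quoted from the equations: R(s,t) is taken to obey dR/dt = -{1/(2N) + 2v[1 - ψ(s)]}R + 1/(2N), the equilibrium homozygosity is taken as (1 + 2θ)^(-1/2), and the index is taken as the ratio of V(t) to [1/P0(t)² - 1]/2. The function name, the grid size, and the use of `nu` for the mutation rate v are illustrative.

```python
import numpy as np


def beta_stepwise(t, n0, n, nu, n_grid=1 << 14):
    """Imbalance index at time t after a step from size n0 to size n."""
    # Unit circle s = exp(i*phi); for the single-step SMM psi(s) = cos(phi).
    phi = 2.0 * np.pi * np.arange(n_grid) / n_grid
    a = 2.0 * nu * (1.0 - np.cos(phi))  # mutation term 2v[1 - psi(s)]
    b = 1.0 / (2.0 * n)                 # assumed coalescence rate after the step
    r0 = 1.0 / (1.0 + 4.0 * nu * n0 * (1.0 - np.cos(phi)))  # equilibrium at n0
    decay = np.exp(-(a + b) * t)
    r = r0 * decay + b / (a + b) * (1.0 - decay)  # solution of the linear ODE
    p0 = r.mean()  # constant Fourier coefficient = homozygosity P0(t)
    v = 4.0 * nu * n0 * np.exp(-b * t) + 4.0 * nu * n * (1.0 - np.exp(-b * t))
    return v / ((1.0 / p0**2 - 1.0) / 2.0)


# Parameter values used in the numerical examples of this paper.
for t in (0, 1000, 4800):
    print(t, beta_stepwise(t, n0=3254, n=547586, nu=5e-4))
```

Averaging over a uniform grid on the unit circle is used for the contour integral because the integrand is smooth and periodic, so the rule converges quickly even when θ is large.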

Arbitrary pattern of population size change:
Formal substitution of N(t) for N in Equation 6 yields

(17)
where

The solution obtained from the variation of constants is

(18)

As demonstrated in the APPENDIX, Equation 17 and Equation 18 can be obtained using the coalescent-based approach. As before, we derive expressions for variance and homozygosity,

(19)
and

(20)
where

and

If a mutation-drift equilibrium is assumed at time t = 0, we obtain

(21)
and V(0) = 4vN0ψ''(1). In this latter case,

(22)


*  NUMERICAL EXAMPLES

Modeling of imbalance index β(t) under different population growth patterns and initial conditions:
We modeled the imbalance index β(t), as defined in Equation 16, as a function of time (number of generations) for several patterns of population growth:

  1. Stepwise population growth: N(t) = N0, t = 0, and N(t) = N, t > 0.

  2. Exponential population growth: N(t) = N0 exp(αt), t ≥ 0, where the growth rate α = ln(N/N0)/T has been selected so that N(t) = N if t = T.

  3. Logistic population growth: N(t) = , t ≥ 0, where the growth rate α and the carrying capacity K have been selected so that N(t) = N if t = T, and N(t) = if t = .

Three types of initial conditions selected are as follows:

  1. Mutation-drift equilibrium: V(0) = 4vN0, R(s,0) = R(s,∞,N0).

  2. Initial population monomorphic: only a single allele present, hence V(0) = 0, R(s,0) = 1.

  3. Initial population carrying two alleles: uniform mixture of two alleles differing in size by k repeats, with respective frequencies p and q = 1 - p, hence V(0) = 2k²pq, R(s,0) = (1 - 2pq) + pq(s^k + s^-k).
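For the two-allele initial condition, the starting value of the imbalance index can be worked out by hand, as in the sketch below. It assumes that the index is the ratio of V to [1/P0² - 1]/2 (the single-step SMM relation between θ and homozygosity); the function name is hypothetical, and 0 < p < 1 is required so that the population is not monomorphic.

```python
def two_allele_initial(k, p):
    """Return V(0), P0(0), and the initial imbalance index for two alleles.

    The alleles differ by k repeats and have frequencies p and q = 1 - p.
    """
    q = 1.0 - p
    v0 = 2.0 * k**2 * p * q  # V(0) = E[(X1 - X2)^2] = 2 k^2 p q
    p0 = 1.0 - 2.0 * p * q   # constant term of R(s,0): both alleles identical
    theta_p0 = (1.0 / p0**2 - 1.0) / 2.0  # assumed homozygosity-based theta
    return v0, p0, v0 / theta_p0


print(two_allele_initial(k=5, p=0.5))
```

With k = 5 and p = q = 1/2, V(0) = 12.5 and P0(0) = 0.5, so the index starts well above 1: two alleles far apart in size inflate the variance much more than they reduce homozygosity.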

Finally, one more complex growth pattern was contemplated, with population initially of large size N00, dropping instantly to a smaller size N0, and then regrowing exponentially to a final size N, i.e.,

(23)
where α = ln(N/N0)/T has been selected so that N(t) = N if t = T. Technically, this variant can be computed for t > 0 as exponential growth starting from size N0 but from the equilibrium R(s, ∞, N00) corresponding to N00.
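The bottleneck variant can be sketched numerically by stepping the pgf forward in time with the population size held constant within each short step. As with any sketch of this kind, the underlying forms are assumptions: the single-step SMM, a coalescence rate of 1/(2N(t)), the equilibrium relation P0 = (1 + 2θ)^(-1/2), and the index taken as V(t) divided by [1/P0(t)² - 1]/2. The function name, step length `dt`, and grid size are illustrative.

```python
import numpy as np


def beta_bottleneck(t_end, nu, n00=40_000, n0=3_254, n=547_586,
                    t_growth=4_800, dt=1.0, n_grid=4096):
    """Imbalance index at t_end after a drop n00 -> n0 and exponential regrowth."""
    phi = 2.0 * np.pi * np.arange(n_grid) / n_grid
    a = 2.0 * nu * (1.0 - np.cos(phi))   # mutation term, single-step SMM
    alpha = np.log(n / n0) / t_growth    # growth rate so that N(t_growth) = n
    r = 1.0 / (1.0 + 4.0 * nu * n00 * (1.0 - np.cos(phi)))  # equilibrium at n00
    v = 4.0 * nu * n00
    for k in range(int(round(t_end / dt))):
        size = n0 * np.exp(alpha * (k + 0.5) * dt)  # N at the step midpoint
        b = 1.0 / (2.0 * size)                      # assumed coalescence rate
        decay = np.exp(-(a + b) * dt)
        r = r * decay + b / (a + b) * (1.0 - decay)  # exact step for constant N
        v = v * np.exp(-b * dt) + 4.0 * nu * size * (1.0 - np.exp(-b * dt))
    p0 = r.mean()  # homozygosity: constant Fourier coefficient of R
    return v / ((1.0 / p0**2 - 1.0) / 2.0)


for t in (0, 500, 2000):
    print(t, beta_bottleneck(t, nu=5e-4))
```

Treating N(t) as piecewise constant keeps every step an exact solution of a linear equation, so the only approximation is the step length relative to the growth rate.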

Population increase with parameters estimated from data on human populations:
We used the numerical values obtained by ROGERS and HARPENDING (1992), who fitted distributions of pairwise differences of numbers of segregating sites in mitochondrial DNA to the data of CANN et al. (1987). The second row of Table 1 in ROGERS and HARPENDING (1992) contains estimates concerning the world's population expansion. Correcting for the fact that ROGERS and HARPENDING (1992) considered only females while we consider both genders, i.e., multiplying all effective sizes by 2, we obtain expansion from N0 = 3,254 to N = 547,586 within 120,000 yr or T = 4,800 generations, assuming a generation time of roughly 25 yr. We combined these values with mutation rates v = 10⁻⁴ and 5 × 10⁻⁴, typical for microsatellite loci (WEBER and WONG 1993).
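The arithmetic behind these parameter values can be checked in a few lines. The helper name `theta` is hypothetical; it evaluates the composite parameter θ = 4Nv before and after the expansion for both mutation rates.

```python
def theta(n_eff, nu):
    """Composite parameter theta = 4 * N * v."""
    return 4.0 * nu * n_eff


generations = 120_000 // 25  # 120,000 yr at 25 yr per generation
print("T =", generations)
for nu in (1e-4, 5e-4):
    print(nu, theta(3_254, nu), theta(547_586, nu))
```

Under these values θ rises by the same factor as the population size, about 168-fold, whichever mutation rate is used.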


 

 
Table 1. Estimates of parameter θ and of disequilibrium index β based on data for three major human populations

Figure 1, a and b, presents the β(t) index values for the stepwise and exponential population growth, with equilibrium initial conditions. The index falls with time to values <1, the deviation increasing with the mutation rate v. The logistic growth (not shown) leads to an effect that is intermediate between those caused by the stepwise and exponential growth.



 
Figure 1. Values of the β(t) index for stepwise and exponential population growth corresponding to population expansion from N0 = 3,254 to N = 547,586, within 120,000 yr or T = 4,800 generations, with mutation rates v = 10⁻⁴ and 5 × 10⁻⁴. Equilibrium initial conditions: (a) stepwise growth and (b) exponential growth. Monomorphic initial conditions: (c) stepwise growth and (d) exponential growth.

Figure 1, c and d, presents the β(t) index values for stepwise and exponential population growth, with initial conditions corresponding to a monomorphic population. The index is initially close to 0, but then rapidly, within ~100 generations, increases to a value close to 1 and subsequently follows almost the same trajectory as in the case of equilibrium initial conditions.

Figure 2, a and b, presents the β(t) index values for stepwise and exponential population growth, with initial conditions corresponding to a mixture of two alleles with parameters k = 5, p = q = 1/2. An interesting effect is observed: The index is initially much greater than 1 but falls to values between 1 and 2. Higher mutation rates yield lower values of the index.



 
Figure 2. Values of the β(t) index for stepwise and exponential population growth, corresponding to population expansion from N0 = 3,254 to N = 547,586, within 120,000 yr or T = 4,800 generations, with mutation rates v = 10⁻⁴ and 5 × 10⁻⁴. Initial conditions corresponding to a mixture of two alleles, with parameters k = 5, p = q = 1/2. (a) Stepwise growth, (b) exponential growth.

Figure 3 presents the β(t) index values for the bottleneck patterns of Equation 23, with the prebottleneck population size N00 = 40,000, N0 = 3,254, and N = 547,586, as described above. Again, for an initial period, the index increases from 1 to values higher than 1, the increase being greater for greater mutation rates. After that initial period, an imbalance as in simple exponential growth is restored.



 
Figure 3. Values of the β(t) index for the bottleneck pattern of Equation 23 with the prebottleneck population size N00 = 40,000, N0 = 3,254, N = 547,586, and T = 4,800 generations, with mutation rates v = 10⁻⁴ and 5 × 10⁻⁴.

To examine the impact of the initial population size (N0) on the imbalance index β(t), in Figure 4, we present the values of β(t) as a function of t for three values of the initial population size: N0 = 10,000, 20,000, and 50,000. As expected, larger N0 diminishes the deviation of β(t) from 1. Nevertheless, the signature of expansion [namely, β(t) < 1] is present for all initial sizes and for both models of population growth (stepwise or exponential). Similar sensitivity studies demonstrate robustness of the bottleneck pattern of Equation 23.



 
Figure 4. Values of the β(t) index for stepwise (a) and exponential (b) population growth, corresponding to population expansion from equilibrium condition with N0 (= 10,000, 20,000, and 50,000) to N = 547,586, within 120,000 yr or T = 4,800 generations, with mutation rate v = 5 × 10⁻⁴.

In summary, if before expansion the population is at a mutation-drift equilibrium, the imbalance index deviates downwards from 1 [i.e., β(t) < 1]. In contrast, if the population experiences a bottleneck preceding expansion, there will be a long (e.g., several thousand generations) transient time period during which β(t) > 1 before showing the signature of expansion alone [β(t) < 1]. Figure 1, c and d, shows an obvious exception to this general rule, when the bottleneck is severe enough to make the population monomorphic before expansion, in which case β(t) < 1 for all times.


*  ANALYSIS OF DATA ON TETRANUCLEOTIDE LOCI

JORDE et al. (1995, 1997) recently analyzed allele frequency distributions at 60 tetranucleotide loci in a worldwide survey of human populations. These authors also describe the details of the loci surveyed, as well as the various characteristics of the allele frequency distributions at these loci. In this section, we investigate whether there is any imbalance between allele size variances and heterozygosity (homozygosity) observed in these data, as analyzed by the imbalance index β(t) defined above. The purpose is to examine whether such an imbalance, if it exists, is in accordance with the population expansion model of human populations suggested from the analysis of mtDNA variation reported by ROGERS et al. (1992).

Three major groups of population, Asians, Africans, and Europeans, are considered for this purpose. For each population, the allele size variance and homozygosity at each locus were calculated from the distributions of allele frequencies within each of these population groups. Estimators V̂/2 and P̂0 in Equation 1 and Equation 3, respectively, averaged over the 60 loci were used for these computations for the respective parameters. The variance estimator θ̂_V is obtained by equating θ = V̂, while the homozygosity estimator θ̂_P0 is obtained by equating θ = [(1/P̂0²) - 1]/2.

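Finally, the estimator used has the form

$$\ln\hat{\beta} = \ln\hat{\theta}_V - \ln\hat{\theta}_{P_0} = \ln\hat{V} - \ln\left\{\frac{1}{2}\left[\frac{1}{\hat{P}_0^{2}} - 1\right]\right\}$$

where V̂ and P̂0 are estimates averaged over 60 loci.

Simulation studies were carried out to determine the statistical properties of the estimator ln β̂ = ln θ̂_V - ln θ̂_P0 under the null hypothesis of constant population size and mutation-drift equilibrium.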

Figure 5 depicts histograms of ln β̂ based on coalescent simulations with different values of θ = 4Nv. The estimator has an almost symmetric distribution centered around 0. For example, for θ = 10, the 0.05 and 0.95 quantiles of the empirical distribution of ln β̂ are q0.05 = -0.24 and q0.95 = 0.21, respectively.
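A null simulation of this kind can be sketched as follows. This is an illustration under stated assumptions, not the code used for Figure 5: the sample size of 40 chromosomes per locus is hypothetical (the sample size is not given here), mutation is the symmetric single-step SMM, time is measured in units of 2N generations so that k lineages coalesce at rate k(k - 1)/2 and each lineage mutates at rate θ/2, and the estimator is taken as ln V̂ minus ln{[1/P̂0² - 1]/2} with V̂ and P̂0 averaged over loci. All function names are illustrative.

```python
import numpy as np


def simulate_locus(n, theta, rng):
    """Allele sizes of n chromosomes at one locus under the neutral coalescent.

    The genealogy is built from the root: while k lineages exist, each one
    accumulates +1/-1 mutations, then a randomly chosen lineage splits in two.
    """
    sizes = np.zeros(2, dtype=np.int64)
    for k in range(2, n + 1):
        t_k = rng.exponential(2.0 / (k * (k - 1)))  # mean 1 / [k(k-1)/2]
        m = rng.poisson(theta * t_k / 2.0, size=k)  # mutations per lineage
        sizes = sizes + 2 * rng.binomial(m, 0.5) - m  # net displacement
        if k < n:
            sizes = np.append(sizes, sizes[rng.integers(k)])  # lineage split
    return sizes


def ln_beta_hat(n, theta, n_loci, rng):
    """ln(beta-hat) from variance and homozygosity averaged over loci."""
    v_hats, p_hats = [], []
    for _ in range(n_loci):
        x = simulate_locus(n, theta, rng)
        v_hats.append(2.0 * x.var(ddof=1))  # estimates V = E[(Xi - Xj)^2]
        _, counts = np.unique(x, return_counts=True)
        p = counts / n
        p_hats.append((n * np.sum(p**2) - 1.0) / (n - 1.0))
    v_bar, p_bar = np.mean(v_hats), np.mean(p_hats)
    return np.log(v_bar) - np.log((1.0 / p_bar**2 - 1.0) / 2.0)


rng = np.random.default_rng(1)
replicates = [ln_beta_hat(n=40, theta=10.0, n_loci=60, rng=rng) for _ in range(20)]
print(np.quantile(replicates, [0.05, 0.5, 0.95]))
```

Repeating the last step with several hundred replicates per value of θ would give empirical quantiles against which a data-based estimate can be compared.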



 
Figure 5. Empirical distribution of ln β̂, from coalescent-based simulations, under the null hypothesis of constant population size and mutation-drift equilibrium and under the single-step stepwise mutation model. Estimates of ln β are based on averages of variance and homozygosity over 60 loci. Five hundred simulations were run for each assumed value of parameter θ.

Table 1 contains the values estimated from the data on three major groups of populations. The values of ln β̂ for Asians, Europeans, and Africans are equal to 0.60, 0.29, and 0.11, respectively.

Figure 6 depicts a comparison of the sample values of ln β̂ with the simulation-based quantiles (with 500 replications of coalescent simulations of 60 loci each) of the distribution of ln β̂ under the null hypothesis of constant population size and mutation-drift equilibrium. The value for Asians exceeds the 0.99 quantile. The value for Europeans is located between the 0.95 and the 0.99 quantiles. The value for Africans, residing around the 0.70 quantile, is not significantly different from 0.



 
Figure 6. Continuous lines: Simulation-based 0.01, 0.05, 0.95, and 0.99 quantiles of the distribution of ln β̂, under the hypothesis of constant size and mutation-drift equilibrium and plotted against the assumed values of parameter θ. Symbols: Data-based estimates of ln β̂ for the three major human populations (△, Asians; □, Europeans; ○, Africans) plotted against estimates of θ based on variance (solid symbols) and on homozygosity (empty symbols).

The behavior of ln β̂ obtained from the data is consistent with the growth scenarios depicted in Figure 2 and Figure 3, i.e., β̂ > 1 or ln β̂ > 0. Both of these scenarios assume a reduced diversity of the population at the time when population expansion begins (t = 0), representing the consequences of a pre-expansion bottleneck.

The gradation of sample values of ln β̂ is consistent with the bottleneck being most ancient in Africans, most recent in Asians, and of intermediate age in Europeans.

In general, this is in agreement with a population growth scenario with pre-expansion and the present effective sizes, as estimated by ROGERS et al. (1992), although these authors do not explicitly model a bottleneck. Of course, from β indices alone, the exact pattern of population growth (stepwise vs. logistic or exponential) or the time of initiation of the expansion cannot be predicted reliably.

Another technical remark concerns alternative estimators of ln β. For example, if (ln β̂)i = (ln θ̂_V)i - (ln θ̂_P0)i is calculated for each individual locus and these individual estimators are averaged, one obtains an estimator that is seriously downward biased, although it has a lower variance than the one we used (based on simulations, not shown). For our purposes, it is more appropriate to have a less biased estimator. Furthermore, the estimator we used also has a lower mean square error than the one mentioned above.


*  DISCUSSION

Our theory indicates that population expansion leaves a strong signature on allele size distributions, and the signature is specific for different major human populations. The departure from the equilibrium value of ln β̂ is strongest in Asians, weakest in Africans, and intermediate in Europeans. This can be translated into the bottleneck being most ancient in Africans, least ancient in Asians, and of intermediate age in Europeans. This, in turn, is consistent with a scenario in which a small subpopulation emerges from Africa and moves via Europe to Asia, with some of its descendants settling en route and expanding, possibly replacing the preexisting populations.

Before considering the implications of these findings, recall that any signature of past population size changes through the imbalance index β requires unbiased estimation of the index. We adopted the estimation procedure where ln θ_V and ln θ_P0 were estimated from average (over loci) estimates of V and P0 to obtain ln β̂ = ln θ̂_V - ln θ̂_P0. While, in theory, locus-specific estimates of ln β can be obtained, our simulations (not shown) indicate that β̂, estimated in this fashion, is severely biased downwards (i.e., in the direction β < 1), even when population size is constant and the population remains in mutation-drift equilibrium throughout time.

The theory described above also indicates that the deviation from β = 1 is of a qualitatively different pattern for different scenarios of past population size changes. For example, a population at a mutation-drift equilibrium, when it suddenly or gradually increases in size, will produce β < 1, while if it experiences a bottleneck followed by expansion, it will produce β transiently >1 and subsequently falling <1. With realistic values of parameters (Figure 3), the transient values of β > 1 can persist for several thousand generations. These patterns, which are due to fluctuations of population sizes, are valid for a general stepwise mutation model. Because any general form of ψ(s) (Equation 22 and Equation 20) can yield β ≠ 1, we argue that the specificity of mutation pattern is not the critical determinant of the signature of population expansion preceded by bottlenecks at different time points, as noted in the present work.

The implications of our findings are worth discussing. First, expansion of population size, preceded by bottleneck events that appear to have occurred at different points in time for the three major human populations, is consistent with a replacement model (STRINGER 1989) of the origin of modern humans, while it tends to argue against the multiregional model (WEIDENREICH 1939), which maintains that humans probably have not experienced a major bottleneck. We also argue that should the recent expansion of size apply to most human populations, evolutionary inference based on summary statistics of microsatellite variation should be viewed with caution. For example, when genetic distances are based on indices related to heterozygosity alone (as in the case of Nei's distance Da; NEI et al. 1983), the branch lengths and topology may be grossly misspecified. The same applies to allele size variance–based measures of genetic distance (GOLDSTEIN et al. 1995; SLATKIN 1995; KIMMEL et al. 1996).

Second, deviation from mutation-drift equilibrium is not necessarily an indicator of selective forces operating on the microsatellites. Demographic history of populations, as shown in our analysis, can produce deviation that cannot always be distinguished from certain types of selection (see BERTORELLE and SLATKIN 1995).

Third, note that the present analysis indicates that the within-population variance of allele size is different from its mutation-drift equilibrium value for a growing population, and this departure is dependent on the mutation rate at the locus, as well as the growth pattern of the population. Although in the present work we used data on tetranucleotide loci alone, the impact of these findings on the estimates of relative mutation rates of different motif types of microsatellites is also important. We argue that although CHAKRABORTY et al. (1997) used a mutation-drift equilibrium model to estimate the relative mutation rates of di-, tri-, and tetranucleotide loci, their conclusions are consistent with the analyses of the present set of data. This is so because Equation 20 clearly shows that even in the nonequilibrium case (caused by population size change), the ratio of expected variances between loci is simply given by the respective ratio of their mutation rates.

Finally, we note that an observed imbalance such as the one noted in the present analysis is not necessarily caused by population expansion alone. Effects of population structure could be superimposed on this factor (the data considered here are in fact from a number of different national populations within each group), and different loci may even be subject to different allele size constraints.


*  ACKNOWLEDGMENTS

This work was supported by grants GM 41399 (R.C.), GM 58545 (R.C. and M.K.), and RR 00064 (L.B.J., W.S.W., and M.B.) from the National Institutes of Health, as well as grants DMS 9409909 (M.K.), DBS 9310105 (L.B.J., W.S.W., and M.B.), and DBS 9514733 (L.B.J., W.S.W., and M.B.) from the National Science Foundation. The authors also acknowledge support from grant 1T15LM07093-04 from the National Library of Medicine (J.P.K.) and from the Keck's Center for Computational Biology at Rice University (M.K. and J.P.K.).

Manuscript received May 30, 1997; Accepted for publication November 24, 1997.


*  APPENDIX

Coalescent-based derivation of expression (Equation 17):
Let us consider the present time (t) as a reference point, and let us introduce the reverse time $\tau^*$ such that $\tau^* = t - \tau$, where $\tau$ is the chronological time, assuming the value $\tau = t$ at the present. Let us further denote $N^*(\tau^*) = N(t - \tau^*)$ and $R^*(s, \tau^*) = R(s, t - \tau^*)$. Suppose that the lineages of two chromosomes from the population coalesce at the reverse time $T = \tau^*$. Then, under the SMM,

The distribution of the nonnegative random variable T has hazard rate $[2N^*(\tau^*)]^{-1}$, $\tau^* \geq 0$, equal to the coalescence intensity. T is proper if $\int_0^\infty [2N^*(\tau^*)]^{-1}\,d\tau^* = \infty$. Therefore,

Passing to the usual time, we obtain

But this is equal to

which is identical to Equation 18, considering that

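The hazard-rate statement in this appendix can be illustrated numerically. A minimal sketch, assuming only that the coalescence time T has hazard rate 1/(2N*(u)) in reverse time u: the probability that two lineages have not yet coalesced by reverse time tau is then exp(-H(tau)), where H is the integral of the hazard from 0 to tau. The function names and the two demographic histories (`constant_size`, `exponential_growth`) with their parameter values are hypothetical choices for illustration, not taken from the analysis above.

```python
import math


def cumulative_hazard(n_rev, tau, steps=20000):
    """Trapezoidal integral of 1 / (2 * N*(u)) for u in [0, tau]."""
    if tau == 0:
        return 0.0
    h = tau / steps
    total = 0.5 * (1.0 / (2.0 * n_rev(0.0)) + 1.0 / (2.0 * n_rev(tau)))
    for k in range(1, steps):
        total += 1.0 / (2.0 * n_rev(k * h))
    return total * h


def survival(n_rev, tau, steps=20000):
    """P(T > tau): two lineages have not coalesced by reverse time tau."""
    return math.exp(-cumulative_hazard(n_rev, tau, steps))


# Hypothetical demographic histories, written as functions of reverse time u.
def constant_size(u, n=1000.0):
    return n


def exponential_growth(u, n_now=10000.0, r=0.001):
    # Forward-time growth at rate r means the size shrinks going back in time.
    return n_now * math.exp(-r * u)


if __name__ == "__main__":
    for tau in (500.0, 1500.0, 3000.0):
        print(tau, survival(constant_size, tau), survival(exponential_growth, tau))
```

For a constant size N the survival function reduces to exp(-tau / (2N)). For the exponential history the cumulative hazard grows without bound as tau increases, so T is proper in the sense of the divergence condition stated above.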

*  LITERATURE CITED

BERTORELLE, G. and M. SLATKIN, 1995 Number of segregating sites in expanding human populations, with implications for estimates of demographic parameters. Mol. Biol. Evol. 12:887-892.

BOWCOCK, A. M., R.-A. LINARES, J. TOMFOHRDE, E. MINCH, J. R. KIDD, and L. L. CAVALLI-SFORZA, 1994 High resolution of human evolutionary trees with polymorphic microsatellites. Nature 368:455-457.

CANN, R., M. STONEKING, and A. C. WILSON, 1987 Mitochondrial DNA and human evolution. Nature 325:31-36.

CHAKRABORTY, R. and M. NEI, 1982 Genetic differentiation of quantitative characters between populations of species. I. Mutation and random genetic drift. Genet. Res. Camb. 39:303-314.

CHAKRABORTY, R. and L. JIN, 1993a A unified approach to study hypervariable polymorphisms: statistical considerations of determining relatedness and population distances, pp. 153–175 in DNA Fingerprinting: State of the Science, edited by S. D. J. PENA, R. CHAKRABORTY, J. T. EPPLEN and A. J. JEFFREYS. Birkhauser, Basel.

CHAKRABORTY, R. and L. JIN, 1993b Determination of relatedness between individuals by DNA fingerprinting. Hum. Biol. 65:875-895.

CHAKRABORTY, R., M. KIMMEL, D. N. STIVERS, R. DEKA, and L. J. DAVISON, 1997 Relative mutation rates at di-, tri-, and tetra-nucleotide microsatellite loci. Proc. Natl. Acad. Sci. USA 94:1041-1046.

COX-MATISE, T., M. PERLIN, and A. CHAKRAVARTI, 1994 Automated construction of genetic linkage maps using an expert system (multiMap): A human genome linkage map. Nature Genet. 6:384-390.

DEKA, R., M. D. SHRIVER, L. M. YU, R. E. FERRELL, and R. CHAKRABORTY, 1995 Intra- and inter-population diversity at short tandem repeat loci in diverse populations of the world. Electrophoresis 16:1659-1664.

DI RIENZO, A. and A. C. WILSON, 1991 Branching pattern in the evolutionary tree from human mitochondrial DNA. Proc. Natl. Acad. Sci. USA 88:1597-1601.

DI RIENZO, A., A. C. PETERSON, J. C. GARZA, A. M. VALDES, and M. SLATKIN et al., 1994 Mutational process of simple-sequence repeat loci in human populations. Proc. Natl. Acad. Sci. USA 91:3166-3170.

EWENS, W. J., 1979 Mathematical Population Genetics. Springer-Verlag, New York.

GILBERT, D. A., N. LEHMAN, S. J. O'BRIEN, and R. K. WAYNE, 1990 Genetic fingerprinting reflects population differentiation in California Channel Island fox. Nature 344:764-767.

GOLDSTEIN, D. B., A. R. LINARES, M. W. FELDMAN, and L. L. CAVALLI-SFORZA, 1995b An evaluation of genetic distances for use with microsatellite loci. Genetics 139:463-471.

GYAPAY, G., J. MORISSETTE, A. VIGNAL, C. DIB, and C. FIZAMES et al., 1994 The 1993–1994 Genethon human genetic linkage map. Nature Genet. 7:246-339.

HANIS, C. L., E. BOERWINKLE, R. CHAKRABORTY, D. L. ELLSWORTH, and P. CONCANNON et al., 1996 A genome-wide search for human non-insulin-dependent (type 2) diabetes genes reveals a major susceptibility locus on chromosome 2. Nature Genet. 13:161-166.

HARPENDING, H. C., C. S. T. SHERRY, A. R. ROGERS, and M. STONEKING, 1993 The genetic structure of ancient human populations. Curr. Anthropol. 34:483-496.

JEFFREYS, A. J., K. TAMAKI, A. MACLEOD, D. G. MONCKTON, and D. L. NEIL et al., 1994 Complex gene conversion events in germline mutation at human minisatellites. Nature Genet. 6:136-145.

JIN, L., C. MACAUBAS, J. MALLMAYAR, A. KIMURA, and E. MIGNOT, 1996a Mutation rate varies among alleles at a microsatellite locus: phenotypic evidence. Proc. Natl. Acad. Sci. USA 93:15285-15288.

JORDE, L. B., M. J. BAMSHAD, W. S. WATKINS, R. ZENGER, and A. E. FRALEY et al., 1995 Origins and affinities of modern humans: A comparison of mitochondrial and nuclear genetic data. Am. J. Hum. Genet. 57:523-538.

JORDE, L. B., A. R. ROGERS, W. S. WATKINS, P. KRAKOWIAK, and S. SUNG et al., 1997 Microsatellite diversity and the demographic history of modern humans. Proc. Natl. Acad. Sci. USA 94:3100-3103.

KIMMEL, M. and R. CHAKRABORTY, 1996 Measures of variation at DNA repeat loci under a general stepwise mutation model. Theor. Pop. Biol. 50:345-367.

KIMMEL, M., R. CHAKRABORTY, D. N. STIVERS, and R. DEKA, 1996 Dynamics of repeat polymorphisms under forward-backward mutation model: Within- and between-population variability at microsatellite loci. Genetics 143:549-555.

LI, W.-H., 1976 Electrophoretic identity of proteins in a finite population and genetic distance between taxa. Genet. Res. Camb. 28:119-127.

LIN, Z., X. CUI, and H. LI, 1996 Multiplex genotype determination at a large number of gene loci. Proc. Natl. Acad. Sci. USA 93:2582-2587.

LUNDSTROM, R., S. TAVARE, and R. H. WARD, 1992 Modelling evolution of the human mitochondrial genome. Math. Biosci. 112:319-335.

MARJORAM, P. and P. DONNELLY, 1994 Pairwise comparisons of mitochondrial DNA sequences in subdivided populations and implications for early human populations. Genetics 136:673-683.

MORAN, P. A. P., 1975 Wandering distributions and the electrophoretic profile. Theor. Pop. Biol. 8:318-330.

NEI, M., F. TAJIMA, and Y. TATENO, 1983 Accuracy of estimated phylogenetic trees from molecular data. II. Gene frequency data. J. Mol. Evol. 19:153-170.

NATIONAL RESEARCH COUNCIL, 1996 The Evaluation of Forensic DNA Evidence by National Research Council, National Academy Press, Washington DC.

OHTA, T. and M. KIMURA, 1973 A model of mutation appropriate to estimate the number of electrophoretically detectable alleles in a finite population. Genet. Res. 22:201-204.

PENA, S. D. J. and R. CHAKRABORTY, 1994 Paternity testing in the DNA era. Trends Genet. 10:204-209.

ROGERS, A. R., 1995 Genetic evidence for a Pleistocene population explosion. Evolution 49:608-615.

ROGERS, A. R. and H. C. HARPENDING, 1992 Population growth makes waves in the distribution of pairwise genetic differences. Mol. Biol. Evol. 9:552-569.

RUBINSZTEIN, D. C., W. AMOS, J. LEGGO, S. GOODBURN, and S. JAIN et al., 1995 Microsatellite evolution—evidence for directionality and variation in rate between species. Nature Genet. 10:337-343.

SHRIVER, M. D., L. JIN, R. CHAKRABORTY, and E. BOERWINKLE, 1993 VNTR allele frequency distributions under the stepwise mutation model—a computer simulation approach. Genetics 134:983-993.

SLATKIN, M., 1995 A measure of population subdivision based on microsatellite allele frequencies. Genetics 139:457-462.

STRINGER, C. B., 1989 Neanderthals, their contemporaries and modern human origin, pp. 351–355 in Hominidae, edited by G. GIACOBINI. Jaca Book, Milan.

TAUTZ, D., 1993 Notes on the definition and nomenclature of tandemly repetitive DNA sequence, pp. 21–28 in DNA Fingerprinting: State of the Science, edited by S. D. J. PENA, R. CHAKRABORTY, J. T. EPPLEN and A. J. JEFFREYS. Birkhauser, Basel.

VALDES, A. M., M. SLATKIN, and N. B. FREIMER, 1993 Allele frequencies at microsatellite loci: the stepwise mutation model revisited. Genetics 133:737-749.

WEBER, J. L., 1990 Informativeness of human (dC-dA)n · (dG-dT)n polymorphisms. Genomics 7:524-530.

WEBER, J. L. and C. WONG, 1993 Mutation of human short tandem repeats. Hum. Mol. Genet. 2:1123-1128.

WEHRHAHN, C. F., 1975 The evolution of selectively similar electrophoretically detectable alleles in finite natural populations. Genetics 80:375-394.

WEIDENREICH, F., 1939 Six lectures on Sinanthropus pekinensis and related problems. Bull. Geol. Soc. China 19:1-110.

WEISSENBACH, J., G. GYAPAY, C. DIB, A. VIGNAL, and P. MORRISSETTE et al., 1992 A second-generation linkage map of the human genome. Nature 359:794-801.



