Genetics, Vol. 159, 737-747, October 2001, Copyright © 2001

DNA Dinucleotide Evolution in Humans: Fitting Theory to Facts

Alexander Renwick, Leslea Davison, Heidi Spratt, J. Patrick King, and Marek Kimmel
Department of Statistics, Rice University, Houston, Texas 77251

Corresponding author: Marek Kimmel, Department of Statistics, Rice University, 6100 Main St., MS 138, Houston, TX 77251. E-mail: kimmel@rice.edu

Communicating editor: N. TAKAHATA


*  ABSTRACT

We examine length distributions of ~6000 human dinucleotide microsatellite loci, representing chromosomes 1–22, from the GDB database. Under the stepwise mutation model, results from theory and simulation are compared with the empirical data. In both constant and expanding population scenarios, a simple single-step model with parameters chosen to account for the observed variance of microsatellite lengths produces results inconsistent with the observed heterozygosity and the dispersion of length skewness. Complicating the model by allowing a variable mutation rate accounts for the homozygosity, and introducing a small probability of a large mutation step accounts for the dispersion in skewnesses. We discuss these results in light of the long-term evolution of microsatellites.


MICROSATELLITES are regions of DNA where a short (2–6 bp) motif is repeated a number of times (e.g., ... ATATATATAT ... ). They are ubiquitous in that they are found in the genomes of all living organisms. They are also highly polymorphic; the number of repeats varies between individuals. Also, they are easily assayed; the repeat lengths of known loci can be assessed from a small tissue sample.

Polymorphism of microsatellites reflects their high mutation rate, on the order of $5 \times 10^{-4}$ to $5 \times 10^{-3}$ per generation. Mutation rate is thought to depend, among other factors, on the length of the repeat motif (WEBER and WONG 1993; CHAKRABORTY et al. 1997). Mutations are thought to occur by two possible mechanisms. Replication slippage occurs when the DNA polymerase enzyme "slips" during replication. Unequal crossing over occurs during chromosomal recombination when the site of recombination is located within a microsatellite but the DNA strands are misaligned.

The uses of microsatellites include forensics, gene mapping, and evolutionary studies. Because of the large number of alleles, individuals can be uniquely identified by their allele status at several loci (CHAKRABORTY and JIN 1993). The same property helps resolve haplotypes in family linkage studies (DIB et al. 1996). The application of most interest in this article is in evolutionary studies. Phylogenetic relationships and/or demographic events may be inferred from population level repeat length distributions (MOUNTAIN and CAVALLI-SFORZA 1997; DIRIENZO et al. 1998; KIMMEL et al. 1998; REICH and GOLDSTEIN 1998; REICH et al. 1999).

To draw correct inferences from microsatellite statistics, it is important to understand the evolutionary dynamics of microsatellites. Each microsatellite locus is subject to mutation and drift. In addition, although it is possible to restrict the study to microsatellites in noncoding regions of the genome, linkage disequilibrium may cause them to be associated with other loci, possibly under selection. Another important force is demography. Evolution of microsatellites, like that of any other loci, may be influenced by past population expansions, bottlenecks, and migrations (KIMMEL et al. 1998).

The usual model invoked in the context of microsatellite loci is the stepwise mutation model (SMM; see KIMMEL and CHAKRABORTY 1996 for a review of the literature), in which the only form of allelic change by mutation is an extension or contraction in the number of repeats at the locus. The most common version of the SMM is the single-step SMM (or SSMM), in which the extensions and contractions are never longer than one repeat. Long-term consequences of using the SSMM instead of a more general model with multiple expansions/contractions possible were explored by several authors and found to be significant.

One way to understand the evolutionary dynamics is to use mathematical modeling. In this approach, predictions of a mathematical model are related to data-based statistics. Such comparisons allow limiting the set of parameters of the model to those that fit the data. In this way, it can be determined whether a given mechanism plays a role in the evolutionary dynamics.

In this article, we focus on length distributions of human dinucleotide repeat microsatellites, obtained from a publicly accessible resource. We construct a family of models of dinucleotide evolution and identify important parameters for models of dinucleotide mutation, including the distinction between single- and multiple-step SMM. We compare model predictions with empirical data and discuss implications of our findings for dinucleotide evolution.


*  DATA

The Genome Database (GDB), established at Johns Hopkins University in 1990, is the official central repository for genomic mapping data resulting from the Human Genome Initiative. In support of this project, GDB stores and curates data generated worldwide by researchers engaged in the mapping effort of the Human Genome Project (HGP). The current version, GDB 6.0, is accessible through the Internet site http://gdbwww.gdb.org/gdb/gdbtop.html. To collect data we used the GDB 5.6 version, which was phased out recently.

For the purpose of this study, we collected data on nongene-associated dinucleotide loci using the GDB 5.6 polymorphism query. The data we extracted include the following: loci names, allele sizes in kilobases, relative allele frequencies, the number of chromosomes sampled, and the literature reference or names of the researchers contributing data. Our data cover chromosomes 1–22, i.e., all autosomal chromosomes, jointly covering $3.2 \times 10^{9}$ bp of DNA (source: Science Maps and Data at the Internet site http://www.chlc.org/ScienceData.html/). This is most likely one of the largest data samples ever compiled in this fashion. At the time our data were collected from the GDB (1998 and 1999), the database included mostly Caucasian (CEPH) polymorphisms of dinucleotides.

The data, culled from the online Genome Database, consist of 5800 dinucleotide repeat loci from 22 chromosomes. In the vast majority of cases, at least 40 chromosomes were sampled per locus. The distribution of the number of loci per chromosome is depicted in Fig 1. We limited ourselves to dinucleotide loci, since quantitative information concerning other microsatellites is scarce in the GDB.



Figure 1. Distribution of numbers of dinucleotide loci on the autosomal chromosomes in GDB data.

Fig 2 depicts the distributions of the number of alleles and the range of the number of repeats in the dinucleotides sampled. The distribution of the number of alleles is narrower than the distribution of the number of repeats, since, for any given locus, the number of alleles is frequently less than the range of repeat counts might indicate; i.e., within a given range of repeat counts, usually some alleles are missing. However, the shape of the two distributions is remarkably similar. For each locus, heterozygosity (H), variance of repeat count (V), and skewness of the distribution of repeat count (S) were estimated according to the expressions

$$\hat{H} = \frac{n}{n-1}\left(1 - \sum_{k=1}^{K} p_k^2\right), \qquad \hat{V} = \frac{n}{n-1}\sum_{k=1}^{K} p_k (x_k - \bar{x})^2,$$

and

$$\hat{S} = \frac{1}{\hat{V}}\sum_{k=1}^{K} p_k (x_k - \bar{x})^3,$$

where $\bar{x} = \sum_{k=1}^{K} p_k x_k$, $x_k$ is the number of repeats in the $k$th allele, $p_k$ is the relative frequency of the $k$th allele, $K$ is the number of alleles observed at the locus, and $n$ is the number of chromosomes in the sample. Fig 3 depicts scatter plots of heterozygosity and skewness against variance of repeat count. Our definition of skewness differs slightly from the standard one, which involves the power 3/2 of the variance in the denominator. Our definition is convenient, since the sample variance $\mathrm{Var}(\hat{S})$ of $\hat{S}$ from the data is very close to the mean $\bar{V}$, both calculated over all the loci available. In addition, simulated $\bar{V}$ and $\mathrm{Var}(\hat{S})$ behave very similarly to each other. The mean heterozygosity in the sample is equal to $\bar{H} = 0.68$, the mean variance of repeat count is equal to $\bar{V} = 5.4$, and the sample variance of estimated skewness is equal to $\mathrm{Var}(\hat{S}) = 5.5$.
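The per-locus statistics described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the function name `locus_statistics`, the small-sample factor n/(n - 1) applied to H and V, and the normalization of skewness by V to the first power (rather than the standard power 3/2, as the text indicates) are assumptions. The example locus is hypothetical.

```python
def locus_statistics(repeats, freqs, n):
    """Return (H, V, S) for one locus.

    repeats: repeat counts x_k of the K observed alleles
    freqs:   relative allele frequencies p_k (assumed to sum to 1)
    n:       number of chromosomes sampled
    The n/(n-1) correction and the skewness normalization are assumptions.
    """
    if n < 2:
        raise ValueError("need at least two chromosomes")
    correction = n / (n - 1)
    mean = sum(p * x for p, x in zip(freqs, repeats))
    H = correction * (1.0 - sum(p * p for p in freqs))
    V = correction * sum(p * (x - mean) ** 2 for p, x in zip(freqs, repeats))
    third_moment = sum(p * (x - mean) ** 3 for p, x in zip(freqs, repeats))
    # Skewness is scaled by V (power 1), so it carries units of repeats.
    S = third_moment / V if V > 0 else 0.0
    return H, V, S


# Hypothetical locus: three alleles typed on 40 chromosomes.
H, V, S = locus_statistics([10, 11, 14], [0.5, 0.25, 0.25], n=40)
```

Because S is scaled by V rather than by its power 3/2, the variance of S across loci is measured in squared repeats, the same units as V, which is what makes the comparison of the two quantities in the text meaningful.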




Figure 2. Distributions of (a) the range of allele lengths (numbers of repeats) and (b) the number of alleles in GDB dinucleotides.




Figure 3. Scatter plots of statistical moment characteristics of distributions of GDB dinucleotides. (a) Heterozygosity vs. variance of repeat count and (b) skewness vs. variance of repeat count.


*  MODEL OF EVOLUTION

We assume the microsatellite loci are neutral with respect to natural selection. Mutation is modeled as occurring in steps: Each mutation changes the length of the existing allele by an integer number of repeat motifs. Drift is modeled by the coalescent approximation of the Fisher-Wright process (TAVARÉ 1995). The models and simulation tools are described in detail in KING et al. (2000).

SMM:
In this model it is assumed that mutation occurs with frequency $\nu$ for all loci and that it changes allele length by a random number of repeats, i.e., that the change of allele size $X$ by mutation has the form

$$X \rightarrow X + U,$$

where $U$ is an integer-valued random variable. In the case of the SSMM model,

$$P[U = 1] = P[U = -1] = \tfrac{1}{2}.$$

We also use the version of the SMM model with an admixture of larger mutation steps (of size $u \geq 2$); i.e.,

$$P[U = 1] = P[U = -1] = \frac{1 - p}{2}, \qquad P[U = u] = P[U = -u] = \frac{p}{2}.$$
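As a minimal sketch of the mutation step in the SMM with an admixture of larger steps, the following assumes a symmetric distribution: with probability 1 - p the step is ±1 and with probability p it is ±u, each sign being equally likely. The symmetry is an assumption (the Discussion states that directionality was not needed), and the names `mutation_step` and `step_variance` are hypothetical.

```python
import random


def mutation_step(p, u, rng):
    """Draw one mutational change U in repeat count.

    With probability p the step has magnitude u (multiple-step change),
    otherwise magnitude 1; the sign is chosen symmetrically (assumption).
    """
    magnitude = u if rng.random() < p else 1
    sign = rng.choice((-1, 1))
    return sign * magnitude


def step_variance(p, u):
    """Variance of U under the symmetric mixture (the mean of U is zero)."""
    return (1.0 - p) * 1 + p * u ** 2


rng = random.Random(1)
# Values p = 0.015 and u = 7 are the best-fit values reported in RESULTS.
steps = [mutation_step(0.015, 7, rng) for _ in range(10000)]
```

Setting p = 0 recovers the SSMM. Under this sketch, a rare step of size 7 raises the per-mutation variance noticeably even though single steps still dominate.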

Genetic drift and demographic change:
The evolution of a microsatellite locus is also shaped by drift and demography. Genetic drift (the loss of alleles through random sampling of genotypes of new individuals from the gamete pool) acts with strength inversely proportional to the effective population size. This effect determines the distribution of the branch lengths in the genealogy of the sampled locus. For a sample of $n$ individuals, the genealogy may be partitioned according to the times $T_k$, $k = 2, 3, \ldots, n$, where $T_k$ denotes the time during which the sample is represented by $k$ distinct lineages. In the case of constant population size, it is known that the coalescence times $T_k$ are independent exponentially distributed random variables, each with parameter $\binom{k}{2}/(2N)$, where $N$ is the number of diploid individuals in the population.
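The constant-population coalescent can be illustrated with a short Python sketch. It assumes the standard parameterization in which each of the k(k - 1)/2 pairs of lineages coalesces with intensity 1/(2N) per generation, so that T_k is exponential with rate k(k - 1)/(4N). The function names and the value N = 10,000 are hypothetical.

```python
import random


def coalescence_rate(k, N):
    """Rate of T_k: k(k-1)/2 pairs, each coalescing at intensity 1/(2N)."""
    return k * (k - 1) / 2.0 / (2.0 * N)


def sample_coalescence_times(n, N, rng):
    """Draw independent exponential times T_n, ..., T_2 (in generations)."""
    return {k: rng.expovariate(coalescence_rate(k, N)) for k in range(n, 1, -1)}


def expected_tree_height(n, N):
    """Expected time to the most recent common ancestor, the sum of E[T_k]."""
    return sum(1.0 / coalescence_rate(k, N) for k in range(2, n + 1))


N = 10000   # hypothetical number of diploid individuals
n = 40      # sample size used in the article's simulations
times = sample_coalescence_times(n, N, random.Random(0))
# Share of the expected tree height spent with only two lineages.
share_of_last_interval = (1.0 / coalescence_rate(2, N)) / expected_tree_height(n, N)
```

Under these assumptions the final interval, with two lineages, accounts on average for slightly more than half of the tree height, which is why genealogies in constant populations tend to have a few long branches near the root.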

Under constant population size, the most ancient coalescence times tend to be long relative to branches of the tree associated with more recent bifurcations. For this reason, coalescent trees in constant populations frequently exhibit two or three clusters at the tips of the tree connected with the root by a few long branches. Mutations accumulate on these long branches, accounting for much of the allelic variation observed in the sample. Under mutation-drift equilibrium and constant population size, model predictions are determined by the single composite parameter $\theta = 4N\nu$, where $N$ is understood as the effective population size. Specifically, under the SSMM, expected heterozygosity, variance of repeat count, and skewness are given by closed-form expressions; in particular, the expected heterozygosity is $H(\theta) = 1 - (1 + 2\theta)^{-1/2}$ (see Appendix).

Other statistics can be estimated by simulation of genealogies, under a variety of evolutionary and demographic scenarios.
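A simulation of this kind can be sketched as follows for the simplest case (constant population, SSMM). This is not the authors' simulation tool. Time is measured in units of 2N generations, so k lineages coalesce at rate k(k - 1)/2 and each lineage mutates at rate θ/2. These scalings follow from θ = 4Nν, but the function names and the Poisson helper are assumptions made for illustration.

```python
import random


def poisson(rng, lam):
    """Poisson draw: count arrivals of a unit-rate process in [0, lam]."""
    count, t = 0, rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_locus(n, theta, rng):
    """Repeat counts (relative to the root allele) of n sampled chromosomes."""
    offsets = [0] * n
    lineages = [[i] for i in range(n)]  # each lineage lists its sampled descendants
    while len(lineages) > 1:
        k = len(lineages)
        t_k = rng.expovariate(k * (k - 1) / 2.0)  # time spent with k lineages
        for members in lineages:
            # Single-step mutations accumulated on this branch segment.
            m = poisson(rng, 0.5 * theta * t_k)
            step = sum(rng.choice((-1, 1)) for _ in range(m))
            for i in members:
                offsets[i] += step
        # Merge two lineages chosen at random (one coalescence event).
        a, b = rng.sample(range(k), 2)
        merged = lineages[a] + lineages[b]
        lineages = [lin for j, lin in enumerate(lineages) if j not in (a, b)]
        lineages.append(merged)
    return offsets


# One simulated locus with 40 chromosomes; theta = 10 is an illustrative value.
offsets = simulate_locus(n=40, theta=10.0, rng=random.Random(2001))
```

Repeating the call over many loci and computing heterozygosity, variance, and skewness for each gives simulated counterparts of the empirical statistics. Variable population size or multiple-step mutations would require replacing the exponential waiting time or the ±1 step, respectively.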

When the population size varies over time, the above description of the coalescence process must be modified. The times to coalescence are no longer exponentially distributed. Intuitively, the constant coalescence intensity $1/(2N)$ is replaced by the time-dependent coalescence intensity $1/[2N(t)]$, where $N(t)$ gives the population size $t$ generations in the past. Given that there are $k$ lineages represented in the sample at time $t$, the distribution of the time to coalescence $T_k$ is given by

$$P[T_k > \tau \mid t] = \exp\left\{-\binom{k}{2}\int_{t}^{t+\tau}\frac{d\sigma}{2N(\sigma)}\right\}.$$

As a result, the distributions of the coalescence times Tk in a population of variable size will be distorted relative to their counterparts in a constant population.

Two growth patterns are of interest here. The first one, so-called "long-neck," is rapid expansion from previously established mutation-drift equilibrium. Consider, for example, a population originally of size $N_0$, which undergoes a stepwise expansion $t_e$ generations in the past to its current size $N$, where $N \gg N_0$. Looking backward in time, this demographic expansion corresponds to a sudden increase in the coalescence intensity from $1/(2N)$ to $1/(2N_0)$. The effect of this change on the genealogy of a sample of $n$ chromosomes depends on the time since the expansion. If the expansion event is very ancient, even the coalescence times closest to the root will reflect the current population size $2N$. If, however, the growth is sufficiently recent, the lineages in the genealogy at the time of expansion will be subject to the preexpansion coalescence intensity $1/(2N_0)$, and expected coalescence times for these lineages will be much shorter. With high probability then, the most recent common ancestor of the sample is found close to time $t_e$.
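The long-neck scenario can be sketched by treating coalescence as the first event of a process whose intensity jumps at the expansion time. The code below is an assumed implementation, not taken from the article: the cumulative intensity is piecewise linear, and the waiting time is the point at which it reaches a unit-exponential `target`. The sizes N = 5000 and N0 = 250 are those implied by θ = 10, θ0 = 0.5, and ν = 5 × 10^-4 through θ = 4Nν; all function names are hypothetical.

```python
import random


def time_to_coalescence(k, t, N, N0, te, target):
    """Generations (backward from time t) until k lineages become k - 1.

    Population size is N for times more recent than te and N0 before that.
    `target` is a unit-exponential draw: coalescence occurs when the
    cumulative intensity, pairs * integral of 1/(2N(s)) ds, reaches it.
    """
    pairs = k * (k - 1) / 2.0
    if t < te:
        recent = pairs * (te - t) / (2.0 * N)  # intensity accumulated up to te
        if target <= recent:
            return target * 2.0 * N / pairs
        return (te - t) + (target - recent) * 2.0 * N0 / pairs
    return target * 2.0 * N0 / pairs


def sample_tmrca(n, N, N0, te, rng):
    """Time to the most recent common ancestor of n chromosomes."""
    t = 0.0
    for k in range(n, 1, -1):
        t += time_to_coalescence(k, t, N, N0, te, rng.expovariate(1.0))
    return t


# N = theta / (4 nu) = 10 / 0.002 = 5000; N0 = 0.5 / 0.002 = 250; te = 4000.
tmrca = sample_tmrca(40, 5000.0, 250.0, 4000.0, random.Random(3))
```

Passing the exponential draw as an argument keeps `time_to_coalescence` deterministic, which makes the two regimes (coalescence before or after the expansion) easy to check by hand.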

The other growth pattern of interest, the so-called "hourglass," is a rapid expansion following a bottleneck. The prebottleneck population is assumed to be in mutation-drift equilibrium. Consider, for example, a population originally of size $N_0$, which goes through a bottleneck of size $N_b$ for duration $t_b$ and then undergoes a stepwise expansion $t_e$ generations in the past to its current size $N$, where $N \gg N_0$. The effect of this change on the genealogy of a sample of $n$ chromosomes depends on the time since the expansion and the duration of the bottleneck. For a detailed study, see KIMMEL et al. (1998).


*  RESULTS

We carried out a large number of simulations under different variants of the model. We concentrated on three synthetic characteristics: variance of repeat count, heterozygosity, and variance of skewness. The reason to consider the variance of skewness, instead of skewness itself, becomes clear from examination of Fig 3B. In the data, skewness is distributed symmetrically (to a high accuracy) around 0, while its variance changes with the variance of repeat count.

  1. We began by simulating the model predictions under the SSMM, mutation-drift equilibrium, and constant population size. In this case, the only parameter varied was $\theta$. The range of values used was $\theta = 1, 2, \ldots, 20$. For each value of $\theta$, 100 simulation runs were carried out, each of them involving 6000 loci, with the assumed sample size of 40 chromosomes. Fig 4, a and b, depicts model predictions, involving scatter of simulated values due to random fluctuations, for the dependence of heterozygosity on variance of repeat count (a) and for the dependence of variance of skewness on variance of repeat count (b). The corresponding mean values from the sample are depicted in the same graphs. There exists a systematic departure of the model predictions from the observed values, which extends far beyond the level of random fluctuations.






    Figure 4. Predictions of the single-step stepwise mutation model. Constant-population, mutation-drift equilibrium graphs of (a) heterozygosity vs. variance of repeat count, obtained analytically, and (b) variance of skewness vs. variance of repeat count, obtained using simulations. The $\otimes$ symbols are placed at mean values in GDB data of the corresponding statistics. c and d depict the sensitivity of simulated heterozygosity and variance of skewness (respectively) to changes in the admixture of multiple-step mutational changes, under mutation rates dispersed with a coefficient of variation equal to 1. Longer "clouds" of simulated points (circles) correspond to reference values with no admixture of multiple-step mutational changes. Shorter "clouds" correspond to simulated values with varying admixture of multiple-step mutational changes.

  2. Studies in item 1 were repeated assuming a long-neck scenario, with an expansion $t_e = 4000$ generations in the past, from an effective population size corresponding to $\theta_0 = 0.5$ to the current size corresponding to $\theta = 10$, and mutation rate $\nu = 5 \times 10^{-4}$ per generation. These values were selected to obtain a simulated value of $\bar{V}$ close to the sample mean of 5.4. Under the long-neck scenario, the discrepancy between variance of repeat count and heterozygosity is amplified. However, the discrepancy between variance of skewness and variance of repeat count stays basically unchanged. The values of effective population sizes, mutation rates, and times from expansion are approximate values, which seem applicable to the demographic history of Caucasian populations (for more details, see KIMMEL et al. 1998 and KING et al. 2000). Values from a region of the parameter space generate the desired variances. However, the general conclusion concerning the discrepancy remains unchanged.

  3. Studies in item 1 were repeated assuming an hourglass scenario, with an expansion $t_e = 4000$ generations in the past, from a bottleneck of duration $t_b = 2000$ generations (exploratory simulations were carried out for a range of bottleneck durations). The prebottleneck population size corresponds to $\theta_0 = 5$, the bottleneck size to $\theta_b = 0.5$, and the current size to $\theta = 10$, with mutation rate $\nu = 5 \times 10^{-4}$ per generation. Again, these values were selected to obtain a simulated value of $\bar{V}$ close to the sample mean. Under the hourglass scenario, the discrepancy between variance of repeat count and heterozygosity is reduced or even reversed. However, the discrepancy between variance of skewness and variance of repeat count again stays basically unchanged.

  4. Further, in the framework of constant population and mutation-drift equilibrium scenarios, an array of simulations was performed including the following modifications in the basic model:

  5. Sampling $\theta$ values from a lognormal distribution and varying its expected value and coefficient of variation.

  6. Assuming an admixture of larger mutation steps (of size $u \geq 2$) and varying the probability $1 - p$ of a single step.

  7. Assuming an admixture of larger mutation steps (with probability $p$) and varying the size $u \geq 2$ of the multiple step.

  8. A summary of the results of the simulations described in items 4–7 is given in Table 1. As outlined above, the model was constructed by beginning with the one-parameter SSMM and then proceeding to add parameters until the model expectations matched the observed values of $\bar{V}$, $\mathrm{Var}(\hat{S})$, and H depicted in the top row of Table 1. In the SSMM we set $\theta$ (the one parameter) to a value that matched the expectation of V to $\bar{V}$. In this case, model expectations of S and H were, respectively, too low and too high (Table 1, row 2). Adding variation to the rate of mutation had little effect on the expectations of V and S while dramatically decreasing the expectation of H. Thus modeling the mutation rate as drawn from a probability distribution (lognormal, in particular) and adjusting the mean and coefficient of variation allowed the model to account for $\bar{V}$ and H while still expecting too small a value of S (Table 1, row 3). Allowing for a rare, large-step mutational change in the model had little effect on the expectation of H (provided single-step mutations remained predominant) and increased the expectations of V and S. For large-step sizes (e.g., equal to seven) the effect on S was larger than the effect on V, and so allowing for a rare large-step mutational change produces model expectations that match $\mathrm{Var}(\hat{S})$ as well as $\bar{V}$ and H (Table 1, row 4). The final best fit was achieved using values $\theta = 8$, coefficient of variation (c.v.) of $\nu$ equal to 1, $p = 0.015$, and $u = 7$. Fig 4, c and d, depicts the results of varying the proportion $p$ of large-step mutational changes (with c.v. of $\nu$ equal to 1) on the heterozygosity and variance of skewness, for a range of mean values of $\theta$.


     

     
    Table 1. Observed and modeled average values of variance, skewness, and heterozygosity of human dinucleotide loci under various scenarios of molecular evolution

  9. Fig 5 depicts the results of a sensitivity study of varying the parameters around their best-fit values. As seen in Fig 5, variance of repeat count and variance of skewness are insensitive to the changes in the coefficient of variation of $\theta$, while heterozygosity decreases with increasing coefficient of variation. Interestingly, this effect is predictable from Jensen's inequality (BILLINGSLEY 1986) and is not peculiar to the lognormal distribution (see Appendix). Heterozygosity is quite sensitive to the variation in mutation rates across loci. Using a coefficient of variation of $\theta$ equal to 1, even without an admixture of multiple-step mutation, allows a close match to the empirical heterozygosity distribution (Fig 6).



    Figure 5. Parametric sensitivity studies of variance of repeat count (column 1), variance of skewness (column 2), and heterozygosity (column 3) around the best-fit point for the constant-population scenarios. Row 1, a range of values of $\theta$. Row 2, a range of values of the coefficient of variation of mutation rate. Row 3, a range of values of the frequency $p$ of multiple-step mutations. Row 4, a range of values of the size of the multiple mutational step. Horizontal line corresponds to the mean level in the data. Further details in the text.



    Figure 6. Comparison of the distributions of heterozygosity data (bars); SSMM, basic version (dashed line); and SSMM, coefficient of variation of mutation rate equal to 1 (solid line). The basic SSMM fit (dashed line) is obtained by requiring that the mean variance is as observed in the data.

  10. Varying the values of $u$ and $p$ does not change heterozygosity, while it changes both variance of repeat count and variance of skewness. Furthermore, variance of skewness is more sensitive than variance of repeat count. The discrepancy between the single-step model and the data in variance of skewness lies in the tails of the distribution of skewness. In the empirical data, more of the mass is in the tails. Adding a small probability of a large mutation step brings the model into close agreement with the data. Fig 7 depicts the outcome for the best-fit parameters.



    Figure 7. Comparison of the distributions of the skewness data (bars); SSMM, basic version (dashed line); and SMM, with an admixture of p = 0.015 multiple mutational steps with length u = 7 (solid line). The basic SSMM fit (dashed line) is obtained by requiring that the mean variance is as observed in the data.

  11. Finally, we checked whether the model with variable mutation rate and multiple-step admixture provides a fit to the data under the long-neck and hourglass scenarios. The answer is in the affirmative (Fig 8 and Fig 9), although the fits seem to be slightly worse compared to those assuming constant population. The best-fit values of V, S, and H were obtained under the following values: long-neck, $t_e = 4000$ generations, $\theta_0 = 0.52$, $\theta = 10.4$, c.v. of $\nu$ equal to 0.9, $p = 0.04$, and $u = 6$; hourglass, $t_e = 4000$ generations, $t_b = 2000$ generations, $\theta_0 = 0.52$, $\theta = 10.4$, c.v. of $\nu$ equal to 0.9, $p = 0.04$, and $u = 6$. A similar exercise was carried out for bottleneck and expansion times decreased by a factor of 2. It resulted in a fit for almost identical parameter values, except that $\theta \approx 20$ in both cases.



    Figure 8. Comparison of the distributions of heterozygosity data (bars) and SMM with admixture of multiple mutational steps for the following demographic scenarios: constant population (solid line), long-neck (dotted line), and hourglass (dashed-dotted line).



    Figure 9. Comparison of the distributions of skewness data (bars) and SMM with admixture of multiple mutational steps for the following demographic scenarios: constant population (solid line), long-neck (dotted line), and hourglass (dashed-dotted line).


*  DISCUSSION

The Fisher-Wright coalescent provides a flexible framework for investigating multiparameter models. We studied model variants involving multistep mutations, variable mutation rate, and changing population size, in addition to parametric studies of the $\theta$ value. The most important conclusion from our study is that the distributions of GDB dinucleotides can be fitted by the SMM model, if a small admixture of multistep mutations is added to the single-step changes.

In our analyses, limited to human dinucleotide loci, it was not necessary to consider either limits on allele size or directionality of mutational changes. In other microsatellites, the patterns seem to be more complicated. For example, both constraints and directionality had to be included by DEKA et al. (1999a) to model distributions of human trinucleotides of several types. Similarly, in the context of Y chromosome tetranucleotides, COOPER et al. (1999) needed the assumption of expansion bias to explain their data. Major asymmetry exists in disease-causing mutations of trinucleotides at loci such as fragile X, myotonic dystrophy, or Huntington's disease (RICHARDS and SUTHERLAND 1997). Also, a highly polymorphic CAG repeat locus, ERDA1, on human chromosome 17q21.3, was recently analyzed by DEKA et al. (1999b). It has alleles as large as 50–90 repeats apparently without any disease association but with a high intergenerational instability. Paternal transmissions predominantly result in contractions, whereas maternal transmissions predominantly result in expansions.

SWINTON and AMOS (2001) argue that a moment measure like our skewness may not be sufficiently sensitive as a tool to detect asymmetry of mutational changes. They introduce an index of asymmetry ($\alpha_3$) and apply it to GDB dinucleotides. Our simulations using their index indicate that it is very sensitive to a singular type of asymmetry, namely the presence of a mode of the distribution at, or close to, the extreme-size allele. However, these alleles seem to be most sensitive to typing errors. Therefore, even if in simulations the $\alpha_3$ index seems superior to our skewness measure, we found it prudent to adhere to the latter.

The effects of changing population size observed in our simulations are consistent with previous observations and theoretical work. Indeed, population expansion results in a transient process characterized by growth of both variance and heterozygosity. However, in the long-neck expansion, heterozygosity grows faster than variance (KIMMEL et al. 1998). The net effect is that the so-called imbalance index $\beta$, defined as the ratio of the variance-based estimate of $\theta$ to the heterozygosity-based estimate of $\theta$, remains less than its equilibrium value for up to several thousand generations. The opposite effect is present for the hourglass pattern of population change, at least for a long initial period, exceeding our $t_e$ (KIMMEL et al. 1998). The estimated value of the imbalance index, $\beta = 1.23$, implied by the mean values of variance and heterozygosity in the top row of Table 1, seems consistent with the hourglass scenario, as was observed in KIMMEL et al. (1998) for tetranucleotide-repeat loci. However, following KING et al. (2000), this is not significantly different from 1. This is consistent with equally good fits obtained for each of the three demographic scenarios.
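The arithmetic behind the imbalance index can be sketched as follows. The heterozygosity-based estimate inverts the SSMM relation H = 1 - (1 + 2θ)^(-1/2) given in the Appendix. For the variance-based estimate, the sketch assumes that it is taken to be the mean variance itself; this convention is an assumption, adopted here because it reproduces the reported value to two decimals. The function names are hypothetical.

```python
def theta_from_heterozygosity(H):
    """Invert H = 1 - (1 + 2*theta)**-0.5 for theta (SSMM, equilibrium)."""
    return (1.0 / (1.0 - H) ** 2 - 1.0) / 2.0


def imbalance_index(theta_V, H):
    """Ratio of the variance-based to the heterozygosity-based estimate of theta."""
    return theta_V / theta_from_heterozygosity(H)


# Sample means from the data: variance 5.4, heterozygosity 0.68.
# ASSUMPTION: the variance-based estimate of theta is taken equal to 5.4.
theta_H = theta_from_heterozygosity(0.68)
beta = imbalance_index(5.4, 0.68)
```

With these inputs the heterozygosity-based estimate is about 4.38 and the ratio is about 1.23. A different scaling of the variance-based estimate would change the ratio proportionally, so the convention matters when comparing values across studies.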

The present analysis demonstrates the importance of accounting for mutation rate heterogeneity when interpreting measures of DNA polymorphism. Heterozygosity is strongly affected by a variable mutation rate.

As indicated, to fit the data, it was required to consider an admixture of multistep mutations to the single-step model. This was the only way to fit the heavy tails of the allele length skewness distribution. This suggests that the two proposed mutation mechanisms, replication slippage and recombinatorial misalignment, are both at work in creating variability in microsatellite loci. This conclusion is consistent with studies of long-term evolutionary change in microsatellites in other organisms (ZHU et al. 2000), in which major rearrangements were found in addition to single- and multistep contractions and expansions.

Manuscript received December 5, 2000; Accepted for publication July 15, 2001.
*  APPENDIX

HETEROZYGOSITY UNDER DISTRIBUTED MUTATION RATE
Under the SSMM model, heterozygosity is equal to $H(\theta) = 1 - (1 + 2\theta)^{-1/2}$ for any given $\theta$. If $\nu$, and consequently $\theta$, is distributed with density $f(\theta)$ and expected value $E(\theta)$, then the expected heterozygosity is equal to $E_\theta[H(\theta)] = \int [1 - (1 + 2\theta)^{-1/2}] f(\theta)\, d\theta$. Since the function $1 - (1 + 2\theta)^{-1/2}$ is concave in $\theta$, we have from Jensen's inequality (BILLINGSLEY 1986) that $E_\theta[H(\theta)] \leq H[E(\theta)]$, which explains the reduction of heterozygosity under distributed mutation rate.
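The inequality can be checked numerically with a small Monte Carlo sketch. It draws θ from a lognormal distribution with a given mean and coefficient of variation, using the standard moment relations σ² = ln(1 + c.v.²) and μ = ln(mean) - σ²/2, and compares the average heterozygosity with the heterozygosity at the mean θ. The function names, the number of draws, and the seed are assumptions; the values θ = 8 and c.v. = 1 are the best-fit values reported in RESULTS.

```python
import math
import random


def heterozygosity(theta):
    """SSMM equilibrium heterozygosity, H = 1 - (1 + 2*theta)**-0.5."""
    return 1.0 - (1.0 + 2.0 * theta) ** -0.5


def lognormal_params(mean, cv):
    """Parameters (mu, sigma) of a lognormal with the given mean and c.v."""
    sigma2 = math.log(1.0 + cv ** 2)
    mu = math.log(mean) - 0.5 * sigma2
    return mu, math.sqrt(sigma2)


def mean_heterozygosity(mean_theta, cv, draws, rng):
    """Monte Carlo estimate of E[H(theta)] for lognormally distributed theta."""
    mu, sigma = lognormal_params(mean_theta, cv)
    total = sum(heterozygosity(rng.lognormvariate(mu, sigma)) for _ in range(draws))
    return total / draws


rng = random.Random(0)
fixed = heterozygosity(8.0)                          # H at the mean theta
mixed = mean_heterozygosity(8.0, 1.0, 100000, rng)   # E[H] over distributed theta
```

Because H is concave, `mixed` falls below `fixed` for any distribution of θ with the same mean, not only for the lognormal used here.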


*  LITERATURE CITED

BILLINGSLEY, P., 1986 Probability and Measure. Wiley, New York.

CHAKRABORTY, R., and L. JIN, 1993 A unified approach to study hypervariable polymorphisms: Statistical considerations of determining relatedness and population distances, pp. 153–175 in DNA Fingerprinting: State of the Science, edited by S. D. J. PENA, R. CHAKRABORTY, J. T. EPPLEN and A. J. JEFFREYS. Birkhäuser, Basel, Switzerland.

CHAKRABORTY, R., M. KIMMEL, D. N. STIVERS, L. J. DAVISON, and R. DEKA, 1997 Relative mutation rates at di-, tri-, and tetranucleotide microsatellite loci. Proc. Natl. Acad. Sci. USA 94:1041-1046.

COOPER, G., N. J. BURROUGHS, D. A. RAND, D. C. RUBINSZTEIN, and W. AMOS, 1999 Markov Chain Monte Carlo analysis of human Y-chromosome microsatellites provides evidence of biased mutation. Proc. Natl. Acad. Sci. USA 96:11916-11921.

DEKA, R., S. GUANGYUN, D. SMELSER, Y. ZHONG, and M. KIMMEL et al., 1999a Rate and directionality of mutations and effects of allele size constraints at anonymous, gene-associated, and disease-causing trinucleotide loci. Mol. Biol. Evol. 16:1166-1177.

DEKA, R., S. GUANGYUN, J. WIEST, D. SMELSER, and S. CHUNHUA et al., 1999b Patterns of instability of expanded CAG repeats at the ERDA1 locus in general populations. Am. J. Hum. Genet. 65:192-198.

DIB, C., S. FAURÉ, C. FIZAMES, D. SAMSON, and N. DROUOT et al., 1996 A comprehensive map of the human genome based on 5264 microsatellites. Nature 380:152-154.

DIRIENZO, A., P. DONNELLY, C. TOOMAJIAN, B. SISK, and A. HILL et al., 1998 Heterogeneity of microsatellite mutations within and between loci, and implications for human demographic histories. Genetics 148:1269-1284.

KIMMEL, M. and R. CHAKRABORTY, 1996 Measures of variation at DNA repeat loci under a general stepwise mutation model. Theor. Popul. Biol. 50:345-367.

KIMMEL, M., R. CHAKRABORTY, J. P. KING, M. BAMSHAD, and W. S. WATKINS et al., 1998 Signatures of population expansion in microsatellite repeat data. Genetics 148:1921-1930.

KING, J. P., M. KIMMEL, and R. CHAKRABORTY, 2000 A power analysis of microsatellite-based statistics for inferring past population growth. Mol. Biol. Evol. 17:1859-1868.

MOUNTAIN, J. L. and L. L. CAVALLI-SFORZA, 1997 Multilocus genotypes, a tree of individuals, and human evolutionary history. Am. J. Hum. Genet. 61:705-718.

REICH, D. E. and D. B. GOLDSTEIN, 1998 Genetic evidence for a Paleolithic human population expansion in Africa. Proc. Natl. Acad. Sci. USA 95:8119-8123.

REICH, D. E., M. W. FELDMAN, and D. B. GOLDSTEIN, 1999 Statistical properties of two tests that use multilocus data sets to detect population expansions. Mol. Biol. Evol. 16:453-466.

RICHARDS, R. I. and G. R. SUTHERLAND, 1997 Dynamic mutation: possible mechanisms and significance in human disease. Trends Biochem. Sci. 22:432-436.

SWINTON, J. and W. AMOS, 2001 Measurement of distributional asymmetry in allele frequency distributions of microsatellites. Inst. Math. Appl. J. Math. Appl. Med. Biol. in press.

TAVARÉ, S., 1995 Calibrating the clock: using stochastic processes to measure the rate of evolution, pp. 114–152 in Calculating the Secrets of Life, edited by E. S. LANDER and M. S. WATERMAN. National Academy Press, Washington, DC.

WEBER, J. L. and C. WONG, 1993 Mutation of human short tandem repeats. Hum. Mol. Genet. 2:1123-1128.

ZHU, Y., D. C. QUELLER, and J. E. STRASSMANN, 2000 A phylogenetic perspective on sequence evolution in microsatellite loci. J. Mol. Evol. 50:324-338.



