Microsatellite Mutations and Inferences About Human Demography
Rusty Gonser<sup>a</sup>, Peter Donnelly<sup>b</sup>, George Nicholson<sup>b</sup>, and Anna Di Rienzo<sup>a</sup>
<sup>a</sup> Department of Human Genetics, University of Chicago, Chicago, Illinois 60637
<sup>b</sup> Department of Statistics, University of Oxford, Oxford OX1 3TG, United Kingdom
Corresponding author: Anna Di Rienzo, Department of Human Genetics, University of Chicago, JFK Rm. 116, 924 E. 57th St., Chicago, IL 60637. E-mail: dirienzo{at}genetics.uchicago.edu
Communicating editor: N. TAKAHATA
| ABSTRACT |
|---|
Microsatellites have been widely used as tools for population studies. However, inference about population processes relies on the specification of mutation parameters that are largely unknown and likely to differ across loci. Here, we use data on somatic mutations to investigate the mutation process at 14 tetranucleotide repeats and carry out an advanced multilocus analysis of different demographic scenarios on worldwide population samples. We use a method based on less restrictive assumptions about the mutation process, which is more powerful to detect departures from the null hypothesis of constant population size than other methods previously applied to similar data sets. We detect a signal of population expansion in all samples examined, except for one African sample. As part of this analysis, we identify an "anomalous" locus whose extreme pattern of variation cannot be explained by variability in mutation size. Exaggerated mutation rate is proposed as a possible cause for its unusual variation pattern. We evaluate the effect of using it to infer population histories and show that inferences about demographic histories are markedly affected by its inclusion. In fact, exclusion of the anomalous locus reduces interlocus variability of statistics summarizing population variation and strengthens the evidence in favor of demographic growth.
INTEREST in the use of microsatellites as tools for the study of population processes followed soon after their discovery.
As a step toward an improved characterization of the mutation process on a locus-by-locus basis, we devised an approach based on the analysis of somatic microsatellite mutations in cancer patients, which allows the estimation of the distribution of mutation sizes for each locus.
Here, we report on three results and a subsequent population analysis. First, we extend our previous findings to a broader range of human populations by applying the same approach to a second data set on 14 additional microsatellite loci. The results further validate the use of microsatellite instability as a means of characterizing the mutation process of microsatellites. Second, we investigate the variability of mutation rates across loci by using our locus-specific estimates of the distributions of mutation size. The purpose of this analysis is to identify loci that are "anomalous" in that their extreme pattern of population variation cannot be accounted for solely by mutation size variability. We identify one such locus and conclude that its inclusion may compromise the inference about population histories. Third, we extend a result of ![]()
The amount of variation expected at a particular locus in a population increases with the variability (measured by the mutation mean square) of the mutation process. Interlocus variability in the mutation mean square is effectively another source of noise in an already very noisy system. Locus-specific estimates of the mutation mean square can be used to correct for this effect before combining information across loci: dividing the population variance at each locus by the estimated mutation mean square gives the normalized population variance (NPV).
As a result, we are in a position to allow for most of the complexities of the mutation process at microsatellite loci and, thus, carry out an advanced multilocus analysis of different demographic scenarios. Because of our less restrictive assumptions about the mutation process, our method is more powerful to detect departures from the null hypothesis of constant population size than other methods that have been applied to similar data sets.
| MATERIALS AND METHODS |
|---|
Subjects:
Study subjects included 219 patients with sporadic colorectal cancer diagnosed and treated at the Northwestern Memorial Hospital, Chicago, Illinois, as described in ![]()
The Sardinia (Italy) population sample was randomly selected from previously described samples.
Typing protocol:
For both population and patient tissue samples, we used previously described typing protocols based on radioactively end-labeling one of the PCR primers.
| RESULTS |
|---|
Microsatellite instability and patterns of somatic mutations:
Fourteen out of the 30 tetranucleotide repeat loci described in ![]()
For 8 of the 14 loci, the mean of the estimated distribution of mutation size was above zero. Because this proportion is close to the one-half expected in the absence of bias, the data show no evidence for a mutational bias toward an increase in repeat size.
Validating the use of somatic mutations for estimating germ-line mutation parameters:
In Appendix A, we show that a linear relationship is expected between the variance of repeat number in a population sample and the mutation mean square for each microsatellite locus for any demographic scenario. This extends earlier results of ![]()
The fit of the data to the general expectation of a linear relationship between population and mutation parameters can be tested by assessing the significance of the rank correlation between them. As shown in Table 2, all the population samples, with the exception of the African sample, have a significant rank correlation. A graphical representation of the relationship between the population variance and the mutation mean square is also shown in Fig 2.
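The rank correlation described above can be illustrated with a short, self-contained sketch. It computes Spearman's rank correlation between per-locus population variances and mutation mean squares. The function names and the input values are hypothetical and are not taken from Table 2; the article's own significance calculation may differ in detail.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rank correlation is undefined when all values are tied")
    return cov / (vx * vy) ** 0.5


# Hypothetical per-locus values, for illustration only.
mutation_mean_square = [1.2, 1.5, 2.0, 4.8]
population_variance = [3.1, 2.9, 5.4, 9.7]
rho = spearman_rho(mutation_mean_square, population_variance)
```

A significance level could then be obtained from tables of the rank correlation or by permuting one of the two vectors; which of these the authors used is not stated in this section.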
We regard the significance for all populations other than the African sample as evidence that the somatic mutations in cancer are indeed informative for the mutation process generating population variation. Recall that if they were, we would expect a linear relationship between population variance and mutation mean square. The test based on rank correlation examines the null hypothesis of no association. Whether such a test detects a linear relationship when one is present depends on the power of the test, which in turn depends on the variability of the data around the linear relationship. This variability is considerable for our data, owing to the stochastic nature of the evolutionary process. If informative at all, the cancer data will be equally informative for all populations. In the light of the other results, we are thus inclined to view the lack of significance of the rank correlation for the African sample as a reflection of the low power of the test, rather than of a lack of utility of the cancer data.
Bootstrap resampling was used to assess the sampling error in our estimates of mutation mean square. The bootstrap distributions are shown in Fig 3. We discuss below the consequences of this sampling variability for estimation of demographic parameters and testing of demographic scenarios.
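As an illustration of the bootstrap step, the sketch below resamples a list of observed mutation sizes (in repeat units) with replacement and recomputes the mutation mean square for each replicate. It assumes that mutation sizes are observed directly; the article estimates the full distribution of mutation size per locus, so this is a simplification, and the function names and input values are hypothetical.

```python
import random
from typing import List, Sequence


def mutation_mean_square(sizes: Sequence[int]) -> float:
    """Mean of the squared mutation sizes (in repeat units)."""
    if not sizes:
        raise ValueError("at least one mutation size is required")
    return sum(s * s for s in sizes) / len(sizes)


def bootstrap_mean_square(
    sizes: Sequence[int], n_boot: int = 1000, seed: int = 0
) -> List[float]:
    """Bootstrap distribution of the mutation mean square.

    Each replicate resamples len(sizes) mutation sizes with replacement.
    """
    rng = random.Random(seed)
    n = len(sizes)
    return [
        mutation_mean_square([rng.choice(sizes) for _ in range(n)])
        for _ in range(n_boot)
    ]


# Hypothetical somatic mutation sizes at one locus, for illustration only.
observed = [1, -1, 1, 2, -1, -2, 1, 5]
replicates = sorted(bootstrap_mean_square(observed, n_boot=2000, seed=42))
# Rough 95% percentile interval from the sorted replicates.
lower, upper = replicates[49], replicates[1949]
```

The spread of the replicates gives a sense of the sampling error only; it says nothing about how well somatic mutations stand in for germ-line ones.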
Identifying "anomalous" loci:
Equation A1 in Appendix A shows that at a given locus the expected population variance depends linearly on features of the mutation mechanism (the mutation mean square, $\bar{\sigma}^2$, and the mutation rate, µ) and a feature of the population demographic history [the expected coalescence time for a pair of genes at the locus, E(T12)]. As noted above, the use of the NPV facilitates interlocus comparison by "correcting" for an estimate of mutation mean square at the locus. For example, microsatellites with large variability in mutation size, such as D10S526, are expected to show greater population variance, and allowance should be made for this effect when pooling results across loci. However, reliable empirical estimates of locus-specific mutation rates are not available and an analogous correction is not feasible. We now utilize the estimated distributions of mutation size to investigate the interlocus variability of mutation rate.
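Write NPV(l, p) for the NPV value at locus l in population p and $k_p$ for the number of loci tested in population p, and define

$$\overline{\mathrm{NPV}}(p) = \frac{1}{k_p} \sum_{l} \mathrm{NPV}(l, p),$$

the average NPV value across loci within population p. Now, writing $T^{(p)}_{12}$ for the mean pairwise coalescence time at an autosomal locus in population p, $\mu_l$ for the mutation rate at locus l, and $\bar{\mu}$ for the average mutation rate across these loci,

$$E[\mathrm{NPV}(l, p)] = \mu_l T^{(p)}_{12} \quad \text{and} \quad E[\overline{\mathrm{NPV}}(p)] = \bar{\mu}\, T^{(p)}_{12},$$

so that

$$\hat{r}(l, p) = \frac{\mathrm{NPV}(l, p)}{\overline{\mathrm{NPV}}(p)}$$

is a natural ratio estimator of $\mu_l/\bar{\mu}$ [the use of $\hat{r}(l, p)$ was suggested to us by M. STEPHENS, personal communication]. Fig 4 plots the values of $\hat{r}(l, p)$ for the data set of ![]()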
The most striking feature of Fig 4 is that the bar for locus D19S244 is much higher than that for any other locus in the corresponding samples. The difference is significant: a permutation test (![]()
$\hat{r}(l, p)$ values across loci within populations has P = 0.02. The variability of the estimator $\hat{r}(l, p)$ will depend on the true demographic scenario. It could be substantial, especially in a population of constant size. We note, however, that the use of the permutation test, and hence our conclusion that D19S244 is unusual, is valid regardless of the magnitude of such variability.
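The ratio estimator discussed above, the NPV at a locus divided by the average NPV across loci in the same population, can be sketched as follows. The function name, locus labels, and NPV values are hypothetical. The permutation test is not sketched here, because its details are not recoverable from this section.

```python
from typing import Dict


def ratio_estimates(npv_by_locus: Dict[str, float]) -> Dict[str, float]:
    """Divide each locus's NPV by the mean NPV across loci in one population.

    Each value estimates the locus mutation rate relative to the average
    mutation rate across the loci supplied.
    """
    if not npv_by_locus:
        raise ValueError("at least one locus is required")
    mean_npv = sum(npv_by_locus.values()) / len(npv_by_locus)
    if mean_npv <= 0:
        raise ValueError("mean NPV must be positive")
    return {locus: npv / mean_npv for locus, npv in npv_by_locus.items()}


# Hypothetical NPV values for one population, for illustration only.
npv = {"locus_A": 2.0, "locus_B": 3.0, "locus_C": 10.0}
ratios = ratio_estimates(npv)
outlier = max(ratios, key=ratios.get)  # locus with the largest ratio
```

By construction the ratios average to one within a population, so a locus with a ratio far above one stands out against the others in the same sample.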
We propose three possible explanations for this observation. The first is that this locus has a substantially larger mutation rate than the other loci in the first data set. The second is that it has been affected by natural selection: if natural selection acted at or near the locus in all populations, it may systematically increase (balancing selection) or decrease (background selection or a selective sweep) the expected average coalescence time. The former would result in large $\hat{r}(l, p)$ values for that locus in all populations, and hence in a large bar in Fig 4, as for D19S244. (The latter forms of selection would have the opposite effect, leading to a small bar in Fig 4.) A third potential explanation is that the mutation mean square at that locus was substantially underestimated, hence inflating the NPV values in all populations.
A high rate of de novo mutations has been observed at the D19S244 locus in family studies. We therefore consider it most likely that the large $\hat{r}(l, p)$ value for the D19S244 locus reflects a substantially higher mutation rate for that locus. While the second data set (Fig 4) also suggests heterogeneity across loci, no locus appears to be markedly different from the others.
As shown in ![]()
Using the NPV to make inferences about human evolution:
Parameters of natural interest include the effective population size, in the case of a constant-sized population, or the timing or rate of changes in population size, under other demographic scenarios. It follows from Equation A1 in Appendix A that the average NPV value across loci within a population is an unbiased estimator of the product $\bar{\mu}E(T_{12})$, where $\bar{\mu}$ is the average mutation rate of the loci in the study, and $E(T_{12})$ is the expected coalescence time of a pair of genes at the locus. For a constant-sized population, the expected average coalescence time equals the effective number of chromosomes in the population, N, while for a population that was initially very small before undergoing a rapid and substantial growth in size T generations ago, it is approximately equal to the time since the expansion (T).
One scenario of interest in connection with human evolution would be a population of nontrivial size that grows rapidly to become quite large. Then the expected average coalescence time would be larger than the time since growth, so that it would provide an upper bound on that time (provided, as seems plausible for humans, that the time since growth was less than the effective population size after growth).
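To make the calibration concrete, the sketch below converts an average NPV into an estimate of the expected pairwise coalescence time by dividing by an assumed average mutation rate. Under constant size this is read as the effective number of chromosomes, and under sudden expansion from a small size as the time since expansion in generations. The function name, the NPV values, and the mutation rates are assumptions chosen for illustration, not values from the article.

```python
import math
from typing import Sequence


def coalescence_time_estimate(
    npv_values: Sequence[float], mean_mutation_rate: float
) -> float:
    """Estimate E(T12) as (average NPV across loci) / (average mutation rate)."""
    if not npv_values:
        raise ValueError("at least one NPV value is required")
    if mean_mutation_rate <= 0:
        raise ValueError("mean_mutation_rate must be positive")
    return (sum(npv_values) / len(npv_values)) / mean_mutation_rate


# Hypothetical per-locus NPV values and assumed mutation rates per generation.
npv_values = [4.0, 6.0]
estimate = coalescence_time_estimate(npv_values, 1e-3)
estimate_doubled_rate = coalescence_time_estimate(npv_values, 2e-3)

# Doubling the assumed mutation rate halves the estimated time or size.
assert math.isclose(estimate_doubled_rate, estimate / 2)
```

The final assertion shows why the choice of average mutation rate matters so much: the estimate scales inversely with it.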
The present data set also confirms our previous finding that the coalescence time of the overall pooled sample is very similar to the coalescence times of the individual populations. This strongly suggests that populations from different ethnic groups share a substantial portion of their genetic ancestry and is in agreement with previous studies indicating that only a small proportion of human genetic diversity occurs between populations.
Testing demographic scenarios:
Aside from estimating parameters, under the assumption that a particular demographic scenario applies, multilocus microsatellite data allow testing of demographic scenarios. It turns out that the expected variability, across loci, in statistics such as the population variance or the NPV changes considerably under different demographic scenarios. For example, under the null hypothesis of constant population size, relatively large values of the variance of NPV across loci would be expected, while under scenarios of recent population growth this variance will be smaller. ![]()
Here, we apply our method to the second data set. In the light of the earlier analysis suggesting an unusual status for the locus D19S244, we also reapply the method to the first data set, both with and without D19S244. The details of the hypothesis testing procedure are described in Appendix B. Table 4 gives the P values for three different test statistics, F1, F2, and g, defined in Equations B1, B2, and B3. The statistic g was effectively introduced by ![]()
Our test rejects the null hypothesis of constant population size for large values of the test statistics, corresponding to the data showing significantly less variability across loci than would be expected under this demographic scenario (see Table 4). Whatever the true demographic scenario, variation in mutation rates across loci will tend to increase the variation in population variances and NPVs across loci, thus decreasing the value of all three test statistics. While this means that it is conservative to calculate P values under the assumption of the same mutation rate for all loci, this can involve a considerable loss of power. We believe it is more helpful to calculate P values separately under a range of assumptions (see Table 4 for details) about the variation in mutation rate and have done so.
While the extent of variation in mutation rate across microsatellite loci has not been well documented, either our "medium" or our "high" variability scenario may be the most realistic.
All three test statistics lead to valid tests. The differences in P values on the same data are due to differences in their power to detect departures from the null hypothesis. The statistic F2 is slightly more powerful than F1. We should expect the statistics based on NPV (F1 and F2) to be more powerful than one based simply on population variance (g), exactly because the normalization (division by mutation mean square) corrects for one source of variation before combining data across loci. Nonetheless, the difference in power between g and the two F statistics is striking, particularly if one were to use the procedure based on g without making allowance for variation in mutation rate.
| DISCUSSION |
|---|
Somatic and germ-line mutations:
An understanding of microsatellite mutation patterns is central to their use for the accurate reconstruction of population processes. We have developed and validated an experimental approach to estimate the distribution of mutation sizes for each individual microsatellite locus. These distributions were estimated from somatic mutations observed in the tumor tissue of patients with sporadic colorectal cancer.
It is not known whether such mutations arise from the same events that produce variation in the normal population. Microsatellite instability in some cancer patients may reflect defects in mismatch repair; but, in other patients, it may be a consequence of the higher number of cell divisions that occurs in the tumor compared to the normal tissue. Nevertheless, in the absence of specific mechanistic or genetic information on the source of these mutations, it is still possible to test whether they reflect the mutation process in the general population by using population theory. Here, we demonstrate that under the generalized stepwise model with arbitrary distribution of mutation sizes, the relationship between the variance of repeat number at a given locus in a population sample and the mutation mean square for the same locus is expected to be linear regardless of assumptions about the demographic history of the population (see Appendix A). Therefore, if the mutation mean square estimated in cancer patients parallels that of the "real" mutation process, one expects it to be linearly related to the variance of repeat number of different population samples. Three out of the four population samples examined in this article conform to this expectation. This observation extends our previous findings of a linear relationship between the population variance and the mutation mean square for an additional three population samples. Taken together, the results of these two studies indicate that the somatic mutations observed in sporadic colorectal cancer patients provide a useful approach to the characterization of the mutation process of microsatellite loci on a locus-by-locus basis.
Even though most of the loci show a preponderance of short mutations, i.e., gain or loss of one or two repeat units, our estimated distributions of mutation sizes (Fig 1) are relatively broad for a small subset of the loci examined. To investigate whether mismatch repair defects result in unusually large mutation sizes, we partitioned the patients into two groups. The first group includes patients with high levels of microsatellite instability (at least 20% of loci tested had somatic mutations). These patients are more likely to have mismatch repair defects, and this was recently confirmed by staining their tumor tissue with antibodies against MSH2 and MLH1 (A. DI RIENZO, K. HALLING and S. THIBODEAU, unpublished results; ![]()
Furthermore, a similar broad range of mutation sizes (e.g., from -12 to +11 repeat units) was observed in the largest survey reported to date of de novo mutations in family studies (1107 events over 952,962 parent-offspring transmissions; ![]()
In addition to the study of germ-line de novo or somatic mutations, another approach to understanding the mutation processes is to examine the variation at tightly linked microsatellite loci. For example, the analysis of multilocus haplotypes carrying the CCR5-Δ32 allele showed that 9.5% of the alleles at locus D3S4580, located 28 kb from CCR5, differ from the most common one by 4 to 10 repeat units. Detailed haplotype analysis revealed that this pattern cannot be easily explained by recombination and is more consistent with occasional large mutations (J. MARTINSON, personal communication; ![]()
Overall, our finding in this and the preceding article of a significant rank correlation between population variance and mutation mean square estimated from the cancer data in five of the six population/loci pairs we have examined would seem extraordinarily unlikely if, in fact, the cancer data were uninformative for the germ-line processes. Further, the results of our analyses of human demography, utilizing the mutation mean squares estimated from the cancer data, are in broad agreement with those of analyses of other genetic systems.
Identifying anomalous loci:
Here, we developed a method for identifying loci that are anomalous, either in the sense of having a different mutation rate from others in the study or because their evolution is not governed by the class of (neutral, generalized stepwise) models on which the analysis is based.
We identified one such locus in our studies, D19S244. In the light of independent evidence as to its unusually high mutation rate, we regard this as the most likely explanation for its status as an outlier.
More generally, this method has the potential to detect loci at or near which natural selection has acted. Recall that balancing selection acting near a locus will increase the observed $\hat{r}(l, p)$ value at that locus relative to others in the same population, whereas background selection or a selective sweep will decrease it. In our analysis the effect of natural selection is confounded with a higher mutation rate at the locus. One way of distinguishing between the effects of selection and mutation rate changes would be to examine tightly linked microsatellite loci near the anomalous one. Selection should have an effect in the same direction on all such loci. If the original outlier results from an unusually high or low mutation rate, the effect should not extend to linked loci.
Inferences for population parameters and population history:
Under general assumptions, the average NPV value across loci within a population is an unbiased estimator of $\bar{\mu}E(T_{12})$, the product of the average mutation rate for the loci and the mean pairwise coalescence time. For the two simplified demographic scenarios of constant population size and sudden expansion from a small size, this leads naturally to estimators for the effective population size and the time since expansion, respectively. The estimates shown in Table 3 are in line with those obtained based on other studies suggesting an ancient expansion of the human population.
Several points about this estimation procedure are noteworthy. The first is that recovery of time or population size estimates is dependent on assumptions about the average mutation rate for the loci used. Direct empirical evidence is scanty, yet a change by a factor of two, for example, in this average rate will change estimated times or sizes by a factor of one-half. This problem, effectively one of calibrating mutational events into numbers of generations, afflicts all estimates of such parameters from microsatellite data. Particularly in view of the current speculative nature of such calibrations, the actual value of such estimates should be interpreted with caution. We have presented point estimates with no attempt at assessing the precision of the estimates or, equivalently, of giving confidence intervals. In large part, this is because of the difficulty with the calibration just described. In addition, while the estimator is unbiased for the compound parameter $\bar{\mu}E(T_{12})$ rather generally, its sampling properties, and in particular its precision, will depend sensitively on the underlying demographic scenario. Finally, in our approach we have estimated the mutation mean square at each locus. This estimation also carries uncertainty, in the usual way through sampling, but in addition because we are only measuring a surrogate for the germ-line parameter. While we have used the bootstrap to quantify the former uncertainty, the latter is problematical. It thus does not seem straightforward to quantify the uncertainty in these kinds of estimates of population parameters. On the other hand, it is clear that the uncertainty is large, and in the absence of further relevant data any point estimate based on microsatellite data should be interpreted with great caution.
We performed significance tests of the null hypothesis of constant population size for both data sets. Taking the effective population size as 10,000 individuals, allowing medium variability in mutation rate across loci, and ignoring the locus D19S244 shown above to be anomalous, the null hypothesis would be rejected, in favor of scenarios involving population growth, for all populations in the first data set and for all but the African population in the second data set. If there were more variation in mutation rates across loci (our "high" variability scenario), then the African population in the second data set also becomes highly suggestive of population expansion (P = 0.07).
The mutation process at microsatellite loci is clearly complex. Accordingly, we chose to use an analytical approach that takes into account most of these complexities. There are related recent approaches that use microsatellite data to estimate demographic histories.
A signal of population expansion has been observed in virtually all major ethnic groups for mtDNA.
| ACKNOWLEDGMENTS |
|---|
We thank D. Barch, G. K. Haines, and B. Sisk for help in sample collection; R. R. Hudson and C. Ober for comments on the manuscript; M. Stephens for helpful discussions; and L. Jorde for providing a file containing the original population data. This work was supported in part by grants from the National Science Foundation (SBR-9317266 to A.D. and DMS-9505129 to P.D.) and the American Cancer Society, Illinois Division, and Digestive Disease Research Center (DK-42086) to A.D. and a UK Engineering and Physical Sciences Research Council Advanced Fellowship (B/AF1255) to P.D.
Manuscript received July 29, 1999; Accepted for publication November 29, 1999.
| APPENDIX A |
|---|
Writing $S^2$ for the sample variance of allele length in a sample from a population at a particular locus, we show here that, for any demographic scenario,

$$E(S^2) = \bar{\sigma}^2 \mu E(T_{12}), \qquad \text{(A1)}$$

where $\bar{\sigma}^2$ is the mutation mean square (the expected squared size of the change in allele length caused by mutation), µ is the mutation rate at the locus, and $E(T_{12})$ is the expected coalescence time (in generations) for a pair of genes at the locus. The effect of the demographic scenario enters through its effect on $E(T_{12})$. This extends a result of ![]()
Note that for a population with constant effective size N chromosomes, $E(T_{12}) = N$, and for one that has expanded rapidly from a very small size T generations ago, $E(T_{12}) \approx T$. The general result (A1) thus reduces to known results [for example, ![]()
Throughout, we assume the generalized stepwise model for mutation, namely that neither the mutation rate nor the distribution of the change in allele length caused by mutation will depend on the length of the progenitor allele, and we assume selective neutrality. Aside from this, we allow arbitrary distribution of mutation sizes and, as we have noted, an arbitrary demographic scenario.
Recall from Equation A9 of ![]()
![]() |
(A2) |
where Y1 and Y2, respectively, are the differences between the lengths of two sampled copies of the locus and the length of their most recent common ancestor.
Now, write W1 for the number of mutations on the lineage to the first sampled chromosome since its common ancestor with the second, and W12 for the total number of mutations along either lineage since their common ancestor. Conditional on T12, the number of generations since this common ancestor, W1 and W12 have binomial distributions with parameters (T12, µ) and (2T12, µ), respectively. In particular, conditional on T12, the means of W1 and W12 are µT12 and 2µT12, respectively, and their respective variances are µT12(1 - µ) and 2µT12(1 - µ). Since µ is small, we approximate these conditional variances by µT12 and 2µT12, respectively.
As in ![]()
![]() |
(A3) |
and
![]() |
(A4) |
where m and $\sigma^2$ are, respectively, the mean and variance of the distribution of mutation size.
Now,
![]() |
(A5) |
Further,
![]() |
(A6) |
Analogously,
![]() |
(A7) |
The result (A1) now follows on substituting (A5) and (A6) into (A3) and (A7) into (A4), before substituting the resulting expressions into (A2). (Recall that $\bar{\sigma}^2 = m^2 + \sigma^2$.)
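One way to fill in the algebra, consistent with the definitions in this appendix, is sketched below. The particular decomposition of $(Y_1 - Y_2)^2$ is an assumption chosen for illustration and need not match the form of the displayed Equations A2 to A7. It uses the standard fact that the expected sample variance equals half the expected squared difference between two sampled copies, the symmetry $E(Y_1^2) = E(Y_2^2)$, and the approximations to the conditional variances of $W_1$ and $W_{12}$ stated above.

```latex
\begin{align*}
E(S^2) &= \tfrac{1}{2} E\!\left[(Y_1 - Y_2)^2\right]
        = 2E(Y_1^2) - \tfrac{1}{2} E\!\left[(Y_1 + Y_2)^2\right], \\
E(Y_1^2) &= \sigma^2 E(W_1) + m^2 E(W_1^2), \\
E\!\left[(Y_1 + Y_2)^2\right] &= \sigma^2 E(W_{12}) + m^2 E(W_{12}^2), \\
E(W_1) &= \mu E(T_{12}), &
E(W_1^2) &\approx \mu E(T_{12}) + \mu^2 E(T_{12}^2), \\
E(W_{12}) &= 2\mu E(T_{12}), &
E(W_{12}^2) &\approx 2\mu E(T_{12}) + 4\mu^2 E(T_{12}^2), \\
E(S^2) &\approx 2\left[(\sigma^2 + m^2)\,\mu E(T_{12}) + m^2 \mu^2 E(T_{12}^2)\right] \\
       &\quad - \tfrac{1}{2}\left[2(\sigma^2 + m^2)\,\mu E(T_{12}) + 4 m^2 \mu^2 E(T_{12}^2)\right] \\
       &= (\sigma^2 + m^2)\,\mu E(T_{12}).
\end{align*}
```

The terms in $E(T_{12}^2)$ cancel, which is why the result depends on the demographic scenario only through $E(T_{12})$. The factor $\sigma^2 + m^2$ is the mutation mean square, so the last line has the form of (A1).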
| APPENDIX B |
|---|
We describe here the details of the significance tests of demographic scenarios used in the article. Write $S^2$ and V, respectively, for the population variance and the normalized population variance, L for the number of loci in the data, and $\bar{\sigma}^2$ and $\bar{\sigma}^4$, respectively, for the mutation mean square and the fourth moment of the distribution of mutation size. We use an overbar to denote the average of a quantity across loci in the data set, and Var to denote its variance across loci. Thus, for example, $\overline{S^2}$ and Var($S^2$) denote the average value and variance of $S^2$ across loci.
The three test statistics we consider are
![]() |
(B1) |
![]() |
(B2) |
and
![]() |
(B3) |
The statistic g is the reciprocal of the g statistic introduced by ![]()
), the statistic F1 does not include a linear term in
. The reason for the somewhat involved definition of F2 is that we are aiming to rid the numerator of any dependence on (
), and under the assumption of constant population size,

For each statistic the null hypothesis is rejected for large values of the test statistic, corresponding to smaller variation across loci in the normalized or unnormalized population variance than would be expected under the null hypothesis of constant population size.
For the values of L and n (the number of chromosomes in the sample) appropriate for each part of our data sets, we evaluated the null distribution of g by simulating 30,000 realizations of evolution with constant population size,
2Nµ = 4 (where N is the number of chromosomes in the population), a simple stepwise mutation model, and no variation in either the mutation rate or the mutation mechanism across loci. For each of the scenarios in which there is variation across loci in the mutation rate, we found the null distribution of g by repeating these simulations, except that a population size of N = 10,000 was assumed, and in the simulations the mutation rate for each locus was chosen randomly according to Moderate variability:

Medium variability:

High variability:

(If variation in mutation rate across loci is assumed, the null distribution of g depends explicitly on N. With no such variation, it depends only on
.)
In each case, the simulated null distribution for g (under the appropriate assumption about variability in µ) was also used as the null distribution for F1 and F2. Significance levels for each of the three statistics were calculated as the percentage of times in the relevant simulation that the simulated value of g was larger than the observed value of the test statistic.
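The significance calculation just described reduces to counting how often the simulated null values exceed the observed statistic. A minimal sketch follows; the function name and the toy null values are hypothetical, and the coalescent simulation that would generate the null distribution of g is not reproduced here.

```python
from typing import Sequence


def simulated_p_value(null_values: Sequence[float], observed: float) -> float:
    """Fraction of simulated null values strictly larger than the observed value.

    Large observed values give small P values, matching a test that rejects
    the null hypothesis for large values of the test statistic.
    """
    if not null_values:
        raise ValueError("the simulated null distribution is empty")
    exceed = sum(1 for value in null_values if value > observed)
    return exceed / len(null_values)


# Toy null distribution, for illustration only (the article uses 30,000
# simulated realizations per setting).
toy_null = [0.4, 0.7, 0.9, 1.1, 1.3, 1.8, 2.2, 2.9, 3.5, 4.1]
p_value = simulated_p_value(toy_null, observed=3.0)
```

With 30,000 realizations the smallest nonzero P value that can be reported this way is 1/30,000.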
Our principal interest focuses on the use of the statistics F1 and F2. To establish the validity of our procedure we carried out extensive simulations to check that the nominal P values calculated as just described were conservative, in the sense that the actual probability of a type I error does not exceed the nominal level, under more general assumptions about the processes involved.
First, we simulated from the null distribution of the F statistics under exactly the same assumptions as for g. Next, we weakened the assumption of simple stepwise mutation at each locus, simulating instead using the "two-phase" model introduced in ![]()

For each of several sets of parameters spanning this range, 1000 data sets were simulated (with L = 15, n = 100).
Next, we introduced possible variation in the mutation mechanism across loci by choosing parameters for the two-phase model independently for each locus, with p being chosen uniformly over various ranges [(0, 1), (0.5, 1), (0.8, 1), and (0.9, 1)], for each of which $\sigma^2_g$ was chosen uniformly over (0, 50), (0, 100), and (0, 200). We also tried several discrete distributions on p and $\sigma^2_g$ in these ranges.
Finally, we allowed for uncertainty in the estimation of the mutation mean square at each locus by repeating the simulations described in the previous paragraph, but, in addition, using a value of $\bar{\sigma}^2$ for the locus that is chosen from a normal distribution, with mean given by the true value of $\bar{\sigma}^2$ (which is specified once the parameters for the mutation model are chosen) and variance $(\bar{\sigma}^2 - 1)^2/4$, independently for each locus. The choice of distribution for the sampling error in our estimates of mutation mean square is motivated by the bootstrap estimates of the sampling variability described in this article.
Each of these sets of simulations was performed under each of the assumptions about variation in mutation rate, with the nominal level of the test set at 0.05. On no occasion was the actual type I error >0.05.
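The check of actual against nominal type I error amounts to counting how often the null hypothesis is rejected across data sets simulated under the null. A minimal sketch, assuming rejection when the nominal P value is at or below the level (the helper name and the toy P values are hypothetical):

```python
def type_i_error_rate(p_values, alpha=0.05):
    """Fraction of null-simulated data sets rejected at nominal level alpha."""
    if not p_values:
        raise ValueError("need at least one P value")
    rejected = sum(1 for p in p_values if p <= alpha)
    return rejected / len(p_values)


# Hypothetical nominal P values from ten data sets simulated under the null.
toy_p_values = [0.01, 0.2, 0.5, 0.04, 0.9, 0.3, 0.6, 0.7, 0.8, 0.1]
rate = type_i_error_rate(toy_p_values)
```

A conservative procedure is one for which this rate, computed over the simulated data sets of a given scenario, stays at or below `alpha`.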
| LITERATURE CITED |
|---|
BOWCOCK, A. M., L. A. RUIZ, J. TOMFOHRDE, E. MINCH, and J. R. KIDD et al., 1994 High resolution of human evolutionary trees with polymorphic microsatellites. Nature 368:455-457.
BRINKMANN, B., M. KLINTSCHAR, F. NEUHUBER, J. HUHNE, and B. ROLF, 1998 Mutation rate in human microsatellites: influence of the structure and length of the tandem repeat. Am. J. Hum. Genet. 62:1408-1415.
BURKE, T. (EDITOR), 1991 DNA Fingerprinting: Approaches and Applications. Birkhäuser Verlag, Basel/Boston.
CHAKRABORTY, R., M. KIMMEL, D. N. STIVERS, L. J. DAVISON, and R. DEKA, 1997 Relative mutation rates at di-, tri-, and tetranucleotide microsatellite loci. Proc. Natl. Acad. Sci. USA 94:1041-1046.
CHARLESWORTH, D., B. CHARLESWORTH, and M. T. MORGAN, 1995 The pattern of neutral molecular variation under the background selection model. Genetics 141:1619-1632.
DEMPSTER, A. P., N. M. LAIRD, and D. B. RUBIN, 1977 Maximum likelihood estimation from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 39:1-38.
DI RIENZO, A. and A. C. WILSON, 1991 Branching pattern in the evolutionary tree for human mitochondrial DNA. Proc. Natl. Acad. Sci. USA 88:1597-1601.
DI RIENZO, A., A. C. PETERSON, J. C. GARZA, A. M. VALDES, and M. SLATKIN et al., 1994 Mutational processes of simple-sequence repeat loci in human populations. Proc. Natl. Acad. Sci. USA 91:3166-3170.
DI RIENZO, A., C. TOOMAJIAN, B. SISK, K. HAINES, and D. BARCH et al., 1995 STRP variation in human populations and their patterns of somatic mutations in cancer patients. Am. J. Hum. Genet. 57(Suppl.):A41.
DI RIENZO, A., P. DONNELLY, C. TOOMAJIAN, B. SISK, and A. HILL et al., 1998 Heterogeneity of microsatellite mutations within and between loci, and implications for human demographic histories. Genetics 148:1269-1284.
DONNELLY, P. and S. TAVARÉ, 1995 Coalescents and genealogical structure under neutrality. Annu. Rev. Genet. 29:401-421.
FELDMAN, M. W., A. BERGMAN, D. D. POLLOCK, and D. B. GOLDSTEIN, 1997 Microsatellite genetic distances with range constraints: analytic description and problems of estimation. Genetics 145:207-216.
GOOD, P., 1994 Permutation Tests. Springer-Verlag, New York.
HARPENDING, H. C., C. S. T. SHERRY, A. R. ROGERS, and M. STONEKING, 1993 The genetic structure of ancient human populations. Curr. Anthropol. 34:483-496.
HARPENDING, H. C., M. A. BATZER, M. GURVEN, L. B. JORDE, and A. R. ROGERS et al., 1998 Genetic traces of ancient demography. Proc. Natl. Acad. Sci. USA 95:1961-1967.
HUDSON, R. R. and N. L. KAPLAN, 1988 The coalescent process in models with selection and recombination. Genetics 120:831-840.
JORDE, L. B., M. J. BAMSHAD, W. S. WATKINS, R. ZENGER, and A. E. FRALEY et al., 1995 Origins and affinities of modern humans: a comparison of mitochondrial and nuclear genetic data. Am. J. Hum. Genet. 57:523-538.
KAPLAN, N. L., T. DARDEN, and R. R. HUDSON, 1988 The coalescent process in models with selection. Genetics 120:819-829.
KIMMEL, M. and R. CHAKRABORTY, 1996 Measures of variation at DNA repeat loci under a general stepwise mutation model. Theor. Popul. Biol. 50:345-367.
KIMMEL, M., R. CHAKRABORTY, J. P. KING, M. BAMSHAD, and W. S. WATKINS et al., 1998 Signatures of population expansion in microsatellite repeat data. Genetics 148:1921-1930.
LEWONTIN, R. C., 1972 The apportionment of human diversity. Evol. Biol. 6:381-398.
LITT, M. and J. A. LUTY, 1989 A hypervariable microsatellite revealed by in vitro amplification of a dinucleotide repeat within the cardiac muscle actin gene. Am. J. Hum. Genet. 44:397-401.
MARTINSON, J. J., E. C. LAWRENCE, M. J. ALEXANDER, M. SWEENEY, and R. E. FERRELL, 1998 A population-based survey of STR allelic association at the chemokine receptor gene CCR5. Am. J. Hum. Genet. 63(Suppl.):A216.
NEI, M., 1987 Molecular Evolutionary Genetics. Columbia University Press, New York.
PENA, S. D. J. (EDITOR), 1993 DNA Fingerprinting: State of the Science. Birkhäuser Verlag, Basel/Boston.
PRITCHARD, J. K. and M. W. FELDMAN, 1996 Statistics for microsatellite variation based on coalescence. Theor. Popul. Biol. 50:325-344.
REICH, D. and D. GOLDSTEIN, 1998 Genetic evidence for a Paleolithic human population expansion in Africa. Proc. Natl. Acad. Sci. USA 95:8119-8123.
REICH, D. E., M. W. FELDMAN, and D. B. GOLDSTEIN, 1999 Statistical properties of two tests that use multilocus data sets to detect population expansions. Mol. Biol. Evol. 16:453-466.
ROE, A., 1992 Correlations and interactions in random walks and population genetics, Ph.D. Thesis, University of London.
ROGERS, A. R., 1995 Genetic evidence for a Pleistocene population explosion. Evolution 49:608-615.
SEIELSTAD, M., X. XU and X. XU, 1999 Direct observations of microsatellite mutations, p. 57 in Human Evolution, edited by L. L. CAVALLI-SFORZA, S. PÄÄBO and D. WALLACE. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
SHRIVER, M. D., L. JIN, R. CHAKRABORTY, and E. BOERWINKLE, 1993 VNTR allele frequency distributions under the stepwise mutation model: a computer simulation approach. Genetics 134:983-993.
SLATKIN, M., 1995 A measure of population subdivision based on microsatellite allele frequencies. Genetics 139:457-462.
SMITH, J. M. and J. HAIGH, 1974 The hitch-hiking effect of a favourable gene. Genet. Res. 23:23-35.














