- THIS ARTICLE
-
Abstract
- Full Text (PDF)
- Alert me when this article is cited
- Alert me if a correction is posted
- SERVICES
- Similar articles in this journal
- Similar articles in PubMed
- Alert me to new issues of the journal
- Download to citation manager
- Reprints & Permissions
- CITING ARTICLES
- Citing Articles via Google Scholar
- GOOGLE SCHOLAR
- Articles by Fu, Y.-X.
- Articles by Chakraborty, R.
- Search for Related Content
- PUBMED
- PubMed Citation
- Articles by Fu, Y.-X.
- Articles by Chakraborty, R.
Simultaneous Estimation of All the Parameters of a Stepwise Mutation Model
Yun-Xin Fua and Ranajit Chakrabortyaa Human Genetics Center, University of Texas, Houston, Texas 77225
Corresponding author: Yun-Xin Fu, Human Genetics Center, University of Texas at Houston, 6901 Bertner Ave., Houston, TX 77030., fu{at}hgc.sph.uth.tmc.edu (E-mail).
Communicating editor: G. B. GOLDING
| ABSTRACT |
|---|
Minisatellite and microsatellite are short tandemly repetitive sequences dispersed in eukaryotic genomes, many of which are highly polymorphic due to copy number variation of the repeats. Because mutation changes copy numbers of the repeat sequences in a generalized stepwise fashion, stepwise mutation models are widely used for studying the dynamics of these loci. We propose a minimum chi-square (MCS) method for simultaneous estimation of all the parameters in a stepwise mutation model and the ancestral allelic type of a sample. The MCS estimator requires knowing the mean number of alleles of a certain size in a sample, which can be estimated using Monte Carlo samples generated by a coalescent algorithm. The method is applied to samples of seven (CA)n repeat loci from eight human populations and one chimpanzee population. The estimated values of parameters suggest that there is a general tendency for microsatellite alleles to expand in size, because (1) each mutation has a slight tendency to cause size increase and (2) the mean size increase is larger than the mean size decrease for a mutation. Our estimates also suggest that most of these CA-repeat loci evolve according to multistep mutation models rather than single-step mutation models. We also introduced several quantities for measuring the quality of the estimation of ancestral allelic type, and it appears that the majority of the estimated ancestral allelic types are reasonably accurate. Implications of our analysis and potential extensions of the method are discussed.
SINCE the discovery that a large number of loci with tandemly repeated sequences in human and many eukaryote species are highly polymorphic because of copy number variation of the repeats in different individuals (JEFFREYS 1985; ![]()
![]()
![]()
To avoid misinterpretation when using the information from these loci, understanding the dynamics of polymorphism at minisatellite and microsatellite loci is important. It is also vital for population and evolutionary study. Important for better understanding of the evolution of such loci is the estimation of relevant population parameters. There are several parameters of a population that can affect the pattern of polymorphism in a sample. The most important is
= 4Nµ, where N is the effective population size and µ is the mutation rate per locus per generation.
is primarily responsible for the amount of polymorphism in a sample. Under the infinite-allele model, that is, each mutation at a locus creates a new allele in the population,
is the only parameter for the distribution of polymorphism in a sample from a steady population. EWENS' (1972) sampling formula provides the basis for estimating
from a single quantitythe number of alleles in a sample. However, under SMM, the pattern of polymorphism in a sample becomes more complex and depends on additional parameters, the number of which depends on the complexity of the mechanism of evolution for such a locus. Unfortunately, a sampling distribution for a SMM parallel to EWENS' (1972) has not been found, resulting in difficulty in making inference under the SMM. Since the values of parameters are necessary in interpreting the evolution of a locus under the SMM, proper estimation of parameters is critical in studying the mechanism of evolution of loci under the SMM. To date, no method is available for simultaneously estimating all the parameters of the SMM, which limits the usefulness of such loci for studying the history of a population, particularly the human population.
We develop in this article a method for simultaneous estimation of all the parameters of the SMM and the ancestral allelic type of the alleles in a sample. The estimator is a combination of the minimum chi-square estimator (MCS) and Monte Carlo simulation, taking advantage of fast coalescent algorithms. We apply the method to the samples of dinucleotide repeats of ![]()
| STEPWISE MUTATION MODELS |
|---|
Let
ij be the probability that a mutation causes an allele size change from i to j. For a stable population, which is assumed throughout this article, a SMM is completely specified by the distribution
ij and the mutation parameter
= 4Nµ, where N is the effective population size and µ is the mutation rate per allele per generation. Following the introduction of the SMM by ![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
Although most studies on SMMs assume either a single- or two-step SMM, there are as many SMMs as different distributions for
ij. One problem is that many SMMs can result in patterns of polymorphism that are practically indistinguishable. As a result, the choice of the distribution
ij is not trivial. Distributions that are sufficiently flexible and depend on few parameters, each having clear biological meaning, should be preferred. Although the method developed in this article for estimating parameters of a SMM applies to any distribution for
ij, we shall consider a distribution
ij that is homogeneous for a different value of i, partly because such a model has fewer parameters and partly because only the relative sizes of alleles in the samples analyzed later are known. We shall consider the following distribution:
![]() |
(1) |
1 and 0 < P < 1. We note that because
j>i
ij =
, the probability of a mutation increasing allele size is
, and the probability of a mutation decreasing allele size is 1 -
. In particular,
= 1 implies that mutations always increase allele size,
= 0 implies that mutations always decrease allele size, and
= 0.5 implies that there is equal chance of increasing and decreasing allele size.
It follows from the distribution (1) that given the direction of size change (increasing or decreasing) the size of a change is given by geometric distribution (1 - P)P|i-j|-1. A small value of P implies that the size of a change is likely small and a large value of P means that the size of a change can be large. An example of
ij is given in Figure 1. Note that the geometric distribution (1-P)P|i-j|-1 does not have maximum size (step) for a size change. Any size of change is possible at least theoretically. Although we can consider a truncated geometric distribution that imposes a maximum size of change, doing so will introduce another parameter. We note that, although any size change is possible under a geometric distribution, the probabilities for most changes of large size are small and therefore negligible in practice. For this reason, it is simpler to consider an effective number of steps rather than to impose an absolute maximum step. We define the effective number of steps of a SMM to be the smallest integer s such that when
ij <
, |i - j| > s for some threshold value
, we shall use
= 10-3. For our model, s increases with P and is the largest integer that is not larger than

Figure 2 plots the effective number of steps against the value of P.
|
|
Given that a locus evolves according to the SMM described above, the values of three parameters
,
, and P then determine the values of various moments of allele frequencies at equilibrium. Therefore, in general, moments computed from a sample can be used to estimate these parameters. Because moments are computed from allele frequencies of a sample, allele frequencies thus contain more information about the parameters than a set of moments do, and consequently the accuracy of estimation directly based on allele frequencies should be higher than that based on a small set of moments. However, ![]()
The sequences (chromosomes) of a sample of size n can be traced back to their most recent common ancestor (MRCA), who lived on average 4N(1 - 1/n) generations ago. It is obvious from a coalescent point of view that the distribution of allele frequencies in a sample is entirely determined by the allele A possessed by the MRCA and the three parameters
,
, and P of SMM. The allele frequencies of a population in the more distant past are irrelevant to the allele frequencies in a sample once the ancestral allele A is known or specified. This observation implies that inference on parameters
,
, and P based on allele frequencies in a sample can be made once A is specified.
In addition to the importance of the ancestral allele A in determining the distribution of allele frequencies in a sample, the value of A for a sample is itself of great interest in studying the evolution of the locus from which alleles were sampled. It is thus desirable to be able to infer the value of A as well as the values of
,
, and P from a sample. Such an inference would be difficult if it were based on a set of moments only. In this article, we treat A as a parameter whose value is to be estimated from a sample. For convenience, we collect these four parameters in a vector
![]() |
(2) |
can be estimated from a sample. MINIMUM CHI-SQUARE ESTIMATOR OF ![]() |
|---|
Let fi, i = 1, ... be the number of alleles of size i in a sample of n chromosomes. Our aim is to derive an estimate of
from {fi}.
There are two widely used approaches for estimating parameters: the maximum likelihood method and the least-squares method. Often both methods result in similar estimates and share many properties that are desirable. The maximum likelihood method computes the probabilities of allele frequencies {fi} given the value of
. The computation is difficult and time-consuming, but the likelihood approach has the advantage of being able to test hypotheses. On the other hand, least-squares-based methods compute the means and perhaps the variances and covariances of allele frequencies, which are much easier to compute than probabilities. Therefore, least-squares-based methods are generally easier to use in practice and are particularly appealing when a large number of samples need to be analyzed. The major disadvantage is that they are difficult to extend for hypothesis testing.
Let ei(
), i = 1, ... be the expected number of alleles of size i conditional on the value of
, i.e., ei(
) = E(fi|
). Our strategy is to find the value of
that minimizes the quantity
![]() |
(3) |
) is a function of
, and g > 0. When wi = 1 and g = 2, the value of
that minimizes L is the well-known least-squares estimator. However, since ei(
) is not linear with the parameters in
, the least-squares estimator is not necessarily a good choice. For a discrete distribution, a generalized least-squares method, often referred to as the MCS estimator, is usually a better one. The MCS estimator corresponds to g = 2 and wi = e-1i(
). Therefore, the MCS estimate of
is the value of
that minimizes the function ![]() |
(4) |
, but a random variable whose distribution depends on
. Nevertheless, we expect the MCS estimator to be a reasonably good estimator of
. Note that the MCS estimator was used in
Although some optimization procedures with constraints can be adapted to search for the MCS estimate, a simple grid search is sufficient here because there are only four parameters. What complicates this seemingly straightforward procedure is that a formula or an even numerical solution for ei(
) is not available at present. Therefore, their values have to be estimated from simulated samples. Estimating ei(
) for a large number of combinations of parameter values can be very time consuming even when samples are generated by a fast algorithm from coalescent theory (![]()
![]()
![]()
) for each
. The second step is to carry out a fine-scale grid search in the small area identified by the first step. In the fine-scale search, a relatively large number of Monte Carlo samples is used to obtain more accurate estimates of ei(
), and thus more accurate MCS estimates. When there are many samples to be analyzed, an alternative approach is to create a database of {ei} for all reasonable combinations of parameter values, and estimation for each sample will retrieve the values of {ei} from the database. This latter approach is the one we used in our analyses of 63 samples of dinucleotide repeats.
Let us consider how ei(
) can be estimated. Suppose we have M simulated samples of size n given the value of
, and suppose the number of alleles of size i in the jth sample is nij. Then we can use the mean êi of nij as an estimate of ei(
). That is,
![]() |
(5) |
= (
, P,
, A) can be simulated as follows:
First, a genealogy of n sequences (alleles) is generated using a coalescent algorithm (e.g., ![]()
. Second, assign a value to A, which is by definition the allele at the root of the genealogy; then determine the resulting allele of each mutation that occurred on the genealogy. Obviously the exercise requires knowing the allele type before a mutation and can be accomplished by starting from the root and progressing toward the tips of the genealogy. In this process, the type of a new mutant allele is simulated according to the distribution {
i}, which is completely specified by the values of P and
.
Table 1 shows examples of the values of êi and their standard errors for different numbers of Monte Carlo samples. Note that the majority of estimates are reasonably accurate even when M = 500. These results as well as many other simulation experiments we performed suggest that for the purpose of identifying the vicinity of the MCS estimate, 5001000 Monte Carlo samples is usually sufficient, and 10,000 Monte Carlo samples is adequate for a fine-scale grid search to obtain the final MCS estimate.
|
We define ti(n) = ei/n as the proportion of alleles being size i and measure the closeness between {ti(n)} for two different sample sizes by Euclidean distance, i.e.,

where e'i is the expected number of alleles i in a sample of size n'. An interesting observation that we made in our simulations is that values of {ti(n)} become steady rapidly even for a modest sample size. Table 2 shows several examples of Euclidean distances between {ti(n)} of different sample sizes. Note that the difference between them when sample size is larger than 100 is extremely small. This observation, although not too surprising from the viewpoint of coalescent process, is a reward for our MCS method, because it means that one can convert with excellent accuracy the estimate êi for a sample of size n to the corresponding estimate for a sample of size m by mêi/n, saving considerable computer cpu time when many samples are to be analyzed.
|
| APPLICATIONS |
|---|
Since the discovery of highly polymorphic CA-repeat loci (![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
|
Our task of estimation appears to be challenging at first glance because there are 63 samples to be analyzed, each requiring a considerable amount of computer cpu time. The observation we made earlier that the proportion of alleles of a given size is rather insensitive to sample size is of great help. Instead of generating a large number of samples for estimating ei for each of the 63 samples, we choose to obtain good estimates of ei/n for a sample of 100 chromosomes only and rescale them to obtain ei for samples of different sizes. We selected
= 0.2(0.2)10 (i.e., 0.2, 0.4, ..., 9.8, 10), P = 0.01(0.01)0.10(0.05)0.9, and
= 0.5(0.5)1.0, a total of 14,500 combinations of values of the three parameters. For each set of parameters, we generated 20,000 independent samples and obtained estimates of ei/n from (5). Note that we do not need to estimate ei/n for
< 0.5 separately, because they can be obtained from those for
> 0.5 because of symmetry. Therefore, we effectively obtained estimates of ei/n for 29,000 sets of parameters. These estimates were stored in a database and can be retrieved easily.
For each of the 63 samples, we performed a grid search over all the parameter sets to obtain a MCS estimate of
. In other words, for each of 29,000 parameter sets, we computed the
2 value, updated the minimum
2 value and corresponding parameter values, and obtained the MCS estimate after all the parameter sets had been examined. This fine-scale grid search still requires nontrivial computer cpu time but is quite manageable. The estimation results are given in Table 3.
|
Note that 26 of the 63 samples show contraction (
< 0.5) in allele sizes and 37 show expansion (
> 0.5) in allele sizes (Table 3). This suggests that there is a slight bias of mutations toward expansion. Table 3 also suggests that most of the loci evolve in a multi-stepwise fashion. We divided samples into two groups, one showing allele size contraction and the other expansion; we then found that P values of the latter group are considerably larger than those of the former group, which means that size change during expansion is likely greater than that during contraction. This result implies that if the probability
of expanding allele size by a mutation is the same as or even slightly smaller than the probability of contracting allele size, the alleles in a population will still tend to increase in size. Because Table 3 shows that
in the majority of samples is larger than 1/2, there is a tendency, stronger than that suggested by
alone, that most loci are expanding in allele size.
![]()
![]()
![]()
Let
ij be the value of
for the ith population at the jth locus. If these loci are selectively neutral, then it would be reasonable to assume that the mutation rate at each locus is the same for different populations. Therefore, under the neutrality assumption,
ij/
ij' = Nj/Nj', where Nj and Nj' are the effective population sizes of the jth and j'th populations, respectively. That is, the ratio of
s of two different populations is independent of the locus studied. If estimates of
are accurate, then we should expect to see a consistent value of ratio of
s over different loci. The estimated
s in Table 3 show that this ratio for most pairs of populations is not very consistent over loci. This suggests that the variance in
estimate is likely to be large. The estimated
s also vary considerably among different populations for each locus, but this is expected because of different effective population sizes. A large variance in the estimation of
is not unexpected because the estimate of
, [for example, by WATTERSON's (1975) estimator], from segregating sites of DNA sequences that contain more information about
than microsatellite data, is also accompanied by a relatively large variance. Furthermore, KIMMEL and CHAKRABORTY (1996) showed that the variance of a variance estimator of
does not diminish with sample size.
When we draw conclusions based on estimated values of parameters, which are associated with variances, it is important to have some measures of accuracy in the estimates. Often the variance of an estimate of a single parameter is hard enough to compute, and measuring the accuracy of simultaneous estimation of all four parameters in a SMM model is undoubtedly more difficult, but nevertheless of great importance for recommending a method. This is an area in which answers demand substantially more computer resource than estimation, and it deserves much effort in the future. Therefore, we do not intend to provide all the answers here. Instead, we will focus on discussing the accuracy in the estimation of ancestral allele A. Although estimates of the four parameters are interrelated, we observed that values of the three parameters p,
, and
that result in
2 that is close to the minimum are also close to the set that results in minimum
2, while MCS estimates conditional on different ancestral alleles can differ substantially. This is not difficult to understand. For example, specifying a small ancestral allele size (relative to the sizes of alleles in a sample) will require large
and P (or
) to explain the observed allele frequencies, while, on the other hand, specifying a large ancestral allele size will result in a small estimated value for
. Our experience suggests that accuracy in the estimation of an ancestral allele is a good indicator of the overall accuracy of estimation.
To measure the accuracy of the estimate of A, we compare the minimum
2 values conditional on different ancestral allele sizes. One way to facilitate the comparison is a plot of
2 values vs. various ancestral allele sizes. A sharp decrease in the overall minimum value of
2 at allele A should suggest high accuracy of estimation. To allow comparison of the estimates for different samples, we use the ratio of conditional minimum
2 to the overall minimum
2. Figure 3 shows two examples. Part a suggests that the estimate of A for the FLT1 locus from population DG is accurate but the estimate from population CH is uncertain. The allele frequencies in Figure 3 concur with this analysis, because the frequency of estimated A = 168 in the DG sample is extremely high, while the frequency of estimated A = 176 in the CH sample is only intermediate, although it is the highest. Figure 4B also shows a similar pattern for locus D13S122. It is interesting to note that estimated A = 105 for the SA sample is not the most frequent allele in that sample. Table 3 shows that the
estimate from this sample is suspiciously large. Because Figure 3B shows that the estimate of A for this sample is rather uncertain, we expect that quite different sets of P,
, and
can result in
2 values close to the minimum. Indeed, when ancestral allele A is set to 85 the conditional minimum
2 value is equal to 154.59, which is less than 1% larger than the minimum
2 value, and the corresponding estimates of P,
, and
are 0.3, 1.0, and 4.2, respectively.
|
When the ancestral allele is reasonably certain, it makes sense to examine closely the estimation of other parameters. Take the locus D13S122, for example; allele size 95 appears to be the ancestral allele for the PH and GR samples (see Figure 3 and Table 4). Under the condition A = 95, we examined the minimum
2 values for different values of
, and results are given in Figure 5. For the PH sample, Figure 5 shows that it is very unlikely that
< 0.5 while for the GR sample,
should not be substantially different from 0.5. These conclusions are reinforced by the allele frequencies in the two samples that are shown in Figure 3.
|
|
Another measure R2 of accuracy is the ratio of the
2 value of the second best estimate of A to that of the best. The larger the ratio, the worse the fit for the second best A, and thus the better the estimate of A. Another useful measure Rm is the ratio of the mean
2 values of the two neighboring sizes of A to that of A, which measures the goodness of the A compared to its neighbors. Obviously we always have R2
Rm. These two measures are particularly convenient when there are many samples to analyze, as in our situation. The values of R2 and Rm, as well as the minimum
2 value for the 63 samples, are given in Table 4.
Table 4 shows that 43 of the 63 samples result in R2 > 1.20, and 30 result in R2 > 1.5. Although further study is required for proper interpretation of these R2 values, it appears that other than the MCS estimate an increase of 20% or more in the
2 value for an ancestral allele should be a reasonable indication that the estimation is not totally out of line.
| DISCUSSION |
|---|
A strength of the MCS estimator developed in this article is its ability to simultaneously estimate all the parameters of the SMM, including the ancestral allele, making better use of available information in a sample. To date, mutation mechanisms for minisatellite and microsatellite loci are not yet fully understood, and even less is known about the mode of mutations, i.e., whether it is symmetric or nonsymmetric, single-step or multi-step. Although ![]()
Another strength of our method is its flexibility. Although we only considered a mutation model with three parameters and assumed a constant population size, it is not difficult to see that any combination of mutation model and population genetics model can be analyzed in a similar manner as long as alleles under these models can be simulated. In particular, one can consider mutation models with constraints on allele size or mutation rates dependent on allele size and more complex population genetic models, such as growth populations or subdivided populations. This strength should not be overlooked for two reasons. First, rapid accumulation of population samples from microsatellite and minisatellite loci provides excellent opportunities to examine various mutation models, and, second, many natural populations, particularly human populations, are not panmictic. Proper statistical inferences should be based on more realistic population models, allowing for population growth and subdivision. The expectation of alleles of a given size as estimated by Monte Carlo simulation provides great flexibility of the method, although it has the drawback of requiring more computer cpu time. Note that methods for parameter estimation that rely on Monte Carlo samples to obtain some necessary quantities were used in ![]()
The MCS estimator we developed is a generalized least squares estimator and is often used in statistics for discrete distributions. Therefore, we expect our estimator to have many desirable properties. Even though the procedure of estimation takes advantage of a fast coalescent algorithm, it is still a time-consuming method, which makes it hard to investigate the statistical properties of the estimator. Nevertheless, the statistical properties of the estimator will be worth studying in the future.
There are a number of potential extensions to the method we proposed. We chose to obtain parameter estimates from allele frequencies, but the same approach can be applied to a set of summary statistics, including various moments, number of alleles, heterozygosity, etc. It is also possible to incorporate variances and covariances of allele frequencies into an estimator, although doing so will demand even more computer resource. Another potential extension is to use the
2 statistics for testing hypotheses, such as the hypothesis that mutations follow a single-step SMM, but hypothesis testing is likely to be more effective using likelihood-based approaches such as the method by NIELSON (1997).
| ACKNOWLEDGMENTS |
|---|
This work was supported in part by National Institutes of Health grant R29 GM-50428 (Y.-X.F.) and GM58545 (R.C.).
Manuscript received March 13, 1998; Accepted for publication May 26, 1998.
| LITERATURE CITED |
|---|
BOWCOCK, A. M., A. RUIZ-LINARES, J. TOMFOHRDE, E. MINCH, and J. R. KIDD et al., 1994 High resolution of human evolutionary trees with polymorphic microsatellites. Nature 368:455-457[Medline].
CHAKRABORTY, R. and M. NEI, 1982 Genetic differentiation of quantitative characters between populations or species. Genet. Res. 39:303-314.
CHAKRABORTY, R. and M. NEI, 1997 Bottleneck effects on average heterozygosity and genetic distance with the stepwise mutation model. Evolution 31:347-356.
DEKA, R., L. JIN, M. D. SHRIVER, L. M. YU, S. DECROO, and J. HUNDRIESER et al., 1995 Population genetics of dinucleotide (dC-dA)n·(dG-dT)n polymorphism in world populations. Am. J. Hum. Genet. 56:461-474[Medline].
DI RIENZO, A., A. C. PETERSON, J. C. GARZA, A. M. VALDES, and M. STATKIN et al., 1994 Mutational processes of simple-sequence repeat loci in human populations. Proc. Natl. Acad. Sci. USA 91:3166-3170
EWENS, W. J., 1972 The sampling theory of selectively neutral alleles. Theor. Pop. Biol. 3:87-112[Medline].
FU, Y. X., 1994 Estimating effective population size or mutation rate using the frequencies of mutations of various classes in a sample of DNA sequences. Genetics 138:1375-1386[Abstract].
HUDSON, R. R., 1991 Gene genealogies and the coalescent process, pp. 144 in Oxford Surveys in Evolutionary Biology, Vol. 7, edited by D. FUTUYMA and J. ANTONOVICS. Oxford University Press, New York.
JEFFREYS, A. J., V. WILSON, and S. L. THEIN, 1985 Hypervariable "minisatellite" regions in human DNA. Nature 314:67-73[Medline].
KAMINO, K., J. NAKURA, K. KIHARA, L. YE, and K. NAGANO et al., 1993 Population variation in dinucleotide repeat polymorphism at the D8S360 locus. Hum. Mol. Genet. 2:1751
KIMMELL, M. and R. CHAKRABORTY, 1996 Measure of variation at DNA repeat loci under a general stepwise mutation model. Theor. Popul. Biol. 50:345-367[Medline].
KIMMEL, M., R. CHAKRABORTY, D. N. STIVERS, and R. DEKA, 1996 Dynamics of repeat polymorphisms under a forward-backward mutation model: within- and between-population variability at microsatellite loci. Genetics 143:549-555[Abstract].
KINGMAN, J. F. C., 1982a The coalescent. Stochastic Process. Appl. 13:235-248.
KINGMAN, J. F. C., 1982b On the genealogy of large populations. J. Appl. Probab. 19A:27-43.
LI, W. H., 1976 Distribution of nucleotide differences between two randomly chosen cistrons in a subdivided population: the finite island model. Theor. Popul. Biol. 10:303-308[Medline].
LITT, M. and J. A. LUTY, 1989 A hypervariable microsatellite revealed by in vitro amplification of a dinucleotide repeat within the cardiac muscle actin gene. Am. J. Hum. Genet. 44:397-401[Medline].
MORAN, P. A. P., 1975 Wandering distributions and the electrophoretic profile. Theor. Popul. Biol. 8:318-330[Medline].
NIELSEN, R., 1997 A likelihood approach to populations samples of microsatellite alleles. Genetics 146:711-716[Abstract].
OHTA, T. and M. KIMURA, 1973 A model of mutation appropriate to estimate the number of electrophoretically detectable alleles in a finite population. Genet. Res. 22:201-204[Medline].
RUBINSZTEIN, D. C., W. AMOS, J. LEGGO, S. GOODBURN, and S. JAIN et al., 1995 Microsatellite evolutionevidence for directionality and variation in rate between species. Nature Genet. 10:337-343[Medline].
SHRIVER, M. D., L. JIN, R. CHAKRABORTY, and E. BOERWINKLE, 1993 VNTR allele frequency distribution under the stepwise mutation model: a computer simulation approach. Genetics 134:983-993[Abstract].
STUART, A., and J. K. ORD, 1991 Kendall's Advanced Theory of Statistics, Vol. II, Ed. 6. Oxford University Press, New York.
TAUTZ, D., 1993 Notes on the definition and nomenclature of tandemly repetitive DNA sequence, pp. 2128 in DNA Fingerprinting: State of the Science, edited by S. D. J. PENA, R. CHAKRABORTY, J. T. EPPEN and A. J. JEFFREYS.
VALDES, A. M., M. SLATKIN, and N. B. FREIMER, 1993 Allele frequencies at microsatellite loci: the stepwise mutation model revisited. Genetics 133:737-749[Abstract].
WATTERSON, G. A., 1975 On the number of segregation sites. Theor. Popul. Biol. 7:256-276[Medline].
WEBER, J. L. and P. E. MAY, 1989 Abundant class of human DNA polymorphisms which can be typed using the polymerase chain reaction. Am. J. Hum. Genet. 44:388-396[Medline].
WEBER, J. L. and C. WONG, 1993 Mutation of human short tandem repeats. Hum. Mol. Genet. 2:1123-1128
WEIR, B. S., A. H. D. BROWN, and D. R. MARSHALL, 1976 Testing for selective neutrality of electrophoretically detectable protein polymorphisms. Genetics 84:639-659
- THIS ARTICLE
-
Abstract
- Full Text (PDF)
- Alert me when this article is cited
- Alert me if a correction is posted
- SERVICES
- Similar articles in this journal
- Similar articles in PubMed
- Alert me to new issues of the journal
- Download to citation manager
- Reprints & Permissions
- CITING ARTICLES
- Citing Articles via Google Scholar
- GOOGLE SCHOLAR
- Articles by Fu, Y.-X.
- Articles by Chakraborty, R.
- Search for Related Content
- PUBMED
- PubMed Citation
- Articles by Fu, Y.-X.
- Articles by Chakraborty, R.









