Likelihood-Based Estimation of Microsatellite Mutation Rates
John C. Whittaker<sup>a,b</sup>, Roger M. Harbord<sup>a,c</sup>, Nicola Boxall<sup>d</sup>, Ian Mackay<sup>e</sup>, Gary Dawson<sup>e</sup>, and Richard M. Sibly<sup>d</sup>
a School of Applied Statistics, University of Reading, Reading RG6 6FN, United Kingdom,
b Department of Epidemiology and Public Health, Imperial College London, London W2 1PG, United Kingdom,
c Department of Social Medicine, University of Bristol, Bristol BS8 2PR, United Kingdom,
d School of Animal and Microbial Sciences, University of Reading, Reading RG6 6AJ, United Kingdom
e Oxagen Ltd., Abingdon OX14 4RY, United Kingdom
Corresponding author: John C. Whittaker, Imperial College School of Medicine, St. Mary's Campus, Norfolk Pl., London W2 1PG, United Kingdom. E-mail: j.whittaker{at}ic.ac.uk
Communicating editor: Y.-X. FU
| ABSTRACT |
|---|
Microsatellites are widely used in genetic analyses, many of which require reliable estimates of microsatellite mutation rates, yet the factors determining mutation rates are uncertain. The most straightforward and conclusive method by which to study mutation is direct observation of allele transmissions in parent-child pairs, and studies of this type suggest a positive, possibly exponential, relationship between mutation rate and allele size, together with a bias toward length increase. Except for microsatellites on the Y chromosome, however, previous analyses have not made full use of available data and may have introduced bias: mutations have been identified only where child genotypes could not be generated by transmission from parents' genotypes, so that the probability that a mutation is detected depends on the distribution of allele lengths and varies with allele length. We introduce a likelihood-based approach that has two key advantages over existing methods. First, we can make formal comparisons between competing models of microsatellite evolution; second, we obtain asymptotically unbiased and efficient parameter estimates. Application to data composed of 118,866 parent-offspring transmissions of AC microsatellites supports the hypothesis that mutation rate increases exponentially with microsatellite length, with a suggestion that contractions become more likely than expansions as length increases. This would lead to a stationary distribution for allele length maintained by mutational balance. There is no evidence that contractions and expansions differ in their step size distributions.
MICROSATELLITES consist of repeats of short sequences (1–6 bp) of DNA and are very common in eukaryotic genomes. They are highly mutable, with the primary mutational mechanism believed to be replication slippage.
The simplest model of microsatellite mutation is often known as the stepwise mutation model (SMM).
However, fitting these mutational models to data is not straightforward, because of the difficulty of obtaining sufficient mutational events. Two broad strategies have been pursued. First, the mutational model may be used to provide an equilibrium distribution of microsatellite length, which may then be compared with that observed in DNA sequences, for example by genotyping a number of individuals at one or more markers.
Direct observation of the mutational events is preferable if possible, but involves much more genotyping. Typing of large numbers of sperm is one approach.
Despite these difficulties, some interesting results have emerged. It is now generally accepted that mutation rate increases with allele length, measured as number of repeats.
| METHODS |
|---|
Data collection:
Automated genotyping of blood samples from 680 individuals was performed on an ABI PRISM 377 DNA sequencer, and results were interpreted with the associated GeneScan and Genotyper v 3.6 software (Applied Biosystems, Foster City, CA) to filter out stutter peaks and A+ peaks. Genotyping errors were removed using previously published protocols.
Allele sizes were converted to number of repeats by stripping out the nonmicrosatellite bases between the two primer sequences using sequence data from the Foundation Jean Dausset-CEPH database (version 8.1).
Statistical analysis:
The likelihood of the observed data set was calculated for each of the models described in RESULTS, as follows. Consider a triplet with parental marker genotypes (x1, x2, x3, x4) and child genotype (x5, x6), where alleles (x1, x2) are carried by one parent and (x3, x4) by the other. To avoid the need to model parental genotype probabilities, we worked with the likelihood conditional on parental genotypes. Writing pij for the probability that a microsatellite of length i in the parental generation mutates to length j in the child generation, this conditional likelihood is easily shown to be equal to
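$$
L = \tfrac{1}{4}\left[(p_{x_1 x_5} + p_{x_2 x_5})(p_{x_3 x_6} + p_{x_4 x_6}) + \delta\,(p_{x_1 x_6} + p_{x_2 x_6})(p_{x_3 x_5} + p_{x_4 x_5})\right] \qquad (1)
$$
where δ = 1 if x5 ≠ x6 and δ = 0 otherwise.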
We consider a number of possible models for pij below.
Since child genotypes are independent even for sibs once we have conditioned on parental genotypes, the likelihood for the complete data set is simply the product over all parent-offspring triplets. Maximizing this likelihood for each of the models described in RESULTS gives estimates for the underlying parameters and allows the calculation of confidence intervals and the comparison of nested models via the usual statistical machinery.
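The conditional likelihood described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' S-PLUS/C implementation: the function names, the tuple layout of a triplet, and the toy single-step transition function `toy_p` are all assumptions made for the example.

```python
import math


def trio_likelihood(parent1, parent2, child, p):
    """P(unordered child genotype | parental genotypes).

    parent1, parent2, child are pairs of allele lengths (in repeats);
    p(i, j) is the probability that parental length i is passed on as length j.
    """
    c1, c2 = child

    def transmit(parent, target):
        # Each of the two parental alleles is passed on with probability 1/2.
        return 0.5 * (p(parent[0], target) + p(parent[1], target))

    like = transmit(parent1, c1) * transmit(parent2, c2)
    if c1 != c2:
        # For a heterozygous child, either allele may have come from either parent.
        like += transmit(parent1, c2) * transmit(parent2, c1)
    return like


def log_likelihood(trios, p):
    """Sum of log conditional likelihoods over independent triplets."""
    return sum(math.log(trio_likelihood(a, b, c, p)) for a, b, c in trios)


def toy_p(i, j, rate=1e-3):
    """Hypothetical transition function: single-step changes only."""
    if i == j:
        return 1.0 - 2.0 * rate
    return rate if abs(i - j) == 1 else 0.0


if __name__ == "__main__":
    trios = [((10, 12), (14, 16), (10, 14)),   # compatible with the parents
             ((10, 12), (14, 16), (11, 14))]   # needs one single-step mutation
    print(log_likelihood(trios, toy_p))
```

Any transition function with the same `p(i, j)` signature can be plugged in, which is what makes comparison across mutational models straightforward.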
Statistical analysis was performed using the statistical language S-PLUS. Likelihoods were maximized using the built-in function nlminb, calling functions for likelihood calculations coded in C for speed. Times for a single maximization were of the order of 10–100 sec, depending on the complexity of the model fitted, on a Sun Ultra 10 workstation running Solaris (SPECfp95 12.9). We found that working with logistic transforms of the transition probabilities pij gave much-improved numerical properties. Model comparisons are based on the usual result for two nested models: if d1 and d2 > d1 denote the numbers of independent parameters and L1 and L2 the values of the maximized likelihood, respectively, then 2(log L2 - log L1) has asymptotically a χ² distribution with d2 - d1 degrees of freedom when model 1 is true. We also calculated the Akaike information criterion (AIC),
$$
\mathrm{AIC} = -2 \log L + 2d \qquad (2)
$$
for the models considered. Standard procedure is to choose the model minimizing AIC as optimal.
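The two model-comparison tools mentioned here, the likelihood-ratio test for nested models and the AIC, are simple to compute once maximized log-likelihoods are available. The sketch below is illustrative: the function names are assumptions, the log-likelihood values in the usage lines are hypothetical, and the chi-square tail probability uses the closed form that holds only for an even number of degrees of freedom so that no external library is needed.

```python
import math


def aic(log_lik, n_params):
    """Akaike information criterion: -2 log L + 2 d."""
    return -2.0 * log_lik + 2.0 * n_params


def lrt_statistic(log_lik_small, log_lik_big):
    """Likelihood-ratio statistic 2(log L2 - log L1) for nested models."""
    return 2.0 * (log_lik_big - log_lik_small)


def chi2_upper_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom > x), for even df only.

    Uses the identity exp(-x/2) * sum_{k < df/2} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form implemented for even, positive df only")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(df // 2))


if __name__ == "__main__":
    # Hypothetical maximized log-likelihoods for a 3- and a 5-parameter model.
    small, big = -520.0, -516.5
    stat = lrt_statistic(small, big)
    print(stat, chi2_upper_tail_even_df(stat, df=2))
    print(aic(small, 3), aic(big, 5))
```

For odd degrees of freedom a general chi-square routine from a statistics library would be needed instead.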
| RESULTS |
|---|
The data set contained 118,866 parent-offspring transmissions from 59,433 parent-offspring triplets. We identified 53 Mendelian discrepancies, giving a naive estimate of the overall mutation rate of 4.5 × 10⁻⁴ per allele transfer. It is never possible to identify with certainty the mutational event causing the discrepancy, but by taking from the set of possible mutations the one involving the smallest change in length, measured as number of AC repeats as described above, we were able to classify the mutations by step size and direction (Fig 1) and thus obtained naive estimates of mutation rates by length (Fig 2).
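The naive classification rule, choosing the candidate mutation with the smallest change in length, might be sketched as follows. This is an assumed reading of the rule rather than the authors' procedure: it considers only explanations that need a single mutation, and ties between equally small changes are broken arbitrarily by search order.

```python
from itertools import product


def smallest_mutation(parent1, parent2, child):
    """Return (parental_length, child_length) for the single mutation with the
    smallest absolute change that explains the child genotype.

    Returns None if the child is compatible with the parents without mutation,
    or if no single mutation can explain it.
    """
    best = None
    c1, c2 = child
    # Try both ways of assigning the child's alleles to the two parents.
    for from_p1, from_p2 in ((c1, c2), (c2, c1)):
        for a, b in product(parent1, parent2):
            changes = [(src, dst) for src, dst in ((a, from_p1), (b, from_p2))
                       if src != dst]
            if not changes:
                return None  # ordinary Mendelian transmission explains the child
            if len(changes) == 1:
                src, dst = changes[0]
                if best is None or abs(dst - src) < abs(best[1] - best[0]):
                    best = (src, dst)
    return best


if __name__ == "__main__":
    # 22 -> 23 (one repeat up) is preferred over 26 -> 23 (three repeats down).
    print(smallest_mutation((18, 20), (22, 26), (18, 23)))
```

The sign of `child_length - parental_length` gives the direction and its absolute value the step size.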
These naive estimates may be biased, as explained in the Introduction. To circumvent this, we adopted the likelihood-based approach described in METHODS and applied it to a series of mutational models chosen to be compatible with existing knowledge and with Fig 1 and Fig 2. For convenience, the parameters of the models are listed in Table 1.
In the first model considered, we allow for mutations of any step size, but acknowledge that smaller changes in the number of repeats are more common than larger changes by assuming that the probability of a mutation of step size k declines geometrically with k. This model therefore has two parameters: µ, which represents the overall mutation rate, and an exponential decay rate parameter λ. This model (1 in Table 1) is symmetric in that the same relationship is assumed for up and down mutations, and it is length independent in that mutation rates are independent of microsatellite length. Generalizations (2–4 in Table 1) allow for either overall mutation rate or the exponential decay parameter, or both, to vary according to the direction of the mutation. Thus the most general model in this family has four parameters, µu and µd controlling overall mutation rate and λu and λd controlling the exponential decay with increasing step size, with the subscripts u and d indicating whether the mutation gives an increase or decrease in microsatellite length, respectively. With pij the probability that a microsatellite of length i in the parental generation mutates to length j in the child generation as above, we thus obtain
$$
p_{ij} =
\begin{cases}
\mu_u \,(1 - e^{-\lambda_u})\, e^{-\lambda_u (j - i - 1)}, & j > i \\
\mu_d \,(1 - e^{-\lambda_d})\, e^{-\lambda_d (i - j - 1)}, & j < i \\
1 - \mu_u - \mu_d, & j = i
\end{cases}
\qquad (3)
$$
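One way to code the length-independent family is shown below. It is a sketch under stated assumptions: step sizes are taken to be geometric with ratio exp(-λ) and normalized so that the two rate parameters are the total up and down mutation probabilities, no lower bound on allele length is enforced, and the names `step_prob`, `transition_prob`, `mu_u`, `mu_d`, `lam_u`, and `lam_d` are chosen here for illustration.

```python
import math


def step_prob(k, lam):
    """P(step size = k | a mutation occurred), for k = 1, 2, ..."""
    if k < 1:
        raise ValueError("step size must be a positive integer")
    return (1.0 - math.exp(-lam)) * math.exp(-lam * (k - 1))


def transition_prob(i, j, mu_u, mu_d, lam_u, lam_d):
    """P(parental length i -> child length j), independent of i itself."""
    if j > i:
        return mu_u * step_prob(j - i, lam_u)
    if j < i:
        return mu_d * step_prob(i - j, lam_d)
    return 1.0 - mu_u - mu_d


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    params = dict(mu_u=2e-4, mu_d=2e-4, lam_u=1.0, lam_d=1.0)
    total = sum(transition_prob(20, j, **params) for j in range(-180, 221))
    print(total)  # should be very close to 1
```

Setting `mu_u == mu_d` and `lam_u == lam_d` recovers the symmetric two-parameter case; freeing either pair gives the intermediate models.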
However, since there is strong evidence that mutation rates increase with the length of the parental allele, we also considered length-dependent models (5–9 in Table 1). The most general of these has six parameters, with αu and αd determining the underlying mutation rate, βu and βd controlling the rate of change of mutation rate with parental microsatellite length, and λu and λd as in the length-independent models. Thus pij is as above but with µu and µd increasing exponentially with parental allele length i.
Maximized log-likelihoods and values of the AIC are given in Table 1 for each of these models. The length-independent models (1–4) differ little in log-likelihood (Table 1), with none giving a significant improvement over model 1. However, all the length-dependent models (5–9) give hugely significant improvements in fit compared to any of the length-independent models (P < 10⁻¹⁵, by the usual likelihood-ratio tests). The best-fitting model based on the AIC is model 8, in which the direction of mutation affects the dependence of mutation rate on parental allele length but not the distribution of step sizes. The natural model with which to compare model 8 is model 5, in which length dependence is the same for up and down mutations. Comparing these two models gives a change in log-likelihood of 2.69, which, by referring 2 × 2.69 to the χ² distribution with 2 degrees of freedom, gives P = 0.068, and so is marginally significant. Parameter estimates and 95% confidence intervals of the parameters in these models are given in Table 2. Fig 3 shows the mutation rates predicted by model 8, plotted on a log scale against parental allele length.
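The P value for the comparison of models 5 and 8 can be checked by hand, because a chi-square variable with 2 degrees of freedom has upper tail probability exp(-x/2):

```latex
P = \Pr\left(\chi^2_2 > 2 \times 2.69\right) = e^{-5.38/2} = e^{-2.69} \approx 0.068
```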
Model 8 predicts that up mutations will exceed down mutations for microsatellites of <20 repeats, with the opposite true for microsatellites with >20 repeats. Though we must bear in mind the considerable uncertainty in parameter estimates when interpreting these values, this would lead to an equilibrium distribution for microsatellite length with a mode at 20, which is exactly what we see in the distribution of parental alleles plotted in Fig 4. Thus it is possible that the length distribution at the loci we studied is maintained by the length-dependent mutation bias reported in Fig 3. This cannot apply to all AC loci, however, since in the genome overall the frequency of AC alleles decreases with their length.
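If the up and down rates are each exponential in parental length, their logs are straight lines and the length at which they cross can be found by equating them. The sketch below assumes a log-linear form, log rate = α + β·i, and uses hypothetical parameter values picked only so that the crossing falls at 20 repeats; they are not the fitted estimates.

```python
def crossover_length(alpha_u, beta_u, alpha_d, beta_d):
    """Length i at which alpha_u + beta_u * i equals alpha_d + beta_d * i."""
    if beta_u == beta_d:
        raise ValueError("parallel log-rate lines never cross")
    return (alpha_d - alpha_u) / (beta_u - beta_d)


if __name__ == "__main__":
    # Hypothetical values: expansions dominate below the crossing,
    # contractions dominate above it.
    i_star = crossover_length(alpha_u=-9.0, beta_u=0.1,
                              alpha_d=-13.0, beta_d=0.3)
    print(round(i_star, 6))
```

A crossing of this kind, with the down rate rising faster than the up rate, is what produces a stable mode in the equilibrium length distribution.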
We also considered models in which mutation rate increased linearly with allele length.
| DISCUSSION |
|---|
The maximum-likelihood method introduced here avoids the biases inherent in the naive estimation methods used in previous studies and makes fuller use of the available data. Simulation results suggest that for our data set naive estimates of mutation rates are biased downward by ~12%.
The method described here involves conditioning on parental genotypes (Equation 1), so that all information on mutation is obtained directly from parent-child transmissions. It is also possible to write down the complete likelihood, including terms for the likelihood of parental genotypes. However, these depend on the relative frequencies of genotypes in the population, which in turn depend both on the mutational model and on population history. In principle, we could therefore write down a complete likelihood, incorporating both a mutational model and a model for population history, which unifies the "direct" and "indirect" approaches to inference on microsatellite mutation. However, here we have preferred to concentrate on direct inference of mutational mechanisms, avoiding dependence on population history.
Our method also assumes that at each locus we have no information about which parental allele is transmitted aside from allele length at that locus. If information on linked markers is available this could be incorporated, thus reducing uncertainty about the parental origin of transmitted alleles. It should be possible to modify the algorithms used for likelihood calculations in linkage analysis to do this.
Size and direction of mutational steps:
The naive analysis presented in Fig 1 suggests that most mutations consist of steps of one repeat, with the distribution of step sizes similar in both up and down directions. The maximum-likelihood analysis reinforces these conclusions (Table 1). Thus although in the best-fitting model (8) up and down mutation rates have separate relationships with parental allele length, the distribution of step sizes given that a mutation occurs is controlled by a single parameter, λ, and no significant increase in likelihood was obtained by separating mutations according to whether they increased or decreased microsatellite length.
The maximum-likelihood estimate of the exponential decay rate parameter λ was 1.06 (Table 1). This suggests that 65% of mutations were of step size 1, 23% of size 2, and the remaining 12% of step size greater than 2. The frequency of multistep changes is higher than that recorded in most previous studies, which give values in the range 0–14%. The confidence interval for λ is (0.68, 1.43); taking the boundary points gives 23 and 50% multistep changes. If we treat the 61 multistep mutations identified by [...]
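The step-size percentages follow directly from the decay parameter if step sizes are geometric with ratio exp(-λ), which is the reading assumed in this sketch (the function name is chosen here for illustration).

```python
import math


def step_size_fractions(lam):
    """Fractions of mutations with step size 1, step size 2, and more than 2."""
    q = math.exp(-lam)          # probability that a mutation goes beyond one step
    one = 1.0 - q
    two = q * (1.0 - q)
    more = q * q
    return one, two, more


if __name__ == "__main__":
    for lam in (0.68, 1.06, 1.43):
        one, two, more = step_size_fractions(lam)
        print(lam, round(100 * one, 1), round(100 * two, 1), round(100 * more, 1))
```

With λ = 1.06 this gives about 65%, 23%, and 12%. At the interval endpoints 1.43 and 0.68 the multistep share, exp(-λ), is roughly 24% and 51%, close to the 23 and 50% quoted; the small differences presumably reflect rounding of the endpoints.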
Length-dependent mutational bias and evolutionary equilibrium:
Several previous studies have suggested that microsatellite mutations are biased toward expansion, reporting an excess of increases over decreases in microsatellite lengths.
Demonstration of length-dependent mutational bias is particularly important in the context of the controversy regarding the nature of the factors that constrain microsatellite lengths. Early models of the slippage mutation process carried the implication that some microsatellite lengths would increase indefinitely over evolutionary time, but in practice lengths only very rarely exceed a few tens of repeats.
| CONCLUSIONS |
|---|
We have introduced a likelihood-based procedure that allows formal comparison of competing models of microsatellite mutation. Application to data composed of 118,866 parent-offspring transmissions of AC microsatellites provides very strong evidence that the mutation rate of microsatellite loci is length dependent and some support for the hypothesis that the stationary distribution of microsatellites is maintained by mutational balance.
Key advantages of the method presented here are that it avoids bias in parameter estimation due to unobserved mutations, it makes full use of the available data, and, by allowing comparisons between models, it is readily extended to investigate other aspects of microsatellite evolution. For example, if data from microsatellites with several repeat motifs were available it would be easy to add dependence of mutation rate on repeat motif to the models discussed here.
| ACKNOWLEDGMENTS |
|---|
We thank Bill Amos for helpful comments on this work and for giving us the Excel macro used to convert allele sizes to number of repeats and Mark Beaumont for his insightful reading of an earlier draft of this article. Two referees made helpful comments on the original submission.
Manuscript received November 5, 2002; Accepted for publication February 25, 2003.
| LITERATURE CITED |
|---|
AMOS, W., and D. C. RUBINSZTEIN, 1996 Microsatellites are subject to directional evolution. Nat. Genet. 12:13-14.
ANDERSON, D. R., and K. P. BURNHAM, 1998 Model Selection and Inference: A Practical Information-Theoretic Approach. Springer-Verlag, New York.
BELL, G. I., and J. JURKA, 1997 The length distribution of perfect dimer repetitive DNA is consistent with its evolution by an unbiased single step mutation process. J. Mol. Evol. 44:414-421.
BRINKMANN, B., M. KLINTSCHAR, F. NEUHUBER, J. HUHNE, and B. ROLF, 1998 Mutation rate in human microsatellites: influence of the structure and length of the tandem repeat. Am. J. Hum. Genet. 62:1408-1415.
COOPER, G., N. J. BURROUGHS, D. A. RAND, D. C. RUBINSZTEIN, and W. AMOS, 1999 Markov chain Monte Carlo analysis of human Y-chromosome microsatellite provides evidence of biased mutation. Proc. Natl. Acad. Sci. USA 96:11916-11921.
COX, D. R., and D. V. HINKLEY, 1974 Theoretical Statistics. Chapman & Hall, London.
DIB, C., S. FAURE, C. FIZAMES, D. SAMSON, and N. DROUOT et al., 1996 A comprehensive genetic map of the human genome based on 5,264 microsatellites. Nature 380:152-154.
ELLEGREN, H., 2000a Heterogeneous mutation processes in human microsatellite DNA sequences. Nat. Genet. 24:400-402.
ELLEGREN, H., 2000b Microsatellite mutations in the germline: implications for evolutionary inference. Trends Genet. 16:551-558.
EWEN, K. R., M. BAHLO, S. A. TRELOAR, D. F. LEVINSON, and B. MOWRY et al., 2000 Identification and analysis of error types in high-throughput genotyping. Am. J. Hum. Genet. 67:727-736.
HARBORD, R. M., 2001 Modelling microsatellite evolution using directly observed mutations. M.Sc. Dissertation, University of Reading, Reading, UK.
HARR, B., and C. SCHLOTTERER, 2000 Long microsatellite alleles in Drosophila melanogaster have a downward mutation bias and short persistence times, which cause their genome-wide underrepresentation. Genetics 155:1213-1220.
HUANG, Q.-Y., F.-H. XU, H. SHEN, H.-Y. DENG, and Y.-J. LIU et al., 2002 Mutation patterns at dinucleotide microsatellite loci in humans. Am. J. Hum. Genet. 70:625-634.
KAYSER, M., L. ROEWER, M. HEDMAN, L. HENKE, and J. HENKE et al., 2000 Characteristics and frequency of germline mutations at microsatellite loci from the human Y chromosome, as revealed by direct observation in father/son pairs. Am. J. Hum. Genet. 66:1580-1588.
KRUGLYAK, L., M. J. DALY, M. P. REEVE-DALY, and E. S. LANDER, 1996 Parametric and non-parametric linkage analysis: a unified approach. Am. J. Hum. Genet. 58:1347-1363.
KRUGLYAK, S., R. DURRETT, D. SCHUG, and C. AQUADRO, 1998 Equilibrium distributions of microsatellite repeat length resulting from a balance between slippage events and point mutations. Proc. Natl. Acad. Sci. USA 95:10774-10778.
KRUGLYAK, S., R. DURRETT, D. SCHUG, and C. AQUADRO, 2000 Distribution and abundance of microsatellites in the yeast genome can be explained by a balance between slippage events and point mutations. Mol. Biol. Evol. 17:1210-1219.
LEEFLANG, E. P., L. ZHANG, S. TAVARE, R. HUBERT, and J. SRINIDHI et al., 1995 Single sperm analysis of the trinucleotide repeats in the Huntington's disease gene: quantification of the mutation frequency spectrum. Hum. Mol. Genet. 4:1519-1526.
O'CONNELL, J., and D. WEEKS, 1998 PedCheck: a program for identification of genotype incompatibilities in linkage analysis. Am. J. Hum. Genet. 63:259-266.
OHTA, T., and M. KIMURA, 1973 The model of mutation appropriate to calculate the number of electrophoretically detectable alleles in a genetic population. Genet. Res. 22:201-204.
PRIMMER, C. R., N. SAINO, A. P. MOLLER, and H. ELLEGREN, 1998 Unraveling the processes of microsatellite evolution through analysis of germ line mutations in barn swallow Hirundo rustica. Mol. Biol. Evol. 15:1047-1054.
SCHLOTTERER, C., R. RITTER, B. HARR, and G. BREM, 1998 High mutation rate of a long microsatellite allele in Drosophila melanogaster provides evidence for allele-specific mutation rates. Mol. Biol. Evol. 15:1269-1274.
SIBLY, R. M., J. C. WHITTAKER, and M. TALBOT, 2001 A maximum-likelihood approach to fitting equilibrium models of microsatellite evolution. Mol. Biol. Evol. 18:413-417.
SIBLY, R. M., A. MEADE, N. BOXALL, M. WILKINSON, and D. W. CORNE et al., 2003 The structure of interrupted human AC microsatellites. Mol. Biol. Evol. 20:453-459.
TAUTZ, D., 1993 DNA fingerprinting: state of the science, pp. 21–28 in DNA Fingerprinting: State of the Science, edited by S. D. J. PENA, R. CHAKRABORTY, J. T. EPPLEN and A. J. JEFFREYS. Birkhauser, Basel, Switzerland.
VENABLES, W. N., and B. D. RIPLEY, 1999 Modern Applied Statistics with S-PLUS. Springer, New York.
WEBER, J. L., and C. WONG, 1993 Mutation of human short tandem repeats. Hum. Mol. Genet. 2:1123-1128.
WIERDL, M., M. DOMINSKA, and T. D. PETES, 1997 Microsatellite instability in yeast: dependence on the length of the microsatellite. Genetics 146:769-779.
XU, X., M. PENG, Z. FANG, and X. XU, 2000 The direction of microsatellite mutations is dependent upon allele length. Nat. Genet. 24:396-399.