Genetics, Vol. 152, 1103-1110, July 1999, Copyright © 1999

Pattern of Nucleotide Substitution and Rate Heterogeneity in the Hypervariable Regions I and II of Human mtDNA

Sonja Meyera, Gunter Weissa, and Arndt von Haeselera
a Max-Planck-Institut für evolutionäre Anthropologie, D-04103 Leipzig, Germany

Corresponding author: Arndt von Haeseler, Max-Planck-Institut für evolutionäre Anthropologie, Inselstr. 22, D-04103 Leipzig, Germany., haeseler{at}eva.mpg.de (E-mail)

Communicating editor: S. TAVARÉ


*  ABSTRACT
*TOP
*ABSTRACT
*DATA
*MODEL AND METHODS
*RESULTS
*DISCUSSION
*LITERATURE CITED

This study provides a comprehensive survey of the complex pattern of nucleotide substitution in the control region of human mtDNA, which is of central importance to the studies of human evolution. A total of 1229 different hypervariable region I (HVRI) and 385 different hypervariable region II (HVRII) sequences were analyzed using a complex substitution model. Moreover, we suggest a new method to assign relative rates to each site in the sequence. Estimates are based on maximum-likelihood methods applied to randomly selected subsets of sequences. Our results indicate that the rate of substitution in HVRI is approximately twice as high as in HVRII and that this difference is mainly due to a higher frequency of pyrimidine transitions in HVRI. However, rate heterogeneity is more pronounced in HVRII.


SEQUENCES from the noncoding control region of mitochondrial DNA are widely used to address questions concerning genetic variation within species (CANN et al. 1987 Down; GARNER and RYDER 1996 Down; WATSON et al. 1996 Down; GOLDBERG and RUVOLO 1997 Down). Their high evolutionary rate and their maternal inheritance make them exceptionally suitable for analyzing population history. Hypervariable regions I and II (HVRI, HVRII) in the control region of human mtDNA have been studied extensively to infer, for example, aspects of historical biogeography and the time since the most recent common ancestor of human mtDNAs (CANN et al. 1987 Down; HASEGAWA and HORAI 1991 Down; VIGILANT et al. 1991 Down; WARD et al. 1991 Down, WARD et al. 1993 Down; PESOLE et al. 1992 Down; STONEKING et al. 1992 Down; WATSON et al. 1996 Down; KRINGS et al. 1997 Down). Although >4000 HVRI and 900 HVRII sequences of humans from all over the world have been determined, knowledge of the substitution pattern is still far from complete. However, it is now clear that human control region sequences evolve according to a complex pattern that makes analysis difficult. For example, base composition is not uniform, transitions occur in greater frequencies than transversions, the number of pyrimidine transitions in the L-strand exceeds the number of purine transitions, and substitution rates vary among sites (AQUADRO and GREENBERG 1983 Down; KOCHER and WILSON 1991 Down; TAMURA and NEI 1993 Down; WAKELEY 1993 Down).

Probably the most enigmatic feature of HVR sequence evolution is the variation of rates among sites. Until recently, the importance of accounting for rate heterogeneity to obtain unbiased estimates of the transition-transversion ratio, unbiased dating of speciation events, and correct reconstruction of phylogenies has not been recognized (WAKELEY 1993 Down, WAKELEY 1994 Down, WAKELEY 1996 Down; YANG 1996 Down). Especially in a population genetics context, a model that describes the substitution process is explicitly necessary. Choosing an inappropriate model may lead to misinterpretation of the data. For example, mutational hot spots (i.e., positions at which substitutions accumulate predominantly) could mimic population expansion (LUNDSTROM et al. 1992A Down; BERTORELLE and SLATKIN 1995 Down; ARIS-BROSOU and EXCOFFIER 1996 Down). Ignoring the existence of heterogeneous mutation rates may yield biased estimates of measures of genetic diversity and/or parameters of population history (LUNDSTROM et al. 1992A Down; BERTORELLE and SLATKIN 1995 Down; ARIS-BROSOU and EXCOFFIER 1996 Down; DENG and FU 1996 Down; TAJIMA 1996 Down; MISAWA and TAJIMA 1997 Down). For example, the estimation of the time to the most recent common ancestor strongly depends on both the population genetics model and the estimated parameters therein. Therefore, population genetics inference benefits from a better understanding of the substitution process of a genomic region.

On the other hand, the findings of pedigree analyses have generated much confusion about the frequency of mutations that affect the HVR (HOWELL et al. 1996 Down; PARSONS et al. 1997 Down). From these studies, mutation rate estimates have been obtained that are ~20-fold higher than those derived from phylogenetic studies. The question of whether mutational hot spots can account for this enormous discrepancy in mutation rates has become a hotly debated issue (PAABO 1996 Down; JAZIN et al. 1998 Down; PARSONS and HOLLAND 1998 Down). To gauge the contribution of hot spots to the high rate estimate, we need to know the rate for each sequence position. Beyond being of use for the interpretation of the apparently conflicting mutation rate estimates, knowledge of site-specific rates is of great benefit to sequence analyses in general, as it allows the refinement of phylogenetic models and a more precise interpretation of population sequence data. This study provides a comprehensive survey of the nucleotide substitution pattern of HVRI and HVRII with special attention to variation among sites.


*  DATA
*TOP
*ABSTRACT
*DATA
*MODEL AND METHODS
*RESULTS
*DISCUSSION
*LITERATURE CITED

We used a publicly available collection of aligned human mitochondrial control-region sequences that comprised 4079 HVRI and 969 HVRII sequences of individuals from all over the world (HANDT et al. 1998 Down). It can be retrieved via the worldwide web at URL http://www.eva.mpg.de/hvrbase/. From this data collection we extracted all sequences that were sequenced without ambiguities in the range from 16024 to 16382 in HVRI and from 57 to 371 in HVRII according to the numbering of ANDERSON et al. 1981 Down. These sequence parts were chosen because they were the largest continuously determined subregions in the majority of sequences in the collection. The alignments of the two regions contained only a few gaps. The HVRI part enclosed seven gaps of varying length at positions 16104.1, 16169.1, 16174.1, 16183.1–16183.4, 16227.1, 16259.1, and 16366.1, whereas five gaps at positions 65.1, 190.1, 294.1, 302.1–302.4, and 310.1–310.2 were observed in the alignment of the HVRII sequences. Positions in the alignment that show a gap were excluded from the analysis. Because we pursued a phylogenetic approach, we reduced the set of sequences such that each sequence type was represented only once in our set. This led to a final data set comprising 1229 different HVRI and 385 different HVRII sequences.


*  MODEL AND METHODS
*TOP
*ABSTRACT
*DATA
*MODEL AND METHODS
*RESULTS
*DISCUSSION
*LITERATURE CITED

Model of sequence evolution:
To quantify the substitution process and rate heterogeneity among sites, we used the Tamura-Nei (TAMURA and NEI 1993 Down) model assuming {Gamma} -distributed rates. It has been suggested (WEISS and VON HAESELER 1998 Down) that this model is the most appropriate to describe the evolutionary process of human HVR sequences. The Tamura-Nei (1993) model with {Gamma} -distributed rates includes the parameters {pi}A, {pi}C, {pi}G, {pi}T, {kappa}, {tau}, and {alpha} that have to be estimated from the data. {pi} = ({pi}A, {pi}C, {pi}G, {pi}T) is the equilibrium distribution of base frequencies, the parameter {kappa} adjusts for the transition-transversion ratio, and {tau} describes the ratio of pyrimidine transitions to purine transitions. The shape parameter of the {Gamma} -distribution {alpha} is inversely related to the extent of rate heterogeneity among sites.

Parameter estimation:
The equilibrium distribution of base frequencies, {pi}, was estimated from the data by averaging the base composition of all sequences. This estimate should be very similar to the maximum-likelihood estimate (GOLDMANN 1993 Down). Estimation of the transition-transversion parameter {kappa}, the pyrimidine-purine transition parameter {tau}, and the rate-heterogeneity parameter {alpha} was done using a phylogenetic approach and a subsampling procedure. More precisely, we drew a random sample of different sequences from the data set containing either HVRI or HVRII sequences. From this random sample a tree was constructed and the parameters {kappa}, {alpha}, and {tau} were estimated from the tree using approximate maximum likelihood and discrete {Gamma} -distribution with eight categories as implemented in the PUZZLE program (STRIMMER and VON HAESELER 1996 Down). From biological considerations, one should expect a continuum of rates among sites (UZZEL and CORBIN 1971 Down; KOCHER and WILSON 1991 Down). However, maximum-likelihood calculations with the continuous {Gamma} -distribution involve intensive computations and are feasible only for data sets up to six sequences (YANG 1996 Down). We repeated the entire estimation procedure 150 times for different random samples of a given size to obtain parameter estimates that are not affected by the sample but are representative for all HVRI or HVRII sequences, respectively. Subsequently, we averaged the values of the parameter estimates from the subsamples. To investigate the effect of sample size on the estimates , , and , the above procedure was carried out for samples of size 10, 20, 30, ... , 80.

Site-specific rates:
In the following we assumed that the parameters {kappa}, {tau}, and {alpha} are known or estimates are given, for example, from the approach described in the previous section. To obtain estimates of the site-specific rates, we used a discretized {Gamma} -distribution (YANG 1993 Down). Here, the range of possible rates was divided into eight categories such that each category was equiprobable under the {Gamma} -distribution. Within each category we computed the median rate. Thus, we assumed that each site evolved according to one of these eight rates. The following procedure was applied to estimate site-specific rates: For a random sample of 50 different sequences a maximum-likelihood tree was computed. On the basis of this tree, the likelihood of a specific site was computed for each of the eight rates using PUZZLE (STRIMMER and VON HAESELER 1996 Down). The rate that yielded the highest likelihood value was assigned to this site. From a Bayesian point of view this corresponds to choosing the rate with the maximal posterior probability, because the way of discretizing the {Gamma} -distribution resulted in a uniform prior distribution. Thus, for each site in the alignment of 50 sequences, a rate was computed. The entire procedure was repeated 50 times for different random samples. Therefore we obtained for each site 50 estimates of its specific rate. The relative rate for each site was simply the average value.


*  RESULTS
*TOP
*ABSTRACT
*DATA
*MODEL AND METHODS
*RESULTS
*DISCUSSION
*LITERATURE CITED

Figure 1 displays the averages of the transition-transversion parameter {kappa}, the pyrimidine-purine transition parameter {tau}, and the rate-heterogeneity parameter {alpha} for HVRI and HVRII as a function of the number of different sequences sampled (sample size). With increasing sample size all estimates decrease and show few changes for samples of size 60 or more; this is also reflected in the decrease of the variance of the sample mean. The observation that 60 or more sequences are needed to reduce the bias in parameter estimates is due to the fact that the sequences are very similar, and so there is little information regarding the parameters. For more divergent sequences we do not expect such a strong relationship between subsample size and bias.



View larger version (23K):
In this window
In a new window
Download PPT slide
 
Figure 1. The ordinate represents the value of the parameter estimate, calculated as the average from 150 repeats. The abscissa gives the number of different sequences in each sample. HVRI and HVRII represent the estimated values for the transition-transversion parameter, HVRI and HVRII are the estimated values for the transition-transversion parameter, and HVRI and HVRII denote the estimates of the rate-heterogeneity parameter for HVRI and HVRII, respectively. Asterisks reflect averaged values from 150 repeats each; bars reflect twofold standard deviation of the sample mean.

In HVRI, the estimate of the transition-transversion parameter {kappa} decreases from 20.0 to 15.7 with increasing sample size, whereas {kappa} for HVRII varies from 7.1 to 7.6, more or less independently of the sample size. A similar picture emerges for the estimation of the pyrimidine-purine transition parameter {tau}. In HVRI, the estimated value of {tau} decreases from 2.5 to 1.75, and the estimated value of {tau} in HVRII varies between 1.07 and 1.18.

Because larger subsamples reflect the transition-transversion ratio and the pyrimidine-purine transition ratio with smaller standard derivation than smaller subsamples, we suppose that the estimated values of {kappa} and {tau} for the larger subsamples are closer to the true values. Small samples may contain none or too few transversions, and thus {kappa} is overestimated. If {kappa} were not inferred correctly, then {tau} could not be inferred correctly either.

To estimate the rate-heterogeneity parameter, {alpha} samples of size 10 were too small, whereas, for samples of size 20 and larger, estimates of {alpha} are close to 0.26 and 0.13 in HVRI and HVRII, respectively. Table 1 summarizes the averaged values for samples of size 80 and 150 repeats. The estimated transition-transversion parameter {kappa} in HVRI is approximately twice as high as the corresponding HVRII value. Accordingly, the estimate of the pyrimidine-purine transition parameter {tau} from HVRI is higher than the estimate of {tau} from HVRII. The smaller value of the estimated rate-heterogeneity parameter {alpha} in HVRII indicates that the mutation rate of this region is more heterogeneous than in HVRI. Calculation of the expected number of substitutions from the Tamura-Nei (1993) rate matrix reveals that the expected number of transversions is approximately the same for both HVRs. The two regions differ mainly in the number of pyrimidine transitions, leading to the higher pyrimidine-purine transition ratio and higher transition-transversion ratio in HVRI. The total number of substitutions is twice as high in HVRI as in HVRII (8.14:4.08). The parameter estimates in Table 1 provide a comprehensive model of HVRI and HVRII evolution. On the basis of this model, we estimated site-specific rates, summarized in Figure 2. As expected from the estimate of rate heterogeneity in HVRII, sites evolve either with a small relative rate or with a high rate. The fastest positions in HVRII evolve more than six times faster than the average substitution rate of this region. In HVRI the fastest positions evolve with relative rates of 4.8. The proportion of sites in HVRII with relative rates <0.001 (virtually not variable) is 0.54. This is almost twice as high as the value of 0.28 that we found for HVRI.



View larger version (37K):
In this window
In a new window
Download PPT slide
 
Figure 2. Estimated relative rates vs. sequence positions of HVRI and HVRII. Lengths of bars reflect the respective rates for each site; the average substitution rate is 1. Blue and green bars are at positions that have also been classified as fast by WAKELEY 1993 Down or HASEGAWA et al. 1993 Down. Red bars mark positions that have been identified as fast by both. Arrows indicate positions where substitutions were observed in family studies. Yellow, PARSONS et al. 1997 Down; orange, HOWELL et al. 1996 Down. Locations of major regulatory functions are given below the graph. SP, trinucleotide stop-point for the 3' ends of the 7S DNA strands (16104–16106; DODA et al. 1981 Down); 7S DNA, location of the 7S DNA (DODA et al. 1981 Down) that ranges from position 16106 to at least position 110 but not beyond position 440 (CHANG and CLAYTON 1985 Down); TAS, termination-associated sequence (16157–16172; DODA et al. 1981 Down); CE, possible control element (16194–16208; OHNO et al. 1991 Down); OH, origin of heavy strand replication (110–440; CHANG and CLAYTON 1985 Down); CSB I–III, conserved sequence blocks I–III (216–235, 299–315, 346–363; WALBERG and CLAYTON 1981 Down); TFB, mitochondrial transcription-factor binding sites (233–260, 276–303; CLAYTON 1991 Down); RMC, the RNase MRP cleaving site (317–321; CLAYTON 1991 Down). For further explanation, see text.


 
View this table:
In this window
In a new window

 
Table 1. Summary of estimated parameters and ratios


*  DISCUSSION
*TOP
*ABSTRACT
*DATA
*MODEL AND METHODS
*RESULTS
*DISCUSSION
*LITERATURE CITED

Applying maximum-likelihood methods to randomly selected subsets of different sequences, we estimated from the subsets the parameters of the Tamura-Nei (1993) model with rate heterogeneity. The importance of simultaneous parameter estimation, especially if rate heterogeneity exists among sites, has become increasingly clear (WAKELEY 1994 Down, WAKELEY 1996 Down; SWOFFORD et al. 1995 Down). Our estimates agree by and large with results from other studies, even though direct comparison is complicated by the different parts of the HVR studied. Moreover, assessing the significance of the differences to published estimates is demanding, because this requires determination of the variances of our estimation procedure. In principle, this could be done by one of the usual resampling techniques (EFRON and TIBSHIRANI 1993 Down). However, the amount of iterations necessary is computationally very expensive. HASEGAWA and HORAI 1991 Down analyzed a smaller sample composed of three data sets that cover different positions of the control region. They estimated transition-transversion ratios between 14.5 and 27.0, depending on the data set. Other estimates of the transition-transversion ratio range from 12 to 37 (HORAI and HAYASAKA 1990 Down; HASEGAWA and HORAI 1991 Down; KOCHER and WILSON 1991 Down; VIGILANT et al. 1991 Down; PESOLE et al. 1992 Down; TAMURA and NEI 1993 Down). TAMURA and NEI 1993 Down obtained by parsimony analysis transition-transversion ratios of 18.8, 12.2, and 15.7 for HVRI, HVRII, and both (HVRI + HVRII), respectively. Our estimated ratios (Table 1) are slightly lower than TAMURA and NEI's (1993), even though it is known that estimates of substitution parameter using parsimony are likely to be underestimates. It is possible that in TAMURA and NEI's (1993) data set transversions are underrepresented and thus the transition-transversion ratio is overestimated. For example, WARD et al. 1991 Down, who examined the transition-transversion ratio from a sample of 28 sequences of HVRI, found no transversions at all. This fits well with our observation that the transition-transversion parameter for HVRI decreases with increasing sample size. Estimates of the rate heterogeneity parameter {alpha} have been reported to be 0.11 for the entire control region (KOCHER and WILSON 1991 Down; TAMURA and NEI 1993 Down) and 0.47 for HVRI (WAKELEY 1993 Down). No separate estimates of {alpha} are published for HVRII, but it is known that HVRII has a higher heterogeneity of rates than HVRI (ARIS-BROSOU and EXCOFFIER 1996 Down). Our estimate of 0.26 for {alpha} in HVRI is substantially smaller than Wakeley's estimate of 0.47. This is due to the different data set and the bias inherent in the parsimony-based inference of rate heterogeneity (WAKELEY 1993 Down).

Up to now, estimates for site-specific rates have been derived only for HVRI by counting the numbers of substitutions in a most parsimonious tree (HASEGAWA et al. 1993 Down; WAKELEY 1993 Down). HASEGAWA et al. 1993 Down estimated the rates from only 14 HVRI sequences, studying the whole HVRI region, whereas WAKELEY 1993 Down estimated rates from positions 16130 to 16379 in 322 sequences. Positions that experienced more than five substitutions were classified as fast. Table 2 lists positions that are, according to our approach, at least twice as fast as the average rate in HVRI. The higher our rate estimate, the higher the concordance with positions that have been classified as fast by either WAKELEY 1993 Down and/or HASEGAWA et al. 1993 Down. Eight of the nine positions with rates >4 are also fast evolving according to the other studies. Both studies classified five positions as fast, for which we estimated moderate rates between 1 and 2 (16209, 16290, 16291, 16298, and 16304). Position 16234 has been classified as fast by WAKELEY 1993 Down and HASEGAWA et al. 1993 Down, for which we found a rate close to 1. However, no position has been classified as fast by either WAKELEY 1993 Down or HASEGAWA et al. 1993 Down, for which we found a relative rate of <1.


 
View this table:
In this window
In a new window

 
Table 2. HVRI positions with rate of >=2

Our approach, however, detected quite a few positions that are evolving moderately rapidly. The deviations are due to the smaller data sets analyzed by WAKELEY 1993 Down and HASEGAWA et al. 1993 Down. Moreover, they used different estimation methods. The parsimony method is known to be biased, because a parsimony reconstruction of the internal nodes of the tree gives the minimum number of changes required at a site, and the amount of rate heterogeneity therefore is underestimated (WAKELEY 1993 Down). Our results suggest that maximum-likelihood methods in combination with the subsampling strategy are an approach to detect rapidly evolving sites. Moreover, this approach has the advantage of rates estimated relative to the mean substitution rate. These rates are easier to interpret than actual numbers of substitutions and furthermore simplify comparison of rates among sites. According to our results, the proportion of almost invariable sites, 54% in HVRI and 28% in HVRII, is considerably smaller than the HASEGAWA and HORAI 1991 Down estimate of 70%. From neutral theory (KIMURA 1983 Down), the existence of almost invariable sites suggests that these sites are subject to functional constraints. Although the control region is noncoding, it is known to contain the main regulatory elements for transcription and replication. It is the binding site for numerous molecules such as DNA and RNA polymerases and other transcription and regulatory factors and thus may well be subjected to various evolutionary pressures (SACCONE et al. 1991 Down). HVRII is probably the more important functional part of the control region because it contains the origin of heavy strand replication (positions 110–440; CHANG and CLAYTON 1985 Down).

The major regulatory features of HVRII that lie within the region studied here are three conserved sequence blocks (CSBs) that have been suggested to serve as control sequences involved in the transition from primer RNA synthesis to DNA synthesis (WALBERG and CLAYTON 1981 Down), the RNase MRP cleaving site (RMC; CLAYTON 1991 Down), and two mitochondrial transcription-factor binding sites (TFB; CLAYTON 1991 Down). Looking at the rates of these regulatory elements, we see that the majority of the positions in the CSBs (216–235, 299–315, and 346–363) and the RMC (317–321) evolve much more slowly than an average site of HVRII, with the exception of position 357 in CSBIII, which has a rate of 2.29. The TFBs (233–260 and 276–303) show four sites (236, 247, 295, and 297) that evolve more than twice as fast as the average rate, while the rest of the positions show moderate variability (range, 0.0001–1.28). The finding that the mitochondrial transcription factor has flexible sequence specificity (FISHER et al. 1989 Down) might explain the slightly higher average variability found in the TFBs compared to the CSBs and the RMC. Positions 16104–16106 map the trinucleotide stop-point for the 3' ends of the 7S DNA strands (SP; DODA et al. 1981 Down). For these positions, as well as for a possible control element (CE; OHNO et al. 1991 Down; 16194–16208), most rates are close to 0. The termination-associated sequence TAS (DODA et al. 1981 Down; 16157–16172), which is a putative template stop signal for the elongation of the D-loop strands, shows more variability than the functional regions mentioned above. This analysis provides the interesting insight that functionally important regions do not necessarily have a small substitution rate.

Another question of interest is how well our site-specific estimates correlate with positions that were variable in pedigree analyses (HOWELL et al. 1996 Down; PARSONS et al. 1997 Down). The rates for these 10 positions are given in Table 3. Surprisingly, position 94 is invariant in our collection of 969 HVRII sequences (HANDT et al. 1998 Down). For positions 16092 and 234, the rates are <1. The 7 remaining positions show rates of >2. Among these are 3 positions with the highest rate of 6.22. Altogether our results suggest that the substitutions observed in family studies occur preferentially but not exclusively at sites with elevated rates. Even though diminishing the discrepancy (PAABO 1996 Down; JAZIN et al. 1998 Down; PARSONS and HOLLAND 1998 Down) of rates estimated from family and phylogenetic studies, our observations do not fully explain the ~20-fold higher rate of the latter. But one should be aware that this time we exploited a method where rates are collected in eight categories. Thus the category representing the highest rate lumps together all sites that are extremely fast. That is to say, an upper limit for the relative rate at a position is introduced. Therefore, it is possible that the numerical values for some of the fast positions are underestimated. The discrete {Gamma} -approach (YANG 1993 Down) yielded eight nonequidistant rates. Especially small evolutionary rate factors are very close to each other. Hence, for some site patterns the likelihood values do not differ substantially between neighboring categories. Because the final rate at a position is an average over many subsamples, possible inaccuracies in the rate assignment for a single subsample should have no substantial effect. Certainly these problems could be circumvented if a continuous {Gamma} -distribution were applied. Unfortunately, this is at present computationally unfeasible.


 
View this table:
In this window
In a new window

 
Table 3. Relative rates for positions where subsitutions were observed in family studies

In this article we used maximum-likelihood methods to coestimate the parameters of the Tamura-Nei (1993) model including rate heterogeneity. By using a purely phylogenetic approach, we regarded the sequences as an interspecies data set. Therefore, the analyses were based on a restricted set of sequences, where each sequence type of the data collection was represented only once. This restriction may result in a loss of information about the mutational process, because clearly the human mtDNA variation is shaped by demographic processes as well. Obviously, it would be desirable to estimate substitution and demographic parameters simultaneously by using a population genetics model. To this end, coalescence theory (KINGMAN 1982A Down, KINGMAN 1982B Down, KINGMAN 1982C Down) provides an excellent tool to describe the ancestral relationship of a sample of sequences under various demographic scenarios (DONNELLY and TAVARE 1995 Down). The mutational process has been studied under demographic models of limited complexity (LUNDSTROM et al. 1992A Down, LUNDSTROM et al. 1992B Down). However, accurately modeling the dynamics of the worldwide population may result in an overly parameter-rich model. On the other hand, it seems worthwhile to put further effort into creating a more exact picture of the relative rates, as knowledge of the site-specific rates combined with the strength of the pedigree approach (HOWELL et al. 1996 Down; PARSONS et al. 1997 Down) provides a powerful tool to unravel the absolute mutation rate in HVR. Our study is a step in this direction, as relative rates for the HVR have been estimated on the basis of a survey of a large amount of data. Our parameter estimates and rate estimates can be used to refine models of molecular structure and function as well as methods of phylogenetic inference. Thus, a better understanding of the forces and mechanisms that affect sequence evolution can be obtained and more sound conclusions about history of sequences and populations can be drawn.


*  ACKNOWLEDGMENTS

We express our special thanks to Korbinian Strimmer, Roland Fleißner, and Svante Pääbo for stimulating discussions. We also thank Simon Tavaré and two anonymous referees for helpful comments on the manuscript. Financial support from the Deutsche Forschungsgemeinschaft is greatly appreciated.

Manuscript received August 6, 1998; Accepted for publication April 2, 1999.


*  LITERATURE CITED
*TOP
*ABSTRACT
*DATA
*MODEL AND METHODS
*RESULTS
*DISCUSSION
*LITERATURE CITED

ANDERSON, S., A. T. BANKIER, B. G. BARELL, M. H. L. DE BRUIJN, and A. R. COULSON et al., 1981  Sequence and organization of the human mitochondrial genome. Nature 290:457-465[Medline].

AQUADRO, C. F. and B. D. GREENBERG, 1983  Human mitochondrial DNA variation and evolution: analysis of nucleotide sequences from seven individuals. Genetics 103:287-312[Abstract/Free Full Text].

ARIS-BROSOU, S. and L. EXCOFFIER, 1996  The impact of population expansion and mutation rate heterogeneity on DNA sequence polymorphism. Mol. Biol. Evol. 13:494-504[Abstract].

BERTORELLE, G. and M. SLATKIN, 1995  The number of segregating sites in expanding human populations, with implications for estimates of demographic parameters. Mol. Biol. Evol. 12:887-892[Abstract].

CANN, R., M. STONEKING, and A. C. WILSON, 1987  Mitochondrial DNA and human evolution. Nature 325:31-36.

CHANG, D. D. and D. A. CLAYTON, 1985  Priming of human mitochondrial DNA replication occurs at the light-strand promotor. Proc. Natl. Acad. Sci. USA 82:351-355[Abstract/Free Full Text].

CLAYTON, D. A., 1991  Nuclear gadgets in mitochondrial DNA replication and transcription. Trends Biol. Sci. 16:107-111.

DENG, H. and Y. FU, 1996  The effects of variable mutation rates across sites on the phylogenetic estimation of effective population size or mutation rate of dna sequences. Genetics 144:1271-1281[Abstract].

DODA, N. D., C. T. WRIGHT, and D. A. CLAYTON, 1981  Elongation of displacement-loop strands in human and mouse mitochondrial DNA is arrested near specific template sequences. Proc. Natl. Acad. Sci. USA 10:6116-6120.

DONNELLY, P. and S. TAVARÉ, 1995  Coalescents and genealogical structure under neutrality. Annu. Rev. Genet. 29:401-421[Medline].

EFRON, B., and R. TIBSHIRANI, (Editors) 1993 Monographs in Statistics and Applied Probability. Chapman & Hall, London/New York.

FISHER, R. P., M. A. PARAISI, and D. A. CLAYTON, 1989  Flexible recognition of rapidly evolving promotor sequences by mitochondrial transcription factor 1. Genes Dev. 3:2202-2217[Abstract/Free Full Text].

GARNER, K. J. and O. A. RYDER, 1996  Mitochondrial DNA diversity in gorillas. Mol. Phyol. Evol. 6:39-48.

GOLDBERG, T. and M. RUVOLO, 1997  The geographic apportionment of the mitochondrial genetic diversity in east African chimpanzees, Pan Troglodytes schweinfurthii. Mol. Biol. Evol. 14:976-984[Abstract].

GOLDMANN, N., 1993  Statistical tests of models of DNA substitution. J. Mol. Evol. 36:182-198[Medline].

HANDT, O., S. MEYER, and A. VON HAESELER, 1998  Compilation of human mtDNA control region sequences. Nucleic Acids Res. 26:126-129[Abstract/Free Full Text].

HASEGAWA, M. and S. HORAI, 1991  Time of the deepest root for polymorphism in human mitochondrial DNA. J. Mol. Evol. 32:37-42[Medline].

HASEGAWA, M., A. D. RIENZO, T. KOCHER, and A. WILSON, 1993  Toward a more accurate time scale for the human mitochondrial DNA tree. J. Mol. Evol. 37:347-354[Medline].

HORAI, S. and K. HAYASAKA, 1990  Intraspecific nucleotide sequence differences in the major noncoding region of human mitochondrial DNA. Am. J. Hum. Genet. 46:828-842[Medline].

HOWELL, N., I. KUBACKA, and D. A. MACKEY, 1996  How rapidly does the human genome evolve? Am. J. Hum. Genet. 59:501-509[Medline].

JAZIN, E., H. SOODYALL, P. JALONEN, E. LINDHOLM, and M. STONEKING et al., 1998  Mitochondrial mutation rate revisited: hot spots and polymorphism. Nat. Genet. 18:109-110[Medline].

KIMURA, M., 1983 The Neutral Theory of Molecular Evolution. Cambridge University Press, London.

KINGMAN, J. F. C., 1982a  The coalescent. Stoch. Proc. Appl. 13:235-248.

KINGMAN, J. F. C., 1982b  On the genealogy of large populations. J. Appl. Prob. 19A:27-43.

KINGMAN, J. F. C., 1982c Exchangeability and the evolution of large populations, pp. 97–112 in Exchangeability in Probability and Statistics, edited by G. KOCH and F. SPIZZICHINO. North-Holland Publishing Company, Amsterdam.

KOCHER, T. D., and A. C. WILSON, 1991 Sequence evolution of mitochondrial DNA in humans and chimpanzees: control region and protein-coding regions, pp. 391–413 in Evolution of Life: Fossils, Molecules and Culture, edited by S. OSAWA and T. HONIO. Springer Verlag, Tokyo.

KRINGS, M., A. STONE, R. W. SCHMITZ, H. KRAINITZKY, and M. STONEKING et al., 1997  Neanderthal DNA sequences and the origin of modern humans. Cell 90:19-30[Medline].

LUNDSTROM, R., S. TAVARÉ, and R. H. WARD, 1992a  Modelling the evolution of the human mitochondrial genome. Math. Biosci. 112:319-335[Medline].

LUNDSTROM, R., S. TAVARÉ, and R. H. WARD, 1992b  Estimating substitution rates from molecular data using the coalescent. Proc. Natl. Acad. Sci. USA 89:5961-5965[Abstract/Free Full Text].

MISAWA, K. and F. TAJIMA, 1997  Estimation of the amount of dna polymorphism when the neutral mutation rate varies among sites. Genetics 147:1959-1964[Abstract].

OHNO, K., M. TANAKA, H. SUZUKI, T. OHBAYASHI, and S. I. IKEBA et al., 1991  Identification of a possible control element, mt5, in the major noncoding region of mitochondrial DNA by intraspecific nucleotide conservation. Biochem. Int. 24:263-272[Medline].

PÄÄBO, S., 1996  Mutational hot spots in the mitochondrial microcosm. Am. J. Hum. Genet. 59:493-496[Medline].

PARSONS, T. and M. M. HOLLAND, 1998  Response to: Mitochondrial mutation rate revisited: hot spots and polymorphism. Nat. Genet. 18:110-110.

PARSONS, T. J., D. S. MUNIEC, K. SULLIVAN, N. WOODYATT, and R. ALLISTON-GREINER et al., 1997  A high observed substitution rate in the human mitochondrial DNA control region. Nat. Genet. 15:363-367[Medline].

PESOLE, G., E. SBISA, G. PREPARATA, and C. SACCONE, 1992  The evolution of the mitochondrial D-loop region and the origin of modern man. Mol. Biol. Evol. 9:587-598[Abstract].

SACCONE, C., G. PESOLE, and E. SBISÁ, 1991  The main regulatory region of mammalian mitochondrial DNA: structure-function model and evolutionary pattern. J. Mol. Evol. 33:83-91[Medline].

STONEKING, M., S. T. SHERRY, A. J. REDD, and L. VIGILANT, 1992  New approaches to dating suggest a recent age for the human mtDNA ancestor. Philos. Trans. R. Soc. Lond. 337:167-175[Medline].

STRIMMER, K. and A. VON HAESELER, 1996  Quartet puzzling: a quartet maximum likelihood method for reconstructing tree topologies. Mol. Biol. Evol. 13:964-969.

SWOFFORD, D. L., G. J. OLSEN, P. J. WADDELL and D. M. HILLIS, 1995 Accommodating rate heterogeneity among sites, pp. 442–445 in Molecular Systematics, edited by D. M. HILLIS, C. MORITZ and B. K. MABLE. Sinauer, Sunderland, MA.

TAJIMA, F., 1996  The amount of DNA polymorphism maintained in a finite population when the neutral mutation rate varies among sites. Genetics 143:1457-1465[Abstract].

TAMURA, K. and M. NEI, 1993  Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol. Biol. Evol. 10:512-526[Abstract].

UZZEL, T. and K. W. CORBIN, 1971  Fitting discrete probability distributions to evolutionary events. Science 172:1089-1096[Abstract/Free Full Text].

VIGILANT, L., M. STONEKING, H. HARPENDING, K. HAWKES, and A. C. WILSON, 1991  African populations and the evolution of mitochondrial DNA. Science 253:1503-1507[Abstract/Free Full Text].

WAKELEY, J., 1993  Substitution rate variation among sites in hypervariable region 1 of human mitochondrial DNA. J. Mol. Evol. 37:613-623[Medline].

WAKELEY, J., 1994  Substitution rate variation among sites and the estimation of transition bias. Mol. Biol. Evol. 11:436-442[Abstract].

WAKELEY, J., 1996  The excess of transitions among nucleotide substitutions: new methods of estimating transition bias underscore its significance. TREE 11:158-163.

WALBERG, M. W. and D. A. CLAYTON, 1981  Sequence and properties of the human KB cell and mouse L cell D-loop regions of mitochondrial DNA. Nucleic Acids Res. 9:5411-5421[Abstract/Free Full Text].

WARD, R. H., B. L. FRAZIER, K. DEW-JAGER, and S. PÄÄBO, 1991  Extensive mitochondrial diversity within a single Amerindian tribe. Proc. Natl. Acad. Sci. USA 88:8720-8724[Abstract/Free Full Text].

WARD, R. H., A. REDD, D. VALENCIA, B. FRAZIER, and S. PÄÄBO, 1993  Genetic and linguistic differentiation in the Americas. Proc. Natl. Acad. Sci. USA 90:10663-10667[Abstract/Free Full Text].

WATSON, E., K. BAUER, R. AMAN, G. WEISS, and A. VON HAESELER et al., 1996  mtDNA sequence diversity in Africa. Am. J. Hum. Genet. 59:437-444[Medline].

WEISS, G. and A. VON HAESELER, 1998  Inference of population history using a likelihood approach. Genetics 149:1539-1546[Abstract/Free Full Text].

YANG, Z., 1993  Maximum likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites. Mol. Biol. Evol. 10:1396-1401[Abstract].

YANG, Z., 1996  Among-site rate variation and its impact on phylogenetic analyses. TREE 11:367-372.




This article has been cited by other articles:


Home page
J HeredHome page
A. Blancher, M. Bonhomme, B. Crouau-Roy, K. Terao, T. Kitano, and N. Saitou
Mitochondrial DNA Sequence Phylogeny of 4 Populations of the Widely Distributed Cynomolgus Macaque (Macaca fascicularis fascicularis)
J. Hered., May 1, 2008; 99(3): 254 - 264.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
M. M. Pilkington, J. A. Wilder, F. L. Mendez, M. P. Cox, A. Woerner, T. Angui, S. Kingan, Z. Mobasher, C. Batini, G. Destro-Bisol, et al.
Contrasting Signatures of Population Growth for Mitochondrial DNA and Y Chromosomes among Human Populations in Africa
Mol. Biol. Evol., March 1, 2008; 25(3): 517 - 525.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
N. Howell, J. L. Elson, C. Howell, and D. M. Turnbull
Relative Rates of Evolution in the Coding and Control Regions of African mtDNAs
Mol. Biol. Evol., October 1, 2007; 24(10): 2213 - 2221.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
S. A. Tishkoff, M. K. Gonder, B. M. Henn, H. Mortensen, A. Knight, C. Gignoux, N. Fernandopulle, G. Lema, T. B. Nyambo, U. Ramakrishnan, et al.
History of Click-Speaking Populations of Africa Inferred from mtDNA and Y Chromosome Genetic Variation
Mol. Biol. Evol., October 1, 2007; 24(10): 2180 - 2195.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
M. G. B. Blum and N. A. Rosenberg
Estimating the Number of Ancestral Lineages Using a Maximum-Likelihood Method Based on Rejection Sampling
Genetics, July 1, 2007; 176(3): 1741 - 1757.
[Abstract] [Full Text] [PDF]


Home page
J. Gen. Virol.Home page
Y. Zhou, H. Ushijima, and T. K. Frey
Genomic analysis of diverse rubella virus genotypes
J. Gen. Virol., March 1, 2007; 88(3): 932 - 941.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
M. K. Gonder, H. M. Mortensen, F. A. Reed, A. de Sousa, and S. A. Tishkoff
Whole-mtDNA Genome Sequence Analysis of Ancient African Lineages
Mol. Biol. Evol., March 1, 2007; 24(3): 757 - 768.
[Abstract] [Full Text] [PDF]


Home page
Proc. Natl. Acad. Sci. USAHome page
E. M. S. Belle, U. Ramakrishnan, J. L. Mountain, and G. Barbujani
Serial coalescent simulations suggest a weak genealogical relationship between Etruscans and modern Tuscans
PNAS, May 23, 2006; 103(21): 8012 - 8017.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
C. Santos, R. Montiel, B. Sierra, C. Bettencourt, E. Fernandez, L. Alvarez, M. Lima, A. Abade, and M. P. Aluja
Understanding Differences Between Phylogenetic and Pedigree-Derived mtDNA Mutation Rate: A Model Using Families from the Azores Islands (Portugal)
Mol. Biol. Evol., June 1, 2005; 22(6): 1490 - 1505.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
C. Lalueza-Fox, M. L. Sampietro, D. Caramelli, Y. Puder, M. Lari, F. Calafell, C. Martinez-Maza, M. Bastir, J. Fortea, M. d. l. Rasilla, et al.
Neandertal Evolutionary Genetics: Mitochondrial DNA Data from the Iberian Peninsula
Mol. Biol. Evol., April 1, 2005; 22(4): 1077 - 1081.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
N. Howell, J. L. Elson, D. M. Turnbull, and C. Herrnstadt
African Haplogroup L mtDNA Sequences Show Violations of Clock-like Evolution
Mol. Biol. Evol., October 1, 2004; 21(10): 1843 - 1854.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
I. Mayrose, D. Graur, N. Ben-Tal, and T. Pupko
Comparison of Site-Specific Rate-Inference Methods for Protein Sequences: Empirical Bayesian Methods Are Superior
Mol. Biol. Evol., September 1, 2004; 21(9): 1781 - 1791.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
S. Fuselli, E. Tarazona-Santos, I. Dupanloup, A. Soto, D. Luiselli, and D. Pettener
Mitochondrial DNA Diversity in South America and the Genetic History of Andean Highlanders
Mol. Biol. Evol., October 1, 2003; 20(10): 1682 - 1691.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
S. Meyer and A. von Haeseler
Identifying Site-Specific Substitution Rates
Mol. Biol. Evol., February 1, 2003; 20(2): 182 - 189.
[Abstract] [Full Text] [PDF]


Home page
Cancer Res.Home page