Abstract
To test whether patterns of mitochondrial DNA (mtDNA) variation are consistent with a neutral model of molecular evolution, nucleotide sequences were determined for the 1041 bp of the NADH dehydrogenase subunit 2 (ND2) gene in 20 geographically diverse humans and 20 common chimpanzees. Contingency tests of neutrality were performed using four mutational categories for the ND2 molecule: synonymous and nonsynonymous mutations in the transmembrane regions, and synonymous and nonsynonymous mutations in the surface regions. The following three topological mutational categories were also used: intraspecific tips, intraspecific interiors, and interspecific fixed differences. The analyses reveal a significantly greater number of nonsynonymous polymorphisms within human transmembrane regions than expected based on interspecific comparisons, and they are inconsistent with a neutral equilibrium model. This pattern of excess nonsynonymous polymorphism is not seen within chimpanzees. Statistical tests of neutrality, such as Tajima's D test, and the D and F tests proposed by Fu and Li, indicate an excess of low frequency polymorphisms in the human data, but not in the chimpanzee data. This is consistent with recent directional selection, a population bottleneck or background selection of slightly deleterious mutations in human mtDNA samples. The analyses further support the idea that mitochondrial genome evolution is governed by selective forces that have the potential to affect its use as a “neutral” marker in evolutionary and population genetic studies.
The neutral theory of molecular evolution asserts that most mutations are deleterious and are quickly removed from the population, thereby contributing little, if anything, to the levels of polymorphism detected within a species (Kimura 1983). Genetic variation within a species is largely caused by random genetic drift of mutations that are selectively equivalent (i.e., neutral). One of the appealing features of the neutral theory is that it provides a number of straightforward predictions, and thus serves as a useful null hypothesis for studies of genetic variation within and between species. One such prediction is that the amount of nucleotide polymorphism within a species will be correlated with the amount of divergence between species (e.g., Hudson et al. 1987). An additional prediction of the neutral theory, formulated into a test by McDonald and Kreitman (1991), is that the ratio of amino acid replacement (nonsynonymous) to silent (synonymous) nucleotide differences will be the same within and between species.
Mitochondrial DNA (mtDNA) is widely used as a marker in population and evolutionary studies, and it is generally assumed to evolve according to a neutral model of molecular evolution. This assumption may be important for such things as measuring gene flow (Slatkin 1985), estimating effective population size (Wilson et al. 1985), detecting population subdivision (Avise et al. 1987), and dating times of species divergence or historical events within a species using a molecular clock (Brown 1980; Cann et al. 1987). The apparent lack of genetic recombination in mammalian mtDNA (Thyagarajan et al. 1996) means that the whole genome is a single completely linked entity. Any selective force acting at one site will equally affect the history of the whole molecule. Thus, the selective fixation of an advantageous mutation, for example, will lead to the concomitant fixation of all other polymorphisms through the process of genetic hitchhiking. Analyses of the simple hitchhiking model predict a reduction in heterozygosity (Maynard-Smith and Haigh 1974; Stephan et al. 1992) and in the number of segregating (or polymorphic) sites (Kaplan et al. 1989) near the selected site.
A number of studies have compared patterns of RFLP variation in human mtDNA to neutral predictions (e.g., Johnson et al. 1983; Whittam et al. 1986; Excoffier 1990; Merriwether et al. 1991) and found fewer intermediate frequency polymorphisms than expected using Watterson's (1978) test of homozygosity and/or Tajima's (1989) test. While these findings are inconsistent with neutral expectations, it is unclear whether the deviations arise from recent natural selection or changing population sizes. Rogers and Harpending (1992) studied the distribution of pairwise nucleotide differences for human mitochondrial data and found that the distribution does not conform to a neutral equilibrium model. They suggest that the results fit well with a rapid population expansion, but they do not rule out the possibility of a population bottleneck. A bottleneck could well have been the result of a selective sweep of a mtDNA type rather than an actual population size reduction. In these studies, departures from the neutral equilibrium model can be explained by a variety of processes, including selection.
Evidence for selection in mtDNA comes from more recent studies utilizing the McDonald and Kreitman (1991) approach (Ballard and Kreitman 1994; Nachman et al. 1994, 1996; Rand et al. 1994; Templeton 1996). Common to all of these studies was the finding of higher ratios of nonsynonymous to synonymous nucleotide differences within species than between species either for all or part of the genes in question.
In humans and chimpanzees, previous studies have involved DNA sequence data from the 345-bp NADH dehydrogenase subunit 3 (ND3) gene (Nachman et al. 1996) and the 783-bp cytochrome c oxidase subunit II (COII) gene (Templeton 1996) using limited sample sizes (particularly chimpanzees). To investigate neutral predictions further, we have sequenced the 1041-bp NADH dehydrogenase subunit 2 (ND2) gene in 20 geographically diverse humans and 20 common chimpanzees. The patterns of variation within these species were compared to the patterns between species by using a simple contingency test of neutrality (McDonald and Kreitman 1991; Templeton 1987, 1996). Departures from neutrality were also investigated using Tajima's (1989) D test (hereafter referred to as DT), as well as the D and F tests proposed by Fu and Li (1993).
MATERIALS AND METHODS
Samples: Twenty human blood samples were obtained from indigenous populations of Africa (Bantu from Durban, South Africa), Asia (Cantonese from Hong Kong), Europe (AngloCelts from Canberra, Australia), and Australia (Aboriginal Australians from the Kimberley region of Western Australia).
Common chimpanzee (Pan troglodytes) blood samples were obtained from animals held under long-term observation in one of several primate colonies at the Laboratory of Slow, Latent and Temperate Virus Infection of the National Institutes of Health (NIH, Bethesda, MD). They were supplied by D. C. Gajdusek and C. J. Gibbs Jr. Twenty individuals were included in the study. These were drawn from a larger sample of 102 individuals, the majority of which were wild caught (Board et al. 1981). Three major subspecies of Pan troglodytes are currently recognized: P. t. troglodytes, P. t. schweinfurthii, and P. t. verus. In the wild, these subspecies are geographically isolated yet very similar morphologically. The geographic origin of most captive chimpanzees in the United States, including those in our sample, is unknown. Previous analysis of mitochondrial control region sequences (Wise et al. 1997) indicates, however, that all but two of the individuals (Pt175 and Pt176) included in the present study are from the west African subspecies P. t. verus. Furthermore, individual Pt281 may belong to the newly recognized subclade of western chimpanzees in Nigeria (Gonder et al. 1997).
DNA amplification and sequencing: Sequencing templates were prepared by polymerase chain reaction (PCR) amplification of three overlapping fragments (I–III) encompassing the 1041-bp ND2 gene in all individuals. Amplification primers were designed using the sequences reported by Horai et al. (1992) and are given in Table 1. PCR cycling conditions for fragments I and III were 30 cycles of 96° for 1 min, 56° for 1 min, and 73° for 1 min. Fragment II required the lower annealing temperature of 52°. For each fragment, two separate PCR reactions were performed with the primers (L′ biotinylated and H′ –21M13) and (L′ M13 reverse and H′ biotinylated). For each individual, both H (heavy) and L (light) strands were sequenced on a DNA sequencer (model 373A; Applied Biosystems, Foster City, CA). Consensus sequences were obtained by aligning forward- and reverse-complement sequences from the same individual in the SeqEd 675 DNA Sequence Editor program (Applied Biosystems). Fragments I–III were concatenated for each individual, and sequence alignment was performed manually using Genetic Data Environment (GDE) 2.2 (Smith et al. 1994). Previously published human (Anderson et al. 1981) and chimpanzee (Horai et al. 1992) ND2 sequences were used in the comparative analyses. The 20 human and 20 chimpanzee mitochondrial ND2 sequences reported here have been deposited in the DDBJ/EMBL/GenBank International Nucleotide Sequence Database under accession numbers AF014882–AF014921.
Intraspecific variation: The amount of genetic variation within a species was estimated from the number of segregating (polymorphic) sites (S) using the following relation:
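A standard estimator based on S, and presumably the one intended here, is Watterson's (1975) θ. In LaTeX notation, with n the number of sequences sampled (the symbol a_n is introduced here only for convenience and is an assumption of this sketch):

```latex
\hat{\theta} = \frac{S}{a_n}, \qquad a_n = \sum_{i=1}^{n-1} \frac{1}{i}
```

Dividing this value by the number of sites compared (1041 for ND2) would give a per-site estimate.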
Table 1. Primers for the amplification of the NADH dehydrogenase subunit 2 (ND2) gene of humans and chimpanzees
Median network analysis: A median network approach (Bandelt et al. 1995) was used to portray the human and chimpanzee mitochondrial ND2 sequence relationships. Median networks are generated by partitioning the groups of sequence types character by character. An unmodified network contains all most parsimonious solutions and displays graphically the full information content of the sequence data. This approach highlights any incompatibility between pairs of characters, which enables identification of homoplasy (parallel mutation events or reversals), and can assist in identifying sequencing errors.
Tests of neutrality: The data were tested for departures from the neutral expectation that the ratio of nonsynonymous to synonymous polymorphisms within species should equal the ratio of nonsynonymous to synonymous fixed differences between species (McDonald and Kreitman 1991). Within and between species, nucleotide differences were counted as the number of mutational events occurring along the various branches of a network connecting humans, chimpanzees, and gorilla. All of these nucleotide differences were classified as either nonsynonymous or synonymous.
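The classification step can be illustrated with a minimal sketch. It assumes the vertebrate mitochondrial genetic code (NCBI translation table 2) and a change at a single codon position. The names `MITO_CODE` and `classify_change` are hypothetical and are not taken from the original analysis.

```python
from itertools import product

BASES = "TCAG"
# Vertebrate mitochondrial code (NCBI translation table 2).
# Codons are ordered TTT, TTC, TTA, TTG, TCT, ... (third base varies fastest).
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
MITO_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_change(codon_before, codon_after):
    """Label a single-nucleotide codon change as synonymous or nonsynonymous."""
    n_diff = sum(a != b for a, b in zip(codon_before, codon_after))
    if n_diff != 1:
        raise ValueError("codons must differ at exactly one position")
    same_aa = MITO_CODE[codon_before] == MITO_CODE[codon_after]
    return "synonymous" if same_aa else "nonsynonymous"


if __name__ == "__main__":
    print(classify_change("ATA", "ATG"))  # both encode Met in this code
    print(classify_change("CTA", "CCA"))  # Leu -> Pro
```

Codons that differ at two or three positions require a choice among mutational pathways, which this sketch deliberately does not attempt.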
In addition to the standard categories of nonsynonymous vs. synonymous mutations, an additional pair of categories was defined based on the predicted secondary structure of the human ND2 protein (Persson and Argos 1994). According to this model, there are 10 transmembrane domains of approximately equal length (21–29 codons) and 11 surface domains of various lengths (1–24 codons). Hence, there are a total of four mutational categories: transmembrane nonsynonymous, transmembrane synonymous, surface nonsynonymous, and surface synonymous.
Contingency tables were constructed in which one dimension consists of the structural mutational categories and the other dimension consists of the “fixed” vs. “polymorphic” categories. The intraspecific “polymorphic” class was further split into those mutations falling on “tip” branches vs. “interior” branches (Castelloe and Templeton 1994; Templeton 1996). Two-by-two contingency tables were analyzed using Fisher's exact test (FET). Larger tables were analyzed with an exact permutational test using the algorithm of Zaykin and Pudovkin (1993) and using 1000 random permutations of the data to simulate the null hypothesis of homogeneity. Uncorrected values were used in the statistical tests. This ensures that all observations are independent and results in conservative tests when there is an excess of polymorphic nonsynonymous differences since the number of fixed synonymous differences may be underestimated (Maynard-Smith 1994).
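The idea of a permutational test of homogeneity can be sketched as follows. This is a generic Monte Carlo permutation test and not the Zaykin and Pudovkin (1993) algorithm itself; the use of a chi-square statistic, the function names, and the fixed random seed are all assumptions made for illustration.

```python
import random


def chi_square(table):
    """Pearson chi-square statistic for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:  # skip cells in empty rows or columns
                stat += (observed - expected) ** 2 / expected
    return stat


def permutation_test(table, n_perm=1000, seed=0):
    """Fraction of random permutations at least as extreme as the data."""
    rng = random.Random(seed)
    rows, cols = [], []
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            rows.extend([i] * count)
            cols.extend([j] * count)
    observed = chi_square(table)
    n_rows, n_cols = len(table), len(table[0])
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(cols)  # break any row/column association, keep margins
        permuted = [[0] * n_cols for _ in range(n_rows)]
        for i, j in zip(rows, cols):
            permuted[i][j] += 1
        if chi_square(permuted) >= observed - 1e-12:
            hits += 1
    return hits / n_perm


if __name__ == "__main__":
    # Hypothetical counts: rows are mutational categories,
    # columns are tip, interior, and fixed.
    example = [[6, 2, 5], [1, 0, 6], [4, 3, 50], [2, 1, 34]]
    print(permutation_test(example))
```

With this estimator, a reported probability of 0 means that none of the random permutations was as extreme as the observed table, that is, the probability is smaller than 1/n_perm.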
For n nucleotide sequences, quantities such as π, the average number of pairwise nucleotide differences between sequences (Nei 1987), S, the number of segregating (or polymorphic) sites (Watterson 1975), and the total number (η) of mutations and the number (ηe) of mutations in the external branches (Fu and Li 1993) may be calculated. These quantities were used to investigate departures from neutrality using the DT test [Equation 38 in Tajima (1989)], and the D and F tests proposed by Fu and Li (1993). This was done for all nucleotide sites, nonsynonymous sites, and synonymous sites in humans and chimpanzees.
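As a minimal sketch, S and π can be computed from a set of aligned sequences of equal length as shown below. The function names are hypothetical, gaps and ambiguity codes are not handled, and η and ηe are omitted because counting mutations on external branches requires a genealogy or an outgroup.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Number of alignment columns with more than one nucleotide (S)."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def mean_pairwise_differences(seqs):
    """Average number of differences over all pairs of sequences (pi)."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


if __name__ == "__main__":
    # Toy alignment, not real ND2 data.
    toy = ["AAAA", "AAAT", "AATA", "ATAA"]
    print(segregating_sites(toy), mean_pairwise_differences(toy))
```

Dividing π by the number of sites compared gives the per-site nucleotide diversity.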
RESULTS
Intraspecific variation: We determined the nucleotide sequence of the 1041-bp coding region of the ND2 gene (positions 4470–5510 according to the numbering system of Anderson et al. 1981) for 20 human and 20 common chimpanzee individuals. These sequences were compared with a previously published human (Anderson et al. 1981), chimpanzee, bonobo, and gorilla (Horai et al. 1992) sequence. All polymorphic and fixed nucleotide sites are shown in Figure 1. Polymorphism data within humans and chimpanzees are summarized in Table 2.
Within humans, there were 17 sequence types among 21 individuals, and two sequence types were shared among two or four individuals. Most differences between the sequences (92.7%) result from transition-type mutations, which is consistent with the general patterns of mtDNA sequence variation in humans (e.g., Aquadro and Greenberg 1983; Greenberg et al. 1983; Vigilant et al. 1989; Horai and Hayasaka 1990; Horai et al. 1993; Watson et al. 1996). The uncorrected nucleotide diversity (per site) for the entire human sample was π = 0.24% ± 0.15%, πN = 0.17% ± 0.12% per nonsynonymous site, and πS = 0.46% ± 0.34% per synonymous site (Table 2).
Within chimpanzees, there were 16 sequence types among 21 individuals, and four sequence types were shared among two or three individuals. Again, differences between the sequences (99.1%) result almost exclusively from transition-type mutations. The bias toward transitions has been noted in previous sequence comparisons of mtDNA in chimpanzees (Morin et al. 1994; Wise et al. 1997). The uncorrected nucleotide diversity (per site) for the entire chimpanzee sample was π = 1.02% ± 0.54%, πN = 0.31% ± 0.20% per nonsynonymous site, and πS = 3.12% ± 1.69% per synonymous site (Table 2). This high level of diversity derives partly from the presence of a few very divergent sequence types (Pt175 and Pt176) that differ at 28 out of 1041 sites (2.69%). This is considerably greater than the most divergent human sequence types, which differ at six out of 1041 sites (0.58%). It is, however, similar to the level of divergence reported for a small section of the mitochondrial cytochrome b gene (2.8%; Morin et al. 1994) and the ND3 gene (2.03%; Nachman et al. 1996) between P. t. verus and either P. t. troglodytes or P. t. schweinfurthii. Divergence between P. t. troglodytes and P. t. schweinfurthii at cytochrome b is < 0.5% (Morin et al. 1994). It is therefore likely that our sample includes P. t. verus and at least one of the two closely related subspecies, P. t. troglodytes and P. t. schweinfurthii, as noted previously (Wise et al. 1997).
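The uncorrected divergence figures and the transition counts quoted above follow from a simple site-by-site comparison of two aligned sequences. The sketch below uses hypothetical function names and placeholder sequences (not real ND2 data) constructed only to reproduce the 28-in-1041 and 6-in-1041 proportions.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(a, b):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES)


def compare(seq1, seq2):
    """Return (uncorrected p-distance, transitions, transversions)."""
    diffs = [(a, b) for a, b in zip(seq1, seq2) if a != b]
    transitions = sum(is_transition(a, b) for a, b in diffs)
    return len(diffs) / len(seq1), transitions, len(diffs) - transitions


if __name__ == "__main__":
    # Placeholder sequences of ND2 length with 28 and 6 differences.
    reference = "A" * 1041
    divergent_chimp = "G" * 28 + "A" * 1013
    divergent_human = "G" * 6 + "A" * 1035
    for other in (divergent_chimp, divergent_human):
        p, ts, tv = compare(reference, other)
        print(f"{100 * p:.2f}% divergence, {ts} transitions, {tv} transversions")
```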
To ensure that the chimpanzee sample represents a single interbreeding group, individuals Pt175 and Pt176 were excluded from all analyses. The uncorrected nucleotide diversity (per site) for P. t. verus is π = 0.73% ± 0.40%, πN = 0.18% ± 0.13% per nonsynonymous site, and πS = 2.35% ± 1.31% per synonymous site (Table 2).
Network evaluation: Before the contingency analysis can be performed, it is necessary to resolve any ambiguities in the networks, since this will determine the topological categories into which sequence differences are sorted. Figure 2A shows the unrooted median network for 17 human ND2 sequence types. There is a box of ambiguity involving nucleotide positions 4917 and 5147 (numbered according to Anderson et al. 1981). Since position 4917 is a nonsynonymous change, it is more parsimonious to assume that two mutational events have occurred at position 5147. This could involve either a reversal or two parallel mutations, which will be referred to as resolutions Ia and IIa, respectively. Figure 2B shows the unrooted median network for 16 chimpanzee ND2 sequence types. Homoplasies occur at nucleotide position 5177 (asterisks) and probably positions 5087 and 5492. The box of ambiguity involving position 4898 can be broken in two equally parsimonious ways involving either two parallel mutations or a reversal, which will be referred to as resolutions Ib and IIb, respectively. These ambiguities are important because the thick lines in the boxes can either be an interior branch or part of a tip branch, depending on which resolution is chosen. In all analyses, whenever these ambiguities are relevant, tests are performed under all resolutions to ensure robustness of the test results to this network uncertainty. Figure 2C shows a reduced ND2 network using the gorilla sequence found in Horai et al. (1992) as an outgroup. There are 11 fixed nonsynonymous nucleotide differences between humans and chimpanzees. The number of fixed synonymous differences varies between 84 and 91, depending on the branch placement of some mutations. In the contingency analyses to follow, the minimum number of fixed synonymous differences is used to ensure that the tests remain conservative.
Figure 1. Variable nucleotide sites in the coding region of the ND2 gene from humans, chimpanzees, bonobo (Horai et al. 1992), and gorilla (Horai et al. 1992). Nucleotide positions are numbered according to Anderson et al. (1981). A dot indicates identity to the gorilla sequence, and differences are classified as either nonsynonymous (N) or synonymous (S). Human sequences are from Africa (AFR), Europe (EUR), Asia (ASN), and Australia (AUS). Hs-ref is a previously published human sequence (Anderson et al. 1981). The majority of the chimpanzee sequences are from the west African subspecies P. t. verus, with the exception of Pt175 and Pt176, which are from P. t. troglodytes and/or P. t. schweinfurthii (Wise et al. 1997). Pt-ref is a previously published chimpanzee sequence (Horai et al. 1992).
Table 2. Summary of ND2 variation in humans and chimpanzees
Contingency tests of neutrality: Contingency tables were constructed by counting the number of mutational events in the various categories and on the various types of branches for the networks shown in Figure 2. The contingency tables and test results are shown in Table 3 for the full contrast of all four mutational categories (transmembrane nonsynonymous, surface nonsynonymous, transmembrane synonymous, and surface synonymous) vs. all three topological categories (tip, interior, and fixed). The null hypothesis of homogeneity is rejected when polymorphism data from humans are compared with fixed differences between species, but not when chimpanzee polymorphisms are compared with interspecific differences (Table 3). The more standard McDonald and Kreitman (1991) test collapses the transmembrane and surface categories into the nonsynonymous/synonymous mutational categories and collapses the tip and interior categories into a single polymorphic category, yielding a 2 × 2 table. FETs were used to test the null hypothesis that the ratio of nonsynonymous to synonymous nucleotide differences is the same within and between species. The tests reveal a significantly higher nonsynonymous to synonymous ratio within humans than is seen between species (FET probability = 0.0005), but not within chimpanzees (FET probability = 0.4033).
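The 2 × 2 McDonald and Kreitman comparison can be sketched with a two-sided Fisher's exact test written from the hypergeometric distribution. The function name is hypothetical. In the usage example, the fixed counts (11 nonsynonymous, 84 synonymous) are the values quoted in the text, whereas the polymorphic counts are placeholders chosen only for illustration and are not the values in Table 3.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability of k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    p_value = sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )
    return min(1.0, p_value)


if __name__ == "__main__":
    # Rows: polymorphic (placeholder counts), fixed (counts from the text).
    # Columns: nonsynonymous, synonymous.
    poly_n, poly_s = 8, 10
    fixed_n, fixed_s = 11, 84
    print(fisher_exact_two_sided(poly_n, poly_s, fixed_n, fixed_s))
```

A small probability indicates that the nonsynonymous to synonymous ratio differs between the polymorphic and fixed classes; the direction of the difference has to be read from the counts themselves.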
Figure 2. (A) Unrooted median network of 17 human ND2 sequence types. (B) Unrooted median network of 16 chimpanzee ND2 sequence types. (C) Reduced human and chimpanzee ND2 network (see results) using a gorilla sequence as the outgroup. Circles denote sequence types, and individuals are identified as described in Figure 1. Differences between sequence types are numbered according to Anderson et al. (1981). Amino acid replacement (nonsynonymous) mutations are shown in bold, and likely parallel mutations (or reversals) are marked by asterisks. Unresolved homoplasies in the networks are indicated by thick lines. ①, 58 synonymous and 19 nonsynonymous; ②, 42 synonymous and 7 nonsynonymous; ③, 24 synonymous and 2 nonsynonymous; ④, 9 synonymous and 6 nonsynonymous; ⑤, 10 synonymous and 2 nonsynonymous; ⑥, 4 synonymous and 5 nonsynonymous; ⑦, 6 synonymous and 1 nonsynonymous unambiguous nucleotide differences. The following sites are ambiguous with respect to their branch placement: 4511, branches 1 and 4, or 2 and 5; 4664, 5187, and 5420, branches 1 and 5, or 2 and 4; 4541, 4814, 4910, 5384, and 5471, branches 1 and 2, or 1 and 3, or 2 and 3.
Table 3. Contingency analysis of the full mutational categories vs. the full network topological categories within and between species at the ND2 gene
The contingency test for the full model can be subdivided to investigate the evolutionary dynamics of the different mutational categories (Templeton 1987, 1996). First, the impact of the structural region of the molecule upon the evolutionary dynamics of nonsynonymous and synonymous mutations can be examined by a contingency test of the first and second rows of Table 3 (which contrasts the evolutionary dynamics of nonsynonymous mutations in the transmembrane vs. surface regions) and a separate contingency analysis of the third and fourth rows (synonymous mutations across the two structural regions). The permutational probability values for the contingency analysis of nonsynonymous mutations in the transmembrane vs. surface regions are 1.000 and 0.369 for the human and chimpanzee data, respectively. For synonymous mutations across structural regions, the permutational probabilities are 0.379 and 0.456 for human network resolutions Ia and IIa, respectively, and 0.580 and 0.599 for chimpanzee network resolutions Ib and IIb, respectively. None of these results is significant at the 5% level, and this may be caused in part by the small numbers of observations in some categories in the contingency table. To enhance statistical power, the data were further pooled into polymorphic (tip and interior) vs. fixed, and young (tip) vs. old (interior and fixed) categories. None of the tests was significant.
A second nested series of additional contingency tests examines the evolutionary dynamics of nonsynonymous vs. synonymous mutations within the transmembrane regions (rows one and three of Table 3) and within the surface regions (rows two and four of Table 3; Templeton 1987, 1996). The transmembrane results are highly significant for the human data (permutational probability = 0), but neither network resolution yields significant results for the chimpanzee data (permutational probabilities = 0.372 and 0.411 for resolutions Ib and IIb, respectively). None of the surface results is significant (permutational probability values are 1.000 for both human network resolutions and 0.429 for the chimpanzee data). The pooling categories of polymorphic/fixed and young/old were also only significant for the human transmembrane data (FET probabilities = 0.0001 and 0.0037, respectively).
The Tajima and the Fu and Li tests of neutrality: The Tajima (1989) test examines whether the average number of pairwise nucleotide differences between sequences (π) is larger or smaller than expected from the observed number of polymorphic sites (θ). Under the assumption of a random mating population at equilibrium, the difference between π and θ (DT) is expected to be zero. A positive value of DT indicates possible balancing selection or population subdivision. A negative value suggests recent directional selection, a population bottleneck, or background selection of slightly deleterious alleles (Tajima 1989). The Fu and Li (1993) test takes a genealogical approach and is based on the principle of comparing the number of mutations on internal branches with those on external branches. Compared with a neutral model of evolution, directional selection would result in an excess of external mutations, while balancing selection would result in an excess of internal mutations. Ideally, an outgroup is used so that the number of mutations in the external branches can be determined unambiguously. Since it is not clear which tests are most powerful, Tajima's (1989) DT test, as well as the D and F tests proposed by Fu and Li (1993), were used to investigate departures from neutrality at the ND2 gene in humans and chimpanzees (Table 4).
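A minimal sketch of the DT statistic, following the standard formulas of Tajima (1989), is given below. It takes the number of sequences n, the number of segregating sites S, and the average number of pairwise differences π (as a count, not per site). The function name is hypothetical, and the usage values are a toy example rather than the ND2 data.

```python
from math import sqrt


def tajimas_d(n, S, pi):
    """Tajima's D from sample size n, segregating sites S, and pi."""
    if S == 0:
        raise ValueError("D is undefined when there are no segregating sites")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    theta_w = S / a1  # Watterson's estimate of theta from S
    # Difference between the two estimates, scaled by its standard deviation.
    return (pi - theta_w) / sqrt(e1 * S + e2 * S * (S - 1))


if __name__ == "__main__":
    # Toy sample: 4 sequences, 3 singleton polymorphisms, pi = 1.5.
    print(tajimas_d(n=4, S=3, pi=1.5))
```

In the toy example every polymorphism is a singleton, so π falls below S/a1 and the statistic is negative, which is the direction described above for an excess of low frequency polymorphisms.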
Table 4. DT, D, and F calculated for the human and chimpanzee ND2 gene
A significantly negative DT, D, and F is observed for the human ND2 data. This is consistent with a pattern of there being too many rare nucleotide polymorphisms with respect to predictions of the neutral theory (see e.g., Braverman et al. 1995). In the case of chimpanzees, none of the tests is significant, and thus, by this criterion, the data are consistent with a neutral model of mtDNA evolution.
DISCUSSION
The contingency approach to testing neutrality depends on accurate and unbiased counts of the numbers of mutations in various categories (Templeton 1996). The network approach of Bandelt et al. (1995) enables identification of any ambiguity in the tree topology that will affect this analysis. For the ND2 data set, the human and chimpanzee networks each contained two alternatives that affected the numbers of mutations in some of the categories (Figure 2 and Table 3). The contingency analyses were therefore repeated over all network alternatives. The conclusions about the evolution of the ND2 gene in humans and chimpanzees are not affected by this topological ambiguity.
The contingency tests do not apply for highly diverged sequences: when fully saturated, the ratio of nonsynonymous to synonymous nucleotide differences is expected to be as much as two times greater in between-species than in within-species comparisons. This effect is not important when the species to be compared are closely related and the sequences are not close to saturation (Maynard-Smith 1994). Thus, an analysis of mutational saturation of nonsynonymous and synonymous substitutions should be done before interpreting a rejection of the null hypothesis as evidence for selection. Such an analysis of substitutions in the ND2 gene is presented in Figure 3 and Table 5. Nonsynonymous substitutions are far from saturated between divergent species. Although synonymous substitutions do not appear to be saturated in the human × chimpanzee comparison, they are likely to be undercounted, resulting in conservative tests.
Figure 3. Accumulation of synonymous (broken line) and nonsynonymous (solid line) nucleotide substitutions in the ND2 gene with Δ T50H values derived from DNA–DNA hybridization (Sibley and Ahlquist 1987). (A) Human × chimpanzee comparison. (B) Human + chimpanzee × gorilla comparisons. (C) Human + chimpanzee + gorilla × orangutan comparisons. (D) Human + chimpanzee + gorilla + orangutan × siamang comparisons.
The data presented here for the full contingency analysis (Table 3) provide a clear rejection of the null hypothesis that the human ND2 gene is evolving according to a strictly neutral model of molecular evolution. This strong departure from neutrality is also seen in the McDonald and Kreitman (1991) test, and it is consistent with an excess of nonsynonymous polymorphisms (or deficiency of synonymous polymorphisms) within humans compared with interspecific differences. This pattern has previously been documented for the human ND2 gene (based on a limited sample size), and it appears to be widespread in the human mitochondrial genome (Nachman et al. 1996).
Further insights into the biological basis of the rejection of the null hypotheses are possible by performing additional contingency analyses nested within the original contingency table (Templeton 1987, 1996). The first nested series examined the distribution of nonsynonymous and synonymous mutations in the transmembrane vs. surface regions. None of the contingency tests involving synonymous mutations resulted in a rejection of the null hypothesis, nor did the tests for nonsynonymous mutations. There are very few nonsynonymous mutations overall (especially in the surface regions), however, so this lack of significance could be caused by the much lower statistical power in this case as compared to the synonymous mutation case (Templeton 1996). One way of regaining power in a contingency framework is to pool categories. The standard pooling of “polymorphic vs. fixed” (Templeton 1987) did not produce a significant result, nor did pooling into “young vs. old” (Templeton 1996).
Table 5. Estimates of the number of nonsynonymous and synonymous nucleotide substitutions per site between five species of primates using the method of Li (1993)
The second nested series examined the distribution of nonsynonymous vs. synonymous mutations within the transmembrane and surface regions separately. The null hypothesis of neutrality is not rejected in the surface regions, but it is strongly rejected in the transmembrane regions of the human ND2 gene. An examination of the data reveals a greater number of nonsynonymous polymorphisms within human transmembrane regions than expected based on interspecific comparisons. Within human transmembrane regions, 66.7% of the polymorphisms are nonsynonymous mutations. In contrast, between humans and chimpanzees, only 15.2% of the fixed differences are nonsynonymous mutations. Furthermore, there is an excess of nonsynonymous mutations in the young category (63.6%) compared with the old category (18.6%).
The results presented here are generally consistent with other studies that have used DNA sequence data to test the hypothesis that mtDNA variation is neutral. In Drosophila, non-neutral patterns have been documented for ND5 (Rand et al. 1994) and cyt b (Ballard and Kreitman 1994). Similar patterns have also been observed for ND3 in mice (Nachman et al. 1994), as well as ND3 (Nachman et al. 1996) and COII (Templeton 1996) in humans and chimpanzees. In all of these studies, the ratio of nonsynonymous to synonymous nucleotide differences is greater within species than between species (Table 6). What might account for the observed pattern?
Table 6. Summary of variation at nonsynonymous (NS) and synonymous (S) sites within and between different species
One possible explanation for these observations is that some form of balancing selection is maintaining amino acid variability. This hypothesis is considered unlikely for the human ND2 data because Tajima's (1989) DT test, as well as the D and F tests proposed by Fu and Li (1993), are significantly negative, indicating an excess of low frequency polymorphisms (Table 4). Under a model of balancing selection, some polymorphisms would be maintained in the population at intermediate frequencies, thus leading to positive test values.
A second possible explanation for the results is that there has been a recent relaxation of selective constraint in the human lineage. This would result in some previously deleterious mutations becoming neutral and being incorporated into the population as polymorphisms (Takahata 1993a). Takahata (1993a) has argued that deleterious mutations in the human population may have became harmless under the changed (improved) environment throughout the Pleistocene. This hypothesis, however, does not adequately explain the results of Tajima's and Fu and Li's tests (Table 4). A relaxation of selective constraint is expected to increase the rate at which mutations at nonsynonymous sites are introduced into the population, but it is expected to have very little such effect on mutations at synonymous sites. Under this scenario, we would expect to observe negative test values for nonsynonymous sites but not for synonymous sites. The results presented here, however, show that DT, D, and F are significantly negative for both nonsynonymous and synonymous sites (Table 4). Negative test values have also been observed for the noncoding control region (Jordeet al. 1995; C. A. Wise, unpublished results).
The pattern of mtDNA variation is consistent with a model of a population bottleneck followed by an expansion in population size (e.g., Di Rienzo and Wilson 1991; Rogers and Harpending 1992; Harpending et al. 1993; Sherry et al. 1994; Rogers and Jorde 1995). This model can be used to explain the negative values of Tajima's and Fu and Li's tests; however, it does not explain the significant contingency test results. Furthermore, if human mitochondrial genome diversity reflects historical patterns of population size change, then similar patterns are expected of nuclear genome diversity. This appears not to be the case, and differences in the patterns of mitochondrial and nuclear genome diversity have recently been interpreted as evidence against the population expansion scenario (Hey 1997). A population bottleneck in the human lineage also appears to be incompatible with the unusual polymorphism at the major histocompatibility complex (Mhc) loci (Takahata 1990, 1993b; Klein et al. 1993; Ayala et al. 1994; Ayala 1995; Ayala and Escalante 1996), despite some criticism of the details of some of these analyses (Erlich et al. 1996). It is also inconsistent with the pattern of nucleotide polymorphism at the β-globin locus (Harding et al. 1997), and of Alu repeat and microsatellite variation (H. Harpending, personal communication).
Another possibility is that amino acid mutations at ND2 are slightly deleterious (e.g., Ohta 1992). Slightly deleterious mutants may persist within populations for brief periods, but they are unlikely to rise in frequency or become fixed. Slightly deleterious models of molecular evolution have previously been invoked as potential explanations for patterns of mitochondrial genome evolution in Drosophila (DeSalle and Templeton 1988; Ballard and Kreitman 1994), mice (Nachman et al. 1994), and humans (Nachman et al. 1996; Templeton 1996). The test results presented here reveal a significant excess of young nonsynonymous polymorphisms within human ND2 transmembrane regions, suggesting that they may be deleterious. This model is also consistent with the negative values of Tajima's and Fu and Li's tests.
The relative contribution of slightly deleterious mutations to heterozygosity increases as effective population size, Ne, decreases (Kimura 1983). Thus, if nonsynonymous mutations in the mitochondrial genome are slightly deleterious, we would expect their number to increase relative to that of neutral synonymous mutations as Ne decreases. The results presented here for the ND2 gene indicate that the nonsynonymous to synonymous ratio is significantly greater within humans than within chimpanzees (0.91:0.22, Table 6; FET probability = 0.0196). The results for the much smaller ND3 gene are not significant (0.57:1.33, Table 6; FET probability = 0.6305) based on a comparison of 61 humans and five chimpanzees (Nachman et al. 1996).
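The FET probabilities quoted above come from a 2 × 2 contingency table of nonsynonymous and synonymous polymorphism counts in the two species. The following is a minimal sketch of a two-sided Fisher's exact test, assuming the common convention of summing all tables that are no more probable than the observed one; whether this matches the exact procedure used for Table 6 is an assumption. The function name `fisher_exact_two_sided` and the counts are hypothetical placeholders, not the values in Table 6.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Rows could be species and columns the NS and S polymorphism counts.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)

    # Sum the probabilities of every table at least as extreme as the
    # observed one (small tolerance guards against floating point ties).
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical placeholder counts (NOT taken from Table 6):
# species 1 has 3 NS and 1 S polymorphisms; species 2 has 1 NS and 3 S.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

With counts this small the test has little power, which is one reason the result for a short gene such as ND3 can be nonsignificant even when the ratios differ.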
Under neutrality, diversity should be low in small populations and high in large ones. Thus, the lower level of mitochondrial diversity in humans compared with chimpanzees (Table 2; Ferris et al. 1981; Morin et al. 1994; Ruvolo et al. 1994; Nachman et al. 1996; Wise et al. 1997) may reflect a smaller Ne in humans. This is consistent with the results described above. However, the lower level of nuclear genome diversity in chimpanzees (Wise et al. 1997, and references therein) implies that the Ne of chimpanzees is smaller than that of humans; thus, we would expect to observe a higher nonsynonymous to synonymous ratio within chimpanzees. This is inconsistent with the slightly deleterious model presented above. The contingency test results could, however, reflect the occurrence of slightly deleterious mutations if effective population size had been reduced for the mitochondrial genome by a selective sweep that did not affect most or all of the nuclear genome.
The low levels of human mtDNA diversity have been used as support for the out-of-Africa replacement hypothesis (Cann et al. 1987; Vigilant et al. 1991); however, directional selection could also explain the reduced mtDNA diversity in humans compared with chimpanzees (π = 0.24% for humans and 0.73% for the chimpanzee subspecies P. t. verus, Table 2). Because there is no apparent genetic recombination in mtDNA, this depletion of variation could be the result of an advantageous mutation anywhere within the mitochondrial genome sweeping through the human population. The results of Tajima's and Fu and Li's tests are consistent with the occurrence of directional selection in the human mitochondrial genome. If an advantageous mutation has recently become fixed in the population, then the majority of mutations in the population are expected to be “young,” thus leading to negative test values. The disparate pattern of variation in the nuclear genome (Harding et al. 1997; Hey 1997) is also consistent with this explanation: although the entire mitochondrial genome would be affected, the nuclear genome would be unaffected, except possibly at specific genes interacting epistatically with mitochondrial genes.
Acknowledgments
We are grateful to D. C. Gajdusek and C. J. Gibbs, Jr., for supplying the chimpanzee blood samples, to L. Croft for technical assistance, and to G. Chelvanayagam for discussions and assistance with the secondary structure modeling. This work was funded by Australian Research Council grant A59332440 to S.E.
Footnotes
- Communicating editor: R. R. Hudson
- Received April 8, 1997.
- Accepted October 6, 1997.
- Copyright © 1998 by the Genetics Society of America