Abstract
Comparison of the levels of nucleotide diversity in humans and apes may provide much insight into the mechanisms of maintenance of DNA polymorphism and the demographic history of these organisms. In the past, abundant mitochondrial DNA (mtDNA) polymorphism data indicated that nucleotide diversity (π) is more than threefold higher in chimpanzees than in humans. Furthermore, it has recently been claimed, on the basis of limited data, that this is also true for nuclear DNA. In this study we sequenced 50 noncoding, nonrepetitive DNA segments randomly chosen from the nuclear genome in 9 bonobos and 17 chimpanzees. Surprisingly, the π value for bonobos is only 0.078%, even somewhat lower than that (0.088%) for humans for the same 50 segments. The π values are 0.092, 0.130, and 0.082% for East, Central, and West African chimpanzees, respectively, and 0.132% for all chimpanzees. These values are similar to or at most only 1.5 times higher than that for humans. The much larger difference in mtDNA diversity than in nuclear DNA diversity between humans and chimpanzees is puzzling. We speculate that it is due mainly to a reduction in effective population size (Ne) in the human lineage after the human-chimpanzee divergence, because a reduction in Ne has a stronger effect on mtDNA diversity than on nuclear DNA diversity.
SINCE the discovery of extensive mitochondrial DNA (mtDNA) polymorphism in apes by restriction enzyme mapping (Ferris et al. 1981), it has been known that the nucleotide diversity (π) in mtDNA is at least threefold higher in chimpanzees than in humans. This view has been confirmed by recent sequence data from the control region (Wise et al. 1997) and from synonymous sites in the ND2 gene (Stone et al. 2002). On the other hand, since the late 1970s it has been known that the level of heterozygosity at protein coding loci is higher in humans than in chimpanzees (King and Wilson 1975; Lucotte 1983). Supporting this view that humans may have as much or greater diversity in the nuclear genome was the observation that humans had higher levels of heterozygosity at microsatellite loci than chimpanzees (Wise et al. 1997), although such a difference seen in similar studies was potentially attributable to ascertainment bias (Ellegren et al. 1995; Crouau-Roy et al. 1996). Therefore, it was commonly thought that mtDNA and nuclear DNA gave different pictures of polymorphism in humans and chimpanzees. However, recent DNA polymorphism data from a 10-kb X-linked noncoding region, two intergenic (HOXB6, DRD4, ∼1 kb each), two intronic (ADH1, ∼600 bp; DRD2, ∼300 bp) regions, and 5.8-kb silent sites in genes at six nuclear loci revealed a three- to fourfold higher nucleotide diversity in chimpanzees than in humans (Deinard and Kidd 1999, 2000; Kaessmann et al. 1999; Jensen-Seaman et al. 2001; Satta 2001), leading to the view that, like mtDNA, nuclear DNA sequence diversity is also much higher in chimpanzees than in humans.
However, as the data are limited, this issue deserves further investigation. In a recent study Yu et al. (2002) sequenced 50 DNA segments randomly chosen from the noncoding, nonrepetitive parts of the human genome in 30 humans from various localities around the world. In the present study we have sequenced the same 50 segments in 9 bonobos and 17 chimpanzees from East, Central, and West Africa. Unexpectedly, the new data reveal a small difference between the levels of nucleotide diversity in chimpanzees and humans. Therefore, nuclear DNA and mtDNA actually give different pictures of the levels of nucleotide diversity in humans and chimpanzees. How this difference arose is a puzzling question and we will attempt to find an answer.
MATERIALS AND METHODS
Sample sources: DNA from nine bonobos (Pan paniscus) and 17 common chimpanzees (six P. troglodytes verus, five P. t. troglodytes, two P. t. schweinfurthii, and four individuals of unknown subspecies) was used in this study. The five P. t. troglodytes individuals (named Cheetah, Dodo, Bakoumba, Julie, and Noemie) were from J. Wickings, CIRMF, Gabon. Three of the six P. t. verus individuals (Rinus, Anita, and Hannibal) were from A. Prince, Vialab, Liberia, one (Herman) was from the Lowery Zoo, and two (Bert and Tate) were from the New Iberia Research Center; four individuals (Carl, Kasey, Harv, and Tank) were of unknown geographic origin. The two P. t. schweinfurthii individuals (Harriet and Kobi) were from J. Fritz, Arizona Primate Foundation. Although no geographical information was available for the individuals housed at the New Iberia Research Center, they were regarded as P. t. verus on the basis of their mitochondrial D-loop sequences (Morin et al. 1994), and the nuclear sequences generated from these samples did not contradict this classification (Deinard and Kidd 2000). Of the nine P. paniscus individuals, two (Bosondjo and Matata) were from the Atlanta Zoo/Yerkes Regional Primate Center, two (Lody and Maringa) were from the Milwaukee Zoo, and five (Kakowet, Lokalema, Charlie, Vernon, and Linda) were from the San Diego Zoo. However, all nine individuals were originally caught in Zaire and all of them were chosen to be independent.
PCR amplification and sequencing of DNA segments: The 50 noncoding, nonrepetitive genomic segments (each ∼1 kb) were originally selected randomly from the human genome (Chen and Li 2001; Yu et al. 2002). All were chosen to avoid coding regions or close linkage to any coding regions. In each segment and its nearby regions there was no registered gene in GenBank and no potential coding region was detected by either GenScan or GRAIL-EXP.
Touchdown PCR (Don et al. 1991) was used and the reactions were carried out following the conditions described in Zhao et al. (2000). The PCR products were purified by Wizard PCR Preps DNA purification resin kit (Promega, Madison, WI). Sequencing reactions were performed according to the protocol of the ABI Prism BigDye Terminator sequencing kits (Perkin-Elmer, Norwalk, CT) modified by one-quarter reaction. The extension products were purified with Sephadex G-50 (DNA grade, Pharmacia, Piscataway, NJ) and run on an ABI 377XL DNA sequencer using 4.25% gels (Sooner Scientific). About 500 bp of each segment was sequenced in both directions.
ABI DNA Sequence Analysis 3.0 was used for lane tracking and base calling. The data were then proofread manually and heterozygous sites were detected as double peaks. The forward and reverse sequences were assembled automatically in each individual using SeqMan (DNAStar, Madison, WI). The assembled files were carefully checked by eye. Fluorescent traces for each variant site were rechecked in all individuals. All singletons, which are variants that appear only once in the entire sample, were verified by PCR reamplification and resequencing of the PCR products in both directions.
Data analysis: The sequences were aligned by SeqMan. Nucleotide diversity values were calculated using DNASP version 3.14 (Rozas and Rozas 1999) and the average percentage distances between species were calculated using DAMBE (Xia and Xie 2001).
RESULTS AND DISCUSSION
Distribution of SNPs: We sequenced 50 noncoding segments in nine bonobos and 17 chimpanzees from East, Central, and West Africa. The total number of nucleotide sites sequenced, after exclusion of deletions and insertions (mostly single nucleotide indels), is ∼23,500.
A total of 186 single nucleotide polymorphisms (SNPs) were found in the 17 chimpanzee samples (34 sequences); 51 of them were observed only once (i.e., singletons) and 15 only twice (doubletons). The number of variant sites found was 54 in the 12 West African chimpanzee (P. t. verus) sequences, 101 in the 10 Central African chimpanzee (P. t. troglodytes) sequences, and 39 in the 4 East African chimpanzee (P. t. schweinfurthii) sequences. Thus, many more variants were found in the Central African subspecies than in the West and East African subspecies, indicating a much higher DNA diversity in the Central African subspecies. The numbers of singletons were 14, 58, and 27 in the West, Central, and East African chimpanzee sequences, respectively. Thus, in the Central and East African subspecies, more than half of the variants were singletons, whereas in the West African subspecies less than one-third of the variants were singletons with an equal number of doubletons. A total of 63 SNPs were found in the 18 bonobo sequences; there were 21 singletons and 11 doubletons. Clearly, bonobos are less polymorphic than each of the three chimpanzee subspecies.
Adequacy of the sample sizes: As our sample sizes are relatively small, we need to consider the problem of sampling bias. For this purpose, we consider the effect of sampling on nucleotide diversity (π) because π is the quantity of our primary interest in this study; π is defined as the number of nucleotide differences per site between two randomly chosen sequences in a population. As noted in Yu et al. (2002), an estimation bias may be detected by comparing within-individual π values (πw) with between-individual π values (πb). Ideally, each sequence in a sample should be taken randomly from the population, but we have included the two sequences within each of the individuals sampled. The two sequences in an individual are not completely independent if the individual is “inbred” to some extent. In that case, the within-individual π values (πw) should tend to be smaller than the between-individual π values (πb) and their inclusion should tend to give an underestimate of π. However, if the average πb and πw values are similar, then the sampling scheme would seem largely adequate and the inclusion of πw values in the estimation of π should produce no substantial bias. To simplify the analysis, we concatenate the segments in an individual in a random manner into two continuous sequences and these sequences are then used to compute the πb and πw values.
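The comparison of πw and πb can be illustrated with a minimal Python sketch. It assumes that each individual is represented by two aligned, equal-length concatenated sequences; the function names and the toy 10-bp sequences are hypothetical and are not data from this study.

```python
from itertools import combinations
from statistics import mean


def pairwise_diff_per_site(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def within_between_pi(individuals):
    """Return (mean pi_w, mean pi_b, overall pi) for a dict that maps
    an individual's name to its two concatenated sequences."""
    # pi_w: the two sequences carried by the same individual
    within = [pairwise_diff_per_site(s1, s2) for s1, s2 in individuals.values()]
    # pi_b: every sequence of one individual against every sequence of another
    between = [
        pairwise_diff_per_site(x, y)
        for (_, seqs_i), (_, seqs_j) in combinations(individuals.items(), 2)
        for x in seqs_i
        for y in seqs_j
    ]
    # Overall pi averages over all pairs of sequences in the sample
    return mean(within), mean(between), mean(within + between)


# Hypothetical toy data: two individuals, two 10-bp sequences each
toy = {
    "ind1": ("ACGTACGTAC", "ACGTACGTAT"),
    "ind2": ("ACGTACGTAC", "ACGAACGTAC"),
}
pi_w, pi_b, pi_all = within_between_pi(toy)
print(f"pi_w={pi_w:.3f}  pi_b={pi_b:.3f}  pi={pi_all:.3f}")
```

If the mean πw falls well below the mean πb, including the within-individual pairs pulls the overall π downward, which is the bias discussed above.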
For the Central African chimpanzee sequences, the distribution of πb values, which ranges from 0.098 to 0.174%, is only somewhat wider than that of the five πw values, which ranges from 0.102 to 0.153%. Since the average πb (0.131%) is <10% higher than the average πw (0.121%), the sampling bias should not be strong. A similar comment applies to the West African chimpanzee sample. There is one very low πw value (0.051%) and the average πw value (0.072%) is ∼10% lower than the average πb value (0.082%). The average πw without the outlier becomes 0.077%, which is not far from the average πb (0.082%). This comparison suggests that the estimated π value (0.082%) in this subspecies is probably somewhat biased downward. For the East African chimpanzee sample, the average πw value (0.088%) is close to the average πb value (0.092%). This may suggest no substantial bias. However, because only two individuals were sampled, the estimate may not be reliable and should be taken with caution.
Figure 1. Distributions of the within-individual (πw; □) and between-individual (πb; ♦) nucleotide diversity values in bonobos. (a) All of the 18 sequences are included. (b) One individual (Bosondjo) and one sequence (randomly chosen) from each of three individuals (Kakowet, Lody, and Linda) are excluded.
For the bonobo sample, the distributions of πw and πb values are given in Figure 1a. Several πw and πb values are <0.04%, whereas most of the others are considerably >0.04%. This observation suggests that some of the individuals are fairly closely related to each other or inbred, although they were originally chosen to be independent. Indeed, the four πb values between Bosondjo and Maringa (studbook numbers 64 and 60, respectively) were only 0.034, 0.034, 0.038, and 0.047%. We therefore excluded Bosondjo from comparison. Moreover, the πw value is only 0.030% for Kakowet (studbook no. 34) and 0.038% for Lody and Linda (studbook nos. 68 and 23). We therefore excluded one sequence (randomly chosen) from each of these three individuals. After the exclusion of these sequences, only 13 sequences remain and the new distributions of πw and πb values are given in Figure 1b. The exclusion of the above sequences increases the average π value from 0.075 to 0.078%. The latter value will be used as our estimate.
Nucleotide diversity: For the 50 DNA segments we obtained, the range of π is from 0 (14 segments) to 0.39% in the West African chimpanzee sample, from 0 (7 segments) to 0.46% in the Central African chimpanzee sample, from 0 (23 segments) to 0.45% in the East African chimpanzee sample, and from 0 (3 segments) to 0.46% in the entire chimpanzee sample (Table 1). The range of π is from 0 to 0.36% in the bonobo sample and from 0 to 0.30% in the human sample (Table 1). Such large fluctuations are not surprising because the nucleotide diversity in a short DNA region is subject to strong stochastic effects. In addition, variation in π may also arise from variation in mutation rate among genomic regions.
Table 2 shows that Central African chimpanzees have the highest average π value (0.130%), followed by East African chimpanzees (0.092%), and then West African chimpanzees (0.082%). As mentioned above, the π value estimated from the East African chimpanzee sample may not be reliable because of a small sample size and that from the West African chimpanzee sample might be biased downward. We note further that in previous studies the East African chimpanzee had a greater π at APOB and PABX but a smaller π at the HOXB6 intergenic region and at the mtDNA ND2 and the mtDNA control region than did the West African chimpanzee (Table 2; Wise et al. 1997; Deinard and Kidd 2000; Stone et al. 2002). Therefore, further data are needed to see whether the π value for the East African chimpanzee is actually higher than that for the West African chimpanzee.
Surprisingly, the average π value in bonobos (0.078%) is somewhat lower than that in humans (0.088%; Table 2). The observation that bonobos have lower nucleotide diversity than humans is in agreement with the Xq13.3 data, but contrary to the HOXB6, DRD4, DRD2, and NRY regions. Furthermore, the average π values in the East and West African chimpanzee subspecies (0.092 and 0.082%) are similar to that in humans. The Central African chimpanzee is the only subspecies that has a π value (0.130%) higher than that in humans and the difference is only 50%. We note further that even the highest πb value for the concatenated sequences in the Central African chimpanzee sample is only 0.174% (see above), which is only two times the average π value in humans. When all chimpanzee sequences are considered together, the average π value (0.132%) is again only 50% higher than that in humans. Actually, if we consider African humans only, the π value for the 50 DNA segments becomes 0.115% (Yu et al. 2002), which is only 13% lower than that for chimpanzees.
Previous reports have suggested that chimpanzees have two to four times greater amounts of π than humans at several autosomal nuclear loci, including the noncoding intergenic regions near HOXB6 and DRD4 (Deinard and Kidd 1999; Jensen-Seaman et al. 2001), the intronic regions of DRD2 and ADH1 (Deinard and Kidd 1998; Jensen-Seaman et al. 2001), and synonymous sites at six protein coding loci (Satta 2001). Similar results were also observed at the X chromosomal locus Xq13.3 (Kaessmann et al. 1999) and the Y chromosomal NRY locus (Stone et al. 2002). However, since our data set has a much wider genomic representation, it should be more reliable. The stochastic effects of examining only a small number of loci can be seen, for example, in the fact that in about one-third (17 of 50) of the segments that we examined, chimpanzees had a π more than three times higher than that in humans (Table 1), which is approximately the value that was found by others when looking at a single locus (Table 2; Deinard and Kidd 1999; Kaessmann et al. 1999). Thus, the difference in nucleotide diversity between humans and chimpanzees is considerably smaller for nuclear DNA than for mtDNA data (Table 2). The disparity in using mtDNA vs. nuclear DNA in comparisons between humans and chimpanzees was originally pointed out by Wise et al. (1997), who in further comparisons to other nonhuman primates suggested that humans, not chimpanzees, were unusual in possessing such low levels of mtDNA diversity relative to that of the nuclear genome.
Another measure of genetic variability is the number of segregating alleles in the sample. However, because the sample sizes are different for different populations, we consider θ= 4Neu, where Ne is the effective population size and u is the mutation rate per site per generation. The θ values estimated from the numbers of segregating sites by Watterson’s estimator (Watterson 1975) are given in Table 2. We note that, compared to the difference in π (0.088 vs. 0.078%) between humans and bonobos, the difference in θ (0.123 vs. 0.082%) is even larger (Table 2), probably reflecting an increase in low-frequency alleles due to a recent population expansion in humans. The highest θ value for the three chimpanzee subspecies (0.152%) is that for Central African chimpanzees, but it is only 24% higher than that for humans. When all chimpanzee sequences are considered together, θ becomes 0.194%, which is only 50% higher than that in humans.
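As a rough check on the θ values, Watterson's estimator can be written as θ = S/(a_n L), where S is the number of segregating sites, L is the number of sites, and a_n is the sum of 1/i for i from 1 to n − 1 for a sample of n sequences. The Python sketch below assumes that the pooled SNP counts reported above (101 in the 10 Central African sequences, 186 in all 34 chimpanzee sequences) and the approximate total of 23,500 sites can be used directly; the function names are illustrative only.

```python
def harmonic_a(num_sequences):
    """a_n = 1 + 1/2 + ... + 1/(n - 1) for a sample of n sequences."""
    return sum(1.0 / i for i in range(1, num_sequences))


def watterson_theta(num_segregating, num_sequences, num_sites):
    """Watterson's estimator of theta per site: S / (a_n * L)."""
    return num_segregating / (harmonic_a(num_sequences) * num_sites)


SITES = 23_500  # approximate number of sites sequenced, as stated in the text

theta_central = watterson_theta(101, 10, SITES)  # Central African chimpanzees
theta_all = watterson_theta(186, 34, SITES)      # all chimpanzee sequences

print(f"Central: {theta_central * 100:.3f}%  All: {theta_all * 100:.3f}%")
```

Under these assumptions the two values come out at about 0.152 and 0.194%, in line with the figures quoted above. The division by a_n is what makes samples of different sizes comparable.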
Effective population sizes: To estimate effective population size (Ne) we estimate the mutation rate per nucleotide site per generation (u) by using the sequence divergence (d) between species (Table 3) and assuming that the divergence time between the human and chimpanzee-bonobo lineages is 6 MY (Brunet et al. 2002; Vignaud et al. 2002). Since we are interested in the long-term effective population size, we use Tajima’s estimator, π = 4Neu (Tajima 1983), and we assume that the generation time is 15 years for chimpanzees and bonobos and 20 years for humans. The Ne for humans is estimated to be 10,500 (Table 2), which is similar to the commonly used value (10,000) in the literature (Nei and Graur 1984; Takahata et al. 1995; Zhao et al. 2000), while that for bonobos (12,400) is only slightly larger and that for the Central African chimpanzees (20,900), or for the entire species of chimpanzees, is about twice as large.
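The calculation can be sketched in Python as follows. The text does not state exactly which divergence value was used for Table 2, so the sketch assumes the human-chimpanzee divergence of 1.22% from Table 3; the function names are hypothetical, and the results should be expected to match the Table 2 values only approximately.

```python
def mutation_rate_per_generation(divergence, split_time_years, generation_years):
    """u per site per generation from a between-species divergence d.

    Two lineages accumulate differences since the split, so the rate per
    site per year is d / (2T); multiplying by the generation time gives u.
    """
    per_year = divergence / (2.0 * split_time_years)
    return per_year * generation_years


def effective_size(pi, u):
    """Solve pi = 4 * Ne * u for Ne."""
    return pi / (4.0 * u)


D_HUMAN_CHIMP = 0.0122  # assumed choice of d (Table 3 value quoted in the text)
SPLIT_YEARS = 6.0e6     # divergence time assumed in the text

u_human = mutation_rate_per_generation(D_HUMAN_CHIMP, SPLIT_YEARS, 20)
u_pan = mutation_rate_per_generation(D_HUMAN_CHIMP, SPLIT_YEARS, 15)

ne_human = effective_size(0.00088, u_human)  # pi = 0.088%
ne_bonobo = effective_size(0.00078, u_pan)   # pi = 0.078%
ne_chimp = effective_size(0.00132, u_pan)    # pi = 0.132%

print(round(ne_human), round(ne_bonobo), round(ne_chimp))
```

With this choice of d the sketch gives about 10,800 for humans and 12,800 for bonobos, within a few percent of the estimates above. A slightly larger d would lower all three values proportionally.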
Table 1. Nucleotide diversity in each of the 50 DNA segments studied in chimpanzees, bonobos, and humans
Table 2. Average nucleotide diversity in chimpanzees, bonobos, and humans and effective population sizes estimated from π
Causes for different patterns of nuclear DNA and mtDNA diversity: As noted above, for nuclear DNA the nucleotide diversity in humans is only about one-third lower than that in chimpanzees (0.088 vs. 0.132%), whereas previous studies have found the mtDNA nucleotide diversity in humans to be at most only one-third of that found in chimpanzees (Wise et al. 1997; Stone et al. 2002). What are the causes for this sharp contrast? A highly plausible cause is a reduction in the effective population size in the human lineage since the human-chimp divergence. There is strong evidence for this putative reduction (Ruvolo 1997; Harpending et al. 1998; Chen and Li 2001). As noted by Fay and Wu (1999), a reduction in Ne causes a larger decrease in nucleotide diversity for mtDNA than for nuclear DNA. This follows from the theory that the effect of a bottleneck on π is proportional to T/N1, where T is the time since the bottleneck and N1 is the new effective population size, and from the fact that the effective population size for mtDNA is usually smaller than that for nuclear DNA (Takahata 1993). An additional factor is that there has been an increase in generation time in the human lineage, which leads to a higher mutation rate per generation and also to a higher expected π value in the case of nuclear DNA but to little increase in the case of mtDNA because of maternal inheritance and limited germ cell divisions per generation in the female germline.
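The dependence on T/N1 can be illustrated with a deliberately simplified Python sketch of drift after a size reduction. It assumes no new mutation, a constant size N1 after the reduction, an equal sex ratio (so that mtDNA has N1/2 transmitting copies against 2N1 autosomal copies), and hypothetical values of N1 and T; it is not the model used by the studies cited above.

```python
def retained_diversity(generations, gene_copies):
    """Expected fraction of the initial heterozygosity that remains after
    drift alone in a population with the given number of gene copies."""
    return (1.0 - 1.0 / gene_copies) ** generations


def diversity_loss(n1, generations):
    """Return (nuclear loss, mtDNA loss) as fractions of initial diversity."""
    nuclear = retained_diversity(generations, 2 * n1)  # 2 * N1 autosomal copies
    mito = retained_diversity(generations, n1 / 2)     # N1 / 2 females, haploid
    return 1.0 - nuclear, 1.0 - mito


# Hypothetical values: N1 = 10,000 and T = 1,000 generations
nuclear_loss, mito_loss = diversity_loss(10_000, 1_000)
print(f"nuclear loss: {nuclear_loss:.3f}, mtDNA loss: {mito_loss:.3f}")
```

For small T/N1 the losses are close to T/(2N1) for nuclear DNA and 2T/N1 for mtDNA, so under these assumptions mtDNA loses diversity roughly four times faster.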
Table 3. Average sequence divergence (%) between taxa estimated from the 50 DNA segments studied
Another possibility is that population subdivision might have been stronger in humans than in chimpanzees or the female migration rate might have been lower in humans than in chimpanzees, so that the effective population size for mtDNA is considerably smaller in humans than in chimpanzees (Birky et al. 1989). Evidence to support this argument may come from the observation that chimpanzee females uniformly disperse from their natal troop, in contrast to the typical mammalian (and primate) pattern of female philopatry with male dispersal (Greenwood 1980). The potential for high rates of female migration in chimpanzees is seen in the long-distance (>900 km) sharing of mtDNA haplotypes (Morin et al. 1994; Goldberg and Ruvolo 1997).
Divergences between subspecies and species: The average sequence divergence (d; average nucleotide differences per site) between chimpanzee subspecies is the smallest between the West and East African chimpanzees (Table 3), although in terms of geographic distance it should be the largest. The smaller divergence occurs because the levels of nucleotide diversity in the East and West African chimpanzees are smaller than that in the Central African chimpanzees. For the same reason, the d value is the largest between the Central and East African subspecies. The between-subspecies d values are only slightly higher than the within-subspecies π values, indicating a small separation between subspecies.
The sequence divergence between the bonobo and the chimpanzee (0.373%) is about three times higher than the nucleotide diversity within the chimpanzee species (0.132%), indicating a much longer separation time between the two species than between the chimpanzee subspecies. The sequence divergence between the human and the chimpanzee is 1.22% (Table 3), which is similar to the values (∼1.2%) obtained on the basis of >2 million bp (Chen et al. 2001; Ebersberger et al. 2002; Fujiyama et al. 2002). The sequence divergence between the human and the bonobo (1.30%) is somewhat higher than that between the human and the chimpanzee, probably because the bonobo has a smaller effective population size than the chimpanzee and so has been subject to a stronger effect of random drift.
These data also provide the largest and most comprehensive estimate of divergence time between chimpanzees and bonobos, estimated here to be 1.8 MYA, assuming a 6-MYA Homo-Pan split (Brunet et al. 2002; Vignaud et al. 2002). This estimate is the same as that by Stone et al. (2002), based on their data from the nonrecombining region of the Y chromosome, and is intermediate between published estimates using mtDNA (2.5 MYA; Gagneux et al. 1999) and X chromosomal DNA (0.9 MYA; Kaessmann et al. 1999). Also, the Congo River, which currently prevents contact between these species, has been estimated from geological and limnological evidence to have formed ∼1.5 MYA (Beadle 1981). Perhaps the formation of the Congo River initiated the separation between chimpanzees and bonobos. This was also a time of potentially large changes in the African climate and in vegetation patterns, proposed by some to have catalyzed the origins of several hominid species and human innovations (Vrba 1995).
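The text does not spell out how the 1.8-MYA figure was derived. One simple calculation that gives a consistent value, assuming a constant substitution rate across lineages and no correction for ancestral polymorphism, scales the 6-MYA calibration by the ratio of the bonobo-chimpanzee divergence (0.373%) to the human-chimpanzee divergence (1.22%). The function name in this Python sketch is hypothetical.

```python
def scaled_divergence_time(d_inner, d_outer, t_outer_my):
    """Divergence time (MY) of the inner pair under a molecular clock,
    scaled from the calibrated split time of the outer pair."""
    return t_outer_my * d_inner / d_outer


# Divergence values (%) from the text; 6 MY is the assumed Homo-Pan split
t_pan_split = scaled_divergence_time(0.373, 1.22, 6.0)
print(f"chimpanzee-bonobo split: about {t_pan_split:.1f} MYA")
```

Using the human-bonobo divergence (1.30%) as the calibration instead would give a somewhat more recent date, which indicates how sensitive this simple scaling is to the choice of calibration pair.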
Acknowledgments
We thank the zoos, research organizations, and individuals listed in materials and methods for the generous donation of DNA or blood samples used in this study. This study was supported by National Institutes of Health grants GM-55759 and GM-30998.
Footnotes
- Sequence data from this article have been deposited with the GenBank Data Libraries under accession nos. AY275957-AY277244.
- Communicating editor: N. Takahata
- Received February 4, 2003.
- Accepted April 16, 2003.
- Copyright © 2003 by the Genetics Society of America