Nucleotide Diversity in Gorillas
Ning Yu, Michael I. Jensen-Seaman, Leona Chemnick, Oliver Ryder, Wen-Hsiung Li


Comparison of the levels of nucleotide diversity in humans and apes may provide valuable information for inferring the demographic history of these species, the effect of social structure on genetic diversity, patterns of past migration, and signatures of past selection events. Previous DNA sequence data from both the mitochondrial and the nuclear genomes suggested a much higher level of nucleotide diversity in the African apes than in humans. Noting that the nuclear DNA data from the apes were very limited, we previously conducted a DNA polymorphism study in humans and another in chimpanzees and bonobos, using 50 DNA segments randomly chosen from the noncoding, nonrepetitive parts of the human genome. The data revealed that the nucleotide diversity (π) in bonobos (0.077%) is actually lower than that in humans (0.087%) and that π in chimpanzees (0.134%) is only 50% higher than that in humans. In the present study we sequenced the same 50 segments in 15 western lowland gorillas and estimated π to be 0.158%. This is the highest value among the African apes but is only about twice that in humans. Interestingly, available mtDNA sequence data also suggest a twofold higher nucleotide diversity in gorillas than in humans, but a threefold higher nucleotide diversity in chimpanzees than in humans. The higher mtDNA diversity in chimpanzees might be due to a unique pattern in the evolution of chimpanzee mtDNA. From the nuclear DNA π values, we estimated the long-term effective population sizes of humans, bonobos, chimpanzees, and gorillas to be 10,400, 12,300, 21,300, and 25,200, respectively.

THE amount and pattern of genetic diversity in a species can provide valuable information for deducing the evolutionary history of the species, including past changes in population size, effects of social structure on genetic diversity, patterns of past migration, and signatures of past selection events. For these reasons, numerous studies of genetic diversity have been conducted on humans (e.g., Cann et al. 1987; Tishkoff et al. 1996; Harpending et al. 1998). Recently, there has also been considerable interest in the level and pattern of nucleotide diversity in the African apes (see below). These studies have revealed that although humans currently are geographically widespread and number in the billions, they show reduced genetic variation compared to the geographically more restricted African apes (Ruvolo et al. 1994; Deinard and Kidd 1998, 1999, 2000; Gagneux et al. 1999; Kaessmann et al. 1999, 2001; Jensen-Seaman et al. 2001) and have a long-term effective population size of only ∼10,000. It seems that humans are unusual compared to African apes in this respect, which suggests that the last common ancestor of Homo, Pan, and Gorilla was probably much more similar to the extant apes than to modern humans.

Ferris et al.'s (1981) study of ape mitochondrial DNA (mtDNA) by restriction enzyme mapping was the first to suggest a much higher level of mtDNA variation in great apes than in humans. In that study, the mtDNA genome of chimpanzees was found to be three times more variable than that of humans; even gorillas, which had the least amount of mtDNA variation among the apes, exhibited twice as much variation as humans. This view was supported by sequence data from COII and 16S rRNA (Ruvolo et al. 1994; Noda et al. 2001). Further, in the first hypervariable segment (∼300 bp) of the D-loop, gorillas also carried twice as much nucleotide diversity as humans, and chimpanzees had three times that of humans (Morin et al. 1994; Garner and Ryder 1996; Deinard and Kidd 2000; Jensen-Seaman and Kidd 2001). In a recent study of five autosomal loci among African apes, common chimpanzees carried the greatest amount of nucleotide diversity, with bonobos and gorillas possessing somewhat less variation (Jensen-Seaman et al. 2001). Data from a 10-kb X-linked noncoding region also revealed a three- to fourfold higher nucleotide diversity in both gorillas and chimpanzees than in humans (Kaessmann et al. 1999). Although data from gorillas are lacking, substantially greater diversity was found in the Y chromosome of chimpanzees and bonobos than in that of humans (Stone et al. 2002). Therefore, DNA sequence data from both the mitochondrial and nuclear genomes strongly suggested that this greater amount of diversity is a general feature of the African apes (Jensen-Seaman et al. 2001).

Noting that the nuclear DNA polymorphism data in apes were from only a few loci, we decided to investigate further (Yu et al. 2003). We sequenced 50 DNA segments in nine bonobos and 17 chimpanzees from East, Central, and West Africa. These 50 segments were the same as in Yu et al. (2002), who studied 30 humans from various localities around the world; the 50 segments were randomly chosen from the noncoding, nonrepetitive parts of the human genome (Chen and Li 2001). Unexpectedly, the new data revealed a considerably smaller difference between the levels of nucleotide diversity in chimpanzees and humans (Yu et al. 2003), with the former possessing only ∼1.5-fold greater diversity than the latter. Thus, with the clearer view afforded by a much larger number of loci, we found that autosomal DNA and mtDNA actually give different pictures of the levels of nucleotide diversity in humans and chimpanzees. This result raised the question of whether the same is true for other apes. To gain a better understanding of the amount of intraspecific genetic variation in the gorilla genome, it is necessary to obtain data from a similarly large number of loci.

Gorillas are found discontinuously in the tropical forests of equatorial Africa. The taxonomy of gorillas has long been debated (Groves 2003). They have traditionally been considered a single species (Coolidge 1929); however, more researchers now consider them two separate species, Gorilla gorilla for western gorillas and G. beringei for eastern gorillas (Groves 2001). Wild gorilla populations are diminishing due to destruction of the tropical forests, habitat loss, and poaching. Mountain and eastern lowland gorillas are classified by the IUCN (The World Conservation Union) as endangered, while western lowland gorillas are considered threatened (Lee et al. 1988). In the present study we obtained DNA samples from 15 captive western lowland gorillas (G. gorilla gorilla) to estimate nucleotide diversity at the same 50 loci as used previously for humans, chimpanzees, and bonobos (Yu et al. 2002, 2003).


Sample sources: DNA from 15 western lowland gorillas was used in this study. Seven individuals (Massa, Samson, Dolly, Tuffi, Porta, Freddy, and OR 802) were from the San Diego Zoo. Blood from three individuals (Holoko, Choomba, and Mumbah) was a generous gift from George Amato of the Wildlife Conservation Society. DNA from two individuals (Abe and Oko) was a generous gift from Amos Deinard and Kenneth Kidd of Yale University. Blood obtained during routine veterinary examinations was kindly donated by the National Zoo in Washington, DC, for one individual (Moka) and by the Miami Metro Zoo in Miami for two more individuals (Josephine and Jimmie). All individuals except OR 802 (unnamed) were originally wild born; OR 802 has unrelated parents, and all of its grandparents were wild born. All samples were independent.

PCR amplification and sequencing of DNA segments: The 50 noncoding, nonrepetitive genomic segments (each ∼1 kb) were originally selected randomly from the human genome (Chen and Li 2001; Yuet al. 2002). All were chosen to avoid coding regions or close linkage to any coding regions. In each segment and its nearby regions there was no registered gene in GenBank and no potential coding region was detected by either GenScan or GRAIL-EXP.

Touch-down PCR (Don et al. 1991) was used, and the reactions were carried out under the conditions described in Zhao et al. (2000). The PCR products were purified with the Wizard PCR Preps DNA purification resin kit (Promega, Madison, WI). Sequencing reactions were performed according to the protocol of the ABI Prism BigDye terminator sequencing kits (Perkin-Elmer, Norwalk, CT), scaled down to one-quarter reactions. The extension products were purified with Sephadex G-50 (DNA grade, Pharmacia) and run on an ABI 377XL DNA sequencer using 4.25% gels (Sooner Scientific). About 500 bp of each segment was sequenced in both directions.

ABI DNA Sequence Analysis 3.0 was used for lane tracking and base calling. The data were then proofread manually, and heterozygous sites were detected as double peaks. The forward and reverse sequences of each individual were assembled automatically using SeqMan (DNAStar, Madison, WI), and the assembled files were carefully checked by eye. Fluorescent traces for each variant site were rechecked in all individuals. All singletons (variants that appear only once in the entire sample) were verified by PCR reamplification and resequencing of the PCR products in both directions. No attempt was made to determine the gametic phase (haplotypes) of individuals with multiple polymorphic sites per locus. Rather, the segments within an individual were randomly concatenated into two continuous sequences using DAMBE (Xia and Xie 2001). The new sequences have been deposited in GenBank (accession nos. AY447025–AY447950).
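The random concatenation step can be sketched as follows. This is a minimal illustration, not DAMBE's actual procedure: we assume each segment is stored as a list of (allele, allele) genotype pairs, one per aligned site, and the function name `random_pseudohaplotypes` is hypothetical. At every heterozygous site the two alleles are sent to the two output sequences in a random orientation.

```python
import random

def random_pseudohaplotypes(segments, seed=None):
    """Concatenate unphased diploid segments into two pseudo-haplotypes.

    Hypothetical helper for illustration. `segments` is a list of segments;
    each segment is a list of (allele_a, allele_b) tuples, one per site.
    No gametic phase is inferred: at each heterozygous site the orientation
    of the two alleles is chosen at random.
    """
    rng = random.Random(seed)
    hap1, hap2 = [], []
    for segment in segments:
        for a, b in segment:
            if a != b and rng.random() < 0.5:
                a, b = b, a  # random orientation at a heterozygous site
            hap1.append(a)
            hap2.append(b)
    return "".join(hap1), "".join(hap2)

genotypes = [[("A", "A"), ("C", "T")], [("G", "A")]]
h1, h2 = random_pseudohaplotypes(genotypes, seed=1)
```

Because π depends only on how many sequence pairs differ at each site, and a random orientation leaves the allele counts at every site unchanged, neither the overall π nor its within- and between-individual components depends on this arbitrary phasing; linkage information across sites, however, is lost.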

Data analysis: The sequences were aligned with SeqMan. Nucleotide diversity values and the average percentage distances between species were calculated using DnaSP version 3.14 (Rozas and Rozas 1999).


Distribution of single nucleotide polymorphisms: Because one of the 50 segments could not be amplified in four individuals, this segment was excluded from the study. We sequenced the remaining 49 segments in 15 western lowland gorillas. The total number of nucleotide sites sequenced, after exclusion of deletions and insertions, is ∼23,056 bp. A total of 138 single nucleotide polymorphisms (SNPs) were found in the 15 gorilla samples (30 sequences); 29 of them (21%) were observed only once (i.e., singletons) and 21 (15%) only twice (doubletons). Interestingly, more than half (64%) of the gorilla variants were intermediate- or high-frequency variants. This excess of intermediate-frequency variants is also reflected in Tajima's D statistic (Tajima 1989): for the concatenated sequences, only gorillas have a positive value of D, whereas humans, chimpanzees, and bonobos have negative values. This implies that ancient gorilla populations may have been subdivided.
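To show how the sign of D relates to the frequency spectrum, the sketch below implements the standard formula of Tajima (1989) from summary statistics. It is an illustration only, not the DnaSP code used for the published values; the function name is ours, and `k` is assumed to be the mean number of pairwise differences per pair of sequences (π multiplied by the number of sites).

```python
import math

def tajimas_d(n, S, k):
    """Tajima's (1989) D from summary statistics.

    n: number of sequences (the formula is degenerate for n < 4)
    S: number of segregating sites
    k: mean number of pairwise differences per sequence pair
    """
    if n < 4 or S == 0:
        raise ValueError("need n >= 4 and at least one segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n**2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    theta_w = S / a1  # Watterson's estimator, per sequence
    # D compares k (pi-based) with theta_w (segregating-site-based)
    return (k - theta_w) / math.sqrt(e1 * S + e2 * S * (S - 1))

d_gorilla = tajimas_d(30, 138, 0.00158 * 23056)
```

As a rough check, n = 30, S = 138, and k ≈ 0.00158 × 23,056 ≈ 36.4 give Watterson's θ_W = S/a₁ ≈ 34.8, so k exceeds θ_W and D is positive, consistent with the sign reported above. An excess of intermediate-frequency variants raises k relative to θ_W, which is what pushes D above zero.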

Adequacy of the samples: Because our sample size is relatively small, we need to consider the problem of sampling bias. For this purpose, we consider the effect of sampling on nucleotide diversity (π), since π is the quantity of primary interest in this study; π is defined as the number of nucleotide differences per site between two randomly chosen sequences in a population. As noted in Yu et al. (2002), an estimation bias may be detected by comparing within-individual π values (πw) with between-individual π values (πb). Ideally, each sequence in a sample should be taken randomly from the population, but we have included both sequences from each individual sampled. The two sequences in an individual may not be completely independent if the individual is “inbred” to some extent, in the sense that both sequences likely came from the same subpopulation rather than from random mating throughout the larger population. In that case, the πw values would tend to be smaller than the πb values, and their inclusion would tend to give an underestimate of π. However, if the average πb and πw values are similar, then the sampling scheme would seem largely adequate, and the inclusion of πw values in the estimation of π should produce no substantial bias.
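A minimal sketch of the within/between decomposition, assuming each individual is represented by its two aligned sequences (for example, the two concatenated sequences described in the methods). The p-distance is used as the per-pair measure, and the helper names are hypothetical.

```python
from itertools import combinations

def p_distance(s1, s2):
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)

def within_between_pi(individuals):
    """Split pairwise diversity into within- and between-individual parts.

    `individuals` is a list of (seq1, seq2) tuples, one per diploid sample.
    Returns (mean pi_w, mean pi_b, overall pi over all sequence pairs).
    """
    pi_w = [p_distance(a, b) for a, b in individuals]
    pi_b = []
    for (a1, a2), (b1, b2) in combinations(individuals, 2):
        # four sequence comparisons per pair of individuals
        pi_b.append(sum(p_distance(x, y) for x in (a1, a2) for y in (b1, b2)) / 4)
    seqs = [s for pair in individuals for s in pair]
    all_pairs = [p_distance(x, y) for x, y in combinations(seqs, 2)]
    return (sum(pi_w) / len(pi_w), sum(pi_b) / len(pi_b),
            sum(all_pairs) / len(all_pairs))

toy = [("AAAA", "AAAT"), ("AAAA", "AAAA"), ("CAAA", "AAAA")]
pw, pb, overall = within_between_pi(toy)
```

With n individuals there are n within-individual pairs and 4·C(n, 2) between-individual pairs, so the overall π is a pair-weighted mean of the two averages.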

Figure 1. Distributions of the within-individual and between-individual nucleotide diversity in gorillas.

Figure 1 shows that the distribution of πb values is approximately normal, except that one point (πb = 0.078%) is substantially lower than the others. This observation suggests that there was no strong sampling bias. Moreover, excluding this exceptional point has little effect on the average π value. The distribution of the 15 πw values, which range from 0.078 to 0.195%, is somewhat narrower than that of the πb values, which range from 0.078 to 0.221%, and the average πw (0.136%) is lower than the average πb (0.159%; P < 0.01, one-tailed t-test). This comparison suggests that some of the individuals may have been inbred to some extent. However, excluding the 15 πw values from the calculation increases the average value only from 0.158 to 0.159%. We therefore take 0.158% as our estimate of the nucleotide diversity in western lowland gorillas.
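The small effect of dropping the πw values follows from simple pair counting: 30 sequences give C(30, 2) = 435 pairs, of which only 15 are within-individual pairs. The sketch below, which uses only the rounded averages quoted above (the function name is ours), reproduces the pooled estimate.

```python
from math import comb

def pooled_pi(n_individuals, mean_pi_w, mean_pi_b):
    """Overall pi as a pair-weighted mean of within- and between-individual pi."""
    n_seq = 2 * n_individuals
    total_pairs = comb(n_seq, 2)          # all sequence pairs (435 for 30 sequences)
    within_pairs = n_individuals          # one pair per diploid individual
    between_pairs = total_pairs - within_pairs
    return (within_pairs * mean_pi_w + between_pairs * mean_pi_b) / total_pairs

print(round(pooled_pi(15, 0.136, 0.159), 3))  # 0.158
```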

The present study included individuals from only one gorilla subspecies, the western lowland gorilla; furthermore, since little is known of the geographic origin of these individuals, they may not represent the full range of variation in this subspecies. As several studies of mtDNA loci such as COII, D-loop, and NADH5 have shown the genetic distance between eastern and western gorillas to be as large as or larger than that between chimpanzees and bonobos (Ruvolo et al. 1994; Garner and Ryder 1996; Jensen-Seaman et al. 2001), we may be missing substantial variation in the genus Gorilla. On the other hand, data from eight independent nuclear loci suggested that the difference between the nuclear genomes of eastern and western gorillas was actually rather small compared to that between chimpanzees and bonobos (Jensen-Seaman et al. 2001, 2003), and the inclusion of eastern gorilla samples at a few nuclear loci made almost no difference in the estimate of π for gorillas as a whole (Jensen-Seaman 2000). A study of a 10-kb noncoding region in Xq13.3 confirmed the much smaller divergence between the nuclear genomes of eastern and western gorillas compared to that between the Pan species (Kaessmann et al. 2001). Nonetheless, without more data it is impossible to assess the potential effect of including eastern gorilla individuals.

Nucleotide diversity: For the 49 DNA segments we studied, π ranges from 0 (6 segments) to 0.49% (Table 1). Such large fluctuations were also observed in humans, chimpanzees, and bonobos (Yu et al. 2002, 2003). These observations are not surprising because the nucleotide diversity in a short DNA region is subject to strong stochastic effects. In addition, variation in π may also arise from variation in mutation rate among genomic regions. Low π could also result from a recent selective sweep, but since these 49 segments were drawn from 16 different chromosomes, most of them millions of nucleotides from the nearest other segment, selection is unlikely to have had a strong impact on the diversity values. Gorillas have the highest average π value (0.158%), close to twice that of humans (0.087%; Table 2). In contrast, the π value of bonobos (0.077%) is somewhat lower than that of humans, and that of chimpanzees (0.134%) is only 50% higher than that of humans.

Some reports have suggested that at autosomal loci gorillas have up to three times greater sequence diversity than humans (Deinard and Kidd 1999; Jensen-Seaman et al. 2001). Sequences from a 10-kb X-linked noncoding region revealed nucleotide diversity five times higher in gorillas than in humans (Kaessmann et al. 2001). However, several mtDNA studies, each using different gorilla samples, showed a pattern similar to that observed in this study. Results using mtDNA sequence data from the COII gene, 16S rRNA, and the first hypervariable segment of the D-loop all revealed approximately twice as much diversity in western lowland gorillas as in humans (Ruvolo et al. 1994; Jensen-Seaman and Kidd 2001; Noda et al. 2001).

Among the nuclear loci studied previously, the highest nucleotide diversity was found in chimpanzees at ADH1, APOB, DRD2, and DRD4, while gorillas carried the highest variation at HOXB6 and Xq13.3. Across the 49 segments in the present study, gorillas have the highest nucleotide diversity, followed by chimpanzees. However, at 20 of the 49 segments (41%), chimpanzees had greater nucleotide diversity than gorillas, demonstrating the importance of examining a large number of loci to obtain a reliable conclusion. Bonobos carry the lowest nucleotide diversity, lower even than that in humans. Therefore, having a much greater amount of nucleotide diversity than humans is not a general feature of the African apes.

Effective population sizes: To estimate the effective population size (Ne), we first estimated the mutation rate from the sequence divergence (d) between species (Table 3), assuming divergence times of 8 million years between humans and gorillas and 6 million years between humans and the chimpanzee-bonobo lineage (Brunet et al. 2002; Vignaud et al. 2002). The resulting average rate is 1.0469 × 10⁻⁹ per site per year, from which we determined the mutation rate per nucleotide site per generation (u). Of course, using different divergence dates will yield slightly different estimates of u and Ne. Since we are interested in the long-term effective population size, we use Tajima's (1983) estimator π = 4Neu, and we assume a generation time of 15 years for gorillas. The Ne for humans is estimated to be 10,400 (Table 2), which is similar to the value (10,000) commonly used in the literature (Nei and Graur 1984; Takahata et al. 1995; Zhao et al. 2000); that for bonobos (12,300) is only slightly larger, that for chimpanzees (21,300) is about twice as large, and that for western lowland gorillas (25,200) is ∼2.5 times larger.
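The conversion from π to Ne can be written out as a short calculation. This sketch takes the average rate of 1.0469 × 10⁻⁹ per site per year from the text and converts it to a per-generation rate u by multiplying by the generation time. Only the 15-year gorilla generation time is stated here; the 15-year (chimpanzee, bonobo) and 20-year (human) values below are our assumptions, chosen because they reproduce the other estimates quoted in the text.

```python
MU_PER_YEAR = 1.0469e-9  # average mutation rate per site per year (from the text)

def effective_size(pi, generation_years, mu_per_year=MU_PER_YEAR):
    """Long-term Ne from Tajima's (1983) estimator pi = 4 * Ne * u.

    pi is per site (0.158% -> 0.00158); u is the per-generation rate.
    """
    u = mu_per_year * generation_years  # mutation rate per site per generation
    return pi / (4 * u)

# (pi per site, generation time in years); generation times other than the
# gorilla's 15 years are assumptions, not values stated in the text
samples = {
    "gorilla": (0.00158, 15),
    "chimpanzee": (0.00134, 15),
    "bonobo": (0.00077, 15),
    "human": (0.00087, 20),
}
for species, (pi, gen) in samples.items():
    print(species, round(effective_size(pi, gen), -2))
```

Under these assumptions the printed values round to 25,200, 21,300, 12,300, and 10,400, matching the estimates above.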

Differences in Ne between species could be due to several factors, including differences in present census size, past changes in population size, mating system, and population substructure. The relatively small effective population size of humans, especially considering their large census size, has most often been attributed to a large expansion, possibly following a bottleneck, from a much smaller population at some time in the recent past (Harpending et al. 1998). That gorillas and chimpanzees have Ne at least twice as large as that of humans would suggest that these apes have not experienced similarly dramatic population bottlenecks. The larger effective population size of gorillas relative to chimpanzees is intriguing and not likely due to a larger census size, given that, at least at present, gorillas have a more restricted geographic range than chimpanzees and in most habitats live at population densities similar to those of chimpanzees (Kuroda et al. 1996; Yamagiwa et al. 1996). It is also unlikely that the larger gorilla Ne is due to differences in mating system between chimpanzees and gorillas. In fact, given the single-male polygyny of gorillas (Watts 1996) and its associated high variance in male reproductive success, one may predict the opposite: that chimpanzees would have a larger Ne, since their promiscuous mating (Dixson 1998) would lead to more males contributing to the next generation and a higher male effective population size.

Therefore, the larger gorilla Ne may be due to greater population subdivision. The excess of intermediate-frequency variants in our gorilla sample supports this notion. Ecological evidence also suggests that gorilla populations may be more subdivided than chimpanzee populations inhabiting the same area. Chimpanzees are able to live in a wider range of habitats, including open woodland and savanna (Kortland 1983), and therefore may be capable of maintaining long-distance gene flow between forests. In contrast, gorilla populations are restricted to forests and therefore may be unable to exchange migrants with other populations across open habitats. Indeed, genetic studies have revealed that chimpanzees share mtDNA haplotypes over 900 km (Morin et al. 1994; Goldberg and Ruvolo 1997). The same has not been found for gorillas (Jensen-Seaman and Kidd 2001; Clifford et al. 2003), although far fewer wild gorilla populations have been sampled. The increased Ne of gorillas may therefore be due to increased population subdivision relative to chimpanzees, with the caveat that although population subdivision has been shown to increase Ne (Wright 1943), it can also decrease Ne, depending on actual levels of migration and variance in reproductive success between subpopulations (Whitlock and Barton 1997; Laporte and Charlesworth 2002).

Table 1. Nucleotide diversity in each of the 49 DNA segments studied in gorillas, chimpanzees, bonobos, and humans

It is especially interesting to compare the levels of diversity and estimates of Ne at the subspecies level between our sample of western lowland gorillas (G. g. gorilla) and our previous data from the sympatric Central African chimpanzee (Pan t. troglodytes; Yu et al. 2003). The geographic range of the western gorilla is contained entirely within that of this chimpanzee subspecies (Groves 1971; Kortland 1983). P. t. troglodytes has the highest level of nucleotide diversity among the three common chimpanzee subspecies, as estimated from mtDNA data (Morin et al. 1994; Deinard and Kidd 2000), X-chromosomal data (Kaessmann et al. 1999), Y-chromosomal data (Stone et al. 2002), and autosomal data (Yu et al. 2003). Similarly, mtDNA data indicate that G. g. gorilla has substantially more nucleotide diversity than other gorilla subspecies (Garner and Ryder 1996; Jensen-Seaman and Kidd 2001). Thus, these sympatric African ape subspecies are each the most diverse representatives of their respective genera, suggesting that ecological conditions in this part of equatorial West Africa may have been more conducive to maintaining large long-term effective population sizes. Several forest refuges have been proposed within the ranges of these subspecies, which may have buffered them against population reductions during the climatic fluctuations of the African Quaternary (Livingstone 1982; Maley 1996). Where they differ, however, is that western lowland gorillas have somewhat greater π and Ne than Central African chimpanzees. More strikingly, these gorillas have an excess of intermediate-frequency mutations, whereas this chimpanzee subspecies has an excess of singletons (Yu et al. 2003), suggesting a greater level of population subdivision in gorillas than in chimpanzees within the same geographic area (Avise 2000). This may have been especially true during periods of forest reduction and fragmentation associated with global cooling and drying over the last several hundred thousand years, because chimpanzees are capable of living in dry savanna or open woodland environments by maintaining communities with very large home ranges, while gorillas are not found in such open environments (Yamagiwa 1999). Of course, other explanations could be invoked; for example, the eastern gorilla subspecies may have originated from migrants from West Africa, in which case the reduced variation in eastern gorillas may be the result of a population bottleneck associated with colonization.

Table 2. Average nucleotide diversity in gorillas, chimpanzees, bonobos, and humans and effective population sizes estimated from π

Table 3. Average sequence divergence (%) between taxa estimated from the 49 DNA segments studied

Our estimates of the Ne of African apes are at most only 2.5 times larger than that of modern humans. This estimate is close to the average of five nuclear loci of Jensen-Seaman et al. (2001) but is considerably lower than that of Chen and Li (2001), who estimated the Ne of the ancestral human-chimpanzee population to be between five and nine times that of modern humans. On the other hand, a reanalysis of Chen and Li's data set using the maximum-likelihood method suggested an effective population size of ∼12,000 for the ancestor of humans and chimpanzees (Yang 2002). This estimate is only slightly larger than the estimate (10,400) of Ne for modern humans and much lower than that (21,300) for extant chimpanzees. The Ne for the common ancestor of all three species was estimated to be ∼38,000 (Yang 2002), which is higher than the present estimates of Ne for any living African ape. It is tempting to speculate that this last common ancestor of the African apes and humans, with its large effective population size, may have shared some ecological characteristics with gorillas, the most diverse living African ape. Furthermore, one may speculate that some of the social or ecological changes resulting in a lower Ne in extant humans and chimpanzees may have already begun to occur in their common ancestor following its divergence from gorilla. Of course, we recognize that the long-term Ne of any species may be influenced by unrecoverable idiosyncrasies in the species' unique history. Further research is needed to reconcile the different estimates proposed for the effective population sizes of our ancestors and to test hypotheses seeking to explain the differences among species.

Mitochondrial vs. nuclear DNA: As the studies above show, most of the data, whether mitochondrial or nuclear, reveal about twice as much nucleotide diversity in gorillas as in humans. Thus, unlike the case for chimpanzees, the ratio of nucleotide diversity between gorillas and humans is similar for nuclear and mtDNA data (Table 2). Wise et al. (1997) pointed out a disparity between mtDNA and nuclear DNA in comparisons of humans and chimpanzees, and from further comparisons with other nonhuman primates they suggested that humans, not chimpanzees, were unusual in possessing such low levels of mtDNA diversity relative to that of the nuclear genome. In the 49 autosomal segments studied, the difference in nucleotide diversity between humans and chimpanzees is considerably smaller for nuclear DNA than for mtDNA (Yu et al. 2003), whereas in gorillas the two kinds of data give similar ratios (this study). Therefore, this disparity may arise from unusually high estimates of chimpanzee mtDNA diversity. There are several possible explanations. First, not all loci are expected to give the same results because of stochastic effects. Because mtDNA is haploid and maternally inherited, its effective population size is roughly one-fourth that of nuclear DNA, which increases the stochastic effects of drift. Second, the disparity could be caused by a reduction in effective population size (Ne) in the human lineage since the human-chimpanzee divergence; a reduction in Ne causes a larger decrease in nucleotide diversity for mtDNA than for nuclear DNA. This is also true for bonobos, which may likewise have experienced a population reduction in the recent past. Third, different studies have used different samples, which can strongly affect the results of diversity studies. Finally, balancing selection or local directional selection may have acted to increase variation in the chimpanzee mitochondrial genome.
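The second explanation rests on mtDNA losing diversity faster than nuclear DNA when Ne drops. A minimal sketch under pure drift, ignoring new mutations and assuming an equal sex ratio (so the mtDNA effective size is Ne/4), compares the fraction of diversity retained; the function name and the bottleneck size and duration used below are hypothetical, not estimates from this study.

```python
def retained_diversity(ne, generations, genome="nuclear"):
    """Fraction of initial diversity left after `generations` of pure drift.

    Assumes no new mutations and an equal sex ratio. Nuclear autosomal loci
    lose 1/(2Ne) per generation; mtDNA behaves as if its effective size were
    Ne/4, so it loses 1/(2 * Ne/4) = 2/Ne per generation.
    """
    if genome == "nuclear":
        loss = 1.0 / (2 * ne)
    elif genome == "mtDNA":
        loss = 1.0 / (2 * ne / 4)
    else:
        raise ValueError("genome must be 'nuclear' or 'mtDNA'")
    return (1.0 - loss) ** generations

nuc = retained_diversity(1000, 500, "nuclear")
mt = retained_diversity(1000, 500, "mtDNA")
```

For this hypothetical reduction to Ne = 1,000 lasting 500 generations, nuclear loci retain about 78% of their initial diversity (0.9995⁵⁰⁰), whereas mtDNA retains only about 37% (0.998⁵⁰⁰).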

Conclusion: Gorillas possess the greatest amount of autosomal nucleotide diversity and the largest effective population size among all of the living species in the African ape-human clade, with about twice as much diversity as modern humans. Gorillas also show the greatest evidence of population subdivision. A reduction in effective population size may have occurred in the common ancestor of humans and chimpanzees following divergence from the gorillas. Finally, we hope that this understanding of the amount and pattern of genetic variation in the African apes, along with future studies that sample wild populations, can help in establishing conservation priorities for these endangered species.


We thank George Amato, Amos Deinard, and Kenneth Kidd for their generous donation of blood samples or DNA. We thank the National Zoo and Miami Metro Zoo for their donation of gorilla blood samples. This study was supported by National Institutes of Health grants GM55759 and GM30998.


  • Communicating editor: Y. X. Fu

  • Received June 26, 2003.
  • Accepted November 19, 2003.

