Neanderthals were a group of archaic hominins that occupied most of Europe and parts of Western Asia from ∼30,000 to 300,000 years ago (KYA). They coexisted with modern humans during part of this time. Previous genetic analyses that compared a draft sequence of the Neanderthal genome with genomes of several modern humans concluded that Neanderthals made a small (1–4%) contribution to the gene pools of all non-African populations. This observation was consistent with a single episode of admixture from Neanderthals into the ancestors of all non-Africans when the two groups coexisted in the Middle East 50–80 KYA. We examined the relationship between Neanderthals and modern humans in greater detail by applying two complementary methods to the published draft Neanderthal genome and an expanded set of high-coverage modern human genome sequences. We find that, consistent with the recent finding of Meyer et al. (2012), Neanderthals contributed more DNA to modern East Asians than to modern Europeans. Furthermore we find that the Maasai of East Africa have a small but significant fraction of Neanderthal DNA. Because our analysis is of several genomic samples from each modern human population considered, we are able to document the extent of variation in Neanderthal ancestry within and among populations. Our results combined with those previously published show that a more complex model of admixture between Neanderthals and modern humans is necessary to account for the different levels of Neanderthal ancestry among human populations. In particular, at least some Neanderthal–modern human admixture must postdate the separation of the ancestors of modern European and modern East Asian populations.
NEANDERTHALS were a group of archaic hominins that occupied large parts of Europe and West Asia from ∼30,000 to 300,000 years ago (KYA) (Stringer and Hublin 1999; Hublin 2009). Their disappearance in the fossil record often coincides with the first appearance of anatomically modern humans (AMH) in that region (Finlayson 2004). Where, when, and how often Neanderthals interbred with expanding AMH populations is still an open question. Morphological studies have generally concluded that Neanderthals made little or no contribution to present-day human populations (Stringer and Andrews 1988; Lahr 1994), but others have suggested there was some admixture (Duarte et al. 1999; Trinkaus 2007). Initial comparisons of Neanderthal and modern human DNA found no evidence for a Neanderthal contribution to the modern human gene pool (Krings et al. 1997; Serre et al. 2004; Noonan et al. 2006). However, indirect studies of patterns of linkage disequilibrium (LD) in contemporary human populations have consistently found support for admixture between “archaic” human groups (such as Neanderthals) and modern humans (Garrigan et al. 2005a,b; Plagnol and Wall 2006; Wall et al. 2009; Hammer et al. 2011; Lachance et al. 2012).
A detailed analysis of a draft Neanderthal genome and five low-coverage (4×) human sequences estimated that Neanderthals made a 1–4% contribution to the gene pool of modern non-African populations (Green et al. 2010). The presence of “Neanderthal DNA” in East Asians and Melanesians was initially surprising because the archaeological record shows that Neanderthals and early modern humans coexisted only in Europe and western Asia. Green and colleagues hypothesized that Neanderthals and modern humans came into contact and interbred in the Middle East ∼50–80 KYA, prior to the divergence of modern-day European and Asian populations.
Green et al. (2010) presented three kinds of evidence in favor of interbreeding. First, they found (using D-statistics, a new measure of genetic similarity introduced in that article) that the three sampled non-African genome sequences (from a French, a Han Chinese, and a Papua New Guinean) are more similar to the Neanderthal sequence than is either of the two sampled African sequences (from a San and a Yoruban). Second, they identified several haplotypes that are in low frequency in Europeans, absent from Africans, and present in the Neanderthal sequence, which suggests those haplotypes were derived from Neanderthals. Third, they found many more genomic fragments in a European genome than in an African genome that have low divergence to the Neanderthal genome.
Admixture between modern humans and Neanderthals within the past 100,000 years (Kyr) is only one possible explanation for these D-statistic patterns. Green et al. noted that another potential explanation is ancient population subdivision within Africa before both Neanderthals and modern humans left Africa (cf. Green et al. 2010, figure 6). If there had been long-lived (e.g., >500 Kyr) population structure within Africa, and both Neanderthals and non-African AMH came from the same “source” subpopulation, then Neanderthals would be more similar to non-Africans in the absence of any recent admixture between AMH and Neanderthals (see Figure 1A). This intuitive argument was confirmed by the simulation studies of Durand et al. (2011) and Eriksson and Manica (2012), but these studies did not account for the other two lines of evidence summarized above. Two other studies have shown that the ancient-subdivision model is incompatible with other aspects of the data. Yang et al. (2012) demonstrated that recent admixture (Figure 1B) could be distinguished from ancient subdivision (Figure 1A) by computing the frequency spectrum of modern humans, conditioned on the Neanderthal sequence having the derived allele and an African sequence having the ancestral allele. This double conditioning enriches for alleles introduced by recent admixture if it occurred. Yang and colleagues found that the doubly conditioned frequency spectrum in Europeans and in East Asians is consistent with recent admixture, not with ancient subdivision. Separately, an analysis of the extent of LD at closely linked sites also concluded that the data were consistent with recent admixture and not with ancient subdivision (Sankararaman et al. 2012).
In this study, we revisit the question of Neanderthal admixture, using an expanded data set of 42 high-coverage (>45×) modern human genomic sequences, and we take advantage of the recent high-coverage Denisova genome (Meyer et al. 2012) to obtain more refined estimates of admixture proportions. We use two complementary methods of analysis. One is the D-statistic method introduced by Green et al. (2010). D-statistics reflect site-by-site differences. Because we have multiple individuals from each of several populations, we can quantify the extent of variation in D-statistics among pairs of individuals from the same two populations and obtain greater statistical power by combining estimates among all pairs. The second method is an LD-based method similar to one introduced by Wall (2000) and Plagnol and Wall (2006) for identifying putatively introgressed regions in modern human genomes. We use the draft Neanderthal genome to identify segments in the modern human genome that were derived from admixture with Neanderthals. This method is similar to the one used by Green et al. (2010) but is less restrictive and allows quantification of the differences in the number of admixed segments in different populations.
Using both of these methods, we show there was more Neanderthal admixture into East Asian populations than into European populations. This conclusion is consistent with that of Meyer et al. (2012), which was based on the analysis of a smaller number of modern human sequences. By using the high-coverage Denisova genome, we are able to show that the admixture rate into East Asians is 40% higher than into Europeans. We conclude that admixture between Neanderthals and modern humans did not occur at a single time and place, as suggested by Green et al. (2010). Some of it had to have occurred after the separation of East Asians and Europeans. Further, we show that there was significant Neanderthal admixture into the Maasai population of East Africa, probably because of secondary contact with a non-African population rather than admixture directly from Neanderthals.
Materials and Methods
Complete genomics data
We downloaded data from 69 publicly available genome sequences from the Complete Genomics (CGI) website (http://www.completegenomics.com/public-data/). Complete Genomics sequenced a Yoruba (YRI) trio, a Centre d'Etude du Polymorphisme Humain (CEPH)/Utah (CEU) pedigree family of 17 family members, a Puerto Rican (PUR) trio, and a diversity panel from 10 different populations. Combining these data sets and using only nonrelated, nonadmixed individuals, we have a sample size of 42 individuals representing nine different populations (Table 1). In addition to 36 members of the diversity panel, we also used the parents from the YRI trio and the maternal and paternal grandparents in the CEU pedigree. The individual genomes were sequenced to a minimum 45-fold coverage (Drmanac et al. 2010). The eight populations are Utah residents with Northern and Western European ancestry from the CEPH collection (CEU); Han Chinese from Beijing, China (CHB); Gujarati Indians from Houston (GIH); Japanese from Tokyo (JPT); Luhya from Webuye, Kenya (LWK); Maasai from Kinyawa, Kenya (MKK); Toscani from Italy (TSI); and Yoruba from Ibadan, Nigeria (YRI). Samples from three other populations were also available from Complete Genomics, those of Mexican ancestry in Los Angeles (MXL), African-Americans from southwest Arizona (ASW), and Puerto Ricans from Puerto Rico (PUR), but these were excluded from our analysis because of recent intercontinental admixture. All genomic data were downloaded from Complete Genomics’ ftp site (ftp://ftp2.completegenomics.com/). We used two separate pipelines for filtering and processing the data, optimized for the different analyses performed (see below).
For the D-statistic analyses, each individual genome was aligned with the human genome assembly hg19 for consistency with the available assembly of the Neanderthal genome. Since our results were somewhat unexpected, we prepared the data for analysis in two different ways to check for consistency. We denote these analysis A and analysis B.
For analysis A, we used the release of the file format version 2.0 (software version 188.8.131.52) that was generated in September 2011. This version was mapped to the human reference genome hg19. We also downloaded the chimpanzee genome pantro2 aligned to hg19 from the University of California, Santa Cruz (UCSC) Genome Browser (http://hgdownload.cse.ucsc.edu/goldenPath/hg18/vsPanTro2/). The Neanderthal sequence was obtained by pooling reads from the three Vindija bones (SL Vi33.16, SL Vi33.25, and SL Vi33.26) that were aligned to the reference human genome (Green et al. 2010). The Neanderthal data were downloaded from the UCSC genome browser (http://genome.ucsc.edu/Neandertal/). To match the filtering used in the original Green et al. (2010) study, we used only sites with a mapping quality score (MAPQ) of at least 90 and a sequence quality >40. On average, the coverage of the Neanderthal genome was ∼1.3-fold. We kept only sites that had one, two, or three reads.
After filtering out any insertions, deletions, or ambiguously called sites in the Complete Genomics data, we merged them with the chimpanzee and Neanderthal genomes. We kept only sites that had no more than two alleles in any of the human genomes and at which alleles were called for each human, the chimp, and the Neanderthal. Furthermore, we considered only transversion differences.
We also obtained the high-coverage Denisova genome from Meyer et al. (2012). The genome was aligned to the human reference genome (hg19) and the average coverage was ∼30x. We filtered out all sites that had <16 reads or >46 reads. We merged these data with the data from analysis A to compute the D-statistic and f-statistic.
For analysis B, we redownloaded the genomic data from the Complete Genomics website (ftp://ftp2.completegenomics.com/, software version 184.108.40.206, file format version 2.0, February 2012). These sequences were aligned to hg18. We applied a less stringent filter of the Neanderthal data: the filtering for mapping quality and sequence quality remained the same as in analysis A, but there were no restrictions on the number of reads per site. Finally, instead of considering the chimp genome as the outgroup, we used the ancestral alleles defined by the 1000 Genomes Project from the Enredo-Pecan-Ortheus (EPO) pipeline (Paten et al. 2008a,b) (data downloaded from ftp://ftp.1000genomes.ebi.ac.uk/). We refer to this outgroup as the reconstructed common ancestor (RCA).
For samples from any two populations compared, we filtered out any insertions, deletions, or ambiguously called sites. These genomic samples were then merged with the Neanderthal genome and the RCA outgroup. This differs from analysis A, where all populations were merged with the Neanderthal, Denisova, and chimp genome prior to any comparisons between populations. We considered only sites where the difference between the ancestral allele from the RCA and the alternate allele is a transversion, as we did in analysis A.
LD-based analysis filters
Since the LD-based analyses primarily utilize patterns of extant genetic variation (and only secondarily use the draft Neanderthal genome), we aligned variant calls to the updated human genome assembly (hg19), included both transitions and transversions, and imposed more stringent filters to throw out repetitive regions. Specifically, a custom series of Perl/C scripts and cgatools v220.127.116.11 were used to get a common set of variants from each individual. Using the CGI’s variant file, all polymorphic regions containing SNPs were identified and reconstructed according to CGI’s descriptions. These regions were then filtered for SNPs in such a way that both alleles were known for a given individual and were not part of a complex variant (for example, a SNP on one haploid phase and a deletion on the other phase). We then pooled all unique SNP positions from the full panel of samples and removed all SNPs located within repeats and segmental duplications with a minimum size of 50 bp. Structural variants (dgv track on UCSC), self chain (identity <90%, UCSC self-chain track), segmental duplications (UCSC), microsatellites (UCSC), simple tandem repeats (UCSC), and repeat masked sequence (UCSC) were also excluded. The final list of SNPs was then used by CGI’s “snpdiff” tool to extract each sample’s base calls relative to the human reference genome (hg19, Build 37). The snpdiff output was then reformatted to ms, PLINK, and other text-based formats for further analyses.
Subsequently, we identified numerous regions where all/most individuals had heterozygous SNP calls but only one homozygous genotype was present. These regions likely reflect either alignment errors due to the Complete Genomics short-read sequencing technology or errors in the human reference genome sequence. We excluded all regions that included sites where over half of the individuals are heterozygous and only one homozygous genotype is present. The coordinates for these regions are available from the authors upon request.
Denisova sequence reads (Reich et al. 2010), mapped to the human reference genome hg18, were downloaded from the UCSC genome browser (http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg18&c=chrX&g=bamSLDenisova). Consensus Neanderthal sequence generated from three bones and aligned to the human reference genome hg18 was downloaded from the Ensembl genome browser (http://neandertal.ensemblgenomes.org/data_info.html). Samtools 0.1.18 (Li et al. 2009) was used to convert the BAM files into a pileup alignment (mpileup arguments: -B -q5 -Q30) of each ancient hominin genome and hg18 for the region of interest. To compare modern human sequence tracks to ancient hominin sequences, hg19 coordinates of interest were converted to hg18 coordinates using the UCSC genome browser tool liftOver and extracted from the pileup alignments via custom perl scripts. To further compare the human sequences to sequences of other primate genomes, another custom perl script was used to extract the same hg19 coordinates of interest from a subset of the genomes in the UCSC MultiZ alignments found at http://hgdownload.cse.ucsc.edu/goldenPath/hg19/multiz46way/. Computations were performed using the University of California, San Francisco, Biostatistics High-Performance Computing System.
D-statistics and estimates of admixture rates
D-statistics, introduced by Green et al. (2010), are summary statistics for genome sequences from four populations. Two populations, P1 and P2, are compared to a test population, P3. The fourth population P4 is used as an outgroup to determine which allele is ancestral at each site. In our case, P4 is the chimpanzee reference sequence (pantro2) denoted by C, and P3 is the Neanderthal sequence, denoted by N. P1 and P2 are two human sequences. The chimp reference sequence is assumed to have the ancestral allele, denoted by A. D is computed only for sites at which both of the Neanderthal and one but not both of the human sequences have a different allele, assumed to be derived and denoted by B. That is, only those sites with configurations ABBA and BABA are used, where the order is P1, P2, P3, P4. The requirement that two copies of both the derived and the ancestral alleles be present greatly reduces the effect of sequencing error (Durand et al. 2011).
When only a single sequence from each population is available,(1)where nABBA and nBABA are the numbers of sites with each of the two configurations. When diploid sequences from each individual from P1 and P2 are available, then(2)where and are the frequencies of the derived allele (0, 0.5, 1) in the individual in P1 and P2, respectively at site i. Equation 2 is equivalent to sampling one of the chromosomes at random from P1 and P2 and then using Equation 1.
Green et al. (2010) and Durand et al. (2011) showed that the expected value of D is 0 if P1 and P2 form a clade and P3 is the outgroup. These articles also showed that if there was admixture from P3 into P2, then E(D) > 0. The magnitude of D depends on the admixture proportion f and on the population divergence times and various effective population sizes.
Reich et al. (2010) showed that if there is a sister group of P3, which we call P5, that has not admixed with P1, P2, or P3, then it is possible to estimate f directly. In our case, P5 is the Denisovan genome. To estimate f, we define to be the numerator of either Equation 1 or Equation 2. Then(3)The intuition behind this estimator is that the denominator quantifies the excess coalescent events that occur between lineages in P3 and P5 because they are sister groups. Lineages in P2 that are introduced by admixture have the same coalescent history as all lineages from P3. Hence, the ratio is the fraction of lineages in P2 that trace their ancestry to P3 because of admixture (Reich et al. 2010). In our application of this method, we are assuming that there is no admixture from Denisovans (P5) into the other populations (P1, … , P4). Although Skoglund and Jacobsson (2011) have argued that there was admixture from Denisovans into East Asians, our results described below did not find evidence of this admixture for the Han Chinese and Japanese samples we analyzed. For analysis A, we explored the variation in estimated D-statistics and admixture rates (f) for all pairs of individuals of different human populations. For analysis B, since we did not include the Denisova genome, we estimated only D-statistics.
We computed D for each pair of individuals, both within populations and between populations. We developed two randomization tests of statistical significance. Both are similar to the Mantel test. Test 1 tests whether the average D computed for one pair of populations is significantly larger than for another pair, and test 2 tests whether the average D for a pair of populations differs significantly from 0.
For test 1, we start with sequences from three human populations, G1, G2, and G3, each containing k1, k2, and k3 diploid sequences. We compute two matrices of D values. The elements of M1 are D(G1,i, G3,j, N, C), where G1,i and G3,j are the ith and jth individuals in G1 and G3 (i = 1, … , k1; j = 1, … , k3). The elements of M2 are D(G2,i, G3,j, N, C). M1 has k3 rows and k1 columns, and M2 has k3 rows and k2 columns. From M1 and M2 the average D’s are computed, D1 and D2. The problem is to test whether D1 = D2. A t-test cannot be used because the elements within each matrix are not independent of each other and because the same reference population (G3) is used to compute both matrices. Instead, we combine M1 and M2 into a single matrix with k3 rows and k1 + k2 columns. Then we randomize the columns and compute D1 for the matrix containing the first k1 columns and D2 for the matrix containing the last k2 columns. Then we compare the observed D1 – D2 with the distribution of differences from the randomized matrices. We used a two-tailed test and 1 million replicates for each test.
Test 2 is similar to test 1, but because we compare only G1 and G2, a subset of one population is used in place of the reference population, G3. For the population with the larger sample size (say G1), we create a random partition subject to the constraint that they differ in number by no more than one. For M1, we compute D for all pairs of individuals in and G2. The elements of M2 are , where and are the ith and jth individuals in the two subpopulations created by the partition. Test 1 is then applied to M1 and M2.
We also calculated the f-statistics for each pair of individuals. Using the same randomization tests as described above, we determined whether there were significant differences between populations in estimates of the admixture rate. Significant differences observed using the admixture rate suggest that the effect is truly due to the Neanderthal and not admixture with Denisovans.
Identifying putative archaic human regions
Previous work has shown that archaic admixture often leads to long, divergent haplotypes at low frequency (Wall 2000; Plagnol and Wall 2006). We define two SNPs to be “congruent” if their diploid allele counts (i.e., zero, one, or two counts of a particular allele) across individuals are completely correlated (i.e., r2 = 1). We define the maximum number of pairwise congruent SNPs to be ld and denote the collection of rarer (minor allele frequency ≤ 0.5) alleles at each of these pairwise congruent sites to be the putative archaic haplotype. From the filtered Complete Genomics data, we then identified all regions from 8 to 100 kb in length where ld ≥ 30 and ld/S ≥ 0.1, where S is the total number of polymorphic sites in the region. When identified regions overlapped, we took the region with the largest value of ld/S. We also required that neighboring regions with putative archaic haplotypes congruent with each other be separated by at least 200 kb, to avoid double counting long archaic haplotypes. A total of 2254 regions were identified. Of these, 411 were private to the non-African samples.
To estimate what proportion of these regions might be false positives, we simulated whole-chromosome sequence data (Chen et al. 2009) under a model that incorporated both recent (intracontinental) and ancient (intercontinental) population structure (Figure 2). Specifically, we assume a panmictic ancestral population split into two daughter populations at time T0 = 0.6 (using the standard coalescent scaling of 4N generations), with (symmetric) scaled migration rate of M0 = 5. At time T1 = 0.05 – 0.053, one of the ancestral populations (i.e., the “non-African” one) experiences a population bottleneck resulting in a 100-fold reduction in population size. Then, at time T2 = 0.045, each population splits into two descendant populations, connected by migration rate M1 = 8. While arbitrary, this model attempts to incorporate the major features of human demographic history, including intra- and intercontinental population structure and a bottleneck in the history of non-African populations, and is similar to the model used by Yang et al. (2012). The results described below are qualitatively similar if other plausible values for the times and migration rates are used (results not shown). Using N = 10,000 and an average generation time of 25 years, each unit of scaled time corresponds to 1 million years.
We simulated 30 different 100-Mb chromosomes, using the model described above with mutation parameter θ = 3.5 × 10−4/bp, recombination parameter ρ = 4 × 10−4/bp, and 10 individuals sampled from each of the four extant populations. The simulated number of segregating sites was substantially higher than the actual number in our filtered data. Since average ld values are positively correlated with levels of diversity, the simulated ld values are higher on average than expected in real data, and our choice of θ is conservative. Also, standard estimates of ρ are generally higher than the value we took (Myers et al. 2005), which is also conservative for our purposes. We then tabulated the total number of regions with ld ≥ 30, ld/S ≥ 0.1, and with divergent haplotype SNPs private to the simulated non-African samples. We identified a total of 3 regions that satisfied these criteria, compared with 411 regions that were identified from the actual data. This leads to an estimate of a false discovery rate of q < 0.01.
Identifying putative Neanderthal regions
To identify which of the 2254 regions described above were likely to reflect recent Neanderthal admixture, we imposed the following additional criteria on the putative archaic human haplotypes:
The Neanderthal allele must be called at ≥12 SNPs and match the putative archaic haplotype at ≥70% of these SNPs.
The Neanderthal allele and the chimp allele must be called at ≥8 SNPs and the Neanderthal allele must be derived (relative to chimp) at ≥60% of these sites.
The putative archaic haplotype must be at low frequency (<5%) in the sub-Saharan African samples.
The motivation for criterion 1 is obvious, and we note that a more stringent cutoff was not used due to the poor quality of the Neanderthal genome sequence. Criterion 2 was implemented to cut down on regions that reflect shared ancestral polymorphism between modern humans and Neanderthals; it is based on an observation of Noonan et al. (2006) that recent Neanderthal admixture will lead to an increase in SNPs where Neanderthals have the derived allele. Finally, criterion 3 reflects our prior belief that admixture with Neanderthals did not occur in Africa and that the presence of Neanderthal alleles in Africa could reflect only more recent migration patterns. A total of 226 regions were identified that meet these additional criteria. We note in passing that the specific cutoffs used in criteria 1–3 are somewhat arbitrary, but our qualitative conclusions are unchanged under a range of similar criteria (results not shown).
We implemented a simple permutation test to assess the statistical significance of the observed difference in frequencies of Neanderthal regions in East and South Asians and Europeans. Specifically, we kept the presence/absence of Neanderthal regions for each individual constant and randomly permuted the geographic label (i.e., “European” vs. “East Asian”) of the sample 100,000 times. Similar analyses were used to compare the frequency of Neanderthal regions in Maasai vs. other sub-Saharan African samples.
Identifying putative Denisovan regions
Excluding the 226 Neanderthal regions identified above, we screened the remaining 2028 putative archaic regions for Denisovan admixture, using the same criteria as for Neanderthals. Thirty total regions fit these criteria.
Estimating local ancestry in the Maasai
We took the filtered Complete Genomics data described at the start of this section and estimated SNP allele frequencies separately in the 13 European samples and the 13 non-Maasai African samples. These were used as proxies for the (unknown) non-African and African ancestral populations. We then included only those SNPs with allele frequencies that differ by at least 0.3 in our analyses. We calculated the likelihood of each ancestral configuration (i.e., zero, one, or two alleles inherited from the non-African population) separately for each SNP. Then, over sliding windows of 1 Mb, we formed a composite likelihood by multiplying together all of the single-SNP likelihoods contained in the window and tabulated which ancestral configuration had the highest (composite) likelihood. For each SNP, we then used majority rule to make ancestry calls, using all windows containing the SNP in question. See Wall et al. (2011) for further details.
D-statistics and estimates of f
The D-statistics and estimates of f we computed are summarized in Figure 3 and Supporting Information, File S1, Table S1, Table S2, Table S3, Table S4, Table S5, Table S6, Table S7, Table S8, Table S9, Figure S1, Figure S2, Figure S3, Figure S4, Figure S5, Figure S6, Figure S7, and Figure S8. Several features of the results are notable. First, we find evidence for more Neanderthal admixture into the East Asian samples than into the European samples (P = 0.001)—consistently higher D values result when East Asians are compared to one of the African populations than when Europeans are compared (Figure 3A, Table S4), and the average D is positive when East Asians are compared to Europeans (Figure 3C, Table S5). In analysis B, comparisons with the South Asian samples are intermediate with respect to the European and East Asian samples but not in analysis A, indicating that the South Asian sample differs from the East Asian ones but the degree of similarity to Europeans remains to be established. Also, we find evidence for a small but significant amount of Neanderthal admixture into the Maasai genomes (P ∼ 0.03, Table S4). When compared to the Yoruba, the Maasai have a higher average D than the Luhya (Figure 3B, Table S4). When the Maasai are compared to all other African samples, the average D is positive (Figure 3D). In addition, when East Asians and Europeans are compared to the Maasai, the average D’s are somewhat lower than when they are compared to either the Yoruba or the Luhya. The P-values shown in Figure 3, A and B are from test 1 and those in Figure 3, C and D are from test 2.
Table S1, Table S2, and Table S3 show estimated values of f. The estimates of the admixture rate show that when we incorporate the Denisovan genome into our analysis, the admixture rate between East Asians and Neanderthals remains significantly higher than the admixture rate between Europeans and Neanderthals (P ∼ 0.001, Table S7). The Maasai remain significantly more genetically similar to the Neanderthals when compared to the Luhya (P ∼ 0.03, Table S7), but the observed significant difference for the D-statistic when comparing the Maasai and the Yoruba is not observed for the f-statistic (P ∼ 0.34, Table S7), which probably reflects the lower power of using f as a test statistic. The admixture rates for the South Asians give the same results as those for the D-statistic (Table S9).
Identifying “Neanderthal haplotypes”
Our new method for identifying introgressed Neanderthal fragments in human populations detected 226 different putative Neanderthal regions. The relative frequencies of these putative Neanderthal haplotypes in the 42 sampled modern human individuals then provide estimates of the relative contributions of Neanderthal DNA to the gene pools of contemporary human populations. We found that on average the “Neanderthal haplotypes” were at higher frequency in the East Asians than in the Europeans (9.6% vs. 6.4%; P = 3.0 × 10−4, permutation test), consistent with the D-statistic results presented in Figure 3 (Figure 4). We also found evidence for a small, but statistically significant, Neanderthal contribution to the genomes of the Maasai (P = 4.9 × 10−4), but did not find a significant difference in Neanderthal haplotype frequency between the East Asian and South Asian samples (P > 0.05).
Additional test of ancient population structure
As reviewed in the Introduction, there is already evidence against the hypothesis that the extra similarity of non-African populations to Neanderthals is accounted for by ancient population subdivision. To explore this point further, we took the 411 regions from our whole-genome analyses that were identified purely on the basis of their LD patterns (i.e., without using any information from the Neanderthal genome sequence). Then, for each non-African individual, we calculated the D-statistic for those regions where the individual contained a rare, diverged haplotype. If this haplotype were recently inherited from Neanderthals, we would expect the D values to be strongly positive. If instead there were no recent admixture between modern humans and Neanderthals, then there is no a priori reason why these regions would show D values significantly different from 0. Recombination acting over the past 300 Kyr would break up local patterns due to shared ancestral polymorphisms to scales <0.01 cM (i.e., <10 kb on average). The D values that we observe are strongly positive (average D = 0.594, compared with an average D = 0.068 for the whole genome), providing additional evidence that most of the unusual haplotypes from these 411 regions are indeed the result of recent introgression from the Neanderthal gene pool (P << 10−8, Figure 5).
Identifying “Denisovan haplotypes”
Excluding the 226 Neanderthal regions described above, we used the same criteria to identify regions likely inherited from Denisovans. We identified a total of 30 regions, all at low frequency, with no significant difference in frequency between populations.
Previous genetic studies have suggested that the Maasai may be an admixed population with a substantial proportion of non-African ancestry (Henn et al. 2011). If the non-African ancestry were due to recent (i.e., post-Neanderthal) admixture, then the observation of Neanderthal ancestry in the Maasai would not be unexpected. Alternatively, spatially explicit models of ancient population structure might explain the greater similarity between Maasai and Neanderthals relative to other sub-Saharan African groups (A. Manica, personal communication). One difference between these alternative explanations is what they predict about the patterns of similarity across the genomes of Maasai individuals. Under a model of recent admixture, we expect Maasai genomes to show large, distinct blocks of sequence with different genetic patterns, corresponding to blocks with non-African vs. African ancestry. The average size of the non-African blocks (in morgans) is roughly the inverse of the time (in generations) since admixture. In contrast, under a model of ancient admixture the similarity of Maasai genomes to the Neanderthal genome will be spread throughout the genome because the admixture happened much longer ago.
To distinguish between these two possibilities, we employed a composite-likelihood–based approach to identifying African and non-African regions of ancestry across the genomes of the four Maasai samples (Wall et al. 2011). Briefly, we used the European (CEU and TSI) and other African (YRI and LWK) samples (Table 1) to estimate allele frequencies in non-African and African ancestral populations and then estimated the number of alleles inherited from each ancestral population at each SNP in the genome. These extant samples may not be perfect proxies for the true ancestral populations, but the qualitative results presented below are likely to be valid.
In summary, we estimate an average of ∼30% non-African ancestry in each Maasai genome, and the sizes of the ancestral blocks are consistent with admixture that happened ∼100 generations ago (Figure 6A). We then partitioned each Maasai genome into regions with zero, one, or two inferred African alleles and calculated D separately for each partition. We found that the D values are significantly more negative with increasing numbers of inferred non-African alleles (P = 2.0 × 10−4; Figure 6B). This observation provides strong support for recent non-African gene flow into the Maasai, with the non-African alleles bringing with them low levels of Neanderthal ancestry.
Our results confirm and reinforce several conclusions about admixture between Neanderthals and the ancestors of modern humans. Using a much larger number of high-coverage genome sequences than were previously analyzed for this purpose and using two complementary methods of analysis (D-statistics and detection of introgressed Neanderthal segments), we confirm the conclusion of Meyer et al. (2012) that East Asians (Han Chinese and Japanese) are more similar to the published Neanderthal sequence than are Europeans. Because we have analyzed more modern human sequences than Meyer et al. (2012) did, we are able to show the extent of variation within both Asian and African populations. We also confirm the conclusions of Yang et al. (2012) and Sankararaman et al. (2012) that the similarity of both Europeans and East Asians to Neanderthals is the result of recent admixture and not ancient population subdivision. Finally, we used the high-coverage Denisova sequence of Meyer et al. (2012) to determine that the admixture rate (f) into East Asians is ∼40% higher than into Europeans.
We were not able to confirm the conclusion of Skoglund and Jakobsson (2011) that there was Denisovan admixture into East Asians. We did not detect any difference in the number of apparent Denisovan segments in Europeans and East Asians. The East Asian genomes analyzed, however, were from northern East Asia (Beijing and Tokyo), not from southern East Asia where Skoglund and Jakobsson found the strongest signal of admixture with Denisovans.
Our results and those of Meyer et al. (2012) imply that the relatively simple admixture scenario proposed by Green et al. (2010) needs to be altered. At least two separate episodes of admixture between Neanderthals and modern humans must have occurred, and at least one of those episodes must have occurred after the separation of the ancestors of modern Europeans and East Asians. Rather than have two distinct episodes of admixture, it seems more plausible that admixture took place over a protracted period 50–80 KYA. During that period the ancestors of Europeans diverged and subsequently experienced less admixture than the ancestors of East Asians. This scenario is consistent with the simulation models of Currat and Excoffier (2011) and Skoglund and Jakobsson (2011).
If this scenario is correct, the time of separation of the ancestors of modern European and East Asian populations is constrained. Since there is no archaeological record of Neanderthals in the past ∼30 Kyr, it follows that the separation of Europeans from East Asians had to have occurred before Neanderthals went extinct. Consequently, estimates of East Asian–European population divergence of <30 KYA (Gutenkunst et al. 2009; Gravel et al. 2011) are unlikely to be correct. This timeframe is also supported by a 40- to 50-KYA modern human fossil recently found in China (Fu et al. 2013).
Our two analyses yielded slightly different results for the Gujarati (South Asian) samples. However, it would not be surprising if the true level of Neanderthal ancestry in South Asians was intermediate between Europeans and East Asians because previous studies have shown gradients in genetic ancestry across Eurasia (Rosenberg et al. 2002).
Our finding of Neanderthal admixture into the Maasai was initially surprising, given the lack of evidence that Neanderthals ever crossed into Africa or that the ancestors of the Maasai were ever in the Middle East. Although direct contact between the two groups in the past is theoretically possible, our results are more consistent with a scenario involving recent admixture between the ancestors of the Maasai and one or more (historically) non-African groups with Neanderthal ancestry several thousand years ago. This interpretation is broadly consistent with recent findings of African admixture into Middle Eastern and Southern European populations during the same timescale (Moorjani et al. 2011) and a greater genetic similarity between East African and non-African samples than between West African and non-African samples (Tishkoff et al. 2009). Together these studies provide additional support for the hypothesis that admixture between genetically diverged groups is a common feature of human demographic history.
The new picture of human and Neanderthal ancestry that emerges from our results is almost certainly not complete, and our results suggest that intracontinental variation in levels of Neanderthal ancestry may be common. With the current rate of progress in whole-genome sequencing and the possibility of additional draft genomes from specimens of archaic individuals, we will soon learn more about the admixture process. In particular, the construction of “archaic admixture maps” detailing the distribution of archaic DNA segments in different modern human populations will help us to infer the timing, locations, and exact numbers of introgression events and the role that archaic admixture may have played in the evolution of the AMH genome.
This work was supported in part by National Institutes of Health grants R01-GM40282 (to M.S.), R01-HG005226 (to J.D.W. and M.F.H.), and T32 HG 00047 (training grant to M.A.Y.), as well as by National Science Foundation Graduate Research Fellowship Program Division of Graduate Education grant 1106400 (to M.A.Y.).
Communicating editor: A. Di Rienzo
- Received November 30, 2012.
- Accepted February 4, 2013.
- Copyright © 2013 by the Genetics Society of America