Multilocus Analysis of Introgression Between Two Sympatric Sister Species of Drosophila: Drosophila yakuba and D. santomea
Ana Llopart, Daniel Lachaise, Jerry A. Coyne


Drosophila yakuba is widely distributed in sub-Saharan Africa, while D. santomea is endemic to the volcanic island of São Tomé in the Atlantic Ocean, 280 km west of Gabon. On São Tomé, D. yakuba is found mainly in open lowland forests, and D. santomea is restricted to the wet misty forests at higher elevations. At intermediate elevations, the species form a hybrid zone where hybrids occur at a frequency of ∼1%. To determine the extent of gene flow between these species we studied polymorphism and divergence patterns in 29 regions distributed throughout the genome, including mtDNA and three genes on the Y chromosome. This multilocus approach, together with the comparison to the two allopatric species D. mauritiana and D. sechellia, allowed us to distinguish between forces that should affect all genes and forces that should act on some genes (e.g., introgression). Our results show that D. yakuba mtDNA has replaced that of D. santomea and that there is also significant introgression for two nuclear genes, yellow and salr. The majority of genes, however, has remained distinct. These two species therefore do not form a “hybrid swarm” in which much of the genome shows substantial introgression while disruptive selection maintains distinctness for only a few traits (e.g., pigmentation and male genitalia).

ACCORDING to the biological species concept (BSC), the coexistence of distinct entities in sympatry suggests a severe reduction in gene flow between them (Dobzhansky 1937; Mayr 1942; Coyne 1992; Coyne and Orr 1998). It is clear, however, that there is more introgression between species than early advocates of the BSC suspected (Coyne and Orr 2004). This observation of introgression has prompted the proposal of alternative species concepts. Mallet (1995), for example, proposed the “genotypic cluster species concept” (GCSC), in which factors beyond reproductive isolation, such as stabilizing selection or historical inertia, are claimed to maintain discrete species in one habitat. As Coyne and Orr (2004) noted, many of these “other factors” are in fact forms of reproductive isolation. The main difference between the BSC and many proposed alternatives—including the GCSC and Wu's (2001) “genic species concept”—is in how much gene flow between sympatric entities can occur while those entities still remain distinct. Sympatric, recognizably distinct clusters, for example, may differ at only a few loci while exchanging genes freely throughout the rest of the genome.

For several reasons it is important to determine the degree of gene exchange between sympatric or parapatric clusters. First, the evolutionary independence of such taxa depends on the degree to which they can exchange generally adaptive alleles. Second, we want to know whether specific regions of the genome introgress more readily than others. For example, regions linked to genes causing species-specific adaptations or hybrid sterility may be limited in their ability to move between species (Tucker et al. 1992; Machado et al. 2002). Chromosomal rearrangements that differ between species may also serve as “traps” for genes causing hybrid incompatibilities and thus also show limited introgression (Rieseberg et al. 1995; Noor et al. 2001; Machado et al. 2002; Navarro and Barton 2003). Third, it has been claimed that introgression can be a source of genetic variation that allows species to adapt to new environments and hence can serve as an engine of adaptation (Arnold 1997). Finally, we want to determine on the genic level the applicability of various species concepts: for example, Are the genomes of sympatric, hybridizing species largely impermeable to new variation, as the BSC might predict?

Answers to these questions have generally come from hybrid zones, which show in general that introgression, while more frequent than previously suspected, is limited between sympatric taxa (Coyne and Orr 2004). Unfortunately, there is a dearth of hybrid zones in the most genetically well-studied group, Drosophila. Lachaise et al. (2000), however, recently described a hybrid zone between the sister species Drosophila yakuba and D. santomea on the Island of São Tomé, a small volcanic island 280 km off the coast of Gabon. D. yakuba is widely distributed in sub-Saharan Africa and in the islands near the continent (including Madagascar), while D. santomea is endemic to São Tomé. The most striking morphological difference between D. yakuba and D. santomea is the pigmentation pattern: D. yakuba has the characteristic pattern of the D. melanogaster group (females' yellow abdomens are striped with black, while those of males have black tips), while D. santomea completely lacks this dark pigmentation (Lachaise et al. 2000; Llopart et al. 2002). Other traits, such as male genital morphology and the number of sex comb teeth, also distinguish these species (Lachaise et al. 2000; Coyne et al. 2004). Moreover, D. yakuba is highly polymorphic for chromosome inversions, with the right arm of chromosome 2 showing the most polymorphism (Lemeunier and Ashburner 1976). In contrast, D. santomea shows no polymorphism for chromosomal arrangements, and no fixed inversions distinguish D. yakuba and D. santomea (Lachaise et al. 2000). Molecular data suggest that D. santomea arose on the island after the colonization by its common ancestor with D. yakuba ∼400,000 years ago (Cariou et al. 2001; Llopart et al. 2002). Tentative molecular evidence suggests that the current presence of D. yakuba on the island reflects a more recent colonization (Cariou et al. 2001).

As expected from its status as an open forest/savannah species, D. yakuba usually occurs at low elevations on São Tomé, mainly in areas cleared by humans beginning in the 16th century. In contrast, D. santomea inhabits the wet forests found at higher elevations. On the largest mountain in the island, Pico de São Tomé, D. yakuba is found only below 1450 m, while D. santomea occurs between 1150 and 2024 m. Between 1150 and 1450 m—an area that coincides with the transition between agricultural land and virgin rainforest—the species coexist, with the abundance ratio of D. yakuba/D. santomea changing from 2 to 0.05 as one moves upward through the zone (Lachaise et al. 2000). Hybrids have been reported in this zone at a frequency of ∼1%; these were diagnosed as hybrids by their possession of intermediate genitalia and pigmentation. In the laboratory, these species show several forms of reproductive isolation, which include mate discrimination (Lachaise et al. 2000; Coyne et al. 2002), conspecific sperm precedence (Chang 2004), and sterility that conforms to Haldane's rule: F1 hybrid males are completely sterile and F1 hybrid females are partly or fully fertile (Coyne et al. 2004).

Here, we report the patterns of polymorphism and divergence between D. yakuba and D. santomea on the basis of the examination of a substantial number of genes spread throughout the genome. Our aim is to determine the extent of introgression for nuclear and mitochondrial genomes and to compare the amount and patterns of introgression with those predicted from laboratory data on the genetics of hybrid sterility. Because our multilocus approach investigates genome-wide variation, it allows us to distinguish between forces that should affect all genes and forces that should act on some but not all genes (e.g., introgression). This multilocus approach, pioneered by J. Hey and colleagues, has proved useful in revealing the pattern and extent of gene exchange in nature (Hey and Kliman 1993; Hilton et al. 1994; Wang et al. 1997; Kliman et al. 2000; Machado et al. 2002; Broughton and Harrison 2003; Ramos-Onsins et al. 2004).


Loci and flies analyzed:

We used D. yakuba or D. melanogaster sequences to design primers to sequence 45 nuclear genes (Table 1) in D. yakuba Taï18 and D. santomea STO.4 strains (Llopart et al. 2002). Polymorphism data in both species were collected for 28 out of 45 randomly selected nuclear genes and the mitochondrial region ND5-ND4 (see Figure 1), with a sample size ranging from 21 to 32 chromosomes depending on the loci (Table 2). The “sympatric” specimens of D. yakuba and D. santomea were captured in the hybrid zone (elevation 1250 m) in the Obo Natural Reserve in March 2003 on São Tomé Island. At the same time, “allopatric” individuals of D. yakuba and D. santomea were collected, respectively, in a garden outside São Tomé City (elevation 5 m) and on Pico Calvario (a shoulder on the main mountain of Pico de São Tomé at 1566 m). In each of these locations only one species is found, although on Pico Calvario D. yakuba occurs within a few kilometers. Upon collection, all flies were immediately preserved in absolute ethanol until DNA extraction was performed in the laboratory. For mtDNA and Y chromosome genes, we analyzed a more geographically diverse sample of isofemale lines. The D. yakuba lines include strains from the Ivory Coast (Taï18 and TM34), Cameroon (CAM), Gabon (GAB), and Principé Island (ANTON-1) and from the zones of allopatry (SJ1 and BAR2) and sympatry (BOSU, COST2, SA1, and OBAT5) with D. santomea on São Tomé Island. For D. santomea, the sample includes one allopatric isofemale line (CAR1566.6) from Pico Calvario (elevation 1566 m) and nine additional sympatric lines. Eight of these were collected in the hybrid zone on Pico de São Tomé (STO.4, STO.10, STO.15, STO.18, COST 1235.1, OBAT 1200.13, OBAT 1200.14, and LAGO 1482.11), and one (QUIJA 650.1) came from Rio Quija, in southwest São Tomé, an area that also harbors D. yakuba. More detailed information on some of these strains can be found in Coyne et al. (2002). We also used the isofemale strains D. mauritiana B (from Mauritius Island) and D. sechellia SY 001 (from the Seychelles archipelago) to distinguish between selective constraints and introgression as explanations for sequence similarity in regions of low frequency of crossing over. To assess introgression between D. yakuba and D. teissieri, we also sequenced the ND5-ND4 mitochondrial region and the Y chromosome genes in the Brazzaville 8 strain (collected in the Republic of the Congo).

Figure 1.—

Chromosomal locations of the loci included in the polymorphism survey. Cytological positions were inferred from D. melanogaster and the comparative salivary-gland banding maps of Lemeunier and Ashburner (1976). Solid circles symbolize the centromeric regions of the chromosomes. Loci in regions of reduced crossing over are indicated by asterisks.

View this table:

Synonymous (Ks) and nonsynonymous (Ka) substitutions per site between D. yakuba and D. santomea

View this table:

Polymorphism data summary

DNA extraction and sequencing:

We extracted DNA from single flies using the Puregene DNA isolation kit for paraffin-embedded tissue (Gentra Systems, Minneapolis) and performed PCR amplifications using ∼25 ng of genomic DNA. To ensure the specific amplification of the genes on the Y chromosome, we performed PCR reactions using genomic DNA extracted from females as a control. PCR products were purified using the Wizard MagneSil PCR clean-up system (Promega, Madison, WI) and sequenced directly with an ABI PRISM 3100 genetic analyzer (Applied Biosystems, Foster City, CA) after we performed cycle sequencing reactions using Big Dye 3.0 (Applied Biosystems). Both strands were sequenced. We edited sequences with the software Sequencher 3.0 (Gene Codes, Ann Arbor, MI) and aligned them using the ClustalX program (Thompson et al. 1997). Haplotypes were reconstructed (Stephens and Donnelly 2003) using the Phase 2.0 program (Stephens et al. 2001), excluding indels. We deposited all newly obtained sequences in GenBank, EMBL, and DDBJ database libraries under accession nos. AY804458AY804777, AY804780AY804885, and AY804888AY805047.

Data analysis:

Basic polymorphism analyses were performed using DnaSP 4.0 (Rozas et al. 2003). To determine whether regions under study were heterogeneous in their polymorphism-to-divergence ratios, and to test whether the global frequency spectrum of polymorphisms conforms to the neutral expectation, we used the HKA program (Hey and Kliman 1993). This program obtains the significance of the observed statistics (Math and D), using multilocus coalescent simulations.

To detect the presence of significant introgression between D. yakuba and D. santomea, we used two different strategies. In regions with nonreduced frequency of crossing over we fit the data to an isolation model (i.e., a model of genic patterns based on the assumption that there is no gene exchange between species) (Wakeley and Hey 1997), following Wang et al. (1997) and using the Wakeley-Hey (WH) program. This model assumes that all variation is neutral and that population size remains constant over time, but allows for differences in selective constraints among loci (Wang et al. 1997). The parameters of the model (θ for the ancestral and descendant populations and τ, the time since physical separation) are estimated by choosing values that most closely equate expectations with observations; hence tests based on this model are conservative because all shared variation is assumed to be ancestral polymorphism. A significant excess of shared variation is the unequivocal fingerprint of introgression in D. yakuba and D. santomea, for they most likely went through a period of allopatry and came in contact secondarily recently. In particular, we calculated a χ2-statistic on the basis of the difference between expected and observed numbers of shared polymorphisms: Math. We generated the null distribution of this statistic under the isolation model by multilocus coalescent simulations (Wang et al. 1997) with recombination, and we obtained the statistical significance of the observed Math by comparing it to the null distribution generated. We estimated recombination between sites (ρ = 4 Ner) within DNA fragments using the maximum-composite-likelihood method implemented in the Maxhap software (Hudson 2001). The estimates obtained ranged between 0.0025 and 0.17 in D. yakuba and between 0.0025 and 0.5 in D. santomea.

Loci on the tips and near centromeres of chromosomes show a reduced frequency of crossing over (Figure 1) in species of the D. melanogaster subgroup (Ashburner 1989; True et al. 1996). This reduction, usually associated with decreased levels of polymorphism, compromises the detection of shared variation. Thus in these regions and also in mtDNA and Y chromosome genes, our strategy to detect introgression was based on comparing divergence estimates between D. yakuba and D. santomea with those between D. mauritiana and D. sechellia. We derived these estimates for synonymous and nonsynonymous sites (Ks and Ka, respectively), as well as confidence intervals, using the program K-Estimator 5.5 (Comeron 1999). To estimate overall divergence between species pairs, we constructed a concatenated sequence for genes on each chromosome separately and also for all genes taken together as a single unit. To obtain maximum-likelihood estimates of the population migration rates we fit the polymorphism and divergence data on genes of reduced crossing over to an isolation with migration (IM) model (Hey and Nielsen 2004). This Markov chain Monte Carlo method assumes no intragenic recombination and yields estimates of multiple population parameters. Convergence of the Markov chain simulations was assessed by comparing the results and using different starting points.


Intraspecific nucleotide variation:

We collected polymorphism data in D. yakuba and D. santomea from 29 regions: 28 nuclear and 1 mitochondrial (Table 2). D. yakuba shows significantly higher levels of within-species variation than does D. santomea: the weighted average values of Watterson's (1975) estimator of 4Neμ (where Ne is the effective population size and μ is the neutral mutation rate) per site (θ) are 0.0055 for D. yakuba and 0.0044 for D. santomea (Wilcoxon signed rank test using individual genes; Z = −2.22, P = 0.026). This result is consistent with D. santomea having a smaller Ne, although this reduction of only 20% contrasts with a large difference in current population size inferred from the geographical distribution of both species.

In both species, silent variation (synonymous and noncoding) for genes on the X chromosome tends to be smaller than that for genes on the autosomes. The difference is close to the neutral expectation (a 25% reduction for the X chromosome compared to autosomes) based on their difference in Ne: θautosomal = 0.025 vs. θX = 0.015 for D. yakuba, and θautosomal = 0.018 vs. θX = 0.012 for D. santomea. Polymorphism in the mitochondrial region ND5–ND4 in D. yakuba and D. santomea is similar to that seen in D. melanogaster (Rand et al. 1994; Rand and Kann 1996). Consistent with previously published data from D. melanogaster (Zurovcova and Eanes 1999), in both D. santomea and D. yakuba we see extremely low levels of variation on the Y chromosome.

We also compared nucleotide variation between allopatric and sympatric populations of D. yakuba and D. santomea for all genic regions under study. If introgression has occurred recently in sympatric populations, one expects a certain degree of genetic differentiation between sympatric and allopatric populations of the different species. K*ST values (Hudson et al. 1992), a measure of population substructure, are consistently close to 0. Permutation tests (Hudson et al. 1992) show that only 5 of the 29 regions have significantly nonzero values of K*ST (salr and Kr in D. yakuba and bnb, Est6, and sara in D. santomea), and none of these remain significant after Bonferroni correction for multiple tests (Rice 1989). Even for these 5 regions, differences between allopatric and sympatric populations account only for 2.3–5.8% of the total molecular variance (Excoffier et al. 1992). Overall, our results show no genetic difference between sympatric and allopatric populations of D. yakuba and D. santomea on São Tomé Island. Consistently, the fraction of shared/total polymorphisms is similar for allopatric and sympatric flies (7.7 and 7.9%, respectively).

Tests for neutrality:

Under the neutral theory of molecular evolution, regions of the genome that evolve rapidly should also show high levels of intraspecific variation (Kimura 1983), a prediction that can be evaluated with the HKA test (Hudson et al. 1987). For data from several loci one can test the heterogeneity of the polymorphism-to-divergence ratio across all the regions under study (Hey and Kliman 1993). When applied to the 29 loci with polymorphism data in both D. yakuba and D. santomea, this test shows a significant departure from the neutral expectation (Math; see materials and methods for details). This departure is also observed when the five regions with significant population structure are excluded Math, and it is due primarily to regions located on tips and near centromeres of chromosomes. If these regions are excluded from the analysis, the data fit the neutral expectations Math.

We also used Tajima's D (Tajima 1989b) to determine whether the frequency of mutations in our sample is consistent with neutral expectations. Most genes in our data set show negative D-values in both species (implying an excess of low-frequency variants over that expected under the neutral theory), although none of these values are statistically significant after multiple test correction. However, a multilocus analysis (Hey and Kliman 1993) shows that in both species the mean D-value among loci departs from neutral expectations in a negative direction (D = −0.49, P = 0.009 for D. yakuba and D = −0.89, P < 0.0001 for D. santomea). We observe the same pattern after excluding regions that have a reduced frequency of crossing over (D = −0.59, P = 0.009 for D. yakuba, and D = −0.95, P < 0.0001 for D. santomea) or regions with significant population structure (D = −0.53, P = 0.007 for D. yakuba, and D = −1.03, P < 0.0001 for D. santomea). Significant negative values of D imply either purifying selection or population expansion (Tajima 1989a,b).

Analysis of introgression:

Speciation can be thought of as the fragmentation of an ancestral population into two descendant populations that, in time, acquire reproductive isolation (i.e., genetic fixed differences). If the two incipient species exchange genes, the accumulation of fixed differences is retarded and shared polymorphisms are introduced. Yet this same pattern is also expected in isolated incipient species, due to shared variants inherited from their common ancestor, particularly if speciation did not involve a strong bottleneck. There is also a third source of shared variation: independent parallel mutations, which may be a special problem in species that are very polymorphic.

To test for independent mutations as a factor that may mimic introgression, we calculated the expected number of recurrent mutations using a hypergeometric distribution conditioning on the number of polymorphisms in each species and the number of sites under analysis (Clark 1997; Kliman et al. 2000). In D. yakuba and D. santomea, 56 of the 719 polymorphisms are shared (Table 3), while the expectations are just 5.85 under a model in which variation is evenly distributed along all sites, and slightly more—13.01—when only silent sites are considered. The number of observed shared polymorphisms substantially exceeds both of these expectations (P < 1 × 10−6). Therefore a significant fraction of shared variation cannot be explained by parallel mutation and must be accounted for by either introgression or common ancestry.

View this table:

Shared and fixed variation between D. yakuba and D. santomea

To claim shared variation as evidence of introgression we ought to test it against a model that takes into account possible ancestral polymorphism. This approach based on detecting an excess of shared variation, however, is less informative in regions where polymorphism is severely reduced. In Drosophila, as in many other eukaryotes, regions of reduced frequency of crossing over show a strong reduction of intraspecific variation because of recurrent selection (Aguadé et al. 1989; Stephan and Langley 1989; Berry et al. 1991; Begun and Aquadro 1992; Langley et al. 1993; Dvorak et al. 1998; Kraft et al. 1998; Nachman et al. 1998; Przeworski et al. 2000). Thus to investigate introgression between D. yakuba and D. santomea we analyzed separately genes of nonreduced and reduced crossing over.

Regions of nonreduced crossing over:

To determine whether the number of shared polymorphisms exceeds the expectations estimated using the isolation model proposed by Wakeley and Hey (1997) we used the Math-statistic (see materials and methods). We performed two sets of tests. In the first, we contrast Math for each locus against the expected distribution of this statistic under the isolation model, and in the second we add the Math-values of all loci to assess whether the observed variance among loci differs from that expected under the isolation model. Tests of individual loci show a highly significant excess of shared polymorphisms in the salr region Math. The same tendency is observed for the sfl gene Math on the third chromosome, but this is not significant after the Bonferroni correction. salr is on the second chromosome and, as a consequence, the isolation model is also rejected when we test the whole chromosome Math. This result should not be interpreted as evidence for whole-chromosome exchange between D. yakuba and D. santomea, but as an indication that gene flow involving this chromosome is heterogeneous, with at least one locus having a number of shared polymorphisms that is not consonant with the background level. Despite the significant departure from the isolation model for salr and the second chromosome, the overall patterns of variation among all regions of nonreduced frequency of crossing over are compatible with the isolation model (Table 5), which suggests that introgression is not pervasive.

Regions of reduced crossing over:

Loci on the tips and near centromeres of chromosomes show a severe reduction of polymorphism in D. yakuba and D. santomea (Mann-Whitney test; Z = −3.76, P = 0.0002 for D. yakuba and Z = −4.31, P < 0.0001 for D. santomea). This is consistent with these regions having a reduced frequency of crossing over, but makes difficult the detection of introgression through shared variation. Divergence, however, can be used to examine gene flow, for one of the consequences of introgression is the reduction of interspecific differences in the regions being exchanged. Such a reduction is also expected under strong selective constraints. To distinguish between these two possibilities, we compared estimates of genetic divergence between D. yakuba and D. santomea with estimates from a pair of allopatric species in the same group: species in which gene flow is very unlikely. These species are D. mauritiana and D. sechellia. Each is endemic to an Indian Ocean island very far from the other (Mauritius and the Seychelles, respectively), and these well-studied species almost certainly arose after the independent colonization of the islands by their common ancestor with the mainland African species D. simulans (Figure 2).

Figure 2.—

Schematic of the phylogenetic relationships among the nine species of the D. melanogaster subgroup (Kliman et al. 2000; Lachaise et al. 2000; Parsch 2003).

Ideally, one would compare species pairs separated by similar genetic distances. (Using this criterion assumes that introgression is not so pervasive as to distort the genetic differences at all loci used to calculate divergence times.) To check that this was so for the species pairs used in our test, we therefore estimated divergence between D. yakuba and D. santomea using newly obtained sequences of 45 nuclear genes (Table 1). We also estimated divergence between D. mauritiana and D. sechellia, using published sequences of 25 genes (supplementary Table S1 at For D. yakuba and D. santomea, the estimated number of nonsynonymous substitutions per site, Ka, is 0.0029 [95% confidence intervals (C.I.) 0.0022–0.0035], while divergence at synonymous sites, Ks, is 0.044 (C.I. 0.039–0.048). For D. mauritiana and D. sechellia, estimates for Ka and Ks are 0.0083 (C.I. 0.0069–0.0097) and 0.047 (C.I. 0.04–0.053), respectively. Clearly the Ks values, which are critical in determining species age, are very similar for these species pairs. Thus, comparative analysis of interspecific divergence between these two pairs of species allows us to determine what genes—if any—are able to cross the species boundary between D. yakuba and D. santomea. Such introgression is revealed by a reduction of divergence compared to that seen between truly allopatric species.

We compared estimates of divergence in the 12 regions of reduced crossing over for D. yakuba and D. santomea with estimates in the same regions between D. mauritiana and D. sechellia (Table 4). MtDNA and genes on the Y chromosome have, a priori, similar abilities to reveal the genetic history of species, as they both experience an equivalent reduction in Ne compared to that in autosomal loci. However, mtDNA and the Y chromosome show strikingly different patterns of genetic divergence between D. yakuba and D. santomea. The number of substitutions in the Y chromosome is similar for the D. yakuba/D. santomea and D. mauritiana/D. sechellia comparisons (Table 4). In contrast, D. yakuba/D. santomea show a >60-fold reduction in mitochondrial divergence compared to that in D. mauritiana/D. sechellia (K = 0.0006 vs. K = 0.039, P < 0.0001 and Table 4). The reduction is >50-fold if only synonymous sites are considered. These results suggest that introgression of mtDNA has occurred between D. yakuba and D. santomea.

View this table:

Substitutions between D. yakuba-D. santomea and D. mauritiana-D. sechellia in regions with reduced frequency of crossing over

Analysis of population differentiation (Hudson et al. 1992) indicates that the mtDNA of D. yakuba is not genetically different from that of D. santomea (K*ST = −0.01, P = 0.54). Indeed, 9 out of 10 lines analyzed in D. santomea have the same mtDNA haplotype as do 5 out of 6 D. yakuba lines collected in São Tomé. In addition, mtDNA haplotypes of D. yakuba collected in mainland Africa are significantly different from those of D. yakuba and D. santomea lines collected on the island (K*ST = 0.072, P = 0.02), another hint of mtDNA introgression. As this observation implies, mainland and island strains of D. yakuba are also genetically different (K*ST = 0.88, P = 0.044). Consequently, gene genealogies based on mtDNA show alleles of D. santomea clustering with alleles of D. yakuba, a pattern that stands in strong contrast to that seen for the Y chromosome (Figure 3).

Figure 3.—

Gene genealogies from the Y chromosome and mtDNA. The reconstruction was done using the neighbor-joining algorithm (Saitou and Nei 1987). The numbers above the branches are bootstrap values based on 1000 replicates (>50%). Solid squares, D. yakuba; open circles, D. santomea.

Lachaise et al. (2000) also reported a high level of similarity among sequences of the mitochondrial gene cytochrome b in D. yakuba, D. santomea, and D. teissieri (Monnerot et al. 1990). We investigated whether this similarity could reflect an introgression of mtDNA between more than two species by examining synonymous divergence at nuclear and mtDNA genes among these species (supplementary Table S2 at Synonymous divergence at nuclear genes gives us an estimate of the times of these species splits, which we can then compare to divergence times estimated from mtDNA and Y chromosome genes. If there is introgression of mtDNA, one expects reduced divergence times for mtDNA compared to those for nuclear genes.

We sequenced the ND5ND4 region and two genes in the Y chromosome (Dhc-Yh3 and CG17629; we could not amplify Pp1Y1 in D. teissieri) in D. teissieri. In addition, we combined our data with sequences for 26 nuclear genes in D. teissieri and D. yakuba taken from GenBank. The overall estimate for synonymous and nonsynonymous divergence between these 26 nuclear genes for D. yakuba and D. teissieri is Ks = 0.087 (C.I. 0.079–0.094) and Ka = 0.017 (C.I. 0.015–0.019), respectively. Ks for the mitochondrial ND5–ND4 region between D. yakuba and D. teissieri is 0.011, the lowest value among all genes analyzed. Therefore, our results are consistent with a reduced mtDNA divergence between D. yakuba and D. teissieri compared to that in nuclear genes, but we also find fixed differences between the pair D. yakuba/D. santomea on one hand and D. teissieri on the other in the regions studied (4 for the ND5ND4 region of mtDNA, 32 for the Y chromosome). These results suggest that while there was ancient introgression of mtDNA between the ancestor of D. teissieri and that of D. santomea/D. yakuba, there has also been a more recent introgression of mtDNA between D. yakuba and D. santomea, presumably after D. yakuba experienced secondary contact with D. santomea on São Tomé.

For the remaining genes in regions of reduced crossing over, exon 2 of yellow is the only one showing significantly less divergence between D. yakuba and D. santomea than between D. mauritiana and D. sechellia after correcting for multiple tests (Rice 1989) (K = 0.001 vs. K = 0.0058, P < 0.0001). In fact, D. santomea shows no variation in this region: all sequences are identical to that of the most common haplotype in D. yakuba. Moreover, the multilocus HKA test (Hey and Kliman 1993) shows a significant heterogeneity of the polymorphism-to-divergence ratio across these eight regions when the Y chromosome and mtDNA loci are excluded Math. This heterogeneity is not observed when only the yellow region is excluded from the analysis Math. We therefore suggest that the lack of fixed differences in the yellow gene between D. yakuba and D. santomea is the main cause of the heterogeneity among these regions having reduced crossing over. The similarity between these species in the yellow region, however, is not observed in a fragment located ∼5 kb upstream from the start codon (data not shown). Indeed, we obtained the sequence for this additional 0.7-kb-long fragment in the same flies for which the second exon of yellow was studied, and found eight fixed differences between D. yakuba and D. santomea and no shared polymorphism (see discussion). The introgression in the yellow region may reflect its lack of effect on hybrid sterility (see discussion).

Finally, we obtained maximum-likelihood estimates (MLEs) of the population migration rates between D. yakuba and D. santomea by fitting polymorphism and divergence data on genes with reduced frequency of crossing over to the recently proposed IM model (Hey and Nielsen 2004). MLEs of population migration rates (M; M = 2 Nem, where m is the migration rate) are m1 = 0.0008 and m2 = 0.062 for D. santomea to D. yakuba and for D. yakuba to D. santomea, respectively. Ninety-five percent confidence intervals of m2 rule out a scenario with no introgression of genes from D. yakuba to D. santomea.

Autosomal vs. sex-linked genes:

The study of hybrid male sterility in D. yakuba and D. santomea revealed at least three male “sterility genes” that map to the X chromosome, as well as a significant effect of foreign Y chromosomes on sterility (Coyne et al. 2004). Thus, all else being equal, the movement of sex chromosomes across the hybrid zone should be limited. We therefore examined whether the Y and X chromosomes show less introgression between these species than do the autosomes. For the Y chromosome, none of the three genes studied shows shared variation and all show fixed interspecific differences (Table 3). Thus there is no evidence of recent gene flow between D. yakuba and D. santomea involving this chromosome. This result is, of course, consistent with the expectation that the Y chromosome is the least likely to introgress because it not only causes sterility by itself, but also is restricted to males, which are completely sterile as F1 hybrids and largely sterile in backcrosses (Coyne et al. 2004).

To compare the amount of introgression between the X chromosome and autosomes, we calculated the number of shared polymorphisms and fixed differences between D. yakuba and D. santomea for these chromosomes. We used only variation in regions with “normal” frequencies of crossing over to avoid artificially distorting the polymorphism-to-divergence ratio differentially for X chromosomes and autosomes. The ratio of shared to fixed differences is significantly lower for genes on the X chromosome than for those on the autosomes (10/41 for the X chromosome and 44/27 for the autosomes, G-test, G = 22.72, P < 0.0001). While this result is consistent with reduced introgression of the X chromosome, there is an alternative explanation: the difference in Ne between X chromosomes and autosomes is also expected to reduce both θ and the number of shared polymorphisms and to increase the number of fixed differences for X-linked compared to autosomal regions. To properly judge the likelihood of introgression, we must correct for this difference in Ne. To do so, we estimated the number of shared and fixed differences for the X chromosome and autosomes under the isolation model (9.47 and 28.79, respectively, for the X chromosome, and 44.52 and 39.20, respectively, for the autosomes). The deviation of the observed values from the expectations is estimated with a χ2-statistic, and the statistical significance is obtained by comparing the observed χ2, 8.98, with a null distribution of this statistic obtained by multilocus coalescent simulations. Note that in very closely related species, the variance of the outcome of the coalescent process has a substantial effect on the number of fixed differences. Although we observe a higher than expected number of fixed differences for the X chromosome (and a deficit of fixed differences on the autosomes), the deviation from expectation is not statistically significant (P = 0.18). Therefore, although the difference between the X chromosome and the autosomes is in the direction indicating less introgression of the former between D. yakuba and D. santomea, this difference is not significant.


We examined intra- and interspecific variation for 29 randomly selected loci regions of D. yakuba and D. santomea. Overall, the data are not compatible with an isolation model that assumes no gene flow between these species after they began to diverge (P = 0.0001, Table 5). Therefore, the genomes of these entities have not remained completely distinct, almost certainly because of past hybridization. Nevertheless, this gene flow has not been extensive: only 3 of the 29 regions (mtDNA and the nuclear regions containing the yellow and salr loci) show statistically significant evidence for introgression. Thus these species do not form a “hybrid swarm” on São Tomé in which there is pervasive introgression in much of the genome, while disruptive selection maintains distinctness for only a few traits (e.g., pigmentation and male genitalia). However, we must bear in mind that the introgression we are able to detect today is unlikely to reflect very recent gene flow, for neutral mutations require substantial time to reach detectable frequencies in a population (Kimura and Ohta 1972; Nei and Feldman 1972).

View this table:

Testing the isolation model in D. yakuba and D. santomea

Introgression in regions of nonreduced crossing over:

The salr gene, located in 2L, shows evidence of introgression between D. yakuba and D. santomea. Its level of shared polymorphism and the absence of fixed differences between species stand out against a chromosomal background practically devoid of shared variation. There is a strong reduction of the ratio of shared-to-exclusive polymorphisms on the right arm of the second chromosome, which has the most inversion polymorphism in D. yakuba, compared to that on chromosome 3 (2/70 vs. 29/219, G = 6.4, P = 0.011). These results suggest that gene flow is particularly restricted in 2R, an observation consistent with the hypothesis that genes contributing to reproductive isolation accumulate faster in chromosomal inversions than in colinear regions of the genome (Rieseberg et al. 1995; Noor et al. 2001; Navarro and Barton 2003). In agreement with this proposal, Machado et al. (2002) reported that regions that are distinct between D. pseudoobscura and D. persimilis are also located in chromosomal inversions. Clearly the effect of inversions on limiting gene flow is greatest if they are fixed between species, but polymorphic inversions may also contribute, to a lesser degree, to restricting gene flow. The combination of shared variation for salr on 2L and the absence of shared variation on 2R leads to the result that patterns of variation in the entire second chromosome are incompatible with the isolation model—a result not seen in the X or third chromosomes.

The isolation model is not rejected when all genes in regions of nonreduced crossing over are considered. The estimated values of the parameter θA of the isolation model (Table 5), however, suggest that the ancestral species of D. yakuba and D. santomea had an effective population size larger than that of either of the derived species, a result that is inconsistent with the observed negative values of D (i.e., population expansion). Thus, it is conceivable that the degree of introgression between D. yakuba and D. santomea may have been underestimated.

Introgression in regions with reduced crossing over:

Divergence between D. yakuba and D. santomea for the mitochondrial region ND5–ND4 is strongly reduced compared to both the rest of the genes in these species' genomes and the divergence between D. mauritiana and D. sechellia. This reduction is even more conspicuous when we consider that mitochondrial genes evolve 4.5–9 times faster than nuclear genes (Moriyama and Powell 1997) at synonymous sites and is a sound indication that D. yakuba and D. santomea have exchanged mtDNA. In addition, the D. yakuba and D. santomea mtDNA haplotypes found on São Tomé are more similar to each other than to mtDNA haplotypes found on the African mainland. We therefore conclude that the lack of fixed differences in the ND5–ND4 region between D. yakuba and D. santomea, together with the observation of shared haplotypes on the island, strongly suggests that these species have exchanged mitochondrial genomes.

There is ample evidence from many groups that mtDNA introgresses between species more frequently than does nuclear DNA (Smith 1992; Ferris et al. 1993; Bernatchez et al. 1995; Taylor and McPhail 2000; Shaw 2002; Ballard and Whitlock 2004). The reason for this pattern is not yet understood, but Hudson and Coyne (2002) and Coyne and Orr (2004) suggest two reasons. First, mitochondrial loci appear to have primarily “housekeeping” functions, such as production of tRNAs and enzymes used in respiration. Selection on such loci may be largely divorced from the external (but not the internal) environment, and therefore mitochondria from one species may function relatively well on the genetic background of a closely related species. In addition, even “neutral” nuclear DNA may often fail to introgress during hybridization because it is linked to other genes that are divergently selected between species.

Because mainland African populations of D. yakuba can be thought of as an outgroup, it is very likely that the direction of introgression of mtDNA has been from D. yakuba into D. santomea. (The alternative direction of introgression would require the unlikely scenario that D. santomea mtDNA introgressed into D. yakuba on São Tomé, and this mtDNA subsequently spread throughout D. yakuba on the African mainland.) This result is consonant with laboratory studies showing that, while D. yakuba and D. santomea produce hybrids in both directions, interspecific matings occur much more frequently between D. yakuba females and D. santomea males than vice versa (Coyne et al. 2002).

The mtDNA haplotype found in a single strain in D. santomea collected on the southwest part of São Tomé is identical to one D. santomea haplotype (though not the most frequent) detected on Pico de São Tomé, suggesting that D. yakuba haplotypes may be fixed on the entire island. The similarity of mtDNA between D. yakuba and D. santomea may have resulted from either genetic drift or selection following introgression. The presence of some polymorphisms in the mtDNA of the D. yakuba and D. santomea lines from São Tomé Island, as well as the detection of shared variation on this organelle, points to a neutral explanation. On the other hand, given the average time to fixation for neutral mutations (Kimura 1983), and the relatively short divergence time between these species, it remains plausible that selection may have played some role in the invasion of the D. yakuba mtDNA through a “transspecies-selective sweep” (Hilton et al. 1994; Stephan et al. 1998; Machado and Hey 2003). Under this scenario, an ancient selective sweep of D. yakuba mtDNA through both species was followed by the appearance of new mutations on mtDNA that later introgressed between the species. It is unlikely that this mitochondrial sweep could have been associated to the spread of a Wolbachia infection, as proposed in D. simulans (Turelli and Hoffmann 1995; Ballard 2004), because D. yakuba and D. santomea do not show cytoplasmic incompatibility when infected by natural strains of Wolbachia (Zabalou et al. 2004).

As with mtDNA, divergence in the yellow region between D. yakuba and D. santomea is reduced compared to that seen between D. mauritiana and D. sechellia. That this disparity is also due to introgression rather than to selection on amino acids is supported by the observed fivefold difference in Ks (Ks = 0.0029 for D. yakuba-D. santomea and Ks = 0.015 for D. mauritiana-D. sechellia). In addition, all D. santomea yellow haplotypes are identical to the most frequent haplotype seen in the D. yakuba sample.

Apart from introgression, there is one other hypothesis that could explain the pattern observed in yellow: stronger selective constraints at synonymous sites at yellow in D. yakuba and D. santomea than in the D. simulans clade. Indeed, Takano-Shimizu (1999) has suggested stronger selective constraints at yellow synonymous sites in the D. yakuba lineage compared to the D. melanogaster lineage (using D. orena as outgroup; Figure 2) on the basis of patterns of synonymous codon usage. However, there is no reduction in the number of synonymous substitutions in the D. yakuba lineage compared to the D. melanogaster lineage (Takano-Shimizu 1999). Moreover, synonymous polymorphism at yellow in D. yakuba is not drastically reduced compared to that in the other nuclear regions studied, with 14 of the 29 regions analyzed showing lower polymorphism than this locus (10/12 genes in regions of reduced crossing over and 4/12 in regions of nonreduced crossing over). Therefore, we conclude that the lack of divergence in the yellow locus between D. yakuba and D. santomea largely reflects introgression. This is also consistent with evidence that this region apparently does not contain genes affecting male sterility in D. yakuba/D. santomea hybrids (Coyne et al. 2004). However, we do not detect introgression in a region only 5 kb upstream from the start codon of yellow. It has been suggested that the telomere end of the D. yakuba X chromosome shows a substantial increase in recombination (∼14-fold higher) compared to that of the D. melanogaster X chromosome (Takano-Shimizu 1999). Thus, we tentatively propose that introgression is restricted to the coding region of the yellow gene, while a 5′ flanking region remains distinct. This in turn suggests that introgression can be a quite localized phenomenon on the chromosome (Wang et al. 1999; Ting et al. 2000; Takahashi et al. 2001).

Autosomal vs. sex-linked genes:

We also tested the hypothesis that there is less interspecific introgression of genes on the X chromosome than on the autosomes because of the large effect of the X chromosome on hybrid male sterility in D. yakuba and D. santomea (Coyne et al. 2004). If the X chromosome contributes a disproportionate effect on hybrid male sterility and inviability (Charlesworth et al. 1987; Coyne and Orr 1989), all else being equal, there should be less introgression for the X chromosome than for autosomes. The numbers of fixed differences between these species for the X chromosome and autosomes are higher and lower, respectively, than the expectations under the isolation model, but these differences are not statistically significant when tested by coalescent simulation. These differences, however, are in the direction expected if the X chromosome shows less introgression, and it is possible that in such closely related species the large variance of the coalescent process denies us the statistical power to demonstrate this.

Alternatively, there are two other explanations for a lack of difference in introgression of X-linked vs. autosomal genes. First, the large effect of the X chromosome may reflect not a difference in the number of genes causing sterility, but merely its expression of recessively acting sterility genes in males—the so-called “dominance theory,” for which there is substantial evidence (Coyne and Orr 2004). If the density of sterility genes is similar on all chromosomes, it is thus possible that “X effects” in males caused by recessivity alone (rather than a higher density of sterility genes) would not lead to a reduced rate of introgression. Theoretical work is needed to explore this possibility. However, recent experiments in D. simulans and D. mauritiana show that X-chromosomal sterility genes are not only partially recessive, but also more numerous than those on autosomes (Tao et al. 2003). If this were also the case in D. yakuba and D. santomea, one would indeed expect less introgression of X-linked vs. autosomal genes.

Alternatively, reduced introgression of the X chromosome caused by its larger effect on hybrid sterility may be balanced by the accumulation of autosomal genes contributing to other reproductive barriers, such as sexual or habitat isolation. Theoretical work by Orr and Betancourt (2001)(Orr and Betancourt 2001), for example, shows that when adaptation to a sudden change in environmental conditions is achieved through selection of existing genetic variation at initial mutation/selection equilibrium, genes in the X chromosome are expected to evolve less rapidly than those in autosomes.


We thank D. Charlesworth and two anonymous reviewers for comments and suggestions. We are also grateful to J. M. Comeron for helping us with different programs and to S. Elwyn and S. Powers for their assistance in the laboratory. We thank A. Chang for collecting flies, S. Sousa (Conservation et utilization rationale des ECOsystèmes Forestiers D'Afrique Centrale, ECOFAC) for assistance in the field, and the Government of São Tomé and Principé for allowing us to work in the Obo National Park. This work was supported by National Institutes of Health grant GM58260 to J. A. Coyne.


  • Communicating editor: D. Charlesworth

  • Received July 16, 2004.
  • Accepted June 9, 2005.


View Abstract