Patterns of DNA Sequence Variation Suggest the Recent Action of Positive Selection in the janus-ocnus Region of Drosophila simulans
John Parsch, Colin D. Meiklejohn, Daniel L. Hartl


Levels of nucleotide polymorphism in three paralogous Drosophila simulans genes, janusA (janA), janusB (janB), and ocnus (ocn), were surveyed by DNA sequencing. The three genes lie in tandem within a 2.5-kb region of chromosome arm 3R. In a sample of eight alleles from a worldwide distribution we found a significant departure from neutrality by several statistical tests. The most striking feature of this sample was that in a 1.7-kb region containing the janA and janB genes, 30 out of 31 segregating sites contained variants present only once in the sample, and 29 of these unique variants were found in the same allele. A restriction survey of an additional 28 lines of D. simulans revealed strong linkage disequilibrium over the janA-janB region and identified six more alleles matching the rare haplotype. Among the rare alleles, the level of DNA sequence variation was typical for D. simulans autosomal genes and showed no departure from neutrality. In addition, the rare haplotype was more similar to the D. melanogaster sequence, indicating that it was the ancestral form. These results suggest that the derived haplotype has risen to high worldwide frequency relatively recently, most likely as a result of natural selection.

THE comparison of DNA sequences both between and within species provides valuable information for understanding the evolutionary forces affecting genetic loci. Interspecific comparisons are useful for measuring rates of molecular evolution, inferring functional domains that are under strong selective constraint, and understanding processes such as gene duplication. Intraspecific comparisons reveal the standing genetic variation within a population and the microevolutionary forces that have shaped such variation. Previously we investigated the rates of molecular evolution of three paralogous male-reproductive genes, janusA (janA), janusB (janB), and ocnus (ocn), in the Drosophila melanogaster species subgroup (Parsch et al. 2001). The three genes are the result of two duplication events. The initial duplication of an ancestral sequence produced the janA and janB genes and clearly predates the divergence of the D. melanogaster and D. obscura species groups (Yanicostas et al. 1995), which is estimated to have occurred 25 million years ago (mya) (Russo et al. 1995). The subsequent duplication of janB to produce ocn appears to have occurred after the divergence of the D. melanogaster and D. obscura species groups but prior to divergence of the D. melanogaster species subgroup (Parsch et al. 2001), which occurred ~10 mya (Russo et al. 1995). Our results indicated significant heterogeneity in rates of evolution (as measured by the ratio of the nonsynonymous and synonymous substitution rates, dN/dS) among the three genes, suggesting that each gene has evolved under different selective constraints following duplication. In addition, all three genes showed a faster rate of evolution than genes encoding metabolic enzymes. This result was consistent with a general pattern of increased evolutionary rates in genes with reproductive function (Civetta and Singh 1998).
Some reproductive genes, such as the Drosophila accessory gland protein gene Acp26Aa, show evidence for positive selection by a dN/dS ratio that is significantly greater than 1 (Tsaur and Wu 1997). This ratio was much less than 1 for janA, janB, and ocn, and thus there was no evidence for positive selection by this strict criterion. However, a number of powerful statistical techniques have been developed to detect patterns of selection from intraspecific DNA polymorphism data. For this reason, we chose to investigate intraspecific variation in the janA, janB, and ocn genes.

In species of the D. melanogaster species subgroup, janA, janB, and ocn lie in tandem within a 2.5-kb region of the right arm of chromosome 3 (Figure 1). janA produces two alternatively spliced transcripts, one that is specific to testes and another that is found in various tissues and in both sexes (Yanicostas et al. 1989). The two janA transcripts differ in their 5′ untranslated regions (UTRs) and their translation begins at different AUG initiation codons, with initiation of the sperm-specific polypeptide occurring 48 bp downstream of the general initiation site (Yanicostas et al. 1989). The 3′ UTR of janA overlaps with the 5′ UTR of janB and the beginning of the janB protein-encoding region (Yanicostas et al. 1989; Figure 1). Despite this overlap, both janA and janB produce monocistronic transcripts that are controlled by independent promoters (Yanicostas and Lepesant 1990). janB and ocn produce only testis-specific transcripts (Yanicostas et al. 1989; Parsch et al. 2001). The ocn transcriptional unit lies ~250 bp downstream from the janB polyadenylation site, and there is no overlap between the janB and ocn transcripts (Figure 1). The janB 5′ UTR contains translational control elements that have been shown to restrict translation to the postmeiotic stages of sperm development (Yanicostas et al. 1995). The high degree of similarity between the janB and ocn 5′ UTR sequences suggests that ocn translation is under similar post-transcriptional regulation (Parsch et al. 2001).

Figure 1.

Diagram of the janus-ocnus genomic region. The relative chromosomal location of the three genes is shown at the top, with coding regions represented as solid boxes. A bracket indicates the region that was PCR amplified and sequenced in this study. The janA, janB, and ocn transcriptional units are shown below. Solid boxes represent coding regions, open boxes represent untranslated regions, and lines represent introns.

In this article, we report DNA sequence variation in the janA-ocn region of D. simulans. We find reduced levels of polymorphism at these loci relative to other third chromosome loci surveyed previously. More strikingly, we find two highly divergent haplotypes in a 1.7-kb region spanning the janA and janB genes. The distribution of variants at segregating sites is shown to differ significantly from the neutral expectation by several statistical tests. These results are consistent with a model of genetic hitchhiking and suggest the recent action of positive selection in this region of the genome.


Fly stocks: Each D. simulans line was derived from a single wild-caught female and maintained by brother/sister mating for >50 generations. The lines were collected from various geographic locations and at various times and were kindly provided to us by P. Capy and Y. Tao. The Canton-S strain of D. melanogaster was used as an outgroup. Genomic DNA was prepared from a single male of each line as described previously (Parsch et al. 2001).

PCR and DNA sequencing: The janA-ocn region was PCR amplified as a single 2.4-kb fragment from genomic DNA using primers and amplification conditions described in Parsch et al. (2001). The amplified region contained the complete coding sequences of janA and janB and a large portion of the ocn coding sequence extending into exon 3 (Figure 1). PCR products were cloned following the protocol of the TOPO TA cloning kit (Invitrogen, Carlsbad, CA). Plasmid DNA was then purified by the alkaline lysis procedure (Sambrook et al. 1989) and used as a template for DNA sequencing. Alternatively, PCR products were purified using the QIAquick PCR purification kit (QIAGEN, Valencia, CA) and used directly as sequencing templates. DNA sequencing was performed with the dye terminator cycle sequencing kit (Applied Biosystems, Foster City, CA), using the amplification primers and gene-specific internal primers (Parsch et al. 2001). In addition, universal M13 forward and reverse primers were used for sequencing plasmid templates. Sequencing gels were run on an ABI 373 automated sequencer. DNA was sequenced on both strands and a minimum of either two independently cloned plasmid templates or one plasmid template and one PCR template were sequenced for each line. Additional clones or PCR templates were sequenced when necessary to resolve ambiguities. We did not encounter any heterozygous positions within the sequenced regions. DNA sequences have been submitted to the GenBank database under accession nos. AF393330–AF393368.

Restriction analysis: Restriction enzymes and buffers were supplied by New England Biolabs (Beverly, MA). The following enzymes were used: BstYI, RsaI, MluI, and FokI. For restriction analysis, the janA, janB, and ocn genes were amplified separately from each D. simulans line and an aliquot of the undigested product was run on a 1% agarose gel to ensure correct amplification. Five microliters of PCR product was then digested using the manufacturer's buffer and 2–6 units of enzyme. Digests were carried out at 37° for 2 hr. Digestion products were separated on 2% agarose gels, which allowed the unambiguous scoring of the presence or absence of a particular restriction site. Separate digests were performed for each restriction enzyme.
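As an illustration of how a single nucleotide difference can create or abolish a restriction site, the following minimal Python sketch scans both strands of a sequence for the recognition sequences of the four enzymes listed above. The recognition sequences are the standard ones for these enzymes (R = A or G, Y = C or T). The function names and the short test sequences are hypothetical and are not taken from the janA-ocn data.

```python
import re

# Recognition sequences written 5' to 3'; R = purine (A/G), Y = pyrimidine (C/T).
RECOGNITION = {
    "BstYI": "RGATCY",
    "RsaI": "GTAC",
    "MluI": "ACGCGT",
    "FokI": "GGATG",  # asymmetric, so both strands must be scanned
}
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def has_site(seq, enzyme):
    """True if either strand of seq contains the enzyme's recognition sequence."""
    pattern = "".join(IUPAC[base] for base in RECOGNITION[enzyme])
    seq = seq.upper()
    return bool(re.search(pattern, seq) or re.search(pattern, reverse_complement(seq)))


def site_change(allele_a, allele_b, enzyme):
    """Describe how the site status changes going from allele_a to allele_b."""
    a, b = has_site(allele_a, enzyme), has_site(allele_b, enzyme)
    if a == b:
        return "no change"
    return "site lost" if a else "site gained"


# Hypothetical example: an A-to-G change inside GTAC removes an RsaI site.
print(site_change("TTGTACTT", "TTGTGCTT", "RsaI"))
```

Scoring each line for presence or absence of a site in this way corresponds to reading the cut or uncut band pattern on the gel.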

Sequence analysis: Standard DNA polymorphism analyses and coalescent simulations to determine the probabilities of the observed number of haplotypes, haplotype diversity, and Tajima's (1989) D statistic were performed using DnaSP 3.50 (Rozas and Rozas 1999). Coalescent simulations to determine the statistical significance of Fay and Wu's (2000) H statistic were performed using a program provided by J. Fay. The haplotype test of Hudson et al. (1994), which determines the probability of observing a subset of i alleles with j or fewer segregating sites given a total sample of n alleles with S segregating sites, was performed using a program provided by J. Braverman. Values of i and j were chosen to produce the most extreme subset possible from the observed data. The probability was corrected for the a posteriori choice of i and j by including the probability of all more extreme configurations theoretically possible. For all of the above, 10,000 random coalescent simulations (Hudson 1990) were performed under the conservative assumption of no recombination. The number of segregating sites was fixed at the observed value.
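The simulation scheme described above (neutral genealogies without recombination, with the number of segregating sites fixed at the observed value) can be sketched in a few lines of Python. This is not the code of DnaSP or of the programs provided by J. Fay and J. Braverman; it is a minimal illustration under the standard neutral coalescent, and all function names are hypothetical. The example statistic is the mean number of pairwise differences. Because the sample size and the number of segregating sites are held fixed, ranking simulated samples by this quantity gives the same ordering as ranking them by Tajima's D.

```python
import random


def simulate_derived_counts(n, num_sites, rng):
    """Simulate one neutral genealogy (no recombination) for n alleles and
    drop num_sites mutations on it; return the derived-allele count per site."""
    lineages = [1] * n          # sampled alleles descending from each lineage
    current_len = [0.0] * n     # length accumulated by each active branch
    branch_sizes, branch_lengths = [], []
    while len(lineages) > 1:
        k = len(lineages)
        # Waiting time to the next coalescence, in coalescent units.
        wait = rng.expovariate(k * (k - 1) / 2.0)
        current_len = [x + wait for x in current_len]
        i, j = rng.sample(range(k), 2)
        for idx in (i, j):      # the two merging branches are now complete
            branch_sizes.append(lineages[idx])
            branch_lengths.append(current_len[idx])
        merged = lineages[i] + lineages[j]
        keep = [m for m in range(k) if m not in (i, j)]
        lineages = [lineages[m] for m in keep] + [merged]
        current_len = [current_len[m] for m in keep] + [0.0]
    # Each mutation falls on a branch with probability proportional to its length.
    return rng.choices(branch_sizes, weights=branch_lengths, k=num_sites)


def pairwise_diversity(counts, n):
    """Mean number of pairwise differences from per-site variant counts."""
    return sum(2.0 * c * (n - c) for c in counts) / (n * (n - 1))


def lower_tail_p(observed, n, num_sites, reps=10000, seed=1):
    """Fraction of simulated samples whose statistic is <= the observed value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        counts = simulate_derived_counts(n, num_sites, rng)
        if pairwise_diversity(counts, n) <= observed:
            hits += 1
    return hits / reps
```

A call such as `lower_tail_p(observed_pi, n=8, num_sites=31)` would then estimate a one-tailed probability for an observed value; other statistics can be substituted for `pairwise_diversity` in the same loop.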

Figure 2.

Segregating sites in the original sample of eight D. simulans lines (s1–s8) from a worldwide distribution. m1 represents the D. melanogaster outgroup sequence. Dots indicate a match to the s1 sequence. Gaps are indicated by dashes. The dash at position 2188 represents a 3-bp deletion beginning at 2188.


Nucleotide polymorphism in the janA-ocn region: We initially surveyed levels of nucleotide polymorphism in the janA-ocn region of eight D. simulans lines from a worldwide distribution. This survey revealed a total of 44 single nucleotide polymorphisms (SNPs) and two insertion/deletion (indel) polymorphisms (Figure 2). All of the polymorphisms were at silent or noncoding sites, with the exception of a C/A polymorphism at position 93. Interestingly, this polymorphism changes the sperm-specific downstream initiation codon of janA from AUG (Met) to CUG (Leu). This change presumably eliminates the sperm-specific form of the janA polypeptide in line s2. However, s2 males appear phenotypically normal and are completely fertile (our unpublished results).

The most striking feature of the data is the distribution of variants at segregating sites over the janA-janB region. In this region there are a total of 31 segregating sites. Variants at 30 of these sites are unique within the sample (singletons), and 29 of the singletons are found in line s3. The ocn gene also shows an unusual distribution of variation. Five of the eight alleles are identical and match the s1 sequence. Of the three remaining alleles, two are identical to each other (s5 and s7) but differ from s1 at 11 sites. The final allele, s8, differs from s1 at 8 sites but also differs from s5 and s7 at 7 sites. Interestingly, the rare variants in ocn occur in different lines than the rare variants in janA and janB. Because many of these rare variants match the D. melanogaster sequence, we can discount the possibility that they are all new mutations and can infer a recombination event within the intergenic region between janB and ocn (Figure 2).

Several statistical tests were applied to the data to determine whether the observed distribution of variants at segregating sites differed from the neutral expectation (Table 1). Only SNPs were considered in the calculation of statistics presented in Table 1 and in the analyses below. All tests were applied separately to each gene and to the region as a whole. In addition, due to the strong linkage disequilibrium between variants in the janA and janB genes, we applied the tests to the combined janA-janB region. First, we tested for a departure from neutrality in the frequency distribution of variants at segregating sites using Tajima's (1989) D statistic. We obtained a significantly negative value of D for janA, janB, and the combined janA-janB region (Table 1). This indicates an excess of low-frequency variants and can be explained by the large number of sites at which s3 differs from the other alleles. Tajima's D is positive, although not significantly so, for ocn, where many of the variants are in intermediate frequency.
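To make the logic of the test concrete, the sketch below computes Tajima's (1989) D from the number of alleles sampled and the count of one variant at each segregating site. It is a minimal illustration rather than the DnaSP implementation, and the function name is hypothetical. The example input mimics the janA-janB pattern of 30 singletons among 31 segregating sites in eight alleles; the frequency assigned to the single nonsingleton site (two copies) is an assumption made only for the example, so the result should not be read as a value from Table 1.

```python
import math


def tajimas_d(variant_counts, n):
    """Tajima's D from per-site counts of one variant in a sample of n alleles."""
    s = len(variant_counts)
    if s == 0:
        raise ValueError("at least one segregating site is required")
    # Mean number of pairwise differences (pi), summed over sites.
    pi = sum(2.0 * c * (n - c) for c in variant_counts) / (n * (n - 1))
    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # D compares pi with Watterson's estimate S / a1.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# 30 singletons plus one site assumed (for illustration) to be at count 2.
example_counts = [1] * 30 + [2]
print(tajimas_d(example_counts, n=8))
```

An excess of singletons pulls the pairwise diversity below the value expected from the number of segregating sites, which is what drives D negative in a sample of this shape.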

Table 1.

Summary statistics for original sample (eight alleles)

Table 2.

Restriction survey of a worldwide sample of D. simulans

Two haplotype tests implemented in the DnaSP computer program (Rozas and Rozas 1999) were applied to our data. These test for either a reduction in the number of haplotypes or in haplotype diversity by comparing the observed data to the results of random coalescent simulations and are similar to the tests proposed by Depaulis and Veuille (1998). Our results indicate a significant paucity of haplotypes at janA and in the combined janA-janB region (Table 1). We also find a significant reduction in haplotype diversity at both janA and ocn and in the region as a whole (Table 1). In addition, the haplotype test of Hudson et al. (1994) revealed a highly significant departure from the neutral expectation for janA, janB, and the combined janA-janB region (Table 1). This indicates that there are fewer segregating sites within the common haplotype than would be expected under a neutral equilibrium model.
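The two quantities on which these tests are based, the number of distinct haplotypes and the haplotype diversity, can be computed as in the following sketch. The diversity formula used here is Nei's unbiased estimator, n/(n − 1) × (1 − Σp²); whether this matches the exact definition implemented in DnaSP is an assumption, and the function name is hypothetical. The example uses the ocn configuration described above (five identical alleles, two alleles identical to each other, and one distinct allele), with arbitrary labels standing in for the sequences.

```python
from collections import Counter


def haplotype_summary(haplotypes):
    """Return (number of distinct haplotypes, haplotype diversity)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two alleles")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    # Nei's unbiased estimator (assumed here, see the text above).
    diversity = n / (n - 1) * (1.0 - sum_sq)
    return len(counts), diversity


# Labels stand in for full sequences: five copies of one haplotype, two of
# a second, and one of a third, as described for ocn.
ocn_like = ["A", "A", "A", "A", "A", "B", "B", "C"]
k, hd = haplotype_summary(ocn_like)
print(k, round(hd, 3))
```

In the tests themselves these observed values are compared with their distribution across simulated neutral samples that have the same sample size and number of segregating sites.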

Finally, we compared the frequency spectrum of derived variants at polymorphic sites to the neutral expectation using the H statistic of Fay and Wu (2000). A significantly negative value of H indicates that derived variants are in higher frequency than expected under a neutral mutation-drift model. Since derived mutations are expected to increase in frequency when linked to a positively selected site, this can be used as a test for genetic hitchhiking (Fay and Wu 2000). Our results indicate that H is significantly negative for janA and janB and also for the combined janA-janB region and the entire janA-ocn region (Table 1). Although H is not significantly negative for the ocn gene by itself, the common ocn haplotype does show the derived state at 8 of 13 SNP sites and at both of the indel sites (Figure 2). These results suggest that the haplotype structure observed in this region may be explained by previously rare variants being driven to high frequency due to their linkage with a positively selected site.
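The H statistic can be written as the difference between two estimators of θ: the mean number of pairwise differences, and θH, which weights each site by the square of the frequency of its derived variant. The sketch below is a minimal illustration of that definition and is not the program provided by J. Fay; the function name and the example counts are hypothetical. The derived count at each site is assumed to have been polarized with an outgroup, as was done here with D. melanogaster.

```python
def fay_wu_h(derived_counts, n):
    """Fay and Wu's H = pi - theta_H from derived-variant counts per site."""
    norm = n * (n - 1)
    # pi: each site contributes 2 * i * (n - i) / (n * (n - 1)).
    pi = sum(2.0 * i * (n - i) for i in derived_counts) / norm
    # theta_H: each site contributes 2 * i^2 / (n * (n - 1)).
    theta_h = sum(2.0 * i * i for i in derived_counts) / norm
    return pi - theta_h


# Hypothetical sample of eight alleles: at 20 sites the derived variant is
# carried by seven alleles, and at 9 sites it is a singleton.
print(fay_wu_h([7] * 20 + [1] * 9, n=8))
```

Sites at which a single allele retains the ancestral state, so that the derived variant is present in seven of eight alleles, contribute strongly negative terms, whereas derived singletons contribute small positive terms.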

Table 3.

Linkage disequilibrium between polymorphic restriction sites

Restriction survey: To further investigate the haplotype structure in this region of the genome, we surveyed restriction site polymorphism in an additional 28 lines collected from a worldwide distribution. On the basis of our initial sequencing, we chose five polymorphisms that resulted in either the gain or loss of a restriction site. The first two polymorphisms [BstYI(246) and RsaI(1291)] span the janA-janB region and distinguish the s3 allele from all others. Furthermore, these were sites at which the s3 allele matched the D. melanogaster outgroup sequence, suggesting that the rare variant represents the ancestral state. We also surveyed an RsaI polymorphism at site 1452. This is the only site over the janA-janB region that is not a singleton and represents a derived polymorphism segregating within the common haplotype (Figure 2). The final two restriction site polymorphisms [MluI(1836) and FokI(2164)] span the ocn gene. The former polymorphism distinguishes the s5 and s7 alleles from all others, while the latter distinguishes s5, s7, and s8 from all others. In both cases, the less frequent variant matches the D. melanogaster sequence. A summary of restriction site polymorphism is shown in Table 2. Over the janA-janB region, we found 27 alleles matching the common haplotype and 7 alleles matching the rare haplotype (i.e., a restriction pattern identical to s3). Two alleles (s24 and s35) appear to be recombinants. Overall, the strongest linkage disequilibrium was between site 246 in janA and site 1291 in janB (Table 3). The restriction pattern at these two sites was used as a diagnostic to classify the alleles as either haplotype 1 or 2 (Table 2).
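Pairwise linkage disequilibrium between two restriction sites can be summarized from the counts of the four two-site haplotypes. The sketch below computes the disequilibrium coefficient D and the squared correlation r²; it is not necessarily the measure or test reported in Table 3, and the function name is hypothetical. The example uses the 27 common and 7 rare haplotypes reported above for sites 246 and 1291. How the two recombinant alleles are configured at these sites is not stated in the text, so one of each recombinant type is assumed purely for illustration.

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r^2) for two biallelic sites from two-site haplotype counts."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_A = (n_AB + n_Ab) / total      # frequency of variant A at the first site
    p_B = (n_AB + n_aB) / total      # frequency of variant B at the second site
    d = n_AB / total - p_A * p_B
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


# 27 common and 7 rare haplotypes (from the text); the split of the two
# recombinants into one of each type is an assumption for this example.
d, r2 = linkage_disequilibrium(n_AB=27, n_Ab=1, n_aB=1, n_ab=7)
print(round(d, 3), round(r2, 3))
```

With no recombinant haplotypes the two sites would be in complete association and r² would equal 1.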

Nucleotide polymorphism in the rare haplotype: The six additional alleles matching the restriction pattern of haplotype 2 were completely sequenced over the janA-ocn region and compared to the original s3 sequence. This revealed a total of 71 SNPs and 3 indel polymorphisms (Figure 3). All of the polymorphisms occurred at silent or noncoding sites. Levels of nucleotide polymorphism within haplotype 2 were typical of D. simulans autosomal loci (Moriyama and Powell 1996; Table 4), and we found no significant departure from neutrality by any of the statistical tests described above when applied to each gene separately, the entire region, or to the combined janA-janB region (data not shown). The amount of polymorphism in the janA and janB genes within haplotype 2 is nearly 30-fold greater than that within haplotype 1 (Table 4), suggesting that haplotype 2 is ancestral. The ancestral state of haplotype 2 is also supported by the neighbor-joining tree shown in Figure 4, where the haplotype 1 alleles form a single clade within the haplotype 2 alleles. The separate evolutionary histories of these alleles over the janA-janB and ocn genes further indicate recombination between the janB and ocn genes. For example, the s3 allele differs substantially from the common haplotype over janA and janB but is identical to the most common haplotype at ocn (Figure 4).


Our survey of nucleotide polymorphism in the janA-ocn region of D. simulans reveals two noteworthy features. First, we detect strong linkage disequilibrium between variants in the janA and janB genes, which results from the segregation of variation in two divergent haplotypes. There are 13 fixed differences between the two haplotypes. The common haplotype, designated here as haplotype 1, has an estimated frequency of 75% and is present in worldwide populations. The rare haplotype, designated as haplotype 2, is present in several geographically distinct regions but appears to be more frequent in populations from central/southern Africa and the Pacific rim (Table 2). A recent report of nucleotide variation in the janus region of three D. simulans lines from Kenya (Kliman et al. 2000) is consistent with our results, as all three matched haplotype 2 at the diagnostic sites. The second noteworthy feature of our data is that levels of nucleotide polymorphism among alleles of the common haplotype are greatly reduced in comparison to alleles of the rare haplotype. The latter show levels of polymorphism typical for D. simulans autosomal genes, while the former show a nearly 30-fold reduction in polymorphism. Previous surveys of DNA sequence variation in D. melanogaster have revealed nonneutral haplotype structures at a number of loci, including Sod (Hudson et al. 1994), white (Kirby and Stephan 1995, 1996), Suppressor of Hairless (Depaulis et al. 1999), Fbp2 (Bénassi et al. 1999), and the region spanning the proximal breakpoint of the chromosomal inversion In(2L)t (Andolfatto et al. 1999). In D. simulans, nonneutral haplotype structures have been reported for Pgd (Begun and Aquadro 1994), runt (Labate et al. 1999), G6pd and vermilion (Hamblin and Veuille 1999), and the In(2L)t breakpoint (Andolfatto and Kreitman 2000). In addition, Andolfatto and Przeworski (2000) report that for an unexpectedly high number of D. melanogaster and D. simulans loci that have been surveyed, there is a discordance between the recombination rate inferred from population surveys and that from experimental mapping, with the experimental rates being higher. This can be interpreted as a genome-wide excess of linkage disequilibrium, although note that this criterion for detecting linkage disequilibrium is not as strict as many of the haplotype tests that assumed no recombination. Below we consider some genetic and evolutionary forces that may affect the distribution of variants at segregating sites in a population and discuss whether or not they can explain the patterns observed in the janA-ocn region.

Figure 3.

Segregating sites in seven D. simulans lines matching the restriction pattern of haplotype 2. m1 represents the D. melanogaster outgroup sequence. Dots indicate a match to the s3 sequence. Gaps are indicated by dashes. Insertions are indicated by “i” and represent sequences of 23, 2, and 7 bp beginning at sites 266, 1499, and 1568, respectively.

Figure 4.

Neighbor-joining trees of the 14 completely sequenced D. simulans alleles (Figures 2 and 3). The D. melanogaster sequence was used as an outgroup. Separate trees were constructed for the combined janA-janB genes and for the ocn gene. Numbers in parentheses indicate the haplotype class of each allele according to restriction pattern (Table 2).

Table 4.

Summary statistics for haplotypes 1 and 2 (seven alleles each)

One potential cause of strong linkage disequilibrium over an extended region of DNA sequence is a severe reduction in the rate of recombination. For example, recombination rates are known to be reduced in regions containing chromosomal inversions. To eliminate this possibility, we performed in situ hybridizations of a janA-ocn probe to polytene chromosomes of several representative lines from each haplotype to ensure that the genes were at the same cytological location in both haplotypes. In addition, we examined Giemsa-stained polytene chromosomes of the remaining lines of each haplotype and saw no evidence for inversion polymorphism in this region of the genome. Recombination rates are also known to be reduced in certain chromosomal regions, particularly those flanking centromeres and telomeres (Lindsley and Sandler 1977). Since the janA-ocn region is near the tip of chromosome arm 3R, this is a possibility. However, several observations argue against reduced recombination being the cause of the observed haplotype structure. First, a comparison of the physical and genetic distances for loci on the third chromosome (Hamblin and Aquadro 1996) indicates that there is not a great reduction in recombination in this region in D. simulans. Second, a worldwide sample of nine alleles of the Tpi locus (Hasson et al. 1998), which is located distal to ocn on chromosome 3R, shows no departure from neutrality by the haplotype number, haplotype diversity, or Hudson's haplotype test as implemented in this article. Third, there is ample evidence for recombination within the janA-ocn region in haplotype 2 (Figure 3). From the distribution of segregating sites in the seven alleles of haplotype 2 we infer a minimum of seven recombination events by the method of Hudson and Kaplan (1985), six of which fall within the janA-janB region (Table 4).
Finally, the probabilities associated with all of the statistical tests in Table 1 were estimated from coalescent simulations under the conservative assumption of no recombination. Thus, even in the extreme case of zero recombination, the observed data differ significantly from the neutral expectation.
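The inference of a minimum number of recombination events rests on the four-gamete test: under an infinite-sites model, two biallelic sites at which all four two-site combinations are observed must be separated by at least one recombination event. The sketch below is a minimal illustration of this idea in the spirit of Hudson and Kaplan (1985), not the implementation used for Table 4. It counts the largest set of nonoverlapping site intervals that each require a recombination event. The function name and the example haplotypes are hypothetical, and sites are assumed to be biallelic and listed in physical order.

```python
from itertools import combinations


def min_recombination_events(haplotypes):
    """Lower bound on recombination events from the four-gamete test.

    haplotypes: equal-length strings, one character per biallelic site,
    with sites given in their physical order along the sequence.
    """
    num_sites = len(haplotypes[0])
    intervals = []
    for i, j in combinations(range(num_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:        # all four combinations are present
            intervals.append((i, j))
    # Greedy selection of disjoint intervals, ordered by right endpoint.
    intervals.sort(key=lambda interval: interval[1])
    events, last_end = 0, 0
    for start, end in intervals:
        if start >= last_end:        # does not overlap the last chosen interval
            events += 1
            last_end = end
    return events


# Hypothetical three-site example in which every pair of sites shows all
# four gametes; two separate events are required.
print(min_recombination_events(["000", "011", "101", "110"]))
```

Because overlapping intervals can be explained by a single shared event, the count is a lower bound and the true number of recombination events may be larger.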

The presence of two divergent haplotypes could also be explained by demographic factors, such as population subdivision, changes in population size, or founder effects. A survey of nucleotide polymorphism in the vermilion and G6pd genes detected significant population subdivision within D. simulans, particularly among African populations (Hamblin and Veuille 1999). Population subdivision is also indicated by the presence of distinct mitochondrial races within the species (Baba-Aissa et al. 1988). Although these results indicate that D. simulans likely departs from the standard demographic assumption of panmixis, it is unlikely that demographics alone can explain the pattern of nucleotide polymorphism in the janA-ocn region. This is because demographic forces are expected to affect the entire genome, not just particular loci, and when compared with previously sequenced loci the janA-ocn region appears to be unusual (see below). Furthermore, a demographic explanation must account for both features of the observed data (two divergent haplotypes and very low polymorphism within the common haplotype). A comparison of DNA polymorphism at janA, janB, and ocn with other D. simulans third chromosome loci studied by Begun and Whitley (2000) is relevant to these two points. Figure 5 shows the distribution of several statistics for 19 loci on chromosome arm 3R (see Table 1 of Begun and Whitley 2000) as well as for janA, janB, and ocn. All three of these genes show low levels of silent DNA polymorphism (Figure 5A). For example, the janB gene has lower values of πsyn and θsyn than any of the other genes. janA and ocn are also among the lowest for these values (Figure 5A). The low variability observed in these genes cannot be caused by reduced mutation rates in this region of the genome or by unusually strong selective constraints, as measures of nucleotide variability are low in these genes even when standardized by interspecific divergence (Figure 5B).
Differences in sampling schemes between our data and those of Begun and Whitley (2000) strengthen support for the conclusion that the janA-ocn region is unusual among chromosome 3R loci. Most of the genes in Begun and Whitley's study were sampled from a single California population (although two notable exceptions are discussed below), whereas our data are from a worldwide sampling of D. simulans. If there is population subdivision, one would expect that increasing the geographical scope of a sample would increase, not decrease, the observed level of nucleotide polymorphism. In this respect, comparison of our data to Begun and Whitley's is conservative. In addition, none of the other third chromosome loci show a significant departure from neutrality by any of the statistical tests used in this study. This is illustrated by the distributions of Tajima's D (Figure 5C), the probability of Fay and Wu's H, and the probability of Hudson's haplotype test (Figure 5D). For the latter two quantities, only loci with a sample size ≥8 are shown because the power to detect significant results with these tests increases with sample size. However, it should be noted that none of the loci with sample size <8 showed a significant departure from neutrality. Of course, if there is population subdivision then the comparisons in Figure 5, C and D, could be misleading due to the different sampling schemes used for the different loci. However, if one invokes population subdivision to explain the divergent haplotypes it becomes difficult to explain the low levels of polymorphism within haplotype 1. The most obvious way to divide the sample under a population subdivision model would be to consider all of the haplotype 1 alleles as a single population. When this is done the level of polymorphism in janA and janB drops to nearly zero (see Table 4). 
Thus, the comparison of variability among 3R loci would be even more extreme than shown in Figure 5, A and B, and would argue against a purely demographic explanation of the data. Although the janA-janB region shows the most extreme haplotype structure of loci on chromosome 3R, some loci on the X chromosome show a similar pattern (Labate et al. 1999; Begun and Whitley 2000). In general, it appears that the amount of variation on the X chromosome is lower than that on the autosomes, while the amount of linkage disequilibrium on the X is higher (Andolfatto and Przeworski 2000; Begun and Whitley 2000). This difference is larger than expected even after correcting for the different effective population sizes of the X and the autosomes and may be explained by stronger selection on X-linked loci (Begun and Whitley 2000).

Figure 5.

Comparison of DNA polymorphism at janA (open triangles), janB (open squares), and ocn (open circles) with 19 other D. simulans loci (solid circles) spanning chromosome arm 3R. (A) Two measures of nucleotide polymorphism (πsyn and θsyn) at synonymous sites. (B) Synonymous nucleotide polymorphism divided by interspecific divergence, where Dsyn is the average pairwise divergence at synonymous sites of the original eight D. simulans alleles from the outgroup D. melanogaster sequence. (C) Tajima's (1989) D statistic. (D) Probability of H (Fay and Wu 2000) and sub(i, j) (Hudson et al. 1994) shown on a log scale. In D, only loci with a sample size ≥8 are shown.

Another potential explanation for the maintenance of two divergent haplotypes in a population is balancing selection. However, the very low levels of nucleotide polymorphism within haplotype 1 are inconsistent with this being an old balanced polymorphism. It is also possible that strong linkage disequilibria are maintained by epistatic selection favoring combinations of segregating sites (Kimura 1956; Lewontin 1974). This explanation seems unlikely because there are many linked sites over janA and janB and all of the segregating sites that distinguish the two haplotypes are at silent or noncoding positions. One possibility is that epistatic selection is acting at silent or noncoding sites to maintain mRNA or pre-mRNA secondary structures. Such interactions have been proposed to explain patterns of linkage disequilibria in the Adh gene of D. pseudoobscura (Kirby et al. 1995). However, we find no evidence for strongly conserved RNA secondary structures in either janA or janB, using the comparative method of Parsch et al. (2000).

Finally, we consider a model of genetic hitchhiking (Maynard Smith and Haigh 1974), in which neutral variants are driven to high frequency in a population due to their linkage with a positively selected variant. If selection is strong or recombination is low, hitchhiking will result in a “selective sweep” that removes variation in the region flanking the selected site (Kaplan et al. 1989; Stephan et al. 1992). A reduction in polymorphism is also expected under a model of “background selection” (Charlesworth et al. 1993), in which neutral variants are removed from the population due to the recurrent action of purifying selection at tightly linked sites. Several features of our data are consistent with the selective sweep hypothesis. For example, genetic hitchhiking is expected to affect the frequency distribution of variants at segregating sites such that derived variants will be in higher frequency than expected under a neutral equilibrium model (Fay and Wu 2000; Kim and Stephan 2000). The significantly negative value of Fay and Wu's H in the janA-janB region (Table 1) agrees with this prediction. Genetic hitchhiking is also expected to skew the frequency distribution of variants at segregating sites toward rare alleles, resulting in a significantly negative value of Tajima's D (Braverman et al. 1995). We find that Tajima's D is significantly negative for janA and janB (Table 1). This is due to the large number of unique variants in our sample (Figure 2). However, many of these singletons can be inferred to be ancestral from the D. melanogaster outgroup sequence, so the negative Tajima's D is not caused by new mutations occurring after a complete selective sweep as modeled by Braverman et al. (1995). This is confirmed by the tests of Fu and Li (1993) that use an outgroup to identify derived singletons and do not produce a significant result when applied to our janA-janB data (D = −0.34, P > 0.10; F = −0.88, P > 0.10).

Although a selective sweep can explain the low level of polymorphism within haplotype 1 and the high frequency of many derived variants, it does not explain the presence of the highly divergent haplotype 2, which shows a normal level of polymorphism. One possibility is that the selective sweep is either temporally or spatially incomplete (Hudson et al. 1994, 1997), perhaps due to limited gene flow into ancestral African populations. It is also possible that there are positively selected variants at different sites in the two haplotypes, and fixation of a single haplotype is delayed until a recombination event brings the two variants together on the same chromosome (the “traffic” model; Kirby and Stephan 1996). A final possibility is that the positively selected site responsible for the hitchhiking event lies proximal to the janA gene, and there has been limited recombination between this unknown site and janA. Our results indicate that distally the haplotype structure is broken in the intergenic region between janB and ocn. However, the strong linkage disequilibrium in the janA-janB region does not allow us to define the proximal limit of the haplotype structure. Two of the chromosome 3R loci included in Begun and Whitley's (2000) study, Rh3 (Ayala et al. 1993) and boss (Ayala and Hartl 1993), were sampled from a worldwide distribution and included some of the same lines used in our study. Both of these loci show high levels of variation relative to other loci on chromosome 3R and show no sign of haplotype structure. In particular, the line s3 (designated as “f” in those articles), which shows a quite divergent haplotype over the janA-janB region in our survey (Figure 2), does not show this same pattern at Rh3 or boss. Both of these loci lie proximal to janA, with boss being relatively close at 96F. The observation of high variation and no haplotype structure at boss again argues against a purely demographic explanation for our data and indicates that the unusual haplotype structure does not extend over a large portion of the chromosome. It is tempting to speculate that the janB gene may be a target of positive selection. Our previous work showed that within the D. melanogaster species subgroup janB has a faster rate of evolution than janA or ocn (Parsch et al. 2001). janB is also the most divergent of these genes in a comparison between D. simulans and D. melanogaster (Table 4). In addition, janB shows the lowest level of within-species polymorphism (Table 1). This combination of high divergence and low polymorphism is suggestive of positive selection. However, further surveys of DNA sequence polymorphism in this region of the genome are required to identify the selected site or sites and allow estimation of the selection coefficient associated with the hitchhiking event.


We thank J. Braverman and J. Fay for providing computer programs and P. Capy and Y. Tao for providing D. simulans stocks. We are grateful to W. Stephan and members of the D. Hartl and J. Wakeley labs for valuable suggestions throughout the course of our work. We also thank M. Aguadé and two anonymous reviewers for comments on an earlier version of the manuscript. This work was supported by National Institutes of Health grants GM60035 and HG01250 to D.L.H.


  • Note added in proof: Following the submission of this manuscript, Rozas et al. (Rozas, J., M. Gullaud, G. Blandin and M. Aguadé, 2001, DNA variation at the rp49 gene region of Drosophila simulans: evolutionary inferences from an unusual haplotype structure. Genetics 158: 1147–1155) reported unusual haplotype structure in the rp49 gene region in both a European and an African population of D. simulans. The rp49 gene lies ~7 kb proximal to janA on chromosome 3R.

  • Communicating editor: M. Aguadé

  • Received April 2, 2001.
  • Accepted July 16, 2001.

