Genetics, Vol. 158, 1147-1155, July 2001, Copyright © 2001

DNA Variation at the rp49 Gene Region of Drosophila simulans: Evolutionary Inferences From an Unusual Haplotype Structure

Julio Rozas(a), Myriam Gullaud(1,a), Gaëlle Blandin(2,a), and Montserrat Aguadé(a)
a Departament de Genètica, Facultat de Biologia, Universitat de Barcelona, 08071 Barcelona, Spain

Corresponding author: Julio Rozas, Departament de Genètica, Facultat de Biologia, Universitat de Barcelona, Diagonal 645, 08071 Barcelona, Spain. E-mail: julio{at}bio.ub.es

Communicating editor: W. STEPHAN


*  ABSTRACT

An ~1.3-kb region including the rp49 gene plus its 5' and 3' flanking regions was sequenced in 24 lines of Drosophila simulans (10 from Spain and 14 from Mozambique). Fifty-four nucleotide and 8 length polymorphisms were detected. All nucleotide polymorphisms were silent: 52 in noncoding regions and 2 at synonymous sites in the coding region. Estimated silent nucleotide diversity was similar in both populations (π = 0.016, for the total sample). Nucleotide variation revealed an unusual haplotype structure showing a subset of 11 sequences with a single polymorphism. This haplotype was present at intermediate frequencies in both the European and the African samples. The presence of such a major haplotype in a highly recombining region is incompatible with the neutral equilibrium model. This haplotype structure in both a derived and a putatively ancestral population can be most parsimoniously explained by positive selection. As the rate of recombination in the rp49 region is high, the target of selection should be close to or within the region studied.


Drosophila simulans, like D. melanogaster, is a cosmopolitan human commensal that originated in tropical Africa ~2.5 mya (LACHAISE et al. 1988; POWELL 1997). Populations of both species from that area could be considered ancestral populations and, therefore, neutral variation in those populations would be expected to be at mutation-drift equilibrium. On the other hand, populations from other areas would be derived populations and their variation might or might not be at equilibrium. In fact, there would have been ample room for adaptive evolution, i.e., for the action of natural selection, during the dispersal of these species from tropical to temperate regions. Additionally, derived populations of these species might still reflect the possible founder events associated with the out-of-Africa expansion of these species.

Both demographic events and natural selection acting in a particular genomic region can have similar effects on the pattern of nucleotide variation in that region. However, population history has a genome-wide effect and should affect, therefore, all regions of the genome. In contrast, both directional and balancing selection are locus specific and affect neutral variation only at loci tightly linked to the locus under selection. In this sense, the level of both within-population variation and between-population differentiation is generally lower for morphological characters, allozymes, and mitochondrial DNA variation in D. simulans than in D. melanogaster (as reviewed in SINGH and LONG 1992); this might reflect a more recent expansion of the distribution area of D. simulans (but see BALLARD et al. 1996 for mtDNA).

Initial surveys of nucleotide sequence variation in nuclear genes of D. simulans generally analyzed few sequences sampled from different populations (see BEGUN and WHITLEY 2000 for references). An unusual haplotype structure was detected in some of these surveys (BEGUN and AQUADRO 1995; EANES et al. 1996; HASSON et al. 1998; LABATE et al. 1999), suggesting the presence of at least two old lineages in this species.

HAMBLIN and VEUILLE (1999) have studied variation in two X-linked gene regions (vermilion and G6pd) in several African and non-African populations of D. simulans. A strong haplotype substructure was detected both in West African and in all non-African populations studied. The haplotype structure observed in the vermilion region departed from predictions of the neutral theory for stationary populations in the non-African, but not in Central African populations. This departure from neutrality was considered evidence of a bottleneck in their rather recent foundation (HAMBLIN and VEUILLE 1999). The authors also suggested that the nonequilibrium haplotype distributions might be compatible with ancient population subdivision and recent admixture in populations of D. simulans ancestral to the European and American populations.

The paucity in the number of haplotypes and/or in haplotype diversity detected in three out of the four loci analyzed in non-African populations by HAMBLIN and VEUILLE (1999) was considered to support a genome-wide phenomenon and, therefore, the population admixture hypothesis. In vermilion, however, the region surveyed was not randomly chosen, but its choice was based on the previous knowledge that it presented two divergent haplotypes (BEGUN and AQUADRO 1995). Indeed, in the North American sample sequenced by BEGUN and AQUADRO (1995), the haplotype test gave different results when applied to the complete region than when applied to the particular region chosen by HAMBLIN and VEUILLE (1999). To draw any general conclusion, it is important, therefore, to survey nucleotide variation in African and non-African samples of D. simulans in a larger number of randomly chosen regions.

We have analyzed variation in an ~1.3-kb region encompassing the rp49 gene (named RpL32 in the FlyBase Drosophila database; http://flybase.bio.indiana.edu) in a European and a Southeast African population of D. simulans. This gene is located in band 99D of D. simulans and encodes ribosomal protein 49 (ribosomal protein L32 in FlyBase). Similarly to the vermilion region surveyed by HAMBLIN and VEUILLE (1999), recombination in this autosomal region is expected to be high (KLIMAN and HEY 1993A; TRUE et al. 1996). Extensive surveys of variation in the homologous region of D. subobscura in relation to chromosomal polymorphism (ROZAS and AGUADE 1993, 1994; ROZAS et al. 1999) indicate that, at least in this species, the rp49 region is a neutrally evolving region with normal levels of nucleotide polymorphism. Surprisingly, our initial survey of nucleotide variation in a European population of D. simulans revealed an unusual haplotype structure showing one rather common haplotype with zero variation even though the complete sample had a normal level of variation. This motivated extending the survey to a putatively ancestral population of that species in an effort to discern between historical and selective explanations.


*  MATERIALS AND METHODS

Fly samples:
Twenty-four lines randomly sampled from two natural populations of D. simulans were studied: 10 lines from Montblanc, Tarragona, Spain (SimS lines) and 14 from Maputo, Mozambique (SimMz lines). The European and African samples were collected in September 1993 and in August 1997, respectively. We obtained highly inbred lines after 10 generations of sibmating. We also used 1 line of D. melanogaster (line M66), which was collected in Montemayor, Córdoba, Spain in March 1990 and was subsequently made isochromosomal for the third chromosome by the standard series of crosses with the TM6/MKRS balancer stock.

DNA extraction, PCR amplification, and DNA sequencing:
Genomic DNA was extracted using a modification of protocol 48 from ASHBURNER (1989). An ~1.4-kb fragment, which included the rp49 gene (402 bp of coding region and a small intron of 59 bp) and its 5' and 3' flanking regions, was amplified by PCR (SAIKI et al. 1988) using oligonucleotides designed on the published sequence of D. melanogaster (O'CONNELL and ROSBASH 1984; GenBank accession no. X00848). Several oligonucleotides, designed at intervals of ~300 nucleotides, were used as primers for sequencing. The amplified fragments were cycle sequenced and separated on a Perkin-Elmer (Norwalk, CT) ABI PRISM 377 automated DNA sequencer following the manufacturer's instructions. For each line, the DNA was sequenced on both strands. The nucleotide sequences are available from the EMBL nucleotide sequence database under accession nos. Y13939 (D. melanogaster) and AJ309023–AJ309046 (D. simulans).

Data analysis:
Nucleotide sequences were assembled using the SeqEd version 1.0.3 program (Applied Biosystems, Inc., Foster City, CA), multiply aligned using the Clustal W program (THOMPSON et al. 1994), and edited with the MacClade version 3.06 program (MADDISON and MADDISON 1992). Phylogenetic analysis was performed with genetic distances corrected according to the Jukes and Cantor model (JUKES and CANTOR 1969) using the neighbor-joining algorithm (SAITOU and NEI 1987) implemented in the MEGA version 2 (KUMAR et al. 2000) program. The analysis was conducted using the rp49 nucleotide sequence of D. melanogaster (line M66) as the outgroup; the bootstrap values were based on 1000 replicates.

The DnaSP version 3.50 software (ROZAS and ROZAS 1999) was used to estimate population genetic parameters and genetic distances and also to perform different neutrality tests. The confidence intervals (and the P values) of several test statistics were obtained by Monte Carlo simulations based on the coalescent process for a neutral infinite-sites model and assuming a large and constant population size (KINGMAN 1982A, 1982B; HUDSON 1983, 1990). The simulations were carried out either by assuming a value of θ (θ = 4Nu, where N is the effective population size and u is the per gene mutation rate; WATTERSON 1975) or by fixing the number of segregating sites. As in both cases the simulations yielded similar results, we present only results based on the coalescent conditional on the number of segregating sites. The simulations were performed assuming either no intragenic recombination or intermediate levels of recombination (HUDSON 1983, 1990). Each computer simulation was based on 10,000 (for no recombination) or 1000 (for intermediate levels of recombination) independent replicates. The empirical distribution of the corresponding statistic was thus generated and used to determine the confidence intervals.
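The simulation scheme described above can be illustrated with a minimal sketch. It is not the DnaSP implementation: it assumes no recombination, a constant population size, and a fixed number of mutations dropped on the genealogy with probability proportional to branch length. The function names (`simulate_sample`, `haplotype_stats`, `lower_tail_p`) and the observed haplotype count used in the usage line are hypothetical placeholders, not values taken from this article.

```python
import random

def simulate_sample(n, s, rng):
    """Simulate n haplotypes carrying exactly s infinite-sites mutations
    under a neutral coalescent (constant size, no recombination)."""
    # Each active lineage: (set of sampled sequences below it, time it arose).
    active = [(frozenset([i]), 0.0) for i in range(n)]
    branches = []  # (descendant set, branch length)
    t = 0.0
    while len(active) > 1:
        k = len(active)
        # Waiting time to the next coalescence among k lineages.
        t += rng.expovariate(k * (k - 1) / 2)
        i, j = sorted(rng.sample(range(k), 2))
        b = active.pop(j)  # pop the larger index first
        a = active.pop(i)
        branches.append((a[0], t - a[1]))
        branches.append((b[0], t - b[1]))
        active.append((a[0] | b[0], t))
    # Each mutation falls on a branch with probability proportional to its length.
    hit = rng.choices([d for d, _ in branches],
                      weights=[length for _, length in branches], k=s)
    return [tuple(int(seq in d) for d in hit) for seq in range(n)]

def haplotype_stats(haps):
    """Return (number of distinct haplotypes, haplotype diversity)."""
    n = len(haps)
    counts = {}
    for h in haps:
        counts[h] = counts.get(h, 0) + 1
    diversity = n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))
    return len(counts), diversity

def lower_tail_p(n, s, observed_k, replicates, seed=1):
    """Fraction of neutral replicates with as few haplotypes as observed, or fewer."""
    rng = random.Random(seed)
    hits = sum(haplotype_stats(simulate_sample(n, s, rng))[0] <= observed_k
               for _ in range(replicates))
    return hits / replicates

# Usage: n = 24 and s = 54 follow the sample described in the text;
# observed_k = 12 is a placeholder for illustration only.
print(lower_tail_p(24, 54, observed_k=12, replicates=1000))
```

Conditioning on the number of segregating sites, as the text describes, avoids having to assume a value of θ; a sketch with recombination would need an ancestral recombination graph and is not attempted here.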

The recombination parameter C (in Drosophila C = 2Nc, where N is the effective population size and c is the recombination rate per generation between the most distant sites) was estimated using the methods of HUDSON and KAPLAN (1985) and of HUDSON (1987). The first method is based on RM or the minimum number of recombination events in the sample; estimates of RM were used to estimate C by coalescent simulations. The HUDSON (1987) method is based on the variance of the number of differences between pairs of sequences; in that case, the estimate of C can be obtained numerically. We also estimated the minimum value of C compatible with the observed value of RM (CL); thus, CL is an underestimate of the true C value. The CL value was estimated as the lowest value of C for which the right tail (5%) of the RM distribution contains values equal to or higher than the observed value of RM.
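As an illustration of how RM can be obtained from aligned sequences, the following sketch applies the four-gamete test to every pair of biallelic sites and then counts the largest set of non-overlapping intervals, which is our reading of the HUDSON and KAPLAN (1985) procedure. The function names and the toy haplotypes are hypothetical; sites are assumed to be coded as 0/1 characters.

```python
from itertools import combinations

def four_gametes(col_a, col_b):
    """True if all four two-site gametes (00, 01, 10, 11) occur in the sample."""
    return len(set(zip(col_a, col_b))) == 4

def min_recombination_events(haplotypes):
    """Minimum number of recombination events (RM) implied by the sample."""
    n_sites = len(haplotypes[0])
    cols = [[h[s] for h in haplotypes] for s in range(n_sites)]
    # Every pair of sites showing four gametes requires at least one
    # recombination event somewhere in the interval between them.
    intervals = [(i, j) for i, j in combinations(range(n_sites), 2)
                 if four_gametes(cols[i], cols[j])]
    # Count the maximum number of disjoint intervals (greedy by right end).
    intervals.sort(key=lambda ij: ij[1])
    rm, last_right = 0, -1
    for left, right in intervals:
        if left >= last_right:
            rm += 1
            last_right = right
    return rm

# Toy example (not data from this study).
print(min_recombination_events(["0000", "0101", "1010", "1111"]))
```

Because RM only counts events that leave a four-gamete signature, it is a lower bound on the true number of recombination events, which is why the text converts it to C through simulation.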

An estimate of C based on the estimates of c (TRUE et al. 1996) in the rp49 region, or CM, was also obtained (CM = 49.2). This estimate was obtained following ANDOLFATTO and PRZEWORSKI (2000) and considering that (1) the rp49 and the Tpi regions (located in cytological bands 99D and 99E, respectively) have the same recombination rates (i.e., c = 0.92 × 10^-8), (2) in D. simulans N = 2 × 10^6, and (3) the average length of the rp49 region is 1337 bp.
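The CM value can be checked with a few lines of arithmetic. The sketch below assumes that c is expressed per base pair per generation, so that C = 2Nc is scaled by the length of the region; the function name is hypothetical.

```python
def recombination_parameter(effective_size, c_per_bp, length_bp):
    """C = 2Nc for a region, with c given per base pair per generation."""
    return 2 * effective_size * c_per_bp * length_bp

# Values quoted in the text: N = 2 x 10^6, c = 0.92 x 10^-8, length = 1337 bp.
cm = recombination_parameter(2e6, 0.92e-8, 1337)
print(round(cm, 1))  # 49.2
```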

The overall genetic association between polymorphic sites was measured by the ZnS statistic (KELLY 1997), which is the average of r2 (HILL and ROBERTSON 1968) over all pairwise comparisons,

$$Z_{nS} = \frac{2}{S(S-1)} \sum_{i=1}^{S-1} \sum_{j=i+1}^{S} r_{i,j}^{2},$$

where S is the number of polymorphic sites and ri,j is the r estimator (HILL and ROBERTSON 1968) between sites i and j. The confidence intervals of ZnS were determined by computer simulations using the coalescent algorithm.

The effect of intragenic recombination on nucleotide variation was studied by analyzing the levels of linkage disequilibrium between polymorphic sites in relation to the physical distance. A new test statistic, ZZ, that is defined as

$$ZZ = Z_{A} - Z_{nS} \qquad (1)$$

was developed, where

$$Z_{A} = \frac{1}{S-1} \sum_{i=1}^{S-1} r_{i,i+1}^{2} \qquad (2)$$

and ZnS is the KELLY (1997) statistic. The ZA statistic is the average of r2 (HILL and ROBERTSON 1968), but only between adjacent polymorphic sites. Because linkage disequilibrium decays with physical distance due to intragenic recombination, the ZZ statistic is expected to have larger positive values with increasing recombination, and eventually it could be used to estimate the recombination parameter C. Although in regions with high levels of recombination RM estimates might be inflated by parallel mutation, ZZ values would probably not be affected. Confidence intervals of the ZZ statistic were determined by coalescent simulations. An algorithm for computing the ZZ statistic from DNA sequence data and for estimating its confidence intervals will be implemented in the next release of the DnaSP software (ROZAS and ROZAS 1999).
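The three statistics can be sketched as follows, taking ZnS as the mean of r2 over all pairs of polymorphic sites, ZA as the mean of r2 over adjacent pairs only, and ZZ as ZA minus ZnS (our reading of the definitions in the text). This is an illustrative sketch, not the DnaSP algorithm; the function names and the toy data are hypothetical, and every site is assumed to be biallelic, polymorphic, and coded 0/1.

```python
from itertools import combinations

def r_squared(a, b):
    """Squared correlation between two biallelic sites coded as 0/1 lists."""
    n = len(a)
    pa, pb = sum(a) / n, sum(b) / n
    pab = sum(x * y for x, y in zip(a, b)) / n
    d = pab - pa * pb  # linkage disequilibrium coefficient D
    return d * d / (pa * (1 - pa) * pb * (1 - pb))

def zns(cols):
    """Average r2 over all pairs of polymorphic sites."""
    s = len(cols)
    total = sum(r_squared(cols[i], cols[j])
                for i, j in combinations(range(s), 2))
    return 2 * total / (s * (s - 1))

def za(cols):
    """Average r2 over adjacent polymorphic sites only."""
    s = len(cols)
    return sum(r_squared(cols[i], cols[i + 1]) for i in range(s - 1)) / (s - 1)

def zz(cols):
    """ZZ = ZA - ZnS; positive when adjacent sites are more associated."""
    return za(cols) - zns(cols)

# Toy data: three sites (columns) scored in four sequences.
sites = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]]
print(zns(sites), za(sites), zz(sites))
```

In the toy data only the first two (adjacent) sites are associated, so ZA exceeds ZnS and ZZ is positive, which is the direction expected when recombination erodes associations between distant sites.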


*  RESULTS

DNA sequence variation:
The rp49 gene plus its 5' and 3' flanking regions were sequenced in 24 lines of D. simulans (10 from Europe and 14 from Africa) and in one line of D. melanogaster. In D. simulans, a total of 54 polymorphic nucleotide sites (corresponding to 56 mutations) were identified over the 1292 bp examined (excluding all sites with alignment gaps). Polymorphisms at sites 626 and 713 (exon 2) were synonymous, whereas the rest were in noncoding regions (Fig 1). Eight insertion/deletion polymorphisms (ranging from 1 to 29 bp in length) were also detected in noncoding regions. Estimates of nucleotide variation are shown in Table 1.



 
Figure 1. Nucleotide polymorphism at the rp49 gene region in D. simulans. Nucleotide numbering is according to the rp49 sequence of the M66 line of D. melanogaster. Nucleotides identical to the first sequence are indicated by a dot. For length polymorphisms, the nucleotide position refers to the first site affected, and a dash indicates absence of the corresponding length variant. d, deletion; i, insertion; d#, deletion of #bp; i#, insertion of #bp. Polymorphic sites 471–479 correspond to the intron of the rp49 gene (I), and polymorphic sites 626 and 713 to exon 2 (E2) of the rp49 gene. SimS, D. simulans from Montblanc; SimMz, D. simulans from Maputo; MelM66, D. melanogaster line M66. A continuous line indicates a deletion affecting more than one polymorphic site. Shaded blocks indicate L#b sequence information. Rec, recombinant sequence. The last row gives the nucleotide information of the outgroup species D. melanogaster (putative ancestral state) for all polymorphic sites in D. simulans.


 

 
Table 1. Nucleotide polymorphism and divergence in the rp49 region

Ten of the 24 rp49 sequences surveyed were identical (for both nucleotide and insertion/deletion changes); there was an additional sequence (line SimMz7) that differed from this common haplotype by a single nucleotide substitution (Fig 1). These 11 sequences were designated as L#a. Most other sequences were designated as L#b. Two sequences (SimS13 and SimMz39) probably originated by recombination between the two divergent major haplotypes (L#a and L#b). Lines with the L#a haplotype were found both in Montblanc (six lines) and in Maputo (five lines); the frequency of this haplotype did not differ significantly between both populations (Fisher's exact test, P = 0.41).
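The Fisher's exact test quoted above can be reproduced from the counts given in the text (6 of 10 Montblanc lines and 5 of 14 Maputo lines carry L#a). The sketch below is a plain two-sided implementation that sums the probabilities of all tables no more likely than the observed one; the function name is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum over all tables at most as probable as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

# Rows: Montblanc, Maputo; columns: L#a, other haplotypes.
p = fisher_exact_two_sided(6, 4, 5, 9)
print(round(p, 2))  # 0.41
```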

Estimates of nucleotide divergence between populations (dxy = 0.0118, and da = 0.000) and FST values (FST = 0.020) were consistent with weak population subdivision. The methods of ROFF and BENTZEN (1989) and of HUDSON et al. (1992) were used for detecting genetic differentiation between populations (considering either all sequences or only L#b sequences). None of the tests performed detected any significant differentiation between populations (results not shown). This lack of genetic differentiation between populations is reflected in the neighbor-joining tree, where European and African sequences are interspersed (Fig 2).



 
Figure 2. Neighbor-joining tree of the rp49 gene region sequences of D. simulans. Bootstrap values >90% are shown on the tree. D. mel, D. melanogaster (M66 line); Rec, recombinant sequence. The open and solid circles indicate lines from Montblanc and Maputo, respectively. (A) Tree built considering only nucleotide substitutions information. (B) Tree built using information on both nucleotide substitutions and length variants. Each indel was treated as a single mutational event.

Intragenic recombination and linkage disequilibrium:
We tested the effect of intragenic recombination on nucleotide sequence variation (Table 2). The estimated ZZ values were significantly positive, suggesting that in this region intragenic recombination has played an important role in shuffling nucleotide variation among DNA sequences. Estimates of the recombination parameter C obtained by the methods of HUDSON and KAPLAN (1985) and of HUDSON (1987) are shown in Table 2. The discrepancy between the two estimates could be due to the particular structure of genetic variation found at the rp49 gene region (see below). This structure would also cause the CL values to be conservative; indeed, larger CL values were obtained when a single L#a sequence was considered in the computer simulations (16.9 and 2.7 for Montblanc and Maputo, respectively).


 

 
Table 2. Estimates of the recombination parameter

The significance of the pairwise associations between polymorphic sites, or linkage disequilibrium, was established by the chi-square test. In the total sample, 332 out of 1326 pairwise comparisons showed a significant association; 86 of these comparisons remained significant after applying the Bonferroni procedure. No significant overall association between polymorphic sites was detected by using the ZnS statistic (KELLY 1997) without recombination or introducing conservative recombination estimates (C = CL; results not shown).
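A hedged sketch of the pairwise test follows. It uses the standard identity that, for a 2 × 2 table of two biallelic sites, the chi-square statistic equals n times r2, with 1 degree of freedom. Note that 1326 is the number of pairs among 52 sites; that 52 biallelic sites entered the analysis is our inference, not a statement from the text. The function name and the toy columns are hypothetical.

```python
from math import comb, erfc, sqrt

def ld_chi_square(a, b):
    """Chi-square statistic (1 d.f.) and P value for association between
    two biallelic sites coded as 0/1 lists of equal length."""
    n = len(a)
    pa, pb = sum(a) / n, sum(b) / n
    pab = sum(x * y for x, y in zip(a, b)) / n
    d = pab - pa * pb
    chi2 = n * d * d / (pa * (1 - pa) * pb * (1 - pb))  # equals n * r^2
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 d.f.
    return chi2, p_value

n_tests = comb(52, 2)            # 1326 pairwise comparisons
bonferroni_alpha = 0.05 / n_tests

# Toy example: two completely associated sites in a sample of 24 sequences.
site = [1] * 12 + [0] * 12
chi2, p = ld_chi_square(site, site)
print(n_tests, p < bonferroni_alpha)
```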

Neutrality tests:
We tested whether the observed pattern of nucleotide variation is compatible with that expected under neutrality. We applied several tests that compare different estimates of θ either using only intraspecific data (TAJIMA 1989) or using intraspecific data and sequence information of another species (the outgroup) to determine the polarity of mutations (FU and LI 1993; FAY and WU 2000). All these tests failed to reject the neutral equilibrium model (Table 1). The HKA test (HUDSON et al. 1987) was conducted to assess whether levels of polymorphism and divergence were correlated. We compared polymorphism (in D. simulans) and divergence (between D. simulans and D. melanogaster) in the rp49 region (present results) and in the vermilion region in samples from North Carolina and the Congo (BEGUN and AQUADRO 1995). None of the HKA tests showed a significant deviation from neutral predictions (results not shown).

We tested by coalescent simulations whether the large number of identical sequences found in the sample was compatible with the neutral equilibrium model (see HUDSON et al. 1994). We also investigated whether the presence of such a major haplotype was compatible with the equilibrium neutral model by analyzing the distribution of the number of haplotypes and of haplotype diversity (EWENS 1972; STROBECK 1987; DEPAULIS and VEUILLE 1998). According to the neutral model, both the number of haplotypes (and the number of identical sequences) and the haplotype diversity are a function of the sample size and of θ. The analyses were performed under conservative assumptions (under no recombination and using the conservative CL estimate of the recombination parameter) and also under a more realistic assumption (using the CM estimate that is based on the comparison of the physical and genetic maps). The analyses showed a significant (or nearly significant) excess of identical sequences and a significant (or nearly significant) reduction in the number of haplotypes and in the haplotype diversity values (Table 3). Values of Fu's FS statistic (FU 1997), which is related to Strobeck's S statistic, pointed in the same direction (results not shown).


 

 
Table 3. Haplotype distribution tests


*  DISCUSSION

In D. simulans, as in D. melanogaster, the rp49 gene is located at band 99D where recombination is rather high (KLIMAN and HEY 1993A; TRUE et al. 1996; ANDOLFATTO and PRZEWORSKI 2000; this article). The relatively high RM values and the significant ZZ estimates obtained from our data would support the conclusion that recombination in this region is not reduced. Furthermore, D. simulans is monomorphic at the chromosomal level and thus, in this species, no local decrease of recombination is expected as a consequence of chromosomal polymorphism.

The estimated silent nucleotide variation in the rp49 region (θ = 0.016) was lower than estimates for other regions that were also located on the 3R chromosomal arm (the average θ for 19 genes was 0.035; BEGUN and WHITLEY 2000). However, most silent variation at the rp49 region corresponds to variation at noncoding sites, while estimates in BEGUN and WHITLEY (2000) are based on synonymous variation. Our data conform, therefore, to the general observation of lower variation in noncoding flanking regions than at synonymous sites of coding regions (MORIYAMA and POWELL 1996).

Haplotype substructure and demographic factors:
Nucleotide variation at the rp49 region in the two populations of D. simulans stands out because it is highly structured. Both the European and African samples present the same haplotype at intermediate frequency. They also share other minor haplotypes (Fig 1 and Fig 2) and, in fact, no significant genetic differentiation was detected between populations.

In the subsample of lines that constitute the major haplotype in the rp49 region (11 lines designated as L#a), there was a single polymorphism and its rarest variant was present in only one line. Forty-five polymorphisms segregated, however, in L#b lines (n = 11). There were nine fixed differences between L#a and L#b lines (Fig 1). The presence of such a major and divergent haplotype (L#a lines) at the rp49 region is incompatible with the neutral equilibrium model, even in the absence of recombination.

Variation at the vermilion and rp49 regions departed from neutral expectations in a similar way. There are, however, important differences between the two sets of results. First, in the vermilion region, only non-African populations showed a significant reduction in haplotype number and/or haplotype diversity. Second, in that region there was no major haplotype common to all populations surveyed. Both observations in the vermilion region are compatible with an important founder effect in the origin of these derived populations. The haplotype structure detected in other regions surveyed in samples from non-African populations might also be the result of founder events (HAMBLIN and VEUILLE 1999; LABATE et al. 1999; ANDOLFATTO and KREITMAN 2000; DUVERNELL and EANES 2000). It has also been argued that the haplotype structure detected in most surveys of non-African populations could be the result of population subdivision and recent admixture.

The presence in the rp49 region of the same major haplotype in an ancient and in a recently established population cannot be easily explained by founder events. It could be argued, however, that African populations of D. simulans were genetically differentiated (HAMBLIN and VEUILLE 1999) and that the Maputo population was a recently established population. Even in that case, it would be rather unlikely that the same haplotype (haplotype L#a) was present at relatively high frequency in both populations (Montblanc and Maputo). Variation at the Acp26Aa region in lines from the same collections does not show such a haplotype structure (M. AGUADÉ, unpublished results), suggesting that the observed haplotype structure in the rp49 region is not the result of a genome-wide phenomenon. On the other hand, for the rp49 region only one subset of lines is depleted of variation. If the pattern observed were due to population subdivision and recent admixture, one of the subpopulations should have been nearly monomorphic for this region. This is rather unlikely since the level of silent nucleotide diversity in regions with normal rates of recombination can be substantial, even in species with small effective population sizes such as D. mauritiana and D. madeirensis (e.g., HEY and KLIMAN 1993; KLIMAN and HEY 1993B; KHADEM et al. 2001; see also GILLESPIE 1999, 2000).

Haplotype structure and selective causes:
In Drosophila, only a few surveys of DNA sequence variation in regions with normal (or high) levels of recombination have revealed a high proportion of sequences with zero (or nearly zero) variation in fragments longer than 1.3 kb. The first such pattern was detected in the Sod region of D. melanogaster (HUDSON et al. 1994, 1997), and it was considered to reflect the hitchhiking effect of an advantageous mutation that was increasing in frequency. However, unlike in this study, only North American and European populations had been surveyed, which would always leave room for historical explanations. Similarly, a survey of DNA variation at the runt region of D. simulans in lines from different North American populations revealed that most lines (six out of eight) were identical or nearly identical (LABATE et al. 1999).

Selection on favorable mutations can remove nucleotide variation at linked sites, causing a selective sweep or hitchhiking effect (MAYNARD SMITH and HAIGH 1974; KAPLAN et al. 1989; BARTON 1998; FAY and WU 2000; GILLESPIE 2000; KIM and STEPHAN 2000). Positive selection has thus been proposed to explain the pattern of nucleotide variation found at the Sod region in two populations of D. melanogaster (HUDSON et al. 1994, 1997). The rp49 data of D. simulans exhibit a pattern of nucleotide variation similar to that found for the Sod locus, i.e., two sets of highly diverged sequences in a region with normal levels of recombination. Moreover, we have found the same haplotype structure not only in Europe, but also in a population from the putative ancestral distribution area. It is thus unlikely that the unexpected pattern of variation found at the rp49 region was due to some founder event associated with the colonization of Europe by D. simulans. Therefore, positive selection would most parsimoniously explain the pattern of variation observed at this region. Although we did not detect the excess of high-frequency derived variants expected immediately after a selective sweep (FAY and WU 2000; KIM and STEPHAN 2000), eight of the nine mutations fixed in L#a sequences (relative to L#b sequences) were derived (see Fig 1), which would support the selective hypothesis. There are, however, several selective scenarios that could explain the presence of a major haplotype with low variation: (1) the selected haplotype could be in its transient phase either to fixation or to an equilibrium frequency, (2) the pattern could reflect a very recently established balanced polymorphism, or (3) the advantageous mutation could have attained fixation, but the rp49 region could be relatively far away from the advantageous mutation.

Because in D. simulans the rp49 gene is located in a genomic region with high recombination, the fragment affected by the proposed selective sweep should be short (KAPLAN et al. 1989; STEPHAN et al. 1992). In D. melanogaster, the genomic region encompassing the rp49 gene presents a high density of coding regions: 12 coding regions have been identified in a 20-kb fragment spanning the rp49 gene (FlyBase database). The conserved synteny between D. melanogaster and D. simulans allows prediction of a similar density in the latter species and, thus, any of these coding regions (or some regulatory regions) could have been the target of selection.

Time of hitchhiking:
The time back to the hitchhiking event can be inferred from the amount of nucleotide variation present in the hitchhiked haplotype. For this inference, we need to know (1) the neutral mutation rate for the rp49 region and (2) the expected topology of the gene genealogy. Assuming that silent substitutions (both at noncoding and synonymous sites) are neutral, the neutral mutation rate for the rp49 region can be estimated from the estimated silent nucleotide divergence between D. simulans and D. melanogaster (K silent = 0.072; Table 1). Assuming that the split of the D. melanogaster and D. simulans lineages occurred 2.5 mya (LACHAISE et al. 1988; POWELL 1997), the silent mutation rate would be 1.4 × 10^-8 per nucleotide and per year and 1.4 × 10^-5 on a per sequence basis (as the rp49 region includes 987 silent sites; Table 1). Several authors have shown (e.g., SLATKIN and HUDSON 1991) that after a selective sweep the gene genealogy is star-like, i.e., a genealogy compressed at the internal nodes. If mutations are Poisson distributed, the expected number of mutations on the genealogy is µE(T), where T is the total length of the branches in the genealogy (in years), and µ is the mutation rate per sequence per year (HUDSON 1990, Equation 1). In the rp49 region, only one mutation (a singleton variant) was detected (site 59 in SimMz7 line) among all L#a sequences (n = 11). Assuming a star genealogy for our sample, T would be 11t (where t is the time back to the hitchhiking event) and, consequently, t would be ~6500 years. Thus, the proposed selective sweep would have occurred very recently. The lack of length variation in L#a sequences (Fig 1 and Fig 2) would also support the conclusion that the hitchhiking event was rather recent.
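The arithmetic behind this estimate can be retraced with a short sketch. It assumes that divergence accumulates along both lineages since the split (K = 2µT), which reproduces the per-site rate quoted in the text, and a star genealogy in which the expected number of mutations is µ · n · t. The function names are hypothetical.

```python
def silent_rate_per_site(k_silent, split_years):
    """Per-site, per-year rate from divergence, assuming K = 2 * rate * time."""
    return k_silent / (2 * split_years)

def sweep_age(mutations, n_sequences, rate_per_sequence):
    """Time back to the sweep under a star genealogy: mutations = rate * n * t."""
    return mutations / (n_sequences * rate_per_sequence)

rate_site = silent_rate_per_site(0.072, 2.5e6)   # about 1.4e-8
rate_seq = rate_site * 987                       # about 1.4e-5 per sequence
print(rate_site, rate_seq)

# With the rounded per-sequence rate quoted in the text (1.4e-5):
print(round(sweep_age(1, 11, 1.4e-5)))           # about 6500 years
# With the unrounded rate the estimate is slightly lower (about 6400 years):
print(round(sweep_age(1, 11, rate_seq)))
```

The small difference between the two results comes only from rounding the mutation rate; both are point estimates from a single observed mutation and therefore carry wide uncertainty.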

Although hitchhiking would most consistently explain the pattern of variation observed in the rp49 region, we have not definitively ruled out historical explanations. Indeed, the detection of some haplotype structure in other surveyed regions pointed to historical explanations. Only a multilocus approach using large population samples might allow discarding the admixture hypothesis. Also, analysis of variation across contiguous regions of the genome might be used to detect the differential signature of natural selection (NURMINSKY et al. 2001). If the haplotype structure detected at the rp49 region were due to hitchhiking, it would decay and eventually disappear at some distance from this region. Analysis of variation in genomic regions located at increasing distances from the rp49 gene would, thus, allow testing of the selective hypothesis and, if confirmed, it would also allow delimiting the target of selection.


*  FOOTNOTES

1 Present address: Laboratoire de Génétique Moléculaire de la Différenciation, Institut Jacques Monod, 75251 Paris Cedex 05, France.
2 Present address: Unité de Génétique Moléculaire des Levures, Institut Pasteur, 75724 Paris Cedex 15, France.


*  ACKNOWLEDGMENTS

We thank Gema Blasco and David Salguero for technical support and Serveis Científico-Tècnics, Universitat de Barcelona, for automated sequencing facilities. We are especially grateful to A. Barbal and C. Arribas for collecting flies in Maputo, and S. O. Kolokotronis for his collaboration in the project. We also thank C. Segarra for valuable comments on the manuscript. G.B. and M.G. were supported by the Erasmus program between Université Denis Diderot, Paris, France, and Universitat de Barcelona, Spain. This work was supported by grants PB97-0918 from Comisión Interdepartamental de Ciencia y Tecnología, Spain and 1999SGR-25 from Comissió Interdepartamental de Recerca i Innovació Tecnològica, Catalonia, Spain, to M.A.

Manuscript received January 13, 2001; Accepted for publication April 11, 2001.


*  LITERATURE CITED

ANDOLFATTO, P. and M. KREITMAN, 2000 Molecular variation at the In(2L)t proximal breakpoint site in natural populations of Drosophila melanogaster and D. simulans. Genetics 154:1681-1691.

ANDOLFATTO, P. and M. PRZEWORSKI, 2000 A genome-wide departure from the standard neutral model in natural populations of Drosophila. Genetics 156:257-268.

ASHBURNER, M., 1989 Drosophila: A Laboratory Handbook. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.

BALLARD, J. W. O., J. HATZIDAKIS, T. L. KARR, and M. KREITMAN, 1996 Reduced variation in Drosophila simulans mitochondrial DNA. Genetics 144:1519-1528.

BARTON, N. H., 1998 The effect of hitch-hiking on neutral genealogies. Genet. Res. 72:123-133.

BEGUN, D. J. and C. F. AQUADRO, 1995 Molecular variation at the vermilion locus in geographically diverse populations of Drosophila melanogaster and D. simulans. Genetics 140:1019-1032.

BEGUN, D. J. and P. WHITLEY, 2000 Reduced X-linked nucleotide polymorphism in Drosophila simulans. Proc. Natl. Acad. Sci. USA 97:5960-5965.

DEPAULIS, F. and M. VEUILLE, 1998 Neutrality tests based on the distribution of haplotypes under an infinite site model. Mol. Biol. Evol. 15:1788-1790.

DUVERNELL, D. D. and W. F. EANES, 2000 Contrasting molecular population genetics of four hexokinases in Drosophila melanogaster, D. simulans and D. yakuba. Genetics 156:1191-1201.

EANES, W. F., M. KIRCHNER, J. YOON, C. H. BIERMANN, and I.-N. WANG et al., 1996 Historical selection, amino acid polymorphism and lineage-specific divergence at the G6pd locus in Drosophila melanogaster and D. simulans. Genetics 144:1027-1041.

EWENS, W. J., 1972 The sampling theory of selectively neutral alleles. Theor. Popul. Biol. 3:87-112.

FAY, J. C. and C.-I WU, 2000 Hitchhiking under positive Darwinian selection. Genetics 155:1405-1413.

FU, Y.-X., 1997 Statistical tests of neutrality against population growth, hitchhiking and background selection. Genetics 147:915-925.

FU, Y.-X. and W.-H. LI, 1993 Statistical tests of neutrality of mutations. Genetics 133:693-709.

GILLESPIE, J. H., 1999 The role of population size in molecular evolution. Theor. Popul. Biol. 55:145-156.

GILLESPIE, J. H., 2000 Genetic drift in an infinite population: the pseudohitchhiking model. Genetics 155:909-919.

HAMBLIN, M. T. and M. VEUILLE, 1999 Population structure among African and derived populations of Drosophila simulans: evidence for ancient subdivision and recent admixture. Genetics 153:305-317.

HASSON, E., I. N. WANG, L. W. ZENG, M. KREITMAN, and W. F. EANES, 1998 Nucleotide variation in the triosephosphate isomerase (Tpi) locus of Drosophila melanogaster and Drosophila simulans. Mol. Biol. Evol. 15:756-769.

HEY, J. and R. M. KLIMAN, 1993 Population genetics and phylogenetics of DNA sequence variation at multiple loci within the Drosophila melanogaster species complex. Mol. Biol. Evol. 10:804-822.

HILL, W. G. and A. ROBERTSON, 1968 Linkage disequilibrium in finite populations. Theor. Appl. Genet. 38:226-231.

HUDSON, R. R., 1983 Properties of a neutral allele model with intragenic recombination. Theor. Popul. Biol. 23:183-201.

HUDSON, R. R., 1987 Estimating the recombination parameter of a finite population model without selection. Genet. Res. 50:245-250.

HUDSON, R. R., 1990 Gene genealogies and the coalescent process, pp. 1–44 in Oxford Surveys in Evolutionary Biology, edited by P. H. HARVEY and L. PARTRIDGE. Oxford University Press, New York.

HUDSON, R. R. and N. L. KAPLAN, 1985 Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics 111:147-164.

HUDSON, R. R., M. KREITMAN, and M. AGUADÉ, 1987 A test of neutral molecular evolution based on nucleotide data. Genetics 116:153-159.

HUDSON, R. R., D. D. BOOS, and N. L. KAPLAN, 1992 A statistical test for detecting geographic subdivision. Mol. Biol. Evol. 9:138-151.

HUDSON, R. R., K. BAILEY, D. SKARECKY, J. KWIATOWSKI, and F. J. AYALA, 1994 Evidence for positive selection in the Superoxide Dismutase (Sod) region of Drosophila melanogaster. Genetics 136:1329-1340.

HUDSON, R. R., A. G. SÁEZ, and F. J. AYALA, 1997 DNA variation at the Sod locus of Drosophila melanogaster: an unfolding story of natural selection. Proc. Natl. Acad. Sci. USA 94:7725-7729.

JUKES, T. H., and C. R. CANTOR, 1969 Evolution of protein molecules, pp. 21–120 in Mammalian Protein Metabolism, edited by H. W. MUNRO. Academic Press, New York.

KAPLAN, N. L., R. R. HUDSON, and C. H. LANGLEY, 1989 The "hitchhiking effect" revisited. Genetics 123:887-899.

KELLY, J. K., 1997 A test of neutrality based on interlocus associations. Genetics 146:1197-1206.

KHADEM, M., J. ROZAS, C. SEGARRA, and M. AGUADÉ, 2001 DNA variation at the rp49 gene region in Drosophila madeirensis and D. subobscura from Madeira: inferences about the origin of an insular endemic species. J. Evol. Biol. in press.

KIM, Y. and W. STEPHAN, 2000 Joint effects of genetic hitchhiking and background selection on neutral variation. Genetics 155:1415-1427.

KINGMAN, J. F. C., 1982a The coalescent. Stochastic Processes and Their Applications 13:235-248.

KINGMAN, J. F. C., 1982b On the genealogy of large populations. J. Appl. Probab. 19A:27-43.

KLIMAN, R. M. and J. HEY, 1993a Reduced natural selection associated with low recombination in Drosophila melanogaster. Mol. Biol. Evol. 10:1239-1258.

KLIMAN, R. M. and J. HEY, 1993b DNA sequence variation at the period locus within and among species of the Drosophila melanogaster complex. Genetics 133:375-387.

KUMAR, S., K. TAMURA, I. JAKOBSEN and M. NEI, 2000 MEGA, Molecular Evolutionary Genetics Analysis, version 2.0.

LABATE, J. A., C. H. BIERMANN, and W. F. EANES, 1999 Nucleotide variation at the runt locus in Drosophila melanogaster and Drosophila simulans. Mol. Biol. Evol. 16:724-731.

LACHAISE, D., M.-L. CARIOU, J. R. DAVID, F. LEMEUNIER, and L. TSACAS et al., 1988 Historical biogeography of the Drosophila melanogaster species subgroup. Evol. Biol. 22:159-255.

MADDISON, W. P., and D. R. MADDISON, 1992 MacClade: Analysis of Phylogeny and Character Evolution. Version 3.0. Sinauer, Sunderland, MA.

MAYNARD SMITH, J. and J. HAIGH, 1974 The hitch-hiking effect of a favourable gene. Genet. Res. 23:23-35.

MORIYAMA, E. N. and J. R. POWELL, 1996 Intraspecific nuclear DNA variation in Drosophila. Mol. Biol. Evol. 13:261-277.

NEI, M., 1987 Molecular Evolutionary Genetics. Columbia University Press, New York.

NEI, M. and T. GOJOBORI, 1986 Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3:418-426.

NURMINSKY, D., D. DE AGUIAR, C. D. BUSTAMANTE, and D. L. HARTL, 2001 Chromosomal effects of rapid gene evolution in Drosophila melanogaster. Science 291:128-130.

O'CONNELL, P. and R. ROSBASH, 1984 Sequence, structure and codon preference of the Drosophila ribosomal protein 49 gene. Nucleic Acids Res. 12:5495-5513.

POWELL, J. R., 1997 Progress and Prospects in Evolutionary Biology. The Drosophila Model. Oxford University Press, New York.

ROFF, D. A. and P. BENTZEN, 1989 The statistical analysis of mitochondrial DNA polymorphisms: χ² and the problem of small samples. Mol. Biol. Evol. 6:539-545.

ROZAS, J. and M. AGUADÉ, 1993 Transfer of genetic information in the rp49 region of Drosophila subobscura between different chromosomal gene arrangements. Proc. Natl. Acad. Sci. USA 90:8083-8087.

ROZAS, J. and M. AGUADÉ, 1994 Gene conversion is involved in the transfer of genetic information between naturally occurring inversions of Drosophila. Proc. Natl. Acad. Sci. USA 91:11517-11521.

ROZAS, J. and R. ROZAS, 1999 DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15:174-175.

ROZAS, J., C. SEGARRA, G. RIBÓ, and M. AGUADÉ, 1999 Molecular population genetics of the rp49 gene region in different chromosomal inversions of Drosophila subobscura. Genetics 151:189-202.

SAIKI, R. K., D. H. GELFAND, S. STOFFEL, S. J. SCHARF, and R. HIGUCHI et al., 1988 Primer-directed enzymatic amplification of DNA with a thermostable polymerase. Science 239:487-491.

SAITOU, N. and M. NEI, 1987 The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425.

SINGH, R. S. and A. D. LONG, 1992 Geographic variation in Drosophila: from molecules to morphology and back. Trends Ecol. Evol. 7:340-345.

SLATKIN, M. and R. R. HUDSON, 1991 Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics 129:555-562.

STEPHAN, W., T. H. E. WIEHE, and M. W. LENZ, 1992 The effect of strongly selected substitutions on neutral polymorphism: analytical results based on diffusion theory. Theor. Popul. Biol. 41:237-254.

STROBECK, C., 1987 Average number of nucleotide differences in a sample from a single subpopulation: a test for population subdivision. Genetics 117:149-153.

TAJIMA, F., 1989 Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585-595.

TRUE, J. R., J. M. MERCER, and C. C. LAURIE, 1996 Differences in crossover frequency and distribution among three sibling species of Drosophila. Genetics 142:507-523.

THOMPSON, J. D., D. G. HIGGINS, and T. J. GIBSON, 1994 CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-4680.

WATTERSON, G. A., 1975 On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7:256-276.



