Nucleotide variation at the ribosomal protein 49 (rp49) gene region has been studied in 75 lines of Drosophila subobscura belonging to four chromosomal arrangements (Ost, O3+4, O3+4+8, and O3+4+23). The location of the rp49 gene region within the inversion loop differs among heterokaryotypes: it is very close to one of the breakpoints in heterozygotes involving Ost chromosomes, while it is in a more central position in all other heterokaryotypes. The distribution of nucleotide polymorphism in the different arrangements is consistent with a monophyletic origin of the inversions. The data also provide evidence that gene conversion and possibly double crossover are involved in shuffling nucleotide variation among gene arrangements. The analyses reveal that the level of genetic exchange is higher when the region is located in a more central position of the inverted fragment than when it is close to the breakpoints. The pairwise difference distributions as well as the negative values of Tajima's and Fu and Li's statistics further support the hypothesis that nucleotide variation within chromosomal arrangements still reflects expansion after the origin of the inversions. Under the expansion model, we have estimated the time of origin of the studied inversions.
INVERSION polymorphism is widespread in the genus Drosophila. In Drosophila subobscura, for instance, >80 different chromosomal gene arrangements have been described (reviewed in Krimbas and Loukas 1980; Krimbas 1992; Powell 1997). In this species, inversion polymorphism is distributed among its five acrocentric chromosomes, in contrast to other Drosophila species (like D. pseudoobscura). In D. subobscura there is also ample evidence for the adaptive value of chromosomal inversion polymorphism. Studies of the distribution of chromosomal polymorphism in different localities have shown that the frequencies of some chromosomal arrangements exhibit clines correlated with latitude (Krimbas 1992). Although parallel gradual variation does not imply a causal relationship, and therefore is not unambiguous evidence of selection, in D. subobscura the correlation between latitude and the frequency of certain chromosomal arrangements was found both in the Old World and in the recently colonized areas of North and South America. The presence of parallel clines in different geographic areas clearly supports the adaptive value of the inversion polymorphism (Prevosti et al. 1988).
It is generally accepted that the adaptive character of inversion polymorphism is due to the gene content of the different gene arrangements (see Powell 1997). Each gene arrangement could contain a particular combination of genes that, because of the suppression of recombination in heterokaryotypes, would be maintained as a block. Little is known, however, about the relative importance of recombination (i.e., gene flow) and mutation in shaping nucleotide variation of different gene arrangements. If recombination were totally suppressed in inversion heterozygotes, DNA regions included in a particular inversion would evolve independently in inverted and noninverted chromosomes; on the other hand, recombination as a homogenizing factor can prevent their independent evolution.
To date, there are few empirical studies relating nucleotide and chromosomal variation, and most of them have been performed in three species of Drosophila: D. melanogaster (Aguadé 1988a; Bénassi et al. 1993; Wesley and Eanes 1994; Hasson and Eanes 1996), D. pseudoobscura (Aquadro et al. 1991; Popadić and Anderson 1994, 1995; Popadić et al. 1995; Babcock and Anderson 1996), and D. subobscura (Rozas and Aguadé 1990, 1993, 1994; Rozas et al. 1995). However, these studies have provided useful information about the evolution of inversions. For example, at the rp49 (ribosomal protein 49) gene region of D. subobscura, Rozas and Aguadé (1990, 1994) detected genetic exchange between different chromosomal gene arrangements, and they proposed that this transfer was accomplished by gene conversion.
Here, we have studied nucleotide variation at the rp49 gene region of D. subobscura in the four major chromosomal arrangements of the O chromosome with inversions that include the rp49 gene (Ost, O3+4, O3+4+8, and O3+4+23). As depicted in Figure 1, arrangements Ost and O3+4 arose independently from the O3 arrangement (Ramos-Onsins et al. 1998). Arrangement O3 is not present in extant populations of D. subobscura, but it is present in the closely related species D. madeirensis and D. guanche. On the other hand, O3+4+8 and O3+4+23 originated from the O3+4 arrangement, and in contrast these three arrangements coexist in some populations. In the arrangements studied, the location of the gene within the inversion loop differs in the different heterokaryotypes: between Ost and the other arrangements the rp49 gene is located near one of the breakpoints, while in heterokaryotypes between O3+4, O3+4+8, and O3+4+23, the gene is in a more central position in the inversion loop. The different location of the gene in the heterokaryotypes is expected to affect the level of genetic exchange between arrangements because near breakpoints it can be due exclusively to gene conversion; however, in more internal positions of the inversion loop both gene conversion and double crossover may contribute to the genetic exchange. The present study therefore tries to investigate the level of gene flow between arrangements. Moreover, the study aims to assess if nucleotide variation in the different gene arrangements is at stationary equilibrium and to estimate the age of the inversions.
MATERIALS AND METHODS
Fly samples: Seventy-five D. subobscura lines from El Pedroso (Spain) and from Bizerte (Tunisia) were used. Strains (and DNA sequences) from El Pedroso have been reported previously (Rozas and Aguadé 1994). The 41 strains newly reported were collected at Bizerte (Tunisia) in May 1996. The Va/Ba balancer stock (Sperlich et al. 1977) was used to obtain isochromosomal lines. For each line the chromosomal gene arrangement was determined as previously described in Rozas and Aguadé (1990). The location of the rp49 gene on the O chromosome defines four major chromosomal classes: Ost, O3+4, O3+4+8, and O3+4+23. As shown in Figure 1, these chromosomal classes differ by at least one inversion that includes the rp49 gene, and therefore recombination by a single crossover would be prevented in the different heterokaryotypes. Chromosomes with both inversions 3 and 4 (i.e., O3+4, O3+4+8, and O3+4+23 chromosomes) are generically cited as O3+4+X. Although the sequenced lines are not a random sample of the population, for a given chromosomal class they were randomly chosen (Table 1).
DNA sequencing: The rp49 nucleotide sequences for the 34 lines of El Pedroso have been previously reported (Rozas and Aguadé 1994; EMBL/GenBank accession numbers X80076-X80109). Genomic DNA from the Tunisian isochromosomal lines was extracted using a modification of protocol 48 from Ashburner (1989). An ∼1.6-kb region including the rp49 coding region (402 bp) and its 5′ and 3′ flanking regions was amplified by PCR (Saiki et al. 1988). Several oligonucleotides, designed at intervals of ∼300 nucleotides, were used as primers for sequencing. The amplified fragments were cycle-sequenced and separated on a Perkin-Elmer (Norwalk, CT) ABI PRISM 377 automated DNA sequencer following the manufacturer's instructions. For each line, the sequence of both strands was determined. The newly reported nucleotide sequences will appear in the EMBL, GenBank, and DDBJ Nucleotide Sequence Databases under the accession numbers AJ228881-AJ228921.
Data analysis: The rp49 sequence of the H27 line of D. subobscura (Aguadé 1988b; accession number M21333) was used as the reference sequence. Sequences were multiply aligned using the Clustal W program (Thompson et al. 1994) and edited with the MacClade version 3.06 program (Maddison and Maddison 1992). Phylogenetic analysis was performed using the neighbor-joining algorithm (Saitou and Nei 1987) implemented in the MEGA version 1.01 program (Kumar et al. 1994); bootstrap values were based on 500 replicates. The rp49 nucleotide sequences of D. guanche and D. madeirensis (Ramos-Onsins et al. 1998; accession numbers Y09707-Y09708) were used as outgroups.
The DnaSP version 2.82 program (Rozas and Rozas 1997) was used to estimate nucleotide diversity, genetic distances, and genetic differentiation between populations and to detect gene conversion tracts. Nucleotide diversity, π, or the average number of nucleotide differences per site, was estimated using Equation 10.5 of Nei (1987). The silent nucleotide diversity (including both noncoding positions and synonymous sites in the coding region) was estimated as the average number of silent nucleotide differences per silent site (Nei and Gojobori 1986; Nei 1987). DNA divergence between arrangements was estimated as Dxy, the average number of nucleotide substitutions between populations (Equation 10.20 in Nei 1987), and as Da, the net number of nucleotide substitutions (Equation 10.21 in Nei 1987). FST, the proportion of nucleotide diversity that is attributable to variation between populations, was estimated using Equation 3 in Hudson et al. (1992a). The average level of gene flow, measured as Nm, was estimated from FST, assuming that populations are structured in an island model and are at migration-drift equilibrium (Wright 1951; Hudson et al. 1992a).
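The estimators named above can be illustrated with a minimal sketch. It is not the DnaSP implementation: it assumes gap-free aligned sequences of equal length, takes the within-population term of FST as the unweighted mean of the two diversities, and converts FST to Nm with the classical island-model approximation FST ≈ 1/(1 + 4Nm), which is an assumption of this sketch and may differ from the exact expression in Hudson et al. (1992a). All function names are hypothetical.

```python
from itertools import combinations, product


def pairwise_diff(a, b):
    """Number of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def nucleotide_diversity(seqs):
    """pi: average number of pairwise differences per site within a sample."""
    pairs = list(combinations(seqs, 2))
    total = sum(pairwise_diff(a, b) for a, b in pairs)
    return total / (len(pairs) * len(seqs[0]))


def dxy(seqs_x, seqs_y):
    """Average number of differences per site between two samples."""
    total = sum(pairwise_diff(a, b) for a, b in product(seqs_x, seqs_y))
    return total / (len(seqs_x) * len(seqs_y) * len(seqs_x[0]))


def da(seqs_x, seqs_y):
    """Net divergence: Dxy minus the mean within-sample diversity."""
    within = (nucleotide_diversity(seqs_x) + nucleotide_diversity(seqs_y)) / 2
    return dxy(seqs_x, seqs_y) - within


def fst(seqs_x, seqs_y):
    """FST = 1 - Hw/Hb, with Hw taken as the unweighted mean diversity (assumption)."""
    hw = (nucleotide_diversity(seqs_x) + nucleotide_diversity(seqs_y)) / 2
    return 1 - hw / dxy(seqs_x, seqs_y)


def nm_from_fst(f):
    """Island-model approximation (assumption): FST = 1 / (1 + 4Nm)."""
    return (1 / f - 1) / 4


# Toy data, not taken from the study.
class_x = ["AAAA", "AAAT"]
class_y = ["TTAA", "TTAT"]
print(nucleotide_diversity(class_x), dxy(class_x, class_y), da(class_x, class_y))
print(fst(class_x, class_y), nm_from_fst(fst(class_x, class_y)))
```

Under this approximation, larger FST values translate directly into smaller Nm values, which is the relationship relied on when comparing arrangements.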
The DnaSP program (Rozas and Rozas 1997) was also used to perform some neutrality tests. Tajima's test (Tajima 1989) and Fu and Li's tests (Fu and Li 1993) were performed using η, the total number of mutations (also called the minimum number of mutations), instead of S, the number of segregating sites (i.e., considering the additional mutations present in those sites segregating for three nucleotides). The outgroup used for Fu and Li's tests was the D. guanche rp49 sequence (Ramos-Onsins et al. 1998).
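As an illustration of the first of these tests, the following sketch computes Tajima's D from the sample size n, the average number of pairwise differences per sequence k, and the total number of mutations η (used in place of S, as described above). The constants follow the standard definitions in Tajima (1989); the function name is hypothetical, and the sketch returns only the statistic, not its significance.

```python
import math


def tajimas_d(n, k, eta):
    """Tajima's D for n sequences, mean pairwise differences k, eta mutations."""
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two estimates of theta, scaled by its standard deviation.
    return (k - eta / a1) / math.sqrt(e1 * eta + e2 * eta * (eta - 1))


# Illustrative values, not taken from the study.
print(tajimas_d(10, 3.888889, 16))
```

A negative value indicates an excess of low-frequency variants relative to the neutral equilibrium expectation, which is the pattern expected after an expansion.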
The critical values of Tajima's D statistic (Tajima 1989) and of the raggedness statistic r (Harpending et al. 1993; Equation 1 in Harpending 1994) were obtained by computer simulations (10,000 replicates) using the coalescent algorithm (for no recombination) described in Hudson (1990). The coalescent approach (Hudson 1990) was also used to estimate the confidence intervals of Tajima's D and r statistics for different levels of recombination.
We performed computer simulations to estimate the probability of obtaining k shared polymorphic sites segregating among all gene arrangements. The main assumption of the model is that the mutation rate is the same for all silent sites. In each replicate we generated four sets of sequences (one for each chromosomal class) of n sites (the number of silent sites), and we randomly spread in each set as many mutations as we had observed in each chromosomal class (each mutation would generate a polymorphic site). The probability of obtaining a number of shared polymorphic sites equal to or higher than the observed number k, P (K ≥ k), was estimated as the proportion of simulated replicates in which the number of shared polymorphic sites segregating in all four sets of sequences was ≥k. The null hypothesis that shared polymorphic sites are due to independent accumulation of mutations is rejected if that probability is below the 5% critical value.
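The simulation just described can be sketched as follows. This is a reconstruction from the verbal description, not the program actually used: each chromosomal class receives its mutations on distinct sites drawn uniformly at random, and the function name, the number of replicates, and the seed are hypothetical choices.

```python
import random


def prob_shared_sites(n_sites, mutations_per_class, k_observed,
                      replicates=10000, seed=0):
    """Estimate P(K >= k): proportion of replicates in which at least
    k_observed sites are polymorphic in every chromosomal class."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        shared = None
        for m in mutations_per_class:
            # Each mutation generates one polymorphic site (distinct sites).
            sites = set(rng.sample(range(n_sites), m))
            shared = sites if shared is None else shared & sites
        if len(shared) >= k_observed:
            hits += 1
    return hits / replicates


# Toy example: 50 sites, three classes with 20 mutations each,
# at least 2 sites shared by all three classes.
print(prob_shared_sites(50, [20, 20, 20], 2, replicates=2000))
```

Because the estimate is a proportion of replicates, its resolution is limited to 1/replicates; a value printed as zero means only that no replicate reached the observed count.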
Nucleotide variation: A total of 137 polymorphic sites (representing 147 mutations) were detected in the rp49 region among the 75 lines of D. subobscura (Figure 2); all polymorphisms were silent. The average length of the DNA region studied was 1515 bp, while the total number of sites compared was 1467 (excluding sites with alignment gaps), which represented a total of 1161 silent sites (including both synonymous sites and noncoding positions). Tables 2 and 3 give a summary of nucleotide polymorphism for the complete region, for the different functional regions, and for the different chromosomal classes. For the total data set, the average nucleotide diversity was π = 0.010, although the corresponding estimates for each chromosomal class were slightly lower (0.006-0.008). Figure 3 shows a plot of the estimates of polymorphism and divergence across the region studied. In the 5′ flanking region there were some peaks of high polymorphism (π) corresponding to regions of high divergence (K).
Genetic differentiation between gene arrangements: Table 4 gives a summary of the genetic differentiation between gene arrangements. We found the highest values of genetic differentiation, as measured by Dxy and Da, in all comparisons including the Ost gene arrangement. In fact, in comparisons between O3+4+X chromosomes, the Da values were ∼0.001, while in comparisons between Ost and O3+4+X chromosomes the Da values were ∼0.008. In addition, in comparisons including Ost, the Dxy values (and also Da values) were rather similar regardless of the coexistence of arrangements in the same population (as, for instance, Ost and O3+4) or not (as, for instance, Ost and O3+4+23). Thus, geographic origin has a weak effect on genetic differentiation estimates, which is consistent with the lack of geographic differentiation within arrangement detected in Europe (Rozas et al. 1995). As in Dxy and Da estimates, the largest FST values were also found in comparisons involving the Ost arrangement, and consequently these comparisons gave the lowest Nm values. We tested whether the different gene arrangements were genetically differentiated using the test statistic (Hudson et al. 1992b). All pairwise comparisons showed highly significant probability values (Table 4), indicating that, despite the lower genetic differentiation of O3+4+X chromosomes, all chromosomal classes were genetically differentiated. The differentiation between gene arrangements can be attributed to the presence of several fixed (or nearly fixed) nucleotide differences.
Comparison of the polymorphisms present in the different gene arrangements revealed several polymorphic sites segregating for the same variants in different arrangements, that is, shared polymorphic sites. Assuming that inversions have a unique origin, no shared polymorphism is expected at the origin of a particular inversion. Therefore, the presence of shared polymorphic sites can be due to: (i) mutations that arose independently in both gene arrangements (parallel mutations) or (ii) the transfer of genetic information between gene arrangements. Under the assumption that mutations arise randomly along the rp49 region, the probability of obtaining k polymorphic sites shared by two chromosomal classes was estimated from the hypergeometric distribution (Rozas and Aguadé 1994) (Table 4), while the probability of obtaining k polymorphic sites shared among all chromosomal classes was estimated by computer simulations (see materials and methods). Probability values were obtained considering only the silent sites (i.e., noncoding sites plus synonymous sites in the coding region, n = 1161) and assuming that each site can have only two states (i.e., two different nucleotides). This assumption is more conservative than that previously used in Rozas and Aguadé (1994), considering that each site had four possible states. The computer simulation analysis was performed according to the number of polymorphic sites observed in each gene arrangement (46, 54, 56, and 53); in this case, the probability of obtaining four (the observed number of polymorphic sites shared by all chromosomal classes) or more shared polymorphic sites segregating in all four gene arrangements is very low (P = 0.0000). In all cases the null hypothesis that shared polymorphic sites are due to independent accumulation of mutations was rejected.
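For the two-class case, the hypergeometric calculation can be sketched as below. The sketch assumes two-state sites and treats the polymorphic sites of one class as the marked items from which the polymorphic sites of the other class are drawn; the function name and the example values of k are hypothetical, and the per-comparison values of Table 4 are not reproduced here.

```python
from math import comb


def prob_shared_two_classes(n, s1, s2, k):
    """P(K >= k) when s1 and s2 polymorphic sites are placed independently
    and uniformly among n sites (hypergeometric upper tail)."""
    total = comb(n, s2)
    upper = min(s1, s2)
    tail = sum(comb(s1, j) * comb(n - s1, s2 - j) for j in range(k, upper + 1))
    return tail / total


# Illustration with n = 1161 silent sites and 46 and 54 polymorphic sites
# (two of the class counts given in the text); the k values are arbitrary.
for k in (1, 4, 8):
    print(k, prob_shared_two_classes(1161, 46, 54, k))
```

The tail probability falls steeply as k grows, which is why even a modest number of shared sites is hard to explain by parallel mutation alone.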
Genetic differentiation is a prerequisite to identify the footprint left by gene conversion in DNA sequences. Betrán et al. (1997) described an algorithm to detect gene conversion tracts from DNA sequence data from different populations. That method depends on the parameter ψ (Equation A4 in Betrán et al. 1997), which measures the probability of detecting a converted site; the higher the value of ψ, the more accurate will be the estimate of the number and length of the gene conversion tracts. In our sample, and in agreement with the Dxy and Da values, the ψ values were relatively high between Ost and O3+4+X sequences and rather low among O3+4+X sequences (Table 4). Using the algorithm of Betrán et al. (1997) we identified several gene conversion tracts (Figure 2). This algorithm can detect gene conversion tracts only if they contain at least two informative sites (Equation A1 in Betrán et al. 1997), and the tract length is underestimated by the nucleotide distance between them. Nearly all putative gene conversion tracts detected were transferred between Ost and O3+4+X sequences. Although no distinction can be made between gene conversion and double crossover events if the tract includes the outermost informative nucleotides, genetic exchange by double crossover is expected to affect longer chromosomal regions than gene conversion. In this sense, two of the detected recombination events (in lines TB154 and TB390a with the O3+4+23 arrangement) could correspond to double crossovers instead of to gene conversion events as the length of these tracts (between O3+4+X arrangements) is unusually large. Tracts in lines TB154 and TB390a are longer than 1 kb, while the mean length of the gene conversion tracts observed between Ost and O3+4 arrangements is 51 bp. In addition, these long tracts include the outermost informative nucleotides of the data set, and, consequently, it cannot be dismissed that their actual length is considerably larger.
Figure 4 shows the neighbor-joining tree (Saitou and Nei 1987) of the rp49 sequences. In the gene tree, sequences corresponding to the Ost gene arrangement form a monophyletic group, while those corresponding to the other gene arrangements (O3+4, O3+4+8, and O3+4+23) are mixed. Under the assumption that inversions are monophyletic, the topology of the tree would indicate that gene flow is higher among O3+4+X sequences than between O3+4+X and Ost sequences.
Neutrality tests: The observed distribution of mutations within species was contrasted with that predicted by the neutral theory. The Tajima (1989) and the Fu and Li (1993) tests contrast different measures of θ (θ = 4Nu, where N is the effective population size and u is the mutation rate): those based on the number of segregating sites, on the number of pairwise differences, and on the number of mutations in external branches of the genealogy. Although the neutral model in general was not rejected, the probability values were very close to the 5% critical value (Table 3; see also Table 5).
We have also analyzed the pairwise nucleotide difference distribution, or mismatch distribution (Slatkin and Hudson 1991; Rogers and Harpending 1992), for each gene arrangement (Figure 5). These authors have shown that the expected pairwise difference distribution in a growing population resembles a Poisson distribution. In all gene arrangements, the observed distribution was close to the Poisson distribution and was fit poorly by the geometric distribution, which is expected in populations of constant size. Although the observed shape of the distribution is consistent with a process of population (in this case, gene arrangement) expansion, it does not constitute any evidence for expansion. In fact, under the neutral model with no intragenic recombination, the expected geometric distribution in populations of constant size is the sum of several quite different distributions (including bimodal and trimodal distributions; Watterson 1975; Slatkin and Hudson 1991; Rogers and Harpending 1992). However, the expected values of the raggedness statistic r, which quantifies the smoothness of the pairwise difference distribution, are lower in expanding populations than in populations of constant size (Harpending et al. 1993; Harpending 1994). To test whether our data are compatible with the constant population size model (Eller and Harpending 1996), we performed simulations on the basis of the neutral coalescent process without recombination (Hudson 1990). In the simulations the sample size (the studied sample size in each gene arrangement) and θ (the θ estimated from the average number of nucleotide differences) were fixed for all replicates. The parameter r was estimated in each replicate, and the proportion of cases (replicates) where the r value was lower than the observed value was computed (Table 3).
The constant population size model was rejected in all cases except for the O3+4+23 gene arrangement (P = 0.215); however, when nucleotide variants included in gene conversion tracts were subtracted, that model was also rejected for this arrangement (P = 0.04; Table 5).
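The mismatch distribution and the raggedness statistic used in this test can be sketched as follows. The sketch assumes the usual form of Harpending's statistic, r = Σ (x_i − x_{i−1})² for i = 1 to d + 1, where x_i is the relative frequency of sequence pairs differing at i sites and d is the largest observed number of differences; the coalescent simulations that supply the null distribution are not included, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(seqs):
    """Relative frequencies x_0..x_d of pairs differing at 0..d sites."""
    counts = Counter(
        sum(a != b for a, b in zip(s, t)) for s, t in combinations(seqs, 2)
    )
    n_pairs = sum(counts.values())
    d = max(counts)
    return [counts.get(i, 0) / n_pairs for i in range(d + 1)]


def raggedness(freqs):
    """Sum of squared differences between successive classes, with x_{d+1} = 0."""
    padded = list(freqs) + [0.0]
    return sum((padded[i] - padded[i - 1]) ** 2 for i in range(1, len(padded)))


# Toy sequences, not taken from the study.
sample = ["AAAA", "AAAT", "AATT", "ATTT"]
dist = mismatch_distribution(sample)
print(dist, raggedness(dist))
```

A smooth, unimodal distribution yields a small r, whereas a multimodal one yields a larger value; for example, the frequencies (0.25, 0.5, 0.25) give a lower r than (0.5, 0, 0.5).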
On the other hand, in populations of constant size the expected shape of the pairwise difference distribution is a function of the intragenic recombination parameter C = 4Nc (where N is the effective population size, and c is the recombination rate per generation between the most distant sites). For high values of C the distribution of the pairwise differences might resemble a Poisson distribution, and thus the raggedness r statistic is expected to be lower. Additionally, the variance of the distribution of Tajima's D statistic is expected to be reduced with increasing values of the recombination parameter. Because there is recombination in homokaryotypes, we have conducted computer simulations to determine the critical values of the raggedness r and Tajima's D statistics under different values of the recombination parameter (Table 6). In general, either the raggedness r statistic or Tajima's D statistic estimated from our sequences was significantly different from the expected values.
Origin and evolution of inversions: It has been classically considered that inversions have a unique origin, that is, that they are monophyletic (Powell 1997 and references therein), due to the low probability of generating two simultaneous breaks at exactly the same positions independently in different chromosomes. The monophyletic character has been questioned, however, by the observation in some species of hot spots of inversion breakpoints and by the discovery of transposable elements. It has been argued that transposable elements, which can generate chromosomal rearrangements in the laboratory, could also have generated the naturally occurring inversions of Drosophila. However, the nucleotide sequence of regions covering the breakpoints of two such inversions did not reveal the presence of any transposable element (Wesley and Eanes 1994; Cirera et al. 1995).
Studies of nucleotide variation in inverted and noninverted chromosomes have revealed that the distribution of variation within and between gene arrangements varies according to the location of the region studied in the inversion heterokaryotypes. In those surveys where the region studied was close to an inversion breakpoint, all sequences of the inverted chromosomes formed a unique cluster in the gene tree. This was the case for the amylase (Amy) locus and the most common gene arrangements of the third chromosome of D. pseudoobscura (Aquadro et al. 1991; Popadić and Anderson 1994), the esterase-5 (Est-5) gene region, and the sex-ratio and standard chromosomes of D. pseudoobscura (Babcock and Anderson 1996), the rp49 gene region and the Ost and O3+4 gene arrangements of D. subobscura (Rozas and Aguadé 1993, 1994), and both the breakpoint regions and the heat shock 83 (Hsp83) gene and the In(3L)Payne and standard chromosomes of D. melanogaster (Wesley and Eanes 1994; Hasson and Eanes 1996). These studies clearly support the monophyly of inversions. However, in those studies of nucleotide regions located in a more central position of the inversion loop, the sequences did not cluster in the gene tree according to their gene arrangements. This was the case for the Amy region and gene arrangements Tree Line, Olympic, Estes Park, and Hidalgo of D. pseudoobscura (Popadić et al. 1995), for the rp49 region and gene arrangements O3+4 and O3+4+8 of D. subobscura (Rozas and Aguadé 1993), for the Est-6 gene region and In(3L)Payne and standard chromosomes of D. melanogaster (Hasson and Eanes 1996), and for the P6 (also named Fbp2) region and In(2L)t and standard chromosomes of D. melanogaster (Bénassi et al. 1993).
The results of these latter studies would be consistent with: (i) a multiple origin of the inversions, (ii) a unique origin of inversions with transfer of genetic information between gene arrangements (by double crossover or gene conversion), or (iii) a unique origin of inversions with an important number of parallel mutations.
Present data on nucleotide polymorphism at the rp49 gene region of D. subobscura are compatible with a monophyletic origin of the studied inversions. The topology of the rp49 gene tree (Figure 4), including sequences of the four major gene arrangements, clearly supports the monophyly of the Ost gene arrangement. However, for the other gene arrangements (O3+4, O3+4+8, and O3+4+23) the sequences do not form unique clusters in the gene tree. As we have shown that parallel mutation cannot account for the high number of observed shared polymorphic sites, the topology of the tree for O3+4+X arrangements would be compatible either with a multiple origin of the corresponding inversions (Figure 1) or with a unique origin with genetic exchange between arrangements. It should be noted, however, that the location of the rp49 gene in the inversion loop is different in O3+4+X heterozygotes than in heterozygotes between Ost and O3+4+X. In the latter, the rp49 region is very close to one of the breakpoints and within the inversion, while in heterozygotes between O3+4+X chromosomes the region is located in a more central position of the inversion loop. The observed pattern of nucleotide variation in the different arrangements and the location of the DNA region in the heterokaryotypes support a differential rate of genetic transfer between arrangements: no (or reduced) genetic exchange if the gene is at (or near) the breakpoint and extensive genetic exchange if the gene is in a more central position in the inversion. Therefore, the distribution of the O3+4+X sequences in the gene tree would not provide evidence for a multiple origin of either inversion 8 or inversion 23 (see Figure 1).
Recombination and gene flow between inversions: Here we have detected several short gene conversion tracts and no evidence of double crossovers when the rp49 region is close to the inversion breakpoint (i.e., in heterozygotes between Ost and O3+4+X arrangements), and few short gene conversion tracts and some long tracts that might correspond to double crossovers when the gene is in a more central position in the inversion loop (i.e., in heterozygotes between O3+4+X arrangements). Our results are in agreement with some predictions of the genetic exchange between gene arrangements: a major role of gene conversion in regions around the inversion breakpoints as opposed to a major role of crossing over in the central regions of inversions (Navarro et al. 1997). These authors showed that exchange in inversion heterokaryotypes due to double crossing over would be negligible near the breakpoints and maximum in the central regions of the inverted fragment. Assuming that the gene conversion rate is constant across the inverted region, gene flow (both by double crossing over and gene conversion) would therefore be lower near the breakpoints and maximum in the central region of inversions. According to these predictions, the lower genetic differentiation detected between O3+4+X chromosomes than between Ost and O3+4+X gene arrangements is caused by the higher genetic exchange expected in the central part of inversions.
Alternatively, it could be argued that the observed pattern of genetic differentiation between arrangements was due to their geographic differentiation or to the differential frequency of Ost/O3+4+X heterokaryotypes relative to heterokaryotypes between O3+4+X arrangements. However, genetic differentiation between Ost and O3+4+X arrangements is similar regardless of the geographic origin of the O3+4+X chromosomes, which is consistent with previous studies at the rp49 region showing that European populations are not genetically differentiated within each of these arrangements (Rozas et al. 1995). Also, Ost and O3+4 are the most frequent gene arrangements in these populations, and therefore genetic exchange between these arrangements could occur in the rather frequent Ost/O3+4 heterokaryotypes. These observations support the idea that the strong genetic differentiation between Ost and O3+4+X would not be caused by the geographic distribution of these arrangements, but by the reduced genetic exchange at the rp49 region in these heterokaryotypes.
It has been shown that the rate of decay of locus-inversion disequilibria only depends on the rate of transfer of genetic information between gene arrangements (Ishii and Charlesworth 1977; Nei and Li 1980) and that the decay half-life of the association is of the order of the reciprocal of the genetic exchange rate. Assuming that all gene flow detected between Ost and O3+4 was due to gene conversion, Rozas and Aguadé (1994) estimated that the gene conversion rate per base pair and per generation is ∼3.5 × 10⁻⁷. It is also possible to estimate the gene conversion rate from the total number of gene conversion tracts (those observed plus those inferred but not observed); for the rp49 sequences of Ost and O3+4, this rate has been estimated as 2.7 × 10⁻⁷ (Betrán et al. 1997).
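To give a sense of the time scale implied by these rates, the following sketch computes a decay half-life under the simplest possible assumption, namely that the association decays geometrically as D_t = D_0 (1 − r)^t, where r is the per-generation exchange rate. This decay model and the function name are assumptions of the sketch, not expressions taken from the cited works.

```python
import math


def decay_half_life(rate):
    """Generations needed for the association to halve, assuming
    geometric decay D_t = D_0 * (1 - rate)**t (an assumption)."""
    return math.log(0.5) / math.log1p(-rate)


# The two gene conversion rate estimates quoted in the text.
for rate in (3.5e-7, 2.7e-7):
    print(rate, decay_half_life(rate), 1 / rate)
```

For rates this small the half-life is close to ln(2)/r, that is, of the same order as the reciprocal of the exchange rate, on the order of millions of generations.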
The adaptive character of chromosomal polymorphism is well documented (see, e.g., Prevosti et al. 1988), as well as the involvement in this adaptation of the differential gene content of inversions (Powell 1997). Coadaptation among loci within inversions has been proposed to maintain chromosomal polymorphism (Prakash and Lewontin 1968; see also Powell 1997). Our results (together with previous observations) provide valuable information on the possible distribution along the inversion of the members of the coadapted complex. In fact, the reduction of recombination in inversion heterokaryotypes would play a fundamental role in maintaining blocks of genes together. However, nucleotide variation studies have shown that recombination is not totally suppressed and can be an important homogenizing factor, especially for genes located in the central part of the inversion. Therefore, selection could more easily maintain coadapted complexes if the corresponding loci were in regions with low rates of genetic exchange between arrangements, that is, close to the inversion breakpoints. In the case of genes in a more central location in the inversion, selection should be stronger to successfully counteract the increased exchange of genetic information due to crossing over. Therefore, if selection were maintaining blocks of genes together, the target genes would be more likely located near the breakpoints.
Nucleotide variation distribution: The topology of the gene tree for all Ost and for all O3+4+X sequences clearly resembles a star phylogeny (Figure 4), that is, a phylogeny where the tree is stretched near the terminal nodes and compressed near the root. This is the topology expected for populations that have recently expanded from a very small size and are therefore in the transient phase to equilibrium. During this phase, the specific footprint left by the expansion should be detected in the pattern of nucleotide variation. No footprint would be detected, however, if the elapsed time since the expansion were long enough (e.g., more than 4N generations).
Due to the unique origin of inversions, a particular gene arrangement increases in frequency (and therefore expands) from one copy to its current frequency in the population. In our case, the expansion of at least some extant arrangements was probably associated with the extinction of the ancestral O3 arrangement. The observed pattern of nucleotide variation would indicate, therefore, that the time since the origin of the particular inversion has not been long enough to reach equilibrium. The negative values of Tajima's D and of Fu and Li's D and F statistics and the Poisson shape of the pairwise difference distribution (and the corresponding small values of the raggedness statistic) might also support this interpretation. Nevertheless, in populations of constant size, Tajima's D and the raggedness r statistics are a function of the intragenic recombination level. We have shown that the higher the recombination parameter, the lower the raggedness r values and the lower the variance of Tajima's D statistic (Table 6). Although the actual value of the recombination parameter is not known for the region studied, in general, either the raggedness r statistic or Tajima's D statistic was significantly different from the expected values. This would allow us to conclude, therefore, that the observed pattern of nucleotide variation within gene arrangement still reflects the expansion of the corresponding inversion since its origin (Figure 6).
Because of the geographic distribution of the gene arrangements, some of them with a clinal distribution, the expansion hypothesis should be more appropriately contrasted with a stepping-stone model. However, both the estimated population size of D. subobscura (10⁷, Comeron 1997) and the estimated times for the origin of the different inversions (see below) support the hypothesis that variation within gene arrangement has not yet attained equilibrium (i.e., the time of origin should be less than 4N generations). The observed pattern could also reflect an expansion of the whole species, but in this case all loci in the genome would show the same pattern of variation. Nucleotide variation at the region encompassing the two Acp70A genes of D. subobscura (Cirera and Aguadé 1998), which is located in a region not affected by inversions, did not show, however, negative values of Tajima's D statistic. The hypothesis of a recent expansion of the whole species can be discarded on this basis, although additional data on variation at loci not associated with inversions would be needed to further support this conclusion.
Age of inversions: Assuming that the pattern of nucleotide variation within a particular arrangement reflects its expansion, we can estimate the time of origin of that arrangement from the coalescent time for its nucleotide sequences. The sudden expansion model basically depends on three parameters: θ0, or initial theta; θ1, or final theta; and τ = 2ut (units of mutational time, where u is the mutation rate and t the time in generations; Rogers and Harpending 1992). We can estimate τ (Rogers 1995, Equation 3) from the observed values of k (nucleotide diversity per sequence, that is, the average number of nucleotide differences between two sequences), considering that θ0 = 0 (due to the unique origin of inversions). To estimate coalescent times, only variation that originated independently in each gene arrangement should be considered. For this reason, we have subtracted all nucleotide variants for which there is evidence of genetic transfer from another gene arrangement: each informative nucleotide variant within a detected gene conversion or double crossover tract has been replaced by the most frequent nucleotide variant in the recipient chromosomal class.
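The correction for transferred variants and the resulting estimate of τ can be sketched as follows. This is only an illustration under stated assumptions: the `transferred` mapping (sequence index to site indices) is a hypothetical stand-in for the output of a tract-detection step, and τ is taken to equal k, which is what a method-of-moments estimator of the form τ = m − θ0 (m being the mean number of pairwise differences) reduces to when θ0 = 0. Whether this matches Equation 3 of Rogers (1995) exactly is an assumption of the sketch.

```python
from collections import Counter
from itertools import combinations


def mask_transferred(seqs, transferred):
    """Replace transferred variants by the majority nucleotide of the class.

    seqs: equal-length sequences from one chromosomal arrangement.
    transferred: hypothetical dict {sequence index: site indices} listing the
        informative variants inside detected conversion or double crossover tracts.
    """
    masked = [list(s) for s in seqs]
    for idx, sites in transferred.items():
        for site in sites:
            column = [s[site] for s in seqs]
            # Most frequent nucleotide at this site in the recipient class.
            masked[idx][site] = Counter(column).most_common(1)[0][0]
    return ["".join(s) for s in masked]


def mean_pairwise_differences(seqs):
    """k: average number of nucleotide differences between two sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)


# Hypothetical class of five sequences; sequence 3 carries a transferred tract.
arrangement = ["AAAA", "AAAA", "AAAA", "ACCA", "AAAT"]
cleaned = mask_transferred(arrangement, {3: [1, 2]})
tau = mean_pairwise_differences(cleaned)  # tau = k when theta0 = 0 (assumed form)
print(cleaned, tau)
```

Masking before computing k matters because transferred variants are older than the arrangement that received them, so leaving them in would inflate k and hence τ.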
To estimate coalescent times, an estimate of the neutral mutation rate is needed. For a time of divergence between D. guanche and D. subobscura (Ramos-Onsins et al. 1998) of 1.8 myr (if the Sophophoran radiation was 30 mya; Throckmorton 1975) or 2.8 myr (if the Sophophoran radiation was 46 mya; Beverley and Wilson 1984), the estimated rate of silent nucleotide substitution per site and per year at the rp49 region would be λ = 14.1 × 10⁻⁹ or λ = 9.1 × 10⁻⁹, respectively.
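Given τ = 2ut, a value of τ can be converted into a time once u is known. The sketch below assumes that u is the per-sequence rate obtained by multiplying the per-site, per-year silent rate λ by the number of silent sites, so that t comes out directly in years. The values of τ and of the number of silent sites are hypothetical placeholders, not figures from the study.

```python
def expansion_time_years(tau, rate_per_site_per_year, n_silent_sites):
    """Solve tau = 2ut for t, with u the per-sequence substitution rate per year."""
    u = rate_per_site_per_year * n_silent_sites
    return tau / (2.0 * u)


# The two rate estimates given in the text (per site and per year).
RATE_FAST = 14.1e-9  # divergence of 1.8 myr
RATE_SLOW = 9.1e-9   # divergence of 2.8 myr

# Hypothetical inputs, for illustration only.
tau_example = 2.0
silent_sites_example = 500

for rate in (RATE_FAST, RATE_SLOW):
    print(rate, expansion_time_years(tau_example, rate, silent_sites_example))
```

Because t is inversely proportional to λ, the slower of the two rates always gives the older estimate, so the two calibrations bracket the inferred age of each arrangement.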
Table 7 shows the estimated times for the origins of the different inversions. These coalescent times are slightly lower than our previous estimates for some of the inversions (Rozas and Aguadé 1994). The differences in the estimates are mainly due to the different methods used for estimation: here we have assumed that variation within gene arrangement is not at equilibrium, while in our previous work the time estimates were based on the maximum value of the number of pairwise differences. In both studies, the time estimates have been obtained under the assumption that all nucleotide variants observed in a particular chromosomal arrangement originated in that arrangement. However, some additional variants not included in the detected gene conversion or double crossover tracts (e.g., some shared polymorphisms) could have been transferred from other gene arrangements; in that case, most probably for the O3+4, O3+4+8, and O3+4+23 arrangements, the coalescent time would be overestimated. If gene flow among these gene arrangements were actually quite important, the coalescent time for the O3+4+X chromosomes would represent the coalescent time for inversion 4 (Figure 6).
We thank Serveis Científico-Tècnics, Universitat de Barcelona, for automated sequencing facilities. This work was supported by grants PB94-923 from Comisión Interdepartamental de Ciencia y Tecnología, Spain, and 1995SGR-577 from Comissió Interdepartamental de Recerca i Tecnologia, Catalonia, Spain, to M.A.
Communicating editor: N. Takahata
- Received April 13, 1998.
- Accepted September 21, 1998.
- Copyright © 1999 by the Genetics Society of America