In the malaria mosquito Anopheles gambiae polymorphic chromosomal inversions may play an important role in adaptation to environmental variation. Recently, we used microarray-based divergence mapping combined with targeted resequencing to map nucleotide differentiation between alternative arrangements of the 2La inversion. Here, we applied the same technique to four different polymorphic inversions on the 2R chromosome of An. gambiae. Surprisingly, divergence was much lower between alternative arrangements for all 2R inversions when compared to the 2La inversion. For one of the rearrangements, 2Ru, we successfully mapped a very small region (∼100 kb) of elevated divergence. For the other three rearrangements, we did not identify any regions of significantly high divergence, despite ample independent evidence from natural populations of geographic clines and seasonal cycling, and stable heterotic polymorphisms in laboratory populations. If these inversions are the targets of selection as hypothesized, we suggest that divergence between rearrangements may have escaped detection due to retained ancestral polymorphism in the case of the youngest 2R rearrangements and to extensive gene flux in the older 2R inversion systems that segregate in both An. gambiae and its sibling species An. arabiensis.
MORE than 70 years ago Dobzhansky and Sturtevant (1938) first discovered polymorphic inversion arrangements carried by various Drosophila pseudoobscura populations. After observing correlations between environmental conditions and inversion frequencies, Dobzhansky proposed that inversions are under strong selection due to their role in promoting local adaptation to the heterogeneous conditions a species encounters both spatially and temporally (Dobzhansky 1944, 1948; Powell 1997). More recent studies have implicated chromosomal inversions in the adaptation of a diversity of eukaryotes including humans (Coluzzi et al. 1979; Feder et al. 2003; Hoffmann et al. 2004; Stefansson et al. 2005). Long known to be common in dipteran insects, more recent HapMap data suggest that polymorphic inversions may be numerous in human populations and by extension other mammals (Bansal et al. 2007). Given their potential importance in facilitating adaptation, surprisingly little is known about the mechanism(s) or the genes responsible for maintaining inversion polymorphisms in natural populations.
Gene exchange between inverted and standard arrangements, although reduced, can still occur through gene flux: the action of gene conversion and multiple crossovers in inversion heterozygotes (heterokaryotypes) (Chovnick 1973; Navarro et al. 1997; Schaeffer and Anderson 2005). Over time allelic variation unrelated to ecological adaptation should become homogenized between arrangements, while alleles which are under divergent selection pressures should remain in linkage disequilibrium with each other and with the inversion itself, leading to heightened differentiation between standard and inverted arrangements at and near the target loci. In principle, this process allows the identification of specific loci involved in adaptive divergence (Schaeffer et al. 2003; Schaeffer and Anderson 2005; Storz 2005). Consistent with this model, previous low-resolution studies of Drosophila inversions revealed heterogeneous patterns of nucleotide diversity relative to divergence, as well as the interspersion of regions of high and low genetic association potentially due to the interaction of selection and gene flux (Schaeffer et al. 2003; Kennington et al. 2006; but see Munte et al. 2005). The application of high-resolution tools flowing from completely sequenced genomes will facilitate the mapping of genes that are the targets of divergent natural selection within gene arrangements.
Although Drosophila has been the favored model, the African malaria vector Anopheles gambiae sensu stricto also provides an excellent system for studying the maintenance of inversion polymorphisms, not only within a species but across speciation events of different ages in the An. gambiae sibling species complex. The nominal species An. gambiae s.s. (hereafter, An. gambiae) is synanthropic: almost exclusively biting humans, resting indoors, and exploiting anthropogenic larval habitats (Coluzzi 1999). This close association with humans, vital to making An. gambiae one of the most proficient vectors of malaria, is likely to have been facilitated by chromosomal inversions thought to confer adaptive benefits in heterogeneous climatic and ecological settings in Africa. Seven common polymorphic inversions exist on the second chromosome. Six of these are located on the right arm (2R): j, b, c, u, d, and k, while 2La is the only inversion on the left arm (Coluzzi et al. 2002). Facilitated by the sequenced reference genome (Holt et al. 2002), some of the breakpoints for these polymorphic inversions have been localized to small genomic regions (Sharakhov et al. 2006; Coulibaly et al. 2007; Sangare 2007). Most of these inversions appear to be the targets of strong selection. Five of the inversions (2La and 2Rb, -c, -d, and -u) are nonrandomly associated with degree of aridity; each cycles seasonally with rainfall, and all except 2Ru form stable geographic clines in frequency from mesic forest to xeric regions bordering the Sahara (Coluzzi et al. 1979; Toure et al. 1994, 1998; Powell et al. 1999). Inversion 2Rj is not clinal, but its distribution in Mali is consistent with adaptation to novel rockpool niches (Coluzzi et al. 1985; Manoukis et al. 2008).
In the An. gambiae species complex, inversion polymorphisms can be maintained across the boundaries of emerging and even full species. An. gambiae and its sibling An. arabiensis, strictly sympatric throughout most of their extensive ranges in sub-Saharan Africa, differ by multiple fixed chromosomal rearrangements on the X but share three chromosome 2 inversions: 2La, fixed in An. arabiensis and polymorphic in An. gambiae; and 2Rb and -c, polymorphic in both species (Coluzzi et al. 1979, 2002). Moreover, these same inversions and all other common An. gambiae inversions with the exception of 2Rj are shared and polymorphic in two lineages apparently undergoing ecological speciation within An. gambiae—the assortatively mating M and S molecular forms (della Torre et al. 2002, 2005). Inversion frequencies are correlated with climatic and ecological conditions in parallel in both lineages (Costantini et al. 2009; Simard et al. 2009). Unlike the full species, the M and S incipient species are not distinguished by any fixed inversion differences. Indeed, genomewide divergence mapping between the M and S forms revealed that significant differentiation was confined to two small low-recombination regions adjacent to the centromeres of 2L and X which are distant from any inversions (Turner et al. 2005). Thus, in distinction to models of speciation invoking inversions as facilitating the persistence of hybridizing species (Noor et al. 2001; Rieseberg 2001; Ortiz-Barrientos et al. 2002; Navarro and Barton 2003), the An. gambiae data suggest that chromosome 2 inversions are not directly responsible for reproductive isolation. Instead, the same chromosome 2 inversion polymorphisms appear to confer similar ecological benefits, within and across species boundaries. A long-term research goal is to identify the mechanisms and the genes controlling these processes.
Previously we conducted the first high-density genomic scan of divergence across a chromosomal inversion (2La) in An. gambiae (White et al. 2007). By hybridizing genomic DNA from S form mosquitoes homokaryotypic for alternate gene arrangements on chromosome 2L (2La or 2L+a) to oligonucleotide microarrays we were able to measure divergence across the 22-Mb inversion at nearly 14,000 markers. Differentiation in the rearranged region was significantly higher than in collinear portions of chromosome 2L. Between breakpoints the pattern of differentiation was heterogeneous: two genomic clusters of significantly higher divergence were identified near but not adjacent to the breakpoints. Directed resequencing within the S form confirmed these results and suggested that both clusters contained genes targeted by selection. Observed levels of linkage disequilibrium between the 2La breakpoints and markers in the clusters are highly unlikely under a neutral scenario, in light of known recombination rates and plausible estimates of the age of the inversion.
The present study characterizes the patterns of genetic variation in polymorphic rearrangements on the opposite (right) arm of chromosome 2: 2Rj, -b, -c, and -u. With the goal of identifying candidate genes maintaining these inversions in natural populations, we applied microarray-based divergence mapping to measure differentiation between alternative 2R arrangements. Because three of four inversions have taxonomic distributions that span incipient and/or completed speciation events, we validated the microarray findings by targeted sequencing in multiple taxa: sympatric Malian populations of An. gambiae M and S forms, and the sibling species An. arabiensis.
MATERIALS AND METHODS
Mosquito collection, identification, and DNA isolation:
Indoor resting mosquitoes were collected by spray catch in August through September 2004 from five villages in Mali: Banambani (12°48′N, 08°03′W), Bancoumana (12°20′N, 08°20′W), Fanzana (13°20′N, 06°13′W), Kela (11°52′N, 08°26′W), and Moribabougou (12°41′N, 07°57′W). An. arabiensis samples from this 2004 collection were augmented by collections made in 1995 from Banambani and Moribabougou, kindly provided by A. della Torre. Specimens were sorted morphologically to An. gambiae s.l. and by gonotrophic stage.
Ovaries of half-gravid specimens were dissected and placed in individual numbered microtubes containing Carnoy's solution; the remaining carcass was preserved over desiccant in individual microtubes with the corresponding number. Polytene chromosomes were scored for inversions with reference to the An. gambiae polytene chromosome map (published as a poster in Science 298, October 4, 2002 by M. Coluzzi and V. Petrarca; http://www.sciencemag.org/feature/data/mosquito/pdfs/poster.pdf) following della Torre (1997). Unused ovarian material was preserved for later validation, performed blind to the original scoring.
DNA was isolated from individual carcasses using the DNeasy Extraction kit (QIAGEN, Valencia, CA). An. arabiensis and An. gambiae s.s. molecular forms were identified using an rDNA-based PCR diagnostic (Fanello et al. 2002). Quantity and quality of eluted DNA for each specimen was checked by spectrophotometry using the Nanodrop-1000 (Nanodrop Technologies).
Microarray hybridization and analysis:
Mosquito genomic DNA hybridized to Affymetrix Anopheles/Plasmodium GeneChip microarrays came from five villages in southern Mali, principally Kela. Arrays were hybridized with genomic DNA from individual mosquitoes, in sets of five biological replicates per each of four different 2R homokaryotypes (20 arrays, of 5 mosquitoes × 4 homokaryotypes: 2R+, 2Rb, 2Rbc, and 2Rjbcu). Labeling and hybridization of genomic DNA followed White et al. (2007). Labeled DNA from each mosquito was hybridized to 20 individual microarrays using standard protocols for eukaryotic cRNA hybridization. Hybridization and scanning of the arrays was performed at the Indiana University School of Medicine. Ten arrays were processed on each of 2 consecutive days under otherwise identical conditions. Groups of mosquitoes with the same karyotype were split between the 2 days to avoid any bias introduced by the hybridization and scanning.
Affymetrix CEL files containing the raw probe intensities were imported into Bioconductor (http://www.bioconductor.org). Using the “affy” program, data quality was assessed by examining the distribution of raw probe intensities using both a histogram and box plot (Bolstad et al. 2005). Comparison between chips showed two with irregularly low hybridization intensities. Examination of chip images using raw and normalized probe intensities revealed that the same two aberrant chips had unusually low hybridization intensities across their surfaces (Irizarry et al. 2003). Both of these chips had been hybridized with specimens carrying the 2Rb homokaryotype. After discarding these two chips, an insufficient number of biological replicates (n = 3) remained for this karyotype, leading us to eliminate it from further analysis.
Background adjustment and quantile normalization using the robust multiarray average (RMA) without summarization by probe set were performed (Irizarry et al. 2003). Probe level data were exported as comma separated value files for importation into Excel and are available from B.J.W. upon request. All Anopheles probes from the Anopheles/Plasmodium GeneChip were mapped against the AgamP3 assembly and filtered to remove those that exactly matched multiple genomic locations or had secondary one-off mismatches. For each of the 151,213 retained probes, a two-tailed t-test was performed to compare the background adjusted and normalized probe intensities obtained from the two alternative genetic classes being compared. Probes whose signal intensities differed at P < 0.01 between classes were considered to have single feature polymorphisms (SFPs) between the two groups (Turner et al. 2005; White et al. 2007).
The ∼61.5-Mb chromosome arm 2R is interrogated by 43,363 probes. In order of size and distal-to-proximal location on 2R, inversion j spans ∼12.5 Mb and 9,611 probes, inversion b ∼8 Mb and 6,218 probes, inversion c ∼4.5 Mb and 4,003 probes, and inversion u ∼4 Mb and 3,013 probes (Coulibaly et al. 2007; Sangare 2007). To gauge whether SFPs were overrepresented on 2R, we used a χ2 test to compare the observed and expected number of SFPs on 2R vs. all other chromosomes combined (2L+3R+3L+X; note that these other chromosome arms were devoid of inversion polymorphisms). We used the genomewide frequency of SFPs to determine the expected number for both categories. Similarly, overrepresentation of SFPs in rearranged vs. collinear regions of 2R was tested by comparing observed and expected numbers based on the 2R-specific proportion of significant probes. To test for significant clustering of SFPs across 2R we performed a sliding window analysis with a window size of 300 probes and a step size of 20 probes. Each window was tested (χ2) for an excess of SFPs compared to the number expected on the basis of the 2R-specific frequency of significant probes. Significance was evaluated after Bonferroni correction for multiple tests (conservative because windows are correlated).
DNA sequencing and analysis:
To shed light on the underlying evolutionary processes shaping the patterns observed from microarray divergence mapping, we performed targeted resequencing across the 2R chromosome. Chromosomal inversions have a complex history in An. gambiae, as they are shared across molecular forms and species boundaries (Della Torre et al. 1997; Coluzzi et al. 2002; Besansky et al. 2003). To account for this complexity, sequence data were obtained from Malian samples of both M and S molecular forms homokaryotypic for alternative arrangements of three 2R inversions: b, c, and u. As 2Rj is exclusive to the S form (Della Torre et al. 2005), corresponding M sequences were obtained only for the standard arrangement. In addition, sequences were determined from Malian samples of the sibling species An. arabiensis. Subject to constraints of availability, average sample size per sequenced locus was 30 chromosomes for S form (range, 16–46), 17 for M form (6–24), and 16 for An. arabiensis (10–22), resulting in a total resequencing effort of ∼1 Mb across 17 genes.
To avoid the complication of heterozygous insertion–deletions commonly encountered within introns of these species, we targeted exon segments of ∼600 bp (range, 410–740 bp) from 17 genes on chromosome 2R predicted from the AgamP3.4 gene annotation (Figure 1). Among these were three genes near the estimated breakpoints of the b, c, and u inversions, for the purpose of dating their origin in An. gambiae. At least two additional genes distant from the breakpoints were sequenced for each inversion.
Primers targeting exons were designed using Primer3 software (Rozen and Skaletsky 2000) and custom synthesized (Invitrogen). Primer sequences and their corresponding VectorBase gene identifier are given in supporting information, Table S1. PCR was carried out in 25-μl reactions following the conditions of White et al. (2007). PCR products were purified using ExoSAP-IT (USB, Cleveland, OH) or the GeneClean Spin kit (MP Biomedicals) following excision of bands separated on a 1.5% agarose gel.
PCR products were directly sequenced on both strands using an Applied Biosystems 3730xl DNA Analyzer and Big Dye Terminator v3.1 chemistry. Electropherograms were trimmed and visually inspected for heterozygous SNPs and indels using SeqMan II (DNASTAR, Madison, WI). Because direct sequencing from diploid mosquitoes produced gene sequences whose polymorphic sites were not phased, haplotypes were inferred for each set of gene sequences using PHASE 2.1 software (Stephens et al. 2001; Stephens and Donnelly 2003). Sequences are available from GenBank under accession nos. FJ891066–FJ892701.
DnaSP version 4.2.02 (Rozas et al. 2003) and Arlequin version 3.1 (Excoffier et al. 2005) were used to calculate standard polymorphism and divergence statistics. Significance was determined from 10,000 permutations (Arlequin) or 10,000 coalescent simulations without recombination (DnaSP). Negative values for Da and FST were reported as zero.
Hierarchical AMOVA were conducted in Arlequin. In the models, differentiation was represented by the mutational (pairwise nucleotide) distance between haplotypes for each gene in 2Rb, -c, and u rearrangements. Two groups were defined: standard or inverted. Within groups, populations were defined by molecular form, M or S.
To evaluate whether Tajima's D values (Tajima 1989) for genes near inversion breakpoints were significantly lower than those of genes residing in the same inversion but more distant from the breakpoints, we implemented a modification of the heterogeneity test proposed by Hahn et al. (2002). Aimed at distinguishing demographic from selective events that may both produce an excess of rare mutations, the test compares the distribution of mutations at synonymous vs. nonsynonymous sites within a gene. However, the heterogeneity test also can be used to test for differences in Tajima's D between any two sets of nucleotide variation data (M. W. Hahn, personal communication). Here, the distribution of mutations was compared between gene sequences at breakpoints and concatenated sequences from nonbreakpoint genes, excluding mosquitoes for which the data set was incomplete.
To estimate the time to the most recent common ancestor for the 2Rb, -c, and -u arrangements in An. gambiae, we used the expectation E[TMRCA] = 4Nef(1 − ni −1), which is based on the number of segregating sites unique to each of the inverted and standard arrangements (Andolfatto et al. 1999); polymorphisms shared between arrangements and species were removed prior to calculation. This estimate assumes that each inversion instantly rose to its equilibrium (current) frequency immediately after entering the population. Violation of these assumptions makes E[TMRCA] a minimum estimate of inversion age and, as noted by Andolfatto et al. (1999), the estimates have very large variances.
On chromosomes bearing inversion polymorphisms, higher divergence is expected between their rearranged vs. collinear regions, because of suppressed recombination between alternative gene arrangements. In our previous study of 2L in the S form of An. gambiae, divergence mapping by microarray revealed much higher differentiation between rearrangements of 2La than collinear parts of that arm or elsewhere in the genome (White et al. 2007). In the present study, microarray hybridization of An. gambiae samples indicated that divergence on 2R also was consistently higher in rearranged compared to collinear regions (P = 0.003). However, rearranged 2R regions contained only 1.16 times more SFPs than collinear regions, while this ratio was >2 for the same comparison on 2L (Table 1). Among the four inversions we examined, 2Rb and 2Ru had marginally higher levels of differentiation between rearrangements than 2Rj and 2Rc although neither approached the level observed for alternative rearrangements of the 2La inversion.
Apart from differences in overall divergence levels between rearranged and collinear regions, we sought to map locations with unusually high levels of divergence on 2R as a guide to identifying regions that may be responsible for the maintenance of inversions in populations. To test for regional clustering of SFPs, we performed a sliding window analysis of divergence on the basis of the comparative hybridization of 2Rjbcu and 2R+j+b+c+u karyotypes of An. gambiae (Figure 1). This revealed a small region within the 2Ru rearrangement that contained significantly more SFPs (14) than expected by chance, given the 2R-specific proportion of SFPs observed in this experiment and the number of windows tested (P < 0.05 after Bonferroni correction). If this region contains sequences that contribute to the maintenance of 2Ru, we expect a peak of statistically significant divergence at the same position in comparisons between other An. gambiae genetic backgrounds, but only if they differ by the alternative arrangements 2Ru and 2R+u. Comparison between genetic backgrounds that carry the same 2Ru arrangement are not expected to show a peak of genetic divergence at this position. Our results confirmed these predictions for 2Ru (Figure S1; Figure S2).
Inspection of the pattern of divergence across the 2Ru rearrangement shows that the significant cluster of SFPs is near, yet distinct from, the distal breakpoint (Figure 1, Figure S1). Although the data do not allow for precise localization, the cluster spans a region of up to 250 kb between chromosomal coordinates ∼32.25–32.50 Mb. From its midpoint, this cluster lies ∼875 kb from the distal breakpoint. This 250-kb cluster contains 29 predicted genes whose location and inferred function in VectorBase (if any) is given in Table S2.
Unlike the results for 2Ru, no significant regional clustering of SFPs was observed within the other three rearranged regions on 2R; the marginally significant peak within the 2Rb rearrangement seen in Figure 1 was not validated in a separate comparison of alternative 2Rb arrangements (Figure S2).
Overall, these results contrasted with our previous study of the 2La rearrangement, where we observed substantially greater differentiation between rearranged and collinear regions, and greater heterogeneity in divergence across the inversion, allowing for the localization of two nonadjacent significantly differentiated regions putatively maintaining the polymorphism.
In an effort to validate and illuminate the microarray results, particularly in light of the contrast between 2R and 2La rearrangements, we performed targeted sequencing of genes within and outside of rearranged regions. The expectation is that a newly arising inversion carries no variation, but its diversity recovers with age through mutation and gene flux (Chovnick 1973; Navarro et al. 2000; Andolfatto et al. 2001). Diversity is expected to recover more rapidly in the center of the inversion, away from the breakpoints where recovery is suppressed by low rates of gene flux (Chovnick 1973; Andolfatto et al. 2001; Schaeffer and Anderson 2005).
Age of rearrangements:
To estimate age of the inversions, we used nucleotide variation data from genes near the breakpoints of b, c, and u (2751, 2775, and gpr45, respectively) in the An. gambiae S form following Andolfatto et al. (1999) (Figure 1; Table 2). Taken together with a previous estimate of the age of 2Rj on the basis of the same population sample (Coulibaly et al. 2007), the data agree with expectations on the basis of geographic and taxonomic distributions. They suggest that the 2Rj and -u inversions are the youngest—at ∼0.4 Ne and ∼1.7 Ne generations, respectively—while the trans-specific inversions 2Rb and -c both share a much older origin at ∼2.6 Ne generations. It is remarkable that the other trans-specific inversion (2La) dates to approximately the same time, ∼2.7 Ne (White et al. 2007).
DNA sequence polymorphism:
Concordant with their relative ages, nucleotide diversity was reduced by as much as twofold in the inverted relative to standard arrangements of the younger inversions 2Rj and -u, but no pronounced or consistent reduction in diversity was observed between alternative arrangements for the older b and c inversions within M and S forms (Figure 2; Table 2). Similarly, the expected trend of lower diversity near inversion breakpoints relative to more central locations was observed most clearly for the youngest j inversion at locus 2110, only 10 kb from the proximal breakpoint. For the other inversions, the trend (if present at all) was weaker, perhaps due not only to their greater age but also to the longer distance between the breakpoint and the gene that was surveyed (Table 2).
Genes in breakpoint regions where flux is lowest should show strong skews in the mutation frequency spectrum (Andolfatto et al. 2001). Skews toward rare mutations were apparent from the preponderance of negative values for Tajima's D statistic (Tajima 1989) in Table 2. However, as in previous studies on An. gambiae, the D values were negative regardless of molecular form, rearrangement, or chromosomal location, consistent with a proposed population expansion within the past ∼10,000 years (Donnelly et al. 2001, 2002). Yet only four D values were significantly negative, three of which were associated with loci adjacent to the breakpoints of an inverted arrangement: 2110 (S form, j inversion), 2751 (S form, b inversion), and gpr45 (M form, u inversion). To help distinguish heterogeneous selective processes from the effects of more homogeneous demographic processes, heterogeneity among D estimates at different loci was assessed through two approaches. Where possible, we implemented a modified version of the test by Hahn et al. (2002) to demonstrate that D values for these breakpoint-adjacent genes were significantly different than those for genes located farther away inside the inversion (for 2751 in the S form, P = 0.04; for gpr45 in the M form, P = 0.045; the test could not be performed for 2110 because sequence was derived from a different subsample of mosquitoes). Additionally, we explored the ratio of Tajima's D to its theoretical minimum value Dmin as proposed by Schaeffer (2002). A plot of this statistic indicates that the same breakpoint-proximal loci gave the most extreme D/Dmin values (near the theoretical minimum) (Figure S3).
We estimated sequence divergence in terms of numbers of shared polymorphisms vs. fixed differences between alternative arrangements, as well as two indices of differentiation, FST and Da. As expected from the microarray results, the numbers of polymorphisms shared between collinear regions was higher than the numbers shared between rearranged regions, in all except three genes (“Inverted-Standard” columns in Table 3). Also expected, collinear regions were not differentiated within molecular forms: FST values were not significantly different from zero, and Da values were at or near zero. By contrast, rearranged regions were characterized by highly significant FST values and elevated Da values, particularly within 2Ru. Moreover, genes flanking the breakpoints for the 2Rj, -b, and -u inversions shared fewer polymorphisms between arrangements than genes more distant from the breakpoints, though this was not invariably reflected in higher FST and Da values.
Four genes in 2Ru were targeted for resequencing within the only significant cluster of SFPs identified by microarray. One of these, e2-230k, is represented by an expressed sequence tag in the AgamP3.4 assembly (ENSANGEST00000008826) that partially overlaps a predicted gene annotated as AGAP003090. At this locus, the level of net nucleotide divergence (Da) between alternative arrangements was more than twice that seen at any other sequenced locus on 2R, and FST values also were much higher, the two exceptions being genes located close to the 2Ru and -j inversion breakpoints. In addition, two of only three fixed nucleotide differences between alternative arrangements observed in our entire data set were identified in e2-230k, in both M and S population samples. These were in perfect linkage disequilibrium with the inversion despite a distance of nearly 1 Mb from the 2Ru distal breakpoint. Such linkage is unlikely to have persisted as a remnant of complete inversionwide linkage disequilibrium established when the rearrangement first arose. On the basis of rates of recombination estimated in the 2La rearrangement (Stump et al. 2007), the recombination fraction between e2-230k and the 2Ru distal breakpoint is expected to be roughly r = 0.007. Given complete linkage disequilibrium upon origin of the inversion, in which fixed differences have linkage disequilibrium values of D0 = 0.25, Dt should decay in the absence of selection to 0.001 in <800 generations (Lewontin and Kojima 1960). Assuming that 2Ru arose ∼1.7 Ne generations ago, and applying a conservative microsatellite-based estimate of long-term Ne of ∼6500 (Pinto et al. 2003), the age of 2Ru is at least 11,000 generations, a time-frame larger by an order of magnitude than the time required for linkage disequilibrium to decay. This result is consistent with natural selection maintaining the association between e2-230k (and presumably other genes not sequenced in the diverged cluster) with the distal breakpoint, and therefore the 2Ru inversion itself.
Notwithstanding the expected overall pattern of significant differentiation between rearranged but not collinear regions on both arms of chromosome 2, we were struck by the much greater magnitude of differentiation between rearranged regions of 2L vs. 2R—a pattern consistent between microarray and sequencing results. On 2L, fixed nucleotide differences were found between arrangements at five of nine sequenced genes, and average FST and Da values were relatively high (0.527 and 1.66%, respectively; White et al. 2007). On 2R, no fixed differences were found between arrangements except at two genes located within the 2Ru inversion. Moreover, FST and Da values also were much higher between 2L than between 2R rearrangements. In the S form, the average FST value for 2R rearranged regions was 0.242, more than twofold lower than the value for 2La rearrangements. Even more striking, the corresponding average Da value on 2R was approximately sixfold lower, 0.263%. (In the M form, 2R estimates were similar, but no 2L sequences from this form are available for comparison).
Species boundaries and chromosome rearrangements:
Three of the four rearrangements considered on 2R are shared across taxonomic boundaries, a condition that we refer to here as “trans-specific” regardless of sharing between incipient or full species. In principle, the sharing of gene arrangements between taxa may occur due to multiple independent origins, or alternatively to a unique origin followed by retention of ancestral polymorphism and/or secondary introgression between taxa. Previous evidence suggests that 2L and 2R inversions likely share a common origin (Coluzzi et al. 1979; della Torre et al. 1997; Besansky et al. 2003). Our results (below) are consistent with this interpretation.
In the M and S forms, the standard arrangements from different molecular forms are more closely related to each other than to alternative arrangements within the same form; the same is true for inverted arrangements (Table 3). The most parsimonious explanation is that the rearrangements predate the origin of M and S forms, implying a unique origin of a shared polymorphism. However, it is notable that most comparisons between M and S within the same arrangement class revealed significant and moderately large values of FST (∼0.1–0.5). Moreover, at genes within 2Rc and -u, nucleotide divergence between M and S was almost invariably higher for inverted than for standard chromosomes (paired t-test, P = 0.035). This result is reinforced by hierarchical analyses of molecular variance (AMOVA, Table 4), which indicated a significant contribution of variance by M vs. S sequences within the same arrangement class, but not between arrangement classes. These results most likely reflect the fact that the combined forces of selection and drift (especially in the inverted arrangement) are stronger than migration due to interform gene flow, suggesting that rearrangements may be evolving largely independently in M and S forms.
Between the sibling species An. gambiae and An. arabiensis, a complete comparison of rearrangements analogous to that performed for the M and S forms was not possible, because our sample of An. arabiensis contained only one arrangement for each rearranged region considered. An. arabiensis sequences determined from seven 2R and six 2L genes (supplemented by existing 2L data; White et al. 2007), were compared to corresponding gene sequences of the An. gambiae S form. Similar to the pattern observed for the incipient species, same-arrangement divergence was lower than alternative-arrangement divergence on both 2R and 2L (Table 5). However, the overall level of sequence differentiation was substantially larger for alternative arrangements of 2La than for 2R (Figure 3). Whereas average Da (FST) values between 2L arrangements were as high as 1.67 (0.589), those between 2R arrangements were only 0.265–0.574 (0.281–0.417). This result is even more striking because same-arrangement contrasts between sibling species did not reveal the same dichotomy between 2L and 2R: Da and FST values for 2La were within the range observed for the other inversions.
Our study was motivated by the goal of localizing regions within An. gambiae chromosomal inversions that are the targets of selection responsible for their maintenance and the promise of mapping those targets with oligonucleotide microarrays (Borevitz et al. 2003; Winzeler et al. 2003; Turner et al. 2005; White et al. 2007; Turner et al. 2008). Yet for the right arm of chromosome 2, significant divergence was detected for only one of four studied inversions (only 2Ru and not 2Rj, -b, or -c), and overall differentiation between 2R arrangements was sharply lower than between 2L arrangements studied previously (White et al. 2007). Possible reasons for these outcomes are discussed below.
One formal possibility is that An. gambiae 2R inversion polymorphisms j, b, and c are selectively neutral. Although this possibility cannot be dismissed on the basis of our molecular data, independent circumstantial evidence similar to that which influenced Dobzhansky (Krimbas and Powell 1992; Powell 1997) strongly favors an adaptive role for inversions in An. gambiae, including: (1) stable geographic clines that follow aridity gradients in distant parts of the African continent, such as Nigeria and Burkina Faso in West Africa (Coluzzi et al. 1979; Costantini et al. 2009), Cameroon in Central Africa (Simard et al. 2009), and the Comoros in East Africa (Petrarca et al. 1990); (2) microgeographic partitioning of alternative arrangements relative to degree of aridity, e.g., differential indoor/outdoor resting behavior of mosquito carriers of alternative arrangements (Coluzzi et al. 1977); (3) seasonal cycling of inversions in association with rainfall (Coluzzi et al. 1979; Toure et al. 1998); and (4) stable heterozygote excess over Hardy-Weinberg expectations in some laboratory populations (della Torre et al. 1997). Schaeffer (2008) recently modeled the migration-selection balance required to maintain chromosomal polymorphisms across diverse geographic habitats in D. pseudoobscura, concluding that migration levels are too extensive in this species to explain observed clines in arrangement frequencies through neutral diffusion processes. Such modeling has yet to be attempted in An. gambiae, but the remarkable parallels to Drosophila—including high migration levels (Pinto et al. 2003)—suggest that a similar conclusion would not be surprising.
If selection is responsible for the maintenance of inversions 2Rj and 2Ru in natural populations, what can explain the failure to detect the anticipated signatures of selection by microarray divergence mapping? Under the model proposed by Kirkpatrick and Barton (2006), chromosomal inversions that capture multiple (at least two) locally adapted alleles spread because they suppress recombination with different genetic backgrounds disadvantageous under local conditions, in the face of migration or hybridization. However, these authors also suggest that the preconditions for the inversion to spread may still persist in the standard (uninverted) class of chromosomes. In other words, at least some of the locally adapted alleles captured by the inversion might continue to segregate among the standard chromosomes. Indeed, a proposed test of their hypothesis includes the expectation of linkage disequilibrium in standard chromosomes between the alleles carried by the inversion (Kirkpatrick and Barton 2006). Although such tests are beyond the scope of the present study, the possibility that beneficial alleles captured by the newly arisen inversion remain polymorphic in the parental (standard) population could explain the inability to measure divergence between arrangements of the relatively young 2R inversions using the microarray approach as applied here. Given samples of only 10 chromosomes per microarray hybridization, our array-based mapping technique lacks the sensitivity to uncover variants maintaining an inversion when those variants continue to segregate in the ancestral arrangement at frequencies much greater than 10% (i.e., 1 chromosome of the 10 we hybridize). In the case of putatively recent 2Rj and -u inversions, retention of beneficial alleles by the ancestral population represents a plausible explanation for the discovery of just a single candidate region within these two inversion systems—despite the theoretical prediction that inversions are maintained by adaptive benefits conferred by at least two loci.
The persistence of preconditions for the inversion to spread may be less plausible an explanation in the case of the apparently more ancient and trans-specific 2Rb and -c rearrangements. Rather, multiple shared polymorphisms between alternative arrangements at every gene sequenced within the 2Rb and -c rearrangements indicate extensive gene flux between both rearrangements, which could eventually winnow the region of divergence to a small size (Schaeffer et al. 2003; Schaeffer and Anderson 2005). The microarray that we utilized for divergence mapping contains only an average of eleven 25-mer probes per interrogated gene, and not all predicted genes are represented on the array. While this platform provides very high resolution compared to more traditional approaches with far fewer markers, genome coverage is neither complete nor uniform for the Affymetrix Anopheles chip. Furthermore, our technique for discovering highly diverged regions relied on a sliding 300-probe window, because adoption of smaller window sizes would increase the false positive discovery rate. Thus, regions of divergence that encompass only a single or even a few genes could escape detection with our method. Accordingly, we hypothesize that gene flux between alternative arrangements of 2Rb and 2Rc eroded divergence around adaptive variants below the resolution of the oligonucleotide array. In the future, overcoming this resolution threshold is possible with a higher density array, such as a tiling array (Turner et al. 2008), or by high throughput whole genome sequencing.
The extensive gene flux hypothesized between the 2Rb and 2Rc arrangements stands in contrast to our results for alternative 2La arrangements, which exhibited only minimal genetic exchange and much higher levels of divergence (Figure 3). However, the estimated ages of these three inversions suggest that they have similar sojourn times in An. gambiae, which implies that gene flux between alternative arrangements should be the same for all three inversion systems. A possible explanation for this paradox rests on the fact that the sibling species An. arabiensis is monomorphic for 2La, but polymorphic for both 2Rb and 2Rc. In Figure 4, we propose a speculative history for chromosomal evolution in the lineages leading to present-day An. gambiae and An. arabiensis. It reflects the fact that the lineage leading to An. gambiae was probably derived from a homokaryotypic standard ancestor (Ayala and Coluzzi 2005). The model posits that the sibling species were fixed for alternative 2R and 2L arrangements upon secondary contact, represented by horizontal arrows in Figure 4. The plausibility of introgressive hybridization upon secondary contact proposed in our model is supported by documented levels of female F1 hybrids in nature between these strictly sympatric species (0.1–0.2%; White 1971; Temu et al. 1997) and their fertility (Davidson et al. 1967). Detailed arguments behind the interpretation that chromosome 2 inversions are shared between An. arabiensis and An. gambiae by introgression have been elaborated previously (see Garcia et al. 1996; Powell et al. 1999; Besansky et al. 2003). According to our speculative history, introgression from An. arabiensis brought the 2Rb, 2Rc, and 2La arrangements into An. gambiae. Conversely, introgression in the opposite direction is proposed for the introduction of 2R+b and 2R+c, but not 2L+a into An. arabiensis. The inability of the 2L+a to introgress from An. gambiae into An. arabiensis, in contrast to the bidirectional flow of 2Rb and -c rearrangements, is supported by laboratory crossing experiments (della Torre et al. 1997). As a result of these semipermeable species boundaries (Besansky et al. 2003), both 2R rearrangements are polymorphic in both species, but the 2L rearrangement is polymorphic only in An. gambiae. Thus, the opportunity for gene flux is much greater between alternate arrangements of 2R than those on 2L. This process would be accelerated if multiple bidirectional introgressions of 2R arrangements had occurred. Because of low interspecific divergences relative to intraspecific diversity in the An. gambiae complex (e.g., Obbard et al. 2009), the testing of this hypothesis must await solutions to complex, interrelated and as yet unsolved problems. These include the correct inference of phylogenetic relationships, the identification of an appropriate outgroup species, and the disentanglement of polymorphisms shared through recent common ancestry vs. introgression—problems that future genome sequencing projects could powerfully address.
The single candidate region we mapped via microarray was further narrowed down to ∼100 kb with targeted sequencing. This region contains only seven genes, which should make identifying candidate mutations practical in future studies. In fact, the ubiquitin-conjugating enzyme e2-230k that we sequenced is an interesting candidate not only due to the high differentiation in our sequenced samples, but also because this gene contained four SFPs in the microarray comparison—the most of any gene in the genome. Interestingly, the region of the gene we sequenced and the location of the SFPs are not the same, suggesting that this gene is highly differentiated between arrangements across its length. Unfortunately, nothing is known about the phenotypic differences between mosquitoes carrying different arrangements of 2Ru. Unlike for 2La, 2Rb, and 2Rc, no clines have been reported for 2Ru; however, an ecological study in Mali revealed that the inversion cycles seasonally, increasing in frequency during the wet season and then decreasing during the dry season (Toure et al. 1994). Although finding candidate mutations is foreseeable, linking these mutations to phenotypic differences will be a challenging step in the path toward a more comprehensive understanding of the adaptive mechanisms underlying inversions in An. gambiae. Further knowledge of this fundamental aspect of An. gambiae biology may lead to more effective interventions against malaria.
We thank M. Kern for laboratory assistance, O. Grushko for karyotyping, and B. Cassone for R scripts. M. Coulibaly and S. Traoré managed the 2004 field collections in Mali and generously made these available to us; A. della Torre kindly provided karyotyped An. arabiensis in 1995. Z. Chunxiao and R. Jerome performed the microarray hybridizations under the direction of H. Edenberg at the Center for Medical Genomics, Indiana University School of Medicine. The authors are indebted to J. Feder, M. Hahn, two anonymous reviewers, and D. Begun, whose suggestions substantially improved the text. This work was funded by National Institutes of Health grant AI63508 (to N.J.B.). B.J.W. was supported by an Arthur J. Schmidt fellowship from the University of Notre Dame. The Center for Medical Genomics is funded in part by the Indiana Genomics Initiative of Indiana University (INGEN); INGEN is funded in part by The Lily Endowment.
Supporting information is available online at http://www.genetics.org/cgi/content/full/genetics.109.105817/DC1.
↵1 Present address: Malaria Research and Training Center, Bamako BP1805, Mali.
Communicating editor: D. Begun
- Received June 2, 2009.
- Accepted July 1, 2009.
- Copyright © 2009 by the Genetics Society of America