Species-specific regulation of gene expression contributes to the development and maintenance of reproductive isolation and to species differences in ecologically important traits. A better understanding of the evolutionary forces that shape regulatory variation and divergence can be developed by comparing expression differences among species and interspecific hybrids. Once expression differences are identified, the underlying genetics of regulatory variation or divergence can be explored. With the goal of associating cis and/or trans components of regulatory divergence with differences in gene expression, overall and allele-specific expression levels were assayed genomewide in female adult heads of Drosophila melanogaster, D. simulans, and their F1 hybrids. A greater proportion of cis differences than trans differences were identified for genes expressed in heads and, in accordance with previous studies, cis differences also explained a larger number of species differences in overall expression level. Regulatory divergence was found to be prevalent among genes associated with defense, olfaction, and among genes downstream of the Drosophila sex determination hierarchy. In addition, two genes, with critical roles in sex determination and micro RNA processing, Sxl and loqs, were identified as misexpressed in hybrid female heads, potentially contributing to hybrid incompatibility.
EXTENSIVE variation for transcript level within and between species has been found in many different organisms (Sandberg et al. 2000; Townsend et al. 2003; Morley et al. 2004; Nuzhdin et al. 2004; Lai et al. 2006). Researchers have also sought to connect such overall transcript level variation with cis and/or trans regulatory differences (Lo et al. 2003; Yvert et al. 2003; Hughes et al. 2006). The large number of genomewide investigations into expression variation, divergence, and inheritance in both Drosophila melanogaster and its closest known relative, D. simulans, provide a strong basis for further study of regulatory evolution. Within D. melanogaster and D. simulans, thousands of genes differ between strains (Jin et al. 2001; Nuzhdin et al. 2004; Hutter et al. 2008). The number of genes with expression level differences between these species is even greater: approximately twice as many genes differ between these species as vary within them (Ranz et al. 2003, 2004; Rifkin et al. 2003). Investigations into the mode of inheritance for expression levels have revealed that, while most transcript level variation appears to be additive, significant dominance variance is frequently observed (Gibson et al. 2004; Vuylsteke et al. 2005; Cui et al. 2006; Wayne et al. 2007). While these overall patterns in transcriptome divergence have been characterized, the genetic and mechanistic basis of transcriptome variation and divergence are not as well understood.
For genes found to be expressed differently between genotypes or species, the underlying sequence differences may be located within the regulatory regions of the gene itself (cis; Doebley et al. 1997; Sucena and Stern 2000; Cong et al. 2002). However, many differences in gene expression are expected to be the result of sequence divergence elsewhere in the genome (divergence of trans acting factors). A single coding or regulatory change in one gene can result in a cascade of downstream trans effects on many other genes (Michalak and Noor 2004; Chesler et al. 2005; Tarone et al. 2005). Molecular interactions between regulatory components may also affect expression (i.e., cis by trans interactions, trans by trans interactions; Shaw et al. 2002; Landry et al. 2005).
While the phenotypic effects of divergence in regulatory interactions are generally poorly understood, massive epistasis for gene expression levels has been observed in species hybrids (Michalak and Noor 2003; Ranz et al. 2004; Malone et al. 2007). Misexpression, i.e., expression in the hybrids that differs from expression in either parent resulting from aberrant regulatory interactions, could be the underlying cause of hybrid sterility or inviability in some cases (Michalak and Noor 2004; Ortíz-Barrientos et al. 2007). Successfully identifying cis and trans components of regulatory divergence may, therefore, yield important insights into the genetics of hybrid incompatabilities.
Cis and trans differences in gene expression can be explored with eQTL mapping (see for review Gibson and Weir 2005; Li and Burmeister 2005; Ranz and Machado 2006); however, the required mapping populations would be exceedingly difficult to create for the D. melanogaster, D. simulans species pair due to hybrid sterility and inviability (Sturtevant 1920; Barbash et al. 2000). Fortunately, cis and trans differences can also be dissected through analysis of allele-specific expression (ASE) in hybrid animals (Yan et al. 2002; Wittkopp et al. 2004; for review, see Knight 2004). ASE can be measured by allele-specific reverse transcription–PCR (asRT–PCR; Singer-Sam 1994; Cowles et al. 2002; Guo et al. 2004) or by pyrosequencing (Ahmadian et al. 2000; Wittkopp et al. 2004). These approaches capitalize on differences in how cis and trans acting regulatory variants affect gene expression in the F1 hybrid compared to the parental, pure species. However, pyrosequencing and asRT–PCR methods currently have practical limits on the number of genes that can be assayed, somewhere from 10 to 100. Consequently, while cis and trans regulatory divergence have been studied in species hybrids for limited numbers of genes (Wittkopp et al. 2004, 2008; Landry et al. 2005), there are currently no genomewide studies of D. melanogaster–D. simulans species differences in cis and trans transcript level regulation.
Cis and trans effects can be assayed for the whole genome by estimating allele-specific expression on a SNP array, or expression array platform (SNP array, Lo et al. 2003; expression array, Ronald et al. 2005), or potentially using next generation sequencing. However, as there is no SNP-based array design currently available that can be used for Drosophila, and next generation sequencing techniques are currently expensive for multiple samples, a novel method to infer cis- and trans-based differences in gene expression from the Drosophila Affymetrix GeneChip Tiling 1.0R Array was developed. Cis and trans effects were inferred from comparisons of allele-specific expression, adapting well-established methods to a tiling array design (Knight 2004; Wittkopp et al. 2004). Analyzing expression using a tiling format, rather than an expression array, is advantageous because of the even distribution of probes along each transcript and the greater number of probes per gene.
A tiling array, was used to estimate overall expression level and allele-specific expression for D. melanogaster, D. simulans, and F1 hybrid genotypes leading to an analysis of cis and trans regulatory divergence in female adult heads. Many studies indicate the importance of tissue specificity in gene regulation (Maniatis et al. 1987; Parisi et al. 2004; Yang et al. 2006; Mank et al. 2008). In particular, regulatory divergence of brain, eye, and antennal genes among species may be linked to adaptive phenotypes such as those involved with behavior and sensory perception (Kopp et al. 2004; Ranz et al. 2004; Landry et al. 2007), which in turn may be linked to differences in sexual selection or in ecological niches.
MATERIALS AND METHODS
The expression of the D. melanogaster allele in D. melanogaster, D. simulans hybrid genotypes can be estimated using a D. melanogaster tiling array if probe target sequences differ between species. Tiling array probes in genic regions were categorized as specific to the D. melanogaster allele (allele-specific expression or ASE probes) or as not species specific and thus measuring overall expression (OE probes). Given the two types of probes, tests were constructed for species divergence, dominance, and cis/trans components of expression. To control for bias in hybridization, gDNA samples were used. Probes were screened for possible biases and if biases were detected they were removed from further analyses.
Theoretical basis for interpreting cis and trans effects from ASE:
The overall expression level (OE) of an autosomal gene in D. melanogaster (m) and D. simulans (s) can be described as:(1a)(1b)where c and t represent the main cis and trans effects and ct and tt are interaction effects. Note that in the homozygous parent, all the effects are due to alleles in that parent, and that these individual measurements do not allow the separate estimation of the cis and trans components. A common control for measurement error is to use mixed parental samples, as these have the same expected value as the mathematical average of the two parental genotypes. The transcript level in heterozygous samples (F1 hybrids, ms and parental mixes, mix) is described by the following equations:(2a)(2b)
Additivity, defined as the situation where the expression level in heterozygotes is equal to the mean level of the two homozygotes [OEms = (OEmm + OEss)/2], can be expressed in terms of cis and trans effects as ctmm + ctss + ttmm + ttss = ctms + ctsm + 2ttms (Equations 1, a and b, and 2a). Note that for additivity of expression, the sum of all of the within-species interaction terms must be equal to the sum of all of the between-species interaction terms.
The allele-specific expression in heterozygous samples for the two alleles (m and s) is:(3a)(3b)(3c)(3d)
Note that the expected value of allele-specific expression in D. melanogaster, , is the same as . is used in lieu of to control for possible cross-hybridization effects of the m allele in the presence of the s allele. The contrast between allele-specific expression of the D. melanogaster parent as estimated by with allele-specific expression measured in the hybrid, , shows that the measurements of ASE in parental and hybrid genetic backgrounds will be equal if tm + 2ctmm + 3ttmm = ts + ttss + 2ttms + 2ctms (Equation 3c = 3a). This is then a test for the composite trans effects, main effects plus interaction terms. For the remainder of the article, trans effects include the main trans effect and all of the associated interactions specified above. The test of trans effects will be:(4)
Comparing allele-specific expression in the F1 hybrid, to shows that the two alleles are only equal in their expression when cm + ctmm + ctms = cs + ctss + ctsm (i.e., Equation 3a = 3b). If , then ASEmsm/(ASEmsm + ASEmss) = ASEmsm/2ASEmsm and therefore equality of allele-specific expression can be tested as ASEmsm/OEms equal to 0.5. Note that this test, as with the test for trans effects, includes all of the interactions with the main effect of cis; this composite cis effect is referred to simply as cis throughout. If array measurements (which use different probe sets for overall and allele-specific expression) were free from sequence-specific biases in probe intensity levels, this relationship could be tested using estimates derived only from RNA samples directly. As probe-specific biases have been demonstrated (Li and Wong 2001; Wu and Irizarry 2005; Gharaibeh et al. 2007), DNA controls were used to estimate and account for the probe effect. gDNA hybridizations were used to estimate bias in probe intensity levels for this experiment and cis effects were tested (for log transformed, background normalized expression scores) as:(5)
Note that this assumes that transcription level is independent between alleles. For some genes showing transvection effects on expression (for review, see Duncan 2002), this may not be true. If this assumption is violated, divergence in cis regulatory regions could contribute to estimates of trans effects, leading to possible underestimation of the overall cis effect and overestimation of the trans effect.
In summary, cis effects were tested by comparing the allele-specific expression of the D. melanogaster allele in the hybrid to the overall expression level in the hybrid. Trans effects were tested by comparing the allele-specific expression of the D. melangaster allele in the hybrid to the allele-specific expression of the D. melanogaster allele in the mixed parental sample (see statistical analysis for details).
Fly strains and sample collection:
Sequenced, isogenized strains of D. melanogaster (y; cn bw sp), D. simulans W501, and D. simulans C167.4 were used as parental stocks (Adams et al. 2000; Begun et al. 2007). Two of these sequenced stocks, D. melanogaster y; cn bw sp and D. simulans W501, contain visible mutations (y, body color; cn and bw, eye color; sp, body color and wing; w, eye color) the implications of this are further discussed in the supporting information (File S1 and Table S6). Isogenic parental stocks (20 virgin females and 20 males) were propagated in bottles at 25° on dextrose medium. Interspecific crosses were conducted by crossing 20 D. melanogaster virgin females to 60 D. simulans males (W501 or C167.4). The F1 progeny of these interspecific crosses will be referred to as F1W and F1C, respectively. Virgin female progeny, from the parental lines and F1W and F1C, were transferred to vials and after 5–7 days, lightly anesthetized (under CO2), and snap frozen in liquid nitrogen. All flies were frozen between 3:30 and 5:30 pm on consecutive days. The heads and bodies of three independent replicates per genotype (∼100 flies each) were separated and collected in a cold room using a sieve chilled on dry ice (Telonis-Scott et al. 2008). Each independent sample was stored at −80° prior to DNA and RNA extractions.
The bodies of 60 female flies (50–70 mg of tissue) were homogenized in 180 μl of phosphate buffered saline (pH 7.2). Samples were treated with RNase A (4 μl) and incubated for 2 min at room temperature. Proteinase K (20 μl) and buffer AL (200 μl) were added and samples were incubated for 10 min at 70°. The lysate was centrifuged for 5 min at 20,000 × g and the supernatant was transferred to a fresh tube. Genomic DNA was then extracted using the QIAGEN DNAeasy blood and tissue kit following the manufacturer protocol. Independent gDNA samples, from D. melanogaster and from D. simulans W501, were pooled in equal concentrations to produce a mixed gDNA sample (Figure 1). These sample types are notated as MelDNA, C167.4DNA, W501DNA, F1W-DNA, F1C-DNA, and MixDNA.
Samples were homogenized using a motorized pestle in 1 ml of TRIzol in prechilled microcentrifuge tubes. This resulted in 21 independent samples (3 replicates of D. simulans C167.4, F1W-RNA, and F1C-RNA; 6 replicates of D. simulans W501, and 6 replicates of D. melanogaster). Total RNA was then isolated according to the Drosophila Genomics Resource Center RNA extraction protocol (Bogart and Andrews 2006). This resulted in yields of 7–10 μg of RNA for each sample. For each of the six W501 and D. melanogaster samples, three samples were randomly selected and randomly paired. The three pairs were then mixed in equal concentrations to generate three independent samples (MixRNA). The result was 18 independent RNA samples notated as MelRNA, C167.4RNA, W501RNA, F1W-RNA, F1C-RNA, and MixRNA.
Sample labeling and hybridization:
For each of the RNA samples the starting material for sample processing was 5 μl of total RNA in DEPC H2O (7 μg total RNA). cDNA was synthesized using the GeneChip WT Double-Stranded cDNA synthesis kit (Affymetrix, PN 900813) according to GeneChip WT Double-Stranded target assay procedure A: First strand cDNA synthesis protocol. The resulting samples were then cleaned, fragmented, and labeled using the Sample Cleanup Module (Affymetrix, PN 900371) and WT Double-Stranded DNA terminal labeling kit (Affymetrix, PN 900812) following procedures B through D of the WT target assay protocol. The genomic DNA samples were fragmented and labeled following protocols provided by Michael E. Zwick (Cutler et al. 2001; Nuzhdin et al. 2007). Fragmentation was analyzed using the Bioanalyzer DNA 1000 LabChip kit. Both cDNA and gDNA samples were hybridized to GeneChip Drosophila Tiling 1.0R Arrays and scanned at the University of Pennsylvania School of Medicine Microarray Facility according to standard manufacturer protocols. Arrays were examined for quality and all arrays were found to have successfully hybridized.
All positive match and control probe sequences on the array were queried against FlyBase release R5.4 using Mega BLAST to the genome sequence (DUST filter off, word-size 19). If a probe failed to match the genome with these criteria or, alternatively, aligned to the genome at multiple positions, it was discarded (2,998,760 probes retained; 174,132 discarded). Negative control probes (Arabidopsis thaliana and Bacillus subtilis genes) were discarded if full or partial matches existed in the current Drosophila genome release (n = 707 negative control probes retained). Exons annotated in R5.4 were downloaded (on January 1, 2008) and all remaining probes were BLASTed to these exons to determine probe locations relative to intron/exon boundaries. Intergenic probes and probes located at exon borders were removed; retained probes are referred to as “exon probes” (n = 742,711). For multitranscript genes, probes located within a single gene may correspond to different sets of transcripts (transcript set).Therefore, each probe was classified as belonging to all transcripts (All Ts) for a gene or to a subset (i) of transcripts (S. Tsi) for the gene (Figure 2). All probes, regardless of exon location, that target the same set of transcripts belong to the same transcript set. All probes targeting the same transcript set were grouped together in probe groups, n = 22,580, representing 14,027 genes.
If an exon probe's signal was greater than the 90% value of the set of negative control probes, then the probe was considered “detected” for that array. To account for low signal caused by poor probe performance (Santalucia and Hicks 2004; see for review Koltai and Weingarten-Baror 2008), exon probes not detectable on the gDNA arrays for all three parental strains were removed from further consideration. Probes in a single genome position that are part of gene models for more than one gene, probes that are not unambiguously assigned to a single exon, and probes in exons with fewer than three detected probes were also removed. At this stage 167,854 of the 742,711 probes were removed from further consideration. Signal values were then log normalized and scaled by the array median of the negative controls for each array.
Probes may measure overall transcript level or, if significant sequence divergence exists between the species, a probe may not hybridize effectively to the D. simulans sample. These probes would be discarded if the only interest was in measuring OE. If these probes can be identified, they can be used to measure ASE for the D. melanogaster allele. The most straightforward way to do this would be to use the available D. simulans sequence data (Begun et al. 2007). Unfortunately, only a portion of the detected probes have complete D. simulans sequence available (590,423/742,711) and of these only 374,959 can be unambiguously assigned as containing (not containing) a SNP (see File S1).
A novel approach was developed based upon the expectation that sequence divergence between D. melangaster and D. simulans will result in signal attenuation in the D. simulans allele relative to the D. melanogaster allele. The purpose is to identify cases where signal intensity is reduced or absent to classify probes as allele specific or not allele specific rather than to identify particular sequence differences. Each probe group (corresponding to a transcript set) was examined for probes with signal intensities in the top two-thirds on the D. melanogaster gDNA array, but with decreased relative signal (in the bottom third) on both D. simulans gDNA arrays. These probes were identified as allele-specific probes (ASE) and the remaining probes were classified as overall expression probes (OE). For each probe group, OE and ASE probe sets were constructed if at least three probes were detected for each category. There are 13 transcript sets for which only ASE probe sets were constructed, 6662 transcripts sets with only probe sets measuring OE and 8154 transcript sets for which both OE and ASE probe sets are available (Table 1). Thus, OE probe sets were constructed for a total of 14,816 transcript sets (12,399 genes, ∼82% of R5.4 annotated genes).
This approach was compared to identification of allele-specific probes based upon sequence data for the set of probes, assigned to probe sets, with available SNP status (N = 193,145/374,959). The false positive rate (the percentage of probes without SNPs that were falsely predicted to contain SNPs) of the rank-based method of probe classification was 0.76% and the positive predictive value (the percentage of all SNP calls that were true SNPs) was 97%. The false negative rate (the percentage of probes with true SNPs that were classified as no SNP probes) was much higher, 49%, and the negative predictive value (the percentage of all no SNP calls that had no sequenced SNPs) was 78% (supporting information, Table S1). As the positive predictive value is crucial to the identification of ASE probes, the method was deemed to be a success. All probes, regardless of the availability of underlying sequence information, classified as ASE probes using the rank-based method were examined in subsequent analyses. Additionally, there was little evidence for bias in this method when logical checks were applied (see File S1). Other approaches were also examined but were discarded as they did not perform as well.
Gene expression signal for a probe set was considered to be absent on a given RNA array if <50% of the probes had detectable signal. If expression for a probe set was not detected in two or more replicates, it was called absent for that genotype. If expression was absent from all genotypes, the probe set was removed from all tests. Considering only probe sets with detectable signal, there are 6764 ASE probe sets representing 6403 genes, and 12,417 OE probe sets representing 10,207 genes (Table 1). The level of gene expression for a probe set was estimated by taking the Tukey biweight of the intensity level for all probes in the set (Goodall 1983).
Overall expression levels in the two D. simulans parental lines and the two corresponding D. melanogaster, D. simulans hybrid genotypes were compared. The ANOVA model Yij = μ + gi + εij was fit. Yij is the OE for the ith genotype (i = W501, C167.4, Mel, F1C, F1W, Mix) and the jth replicate (j = 1, 2, 3). Mean OE of the two hybrids was compared using an F-test and very few differences were found (Table 3 and File S1). As combining the two D. simulans genotypes into single species level tests improves power, and the OE differences between D. simulans genotypes were minimal, the remainder of the analysis focused on the average effect for the parental D. simulans lines and for the hybrid genotypes. For completeness, estimates of individual effects are included in File S2 and Table S8.
Species and dominance tests:
For overall gene expression, differences between parents (species) and between parents and offspring (dominance) provide a baseline with which to understand how expression levels as a whole vary within and among species. For the RNA samples (n = 18), the ANOVA model Yij = μ + gi +εij was fit for each OE probe set. Yij is the gene expression for the ith genotype (i = W501, C167.4, Mel, F1C, F1W, Mix) and the jth replicate (j = 1, 2, 3). Variance for each genotype was not assumed to be constant. The null hypothesis that the species means are equal was tested. Dominance effects were estimated as the difference between the hybrid mean, OEms = (OEmel-W501 + OEmel-C167.4)/2 and the midparent mean, OEMP = [(OEW501 + OEmm)/2 + (OEC167.4 + OEmm)/2]/2, and classified as underdominance (OEms < OEmm and OEms < OEss, where OEss = (OEW501 + OEC167.4)/2), overdominance (OEms > OEmm and OEms > OEss), or dominance (complete, for example OEmm = OEms < OEss, or partial, for example OEmm < OEms < OEss). F-tests for these contrasts were constructed using the residual error as the denominator for the tests. Multiple tests were controlled for using a false discovery rate (FDR) (Benjamini and Hochberg 1995; reviewed in Verhoeven et al. 2005).
Cis and trans tests:
Genetic differences in gene expression are a result of cis and trans regulatory differences. For the current analysis to be logically consistent with the analyses described by Wittkopp et al. (2004), probe sets must be detected in at least one parent, and further, should show no evidence of probe-specific bias. Additionally, there should be no evidence that gDNA and cDNA hybridizations have different properties for the probes analyzed (see File S1 for full details). There were 2459 probe sets (2383 genes) that met these strict criteria implemented for analysis of cis and trans effects.
For cis effects the following ANOVA was fit for each probe set separately: Rijk = μ + Eij + εijk. Rijk is the difference between ASE and OE measurements for the ith genotype (i = Mel, C167.4, W501, F1W, F1C, Mix), for the jth sample type (j = RNA, DNA), and for the kth replicate (k = 1, 2, 3). Note that for the DNA, the three samples F1W, F1C, Mix were considered heterozygous replicates of the difference in ASE and OE to control for possible hybridization bias. The test of cis effects (Equation 5) is a test of the null hypothesis that the difference between allele-specific and overall expression is the same for the RNA and for the DNA controls (ASEms_mRNA− OEms_RNA = ASEDNAm − OEDNA). This was performed as a contrast in the above model: (RF1c + RF1w)/2 = RDNA with an F-test for each transcript set and variances were not assumed to be constant.
For trans effects, the ANOVA model Yijk = μ + gik + ejk + geijk + εijk was fit, where Yijk is the expression estimate for the ith genotype (i = Mel, C167.4, W501, F1W, F1C, Mix), the jth expression set (ASE or OE), and the kth replicate (k = 1, 2, 3). Variance among genotypes was not assumed to be constant. To test for trans effects (ASEmixm = ASEmsm; Equation 4), the allele-specific expression for each hybrid was compared to the parental allele-specific expression estimated from the mix sample using LS-means from the above model (Ymix,ASE = (YF1w,ASE + YF1c,ASE)/2). An F-test was used to test this hypothesis. For all tests, multiple testing was controlled using the original FDR procedure (Benjamini and Hochberg 1995). All models were fit in SAS V 9.1.3 and all SAS code for analysis, as well as a detailed set of documentation, is available upon request from the authors.
RESULTS AND DISCUSSION
Cis and trans differences in gene regulation between species can result in various patterns of overall expression levels observed in each species and in hybrid genotypes. Regulatory divergence may give rise to differences in overall expression between species. However, even when the overall transcript level is conserved between species, differences may exist. Stabilizing selection for transcript levels can result in compensatory changes in cis and trans regulatory elements (True and Haag 2001; Haag 2007; Johnson and Porter 2007). Therefore, regulatory divergence was analyzed for as many genes as possible, including those not differentially expressed between species. The relative contributions of cis and trans differences to regulatory divergence were assessed by comparing allele-specific expression levels within hybrids and, for a single allele, between the hybrid and parental genotypes (Table S8http://www.genetics.org/cgi/data/genetics.109.105957/DC1/7). In hybrid individuals, trans differences affect both alleles equally, while the effects of cis differences on expression are allele specific. Therefore, in species hybrids, differences in the amount of transcript produced from each allele are caused by divergence of cis regulatory elements. The difference in expression of an allele in the hybrid compared to the expression of that allele in the parent is a result of differences in trans components, because the hybrid genetic background contains trans factors from both of the parental species. The Drosophila 1.0R tiling array format was used to estimate overall expression levels and allele-specific expression levels, allowing the comparison of species, dominance, and cis/trans effects for the entire transcriptome. (See Table S9.)
Variation in cis and trans regulation may be shaped by different evolutionary forces:
Regulatory differences between species are a result of multiple cis and/or trans differences and their interactions. A total of 2457 probe sets (2381 genes) were tested for both cis and trans differences. There were 654 probe sets (650 genes), which showed evidence for significant cis differences and 394 probe sets (390 genes), which showed evidence for significant trans differences (Table 2). There were 93 probe sets significant for both cis and trans effects. This is a substantially higher number of cis effects than trans effects, and the trend is consistent at different FDR levels.
The relative number of cis and trans effects have been studied in Drosophila at different levels of divergence and using different experimental designs. As the level of divergence increases, the difficulty of creating backcross genotypes also increases and experimental designs that require them become untenable. Thus, studies of cis and trans regulatory variation between strains or between races of D. melanogaster have been done using recombinant inbred lines (RILs) and chromosome substitution designs (Hughes et al. 2006; Genissel et al. 2008; Lemos et al. 2008; Wang et al. 2008). Studies of cis/trans regulatory divergence between D. melanogaster and D. simulans have been done using analyses of allele-specific expression in parental and hybrid genotypes (Wittkopp et al. 2004, 2008).
It is important to point out explicitly that counts of cis and trans reported in eQTL and chromosome substitution experiments are not necessarily directly comparable to cis and trans as inferred from F1 hybrid-based experiments. There are likely differences in the power to detect cis and trans effects, which are dependent on experimental design and there are also differences between designs in the interpretation of cis/trans effects. For example, in an eQTL experiment where RILs are employed, trans acting regulatory variants can be mapped individually if there are sufficient recombination events between the loci; the number and magnitude of the resulting trans effects on expression level can be estimated. In contrast, in a F1 hybrid design that infers cis and trans from comparisons of allele-specific expression, the inference of trans refers to the cumulative trans influences on expression of a single gene. The main effect is not separately estimable from the interactions and trans effects of potentially multiple genes are estimated as a whole, rather than individually. The result is that estimates of the extent of trans in hybrid experiments are dependent on the cumulative impact of trans variation across loci. Thus, trans effects can be detected in this design only if they tend to shift expression in one particular direction. In contrast, if trans acting regulatory divergence occurs in multiple genes, and the effects are compensatory such that the total level of expression remains relatively constant, then these trans effects will not be detected in the hybrid design. Differences in the reported contribution of cis effects may also be explained, in part, by differences in the genetic design. Multiple cis and/or trans effects can be obscured in eQTL designs when regulatory differences are clustered close together or are otherwise in linkage disequilibrium. Cis effect estimates may also differ between approaches because the cis component in a hybrid experimental design can consist of both main effects and interactions, whereas eQTL designs can separate these terms to some degree.
Keeping in mind differences between experimental designs in interpretation of cis and trans, what can be concluded about cis and trans components of regulatory variation? Surveying within-species variation for cis and trans in D. melanogaster males, Hughes et al. (2006) found roughly equal numbers of cis and trans differences (for third chromosome substitutions). Wang et al (2008) also explored cis and trans effects in males using chromosomal substitutions (X, 2, and 3) between races of D. melanogaster and found pervasive effects of trans (>80% of genes had a trans effect), with many fewer main cis effects (3–14%, occurring mostly in combination with trans regulatory variation). This suggests that the trans effect (composite) may increase with increasing genetic divergence. The trans effect estimated in these chromosome substitution experiments may contain a significant interaction-based component (relative to main effects; Wang et al. 2008). The contribution of genetic interactions to expression variation is expected to increase with divergence and a significant proportion of this increase is likely driven by rapid divergence in male reproductive genes (Orr and Turelli 2001; Ranz et al. 2003; Haerty et al. 2007). However, a similar study of second chromosome substitutions, also in males, found no increase in trans with increasing divergence (Lemos et al. 2008). Thus, the relative contribution of divergence time within and between species remains unclear.
In contrast, there is clear evidence that the cis component of regulatory variation does increase with genetic divergence. Both the within-species work of Lemos et al. (2008) comparing the cis contribution to expression variation between populations and races in males, and the experiments of Wittkopp et al. (2008) comparing the cis contribution within species and between species in females, show an increased contribution of cis regulatory differences to expression variation with increasing genetic distance. There are multiple hypotheses that might explain this pattern. An increased rate of cis regulatory divergence could be explained by differences in additivity between cis and trans components of regulatory variation or by differences in the distribution of fitness effects associated with cis and trans regulatory mutations (Barton and Partridge 2000; Lemos et al. 2008; Wittkopp et al. 2008).
Interestingly, there are currently no studies assessing cis and trans components separately for males and females. Studies using both sexes or focusing exclusively on females have found greater contributions of cis effects than of trans effects both within and between species. Using a set of RILs derived from two strains of D. melanogaster, Genissel et al. (2008) found far fewer trans effects than cis, but argue that the trans component could be underestimated in this design. Wittkopp et al. (2004) assayed cis and trans components between species in females only and found a much greater contribution of cis regulatory divergences than trans. In their experiment, cis and trans effects were inferred for 29 genes selected on the basis of overall differences in expression between D. melanogaster and D. simulans. All but one of the genes assayed showed patterns consistent with cis differences; roughly half also had a trans component. The results of this genomewide study, of necessity also in females, are consistent with the findings of Wittkopp et al. (2004). Approximately 27% of genes tested had significant cis effects, 16% had trans, and 4% had both cis and trans effects. For some classes of genes, selection on expression level variation is likely to differ between the sexes and this could potentially result in differences in cis and trans contributions to expression variation and/or in different rates of evolution for cis/trans regulation. Similarly, there may well be parent-of-origin effects with respect to cis and trans differences (Gibson et al. 2004), though associations between such effects and cis and trans variation is unexplored. However, since the species pair of D. melanogaster, D. simulans does not produce both males and females in a single direction of the cross (without introducing hybrid rescue mutations into the genetic background), conclusions about the interplay of parent-of-origin, sex, and divergence time with respect to shaping regulatory variation and divergence will likely require experiments in more closely related species pairs.
Species differences are focused on defense and olfaction:
There were 12,416 probe sets (representing 10,206 genes) tested for species effects. Of these, 359 probe sets were expressed differently between species (Table 3, Figure 3). While this is a much lower percentage of expression divergent genes than has been observed in whole body samples (Ranz et al. 2003, 2004; Nuzhdin et al. 2004), expression of genes in the head is generally conserved relative to whole bodies (Ranz et al. 2004). An examination of GO categories for enrichment of genes differentially expressed between species using Fisher's exact test (Rivals et al. 2007; Table S2) found overrepresentation of genes from categories related to olfaction, defense, and the nervous system [olfaction–GO:0007606 (sensory perception of chemical stimulus), GO:0005549 (odorant binding), GO:0042048 (olfactory behavior); Defense–GO:0009636 (response to toxin), GO:0008202 (steroid metabolic process); Nervous system–GO:0006952 (neurotransmitter transport), and GO:0005328 (neurotransmitter: sodium symporter activity)]. For these enrichment tests, we used all genes where there were informative probe sets for the test of species. To test whether these broader functional categories were themselves enriched, multiple GO categories were grouped into the larger categories defense, olfaction, and nervous system (details given in File S1 and Table S5). In the head, the broad groups of defense (P = 0.002) and olfaction (P = 0.004) are clearly divergent between these species.
Genes involved in host defense may evolve rapidly due to host–pathogen coevolution (Jiggins and Kim 2007; Sackton et al. 2007). There were 41 defense-related genes that were differentially expressed between species (Table S3http://www.genetics.org/cgi/data/genetics.109.105957/DC1/4http://www.genetics.org/content/vol0/issue2009/images/data/genetics.109.105957/DC1/Supplemental_Table_5._Go_categories_grouped_to_produce_larger_functional_categories.XLShttp://www.genetics.org/cgi/data/genetics.109.105957/DC1/5http://www.genetics.org/cgi/data/genetics.109.105957/DC1/6), including Turandot A and Turandot C (TotA and TotC, antimicrobial peptides; Agaisse et al. 2003), Argonaute 2 (AGO2, viral infection response via siRNA regulation; Zamore 2007), and glutathione S-transferases (GstE1, GstD2, and GstD9 in response to toxins; Low et al. 2007), some of which show signatures of positive selection in coding sequence (Obbard et al. 2006; Sackton et al. 2007). Seven odorant-binding proteins were identified: Obp8a, Obp49a, Obp56a, Obp56g, Obp56h, Obp57c, and Obp99b. Many of these, and other Obps, have been reported to differ in antennae (Kopp et al. 2008). In Drosophila, odorant-binding proteins are expressed primarily in the olfactory and gustatory sensilla (both of which are present in the tissues that comprise head samples) and are thought to function mainly in chemosensory processes (Steinbrecht 1998; Galindo and Smith 2001). An analysis of coding sequence differences showed results consistent with balancing selection in Obp56h, Obp56g, and Obp99b, while Obp56a showed evidence for strong positive selection (Wang et al. 2007). Obp56g and Obp56a have been detected as mating responsive in studies of changes in female gene expression after copulation and four of the listed Obps (Obp8a, Obp56a, Obp57c, and Obp99b) were expressed differently between lines selected for increased/decreased copulation latency (McGraw et al. 2004; Mackay et al. 2005). Coupled with our results, this indicates that Obps could be involved in species differences in female reproduction, acting in olfactory processes important for pheromone detection (Xu et al. 2005) or ovoposition (Matsuo et al. 2007). However, the specific functional role of most Obps in Drosophila is unknown.
While neurological genes in general were not enriched for species differences, a large number of genes shown to be sex-specifically regulated in Drosophila heads (Goldman and Arbeitman 2007) are differentially expressed between species (Table 4). Of the 216 genes tested for species differences that were reported to be sex-specifically regulated by Goldman and Arbeitman (2007), 16% were divergent between species for gene expression in female heads. Overall, 3% of probe sets are divergent between species for gene expression. This difference is suggestive of increased divergence among sex-regulated genes in Drosophila heads that is potentially dependent on pathway (Table 4). Of the 17 genes tested that are regulated downstream of fruitless (fru, the master regulator of male courtship behavior; Demir and Dickson 2005), 9 are expressed at greater levels in D. simulans relative to D. melanogaster. In contrast, the 17 genes regulated downstream of tra are all expressed at greater levels in D. melanogaster. Since the role of fru regulation in head tissues of females is unknown, it is unclear whether these differences are functionally important in females.
Functional implications of over- and underdominance for expression level in hybrid female heads:
The test for differences between the average of the parental expression and offspring is a test of dominance for gene expression. Dominance for gene expression occurs when genetic interactions (trans by trans, cis by trans, and higher order interactions) differ between parental genotypes and hybrids. Several analyses of misexpression in hybrids found a relative excess of underdominance, but also observed large effect overexpression in species hybrids (Michalak and Noor, 2003; Ranz et al. 2004; Moehring et al. 2007). In this study, most of the genes expressed in the head appear to behave additively (also observed by Ranz et al. 2004). In contrast with the patterns of dominance observed for expression in whole body samples, roughly even numbers of genes showing underdominance and overdominance were observed for expression in hybrid head tissues (Figure 4, Table 5).
It has previously been shown that male-biased genes, in general, are misexpressed in D. melanogaster, D. simulans female hybrids (Ranz et al. 2004). Three of the genes with significant dominance (Obp99b, fit, and to) are known to be sex biased in the fat body cells of the Drosophila head tissues (Fujii and Amrein 2002; Wolfner 2003). Obp99b and to are genes with male-biased expression that are thought to play a role in male mating behavior (Dauwalder et al. 2002; Anholt et al. 2003), while fit is generally female biased in overall expression. However, fit is also upregulated in Drosophila males as a result of courtship, so in fact may also be involved in male courtship (Carney 2007). It is possible that misexpression of these genes in hybrid females results in abnormal mating behaviors. Fujii and Amrein (2002) overexpressed Obp99b in females and found that this resulted in reduced receptivity to mating relative to wild-type flies. However, hybrid female mating behavior has not been studied in this species pair. Studying abnormal female behavior would be difficult because hybrid females are completely sterile with degenerate ovaries. Any behavioral abnormalities observed could be interconnected with this major defect. In fact, overexpression of male fat body genes is likely a direct result of the enlarged fat body of hybrid females, which itself is physiologically associated with degenerate ovaries (Dickinson et al. 1984; Ranz et al. 2004). An important implication of the pattern of expression dominance observed in head tissues is that regulation of gene expression in the germline likely influences expression in the fat body and, possibly, in other head tissues.
Several other genes known to be sex biased or sex-specifically regulated in the Drosophila brain show dominance for gene expression (CG16898, CG5966, Cyp309a2, Iris; Goldman and Arbeitman 2007). In addition, two major genes involved in the regulation of transcript levels for large numbers of downstream genes, loqs (microRNA processing, germline stem cell maintenance; Förstemann et al. 2005) and Sxl (sex determination; for review, see Casper and Van Doran 2006), were found to show underdominance in female head tissues. The interpretation of these patterns is complicated by the fact that both loqs and Sxl are multitranscript genes (Förstemann et al. 2005) Significant underdominance was observed for the probe set corresponding to the loqs-RB specific exon, but not for the probe set that interrogated regions common to all transcripts (Figure 5). For Sxl, the probe set corresponding to exon CG18350:11 (common to male-specific transcripts) showed significant underdominance (note that leaky expression of male-specific isoforms of sex-lethal has been documented in D. melanogaster; Tarone et al. 2005). No significant underdominance was observed for the two other probe sets (corresponding to both male and female transcripts and to a single female transcript, respectively), although the trend is toward reduced expression relative to the parental mean (Figure S1http://www.genetics.org/cgi/data/genetics.109.105957/DC1/11http://www.genetics.org/cgi/data/genetics.109.105957/DC1/12http://www.genetics.org/cgi/data/genetics.109.105957/DC1/13http://www.genetics.org/cgi/data/genetics.109.105957/DC1/8http://www.genetics.org/cgi/data/genetics.109.105957/DC1/9). Ranz et al. (2004) also observed significant underdominance for Sxl in whole body samples (but not for head samples). Sxl protein levels in hybrid embryos were also found to show misregulation of sex specificity in both directions of the D. melanogaster–D. simulans cross (PAL Bhadra et al. 2006). Discordant results for different probe sets are not unexpected given the complex splicing patterns for Sxl.
Although head samples were analyzed, it is possible that misexpression of loqs and/or Sxl in other tissues is associated with degeneration of the gonad directly because both of these genes are intimately involved in the normal function of the germ line (Granadino et al. 1993; Park et al. 2007; Wang and Lin 2007). In D. melanogaster–D. simulans F1 hybrid females, hybrid female sterility is associated with a failure to maintain germ line stem cells, eventually resulting in degeneration of the gonad as a whole (Hollocher et al. 2000). Processing of microRNA (miRNA) precursors is mediated by the loqs-Dicer-1 complex and loqs is also required for regulation of endogenous small interfering RNAs (endo-siRNAs; Leuschner et al. 2005; Czech et al. 2008). The loqs-Dicer-1 complex is intrinsically required for germ line stem cell maintenance: in loqs mutant females germ line stem cells fail to self-renew as both daughter cells improperly differentiate gradually depleting stem cell numbers (Park et al. 2007). Sxl functions both broadly in sex determination of the germ line and specifically in germ line cyst development, where its repression coincides with cystoblast differentiation (Wang and Lin 2007). Similarity of mutant phenotypes in loqs and Sxl to abnormal phenotypes observed in hybrid females is suggestive and warrants further investigation of the possible role of these genes in hybrid female sterility.
Cis and trans regulatory divergence and differences in overall transcript level:
No significant difference in the proportion of genes with significant cis (or of trans) tests was observed among genes showing significant differences between species. This indicates that selecting only genes with species effects for analysis is unlikely to bias conclusions about the relative proportion of cis and trans effects. Of 43 probe sets (tested for cis and trans effects) with significant differences between species, 16 were significant for cis differences and 11 were significant for trans differences in expression (Figure 3, A and B). In two cases the species and cis effects are of opposing direction (DIP1 and trpl) and this may be due to undetected trans effects, undetected dominance effects, or to a significant contribution of interaction terms to the differences in allele-specific expression. For trpl, which has a trans estimate (not significant) that is consistent with the species difference, it may be that the trans effect explains most of the difference observed between species. Interestingly, trpl is normally upregulated in D. simulans relative to D. melanogaster and shows expression variation patterns consistent with directional selection (Landry et al. 2007). This is consistent with the observed cis difference, but not the trans effect. In this case the overall difference between species observed may be specific to the lines analyzed and explained mostly by trans effects, perhaps as a result of a marker mutation.
Lemos et al. (2008) found a significant association between dominance for gene expression and trans variation within D. melanogaster. Although there was only a handful of genes with significant dominance that could be tested for cis and trans differences, all six of the genes with significant dominance also showed significant trans differences (Figure 6). Of these, three are also significant for cis differences. The effects of both trans by trans and a subset of cis by trans interactions are included in the estimate of composite trans effects, while the composite cis effects include differences between alleles in cis by trans interactions. For cases in which both the composite cis and composite trans effects are significant, underlying interactions may be due to either or a combination of both. These results indicate that for species hybrids the contribution of trans by trans and cis by trans effects to misexpression may be roughly equal.
In summary, a novel whole-genome approach to efficiently assay allele-specific expression with tiling microarrays was developed. Using estimates of overall expression and allele-specific expression, a model parameterizing expression differences in terms of cis and trans effects was used to test for regulatory divergence. More genes appear to be affected in cis (27%) than in trans (16%), with a smaller number (4%) showing both cis and trans regulatory divergence. Trends toward increased expression divergence in fly heads among defense, olfaction, and sex hierarchy regulated genes suggest possible links between regulatory divergence affecting these genes and species differences in reproductive and ecological traits. For a subset of these genes cis/trans regulatory differences were identified, and these are good candidates for future studies connecting regulatory divergence with specific trait differences. In addition, several genes known to be associated with mating behavior are misexpressed in hybrid female heads, suggesting the possibility of hybrid behavioral abnormalities. While not many genes exhibited dominance variation, those that did were highly suggestive of an association between D. melanogaster, D. simulans hybrid female sterility and nonadditivity of expression in major genes. This may be due to changes in expression in somatic tissues, which can affect development of the germ line and oogenesis, or the transcripts may be underexpressed in the germ line itself. Interestingly these genes are likely conserved in overall expression level, but are divergent in regulation, suggesting that deleterious regulatory interactions in otherwise functionally conserved genes may contribute to reproductive isolation between these species.
We thank Kristy Harmon for assistance with sample preparation and the Bloomington Drosophila Stock Center and the Tucson Drosophila Species Stock Center for providing fly strains. In addition we greatly appreciate assistance with GEO submission provided by Rob Kulathinal and Brandon Walts and bioinformatics provided by Brandon Walts. This research was supported by National Institutes of Health grant 1R01GM077618.
Supporting information is available online at http://www.genetics.org/cgi/content/full/genetics.109.105957/DC1.
Microarray data from this article have been deposited with the Gene Expression Omnibus under accession no. GSE17453.
Communicating editor: J. A. Birchler
- Received June 5, 2009.
- Accepted August 3, 2009.
- Copyright © 2009 by the Genetics Society of America