Hybrids between species are often characterized by novel gene-expression patterns. A recent study on allele-specific gene expression in hybrids between species of Drosophila revealed cases in which cis- and trans-regulatory elements within species had coevolved in such a way that changes in cis-regulatory elements are compensated by changes in trans-regulatory elements. We hypothesized that such coevolution should often lead to gene misexpression in the hybrid. To test this hypothesis, we estimated allele-specific expression and overall expression levels for 31 genes in D. melanogaster, D. simulans, and their F1 hybrid. We found that 13 genes with cis-trans compensatory evolution are in fact misexpressed in the hybrid. These represent candidate genes whose dysregulation might be the consequence of coevolution of cis- and trans-regulatory elements within species. Using a mathematical model for the regulation of gene expression, we explored the conditions under which cis-trans compensatory evolution can lead to misexpression in interspecific hybrids.
VARIATION in Darwinian fitness results from interactions among genes in the context of environmental variation. Through the process of natural selection, genes within a genome become coadapted (Dobzhansky 1937). The extent of genomic coadaptation within species can be measured ex post facto by disrupting the harmonious genetic background. A classical example is the production of novel phenotypes in hybrids between species, which is thought to result, at least in part, from combinations of incompatible gene products encoded by the respective genomes now present in the same cells. Although few examples of the molecular basis of such interactions are known, it is likely that regulatory differences are important (Orr and Presgraves 2000; Osborn et al. 2003).
Gene regulation involves numerous molecular interactions. Elements within a given regulatory module are expected to be coadapted (Dover 1992; Porter and Johnson 2002). So far, there is limited evidence for regulatory coadaptation, owing to the difficulty of addressing this question on a large scale. Recent genome-scale measurements of gene expression in interspecific hybrids of Drosophila (Michalak and Noor 2003; Ranz et al. 2004), maize (Auger et al. 2005), and Arabidopsis (Comai et al. 2003; Wang et al. 2004) have revealed numerous genes with expression levels in F1 hybrids that are completely outside the range of that found in the parental species. These observations support the hypothesis of pervasive coadaptation of genetic regulatory elements. The reasons for the hybrid misexpression are unclear. As general possibilities, anomalies in the hybrid transcriptome could be a consequence of irregular development, generalized stress response associated with the presence of two incompatible genomes in the same cells, or of dysregulation of a few major regulatory genes whose effects cascade through the regulatory network (Comai et al. 2003; Osborn et al. 2003). In general, these scenarios are difficult to disentangle. The hypothesis of cis × trans interaction at individual genes, however, can be subjected to a rigorous experimental test. We can show whether, for individual genes, the interaction of the trans-regulatory elements from one species with the cis-regulatory elements of the other is responsible for the dysregulation in the hybrid. This model might be most appropriate for genes whose cis- and trans-regulatory elements have evolved since the time that the parental species became reproductively isolated (e.g., Shaw et al. 2002). It is this hypothesis that we set out to examine for a sample of genes.
The rationale for the hypothesis is that the regulation of gene expression requires the harmonious interaction of trans-regulatory elements (in the simplest case, transcription factors) with cis-regulatory elements (in the simplest case, DNA-binding sites for the transcription factors) (e.g., Ptashne and Gann 2002). If natural selection acts to maintain an optimal level of gene expression through time (stabilizing selection), as it appears to (Denver et al. 2005; Lemos et al. 2005), then genetic changes in cis- and trans-regulatory elements that compensate each other may accumulate. Consequently, regulatory elements may diverge genetically between species even though the level of gene expression remains approximately constant. Thermodynamic models that have examined the properties of gene regulatory systems also suggest that similar phenotypes may be covering up genetic variation in the underlying regulatory elements and proteins (Gibson 1996; Veitia 2003). Compensated cis- and trans-regulatory evolution between species is a manifestation of a process coined developmental-system drift by True and Haag (2001), through which the phenotypes are evolutionarily maintained despite a turnover of the underlying developmental networks.
Experimentally, different types of regulatory divergence can be detected by assaying allele-specific gene expression in related species and their hybrids (e.g., Ohno 1969; Parker et al. 1985; Wray et al. 2003; Pastinen and Hudson 2004). When the alleles differ in their expression to the same extent in the parental species as in the hybrids, then cis-acting genetic differences may be inferred (e.g., Yan et al. 2002; Wittkopp et al. 2004). When the alleles differ in their expression to a larger extent in the parental species than in the hybrids, then trans-acting genetic differences may be inferred. However, when the cis-regulatory divergence between alleles detected in the hybrid background is larger than the divergence between species, we can infer that there is also divergence in trans that compensates the changes in cis to bring the level of expression of the gene to a more similar level between species than expected given the divergence in cis. An extreme example of this scenario would be when the allelic divergence in the hybrid is in the opposite direction to the species divergence. A recent survey of allelic expression of ∼30 genes in Drosophila melanogaster, D. simulans, and their F1 hybrid has revealed that, for roughly one-third of the genes, the alleles differed in their expression to a smaller extent between the parental species than in the hybrids (Wittkopp et al. 2004). This result suggests both cis-acting and trans-acting regulatory differences between the parental species, whose interactions within species produce more similar expression between species than expected given the regulatory divergence in cis.
To test for cis-trans-regulatory coadaptation that would result in misexpression, it is necessary to assay the total levels of gene expression in hybrids and the parental species, as well as the ratio of expression of the parental alleles in the hybrid background. The reason is that if there is coevolution between cis-acting and trans-acting factors originating from the parental genomes, it may result in regulatory incompatibilities and in overall misexpression in the hybrids. To test whether misexpression in hybrids is accompanied by patterns of cis-trans compensatory regulation, we tested whether genes that are known to be misexpressed in the F1 hybrid between D. melanogaster and D. simulans (Ranz et al. 2004) show differential expression of the parental alleles concordant with coevolution of cis- and trans-regulatory elements. We also tested for total misexpression in hybrids for genes that exhibited cis-trans compensatory regulation in Wittkopp et al. (2004). We combined these results to identify genes whose compensatory cis-trans regulation might contribute to their misexpression in hybrids between these two species. Finally, to explore the kinds of molecular changes that could underlie compensatory regulatory changes and hybrid misexpression, we developed a mathematical model of evolutionary divergence of gene regulation that incorporates changes in cis- and trans-regulatory elements. The qualitative behavior of this model suggests that cis-trans compensatory evolution can lead to misexpression in the hybrid as well as differential allelic expression, a pattern observed for a number of the genes examined experimentally.
MATERIALS AND METHODS
Gene expression analysis:
Relative allele-specific expression in the hybrid background and relative gene expression between parental species were assayed in D. melanogaster and D. simulans females and in their F1 hybrid. This sample includes 31 genes in total (supplemental Table 1 at http://www.genetics.org/supplemental/): a set of 23 genes studied in a cross between D. melanogaster Canton-S females and D. simulans Sim1 males and a second set of 8 genes studied in a cross between D. melanogaster zhr females and D. simulans Tsimbazaza males. In both cases, virgin females of 3–5 days old were used to produce whole-body extracts of mRNA. The first set of genes was selected on the basis of their total expression level in F1 hybrids and in the parental species as determined with DNA microarrays by Ranz et al. (2004), with no prior knowledge on the relative allelic expression of the genes in the F1 hybrid. For this set of genes, we measured relative allelic expression level in the hybrid and between species, using pyrosequencing (Ahmadian et al. 2000). The total gene expression level in hybrids relative to that in the parents was obtained from Ranz et al. (2004). The second set of 8 genes was chosen because previous results had shown patterns of expression consistent with compensatory cis-trans regulation in 24- to 28-hr-old hybrids (Wittkopp et al. 2004). Because gene expression is often age specific, we used 3- to 5-day-old flies for this work to match the sample conditions used for the microarray experiment (Ranz et al. 2004). Prior to this work, nothing was known about the total expression of these genes in hybrids. We measured the total level of expression in hybrids relative to that in the parental strains using quantitative real-time PCR (methods shown below) and measured allele-specific expression in hybrids and parental species using pyrosequencing. Only 4 of the 8 selected genes showed patterns consistent with compensatory cis-trans regulation (6 of 8 still show divergence in cis) at this later stage.
RNA/DNA extraction for pyrosequencing assays:
In the cross between D. melanogaster Canton-S and D. simulans Sim1, frozen samples of adult flies used in the microarray analysis (Ranz et al. 2004) were studied. For the cross between D. melanogaster zhr and D. simulans Tsimbazaza, the same collection conditions were used (3- to 5-day-old virgin females, frozen between 12 and 2 pm). Seven D. melanogaster and 7 D. simulans virgin females were pooled for each extraction (supplemental Figure 1 at http://www.genetics.org/supplemental/). Pools containing 14 interspecific hybrid virgin females were extracted in parallel. Four independent RNA extractions, one per pool of flies, were performed and were used for four independent cDNA syntheses (Omniscript; QIAGEN, Valencia, CA). Total RNA was extracted using the SV RNA system (Promega, Madison, WI) according to Wittkopp et al. (2004). Extractions with the SV total RNA kit allowed the independent isolation of genomic DNA from the same flies, which is required for normalization of the cDNA samples analyzed with pyrosequencing.
Quantitative real-time PCR:
RNA was extracted in duplicate as described above without the DNA extraction step and with a DNAse treatment to eliminate any genomic DNA carryover (DNAfree; Ambion, Austin, TX). Instead of relying on a housekeeping gene to control for RNA abundance among strains and extractions, we measured the concentration of total RNA in quadruplicate, using RiboGreen following the manufacturer's protocol (Molecular Probes, Eugene, OR). An equal amount of total RNA was used (1 μg) for two reverse-transcription reactions per RNA extraction, for a total of four cDNA synthesis reactions per strain, using Superscript II (Invitrogen, Carlsbad, CA) following standard protocols [using poly(dT) and random hexamers]. The four cDNA samples were pooled, and four real-time PCR reactions on each pooled cDNA were performed for each gene, using custom primers and SYBR green 1 (QIAGEN). Primers for the quantitative real-time PCR (rt-qPCR) were the same as those used in the pyrosequencing assays (see below) and were designed relative to conserved regions between the species and strains. To establish the relationship between fold change and the cycle-threshold (CT) measured, we established standard curves (eight dilutions and one no-template control) for each of the genes by purifying PCR products (ExoSAP-IT; United States Biochemical, Cleveland) generated from primers external to the primers used for the rt-qPCR. PCR reactions were run on a MX3000P real-time PCR machine (Stratagene, La Jolla, CA). Ninety-five percent confidence intervals (C.I.'s) were calculated around the mean of the four replicates and nonoverlapping C.I.'s for two samples were considered as significantly differentially expressed.
Allele-specific gene expression level:
For all of the 31 genes, pyrosequencing was used to measure allele-specific gene expression as described in Wittkopp et al. (2004). Briefly, a fragment of 500–1000 bp of the coding sequence of each of the genes was sequenced for each of the strains studied. Pyrosequencing uses a single-nucleotide difference in the transcribed sequence to measure the relative expression of two alleles in the same sample. A primer is annealed upstream of the divergent site and extended one base at a time, with the number of nucleotides incorporated at each position proportional to the number of transcripts in the sample. The relative expression levels of D. melanogaster and D. simulans alleles were determined by calculating the ratio of species-specific nucleotide incorporation at the divergent site.
Four parental pools, each composed of 14 flies, and an additional four pools, each with 14 hybrid flies, were analyzed. RNA from each of these eight pools was used in four separate cDNA synthesis reactions. Pyrosequencing was used to measure the ratio of D. melanogaster to D. simulans alleles in each genomic DNA extraction in duplicate and in each cDNA synthesis, for a total of 16 DNA and 32 cDNA measurements for each gene. For zhr/Tsimbazaza, cDNA samples from the four hybrid pools were measured in duplicate, whereas each hybrid DNA was measured once. Parental cDNA samples were measured in triplicate and DNAs were measured twice. A total of 12 DNA and 20 cDNA samples were collected for each gene examined in the zhr/Tsimbazaza cross. Replication of cDNA synthesis reactions and pyrosequencing of hybrid genomic DNA samples were reduced in the zhr/Tsimbazaza samples because these were found to be small sources of error in the Canton-S/Sim1 samples (data not shown). Pyrosequencing reactions that did not meet quality-control standards (on the basis of manual examination of the signal for conserved bases and background noise) were excluded from analysis. In all, 4% of measurements were excluded for Canton-S/Sim1 and 9% were excluded for zhr/Tsimbazaza. The ratios of the species-specific bases for each sample were then calculated. All ratios were log2-transformed to make them normally distributed. On this scale, a value of 0 means equal expression of both alleles, with positive and negative values representing more transcripts from D. melanogaster and D. simulans alleles, respectively.
Measurements of genomic DNA were used to normalize the cDNA ratios (Wittkopp et al. 2004). No significant difference was observed among DNA measurements from replicate hybrid pools (ANOVA, P = 0.86); thus all measurements were combined for normalization. For hybrid cDNA measurements, the median of the log-transformed hybrid DNA measurements was subtracted from each cDNA value to correct for differences in PCR amplification and/or pyrosequencing. Parental pools have an additional source of error from the potential unequal extraction of D. melanogaster and D. simulans alleles in the pools of adult flies. For each parental pool, a general linear model (SAS; SAS Institute, Cary, NC) was used to obtain regression fits of parental DNA on median hybrid DNA (R2 = 0.94). The fitted estimate of parental DNA for each gene within each pool was subtracted from the corresponding cDNA measurements to account for differences in PCR amplification, pyrosequencing, and possible extraction differences. This regression model allows DNA measurements of the same pool determined using different genes to be combined to more accurately estimate and incorporate the extraction bias of each pool. Normalized, log-transformed cDNA ratios estimated by pyrosequencing were used for analysis. For each gene, within each cross (Canton-S/Sim1 or zhr/Tsimbazaza), data were fitted to the following model using proc Mixed in SAS (SAS Institute), using restricted maximum likelihood (REML) to estimate parameters,where Y is the log-transformed, normalized cDNA value, G is the genotype (i = parent or hybrid), P is the pool (j = 1, 2, 3, 4) treated as a random effect, and e is a random error term. The variance component of “pool” is determined independently for each generation, using the group structure in proc Mixed. A random pool effect (nested within generation: P or H) was included in the model. Separate covariance parameters were fit for the pool effect in the P and H generations because the variance among replicate pools was always much larger for P than for H. Student's t-tests were computed within the Mixed procedure to test three null hypotheses concerning the relative level of expression of the parental alleles. These tests were for (1) no differential expression in the parents (P-ratio = 1), (2) no differential expression in the hybrid (H-ratio = 1), and (3) ratio of allelic expression in the parental lines equals that in the hybrid (P-ratio = H-ratio). These null hypotheses correspond, respectively, to (1) no expression difference between the parental species, (2) no difference in cis-regulatory elements, and (3) no difference in trans-regulatory elements.
The DNA measurements provide an empirical reference point for establishing equal expression of the two alleles in cDNA. In all cases, the variance of cDNA measurement was much larger than that of the DNA samples. The range of replicate hybrid DNA measurements was typically 1–2%. Cis-regulatory differences would be incorrectly inferred only if the difference between the observed and “true” ratio of DNA was greater than the 95% C.I. on the cDNA. In the mixed model used to assess significance, the observed error among replicate measures was incorporated for each gene individually. No systematic bias should be introduced. The normalization strategy was initially developed to calibrate linear standard curves. Note that a 1.4-fold change corresponds to pyrosequencing measurements of 58% of one allele and 42% of the other. A 2% error (i.e., 48/52) would correspond to only a 1.08-fold change.
Because we are interested in general trends, a P-value cutoff of 0.05 was used as the criterion for rejecting the null hypothesis, with no correction for multiple tests. Note that, by treating replicate pools of flies as a random effect, the test is more conservative than testing merely for differences in the means of the observed samples. In our data, 41% of 101 significant tests (150 tests total) have P-values between 0.05 and the Bonferroni-adjusted cutoff of 0.00033 (31 genes). Pyrosequencing appears to be better able to detect small differences than microarrays, probably because of the large number of replicates or perhaps because DNA microarrays are inherently noisier. For instance, some genes that were not differentially expressed between species based in the microarray data could be shown to be differentially expressed using pyrosequencing. Because the microarrays were developed for D. melanogaster, microarray comparisons of D. melanogaster and D. simulans may be slightly biased toward D. melanogaster, depending on the degree of divergence in the coding sequence of the gene studied. Pyrosequencing is unlikely to suffer from this bias, because differential PCR efficiency of the melanogaster and simulans alleles is controlled by the DNA amplifications. Pyrosequencing and microarray results were nevertheless in good agreement (r = 0.8, P < 1 × 10−5). We therefore used microarray measurements from Ranz et al. (2004) to compare the expression level of the hybrid relative to that of the parents and pyrosequencing to measure the expression difference between species and the differential allelic expression in hybrids for consistency of measurements across the genes studied.
Relative allelic expression between D. melanogaster and D. simulans and in their F1 hybrid is shown in Figure 1. We hypothesized that compensatory cis-trans regulatory evolution within species should often result in misexpression in the interspecific hybrid owing to incompatibilities in the regulatory systems. On the basis of this hypothesis, we compared allele-specific expression with total level of expression for 31 genes in D. melanogaster, D. simulans, and their interspecific hybrid. The goal was to determine what fraction of genes, if any, that exhibited misexpression in F1 hybrids showed a pattern of allelic expression consistent with cis-trans compensatory regulation. This result would be expected for genes that had undergone compensatory cis-trans regulatory evolution in at least one of the parental lineages.
Specifically, pyrosequencing was used to examine whether each gene showed (1) differential expression in the parents (P-ratio ≠ 1), (2) differential expression in the hybrid (H-ratio ≠ 1), or (3) a difference in the ratio of allelic expression in the parental lines relative to that in the hybrid (P-ratio ≠ H-ratio). Different patterns of regulatory evolution predict different results for these three tests. We distinguished five patterns:
Conserved: No change in regulatory functions predicts that P-ratio = 1, H-ratio =1, and P-ratio = H-ratio.
cis: A change in only cis-regulatory elements predicts that P-ratio ≠ 1, H-ratio ≠ 1, and P-ratio = H-ratio.
trans: A change in only trans-regulatory elements predicts that P-ratio ≠ 1, H-ratio = 1, and P-ratio ≠ H-ratio.
cis + trans: A change in both cis- and trans-regulatory elements predicts that H-ratio ≠ 1 and P-ratio ≠ H-ratio. In a subset of these genes, the ratio of expression levels in the parents is greater than the ratio of allele-specific expression in the hybrids. Changes in cis and trans contribute to changes in gene expression between species in the same direction.
cis × trans: In another subset of genes with H-ratio ≠ 1 and P-ratio ≠ H-ratio, the ratio of expression levels in the parents is less extreme than, or in the opposite direction from, the ratio of allele-specific expression in the hybrids. These are cases of cis-trans compensatory regulation.
In addition to their regulatory evolution being classified as conserved, cis, trans, cis + trans, or cis × trans, the genes can be classified orthogonally after Gibson et al. (2004) according to their overall level of expression in the hybrid as assayed through either microarray analysis (Ranz et al. 2004) (supplemental Figure 2 at http://www.genetics.org/supplemental/) or rt-qPCR measurements (supplemental Figure 3 at http://www.genetics.org/supplemental/). These categories are as follows:
No change: The overall level of expression in the hybrid and both parents are equal.
Dominant: The overall level of expression in the hybrid is equal to that of one of the parents.
Semidominant: The overall level of expression in the hybrid is intermediate between that of the parents.
Underexpressed: The overall level of expression in the hybrid is less than that of both parents.
Overexpressed: The overall level of expression in the hybrid is greater than that of both parents.
Together, these classifications produce the matrix shown in Table 1, where each entry is the number of genes in the corresponding cell. The patterns of allelic expression are also represented in Figure 1. On the basis of overall level of expression in the hybrids relative to that in the parents, Table 1 includes 3 genes showing no change, 1 showing dominance, 4 showing semidominance, and 23 showing either underexpression (5 genes) or overexpression (18 genes). On the basis of allele-specific expression patterns, Table 1 includes 2 genes that are conserved, 6 genes showing cis-regulatory differences only, 3 genes showing trans-regulatory differences only, and a total of 20 genes (almost two-thirds of the total) showing either cis + trans differences (5 genes) or cis × trans differences (15 genes). Four genes classified as cis × trans showed a significant cis-regulatory effect but this effect was not statistically significant from the difference between species, which was not different from 0.
In Table 1, the genes that qualify as showing compensatory cis × trans regulatory evolution are Mth, CG14438, CG11727, CG9273, CG3775, CG5288, CG9390, CG11236, CG14629, CG32444, CG15818, CG8997, CG9338, Cyp6w1, and CG17608. Mth shows a semidominant pattern of expression in hybrids; CG14438, CG11727, and CG9273 are underexpressed; and CG17608 does not change in hybrids. The remaining 10 genes were overexpressed in hybrids. In all, 10 of the 18 genes that are overexpressed and 3 of the 5 genes underexpressed in hybrids showed cis × trans regulation on the basis of allele-specific expression and thus are cases of compensatory regulation. This represents a total of 13 of 23 genes misexpressed in hybrids showing cis × trans regulation on the basis of allele-specific expression. These results not only are concordant with our hypothesis that some genes would show compensating cis- and trans-regulatory differences, but also indicate that a substantial proportion of regulatory changes fall into this category. Among the 10 cis × trans genes that are overexpressed in hybrids, in 7 cases (CG11236, CG15818, CG3775, CG5288, Cyp6w1, CG8997, and CG9338) the differential allelic expression in hybrids is opposite in sign to the difference in expression in the parental species. In other words, the allele originating in the parent with the lower level of expression is expressed to a higher level than the alternative allele in the hybrid. For example, gene CG9338 shows a log2 ratio in expression of −0.37 between species and 0.55 in the hybrid, and the gene is 2.78 times more highly expressed in the hybrid than in the parental species (supplemental Figure 3 at http://www.genetics.org/supplemental/). Another example, CG15818, shows log2 ratio in expression of 0.97 between species and −1.04 in the hybrid and is overexpressed by 5.15-fold in the hybrid relative to the average of the parents. Interestingly, CG15818 is likely to be associated with spermatogenesis (electronic annotation of gene ontology, www.flybase.org) and is related to rapidly evolving genes (Holloway and Begun 2004). On the other hand, not all cases of overexpression are associated with compensatory regulation between species. To cite one example, gene CG4847 shows a 5-fold increase in expression level in the hybrid relative to that in the parental species, yet there is no significant difference in the levels of allele-specific expression. This result suggests that nonadditive interaction between cis- and trans-regulatory elements is unlikely to be the source of misexpression of this particular gene in the hybrid. Since pyrosequencing measures the net effect of trans-regulatory changes, it is possible that in the case of CG4847, multiple compensatory trans-acting changes have evolved with opposing effects on gene expression. Interactions among coevolved trans-changes may be just as likely to lead to misexpression in hybrids. Although our results provide strong evidence for frequent compensatory changes in cis-acting and trans-acting regulatory elements, they do not reveal the molecular mechanisms. Elucidating these mechanisms will require detailed studies of individual genes. Unfortunately, little is known about the mechanisms controlling regulation of the genes examined in this study. Nevertheless, a model describing general aspects of gene regulation suggests that compensatory cis-trans changes can lead to similar expression between species and under- or overexpression in the F1 hybrid (see appendix).
Our data provide evidence for rapid compensatory coevolution of genetic regulatory elements. A study of overall gene expression and allele-specific expression in D. melanogaster, D. simulans and their F1 hybrid revealed that, in 15 of 31 genes, more similar overall levels of expression in D. melanogaster and D. simulans is observed than expected given their cis-regulatory divergence, resulting from cis-trans compensatory regulation. Thirteen of these genes were misexpressed in the F1 hybrid. We examined a heuristic model of gene regulation and considered specific examples to explore some of the possible sources of cis-trans compensation (see appendix). Because little is known about the specific mechanisms that control the expression of the genes studied here, the direct relevance of the heuristic model cannot be evaluated, but the genes that we identified are good candidates for further analysis of regulatory divergence. Direct experimental confirmation that cis- and trans-regulatory elements have coevolved by compensatory changes would require (1) characterization of the genetic regulatory elements and (2) measurements of gene expression of each parental allele in the genetic background of the other species. The key variables of the model, such as binding affinity and number of binding sites in the promoter, can be assayed (e.g., Oda et al. 1998; Stormo and Fields 1998; Shaw et al. 2002). Furthermore, the requisite technology exists to transfer individual Drosophila genes and their promoters between species (e.g., Laurie et al. 1990; Wittkopp et al. 2002). Tissue specificity of expression of these genes in the parental species and in their F1 hybrids should also be examined. Because the experiments we describe involve whole-body preparations of mRNA, species-specific differences in a factor that influences organ size may appear like trans-acting regulation, even though the interspecific divergence has nothing to do with the kinetics of transcription factor binding to the target promoter. Epigenetic effects or cytoplasmic-mitochondrial interactions may also act to confound the results. However, we did not examine this possibility here because the direction of the cross on the ratios of allelic expression in hybrids between D. melanogaster and D. simulans has been shown previously by Wittkopp et al. (2004) to have a negligible effect.
Models of gene regulation have made the clear prediction that changes in many parameters can lead to similar gene expression levels, and therefore similar levels of gene expression between species can conceal genetic divergence in the regulatory elements (Gibson 1996; Veitia 2003). Compensatory evolution has been studied in proteins (Kondrashov et al. 2002; Kulathinal et al. 2004), in RNA (Kern and Kondrashov 2004), and in DNA-binding motifs (Ludwig et al. 2000; Carter and Wagner 2002). In many cases, quantifying the functional effects of individual nucleotide or amino acid substitutions is difficult, but in the case of regulatory elements the level or pattern of gene expression provides a relevant assay. Only a few examples of compensatory evolution within regulatory elements have been documented, and examples of cis-trans-regulatory evolution are even rarer. Ludwig et al. (2000) showed that cis-regulatory elements in the even-skipped enhancer of D. melanogaster and D. pseudoobscura have coevolved in such a way that chimeric enhancers show abnormalities in expression pattern whereas the intact elements function apparently normally even in the genetic background of the other species. Juarez et al. (2000) have shown that amino acid substitutions in the DNA-binding domain of the transcription factor NifA reduce its binding affinity in Sinorhizobium meliloti. However, these changes have been compensated by amino acid replacements in its protein-activation domain, which has a higher affinity for the transcriptional machinery. One of the few examples of cis-trans-regulatory incompatibilities that have been identified is between bicoid and its binding sites in the tailless and hunchback promoters in D. melanogaster and Musca domestica. In most interactions, the bicoid protein from one species has a higher affinity for binding sites from the same species, but in the case of the Musca Bicoid protein, incompatibility is manifested through its higher affinity for the tailless promoter of both species (Shaw et al. 2002). Compensatory evolution can therefore occur at the cis-cis-, trans-trans-, and cis-trans-regulatory levels. The relative importance of each of those levels of interaction cannot be determined using our approach, because we concentrated on cases where there is cis-regulatory divergence. It is also impossible to determine from the patterns of expression if the changes in trans affect the transcription factors themselves, such as changes in the protein sequences, or other trans-regulatory effects such as the concentration of transcription factors or of other effects. The contribution of protein sequence variation of transcription factors to genetic variation in gene expression is largely unknown.
Compensatory evolution of cis- and trans-regulatory elements may be promoted by the abundant genetic variation in regulatory elements within species that has been reported. In the human population, for example, single-nucleotide polymorphisms and other types of genetic variation occur frequently in the core promoter region of genes and are known to have an impact on the level of expression, with effects as large as a 20-fold difference in gene expression level (reviewed in Rockman and Wray 2002; Tomso et al. 2005). Genetic variation located in trans has also been identified as contributing extensively to gene expression variation within species (e.g., Brem et al. 2002; Morley et al. 2004). A recent comparative genomic study has also shown that transcription factors can evolve rapidly relative to other classes of genes (Castillo-Davis et al. 2004), which was perhaps unexpected given the potentially large pleiotropic effects of mutations in transcription factors (Tautz 2000). Finally, experimental work has shown that amino acid changes in a transcription factor do not necessarily affect the expression pattern of all of its target genes, but are highly dependent on the particular DNA motifs that are present (Inga et al. 2002). A transcription factor can therefore coevolve with a subset of targets while leaving other interactions intact.
Our results suggest that cis-trans compensatory evolution occurs rapidly and on a wide scale. In the ∼2.5 million years since the divergence of D. melanogaster and D. simulans, >40% of our sample of genes show evidence of compensating regulatory changes. The effects appear to be gene specific and not due to some generalized malfunction of gene regulation in the interspecific hybrid. Analysis of a much larger sample of genes, and of samples of genes in various gene-ontology categories, would be most welcome in determining whether the results we have observed are pervasive across the genome or are more or less prevalent in particular classes of genes. It would also be of great interest to compare pairs of species that are more closely related to examine how these cis × trans interactions accumulate with time. If they accumulate rapidly, it would support the theoretical prediction that regulatory interactions may play a role in the reproductive isolation between species (Johnson and Porter 2000).
APPENDIX—MODEL OF GENE EXPRESSION REGULATION
Among the factors that influence the amount of transcript produced by a gene are the concentration of transcription factors, their binding affinity for the cis-regulatory motifs, and the strength of the cooperativity among them. Cooperativity here refers to the facilitation of the binding of transcription factors by the previous binding of transcription factors (Ptashne and Gann 2002). Experimental work has shown that the number of cis-regulatory elements, nucleotide variation in these elements, amino acid differences in DNA-binding domains of transcription factors, variation in cooperativity among transcription factors, or variation in the recruitment of the general transcription machinery can independently, but not necessarily additively, lead to changes in expression level (Hill et al. 1986; Gill et al. 1990; Beckett et al. 1993; Corton et al. 1998; Oda et al. 1998; Burz and Hanes 2001; Inga et al. 2002; Monti et al. 2002). Models of gene regulation that have included these variables (Gibson 1996; Veitia 2003) have made the clear prediction that changes in many parameters can lead to similar gene expression levels and therefore that similar levels of gene expression between species can conceal genetic divergence in the regulatory elements. In this section we examine the qualitative behavior of a heuristic model for the regulation of gene expression to determine whether misexpression of a gene in a hybrid can result from simple cis-trans-compensatory regulation within species.
The regulation of gene expression in eukaryotes can be modeled using laws of statistical thermodynamics and Boltzmann's factor (e.g., Gibson 1996; Wang et al. 1999). Our model includes variables sufficient to capture the qualitative behavior of gene expression regulation: the concentration of the transcription factor regulating the gene of interest (x, molar); its binding affinity for the cis-regulatory elements of the gene it regulates, expressed as free energy ΔGtf (kilocalories per mole); the strength of cooperativity among the transcription-factor molecules, also expressed in terms of free energy (ΔGco, kilocalories per mole); and the number of binding sites for the transcription factor upstream of the gene of interest (n). We assume that the transcriptional output (R) depends on the fractional occupancy of the promoter, which refers to the proportion of time (pσ) that the promoter spends in the particular configuration σ. The term configuration refers here to the number and arrangement of transcription factors on the binding sites in a promoter region. The probability of a given configuration pσ will depend on the concentration of the transcription factor, the binding affinity of the transcription factor for the binding sites, the number of binding sites, and the cooperativity among the transcription factor molecules. The cooperativity among the transcription factor molecules is assumed to depend on the proximity of the sites to which they bind, and therefore the strength of cooperativity will depend on the number of nearest-neighbor pairs p of transcription factors (p ≤ n − 1). The fractional occupancy of a configuration σ then becomes(A1)
Here, Z = serves to normalize the probabilities to 1, and(A2)(A3)where R is the ideal gas constant and T is the absolute temperature, RT = 0.59 kcal at 25°.
Once the fractional occupancy of each configuration is computed, we can assign to each a transcriptional response rσ and sum over all the configurations to obtain the expression level R:(A4)
We assume that the R0 does not differ between species and can be observed as the transcriptional output given that the gene is transcribed. The transcriptional response function rσ may depend on the number of sites bound (e.g., Gibson 1996) or on the binding affinity of the transcription factors for the transcriptional machinery. For simplicity, we let rσ be equivalent between species and set rσ = 1; this corresponds to transcription taking place whenever the promoter is bound. Accordingly, the transcriptional response R takes the form(A5)where WN is defined by(A6)
We can now imagine two species that can be hybridized to form an F1 hybrid. The species may differ in the values of the parameters x, n, ΔGtf, or ΔGco. When necessary, we distinguish the parameters of the two species by unprimed and primed symbols. A difference in x would correspond to divergence in the level of expression of the transcription factor (trans); that in n, to divergence in the number of binding sites upstream of the gene (cis); that in ΔGtf, to divergence in the affinity of the transcription factor for the binding sites [which can result from mutation of the transcription factor protein (trans) or the DNA binding sites (cis)]; and that in ΔGco, to divergence of the transcription factor molecules that changes their cooperativity (trans) or a difference in the distance among the binding sites (cis) that would produce a similar effect. The total expression level in the hybrid is the sum of the expression levels of the two alleles. This depends on the fractional occupancies of the two promoters, which can be deduced on the assumption that the transcription factors are equally available to each promoter.
Species that can hybridize are generally closely related. For example, the Drosophila species examined here diverged 2.5 MYA (∼5% nucleotide divergence) (Powell 1997). It is therefore reasonable to assume mutational changes affecting only a minimal number of parameters necessary for compensatory changes, say, two. Divergence only in cis-acting factors, through different numbers of binding sites n′ ≠ n, or through different binding affinity of the binding sites (A′ ≠ A) or distance between binding sites resulting in changes in binding cooperativity (ΔGco ≠ ΔGco′ and therefore b′ ≠ b), leads to a similar difference in expression levels between species and between alleles in the hybrid and leads to an intermediate gene expression level in the hybrid (Rh):(A7)(A8)
However, changes only in trans-acting factors (A′ > A, ΔGtf′ < ΔGtf, or x′ > x) lead to differential expression between parents but not between alleles in the hybrid. Note that in this case, contrary to the previous case where a cis-acting change was examined, a change in A represents a change in trans so that it is consistent with the case above:(A9)(A10)
Note that, because the promoters are functionally equivalent,(A11)
In this case(A12)Rh can be as large as R′, for example, if x′ > x, so that one species is closer to saturation, for instance, in cases where most transcription factor-binding sites are occupied by transcription factors. The ratio of allelic expression in the hybrid is nevertheless expected to be 1 because the promoters are functionally indistinguishable.
Our main interest is cases in which there is cis-regulatory divergence between alleles (Dal ≠ 1) but the expression level between species (Dsp) is compensated so that the difference in expression between the parental species is either (1) smaller than that between alleles in the hybrid background or (2) in the opposite direction. This can be achieved in many ways, but we concentrate on cases in which regulatory compensation is brought about by differences in the number of binding sites (n, a cis effect) balanced against differences in the expression level or binding affinity of the transcription factor (A, a trans effect). Other scenarios can be readily imagined (e.g., Shaw et al. 2002), but we focus on this mechanism because it can evidently lead to a rapid increase in genetic variation (Stone and Wray 2001).
Consider therefore the consequences of the assumptions that R = R′, n′ < n, and A′ > A. The expression level in hybrids is then(A13)
Because (WN − 1)/WN > (WN′ − 1)/WN′ follows from A′ > A,(A14)(A15)(A16)
However, if the regulatory systems are near saturation for the transcription factor, then (WN − 1)/WN ≈ (WN′ − 1)/WN′ and therefore(A17)
In other words, gene expression levels in hybrids can be lower than that in the parents despite the conserved level of expression between the parents. On the other hand, if the transcription factor in one species is at a concentration much smaller than saturation, then, for example, (WN − 1)/WN will be small and Rh can exceed R, and the ratio can be as large as(A18)
A specific example is presented in supplemental Figure 4 (http://www.genetics.org/supplemental/), where the affinity of the transcription factor for the binding sites in one species compensates for the difference in the number of binding sites. Although the gene is expressed at the same level in both species, Dsp = 1, the value of Dal ≠ 1 and the expression level of the gene in the hybrid is greater than that in the parental species. Through cis-trans compensatory regulation, therefore, the level of gene expression can be conserved between species but the gene will be overexpressed in the hybrid.
We thank the members of the Hartl Lab for motivating discussions and Nadia Aubin-Horth for comments on an earlier draft of the manuscript. We are grateful to the Bauer Center for Genomics Research for the use of their facilities and to Belinda Haerum for lab assistance. C.R.L. is a Frank Knox Memorial Fellow at Harvard University and was supported during this work by graduate fellowships from the Natural Sciences and Engineering Research Council of Canada, Fonds Québecois de Recherche sur la Nature et les Technologies, and Fondation Desjardins du Québec. P.J.W. is a Damon Runyon Fellow supported by the Damon Runyon Cancer Research Foundation (DRG 1795-03). This project was funded by National Institutes of Health grants GM68465 and GM60035 to D.L.H. and R01-AI064950 to A.G.C.
Communicating editor: D. M. Rand
- Received June 29, 2005.
- Accepted August 25, 2005.
- Copyright © 2005 by the Genetics Society of America