Dosage compensation refers to the equalization of X-linked gene transcription between the heterogametic and homogametic sexes. In Drosophila, the dosage compensation complex (DCC) mediates the twofold hypertranscription of the single male X chromosome. Loss-of-function mutations in any DCC protein-coding gene are male lethal. Here we report a population genetic analysis suggesting that four of the five core DCC proteins (MSL1, MSL2, MSL3, and MOF) are evolving under positive selection in D. melanogaster. Within these four proteins, several domains that range in function from X chromosome localization to protein–protein interactions have elevated, D. melanogaster-specific amino acid divergence.
The Drosophila core dosage compensation complex (DCC) is composed of five proteins and two noncoding RNA transcripts that assemble to mediate the twofold hypertranscription of the single male X chromosome, largely through chromatin remodeling (Lucchesi et al. 2005). Loss-of-function mutations in msl1, msl2, msl3, mof, or mle result in male-specific lethality (Belote and Lucchesi 1980; Skripsky and Lucchesi 1982; Hilfiker et al. 1997).
These five proteins are evolutionarily unrelated and functionally distinct. MOF acetylates lysine 16 of histone H4 (Bone et al. 1994; Hilfiker et al. 1997), a histone mark known to stimulate transcription (Ikeda et al. 1999). Two other components, MSL1 and MSL2, are thought to scaffold the complex to the X chromosome (Lyman et al. 1997; Scott et al. 2000), while MSL3 bound to MSL1 stimulates histone acetylation induced by MOF (Morales et al. 2004). Finally, MLE interacts with at least one of the two noncoding RNAs associated with the DCC (roX1 and roX2), likely guiding these RNAs to the DCC and reinforcing their integration into the complex (Copps et al. 1998). Other proteins such as JIL-1 (Jin et al. 2000) and the NURF complex components (Bai et al. 2007) interact with the DCC but are also known to engage in several other biological processes. We therefore refer to MSL1, MSL2, MSL3, MOF, and MLE as the “core” DCC proteins. We note that these five proteins are also considered core DCC components in an alternative dosage compensation model posited by Birchler et al. (2003).
The DCC components are coded by essential genes that are conserved across Drosophila and at least four other drosophilid genera (Marin et al. 1996). Nevertheless, a genome scan of D. melanogaster–D. simulans divergence revealed several candidate Gene Ontology terms, including “dosage compensation,” that were enriched for genes with unusual patterns of divergence (Begun et al. 2007). However, those data were insufficient for inferring the population genetic mechanism, which motivated the detailed D. melanogaster polymorphism survey presented here. Our population genetic analysis supports the hypothesis that four of the five core DCC proteins are evolving adaptively, and surprisingly, this pattern is largely restricted to the D. melanogaster lineage.
We found a consistent and significant excess of amino acid fixations at msl1, msl2, msl3, and mof along the D. melanogaster branch (Table 1), suggesting that recurrent directional selection is an important force driving rapid evolution at these four loci in D. melanogaster. We found no evidence of recurrent adaptive protein evolution for mle; however, limited statistical power due to low levels of polymorphism at this locus makes our inability to reject neutrality difficult to interpret. Reduced polymorphism at mle may be explained by its location near a region of reduced crossing over at the 2R centromere. In D. simulans, only mof exhibited evidence of adaptive protein evolution (Table 1). These data suggest that adaptive protein divergence of DCC components is considerably more common in D. melanogaster than in D. simulans. An attempt to use PAML to understand long-term patterns of molecular evolution was thwarted by unreliable alignments among Drosophila species outside the D. melanogaster subgroup, which raises the possibility of rapid evolution on other distantly related lineages.
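As an illustration of how an excess of amino acid fixations can be assessed, the following is a minimal sketch of a McDonald-Kreitman-style contingency test. It rests on assumptions: the text above does not name the test behind Table 1, so the choice of Fisher's exact test and the neutrality index is ours, the function names `fisher_exact_two_sided` and `mk_test` are hypothetical, and the counts in the usage example are invented and are not the values in Table 1.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


def mk_test(dn, ds, pn, ps):
    """Compare fixed (dn, ds) and polymorphic (pn, ps) site counts.

    dn, ds: nonsynonymous and synonymous fixed differences on a lineage.
    pn, ps: nonsynonymous and synonymous polymorphisms within the species.
    A neutrality index below 1 indicates an excess of amino acid fixations.
    """
    p_value = fisher_exact_two_sided(dn, ds, pn, ps)
    # The neutrality index is undefined when its denominator is zero.
    ni = (pn * ds) / (ps * dn) if ps * dn > 0 else None
    return {"neutrality_index": ni, "p_value": p_value}


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not data from Table 1).
    result = mk_test(dn=20, ds=10, pn=5, ps=20)
    print(result["neutrality_index"])  # 0.125
    print(result["p_value"])
```

With low polymorphism (small `pn` and `ps`), the table has little power to reject neutrality regardless of the divergence counts, which is the difficulty noted above for mle.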
We used finer-scale analysis to gain biological insight into particular protein functions that might be under selection (Table 2). No known protein domains of MOF or MSL2 exhibited evidence of rapid evolution; in fact, the nonsynonymous divergence for these domains was relatively low compared to genomic averages (Begun et al. 2007). However, the MSL1 glycine-rich domain, which is thought to be involved in MSL1–MSL1 interactions (Li et al. 2005), as well as the MSL1 X chromosome-binding domain, have unusually high dN. The MSL1 leucine zipper/apolar/coiled-coil domain, essential for MSL2 binding as well as X chromosome localization (Li et al. 2005), and the polar region of MSL3, which is thought to cooperate with the chromatin-binding domain of MSL3 (Buscaino et al. 2006), also had elevated dN.
The four proteins at which adaptive evolution was inferred in D. melanogaster each interact directly with at least one other member of the complex, which suggests the possibility of a history of coevolutionary interactions within this single complex; recurrent directional selection on one or more members of the complex may drive compensatory evolution in other members. Protein domains known to be involved in protein self-association, localization to the X chromosome-bound complex, intraprotein interactions, and interprotein interactions were found to be evolving particularly rapidly. These findings are consistent with the hypothesis of within-complex and even within-protein coevolutionary interactions and suggest protein domains where adaptive evolution is concentrated. These domains are excellent candidates for investigating DCC functional divergence between D. melanogaster and D. simulans.
The nature of DCC functional divergence between these sister species depends on the biological or molecular function targeted by directional selection. Adaptive protein evolution at msl1, msl2, msl3, and mof may result from selection on DCC function itself or on a dosage-independent function carried out either by a DCC protein or by an adaptively evolving non-DCC protein that interacts with a dosage compensation protein. For example, the nuclear pore protein NUP153 interacts with MSL3 and MOF (Mendjan et al. 2006) and exhibits patterns of adaptive protein evolution along both the D. melanogaster and D. simulans lineages (Presgraves and Stephan 2007). However, given the observation that NUP153 has been under directional selection in both D. melanogaster and D. simulans, it is not clear why the putative coevolutionary interactions among the core DCC proteins would be restricted to the D. melanogaster lineage. Indeed, lineage specificity must characterize whatever biological or molecular process is driving adaptive evolution at the DCC.
An alternative explanation for D. melanogaster-specific adaptive protein evolution at several core DCC loci is a host–pathogen interaction. The male-killing bacterium Spiroplasma has been found in several natural populations of D. melanogaster that occur near the equator (Montenegro et al. 2000; Pool et al. 2006). Veneti et al. (2005) showed that Spiroplasma-infected D. melanogaster males carrying a loss-of-function mutation in any one of the five core DCC components survived to the third larval instar stage, while similarly infected males that carried no DCC mutations died as embryos. This result suggests that the deleterious effect of Spiroplasma on male fitness may be mediated through interactions with the dosage compensation complex and that a host–pathogen arms race localized to the DCC proteins may drive recurrent adaptive protein evolution. Interestingly, Spiroplasma has yet to be identified in D. simulans. Evidence that DCC adaptive protein evolution is more prevalent in Drosophila species with histories of Spiroplasma infections in nature would support this host–pathogen hypothesis.
The authors thank T. Mackay for providing the inbred D. melanogaster lines, S. Nuzhdin for providing the inbred D. simulans lines, and M. Kuroda for helpful comments on an earlier draft of the manuscript. M.T.L. was supported by a National Science Foundation (NSF) Graduate Research Fellowship. A.K.H. was supported by an NSF Postdoctoral Fellowship in Biological Informatics. This work was funded by National Institutes of Health grant GM071926 to D.J.B.
Communicating editor: D. M. Rand
- Received July 25, 2007.
- Accepted September 18, 2007.
- Copyright © 2007 by the Genetics Society of America