Pervasive and Largely Lineage-Specific Adaptive Protein Evolution in the Dosage Compensation Complex of Drosophila melanogaster

Dosage compensation refers to the equalization of X-linked gene transcription among heterogametic and homogametic sexes. In Drosophila, the dosage compensation complex (DCC) mediates the twofold hypertranscription of the single male X chromosome. Loss-of-function mutations at any DCC protein-coding gene are male lethal. Here we report a population genetic analysis suggesting that four of the five core DCC proteins—MSL1, MSL2, MSL3, and MOF—are evolving under positive selection in D. melanogaster. Within these four proteins, several domains that range in function from X chromosome localization to protein–protein interactions have elevated, D. melanogaster-specific, amino acid divergence.

The Drosophila core dosage compensation complex (DCC) is composed of five proteins and two noncoding RNA transcripts that assemble to mediate the twofold hypertranscription of the single male X chromosome, largely through chromatin remodeling (Lucchesi et al. 2005). Loss-of-function mutations in msl1, msl2, msl3, mof, or mle result in male-specific lethality (Belote and Lucchesi 1980; Skripsky and Lucchesi 1982; Hilfiker et al. 1997).
These five proteins are evolutionarily unrelated and functionally distinct. MOF acetylates lysine 16 of histone H4 (Bone et al. 1994; Hilfiker et al. 1997), a histone mark known to stimulate transcription (Ikeda et al. 1999). Two other components, MSL1 and MSL2, are thought to scaffold the complex to the X chromosome (Lyman et al. 1997; Scott et al. 2000), while MSL3 bound to MSL1 stimulates histone acetylation induced by MOF (Morales et al. 2004). Finally, MLE interacts with at least one of the two noncoding RNAs associated with the DCC (roX1 and roX2), likely guiding these RNAs to the DCC and reinforcing their integration into the complex (Copps et al. 1998). Other proteins, such as JIL-1 (Jin et al. 2000) and the NURF complex components (Bai et al. 2007), interact with the DCC but are also known to engage in several other biological processes. We therefore refer to MSL1, MSL2, MSL3, MOF, and MLE as the "core" DCC proteins. We note that these five proteins are also considered core DCC components in an alternative dosage compensation model posited by Birchler et al. (2003).
The DCC components are encoded by essential genes that are conserved across Drosophila and at least four other drosophilid genera (Marin et al. 1996). Nevertheless, a genome scan of D. melanogaster-D. simulans divergence revealed several candidate Gene Ontology terms, including "dosage compensation," that were enriched for genes with unusual patterns of divergence (Begun et al. 2007). Those data were insufficient for inferring the underlying population genetic mechanism, which motivated the detailed D. melanogaster polymorphism survey presented here. Our population genetic analysis supports the hypothesis that four of the five core DCC proteins are evolving adaptively and, surprisingly, that this pattern is largely restricted to the D. melanogaster lineage.
We found a consistent and significant excess of amino acid fixations at msl1, msl2, msl3, and mof along the D. melanogaster branch (Table 1), suggesting that recurrent directional selection is an important force driving rapid evolution at these four loci in D. melanogaster. We found no evidence of recurrent adaptive protein evolution for mle; however, limited statistical power due to low levels of polymorphism at this locus makes our inability to reject neutrality difficult to interpret. Reduced polymorphism at mle may be explained by its location near a region of reduced crossing over at the 2R centromere. In D. simulans, only mof exhibited evidence of adaptive protein evolution (Table 1). These data suggest that adaptive protein divergence of DCC components is considerably more common in D. melanogaster than in D. simulans. An attempt to use PAML to understand long-term patterns of molecular evolution was thwarted by unreliable alignments among Drosophila species outside the D. melanogaster subgroup, which raises the possibility of rapid evolution on other distantly related lineages.
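The excess of amino acid fixations reported above is the signal that a McDonald-Kreitman-style comparison looks for: a 2x2 table of fixed versus polymorphic differences, split into replacement and synonymous changes. The following Python sketch shows one way such a table could be tested. It assumes Fisher's exact test for the contingency table (the text does not state which test statistic was applied), the function names are hypothetical, and the counts in the example are invented for illustration and are not values from Table 1.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # when the row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over every table that is no more probable than the observed one.
    p_value = sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


def mcdonald_kreitman(dn, ds, pn, ps):
    """Compare fixed (dn, ds) with polymorphic (pn, ps) counts.

    dn, ds: replacement and synonymous fixed differences on one lineage.
    pn, ps: replacement and synonymous polymorphisms within the species.
    Returns (p_value, excess_replacement_fixations).
    """
    p_value = fisher_exact_two_sided(dn, ds, pn, ps)
    # Cross-multiplied form of dn/ds > pn/ps, which avoids dividing by zero.
    excess_replacement_fixations = dn * ps > ds * pn
    return p_value, excess_replacement_fixations


if __name__ == "__main__":
    # Invented counts, for illustration only.
    print(mcdonald_kreitman(dn=20, ds=15, pn=5, ps=25))
```

With these invented counts the ratio of replacement to synonymous changes is higher among fixations than among polymorphisms, so the function reports an excess of replacement fixations together with a p-value well below 0.05. When polymorphism is scarce, as described for mle, the same table has little power to reject neutrality whatever the true pattern.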
We used finer-scale analysis to gain biological insight into particular protein functions that might be under selection (Table 2). No known protein domains of MOF or MSL2 exhibited evidence of rapid evolution; in fact, the nonsynonymous divergence for these domains was relatively low compared to genomic averages (Begun et al. 2007). However, the MSL1 glycine-rich domain, which is thought to be involved in MSL1-MSL1 interactions (Li et al. 2005), as well as the MSL1 X chromosome-binding domain, have unusually high dN. The MSL1 leucine zipper/apolar/coiled-coil domain, essential for MSL2 binding as well as X chromosome localization (Li et al. 2005), and the polar region of MSL3, which is thought to cooperate with the chromatin-binding domain of MSL3 (Buscaino et al. 2006), also had elevated dN.
The four proteins at which adaptive evolution was inferred in D. melanogaster interact directly with at least one other member of the complex, which suggests the possibility of a history of coevolutionary interactions within this single complex; recurrent directional selection on one or more members of the complex may drive compensatory evolution in other members. Protein domains known to be involved in protein self-association, localization to the X chromosome-bound complex, intraprotein interactions, and interprotein interactions were found to be evolving particularly rapidly. These findings are consistent with the hypothesis of within-complex and even within-protein coevolutionary interactions and suggest protein domains where adaptive evolution is concentrated. These domains are excellent candidates for investigating DCC functional divergence between D. melanogaster and D. simulans.
The nature of DCC functional divergence between these sister species depends on the biological or molecular function targeted by directional selection. Adaptive protein evolution at msl1, msl2, msl3, and mof may result from selection on DCC function itself or on a dosage-independent function carried out either by a DCC protein or by an adaptively evolving non-DCC protein that interacts with a dosage compensation protein. For example, the nuclear pore protein NUP153 interacts with MSL3 and MOF (Mendjan et al. 2006) and exhibits patterns of adaptive protein evolution along both the D. melanogaster and D. simulans lineages (Presgraves and Stephan 2007). However, given the observation that NUP153 has been under directional selection in both D. melanogaster and D. simulans, it is not clear why the putative coevolutionary interactions among the core DCC proteins would be restricted to the D. melanogaster lineage. Indeed, lineage specificity must characterize whatever biological or molecular process is driving adaptive evolution at the DCC.
An alternative hypothesis for explaining D. melanogaster-specific adaptive protein evolution at several core DCC loci is a host-pathogen interaction. The male-killing bacterium Spiroplasma has been found in several natural populations of D. melanogaster that occur near the equator (Montenegro et al. 2000; Pool et al. 2006). Veneti et al. (2005) showed that Spiroplasma-infected D. melanogaster males carrying a loss-of-function mutation in any one of the five core DCC components survived to the third larval instar stage, while similarly infected males that carried no DCC mutations died as embryos. This result suggests that the deleterious effect of Spiroplasma on male fitness may be mediated through interactions with the dosage compensation complex and that a host-pathogen arms race localized to the DCC proteins may drive recurrent adaptive protein evolution. Interestingly, Spiroplasma has yet to be identified in D. simulans. Evidence that DCC adaptive protein evolution is more prevalent in Drosophila species with histories of Spiroplasma infections in nature would support this host-pathogen hypothesis.
Sequence data for this article have been submitted to GenBank under accession nos. EU167087-EU167150.
a Ten inbred lines of D. melanogaster (North Carolina, T. Mackay) were sequenced directly to obtain polymorphism data from each of the five core protein-coding loci in the DCC. For D. simulans, sequences were obtained from a population genomic data set, which consists of light-shotgun sequencing (one to two times coverage) of six lines syntenically aligned to the D. melanogaster v4 assembly, and were used for population data for all five DCC genes (Begun et al. 2007). These D. simulans data were subjected to extensive quality control (http://www.dpgp.org). Levels of nucleotide diversity (π) were estimated as in Nei (1987) and Weir (1990). The numbers of silent and replacement sites were estimated using the method of Nei and Gojobori (1986). The pathway between two codons was calculated as the average number of silent and replacement changes from all possible paths between the pair. Lineage-specific divergence was estimated by maximum likelihood using PAML v3.14 (Yang 1997). PAML was run in batch mode using a BioPerl wrapper (Stajich et al. 2002) for codeml with codon frequencies estimated from the data.
b To test for adaptive protein evolution, we used the McDonald-Kreitman test (McDonald and Kreitman 1991), which determines whether the number of fixations relative to polymorphisms for synonymous and nonsynonymous sites deviates from neutral expectations. We used parsimony to infer whether a fixation occurred on the D. melanogaster or the D. simulans branch, restricting our attention to codons that varied at a single position among the three species. The syntenic alignment of D. yakuba was used to polarize changes to the D. melanogaster or D. simulans branch (Begun et al. 2007). For msl3 and mof, which had borderline significant McDonald-Kreitman tests for the D. simulans lineage, 10 additional inbred lines were sequenced (Winters, CA, S. Nuzhdin).
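The parsimony polarization described for the McDonald-Kreitman test, in which D. yakuba is used to assign each fixed difference to the D. melanogaster or the D. simulans branch, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used for the analysis: the function names are hypothetical, each species is represented by a single fixed codon per aligned position (so polymorphic codons are assumed to have been set aside beforehand), and the standard nuclear genetic code is assumed.

```python
from collections import Counter
from itertools import product

# Standard genetic code, built in the conventional TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def polarize_fixed_difference(mel, sim, yak):
    """Assign one fixed codon difference to a branch by parsimony.

    mel, sim, yak: the codon fixed in D. melanogaster, D. simulans, and the
    outgroup D. yakuba at one aligned position.
    Returns (branch, kind), or None when the codon is not informative.
    """
    if mel == sim:
        return None  # no fixed difference between the two ingroup species
    # Keep only codons that vary at a single position among the three species.
    varying = [i for i in range(3) if len({mel[i], sim[i], yak[i]}) > 1]
    if len(varying) != 1:
        return None
    if yak == sim:
        branch, derived, ancestral = "melanogaster", mel, sim
    elif yak == mel:
        branch, derived, ancestral = "simulans", sim, mel
    else:
        return None  # three different states: parsimony cannot polarize
    same_amino_acid = CODON_TABLE[derived] == CODON_TABLE[ancestral]
    return branch, "synonymous" if same_amino_acid else "replacement"


def count_fixations(codon_triples):
    """Tally polarized fixations over (mel, sim, yak) codon triples."""
    tally = Counter()
    for mel, sim, yak in codon_triples:
        result = polarize_fixed_difference(mel, sim, yak)
        if result is not None:
            tally[result] += 1
    return tally
```

Calling `count_fixations` on the aligned codons of a gene yields, for each branch, the replacement and synonymous fixation counts that form one row of a McDonald-Kreitman table. Codons with changes at two or more positions, or with a third state in the outgroup, are dropped rather than guessed at, which mirrors the restriction to codons varying at a single position.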