Drosophila melanogaster males transfer seminal fluid proteins along with sperm during mating. Among these proteins, ACPs (Accessory gland proteins) from the male's accessory gland induce behavioral and physiological changes and a reduction in life span in mated females and mediate sperm storage and utilization. A previous evolutionary EST screen in D. simulans identified partial cDNAs for 57 new candidate ACPs. Here we report the annotation and confirmation of the corresponding Acp genes in D. melanogaster. Of 57 new candidate Acp genes previously reported in D. melanogaster, 34 conform to our more stringent criteria for encoding putative male accessory gland extracellular proteins, thus bringing the total number of ACPs identified to 52 (34 plus 18 previously identified). This comprehensive set of Acp genes allows us to dissect the patterns of evolutionary change in a suite of proteins from a single male-specific reproductive tissue. We used sequence-based analysis to examine codon bias, gene duplications, and levels of divergence (via dN/dS values and ortholog detection) of the 52 D. melanogaster ACPs in D. simulans, D. yakuba, and D. pseudoobscura. We show that 58% of the 52 D. melanogaster Acp genes are detectable in D. pseudoobscura. Sequence comparisons of ACPs shared and not shared between D. melanogaster and D. pseudoobscura show that there are separate classes undergoing distinctly dissimilar evolutionary dynamics.
ACCESSORY gland proteins (ACPs) induce a variety of physiological, behavioral, and reproductive changes when transferred to the female. Between 25 and 150 ACPs were initially thought to be transferred to the female during mating (Ingman-Baker and Candido 1980; Schmidt et al. 1985; Whalen and Wilson 1986; Coulthart and Singh 1988; Wolfner et al. 1997). Males lacking ACPs have impaired fertility, indicating that ACPs perform important reproductive functions (Kalb et al. 1993; Xue and Noll 2000). Specifically, ACPs cause females to increase their egg-production, egg-laying, and ovulation rates, decrease their propensity to remate, and store and utilize sperm (reviewed in Wolfner 2002; Chapman and Davies 2004). ACPs also participate in formation of the mating plug (Lung and Wolfner 2001) and mediate a decrease in the mated female's life span (Chapman et al. 1995). Genetic analyses have revealed the functions of four ACPs thus far. Acp26Aa (ovulin) is a prohormone that triggers an increase in ovulation rate (Herndon and Wolfner 1995; Heifetz et al. 2000). Acp36DE is a glycoprotein that is essential for sperm storage (Neubaum and Wolfner 1999), by regulating sperm accumulation into storage (Bloch Qazi and Wolfner 2003). Acp70A (sex peptide) induces egg laying and decreases females' receptivity to remating; it also contributes to the cost of mating to females (Chen et al. 1988; Aigaki et al. 1991; Chapman et al. 2003; Liu and Kubli 2003; Wigby and Chapman 2005). Acp62F is a trypsin protease inhibitor that localizes to the sperm storage organs of mated females and has been suggested to preserve sperm viability (Lung et al. 2002). Acp62F also enters the female's circulation and is toxic to flies upon repeated ectopic expression, suggesting a possible role in the life span cost of mating (Lung et al. 2002). In addition, the transfer of antimicrobial ACPs to the female (Lung et al. 2001) and the Acp-induced upregulation of antimicrobial peptides in mated females (Lawniczak and Begun 2004; McGraw et al. 2004) suggest that ACPs may contribute to a female's immune defense. Altogether, ACPs appear to participate in a complex set of interactions by competing/cooperating with seminal fluid proteins of other males (Clark et al. 1995; Clark et al. 1999; Prout and Clark 1996; Snook and Hosken 2004), receptors present in the female or on sperm, and pathogens. To better understand this diverse set of interactions of ACPs it is important to fully characterize the ACPs involved and examine their evolutionary dynamics.
Initially, 18 Drosophila melanogaster ACPs had been identified from multiple screens (Chen et al. 1988; Simmerl et al. 1995; Wolfner et al. 1997); however, this was far below the predicted 25–150 ACPs (Ingman-Baker and Candido 1980; Schmidt et al. 1985; Whalen and Wilson 1986; Civetta and Singh 1995; Wolfner et al. 1997). In an extensive screen (Swanson et al. 2001a), 57 new candidate ACPs were identified from partial gene sequencing of ESTs obtained from a D. simulans accessory gland cDNA library. These 57 candidate ACPs, plus the 18 previously identified, led to 75 putative ACPs. Statistical analysis of the frequency of multiple isolates predicted that these genes represented ∼90% of the total number of Acp genes (Swanson et al. 2001a). The Swanson et al. (2001a) EST screen identified ACPs from partial gene sequencing and from a species in which genetic analysis is not routine, D. simulans. Because it is important to obtain the complete sequence of these genes in a species in which genetic analysis is possible, we obtained and report here the D. melanogaster orthologs of the 57 D. simulans Acp candidates. Our RT-PCR and bioinformatic analyses determined that 34 of the candidate 57 ACPs identified by Swanson et al. (2001a) have sequences suggestive of encoding extracellular proteins and expression patterns suggestive of encoding ACPs. This resets the total number of D. melanogaster ACPs identified to 52 (34 plus 18 previously identified).
An unusually high fraction of the genes encoding ACPs show signs of positive selection (Aguadé et al. 1992; Cirera and Aguadé 1997; Tsaur and Wu 1997; Aguadé 1999; Begun et al. 2000; Panhuis et al. 2003; Kern et al. 2004; Kohn et al. 2004; Stevison et al. 2004). ACPs, as a class, evolve at about twice the rate of nonreproductive proteins (Whalen and Wilson 1986; Civetta and Singh 1995; Swanson et al. 2001a). Swanson et al. (2001a) found that ∼11% of the partially sequenced ESTs that they identified have an excess of nonsynonymous over synonymous nucleotide changes, suggesting that divergence of these genes is being accelerated by positive selection. Three selective forces are predicted to drive the generation of sequence diversity of ACPs: female sperm preference (Eberhard and Cordero 1995), sperm competition (Clark et al. 1995), and sexual conflict (Rice 1996). Previous evolutionary analyses of ACPs focused on some of the initially identified 18 ACPs (Aguadé et al. 1992; Cirera and Aguadé 1997; Tsaur and Wu 1997; Aguadé 1999; Begun et al. 2000; Kern et al. 2004). Here we present a detailed examination of the molecular evolution of the entire set of stringently selected and annotated 52 ACPs. We performed sequence-based comparisons of these D. melanogaster ACPs with their orthologs in three Drosophila species (D. simulans, D. yakuba, and D. pseudoobscura). This allowed us to determine levels of codon bias, rates of gene duplication, and levels of sequence divergence among three members of the D. melanogaster subgroup (D. melanogaster, D. simulans, and D. yakuba) and, via ortholog detection, which ACPs are conserved between D. melanogaster and D. pseudoobscura. These evolutionary analyses demonstrate that ACPs represent a combination of divergent and conserved proteins that undergo different patterns of sequence evolution.
MATERIALS AND METHODS
Annotation of D. melanogaster orthologs of D. simulans Acp-ESTs:
We sequenced D. simulans Acp ESTs (Swanson et al. 2001a) from their 3′-ends to determine the translational stop position. This, in combination with previously sequenced 5′-end sequences (Swanson et al. 2001a), provided each candidate ACP's complete ORF. The complete EST sequences can be found under GenBank accession nos. DQ088689–DQ088699 and DQ079991–DQ079998. These D. simulans EST sequences were subsequently aligned using Sequencher 4.0.5 (Gene Codes) to the D. melanogaster genome (Release 4.0) (Celniker et al. 2002) to identify their D. melanogaster orthologs. Each translational start was located by the presence of sequences encoding a predicted signal peptide, either from the Berkeley Drosophila Genome Project (BDGP) D. melanogaster annotation or via manual inspection if the BDGP annotation did not match the D. simulans EST. Manual searches for predicted signal peptides consisted of scanning ∼1.5 kb of noncoding upstream D. melanogaster sequence from the 5′-end of the D. simulans EST or BDGP predicted translational start site. Predicted signal peptides were identified using SignalP (Nielsen et al. 1997).
Candidate Acp genes were then examined for accessory gland-specific expression. Eighteen previously identified ACPs (Chen et al. 1988; Simmerl et al. 1995; Wolfner et al. 1997) were already known to show accessory gland-predominant or -exclusive expression. We searched each of the 57 new candidate ACPs identified by Swanson et al. (2001a) against the D. melanogaster BDGP EST database (http://www.fruitfly.org/EST/EST.shtml) to see if the gene was expressed in other tissues (e.g., head, embryo, tissue culture). Occasionally adult testis ESTs (Rubin et al. 2000a) included our Acp candidates. For example, Acp36DE [a highly expressed Acp (Wolfner et al. 1997)] has 32 testes EST hits, but has been shown by Western blots to be an accessory gland-specific protein (Bertram et al. 1996; Wolfner et al. 1997). These differences may result from low-level contamination by accessory gland fragments or cells in the large-scale testes preparations for the EST project or may indicate that Acp36DE is transcribed, but not translated, in the testes. Consistent with such models, all 15 Acp antibodies thus far generated detect exclusively accessory gland-specific proteins, even though 9 of 15 genes [CG8982 (Acp26Aa), CG4605 (Acp32CD), CG7157 (Acp36DE), CG6289, CG8137, CG9334, CG17575, CG1656, CG9029] (Monsma et al. 1990; Coleman et al. 1995; Bertram et al. 1996; Lung et al. 2002; Ravi Ram et al. 2005) have testis EST hits (Andrews et al. 2000; Rubin et al. 2000a; Parisi et al. 2003).
Of the 57 Acp candidates previously selected by Swanson et al. (2001a), 7 were eliminated from further study because mutational analysis (per FlyBase, http://flybase.net/) indicates that their phenotypes affect nonreproductive processes. Sixteen further candidates were removed because they either had EST hits in multiple nonreproductive tissue types or could not be annotated, thus leaving 34 candidate ACPs (see http://www.genetics.org/supplemental/ for list of ACPs removed from the previous 57 candidates). It is important to note that secreted proteins found in other tissues, or whose mutants have additional nonreproductive phenotypes, may be present in accessory gland secretions. However, we focused on accessory gland-specific candidates since the evolutionary pressures and functions of these genes should be more comprehensible than those of genes expressed in multiple tissues and thus likely having multiple functions.
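The elimination steps above amount to a filter followed by a tally. The following minimal Python sketch is illustrative only: the `Candidate` fields and the `passes_screen` helper are hypothetical names that are not part of the original analysis, and no real gene records are encoded. Only the counts (57, 7, 16, 34, 18, and 52) are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one candidate Acp gene."""
    name: str
    nonreproductive_phenotype: bool = False  # mutant phenotype outside reproduction
    multi_tissue_est_hits: bool = False      # EST hits in several nonreproductive tissues
    annotatable: bool = True                 # a gene model could be built


def passes_screen(candidate: Candidate) -> bool:
    """Keep a candidate only if none of the exclusion criteria apply."""
    return (
        not candidate.nonreproductive_phenotype
        and not candidate.multi_tissue_est_hits
        and candidate.annotatable
    )


# Tally reported in the text.
TOTAL_CANDIDATES = 57
REMOVED_FOR_PHENOTYPE = 7
REMOVED_FOR_EST_HITS_OR_NO_ANNOTATION = 16
PREVIOUSLY_KNOWN = 18

retained = TOTAL_CANDIDATES - REMOVED_FOR_PHENOTYPE - REMOVED_FOR_EST_HITS_OR_NO_ANNOTATION
collection_size = retained + PREVIOUSLY_KNOWN
```

Here `retained` evaluates to 34 and `collection_size` to 52, matching the numbers given in the text.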
This selection process resulted in a collection of 52 ACPs (Table 1). Twelve (CG1262 (Acp62F), CG4986, CG6069, CG10284, CG10956, CG11598, CG14034, CG17097, BG642378, BG642312, BG642167, and BG642163) either were not identified in or have different ORFs from those predicted in the D. melanogaster genome sequence (Release 4.0) (Celniker et al. 2002). We may have identified alternative splice forms of the predicted genes. An example is CG10956, whose Release 4.0 annotation predicts a single exon, while our annotation has identified a second exon at the 3′-end. Our annotation may have also revealed species-specific differences, since the EST library was constructed from D. simulans, or differences with the D. melanogaster annotation (Release 4.0) (Celniker et al. 2002).
We revised the current D. melanogaster (Release 4.0) annotation (Celniker et al. 2002) of the translational start sites for both CG4986 and CG10956; the splicing patterns for CG1262 (Acp62F), CG11598, CG6069, CG10284, and CG17097; and the translational start, translational stop, and splicing pattern for CG14034. Four Acp D. simulans ESTs (BG642378, BG642312, BG642167, and BG642163, Swanson et al. 2001a) likely represent real genes but remain unannotated in the current D. melanogaster genome Release 4.0 (Celniker et al. 2002). All genes unidentified and/or misannotated in the D. melanogaster (Release 4.0) genome annotation were submitted to GenBank under accession nos. BK005692–BK005702.
Confirmation of D. melanogaster annotations:
RT-PCR of full coding regions in D. melanogaster was performed from RNA isolated from whole, 3-day-old adult virgin Canton-S males. Approximately 30 flies were homogenized in Trizol according to the manufacturer's instructions (GIBCO, Bethesda, MD) and total RNA was prepared for RT-PCR as in Carninci and Hayashizaki (1999). Full-length coding regions were amplified using primers designed from our annotations, which verified the annotations and expression. All amplified products were PCR-purified, cloned into pENTR-DTopo or pDONR-201 vectors (Invitrogen), and sequenced by the Biotechnology Resource Center at Cornell using the vector's internal primers. ACPs that could not be recovered by RT-PCR from whole adult male Canton-S cDNA were amplified from available EST clones (Rubin et al. 2000a) and subsequently cloned as above. Incomplete sequence information for Acp53Eb and a very short coding sequence for CG31056 (Acp98AB) (Wolfner et al. 1997) did not allow cloning into pENTR-DTopo or pDONR-201 vectors. Complete coding, amino acid, and primer sequences for each of the 34 new ACPs can be found in the supplemental materials (http://www.genetics.org/supplemental/).
D. melanogaster Acp sequence analysis:
Codon bias was measured by both the frequency of optimal codons (Fop) and the percentage G/C content in the third codon position (G/C3rd) (Moriyama and Powell 1997). Fop values range from 0.33 to 1, where 0.33 indicates homogeneous codon usage and 1 indicates that only optimal codons are used. Fop, G/C3rd, and gene GC content calculations were performed using the codonw program (http://www.molbiol.ox.ac.uk/cu/). Codon bias values (see http://www.genetics.org/supplemental/) were calculated using the D. melanogaster codon frequency table settings of the codonw program. Previous codon bias analyses of CG32952 (Acp33A) in D. melanogaster to D. simulans comparisons have combined its two ORFs (CG32952-A and CG32952-B) (Begun et al. 2000); however, since each ORF contains its own predicted signal sequence, we analyzed them as two separate genes.
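To make the two codon bias measures concrete, the following Python sketch computes a simple Fop and a simple third-position G/C fraction for a coding sequence. It is an illustration under stated assumptions, not a reimplementation of codonw: the function names are hypothetical, the set of optimal codons must be supplied by the caller (the one-codon set in the usage note is a toy placeholder, not the D. melanogaster table), and the G/C fraction is taken over all codons, which may differ from codonw's own definition.

```python
# Codons that carry no information about synonymous codon choice:
# Met and Trp have a single codon each, and stop codons are not scored.
NON_INFORMATIVE = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def split_codons(cds: str) -> list[str]:
    """Split an in-frame coding sequence into codons."""
    cds = cds.upper()
    if not cds or len(cds) % 3 != 0:
        raise ValueError("coding sequence must be non-empty and a multiple of 3 long")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def fop(cds: str, optimal_codons: set[str]) -> float:
    """Fraction of informative codons that belong to the optimal set."""
    informative = [c for c in split_codons(cds) if c not in NON_INFORMATIVE]
    if not informative:
        raise ValueError("no informative codons in sequence")
    return sum(c in optimal_codons for c in informative) / len(informative)


def gc3(cds: str) -> float:
    """Fraction of codons with G or C at the third position (all codons counted)."""
    codons = split_codons(cds)
    return sum(c[2] in "GC" for c in codons) / len(codons)
```

For the toy sequence `ATGGCCGCTTGGTAA` with `{"GCC"}` passed as the optimal set, only GCC and GCT are informative, so `fop` returns 0.5; three of the five codons end in G or C, so `gc3` returns 0.6.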
For comparison, we generated a random sample of 100 D. melanogaster genes showing twofold higher expression in testes vs. ovaries from the Parisi et al. (2004) microarray data set. Additionally, a random sample of 150 D. melanogaster genes with approximately the same gene lengths as ACPs (Acp mean gene nucleotide length, 994.7; random gene nucleotide length, 957.6) was obtained from BDGP (http://www.fruitfly.org/sequence/dlMisc.shtml).
Sequence comparisons and chromosomal location were used together to identify gene duplicates. Individual Acp protein sequences were compared to the D. melanogaster genome using BlastP. Acp gene duplicate candidates were considered if they had a conservative E-value of 1e-10 and a minimum of 30% sequence identity across ≥80% of the protein (Gu et al. 2002). Because many gene duplicates often are found in tandem (Friedman and Hughes 2003) we extended our search to locate significant matches falling within neighboring Acp genes that did not meet the >30% sequence identity cutoff. If such a hit was present, we checked for a similar protein domain prediction (http://www-cryst.bioc.cam.ac.uk/~fugue/prfsearch.html) and conserved splicing pattern to support its being a possible duplicate. Candidates that both had a BlastP E-value of 1e-10 or smaller and matched all three sequence search criteria were also considered gene duplicates, even though their sequence identity may be <30%. Gene duplication conservation in D. simulans and D. yakuba was searched via tBlastN to their whole-genome alignments (WashU-GSC http://genome.wustl.edu/tools/blast/).
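The duplicate-calling rule described above can be written as a small decision function. The Python sketch below is one reading of that rule, not the authors' code: the `Hit` record and its field names are hypothetical, and the three supporting criteria for the relaxed rule are interpreted here as neighboring location, similar domain prediction, and conserved splicing pattern.

```python
from dataclasses import dataclass

E_VALUE_CUTOFF = 1e-10   # BlastP E-value threshold from the text
MIN_IDENTITY = 30.0      # percent sequence identity
MIN_COVERAGE = 80.0      # percent of the protein covered by the alignment


@dataclass
class Hit:
    """Hypothetical summary of one BlastP hit against the genome."""
    evalue: float
    percent_identity: float
    percent_coverage: float
    neighbors_acp: bool = False       # hit lies next to the query Acp gene
    similar_domain: bool = False      # same predicted protein domain
    conserved_splicing: bool = False  # same relative splice site positions


def is_duplicate_candidate(hit: Hit) -> bool:
    """Apply the strict rule first, then the relaxed rule for tandem hits."""
    if hit.evalue > E_VALUE_CUTOFF:
        return False
    strict = (
        hit.percent_identity >= MIN_IDENTITY
        and hit.percent_coverage >= MIN_COVERAGE
    )
    relaxed = hit.neighbors_acp and hit.similar_domain and hit.conserved_splicing
    return strict or relaxed
```

Under this reading, a tandem hit with 25% identity still qualifies when all three supporting criteria hold, whereas a hit with the same identity and no supporting evidence does not.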
Calculation of the expected number of ACPs in the D. melanogaster genome:
Two estimates of the total number of Acp genes in the D. melanogaster genome were performed as in Swanson et al. (2001a) by using maximum-likelihood fits to a truncated Poisson distribution. A third estimate was obtained by nonparametric maximum likelihood. The first two predictions differ with respect to how they deal with 5 ACPs (Acp26Aa, Acp26Ab, Acp32CD, Acp33A, Acp36DE) that were not adequately prescreened by Swanson et al. (and hence appeared in the postscreening library). In the first estimate, we ignore the 5 ACPs prescreened by Swanson et al. (2001a) and fit a truncated Poisson distribution to the frequency spectrum (counts of singleton hits, doubleton hits, etc.). This gives a maximum-likelihood count of 52 ACPs in addition to the 18 that were prescreened by Swanson et al. (2001a), for a total of 70. For the second estimate, we include the 5 Acp hit counts as though they were not prescreened at all, and we obtain a maximum-likelihood count of 59 ACPs. If the 13 ACPs that were successfully prescreened (or at least were not observed among the sequenced clones) are added back to the estimate of 59 ACPs, this yields a prediction of 72 Acp genes in the D. melanogaster genome. The third method was designed for an unscreened library, and fits the data to a Poisson mixture model by nonparametric maximum likelihood (Ji-Ping Wang, personal communication). The perl script eststat.pl (available at http://www.floralgenome.org/cgi-bin/eststat/eststat.cgi) took the frequency spectrum of EST hits and produced an estimate of the total count of distinct ACPs in the library at 106. This figure may be considered as an upper bound because of the prescreening that was applied to the library, leaving a more uniform frequency distribution than would be found in an unscreened library.
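For readers unfamiliar with the truncated Poisson approach, the Python sketch below shows one standard way to carry it out; it is not the code used for the estimates reported here. It assumes that each gene is hit a Poisson-distributed number of times with a common rate, that genes with zero hits are unobserved, and that the total is estimated as the observed count divided by the probability of at least one hit. The function name and the frequency spectrum in the usage note are hypothetical.

```python
from math import expm1


def fit_truncated_poisson(spectrum: dict[int, int]) -> tuple[float, float]:
    """Fit a zero-truncated Poisson to a frequency spectrum.

    `spectrum` maps a hit count k (k >= 1) to the number of genes seen k times.
    Returns (rate, estimated_total_genes).
    """
    n_observed = sum(spectrum.values())
    total_hits = sum(k * n for k, n in spectrum.items())
    mean_hits = total_hits / n_observed
    if mean_hits <= 1.0:
        raise ValueError("all genes are singletons; the rate cannot be estimated")

    def truncated_mean(rate: float) -> float:
        # Mean of a zero-truncated Poisson: rate / (1 - exp(-rate)).
        return rate / -expm1(-rate)

    # The maximum-likelihood rate solves truncated_mean(rate) == mean_hits.
    # truncated_mean is increasing and exceeds its argument, so the root
    # lies between 0 and mean_hits; locate it by bisection.
    low, high = 1e-9, mean_hits
    for _ in range(200):
        mid = (low + high) / 2
        if truncated_mean(mid) < mean_hits:
            low = mid
        else:
            high = mid
    rate = (low + high) / 2

    prob_seen = -expm1(-rate)  # probability that a gene is hit at least once
    return rate, n_observed / prob_seen
```

As a made-up example, a spectrum of 10 singletons, 5 doubletons, and 2 tripletons (17 observed genes, 26 hits) gives a rate of about 0.92 and an estimated total of about 28 genes, that is, roughly 11 genes not yet sampled.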
D. simulans and D. yakuba sequence comparisons to D. melanogaster ACPs:
Nonsynonymous substitutions per nonsynonymous site (dN) and synonymous substitutions per synonymous site (dS) values for some previously characterized ACPs (Aguadé et al. 1992; Cirera and Aguadé 1997; Aguadé 1999; Begun et al. 2000; Kohn et al. 2004) were incorporated into this analysis. D. yakuba sequences were retrieved via BlastN alignment outputs of the D. melanogaster ACPs to the D. yakuba genome (WashU-GSC http://genome.wustl.edu/blast/). D. simulans and D. yakuba coding regions (see http://www.genetics.org/supplemental/) were aligned to the D. melanogaster coding regions with ClustalX (Thompson et al. 1997). dN and dS values were calculated using DNASP 4.0 (Rozas et al. 2003). In a few cases, partial gene sequences were used. In a single D. yakuba case, CG32952-B, an adenine to cytosine change disrupted the apparent start codon. No other plausible ATG could be identified upstream of CG32952-B to compensate for this difference and CG32952-B was thus omitted from D. melanogaster to D. yakuba comparisons, although rare CUG start codons do exist (Prats et al. 1989). D. simulans and D. yakuba codon bias values (see http://www.genetics.org/supplemental/) were calculated as above. The D. yakuba non-Acp data set was obtained from a set of non-sex-specific transcripts (Domazet-Loso and Tautz 2003). The StatView statistical program (version 5.0.1; SAS Institute) was used for statistical analyses.
Detection of Acp orthologs in D. pseudoobscura:
The whole-genome alignment (WGA) of the D. melanogaster and D. pseudoobscura genomes (Richards et al. 2005) was taken from Emberly et al. (2003). The SMASH program (Zavolan et al. 2003) was used to find the strongest set of syntenic anchors between the D. pseudoobscura contigs and the D. melanogaster genome. Anchors were high-similarity regions tens to hundreds of base pairs long and covered ∼30% of the genome. The LAGAN program (Brudno et al. 2003) gave similar alignments. Since the size of syntenic domains between the two species generally exceeds 10 kb (i.e., much larger than most repeat elements within the sequenced euchromatin), using synteny eliminated almost all ambiguities due to repeats. The SMASH blocks along with the contigs they matched were displayed on top of the Release 3 annotation (Celniker et al. 2002) using GBROWSE (http://www.gmod.org/ggb/index.shtml).
We then examined the syntenic regions for each Acp individually at the sequence level. In 48 of 51 cases (51 ACPs instead of 52 were compared because Acp53Eb's sequence information has yet to be determined), SMASH blocks from a single contig either bracketed or “hit” the annotated gene in D. melanogaster. SMASH blocks from a single D. pseudoobscura contig that span a given Acp locus indicate that the Acp genomic region in question can be aligned at the sequence level. For CG31872, a contiguous D. pseudoobscura sequence could not be aligned because the Acp gene fell into a gap between two contigs. Two other cases, CG14560 and CG9074, contained SMASH block hits to multiple contigs that differed from the contig spanning this region. The coding sequences of CG14560 and CG9074 were then submitted to Repeatmasker (http://www.repeatmasker.org/), which indicated that both ACPs contained repetitive regions, thus explaining the multiple SMASH block contig hits. After filtering out the repetitive regions for CG14560 and CG9074, we could generate a single contig that bracketed each gene. Upon verification of the D. melanogaster to D. pseudoobscura contig alignments of the ACPs, we retrieved the corresponding D. pseudoobscura sequence within the aligned contig and searched the D. pseudoobscura contig sequence via tBlastN using the D. melanogaster protein sequence. If coding sequence alignments could not be identified, we used GENSCAN (Burge and Karlin 1997) and Genie (Reese et al. 2000) to locate possible ORFs. All ACPs for which coding sequence alignments could be generated with the corresponding D. pseudoobscura contig region are considered true orthologs (Table 1). The SMASH block-based coding sequence alignments were confirmed using another more recent D. pseudoobscura WGA (Karolchik et al. 2003). D. pseudoobscura coding sequences of conserved ACPs and D. melanogaster to D. pseudoobscura contig alignments for absent or undetectable ACPs can be found in the web supplement (http://www.genetics.org/supplemental/). It is important to note that even though we define conserved ACPs between D. melanogaster and D. pseudoobscura as true orthologs, we have not determined whether these ACPs have maintained their accessory gland expression in D. pseudoobscura.
D. melanogaster ACPs that could not be detected within the retrieved D. pseudoobscura contig were searched via tBlastN to the D. pseudoobscura genome, via the Baylor College of Medicine Drosophila Genome project website (http://www.hgsc.bcm.tmc.edu/projects/drosophila/). For tBlastN searches, only hits with an E-value of 1e-04 (Zdobnov et al. 2002) or smaller were considered significant. Whenever a significant tBlastN hit in D. pseudoobscura was identified, the corresponding D. pseudoobscura sequence was then return searched against the D. melanogaster genome (http://www.flybase.net) via BlastP to determine whether it hit the Acp in question or a protein within a similar sequence/structure-function class. In all cases significant D. pseudoobscura tBlastN hits were false positives [e.g., D. melanogaster ACPs CG8137 (serpin) and CG9334 (serpin) both hit the D. pseudoobscura ortholog of CG9456 (serpin)]. Alignments and “false-positive D. melanogaster genes” for ACPs whose true ortholog could not be detected via WGA, yet have a significant tBlastN hit in D. pseudoobscura whose return D. melanogaster BlastP does not match an Acp, can be found in the supplemental materials (http://www.genetics.org/supplemental/).
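The forward and return searches described above follow a simple decision rule, sketched here in Python. This is an illustration only: the function name, the label strings, and the forward E-value in the usage note are hypothetical, and the return search is reduced to the identifier of its best D. melanogaster hit.

```python
TBLASTN_EVALUE_CUTOFF = 1e-4  # significance threshold from the text


def classify_hit(acp_id: str, forward_evalue: float, return_best_hit: str) -> str:
    """Classify a D. pseudoobscura tBlastN hit for one D. melanogaster Acp.

    `forward_evalue` is the E-value of the tBlastN hit in D. pseudoobscura;
    `return_best_hit` is the D. melanogaster gene recovered when that hit is
    searched back against the D. melanogaster genome with BlastP.
    """
    if forward_evalue > TBLASTN_EVALUE_CUTOFF:
        return "not significant"
    if return_best_hit == acp_id:
        return "reciprocal match"
    return "false positive"
```

For example, `classify_hit("CG8137", 1e-20, "CG9456")` returns `"false positive"`, which mirrors the serpin case in the text, where the significant hit for CG8137 is the D. pseudoobscura ortholog of CG9456 rather than of CG8137 itself.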
RESULTS AND DISCUSSION
D. melanogaster Acp genes:
Secreted proteins synthesized by the D. melanogaster male accessory gland have important functions in reproduction (reviewed in Wolfner 2002; Kubli 2003; Chapman and Davies 2004). To address more thoroughly the functions and evolution of these Acp proteins, we carried out a comprehensive identification and annotation of D. melanogaster Acp genes. Prior to 2001, 18 Acp genes had been reported in D. melanogaster (Chen et al. 1988; Simmerl et al. 1995; Wolfner et al. 1997). In 2001, Swanson et al. (2001a) identified 57 additional candidate Acp genes in D. simulans via an evolutionary EST approach that was performed to permit a rapid scan to identify genes with features suggesting rapid evolution. However, the ESTs identified by Swanson et al. (2001a) were partial cDNAs and from a species, D. simulans, which is presently less amenable to genetic analyses than is D. melanogaster. We therefore full-length sequenced a select set of the D. simulans EST sequences identified by Swanson et al. (2001a). The full-length D. simulans Acp EST sequences allowed us to identify the complete ORF of their D. melanogaster orthologs. We then applied a more stringent set of criteria to identify those genes on which to focus, based on what is known of the initial 18 ACPs. We define bona fide ACPs here as genes that: (a) encode a protein with a predicted secretion signal sequence, (b) have a pattern of EST hits in other tissue- or cell-type-specific EST screens consistent with accessory gland predominant expression, (c) have no previously characterized non-Acp function, and (d) show male and/or accessory gland predominant expression in D. melanogaster. Using these stringent criteria we utilized secretion signal prediction programs, EST databases, reports of mutant phenotypes, and RT-PCR to screen through the 57 candidate ACPs identified by Swanson et al. (2001a) (see materials and methods for details). Thirty-four ACPs fit these new stringent criteria (see Table 1). 
The other 23 genes identified by Swanson et al. (2001a) could encode proteins made in accessory glands and potentially also transferred to females, but their additional tissues of expression and/or nonreproductive functions complicate genetic and functional analyses and evolutionary interpretations; thus, we do not consider them further. It is also formally possible that the expression characteristics of some of these 23 ACPs differ in D. melanogaster and D. simulans, resulting in their exclusion from the stringently selected D. melanogaster ACPs on which we focus. The 34 stringently selected D. melanogaster ACPs that fit the above criteria, in combination with the 18 previously known ACPs, make a total of 52 D. melanogaster ACPs (Table 1) whose gene boundaries and expression have been confirmed. This comprehensive and characterized set of 52 ACPs has also allowed a recalculation of the predicted number of ACPs in the genome. Fitting the frequency spectrum of the 52 ACPs with EST hits from the Swanson et al. (2001a) screen to a truncated Poisson distribution and to a Poisson mixture model gave maximum-likelihood estimates in the range of 70–106 ACPs in the D. melanogaster genome (see materials and methods), respectively. Additionally, recently identified ACPs CG8626, CG15616, and CG17799 (Holloway and Begun 2004) suggest that the field is steadily approaching a complete list of ACPs in D. melanogaster.
These 52 D. melanogaster ACPs are expected to be extracellular and thus transferred to the female upon mating and to be produced primarily or exclusively in the male's accessory gland. Indeed, all 16 Acp genes tested so far encode seminal proteins detectable only in the male's accessory gland and transferred to the female during mating (Chen et al. 1988; Monsma et al. 1990; Coleman et al. 1995; Bertram et al. 1996; Lung et al. 2002; Albright 2003; Ravi Ram et al. 2005). Additional support that this set of 52 D. melanogaster ACPs truly represents accessory-gland predominant genes stems from the finding that 29 of 46 tested Acp genes showed twofold or higher expression values in germlineless male vs. germlineless female comparisons (6 ACPs were not present on the microarrays) (Parisi et al. 2004).
Presence of multiple Acp gene duplicates across the D. melanogaster genome:
About 40% of the D. melanogaster genome (5536 of 13601 genes) appears to be gene duplicates (Rubin et al. 2000b). Similarly, 16 (31%) of the 52 ACPs appear to have gene duplicates (Table 2) within the D. melanogaster genome. CG8137 and CG9334 are the only gene duplicates not in tandem, although they share the same intron splice positions. Percent identities of the Acp gene duplicates range from 25 [CG3801 (Acp76A) and BG642378] to 92% (CG6289 and CG6663), indicating that a range of recent and ancient gene duplicates have been identified.
Nine of these cases of gene duplication are within the 52 Acp collection (3 duplicate pairs plus 1 triplicate) (Table 2B), indicating that these duplicates have similar expression profiles. This is consistent with the observation that gene duplication events often lead to coexpressed genes that cluster together (Boutanaev et al. 2002). These 9 Acp gene duplicates are found in tandem clusters of paired (or triplicate) genes, and they share the same relative splice site positions, which are also conserved in D. simulans and D. yakuba.
For seven additional ACPs we detect duplicates in the genome (Table 2A). Again, tandem arrangements are seen in D. simulans and D. yakuba, and the D. melanogaster duplicates share the same splice site positions. However, in these seven cases, only one member of each duplicate pair is a member of our 52-Acp collection. This could be because the collection is incomplete (only 52 of the predicted 70–106 ACPs are described here), because a given duplicate's expression might not fit our stringent criteria of accessory gland-predominant expression, or because a given duplicate has an entirely different expression pattern. An example of the first is CG17799. This gene duplicate of CG17797 (Acp29AB) has recently been shown to also be expressed in the D. melanogaster accessory gland (Holloway and Begun 2004), but is not among the 52 genes we focused on here, simply because it was not detected in the Swanson et al. (2001a) EST screen or previous screens. It is likely that other gene duplicates of ACPs whose expression profiles have yet to be determined may later be identified as ACPs. The identification of the Acp gene duplicates will have an impact on future genetic analysis since duplication may introduce genetic redundancy. Additionally, since many ACPs are rapidly evolving, ACPs provide a good example for defining which evolutionary processes drive the divergence of gene duplicates.
Comparative sequence analysis of the D. melanogaster ACPs and their D. simulans and D. yakuba orthologs:
Several ACPs have features indicative of rapid evolution (Aguadé et al. 1992; Cirera and Aguadé 1997; Tsaur and Wu 1997; Aguadé 1999; Begun et al. 2000; Panhuis et al. 2003; Kern et al. 2004; Kohn et al. 2004; Stevison et al. 2004) and Swanson et al.'s (2001a) data suggested that rapidly evolving genes are represented at a high level among ACPs. With our larger collection of fully annotated Acp genes, and the recent release of Drosophila genomic sequences, we could examine this question in detail. We investigated the patterns of codon bias and rates of evolution (by examining the rates of nonsynonymous and synonymous nucleotide substitution, dN and dS) for the 52 stringently defined Acp genes and compared those results to those with a control set of genes that are not expressed in the accessory gland.
Levels of codon bias have been used as a criterion for detecting rapidly evolving genes in Drosophila (Schmid and Aquadro 2001). Although codon bias alone cannot conclusively prove rapid evolution, genes that are rapidly evolving tend to have low codon bias (Schmid et al. 1999). A previous study of 10 Acp genes (Begun et al. 2000) found that Acp genes tend to have lower levels of codon bias relative to the rest of the Drosophila genome. The 52 D. melanogaster Acp genes defined here as a class have significantly lower levels of codon bias (Mann-Whitney test, P < 0.001 for both Fop and G/C3rd calculations, Table 3) than the control random sample of D. melanogaster genes of approximately the same length. D. melanogaster Acp genes do not exhibit significant differences (Fop, Mann-Whitney test P = 0.612, G/C3rd Mann-Whitney test P = 0.302, Table 3) in codon bias from the majority of genes expressed in the testis. Comparing levels of codon bias in the D. simulans Acp gene orthologs to non-Acp genes, we also find that Acp genes exhibit lower levels of codon bias (data not shown). We also determined whether this phenomenon is found in a more distantly related species, D. yakuba. Levels of codon bias in D. yakuba ACPs were also significantly lower (Fop, Mann-Whitney test, P < 0.001, G/C3rd Mann-Whitney test, P < 0.001, Table 3) than those of a collection of D. yakuba non-ACPs (Domazet-Loso and Tautz 2003).
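The Mann-Whitney comparisons reported above were run in StatView. As a rough guide to what the test computes, the Python sketch below implements the U statistic with midranks for ties and a two-sided P-value from the normal approximation, without a continuity correction. It is a simplified illustration, so P-values for small samples will differ from those of exact methods, and the numbers in the usage note are arbitrary rather than codon bias data.

```python
from math import erfc, sqrt


def mann_whitney_u(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (U for sample x, two-sided P-value by normal approximation)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which sample each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    # Assign midranks to tied values and accumulate the tie correction term.
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j

    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2

    mean_u = n1 * n2 / 2
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        raise ValueError("all values are tied; the test is undefined")
    z = (u_x - mean_u) / sqrt(variance)
    return u_x, erfc(abs(z) / sqrt(2))
```

With the arbitrary samples `[1, 2, 3]` and `[4, 5, 6]`, every value in the first sample is smaller than every value in the second, so U for the first sample is 0 and the approximate two-sided P-value is just under 0.05.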
Our findings with the extended set of 52 ACPs agree with the findings by Begun et al. (2000): on average, the 52 ACPs exhibited lower-than-average levels of codon bias in D. melanogaster, D. simulans, and D. yakuba. These low levels of codon bias could be due to rapid rates of protein evolution of ACPs (Akashi 1994). Drosophila codon bias can also be influenced by sequence length (Duret and Mouchiroud 1999), expression level, and local GC content. Because short Drosophila genes tend to exhibit high levels of codon bias (Duret and Mouchiroud 1999), and because Acp genes also tend to be short, we selected control genes of similar length to exclude the contribution of gene length. The unusual levels of codon bias seen for both ACPs and testis genes (Table 3) suggest that male-reproductive proteins in general may exhibit lower levels of codon bias. Low levels of codon bias for D. melanogaster testis genes are consistent with their poorly conserved sequence and sex-specific expression pattern when compared to Anopheles gambiae (Parisi et al. 2003) or D. simulans (Ranz et al. 2003), respectively. That male-biased genes evolve more rapidly at the sequence (Singh and Kulathinal 2000) and expression pattern levels (Meiklejohn et al. 2003) suggests that their rapid evolution may not allow adaptation to high levels of codon bias.
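As an illustration of one of the two codon bias measures used above, G/C3rd can be read as the fraction of codons whose third position is G or C. The following is a minimal sketch under that reading; the function name `gc3_fraction` and the example sequence are hypothetical, and the handling of stop codons and ambiguous bases in the actual analysis is not specified here, so every codon supplied is simply counted.

```python
def gc3_fraction(cds: str) -> float:
    """Return the fraction of codons whose third base is G or C.

    Assumption: `cds` is an in-frame coding sequence; every codon passed in
    is counted, including a stop codon if one is present.
    """
    seq = cds.upper()
    if len(seq) == 0 or len(seq) % 3 != 0:
        raise ValueError("coding sequence must be non-empty and a multiple of 3")
    third_positions = seq[2::3]  # bases at codon positions 3, 6, 9, ...
    gc_count = sum(base in "GC" for base in third_positions)
    return gc_count / len(third_positions)


# Made-up four-codon sequence (ATG GCC AAG TAA), not a real Acp gene.
example_gc3 = gc3_fraction("ATGGCCAAGTAA")
print(example_gc3)
```

Per-gene values computed this way for two gene sets could then be compared with a rank-based test such as the Mann-Whitney test named in the text.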
Levels of divergence:
A high dN/dS ratio can identify genes for which amino acid replacement is being driven by a selective pressure. Acp genes have already been reported to demonstrate higher levels of sequence divergence than non-Acp genes between D. simulans and D. melanogaster (Swanson et al. 2001a; Kern et al. 2004; Stevison et al. 2004). However, those analyses used only partial sequences or included genes that our present analyses have shown not to fit the stringent definition of ACPs in D. melanogaster and thus could be subject to additional or different selection pressures.
Here we compare our complete sequences of a set of stringently selected D. melanogaster ACPs with their D. simulans and D. yakuba orthologs. ACPs exhibit high levels of sequence divergence with average dN values for D. simulans of 0.045 (Table 1), similar to previously reported dN values for D. simulans ACPs of 0.052 (Swanson et al. 2001a) and 0.050 (Begun et al. 2000). The average level of dS for this set of ACPs in D. simulans is 0.13 (Table 1), similar to the known average D. simulans dS value of 0.11 (Bauer and Aquadro 1997; Moriyama and Powell 1997; Begun and Whitley 2000; Betancourt et al. 2002). We also compared Acp to non-Acp levels of sequence divergence between D. melanogaster and D. yakuba (Table 4). In this comparison as well, ACPs have significantly higher dN (0.161) and dN/dS (0.407) values than non-ACPs (dN and dN/dS values of 0.026 and 0.082, respectively) (Mann-Whitney test, both dN and dN/dS, P < 0.001).
Using levels of dN and dS as a metric to identify rapidly evolving genes, which have a dN/dS value >1, Swanson et al. (2001a) identified 19 genes whose partial sequence had dN/dS >1 in D. melanogaster/D. simulans comparisons. However, our reanalysis of the 52 ACPs using complete gene sequences yields only 3 ACPs from both D. melanogaster/D. simulans and D. melanogaster/D. yakuba comparisons with dN/dS >1 (Table 1). We attribute this discrepancy between the Swanson et al. (2001a) results and those reported here to our analysis of full-length coding regions from an accurately annotated list of genes rather than partially sequenced cDNAs, which in some cases were misaligned. In addition, in many rapidly evolving genes only part of the gene is under positive selection (Hughes and Nei 1988; Swanson et al. 2001b). Thus, some partial cDNAs analyzed by Swanson et al. (2001a) may have fortuitously contained regions under positive selection, giving a higher dN/dS than when the entire gene is tested. For this reason a dN/dS >0.5 was recently proposed as a more practical cutoff when using full-length sequences, to identify candidate genes that may be driven by positive selection (Swanson et al. 2004). Applying this cutoff value of 0.5 to the 52 ACPs we find that 9 ACPs (but not the same 9 as in Swanson et al. 2001a) in both D. melanogaster/D. simulans and D. melanogaster/D. yakuba have dN/dS >0.5. This proportion of ACPs (9/52, 17%) is similar to the percentage of ACPs with a dN/dS >0.5 identified in the Swanson et al. (2001a) male accessory gland EST screen (19%). That the percentage of ACPs with a dN/dS >0.5 found here is comparable to that in Swanson et al. (2001a) supports the idea that dN/dS >0.5 may serve as a good indicator for candidate rapidly evolving genes (Swanson et al. 2004). Further analysis of the role of natural selection in shaping Acp sequence evolution using codon-substitution models will be presented elsewhere.
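The cutoff logic described above can be sketched as a simple filter over per-gene dN and dS estimates. This is only an illustration: the function name `flag_candidates`, the gene labels, and the rate values are hypothetical placeholders, not values from Table 1.

```python
def flag_candidates(rates, cutoff=0.5):
    """Return the sorted names of genes whose dN/dS ratio exceeds `cutoff`.

    `rates` maps a gene name to a (dN, dS) pair. Genes with dS <= 0 are
    skipped because the ratio is undefined for them.
    """
    flagged = []
    for gene, (dn, ds) in rates.items():
        if ds <= 0:
            continue
        if dn / ds > cutoff:
            flagged.append(gene)
    return sorted(flagged)


# Hypothetical pairwise estimates, for illustration only.
hypothetical_rates = {
    "geneA": (0.045, 0.13),  # ratio ~0.35: below both cutoffs
    "geneB": (0.20, 0.15),   # ratio ~1.33: above both cutoffs
    "geneC": (0.08, 0.12),   # ratio ~0.67: above 0.5 only
}

print(flag_candidates(hypothetical_rates, cutoff=0.5))
print(flag_candidates(hypothetical_rates, cutoff=1.0))

# Proportion reported in the text: 9 of 52 ACPs exceed the 0.5 cutoff.
percent_above = 100 * 9 / 52
print(round(percent_above))
```

Lowering the cutoff from 1 to 0.5 trades specificity for sensitivity, which matches the rationale given in the text for full-length sequences in which only part of the gene may be under positive selection.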
Detection of D. melanogaster Acp orthologs in D. pseudoobscura:
The complete genome sequence of D. pseudoobscura (Richards et al. 2005) allowed us to search for conserved D. melanogaster ACPs in a distantly related species outside of the D. melanogaster subgroup. A whole-genome alignment (WGA) approach was used to determine which of the 52 D. melanogaster ACPs can be identified in D. pseudoobscura. Syntenic regions covering each Acp were generated for 50 ACPs. Limited sequence information for the other 2 ACPs (Acp53Eb and CG31872) prevented generation of accurate comparative genome sequence alignments. We verified all the D. melanogaster to D. pseudoobscura contig alignments and identified the corresponding D. pseudoobscura Acp, to generate coding sequence alignments between the two species. All D. melanogaster ACPs for which coding sequence alignments could be generated with the corresponding D. pseudoobscura contig are considered true orthologs (see Table 1). We found that, via WGA, 58% (29/50) of the D. melanogaster ACPs have true orthologs in D. pseudoobscura (Table 1). For the 21 D. melanogaster ACPs for which true orthologs could not be identified in D. pseudoobscura we used tBlastN against all D. pseudoobscura contigs and orphan sequences to ensure that we had not missed D. melanogaster ACPs that had moved to nonsyntenic chromosomal locations in D. pseudoobscura. In 10 cases, tBlastN comparisons gave significant D. pseudoobscura hits. However, each hit was interpreted as a false positive because it matched either repetitive sequence in the Acp or a different D. pseudoobscura gene with a respective non-Acp D. melanogaster ortholog (see materials and methods). Our inability to detect a D. pseudoobscura ortholog for a D. melanogaster Acp gene via this method does not mean that a D. pseudoobscura ortholog does not exist, but only that our searches were negative. D. melanogaster ACPs undetectable in D. pseudoobscura via our methods could be highly diverged, located in an unsequenced region of the D. pseudoobscura genome, or potential D. melanogaster lineage-specific proteins. A recent study (Wagstaff and Begun 2005) uncovered a D. pseudoobscura gene with 18.5% amino acid sequence identity to D. melanogaster Acp26Aa. This is below the similarity level detectable in our search for D. pseudoobscura orthologs. For another gene, Acp95EF, our analysis revealed its D. pseudoobscura ortholog, which was undetected by Wagstaff and Begun (2005). Differences in methodologies and the limited alignability of the D. pseudoobscura genome (only ∼48%; Richards et al. 2005) likely account for these two differences in Acp ortholog detection.
Of the 29 ACPs we found conserved between D. melanogaster and D. pseudoobscura it had been possible to generate comparative structural models to known protein classes for 20 (Table 1) (Mueller et al. 2004). This represents a greater fraction (20/29, 69%) than is seen for those D. melanogaster ACPs that do not have D. pseudoobscura counterparts (9/21, 43%). That more proteins within predicted protein functional classes are conserved between D. melanogaster and D. pseudoobscura suggests that these proteins may mediate reproductive strategies that are conserved across Drosophila. Interestingly, the protease inhibitor class is not well conserved between the two species (Table 1): only one (Acp62F) of seven predicted or known Acp protease inhibitors is identifiable between the two species (Table 1). The lack of conservation of protease inhibitors between D. melanogaster and D. pseudoobscura is significantly greater than the percentage of ACPs not shared in all other protein classes (chi-square = 12.28, d.f. = 1, P < 0.001). ACPs that are predicted protease inhibitors have been suggested to participate in sperm storage, cost of mating [specifically Acp62F (Lung et al. 2002)], and/or immune regulation (Khush and Lemaitre 2000; McGraw et al. 2004), which may contribute to their evolution between D. melanogaster and D. pseudoobscura lineages.
Comparative sequence analysis within the D. melanogaster subgroup of ACPs shared or not shared with D. pseudoobscura:
Within the set of ACPs conserved between D. melanogaster and D. pseudoobscura, we examined levels of codon bias and dN/dS with two other species in the D. melanogaster subgroup. We tested whether codon bias and dN/dS values could distinguish those D. melanogaster ACPs that share or do not share true orthologs in D. pseudoobscura. We find that D. melanogaster ACPs without detectable D. pseudoobscura true orthologs have significantly lower levels of codon bias in D. melanogaster (Fop and G/C3rd Mann-Whitney test, P = 0.001 and P < 0.001, respectively) and D. yakuba than ACPs conserved between D. melanogaster and D. pseudoobscura (Fop and G/C3rd Mann-Whitney test, P < 0.001 and P < 0.001, respectively, Table 5A). Additionally, levels of dN/dS are significantly higher for D. melanogaster/D. simulans and D. melanogaster/D. yakuba comparisons of ACPs without true orthologs in D. pseudoobscura compared to ACPs conserved between D. melanogaster and D. pseudoobscura (D. simulans and D. yakuba, Mann-Whitney test, P = 0.002 and P < 0.001, respectively, Table 5A). This subgroup divergence analysis can be extended to the case of the D. melanogaster predicted protease inhibitor ACPs that do not have counterparts in D. pseudoobscura (Table 1). We find that the seven predicted or known Acp protease inhibitors have both significantly lower levels of codon bias and higher levels of sequence divergence (dN/dS) than ACPs in other predicted functional classes (Table 5B). Together, these results suggest that D. melanogaster ACPs without a true D. pseudoobscura ortholog have greater levels of sequence divergence (dN/dS) within the D. melanogaster subgroup than D. melanogaster ACPs with a detectable D. pseudoobscura ortholog. Those D. melanogaster ACPs with higher sequence divergence levels that do not have a true ortholog in D. pseudoobscura thus serve as good candidates for mediating reproductive functions in close relatives of D. melanogaster.
Underrepresentation of ACPs on the D. melanogaster X chromosome:
As previously reported (Wolfner et al. 1997; Swanson et al. 2001a), ACPs' chromosomal locations are biased to autosomes in D. melanogaster. Only 1 of the 52 ACPs, CG11664, falls on the X chromosome at cytological band 1D2 in D. melanogaster. The remaining 51 ACPs are evenly distributed across the second (27 ACPs) and third (24 ACPs) chromosomes. Given that the X chromosome contains ∼17% of the total D. melanogaster genome (Celniker et al. 2002), if the 52 ACPs were randomly distributed across the genome we would expect ∼9 of the 52 ACPs to fall on the X chromosome and 43 on autosomes. The presence of only a single X-linked Acp is highly unlikely to have occurred by chance (Gcorr = 7.908, d.f. = 1, P = 0.005), supporting reports that the D. melanogaster X chromosome is deficient in male-biased genes (Wolfner et al. 1997; Andrews et al. 2000; Swanson et al. 2001a; Parisi et al. 2003; Ranz et al. 2003).
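The expected counts quoted above follow directly from the X chromosome's share of the genome. The sketch below reproduces that arithmetic and, as a separate illustration, computes an exact binomial lower-tail probability for observing at most one X-linked gene. The binomial calculation is a swapped-in technique, not the G-test reported in the text, so its P-value is not expected to match the published statistic; the function names are hypothetical.

```python
from math import comb


def expected_counts(n_genes, x_fraction):
    """Expected numbers of X-linked and autosomal genes under random placement."""
    expected_x = n_genes * x_fraction
    return expected_x, n_genes - expected_x


def binomial_lower_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Values from the text: 52 ACPs, X chromosome ~17% of the genome, 1 X-linked Acp.
exp_x, exp_auto = expected_counts(52, 0.17)
p_tail = binomial_lower_tail(1, 52, 0.17)

print(f"expected X-linked: {exp_x:.2f}, expected autosomal: {exp_auto:.2f}")
print(f"P(at most 1 X-linked | random placement) = {p_tail:.2e}")
```

The expected values of about 8.8 and 43.2 round to the ∼9 and 43 given in the text. The binomial model assumes each gene is placed independently with a probability proportional to chromosome length, which is itself a simplification.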
An alternative approach to understanding the chromosomal bias of sex-specific genes is to focus on the region that contains the single X-linked D. melanogaster Acp. The 50-kb region flanking CG11664 is unusual in several respects. First, CG11664 lies in an apparently gene-poor region, with only six other genes within the surrounding 100 kb. On average there are ∼11 genes/100 kb in the D. melanogaster genome (= 13792 genes/120 Mb) (Adams et al. 2000; Celniker et al. 2002). Second, of the 6 neighboring genes, 4 (CG3713, CG11663, CG14634, and CG14635) appear to be testis biased in their expression (Andrews et al. 2000; Parisi et al. 2004; no expression data could be found for CG14632 and CG14633); thus, perhaps this region is a “hotspot” for harboring male-biased genes on the X chromosome. Third, more than half of the genes in this region do not appear to be conserved between D. pseudoobscura and D. melanogaster, consistent with the report that male-biased genes tend to evolve more rapidly at both expression (Ranz et al. 2003) and sequence (Parisi et al. 2003) levels. Fourth, five of the six neighboring ORFs, in addition to CG11664, are intronless, suggesting they may be retrogenes. Additionally, this region also appears to be a hotspot for transposable elements. In the recent transposable element (piggyBac and P element) insertion mutagenesis collection release of 16,500 fly lines (Thibault et al. 2004), the 100-kb region surrounding CG11664 contained 34 insertions, which is more than the average of ∼14 transposable elements/100 kb (= 16,500 elements/120 Mb). Altogether, the region surrounding CG11664 contains a number of unique features that may help determine what pressures are driving the evolution of sex-specific genes on the X chromosome in D. melanogaster.
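The genome-wide averages quoted above are simple density conversions, sketched here for transparency. The constants come from the text (13,792 genes, 16,500 insertions, a 120-Mb genome, 100-kb windows); the helper name `per_window` is hypothetical, and treating the 16,500 lines as 16,500 insertions follows the text's own approximation.

```python
GENOME_BP = 120_000_000  # ~120 Mb, as used in the text
WINDOW_BP = 100_000      # 100-kb window


def per_window(total_count, genome_bp=GENOME_BP, window_bp=WINDOW_BP):
    """Convert a genome-wide count into an average count per window."""
    return total_count * window_bp / genome_bp


genes_per_100kb = per_window(13_792)       # genome-wide gene density
insertions_per_100kb = per_window(16_500)  # genome-wide insertion density

# Observed values for the 100-kb region around CG11664, from the text:
# CG11664 plus six neighboring genes, and 34 insertions.
observed_genes = 7
observed_insertions = 34

print(f"genes: observed {observed_genes} vs average {genes_per_100kb:.1f}")
print(f"insertions: observed {observed_insertions} vs average {insertions_per_100kb:.2f}")
print(f"insertion fold difference: {observed_insertions / insertions_per_100kb:.2f}")
```

On these figures the region carries roughly 2.5 times the average number of insertions while holding fewer genes than the genome-wide average of about 11.5 per 100 kb.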
Multiple hypotheses including sexual antagonism, dosage compensation, and X inactivation may explain the paucity of male-biased genes on the D. melanogaster X chromosome (reviewed in Oliver and Parisi 2004). Examining D. pseudoobscura could help distinguish the relative importance of these phenomena. In D. pseudoobscura, the X chromosome consists primarily of a region largely syntenic to the left arm of the third chromosome in D. melanogaster (3L) that fused more recently in the D. pseudoobscura lineage to a region syntenic to the X chromosome of D. melanogaster (Segarra and Aguadé 1992). Thus, all ACPs with D. pseudoobscura orthologs that are located on 3L in D. melanogaster [CG1262 (Acp62F), CG10852 (Acp63F), CG17673 (Acp70A), CG3801 (Acp76A), CG6289, CG13309, CG14560, BG642312, CG16707, CG8194, BG642378, and CG6168] would now be on the right arm of the D. pseudoobscura X chromosome (XR). If there is selection against X linkage for ACPs, we would expect a higher “loss” of ACPs from the “new” (D. melanogaster 3L homolog) X-linked genes in the D. pseudoobscura lineage than for ACPs on autosomes in D. pseudoobscura. We find that a larger proportion of new ACPs on the D. pseudoobscura X chromosome are not shared between the two species (as compared to autosomal ACPs in D. pseudoobscura), although this difference is not statistically significant [D. pseudoobscura X chromosome (7/13 = 54% absent or undetected) vs. autosomes (13/36 = 36% absent or undetected); chi-square = 1.01, d.f. = 1, P = 0.322]. That fewer X-linked D. pseudoobscura ACPs are conserved than autosomal ACPs is consistent with selection against X-linked ACPs. However, the D. melanogaster 3L's base chromosome and its D. pseudoobscura XR counterpart show the second lowest level of genome sequence alignability between species: 46.5% of D. melanogaster 3L's base pairs are alignable with D. pseudoobscura XR as compared to an average across all chromosomes of 48%. Therefore, the relatively low sequence conservation of the D. pseudoobscura XR arm suggests that loss or translocation of ACPs from this arm may have resulted from the particular X-chromosomal evolutionary dynamics in the D. pseudoobscura lineage rather than from any sex-specific selection acting differentially on X chromosomes vs. autosomes.
Genes with increased rates of evolution increase the frequency with which incompatibilities evolve between closely related species. Since some ACPs in Drosophila evolve faster than other genes, these rapidly evolving ACPs serve as good candidates for examining the selection pressures associated with reproductive functions. We have characterized here such divergent ACPs, whose divergence may be attributable to sexually antagonistic evolution with proteins from the female or male (Swanson et al. 2001b; Swanson and Vacquier 2002). The female's genotype has been shown to play an active role in sperm displacement (Clark and Begun 1998) and a recent EST screen identified a number of candidate receptors/sexually antagonistic genes for ACPs (Swanson et al. 2004). Candidate receptors would likely serve as the most upstream female genes in signaling pathways for the numerous biological processes/pathways regulated by ACPs, sperm, and the act of mating (McGraw et al. 2004). The comprehensive set of ACPs described here thus provides a basis for understanding both the evolutionary dynamics and function of specific ACPs. This, in turn, may help tease apart the functional importance of male-female interactions during the evolution of reproductive isolation.
We thank all members of the Wolfner lab for comments on the manuscript, S. Albright and S. Ji for assistance with the annotations, and V. DuMont, A. Fiumera, G. Reeves, and A. Wong for helpful advice. This work was supported by National Institutes of Health Grants HD38921 to M.F.W. and GM36431 to C.F.A., and by a National Science Foundation Grant DEB-0242987 to A.G.C. For part of this work, J.L.M. and L.A.M. were supported by National Institutes of Health training grant T32 GM7617 and J.L.M. was subsequently supported by a Department of Education Graduate Assistance in Areas of National Need Fellowship. M.C.B.Q. was supported by an American Cancer Society Postdoctoral Fellowship. We appreciate permission (from Richard K. Wilson) to use the D. yakuba genomic sequences generated by the Genome Sequencing Center, Washington University-St. Louis School of Medicine, prior to their publication.
1 Present address: Department of Biology, Nobel Hall of Science, Gustavus Adolphus College, St. Peter, MN 56082.
Communicating editor: L. Harshman
- Received March 29, 2005.
- Accepted May 19, 2005.
- Copyright © 2005 by the Genetics Society of America