Genetics, Vol. 171, 131-143, September 2005, Copyright © 2005
doi:10.1534/genetics.105.043844

* Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853 and
Center for Studies in Physics and Biology, Rockefeller University, New York, New York 10021
2 Corresponding author: Department of Molecular Biology and Genetics, 423 Biotechnology Bldg., Cornell University, Ithaca, NY 14853.
E-mail: mfw5{at}cornell.edu
ABSTRACT
Initially, 18 Drosophila melanogaster ACPs had been identified from multiple screens (CHEN et al. 1988; SIMMERL et al. 1995; WOLFNER et al. 1997). However, this was far below the predicted 25–150 ACPs (INGMAN-BAKER and CANDIDO 1980; SCHMIDT et al. 1985; WHALEN and WILSON 1986; CIVETTA and SINGH 1995; WOLFNER et al. 1997). In an extensive screen (SWANSON et al. 2001a), 57 new candidate ACPs were identified by partial sequencing of ESTs obtained from a D. simulans accessory gland cDNA library. These 57 candidates, plus the 18 previously identified ACPs, gave 75 putative ACPs. Statistical analysis of the frequency of multiple isolates predicted that these genes represented ~90% of the total number of Acp genes (SWANSON et al. 2001a). However, the SWANSON et al. (2001a) EST screen identified ACPs from partial gene sequences, and from D. simulans, a species in which genetic analysis is not routine. Because it is important to obtain the complete sequences of these genes in a species in which genetic analysis is possible, we obtained and report here the D. melanogaster orthologs of the 57 D. simulans Acp candidates. Our RT-PCR and bioinformatic analyses determined that 34 of the 57 candidate ACPs identified by SWANSON et al. (2001a) have sequences suggestive of encoding extracellular proteins and expression patterns suggestive of encoding ACPs. This brings the total number of identified D. melanogaster ACPs to 52 (34 plus the 18 previously identified).
An unusually high fraction of the genes encoding ACPs show signs of positive selection (AGUADÉ et al. 1992; CIRERA and AGUADÉ 1997; TSAUR and WU 1997; AGUADÉ 1999; BEGUN et al. 2000; PANHUIS et al. 2003; KERN et al. 2004; KOHN et al. 2004; STEVISON et al. 2004). ACPs, as a class, evolve at about twice the rate of nonreproductive proteins (WHALEN and WILSON 1986; CIVETTA and SINGH 1995; SWANSON et al. 2001a). SWANSON et al. (2001a) found that ~11% of the partially sequenced ESTs that they identified have an excess of nonsynonymous over synonymous nucleotide changes, suggesting that divergence of these genes is being accelerated by positive selection. Three selective forces are predicted to drive the generation of sequence diversity of ACPs: female sperm preference (EBERHARD and CORDERO 1995), sperm competition (CLARK et al. 1995), and sexual conflict (RICE 1996). Previous evolutionary analyses of ACPs focused on some of the initially identified 18 ACPs (AGUADÉ et al. 1992; CIRERA and AGUADÉ 1997; TSAUR and WU 1997; AGUADÉ 1999; BEGUN et al. 2000; KERN et al. 2004). Here we present a detailed examination of the molecular evolution of the entire set of stringently selected and annotated 52 ACPs. We performed sequence-based comparisons of these D. melanogaster ACPs with their orthologs in three Drosophila species (D. simulans, D. yakuba, and D. pseudoobscura). This allowed us to determine levels of codon bias, rates of gene duplication, and levels of sequence divergence among three members of the D. melanogaster subgroup (D. melanogaster, D. simulans, and D. yakuba) and, via ortholog detection, which ACPs are conserved between D. melanogaster and D. pseudoobscura. These evolutionary analyses demonstrate that ACPs represent a combination of divergent and conserved proteins that undergo different patterns of sequence evolution.
MATERIALS AND METHODS
1.5 kb of noncoding upstream D. melanogaster sequence from the 5'-end of the D. simulans EST or BDGP predicted translational start site. Predicted signal peptides were identified using SignalP (NIELSEN et al. 1997). Candidate Acp genes were then examined for accessory gland-specific expression. Eighteen previously identified ACPs (CHEN et al. 1988; SIMMERL et al. 1995; WOLFNER et al. 1997) were already known to show accessory gland-predominant or -exclusive expression. We searched each of the 57 new candidate ACPs identified by SWANSON et al. (2001a) against the D. melanogaster BDGP EST database (http://www.fruitfly.org/EST/EST.shtml) to see if the gene was expressed in other tissues (e.g., head, embryo, tissue culture). Occasionally adult testis ESTs (RUBIN et al. 2000a) included our Acp candidates. For example, Acp36DE [a highly expressed Acp (WOLFNER et al. 1997)] has 32 testes EST hits, but has been shown by Western blots to be an accessory gland-specific protein (BERTRAM et al. 1996; WOLFNER et al. 1997). These differences may result from low-level contamination by accessory gland fragments or cells in the large-scale testes preparations for the EST project or may indicate that Acp36DE is transcribed, but not translated, in the testes. Consistent with such models, all 15 Acp antibodies thus far generated detect exclusively accessory gland-specific proteins, even though 9 of 15 genes [CG8982 (Acp26Aa), CG4605 (Acp32CD), CG7157 (Acp36DE), CG6289, CG8137, CG9334, CG17575, CG1656, CG9029] (MONSMA et al. 1990; COLEMAN et al. 1995; BERTRAM et al. 1996; LUNG et al. 2002; RAVI RAM et al. 2005) have testis EST hits (ANDREWS et al. 2000; RUBIN et al. 2000a; PARISI et al. 2003).
Of the 57 Acp candidates previously selected by SWANSON et al. (2001a), 7 were eliminated from further study because their mutant phenotypes (per FlyBase, http://flybase.net/) indicate roles in nonreproductive processes. Sixteen further candidates were removed because they either had EST hits in multiple nonreproductive tissue types or could not be annotated, leaving 34 candidate ACPs (see http://www.genetics.org/supplemental/ for the list of candidates removed from the original 57). Secreted proteins that are also found in other tissues, or whose mutants have additional nonreproductive phenotypes, may nonetheless be present in accessory gland secretions. However, we focused on accessory gland-specific candidates because the evolutionary pressures and functions of these genes should be easier to interpret than those of genes expressed in multiple tissues, which are likely to have multiple functions.
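The screening steps above can be expressed as a simple filter over candidate records. This is a minimal sketch, not the procedure actually used: the `Candidate` fields are hypothetical names, and reading "multiple nonreproductive tissue types" as "more than one" is an assumption. Testis ESTs are treated as reproductive and do not count against a candidate, in line with the Acp36DE example above.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """Hypothetical record for one EST-derived Acp candidate."""
    name: str
    nonrepro_phenotype: bool                                # FlyBase mutant phenotype outside reproduction
    nonrepro_est_tissues: set = field(default_factory=set)  # e.g. {"head", "embryo"}; testis excluded
    annotated: bool = True                                  # a complete ORF could be annotated


def keep(c: Candidate, max_nonrepro_tissues: int = 1) -> bool:
    """True if the candidate passes the accessory gland-specificity screen.

    Assumption: "multiple nonreproductive tissue types" means more than
    `max_nonrepro_tissues` distinct tissue types.
    """
    if c.nonrepro_phenotype or not c.annotated:
        return False
    return len(c.nonrepro_est_tissues) <= max_nonrepro_tissues


def screen(candidates):
    """Return the candidates retained after filtering."""
    return [c for c in candidates if keep(c)]


# Tally reported in the text: 57 - 7 - 16 = 34 new ACPs; 34 + 18 = 52 in total.
NEW_ACPS = 57 - 7 - 16
TOTAL_ACPS = NEW_ACPS + 18
```

The two constants only restate the counts given in the text; the filter itself would need the real per-gene EST and phenotype data.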
This selection process resulted in a collection of 52 ACPs (Table 1). Twelve of these [CG1262 (Acp62F), CG4986, CG6069, CG10284, CG10956, CG11598, CG14034, CG17097, BG642378, BG642312, BG642167, and BG642163] either were not identified in, or have ORFs different from those predicted in, the D. melanogaster genome sequence (Release 4.0) (CELNIKER et al. 2002). We may have identified alternative splice forms of the predicted genes. An example is CG10956, whose Release 4.0 annotation predicts a single exon, whereas our annotation identified a second exon at the 3'-end. Alternatively, our annotations may reflect species-specific differences, since the EST library was constructed from D. simulans, or discrepancies with the D. melanogaster annotation (Release 4.0) (CELNIKER et al. 2002).
Confirmation of D. melanogaster annotations:
RT-PCR of full coding regions in D. melanogaster was performed on RNA isolated from whole, 3-day-old adult virgin Canton-S males. Approximately 30 flies were homogenized in Trizol according to the manufacturer's instructions (GIBCO, Bethesda, MD), and total RNA was prepared for RT-PCR as in CARNINCI and HAYASHIZAKI (1999). Full-length coding regions were amplified using primers designed from our annotations, thereby verifying both the annotations and expression. All amplified products were PCR purified, cloned into pENTR-DTopo or pDONR-201 vectors (Invitrogen), and sequenced by the Biotechnology Resource Center at Cornell using the vector's internal primers. ACPs that could not be amplified by RT-PCR from whole adult male Canton-S cDNA were amplified from available EST clones (RUBIN et al. 2000a) and subsequently cloned as above. Incomplete sequence information for Acp53Eb and a very short coding sequence for CG31056 (Acp98AB) (WOLFNER et al. 1997) did not allow cloning of these two genes into pENTR-DTopo or pDONR-201 vectors. Complete coding, amino acid, and primer sequences for each of the 34 new ACPs can be found in the supplemental materials (http://www.genetics.org/supplemental/).
D. melanogaster Acp sequence analysis:
Codon bias:
Codon bias was measured by both the frequency of optimal codons (Fop) and the percentage G/C content at the third codon position (G/C3rd) (MORIYAMA and POWELL 1997). Fop values range from 0.33 to 1, where 0.33 indicates uniform codon usage and 1 indicates that only optimal codons are used. Fop, G/C3rd, and gene GC content calculations were performed using the codonw program (http://www.molbiol.ox.ac.uk/cu/). Codon bias values (see http://www.genetics.org/supplemental/) were calculated using the D. melanogaster codon frequency table settings of the codonw program. Previous codon bias analyses of CG32952 (Acp33A) in D. melanogaster/D. simulans comparisons combined its two ORFs (CG32952-A and CG32952-B) (BEGUN et al. 2000); however, since each ORF contains its own predicted signal sequence, we analyzed them as two separate genes.
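The two codon bias indices can be illustrated with a short sketch. It assumes a deliberately tiny, illustrative optimal-codon table (three amino acids, one optimal codon each). The codonw D. melanogaster table used in the text covers every amino acid with a defined optimal codon, so values from this sketch will not match codonw output. G/C3rd is computed here over all sense codons, which is a simplification of codonw's GC3s index (synonymous third positions only).

```python
# Illustrative subset only (an assumption, not codonw's table): synonymous codon
# families for three amino acids and one "optimal" codon per family.
SYN_FAMILIES = {
    "F": ("TTT", "TTC"),
    "K": ("AAA", "AAG"),
    "L": ("TTA", "TTG", "CTT", "CTC", "CTA", "CTG"),
}
OPTIMAL = {"TTC", "AAG", "CTG"}
STOPS = {"TAA", "TAG", "TGA"}


def codons(seq):
    """Split a coding sequence into complete codons (a trailing partial codon is dropped)."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def fop(seq):
    """Optimal codons / all codons of amino acids that have a defined optimal codon."""
    family_codons = {c for fam in SYN_FAMILIES.values() for c in fam}
    counted = [c for c in codons(seq) if c in family_codons]
    if not counted:
        return float("nan")
    return sum(c in OPTIMAL for c in counted) / len(counted)


def gc3(seq):
    """Fraction of sense codons with G or C at the third position (simplified, not GC3s)."""
    sense = [c for c in codons(seq) if c not in STOPS]
    if not sense:
        return float("nan")
    return sum(c[2] in "GC" for c in sense) / len(sense)
```

For example, in `ATGTTCAAGCTGTTTAAATAA`, three of the five codons from the tabulated families are optimal (Fop = 0.6), and four of the six sense codons end in G or C.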
For comparison, we generated a random sample of 100 D. melanogaster genes showing twofold higher expression in testes vs. ovaries from the PARISI et al. (2004) microarray data set. Additionally, a random sample of 150 D. melanogaster genes of approximately the same length as ACPs (mean gene length, 994.7 nt for ACPs vs. 957.6 nt for the random sample) was obtained from BDGP (http://www.fruitfly.org/sequence/dlMisc.shtml).
Gene duplications:
Sequence comparisons and chromosomal location were used together to identify gene duplicates. Individual Acp protein sequences were compared to the D. melanogaster genome using BlastP. Hits were considered Acp gene duplicate candidates if they had an E-value of 1e-10 or smaller (a conservative cutoff) and a minimum of 30% sequence identity across ≥80% of the protein (GU et al. 2002). Because gene duplicates are often found in tandem (FRIEDMAN and HUGHES 2003), we extended our search to significant matches falling within neighboring Acp genes that did not meet the 30% sequence identity cutoff. If such a hit was present, we checked for a similar protein domain prediction (http://www-cryst.bioc.cam.ac.uk/fugue/prfsearch.html) and a conserved splicing pattern to support its being a possible duplicate. Candidates that had a BlastP E-value of 1e-10 or smaller and also met all three of these criteria (neighboring location, similar domain prediction, and conserved splicing pattern) were also considered gene duplicates, even though their sequence identity may be <30%. Conservation of the gene duplicates in D. simulans and D. yakuba was assessed by tBlastN searches against their whole-genome alignments (WashU-GSC http://genome.wustl.edu/tools/blast/).
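The duplicate-calling rules above can be written as a single decision function. This is a sketch under stated assumptions: `BlastHit` and its fields are hypothetical summaries of BlastP output (not a BLAST API), the ≥80% coverage threshold is our reading of the GU et al. (2002) criterion, and the domain and splicing checks are assumed to have been done separately.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    """Hypothetical summary of one BlastP hit of an Acp against the genome."""
    evalue: float
    identity: float       # fraction of identical residues, 0-1
    coverage: float       # fraction of the protein covered by the alignment, 0-1
    adjacent: bool        # hit lies in a neighboring (tandem) gene
    same_domain: bool     # similar predicted protein domain
    same_splicing: bool   # conserved intron/splice positions


E_CUTOFF = 1e-10


def is_duplicate(h: BlastHit) -> bool:
    """Apply the standard criteria, then the relaxed rule for tandem neighbors."""
    if h.evalue > E_CUTOFF:
        return False
    # Standard rule: >=30% identity over >=80% of the protein (assumed reading).
    if h.identity >= 0.30 and h.coverage >= 0.80:
        return True
    # Relaxed rule below the identity cutoff: all three supporting criteria must hold.
    return h.adjacent and h.same_domain and h.same_splicing
```

Checking the E-value first mirrors the text: both routes to calling a duplicate require a hit at or below 1e-10.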
Calculation of the expected number of ACPs in the D. melanogaster genome:
Two estimates of the total number of Acp genes in the D. melanogaster genome were obtained as in SWANSON et al. (2001a) by maximum-likelihood fits to a truncated Poisson distribution. A third estimate was obtained by nonparametric maximum likelihood. The first two estimates differ in how they treat 5 ACPs (Acp26Aa, Acp26Ab, Acp32CD, Acp33A, Acp36DE) that were not adequately prescreened by SWANSON et al. (2001a) and hence appeared in the postscreening library. In the first estimate, we ignore these 5 ACPs and fit a truncated Poisson distribution to the frequency spectrum (counts of singleton hits, doubleton hits, etc.). This gives a maximum-likelihood count of 52 ACPs in addition to the 18 that were prescreened by SWANSON et al. (2001a), for a total of 70. For the second estimate, we include the hit counts of the 5 ACPs as though they had not been prescreened at all, and obtain a maximum-likelihood count of 59 ACPs. Adding back the 13 ACPs that were successfully prescreened (or at least were not observed among the sequenced clones) to this estimate of 59 yields a prediction of 72 Acp genes in the D. melanogaster genome. The third method, designed for an unscreened library, fits the data to a Poisson mixture model by nonparametric maximum likelihood (JI-PING WANG, personal communication). The perl script eststat.pl (available at http://www.floralgenome.org/cgi-bin/eststat/eststat.cgi) took the frequency spectrum of EST hits and estimated the total number of distinct ACPs in the library to be 106. This figure may be considered an upper bound, because the prescreening applied to the library left a more uniform frequency distribution than would be found in an unscreened library.
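The truncated Poisson approach can be sketched as follows. The sketch assumes that each distinct gene's hit count is Poisson with a common mean λ and that genes with zero hits go unobserved. Under these assumptions the maximum-likelihood λ satisfies (mean hits per observed gene) = λ / (1 - exp(-λ)), and the total is estimated as the number of observed genes divided by the probability of being observed at least once, 1 - exp(-λ). The spectrum in the usage note is invented for illustration. The actual ACP spectrum is not reproduced here, and the nonparametric mixture method (eststat.pl) is not sketched.

```python
import math


def fit_truncated_poisson(spectrum):
    """Fit a zero-truncated Poisson to an EST frequency spectrum.

    spectrum: {hits per gene: number of distinct genes with that many hits}
    Returns (lambda_hat, estimated total number of genes).
    """
    k = sum(spectrum.values())                        # distinct genes observed
    total_hits = sum(h * n for h, n in spectrum.items())
    m = total_hits / k                                # mean hits per observed gene
    if m <= 1:
        raise ValueError("need at least one gene hit more than once")

    def f(lam):
        # mean of a zero-truncated Poisson, lam / (1 - e^-lam), minus the observed mean
        return lam / -math.expm1(-lam) - m

    lo, hi = 1e-12, m                                 # f(lo) < 0 < f(hi); f is increasing
    for _ in range(200):                              # bisection on the ML equation
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    n_total = k / -math.expm1(-lam)                   # observed / P(seen at least once)
    return lam, n_total
```

With the hypothetical spectrum `{1: 20, 2: 10, 3: 5}` (35 genes, 55 hits), the fit gives λ ≈ 0.98 and an estimated total of roughly 56 genes, that is, about 21 unseen genes.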
D. simulans and D. yakuba sequence comparisons to D. melanogaster ACPs:
Nonsynonymous substitutions per nonsynonymous site (dN) and synonymous substitutions per synonymous site (dS) values for some previously characterized ACPs (AGUADÉ et al. 1992; CIRERA and AGUADÉ 1997; AGUADÉ 1999; BEGUN et al. 2000; KOHN et al. 2004) were incorporated into this analysis. D. yakuba sequences were retrieved from BlastN alignments of the D. melanogaster ACPs to the D. yakuba genome (WashU-GSC http://genome.wustl.edu/blast/). D. simulans and D. yakuba coding regions (see http://www.genetics.org/supplemental/) were aligned to the D. melanogaster coding regions with ClustalX (THOMPSON et al. 1997). dN and dS values were calculated using DNASP 4.0 (ROZAS et al. 2003). In a few cases, partial gene sequences were used. In one D. yakuba case, CG32952-B, an adenine-to-cytosine change disrupted the apparent start codon. Because no other plausible ATG could be identified upstream of CG32952-B to compensate for this change, CG32952-B was omitted from D. melanogaster to D. yakuba comparisons, although rare CUG start codons do exist (PRATS et al. 1989). D. simulans and D. yakuba codon bias values (see http://www.genetics.org/supplemental/) were calculated as above. The D. yakuba non-Acp data set was obtained from a set of non-sex-specific transcripts (DOMAZET-LOSO and TAUTZ 2003). The StatView statistical program (version 5.0.1; SAS Institute) was used for statistical analyses.
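The final step of a dN/dS calculation can be sketched as follows, assuming Nei-Gojobori-style counting with a Jukes-Cantor correction (one common approach; we do not reproduce DNASP's internals here). The site and difference counts are taken as precomputed inputs. The codon-by-codon counting of synonymous and nonsynonymous sites along mutational pathways is omitted, and all argument names are hypothetical.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance from the proportion of differing sites p."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(n_sites, s_sites, n_diffs, s_diffs):
    """dN/dS from precomputed counts.

    n_sites, s_sites: nonsynonymous and synonymous sites (may be fractional)
    n_diffs, s_diffs: observed nonsynonymous and synonymous differences
    """
    dn = jukes_cantor(n_diffs / n_sites)
    ds = jukes_cantor(s_diffs / s_sites)
    if ds == 0:
        # ratio undefined without synonymous change
        return float("inf") if dn > 0 else float("nan")
    return dn / ds
```

For example, 3 differences at 300 nonsynonymous sites and 1 difference at 100 synonymous sites give equal uncorrected proportions (0.01), so dN/dS is 1.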
Detection of Acp orthologs in D. pseudoobscura:
The whole-genome alignment (WGA) of the D. melanogaster and D. pseudoobscura genomes (RICHARDS et al. 2005) was taken from EMBERLY et al. (2003). The SMASH program (ZAVOLAN et al. 2003) was used to find the strongest set of syntenic anchors between the D. pseudoobscura contigs and the D. melanogaster genome. Anchors were high-similarity regions tens to hundreds of base pairs long and covered ~30% of the genome. The LAGAN program (BRUDNO et al. 2003) gave similar alignments. Since syntenic domains between the two species generally exceed 10 kb (i.e., they are much larger than most repeat elements within the sequenced euchromatin), using synteny eliminated almost all ambiguities due to repeats. The SMASH blocks, along with the contigs they matched, were displayed on top of the Release 3 annotation (CELNIKER et al. 2002) using GBROWSE (http://www.gmod.org/ggb/index.shtml).
We then examined the syntenic regions for each Acp individually at the sequence level. In 48 of 51 cases (51 ACPs instead of 52 were compared because Acp53Eb's sequence has yet to be determined), SMASH blocks from a single contig either bracketed or "hit" the annotated gene in D. melanogaster. SMASH blocks from a single D. pseudoobscura contig that span a given Acp locus indicate that the Acp genomic region in question can be aligned at the sequence level. For CG31872, a contiguous D. pseudoobscura sequence could not be aligned because the Acp gene fell into a gap between two contigs. In the two other cases, CG14560 and CG9074, SMASH blocks also hit multiple contigs other than the contig spanning the region. The coding sequences of CG14560 and CG9074 were then submitted to Repeatmasker (http://www.repeatmasker.org/), which indicated that both ACPs contain repetitive regions, explaining the multiple SMASH block contig hits. After filtering out the repetitive regions for CG14560 and CG9074, we could identify a single contig that bracketed each gene. Once the D. melanogaster to D. pseudoobscura contig alignments of the ACPs were verified, we retrieved the corresponding D. pseudoobscura sequence within the aligned contig and searched it via tBlastN using the D. melanogaster protein sequence. If coding sequence alignments could not be identified, we used GENSCAN (BURGE and KARLIN 1997) and Genie (REESE et al. 2000) to locate possible ORFs. All ACPs for which coding sequence alignments could be generated with the corresponding D. pseudoobscura contig region are considered true orthologs (Table 1). The SMASH block-based coding sequence alignments were confirmed using a second, more recent D. pseudoobscura WGA (KAROLCHIK et al. 2003). D. pseudoobscura coding sequences of conserved ACPs, and D. melanogaster to D. pseudoobscura contig alignments for absent or undetectable ACPs, can be found in the web supplement (http://www.genetics.org/supplemental/). Although we define ACPs conserved between D. melanogaster and D. pseudoobscura as true orthologs, we have not determined whether these ACPs have maintained their accessory gland expression in D. pseudoobscura.
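The "bracket or hit" test described above can be sketched as an interval check. This is an illustration under assumptions: `Anchor` is a hypothetical record (SMASH's real output format is not assumed), and anchors are taken to be already restricted to the gene's D. melanogaster chromosome arm and a local window, whose size the text does not specify. A locus counts as alignable when exactly one contig is returned. Several returned contigs correspond to the CG14560/CG9074 situation, where repeats had to be filtered first.

```python
from typing import NamedTuple


class Anchor(NamedTuple):
    """Hypothetical syntenic anchor: a D. pseudoobscura contig matched to a D. melanogaster interval."""
    contig: str
    start: int   # D. melanogaster coordinate, same arm as the gene
    end: int


def spanning_contigs(gene_start, gene_end, anchors):
    """Contigs whose anchors overlap the gene, or bracket it with anchors on both sides."""
    by_contig = {}
    for a in anchors:
        by_contig.setdefault(a.contig, []).append(a)
    result = set()
    for contig, hits in by_contig.items():
        overlaps = any(a.start <= gene_end and a.end >= gene_start for a in hits)
        left = any(a.end < gene_start for a in hits)
        right = any(a.start > gene_end for a in hits)
        if overlaps or (left and right):
            result.add(contig)
    return result
```

For instance, a contig with anchors at 100-200 and 900-1000 brackets a gene at 300-800, whereas a contig anchored only on one side of a gene neither hits nor brackets it.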
D. melanogaster ACPs that could not be detected within the retrieved D. pseudoobscura contig were searched via tBlastN against the D. pseudoobscura genome through the Baylor College of Medicine Drosophila Genome project website (http://www.hgsc.bcm.tmc.edu/projects/drosophila/). For tBlastN searches, only hits with an E-value of 1e-04 (ZDOBNOV et al. 2002) or smaller were considered significant. Whenever a significant tBlastN hit in D. pseudoobscura was identified, the corresponding D. pseudoobscura sequence was searched back against the D. melanogaster genome (http://www.flybase.net) via BlastP to determine whether it hit the Acp in question or a protein within a similar sequence/structure-function class. In all cases, significant D. pseudoobscura tBlastN hits were false positives [e.g., D. melanogaster ACPs CG8137 (serpin) and CG9334 (serpin) both hit the D. pseudoobscura ortholog of CG9456 (serpin)]. Some ACPs have no true ortholog detectable via WGA but do have a significant tBlastN hit in D. pseudoobscura whose return BlastP hit in D. melanogaster is not an Acp. For these ACPs, the alignments and the corresponding "false-positive D. melanogaster genes" can be found in the supplemental materials (http://www.genetics.org/supplemental/).
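The reciprocal check above reduces to a small decision rule. This sketch assumes that the BLAST searches have already been run and summarized. The inputs are hypothetical: the query name, the best forward E-value, and the best D. melanogaster gene returned by the back-search. Only the 1e-04 cutoff comes from the text.

```python
def classify_tblastn_hit(query_acp, evalue, return_best_hit, cutoff=1e-4):
    """Classify a D. pseudoobscura tBlastN hit found with a D. melanogaster Acp query.

    query_acp: D. melanogaster Acp used as the query (e.g. "CG8137")
    evalue: E-value of the best D. pseudoobscura hit
    return_best_hit: best D. melanogaster BlastP match when the hit is searched
        back (None if the back-search finds nothing)
    """
    if evalue > cutoff:
        return "not significant"
    if return_best_hit == query_acp:
        return "candidate ortholog"   # the back-search returns the query Acp itself
    return "false positive"           # returns another gene, e.g. a related serpin
```

The serpin example from the text (CG8137 hitting the D. pseudoobscura ortholog of CG9456) falls in the "false positive" branch.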
RESULTS AND DISCUSSION
These 52 D. melanogaster ACPs are expected to be extracellular, and thus transferred to the female upon mating, and to be produced primarily or exclusively in the male's accessory gland. Indeed, all 16 Acp genes tested so far encode seminal proteins that are detectable only in the male's accessory gland and are transferred to the female during mating (CHEN et al. 1988; MONSMA et al. 1990; COLEMAN et al. 1995; BERTRAM et al. 1996; LUNG et al. 2002; ALBRIGHT 2003; RAVI RAM et al. 2005). Further support that this set of 52 D. melanogaster ACPs truly represents accessory gland-predominant genes comes from the finding that 29 of 46 tested Acp genes showed twofold or higher expression in germlineless males vs. germlineless females (6 ACPs were not present on the microarrays) (PARISI et al. 2004).
Presence of multiple Acp gene duplicates across the D. melanogaster genome:
About 40% of D. melanogaster genes (5536 of 13601) appear to be gene duplicates (RUBIN et al. 2000b). Similarly, 16 (31%) of the 52 ACPs appear to have gene duplicates within the D. melanogaster genome (Table 2). CG8137 and CG9334 are the only duplicates not arranged in tandem, although they share the same intron splice positions. Percent identities of the Acp gene duplicates range from 25% [CG3801 (Acp76A) and BG642378] to 92% (CG6289 and CG6663), indicating that both recent and ancient gene duplicates have been identified.
For seven additional ACPs we detect duplicates in the genome (Table 2A). Again, tandem arrangements are seen in D. simulans and D. yakuba, and the D. melanogaster duplicates share the same splice site positions. However, in these seven cases, only one member of each duplicate pair is a member of our 52-Acp collection. This could be because the collection is incomplete (only 52 of the predicted 70–106 ACPs are described here), because a given duplicate's expression might not fit our stringent criteria of accessory gland-predominant expression, or because a given duplicate has an entirely different expression pattern. An example of the first case is CG17799. This duplicate of CG17797 (Acp29AB) has recently been shown to also be expressed in the D. melanogaster accessory gland (HOLLOWAY and BEGUN 2004), but it is not among the 52 genes we focused on here simply because it was not detected in the SWANSON et al. (2001a) EST screen or in previous screens. Other Acp gene duplicates whose expression profiles have yet to be determined are likely to be identified as ACPs in the future. The identification of Acp gene duplicates should inform future genetic analyses, since duplication may introduce genetic redundancy. Additionally, since many ACPs are rapidly evolving, they offer a good system for defining which evolutionary processes drive the divergence of gene duplicates.
Comparative sequence analysis of the D. melanogaster ACPs and their D. simulans and D. yakuba orthologs:
Several ACPs have features indicative of rapid evolution (AGUADÉ et al. 1992; CIRERA and AGUADÉ 1997; TSAUR and WU 1997; AGUADÉ 1999; BEGUN et al. 2000; PANHUIS et al. 2003; KERN et al. 2004; KOHN et al. 2004; STEVISON et al. 2004), and SWANSON et al.'s (2001a) data suggested that rapidly evolving genes are overrepresented among ACPs. With our larger collection of fully annotated Acp genes and the recent release of Drosophila genomic sequences, we could examine this question in detail. We investigated patterns of codon bias and rates of evolution (by examining rates of nonsynonymous and synonymous nucleotide substitution, dN and dS) for the 52 stringently defined Acp genes and compared the results with those for control sets of genes that are not expressed in the accessory gland.
Codon bias:
Levels of codon bias have been used as a criterion for detecting rapidly evolving genes in Drosophila (SCHMID and AQUADRO 2001). Although codon bias alone cannot conclusively demonstrate rapid evolution, rapidly evolving genes tend to have low codon bias (SCHMID et al. 1999). A previous study of 10 Acp genes (BEGUN et al. 2000) found that Acp genes tend to have lower codon bias than the rest of the Drosophila genome. As a class, the 52 D. melanogaster Acp genes defined here have significantly lower codon bias (Mann-Whitney test, P < 0.001 for both Fop and G/C3rd, Table 3) than the control random sample of D. melanogaster genes of approximately the same length. However, their codon bias does not differ significantly from that of the random sample of testis-biased genes (Fop, Mann-Whitney test P = 0.612; G/C3rd, Mann-Whitney test P = 0.302; Table 3). Comparing codon bias in the D. simulans Acp orthologs to that in non-Acp genes, we also find that Acp genes exhibit lower codon bias (data not shown). We also asked whether this pattern holds in a more distantly related species, D. yakuba. Codon bias in D. yakuba ACPs was also significantly lower (Fop, Mann-Whitney test, P < 0.001; G/C3rd, Mann-Whitney test, P < 0.001; Table 3) than in a collection of D. yakuba non-ACPs (DOMAZET-LOSO and TAUTZ 2003).
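The comparisons above rely on the two-sample Mann-Whitney test. The analyses themselves were run in StatView; the sketch below is a generic stand-in, assuming the large-sample normal approximation with a tie correction and no continuity correction. P-values from exact or continuity-corrected versions will differ somewhat, particularly for small samples.

```python
import math


def midranks(values):
    """1-based ranks; tied values receive the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided p) via the normal approximation with tie correction."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    tie_sizes = {}
    for v in pooled:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_sizes.values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        return u1, 1.0                       # all values tied: no evidence of a shift
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    return u1, math.erfc(abs(z) / math.sqrt(2))
```

Here `x` might hold Fop values for ACPs and `y` Fop values for a control set. A small U for `x` (relative to n1 × n2 / 2) indicates that the first sample tends to have lower values.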
Levels of divergence:
A high dN/dS ratio can identify genes for which amino acid replacement is being driven by a selective pressure. Acp genes have already been reported to demonstrate higher levels of sequence divergence than non-Acp genes between D. simulans and D. melanogaster (SWANSON et al. 2001a; KERN et al. 2004; STEVISON et al. 2004). However, those analyses used only partial sequences or included genes that our present analyses have shown not to fit the stringent definition of ACPs in D. melanogaster and thus could be subject to additional or different selection pressures.
Using dN/dS >1 as the criterion for rapidly evolving genes, SWANSON et al. (2001a) identified 19 genes whose partial sequences had dN/dS >1 in D. melanogaster/D. simulans comparisons. However, our reanalysis of the 52 ACPs using complete gene sequences yields only 3 ACPs with dN/dS >1 in both the D. melanogaster/D. simulans and D. melanogaster/D. yakuba comparisons (Table 1). We believe this discrepancy arises because we analyzed full-length coding regions from an accurately annotated list of genes, instead of partially sequenced cDNAs, some of which were misaligned. In addition, in many rapidly evolving genes only part of the gene is under positive selection (HUGHES and NEI 1988; SWANSON et al. 2001b). Thus, some partial cDNAs analyzed by SWANSON et al. (2001a) may have fortuitously contained regions under positive selection, giving a higher dN/dS than when the entire gene is tested. For this reason, dN/dS >0.5 was recently proposed as a more practical cutoff for full-length sequences to identify candidate genes that may be driven by positive selection (SWANSON et al. 2004). Applying this cutoff to the 52 ACPs, we find that 9 ACPs (but not the same 9 as in SWANSON et al. 2001a) have dN/dS >0.5 in both the D. melanogaster/D. simulans and D. melanogaster/D. yakuba comparisons. This proportion (9/52, 17%) is similar to the percentage of ACPs with dN/dS >0.5 identified in the SWANSON et al. (2001a) male accessory gland EST screen (19%). This agreement supports the idea that dN/dS >0.5 may serve as a good indicator of candidate rapidly evolving genes (SWANSON et al. 2004). Further analysis of the role of natural selection in shaping Acp sequence evolution using codon-substitution models will be presented elsewhere.
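Applying a dN/dS cutoff in both pairwise comparisons amounts to a set intersection. This is a minimal sketch: the per-gene dictionaries and function names are hypothetical. Genes missing from either comparison are skipped, as happens for CG32952-B, which was omitted from the D. melanogaster/D. yakuba comparison.

```python
def above_cutoff_in_both(omega_sim, omega_yak, cutoff):
    """Genes whose dN/dS exceeds `cutoff` in both the mel/sim and mel/yak comparisons.

    omega_sim, omega_yak: hypothetical dicts {gene: dN/dS}; genes absent from
    either dict are ignored.
    """
    shared = omega_sim.keys() & omega_yak.keys()
    return sorted(g for g in shared if omega_sim[g] > cutoff and omega_yak[g] > cutoff)


def percent(k, n):
    """Percentage of n represented by k."""
    return 100 * k / n
```

With the counts reported above, `percent(9, 52)` is about 17.3, which rounds to the 17% quoted in the text.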
Detection of D. melanogaster Acp orthologs in D. pseudoobscura:
The complete genome sequence of D. pseudoobscura (RICHARDS et al. 2005) allowed us to search for conserved D. melanogaster ACPs in a distantly related species outside the D. melanogaster subgroup. A whole-genome alignment (WGA) approach was used to determine which of the 52 D. melanogaster ACPs can be identified in D. pseudoobscura. Syntenic regions covering each Acp were generated for 50 ACPs; limited sequence information for the other 2 (Acp53Eb and CG31872) prevented accurate comparative genome sequence alignments. We verified all the D. melanogaster to D. pseudoobscura contig alignments and identified the corresponding D. pseudoobscura Acp sequences to generate coding sequence alignments between the two species. All D. melanogaster ACPs for which coding sequence alignments could be generated with the corresponding D. pseudoobscura contig are considered true orthologs (see Table 1). Via WGA, we found that 58% (29/50) of the D. melanogaster ACPs have true orthologs in D. pseudoobscura (Table 1). For the 21 D. melanogaster ACPs for which true orthologs could not be identified in D. pseudoobscura, we used tBlastN against all D. pseudoobscura contigs and orphan sequences to ensure that we had not missed ACPs that had moved to nonsyntenic chromosomal locations in D. pseudoobscura. In 10 cases, tBlastN comparisons gave significant D. pseudoobscura hits. However, each hit was interpreted as a false positive because it matched either repetitive sequence in the Acp or a different D. pseudoobscura gene whose D. melanogaster ortholog is not an Acp (see MATERIALS AND METHODS). Our failure to detect a D. pseudoobscura ortholog for a D. melanogaster Acp gene by this method does not mean that such an ortholog does not exist, only that our searches were negative. D. melanogaster ACPs undetectable in D. pseudoobscura by our methods could be highly diverged, located in an unsequenced region of the D. pseudoobscura genome, or potential D. melanogaster lineage-specific proteins. A recent study (WAGSTAFF and BEGUN 2005) uncovered a D. pseudoobscura gene with 18.5% amino acid sequence identity to D. melanogaster Acp26Aa, which is below the similarity level detectable in our search for D. pseudoobscura orthologs. Conversely, for Acp95EF our analysis revealed a D. pseudoobscura ortholog that was not detected by WAGSTAFF and BEGUN (2005). Differences in methodology and the limited alignability of the D. pseudoobscura genome (only ~48%; RICHARDS et al. 2005) likely account for these two differences in Acp ortholog detection.
Of the 29 ACPs we found conserved between D. melanogaster and D. pseudoobscura, it had been possible to generate comparative structural models to known protein classes for 20 (Table 1) (MUELLER et al. 2004). This fraction (20/29, 69%) is greater than that for D. melanogaster ACPs without D. pseudoobscura counterparts (9/21, 43%). That more proteins within predicted functional classes are conserved between D. melanogaster and D. pseudoobscura suggests that these proteins may mediate reproductive strategies conserved across Drosophila. Interestingly, the protease inhibitor class is not well conserved between the two species: only one (Acp62F) of seven predicted or known Acp protease inhibitors is identifiable in both (Table 1). The proportion of protease inhibitors not conserved between D. melanogaster and D. pseudoobscura is significantly greater than the proportion of unshared ACPs in all other protein classes (chi-square = 12.28, d.f. = 1, P < 0.001). ACPs that are predicted protease inhibitors have been suggested to participate in sperm storage, the cost of mating [specifically Acp62F (LUNG et al. 2002)], and/or immune regulation (KHUSH and LEMAITRE 2000; MCGRAW et al. 2004), roles that may contribute to their divergence between the D. melanogaster and D. pseudoobscura lineages.
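A comparison of this kind can be framed as a 2 × 2 contingency table (protein class by ortholog detected or not). The sketch below computes a Pearson chi-square without the Yates correction for such a table. The exact table behind the reported statistic is not given in this section, so the usage example uses made-up counts and does not attempt to reproduce chi-square = 12.28. When expected counts are small, a continuity correction or Fisher's exact test is often preferred.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no Yates correction) and p-value for [[a, b], [c, d]], d.f. = 1.

    Example layout: rows = protease inhibitor / other class,
    columns = ortholog detected / not detected in D. pseudoobscura.
    """
    n = a + b + c + d
    den = (a + b) * (c + d) * (a + c) * (b + d)
    if den == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / den
    # For 1 d.f., P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p
```

For the hypothetical table [[20, 5], [5, 20]], the statistic is 18.0 with P ≈ 2 × 10^-5.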
Comparative sequence analysis within the D. melanogaster subgroup of ACPs shared or not shared with D. pseudoobscura:
We next tested whether codon bias and dN/dS, measured using two other species of the D. melanogaster subgroup, distinguish D. melanogaster ACPs that do or do not share true orthologs with D. pseudoobscura. D. melanogaster ACPs without detectable D. pseudoobscura true orthologs have significantly lower codon bias than ACPs conserved between D. melanogaster and D. pseudoobscura, both in D. melanogaster (Fop and G/C3rd Mann-Whitney tests, P = 0.001 and P < 0.001, respectively) and in D. yakuba (Fop and G/C3rd Mann-Whitney tests, both P < 0.001; Table 5A). Additionally, dN/dS is significantly higher for ACPs without true orthologs in D. pseudoobscura than for conserved ACPs, in both the D. melanogaster/D. simulans and D. melanogaster/D. yakuba comparisons (Mann-Whitney tests, P = 0.002 and P < 0.001, respectively; Table 5A). This subgroup divergence analysis can be extended to the D. melanogaster predicted protease inhibitor ACPs that do not have counterparts in D. pseudoobscura (Table 1). The seven predicted or known Acp protease inhibitors have both significantly lower codon bias and higher sequence divergence (dN/dS) than ACPs in other predicted functional classes (Table 5B). Together, these results suggest that D. melanogaster ACPs without a true D. pseudoobscura ortholog show greater sequence divergence (dN/dS) within the D. melanogaster subgroup than those with a detectable D. pseudoobscura ortholog. These more divergent ACPs, which lack a true ortholog in D. pseudoobscura, are thus good candidates for mediating reproductive functions in close relatives of D. melanogaster.
Because the X chromosome makes up ~17% of the total D. melanogaster genome (CELNIKER et al. 2002), random distribution of the 52 ACPs across the genome would place ~9 of them on the X chromosome and 43 on the autosomes. The presence of only a single X-linked Acp is highly unlikely to have occurred by chance (Gcorr = 7.908, d.f. = 1, P = 0.005). This supports reports that the D. melanogaster X chromosome is deficient in male-biased genes (WOLFNER et al. 1997; ANDREWS et al. 2000; SWANSON et al. 2001a; PARISI et al. 2003; RANZ et al. 2003).
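The random-placement expectation and the significance of a G statistic can be sketched as follows. This is a minimal illustration, not the authors' code, and the function names are hypothetical. It assumes, as stated above, that the X chromosome accounts for 17% of the genome. It implements a G goodness-of-fit test with Williams' correction, q = 1 + (k + 1)/(6n) for k categories and n observations. It converts a statistic to a P value for 1 d.f. using the fact that a chi-square variable with 1 d.f. is the square of a standard normal variable. The genomic proportion is given only approximately in the text, so the example converts the reported Gcorr to a P value rather than recomputing Gcorr.

```python
import math


def expected_counts(total, proportions):
    """Expected counts per category if items were placed in proportion to size."""
    return [total * p for p in proportions]


def g_test_williams(observed, expected):
    """G goodness-of-fit statistic with Williams' correction.

    G = 2 * sum(O * ln(O / E)); categories with O == 0 contribute nothing.
    Williams' factor for a goodness-of-fit test with k categories and
    n observations is q = 1 + (k + 1) / (6 * n).
    Returns (G, G / q, degrees of freedom).
    """
    k, n = len(observed), sum(observed)
    g = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    q = 1 + (k + 1) / (6 * n)
    return g, g / q, k - 1


def chi2_sf_1df(x):
    """Upper-tail P for a chi-square statistic with 1 d.f.

    Uses P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2))


if __name__ == "__main__":
    # 52 ACPs; X taken as 17% of the genome, autosomes as the remaining 83%.
    x_exp, auto_exp = expected_counts(52, [0.17, 0.83])
    print(round(x_exp, 2), round(auto_exp, 2))  # 8.84 43.16, i.e. ~9 and ~43
    # P value for the reported Gcorr = 7.908 with 1 d.f.
    print(round(chi2_sf_1df(7.908), 3))  # 0.005
```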
An alternative approach to understanding the chromosomal bias of sex-specific genes is to focus on the region that contains the single X-linked D. melanogaster Acp, CG11664. The region extending 50 kb on either side of CG11664 is unusual in several respects.

First, CG11664 lies in an apparently gene-poor region, with only six other genes within the surrounding 100 kb. By comparison, the D. melanogaster genome averages ~11 genes/100 kb (= 13,792 genes/120 Mb) (ADAMS et al. 2000; CELNIKER et al. 2002).

Second, 4 of the 6 neighboring genes (CG3713, CG11663, CG14634, and CG14635) appear to be testis biased in their expression (ANDREWS et al. 2000; PARISI et al. 2004). No expression data could be found for the remaining two, CG14632 and CG14633. This region may therefore be a "hotspot" for male-biased genes on the X chromosome.

Third, more than half of the genes in this region do not appear to be conserved between D. pseudoobscura and D. melanogaster. This is consistent with reports that male-biased genes tend to evolve more rapidly at both the expression (RANZ et al. 2003) and sequence (PARISI et al. 2003) levels.

Fourth, five of the six neighboring ORFs are intronless, as is CG11664 itself, suggesting that they may be retrogenes.

Finally, this region also appears to be a hotspot for transposable elements. A recently released collection of 16,500 fly lines carries piggyBac and P-element insertions (THIBAULT et al. 2004). In that collection, the 100-kb region surrounding CG11664 contained 34 insertions, more than twice the genome-wide average of ~14 transposable elements/100 kb (= 16,500 elements/120 Mb).

Altogether, the region surrounding CG11664 has several unusual features that may help reveal the pressures driving the evolution of sex-specific genes on the D. melanogaster X chromosome.
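The genome-wide densities quoted above are simple ratios. The sketch below recomputes them and compares them with the counts reported for the CG11664 region. Like the text, it assumes a 120-Mb genome and uses a uniform genome-wide average as the baseline. The helper name `per_100kb` is hypothetical.

```python
GENOME_MB = 120      # genome size used for the averages in the text, in Mb
WINDOWS_PER_MB = 10  # number of 100-kb windows in 1 Mb


def per_100kb(count, genome_mb=GENOME_MB):
    """Average number of items per 100-kb window if spread uniformly."""
    return count / (genome_mb * WINDOWS_PER_MB)


# Genome-wide averages from the counts quoted in the text.
genes_avg = per_100kb(13_792)       # about 11.5 genes per 100 kb
insertions_avg = per_100kb(16_500)  # 13.75 insertions per 100 kb

# Counts reported for the 100 kb surrounding CG11664.
region_genes = 1 + 6  # CG11664 plus its six neighbors
region_insertions = 34

print(f"genes: {region_genes} observed vs {genes_avg:.1f} average")
print(f"insertions: {region_insertions} observed vs {insertions_avg:.2f} average "
      f"({region_insertions / insertions_avg:.1f}-fold)")
```

A uniform average is only a rough baseline. Both genes and transposable element insertions are known to cluster, so a formal test of enrichment would need a model of that clustering.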
Multiple hypotheses, including sexual antagonism, dosage compensation, and X inactivation, may explain the paucity of male-biased genes on the D. melanogaster X chromosome (reviewed in OLIVER and PARISI 2004). Comparisons with D. pseudoobscura could help distinguish the relative importance of these phenomena. The D. pseudoobscura X chromosome consists primarily of a region largely syntenic to the left arm of D. melanogaster chromosome 3 (3L). In the D. pseudoobscura lineage, this region fused more recently to a region syntenic to the D. melanogaster X chromosome (SEGARRA and AGUADÉ 1992). Thus, all ACPs with D. pseudoobscura orthologs that are located on 3L in D. melanogaster would now be on the right arm of the D. pseudoobscura X chromosome (XR). These are CG1262 (Acp62F), CG10852 (Acp63F), CG17673 (Acp70A), CG3801 (Acp76A), CG6289, CG13309, CG14560, BG642312, CG16707, CG8194, BG642378, and CG6168.

If there is selection against X linkage of ACPs, we would expect greater "loss" of ACPs in the D. pseudoobscura lineage from the "new" X-linked genes (the D. melanogaster 3L homolog) than from D. pseudoobscura autosomes. A larger proportion of ACPs on the new D. pseudoobscura X arm than of autosomal ACPs are not shared between the two species, although this difference is not statistically significant. On the D. pseudoobscura X chromosome, 7/13 (54%) are absent or undetected; on the autosomes, 13/36 (36%) are absent or undetected (chi-square = 1.01, d.f. = 1, P = 0.322). That fewer X-linked than autosomal D. pseudoobscura ACPs are conserved is consistent with selection against X-linked ACPs.

However, D. melanogaster 3L and its D. pseudoobscura counterpart, XR, show the second-lowest level of genome sequence alignability between the two species. Only 46.5% of the base pairs of D. melanogaster 3L are alignable with D. pseudoobscura XR, compared to an average of 48% across all chromosomes. The relatively low sequence conservation of the D. pseudoobscura XR arm therefore suggests that the loss or translocation of ACPs from this arm may reflect the particular evolutionary dynamics of this X chromosome arm in the D. pseudoobscura lineage. Sex-specific selection acting differently on X chromosomes and autosomes need not be invoked.
Conclusions:
Genes with increased rates of evolution increase the frequency with which incompatibilities evolve between closely related species. Because some Drosophila ACPs evolve faster than other genes, these rapidly evolving ACPs are good candidates for examining the selection pressures associated with reproductive functions. We have characterized here such divergent ACPs, whose divergence may be attributable to sexually antagonistic evolution with proteins from the female or the male (SWANSON et al. 2001b; SWANSON and VACQUIER 2002). The female's genotype has been shown to play an active role in sperm displacement (CLARK and BEGUN 1998). A recent EST screen identified a number of candidate ACP receptors and sexually antagonistic genes (SWANSON et al. 2004). Candidate receptors would likely act as the most upstream female genes in the signaling pathways for the numerous biological processes regulated by ACPs, sperm, and the act of mating (MCGRAW et al. 2004). The comprehensive set of ACPs described here thus provides a basis for understanding both the evolutionary dynamics and the functions of specific ACPs. This, in turn, may help tease apart the functional importance of male-female interactions during the evolution of reproductive isolation.
| ACKNOWLEDGEMENTS |
|---|
| FOOTNOTES |
|---|
1 Present address: Department of Biology, Nobel Hall of Science, Gustavus Adolphus College, St. Peter, MN 56082.
| LITERATURE CITED |
|---|
ADAMS, M. D., S. E. CELNIKER, R. A. HOLT, C. A. EVANS, J. D. GOCAYNE et al., 2000 The genome sequence of Drosophila melanogaster. Science 287: 2185-2195.
AGUADÉ, M., 1999 Positive selection drives the evolution of the Acp29AB accessory gland protein in Drosophila. Genetics 152: 543-551.
AGUADÉ, M., N. MIYASHITA and C. H. LANGLEY, 1992 Polymorphism and divergence in the Mst26A male accessory gland gene region in Drosophila. Genetics 132: 755-770.
AIGAKI, T., I. FLEISCHMANN, P. S. CHEN and E. KUBLI, 1991 Ectopic expression of sex peptide alters reproductive behavior of female D. melanogaster. Neuron 7: 557-563.
AKASHI, H., 1994 Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy. Genetics 136: 927-935.
ALBRIGHT, S. N., 2003 Molecular and genetic characterization of Acp29AB and identification of Acp interactors in Drosophila melanogaster. Ph.D. Thesis, Cornell University, Ithaca, NY.
ANDREWS, J., G. G. BOUFFARD, C. CHEADLE, J. LU, K. G. BECKER et al., 2000 Gene discovery using computational and microarray analysis of transcription in the Drosophila melanogaster testis. Genome Res. 10: 2030-2043.
BAUER, V. L., and C. F. AQUADRO, 1997 Rates of DNA sequence evolution are not sex-biased in Drosophila melanogaster and D. simulans. Mol. Biol. Evol. 14: 1252-1257.
BEGUN, D. J., and P. WHITLEY, 2000 Reduced X-linked nucleotide polymorphism in Drosophila simulans. Proc. Natl. Acad. Sci. USA 97: 5960-5965.
BEGUN, D. J., P. WHITLEY, B. L. TODD, H. M. WALDRIP-DAIL and A. G. CLARK, 2000 Molecular population genetics of male accessory gland proteins in Drosophila. Genetics 156: 1879-1888.
BERTRAM, M. J., D. M. NEUBAUM and M. F. WOLFNER, 1996 Localization of the Drosophila male accessory gland protein Acp36DE in the mated female suggests a role in sperm storage. Insect Biochem. Mol. Biol. 26: 971-980.
BETANCOURT, A. J., D. C. PRESGRAVES and W. J. SWANSON, 2002 A test for faster X evolution in Drosophila. Mol. Biol. Evol. 19: 1816-1819.
BLOCH QAZI, M. C., and M. F. WOLFNER, 2003 An early role for the Drosophila melanogaster male seminal protein Acp36DE in female sperm storage. J. Exp. Biol. 206: 3521-3528.
BOUTANAEV, A. M., A. I. KALMYKOVA, Y. Y. SHEVELYOV and D. I. NURMINSKY, 2002 Large clusters of co-expressed genes in the Drosophila genome. Nature 420: 666-669.
BRUDNO, M., C. B. DO, G. M. COOPER, M. F. KIM, E. DAVYDOV et al., 2003 LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA. Genome Res. 13: 721-731.
BURGE, C., and S. KARLIN, 1997 Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268: 78-94.
CARNINCI, P., and Y. HAYASHIZAKI, 1999 High-efficiency full-length cDNA cloning. Methods Enzymol. 303: 19-44.
CELNIKER, S. E., D. A. WHEELER, B. KRONMILLER, J. W. CARLSON, A. HALPERN et al., 2002 Finishing a whole-genome shotgun: release 3 of the Drosophila melanogaster euchromatic genome sequence. Genome Biol. 3: RESEARCH0079.
CHAPMAN, T., J. BANGHAM, G. VINTI, B. SEIFRIED, O. LUNG et al., 2003 The sex peptide of Drosophila melanogaster: female post-mating responses analyzed by using RNA interference. Proc. Natl. Acad. Sci. USA 100: 9923-9928.
CHAPMAN, T., and S. J. DAVIES, 2004 Functions and analysis of the seminal fluid proteins of male Drosophila melanogaster fruit flies. Peptides 25: 1477-1490.
CHAPMAN, T., L. F. LIDDLE, J. M. KALB, M. F. WOLFNER and L. PARTRIDGE, 1995 Cost of mating in Drosophila melanogaster females is mediated by male accessory gland products. Nature 373: 241-244.
CHEN, P. S., E. STUMM-ZOLLINGER, T. AIGAKI, J. BALMER, M. BIENZ et al., 1988 A male accessory gland peptide that regulates reproductive behavior of female D. melanogaster. Cell 54: 291-298.
CIRERA, S., and M. AGUADÉ, 1997 Evolutionary history of the sex-peptide (Acp70A) gene region in Drosophila melanogaster. Genetics 147: 189-197.
CIVETTA, A., and R. S. SINGH, 1995 High divergence of reproductive tract proteins and their association with postzygotic reproductive isolation in Drosophila melanogaster and Drosophila virilis group species. J. Mol. Evol. 41: 1085-1095.
CLARK, A. G., and D. J. BEGUN, 1998 Female genotypes affect sperm displacement in Drosophila. Genetics 149: 1487-1493.
CLARK, A. G., M. AGUADÉ, T. PROUT, L. G. HARSHMAN and C. H. LANGLEY, 1995 Variation in sperm displacement and its association with accessory gland protein loci in Drosophila melanogaster. Genetics 139: 189-201.
CLARK, A. G., D. J. BEGUN and T. PROUT, 1999 Female × male interactions in Drosophila sperm competition. Science 283: 217-220.
COLEMAN, S., B. DRAHN, G. PETERSEN, J. STOLOROV and K. KRAUS, 1995 A Drosophila male accessory gland protein that is a member of the serpin superfamily of proteinase inhibitors is transferred to females during mating. Insect Biochem. Mol. Biol. 25: 203-207.
COULTHART, M. B., and R. S. SINGH, 1988 Differing amounts of genetic polymorphism in testes and male accessory glands of Drosophila melanogaster and Drosophila simulans. Biochem. Genet. 26: 153-164.
DOMAZET-LOSO, T., and D. TAUTZ, 2003 An evolutionary analysis of orphan genes in Drosophila. Genome Res. 13: 2213-2219.
DURET, L., and D. MOUCHIROUD, 1999 Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis. Proc. Natl. Acad. Sci. USA 96: 4482-4487.
EBERHARD, W. G., and C. CORDERO, 1995 Sexual selection by cryptic female choice on male seminal products - a new bridge between sexual selection and reproductive physiology. Trends Ecol. Evol. 10: 493-496.
EMBERLY, E., N. RAJEWSKY and E. D. SIGGIA, 2003 Conservation of regulatory elements between two species of Drosophila. BMC Bioinformatics 4: 57.
FRIEDMAN, R., and A. L. HUGHES, 2003 The temporal distribution of gene duplication events in a set of highly conserved human gene families. Mol. Biol. Evol. 20: 154-161.
GU, Z., A. CAVALCANTI, F. C. CHEN, P. BOUMAN and W. H. LI, 2002 Extent of gene duplication in the genomes of Drosophila, nematode, and yeast. Mol. Biol. Evol. 19: 256-262.
HEIFETZ, Y., O. LUNG, E. A. FRONGILLO, JR. and M. F. WOLFNER, 2000 The Drosophila seminal fluid protein Acp26Aa stimulates release of oocytes by the ovary. Curr. Biol. 10: 99-102.
HERNDON, L. A., and M. F. WOLFNER, 1995 A Drosophila seminal fluid protein, Acp26Aa, stimulates egg laying in females for 1 day after mating. Proc. Natl. Acad. Sci. USA 92: 10114-10118.
HOLLOWAY, A. K., and D. J. BEGUN, 2004 Molecular evolution and population genetics of duplicated accessory gland protein genes in Drosophila. Mol. Biol. Evol. 21: 1625-1628.
HUGHES, A. L., and M. NEI, 1988 Pattern of nucleotide substitution at major histocompatibility complex class I loci reveals overdominant selection. Nature 335: 167-170.
INGMAN-BAKER, J., and E. P. CANDIDO, 1980 Proteins of the Drosophila melanogaster male reproductive system: two-dimensional gel patterns of proteins synthesized in the XO, XY, and XYY testis and paragonial gland and evidence that the Y chromosome does not code for structural sperm proteins. Biochem. Genet. 18: 809-828.