Genetics, Vol. 176, 1131-1137, June 2007, Copyright © 2007
doi:10.1534/genetics.106.069245

* Section of Evolution and Ecology, University of California, Davis, California 95616 and
Department of Biology and Genome Sciences Center, University of North Carolina, Chapel Hill, North Carolina 27599
1 Corresponding author: Section of Evolution and Ecology, University of California, Davis, CA 95616.
E-mail: djbegun{at}ucdavis.edu
ABSTRACT
While the origin of new genes by duplication of pre-existing coding sequence is clearly established as an important component of genome evolution, the question of novel genetic functions that do not clearly derive from closely related genes has received less attention. A recent whole genome analysis of annotated Drosophila melanogaster genes was specifically designed to identify empirically validated genes that have no clearly homologous gene-related sequences in D. melanogaster or its close relatives (LEVINE et al. 2006). We refer to this class of orphan genes as "de novo," to suggest the possibility that they may derive from ancestrally noncoding sequence. Such genes would likely have novel functions that had recently evolved under directional selection in D. melanogaster. LEVINE et al. (2006) proposed that there are a minimum of five such genes in D. melanogaster and/or D. simulans that are probably absent from D. yakuba, D. erecta, and D. ananassae. These D. melanogaster/D. simulans putative de novo genes are strongly testis biased in expression, which supports the hypothesis that male reproductive functions are under particularly strong selection for novel functions. Interestingly, four of the five genes reported in LEVINE et al. (2006) are X-linked. One of these X-linked genes is located near previously identified novel, testis-expressed genes in D. melanogaster, which suggests the possibility that larger-scale, chromosomal phenomena contribute to the origination patterns of such genes (LEVINE et al. 2006).
Despite the obvious appeal of investigating novel genes in the D. melanogaster model system, a comprehensive view of the evolution of novelty requires investigation of other lineages. Comparative genomic investigation of novelty enables stronger inferences regarding evolutionary patterns and processes than can be gleaned from a strictly D. melanogaster-centric viewpoint. However, an investigative cost is incurred with increasing divergence from D. melanogaster because the advantage of its high quality genome sequence and annotation is compromised by reduced quality of genome sequence alignments. In light of these considerations, the melanogaster subgroup of Drosophila is a nearly ideal system for the investigation of novelty. The subgroup contains a number of species of varying phylogenetic distance from each other and from D. melanogaster. Furthermore, sequence divergence between D. melanogaster and these other species is sufficiently low to allow high quality alignments over much of the genome.
The identification of putative de novo genes in species that are closely related to D. melanogaster requires empirical investigation into the gene complement of these other species. A comprehensive description of transcriptomes from other Drosophila species, which would greatly facilitate investigation of these issues, is currently unavailable. However, description of male reproductive tissue (e.g., accessory gland and testis) transcriptomes may be a relatively efficient strategy for identifying putative de novo genes. For example, investigation of D. yakuba and D. erecta accessory gland cDNA libraries revealed a number of potential de novo genes in these species (BEGUN et al. 2006). It should be noted that these approaches bias one toward discovery of novel genes functioning in reproduction. Our goal here was to extend these earlier studies (BEGUN et al. 2006; LEVINE et al. 2006) by identifying potential de novo testis-biased genes in D. yakuba and/or D. erecta through analysis of a D. yakuba testis-derived cDNA library, which was sequenced as part of the D. yakuba genome project (http://www.dpgp.org and http://genome.wustl.edu).
MATERIALS AND METHODS
Identification of candidate novel genes from testis ESTs:
We used procedures similar to those described in WAGSTAFF and BEGUN (2005), BEGUN et al. (2006), and LEVINE et al. (2006) to identify putative de novo genes. These procedures included an initial computational analysis of D. yakuba ESTs vs. genomic data from D. yakuba, D. melanogaster, D. erecta, and D. ananassae to identify a set of candidates. Genome assemblies of orthologous regions were then compared to generate microsyntenic alignments between D. yakuba and the other species. Candidate genes were characterized by RACE in D. yakuba. In many cases alignments, transcript data, and computational analysis of orthologous regions very strongly suggested that a given D. yakuba gene was absent from another species. In several cases, however, other species had highly diverged, yet apparently homologous sequence. For these cases, RT-PCR or reverse Northern analysis was used to investigate transcription in other species.
BLAST analysis of ESTs:
We used BLAST analysis of D. yakuba genome assembly v2.0 (http://genome.wustl.edu) to identify the genomic region corresponding to each of 8772 ESTs. ESTs were also compared to SNAP (KORF 2004) gene predictions in D. yakuba. SNAP predictions that overlapped an EST were subsequently used in downstream analysis to evaluate regions of interest. Otherwise, ESTs alone were used in BLAST analyses. D. yakuba genomic regions corresponding to D. yakuba ESTs or EST/SNAP sequences were sequentially BLASTed against local BLAST databases of the D. melanogaster genome and against the NCBI trace archive for D. ananassae and D. erecta. We also compared these regions to all known transposable element sequences. Only the best BLAST hit from each database was retained for further analysis. For each D. yakuba region, we generated a summary score, which was the product of the e-values from non-D. yakuba BLAST comparisons (LEVINE et al. 2006). Regions with the highest values (i.e., worst matches across all data sets) became our candidates for D. yakuba-specific de novo genes, which ultimately derive from the testis EST collection. Following the same basic protocol, we created an analogous list of D. yakuba/D. erecta-specific orphans by limiting the candidate list to those with genes present in both D. yakuba and D. erecta, but absent in the other genomic data sets.
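The summary score described above (the product of best-hit e-values across the non-D. yakuba databases) can be sketched as follows. This is only an illustration, not the authors' script: the region names, e-values, database labels, and the `floor` parameter are hypothetical. The product is computed on a log10 scale because multiplying several very small e-values can underflow, and a floor is assumed because BLAST may report an e-value of exactly 0 for very strong hits.

```python
import math

def summary_log_score(best_evalues, floor=1e-200):
    """Return log10 of the product of best-hit e-values across databases.

    best_evalues maps a database label to the e-value of the best hit.
    floor is an assumed lower bound so that e-values of 0 can be logged.
    """
    return sum(math.log10(max(e, floor)) for e in best_evalues.values())

def rank_candidates(regions):
    """Sort region names from worst matches (highest score) to best matches."""
    scores = {name: summary_log_score(ev) for name, ev in regions.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical input: one conserved region and one orphan-like region.
regions = {
    "region_A": {"D. melanogaster": 1e-40, "D. erecta": 1e-55,
                 "D. ananassae": 1e-20, "TE": 5.0},
    "region_B": {"D. melanogaster": 2.0, "D. erecta": 0.5,
                 "D. ananassae": 8.0, "TE": 3.0},
}
ranking = rank_candidates(regions)  # orphan-like regions come first
```

Ranking on the log scale preserves the order given by the raw product, so the regions at the top of the list are those with the weakest matches across all data sets.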
Syntenic alignment and de novo status:
BLAT comparisons of D. yakuba testis ESTs/cDNAs to the D. yakuba genome were used to identify D. yakuba genomic regions of variable size (generally several kilobases), which were then compared to the D. melanogaster, D. erecta, and D. ananassae genome assemblies (BLAT via the UCSC Genome Browser; KENT et al. 2002; http://genome.ucsc.edu). Each putative orthologous gene region was then investigated in detail by pairwise alignments among species using the Martinez/Needleman-Wunsch algorithm as implemented in DNASTAR. For some D. yakuba gene regions, alignments to other species revealed evidence of homologous sequence in D. melanogaster, D. erecta, or D. ananassae, but no obvious evidence of an open reading frame (ORF) that was orthologous to the D. yakuba gene. In such cases, we computationally investigated the genomic sequence in the orthologous region for ORFs to determine protein-coding capacity in other (i.e., non-D. yakuba) species and whether any predicted proteins associated with these ORFs showed sequence similarity or similar protein lengths relative to the candidate. For D. yakuba genes for which these analyses left the status of an orthologous ORF in doubt in another species, we used RT-PCR on RNA isolated from whole males and females of the appropriate species to investigate whether there was any evidence of transcription of the orthologous region (Table 1). These experiments were designed to detect highly diverged orthologous genes that would not typically be identified on the basis of sequence similarity. We also investigated whether putative D. yakuba/D. erecta de novo genes corresponded to D. melanogaster ESTs in the orthologous genomic region. We found no evidence of D. melanogaster ESTs (or genes) for any of the putative de novo genes presented in Table 1.
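The computational search for ORFs in an orthologous region can be illustrated with a simple six-frame scan. The sketch below is a generic approach, not the software used in this study; the function names and the `min_codons` threshold are hypothetical, and an ORF is assumed to run from an ATG to the first in-frame stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(seq, min_codons=30):
    """List ATG-to-stop ORFs in all six reading frames.

    Each hit is (strand, start, end, n_codons). Coordinates are 0-based,
    end-exclusive, and for the "-" strand refer to the reverse-complemented
    sequence. n_codons counts the ATG but not the stop codon.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i - start) // 3
                    if n_codons >= min_codons:
                        hits.append((strand, start, i + 3, n_codons))
                    start = None  # look for the next ATG in this frame
    return hits

# Toy sequence: ATG followed by five GCT codons and a TAA stop.
toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
toy_orfs = find_orfs(toy, min_codons=3)
```

Predicted proteins from such hits could then be compared to the D. yakuba candidate for sequence similarity and length, as described in the text.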
Characterization of transcripts and predicted proteins from candidate genes:
Each candidate gene was subjected to 5' and 3' RACE on D. yakuba line Tai18E2, the genome sequence stock, followed by sequence analysis of RACE products. Sequence discrepancies (e.g., indels or SNPs) between RACE products (or RT-PCR products, where applicable) and genomic sequence were resolved in favor of the D. yakuba genome sequence. We determined whether D. yakuba candidate genes were testis biased in expression by using RT-PCR on RNA isolated from dissected male testis, accessory gland and carcass, as well as RNA isolated from whole females, all from line Tai18E2. BLASTp was used to compare predicted proteins derived from RACE and EST analyses to protein databases.
Splicing of transcripts for non-D. yakuba species was inferred from sizes of genomic regions vs. RT-PCR products or by direct sequencing of RT-PCR products. In some instances, the longest ORF in D. yakuba corresponded to a D. teissieri sequence that did not share the putative D. yakuba initiation codon. In such cases we used the next in-frame, shared initiation codon as the putative start of the protein-coding region. Cases in which there was no shared initiation codon corresponding to a long ORF (see RESULTS) could be taken as support for the hypothesis that a gene produces a noncoding RNA.
All analysis and inference based on EST data, which were used to generate our list of candidates, were ultimately rechecked after the D. yakuba/D. erecta genes of interest were characterized by 5' and 3' RACE. Thus, any biases associated with analysis of ESTs did not affect the ultimate inference of de novo/orphan status. Gene names used throughout this article derive from the definition line of the GenBank entry of the associated EST, which can be recovered from GenBank. For example, gene 22f01y1 (Table 1) is named on the basis of the definition line for the EST in GenBank accession CV790171, which can be recovered from GenBank by searching on zaa22f01.y1. ESTs corresponding to the genes in Table 1 can be found this way, using the zaa prefix. Throughout this article we use abbreviated gene names. New sequences for this report can be found in GenBank accessions EF508208–EF508269 and EF525530–EF525535. The phylogenetic relationships of the melanogaster group species investigated here (Table 1) can be found at http://flybase.bio.indiana.edu/blast/.
Polymorphism and divergence:
Isofemale or inbred lines of D. yakuba (obtained from P. Andolfatto, University of California, San Diego) and D. teissieri (obtained from M. Long, University of Chicago) were used for the population genetic or divergence analysis. Direct sequencing of PCR products was used for inbred lines. For each isofemale line, high-fidelity PCR was followed by cloning, colony PCR, and, finally, sequencing of a single insert. Summary statistics were calculated in DnaSP (ROZAS et al. 2003).
RESULTS
These data left us with seven candidate, recently evolved genes. For some genes, the inference that they are recently evolved was weakened by suspect alignments of microsyntenic regions, which made the design of RT-PCR experiments problematic. In these cases, reverse Northern analysis (WAGSTAFF and BEGUN 2005) provided no evidence of transcription of the orthologous regions (Table 1 and supplemental Figure 1 at http://www.genetics.org/supplemental/), which supports (but does not prove) our inference of recent evolution in D. yakuba/D. erecta. One gene, 93a11, appears to be absent from the orthologous region of D. teissieri as determined by sequence data from multiple D. teissieri isofemale lines, which suggests a recent origin in D. yakuba, subsequent to D. yakuba/D. teissieri speciation. None of the predicted proteins from the 11 genes that were investigated corresponds to known proteins or harbors known functional domains as determined by BLASTp analysis (MARCHLER-BAUER et al. 2003).
Gene organization and natural variation:
The D. yakuba genes described above are predicted to code for small proteins (Table 1). All transcripts are spliced (Figure 1), have canonical splice junctions, and are polyadenylated, which strongly supports the proposition that the genes are real rather than the result of experimental artifacts. Five of the genes have at least one untranslated exon. One gene has two 5' untranslated exons; another has two 3' untranslated exons. The putative ORFs for genes 52b02, 57b07, and 20f10 are <90 bp long. Thus, they either code for small peptides or do not produce proteins (TUPY et al. 2005). Support for the possibility that some of the genes described here are not protein coding comes from the fact that for 52b02 and 20f10, alignment of the orthologous region from D. teissieri and D. yakuba suggests that D. teissieri does not have an orthologous start codon. The idea that several novel reproduction-related genes in the D. yakuba/D. erecta lineage may be RNA genes (BEGUN et al. 2006) is worth further consideration, but will require much additional work. Sequences corresponding to transcripts of the genes in Table 1 can be found in supplemental Table 1 (http://www.genetics.org/supplemental/).
25-codon windows with an average dN/dS > 4. Gene 90g02 is also the only gene (of three) on which we could carry out a McDonald-Kreitman test (MCDONALD and KREITMAN 1991) that showed a significant deviation from the neutral model (fixed synonymous = 20, polymorphic synonymous = 16, fixed nonsynonymous = 89, polymorphic nonsynonymous = 26, G-test, P = 0.01), supporting adaptive protein divergence. Remarkably, syntenic alignments strongly suggest that this gene was recently lost in the D. erecta lineage (Table 1), in spite of the fact that it has a history of directional selection in the sibling species. This finding, which mirrors results from melanogaster subgroup Acps (BEGUN and LINDFORS 2005), supports the idea that functional roles of reproduction-related genes and the modes of selection impinging on them may change over relatively short time scales.
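For readers who wish to check the McDonald-Kreitman result for 90g02, the following sketch computes a G-test of independence on the 2 x 2 table of counts given above. It assumes an uncorrected G statistic with one degree of freedom; whether a continuity or Williams correction was applied in the original analysis is not stated, so the function is only an approximation of that calculation. The function name is hypothetical.

```python
import math

def mk_g_test(fixed_syn, poly_syn, fixed_nonsyn, poly_nonsyn):
    """Return (G, P) for a 2 x 2 McDonald-Kreitman table.

    Rows are synonymous and nonsynonymous sites; columns are fixed
    differences and polymorphisms. No continuity correction is applied.
    """
    observed = [[fixed_syn, poly_syn], [fixed_nonsyn, poly_nonsyn]]
    total = sum(sum(row) for row in observed)
    row_sums = [sum(row) for row in observed]
    col_sums = [observed[0][j] + observed[1][j] for j in range(2)]
    g = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            if observed[i][j] > 0:  # a zero cell contributes nothing to G
                g += 2.0 * observed[i][j] * math.log(observed[i][j] / expected)
    g = max(g, 0.0)  # guard against tiny negative values from rounding
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(g / 2.0))
    return g, p_value

# Counts reported in the text for gene 90g02.
g_stat, p_val = mk_g_test(fixed_syn=20, poly_syn=16,
                          fixed_nonsyn=89, poly_nonsyn=26)
```

With these counts the P value should be on the order of 0.01, in line with the value reported in the text. The excess of fixed nonsynonymous differences relative to nonsynonymous polymorphisms is what drives the deviation from the neutral expectation.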
20% of de novo genes to be X-linked. The binomial probability of observing four or more X-linked de novo genes is 0.03. However, if the putative duplicated de novo genes zaa29a04x1 and zaa29a04x1-related derive from a single origination event, there are six originations, three of which are X-linked. This interpretation provides no statistical support for X-linked enrichment in the D. yakuba/D. erecta clade.
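The binomial probabilities behind this comparison can be reproduced with a short calculation. The sketch assumes a per-origination probability of X-linkage of 0.2 (the 20% expectation above) and a total of seven originations when the two duplicates are counted separately; the total of seven is an inference from the six originations mentioned for the alternative interpretation, not a number stated in this paragraph. The function name is hypothetical.

```python
from math import comb

def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Duplicates counted as separate originations (assumed n = 7, 4 X-linked).
p_separate = binomial_tail(4, 7, 0.2)

# Duplicates counted as a single origination (n = 6, 3 X-linked).
p_single = binomial_tail(3, 6, 0.2)
```

Under these assumptions the first probability is about 0.03 and the second is close to 0.1, which is consistent with the statement that the single-origination interpretation provides no statistical support for X-linked enrichment.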
The duplicates zaa29a04x1 and zaa29a04x1-related reside 5 kb apart at the base of the X chromosome (D. yakuba assembly version 2.0). Remarkably, the D. melanogaster/D. simulans novel genes, Sdic (NURMINSKY et al. 1998), CG15323 (LEVINE et al. 2006), and hydra (H.-P. YANG, personal communication) are also testis biased in expression and located in the same genomic region. Thus, the 465-kb D. melanogaster region spanned by Sdic and hydra, which is not rearranged between D. melanogaster and D. yakuba (J. RANZ, personal communication), contains five novel testis-biased genes in two lineages. These data strongly suggest that a conserved property of this chromosomal region supports increased origination rates of novel testis-specific genes in Drosophila. The fact that the putative D. yakuba testis-biased orphan, 93a11, is only 7.5 kb from the D. yakuba/D. erecta-specific accessory gland protein gene, Acp223 (BEGUN et al. 2006), further supports the notion that physical locations of male-biased novel genes are nonrandom.
DISCUSSION
D. melanogaster novel, testis-biased genes were significantly enriched for X-linkage (LEVINE et al. 2006). The same general pattern was observed in the D. yakuba lineage, although it was not significant. Nevertheless, appearance of the same chromosomal pattern in two Drosophila lineages (in which 7 of 11 proposed originations are X-linked, when only 2 are expected) suggests the possibility that it is a manifestation of a fundamental process relating to the origin and/or fixation of such genes. The trend toward X-linkage in putative de novo genes is an interesting counterpoint to the origin of gene duplications by retrotransposition in Drosophila (BETRAN et al. 2002) and to the observation that genes showing male-biased expression are under-represented on the D. melanogaster X chromosome (PARISI et al. 2003).
Retrotransposed genes are more likely to have parental copies on the X and descendant, autosomal copies with testis-specific expression (BETRAN et al. 2002; BETRAN and LONG 2003; BAI et al. 2007), suggesting that novel, testis-expressed genes should tend to be autosomal. One possible explanation for the difference in chromosomal distribution for retrotransposed genes vs. the genes described here is that the genes described here are generally younger than the genes identified by BETRAN et al. (2002) and BAI et al. (2007). Alternatively, the mutation process underlying duplication by retrotransposition could be biased toward X ancestry and autosomal descent, while the mutation process for the genes described here could simply be biased toward the X chromosome. An interesting, if speculative, possibility for the preponderance of X-linkage among putative de novo testis-biased Drosophila genes is hypertranscription of the X chromosome in the male germline due to dosage compensation (GUPTA et al. 2006). This model proposes that transcription of noncoding regions in the male germline would be more likely to occur for the X chromosome because this chromosome would be hypertranscribed relative to the autosomes. Hypertranscription of genes could be associated with spurious transcription through regional effects (e.g., transcriptional domains) or through more local effects, such as readthrough transcription.
The issue of X-linkage among de novo genes is related to a remarkable feature of the data presented here and in previous work on novel D. melanogaster genes (LEVINE et al. 2006), namely, that the proximal region of the X appears to be a hotspot for fixation of novel testis-expressed genes. Two novel D. yakuba genes and three novel D. melanogaster genes are contained within a 500-kb region of the proximal X chromosome. This region may correspond to a testis expression domain in D. melanogaster (BOUTANAEV et al. 2002), which could lead to an increased origination rate of male reproduction-related genetic novelties by cooptation of noncoding DNA into coding function (BEGUN et al. 2006; LEVINE et al. 2006). This idea, if true, suggests that the extent and distribution of such chromosomal expression domains across lineages could have a substantial impact on the evolution of novelty in different species.
The path toward an understanding of the biological basis of selection for novelty in Drosophila male reproduction lies in revealing the functional biology of the genes reported here and elsewhere (e.g., LONG and LANGLEY 1993; NURMINSKY et al. 1998; BETRAN et al. 2002; BEGUN et al. 2006; LEVINE et al. 2006), through reverse genetic, cell biological, and biochemical analysis. Moreover, the existence of independently evolved, novel genes in closely related species provides an opportunity for comparative investigation of novel genes functioning in male reproduction. Such research could reveal whether similar biological processes are under selection for novelty in D. melanogaster and D. yakuba or, instead, whether each lineage has a unique constellation of selection pressures driving the evolution of novel male reproductive function.
ACKNOWLEDGEMENTS
LITERATURE CITED
BAI, Y., C. CASOLA, C. FESCHOTTE and E. BETRAN, 2007 Comparative genomics reveals a constant rate of origination and convergent acquisition of functional retrogenes in Drosophila. Genome Biol. 8(1): R11.
BEGUN, D. J., and H. A. LINDFORS, 2005 Rapid evolution of genomic Acp complement in the melanogaster subgroup of Drosophila. Mol. Biol. Evol. 22: 2010–2021.
BEGUN, D. J., H. A. LINDFORS, M. E. THOMPSON and A. K. HOLLOWAY, 2006 Recently evolved genes identified from Drosophila yakuba and D. erecta accessory gland expressed sequence tags. Genetics 172: 1675–1681.
BETRAN, E., and M. LONG, 2003 Dntf-2r, a young Drosophila retroposed gene with male specific expression. Genetics 164: 977–988.
BETRAN, E., K. THORNTON and M. LONG, 2002 Retroposed new genes out of the X in Drosophila. Genome Res. 12: 1854–1859.
BOUTANAEV, A. M., A. I. KALMYKOVA, Y. Y. SHEVELYOV and D. I. NURMINSKY, 2002 Large clusters of co-expressed genes in the Drosophila genome. Nature 420(6916): 666–669.
GUPTA, V., M. PARISI, D. STURGILL, R. NUTTALL, M. DOCTOLERO et al., 2006 Global analysis of X-chromosome dosage compensation. J. Biol. 5(1): 3.
JONES, C. D., and D. J. BEGUN, 2005 Parallel evolution of chimeric fusion genes. Proc. Natl. Acad. Sci. USA 102: 11373–11378.
KENT, W. J., C. W. SUGNET, T. S. FUREY, K. M. ROSKIN, T. H. PRINGLE et al., 2002 The Human Genome Browser at UCSC. Genome Res. 12(6): 996–1006.
KORF, I., 2004 Gene finding in novel genomes. BMC Bioinformatics 5: 59.
LEVINE, M. T., C. D. JONES, A. D. KERN, H. A. LINDFORS and D. J. BEGUN, 2006 Novel genes derived from non-coding DNA in Drosophila melanogaster are frequently X-linked and show testis-biased expression. Proc. Natl. Acad. Sci. USA 103: 9935–9939.
LI, W.-H., 1997 Molecular Evolution. Sinauer Associates, Sunderland, MA.
LONG, M., and C. H. LANGLEY, 1993 Natural selection and the origin of jingwei, a chimeric processed functional gene in Drosophila. Science 260: 91–95.
LONG, M., M. DEUTSCH, W. WANG, E. BETRAN, F. G. BRUNET et al., 2003 Origin of new genes: evidence from experimental and computational analysis. Genetica 118: 171–182.
LOPPIN, B., D. LEPETIT, S. DORUS, P. COUBLE and T. KARR, 2005 Origin and neofunctionalization of a Drosophila paternal effect gene essential for zygote viability. Curr. Biol. 15: 87–93.
MARCHLER-BAUER, A., J. B. ANDERSON, C. DEWEESE-SCOTT, N. D. FEDOROVA, L. Y. GEER et al., 2003 CDD: a curated Entrez database of conserved domain alignments. Nucleic Acids Res. 31: 383–387.
MCDONALD, J., and M. KREITMAN, 1991 Adaptive protein evolution at the Adh locus in Drosophila. Nature 351: 652–654.
NURMINSKY, D. I., M. V. NURMINSKAYA, D. DEAGUIAR and D. L. HARTL, 1998 Selective sweep of a newly evolved sperm-specific gene in Drosophila. Nature 396: 572–575.
OHNO, S., 1970 Evolution by Gene Duplication. Springer-Verlag, Berlin.
PARISI, M., R. NUTTALL, D. NAIMAN, G. BOUFFARD, J. MALLEY et al., 2003 Paucity of genes on the Drosophila X chromosome showing male-biased expression. Science 299: 697–700.
PRÖSCHEL, M., Z. ZHANG and J. PARSCH, 2006 Widespread adaptive evolution of Drosophila genes with sex-biased expression. Genetics 174: 893–900.
ROZAS, J., J. C. SANCHEZ-DELBARRIO, X. MESSEGUER and R. ROZAS, 2003 DnaSP, DNA polymorphism analysis by the coalescent and other methods. Bioinformatics 19: 2496–2497.
TUPY, J. L., A. M. BAILEY, G. DAILEY, M. EVANS-HOLM, C. W. SIEBEL et al., 2005 Identification of putative noncoding polyadenylated transcripts in Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 102: 5495–5500.
WAGSTAFF, B. J., and D. J. BEGUN, 2005 Comparative genomics of accessory gland protein genes in Drosophila melanogaster and D. pseudoobscura. Mol. Biol. Evol. 22: 818–832.
WANG, W., H. YU and M. LONG, 2004 Duplication-degeneration as a mechanism of gene fission and the origin of new genes in Drosophila species. Nat. Genet. 36: 523–527.
Communicating editor: L. HARSHMAN