Genetics, Vol. 170, 207-219, May 2005, Copyright © 2005
doi:10.1534/genetics.104.037283
,1
* Center for Population Biology, University of California, Davis, California 95616
Department of Biology and Carolina Center for Genome Sciences, University of North Carolina, Chapel Hill, North Carolina 27599-3280
1 Corresponding author: Department of Biology, Carolina Center for Genome Sciences, CB 3280, 414 Coker Hall, University of North Carolina, Chapel Hill, NC 27599-3280.
E-mail: cdjones{at}email.unc.edu
| ABSTRACT |
|---|
|
|
|---|
Occasionally, duplication events lead to radical reorganization of gene structures that likely lead to dramatic and immediate functional divergence. One type of radical reorganization is gene fusion, whereby two previously separate and independent genes are fused to form a single contiguous gene. Such chimeric fusion genes (CFGs) have been identified in several taxa. For example, in plants CFGs are implicated in cytoplasmic male sterility (HE et al. 1996). A few CFGs have also been found in vertebrates (FINTA and ZAPHIROPOULOS 2000; ROGALLA et al. 2000; THOMSON et al. 2000; COURSEAUX and NAHON 2001). Finally, several novel CFGs have been described in Drosophila, such as jingwei in Drosophila teissieri and D. yakuba (LONG and LANGLEY 1993), Sdic in D. melanogaster (NURMINSKY et al. 1998), and Adh-finnegan (SULLIVAN et al. 1994; BEGUN 1997).
Two novel Drosophila genes, jingwei and Adh-finnegan, were previously thought to be Alcohol-dehydrogenase (Adh) pseudogenes (FISCHER and MANIATIS 1985; JEFFS and ASHBURNER 1991). In both cases, further analysis showed that these were functional genes that had acquired protein-coding sequence 5' of the Adh-derived region of the gene (LONG and LANGLEY 1993; BEGUN 1997). jingwei is a fusion of the amino terminus of a gene known as yellow emperor (ymp) and a retrotransposed Adh (WANG et al. 2000). The jingwei expression profile has diverged from that of its Adh ancestor. In D. teissieri expression is now limited to the testes (like ymp), although in D. yakuba it is expressed in other tissues as well (LONG and LANGLEY 1993). Adh-finnegan was created by the chromosomal duplication of Adh combined with the recruitment of a new 5' exon of unknown origin (BEGUN 1997). Adh-finnegan appears to be expressed broadly in adult tissues (SULLIVAN et al. 1994). Although these two Adh-derived fusion genes arose via different mechanisms and show dramatically different expression patterns, directional selection appears to have driven rapid amino acid evolution in both genes (LONG and LANGLEY 1993; BEGUN 1997). The fact that two novel Drosophila genes are derived from Adh and share some common aspects of their evolution raises two important questions:
Given the history of the discovery of jingwei and Adh-finnegan, a report of a third putative Adh pseudogene in the obscura group of Drosophila (MARFANY and GONZALEZ-DUARTE 1992; LUQUE et al. 1997) attracted our attention. DNA sequencing showed that this putative pseudogene originated by retrotransposition. Results from polytene in situ hybridization showed that this Adh retrosequence had transposed to chromosome arm E from chromosome U, which is the expected location of Adh on the basis of the conservation of Muller elements (ASHBURNER 1989). This retrotransposed Adh was found in D. subobscura, D. guanche, and D. madeirensis, but not in D. ambigua (VISA et al. 1991; MARFANY and GONZALEZ-DUARTE 1992; LUQUE et al. 1997), suggesting that the gene likely arose within the past 3 million years. LUQUE et al. (1997) sequenced six clones harboring this putative Adh retropseudogene, two each from genomic libraries of D. subobscura, D. guanche, and D. madeirensis. Comparisons of putative Adh pseudogenes to Adh for these species revealed frameshift mutations or indels in 5' and 3' regions flanking the Adh coding regions. The codon homologous to the ATG initiation codon of the ancestral Adh was CTG in both D. guanche clones. Premature stop codons were evident in one of the two D. guanche clones and one of the two D. subobscura clones, but none of the D. madeirensis clones. These observations suggested that at least some of these Adh sequences were no longer functional. The fact that the duplicate Adh was a retrosequence was also interpreted as support for the pseudogene hypothesis, as retrotransposed sequences potentially lack regulatory elements necessary for proper expression. All three species showed elevated amino acid substitution rates (dN) relative to Adh. None of the putative retropseudogenes, however, showed a nonsynonymous to synonymous substitution rate (dN/dS) close to one, the expectation for a neutrally evolving pseudogene. 
Moreover, codon bias increased in the putative Adh retropseudogenes, an unexpected result for a nonfunctional gene. Overall, the data presented a conflicting picture of the Adh retrosequence. Some aspects of the data supported the pseudogene hypothesis, yet others were strangely inconsistent with the hypothesis and were similar to the situation previously observed in the repleta group Adh duplication (SULLIVAN et al. 1994; BEGUN 1997). We present here our analysis of this retrotransposed Adh. We show that this putative Adh retropseudogene is actually part of a new chimeric fusion gene that is the result of an Adh mRNA inserting into the Gapdh-like gene, CG9010. This fusion gene is actively and widely transcribed. While the 5' promoters and transcription patterns of this gene are similar to those of CG9010, the protein-coding region has diverged for both the CG9010 and the Adh-like regions. Population genetic and phylogenetic analyses suggest that this amino acid evolution resulted from directional selection shortly after the chimeric fusion gene was formed.
| MATERIALS AND METHODS |
|---|
|
|
|---|
DNA sequencing:
PCR products were sequenced directly using an ABI 377 automated sequencer and BigDye Terminator chemistry (Applied BioSystems, Foster City, CA).
RNA extraction, cDNA preparation, and RT-PCR:
Poly(A+) RNA was prepared from whole flies or larvae using a MicroPoly(A) kit (Ambion, Austin, TX). cDNA for reverse transcriptase-PCR and rapid amplification of cDNA ends (RACE) was prepared from this RNA using the SMART RACE cDNA amplification kit (CLONTECH, Palo Alto, CA). SuperScript II reverse transcriptase (GIBCO BRL, Rockville, MD) was used for all RT reactions. Gene-specific primers were used to assay gene expression by RT-PCR on cDNA isolated from larvae (first, second, and third instar), whole adult males, and whole adult females.
Genomic library construction and screening:
D. subobscura genomic DNA was isolated from adult flies, partially digested with Sau3aI (New England BioLabs, Beverly, MA), and then dephosphorylated with CIAP (Promega, Madison, WI). These fragments were ligated into the Lambda DASH II vector according to the manufacturer's instructions (Stratagene, La Jolla, CA; T4 ligase was from GIBCO BRL), followed by packaging using Gigapack III Gold packaging reactions (Stratagene, La Jolla, CA). The library was amplified once on plates using XL-1Blue MRA [P2] cells.
Primary and secondary plaque lifts were carried out on Nytran Nylon membranes (Schleicher & Schuell, Keene, NH). The library was screened with a 1-kb CG9010 probe that was PCR amplified from D. melanogaster. Because this probe cross-hybridizes to plaques harboring Gapdh, we used PCR with primers designed for D. subobscura Gapdh to rule out false positives. Phage containing CG9010 were digested with EcoRI and PstI, resolved on a 0.7% agarose gel, Southern blotted, and probed with CG9010. The fragment containing D. subobscura CG9010 was subcloned into pBluescript.
Genomic Southern blot analysis of CG9010:
Southern blot analysis was used to infer copy number of CG9010 in D. subobscura. Genomic DNA (5 µg) was purified from D. melanogaster, D. pseudoobscura, D. guanche, and D. subobscura. These samples were digested with PstI or HindIII (GIBCO BRL), electrophoresed on a 0.7% gel, and Southern blotted to Nytran nylon membranes. These blots were then probed with PCR-amplified 1-kb fragments of D. melanogaster CG9010 and then D. subobscura CG9010.
Protein analyses:
One gram of tissue (whole adults and larvae) was homogenized in 2 ml ice-cold homogenization buffer and then centrifuged. Protein concentration in the supernatant was determined using the Bradford method (Bio-Rad, Hercules, CA). We then applied SDS-PAGE to our samples (10% acrylamide resolving gels, 4% acrylamide stacking gels). Typically, 5 µg of sample was boiled and then loaded in each lane. Gels were run at 70 mA constant current for 30–40 min. Gels were electroblotted on nitrocellulose at 0° for 30–60 min at 100 V constant voltage followed by blocking for 1 hr in a TBS-milk solution. Blots were incubated overnight (at 4°) with goat anti-D. melanogaster ADH (courtesy of C. Benyanjati) diluted in TBS-milk solution. Blots were placed in fresh TBS-milk and incubated with the secondary antibody (anti-goat HRP) for 1 hr followed by washing with TBS-Tween. The secondary antibody was visualized with ECL Plus (Amersham Biosciences, Piscataway, NJ) followed by autoradiography.
This approach repeatedly worked well for ADH proteins in all species we assayed (D. melanogaster, D. subobscura, D. yakuba, D. pseudoobscura, and D. hydei). Overexposure of the film to the blot would visualize a number of minor bands, but it was impossible to determine which, if any, would correspond to the band of interest.
We also used allozyme gels to look for residual ADH activity in the D. subobscura CFG. We adapted the protocol of BATTERHAM et al. (1983). Although we tried a variety of conditions, we observed only a single band of activity. This was consistent with what was reported in the literature (see RESULTS).
DNA sequence analysis:
BLASTN and TBLASTX were used to identify similar sequences from the NCBI databases (ALTSCHUL et al. 1997). DNAstar (DNASTAR, Madison, WI) was used for sequence alignments, contig assembly, and restriction mapping. Accession numbers of previously published data used in this analysis are X55390, X55391, M55545, X60112, U68470, U68469, X60113, U68472, U68471, AF175211, and AE003805.
Basic population genetic analyses were done using either DNAsp (ROZAS and ROZAS 1999) or software written by C.D.J. We limited our analysis to regions of high sequence quality. Typically, insertion/deletion polymorphism was ignored in our calculations of population genetics statistics.
Promoter prediction was accomplished using NNPP (REESE 2001) and McPromoter (OHLER et al. 2002). As noted in RESULTS, a threshold of 0.8 was used (which is predicted to give a false positive rate of 0.4% for NNPP). Signal peptide prediction used SignalP (NIELSEN et al. 1997). No signal peptides were detected.
Phylogenetic analysis:
PAML provides a suite of maximum likelihood-based tools for combining DNA sequence and phylogenetic data to test molecular evolutionary hypotheses (YANG 1997; YANG and BIELAWSKI 2000). We used the phylogeny of RAMOS-ONSINS et al. (1998). There are three major steps to using PAML: (1) choice of appropriate model, (2) parameterization of that model, and (3) sequential comparison using log-likelihood ratio tests of simpler to more complex models to evaluate if a more complex model provides a significantly better fit to the data. For clarity, steps 1 and 2 will be described here and step 3 will be presented in RESULTS.
Evolution of the protein-coding regions of CG9010 and of the Adh-derived sequences was analyzed independently using the codon model (Codeml; GOLDMAN and YANG 1994; YANG 1997). In the following sections, the difference in the log likelihood (Δlnl), for the relevant degrees of freedom, implied a P-value <0.05 and was typically <0.001. Unless noted otherwise, model comparisons involving multiple tests remained significant after Bonferroni corrections. Of the codon models, the F3X4 model fit the CG9010 data best; the estimated codon table fit the Adh data best. When appropriate and when a significantly better fit to the data was produced, three additional model parameters were estimated (see YANG 1997). For the analyses discussed in RESULTS, we a priori hypothesize that the lineage created by the formation of the CFG (Adh-Twain) will be undergoing more rapid evolution than the Adh or CG9010 lineages [i.e., hypothesis generation is independent of the data used to test it; see YANG (1997) p. 23]. In several cases, convergence to maximum likelihood estimates was verified by changing the "small difference" parameter (see YANG 1997, p. 19). Reconstruction of ancestral sequences was done using both joint and marginal reconstruction. Ancestral states were identical regardless of method. Note that all amino acid positions are in terms of their position in D. subobscura, not D. melanogaster.
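The model-comparison step described above can be illustrated with a minimal sketch of a likelihood-ratio test between two nested models. This is not PAML code: the function names `chi2_sf` and `likelihood_ratio_test` are hypothetical, the chi-square tail probability is computed with a simple series expansion, and the two log-likelihoods in the example are invented values chosen only so that 2Δlnl = 12 with 8 d.f.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with `df` degrees of freedom."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    # Series expansion of the regularized lower incomplete gamma function P(a, x/2).
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total) and n < 10_000:
        n += 1
        term *= half_x / (a + n)
        total += term
    lower = total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def likelihood_ratio_test(lnl_simple, lnl_complex, df):
    """Return (2*delta_lnl, P) for a simpler model nested in a more complex one."""
    statistic = 2.0 * (lnl_complex - lnl_simple)
    return statistic, chi2_sf(statistic, df)

# Hypothetical log-likelihoods (not taken from the paper).
statistic, p_value = likelihood_ratio_test(-1000.0, -994.0, df=8)
print(f"2*delta_lnl = {statistic:.1f}, P = {p_value:.4f}")
```

The degrees of freedom equal the number of extra free parameters in the more complex model; when several such tests are run, the resulting P-values would still need a multiple-testing correction such as Bonferroni.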
| RESULTS |
|---|
|
|
|---|
60% identity with the amino terminus of the D. melanogaster predicted protein CG9010 (TBLASTX E-value = e-15). A similar region was found in D. madeirensis. The more distantly related D. guanche also contains a homologous region showing ~67% identity with D. melanogaster CG9010 (TBLASTX E-value = e-18). As mentioned in LUQUE et al. (1997), the region immediately upstream of the putative retrosequence initiation codon and downstream of the CG9010-like region is similar to obscura group Adh 5'-UTR. It is not clear from sequence comparisons whether this region represents the larval or adult leader variant. The Adh-like region of the retrosequence retained the Adh 3'-UTR and a viable polyadenylation signal. The region 3' of the Adh-like region 3'-UTR showed no significant similarity to any known gene, although it clearly harbors GEM elements (VIVAS et al. 1999). Given the location of CG9010 in D. melanogaster, the D. subobscura CG9010 would be located on chromosome arm E, which is also the location of the Adh retrosequence.
These results suggest that the Adh pseudogene described by LUQUE et al. (1997) is a chimeric fusion gene that resulted from the retrotransposition of an Adh transcript into a D. subobscura CG9010. Our analysis, however, did not resolve the issue of the frameshift and nonsense mutations associated with at least some copies of the Adh retrosequence. To address this issue, we sequenced DNA encompassing 97% of the potential open reading frame of the chimeric fusion gene and some 5'-flanking sequence from 15 lines of D. subobscura, one line of D. guanche, and one line of D. madeirensis. Complete coding sequences were obtained for one D. subobscura line and one D. guanche line. All sequences had a contiguous open reading frame that included both the CG9010-like region and the Adh-like region. None of the frameshift insertion/deletions observed by LUQUE et al. (1997) were observed in our data, nor were any premature stop codons found in any of the regions surveyed. This suggests that these indels and stop codons may have been sequencing errors in the original article. (If these stop codons and frameshifts do indeed exist, our data suggest that they are rare variants in D. subobscura.) The substitution of the canonical Adh start codon (ATG) by a leucine codon (CTG) reported by LUQUE et al. (1997) was also observed in our D. guanche sequence. From these data, we conclude that this Adh "pseudogene" is likely a novel Adh-derived fusion gene, which we have tentatively named "Adh-Twain." (Mark Twain's famous statement, "The rumors of my demise have been greatly exaggerated," was the inspiration for the name of this gene. We propose the abbreviation AdhT for Adh-Twain.)
CG9010 exists in D. subobscura and D. guanche:
If CG9010 function was necessary for the ancestor of D. subobscura, D. guanche, and D. madeirensis (which is likely given its similarity to Gapdh and its conservation across Drosophila), then CG9010 must have duplicated in this lineage. In other words, if insertion of the retrosequence into CG9010 abolished CG9010 function, then an alternative functional copy of CG9010 should exist in the subobscura clade. We used Southern analysis to determine if CG9010 is present in more than one copy in D. subobscura. Digestion of genomic DNA with several restriction enzymes followed by blotting and hybridization with the 5' end of CG9010 showed that CG9010 is single copy in D. melanogaster and D. pseudoobscura. The CG9010 probe, however, consistently produced two bands in D. subobscura (data not shown), supporting the idea that CG9010 duplicated in this lineage. BLAST searches of D. pseudoobscura suggest the presence of only one copy of CG9010 in the strain used for the genome sequence.
To confirm that our Southern detected a surviving full copy of CG9010, we cloned CG9010. We used several methods to obtain the DNA sequence for the CG9010 homolog in D. subobscura and D. guanche (see MATERIALS AND METHODS). The amino acid sequence of the D. subobscura homolog of CG9010 shares 91% amino acid identity with D. pseudoobscura and 82% amino acid identity with D. melanogaster. ESTs from D. melanogaster show that CG9010 is transcribed, as do our RT-PCR data from D. guanche and D. subobscura (Figure 1).
We used RT-PCR to compare transcription between D. subobscura Adh-Twain and D. subobscura CG9010. Figure 1 suggests no significant differences in the expression patterns of Adh-Twain and CG9010, although CG9010 may be slightly less abundant in males.
Adh-Twain-predicted protein characteristics:
Basic characteristics of the predicted D. subobscura ADH-TWAIN protein are presented in Table 1. The predicted protein is ~40% larger than ADH. Other than size, the most notable difference between the ADH-TWAIN protein and both ADH and CG9010 is that ADH-TWAIN is much more positively charged at pH 7.0 than either CG9010 or ADH. The ADH-derived portion of ADH-TWAIN also appears to have increased in molecular weight relative to ADH.
Western blots (data not shown) revealed a single band of the size expected for ADH in D. melanogaster, D. subobscura, D. guanche, D. yakuba, D. pseudoobscura, D. virilis, and D. hydei. However, we observed no strong secondary band of the size expected for the predicted ADH-TWAIN. [Similarly, jingwei, which is known to produce a protein in vitro (ZHANG et al. 2004), was not visible in our Western blot.] Overexposure of the Western blot revealed a number of minor bands, but it was impossible to determine which, if any, would correspond to the band associated with the predicted ADH-TWAIN. The failure of the ADH antibody to react with ADH-TWAIN is not unexpected, given the very high protein divergence between the ADH-derived portion of ADH-TWAIN and known ADH proteins in Drosophila.
5' regulatory region of Adh-Twain shows similarity to that of CG9010:
If the 5' end of Adh-Twain is derived from a chromosomal duplication of CG9010, then we expect the 5' region of Adh-Twain to harbor regulatory elements, such as promoters, derived from those of CG9010 (expression patterns of Adh-Twain and D. subobscura CG9010 are similar; Figure 1). To investigate this possibility, we gathered sequence data for ~600 bases 5' of the D. subobscura Adh-Twain start codon and ~400 bp of 5'-flanking sequence from D. subobscura CG9010. These sequences were aligned to a sequence from the 5'-flanking region of D. pseudoobscura CG9010. We also used the VISTA browser (http://pipeline.lbl.gov/pseudo/) to qualitatively compare these sequences to D. melanogaster CG9010.
Figure 2 shows a multiple alignment of the 5' regions of CG9010. It is immediately obvious that there are several highly conserved regions. Using a combination of our 5'-RACE data, published cDNA data, interspecific sequence comparisons, and promoter prediction methods (see MATERIALS AND METHODS), we identified the putative 5'-UTRs and putative promoter elements of Adh-Twain and CG9010. Of particular note is the conserved sequence between base 177 and base 357 of the Adh-Twain 5' sequence in Figure 2. This region is very highly conserved in D. guanche Adh-Twain (data not shown), well conserved in D. pseudoobscura CG9010, and weakly conserved in D. melanogaster. The methods of REESE (2001) and OHLER et al. (2002) both suggest a promoter element in this region of Adh-Twain and CG9010 in D. subobscura. This region shows some similarity to TATA-less promoters seen in D. melanogaster (e.g., BURKE and KADONAGA 1997). This conserved region may be important for regulation of Adh-Twain and CG9010 and could potentially contribute to their similar expression patterns. The Adh-Twain 5'-UTR is much larger than the CG9010 5'-UTR. Using parsimony, we infer that the Adh-Twain 5'-UTR represents the derived state and that these sequence insertions arose after duplication of CG9010. The potential biological consequences of this larger UTR are not known, although 5'-UTRs are known to play a critical role in translational regulation (GRAY and WICKENS 1998). Most other insertions/deletions relative to D. pseudoobscura are shared by D. subobscura CG9010 and Adh-Twain.
330 bases of the 5'-UTR and regulatory regions from 15 iso-female lines of D. subobscura. We also collected sequence data from iso-female lines of D. guanche (n = 1) and D. madeirensis (n = 1). These data are summarized in Figure 3 and Table 2.
There are two in-frame indel polymorphisms in D. subobscura Adh-Twain, both of which are located near the junction of the CG9010-like and Adh-like regions (D1, bases 772–787 and D2, 814–820 on Figure 3). We used parsimony to infer that these were deletions relative to the ancestral CG9010 in D. subobscura and D. guanche (these amino acids are also present in CG9010 in D. melanogaster and D. pseudoobscura). Among the surveyed D. subobscura Adh-Twain alleles were three deletion haplotypes: (1) no deletions (1/15); (2) D2 only (3/15); and (3) D1 and D2 (11/15). Our D. madeirensis allele has both the D1 and the D2 deletions. D. guanche has neither of these deletions, but does have an in-frame 6-base deletion from base 889 to 895. All of these deletions occur in a region of high nucleotide polymorphism. This suggests that relative to most of the CG9010-like region and most of the Adh-like region, the intersection of these two regions is under relatively low constraint.
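As a quick sanity check on the "in-frame" description, the deletion coordinates can be tested for lengths that are multiples of three. The sketch below assumes the D1 and D2 ranges read 772–787 and 814–820, and it assumes length is computed as end minus start, the convention implied by the text's "6-base deletion from base 889 to 895". The variable and function names are illustrative only.

```python
# Deletion coordinates as quoted in the text (positions on Figure 3).
deletions = {
    "D1": (772, 787),
    "D2": (814, 820),
    "D. guanche": (889, 895),
}

def deletion_length(start, end):
    # Assumption: length = end - start, matching "6-base deletion from base 889 to 895".
    return end - start

def is_in_frame(start, end):
    # A deletion preserves the reading frame when its length is a multiple of 3.
    return deletion_length(start, end) % 3 == 0

report = {
    name: (deletion_length(*span), is_in_frame(*span))
    for name, span in deletions.items()
}

# Haplotype counts among the 15 surveyed D. subobscura alleles, as given in the text.
haplotype_counts = {"no deletions": 1, "D2 only": 3, "D1 and D2": 11}
haplotype_freqs = {k: v / sum(haplotype_counts.values()) for k, v in haplotype_counts.items()}

for name, (length, in_frame) in report.items():
    print(f"{name}: {length} bases, in frame: {in_frame}")
```

Under these assumptions D1 removes five codons and the other two deletions remove two codons each, so none of them shifts the reading frame.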
We also compared nucleotide divergence between the D. subobscura Adh-Twain and the ancestral D. subobscura CG9010 and Adh genes (gray lines in Figure 3, B and C). We looked at patterns of divergence between the D. subobscura and the D. guanche homologs of CG9010 and Adh (black lines in Figure 3, B and C). Comparison of these two sets of data (black vs. gray lines in Figure 3, B and C) shows a clear disconnect between the sites that tend to diverge between species in the parental genes and those that diverge between the Adh-Twain regions and their parental genes. This result, combined with the correlation between the polymorphism and divergence in Adh-Twain, suggests a shift in the sites that are conserved in Adh-Twain.
As initially reported by LUQUE et al. (1997), the Adh region of Adh-Twain shows more codon bias than does Adh in D. subobscura. Table 3 shows that the entire Adh-Twain has substantial codon bias in all three species. Relative to other genes in the obscura group, Adh-Twain appears to be a highly biased gene (POWELL 1997). Codon bias, however, has decreased in the CG9010 region of Adh-Twain, relative to CG9010.
Rapid early evolution of Adh-Twain:
Given the striking differences in the spatial patterns of divergence between Adh-Twain and its CG9010 and Adh ancestors (Figure 3, B and C), we also applied a contingency table analysis to compare the Adh-Twain divergence and polymorphism to that of its ancestor genes (Table 4). This is not a canonical MK test as it compares paralogous loci within a species, rather than homologous loci between species. A significant disconnect between polymorphism and divergence does not mean we can reject the neutral model, as we cannot rule out a fundamental shift in the neutral or nearly neutral mutation rate for one or more of these genes after the duplication events. Nevertheless, this is a useful way to detect a substantial change in the substitution rate at nonsynonymous or synonymous sites. The data indicate that there is no significant difference between the CG9010 ancestor and the CG9010-derived part of Adh-Twain. The Adh region, however, is marginally significant in the unpolarized comparison and highly significant in the polarized comparison. In the polymorphism data, there are ~2.5 synonymous polymorphisms per nonsynonymous polymorphism. In contrast, in the polarized divergence data we estimate only 0.333 synonymous fixations per nonsynonymous fixation, an eightfold difference. By comparison, Adh is estimated to have 2.6 synonymous fixations for every nonsynonymous fixation. Fisher's exact test shows that the ratio of synonymous to nonsynonymous fixations in Adh is not significantly different from the ratio of synonymous to nonsynonymous polymorphisms in Adh-Twain (P = 0.999). Both of these ratios are strikingly different from the ratio of synonymous to nonsynonymous fixations in the Adh-like regions of Adh-Twain (Adh-Twain fixations vs. Adh-Twain polymorphism, P = 0.0017; Adh-Twain fixations vs. Adh fixations, P = 0.0024). These highly significant results, which suggest rapid amino acid divergence of Adh-Twain from its paralog Adh, contrast with our earlier comparisons of orthologous Adh-Twain sequences. From these results, we hypothesize that adaptive protein evolution in Adh-Twain occurred early in its history, prior to the speciation events leading to D. subobscura/D. guanche/D. madeirensis.
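The contingency-table comparison can be sketched as a two-sided Fisher's exact test on a 2 × 2 table of synonymous and nonsynonymous counts. The counts used here are hypothetical: they were chosen only to mimic the reported ratios (2.5 synonymous per nonsynonymous polymorphism, 0.333 synonymous per nonsynonymous fixation) and are not the values in Table 4. The function name is likewise illustrative, and the two-sided P-value is defined as the total probability of all tables no more likely than the observed one.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table at least as unlikely as the observed one (small tolerance for ties).
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )

# Hypothetical counts: rows are polymorphisms and fixations,
# columns are synonymous and nonsynonymous changes.
poly_syn, poly_nonsyn = 25, 10   # ratio 2.5
fix_syn, fix_nonsyn = 3, 9       # ratio 0.333
p_value = fisher_exact_two_sided(poly_syn, poly_nonsyn, fix_syn, fix_nonsyn)
print(f"two-sided P = {p_value:.4f}")
```

An excess of nonsynonymous fixations relative to polymorphism, as in these invented counts, is the pattern the text describes for the Adh-derived region of Adh-Twain.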
As the comparison of paralogous genes differs from the comparison of orthologous genes in many ways (e.g., time, genomic location), we applied the more suitable phylogenetic approach implemented in PAML (YANG 1997) to test the hypothesis that Adh-Twain evolved rapidly shortly after its formation (solid bars in Figure 4). Specifically, we estimated the nonsynonymous to synonymous rate ratio (dN/dS) for various branches of the Adh-Twain gene tree. In these analyses we investigated the CG9010-derived and Adh-derived regions of Adh-Twain separately, each alongside its parental gene.
2Δlnl = 28.8, d.f. = 9, P = 0.0007). Clearly, the model of a single rate for dN/dS does not fit the CG9010 data.
The branch-specific model, which has one dN/dS rate for the fusion gene-related lineages and one rate for all other lineages, is a significantly better fit to the data than the one-ratio model (model 0 vs. model 2, lnl = -943.1; 2Δlnl = 16.8, d.f. = 1, P < 0.0001). In contrast, the free-ratio model does not fit the data better than the two-ratio branch-specific model (2Δlnl = 12, d.f. = 8, P = 0.1512). A three-ratio branch-specific model (in which there is one rate for the branch immediately after the duplication of CG9010, one rate for the branches after the D. subobscura-D. guanche speciation, and one rate for all others) does not fit the data better than the two-ratio model (2Δlnl = 0.2, d.f. = 1, P = 0.62). Interestingly, a four-ratio model, where all branches after the CG9010 duplication are free to vary their dN/dS ratio, fits the data marginally better than the two-ratio model (2Δlnl = 7.4, d.f. = 2, P = 0.024). We are cautious of this latter result given the large number of tests performed. Figure 4 illustrates these results.
Not surprisingly, dN/dS is small for CG9010 in most lineages (Figure 4). However, the dN/dS ratio of the CG9010-derived part of Adh-Twain early in its history is close to 1 (dN/dS = 0.9919). The D. guanche branch is clearly evolving slowly and is consistent with functional constraint (dN/dS = 0.3313). The only evidence for dN/dS significantly greater than 1 in the CG9010-derived portion of Adh-Twain is found in the D. subobscura lineage, which has a dN/dS of 2.09. This result is consistent with directional selection acting on the CG9010 portion of the D. subobscura Adh-Twain. This result, however, must be interpreted with caution as the dN/dS ratio is still close to 1 and the MK tests described above (which also take into account polymorphism data) were not significant.
We repeated the above analysis for the Adh-derived region of Adh-Twain. Again, the free-ratio model fit the data substantially better than the one-ratio model (model 0, lnl = -2128.27; model 1, lnl = -2083.60; 2Δlnl = 89.34, d.f. = 15, P < 0.0001). We compared the one-ratio and the free-ratio models to several different models of sequence evolution for the Adh-Twain lineages. First, we compared a two-ratio model, with one dN/dS ratio for the Adh-Twain branch and one ratio for all other lineages. The two-ratio model fit better than the one-ratio model, but the free-ratio model fit slightly better than the two-ratio model (two-ratio model vs. model 0, two-ratio model lnl = -2097.46; 2Δlnl = 60.62, d.f. = 1, P < 0.0001; two-ratio model vs. model 1, 2Δlnl = 27.7, d.f. = 14, P = 0.0156). A three-ratio model, with rapid evolution after the formation of Adh-Twain and then a subsequent slowing down after the D. subobscura-D. guanche speciation, fit better than the two-ratio model (three-ratio model vs. two-ratio model, three-ratio model lnl = -2093.19; 2Δlnl = 8.54, d.f. = 1, P = 0.0035). Interestingly, the free-ratio model does not fit the data significantly better than the three-ratio model (2Δlnl = 19.18, d.f. = 13, P = 0.1176). We also compared several other models, none of which were significant improvements over the three-ratio model (analysis not shown).
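The test statistics quoted for the Adh-derived region follow directly from the reported log-likelihoods. The worked arithmetic below is a sketch under one assumption: that the log-likelihoods are negative (the signs appear to have been lost in the text as extracted), which is the only reading under which the more complex models fit better. The subscripts are labels introduced here for readability.

```latex
\begin{align*}
2\Delta \ln L_{\text{free vs. one-ratio}}
  &= 2\,[\,-2083.60 - (-2128.27)\,] = 2(44.67) = 89.34,\\
2\Delta \ln L_{\text{three- vs. two-ratio}}
  &= 2\,[\,-2093.19 - (-2097.46)\,] = 2(4.27) = 8.54,\\
2\Delta \ln L_{\text{free vs. three-ratio}}
  &= 2\,[\,-2083.60 - (-2093.19)\,] = 2(9.59) = 19.18.
\end{align*}
```

Each statistic is compared with a chi-square distribution whose degrees of freedom equal the difference in the number of free dN/dS ratios between the two models (15, 1, and 13, respectively).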
For the three-ratio model, the dN/dS ratio for the branch immediately after the formation of Adh-Twain cannot be calculated, as dS is 0. This implies a dN/dS ratio much greater than 1. In contrast, dN/dS ratio for the Adh branches is 0.0411. The likely nonsynonymous and synonymous substitutions along this branch were inferred with PAML. There were at least 14 nonsynonymous amino acid substitutions and 0 synonymous silent substitutions, strongly suggesting a bout of rapid adaptive amino acid evolution shortly after the formation of Adh-Twain. This is consistent with our earlier contingency table analysis. Moreover, the ratio of nonsynonymous substitutions to synonymous substitutions for this early branch is dramatically different from the ratio of nonsynonymous substitutions to synonymous substitutions observed in the D. subobscura polymorphism data, P < 0.0001. Interestingly, the fact that no silent substitutions occurred between the retrotransposition of the Adh mRNA and the speciation of D. guanche and D. subobscura hints that Adh-Twain may have formed shortly before this speciation event.
dN/dS is 0.3991 for Adh-Twain after the speciation event leading to D. guanche and D. subobscura. This, while greater than is typical for Adh, is not suggestive of adaptive evolution and is consistent with our earlier MK test analysis.
| DISCUSSION |
|---|
|
|
|---|
8.0–12.0 million years ago (MYA) and a lower bound of 1.8–2.8 MYA (RAMOS-ONSINS et al. 1998). Second, we do not know which copy of CG9010 participated in the fusion event. We found no evidence for DNA sequence homologous to the 3' end of CG9010 in the sequence data from the fusion gene. This is consistent with two hypotheses. Either the CG9010 target of the Adh retrosequence was a truncated, nonfunctional copy (e.g., KATJU and LYNCH 2003) or the 3' end of the CG9010 target is no longer recognizable as a result of extensive molecular evolution. Several puzzling results from the LUQUE et al. (1997) original data, including high codon bias and low dN/dS ratio compared to the expectation for a putative pseudogene, are explained by our data. The fact that the putative initiation codon of the D. guanche Adh retrosequence was CTG rather than ATG (which was interpreted as evidence of loss of function in Adh) is now explained by the fact that the actual initiation codon is upstream of this codon and is derived from the duplicated CG9010. Transcription patterns of the CFG were qualitatively similar to those of CG9010. This is consistent with conservation of putative 5' regulatory elements between Adh-Twain and CG9010. Bioinformatic analysis of Adh-Twain suggests that the fusion protein may have lost or reduced ancestral ADH activity, which is consistent with the negative results of our experimental work. The function of the Adh-Twain protein, however, remains unknown.
Adaptive protein evolution:
In general, patterns of polymorphism and divergence at Adh-Twain provide no strong evidence for recent directional selection in D. subobscura. Our analysis of substitution patterns along branches of the genealogy suggests that a "three-ratio" model has the greatest likelihood. In this model, Adh is normally very evolutionarily constrained. Shortly after Adh-Twain was formed, but prior to the splitting of the lineages leading to D. subobscura and D. guanche, there was a burst of adaptive substitutions in the Adh region. Subsequently, the Adh region of the fusion gene has evolved more slowly, although not nearly as slowly as Adh and not along all lineages (e.g., D. guanche).
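One common way to compare nested branch models of this kind is a likelihood-ratio test, in which twice the difference in log-likelihood is referred to a chi-square distribution. The sketch below assumes that each additional dN/dS ratio adds exactly one free parameter and that the branch partitions are nested. The log-likelihood values and the function name are hypothetical and are not the values obtained in this study.

```python
from math import erfc, sqrt


def lrt_p_value_df1(lnl_null: float, lnl_alt: float) -> float:
    """P value of a likelihood-ratio test with one extra free parameter."""
    stat = 2.0 * (lnl_alt - lnl_null)
    if stat <= 0:
        return 1.0
    # Chi-square survival function with 1 degree of freedom.
    return erfc(sqrt(stat / 2.0))


# HYPOTHETICAL log-likelihoods, for illustration only (not values from this study).
lnl = {"one-ratio": -1250.0, "two-ratio": -1242.0, "three-ratio": -1236.5}

p_two_vs_one = lrt_p_value_df1(lnl["one-ratio"], lnl["two-ratio"])
p_three_vs_two = lrt_p_value_df1(lnl["two-ratio"], lnl["three-ratio"])
print(p_two_vs_one, p_three_vs_two)
```

Under these assumed values, each added ratio improves the fit significantly, which is the situation in which a three-ratio model would be preferred over simpler alternatives.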
A strikingly similar pattern of rapid amino acid evolution shortly after the formation of jingwei was reported by LONG and LANGLEY (1993). Interestingly, one of the early, likely adaptive, amino acid changes in Adh-Twain (H190 to Q) occurs at a residue known to affect Adh activity in D. melanogaster mutants (http://www.flybase.org). This result is consistent with the Adh allozyme data from the obscura group, which provided no evidence for canonical Adh activity associated with the fusion gene. This may mean that the enzymatic activity of Adh-Twain has shifted away from the secondary alcohols typically oxidized by Drosophila Adh.
The history of the CG9010-derived region of Adh-Twain is less clear. We showed that, unlike the Adh region, a "two-ratio" model was most likely, given our data. There is strong evidence for rapid evolution early in the history of Adh-Twain. There is also weak evidence that the CG9010-derived region is adaptively evolving along the D. subobscura lineage. However, the strong signal of directional selection (many more amino acid changes than silent changes) observed in the early evolution of the Adh region is obscured in the CG9010 region. Instead, the rates of both nonsynonymous and synonymous substitutions were elevated early in the history of this gene, followed by increasing constraint. The simplest interpretation is that the copy of CG9010 that ultimately became part of Adh-Twain evolved quickly under reduced functional constraint right after the initial duplication event, but then became more constrained once it was fused with the Adh retrogene.
Retrogene evolution:
If a retrogene is to preserve an ancestral function or acquire a new function, it must avoid premature termination codons and be properly expressed. For CFGs originating by retrotransposition of a "donor" gene into a preexisting "acceptor" gene, reading frame preservation implies an insertion site between acceptor gene codons. Furthermore, it implies removal or avoidance of in-frame stop codons in the 5'-UTR of the donor retrosequence. It is also possible that some ultimately successful fusion genes originating by retrotransposition go through an early stage of loss of function prior to restoration of function by new mutations. Distinguishing between these possibilities is problematic, as we cannot accurately reconstruct the details of the ancestral state of the insertional mutation (e.g., the retrotransposition of the Adh mRNA in the case of Adh-Twain) in all but the most recently derived CFGs. Nevertheless, the fact that Adh-Twain showed early rapid amino acid evolution and no silent substitutions suggests that this gene was functional immediately after it was formed (LONG and LANGLEY 1993 reached a similar conclusion for jingwei).
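The two reading-frame conditions described above can be expressed as a small Python check. This is a toy sketch under stated assumptions: the sequences, coordinates, and function names are hypothetical, the fused sequence is read as a single frame starting at the acceptor start codon, and splicing and regulatory requirements are ignored.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def insertion_preserves_frame(acceptor_cds_offset: int) -> bool:
    """True when the insertion point falls between two acceptor codons."""
    return acceptor_cds_offset % 3 == 0


def in_frame_stops(seq: str, frame_offset: int = 0) -> list[int]:
    """Return start positions of stop codons read in the given frame."""
    seq = seq.upper()
    return [
        i
        for i in range(frame_offset, len(seq) - 2, 3)
        if seq[i : i + 3] in STOP_CODONS
    ]


def fusion_is_open(acceptor_cds: str, insert_at: int, donor_utr_and_cds: str) -> bool:
    """Check that a toy fusion has no in-frame stop before its final codon."""
    if not insertion_preserves_frame(insert_at):
        return False
    fused = acceptor_cds[:insert_at] + donor_utr_and_cds
    # Only a stop occupying the last three bases of the fusion is acceptable.
    return all(pos == len(fused) - 3 for pos in in_frame_stops(fused))


# HYPOTHETICAL toy sequences, for illustration only.
acceptor = "ATGGCTAAAGGT"
donor_clean_utr = "GCAGCC" + "ATGTCGTAA"  # 5'-UTR without an in-frame stop
donor_stop_utr = "GCATAG" + "ATGTCGTAA"   # 5'-UTR with an in-frame TAG

print(fusion_is_open(acceptor, 6, donor_clean_utr))
print(fusion_is_open(acceptor, 6, donor_stop_utr))
print(fusion_is_open(acceptor, 7, donor_clean_utr))
```

Only the first toy case satisfies both conditions: the insertion lies on a codon boundary and the donor 5'-UTR is read through without a premature stop.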
Adh and the origins of new genes:
Recent studies in Drosophila and mammals have shown that retrotransposition of spliced mRNAs is a potent source of gene duplications (EICHLER 2001; BETRAN et al. 2002; DEVOR and MOFFAT-WILSON 2003; LONG et al. 2003a,b; EMERSON et al. 2004). LONG and LANGLEY (1993) showed that retrotransposition events such as these can result in the formation of new CFGs, e.g., jingwei. The discovery of three novel Adh-derived Drosophila genes, Adh-Twain, jingwei, and Adh-finnegan, strongly suggests that Adh-derived novel genes are common in flies.
What aspects of Adh biology might contribute to this phenomenon? Adh in D. melanogaster and other Drosophila can act on a variety of substrates ranging from various alcohols to a number of aldehydes (ASHBURNER 1998). Perhaps the frequency of Adh duplications and Adh-derived CFGs reflects the flexibility and evolutionary potential of the Adh protein or the fact that Adh is one of the most abundant transcripts in Drosophila. Other factors relating to the structure and molecular biology of Adh may make it more likely to participate in retrotransposition-mediated novel gene formation. For example, GONCALVES et al. (2000) identified common characteristics of genes that were the source of retropseudogenes in humans. The parental genes tended to have relatively short coding sequences, to be expressed in a variety of tissues, and to have a low G/C content. Their amino acid sequences were highly conserved. Nearly a quarter of these genes produced more than one retropseudogene. Consistent with these observations, the 23 recent retrotransposed genes in D. melanogaster identified by BETRAN et al. (2002) are shorter than the average D. melanogaster gene (mean retrotransposed gene length, 320 amino acids [aa]; mean gene length, 522 aa; median retrotransposed gene length, 230 aa; median gene length, 421 aa). The parental genes of these retrotransposed genes are also often expressed in multiple tissues (BETRAN et al. 2002). Adh in Drosophila qualitatively fits several of the above patterns, as it is relatively short, widely expressed, and highly conserved and has duplicated several times in different Drosophila lineages (YUM et al. 1991; NURMINSKY et al. 1996; POWELL 1997; AMADOR and JUAN 1999). In any event, the discovery of the obscura Adh-derived chimeric fusion gene strongly motivates the search for additional novel Adh genes and opens up the possibility of discovering general principles governing the evolution of novel genes.
| ACKNOWLEDGEMENTS |
|---|
| FOOTNOTES |
|---|
| LITERATURE CITED |
|---|
ALTSCHUL, S. F., T. L. MADDEN, A. A. SCHAFFER, J. ZHANG, Z. ZHANG et al., 1997 Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25: 3389-3402.
AMADOR, A., and E. JUAN, 1999 Nonfixed duplication containing the Adh gene and a truncated form of the Adhr gene in the Drosophila funebris species group: different modes of evolution of Adh relative to Adhr in Drosophila. Mol. Biol. Evol. 16: 1439-1456.
ASHBURNER, M., 1989 Drosophila: A Laboratory Handbook. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
ASHBURNER, M., 1998 Speculations on the subject of alcohol dehydrogenase and its properties in Drosophila and other flies. BioEssays 20: 949-995.
BALANYA, J., C. SEGARRA, A. PREVOSTI and L. SERRA, 1994 Colonization of America by Drosophila subobscura: the founder event and a rapid expansion. J. Hered. 85: 427-432.
BATTERHAM, P., J. A. LOVETT, W. T. STARMER and D. T. SULLIVAN, 1983 Differential regulation of duplicate alcohol dehydrogenase genes in Drosophila mojavensis. Dev. Biol. 96: 346-354.
BEGUN, D. J., 1997 Origin and evolution of a new gene descended from alcohol dehydrogenase in Drosophila. Genetics 145: 375-382.
BETRAN, E., and M. LONG, 2003 Dntf-2r, a young Drosophila retroposed gene with specific male expression under positive Darwinian selection. Genetics 164: 977-988.
BETRAN, E., K. THORNTON and M. LONG, 2002 Retroposed new genes out of the X in Drosophila. Genome Res. 12: 1854-1859.
BOREVITZ, J. O., D. LIANG, D. PLOUFFE, H. S. CHANG, T. ZHU et al., 2003 Large-scale identification of single-feature polymorphisms in complex genomes. Genome Res. 13: 513-523.
BURKE, T. W., and J. T. KADONAGA, 1997 The downstream core promoter element, DPE, is conserved from Drosophila to humans and is recognized by TAFII60 of Drosophila. Genes Dev. 11: 3020-3031.
CASTRO, J. A., M. RAMON, A. PICORNELL and A. MOYA, 1999 The genetic structure of Drosophila subobscura populations from the islands of Majorca and Minorca (Balearic Islands, Spain) based on allozymes and mitochondrial DNA. Heredity 83: 271-279.
COURSEAUX, A., and J. L. NAHON, 2001 Birth of two chimeric genes in the Hominidae lineage. Science 291: 1293-1297.
DEVOR, E. J., and K. A. MOFFAT-WILSON, 2003 Molecular and temporal characteristics of human retropseudogenes. Hum. Biol. 75: 661-672.
EICHLER, E. E., 2001 Recent duplication, domain accretion and the dynamic mutation of the human genome. Trends Genet. 17: 661-669.
EMERSON, J. J., H. KAESSMANN, E. BETRAN and M. LONG, 2004 Extensive gene traffic on the mammalian X chromosome. Science 303: 537-540.
FINTA, C., and P. G. ZAPHIROPOULOS, 2000 The human cytochrome P450 3A locus. Gene evolution by capture of downstream exons. Gene 260: 13-23.
FISCHER, J. A., and T. MANIATIS, 1985 Structure and transcription of the Drosophila mulleri alcohol dehydrogenase genes. Nucleic Acids Res. 13: 6899-6917.
GOLDMAN, N., and Z. YANG, 1994 A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11: 725-736.
GONCALVES, I., L. DURET and D. MOUCHIROUD, 2000 Nature and structure of human genes that generate retropseudogenes. Genome Res. 10: 672-678.
GRAY, N. K., and M. WICKENS, 1998 Control of translation in animals. Annu. Rev. Cell Dev. Biol. 14: 399-458.
HALDANE, J. B. S., 1932 The Causes of Evolution. Longmans Green & Co., London.
HARRISON, P. M., A. KUMAR, N. LANG, M. SNYDER and M. GERSTEIN, 2002 A question of size: the eukaryotic proteome and the problems in defining it. Nucleic Acids Res. 30: 1083-1090.
HE, S., A. R. ABAD, S. B. GELVIN and S. A. MACKENZIE, 1996 A cytoplasmic male sterility-associated mitochondrial protein causes pollen disruption in transgenic tobacco. Proc. Natl. Acad. Sci. USA 93: 11763-11768.
HOLLAND, P. W., 2003 More genes in vertebrates? J. Struct. Funct. Genomics 3: 75-84.
HUGHES, A., 2002 Adaptive evolution after gene duplication. Trends Genet. 18: 433-434.
JAIN, R., M. C. RIVERA and J. A. LAKE, 1999 Horizontal gene transfer among genomes: the complexity hypothesis. Proc. Natl. Acad. Sci. USA 96: 3801-3806.
JEFFS, P., and M. ASHBURNER, 1991 Processed pseudogenes in Drosophila. Proc. R. Soc. Biol. Sci. 44: 151-159.
KATJU, V., and M. LYNCH, 2003 The structure and early evolution of recently arisen gene duplicates in the Caenorhabditis elegans genome. Genetics 165: 1793-1803.
KERN, A. D., C. D. JONES and D. J. BEGUN, 2002 Genomic effects of nucleotide substitutions in Drosophila simulans. Genetics 162: 1753-1761.
LONG, M., and C. H. LANGLEY, 1993 Natural selection and the origin of jingwei, a chimeric processed functional gene in Drosophila. Science 260: 91-95.
LONG, M., M. DEUTSCH, W. WANG, E. BETRAN, F. G. BRUNET et al., 2003a Origin of new genes: evidence from experimental and computational analyses. Genetica 118: 171-182.
LONG, M., E. BETRAN, K. THORNTON and W. WANG, 2003b The origin of new genes: glimpses from the young and old. Nat. Rev. Genet. 4: 865-875.
LOUKAS, M., C. B. KRIMBAS, P. MAVRAGANI-TSIPIDOU and C. D. KASTRITSIS, 1979 Genetics of Drosophila subobscura populations. VIII. Allozyme loci and their chromosome maps. J. Hered. 70: 17-26.
LUQUE, T., G. MARFANY and R. GONZALEZ-DUARTE, 1997 Characterization and molecular analysis of Adh retrosequences in species of the Drosophila obscura group. Mol. Biol. Evol. 14: 1316-1325.
LYNCH, M., and J. S. CONERY, 2000 The evolutionary fate and consequences of duplicate genes. Science 290: 1151-1155.
LYNCH, M., and J. S. CONERY, 2003 The origins of genome complexity. Science 302: 1401-1404.
LYNCH, M., M. O'HELY, B. WALSH and A. FORCE, 2001 The probability of preservation of a newly arisen gene duplicate. Genetics 159: 1789-1804.
MARFANY, G., and R. GONZALEZ-DUARTE, 1992 Evidence for retrotranscription of protein-coding genes in the Drosophila subobscura genome. J. Mol. Evol. 35: 492-501.
MCDONALD, J. H., and M. KREITMAN, 1991 Adaptive protein evolution at the Adh locus in Drosophila. Nature 351: 652-654.
NIELSEN, H., J. ENGELBRECHT, S. BRUNAK and G. VON HEIJNE, 1997 A neural network method for identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Int. J. Neural Syst. 8: 581-599.
NURMINSKY, D. I., E. N. MORIYAMA, E. R. LOZOVSKAYA and D. L. HARTL, 1996 Molecular phylogeny and genome evolution in the Drosophila virilis species group: duplication of the alcohol dehydrogenase gene. Mol. Biol. Evol. 13: 132-149.
NURMINSKY, D. I., M. V. NURMINSKAYA, D. DE AGUIAR and D. L. HARTL, 1998 Selective sweep of a newly evolved sperm-specific gene in Drosophila. Nature 396: 572-575.
OCHMAN, H., and I. B. JONES, 2000 Evolutionary dynamics of full genome content in Escherichia coli. EMBO J. 19: 6637-6643.
OHLER, U., G. C. LIAO, H. NIEMANN and G. M. RUBIN, 2002 Computational analysis of core promoters in the Drosophila genome. Genome Biol. 3: RESEARCH0087.
OHNO, S., 1970 Evolution by Gene Duplication. Springer, Berlin.
OHTA, T., 2003 Evolution by gene duplication revisited: differentiation of regulatory elements versus proteins. Genetica 118: 209-216.
PATTHY, L., 1999 Genome evolution and the evolution of exon-shuffling: a review. Gene 238: 103-114.