Natural Selection Affects Frequencies of AG and GT Dinucleotides at the 5' and 3' Ends of Exons
S. T. Eskesen<sup>a</sup>, F. N. Eskesen<sup>1,a</sup>, and A. Ruvinsky<sup>a</sup>
<sup>a</sup> Institute for Genetics and Bioinformatics, University of New England, Armidale, New South Wales 2351, Australia
Corresponding author: A. Ruvinsky, University of New England, Armidale, NSW 2351, Australia. E-mail: aruvinsk{at}metz.une.edu.au
Communicating editor: M. A. F. NOOR
| ABSTRACT |
|---|
GT and AG, located at the 5' and 3' ends of introns, are important for correct splicing. It is anticipated that natural selection decreases the frequency of AG and GT near the 5' and 3' ends of exons, respectively, preventing the appearance of cryptic splice sites. The data presented in this article support this expectation.
It is common knowledge that GT and AG dinucleotides, located at the 5' and 3' ends of introns, respectively, constitute an important part of the donor and acceptor splice sites. These sites are highly conserved and essential for correct splicing.
| DISTRIBUTION OF AG PAIRS AT 5' ENDS OF EXONS |
|---|
Information relevant to Caenorhabditis elegans, Drosophila melanogaster, and Homo sapiens was extracted from the exon-intron database (EID), which was compiled in the W. Gilbert laboratory, Department of Molecular and Cellular Biology, Harvard University (SAXONOV et al. 2000).
Pictograms representing frequencies of different nucleotides at the first 10 5' positions of exons are shown in Fig 1. It is to be expected ...
To study the problem posed in the title of this article, we calculated observed and expected frequencies of AG pairs in the 5' part of exons in the compared species. Fig 2 represents the distributions of AG frequencies starting from the 5' end along nonfirst exons. The first exons were excluded from this count as they do not have the preceding intron-exon boundary and thus differ from all the rest. Periodic variations of AG frequencies, which can be seen in all distributions, are discussed in a separate article. As can be seen in Fig 2, the first five positions in the three compared species differ from the following positions. Taking this fact into account, we compared observed and expected frequencies of AG pairs on two intervals, positions 6-30 and positions 101-125 (the latter not shown in Fig 2). In the nonfirst exons of C. elegans expected frequencies are significantly higher within both compared intervals (Table 1), while the intensity of the differences slightly declines in the 3' direction. Correlation (r = 0.65; P < 0.0001) between AG position in the exons and the difference between expected and observed values was found. In D. melanogaster significant differences are found in the first interval (positions 6-30) but not in the second (positions 101-125; Table 1). Correlation (r = 0.40; P < 0.0001) between AG position in the D. melanogaster exons and the difference between expected and observed values was also observed. Thus, the nonfirst exons of C. elegans and D. melanogaster demonstrate significantly lower than expected AG frequencies in the positions adjacent to the 5' end of the exon, and the difference diminishes downstream. Interestingly, the observed frequencies of GA pairs (data not shown) are significantly higher than expected (for C. elegans, t = 14.751, P = 0 and for D. melanogaster, t = 6.155, P = 0, in the first 50 positions) and much higher than observed frequencies of AG pairs.
One may presume that selective pressure against AG pairs could contribute to the observed phenomenon.
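The observed-versus-expected comparison described above can be illustrated with a minimal Python sketch. The text does not state how expected frequencies were derived, so the sketch assumes that the expected frequency of AG at a position is the product of the frequency of A at that position and the frequency of G at the next one. The function name `dinucleotide_profile` and the toy sequences are hypothetical and are not taken from the EID data.

```python
from collections import Counter


def dinucleotide_profile(exons, pair="AG", max_pos=10):
    """Return (position, observed, expected) for `pair` along exons aligned at the 5' end.

    Position is 1-based and refers to the first base of the pair.
    ASSUMPTION: expected = f(first base at pos) * f(second base at pos + 1).
    """
    profile = []
    for pos in range(max_pos):
        first, second = Counter(), Counter()
        hits = total = 0
        for seq in exons:
            if len(seq) < pos + 2:
                continue  # exon too short to have a pair at this position
            a, b = seq[pos], seq[pos + 1]
            first[a] += 1
            second[b] += 1
            hits += (a + b == pair)
            total += 1
        if total == 0:
            break
        observed = hits / total
        expected = (first[pair[0]] / total) * (second[pair[1]] / total)
        profile.append((pos + 1, observed, expected))
    return profile


# Toy sequences for illustration only (not real exons).
toy_exons = ["GAAGTC", "AGCTGA", "GACTAG", "TTAGCA"]
for position, observed, expected in dinucleotide_profile(toy_exons, max_pos=3):
    print(position, observed, expected)
```

With real data, the per-position differences between expected and observed values on an interval such as positions 6-30 would then be passed to a statistical test; the choice of test is left open here.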
In contrast, the first exons do not have a preceding intron-exon junction and were not expected to show specific selection pressure against AG. Table 2 presents comparisons between observed and expected frequencies of AG on the same intervals as shown above for nonfirst exons. In C. elegans and D. melanogaster no differences were observed in either interval in the first exons. This observation supports the idea that the preceding intron-exon junction likely contributes to selection against exonic AG in the vicinity of the splice site in the nonfirst exons.
In H. sapiens the situation seems to be different. Observed frequencies of AG practically do not differ from expected in the first four positions, after which the pattern changes and observed frequencies of AG are higher than expected. Obviously there is a disparity between H. sapiens and the two other studied species. What could be a rational interpretation of this difference? One cannot rule out that splicing mechanisms in H. sapiens slightly differ and that the specific selection against AG typical for C. elegans and D. melanogaster might be much weaker or even absent in H. sapiens exons. It is also possible that in H. sapiens additional factors might be involved, which could mask possible selection pressure against AG. For instance, we found that in H. sapiens exons the observed frequency of CG is dramatically lower (200% less, data not shown) than expected, while in the other compared species this was not the case. As frequencies of dinucleotides are interrelated, one may guess that an increased level of observed AG in H. sapiens could be a compensation for the low frequency of CG and some other dinucleotides in exons. In any case the difference of H. sapiens from the two other species is apparent.
Distributions of AG frequencies shown in Fig 2 represent a mixture of three phases (0, 1, and 2) in each species. Separating any of these distributions into three distributions (phase 0, phase 1, and phase 2) reveals the same pattern, which is more sharply expressed (data not shown).
In an attempt to further test the selection hypothesis we compared frequencies of pairs of synonymous codons located at the 5' end of exons (phase 0) in the three studied species, which differ in the third position. The compared codons are AAA and AAG (lysine), CAA and CAG (glutamine), and GAA and GAG (glutamic acid). If natural selection really operates against AG pairs located near the splice junction (5' end of exons), one can also expect selection against AG-carrying codons, while AA-carrying codons should not be affected to the same degree or at all. This type of comparison, based on studying two observed values, is very different from the comparisons of observed and expected values used so far. Fig 3 supports the hypothesis of specific selection against codons carrying AG. In all nine cases presented in Fig 3 the frequency of AG-containing codons in the first few 5' positions, and particularly the first position, is lower than the frequency of AA-containing codons. There is also a distance effect: frequencies of AG-containing codons increase in the 3' direction. Comparisons of the first 10 codons located at the very 5' end of exons with the 10 codons located at positions 41-50 are highly significant (P < 0.01) except for codon AAG in H. sapiens. However, even in this case the first codon is considerably below the average value. AA-containing codons behave differently and do not show strong distance-related selection effects, while some compensatory increases near the 5' end of exons are possible. Such contrasting behavior of synonymous codons could hardly be explained by other causes and thus supports the tested selection hypothesis quite convincingly. Possibly this observation could add an extra factor affecting codon choices during evolution.
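The synonymous-codon comparison can be sketched as a simple tally of codons by their ordinal position from the 5' end of phase 0 exons. This is only an illustration under stated assumptions: the function names `codon_counts_by_position` and `ag_fraction`, and the toy sequences, are hypothetical, and the sketch assumes each input exon begins with the first base of a codon.

```python
from collections import Counter

# Synonymous pairs compared in the text: (AG-carrying codon, AA-carrying codon).
SYNONYMOUS_PAIRS = {
    "lysine": ("AAG", "AAA"),
    "glutamine": ("CAG", "CAA"),
    "glutamic acid": ("GAG", "GAA"),
}


def codon_counts_by_position(phase0_exons, n_codons=10):
    """Count codons at each of the first `n_codons` codon positions.

    ASSUMPTION: every exon is in phase 0, i.e. it starts at codon position 1.
    """
    counts = [Counter() for _ in range(n_codons)]
    for seq in phase0_exons:
        for i in range(min(n_codons, len(seq) // 3)):
            counts[i][seq[3 * i: 3 * i + 3]] += 1
    return counts


def ag_fraction(counter, ag_codon, aa_codon):
    """Share of the AG-carrying codon within one synonymous pair, or None if neither occurs."""
    total = counter[ag_codon] + counter[aa_codon]
    return counter[ag_codon] / total if total else None


# Toy sequences for illustration only (not real exons).
toy_exons = ["AAAGAGCAG", "AAGGAACAA", "AAACAGGAG"]
counts = codon_counts_by_position(toy_exons, n_codons=3)
for amino_acid, (ag_codon, aa_codon) in SYNONYMOUS_PAIRS.items():
    print(amino_acid, [ag_fraction(c, ag_codon, aa_codon) for c in counts])
```

Under the hypothesis tested in the text, the AG-carrying share would be lowest at the first codon positions and rise in the 3' direction.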
Next we investigated whether positions of AG within codons in the 5' region of exons may have different selective values. Clearly there are three possible positions. When AG occupies position 1 within a codon (AG|N) and confuses splicing machinery, which may accept this particular AG as the last 3' AG of the previous intron, it leads to abnormal splicing and causes a frameshift in the downstream part of the gene. The same is true when AG occupies position 3 within a codon (NNAG|); both of these positions (1 and 3) should be deleterious. However, position 2 (NAG|) seems to be less deleterious in this regard as it will not cause a frameshift but rather causes a loss of one or a few codons in the case of abnormal splicing.
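The frameshift argument can be checked with a little arithmetic. The sketch below assumes a phase 0 exon and assumes that splicing at an exonic AG removes every exonic base up to and including that AG; the removed length modulo 3 then decides between a frameshift and an in-frame loss of codons. The function name `classify_exonic_ag` and the toy sequence are hypothetical.

```python
def classify_exonic_ag(exon):
    """List (codon_position_of_A, bases_removed, outcome) for each AG in a phase 0 exon.

    ASSUMPTIONS: the exon starts at codon position 1, and a cryptic acceptor
    at an exonic AG removes all exonic bases up to and including that AG.
    """
    results = []
    start = exon.find("AG")
    while start != -1:
        removed = start + 2                 # bases lost from the 5' end of the exon
        codon_position = start % 3 + 1      # position of the A within its codon
        outcome = "in-frame deletion" if removed % 3 == 0 else "frameshift"
        results.append((codon_position, removed, outcome))
        start = exon.find("AG", start + 1)
    return results


# Toy sequence with AG at codon positions 1, 2, and 3 (illustration only).
for row in classify_exonic_ag("AGCTAGCCAG"):
    print(row)
```

Only an AG that ends on a codon boundary (the NAG| case, codon position 2) removes a multiple of three bases, which matches the reasoning that position 2 should be the least deleterious.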
To test this hypothesis we calculated differences between expected and observed frequencies of AG and compared these values among the three codon positions using the approach explained above. The results presented in Table 3 show that the differences between expected and observed values were significantly smaller in the second position of codons ("less deleterious") than in the two other positions in all studied cases, except the third position in humans. The same conclusion holds for exons in all three phases (data not shown). A smaller difference between expected and observed frequencies of AG in the second codon position indicates that this position is subjected to relatively less specific selection pressure than the two other positions. These data provide additional support for the tested selection hypothesis.
Several independent types of evidence presented here create sufficient grounds to believe that AG dinucleotides located at the 5' ends of exons experienced negative selection pressure that reduces the risk of their being mistakenly recognized as the last 3' intronic AG signal and thus diminishes the chance of deleterious splicing.
| DISTRIBUTION OF GT PAIRS AT 3' ENDS OF EXONS |
|---|
A similar approach was applied for studying the distribution of GT pairs at the 3' end of exons. We investigated how the distribution of the 3' exonic GT could be affected by GT pairs nearly always located at the 5' splice sites of introns. The last exons of genes were separated from the rest, as they do not have the following exon-intron boundary and probably exist under different selection pressures. Position 1 represents the last two 3' positions of exons, which were aligned by their 3' ends (Fig 4). As expected ...
Interestingly the observed frequencies of TG pairs (data not shown) do not differ from expected frequencies for C. elegans (t = 0.7788, P = 0.4379) and for D. melanogaster (t = 1.1286, P = 0.2618) in the first 50 positions. In H. sapiens observed frequencies of TG were even higher than expected (t = 12.1069, P = 0, in the first 50 positions). These data indicate that frequencies of GT and TG pairs behave differently at the 3' ends of exons.
Again as in the previous section of this article, we compared observed and expected frequencies of GT pairs in the 3' part of nonlast exons on two intervals, 6 to 30 and 101 to 125 (not shown in Fig 4) in the studied species. In C. elegans expected frequencies of GT are significantly higher within both compared intervals (Table 1), while the intensity of the differences declines in the 5' direction. In D. melanogaster significant differences are also found on both intervals (Table 1). A similar pattern was found in H. sapiens. Thus the compared species demonstrate significant differences between expected and observed GT frequencies in the positions adjacent to the 3' ends of the exons, which diminish in the 5' direction. One may presume that selective pressure against GT pairs could contribute to the observed phenomenon.
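For the 3' analysis the only change in bookkeeping is the alignment: exons are aligned by their 3' ends, and position 1 denotes the last two bases. A minimal sketch of that indexing convention follows; the function name `pair_frequency_from_3prime` and the toy sequences are hypothetical.

```python
def pair_frequency_from_3prime(exons, position, pair="GT"):
    """Observed frequency of `pair` at `position` counted from the 3' end.

    Position 1 is the last two bases of the exon, position 2 is shifted one
    base upstream, and so on. Exons too short for the position are skipped.
    """
    hits = total = 0
    for seq in exons:
        end = len(seq) - (position - 1)
        start = end - 2
        if start < 0:
            continue
        total += 1
        hits += (seq[start:end] == pair)
    return hits / total if total else None


# Toy sequences for illustration only (not real exons).
toy_exons = ["ACGT", "TTGTA", "GGT", "CAGTC"]
for position in (1, 2, 3):
    print(position, pair_frequency_from_3prime(toy_exons, position))
```

Expected frequencies and interval comparisons (positions 6 to 30 and 101 to 125) would be computed on top of this alignment in the same way as for AG at the 5' end.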
The last exons do not have a following exon-intron junction and were not expected to show specific selection pressure against GT. Table 2 presents comparisons between observed and expected frequencies of GT on the same intervals calculated for the last exons. In all but one case there is a statistically detectable difference. This result is contradictory to our expectation that in the last exons there would not be specific selection pressure against GT. However, statistics for the last exons indicate a dramatic decline in the differences between observed and expected frequencies of GT, as compared to the nonlast exons. This may show a reduced selection pressure against GT; however, other factors may be the cause of the differences between observed and expected values in the last exons. It is quite possible that there are other independent selection pressures, which may reduce observed GT frequencies in the last exons, making the situation more complex.
Distributions of GT frequencies shown in Fig 4 represent a mixture of three phases (0, 1, and 2) in each species. Separating any of these distributions into three distributions (phase 0, phase 1, and phase 2) reveals the same pattern, which is just more sharply expressed (data not shown).
Comparative analysis of pairs of synonymous codons located at the 3' end of exons, similar to that described earlier in the article, is presented in Fig 5. It can be seen that in many cases the last few 3' positions (1-5), and particularly the last position (1), have a lower frequency of GT-containing codons compared with the frequency of alternative synonymous codons. There are also distance effects in several cases: frequencies of GT-containing codons slightly increase in the 5' direction. However, this was not seen in all studied cases. The data provide some support for the hypothesis, although not as compelling as the data for AG pairs (Fig 3).
Finally we compared differences between expected and observed frequencies of GT using the same logic that was applied to AG analysis earlier. The only difference is that in this case the first position of codons is "less deleterious." The results presented in Table 4 show that the differences between expected and observed values were significantly smaller between the second and the first position of codons, while the difference between the third and the first positions was significant only for C. elegans (Table 4).
It is well known that nucleotides surrounding intron-exon boundaries are essential for correct splicing. The latest data also demonstrate the importance of spliceosome structures.
Recent publications provide good reasons to exercise a cautious approach in assuming possible selection pressures on dinucleotide frequencies.
| FOOTNOTES |
|---|
1 Present address: T. J. Watson Research Center, 19 Skyline Dr., Hawthorne, NY 10532.
| ACKNOWLEDGMENTS |
|---|
We thank A. Fedorov for advice concerning the exon-intron database (EID) and E. Koonin and I. Ruvinsky for useful comments. We are also grateful to an anonymous reviewer for very helpful suggestions.
Manuscript received November 13, 2003; Accepted for publication January 16, 2004.
| LITERATURE CITED |
|---|
BURSET, M., I. A. SELEDTSOV, and V. V. SOLOVYEV, 2000 Analysis of canonical and non-canonical splice sites in mammalian genomes. Nucleic Acids Res. 28:4364-4375.
DURET, L. and N. GALTIER, 2000 The covariation between TpA deficiency, CpG deficiency, and G+C content of human isochores is due to a mathematical artifact. Mol. Biol. Evol. 17:1620-1625.
FAIRBROTHER, W. G., R. F. YEH, P. A. SHARP, and C. B. BURGE, 2002 Predictive identification of exonic splicing enhancers in human genes. Science 297:1007-1013.
KARLIN, S. and J. MRAZEK, 1996 What drives codon choices in human genes. J. Mol. Biol. 262:459-472.
LEWIN, B., 1994 Genes V, p. 914. Oxford University Press, New York.
MANIATIS, T. and B. TASIC, 2002 Alternative pre-mRNA splicing and proteome expansion in metazoans. Nature 418:236-243.
MENDELL, J. T. and H. C. DIETZ, 2001 When the message goes awry: disease-producing mutations that influence mRNA content and performance. Cell 107:411-414.
NISSIM-RAFINIA, M. and B. KEREM, 2002 Splicing regulation as a potential genetic modifier. Trends Genet. 18:123-127.
SAXONOV, S., I. DAIZADEH, A. FEDOROV, and W. GILBERT, 2000 The exon intron database: an exhaustive database of protein-coding intron-containing genes. Nucleic Acids Res. 28:185-190.
URRUTIA, A. S. O. and L. D. HURST, 2001 Codon usage bias covaries with expression breadth and the rate of synonymous evolution in humans, but this is not evidence for selection. Genetics 159:1191-1199.