Genetics, Vol. 152, 269-280, May 1999, Copyright © 1999
Synonymous Rates at the RpII215 Gene of Drosophila: Variation Among Species and Across the Coding Region
Ana Llopart and Montserrat Aguadé
Departament de Genètica, Facultat de Biologia, Universitat de Barcelona, 08071 Barcelona, Spain
Corresponding author:
Ana Llopart, Departament de Genètica, Facultat de Biologia, Universitat de Barcelona, Av. Diagonal 645, 08071 Barcelona, Spain. E-mail: llopart@porthos.bio.ub.es
Communicating editor: W. STEPHAN
 | ABSTRACT |
|---|
The region encompassing the RpII215 gene that encodes the largest component of the RNA polymerase II complex (1889 amino acids) has been sequenced in Drosophila subobscura, D. madeirensis, D. guanche, and D. pseudoobscura. Nonsynonymous divergence estimates (Ka) indicate that this gene has a very low rate of amino acid replacements. Given its low Ka and constitutive expression, synonymous substitution rates are, however, unexpectedly high. Sequence comparisons have allowed the molecular clock hypothesis to be tested. D. guanche is an insular species and it is therefore expected to have a reduced effective size relative to D. subobscura. The significantly higher rate of synonymous substitutions detected in the D. guanche lineage could be explained if synonymous mutations behave as nearly neutral. Significant departure from the molecular clock hypothesis for synonymous and nonsynonymous substitutions was detected when comparing the D. subobscura, D. pseudoobscura, and D. melanogaster lineages. Codon bias and synonymous divergence between D. subobscura and D. melanogaster were negatively correlated across the RpII215 coding region, which indicates that selection coefficients for synonymous mutations vary across the gene. The C-terminal domain (CTD) of the RpII215 protein is structurally and functionally differentiated from the rest of the protein. Synonymous substitution rates were significantly different in both regions, which strongly indicates that synonymous mutations in the CTD and in the non-CTD regions are under detectably different selection coefficients.
SYNONYMOUS mutations have classically been considered to behave as nearly neutral (KIMURA 1983) because they do not alter the primary structure of proteins. According to Ohta's nearly neutral theory (OHTA and KIMURA 1971; OHTA 1972), mutations that satisfy |Nes| ≈ 1, where Ne is the effective population size and s is the selection coefficient, are defined as nearly neutral. In contrast, strictly neutral mutations satisfy |Nes| << 1 for any effective population size. Under a strictly neutral model (KIMURA 1968, 1983; KIMURA and OHTA 1971) the rate of substitutions per year (Ky) is equal to the neutral mutation rate per generation (µg) divided by the generation time (g): Ky = µg/g. Then, assuming a constant neutral mutation rate per generation, strictly neutral mutations should exhibit generation-time effects (OHTA and KIMURA 1971; GILLESPIE 1991
). For slightly deleterious mutations Ky =
(KIMURA and OHTA 1971), and consequently rate constancy will be achieved only if there is a negative correlation between Ne and g. Such a correlation has been reported by CHAO and CARR (1993) for highly diverged species. A direct prediction of the nearly neutral theory is therefore that synonymous substitution rates in different lineages of closely related Drosophila species with equal generation times will depend on the effective population size. Also, testing the molecular clock hypothesis (ZUCKERKANDL and PAULING 1965) by comparing the synonymous rates among different lineages can shed some light on the magnitude of the selection coefficients of mutations at synonymous sites in Drosophila.
Synonymous substitution rates vary extensively among different genes in Drosophila, and this variation is negatively correlated with codon usage bias (SHIELDS et al. 1988
; SHARP and LI 1989
). KLIMAN and HEY 1994
detected a small but significant correlation between the base composition of introns and codon bias among different loci of Drosophila melanogaster, which indicates a residual effect of mutational processes on base composition at synonymous sites. In Drosophila, variation in the strength of natural selection acting on synonymous mutations, usually related to the expression level, has been proposed to explain the observed pattern of variation of codon bias among genes (SHIELDS et al. 1988
; MORIYAMA and HARTL 1993
; KLIMAN and HEY 1994
). Indeed, selective constraints on synonymous sites to ensure amino acid incorporation accuracy and/or to enhance elongation rates in the translation process (KURLAND 1987A
, KURLAND 1987B
; PRECUP and PARKER 1987
; BULMER 1991
; AKASHI 1994
; COMERON and KREITMAN 1998
) have also been proposed to modulate the codon bias of a particular gene. The secondary structure of mRNA could also have some effect on Drosophila synonymous substitutions (COMERON and AGUADE 1996
) as has been suggested in enterobacteria (LAWRENCE et al. 1991
; EYRE-WALKER and BULMER 1993
). Recently, COMERON et al. 1999
have proposed that both the recombinational environment and the length of the coding region also contribute significantly to synonymous divergence and codon bias of a particular gene in Drosophila.
The RNA polymerase II complex is responsible for transcription of protein-encoding genes (MCKNIGHT and YAMAMOTO 1992
). The locus that encodes its largest subunit has been sequenced in several eukaryotic organisms. Extensive homology between the largest subunits of prokaryotic and eukaryotic RNA polymerases has been reported (ALLISON et al. 1985
). The largest subunit of the RNA polymerase II complex has an unusual C-terminal domain (CTD) that consists of a motif of seven amino acids tandemly repeated (CORDEN 1990
). The consensus sequence of the repeat is Tyr-Ser-Pro-Thr-Ser-Pro-Ser. The number of repeats varies among species, but the sequence characteristics are highly conserved among eukaryotes. Its function is considered essential for the viability of a wide range of eukaryotes (NONET et al. 1987
; ALLISON et al. 1988
), and it has been shown recently that it plays an important role in mRNA capping (MCCRACKEN et al. 1997
).
In Drosophila the largest component of the RNA pol II complex is a 215-kD peptide that is encoded by the RpII215 gene (JOKERST et al. 1989
). This gene is supposed to be ubiquitously expressed in cells because of its critical role in the transcriptional process, which, on the other hand, leads to the prediction that it will be a very conserved gene at the amino acid level. The expression pattern and the expected amino acid constraints would support the a priori idea that the RpII215 gene should exhibit a high codon bias. However, as predicted by LI 1987
for the case of absolute linkage among sites, smaller selection coefficients on individual synonymous mutations would be expected from its very long coding region (1889 codons in D. melanogaster; COMERON et al. 1999
). These features make the RpII215 gene an ideal candidate for the study of different aspects of synonymous substitution rates. In particular, the comparison between the palearctic species D. subobscura and the insular species D. madeirensis and D. guanche provides an excellent opportunity to test, by means of the molecular clock hypothesis, the effects on synonymous rates of the expected reduced Ne of the insular species. In addition, we have addressed the question of whether the proposed reduction of Ne in D. melanogaster as compared to D. simulans (AKASHI 1995
, AKASHI 1996
) can also be detected in the RpII215 gene when D. melanogaster is compared to D. subobscura and D. pseudoobscura. Moreover, the long coding region of the RpII215 gene has allowed us to study the pattern of variation of codon bias and synonymous divergence across the gene, where possible sources of variation among genes (such as mutational pattern, recombinational environment, or expression levels) can be ruled out. Finally, the presence of the CTD region in this gene provides a unique chance to explore possible signatures of natural selection in two regions of a gene that are structurally and functionally differentiated (ALLISON et al. 1988
).
 | MATERIALS AND METHODS |
|---|
DNA preparation, cloning, and sequencing strategy:
Two recombinant phages were isolated from a random genomic library of D. subobscura (
subRa111) from Raíces (Canary Islands). Two different sets of probes from D. melanogaster were used separately in the screening (Figure 1): (i) a 1.3-kb EcoRI-PstI and a 1.2-kb PstI-EcoRI fragment (isolated from a recombinant plasmid kindly provided by A. Greenleaf) that included the entire second exon of the RpII215 gene; (ii) a 3.6-kb EcoRI-EcoRI fragment that covered the third and fourth exons of the gene. After Southern blot analysis of the two positive phages, three DNA fragments (a 3.4-kb EcoRI vector and a 4.8-kb EcoRI-EcoRI from one of the phages and a 5.2-kb SalI-SalI from the second phage) were subcloned in pBluescriptII vectors. A set of nested deletions was obtained for each orientation of each subclone (HENIKOFF 1984
). The sequence of both strands of a 7.8-kb region was obtained by manually sequencing each subclone, entirely or partially, using double-stranded DNA and the dideoxy chain termination method (SANGER et al. 1977
).

Figure 1.
Structure of the RpII215 gene in D. melanogaster. Exons are shown as gray boxes. Black lines represent introns and flanking regions. The CTD included in exon 4 is depicted as a white box. Probes used to isolate the RpII215 region in D. subobscura are presented below the gene.
For D. madeirensis and D. guanche, genomic DNA was extracted from adult flies of isofemale lines using a CsCl gradient (BINGHAM et al. 1981
) and a standard small-scale method (ASHBURNER 1989
) with minor modifications, respectively. After digestion of the DNA with a set of restriction enzymes and Southern blot transfer, two different probes that encompassed the 5' and 3' ends of the RpII215 gene of D. subobscura were used separately to perform hybridization: (i) a 0.7-kb ClaI-digested PCR fragment that included 0.6 kb of the noncoding 5' region, the first exon, and a small part of the first intron; (ii) a 0.6-kb SalI-vector fragment within the CTD. In both species, digestion with HindIII produced a single band that showed cross-hybridization with both probes, and this enzyme was then used to construct the corresponding libraries with the
λDASH vector (Stratagene, La Jolla, CA). A 1.8-kb SalI-SalI fragment that partially contained the third and fourth exons of the RpII215 gene of D. subobscura was used as a probe to screen the libraries. In each case, several positive recombinant phages were isolated. After restriction map analysis, the entire RpII215 region was cloned in two DNA fragments for D. madeirensis (7.8-kb EcoRI-HindIII and 4.8-kb EcoRI-EcoRI) and in three fragments for D. guanche (4-kb EcoRI-EcoRI, 0.3-kb EcoRI-EcoRI, and 4.5-kb EcoRI-XbaI). Synthetic oligonucleotides designed on the D. subobscura sequence were used to obtain the complete sequence of the RpII215 region of D. madeirensis (7.1 kb) and D. guanche (6.2 kb) for both strands. For those regions with a high level of divergence, new specific primers were designed. The dideoxy method and double-stranded plasmid DNA were used to manually sequence one strand, while fluorescent dye-terminator chemistry (Perkin Elmer, Norwalk, CT) and an ABI 377 sequencer were used to obtain the sequence of the other strand by cycle sequencing.
For D. pseudoobscura, a highly inbred strain (kindly provided by C. Segarra) and a standard small-scale method (ASHBURNER 1989
) were used to isolate genomic DNA from adult flies. Four overlapping fragments, 1.9, 2.3, 1.4, and 1.8 kb long, were amplified by PCR from genomic DNA. Most of the primers used for the amplifications and for sequencing were designed on the D. subobscura sequence, although some specific oligonucleotides were designed in highly diverged regions. The cycle sequencing method with fluorescent dye-terminator chemistry and an ABI 377 sequencer were used to obtain the sequence of both strands of a 5.7-kb region.
All sequences were assembled using Staden's programs (STADEN 1982
). Sequences newly reported in this study are deposited in the EMBL sequence database library under accession nos. Y18876, Y18877, Y18878, and Y18879.
Species divergence:
The GCG Wisconsin package programs (v 7.3; DEVEREUX et al. 1984
) were used to align the sequences. For coding regions, insertions and deletions were placed by eye to minimize the number of amino acid replacements. The K-Estimator 4.4 program provided by J. Comeron was used to calculate synonymous (Ks) and nonsynonymous (Ka) divergence estimates per site according to COMERON 1995
. This is a modification of the method of LI (1993) that tends to minimize stochastic errors and quantifies more accurately the numbers of transitional and transversional substitutions. In the sliding window analysis, confidence intervals of the estimated divergence values for each window were calculated according to COMERON and AGUADE (1996). For noncoding regions, KIMURA's (1980) two-parameter method was applied to estimate the number of substitutions per site.
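As an illustration of the noncoding estimate, the following is a minimal sketch of the two-parameter distance, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the observed proportions of transitional and transversional differences. The function name and the handling of gaps (sites with any character other than A, C, G, or T are skipped) are assumptions made for this example, not details taken from the study.

```python
import math

def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance for two aligned, equal-length sequences."""
    purines = {"A", "G"}
    valid = set("ACGT")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in valid or b not in valid:
            continue  # assumption: gaps and ambiguous bases are ignored
        sites += 1
        if a == b:
            continue
        if (a in purines) == (b in purines):
            transitions += 1      # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1    # purine<->pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    term1 = 1.0 - 2.0 * p - q
    term2 = 1.0 - 2.0 * q
    if term1 <= 0.0 or term2 <= 0.0:
        raise ValueError("sequences too divergent (saturated) for this correction")
    return -0.5 * math.log(term1) - 0.25 * math.log(term2)
```

The saturation check mirrors the practical problem that the correction is undefined once the observed differences become too numerous.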
Codon bias:
The codon bias index (CBI; MORTON 1993
) was used to estimate the degree of biased usage of synonymous codons (codon bias) of the RpII215 coding region in each species. This measure exhibits a much lower dependence on the number of analyzed codons and a lower dispersion due to sampling than "scaled χ²" (χ²/L; SHIELDS et al. 1988). Moreover, it is independent of the length of the coding region (COMERON and AGUADE 1998). For the subobscura cluster species (D. subobscura, D. madeirensis, and D. guanche), codons were classified as preferred and unpreferred according to AKASHI and SCHAEFFER (1997).
Tests of the molecular clock hypothesis:
Two different kinds of tests were applied to the molecular clock hypothesis (ZUCKERKANDL and PAULING 1965
). Tajima's relative rate tests with known outgroup (1D and 2D; TAJIMA 1993
) were applied to test equal rates of evolution between lineages: D. guanche was used as the outgroup between D. subobscura and D. madeirensis, and D. pseudoobscura was used as the outgroup between D. subobscura and D. guanche. Both in the D. subobscura/D. madeirensis/D. guanche and in the D. subobscura/D. pseudoobscura/D. melanogaster comparisons, the index of dispersion (R(t); GILLESPIE 1989, 1991) was calculated to test whether the number of substitutions on a lineage is Poisson-distributed and whether evolutionary rates across lineages are constant (KIMURA and OHTA 1971
). In the latter comparison and to avoid negative values, the number of substitutions on a given branch of the phylogeny built by the three species was calculated by comparison of each sequence with a generated ancestral sequence that was constructed using a parsimony criterion (ZENG et al. 1998
). Computer simulations that considered the multiple hits effect were conducted to obtain significance levels of the estimated R values (BULMER 1989
; GILLESPIE 1989
, GILLESPIE 1991
) in accordance with ZENG et al. 1998
.
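The relative rate test with a known outgroup can be sketched as follows. In the 1D method, m1 and m2 count the sites at which only the first or only the second ingroup sequence differs from the other two, and (m1 - m2)²/(m1 + m2) is compared to a χ² distribution with one degree of freedom. The function name and the gap handling below are assumptions for illustration; the 2D method, which treats transitions and transversions separately, is not shown.

```python
import math

def tajima_1d(seq1: str, seq2: str, outgroup: str):
    """Tajima's 1D relative rate test on three aligned sequences.

    Returns (m1, m2, chi2, p). Sites where all three sequences differ,
    or that contain a non-ACGT character, are skipped.
    """
    valid = set("ACGT")
    m1 = m2 = 0
    for a, b, c in zip(seq1.upper(), seq2.upper(), outgroup.upper()):
        if a not in valid or b not in valid or c not in valid:
            continue
        if a == b:
            continue              # no difference between the two ingroup lineages
        if b == c:
            m1 += 1               # change assigned to lineage 1
        elif a == c:
            m2 += 1               # change assigned to lineage 2
    if m1 + m2 == 0:
        raise ValueError("no informative sites")
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return m1, m2, chi2, p
```

With toy sequences in which lineage 1 carries four unique changes and lineage 2 none, the statistic is 4.0 and the probability is just under 0.05.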
 | RESULTS |
|---|
Molecular evolutionary rates:
The RpII215 region was sequenced in four species of the obscura group (Figure 2): D. subobscura (7816 bp), D. madeirensis (7103 bp), D. guanche (6220 bp), and D. pseudoobscura (5666 bp). The structure of the gene in D. subobscura consists of four exons separated by three introns whose lengths are 1029, 77, and 108 bp, respectively. The different exons presented the same length in the four species compared (81, 2244, 2245, and 1097 bp long, respectively). In D. guanche, the first large intron presented a 783-bp deletion in the central region relative to D. subobscura and D. madeirensis. This central region of D. subobscura showed extensive similarity to an 823-bp region that is defined as uncharacterized highly repetitive sequence of the same species (EMBL accession no. AF043638), and is also described in D. madeirensis (AF043637) and in D. guanche (AF043639). Table 1 gives a summary of divergence estimates between species for the RpII215 region. Introns and flanking regions could only be aligned between the obscura group species. Divergence estimates in these noncoding regions were generally higher than the corresponding synonymous estimates.
D. madeirensis and D. guanche are insular species geographically restricted to Madeira and to the Canary Islands, respectively. In contrast, D. subobscura is a palearctic species, and consequently it is expected to have a larger Ne than the insular species. We have analyzed the effect of the expected smaller Ne of D. madeirensis and D. guanche on the nucleotide substitution rates by applying Tajima's relative rate tests (TAJIMA 1993
) between these species and D. subobscura. On the other hand, AKASHI 1995
, AKASHI 1996
proposed that D. melanogaster exhibits the effects of a reduction of Ne on nucleotide substitutions when compared to D. simulans. We have addressed the question of whether those effects in D. melanogaster are also detectable when it is compared to the D. subobscura and D. pseudoobscura lineages. Relative rate tests, however, use the outgroup species only to assign nucleotide substitutions to each internal lineage, and substitution rates cannot be tested in the external branch. The index of dispersion, or variance-to-mean ratio (KIMURA 1983; GILLESPIE 1989, 1991), also known as R(t), allowed us to test the molecular clock hypothesis (ZUCKERKANDL and PAULING 1965) in a phylogeny built by D. subobscura, D. pseudoobscura, and D. melanogaster.
Relative rate tests between D. subobscura, D. madeirensis, and D. guanche:
Tajima's relative rate tests (1D and 2D methods; TAJIMA 1993
) were applied to coding (synonymous and nonsynonymous) and noncoding regions separately. For variable sites, only those positions with the same nucleotide in two of the three sequences compared were considered. Because variable codons in the comparisons between D. subobscura, D. madeirensis, and D. guanche were affected by a single change, each observed substitution in the coding region of the RpII215 gene could be classified unambiguously as synonymous or nonsynonymous. Results from the 1D method are shown in Table 2. No significant departure from the molecular clock hypothesis was detected between the D. subobscura and D. madeirensis lineages using D. guanche as the outgroup, either when the total number of observed substitutions was considered (1D method) or when transitional and transversional changes were analyzed separately (2D method). In contrast, the D. guanche lineage exhibited a significantly larger number of substitutions in the coding region than the D. subobscura lineage using D. pseudoobscura as the outgroup, with both the 1D method (P = 0.016) and the 2D method (P = 0.015). The difference in substitution rates in the coding region between the two lineages can be attributed to synonymous substitutions (P = 0.007 and P = 0.014 for the 1D and 2D methods, respectively). The noncoding region, in contrast, seems to fit the constant rate hypothesis, which suggests that noncoding and synonymous substitutions in the RpII215 region behave differently.
Index of dispersion among D. subobscura, D. pseudoobscura, and D. melanogaster:
We used Gillespie's R(t) to test the molecular clock hypothesis in the phylogeny built by D. subobscura, D. pseudoobscura, and D. melanogaster (see Figure 2). Analysis focused on the RpII215 coding region because noncoding regions could not be aligned between any species of the obscura group and D. melanogaster. The numbers of synonymous and nonsynonymous substitutions in each lineage were calculated using the inferred ancestral sequence (sequence 0 in Figure 2) of both obscura species according to ZENG et al. 1998
. The lineage from the D. subobscura/D. pseudoobscura split to D. melanogaster was defined as the D. melanogaster lineage. LANGLEY and FITCH 1974
pointed out the important contribution of lineage and residual effects on R values. Lineage effects, like the generation-time effect and different branch lengths, were considered constant within a lineage and were removed by the method proposed by GILLESPIE 1989
, GILLESPIE 1991
. In the present study, we used the synonymous and nonsynonymous weights calculated in ZENG et al. 1998
from sequences of 24 genes in the same three species. The R values estimated for synonymous and nonsynonymous substitutions in the RpII215 gene are shown in Table 3.
Table 3.
Estimated numbers of synonymous and nonsynonymous substitutions and indexes of dispersion [R(t)]
A significant departure from the expected Poisson process was detected for synonymous (P = 0.025) and nonsynonymous (P = 0.008) substitutions in the RpII215 gene. The D. melanogaster lineage presented an excess of both kinds of substitutions. A similar result was obtained when the numbers of synonymous and nonsynonymous substitutions on each branch were estimated directly from the corrected distances between real sequences (GILLESPIE 1989): the calculated R(t) values (9.74 and 6.36 for synonymous and nonsynonymous substitutions, respectively) were even higher than those obtained by means of the constructed ancestral sequence.
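The core of the index of dispersion can be illustrated with a deliberately simplified sketch: per-lineage substitution counts are divided by lineage weights, and the sample variance of the scaled counts is divided by their mean. This is only the variance-to-mean idea. The estimator actually used, its weighting scheme, and the simulation-based significance levels that account for multiple hits are more involved and are not reproduced here; the function name and the example counts are hypothetical.

```python
from statistics import mean, variance

def index_of_dispersion(counts, weights=None):
    """Variance-to-mean ratio of (optionally weight-scaled) lineage counts.

    Simplified illustration: under a Poisson clock with correct weights,
    the ratio is expected to be close to 1.
    """
    if weights is None:
        weights = [1.0] * len(counts)
    if len(counts) != len(weights) or len(counts) < 2:
        raise ValueError("need at least two lineages and matching weights")
    scaled = [c / w for c, w in zip(counts, weights)]
    m = mean(scaled)
    if m <= 0:
        raise ValueError("mean number of substitutions must be positive")
    return variance(scaled) / m   # sample variance (n - 1 denominator)
```

For example, `index_of_dispersion([2, 4, 6])` gives 1.0, whereas counts that are far more uneven than a Poisson process would produce give values well above 1.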
Analysis of preferred and unpreferred codons:
It has been proposed that mutations at synonymous sites behave near neutrality (OHTA and KIMURA 1971
; OHTA 1972
; KIMURA 1983
), and their fate in the population would therefore depend on the effective population size. Codons within a synonymous family can be classified as major (preferred) and nonmajor (unpreferred) codons according to AKASHI 1995
. Selection coefficients for mutations from preferred to unpreferred codons (unpreferred changes) were calculated by AKASHI 1995
, AKASHI 1997
, who suggested deleterious and beneficial effects on fitness of unpreferred and preferred changes, respectively. The observed numbers of preferred and unpreferred changes at the RpII215 coding region in the D. subobscura, D. madeirensis, and D. guanche lineages were studied and the results are summarized in Table 4. Preferred and unpreferred changes were assigned to one lineage by comparison to an outgroup sequence. For each variable site, the ancestral nucleotide was the one present in two of the three sequences compared. Positions with different nucleotides in the three sequences were not considered in this analysis. In the comparison between D. subobscura and D. guanche, the number of unpreferred changes in the D. guanche lineage was six times larger than the number of preferred changes. In contrast, preferred and unpreferred changes were equally frequent in the D. subobscura lineage. We tested the ratio of preferred to unpreferred changes between D. subobscura and each of the insular species (D. madeirensis and D. guanche) by applying a G-test of independence (Table 4). There was a significant excess (P = 0.017) of unpreferred changes in the D. guanche lineage as compared to the D. subobscura lineage. Otherwise, the D. madeirensis and D. subobscura lineages showed an equivalent ratio of preferred to unpreferred changes. Both results are consistent with our previous analysis using Tajima's relative rate test and support a reduction in the intensity of selection on synonymous mutations at the D. guanche lineage caused by its smaller Ne.
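A G-test of independence on a 2 × 2 table of preferred and unpreferred changes in two lineages can be sketched as below. The statistic is G = 2 Σ O ln(O/E), compared to a χ² distribution with one degree of freedom. No continuity or Williams correction is applied, which is an assumption of this sketch rather than a statement about the analysis in Table 4, and the counts in the usage note are hypothetical.

```python
import math

def g_test_2x2(table):
    """G-test of independence for a 2x2 table [[a, b], [c, d]].

    Returns (G, p), with p taken from a chi-square distribution with 1 df.
    """
    (a, b), (c, d) = table
    total = a + b + c + d
    if total == 0:
        raise ValueError("empty table")
    row_sums = (a + b, c + d)
    col_sums = (a + c, b + d)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # a zero cell contributes nothing to the sum
                expected = row_sums[i] * col_sums[j] / total
                g += observed * math.log(observed / expected)
    g = max(2.0 * g, 0.0)  # guard against tiny negative rounding error
    p = math.erfc(math.sqrt(g / 2.0))
    return g, p
```

A call such as `g_test_2x2([[12, 2], [7, 7]])`, with rows as lineages and columns as unpreferred and preferred changes, returns the statistic and its probability.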
Table 4.
Observed numbers of unpreferred and preferred changes at the RpII215 coding region in the subobscura cluster species of Drosophila
Divergence and codon bias across the coding region:
Synonymous codon usage (codon bias) for the entire RpII215 coding region was studied using the CBI (MORTON 1993
). The CBI values for the species D. subobscura, D. madeirensis, D. guanche, and D. pseudoobscura were 0.505, 0.502, 0.475, and 0.522, respectively. For D. melanogaster (JOKERST et al. 1989
) the CBI value was 0.411. The distribution of synonymous substitutions across the RpII215 coding region was studied in the comparisons between D. subobscura, D. pseudoobscura, and D. melanogaster. We performed a sliding window analysis using five different window sizes: 360, 450, 540, 630, and 720 nucleotides. Figure 3 shows the analysis for a window size of 360 nucleotides. In the comparisons between D. melanogaster and either D. subobscura or D. pseudoobscura, for all window sizes there was a variable number of windows for which synonymous divergence could not be calculated because of the saturation of the observed synonymous substitutions per site (NA windows). All these windows encompassed, totally or partially, the CTD region that begins at nucleotide 4741 of the RpII215 coding region of D. melanogaster (nucleotide 4735 in the obscura group species). We generated a null distribution to calculate the probability of detecting the observed number of NA windows by chance. Codon positions of the D. melanogaster-D. subobscura comparison were randomized and the same sliding window analysis of synonymous divergence that was applied to the original data set was performed. The number of NA windows from the randomized sequences was compared to the observed number obtained from the data set. For all window sizes, the probability of obtaining by chance a number of NA windows equal or higher than that observed was
0.002, which indicates a heterogeneous distribution of synonymous substitutions across the RpII215 coding region.
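The randomization procedure can be sketched in simplified form. Here each codon is reduced to a single number (for instance 1 if it carries a synonymous difference and 0 otherwise), and a window is flagged when the proportion of such codons reaches a threshold; the flag is a stand-in for a window in which synonymous divergence cannot be calculated. The per-codon coding, the threshold, the step size, and the function names are all assumptions of this example, and window sizes are given in codons rather than nucleotides.

```python
import random

def count_flagged_windows(values, window, step, threshold):
    """Count sliding windows whose mean per-codon value reaches the threshold."""
    flagged = 0
    for start in range(0, len(values) - window + 1, step):
        chunk = values[start:start + window]
        if sum(chunk) / window >= threshold:
            flagged += 1
    return flagged

def randomization_test(values, window, step, threshold, n_perm=1000, seed=0):
    """Shuffle codon positions and compare flagged-window counts.

    Returns (observed, p), where p is the fraction of shuffled data sets
    with at least as many flagged windows as the observed data.
    """
    rng = random.Random(seed)
    observed = count_flagged_windows(values, window, step, threshold)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if count_flagged_windows(shuffled, window, step, threshold) >= observed:
            hits += 1
    return observed, hits / n_perm
```

When the flagged codons are clustered at one end of the sequence, shuffling disperses them and few randomized data sets reproduce the observed number of flagged windows, which is the logic behind the heterogeneity conclusion.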

Figure 3.
Sliding window analysis across the RpII215 coding region (in the x axis) of the number of synonymous substitutions per site (Ks in the y axis). Exons of the RpII215 gene are depicted as black rectangles below the graph. Ninety-five percent confidence intervals of the synonymous divergence estimates are shown as dotted lines. Window size, 360 nucleotides. sub, D. subobscura; pse, D. pseudoobscura; mel, D. melanogaster.
Figure 4 shows the codon usage bias across the RpII215 coding region of D. melanogaster, D. subobscura, and D. pseudoobscura. The distribution across the coding region of the codon bias estimator CBI was fairly similar in both species of the obscura group. The sequence of D. melanogaster, otherwise, showed a region (from approximately nucleotide 2500 to 4000 of the coding region) with a different pattern than that observed in the obscura species. The negative correlation between synonymous divergence and codon bias among genes is well known (SHIELDS et al. 1988
; SHARP and LI 1989
). This correlation has been usually associated with differences in the expression level (SHIELDS et al. 1988
; MORIYAMA and HARTL 1993
). However, the possible correlation between synonymous divergence and codon bias across a coding region cannot be explained by differences in the level of expression (COMERON and AGUADE 1996
). For adjacent windows of 540 nucleotides, synonymous divergence between D. melanogaster and D. subobscura across the RpII215 coding region was strongly negatively correlated with the D. melanogaster codon bias (Kendall's nonparametric correlation τ = -0.911; P = 0.0002) but not with that of D. subobscura (τ = 0.0; P > 0.99; see Figure 5). The significant negative correlation held for window sizes of 360 and 630 nucleotides, with probability values of 0.0041 and 0.013, respectively. These results are consistent with the observation that most synonymous substitutions between D. subobscura and D. melanogaster were located in the D. melanogaster lineage. In fact, there was a negative correlation across the coding region between the D. melanogaster codon bias and the number of synonymous substitutions per site in the D. melanogaster lineage (τ = -0.644; P = 0.0095). In neither the D. subobscura nor the D. pseudoobscura lineage was there a significant relationship between codon bias and Ks across the RpII215 coding region (P = 0.53 and P = 0.94, respectively). These results support the possible acceleration of the D. melanogaster lineage detected by the significant index of dispersion.
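Kendall's rank correlation between per-window codon bias and per-window synonymous divergence can be computed as in the sketch below. It implements the simple tau-a form, (concordant - discordant) / (n(n - 1)/2), which makes no correction for ties; whether the original analysis used this variant is an assumption, and the significance calculation is not shown.

```python
def kendall_tau_a(x, y):
    """Kendall's tau-a for two equal-length sequences of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1   # pair ordered the same way in x and y
            elif sign < 0:
                discordant += 1   # pair ordered in opposite ways
            # tied pairs count toward neither total
    return (concordant - discordant) / (n * (n - 1) / 2)
```

Passing the window values of codon bias as `x` and the matching Ks estimates as `y` yields a value between -1 (perfectly opposite rankings) and 1 (identical rankings).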

Figure 4.
Sliding window analysis of the codon bias (CBI in y axis) across the RpII215 gene (exons are depicted as black rectangles below the graph) of the D. subobscura (sub), D. pseudoobscura (pse), and D. melanogaster (mel) species. Window size, 360 nucleotides.
Comparison between the CTD and non-CTD regions:
Gillespie's index of dispersion of synonymous substitutions was calculated separately for the CTD and the non-CTD regions of the RpII215 gene. Initially, two different sets of synonymous weights were used to compensate for lineage effects: (i) weights calculated by ZENG et al. 1998
, using 24 genes sequenced in the same three species (D. subobscura, D. pseudoobscura, and D. melanogaster), and (ii) weights of the entire RpII215 gene. The numbers of synonymous substitutions in each lineage and the calculated R values for the two regions are summarized in Table 5. The CTD region has accumulated more synonymous substitutions in the D. melanogaster lineage than expected according to both the ZENG et al. 1998
survey (R = 8.56, P = 0.002) and the whole RpII215 coding region (R = 3.91, P = 0.042). In contrast, the non-CTD region did not show a significant departure from the general tendency described in ZENG et al. 1998
. Finally, we addressed the possible incompatibility of synonymous rates between the non-CTD and the CTD regions in the three lineages studied. The CTD region showed a significant R value (P = 0.018) when the weights of the non-CTD region were used. The probability value was much lower (R = 19.31, P < 0.001) for the non-CTD region when the CTD weights were used. Forces with different intensity seem to have driven the synonymous evolution of both regions of the RpII215 gene.
Table 5.
Estimated numbers of synonymous substitutions and indexes of dispersion [R(t)] for the non-CTD and CTD regions of the RpII215 gene
 | DISCUSSION |
|---|
The observed low level of nonsynonymous divergence in the RpII215 gene indicates that purifying selection plays an important role in the evolution of the corresponding protein. In contrast, the synonymous substitution rate is moderately high. The average numbers of nonsynonymous (Ka) and synonymous (Ks) substitutions per site between D. melanogaster and the obscura species are 0.02 and 0.905, respectively, while the reported averages for 24 genes are 0.08 and 0.81, respectively (ZENG et al. 1998
). In the comparison between D. subobscura and D. pseudoobscura, which is not affected by the smaller Ne of D. melanogaster, the Ks estimate for the RpII215 gene is usually higher than those observed among genes with low levels of nonsynonymous divergence. In fact, of the 14 genes with Ka = 0.0094 or lower (ZENG et al. 1998
), 11 showed Ks values lower than the RpII215 estimate. The RpII215 gene encodes the largest component of the RNA pol II complex and, a priori, it is therefore expected to have a ubiquitous expression. Moreover, the reduced number of amino acid replacements in its coding region suggests that accuracy acting at the translational level may play a significant role in shaping its codon bias (AKASHI 1994
). Unexpectedly, however, the RpII215 gene of D. melanogaster showed low bias in codon usage. Recently, a positive correlation between synonymous divergence and the length of the coding region (COMERON 1997
; COMERON et al. 1999
) as well as a negative correlation between the degree of codon bias and the length of the coding region (COMERON 1997
; MORIYAMA and POWELL 1998
; COMERON et al. 1999
) have been found. COMERON et al. 1999
have proposed two different models to explain the correlation between the length of the coding region, the codon bias, and the synonymous divergence for the entire range of recombination rates in Drosophila. In these models selection on synonymous mutations would act less efficiently on long genes than on short genes. The RpII215 gene has a very long coding region (1889 codons in D. melanogaster), within the 5% longest coding regions sequenced in this species. Our observation of low codon bias and high synonymous divergence in the RpII215 gene would be consistent with those predicted for very long coding regions (COMERON et al. 1999
).
Evolutionary rates of the RpII215 gene:
If we assume that substitutions in noncoding regions are neutral, they should show generation-time effects (OHTA and KIMURA 1971
; OHTA 1993
) and be independent of changes in Ne. According to our results from TAJIMA's (1993) relative rate tests, the numbers of substitutions at the RpII215 noncoding region are not significantly different between the D. subobscura and D. guanche lineages. It is therefore likely that both species have a similar generation time and mutation rate. On the other hand, according to OHTA 1972
, nearly neutral mutations have selection coefficients close to the inverse of the effective population size (|Nes| ≈ 1), while effectively neutral mutations satisfy the inequality |Nes| << 1. Mutations on synonymous sites in Drosophila are under the influence of weak selection (AKASHI 1995
, AKASHI 1996
). The amount of synonymous mutations that could be maintained in a population and their probability of fixation would therefore depend on Ne (KIMURA 1983
). A reduction of the effective population size would increase the fraction of mutations that are considered strictly neutral.
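The dependence of fixation on Ne described above can be illustrated numerically. The following is a minimal sketch, not part of the original analysis, that uses Kimura's diffusion approximation for the fixation probability of a new semidominant mutation in a diploid population and assumes, for simplicity, that the census size equals Ne. The function name and the example values of s and Ne are illustrative assumptions.

```python
import math

def relative_fixation_rate(s, ne):
    """Fixation probability of a new mutation relative to a neutral one.

    Kimura's diffusion approximation for a semidominant mutation with
    selection coefficient s (negative if deleterious), assuming the census
    size equals the effective size ne (a simplifying assumption).
    Very large values of |4 * ne * s| are not guarded against overflow.
    """
    if s == 0:
        return 1.0
    # u = (1 - exp(-2s)) / (1 - exp(-4 * ne * s)); expm1 keeps precision for small s
    u = -math.expm1(-2.0 * s) / -math.expm1(-4.0 * ne * s)
    # A neutral mutation fixes with probability 1 / (2 * ne)
    return 2.0 * ne * u

# Hypothetical weakly deleterious mutation evaluated at three effective sizes
for ne in (1e4, 1e5, 1e6):
    print(f"Ne = {ne:.0e}, |Ne*s| = {ne * 1e-6:.2f}, "
          f"relative rate = {relative_fixation_rate(-1e-6, ne):.3f}")
```

Under these assumptions, the same weakly deleterious mutation behaves almost neutrally when |Nes| << 1 and is fixed far less often than a neutral mutation once |Nes| approaches 1, which is the sense in which a smaller Ne enlarges the effectively neutral fraction.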
D. guanche is restricted to very specific locations of the Canary Islands. As detected by the significant relative rate tests, the rate of synonymous substitutions at the RpII215 gene was higher in the D. guanche than in the D. subobscura lineage. Also, the R value for synonymous substitutions (R = 5.51, P = 0.01), estimated using weights from the five genes available in D. subobscura, D. madeirensis, and D. guanche species (Adh, Adhr, MARFANY and GONZALEZ-DUARTE 1993
; Gpdh, Sod, BARRIO and AYALA 1997
; and rp49, RAMOS-ONSINS et al. 1998
), was consistent with the relative rate test results. We propose, therefore, that this higher rate of synonymous substitutions in the D. guanche lineage may be caused by a smaller effective population size. This smaller Ne, probably associated with both the origin of D. guanche and its current distribution, could have increased the fixation rate of the nearly neutral synonymous mutations in the RpII215 gene. As argued above, the fate of mutations in noncoding regions would not have been affected because their selection coefficients would be much smaller (s << 1/Ne) and therefore their fate would not depend on Ne. A higher rate of synonymous substitutions in the D. guanche lineage as compared to the D. subobscura lineage might also be caused by a lower recombination rate at the RpII215 region, and therefore be equivalent to a lower Ne (HILL and ROBERTSON 1966
; FELSENSTEIN 1974
) in D. guanche. Changes in the recombinational map between closely related species (TRUE et al. 1996
) in centromere- or telomere-proximal regions have been recently described. The cytological location of the RpII215 region on the sex chromosome is equivalent (band 10A) in the three species of the subobscura cluster (SEGARRA and AGUADE 1992
), far from the centromeric and telomeric regions. Although we cannot exclude changes in the recombination rate between these species, the smaller Ne of D. guanche seems a more plausible explanation of the higher synonymous substitution rate in this lineage.
The proposed smaller Ne of D. guanche should have affected other regions of the genome. Analysis of the other five genes sequenced in the three species of the subobscura cluster revealed that the numbers of synonymous differences between D. subobscura and D. guanche were low (12, 15, 8, 5, and 2 for Adh, Adhr, Gpdh, Sod, and rp49, respectively) as compared to the 72 synonymous substitutions found in the RpII215 gene. Adhr, the gene with the highest number of synonymous substitutions, exhibited the same tendency (TAJIMA's 1D method, P = 0.075) to accumulate more synonymous changes in the D. guanche than in the D. subobscura lineage.
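For readers unfamiliar with the relative rate test cited above, the following is a minimal sketch of the 1D statistic of TAJIMA (1993) for two ingroup sequences and an outgroup. It counts all aligned sites, whereas the analyses reported here were restricted to particular classes of sites (e.g., synonymous); the function name and the toy sequences are illustrative assumptions, not data from this study.

```python
import math

def tajima_1d(seq1, seq2, outgroup):
    """Tajima's (1993) 1D relative rate test on three aligned sequences.

    m1 counts sites where only seq1 differs (seq2 matches the outgroup);
    m2 counts sites where only seq2 differs (seq1 matches the outgroup).
    Under equal rates, (m1 - m2)^2 / (m1 + m2) is approximately
    chi-square distributed with 1 degree of freedom.
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    m1 = m2 = 0
    for a, b, c in zip(seq1, seq2, outgroup):
        if "-" in (a, b, c):      # skip alignment gaps
            continue
        if a != b and b == c:
            m1 += 1
        elif a != b and a == c:
            m2 += 1
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Survival function of a chi-square with 1 d.f.: erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return m1, m2, chi2, p_value

# Toy alignment (hypothetical): six changes unique to seq1, one unique to seq2
print(tajima_1d("CCCCCCAAAA", "AAAAAAAAAT", "AAAAAAAAAA"))
```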
The smaller effective population size of D. guanche as compared to D. subobscura would have reduced the effectiveness of natural selection on synonymous mutations in that lineage. An equivalent effect in D. melanogaster as compared to D. simulans was reported by AKASHI 1996
. A reduction in the effectiveness of natural selection on synonymous sites would have caused the observed significant excess of unpreferred changes at the RpII215 gene in the D. guanche lineage. Our results, based on the analysis of preferred and unpreferred codons, are consistent with measures of codon bias in the entire RpII215 coding region in both D. subobscura and D. guanche, which indicates that natural selection on synonymous mutations has been less effective in the D. guanche lineage. The constancy of the ratio of preferred to unpreferred changes (see Table 4) was tested with a G-test using the number of nucleotide differences between sequences. Although this ratio is expected to differ between fixed and polymorphic changes, the ratio of total preferred to unpreferred changes (fixed and polymorphic) is predicted to be the same for lineages with equivalent branch lengths, Ne values, and generation times. Under the assumption of equal branch lengths and generation times, as would be the case for the D. subobscura and the D. guanche lineages when compared to D. pseudoobscura, the significant result from the G-test would also point to a different Ne in D. subobscura and D. guanche.
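The G-test of homogeneity mentioned above can be sketched as follows for a 2 x 2 table of preferred and unpreferred changes in two lineages. This is a minimal illustration without any continuity or Williams correction, which may differ from the exact procedure used for Table 4; the function name and the counts in the example are hypothetical and are not taken from that table.

```python
import math

def g_test_2x2(table):
    """G-test of independence for a 2 x 2 contingency table.

    table is ((a, b), (c, d)), e.g. rows = lineages and
    columns = (preferred, unpreferred) changes.
    Returns the G statistic and its P value (chi-square, 1 d.f.).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)
    g = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            expected = row_totals[i] * col_totals[j] / total
            if observed > 0:      # empty cells contribute nothing to G
                g += 2.0 * observed * math.log(observed / expected)
    g = max(g, 0.0)               # guard against tiny negative rounding error
    p_value = math.erfc(math.sqrt(g / 2.0))
    return g, p_value

# Hypothetical counts: (preferred, unpreferred) changes in two lineages
print(g_test_2x2(((20, 5), (5, 20))))
```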
In the D. madeirensis lineage neither noncoding nor synonymous substitutions departed from molecular clock expectations when compared to the D. subobscura lineage. These results, though consistent with those found for the rp49 region (RAMOS-ONSINS et al. 1998
), are quite unexpected, considering that D. madeirensis, like D. guanche, is currently an insular species. Although the A (=X) chromosome shows some structural reorganizations between D. subobscura and D. madeirensis (PAPACEIT and PREVOSTI 1989
), the cytological location of the RpII215 gene does not change between these two species. The short divergence time between D. subobscura and D. madeirensis (0.61 × 10^6 years according to RAMOS-ONSINS et al. 1998
) could explain our results, considering that the average fixation time for neutral mutations is 4Ne generations and that this time increases exponentially for deleterious mutations (KIMURA 1983
). The much longer divergence time between D. subobscura and D. guanche would have allowed us, therefore, to detect the effect of a smaller Ne on synonymous substitution rates.
The variance-to-mean ratios (R) calculated for nonsynonymous and synonymous substitutions at the RpII215 gene were significantly higher than one. A reduction or fluctuation of Ne in the D. melanogaster lineage (ZENG et al. 1998
) could have caused the observed overdispersion. Congruently, the CBI measures of codon bias of the RpII215 gene in the obscura group species were systematically higher than in D. melanogaster, suggesting that the effectiveness of natural selection acting on synonymous sites might be different between the D. melanogaster and the obscura lineages. We conclude, therefore, that mutations on synonymous sites at the RpII215 gene are indeed nearly neutral and hence sensitive to changes in Ne.
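The index of dispersion discussed above can be sketched as the ratio of the variance to the mean of the number of substitutions per lineage, after dividing each count by a lineage weight that absorbs lineage effects. The weighting shown here is a simplified assumption and not necessarily the scheme used in this study, significance would in practice be assessed by simulation, and the counts and weights in the example are hypothetical.

```python
from statistics import mean, variance

def index_of_dispersion(counts, weights=None):
    """Variance-to-mean ratio of substitution counts across lineages.

    counts  : number of substitutions inferred on each lineage
    weights : optional lineage weights (assumed here to rescale the counts
              so that lineage effects are removed); defaults to 1 for all
    A value near 1 is expected under a Poisson molecular clock;
    values well above 1 indicate overdispersion.
    """
    if weights is None:
        weights = [1.0] * len(counts)
    adjusted = [c / w for c, w in zip(counts, weights)]
    # Sample variance (n - 1 denominator) over the adjusted counts
    return variance(adjusted) / mean(adjusted)

# Hypothetical counts for three lineages, without and with lineage weights
print(index_of_dispersion([10, 20, 30]))
print(index_of_dispersion([10, 20, 30], weights=[0.5, 1.0, 1.5]))
```

In the second call the weights fully account for the differences among lineages, so no residual dispersion remains.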
Different selection coefficients on synonymous mutations across the RpII215 gene:
It is widely accepted that in Drosophila there is a negative correlation between synonymous substitution rates and codon bias (SHIELDS et al. 1988
; SHARP and LI 1989
). SHIELDS et al. 1988
suggested that in some Drosophila genes there would be a positive correlation between codon bias and expression levels, which would indicate stronger selection on highly expressed genes. Differences in codon bias among genes could then be explained by different expression levels, which suggests a wide range of selection coefficients on synonymous mutations. The length of the RpII215 coding region (5667 bp) and its high level of synonymous divergence make this gene well suited to studying the variation of codon bias and synonymous divergence across a coding region. Synonymous divergence (Ks) between D. subobscura and D. melanogaster correlates negatively with the D. melanogaster codon bias across the RpII215 coding region, which suggests that the selection coefficients on synonymous mutations may vary not only among genes but also within a particular gene. This variation of selection coefficients within the RpII215 gene is supported by the observed heterogeneous distribution of synonymous divergence across the coding region and cannot be explained by different levels of expression, because all parts of the coding region are expressed together. The intragenic analysis of codon bias and synonymous divergence allowed us to study at what level natural selection would most probably act on synonymous mutations at this gene. According to a mutation-selection-drift theory (BULMER 1991
), natural selection could modulate synonymous codon usage to enhance translational efficiency (translational accuracy and/or elongation rates) or to maintain the mRNA secondary structure (HASEGAWA et al. 1979
; STEPHAN and KIRBY 1993
; PARSCH et al. 1997
). Conflicting selection pressures on synonymous mutations, as predicted by selection on mRNA structure, would prevent a negative correlation between Ks and codon bias (EYRE-WALKER and BULMER 1993
). The observed correlation between codon bias and synonymous divergence across the RpII215 coding region indicates that selection acts to enhance translational efficiency rather than to maintain the mRNA secondary structure. BULMER 1991
suggested that selection at the level of translational accuracy would generate a negative correlation between codon bias and the rate of nonsynonymous divergence (Ka). Equivalent results are then expected along a given coding region if there is a heterogeneous efficiency of selection on synonymous mutations (COMERON and AGUADE 1996
). There is a nearly significant negative correlation between codon bias of D. melanogaster and nonsynonymous divergence between D. melanogaster and D. subobscura along the RpII215 coding region (correlation coefficient = -0.45; P = 0.052; window size, 540 nucleotides). In agreement with the generally low rate of nonsynonymous substitutions, this correlation suggests that selection would contribute to shaping the codon bias of the RpII215 gene by enhancing the accuracy of translation.
We have already pointed out that changes in the effective size of populations could affect the fate of mutations differently, depending on their selection coefficients. We have also shown that synonymous selection coefficients vary across the RpII215 coding region. The estimated variance-to-mean ratios (index of dispersion; GILLESPIE 1989
, GILLESPIE 1991
) for the CTD and the non-CTD regions of the RpII215 gene support the proposal that selection coefficients of synonymous mutations are detectably different in these two regions. Indeed, the synonymous substitution rates of the non-CTD region did not show a significant departure from the tendency described in ZENG et al. 1998
. In contrast, the proposed reduction or fluctuation of the D. melanogaster Ne would have had a stronger effect on the CTD region. The CTD is a highly conserved structure with an essential function among a wide range of organisms (NONET et al. 1987
). It contains several amino acids (primarily serine residues but also threonine and tyrosine residues to a lesser degree) that can be phosphorylated (ZHANG and CORDEN 1991
; BASKARAN et al. 1993
; YURYEV and CORDEN 1996
). An important interaction between the phosphorylated CTD of the largest subunit of the RNA polymerase II complex and the enzyme responsible for mRNA capping has been reported recently (MCCRACKEN et al. 1997
). A stronger selective constraint at the translational level on synonymous mutations in the CTD region than in the non-CTD region may explain the significantly different effect of the Ne change in the D. melanogaster lineage on these two regions of the RpII215 gene.
 | ACKNOWLEDGMENTS |
|---|
We are grateful to J. M. Comeron for computer simulations, helpful discussion, and comments on the manuscript. We thank Serveis Científico-Tècnics of Universitat de Barcelona for automatic sequencing facilities. A.Ll. was a predoctoral fellow from Ministerio de Educación y Ciencia, Spain, during this study. This work was supported by grants PB94-0923 from Dirección General de Investigación Científica y Técnica, Ministerio de Educación y Ciencia, Spain, and 1997 SGR-59 from Comissió Interdepartamental de Recerca i Innovació Tecnològica, Generalitat de Catalunya, to M.A.
Manuscript received September 29, 1998; Accepted for publication January 21, 1999.
 | LITERATURE CITED |
|---|
AKASHI, H., 1994 Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy. Genetics 136:927-935.
AKASHI, H., 1995 Inferring weak selection from patterns of polymorphism and divergence at "silent" sites in Drosophila DNA. Genetics 139:1067-1076.
AKASHI, H., 1996 Molecular evolution between Drosophila melanogaster and D. simulans: reduced codon bias, faster rates of amino acid substitution, and larger proteins in D. melanogaster. Genetics 144:1297-1307.
AKASHI, H., 1997 Codon bias evolution in Drosophila. Population genetics of mutation-selection-drift. Gene 205:269-278.
AKASHI, H. and S. W. SCHAEFFER, 1997 Natural selection and the frequency distributions of "silent" DNA polymorphism in Drosophila. Genetics 146:295-307.
ALLISON, L. A., M. MOYLE, M. SHALES, and C. J. INGLES, 1985 Extensive homology among the largest subunits of eukaryotic and prokaryotic RNA polymerases. Cell 42:599-610.
ALLISON, L. A., J. K. WONG, V. D. FITZPATRICK, M. MOYLE, and C. J. INGLES, 1988 The C-terminal domain of the largest subunit of RNA polymerase of Saccharomyces cerevisiae, Drosophila melanogaster, and mammals: a conserved structure with an essential function. Mol. Cell. Biol. 8:321-329.
ASHBURNER, M., 1989 Drosophila: A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
BARRIO, E. and F. J. AYALA, 1997 Evolution of the Drosophila obscura species group inferred from the Gpdh and Sod genes. Mol. Phylogenet. Evol. 7:79-93.
BARRIO, E., A. LATORRE, A. MOYA, and F. J. AYALA, 1992 Phylogenetic reconstruction of the Drosophila obscura group on basis of mitochondrial DNA. Mol. Biol. Evol. 9:621-635.
BASKARAN, R., M. E. DAHMUS, and J. Y. WANG, 1993 Tyrosine phosphorylation of mammalian RNA polymerase II carboxyl-terminal domain. Proc. Natl. Acad. Sci. USA 90:11167-11171.
BINGHAM, P. M., R. LEVIS, and G. M. RUBIN, 1981 Cloning of DNA sequences from white locus of D. melanogaster by a novel and general method. Cell 25:693-704.
BULMER, M., 1989 Estimating the variability of substitution rates. Genetics 123:615-619.
BULMER, M., 1991 The selection-mutation-drift theory of synonymous codon usage. Genetics 129:897-907.
CHAO, L. and D. E. CARR, 1993 The molecular clock and the relationship between population size and generation time. Evolution 47:688-690.
COMERON, J. M., 1995 A method for estimating the numbers of synonymous and nonsynonymous substitutions per site. J. Mol. Evol. 41:1152-1159.
COMERON, J. M., 1997 Estudi de la variabilitat nucleotídica a Drosophila: Regió Xdh a D. subobscura. PhD thesis. Universitat de Barcelona, Barcelona, Spain.
COMERON, J. M. and M. AGUADÉ, 1996 Synonymous substitutions in the Xdh gene of Drosophila: Heterogeneous distribution along the coding region. Genetics 144:1053-1062.
COMERON, J. M. and M. AGUADÉ, 1998 An evaluation of measures of synonymous codon usage bias. J. Mol. Evol. 47:268-274.
COMERON, J. M. and M. KREITMAN, 1998 The correlation between synonymous and nonsynonymous substitutions in Drosophila: Mutation, selection, or relaxed constraints? Genetics 150:767-775.
COMERON, J. M., M. KREITMAN, and M. AGUADÉ, 1999 Natural selection on synonymous sites is correlated with gene length and recombination in Drosophila. Genetics 151:239-249.
CORDEN, J. L., 1990 Tails of RNA polymerase II. Trends Biochem. Sci. 15:383-387.
DEVEREUX, J., P. HAEBERLI, and O. SMITHIES, 1984 A comprehensive set of sequence analysis programs for VAX. Nucleic Acids Res. 12:387-395.
EYRE-WALKER, A. and M. BULMER, 1993 Reduced synonymous substitution rate at the start of enterobacterial genes. Nucleic Acids Res. 21:4599-4603.
FELSENSTEIN, J., 1974 The evolutionary advantage of recombination. Genetics 78:737-756.
GILLESPIE, J. H., 1989 Lineage effects and index of dispersion of molecular evolution. Mol. Biol. Evol. 6:636-647.
GILLESPIE, J. H., 1991 The Causes of Molecular Evolution. Oxford Series in Ecology and Evolution, Oxford University Press, New York.
HASEGAWA, A. M., T. YASUNAGA, and T. MIYATA, 1979 Secondary structure of MS2 phage RNA and bias in code word usage. Nucleic Acids Res. 7:2073-2079.
HENIKOFF, S., 1984 Unidirectional digestion with exonuclease III creates targeted breakpoints for DNA sequencing. Gene 28:351-359.
HILL, W. G. and A. ROBERTSON, 1966 The effect of linkage on limits to artificial selection. Genet. Res. 8:269-294.
JOKERST, R. S., J. R. WEEKS, W. A. ZEHRING, and A. L. GREENLEAF, 1989 Analysis of the gene encoding the largest subunit of RNA polymerase II in Drosophila. Mol. Gen. Genet. 215:266-275.
KIMURA, M., 1968 Evolutionary rate at the molecular level. Nature 217:624-626.
KIMURA, M., 1980 A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111-120.
KIMURA, M., 1983 The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge, UK.
KIMURA, M. and T. OHTA, 1971 On the rate of molecular evolution. J. Mol. Evol. 1:1-17.
KLIMAN, R. M. and J. HEY, 1994 The effects of mutations and natural selection on codon bias in the genes of Drosophila. Genetics 137:1049-1056.
KRIMBAS, C. B. and M. LOUKAS, 1984 Evolution of the obscura group Drosophila species. I. Salivary chromosomes and quantitative characters in D. subobscura and two closely related species. Heredity 53:469-482.
KURLAND, C. G., 1987a Strategies for efficiency and accuracy in gene expression. 1. The major codon preference: a growth optimization strategy. Trends Biochem. Sci. 12:126-128.
KURLAND, C. G., 1987b Strategies for efficiency and accuracy in gene expression. 2. Growth optimized ribosomes. Trends Biochem. Sci. 12:169-171.
LANGLEY, C. H. and W. M. FITCH, 1974 An examination of the constancy of the rate of molecular evolution. J. Mol. Evol. 3:161-177.
LAWRENCE, J. G., D. L. HARTL, and H. OCHMAN, 1991 Molecular considerations in evolution of bacterial genes. J. Mol. Evol. 33:241-250.
LI, W.-H., 1987 Models of nearly neutral mutations with particular implications for nonrandom usage of synonymous codons. J. Mol. Evol. 24:337-345.
LI, W.-H., 1993 Unbiased estimation of the rates of synonymous and nonsynonymous substitution. J. Mol. Evol. 36:96-99.
LOUKAS, M., C. B. KRIMBAS, and Y. VERGINI, 1984 Evolution of the obscura group Drosophila species. II. Phylogeny of ten species based on electrophoretic data. Heredity 53:483-493.
MARFANY, G. and R. GONZÀLEZ-DUARTE, 1993 Characterization and evolution of the Adh genomic region in Drosophila guanche and Drosophila madeirensis. Mol. Phylogenet. Evol. 2:13-22.
MCCRACKEN, S., N. FONG, E. ROSONINA, K. YANKULOV, and G. BROTHERS et al., 1997 5'-capping enzymes are targeted to pre-mRNA by binding to the phosphorylated carboxy-terminal domain of RNA polymerase II. Genes Dev. 11:3306-3318.
MCKNIGHT, L. and K. R. YAMAMOTO, 1992 Transcriptional Regulation. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
MORIYAMA, E. N. and D. L. HARTL, 1993 Codon usage bias and base compositions of nuclear genes in Drosophila. Genetics 134:847-858.
MORIYAMA, E. N. and J. P. POWELL, 1998 Gene length and codon usage bias in Drosophila melanogaster, Saccharomyces cerevisiae and Escherichia coli. Nucleic Acids Res. 26:3188-3193