Population, Evolutionary and Genomic Consequences of Interference Selection
Josep M. Comeron (a, b) and Martin Kreitman (a)
a Department of Ecology and Evolution, University of Chicago, Chicago, Illinois 60637
b Department of Biological Sciences, University of Iowa, Iowa City, Iowa 52242
Corresponding author: Josep M. Comeron, University of Iowa, 433 Biology Bldg., Iowa City, IA 52242. E-mail: josep-comeron@uiowa.edu
Communicating editor: N. TAKAHATA
| ABSTRACT |
|---|
Weakly selected mutations are most likely to be physically clustered across genomes and, when sufficiently linked, they alter each other's fixation probability, a process we call interference selection (IS). Here we study the population genetic and evolutionary consequences of IS on the selected mutations themselves and on adjacent selectively neutral variation. We show that IS reduces levels of polymorphism and increases low-frequency variants and linkage disequilibrium, in both selected and adjacent neutral mutations. IS can account for several well-documented patterns of variation and composition in genomic regions with low rates of crossing over in Drosophila. IS cannot be described simply as a reduction in the efficacy of selection and effective population size in standard models of selection and drift. Rather, IS can be better understood with models that incorporate a constant "traffic" of competing alleles. Our simulations also allow us to make genome-wide predictions that are specific to IS. We show that IS will be more severe at sites in the center of a region containing weakly selected mutations than at sites located close to the edge of the region. Drosophila melanogaster genomic data strongly support this prediction, with genes without introns showing significantly reduced codon bias in the center of coding regions. As expected if introns relieve IS, genes with centrally located introns do not show reduced codon bias in the center of the coding region. We also show that reasonably small differences in the length of intermediate "neutral" sequences embedded in a region under selection increase the effectiveness of selection on the adjacent selected sequences. Hence, the presence and length of sequences such as introns or intergenic regions can be a trait subject to selection in recombining genomes. In support of this prediction, intron presence is positively correlated with a gene's codon bias in D. melanogaster. Finally, the study of temporal dynamics of IS after a change of recombination rate shows that nonequilibrium codon usage may be the norm rather than the exception.
THE general concept of effective population size (Ne), due to ![]()
) and Neµ (θ), respectively, include effective population size (see Table 1). However, polymorphism levels are not constant across the genome but rather are correlated with recombination rates, suggesting that additional factors may be influencing Ne.
Further investigation of whether Ne can be viewed as varying across a genome takes advantage of its consequences for the effectiveness of weak selection. Theory predicts that the evolutionary dynamics of mutations whose selective effects are on the order of the reciprocal of population size (i.e., scaled selection intensities of 0.25-2.5) are very sensitive to small shifts in Ne.
Several models of strong selection (
>> 1) have been proposed to explain why Ne is reduced in regions of low recombination: (i) the hitchhiking (HH) and pseudo-HH (pHH) models, which invoke frequent positive Darwinian selection (![]()
However, strong selection is not a requirement for this effect.
Weakly selected mutations, taken individually, are not expected to have a measurable effect on population parameters or on the tree topology of linked neutral mutations.
We also investigate two other consequences of IS under low rates of recombination. First, because genes (exons and regulatory regions) are embedded in a matrix of generally less severely constrained DNA, IS may occur at sites with well-defined boundaries along the DNA. Therefore, we study the expected consequences of IS along such intervals under selection. We hypothesize that the effects of IS should not be uniform across a gene, and we use simulations to generate predictions that can be tested with Drosophila genomic data. Second, neutral sequences located between groups or clusters of selected sites under weak/moderate selection (e.g., introns within coding regions of genes) can be viewed as modifiers of recombination and hence can alter the effectiveness of selection (![]()
| MATERIALS AND METHODS |
|---|
Forward computer simulations:
A Wright-Fisher model was simulated with N diploid individuals (2N chromosomes) as previously described (![]()
N, respectively (see Table 1 for definitions). The number of recombination events per meiosis is not restricted (assuming no chiasma interference) to avoid underestimating the effect of recombination in long sequences when
N is high. The mutation process allows only two allelic states at a site, and reversible mutation is permitted, mimicking the mutation process between preferred (p) and unpreferred (u) codons. Mutation rates from p to u and vice versa are w and v, respectively, where
is the mutational bias. Unless otherwise indicated, we applied a mutation rate of
, with
. The previous assumption of only two allelic states (![]()
The fitness of each individual is based only on the selected sequence. The selection differential between p (preferred allele) and u (unpreferred allele) in the selected sequence is +s; fitness is multiplicative over sites and mutations are semidominant in their effect on fitness. Each new generation is obtained by first choosing N individuals (2N chromosomes) with probability proportional to their relative fitness. The next generation is constituted by randomly pairing the 2N chromosomes (each composed of the selected and adjacent neutral sequence), which are possibly mutated and/or recombined to form N new diploid individuals.
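The scheme described above can be illustrated with a minimal sketch. It is not the program used in this study: it assumes a simplified sampling order (each gamete is drawn from a parent chosen in proportion to fitness, then recombined and mutated), it models only the selected sequence, and every function name, parameter name (`rate_p_to_u`, `rate_u_to_p`, `rec_rate`), and default value is hypothetical.

```python
import random


def diploid_fitness(chrom_a, chrom_b, s):
    """Multiplicative fitness over sites with semidominant effects.

    Each site contributes 1 + s * (copies of the preferred allele) / 2,
    so uu = 1, pu = 1 + s/2, and pp = 1 + s.
    """
    fitness = 1.0
    for a, b in zip(chrom_a, chrom_b):
        fitness *= 1.0 + s * (a + b) / 2.0
    return fitness


def make_gamete(chrom_a, chrom_b, rec_rate, rate_p_to_u, rate_u_to_p, rng):
    """Build one gamete from a parent, with unrestricted crossovers and mutation."""
    if rng.random() < 0.5:  # start from a random parental chromosome
        chrom_a, chrom_b = chrom_b, chrom_a
    gamete = []
    for site in range(len(chrom_a)):
        # A crossover may occur between any pair of adjacent sites (no interference).
        if site > 0 and rng.random() < rec_rate:
            chrom_a, chrom_b = chrom_b, chrom_a
        allele = chrom_a[site]
        # Reversible two-state mutation: 1 = preferred (p), 0 = unpreferred (u).
        if allele == 1 and rng.random() < rate_p_to_u:
            allele = 0
        elif allele == 0 and rng.random() < rate_u_to_p:
            allele = 1
        gamete.append(allele)
    return gamete


def simulate_wright_fisher(n_diploid=50, n_sites=20, s=0.02, rate_p_to_u=0.002,
                           rate_u_to_p=0.001, rec_rate=0.0, generations=200, seed=1):
    """Return the final frequency of preferred alleles (P) across all sites."""
    rng = random.Random(seed)
    n_chrom = 2 * n_diploid
    pop = [[rng.randint(0, 1) for _ in range(n_sites)] for _ in range(n_chrom)]
    for _ in range(generations):
        # Individual i carries chromosomes 2i and 2i + 1.
        weights = [diploid_fitness(pop[2 * i], pop[2 * i + 1], s)
                   for i in range(n_diploid)]
        # Parents are drawn with probability proportional to fitness.
        parents = rng.choices(range(n_diploid), weights=weights, k=n_chrom)
        # Consecutive gametes pair at random to form the next N individuals.
        pop = [make_gamete(pop[2 * p], pop[2 * p + 1], rec_rate,
                           rate_p_to_u, rate_u_to_p, rng) for p in parents]
    return sum(map(sum, pop)) / (n_chrom * n_sites)
```

Comparing the returned P for several values of `n_sites` at `rec_rate=0.0`, averaged over many seeds and long runs, is one way to explore the effect of linkage in this toy setting; single short runs are dominated by drift.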
As indicated in RESULTS, the time to reach base composition equilibrium under a mutation-selection-drift (MSD) balance and IS (e.g., codon usage) when
may take on the order of
100-250N generations. Accordingly, our study of IS at equilibrium begins after a minimum of 250N generations to ensure base composition equilibrium. Each independent population realization was analyzed every N generations for a minimum of 1000N generations. Population parameters were estimated in 20 independent samples. All estimates of population and evolutionary parameters (heterozygosity, nucleotide diversity, frequency skew, fixation rates) as well as the frequency of preferred codons (P) were obtained by studying the same number of sites, 250, regardless of the total number of sites in the sequence when L is at least 250. These studied sites were homogeneously distributed across the sequence, unless explicitly indicated, to ensure an average estimate of the parameters across the region. In every simulation both the selected and neutral sequences were analyzed. The ranges of parameter values we investigated were 0 to 0.4 for recombination and 0.25 to 2.5 for selection (both scaled by N), and 125 to 2500 sites for length (L). The ranges of recombination rates under study are representative of most eukaryotes, including D. melanogaster. Assuming Ne
1 x 106 for D. melanogaster (![]()
N < 0.05,
N < 0.1 after taking into account gene conversion (![]()
N
0.004 may contain
15% of genes using rates of crossing over; these genomic regions are defined by the cytological bands 1A-2C/20C-20F (X chromosome), 21A/38B-40F/41A-44B/60D-60F (chromosome 2), 61A-61B/76B-80F/81A-84E (chromosome 3), and the complete fourth chromosome. When the contribution of gene conversion to the total recombination is taken into account,
N
0.004 may apply to >10% of D. melanogaster genes.
To evaluate the relative change in the effectiveness of selection on the selected sequences caused by varying the parameters (changing recombination rates and L and presence and length of intermediate regions) we compared the estimated value of the parameter
on the basis of the observed P. Following directly from the probability of fixation of p and u, and P at equilibrium under the infinitely many sites model and free recombination (![]()
Linkage disequilibrium (LD) was estimated as the average over all pairwise comparisons of polymorphic sites by using D' (LD-D').
Analyses of the D. melanogaster genome:
We studied the complete D. melanogaster genome (![]()
Heterogeneous codon bias across exons:
Two groups of genes were investigated. The first group (659 genes) was composed of all genes with a single long exon (>1000 bp or >333 amino acids). The second group (187 genes) included all genes with long coding regions (>333 amino acids) interrupted by introns and satisfying two criteria: (i) all (one or more) introns should be centrally located, dividing the coding region into two comparable regions (i.e., introns located between 30 and 70% of the relative total length of the coding region), and (ii) at least one intron should be >100 bp. The synonymous codon usage bias was measured using the frequency of GC-ending codons (GC3), the frequency of GC-ending codons in four-fold degenerate amino acids (GC4), and the frequency of preferred codons in D. melanogaster (![]()
Codon bias and the proportion of selected sites in a gene:
The analysis was carried out using all 7499 complete genes (out of 9172) with introns. As a proxy for the relative number (or density) of selected sites in a gene, we used the proportion of the length of the coding region (PLCR) in a gene, measured as the ratio between the length of the coding region and the length of the coding region plus the total length of the introns.
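The PLCR definition above amounts to a one-line ratio. The following sketch shows it under the stated definition; the function name `plcr` and the representation of a gene as a coding length plus a list of intron lengths are illustrative assumptions.

```python
def plcr(coding_length, intron_lengths):
    """Proportion of the length of the coding region (PLCR) in a gene.

    PLCR = coding length / (coding length + total intron length).
    Lengths are in base pairs; intron_lengths may be empty.
    """
    if coding_length <= 0:
        raise ValueError("coding_length must be positive")
    total_intron_length = sum(intron_lengths)
    return coding_length / (coding_length + total_intron_length)
```

For example, a gene with 1500 bp of coding sequence and a single 500-bp intron has a PLCR of 0.75, and a gene without introns has a PLCR of 1.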
The recombination rate for each gene in the D. melanogaster genome was estimated as previously described (see ![]()
| RESULTS |
|---|
Effects of IS on population and evolutionary parameters at selected and adjacent neutral sequences
Effectiveness of selection:
We investigated the effectiveness of selection on weakly selected mutations by analyzing the proportion of preferred mutations at equilibrium (P; ![]()
) are nearly linearly related to Ne, the relationship between P and the selection parameter (Nes) is strongly nonlinear. For instance, a 5% increase in P represents a 50, 21, and 27% increase in Nes when the original P is 0.5, 0.6, and 0.9, respectively. Therefore, although our simulations measure shifts in P with changes of parameters affecting IS, the magnitude of these shifts is better reflected by the change in Nes needed to account for the results under a no-interference model (SS-MSD). As Fig 1 shows, the effectiveness of selection, as measured by this estimated Nes, decreases as the recombination rate decreases, and this effect increases with L (see also ![]()
N < 2.5).
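The nonlinearity between P and the selection parameter can be illustrated with a widely used single-site expectation for mutation-selection-drift balance, P = 1 / (1 + kappa * exp(-S)). This is a sketch under assumptions: `kappa` is taken to be the ratio of the p-to-u over the u-to-p mutation rate, `scaled_s` (S) is a scaled selection parameter whose exact scaling relative to Nes is left open here, and the function names are hypothetical. The code is not claimed to reproduce the percentages quoted above.

```python
import math


def equilibrium_p(scaled_s, kappa):
    """Expected frequency of preferred codons for independent sites.

    Assumed form: P = 1 / (1 + kappa * exp(-scaled_s)).
    """
    return 1.0 / (1.0 + kappa * math.exp(-scaled_s))


def scaled_s_from_p(p, kappa):
    """Invert equilibrium_p: the scaled selection implied by an observed P."""
    if not 0.0 < p < 1.0:
        raise ValueError("P must lie strictly between 0 and 1")
    return math.log(kappa * p / (1.0 - p))


def shift_in_scaled_s(p_old, p_new, kappa):
    """Absolute change in scaled selection needed to move P from p_old to p_new."""
    return scaled_s_from_p(p_new, kappa) - scaled_s_from_p(p_old, kappa)
```

Under this assumed form, the same absolute shift in P requires a larger shift in scaled selection when P is already high: `shift_in_scaled_s(0.90, 0.95, kappa)` exceeds `shift_in_scaled_s(0.50, 0.55, kappa)` for any positive `kappa`.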
Polymorphism levels:
We studied the effect of IS on polymorphism levels, as measured by heterozygosity, in selected and neutral sequences. Under single-site models of weak selection (SS-MSD), the expectations are clear. A general reduction of the intensity of selection predicts a relative increase of heterozygosity at selected sites, bringing it closer to neutral heterozygosity. On the other hand, a reduction in Ne will cause a direct reduction in neutral heterozygosity. The expected net consequence of reducing Ne, and hence the intensity of selection, for mutations under SS-MSD is a reduction of heterozygosity at selected sites, because the reduction of neutral heterozygosity is always greater than the expected increase of selected polymorphism due to reduced selection, although this decrease of heterozygosity at selected sites is not expected to be proportional to the reduction of Ne. For strong selection, a moderate reduction of Ne would not alter heterozygosity at selected sites.
Our results (Fig 2) show that
s is below the levels expected on the basis of the imposed strength of selection acting on these mutations under SS-MSD. For all combinations of selection intensity (
N) and recombination rates (
N),
s decreases as the number of sites under selection (L) increases (see also ![]()
(e.g.,
s for
is 70-75% of that for
, both for
and
), and this impact decreases, but is still noticeable, for very weak selection and high recombination (e.g.,
and
). For
, we also studied whether an even higher recombination rate
would completely eliminate IS. The results show that
does eliminate most IS on
s when L
125 compared to SS-MSD expectations, but IS is still detectable for larger L.
Heterozygosity is also reduced in the adjacent neutral sequences as a result of linkage to sites under weak selection. The most extreme reductions in neutral variation are observed for
(Fig 2A), where increasing either L or
N in the selected sequence substantially reduces
n. The impact that L has on
n increases with the intensity of selection. For example, when
,
n is
75 and
50% of that observed when
for
and
, respectively. When
and L is large, there is a tendency to observe similar levels of polymorphism in selected (
s) and adjacent neutral mutations (
n), which are most evident for
(when L
2500), implying that linked selectively neutral sites and sites under weak selection may not be distinguishable by this criterion. Increasing L reduces
n even when the recombination rate is high (Fig 2B and C). This observed reduction in
n is similar for different selection intensities when recombination is highest (
N = 0.4), likely reflecting a recombination rate threshold for IS.
Divergence and divergence to polymorphism ratio:
Under SS-MSD equilibrium the rate of fixation or divergence of selected mutations rapidly decreases with increasing selection intensity
. We focused on the rate of divergence when
N = 0 to illustrate the effect of IS on this evolutionary parameter (Fig 3A). As expected, selection acting at linked sites does not influence divergence for neutral mutations. The fixation rate of mutations under selection increases with L for any given
N, indicative of a reduction in the effectiveness of selection due to IS.
The SS-MSD model also predicts that the divergence:polymorphism ratio (Div/Pol) for weakly selected mutations decreases with increasing
because weak selection has stronger effects in reducing the rate of fixation than the level of polymorphism. Fig 3B shows the Div/Pol ratio for the region containing mutations under selection again for
. For the case of
, Div/Pol decreases with selection but to a lesser degree than that expected for a SS-MSD case. For the intermediate case of
, Div/Pol is only barely affected by selection. More exceptional is the situation in which the number of sites under selection is moderate to large (e.g.,
): In these cases Div/Pol increases not only relative to single-site expectations but also relative to neutral expectations. This trend results from two opposing effects of IS on selected mutations, increasing divergence and reducing polymorphism. Neutral sites (Fig 3C) show a consistent increase in the Div/Pol ratio with increasing IS, caused by the effect that IS has in reducing levels of linked neutral polymorphism.
Mutation frequency spectrum:
The SS-MSD models predict that, as
increases, weakly selected mutations will become less abundant and allele frequencies at polymorphic sites will decrease compared to neutral expectations. Fig 4 plots Tajima's D statistic, a measure of the skew of allele frequency compared to neutral frequency spectrum, for selected and neutral sequences under complete linkage. In the selected sequence, a more negative Tajima's D is observed with increasing selection intensity, as expected (see ![]()
N. This trend does not hold, however, when
N and L are large, (i.e.,
), where Tajima's D becomes unaffected or even less negative. This is, however, not surprising because Tajima's D statistic is not entirely independent of the number of segregating sites: It tends toward zero as the number of segregating sites in a sample becomes small for any given (nonneutral) frequency of variants. Therefore, IS increases the relative frequency of rare variants (hence it induces a negative Tajima's D) but IS also decreases the number of segregating sites, thus biasing Tajima's D estimates closer to zero when IS and its reduction of the number of segregating sites are severe.
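Tajima's D, as discussed above, can be computed from a sample of aligned haplotypes with the standard formula. The sketch below assumes haplotypes are given as equal-length lists of 0/1 allele states; the function name `tajimas_d` and this input format are illustrative choices, not part of the simulation program.

```python
import math


def tajimas_d(haplotypes):
    """Tajima's D for a list of equal-length 0/1 haplotypes.

    Returns None when the sample has no segregating sites.
    """
    n = len(haplotypes)
    if n < 4:
        raise ValueError("at least four sequences are needed")
    # Count of allele 1 at each site; keep only segregating sites.
    counts = [sum(column) for column in zip(*haplotypes)]
    segregating = [k for k in counts if 0 < k < n]
    num_seg = len(segregating)
    if num_seg == 0:
        return None
    # Average number of pairwise differences.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in segregating)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * num_seg + e2 * num_seg * (num_seg - 1)
    return (pi - num_seg / a1) / math.sqrt(variance)
```

In a toy sample of 10 sequences in which every variant is a singleton, D is negative, and it is closer to zero with one segregating site than with five, which is in line with the dependence on the number of segregating sites noted above.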
The frequency spectrum of neutral mutations departs from the neutral equilibrium expectation, showing an excess of low frequency alleles when IS occurs in the adjacent selected sequence. Tajima's D becomes more negative as the number of selected sites or
N on these sites increases. When IS is strongest, the skew toward low frequencies becomes similar for both selected and neutral mutations, a trend we have also encountered for heterozygosity.
IS also influences the allele frequency of variants in the selected sequences when recombination is highest
while the frequency spectrum of neutral variants in adjacent sequences remains mostly unaffected. The skew toward low frequency variants in the selected sequences increases with L, although to a lesser degree compared to
N
0.004, and this effect intensifies with increasing
N.
IS and linkage disequilibrium:
N increases LD (with repulsion associations) in the selected sequences to a greater extent than in the adjacent neutral sequences (see Fig 5B for
). When recombination is very high
, LD-D' also varies in the selected sequences for different
N, but not in the adjacent neutral sequences.
However, the conditions that increase negative LD are the very same conditions for which there is an excess of low-frequency variants (with more negative Tajima's D estimates). Because most measures of LD correlate, to some degree, with the frequency of the mutations under study (![]()
N (and IS) increases. Therefore, the observed increase in LD with IS, as measured by D', is not only the consequence of a shift in the frequency spectrum.
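For reference, a signed D' between two biallelic sites, and its average over all pairs of polymorphic sites, can be sketched as follows. The coding of alleles as 0/1 (for instance, 1 for the preferred allele), the function names, and the choice to skip pairs involving a monomorphic site are assumptions of this illustration; with this coding, negative values correspond to repulsion associations between alleles coded 1.

```python
def d_prime(allele_pairs):
    """Signed D' for two biallelic sites.

    allele_pairs: list of (a, b) tuples, one per chromosome, alleles coded 0/1.
    Returns None if either site is monomorphic in the sample.
    """
    n = len(allele_pairs)
    p_a = sum(a for a, _ in allele_pairs) / n
    p_b = sum(b for _, b in allele_pairs) / n
    if p_a in (0.0, 1.0) or p_b in (0.0, 1.0):
        return None
    p_ab = sum(1 for a, b in allele_pairs if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    # Normalize by the largest |D| possible given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d / d_max


def mean_d_prime(haplotypes):
    """Average D' over all pairs of polymorphic sites; None if there are none."""
    n_sites = len(haplotypes[0])
    values = []
    for i in range(n_sites):
        for j in range(i + 1, n_sites):
            value = d_prime([(h[i], h[j]) for h in haplotypes])
            if value is not None:
                values.append(value)
    return sum(values) / len(values) if values else None
```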
Temporal dynamics after a change of recombinational environment:
Genome-wide and/or gene-specific changes in recombination rates may be common in many evolutionary systems, and so it is important to study the time needed to reach new equilibria. For instance, in Drosophila there is extensive gene order shuffling within chromosomal arms between species (![]()
We studied the number of generations needed to reach equilibrium after a change in the recombination rate under IS conditions. For simplicity we assumed a population at equilibrium under the initial conditions and the instantaneous fixation of a randomly chosen allele in a new recombinational environment. Such a situation might apply to a gene located near the breakpoint of a chromosomal rearrangement that was quickly driven to fixation (possibly by natural selection). In accord with expectations, most population and evolutionary parameters reach their new equilibria much sooner than codon usage (Fig 7A and B). Under neutrality, a few Ne generations (
4Ne for diploid individuals) after a hitchhiking event are sufficient to achieve near-equilibrium levels of polymorphism (![]()
s or Tajima's D increases with
N, requiring
20-40N generations when
while this time is close to 4N generations when
N
0.5.
Multilocus parameters, such as those that estimate codon usage, take a very large number of generations to reach a new equilibrium following perturbation. Indeed, codon usage requires 100-250N or 1-2.5/µ generations to reach the new equilibrium. This required time is not strongly dependent on the number of sites under selection, but it is dependent, as expected, on µ (data not shown). Two other features of codon bias evolution following a change of recombination environment were also observed. First, the change in codon bias is faster when the number of preferred mutations is increasing (i.e., changing from low to high recombination) than when the number is decreasing. Second, the weaker the selection, the longer the period required to reach the new base composition equilibrium in either direction. These two features can be easily explained by three factors: (i) the average time to fixation of weakly advantageous mutations is shorter than that expected for weakly deleterious mutations, (ii) the speed to fixation of preferred mutations increases with the intensity of selection, and (iii) mutational pressure toward unpreferred mutations is higher when the ancestral sequence has higher P.
We also observed a tendency for population and evolutionary parameters to "overshoot" their equilibrium values when a sequence changes from no recombination to a high recombination rate (stronger for
than for
), but this effect is not detectable in the opposite direction. This overshoot creates a transient situation more nearly resembling a neutral equilibrium. Under the conditions we investigated, population parameters are closest to the neutral expectations
4-5N generations after the fixation of a single sequence in an environment with high recombination. For instance, in the case of
,
s changes through time from zero (right after the fixation event) to the new IS equilibrium under high recombination (after
40N generations) with an intermediate state 50% higher than that finally observed at equilibrium.
IS and its effect across regions under uniform selection: Consider an interval of a recombining genome in which segregating sites under selection result in IS, and further assume this interval is embedded in a region containing few additional mutations under selection. Since the magnitude of the IS effect acting at a particular site in this interval will be governed by the interactions between the segregating sites located on both sides of that site, it is reasonable to expect that IS will be stronger at sites embedded in the middle of a region under selection than at sites located close to an edge of this region. This situation may apply to many protein-coding regions in eukaryotic genomes, but it would be pertinent to any group of physically clustered sites under weak selection surrounded by largely unconstrained sites. Here, we investigate the magnitude of the "center" vs. "edge" effect for plausible rates of recombination and selection. The issue under scrutiny is whether IS differs measurably between the center and edge of a region under selection when both mutation rates and selection coefficients are uniformly distributed across the region.
We studied population and evolutionary parameters across sequences with 2500 sites under uniform selection, recombination, and mutation, with emphasis on a lateral and the central region of 250 sites each. Two indicators of IS are depicted in Fig 8 for the central and lateral regions: the proportion of preferred codons (P) and the divergence to polymorphism (Div/Pol) ratio. As expected, no effect is seen for the no-recombination case
. For intermediate recombination rates, IS differs between regions, with the central region showing stronger IS (central regions have lower P and higher Div/Pol ratio than lateral regions). This heterogeneous distribution of IS across regions (the center effect) decreases when recombination is very high, but it can still be seen for high recombination
when selection intensity is weak
.
The effect of neutral sequences between regions under selection: We studied whether or not small changes in the overall recombination rate between two regions under selection, caused only by a change in the physical distance between them, have a detectable effect on the overall IS. Here, the simulation procedure allowed us to generate a variable number of neutral sites (i.e., middle or "spacer" sequence) between two sequences under selection. The two regions under selection were identical, with equal numbers of selected sites, selection coefficients per site, and mutation and recombination rates per site. Mutation and recombination rates per site in the spacer sequence are the same as in the flanking selected sequences, but the mutations were selectively neutral. Thus, with respect to IS the presence and length of the intermediate neutral region alters only the number of recombination events between the two selected regions; it does not change directly any parameter on the flanking selected sequences.
We studied intermediate rates of recombination
for the case of a neutral sequence located between two sequences each of 500 selected sites. Fig 9A depicts the relative change of the effectiveness of selection (i.e.,
based on P) caused by the presence and length of the spacer sequence. The results show that the length of an intermediate neutral region has a detectable effect. In all cases, longer spacers lead to an increase in the effectiveness of selection (a reduction in IS) on the adjacent selected mutations. As an illustration, for the case of
and
the presence of a 1000-bp-long region in the middle of the selected sequence is equivalent to a relative increase of 7.4% in the overall fitness associated with the selected sequences (i.e., a gain of 2.3% preferred codons). A substantial fraction of the potential increment in fitness in regions of moderate to high recombination is achieved with short/intermediate sequences (<1000 bp), while for regions of more severely restricted recombination longer sequences are required to produce an equivalent increment in fitness. The maximum relative gain in fitness is higher for
than for
for the two rates of recombination investigated, as expected (see Fig 1).
Empirical tests of IS based on D. melanogaster's genome
Distribution of codon bias within genes:
As indicated in our simulations, IS is expected to be stronger in the center of regions under IS than in the margins of these regions. This leads to the first test prediction: Codon usage bias, a measure of the effectiveness of selection, will be lower in the middle of coding regions of genes than in the amino- or carboxy-terminal regions. Comparing codon bias levels within genes eliminates expression level and gene length as factors that can alter codon usage (![]()
We restricted our attention to the set of genes in the D. melanogaster genome composed of single long exons (>333 amino acids; see MATERIALS AND METHODS), a total of 659 genes. The frequency of GC at the third position of codons (GC3) was used as a measure of codon usage bias (![]()
, P < 1 x 10^-6), with a lower GC3 in the central region. A similar result is obtained when the average GC3 of the two lateral sections is compared to the central section
or when each lateral region is compared separately to the central section (
, respectively). On average, the lower GC3 content in the central region of coding regions is equivalent to a reduction in
of
10% on synonymous mutations compared to lateral regions.
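The comparison of GC3 between lateral and central sections of a coding region can be sketched as follows. The division of the coding sequence into three sections with equal numbers of codons is an assumption made for this illustration, as are the function names; the sketch also does not treat stop codons or ambiguous bases specially.

```python
def gc3(codons):
    """Fraction of codons whose third position is G or C."""
    if not codons:
        raise ValueError("no codons to analyze")
    return sum(1 for codon in codons if codon[2] in "GC") / len(codons)


def gc3_by_section(cds, n_sections=3):
    """GC3 for consecutive sections of a coding sequence, 5' to 3'."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if len(codons) < n_sections:
        raise ValueError("sequence too short for the requested sections")
    bounds = [round(k * len(codons) / n_sections) for k in range(n_sections + 1)]
    return [gc3(codons[bounds[k]:bounds[k + 1]]) for k in range(n_sections)]


def lateral_minus_central(cds):
    """Mean GC3 of the two lateral sections minus GC3 of the central section."""
    first, central, last = gc3_by_section(cds, 3)
    return (first + last) / 2.0 - central
```

A positive value of `lateral_minus_central` for a gene corresponds to the pattern described above, with lower GC3 in the central section than in the lateral sections.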
IS simulations also show that the intensity of IS increases with the length of the gene region (and hence number of sites) subject to weak selection. This leads to the second test prediction: The relative reduction in codon bias in the center of a gene will be positively correlated with the length of the coding region. Consistent with this prediction, we find a highly significant positive correlation between the length of the coding region and the difference of GC3 between lateral and central regions in the same set of 659 genes analyzed above (Spearman's correlation
). Quantitatively similar results are obtained with the frequency of preferred codons and with GC content at fourfold degenerate sites; data not shown.
According to our simulations, the presence of neutrally evolving sequences placed in the center of a region subject to weak selection can relieve the IS effect. This leads to the third test prediction: Centrally located introns will ameliorate the effect of IS in the central region of genes that contain them. To test this prediction, we compared codon bias in these same 659 genes, which lack introns, with comparable genes with introns located in the center of the coding regions (see MATERIALS AND METHODS). Fig 10B shows the results for the 187 genes obtained from the genome database satisfying these criteria. For these genes, there is no apparent reduction of GC3 in the middle of the coding regions. Accordingly, we do not detect significant heterogeneity of GC3 between the three regions
or a difference between the GC3 content of central and the two lateral regions (P > 0.15). In addition, the central sections of coding regions of genes with central intron(s) have a significantly higher GC3 content than the equivalent central sections in genes without introns (Mann-Whitney U-test,
) whereas both lateral sections show similar frequencies (P = 0.61 and P = 0.09). Therefore, the lower GC3 frequency in the middle of the coding region in genes without introns cannot be the result of general relaxed selection on codon bias in the central part of coding regions.
Proportion of selected sites in a gene and codon bias: According to our simulations, the presence of neutral sequences embedded in a region under selection causes an increase in the effectiveness of selection on adjacent selected sequences, the length of such neutral sequences being positively correlated with the increment of the effectiveness of selection. We investigated, therefore, a fourth test prediction: Codon bias will be positively correlated with measures of a gene's intron length and number. As a first approximation, we studied the relationship between measures of codon bias (e.g., GC3) and total intron length in the set of all genes with confirmed intron/exon structure. There is a weak positive relationship between GC3 and total intron length, both using all introns (R = 0.040, P = 0.0007) and after eliminating the small fraction of introns with detectable remnants of TE elements (R = 0.041, P = 0.0005).
We also studied the relationship between codon bias and measures of the proportion of sites under selection in a gene. As a simple measure of the density of sites under selection in a gene, we used the PLCR in a gene when embedded introns are included (see MATERIALS AND METHODS). The prediction under IS is again explicit: Codon bias (as measured by GC3) will decrease as PLCR increases. The analysis of all 7499 genes with introns reveals a significant negative relationship between GC3 and PLCR (R = -0.136, P < 1 x 10^-6); equivalent results are obtained using other measures of codon bias. Fig 11, a display of GC3 when genes are grouped with respect to PLCR into five sets of equal sample size, shows that the effect may be stronger when PLCR is medium/high.
Gene length may have a confounding effect on the relationship between GC3 and PLCR because the length of coding region is negatively related to codon bias (![]()
Gene length and intron presence: IS predictions of the favorable consequences of intermediate or spacer sequences forecast that intron length will increase with the length of the coding region. The average length of introns increases with the total length of the coding region (R = 0.219, P < 1 x 10^-6). This relationship is not attributable to differences in either recombination rates or gene expression levels and remains significant (P < 1 x 10^-6) after controlling for these variables. A greater number of introns is also observed in long genes (R = 0.53, P < 1 x 10^-6), although this observation could be connected to causes other than IS.
Intergenic distance and gene length: In addition to having longer (and a greater number of) introns in relation to the length of a coding region, is there also evidence for greater intergenic distance as a function of the length of the coding regions of adjacent genes? This result is expected under a scenario where longer intergenic regions are favored when, otherwise, IS between adjacent genes would be enhanced, i.e., when the lengths of the neighboring coding regions increase. To address this seventh test prediction, we investigated the length of intergenic regions separating well-defined genes (see MATERIALS AND METHODS). The results, displayed in Fig 12, reveal a positive relationship between the length of the 6271 intergenic sequences investigated and the length of the flanking coding regions (R = 0.097, P < 1 x 10^-6); a positive relationship is also observed in regions of high recombination (>3 x 10^-8/bp/generation; R = 0.104, P < 1 x 10^-6, n = 2367). Alternative explanations for this observation based on functional considerations might also be proposed; for example, genes with longer coding regions, if they are functionally more complex, might require tighter gene regulation (and hence longer noncoding regions). However, we are unaware of any explicit empirical support for this class of alternative explanations. If our interpretation of this correlation is correct, it would suggest that IS between adjacent genes might not be negligible in most of the range of recombination rates in Drosophila. This relationship is not an indirect consequence of the effect that recombination rates might have on both parameters: The length of the intergenic regions decreases with increasing recombination rates (R = -0.034, P = 0.008) but no relationship is detected between the length of coding regions and recombination (P > 0.40).