Genetics, Vol. 175, 1381-1393, March 2007, Copyright © 2007
doi:10.1534/genetics.106.065557
Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JT, United Kingdom
1 Corresponding author: Institute of Evolutionary Biology, School of Biological Sciences, Ashworth Laboratories, University of Edinburgh, King's Bldgs., W. Mains Rd., Edinburgh EH9 3JT, United Kingdom.
E-mail: laurence.loewe{at}evolutionary-research.net
| ABSTRACT |
|---|
In addition, a higher level of nonsynonymous divergence in a gene between Drosophila species is correlated with a lower frequency of optimal codons (fop) (BETANCOURT and PRESGRAVES 2002; MARAIS et al. 2004; BIERNE and EYRE-WALKER 2006). To explain this in terms of selective sweeps, KIM (2004) modeled the effect of the spread of selectively favorable amino acid mutations on Ne for the gene in which they occur. In addition, interference among weakly selected sites may also reduce the efficacy of selection at such sites, as measured by Nes, where s denotes the relevant selection coefficient (LI 1987; COMÉRON et al. 1999; MCVEAN and CHARLESWORTH 2000; TACHIDA 2000; COMÉRON and KREITMAN 2002). Such interference has been proposed as an explanation of patterns in the inferred intensity of selection on codon bias within genes of Drosophila. As discovered from whole-genome analyses, less frequent use of optimal codons (i.e., lower codon usage bias) is found in the middle of genes that lack introns, in long genes, and in regions of low recombination (COMÉRON et al. 1999; COMÉRON and KREITMAN 2000, 2002; QIN et al. 2004).
Background selection causes a similar reduction in Ne, by the removal of weakly selected or neutral variants at sites that are closely linked to sites under purifying selection. When deleterious mutations at the latter sites have Nes > 1, they can be treated as effectively close to equilibrium under mutation–selection balance and contribute to background selection effects (CHARLESWORTH et al. 1993, 1995; NORDBORG et al. 1996). Recent results suggest that most amino acid mutations in Drosophila are sufficiently deleterious to fall into this category (LOEWE and CHARLESWORTH 2006; LOEWE et al. 2006); these are so abundant that they may exert significant effects on sites within the same or neighboring genes.
The basis for this can be understood as follows. Published data on autosomal DNA sequence polymorphisms in regions with normal recombination rates in African populations of Drosophila melanogaster yield a mean nonsynonymous nucleotide site diversity of ~0.3% (B. VICOSO, personal communication). With a mean of ~1333 nonsynonymous sites per gene (MISRA et al. 2002), this implies an average of 1333 × 0.003/2 ≈ 2 amino acid variants per gene. Even if as few as 50% of these have Nes > 1, then each gene would carry an average of close to one effectively deleterious mutation. In the absence of recombination, Equation 4 of CHARLESWORTH et al. (1993) shows that Ne is then reduced to 37% of its maximal value. This suggests that there may be enough deleterious amino acid variants in Drosophila genes to cause significant background selection on closely linked sites, even in the presence of recombination. This reflects the weak selection coefficients for most amino acid mutations inferred from polymorphism studies (LOEWE and CHARLESWORTH 2006; LOEWE et al. 2006). Earlier models of background selection assumed stronger selection that leads to less frequent, but more deleterious, variants, on the basis of estimates of the fitness effects of mutations from mutation-accumulation lines (HUDSON and KAPLAN 1995; CHARLESWORTH 1996).
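The arithmetic behind this estimate can be retraced in a few lines. The sketch below is illustrative only: it assumes that the no-recombination result reduces to B = exp(-n), where n is the mean number of effectively deleterious mutations per gene copy, which is consistent with the 37% figure quoted above. The variable names are ours.

```python
import math

nonsyn_sites_per_gene = 1333   # mean number of nonsynonymous sites per gene
nonsyn_diversity = 0.003       # mean nonsynonymous site diversity (~0.3%)
fraction_deleterious = 0.5     # "as few as 50%" of variants with Nes > 1

# Arithmetic exactly as written in the text: 1333 x 0.003 / 2
variants_per_gene = nonsyn_sites_per_gene * nonsyn_diversity / 2
deleterious_per_gene = variants_per_gene * fraction_deleterious

# Assumed no-recombination form: B = exp(-mean number of deleterious mutations)
b_no_recombination = math.exp(-deleterious_per_gene)

print(f"amino acid variants per gene: {variants_per_gene:.2f}")
print(f"effectively deleterious per gene: {deleterious_per_gene:.2f}")
print(f"B without recombination: {b_no_recombination:.2f}")
```

Under these assumptions B comes out close to exp(-1), which is the source of the 37% value.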
We use theoretical predictions of the effects of background selection on neutral diversity, which allow arbitrary levels of recombination to be modeled (HUDSON and KAPLAN 1995; NORDBORG et al. 1996). The theory has been extended to include the effects of background selection on fixation probabilities of weakly selected mutations linked to sites under strong selection (STEPHAN et al. 1999; unpublished results of M. NORDBORG, personal communication). This enables the prediction of codon usage bias, from standard results on mutation–selection–drift equilibrium (LI 1987; BULMER 1991; MCVEAN and CHARLESWORTH 1999). We can thus combine a set of mutation rates and fitness effects with an arbitrary recombinational landscape, for the purpose of predicting the effects of background selection for each point in the landscape.
In the past, such efforts have focused mainly on whole chromosomes to examine whether background selection can explain the relation between local recombination rate and nucleotide diversity for Drosophila (HUDSON and KAPLAN 1995; CHARLESWORTH 1996) and for humans (PAYSEUR and NACHMAN 2002a,b; REED et al. 2005). It was tacitly assumed that background selection at the level of a single gene is negligible. Since gene conversion acts only over short distances, it was also ignored in these studies. While the question of the pattern of chromosomewide variability is important, this article has a quite different goal. We explore whether background selection can cause the patterns of codon bias mentioned above, by predicting the reduction of Ne due to background selection in single genes or in small groups of genes. We investigate the effects of various parameters, including rates of recombination caused by both crossing over and gene conversion, mutation rates, selection coefficients, and gene structure (introns, intergenic distances, and numbers of neighboring genes). All the parameters are chosen as being realistic for D. melanogaster. The results show that background selection may play a significant role in shaping the observed patterns of codon usage bias.
| METHODS |
|---|
The strongly selected sites are assumed to be in mutation–selection equilibrium, so that qi, the frequency of the deleterious allele at site i, is given by
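$$q_i = \frac{u_i}{s_i}, \qquad (1)$$
where u_i is the mutation rate to deleterious alleles at site i and s_i is the heterozygous selection coefficient against them.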
B for the weakly selected (synonymous) site under consideration (the "focal site") is then equal to
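$$B = \frac{N_e}{N_0} \approx \exp\left(-\sum_i \frac{q_i}{\left[1 + (1 - s_i)\,r_i/s_i\right]^2}\right), \qquad (2)$$
where r_i is the recombination rate between the focal site and selected site i, and N_0 is the effective population size in the absence of background selection.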
A study of the effect of background selection due to a single site subject to mutation and selection (STEPHAN et al. 1999) showed that the fixation probabilities of mutations at a weakly selected linked site can be predicted by substituting the value of Ne from Equation 2 into the standard formula for fixation probability for a single locus (KIMURA 1962). Simulations have confirmed that this result also applies to a large number of strongly selected, linked sites, each subject to mutation and selection (M. NORDBORG, personal communication). The level of adaptation at weakly selected, synonymous sites, measured by the frequency of preferred codons at statistical equilibrium under mutation, drift, and selection, is determined by these fixation probabilities (LI 1987; BULMER 1991; MCVEAN and CHARLESWORTH 1999).
There are, however, conditions on the validity of Equation 2 that need to be considered. First, use of Equation 1 requires Nesi > 1. This does not necessarily mean that the population is at equilibrium, but implies that the mean allele frequency over the distribution generated by selection, mutation, and drift is well approximated by Equation 1, assuming semidominant effects of mutations on fitness (MCVEAN and CHARLESWORTH 1999). Thus the mean frequency over a group of variants subject to selection is given by Equation 1, so that the formula works well in practice (NORDBORG et al. 1996). Second, if selection against deleterious mutations is very weak, there is a significant probability of fixation of a mutation at a weakly selected site in situations when the mutation is linked to a deleterious variant that is drifting to high frequencies or fixation; such cases are ignored in Equation 2. Use of Equations 5 and 6 in the Appendix to CHARLESWORTH et al. (1993) for the case of no recombination shows that this effect will be small if the fixation probability of a deleterious mutation can be neglected relative to the neutral value, as is the case if Nesi > 1 (KIMURA 1983, pp. 43–46). Third, if there is tight linkage among a group of deleterious mutations, Hill–Robertson effects among them undermine the effectiveness of selection, and Equation 2 overestimates the reduction in Ne (CHARLESWORTH et al. 1993; NORDBORG et al. 1996). For these reasons, we removed from consideration any sites for which Nesi ≤ 1 and restricted ourselves mostly to small groups of genes with nonzero levels of gene conversion. To produce our results, we computed B either for all synonymous sites in the focal gene or for 200 evenly distributed synonymous sites in the gene (to save computing time). To condense this into a single value of B for each gene, we computed the arithmetic mean over all synonymous sites for use in some of our plots.
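A minimal sketch of how B might be evaluated for a single focal site follows. It assumes that Equation 1 is the mutation–selection balance q_i = u_i/s_i and that Equation 2 has the form given by NORDBORG et al. (1996), B = exp(-Σ q_i/[1 + (1 - s_i) r_i/s_i]²). The function name, the use of Python (the original computations used R), and the selection coefficient of 10⁻⁵ are illustrative assumptions, not the authors' code or parameter estimates.

```python
import math

def background_selection_b(u, s, r):
    """Return B = Ne/N0 at one focal site (assumed form of Equation 2).

    u, s, r are equal-length sequences giving, for each strongly selected
    site i, the mutation rate, the heterozygous selection coefficient, and
    the recombination rate between site i and the focal site.
    """
    total = 0.0
    for u_i, s_i, r_i in zip(u, s, r):
        if s_i <= 0.0:
            raise ValueError("selection coefficients must be positive")
        q_i = u_i / s_i  # assumed Equation 1: mutation-selection balance
        total += q_i / (1.0 + (1.0 - s_i) * r_i / s_i) ** 2
    return math.exp(-total)

# Illustrative gene: 1333 selected sites, u = 4e-9, s = 1e-5 (assumed value)
n_sites = 1333
u = [4e-9] * n_sites
s = [1e-5] * n_sites
b_linked = background_selection_b(u, s, [0.0] * n_sites)   # no recombination
b_loose = background_selection_b(u, s, [1e-5] * n_sites)   # r_i equal to s_i
print(b_linked, b_loose)
```

With no recombination the sum collapses to the total of the q_i, so B depends only on the ratio of the total mutation rate to s; any recombination moves B back toward 1.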
Modeling gene structure and gene conversion:
To incorporate gene structure into Equation 2 requires only specification of the recombination rates, ri, if we assume a constant mutation rate and selection coefficient across the gene. Our basic approach was to measure the molecular distance di between the synonymous focal site and the selected site i while walking over all sites between them. Whenever nonselected sites were encountered, di was increased accordingly, without increasing the sum in Equation 2. Three types of sequences affect di in this way: synonymous sites, introns, and intergenic regions. Although our computer code is flexible, we assumed that all neighboring genes had the same structure (2000 bp in exons; four introns of 100 bp), independent of that of the focal gene. For a given number of introns, the l bp of the exon sequence were divided into a corresponding number of equally long exons.
To convert di into ri, we used Equation 1 of FRISSE et al. (2001), which assumes a mixture of reciprocal crossing over and gene conversion with an exponential distribution of tract lengths. This gives the net recombination rate between the focal site and site i as
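$$r_i = r_c d_i + 2 r_g\left[1 - \exp(-d_i/d_g)\right], \qquad (3)$$
where r_c is the rate of crossing over per base pair, r_g is the frequency of gene conversion per site, and d_g is the mean tract length.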
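The conversion from molecular distance to recombination rate can be sketched as below. This is a hedged illustration: it assumes that the net rate is the sum of a crossing-over term, r_c × d, and a gene conversion term, 2 r_g [1 - exp(-d/d_g)], with r_g interpreted as the per-site conversion frequency and d_g as the mean tract length. The function name is ours, and the default values are the standard parameters quoted later in METHODS.

```python
import math

def net_recombination_rate(d, r_c=1e-8, r_g=0.25e-5, d_g=352.0):
    """Net recombination rate between two sites d base pairs apart.

    Assumed form: crossing over grows linearly with distance, while the
    gene conversion term saturates at 2 * r_g once d greatly exceeds the
    mean tract length d_g.
    """
    if d < 0:
        raise ValueError("distance must be non-negative")
    crossing_over = r_c * d
    gene_conversion = 2.0 * r_g * (1.0 - math.exp(-d / d_g))
    return crossing_over + gene_conversion

for d in (10, 100, 1000, 10000):
    print(d, net_recombination_rate(d))
```

Under this form gene conversion dominates over short distances, which is why it matters for within-gene patterns even though it is often ignored at the chromosome scale.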
Modeling the distribution of deleterious mutational effects (DDME) on fitness:
We assumed that the distribution of heterozygous selection coefficients against deleterious mutations follows a lognormal distribution (AITCHISON and BROWN 1957; CROW 1988), since this distribution has proved useful for estimating mutational effects in Drosophila (LOEWE and CHARLESWORTH 2006). It is characterized by "shape" and "location" parameters, σg and µg, which correspond to the exponentials of the standard deviation and mean of the natural logarithm of the variate, respectively (LIMPERT et al. 2001). Unfortunately it is not possible to estimate the DDME in D. melanogaster by this method without making several assumptions. We therefore used estimates from D. miranda and D. pseudoobscura (LOEWE and CHARLESWORTH 2006) to choose plausible DDMEs, on the basis of the requirement that these be compatible with the diversity data for both species and also predict a realistic number of dominant, effectively lethal, mutations (LOEWE and CHARLESWORTH 2006).
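The parameterization described above can be illustrated with a short sketch. It assumes only what the text states: the logarithm of the selection coefficient is normally distributed with mean ln(location) and standard deviation ln(shape). The names `mu_g` and `sigma_g`, both functions, and any numerical values passed to them are hypothetical and are not the estimates used in the article.

```python
import math
import random

def fraction_effectively_neutral(ne, mu_g, sigma_g):
    """Probability mass with Ne * s <= 1 under a lognormal DDME."""
    # ln(s) is normal with mean ln(mu_g) and standard deviation ln(sigma_g);
    # the effectively neutral threshold is s = 1 / Ne.
    z = (math.log(1.0 / ne) - math.log(mu_g)) / math.log(sigma_g)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def sample_selection_coefficients(n, mu_g, sigma_g, seed=0):
    """Draw n selection coefficients from the lognormal DDME.

    No truncation is applied, so very wide distributions can produce
    values above 1; a real analysis would need to handle that case.
    """
    rng = random.Random(seed)
    return [rng.lognormvariate(math.log(mu_g), math.log(sigma_g))
            for _ in range(n)]

# Hypothetical values, for illustration only
draws = sample_selection_coefficients(1000, mu_g=1e-5, sigma_g=50.0)
print(fraction_effectively_neutral(1.3e6, 1e-5, 50.0))
```

When the location parameter equals 1/Ne, exactly half of the probability mass is effectively neutral, which gives a quick sanity check on the implementation.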
We then used the shape parameters of these DDMEs to estimate the corresponding location parameters. This was done by using nonsynonymous and synonymous nucleotide site diversities (πA and πS, respectively) from autosomal genes in high-recombination regions of African populations of D. melanogaster. Means with 90% confidence intervals (from a metaanalysis of published data) were kindly provided by Beatriz Vicoso: πA = 0.295% (0.166–0.560%) and πS = 2.07% (1.67–2.59%), on the basis of 17 loci weighted by the inverses of their expected sampling variances (BARTOLOMÉ et al. 2005). The location parameter for an assumed shape parameter was obtained by equating observed and expected values of πA/πS, in a similar way to the procedure of LOEWE and CHARLESWORTH (2006). Key parameters of the resulting DDMEs are given in Table 1.
Since most DDMEs included a significant probability mass in the effectively neutral area (Nes ≤ 1.0), a significant number of nonsynonymous sites are nearly neutral and are thus omitted from the calculations. This makes our DDME-based estimates of B slightly overestimate the true value.
Plausible parameter combinations for D. melanogaster:
We chose our parameters to reflect the properties of autosomal genes in D. melanogaster. Nucleotide site mutation rate estimates (u = 5.8 × 10⁻⁹/bp/generation, with 95% confidence interval 2.1 × 10⁻⁹–1.31 × 10⁻⁸) have been obtained from a mutation-detection screen of mutation-accumulation experiments (HAAG-LIAUTARD et al. 2007). To cover a range of mutation rates across the genome, we used mutation rates of 2 × 10⁻⁹, 4 × 10⁻⁹, and 8 × 10⁻⁹, respectively, in the calculations described in RESULTS. If we combine these with the mean synonymous diversity at autosomal loci in high-recombination regions from African populations (see above), Ne ≈ 1.3 × 10⁶, with a range from 0.65 × 10⁶ to 2.6 × 10⁶, corresponding to the upper and lower limits for the mutation rates that we use. Our results agree with other estimates that suggest a recent Ne of ~10⁶ (MORIYAMA and POWELL 1996; MCVEAN and VIEIRA 2001). Estimates of the parameters of the DDME assumed a "standard" mutation rate of 4 × 10⁻⁹. Previous work suggests that the estimates of the shape of the DDME and the product of Ne and location parameter are not very sensitive to the mutation rate (LOEWE et al. 2006).
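The Ne values quoted above can be recovered with a short calculation. The sketch assumes the standard neutral expectation for synonymous diversity, πS = 4 Ne u, which the text does not state explicitly; the variable names are ours.

```python
synonymous_diversity = 0.0207         # mean synonymous diversity, 2.07%
mutation_rates = (2e-9, 4e-9, 8e-9)   # per bp per generation, as in the text

# Assumed neutral expectation: pi_S = 4 * Ne * u, so Ne = pi_S / (4 * u)
ne_estimates = {u: synonymous_diversity / (4.0 * u) for u in mutation_rates}

for u, ne in ne_estimates.items():
    print(f"u = {u:.0e}  ->  Ne ~ {ne:.2e}")
```

The highest mutation rate gives the smallest Ne and the lowest gives the largest, which is the ordering implied by the range in the text.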
We assumed crossing-over rates of recombining genes that ranged from rc = 1 × 10⁻⁹ to 3 × 10⁻⁸, with a mean of ~1 × 10⁻⁸/bp/generation (BETANCOURT and PRESGRAVES 2002; HEY and KLIMAN 2002), averaging over the rc-values for females and males (which do not cross over). Unless otherwise stated, all computations assumed a gene conversion frequency per site (corrected for the lack of events in males) of rg = 0.25 × 10⁻⁵, on the basis of the mean of estimates from the rosy locus (HILLIKER and CHOVNICK 1981), and a mean tract length of 352 bp (HILLIKER et al. 1994).
Gene structures were estimated from the third release of the D. melanogaster genome (MISRA et al. 2002; FLYBASE 2006). The average length of the sum of all exons in a gene is 2078 bp (27.8 Mb total sequence in exons/13,379 protein-coding genes; MISRA et al. 2002), with extremes ranging from 63 to 15,603 bp (ADAMS et al. 2000). The typical gene has 3.6 introns (48,257 introns/13,379 protein-coding genes; MISRA et al. 2002). Most introns have a length between 59 and 63 bp (MOUNT et al. 1992), but extremes range from 40 bp to >70 kb (ADAMS et al. 2000). Intergenic distances are ~6.2 kb on average [subtracting 4 introns of 100 bp from (116.8 Mb total euchromatin – 27.8 Mb all exons)/13,379 protein-coding genes] (MISRA et al. 2002). However, gene densities vary from 1/50 kb to 30/50 kb (ADAMS et al. 2000), so that the intergenic distance could be as little as 500 bp in dense gene clusters. We chose our "standard setup" to resemble these findings, by assuming that a typical gene has 2000 bp of exons, 4 introns of 100 bp, and a distance of 6 kb between genes. The possible effects of neighboring genes are ignored, except where specifically mentioned. We assume that two-thirds of sites are nonsynonymous, i.e., 1333 per gene. Mutations to stop codons were ignored. Deviations from this standard setting are mentioned explicitly.
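The genome-wide averages behind the standard setup can be retraced as follows. This is a plain restatement of the arithmetic in the text, using only the totals quoted there; the variable names are ours.

```python
exon_bp_total = 27.8e6     # total sequence in exons (bp)
euchromatin_bp = 116.8e6   # total euchromatin (bp)
n_genes = 13379            # protein-coding genes
n_introns = 48257          # annotated introns

mean_exon_bp = exon_bp_total / n_genes
introns_per_gene = n_introns / n_genes

# Non-exon sequence per gene, minus 4 introns of 100 bp each
intergenic_bp = (euchromatin_bp - exon_bp_total) / n_genes - 4 * 100

# Standard setup: two-thirds of the 2000 exon bp are nonsynonymous sites
nonsynonymous_sites = 2000 * 2 / 3

print(round(mean_exon_bp), round(introns_per_gene, 1),
      round(intergenic_bp), round(nonsynonymous_sites))
```

The rounded standard values (2000 bp of exons, 4 introns, 6 kb between genes) sit close to these genome-wide means.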
The DDME estimates are shown in Table 1. In computations that assumed constant selection coefficients, we used the harmonic mean heterozygous selection coefficient, sh, estimated from a DDME with a width of σg = 50. This gives Nesh ≈ 12.7 and cne ≈ 12.6%, where cne is the fraction of effectively neutral nonsynonymous mutations (for which Nes ≤ 1). sh can be shown to be the dominant term in a Taylor series expansion of Equation 2 for low recombination rates, when there is a distribution of s-values, which provides a justification for using sh in Equation 2 as an approximation. This requires a correction for the presence of effectively neutral mutations among the nonsynonymous mutations. We therefore multiplied the overall mutation rate by a factor of 1 – cne to obtain the mutation rate used in Equation 2.
Computations:
The model described above was implemented using the statistical script programming language R (IHAKA and GENTLEMAN 1996; MAINDONALD and BRAUN 2002), which can be freely downloaded from http://www.r-project.org/. All core functionality was contained in a function "FopBgs," which takes all possible input parameters and returns a list that contains all potentially informative results. FopBgs was tested by monitoring key parameters while stepping through the important parts of the code and by comparing results with analytical results, for the case with no recombination and for an approximation for the case of crossing over with no gene conversion (Equation 9 of NORDBORG et al. 1996).
| RESULTS |
|---|
An important question is the sensitivity of these results to the assumption of constant selection coefficients. Figure 2, A and B, shows the same plots as Figure 1, A and B, but with a DDME of width σg = 50, averaged over 100 genes, instead of a fixed selection coefficient. Together with the results for other DDMEs (data not shown), this suggests that the general nature of the patterns within genes is robust to the distribution of s, but that the overall effects of background selection are reduced by a wide distribution. The effects of introns and gene boundaries seem to be slightly more pronounced with a wider DDME, but the reduction in B in the center of genes is smaller.
6 kb for many typical parameter combinations.
| DISCUSSION |
|---|
Patterns of Ne within genes:
Background selection caused by deleterious amino acid mutations within a single gene can reduce the effective population size experienced at linked neutral or nearly neutral sites (Figures 1 and 2). In addition, the dilution of background selection effects by recombination produces patterning along the gene of B, the ratio of Ne at a given site to its value in the absence of background selection, N0. This is because intergenic and intron sequences are assumed for convenience to be neutral and hence do not contribute to background selection. This produces an increase in B at the ends of genes and at the boundaries of exons with introns (see Equation 9 of NORDBORG et al. 1996). While there is evidence for purifying selection on synonymous mutations (COMÉRON and GUTHRIE 2005) and on mutations in noncoding sequences (HADDRILL et al. 2005), the levels of constraint on such mutations are typically much lower than those for nonsynonymous mutations, so that it seems reasonable as a first approximation to ignore them, especially as the effects of weak selection are rapidly diluted by recombination (Equation 2). This argument does not apply to the splicing signals at the beginning and the end of introns (MOUNT et al. 1992). These are probably under strong selection and can be accounted for by slightly longer "effective exons."
Although for plausible parameter sets, it is clear that the mean B over all sites within a gene is always reduced by at least 4% or so, the within-gene patterns in B are likely to be very small and would be very hard to detect in surveys of nucleotide site diversity, which previously have been used to infer differences across the genome in Ne caused by background selection and selective sweeps (BEGUN and AQUADRO 1992; CHARLESWORTH 1996). They may, however, be detectable from patterns of codon bias seen in genomewide analyses of sets of genes, since codon bias is affected by the value of Ne under the standard mutation–selection–drift model (LI 1987; BULMER 1991; MCVEAN and CHARLESWORTH 1999). According to the Li–Bulmer equation, the equilibrium frequency of optimal codons, fop (assuming a preferred and an unpreferred codon at each site), for a given strength of selection is given by
$$f_{op} = \frac{1}{1 + \kappa \exp(-4 N_e s)}, \qquad (4)$$
where s is the selection coefficient against heterozygotes for nonoptimal codons (semidominance is assumed), and κ is the ratio of the mutation rates from and to optimal codons, respectively (i.e., the mutational bias).
Without estimates of s and κ, it is impossible to make fully quantitative predictions to compare with the data, but an approximate analysis can be carried out as follows. Differentiating fop in Equation 4 with respect to Ne, we obtain the following expression for the relation between a small change in fop as a proportion of its value, (dfop)/fop, and the corresponding small proportional change in Ne, (dNe)/Ne:
$$\frac{d f_{op}}{f_{op}} = 4 N_e s\,(1 - f_{op})\,\frac{d N_e}{N_e} = (1 - f_{op}) \ln\!\left[\frac{\kappa f_{op}}{1 - f_{op}}\right] \frac{d N_e}{N_e}. \qquad (5)$$
The important parameters can be estimated as follows. In their genome analyses, COMÉRON and KREITMAN (2002) used the frequency of GC content at third coding positions (GC3) in genes of D. melanogaster as a proxy for codon usage bias, since most preferred codons in Drosophila end in G or C. Work on several species of Drosophila has suggested values of ~3 for mutational bias from GC to AT mutations (MASIDE et al. 2004; BARTOLOMÉ et al. 2005); to be compatible with the mean GC3 of ~0.65 found by COMÉRON and KREITMAN (2002), Nes for selection on GC3 must be ~0.43. A proportional change in equilibrium fop (given by the right-hand side of Equation 5) is ~60% of the corresponding small proportional change in Ne, if fop = 0.65. Thus, everything else being equal, a change in fop is associated with a substantially larger change in Ne.
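These two numbers can be checked with a short sketch. It assumes the Li–Bulmer equilibrium in the form fop = 1/(1 + κ exp(-4 Ne s)), where κ is our label for the mutational bias, and the corresponding relative sensitivity d ln(fop)/d ln(Ne) = 4 Ne s (1 - fop). This form reproduces the values of ~0.43 and ~60% quoted above; the function names are ours.

```python
import math

def fop_equilibrium(ne_s, kappa):
    """Equilibrium frequency of optimal codons (assumed Li-Bulmer form)."""
    return 1.0 / (1.0 + kappa * math.exp(-4.0 * ne_s))

def ne_s_from_fop(fop, kappa):
    """Invert the equilibrium formula to get Ne * s from fop and kappa."""
    return math.log(kappa * fop / (1.0 - fop)) / 4.0

def relative_sensitivity(fop, kappa):
    """Proportional change in fop per proportional change in Ne."""
    return 4.0 * ne_s_from_fop(fop, kappa) * (1.0 - fop)

ne_s = ne_s_from_fop(0.65, 3.0)
sensitivity = relative_sensitivity(0.65, 3.0)
print(f"Ne*s = {ne_s:.2f}, sensitivity = {sensitivity:.2f}")
```

A sensitivity below 1 means that a given proportional reduction in Ne, such as that caused by background selection, translates into a smaller proportional reduction in fop.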
Figure 10 of COMÉRON and KREITMAN (2002) shows that GC3 for the central part of a D. melanogaster gene without an intron is 3–4% lower than the value for the distal parts, but with a good deal of uncertainty as to the exact value of this difference. Figure 2B shows that, with the standard rate of gene conversion and the estimated DDME, a mutation rate of 4 × 10⁻⁹ (slightly lower than the point estimate of HAAG-LIAUTARD et al. 2007) gives a value of B for the central part of a gene with no introns that is ~3% lower than that for the ends, corresponding to a difference of 1.8% in fop, somewhat smaller than the observed value. COMÉRON and GUTHRIE (2005) directly estimated values of Nes from polymorphism and divergence data on D. melanogaster and its close relatives and showed that it was lower for the central regions of long exons; their estimates of fop for these genes showed a reduction of ~10% for the central 150 codons as opposed to the 150 codons at the beginning and the end of long genes without introns. One possibility for explaining this underprediction of the observed effects is that gene conversion rates may be higher at the ends of genes than in their centers, as seen for the rosy locus (HILLIKER and CHOVNICK 1981); this could enhance the relative difference in codon bias between the ends and the centers. Another possibility is that the mutation rate is higher than we have assumed. A mutation rate of 8 × 10⁻⁹, which is near the upper confidence interval of the estimates, gives a larger predicted difference in fop (~14%, transformed from Figure 2B) than is observed.
Figures 1 and 2 also show that the presence of introns reduces the size of the difference in B between the ends and the middle of genes, because the value of B for the central part of a gene is increased by the presence of introns. This is qualitatively consistent with the results in Figure 10 of COMÉRON and KREITMAN (2002).
QIN et al. (2004) reexamined these patterns in the context of the effects of gene length and level of gene expression, using the effective number of codons (ENC) as an inverse measure of codon usage bias. Somewhat unexpectedly, their analyses showed that codon bias is lowest at either end of the genes, reaches a peak toward ~50–100 codons from the ends, and then declines toward the middle of the genes (their Figure 6). There is no obvious explanation for the low bias at the very ends of genes, which may reflect constraints on translational efficiency at the beginning and the end of translation (QIN et al. 2004), but the tendency for ENC to increase toward the centers of genes is qualitatively consistent with expectations under both background selection and weak Hill–Robertson effects among synonymous sites (COMÉRON and KREITMAN 2002). The spatial pattern appears to be stronger in genes with higher expression levels; since overall codon bias is well known to be correlated with level of gene expression (DURET and MOUCHIROUD 1999), this presumably reflects stronger selection for codon bias in more highly expressed genes. From Equation 5, it can be shown that a higher value of Nes is associated with a higher relative sensitivity of fop to Ne for realistic parameter values, so that any mechanism causing differences in Ne will cause differences in codon bias to be stronger when overall codon bias is higher. Genes with one intron have slightly higher levels of codon usage bias along their length than genes lacking introns (Figure 8 in QIN et al. 2004), consistent with the results in Figures 1 and 2.
The effects of gene length and intron length:
Figure 3 shows that exon length can have a substantial effect on the mean B for a gene, but the magnitude of the effect is very dependent on other parameter values. For selection coefficients drawn from the estimated DDME, and with the standard mutation rate of 4 × 10⁻⁹ and standard recombination parameters, the value for the longest genes in Figure 3B is ~0.92 instead of the maximal value of 0.97 that would apply to very short genes; i.e., there is an ~5% reduction in B, corresponding to a 3% reduction in fop below the maximum. With a mutation rate of 8 × 10⁻⁹ and a standard rate of recombination, B falls from ~0.45 for short genes to ~0.15 for long genes (Figure 3B, right scale). This 66% reduction in B corresponds to an ~31% reduction in fop, from Equation 5 with fop = 0.55 and mutational bias κ = 3.
The results on D. melanogaster of DURET and MOUCHIROUD (1999, Figure 1 therein) showed that long genes (>570 codons) with high expression levels have ~11% lower fop than very short genes (<333 codons). The direct estimates of Nes also suggested a large effect of exon length (COMÉRON and GUTHRIE 2005). Our results indicate that other processes will be required to explain these observations, if the mutation rate is 4 × 10⁻⁹; however, if the mutation rate in these genes is somewhat larger, background selection can generate these patterns.
Hill–Robertson interference among weakly selected synonymous sites is one of the other processes that may help explain these observations; simulations showed effects of this kind in regions of normal crossing over (but gene conversion was ignored in the model of COMÉRON et al. 1999). It is possible that a combination of background selection and Hill–Robertson interference among synonymous sites might produce larger effects at standard mutation rates. DURET and MOUCHIROUD (1999) argued that it was unlikely that general Hill–Robertson effects (which include background selection) could explain the effects of gene length, since they found no effect of the length of neighboring genes on fop in Caenorhabditis elegans. However, with the very high linkage disequilibrium observed in C. elegans, probably reflecting a high rate of self-fertilization (CUTTER 2006), it is likely that the effective rate of recombination is very low (CHARLESWORTH et al. 1993), so that genes experience background selection effects from many neighbors. This would greatly reduce the effects of immediate neighbors, if codon usage bias is determined by the current recombinational environment.
COMÉRON and KREITMAN (2002, Figure 11 therein) also showed that the GC3 content of a D. melanogaster gene decreases by ~3% of its maximal value as the proportion of a gene contributed by introns decreases. The results in Figure 4B for the standard parameter set predict an effect of this kind, but the magnitude of the change in fop is only ~2%, although bigger effects are again possible with a higher mutation rate. Also, it is possible that including Hill–Robertson interference among synonymous sites would improve the fit to the data.
The effects of intergenic distance and neighboring genes:
It has been argued that a higher local gene density correlates with reduced diversity because of increased levels of background selection in humans (PAYSEUR and NACHMAN 2002a) and in Arabidopsis thaliana (NORDBORG et al. 2005). This is consistent with Figure 8, which shows more background selection with shorter intergenic distances in gene clusters of constant size.
Figures 5 and 6 show that neighboring genes have little effect on B and its behavior within a gene, unless crossing over is infrequent. This reflects the fact that background selection caused by the relatively weak selection experienced by most amino acid mutations is very sensitive to recombination. But if crossing over is completely absent, gene conversion on its own fails to prevent the cumulative effects of background selection (Figure 7), so that B is then very sensitive to the number of neighboring genes. With the standard gene conversion, selection, and mutation parameters, Figure 7B shows that B is reduced to 5% of its maximum level with only 40 genes that fail to cross over. It was not possible to produce numerical results for cases with a distribution of selection coefficients for >40 genes, due to computing time constraints, but it seems likely from the nearly log-linear relation between B and the number of genes that 80 genes (close to the number on chromosome 4 of D. melanogaster; FLYBASE 2006) would result in an effective population size of ~0.1% of its size without background selection. These results assume that gene conversion is occurring at normal rates in regions of low crossing over, consistent with observations on SNPs in such regions in D. melanogaster, other than the Y chromosome (LANGLEY et al. 2000; JENSEN et al. 2002; SHELDAHL et al. 2003). The effect would obviously be even larger in the absence of gene conversion.
These results raise serious questions about the validity of the model for groups of genes that do not cross over. While codon usage bias is greatly reduced in low-recombination regions of the D. melanogaster genome, it is not completely absent, with an ENC of 50.9 on chromosome 4 compared with a value of 56.0 for random nucleotides from noncoding regions (COMÉRON et al. 1999). Furthermore, the level of SNP diversity on chromosome 4 is ~20% of the genomic average (JENSEN et al. 2002; SHELDAHL et al. 2003), much greater than predicted by the model. Similarly, diversity on the nonrecombining neo-Y chromosome of D. miranda is about one-sixtieth of that of its partner, the neo-X chromosome (BARTOLOMÉ and CHARLESWORTH 2006).
Limitations of the background selection model:
These observations suggest that the model grossly overestimates the effects of background selection when recombination rates are low. Such an effect has indeed been detected in previous studies using Monte Carlo simulations (CHARLESWORTH et al. 1993; NORDBORG et al. 1996). It is probably caused by the fact that, with very close linkage, Hill–Robertson interference develops between the relatively strongly selected sites causing background selection, and so its efficacy is undermined. The more densely that selected sites are packed into a given map length, the greater the extent of Hill–Robertson interference among them, and so the weaker the effective selection acting on each of them. A pattern of this kind can be seen in Figure 9 of MCVEAN and CHARLESWORTH (2000). This suggests a need to carry out more detailed investigations, to determine whether the observed features of low-recombination regions can be adequately accounted for. There is also a need to reinvestigate the predictions of the background selection model for the distribution of Ne over large genomic regions in Drosophila (HUDSON and KAPLAN 1995; CHARLESWORTH 1996), using estimates of the distribution of selection coefficients from molecular population genetics analyses rather than mutation-accumulation experiments.
Conclusion:
Background selection within genes seems to be sufficient to explain the observed patterns of codon usage bias in genes, if mutation rates are high enough. However, we cannot exclude other explanations, since the absolute magnitude of the strength of background selection is strongly influenced by evolutionary parameters such as local mutation rates, recombination rates, and the distribution of selection coefficients. Other explanations like Hill–Robertson effects among synonymous sites or recurrent selective sweeps can be excluded only if the evolutionary parameters of background selection can be estimated with sufficient accuracy. We therefore suggest that any integrated theory of the patterns of codon bias in genes must include background selection. Further understanding of these complexities will require models that include all relevant factors.
| ACKNOWLEDGEMENTS |
|---|
| LITERATURE CITED |
|---|
ADAMS, M. D., S. E. CELNIKER, R. A. HOLT, C. A. EVANS, J. D. GOCAYNE et al., 2000 The genome sequence of Drosophila melanogaster. Science 287: 2185–2195.
AITCHISON, J., and J. A. C. BROWN, 1957 The Lognormal Distribution, With Special Reference to Its Uses in Economics. Cambridge University Press, Cambridge, UK.
ANDOLFATTO, P., and M. NORDBORG, 1998 The effect of gene conversion on intralocus associations. Genetics 148: 1397–1399.
BARTOLOMÉ, C., and B. CHARLESWORTH, 2006 Evolution of amino-acid sequences and codon usage on the Drosophila miranda neo-sex chromosomes. Genetics 174: 2033–2044.
BARTOLOMÉ, C., X. MASIDE, S. YI, A. L. GRANT and B. CHARLESWORTH, 2005 Patterns of selection on synonymous and nonsynonymous variants in Drosophila miranda. Genetics 169: 1495–1507.
BEGUN, D. J., and C. F. AQUADRO, 1992 Levels of naturally occurring DNA polymorphism correlate with recombination rates in Drosophila melanogaster. Nature 356: 519–520.
BETANCOURT, A. J., and D. C. PRESGRAVES, 2002 Linkage limits the power of natural selection in Drosophila. Proc. Natl. Acad. Sci. USA 99: 13616–13620.
BIERNE, N., and A. EYRE-WALKER, 2006 Variation in synonymous codon use and DNA polymorphism within the Drosophila genome. J. Evol. Biol. 19: 1–11.
BIRKY, JR., C. W., and J. B. WALSH, 1988 Effects of linkage on rates of molecular evolution. Proc. Natl. Acad. Sci. USA 85: 6414–6418.
BULMER, M., 1991 The selection-mutation-drift theory of synonymous codon usage. Genetics 129: 897–908.
CHARLESWORTH, B., 1996 Background selection and patterns of genetic diversity in Drosophila melanogaster. Genet. Res. 68: 131–149.
CHARLESWORTH, B., M. T. MORGAN and D. CHARLESWORTH, 1993 The effect of deleterious mutations on neutral molecular variation. Genetics 134: 1289–1303.
CHARLESWORTH, D., B. CHARLESWORTH and M. T. MORGAN, 1995 The pattern of neutral molecular variation under the background selection model. Genetics 141: 1619–1632.
COMÉRON, J. M., and T. B. GUTHRIE, 2005 Intragenic Hill-Robertson interference influences selection intensity on synonymous mutations in Drosophila. Mol. Biol. Evol. 22: 2519–2530.
COMÉRON, J. M., and M. KREITMAN, 2000 The correlation between intron length and recombination in Drosophila: dynamic equilibrium between mutational and selective forces. Genetics 156: 1175–1190.
COMÉRON, J. M., and M. KREITMAN, 2002 Population, evolutionary and genomic consequences of interference selection. Genetics 161: 389–410.
COMÉRON, J. M., M. KREITMAN and M. AGUADÉ, 1999 Natural selection on synonymous sites is correlated with gene length and recombination in Drosophila. Genetics 151: 239–249.
CROW, E. L. (Editor), 1988 Lognormal Distributions: Theory and Applications. Marcel Dekker, New York.
CUTTER, A. D., 2006 Nucleotide polymorphism and linkage disequilibrium in wild populations of the partial selfer Caenorhabditis elegans. Genetics 172: 171–184.
DURET, L., and D. MOUCHIROUD, 1999 Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis. Proc. Natl. Acad. Sci. USA 96: 4482–4487.
FELSENSTEIN, J., 1974 The evolutionary advantage of recombination. Genetics 78: 737–756.
FISHER, R. A., 1930 The Genetical Theory of Natural Selection. Clarendon Press, Oxford.
FLYBASE, 2006 A Database of the Drosophila Genome (http://flybase.bio.indiana.edu/).
FRISSE, L., R. R. HUDSON, A. BARTOSZEWICZ, J. D. WALL, J. DONFACK et al., 2001 Gene conversion and different population histories may explain the contrast between polymorphism and linkage disequilibrium levels. Am. J. Hum. Genet. 69: 831–843.
GORDO, I., and B. CHARLESWORTH, 2001 Genetic linkage and molecular evolution. Curr. Biol. 11: R684–R686.
HAAG-LIAUTARD, C., M. DORRIS, X. MASIDE, S. MACASKILL, D. L. HALLIGAN et al., 2007 Direct estimation of per nucleotide and genomic deleterious mutation rates in Drosophila. Nature 445: 82–85.
HADDRILL, P. R., B. CHARLESWORTH, D. L. HALLIGAN and P. ANDOLFATTO, 2005 Patterns of intron sequence evolution in Drosophila are dependent upon length and GC content. Genome Biol. 6: R67.
HALDANE, J. B. S., 1927 The mathematical theory of natural and artificial selection. Part V: selection and mutation. Proc. Camb. Philos. Soc. 23: 838–844.
HEY, J., and R. M. KLIMAN, 2002 Interactions between natural selection, recombination and gene density in the genes of Drosophila. Genetics 160: 595–608.
HILL, W. G., and A. ROBERTSON, 1966 The effect of linkage on limits to artificial selection. Genet. Res. 8: 269–294.
HILLIKER, A. J., and A. CHOVNICK, 1981 Further observations on intragenic recombination in Drosophila melanogaster. Genet. Res. 38: 281–296.
HILLIKER, A. J., G. HARAUZ, A. G. REAUME, M. GRAY, S. H. CLARK et al., 1994 Meiotic gene conversion tract length distribution within the rosy locus of Drosophila melanogaster. Genetics 137: 1019–1026.
HUDSON, R. R., and N. L. KAPLAN, 1995 Deleterious background selection with recombination. Genetics 141: 1605–1617.
IHAKA, R., and R. GENTLEMAN, 1996 R: a language for data analysis and graphics. J. Comput. Graph. Stat. 5: 299–314.
JENSEN, M. A., B. CHARLESWORTH and M. KREITMAN, 2002 Patterns of genetic variation at a chromosome 4 locus of Drosophila melanogaster and D. simulans. Genetics 160: 493–507.
KIM, Y., 2004 Effect of strong directional selection on weakly selected mutations at linked sites: implication for synonymous codon usage. Mol. Biol. Evol. 21: 286–294.
KIMURA, M., 1962 On the probability of fixation of mutant genes in a population. Genetics 47: 713–719.
KIMURA, M., 1983 The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge, UK.
LANGLEY, C. H., B. P. LAZZARO, W. PHILLIPS, E. HEIKKINEN and J. M. BRAVERMAN, 2000 Linkage disequilibria and the site frequency spectra in the su(s) and su(wa) regions of the Drosophila melanogaster X chromosome. Genetics 156: 1837–1852.
LI, W. H., 1987 Models of nearly neutral mutations with particular implications for nonrandom usage of synonymous codons. J. Mol. Evol. 24: 337–345.
LIMPERT, E., W. A. STAHEL and M. ABBT, 2001 Log-normal distributions across the sciences: keys and clues. BioScience 51: 341–352.
LOEWE, L., and B. CHARLESWORTH, 2006 Inferring the distribution of mutational effects on fitness in Drosophila. Biol. Lett. 2: 426–430.
LOEWE, L., B. CHARLESWORTH, C. BARTOLOMÉ and V. NÖEL, 2006 E