DNA Variability and Divergence at the Notch Locus in Drosophila melanogaster and D. simulans: A Case of Accelerated Synonymous Site Divergence

DNA diversity in two segments of the Notch locus was surveyed in four populations of Drosophila melanogaster and two of D. simulans. In both species we observed evidence of non-steady-state evolution. In D. simulans we observed a significant excess of intermediate frequency variants in a non-African population. In D. melanogaster we observed a disparity between levels of sequence polymorphism and divergence between one of the Notch regions sequenced and other neutral X chromosome loci. The striking feature of the data is the high level of synonymous site divergence at Notch, which is the highest reported to date. To more thoroughly investigate the pattern of synonymous site evolution between these species, we developed a method for calibrating preferred, unpreferred, and equal synonymous substitutions by the effective (potential) number of such changes. In D. simulans, we find that preferred changes per “site” are evolving significantly faster than unpreferred changes at Notch. In contrast we observe a significantly faster per site substitution rate of unpreferred changes in D. melanogaster at this locus. These results suggest that positive selection, and not simply relaxation of constraint on codon bias, has contributed to the higher levels of unpreferred divergence along the D. melanogaster lineage at Notch.

To discern where and how natural selection has shaped levels of genomic diversity, we need estimates of the underlying "neutral" level of variation within and between species. We have been studying sequence polymorphism in regions of high crossing-over in an effort to define the levels of variability in regions presumably free from the effect of linked selection to estimate neutral levels and patterns of variation for Drosophila melanogaster and D. simulans. In this article we present data for nucleotide variability within and between species at two segments of the Notch locus. Notch was chosen for two reasons. First, it is located on the X chromosome in a region of relatively high recombination (2.1 × 10⁻⁵ recombinants/generation/kb; reported as R in Hey and Kliman 2002). Second, a previous coarse-scale (six-cutter) restriction site survey of a 60-kb region encompassing Notch suggested this region of the genome to be evolving neutrally (Schaeffer et al. 1988; Begun and Aquadro 1992).

Notch encodes a single-pass, transmembrane protein that serves as a receptor for cell-to-cell communication during metazoan development. Notch homologs are present in diverse organisms including sea urchins, fruit flies, and humans. Due to its sequence and functional conservation, Notch's role during development has been extensively studied (e.g., Beatus and Lendahl 1998; Fleming 1998; Artavanis-Tsakonas et al. 1999). As a result, functional domains of the Notch protein have been characterized, and many proteins involved in the Notch pathway have been identified.

The Notch transcript spans 30 kb of genomic DNA. We sequenced two segments (Figure 1), which we call the 5′ region (exons 3 and 4 and neighboring introns) and the 3′ region (the 3′ end of exon 6). These segments are 10 kb apart. Exons 3 and 4 encode several of the epidermal growth factor (EGF)-like repeats of the extracellular domain of Notch. EGF repeats are involved in the interaction of Notch with its ligands (i.e., Delta and Serrate) and the initiation of the Notch signaling pathway (Rebay et al. 1991). The 3′ region of exon 6 represents the beginning of the intracellular domain of Notch that mediates the ligand-dependent response of the Notch pathway within the cell (through interactions with a number of other proteins).

Figure 1.-Genomic structure of the Notch locus. Boxes represent exons of Notch, and lines represent noncoding 5′, intron, and 3′ sequence. The locations of the two regions sequenced (called Notch 5′ and Notch 3′) are shown.

Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under accession nos. AF361372-AF361422, AF360581-AF360631, and AY191369-AY191414.

Below we report that in D. melanogaster there are significant differences in the ratio of polymorphism to divergence between Notch 3′ and other neutral X-chromosome [...]
[...] number of preferred, unpreferred, and equal synonymous preference sites suggests that at Notch 3′ positive selection has played a role in accelerating different types of synonymous site evolution in both of these species.

[...] 3.0. This program was also used to calculate π and θ, estimates of 3Nₑμ (since Notch is on the X chromosome). P values for Tajima's D, Fu and Li's D, and Fu's Fs tests were obtained using the coalescent simulator of DnaSP 3.0, assuming either no recombination or R = 3Nₑr = 94. This choice of R is based on the genetic map-based recombination rate at Notch of r = [...]

[...] the effective number of synonymous preference sites, and they represent the mutation potential toward each type of change. As such, any bias in the mutation process can have profound effects on the estimation of the number of sites. We have used the mutation rate estimates obtained by Petrov and Hartl (1999) from the substitution pattern at pseudogenes (importantly, these authors report relative mutation rates for each type of change). The mutation bias was incorporated by adjusting the proportion of substitutions at each codon position that would result in a preferred synonymous, unpreferred synonymous, or equal synonymous change [using Akashi's (1995) classification]. The effective numbers of synonymous preference sites are listed for each of the 61 sense codons in Table S1 at http://www.genetics.org/supplemental/ (stop codons are ignored in these calculations). The number of each type of synonymous change (preferred, unpreferred, or equal) is then compared relative to the effective number of synonymous preference sites for that change. Our goal is to evaluate whether differences in the number of preferred and unpreferred fixations between these species' lineages represent positive selection or changes in selective constraint. We discuss this logic more in the results section.

We first describe how we estimated the number of synonymous preference sites for each codon. Petrov and Hartl (1999) used observed substitutions in "dead on arrival" transposable elements (considered to be pseudogenes) that were located throughout the genome to infer the underlying mutational pattern in Drosophila. After accounting for base frequency differences, they document that C to T and G to A mutations occur 2.2 times more frequently than the average of all other changes. They found this bias to be consistent across two subgenera of the Drosophila radiation (Sophophora and Drosophila). This mutation bias will affect estimates of both the effective number of synonymous and nonsynonymous sites and the effective number of synonymous preference sites in two ways. First, when a nucleotide mutates, the likelihood that it mutates to each of the other three bases is not equal. Second, codon positions made up of G's and C's will have a higher mutation rate than those made up of A's and T's.

When one assumes that all mutations occur at an equal rate, all nucleotide positions in a sequence are considered equal (i.e., each represents 1 "site" of the sequence) and each of the three changes possible for a nucleotide is considered equal (i.e., each change is considered as one-third of a site). To incorporate the mutation bias, we scaled the value given to each change from one nucleotide to another (0.333) by the ratio of the proportion of times that mutation was observed by Petrov and Hartl (1999) to the proportion expected if all mutations occurred at an equal rate (i.e., 1/12 or 0.0833, see Figure 2). As an example consider all possible mutations from the C nucleotide. From Petrov and Hartl (1999), a C to T mutation occurs 15% of the time, while with no bias we would expect this value to be 8.33%. Thus, C to T mutations are now considered as 0.333 × (0.15 ÷ 0.0833) = 0.60 of a site instead of 0.333. Likewise, C to A mutations represent 0.333 × (0.09 ÷ 0.0833) = 0.36 and C to G mutations (1/3) × (0.06 ÷ (1/12)) = 0.24 of a site. As a result, codon positions occupied by C and G are now considered (0.60 + 0.24 + 0.36) = 1.2 sites and T and A positions (0.32 + 0.28 + 0.20) = 0.8 sites.

Figure 2.-Mutation pathways for Drosophila. Numbers not in parentheses are values taken from Petrov and Hartl (1999) divided by two, to represent the proportion of times such a mutation was observed out of the total of 12 possible changes in the pathway. Numbers in parentheses are the values used to incorporate the mutation bias when estimating the effective number of synonymous preference sites (see materials and methods).

To illustrate how we incorporate these mutation-biased pathways into our estimate of the number of synonymous preference sites we consider the codon CGG. One, zero, and three of the three possible changes at sites 1, 2, and 3, respectively, of the codon are synonymous. The C to A change at the first site is an equal synonymous change (see Table S1 at http://www.genetics.org/supplemental/), so 0.36 of first-position changes are equal preference. The only other synonymous changes are at site 3: G to A and G to T are both equal (so 0.96) and the G to C is preferred (thus 0.24). On a per codon basis (summing over the three codon positions), the number of "preferred synonymous sites" is thus 0 + 0 + 0.24 = 0.24, the number of "equal synonymous sites" is 0.36 + 0 + 0.96 = 1.32, and there are no "unpreferred synonymous sites." So, summing across the codon, CGG is considered as 0.24 preferred and 1.32 equal synonymous preference sites.

To apply this method to a coding region, we inferred the ancestral sequence of D. melanogaster and D. simulans assuming parsimony, using D. yakuba as the outgroup. Given our use of parsimony, multiple mutational hits are not taken into account. To determine whether multiple hits affect our reconstruction of the ancestral sequence, we also used a maximum-likelihood method (using PAML; Yang and Nielsen 1998, 2000) to reconstruct the ancestor at Notch 3′ (the coding region for which we observe the highest level of divergence). The highly supported ancestral sequence constructed using maximum likelihood was identical to the parsimony sequence. This finding supports the use of parsimony to reconstruct the ancestral sequence between these closely related species. However, the congruence of the two methods does not completely eliminate the uncertainty in estimating ancestral states, which may increase the variance of our estimate. The small number of codons for which the ancestral state of any of the three positions could not be inferred was excluded from further analysis. For each codon, the effective numbers of preferred, unpreferred, and equal synonymous preference sites (given in Table S1 at http://www.genetics.org/supplemental/) were multiplied by the number of times that codon occurred in the ancestral sequence for each coding region. These numbers were summed across codons, leading to an overall estimate of the effective number of preferred, unpreferred, and equal synonymous preference sites for the entire coding region. Two-by-two contingency tables were used to compare the rates of preferred and unpreferred fixations per site. Significance was determined using Fisher's exact test.

We point out that unless an equal number of purines and pyrimidines are found within a coding region, summing the number of synonymous and nonsynonymous sites will not equal the total length of the region when the mutation bias is taken into account. These values do, however, reflect the differential mutational potential between the different types of sites.

[...] is observed in Ecuador with the U.S. results being marginally significant when we consider total divergence. In no population is there a significant difference between the two regions of Notch with the lineage-specific [...]
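The site-counting scheme and the two-by-two comparison described in the methods can be illustrated with a short sketch. It is only an illustration under stated assumptions, not the program used in the study: the function and variable names are hypothetical, the G-origin pathway proportions are assumed to be the strand complements of the C-origin values quoted in the text, the preference classes for CGG are transcribed from the worked example rather than looked up in Table S1, and the Fisher's exact test uses the common two-sided convention (summing all tables no more probable than the observed one), which the text does not specify. The two tables tested are the polymorphism and divergence counts quoted in the results for Notch 5′ intron versus Notch 5′ and Notch 3′ synonymous sites, for which the text reports P ≥ 0.999 and P = 0.015.

```python
from math import comb

# Proportion of all observed mutations falling on each pathway.
# C->T, C->A and C->G are the values quoted in the text. The G-origin values
# are ASSUMED to be their strand complements (consistent with the 0.96 and
# 0.24 used for CGG). A- and T-origin pathways are omitted because the text
# gives only their resulting weights (0.32, 0.28, 0.20), not which change
# each weight belongs to.
OBSERVED = {
    ("C", "T"): 0.15, ("C", "A"): 0.09, ("C", "G"): 0.06,
    ("G", "A"): 0.15, ("G", "T"): 0.09, ("G", "C"): 0.06,
}
UNBIASED = 1 / 12  # 12 possible single-base changes at equal rates


def site_weight(src, dst):
    """Fraction of a 'site' assigned to the src->dst change (1/3 if unbiased)."""
    return (1 / 3) * OBSERVED[(src, dst)] / UNBIASED


# Synonymous changes of codon CGG with their preference class, transcribed
# from the worked example in the text: (codon position, from, to, class).
CGG_SYNONYMOUS = [
    (1, "C", "A", "equal"),
    (3, "G", "A", "equal"),
    (3, "G", "T", "equal"),
    (3, "G", "C", "preferred"),
]


def preference_sites(synonymous_changes):
    """Sum the mutation-weighted site fractions per preference class."""
    totals = {"preferred": 0.0, "unpreferred": 0.0, "equal": 0.0}
    for _position, src, dst, pref_class in synonymous_changes:
        totals[pref_class] += site_weight(src, dst)
    return totals


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Sum every table that is no more probable than the observed one.
    p = sum(q for q in map(prob, support) if q <= p_obs * (1 + 1e-9))
    return min(1.0, p)


if __name__ == "__main__":
    sites = preference_sites(CGG_SYNONYMOUS)
    print({name: round(value, 2) for name, value in sites.items()})
    # Rows: polymorphisms, fixed differences for each class of site.
    print(fisher_exact_two_sided(17, 34, 9, 18))
    print(fisher_exact_two_sided(17, 34, 22, 113))
```

For a whole coding region, the per-codon totals would be multiplied by the number of times each codon occurs in the inferred ancestral sequence and summed, as the text describes; that step needs the full Table S1 and is therefore left out of the sketch.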
A program was written to construct the ancestral sequence [...]

[...] http://www.genetics.org/supplemental/ for D. melanogaster. Estimates of nucleotide variability within each population are summarized in Table 1. As has been seen for other X-linked genes in this species (e.g., Begun and Aquadro 1993; Andolfatto 2001), there tends to be more nucleotide and haplotype diversity in Zimbabwe than in non-African populations. While the levels of variability observed at Notch are within the range seen at other loci in this species (e.g., Moriyama and Powell 1996), the 3′ region is less variable than the 5′ region for three of the four populations of D. melanogaster. This is in contrast to synonymous site divergence between D. melanogaster and D. simulans, which is higher at Notch 3′ than at Notch 5′ (Table 1, Figure 3).

We use the Hudson-Kreitman-Aguadé (HKA) test (Hudson et al. 1987) to evaluate deviations from the neutral expectation of a consistent ratio of variability and divergence among gene regions. Both regions of Notch are compared to one another as well as to other published X chromosome loci (Table 2). Only X chromosomal loci are used to avoid assumptions about differences in effective population size between the X chromosome and autosomes (a contrast complicated by assumptions about sex ratios in natural populations; e.g., Andolfatto 2001). We perform the test in two ways. First we consider total pairwise differences between D. melanogaster and D. simulans and then we consider only the differences that have occurred specifically along the D. melanogaster lineage (using D. yakuba as an outgroup). A significant difference between the two regions of Notch [...]

Tajima's (1989) D and Fu and Li's (1993) D statistics tend to be negative for the 5′ region in all populations and for the 3′ region in Zimbabwe (Table 1), but the departures are not significant. In contrast, for Notch 3′ most of the non-African populations have positive statistics for both Tajima's and Fu and Li's tests (Table 1). Only Notch 3′ in Ecuador is significantly positive (marginally), with two intermediate-frequency segregating sites (Table 1 and Figure S1 at http://www.genetics.org/supplemental/). Fu's (1997) Fs statistic is negative at both regions of Notch but is not significant in any population when recombination is taken into account. Fay and Wu's (2000) H test reveals that no individual population sample or region has a significant excess of high-frequency-derived variants.

Application of the McDonald-Kreitman test (McDonald and Kreitman 1991) to each Notch region revealed no significant departure from the neutral prediction of equivalent ratios of synonymous and nonsynonymous polymorphism to divergence. This is also true when we combine data from the two regions (36 synonymous polymorphisms:101 synonymous differences compared to 2 nonsynonymous polymorphisms:3 nonsynonymous differences). The very small number of nonsynonymous variants, however, gives this test little power.

Nucleotide variability-D. simulans: Figures S3 and S4 at http://www.genetics.org/supplemental/ present the polymorphic sites at Notch 3′ and 5′, respectively, in D. simulans. Estimates of nucleotide variability within each population are summarized in Table 1. We observe slightly more variation in the Zimbabwe population than in the U.S. one. We note the pronounced haplotype structuring in the U.S. population, especially at Notch 3′ (Figures S3 and S4 at http://www.genetics.org/supplemental/). There is much less structure in the Zimbabwe population. In fact, only one of the haplotypes observed in the U.S. population at Notch 3′ is well represented in the African sample.

The HKA test results for D. simulans are given in Table 3. We detect no departure from neutrality in the relationship between levels of variability and divergence between Notch 5′ and Notch 3′ in either population. Neither region of Notch differs significantly from other X chromosomal loci although there is a trend for Notch 3′ to have a lower ratio of variability to divergence. The results are the same whether the HKA test is applied using total divergence or when considering only the divergence along the D. simulans lineage (as described above).

Tajima's D and Fu and Li's D tests (Table 1) [...] tends to be negative in these populations but is not significant when recombination is considered. Also, no departures from neutrality were detected in either population with Fay and Wu's (2000) H test.

We also applied the McDonald-Kreitman test (McDonald and Kreitman 1991) to the D. simulans data. Again we found no significant departures from neutral predictions. As with D. melanogaster, this is true even when the data across the two Notch regions are combined (41 synonymous polymorphisms:101 synonymous differences, compared to 1 nonsynonymous polymorphism:3 nonsynonymous differences).

n is the number of chromosomes sequenced for each population. S is the number of segregating sites. Hap is the number of haplotypes found within each population. Divergence is uncorrected pairwise differences between D. melanogaster no. 1 and U.S. D. simulans no. 18. The 3′ region consists of 1581 bp of coding sequence, with 386 and 1195 effective numbers of synonymous and nonsynonymous sites, respectively. The 5′ region consists of 465 bases of coding (112 and 353 effective numbers of synonymous and nonsynonymous sites, respectively) and 1023 bp of intron sequence. Due to insertion/deletion differences only 844 intron sites could be directly compared between the species. Intron variability reported here is only for polymorphic sites that occur within these 844 bases. Tajima's D, Fu and Li's D, and Fu's Fs values are given with associated P values obtained from the coalescent simulator of DnaSP (Rozas and Rozas 1997) with no recombination, or in parentheses, with R = 94 as explained in text. Significance is considered at the 0.025 level as the simulation results are for a two-tailed test. *Significant at the 0.025 level; **significant at the 0.010 level.

Comparisons were made only between loci containing the same population sample and with similar rates of recombination. For each gene region we give the number of polymorphic and divergent sites in brackets. The number of polymorphic sites is given before the comma. After the comma we give the total number of pairwise differences between D. melanogaster and D. simulans (not in parentheses) and the number of differences that have occurred specifically along the D. melanogaster lineage (in parentheses). Lineage-specific analyses were not possible for all gene regions. P values in parentheses are for lineage-specific comparisons. *P < 0.05; **P < 0.01. a Data are from Begun and Aquadro (1995). b Data are from Begun and Aquadro (1994). c Data are from Eanes et al. (1996). d Data are from Labate et al. (1999).

Synonymous site divergence at Notch: We observe low to moderate levels of synonymous site polymorphism in both species yet extremely high levels of divergence. In contrast, intron divergence at Notch is less than the mean of 27 intron regions compared (Figure 3). This suggests that a regionally high mutation rate does not explain the high level of synonymous divergence at Notch. Could [...] (Table 4). This result is in agreement with those of Akashi (1996) who noted a trend of more
synonymous substitutions along the D. melanogaster lineage compared to D. simulans. We applied the relative rate test to a number of other loci sequenced in these species (Table 4). Without a Bonferroni correction, four other loci are significant at the 5% level: three with more synonymous substitutions occurring in D. melanogaster (per, Amy-P, and Amyrel) and one with more substitutions in D. simulans (mei-218). Only Notch remains significant after a Bonferroni correction is applied. Summing across loci we observe significantly more synonymous substitutions along the D. melanogaster lineage even after correcting for multiple tests.

Interestingly, there is no significant difference between lineages in intron divergence at Notch or at any of the other loci after a Bonferroni correction. Also, when summing across introns there is no trend for more substitutions along the D. melanogaster lineage. In addition, intron divergence is significantly lower than fourfold synonymous divergence (Mann-Whitney P = 0.023). (The comparison of intron to fourfold synonymous divergence is chosen to avoid the need to assume a particular transition/transversion bias when estimating synonymous site divergence.) Comparatively lower intron divergence could reflect stronger functional constraint or lower mutation rate in introns and/or positive selection accelerating divergence at synonymous sites. We thus compare levels of polymorphism and divergence between intron data at Notch 5′ and synonymous data at Notch 5′ and Notch 3′ in Zimbabwe. While there is no significant difference between the ratio of polymorphism and divergence at intron and synonymous sites within Notch 5′ (17 intron polymorphisms and 34 intron differences vs. 9 synonymous polymorphisms and 18 synonymous differences, Fisher's exact test P value ≥ 0.999), the ratios are significantly different between Notch 5′ intron and Notch 3′ synonymous sites (17 intron polymorphisms and 34 intron differences vs. 22 synonymous polymorphisms and 113 synonymous differences, Fisher's exact test P value = 0.015). Note that the large number of synonymous differences at Notch 3′ appears to be the outlier. The 5′ intron and 3′ synonymous data are not, however, significant with the more conservative HKA test, which takes into account evolutionary variance (P value = 0.113). Thus, constraint or mutation rate differences cannot be completely discounted for the lower intron divergence, yet the tendency toward greater synonymous site divergence at Notch 3′ suggests a potential role of positive selection acting on synonymous sites in this region. Below we perform additional analyses to further investigate this possibility.

Akashi (1996) showed that on average 60-70% of synonymous differences involving unpreferred and preferred codons between these species have the unpreferred codon in D. melanogaster. At Notch 5′ and Notch 3′, respectively, 89 and 98% of such synonymous divergent sites have the unpreferred codon in D. melanogaster. Relaxation of selective constraint on codon bias (due to a smaller effective population size) has been proposed as the cause of the greater number of unpreferred fixations in D. melanogaster compared to D. simulans (Akashi 1995, 1996; McVean and Vieira 2001). Given that Notch appears to be an extreme example of this general trend we wanted to discriminate between relaxation of constraint and positive selection as the cause for the codon usage patterns at this locus. We did this by developing a method to estimate the effective number of synonymous preference sites.

If relaxation of constraint is the sole explanation for the difference in synonymous evolution between D. melanogaster and D. simulans, the ratio of the number of preferred differences to the effective number of preferred synonymous preference sites should equal the ratio of unpreferred differences per unpreferred synonymous preference site along the D. melanogaster lineage. The class of changes with a significantly lower ratio could be deleterious, and ones with a higher ratio may be advantageous.

In Table 5 we report the number of changes observed and the effective number of preferred, unpreferred, and equal preference sites in D. melanogaster and D. simulans, for both regions of Notch. For these comparisons we consider fixed differences as we are evaluating per site rates of evolution for different types of synonymous mutations. In this manner we minimize the effects of segregating deleterious mutations that will never go to fixation but could be counted as such in pairwise comparisons. With Fisher's exact test, Notch 3′ in D. melanogaster shows significantly more unpreferred fixations than preferred fixations per site (Table 5). We also report equal substitutions per site for comparison. No significant difference is observed at Notch 5′ in D. melanogaster although the trend is in the same direction as Notch 3′ (for Notch 5′ the power of the comparison is limited by the small number of preferred synonymous sites). In D. simulans, we observe significant differences at both Notch 3′ and 5′ (Table 5). However, in this species we observe a significant excess of preferred compared to unpreferred fixations per site. To determine whether these results at Notch are part of a genome-wide phenomenon, we repeated our analysis for loci for which D. melanogaster, D. simulans, and D. yakuba have been sequenced and for which there are at least 450 bases of coding sequence (Tables 6 and 7). Again, we consider only fixed differences that have occurred along each species lineage. Differences in the number of loci between the species reflect availability of polymorphism data.

First we consider the results in D. melanogaster. Only Notch 3′ has a significant excess of unpreferred compared to preferred substitutions per site in this species. While more loci have a higher unpreferred than preferred rate of substitution, the Wilcoxon signed rank test is not significant (P value = 0.322). Therefore, in D. melanogaster it appears that most but not all of the loci [...]

[...] remain significant after applying the Bonferroni correction (Notch 3′, pgi, tpi, and Zw). In addition, the majority of the loci have a higher level of preferred compared to unpreferred divergence per site (Wilcoxon signed rank test P value < 0.0001). Interestingly, as in D. melanogaster, the level and type of synonymous site evolution are also extreme at Notch. Only four of the additional loci studied have nonsignificant differences between their rate of preferred evolution and that at Notch. Also, at Notch 3′ the rate of preferred fixations is significantly greater than the total of all the other loci (Notch 3′, 23 preferred fixations out of 63 sites vs. total across other loci, 67 preferred fixations out of 1252 sites, P value < 0.0001).

Comparisons were made only between loci containing the same population sample and with presumably similar rates of recombination. For each gene region we give the number of polymorphic and divergent sites in brackets. The number of polymorphic sites is given before the comma. After the comma we give the total number of pairwise differences between D. melanogaster and D. simulans (not in parentheses) and the number of differences that have occurred specifically along the D. simulans lineage (in parentheses). Lineage-specific analyses were not possible for all gene regions. P values in parentheses are for lineage-specific comparisons. *P < 0.05. a Zimbabwe data are from Hamblin and Veuille (1999); U.S. data are from Begun and Aquadro (1995). b Data are from Begun and Whitley (2000). c Zimbabwe data are from Hamblin and Veuille (1999); U.S. data are from Eanes et al. (1996).

TABLE 6 Numbers of preferred, unpreferred, and equal substitutions and number of synonymous preference sites for many loci along the D. melanogaster lineage

TABLE 7 Numbers of preferred, unpreferred, and equal substitutions and number of synonymous preference sites for many loci along the D. simulans lineage

P values are from Fisher's exact tests comparing preferred and unpreferred mutations. **Significant with a Bonferroni correction; *significant only without the Bonferroni correction.

As with D. melanogaster, we compared the number of
A/T to G/C mutations on a per base pair level between fourfold synonymous and intron positions in D. simulans. We found no difference (Fisher's exact P value = 0.119; 1 fourfold A/T to G/C substitution, with 13 A's and T's in ancestor vs. 10 intron A/T to G/C substitutions, with 642 A's and T's in ancestor). However, the power of this comparison is compromised by the small number of A's and T's at fourfold degenerate sites.

Thus, specific types of synonymous changes have been accelerated in both species. In D. simulans there appears to be a genome-wide trend of an accelerated fixation of preferred changes per site, with Notch appearing to be an extreme example. In contrast, in D. melanogaster unpreferred substitutions appear accelerated at Notch. While there is a genome-wide trend in this same direction in D. melanogaster, it is not significant. However, in D. melanogaster the power of such comparisons is compromised by the small number of preferred sites inferred in the ancestor of these species.

DISCUSSION

Our interest in the Notch locus stemmed from previous data suggesting that this region was unaffected by positive selection in D. melanogaster. This is true at the amino acid level for the regions of Notch we surveyed. However, our data suggest that synonymous fixations at Notch have been accelerated by positive selection along both the D. melanogaster and D. simulans lineages.

At the level of polymorphism alone we do not detect any departures from neutral expectations in D. melanogaster. In D. simulans, we observe a significant excess of intermediate frequency variants in our U.S. sample. This result could be due to balancing selection and/or demography. Haplotype structure as seen at Notch is also reported at other unlinked loci in non-African samples of D. simulans (e.g., Begun and Aquadro 1994; Eanes et al. 1996; Hamblin and Aquadro 1996; Hamblin and Veuille 1999; Labate et al. 1999). Thus, it is likely our result is due to demography, perhaps the result of historical admixture between divergent populations or population bottlenecks (Wall et al. 2002).

When comparing levels of polymorphism to divergence we detect significant non-neutral patterns in D. melanogaster. For example, the HKA test detects a significantly lower ratio of polymorphism to divergence at Notch 3′ compared to other "neutral" X chromosome loci in this species. The neutral theory predicts that regions with high divergence will also have high levels of polymorphism. However, if the variants examined tend to be advantageous on average themselves, then levels of divergence will be elevated relative to polymorphism. At the regions of Notch we studied, the most striking feature of the data is the extremely high level [...] number of unpreferred substitutions per site than preferred substitutions in D. melanogaster. In contrast, in D. simulans we observe significantly more preferred fixations per site than unpreferred ones. Thus, the pattern of synonymous site evolution at Notch suggests the influence of positive selection, but in the opposite directions in these closely related species.

What could be the cause of the acceleration of specific synonymous changes at Notch in these species? Synonymous codon usage bias is thought to be the result of a mutation or gene conversion bias, selection for translational accuracy/efficiency, and/or other forms of selection (i.e., selection for mRNA stability or regulation of transcription; reviewed in Akashi 2001 and Duret 2002). Although we have tried to address the issue of mutation bias by comparing the substitution pattern at fourfold synonymous and intron positions at Notch 5′, there remains the possibility that our results in D. melanogaster are simply due to an underestimation of the mutation bias at the 3′ end of Notch. Mutational processes in Drosophila have been shown to be context specific (Kliman and Eyre-Walker 1998). However, the base composition at Notch 3′ in the extant and ancestral sequences does not deviate from that observed at the other loci in our study. We note that the mutational biases that we use from Petrov and Hartl (1999) to infer the number of synonymous preference sites differ slightly from the biases estimated by McVean and Vieira (2001). To see if this difference could affect our results, we incorporated the McVean and Vieira (2001) mutational biases into our estimation of the number of sites and reapplied the test to Notch 3′. The conclusion of a higher level of unpreferred than preferred divergence per site at Notch 3′ in D. melanogaster remains unchanged (data not shown).

Our results in D. simulans support previous claims that positive selection is involved in the establishment of a bias toward preferred G- and C-ending codons in Drosophila (i.e., Akashi 1994; Akashi and Schaeffer 1997; Kliman 1999; Kern et al. 2002). With our method, positive selection appears to play a role not only at the Notch locus but also genome-wide in this species. We note that other studies (Begun 2001; McVean and Vieira 2001) have reported evidence of relaxation of constraint on codon bias along the D. simulans lineage. Such conclusions are drawn from observing an excess of unpreferred substitutions (i.e., more unpreferred substitutions than preferred) compared to that expected under mutation-selection-drift equilibrium. There are a number of potential differences between our method and those of Begun (2001) and McVean and Vieira (2001) to explain the contrasting results. For example, we compare the number of unpreferred and preferred substitutions on a per site basis while Begun (2001) [...] (2001) note that pairwise comparisons may lead to an underestimation of selection coefficients.

As an aside, our data suggest not only that positive selection has shaped synonymous site evolution in D. simulans but also that this species is at mutation-selection-drift equilibrium. An equal number of preferred and unpreferred substitutions would be expected for a lineage at equilibrium (Bulmer 1991). This is what we observe when we sum fixed differences across loci. Also, due to evolutionary variance, at equilibrium some loci will have more preferred substitutions and others more unpreferred, but the number of loci that go one way or the other should be equal. This is indeed the case in D. simulans with 15 loci having more unpreferred fixations and 14 more preferred (Wilcoxon signed rank test P value = 0.370).

When considering the mutation-selection-drift equilibrium predictions detailed above, D. melanogaster does not appear to be at equilibrium. When summing across loci we observe approximately eight times more unpreferred fixations than preferred. Also, we observe more unpreferred fixations than preferred in 19 of the 21 loci studied (Wilcoxon signed rank test P value = 0.0001). This general observation of a greater number of unpreferred than preferred fixations in D. melanogaster has previously been interpreted as relaxation of constraint on codon bias along this species lineage (e.g., Akashi 1994, 1996). With our method we cannot reject this hypothesis genome-wide in D. melanogaster. However, relaxation of constraint does not appear sufficient to explain the large excess of unpreferred fixations per site in D. melanogaster at the Notch 3′ region.

Our results suggest that positive selection is involved in the fixation of unpreferred mutations at Notch 3′, but the nature of that selection is presently unknown. There are a number of possible explanations. A recent switch in codon preference due to a change in tRNA abundance seems unlikely, given the resulting load of deleterious fixations across the genome as detailed by Akashi et al. (1998). Selection pertaining to Notch mRNA stability and/or gene expression and regulation could be involved. For example, a growing number of studies show the importance of synonymous sites in the functionality of proteins and in translation kinetics (Cortazzo et al. 2002; Duan et al. 2003; Oresic et al. 2003).

[...] recent switch in mutation bias does not explain our results. The substitution process is governed by the input of mutations and fixation due to drift and selection. With the available data, it does not appear that mutational processes alone can explain our results, unless the mutation bias is more extreme in exons compared to introns. Thus, positive selection appears to have accelerated the fixation of a subset of synonymous codons at Notch in D. melanogaster and D. simulans. These results add to the growing caution in the use of synonymous site evolution [...]
as a neutral proxy (e.g., Akashi and Kreitman 1995; Smith and Eyre-Walker (2001) have shown that in Bustamante et al. 2002;Fay et al. 2002; Smith and some cases the use of "suboptimal" codons in Escherichia Eyre-Walker 2002;Swanson et al. 2003). For example, coli appears to be due to some form of "conflicting d N /d S comparisons for which synonymous changes are selection" (i.e., regulation of gene expression or mRNA/ assumed neutral may underestimate the presence of DNA secondary structure). To the best of our knowledge positive selection at the amino acid level. Our results the structure of the Notch mRNA is unknown, and no may also shed light as to the cause of the lower average programs to predict the structure can input the entire level and variance in intron divergence compared to 8-kb transcript.
synonymous site divergence observed between these The question remains as to why Notch is an extreme species (Figure 3; Bauer and Aquadro 1997; Takanoexample of genome-wide trends in codon usage within Shimizu 2001). A striking difference is the skew toward each of these species and in synonymous codon usage high levels of divergence for synonymous sites but not differences between them. Notch resides in a region of introns. This pattern could indicate that intron mutahigh recombination, which may aid in the efficiency of tions are in general more selectively constrained than weak selection due to a larger region-specific effective synonymous. Given that Drosophila genes tend to have population size (i.e., Birky and Walsh 1988; Barton short introns (Mount et al. 1992), this explanation is 1995). Pronounced differences in synonymous codon plausible. On the other hand, our Notch results suggest usage between the melanogaster and the ananassae and that positive selection has the potential to accelerate obscura species groups (i.e., more unpreferred fixations the fixation of some synonymous mutations for at least in the former) at the yellow locus are thought to be due some of the loci with high levels of synonymous site to a marked reduction of recombination at the tip of divergence. the X chromosome in the melanogaster species group We thank Willie Swanson, Floyd Reed, Jennifer Calkins, Leila Hatch, (Munté et al. 1997(Munté et al. , 2001. If such an explanation were Rasmus Nielsen, Guy Reeves, Rick Durrett, Gil McVean, Hiroshi to account for the extreme differences in codon useage Akashi, Montse Aguadé, and Andy Clark for valuable comments on the manuscript and/or participating in productive discussions. We between the species used in this study, we would expect particularly thank Martha Hamblin for discussions that played a key the recombination rate at Notch in D. melanogaster to role in instigating our comparison of the number of synonymous appear relatively low. 
However, using the method of preference changes on a per site basis. This research was supported Wall (2000), we found that the population rate of by National Institutes of Health grant GM36431 to C.F.A. recombination per base pair for both Notch regions is relatively high compared to other loci and is much higher than estimates for the Notch region based on LITERATURE CITED integrated map methods (data not shown). The estimates of R (3N e r for Notch) per base pair are 0.263 Akashi, H., 1994