Genetics, Vol. 167, 171-185, May 2004, Copyright © 2004

DNA Variability and Divergence at the Notch Locus in Drosophila melanogaster and D. simulans: A Case of Accelerated Synonymous Site Divergence

Vanessa Bauer DuMont(a), Justin C. Fay(1,a), Peter P. Calabrese(2,b), and Charles F. Aquadro(a)
a Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853
b Department of Applied Mathematics, Cornell University, Ithaca, New York 14853

Corresponding author: Charles F. Aquadro, 235 Biotechnology Bldg., Cornell University, Ithaca, NY 14853. E-mail: cfa1@cornell.edu

Communicating editor: M. AGUADÉ


ABSTRACT

DNA diversity in two segments of the Notch locus was surveyed in four populations of Drosophila melanogaster and two of D. simulans. In both species we observed evidence of non-steady-state evolution. In D. simulans we observed a significant excess of intermediate frequency variants in a non-African population. In D. melanogaster we observed a disparity between levels of sequence polymorphism and divergence between one of the Notch regions sequenced and other neutral X chromosome loci. The striking feature of the data is the high level of synonymous site divergence at Notch, which is the highest reported to date. To more thoroughly investigate the pattern of synonymous site evolution between these species, we developed a method for calibrating preferred, unpreferred, and equal synonymous substitutions by the effective (potential) number of such changes. In D. simulans, we find that preferred changes per "site" are evolving significantly faster than unpreferred changes at Notch. In contrast we observe a significantly faster per site substitution rate of unpreferred changes in D. melanogaster at this locus. These results suggest that positive selection, and not simply relaxation of constraint on codon bias, has contributed to the higher levels of unpreferred divergence along the D. melanogaster lineage at Notch.


To discern where and how natural selection has shaped levels of genomic diversity, we need estimates of the underlying "neutral" level of variation within and between species. We have been studying sequence polymorphism in regions of high crossing over, which are presumably free from the effects of linked selection, to estimate neutral levels and patterns of variation for Drosophila melanogaster and D. simulans. In this article we present data for nucleotide variability within and between species at two segments of the Notch locus. Notch was chosen for two reasons. First, it is located on the X chromosome in a region of relatively high recombination (2.1 × 10⁻⁵ recombinants/generation/kb; reported as R in HEY and KLIMAN 2002). Second, a previous coarse-scale (six-cutter) restriction site survey of a 60-kb region encompassing Notch suggested that this region of the genome is evolving neutrally (SCHAEFFER et al. 1988; BEGUN and AQUADRO 1992).

Notch encodes a single-pass, transmembrane protein that serves as a receptor for cell-to-cell communication during metazoan development. Notch homologs are present in diverse organisms including sea urchins, fruit flies, and humans. Due to its sequence and functional conservation, Notch's role during development has been extensively studied (e.g., BEATUS and LENDAHL 1998; FLEMING 1998; ARTAVANIS-TSAKONAS et al. 1999). As a result, functional domains of the Notch protein have been characterized, and many proteins involved in the Notch pathway have been identified.

The Notch transcript spans 30 kb of genomic DNA. We sequenced two segments (Figure 1), which we call the 5' region (exons 3 and 4 and neighboring introns) and the 3' region (the 3' end of exon 6). These segments are 10 kb apart. Exons 3 and 4 encode several of the epidermal growth factor (EGF)-like repeats of the extracellular domain of Notch. EGF repeats are involved in the interaction of Notch with its ligands (i.e., Delta and Serrate) and the initiation of the Notch signaling pathway (REBAY et al. 1991). The 3' region of exon 6 represents the beginning of the intracellular domain of Notch that mediates the ligand-dependent response of the Notch pathway within the cell (through interactions with a number of other proteins).



 
Figure 1. Genomic structure of the Notch locus. Boxes represent exons of Notch, and lines represent noncoding 5', intron, and 3' sequence. The locations of the two regions sequenced (called Notch 5' and Notch 3') are shown.

Below we report that in D. melanogaster there are significant differences in the ratio of polymorphism to divergence between Notch 3' and other neutral X-chromosome loci. The trend is similar, though not as dramatic, in D. simulans. Of particular note is the high level of synonymous site divergence at Notch 3', a level three times the average observed between these species. Consideration of synonymous substitutions per effective number of preferred, unpreferred, and equal synonymous preference sites suggests that at Notch 3' positive selection has played a role in accelerating different types of synonymous site evolution in both of these species.


MATERIALS AND METHODS

Samples:
Four population samples of D. melanogaster were surveyed for nucleotide sequence variability: Zimbabwe (Sengwa Wildlife Research Institute), United States (California), Ecuador (Atacame), and China (Beijing). Two population samples of D. simulans from the United States (North Carolina) and Zimbabwe (Harare) were also surveyed. Collection data for these populations have been reported previously (BEGUN and AQUADRO 1991, 1994, 1995). For all D. melanogaster populations, extracted X chromosome lines (BEGUN and AQUADRO 1994) were used. For D. simulans we used inbred lines. Sample sizes for D. melanogaster were 12, 15, 11, and 12 chromosomes for Zimbabwe, United States, Ecuador, and China, respectively. For D. simulans sample sizes were 12 and 10 for the United States and Zimbabwe, respectively.

Generation and analysis of sequence polymorphism data at Notch:
Cesium chloride gradient-purified genomic DNA was used for the D. melanogaster Zimbabwe and United States samples and for the D. simulans United States sample. For Ecuador and China in D. melanogaster and Zimbabwe in D. simulans, genomic DNA was extracted using the Puregene DNA isolation kit (Gentra Systems). For the 3' region, 1636 bp was amplified using PCR (SAIKI et al. 1988) with the following primers: N5452 5'-CGGTTATGTCTCGACGTCACG and RN7087 5'-GCAATCCTCATAGCTCGGCGG. This region corresponds to bases 5452–7087 of GenBank accession M16152. A total of 1581 bases were sequenced using internal primers. PCR products were cleaned using QIAGEN's PCR purification kit. Sequencing was done manually using the Amersham (Arlington Heights, IL) cycle sequencing kit (as directed by the manufacturer), which utilizes the dideoxy chain terminator method (SANGER et al. 1977).

For what we term Notch 5', 1525 bp was amplified using PCR with the following primers: NF1 5'-GCTAATCCGCATCTATCTG and NR1B 5'-GTTTCAAGTGTATGCTAATTGC. This corresponds to bases 336–1860 of GenBank accession K03508. Using internal primers, 1488 bp were sequenced as above.

Sequences are deposited in GenBank under accession nos. AF361372–AF361422 for Notch 5' and AF360581–AF360631 for Notch 3' in D. melanogaster. For both regions the D. simulans and D. yakuba accessions are AY191369–AY191414.

Sequences were aligned using MegAlign of the DNASTAR software package and analyzed using the DnaSP 3.0 program (ROZAS and ROZAS 1997). The effective number of synonymous and nonsynonymous sites was estimated using DnaSP 3.0. This program was also used to calculate θ and π, estimates of 3Neμ (since Notch is on the X chromosome). P values for Tajima's D, Fu and Li's D, and Fu's Fs tests were obtained using the coalescent simulator of DnaSP 3.0, assuming either no recombination or R = 3Ner = 94. This choice of R is based on the genetic map-based recombination rate at Notch of r = 2.1 × 10⁻⁸ recombinants/generation/bp (HEY and KLIMAN 2002) and an estimate of Ne of 1 × 10⁶ (KREITMAN 1983) for D. melanogaster.
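As a rough check on the value R = 94, the arithmetic can be sketched in a few lines of Python. The region length of 1500 bp is an assumption made here for illustration (the text does not state the length used; the sequenced segments are 1581 and 1488 bp), and the function name is hypothetical.

```python
def population_recombination(n_e, r_per_bp, length_bp, scale=3):
    """Return R = scale * N_e * r * L for a region of length_bp base pairs.

    scale=3 follows the X-linked parameterization used in the text (3 N_e r).
    """
    return scale * n_e * r_per_bp * length_bp


# N_e and r are the values quoted in the text; 1500 bp is an assumed length.
R = population_recombination(n_e=1e6, r_per_bp=2.1e-8, length_bp=1500)
print(round(R, 1))
```

Under these assumptions the result is close to the value of 94 used in the simulations.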

FAY and WU's (2000) H test was performed on-line at http://crimp.lbl.gov/htest.html in the following manner. Ten thousand simulations were performed with R = 3Ner = 94 and assuming the probability of back mutation is 0.01 and 0.02 for Notch 5' and Notch 3', respectively (chosen on the basis of observed levels of divergence). Segregating sites for which the derived state was ambiguous were not considered.

Nucleotide divergence between species:
Sequences were aligned using MegAlign of the DNASTAR software package, with some alignments adjusted manually to keep gaps in-frame in coding regions. We estimated synonymous site divergence between D. melanogaster and D. simulans for both regions of Notch plus 79 other protein-coding regions. We considered only regions for which at least 50 synonymous sites could be compared between species. Notch 5' plus 26 other gene regions also contained intron data where at least 100 bp of intron sequence could be aligned between these species. Pairwise divergence for synonymous and intron sites was estimated using Kimura's two-parameter model (KIMURA 1980), using the program Sequencer 6.1.0 (written by B. Kessing and available at http://nmg.si.edu/).
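For readers unfamiliar with the correction, a minimal sketch of the Kimura two-parameter distance is given below. It is not the Sequencer program: it is a generic implementation of the standard formula applied to whatever aligned sites are passed in, and the function name and the handling of gaps (skipped) are assumptions of this sketch.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences.

    Sites containing gaps or ambiguous bases are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    n = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        n += 1
        if a == b:
            continue
        # A change within purines or within pyrimidines is a transition.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if n == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / n, transversions / n
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

The corrected distance is always at least as large as the raw proportion of differing sites, which is why it matters most at highly diverged loci such as Notch 3'.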

The relative rates test of TAJIMA (1993) was used to evaluate differences in the rate of substitution along the D. melanogaster and D. simulans lineages using D. yakuba as the outgroup. A list of the loci used in this study and their GenBank accession numbers is given in supplemental Appendix S1 at http://www.genetics.org/supplemental/.
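The logic of the relative rate test can be sketched as follows, using the one-degree-of-freedom form of Tajima's statistic, (m1 - m2)² / (m1 + m2), where m1 and m2 count sites at which only one of the two ingroup sequences differs from the other two sequences. The function name and the skipping of gapped sites are assumptions of this sketch, not details taken from the text.

```python
import math


def tajima_relative_rate(seq1, seq2, outgroup):
    """Tajima's relative rate test (one-degree-of-freedom form).

    m1 counts sites where only seq1 differs from the other two sequences;
    m2 counts sites where only seq2 differs. Returns (m1, m2, chi2, p).
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    m1 = m2 = 0
    for a, b, o in zip(seq1, seq2, outgroup):
        if any(x not in "ACGT" for x in (a, b, o)):
            continue  # skip gaps and ambiguous bases
        if b == o and a != o:
            m1 += 1
        elif a == o and b != o:
            m2 += 1
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Survival function of the chi-square distribution with 1 d.f.
    p = math.erfc(math.sqrt(chi2 / 2))
    return m1, m2, chi2, p
```

With seq1 as D. melanogaster, seq2 as D. simulans, and D. yakuba as the outgroup, m1 > m2 would indicate more substitutions along the D. melanogaster lineage.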

Estimating the effective number of preferred, unpreferred, and equally preferred synonymous sites:
For D. melanogaster, D. simulans, and D. pseudoobscura, preferred codons had previously been determined by comparing codon usage between the 10% lowest and 10% highest biased genes (SHIELDS et al. 1988; AKASHI 1994). Depending on the amino acid, G- and/or C-ending codons were deemed preferred (indicated in Table S1 at http://www.genetics.org/supplemental/) as they were found to be used significantly more often at the high-biased loci compared to low-biased ones. Following AKASHI (1995), an "unpreferred change" is a change within a synonymous family from a preferred to an unpreferred codon. Changes from an unpreferred to a preferred codon are called "preferred," and those among unpreferred or preferred codons (a few synonymous families have two preferred codons) are called "equal."

We attempt to estimate the "effective number" of each type of synonymous change (in the same vein as the effective number of synonymous or nonsynonymous sites). We call these the effective number of synonymous preference sites, and they represent the mutation potential toward each type of change. As such, any bias in the mutation process can have profound effects on the estimation of the number of sites. We have used the mutation rate estimates obtained by PETROV and HARTL (1999) from the substitution pattern at pseudogenes (importantly these authors report relative mutation rates for each type of change). The mutation bias was incorporated by adjusting the proportion of substitutions at each codon position that would result in a preferred synonymous, unpreferred synonymous, or equal synonymous change [using AKASHI's (1995) classification]. The effective numbers of synonymous preference sites are listed for each of the 61 sense codons in Table S1 at http://www.genetics.org/supplemental/ (stop codons are ignored in these calculations). The number of each type of synonymous change (preferred, unpreferred, or equal) is then compared relative to the effective number of synonymous preference sites for that change. Our goal is to evaluate whether differences in the number of preferred and unpreferred fixations between these species' lineages represent positive selection or changes in selective constraint. We discuss this logic more in the RESULTS section.

We first describe how we estimated the number of synonymous preference sites for each codon. PETROV and HARTL (1999) used observed substitutions in "dead on arrival" transposable elements (considered to be pseudogenes) that were located throughout the genome to infer the underlying mutational pattern in Drosophila. After accounting for base frequency differences, they document that C to T and G to A mutations occur 2.2 times more frequently than the average of all other changes. They found this bias to be consistent across two subgenera of the Drosophila radiation (Sophophora and Drosophila). This mutation bias will affect estimates of both the effective number of synonymous and nonsynonymous sites and the effective number of synonymous preference sites in two ways. First, when a nucleotide mutates, the likelihood that it mutates to each of the other three bases is not equal. Second, codon positions made up of G's and C's will have a higher mutation rate than those made up of A's and T's.

When one assumes that all mutations occur at an equal rate, all nucleotide positions in a sequence are considered equal (i.e., each represents 1 "site" of the sequence) and each of the three changes possible for a nucleotide is considered equal (i.e., each change is counted as one-third of a site). To incorporate the mutation bias, we scaled the value given to each change from one nucleotide to another (0.333) by the ratio of the proportion of times that mutation was observed by PETROV and HARTL (1999) to the proportion expected if all mutations occurred at an equal rate (i.e., 1/12 or 0.0833; see Figure 2). As an example consider all possible mutations from the C nucleotide. From PETROV and HARTL (1999), a C to T mutation occurs 15% of the time, while with no bias we would expect this value to be 8.33%. Thus, C to T mutations are now counted as 0.333 × (0.15 ÷ 0.0833) = 0.60 of a site instead of 0.333. Likewise, C to A mutations represent 0.333 × (0.09 ÷ 0.0833) = 0.36 and C to G mutations 0.333 × (0.06 ÷ 0.0833) = 0.24 of a site. As a result codon positions occupied by C and G are now considered (0.60 + 0.24 + 0.36) = 1.2 sites and T and A positions (0.32 + 0.28 + 0.20) = 0.8 sites.
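The scaling for mutations away from C can be reproduced with a short sketch. Only the three proportions quoted in the text are used; the variable and function names are illustrative, and the per-change values for A and T are not included because the text gives only their site weights, not which change each belongs to.

```python
UNIFORM = 1 / 12  # expected share of each of the 12 changes with no bias

# Proportions for mutations away from C, as quoted in the text.
observed_from_c = {"T": 0.15, "A": 0.09, "G": 0.06}


def site_weight(observed_share, uniform_share=UNIFORM):
    """Scale the unbiased value of one-third of a site by observed/expected."""
    return (1 / 3) * (observed_share / uniform_share)


weights_from_c = {base: site_weight(p) for base, p in observed_from_c.items()}
total_c = sum(weights_from_c.values())

for base, w in weights_from_c.items():
    print(f"C to {base}: {w:.2f} of a site")
print(f"a C position counts as {total_c:.1f} sites")
```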



 
Figure 2. Mutation pathways for Drosophila. Numbers not in parentheses are values taken from PETROV and HARTL (1999), divided by two, to represent the proportion of times such a mutation was observed out of the total of 12 possible changes in the pathway. Numbers in parentheses are the values used to incorporate the mutation bias when estimating the effective number of synonymous preference sites (see MATERIALS AND METHODS).

To illustrate how we incorporate these mutation-biased pathways into our estimate of the number of synonymous preference sites we consider the codon CGG. One, zero, and three of the three possible changes at sites 1, 2, and 3, respectively, of the codon are synonymous. The C to A change at the first site is an equal synonymous change (see Table S1 at http://www.genetics.org/supplemental/), so 0.36 of first-position changes are equal preference. The only other synonymous changes are at site 3: G to A and G to T are both equal (0.60 + 0.36 = 0.96) and G to C is preferred (0.24). Summing over the three codon positions, the number of "preferred synonymous sites" is 0 + 0 + 0.24 = 0.24, the number of "equal synonymous sites" is 0.36 + 0 + 0.96 = 1.32, and there are no "unpreferred synonymous sites." CGG is therefore counted as 0.24 preferred and 1.32 equal synonymous preference sites.
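The CGG example can be traced with the following sketch, which enumerates every single-base change, keeps the synonymous ones, and adds the corresponding site weight to the preferred, unpreferred, or equal class. Several pieces are assumptions made for illustration: the weights for G mirror those for C by strand complementarity (consistent with the 0.96 and 0.24 in the example), the set of preferred codons is a stand-in for Table S1 containing only CGC, and weights for A and T are omitted, so the function as written only handles codons made of C and G.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Site weights for changes away from C (from the text) and G (assumed to
# mirror C by complementarity). A and T weights are not given per change.
WEIGHT = {
    ("C", "T"): 0.60, ("C", "A"): 0.36, ("C", "G"): 0.24,
    ("G", "A"): 0.60, ("G", "T"): 0.36, ("G", "C"): 0.24,
}

# Hypothetical stand-in for Table S1: only CGC is treated as preferred.
PREFERRED = {"CGC"}


def preference_sites(codon):
    """Effective preferred/unpreferred/equal synonymous sites for one codon."""
    totals = {"preferred": 0.0, "unpreferred": 0.0, "equal": 0.0}
    for pos, old in enumerate(codon):
        for new in "ACGT":
            if new == old:
                continue
            mutant = codon[:pos] + new + codon[pos + 1:]
            if GENETIC_CODE[mutant] != GENETIC_CODE[codon]:
                continue  # nonsynonymous or stop: not a synonymous site
            was, now = codon in PREFERRED, mutant in PREFERRED
            if now and not was:
                kind = "preferred"
            elif was and not now:
                kind = "unpreferred"
            else:
                kind = "equal"
            totals[kind] += WEIGHT[(old, new)]
    return totals


print(preference_sites("CGG"))
```

Under these assumptions the totals for CGG match the worked example: about 0.24 preferred, 1.32 equal, and no unpreferred sites.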

To apply this method to a coding region, we inferred the ancestral sequence of D. melanogaster and D. simulans assuming parsimony, using D. yakuba as the outgroup. Given our use of parsimony, multiple mutational hits are not taken into account. To determine whether multiple hits affect our reconstruction of the ancestral sequence, we also used a maximum-likelihood method (using PAML; YANG and NIELSEN 1998, 2000) to reconstruct the ancestor at Notch 3' (the coding region for which we observe the highest level of divergence). The highly supported ancestral sequence constructed using maximum likelihood was identical to the parsimony sequence. This finding supports the use of parsimony to reconstruct the ancestral sequence between these closely related species. However, the congruence of the two methods does not completely eliminate the uncertainty in estimating ancestral states, which may increase the variance of our estimate. The small number of codons for which the ancestral state of any of the three positions could not be inferred was excluded from further analysis. For each codon, the effective numbers of preferred, unpreferred, and equal synonymous preference sites (given in Table S1 at http://www.genetics.org/supplemental/) were multiplied by the number of times that codon occurred in the ancestral sequence for each coding region. These numbers were summed across codons, leading to an overall estimate of the effective number of preferred, unpreferred, and equal synonymous preference sites for the entire coding region. Two-by-two contingency tables were used to compare the rates of preferred and unpreferred fixations per site. Significance was determined using Fisher's exact test.
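The reconstruction and summation steps can be sketched as below. This is not the authors' program: the function names are hypothetical, the per-codon site table is passed in as a plain dictionary standing in for Table S1, and the parsimony rule is the simplest three-taxon version (an ancestral base is called only when at least two of the three sequences agree in a way that identifies it).

```python
from collections import Counter


def parsimony_ancestor(mel, sim, yak):
    """Infer the mel/sim ancestral base at one site, with yak as outgroup.

    Returns None when the state cannot be inferred (all three differ).
    """
    if mel == sim:
        return mel
    if yak in (mel, sim):
        return yak
    return None


def ancestral_codons(mel_cds, sim_cds, yak_cds):
    """Yield ancestral codons, skipping codons with any ambiguous position."""
    for i in range(0, len(mel_cds) - 2, 3):
        bases = [
            parsimony_ancestor(m, s, y)
            for m, s, y in zip(mel_cds[i:i + 3], sim_cds[i:i + 3], yak_cds[i:i + 3])
        ]
        if None not in bases:
            yield "".join(bases)


def total_preference_sites(codons, per_codon_sites):
    """Sum per-codon effective site counts over the ancestral sequence."""
    totals = Counter()
    for codon, count in Counter(codons).items():
        for kind, value in per_codon_sites[codon].items():
            totals[kind] += value * count
    return dict(totals)


# Toy example; the CGG values are those of the worked example in the text.
table = {"CGG": {"preferred": 0.24, "unpreferred": 0.0, "equal": 1.32}}
anc = list(ancestral_codons("CGGCGA", "CGGCGG", "CGGCGG"))
print(anc, total_preference_sites(anc, table))
```

In the toy example the third position of the second codon differs only in the first sequence, so the ancestor is inferred as CGG at both codons and the site totals are twice the per-codon values.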

We point out that when the mutation bias is taken into account, the numbers of synonymous and nonsynonymous sites will not sum to the total length of a coding region unless the region contains equal numbers of G/C and A/T nucleotides (because G and C positions count as 1.2 sites and A and T positions as 0.8 sites). These values do, however, reflect the differential mutational potential between the different types of sites.

A program was written to construct the ancestral sequence and then partition the effective number of synonymous sites into the effective number of preferred, unpreferred, and equal synonymous preference sites as detailed above. This program also determines the number of preferred, unpreferred, and equal changes that have occurred along each species lineage for both fixed and pairwise comparisons and is available upon request from the authors.

Correcting for multiple tests:
In this study a number of tests are applied first to the two Notch regions and subsequently to a number of other loci to determine the generality of the Notch results. For the test results at Notch, we apply the Bonferroni correction, considering that only those two regions have been tested. For example, with the relative rates test on the synonymous sites the P value required for significance is 0.05 ÷ 2 = 0.025 (since two tests were performed). For the remainder of the loci, 0.05 is divided by the total number of loci compared (including the Notch regions) to determine the significant P value. For tests applied along only one species lineage we consider only the number of loci compared for that species.
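The correction amounts to dividing the nominal significance level by the number of tests, as in this small sketch (function names are illustrative):

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under the Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def is_significant(p_value, n_tests, alpha=0.05):
    """True if p_value passes the Bonferroni-corrected threshold."""
    return p_value < bonferroni_threshold(n_tests, alpha)


# Two Notch regions tested: the threshold is 0.05 / 2.
print(bonferroni_threshold(2))
```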


RESULTS

Nucleotide variability—D. melanogaster:
Polymorphic sites at the 3' and 5' regions of Notch, respectively, are presented in Figures S1 and S2 at http://www.genetics.org/supplemental/ for D. melanogaster. Estimates of nucleotide variability within each population are summarized in Table 1. As has been seen for other X-linked genes in this species (e.g., BEGUN and AQUADRO 1993; ANDOLFATTO 2001), there tends to be more nucleotide and haplotype diversity in Zimbabwe than in non-African populations. While the levels of variability observed at Notch are within the range seen at other loci in this species (e.g., MORIYAMA and POWELL 1996), the 3' region is less variable than the 5' region for three of the four populations of D. melanogaster. This is in contrast to synonymous site divergence between D. melanogaster and D. simulans, which is higher at Notch 3' than at Notch 5' (Table 1, Figure 3).



 
Figure 3. Histogram of pairwise synonymous and intron site divergence between D. melanogaster and D. simulans. Levels of divergence were corrected for multiple hits, using Kimura's two-parameter model. Mean levels of synonymous and intron divergence and the location of divergence at Notch are indicated.


 

 
Table 1. Levels of nucleotide sequence variability and results of neutrality tests at the 5' and 3' regions of Notch

We use the Hudson-Kreitman-Aguadé (HKA) test (HUDSON et al. 1987) to evaluate deviations from the neutral expectation of a consistent ratio of variability and divergence among gene regions. Both regions of Notch are compared to one another as well as to other published X chromosome loci (Table 2). Only X chromosomal loci are used to avoid assumptions about differences in effective population size between the X chromosome and autosomes (a contrast complicated by assumptions about sex ratios in natural populations; e.g., ANDOLFATTO 2001). We perform the test in two ways. First we consider total pairwise differences between D. melanogaster and D. simulans and then we consider only the differences that have occurred specifically along the D. melanogaster lineage (using D. yakuba as an outgroup). When we consider total divergence, a significant difference between the two regions of Notch is observed in Ecuador, with the U.S. results being marginally significant. In no population is there a significant difference between the two regions of Notch with the lineage-specific analyses. For the majority of the comparisons to other loci there is a significant departure with Notch 3' but not with Notch 5'. In general the results are more strongly significant with the lineage-specific analyses. The exceptions are the nonsignificant comparisons between Notch 3' and other loci for which variability has previously been shown to be reduced. For example, Pgd has previously been shown to have significantly reduced variability compared to divergence (BEGUN and AQUADRO 1994), and LABATE et al. (1999) reported a reduced level of variability at runt. Therefore, in general, Notch 3' appears to be evolving in a manner different from other presumably neutral X chromosome loci. The HKA results suggest too little variability and/or too much divergence at Notch 3' compared to strict neutral expectations (Table 2). The pattern of sequence evolution along the D. melanogaster lineage appears to be the primary cause of this departure. In China there is a significant difference between Notch 5' and vermilion with the lineage-specific analysis. China differs from the other populations in that there is no marked increase in variation between Notch 5' and Notch 3' (Table 1).


 

 
Table 2. HKA test data and P values in the four D. melanogaster populations

TAJIMA's (1989) D and FU and LI's (1993) D statistics tend to be negative for the 5' region in all populations and for the 3' region in Zimbabwe (Table 1), but the departures are not significant. In contrast, for Notch 3' most of the non-African populations have positive statistics for both Tajima's and Fu and Li's tests (Table 1). Only Notch 3' in Ecuador is significantly positive (marginally), with two intermediate-frequency segregating sites (Table 1 and Figure S1 at http://www.genetics.org/supplemental/). FU's (1997) Fs statistic is negative at both regions of Notch but is not significant in any population when recombination is taken into account. FAY and WU's (2000) H test reveals that no individual population sample or region has a significant excess of high-frequency-derived variants.
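To make the sign of Tajima's D concrete, the sketch below computes Watterson's θ, π, and D from a set of aligned sequences using the standard formulas. It is not the DnaSP implementation: it returns totals for the region rather than per-site values, treats every character (including gaps) as a state, and produces no P value. The function name is illustrative.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Return (theta_w, pi, D) for equal-length aligned sequences.

    D is None when there are no segregating sites.
    """
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least four sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must have equal length")

    # seg: number of segregating sites; pi: mean pairwise differences.
    seg = sum(1 for i in range(length) if len({s[i] for s in seqs}) > 1)
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    theta_w = seg / a1
    if seg == 0:
        return theta_w, pi, None

    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    d = (pi - theta_w) / math.sqrt(e1 * seg + e2 * seg * (seg - 1))
    return theta_w, pi, d


# Intermediate-frequency variants (two haplotypes, each seen twice).
print(tajimas_d(["AA", "AA", "TT", "TT"]))
# Low-frequency variants (each variant seen once).
print(tajimas_d(["TA", "AT", "AA", "AA"]))
```

The first toy sample, with variants at intermediate frequency, gives a positive D; the second, with only singletons, gives a negative D.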

Application of the McDonald-Kreitman test (MCDONALD and KREITMAN 1991) to each Notch region revealed no significant departure from the neutral prediction of equivalent ratios of synonymous and nonsynonymous polymorphism to divergence. This is also true when we combine data from the two regions (36 synonymous polymorphisms:101 synonymous differences compared to 2 nonsynonymous polymorphisms:3 nonsynonymous differences). The very small number of nonsynonymous variants, however, gives this test little power.

Nucleotide variability—D. simulans:
Figures S3 and S4 at http://www.genetics.org/supplemental/ present the polymorphic sites at Notch 3' and 5', respectively, in D. simulans. Estimates of nucleotide variability within each population are summarized in Table 1. We observe slightly more variation in the Zimbabwe population than in the U.S. one. We note the pronounced haplotype structuring in the U.S. population, especially at Notch 3' (Figures S3 and S4 at http://www.genetics.org/supplemental/). There is much less structure in the Zimbabwe population. In fact, only one of the haplotypes observed in the U.S. population at Notch 3' is well represented in the African sample.

The HKA test results for D. simulans are given in Table 3. We detect no departure from neutrality in the relationship between levels of variability and divergence between Notch 5' and Notch 3' in either population. Neither region of Notch differs significantly from other X chromosomal loci although there is a trend for Notch 3' to have a lower ratio of variability to divergence. The results are the same whether the HKA test is applied using total divergence or when considering only the divergence along the D. simulans lineage (as described above).


 

 
Table 3. HKA test data and P values in the two D. simulans populations

Tajima's D and Fu and Li's D tests (Table 1) are both significant at Notch 3' in the U.S. population of D. simulans. Fu and Li's D is also marginally significant at Notch 5' in this population. All departures are positive, suggesting too many intermediate-frequency variants. This is in agreement with the visual pattern observed in the polymorphic site tables (Figures S3 and S4 at http://www.genetics.org/supplemental/). No test is significant in the Zimbabwe population. Fu's Fs statistic tends to be negative in these populations but is not significant when recombination is considered. Also, no departures from neutrality were detected in either population with FAY and WU's (2000) H test.

We also applied the McDonald-Kreitman test (MCDONALD and KREITMAN 1991) to the D. simulans data. Again we found no significant departures from neutral predictions. As with D. melanogaster, this is true even when the data across the two Notch regions are combined (41 synonymous polymorphisms:101 synonymous differences, compared to 1 nonsynonymous polymorphism:3 nonsynonymous differences).

Synonymous site divergence at Notch:
We observe low to moderate levels of synonymous site polymorphism in both species yet extremely high levels of divergence. In contrast, intron divergence at Notch is less than the mean of 27 intron regions compared (Fig 3). This suggests that a regionally high mutation rate does not explain the high level of synonymous divergence at Notch. Could these results indicate a role of positive selection in the accelerated fixation of synonymous mutations at this locus? We present a number of analyses to attempt to answer this question.

Relative rate tests on pairwise sequence divergence (rooted with D. yakuba) demonstrate that significantly more synonymous substitutions have occurred along the D. melanogaster lineage than along D. simulans at both regions of Notch (Table 4). This result is in agreement with those of AKASHI (1996), who noted a trend of more synonymous substitutions along the D. melanogaster lineage compared to D. simulans. We applied the relative rate test to a number of other loci sequenced in these species (Table 4). Without a Bonferroni correction, four other loci are significant at the 5% level: three with more synonymous substitutions occurring in D. melanogaster (per, Amy-P, and Amyrel) and one with more substitutions in D. simulans (mei-218). Only Notch remains significant after a Bonferroni correction is applied. Summing across loci we observe significantly more synonymous substitutions along the D. melanogaster lineage even after correcting for multiple tests.


 

 
Table 4. Results of Tajima's χ² relative rate test on levels of pairwise divergence along the D. melanogaster and D. simulans lineages

Interestingly, there is no significant difference between lineages in intron divergence at Notch or at any of the other loci after a Bonferroni correction. Also, when summing across introns there is no trend for more substitutions along the D. melanogaster lineage. In addition, intron divergence is significantly lower than fourfold synonymous divergence (Mann-Whitney P = 0.023). (The comparison of intron to fourfold synonymous divergence is chosen to avoid the need to assume a particular transition/transversion bias when estimating synonymous site divergence.) Comparatively lower intron divergence could reflect stronger functional constraint or lower mutation rate in introns and/or positive selection accelerating divergence at synonymous sites. We thus compare levels of polymorphism and divergence between intron data at Notch 5' and synonymous data at Notch 5' and Notch 3' in Zimbabwe. While there is no significant difference between the ratio of polymorphism and divergence at intron and synonymous sites within Notch 5' (17 intron polymorphisms and 34 intron differences vs. 9 synonymous polymorphisms and 18 synonymous differences, Fisher's exact test P value >= 0.999), the ratios are significantly different between Notch 5' intron and Notch 3' synonymous sites (17 intron polymorphisms and 34 intron differences vs. 22 synonymous polymorphisms and 113 synonymous differences, Fisher's exact test P value = 0.015). Note that the large number of synonymous differences at Notch 3' appears to be the outlier. The 5' intron and 3' synonymous data are not, however, significant with the more conservative HKA test, which takes into account evolutionary variance (P value = 0.113). Thus, constraint or mutation rate differences cannot be completely discounted for the lower intron divergence, yet the tendency toward greater synonymous site divergence at Notch 3' suggests a potential role of positive selection acting on synonymous sites in this region. 
Below we perform additional analyses to further investigate this possibility.
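The polymorphism-to-divergence contrasts above can be reproduced from the counts quoted in the text. The following is a minimal sketch, assuming a two-sided Fisher's exact test on a 2x2 table of (polymorphisms, fixed differences) for each class of site; the function name and the two-sided convention are our assumptions, since the text does not state which variant was used.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def weight(x):
        # Number of arrangements with x in the top-left cell (fixed margins).
        return comb(row1, x) * comb(row2, col1 - x)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    observed = weight(a)
    # Integer comparison avoids floating-point ties between tables.
    extreme = sum(w for w in map(weight, range(lo, hi + 1)) if w <= observed)
    return extreme / total


# Counts quoted in the text (Zimbabwe): (polymorphisms, fixed differences).
intron_5 = (17, 34)   # Notch 5' intron
syn_5 = (9, 18)       # Notch 5' synonymous
syn_3 = (22, 113)     # Notch 3' synonymous

p_within_5 = fisher_exact_two_sided(*intron_5, *syn_5)
p_5_vs_3 = fisher_exact_two_sided(*intron_5, *syn_3)
print(f"Notch 5' intron vs. Notch 5' synonymous: P = {p_within_5:.3f}")
print(f"Notch 5' intron vs. Notch 3' synonymous: P = {p_5_vs_3:.3f}")
```

Because 17:34 and 9:18 are the same 1:2 ratio, the first comparison cannot reject homogeneity; the second is driven by the 113 synonymous differences at Notch 3'.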

AKASHI 1996 showed that on average 60–70% of synonymous differences involving unpreferred and preferred codons between these species have the unpreferred codon in D. melanogaster. At Notch 5' and Notch 3', respectively, 89 and 98% of such synonymous divergent sites have the unpreferred codon in D. melanogaster. Relaxation of selective constraint on codon bias (due to a smaller effective population size) has been proposed as the cause of the greater number of unpreferred fixations in D. melanogaster compared to D. simulans (AKASHI 1995, AKASHI 1996; MCVEAN and VIEIRA 2001). Given that Notch appears to be an extreme example of this general trend, we wanted to discriminate between relaxation of constraint and positive selection as the cause for the codon usage patterns at this locus. We did this by developing a method to estimate the effective number of synonymous preference sites.

If relaxation of constraint is the sole explanation for the difference in synonymous evolution between D. melanogaster and D. simulans, the ratio of the number of preferred differences to the effective number of preferred synonymous preference sites should equal the ratio of unpreferred differences per unpreferred synonymous preference site along the D. melanogaster lineage. The class of changes with a significantly lower ratio could be deleterious, and ones with a higher ratio may be advantageous.

In Table 5 we report the number of changes observed and the effective number of preferred, unpreferred, and equal preference sites in D. melanogaster and D. simulans, for both regions of Notch. For these comparisons we consider fixed differences as we are evaluating per site rates of evolution for different types of synonymous mutations. In this manner we minimize the effects of segregating deleterious mutations that will never go to fixation but could be counted as such in pairwise comparisons. With Fisher's exact test, Notch 3' in D. melanogaster shows significantly more unpreferred fixations than preferred fixations per site (Table 5). We also report equal substitutions per site for comparison. No significant difference is observed at Notch 5' in D. melanogaster although the trend is in the same direction as Notch 3' (for Notch 5' the power of the comparison is limited by the small number of preferred synonymous sites). In D. simulans, we observe significant differences at both Notch 3' and 5' (Table 5). However, in this species we observe a significant excess of preferred compared to unpreferred fixations per site. To determine whether these results at Notch are part of a genome-wide phenomenon, we repeated our analysis for loci for which D. melanogaster, D. simulans, and D. yakuba have been sequenced and for which there are at least 450 bases of coding sequence (Table 6 and Table 7). Again, we consider only fixed differences that have occurred along each species lineage. Differences in the number of loci between the species reflect availability of polymorphism data.


 
Table 5. Ratios of the number of preferred, unpreferred, and equal substitutions per number of synonymous preference sites at both regions of Notch along each species' lineage


 
Table 6. Numbers of preferred, unpreferred, and equal substitutions and number of synonymous preference sites for many loci along the D. melanogaster lineage


 
Table 7. Numbers of preferred, unpreferred, and equal substitutions and number of synonymous preference sites for many loci along the D. simulans lineage

First we consider the results in D. melanogaster. Only Notch 3' has a significant excess of unpreferred compared to preferred substitutions per site in this species. While more loci have a higher unpreferred than preferred rate of substitution, the Wilcoxon signed rank test is not significant (P value = 0.322). Therefore, in D. melanogaster it appears that most but not all of the apparent higher rate of unpreferred substitutions can be explained by Drosophila's mutation bias together with relaxation of constraint on codon bias along the D. melanogaster lineage. The exception is Notch. Not only are unpreferred substitutions occurring faster than preferred at Notch 3', but unpreferred divergence at both regions of Notch is significantly higher than that of all the other loci individually or combined in this species (Notch 3', 47 unpreferred fixations out of 264 sites vs. total across other loci, 146 unpreferred fixations out of 4965 sites, P value < 0.0001; Notch 5', 12 unpreferred fixations out of 97 sites vs. total across other loci, 146 unpreferred fixations out of 4965 sites, P value < 0.0001). No difference in the rate of preferred and equal divergence is observed between Notch and the other loci.
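To make the size of this contrast concrete, the per-site rates implied by the counts quoted above can be computed directly. This is only an illustrative sketch: the dictionary layout and variable names are ours, and the calculation is a simple ratio of rates, not the significance test reported in the text.

```python
from fractions import Fraction

# Unpreferred fixations and effective unpreferred sites along the
# D. melanogaster lineage, as quoted in the text: (fixations, sites).
counts = {
    "Notch 3'": (47, 264),
    "Notch 5'": (12, 97),
    "other loci combined": (146, 4965),
}

# Exact per-site rates; Fraction avoids rounding until the final print.
rates = {name: Fraction(fixed, sites) for name, (fixed, sites) in counts.items()}
baseline = rates["other loci combined"]
fold = {name: rate / baseline for name, rate in rates.items()}

for name in counts:
    print(f"{name}: {float(rates[name]):.4f} unpreferred fixations per site "
          f"({float(fold[name]):.1f}x the other loci)")
```

On these counts the unpreferred rate at Notch 3' is roughly six times, and at Notch 5' roughly four times, the rate summed across the other loci.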

Could this extreme rate of unpreferred divergence at Notch (specifically Notch 3') in D. melanogaster simply reflect an even more extreme mutation bias toward A's and T's than that inferred by PETROV and HARTL 1999? One approach to this question is to examine substitutions at nearby intron sites (hoping they more closely reflect the underlying mutation process). We do not have intron data immediately adjacent to Notch 3'. However, at Notch 5' we can compare the number of G/C to A/T substitutions relative to the total number of G's and C's in the ancestor between fourfold degenerate and adjacent intron sites. Note that while we could not detect a significantly elevated number of unpreferred vs. preferred substitutions per site along the D. melanogaster lineage for Notch 5' alone, unpreferred divergence per site at this region is significantly greater than that of the other genes analyzed. This is consistent with the Notch 5' pattern of evolution being a less extreme case of the pattern seen at Notch 3'. We found that G/C to A/T fixations have occurred significantly more often at fourfold degenerate sites than at introns in the Notch 5' region (Fisher's exact P value = 0.004; six fourfold G/C to A/T substitutions, with 60 G's and C's in ancestor vs. nine intron G/C to A/T substitutions, with 462 G's and C's in ancestor). There is no difference between fourfold and intron sites in the rate of A/T to G/C mutations on a per base level (Fisher's exact P value > 0.999; no fourfold A/T to G/C substitutions, with 13 A's and T's in ancestor vs. three intron A/T to G/C substitutions, with 642 A's and T's in ancestor). These data suggest that substitution of A- and T-ending codons (unpreferred codons) at Notch 5' in D. melanogaster has occurred more frequently than the substitution of A's and T's in introns, suggesting that mutation bias alone cannot explain our results.
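The fourfold vs. intron comparison can be sketched as a 2x2 test on (substituted, unsubstituted) ancestral bases. The table layout here is an assumption on our part: we take the quoted ancestral base counts to include the substituted positions, so the second cell is the ancestral count minus the substitutions. The helper name and the two-sided convention are likewise ours.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def weight(x):
        # Arrangements with x in the top-left cell, margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    observed = weight(a)
    extreme = sum(w for w in map(weight, range(lo, hi + 1)) if w <= observed)
    return extreme / total


def per_base_test(subs_a, bases_a, subs_b, bases_b):
    # Assumed layout: rows are site classes, columns are
    # (substituted, not substituted) ancestral bases.
    return fisher_exact_two_sided(subs_a, bases_a - subs_a,
                                  subs_b, bases_b - subs_b)


# Notch 5', D. melanogaster lineage, counts as quoted in the text.
p_gc_to_at = per_base_test(6, 60, 9, 462)    # fourfold vs. intron, G/C -> A/T
p_at_to_gc = per_base_test(0, 13, 3, 642)    # fourfold vs. intron, A/T -> G/C
print(f"G/C to A/T: P = {p_gc_to_at:.4f}")
print(f"A/T to G/C: P = {p_at_to_gc:.4f}")
```

Under this layout the G/C to A/T contrast should come out clearly significant and the A/T to G/C contrast clearly not, in line with the values reported above.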

The pattern of synonymous site substitution is very different along the D. simulans lineage. Here, Notch, like many other loci, shows significantly more preferred than unpreferred fixations per site (Table 5 and Table 7). Many loci remain significant after applying the Bonferroni correction (Notch 3', pgi, tpi, and Zw). In addition, the majority of the loci have a higher level of preferred compared to unpreferred divergence per site (Wilcoxon signed rank test P value < 0.0001). Interestingly, as in D. melanogaster, the level and type of synonymous site evolution are also extreme at Notch. Only four of the additional loci studied have nonsignificant differences between their rate of preferred evolution and that at Notch. Also, at Notch 3' the rate of preferred fixations is significantly greater than the total of all the other loci (Notch 3', 23 preferred fixations out of 63 sites vs. total across other loci, 67 preferred fixations out of 1252 sites, P value < 0.0001).

As with D. melanogaster, we compared the number of A/T to G/C mutations on a per base pair level between fourfold synonymous and intron positions in D. simulans. We found no difference (Fisher's exact P value = 0.119; 1 fourfold A/T to G/C substitution, with 13 A's and T's in ancestor vs. 10 intron A/T to G/C substitutions, with 642 A's and T's in ancestor). However, the power of this comparison is compromised by the small number of A's and T's at fourfold degenerate sites.

Thus, specific types of synonymous changes have been accelerated in both species. In D. simulans there appears to be a genome-wide trend of an accelerated fixation of preferred changes per site, with Notch appearing to be an extreme example. In contrast, in D. melanogaster unpreferred substitutions appear accelerated at Notch. While there is a genome-wide trend in this same direction in D. melanogaster, it is not significant. However, in D. melanogaster the power of such comparisons is compromised by the small number of preferred sites inferred in the ancestor of these species.


*  DISCUSSION

Our interest in the Notch locus stemmed from previous data suggesting that this region was unaffected by positive selection in D. melanogaster. This is true at the amino acid level for the regions of Notch we surveyed. However, our data suggest that synonymous fixations at Notch have been accelerated by positive selection along both the D. melanogaster and D. simulans lineages.

At the level of polymorphism alone we do not detect any departures from neutral expectations in D. melanogaster. In D. simulans, we observe a significant excess of intermediate frequency variants in our U.S. sample. This result could be due to balancing selection and/or demography. Haplotype structure as seen at Notch is also reported at other unlinked loci in non-African samples of D. simulans (e.g., BEGUN and AQUADRO 1994; EANES et al. 1996; HAMBLIN and AQUADRO 1996; HAMBLIN and VEUILLE 1999; LABATE et al. 1999). Thus, it is likely our result is due to demography, perhaps the result of historical admixture between divergent populations or population bottlenecks (WALL et al. 2002).

When comparing levels of polymorphism to divergence we detect significant non-neutral patterns in D. melanogaster. For example, the HKA test detects a significantly lower ratio of polymorphism to divergence at Notch 3' compared to other "neutral" X chromosome loci in this species. The neutral theory predicts that regions with high divergence will also have high levels of polymorphism. However, if the variants examined tend to be advantageous on average themselves, then levels of divergence will be elevated relative to polymorphism. At the regions of Notch we studied, the most striking feature of the data is the extremely high level of synonymous site divergence, not the level or patterns of polymorphism. A closer examination of the synonymous site divergence data reveals a significantly greater number of unpreferred substitutions per site than preferred substitutions in D. melanogaster. In contrast, in D. simulans we observe significantly more preferred fixations per site than unpreferred ones. Thus, the pattern of synonymous site evolution at Notch suggests the influence of positive selection, but in the opposite directions in these closely related species.

What could be the cause of the acceleration of specific synonymous changes at Notch in these species? Synonymous codon usage bias is thought to be the result of a mutation or gene conversion bias, selection for translational accuracy/efficiency, and/or other forms of selection (i.e., selection for mRNA stability or regulation of transcription; reviewed in AKASHI 2001 and DURET 2002). Although we have tried to address the issue of mutation bias by comparing the substitution pattern at fourfold synonymous and intron positions at Notch 5', there remains the possibility that our results in D. melanogaster are simply due to an underestimation of the mutation bias at the 3' end of Notch. Mutational processes in Drosophila have been shown to be context specific (KLIMAN and EYRE-WALKER 1998). However, the base composition at Notch 3' in the extant and ancestral sequences does not deviate from that observed at the other loci in our study. We note that the mutational biases that we use from PETROV and HARTL 1999 to infer the number of synonymous preference sites differ slightly from the biases estimated by MCVEAN and VIEIRA 2001. To see if this difference could affect our results, we incorporated the MCVEAN and VIEIRA 2001 mutational biases into our estimation of the number of sites and reapplied the test to Notch 3'. The conclusion of a higher level of unpreferred than preferred divergence per site at Notch 3' in D. melanogaster remains unchanged (data not shown).

Our results in D. simulans support previous claims that positive selection is involved in the establishment of a bias toward preferred G- and C-ending codons in Drosophila (e.g., AKASHI 1994; AKASHI and SCHAEFFER 1997; KLIMAN 1999; KERN et al. 2002). With our method, positive selection appears to play a role not only at the Notch locus but also genome-wide in this species. We note that other studies (BEGUN 2001; MCVEAN and VIEIRA 2001) have reported evidence of relaxation of constraint on codon bias along the D. simulans lineage. Such conclusions are drawn from observing an excess of unpreferred substitutions (i.e., more unpreferred substitutions than preferred) compared to that expected under mutation-selection-drift equilibrium. There are a number of potential differences between our method and those of BEGUN 2001 and MCVEAN and VIEIRA 2001 that could explain the contrasting results. For example, we compare the number of unpreferred and preferred substitutions on a per site basis while BEGUN 2001 does not. In addition, we draw our conclusions from fixed differences while MCVEAN and VIEIRA 2001 use pairwise comparisons. MCVEAN and VIEIRA 2001 note that pairwise comparisons may lead to an underestimation of selection coefficients.

As an aside, our data suggest not only that positive selection has shaped synonymous site evolution in D. simulans but also that this species is at mutation-selection-drift equilibrium. An equal number of preferred and unpreferred substitutions would be expected for a lineage at equilibrium (BULMER 1991). This is what we observe when we sum fixed differences across loci. Also, due to evolutionary variance, at equilibrium some loci will have more preferred substitutions and others more unpreferred, but the number of loci that go one way or the other should be equal. This is indeed the case in D. simulans with 15 loci having more unpreferred fixations and 14 more preferred (Wilcoxon signed rank test P value = 0.370).

When considering the mutation-selection-drift equilibrium predictions detailed above, D. melanogaster does not appear to be at equilibrium. When summing across loci we observe approximately eight times more unpreferred fixations than preferred. Also, we observe more unpreferred fixations than preferred in 19 of the 21 loci studied (Wilcoxon signed rank test P value = 0.0001). This general observation of a greater number of unpreferred than preferred fixations in D. melanogaster has previously been interpreted as relaxation of constraint on codon bias along this species lineage (e.g., AKASHI 1994, AKASHI 1996). With our method we cannot reject this hypothesis genome-wide in D. melanogaster. However, relaxation of constraint does not appear sufficient to explain the large excess of unpreferred fixations per site in D. melanogaster at the Notch 3' region.
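The equilibrium prediction that loci should split evenly between the two directions can be checked crudely with an exact sign test on the locus counts alone. Note that this is not the Wilcoxon signed rank test used in the text, which also weighs the magnitude of each per-locus difference and requires the per-locus data; the sketch below substitutes the simpler test, and it assumes that the remaining 2 of the 21 D. melanogaster loci have more preferred fixations (no ties). The function name is ours.

```python
from math import comb


def sign_test_two_sided(n_plus, n_minus):
    """Exact two-sided sign test against an even (50:50) split.

    Ties are assumed to have been dropped before counting.
    """
    n = n_plus + n_minus
    k = max(n_plus, n_minus)
    # Probability of a split at least this lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Loci with more unpreferred vs. more preferred fixations, from the text.
p_simulans = sign_test_two_sided(15, 14)
p_melanogaster = sign_test_two_sided(19, 2)   # assumes 2 loci go the other way
print(f"D. simulans:     P = {p_simulans:.3f}")
print(f"D. melanogaster: P = {p_melanogaster:.5f}")
```

Even this cruder test separates the two lineages: a 15:14 split is as even as 29 loci allow, whereas 19:2 is very unlikely under an even split.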

Our results suggest that positive selection is involved in the fixation of unpreferred mutations at Notch 3', but the nature of that selection is presently unknown. There are a number of possible explanations. A recent switch in codon preference due to a change in tRNA abundance seems unlikely, given the resulting load of deleterious fixations across the genome as detailed by AKASHI et al. 1998. Selection pertaining to Notch mRNA stability and/or gene expression and regulation could be involved. For example, a growing number of studies show the importance of synonymous sites in the functionality of proteins and in translation kinetics (CORTAZZO et al. 2002; DUAN et al. 2003; ORESIC et al. 2003). SMITH and EYRE-WALKER (2001) have shown that in some cases the use of "suboptimal" codons in Escherichia coli appears to be due to some form of "conflicting selection" (i.e., regulation of gene expression or mRNA/DNA secondary structure). To the best of our knowledge the structure of the Notch mRNA is unknown, and no programs to predict the structure can input the entire 8-kb transcript.

The question remains as to why Notch is an extreme example of genome-wide trends in codon usage within each of these species and in synonymous codon usage differences between them. Notch resides in a region of high recombination, which may aid in the efficiency of weak selection due to a larger region-specific effective population size (e.g., BIRKY and WALSH 1988; BARTON 1995). Pronounced differences in synonymous codon usage between the melanogaster and the ananassae and obscura species groups (i.e., more unpreferred fixations in the former) at the yellow locus are thought to be due to a marked reduction of recombination at the tip of the X chromosome in the melanogaster species group (MUNTE et al. 1997, MUNTE et al. 2001). If such an explanation were to account for the extreme differences in codon usage between the species used in this study, we would expect the recombination rate at Notch in D. melanogaster to appear relatively low. However, using the method of WALL 2000, we found that the population rate of recombination per base pair for both Notch regions is relatively high compared to other loci and is much higher than estimates for the Notch region based on integrated map methods (data not shown). The estimates of R (3Ner for Notch) per base pair are 0.263 and 0.213 for Notch 5' and Notch 3', respectively.

It has also been noted that changes in mutation bias can cause different substitution processes even among closely related species and can lead to a brief burst of substitutions (TAKANO-SHIMIZU 1999, TAKANO-SHIMIZU 2001). However, these studies conclude a strong role of mutation without calibrating the comparisons by the base composition of the ancestral sequence. As previously mentioned for D. melanogaster, when base composition is taken into account, GC to AT substitutions (the type of mutations that tend to result in unpreferred changes) have occurred significantly faster at fourfold degenerate sites than at introns at Notch 5'. This suggests to us that a recent switch in mutation bias does not explain our results.

The substitution process is governed by the input of mutations and fixation due to drift and selection. With the available data, it does not appear that mutational processes alone can explain our results, unless the mutation bias is more extreme in exons compared to introns. Thus, positive selection appears to have accelerated the fixation of a subset of synonymous codons at Notch in D. melanogaster and D. simulans. These results add to the growing caution in the use of synonymous site evolution as a neutral proxy (e.g., AKASHI and KREITMAN 1995; BUSTAMANTE et al. 2002; FAY et al. 2002; SMITH and EYRE-WALKER 2002; SWANSON et al. 2003). For example, dN/dS comparisons for which synonymous changes are assumed neutral may underestimate the presence of positive selection at the amino acid level. Our results may also shed light on the cause of the lower average level and variance in intron divergence compared to synonymous site divergence observed between these species (Fig 3; BAUER and AQUADRO 1997; TAKANO-SHIMIZU 2001). A striking difference is the skew toward high levels of divergence for synonymous sites but not introns. This pattern could indicate that intron mutations are in general more selectively constrained than synonymous mutations. Given that Drosophila genes tend to have short introns (MOUNT et al. 1992), this explanation is plausible. On the other hand, our Notch results suggest that positive selection has the potential to accelerate the fixation of some synonymous mutations for at least some of the loci with high levels of synonymous site divergence.


*  FOOTNOTES

Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under accession nos. AF361372, AF361373, AF361374, AF361375, AF361376, AF361377, AF361378, AF361379, AF361380, AF361381, AF361382, AF361383, AF361384, AF361385, AF361386, AF361387, AF361388, AF361389, AF361390, AF361391, AF361392, AF361393, AF361394, AF361395, AF361396, AF361397, AF361398, AF361399, AF361400, AF361401, AF361402, AF361403, AF361404, AF361405, AF361406, AF361407, AF361408, AF361409, AF361410, AF361411, AF361412, AF361413, AF361414, AF361415, AF361416, AF361417, AF361418, AF361419, AF361420, AF361421, AF361422, AF360581, AF360582, AF360583, AF360584, AF360585, AF360586, AF360587, AF360588, AF360589, AF360590, AF360591, AF360592, AF360593, AF360594, AF360595, AF360596, AF360597, AF360598, AF360599, AF360600, AF360601, AF360602, AF360603, AF360604, AF360605, AF360606, AF360607, AF360608, AF360609, AF360610, AF360611, AF360612, AF360613, AF360614, AF360615, AF360616, AF360617, AF360618, AF360619, AF360620, AF360621, AF360622, AF360623, AF360624, AF360625, AF360626, AF360627, AF360628, AF360629, AF360630, AF360631, and AY191369, AY191370, AY191371, AY191372, AY191373, AY191374, AY191375, AY191376, AY191377, AY191378, AY191379, AY191380, AY191381, AY191382, AY191383, AY191384, AY191385,