Abstract
PGM plays a central role in the glycolytic pathway at the branch point leading to glycogen metabolism and is highly polymorphic in allozyme studies of many species. We have characterized nucleotide diversity across the Pgm gene in Drosophila melanogaster and D. simulans to investigate the role that protein polymorphism plays at this crucial metabolic branch point, which PGM shares with several other enzymes. Although D. melanogaster and D. simulans share common allozyme mobility alleles, we find that at the nucleotide level these allozymes result from many different amino acid changes. In addition, specific allozyme classes within a species are heterogeneous, each comprising several amino acid variants, which may explain the absence of latitudinal clines for PGM allozyme alleles, the lack of association of PGM allozymes with the cosmopolitan In(3L)P inversion, and the failure to detect differences between PGM allozymes in functional studies. We find a significant excess of amino acid polymorphism within D. melanogaster, in contrast to the complete absence of fixed amino acid replacements between D. melanogaster and D. simulans. There is also strong linkage disequilibrium across the 2354 bp of the Pgm locus, which may be explained by a specific amino acid haplotype that is high in frequency yet contains an excess of singleton polymorphisms. Like G6pd, Pgm provides strong evidence of adaptive protein evolution at a metabolic branch-point enzyme.
UNDERSTANDING how selection at higher phenotypic levels affects molecular variation at the level of single genes is an important question in evolutionary and physiological genetics. From a physiological perspective, a working view would propose that selection on life history variation results in selection to modulate energy budgets, which in turn targets the partitioning of metabolic flux between growth and reproduction. Central to this reductionist view is how flux at metabolic crossroads, such as the numerous junctions of the glycolytic pathway, responds to selection. Historically, allozyme studies revealed much about polymorphism in these genes, and the observation that some enzymes are variable while others are not has potential adaptive explanations (Gillespie 1991; Mitton 1998). A fundamentally important question is whether intrinsic features of these enzymes, such as their structure-function constraints or their relative roles in partitioning flux among competing pathways, lead to gene-specific patterns of intra- and interspecific variation. Studies of G6pd, whose substrate, glucose-6-phosphate, is shared with phosphoglucomutase and hexokinase at the head of the glycolytic pathway where the pentose shunt branches off, make a compelling case both for selection acting on the enzyme polymorphism in Drosophila melanogaster and for episodic selection on amino acid changes in the lineage leading to D. simulans (Eanes et al. 1993, 1996). To understand this problem in a larger context, a systematic study of the enzymes of the pathway is needed (Eanes 1999). Our laboratory has focused on characterizing molecular variation in these genes, and this report addresses the nature of molecular variation at the Pgm locus in D. melanogaster and D. simulans.
Phosphoglucomutase (PGM; EC 2.7.5.1) catalyzes the interconversion of glucose-1-phosphate and glucose-6-phosphate and plays a major role in the synthesis and breakdown of glycogen, which is important for energy storage in muscle tissue (Ray and Roscelli 1964; Hirose et al. 1970). PGM shares glucose-6-phosphate with several other enzymes at a branch point at the head of the metabolic pathways leading into glycogen metabolism, the pentose shunt, and the main glycolytic corridor. PGM has been studied routinely in many organisms because of its high level of allozyme polymorphism (Dawson and Jaegar 1970), and there are three common allozymes in D. melanogaster (Hjorth 1970; Trippa et al. 1970). The Medium mobility allele is widespread, but seven additional allozyme alleles and other underlying amino acid variants have been revealed by electrophoretic and thermostability criteria (Trippa et al. 1976, 1977, 1978). A study of geographic variation of the allozyme alleles found no evidence of clines (Oakeshott et al. 1981), unlike other allozyme polymorphisms in D. melanogaster, for which latitudinal clines are common and are taken as supporting evidence for natural selection (Singh and Rhomberg 1987; Eanes 1999). D. simulans also has three common allozymes with the same electrophoretic mobilities and geographic patterns as the three common allozymes in D. melanogaster. This raises the interesting possibility that the common PGM allozyme polymorphisms predate speciation (Anderson and Oakeshott 1984; Hyytia et al. 1985); because these two species share very few polymorphisms, a locus with shared allozyme alleles suggests that balancing selection maintains the protein polymorphism. In addition, several laboratory selection experiments suggest that balancing selection influences PGM allozyme allele frequencies, although other studies show inconsistent patterns (Carfagna et al. 1980; Oakeshott et al. 1985, 1988).
A biochemical study of the allozyme alleles in D. melanogaster reported no differences in kinetic properties (Fucci et al. 1979). However, if a single allozyme mobility class comprises several different amino acid variants (as the thermostability studies suggest), this could confound both studies of geographic variation and functional studies in which variants are defined only by allozyme mobility. Finally, the Pgm locus (on the third chromosome at 72D7) lies just inside the proximal breakpoint (73E3) of the cosmopolitan In(3L)P inversion. This inversion shows latitudinal clines in natural populations on several continents (Mettler et al. 1977; Knibb et al. 1981), and, in principle, selection favoring the inversion could maintain variation at the Pgm locus (Wesley and Eanes 1994; Hasson and Eanes 1996). Despite this tight physical linkage, no significant linkage disequilibrium has been found between PGM allozyme alleles and the inversion in natural populations (Langley et al. 1974, 1977).
Given our interest in patterns of variation in metabolic genes, we addressed several questions in this study. We first recovered and characterized the primary sequence of Pgm from D. melanogaster and used it to investigate overall sequence variation and the molecular basis of the allozyme variation. We asked whether Pgm resembles Est-6 in showing both abundant amino acid polymorphism (as predicted from thermostability studies) and high levels of interspecific divergence. Because D. melanogaster and D. simulans share the same PGM allozyme mobility alleles, we also examined the possibility that these allozymes have a common mutational origin and have persisted long term in both species. Finally, we asked whether there is evidence for adaptive protein evolution at the Pgm locus like that seen for other metabolic enzymes such as G6pd (Eanes et al. 1993) and Adh (McDonald and Kreitman 1991).
MATERIALS AND METHODS
Isolation of Pgm locus: A third-instar larval cDNA library made from D. melanogaster was kindly provided by Peter Gergen (SUNY at Stony Brook). Yeast and human PGM protein sequences (GenBank accession nos. P33401 and P36871, respectively) were aligned, and four degenerate primers (two positive, two negative) were designed from highly conserved regions of the alignment. Using these primers, a 300-bp fragment was amplified from the cDNA library by PCR. This fragment was sequenced with the degenerate primers to verify that it was D. melanogaster Pgm cDNA, and primers specific to the D. melanogaster sequence were then designed from it. The cDNA library was then amplified with vector-specific primers, and the cDNA pool was size selected on a 3% agarose gel to remove other abundant transcripts of unwanted sizes. The recovered 300-bp Pgm fragment was labeled with biotin-14-dATP by PCR, denatured at 95° for 5 min, and hybridized to the size-selected single-stranded cDNA pool at 45° for 8 hr. The DNA duplexes were then captured with streptavidin-coated magnetic beads (Dynal, Inc., Great Neck, NY) at room temperature for 20 min. After several washes to remove nonspecifically bound fragments, the DNA was eluted from the beads in 2.5 mM EDTA at 85°, and this template DNA was used directly for PCR. A Pgm-specific positive primer was paired with a negative cDNA primer to obtain the remaining sequence to the 3′ end. However, because the reverse transcriptase had often dissociated prematurely during cDNA synthesis, a heterogeneous pool of fragments was amplified when a positive vector-specific primer was paired with a negative Pgm-specific primer. This PCR fragment pool was cloned, and 100 clones were subsequently used as PCR templates. The largest cloned insert from this screen was sequenced and found to fall short of the 5′ end.
A new negative Pgm-specific primer was synthesized from this sequence and combined with the positive cDNA vector-specific primer to amplify another pool of Pgm transcripts from the template DNA recovered from the magnetic beads. This “transcript walk” of amplification, cloning, sequencing, and primer synthesis was repeated several times until the 5′ end of the coding region was reached. A positive Pgm-specific primer was then synthesized at the 5′ end to amplify the entire gene from genomic DNA. Intron positions were identified by comparing the full genomic sequence with the previously recovered Pgm cDNA sequence.
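Degenerate primers such as those designed from the yeast and human PGM alignment are written in IUPAC ambiguity codes, and the number of distinct oligonucleotides in the mixture (its degeneracy) grows multiplicatively with each ambiguous position. The sketch below is a minimal illustration, not the design procedure actually used here: it counts and enumerates the sequences encoded by a degenerate primer. The primer string in the usage line is hypothetical and is not one of the primers used in this study.

```python
from itertools import product
from math import prod

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())

def expand(primer: str) -> list:
    """Enumerate every concrete sequence in the degenerate primer mixture."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(bases) for bases in product(*choices)]

# Hypothetical primer, for illustration only (Y = C/T, N = any, R = A/G).
example = "GAYCCNGARGG"
print(degeneracy(example))  # 16
```

Keeping degeneracy low matters in practice because each specific oligonucleotide is present at a lower concentration as the mixture grows.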
Origin of wild lines: D. melanogaster isofemale lines were collected from a population at Davis Peach Farm, Mt. Sinai, New York, in 1995 (DPF95) and made homozygous for the third chromosome using the TM3/TM6 balancer. Lines were genotyped for allozyme mobility (Hjorth 1970), and a constructed random sample (CRS; Hudson et al. 1994) of 22 lines was sequenced, with the three common allozyme mobilities (Medium, Fast, and Slow) represented in proportion to their frequencies in this population. Parameter estimates (i.e., θ and π) calculated from this sample therefore reflect the frequencies of PGM allozyme alleles in this population. Isofemale lines collected in 1990 from two populations in Zimbabwe [Harare, Z(H), and Sengwa Wildlife Preserve, Z(S)] were kindly provided by C.-I. Wu’s laboratory. Thirteen of these lines were made homozygous for the third chromosome and sequenced for Pgm to compare the level of variation in Zimbabwe with that of the North American sample. Using primers specific for the In(3L)P arrangement (Wesley and Eanes 1994), extracted third chromosome lines were screened for the inversion, and Pgm was sequenced from inverted lines from Davis Peach Farm, Mt. Sinai, New York (3 lines; DPF95), Maryland (1 line; MD90), Spartanburg, South Carolina (1 line; SC96), and Homestead, Florida (5 lines; HFL97). D. simulans isofemale lines were collected from a Davis Peach Farm population in 1996 (DPF96), and PGM allozyme allele frequencies were surveyed. We sequenced a CRS of 13 lines, which reflects the frequencies of the allozyme alleles in this population. Lines genotyped as Fast or Slow were made homozygous at the Pgm locus after two generations of mating with a Medium mobility stock; conversely, lines genotyped as Medium were made homozygous at the Pgm locus after two generations of mating with a Fast mobility stock.
Flies were then cut in half; the abdomen was used to check the allozyme genotype, and the thorax served as tissue for DNA extraction. Nine D. yakuba isofemale lines from a 1999 West African sample (T. F. C. Mackay’s laboratory) were sequenced and used to assign fixed differences to the D. melanogaster and D. simulans lineages. Finally, a PGM null allele (line PgmnGB1) found in a natural population sample from Great Britain (Langley et al. 1981; Burkhart et al. 1984) was obtained from the Bloomington Stock Center (B4039) and sequenced.
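A constructed random sample fixes the number of sequenced lines in each allozyme class in proportion to that class's frequency in the population. The sketch below illustrates that allocation under an assumption the text does not state: fractional counts are rounded by the largest-remainder method. The frequencies in the usage line are hypothetical and are not the DPF95 or DPF96 estimates.

```python
from math import floor

def crs_allocation(freqs: dict, n: int) -> dict:
    """Allocate n sampled lines among allele classes in proportion to their
    population frequencies, using largest-remainder rounding (an assumption)."""
    if abs(sum(freqs.values()) - 1.0) > 1e-9:
        raise ValueError("frequencies must sum to 1")
    raw = {allele: f * n for allele, f in freqs.items()}
    counts = {allele: floor(x) for allele, x in raw.items()}
    shortfall = n - sum(counts.values())
    # Give the remaining lines to the classes with the largest fractional parts.
    by_remainder = sorted(raw, key=lambda a: raw[a] - counts[a], reverse=True)
    for allele in by_remainder[:shortfall]:
        counts[allele] += 1
    return counts

# Hypothetical allozyme frequencies, for illustration only.
print(crs_allocation({"Medium": 0.80, "Slow": 0.12, "Fast": 0.08}, 22))
```

Because each class is represented at its population frequency, summary statistics computed from the sample are weighted the same way the population is, which is the property the CRS is meant to provide.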
PCR amplification and sequencing: Fragments were amplified by PCR in 10-μl volumes in an Idaho Technologies (Idaho Falls, ID) Air-Thermo-Cycler from single-fly CTAB genomic preps (Winnepenninckx et al. 1993). PCR products were excised from 2% agarose gels for 50-μl reamplifications using internal primers. DNA fragments were purified from PCR reactions [Bio-Rad (Hercules, CA) Prep-A-Gene kit], and double-stranded DNA templates were manually sequenced using the Sequenase kit (United States Biochemical Co., Cleveland) and [35S]dATP (Amersham, Buckinghamshire, England). Reactions were run on acrylamide gels with an electrolyte gradient and electrophoresed for 3-7 hr. All sequences have been deposited in GenBank under accession nos. AF290313-AF290370.
RESULTS
Levels of intraspecific polymorphism: Isolation of the primary Pgm sequence from the D. melanogaster cDNA library yielded a 1680-bp coding region encoding 560 amino acids, which shows high similarity to known PGM protein sequences from many other taxa (Whitehouse et al. 1998). Sequencing of genomic DNA revealed four exons of 78, 247, 1219, and 136 bp and three introns of 71, 534, and 66 bp. There are 408 silent site equivalents across the 1680-bp coding region. The CRS of 22 lines (DPF95) comprises 3 Slow, 2 Fast, and 17 Medium allozyme alleles and contains 13 silent, 12 replacement, and 17 intron polymorphisms. Figure 1 lists a total of 55 variable sites from all 44 D. melanogaster lines sequenced. The three Slow allozyme alleles in this data set result from three independent amino acid replacements at nucleotide sites 25 (Ala to Thr at amino acid residue 9, or A9T), 1194 (E197K), and 1308 (E235K). The latter two replacements explain the charge change underlying the slower electrophoretic mobility, but the first (A9T) does not predict a charge change. Additional sequence data for the Slow allozyme mobility class show that this class is truly heterogeneous; however, the polymorphism at nucleotide site 25 accounts for most Slow copies in natural populations (our unpublished data). Both Fast allozyme alleles in the CRS carry a pair of tightly linked changes at nucleotide sites 1324 (R240L) and 1340 (E245D); the change at site 1324 is responsible for the Fast electrophoretic mobility. The remaining 8 replacement polymorphisms in Figure 1 are cryptic amino acid changes that are not detectable in allozyme screens. Comparison with the D. simulans sequences indicates that the Medium allele is the ancestral state. The average numbers of silent differences within the Medium, Slow, and Fast allele classes are 1.76, 4.00, and 0.00, respectively.
On average, Medium and Fast alleles differ by 2.35 silent sites, Medium and Slow alleles differ by 2.74 silent sites, and Fast and Slow alleles differ by 2.67 silent sites.
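The residue numbers given above can be recovered from nucleotide positions once the exon-intron structure is known. The sketch below assumes, as an inference from the reported replacements rather than a statement in the text, that gene positions are counted from the first base of the start codon and include intron bases. Under that assumption, sites 25, 1194, 1308, 1324, and 1340 fall at codon positions 1, 1, 1, 2, and 3 of codons 9, 197, 235, 240, and 245, consistent with A9T, E197K, E235K, R240L, and E245D.

```python
# Exon and intron lengths (bp) of D. melanogaster Pgm, in gene order.
EXONS = (78, 247, 1219, 136)
INTRONS = (71, 534, 66)

def site_to_codon(site: int):
    """Map a 1-based gene position to (residue number, codon position 1-3).

    Returns None for intronic positions or positions past the last exon.
    Assumes position 1 is the first base of the start codon and that intron
    bases are counted (an inference, not stated in the text).
    """
    if site < 1:
        raise ValueError("positions are 1-based")
    start, cds_before = 1, 0  # gene position of current exon; coding bases upstream
    for i, exon_len in enumerate(EXONS):
        if site < start + exon_len:
            cds_pos = cds_before + (site - start) + 1
            return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3 + 1
        start += exon_len
        cds_before += exon_len
        if i < len(INTRONS):
            if site < start + INTRONS[i]:
                return None  # the site lies in an intron
            start += INTRONS[i]
    return None

assert sum(EXONS) == 1680 == 3 * 560  # coding length matches 560 codons
for site in (25, 1194, 1308, 1324, 1340, 931):
    print(site, site_to_codon(site))
```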
Sequence variation for the Davis Peach Farm CRS, the Zimbabwe sample, and the In(3L)P sample is summarized in Table 1. The 13 sequences from Zimbabwe add five silent, four replacement, and two intron polymorphisms to the North American sample in Figure 1. A single Slow allozyme allele was sequenced in this sample (Z26H); it carries the same change at nucleotide site 25 as that found in the North American sample (DPF95 94.1). The level of silent polymorphism in the Zimbabwe sample is actually lower than in the North American sample, which is unusual for comparisons between these two geographic regions (Begun and Aquadro 1993; Eanes et al. 1996). The Zimbabwe sample segregates several of the North American replacements in addition to a few unique singletons. Screening of the 22-line CRS for the In(3L)P inversion showed that two lines, DPF95 13.0 and DPF95 48.2, carry the inversion. Eight additional lines carrying the inversion are also listed in Figure 1 and together add only one unique silent polymorphism. A unique replacement (site 1308) producing a Slow allele (DPF95 13.0) is found only on an inverted chromosome. Given the proximity of Pgm to the inversion breakpoint, the lack of divergence between Pgm alleles on standard and inverted arrangements is surprising. Figure 1 shows polymorphisms shared between arrangements, which indicates some level of genetic exchange. DnaSP (Rozas and Rozas 1999) infers a single gene conversion event between the two arrangements at nucleotide sites 226-460 (235 bp in length), and exchange of short gene tracts of this kind may by itself explain the lack of divergence between the two arrangements. Thus, although most of the observed variation is shared among the three D. melanogaster samples in Figure 1, each sample also contains private alleles.
Nucleotide divergence between the Davis Peach Farm CRS and the Zimbabwe sample is 0.0054 at silent sites and 0.0015 at replacement sites; between the Davis Peach Farm CRS and the In(3L)P sample it is 0.0059 and 0.0016, respectively. Overall, the three samples do not differ appreciably in silent or replacement variation and divergence at the Pgm locus. Finally, the PgmnGB1 null allele carries a unique replacement polymorphism at nucleotide site 931 (G109A) that lies in the active site of PGM (Dai et al. 1992; Liu et al. 1997). This glycine-to-alanine substitution is not a conservative biochemical change (Argos et al. 1979; Chakrabartty et al. 1991) and may explain the reported loss of PGM activity for this allele.
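Both the within-class averages reported earlier and the between-sample divergences above are averages of pairwise differences. A minimal sketch, assuming the sequences are already aligned and restricted to the site class of interest (for example, silent positions) and applying no multiple-hit correction (whether the reported values were corrected is not stated here). The toy sequences are hypothetical.

```python
from itertools import combinations, product

def n_diff(a: str, b: str) -> int:
    """Count mismatches between two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))

def mean_within(seqs) -> float:
    """Average number of pairwise differences within one sample or class."""
    pairs = list(combinations(seqs, 2))
    return sum(n_diff(a, b) for a, b in pairs) / len(pairs)

def mean_between(seqs_x, seqs_y) -> float:
    """Average number of pairwise differences between two samples."""
    pairs = list(product(seqs_x, seqs_y))
    return sum(n_diff(a, b) for a, b in pairs) / len(pairs)

def per_site(mean_diffs: float, n_sites: float) -> float:
    """Scale an average count of differences by the number of (effective) sites."""
    return mean_diffs / n_sites

# Hypothetical toy sequences of four aligned sites.
x = ["AAAA", "AAAT"]
y = ["AATT"]
print(mean_within(x), mean_between(x, y), per_site(mean_between(x, y), 4))
```

For real data, `n_sites` would be the effective number of sites in the class (for example, the 408 silent site equivalents of the coding region), which need not be an integer.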
Figure 1. List of 55 polymorphic sites identified across 2354 bp of Pgm in 44 lines of D. melanogaster. Characters a and b designate nucleotide positions responsible for Slow and Fast allozyme polymorphisms, respectively. Nucleotide polymorphisms shown in boldface type are amino acid polymorphisms. Polymorphisms in introns I, II, and III are shown in italics. SIM refers to D. simulans line DPF96 3.0. Line labels followed by an asterisk carry the In(3L)P inversion.
Summary statistics for the 13 D. simulans alleles are presented in Tables 1 and 2, and the CRS of 2 Slow, 1 Fast, and 10 Medium alleles is shown in Figure 2. Table 1 shows that D. simulans has almost six times the level of silent site polymorphism found in D. melanogaster, with 64 silent site polymorphisms in the sample. The level of replacement polymorphism is lower, which is typical for these two species (Moriyama and Powell 1996). Figure 2 also shows 5 replacement and 22 intron polymorphisms. All 5 replacement polymorphisms occur on rare allozyme alleles, and none is shared with D. melanogaster. The two sampled Slow allozyme alleles result from two different replacements at the nucleotide level. Line DPF96 10.6 has a second replacement polymorphism at nucleotide site 217 (A49G) in addition to the change responsible for its Slow mobility at site 205 (E45A). Line DPF96 17.4 has a change at nucleotide site 1128 (D175N) that produces the second Slow allele. Line DPF96 36.2 has a second replacement polymorphism at nucleotide site 1954 (T450I) in addition to the change responsible for its Fast mobility at nucleotide site 1075 (K157M). Additional Fast alleles were sequenced (our unpublished data), and all carry the same polymorphism at nucleotide site 1075. In contrast to D. melanogaster, the second intron is highly variable in length within D. simulans and is omitted from Figure 2. An apparent 2-bp insertion in the third intron of D. melanogaster and the highly variable second intron of D. simulans account for the only differences in gene length between the two species.
Figure 3 shows a neighbor-joining tree (Saitou and Nei 1987) of all 44 D. melanogaster and 13 D. simulans sequences. It is based only on silent and intron polymorphisms within and between the two species and so illustrates the clustering of silent variation independent of the amino acid polymorphisms. Because recombination can obscure the true genealogical relationships among allele copies, the analysis is used simply as an exploratory way to structure the variation at this locus. An analysis using all polymorphic sites (including amino acid polymorphisms) yields the same clustering of alleles, indicating strong disequilibrium across the gene. As in Figure 1, Figure 3 shows no evidence of geographic structuring when the Davis Peach Farm and Zimbabwe populations are compared in a genealogical framework. In addition, the In(3L)P sequences do not form a single cluster but three small clusters, consistent with gene conversion events that have led to some divergence among alleles on inverted chromosomes and convergence with alleles on standard arrangements. Two In(3L)P sequences fall in a cluster that is basal to all other sequences, including all standard alleles, an observation not uncommon for this inversion (Hasson and Eanes 1996). All allozyme alleles are also labeled in Figure 3. Although several Pgm alleles share electrophoretic mobilities, in both species Slow mobility alleles are more closely related to Medium and Fast alleles than to other Slow alleles. The difference in levels of silent polymorphism between the two species is reflected in the deeper branches in D. simulans compared with D. melanogaster.
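Neighbor joining repeatedly joins the pair of nodes that minimizes the Q criterion computed from a distance matrix, then replaces the pair with a new node. The sketch below is a compact, unoptimized illustration that works on any symmetric distance (for example, counts of silent and intron differences); it is not the software used to build Figure 3. The four-taxon distances in the usage lines are hypothetical values taken from an additive tree, on which neighbor joining recovers the true topology and branch lengths.

```python
from itertools import combinations

def neighbor_joining(labels, dist):
    """Build an unrooted tree by neighbor joining; return it as a Newick string.

    labels: distinct taxon names (at least three).
    dist: dict mapping frozenset({a, b}) to the distance between taxa a and b.
    """
    d = dict(dist)
    nodes = list(labels)

    def D(a, b):
        return d[frozenset((a, b))]

    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(D(a, b) for b in nodes if b != a) for a in nodes}
        # Join the pair that minimizes Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * D(*p) - r[p[0]] - r[p[1]])
        la = D(a, b) / 2 + (r[a] - r[b]) / (2 * (n - 2))  # branch length to a
        lb = D(a, b) - la                                   # branch length to b
        u = f"({a}:{la:g},{b}:{lb:g})"  # the new node is named by its subtree
        for k in nodes:
            if k not in (a, b):
                d[frozenset((u, k))] = (D(a, k) + D(b, k) - D(a, b)) / 2
        nodes = [k for k in nodes if k not in (a, b)] + [u]

    # Three nodes remain: solve the branch lengths of the final star.
    x, y, z = nodes
    lx = (D(x, y) + D(x, z) - D(y, z)) / 2
    return f"({x}:{lx:g},{y}:{D(x, y) - lx:g},{z}:{D(x, z) - lx:g});"

# Hypothetical additive distances for the tree ((A:1,B:2):1,C:1,D:2).
pairs = {("A", "B"): 3, ("A", "C"): 3, ("A", "D"): 4,
         ("B", "C"): 4, ("B", "D"): 5, ("C", "D"): 3}
dist = {frozenset(p): v for p, v in pairs.items()}
print(neighbor_joining(["A", "B", "C", "D"], dist))
```

Naming each internal node by its Newick subtree keeps the sketch short; a production implementation would use integer indices and a matrix.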
Table 1. Summary parameter estimates for D. melanogaster and D. simulans
Table 2. Intra- and interspecific comparisons for D. melanogaster and D. simulans for the 1680-bp coding region of Pgm
Linkage disequilibrium: There is pervasive linkage disequilibrium across the entire 2354-bp region of the Pgm gene in D. melanogaster, evident in Figure 4 from the strong associations among polymorphisms in the 22 sequences of the Davis Peach Farm CRS. Using the 21 non-singleton polymorphic sites (the 21 singletons are omitted), 99 of the 210 pairwise comparisons are statistically significant by a chi-square test (P < 0.05). Figure 4 also shows that some of the strongest disequilibria occur between sites more than 2 kb apart (33 of the 210 pairwise comparisons are significant at the 0.1% level with a Bonferroni correction). Because two sequences in the Davis Peach Farm CRS (DPF95 48.2 and DPF95 13.0) carry the In(3L)P inversion, they could inflate estimates of linkage disequilibrium for the entire data set. However, because there is no apparent association between the inversion and Pgm variation, and no divergence between arrangements, the linkage disequilibrium estimates did not change when these two alleles were removed from the analysis.
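The pairwise tests summarized above can be sketched by computing r² for each pair of biallelic sites and using the fact that, for n haplotypes, n·r² equals the Pearson chi-square statistic (1 df, no continuity correction) of the 2 × 2 haplotype table. This is a minimal illustration, assuming singletons and monomorphic sites have already been removed and alleles are coded 0/1; the Bonferroni threshold is α/m for m tests. The toy data are hypothetical, and the original analysis may have differed in detail.

```python
from itertools import combinations
from math import erfc, sqrt

def r_squared(x, y):
    """r^2 between two biallelic sites coded 0/1 across the same haplotypes.

    Both sites must be polymorphic; otherwise the denominator is zero.
    """
    n = len(x)
    pa, pb = sum(x) / n, sum(y) / n
    pab = sum(1 for i, j in zip(x, y) if i and j) / n
    d = pab - pa * pb  # the disequilibrium coefficient D
    return d * d / (pa * (1 - pa) * pb * (1 - pb))

def chi2_pvalue_1df(stat):
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    return erfc(sqrt(stat / 2))

def ld_scan(sites, alpha=0.05):
    """Test every pair of sites (each a 0/1 column over the same haplotypes).

    Returns (number significant at alpha, number significant after
    Bonferroni correction, number of tests).
    """
    n = len(sites[0])
    pvals = [chi2_pvalue_1df(n * r_squared(x, y)) for x, y in combinations(sites, 2)]
    m = len(pvals)
    return sum(p < alpha for p in pvals), sum(p < alpha / m for p in pvals), m

# Hypothetical sites over 22 haplotypes: a and b are in complete LD, c is unlinked.
a = [1] * 6 + [0] * 16
b = list(a)
c = [1, 0] * 11
print(ld_scan([a, b, c]))
```

With 21 sites, m = 21 × 20 / 2 = 210 tests, the number of pairwise comparisons reported above.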
Tests of the polymorphism frequency spectrum: Tajima (1989) and Fu and Li (1993) tests were applied to the data for both species. We tested the Davis Peach Farm CRS of 22 D. melanogaster sequences, the 13 Zimbabwe sequences, the 10 In(3L)P sequences, and the 13 D. simulans sequences separately, to determine whether the frequency distribution of either silent or replacement polymorphisms departs from neutral expectations. All test results are presented in Table 3. Only the Fu and Li test of D. simulans replacement polymorphism is significant.
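Tajima's D contrasts two estimators of θ: the average number of pairwise differences (π) and Watterson's estimator based on the number of segregating sites (S/a₁). A minimal sketch of the statistic, assuming aligned sequences without gaps or missing data that have already been restricted to the site class being tested (silent or replacement). It returns the statistic only; significance must be judged against Tajima's tables or simulations. The Fu and Li (1993) tests are not sketched here. The toy haplotypes are hypothetical.

```python
from collections import Counter
from math import comb, sqrt

def tajimas_d(seqs):
    """Tajima's (1989) D for aligned sequences (strings of equal length)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least four sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    segregating = [col for col in zip(*seqs) if len(set(col)) > 1]
    S = len(segregating)
    if S == 0:
        raise ValueError("no segregating sites")
    n_pairs = comb(n, 2)
    # pi: number of differing pairs at each site, summed, over all pairs.
    diff_pairs = sum(n_pairs - sum(comb(k, 2) for k in Counter(col).values())
                     for col in segregating)
    pi = diff_pairs / n_pairs
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    return (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))

# Hypothetical haplotypes: all singletons (D < 0) vs. intermediate frequencies (D > 0).
print(tajimas_d(["1000", "0100", "0010", "0001"]))
print(tajimas_d(["1111", "1111", "0000", "0000"]))
```

An excess of rare variants (such as singletons) pushes D negative, whereas an excess of intermediate-frequency variants pushes it positive.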
Figure 2. List of 92 polymorphic sites identified across 1852 bp of Pgm in 13 lines of D. simulans. Characters a and b designate nucleotide positions responsible for Slow and Fast allozyme polymorphisms, respectively. Nucleotide polymorphisms shown in boldface type are amino acid polymorphisms. Polymorphisms in introns I and III are shown in italics (intron II is omitted). MEL refers to D. melanogaster line DPF95 2.0.
Interspecific comparisons: Table 2 summarizes polymorphism and divergence for the total 1680-bp coding region in D. melanogaster and D. simulans. Pooling polymorphisms from the two species gives 77 silent and 17 replacement polymorphisms. Between the Davis Peach Farm CRS of 22 D. melanogaster sequences and the 13 D. simulans sequences, there are 34 fixed silent differences and no fixed replacement differences. Under neutrality, the ratio of replacement to silent changes is expected to be the same for polymorphism within species and for fixed differences between species, so a test of independence can detect departures from neutral expectations (McDonald and Kreitman 1991). A 2 × 2 Fisher’s exact test of pooled polymorphism (77 silent/17 replacement) vs. divergence (34/0) is highly significant (P < 0.001). Using D. yakuba as an outgroup, the fixed differences between D. melanogaster and D. simulans can be assigned to either species’ lineage, as shown in Table 2. The number of silent fixations does not differ significantly between the two lineages. However, when silent and replacement polymorphisms within each species are contrasted with silent and replacement fixations along the corresponding lineage by Fisher’s exact test, the D. melanogaster comparison is highly significant (P < 0.0001), whereas the D. simulans comparison is not. Relative to the divergence along the D. melanogaster lineage (19 silent/0 replacement), the ratio of silent to replacement polymorphism (13/12) shows a significant excess of replacement polymorphism. The level of silent site polymorphism at Pgm (θ = 0.0087) is typical of other D. melanogaster loci (Moriyama and Powell 1996).
Assuming that silent site variation is neutral, the difference in the ratios of polymorphism to divergence at replacement and silent sites could reflect either unusually high amino acid polymorphism or unusually low amino acid divergence. Because interspecific comparisons between these species often show no replacement site divergence (Moriyama and Powell 1996), the high replacement polymorphism at the Pgm locus in D. melanogaster appears to be the unusual feature.
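The McDonald-Kreitman contrasts above are 2 × 2 tables of polymorphism vs. fixed differences at silent and replacement sites. The sketch below implements one common definition of the two-sided Fisher's exact test: it sums the hypergeometric probabilities of all tables with the observed margins that are no more probable than the observed table. It uses only the counts reported in the text. Other variants (for example, one-sided tests) or related tests can give different P values, so the printed values illustrate the computation rather than reproduce the reported ones exactly.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the observed margins whose
    probability does not exceed that of the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, row1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(col1, x) * comb(n - col1, row1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))

# Counts from the text, as (silent poly, replacement poly, silent fixed, replacement fixed).
contrasts = {
    "pooled (both species)": (77, 17, 34, 0),
    "D. melanogaster lineage": (13, 12, 19, 0),
    "D. simulans lineage": (64, 5, 11, 0),
}
for name, table in contrasts.items():
    print(f"{name}: P = {fisher_exact_2x2(*table):.2g}")
```

The small relative tolerance guards against floating-point ties when deciding which tables are "as or less probable" than the observed one.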
DISCUSSION
Excess of amino acid polymorphism: The principal observation of this study is the large number of amino acid polymorphisms segregating in both species. The significance of this observation is supported by the McDonald-Kreitman test using the pooled data for both species. The overall level of silent divergence (8%) is typical of these two species (Moriyama and Powell 1996) and therefore cannot explain the significant deviation in the McDonald-Kreitman test. Using D. yakuba as an outgroup, fixations along the D. melanogaster-simulans lineage can be assigned to each species’ lineage and compared with levels of silent and replacement polymorphism. Although silent and replacement polymorphisms are nearly equal in number within D. melanogaster (13/12), fixations along the D. melanogaster lineage are exclusively silent (19/0). Because the contrast of silent and replacement polymorphism (64/5) with silent and replacement divergence (11/0) in the D. simulans lineage is not statistically significant, the result from the pooled data (P < 0.001) appears to be driven largely by the significant excess of replacement polymorphism in D. melanogaster (P < 0.0001).
Figure 3. Neighbor-joining tree based on silent substitutions within and between the 44 D. melanogaster and 13 D. simulans Pgm sequences. S1, S2, and S3 designate Slow allozyme alleles arising from three distinct mutations at the nucleotide level in D. melanogaster. S4 and S5 designate Slow allozyme alleles arising from two distinct mutations at the nucleotide level in D. simulans. F1 and F2 designate Fast allozyme alleles arising from distinct mutations at the nucleotide level in D. melanogaster and D. simulans, respectively. Asterisks designate sequences carrying the In(3L)P inversion. The scale bar indicates one difference.
The relative difference between levels of polymorphism and divergence at the Pgm locus is atypical of other genes in D. melanogaster. Figure 5 plots replacement polymorphism (θa) against divergence at replacement sites (da) for 17 genes. Sequence data for genes that exhibit protein polymorphism were taken from GenBank (see Figure 5 for references to the original data). Watterson’s (1975) θa was calculated using the effective number of replacement sites, and divergence was calculated as the number of fixed replacement differences from D. simulans divided by the effective number of replacement sites. Overall there is a strong positive correlation (r² = 0.587; P < 0.001): loci with high levels of amino acid polymorphism also show high levels of amino acid divergence. An extreme example is Est-6, where a high level of amino acid polymorphism is paralleled by extensive amino acid divergence (Cooke and Oakeshott 1989; Karotam et al. 1995). This correlation is expected under the neutral theory because polymorphism is simply a transient phase of divergence between species. When the McDonald-Kreitman test is applied to each of the 17 genes in D. melanogaster, only 2 reject the neutral expectation. These outliers, Pgm and G6pd, both show highly significant departures, but in opposite directions. Excluding these two outliers from Figure 5 yields an even stronger correlation (r² = 0.750; P < 0.0001). G6pd has two protein polymorphisms but a very high number of fixed replacements (Eanes et al. 1993). This pattern is consistent with adaptive protein evolution, in which amino acid replacements are favored, fix rapidly, and are therefore rarely seen as polymorphisms. A significant and disproportionate number of these fixed differences have accumulated in the D. simulans lineage, consistent with a larger effective population size and more efficient fixation of adaptive amino acid replacements (Eanes et al. 1996). With no amino acid fixations yet a significant excess of amino acid polymorphism, Pgm shows a pattern unique among D. melanogaster loci.
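The two axes of Figure 5 follow from simple definitions. A minimal sketch, assuming that replacement site equivalents equal the coding length minus the silent site equivalents (1680 - 408 = 1272 for Pgm; the effective numbers actually used are not given here) and applying no multiple-hit correction. With the Davis Peach Farm CRS counts (13 silent polymorphisms, n = 22, 408 silent sites), Watterson's estimator gives the reported silent θ of 0.0087.

```python
def a_n(n: int) -> float:
    """Watterson's correction factor: the sum of 1/i for i = 1 .. n - 1."""
    return sum(1 / i for i in range(1, n))

def watterson_theta(segregating: int, n: int, n_sites: float) -> float:
    """Watterson's (1975) estimator per site: S / (a_n * L)."""
    return segregating / (a_n(n) * n_sites)

def divergence_per_site(fixed: int, n_sites: float) -> float:
    """Fixed differences between species per site (no multiple-hit correction)."""
    return fixed / n_sites

SILENT_SITES = 408
REPLACEMENT_SITES = 1680 - SILENT_SITES  # assumption: coding length minus silent sites

theta_s = watterson_theta(13, 22, SILENT_SITES)       # silent theta for the CRS
theta_a = watterson_theta(12, 22, REPLACEMENT_SITES)  # x-axis under the assumption above
d_a = divergence_per_site(0, REPLACEMENT_SITES)       # no fixed replacements
print(f"theta_s = {theta_s:.4f}, theta_a = {theta_a:.4f}, d_a = {d_a}")
```

Because Pgm has no fixed replacements, its point lies on the θa axis (da = 0), which is what separates it from the other loci in the plot.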
Figure 4. (a) Shaded boxes indicate the strength of linkage disequilibrium across the 2354-bp Pgm gene and mark the 99 of 210 possible associations among the 21 polymorphic sites (excluding singletons) in introns and exons that are significant by a chi-square test. The gene diagram above the correlation matrix shows the four exons (black boxes) and three introns of Pgm. (B) in both plots marks the 33 associations that are significant at the 0.1% level with a Bonferroni correction. (b) Relationship between the linkage disequilibrium measure R² and the distance between the 21 compared polymorphic sites.
Table 3. Neutrality tests for D. melanogaster and D. simulans intraspecific polymorphism
An excess of intraspecific amino acid variation has been reported in several studies of mtDNA (Ballard and Kreitman 1994; Nachman et al. 1996; Rand and Kann 1996), and three explanations have been proposed for it: (1) replacement mutations are typically slightly deleterious and as a consequence contribute disproportionately to polymorphism relative to divergence (Kimura 1983; Ohta 1992); (2) a recent period of relaxed constraint has affected current within-species amino acid variation but not long-term substitution rates (Kennedy and Nachman 1998); (3) positive selection that fluctuates over time and space maintains amino acid polymorphism, and these fluctuations either retard fixation or leave fixation probabilities unaffected (Gillespie 1991, 1994).
There is increasing evidence that the classical neutral model (Kimura 1983) is inadequate to explain certain patterns of DNA evolution (Kreitman 1996; Ohta 1996). A more realistic, relaxed model assumes a spectrum of selection in which many segregating mutations have selection coefficients near the reciprocal of the effective population size. As population size decreases, selection becomes less effective at countering drift, and in the limit the model behaves like the classical neutral model. This slightly deleterious, or nearly neutral, model is difficult to test explicitly because it involves an unknown spectrum of selection coefficients for classes of mutations with unknown mutation rates. It can be shown that fixation rates respond more strongly than polymorphism levels to changes in selection intensity or population size (Kimura 1983). Nevertheless, under both models a positive correlation is still expected between levels of polymorphism and divergence. The relationship will be complex in contrasts between silent and replacement sites, because silent and replacement mutations will have different frequency distributions under selection (Akashi 1999). The classical neutral model may therefore be more applicable to silent site variation, whereas replacement variation may follow a nearly neutral model in which most variation is slightly deleterious. The deviations generated by this mixed model cannot explain apparent deficiencies of amino acid polymorphism (or excesses of fixations), as seen for G6pd and Adh (McDonald and Kreitman 1991; Eanes et al. 1993). However, excess replacement polymorphism, as seen in mtDNA genes (Ballard and Kreitman 1994; Rand et al. 1994; Nielsen and Weinreich 1999) and Pgm (this study), may be consistent with a nearly neutral interpretation. For mtDNA this possibility cannot be pursued further for lack of suitable comparisons (independent genes with the same Ne), but for Pgm, contrasts with other nuclear genes are informative.
If a nearly neutral model serves as a global explanation for amino acid variation and Pgm is typical, then one must reconcile why 10 of the genes with lower amino acid polymorphism than Pgm in Figure 5 all possess higher levels of amino acid divergence. On the other hand, the 5 genes with comparable levels of amino acid polymorphism to Pgm show very high levels of amino acid divergence. The bottom line is that Pgm appears to possess too much amino acid polymorphism.
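The nearly neutral argument rests on how the fixation probability of a new mutation scales with the product of effective population size and selection coefficient. As a minimal sketch, assuming additive selection, a constant selection coefficient s, and census size equal to effective size, Kimura's diffusion approximation can be evaluated as below. The function names and the example value of s are hypothetical and illustrative only; they are not estimates for Pgm or any other locus.

```python
import math


def fixation_probability(s: float, n_e: int) -> float:
    """Kimura's diffusion approximation for a new mutation.

    u = (1 - exp(-4 Ne s p)) / (1 - exp(-4 Ne s)), with p = 1 / (2 Ne).
    Assumes additive selection, constant s, and census size equal to Ne.
    """
    p = 1.0 / (2 * n_e)
    big_s = 4 * n_e * s  # scaled selection coefficient, 4 Ne s
    if big_s == 0:
        return p  # strictly neutral: fixation probability equals p
    if big_s > 0:
        # expm1 keeps the small numerator accurate
        return math.expm1(-big_s * p) / math.expm1(-big_s)
    # Deleterious case: u = (e^{a p} - 1) / (e^{a} - 1) with a = -4 Ne s > 0,
    # evaluated in log space so that large a does not overflow.
    a = -big_s
    log_denominator = a + math.log1p(-math.exp(-a))
    return math.exp(math.log(math.expm1(a * p)) - log_denominator)


def relative_rate(s: float, n_e: int) -> float:
    """Substitution rate of a selected mutation relative to a neutral one."""
    return fixation_probability(s, n_e) * 2 * n_e


if __name__ == "__main__":
    s = -1e-4  # hypothetical, slightly deleterious replacement
    for n_e in (100, 1_000, 10_000, 100_000):
        rate = relative_rate(s, n_e)
        print(f"Ne={n_e:>7}  4Nes={4 * n_e * s:>6.2f}  relative rate={rate:.3g}")
```

For the same hypothetical s, the relative substitution rate is close to 1 (effectively neutral) at small Ne and falls toward zero as Ne grows, which illustrates why fixation rates respond sharply to changes in population size or selection intensity.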
Figure 5. Relationship between amino acid polymorphism (θa) and amino acid divergence (da) for 17 D. melanogaster loci. Plotted loci are the following: (1) Pgm, this study; (2) Sod, Hudson et al. (1994); (3) Tpi, Hasson et al. (1998); (4) Pgi, John H. McDonald, personal communication; (5) per, Kliman and Hey (1993a); (6) boss, Ayala and Hartl (1993); (7) Adh, McDonald and Kreitman (1991); (8) Gld, Hamblin and Aquadro (1997); (9) runt, Labate et al. (1999); (10) Pgd, Begun and Aquadro (1994); (11) Amy-p and (12) Amy-d, Inomata et al. (1995); (13) Est6, Cooke and Oakeshott (1989); Karotam et al. (1995); (14) ase, Hilton et al. (1995); (15) G6pd, Eanes et al. (1993); (16) Mst26Aa and (17) Mst26Ab, Aguadé et al. (1992). Boxed loci, Pgm (1) and G6pd (15), show significant deviations from neutrality by McDonald-Kreitman tests (both P < 0.001).
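The McDonald-Kreitman tests cited for Pgm and G6pd compare the ratio of replacement to synonymous changes that are fixed between species with the same ratio among polymorphisms within species. A minimal sketch, assuming a two-tailed Fisher's exact test on the 2 × 2 table (one common way to evaluate it; the original analysis may have used a different statistic), is shown below. The function names are hypothetical, and the counts in the usage line are invented for illustration and are not the Pgm data.

```python
from math import comb


def hypergeom_pmf(x: int, row1: int, col1: int, n: int) -> float:
    """Probability that the top-left cell equals x given fixed table margins."""
    return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)


def fisher_exact_two_sided(table: list[list[int]]) -> float:
    """Two-tailed Fisher's exact test for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    p_obs = hypergeom_pmf(a, row1, col1, n)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum the probabilities of all tables no more probable than the observed
    # one; the small relative tolerance guards against floating-point ties.
    return sum(
        p
        for p in (hypergeom_pmf(x, row1, col1, n) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-7)
    )


def mcdonald_kreitman(fixed_repl: int, fixed_syn: int,
                      poly_repl: int, poly_syn: int) -> float:
    """P value for independence of site class (replacement vs synonymous)
    and change class (fixed vs polymorphic)."""
    return fisher_exact_two_sided([[fixed_repl, poly_repl],
                                   [fixed_syn, poly_syn]])


if __name__ == "__main__":
    # Hypothetical counts: no fixed replacements, many replacement polymorphisms.
    print(mcdonald_kreitman(fixed_repl=0, fixed_syn=25, poly_repl=12, poly_syn=30))
```

A small P value with an excess of replacement polymorphism (rather than an excess of replacement fixations, as for G6pd) is the pattern the text describes for Pgm.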
The general observation of lower levels of amino acid polymorphism in D. simulans (Moriyama and Powell 1996) extends to Pgm. Although there are five amino acid polymorphisms in our sample, they are found only among the allozyme alleles and likely segregate as low-frequency polymorphisms in larger samples (additional copies of the allozyme alleles show that these amino acid polymorphisms are rare, but not singletons). Like the amino acid polymorphisms, many silent polymorphisms at Pgm segregate as singletons on rare haplotypes. This odd DNA haplotype structure in D. simulans is possibly explained by historical population structure and subsequent admixture (Ballard and Kreitman 1994; Hasson et al. 1998; Hamblin and Veuille 1999). The difference in levels of polymorphism between D. simulans and D. melanogaster has been explained by a reduced effective population size in D. melanogaster, resulting in higher levels of replacement polymorphism (if the majority of this class are slightly deleterious mutations) but lower levels of synonymous site variation, as seen here. This is further supported by the observation of increased levels of amino acid fixation in many genes in the D. melanogaster lineage (Akashi 1995, 1996; Eanes et al. 1996). Increased levels of amino acid fixation would be indicative of a long-term reduction in population size, but for Pgm one must then explain the absence of amino acid fixation in the face of abundant amino acid polymorphism.
Other features of the polymorphism at the Pgm locus are inconsistent with a recent decrease in population size and a release of deleterious mutations into the pool of amino acid polymorphisms. Recent contractions in population size with subsequent expansion distort the frequency spectrum toward mutations at low frequencies, and in the case of mtDNA variation there are typically excesses of replacement polymorphism in the singleton class (Nachman et al. 1994, 1996; Rand and Kann 1996; Hasegawa et al. 1998; Nielsen and Weinreich 1999), consistent with a slightly deleterious model. However, many of the Pgm amino acid replacements in the D. melanogaster lineage are found at substantial frequencies (only 6 of the 12 are singletons), and neither the Tajima test nor the Fu and Li test finds a significant skew in the frequency distribution of either amino acid or silent site variation. Because metabolic enzymes exhibit some of the highest codon bias among D. melanogaster genes (Kliman and Hey 1993b), we also examined silent site fixations between D. melanogaster and D. simulans for any significant lineage-specific divergence (Akashi 1995; Eanes et al. 1996). A significant excess of fixed unpreferred codons would indicate historically relaxed functional constraint at the Pgm locus. We find no significant difference in silent site fixation between lineages, nor is there evidence for the fixation of preferred or unpreferred codons at Pgm in either lineage when D. yakuba is used as an outgroup. Because codon bias for Pgm is only slightly above average [codon adaptation index (CAI) = 0.51, relatively low for a metabolic gene], we might not expect a significant trend for the fixation of preferred or unpreferred codons. Weak purifying selection acting on amino acid polymorphisms will not affect the polymorphism frequency spectrum of linked silent site variation (Akashi 1999; Przeworski et al. 1999).
Therefore, while an analysis of polymorphism frequency distributions for preferred and unpreferred codons is informative for demographic inferences, it cannot explain the excess of amino acid polymorphism.
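The codon adaptation index quoted above is the geometric mean, over the codons of a gene, of each codon's relative adaptiveness w: its count in a reference set of highly expressed genes divided by the count of the most used synonymous codon. The sketch below assumes a tiny, hypothetical reference table covering only three amino acids; a real calculation would use a full D. melanogaster reference set, and the function and table names are illustrative.

```python
import math

# Hypothetical reference codon counts from highly expressed genes, grouped by
# amino acid. Only three amino acids are included to keep the sketch short.
REFERENCE_COUNTS = {
    "Lys": {"AAG": 90, "AAA": 10},
    "Val": {"GTG": 60, "GTC": 30, "GTT": 8, "GTA": 2},
    "Gly": {"GGC": 70, "GGT": 20, "GGA": 8, "GGG": 2},
}


def relative_adaptiveness(reference: dict[str, dict[str, int]]) -> dict[str, float]:
    """w for each codon: its count divided by the count of the most used
    synonymous codon (zero counts would need a pseudocount before logging)."""
    weights = {}
    for codon_counts in reference.values():
        best = max(codon_counts.values())
        for codon, count in codon_counts.items():
            weights[codon] = count / best
    return weights


def codon_adaptation_index(cds: str, weights: dict[str, float]) -> float:
    """Geometric mean of w over codons that have a weight (Met, Trp, stop
    codons and amino acids absent from the reference are skipped)."""
    codons = (cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no scorable codons")
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    w = relative_adaptiveness(REFERENCE_COUNTS)
    print(round(codon_adaptation_index("ATGAAGGTGGGCAAAGTTTGGGGA", w), 3))
```

A gene using only the most frequent synonymous codons scores 1; values near 0.5, like the one reported for Pgm, indicate a mixture of preferred and unpreferred codons.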
A recent decrease in population size should also be reflected in other genes, since all are demographically influenced in the same fashion. However, there is no trend toward an excess of singletons for other D. melanogaster genes (Akashi 1995; Moriyama and Powell 1996). Thus, any period of relaxed functional constraint on PGM would have to have been fairly recent, given the absence of amino acid fixation at this locus. Given the recent colonization of North America by D. melanogaster (David and Capy 1988), the pattern of variation at Pgm may be an artifact of a founding population that has undergone a bottleneck. The Zimbabwe population provides an additional sample for comparison with the patterns of variation in the North American sample. Compared to cosmopolitan populations, estimates of nucleotide variation for other genes from Zimbabwe tend to show elevated levels of silent polymorphism, which is consistent with a much larger historical ancestral population size in Africa (Begun and Aquadro 1993; Eanes et al. 1996). Both Figure 1 and Figure 3 show that these two populations share the same genealogy at this locus. There is neither evidence for geographic clustering due to fixed differences nor an absence of shared polymorphism between population samples. If the analysis of polymorphism at Pgm is indicative of a decrease in population size for recently colonized areas, then we might expect lowered levels of variation in our North American sample. However, silent site variation is comparable in the two samples for Pgm, and the Zimbabwe population exhibits the same high level of amino acid polymorphism (Table 1). Although both samples harbor some private alleles for both silent and replacement polymorphism, the level of nucleotide diversity between samples is almost identical to the level of variation within samples.
While this is generally unusual for comparisons between Zimbabwe and non-Zimbabwe populations (Begun and Aquadro 1993), these patterns of variation are typical of a few D. melanogaster loci (Acp26Aa, Tsaur et al. 1998; cecropin, Clark and Wang 1997). This discrepancy suggests that selection had historically determined variation at several, but not all, loci in ancestral African populations before D. melanogaster colonized cosmopolitan regions (Begun et al. 1999). Finally, there is no excess of singletons for either class of mutation in the North American or the Zimbabwe sample, as would be expected if amino acid polymorphism had recently accumulated at this locus or if there had been a recent bottleneck affecting silent site variation.
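The comparison of nucleotide diversity between and within the North American and Zimbabwe samples can be summarized by a single differentiation statistic. A minimal sketch, assuming aligned, gap-free sequences of equal length and a Hudson-style estimator F_ST = 1 - π_within/π_between (not necessarily the calculation behind Table 1), is below; the function names and toy sequences are hypothetical.

```python
from itertools import combinations, product


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_within(sample: list[str]) -> float:
    """Average pairwise p-distance among sequences of one sample."""
    pairs = list(combinations(sample, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def mean_between(sample1: list[str], sample2: list[str]) -> float:
    """Average p-distance over all cross-sample sequence pairs."""
    pairs = list(product(sample1, sample2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def fst_hudson(sample1: list[str], sample2: list[str]) -> float:
    """F_ST = 1 - pi_within / pi_between. Values near (or, by sampling,
    below) zero mean between-sample diversity is no greater than within."""
    pi_within = (mean_within(sample1) + mean_within(sample2)) / 2
    return 1 - pi_within / mean_between(sample1, sample2)


if __name__ == "__main__":
    north_america = ["AAAA", "AAAT", "TTAA", "TTAT"]  # hypothetical
    zimbabwe = ["AAAA", "AAAT", "TTAA", "TTAT"]       # hypothetical
    print(round(fst_hudson(north_america, zimbabwe), 3))
```

Fixed differences between samples push the statistic toward 1, whereas shared polymorphism with between-sample diversity roughly equal to within-sample diversity, as reported for Pgm, keeps it near zero.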
A selection model that maintains amino acid polymorphisms within D. melanogaster while limiting amino acid fixation is required to explain the pattern of replacement mutation at the Pgm locus. Balancing selection predicts the persistence of lineages in the population longer than expected under genetic drift, and associated lineages should have elevated levels of between-lineage variation because of hitchhiking near the selected polymorphisms (Kaplan et al. 1988; Kreitman and Hudson 1991). At Pgm, there is no evidence of an excess of silent variation associated with the amino acid polymorphisms. The textbook case for balancing selection has become the study of Adh in D. melanogaster (Kreitman 1983; Kreitman and Hudson 1991), where a peak of excess silent polymorphism is found around the F/S polymorphism. This pattern of variation is novel and may be the exception rather than the rule for D. melanogaster loci, even where there is good ancillary evidence for natural selection (Kreitman and Akashi 1995; McDonald 1998). In addition, evidence suggests that this excess of silent site variation arose prior to the Fast allele, and that the peak of polymorphism around the F/S polymorphism is purely coincidental (Begun et al. 1999). The difficulty of using silent polymorphism to detect a pattern of balancing selection for Pgm is further complicated by the many amino acid polymorphisms. Because it is also unnecessary to invoke selection to explain all amino acid variation at the Pgm locus (some variation may have no adaptive value), it can be difficult to separate neutral background noise from the selected signal. There are also other reasons why an excess of silent site polymorphism may not always accompany a balanced polymorphism. If the local recombination rate is sufficiently high or the selected site is relatively recent, the signature of balancing selection will not be reflected in the genealogical structure of linked neutral variation (Hudson et al. 1994; Navarro et al. 2000). Balancing selection may also be difficult to detect if much adaptive polymorphism is short lived. In this case, the accumulation of neutral variation will be negligible and an excess of silent polymorphism will not be observed. Instead, adaptive polymorphisms that are short lived and relatively recent may cause linked neutral variation to be skewed toward an excess of rare alleles (Gillespie 1994, 1997).
Models with fluctuating selection might best explain the excess of amino acid polymorphisms seen at the Pgm locus in D. melanogaster. The long-term turnover of amino acid polymorphisms may be the result of selection operating in response to fluctuating and ephemeral environments. Because adaptive episodes or environments may be short lived, amino acid polymorphisms may be driven to intermediate frequencies by positive selection, but fail to reach fixation as they are replaced by a continuous traffic of amino acid polymorphism. This incomplete fixation process generates overlapping hitchhiking events, where silent variation linked to adaptive amino acid polymorphism is reduced. Wayne et al. (1996) explain an excess of amino acid polymorphism at the ref(2)P locus in D. melanogaster as an adaptive response for resistance to the rapidly evolving sigma virus. However, the level of overall amino acid fixation for this locus is also relatively high. The fact that amino acid fixation is rare for Pgm suggests that periods of environmental fluctuation are very short relative to the strength of positive selection acting on amino acid polymorphism.
These episodes may be difficult to detect in DNA sequences because, while hitchhiking under fluctuating selection may decrease heterozygosity, the resulting frequency distributions may not deviate from neutrality (Gillespie 1994, 1997; Braverman et al. 1995). Although the Tajima and the Fu and Li tests were not significant for the overall gene analysis, the trend of negative D values suggests an excess of rare alleles at Pgm. While these statistical tests lack the power to detect the effect of purifying selection on linked neutral variation (Golding 1997; Neuhauser and Krone 1997; Przeworski et al. 1999), they can detect distortions in genealogical structure resulting from recent and recurrent selective sweeps (Braverman et al. 1995). We asked whether an underlying structure in the sequence variation could explain the apparent excess of rare alleles (Depaulis et al. 1999). A subset of Pgm alleles with significantly fewer silent polymorphisms than expected under neutrality would be indicative of a mutation recently favored by directional selection. Because amino acid mutations have functional implications, we were especially interested in assessing the potential of specific amino acid polymorphisms to distort the genealogical structure in our sample. From the polymorphism data in Figure 1 and the genealogy structure implied in Figure 3, there are two lineages of Pgm sequences in the North American CRS that are defined by the amino acid polymorphism at nucleotide site 2055 (V484L). While our sample contains many amino acid polymorphisms, we focused on the site 2055 polymorphism because it is the only amino acid polymorphism that is intermediate in frequency and also lies at the base of the gene tree. For this analysis, lineage A is defined by a G at nucleotide site 2055 and lineage B by a T.
Both the Tajima test and the Fu and Li test were used to test the frequency distributions of silent and replacement polymorphisms independently for these two lineages, to detect deviations from those expected under a strictly neutral model (Table 4). Neither test found any deviation from neutrality for lineage B, while both tests found that the frequency distributions of both silent and replacement polymorphisms deviated significantly from neutrality for lineage A. Not only is less variation associated with lineage A, but low-frequency variants are also found in significant excess in this lineage. It is unclear whether selection is acting on the amino acid polymorphism at this defining site or on a site within or linked to this haplotype. However, it is clear that this specific Pgm amino acid haplotype may have recently increased in frequency. A fundamental feature of the coalescent process is that allele age, frequency, and intra-allele variability are correlated (Slatkin and Rannala 1997; Wiuf and Donnelly 1999). It should be emphasized that the pattern of variation associated with this Pgm amino acid haplotype does not reflect a new mutation, since the A lineage bears the ancestral state at nucleotide site 2055. An analysis of this variation along a latitudinal cline shows this haplotype markedly increasing in temperate climates (B. C. Verrelli and W. F. Eanes, unpublished data). This pattern of variation is consistent with a model of fluctuating selection in which a certain amino acid haplotype may increase in frequency and accumulate little variation over a short period of time, but is prevented from reaching fixation.
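Tajima's D contrasts two estimators of θ: the mean number of pairwise differences (π) and Watterson's estimator based on the number of segregating sites (S); negative values indicate an excess of rare variants, as reported for lineage A. The sketch below computes D for a set of haplotypes and splits a sample by the base at one aligned position, mirroring the lineage A/B partition at site 2055. It assumes aligned, gap-free sequences, does not compute P values (those are usually taken from tabulated bounds or coalescent simulations), and the function names and toy haplotypes are hypothetical.

```python
import math
from itertools import combinations


def tajimas_d(seqs: list[str]) -> float:
    """Tajima's D for aligned, gap-free haplotypes (at least four)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least four sequences")
    seg_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if seg_sites == 0:
        raise ValueError("no segregating sites")
    # pi: mean number of pairwise differences per sequence pair
    pi = sum(sum(x != y for x, y in zip(a, b))
             for a, b in combinations(seqs, 2)) / math.comb(n, 2)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)


def split_by_site(seqs: list[str], position: int) -> dict[str, list[str]]:
    """Partition haplotypes by the base at one aligned position (0-based)."""
    groups: dict[str, list[str]] = {}
    for seq in seqs:
        groups.setdefault(seq[position], []).append(seq)
    return groups


if __name__ == "__main__":
    # Hypothetical haplotypes; position 0 plays the role of the defining site.
    sample = ["GAAAAAAA", "GAAAAAAT", "GAAAATAA", "GTAAAAAA", "GAAAAAAA",
              "TACCAAAA", "TACCGAAA", "TAACGAAA", "TACAGAAA"]
    for allele, lineage in split_by_site(sample, 0).items():
        print(allele, len(lineage), round(tajimas_d(lineage), 3))
```

In this toy sample every polymorphism in the G lineage is a singleton, so that lineage returns a negative D, the same direction of skew described for lineage A.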
Table 4. Summary parameter estimates and neutrality tests for the site 2055 lineages
Association with the In(3L)P inversion: Inversion polymorphisms are a pervasive feature of Drosophila genomes, and their role in structuring genic variation by suppressing recombination is an enduring question. In this regard the Pgm locus occupies a potentially interesting chromosomal position. We had initially mapped the cytological position of Pgm by in situ hybridization to third chromosome bands 72D1-5, which are immediately inside the proximal breakpoint (73E3) of the In(3L)P inversion. With the completion of the entire Drosophila genome sequence, the exact location of Pgm is determined to be ∼180 kb inside the proximal inversion breakpoint. Despite this close proximity and the apparent old age of the In(3L)P inversion (Wesley and Eanes 1994), there are no fixed differences between Pgm alleles on different arrangements, and there are also shared polymorphisms indicating genetic exchange. Recombination is expected to be rare near inversion breakpoints because of difficulty in homologous pairing in inversion heterozygotes (Krimbas and Powell 1993), and this restriction has apparently facilitated fixations between the standard and the In(3L)P arrangements in the immediate vicinity of the breakpoints (Wesley and Eanes 1994; Hasson and Eanes 1996). Gene conversion is the likely mechanism for genetic exchange near inversion breakpoints, and it can explain the exchange of at least two small tracts of Pgm sequence between standard and inverted arrangements, as well as the three In(3L)P sequence clusters that are distributed among the standard arrangements in Figures 1 and 3. Apparent gene conversion has been reported for rp49 on standard and inverted arrangements in D. subobscura (Rozas and Aguadé 1994; Rozas et al. 1999). This evidence of exchange, combined with the fact that the same PGM allozyme allele arises from different amino acid changes, explains the lack of association between PGM allozyme variation and this inversion in earlier studies (Langley et al. 1974, 1977).
Although gene conversion has contributed to exchange between arrangements, it is possible to infer the ancestral Pgm sequence initially captured by the inversion event. Two Pgm sequences found on inverted chromosomes DPF95 48.2 and HFL97 93 appear ancestral to all standard sequences in the genealogy in Figure 3. This is in accord with Hasson and Eanes (1996), who showed that alleles on the In(3L)P chromosomes coalesce earlier in time than all alleles on the standard chromosomes. This basal relationship may be expected if this inversion is under balancing selection (Mettler et al. 1977; Knibb et al. 1981). Pgm alleles on In(3L)P chromosomes do not show elevated levels of polymorphism, nor are there any fixed differences between arrangements; however, gene conversion at the Pgm locus will erode any effect of balancing selection. Although there is evidence of exchange between arrangements, Pgm alleles on inverted chromosomes that cluster ancestral to all other alleles may represent the allele first captured by the inversion.
The pattern of linkage disequilibrium: The linkage disequilibrium analysis of the entire 2354-bp region of Pgm shows that many sites are strongly correlated. This chromosomal region is predicted to have moderate rates of recombination (see Hudson and Kaplan 1995). Using the estimator of the recombination parameter C = 4Nc from Hudson (1987), which is based on the variance in the number of pairwise differences, we obtain a locus-specific value of C = 11.20. Dividing this value by our locus-specific estimate of θ gives an estimated 1.37 effective recombination events per mutation event (c/μ). The estimates for a few D. melanogaster loci, Mlc1 (13.4; Leicht et al. 1995) and Tpi (8.89; Hasson et al. 1998), are larger, while those for G6pd (1.7; Eanes et al. 1996), Adh (1.6; Hudson 1987), and Sod (0.80; Hudson et al. 1994) are comparable to Pgm. There is also direct evidence of recombination in our sample from inspection of Figures 1 and 3, and using the criteria of Hudson and Kaplan (1985) at least five recombination events are detected in the Davis Peach Farm CRS. The absence of amino acid fixation, together with a normal level of silent site divergence at Pgm, implies that despite ubiquitous linkage disequilibrium across the Pgm locus, recombination has been sufficient that amino acid polymorphism has been historically uncoupled from silent site polymorphism.
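Because C = 4Nc and θ = 4Nμ share the factor 4N, their ratio estimates c/μ directly; with C = 11.20 and c/μ = 1.37, the implied locus-wide θ is about 11.20/1.37 ≈ 8.2. The four-gamete criterion of Hudson and Kaplan (1985) flags two biallelic sites as incompatible when all four two-site haplotypes occur, which under an infinite-sites model requires at least one recombination event between them; the minimum number of events, R_m, is the largest set of incompatible intervals that do not overlap. A minimal sketch, assuming aligned, gap-free haplotypes in which segregating sites are biallelic, is below; the function names and toy haplotypes are hypothetical.

```python
from itertools import combinations


def incompatible_pairs(seqs: list[str]) -> list[tuple[int, int]]:
    """Pairs of biallelic sites that show all four two-site haplotypes."""
    columns = list(zip(*seqs))
    biallelic = [i for i, col in enumerate(columns) if len(set(col)) == 2]
    return [(i, j) for i, j in combinations(biallelic, 2)
            if len(set(zip(columns[i], columns[j]))) == 4]


def hudson_kaplan_rm(seqs: list[str]) -> int:
    """Minimum number of recombination events: the largest set of incompatible
    intervals whose interiors do not overlap (greedy by right endpoint)."""
    rm, last_end = 0, -1
    for start, end in sorted(incompatible_pairs(seqs), key=lambda iv: iv[1]):
        if start >= last_end:  # intervals sharing only an endpoint are disjoint
            rm += 1
            last_end = end
    return rm


if __name__ == "__main__":
    haplotypes = ["0000", "0011", "1010", "1100", "0110"]  # hypothetical
    print(incompatible_pairs(haplotypes), hudson_kaplan_rm(haplotypes))
```

Choosing intervals by earliest right endpoint is the standard greedy solution for a maximum set of non-overlapping intervals, so the count is a lower bound on the number of recombination events in the sample's history.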
It is likely that much of this disequilibrium has a recent mutational origin: a new mutation is immediately associated with the other sites on the allele on which it first arises. Because new mutations persist as low-frequency variants, these initial associations are reduced by recombination only after sufficient time has passed. Our sample shows evidence of recombination, but the detected events involve intermediate-frequency mutations. Much of the disequilibrium appears to be due to a single amino acid haplotype (defined by a G at nucleotide site 2055) that is at high frequency and carries a significant excess of low-frequency variants. Although the entire sample shows a trend toward rare mutations and association between sites, the strong disequilibrium associated with this amino acid haplotype is likely the result of recent directional selection.
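Two common summaries of pairwise linkage disequilibrium are r² and |D′|. A new mutation that arises on one haplotype and has not yet recombined gives |D′| = 1 but a small r², because r² is limited by the difference in allele frequencies between the two sites. The sketch below assumes biallelic sites supplied as columns of aligned haplotypes; the function name and example columns are hypothetical.

```python
def ld_stats(site_i: str, site_j: str) -> tuple[float, float, float]:
    """Return (D, r^2, |D'|) for two biallelic sites given as aligned columns."""
    alleles_i, alleles_j = sorted(set(site_i)), sorted(set(site_j))
    if len(alleles_i) != 2 or len(alleles_j) != 2:
        raise ValueError("both sites must be biallelic")
    n = len(site_i)
    a, b = alleles_i[0], alleles_j[0]  # reference allele at each site
    p_a = site_i.count(a) / n
    p_b = site_j.count(b) / n
    p_ab = sum(x == a and y == b for x, y in zip(site_i, site_j)) / n
    d = p_ab - p_a * p_b
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    # D' scales D by its maximum attainable magnitude given allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d, r2, abs(d) / d_max


if __name__ == "__main__":
    # A rare variant (the single T) that has not recombined off its background
    print(ld_stats("GGGT", "AACC"))
```

In the usage line the single T always occurs with C, so |D′| = 1 while r² is only 1/3, which is the signature expected for young, low-frequency mutations.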
Within-species allozyme heterogeneity: This study clearly shows that the PGM allozyme mobility classes can be heterogeneous mixtures of amino acid replacements, where different replacements often converge on the same electrophoretic mobility. This within-allozyme heterogeneity had been predicted from thermostability studies on PGM (Trippa et al. 1976, 1978) and may explain the failure to detect functional differences in kinetic studies (Fucci et al. 1979; Carfagna et al. 1980), the absence of linkage disequilibrium between allozyme mobility classes and the In(3L)P inversion (Langley et al. 1974, 1977), and the reported absence of latitudinal clines for PGM allozyme alleles (Oakeshott et al. 1981). The observation of three allozyme mobility alleles shared between D. melanogaster and D. simulans warranted study. The Medium allozyme allele in D. simulans is identical in amino acid sequence to one of the D. melanogaster Medium allozyme alleles, so both species still share a common ancestral amino acid sequence from which numerous alleles emerge. However, the other Medium alleles and the derived Fast and Slow mobility classes have different mutational origins in the two species. Katz and Harrison (1997) found that although two distantly related cricket species in the genus Gryllus shared six allozyme mobility alleles for phosphoglucose isomerase, the two species shared no amino acid polymorphisms. Both studies emphasize how tenuous studies of enzyme variation and function can be when based solely on electrophoretic data, especially at highly polymorphic allozyme loci.
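Convergence of different replacements on one mobility class follows from a simple charge-state view of electrophoresis, in which mobility depends mainly on the net charge change of the protein. The sketch below treats that view as an assumption: it counts only Asp, Glu, Lys, and Arg as charged (His treated as uncharged), ignores conformational and local-context effects, and uses hypothetical substitution labels and function names rather than the Pgm replacements.

```python
from collections import defaultdict

# Approximate side-chain charges at a basic running pH (assumption:
# His treated as uncharged; terminal groups and local context ignored).
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def charge_change(substitutions: list[str]) -> int:
    """Net charge change for substitutions written as e.g. 'E101K'."""
    return sum(CHARGE.get(sub[-1], 0) - CHARGE.get(sub[0], 0)
               for sub in substitutions)


def group_by_mobility(haplotypes: dict[str, list[str]]) -> dict[int, list[str]]:
    """Group haplotypes whose net charge change (and so, under a pure
    charge-state model, electrophoretic mobility) is identical."""
    classes: dict[int, list[str]] = defaultdict(list)
    for name, subs in haplotypes.items():
        classes[charge_change(subs)].append(name)
    return dict(classes)


if __name__ == "__main__":
    # Hypothetical haplotypes, listed as replacements from a reference sequence.
    haplotypes = {
        "ref": [],
        "h1": ["A45T"],            # charge-neutral replacement
        "h2": ["E101K"],           # net +2
        "h3": ["N150K"],           # net +1
        "h4": ["D230N", "A300T"],  # net +1 via a different replacement
    }
    print(group_by_mobility(haplotypes))
```

Under this model h3 and h4 fall into the same mobility class despite sharing no replacement, and h1 is electrophoretically invisible, which is how a single allozyme class can hide several distinct amino acid haplotypes.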
Although D. yakuba does not share the same allozyme mobility alleles with D. melanogaster and D. simulans, it exhibits the same high level of allozyme variability. Three allozyme mobility alleles, two of which are high in frequency, were found in a screen of only 30 isofemale lines from the West African population sample. Our sample of D. yakuba Pgm sequences segregates four replacement polymorphisms (which are all responsible for electrophoretic differences), and like D. melanogaster and D. simulans, different amino acid replacements result in the same allozyme mobility alleles (our unpublished data). A preliminary analysis of this sample also shows a normal level of silent site divergence between the D. melanogaster-simulans lineage and the D. yakuba lineage at Pgm, yet a low number of amino acid fixations.
Adaptive protein evolution in the glycolytic pathway: D. melanogaster has been an important model for studying selection on enzyme polymorphisms dating back to the earliest allozyme studies. While there is evidence for selection on some metabolic enzymes, it is of fundamental importance to understand why specific enzymatic points in the pathway possess protein polymorphisms and rapid evolution, while others do not. Because of their position and intrinsic ability to allocate substrate into competing minor pathways (LaPorte et al. 1984; Keightley and Kacser 1987; Keightley 1989), branch point enzymes may be subject to adaptive evolution at the molecular level. The location of PGM at a branch point at the head of the metabolic pathway and its high allozyme polymorphism suggested adaptive protein evolution. Like G6pd (another branch point enzyme competing for the same substrate), Pgm shows significant patterns of adaptive replacement polymorphism, but unlike Adh (Kreitman 1983; Kreitman and Hudson 1991) and possibly G6pd (McDonald 1998), Pgm does not exhibit the patterns of silent site variation expected under long-term balancing selection. It is possible that selection maintains diversifying variation in the competing storage of glycogen or that amino acid polymorphisms at Pgm arise from structure-function trade-offs between enzyme activity and stability (see Eanes 1999). The absence of any geographic variation for PGM allozymes was an unusual observation for a metabolic enzyme in D. melanogaster (Oakeshott et al. 1981); however, we find that amino acid haplotypes at the Pgm locus in D. melanogaster show strong and significant clines with latitude (B. C. Verrelli and W. F. Eanes, unpublished data). This adds compelling evidence for the selective maintenance of amino acid polymorphism at this locus.
Acknowledgments
The authors thank Ing-Nang Wang and Mike McCartney for their help in the early stages of the project with the recovery of the initial Pgm transcript. We also thank John H. McDonald for his interest in this project and useful comments on an earlier version of the manuscript. Jody Hey and two anonymous reviewers provided helpful criticism in revision. This research was supported by National Science Foundation grant DEB9318381 and U.S. Public Health Service grant GM-45247 to W.F.E. This is contribution number 1074 from the Graduate Program in Ecology and Evolution, State University of New York at Stony Brook.
Footnotes
Communicating editor: J. Hey
- Received April 18, 2000.
- Accepted July 31, 2000.
- Copyright © 2000 by the Genetics Society of America