Genetics, Vol. 153, 1357-1369, November 1999, Copyright © 1999
Molecular Evolution of Two Linked Genes, Est-6 and Sod, in Drosophila melanogaster
Evgeniy S. Balakirev (a, b, c),
Elena I. Balakirev (a),
Francisco Rodríguez-Trelles (a, d), and
Francisco J. Ayala (a)
a Department of Ecology and Evolutionary Biology, University of California, Irvine, California 92697-2525,
b Institute of Marine Biology, Vladivostok 690041, Russia,
c Department of General Biology, Ecology and Soils, Far Eastern State University, Vladivostok 690600, Russia
d Departament de Genética, Universitat Autónoma de Barcelona, 08193 Bellaterra (Barcelona), Spain
Corresponding author:
Francisco J. Ayala, Department of Ecology and Evolutionary Biology, 321 Steinhaus Hall, University of California, Irvine, CA 92697-2525. E-mail: fjayala@uci.edu
Communicating editor: M. SLATKIN
 | ABSTRACT |
|---|
We have obtained 15 sequences of Est-6 from a natural population of Drosophila melanogaster to test whether linkage disequilibrium exists between Est-6 and the closely linked Sod, and whether natural selection may be involved. An early experiment with allozymes had shown linkage disequilibrium between these two loci, while none was detected between other gene pairs. The Sod sequences for the same 15 haplotypes were obtained previously. The two genes exhibit similar levels of nucleotide polymorphism, but the patterns are different. In Est-6, there are nine amino acid replacement polymorphisms, one of which accounts for the S-F allozyme polymorphism. In Sod, there is only one replacement polymorphism, which corresponds to the S-F allozyme polymorphism. The transversion/transition ratio is more than five times larger in Sod than in Est-6. At the nucleotide level, the S and F alleles of Est-6 make up two allele families that are quite different from each other, while there is relatively little variation within each of them. There are also two families of alleles in Sod, one consisting of a subset of F alleles, and the other consisting of another subset of F alleles, designated F(A), plus all the S alleles. The Sod F(A) and S alleles are completely or nearly identical in nucleotide sequence, except for the replacement mutation that accounts for the allozyme difference. The two allele families have independent evolutionary histories in the two genes. There are traces of statistically significant linkage disequilibrium between the two genes that, we suggest, may have arisen as a consequence of selection favoring one particular sequence at each locus.
THE understanding of the genome as an aggregate of relatively independent genes (bean-bag genetics) was a feature of the classical period of genetics. Gene interaction (epistasis) played a primary role in the theory of evolution starting in the 1920s (WRIGHT 1931; DOBZHANSKY 1937; SCHMALHAUSEN 1946; MATHER 1953; WADDINGTON 1957; MAYR 1963). Linkage disequilibrium and nonrandom associations between alleles or groups of nucleotides may indicate epistatic relationships, and much empirical work has been devoted over several decades to demonstrating that linkage disequilibrium occurs between nonallelic genes. The issue of gene interaction is also important in connection with the long-lasting neutralist-selectionist controversy that has generated much research in population genetics for nearly 30 years (KIMURA 1968, KIMURA 1983; KIMURA and OHTA 1971; AYALA et al. 1971, AYALA et al. 1972A, AYALA et al. 1972B; LEWONTIN 1974; and many others). Linkage disequilibrium is often considered strong evidence supporting the selectionist position, especially if its pattern is consistent between populations (LEWONTIN 1974).
The evidence for linkage disequilibrium between individual loci remains scarce, except when genes are very closely linked or associated with chromosomal inversions (reviewed by LANGLEY 1977; HEDRICK et al. 1978; BARKER 1979; KRIMBAS and POWELL 1993). In the cases when significant associations have been detected, it is often far from clear whether they are caused by nonrandom haplotype sampling, random genetic drift, or natural selection (MUKAI et al. 1974; MUKAI and VOELKER 1977). Significant disequilibrium can indeed arise without epistasis as a result of random genetic drift within a given population (HILL and ROBERTSON 1968; OHTA and KIMURA 1969A, OHTA and KIMURA 1969B; HILL 1975, HILL 1976), in subdivided populations (NEI and LI 1973; LI and NEI 1974; FELDMAN and CHRISTIANSEN 1975; OHTA 1982A, OHTA 1982B), and by founder effects (AVERY and HILL 1979).
Numerous examples of significant linkage disequilibrium have been discovered in Drosophila between specific allozymes and chromosomal inversions, which have been interpreted as reflecting selection for favored multilocus allele combinations (PRAKASH and LEWONTIN 1968; PRAKASH 1974; ZOUROS 1976; VOELKER et al. 1978). ISHII and CHARLESWORTH 1977 and NEI and LI 1980 have, however, shown that nonrandom associations between allozymes and inversions can be explained by absent or limited recombination without selection (as a consequence of chance and insufficient time for associations generated by mutation to decay by recombination and gene conversion). The inference from the allozyme studies is, therefore, that linkage disequilibrium is mostly associated with closely linked genes, but may involve distantly linked genes when special cytological mechanisms (polymorphic inversions) allow it to exist. Allozyme loci that can recombine freely exhibit little, if any, linkage disequilibrium. Failure to detect disequilibrium may, of course, be a consequence of the limited statistical power of the tests to detect it (BROWN 1975), which might be overcome by using larger sample sizes and combining probabilities from independent tests (BROWN 1975; ZAPATA and ALVAREZ 1992, ZAPATA and ALVAREZ 1993).
The introduction of DNA sequencing and other molecular techniques in population studies makes it possible to gain considerable information about linkage disequilibrium. Nonrandom associations have been detected between polymorphic sites of Adh, Adh-Dup (SCHAEFFER and MILLER 1993), and Est-5B (VEUILLE and KING 1995) in Drosophila pseudoobscura; vermilion (BEGUN and AQUADRO 1995) and G6pd (EANES et al. 1996) in D. simulans; and the following in D. melanogaster: Adh and Adh-Dup (KREITMAN and HUDSON 1991), vermilion (BEGUN and AQUADRO 1995), Pgd (BEGUN and AQUADRO 1994), white (KIRBY and STEPHAN 1995, KIRBY and STEPHAN 1996), G6pd (EANES et al. 1996), dpp (RICHTER et al. 1997), Acp70A (CIRERA and AGUADE 1997), period (ROSATO et al. 1997), and Acp26Aa and Acp26Ab (AGUADE 1998). But, as with allozymes, a positive correlation between the presence of chromosome inversions and the extent of linkage disequilibrium is often found (AQUADRO 1993, and references therein). Nevertheless, DNA linkage disequilibrium between nucleotide sites is well established in some cases, as is the fact that it reflects epistatic relationships (KIRBY et al. 1995; KIRBY and STEPHAN 1996), although the nature of the epistatic interactions between genes remains enigmatic.
In this article, we investigate the nucleotide polymorphisms in the Est-6 and Sod genes of D. melanogaster and compare them in the two genes, seeking to identify processes that contribute to the polymorphisms. We test whether linkage disequilibrium may occur between these two fairly closely linked genes, as has been intimated by the results of SMIT-MCBRIDE et al. 1988, who investigated natural and laboratory populations and detected linkage disequilibrium between the allozyme polymorphisms of Sod and Est-6, but not between any other gene pairs. HUDSON et al. 1994, HUDSON et al. 1997 have, moreover, shown that a selective sweep has recently occurred involving a many-kilobases-long region that includes the Sod gene. Est-6 and Sod are closely linked on the left arm of chromosome 3 of D. melanogaster, ~1000 kb apart (PROCUNIER et al. 1991; HEINO et al. 1994; HARTL and LOZOVSKAYA 1995). The two genes are examined in a set of 15 haplotypes from a natural population in California.
 | MATERIALS AND METHODS |
|---|
Drosophila strains:
The 15 D. melanogaster strains were derived from wild flies collected by F. J. Ayala (October 1991) in El Rio Vineyard (Acampo, CA). The strains were made fully homozygous for the third chromosome by means of crosses with balancer stocks, as described by SEAGER and AYALA 1982. The homozygous strains were named in accordance with the Cu,Zn superoxide dismutase (SOD) electrophoretic allele they carry, Fast (F) or Slow (S), as follows: 255S, 438S, 510S, 521S, 94F, 174F, 377F, 483F, 498F, 521F, 565F, 581F, 968F, 517S, and 357F. The Sod gene sequence of these strains has been investigated previously in our laboratory (HUDSON et al. 1994, HUDSON et al. 1997).
Allozyme electrophoresis:
Twenty flies from each D. melanogaster strain were homogenized in 20 µl 0.1 M Tris-HCl buffer, pH 8.0. The homogenates were electrophoresed for 8–9 hr using a Tris-borate-EDTA continuous buffer system, pH 8.6 (MARKERT and FAULHABER 1965) at 100 V in an 11% starch gel. The gels were stained for esterase-6 (EST-6) activity according to standard procedures (MANCHENKO 1994). See AYALA et al. 1972B for additional details.
DNA extraction, amplification, and sequencing:
Total genomic DNA was extracted using the procedure described by PALUMBI et al. 1991.
We used the Est-6 sequence, previously published by COLLET et al. 1990, for designing PCR and sequence primers. The amplified fragments (2017 bp long) included 56 bp of the 5'-flanking region, the Est-6 gene, the intergenic region, and 82 bp of the ψEst-6 gene (Figure 1). The ψEst-6 gene has been referred to in the literature as Est-P, but the evidence indicates that it is a pseudogene (BALAKIREV and AYALA 1996).
The two primers used for the PCR amplification reactions (1 and 2, Figure 1) were 5'-gcaattgccgcatctcaagatagt-3' (forward primer) and 5'-caacaatcaagggatcagcttcag-3' (reverse primer). All PCR reactions were carried out, as described by KWIATOWSKI et al. 1991, in final volumes of 100 µl containing 40 µM each of dNTP, 2.5 units of AmpliTaq DNA Polymerase (Perkin Elmer, Norwalk, CT), 0.2 µM each of forward and reverse primers, buffer (Perkin Elmer) at a final concentration of 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 1.5 mM MgCl2, and ~50 ng of template (total genomic) DNA. The mixtures were overlaid with mineral oil, placed in a DNA thermal cycler (Perkin Elmer), incubated 5 min at 95°, and subjected to multiple cycles of denaturation, annealing, and extension under the following conditions: 95° for 1 min, 56° for 1 min, and 72° for 2 min (for the first cycle and progressively adding 3 sec at 72° for every subsequent cycle). After 30 cycles, a final 7-min extension period at 72° concluded each amplification reaction. Samples were stored at 4° for several hours or at -20° for up to 2 wk.
One-tenth of each reaction volume was assayed on a 0.8% agarose gel. If the desired PCR product was detected, the remainder of the reaction volume was purified with the Wizard PCR preps DNA purification system (Promega, Madison, WI). The purified PCR product was directly sequenced by the dideoxy chain termination technique (SANGER et al. 1977) using Dye Terminator chemistry and separated with the ABI PRISM 377 automated DNA sequencer (Perkin Elmer). For each line, the sequences of both strands were determined. Eight internal primers spaced on average 350 nucleotides apart (a–d for the 5'-to-3' direction and a'–d' for the 3'-to-5' direction; Figure 1) were used for sequencing Est-6: (a) 5'-ccaacaaattggtaggagagg-3', (b) 5'-aactatggactgaaagatcaacg-3', (c) 5'-ctgtattggagccatcggatg-3', (d) 5'-gatcttcatcgcaaatatgg-3', (a') 5'-tcacgcatcacgttctcgtgtcc-3', (b') 5'-gcgtctggagcatccgatggctcc-3', (c') 5'-tcgaatatcaaaaagtagtcgtc-3', and (d') 5'-aaattccacatgctgcctattct-3'. The Est-6 sequences reported in this article have been deposited in the GenBank sequence database library under accession nos. AF147095, AF147096, AF147097, AF147098, AF147099, AF147100, AF147101, AF147102.
For the Sod gene, we analyzed a region (1408 bp long) that included 43 bp of exon I, the intron (725 bp), exon II (396 bp), and 244 bp of the 3' flanking region. The DNA preparation, amplification, cloning, and sequencing procedures for the Sod gene are described by HUDSON et al. 1994, HUDSON et al. 1997.
DNA sequence analysis:
All primers were designed using the computer program DNASIS for Windows (1994, Hitachi Software Engineering), which allows one to check the secondary structure of primers. Multiple alignment was carried out manually, using the program DARWIN (elaborated by Robert Tyler from our laboratory), and automatically, using the program CLUSTAL W (THOMPSON et al. 1994). Two DNA sequences were obtained from the GenBank database: D. melanogaster, Est-6 (accession no. M33780), and D. simulans, Sod (accession no. X15685). The maximum parsimony analysis of the Est-6 and Sod sequences was performed with the computer program PAUP (SWOFFORD 1993). The computer program DNASP (ROZAS and ROZAS 1997) was used to analyze the data by means of a "sliding window" (HUDSON and KAPLAN 1988) and was used for most intraspecific analyses.
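As a rough illustration of the sliding-window idea, the following Python sketch computes nucleotide diversity (mean pairwise differences per site) in overlapping windows along an alignment. It is not the DNASP implementation; the function names, the toy alignment, and the window settings in the example are assumptions made only for illustration.

```python
from itertools import combinations

def pairwise_diversity(seqs):
    """Mean number of pairwise differences per site for aligned sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)

def sliding_pi(seqs, window=100, step=1):
    """Return (window midpoint, diversity) for each window along the alignment."""
    length = len(seqs[0])
    result = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        result.append((start + window // 2, pairwise_diversity(chunk)))
    return result

# Hypothetical toy alignment: one variable site at the last position.
toy = ["AAAAAA", "AAAAAT", "AAAAAT"]
profile = sliding_pi(toy, window=3, step=1)
```

With real data, `seqs` would hold the aligned haplotypes and `window=100, step=1` would match the settings given in the figure legends.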
Linkage disequilibrium analysis:
Linkage disequilibrium within and between Est-6 and Sod was evaluated using Fisher's exact test and chi-square test for independence between sites. Singleton polymorphisms (mutations appearing in only one sampled allele) were omitted. Lewontin's sign test, elaborated especially for molecular sequence data (LEWONTIN 1995), was also used to test the significance of linkage disequilibrium between Est-6 and Sod, including singletons. LEWONTIN 1995 has shown that Fisher's exact test is not sensitive in cases with very asymmetrical distribution of allele frequencies, a situation that is commonly observed in molecular sequence data, including our data. In such a situation, there is, for numerical reasons, very low probability of obtaining a significant test by means of the usual two-by-two contingency tables (Fisher's exact test or chi-square test). To overcome this problem, LEWONTIN's (1995) test is based on the distribution of the disequilibrium sign ("sign" test), which allows the analysis of asymmetrical allele frequency data to make inferences about overall linkage disequilibrium. The procedure involves examining the number of positive and negative D values for each polymorphic site within and between all types of pairwise comparisons (singletons vs. singletons, singletons vs. doublets, doublets vs. doublets, and so on; LEWONTIN 1995). The observed value, summed for a particular type of pairwise comparison, is compared with the expected value using a goodness-of-fit test (the likelihood ratio statistic, G-test).
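The sign of D for a single pair of sites can be sketched as follows. This is a minimal illustration, not the procedure of LEWONTIN 1995 in full: it assumes biallelic sites, defines D with respect to the less frequent allele at each site (so that a negative D means the rarer alleles tend to sit on different haplotypes), and uses invented toy columns of 15 haplotypes. The function names are hypothetical.

```python
from collections import Counter

def minor_allele(column):
    """Less frequent allele at a biallelic site (ties broken alphabetically)."""
    counts = Counter(column)
    return min(sorted(counts), key=lambda allele: counts[allele])

def disequilibrium(site1, site2):
    """D = f(minor1 and minor2 together) - f(minor1) * f(minor2)."""
    n = len(site1)
    m1, m2 = minor_allele(site1), minor_allele(site2)
    p1 = sum(a == m1 for a in site1) / n
    p2 = sum(b == m2 for b in site2) / n
    p12 = sum(a == m1 and b == m2 for a, b in zip(site1, site2)) / n
    return p12 - p1 * p2

# Toy doublet-vs-doublet comparisons over 15 haplotypes (assumed data).
site_a = "GG" + "A" * 13
site_b_repulsion = "CC" + "TT" + "C" * 11   # rare T on other haplotypes
site_b_coupling = "TT" + "C" * 13           # rare T on the same haplotypes

d_repulsion = disequilibrium(site_a, site_b_repulsion)  # negative sign
d_coupling = disequilibrium(site_a, site_b_coupling)    # positive sign
```

Counting how many such pairwise D values are negative, by frequency class, gives the observed numbers that the sign test compares with expectation.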
 | RESULTS |
|---|
The genes Est-6 and Sod are on the left arm of chromosome 3 of D. melanogaster, genetically mapped at 35.9 and 32.5 and located at 69A1-A5 and 68A8-A9 on the polytene chromosomes, respectively (HEINO et al. 1994).
Allozyme polymorphism:
We have analyzed Est-6 in 15 D. melanogaster strains, fully homozygous for the third chromosome, derived from flies collected in the El Rio Vineyard in California. The strains were chosen because their Sod gene sequences have been previously investigated in our laboratory (HUDSON et al. 1994, HUDSON et al. 1997) and may be considered as randomly selected with respect to their Est-6 alleles. Electrophoretic analysis reveals two common EST-6 allozymes: Fast (357F and 517S strains) and Slow (255S, 438S, 510S, 521S, 94F, 174F, 377F, 483F, 498F, 521F, 565F, 581F, and 968F strains). (The S or F in each strain's designation refers to the SOD allele it carries; see MATERIALS AND METHODS.)
Nucleotide polymorphism:
The organization of the Est-6 gene is outlined in Figure 1. The sequenced region is 1879 bp long, comprising the Est-6 gene and the 193 bp of the intergenic region between Est-6 and ψEst-6. Table 1 displays the Est-6 polymorphisms observed in the 15 D. melanogaster strains. There are 35 nucleotide polymorphic sites, 9 of which yield amino acid replacements.
Table 2 gives the values of π and θ for the two genes, Est-6 and Sod. The data for Sod are obtained from HUDSON et al. 1997. The overall nucleotide diversity is somewhat larger for Sod. However, the diversity is about the same for the coding or the noncoding regions considered separately (Sod has greater overall diversity because the noncoding region is larger). The diversity, as expected, is much greater in the noncoding than in the coding regions, reflecting the usual higher level of functional constraint in the coding regions of genes. The distribution of polymorphic sites among the Sod haplotypes is extremely nonuniform: 69.6% of the Sod polymorphic sites are introduced by just two haplotypes (498F and 968F; see HUDSON et al. 1994, HUDSON et al. 1997). It is also quite polarized in the case of Est-6, with 62.9% of the polymorphic sites resulting from two haplotypes, 517S and 357F, which are the only two Est-6 F alleles in the sample (shown at the bottom of Table 1). When these two pairs of sequences are removed, the π and θ values considerably decrease for both genes; but in the coding region, the decrease is somewhat greater for Est-6 (Table 2). This difference may be due to different selective constraints in the two genes, but it may also reflect the different histories of the particular sets of alleles included in our analysis (see DISCUSSION).
There are 26 polymorphic nucleotide sites (1.6%) in the coding region of Est-6, but only 5 (1.1%) in Sod. In Est-6, the transversion/transition (Tv/Ts) ratio is 3/23 = 0.130, much lower than expected from random mutation, reflecting the usual selection effect against Tv. The Tv/Ts ratio for Sod is significantly higher than for Est-6 (2/3 = 0.667). The ratio of replacement to synonymous segregating sites is lower for Sod (1/4 = 0.250) than for Est-6 (9/17 = 0.529). The Sod data of our study do not come from a random sample, since the number of S alleles was made intentionally larger (33%) in the sample than would be expected in a random sample (frequency of S in the natural population is ~5–15%; see HUDSON et al. 1994, HUDSON et al. 1997). Thus, in our sample, there are more replacement substitutions in Sod than would be expected and fewer synonymous substitutions, since the S alleles are all completely (or very nearly) identical in sequence, reflecting their recent origin (HUDSON et al. 1994, HUDSON et al. 1997). In the case of Est-6, KAROTAM et al. 1993, KAROTAM et al. 1995 observed a relatively high Rep:Syn ratio in D. melanogaster, D. simulans, and D. mauritiana. MORIYAMA and POWELL 1996 indicate that D. melanogaster tends to have the highest incidence of replacement polymorphisms (26.4%) when compared with two other Drosophila species, D. simulans (11.6%) and D. pseudoobscura (16.9%).
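The ratios quoted above follow directly from the site counts given in the text. The short Python sketch below simply recomputes them; the dictionary layout and variable names are assumptions made for the example, and the counts are those stated in this section.

```python
# Counts of polymorphic coding sites as stated in the text.
counts = {
    "Est-6": {"tv": 3, "ts": 23, "rep": 9, "syn": 17},
    "Sod": {"tv": 2, "ts": 3, "rep": 1, "syn": 4},
}

def ratios(gene):
    """Return (Tv/Ts, replacement/synonymous) for one gene."""
    c = counts[gene]
    return c["tv"] / c["ts"], c["rep"] / c["syn"]

est6_tvts, est6_repsyn = ratios("Est-6")
sod_tvts, sod_repsyn = ratios("Sod")

# How many times larger the Tv/Ts ratio is in Sod than in Est-6.
fold_difference = sod_tvts / est6_tvts

for gene in counts:
    tvts, repsyn = ratios(gene)
    print(f"{gene}: Tv/Ts = {tvts:.3f}, Rep/Syn = {repsyn:.3f}")
```

The fold difference computed this way is consistent with the statement in the abstract that the Tv/Ts ratio is more than five times larger in Sod than in Est-6.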
For Est-6, the A → G replacement substitution (nucleotide site 772, Table 1) results in a charge-altering amino acid replacement (Asn → Asp, amino acid position 258), which was first detected by COOKE and OAKESHOTT 1989. COOKE and OAKESHOTT 1989 have suggested that another replacement substitution (G → A, at nucleotide site 802, resulting in Ala → Thr at amino acid site 268) might also contribute to the selective differences observed between the two Est-6 allozymes. They have proposed that one or both of the polymorphisms at 258 and 268 are the primary target for the selection underlying the F-S latitudinal clines. We have found, however, that the Ala → Thr replacement at 268 is not diagnostic for the observed F/S Est-6 polymorphism but rather occurs in both F and S Est-6 strains (see Table 1, site 802). HASSON and EANES 1996 found, like us, that the F/S allozyme difference can be attributed to the single-amino-acid polymorphism at site 258 (nucleotide site 772), which is diagnostic for the Est-6 allozyme lineages.
Haplotype structure:
Figure 2 represents the maximum parsimony tree of the Est-6 haplotypes. The distribution of the pairwise differences is bimodal, owing to haplotypes 517S and 357F, which are largely different from the rest, although quite similar to each other. These two haplotypes code for the Est-6 Fast allozyme and will be denoted as the Est-6 F allelic lineage, whereas the other 13 haplotypes will be denoted as the Est-6 S allelic lineage. The great divergence between the two lineages indicates that this S-F enzyme polymorphism is ancient, at least relative to the allelic diversity within each of the two lineages.

Figure 2.
Unrooted maximum-parsimony tree of the Est-6 haplotypes of D. melanogaster. The haplotypes are designated according to the Sod strain from which they originate. Along the branches are the numbers of mutational steps. The 15 Est-6 haplotypes group into 2 lineages, designated S and F on the right. The analysis is based on 1879 bp comprising the whole sequenced region of Est-6. D. simulans is used as an outgroup.
In D. melanogaster, the Est-6 S → F (asparagine → aspartic acid) replacement is associated with site 772 (A → G). D. simulans has an A at position 772, which suggests that in D. melanogaster, the S allozyme may have been the ancient condition from which the F allelic lineage derived. However, COOKE and OAKESHOTT 1989 and HASSON and EANES 1996 suggest that the F lineage is ancestral on the grounds that the overall level of polymorphism is significantly greater among the F than among S alleles. The two Est-6 allelic lineages (F and S) are about equally different from D. simulans (90–98 nucleotide substitutions per sequence), much more so than they are from each other (22–29 substitutions per sequence), which indicates that the F-S polymorphism originated well after the divergence of the two species. The haplotypes have accumulated a number of substitutions after the divergence between the F and S allelic lineages: the two F strains differ by 4 synonymous substitutions, but no replacements, whereas there are 4 polymorphic replacement sites among the 13 S haplotypes, 1 of which (at 1526) is shared by 2 haplotypes (94F and 174F), for a total of 5 replacement polymorphisms among the 13 S haplotypes.
Figure 3 represents the maximum parsimony tree for the Sod haplotypes. The contrasts with Est-6 are notable. There is only one replacement polymorphism (F → S allozyme) in Sod. Two F haplotypes (498F and 968F) are very different from all others, whereas the eight other F haplotypes are much more similar to the S haplotypes than they are to 498F and 968F.

Figure 3.
Unrooted maximum-parsimony tree of the Sod haplotypes of D. melanogaster, based on 1408 bp that include the whole sequenced region of Sod. The 15 Sod haplotypes group into two lineages designated F and F(A)S. D. simulans is used as an outgroup.
Linkage disequilibrium:
Within Est-6, 262 out of 351 pairwise comparisons (74.6%) between nonsingleton pairs of polymorphisms show statistically significant linkage disequilibrium by the chi-square test; with the Bonferroni correction for multiple comparisons, there are 192 (54.7%) significant associations. The distribution of significant associations is fairly uniform across the Est-6 sequence; linkage disequilibrium does not decline as distance between polymorphic sites increases.
We have also found an excess of nonrandom associations within Sod: 211 out of 325 pairwise comparisons (64.9%) are significant, and 191 (58.8%) are significant with the Bonferroni correction. The significant associations do not form any obvious cluster, nor is the strength of linkage disequilibrium related to the distance between polymorphic sites.
We have first evaluated linkage disequilibrium between the Sod and Est-6 genes using Fisher's exact test and the chi-square test, which fail to detect any significant interlocus association, as might be expected owing to asymmetrical allelic frequencies (LEWONTIN 1995; see MATERIALS AND METHODS). We have also used the "sign" method (LEWONTIN 1995), which is based on the distribution of the disequilibrium sign, remains sensitive when allele frequencies are asymmetrical, and operates efficiently with singleton polymorphisms, which are not informative when Fisher's exact test is used.
The sign method involves examining the number of positive and negative D values for each polymorphic site within and between all types of pairwise comparisons (singletons vs. singletons, singletons vs. doublets, doublets vs. doublets, and so on). The observed number of negative values, summed for a particular type of pairwise comparison, is compared with the expected value using a goodness-of-fit test (the likelihood ratio statistic, G-test, was recommended by LEWONTIN 1995). We have calculated the D values for all pairwise comparisons between sites in Sod and Est-6 (Table 3). The observed total number of negative associations is 1551 vs. the expected number of 1248.27 (G = 459.60, d.f. = 1, P < 0.001; SOKAL and ROHLF 1981, pp. 695–707), manifesting a significant excess of negative associations; i.e., less frequent alleles at different loci are predominantly in the repulsion phase (the result remains the same after applying Williams' correction: G* = 459.45, d.f. = 1, P < 0.001).
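The goodness-of-fit calculation can be sketched in a few lines of Python. The total number of between-gene comparisons is not stated in this passage; the sketch assumes it equals 35 × 46 = 1610 (the 35 polymorphic sites of Est-6 times an assumed 46 polymorphic sites in Sod), so that the remaining comparisons are treated as positive associations. That total, the two-class layout, and the function names are assumptions; the observed and expected negative counts are those given in the text.

```python
import math

def g_statistic(observed, expected):
    """Likelihood ratio goodness-of-fit statistic G = 2 * sum(O * ln(O / E))."""
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)

def williams_correction(g, n_total, n_classes):
    """Divide G by q = 1 + (a^2 - 1) / (6 * n * (a - 1)) for a classes."""
    q = 1.0 + (n_classes ** 2 - 1) / (6.0 * n_total * (n_classes - 1))
    return g / q

TOTAL = 35 * 46                    # assumed total number of site pairs
obs_neg, exp_neg = 1551, 1248.27   # values reported in the text

observed = [obs_neg, TOTAL - obs_neg]   # negative, positive
expected = [exp_neg, TOTAL - exp_neg]

g = g_statistic(observed, expected)
g_adj = williams_correction(g, TOTAL, n_classes=2)
print(round(g, 2), round(g_adj, 2))
```

Under these assumptions the sketch gives values close to the reported G and G*, which suggests, but does not prove, that the comparison was made over two sign classes with this total.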
Seeking to localize the nonrandom associations, we have applied the Lewontin test separately to different regions. There are very significant associations between the Est-6 coding region and the Sod intron (G = 277.36, d.f. = 1, P < 0.001) and between the Est-6 coding region and the 3'-flanking region of Sod (G = 53.11, d.f. = 1, P < 0.001). There are associations between the coding regions of the two genes that are less pronounced, but still statistically significant (G = 6.12, d.f. = 1, P < 0.05). It is interesting that in this last comparison, the significant disequilibrium occurs between the regions of Est-6 and Sod that include the sites responsible for the F/S polymorphism at each locus (i.e., exon I of Est-6 and exon II of Sod: G = 6.52, d.f. = 1, P < 0.05); no association can be detected between exon II of Est-6 and exon II of Sod (G = 1.41, d.f. = 1, P > 0.05). The observed pattern of linkage disequilibrium between the two genes remains unchanged when singletons are excluded.
Test of neutrality:
We have applied the HKA (HUDSON et al. 1987), TAJIMA 1989, MCDONALD and KREITMAN 1991, and FU and LI 1993 tests of neutral equilibrium to our sequence data. We have first examined the intergenic region between Est-6 and ψEst-6 vs. the coding sequence of Est-6 without observing any significant departures from neutral expectations within D. melanogaster relative to the differences between D. melanogaster and D. simulans (χ² = 1.24, P > 0.05). The same result was obtained by KAROTAM et al. 1995, who used various upstream sequences of Est-6 and Adh as neutral reference sequences and various pairs of D. melanogaster and D. simulans Est-6 alleles. MORIYAMA and POWELL 1996 found significant deviation from neutral expectations in the comparison of the intraspecific polymorphism of Est-6 with the interspecific divergence between Est-6 and Pgd. However, nucleotide variation at Pgd is highly unusual because most HKA tests that include this locus (whether as the reference or test locus) show significant deviations from neutrality (MORIYAMA and POWELL 1996). We have applied the TAJIMA 1989 and FU and LI 1993 tests separately to different parts of Est-6 and to the whole gene. These tests do not reveal any significant deviation from neutrality. However, the TAJIMA 1989 test applied to our data (exon I), combined with previously published Est-6 sequences (COOKE and OAKESHOTT 1989; HASSON and EANES 1996), reveals significant deviation from neutrality expectations for the S alleles (D = -1.864, P < 0.05), but not for the F alleles (D = -0.181, P > 0.10). Also, the MCDONALD and KREITMAN 1991 test applied to four D. simulans (from KAROTAM et al. 1995) and 15 D. melanogaster Est-6 coding sequences reveals significant deviation from neutrality (G = 7.07, d.f. = 1, P < 0.01, Table 4). The ratio of replacement to silent polymorphism within species is lower than the ratio of fixed replacement to silent differences between them.
We have applied all the neutrality tests mentioned above to the Sod data from HUDSON et al. 1997 and, using the Tajima test (D = -1.873, P < 0.05), found significant departure from neutrality only for the 3'-flanking region.
Sliding window analysis:
We have analyzed separately different regions of Est-6 and Sod with the sliding window method (HUDSON and KAPLAN 1988). Figure 4 shows the sliding window plots of exon I of Est-6 for all sequences (Figure 4A) and for the S allelic lineage only (Figure 4B). In both cases, there is a distinct peak of increased variation in the region surrounding the replacement site (772) responsible for the Est-6 F-S allozyme polymorphism; the peak becomes more apparent when only the S allelic lineage is considered (Figure 4B). The same peak is also clearly distinguishable for the data of COOKE and OAKESHOTT 1989 and HASSON and EANES 1996 (Figure 5, A and B). For the Sod gene, a distinct peak appears within the intron (Figure 6A) and toward the end of the 3'-flanking region (Figure 6B). These two peaks coincide with areas of significantly high values, according to Tajima's D-test.

Figure 4.
Sliding window plot of exon I polymorphism (π) in the Est-6 gene of D. melanogaster. (A) All data. (B) Excluding the F allelic lineage. Window size is 100 nucleotides, with one-nucleotide increments.

Figure 6.
Sliding window plot of noncoding polymorphism (π) in the Sod gene of D. melanogaster. (A) Intron. (B) The 3'-flanking region. Window size is 100 nucleotides, with one-nucleotide increments.
Overall, the sliding window analysis and the neutrality tests (Tajima's and McDonald and Kreitman's) suggest that the polymorphism distribution in the Est-6 and Sod genes significantly deviates from the expectations of neutrality.
Interspecific comparisons and divergence time:
The Est-6 average distance (nucleotide differences) between D. simulans and D. melanogaster is 91.9 (see Table 4). The average distance between the two main (F and S) allelic lineages of D. melanogaster is 25.5. If we assume that the divergence between the species occurred 2.3–2.5 mya (POWELL and DESALLE 1995; RUSSO et al. 1995), the divergence of the two D. melanogaster lineages would have occurred ~666,000 years ago, assuming that Est-6 is a good molecular clock. The divergence between the two F alleles occurred ~105,000 years ago and between S alleles ~73,000 years ago.
The Sod average distance between D. melanogaster and D. simulans is 73.1, while between the two main allelic lineages [F vs. F(A)S] it is 29.3, which corresponds to ~962,000 years, while the divergence between the two F alleles occurred ~427,000 years ago. The average distance among all F(A) and S alleles (excluding 581F, which is probably a recombinant) is 2.4, corresponding to 79,000 years ago; the average distance among the S alleles is 0.8 or ~26,000 years ago.
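The molecular-clock dates above are consistent with a simple proportional scaling of each within-species distance by the between-species distance. The Python sketch below reproduces that arithmetic; using 2.4 million years (the midpoint of the quoted range for the species split) is an assumption, as are the function and variable names.

```python
SPECIES_SPLIT_YEARS = 2.4e6  # assumed midpoint of the quoted species-split range

def clock_time(distance, species_distance, split_years=SPECIES_SPLIT_YEARS):
    """Scale a within-species distance to years under a strict molecular clock."""
    return distance / species_distance * split_years

# Average distances taken from the text.
est6_lineages = clock_time(25.5, 91.9)   # Est-6 F vs. S lineages
sod_lineages = clock_time(29.3, 73.1)    # Sod F vs. F(A)S lineages
sod_fa_s = clock_time(2.4, 73.1)         # among F(A) and S alleles
sod_s = clock_time(0.8, 73.1)            # among S alleles

for label, years in [("Est-6 F vs. S", est6_lineages),
                     ("Sod F vs. F(A)S", sod_lineages),
                     ("Sod F(A) and S", sod_fa_s),
                     ("Sod S", sod_s)]:
    print(f"{label}: ~{years:,.0f} years")
```

Rounded to the nearest thousand years, these values agree with the dates given in the text, which supports the midpoint assumption.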
We have calculated the time of divergence between and within the allelic lineages for both genes following the HUDSON et al. 1997 approach, using all sequences or their homogeneous subsets (see HUDSON et al. 1997 for details). For Est-6, the expected time of divergence between the F and S allelic lineages is ~223,000 years [µ × t × 15 × (1879 − 0.75 × 1635) = 35; t = 223413.8], where µ is the neutral mutation rate at noncoding and silent sites, assumed to be 16 × 10⁻⁹ per site per year. For the Sod F(A)S and F allelic lineages, the divergence time is ~178,000 years [µ × t × 15 × (1408 − 0.75 × 439) = 46; t = 177633.6]. The divergence time within the Est-6 S allelic lineage is 96,000 years [µ × t × 13 × (1879 − 0.75 × 1635) = 13; t = 95748.8], considering all polymorphic sites, and 59,000 years [µ × t × 13 × (1879 − 0.75 × 1635) = 8; t = 58922.3], if we exclude putative recombinant sites. The expected time of divergence within the Sod F(A)S allelic lineage is ~45,000 years [µ × t × 13 × (1408 − 0.75 × 439) = 10; t = 44556.9] and 27,000 years [µ × t × 13 × (1408 − 0.75 × 439) = 6; t = 26734.1], excluding the putative recombinant sites.
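The bracketed expressions can be solved for t directly. The Python sketch below does so under the reading that each expression has the form µ × t × n × (L − 0.75 × C) = S, with n the number of sequences, L the total length, C the coding length, and S the number of polymorphic sites; the function and parameter names are hypothetical, and the interpretation of the 0.75 factor (discounting three-quarters of the coding sites, presumably as non-silent) is an assumption.

```python
MU = 16e-9  # neutral mutation rate per site per year, as assumed in the text

def divergence_time(n_seqs, total_len, coding_len, n_sites, mu=MU):
    """Solve mu * t * n_seqs * (total_len - 0.75 * coding_len) = n_sites for t."""
    effective_len = total_len - 0.75 * coding_len
    return n_sites / (mu * n_seqs * effective_len)

# Est-6: 1879 bp sequenced, 1635 bp coding.
est6_between = divergence_time(15, 1879, 1635, 35)
est6_within_s = divergence_time(13, 1879, 1635, 13)
est6_within_s_norec = divergence_time(13, 1879, 1635, 8)

# Sod: 1408 bp sequenced, 439 bp coding.
sod_between = divergence_time(15, 1408, 439, 46)
sod_within = divergence_time(13, 1408, 439, 10)
sod_within_norec = divergence_time(13, 1408, 439, 6)
```

The Est-6 values match the printed ones to the first decimal. The Sod values come out a few tens of years higher than printed, which would be explained if the effective length had been rounded to 1079 before dividing; that explanation is a guess.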
 | DISCUSSION |
|---|
The Est-6 and Sod genes of D. melanogaster are closely linked on the left arm of chromosome 3, separated by 3.4 cM, or 1 Mb. Both genes are characterized in natural populations by a polymorphism with two common allozymes, S and F, which differ by a single-nucleotide substitution and a corresponding amino acid replacement. In Sod, the S-F replacement is the only amino acid polymorphism commonly found in natural populations, whereas in Est-6, additional rare amino acid replacements are found.
Natural selection:
The Cu,Zn SOD is involved in protecting the cell against the toxicity of oxygen radicals by scavenging superoxide radicals and dismutating them to hydrogen peroxide and molecular oxygen (FRIDOVICH 1986). The Sod F-S polymorphism is of recent evolutionary origin, as shown by the virtually complete absence of silent variation among S alleles collected in populations that are geographically very distant, from China and Europe to the United States (HUDSON et al. 1994, 1997). S alleles have not been found in South America, nor in Africa, whence D. melanogaster spread throughout the world, probably in recent millennia. In the United States, the frequency of S has been found to range from 0 to 0.15 in different populations, and can vary from year to year in a given population from 0.05 to 0.15 (HUDSON et al. 1994, 1997). The S alleles are completely or very nearly identical in nucleotide sequence to a set of F alleles denominated F(A). The F(A) alleles have not been found in Africa, but are present in Europe, the United States, and South America, where they often account for ~50% of all F alleles in a population (HUDSON et al. 1994, 1997).
A hypothesis of the geographic evolution of the Sod alleles consistent with the information just summarized is as follows. The F(A) mutation arose in a D. melanogaster population outside Africa ~5000 years ago (HUDSON et al. 1997) and rapidly spread throughout other continents, impelled by natural selection (HUDSON et al. 1994). The rapidity of the F(A) world expansion is evidenced by the virtually complete absence of silent substitutions throughout a fragment >10 kb that includes Sod (HUDSON et al. 1997). The S allele arose by a single-nucleotide replacement from F(A), either in Europe or the United States, and spread from one continent to the other, but never reached South America or Africa. The nucleotide identity of S alleles from widely distant continents (and the nucleotide identity of a fragment >10 kb that includes Sod S) favors the interpretation that the F(A) → S mutation occurred only once. The rapid expansion of S in Europe and the United States must have been impelled by natural selection. It is uncertain, however, whether the selective advantage is the same one that favors F(A) over other F alleles, or whether the S replacement itself is also favored. The S and F enzymes differ in such biochemical properties as thermostability and specific activity (LEE et al. 1981). Moreover, the S allele is at an advantage relative to F in heavily irradiated populations (PENG et al. 1991) or in those selected for reproduction at an advanced age (TYLER et al. 1993).
Detecting selection is handicapped in our sample because we have studied only 15 sequences (HANFSTINGL et al. 1994; SIMONSEN et al. 1995; HASSON and EANES 1996; RICHTER et al. 1997). Nevertheless, we have found evidence of natural selection within the 3'-flanking region of Sod with TAJIMA's (1989) test. Moreover, the sliding window method of HUDSON and KAPLAN (1988) reveals a distinct peak in the 3'-flanking region as well as within the intron of Sod; these peaks are expected as a consequence of balancing selection (STROBECK 1983; HUDSON and KAPLAN 1988). Selective effects within introns have been recognized previously. BERRY and KREITMAN (1993) have shown that an intron polymorphism at Adh in D. melanogaster exhibits a more pronounced cline than the F-S allozyme polymorphism that is generally attributed to selection at that locus. KIRBY and STEPHAN (1995) have proposed that positive selection accounts for intron polymorphism also at the white locus of D. melanogaster. This may be because introns can include regulatory sequences (BINGHAM et al. 1988; ARONOW et al. 1989; GASCH et al. 1989; HUANG et al. 1993; POGULIS and FREYTAG 1993). There is also other evidence for selective constraints and epistatic selection on the nucleotide sequence evolution of introns (LEARN et al. 1992; LEICHT et al. 1993, 1995; STEPHAN and KIRBY 1993; KIRBY et al. 1995).
The Est-6 protein is transferred by males to females in the seminal fluid during copulation in D. melanogaster (RICHMOND et al. 1980; RICHMOND and SENIOR 1981), and it affects the female's subsequent behavior and mating proclivity (GROMKO et al. 1984; SCOTT 1986). The Est-6 coding sequence is selectively constrained (COOKE and OAKESHOTT 1989; KAROTAM et al. 1993, 1995), although presumably to a lesser extent than Sod, which exhibits only one polymorphic amino acid site in our sample vs. the nine polymorphic sites present in Est-6. The Est-6 5'-flanking region contains positive cis-regulatory elements controlling the expression of Est-6 and may contain binding sites for regulatory protein factors involved in tissue-specific transcription control. The evidence indicates that the 5'-flanking region is evolving under greater selective constraints than the coding region (KAROTAM et al. 1993, 1995; ODGERS et al. 1995). The occurrence of parallel latitudinal clines of the Est-6 F-S polymorphism in Europe, North America, and Australia, the pattern of temporal variation, and other lines of evidence all indicate that the Est-6 F-S polymorphism is subject to balancing selection (OAKESHOTT et al. 1989, 1993, 1995; RICHMOND et al. 1990). The peak around the S-F amino acid polymorphism we have observed (Figure 4) is also evidence of balancing selection impacting the Est-6 S-F polymorphism in our sample. The detected tendency cannot be explained by chance, because similar peaks are observed in the Est-6 data of other authors (COOKE and OAKESHOTT 1989; HASSON and EANES 1996). The MCDONALD and KREITMAN (1991) test applied to the Est-6 coding region also reveals significant deviation from neutrality.
The Est-6 and Sod loci are enclosed within the cosmopolitan inversion In(3L)Payne, whose frequency ranges globally from 0 to 40% (VOELKER et al. 1978; LEMEUNIER and AULARD 1992). Selection at the two loci might simply be a consequence of distinctive associations between alleles and chromosomal arrangements. HASSON and EANES (1996) have shown, however, that the In(3L)Payne inversion suppresses recombination in regions proximal to the chromosome breakpoints, but does not affect the central region, where Est-6 and Sod are located. Moreover, there is no association between Est-6 sequence variation and arrangement type, consistent with the extensive genetic exchange observed between standard (ST) and inverted [In(3L)Payne] third chromosomes of D. melanogaster (HASSON and EANES 1996). In any case, there are no third chromosome inversions segregating in the El Rio population (SMIT-MCBRIDE et al. 1988).
History of the allelic lineages:
Figure 2 and Figure 3 are maximum-parsimony trees of the Est-6 and Sod alleles. It is apparent that the phylogeny of the electrophoretic alleles is different in the two genes. The two Est-6 F alleles are quite similar to each other, but they are in haplotypes that carry an F(A) Sod allele in one case and an S Sod allele in the other case. The 13 Est-6 S alleles are associated with S or F Sod alleles, and the Sod F alleles include closely related F(A) alleles as well as the other distant F alleles. The F(A) and S alleles of Sod have diverged very recently (HUDSON et al. 1994, 1997), which is also apparent in Figure 3 and our extensive unpublished data (we reckon that the exceptional Sod allele 581F represents a case of intragenic recombination). We did not, however, expect that the two Est-6 F alleles would be closely related to one another, and even less so that the 13 Est-6 S alleles would have quite similar (and even identical) nucleotide sequences. Rather, we would have expected that the Est-6 S alleles, which may represent the ancestral electrophoretic state, would consist of alleles quite heterogeneous in nucleotide sequence. Such is, indeed, the case for the Sod F alleles, which, if we exclude the F(A) set, are extremely heterogeneous in nucleotide sequence (HUDSON et al. 1994; F. J. AYALA, unpublished data). A reasonable account of the observations is to assume that the Sod F(A) mutation occurred in a haplotype carrying an Est-6 S allele, and that the rapid world expansion of the Sod F(A) and S alleles greatly enhanced the frequency of that particular Est-6 S allele. The Sod F(A) mutation is estimated to have occurred ~5000 years ago or somewhat earlier (HUDSON et al. 1997). The 5000 years of the world expansion of the Sod F(A) and S alleles would allow for only limited recombination between Sod and Est-6. This scenario is consistent with the interpretation that the selective pressure favoring Sod F(A) and S alleles has been strong. The two strains 517S and 357F [which have F(A) and S alleles, respectively, at Sod, but the Est-6 F allele] would have arisen by a recombination event between Sod and Est-6. The fact that the two Sod F alleles that are not F(A) occur in haplotypes with Est-6 S alleles is not surprising because the Est-6 S allozyme is the most ancient and common.
An alternative historical scenario that might account for the presence of the two largely divergent sets of alleles that we observe at each locus is population subdivision with subsequent admixture. D. melanogaster may have been geographically split into two populations that remained separate for a time long enough to accumulate a number of nucleotide substitutions within each gene. The substitutions would have accumulated independently in the two populations. Recent admixture of the two populations would have brought together the two sets of alleles, as we find them in the El Rio population, where our strains were collected. According to this scenario, however, the two sets of alleles at the two loci would be associated in the same haplotypes. This is not the case. As shown in Figure 2 and Figure 3, the strains with the two Est-6 F alleles (517S and 357F) are different from the strains carrying the two Sod F alleles (377F and 581F).
Linkage disequilibrium:
Sod and Est-6 are 3.4 cM apart in the recombination map (3-32.5 and 3-35.9; HEINO et al. 1994), separated by 983 kb in the genomic map (A. LONG, personal communication). Yet the nucleotide polymorphisms are negatively associated between the two loci (Table 3, P << 0.001); that is, minority nucleotide substitutions (present in one to five strains) at one locus are negatively associated with minority nucleotide substitutions at the other locus. A selection account of this observation implies that mutations occurring at one locus are selected against if nucleotide substitutions are present in the other locus, but not if the other locus exhibits the consensus sequence.
There is an extensive genetic literature advancing the notion that the unit of evolution is not the single gene, but rather, interacting gene complexes (WRIGHT 1931; DOBZHANSKY 1937; SCHMALHAUSEN 1946; MATHER 1953; WADDINGTON 1957; MAYR 1963; FRANKLIN and LEWONTIN 1970). There also are traditional arguments propounding that sets of genes for quantitative traits are built up in an alternating arrangement of plus and minus alleles on chromosomes, so that selection minimizes actual variation for such traits, but maximizes potential variability, which is released by recombination. This +/- alternating arrangement explains why populations rapidly respond to varying environmental challenges, as well as the success of artificial selection favoring extreme values of the traits (MATHER 1953; THODAY et al. 1964). However, even if we accept such theoretical constructs and their evidential support, it remains obscure why the plus/minus compensatory arrangement would occur between two genes that are unlikely to share distinctive functional interactions, and between nucleotide substitutions that are in most cases synonymous.
It seems more likely that linkage disequilibrium has arisen as a consequence of selection strongly favoring one particular sequence at a locus, such as seems to have occurred at Sod, where the F(A) alleles (and the derivative S alleles) rapidly increased in frequency. Rare substitutions present in Est-6 would have hitchhiked along with Sod F(A) without allowing enough time for their elimination by purifying selection with or without recombination between the two loci. The reciprocal situation would have also occurred, in which common Est-6 alleles are favored by selection and low-frequency Sod substitutions are hitchhiking along. As noted earlier, there is evidence of positive selection in favor of Sod F(A); in the case of Est-6, the evidence favors balancing selection between the two common allozymes, S and F. In a study of two wild samples and four experimental populations, SMIT-MCBRIDE et al. (1988) did not observe linkage disequilibrium between Est-6 and Sod in the large wild samples, but linkage disequilibrium appeared after 8-30 generations in three of the four experimental populations derived from the wild samples. Population bottlenecks and other factors were excluded. SMIT-MCBRIDE et al. (1988) noted that natural selection acting on the allozymes, or on loci very tightly linked to them, was the most likely explanation for the disequilibrium.
HUDSON et al. (1997) have estimated that the time since the selective sweep for Sod F(A)-S is ~5000 years or more, but not likely more than 30,000-40,000 years. Assuming a molecular (neutral) clock, the age of this lineage is estimated at 79,000 years. Similarly, the time of divergence between the two main Sod allelic lineages is 178,000 years with the method of HUDSON et al. (1997), but five times greater, 962,000 years, under the molecular clock assumption. These discrepancies can be accounted for by natural selection accelerating the evolution of the selectively favored F(A)-S alleles. HUDSON et al. (1997) argue that the selection is strong (s > 0.01). Parallel differences occur for Est-6, but the differences between the estimates obtained by the two methods are smaller. Thus, the age of the Est-6 S allelic lineage is 60,000-95,000 vs. 73,000 years, and the divergence between the S-F lineages is 220,000 vs. 666,000 years, corresponding to the method of HUDSON et al. (1997) vs. the clock method. This is consistent with the hypothesis that the divergence between the two Est-6 allelic lineages has been impelled by natural selection, but with lesser strength than in the case of Sod.
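The size of the discrepancy between the two dating methods can be expressed as a simple ratio for each gene. The snippet below is only a bookkeeping aid using the rounded between-lineage ages quoted in this section; the dictionary layout and the name `clock_to_hudson_ratio` are illustrative choices of ours.

```python
# Rounded between-lineage divergence times (years) quoted in the text:
# (estimate with the HUDSON et al. 1997 method, molecular-clock estimate).
estimates = {
    "Sod F vs. F(A)-S": (178_000, 962_000),
    "Est-6 F vs. S": (220_000, 666_000),
}


def clock_to_hudson_ratio(hudson_years, clock_years):
    """How many times larger the clock estimate is than the other estimate."""
    return clock_years / hudson_years


ratios = {gene: clock_to_hudson_ratio(*pair) for gene, pair in estimates.items()}

for gene, ratio in ratios.items():
    print(f"{gene}: clock estimate is {ratio:.1f} times the other estimate")
```

The ratio is roughly five for Sod and roughly three for Est-6, in line with the statement that the discrepancy is smaller for Est-6.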
 | ACKNOWLEDGMENTS |
|---|
We thank several members of our laboratory: Kevin Bailey, Heather Carstens, Victor DeFilippis, Alberto García Sáez, Jan Kwiatowski, Carlos Machado, Stephen Rich, Douglas Skarecky, and Andrei Tatarenkov for encouragement and help. Walter M. Fitch, Brandon Gaut, Richard R. Hudson, and Anthony Long read the manuscript and offered valuable comments. This work is supported by National Institutes of Health grant GM-42397 to F.J.A.
Manuscript received June 7, 1999; Accepted for publication July 22, 1999.
 | LITERATURE CITED |
|---|
AGUADÉ, M., 1998 Different forces drive the evolution of the Acp26Aa and Acp26Ab accessory gland genes in the Drosophila melanogaster species complex. Genetics 150:1079-1089.
AQUADRO, C. F., 1993 Molecular population genetics of Drosophila, pp. 222-266 in Molecular Approaches to Fundamental and Applied Entomology, edited by J. OAKESHOTT and M. J. WHITTEN. Springer-Verlag, New York.
ARONOW, B., D. LATTLER, R. SILBIGER, M. DUSING, and J. HUTTON et al., 1989 Evidence for a complex regulatory array in the first intron of the human adenosine deaminase gene. Genes Dev. 3:1384-1400.
AVERY, P. J. and W. G. HILL, 1979 Distribution of linkage disequilibrium with selection and finite population size. Genet. Res. 33:29-48.
AYALA, F. J., J. R. POWELL, and TH. DOBZHANSKY, 1971 Enzyme variability in the Drosophila willistoni group. II. Polymorphisms in continental and island populations of Drosophila willistoni. Proc. Natl. Acad. Sci. USA 68:2480-2483.
AYALA, F. J., J. R. POWELL, M. L. TRACEY, C. A. MOURÃO, and S. PÉREZ-SALAS, 1972a Enzyme variability in the Drosophila willistoni group. IV. Genic variation in natural populations of Drosophila willistoni. Genetics 70:113-139.
AYALA, F. J., J. R. POWELL, and M. L. TRACEY, 1972b Enzyme variability in the Drosophila willistoni group. V. Genetic variability in natural populations of Drosophila equinoxialis. Genet. Res. 20:19-42.
BALAKIREV, E. S. and F. J. AYALA, 1996 Is esterase-P encoded by a cryptic pseudogene in Drosophila melanogaster? Genetics 144:1511-1518.
BARKER, J. A. F., 1979 Inter-locus interactions: a review of experimental evidence. Theor. Popul. Biol. 16:323-346.
BEGUN, D. J. and C. F. AQUADRO, 1994 Evolutionary inferences from DNA variation at the 6-phosphogluconate dehydrogenase locus in natural populations of Drosophila: selection and geographi