Abstract
The within-chromosome distribution of gametic disequilibrium (GD) between protein loci, and the underlying evolutionary factors of this distribution, are still largely unknown. Here, we report a detailed study of GD between a large number of protein loci (15) spanning 87% of the total length of the third chromosome of Drosophila melanogaster in a large sample of haplotypes (600) drawn from a single natural population. We used a sign-based GD estimation method recently developed for multiallelic systems, which considerably increases both the statistical power and the accuracy of estimation of the intensity of GD. We found that strong GD between pairs of protein loci was widespread throughout the chromosome. In total, 22% of both the pairs of alleles and pairs of loci were in significant GD, with mean intensities (as measured by D′ coefficients) of 0.43 and 0.31, respectively. In addition, strong GD often occurs between loci that are far apart. By way of illustration, 32% of the allele pairs in significant GD occurred within pairs of loci separated by effective frequencies of recombination (EFRs) of 15–20 cM, the mean D′ value being 0.49. These observations are in sharp contrast with previous studies showing that GD between protein loci is rarely found in natural populations of outcrossing species, even between very closely linked loci. Interestingly, we found that most instances of significant interallelic GD (68%) involved functionally related protein loci. Specifically, GD was markedly more frequent between protein loci related by the functions of hormonal control, molybdenum control, antioxidant defense system, and reproduction than between loci without known functional relationship, which is indicative of epistatic selection. Furthermore, long-distance GD between functionally related loci (mean EFR 9 cM) suggests that epistatic interactions must be very strong along the chromosome. 
This evidence is hardly compatible with the neutral theory and has far-reaching implications for understanding the multilocus architecture of the functional genome. Our findings also suggest that GD may be a useful tool for discovering networks of functionally interacting proteins.
KNOWLEDGE about gametic disequilibrium (GD), the nonrandom association of alleles at different loci, remains limited for loci forming part of the functional genome. Nevertheless, knowledge on the amount and distribution of GD among protein loci along chromosomes and of the underlying evolutionary factors is fundamental to unraveling the multilocus architecture of the functional genome and its evolutionary dynamics. The theory of multilocus genetic systems suggests that epistatic (i.e., nonadditive) interactions in fitness are able to generate GD if linkage among loci is tighter than the value required by the magnitude of the epistasis. If this were so, the study of individual loci would have insufficient dimensionality to explain the empirical observations, because multilocus systems have additional properties over and above those of individual loci (Franklin and Lewontin 1970; Slatkin 1972; Lewontin 1974). Undoubtedly, more information is needed to ascertain the most appropriate levels of analysis. In this connection, many basic aspects of the functional multilocus architecture of chromosomes are still poorly understood, including (1) frequency and intensity of GD, (2) distribution of GD along chromosomes and its relationship with recombination frequency, and (3) the amount of GD among protein loci subject to selection.
Extensive empirical work has been carried out in the past with the aim of assessing the amount of GD between protein loci in natural populations, particularly in Drosophila species. It has been found that there is very little GD in random-mating populations, even between closely linked protein loci (Hedrick et al. 1978; Barker 1979; Lewontin 1985; Hartl and Clark 1997; Mitton 1997; Powell 1997; Hedrick 2000). This apparent scarcity of GD suggested that multilocus genetics was of minor importance for understanding the evolution of protein variation (Lewontin 1985). Nevertheless, Zapata and Alvarez (1992) demonstrated that the failure to detect abundant GD between protein loci was in great measure due to the exceedingly low statistical power resulting from small sample sizes. A meta-analysis of a number of individual studies of GD in Drosophila species showed that, despite the lack of significance in most individual studies, GD is in fact relatively frequent even between loosely linked protein loci. Specifically, 28% of the pairs of loci analyzed were in significant GD, and the average effective frequency of recombination was 9 cM (Zapata and Alvarez 1992).
The common practice of using only a few (two to six) protein loci per chromosome (see Zapata and Alvarez 1992) does not provide a systematic picture of the within-chromosome distribution of GD. It is certainly paradoxical that the within-chromosome distribution of GD for first-generation molecular genetic markers or protein loci is to date much less well known than for last-generation markers. The past few years have witnessed an extraordinary surge of information concerning the distribution of GD between DNA markers over extensive regions of human chromosomes. This research has been based on the analysis of single-nucleotide polymorphisms (SNPs; Taillon-Miller et al. 2000; Goldstein 2001) and microsatellites (Peterson et al. 1995; Zapata et al. 2001) and has been fueled by the usefulness of GD for locating genes that underlie susceptibility to common diseases. Unfortunately, estimates of GD from DNA markers such as SNPs and microsatellites cannot be used as reliable indicators of background GD over chromosomes, because of the occurrence of marker-dependent GD (Zapata et al. 2001). Each type of genetic marker is endowed with particular characteristics that influence its evolutionary dynamics, and thus reliable information on GD among protein loci can be obtained only from the study of the loci themselves.
A procedure that has been used almost routinely in GD studies is the pooling of rarer alleles at each locus into a single class to avoid the statistical problem of small expected frequencies, thereby reducing multiallelic loci to diallelic systems. Such a procedure may be criticized for several reasons. First, a considerable amount of information of value for elucidating the evolutionary forces that generate GD could be lost by the pooling of alleles (Hedrick and Thomson 1986; Klitz and Thomson 1987; Zapata et al. 2001). Second, GD may be underestimated because pooling of alleles often reduces the power of the statistical tests used (Weir and Cockerham 1978; Sham and Curtis 1995). Third, the pooling of alleles prevents proper quantification of the amount of GD and reliable detection of the possible occurrence of allele-dependent GD patterns (Zapata et al. 2001). Last, the manner in which the alleles are pooled can have important and unknown effects on the inferences drawn, because the allelic combinations probably have different GD levels. However, the theory of estimation of GD for multiallelic systems is nowadays sufficiently developed to make it both unnecessary and undesirable to pool alleles. In addition, the problem of false positives associated with small expectations can be evaluated by comparing the results from conservative and nonconservative statistical tests, taking into account that statistical tests should be used only as tools for the exploration of hypotheses that need to be substantiated on the basis of other lines of evidence.
A long-standing question at the interpretative level is what evolutionary forces are responsible for the observed GD. This is a very difficult question because, besides natural selection, a number of other forces can generate GD in populations, such as genetic drift or bottlenecks (Hill and Robertson 1968; Avery and Hill 1979), stratification or admixture (Nei and Li 1973; Chakraborty and Weiss 1988), and recent mutations (Zapata et al. 2001); thus alternative explanations cannot usually be ruled out. The idea that epistatic interactions are more likely to occur between loci coding for proteins having a functional relationship provides one of the best ways to relate GD to selection. However, the experimental evidence on the occurrence of GD between functionally related protein loci is to date scarce (Mitton and Koehn 1973; Zouros and Krimbas 1973; Zouros and Johnson 1976; Fontdevila et al. 1983; Zapata et al. 2000) and sometimes contradictory (Zouros and Krimbas 1973; Loukas et al. 1979).
Unraveling the forces generating GD among protein loci has acquired new relevance with the emerging field widely referred to as “proteomics.” As is well known, a major challenge over the coming years will be to interpret the functional significance of the DNA sequences that are currently being revealed by genome research. The goal of proteomics is not only to catalog the proteins of a given cell, but also to identify their functions and interactions, and thus to understand the biology of the cell (Dove 1999; Williams 1999; Eisenberg et al. 2000; Pandey and Mann 2000). Various methodological approaches have been used to identify the functions of unknown proteins and their interactions, including their similarity in structure, sequence, or activity to previously characterized proteins; their phylogenetic profiles; and yeast two-hybrid systems (Eisenberg et al. 2000; Pandey and Mann 2000). In addition, we suggest that GD analysis may be a useful approach for detecting networks of functionally interacting proteins, if it is demonstrated that GD is more likely to occur among loci coding for functionally related proteins.
In this article we report the amount (frequency and intensity) and distribution of overall and interallelic GD between pairs of 15 protein loci distributed along the third chromosome of Drosophila melanogaster. We used a considerably larger sample size than in previous studies and a sign-based GD estimation method for multiallelic loci, thus improving statistical power. Analysis of existing information on the functional relationships among the protein loci studied provided a basis for identification of evolutionary factors underlying the distribution of GD along the chromosome and for assessment of the potential usefulness of GD analysis in proteomics.
MATERIALS AND METHODS
Sampling and extraction of the third chromosomes: Adult flies of D. melanogaster were collected over a 3-day period in October 1998 by net-sweeping over conventional fermented banana baits placed in a large extension of fruit trees in Santa Cruz de Rivadulla (A Coruña, northwest Spain). Homozygous lines for independent third chromosomes were established from males by standard crosses to TM6B/MKRS balancer stock (Lindsley and Zimm 1992). Each wild male was mated to several virgin TM6B/MKRS females. A single F1 male of each cross, genotype III+/TM6B, was backcrossed to the balancer stock. Males and females from these matings, genotypes III+/TM6B, were mated to initiate the homozygous lines (III+/III+), which were maintained in mass cultures. Less viable or lethal wild third chromosomes were maintained balanced (no recombination) with TM6B. A total of 605 lines were eventually obtained and characterized by karyotype and allozyme analyses.
Figure 1.—Location of the 15 protein loci studied on the left (3L) and right (3R) arms of the third chromosome of D. melanogaster. Map distance (in centimorgans) is shown for each locus.
A sample of adults was collected in September 1998 for analyzing the genetic structure of the natural population of Santa Cruz de Rivadulla.
Selection, denomination, recombination frequencies, and functional relationships of the protein loci studied: To characterize GD we considered all previously described protein loci on the third chromosome, with only two basic requirements. First, the map positions of the loci were known; second, the protein loci were polymorphic according to previous screenings of variability in the natural population of Santa Cruz de Rivadulla. Fifteen protein loci distributed along the left and right arms of the third chromosome eventually satisfied those two requirements (Figure 1). The following 15 protein loci in the third chromosome, ordered from left-arm telomere to right-arm telomere, were studied: larval serum protein 1 (Lsp-1γ), isocitrate dehydrogenase (E.C. 1.1.1.42; Idh), tetrazolium oxidase-1 (E.C. 1.15.1.1; To-1), esterase-6 (E.C. 3.1.1.1; Est-6), larval serum protein 2 (Lsp-2), phosphoglucomutase (E.C. 5.4.2.2; Pgm), alkaline phosphatase (E.C. 3.1.3.1; Aph), esterase-C (E.C. 3.1.1.1; Est-C), glucose oxidase (E.C. 1.1.99.10; Go), octanol dehydrogenase (E.C. 1.2.1.1; Odh), malic enzyme (E.C. 1.1.1.40; Me), xanthine dehydrogenase (E.C. 1.1.1.204; Xdh), pyridoxal oxidase (E.C. 1.2.3.8; Po), aldehyde oxidase-1 (E.C. 1.2.1.3; Ao-1), and leucine aminopeptidase-D (E.C. 3.4.11.1; Lap-D) (http://flybase.bio.indiana.edu).
The 15 loci studied span ∼87% (96.9 cM, from 1.4 to 98.3 cM) of the total length (110.9 cM) of the third chromosome (Lindsley and Zimm 1992). Effective frequencies of recombination (EFRs) between all possible pairs of the 15 loci were obtained from map distances (http://flybase.bio.indiana.edu), using Kosambi's map function (Kosambi 1944) and assuming no recombination in males of Drosophila (Table 1). The resulting values ranged from 0.1 cM (Est-C × Go, Me × Xdh, and Po × Ao-1) to 24 cM (Lsp-1γ × Lap-D) and averaged 8.7 ± 0.7 cM.
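The conversion from map positions to EFR can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `efr_cm` is hypothetical, and halving the female recombination fraction is our reading of "assuming no recombination in males" (an autosome passes through a male, on average, every other generation).

```python
import math

def efr_cm(pos_a_cm, pos_b_cm, male_recombination=False):
    """Effective frequency of recombination (cM) between two map positions (cM).

    Hypothetical helper: the halving step below is an assumption about how
    'no recombination in males' enters the calculation.
    """
    d_morgans = abs(pos_a_cm - pos_b_cm) / 100.0
    # Kosambi (1944) map function: r = 0.5 * tanh(2d), with d in Morgans.
    r_female = 0.5 * math.tanh(2.0 * d_morgans)
    # Average over the sexes: females recombine at r_female, males at 0.
    r_eff = r_female if male_recombination else r_female / 2.0
    return 100.0 * r_eff

# Most distant pair in the text: positions 1.4 cM and 98.3 cM.
print(round(efr_cm(1.4, 98.3), 1))
```

Under these assumptions the most distant pair (map positions 1.4 and 98.3 cM) gives an EFR of about 24 cM, in line with the maximum value reported above.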
An exhaustive analysis of the available information about the functional relationships of the loci studied (http://flybase.bio.indiana.edu) led us to group them under five major functional categories: hormonal control (Hewitt 1974; O'Brien and MacIntyre 1978; Cox-Foster et al. 1990; Karotam and Oakeshott 1993; Antoniewski et al. 1995; Farkas and Knopp 1998; Burmester et al. 1999), molybdenum control (Hanly 1980; Warner and Finnerty 1981), antioxidant defense system (Hilliker et al. 1992; Humphreys et al. 1996), glucose metabolism (Cavener 1980; Metzler 2001), and reproduction (Cavener and MacIntyre 1983; Phillips et al. 1989; Richmond et al. 1990; Massey et al. 1997). The distribution of the protein loci according to these categories is shown in Table 2. It can be seen that all the loci studied, except Aph, Est-C, and Lap-D, are related by at least one of these functional categories.
Cytological and electrophoretic analyses: Lines were screened for inversions by crossing males derived from each line with virgin Oregon-R (Or-R) females homozygous for the standard gene arrangement (Lindsley and Grell 1968). Larval salivary glands from the F1 progeny were dissected and squashes were examined after staining the chromosomes with lactic acetic orcein. Break points of the inversions were determined by comparison with standard maps (Lefevre 1976).
The 15 protein loci in the third chromosome were studied using starch gel electrophoresis except Lsp-1γ and Lsp-2, which were analyzed by acrylamide gel electrophoresis. Electrophoresis and histochemical staining methods were essentially as described previously (Ayala et al. 1972; Loukas et al. 1974; Loukas and Krimbas 1979, 1980; Cavener 1980; Cypher et al. 1982). Chromosomes in balance with TM6B were assayed for the enzymes described above, using a TM6B/MKRS balancer stock derived by us from a single TM6B chromosome with known allelic constitution at the 15 protein loci.
Genotypes at nine protein loci active in the adult stage (To-1, Est-6, Pgm, Est-C, Go, Odh, Me, Po, and Ao-1) were also determined in the sample of adults collected in September 1998.
Measures of variability and deviations from Hardy-Weinberg proportions: Unbiased expected heterozygosities at single loci were calculated as
Estimation of gametic disequilibrium: Inversion-carrying chromosomes and singletons were excluded from the analysis of GD, and so the number of haplotypes across pairs of loci ranged from 557 to 600. We used a sign-based GD estimation method recently proposed for multiallelic systems (Zapata et al. 2001). Briefly, let pi and qj be the relative frequencies of alleles Ai (i = 1,..., k) and Bj (j = 1,..., l) at the loci A and B, respectively, and Xij be the relative frequency of the haplotype Ai Bj in N haplotypes sampled from a population. The array of possible two-locus haplotypes can be ordered in a k × l contingency table and subsequently partitioned into k × l separate 2 × 2 contingency tables, the frequency in each cell being Xij, pi – Xij, qj – Xij, and 1 – pi – qj + Xij. We then considered the haplotype classes in coupling to be those involving the most or the least frequent alleles, and only instances of interallelic GD with a positive sign were included in the analyses. This strategy increases both statistical power and the accuracy of estimation of the intensity of GD between pairs of alleles.
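The partitioning step described above can be sketched in a few lines. This is an illustrative sketch only: the function name `positive_allelic_gd` is hypothetical, and the formulas used (Dij = Xij − pi qj, Lewontin's normalization D′ = D/Dmax for positive D, and the 1-d.f. chi-square N·D²/[pi(1 − pi)qj(1 − qj)]) are the standard textbook definitions, assumed here rather than taken from the text.

```python
def positive_allelic_gd(counts):
    """Per-allele-pair GD from a k x l table of haplotype counts.

    counts[i][j] is the number of sampled haplotypes carrying A_i and B_j.
    Only allele pairs with a positive deviation (D_ij > 0) are returned,
    mirroring the sign-based restriction described in the text.
    """
    n = sum(sum(row) for row in counts)
    k, l = len(counts), len(counts[0])
    p = [sum(counts[i]) / n for i in range(k)]                      # allele freqs at locus A
    q = [sum(counts[i][j] for i in range(k)) / n for j in range(l)]  # allele freqs at locus B
    results = {}
    for i in range(k):
        for j in range(l):
            x = counts[i][j] / n          # haplotype frequency X_ij
            d = x - p[i] * q[j]           # deviation from random association
            if d <= 0:
                continue                  # keep positive associations only
            # Maximum attainable D for a positive deviation (assumed Lewontin normalization).
            d_max = min(p[i] * (1 - q[j]), (1 - p[i]) * q[j])
            # Chi-square (1 d.f.) for the collapsed 2 x 2 table.
            chi2 = n * d * d / (p[i] * (1 - p[i]) * q[j] * (1 - q[j]))
            results[(i, j)] = {"D": d, "D_prime": d / d_max, "chi2": chi2}
    return results

example = positive_allelic_gd([[40, 10], [10, 40]])
print(sorted(example))  # allele pairs with positive deviations
```

For the toy table above, only the two diagonal allele pairs show positive deviations, each with D′ = 0.6.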
Table 1. Effective frequencies of recombination (in centimorgans) between the pairs of protein loci studied
The intensity of GD between pairs of alleles at the two loci was measured by the D′ coefficient.
Table 2. Functional relationships among the 15 protein loci studied
The relationship between the magnitude of GD and the EFR between locus pairs was investigated by calculation of the Pearson product-moment correlation coefficient (r) and Kendall's nonparametric coefficient of rank correlation (τ; Sokal and Rohlf 1995), with assessment of significance by one-sided Mantel's matrix-comparison test (Mantel 1967).
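A one-sided Mantel matrix-comparison test of this kind can be sketched as a permutation test. The sketch below is an assumption-laden illustration, not the authors' procedure: the function names are hypothetical, the statistic is the Pearson r between the upper triangles of two symmetric locus-by-locus matrices, and the upper tail is tested (to test for a negative association, one matrix can be negated).

```python
import itertools
import math
import random

def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def upper_triangle(m, order):
    """Upper-triangle entries of symmetric matrix m after relabeling rows/columns by order."""
    pairs = itertools.combinations(range(len(order)), 2)
    return [m[order[i]][order[j]] for i, j in pairs]

def mantel_one_sided(a, b, n_perm=999, seed=0):
    """Upper-tail Mantel test: permute the locus labels of matrix a, keep b fixed."""
    rng = random.Random(seed)
    idx = list(range(len(a)))
    b_vec = upper_triangle(b, idx)
    r_obs = pearson(upper_triangle(a, idx), b_vec)
    hits = 0
    for _ in range(n_perm):
        perm = idx[:]
        rng.shuffle(perm)  # joint permutation of rows and columns
        if pearson(upper_triangle(a, perm), b_vec) >= r_obs:
            hits += 1
    # Count the observed arrangement itself among the permutations.
    return r_obs, (hits + 1) / (n_perm + 1)
```

Permuting rows and columns jointly, rather than shuffling individual cells, preserves the dependence among pairwise values that share a locus, which is the reason a matrix-comparison test is used instead of an ordinary significance test on r.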
Table 3. Deviations from Hardy-Weinberg proportions
RESULTS
Variability and deviations from Hardy-Weinberg proportions: The 15 protein loci studied were slightly to highly polymorphic. A total of 49 alleles were detected in the sample of haplotypes, and 22 (45%) of 49 had a frequency ≤3%. Rare alleles (frequency ≤3%) were distributed across all loci with the exception of To-1, Go, and Lap-D. The number of alleles varied between 2 (To-1, Lsp-2, Go, and Me) and 6 (Xdh), with the mean (±SE) being 3.3 ± 0.3. Unbiased estimates of the expected allelic diversity ranged from 0.01 (Me) to 0.66 (Xdh). The 15 markers had an average estimated heterozygosity of 0.24 ± 0.06. Such levels of variability are in agreement with those expected for these loci in west European populations of D. melanogaster (Girard et al. 1977; Cabrera et al. 1982; David 1982; Singh et al. 1982). There is no evidence of recent founder effects, which could induce GD in the natural population of Santa Cruz de Rivadulla.
We found no consistent evidence of deviations from Hardy-Weinberg proportions in the natural population of Santa Cruz de Rivadulla (i.e., in the sample of genotypes collected in September 1998; Table 3). Except at the To-1 locus, the magnitude of the deviation was in all cases very small and not statistically significant (Table 3). In six of the eight loci, the deviation was positive (i.e., fewer observed homozygotes than expected), but this apparent trend was likewise not statistically significant (sign-test, P > 0.05). The natural population of Santa Cruz de Rivadulla thus seems to behave as a panmictic unit with no indication of population subheterogeneity (which might cause GD). Evidence of fit to Hardy-Weinberg proportions is also a quality control of genotyping (Gomes et al. 1999).
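The sign test mentioned above (six positive deviations among eight loci) is easy to check with an exact binomial calculation. The helper below is a hypothetical sketch; the text does not say whether a one- or two-sided test was used, so the two-sided version is shown as an assumption (the one-sided P is half this value, and both exceed 0.05).

```python
from math import comb

def sign_test_two_sided(n_positive, n_total):
    """Exact two-sided sign test under H0: positive and negative signs equally likely."""
    k = max(n_positive, n_total - n_positive)
    # Upper-tail binomial probability P(X >= k) with success probability 1/2.
    tail = sum(comb(n_total, i) for i in range(k, n_total + 1)) / 2 ** n_total
    return min(1.0, 2 * tail)

print(sign_test_two_sided(6, 8))  # 0.2890625
```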
Interallelic GD: A total of 800 different two-locus haplotypes were detected across all the possible pairwise comparisons (105) of the 15 loci studied. Only 329 allelic combinations with positive deviations from random association were eventually considered in the analyses. These 329 allelic combinations were distributed among 79 pairs of loci. The frequency and the mean intensity of significant interallelic GD within each locus pair are shown in Figure 2. It was found that 72 (22%) of the 329 allelic combinations were in significant GD (P < 0.05), with a mean intensity, as measured by D′, of 0.43.
Further evidence discussed below argues against the possibility that most of the observed interallelic GD may have arisen as a consequence of type I errors. In fact, it is more likely that our estimate of the frequency of GD is an underestimate, because the statistical power of the chi-square test to detect significant deviations from random association is low, even with the large sample sizes used in this study (see Zapata and Alvarez 1992).
Relationship of interallelic GD with recombination: Analysis of the relationship between GD and EFR revealed several interesting observations. First, GD was more frequent between more closely linked locus pairs: the frequency of significant GD was 33% (14/42) for locus pairs separated by EFRs between 0 and 2 cM and 20% (58/287) for locus pairs separated by >2 cM. This difference was statistically significant based on the one-sided chi-square test for 2 × 2 contingency tables with 1 d.f. (χ2 = 3.69, P = 0.03). However, strong GD often extends over considerable chromosomal distances. The EFR between pairs of alleles in GD ranged from 0.1 cM (Est-C × Go, Me × Xdh, and Po × Ao-1) to 20.1 cM (Lsp-1γ × Po and Lsp-1γ × Ao-1), with a mean value of 9.2 ± 0.8 cM. We observed that 32% (23/72) of the significant cases were separated by EFRs within 15–20 cM, with a mean D′ value of 0.49.
Figure 2.—Distribution of interallelic GD between pairs of loci along the third chromosome of D. melanogaster. Frequency (hatched bars) and mean intensity (solid bars) of significant interallelic GD within each locus pair are given as percentages. Data for pairs of loci with negative deviations from random association are not shown.
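The one-sided chi-square comparison reported above for closely versus loosely linked locus pairs (14/42 vs. 58/287) can be reproduced from the stated counts. The function below is a hypothetical sketch: it uses the uncorrected Pearson chi-square for a 2 × 2 table and halves the two-sided P value, assuming the observed difference lies in the predicted direction.

```python
import math

def chi2_2x2_one_sided(sig_a, total_a, sig_b, total_b):
    """Pearson chi-square (1 d.f., no continuity correction) and one-sided P."""
    a, b = sig_a, total_a - sig_a   # group A: significant, not significant
    c, d = sig_b, total_b - sig_b   # group B: significant, not significant
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of chi-square with 1 d.f.: P(X > x) = erfc(sqrt(x / 2)).
    p_two_sided = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_two_sided / 2.0

chi2, p = chi2_2x2_one_sided(14, 42, 58, 287)
print(round(chi2, 2), round(p, 2))  # 3.69 0.03
```

The same helper applied to the counts for functionally related versus unrelated locus pairs (32/90 vs. 40/239) gives a one-sided P below 0.001.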
Interallelic GD dependent on the functional relationship among loci: Table 4 shows that GD was about two times more frequent among functionally related locus pairs than among locus pairs without known functional relationships. Specifically, the percentage of significant interallelic GD was 36% (32/90) for related locus pairs and 17% (40/239) for unrelated pairs. This difference was highly significant (P < 0.001, one-sided chi-square test for 2 × 2 contingency tables). The intensity values are also consistent with this pattern: considering allele pairs showing significant GD, the mean
Figure 3.—Plot of D′ values against effective frequency of recombination (EFR) for significant comparisons.
The above-mentioned general trend was also evident when each particular functional category was considered separately: significant differences in the frequency of GD were detected between unrelated loci and those loci related by the functions of hormonal control (P < 0.001), molybdenum control (P < 0.001), antioxidant defense system (P < 0.01), and reproduction (P < 0.05), with the only exception being glucose metabolism (P = 0.14). In addition, the mean of
Table 4. Interallelic GD dependent on the functional relationships among protein loci
Allele frequency-dependent GD: We next investigated whether GD depends on allele frequency. Following the same arbitrary criterion used in a previous study (Zapata et al. 2001), we subdivided the sample of two-locus haplotypes (329) into RR, RC, and CC, where R and C indicate rarer (frequency ≤3%) and more common (frequency >3%) alleles. Table 5 shows that GD is significantly more frequent in RR haplotypes (89%) than in RC (22%) or CC (18%) haplotypes (P ≪ 0.001), with no significant difference between RC and CC (P = 0.37). An increase in type I errors when one or more expectations in 2 × 2 contingency tables are small may be a source of spurious allele frequency-dependent patterns. In this case, however, the same allele frequency-dependent pattern was maintained when we considered only interallelic GD shown to be significant by the chi-square test with Yates's correction (Table 5). The extremely conservative behavior of this test for small expectations (i.e., it maintains the null hypothesis too frequently; Zapata et al. 2001) makes it very unlikely that the pattern observed is attributable to type I error.
In addition, the pattern cannot be explained by variations in the amount of recombination, since mean EFR values were even higher for RR haplotypes (10.2 ± 2.7 cM) than for RC (8.4 ± 1.2 cM) and CC (9.9 ± 1.3 cM) haplotypes.
Overall two-locus GD: The relative contributions of allelic combinations in GD within each locus pair as a whole were evaluated by estimating overall two-locus GD (Table 6). Significant overall GD was detected in 17 (22%) of the 79 pairs of loci studied, despite application of the highly conservative Bonferroni correction. All loci studied were involved in at least 1 locus pair in significant overall GD, except Aph and Lap-D. It is worth noting that Aph and Lap-D are two loci without apparent functional relationships with any of the other loci studied (see Table 2). The loci involved in the highest relative number of comparisons in significant overall GD (∼44%) were Lsp-1γ, Lsp-2, and Me.
Table 5. Interallelic GD depending on allele frequency
The intensity of significant overall GD between pairs of loci extends practically over the possible range of variation of D′(+). In this case, the possible range of D′(+) varies from 0 to 1, because the most common allele at each locus studied had a frequency >0.5 (see Zapata 2000). Observed D′(+) values ranged from 0.017 (Est-C × Xdh) to 1.0 (Lsp-1γ × Lsp-2 and Est-6 × Me) and averaged 0.31 ± 0.07. Note that overall GD between Est-C and Xdh was significant although its intensity was very weak. This is readily explained, because the pair Est-C × Xdh includes some haplotypes (two out of six) carrying rare alleles in strong GD [
Table 6. Pairs of protein loci in significant overall GD
Overall GD showed similar patterns of dependence on EFR and the functional relationships among loci to those detected for interallelic GD, although statistically this dependence was less consistent (analyses not shown). Naturally, overall GD analyses are less informative, because all the associations within a given locus pair are included in a single intensity measure and significance test, and significant and nonsignificant associations are considered jointly.
DISCUSSION
This study provides the first comprehensive data on the distribution of gametic disequilibrium (GD) between pairs of protein loci along a single chromosome. A first striking result of the study is that GD is frequent between protein loci on the third chromosome of D. melanogaster, since 22% of both allele pairs (72/329) and locus pairs (17/79) were in significant GD. In addition, GD often extends over considerable chromosomal distances: 32% of the interallelic associations involved pairs of loci separated by EFRs of 15–20 cM. This incidence of GD between pairs of protein loci is substantially larger than that previously reported for the same chromosome in other natural populations: for example, 10% of 20 pairs in a population in the United Kingdom (Charlesworth and Charlesworth 1973), 10% of 20 pairs in a population in North Carolina (Langley et al. 1977), 20% of 15 pairs in a population in Japan, and none of 15 pairs in a population in Texas (Langley et al. 1974), with an average EFR for significant pairwise comparisons generally <4 cM. A number of factors may have contributed to the higher frequency of significant GD detected in this study, such as the larger number of haplotypes sampled. However, the use of the sign-based GD estimation method seems to have been the most important factor, since the percentage of significant interallelic associations decreased in our study from 22% to only 8% when both positive and negative interallelic associations were considered. In addition, our results are similar to those obtained by Zapata and Alvarez (1992) in a meta-analysis of a large number of GD studies between pairs of protein loci on the second chromosome of D. melanogaster and the O chromosome of D. subobscura (see Introduction). Taken together, these observations argue against the long-standing paradigm that GD between protein loci is a rare phenomenon, even between very closely linked loci.
The relatively large number of protein loci considered in this study has allowed us to identify the specific factors that primarily determine the distribution of GD along the third chromosome of D. melanogaster. Both interallelic and overall GD were distributed all along the chromosome, involving closely and loosely linked loci on the same arm and on different arms. In addition, the amount (frequency and intensity) of interallelic and overall GD varies markedly among pairs of alleles and pairs of loci. These observations suggest that the factors governing the distribution of GD cannot be described in terms of simple rules. It should be noted that the distribution of GD between rather loosely linked loci along a chromosome always depends on the relative magnitude of two antagonistic forces. Specifically, the GD generated between two loci by some evolutionary force (i.e., selection, genetic drift or bottlenecks, admixture or stratification, and recent mutations) is broken down each generation by recombination during gametogenesis in doubly heterozygous individuals, at a rate determined by the recombination rate. Therefore, GD should tend to decay in proportion to the between-locus recombination rate if the processes generating GD are acting uniformly along the chromosome. Nevertheless, in this study, we did not find any indication that the strength of GD decreases monotonically with increasing between-locus recombination rate. This observation indicates that the evolutionary factor(s) causing the observed GD operate nonuniformly along the chromosome and with sufficient intensity to overcome the high recombination frequencies existing between very distant loci. A common problem in GD studies is that the observed GD can be accounted for by a number of causes besides natural selection, and alternative explanations cannot be totally excluded.
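The expected breakdown of GD by recombination mentioned above can be made concrete with the standard textbook recursion for two loci under random mating and no other evolutionary force. This is offered as an illustrative sketch rather than as part of the original analysis; here r denotes the effective recombination fraction per generation and D_0 the initial disequilibrium.

```latex
D_t = (1 - r)^{t} D_0 ,
\qquad
t_{1/2} = \frac{\ln 2}{-\ln(1 - r)} .
% Worked example with r = 0.09 (an EFR of 9 cM):
% t_{1/2} = 0.693 / 0.0943 \approx 7.3 generations.
```

Under this simple model, disequilibrium between loci separated by an EFR of 9 cM would be halved roughly every seven generations unless some force continually regenerates it.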
Nevertheless, our data set strongly supports the view that epistatic interactions in fitness are of primary importance in determining the amount and distribution patterns of gametic disequilibrium along the third chromosome of D. melanogaster. We found that GD was significantly higher between protein loci related by the functions of hormonal control, molybdenum control, antioxidant defense system, and reproduction than between loci without any known functional relationship. This evidence is strongly suggestive of epistatic interactions between functionally related protein loci. An interpretation of our results based on selection operating on other linked fitness genes—hitchhiking (Thomson 1977), background selection (Charlesworth et al. 1993), or associative overdominance (Ohta and Kimura 1970)—and not on the protein loci themselves can scarcely explain why the related loci are more prone to be in GD: it would be unreasonable to assume a scenario in which only functionally related protein loci have linked fitness genes. Recent migration events could potentially explain GD between very distant loci with different allelic frequencies among populations, even if genotype frequencies conform to HW proportions (Zapata et al. 2001). However, an interpretation of the observed GD based on migration would also require us to assume that functionally related loci are more differentiated among populations and this brings us back to epistasis, because genetic differentiation among populations for functionally related loci is very unlikely without epistatic interactions within each population.
It should be noted that in the past there have been many studies that have found no evidence of epistatic interactions generating GD between loci that are functionally related (e.g., Charlesworth and Charlesworth 1973; Langley et al. 1974, 1977; Charlesworth et al. 1979; Loukas et al. 1979; Ward and McAndrew 1985). GD between related protein loci has been reported only occasionally for particular locus pairs in some natural populations of species such as Mytilus edulis (Mitton and Koehn 1973), D. mojavensis (Zouros and Johnson 1976), and D. subobscura (Zouros and Krimbas 1973; Fontdevila et al. 1983; Zapata et al. 2000). In contrast, our results provide the first evidence showing that extensive GD occurs between functionally related loci along a chromosome. Two-thirds of the alleles in significant GD corresponded to pairs of loci between which functional relationships exist. It is also interesting to note that the pair Lsp-1γ × Lsp-2 is one of the two locus pairs showing the greatest amount of GD in our study, despite being separated by as much as 15.3 cM. In accordance with this result, the Lsp-1γ and Lsp-2 loci code for the most closely related proteins. The functional relationships between Lsp-1γ and Lsp-2 are well documented in D. melanogaster. At the end of the third larval instar, the hormone 20-hydroxy-ecdysone triggers the incorporation of larval serum proteins 1 and 2 into the fat body. These larval serum proteins are delivered to storage granules and serve as an energy and amino acid pool used during metamorphosis (Antoniewski et al. 1995; Burmester et al. 1999). The two proteins serve as nutrient reserves to support metamorphosis and reproduction (Massey et al. 1997) and have similar biochemical properties (Akam et al. 1978).
It has been suggested that the two loci were generated through duplication of a single ancestral gene after separation of the Diptera from other insect orders, with subsequent independent evolution (Roberts and Evans-Roberts 1979; Mousseron-Grall et al. 1997). In addition, the fact that GD was detected between related protein loci separated by very considerable recombination distances (EFR of 9 cM on average) suggests that epistatic interactions along chromosome 3 of D. melanogaster must be very strong. These observations are of considerable relevance to the long-running neutralist-selectionist controversy (Nei 1987; Mitton 1997). Certainly, epistatic interactions involving a large number of protein loci are difficult to reconcile with the neutral theory. The between-locus GD expected as a result of genetic drift depends only on effective population size and recombination frequency (Hill and Robertson 1968); thus GD generated by genetic drift should tend to decrease with increasing recombination. Therefore, genetic drift cannot satisfactorily explain why GD depends mainly on the functional relationships among loci rather than on recombination frequency. Widespread GD among functionally related protein loci points to the importance of multilocus genetics for understanding the evolution of protein variation. Unfortunately, little is yet known about the behavior of multilocus systems under epistatic selection. The general equations describing the interaction between selection and linkage do not have explicit solutions at equilibrium even for the simplest two-locus, two-allele viability model, which forces a heuristic approach based on examining particular selection cases.
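The drift expectation mentioned above can be illustrated numerically. The sketch below is not part of the original analysis. It assumes a widely used approximation, usually attributed to Sved (1971), in which the expected squared allelic correlation under drift alone is about 1 / (1 + 4·Ne·c), where Ne is the effective population size and c the recombination fraction. The value of Ne is a hypothetical placeholder, and map distances in cM are treated as recombination fractions purely for simplicity.

```python
def expected_r2_drift(ne, c):
    """Approximate expected r^2 under drift alone: 1 / (1 + 4 * Ne * c).

    ne: effective population size (assumed, not estimated from the study)
    c:  recombination fraction between the two loci (0 <= c <= 0.5)
    """
    if ne <= 0:
        raise ValueError("effective population size must be positive")
    if not 0.0 <= c <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    return 1.0 / (1.0 + 4.0 * ne * c)


# Hypothetical effective population size, chosen only for illustration.
HYPOTHETICAL_NE = 10_000

# Recombination fractions standing in for tightly and loosely linked loci.
for c in (0.001, 0.01, 0.09, 0.15):
    r2 = expected_r2_drift(HYPOTHETICAL_NE, c)
    print(f"c = {c:<6} expected r^2 under drift = {r2:.6f}")
```

Under this approximation the expected disequilibrium falls steadily as recombination increases, which is why strong GD at distances of several cM is hard to attribute to drift unless the effective population size is very small.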
Analytic and simulation studies of selection on two-locus and multilocus systems were traditionally based on viability models for the two-allele case, assuming constant selection, sex-independent selection, and particular fitness parameter sets (often symmetric selection models), which considerably simplifies the algebra but results in clearly unrealistic models (Bodmer and Felsenstein 1967; Li 1967; Franklin and Lewontin 1970; Hartl and Clark 1997). Multilocus models incorporating sex-dependent variable selection, fertility, and total fitness components, as well as loci with multiple alleles, need to be explored for more realistic interpretations. Clearly, though, the development and implementation of realistic multilocus models pose mathematical and computational problems that at present are effectively intractable.
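To make the kind of simplified model discussed above concrete, the following is a minimal sketch of the standard deterministic recursion for a two-locus, two-allele viability model with constant, sex-independent selection and random mating. It is an illustration only, not a model fitted to the data of this study: the function names and the symmetric viability table are hypothetical, and no cis-trans effect is assumed for double heterozygotes.

```python
# Gametes in the order AB, Ab, aB, ab, coded as (copies of A, copies of B).
GAMETES = [(1, 1), (1, 0), (0, 1), (0, 0)]


def gamete_fitness_matrix(w_geno):
    """Expand a 3x3 genotype viability table into a 4x4 gamete-pair table.

    w_geno[n_a][n_b] is the viability of a genotype carrying n_a copies of
    allele A and n_b copies of allele B.
    """
    return [[w_geno[a1 + a2][b1 + b2] for (a2, b2) in GAMETES]
            for (a1, b1) in GAMETES]


def disequilibrium(x):
    """Gametic disequilibrium D = x_AB * x_ab - x_Ab * x_aB."""
    return x[0] * x[3] - x[1] * x[2]


def next_generation(x, w, c):
    """One generation of viability selection followed by recombination."""
    marginal = [sum(w[i][j] * x[j] for j in range(4)) for i in range(4)]
    mean_w = sum(x[i] * marginal[i] for i in range(4))
    # Recombination in double heterozygotes (viability w[0][3]) removes
    # AB and ab gametes and creates Ab and aB gametes when D > 0.
    shift = c * w[0][3] * disequilibrium(x)
    signs = (-1.0, 1.0, 1.0, -1.0)
    return [(x[i] * marginal[i] + signs[i] * shift) / mean_w for i in range(4)]


# Hypothetical symmetric viabilities, indexed [copies of A][copies of B].
HYPOTHETICAL_VIABILITIES = [
    [0.90, 0.95, 0.90],
    [0.95, 1.00, 0.95],
    [0.90, 0.95, 0.90],
]

w = gamete_fitness_matrix(HYPOTHETICAL_VIABILITIES)
for c in (0.01, 0.09):
    x = [0.30, 0.20, 0.20, 0.30]  # arbitrary starting gamete frequencies
    for _ in range(200):
        x = next_generation(x, w, c)
    print(f"c = {c}: D after 200 generations = {disequilibrium(x):.6f}")
```

Iterating the recursion from different starting frequencies and recombination fractions is the heuristic route referred to in the text: because equilibria generally cannot be written in closed form, particular fitness sets have to be explored case by case.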
One intriguing finding of our study is that a small but important proportion (∼30%) of the alleles in significant GD are at pairs of loci without any known functional relationship. This suggests that some other factor in addition to epistatic selection is generating GD in the natural population of Santa Cruz de Rivadulla. Part of this GD could be explained by the observation that two-locus haplotypes bearing rarer alleles are more frequently in GD than the remaining haplotypes. Recently, an allele frequency-dependent GD pattern was reported between microsatellite loci located on human chromosome 11 (Zapata et al. 2001). This pattern was attributed to the recent origin of the rarer alleles due to the high rates of mutation at human microsatellites (∼10⁻³) and because time is required for new variants to increase their frequency in the population. However, protein loci show rates of mutation (∼10⁻⁵, Powell 1997) about two orders of magnitude lower than human microsatellites, so that other factors may also contribute. Genetic drift cannot be ruled out, taking into account that haplotypes carrying low-frequency alleles are subject to greater stochastic fluctuations due to their relatively smaller effective size. In any case, it should be pointed out that GD between rarer alleles represents only a small fraction (11%) of the total of significant cases. Interestingly, we have found that background selection (Charlesworth et al. 1993) can explain most of these instances of GD between unrelated loci (C. Núñez, T. Velasco and C. Zapata, unpublished results).
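Because the intensity of interallelic GD in this study is expressed through D′ coefficients, a short sketch of Lewontin's standardized coefficient for a single pair of alleles may help in reading the discussion of rare alleles. This is not the sign-based estimation procedure used in the study; it only shows how D′ is obtained from known frequencies. The function name and the frequencies in the example are hypothetical.

```python
def d_prime(p_ij, p_i, q_j):
    """Lewontin's D' for allele i at one locus and allele j at another.

    p_ij: frequency of the haplotype carrying both alleles
    p_i:  frequency of allele i at the first locus
    q_j:  frequency of allele j at the second locus
    Returns D / D_max, which lies between -1 and 1.
    """
    d = p_ij - p_i * q_j
    if d >= 0:
        d_max = min(p_i * (1.0 - q_j), (1.0 - p_i) * q_j)
    else:
        d_max = min(p_i * q_j, (1.0 - p_i) * (1.0 - q_j))
    # A monomorphic locus gives d_max == 0; report no disequilibrium then.
    return d / d_max if d_max > 0 else 0.0


# Hypothetical frequencies: a rare allele (5%) found only with allele j.
print(d_prime(p_ij=0.05, p_i=0.05, q_j=0.50))
# Hypothetical frequencies: two common alleles associated at random.
print(d_prime(p_ij=0.25, p_i=0.50, q_j=0.50))
```

The first case shows why rare alleles deserve separate attention: a rare allele that happens to occur on a single haplotype background reaches the maximum value of D′ even though the absolute disequilibrium D is small.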
The results of our study suggest that epistatic interactions between functionally related protein loci have a considerable impact on the multilocus architecture of the functional genome. Hence, elucidation of proteomics presents us with the challenge of assessing the number and location of related protein loci along chromosomes and the strength of their functional relationships. Genome-wide scans for the presence of GD between protein loci may be a useful approach for unraveling proteomics. One of the best ways to assess the utility of GD for detecting relationships among uncharacterized proteins is to determine whether GD is more likely to occur among protein loci whose functional relationships are known. Our findings suggest that GD is indeed markedly more frequent between functionally related loci than between unrelated loci. GD analyses may thus be useful not only for tentative assignment of uncharacterized polymorphic proteins to groups of functionally related proteins, but also for untangling the complex among-protein interactions that surely govern the behavior of the cell. Nevertheless, the use of GD in proteomics has two main intrinsic limitations. First, only polymorphic proteins can be analyzed. Second, evidence of GD among protein loci is not necessarily a consequence of their functional relationships. It follows that GD analyses can offer only preliminary assessments of functions and interactions, and these would then have to be confirmed by alternative methods. Certainly, further investigation is required to systematically evaluate the potential utility of GD analysis in proteomics.
Acknowledgments
We thank J. Gómez-Márquez for discussion and suggestions concerning the functional relationships among protein loci. We thank R. Montero for his technical assistance. We are grateful to an anonymous reviewer for insightful suggestions. This research was supported by grant PB96-0948 (to C.Z.) from the Ministerio de Educación y Cultura (Spain) and by fellowships from the Consellería de Educación e Ordenación Universitaria of the Xunta de Galicia (Spain) to C.N. and T.V.
Footnotes
- Communicating editor: W. Stephan
- Received March 6, 2002.
- Accepted May 9, 2002.
- Copyright © 2002 by the Genetics Society of America