Protein Variation in ADH and ADH-RELATED in Drosophila pseudoobscura: Linkage Disequilibrium Between Single Nucleotide Polymorphisms and Protein Alleles
Stephen W. Schaeffer (a, b), C. Scott Walthour (1, a), Donna M. Toleno (2, a), Anna T. Olek (3, a), and Ellen L. Miller (4, a)
a Department of Biology, The Pennsylvania State University, University Park, Pennsylvania 16802
b Institute of Molecular Evolutionary Genetics, The Pennsylvania State University, University Park, Pennsylvania 16802
Corresponding author: Stephen W. Schaeffer, Department of Biology, The Pennsylvania State University, 208 Mueller Labs, University Park, PA 16802-5301. E-mail: sws4{at}psu.edu
Communicating editor: G. B. GOLDING
| ABSTRACT |
|---|
A 3.5-kb segment of the alcohol dehydrogenase (Adh) region that includes the Adh and Adh-related genes was sequenced in 139 Drosophila pseudoobscura strains collected from 13 populations. The Adh gene encodes four protein alleles and rejects a neutral model of protein evolution with the McDonald-Kreitman test, although the number of segregating synonymous sites is too high to conclude that adaptive selection has operated. The Adh-related gene encodes 18 protein haplotypes and fails to reject an equilibrium neutral model. The populations fail to show significant geographic differentiation of the Adh-related haplotypes. Eight of 404 single nucleotide polymorphisms (SNPs) in the Adh region were in significant linkage disequilibrium with three ADHR protein alleles. Coalescent simulations with and without recombination were used to derive the expected levels of significant linkage disequilibrium between SNPs and 18 protein haplotypes. Maximum levels of linkage disequilibrium are expected for protein alleles at moderate frequencies. In coalescent models without recombination, linkage disequilibrium decays between SNPs and high frequency haplotypes because common alleles mutate to haplotypes that are rare or that reach moderate frequency. The implication of this study is that linkage disequilibrium mapping has the highest probability of success with disease-causing alleles at frequencies of 10%.
HUMAN diseases can be caused by single genes of large effect or by many genes of small effect. The challenge facing geneticists is to develop methods that can map and clone disease genes using the wealth of nucleotide polymorphism data that has emerged from the human genome project. Linkage and association studies are two approaches that are and will be used to map monogenic and polygenic human diseases.
Association studies determine if there is significant nonrandom transmission of heterozygous DNA markers from parents to affected offspring (e.g., the transmission/disequilibrium test).
The Adh region of Drosophila pseudoobscura is an excellent model for the study of linkage disequilibrium between SNPs and protein phenotypes from population samples. The Adh region encodes two proteins, ADH and ADH-RELATED (ADHR; Fig 1). An ancient gene duplication prior to the Sophophoran radiation gave rise to the two genes.
The levels of nucleotide and protein variation in the Adh region are sufficient for detection of significant linkage disequilibrium. The 3.5-kb Adh region of D. pseudoobscura was found to have 359 SNPs in a sample of 99 strains.
We present here an analysis of protein variation at ADH and ADHR in the alcohol dehydrogenase (Adh) region in 139 strains of D. pseudoobscura. These data were used to test the two loci for departures from an equilibrium neutral model, which can indicate the past action of strong purifying or positive Darwinian selection on the proteins. Positive Darwinian selection, but not purifying selection, can generate significant linkage disequilibrium in a genetic region.
| MATERIALS AND METHODS |
|---|
Nucleotide sequences and GenBank accession numbers:
Fig 1 shows the fine structure of the Adh region as well as the fragment of DNA that was sequenced in this study. A total of 139 sequences of the D. pseudoobscura Adh region were analyzed, including 99 previously published sequences.
- Gundlach-Bundschu Winery, Sonoma, CA (n = 1): GB48 (U64521);
- Kaibab National Forest, AZ (n = 38): PSU228, PS231, PSU232, PSU236, PSU242, PSU246, PSU248, PSU251, PSU254, PSU256, PSU260, PSU267, PSU268, PSU271, PSU273, PSU277, PSU280, PSU295, PSU296, PSU301, PSU308, PSU311, PSU317, PSU320, PSU322, PSU323, PSU329, PSU331, PSU333, PSU334, PSU336, PSU342, PSU343, PSU348, PSU350, PSU352, PSU353, and PSU356 (U64522-U64559);
- Fort Davis, TX (n = 1): PSU367 (U64560);
where n is the population sample size. We used the Adh and Adhr coding sequences of six obscura group species as outgroups for nucleotide sequence comparisons. The GenBank accession numbers for these sequences are as follows: D. miranda (M60998), D. persimilis (M60997), D. subobscura (M55545), D. maderiensis (X60112), D. guanche (X60113), and D. ambigua (X54813).
DNA sequence alignment:
The 139 nucleotide sequences were aligned manually with the Eyeball Sequence Editor (ESEE, version 2.00a; CABOT and BECKENBACH 1989).
Analysis of nucleotide polymorphism:
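Nucleotide diversity in the Adh region was estimated with three approaches. The first method used the number of multisite genotypes or haplotypes to estimate heterozygosity (Equations 8.4 and 8.12). The second method estimated nucleotide heterozygosity per site from pairwise differences, π (Equations 10.5 and 10.7). The third method estimated nucleotide heterozygosity from the number of segregating sites, θ (Equation 1.4).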
The frequency spectrum of segregating sites in a genetic region can be altered by selection, population structure, or population expansion. The frequency spectra for the 17 different domains in the Adh region were tested for significant departures from an equilibrium neutral model with the Tajima test and the Fu and Li test (FU and LI 1993).
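As an illustration of how a frequency-spectrum test of this kind works, the following is a minimal sketch of Tajima's statistic computed from summary values. The function name `tajimas_d` and its inputs are hypothetical and are not taken from the software used in this study; `pi` is assumed to be the mean number of pairwise differences for the whole region (not per site).

```python
import math

def tajimas_d(n, seg_sites, pi):
    """Tajima's D from sample size n, number of segregating sites,
    and mean pairwise differences pi (per region, not per site)."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two estimators of theta, scaled by its standard deviation
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)
```

A negative value indicates that the pairwise estimate is smaller than the segregating-sites estimate, which is the pattern produced by an excess of rare variants.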
Analysis of amino acid sequence polymorphism:
Amino acid replacements in the Adh and Adhr genes were used to infer protein haplotypes for ADH and ADHR. The frequency distribution of protein alleles was tested for departures from an equilibrium neutral model with the Ewens-Watterson (EW) test.
The distribution of amino acid haplotypes into the 13 populations was used to estimate levels of gene flow among populations. SLATKIN's (1985) private alleles method was used to derive an average estimate of the neutral migration parameter.
Linkage disequilibrium between SNPs and amino acid haplotypes:
We tested each segregating site in the Adh region for significant nonrandom association with the ADHR protein haplotypes with Fisher's exact test (FET).
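The association test described above can be sketched for a single SNP and a single protein haplotype as a 2 x 2 table (haplotype present or absent by SNP allele). This is a minimal standard-library sketch under that assumption; the function name `fisher_exact_two_sided` is hypothetical, and the two-sided rule used here (summing all tables no more probable than the observed one) is one common convention that may differ from the one used in the study.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)

# Hypothetical example: a haplotype with 5 copies in 139 strains,
# all 5 carrying a SNP allele that is absent from the other 134 strains.
p_value = fisher_exact_two_sided(5, 0, 0, 134)
```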
| RESULTS |
|---|
Nucleotide diversity in the Adh region:
Table 1 shows the three estimates of genetic diversity for the 17 functional domains of the Adh region. Haplotype diversity varies from a low value of 0.084 ± 0.032 in the Adh larval leader and Adhr exon 1 sequence to a high of 0.994 ± 0.003 in the Adh adult intron. Haplotype diversity was 0.971 ± 0.006 in Adh and 0.995 ± 0.002 in Adhr. Estimates of the nucleotide heterozygosity per site on the basis of pairwise differences (π) varied from 0.001 ± 0.0004 in Adhr exon 1 to 0.086 ± 0.005 in Adh intron 2. Overall, π was lower in Adh (0.004 ± 0.0002) than in Adhr (0.011 ± 0.0002). Estimates of nucleotide heterozygosity based on the number of segregating sites (θ) are higher than π in most regions; however, the two measures of diversity show similar trends in the rank of their values in the 17 functional domains.
Table 1 also shows the results of the Tajima and Fu and Li tests. Tajima's DT was negative in all domains except for intron 2 of Adh. Two of the three exons in Adh showed a significant negative Tajima's DT due to an excess of rare variants. Overall, the Adh gene showed a significant excess of rare variants with the Tajima test. None of the exons of Adhr showed a significant departure from an equilibrium neutral model nor did the total of the Adhr coding region. Fu and Li's DF&L was negative for all functional domains except intron 1 of Adh. As with the Tajima test, two of the three exons in Adh showed a significant excess of external mutations or rare variants. While the Tajima test failed to find significant departures from an equilibrium neutral model in Adhr exons, variation in two of the three exons in Adhr showed a significant excess of external mutations with the Fu and Li test. Overall, both loci showed a significant excess of rare variants with the Fu and Li test.
We used a plot of the cumulative distribution function G(x) to locate clusters of segregating sites along the sequence:

$$G(x_k) = \frac{k}{S} - \frac{x_k}{N}$$
where k is the number of the segregating site, S is the total number of segregating sites, xk is the nucleotide number of the kth segregating site, and N is the total number of nucleotides in the sequence. Fig 2 shows the relationship between nucleotide position and G(x). Regions where segregating sites are clustered show a monotonic increase in G(x), while regions that lack segregating sites show a monotonic decrease in G(x). We used a 6-site running average of the G(x) statistic to smooth the curve shown in Fig 2. The statistical significance of the segments that showed the longest increases or decreases in G(x) was evaluated with a random permutation test. For each simulation, 418 segregating sites were randomly assigned to the 3167 nucleotide sites in the Adh region and G(x) was estimated from the randomized data. The five largest monotonic increases or decreases were estimated from each randomized placement of the variable sites. A total of 100,000 simulations were performed and the significance level was determined by comparing the observed values to that of the rank-ordered simulation outcomes. We found that two sets of segregating sites were significantly clustered in the Adh adult intron, which resulted in large monotonic increases in G(x) (see 1 and 3 in Fig 2). These two segments account for the overall increasing trend of G(x) in the adult intron of Adh. Not only are there clusters of segregating sites in the Adh adult intron, but these polymorphic sites tend to have frequencies higher than those in other regions of Adh (Table 1). There were three large monotonic decreases in G(x) in the Adh region (see 2, 4, and 5 in Fig 2). The first region starts at the 3' end of the Adh adult intron. The second region begins at the 3' transcribed and untranslated leader of Adh and ends in intron 1 of Adhr. The third region covers exon 2 of Adh. 
These three segments tend to have fewer segregating sites because 75% of the nucleotide changes in coding sequences will result in amino acid changes. The reason why the intergenic region between Adh and Adhr is conserved is not clear at this time, but the reduced number of segregating sites may result from conserved regulatory information for the expression of Adhr in the dicistronic message.
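The permutation procedure described above can be sketched as follows. The sketch assumes the form G(x_k) = k/S − x_k/N, which is consistent with the symbol definitions and with the stated behavior (G rises where segregating sites are clustered and falls where they are sparse). It is simplified in two ways that are labeled here as departures from the text: it omits the 6-site running average, and it scores only the single largest monotonic increase rather than the five largest increases or decreases. All function names are hypothetical.

```python
import random

def g_values(positions, n_sites):
    """G(x_k) = k/S - x_k/N for segregating-site positions (1-based)."""
    s = len(positions)
    return [k / s - x / n_sites for k, x in enumerate(sorted(positions), start=1)]

def longest_increase(g):
    """Largest total rise in G over a run of consecutive increases."""
    best = run = 0.0
    for prev, cur in zip(g, g[1:]):
        run = run + (cur - prev) if cur > prev else 0.0
        best = max(best, run)
    return best

def clustering_p_value(positions, n_sites, n_perm=1000, seed=1):
    """Fraction of random placements whose largest rise is at least the observed one."""
    rng = random.Random(seed)
    observed = longest_increase(g_values(positions, n_sites))
    hits = 0
    for _ in range(n_perm):
        # Place the same number of segregating sites at random, without reuse
        shuffled = rng.sample(range(1, n_sites + 1), len(positions))
        if longest_increase(g_values(shuffled, n_sites)) >= observed:
            hits += 1
    return hits / n_perm
```

For the data in this study the call would use 418 segregating sites and 3167 nucleotide sites, with a much larger number of permutations than the default shown here.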
Protein evolution in ADH:
Table 2 shows the three amino acid sites segregating in the ADH protein and the four haplotypes that are generated from these changes. Haplotype 1 is the dominant haplotype with a frequency of 0.978 while the three other haplotypes were all unique. The three unique haplotypes were collected from the Kaibab population, which had the largest sample size. The haplotype diversity in ADH is 0.043 with a corresponding homozygosity value (F) of 0.957, which departs from expectations of the infinite alleles model when tested with the EW test (Table 3). The excess homozygosity in the ADH protein could be due to the recent fixation of an adaptive mutation or to strong purifying selection.
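The homozygosity value reported for ADH can be checked with a short calculation. The haplotype counts below are inferred rather than quoted: a frequency of 0.978 in 139 strains corresponds to 136 copies of haplotype 1, leaving three haplotypes with one copy each. The sketch computes only the observed F; it does not perform the EW test itself, which compares F with its sampling distribution under the infinite alleles model.

```python
def homozygosity(counts):
    """Observed homozygosity F: the sum of squared haplotype frequencies."""
    n = sum(counts)
    return sum((c / n) ** 2 for c in counts)

# Inferred counts: 136 copies of haplotype 1 plus three unique haplotypes (n = 139)
adh_counts = [136, 1, 1, 1]
f_obs = homozygosity(adh_counts)
print(round(f_obs, 3), round(1 - f_obs, 3))  # prints 0.957 0.043
```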
Protein evolution in ADHR:
Table 5 shows the 12 polymorphic amino acid residues and the 18 multisite genotypes or haplotypes for the ADHR protein. We refer to the ADHR haplotypes by the number designation in Table 5. For instance, ADHR 1 refers to haplotype 1. The haplotype diversity in ADHR is 0.804 and an EW test fails to reject the infinite alleles model of neutral evolution (Table 3). In addition, the MK test also fails to reject a neutral model of protein evolution (Table 4). The 12 amino acid changes in ADHR are significantly clustered at the C-terminal end of the protein as indicated by a long monotonic increase in G(x). The probability of observing a monotonic increase in G(x) of at least 0.487 is 0.003 given a 278-amino-acid protein with 12 polymorphic sites. Eleven of the 12 amino acid changes are located in the third exon of Adhr.
Table 6 shows the distribution of the 18 ADHR haplotypes among the 13 populations that were sampled for this study. In populations where only a single allele was sampled, the most common ADHR allele was collected, while unique ADHR haplotypes were found only in the largest population samples. We used a chi-square test of heterogeneity to determine if the frequencies of alleles sampled in each population were significantly different. Given the number of cells with a value of zero, we used a random permutation test to sample ADHR haplotypes with and without replacement to determine the significance of our observed chi-square value. The first test was performed by drawing the 139 ADHR alleles without replacement to fill the frequency distribution table and a chi-square test was performed on the simulated data. We generated 100,000 random tables and determined the significance value of our observed test from the frequency of simulated chi-square values that were greater than our observed test statistic. The second test was performed by drawing ADHR haplotypes from the total observed frequency spectrum with replacement. The chi-square distribution and significance level were estimated with the same method as the first test. In both cases, we failed to reject the null hypothesis of homogeneity of the ADHR frequency spectra among the 13 populations (χ² = 133.128, without replacement P = 0.887; with replacement P = 0.690). SLATKIN's (1985) private allele analysis of ADHR haplotype frequencies estimates Nm to be 3.127.
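The first of the two permutation tests (drawing alleles without replacement) can be sketched as below. Each sampled allele is represented by a haplotype label paired with a population label, and shuffling the haplotype labels keeps both sets of marginal totals fixed. The function names are hypothetical, and the sketch counts simulated values greater than or equal to the observed statistic, a slightly more conservative rule than the strict inequality described in the text.

```python
import random
from collections import Counter

def chi_square(labels, groups):
    """Pearson chi-square for a haplotype-by-population table from paired labels."""
    n = len(labels)
    hap_tot, pop_tot = Counter(labels), Counter(groups)
    cell = Counter(zip(labels, groups))
    stat = 0.0
    for hap, hap_n in hap_tot.items():
        for pop, pop_n in pop_tot.items():
            expected = hap_n * pop_n / n
            stat += (cell[(hap, pop)] - expected) ** 2 / expected
    return stat

def permutation_p(labels, groups, n_perm=1000, seed=1):
    """P-value from shuffling haplotype labels among populations."""
    rng = random.Random(seed)
    observed = chi_square(labels, groups)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # sampling without replacement keeps both margins fixed
        if chi_square(shuffled, groups) >= observed:
            exceed += 1
    return exceed / n_perm
```

The with-replacement variant would instead draw each label independently from the pooled haplotype frequencies, so that only the population sample sizes stay fixed.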
The neighbor-joining method within the MEGA program was used to construct a phylogeny of the Adhr nucleotide sequences (Fig 3).
The ADHR 2 lineage has two sublineages of eight (GB27, GB32, GB41, GB92, GB114, BC93, PS108, and PS139) and three (AH69, AH54, and GB12) strains where Adhr is identical in sequence (indicated by A2 in Fig 3). The eight-sequence sublineage has strains collected from British Columbia and California, while all of the strains in the three-sequence sublineage were collected from California. The two sublineages each have a significant number of nucleotide changes that differentiate them from the other strains and are supported with 96 and 98% of bootstrap replicates, respectively. Phylogenies constructed from nucleotide data from adjacent Adh regions do not cluster the D. pseudoobscura strains by their ADHR designations (data not shown).
We wanted to determine if the estimates of the neutral mutation parameter θ in Adh and Adhr were equivalent in the nine most abundant ADHR haplotypes (Table 7). The ADHR 6 and 8 haplotypes appeared to have low levels of nucleotide diversity per gene in Adhr compared to the other protein alleles. We used coalescent simulations available within DnaSP 3.50 to determine whether the estimates of θ in Adh and Adhr within a haplotype were higher or lower than the overall mean θ value given the sample size for the tested haplotype. For instance, θ per gene in Adhr was estimated to be 27.1 for all haplotypes. The level of variation in ADHR 6 (θ = 2.4) is significantly less than that expected given the 95% confidence interval for the Adhr gene (θ = 5.869.12) given a sample size of five alleles assuming no recombination. We used a sequential Bonferroni test to correct for multiple tests ... θ value of 48.8. None of the other ADHR haplotypes showed a significant reduction or excess of nucleotide heterozygosity.
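A sequential Bonferroni correction of the kind mentioned above can be sketched as a step-down procedure: the smallest p-value is compared with alpha/m, the next with alpha/(m − 1), and so on, stopping at the first failure. This is a generic sketch of that procedure with a hypothetical function name, assuming a table-wide alpha of 0.05; the exact variant applied in the study is not stated in the surviving text.

```python
def sequential_bonferroni(p_values, alpha=0.05):
    """Return one boolean per test: True where it remains significant."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    significant = [False] * m
    for rank, i in enumerate(order):
        # The threshold relaxes as each smaller p-value passes
        if p_values[i] <= alpha / (m - rank):
            significant[i] = True
        else:
            break  # all larger p-values are non-significant
    return significant
```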
Linkage disequilibrium between SNPs and ADHR haplotypes:
We asked whether any of the SNPs in the Adh region were nonrandomly associated with any of the 17 most frequent ADHR haplotypes. There are 404 nucleotide sites that are polymorphic in this sample of 139 strains of D. pseudoobscura, excluding the 14 nucleotide sites that are located in codons responsible for the amino acid replacements in ADHR.
There are 17 independent tests of association with ADHR alleles that can be performed for each biallelic nucleotide site for a total of 6868 possible comparisons (17 × 404). Only 793 of the 6868 comparisons were capable of rejecting the null hypothesis of no association with FET given the observed marginal frequencies of our SNPs and ADHR alleles.
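Whether a comparison is capable of rejecting the null hypothesis depends only on the marginal counts, because with fixed margins FET has a smallest attainable p-value. The sketch below computes that minimum for a haplotype count and a SNP count, and then the fraction of SNP counts for which rejection is possible. The function names, the 0.05 threshold, and the equal weighting of all SNP counts are assumptions made for illustration; the study's own tally used the observed SNP and haplotype frequencies.

```python
from math import comb

def min_fet_p(n_hap, n_snp, n_total):
    """Smallest two-sided FET p-value attainable with these marginal counts."""
    lo = max(0, n_snp - (n_total - n_hap))
    hi = min(n_hap, n_snp)
    denom = comb(n_total, n_snp)
    probs = [comb(n_hap, x) * comb(n_total - n_hap, n_snp - x) / denom
             for x in range(lo, hi + 1)]
    smallest = min(probs)
    # Tables tied with the least likely table contribute to its two-sided p-value
    return sum(p for p in probs if p <= smallest * (1 + 1e-9))

def fraction_testable(n_hap, n_total, alpha=0.05):
    """Fraction of SNP counts (1 .. n_total - 1) that could reach significance."""
    ok = sum(min_fet_p(n_hap, s, n_total) <= alpha for s in range(1, n_total))
    return ok / (n_total - 1)
```

For example, a haplotype sampled once in 139 strains can only reach P ≤ 0.05 when the SNP allele count is at most 6 or at least 133, which is why rare haplotypes contribute few testable comparisons.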
Some general patterns emerge from our analysis of significant linkage disequilibria between SNPs and ADHR haplotypes. Few significant associations were observed, even though 80 of 404 SNPs had frequencies in the range of 5–48% and 5 of 18 ADHR haplotypes had frequencies in the range of 5–37%. None of the nucleotide sites were found to be in significant linkage disequilibrium with the dominant haplotype. ADHR 6 haplotype had the lowest frequency of any allele to show a significant association with a SNP (0.036), while ADHR 2 had the highest frequency of any allele to show a strong association with a nucleotide polymorphism (0.187). Eight of 404 nucleotide sites showed significant associations with three ADHR alleles. Nucleotide 744 had the lowest frequency of bases found to be in significant linkage disequilibrium with an ADHR haplotype (0.021), while base 2891 had the highest frequency (0.482). Nucleotide 2891 was the only polymorphic site to show an association with more than one ADHR haplotype (ADHR 2 and ADHR 3).
We determined the fraction of nonrandom associations between SNPs and ADHR haplotypes expected under a neutral model with mutation and recombination with coalescent simulations.
The observed frequency spectrum of the 18 ADHR haplotypes was similar to those of the replicate samples within the two coalescent simulations. We used the EW homozygosity measure F to summarize the haplotype frequency spectrum of each replicate sample to determine a 95% confidence interval (C.I.) for the recombination (Rec) and no recombination (No Rec) simulations. The observed value of F fell within the 95% C.I. for both simulations, which suggested that the simulations have a haplotype structure similar to that of the Adh region [Observed ADHR, F = 0.196; Rec, mean F = 0.164, 95% C.I. (0.101–0.282); No Rec, mean F = 0.193, 95% C.I. (0.099–0.362)]. For the recombination and no recombination cases, the fraction of significant associations as a function of haplotype frequency had a unimodal distribution with its maximum at the third most frequent haplotype (Fig 4A). The major difference between the recombination and no recombination cases was in the magnitude of percentage associations. As expected, the no recombination distribution showed more nucleotides in linkage disequilibrium with protein haplotypes than did the recombination distribution. The recombination and no recombination distributions showed lower fractions of significant associations with the most frequent protein haplotype than with moderate frequency haplotypes. The magnitude of association between nucleotides and ADHR haplotypes, as measured by r2, was negatively related to haplotype frequency (Fig 4B).
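The r2 measure of association referred to above can be computed from four counts, treating "carries the protein haplotype" and "carries the SNP allele" as two biallelic states. This is a minimal sketch with a hypothetical function name and argument layout; it assumes both states are polymorphic in the sample so that the denominator is nonzero.

```python
def r_squared(n_both, n_hap, n_snp, n_total):
    """Squared correlation between a protein haplotype and a SNP allele."""
    p_hap = n_hap / n_total
    p_snp = n_snp / n_total
    # D: excess of the joint frequency over the product of marginal frequencies
    d = n_both / n_total - p_hap * p_snp
    return d * d / (p_hap * (1 - p_hap) * p_snp * (1 - p_snp))

# Hypothetical example: a SNP allele found on exactly the 5 copies of a haplotype
# in 139 strains gives complete association (r2 = 1).
complete = r_squared(5, 5, 5, 139)
```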
| DISCUSSION |
|---|
Adaptive fixations of amino acids in ADH:
One might be tempted to think that the Adh locus of D. pseudoobscura is the target of a recent adaptive fixation event because of the significant result in the MK test. There are two problems with this inference. First, if one were to polarize the mutations at the three amino acid sites within D. pseudoobscura using the D. miranda sequence, then the amino acid polymorphisms at codon 68 and 185 appear to be near fixation (Table 2). This is likely to be a faulty inference because when the amino acid polymorphism of D. pseudoobscura is compared to the ADH sequences of four ancestral members of the obscura group, then all three amino acid changes in D. pseudoobscura are inferred to be new rare variants. Second, if the Adh locus of D. pseudoobscura was the target of a recent selective sweep, one would expect little synonymous variation within the gene or in the local vicinity. The levels of synonymous variation in Adh and the surrounding region are not consistent with this explanation. The dominant ADH haplotype has 50 synonymous sites segregating within populations of D. pseudoobscura and the introns of Adh have highest levels of synonymous polymorphism in the region (Table 1). These data strongly suggest that the Adh locus harbors little nonsynonymous diversity in D. pseudoobscura because the protein is under strong purifying selection that has prevented new amino acid replacements from increasing to appreciable frequency in the recent past.
The significant MK test results suggest that Adh in D. miranda may have been the target of adaptive evolution in its recent history. An examination of nucleotide sequence diversity within D. miranda will reveal either low levels of synonymous polymorphism, lending support to a recent adaptive fixation within the species, or will show levels of variation similar to those of D. pseudoobscura, suggesting that selection acts episodically on the ADH protein. Protein polymorphism in ADH is generally low within species, yet comparisons of ADH among distantly related species show a modest number of amino acid fixations. For example, D. melanogaster and D. pseudoobscura have low levels of amino acid polymorphism, but 25 amino acid changes have accumulated since the two species shared a common ancestor. This paradox suggests that ADH is usually under strong purifying selection to prevent amino acid substitutions; however, the majority of fixations that occur would seem to result from adaptive fixation events.
Multiple forms of selection act at the Adh locus of D. pseudoobscura:
The high rate of recombination in the Adh region ...
The purifying selection that acts to maintain the ADH amino acid sequence does not reduce genetic diversity of the whole region. One might expect variation at tightly linked neutral sites to be reduced when associated with a locus where amino acid variation is strongly selected against.
Weakly selected synonymous codons of Adh also evolve independently of the purifying selection to preserve the amino acid sequence of ADH. Synonymous codons of Adh are weakly selected to increase the frequency of major or optimal codons that enhance the translational accuracy or efficiency of the transcript.
Strong epistatic selection acts within two of the three Adh introns despite the other types of selection that operate in the region. Two clusters of nucleotide sites that are in significant linkage disequilibrium are maintained in the face of high levels of recombination in the region.
The various forms of selection that act on sequences in the Adh gene do not influence the evolution of the Adhr gene, even though Adhr is translated as part of a dicistronic message (BROGNA and ASHBURNER 1997).
Protein evolution at Adhr:
All but one of the standard tests of selective neutrality failed to reject an equilibrium neutral model for Adhr.
In our sample of ADHR 2 chromosomes, we have eight identical sequences that were sampled from two populations, Gundlach-Bundschu, California, and Stemwinder Provincial Park, British Columbia, Canada (Fig 3). These chromosomes differ at many nucleotide sites along the sequence, which generates a consistent pattern of nonrandom association across the Adh region and a significant bootstrap value for the eight sequences. These data suggest that a subset of the ADHR 2 haplotypes has recently increased in frequency and carried linked neutral sites into high frequency in the northern range of D. pseudoobscura's distribution. The reduction of nucleotide diversity in the Adh gene on the ADHR 2 background is consistent with this rapid increase of the ADHR 2 chromosome (Table 7). This analysis of ADHR haplotypes demonstrates how amino acid and nucleotide sequence data can detect subtle expansions of subtypes that would be missed if only rates of synonymous and nonsynonymous sites are estimated.
The geographic distribution of the 18 ADHR protein alleles was largely dictated by sample size rather than allele age. ADHR 1 is likely to be the oldest allele and it has the widest distribution of all the alleles in our sample; however, ADHR 6 is a relatively new allele that also has a wide distribution. On the other hand, large samples from populations tended to have the highest allelic diversity. A total of 64 ADHR alleles were sampled from the Kaibab National Forest and this population had 15 protein alleles present. When only 1 ADHR allele was collected from the population, 1 of the 2 most common alleles was present in the sample. These data are consistent with the extensive genetic exchange that occurs among D. pseudoobscura populations.
Linkage disequilibrium between SNPs and ADHR haplotypes:
It has been suggested that SNPs will provide a powerful tool to map complex phenotypes in the human genome. The ADHR locus is a useful model for a disease-causing locus where different phenotypes are generated from a single locus with multiple alleles.
The observed frequencies of SNPs in significant linkage disequilibrium with each of the ADHR haplotypes are within the 95% confidence limits expected under a neutral model with mutation and recombination (Table 8). These data suggest that low levels of linkage disequilibrium can be detected in a region that experiences moderate to high rates of recombination.
We did not observe any significant associations with the most frequent ADHR haplotype. This might be expected given that the most frequent haplotype is likely to be the oldest allele in the population and recombination would have sufficient time to break up any nonrandom associations that are introduced by mutation. This trend is observed in the coalescent simulations with mutation and recombination where the third most frequent haplotype had the maximum fraction of nonrandom associations (Fig 4A). As expected, the fraction of comparisons that showed significant association was much higher for the no recombination case than the recombination case; however, the maximum level of SNP associations was observed with the third most frequent haplotype rather than with the protein allele with the greatest frequency (Fig 4A).
Several factors reduce the numbers of significant association with the dominant haplotype. First, the probability of rejecting random association between a SNP and a protein haplotype with FET depends on the protein haplotype frequency. Fig 5 shows the percentage of FET tables that are capable of rejecting the null model of no association for different protein haplotype frequencies. The percentage of significant FET was determined for each protein haplotype frequency by integrating over all SNP frequencies given a sample of 139 alleles. This power curve shows that FET has the most power to reject the null hypothesis of no association for a haplotype with 13 copies in a sample of 139 (13/139 = 9.3%). There is a decline in power to reject the null hypothesis with lower and higher frequencies. This power curve suggests that the reason the fraction of significant linkage disequilibria is lower for the most common allele in a neutral coalescent without recombination is that FET has less power to reject the null hypothesis for high frequency alleles (Fig 4A).
The power curve in Fig 5 assumes that all FET 2 x 2 tables are equally likely in a coalescent simulation; however, some 2 x 2 tables may be more likely in neutral genealogies than others depending on how protein haplotypes coalesce within trees. To observe significant linkage disequilibrium, the mutation that gives rise to a SNP must occur right before or soon after the coalescence of a protein haplotype. The probability of this co-occurrence will be a function of the coalescence time of the protein haplotype and the probability of mutations occurring near the origin of a new allele. Rare protein haplotypes will tend to coalesce earlier in a neutral genealogy compared to moderate or high frequency variants and the probability that a mutation will co-occur on the branch that gives rise to a rare variant will be extremely small.
The coalescent genealogies used in this study were used to determine how much linkage disequilibrium is expected given a neutral model with and without recombination. In this analysis, we imposed a hierarchical structure on the genealogy that simulated a genetic locus with multiple alleles. The ramifications of these results are that (1) linkage disequilibrium between a protein allele and a SNP can be broken down by mutation of one major protein allele to a rare or moderate frequency allele; (2) moderate frequency alleles will have the highest likelihood of being in significant linkage disequilibrium with a SNP; and (3) gene regions that experience moderate recombination rates can show nonrandom associations between protein alleles and SNPs, although fewer associations will be observed than in regions with lower genetic exchange rates.
Implications for association studies in disease gene mapping:
Several conclusions can be drawn from the linkage disequilibrium analysis of SNPs and ADHR haplotypes. First, recombination reduces the fraction of polymorphic sites that show associations with a disease-causing gene, but significant linkage disequilibrium can be observed as a result of mutation and random genetic drift. The significant associations that are observed in a region of moderate to high levels of recombination are likely to indicate candidate regions with a disease-causing gene nearby. Second, linkage disequilibrium studies will be most effective in detecting disease alleles at moderate frequencies. If a genetic disease is caused by a high frequency allele, then linkage disequilibrium mapping will not be an effective tool for locating the disease gene. This will not be a problem because disease-causing alleles usually exist at moderate to low frequencies in human populations. Linkage disequilibrium studies are probably not going to be effective in detecting rare alleles because significant associations will occur only if a new mutation occurs on the same chromosome as a disease-causing allele. Linkage disequilibrium mapping offers the most promise of mapping disease genes with a frequency of 10% in a sample size of 100 genes. This results from the power of FET to detect significant associations and the genealogical history of disease-causing alleles.
| FOOTNOTES |
|---|
1 Present address: 29 Stevenson Ave., Everett, MA 02149.
2 Present address: Department of Mathematical Sciences, Villanova University, St. Augustine Ctr., Rm. 305, 800 Lancaster Ave., Villanova, PA 19085-1699.
3 Present address: Department of Botany and Plant Pathology, Purdue University, Lilly Hall, Rm. 1-414, West Lafayette, IN 47907.
4 Present address: Department of Genetics, Cell Biology and Development, Rm. 6-160 Jackson Hall, 321 Church St. SE, Minneapolis, MN 55455.
Stephen W. Schaeffer dedicates this article to the memory of Joyce L. Schaeffer (April 4, 1927–March 18, 2001).
| ACKNOWLEDGMENTS |
|---|
We thank David J. Begun, Andrew G. Clark, Charles H. Langley, and Thomas S. Whittam for helpful discussions during the preparation of this article. We thank two anonymous reviewers for providing constructive criticisms that improved the article. This work was supported by grant GM-42472 from the National Institutes of Health.
Manuscript received April 27, 2001; Accepted for publication July 25, 2001.
| LITERATURE CITED |
|---|
AKASHI, H. and S. W. SCHAEFFER, 1997 Natural selection and the frequency distributions of "silent" polymorphism in Drosophila. Genetics 146:295-307.
ALLISON, D. B., 1997 Transmission-disequilibrium tests for quantitative traits. Am. J. Hum. Genet. 60:676-690.
AQUADRO, C. F., S. F. DEESE, M. M. BLAND, C. H. LANGLEY, and C. C. LAURIE-AHLBERG, 1986 Molecular population genetics of the alcohol dehydrogenase gene region of Drosophila melanogaster. Genetics 114:1165-1190.
BELL, D. A. and J. A. TAYLOR, 1997 Genetic analysis of complex disease. Science 275:1327-1328.
BENYAJATI, C., N. SPOEREL, H. HAYMERLE, and M. ASHBURNER, 1983 The messenger RNA for alcohol dehydrogenase in Drosophila melanogaster differs in its 5' end in different developmental stages. Cell 33:125-133.
BROGNA, S. and M. ASHBURNER, 1997 The Adh-related gene of Drosophila melanogaster is expressed as a functional dicistronic messenger RNA: multigenic transcription in higher organisms. EMBO J. 16:2023-2031.
CABOT, E. L. and A. T. BECKENBACH, 1989 Simultaneous editing of multiple nucleic acid and protein sequences with ESEE. Comput. Appl. Biosci. 5:233-234.
CHARLESWORTH, B., M. T. MORGAN, and D. CHARLESWORTH, 1993 The effect of deleterious mutations on neutral molecular variation. Genetics 134:1289-1303.
EWENS, W. J., 1972 The sampling theory of selectively neutral alleles. Theor. Popul. Biol. 3:87-112.
FU, Y.-X. and W.-H. LI, 1993 Statistical tests of neutrality of mutations. Genetics 133:693-709.
HILL, W. G. and A. ROBERTSON, 1968 Linkage disequilibrium in finite populations. Theor. Appl. Genet. 38:226-231.
HUDSON, R. R., 1987 Estimating the recombination parameter of a finite population model without selection. Genet. Res. 50:245-250.
HUDSON, R. R., 1990 Gene genealogies and the coalescent process. Oxf. Surv. Evol. Biol. 7:1-44.
HUDSON, R. R. and N. L. KAPLAN, 1995 Deleterious background selection with recombination. Genetics 141:1605-1617.
JENKINS, T. M., C. J. BASTEN, and W. W. ANDERSON, 1996 Mitochondrial gene divergence of Colombian Drosophila pseudoobscura. Mol. Biol. Evol. 13:1266-1275.
JORDE, L. B., 1995 Linkage disequilibrium as a gene-mapping tool. Am. J. Hum. Genet. 56:11-14.
KIRBY, D. A., S. V. MUSE, and W. STEPHAN, 1995 Maintenance of pre-mRNA secondary structure by epistatic selection. Proc. Natl. Acad. Sci. USA 92:9047-9051.
KOVACEVIC, M. and S. W. SCHAEFFER, 2000 Molecular population genetics of X-linked genes in Drosophila pseudoobscura. Genetics 156:155-172.
KREITMAN, M., 1983 Nucleotide polymorphism at the alcohol dehydrogenase locus of Drosophila melanogaster. Nature 304:412-417.
KUMAR, S., K. TAMURA, and M. NEI, 1994 MEGA: molecular evolutionary genetics analysis. Comput. Appl. Biosci. 10:189-191.
LEWONTIN, R. C., 1995 The detection of linkage disequilibrium in molecular sequence data. Genetics 140:377-388.
LONG, A. D. and C. H. LANGLEY, 1999 The power of association studies to detect the contribution of candidate genetic loci to variation in complex traits. Genome Res. 9:720-731.
LONG, A. D., M. N. GROTE, and C. H. LANGLEY, 1997 Genetic analysis of complex diseases. Science 275:1328.
MCDONALD, J. H. and M. KREITMAN, 1991 Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652-654.
MICKLE, J. E. and G. R. CUTTING, 2000 Genotype-phenotype relationships in cystic fibrosis. Med. Clin. North Am. 84:597-607.
NACHMAN, M. W., W. M. BROWN, M. STONEKING, and C. F. AQUADRO, 1996 Nonneutral mitochondrial DNA variation in humans and chimpanzees. Genetics 142:953-963.
NEI, M., 1987 Molecular Evolutionary Genetics. Columbia University Press, New York.
NORDBERG, M., B. CHARLESWORTH, and D. CHARLESWORTH, 1996 The effect of recombination on background selection. Genet. Res. 67:159-174.
RICE, W. R., 1989 Analyzing tables of statistical tests. Evolution 43:223-225.
RISCH, N., 1990 Linkage strategies for genetically complex traits. I. Multilocus models. Am. J. Hum. Genet. 46:222-228.
RISCH, N. and K. MERIKANGAS, 1996 The future of genetic studies of complex human diseases. Science 273:1516-1517.
ROZAS, J. and R. ROZAS, 1999 DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15:174-175.
RUSSO, C. A., N. TAKEZAKI, and M. NEI, 1995 Molecular phylogeny and divergence times of drosophilid species. Mol. Biol. Evol. 12:391-404.
SAITOU, N. and M. NEI, 1987 The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425.
SCHAEFFER, S. W. and C. F. AQUADRO, 1987 Nucleotide sequence of the Adh gene region of Drosophila pseudoobscura: evolutionary change and evidence for an ancient gene duplication. Genetics 117:61-73.
SCHAEFFER, S. W. and E. L. MILLER, 1991 Nucleotide sequence analysis of Adh genes estimates the time of geographic isolation of the Bogota population of Drosophila pseudoobscura. Proc. Natl. Acad. Sci. USA 88:6097-6101.
SCHAEFFER, S. W. and E. L. MILLER, 1992a Estimates of gene flow in Drosophila pseudoobscura determined from nucleotide sequence analysis of the alcohol dehydrogenase region. Genetics 132:471-480.
SCHAEFFER, S. W. and E. L. MILLER, 1992b Molecular population genetics of an electrophoretically monomorphic protein in the alcohol dehydrogenase region of Drosophila pseudoobscura. Genetics 132:163-178.
SCHAEFFER, S. W. and E. L. MILLER, 1993 Estimates of linkage disequilibrium and the recombination parameter determined from segregating nucleotide sites in the alcohol dehydrogenase region of Drosophila pseudoobscura. Genetics 135:541-552.
SCRIVER, C. R. and P. J. WATERS, 1999 Monogenic traits are not simple: lessons from phenylketonuria. Trends Genet. 15:267-272.
SHIELDS, D. C., P. M. SHARP, D. G. HIGGINS, and F. WRIGHT, 1988 "Silent" sites in Drosophila genes are not neutral: evidence of selection among synonymous codons. Mol. Biol. Evol. 5:704-716.
SLATKIN, M., 1985 Rare alleles as indicators of gene flow. Evolution 39:53-65.
SOKAL, R. R. and F. J. ROHLF, 1981 Biometry. W. H. Freeman, New York.




