Genetics, Vol. 159, 673-687, October 2001, Copyright © 2001

Protein Variation in ADH and ADH-RELATED in Drosophila pseudoobscura: Linkage Disequilibrium Between Single Nucleotide Polymorphisms and Protein Alleles

Stephen W. Schaeffer (a, b), C. Scott Walthour (1, a), Donna M. Toleno (2, a), Anna T. Olek (3, a), and Ellen L. Miller (4, a)
a Department of Biology, The Pennsylvania State University, University Park, Pennsylvania 16802
b Institute of Molecular Evolutionary Genetics, The Pennsylvania State University, University Park, Pennsylvania 16802

Corresponding author: Stephen W. Schaeffer, Department of Biology, The Pennsylvania State University, 208 Mueller Labs, University Park, PA 16802-5301. E-mail: sws4@psu.edu

Communicating editor: G. B. GOLDING


*  ABSTRACT

A 3.5-kb segment of the alcohol dehydrogenase (Adh) region that includes the Adh and Adh-related genes was sequenced in 139 Drosophila pseudoobscura strains collected from 13 populations. The Adh gene encodes four protein alleles and rejects a neutral model of protein evolution with the McDonald-Kreitman test, although the number of segregating synonymous sites is too high to conclude that adaptive selection has operated. The Adh-related gene encodes 18 protein haplotypes and fails to reject an equilibrium neutral model. The populations fail to show significant geographic differentiation of the Adh-related haplotypes. Eight of 404 single nucleotide polymorphisms (SNPs) in the Adh region were in significant linkage disequilibrium with three ADHR protein alleles. Coalescent simulations with and without recombination were used to derive the expected levels of significant linkage disequilibrium between SNPs and 18 protein haplotypes. Maximum levels of linkage disequilibrium are expected for protein alleles at moderate frequencies. In coalescent models without recombination, linkage disequilibrium decays between SNPs and high frequency haplotypes because common alleles mutate to haplotypes that are rare or that reach moderate frequency. The implication of this study is that linkage disequilibrium mapping has the highest probability of success with disease-causing alleles at frequencies of 10%.


Human diseases can be caused by single genes of large effect or by many genes of small effect. The challenge facing geneticists is to develop methods that can map and clone disease genes using the wealth of nucleotide polymorphism data that has emerged from the human genome project. Linkage and association studies are two approaches that are and will be used to map monogenic and polygenic human diseases (RISCH 1990; SPIELMAN et al. 1993). Linkage studies use the transmission of DNA-based markers and disease traits within families to estimate the relative risks of getting a disease compared to the population at large (RISCH 1990). Linkage analysis is very effective in mapping traits of large effect and high penetrance; however, linkage methods require that large numbers of families be surveyed as the relative risk factor for a trait declines (RISCH and MERIKANGAS 1996).

Association studies determine if there is significant nonrandom transmission of heterozygous DNA markers from parents to affected offspring (e.g., the transmission/disequilibrium test of ALLISON 1997; SPIELMAN et al. 1993). This method takes advantage of the statistical association of two polymorphic markers, or linkage disequilibrium, to determine if a region contains a disease-causing gene. Linkage disequilibrium develops when a new nucleotide mutation arises on the same chromosome as a disease allele (JORDE 1995). Linkage disequilibrium decays as a function of the recombination fraction between two segregating markers. The tighter the linkage of two sites, the longer it takes for linkage disequilibrium to decay. The advantage of this approach is that smaller numbers of families need to be examined for disease traits with smaller risk levels (RISCH and MERIKANGAS 1996); however, it can take a longer time to collect appropriate family samples, especially in the case of late-onset diseases (BELL and TAYLOR 1997). A genomic survey of linkage disequilibrium in population samples rather than families has been suggested as a method to detect associations between single nucleotide polymorphisms (SNPs) within candidate genes and complex phenotypes (BELL and TAYLOR 1997; LONG et al. 1997; LONG and LANGLEY 1999). Linkage disequilibrium within population samples, however, can lead to spurious associations among unlinked loci when there is significant population admixture (SPIELMAN et al. 1993).

The Adh region of Drosophila pseudoobscura is an excellent model for the study of linkage disequilibrium between SNPs and protein phenotypes from population samples. The Adh region encodes two proteins, ADH and ADH-RELATED (ADHR; Fig. 1). An ancient gene duplication prior to the Sophophoran radiation gave rise to the two genes (SCHAEFFER and AQUADRO 1987; RUSSO et al. 1995). The Adh coding sequence is transcribed from two promoters that are regulated in a developmental and tissue-specific manner (BENYAJATI et al. 1983). Adhr is not transcribed as a unique mRNA but is expressed as part of a dicistronic message produced from the Adh promoters (BROGNA and ASHBURNER 1997).



Figure 1. The Adh region of D. pseudoobscura showing the two genes located in the segment, Adh and Adhr. The Adh gene is transcribed from two promoters during development, a larval promoter that is proximal to the coding region and an adult promoter that is distal to the coding region. Adhr is expressed as a result of read-through transcription of the Adh messages (BROGNA and ASHBURNER 1997).

The levels of nucleotide and protein variation in the Adh region are sufficient for detection of significant linkage disequilibrium. The 3.5-kb Adh region of D. pseudoobscura was found to have 359 SNPs in a sample of 99 strains (SCHAEFFER and MILLER 1993) and 20% of these sites have a frequency of 0.05 or greater. The Adh gene is virtually monomorphic for protein variation; however, the Adhr gene is segregating for 18 protein alleles that vary in frequency from 1 to 37% (this study and SCHAEFFER and MILLER 1992B). The populations from across the range of D. pseudoobscura are not significantly differentiated, which removes population admixture as a potential cause of linkage disequilibrium (SCHAEFFER and MILLER 1992B). The recombination rate in the Adh region is moderate to high so that linkage disequilibrium is expected to decay rapidly in this region (SCHAEFFER and MILLER 1993). Thus, any significant associations between nucleotides and protein alleles in our population samples are likely to result from selection or recent origin of protein alleles.

We present here an analysis of protein variation at ADH and ADHR in the alcohol dehydrogenase (Adh) region in 139 strains of D. pseudoobscura. These data were used to test the two loci for departures from an equilibrium neutral model, which can indicate the past action of strong purifying or positive Darwinian selection on the proteins. Positive Darwinian selection, but not purifying selection, can generate significant linkage disequilibrium in a genetic region (KREITMAN 1983; AQUADRO et al. 1986). We analyzed the population distribution of protein alleles at the two loci to ensure that protein alleles were not significantly differentiated among populations. Finally, we tested SNPs in the Adh region for significant linkage disequilibrium with ADHR protein haplotypes.


*  MATERIALS AND METHODS

Nucleotide sequences and GenBank accession numbers:
Fig. 1 shows the fine structure of the Adh region as well as the fragment of DNA that was sequenced in this study. A total of 139 sequences of the D. pseudoobscura Adh region were analyzed, including 99 previously published sequences (SCHAEFFER and MILLER 1993) and 40 unpublished sequences. The 40 new sequences were determined with previously described methods (SCHAEFFER and MILLER 1991, 1992A, 1992B, 1993). The population location, strain names, and the EMBL/GenBank Data Library accession numbers for the 99 previously published sequences can be found in SCHAEFFER and MILLER (1993). The population location, strain names, and the EMBL/GenBank Data Library accession numbers for the 40 unpublished nucleotide sequences are as follows:

  • Gundlach-Bundschu Winery, Sonoma, CA (n = 1): GB48 (U64521);

  • Kaibab National Forest, AZ (n = 38): PSU228, PS231, PSU232, PSU236, PSU242, PSU246, PSU248, PSU251, PSU254, PSU256, PSU260, PSU267, PSU268, PSU271, PSU273, PSU277, PSU280, PSU295, PSU296, PSU301, PSU308, PSU311, PSU317, PSU320, PSU322, PSU323, PSU329, PSU331, PSU333, PSU334, PSU336, PSU342, PSU343, PSU348, PSU350, PSU352, PSU353, and PSU356 (U64522-U64559);

  • Fort Davis, TX (n = 1): PSU367 (U64560);

where n is the population sample size. We used the Adh and Adhr coding sequences of six obscura group species as outgroups for nucleotide sequence comparisons. The GenBank accession numbers for these sequences are as follows: D. miranda (M60998), D. persimilis (M60997), D. subobscura (M55545), D. madeirensis (X60112), D. guanche (X60113), and D. ambigua (X54813).

DNA sequence alignment:
The 139 nucleotide sequences were aligned manually with the Eyeball Sequence Editor (ESEE, version 2.00a; CABOT and BECKENBACH 1989). The alignments were determined by minimizing the number of mismatches and gaps assumed in the sequences. Any nucleotide site with two or more bases present in any population was defined as a segregating site or polymorphism. Insertion and deletion polymorphisms were excluded from all analyses and will be considered in a later article. Segregating sites found within insertions or deletions were also excluded from all analyses. The aligned sequences are available from the POPSET database within GenBank or from the author in other file formats.

Analysis of nucleotide polymorphism:
Nucleotide diversity in the Adh region was estimated with three approaches. The first method used the number of multisite genotypes or haplotypes to estimate heterozygosity (Equations 8.4 and 8.12 in NEI 1987). The second approach used the number of pairwise differences to estimate nucleotide heterozygosity per site (π; Equations 10.5 and 10.7, NEI 1987). The last method used the number of segregating sites to estimate nucleotide heterozygosity per site (θ; Equation 1.4, WATTERSON 1975). Levels of diversity were estimated for 17 regions that correspond to the noncoding, coding, and intron domains of Adh and Adhr.
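The three diversity estimates can be illustrated with a minimal sketch. It assumes the usual unbiased forms (haplotype diversity n(1 - sum of squared frequencies)/(n - 1), mean pairwise differences per site, and S divided by the harmonic sum a1); the function names and the toy alignment are hypothetical and are not taken from the study or from DnaSP.

```python
from collections import Counter
from itertools import combinations

def segregating_sites(seqs):
    """Indices of alignment columns that carry two or more bases."""
    return [i for i in range(len(seqs[0])) if len({s[i] for s in seqs}) > 1]

def haplotype_diversity(seqs):
    """Unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(seqs)
    counts = Counter(seqs)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))

def pi_per_site(seqs):
    """Mean number of pairwise differences, divided by alignment length."""
    n, length = len(seqs), len(seqs[0])
    total = sum(sum(a != b for a, b in zip(s1, s2))
                for s1, s2 in combinations(seqs, 2))
    return total / (n * (n - 1) / 2) / length

def watterson_theta_per_site(seqs):
    """S / a1 per site, where a1 = 1 + 1/2 + ... + 1/(n-1)."""
    n, length = len(seqs), len(seqs[0])
    a1 = sum(1.0 / i for i in range(1, n))
    return len(segregating_sites(seqs)) / a1 / length

# Toy alignment (hypothetical data, gaps already removed).
toy = ["AAAA", "AAAT", "AATT", "AAAA"]
print(haplotype_diversity(toy), pi_per_site(toy), watterson_theta_per_site(toy))
```

An excess of rare variants pulls π below θ, which is the contrast the frequency-spectrum tests exploit.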

The frequency spectrum of segregating sites in a genetic region can be altered by selection, population structure, or population expansion. The frequency spectra for the 17 different domains in the Adh region were tested for significant departures from an equilibrium neutral model with the TAJIMA (1989) and the FU and LI (1993) tests. The outgroup sequence from D. miranda was used to polarize the mutations within D. pseudoobscura. A significant negative value of these test statistics can indicate purifying selection, population subdivision, or population expansion. A significant positive value of either test statistic can indicate some form of balancing selection. The DnaSP 3.50 (ROZAS and ROZAS 1999) software package was used to complete these analyses.
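As an illustration of how such a statistic is built, the following sketch computes Tajima's D from the sample size, the number of segregating sites, and the mean number of pairwise differences, using the standard published constants. It is not the DnaSP implementation, and the input values in the example are hypothetical.

```python
import math

def tajimas_d(n, s, k_hat):
    """Tajima's D from sample size n, segregating sites s, and mean pairwise differences k_hat."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two estimates of theta, scaled by its standard deviation.
    return (k_hat - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))

# Hypothetical input: 10 sequences, 16 segregating sites, 3.888889 mean pairwise differences.
print(tajimas_d(10, 16, 3.888889))
```

A negative value arises when pairwise differences are low relative to the number of segregating sites, that is, when rare variants are in excess.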

Analysis of amino acid sequence polymorphism:
Amino acid replacements in the Adh and Adhr genes were used to infer protein haplotypes for ADH and ADHR. The frequency distribution of protein alleles was tested for departures from an equilibrium neutral model with the Ewens-Watterson (EW; EWENS 1972; WATTERSON 1978) and the McDonald-Kreitman (MK; MCDONALD and KREITMAN 1991) tests. The EW method compares F, the observed sum of the haplotype frequencies squared, to that expected given an infinite alleles model with a known sample size and number of alleles. Significantly large values of F indicate purifying selection while significantly low values suggest balancing selection. The MK test was used to determine whether the protein evolution in Adh and Adhr departed from neutral expectations. The Adh and Adhr loci were each tested to determine if the ratio of nonsynonymous to synonymous changes was the same for fixed differences and polymorphisms. A single sequence from D. miranda was used as the outgroup for this test. Observing a significant difference in the ratio of nonsynonymous to synonymous variation between divergent and polymorphic sites can indicate adaptive fixations at amino acid sites (MCDONALD and KREITMAN 1991) or relaxed selection within species (NACHMAN et al. 1996).
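Both tests reduce to simple quantities, sketched below under stated assumptions. The homozygosity F is the sum of squared haplotype frequencies; the ADH counts used here (136, 1, 1, 1) are those implied by a dominant haplotype at frequency 0.978 plus three unique haplotypes among 139 strains. The MK comparison is shown as a two-sided Fisher's exact test on a 2 x 2 table, which is one common choice and not necessarily the test statistic used in the study; the MK counts are hypothetical placeholders because Table 4 is not reproduced here.

```python
from math import comb

def observed_homozygosity(counts):
    """Ewens-Watterson F: sum of squared haplotype frequencies."""
    n = sum(counts)
    return sum((c / n) ** 2 for c in counts)

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P for the 2 x 2 table [[a, b], [c, d]]."""
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# ADH haplotype counts implied by the text: one common haplotype and three singletons.
print(round(observed_homozygosity([136, 1, 1, 1]), 3))

# MK layout: rows = fixed, polymorphic; columns = nonsynonymous, synonymous.
# These counts are hypothetical and only show how the table is arranged.
print(fisher_exact_two_sided(7, 2, 3, 50))
```

The EW test itself additionally requires the expected distribution of F under the infinite alleles model for the given sample size and number of alleles, which this sketch does not attempt.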

The distribution of amino acid haplotypes into the 13 populations was used to estimate levels of gene flow among populations. SLATKIN's (1985) private alleles method was used to derive an average estimate of the neutral migration parameter.

Linkage disequilibrium between SNPs and amino acid haplotypes:
We tested each segregating site in the Adh region for significant nonrandom association with the ADHR protein haplotypes with Fisher's exact test (FET; SOKAL and ROHLF 1981). Only comparisons of a SNP and protein allele capable of generating a significant result with FET were performed (LEWONTIN 1995). A sequential Bonferroni correction was used to overcome the multiple comparison problem (RICE 1989). The strength of linkage disequilibrium was assessed with the r² coefficient (HILL and ROBERTSON 1968). Coalescent simulations with and without recombination were used to determine how often significant nonrandom associations of nucleotides and ADHR protein haplotypes are expected given an infinite sites model (HUDSON 1990).
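Two pieces of this procedure are easy to sketch: the r² coefficient computed from a 2 x 2 table of chromosome counts, and a sequential Bonferroni procedure in the step-down form usually attributed to Holm. The function names, the table layout, and the example P values are assumptions made for illustration.

```python
def r_squared(a, b, c, d):
    """r^2 for a 2 x 2 table of chromosome counts.

    a: haplotype present, SNP allele present   b: haplotype present, SNP allele absent
    c: haplotype absent, SNP allele present    d: haplotype absent, SNP allele absent
    """
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return (a * d - b * c) ** 2 / denom if denom else 0.0

def sequential_bonferroni(pvals, alpha=0.05):
    """Step-down correction: the i-th smallest P is compared with alpha / (k - i + 1)."""
    k = len(pvals)
    reject = [False] * k
    order = sorted(range(k), key=lambda i: pvals[i])
    for rank, idx in enumerate(order):
        if pvals[idx] <= alpha / (k - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger P values also fail
    return reject

print(r_squared(10, 0, 0, 10))                              # complete association
print(sequential_bonferroni([0.001, 0.04, 0.03, 0.0125]))   # hypothetical P values
```

The step-down form is less conservative than dividing alpha by the total number of tests for every comparison, while still controlling the table-wide error rate.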


*  RESULTS

Nucleotide diversity in the Adh region:
Table 1 shows the three estimates of genetic diversity for the 17 functional domains of the Adh region. Haplotype diversity varies from a low value of 0.084 ± 0.032 in the Adh larval leader and Adhr exon 1 sequence to a high of 0.994 ± 0.003 in the Adh adult intron. Haplotype diversity was 0.971 ± 0.006 in Adh and 0.995 ± 0.002 in Adhr. Estimates of the nucleotide heterozygosity per site on the basis of pairwise differences (π) varied from 0.001 ± 0.0004 in Adhr exon 1 to 0.086 ± 0.005 in Adh intron 2. Overall, π was lower in Adh (0.004 ± 0.0002) than in Adhr (0.011 ± 0.0002). Estimates of nucleotide heterozygosity based on the number of segregating sites (θ) are higher than π in most regions; however, the two measures of diversity show similar trends in the rank of their values in the 17 functional domains.


 
Table 1. Estimates of genetic diversity in the Adh region of D. pseudoobscura

Table 1 also shows the results of the Tajima and Fu and Li tests. Tajima's D_T was negative in all domains except for intron 2 of Adh. Two of the three exons in Adh showed a significant negative Tajima's D_T due to an excess of rare variants. Overall, the Adh gene showed a significant excess of rare variants with the Tajima test. None of the exons of Adhr showed a significant departure from an equilibrium neutral model nor did the total of the Adhr coding region. Fu and Li's D_F&L was negative for all functional domains except intron 1 of Adh. As with the Tajima test, two of the three exons in Adh showed a significant excess of external mutations or rare variants. While the Tajima test failed to find significant departures from an equilibrium neutral model in Adhr exons, variation in two of the three exons in Adhr showed a significant excess of external mutations with the Fu and Li test. Overall, both loci showed a significant excess of rare variants with the Fu and Li test.

We used the plot of the cumulative distribution function G(x) suggested by TANG and LEWONTIN (1999) to look for significant clustering of segregating nucleotide sites in the Adh region. G(x) is defined as

$$G(x_k) = \frac{k}{S} - \frac{x_k}{N},$$

where k is the number of the segregating site, S is the total number of segregating sites, xk is the nucleotide number of the kth segregating site, and N is the total number of nucleotides in the sequence. Fig 2 shows the relationship between nucleotide position and G(x). Regions where segregating sites are clustered show a monotonic increase in G(x), while regions that lack segregating sites show a monotonic decrease in G(x). We used a 6-site running average of the G(x) statistic to smooth the curve shown in Fig 2. The statistical significance of the segments that showed the longest increases or decreases in G(x) was evaluated with a random permutation test. For each simulation, 418 segregating sites were randomly assigned to the 3167 nucleotide sites in the Adh region and G(x) was estimated from the randomized data. The five largest monotonic increases or decreases were estimated from each randomized placement of the variable sites. A total of 100,000 simulations were performed and the significance level was determined by comparing the observed values to that of the rank-ordered simulation outcomes. We found that two sets of segregating sites were significantly clustered in the Adh adult intron, which resulted in large monotonic increases in G(x) (see 1 and 3 in Fig 2). These two segments account for the overall increasing trend of G(x) in the adult intron of Adh. Not only are there clusters of segregating sites in the Adh adult intron, but these polymorphic sites tend to have frequencies higher than those in other regions of Adh (Table 1). There were three large monotonic decreases in G(x) in the Adh region (see 2, 4, and 5 in Fig 2). The first region starts at the 3' end of the Adh adult intron. The second region begins at the 3' transcribed and untranslated leader of Adh and ends in intron 1 of Adhr. The third region covers exon 2 of Adh. 
These three segments tend to have fewer segregating sites because 75% of the nucleotide changes in coding sequences will result in amino acid changes. The reason why the intergenic region between Adh and Adhr is conserved is not clear at this time, but the reduced number of segregating sites may result from conserved regulatory information for the expression of Adhr in the dicistronic message.
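The clustering analysis can be sketched as follows, assuming the form G(x_k) = k/S - x_k/N implied by the symbol definitions above. This simplified version finds only the single largest monotonic increase and omits the 6-site running average and the five-largest-runs bookkeeping; the function names and toy positions are hypothetical.

```python
import random

def g_values(positions, n_sites):
    """G(x_k) = k/S - x_k/N for segregating sites at 1-based positions."""
    s = len(positions)
    return [k / s - x / n_sites for k, x in enumerate(sorted(positions), start=1)]

def longest_monotonic_increase(g):
    """Largest total rise in G over a run of consecutive increasing values."""
    best, run_start = 0.0, g[0]
    for prev, cur in zip(g, g[1:]):
        if cur > prev:
            best = max(best, cur - run_start)
        else:
            run_start = cur  # the run is broken; a new one starts here
    return best

def permutation_p(positions, n_sites, reps=1000, seed=0):
    """Fraction of random placements whose largest increase matches or exceeds the observed one."""
    rng = random.Random(seed)
    observed = longest_monotonic_increase(g_values(positions, n_sites))
    hits = 0
    for _ in range(reps):
        sim = rng.sample(range(1, n_sites + 1), len(positions))
        if longest_monotonic_increase(g_values(sim, n_sites)) >= observed:
            hits += 1
    return observed, hits / reps

# Toy example: five tightly clustered sites followed by two isolated ones.
print(permutation_p([10, 11, 12, 13, 14, 500, 900], 1000))
```

G rises between neighboring segregating sites whenever their spacing is smaller than the average spacing N/S, which is why clusters appear as monotonic increases.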



Figure 2. The cumulative distribution function G(x) plotted vs. nucleotide position. The locations of the two genes in the Adh region are shown at the top. The dashed line shows where G(x) = 0.0 falls on the curve. The five segments shown at the bottom indicate the five largest monotonic increases and decreases in G(x) in the Adh region. The number above each segment gives the rank of the absolute value of the monotonic increase or decrease. The probability of obtaining each of these increases or decreases in G(x) is < 0.001.

Protein evolution in ADH:
Table 2 shows the three amino acid sites segregating in the ADH protein and the four haplotypes that are generated from these changes. Haplotype 1 is the dominant haplotype with a frequency of 0.978 while the three other haplotypes were all unique. The three unique haplotypes were collected from the Kaibab population, which had the largest sample size. The haplotype diversity in ADH is 0.043 with a corresponding homozygosity value (F) of 0.957, which departs from expectations of the infinite alleles model when tested with the EW test (Table 3). The excess homozygosity in the ADH protein could be due to the recent fixation of an adaptive mutation or to strong purifying selection (SCHAEFFER and MILLER 1992B). We used the MK test to discriminate between these two alternative hypotheses. Table 4 shows the results of the MK test where the D. miranda Adh sequence was used as the outgroup sequence. The Adh locus rejects a neutral model of protein evolution with the MK test because too few synonymous fixed differences have accumulated between D. pseudoobscura and D. miranda compared to the numbers of nonsynonymous changes.


 
Table 2. ADH amino acid haplotypes within D. pseudoobscura and six sibling species within the obscura group


 
Table 3. Ewens-Watterson test for two loci in the Adh region of D. pseudoobscura


 
Table 4. McDonald-Kreitman test for the Adh and Adhr loci in D. pseudoobscura

Protein evolution in ADHR:
Table 5 shows the 12 polymorphic amino acid residues and the 18 multisite genotypes or haplotypes for the ADHR protein. We refer to the ADHR haplotypes by the number designation in Table 5. For instance, ADHR 1 refers to haplotype 1. The haplotype diversity in ADHR is 0.804 and an EW test fails to reject the infinite alleles model of neutral evolution (Table 3). In addition, the MK test also fails to reject a neutral model of protein evolution (Table 4). The 12 amino acid changes in ADHR are significantly clustered at the C-terminal end of the protein as indicated by a long monotonic increase in G(x) (TANG and LEWONTIN 1999). The change in G(x) from amino acid position 200 to 273 is 0.487. The probability of finding a run ≥0.487 is 0.003 given a 278-amino-acid protein with 12 polymorphic sites. Eleven of the 12 amino acid changes are located in the third exon of Adhr.


 
Table 5. ADHR amino acid haplotypes within D. pseudoobscura and six sibling species within the obscura group

Table 6 shows the distribution of the 18 ADHR haplotypes among the 13 populations that were sampled for this study. In populations where only a single allele was sampled, the most common ADHR allele was collected, while unique ADHR haplotypes were found only in the largest population samples. We used a chi-square test of heterogeneity to determine if the frequencies of alleles sampled in each population were significantly different. Given the number of cells with a value of zero, we used a random permutation test to sample ADHR haplotypes with and without replacement to determine the significance of our observed chi-square value. The first test was performed by drawing the 139 ADHR alleles without replacement to fill the frequency distribution table and a chi-square test was performed on the simulated data. We generated 100,000 random tables and determined the significance value of our observed test from the frequency of simulated chi-square values that were greater than our observed test statistic. The second test was performed by drawing ADHR haplotypes from the total observed frequency spectrum with replacement. The chi-square distribution and significance level were estimated with the same method as the first test. In both cases, we failed to reject the null hypothesis of homogeneity of the ADHR frequency spectra among the 13 populations (χ² = 133.128, without replacement P = 0.887; with replacement P = 0.690). SLATKIN's (1985) private allele analysis of ADHR haplotype frequencies estimates Nm to be 3.127.
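The two randomization schemes can be sketched as below. The sketch assumes one population label and one haplotype label per strain, shuffles haplotype labels for the without-replacement test, and resamples them from the pooled spectrum for the with-replacement test. It counts simulated values greater than or equal to the observed one, a slightly more conservative convention than strictly greater; the function names and toy data are hypothetical.

```python
import random
from collections import Counter

def chi_square(pops, haps):
    """Chi-square heterogeneity statistic for a population x haplotype table."""
    n = len(pops)
    pop_tot, hap_tot = Counter(pops), Counter(haps)
    obs = Counter(zip(pops, haps))
    stat = 0.0
    for p, n_p in pop_tot.items():
        for h, n_h in hap_tot.items():
            expected = n_p * n_h / n
            stat += (obs[(p, h)] - expected) ** 2 / expected
    return stat

def permutation_test(pops, haps, reps=2000, replace=False, seed=1):
    """Return the observed chi-square and its randomization P value."""
    rng = random.Random(seed)
    observed = chi_square(pops, haps)
    hits = 0
    for _ in range(reps):
        if replace:
            sim = rng.choices(haps, k=len(haps))  # draw from the pooled spectrum
        else:
            sim = haps[:]
            rng.shuffle(sim)                      # keep haplotype totals fixed
        if chi_square(pops, sim) >= observed:
            hits += 1
    return observed, hits / reps

# Toy data: two populations that share no haplotypes.
pops = ["A"] * 10 + ["B"] * 10
haps = ["1"] * 10 + ["2"] * 10
print(permutation_test(pops, haps))
print(permutation_test(pops, haps, replace=True))
```

Randomization avoids relying on the asymptotic chi-square distribution, which is unreliable when many cells have expected counts near zero.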


 
Table 6. Distribution of ADHR protein haplotypes among 13 populations of D. pseudoobscura

The neighbor-joining method within the MEGA program (SAITOU and NEI 1987; KUMAR et al. 1994) was used to reconstruct the relationships among the ADHR alleles (Fig. 3). Bootstrap replication was used to test the significance of clusters within the genealogy. The nucleotide data from the three exons of Adhr have remarkable phylogenetic signal given how much recombination has been estimated to occur in this region (SCHAEFFER and MILLER 1993; Fig. 3). Several pieces of evidence suggest that ADHR 1 is the ancestral protein haplotype. First, ADHR 1 has a single-amino-acid difference from the D. miranda protein allele. Second, ADHR 1 is the most abundant protein allele in our sample. Third, the phylogeny shows that ADHR 1 is the most ancestral allele based on the polymorphic nucleotides segregating in the sample.



Figure 3. Neighbor-joining phylogeny of ADHR haplotypes. The phylogeny was constructed with Kimura two-parameter distances estimated from the 834 coding nucleotides of Adhr. The statistical significance of clusters within the tree was tested with 500 bootstrap replicates. All bootstrap values >75% are presented in the phylogeny. Nucleotide sites within Adhr were used to construct the phylogeny. The inset shows the symbols used for the 10 major ADHR haplotypes that have a frequency greater than one in the sample. The vertical lines on the right indicate alleles that are described in more detail in the text.
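For readers unfamiliar with the distance measure named in the Figure 3 legend, the following sketch computes the Kimura two-parameter distance between two aligned sequences from the proportions of transitions (P) and transversions (Q). It is an independent illustration, not the MEGA code, and the sequences are hypothetical.

```python
import math

PURINES = {"A", "G"}

def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance: -0.5*ln(1 - 2P - Q) - 0.25*ln(1 - 2Q)."""
    # Compare only columns where both sequences carry an unambiguous base.
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a in "ACGT" and b in "ACGT"]
    n = len(pairs)
    # A transition keeps the base class (purine or pyrimidine); a transversion changes it.
    transitions = sum(1 for a, b in pairs
                      if a != b and (a in PURINES) == (b in PURINES))
    transversions = sum(1 for a, b in pairs
                        if a != b and (a in PURINES) != (b in PURINES))
    p, q = transitions / n, transversions / n
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)

# One transition (A <-> G) in ten sites; the corrected distance slightly exceeds 0.1.
print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

The correction matters little at the low divergences typical of within-species comparisons, where the distance is close to the raw proportion of differences.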

The ADHR 2 lineage has two sublineages of eight (GB27, GB32, GB41, GB92, GB114, BC93, PS108, and PS139) and three (AH69, AH54, and GB12) strains where Adhr is identical in sequence (indicated by A2 in Fig. 3). The eight-sequence sublineage has strains collected from British Columbia and California, while all of the strains in the three-sequence sublineage were collected from California. The two sublineages each have a significant number of nucleotide changes that differentiate them from the other strains and are supported with 96 and 98% of bootstrap replicates, respectively. Phylogenies constructed from nucleotide data from adjacent Adh regions do not cluster the D. pseudoobscura strains by their ADHR designations (data not shown).

We wanted to determine if the estimates of the neutral mutation parameter θ in Adh and Adhr were equivalent in the nine most abundant ADHR haplotypes (Table 7). The ADHR 6 and 8 haplotypes appeared to have low levels of nucleotide diversity per gene in Adhr compared to the other protein alleles. We used coalescent simulations available within DnaSP 3.50 (ROZAS and ROZAS 1999) to test whether the estimates of θ in Adh and Adhr within a haplotype were higher or lower than the overall mean θ value given the sample size for the tested haplotype. For instance, θ per gene in Adhr was estimated to be 27.1 for all haplotypes. The level of variation in ADHR 6 (θ = 2.4) is significantly less than that expected given the 95% confidence interval for the Adhr gene (θ = 5.8–69.12) given a sample size of five alleles assuming no recombination. We used a sequential Bonferroni test to correct for multiple tests (RICE 1989). The results of these analyses show that the Adh gene within ADHR 2 has less nucleotide diversity than expected given a θ value of 48.8. None of the other ADHR haplotypes showed a significant reduction or excess of nucleotide heterozygosity.


 
Table 7. Nucleotide heterozygosity per gene (θ) within Adh and Adhr based on the classification of nine ADHR haplotypes

Linkage disequilibrium between SNPs and ADHR haplotypes:
We asked whether any of the SNPs in the Adh region were nonrandomly associated with any of the 17 most frequent ADHR haplotypes. There are 404 nucleotide sites that are polymorphic in this sample of 139 strains of D. pseudoobscura, excluding the 14 nucleotide sites that are located in codons responsible for the amino acid replacements in ADHR.

There are 17 independent tests of association with ADHR alleles that can be performed for each biallelic nucleotide site for a total of 6868 possible comparisons. Only 793 of the 6868 comparisons were capable of rejecting the null hypothesis of no association with FET given the observed marginal frequencies of our SNPs and ADHR alleles (LEWONTIN 1995). Table 8 shows the number of SNP comparisons that were in significant association with the 17 most frequent ADHR protein alleles. Only 8 of the 404 nucleotide sites were in significant linkage disequilibrium with at least one of the ADHR protein haplotypes (nucleotide sites 744, 956, 1454, 2742, 2843, 2891, 2912, and 3089). Five of these 8 nucleotide sites are significantly clustered in the Adhr gene when the TANG and LEWONTIN (1999) method is used [sites 2742, 2843, 2891, 2912, and 3089 have the longest monotonic increase in G(x) = 0.389, P < 0.001]. The other three nucleotides (744, 956, and 1454) are located 5' to the Adhr gene and are separated from the beginning of the haplotype-generating bases by 1824, 1612, and 1114 bases, respectively. The average r² (HILL and ROBERTSON 1968) values tended to increase as the frequency of the ADHR haplotype decreased. This is not surprising given that the power to detect significant linkage disequilibrium decreases as allele frequency decreases (LEWONTIN 1995); however, even low frequency variants can be in significant linkage disequilibrium if the level of association is large enough.
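The screening step can be illustrated with a sketch that asks, for fixed marginal counts, what the smallest two-sided Fisher's exact P value could possibly be. A comparison is worth testing only if that minimum falls below the chosen threshold. This is one reading of the filter described above, not a reproduction of the published procedure; the function names and the nominal 0.05 threshold are assumptions.

```python
from math import comb

def table_probability(n, hap_count, snp_count, x):
    """Hypergeometric probability that x chromosomes carry both the haplotype and the SNP allele."""
    return comb(hap_count, x) * comb(n - hap_count, snp_count - x) / comb(n, snp_count)

def min_attainable_p(n, hap_count, snp_count):
    """Smallest two-sided Fisher's exact P over all tables with these margins."""
    lo = max(0, hap_count + snp_count - n)
    hi = min(hap_count, snp_count)
    probs = [table_probability(n, hap_count, snp_count, x) for x in range(lo, hi + 1)]
    best = 1.0
    for p_obs in probs:
        # Two-sided P: total probability of tables no more likely than this one.
        p_two_sided = sum(p for p in probs if p <= p_obs * (1 + 1e-9))
        best = min(best, p_two_sided)
    return best

total_comparisons = 404 * 17                 # 6868 SNP-by-haplotype comparisons
print(total_comparisons)
print(min_attainable_p(139, 1, 1) < 0.05)    # singleton haplotype, singleton SNP
print(min_attainable_p(139, 1, 10) < 0.05)   # singleton haplotype, SNP allele seen 10 times
```

Removing comparisons that cannot reach significance reduces the number of tests entering the multiple-comparison correction.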


 
Table 8. Significant associations of SNPs with the 17 most frequent ADHR protein haplotypes

Some general patterns emerge from our analysis of significant linkage disequilibria between SNPs and ADHR haplotypes. Few significant associations were observed, even though 80 of 404 SNPs had frequencies in the range of 5–48% and 5 of 18 ADHR haplotypes had frequencies in the range of 5–37%. None of the nucleotide sites were found to be in significant linkage disequilibrium with the dominant haplotype. ADHR 6 haplotype had the lowest frequency of any allele to show a significant association with a SNP (0.036), while ADHR 2 had the highest frequency of any allele to show a strong association with a nucleotide polymorphism (0.187). Eight of 404 nucleotide sites showed significant associations with three ADHR alleles. Nucleotide 744 had the lowest frequency of bases found to be in significant linkage disequilibrium with an ADHR haplotype (0.021), while base 2891 had the highest frequency (0.482). Nucleotide 2891 was the only polymorphic site to show an association with more than one ADHR haplotype (ADHR 2 and ADHR 3).

We determined the fraction of nonrandom associations between SNPs and ADHR haplotypes expected under a neutral model with mutation and recombination with coalescent simulations. The method of HUDSON (1990) was used to generate samples of gametes that reflect a Wright-Fisher model given our observed values of the mutation parameter 4N_eμ = 68 (WATTERSON 1975), recombination parameter 4N_ec = 307 (HUDSON 1987), and sample size n = 139. A coalescent simulation replicate was included in our sample if 12 segregating sites within the sample generated 18 haplotypes. Initially, we tried to match the observed haplotype frequency distribution exactly, but we decided to use less stringent criteria when none of the simulated data sets matched the ADHR frequency spectrum. For each simulation replicate, FET was used to test all polymorphic nucleotide sites for significant nonrandom associations with the 17 most frequent haplotypes and the sequential Bonferroni method was used to overcome the multiple comparison problem (RICE 1989) within the replicate. For the 17 most frequent haplotypes, we estimated the fraction of comparisons with a significant FET and the average r² values (HILL and ROBERTSON 1968). We performed a second coalescent simulation with mutation and no recombination as a control to determine the effect that recombination has on the distribution of percentage of significant tests.
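The no-recombination control can be sketched with a basic coalescent under the infinite sites model. The sketch assumes time is measured in units of 4N generations, so that k lineages coalesce at rate k(k - 1) and mutations fall on a branch of length t at rate θt. It is a simplified stand-in rather than Hudson's program, it does not handle recombination, and it does not apply the replicate-acceptance criteria described above; all function names are hypothetical.

```python
import random

def poisson(rng, lam):
    """Poisson draw obtained by counting unit-rate arrivals in [0, lam)."""
    count, t = 0, rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count

def simulate_sample(n, theta, rng):
    """Return one set of sample indices per segregating site (carriers of the derived base)."""
    lineages = [frozenset([i]) for i in range(n)]
    sites = []
    while len(lineages) > 1:
        k = len(lineages)
        t = rng.expovariate(k * (k - 1))        # waiting time to the next coalescence
        for lineage in lineages:
            for _ in range(poisson(rng, theta * t)):
                sites.append(lineage)           # mutation inherited by all descendants
        i, j = rng.sample(range(k), 2)          # pick two lineages to merge
        merged = lineages[i] | lineages[j]
        lineages = [l for idx, l in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
    return sites

def count_haplotypes(n, sites):
    """Number of distinct multisite haplotypes among the n sampled chromosomes."""
    return len({tuple(i in site for site in sites) for i in range(n)})

rng = random.Random(42)
sites = simulate_sample(20, 5.0, rng)
print(len(sites), count_haplotypes(20, sites))
```

Each simulated replicate could then be passed through the same association tests as the observed data to build the expected distribution of significant comparisons.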

The observed frequency spectrum of the 18 ADHR haplotypes was similar to those of the replicate samples within the two coalescent simulations. We used the EW homozygosity measure F to summarize the haplotype frequency spectrum of each replicate sample to determine a 95% confidence interval (C.I.) for the recombination (Rec) and no recombination (No Rec) simulations. The observed value of F fell within the 95% C.I. for both simulations, which suggested that the simulations have a haplotype structure similar to that of the Adh region [Observed ADHR, F = 0.196; Rec, mean F = 0.164, 95% C.I. (0.101–0.282); No Rec, mean F = 0.193, 95% C.I. (0.099–0.362)]. For the recombination and no recombination cases, the fraction of significant associations as a function of haplotype frequency had a unimodal distribution with its maximum at the third most frequent haplotype (Fig. 4A). The major difference between the recombination and no recombination cases was in the magnitude of percentage associations. As expected, the no recombination distribution showed more nucleotides in linkage disequilibrium with protein haplotypes than did the recombination distribution. The recombination and no recombination distributions showed lower fractions of significant associations with the most frequent protein haplotype than with moderate frequency haplotypes. The magnitude of association between nucleotides and ADHR haplotypes, as measured by r², was negatively related to haplotype frequency (Fig. 4B).



Figure 4. Linkage disequilibrium between SNPs and ADHR haplotypes. A1–A14 represent ADHR haplotypes 1–14 with A1 being the most frequent haplotype and A14 being the least frequent haplotype. (A) Relationship between protein allele frequency and the fraction of valid linkage disequilibrium tests that were significant with and without recombination. (B) Relationship between protein allele frequency and mean r² value.


*  DISCUSSION

Adaptive fixations of amino acids in ADH:
One might be tempted to think that the Adh locus of D. pseudoobscura is the target of a recent adaptive fixation event because of the significant result in the MK test. There are two problems with this inference. First, if one were to polarize the mutations at the three amino acid sites within D. pseudoobscura using the D. miranda sequence, then the amino acid polymorphisms at codons 68 and 185 would appear to be near fixation (Table 2). This is likely to be a faulty inference because when the amino acid polymorphism of D. pseudoobscura is compared to the ADH sequences of four ancestral members of the obscura group, all three amino acid changes in D. pseudoobscura are inferred to be new rare variants. Second, if the Adh locus of D. pseudoobscura were the target of a recent selective sweep, one would expect little synonymous variation within the gene or in the local vicinity. The levels of synonymous variation in Adh and the surrounding region are not consistent with this explanation. The dominant ADH haplotype has 50 synonymous sites segregating within populations of D. pseudoobscura and the introns of Adh have the highest levels of synonymous polymorphism in the region (Table 1). These data strongly suggest that the Adh locus harbors little nonsynonymous diversity in D. pseudoobscura because the protein is under strong purifying selection that has prevented new amino acid replacements from increasing to appreciable frequency in the recent past.

The significant MK test results suggest that Adh in D. miranda may have been the target of adaptive evolution in its recent history. An examination of nucleotide sequence diversity within D. miranda will either reveal low levels of synonymous polymorphism, lending support to a recent adaptive fixation within the species, or show levels of variation similar to those of D. pseudoobscura, suggesting that selection acts episodically on the ADH protein. Protein polymorphism in ADH is generally low within species, yet comparisons of ADH among distantly related species show a modest number of amino acid fixations. For example, D. melanogaster and D. pseudoobscura have low levels of amino acid polymorphism, but 25 amino acid changes have accumulated since the two species shared a common ancestor. This paradox suggests that ADH is usually under strong purifying selection to prevent amino acid substitutions; however, the majority of fixations that occur would seem to result from adaptive fixation events (MCDONALD and KREITMAN 1991).

Multiple forms of selection act at the Adh locus of D. pseudoobscura:
The high rate of recombination in the Adh region (SCHAEFFER and MILLER 1993) has allowed opposing selective forces to act on the Adh and Adhr genes and on the subdomains within each locus. Strong purifying selection acts on the nonsynonymous polymorphisms of Adh. The frequency spectrum of the amino acid variation strongly supports this argument because the three nonsynonymous changes in Adh each have a frequency of one in our sample. If the Adh gene were located in a mutational cold spot, one might expect to see low levels of synonymous and nonsynonymous changes in the gene. This is not the case. The Adh locus has accumulated 53 synonymous changes within species and four nonsynonymous fixations since the common ancestor of D. pseudoobscura and D. miranda.

The purifying selection that acts to maintain the ADH amino acid sequence does not reduce genetic diversity of the whole region. One might expect variation at tightly linked neutral sites to be reduced when associated with a locus where amino acid variation is strongly selected against (CHARLESWORTH et al. 1993). This is not the case because the two small Adh introns have levels of diversity significantly higher than those of the Adh exons, despite being flanked on either side by the strongly conserved exons of Adh. The likely explanation for the independent evolution of different regions is that the recombination rate in this region is sufficient to allow the decoupling of evolutionary forces between the intron and exon domains of Adh (HUDSON and KAPLAN 1995; NORDBERG et al. 1996).

Weakly selected synonymous codons of Adh also evolve independently of the purifying selection to preserve the amino acid sequence of ADH. Synonymous codons of Adh are weakly selected to increase the frequency of major or optimal codons that enhance the translational accuracy or efficiency of the transcript (AKASHI and SCHAEFFER 1997). This type of selection reflects a balance between selection to increase the usage of optimal codons and to decrease the frequency of nonoptimal codons. The high rate of recombination in the Adh region allows weak selection to act to increase the frequency of optimal codons in Adh, despite the strong purifying selection against amino acid mutations in the ADH protein.

Strong epistatic selection acts within two of the three Adh introns despite the other types of selection that operate in the region. Two clusters of nucleotide sites that are in significant linkage disequilibrium are maintained in the face of high levels of recombination in the region (SCHAEFFER and MILLER 1993; KIRBY et al. 1995). KIRBY et al. 1995 suggested that these clusters of nucleotide sites represent sequences for two alternative stem loop structures that are polymorphic within D. pseudoobscura populations. The selection that acts to maintain the association among nucleotide sites does so without being influenced by selection in the Adh gene.

The various forms of selection that act on sequences in the Adh gene do not influence the evolution of the Adhr gene, even though Adhr is translated as part of a dicistronic message (BROGNA and ASHBURNER 1997). Selection against amino acid variation at Adh is stronger than at Adhr (Table 2, Table 4, and Table 5). Amino acid variation at Adhr does not reject an equilibrium neutral model. Weak selection to increase the use of optimal codons is stronger at Adh than at Adhr (AKASHI and SCHAEFFER 1997). The lower codon bias in Adhr is expected given its lower level of expression compared to Adh (SHIELDS et al. 1988; BROGNA and ASHBURNER 1997). Thus, the high levels of recombination in the Adh region have allowed a variety of selective processes to act on the different genes and subdomains.

Protein evolution at Adhr:
All but one of the standard tests of selective neutrality (EWENS 1972; WATTERSON 1978; TAJIMA 1989; MCDONALD and KREITMAN 1991) failed to reject an equilibrium neutral model for Adhr. The genetic diversity of Adhr suggests that the amino acid polymorphism in this locus is quite old. ADHR 7 alleles cluster with both ADHR 2 and ADHR 3 alleles, suggesting that ADHR 7 has existed long enough for recombination to shuffle nucleotide diversity of the lineage. The ADHR 6 alleles are monophyletic in the phylogeny, yet these five alleles were sampled from across the geographic range of the species: British Columbia, California, Arizona, Colorado, and Mexico. For these alleles to coalesce into a common ancestral lineage, we must assume at least four migration events. This estimate of gene flow based on the private alleles of ADHR is consistent with previous estimates from nucleotide polymorphism data at the Adh locus (SCHAEFFER and MILLER 1992a), mitochondrial loci (JENKINS et al. 1996), and X-linked loci (KOVACEVIC and SCHAEFFER 2000). In addition, the ADHR 6 haplotypes show evidence for genetic exchange. The ADHR locus is identical in nucleotide sequence in the five ADHR 6 haplotypes, but the regions upstream from ADHR are divergent in sequence, suggesting recombination of an ADHR 6 haplotype with one of the other protein haplotypes. The observed recombination among ADHR haplotypes and the lack of population structure at Adhr suggest that the ADHR polymorphism is quite old.

In our sample of ADHR 2 chromosomes, we have eight identical sequences that were sampled from two populations, Gundlach-Bundshu, California, and Stemwinder Provincial Park, British Columbia, Canada (Fig 3). These eight chromosomes differ from the other chromosomes in the sample at many nucleotide sites along the sequence, which generates a consistent pattern of nonrandom association across the Adh region and a significant bootstrap value for the eight sequences. These data suggest that a subset of the ADHR 2 haplotypes has recently increased in frequency and carried linked neutral sites into high frequency in the northern range of D. pseudoobscura's distribution. The reduction of nucleotide diversity in the Adh gene on the ADHR 2 background is consistent with this rapid increase of the ADHR 2 chromosome (Table 7). This analysis of ADHR haplotypes demonstrates how amino acid and nucleotide sequence data can detect subtle expansions of subtypes that would be missed if only rates of synonymous and nonsynonymous sites are estimated.

The geographic distribution of the 18 ADHR protein alleles was largely dictated by sample size rather than allele age. ADHR 1 is likely to be the oldest allele and it has the widest distribution of all the alleles in our sample; however, ADHR 6 is a relatively new allele that also has a wide distribution. On the other hand, large samples from populations tended to have the highest allelic diversity. A total of 64 ADHR alleles were sampled from the Kaibab National Forest and this population had 15 protein alleles present. When only 1 ADHR allele was collected from a population, 1 of the 2 most common alleles was present in the sample. These data are consistent with the extensive genetic exchange that occurs among D. pseudoobscura populations (SCHAEFFER and MILLER 1992a). Thus, the linkage disequilibrium of SNPs and ADHR haplotypes is the result of mutation and not positive Darwinian selection or population subdivision.

Linkage disequilibrium between SNPs and ADHR haplotypes:
It has been suggested that SNPs will provide a powerful tool to map complex phenotypes in the human genome. The ADHR locus is a useful model for a disease-causing locus where different phenotypes are generated from a single locus with multiple alleles (TADMOURI et al. 1998; SCRIVER and WATERS 1999; MICKLE and CUTTING 2000). The Adh region provides a worst-case scenario for detecting nonrandom associations or linkage disequilibrium with 1 of 18 different protein haplotypes when recombination is effective at shuffling genetic diversity.

The observed frequencies of SNPs in significant linkage disequilibrium with each of the ADHR haplotypes are within the 95% confidence limits expected under a neutral model with mutation and recombination (Table 8). These data suggest that low levels of linkage disequilibrium can be detected in a region that experiences moderate to high rates of recombination (SCHAEFFER and MILLER 1993).

We did not observe any significant associations with the most frequent ADHR haplotype. This might be expected given that the most frequent haplotype is likely to be the oldest allele in the population and recombination would have sufficient time to break up any nonrandom associations that are introduced by mutation. This trend is observed in the coalescent simulations with mutation and recombination where the third most frequent haplotype had the maximum fraction of nonrandom associations (Fig 4A). As expected, the fraction of comparisons that showed significant association was much higher for the no recombination case than for the recombination case; however, the maximum level of SNP associations was observed with the third most frequent haplotype rather than with the protein allele with the greatest frequency (Fig 4A).

Several factors reduce the number of significant associations with the dominant haplotype. First, the probability of rejecting random association between a SNP and a protein haplotype with FET depends on the protein haplotype frequency. Fig 5 shows the percentage of FET tables that are capable of rejecting the null model of no association for different protein haplotype frequencies. The percentage of significant FET was determined for each protein haplotype frequency by integrating over all SNP frequencies given a sample of 139 alleles. This power curve shows that FET has the most power to reject the null hypothesis of no association for a haplotype with 13 copies in a sample of 139 (13/139 = 9.3%). Power to reject the null hypothesis declines at both lower and higher frequencies. This power curve suggests that the reason the fraction of significant linkage disequilibria is lower for the most common allele in a neutral coalescent without recombination is that FET has less power to reject the null hypothesis for high frequency alleles (Fig 4A).
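
The kind of calculation behind such a power curve can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure used to draw Fig 5: every 2 x 2 table compatible with the margins is weighted equally, the SNP allele count runs over all polymorphic values from 1 to n - 1, and an uncorrected two-sided threshold of alpha = 0.05 is used. The function name is hypothetical.

```python
from math import comb


def fraction_significant_tables(k, n, alpha=0.05):
    """Fraction of 2 x 2 tables that reject no association by Fisher's exact test.

    k is the number of copies of the focal protein haplotype in a sample of n
    gametes. All SNP allele counts s = 1..n-1 and all overlaps x compatible
    with the margins are enumerated and weighted equally (an assumption).
    """
    significant = total = 0
    for s in range(1, n):
        lo, hi = max(0, k + s - n), min(k, s)
        denom = comb(n, s)
        # Hypergeometric probability of each possible overlap x for these margins.
        pmf = [comb(k, x) * comb(n - k, s - x) / denom for x in range(lo, hi + 1)]
        for p_obs in pmf:
            # Two-sided P value: sum of tables no more probable than this one.
            p_value = sum(p for p in pmf if p <= p_obs * (1 + 1e-9))
            total += 1
            if p_value <= alpha:
                significant += 1
    return significant / total


# One point of the curve: a haplotype with 13 copies in a sample of 139.
print(fraction_significant_tables(13, 139))
```

Evaluating the function over a range of k values traces a curve of the same general kind as Fig 5, although the exact values depend on the weighting and threshold assumed here.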



Figure 5. The relationship between protein haplotype frequency and percentage of significant FETs. The solid bars indicate the frequencies of ADHR haplotypes observed in this study.

The power curve in Fig 5 assumes that all FET 2 × 2 tables are equally likely in a coalescent simulation; however, some 2 × 2 tables may be more likely in neutral genealogies than others depending on how protein haplotypes coalesce within trees. To observe significant linkage disequilibrium, the mutation that gives rise to a SNP must occur right before or soon after the coalescence of a protein haplotype. The probability of this co-occurrence will be a function of the coalescence time of the protein haplotype and the probability of mutations occurring near the origin of a new allele. Rare protein haplotypes will tend to coalesce earlier in a neutral genealogy compared to moderate or high frequency variants and the probability that a mutation will co-occur on the branch that gives rise to a rare variant will be extremely small (TAJIMA 1983; HUDSON 1990). High frequency protein alleles will tend to coalesce very deep in genealogies, allowing for many mutations to co-occur with the common protein type; however, the SNP associations are reduced when new rare and moderate frequency protein haplotypes emerge from the high frequency type. Moderate frequency variants will tend to have more nonrandom associations because these alleles will have intermediate coalescence times and intermediate probabilities of co-occurring nucleotide mutations (ZOLLNER and VON HAESELER 2000).

The coalescent genealogies in this study were used to determine how much linkage disequilibrium is expected given a neutral model with and without recombination. In this analysis, we imposed a hierarchical structure on the genealogy that simulated a genetic locus with multiple alleles. The ramifications of these results are that (1) linkage disequilibrium between a protein allele and a SNP can be broken down by mutation of one major protein allele to a rare or moderate frequency allele; (2) moderate frequency alleles will have the highest likelihood of being in significant linkage disequilibrium with a SNP; and (3) gene regions that experience moderate recombination rates can show nonrandom associations between protein alleles and SNPs, although fewer associations will be observed than in regions with lower genetic exchange rates.
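
For readers unfamiliar with coalescent simulation, the no recombination case can be sketched in a few lines of Python. This is a generic neutral coalescent with infinite-sites mutation, not the program used in this study; it omits recombination and the hierarchical protein-allele structure described above. Time is assumed to be measured in units of 2N generations, so that k lineages coalesce at rate k(k - 1)/2 and each lineage mutates at rate theta/2, where theta stands for 4Neµ. The function name and the small parameter values are hypothetical.

```python
import random


def simulate_coalescent(n, theta, rng):
    """Simulate n gametes under a neutral coalescent without recombination.

    Returns (haplotypes, n_sites): haplotypes is a list of n frozensets of
    mutation ids, and n_sites is the number of segregating sites generated.
    theta must be positive.
    """
    # Each active lineage is the list of sampled gametes descending from it.
    lineages = [[i] for i in range(n)]
    haplotypes = [set() for _ in range(n)]
    n_sites = 0
    while len(lineages) > 1:
        k = len(lineages)
        # Waiting time until the next coalescent event among k lineages.
        t = rng.expovariate(k * (k - 1) / 2)
        for descendants in lineages:
            # Mutations fall along each branch as a Poisson process, rate theta/2.
            elapsed = rng.expovariate(theta / 2)
            while elapsed < t:
                for gamete in descendants:
                    haplotypes[gamete].add(n_sites)
                n_sites += 1
                elapsed += rng.expovariate(theta / 2)
        # Merge two lineages chosen at random.
        i, j = rng.sample(range(k), 2)
        merged = lineages[i] + lineages[j]
        lineages = [l for idx, l in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
    return [frozenset(h) for h in haplotypes], n_sites


rng = random.Random(2001)
sample, n_sites = simulate_coalescent(n=20, theta=3.0, rng=rng)
# A replicate could then be kept or discarded by comparing these two counts
# with target values, in the spirit of the acceptance step described in the text.
print(n_sites, len(set(sample)))
```

Because mutations are only placed on branches below the root, every simulated site is polymorphic in the sample.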

Implications for association studies in disease gene mapping:
Several conclusions can be drawn from the linkage disequilibrium analysis of SNPs and ADHR haplotypes. First, recombination reduces the fraction of polymorphic sites that show associations with a disease-causing gene, but significant linkage disequilibrium can be observed as a result of mutation and random genetic drift. The significant associations that are observed in a region of moderate to high levels of recombination are likely to indicate candidate regions with a disease-causing gene nearby. Second, linkage disequilibrium studies will be most effective in detecting disease alleles at moderate frequencies. If a genetic disease is caused by a high frequency allele, then linkage disequilibrium mapping will not be an effective tool for locating the disease gene. This will not be a problem because disease-causing alleles usually exist at moderate to low frequencies in human populations. Linkage disequilibrium studies are probably not going to be effective in detecting rare alleles because significant associations will occur only if a new mutation occurs on the same chromosome as a disease-causing allele. Linkage disequilibrium mapping offers the most promise for mapping disease-causing alleles at a frequency of 10% in a sample of 100 genes. This conclusion follows from the power of FET to detect significant associations and from the genealogical history of disease-causing alleles.


*  FOOTNOTES

1 Present address: 29 Stevenson Ave., Everett, MA 02149.
2 Present address: Department of Mathematical Sciences, Villanova University, St. Augustine Ctr., Rm. 305, 800 Lancaster Ave., Villanova, PA 19085-1699.
3 Present address: Department of Botany and Plant Pathology, Purdue University, Lilly Hall, Rm. 1-414, West Lafayette, IN 47907.
4 Present address: Department of Genetics, Cell Biology and Development, Rm. 6-160 Jackson Hall, 321 Church St. SE, Minneapolis, MN 55455.
Stephen W. Schaeffer dedicates this article to the memory of Joyce L. Schaeffer (April 4, 1927–March 18, 2001).


*  ACKNOWLEDGMENTS

We thank David J. Begun, Andrew G. Clark, Charles H. Langley, and Thomas S. Whittam for helpful discussions during the preparation of this article. We thank two anonymous reviewers for providing constructive criticisms that improved the article. This work was supported by grant GM-42472 from the National Institutes of Health.

Manuscript received April 27, 2001; Accepted for publication July 25, 2001.


*  LITERATURE CITED

AKASHI, H. and S. W. SCHAEFFER, 1997 Natural selection and the frequency distributions of "silent" polymorphism in Drosophila. Genetics 146:295-307.

ALLISON, D. B., 1997 Transmission-disequilibrium tests for quantitative traits. Am. J. Hum. Genet. 60:676-690.

AQUADRO, C. F., S. F. DEESE, M. M. BLAND, C. H. LANGLEY, and C. C. LAURIE-AHLBERG, 1986 Molecular population genetics of the alcohol dehydrogenase gene region of Drosophila melanogaster. Genetics 114:1165-1190.

BELL, D. A. and J. A. TAYLOR, 1997 Genetic analysis of complex disease. Science 275:1327-1328.

BENYAJATI, C., N. SPOEREL, H. HAYMERLE, and M. ASHBURNER, 1983 The messenger RNA for alcohol dehydrogenase in Drosophila melanogaster differs in its 5' end in different developmental stages. Cell 33:125-133.

BROGNA, S. and M. ASHBURNER, 1997 The Adh-related gene of Drosophila melanogaster is expressed as a functional dicistronic messenger RNA: multigenic transcription in higher organisms. EMBO J. 16:2023-2031.

CABOT, E. L. and A. T. BECKENBACH, 1989 Simultaneous editing of multiple nucleic acid and protein sequences with ESEE. Comput. Appl. Biosci. 5:233-234.

CHARLESWORTH, B., M. T. MORGAN, and D. CHARLESWORTH, 1993 The effect of deleterious mutations on neutral molecular variation. Genetics 134:1289-1303.

EWENS, W. J., 1972 The sampling theory of selectively neutral alleles. Theor. Popul. Biol. 3:87-112.

FU, Y.-X. and W.-H. LI, 1993 Statistical tests of neutrality of mutations. Genetics 133:693-709.

HILL, W. G. and A. ROBERTSON, 1968 Linkage disequilibrium in finite populations. Theor. Appl. Genet. 38:226-231.

HUDSON, R. R., 1987 Estimating the recombination parameter of a finite population model without selection. Genet. Res. 50:245-250.

HUDSON, R. R., 1990 Gene genealogies and the coalescent process. Oxf. Surv. Evol. Biol. 7:1-44.

HUDSON, R. R. and N. L. KAPLAN, 1995 Deleterious background selection with recombination. Genetics 141:1605-1617.

JENKINS, T. M., C. J. BASTEN, and W. W. ANDERSON, 1996 Mitochondrial gene divergence of Colombian Drosophila pseudoobscura. Mol. Biol. Evol. 13:1266-1275.

JORDE, L. B., 1995 Linkage disequilibrium as a gene-mapping tool. Am. J. Hum. Genet. 56:11-14.

KIRBY, D. A., S. V. MUSE, and W. STEPHAN, 1995 Maintenance of pre-mRNA secondary structure by epistatic selection. Proc. Natl. Acad. Sci. USA 92:9047-9051.

KOVACEVIC, M. and S. W. SCHAEFFER, 2000 Molecular population genetics of X-linked genes in Drosophila pseudoobscura. Genetics 156:155-172.

KREITMAN, M., 1983 Nucleotide polymorphism at the alcohol dehydrogenase locus of Drosophila melanogaster. Nature 304:412-417.

KUMAR, S., K. TAMURA, and M. NEI, 1994 MEGA: molecular evolutionary genetics analysis. Comput. Appl. Biosci. 10:189-191.

LEWONTIN, R. C., 1995 The detection of linkage disequilibrium in molecular sequence data. Genetics 140:377-388.

LONG, A. D. and C. H. LANGLEY, 1999 The power of association studies to detect the contribution of candidate genetic loci to variation in complex traits. Genome Res. 9:720-731.

LONG, A. D., M. N. GROTE, and C. H. LANGLEY, 1997 Genetic analysis of complex diseases. Science 275:1328.

MCDONALD, J. H. and M. KREITMAN, 1991 Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652-654.

MICKLE, J. E. and G. R. CUTTING, 2000 Genotype-phenotype relationships in cystic fibrosis. Med. Clin. North Am. 84:597-607.

NACHMAN, M. W., W. M. BROWN, M. STONEKING, and C. F. AQUADRO, 1996 Nonneutral mitochondrial DNA variation in humans and chimpanzees. Genetics 142:953-963.

NEI, M., 1987 Molecular Evolutionary Genetics. Columbia University Press, New York.

NORDBERG, M., B. CHARLESWORTH, and D. CHARLESWORTH, 1996 The effect of recombination on background selection. Genet. Res. 67:159-174.

RICE, W. R., 1989 Analyzing tables of statistical tests. Evolution 43:223-225.

RISCH, N., 1990 Linkage strategies for genetically complex traits. I. Multilocus models. Am. J. Hum. Genet. 46:222-228.

RISCH, N. and K. MERIKANGAS, 1996 The future of genetic studies of complex human diseases. Science 273:1516-1517.

ROZAS, J. and R. ROZAS, 1999 DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15:174-175.

RUSSO, C. A., N. TAKEZAKI, and M. NEI, 1995 Molecular phylogeny and divergence times of drosophilid species. Mol. Biol. Evol. 12:391-404.

SAITOU, N. and M. NEI, 1987 The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425.

SCHAEFFER, S. W. and C. F. AQUADRO, 1987 Nucleotide sequence of the Adh gene region of Drosophila pseudoobscura: evolutionary change and evidence for an ancient gene duplication. Genetics 117:61-73.

SCHAEFFER, S. W. and E. L. MILLER, 1991 Nucleotide sequence analysis of Adh genes estimates the time of geographic isolation of the Bogota population of Drosophila pseudoobscura. Proc. Natl. Acad. Sci. USA 88:6097-6101.

SCHAEFFER, S. W. and E. L. MILLER, 1992a Estimates of gene flow in Drosophila pseudoobscura determined from nucleotide sequence analysis of the alcohol dehydrogenase region. Genetics 132:471-480.

SCHAEFFER, S. W. and E. L. MILLER, 1992b Molecular population genetics of an electrophoretically monomorphic protein in the alcohol dehydrogenase region of Drosophila pseudoobscura. Genetics 132:163-178.

SCHAEFFER, S. W. and E. L. MILLER, 1993 Estimates of linkage disequilibrium and the recombination parameter determined from segregating nucleotide sites in the alcohol dehydrogenase region of Drosophila pseudoobscura. Genetics 135:541-552.

SCRIVER, C. R. and P. J. WATERS, 1999 Monogenic traits are not simple: lessons from phenylketonuria. Trends Genet. 15:267-272.

SHIELDS, D. C., P. M. SHARP, D. G. HIGGINS, and F. WRIGHT, 1988 "Silent" sites in Drosophila genes are not neutral: evidence of selection among synonymous codons. Mol. Biol. Evol. 5:704-716.

SLATKIN, M., 1985 Rare alleles as indicators of gene flow. Evolution 39:53-65.

SOKAL, R. R., and F. J. ROHLF, 1981 Biometry. W. H. Freeman, New York.