No Evidence for an Association Between Common Nonsynonymous Polymorphisms in Delta and Bristle Number Variation in Natural and Laboratory Populations of Drosophila melanogaster
Anne Genissel, Tomi Pastinen, Andrea Dowell, Trudy F. C. Mackay, Anthony D. Long

Abstract

We test the hypothesis that naturally occurring nonsynonymous variants in the Delta ligand of the Notch signaling pathway contribute to standing variation in sternopleural and/or abdominal bristle number in Drosophila melanogaster, for both a large cohort of wild-caught flies and previously described laboratory lines. We sequenced the transcribed region of Delta for 16 naturally occurring chromosomes and 65 SNPs, including 7 nonsynonymous SNPs (nsSNPs), were observed. Identified nsSNPs and 6 additional common SNPs, all located in exon 6 and the 3′ UTR, were genotyped in 2060 wild-caught flies using an OLA-based methodology and genotyped in 38 additional natural chromosomes via DNA sequencing. None of the genotyped nsSNPs were significantly associated with natural variation in bristle number as assessed by a permutation test. A 95% upper bound on the additive genetic variance attributable to each genotyped SNP in the large natural cohort is <2% of the total phenotypic variation. Results suggest that two previously detected genotype/phenotype associations between bristle number and variants in the introns of Delta cannot be explained by linkage disequilibrium between these variants and nearby nonsynonymous variants. Unidentified regulatory variants more parsimoniously explain previous observations.

It has long been debated whether the actual variants contributing to variation in quantitative traits and short-term evolutionary change are largely regulatory (King and Wilson 1975; Stern 2000; Hudson 2003) or structural in nature (Botstein and Risch 2003). Although the importance of variation in gene regulatory regions is undeniable, it is not at all clear that regulatory variants are the major contributors to variation in complex traits. Approximately 60% of single-gene human Mendelian disorders are associated with missense or nonsense variants; 10% with splicing variants; 1% with regulatory variants; with insertions, deletions, and rearrangements contributing the bulk of the remainder (Botstein and Risch 2003). If, in a similar manner, variants of small quantitative effect are associated with coding or splicing variants 70% of the time, then coding and splicing variants should be excluded as contributing to quantitative variation before attempting to implicate more-difficult-to-study binding sites for transcription factors or cis-acting regulators. The most direct evidence for quantitative trait loci (QTL) being regulatory in nature comes from three studies in plants that have position-cloned QTL: the tb1 locus in maize (Lukens and Doebley 1999) and the fw2.2 and Brix9-5-2 loci in tomato (Frary et al. 2000; Fridman et al. 2000). However, a direct identification of the actual causative variants, even in these landmark studies, is still lacking. It is currently unfeasible to test every single variant in the genome for an association with phenotypic variation. For example, in humans there are an estimated 6 million common single-nucleotide polymorphisms (SNPs), clearly a number beyond current typing technologies even in modest-sized population samples. Typing a subset of these SNPs and relying on linkage disequilibrium between SNPs is likely to reduce genotyping effort by only 50% (Kruglyak 1999; Long and Langley 1999; Carlson et al. 2003).
However, if the bulk of complex disease is associated with SNPs encoding amino acid variants, then the complexity of the problem is greatly reduced, as <1% of SNPs are likely to encode amino acid changes.

Drosophila melanogaster sternopleural and abdominal bristle number is an excellent model system for dissecting the genetic basis of continuous variation (Falconer and Mackay 1996). QTL for bristle number variation, with both sex-specific and epistatic effects, have been identified by mutation-accumulation experiments, P-element mutagenesis, and QTL mapping using laboratory-selected lines (reviewed in Mackay 1995, 2001). Association studies between DNA polymorphisms at candidate genes and bristle number variation have further implicated candidate genes as contributors to variation in bristle number (ASC, Mackay and Langley 1990; sca, Lai et al. 1994; Dl, Long et al. 1998; ASC, Long et al. 2000; h, Robin et al. 2002). Among those genes is Delta (Dl), which encodes the Notch ligand in the Notch signaling pathway that is required during early and late steps of bristle development for correct cell fate specification (Parks and Muskavitch 1993). Delta is involved in cell-cell communication that drives the process of lateral inhibition among sensory organ precursor cells (reviewed in Jan and Jan 1994; Artavanis-Tsakonas et al. 1995, 1999), with recent studies providing support for the existence of feedback regulation of Dl and Notch expression patterns (e.g., Parks et al. 1997; Jacobsen et al. 1998; Ramain et al. 2001). Both QTL mapping experiments and tests for failure to complement support the hypothesis that Dl and other candidate genes, initially identified through mutants of large effect on bristle number, harbor naturally occurring alleles of subtle effect contributing to variation in bristle number (Long et al. 1995, 1996). Moreover, a previous restriction enzyme survey of a sample of 55 naturally occurring chromosomes for a 57-kb region including the complete Delta transcribed region uncovered two polymorphic sites independently associated with variation in bristle number.
A site in the second intron accounted for 12% of the total genetic variation in sternopleural bristle number and a site in the fifth intron accounted for 6% of the total genetic variation for abdominal bristle number in females, both in a set of homozygous third chromosome substitution lines (Long et al. 1998). It was proposed that these marker sites were likely in linkage disequilibrium (LD) with the actual causative sites contributing to bristle number variation. Furthermore, the patterns of LD in the region conservatively suggest that causative sites are unlikely to be >10 kb from the marker sites. Subsequent sequencing of the complete genome of D. melanogaster has shown that the closest predicted genes to Delta are either 74.4 kb upstream of the SNP, showing an association with sternopleural bristle number in intron 2, or 48 kb downstream of the SNP, showing an association with abdominal bristle number in intron 5 (Berkeley Drosophila Genome Project release 3.1). Thus, it seems unlikely that the observed associations are due to genes other than Delta. On the other hand, the SNP in intron 2 is only 0.8 kb away from exon 2 and the SNP in intron 5 is only 0.6 kb away from exon 5, making it possible that these markers are in LD with SNPs that encode amino acid polymorphisms [nonsynonymous (ns)SNPs] in the Delta gene product.

This work tests the hypothesis that nsSNPs in the Delta gene account for previously observed associations between other naturally occurring SNPs in the second and fifth introns of Delta and abdominal and sternopleural bristle number variation in D. melanogaster. We test every common nsSNP in Delta; thus if we fail to find associations we can exclude common nsSNPs as a class as contributors to bristle number variation. Although it would be desirable to examine additional SNPs, including the significant polymorphisms observed in Long et al. (1998), the magnitude of the experimental effort required to detect small but significant associations in a large natural cohort excluded genotyping more than a few SNPs in a single PCR fragment with current technologies. The translated product of Delta harbors extracellular domains [Delta/Serrate/Lag2 and a tandem array of epidermal growth factor (EGF)-like repeats] and a transmembrane domain (Vässin and Campos-Ortega 1987; Kopczynski et al. 1988; Muskavitch 1994). The EGF-like repeats (ELRs) within the Delta extracellular domain play a role in molecular interactions between Dl and its receptor Notch (Lieber et al. 1992). Furthermore, proteolytic cleavage of Dl is related to down-regulation of Dl as well as Dl-Notch interactions (Klueg et al. 1998). The biochemical function of Delta suggests that nsSNPs can potentially contribute to flux in Notch signaling and ultimately contribute to bristle variation.

A second goal of this study is to compare the nature of associations between coding and noncoding SNPs in Delta and variation in sternopleural and abdominal bristle number in three different genetic backgrounds: (1) a large natural cohort of D. melanogaster (1031 females and 1029 males), (2) a set of lines consisting of 47 different natural homozygous third chromosomes in an otherwise isogenic background, and (3) a set of lines consisting of 46 lines differing only in a small natural introgressed fragment including Dl. We test whether allelic effects estimated in the natural population and set of laboratory lines differ. The answer to this question has important implications for theories attempting to explain the maintenance of quantitative genetic variation in terms of the frequencies and effects of the alleles at the underlying QTL (Barton and Turelli 1989). As historical patterns of natural selection acting on these variants in nature are responsible for the observed distribution of frequencies and effects, it is the estimates of frequencies and effects in nature that are directly applicable to these theories. If laboratory estimates of the joint distributions of effects and frequencies are biased, then they may be of limited use in calibrating theoretical models.

We employ a two-tiered approach to allow large-scale association studies to be efficiently carried out (cf. Robin et al. 2002). In the first tier we identify common naturally occurring variants in Dl by sequencing the complete transcribed region for 16 isogenic chromosomes. Sequencing to this depth makes it unlikely that common polymorphisms escape detection. We also experimentally verify this assumption by sequencing 38 additional isogenic chromosomes from the above-described set of lines for much of exon 6 and part of the 3′ untranslated region (UTR). In the second tier, a subset of identified SNPs is genotyped in the large natural cohort, using a novel high-throughput genotyping assay, which can inexpensively and efficiently genotype diploid individuals. We then test for associations between assayed genotypes and bristle number phenotypes in our large natural cohort and two sets of laboratory lines (Long et al. 1998). We test all common nsSNPs for associations with bristle number variation and find no significant associations despite ample statistical power to detect even subtle effects. We conclude that previously observed associations between intronic SNPs in Dl and abdominal and sternopleural bristle number are not due to common amino-acid-changing SNPs and that regulatory polymorphisms more parsimoniously explain previous observations.

MATERIALS AND METHODS

Bristle counts and genomic DNA preparation: A sample of 2060 D. melanogaster adults (1029 males and 1031 females) was collected from openly fermenting Pinot Noir grapes and discarded pulp at an organic winery (Kaz Vineyard and Winery, Sonoma Valley, CA) in 1998. Of 2085 collected flies, 11 males and 14 females were discarded as they were determined to be D. simulans by use of a PCR assay that distinguishes D. melanogaster and D. simulans by amplicon size (data not shown). Sternopleural bristle number is taken to be the total number of macrochaetae and microchaetae on the left and right sternopleural plates and abdominal bristle number to be the number of bristles on the fifth sternite for males or the sixth sternite for females. After bristles were counted for each fly, DNA was extracted from whole flies using the Puregene DNA isolation from cell and tissue kit (Gentra Systems, Research Triangle Park, NC) following the manufacturer's instructions.

The laboratory strains used were derived from a population sample collected in North Carolina and are described elsewhere (Long et al. 1998; Lyman and Mackay 1998). The strains are (1) homozygous natural third chromosomes in an otherwise isogenic background (“W” background) and (2) homozygous natural introgressed fragments of the third chromosome including Delta, made by repeated rounds of backcrossing through females mutant for Delta in an otherwise isogenic background (“B” background; see Long et al. 1998).

SNP identification: Sequencing of the transcribed and small exon-flanking regions (25 bp on either side of each exon) of the Delta locus was performed on 16 isogenic lines of D. melanogaster (Figure 2; GenBank accession nos. AY437140–AY437235), and on one D. simulans inbred line (GenBank accession nos. AY438142–AY438146). All the nucleotide positions referred to throughout this work are from a fragment of the Drosophila genome stretching from ∼14 kb upstream of the start of the 5′ UTR of Delta to the end of the 3′ UTR (DDBJ/EMBL/GenBank accession no. TPA: BK004004). Sequence data consist of six fragments for each of 16 strains with the fragment including the 5′ UTR and exon 1 extending from 14,784 to 15,542, exon 2 from 20,900 to 21,261, exon 3 from 24,975 to 25,091, exon 4 from 29,000 to 29,298, exon 5 from 32,348 to 32,452, and exon 6 and the 3′ UTR from 34,552 to 38,460. Templates consisted of long overlapping PCR amplicons of ∼5 kb in size covering the regions of interest (see Long et al. 1998). PCR cleanup was carried out using Amicon filters following the manufacturer's instructions (Millipore, Bedford, MA). Sequencing was performed using internal primers (sequences available upon request) and the dye-terminator chemistry on an ABI 377 (Applied Biosystems, Foster City, CA). Traces were manually edited in SeqMan (DNAStar, Madison, WI), and sequences were aligned in BioEdit (version 5.0.9; Hall 1999). The bulk of the sequencing was carried out for only one strand, so a small fraction of singleton polymorphisms are potentially sequencing artifacts.

Additional sequencing and genotyping assay validation: We sequenced a 1545-bp DNA fragment spanning the last 1320 bases of exon 6 and the first 225 bases of the 3′ UTR (positions 35,016–36,560) for 38 additional chromosomes described in Long et al. (1998; GenBank accession nos. AY438104–AY438141). We partially sequenced the same region for a subset of 44 wild-caught diploid individuals to uncover 100 SNPs that gave discrepant genotypes in the oligonucleotide ligation assay (OLA) and allele-specific oligonucleotide (ASO) assays described below (i.e., SNPs 35,639, 35,791, 36,116, and 36,271; GenBank accession nos. AY438147–AY438202). Sequencing was carried out by dividing the 1545-bp region into three PCR amplicons. PCR was carried out, on first-round PCR products described below, using the following primers tailed with standard M13 sequencing sites (the leading TGTAAAACGACGGCCAGT or CAGGAAACAGCTATGACC of each primer): (5′-TGTAAAACGACGGCCAGTCCCTGTCATCAGGGAATCT and 5′-CAGGAAACAGCTATGACCTAGGACTACCTGGGCATTG; 5′-TGTAAAACGACGGCCAGTGTTCGATGCGTGTGTGC and 5′-CAGGAAACAGCTATGACCCAGTAGGAACCCTCGCCTA; and 5′-TGTAAAACGACGGCCAGTCGGATAACAACAATGCCAAC and 5′-CAGGAAACAGCTATGACCTGTTCCTGCTGGATACAATG). Cycling conditions were 4 min at 95° followed by 30 cycles of 45 sec at 94°, 30 sec at 64°, and 30 sec at 72°. Sequencing was performed using the complementary M13 sequencing primers and dye-terminator-based sequencing on an ABI 377 (Applied Biosystems).

SNP typing: The SNPs were typed by two different assays, the OLA-based and the ASO assays, the second for assessing the error rate of the new genotyping assay we developed on the basis of the OLA.

Oligonucleotide ligation assay (OLA): We performed 12-plex ligation-dependent probe amplifications on the basis of a method similar to one initially described by Tobe et al. (1996), on 2060 different DNA samples. Modifications to the protocol included each upstream oligonucleotide synthesized to contain a unique “bar code” and M13F amplification primer-binding site and each downstream oligonucleotide synthesized to contain a reverse-complemented M13R amplification primer-binding site. More specifically, each pair of upstream oligonucleotides consists of an M13F sequence of 17 nucleotides at the 5′ end, two different bar codes of 13–18 nucleotides, and 16–22 nucleotides matching the DNA template, with the 3′-most base querying the SNP. Each phosphorylated downstream oligonucleotide consists of 14–17 nucleotides specific to the template DNA and the reverse complement of an M13R sequence at the 3′ end (for oligonucleotide sequences see Table S at http://www.genetics.org/supplemental/). This modified method closely follows a method independently proposed by Schouten et al. (2002), except that our OLA method grids the final products onto nylon membranes and hybridizes with labeled bar code-specific probes to determine final genotypes.
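The oligonucleotide layout described above can be sketched in a few lines of Python. This is only an illustration of the stated design: the function name, the default probe lengths, and the bar codes passed in are hypothetical, and the M13F and M13R strings are assumed to be the primer sequences listed for the postligation PCR; the actual oligonucleotides are those in the supplemental table.

```python
# Hypothetical sketch of the three-oligonucleotide OLA design for one SNP.
M13F = "CGTTTTACAACGTCGTG"  # assumed: M13F primer sequence from the postligation PCR
M13R = "CAATTTCACACAGG"     # assumed: M13R primer sequence from the postligation PCR

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_ola_oligos(template, snp_index, alleles, barcodes,
                      up_len=18, down_len=15):
    """Build two allele-specific upstream oligos and one downstream oligo.

    template  : top-strand sequence around the SNP
    snp_index : 0-based position of the SNP in `template`
    alleles   : the two alternative bases at the SNP
    barcodes  : two bar code sequences, one per allele (illustrative only)
    """
    if snp_index < up_len - 1 or snp_index + down_len >= len(template):
        raise ValueError("template too short around the SNP")
    flank = template[snp_index - (up_len - 1):snp_index]
    # Upstream oligos: M13F + bar code + template match, 3'-most base = allele.
    upstream = [M13F + code + flank + base
                for base, code in zip(alleles, barcodes)]
    # Downstream oligo (5'-phosphorylated in the assay): template match
    # followed by the reverse complement of M13R at the 3' end.
    downstream = (template[snp_index + 1:snp_index + 1 + down_len]
                  + reverse_complement(M13R))
    return upstream, downstream
```

After ligation, a product of this form carries the M13F sequence at its 5′ end and the reverse complement of M13R at its 3′ end, which is what allows every ligated product in the multiplex to be amplified with the single M13F/M13R primer pair.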

PCR: A 2136-bp fragment covering the end of exon 6 and the beginning of the 3′ UTR was PCR amplified in a 10-μl reaction from 20–50 ng of genomic (g)DNA template, with 0.27 units of Extaq polymerase in its recommended buffer (Panvera), 0.5 μM of each primer 5′-AAACCCTGTCATCAGGGAATCT and 5′-ATGAGGAGGTTTCTTTCAATCGT, and 200 μM of each dNTP. This amplicon contains all nsSNPs identified from the sequencing of 16 isogenic lines. A second round of PCR was performed to amplify a 1957-bp amplicon from 1 μl of first-round PCR, diluted 100 times in water using the same concentration of PCR reagents but internal primers 5′-CCTGTCATCAGGGAATCTGC and 5′-GTCGTCCAGTGGTTCTTGGT. Cycling conditions were 4 min at 95° followed by 30 cycles of 45 sec at 94°, 45 sec at 60°, and 90 sec at 72° for the first PCR; and 5 min at 95° followed by 25 cycles of 45 sec at 95°, 45 sec at 64°, and 45 sec at 72° for the second PCR.

Ligation and amplification: Second-round PCR products were treated with Proteinase K (Fisher Scientific, Pittsburgh) at 0.5 μg μl⁻¹ for 30 min at 37° and 15 min at 80°. One microliter of the proteinase K-treated amplicon was added to a 3-μl ligation mix, containing 1.6 units of Taq DNA ligase (New England Biolabs, Beverly, MA), 7.5 nM of each oligonucleotide (for a total of 12 sets of three oligonucleotides), 50 mM Tris pH 8.5, 7.5 mM MgCl₂, 1 mM nicotinamide-adenine-dinucleotide, 50 mM KCl, and 2.5 mM dithiothreitol. The Taq DNA ligase catalyzes the formation of a phosphodiester bond between juxtaposed 5′ phosphate and 3′ hydroxyl of the two adjacent oligonucleotides that hybridize to the complementary template DNA. Ligation occurs with ∼1000 times greater affinity if the oligonucleotide is perfectly paired to the complementary target DNA compared to a mismatched 3′-most base (Landegren et al. 1988; Luo et al. 1996). Cycling conditions were 4 min at 95° followed by 25 cycles of 30 sec at 95°, 7 min at 48°. A postligation PCR was performed to amplify successful ligation products by adding the 4-μl ligation products to a 12-μl PCR cocktail containing 2.5 units Taq polymerase in 50 mM KCl, 0.1% Triton X-100, 0.5 μM each of M13F and M13R primers (M13F 5′-CGTTTTACAACGTCGTG and M13R 5′-CAATTTCACACAGG), and 50 μM of each dNTP. Cycling conditions were 1 min at 94° followed by 30 cycles of 45 sec at 94°, 45 sec at 58°, 45 sec at 72°, and a final cycle of 30 sec at 72°.

Membrane preparation: Dry nylon membranes (Immobilon-N, Millipore) were gridded with the products of the OLA reactions using a 96-pin tool, for a total of 12 replicate filter sets at a density of 768 reactions per filter. DNA was denatured for 10 min in 0.5 M NaOH, 1.5 M NaCl and UV cross-linked to the nylon membrane at 50 mJ. Filters were then neutralized in 0.4 M Tris-HCl, 0.3 M NaCl, 0.03 M sodium citrate for 1 hr and stored at 4° in the same buffer prior to hybridization.

Probe preparation: Radioactive labeling of each probe was carried out in a 10-μl reaction, 1 μM with respect to probe and using 5 units of T4 polynucleotide kinase (Fisher Scientific) in its own buffer and 2 μl of [γ-33P]ATP [∼20 μCi (∼6.7 nmol); Dupont New England Nuclear], for 45 min at 37° and 15 min at 65°. Unincorporated ATP was removed using MicroSpin G-25 columns (Amersham, Arlington Heights, IL), following the manufacturer's instructions.

Prehybridization, hybridization, and washing: A prehybridization step was performed for 3 hr at 42° in 1 mM EDTA pH 7.2, 0.5 M NaPi pH 7.2, 7% SDS, 1% bovine serum albumin (Sigma, St. Louis), and herring sperm DNA (Promega, Madison, WI) diluted to 1 μg μl⁻¹ and previously denatured at 96° for 5 min. Hybridization was performed for 3 hr, with radiolabeled probe diluted to ∼2 nM in the same buffer as for prehybridization except that a nonspecific 18-mer oligonucleotide diluted to 40 nM was substituted for the herring sperm DNA. Washing consisted of 37° for 1 min, 37° for 30 min, and room temperature for 1 min. Filters were exposed against a phosphor screen for 6 days prior to scanning using a PhosphorImager 445 SI (Molecular Dynamics, Sunnyvale, CA). Each membrane was stripped at 80° for 10 min in 0.1% SDS and reprobed, for typing alternate alleles at a SNP, and membranes were reused for different SNPs. Hybridization steps were performed in duplicate on independent filters for each SNP.

ASO: We performed ASO assays on a subset of the SNPs typed using OLA to provide a comparison between two largely independent assays and aid in estimating the accuracy of OLA.

Probe design: Template-specific oligonucleotide probes of ∼15 nucleotides (see Table S at http://www.genetics.org/supplemental/) having a melting temperature over 55° and containing the query SNP in the center were designed. The lengths of probes and washing temperatures were empirically adjusted to maximally discriminate between alternative alleles at each SNP (data not shown).

Membrane preparation: Dry nylon transfer membranes (Osmonics, Minnetonka, MN) were printed with the second-round PCR product amplicons described above. DNA was denatured in 0.4 M NaOH, 1.5 M NaCl for 10 min and UV cross-linked at 150 mJ to the nylon membrane.

Prehybridization, hybridization, and washing: Prehybridization was performed at 37° for 3 hr in 5× Denhardt's buffer, 5× SSPE buffer (Sambrook et al. 1989), 0.5% SDS, supplemented with herring sperm DNA (Promega) diluted to 1 μg μl⁻¹ and previously denatured at 96° for 5 min. Probes were labeled as described above. Hybridization was carried out for 2 hr in the same buffer supplemented with unlabeled competitor oligonucleotide (previously denatured at 96° for 5 min) corresponding to the alternate allele at 20 nM.

Washing was performed in 0.1% SDS, 5× SSPE (Sambrook et al. 1989), for 30 min at 39° for all probes. Filter exposing, scanning, and stripping were as described above, except only one hybridization per allele was performed for each SNP.

Image analysis and genotype calling: Data were extracted from scanned filters using ArrayVision (version 6.0; Imaging Research, St. Catherine, Ontario, Canada). All genotype calling was carried out using custom routines written in the “R” statistical language (version 1.5.1, the R project for statistical computing, http://www.r-project.org/), after background subtraction. All signal intensity measurements were natural log transformed and the signals associated with alternative alleles/bar codes were plotted against one another. On the basis of a visual examination of the resulting X-Y plot we defined sets of points belonging to each of three visually apparent clusters corresponding to AA, Aa, and aa genotypes. For each cluster, a mean centroid (M) and variance/covariance matrix (S) are calculated. As clusters are based on a large number of observations, repeated independent definitions of the clusters by the user had little effect on resulting estimates of M and S (i.e., processing of the data subsequent to cluster definition seemed very resistant to deviations in how those clusters were initially defined). Based on the estimates of S and M for each cluster, and assuming bivariate normality, the likelihood of individual i belonging to cluster j (here j = AA) equals L_AA = L(y_i; M_AA, S_AA)/Σ_j L(y_i; M_j, S_j), where L(y_i; M, S) is the bivariate normal probability distribution function and y_i is a pair of log transformed intensity measures corresponding to the two alleles at that SNP. We assigned a provisional genotype to an individual if the relative likelihood of that individual belonging to one of the three clusters was >95% [e.g., L_AA/(L_AA + L_Aa + L_aa) > 0.95] and the likelihood of it belonging to that cluster was >0.0005 (i.e., a likelihood of 0.0005 corresponds to a point ∼3 standard deviations from the cluster center). Data from the two replicate experiments for each SNP were then compared.
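The provisional calling rule can be illustrated with a minimal Python sketch (the original routines were written in R and are not reproduced here). It assumes that the centroid and covariance of each cluster have already been estimated, and it reads the second threshold as a bound on the unnormalized bivariate normal density; that reading, the function names, and the dictionary layout are our assumptions.

```python
import math


def bivariate_normal_pdf(y, mean, cov):
    """Density of a bivariate normal at point y = (y1, y2)."""
    dx, dy = y[0] - mean[0], y[1] - mean[1]
    sxx, sxy, syy = cov[0][0], cov[0][1], cov[1][1]
    det = sxx * syy - sxy * sxy
    # Quadratic form (y - M)' S^-1 (y - M) using the 2 x 2 inverse.
    q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))


def call_genotype(y, clusters, min_relative=0.95, min_density=0.0005):
    """Return 'AA', 'Aa', 'aa', or None when no provisional call is made.

    clusters maps a genotype label to (centroid M, covariance matrix S).
    """
    density = {g: bivariate_normal_pdf(y, m, s)
               for g, (m, s) in clusters.items()}
    total = sum(density.values())
    if total == 0.0:            # point is far from every cluster
        return None
    best = max(density, key=density.get)
    if density[best] / total > min_relative and density[best] > min_density:
        return best
    return None
```

A point lying midway between two clusters fails the relative-likelihood threshold, and a point far from all clusters fails the density threshold; both are left uncalled.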
We assigned an individual a genotype if either the provisional genotypes were the same in both replicates or one of the replicates had a provisional genotype and the other none (occurring with proportions A and B, respectively). We did not assign an individual a genotype if the two provisional calls conflicted or we were unable to assign a provisional genotype for either replicate (occurring with proportions C and D, respectively). We define call rate as the rate at which we assign an individual a genotype, which is equal to A + B. Furthermore, if we assume that the probability of a correct call in a single replicate is x, then we expect C/(A + C) = 2x(1 – x) if replicates are independently assigned calls. We can solve for x = 0.5 + 0.5√[1 – 2C/(A + C)] and estimate our miscall rate as [(1 – x)²A + (1 – x)B]/(A + B). The miscall rate is the proportion of called genotypes that we expect to be incorrect. We can also estimate the assay failure rate, or the rate at which the OLA assay gives a consistent yet incorrect call, through a comparison of two largely independent assays (i.e., OLA and ASO), with discrepancies being scored by DNA sequencing. We note that the assay failure rate is low enough (≪1%) that directly estimating this rate by DNA sequencing would require thousands of sequencing reactions. We estimate the assay failure rate for OLA as the fraction of discordant calls between ASO and OLA, for individuals called using both assays, multiplied by the fraction of these discordant calls for which the OLA assay is incorrect as assessed by DNA sequencing. This rate was estimated by sequencing a subset of 44 individuals covering 100 discrepancies between OLA and ASO (we chose individuals discrepant for multiple SNPs to minimize sequencing effort).
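The call-rate and miscall-rate expressions follow from solving the quadratic 2x² – 2x + C/(A + C) = 0 for its larger root. A short Python sketch of the calculation is given below; the function name is ours and any proportions passed to it are illustrative rather than values from this study.

```python
import math


def replicate_error_rates(A, B, C, D):
    """Call rate, per-replicate accuracy x, and miscall rate.

    A: both replicates called and agree      B: only one replicate called
    C: both replicates called but conflict   D: neither replicate called
    The four proportions are expected to sum to 1.
    """
    if abs(A + B + C + D - 1.0) > 1e-6:
        raise ValueError("proportions should sum to 1")
    call_rate = A + B
    conflict = C / (A + C)                    # expected to equal 2x(1 - x)
    x = 0.5 + 0.5 * math.sqrt(1.0 - 2.0 * conflict)
    miscall = ((1.0 - x) ** 2 * A + (1.0 - x) * B) / (A + B)
    return call_rate, x, miscall
```

For example, hypothetical proportions A = 0.88218, B = 0.08, C = 0.01782, and D = 0.02 correspond to a per-replicate accuracy of x = 0.99 and a miscall rate of roughly 0.09% among called genotypes.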

Population genetics data analyses: The average per site heterozygosity π, the nucleotide diversity θ per site estimated from the number of segregating sites (Nei 1987), Tajima's D (Tajima 1989), the estimate of recombination R = 4Nc (Hudson 1987; Equation 4), the Hudson-Kreitman-Aguadé test, and the McDonald and Kreitman test were all calculated using DnaSP 3.51 (Rozas and Rozas 1999). Linkage disequilibrium between all pairs of SNPs was estimated using both R² and D′, for haploid (Weir 1996, p. 113) and diploid data (Weir 1996, p. 95). The null hypothesis of linkage equilibrium between SNPs was tested for haploid and diploid data by using a χ² test on R² (Hartl and Clark 1997, p. 103). Tests for departures from Hardy-Weinberg equilibrium were performed for each SNP, using a test analogous to Fisher's exact test described in Guo and Thompson (1992), using Arlequin (Schneider et al. 2000).
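For haploid (phase-known) data the pairwise disequilibrium statistics reduce to simple functions of allele and haplotype frequencies. The Python sketch below uses plain sample frequencies and the common approximation χ² = nR² with 1 d.f.; it is a hedged illustration, not the exact estimators of Weir (1996), and the function name and 0/1 allele coding are our assumptions.

```python
def ld_stats(snp_a, snp_b):
    """Return (D, R2, |D'|, chi2) for two biallelic SNPs coded 0/1.

    snp_a and snp_b list the allele carried by each chromosome, in the
    same chromosome order.
    """
    n = len(snp_a)
    if n == 0 or n != len(snp_b):
        raise ValueError("need two equal-length, non-empty allele lists")
    p_a = sum(snp_a) / n
    p_b = sum(snp_b) / n
    if p_a in (0.0, 1.0) or p_b in (0.0, 1.0):
        raise ValueError("both SNPs must be polymorphic")
    p_ab = sum(1 for a, b in zip(snp_a, snp_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    r2 = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    # D' scales D by its maximum attainable magnitude given allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))
    d_prime = abs(d) / d_max
    chi2 = n * r2   # compared against a chi-square distribution with 1 d.f.
    return d, r2, d_prime, chi2
```

With only a few dozen chromosomes, |D′| is often 1 for pairs involving a rare allele even when R² is small, which is one reason to report both statistics.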

Quantitative genetic analyses: ANOVA tests for association between genotype and phenotype were performed for both additive and arbitrary dominant effect models (Falconer and Mackay 1996). For diploid data with genotypes QQ, Qq, and qq, an additive model corresponds to a regression of the phenotypic data on the number of “Q” alleles present in each individual. As the additive model estimates only one parameter from the data, the additive effect of an allelic substitution (a), it is statistically more likely than the arbitrary dominance model to detect associations when inheritance is truly additive. For diploid data an arbitrary dominance model is statistically equivalent to a one-way ANOVA with three levels corresponding to the three genotypic classes, but results in the estimators a and d, which have traditional biological interpretations. The additive model applied to wild-caught flies results in an estimate of a_add, the arbitrary dominance model applied to wild-caught flies results in estimates of a_dom and d_dom, and the “additive” model applied to laboratory lines (which have only homozygous genotypes) results in an estimate of a_hap. As a_hap estimates the difference between two homozygous classes it effectively estimates twice a_add or a_dom. For each polymorphic site, F-statistics corresponding to each model were calculated separately for each trait and sex. Permutation tests (i.e., each individual's multilocus diploid genotype was permuted relative to bristle counts) were carried out to assess the significance of the largest F-statistic over polymorphic sites for each sex/trait combination. An association was considered significant at P < 0.05 if the largest observed F-statistic over SNPs was greater than the 9500th largest corresponding order statistic obtained from 10,000 random permutations of the phenotypic data with respect to the genotypic data (Churchill and Doerge 1994; Long et al. 1998).
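The permutation procedure can be sketched as follows for the additive model. This is a minimal illustration under stated assumptions: genotypes are coded as the number of “Q” alleles (0, 1, or 2), missing genotypes are ignored, and the function names and default number of permutations are ours. Shuffling the phenotype vector while leaving the genotype columns fixed keeps each individual's multilocus genotype intact, as in the test described above.

```python
import random


def additive_f(genotypes, phenotypes):
    """F-statistic (1 and n - 2 d.f.) for regression on allele count."""
    n = len(phenotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (y - mean_y)
              for g, y in zip(genotypes, phenotypes))
    syy = sum((y - mean_y) ** 2 for y in phenotypes)
    if sxx == 0:                  # monomorphic site carries no information
        return 0.0
    ss_reg = sxy * sxy / sxx
    ss_res = syy - ss_reg
    if ss_res <= 0:
        return float("inf")
    return ss_reg / (ss_res / (n - 2))


def max_f(snps, phenotypes):
    """Largest F over SNPs; snps[s] lists allele counts for SNP s."""
    return max(additive_f(snp, phenotypes) for snp in snps)


def permutation_p_value(snps, phenotypes, n_perm=10000, seed=0):
    """Observed max F and the fraction of permutations matching or exceeding it."""
    rng = random.Random(seed)
    observed = max_f(snps, phenotypes)
    shuffled = list(phenotypes)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_f(snps, shuffled) >= observed:
            exceed += 1
    return observed, exceed / n_perm
```

Because the null distribution is built from the maximum F over all SNPs, the resulting P-value already accounts for testing several correlated sites within one sex/trait combination.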

From the effects and associated errors estimated from the above models upper bounds can be placed on the magnitude of any true undetected underlying effects in the population, on the basis of the central limit theorem (Stuart and Ord 1994). These upper and lower bounds are valid regardless of the observed significance of the statistical test and are useful in determining how large an effect attributable to a typed SNP could truly be and yet remain undetected. Thus, for a_add we define a_max to be a 95% upper bound on the size of the estimated effect [i.e., the absolute value of a_add + 1.96 × SD(a_add)]. Significance of the correlation between estimated effects in either W or B backgrounds and estimated effects in the wild population is tested as described in Sokal and Rohlf (1995, p. 574). Given the large sample sizes in the wild cohort, allele frequencies are measured virtually without error. As a result we estimate the maximum additive genetic variance under an additive model attributable to a site as V_Amax = 2pq(a_max)² and the total fraction of phenotypic variance attributable to the site as V_Amax/V_P, where V_P is the total phenotypic variation. We calculate similar statistics for the laboratory lines as V_Amax(lab) = pq(a_max(lab))² and V_Amax(lab)/V_Plab, where a_max(lab) is the estimate of the upper bound on a_hap obtained from the laboratory lines and V_Plab is the variance among laboratory lines. Estimates for the laboratory lines are for comparison only, as small sample sizes result in larger standard errors associated with parameter estimates.
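As a worked illustration of these bounds, the short Python function below computes a_max, V_Amax, and V_Amax/V_P. It assumes the bound is read as |a| + 1.96 × SD(a); the function name, the `diploid` switch, and the numbers in the example are hypothetical and are not estimates from this study.

```python
def max_variance_fraction(a_hat, se_a, p, v_p, diploid=True):
    """Upper bound on the effect and on the variance it could explain.

    a_hat, se_a : estimated effect and its standard deviation
    p           : allele frequency (q = 1 - p)
    v_p         : total phenotypic variance
    diploid     : True gives 2pq(a_max)^2 (wild cohort);
                  False gives pq(a_max)^2 (homozygous laboratory lines)
    """
    a_max = abs(a_hat) + 1.96 * se_a      # assumed reading of the bound
    q = 1.0 - p
    factor = 2.0 if diploid else 1.0
    va_max = factor * p * q * a_max ** 2
    return a_max, va_max, va_max / v_p
```

For instance, a hypothetical estimate of a = 0.05 bristles with SD 0.05 at a SNP with p = 0.3, in a population with V_P = 4.41 (an SD of 2.1 bristles), gives a_max = 0.148 and V_Amax/V_P of about 0.2%.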

RESULTS

Bristle variation in the natural D. melanogaster population: Histograms showing the distributions of abdominal and sternopleural bristle number in wild-caught male and female D. melanogaster are shown in Figure 1. Bristle number appears normally distributed, with a mean ± SD of 16.9 ± 2.1 for sternopleural bristle number in males (SBM), 17.1 ± 2.1 for sternopleural bristle number in females (SBF), 16.6 ± 2.4 for abdominal bristle number in males (ABM), and 18.2 ± 2.7 for abdominal bristle number in females (ABF). A significant sexual dimorphism was observed for both traits (Student's t-tests: SBM vs. SBF, t = –3.00, P = 0.002; ABM vs. ABF, t = –14.2, P < 0.001). The total phenotypic variation within the population is close to the phenotypic variation for single third chromosome isogenic lines observed in previously published studies. For example, the ratios (V_nature/V_W;lab) of observed phenotypic variance in the wild and a previously described set of isogenic third chromosomes are 1.240, 1.143, 0.661, and 0.807 for SBM, ABM, SBF, and ABF, respectively (Long et al. 1998). In females, the variance in bristle number for homozygous third chromosomes in the laboratory is much greater than the total variance in bristle number in the wild. Genetic variance in the isogenic lines should have approximately double the additive genetic variance due to third chromosomes relative to an outbred population, but no genetic variance due to first and second chromosomes (Falconer and Mackay 1996, Chap. 15). As the third chromosome represents ∼40% of the Drosophila genome, bristle variation should thus not be thought of as strictly additive over chromosomes.

SNP identification and molecular population genetics of Dl: The regions sequenced for 16 isogenic chromosomes consist of a portion of the Delta gene limited to the 5′ and 3′ UTRs (2801 bp) and the six exons (for a total of 2493 bp), each flanked by 25 bp of intronic sequence. All observed segregating sites (N = 110) are listed in Figure 2. A total of 65 SNPs and 19 insertion/deletion (indel) polymorphisms were identified. Indels were present in only noncoding regions and in all cases were <10 nucleotides. Of the 65 SNPs, 38% were in 5′ and 3′ UTRs (utrSNPs), 5% were in flanking intronic regions, and 57% were in coding regions (cSNPs). Of the 37 cSNPs, 7 were nonsynonymous (nsSNPs) and were all located within the large sixth exon. Despite all the nonsynonymous SNPs being in the sixth exon, the ratio of synonymous to nonsynonymous SNPs in the sixth exon was not significantly different from the same ratio over the other five exons, as assessed by a contingency table analysis (χ² = 3.50, 2 d.f., P = 0.173).

Figure 1.

—Histograms showing the distributions of abdominal (AB) and sternopleural (SB) bristle number in wild-caught male (M; N = 1029) and female (F; N = 1031) D. melanogaster.

The average per site heterozygosity, π, was estimated to be 0.00394 for the surveyed region (Table 1). The estimate of nucleotide diversity θ from the number of segregating sites was 0.00356. Estimates of π and θ are slightly smaller than previous estimates of nucleotide variation described for a restriction survey of the Delta region that included introns and flanking regions (Long et al. 1998) and relative to other structural genes surveyed in Drosophila (Purugganan 1998). Under a neutral model of evolution, the expectations of θ and π are the same, and departure from neutrality can be detected as the normalized difference between θ and π, a statistic referred to as Tajima's D (Tajima 1989). Tajima's D is globally equal to 0.383 and provides no evidence for departure from neutrality for the DNA region considered (P > 0.10). Coding regions show approximately two times higher genetic diversity for π compared to noncoding regions, suggesting that 5′ and 3′ UTRs are under greater functional constraints than regions coding for amino acids. The parameter R = 4Nc, which corresponds to four times the effective population size multiplied by the recombination rate per generation between the most distant sites in the survey (5.5 kb apart), is estimated to be 84.4. The estimate of 15.4 R kb⁻¹ is similar to the previous estimate of 14.9 R kb⁻¹ for the Delta region on the basis of a six-cutter restriction enzyme survey (Long et al. 1998).
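
The comparison of π and θ can be made concrete with a small sketch. The Python function below computes the mean number of pairwise differences, Watterson's θ, and Tajima's D using the standard formulas of Tajima (1989). The function name and the four-sequence toy alignment are illustrative assumptions; the code ignores gaps, returns per-region rather than per-site values, and does not compute a P-value.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Return (pi_total, theta_total, D) for equal-length aligned sequences."""
    n = len(seqs)
    # Segregating sites: alignment columns with more than one state.
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    # Mean number of pairwise differences over all pairs of sequences.
    pairs = list(combinations(seqs, 2))
    k = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    theta = s / a1                      # Watterson's estimator
    if s == 0:
        return k, theta, 0.0
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    d = (k - theta) / sqrt(e1 * s + e2 * s * (s - 1))
    return k, theta, d


# Toy alignment (hypothetical): one doubleton site and one singleton site.
toy = ["AAGT", "AAGT", "ACGT", "ACGA"]
pi_total, theta_total, d = tajimas_d(toy)
print(f"pi = {pi_total:.3f}, theta = {theta_total:.3f}, D = {d:.3f}")
```

A positive D, as in this toy case and in the survey of 16 chromosomes, means that π exceeds θ; an excess of rare variants would instead push D below zero.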

Sequencing of 38 additional isogenic chromosomes for three overlapping amplicons (total of 1.6 kb) that include all identified coding SNPs from the survey of 16 chromosomes resulted in the detection of seven additional biallelic SNPs and one triallelic SNP. Among the newly identified cSNPs, we observed four synonymous and one nonsynonymous biallelic singletons and a triallelic nsSNP (with minor alleles C and A observed once and twice, respectively). We also observed two additional SNPs in the 3′ UTR, one a singleton and the other observed twice. The average minor-allele frequency among the eight newly identified SNPs was 2.21 ± 0.87%, providing empirical support for the assertion that the sequencing of 16 alleles identified the bulk of common variants. The estimate of Tajima's D from the total of 54 DNA sequences for a smaller region including only most of exon 6 and part of the 3′ UTR still did not provide evidence for departure from neutrality (D = –0.124, P > 0.10).

A Hudson-Kreitman-Aguadé test was carried out to compare polymorphism to divergence in Dl UTRs vs. coding DNA (Hudson et al. 1987). We compared the 16 sequences from D. melanogaster to one D. simulans sequence. The D. simulans sequence covered all of the 5′ and 3′ UTR sequences and 93% of the amino acid coding region (we did not sequence all of exons 2, 3, and 4 in D. simulans). Results were consistent with neutrally evolving sites (χ² = 0.012, P = 0.9134). A McDonald and Kreitman test was carried out using the set of 54 sequenced lines for much of exon 6 and the D. simulans sequence to determine if coding regions of Dl have evolved in a nonneutral manner (McDonald and Kreitman 1991). We observed a single fixed replacement substitution vs. 10 nonsynonymous polymorphic sites within D. melanogaster, 18 fixed synonymous substitutions vs. 26 synonymous polymorphic sites in the D. melanogaster data, and no significant departure from neutrality (Fisher's exact test, P = 0.075).
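
The McDonald and Kreitman comparison reduces to a 2 × 2 table of replacement vs. synonymous changes, fixed vs. polymorphic. The Python sketch below is one way to compute a two-sided Fisher's exact P-value for such a table with the standard library only; the function name is hypothetical and the two-sided convention used (summing all tables no more probable than the observed one) is an assumption about how the test was run. With the counts given above it returns a value that agrees with the reported P = 0.075 to the precision shown.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P-value for the 2 x 2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more probable than the observed table. Integer weights are
    compared, so no floating-point tolerance is needed.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    weight = {x: comb(row1, x) * comb(n - row1, col1 - x) for x in range(lo, hi + 1)}
    observed = weight[a]
    extreme = sum(w for w in weight.values() if w <= observed)
    return extreme / sum(weight.values())


# Rows: replacement, synonymous. Columns: fixed between species, polymorphic.
p_value = fisher_exact_two_sided(1, 10, 18, 26)
print(f"P = {p_value:.3f}")
```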

TABLE 1

Nucleotide diversity of coding and untranslated regions of Dl

Figure 2.

—DNA polymorphism in Delta untranslated and coding regions. Numbers above the top sequence indicate SNP positions, which correspond to positions in GenBank accession no. TPA: BK004004. NS, nonsynonymous site. Colors indicate the location in Dl: yellow, untranslated regions; red, coding regions; green, intronic flanking regions. The top sequence is the consensus haplotype with an allelic frequency ≥0.50 for each site; the 1957-bp DNA fragment examined in the large natural population is underlined. The first column indicates the isogenic line numbers corresponding to the numbers described in Long et al. (1998). SNPs identified from the additional 38 chromosomes sequenced (N = 8) are not listed; they are at positions 35,495, 35,558, 36,117, 36,230, 36,234, 36,293, 36,536, and 36,540. Eight insertion-deletion polymorphisms are not listed; they are at positions 15,390–91, 15,392–93, 34,566–71, 36,719–20, 36,722–25, 36,906–14, 37,404–07, and 37,796–801.

Genotyping a large natural cohort: On the basis of the sequencing survey of 16 isogenic chromosomes we chose a subset of SNPs to type in a large sample of wild-caught flies. We genotyped 6 nsSNPs, but did not type the seventh (nsSNP 35,820) as it is in complete LD with typed nsSNP 35,791 in both the samples of 16 (Figure 2) and 54 (Figure 3) chromosomes. We genotyped 6 additional common SNPs (synonymous or 3′ UTR) located in the same 2136-bp amplicon that includes the 1358 most 3′ bases of exon 6 and the 778 adjacent bases of the 3′ UTR. Of the 12 SNPs genotyped, 2 nsSNPs (positions 35,039 and 36,195) are monomorphic in the large sample of 2060 flies. These two sites were also singletons in both the initial sequencing survey of 16 and the combined set of 54 chromosomes. The observation of a singleton in a small sample of chromosomes that is at a much lower frequency in the population from which the sample is drawn is not inconsistent with the Wright-Fisher sampling distribution (Crow and Kimura 1970, p. 434). Of the remaining 10 SNPs that were polymorphic, all except 2 (nsSNPs 35,163 and 35,217, Figure 2) had a minor-allele frequency >5% (Table 2). We observed no significant deviations from Hardy-Weinberg equilibrium on the basis of all the data (Table 2) or testing separately by sex (not shown). The observation of no significant departures from Hardy-Weinberg equilibrium, and in a number of cases almost exact adherence to Hardy-Weinberg proportions (e.g., SNPs 35,217 and 36,555), suggests that the natural population we sampled does not exhibit any evidence for admixture.
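
Population admixture tends to produce a deficit of heterozygotes, which is why adherence to Hardy-Weinberg proportions is informative here. The following Python sketch shows a standard chi-square goodness-of-fit test of Hardy-Weinberg proportions for one polymorphic biallelic SNP. The function name and the genotype counts are hypothetical and are not taken from Table 2, and the test actually used in the study may differ.

```python
def hardy_weinberg_chi2(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit statistic (1 d.f.) for Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)       # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical genotype counts for 1000 individuals.
exact = hardy_weinberg_chi2(360, 480, 160)                # exact HW proportions
excess_homozygotes = hardy_weinberg_chi2(420, 360, 220)   # heterozygote deficit
print(exact, excess_homozygotes)   # compare with 3.84, the 5% critical value for 1 d.f.
```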

We examined pairwise linkage disequilibrium between SNPs for both the small haploid and large diploid data sets. The haploid data consisted of 24 SNPs with minor-allele frequencies >5% (N = 54; Figure 3A). Although a few polymorphic sites were in strong linkage disequilibrium, most pairs of sites did not show significant departures from equilibrium (significant LD was observed for 24% of the comparisons at 0.05, for 0.2% at 0.01, and for 0.5% at 0.001). These results are consistent with the relatively high level of recombination within the Delta region suggested by previous surveys (Long et al. 1998). Linkage disequilibrium between SNPs for the 2060 wild-caught flies was estimated for 8 SNPs having a minor-allele frequency >5% (Figure 3B). No significant LD was observed for any of the 28 pairwise comparisons, in agreement with the same set of SNPs in the 54 isogenic chromosome samples in which only 1 of the 28 pairwise comparisons is significant at P < 0.05. The general pattern of R² values is comparable in the small and large samples, except that we saw more instances of D′ equal to one in the smaller sample than in the larger sample (11/28 compared to 2/28). The implication is that the sampling error in D′ estimates is quite large and can often include D′ equal to one, even for samples of >50 phase-known chromosomes with minor-allele frequencies >5%.
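
The contrast between D′ and R² in small samples can be seen in a short example. The Python sketch below computes D, the absolute value of D′, and R² from counts of the four phase-known two-site haplotypes, using the standard definitions. The function name and the counts are hypothetical; the example simply shows that D′ equals one whenever one of the four haplotypes is absent from the sample, even when R² is modest. Both sites must be polymorphic for the ratios to be defined.

```python
def ld_statistics(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, |D'|, r^2) from counts of the four phase-known haplotypes."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    d = n_AB / n - p_A * p_B
    # D' scales D by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max
    r2 = d ** 2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, d_prime, r2


# Hypothetical counts for 54 chromosomes in which the A-b haplotype is absent.
d, d_prime, r2 = ld_statistics(10, 0, 20, 24)
print(f"D = {d:.4f}, D' = {d_prime:.2f}, r^2 = {r2:.3f}")
```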

Data quality from the high-throughput OLA genotyping assay: As we present a relatively novel SNP typing assay, it is worthwhile to evaluate the quality (and quantity) of data obtained using this method. We summarize the efficiency of our approach using three statistics. The average call rate per SNP is the proportion of individuals we assay to which we assign a genotype. In this experiment we observe an average call rate of 90.7 ± 2.8% (mean ± SD; Table 2). A large fraction of the noncalled individuals are likely due to PCR failures, which are estimated to be ∼5% on the basis of running agarose gels (N = 384; data not shown). We further define a statistic called the miscall rate, which is the proportion of individuals that we estimate are assigned the incorrect genotype. We estimate our miscall rate to be 0.087 ± 0.06% (mean ± SD; Table 2). Thus our method performs at close to the 99.9% accuracy attributable to many SNP typing methodologies, although here quality control is assessed directly from the SNP typing experiment itself.

Both the call and miscall rates are calculated on the basis of measures of internal consistency over replicates. It is possible that an assay is internally very consistent (hence having a low miscall rate) and yet biased in some manner so that it consistently makes the incorrect call. We refer to this source of error as the assay failure rate. Such failures are often attributed to uncontrolled sources of variation such as polymorphisms in primer-binding sites. To assess the assay failure rate we need to estimate error rates using an independent assay method. We used ASO assays successfully developed for four of the SNPs typed by OLA (SNPs 35,639, 35,791, 36,116, and 36,271) to identify individuals for which the two methods gave discordant calls (121, 180, 176, and 142 individuals, respectively). We then used DNA sequencing for a subset of the discordant individuals to determine which method is correct. DNA sequencing and OLA were in agreement for 12/12, 12/12, 43/44, and 32/32 sequences for SNPs 35,639, 35,791, 36,116, and 36,271, respectively. Thus we estimate the OLA failure rate as 0.12 ± 0.04% (mean ± SD over SNPs; Table 2). We conclude that OLA has an acceptable failure rate given our assay conditions, but note that the assay failure rate is likely to vary on a SNP-by-SNP basis and is generally difficult to measure.

Figure 3.

—Patterns of pairwise linkage disequilibrium between common polymorphisms employed in the association study (minor-allele frequency >0.05). Linkage disequilibrium estimates R² and D′ are plotted for each pairwise comparison. (A) Haploid data from 54 sequenced isogenic lines; (B) diploid data from 2060 wild-caught flies. SNP numbers indicated at the top of A correspond to the SNP positions in Figure 2. Minor-allele frequencies are indicated on the right. SNPs analyzed in both studies are connected by dashed red lines.

Associations between bristle number variation and polymorphic SNPs in Dl: Figure 4 plots F-statistics of tests for association between SNPs and bristle number variation in both wild-caught and laboratory lines. Figure 4, A and B, shows additive and arbitrary dominance models fitted to the wild population, whereas Figure 4, C and D, shows a haploid model fitted to 47 isogenic lines homozygous for the entire third chromosome or 46 lines homozygous for an introgressed fragment including Dl, respectively (Long et al. 1998). It is worthwhile to note that the model in Figure 4B is statistically equivalent to a one-way ANOVA testing for significant differences among the three genotypic classes. Associations in Figure 4, C and D, are for all 36 of the observed SNPs between positions 35,018 and 36,555 (Figure 2), except 3 (SNPs 35,495, 36,041, and 36,540), for which phenotypic data were not available for the rare allele (see Long et al. 1998). We observed 3 SNPs (SNP 35,163 for ABF at P < 0.030, SNP 35,639 for SBM at P < 0.023, and SNP 35,579 for ABM at P < 0.048) that were significantly associated at P < 0.05 under the additive model analyzing the data separately by sex, and none were significantly associated under the arbitrary dominance model in the large wild-caught cohort. We observed three significant associations under the haploid model in the W background (SNP 36,230 for ABF at P < 0.009, SNP 35,114 for SBM at P < 0.043, and SNP 35,465 for SBM at P < 0.046) and one significant association in the B background (SNP 35,039 for ABM at P < 0.017). To correct for false positives arising from multiple comparisons we used a permutation testing approach. When significance thresholds are calculated in this way, none of the observed marginal associations remains significant (Figure 4, dashed lines).
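
An experimentwise threshold of the kind plotted in Figure 4 can be sketched as follows. The Python code below uses one common permutation scheme: phenotypes are shuffled relative to genotypes, the largest one-way ANOVA F-statistic over all markers is recorded for each shuffle, and the 95th percentile of those maxima is taken as the threshold. The function names, the simulated data, and the choice of the maximum-statistic scheme are assumptions for illustration; the exact procedure used in the study may differ.

```python
import random


def anova_f(genotypes, phenotypes):
    """One-way ANOVA F-statistic for phenotype differences among genotype classes."""
    classes = {}
    for g, y in zip(genotypes, phenotypes):
        classes.setdefault(g, []).append(y)
    n, k = len(phenotypes), len(classes)
    grand_mean = sum(phenotypes) / n
    ss_between = sum(len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2
                     for ys in classes.values())
    ss_within = sum((y - sum(ys) / len(ys)) ** 2
                    for ys in classes.values() for y in ys)
    if k < 2 or n <= k or ss_within == 0:
        return 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_threshold(markers, phenotypes, n_perm=1000, alpha=0.05, seed=1):
    """Approximate (1 - alpha) quantile of the largest F over all markers
    when phenotypes are shuffled relative to genotypes."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    null_maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_maxima.append(max(anova_f(m, shuffled) for m in markers))
    null_maxima.sort()
    return null_maxima[min(n_perm - 1, int((1 - alpha) * n_perm))]


# Simulated data with no true association (hypothetical, for illustration only).
rng = random.Random(0)
phenotypes = [rng.gauss(17.0, 2.0) for _ in range(200)]
markers = [[rng.choice((0, 1, 2)) for _ in range(200)] for _ in range(10)]
threshold = permutation_threshold(markers, phenotypes, n_perm=200)
print(f"5% experimentwise F threshold: {threshold:.2f}")
```

Because the maximum is taken over all markers within each shuffle, the threshold accounts for the number of tests and for any correlation among markers without a separate correction.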

TABLE 2

SNP variation in Dl exon 6 in D. melanogaster natural population

In addition to analyzing marker/phenotype associations separated by sex, we also fitted models in which the two sexes were combined and we tested for marker, sex, and marker-by-sex interactions. Results from these analyses were in qualitative agreement with those obtained when the sexes were considered separately. The one exception was for SNP 36,271, where we detected a marker-by-sex interaction for abdominal bristle number (F = 6.00, P = 0.014), despite not detecting a significant marker effect or significant effects on bristle number when the sexes were analyzed separately. We also tested all possible two-way epistatic interactions between SNPs marginally associated with bristle number variation (regardless of the background, sex, or bristle trait for which we observed significance). For the laboratory lines we fitted the additive effect at each marker and a two-way interaction; for the large natural cohort we fitted an additive and arbitrary dominance effect at each marker and an additional additive-by-additive interaction term, with data being analyzed separately by sex. None of these pairwise comparisons was found to be significant.

Although empirical thresholds were different for the haploid and diploid models as the number of markers and individuals varied over experiments, empirical thresholds were generally very similar within the haploid and diploid models over characters and sexes. The one exception to this rule is the W background for abdominal bristle number (Figure 4C, solid and shaded lines for ABM and ABF, respectively). These departures from a more typical threshold of ∼11 were due mainly to two phenotypically atypical lines; one line had a count of 26.7 abdominal bristles in males and the other a count of 15.1 abdominal bristles in females. Once these two lines were removed and the permutation testing was repeated, the threshold for experimentwise significance at P < 0.05 returned to more typical values of 10.9 and 11.1 for ABM and ABF, respectively.

Small effects can be detected: Despite the lack of significance via the permutation-testing framework, the maximum-likelihood estimates of the phenotypic effects associated with each SNP in the large natural cohort suggest that the failure to uncover associations is not due to lack of power. Table 3 lists effects associated with each of the 10 SNPs typed in the large natural cohort (with sign relative to the major allele) under additive and arbitrary dominance models for diploid data and under a haploid model for the laboratory lines. Standard deviations associated with phenotypic effects are generally larger in the laboratory than in natural populations, with average standard deviations of 0.89, 0.47, and 0.16 for the W background, B background, and wild cohort under an additive model, respectively. Thus, with large sample sizes phenotypic effects associated with markers can be accurately measured even without controlling for environmental and segregating genetic variation.

Figure 4.

F-statistics for association between SNPs and variation in bristle number for both sexes (blue for males and red for females) and abdominal bristles (AB) and sternopleural bristles (SB). (A and B) Association study on 10 SNPs for a sample of 2060 wild-caught flies genotyped via OLA, under additive and arbitrary dominance models, respectively. (C and D) Association study for the laboratory lines on 33 segregating SNPs for which phenotypic data are available, genotyped via sequencing. (C) Data from the W background sample that corresponds to isogenic lines homozygous for the entire third chromosome (N = 47). (D) Data from the B background sample that corresponds to isogenic lines homozygous for an introgressed fragment including Dl (N = 46). Red lines at the top of C and D show the positions of SNPs typed in the large wild cohort; black dashed lines show the positions of additional nsSNPs that are either in complete LD with other typed SNPs or monomorphic in the wild population. Experimentwise false-positive thresholds of 5% are represented by gray dashed lines for both sexes, except in C, where the thresholds differ between males (gray dashed line) and females (black dashed line). The bottom part of the figure is a diagram of Delta showing the relative physical positions of the SNPs and functionally annotated features drawn to scale; exons are marked by black boxes.

On the basis of accurate genotypic frequency and allelic-effect estimates in the large natural cohort it is possible to ask how much of the total phenotypic variation could possibly be associated with each of the SNPs examined. To do this we use the central limit theorem to place a 95% upper bound on the maximum-likelihood estimate of the phenotypic effect associated with each marker (amax). This 95% upper bound on a is conceptually closely related to a power analysis, as the expected value of amax is the point where we have roughly 80% power to reject the null hypothesis (see DISCUSSION). We then substitute our estimated amax into the formula for the additive genetic variance under an additive model to calculate VAmax. We calculate a similar statistic for haploid laboratory lines primarily for comparison, although estimates of laboratory parameters are generally less accurate and results are more difficult to interpret. Figure 5, A (male data) and B (female data), plots the maximum proportion of total variance attributable to each SNP by sex and trait, using colors to identify the different genetic backgrounds. Figure 5 shows that the estimates of phenotypic effects associated with each marker in the large natural cohort are inconsistent with these markers contributing to a meaningful proportion of the total phenotypic variation. In fact, the largest fraction of total phenotypic variation attributable to any of the nonsynonymous SNPs, for either of the bristle characters in either males or females in the wild cohort, is only 1.7%.

TABLE 3

Phenotypic effects in natural and laboratory populations

Figure 5.

—Ratios of VAmax/VP, for each of the 10 polymorphic sites typed in the laboratory and natural populations. A and B are for male and female data, respectively. Colors represent ratios from different genetic backgrounds [green, isogenic lines homozygous for an introgressed fragment including Dl (B background); black, isogenic lines homozygous for entire third chromosome (W background); red, natural population sample]. Gray dashed lines represent the 5% value of the ratio VAmax/VP, and red lines at the bottom represent the position of each SNP. SB, sternopleural bristles; AB, abdominal bristles.

An objective of this work was to compare phenotypic effects between the natural populations and the laboratory lines. We estimated the correlation between effects measured in different backgrounds over SNPs, characters, and sexes (i.e., each correlation is calculated over 10 SNPs × 2 characters × 2 sexes). We observed that effects in the W background and the large natural cohort were not significantly correlated (t = 1.93, ρ = 0.30, P > 0.05), although effects measured in the B background were significantly correlated with those measured in the large natural cohort (t = 3.99, ρ = 0.53, P < 0.01). We do not have a good explanation for why this correlation exists, as effects measured in two arbitrary halves of the natural cohort are uncorrelated (t = 1.40, ρ = 0.22, P > 0.05). The correlation of effects over arbitrary halves of the natural cohort would seem to define an upper bound on what the correlation should be between the natural cohort and any other background; thus it is likely that the observed correlation between effects in the B background and the natural cohort is an unexplained artifact. One possibility is that estimates of allelic effects in the B background are more accurate estimates of true parameters than those in the W background (since more non-Dl segregating and environmental variation is controlled for in the B background), and we confirm these measures only when we have the power of the full natural cohort.
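
The relationship between the reported correlations and t-statistics can be checked with the usual test of a correlation coefficient against zero, t = ρ√(n − 2)/√(1 − ρ²) with n − 2 degrees of freedom. The Python sketch below assumes n = 40 estimates per correlation (10 SNPs × 2 characters × 2 sexes), which is an inference from the text rather than a stated sample size; the function name is hypothetical. It gives values close to the reported ones, and small differences are expected because ρ is rounded to two decimals.

```python
from math import sqrt


def correlation_t(r, n):
    """t-statistic (n - 2 d.f.) for testing a product-moment correlation against zero."""
    return r * sqrt(n - 2) / sqrt(1.0 - r * r)


# n = 40 assumes one effect estimate per SNP x character x sex (10 x 2 x 2).
n_estimates = 10 * 2 * 2
for label, r in (("W vs. natural", 0.30), ("B vs. natural", 0.53), ("natural halves", 0.22)):
    print(f"{label}: t = {correlation_t(r, n_estimates):.2f}")
```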

DISCUSSION

A novel method for high-throughput SNP genotyping: The high-throughput genotyping methodology we developed is based on previous methods investigated by Tobe et al. (1996) and Schouten et al. (2002). It is straightforward, cost efficient, and can easily handle large numbers of individuals. An additional feature is that most of the equipment is standard in molecular biology laboratories. On average we were able to genotype >90% of our individuals and we estimate our error rate to be <0.2%. Since we estimate our error rates directly from the SNP typing pipeline, as opposed to an independent assay, we are unlikely to consistently underestimate our error rate.

Predicting functional SNPs on the basis of sequence analysis: The molecular population genetic analysis of Delta does not suggest it has been a target of recent natural selection. We did not observe an excess of rare variants in DNA sequence data compared with what is expected under a neutral model (Tajima 1989) or a signature of past adaptive fixation of amino acids or differences in polymorphism to divergence levels in coding vs. noncoding regions (Hudson et al. 1987). Similarly, patterns of linkage disequilibrium in the region do not appear atypical. Thus population genetics analysis provides no evidence to reject the null hypothesis that the Delta coding region is evolving solely under mutation, recombination, and drift. We observe that regulatory regions are less polymorphic than coding regions in Dl. This observation is consistent with previous studies in Drosophila (e.g., Moriyama and Powell 1996), reporting that noncoding regions (5′ and 3′ untranslated regions and introns) in many nuclear genes show less polymorphism than do silent sites within the coding regions. The same pattern of constraint on regulatory regions is observed in humans (e.g., Cargill et al. 1999).

Lieber et al. (1992) demonstrated that single-amino-acid changes in Dl extracellular ELRs 4 and 9 (mutants Dlsup5 and Dlsup4) altered developmental signaling of Notch-Dl complexes and eye development, and Parks et al. (2000) recently showed that ELR 3 is required for Delta endocytosis, Notch trans-endocytosis, and for adoption of cell fate. We detected four SNPs in a subset of these extracellular domains (Figure 4, bottom). Three SNPs were observed in EGF-like domains: nsSNP 35,039 is in the fourth ELR, nsSNP 35,163 is in the sixth ELR, and SNP 35,207 is in the first calcium-binding domain. Moreover we observed one nsSNP (35,639) in the downstream transmembrane domain. We observed no association between these SNPs and bristle number variation. We are led to conclude that detailed functional molecular dissection of a gene is not necessarily informative as to whether or not any given SNP is likely to have a phenotypic effect associated with it. This result has repercussions for the study of human disease (Botstein and Risch 2003).

Relationship between laboratory and natural populations: Comparison between phenotypic effects measured in large natural populations and laboratory-maintained strains represents an opportunity to detect differences between effects measured in two different environments. Past QTL mapping studies have detected significant genotype-by-environment interactions (Mackay 2001), so it seems plausible that phenotypic effects are different in the wild and in the laboratory. Models that purport to explain the maintenance of quantitative genetic variation are dependent on the phenotypic effects, frequencies, and the form of selection operating in nature on the genetic variants that contribute to phenotypic variation (Barton and Turelli 1989). Unfortunately our failure to observe significant associations precludes any assessment of the relationship between effects in natural and laboratory lines.

No evidence for an association between common nonsynonymous variants in Dl and standing variation in bristle number: No polymorphic sites in the Dl coding region, including six of seven common nonsynonymous variants, were significantly associated with variation in bristle number, as assessed by a permutation testing procedure. As the seventh common nsSNP is in complete LD with another typed nsSNP in a sample of 54 chromosomes we conclude that none of the common nonsynonymous SNPs in Delta are associated with bristle number variation. This observation is true under either additive or arbitrary dominance models for a sample of 2060 wild-caught flies, a set of 47 isogenic third chromosome lines, and a set of 46 isogenic lines differing in only a small wild introgressed fragment including Dl. The failure to detect associations is not due to a lack of power, as upper bounds on the fraction of variation attributable to cSNPs in Delta in a natural population sample are always <1.7%. We directly or indirectly examined every common nonsynonymous variant in Delta; as a result this conclusion is not dependent on detailed descriptions of linkage disequilibrium. We thus exclude nonsynonymous variants in Dl as an explanation for the previous observation of significant associations between two marker SNPs in introns of Dl and bristle number variation (Long et al. 1998). These results do not exclude the possibility that a number of individually rare ungenotyped SNPs may contribute to the bulk of the genetic variance; such low-frequency small-effect SNPs are very difficult to characterize at a molecular level. Future work will more closely examine multiple SNPs in the intronic regions harboring previously observed SNPs significantly associated with bristle number variation in laboratory strains; such work will involve a great deal of additional genotyping effort.

The quantitative genetic models we fit to the data allow us to estimate the random variables a and d, which are estimators of the true underlying population parameters α and δ. If errors within genotypic classes are Gaussian, it can be shown that the estimates of α and δ obtained from fitting the above statistical models to the data are maximum-likelihood estimates of these parameters (Stuart and Ord 1994). When the population sample size approaches 1000 individuals (and only a few parameters are estimated from the data), under the central limit theorem parameter estimates are very close to normally distributed, and standard deviations on these estimates can be used to accurately place upper and lower bounds on α and δ. These upper and lower bounds are valid regardless of the observed significance of the statistical test and are useful in determining how large an effect attributable to a typed SNP could truly be and yet remain undetected. The upper bound we define as amax is closely related to the value of a under the alternative hypothesis where we have 80% power to detect a site significantly associated with bristle number variation. Since amax = |a| + 1.96σa, the expected value of amax is ∼0.8σa + 1.96σa, and the critical value required to reject the null hypothesis of no effect is 1.96σa, the resulting power at the expected value of amax is 79%. It is important to have small errors on the estimates of allelic effects; otherwise amax will be large, and we are unable to exclude sites that contribute to >5% of the total variation in bristle number. In this study we typed a large number of individuals, errors on estimates of allelic effects were thus small, and as a result we can reject all sites examined as contributing to >2% of bristle number variation.
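
The 79% figure can be reproduced with a short calculation. The Python sketch below works in units of σa and assumes that the ∼0.8σa term is the expected absolute value of a normally distributed estimate centered on zero, which equals √(2/π)σa; that reading, and the variable names, are assumptions for illustration. Power is then the probability that an estimate centered on the expected amax falls beyond the two-sided critical value of 1.96σa.

```python
from math import erf, pi, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


# Work in units of sigma_a, the standard deviation of the effect estimate.
expected_abs_a = sqrt(2.0 / pi)          # E|a| when a ~ N(0, sigma_a^2): about 0.8
expected_a_max = expected_abs_a + 1.96   # expected value of a_max
critical = 1.96                          # two-sided 5% critical value

# Power of the two-sided test when the true effect equals the expected a_max.
power = (1.0 - normal_cdf(critical - expected_a_max)) + normal_cdf(-critical - expected_a_max)
print(f"E|a| = {expected_abs_a:.3f} sigma_a, power = {power:.1%}")
```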

In previous studies, several noncoding variants have been found to be associated with variation in bristle number for different loci. Of three variants in the scabrous locus associated with bristle number variation, two were localized to the 3′ regulatory region of the gene (Lyman et al. 1999). At the hairy locus, a complex indel polymorphism ∼3.5 kb upstream of the transcription start site was associated with bristle variation, and the single nsSNP identified in hairy was not associated with bristle number variation (Robin et al. 2002). Such results support the argument that regulatory variants are more likely than nonsynonymous variants to contribute to short-term evolutionary response and standing variation (Chakravarti 1999; Stern 2000). By examining all nsSNPs in Delta in a large enough sample that meaningful associations are detectable, we are able to rule out common nonsynonymous variants in Delta as contributors to abdominal or sternopleural bristle number variation. These results suggest that regulatory (cis-acting regulatory and/or splicing variants) as opposed to amino acid changing variants contribute to previously observed associations between bristle number and polymorphisms in the Delta gene region. The ultimate identification of such regulatory SNPs will require additional high-throughput genotyping of SNPs throughout the regulatory regions and introns of Dl and other members of the Notch-signaling pathway.

Acknowledgments

The authors are grateful to T. Hudson and the Montreal Genome Centre for providing facilities and expertise and to S. J. Macdonald and P. Beldade for helpful comments on the manuscript. This work was supported by the National Institutes of Health grant GM-58564.

Footnotes

  • Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under accession nos. AY437140-AY437325 and AY438-438202.

  • Communicating editor: M. Aguadé

  • Received May 12, 2003.
  • Accepted October 9, 2003.
