
Originally published as Genetics Published Articles Ahead of Print on April 15, 2007.

Genetics, Vol. 176, 1261-1281, June 2007, Copyright © 2007
doi:10.1534/genetics.106.069641


Joint Estimates of Quantitative Trait Locus Effect and Frequency Using Synthetic Recombinant Populations of Drosophila melanogaster

Stuart J. Macdonald*,1 and Anthony D. Long†

* Department of Ecology and Evolutionary Biology and Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas 66045 and † Department of Ecology and Evolutionary Biology, University of California, Irvine, California 92697

1 Corresponding author: Department of Ecology and Evolutionary Biology, University of Kansas, 1030 Haworth Hall, 1200 Sunnyside Ave., Lawrence, KS 66045.
E-mail: sjmac@ku.edu

Manuscript received December 13, 2006. Accepted for publication April 10, 2007.


    ABSTRACT
 
We develop and implement a strategy to map QTL in two synthetic populations of Drosophila melanogaster each initiated with eight inbred founder strains. These recombinant populations allow simultaneous estimates of QTL location, effect, and frequency. Five X-linked QTL influencing bristle number were resolved to intervals of ~1.3 cM. We confirm previous observations of bristle number QTL distal to 4A at the tip of the chromosome and identify two novel QTL in 7F–8C, an interval that does not include any classic bristle number candidate genes. If QTL at the tip of the X are biallelic they appear to be intermediate in frequency, although there is evidence that these QTL may reside in multiallelic haplotypes. Conversely, the two QTL mapping to the middle of the X chromosome are likely rare: in each case the minor allele is observed in only 1 of the 16 founders. Assuming additivity and biallelism we estimate that identified QTL contribute 1.0 and 8.7%, respectively, to total phenotypic variation in male abdominal and sternopleural bristle number in nature. Models that seek to explain the maintenance of genetic variation make different predictions about the population frequency of QTL alleles. Thus, mapping QTL in eight-way recombinant populations can distinguish between these models.


VARIATION in quantitative, or complex, traits is influenced by numerous genetic loci and by environmental factors. For many complex traits we have estimates of the fraction of phenotypic variation that is due to genetic factors, but we do not have a general understanding of the number, effect, and frequency of the alleles that contribute to phenotypic variation. Are alleles at quantitative trait loci (QTL) generally of large effect, but low in frequency, consistent with models of mutation–selection balance (MSB; reviewed by JOHNSON and BARTON 2005)? Alternatively, is the bulk of standing genetic variation for complex traits due to modest-effect intermediate frequency alleles maintained by some form of balancing selection (reviewed by BARTON and TURELLI 1989; BARTON and KEIGHTLEY 2002)? In the human genetics community the idea that complex trait variation is due to intermediate frequency polymorphisms has been termed the common disease–common variant (CDCV) hypothesis (CARGILL et al. 1999). The distinction between MSB models and balancing selection/CDCV models not only is important for understanding how genetic variation is maintained in populations, but also will affect the power of current population-based approaches to identify risk alleles for human disease (WANG et al. 2005).

The most effective way to clarify the contribution of MSB and CDCV forces in maintaining phenotypic variation is to experimentally identify and characterize the underlying molecular genetic basis of several QTL. With this ultimate goal in mind, two non-mutually exclusive experimental programs are predominant in the literature: QTL mapping and association or linkage disequilibrium (LD) mapping. In its simplest form QTL mapping involves crossing a pair of lines that are differentially fixed for alleles at a genomewide set of marker loci and at QTL contributing to the phenotype. Genotyping and phenotyping a large number of recombinant progeny from this cross identifies genetic intervals that harbor factors contributing to segregating variation in the cross. Since the publication of influential articles by PATERSON et al. (1988) and LANDER and BOTSTEIN (1989), the community has enjoyed considerable success mapping QTL for a wide range of traits in a diverse set of genetic systems. Typically QTL are resolved to broad intervals of ~10 cM (MACKAY 2001), which may represent millions of base pairs. This lack of resolution has hindered identification of the molecular variants involved, particularly in QTL mapping studies of intraspecific variation where QTL can have subtle effects. Physically close genetic factors also pose a problem for QTL mapping, as it may be impossible to accurately estimate the effects and locations of linked QTL, and the number of QTL may be underestimated (WRIGHT and KONG 1997; CORNFORTH and LONG 2003). Additionally, since recombinant individuals for QTL mapping are generally derived from a pair of inbred parental lines, only QTL that segregate between the parents can be identified. As a result there is no way to know the population frequency of mapped QTL.

Association mapping is a population-based genetic mapping strategy. The approach involves genotyping a large number of single nucleotide polymorphisms (SNPs) in a large sample of individuals and at each marker testing for an association between genotype and phenotype. A strong association signal at a SNP suggests either that the SNP itself contributes to trait variation or that the causal site is in strong LD with the SNP marker genotyped. Instead of relying on meiotic recombination in experimental crosses, association mapping utilizes the pattern of historical recombination in a panel of natural chromosomes. Thus, association mapping has the potential for much higher resolution than QTL mapping, and in principle the actual quantitative trait nucleotide (QTN) can be identified and its effect and frequency estimated directly. In practice, association mapping has met with modest success, and the literature is rife with failures to replicate published associations (although see TODD 2006 for a positive view of the future). This reflects a variety of factors, such as cryptic population structure, different patterns of LD or genetic heterogeneity in different populations, or simply insufficient power to detect variants with only subtle effects (KRUGLYAK 1999; LONG and LANGLEY 1999). Association mapping can be effective only when the density of genotyped SNPs is sufficiently high that real associations are not missed (RISCH and MERIKANGAS 1996). Since powerful genomewide association studies are tremendously difficult to carry out, even in humans where resources are considerable (HIRSCHHORN and DALY 2005; WANG et al. 2005), researchers have elected to carry out localized mapping on candidate gene regions (e.g., GENISSEL et al. 2004; PALSSON and GIBSON 2004; MACDONALD et al. 2005a). Such a strategy will fail if the presumed candidate does not actually contribute to trait variation (e.g., FLOREZ et al. 2006). Finally, an aspect of association mapping that is often overlooked is that if much of the genetic variation underlying complex traits is due to rare variants of large effect (as predicted by MSB models) the association mapping paradigm is not very powerful at all, and is almost guaranteed to fail (WEISS and TERWILLIGER 2000; PRITCHARD 2001; REICH and LANDER 2001; PRITCHARD and COX 2002).

It is quite clear that both QTL and association mapping approaches, while powerful in many respects, suffer from distinct drawbacks that prevent the routine identification and characterization of QTN. To make the dissection of complex traits more routine we require a methodology that has some of the resolution of association mapping, combined with the power of QTL mapping to identify factors on a genomewide scale. To determine if standing variation is generally consistent with MSB or CDCV models a method allowing for direct estimation of the population frequency of mapped factors is highly desirable. An ideal methodology would also provide some mechanism with which to identify the precise molecular variants involved.

In this study we describe a mapping scheme that allows joint estimates of QTL effects and frequencies from a recombinant panel derived from multiple founder chromosomes. Conceptually, our approach is similar to the mouse "Collaborative Cross" scheme envisioned by the Complex Trait Consortium (THREADGILL et al. 2002; CHURCHILL et al. 2004) and has parallels with the "heterogeneous stock" strategy (TALBOT et al. 1999; MOTT et al. 2000; DEMAREST et al. 2001) most recently used by VALDAR et al. (2006b) to map QTL for 97 traits in mice. We take two independent sets of eight inbred Drosophila melanogaster lines, and from each set initiate a recombinant population. The genetic material for each synthetic population is thus derived from just eight founders, and after multiple generations of maintenance the genome of each recombinant individual is a mosaic of the founder chromosomes (Figure 1). Chromosomal segments transmitted to the recombinant flies by each of the founders are distinguished using markers composed of short runs of nonrecombining SNPs. Multiple rounds of recombination allow these synthetic populations to be used to map QTL with a fairly high level of resolution. Since each synthetic population is derived from eight founders, a key feature of our approach is that we obtain simultaneous estimates of the effect and the population frequency of each mapped QTL. Furthermore, because mapping resolution is generally a function of the number of generations of recombination since population inception, later generations can be used to map more precisely those QTL detected at an earlier generation in a coarse genomewide scan. Here we detail an experiment to map bristle number QTL on the D. melanogaster X chromosome and describe the analytical platform required to deal with experimental mapping data generated using eight-way synthetic populations.


 
FIGURE 1.— Creation of the synthetic recombinant D. melanogaster populations. A recombinant mapping population is initiated from a set of eight inbred lines, A–H, that are intercrossed (virgin females crossed to males) in a one-way round-robin design: line A crossed to line B, line B crossed to line C, ..., and line H crossed to line A. From each of the eight crosses, 10 male and 10 virgin female F1 progeny were collected, pooled, and used to initiate the two replicate synthetic recombinant populations. Only the sex chromosomes and one set of autosomes are presented. Full details of the precise crosses performed are provided in the MATERIALS AND METHODS.

 

    MATERIALS AND METHODS
 
D. melanogaster stocks:
All 16 wild-type D. melanogaster lines used to found the synthetic populations (Table 1) have been examined for both PM and IR dysgenesis and were shown to be MI (KIDWELL et al. 1983). We also made use of the strain of D. melanogaster used for genome sequencing, the "sequenced strain" (Bloomington Drosophila Stock Center no. 2057, ADAMS et al. 2000; CELNIKER et al. 2002), which has the M cytotype. We further verified that all lines are free of P elements using a PCR-based transposon-display assay (details available on request). After founding the synthetic populations we found that lines A7 and B8 (Table 1) were genetically indistinguishable on the basis of the X-linked markers we describe here. This likely represents an error that occurred at the stock center.



 
TABLE 1 D. melanogaster stocks

 
Synthetic recombinant populations:
Four synthetic populations were created: population A replicates 1 and 2 (pAr1, pAr2) were initiated from lines A1–A8, and population B replicates 1 and 2 (pBr1, pBr2) were initiated from lines B1–B8 (Table 1 and Figure 1). To initiate the pA populations the following line crosses were carried out: A4 x A3, A3 x A7, A7 x A8, A8 x A5, A5 x A2, A2 x A6, A6 x A1, and A1 x A4 (virgin females x males in each case). In the following generation (generation G0), 10 male and 10 virgin female progeny from each of these eight crosses were combined into a single bottle and allowed to lay eggs. These 160 flies were transferred to a fresh bottle on three successive days to generate four replicate bottles (b1, b2, b3, and b4). In the following generation (generation G1) offspring from bottles b1 and b4 were mixed and distributed into four fresh bottles (pAr1–b1, pAr1–b2, pAr2–b1, pAr2–b2), and offspring from bottles b2 and b3 were mixed and distributed into a further four bottles (pAr1–b3, pAr1–b4, pAr2–b3, pAr2–b4). This procedure produced eight bottles, four for each of the two replicate pA populations. From this point onward the replicate populations pAr1 and pAr2 were maintained independently. At generation G2, and in every subsequent generation, within each replicate, bottles b1 and b4 were mixed and distributed into two fresh bottles numbered b1 and b2, and bottles b2 and b3 were mixed and distributed into two fresh bottles numbered b3 and b4. This strategy effectively maintained each replicate population as a single, large, interbreeding cohort despite being split across four bottles. The census size for each population was maintained at well over 1000 individuals to minimize the effects of random genetic drift. The pair of replicate pB populations was established and maintained in a similar fashion.

Experimental flies:
Figure 2 presents the strategy used to create the individuals used for phenotyping and genotyping.


 
FIGURE 2.— Overview of the experimental strategy. Virgin females from the synthetic D. melanogaster mapping population (colored mosaic chromosomes) are crossed to males of the isogenic strain of D. melanogaster used for genome sequencing (uniform black chromosomes). All F1 progeny from this cross are trans-heterozygotes of a maternally inherited synthetic recombinant chromosome against a paternally inherited chromosome from the isogenic strain. Male F1 are hemizygous for the recombinant X chromosome. F1 trans-heterozygotes are each phenotyped for the trait of interest and genotyped for a set of molecular markers spanning the chromosome(s). As shown, a marker represents a multilocus genotype from a set of nonrecombining SNPs (four in this example). Markers allow the eight lines founding the recombinant population to be distinguished. Only the sex chromosomes are presented in this figure—autosomes would behave similarly to female X chromosomes.

 
Coarse mapping:
Virgin females were collected from each of the four synthetic populations and aged in groups of 50 in vials for 2–5 days. Twelve aged virgin females from a given population were crossed to 12 males from the sequenced strain in vials. Multiple replicate vials were created, and from each vial 4 male and 4 female offspring were used for phenotyping and genotyping. The experimental flies are thus F1 progeny of a recombinant female and an isogenic male. The coarse-mapping experiment was split into four blocks: in block 1 (generation G16 of the synthetic populations) 24 vials were set up for each of the four populations (pAr1, pAr2, pBr1, and pBr2), and in blocks 2–4 (generations G17–G19) 36 vials were set up for each population. This resulted in a total of 528 male and female experimental flies collected for each population. For each fly two phenotypic measurements were taken: sternopleural bristle number (SBN) is the sum of the number of macro- and microchaetae on the left and right sternopleural plates, and abdominal bristle number (ABN) is the number of microchaetae on the most posterior sternite, corresponding to segment six of females and segment five of males.

A subset of the coarse-mapping experimental flies was tested for the presence of P elements using a transposon-display assay. All flies should be P free. We found that flies derived from population pAr2 showed P elements, implying that pAr2 was contaminated. This population was destroyed, and experimental flies from this population are not considered further.

Fine mapping:
Virgin females were collected from synthetic populations pAr1 and pBr1 and aged as for the coarse-mapping experiment. Multiple vial crosses were set up between 10 aged virgin females and 10 sequenced-strain males, and from each vial 4 male offspring were used for phenotyping and genotyping. The fine-mapping experiment was split into two blocks. In block 1 (generation G55) 144 vials were set up for each of the two populations pAr1 and pBr1, and in block 2 (generation G56) 120 vials were set up for each population. This resulted in a total of 1056 male experimental progeny collected for the pAr1 and pBr1 synthetic populations. Populations pAr1 and pBr1 were shown to be free of P elements at generation G52, just prior to beginning the fine-mapping experiment.

Molecular marker development:
We sought to identify 1-kb sequence fragments harboring several polymorphisms that collectively distinguish the founders (Figure 2), and over the coarse- and fine-mapping experiments developed 24 such markers (Table 2). Fourteen of these were identified via blind resequencing of 1-kb, primarily noncoding regions of the D. melanogaster genome for the founder lines (this study and MACDONALD and LONG 2005). The remaining 10 markers were taken from resequencing data generated by others (HARR et al. 2002; ORENGO and AGUADÉ 2004; DUMONT and AQUADRO 2005; OMETTO et al. 2005). Intermarker recombination fractions for the experimental panels of flies are estimated from the genotyping data (described below). To place markers to the standard D. melanogaster genetic map we extracted from FlyBase (http://www.FlyBase.org) all those genes with a known physical position (in base pairs), and an estimated genetic position (in centimorgans). For each chromosome we plotted base pairs against centimorgans, and using the ksmooth function in the statistical programming language R (http://www.R-project.org) generated a smoothed curve through the data. For each marker, using the smoothed curve we estimated the genetic position (on the standard map) from the known physical position. These marker positions were subsequently used as anchors to estimate QTL positions on the standard D. melanogaster genetic and physical maps.
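The interpolation from physical to genetic position can be illustrated with a short sketch. The authors used the ksmooth function in R; the Python code below is not that function but a plain Nadaraya-Watson smoother with a Gaussian kernel, and the anchor-gene coordinates and bandwidth are invented for illustration only.

```python
import math

def kernel_smooth(x_known, y_known, x_query, bandwidth):
    """Nadaraya-Watson estimate of y at x_query using a Gaussian kernel.

    bandwidth is the kernel standard deviation, in the same units as x.
    (R's ksmooth scales its bandwidth differently; this is only a sketch.)
    """
    weights = [math.exp(-0.5 * ((x_query - x) / bandwidth) ** 2) for x in x_known]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("query is too far from every anchor for this bandwidth")
    return sum(w * y for w, y in zip(weights, y_known)) / total

# Hypothetical anchor genes: physical position (bp) and genetic position (cM).
genes_bp = [1.0e6, 2.0e6, 3.0e6, 4.0e6, 5.0e6]
genes_cm = [1.0, 3.0, 6.0, 10.0, 15.0]

# Estimated genetic position of a marker whose physical position is known.
marker_cm = kernel_smooth(genes_bp, genes_cm, 3.0e6, bandwidth=1.0e6)
```

The choice of bandwidth controls how strongly local irregularities in the FlyBase map are averaged out; the value used by the authors is not stated in the text.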



 
TABLE 2 Details of the PCR amplicons used for genotyping

 
Genotyping:
Following phenotyping, experimental flies were deposited directly into 96-well plates on ice. We also collected 12 female flies from each of the 16 lines used to found the synthetic populations and multiple females from the sequenced strain. Subsequently, DNA from all flies was extracted in 96-well format (described in GRUBER et al. 2007), and diluted DNA was aliquoted into 384-well plates and dried down in preparation for PCR. Together with blanks and various control samples, the coarse-mapping DNA panel consisted of 12 384-well plates, and the fine-mapping DNA panel consisted of 6 384-well plates. The entire coarse-mapping (fine-mapping) DNA panel was PCR amplified for the appropriate 12 (17) 1-kb amplicons in standard 5-µl PCR reactions. These PCR products were pooled in groups of two or three and used as a template for multiplex genotyping of SNPs contained within the fragments. MACDONALD et al. (2005b) provides full details of this genotyping methodology.

The genotype data were processed using custom routines implemented in the statistical programming language R (http://www.R-project.org). First, we ensured that none of the SNPs genotyped segregated within the sequenced strain. Next, for each of the experimental flies we found the maternally inherited haplotype from the synthetic recombinant population. No change to the genotyping data from males is required, since all SNPs are X linked and Drosophila males have a hemizygous X. Experimental females have both a paternally inherited sequenced-strain X and a maternally inherited recombinant X. Because the sequenced strain is isogenic, the haplotype of the recombinant chromosome for each experimental female can be obtained by subtraction. For example, if the sequenced strain is abc, and we observe an experimental female genotype of aaBbCc, we know the inherited recombinant maternal chromosome is aBC. Thus, the maternally inherited recombinant haplotype can always be unambiguously defined.
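The subtraction step for females can be sketched as follows. This is a minimal illustration in Python rather than the authors' R routines; the function name and the two-characters-per-SNP string encoding are assumptions made for the example.

```python
def maternal_haplotype(female_genotype, sequenced_haplotype):
    """Recover the maternal haplotype by removing the paternal allele at each SNP.

    female_genotype     : diploid calls, two characters per SNP, e.g. "aaBbCc"
    sequenced_haplotype : the isogenic paternal haplotype, e.g. "abc"
    """
    if len(female_genotype) != 2 * len(sequenced_haplotype):
        raise ValueError("expected two allele characters per SNP")
    maternal = []
    for idx, paternal in enumerate(sequenced_haplotype):
        pair = female_genotype[2 * idx: 2 * idx + 2]
        if paternal not in pair:
            raise ValueError(f"paternal allele missing at SNP {idx + 1}")
        # Drop one copy of the paternal allele; the remainder is maternal.
        maternal.append(pair.replace(paternal, "", 1))
    return "".join(maternal)

example = maternal_haplotype("aaBbCc", "abc")  # the case described in the text
```

For the example in the text (sequenced strain abc, female genotype aaBbCc) the function returns aBC.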

The next step is to transform the haplotype data from the experimental individuals into a three-dimensional matrix, G, where $G_{imk}$ takes a binary value describing whether the observed maternal haplotype for individual i at marker m is consistent with the haplotype of founder k (k = 1, 2, ..., 8); i.e., $G_{imk} = 1$ if the haplotype is compatible with that of the kth founder, and $G_{imk} = 0$ otherwise. Using the data from the 12 females genotyped for each founder line, we can list all of the multilocus haplotypes present for each founder and marker. Generally the founder lines are completely inbred, although there is some residual heterozygosity and more than one haplotype can be present within a line at a given marker. Also, founders are not always unique at every marker, and missing data are unavoidable with a project on this scale. Typically we find that markers are not fully informative and fail to distinguish all eight possible founder chromosomes for one or both synthetic populations. Each test individual/marker combination is coded as follows. Consider that the marker haplotypes for the eight founder lines are (1) ABC, (2) AbC, (3) ABc, (4) aBC, (5) AbC, (6) aBc, (7) ABC, and (8) Abc (in this example founders 1 and 7, and 2 and 5, are indistinguishable). If an experimental fly is aBc it must have the chromosome from founder 6 and is coded as $2^{6-1} = 32$. Alternatively, if the experimental individual is found to be ABC it might equally be derived from founders 1 or 7 and is assigned the value $2^{1-1} + 2^{7-1} = 65$. Finally, a haplotype with missing data, ?B?, is compatible with founders 1, 3, 4, 6, and 7, and is assigned the value $2^{1-1} + 2^{3-1} + 2^{4-1} + 2^{6-1} + 2^{7-1} = 109$. By extension, an experimental individual will be assigned a value of 1–255 for each marker, precisely defining the potential ancestry of the chromosomal segment. Using this coding scheme the raw three-dimensional data matrix, G, can be alternatively represented as a two-dimensional matrix, C, with $C_{ij}$ (the code for the ith individual at the jth position) taking an integer value between 1 and 255. We provide C, along with the corresponding bristle phenotypes, as supplemental material on the GENETICS website (http://www.genetics.org/supplemental).
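The coding scheme amounts to a bitmask over the eight founders, which the following Python sketch reproduces for the worked example above. The function names are illustrative and are not taken from the authors' R code.

```python
# Founder haplotypes from the worked example (founder 1 first).
FOUNDER_HAPLOTYPES = ["ABC", "AbC", "ABc", "aBC", "AbC", "aBc", "ABC", "Abc"]

def compatible(observed, founder):
    """True if every non-missing SNP call ('?' = missing) matches the founder."""
    return all(o == "?" or o == f for o, f in zip(observed, founder))

def founder_code(observed, founders=FOUNDER_HAPLOTYPES):
    """Sum of 2**(k-1) over all founders k compatible with the observed haplotype."""
    return sum(1 << idx for idx, hap in enumerate(founders) if compatible(observed, hap))

def decode(code, n_founders=8):
    """List the founders (numbered from 1) whose bit is set in the code."""
    return [idx + 1 for idx in range(n_founders) if (code >> idx) & 1]

codes = [founder_code(h) for h in ("aBc", "ABC", "?B?")]
```

Applied to aBc, ABC, and ?B? this yields 32, 65, and 109, matching the values in the text, and decode(109) recovers founders 1, 3, 4, 6, and 7. A haplotype matching no founder would receive 0 in this sketch, a case outside the 1–255 range described above.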

Statistical platform:
Data analysis consists of three steps, and the statistical machinery is implemented as a series of functions in the statistical programming language R, expanding on the R/qtl package (http://www.rqtl.org; BROMAN et al. 2003). First, we consider a 1-cM grid along the chromosome and calculate the probability $p_{ijk}$ that individual i carries founder allele k at position j, given the available genotype data, G. This is done using the standard hidden Markov model (HMM) technology of BAUM et al. (1970), first applied in a genetics context by LANDER and GREEN (1987) and adapted to allow for genotyping errors by LINCOLN and LANDER (1992). The observed data, G, are viewed as marker "phenotypes" that are possibly subject to error. The true underlying genotypes are assumed to follow a Markov chain, with each of the eight possible founder alleles being equally likely. For any two positions, the probability of a transition from founder allele $k_1$ to founder allele $k_2$ is $r/7$ if $k_1 \neq k_2$ (recombination in the interval) and $1 - r$ if $k_1 = k_2$ (no recombination). Here, r is analogous to the recombination fraction for the interval, but represents recombination events from multiple generations and is estimated from the data. The observed marker genotype at a locus is assumed to be compatible with the true underlying genotype with probability $1 - \varepsilon$, where $\varepsilon$ is the genotyping error rate. A readable tutorial on implementing the HMM is provided by BROMAN (2006). The information content of the available marker genotype data may be measured by the proportion of missing information, which we take to be $H_j = -\sum_i \sum_k p_{ijk} \log p_{ijk} / (n \log 8)$, where n is the number of individuals.
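A minimal forward-backward sketch of this HMM is given below. It is written in Python rather than R, evaluates probabilities only at the marker positions rather than on a 1-cM grid, and takes the interval recombination parameters and error rate as given instead of estimating them; all names and default values are assumptions for illustration.

```python
def normalise(values):
    total = sum(values)
    return [v / total for v in values]

def founder_posteriors(codes, recomb, eps=0.01, n_founders=8):
    """Posterior founder probabilities at each marker for one chromosome.

    codes  : one integer per marker; bit k-1 is set if founder k is compatible
    recomb : r for each interval between adjacent markers (len(codes) - 1 values)
    eps    : assumed genotyping error rate
    """
    def emit(code, k):
        # Observation is compatible with the true founder with probability 1 - eps.
        return 1.0 - eps if (code >> k) & 1 else eps

    def trans(r, k1, k2):
        # Stay with probability 1 - r, switch to each other founder with r / 7.
        return 1.0 - r if k1 == k2 else r / (n_founders - 1)

    states = range(n_founders)
    n_markers = len(codes)

    # Forward pass, rescaled at every marker to avoid numerical underflow.
    fwd = [normalise([emit(codes[0], k) / n_founders for k in states])]
    for m in range(1, n_markers):
        fwd.append(normalise([
            emit(codes[m], k2)
            * sum(fwd[m - 1][k1] * trans(recomb[m - 1], k1, k2) for k1 in states)
            for k2 in states
        ]))

    # Backward pass, rescaled in the same way.
    bwd = [[1.0] * n_founders for _ in range(n_markers)]
    for m in range(n_markers - 2, -1, -1):
        bwd[m] = normalise([
            sum(trans(recomb[m], k1, k2) * emit(codes[m + 1], k2) * bwd[m + 1][k2]
                for k2 in states)
            for k1 in states
        ])

    return [normalise([f * b for f, b in zip(fwd[m], bwd[m])]) for m in range(n_markers)]

# Flanking markers cannot separate founders 1 and 7 (code 65); the middle one can.
posteriors = founder_posteriors(codes=[65, 1, 65], recomb=[0.05, 0.05])
```

In this toy case the informative middle marker resolves the ambiguity at its neighbours, so founder 1 receives more posterior weight than founder 7 at the flanking markers.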

The second step is to fit a model relating phenotype to genotype. Initially, at the jth position we calculate the average phenotype by founder genotype (with the ith individual's phenotypic contribution to the mean of the kth founder chromosome weighted by the $p_{ij}$'s) and sort these eight means from smallest to largest. We then fit a maximum of seven linear models to the data at each position: model 1 tests the difference between founder material with the smallest mean against all others, model 2 tests the difference between the pair of founders with the two smallest means against all others, and so on. For each model, we create a regressor variable for individual i at position j that is the sum of the elements of $p_{ij}$ associated with these contrasts. The test is accomplished by regressing phenotypes on this regressor variable, with the additional constraint that the sum (over individuals) of the regressor variable must be >50. The resulting LOD score at position j uses a model of all eight founders having the same mean as a null and accepts the above contrast with the maximal likelihood as the alternate. Implicit in this analysis is the idea that there is a single biallelic QTL at some position on the chromosome that is segregating among the eight founder chromosomes and that some optimal partitioning of the founders can be used to identify that QTL. We note that the LOD scores resulting from our approach are strongly correlated with the F-statistic obtained from a multiple regression of phenotype onto the $p_j$'s at each position over the X chromosome. In the simulations the correlation between the LOD scores and F-statistics is generally >99%, and across all of the experimental panels (both sexes, both traits, both synthetic populations, and both the coarse and fine mapping) the correlation is 97.2%.
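The partition search at a single position can be sketched as follows. This Python illustration assumes a normal linear model, for which the LOD score is (n/2) log10(RSS0/RSS1); the function name, the toy data, and the lowered min_count used in the example are assumptions and not part of the authors' R implementation.

```python
import math

def best_partition_lod(p, y, min_count=50.0):
    """Best LOD over the (up to seven) ordered founder partitions at one position.

    p : per-individual lists of founder probabilities at this position
    y : phenotypes, in the same individual order
    Returns (LOD, founders in the low-mean group), founders numbered from 0.
    """
    n, n_founders = len(y), len(p[0])
    # Probability-weighted mean phenotype for each founder.
    means = []
    for k in range(n_founders):
        weight = sum(row[k] for row in p)
        total = sum(row[k] * yi for row, yi in zip(p, y))
        means.append(total / weight if weight > 0 else math.inf)
    order = sorted(range(n_founders), key=lambda k: means[k])

    y_bar = sum(y) / n
    rss_null = sum((yi - y_bar) ** 2 for yi in y)  # all founders share one mean
    best = (0.0, None)
    for split in range(1, n_founders):
        low = order[:split]
        x = [sum(row[k] for k in low) for row in p]  # regressor for this contrast
        if sum(x) <= min_count:
            continue  # too few individuals carry the low group
        x_bar = sum(x) / n
        sxx = sum((xi - x_bar) ** 2 for xi in x)
        if sxx == 0.0:
            continue
        sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        rss_alt = rss_null - sxy * sxy / sxx
        lod = math.inf if rss_alt <= 0 else (n / 2) * math.log10(rss_null / rss_alt)
        if lod > best[0]:
            best = (lod, sorted(low))
    return best

# Toy data: 8 founders, 20 flies each, founder identity known with certainty.
p, y = [], []
for founder in range(8):
    for fly in range(20):
        p.append([1.0 if k == founder else 0.0 for k in range(8)])
        base = 10.0 if founder in (0, 3) else 12.0
        y.append(base + 0.1 * ((fly % 3) - 1))

best_lod, low_group = best_partition_lod(p, y, min_count=30.0)
```

In the toy data two founders carry a low-phenotype allele, and the search recovers that pair as the low group.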

The third step of the data analysis is then to estimate the probability that each of the eight founder chromosomes harbors the high, or Q, QTL allele at position j ($p_{Qk}$'s) for the model implied by the best partitioning of the founders. This is simply the probability of observing each of the eight founder means given the estimated slope and intercept of that model, conditional on each founder harboring the high QTL. After all three steps are complete we obtain LOD scores and phenotypic effects at J positions in the genome and J corresponding $p_Q$'s. Our conservative estimate of the frequency of a QTL located at a local maximum in the LOD profile is the number of elements of $p_Q$ ≥ 0.95 divided by the number of elements of $p_Q$ ≥ 0.95 or $p_Q$ ≤ 0.05 (i.e., we ignore founder lines that do not allow for an accurate estimation of "phase").
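The conservative frequency estimate is a simple ratio, sketched here in Python. The function name and the example probabilities are hypothetical.

```python
def qtl_frequency(p_q, high=0.95, low=0.05):
    """Frequency of the Q allele among founders whose phase is confidently assigned.

    p_q : probability that each founder carries the high (Q) allele
    Returns (frequency, number of founders with assigned phase).
    """
    n_high = sum(1 for prob in p_q if prob >= high)
    n_low = sum(1 for prob in p_q if prob <= low)
    phased = n_high + n_low
    if phased == 0:
        raise ValueError("no founder has a confidently assigned phase")
    return n_high / phased, phased

# Hypothetical probabilities: the fifth founder (0.50) is ambiguous and is ignored.
freq, phased = qtl_frequency([0.99, 0.01, 0.02, 0.97, 0.50, 0.03, 0.00, 0.04])
```

Here two founders are assigned Q and five are assigned the alternative allele, giving an estimated frequency of 2/7 from seven phased founders.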

Variation due to QTL:
Estimates of QTL effect and frequency can be derived from eight-way synthetic populations, and we can use these values to estimate the fraction of segregating variation, $V_a$, due to identified QTL. We can estimate this both in our (effectively haploid) mapping population as $V_a = pq\alpha^2$, and in a natural, outbred diploid population under additivity as $V_a = 2pq\alpha^2$, where p and q are the allele frequencies and $\alpha$ is the effect of the QTL (FALCONER and MACKAY 1996, p. 126). In both our mapping population and a natural population, male QTL on the hemizygous X chromosome have $V_a = pq\alpha^2$. We can place a 95% confidence interval on $V_a$ using Monte Carlo simulation. For $\alpha$ this is accomplished by drawing 10,000 random samples from a normal distribution with mean equal to the observed effect of the QTL and standard deviation equal to the observed standard error on the QTL effect. We estimate the allele frequency, p, differently depending on whether we wish to estimate the variance due to the QTL within our mapping population, or in a natural population. Allele frequency, p, in the mapping population is simply the observed QTL frequency. To estimate allele frequencies of mapped factors in natural populations we draw samples from an allele frequency distribution, whose derivation is conditional on the fact that we observe i copies of a QTL allele among N founder chromosomes. Under neutrality the distribution of allele frequencies is described by Wright–Fisher sampling as

Formula
where $\theta$ is the per-site heterozygosity under neutrality. The probability of drawing i copies of an allele in a sample of size N is described by a binomial distribution,

Formula
where $\binom{N}{i}$ is "N choose i." By Bayes' theorem,

Formula
which after some simplification (and recognizing a Beta integral) reduces to

Formula
where $\Gamma(\cdot)$ is the gamma integral. Two properties of pr(x;i,N) are noteworthy. First, typically $\theta \ll 1$, and therefore $\theta$ has little effect on the shape of pr(x;i,N), and second, for large N, and i not close to one or N, pr(x;i,N) is approximately a binomial distribution, and the "prior" assumption of neutrality has little weight. In a natural population, for any given QTL, we assume D. melanogaster $\theta = 0.006$ (averaged over 98 loci collated in PRESGRAVES 2005) and use "rejection sampling" (PRESS et al. 1996) to draw 10,000 random deviates from pr(x;i,N) to represent allele frequencies. For each pair of simulated $\alpha$/p estimates we calculate $V_a$ as above. The 95% confidence interval on $V_a$ is taken as the 250th and 9750th elements of the sorted vector of $V_a$ estimates. These values can be transformed to a percentage of the total bristle number variation explained by the QTL by dividing by the observed phenotypic variance.
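The Monte Carlo interval can be sketched in Python under explicit assumptions. The sketch treats pr(x;i,N) as a Beta density with parameters i and N − i + θ, which is what a neutral density proportional to $x^{-1}(1-x)^{\theta-1}$ combined with binomial sampling would give; this form is an assumption made here, not a transcription of the displayed equations. It also draws Beta deviates directly with the standard library rather than by rejection sampling, and computes $V_a = pq\alpha^2$ for a hemizygous (male X-linked) QTL. The effect size and standard error in the example are invented.

```python
import random

def va_interval(alpha_hat, alpha_se, i, n_founders, theta=0.006, draws=10_000, seed=1):
    """Approximate 95% Monte Carlo interval on Va = p * q * alpha**2.

    alpha_hat, alpha_se : estimated QTL effect and its standard error
    i, n_founders       : copies of the QTL allele among the founder chromosomes
    theta               : per-site heterozygosity under neutrality
    """
    rng = random.Random(seed)
    va = []
    for _ in range(draws):
        alpha = rng.gauss(alpha_hat, alpha_se)
        # Assumed posterior for the natural-population allele frequency.
        p = rng.betavariate(i, n_founders - i + theta)
        va.append(p * (1.0 - p) * alpha ** 2)
    va.sort()
    lower = va[draws * 25 // 1000 - 1]    # 250th element when draws = 10,000
    upper = va[draws * 975 // 1000 - 1]   # 9750th element when draws = 10,000
    return lower, upper

# Hypothetical QTL: effect 0.5 bristles (SE 0.1), seen in 1 of 8 founders.
lower, upper = va_interval(alpha_hat=0.5, alpha_se=0.1, i=1, n_founders=8)
```

Dividing both bounds by the observed phenotypic variance would express the interval as a fraction of total variation, as described above.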


    RESULTS
 
We develop synthetic recombinant populations, each derived from eight inbred lines of D. melanogaster allowed to recombine at large population size for many generations. We use these populations to map bristle number QTL segregating on the D. melanogaster X chromosome. The mapping strategy we employ relies on the ability to take a recombinant individual, and specify which of the eight founders contributed each segment of the genome. Since we require haplotypic information for the recombinant chromosomes, all experimental individuals are the progeny of crosses between recombinant females and males from the isogenic sequenced strain of D. melanogaster (Figure 2). Thus, haplotypes can be defined unambiguously. The molecular markers we employed were 1-kb PCR fragments within which we genotyped 3–6 SNPs (Table 2). The SNPs were genotyped not only in the experimental individuals, but also in several individuals from each of the 16 founder lines used to initiate the synthetic populations. These procedures allowed us to define, for each experimental recombinant individual, the probability that each marked segment of the chromosome was contributed by each of the eight possible founder lines. Together with the phenotypic scores, this information allows us to map QTL, and obtain joint estimates of QTL effect and frequency.

Simulations:
We carried out simulations to assess our ability to accurately map QTL and jointly estimate their effect and frequency, and used parameters (chromosome size, marker density, marker informativeness) that realistically mimic the experimental data we collected. We sampled 1152 recombinants from an eight-way synthetic population 16 generations after founding to simulate the chromosome scan, and 56 generations after founding to simulate fine mapping. In each case we assume recombination occurs only in females. At the test generation (G16 or G56) recombinant individuals were created by concatenating chromosomal fragments derived from each of the eight founders with equal probability. Fragment lengths were drawn from an exponential distribution with mean 100/(16/2) or 100/(56/2) cM for the coarse and fine mapping, respectively. For the coarse mapping we simulated 12 partially informative markers equally spaced along a 66 cM chromosome and 5% missing data. For the fine-mapping simulation the 12 markers were placed in a more focused 10 cM region. For simplicity we assume the same level of informativeness at each marker, with four segregating haplotypes that group the founder lines as follows: haplotype 1 (three founders), haplotype 2 (two founders), haplotype 3 (two founders), haplotype 4 (single founder). The separation of founders into different haplotypes was random across markers. Finally, we place a biallelic QTL accounting for 5% of the total phenotypic variation at a random position within the mapping region, with the number of founders having the Q allele varied between one and four out of eight. Five-hundred realizations of each simulation were performed.
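As a minimal sketch of the mosaic-chromosome step described above (not the authors' simulation code), the following builds one recombinant chromosome by concatenating founder fragments whose lengths are exponential with mean 100/(G/2) cM. The function names `simulate_mosaic` and `founder_at` and the random seed are hypothetical; markers, missing data, and the QTL itself are left out.

```python
import random

N_FOUNDERS = 8  # eight-way synthetic population


def simulate_mosaic(chrom_len_cm, generations, rng):
    """Return contiguous (start_cM, end_cM, founder) segments covering the chromosome."""
    # Mean fragment length from the text: 100 / (G / 2) cM,
    # the factor 1/2 reflecting recombination in females only.
    mean_len = 100.0 / (generations / 2.0)
    segments = []
    pos = 0.0
    while pos < chrom_len_cm:
        length = rng.expovariate(1.0 / mean_len)  # expovariate takes the rate, 1/mean
        end = min(pos + length, chrom_len_cm)
        founder = rng.randrange(N_FOUNDERS)       # each founder equally likely
        segments.append((pos, end, founder))
        pos = end
    return segments


def founder_at(segments, position_cm):
    """Look up which founder contributed the chromosome at a given position."""
    for start, end, founder in segments:
        if start <= position_cm <= end:
            return founder
    raise ValueError("position outside the simulated chromosome")


rng = random.Random(1)                    # arbitrary seed, for repeatability only
coarse = simulate_mosaic(66.0, 16, rng)   # 66 cM chromosome at G16
fine = simulate_mosaic(10.0, 56, rng)     # 10 cM region at G56
```

Because founders are drawn independently for each fragment, two adjacent fragments can carry the same founder by chance, so observable blocks are on average somewhat longer than the simulated fragments.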

The probability of observing a peak in the LOD score >4 is ≥99%, with an expected maximum LOD score of ~9.4 and ~11.4 for the coarse- and fine-mapping simulations, respectively. For those peaks associated with a LOD score >4, a 2.5-LOD drop from the maximum includes the simulated position of the QTL >99% of the time. On average, a 2.5-LOD drop maps a significant QTL to a 13.2 cM window with a standard deviation of 6.1 cM (coarse mapping) or a 2.3 cM window with a standard deviation of 0.9 cM (fine mapping). When the LOD score is >4, in no case do we incorrectly infer the "phase" of the QTL, and phase is assigned for an average of 7.8/8 founders. The simulated frequency of the QTL does not appear to affect the probability of inferring the allelic state of the QTL, the power to map a QTL, the average maximum LOD score, or the accuracy in localizing QTL. This is perhaps not surprising given that the simulations hold the proportion of variance attributable to the QTL constant at 5% (LONG and LANGLEY 1999). With the same simulations, but no QTL, the false positive rate at a LOD of four is 2 and 1.6% for the coarse- and fine-mapping simulations, respectively. With our current recombinant panel, marker density, and marker informativeness we can map QTL to the eight founder chromosomes in each of the synthetic recombinant populations. Additional simulations suggest that reducing sample size, marker density, or marker informativeness is detrimental.
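The 2.5-LOD drop interval used here can be illustrated with a short sketch. It assumes the likelihood profile is available as LOD scores on a grid of positions, and it reports the contiguous run of grid points around the peak whose LOD is within `drop` of the maximum. The function name `lod_drop_interval` and the example profile are hypothetical.

```python
def lod_drop_interval(positions, lods, drop=2.5, threshold=4.0):
    """Return (left, right, peak_position), or None if the peak LOD is not above threshold."""
    peak_idx = max(range(len(lods)), key=lods.__getitem__)
    peak = lods[peak_idx]
    if peak <= threshold:
        return None  # no QTL declared
    cutoff = peak - drop
    # Walk outwards from the peak while the profile stays at or above the cutoff.
    left = peak_idx
    while left > 0 and lods[left - 1] >= cutoff:
        left -= 1
    right = peak_idx
    while right < len(lods) - 1 and lods[right + 1] >= cutoff:
        right += 1
    return positions[left], positions[right], positions[peak_idx]


# Hypothetical profile on a 1 cM grid.
grid = [0, 1, 2, 3, 4, 5, 6]
profile = [1.0, 3.0, 6.0, 9.0, 7.0, 5.0, 2.0]
interval = lod_drop_interval(grid, profile)
```

A more conservative variant would extend each bound to the first grid point that falls below the cutoff; which convention was used in the analysis is not stated in this section.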

Marker informativeness:
Ideally, every marker (a 1-kb fragment genotyped for several SNPs) would completely distinguish among all eight founders in both the pA and pB synthetic populations. In our experimental data this is typically not the case and markers are not fully informative. In fact, it is frequently not possible to distinguish among the eight founders within either population based on the DNA sequence of the entire 1-kb marker amplicon. For those 11 markers for which we had access to sequence from all founders, the average number of distinguishable haplotypes is 6.5/8. This is likely an overestimate of the number of distinguishable founder haplotypes for any arbitrary 1-kb region of the Drosophila genome, as a number of potential markers were sequenced and discarded due to a lack of polymorphism (data not shown). As with any "haplotype tagging" strategy, the SNP genotyping approach we employ further reduces the number of distinguishable haplotypes, both because we do not genotype all available SNPs, and because a proportion of the developed genotyping assays failed (Formula, or 5.1%). Over the 24 independent markers examined in this study, we successfully genotyped 4.5 SNPs per marker on average, and the mean number of unique haplotypes identified per population per marker is 4.5 (pA = 4.4, pB = 4.6). Markers used solely for fine mapping were slightly more informative (5.0 unique haplotypes per marker on average) than those used solely for coarse mapping (4.1 haplotypes). The increase in informativeness for the fine-mapping markers is due to those used to map the region in the middle of the X chromosome (X-middle markers average 5.4 haplotypes, while X-tip markers average 4.1). 
Contrary to our intuition the number of distinguishable haplotypes in the founders was not strongly a function of how SNPs were ascertained: Markers developed by sequencing the actual founders, where SNPs were chosen to maximize within-marker haplotype diversity, yielded 4.7 haplotypes per population on average. Markers harvested from published sequencing surveys, where SNPs were simply chosen to have high frequency and little LD with other SNPs in the same fragment, showed similar haplotype diversity in our founders (4.3 haplotypes per population).
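Marker informativeness, in the sense used above, is the number of founder groups a marker can tell apart. A minimal sketch of that count, using invented SNP genotypes (the founder labels follow the B1–B8 naming, but the alleles are hypothetical and failed assays are not modeled):

```python
from collections import defaultdict


def group_founders_by_haplotype(founder_genotypes):
    """Map each multi-SNP haplotype to the list of founders carrying it."""
    groups = defaultdict(list)
    for founder, snps in founder_genotypes.items():
        groups[tuple(snps)].append(founder)
    return dict(groups)


# Hypothetical genotypes at four SNPs within one 1-kb marker.
marker = {
    "B1": "ACGT", "B2": "ACGT", "B3": "ACGT",
    "B4": "ATGT", "B5": "ATGT",
    "B6": "GCGA", "B7": "GCGA",
    "B8": "GTCA",
}
groups = group_founders_by_haplotype(marker)
n_distinguishable = len(groups)  # number of unique haplotypes at this marker
```

In this invented example the marker separates the eight founders into groups of three, two, two, and one, the same partially informative configuration assumed in the simulations.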

The inbred founder lines used to derive the synthetic populations are not isogenic, and 28/384 (7.3%) independent marker/founder combinations show heterozygosity. The heterozygosity is not localized to any particular marker as 17/24 markers show at least one heterozygous line. Half of the 16 founders show no evidence for heterozygosity, while 3 of the lines (A1, B3, and B7) are heterozygous at multiple amplicons. This trio of lines collectively contributes to 23/28 (82.1%) of the heterozygous marker/founder combinations, implying they are less well inbred than the remaining 13 lines. It is of interest that all 16 founder lines were maintained in stock centers at small effective population sizes for >40 years (without being contaminated by P-element-harboring flies). The observation that these lines are not completely homozygous suggests a relatively high rate of tightly linked deleterious alleles in trans.

The HMM employs the genotype data to infer (for every individual and every position) the probability that the chromosomal segment is derived from each of the eight founders. Founder assignment becomes more accurate as the information level in the genotype data increases. We can visualize spatial variation in the information level by color coding (by founder of origin) those chromosomal segments inferred to come from a single founder with a probability >75%. Figure 3 depicts this information for 40 typical males from the pBr1 population. Colored blocks represent highly likely founders, and the information content at any position can be loosely assessed by the amount of white space (i.e., where the probability was <0.75 for all eight founders). For the coarse-mapping scan, information is generally high at the markers, with the obvious exception of marker or.84 (third marker from the right), where only two haplotypes are distinguishable among the eight pB founders. Overall, there appears to be greater information in the fine-mapping population. One exception is the region around marker no.01 (fourth from the left) at the tip of X chromosome. This is likely due to its low marker informativeness (just three haplotypes are distinguishable at no.01 in pB), and because it is relatively distant from either of the flanking markers. We note that the relative size of nonrecombinant fragments is consistent with their expectation given the number of generations the populations experienced recombination/drift. Finally, with reduced information and/or a poorly performing HMM we may expect the most likely founder to "flip-flop" frequently along the chromosome, and this does not appear generally the case.
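To make the role of the HMM concrete, the following is a simplified forward-backward sketch over eight founder states, evaluated at marker positions only. It is not the authors' implementation. The transition model (probability of at least one breakpoint r = 1 − exp(−d·(G/2)/100), with the new founder drawn uniformly), the genotyping error rate `err`, and all names and example data are assumptions made for illustration.

```python
import math

N = 8  # founder states


def transition(d_cm, generations):
    """Founder-to-founder transition matrix across d_cm (assumed model)."""
    r = 1.0 - math.exp(-d_cm * (generations / 2.0) / 100.0)
    return [[(1.0 - r) * (i == j) + r / N for j in range(N)] for i in range(N)]


def emission(obs, hap_of_founder, err=0.01):
    """P(observed haplotype | founder); a missing genotype (None) is uninformative."""
    if obs is None:
        return [1.0] * N
    return [1.0 - err if hap_of_founder[f] == obs else err for f in range(N)]


def normalise(v):
    s = sum(v)
    return [x / s for x in v]


def founder_posteriors(positions, hap_maps, observations, generations):
    """Posterior probability of each founder at each marker (forward-backward)."""
    n = len(positions)
    emis = [emission(observations[m], hap_maps[m]) for m in range(n)]
    trans = [transition(positions[m + 1] - positions[m], generations)
             for m in range(n - 1)]
    # Forward pass, rescaled at each marker to avoid underflow.
    fwd = [normalise([e / N for e in emis[0]])]
    for m in range(1, n):
        prev, T = fwd[-1], trans[m - 1]
        fwd.append(normalise([emis[m][j] * sum(prev[i] * T[i][j] for i in range(N))
                              for j in range(N)]))
    # Backward pass.
    bwd = [[1.0] * N for _ in range(n)]
    for m in range(n - 2, -1, -1):
        T, nxt = trans[m], bwd[m + 1]
        bwd[m] = normalise([sum(T[i][j] * emis[m + 1][j] * nxt[j] for j in range(N))
                            for i in range(N)])
    return [normalise([f * b for f, b in zip(fwd[m], bwd[m])]) for m in range(n)]


def confident_founder(posterior, threshold=0.75):
    """Founder index if one founder exceeds the threshold, else None (white space)."""
    best = max(range(N), key=posterior.__getitem__)
    return best if posterior[best] > threshold else None


# Hypothetical example: three markers, each grouping the founders differently.
positions = [0.0, 2.0, 4.0]                         # cM, nonexpanded map
hap_maps = [list("aaabbccd"), list("abbbccdd"), list("abbccdda")]
observations = ["a", "a", "a"]                      # founder 0 is consistent throughout
post = founder_posteriors(positions, hap_maps, observations, generations=16)
calls = [confident_founder(p) for p in post]
```

The `confident_founder` helper mirrors the 0.75 cutoff used for coloring segments: positions where no founder exceeds the threshold correspond to white space.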


FIGURE 3.— Visual representation of genotyping information. Each row of each plot represents a single experimental male, for which the X chromosome is derived from the pBr1 synthetic recombinant population. The top plot shows 40 flies from the coarse-mapping sample for the entire X chromosome, and the bottom two plots show 40 flies from the two small fine-mapped regions of the X chromosome. For each male, every 1 cM (on the expanded genetic map) across the mapped region we examine the probability that the segment of chromosome is derived from each of the eight possible founder lines. If the probability for any one founder is >0.75, the position is colored according to the founder (colors are as in Figures 1 and 2); otherwise the position is white. Marker positions are shown beneath each plot as solid triangles. Markers used for both the coarse mapping and the X-tip fine mapping are indicated with plus symbols (+), while markers used for both coarse mapping and the X-middle fine mapping are indicated with cross symbols (x).

 
We can examine marker informativeness more quantitatively using the measure H to estimate the proportion of missing genotypic information (H = 0, complete information; H = 1, no information). Figure 4, E and F, and Figure 5, E and F, present the amount of missing information across the three mapped regions (the entire X chromosome for the coarse-mapping scan, and two smaller regions of the X for the fine-mapping scans). As expected, the amount of missing information is lower at the markers themselves than between them. The value of H, averaged over individuals and sites for both sexes and both synthetic populations, is 0.374 (coarse), 0.346 (X-tip fine), and 0.187 (X-middle fine). The X-middle fine-mapping data have greater information content, both because the markers themselves are more informative (see above) and because this region has the highest marker density relative to the recombination distance. For the X-middle fine-mapping region markers are placed every 21.2 cM on average (on the expanded scale), while for the X-tip fine region markers are 25.2 cM apart, and for the entire X in the coarse-mapping experiment markers are 39.4 cM apart.


FIGURE 4.— Coarse-mapping bristle number across the X chromosome. pAr1, experimental flies have recombinant chromosomes derived from synthetic population pAr1. pBr1+2, experimental flies with recombinant chromosomes derived from synthetic populations pBr1 or pBr2 were pooled prior to analysis. (A) pAr1 female LOD; (B) pBr1+2 female LOD; (C) pAr1 male LOD; (D) pBr1+2 male LOD; (E) genotype information (pAr1 male); (F) genotype information (pBr1+2 male). (A–D) Likelihood profiles. Each curve shows the likelihood that a given region of the chromosome harbors a QTL for bristle number (solid curves, ABN; dashed curves, SBN). Marker positions are shown as solid triangles along the x-axis. LOD scores are plotted against position (in centimorgans) on the expanded genetic map. The expansion is due to the large number of meiotic recombination events the synthetic population was subjected to prior to mapping. Note that the genetic map positions are not identical across the four plots. Vertical shaded bars represent regions used for fine mapping (Figure 5). (E and F) Missing genotypic information. The proportion of missing genotypic information, H, is plotted against the expanded genetic map position. H = 0, no missing information; H = 1, no information; described fully in the MATERIALS AND METHODS. For population pAr1 (E) and the pooled pBr1+2 population (F), missing information is provided only for the experimental males. Missing data from females are very similar.

 

FIGURE 5.— Fine-mapping bristle number in two small regions of the X chromosome. pAr1 (pBr1) indicates the synthetic population from which the recombinant chromosomes of the experimental flies are derived. X-tip and X-middle refer to the regions of the X chromosome showing evidence for a QTL in the coarse-mapping study and represent those regions of the chromosome shaded in Figure 4. (A) X-tip, pAr1 male LOD; (B) X-middle, pAr1 male LOD; (C) X-tip, pBr1 male LOD; (D) X-middle, pBr1 male LOD; (E) X-tip, genotype information (pBr1); (F) X-middle, genotype information (pBr1). (A–D) Likelihood profiles. Each curve shows the likelihood that a given region of the chromosome harbors a QTL for bristle number (solid curves, ABN; dashed curves, SBN). Marker positions are shown as triangles along the x-axis (solid triangles, markers used in coarse mapping [Figure 4]; open triangles, markers used only for fine mapping). LOD scores are plotted against position (in centimorgans) on an expanded genetic map. Note that the genetic map positions are not identical across the four plots. Bars at the top of the plots represent 2.5-LOD drop intervals across five fine-mapped QTL (solid bar, QTL for ABN; hatched bar, QTL for SBN). (E and F) Missing genotypic information. The proportion of missing genotypic information, H, is plotted against the expanded genetic map position. H = 0, no missing information; H = 1, no information—described fully in the MATERIALS AND METHODS. For the X-tip region (E) and the X-middle region (F), missing information is provided only for flies derived from population pBr1. Missing data from pAr1 flies are very similar.

 
Phenotypes of synthetic populations:
We scored two bristle phenotypes per experimental fly—abdominal bristle number (ABN) and sternopleural bristle number (SBN). Within each population (pAr1, pBr1, and pBr2), mapping generation (coarse and fine mapping), sex, and phenotype the bristle count distributions are approximately normal, similar to those measured in large outbred cohorts of flies sampled directly from nature (GENISSEL et al. 2004; MACDONALD and LONG 2004; MACDONALD et al. 2005a). Table 3 presents phenotype means and standard deviations for all sets of flies examined in this study. We note that panels pBr1 and pBr2 are very similar for both sexes and bristle counts, and that flies from pAr1 have more abdominal and sternopleural bristles than flies from either pB population. On average, pAr1 flies have 0.5–1.1 more bristles than pB flies (Table 3). A difference in body size between the pA and pB panels may contribute to this pattern. The within-population phenotype means, and more importantly variances, do not change over time, and values are consistent between the coarse- and fine-mapping studies. Finally, we note that the within-panel/sex/trait phenotypic variances we observe are lower than variances observed for the same traits in two wild-caught D. melanogaster cohorts (GENISSEL et al. 2004; MACDONALD and LONG 2004; MACDONALD et al. 2005a). This is presumably because each of the phenotyped flies in this study harbors a common set of isogenic, paternally derived chromosomes, and flies were reared under a controlled laboratory environment.



 
TABLE 3 Bristle number variation in synthetic populations

 
Position and effect of X-linked bristle number QTL:
Coarse scan of the X chromosome:
Initially we conducted a coarse scan of the entire X chromosome for QTL for two bristle traits for both sexes. For the coarse mapping we collected ~500 experimental flies of each sex from the populations pAr1, pBr1, and pBr2 (population pAr2 became contaminated during maintenance and was destroyed). Experimental individuals from the replicate populations pBr1 and pBr2 were pooled, and we refer to this pooled sample as pBr1+2. Comparison of the data from pBr1 and pBr2 alone with that from the pooled sample does not reveal any obvious inconsistencies. Since the sample size of population pAr1 is around half the size used in our simulations, we likely have reduced power to detect QTL in the pAr1 coarse-mapping sample. We only consider QTL to be present when the peak in the likelihood profile is >4-LOD.

The likelihood profiles for the coarse-mapping samples shown in Figure 4 (A–D) reveal the existence of QTL for bristle number at the very tip of the X chromosome in both females and males. We find no evidence for bristle number QTL anywhere else on the X for females, but do identify a male-specific QTL for ABN in the middle of the X chromosome (Figure 4D). Details of all QTL identified in the coarse-mapping study are presented in Table 4. For both populations, pAr1 and pBr1+2, and for both sexes QTL for SBN were detected at the tip of the X chromosome with LOD scores between 4.9 and 7.7. The effects of the pBr1+2 X-tip SBN QTL are lower than those detected in pAr1 (0.70 and 0.73 in pBr1+2 vs. 1.36 and 1.42 in pAr1), which may be due to the smaller pAr1 sample leading to less robust estimates of the genetic effect. A single X-tip ABN QTL was identified in females of the pBr1+2 sample (Figure 4B). Figure 4D does show a peak at the tip of the X for ABN in pBr1+2 males, but the 2.5-LOD drop for this peak overlaps the larger ABN QTL in the middle of the chromosome, so we do not consider it an independent QTL. All five X-tip QTL map somewhere between the distal end of the X chromosome and band 5B6 (Table 4). Our identification of five QTL mapping to the very tip of the X chromosome replicates the well-documented effect of this region on bristle number variation in D. melanogaster (LONG et al. 1995; GURGANUS et al. 1998, 1999; NUZHDIN et al. 1999; DILDA and MACKAY 2002).



 
TABLE 4 Coarse-mapped X-linked bristle number QTL

 
The largest bristle number QTL identified in the coarse-mapping scan is for male ABN in pBr1+2 (Table 4, Figure 4D). This QTL has a LOD peak of 8.3, an effect of 0.91 bristles, and was resolved to a 9 cM window (on the nonexpanded D. melanogaster recombination map) in the middle of the X chromosome between cytological bands 6F1 and 8E1. The peak is also evident in the separate analyses of populations pBr1 and pBr2 (data not shown). There is no evidence that a corresponding QTL exists in pAr1 males, despite largely equivalent genotypic information across the two panels at the QTL position (Figure 4, compare E and F), suggesting that this QTL segregates only in the pB populations. One caveat is that the sample size for the pAr1 population was low.

The coarse-mapped QTL are resolved to intervals averaging 8.3 cM (~4 Mb). We elected to fine map two interesting QTL regions (Figure 4, C and D) in males only from the populations pAr1 and pBr1. Figure 5 (A–D) presents the likelihood profiles for the two fine mapped regions (X-tip and X-middle) for the populations pAr1 and pBr1, and Table 5 gives details of the fine-mapped QTL. Overall, there is remarkable concordance between the coarse- and fine-mapped QTL (compare Figure 4, C–D, with Figure 5, A–D).



 
TABLE 5 Fine-mapped X-linked male bristle number QTL

 
X-tip fine mapping:
The QTL for male SBN coarse mapped to the tip of the X chromosome in population pAr1 replicates in the fine-mapping experiment (QTL1 in Figure 5A), and the interval harboring the QTL was reduced from 13.5 to 1.7 cM (on the nonexpanded Drosophila melanogaster recombination map). The estimated effect of the QTL changes from 1.42 bristles in the coarse mapping to 1.06 bristles in the fine mapping. The latter is likely a more robust estimate of the effect, as the sample size of the fine-mapping panel was higher, and the extra generations of recombination have stretched the genetic map, separating the QTL from any linked factors. The coarse-mapped pBr1+2 male SBN X-tip QTL splits into two on fine mapping in population pBr1 (QTL2 and QTL3 in Figure 5C), each having an effect similar to that estimated for the initial coarse-mapped QTL (coarse = 0.73, fine QTL2 = 0.51, and fine QTL3 = 0.68). These two QTL barely achieve the 4-LOD threshold required (QTL2 = 4.2 LOD, QTL3 = 5.1 LOD), but since the QTL intervals do not overlap, it is probable that two separate SBN QTL do exist at the tip of the X in population pBr1. It is not clear whether either pBr1 QTL2 or QTL3 corresponds to pAr1 QTL1. We note that there is no evidence from the fine mapping for an X-tip male ABN QTL in either the pA or pB populations (Figure 5, A and C). This supports our earlier assertion that in the coarse mapping of pBr1+2 the peak at the tip of the chromosome for male ABN is spurious.

The two best bristle number candidate genes in the fine-mapped X-tip region are the achaete-scute complex, ASC, at cytological position 1A6, and Notch at 3C7-3C9. In Figure 5 (A and C) ASC is located distal to the leftmost marker (or.05), while the fourth marker from the left (no.01) is at Notch. Thus, our data are compatible with the notion that variation at ASC may be responsible for QTL1 and QTL2. It does not seem likely that variation at Notch contributes to QTL3, as the LOD at Notch is 3.4 less than that at QTL3 in population pBr1. However, we cannot rule out the possibility that Notch contributes to segregating variation for SBN in males as the genotype information around Notch in the X-tip region is somewhat low (Figures 3 and 5E), and the LOD score at Notch for SBN in pAr1 males is high (LOD = 6.0, Figure 5A). The broad QTL1 peak in pAr1 males may actually represent two QTL that we have insufficient power to resolve. Since we did not fine map the QTL identified at the tip of the X chromosome in females, we are unable to suggest whether ASC or Notch harbor factors affecting female bristle number.

X-middle fine mapping:
The coarse mapping revealed a strong QTL for male ABN in the middle of the X chromosome in the pBr1+2 population and a suggestive peak (LOD < 4) for male SBN in a similar position (Figure 4D). The fine-mapping experiment almost perfectly replicated these observations (Figure 5, B and D), aside from a slight shifting of the QTL maxima relative to the flanking markers (compare Figures 4D and 5D). The pair of X-middle fine-mapped QTL were resolved to intervals of 0.9 cM (QTL4) and 1.7 cM (QTL5), down from 9 and 41.1 cM in the coarse mapping. The effect of QTL4 is maintained between the coarse-mapping (effect = 0.91) and fine mapping (effect = 0.97), while the effect of QTL5 increases (coarse-mapping effect = 0.58, fine-mapping effect = 1.31). We looked for evidence that either of these X-middle QTL had been identified in other mapping studies of Drosophila bristle number variation. We found no evidence of similarly positioned bristle number QTL in LONG et al. (1995), GURGANUS et al. (1998), or DILDA and MACKAY (2002), and evidence only of weak QTL for female ABN in GURGANUS et al. (1999; QTL between 5D and 8E) and NUZHDIN et al. (1999; QTL between 7D and 8E), suggesting we have identified novel QTL for both male ABN and male SBN. The intervals within which QTL4 and QTL5 reside are genetically short (0.9 and 1.7 cM, respectively), physically short (204-kb and 408-kb, respectively), and harbor few genes (QTL4 = 13 genes, QTL5 = 26 genes, of which 13 overlap with those under QTL4). None of the 26 genes would be considered a priori classic bristle number candidate genes: genes in both QTL4 and QTL5 intervals (oc, CG12772, CG11284, Ppt1, Ogg1, CG11294, Hexo2, CG2004, CG1785, l(1)G0020, CG1789, Lim1, and CG32710) and genes in only QTL5 interval (CG12075, Moe, CG1885, CG10648, e(r), CG15352, CG12660, CG3898, CG12661, rdgA, CG10962, CG12662, and mir-31b). 
Nevertheless, for two of the genes under both QTL4 and QTL5, oc and Lim1, there is reported evidence of bristle defects in mutant flies: oc (ocelliless) mutants affect interocellar, ocellar, and postvertical bristles (ROYET and FINKELSTEIN 1995), and Lim1 (Lim kinase) mutants affect sternopleural and vibrissae bristles (PUEYO et al. 2000). These two genes may be the best candidates underlying the two novel QTL we identify in the middle of the X chromosome.

Frequency of X-linked bristle number QTL:
Since the synthetic recombinant populations we employ are derived from multiple inbred lines, it is possible to estimate the phenotypic mean for each founder at every position along the chromosome. In turn, under the assumption that an identified QTL is biallelic, founders can be probabilistically assigned to "high" or "low" QTL allele classes. This permits an estimate of the frequency of the QTL. Figure 6 shows, for all five fine-mapped male bristle number QTL and the corresponding coarse-mapped regions, the estimated founder phenotype means, and the probable QTL allele present in each founder. Founder means appear to be estimated well, and as expected estimates are more robust when the sample size is larger: The error bars are wider for the coarse-mapping pAr1 data (Figure 6A, left) than for the other data sets. Also, those sporadic cases of large standard errors, e.g., line B6 for fine mapping of QTL2 (Figure 6B), are due to a comparatively small number of experimental individuals consistent with harboring this founder chromosome at the QTL.


FIGURE 6.— Estimated phenotypic means for each of the founder chromosomes at QTL. Each plot represents a single male bristle number QTL (see Figure 5 and Tables 4 and 5 for details) and shows the estimated phenotypic mean (standard error) at the QTL peak for each of the eight lines used to found the particular synthetic population. The line numbers, A1–A8 and B1–B8, refer to the lines described in Table 1. For comparison the means estimated at the QTL peak are presented for both the coarse- (open bars) and fine-mapping (shaded bars) panels. Bars are presented only if the estimated number of experimental individuals consistent with having a given founder chromosome is >10; otherwise a cross is plotted. Below the bars we give the most probable QTL allele harbored by the founder (L, low allele; H, high allele), under the assumption that the QTL is biallelic. If the founder cannot be confidently (probability > 0.95) assigned an allele, a ? is applied. (A) QTL1 for pA male SBN mapped to the X-tip region in population pAr1, (B) QTL2 for pB male SBN mapped to the X-tip region in pBr1+2 (coarse mapping) and pBr1 (fine mapping), and (C) QTL3 for pB male SBN mapped to the X-tip region in pBr1+2 (coarse mapping) and pBr1 (fine mapping). The coarse-mapping information for QTL2 and QTL3 is identical, as these are the two fine-mapped QTL we detected under a single coarse-mapped peak. (D) QTL4 for pB male ABN mapped to the X-middle region in pBr1+2 (coarse mapping) and pBr1 (fine mapping), and (E) QTL5 for pB male SBN mapped to the X-middle region in pBr1+2 (coarse mapping) and pBr1 (fine mapping).

 
There are marked similarities in the overall pattern of estimated founder phenotype means in the coarse- and fine-mapping experiments. For instance, for QTL1 (Figure 6A) the pair of lines with the highest phenotypic means (A2 and A8) are the same in both the coarse and fine mapping. The similarity in founder means for this QTL is particularly encouraging as the sample size, and hence the power, was low in the coarse mapping of population pAr1. The pair of fine-mapped QTL2 and QTL3 were resolved from a single coarse-mapped peak. For QTL2 the overall pattern of line means is concordant between coarse and fine mapping (Figure 6B), but this is not the case for QTL3 (Figure 6C). This observation likely reflects the separation of the two QTL – the founder means for these QTL need not necessarily recapitulate those of the coarse-mapped region. The three identified X-tip QTL may be somewhat common. From our analysis we estimate that the high QTL allele is present in Formula, Formula, and Formula founders for QTL1, QTL2, and QTL3, respectively (founders not assigned to either allelic class are ignored). An important caveat is that this analysis rests on the assumption that the QTL are biallelic, which is not necessarily supported for the X-tip QTL. For instance, while founders A2 and A8 are considered the "high" lines for QTL1 (Figure 6A), there is a difference of nearly 0.8 bristles in the estimated phenotype mean of this pair of lines, and the standard errors do not overlap. Also, the error bar around "low" line A5 does not overlap those of "low" lines A3 and A4. Similar inconsistencies within assigned biallelic classes for the other two fine-mapped X-tip QTL (QTL2 and QTL3) also do not give any strong indication that only two QTL alleles segregate (Figure 6, B and C).

Of all the QTL, QTL4 for male ABN was fine mapped to the smallest region (0.9 cM), and for this QTL the pattern of founder means alters between the coarse and fine mapping (Figure 6D). In the coarse mapping, while it is difficult to visualize two clear allelic categories, under the assumption that the QTL is indeed biallelic, Formula founders have the low allele. In contrast, the fine mapping quite clearly reveals two classes, with the low allele in Formula founders. One explanation for the change is that the information content of the genotype data is much greater in the X-middle fine mapping (H = 0.187) than in the coarse scan of the entire X chromosome (H = 0.374). The increased information may have led to greater accuracy in estimating the ancestry of recombinant chromosomal segments, and more accurate estimates of the founder means in the fine mapping. Alternatively, there might be additional bristle number factors close to the mapped QTL that interfere with founder mean estimation in the coarse-mapping scan. Expansion of the genetic map in the fine-mapping panel would reduce the effect of any such interference. We note that if marker informativeness is the sole issue, the pattern of founder means for QTL4 observed in the fine-mapping experiment should be seen in the coarse-mapping panel simply by genotyping additional markers.

Both QTL4 and QTL5 appear rare in population pB, with the minor allele present in Formula founders (Figure 6, D and E), and the eighth founder (line B3) being ambiguous. Since there is no evidence for equivalent QTL in the pA population (Figures 4C and 5B), if we make the reasonable assumption that the pA lines are fixed for the major QTL allele, QTL4 and QTL5 may each have a frequency of Formula (or Formula) in our lines. We use Monte Carlo simulation to estimate the fraction of segregating bristle number variation due to these male-specific QTL in our mapping (synthetic recombinant × sequenced strain trans-heterozygote) population (see MATERIALS AND METHODS). QTL4 (effect = 0.97 abdominal bristles, SE = 0.160) was detected in males of panel pBr1, a population that shows an ABN variance in the fine-mapping panel of 3.03 (Table 3). If we assume the rare allele is present in Formula lines, the average variance explained by QTL4 is 1.9% (95% confidence interval, 0.84–3.22%). Similarly, QTL5 explains 4.1% (1.26–8.11%) of male SBN variation. Notwithstanding Beavis effects (BEAVIS 1994), our data imply these QTL contribute 2–4% to the total variation for bristle number in our mapping panel.
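The Monte Carlo calculation for QTL4 can be sketched as follows. Three choices here are assumptions rather than details given in this section: the effect α is drawn from a normal distribution with the reported mean and standard error, the allele frequency is held fixed at 1/16 (one of the 16 founders, as stated in the abstract), and the variance contributed by an X-linked locus in hemizygous males is taken as Va = p(1 − p)α². The function name is hypothetical.

```python
import random


def percent_variance_explained(effect, se, freq, pheno_var, n_draws=10000, seed=1):
    """Monte Carlo mean and 95% interval for the % of phenotypic variance due to a QTL."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        alpha = rng.gauss(effect, se)            # sampled QTL effect (assumed normal)
        va = freq * (1.0 - freq) * alpha ** 2    # assumed form for hemizygous males
        draws.append(100.0 * va / pheno_var)
    draws.sort()
    mean = sum(draws) / n_draws
    lower = draws[int(0.025 * n_draws)]          # 2.5th percentile
    upper = draws[int(0.975 * n_draws) - 1]      # 97.5th percentile
    return mean, lower, upper


# QTL4: effect 0.97 bristles, SE 0.160, ABN variance 3.03 (values from the text).
mean_pct, lower_pct, upper_pct = percent_variance_explained(0.97, 0.160, 1.0 / 16.0, 3.03)
```

Under these assumptions the sketch returns roughly 1.9% with an interval of about 0.8 to 3.2%, close to the values reported for QTL4. The calculation for a natural population differs in that the allele frequency is also sampled.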

Given that QTL4 and QTL5 reside in very small, and overlapping intervals one might conclude that we have mapped a single pleiotropic QTL contributing to variation in both male ABN and male SBN. Figure 6 (D and E) shows this is not the case. The rare low allele for QTL4 is present in line B5, while the rare high allele for QTL5 is present in line B7 (and perhaps line B3): a single QTL affecting both traits would show the same pattern of alleles across the founders. Thus, QTL4 and QTL5 represent independent mutations that are very tightly linked, perhaps even residing in the same gene. Our ability to distinguish tight linkage from pleiotropy is a consequence of mapping QTL in an eight-way cross and estimating founder mean phenotypes at each QTL.


    DISCUSSION
 
Capturing experimental reality by simulation:
We performed simulations to examine our ability to map and characterize QTL in eight-way recombinant populations. Our intent was not to fully explore the parameter space, but rather to inform our experimental work to ensure we carried out a study of sufficient power. Results suggest that we have considerable power to detect QTL contributing 5% to variation in phenotype with the sample sizes and scale of genotyping we eventually employed. Furthermore, the false positive rate is very low with the critical LOD threshold applied. As with any simulation approach, we make various simplifying assumptions. Of potential concern is that we simulated just one QTL on the chromosome. In reality, there could be interference from linked QTL that may affect both our ability to detect QTL and to estimate founder phenotype means. Many QTL mapping algorithms have agreeable properties in the absence of "traffic" from nearby QTL, but are prone to errors in inference with linked QTL (WRIGHT and KONG 1997; CORNFORTH and LONG 2003). An important feature of the recombinant populations we employ is that the negative effects of "traffic" on mapping inference are evaded by genetic map expansion rather than by some form of statistical correction. Fine-mapping QTL should eliminate any problems associated with other nearby factors, implying that our method can ultimately cope with problems arising from linked QTL. Nevertheless, one could envisage scenarios under which linked factors might prevent initial QTL detection in a coarse scan of the genome.

In the simulations we also assume that the recombinant population is not subject to drift or selection, and that the expected frequency of genetic material derived from each founder at every point along the chromosome is 1/8. Deviation from this neutral marker/infinite population size assumption may reduce our ability to detect QTL and accurately assign founders to allelic classes. In an extreme case the population could fix for one of the founder haplotypes, rendering QTL undetectable at that position. The likelihood of this occurrence increases as the population is subject to more genetic drift, for example by passing the population through a bottleneck or by maintaining the population for many generations. We deliberately maintained each of our synthetic populations as a large cohort to minimize the effect of drift. Nevertheless, it is clear from fine mapping at the tip of the X chromosome that the genetic material from certain founders can be largely eradicated from the population (Figure 6). It is unclear if the observed loss of some founder alleles is more consistent with random genetic drift or perhaps purifying selection against a disadvantageous chromosomal segment in our populations. The degree to which founder drop-out is a genomewide problem will require further genotyping of the fine-mapping panels across the five major chromosome arms of Drosophila. Further simulation of mapping performance using eight-way recombinant populations subject to many generations of maintenance will build on the work of VALDAR et al. (2006a), and incorporate drift, selection, and more complex, realistic genetic architectures.
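
The effect of drift on founder representation can be illustrated with a minimal Wright-Fisher sketch. It is not one of the simulations reported here: it tracks a single neutral chromosomal position, ignores recombination, selection, and sex linkage, and uses hypothetical population sizes and generation counts chosen only to contrast a small population with a large one.

```python
import random

def simulate_founder_drift(n_chrom, n_gen, n_founders=8, seed=0):
    """Wright-Fisher resampling of founder labels at a single neutral position.

    Returns the frequency of each founder haplotype after n_gen generations,
    starting from equal representation (1 / n_founders each).
    """
    if n_chrom % n_founders:
        raise ValueError("n_chrom must be a multiple of n_founders")
    rng = random.Random(seed)
    founders = list(range(n_founders))
    counts = [n_chrom // n_founders] * n_founders
    for _ in range(n_gen):
        # Each chromosome in the next generation copies a random parent chromosome.
        offspring = rng.choices(founders, weights=counts, k=n_chrom)
        counts = [offspring.count(f) for f in founders]
    return [c / n_chrom for c in counts]

def n_lost(freqs):
    """Number of founder haplotypes absent from the population."""
    return sum(1 for f in freqs if f == 0.0)

# Hypothetical settings: a small population kept for a long time versus
# a large population kept for fewer generations.
small = simulate_founder_drift(n_chrom=16, n_gen=500)
large = simulate_founder_drift(n_chrom=1600, n_gen=50)
print("small population, founders lost:", n_lost(small))
print("large population, founders lost:", n_lost(large))
```

In the small, long-maintained population a single founder haplotype is expected to reach fixation, whereas the large population should retain most or all founders, which is the rationale for maintaining the synthetic populations as large cohorts.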

Information content of markers:
Each marker is composed of a set of genotyped SNPs within a 1-kb PCR amplicon and has the potential to completely distinguish among a set of eight chromosomes. In practice, we find that developed markers are not completely informative. This is the combined result of marker sequence identity among two or more founders, genotyping only a subset of the available SNPs, genotyping assay failure, and residual segregating variation within founders. Although the markers are not fully informative, we have power to detect QTL because the HMM employed incorporates data from linked markers (BROMAN 2005). Unlinked markers provide information only on the specific marked segment of the chromosome, whereas a set of linked markers provides information across the linkage group. The level of information increases with marker density (relative to the average distance between recombination breakpoints) even if the markers remain only partially informative. By extension, instead of attempting to develop highly informative markers, it is possible to apply the HMM to a relatively dense genomewide set of genotyped biallelic SNPs. Future studies of eight-way recombinant Drosophila populations could take advantage of this possibility, but such an approach awaits the development of a genomewide bank of intermediate-frequency SNPs for D. melanogaster, as well as some means of inexpensively genotyping those SNPs.
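
The way linked, partially informative markers combine to identify founder ancestry can be sketched with a small forward-backward calculation. This is not the HMM of BROMAN (2005): the transition model (ancestry redrawn uniformly from all founders with a fixed probability between adjacent markers), the genotyping error rate, the four-founder example, and all names are simplifying assumptions made for illustration.

```python
def founder_posteriors(founder_alleles, observed, switch_prob, error=0.01):
    """Posterior probability of each founder at each marker (forward-backward).

    founder_alleles[m][f] is the allele founder f carries at marker m;
    observed[m] is the allele seen on the recombinant chromosome (None = missing).
    Between adjacent markers the ancestry is redrawn uniformly from all
    founders with probability switch_prob, otherwise it is unchanged.
    """
    n_markers, n_f = len(observed), len(founder_alleles[0])

    def emit(m, f):
        if observed[m] is None:
            return 1.0
        return 1.0 - error if founder_alleles[m][f] == observed[m] else error

    def transit(vec):
        total = sum(vec)
        return [(1.0 - switch_prob) * v + switch_prob * total / n_f for v in vec]

    def normalise(vec):
        total = sum(vec)
        return [v / total for v in vec]

    # Forward pass: information from markers to the left.
    fwd = [normalise([emit(0, f) / n_f for f in range(n_f)])]
    for m in range(1, n_markers):
        moved = transit(fwd[-1])
        fwd.append(normalise([moved[f] * emit(m, f) for f in range(n_f)]))

    # Backward pass: information from markers to the right.
    bwd = [[1.0] * n_f for _ in range(n_markers)]
    for m in range(n_markers - 2, -1, -1):
        weighted = [emit(m + 1, f) * bwd[m + 1][f] for f in range(n_f)]
        bwd[m] = normalise(transit(weighted))  # valid: transition matrix is symmetric

    return [normalise([fwd[m][f] * bwd[m][f] for f in range(n_f)])
            for m in range(n_markers)]

# Two biallelic markers, four founders: neither marker alone identifies founder 0.
founder_alleles = [[0, 0, 1, 1],
                   [0, 1, 0, 1]]
observed = [0, 0]
linked = founder_posteriors(founder_alleles, observed, switch_prob=0.01)
unlinked = founder_posteriors(founder_alleles, observed, switch_prob=1.0)
print("linked, first marker:  ", [round(p, 3) for p in linked[0]])
print("unlinked, first marker:", [round(p, 3) for p in unlinked[0]])
```

When the two markers are tightly linked, the posterior at the first marker concentrates on the one founder consistent with both observations; when they are treated as unlinked, the first marker cannot separate the two founders that share its allele.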

X-linked bristle number QTL:
Drosophila bristle number is arguably the best-studied quantitative trait, and this, coupled with its easy and accurate scoring, permitted a rigorous test of our mapping methodology. A strong expectation was that we would identify bristle number QTL at the distal tip of the X chromosome, as factors influencing both sternopleural and abdominal bristle number have been identified in this region in previous studies (LONG et al. 1995; GURGANUS et al. 1998, 1999; NUZHDIN et al. 1999; DILDA and MACKAY 2002). In a coarse-mapping experiment we identified QTL at the tip of the X for sternopleural bristle number (SBN) for both sexes in both synthetic populations, and for female abdominal bristle number (ABN) in just the pB population (Figure 4). Additionally, we found a QTL for male ABN in population pB in the middle of the X chromosome, but no corresponding QTL in the pA population, and a suggestive peak for male SBN in a similar position (Figure 4D). The limited resolution of the significant factors (8.3 cM on average) in the coarse-mapping experiment bars identification of the underlying molecular basis of mapped QTL, a commonly observed shortcoming associated with standard inbred line QTL mapping strategies (MACKAY 2001). Therefore, we took advantage of the increased mapping resolution we can achieve by maintaining our synthetic population for many generations, and chose to fine map two interesting QTL regions (the tip of the X and the middle of the X) in males only. This prevented comparison of any fine-mapped QTL between the sexes; however, the sex-specific nature of bristle number QTL/QTN is well established (LAI et al. 1994; LONG et al. 1995, 1998, 2000; LYMAN et al. 1999).

On average, fine-mapped QTL were resolved to 1.3 cM, with the large male pB ABN QTL resolved to just 0.9 cM. These intervals implicate genetically tractable physical distances, and suggest a handful of genes for further study. The best bristle number candidate genes at the tip of the X chromosome are the achaete-scute complex (ASC) and Notch. Associations between polymorphisms at ASC and bristle number variation were first seen by MACKAY and LANGLEY (1990), extended and confirmed by LONG et al. (2000), and more fully explored by GRUBER et al. (2007). ASC is located under QTL peaks QTL1 and QTL2, and segregating loci at ASC might plausibly be involved in the expression of these QTL. Unfortunately, the very tip of the X chromosome in Drosophila has a markedly reduced crossover rate relative to physical distance compared to the rest of the chromosome, and LD extends over large physical distances (AGUADÉ et al. 1989). Thus, the prospect for identifying the actual causal locus, rather than a locus in strong LD with the causal site, contributing to QTL1 and QTL2 is somewhat bleak. The Notch pathway is involved in the cell fate decisions that lead to bristle specification, and mutations of the component genes alter bristle patterning and spacing (reviewed by ARTAVANIS-TSAKONAS et al. 1999; LAI 2004). Thus, Notch is considered a viable candidate gene for bristle number variation, although no formal association mapping-style experiment has been performed across the region. The fine-mapping experiment presented here suggests that Notch is unlikely to contribute to segregating variation for male ABN, but we cannot completely rule out an effect of Notch on SBN.

The two QTL mapped to the middle of the X chromosome are particularly interesting as we could find no good evidence for similar QTL in other studies that have scanned the X chromosome (LONG et al. 1995; GURGANUS et al. 1998, 1999; NUZHDIN et al. 1999; DILDA and MACKAY 2002). This is probably because for both QTL the minor allele is rare in our experiment (Formula); it is therefore unlikely that these QTL segregated between the pairs of inbred lines studied previously. Together the two QTL intervals harbor 26 genes (just 13 under the smaller QTL4 interval), and none of these represent classic bristle number candidate genes, although two genes (ocelliless and Lim kinase) have mutants that exhibit bristle defects (ROYET and FINKELSTEIN 1995; PUEYO et al. 2000). Despite overlap in the regions harboring the two QTL, our data show that while QTL4 and QTL5 are tightly linked, they are independent: the alleles for the two QTL are not in phase across the pB founder lines. Thus, they do not represent a single genetic factor having pleiotropic effects on the two bristle characters. Distinguishing independent factors that are within ~1 cM of each other highlights the power of our approach compared to standard QTL mapping between pairs of inbred lines. Since QTL4 and QTL5 map to a small region in the middle of the X chromosome having a high rate of recombination relative to physical distance, there is the potential to identify the actual