Genetics, Vol. 176, 1261-1281, June 2007, Copyright © 2007
doi:10.1534/genetics.106.069641

* Department of Ecology and Evolutionary Biology and Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas 66045 and
Department of Ecology and Evolutionary Biology, University of California, Irvine, California 92697
1 Corresponding author: Department of Ecology and Evolutionary Biology, University of Kansas, 1030 Haworth Hall, 1200 Sunnyside Ave., Lawrence, KS 66045.
E-mail: sjmac{at}ku.edu
| ABSTRACT |
|---|
1.3 cM. We confirm previous observations of bristle number QTL distal to 4A at the tip of the chromosome and identify two novel QTL in 7F-8C, an interval that does not include any classic bristle number candidate genes. If QTL at the tip of the X are biallelic they appear to be intermediate in frequency, although there is evidence that these QTL may reside in multiallelic haplotypes. Conversely, the two QTL mapping to the middle of the X chromosome are likely rare: in each case the minor allele is observed in only 1 of the 16 founders. Assuming additivity and biallelism we estimate that identified QTL contribute 1.0 and 8.7%, respectively, to total phenotypic variation in male abdominal and sternopleural bristle number in nature. Models that seek to explain the maintenance of genetic variation make different predictions about the population frequency of QTL alleles. Thus, mapping QTL in eight-way recombinant populations can distinguish between these models.
The most effective way to clarify the contribution of MSB and CDCV forces in maintaining phenotypic variation is to experimentally identify and characterize the underlying molecular genetic basis of several QTL. With this ultimate goal in mind, two non-mutually exclusive experimental programs are predominant in the literature: QTL mapping and association or linkage disequilibrium (LD) mapping. In its simplest form QTL mapping involves crossing a pair of lines that are differentially fixed for alleles at a genomewide set of marker loci and at QTL contributing to the phenotype. Genotyping and phenotyping a large number of recombinant progeny from this cross identifies genetic intervals that harbor factors contributing to segregating variation in the cross. Since the publication of influential articles by PATERSON et al. (1988) and LANDER and BOTSTEIN (1989), the community has enjoyed considerable success mapping QTL for a wide range of traits in a diverse set of genetic systems. Typically QTL are resolved to broad intervals of
~10 cM (MACKAY 2001), which may represent millions of base pairs. This lack of resolution has hindered identification of the molecular variants involved, particularly in QTL mapping studies of intraspecific variation where QTL can have subtle effects. Physically close genetic factors also pose a problem for QTL mapping, as it may be impossible to accurately estimate the effects and locations of linked QTL, and the number of QTL may be underestimated (WRIGHT and KONG 1997; CORNFORTH and LONG 2003). Additionally, since recombinant individuals for QTL mapping are generally derived from a pair of inbred parental lines, only QTL that segregate between the parents can be identified. As a result there is no way to know the population frequency of mapped QTL.
Association mapping is a population-based genetic mapping strategy. The approach involves genotyping a large number of single nucleotide polymorphisms (SNPs) in a large sample of individuals and at each marker testing for an association between genotype and phenotype. A strong association signal at a SNP suggests either that the SNP itself contributes to trait variation or that the causal site is in strong LD with the SNP marker genotyped. Instead of relying on meiotic recombination in experimental crosses, association mapping utilizes the pattern of historical recombination in a panel of natural chromosomes. Thus, association mapping has the potential for much higher resolution than QTL mapping, and in principle the actual quantitative trait nucleotide (QTN) can be identified and its effect and frequency estimated directly. In practice, association mapping has met with modest success, and the literature is rife with failures to replicate published associations (although see TODD 2006 for a positive view of the future). This reflects a variety of factors, such as cryptic population structure, different patterns of LD or genetic heterogeneity in different populations, or simply insufficient power to detect variants with only subtle effects (KRUGLYAK 1999; LONG and LANGLEY 1999). Association mapping can be effective only when the density of genotyped SNPs is sufficiently high that real associations are not missed (RISCH and MERIKANGAS 1996). Since powerful genomewide association studies are tremendously difficult to carry out, even in humans where resources are considerable (HIRSCHHORN and DALY 2005; WANG et al. 2005), researchers have elected to carry out localized mapping on candidate gene regions (e.g., GENISSEL et al. 2004; PALSSON and GIBSON 2004; MACDONALD et al. 2005a). Such a strategy will fail if the presumed candidate does not actually contribute to trait variation (e.g., FLOREZ et al. 2006).
Finally, an aspect of association mapping that is often overlooked is that if much of the genetic variation underlying complex traits is due to rare variants of large effect (as predicted by MSB models) the association mapping paradigm is not very powerful at all, and is almost guaranteed to fail (WEISS and TERWILLIGER 2000; PRITCHARD 2001; REICH and LANDER 2001; PRITCHARD and COX 2002).
It is quite clear that both QTL and association mapping approaches, while powerful in many respects, suffer from distinct drawbacks that prevent the routine identification and characterization of QTN. To make the dissection of complex traits more routine we require a methodology that has some of the resolution of association mapping, combined with the power of QTL mapping to identify factors on a genomewide scale. To determine if standing variation is generally consistent with MSB or CDCV models a method allowing for direct estimation of the population frequency of mapped factors is highly desirable. An ideal methodology would also provide some mechanism with which to identify the precise molecular variants involved.
In this study we describe a mapping scheme that allows joint estimates of QTL effects and frequencies from a recombinant panel derived from multiple founder chromosomes. Conceptually, our approach is similar to the mouse "Collaborative Cross" scheme envisioned by the Complex Trait Consortium (THREADGILL et al. 2002; CHURCHILL et al. 2004) and has parallels with the "heterogeneous stock" strategy (TALBOT et al. 1999; MOTT et al. 2000; DEMAREST et al. 2001) most recently used by VALDAR et al. (2006b) to map QTL for 97 traits in mice. We take two independent sets of eight inbred Drosophila melanogaster lines, and from each set initiate a recombinant population. The genetic material for each synthetic population is thus derived from just eight founders, and after multiple generations of maintenance the genome of each recombinant individual is a mosaic of the founder chromosomes (Figure 1). Chromosomal segments transmitted to the recombinant flies by each of the founders are distinguished using markers composed of short runs of nonrecombining SNPs. Multiple rounds of recombination allow these synthetic populations to be used to map QTL with a fairly high level of resolution. Since each synthetic population is derived from eight founders, a key feature of our approach is that we obtain simultaneous estimates of the effect and the population frequency of each mapped QTL. Furthermore, because mapping resolution is generally a function of the number of generations of recombination since population inception, later generations can be used to map more precisely those QTL detected at an earlier generation in a coarse genomewide scan. Here we detail an experiment to map bristle number QTL on the D. melanogaster X chromosome and describe the analytical platform required to deal with experimental mapping data generated using eight-way synthetic populations.
| MATERIALS AND METHODS |
|---|
Experimental flies:
Figure 2 presents the strategy used to create the individuals used for phenotyping and genotyping.
A subset of the coarse-mapping experimental flies was tested for the presence of P elements using a transposon-display assay. All flies should be P free. We found that flies derived from population pAr2 showed P elements, implying that pAr2 was contaminated. This population was destroyed, and experimental flies from this population are not considered further.
Fine mapping:
Virgin females were collected from synthetic populations pAr1 and pBr1 and aged as for the coarse-mapping experiment. Multiple vial crosses were set up between 10 aged virgin females and 10 sequenced-strain males, and from each vial 4 male offspring were used for phenotyping and genotyping. The fine-mapping experiment was split into two blocks. In block 1 (generation G55) 144 vials were set up for each of the two populations pAr1 and pBr1, and in block 2 (generation G56) 120 vials were set up for each population. This resulted in a total of 1056 male experimental progeny collected for the pAr1 and pBr1 synthetic populations. Populations pAr1 and pBr1 were shown to be free of P elements at generation G52, just prior to beginning the fine-mapping experiment.
Molecular marker development:
We sought to identify 1-kb sequence fragments harboring several polymorphisms that collectively distinguish the founders (Figure 2), and over the coarse- and fine-mapping experiments developed 24 such markers (Table 2). Fourteen of these were identified via blind resequencing of 1-kb, primarily noncoding regions of the D. melanogaster genome for the founder lines (this study and MACDONALD and LONG 2005). The remaining 10 markers were taken from resequencing data generated by others (HARR et al. 2002; ORENGO and AGUADÉ 2004; DUMONT and AQUADRO 2005; OMETTO et al. 2005). Intermarker recombination fractions for the experimental panels of flies are estimated from the genotyping data (described below). To place markers on the standard D. melanogaster genetic map we extracted from FlyBase (http://www.FlyBase.org) all those genes with a known physical position (in base pairs), and an estimated genetic position (in centimorgans). For each chromosome we plotted base pairs against centimorgans, and using the ksmooth function in the statistical programming language R (http://www.R-project.org) generated a smoothed curve through the data. For each marker, using the smoothed curve we estimated the genetic position (on the standard map) from the known physical position. These marker positions were subsequently used as anchors to estimate QTL positions on the standard D. melanogaster genetic and physical maps.
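The interpolation of a marker's genetic position from its physical position can be illustrated with a minimal sketch. The analysis described above uses the ksmooth function in R; the Python below substitutes a hand-written Gaussian-kernel (Nadaraya-Watson) smoother, so its bandwidth scaling is not the same as that of ksmooth. The gene positions, marker position, and bandwidth are invented for illustration and are not taken from FlyBase.

```python
import math

def kernel_smooth(xs, ys, x0, bandwidth):
    """Nadaraya-Watson estimate of y at x0 using a Gaussian kernel."""
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, ys)) / total

# Hypothetical anchor genes: physical position (Mb) and genetic position (cM).
genes_mb = [1.0, 3.0, 5.0, 8.0, 12.0, 16.0, 20.0]
genes_cm = [0.5, 4.0, 9.0, 20.0, 36.0, 52.0, 64.0]

# Hypothetical marker with a known physical position only.
marker_mb = 6.5
marker_cm = kernel_smooth(genes_mb, genes_cm, marker_mb, bandwidth=1.5)
print(round(marker_cm, 2))
```

The estimate is a weighted average of the genetic positions of nearby genes, so it always falls within the range spanned by the anchors that carry appreciable weight.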
The genotype data were processed using custom routines implemented in the statistical programming language R (http://www.R-project.org). First, we ensured that none of the SNPs genotyped segregated within the sequenced strain. Next, for each of the experimental flies we found the maternally inherited haplotype from the synthetic recombinant population. No change to the genotyping data from males is required, since all SNPs are X linked and Drosophila males have a hemizygous X. Experimental females have both a paternally inherited sequenced-strain X and a maternally inherited recombinant X. Because the sequenced strain is isogenic, the haplotype of the recombinant chromosome for each experimental female can be obtained by subtraction. For example, if the sequenced strain is abc, and we observe an experimental female genotype of aaBbCc, we know the inherited recombinant maternal chromosome is aBC. Thus, the maternally inherited recombinant haplotype can always be unambiguously defined.
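The subtraction step for experimental females can be sketched as follows. This is an illustration in Python rather than the custom R routines used for the analysis, and the representation of a genotype as a string of allele pairs is an assumption made only for this example.

```python
def maternal_haplotype(genotype, sequenced):
    """Recover the recombinant maternal haplotype of an experimental female.

    genotype:  diploid genotype as consecutive allele pairs, e.g. "aaBbCc".
    sequenced: haplotype of the isogenic sequenced strain, e.g. "abc".
    """
    haplotype = []
    for i, ref in enumerate(sequenced):
        pair = list(genotype[2 * i: 2 * i + 2])
        if ref not in pair:
            raise ValueError(f"SNP {i}: sequenced-strain allele {ref!r} absent from {pair}")
        pair.remove(ref)           # drop the paternally inherited allele
        haplotype.append(pair[0])  # what remains is the maternal allele
    return "".join(haplotype)

print(maternal_haplotype("aaBbCc", "abc"))  # aBC
```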
The next step is to transform the haplotype data from the experimental individuals into a three-dimensional matrix, G, where $G_{imk}$ takes a binary value describing whether the observed maternal haplotype for individual i at marker m is consistent with the haplotype of founder k (k = 1, 2, ..., 8); i.e., $G_{imk} = 1$ if the haplotype is compatible with that of the kth founder, and $G_{imk} = 0$ otherwise. Using the data from the 12 females genotyped for each founder line, we can list all of the multilocus haplotypes present for each founder and marker. Generally the founder lines are completely inbred, although there is some residual heterozygosity and more than one haplotype can be present within a line at a given marker. Also, founders are not always unique at every marker, and missing data are unavoidable with a project on this scale. Typically we find that markers are not fully informative and fail to distinguish all eight possible founder chromosomes for one or both synthetic populations. Each test individual/marker combination is coded as follows. Consider that the marker haplotypes for the eight founder lines are (1) ABC, (2) AbC, (3) ABc, (4) aBC, (5) AbC, (6) aBc, (7) ABC, and (8) Abc (in this example founders 1 and 7, and 2 and 5, are indistinguishable). If an experimental fly is aBc it must have the chromosome from founder 6 and is coded as $2^{6-1} = 32$. Alternatively, if the experimental individual is found to be ABC it might equally be derived from founders 1 or 7 and is assigned the value $2^{1-1} + 2^{7-1} = 65$. Finally, a haplotype with missing data, ?B?, is compatible with founders 1, 3, 4, 6, and 7, and is assigned the value $2^{1-1} + 2^{3-1} + 2^{4-1} + 2^{6-1} + 2^{7-1} = 109$. By extension it is obvious that an experimental individual will be assigned a value of 1-255 for each marker, precisely defining the potential ancestry of the chromosomal segment.
Using this coding scheme the raw three-dimensional data matrix, G, can be alternatively represented as a two-dimensional matrix, C, with $C_{ij}$ (the code for the ith individual at the jth position) taking an integer value between 1 and 255. We provide C, along with the corresponding bristle phenotypes, as supplemental material on the GENETICS website (http://www.genetics.org/supplemental).
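The coding scheme amounts to an 8-bit mask in which bit k - 1 is set when founder k is compatible with the observed haplotype. The sketch below reproduces the three worked values from the example above; the function names are hypothetical and the code is an illustration in Python, not the R implementation used for the analysis.

```python
# Founder marker haplotypes from the worked example (founders 1..8).
FOUNDERS = ["ABC", "AbC", "ABc", "aBC", "AbC", "aBc", "ABC", "Abc"]

def compatible(observed, founder):
    """True if every non-missing allele ('?' = missing) matches the founder."""
    return all(o == "?" or o == f for o, f in zip(observed, founder))

def founder_code(observed, founders=FOUNDERS):
    """Sum of 2**(k-1) over all founders k compatible with the observation."""
    return sum(1 << k for k, f in enumerate(founders) if compatible(observed, f))

def decode(code, n_founders=8):
    """List the founders (1-based) whose bit is set in the code."""
    return [k + 1 for k in range(n_founders) if code >> k & 1]

print(founder_code("aBc"))  # 32
print(founder_code("ABC"))  # 65
print(founder_code("?B?"))  # 109
print(decode(109))          # [1, 3, 4, 6, 7]
```

A haplotype that is missing at every SNP is compatible with all eight founders and receives the maximum code of 255.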
Statistical platform:
Data analysis consists of three steps, and the statistical machinery is implemented as a series of functions in the statistical programming language R, expanding on the R/qtl package (http://www.rqtl.org; BROMAN et al. 2003). First, we consider a 1-cM grid along the chromosome and calculate the probability $p_{ijk}$ that individual i carries founder allele k at position j, given the available genotype data, G. This is done using the standard hidden Markov model (HMM) technology of BAUM et al. (1970), first applied in a genetics context by LANDER and GREEN (1987) and adapted to allow for genotyping errors by LINCOLN and LANDER (1992). The observed data, G, are viewed as marker "phenotypes" that are possibly subject to error. The true underlying genotypes are assumed to follow a Markov chain, with each of the eight possible founder alleles being equally likely. For any two positions, the probability of a transition from founder allele $k_1$ to founder allele $k_2$ is $r/7$ if $k_1 \neq k_2$ (recombination in the interval) and $1 - r$ if $k_1 = k_2$ (no recombination). Here, r is analogous to the recombination fraction for the interval, but represents recombination events from multiple generations and is estimated from the data. The observed marker genotype at a locus is assumed to be compatible with the true underlying genotype with probability $1 - \varepsilon$, where $\varepsilon$ is the genotyping error rate. A readable tutorial on implementing the HMM is provided by BROMAN (2006). The information content of the available marker genotype data may be measured by the proportion of missing information, which we take to be $H_j = -\sum_i \sum_k p_{ijk} \log p_{ijk} / (n \log 8)$, where n is the number of individuals.
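A minimal sketch of the first step is given below, assuming the transition and error model just described. It runs the forward-backward algorithm at the marker positions only (not on a 1-cM grid), takes the recombination parameters as given rather than estimating them, and is written in Python rather than as an extension of R/qtl. The function names and the default error rate of 0.01 are assumptions for illustration.

```python
import math

def founder_posteriors(codes, rec, eps=0.01, n_founders=8):
    """Posterior founder probabilities at each marker for one individual.

    codes: one bitmask per marker (bit k-1 set if founder k is compatible).
    rec:   rec[m] is the recombination parameter r between markers m and m+1.
    eps:   assumed genotyping error rate.
    """
    K = n_founders

    def emit(code):
        # Observation is compatible with the true founder with probability 1 - eps.
        return [(1 - eps) if code >> k & 1 else eps for k in range(K)]

    def step(vec, r):
        # Stay on the same founder with prob 1 - r, switch to each other with r/(K-1).
        total = sum(vec)
        return [(1 - r) * v + r / (K - 1) * (total - v) for v in vec]

    M = len(codes)
    fwd = []
    cur = [e / K for e in emit(codes[0])]  # founders equally likely a priori
    for m in range(M):
        if m > 0:
            cur = [p * e for p, e in zip(step(fwd[-1], rec[m - 1]), emit(codes[m]))]
        s = sum(cur)
        fwd.append([c / s for c in cur])
    bwd = [[1.0] * K for _ in range(M)]
    for m in range(M - 2, -1, -1):
        nxt = [b * e for b, e in zip(bwd[m + 1], emit(codes[m + 1]))]
        cur = step(nxt, rec[m])
        s = sum(cur)
        bwd[m] = [c / s for c in cur]
    post = []
    for f, b in zip(fwd, bwd):
        raw = [x * y for x, y in zip(f, b)]
        s = sum(raw)
        post.append([x / s for x in raw])
    return post

def missing_information(post_by_individual, marker):
    """Entropy-based proportion of missing information at one marker."""
    n = len(post_by_individual)
    K = len(post_by_individual[0][marker])
    h = 0.0
    for post in post_by_individual:
        h -= sum(p * math.log(p) for p in post[marker] if p > 0)
    return h / (n * math.log(K))

# Founder 6 observed at the flanking markers, middle marker missing.
post = founder_posteriors([32, 255, 32], [0.1, 0.1])
print(round(post[1][5], 3))
```

With informative flanking markers the missing middle marker is still assigned to founder 6 with high probability, whereas an individual with no genotype data at all yields a uniform posterior and a missing-information value of 1.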
The second step is to fit a model relating phenotype to genotype. Initially, at the jth position we calculate the average phenotype by founder genotype (with the ith individual's phenotypic contribution to the mean of the kth founder chromosome weighted by the pij's) and sort these eight means from smallest to largest. We then fit a maximum of seven linear models to the data at each position: model 1 tests the difference between founder material with the smallest mean against all others, model 2 tests the difference between the pair of founders with the two smallest means against all others, and so on. For each model, we create a regressor variable for individual i at position j that is the sum of the elements of pij associated with these contrasts. The test is accomplished by regressing phenotypes on this regressor variable, with the additional constraint that the sum (over individuals) of the regressor variable must be >50. The resulting LOD score at position j uses a model of all eight founders having the same mean as a null and accepts the above contrast with the maximal likelihood as the alternate. Implicit in this analysis is the idea that there is a single biallelic QTL at some position on the chromosome that is segregating among the eight founder chromosomes and that some optimal partitioning of the founders can be used to identify that QTL. We note that the LOD scores resulting from our approach are strongly correlated with the F-statistic obtained from a multiple regression of phenotype onto the pj's at each position over the X chromosome. In the simulations the correlation between the LOD scores and F-statistics is generally >99%, and across all of the experimental panels (both sexes, both traits, both synthetic populations, and both the coarse and fine mapping) the correlation is 97.2%.
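The partitioning scan at a single position can be sketched as follows, under the single biallelic QTL assumption stated above. The sketch uses ordinary least squares on one regressor and computes the LOD as (n/2) log10(RSS0/RSS1); the function name is hypothetical, and reading the constraint as "the regressor must sum to more than 50" follows the text literally. It is an illustration in Python, not the R code used for the analysis.

```python
import math

def partition_lod(phenos, probs, min_weight=50.0):
    """Best biallelic split of the founders at one position.

    phenos: phenotype for each individual.
    probs:  probs[i][k] = probability that individual i carries founder k+1.
    Returns (LOD, low-group founders), or (0.0, []) if no split qualifies.
    """
    n, K = len(phenos), len(probs[0])
    # Probability-weighted mean phenotype for each founder.
    means = []
    for k in range(K):
        w = sum(p[k] for p in probs)
        tot = sum(p[k] * y for p, y in zip(probs, phenos))
        means.append(tot / w if w > 0 else float("inf"))
    order = sorted(range(K), key=lambda k: means[k])
    ybar = sum(phenos) / n
    rss0 = sum((y - ybar) ** 2 for y in phenos)  # null: one common mean
    best = (0.0, [])
    for size in range(1, K):  # at most seven contrasts
        low = order[:size]
        x = [sum(p[k] for k in low) for p in probs]
        if sum(x) <= min_weight:
            continue
        xbar = sum(x) / n
        sxx = sum((xi - xbar) ** 2 for xi in x)
        if sxx == 0:
            continue
        slope = sum((xi - xbar) * (y - ybar) for xi, y in zip(x, phenos)) / sxx
        rss1 = sum((y - ybar - slope * (xi - xbar)) ** 2 for xi, y in zip(x, phenos))
        lod = (n / 2) * math.log10(rss0 / rss1)
        if lod > best[0]:
            best = (lod, sorted(k + 1 for k in low))
    return best
```

Calling the function at every grid position and keeping the returned LOD yields a likelihood profile; the low-group membership at a peak is the partition carried forward to the third step.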
The third step of the data analysis is then to estimate the probability that each of the eight founder chromosomes harbors the high, or Q, QTL allele at position j (pQk's) for the model implied by the best partitioning of the founders. This is simply the probability of observing each of the eight founder means given the estimated slope and intercept of that model, conditional on each founder harboring the high QTL. After all three steps are complete we obtain LOD scores and phenotypic effects at J positions in the genome and J corresponding pQ's. Our conservative estimate of the frequency of a QTL located at a local maximum in the LOD profile is the number of elements of pQ ≥ 0.95 divided by the number of elements of pQ ≥ 0.95 or pQ ≤ 0.05 (i.e., we ignore founder lines that do not allow for an accurate estimation of "phase").
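The conservative frequency estimate reduces to a simple count. The sketch below assumes the thresholds are inclusive (at least 0.95 for the high allele, at most 0.05 for the low allele); the function name and the example probabilities are invented for illustration.

```python
def qtl_frequency(p_q, hi=0.95, lo=0.05):
    """Frequency of the Q allele among founders with confidently assigned phase."""
    high = sum(1 for p in p_q if p >= hi)  # founders confidently carrying Q
    low = sum(1 for p in p_q if p <= lo)   # founders confidently carrying q
    if high + low == 0:
        raise ValueError("no founder has a confidently assigned allelic state")
    return high / (high + low)

# Hypothetical pQ vector for eight founders; founder 4 (0.5) is ignored.
p_q = [0.99, 0.01, 0.02, 0.50, 0.97, 0.00, 0.03, 0.01]
print(qtl_frequency(p_q))  # 2 of 7 phased founders carry Q
```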
Variation due to QTL:
Estimates of QTL effect and frequency can be derived from eight-way synthetic populations, and we can use these values to estimate the fraction of segregating variation, $V_a$, due to identified QTL. We can estimate this both in our (effectively haploid) mapping population as $V_a = pq\alpha^2$, and in a natural, outbred diploid population under additivity as $V_a = 2pq\alpha^2$, where p and q are the allele frequencies and $\alpha$ is the effect of the QTL (FALCONER and MACKAY 1996, p. 126). In both our mapping population and a natural population, male QTL on the hemizygous X chromosome have $V_a = pq\alpha^2$. We can place a 95% confidence interval on $V_a$ using Monte Carlo simulation. For $\alpha$ this is accomplished by drawing 10,000 random samples from a normal distribution with mean equal to the observed effect of the QTL and standard deviation equal to the observed standard error on the QTL effect. We estimate the allele frequency, p, differently depending on whether we wish to estimate the variance due to the QTL within our mapping population, or in a natural population. Allele frequency, p, in the mapping population is simply the observed QTL frequency. To estimate allele frequencies of mapped factors in natural populations we draw samples from an allele frequency distribution, whose derivation is conditional on the fact that we observe i copies of a QTL allele among N founder chromosomes. Under neutrality the distribution of allele frequencies is described by Wright-Fisher sampling as

[equation not recovered]

where $\theta$ is the per-site heterozygosity under neutrality. The probability of drawing i copies of an allele in a sample of size N is described by a binomial distribution,

$$\mathrm{pr}(i; x, N) = \binom{N}{i} x^{i} (1 - x)^{N - i},$$

where $\binom{N}{i}$ is "N choose i." By Bayes' theorem,

[equation not recovered]

where $\Gamma$ is the gamma integral. Two properties of pr(x;i,N) are noteworthy. First, typically $\theta$ << 1, and therefore $\theta$ has little effect on the shape of pr(x;i,N), and second, for large N, and i not close to one or N, pr(x;i,N) is approximately a binomial distribution, and the "prior" assumption of neutrality has little weight. In a natural population, for any given QTL, we assume D. melanogaster $\theta$ = 0.006 (averaged over 98 loci collated in PRESGRAVES 2005) and use "rejection sampling" (PRESS et al. 1996) to draw 10,000 random deviates from pr(x;i,N) to represent allele frequencies. For each pair of simulated $\alpha$/p estimates we calculate $V_a$ as above. The 95% confidence interval on $V_a$ is taken as the 250th and 9750th elements of the sorted vector of $V_a$ estimates. These values can be transformed to a percentage of the total bristle number variation explained by the QTL by dividing by the observed phenotypic variance.
| RESULTS |
|---|
Simulations:
We carried out simulations to assess our ability to accurately map QTL and jointly estimate their effect and frequency, and used parameters (chromosome size, marker density, marker informativeness) that realistically mimic the experimental data we collected. We sampled 1152 recombinants from an eight-way synthetic population 16 generations after founding to simulate the chromosome scan, and 56 generations after founding to simulate fine mapping. In each case we assume recombination occurs only in females. At the test generation (G16 or G56) recombinant individuals were created by concatenating chromosomal fragments derived from each of the eight founders with equal probability. Fragment lengths were drawn from an exponential distribution with mean 100/(16/2) or 100/(56/2) cM for the coarse and fine mapping, respectively. For the coarse mapping we simulated 12 partially informative markers equally spaced along a 66 cM chromosome and 5% missing data. For the fine-mapping simulation the 12 markers were placed in a more focused 10 cM region. For simplicity we assume the same level of informativeness at each marker, with four segregating haplotypes that group the founder lines as follows: haplotype 1 (three founders), haplotype 2 (two founders), haplotype 3 (two founders), haplotype 4 (single founder). The separation of founders into different haplotypes was random across markers. Finally, we place a biallelic QTL accounting for 5% of the total phenotypic variation at a random position within the mapping region, with the number of founders having the Q allele varied between one and four out of eight. Five-hundred realizations of each simulation were performed.
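The construction of a simulated recombinant chromosome can be sketched as below, assuming the fragment-length model just described (exponential lengths with mean 100/(g/2) cM at generation g, each fragment assigned to one of eight founders with equal probability). Markers, missing data, and the QTL are omitted; the function name and the seed are assumptions for illustration.

```python
import random

def simulate_mosaic(length_cm, generations, n_founders=8, rng=random):
    """Return (start_cM, end_cM, founder) segments tiling one chromosome."""
    # Mean fragment length in cM, assuming recombination in females only.
    mean_len = 100.0 / (generations / 2.0)
    segments, pos = [], 0.0
    while pos < length_cm:
        end = min(length_cm, pos + rng.expovariate(1.0 / mean_len))
        segments.append((pos, end, rng.randint(1, n_founders)))
        pos = end
    return segments

# One coarse-mapping chromosome: 66 cM, sampled at generation 16.
rng = random.Random(1)  # fixed seed so the example is reproducible
chromosome = simulate_mosaic(66.0, 16, rng=rng)
print(len(chromosome), "fragments; first:", chromosome[0])
```

Because founders are drawn independently for each fragment, adjacent fragments can share a founder, so the number of visible breakpoints is somewhat smaller than the number of fragments.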
The probability of observing a peak in the LOD score >4 is ~99%, with an expected maximum LOD score of ~9.4 and ~11.4 for the coarse- and fine-mapping simulations, respectively. For those peaks associated with a LOD score >4, a 2.5-LOD drop from the maximum includes the simulated position of the QTL >99% of the time. On average, a 2.5-LOD drop maps a significant QTL to a 13.2 cM window with a standard deviation of 6.1 cM (coarse mapping) or a 2.3 cM window with a standard deviation of 0.9 cM (fine mapping). When the LOD score is >4, in no case do we incorrectly infer the "phase" of the QTL, and phase is assigned for an average of 7.8/8 founders. The simulated frequency of the QTL does not appear to affect the probability of inferring the allelic state of the QTL, the power to map a QTL, the average maximum LOD score, or the accuracy in localizing QTL. This is perhaps not surprising given that the simulations hold the proportion of variance attributable to the QTL constant at 5% (LONG and LANGLEY 1999). With the same simulations, but no QTL, the false positive rate at a LOD of four is 2 and 1.6% for the coarse- and fine-mapping simulations, respectively. With our current recombinant panel, marker density, and marker informativeness we can map QTL to the eight founder chromosomes in each of the synthetic recombinant populations. Additional simulations suggest that reducing sample size, marker density, or marker informativeness is detrimental.
Marker informativeness:
Ideally, every marker (a 1-kb fragment genotyped for several SNPs) would completely distinguish among all eight founders in both the pA and pB synthetic populations. In our experimental data this is typically not the case and markers are not fully informative. In fact, it is frequently not possible to distinguish among the eight founders within either population based on the DNA sequence of the entire 1-kb marker amplicon. For those 11 markers for which we had access to sequence from all founders, the average number of distinguishable haplotypes is 6.5/8. This is likely an overestimate of the number of distinguishable founder haplotypes for any arbitrary 1-kb region of the Drosophila genome, as a number of potential markers were sequenced and discarded due to a lack of polymorphism (data not shown). As with any "haplotype tagging" strategy, the SNP genotyping approach we employ further reduces the number of distinguishable haplotypes, both because we do not genotype all available SNPs, and because a proportion of the developed genotyping assays failed (5.1%). Over the 24 independent markers examined in this study, we successfully genotyped 4.5 SNPs per marker on average, and the mean number of unique haplotypes identified per population per marker is 4.5 (pA = 4.4, pB = 4.6). Markers used solely for fine mapping were slightly more informative (5.0 unique haplotypes per marker on average) than those used solely for coarse mapping (4.1 haplotypes). The increase in informativeness for the fine-mapping markers is due to those used to map the region in the middle of the X chromosome (X-middle markers average 5.4 haplotypes, while X-tip markers average 4.1). Contrary to our intuition the number of distinguishable haplotypes in the founders was not strongly a function of how SNPs were ascertained: Markers developed by sequencing the actual founders, where SNPs were chosen to maximize within-marker haplotype diversity, yielded 4.7 haplotypes per population on average. Markers harvested from published sequencing surveys, where SNPs were simply chosen to have high frequency and little LD with other SNPs in the same fragment, showed similar haplotype diversity in our founders (4.3 haplotypes per population).
The inbred founder lines used to derive the synthetic populations are not isogenic, and 28/384 (7.3%) independent marker/founder combinations show heterozygosity. The heterozygosity is not localized to any particular marker as 17/24 markers show at least one heterozygous line. Half of the 16 founders show no evidence for heterozygosity, while 3 of the lines (A1, B3, and B7) are heterozygous at multiple amplicons. This trio of lines collectively contributes to 23/28 (82.1%) of the heterozygous marker/founder combinations, implying they are less well inbred than the remaining 13 lines. It is of interest that all 16 founder lines were maintained in stock centers at small effective population sizes for >40 years (without being contaminated by P-element-harboring flies). The observation that these lines are not completely homozygous suggests a relatively high rate of tightly linked deleterious alleles in trans.
The HMM employs the genotype data to infer (for every individual and every position) the probability that the chromosomal segment is derived from each of the eight founders. Founder assignment becomes more accurate as the information level in the genotype data increases. We can visualize spatial variation in the information level by color coding (by founder of origin) those chromosomal segments inferred to come from a single founder with a probability >75%. Figure 3 depicts this information for 40 typical males from the pBr1 population. Colored blocks represent highly likely founders, and the information content at any position can be loosely assessed by the amount of white space (i.e., where the probability was <0.75 for all eight founders). For the coarse-mapping scan, information is generally high at the markers, with the obvious exception of marker or.84 (third marker from the right), where only two haplotypes are distinguishable among the eight pB founders. Overall, there appears to be greater information in the fine-mapping population. One exception is the region around marker no.01 (fourth from the left) at the tip of X chromosome. This is likely due to its low marker informativeness (just three haplotypes are distinguishable at no.01 in pB), and because it is relatively distant from either of the flanking markers. We note that the relative size of nonrecombinant fragments is consistent with their expectation given the number of generations the populations experienced recombination/drift. Finally, with reduced information and/or a poorly performing HMM we may expect the most likely founder to "flip-flop" frequently along the chromosome, and this does not appear generally the case.
500 experimental flies of each sex from the populations pAr1, pBr1, and pBr2 (population pAr2 became contaminated during maintenance and was destroyed). Experimental individuals from the replicate populations pBr1 and pBr2 were pooled, and we refer to this pooled sample as pBr1+2. Comparison of the data from pBr1 and pBr2 alone with that from the pooled sample does not reveal any obvious inconsistencies. Since the sample size of population pAr1 is around half the size used in our simulations, we likely have reduced power to detect QTL in the pAr1 coarse-mapping sample. We only consider QTL to be present when the peak in the likelihood profile is >4-LOD. The likelihood profiles for the coarse-mapping samples shown in Figure 4 (A-D) reveal the existence of QTL for bristle number at the very tip of the X chromosome in both females and males. We find no evidence for bristle number QTL anywhere else on the X for females, but do identify a male-specific QTL for ABN in the middle of the X chromosome (Figure 4D). Details of all QTL identified in the coarse-mapping study are presented in Table 4. For both populations, pAr1 and pBr1+2, and for both sexes QTL for SBN were detected at the tip of the X chromosome with LOD scores between 4.9 and 7.7. The effects of the pBr1+2 X-tip SBN QTL are lower than those detected in pAr1 (0.70 and 0.73 in pBr1+2 vs. 1.36 and 1.42 in pAr1), which may be due to the smaller pAr1 sample leading to less robust estimates of the genetic effect. A single X-tip ABN QTL was identified in females of the pBr1+2 sample (Figure 4B). Figure 4D does show a peak at the tip of the X for ABN in pBr1+2 males, but the 2.5-LOD drop for this peak overlaps the larger ABN QTL in the middle of the chromosome, thus we do not consider it an independent QTL. All five X-tip QTL map somewhere between the distal end of the X chromosome and band 5B6 (Table 4).
Our identification of five QTL mapping to the very tip of the X chromosome replicates the well-documented effect of this region on bristle number variation in D. melanogaster (LONG et al. 1995; GURGANUS et al. 1998, 1999; NUZHDIN et al. 1999; DILDA and MACKAY 2002).
The coarse-mapped QTL are resolved to intervals averaging 8.3 cM (~4 Mb). We elected to fine map two interesting QTL regions (Figure 4, C and D) in males only from the populations pAr1 and pBr1. Figure 5 (A-D) presents the likelihood profiles for the two fine-mapped regions (X-tip and X-middle) for the populations pAr1 and pBr1, and Table 5 gives details of the fine-mapped QTL. Overall, there is remarkable concordance between the coarse- and fine-mapped QTL (compare Figure 4, C-D, with Figure 5, A-D).
The two best bristle number candidate genes in the fine-mapped X-tip region are the achaete-scute complex, ASC, at cytological position 1A6, and Notch at 3C7-3C9. In Figure 5 (A and C) ASC is located distal to the leftmost marker (or.05), while the fourth marker from the left (no.01) is at Notch. Thus, our data are compatible with the notion that variation at ASC may be responsible for QTL1 and QTL2. It does not seem likely that variation at Notch contributes to QTL3, as the LOD at Notch is 3.4 less than that at QTL3 in population pBr1. However, we cannot rule out the possibility that Notch contributes to segregating variation for SBN in males, as the genotype information around Notch in the X-tip region is somewhat low (Figures 3 and 5E), and the LOD score at Notch for SBN in pAr1 males is high (LOD = 6.0, Figure 5A). The broad QTL1 peak in pAr1 males may actually represent two QTL that we have insufficient power to resolve. Since we did not fine map the QTL identified at the tip of the X chromosome in females, we are unable to suggest whether ASC or Notch harbors factors affecting female bristle number.
X-middle fine mapping:
The coarse mapping revealed a strong QTL for male ABN in the middle of the X chromosome in the pBr1+2 population and a suggestive peak (LOD < 4) for male SBN in a similar position (Figure 4D). The fine-mapping experiment almost perfectly replicated these observations (Figure 5, B and D), aside from a slight shifting of the QTL maxima relative to the flanking markers (compare Figures 4D and 5D). The pair of X-middle fine-mapped QTL were resolved to intervals of 0.9 cM (QTL4) and 1.7 cM (QTL5), down from 9 and 41.1 cM in the coarse mapping. The effect of QTL4 is maintained between the coarse mapping (effect = 0.91) and the fine mapping (effect = 0.97), while the effect of QTL5 increases (coarse-mapping effect = 0.58, fine-mapping effect = 1.31). We looked for evidence that either of these X-middle QTL had been identified in other mapping studies of Drosophila bristle number variation. We found no evidence of similarly positioned bristle number QTL in LONG et al. (1995), GURGANUS et al. (1998), or DILDA and MACKAY (2002), and evidence only of weak QTL for female ABN in GURGANUS et al. (1999; QTL between 5D and 8E) and NUZHDIN et al. (1999; QTL between 7D and 8E), suggesting we have identified novel QTL for both male ABN and male SBN. The intervals within which QTL4 and QTL5 reside are genetically short (0.9 and 1.7 cM, respectively), physically short (204 kb and 408 kb, respectively), and harbor few genes (QTL4 = 13 genes, QTL5 = 26 genes, of which 13 overlap with those under QTL4). None of the 26 genes would be considered a priori classic bristle number candidate genes: genes in both the QTL4 and QTL5 intervals (oc, CG12772, CG11284, Ppt1, Ogg1, CG11294, Hexo2, CG2004, CG1785, l(1)G0020, CG1789, Lim1, and CG32710) and genes in only the QTL5 interval (CG12075, Moe, CG1885, CG10648, e(r), CG15352, CG12660, CG3898, CG12661, rdgA, CG10962, CG12662, and mir-31b).
Nevertheless, for two of the genes under both QTL4 and QTL5, oc and Lim1, there is reported evidence of bristle defects in mutant flies: oc (ocelliless) mutants affect interocellar, ocellar, and postvertical bristles (ROYET and FINKELSTEIN 1995), and Lim1 (Lim kinase) mutants affect sternopleural and vibrissae bristles (PUEYO et al. 2000). These two genes may be the best candidates underlying the two novel QTL we identify in the middle of the X chromosome.
Frequency of X-linked bristle number QTL:
Since the synthetic recombinant populations we employ are derived from multiple inbred lines, it is possible to estimate the phenotypic mean for each founder at every position along the chromosome. In turn, under the assumption that an identified QTL is biallelic, founders can be probabilistically assigned to "high" or "low" QTL allele classes. This permits an estimate of the frequency of the QTL. Figure 6 shows, for all five fine-mapped male bristle number QTL and the corresponding coarse-mapped regions, the estimated founder phenotype means, and the probable QTL allele present in each founder. Founder means appear to be estimated well, and as expected estimates are more robust when the sample size is larger: the error bars are wider for the coarse-mapping pAr1 data (Figure 6A, left) than for the other data sets. Also, those sporadic cases of large standard errors, e.g., line B6 for fine mapping of QTL2 (Figure 6B), are due to a comparatively small number of experimental individuals consistent with harboring this founder chromosome at the QTL.
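As a rough illustration of how founder means translate into allele classes and a frequency estimate, the sketch below splits founders into two groups at the cut of the sorted means that minimizes the within-group sum of squares. This deterministic split is a simplified stand-in for the probabilistic assignment described in the text, and the founder means are invented for the example.

```python
def assign_allele_classes(founder_means):
    """Split founders into 'low' and 'high' classes.

    The founders are sorted by estimated mean, and the cut point that
    minimises the summed within-class squared deviation is chosen.
    """
    ordered = sorted(founder_means.items(), key=lambda kv: kv[1])
    best_cut, best_ss = None, float("inf")
    for cut in range(1, len(ordered)):
        ss = 0.0
        for group in (ordered[:cut], ordered[cut:]):
            values = [mean for _, mean in group]
            centre = sum(values) / len(values)
            ss += sum((v - centre) ** 2 for v in values)
        if ss < best_ss:
            best_cut, best_ss = cut, ss
    low = [name for name, _ in ordered[:best_cut]]
    high = [name for name, _ in ordered[best_cut:]]
    return low, high


# Hypothetical founder means (bristles); one founder left out as ambiguous.
means = {"B1": 17.1, "B2": 17.3, "B4": 17.0, "B5": 16.1,
         "B6": 17.2, "B7": 17.4, "B8": 16.9}
low, high = assign_allele_classes(means)
low_allele_frequency = len(low) / (len(low) + len(high))
```

A split of this kind always returns two classes, so it cannot by itself test the biallelic assumption; the spread of means within each class still has to be inspected.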
,
, and
founders for QTL1, QTL2, and QTL3, respectively (founders not assigned to either allelic class are ignored). An important caveat is that this analysis rests on the assumption that the QTL are biallelic, which is not necessarily supported for the X-tip QTL. For instance, while founders A2 and A8 are considered the "high" lines for QTL1 (Figure 6A), there is a difference of nearly 0.8 bristles in the estimated phenotype mean of this pair of lines, and the standard errors do not overlap. Also, the error bar around "low" line A5 does not overlap those of "low" lines A3 and A4. Similar inconsistencies within assigned biallelic classes for the other two fine-mapped X-tip QTL (QTL2 and QTL3) also do not give any strong indication that only two QTL alleles segregate (Figure 6, B and C).
Of all the QTL, QTL4 for male ABN was fine mapped to the smallest region (0.9 cM), and for this QTL the pattern of founder means alters between the coarse and fine mapping (Figure 6D). In the coarse mapping, while it is difficult to visualize two clear allelic categories, under the assumption that the QTL is indeed biallelic,
founders have the low allele. In contrast, the fine mapping quite clearly reveals two classes, with the low allele in
founders. One explanation for the change is that the information content of the genotype data is much greater in the X-middle fine mapping (H = 0.187) than in the coarse scan of the entire X chromosome (H = 0.374). The increased information may have led to greater accuracy in estimating the ancestry of recombinant chromosomal segments, and more accurate estimates of the founder means in the fine mapping. Alternatively, there might be additional bristle number factors close to the mapped QTL that interfere with founder mean estimation in the coarse-mapping scan. Expansion of the genetic map in the fine-mapping panel would reduce the effect of any such interference. We note that if marker informativeness is the sole issue, the pattern of founder means for QTL4 observed in the fine-mapping experiment should be seen in the coarse-mapping panel simply by genotyping additional markers.
Both QTL4 and QTL5 appear rare in population pB, with the minor allele present in
founders (Figure 6, D and E), and the eighth founder (line B3) being ambiguous. Since there is no evidence for equivalent QTL in the pA population (Figures 4C and 5B), if we make the reasonable assumption that the pA lines are fixed for the major QTL allele, QTL4 and QTL5 may each have a frequency of
(or
) in our lines. We use Monte Carlo simulation to estimate the fraction of segregating bristle number variation due to these male-specific QTL in our mapping (synthetic recombinant x sequenced strain trans-heterozygote) population (see MATERIALS AND METHODS). QTL4 (effect = 0.97 abdominal bristles, SE = 0.160) was detected in males of panel pBr1, a population which shows an ABN variance in the fine-mapping panel of 3.03 (Table 3). If we assume the rare allele is present in
lines, the average variance explained by QTL4 is 1.9% (95% confidence interval, 0.84–3.22%). Similarly, QTL5 explains 4.1% (1.26–8.11%) of male SBN variation. Notwithstanding Beavis effects (BEAVIS 1994), our data imply these QTL contribute 2–4% to the total variation for bristle number in our mapping panel.
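The kind of Monte Carlo calculation referred to above can be sketched as follows. This is not the procedure from MATERIALS AND METHODS: it assumes an additive, biallelic QTL in hemizygous males, so that the QTL variance is p(1 - p)a² for allele frequency p and effect a, and it propagates only the sampling error of the effect. The effect, standard error, and phenotypic variance are the QTL4 values quoted in the text; the allele frequency of 1/15 in the usage line is an assumption made for illustration.

```python
import random


def variance_explained(effect, se, allele_freq, phenotypic_variance,
                       draws=10000, seed=1):
    """Monte Carlo estimate of the fraction of variance due to one QTL.

    Each draw samples an effect from Normal(effect, se) and converts it
    to a variance fraction p(1 - p)a^2 / V_P (assumed additive, biallelic,
    hemizygous model). Returns the mean and a 95% percentile interval.
    """
    rng = random.Random(seed)
    fractions = []
    for _ in range(draws):
        a = rng.gauss(effect, se)
        qtl_variance = allele_freq * (1.0 - allele_freq) * a * a
        fractions.append(qtl_variance / phenotypic_variance)
    fractions.sort()
    mean = sum(fractions) / draws
    lower = fractions[int(0.025 * draws)]
    upper = fractions[int(0.975 * draws) - 1]
    return mean, lower, upper


# QTL4 values from the text; allele_freq = 1/15 is an illustrative assumption.
mean, lower, upper = variance_explained(
    effect=0.97, se=0.160, allele_freq=1 / 15, phenotypic_variance=3.03)
```

Because the variance fraction scales with the square of the effect, the interval is asymmetric around the mean, and any upward bias in the effect estimate (the Beavis effect) inflates the result.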
Given that QTL4 and QTL5 reside in very small and overlapping intervals, one might conclude that we have mapped a single pleiotropic QTL contributing to variation in both male ABN and male SBN. Figure 6 (D and E) shows this is not the case. The rare low allele for QTL4 is present in line B5, while the rare high allele for QTL5 is present in line B7 (and perhaps line B3): a single QTL affecting both traits would show the same pattern of alleles across the founders. Thus, QTL4 and QTL5 represent independent mutations that are very tightly linked, perhaps even residing in the same gene. Our ability to distinguish tight linkage from pleiotropy is a consequence of mapping QTL in an eight-way cross and estimating founder mean phenotypes at each QTL.
| DISCUSSION |
|---|
In the simulations we also assume that the recombinant population is not subject to drift or selection, and that the expected frequency of genetic material derived from each of the eight founders at every point along the chromosome is 1/8. Deviation from this neutral marker/infinite population size assumption may reduce our ability to detect QTL and accurately assign founders to allelic classes. In an extreme case the population could fix for one of the founder haplotypes, rendering QTL undetectable at that position. The likelihood of this occurrence increases as the population is subject to more genetic drift, for example by passing the population through a bottleneck or by maintaining the population for many generations. We deliberately maintained each of our synthetic populations as a large cohort to minimize the effect of drift. Nevertheless, it is clear from fine mapping at the tip of the X chromosome that the genetic material from certain founders can be largely eradicated from the population (Figure 6). It is unclear if the observed loss of some founder alleles is more consistent with random genetic drift or perhaps purifying selection against a disadvantageous chromosomal segment in our populations. Determining the degree to which founder drop-out is a genomewide problem will require further genotyping of the fine-mapping panels across the five major chromosome arms of Drosophila. Further simulation of mapping performance using eight-way recombinant populations subject to many generations of maintenance will build on the work of VALDAR et al. (2006a), and incorporate drift, selection, and more complex, realistic genetic architectures.
Information content of markers:
Each marker is composed of a set of genotyped SNPs within a 1-kb PCR amplicon and has the potential to completely distinguish among a set of eight chromosomes. In practice, we find that developed markers are not completely informative. This is the combined result of marker sequence identity among two or more founders, genotyping only a subset of the available SNPs, genotyping assay failure, and residual segregating variation within founders. Despite the non-fully informative nature of the markers we have power to detect QTL because the HMM employed incorporates data from linked markers (BROMAN 2005). Unlinked markers only provide information on the specific marked segment of the chromosome, whereas a set of linked markers provides information across the linkage group. The level of information increases with marker density (relative to the average distance between recombination breakpoints) even if the markers remain only partially informative. By extension, instead of attempting to develop highly informative markers, it is possible to apply the HMM to a relatively dense genomewide set of genotyped biallelic SNPs. Future studies of eight-way recombinant Drosophila populations could take advantage of this possibility, but such an approach awaits the development of a genomewide bank of intermediate-frequency SNPs for D. melanogaster, as well as some means of inexpensively genotyping those SNPs.
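The point that linked, partially informative markers jointly recover founder ancestry can be illustrated with a small forward-backward calculation. The sketch below is a toy model rather than the HMM of BROMAN (2005): the transition model (a fixed switch probability between adjacent markers, with a uniform choice of founder after a switch), the genotyping error rate, and the three hypothetical biallelic markers are all assumptions chosen for brevity.

```python
def founder_posteriors(observed, founder_alleles, switch_prob=0.05, error=0.01):
    """Posterior founder probabilities at each marker (forward-backward).

    observed[m]        : allele seen at marker m on one recombinant
                         chromosome (None if the genotype is missing)
    founder_alleles[m] : allele carried by each founder at marker m
    """
    n_founders = len(founder_alleles[0])
    n_markers = len(observed)

    def emission(m, f):
        if observed[m] is None:
            return 1.0
        return 1.0 - error if founder_alleles[m][f] == observed[m] else error

    def transition(f, g):
        stay = 1.0 - switch_prob if f == g else 0.0
        return stay + switch_prob / n_founders

    def normalise(v):
        total = sum(v)
        return [x / total for x in v]

    # Forward pass, rescaled at every marker to avoid underflow.
    forward = [normalise([emission(0, f) / n_founders
                          for f in range(n_founders)])]
    for m in range(1, n_markers):
        prev = forward[-1]
        forward.append(normalise([
            emission(m, g) * sum(prev[f] * transition(f, g)
                                 for f in range(n_founders))
            for g in range(n_founders)]))

    # Backward pass, also rescaled.
    backward = [[1.0] * n_founders for _ in range(n_markers)]
    for m in range(n_markers - 2, -1, -1):
        nxt = backward[m + 1]
        backward[m] = normalise([
            sum(transition(f, g) * emission(m + 1, g) * nxt[g]
                for g in range(n_founders))
            for f in range(n_founders)])

    return [normalise([a * b for a, b in zip(forward[m], backward[m])])
            for m in range(n_markers)]


# Three hypothetical biallelic SNPs; each one alone splits eight founders 4:4.
founder_alleles = [[(f >> m) & 1 for f in range(8)] for m in range(3)]
posteriors = founder_posteriors([1, 0, 1], founder_alleles)
single_marker = founder_posteriors([0], [founder_alleles[1]])
```

In this toy case no single SNP can narrow the ancestry beyond four equally likely founders, whereas the three linked SNPs together concentrate the posterior at the middle marker on one founder.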
X-linked bristle number QTL:
Drosophila bristle number is arguably the best-studied quantitative trait, which, coupled with its easy and accurate scoring, permitted a rigorous test of our mapping methodology. A strong expectation was that we would identify bristle number QTL at the distal tip of the X chromosome, as factors influencing both sternopleural and abdominal bristle number have been identified in this region in previous studies (LONG et al. 1995; GURGANUS et al. 1998, 1999; NUZHDIN et al. 1999; DILDA and MACKAY 2002). In a coarse-mapping experiment we identified QTL at the tip of the X for sternopleural bristle number (SBN) for both sexes in both synthetic populations, and for female abdominal bristle number (ABN) in just the pB population (Figure 4). Additionally, we found a QTL for male ABN in population pB in the middle of the X chromosome, but no corresponding QTL in the pA population, and a suggestive peak for male SBN in a similar position (Figure 4D). The limited resolution of the significant factors (8.3 cM on average) in the coarse-mapping experiment bars identification of the underlying molecular basis of mapped QTL, a commonly observed shortcoming associated with standard inbred line QTL mapping strategies (MACKAY 2001). Therefore, we took advantage of the increased mapping resolution we can achieve by maintaining our synthetic population for many generations, and chose to fine map two interesting QTL regions (the tip of the X and the middle of the X) in males only. This prevented comparison of any fine-mapped QTL between the sexes; however, the sex-specific nature of bristle number QTL/QTN is well established (LAI et al. 1994; LONG et al. 1995, 1998, 2000; LYMAN et al. 1999).
On average, fine-mapped QTL were resolved to 1.3 cM, with the large male pB ABN QTL resolved to just 0.9 cM. These intervals implicate genetically tractable physical distances, and suggest a handful of genes for further study. The best bristle number candidate genes at the tip of the X chromosome are the achaete-scute complex (ASC) and Notch. Associations between polymorphisms at ASC and bristle number variation were first seen by MACKAY and LANGLEY (1990), extended and confirmed by LONG et al. (2000), and more fully explored by GRUBER et al. (2007). ASC is located under QTL peaks QTL1 and QTL2, and segregating loci at ASC might plausibly be involved in the expression of these QTL. Unfortunately, the very tip of the X chromosome in Drosophila has a markedly reduced crossover rate relative to physical distance compared to the rest of the chromosome, and LD extends over large physical distances (AGUADÉ et al. 1989). Thus, the prospect for identifying the actual causal locus, rather than a locus in strong LD with the causal site, contributing to QTL1 and QTL2 is somewhat bleak. The Notch pathway is involved in the cell fate decisions that lead to bristle specification, and mutations of the component genes alter bristle patterning and spacing (reviewed by ARTAVANIS-TSAKONAS et al. 1999; LAI 2004). Thus, Notch is considered a viable candidate gene for bristle number variation, although no formal association mapping-style experiment has been performed across the region. The fine-mapping experiment presented here suggests that Notch is unlikely to contribute to segregating variation for male ABN, but we cannot completely rule out an effect of Notch on SBN.
The two QTL mapped to the middle of the X chromosome are particularly interesting as we could find no good evidence for similar QTL in other studies that have scanned the X chromosome (LONG et al. 1995; GURGANUS et al. 1998, 1999; NUZHDIN et al. 1999; DILDA and MACKAY 2002). This is probably because for both QTL the minor allele is rare in our experiment (
), therefore it is not likely that these QTL segregated between pairs of inbred lines studied previously. Together the two QTL intervals harbor 26 genes (just 13 under the smaller QTL4 interval), and none of these represent classic bristle number candidate genes, although two genes, ocelliless and Lim kinase, have mutants that exhibit bristle defects (ROYET and FINKELSTEIN 1995; PUEYO et al. 2000). Despite overlap in the regions harboring the two QTL, our data show that while QTL4 and QTL5 are tightly linked, they are independent: the alleles for the two QTL are not in phase across the pB founder lines. Thus, they do not represent a single genetic factor having pleiotropic effects on the two bristle characters. Distinguishing independent factors that are within
1 cM highlights the power of our approach compared to standard QTL mapping between pairs of inbred lines. Since QTL4 and QTL5 map to a small region in the middle of the X chromosome having a high rate of recombination relative to physical distance, there is the potential to identify the actual