Genetics, Vol. 170, 401-408, May 2005, Copyright © 2005
doi:10.1534/genetics.104.033746

* Department of Biology, University of North Carolina, Chapel Hill, North Carolina 27599
Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina 27599
1 Corresponding author: Department of Biology, Campus Box 3280, University of North Carolina, Chapel Hill, NC 27599.
E-mail: tjv{at}bio.unc.edu
| ABSTRACT |
|---|
A method for choosing mapping samples on the basis of observable crossovers has previously been proposed, although not in the context of QTL mapping. Selective mapping is an experimental design strategy for genome-wide, high-density linkage mapping of molecular markers in experimental crosses (VISION et al. 2000). In the first step, a limited number of framework markers are genotyped in a large base population. From the resultant genotype matrix, individuals are selected that collectively provide good coverage (as defined below) of the crossover sites in the larger population. Large numbers of secondary markers can then be genotyped on the selected sample and their positions inferred relative to the previously mapped framework markers. The resolution obtained with selective mapping for a given investment of genotyping effort can considerably exceed that obtained using an equivalently sized random sample of individuals. The gain is most dramatic for small genomes (<1000 cM).
In principle, a similar strategy could also be applied to QTL mapping with the aim of maximizing the resolution obtained when only a limited number of permanent genotypes can be propagated or phenotyped. Although it is generally undesirable to use a small sample for QTL mapping when a larger one is available due to the limited QTL detection power and inaccurate estimates of genetic effect sizes obtained with small samples (BEAVIS 1998), practical constraints on sample size are commonplace. For instance, the same genotypes may need to be phenotyped at multiple sites in multiple years, and financial constraints may set an upper limit to the number of genotypes that can be used (see also JIN et al. 2004).
We refer to the choice of individuals for phenotyping on the basis of their genotypes, namely the inferred positions of crossover sites, as selective sampling. Here we study the statistical consequences of using such a selected sample. In particular, we use simulations to quantify the effects of selective sampling on QTL detection power, sensitivity, and specificity and on the precision of estimated QTL positions.
| METHODS |
|---|
For most experiments, we used a base population of diploid recombinant inbred lines (RILs), each line derived by recurrent selfing of a unique member of an F2 population. Backcross recombinant inbred line (BRIL) and doubled haploid (DH) base populations were also studied where indicated. Each individual was assumed to have a single linkage group of length L cM, where L varied according to the experiment. At a given marker density, marker positions were assigned with even spacing or uniformly at random. The expected number of crossovers in each individual was calculated as z_r = 2 × (1/100)L, since the cumulative number of crossovers is approximately twofold higher in a late-generation selfed F2 RIL than in an F2 individual (HALDANE and WADDINGTON 1931). The realized number z_i of crossovers in individual i was simulated as a Poisson random variable with expectation z_r. The locations of the crossovers were drawn from a uniform distribution on (0, L) conditional on z_i.
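The crossover simulation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the Poisson draw uses Knuth's multiplication method because the Python standard library has no Poisson sampler, and only the F2 RIL expectation of 2 × (1/100)L crossovers is covered.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson variate with mean lam (Knuth's multiplication method)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_ril_crossovers(rng, length_cm):
    """Return sorted crossover positions (cM) for one simulated F2 RIL."""
    expected = 2.0 * length_cm / 100.0   # expected crossovers per line
    count = poisson(rng, expected)       # realized number of crossovers
    # Given the count, positions are uniform on (0, L).
    return sorted(rng.uniform(0.0, length_cm) for _ in range(count))


def simulate_base_population(n_lines, length_cm, seed=0):
    """Simulate crossover positions for n_lines independent lines."""
    rng = random.Random(seed)
    return [simulate_ril_crossovers(rng, length_cm) for _ in range(n_lines)]
```

Each line is represented only by its list of crossover positions; marker genotypes would be derived from these positions in a fuller simulation.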
Sampling from a base population:
From a base population, n individuals were selected either at random or using selective sampling with the sum of squares of bin lengths (SSBL) objective function (VISION et al. 2000). The objective function can be understood as follows. A bin is defined on a sample of individuals as an interval along the linkage group within which there are no crossovers in any sampled individual and that is bounded on each side either by a crossover in at least one individual or by the end of the linkage group (VISION et al. 2000). By minimizing the sum of the squares of the bin lengths, we obtain a sample of individuals in which crossovers are more frequent, and the distance between them less variable, than in a random sample. Previous work has shown that SSBL behaves well even when framework markers are widely spaced and the genotyping error rate is high (VISION et al. 2000).
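To make the bin definition concrete, the sketch below computes bin lengths and the SSBL for a sample, and then picks individuals greedily. It rests on two assumptions that do not come from the text: crossover positions are treated as known exactly (in practice they are only localized to marker intervals), and the greedy search is a stand-in for illustration, not a description of how MapPop optimizes the objective. All function names are hypothetical.

```python
def bin_lengths(sample, length_cm):
    """Lengths of the bins defined by all crossovers in the sample.

    sample: list of per-individual crossover position lists (cM).
    """
    breakpoints = sorted(
        {x for individual in sample for x in individual if 0.0 < x < length_cm}
    )
    edges = [0.0] + breakpoints + [length_cm]
    return [b - a for a, b in zip(edges, edges[1:])]


def ssbl(sample, length_cm):
    """Sum of squares of bin lengths; smaller values mean finer, more even bins."""
    return sum(d * d for d in bin_lengths(sample, length_cm))


def greedy_select(population, n, length_cm):
    """Illustrative greedy search: repeatedly add the individual that
    gives the lowest SSBL for the sample chosen so far."""
    chosen = []
    remaining = list(range(len(population)))
    for _ in range(min(n, len(remaining))):
        best = min(
            remaining,
            key=lambda i: ssbl([population[j] for j in chosen + [i]], length_cm),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

For a 100-cM linkage group and the toy population `[[50.0], [10.0], [25.0, 75.0], []]`, the first pick is the individual with crossovers at 25 and 75 cM (SSBL 3750), and the second is the one with a crossover at 50 cM, which splits the longest remaining bin.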
The proportion of individuals from the base population present in either the random or the selected sample is termed the sample fraction, symbolized by f. Selective sampling was performed using the MapPop software package (http://www.bio.unc.edu/faculty/vision/lab/mappop/).
Crossover enrichment:
Use of the SSBL objective function is expected to lead to an enrichment of crossovers in a selected sample. The total number of crossovers in the selected sample relative to that expected in a random sample of the same size is referred to as the crossover enrichment (CE). CE was measured for selected samples drawn from simulated base populations in which the sample fraction, marker density, and map length were varied.
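The definition of CE amounts to a single ratio, sketched below. The function name is hypothetical, and the expected number of crossovers per individual is passed in explicitly because it depends on the population type (for an F2 RIL the text gives 2 × (1/100)L).

```python
def crossover_enrichment(selected_sample, expected_per_individual):
    """CE = observed crossovers in the selected sample divided by the
    number expected in a random sample of the same size.

    selected_sample: list of per-individual crossover position lists.
    expected_per_individual: expected crossovers per individual in the
        base population (for example 2 * L / 100 for an F2 RIL).
    """
    observed = sum(len(individual) for individual in selected_sample)
    expected = len(selected_sample) * expected_per_individual
    return observed / expected
```

A value of 1 means the selected sample carries no more crossovers than a random one; values above 1 indicate enrichment.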
Pseudointerference:
In addition to crossover enrichment, use of the SSBL objective function is expected to produce bin lengths that are less variable than those in a random sample. This phenomenon, which we call pseudointerference, differs from standard crossover interference in that it arises from selection of crossovers present in different individuals rather than from biological interference among crossovers during meiosis.
We use the Karlin map function (KARLIN 1984) to quantify the magnitude of pseudointerference. A map function models the relationship between m, the genetic distance in morgans, and r, the recombination rate. In random samples, the positions of crossover sites should be well fit by the Haldane map function, which assumes sites are spaced uniformly at random. When there is positive or negative interference among crossovers, a different map function is needed.
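For our purposes, the key property of the Karlin map function,

$$r = \tfrac{1}{2}\left[1 - \left(1 - \frac{2m}{N}\right)^{N}\right], \qquad (1)$$

is that its single parameter N indexes the strength of interference: N = 1 corresponds to complete interference (r = m), and as N grows large the function converges to the Haldane map function.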
In addition, when QTL analysis is done using interval mapping (LANDER and BOTSTEIN 1989), it is necessary to specify the map function to accurately estimate the position of a QTL relative to its flanking markers. Another motivation for this aspect of the study is thus to provide some guidance as to what map function would be appropriate to use for interval mapping on selected samples.
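The two map functions discussed above can be compared numerically. The sketch assumes the binomial form usually attributed to Karlin (1984), r = ½[1 − (1 − 2m/N)^N], together with the standard Haldane function r = ½(1 − e^(−2m)); the function names are illustrative.

```python
import math


def haldane(m):
    """Haldane map function: recombination rate for distance m (morgans)."""
    return 0.5 * (1.0 - math.exp(-2.0 * m))


def karlin(m, n):
    """Karlin (binomial) map function with interference parameter n.

    The formula applies for 0 <= m < n / 2; at or beyond n / 2 the
    recombination rate is taken to be its maximum, 0.5.
    """
    if m >= n / 2.0:
        return 0.5
    return 0.5 * (1.0 - (1.0 - 2.0 * m / n) ** n)
```

With n = 1 the function reduces to r = m (complete interference). For large n it approaches the Haldane value, which is why a large fitted N is read as negligible interference.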
We fit the Karlin map function to recombination data from selected samples and estimated the magnitude of N. Map positions were first rescaled by CE. Let the number of crossovers in the interval from 0 to i cM in individual j be denoted $x_{ij}$. For an interval to be considered recombinant, we required that $\mathrm{mod}(x_{ij}, 2) = 1$ (i.e., there must be an odd number of crossovers in the interval). The recombination rate for the ith interval was calculated as $\sum_j \mathrm{mod}(x_{ij}, 2)/S$, where S is the number of individuals in the sample. Equation 1 was then fit by nonlinear regression using SAS (Cary, NC) to data that included all intervals of integral length starting at 0. While the intervals were not strictly independent, obtaining independent intervals with selective mapping would have been computationally prohibitive, and the large sample size (10,000 individuals) ensured that estimates of N were stable.
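The parity rule and the fit can be illustrated as below. This is a sketch under stated assumptions: the Karlin function is taken in its usual binomial form, the rescaling of map positions by CE is omitted, and a least-squares search over a grid of candidate N values stands in for the SAS nonlinear regression used in the study. Function names are hypothetical.

```python
def karlin(m, n):
    """Karlin (binomial) map function; m in morgans, capped at r = 0.5."""
    if m >= n / 2.0:
        return 0.5
    return 0.5 * (1.0 - (1.0 - 2.0 * m / n) ** n)


def recombination_rates(sample, max_cm):
    """Recombination rate for each interval (0, i], i = 1..max_cm.

    An individual is recombinant for an interval when it has an odd
    number of crossovers inside it.
    """
    rates = []
    for i in range(1, max_cm + 1):
        odd = sum(sum(1 for x in ind if x <= i) % 2 for ind in sample)
        rates.append(odd / len(sample))
    return rates


def fit_karlin_n(rates, candidates):
    """Return the candidate N minimizing the squared error between the
    Karlin curve and the observed rates (interval i spans i cM)."""
    def sse(n):
        return sum(
            (karlin(i / 100.0, n) - r) ** 2
            for i, r in enumerate(rates, start=1)
        )
    return min(candidates, key=sse)
```

A grid search is crude compared with a proper nonlinear least-squares routine, but it makes the objective being minimized explicit.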
CE-adjusted random samples:
Differences in QTL estimates obtained with a selected sample vs. a random sample of the same size may be due to CE and/or pseudointerference. To separately investigate these two factors, we employed CE-adjusted random samples in which the expected number of crossovers was equal to z_s = z_r × CE. The realized number z_i of crossovers in individual i was simulated as a Poisson random variable with expectation z_s, and the locations of the crossovers were drawn from a uniform distribution on (0, L) conditional on z_i. Thus, CE-adjusted samples are free of pseudointerference. By comparison of CE-adjusted random samples with varying levels of CE, the effects of CE alone can be isolated. Alternatively, by comparison of a selected sample with a CE-adjusted random sample of equivalent CE, the effects of pseudointerference alone can be isolated.
QTL experiments:
Simulation of QTL:
To study the effects of selective sampling on QTL analysis, loci affecting a quantitative trait were added to the base populations simulated above. The additive effect was parameterized as −a and +a for QTL genotypes qq and QQ, respectively. Environmental deviations were drawn from a standard normal distribution. To calculate the heritability h², we used the fact that the additive genetic variance contributed by a QTL is V_g = a² in an F2 RIL population. For simulations where additive effects were considered to be random variables, they were sampled from a gamma (1, 2) distribution (ZENG 1992). In simulations involving multiple QTL, the loci were constrained to be spaced at least 100 cM apart.
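The phenotype model for a single QTL can be sketched as follows. The sketch assumes genotypic values of −a for qq and +a for QQ and a unit environmental variance (standard normal deviations), so that the heritability for one QTL is a²/(a² + 1). The function names are hypothetical, and the gamma-distributed effects are left out because the parameterization of gamma (1, 2) is not stated.

```python
import random


def heritability(a):
    """Heritability for one QTL of additive effect a when the genetic
    variance is a**2 and the environmental variance is 1."""
    return a * a / (a * a + 1.0)


def simulate_phenotypes(qtl_genotypes, a, rng):
    """Phenotype = genotypic value (-a for 'qq', +a for 'QQ') plus a
    standard normal environmental deviation."""
    return [
        (a if genotype == "QQ" else -a) + rng.gauss(0.0, 1.0)
        for genotype in qtl_genotypes
    ]
```

For example, an additive effect of a = 0.5 corresponds to a heritability of 0.25/1.25 = 0.2 under these assumptions.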
QTL analysis:
QTL analysis was performed via single-marker analysis as implemented in QTL Cartographer v. 1.16 (BASTEN et al. 2002). At each marker, the following model was fit: $y_i = b_0 + b_1 x_i + e$ (i = 1, 2, ... , total number of individuals), where $y_i$ is the phenotype of the ith individual, $x_i$ is the indicator variable for the marker genotype, and the error e is assumed to be normally distributed with mean 0 and variance $\sigma_e^2$ (BASTEN et al. 2002). A likelihood-ratio (LR) test statistic was computed to test the null hypothesis $H_0$: $b_1 = 0$ vs. the alternative hypothesis $H_1$: $b_1 \neq 0$. To estimate the genome-wide significance threshold for the LR, data were simulated under the null hypothesis that no QTL was present. The $(1 - \alpha)$100th percentile of the maximum LR was used as the threshold to control the genome-wide type I error at $\alpha$ (after DUPUIS and SIEGMUND 1999).
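The single-marker test and the empirical threshold can be sketched as below. This is not QTL Cartographer's implementation. It assumes the usual result for a normal linear model, LR = n ln(RSS0/RSS1), where RSS0 and RSS1 are the residual sums of squares without and with the marker term. As a simplification, the null simulation keeps the marker genotypes fixed and redraws only the phenotypes. All names are hypothetical.

```python
import math
import random


def likelihood_ratio(x, y):
    """LR statistic for H0: b1 = 0 in y = b0 + b1 * x + e (normal errors)."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    rss0 = sum((yi - mean_y) ** 2 for yi in y)   # intercept-only model
    if sxx == 0.0 or rss0 == 0.0:
        return 0.0                               # marker or trait is constant
    rss1 = rss0 - sxy * sxy / sxx                # model with marker term
    if rss1 <= 0.0:
        return math.inf                          # perfect fit
    return n * math.log(rss0 / rss1)


def empirical_threshold(marker_matrix, n_reps, alpha, rng):
    """(1 - alpha) quantile of the maximum LR across markers when
    phenotypes are pure noise (no QTL).

    marker_matrix[k] holds the genotype codes at marker k.
    """
    n = len(marker_matrix[0])
    maxima = []
    for _ in range(n_reps):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        maxima.append(max(likelihood_ratio(x, y) for x in marker_matrix))
    maxima.sort()
    index = min(n_reps - 1, math.ceil((1.0 - alpha) * n_reps) - 1)
    return maxima[index]
```

Taking the maximum over markers within each replicate is what makes the threshold genome-wide and not pointwise.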
To estimate the effect of selective sampling on detection power, QTL analysis was performed on samples differing only in crossover enrichment (with uniformly random crossover sites) or on selected vs. CE-adjusted random samples. These two comparisons allow us to separate the effects of crossover enrichment and pseudointerference on the power to detect a QTL and on the precision with which it is located. The detection power was defined to be the probability that the maximum LR at any marker or position exceeded the significance threshold for a type I error of $\alpha$ = 0.05.
To calculate sensitivity and specificity of QTL detection, we adopted the following conventions. A peak was defined as a point where the LR value exceeded both the significance threshold and the LR values of adjacent points (or point, if at the end of a linkage group). The range of the peak was taken to be the interval on either side of the peak bounded by the end of the linkage group or by that point closest to the peak with an LR value below the threshold, whichever came first. The position of the highest LR peak within the range was taken to be the QTL position. If the range bracketed a true QTL position, then the peak was counted as a true positive (TP); if not, it was counted as a false positive (FP). If a true QTL position was not bracketed by the range of any peak, then it was counted as a false negative (FN). Sensitivity (Sn) and specificity (Sp) were then calculated as follows:
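$$\mathrm{Sn} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}, \qquad \mathrm{Sp} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}.$$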
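The peak and range conventions above can be sketched in code. Two assumptions are made. First, each contiguous run of points above the threshold is treated as one range whose highest point is the estimated QTL position, which is one reading of the conventions. Second, sensitivity and specificity are taken to be TP/(TP + FN) and TP/(TP + FP), the forms that can be built from the three counts the text defines. Function names are hypothetical.

```python
def peak_ranges(positions, lr, threshold):
    """Return (left, peak_position, right) for each run of LR values
    above the threshold.

    left and right are the nearest points at or below the threshold,
    or the ends of the linkage group.
    """
    ranges = []
    i, n = 0, len(lr)
    while i < n:
        if lr[i] > threshold:
            j = i
            while j + 1 < n and lr[j + 1] > threshold:
                j += 1
            best = max(range(i, j + 1), key=lambda k: lr[k])
            left = positions[i - 1] if i > 0 else positions[0]
            right = positions[j + 1] if j + 1 < n else positions[-1]
            ranges.append((left, positions[best], right))
            i = j + 1
        else:
            i += 1
    return ranges


def sensitivity_specificity(ranges, true_qtl):
    """Count TP, FP, FN from peak ranges and true QTL positions."""
    tp = sum(1 for lo, _, hi in ranges if any(lo <= q <= hi for q in true_qtl))
    fp = len(ranges) - tp
    fn = sum(
        1 for q in true_qtl if not any(lo <= q <= hi for lo, _, hi in ranges)
    )
    sn = tp / (tp + fn) if tp + fn else float("nan")
    sp = tp / (tp + fp) if tp + fp else float("nan")
    return sn, sp
```

With points every 10 cM, LR values `[1, 12, 15, 3, 2, 11, 1]`, and a threshold of 10, there are two ranges, (0, 20, 30) and (40, 50, 60). True QTL at 22 and 35 cM then give one TP, one FP, and one FN.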
QTL mapping resolution was measured using two different methods. The first was to take the difference between the 2.5 and 97.5% quantiles of the estimated QTL position among a set of independent QTL populations sharing a fixed QTL position and effect size (DARVASI and SOLLER 1997). To approximate an infinite number of markers, we used an even density of 10 markers per centimorgan for QTL analysis and took the marker with the highest LR in each replicate to be the estimated QTL position. The second method was to calculate the 1-LOD drop support interval, defined as the distance between the two points on either side of the peak where the base-10 logarithm of the LR [the log of odds (LOD) score] declined by 1 unit. In the multiple-QTL simulations, only TP peak ranges were used for this calculation.
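The 1-LOD drop support interval can be computed on a grid of test positions as sketched below. The conversion LOD = LR / (2 ln 10) assumes that the LR statistic is twice the natural-log likelihood ratio. No interpolation between grid points is attempted, so each bound is the first grid point at which the LOD has fallen by at least one unit (or the end of the linkage group). Function names are hypothetical.

```python
import math


def lod_from_lr(lr_stat):
    """Convert an LR statistic (2 * ln likelihood ratio) to a LOD score."""
    return lr_stat / (2.0 * math.log(10.0))


def one_lod_interval(positions, lod, drop=1.0):
    """Return (left, right) bounds of the support interval around the
    highest LOD score in the profile."""
    peak = max(range(len(lod)), key=lambda k: lod[k])
    cutoff = lod[peak] - drop
    left = peak
    while left > 0 and lod[left] > cutoff:
        left -= 1
    right = peak
    while right < len(lod) - 1 and lod[right] > cutoff:
        right += 1
    return positions[left], positions[right]
```

For a profile with LOD scores `[0.5, 2.0, 3.5, 4.0, 3.2, 2.9, 1.0]` at positions 0 to 6 cM, the peak is 4.0 at 3 cM and the interval runs from 1 to 5 cM.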
| RESULTS AND DISCUSSION |
|---|
10 cM, CE was fairly insensitive to variability in marker spacing, reflecting the rarity of double-recombinant intervals in such maps. The inverse relationship of CE to map length agrees with previous studies showing that the effectiveness of selective sampling declines with map length (VISION et al. 2000; BROWN 2001).
10 cM, CE was nearly inversely proportional to the square root of the map length, L, and the sample fraction, f. Remarkably, CE could be very closely predicted by the empirical formula
![]() | (2) |
500. For BRIL and DH populations, A = 750 and 1200, respectively. Within the particular parameter range that we explored (L = [100, 2500], f = [0.1, 0.9], and marker spacing from 1 to 10 cM), nonlinear regression using Equation 2 yielded R2 values of 0.96, 0.98, and 0.98 for F2 RIL, BRIL, and DH, respectively. Note that simulations were excluded when the map length was short (100 cM) and the sample fraction was small (0.1), as these gave unusually large deviations. Also, Equation 2 was derived using simulated data in which markers were evenly spaced; CE was found to be smaller when markers were distributed uniformly at random but the difference was slight (<0.1).
Pseudointerference:
We measured the magnitude of pseudointerference in selected samples by fitting the Karlin map function to simulated recombination data. A large value of N (>5) indicates that pseudointerference is negligible. We examined base populations of 500 individuals with L = [100, 5000], f = [0.1, 0.9], and marker spacings of 5–20 cM. The best-fit value of N was found to be sensitive to all three factors: map length, sample fraction, and marker spacing (Figure 2). The map function in Equation 1 was fit to recombination data from 10,000 individuals for each parameter combination (where the individuals came from multiple selected samples). In all cases, the R² goodness-of-fit value was >0.94; for most parameter combinations, it was >0.97. Pseudointerference was greater when the sample fraction was small and the map length was short, consistent with the behavior of CE described above. A more surprising result was that pseudointerference was more pronounced when markers were more distantly spaced.
In support of this hypothesis, we found that there were more observed recombinations in selected samples than in random samples even when random samples were adjusted to have the same expected number of crossovers (Figure 3). Another way to understand the underlying effect of selection on crossovers and recombinations is to note the change in the distribution of intercrossover intervals within an individual in a selected sample. Remarkably, the mode in the intercrossover interval length occurs at the same centimorgan distance as the marker spacing used for selective sampling (Figure 4).
QTL analysis:
We examined the effect of selective sampling on QTL detection power for simulated base populations in which one QTL was segregating at an equal distance from two flanking markers. The threshold and power for a given experiment were determined empirically, as described in METHODS. We conducted two experiments to quantify the effects of CE and pseudointerference separately from one another.
In the first experiment, we compared CE-adjusted random samples with varying levels of CE under different marker spacings. This comparison allows us to evaluate the effect of CE alone since these samples are free of pseudointerference. We found that power was inversely related to CE but that the relationship was nearly flat when the marker spacing was <5 cM, corresponding to a marker-QTL distance of ≤2.5 cM (Figure 5A). Even for CE = 1, the detection power was inversely related to the marker spacing. This indicates that the increasing distance between the flanking markers and the QTL was more important to detection power than the variation in the significance threshold, which was lower for the more widely spaced markers.
For experimental design purposes, an investigator would like to know how dense markers need to be, when selecting a predetermined fraction f, to obtain the same QTL detection power as that of an equivalently sized random sample. To study this, we simulated populations in which QTL position was random with respect to the markers and compared random samples to selected samples differing in both CE and pseudointerference (Figure 6). In the case of a 1000-cM map, 51 markers in a random sample had equivalent power to an f = 0.5 selected sample with 59 markers or an f = 0.1 selected sample with 72 markers. Thus, a modest increase in marker density can counteract the effects of increased CE and pseudointerference even under extreme selection.
QTL mapping resolution:
Since QTL detection power is reduced considerably for distant markers, but less so for nearby ones, we hypothesized that the confidence or support intervals to which QTL are located in selected samples might be smaller than those in random ones. To test this hypothesis, we examined the effect of selective sampling on two different measures of QTL mapping resolution: the distribution of QTL peak positions among independent simulations (DARVASI and SOLLER 1997) and the 1-LOD drop support interval. For both measures, QTL were located with substantially greater precision in selected samples. The 1-LOD drop support interval results are shown in Figure 8A for simulated populations in which one QTL was segregating. The increase in QTL mapping resolution was similar across a range of additive-effect sizes. Precision was substantially better when selection was done with a shorter map. Precision was also improved, but not as dramatically, by selection using densely spaced markers.
Conclusions:
In summary, we have found that the probability of detecting a QTL is somewhat diminished in a selected sample relative to a random one when the QTL is far from a marker. But since this reduction in power disappears when the distance between the marker and the QTL approaches zero, the width of the confidence interval surrounding the QTL is narrowed, resulting in a more precise estimate of QTL location. The increased marker density needed to take advantage of this increased resolution is fairly modest. Additionally, specificity in QTL detection is slightly higher in selected samples.
One reason for the difference between selected and random samples is the increased number of crossover sites, or CE. We found that a simple formula can be used to predict CE in a selected sample when the marker spacing is dense (≤10 cM). The value of CE thus obtained can be used to adjust the genetic map prior to statistical analysis of QTL.
A second factor affecting QTL detection power and resolution is the reduced variability in intercrossover interval length within each individual, or what we have termed pseudointerference.
QTL mapping is widely used as a first step in the determination of the molecular basis of phenotypic variation relevant to agriculture, medicine, ecology, and evolutionary biology (MACKAY 2001). Since, for most organisms, QTL intervals can encompass tens to hundreds, even thousands, of genes, the major effort in cloning a QTL is the work required to refine the estimated position once a genome-wide survey has been completed (REMINGTON et al. 2001). Thus the feasibility of QTL mapping hinges, in part, on the precision with which QTL can be located during this initial scan. A number of strategies are available for increasing QTL map resolution, among them the use of large populations and multiple generations of intercrossing. We have shown here that selective sampling is an additional strategy that can be used to increase QTL map resolution over a random sample of the same size. Selective sampling is suitable for situations in which the number of individuals that can be genotyped is large but the number that can be phenotyped is not. This situation is likely to become increasingly common as large, permanent, genotyped mapping populations are produced for various model organisms and as QTL mapping is applied to ever more subtle and complex traits (e.g., THREADGILL et al. 2002).
| LITERATURE CITED |
|---|
BASTEN, C. J., B. S. WEIR and Z-B. ZENG, 2002 QTL Cartographer: A Reference Manual and Tutorial for QTL Mapping. (http://statgen.ncsu.edu/qtlcart/manual/).
BEAVIS, W. D., 1998 QTL analyses: power, precision and accuracy, pp. 145–161 in Molecular Analysis of Complex Traits, edited by A. H. PATERSON. CRC Press, Boca Raton, FL.
BROWN, D. G., 2001 A probabilistic analysis of a greedy algorithm arising from computational biology. Proceedings of the 12th Annual ACM-SIAM Symposium on Discrete Algorithms, Washington, DC, pp. 206–207.
DARVASI, A., and M. SOLLER, 1997 A simple method to calculate resolving power and confidence interval of QTL map location. Behav. Genet. 27: 125–132.
DOGANLAR, S., A. FRARY, H.-M. KU and S. D. TANKSLEY, 2002 Mapping quantitative trait loci in inbred backcross lines of Lycopersicon pimpinellifolium (LA1589). Genomics 45: 1189–1202.
DUPUIS, J., and D. SIEGMUND, 1999 Statistical methods for mapping quantitative trait loci from a dense set of markers. Genetics 151: 373–386.
HALDANE, J. B. S., and C. H. WADDINGTON, 1931 Inbreeding and linkage. Genetics 16: 357–374.
JIN, C., L. LAN, A. D. ATTIE, G. A. CHURCHILL, D. BULUTUGLO et al., 2004 Selective phenotyping for increased efficiency in genetic mapping studies. Genetics 168: 2285–2293.
KARLIN, S., 1984 Theoretical aspects of genetic map functions in recombination processes, pp. 209–228 in Human Population Genetics: The Pittsburgh Symposium, edited by A. CHAKRAVARTI. Van Nostrand Reinhold, New York.
LANDER, E. S., and D. BOTSTEIN, 1989 Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121: 185–199.
MACKAY, T. F., 2001 The genetic architecture of quantitative traits. Annu. Rev. Genet. 35: 303–339.
NADEAU, J. H., and D. FRANKEL, 2000 The roads from phenotypic variation to gene discovery: mutagenesis versus QTL. Nat. Genet. 25: 381–384.
REMINGTON, D. L., M. C. UNGERER and M. D. PURUGGANAN, 2001 Map-based cloning of quantitative trait loci: progress and prospects. Genet. Res. 78: 213–238.
RONIN, Y., A. KOROL, M. SHTEMBERG, E. NEVO and M. SOLLER, 2003 High-resolution mapping of quantitative trait loci by selective recombinant genotyping. Genetics 164: 1657–1666.
THREADGILL, D. W., K. W. HUNTER and R. W. WILLIAMS, 2002 Genetic dissection of complex and quantitative traits: from fantasy to reality via a community effort. Mamm. Genome 13: 175–178.
VISION, T. J., D. G. BROWN, D. B. SHMOYS, R. T. DURRETT and S. D. TANKSLEY, 2000 Selective mapping: a strategy for optimizing the construction of high-density linkage maps. Genetics 155: 407–420.
ZENG, Z-B., 1992 Correcting the bias of Wright's estimates of the number of genes affecting a quantitative trait: a further improved method. Genetics 131: 987–1001.
ZENG, Z-B., 1994 Precision mapping of quantitative trait loci. Genetics 136: 1457–1468.
Communicating editor: A. H. PATERSON