Abstract
Mapping quantitative trait loci (QTLs) is usually conducted with a single line cross. The power of such QTL mapping depends highly on the two parental lines. If the two lines are fixed for the same allele at a putative QTL, the QTL is undetectable. On the other hand, if a QTL is segregating in the line cross and is detected, the estimated variance of the QTL cannot be extrapolated beyond the statistical inference space of the two parental lines. To reduce the likelihood of missing a QTL and to increase the statistical inference space of the estimated QTL variance, we present a consensus QTL mapping strategy. We adopt the identical-by-descent (IBD)-based variance component method originally applied to human linkage analysis by combining multiple line crosses as independent families. We explore the properties of consensus QTL mapping and demonstrate the method with F_{2}, backcross (BC), and full-sib (FS) families. In addition, we examine the effects of the QTL heritability, marker informativeness, QTL position, the number of families, and family size. We show that F_{2} families notably outperform BC and FS families in detecting a QTL. There is a substantial reduction in the standard deviation of the estimated QTL position and the separation of the QTL and polygenic variance. Finally, we show that the power to detect a QTL is greater when using a small number of large families than a large number of small families.
Line crossing is a common experimental design for mapping quantitative trait loci (QTLs) in plants and laboratory animals. Crosses are initiated from at least two inbred lines and include backcrosses (BC), F_{2}, and more advanced generations. Statistical methods are well developed for QTL mapping using such line crossing data (Lander and Botstein 1989; Haley and Knott 1992; Martínez and Curnow 1992; Jansen 1993; Zeng 1994). These methods are mainly designed to handle a single line cross. Line crossing experiments have three characteristics: (1) a small number of parental lines are involved, (2) the linkage phases of the parental markers are known, and (3) family sizes are usually large. These properties allow the effects of a gene substitution to be tested directly. The methods developed by the above authors all test the effects of a gene substitution (the first moments) and are therefore referred to as the fixed model approach (Xu and Atchley 1995).
Quantitative geneticists are interested not only in detecting QTLs and locating their positions, but also in estimating the contribution of the detected QTLs to a trait. The contribution of a QTL, however, is only meaningful when expressed relative to the total phenotypic variance. Therefore, the effect of a QTL is actually measured by its variance. In a single line cross, the QTL variance is relative to the genetic variance among individuals within that line cross; i.e., the QTL variance is formulated as conditional on the cross. As a result, the variance itself is a variable that differs from one cross to another. Therefore, a QTL variance estimated from a single line cross cannot be extended to a statistical inference space beyond that cross. In addition, the number of founder alleles at any locus is expected to be small in a line cross. For instance, there are at most two alleles at each locus in an F_{2} family. With such a single line cross, one's entire effort is invested in this single large family. If the two founder alleles of a QTL are polymorphic, then detection of the QTL is possible with a relatively large family. On the other hand, if the two parental lines are fixed for the same allele at a particular QTL, then this QTL is undetectable, independent of the sample size. To increase the statistical inference space of the estimated QTL variance and ensure that polymorphic alleles are present in the parental gene pool, one needs to sample a sufficient number of parents (Muranty 1996). This can be achieved by combining data from multiple line crosses.
Suppose that there are 10 F_{2} families derived from 10 pairs of inbred lines. What is the appropriate statistical method for analyzing the data from these F_{2} families? One may simply extend the regression approach to fit 2 additional parameters, 1 mean and 1 gene substitution effect, for each F_{2} family added to the data set. This means estimating 20 parameters and testing 10 additive effects. If dominance deviations are considered, 10 additional terms must be estimated and tested. In a single line, fixed model, one can easily convert the effect of a gene substitution into the variance via
Data from multiple line crosses, such as diallelic and four-way crosses, can occasionally be analyzed using the methods of Rebai and Goffinet (1993) and Xu (1996a). A survey of the literature shows that the most popular computer software packages, such as MAPMAKER/QTL (Lincoln et al. 1993), QTL Cartographer (Basten et al. 1997), MapQTL (van Ooijen and Maliepaard 1996), and MQTL (Tinker and Mather 1995), are designed to handle only a single line cross.
In contrast to the difficulties of the fixed model, the IBD-based variance component method initially developed for human genetic studies can handle multiple families (Haseman and Elston 1972). This method has been referred to as the random model approach because the QTL variance is directly estimated and tested (Xu and Atchley 1995). To separate the QTL variance from the polygenic variance, the IBD-based approach relies on variation in the proportion of genes IBD shared by relatives at the putative QTL. The random model approach is adopted here for combining data from different line crosses because each line cross is effectively a different family.
Before one can apply the random model approach to line crosses, one needs to adjust for the fact that regular full-sib (FS) families and families of line crosses differ in that the latter involve inbreeding (for example, an F_{2} individual is equivalent to a progeny resulting from a selfing parent). The traditional IBD-based method must therefore be modified to reflect the inbreeding effect. Our purpose here is to develop such an IBD-based random model methodology for combining data from different line crosses. We examine two types of line crosses: F_{2} and BC. We then compare the results with regular noninbred FS families.
STATISTICAL METHODS
Linear model and likelihood function: We combine line crosses by treating each line cross as a family and using a multipoint QTL mapping methodology. Consider a family with n individuals; the phenotypic value (y_{i}) of the ith individual is described as y_{i} = μ + g_{i} + a_{i} + e_{i} (Goldgar 1990; Xu and Atchley 1995), where μ is the overall mean, g_{i} is the additive effect of a putative QTL with mean 0 and variance
Under the assumption that y is multivariate normal and Π is known, the likelihood function for a particular family is
Assume that families are independent so that the overall likelihood function for multiple families is simply the product of these family-specific likelihoods. Therefore, the overall log likelihood for N families is the sum of the N family-specific log likelihoods.
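A minimal sketch of this summation is given here in Python. It assumes that the covariance matrix of each family has the usual variance component form V = Πσ²(QTL) + Aσ²(polygenic) + Iσ²(residual). All function and argument names are hypothetical, and a small pure-Python Cholesky factorization stands in for a numerical library. The sketch only illustrates the structure of the computation and is not the implementation used in this study.

```python
import math


def cholesky(m):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def family_loglik(y, pi, a, mu, var_qtl, var_poly, var_res):
    """Multivariate normal log likelihood of one family.

    Assumed covariance: V = pi * var_qtl + a * var_poly + I * var_res,
    where pi holds the IBD values at the putative QTL and a is the
    additive relationship matrix.
    """
    n = len(y)
    v = [[pi[i][j] * var_qtl + a[i][j] * var_poly + (var_res if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    low = cholesky(v)
    # Forward substitution: solve low * z = (y - mu), so (y-mu)' V^-1 (y-mu) = z'z.
    z = []
    for i in range(n):
        s = (y[i] - mu) - sum(low[i][k] * z[k] for k in range(i))
        z.append(s / low[i][i])
    quad = sum(t * t for t in z)
    logdet = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi) + logdet + quad)


def total_loglik(families, mu, var_qtl, var_poly, var_res):
    """Sum of family-specific log likelihoods (families assumed independent)."""
    return sum(
        family_loglik(f["y"], f["pi"], f["a"], mu, var_qtl, var_poly, var_res)
        for f in families
    )


def lr_statistic(loglik_null, loglik_alt):
    """Likelihood ratio statistic: -2 * (L0 - L1)."""
    return -2.0 * (loglik_null - loglik_alt)
```

Because Π and A are supplied per family, families from different mating designs can be mixed in the same list without changing the code.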
To test the presence of a QTL, a log likelihood ratio test statistic is used, which is Λ = −2(L_{0} − L_{1}), where L_{1} is the log likelihood value evaluated at the maximum likelihood solution under the alternative model (
The IBD value between two sibs at a QTL: Because of inbreeding, the IBD values among F_{2} individuals are different from those among regular full sibs. If the parental lines are fixed for alternative QTLs, then F_{2} individuals have three possible genotypes at a QTL: QQ, Qq, and qq. Given the genotypic configuration of individuals i and j, the IBD value is measured as
Without inbreeding, the IBD value of an individual with itself (π_{jj}) always takes a value of 1. Under inbreeding, π_{jj} can be greater than 1, depending on whether the individual is homozygous or heterozygous, i.e.,
The elements in the additive relationship matrix A are IBD values of the polygenic component and can be obtained by taking the unconditional expectation of π_{ij}. In an F_{2} family, A has elements of A_{ij} = E(π_{ij}) = 1 and A_{jj} = E(π_{jj}) = 3/2, in contrast to A_{ij} = ½ and A_{jj} = 1 in a regular FS family.
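The F_{2} expectations quoted above can be checked by direct enumeration. The sketch below assumes that the IBD value between two genotypes equals twice their coefficient of coancestry, computed here as half the number of matching allele pairs. It further assumes the Mendelian genotype frequencies 1/4, 1/2, and 1/4 for the progeny of a selfed heterozygous parent. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Genotype frequencies among the progeny of a selfed Qq parent (assumed).
GENO_PROB = {"QQ": Fraction(1, 4), "Qq": Fraction(1, 2), "qq": Fraction(1, 4)}


def ibd(g1, g2):
    """Assumed IBD value: half the number of matching allele pairs
    (equivalently, twice the coancestry of the two genotypes)."""
    matches = sum(1 for a in g1 for b in g2 if a == b)
    return Fraction(matches, 2)


def expected_sib_ibd():
    """Unconditional expectation of the IBD value between two F2 sibs."""
    return sum(
        GENO_PROB[g1] * GENO_PROB[g2] * ibd(g1, g2)
        for g1, g2 in product(GENO_PROB, repeat=2)
    )


def expected_self_ibd():
    """Unconditional expectation of the IBD value of an F2 individual with itself."""
    return sum(p * ibd(g, g) for g, p in GENO_PROB.items())


print(expected_sib_ibd())   # 1
print(expected_self_ibd())  # 3/2
```

Under these assumptions a homozygote has a self IBD value of 2 and a heterozygote a value of 1, which averages to the diagonal element 3/2.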
In BC populations, the π's are derived similarly. They are
Inferring the IBD value of a QTL from markers: The IBD value is completely determined by the genotypes of two individuals at the QTL of interest. The actual genotype of an individual, however, is not observable and it must be inferred from its marker information. In F_{2} and BC populations, two flanking markers are sufficient if Haldane's mapping function is assumed and the markers are completely informative. The conditional distribution of the QTL genotype given the genotypes of the flanking markers is given by Jiang and Zeng (1996). Denote the conditional probabilities of the three genotypes of the QTL by
The conditional expectations of the IBD in a BC population are
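As a hedged illustration of how flanking markers inform the QTL genotype, the sketch below computes the probability that a gamete from an F_{1} parent carries the QTL allele of the first parental line, given the two flanking marker alleles. It assumes Haldane's mapping function (no interference) and fully informative markers. The function names and the 0/1 coding of line origin are hypothetical, and the sketch is not a reproduction of the formulas of Jiang and Zeng (1996).

```python
import math


def haldane(d_cm):
    """Recombination fraction for a map distance in cM (Haldane's function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


def prob_qtl_allele(left, right, d_left_cm, d_right_cm):
    """P(gamete carries the line-1 QTL allele | flanking marker alleles).

    left, right: 1 if the marker allele comes from line 1, otherwise 0.
    d_left_cm, d_right_cm: distances from the QTL to each flanking marker.
    Without interference the two intervals recombine independently.
    """
    r1 = haldane(d_left_cm)
    r2 = haldane(d_right_cm)

    def joint(q):
        # P(QTL allele = q) * P(left marker | q) * P(right marker | q)
        p_left = (1.0 - r1) if left == q else r1
        p_right = (1.0 - r2) if right == q else r2
        return 0.5 * p_left * p_right

    return joint(1) / (joint(1) + joint(0))


# QTL midway between two markers 20 cM apart, both marker alleles from line 1.
print(prob_qtl_allele(1, 1, 10.0, 10.0))
```

In a BC family this gamete probability gives the conditional probability of the QTL genotype directly, because the recurrent parent always transmits the same allele.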
SIMULATION STUDIES
Individuals within an F_{2} family are equivalent to full sibs resulting from selfing a single parent. As a consequence, we randomly sampled a single parent from an infinitely large panmictic (or base) population. This single parent was then selfed to produce an F_{2} family. Individuals within a BC family were derived by crossing an F_{1} hybrid with one of its homozygous parents. The regular (noninbred) FS families were generated from the mating of two unrelated parents sampled from the base population. Families, including those of regular (noninbred) FS families, were analyzed via the maximum likelihood method (Xu and Atchley 1995).
To infer the IBD value of a QTL from markers, we used a multipoint methodology (Fulker et al. 1995; Kruglyak and Lander 1995; Olson 1995). In most cases, we simulated one chromosome of length 100 cM with six biallelic markers evenly spaced along the chromosome. The two alleles at each marker were equally frequent. A single QTL with six equally frequent alleles was simulated at position 50. In addition to the QTL of interest, we also simulated 12 independent biallelic loci of equal effects to form the polygenic contribution. A detailed description of the simulation process for random mating populations can be found in Gessler and Xu (1996).
We simulated 50 families each with 10 siblings (a total of 500 individuals). For each run, a single set of phenotypic values was generated with a QTL, polygenic, and residual variance of
To examine the effect of different factors on the performance of the methods, we varied each of the following factors successively: (a) the number of families × family size: 20 × 25, 250 × 2, or 500 × 2, (b) QTL heritability,
To estimate the strength of a false positive signal, we ran an additional 1000 simulations with no QTL segregating. We augmented the polygenic variance such that the total genetic variance remained unchanged. From each simulation we chose the maximum observed likelihood ratio (LR) found across the chromosome and then determined the 95th percentile from the list of 1000 runs as an estimate of the chromosomewise critical value.
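The thresholding step can be sketched as follows, assuming that each simulation yields a list of LR values along the chromosome. The function names are hypothetical, and the nearest-rank definition of the percentile is an assumption, since the percentile rule is not specified here.

```python
def empirical_threshold(null_profiles, percentile=95):
    """Chromosomewise critical value from null simulations.

    null_profiles: one list of LR values per simulation run with no QTL.
    Takes the maximum LR of each run, then the nearest-rank percentile
    of those maxima (an assumed percentile definition).
    """
    maxima = sorted(max(profile) for profile in null_profiles)
    rank = (percentile * len(maxima) + 99) // 100  # integer ceiling
    return maxima[rank - 1]


def empirical_power(alt_profiles, threshold):
    """Proportion of runs with a QTL whose maximum LR exceeds the threshold."""
    hits = sum(1 for profile in alt_profiles if max(profile) > threshold)
    return hits / len(alt_profiles)
```

With 1000 null runs, this rule returns the 950th smallest of the 1000 maximum LR values.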
RESULTS
The average likelihood ratio (test statistic) profiles over 100 replications of the three mating designs under the standard setting are depicted in Figure 1. It is evident that the F_{2} families notably outperform the two other mating designs. The benefit of QTL mapping using F_{2} families is manifest as a signal 70% higher than BC families, with BC families having a slightly higher signal than FS families. Since the critical values of the LR test statistic in the three mating populations are nearly equivalent, we conclude that QTL mapping using F_{2} families has a higher power than BC and FS families under the standard parameter setting.
Under the standard setting, the QTL position and the total phenotypic variance are successfully estimated, while the sum of the heritabilities,
The levels of the QTL effect (proportional to the heritability value at the QTL) and marker informativeness produce a strong effect on the precision of the estimated QTL position. As expected, a higher QTL effect or higher marker informativeness decreases the standard deviation of the estimated QTL position (Tables 2 and 3). However, the levels of QTL heritability or marker informativeness have a smaller effect on the precision of the phenotypic variance and estimated heritabilities. Higher marker informativeness tends to decrease the standard deviation of various ML estimates, while a decrease in marker informativeness leads to an increase in the confounding of
One clear feature in the simulations is that using a large sibship per family has a pronounced effect on the ability to detect the QTL. Figure 2 presents the results of sibships for three mating populations. The signal at the QTL with 10 or 25 sibs per family is 250 or 500% higher, respectively, than that for two sibs per family. In addition, with a fixed number of 500 individuals tested, increasing family size from 2 to 25 decreases the standard deviation of the estimated QTL position. It also increases the ability to separate the genetic variance into the polygenic and the QTL components (Tables 1 and 4). The standard deviation of the estimated phenotypic variance increases as the number of families decreases (Table 4).
Figure 3 shows the simulation results with the QTL at position 10 or 30 of a chromosome of length 100 cM. It is generated by taking the average value at each position, and this method shows no bias in predicting the position of the QTL. Alternatively, taking the average maximum value of each run produces a slight bias toward the center of the chromosome, as reported in Table 5. Of the three populations, the BC population has the largest bias in the estimated QTL position. This bias is caused by some runs where the QTL effect is not significant. In these situations, the QTL position, on average, tends to be close to the center.
The empirical threshold values of LR test statistics over 1000 replicated simulations are reported in Table 6. All three mating populations have nearly equivalent critical values. The average LR test statistics and the power estimates (Type I error rate at α = 0.05) over 100 replicated simulations are summarized in Table 7. First, the average LR in F_{2} families is notably greater than that in BC or FS families, whereas BC and FS families have similar test statistics and powers. Second, under the condition of low marker informativeness, FS families have higher power to detect the QTL than either F_{2} or BC families. Recall that an F_{2} family is generated from a single parent by selfing. Accordingly, only two alleles at a specific locus are randomly sampled from the reference population. The two alleles have a large probability of being in the same state under the condition of low marker informativeness. In contrast, to generate FS families, two parents or four alleles at a specific locus are randomly sampled. This process essentially reduces the chance of a locus being monomorphic and thus increases the marker and QTL informativeness. This explains why the FS design is more powerful than the F_{2} and BC designs when the marker information content is low.
DISCUSSION
What contributes to the variance in the estimate of the total phenotypic variance
Similarly, in Table 4 small families have smaller standard deviations. Note that the phenotypic variance can be partitioned into variance between families and variance within families. When the total number of individuals is fixed, reducing the number of families increases the standard deviation for the between-family variance component (genetic drift) and decreases the standard deviation for the within-family component. When the increase is greater than the decrease, the net effect on the estimated phenotypic variance is an increased standard deviation.
To make the most efficient use of marker data, many QTL mapping experiments are designed to detect QTLs for a number of economic traits, rather than only one trait (e.g., Edwards et al. 1987). Selecting parental lines that are fixed for alternative QTLs for multiple traits is a difficult task. The natural choice is to use more than two parental lines in a mating design. Limited investigations have shown that QTL mapping by using multiple line crosses has several advantages. First, it can handle multiple alleles at any locus and thus has a wider statistical inference space than a single line cross. Second, the use of mating designs with an increased number of parents is more efficient than the use of only one F_{2}-like FS family in outbred populations. This is because the variance attributable to the QTL is better estimated as the number of parents increases (Muranty 1996). However, with a fixed number of individuals, there is an optimal allocation between the number of families and the number of individuals per family where QTL mapping reaches its maximum power and minimum estimation error (Soller and Genizi 1978). Third, a joint test for multiple line crosses is more powerful than a test considering crosses independently (Rebai and Goffinet 1993).
For convenience of presentation, the consensus method of QTL mapping described above assumes that a dominance effect is absent. We now discuss how to relax this assumption using F_{2} family data as an example. The model can be described by y_{j} = μ + g_{j} + δ_{j} + s + e_{j}, where δ_{j} is the dominance deviation of a putative QTL with mean 0 and variance
In this study, we have used a random model methodology to detect a QTL. Essentially, the random model rests on the variability of the IBD proportion shared by sibs at the putative QTL. For example, the variance of the IBD proportion is, on average, 1/8 for noninbred full sibs and 3/16 for siblings from a BC. In contrast, the variance of the IBD proportion is 1/4 for siblings from F_{2}. This difference results in QTL mapping using F_{2} families being more powerful than BC or FS families, while both BC and FS families have similar test statistics and powers.
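These three variances can be recovered by enumerating the equally likely offspring genotypes of each mating design. The sketch below assumes fully informative loci and takes the IBD value between two sibs as half the number of matching allele pairs, that is, twice their coancestry. The allele labels, the function names, and the representation of each design by two parental genotypes are hypothetical.

```python
from fractions import Fraction
from itertools import product


def offspring(parent1, parent2):
    """All equally likely offspring genotypes (one allele from each parent)."""
    return [(a, b) for a in parent1 for b in parent2]


def ibd(g1, g2):
    """Assumed IBD value: half the number of matching allele pairs."""
    return Fraction(sum(1 for a in g1 for b in g2 if a == b), 2)


def sib_ibd_moments(parent1, parent2):
    """Mean and variance of the IBD value between two independent sibs."""
    kids = offspring(parent1, parent2)
    values = [ibd(g1, g2) for g1, g2 in product(kids, repeat=2)]
    n = len(values)
    mean = sum(values) / n
    var = sum(v * v for v in values) / n - mean * mean
    return mean, var


# Parental genotypes per design; distinct labels mean non-IBD alleles.
DESIGNS = {
    "FS": (("1", "2"), ("3", "4")),  # two unrelated, noninbred parents
    "BC": (("Q", "q"), ("Q", "Q")),  # F1 hybrid x homozygous parent
    "F2": (("Q", "q"), ("Q", "q")),  # selfed F1 hybrid
}

for name, (p1, p2) in DESIGNS.items():
    mean, var = sib_ibd_moments(p1, p2)
    print(name, mean, var)
# FS 1/2 1/8
# BC 5/4 3/16
# F2 1 1/4
```

Under these assumptions the F_{2} design has the largest spread of IBD values, which is consistent with its higher power.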
The F_{2} and BC mating designs require the availability of inbred lines. If no such lines exist in nature, one must develop them, which is costly and time consuming. In this case, the FS mating design is preferable to F_{2} and BC. In self-incompatible organisms, the FS mating design is the only choice.
The consensus QTL mapping proposed here is a general approach for combining or updating data. By setting the relevant Π and A matrices on a family-by-family basis, families from all types of mating designs, sampled at different locations or in different laboratories, may be combined. Alternatively, data can be combined vertically; that is, data collected in the same laboratory but at different times can be pooled through the consensus mapping strategy.
Acknowledgments
This research was supported by the National Institutes of Health Grant GM5532101 and the United States Department of Agriculture National Research Initiative Competitive Grants Program 97352055075 to S.X.
Footnotes

Communicating editor: T. F. C. Mackay
 Received October 1, 1997.
 Accepted February 20, 1998.
 Copyright © 1998 by the Genetics Society of America