Multiple Interval Mapping for Quantitative Trait Loci
- * Institute of Statistical Science, Academia Sinica, Taipei 11529, Taiwan, Republic of China
- † Program in Statistical Genetics, Department of Statistics, North Carolina State University, Raleigh, North Carolina 27695-8203
- ‡ ForBio Research Pty Ltd., Toowong, Queensland 4066, Australia
- Corresponding author: Chen-Hung Kao, Institute of Statistical Science, Academia Sinica, Taipei 11529, Taiwan, Republic of China. E-mail: chkao{at}stat.sinica.edu.tw
Abstract
A new statistical method for mapping quantitative trait loci (QTL), called multiple interval mapping (MIM), is presented. It uses multiple marker intervals simultaneously to fit multiple putative QTL directly in the model for mapping QTL. The MIM model is based on Cockerham's model for interpreting genetic parameters and the method of maximum likelihood for estimating genetic parameters. With the MIM approach, the precision and power of QTL mapping could be improved. Also, epistasis between QTL, genotypic values of individuals, and heritabilities of quantitative traits can be readily estimated and analyzed. Using the MIM model, a stepwise selection procedure with likelihood ratio test statistic as a criterion is proposed to identify QTL. This MIM method was applied to a mapping data set of radiata pine on three traits: brown cone number, tree diameter, and branch quality scores. Based on the MIM result, seven, six, and five QTL were detected for the three traits, respectively. The detected QTL individually contributed from ∼1 to 27% of the total genetic variation. Significant epistasis between four pairs of QTL in two traits was detected, and the four pairs of QTL contributed ∼10.38 and 14.14% of the total genetic variation. The asymptotic variances of QTL positions and effects were also provided to construct the confidence intervals. The estimated heritabilities were 0.5606, 0.5226, and 0.3630 for the three traits, respectively. With the estimated QTL effects and positions, the best strategy of marker-assisted selection for trait improvement for a specific purpose and requirement can be explored. The MIM FORTRAN program is available on the worldwide web (http://www.stat.sinica.edu.tw/~chkao/).
THE basic principle of using genetic markers to study quantitative trait loci (QTL) is well established (Sax 1923; Thoday 1960; Jayakar 1970; Lander and Botstein 1989; Carbonellet al. 1992; Haley and Knott 1992; Jansen 1993; Zeng 1993, 1994). Sax (1923) first used pattern and pigment markers in beans to analyze genes affecting seed size by investigating the segregation ratio of F2 progeny of different crosses. Thoday (1960) proposed the idea of using two markers to bracket a region for detecting QTL. The basic idea of Sax and Thoday for detecting the association of a QTL with a marker rests on the comparisons of trait means of different marker (chromosomal segment) classes. These methods, such as t-test and simple and multiple regressions, directly analyze markers.
In recent years, the advent of fine-scale molecular genetic marker maps for various organisms by molecular biology techniques has greatly facilitated the systematic mapping and analysis of individual QTL. Lander and Botstein (1989) proposed a much-improved method, named interval mapping (IM), for mapping QTL. They used one marker interval at a time to construct a putative QTL for testing by performing a likelihood ratio test (LRT) at every position in the interval. With a fine-scale genetic marker map throughout the genome, IM can be performed at any position covered by markers to produce a continuous LRT statistical profile along chromosomes. The position with the significantly largest LRT statistic in a chromosome region is an estimate of QTL position. It has been shown that IM has more power and requires fewer progeny than the methods for direct analysis of markers (Lander and Botstein 1989; Haley and Knott 1992; Zeng 1994). Haley and Knott (1992) proposed a regression version of interval mapping to approximate IM. Although Haley and Knott's method could save time in computation and produce similar results to those obtained by IM, the estimate of the residual variance is biased, and the power of QTL detection can be affected (Xu 1995).
The approach of IM considers one QTL at a time in the model for QTL mapping. Therefore, IM can bias identification and estimation of QTL when multiple QTL are located in the same linkage group (Lander and Botstein 1989; Haley and Knott 1992; Zeng 1994). To deal with multiple QTL problems, Jansen (1993) and Zeng (1993, 1994) independently proposed the idea of combining IM with multiple regression analysis in mapping. Zeng named this combination “composite interval mapping” (CIM). The approach of CIM is that, when testing for the putative QTL in an interval, one uses other markers as covariates to control for other QTL and to reduce the residual variance such that the test can be improved. The model of CIM includes one QTL and markers. Hoeschele and Vanranden (1993a,b), Satagopan et al. (1996), and Sillanpaa and Arjas (1998) used a Bayesian approach in estimation and to identify QTL. Doerge and Churchill (1996) used permutation tests for QTL detection. Mapping for QTL controlling binary trait and ordinal categorical traits is presented by Hackett and Weller (1995) and Xu and Atchley (1996). In human and animal genetics, the mixed model, including random effect, has been applied to QTL mapping (Xu and Atchley 1995; Grignola et al. 1996a,b).
Ideally, we would extend the current QTL mapping models to a multiple QTL model for mapping multiple QTL in a way that QTL can be directly controlled in the model to further improve QTL mapping. In this article, a new QTL mapping method named multiple interval mapping (MIM) was developed. MIM uses multiple marker intervals simultaneously to construct multiple putative QTL in the model for QTL mapping. Therefore, when compared with the current methods such as IM and CIM, MIM tends to be more powerful and precise in detecting QTL as shown by the example in this article. In addition, MIM can readily search for and analyze epistatic QTL and estimate the individual genotypic value and the heritabilities of quantitative traits. On the basis of the MIM result, genetic variance components contributed by individual QTL were also estimated, and marker-assisted selection can be performed.
GENETIC MODEL
Consider m QTL, Q1, Q2, ···, and Qm, in a backcross population in which there are two genotypes, QjQj and Qjqj, each with one-half frequency for a QTL, say Qj. For m QTL, there are 2m possible different QTL genotypes in the population. Cockerham's genetic model (C-H. Kao and Z-B. Zeng, unpublished results) is used to define the genetic parameters and model the relation between the genotypic value and the genetic parameters. If only up to digenic epistasis is considered, the relation between the genotypic value of individual i, Gi, and the genetic parameters can be expressed in the equation
To assist with explaining the estimation of the genetic effects in the MIM model (Equation 3), the genetic model in Equation 1 is expressed in matrix notation as Equation 2 (Scheme 1). In Equation 2, the column vector G contains the genotypic values of the 2m possible genotypes. The subscripts of G (1 or 0) denote the homozygote or heterozygote of the QTL in the order of the first, second, third, ···, and mth QTL, respectively. The first m columns in the genetic design matrix D are the coefficients associated with the main effects of the m QTL, and the last m(m – 1)/2 columns represent the coefficients of the epistatic effects among them. Vector E contains the QTL main and epistatic effects. If there is no epistasis between some QTL, some of the columns for epistasis should be dropped out from matrix D. If higher-order epistasis is considered, the dimension of matrix D is easy to expand accordingly. The matrix D plays an important role in estimation of genetic parameters in the MIM model.
STATISTICAL MODEL OF MIM
Multiple interval mapping: Assume m QTL, Q1, Q2, ···, and Qm, located at positions p1, p2, ···, pm in m different marker intervals, I1, I2, ···, Im, along the genome, control a quantitative trait y. Among the m QTL, some may show epistasis and some may not. The quantitative trait value for an individual, i, can be related
Scheme 1
to the m putative QTL by the model
The MIM model is a multiple QTL model and its likelihood is a finite normal mixture. There are two problems that need to be solved for the MIM model. The first is that of parameter estimation of the finite normal mixture model. As m becomes large, the derivation of the maximum-likelihood estimates (MLEs) of the QTL effects and positions in estimation quickly becomes unwieldy. To handle the estimation problem, the general formulas derived by Kao and Zeng (1997) are used to obtain the MLEs in parameter estimation. The second problem is how to find QTL to fit into the MIM model. To select QTL for the MIM model, a stepwise model selection procedure is proposed in strategy of qtl mapping.
LIKELIHOOD OF THE MIM MODEL
In the MIM model, the genotype of each putative QTL, Qj in interval Ij, is not observed, but its distribution can be inferred from the flanking markers of Ij based on the recombination frequency between them. For every QTL in the backcross population, the conditional probabilities of the QTL genotypes, given different flanking marker genotypes, can be found in Table 1 of Kao and Zeng (1997). To infer the joint conditional probability of the genotype of the m putative QTL, we use the property that if there is no crossing-over interference, the conditional distributions of the individual putative QTL genotypes, given the flanking marker genotypes, are independent. That is,
Given a sample with size n, the likelihood function of the MIM model for θ = (p1, p2, ···, pm, a1, ···, am, ···, wjk, ···, σ2) is
PARAMETER ESTIMATION
The likelihood of the MIM model is a finite normal mixture. In parameter estimation, the finite normal mixture model can be treated as an incomplete-data problem (Little and Rubin 1987) by regarding the trait and markers as observed data and the QTL as missing data. The EM algorithm can be used for obtaining the MLEs of the genetic parameters, and Louis's (1982) method can be implemented to obtain the variance-covariance matrix.
In the MIM model, when only one putative QTL (m = 1) is considered in a backcross population, the likelihood is a mixture of two normals (like IM and CIM), and four parameters need to be estimated. The derivation of the MLEs for the one putative QTL model using the EM algorithm has been provided (Carbonellet al. 1992; Zeng 1994). When arbitrary m putative QTL are considered, the likelihood is a mixture of 2m normals, and at least 2m + 2 parameters (including mean μ, QTL positions and effects, environment variance, and epistasis) need to be estimated. The number of mixture components and parameters increases dramatically as the number of putative QTL taken into account in the model increases. Taking m = 10 as an example, the likelihood is a mixture of 1024 normals with more than 22 parameters to estimate. Therefore, one of the main difficulties with the MIM model is that the derivation of the MLEs quickly becomes unwieldy if the number of putative QTL is large, and an efficient and systematic method for parameter estimation of the MIM model is needed to avoid rederivation for each m. Here, we use the general formulas provided by Kao and Zeng (1997) for deriving the MLEs and the asymptotic variance-covariance matrix of the parameters as the estimation method of MIM. The general formulas are based on two matrices, D and Q. The matrix D is the genetic design matrix that characterizes the genetic parameters of the QTL effects, and the matrix Q is the conditional probability matrix that contains the information on QTL positions. Given the two matrices, the MLEs and the asymptotic variance-covariance matrix can be systematically obtained.
To apply the general formulas to MIM, the genetic design matrix D of the MIM model has the same first m columns as those in Equation 2 for indicating the m main QTL effects and has some or none of the last m(m – 1)/2 columns for specifying epistasis. We refer to D asa2m × k matrix, where k is the column dimension. There are m individual conditional probability matrices, Q1, Q2, ···, and Qm for the m QTL. The components of the conditional probability matrix Qj of QTL Qj in the interval Ij with flanking markers Mj and Nj can be found in Table 1 of Kao and Zeng (1997). For each interval, there are four possible flanking marker genotypes. Totally, there are 4m possible flanking marker genotypes for m intervals. The joint conditional probability matrix Q then has dimension 4m × 2m and can be obtained by
Note that, at the tested positions p1, p2, ···, and pm, the mixing proportions pij's in the likelihood are fixed and need not be estimated. For obtaining the MLEs of mean, environmental variance, and marginal and epistatic effects, the general equations formulate the iteration of the (t + 1) EM step as follows:
E step: Update the posterior probabilities of the 2m possible QTL genotypes for each individual i,
STRATEGY OF QTL MAPPING
For the MIM approach, the second problem that needs to be considered is how to search for QTL to fit into the MIM model. It is quite common that genetic marker data, e.g., rice (Liet al. 1997), pine (Aitkinet al. 1997), and eucalyptus (Grattapagliaet al. 1996), contain more than 100 markers in several linkage groups to cover most of the genome. A QTL is potentially located in any position of each interval. To detect QTL using the MIM model, model selection procedures are considered because all possible subset selection is not feasible. There are at least three basic model selection techniques, forward, backward, and stepwise selections, for exploring the relationship between the independent and dependent variables (Draper and Smith 1981; Kleinbaumet al. 1988; Miller 1990). Several selection criteria, such as Akaike information criterion (AIC; Akaike 1974), cross-validation (Stone 1974), predictive sample reuse (Geisser 1975), Baysian information criterion (BIC; Schwarz 1978), minimum posterior predictive loss (Gelfand and Ghosh 1998), or LRT statistic for selection of variables can be incorporated with model selection techniques to determine the final model. In QTL mapping, any criterion used has to take the genetic marker data structure, such as genome size and distribution of markers, into account. There have been studies on the connection of the LRT statistic to the data structure (see below). So far, however, these related studies lack other criteria. The stepwise selection technique with the LRT statistic as a criterion is adopted for identifying QTL here.
Critical value for claiming QTL detection: When using the LRT statistic as a criterion in model selection for QTL detection, it is very important to determine the appropriate critical value for claiming QTL detection such that correct statistical inference about QTL parameters can be made. Lander and Botstein (1989) suggested using the Bonferroni argument for the sparse-map case and Orenstein-Uhlenbeck diffusion for the dense-map case to determine the critical value. Generally, it has been pointed out that the critical value might need to be adjusted for the number and size of interval, different levels of heritability, different number of multiple linked or unlinked QTL, and linked QTL in the same or opposite direction of effects (Lander and Botstein 1989; Jansen 1993; Zeng 1994). Visscher and Haley (1996) suggested that the critical value should be reduced after a QTL of large effect has been detected. However, most of this information is not available before mapping, and consequently the answers to most of the above questions remain unknown. Churchill and Doerge (1994) therefore suggested using a permutation test for determining an appropriate critical value for specific data sets.
The above considerations on critical value are for the single-QTL model. For a multiple-QTL model, a model selection procedure is required to determine the final model. If stepwise selection is used, the final model is selected from a sequence of nested tests, and the significance level of the sequence will depend on the unknown true model (Atkinson 1980; Terasvirta and Mellin 1986). Therefore, the critical value of the multiple-QTL model depends not only on the above considerations but also on the unknown true model, and the choice of critical value for claiming QTL detection becomes even more complicated for MIM. We are not sure currently what the appropriate critical value is for the MIM model. In practice, the critical value from IM or CIM based on the Bonferroni argument may be used until the complicated issue of choosing the significance level for the multiple-QTL model is solved.
Stepwise selection procedure: The stepwise selection begins with no QTL (m = 0). QTL are then added or deleted one by one in the model. Alternatively, a group of QTL can be added or deleted together. The testing hypotheses for adding or deleting one additional QTL Qi are
Step 1: Significant values for entry (SVE) and staying (SVS) of a LRT statistic are specified for adding and dropping a QTL in the MIM model. Note that SVE and SVS could be different in model selection.
Step 2: For each position on the genome covered by markers, the LRT statistic reflecting the contribution of the putative QTL to quantitative trait variation is calculated (m = 1; IM). If there are positions with LRT statistics larger than SVE, the position with the largest value will be selected and added first in the model. When m = 1, it is important to note the shape of the likelihood profile and the direction of effect change along the genome for further mapping. Note that quite often no position is found with the LRT statistic larger than SVE when m = 1 because individual QTL contribute little to the trait variation. Two alternative approaches are proposed to prevent the procedure from stopping at a very early stage.
First, when m = 1, the position with the highest LRT statistic is automatically included in the model to initiate the procedure. In our experience, when only one QTL is considered in the model (m = 1), it is quite often found that the LRT statistic of a QTL could be less than SVE. But, when multiple QTL (if any) are accumulated in the model (m > 1), the partial LRT statistics of individual QTL might become significant because more genetic variation is removed from residual variation by taking multiple QTL into account.
Second, chunkwise selection (Kleinbaumet al. 1988) can be used. For closely linked QTL with opposite effects, more than one QTL may be selected in the model as a chunk to effectively reduce the genetic residue in the model. If only one of them is selected, its contribution may not be significant because the effect is canceled out due to failure to consider the others. When m = 1 in the MIM model, the chromosome region with significant change in the directions of effect could suggest that linked QTL with opposite effects are present. Also, epistatic QTL can constitute a chunk. If QTL interact, they may not be significant if only one of them is considered, but they could be significant if they are considered together. Note that the critical value should be higher for chunkwise selection because more parameters are tested. Chunkwise selection allows the incorporation of prior knowledge and preference into the model selection procedure.
Step 3: After the first k QTL are added to the model, the MIM model with m = k + 1 QTL is considered. The position that produces the most significant partial LRT statistic at the SVE level is added into the model. After the k + 1 QTL are fitted to the model, stepwise selection checks all the QTL and deletes any QTL that does not produce a significant partial LRT statistic at the SVS level. Note that a QTL that enters at an early stage may become superfluous at a later stage in stepwise selection procedure. By the same argument, chunkwise selection (m = k + l, l > 1) can be implemented. The stepwise process ends when none of the other positions has a partial LRT statistic significant at the SVE level.
Separating linked QTL: The evidence of multiple-linked QTL clustering in a region could be suggested by the shape of the likelihood profile, for example, a likelihood profile with a wide range of significant multiple peaks, or by significant change in the direction of estimated QTL effects on a chromosome region. To separate closely linked QTL in a certain chromosome region, we can compare the likelihood of the multiple-QTL model with that of a single-QTL model in this region for separation.
Analyses of epistasis: For a backcross population, it can be shown that if epistasis is present and ignored in mapping, the estimates of main effects of epistatic QTL are asymptotically unbiased whether epistasis between QTL is considered in the model or not, and the power of the test for detecting epistatic QTL could be low (appendix). Therefore, when mapping QTL without considering epistasis in a backcross population, the positions and effects of the identified QTL could still be unbiased. For l QTL being tested, there are k = l(l – 1)/2 possible digenic epistases. For each pair of QTL Qi and Qj, the hypotheses for testing their epistatic effect wij are
Fine tuning the estimates of QTL positions and effects: In the above procedures, the estimates of QTL effects and positions were obtained individually. Therefore, the model likelihood might not be at the maximum, and the model is not the final model. To obtain the MLEs of the positions and effects, a multidimensional search around the regions of the identified QTL is suggested. By doing this, QTL estimates can be fine tuned and the final model can be determined. With estimates of QTL positions and effects, other composite genetic parameters (e.g., heritability and variance components) of a quantitative trait can be estimated and response to selection can be predicted.
Construction of the confidence interval for QTL positions and effects: It is important to construct the confidence interval (C.I.) for QTL effects and positions. For example, when a particular QTL is to be transferred to a recipient, a C.I. of QTL position estimate can give us an idea about how large a chromosome segment is around the detected position to be transferred. There are several approaches to constructing a C.I. of the QTL positions and effects, including lod support interval (Lander and Botstein 1989), bootstrapping, using asymptotic standard deviation (ASD; Darvasiet al. 1993; Kao and Zeng 1997), and the methods by Dupuis and Siegmund (1999). Darvasi et al. (1993) and Kao and Zeng (1997) suggested using (p̃ – Z(1–α/2)Sp̃, p̃ + Z(1–α/2)Sp̃), where p̃ and Sp̃ are the estimates of QTL position and its standard deviation, to construct a C.I.
Estimation of variance components and heritability: When the final model is determined, the variance components and the heritability of the quantitative trait can be estimated. The ratio VG/Vp, denoted by
To estimate the genetic variance components, for example, the total genetic variance contributed by m QTL in the backcross population by Equation 1 is
Estimation of individual genotypic value and marker-assisted selection: In plant or animal breeding, individuals with high genotypic values or favorable genotypes are usually selected for producing progeny. With the estimated QTL effects and positions, the genotypic values of individuals can be estimated by Equation 1 and the favorable QTL genotypes can be determined for selection. To select individuals with large trait values, genotype AA (Aa) of nonepistatic QTL with positive (negative) effects is preferred. For QTL with epistasis, their epistatic effects must be considered in selecting the best combination of genotypes. If QTL controlling different traits are closely linked or at the same positions, traits are genetically correlated. Selecting individuals for improvement of one trait will affect the other trait due to linkage or pleiotropy. In practice, selecting individuals with the desired character for one trait will frequently accompany an undesired character for other traits. By considering circumstances such as genetic correlation between traits, the distances between markers and QTL, and the effects of QTL, the best strategies of marker-assisted selection for (multiple) trait improvement under specific purposes and requirements can be explored.
DATA ANALYSIS
Radiata pine: Radiata pine is one of the most widely planted forestry species in the Southern Hemisphere. Two elite parents were crossed to produce 134 progeny. For each progeny, random amplified polymorphic DNA (RAPD) markers were generated, and traits measured included annual brown cone number at eight years of age, diameter of stem at breast height, and branch quality score. The cone number per tree, which varied from 0 to 45, was transformed to approximate a normal distribution using a square root transformation. The quality of branches of a tree were scored on a scale from 1 (poorest) to 6 (best). The mean of several branch quality scores denoted the branch quality of a tree. A pseudotestcross strategy is used to construct a linkage map for each parent, and then a backcross model can be used for mapping QTL for each parent separately (Grattapaglia and Sederoff 1994; Grattapagliaet al. 1996). The analysis reported here is on one parent. A genetic marker map was constructed using MapMaker/EXP (Lincolnet al. 1993). The RAPD marker data contained 120 markers in 12 linkage groups and covered ∼1679.3 cM. The average spacing of the 107 marker intervals was 13.5 cM.
As mentioned in strategy of qtl mapping, the choice of critical value is a very complicated issue for the multiple-QTL model. The value depends on the marker data structure and several unknown QTL parameters (true model). In data analysis, a critical value from IM based on Bonferroni argument is used to evaluate and illustrate the MIM approach. The SVE and SVS of the LRT statistic for claiming a QTL detection at the overall α = 0.05 level were chosen as 12.12
QTL detection: For trait DBH, when m = 1, there is no position along the genome with an LRT statistic higher than SVE. The position with the largest LRT statistic (7.85; R2 = 0.0639) was found at position [12,5,0] (0 cM away from the left marker of the fifth marker interval on the twelfth linkage group). The chromosome region between C1M3 (the third marker of the first linkage group) and C1M7 showed opposite direction of effects. At C1M3, the effect was positive (P = 0.57), while at C1M4 and C1M5, the effects were negative (P = 0.0253 and 0.4181, respectively). The genetic distance between C1M3 and C1M4 is 74.8 cM. It could suggest that there are two closely linked QTL with opposite directions of effects in this region. If only one QTL (m = 1) is fitted in the model for search, the effect can be canceled out by opposing QTL effects. QTL will be out of detection as shown by the LRT statistic profile of IM in Figure 1. Therefore, on linkage group 1, the MIM model with m = 2 selected two candidate QTL, at positions [1,3,63] and [1,4,0], as a chunk. The partial LRT statistic for fitting the two QTL in the model was 13.13 (SVE and SVS for two parameters are
The procedure restarted at m = 2 by fitting two QTL with effects of opposite directions at [1,3,63] and [1,4,0]. The partial LRT statistics were 8.034 and 8.458 for the two QTL, with estimated effects 65.65 and –73.48, respectively. Given QTL at [1,3,63] and [1,4,0] in the model, a QTL at [10,5,12] with partial LRT statistic 12.83 was selected into the model (m = 3). The partial LRT statistics became 14.89, 15.42, and 12.83, which were all larger than the SVS of 12.12, for the three QTL. The model R2 was 0.3202. Given these three QTL in the model, the largest partial LRT statistic 7.40 was found at position [2,2,0]. A chunkwise selection for epistatic QTL was attempted. If the candidate QTL at [2,2,0] and [12,5,12] with epistasis were selected as a chunk (m = 5 and one epistasis, k = 6), the partial LRT statistic of the chunk would be 24.76 (compared with
Summary of QTL detected by MIM in Radiata pine
—Results of QTL mapping. (a–e) The solid triangles denote the QTL positions localized by MIM. The size of triangle reflects the size of QTL effect.
Fine tuning the estimates of QTL position and effect: Two epistatic pairs were identified as described above; no other epistatic interaction between QTL was found. No QTL without main effect but interacting with the identified QTL were found. A multidimensional search around the detected QTL was used to fine tune the estimates of QTL parameters. The locations changed to [1,3,61], [1,4,0], [2,2,0], [5,5,0], [10,5,9], and [12,5,9]. The estimated QTL effects are shown in Table 1. QTL at positions [1,3,61], [2,2,0], [10,5,9], and [5,5,0] had positive effects, and QTL at positions [1,4,0] and [12,5,9] had negative effects. The effects of QTL at positions [1,3,61] and [1,4,0] were larger when compared with others. The model R2 was 0.5226. Therefore, six identified QTL were conclusively identified in QTL mapping for the diameter trait. The partial LRT statistic profiles for each QTL are shown in Figure 1.
Epistasis: The estimated epistatic effect between QTL at positions [2,2,0] and [12,5,9] was 39.54 (partial LRT statistic 15.23), and the epistatic effect between QTL at [5,5,0] and [10,5,9] was 22.64 (partial LRT statistic 4.84). Figure 2 shows how the QTL interact. Figure 2a shows that the effect of QTL (GBB – GBb) at position [12,5,9] was positive in the background of homozygote QTL (AA) at position [2,2,0], but it was negative in the heterozygote background (Aa). Figure 2b shows that the QTL at position [10,5,9] had a large effect in the background of homozygote QTL (AA) at [5,5,0], but it had a small effect in the heterozygote background (Aa).
Heritability and variance components: The broad sense heritability for tree diameter can be estimated by the R2 value of the final MIM model. The R2 of the model including six QTL and two epistases was 0.5226. QTL at positions [2,2,0], [5,5,0], [10,5,12], and [12,5,9] contributed ∼4.50, 1.36, 5.25, and 1.76% of the total genetic variance, respectively. The percentage of genetic variance contributed by the two linked QTL separated by 13.8 cM on the first linkage group was 76.75%. There was a negative genetic covariance between the two linked QTL. Two epistatic pairs contributed ∼10.38% to the total genetic variance.
QTL mapping for cone number and branch quality: QTL mapping was also performed on the traits of cone number and branch score. The mapping results are listed in Table 1. For cone number, seven QTL were identified (although the QTL at [1,1,3] was not significant with partial LRT statistic 9.44, we considered it as a candidate QTL). Epistasis was found between two QTL pairs using chunkwise selection. The model R2 value of the MIM model fitted to the seven QTL and their epistasis was 0.5606. The two linked QTL, separated by 27.6 cM on linkage group 10, contributed 29.93% of the genetic variance. The other five QTL contributed ∼55.93% of the total genetic variance. Epistasis contributed 14.14%. For branch quality, five QTL were identified (we also considered the two QTL with partial LRT statistic values 10.37 and 10.36 at [1,4,11] and [12,5,0] as candidate QTL). No epistasis was found for QTL controlling branch score. The model R2 was 0.3630. Two linked QTL, separated by 19.6 cM on linkage group 11, contributed 48.69% of the genetic variance. The remaining three QTL contributed from ∼11 to 27% of the total genetic variance.
—Epistasis between QTL controlling tree diameter. (a) Epistasis between QTL at positions [2,2,0] and [12,5,9]. A and B denote QTL at [2,2,0] and [12,5,9], respectively. (b) Epistasis between QTL at positions [5,5,0] and [10,5,9]. A and B denote QTL at [5,5,0] and [10,5,9], respectively.
Confidence intervals of QTL positions and effects: The lod support interval and the ASD of QTL effect and position are listed in Table 1. Out of the 18 QTL detected for three traits, 9 ([2,6,0], [5,10,0], and [10,9,0] for cone number; [1,4,0], [2,2,0], and [5,5,0] for tree diameter; [2,1,0], [11,6,0], and [12,5,0] for branch score) of them were localized at the markers, and 2 ([10,5,9] and [12,5,9]) had negative ASD. Therefore, the ASD of these QTL position estimates were not available for constructing C.I.'s. The asymmetric lod support intervals are typical in this case. For example, the diameter QTL at [5,5,0] has an asymmetric lod support interval ([5,4,7], [5,5,16]). In general, the interval constructed by ASD is much narrower than the lod support interval. For example, C.I.'s constructed using four times ASD were 6.52 and 7.04 cM for the cone QTL at [6,4,18] and [12,3,2], and the lod support intervals are 59.6 cM and 14.6 cM, respectively.
Marker-assisted selection: Individuals with favorable QTL genotypes are selected as parents to produce progeny. Trees carrying all the favorable QTL genotypes were not found for each trait in the sample. Therefore, only a subset of the detected QTL was considered in selection. For tree diameter, three trees were found to carry favorable genotypes and two trees were found to carry unfavorable genotypes (consider epistasis) of the five QTL (out of the six detected QTL) at positions [1,4,0], [2,2,0], [5,5,0], [10,5,9], and [12,5,9]. The observed trait means for the two groups were 232.38 and 163.05 mm, respectively, through selection of these five diameter QTL. The estimated genotypic values of the two groups were 233.84 and 160.06 mm (Table 2). The observed and estimated values of performing selection for the other two traits on the sample based on four and five QTL are also shown in Table 2. The mapping results in Table 1 also allow us to estimate the genotypic values of certain genotypes. For example, if trees carrying all six favorable diameter QTL were selected with epistasis taken into consideration, the estimated tree diameter for those trees would be 314.17 mm and the estimated cone number would be 8.22. If trees carry all seven favorable QTL (epistasis considered) for reducing cone number, the estimated cone number would be 0.33 and the estimated tree diameter would be 196.45 mm. Consequently, the improvement of tree diameter would cause simultaneous increase in cone number, which is a reflection of the positive genetic correlation between the two traits. Generally, the estimated and observed results were quite close based on the MIM result as found in this sample.
Comparison of the predicted and observed means of the selected populations
DISCUSSION
A new QTL mapping approach named MIM is proposed. It uses multiple-marker intervals simultaneously to construct multiple QTL in the model for QTL mapping. The MIM model is based on Cockerham's model (C-H. Kao and Z-B. Zeng, unpublished results) for defining genetic parameters and on the general formulas of Kao and Zeng (1997) for statistical estimation. Using the MIM model, stepwise and chunkwise selections with the LRT statistic as a selection criterion are proposed to identify QTL, to separate linked QTL, and to analyze epistasis between QTL. The asymptotic standard deviations of the estimated QTL positions and effects can be obtained for constructing the C.I.s. With the estimated QTL effects and positions provided by MIM, the variance components of QTL, the heritability of a quantitative trait, and the genotypic values of individuals can be estimated, and marker-assisted selection can be performed for trait improvement. Experimental data on three traits on radiata pine were analyzed to illustrate the potential power and benefit of MIM in comparison with the current methods, such as IM and CIM. While a backcross MIM model was used here as an example, the MIM model can be easily extended to an F2 population (C-H. Kao and Z-B. Zeng, unpublished results).
The MIM model is a multiple-QTL model. When the multiple-QTL model is considered, the likelihood is a finite normal mixture and becomes increasingly intractable in maximization as the number of QTL fitted into the model increases (Haley and Knott 1992; Satagopanet al. 1996). We used the method of maximum likelihood in estimation by applying the general formulas of Kao and Zeng (1997) to maximize likelihood and obtain MLEs as well as the variance-covariance matrix of the MLEs. The MLEs have some attractive properties, such as invariance, consistency, and asymptomatic efficiency, in statistical inference. If prior information of parameters is available, the Bayesian approach, such as in Satagopan et al. (1996) and Sillanpaa and Arjas (1998), can be used to incorporate the prior information in estimation. By specifying prior density of parameters, they used Markov chain Monte Carlo to evaluate the posterior density and to output empirical distribution of QTL parameters for QTL mapping. The MIM approach of using the multiple-QTL model in QTL mapping distinguishes itself from the current approaches, such as IM and CIM, by the ability to use multiple-marker intervals simultaneously to search the chromosome region between markers for QTL. As a result, the MIM method may provide greater power and precision for QTL mapping. However, it should be noted that the significance level of the multiple-QTL model depends on the marker data structure and the unknown true model, and the critical value for claiming QTL detection becomes a complicated issue for MIM (see strategy of qtl mapping). In the example, we used an ad hoc critical value. This value is appropriate for the one-QTL model, but it may not be appropriate for MIM. Although the MIM method claims more QTL detection than the current methods in data analysis, it is not appropriate to conclude that the MIM method is better until the complicated problem of assessing the appropriate critical value for the multiple-QTL model has been solved.
Under the ad hoc critical value, MIM detected six QTL for tree diameter and CIM detected only two of them on the first linkage group in this example. IM failed to detect any QTL (Figure 1). The major reason for this difference is that CIM is not capable of controlling the two detected linked QTL simultaneously in further mapping. As a result, only the QTL at position [1,4,0] is controlled, but it does not contribute substantially to reducing the genetic variation because its effect has been canceled out by ignoring the linked QTL with opposite effects at position [1,3,61]. Accordingly, most of the genetic variance (76.75%) contributed by the two linked QTL becomes part of the genetic residue, making the other four QTL undetectable. This shows the beauty of MIM, which allows the current detected QTL being fitted directly into the model to search for the next QTL. Consequently, more QTL were detected by MIM than the current methods in this example.
In the data analyses, MIM localized two linked QTL with large opposite directions of effect in the third interval of linkage group 1 (Figure 1a). They contributed 76.75% of the total genetic variance. The size of this interval was 74.8 cM, so it is suggested that more markers should be added to this interval to permit further investigation. Two linked QTL, one controlling diameter and another controlling cone number, were detected in the same fifth interval of linkage group 10 (Table 1). The estimated locations are 2 cM apart. Further investigation is needed to check if they are the same (pleiotropic) or different (closely linked) QTL. The likelihood profile of linkage group 12 in Figure 1e is a result of conditioning on the other five unlinked QTL. It shows multiple significant peaks, which could suggest multiple-linked QTL on the same linkage group. However, after further investigating the linkage group, there was no evidence of multiple QTL given the peak at position [12,5,9] and the other five detected QTL. It is therefore concluded that there is only one QTL at position [12,5,9] on linkage group 12.
Another benefit derived by MIM is that epistasis can be readily incorporated in the model for analysis or searching for epistatic QTL. When taking both main and epistatic effects into account in searching for QTL, the critical value for hypothesis testing needs to be adjusted for the extra degree of freedom for epistasis. It is interesting to know that the estimated main effects of linked QTL are asymptotically unbiased in the backcross population (appendix), but they are biased in the F2 population if epistasis is present and ignored in mapping (Kao 1995). This is because the coded variables for main and epistatic effects in Cockerham's model are still orthogonal under linkage disequilibrium for the backcross population but not for the F2 population. This asymptotic unbiasedness property ensures that QTL mapping could first be performed without taking epistasis into account without causing a problem in the backcross population. For tree diameter and cone number, respectively, epistasis contributed 10.38 and 14.14% of the total genetic variance. Therefore, epistasis should be generally considered in searching for QTL and marker-assisted selection. For example, in Figure 2a, the best combination of QTL genotype at positions [2,2,0] and [12,5,9] was AABB, which had an estimated genotypic value of 13.2. If epistasis was ignored, genotype AABb, with estimated genotypic value 1.75, would be selected. The benefit of taking epistasis into account was reflected in the mapping result in Table 1 and in grouping genotypes in Table 2.
It has been 76 yr since Sax (1923) associated seed coat characters with seed size in beans. The QTL mapping model has evolved from using marker analyses, e.g., t-test, simple or multiple regression, to one-QTL model (IM and CIM), and further to the multiple-QTL model, such as the MIM approach. In practice, the detected QTL will be used for selecting parents with desired genotypes for producing progeny or gene transfer to achieve the ultimate goal of trait improvement in later generations. QTL have to be mapped as precisely as possible to ensure good quality of the follow-up operation on QTL. Therefore, precision and unbiasedness in estimating the parameters of QTL should be more important than the ease of computation and implementation in QTL mapping. The computation burden of the multiple-QTL model is heavy when compared with the one-QTL model. However, the gain of doing so, as shown in this article, could be significant. Although further work is needed to establish a theoretical basis for determining an appropriate criterion of model selection in QTL mapping under MIM, MIM has the potentiality to be more powerful and more precise in QTL mapping by directly conditioning putative QTL and incorporating possible epistasis in the model. Thus, more genetic variation can be controlled in the model. With the estimates of QTL parameters, other composite genetic parameters, such as the genetic variance components and heritabilities, can also be estimated. In addition, based on the MIM results, genotypic values of individuals can be estimated to allow desired genotypes to be selected in marker-assisted selection under various requirements (e.g., cost, efficiency, and trait correlations).
An initial version of the MIM program source code (written in Fortran 77 language) is available on the worldwide web (http://www.stat.sinica.edu.tw/~chkao/). A more user-friendly package can be developed based on this program. Using the MIM program, we implemented stepwise and chunkwise selections with the LRT statistic as a selection criterion to search for QTL in data analyses. In analyzing the data, we chose the two linked QTL with opposite direction of effect on the first linkage as a starting point to initiate the selection process, and six QTL were found for tree diameter. We also tried another possible starting point at [12,5,0] to initiate the process and obtained the same final model. This final model obtained by model selection might not be optimal. Even though the optimal model was obtained, there is no guarantee that it is the true model (the estimated QTL are the true QTL) for limited sample size. Ultimately, the reliability of the identified QTL will depend on further experiments to assess the validity of QTL. There is no single criterion that plays the role of a panacea in the model selection problem. Other model selection techniques and criteria could also be implemented. It is a very important task to explore and automate the model selection procedures of the MIM approach for general use in the QTL mapping community.
Acknowledgments
We are greatly indebted to Dr. Chung-I Wu and three anonymous reviewers for their comments and criticisms. Chen-Hung Kao is grateful to Corinna Lange for her suggestions. C-H.K. was supported by grants NSC87-2313-B-324-001 and NSC88-2313-B-324-001 from the National Science Council, Taiwan, Republic of China; Z-B.Z. was funded by GM-45344 from the National Institutes of Health and no. 9600645 from the United States Department of Agriculture Plant Genome Program.
APPENDIX: THE PROBLEMS OF IGNORING EPISTASIS IN QTL MAPPING
To simplify the argument, consider the situation where the test positions for QTL are located precisely at the marker position. If only two epistatic QTL, A (x1) and B (x2), control a quantitative trait y, the singlemarker regression coefficient of y on one of the QTL, say x1, is given by byx1 = Cov(y, x1)/V(x1), where Cov(y, x1) is the covariance between the trait and QTL A and V(x1) is the variance of QTL A. Assuming that there is no covariance between environmental deviation and QTL, it is easy to show that
Footnotes
-
Communicating editor: C.-I Wu
- Received December 5, 1997.
- Accepted March 24, 1999.
- Copyright © 1999 by the Genetics Society of America