Genetics, Vol. 167, 1987-2002, August 2004, Copyright © 2004
doi:10.1534/genetics.103.021642
Multiple-Interval Mapping for Quantitative Trait Loci Controlling Endosperm Traits
Chen-Hung Kao1
Institute of Statistical Science, Academia Sinica, Taipei 11529, Taiwan, Republic of China
1 Author e-mail: chkao{at}stat.sinica.edu.tw
Manuscript received August 28, 2003.
Accepted for publication April 30, 2004.
ABSTRACT
Endosperm traits are inherited trisomically and are of great economic importance because they are usually directly related to grain quality. Mapping quantitative trait loci (QTL) underlying endosperm traits can provide an efficient way to improve grain quality genetically. Because the traditional QTL mapping methods (diploid methods) are designed for traits under diploid control, they are not ideal for mapping endosperm traits: they ignore the triploid nature of endosperm. In this article, a statistical method that considers the triploid nature of endosperm (triploid method) is developed on the basis of multiple-interval mapping (MIM) to map the underlying QTL. The proposed triploid MIM method can use marker information either from the maternal plants only or from both the maternal plants and their embryos, in the backcross and F2 populations. Because it uses multiple intervals simultaneously to take multiple QTL into account, the triploid MIM method can provide better detection power and estimation precision than the traditional diploid methods and the current triploid methods that use only one (or two) interval(s), and, as shown in this article, it can analyze and search for epistatic QTL directly. Several important issues in endosperm trait mapping, such as the relation and differences between the diploid and triploid methods, the variance components of genetic variation, and the problems that arise when effects are present but ignored, are also addressed. Simulations are performed to further explore these issues, to investigate the relative efficiency of different experimental designs, and to evaluate the performance of the proposed and current methods in mapping endosperm traits. The MIM-based triploid method can provide a powerful tool to estimate the genetic architecture of endosperm traits and to assist marker-assisted selection for the improvement of grain quality in crop science.
The triploid MIM FORTRAN program for mapping endosperm traits is available on the worldwide web (http://www.stat.sinica.edu.tw/chkao/).
CEREAL grains of many crops, such as rice, wheat, barley, and corn, are major sources of food and nutrition for humans, of animal feed, and of industrial products. To enhance the yield and quality of grains, understanding the genetic basis of cereal grains has become increasingly important in crop research. Cereal grains are generally composed of diploid (embryo) and triploid (endosperm) tissues as a result of double fertilization. During double fertilization, one of the two sperm cells fuses with the egg cell to produce a diploid zygote, which later divides mitotically to form the embryo, and the other sperm cell unites with the central cell (which carries a diploid set of maternal chromosomes) to form a triploid endosperm nucleus, which also undergoes several mitotic divisions to become the endosperm. The endosperm plays a major role in nourishing the embryo in the seed and the young seedling, and its contents, such as protein, sugar, oil, and carbohydrate concentrations, show quantitative variation and are directly related to the quality of cereal grains. Genetic improvement targeting these endosperm traits can provide an efficient way to enhance grain quality, and it has attracted a lot of attention in plant breeding (SADIMANTARA et al. 1997; MAZUR et al. 1999; TAN et al. 1999; WANG and LARKINS 2001; LOU and ZHU 2002). Genetically, the trisomic endosperm represents the next generation and has a more complex genetic mechanism than the diploid tissues. For these reasons, the genetic analysis of endosperm traits differs from that of traits under diploid control, and special treatments are required in the study of endosperm traits.
Most endosperm traits show continuous variations. Quantitative genetic models considering the triploid nature of endosperm traits for studying the underlying genetic basis have been proposed by several researchers (GALE 1976; MO 1987; BOGYO et al. 1988; ZHU and WEIR 1994). These models generally focus on partitioning the phenotypic variance of an endosperm trait into various genetic and nongenetic (environmental) components. These variance components do not provide all the detailed information, such as the number, positions, and effects about the underlying quantitative trait loci (QTL). To unlock this QTL information, the ideas of the traditional QTL mapping methods utilizing the well-distributed genetic markers along the genome to infer the QTL parameters can be used. The traditional QTL mapping methods use the information about traits and markers from the same generation, e.g., backcross or F2 populations, to detect QTL controlling traits in diploid organisms (LANDER and BOTSTEIN 1989; HALEY and KNOTT 1992; JANSEN 1993; ZENG 1994; KAO et al. 1999; KAO and ZENG 2002). Although they are designed for traits under diploid control, some researchers have applied them to mapping for QTL controlling endosperm traits (TAN et al. 1999; WANG and LARKINS 2001; WANG et al. 2001). Such application implicitly relies on an invalid assumption that the endosperm traits are directly controlled by the diploid maternal genomes, not by the triploid endosperm genomes. Consequently, the traditional QTL mapping methods have limited power and precision in mapping endosperm traits (WU et al. 2002a).
WU et al. (2002a,b) and XU et al. (2003) pioneered statistical methods to map endosperm traits by taking the triploid nature of endosperms into account using the marker information from the maternal plants (one-stage design) in the backcross or F2 population. WU et al. (2002a) further proposed a triploid QTL mapping method that uses the marker information from both the maternal plants and their embryos (two-stage design) to improve the mapping of endosperm traits in the backcross population. Their methods have been shown to provide improved QTL resolution. Because these methods consider only one (or two) QTL at a time in the model, they can bias QTL identification and estimation when multiple QTL are located in the same linkage group (LANDER and BOTSTEIN 1989; JANSEN 1993; ZENG 1994). To deal with these problems and further improve endosperm trait mapping, one potential way is to extend the current one-QTL model to a multiple-QTL model so that more genetic variation can be controlled in the model, as has been done in mapping traits in diploid tissues (KAO and ZENG 1997; KAO et al. 1999; ZENG et al. 1999). In this article, a triploid method based on multiple-interval mapping (MIM), which uses multiple marker intervals simultaneously to fit multiple putative QTL into the model, is developed to achieve these purposes. This MIM-based triploid method can handle either the one- or the two-stage design in either the backcross or the F2 population. As shown in this article, the proposed method can detect QTL responsible for endosperm traits with more power and better precision, and it can readily analyze and search for epistatic QTL because of its multiple-QTL approach.
Besides, some related issues in mapping endosperm traits, such as the problems of using the diploid methods, the differences and relation between the diploid and triploid methods, the genetic variance components of endosperm traits, and the problems if QTL effects are present and ignored, are also investigated. A series of simulation studies was performed to further investigate these issues, to examine the relative efficiency of different experimental designs, and to evaluate the performance of the MIM-based method as compared to the current methods in mapping endosperm traits.
GENETIC MODEL OF ENDOSPERM TRAITS
Genetic model:
For individuals in a backcross or F2 population of autogamous plants, the endosperm tissues of their seeds can have four possible genotypes, QQQ, QQq, Qqq, and qqq, if only one QTL Q is considered (APPENDIX B). Some genetic models for defining the genetic parameters and modeling the relationship between their genotypic values and the genetic parameters already exist (e.g., GALE 1976; MO 1987; BOGYO et al. 1988; POONI et al. 1992; ZHU and WEIR 1994). Here, the genetic model by Bogyo et al. is adopted for modeling, and it can be expressed in matrix notation as
$$\begin{bmatrix} G_1 \\ G_2 \\ G_3 \\ G_4 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}\mu + \begin{bmatrix} 3/2 & 0 & 0 \\ 1/2 & 1 & 0 \\ -1/2 & 0 & 1 \\ -3/2 & 0 & 0 \end{bmatrix} \begin{bmatrix} a \\ d_1 \\ d_2 \end{bmatrix} \qquad (1)$$
where the notations G1, G2, G3, and G4 denote the genotypic values of genotypes QQQ, QQq, Qqq, and qqq, respectively, and a, d1, and d2 are the genetic parameters. In Equation 1, the matrix with 4 x 3 dimension is called a genetic design matrix as it specifies the relationship between the genotypic values and genetic parameters, and it is symbolized by D. The unique solutions of µ, a, d1, and d2 in terms of the genotypic values are
$$\mu = \frac{G_1 + G_4}{2}, \quad a = \frac{G_1 - G_4}{3}, \quad d_1 = G_2 - \frac{2G_1 + G_4}{3}, \quad d_2 = G_3 - \frac{G_1 + 2G_4}{3}.$$
The parameter µ obviously is not a measure of mean genotypic values as the genotypic values of QQq and Qqq are ignored. The parameter a, which measures the average effect of substituting Q for q, is defined as the additive effect, and the parameter d1 (d2), which measures the departure of the substitution effect in QQ (qq) background, is defined as the first (second) dominance effect. The genetic model can be expressed more succinctly as
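$$G = \mu + a x + d_1 z_1 + d_2 z_2 \qquad (2)$$
where the coded variables are defined as
$$x = \begin{cases} 3/2 & \text{if QQQ} \\ 1/2 & \text{if QQq} \\ -1/2 & \text{if Qqq} \\ -3/2 & \text{if qqq,} \end{cases} \qquad z_1 = \begin{cases} 1 & \text{if QQq} \\ 0 & \text{otherwise,} \end{cases} \qquad z_2 = \begin{cases} 1 & \text{if Qqq} \\ 0 & \text{otherwise,} \end{cases}$$
such that each genotype corresponds to its genotypic value. If different genetic models are used for modeling, they can also be expressed as in Equations 1 and 2, but note that the parameters may have different meanings and the variance components may have a different structure.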
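The one-locus model can be checked numerically. The following is a minimal sketch, assuming the coding x = 3/2, 1/2, -1/2, -3/2 with z1 and z2 as indicators of QQq and Qqq (a coding that reproduces the variances quoted in the next section); the function names are illustrative and not part of the author's FORTRAN program.

```python
from fractions import Fraction as F

# Assumed coding of (x, z1, z2) for the four endosperm genotypes.
CODING = {
    "QQQ": (F(3, 2), 0, 0),
    "QQq": (F(1, 2), 1, 0),
    "Qqq": (F(-1, 2), 0, 1),
    "qqq": (F(-3, 2), 0, 0),
}

def genotypic_values(mu, a, d1, d2):
    """Return G = mu + a*x + d1*z1 + d2*z2 for each endosperm genotype."""
    return {g: mu + a * x + d1 * z1 + d2 * z2 for g, (x, z1, z2) in CODING.items()}

def solve_parameters(G):
    """Invert the model: recover (mu, a, d1, d2) from the four genotypic values."""
    g1, g2, g3, g4 = (G[k] for k in ("QQQ", "QQq", "Qqq", "qqq"))
    mu = (g1 + g4) / 2
    a = (g1 - g4) / 3
    d1 = g2 - (2 * g1 + g4) / 3
    d2 = g3 - (g1 + 2 * g4) / 3
    return mu, a, d1, d2

# Example: a = 1, d1 = d2 = 2 with mu = 0.
G = genotypic_values(mu=0, a=1, d1=2, d2=2)
print(G)
print(solve_parameters(G))
```

Exact fractions are used so that the round trip from parameters to genotypic values and back can be compared without floating-point tolerance.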
The extension of the one-locus genetic model in Equation 1 to multiple, say m, loci is straightforward. Consider m QTL, Q1, Q2, ... , and Qm, each with four genotypes and three genetic parameters. Together, for m QTL, there are $4^m$ possible different QTL genotypes and $3m$ parameters if epistasis between QTL is not considered or $3m(3m - 1)/2$ parameters if only up to digenic epistasis is considered. The columns for epistasis can easily be obtained from the product of columns of marginal effects. By expanding the genetic design matrix D of Equation 1 to a $4^m \times 3m$ or $4^m \times 3m(3m - 1)/2$ matrix (see THE MIM MODEL FOR MAPPING ENDOSPERM TRAITS), the genetic model for m QTL in matrix notation can be obtained. The genetic design matrix D plays an important role in the estimation of the QTL effects in the triploid MIM model. The corresponding multiple-QTL model in the form of Equation 2 can be easily obtained using a regression principle. Following the regression principle, the genetic model of m QTL by considering up to digenic epistasis can be written as
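$$G = \mu + \sum_{j=1}^{m} \left( a_j x_j + d_{j1} z_{j1} + d_{j2} z_{j2} \right) + \sum_{j<k} \left( i_{a_j a_k} x_j x_k + i_{d_{j1} d_{k1}} z_{j1} z_{k1} + i_{d_{j1} d_{k2}} z_{j1} z_{k2} + i_{d_{j2} d_{k1}} z_{j2} z_{k1} + i_{d_{j2} d_{k2}} z_{j2} z_{k2} \right) + \sum_{j \neq k} \left( i_{a_j d_{k1}} x_j z_{k1} + i_{a_j d_{k2}} x_j z_{k2} \right) \qquad (3)$$
where µ is the intercept; $a_j$, $d_{j1}$, and $d_{j2}$ are the additive and dominance effects of Qj; $i_{a_j a_k}$, $i_{a_j d_{k1}}$, $i_{a_j d_{k2}}$, $i_{d_{j1} d_{k1}}$, $i_{d_{j1} d_{k2}}$, $i_{d_{j2} d_{k1}}$, and $i_{d_{j2} d_{k2}}$ denote the epistatic effects between QTL; and $x_j$, $z_{j1}$, and $z_{j2}$ are the coded variables of the additive and dominance effects for Qj.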
Variance components:
Consider only one QTL in the genetic model. It is easy to show that the variance of the additive variable, V(x), is 19/16, and the variances of the two dominance variables, V(z1) and V(z2), are both 7/64, in a backcross population. In an F2 population, these variances are 7/4, 7/64, and 7/64, respectively. The covariances between the variables, Cov(x, z1), Cov(x, z2), and Cov(z1, z2), are 5/32, 1/32, and -1/64, respectively, in the backcross population, and they are 1/16, -1/16, and -1/64, respectively, in the F2 population. Therefore, the genetic variance components of an endosperm trait are
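$$V_G = \frac{19}{16} a^2 + \frac{7}{64} d_1^2 + \frac{7}{64} d_2^2 + \frac{5}{16} a d_1 + \frac{1}{16} a d_2 - \frac{1}{32} d_1 d_2 \qquad (4)$$
in the backcross population, and they are
$$V_G = \frac{7}{4} a^2 + \frac{7}{64} d_1^2 + \frac{7}{64} d_2^2 + \frac{1}{8} a d_1 - \frac{1}{8} a d_2 - \frac{1}{32} d_1 d_2 \qquad (5)$$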
in the F2 population. Equations 4 and 5 show that each effect contributes not only to its own variance but also to the covariances with other effects, and that the relative importance of effects in contributing to the total genetic variance depends not only on their sizes but also on their associated coefficients (the variance or covariance of their coded variables). When m QTL each with complete effects are considered together, the genetic variance has $[9m^2(3m - 1)^2 + 6m(3m - 1)]/8$ components. For example, the total genetic variance has 120 components for m = 2 in both populations (not shown), and it reduces to 111 components in the backcross population and 83 components in the F2 population when the two QTL are unlinked (APPENDIX A). Among the coefficients of the variances involving the epistatic effects, the coefficients associated with the additive-by-additive effect are relatively much larger than those of other variances and covariances. For example, in the F2 population, the coefficient of $i_{a_1 a_2}^2$ (the variance of $x_1 x_2$) is 49/16 (APPENDIX A); i.e., the variance contributed by $i_{a_1 a_2}$ is $49/16 \times i_{a_1 a_2}^2$, the coefficients of the other four epistatic variances involving the additive effects are 7/32, and the coefficients of the remaining four different types of epistatic variance are 63/4096. The coefficients of the covariances between the additive effects and the epistatic effects involving the additive effects are 7/32, and the coefficients of the covariances between $i_{a_1 a_2}$ and the other epistatic effects involving the additive effects are 7/64. The other covariances are relatively smaller. This implies that, for the same order of the epistatic effects, the epistatic effects involving the additive effects, especially the additive-by-additive effect, are relatively easy to detect, and the other epistatic effects are relatively difficult to detect in practical QTL mapping (with a limited sample size). A similar pattern can also be found in the backcross population. For two nonepistatic QTL, the variance components are
where r12 is the recombination fraction between the two QTL, in the F2 population. Similarly, the variance components for the backcross population also have 21 terms (not shown). If the two nonepistatic QTL are unlinked, the variance components reduce to a much simpler form with the first 12 components.
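The one-locus coefficients quoted in this section can be reproduced from the endosperm genotype frequencies. The sketch below is illustrative only: it assumes selfed (autogamous) plants, a backcross to the qq parent, and the coding x = 3/2, 1/2, -1/2, -3/2 with indicator variables z1 (QQq) and z2 (Qqq); the helper names are hypothetical.

```python
from fractions import Fraction as F

X = {"QQQ": F(3, 2), "QQq": F(1, 2), "Qqq": F(-1, 2), "qqq": F(-3, 2)}
Z1 = {"QQQ": 0, "QQq": 1, "Qqq": 0, "qqq": 0}
Z2 = {"QQQ": 0, "QQq": 0, "Qqq": 1, "qqq": 0}

# Assumed endosperm genotype frequencies: a selfed Qq plant gives the four
# genotypes with probability 1/4 each; homozygous plants breed true.
FREQ = {
    "backcross": {"QQQ": F(1, 8), "QQq": F(1, 8), "Qqq": F(1, 8), "qqq": F(5, 8)},
    "F2": {"QQQ": F(3, 8), "QQq": F(1, 8), "Qqq": F(1, 8), "qqq": F(3, 8)},
}

def moment(freq, *variables):
    """Expectation of the product of the given coded variables."""
    total = F(0)
    for g, p in freq.items():
        term = p
        for v in variables:
            term *= v[g]
        total += term
    return total

def cov(freq, u, v):
    """Covariance of two coded variables at one locus."""
    return moment(freq, u, v) - moment(freq, u) * moment(freq, v)

def product_variance(freq, u, v):
    """Variance of u (locus 1) times v (locus 2) for two unlinked loci."""
    return moment(freq, u, u) * moment(freq, v, v) - (moment(freq, u) * moment(freq, v)) ** 2

for pop, freq in FREQ.items():
    print(pop, cov(freq, X, X), cov(freq, Z1, Z1), cov(freq, X, Z1),
          cov(freq, X, Z2), cov(freq, Z1, Z2))
print(product_variance(FREQ["F2"], X, X), product_variance(FREQ["F2"], X, Z1),
      product_variance(FREQ["F2"], Z1, Z1))
```

The last line covers the three epistatic coefficients cited for unlinked QTL in the F2 population; the linked two-QTL case would need the joint distribution, which this sketch does not attempt.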
THE RELATION BETWEEN THE DIPLOID AND TRIPLOID METHODS
The traditional QTL mapping methods are usually designed to map for QTL controlling traits in diploid organisms (LANDER and BOTSTEIN 1989; HALEY and KNOTT 1992; JANSEN 1993; ZENG 1994; KAO et al. 1999; KAO and ZENG 2002). These diploid methods classify the genotypes of each QTL into two groups, QQ (qq) and Qq, for the backcross population or three groups, qq, Qq, and QQ, for the F2 population, and they detect the association between the QTL genotype and the trait value both measured at the same generation for QTL mapping. Although the endosperms are known to be triploid and represent the next generation, some researchers have applied these diploid methods to mapping endosperm traits of the backcross or F2 individuals (TAN et al. 1999; WANG et al. 2001; WU et al. 2002a). Therefore, it is important to investigate the problems of using the diploid methods and the relation between the diploid and triploid methods in mapping endosperm traits.
Diploid methods:
When applying the diploid methods to mapping endosperm traits and only one QTL is considered, the statistical model for n endosperms in the backcross population can be written as
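$$y_i = \mu + b\, w_i^* + \varepsilon_i \qquad (6)$$
where $w_i^*$ is coded as
$$w_i^* = \begin{cases} 1/2 & \text{if the QTL genotype is Qq} \\ -1/2 & \text{if the QTL genotype is qq,} \end{cases}$$
$y_i$ is the endosperm trait value; µ is the intercept; and b is the QTL effect measuring the genotypic difference between Qq and qq. The statistical model for n endosperms in the F2 population can be expressed as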
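$$y_i = \mu + b_a w_{ai}^* + b_d w_{di}^* + \varepsilon_i \qquad (7)$$
where $w_{ai}^*$ and $w_{di}^*$ are defined as
$$w_{ai}^* = \begin{cases} 1 & \text{if QQ} \\ 0 & \text{if Qq} \\ -1 & \text{if qq,} \end{cases} \qquad w_{di}^* = \begin{cases} 1/2 & \text{if Qq} \\ -1/2 & \text{otherwise,} \end{cases}$$
and $b_a$ and $b_d$ denote the additive and dominance effects. The residual error $\varepsilon_i$ in the above two models is assumed to have a normal distribution with mean zero and variance $\sigma^2$. As QTL may not be coincident with markers, the QTL genotype is usually unobservable. Therefore, the likelihood of the diploid model is a mixture of normals,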
$$L = \prod_{i=1}^{n} \left[ \sum_{j=1}^{k} p_{ij}\, \frac{1}{\sigma}\, \phi\!\left( \frac{y_i - \mu_j}{\sigma} \right) \right] \qquad (8)$$
where $\phi(\cdot)$ is the standard normal density, the $\mu_j$'s correspond to the genotypic values of the k different QTL genotypes (k = 2 for the backcross model and k = 3 for the F2 model), and the mixing proportions, $p_{ij}$'s, are the conditional probabilities of QTL genotypes (see Tables 1 and 2 in KAO and ZENG 1997). The maximum-likelihood estimate (MLE) of the QTL effects and their asymptotic variance-covariance matrix can be obtained using the EM algorithm (DEMPSTER et al. 1977) and LOUIS's (1982) method by treating the normal mixture model as an incomplete-data problem.
TABLE 1 Simulation results of using diploid and triploid methods under different experimental designs and parameter settings with heritability 0.1
TABLE 2 Simulation results of using the diploid and triploid methods under different experimental designs and parameter settings with heritability 0.2
The relation between the diploid and triploid models:
When applying the diploid models to mapping endosperm traits, it is generally assumed that the endosperm traits are directly controlled by the diploid genomes of the backcross or F2 individuals. This assumption, however, violates the fact that the triploid endosperms represent the genetic composition of the next generation, which, in fact, is mainly responsible for the trait variation. Consequently, as compared to the use of the triploid model, some problems, such as less power and precision in QTL detection, will occur in the diploid model as shown below.
When an endosperm trait affected only by one QTL, Q, is regressed on a marker M along the genome to infer Q, the regression coefficient of M in the backcross diploid model is
$$b_M = (1 - 2 r_{QM}) \left[ \frac{3}{2} a + \frac{1}{4} (d_1 + d_2) \right] \qquad (9)$$
where $r_{QM}$ is the recombination fraction between M and Q, in the backcross population (APPENDIX B). If the marker M is coincident with Q ($r_{QM} = 0$), the coefficient reduces to $b_M = \frac{3}{2} a + \frac{1}{4}(d_1 + d_2)$. The estimated coefficient of the backcross diploid model is composed of the additive effect and two dominance effects. In the F2 diploid model, the regression coefficient for the additive effect of M is
$$b_{aM} = (1 - 2 r_{QM})\, \frac{3}{2} a \qquad (10)$$
and the coefficient for the dominance effect is
$$b_{dM} = (1 - 2 r_{QM})^2\, \frac{1}{4} (d_1 + d_2) \qquad (11)$$
If M and Q are coincident, the additive coefficient reduces to $b_{aM} = \frac{3}{2} a$ and the dominance coefficient reduces to $b_{dM} = \frac{1}{4}(d_1 + d_2)$. The additive coefficient estimated in the F2 diploid model is 1.5 times the additive effect, and the estimated dominance coefficient is one-quarter of the sum of the two dominance effects. When both the additive and dominance variables are fitted in the model, the partial regression coefficients are the same as Equations 10 and 11 because of orthogonality. The above derivations present the relation of parameters between the diploid and triploid models and show that the diploid models cannot directly estimate the QTL effects in mapping endosperm traits.
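A short numerical illustration of these relations follows. It is a sketch under two assumptions: the usual attenuation by (1 - 2r) for the backcross and F2 additive coefficients and (1 - 2r)^2 for the F2 dominance coefficient, and the four simulation settings written with the signs that reproduce the expected backcross values 0.5, 2.5, 1.0, and 1.5 reported in the simulation study. Function names are hypothetical.

```python
from fractions import Fraction as F

def backcross_diploid_coef(a, d1, d2, r=0):
    """Coefficient of marker M in the backcross diploid model (assumed form)."""
    return (1 - 2 * r) * (F(3, 2) * a + F(1, 4) * (d1 + d2))

def f2_diploid_coefs(a, d1, d2, r=0):
    """Additive and dominance coefficients of M in the F2 diploid model (assumed form)."""
    additive = (1 - 2 * r) * F(3, 2) * a
    dominance = (1 - 2 * r) ** 2 * F(1, 4) * (d1 + d2)
    return additive, dominance

# (a, d1, d2) for the four one-QTL settings; the signs are an assumption.
SETTINGS = [(1, -2, -2), (1, 2, 2), (1, -2, 0), (1, 0, 0)]

for s in SETTINGS:
    print(s, backcross_diploid_coef(*s), f2_diploid_coefs(*s))
```

With a marker on top of the QTL, the diploid coefficients mix the additive and dominance effects of the endosperm QTL, which is why they cannot be read as estimates of a, d1, or d2 individually.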
The phenotypic variance conditional on the marker M in the backcross diploid model is
$$\sigma^2_{y|M} = \sigma^2 + \frac{5}{8} a^2 + \frac{3}{32} (d_1^2 + d_2^2) + \frac{1}{8} a d_1 - \frac{1}{8} a d_2 - \frac{1}{16} d_1 d_2 \qquad (12)$$
when M is coincident with Q (APPENDIX B). It shows that the genetic variances and covariances contributed by the additive and dominance effects cannot be fully controlled in the model. The diploid model controls only ~47.4% (9/19) of the additive variance and ~14.3% (1/7) of the dominance variance, leaving ~52.6% (10/19) and ~85.7% (6/7), respectively, uncontrolled. For the F2 population, the phenotypic variance conditional on the additive and dominance variables of marker M is the same as that in the backcross model (APPENDIX B). There the diploid model controls ~64.3% (9/14) of the additive variance and ~14.3% (1/7) of the dominance variance, leaving ~35.7% (5/14) and ~85.7% (6/7), respectively, uncontrolled. In addition, a part of the genetic covariances is also uncontrolled by the diploid model. The uncontrolled variances and covariances will become a part of the genetic residual, causing inflation of the sampling variance of the coefficients. The sampling variance of the regression coefficient of the backcross model is
approximately $\sigma^2_{y|M} / (n \sigma^2_M)$, where $\sigma^2_M$ is the variance of the coded variable of M, in a large sample with size n (STUART and ORD 1991). Using the approximation, the sampling variances of the regression coefficients of the diploid and triploid models can be compared when M and Q are coincident ($r_{QM} = 0$). Taking a QTL with no dominance (a = 1, d1 = d2 = 0) and contributing 10% of the trait variation as an example, the conditional phenotypic variance roughly equals $\sigma^2$ for the triploid model, and it is ~181/171 x $\sigma^2$ for the diploid model. The variances $\sigma^2_M$ are 1/4 for the diploid model and 19/16 for the triploid model. Consequently, the sampling variance of the regression coefficient for the diploid model is ~5.03 times that for the triploid model in the backcross population. It is ~3.64 times that for the same setting in the F2 population. The sampling variances of the regression coefficients in the diploid models are larger than those in the triploid model.
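The two ratios can be recomputed from the quantities given in the text. The sketch below assumes an additive QTL (a = 1) explaining 10% of the trait variance, a marker coincident with the QTL, an uncontrolled genetic variance of 5/8 a^2 in the diploid model, and marker-variable variances of 1/4 (backcross) and 1/2 (F2, additive coding 1, 0, -1); the function name is hypothetical.

```python
from fractions import Fraction as F

def sampling_variance_ratio(var_x, var_marker, uncontrolled, h2=F(1, 10), a=1):
    """Ratio of diploid to triploid sampling variance of the regression coefficient."""
    genetic = var_x * a ** 2                 # genetic variance, additive QTL only
    sigma2 = genetic * (1 - h2) / h2         # environmental variance implied by h2
    diploid = (sigma2 + uncontrolled * a ** 2) / var_marker
    triploid = sigma2 / var_x
    return diploid / triploid                # the common factor 1/n cancels

backcross = sampling_variance_ratio(F(19, 16), F(1, 4), F(5, 8))
f2 = sampling_variance_ratio(F(7, 4), F(1, 2), F(5, 8))
print(backcross, float(backcross))
print(f2, float(f2))
```

The inflation has two sources: the residual variance of the diploid model is larger, and its coded marker variable has a smaller variance than the triploid additive variable.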
On the basis of the above findings, two problems will occur if the diploid models are applied to mapping endosperm traits. First, the estimates in the diploid models are generally confounded by the additive and dominance effects of endosperm QTL (Equations 9-11). Second, the sampling variances of the estimates will inflate because the genetic variances and covariances contributed by QTL are not fully controlled in the model. Consequently, the diploid models cannot directly estimate the effects of the endosperm QTL; their estimates are confounded, and their power in endosperm QTL detection is reduced.
THE MIM MODEL FOR MAPPING ENDOSPERM TRAITS
Endosperm trait multiple-interval mapping:
Assume an endosperm trait is controlled by m QTL, Q1, Q2, ... , and Qm, located at positions p1, p2, ... , and pm, in m different marker intervals, I1, I2, ... , and Im, along the genome. If only up to digenic epistasis is considered, the value of an endosperm trait, yi, in the backcross or F2 population can be related to the m putative QTL by the model
$$y_i = \mu + \sum_{j=1}^{m} \left( a_j x_{ij} + d_{j1} z_{ij1} + d_{j2} z_{ij2} \right) + \sum_{j<k} \left( i_{a_j a_k} x_{ij} x_{ik} + i_{d_{j1} d_{k1}} z_{ij1} z_{ik1} + i_{d_{j1} d_{k2}} z_{ij1} z_{ik2} + i_{d_{j2} d_{k1}} z_{ij2} z_{ik1} + i_{d_{j2} d_{k2}} z_{ij2} z_{ik2} \right) + \sum_{j \neq k} \left( i_{a_j d_{k1}} x_{ij} z_{ik1} + i_{a_j d_{k2}} x_{ij} z_{ik2} \right) + \varepsilon_i \qquad (13)$$
where the parameters and coded variables have the same definitions as those in the genetic model in Equation 3, and the residual error $\varepsilon_i$ is assumed to follow a normal distribution with mean zero and variance $\sigma^2$. In QTL mapping, the endosperm QTL genotype of any putative QTL, say Qj, j = 1, 2, ... , m, is usually not observable and could be QjQjQj, QjQjqj, Qjqjqj, or qjqjqj with different (conditional) probabilities for different endosperm i. The conditional probabilities (distribution) for each Qj under different experimental designs can be derived by using its flanking marker information from the maternal plants (and their embryos) as shown below, and then the normal mixture likelihood of the model can be constructed. As multiple (m) intervals are used to infer the conditional distribution of the (m) endosperm QTL for modeling, this approach is called multiple-interval mapping in QTL mapping (KAO and ZENG 1997; KAO et al. 1999), and this model is a MIM-based triploid model. By specifying appropriate conditional probabilities to the $4^m$ endosperm QTL genotypes of the m QTL, this triploid MIM model can be applied widely to mapping endosperm traits using data from various designs and populations.
Likelihood:
For any interval, Ij, flanked by the two markers, Mj and Nj, the maternal plants or their embryos can have four and nine different marker genotypes in the backcross and F2 populations, respectively. If both the plants and embryos are considered together, their marker genotypes can have 16 and 25 combinations in the two different populations, respectively (APPENDIX C). For any Qj in Ij, the (conditional) probabilities of the four endosperm QTL genotypes can be inferred only from the maternal plants (one-stage design) or from both the maternal plants and their embryos (two-stage design) as shown in APPENDIX C. To assist with explaining the parameter estimation, these conditional probabilities are extracted to form a matrix $\mathbf{Q}_j$, j = 1, 2, ... , m. The dimension of $\mathbf{Q}_j$ is 25 x 4 (16 x 4) for a two-stage design in the F2 (backcross) population, and it is 9 x 4 (4 x 4) for a one-stage design in the F2 (backcross) population (note that italic $Q$ denotes a QTL, and bold $\mathbf{Q}$ denotes the conditional probability matrix). For the total m QTL in the m different intervals, there are $4^m$ possible endosperm QTL genotypes in each of $25^m$ ($16^m$, $9^m$, or $4^m$) possible marker genotypes. The $4^m$ joint conditional probabilities of endosperm QTL genotypes can be obtained by the product of individual conditional probabilities for each QTL using the property of conditional independence among different QTL (KAO and ZENG 1997), and they play the role of mixing proportions in the normal mixture likelihood. Let the conditional probabilities of the $4^m$ possible QTL genotypes for endosperm i from the various designs and populations be denoted as $p_{ij}$'s, j = 1, 2, ... , $4^m$ (note that $p_j$'s denote QTL positions, and $p_{ij}$'s denote the conditional probabilities). The likelihood of the triploid MIM model for the n endosperms is a mixture of $4^m$ normals as
$$L = \prod_{i=1}^{n} \left[ \sum_{j=1}^{4^m} p_{ij}\, \frac{1}{\sigma}\, \phi\!\left( \frac{y_i - \mu_j}{\sigma} \right) \right] \qquad (14)$$
where $\phi(\cdot)$ is the standard normal density, the $\mu_j$'s correspond to the genotypic values of the $4^m$ different QTL genotypes, and the mixing proportions, $p_{ij}$'s, are the corresponding joint conditional probabilities. The density of each individual i is a mixture of $4^m$ possible normal densities with different means, $\mu_j$'s, and mixing proportions, $p_{ij}$'s. The general formulas proposed by KAO and ZENG (1997) are used to obtain the MLE of the effects and their asymptotic variance-covariance matrix.
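To make the structure of the mixture likelihood concrete, the following sketch evaluates its logarithm for given mixing proportions and genotypic values. It is an illustration only, not the estimation procedure of KAO and ZENG (1997); the function names and the toy data are hypothetical.

```python
import math

def normal_pdf(y, mean, sigma):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (y - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_loglik(y, P, mu, sigma):
    """Log-likelihood of a normal mixture.

    y     : trait values, one per endosperm
    P     : P[i][j] is the conditional probability of QTL genotype j for endosperm i
    mu    : mu[j] is the genotypic value of QTL genotype j
    sigma : residual standard deviation
    """
    total = 0.0
    for yi, pi in zip(y, P):
        density = sum(p * normal_pdf(yi, m, sigma) for p, m in zip(pi, mu))
        total += math.log(density)
    return total

# Toy example with one QTL (four endosperm genotypes) and three endosperms.
mu = [1.5, 0.5, -0.5, -1.5]          # hypothetical genotypic values
P = [[0.7, 0.1, 0.1, 0.1],           # hypothetical mixing proportions
     [0.25, 0.25, 0.25, 0.25],
     [0.1, 0.1, 0.1, 0.7]]
y = [1.2, 0.1, -1.8]
print(mixture_loglik(y, P, mu, sigma=1.0))
```

In an EM iteration the same per-genotype terms, normalized within each endosperm, would give the posterior genotype probabilities.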
Parameter estimation:
The application of the general formulas to obtain the MLE and the asymptotic variance-covariance matrix for the triploid MIM model is based on the construction of the two matrices D and Q, where D is the genetic design matrix for characterizing the QTL effects, and Q is the conditional probability matrix containing the mixing proportions of QTL genotypes. Given the two matrices, the MLE of QTL effects and their asymptotic variance-covariance matrix of the triploid model can be easily obtained. The construction of the D and Q matrices is described below.
For one QTL (m = 1) in the model, there are four endosperm QTL genotypes and three genetic effects, and the genetic design matrix is a 4 x 3 matrix as shown in Equation 1. For m QTL in the model, if epistasis between QTL effects is not considered, there are $4^m$ endosperm QTL genotypes and $3m$ genetic effects (m additive effects, m first dominance effects, and m second dominance effects), and the genetic design matrix is then a $4^m \times 3m$ matrix. If all the possible digenic epistases between QTL are considered, the column dimension of D becomes $3m(3m - 1)/2$. An example of a genetic design matrix with m = 2 and all possible effects (with dimension 16 x 15) can be found in WU et al. (2002a). The joint conditional probability matrix $\mathbf{Q}$ for the m QTL has a dimension $9^m \times 4^m$ ($4^m \times 4^m$) or $25^m \times 4^m$ ($16^m \times 4^m$) under the one- or two-stage design in the F2 (backcross) population, and it can be obtained by $\mathbf{Q} = \mathbf{Q}_1 \otimes \mathbf{Q}_2 \otimes \cdots \otimes \mathbf{Q}_m$, where $\otimes$ denotes the Kronecker product. The $4^m$ mixing proportions of any endosperm i, $p_{ij}$'s, in the likelihood can be found to be one of the rows in $\mathbf{Q}$ according to the marker genotypes of the plants (and embryos). By applying the matrices D and $\mathbf{Q}$ to the general formulas, the MLE of the effects and their asymptotic variance-covariance matrix can be readily obtained.
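The construction of the two matrices can be sketched in a few lines. The example below builds the design matrix without epistatic columns for m QTL and combines per-QTL conditional probability matrices with a Kronecker product. The coding of (x, z1, z2) is an assumption, the probability matrices are invented placeholders (real entries come from APPENDIX C), and the function names are hypothetical.

```python
from itertools import product

# Assumed coding (x, z1, z2) for genotypes QQQ, QQq, Qqq, qqq.
CODING = [(1.5, 0, 0), (0.5, 1, 0), (-0.5, 0, 1), (-1.5, 0, 0)]

def design_matrix(m):
    """Return the 4^m x 3m design matrix D (marginal effects only)."""
    rows = []
    for combo in product(range(4), repeat=m):   # genotype of the last QTL varies fastest
        rows.append([v for g in combo for v in CODING[g]])
    return rows

def kron(A, B):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

# Placeholder conditional probability matrices with two marker classes each.
Q1 = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.1, 0.7]]
Q2 = [[0.4, 0.2, 0.2, 0.2], [0.2, 0.2, 0.2, 0.4]]

D = design_matrix(2)        # 16 x 6
Q = kron(Q1, Q2)            # 4 x 16, rows index joint marker classes

# Expected genotypic deviation for each joint marker class (hypothetical effects).
effects = [1.0, 0.0, 0.0, 0.5, 0.0, 0.0]       # (a1, d11, d12, a2, d21, d22)
genotypic = [sum(d * e for d, e in zip(row, effects)) for row in D]
expected = [sum(p * g for p, g in zip(row, genotypic)) for row in Q]
print(len(D), len(D[0]), len(Q), len(Q[0]))
print(expected)
```

The row order of D and the column order of the Kronecker product agree because both let the genotype of the last QTL vary fastest, so each row of Q can be multiplied directly against the genotypic values.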
The problems if effects are present and ignored:
Three marginal genetic effects are associated with each endosperm QTL. In practice, QTL may display all or some of the effects (see WU et al. 2002b as an example), and, before mapping, it is not known which effects are present or absent. The possible drawback of fitting the absent effects (overfitting) in the model is the loss of power in QTL detection, as higher critical value is usually required to claim the significance of QTL. If some displayed effects are ignored in the model, not only the power of detection will be affected but also the confounding problem will occur as discussed below.
Assume the endosperm trait value y is affected by two nonepistatic endosperm QTL, Q1 and Q2. When the trait value is regressed on Q1 by fitting only its additive variable x1 into the model, the regression coefficients in terms of the QTL effects and linkage parameter for the backcross and F2 populations are shown in APPENDIX D. They show that the estimate of the additive effect of Q1 is biased for a1 and is confounded by its other effects and by the effects of Q2. The confounding by the Q2 effects arises through the linkage parameter. If Q1 and Q2 are unlinked, the regression coefficients reduce to much simpler forms without the confounding of Q2. For example, if $r_{12} = 0.5$, the coefficient of x1 is
$$a_1 + \frac{5}{38} d_{11} + \frac{1}{38} d_{12}$$
for the backcross population, and
$$a_1 + \frac{1}{28} d_{11} - \frac{1}{28} d_{12}$$
for the F2 population. The confounding of Q2 disappears, and the coefficient is confounded only by the dominance effects of Q1. The same confounding problem can also be found for the estimate of the dominance effect if only its dominance variable z1 is fitted in the model (APPENDIX D). If epistasis is present and ignored in the model, most of the epistatic effects will be confounded in the estimation as most of the covariances between the marginal and epistatic effects are not zero whether the QTL are linked or not (result not shown). To avoid the confounding problem and enhance the detection power, it is desirable to fit only those displayed effects into the model in QTL mapping.
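For the unlinked case, the coefficient obtained when only x1 is fitted follows from the one-locus variances and covariances alone. This is a minimal sketch assuming V(x), Cov(x, z1), and Cov(x, z2) as stated in the variance-components section (with Cov(x, z2) negative in the F2 population); the dictionary and function names are hypothetical.

```python
from fractions import Fraction as F

# One-locus moments of the coded variables (assumed values and signs).
MOMENTS = {
    "backcross": {"V(x)": F(19, 16), "Cov(x,z1)": F(5, 32), "Cov(x,z2)": F(1, 32)},
    "F2": {"V(x)": F(7, 4), "Cov(x,z1)": F(1, 16), "Cov(x,z2)": F(-1, 16)},
}

def additive_only_coefficient(a1, d11, d12, population):
    """Regression coefficient of x1 when z11 and z12 are left out (unlinked Q2)."""
    m = MOMENTS[population]
    return (a1
            + d11 * m["Cov(x,z1)"] / m["V(x)"]
            + d12 * m["Cov(x,z2)"] / m["V(x)"])

print(additive_only_coefficient(1, 2, 2, "backcross"))   # biased upward
print(additive_only_coefficient(1, 2, 2, "F2"))          # biases cancel when d11 = d12
print(additive_only_coefficient(1, -2, 0, "F2"))
```

The F2 case shows that the bias vanishes only when the two dominance effects are equal, since their contributions enter with opposite signs.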
SIMULATION STUDY
A series of simulations was performed to achieve three purposes: (1) to verify the derived relations and compare the differences between the diploid and triploid models, (2) to examine the performance of the triploid method in different experimental designs and populations, and (3) to evaluate the performance of the proposed MIM-based triploid method as compared to the current methods in mapping endosperm traits. The simulation study includes two parts. The first part addresses the first two purposes, and the second part addresses the third purpose. In each part, the sample size is assumed to be 200. The first part assumes one QTL affecting the endosperm trait with two levels of heritability (h2), 0.1 and 0.2. It includes four different parameter settings: (1) a = 1, d1 = -2, d2 = -2 (G1 = 3/2, G2 = -3/2, G3 = -5/2, and G4 = -3/2); (2) a = 1, d1 = 2, d2 = 2 (G1 = 3/2, G2 = 5/2, G3 = 3/2, and G4 = -3/2); (3) a = 1, d1 = -2, d2 = 0 (G1 = 3/2, G2 = -3/2, G3 = -1/2, and G4 = -3/2); and (4) a = 1, d1 = 0, d2 = 0 (G1 = 3/2, G2 = 1/2, G3 = -1/2, and G4 = -3/2). Among the four settings, the QTL genotypes are complete-recessive type in the first and third settings, and they are complete-dominance type in the second setting. For each setting, the QTL is placed in the middle of a chromosome with six 20-cM equally spaced markers, and data from both the one- and two-stage designs in the backcross and F2 populations were generated. The number of simulation replicates is 500. Both the diploid and triploid methods were used to detect the QTL using the generated data sets. The results are shown in Tables 1 and 2. The second part assumes three chromosomes each with six 20-cM equally spaced markers, and each chromosome contains only one QTL. The three unlinked QTL, QA, QB, and QC, are assumed to contribute 40% to the total trait variation together and to be located in the middle of the chromosomes. Data from the two-stage design in the F2 population were generated.
The parameter setting is a1 = 3, d11 = 3, and d12 = 3 for QA; a2 = 2.5, d21 = 4, and d22 = 4 for QB; and a3 = 1.5, d31 = 0, and d32 = 0 for QC. There is additive-by-additive interaction between QB and QC, and the epistatic effect ia2a3 is assumed to be 1. Under the parameter setting, the genetic and environmental variances are
38.37 and 51.66, respectively. In the total genetic variance, the marginal effects of the three QTL contribute
45.44, 36.32, and 10.26%, respectively, and the epistatic effect contributes
7.98%. In the genetic variance contributed by QA (QB), the variance contributed by the two dominance effects is
11.29% (25.11%). The number of simulation replicates is 100. Both the current triploid method considering only one QTL, i.e., the interval-mapping (IM)-based method, and the proposed MIM-based method were used to analyze the data. The results are shown in Table 3. In each scenario, permutation tests proposed by CHURCHILL and DOERGE (1994) were used to determine the critical values for power calculation.
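The permutation procedure used for the critical values can be sketched as follows. This is a minimal illustration, not the program used in the study: the single-marker statistic n·r² stands in for the LRT profile, and the function names, the marker coding, and the toy data are all hypothetical.

```python
import random

def marker_statistic(y, g):
    """n * r^2 for regressing trait y on one marker code g (a simple stand-in for an LRT)."""
    n = len(y)
    mean_y, mean_g = sum(y) / n, sum(g) / n
    syy = sum((a - mean_y) ** 2 for a in y)
    sgg = sum((b - mean_g) ** 2 for b in g)
    syg = sum((a - mean_y) * (b - mean_g) for a, b in zip(y, g))
    if syy == 0 or sgg == 0:
        return 0.0  # a constant trait or marker carries no information
    return n * syg * syg / (syy * sgg)

def permutation_threshold(y, markers, n_perm=1000, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of the genome-wide maximum statistic
    when trait values are shuffled against the marker data."""
    rng = random.Random(seed)
    shuffled = list(y)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any trait-marker association
        maxima.append(max(marker_statistic(shuffled, g) for g in markers))
    maxima.sort()
    index = min(n_perm - 1, int((1 - alpha) * n_perm))
    return maxima[index]

# Toy data (hypothetical): 200 individuals, 6 markers coded -1/0/1, no QTL.
rng = random.Random(1)
markers = [[rng.choice((-1, 0, 1)) for _ in range(200)] for _ in range(6)]
trait = [rng.gauss(0.0, 1.0) for _ in range(200)]
threshold = permutation_threshold(trait, markers, n_perm=200)
print(f"experimentwise 5% threshold: {threshold:.2f}")
```

Taking the maximum over all markers within each shuffle is what makes the resulting threshold experimentwise rather than pointwise.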
Tables 1 and 2 show the results of the first part of the simulation. The relationship between the estimates of the diploid and triploid models corresponds very well with the derived prediction (Equations 9–11). For the backcross population, the effects of the diploid models in the four settings are expected to be 0.5, 2.5, 1.0, and 1.5, according to Equation 9. The means of the estimates are found to be 0.610, 2.516, 1.040, and 1.521, respectively, for h2 = 0.1 (Table 1), and they are 0.599, 2.489, 1.005, and 1.475, respectively, for h2 = 0.2 (Table 2). For the F2 population, the means of the estimated additive and dominance effects in the diploid model are also found to be very close to the predicted values in both levels of heritability. For example, the mean of the estimated additive effects for the first setting with h2 = 0.1 is 1.499 (predicted value 1.5), and the mean of the estimated dominance effects for the second setting with h2 = 0.2 is 1.010 (predicted value 1.0). The estimated residual variance by the diploid model is found to be upwardly biased in all cases as expected by Equation 12.
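The predicted backcross values quoted above can be checked with a few lines of arithmetic. The sketch assumes the regression coefficient given in Appendix B, (1 − 2r)[3a/2 + (d1 + d2)/4], evaluated at the QTL (r = 0). It also assumes that d1 and d2 are negative in the settings where the additive and dominance effects act in opposite directions. The function names are illustrative only.

```python
from fractions import Fraction as F

def genotypic_values(a, d1, d2):
    """G1..G4 for QQQ, QQq, Qqq, qqq under the coding x = 3/2, 1/2, -1/2, -3/2."""
    return (F(3, 2) * a, F(1, 2) * a + d1, -F(1, 2) * a + d2, -F(3, 2) * a)

def backcross_diploid_effect(a, d1, d2, r=F(0)):
    """Expected diploid-model effect in the backcross: (1 - 2r)[3a/2 + (d1 + d2)/4]."""
    return (1 - 2 * r) * (F(3, 2) * a + F(1, 4) * (d1 + d2))

# The four settings of the first simulation part; the signs are an assumption.
settings = [(1, -2, -2), (1, 2, 2), (1, -2, 0), (1, 0, 0)]
for a, d1, d2 in settings:
    effect = backcross_diploid_effect(a, d1, d2)
    values = ", ".join(str(g) for g in genotypic_values(a, d1, d2))
    print(f"a={a}, d1={d1}, d2={d2}: G = ({values}), diploid effect = {float(effect)}")
```

Under these sign assumptions the four effects come out as 0.5, 2.5, 1.0, and 1.5, matching the values quoted from Equation 9.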
The most striking differences in power and estimation between the diploid and triploid models are found in the first parameter setting when the additive and dominance effects are in the opposite direction and h2 = 0.2 (Table 2). The detecting powers of the diploid model are 0.160 and 0.100, respectively, in the two different populations. The detecting powers of the triploid model are 0.508 and 0.926, respectively, under the one-stage design, and they increase to 0.980 and 0.998, respectively, under the two-stage design. For QTL position, the means of position estimates by the diploid model are 46.46 (SD 28.58) and 49.63 (SD 11.10), respectively, in the two populations. The means of position estimates provided by the triploid model under the two-stage design are 49.77 (SD 7.08) and 50.21 (SD 5.68), respectively, and they are 49.00 (SD 24.26) and 51.03 (SD 11.19) under the one-stage design, respectively. Therefore, the triploid model performs significantly better than the diploid model in this setting. In other settings, the triploid model under the two-stage design is also found to be much more powerful and precise than the diploid model, but the triploid model under the one-stage design seems to provide power and precision (in position estimation) similar to the diploid model. For example, in the third setting of the backcross population with h2 = 0.1, the diploid model has power 0.400 and mean estimated position 49.03 (SD 23.68; Table 1). For the triploid model, they are 0.406 and 50.33 (SD 23.67) under the one-stage design, and they are 0.800 and 49.15 (SD 12.38) under the two-stage design. In the second setting of the F2 population with h2 = 0.1, the diploid model has power 0.668 and mean estimated position 49.66 (SD 19.16). For the triploid model, they are 0.648 and 49.85 (SD 19.05) under the one-stage design, and they are 0.796 and 49.86 (SD 12.35) under the two-stage design. A similar pattern can also be found for the other settings in Tables 1 and 2.
The triploid model is found to have better performance under the two-stage design than under the one-stage design in this study. Under the two-stage design, the triploid model can provide higher power for QTL detection and more precise estimates for positions and effects. For example, in the first setting with h2 = 0.1 in the backcross population, the powers are 0.190 and 0.730, respectively (Table 1), and the means of the position estimates are 47.48 (SD 29.98) and 49.39 (SD 15.10), respectively, under the two different designs. In the second setting with h2 = 0.2 in the F2 population, the powers are 0.938 and 0.986, respectively (Table 2), and the means of the position estimates are 50.70 (SD 11.65) and 49.77 (SD 6.37), respectively, under the two different designs. Besides, the triploid model under the one-stage design seems to have problems in correctly estimating the effects in the backcross population when the additive and dominance effects are in opposite directions. For example, in the first setting (a = 1, d1 = −2, and d2 = −2), the means of the effect estimates by the triploid model under the one-stage design are 0.199 (SD 0.452), 1.587 (SD 2.342), and 0.589 (SD 2.202), respectively, for h2 = 0.1 (Table 1), and they are 0.143 (SD 0.344), 2.117 (SD 1.628), and 0.766 (SD 1.547), respectively, for h2 = 0.2 (Table 2). These estimates are highly biased and imprecise under the one-stage design. Similar problems can also be found in the third setting (a = 1, d1 = −2, and d2 = 0) for the backcross population. Such estimation problems, however, do not occur in the F2 population or under the two-stage design (see Tables 1 and 2), which may suggest that the F2 population is a better population than the backcross population and the two-stage design might be a more suitable design than the one-stage design for mapping endosperm traits.
The simulation in the second part aims to evaluate and compare the differences between the proposed MIM-based and the current IM-based methods in mapping endosperm traits. The results are shown in Table 3. When the IM-based method is used to detect QTL, three different models, the additive-effect model (with a only), the one-dominant-effect model (with a and d1), and the complete-effect model (with a, d1, and d2), are implemented in the search. The experimentwise critical values at the 0.05 significance level are found to be 9.36, 12.57, and 13.48 for the three different models, respectively, by 1000 permutations. For the additive-effect model, the powers to detect QA, QB, and QC are 0.97, 0.96, and 0.41, respectively. For the one-dominant-effect model, the powers to detect the three QTL are 0.97, 0.95, and 0.31, respectively. For the complete-effect model, the powers are 0.97, 0.94, and 0.31, respectively. The three models have similar powers to detect QA and QB, and the additive-effect model has greater power than the other two models to detect QC. Among the 100 replicates, the three models can detect either both or one of QA and QB in each replicate. The results of mapping QA and QB by the complete-effect model and mapping QC by the additive-effect model are presented in Table 3. In Table 3, the means of the position estimates for the three QTL are 48.94 (SD 7.86), 50.95 (SD 6.50), and 50.83 (SD 20.43), respectively. The average LRT statistics are 31.12 (SD 10.00), 25.99 (SD 8.82), and 9.26 (SD 6.45), respectively, for the three QTL. This shows that the larger QTL, QA and QB, can be detected with higher power and better precision as compared to the small QTL, QC. Besides, the estimates of additive effects generally are more precise than those of dominance effects. For example, the mean of â1 is 3.003 (SD 0.554), and the means of d̂11 and d̂12 are 1.449 (SD 3.451) and 3.995 (SD 2.585), respectively.
One of the advantages of the MIM-based method is that it is capable of fitting the detected QTL into the model in further searching for the other QTL. When the MIM-based method considers only one QTL in the model (m = 1), the mapping results are identical to those obtained by the IM-based method. Among the 100 replicates analyzed by the IM-based method, most of the replicates (91 replicates) have both QA and QB detected. For the remaining 9 replicates, either QA or QB is detected. If the detected QA (QB) is fitted into the MIM-based model in the search (m = 2), the undetected QB (QA) in the 9 replicates can be identified and the already detected QB (QA) in the other replicates will have a larger LRT statistic by including either their partial or complete effects in the model (that is, the power for detecting QA and QB is 1.0 for MIM with m = 2). To shorten the article, only the results of considering complete effects of QA and QB in the analysis are presented (Table 3). The average (partial) LRT statistics of QA and QB increase to 35.18 (SD 10.57) and 30.07 (SD 10.42), respectively. Further, if these two detected QTL are fitted into the MIM model for QTL search along the third chromosome (m = 3), the power to detect QC is 0.57 (average LRT statistic 11.36 with SD 6.97) if only the additive effect (a3) is considered (Table 3). The power decreases to 40% (36%) if the one-dominant-effect (complete-effect) model is considered (not shown). The means of the position estimates are 49.37 (SD 6.34), 50.60 (SD 6.63), and 49.40 (SD 18.29) for the three QTL, respectively, which become more precise as compared to those by the IM-based method.
If epistasis is taken into account in the search along the third chromosome, many different types of epistasis can be considered. For illustration, only the additive-by-additive epistatic effect between QTL is considered (see also GENETIC MODEL OF ENDOSPERM TRAITS for first taking the additive-by-additive effect into account). Among the three possible additive-by-additive effects, only the consideration of ia2a3 improves the QTL detection. The power increases to 71% (Table 3) when ia2a3 is considered in the MIM model (m = 3 with epistasis) to search for QC (critical value 12.57 by permutation tests; average partial LRT statistic 16.84 with SD 7.79). The mean estimate of ia2a3 is 0.904 (SD 0.510), and the mean estimate of σ² is 50.39 (SD 8.26). The mean of position estimate for QC becomes 48.97 (SD 17.18), and the mean of the estimated effect is 1.510 (SD 0.698), which is more precise than that obtained by ignoring epistasis.
CONCLUSION AND DISCUSSION
The endosperm of a seed is a triploid tissue and has a more complicated genetic mechanism than the diploid tissues. Therefore, the traditional QTL mapping methods (LANDER and BOTSTEIN 1989; HALEY and KNOTT 1992; JANSEN 1993; ZENG 1994; CHURCHILL and DOERGE 1994; KAO et al. 1999; KAO and ZENG 2002) designed for traits under diploid control are not appropriate approaches to map for QTL underlying the endosperm traits because they ignore the triploid nature of endosperms. WU et al. (2002a,b) and XU et al. (2003) first considered the triploid inheritance of endosperms to propose IM-based triploid methods in the detection of the underlying QTL. In this article, a new triploid approach based on the MIM method is developed to take multiple QTL into account in the model for mapping endosperm traits. The proposed method can be implemented to analyze data from either the one-stage design using only maternal genotypes or the two-stage design using both maternal and embryo genotypes in the backcross and F2 populations. As shown in this article, the triploid MIM method can provide better detection power and estimation precision than the current IM-based methods, and, unlike them, it can analyze and search for epistatic QTL directly when mapping endosperm traits. Some important issues in mapping endosperm traits, such as the problems of using the diploid mapping methods, the relation between the diploid and triploid methods, the variance components of genetic variance, the problems if effects are present and ignored, and the relative efficiency of the diploid and triploid models under different experimental designs, are also investigated analytically or by simulation.
The triploid mapping method can provide better power in detection and more precise estimation under the two-stage design than under the one-stage design in mapping endosperm traits as shown in the simulation study (Tables 1 and 2) and also demonstrated by WU et al. (2002b). This is because the two-stage design, which provides both the maternal and embryo marker genotypes, is more informative than the one-stage design, which offers only the maternal marker genotype, in inferring the conditional probabilities of the endosperm QTL genotypes (see the website http://www.stat.sinica.edu.tw/chkao/ for the conditional probabilities under different experimental designs). In the backcross population, the one-stage design provides only 4 different marker genotypes, and these marker genotypes are noninformative in distinguishing QQQ, QQq, and Qqq because equal conditional probabilities are assigned to them. The two-stage design, however, can provide 16 different marker genotypes, and the marker genotypes are noninformative only for QQq and Qqq. In the F2 population, the one- and two-stage designs can provide 9 and 25 marker genotypes, respectively, and each marker genotype in either design is noninformative only for the genotypes QQq and Qqq. Therefore, the two-stage design is generally more informative than the one-stage design, and the F2 population is generally more informative than the backcross population in inferring the conditional probabilities. As these conditional probabilities are the mixing proportions in the normal mixture likelihood, they play a very important role in the quality of the estimation of QTL parameters for the model. A more informative design or population can provide more detailed information in inferring the conditional probabilities and thus can help improve the estimation of QTL parameters. This argument explains why the performance of the triploid method is generally poor under the one-stage design in the backcross population as compared to the performance under the other data structures (see, for example, the simulation results in Tables 1 and 2 when the additive and dominance effects are in opposite directions) and why the triploid method under the two-stage design can perform well with satisfactory power and precision in all the parameter settings.
The two-stage design generally requires more genotyping work as both the genomes of the plants and their seeds need to be genotyped, and different sampling strategies for allocations of a given sample size between the two generations should be considered for cost control. Besides, WU et al. (2002b) also pointed out that the different sampling strategies for allocations can affect the parameter estimation. Therefore, the best strategy of allocation for the two-stage design under the consideration of cost and estimation deserves further investigation in practical QTL mapping.
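To make the role of the conditional probabilities concrete, the sketch below evaluates a normal mixture log-likelihood in which each individual's conditional probabilities of the four endosperm genotypes (QQQ, QQq, Qqq, qqq) act as mixing proportions. The function names, the probability vectors, and the parameter values are hypothetical and chosen only for illustration; this is not the author's program.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def genotype_means(mu, a, d1, d2):
    """Means of QQQ, QQq, Qqq, qqq under the coding x = 3/2, 1/2, -1/2, -3/2."""
    return (mu + 1.5 * a, mu + 0.5 * a + d1, mu - 0.5 * a + d2, mu - 1.5 * a)

def mixture_loglik(trait_values, mixing_probs, means, sd):
    """Sum over individuals of log( sum_k p_ik * N(y_i; mean_k, sd) )."""
    total = 0.0
    for y, probs in zip(trait_values, mixing_probs):
        density = sum(p * normal_pdf(y, m, sd) for p, m in zip(probs, means))
        total += math.log(density)
    return total

means = genotype_means(mu=10.0, a=1.0, d1=2.0, d2=2.0)
y = [11.4, 8.6]
# Hypothetical mixing proportions: fully informative markers versus markers
# that cannot separate QQQ, QQq, and Qqq for the first individual.
informative = [(1.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 1.0)]
vague = [(1 / 3, 1 / 3, 1 / 3, 0.0), (0.0, 0.0, 0.0, 1.0)]
print(mixture_loglik(y, informative, means, sd=1.0))
print(mixture_loglik(y, vague, means, sd=1.0))
```

When the proportions are spread evenly over several genotypes, the data alone must separate the component means, which is the situation described for the one-stage backcross design.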
The traditional diploid methods proposed for mapping diploid traits have been applied to mapping endosperm traits by several researchers (TAN et al. 1999; WANG and LARKINS 2001; WANG et al. 2001). Such applications generally violate the traditional belief that the endosperm traits are under the control of triploid mechanisms (BENNER et al. 1989; ZHU and WEIR 1994; WU et al. 2002a,b). If the diploid methods are applied to mapping endosperm traits, the confounding problem in estimation will occur (Equations 9–11), and the sampling variances of the estimates will inflate. Consequently, the diploid methods can cause some problems, such as bias in estimation and loss in power, in mapping endosperm traits. Although the diploid methods have these problems, the simulation study indicates that, in some parameter settings, their performance (in power and position estimate) can be similar to that of the triploid method under the one-stage design (Tables 1 and 2) due mainly to the correlation between the genomes of the maternal plant and its endosperms. Therefore, the diploid method can still be used as a preliminary method in mapping endosperm traits. By taking the triploid mechanism into account, the triploid method, especially under the two-stage design, can effectively solve the problems and significantly improve the mapping of endosperm traits.
The proposed MIM-based triploid method is a multiple-QTL model. This multiple-QTL approach distinguishes itself from the current IM-based methods of WU et al. (2002a,b) and XU et al. (2003) by the ability to use multiple-marker intervals simultaneously to fit multiple QTL into the model in mapping endosperm traits. As a result, the proposed method can provide greater power and precision, and it can readily analyze and search for epistatic QTL in endosperm trait mapping. Besides, the estimation procedures of these methods differ. The likelihood of the MIM-based method is a mixture of 4^m normals and will become increasingly unwieldy in maximization as the number of QTL (m) fitted into the model increases. To solve the maximization problem with large m, the general formulas proposed by KAO and ZENG (1997) are applied to obtain the MLE of QTL effects as well as their variance-covariance matrix (see THE MIM MODEL FOR MAPPING ENDOSPERM TRAITS). The procedure of the general formulas is a maximum-likelihood approach based on the EM algorithm. The method by Xu et al. uses an iteratively reweighted least squares (IRWLS) procedure, which is a second-order approximation to the maximum likelihood, and it has problems in estimating the two dominance effects separately as pointed out by Xu et al. The estimation procedure in Wu et al. also implements a maximum-likelihood approach via the EM algorithm, but it needs additional procedures in the M-step to obtain the MLE if some QTL effects are not considered in the model (see APPENDIX B in WU et al. 2002b). The general formulas, however, do not have these problems and are relatively straightforward and simple to maximize. An initial version of the triploid MIM program source code (written in Fortran 77 language) is available on the worldwide web (http://www.stat.sinica.edu.tw/chkao/).
It has been pointed out that the critical value for claiming QTL detection is a very complicated issue and deserves further investigation (LANDER and BOTSTEIN 1989; JANSEN 1993; ZENG 1994; KAO et al. 1999). Generally, the critical value depends on the number and size of intervals, different levels of heritability (size of QTL), different numbers of (linked or unlinked) QTL, and linked QTL in the same or opposite direction of effects. VISSCHER and HALEY (1996) pointed out that the critical value should be reduced after a QTL of large effect has been detected. The determination of critical value in mapping endosperm traits will be more complicated as each QTL can have three possible effects and many different types of epistasis, and more different experimental designs (the one-stage or two-stage design with different allocations in the backcross or F2 population) can be considered. In this article, the permutation tests by CHURCHILL and DOERGE (1994) are used to determine the critical value for claiming QTL detection in endosperm trait mapping. It is found that the critical value for the triploid model in the two-stage design is larger than that in the one-stage design (Tables 1 and 2). Given the same heritability, the critical value in the F2 population is larger than that in the backcross population except for the third setting. More efforts are needed to unravel the issue of critical value in mapping endosperm traits.
The understanding of QTL underlying the endosperm traits is very important to cereal breeding in improving yield potential and grain quality. This MIM-based triploid method can serve as an effective tool to estimate the parameters associated with the underlying QTL in mapping endosperm traits. Another important issue worth pursuing is to investigate the properties of different genetic models in mapping endosperm traits. Besides, several researchers (ZHU and WEIR 1994; MAZUR et al. 1999; VAN DER MEER et al. 2001; WU et al. 2002b; XU et al. 2003) have pointed out that the maternal and offspring genomes could jointly affect the seed- or endosperm-specific traits. Therefore, it is important to take the genome information about the two successive generations into account in mapping those traits and, more importantly, to do so on the basis of a multiple-QTL model approach.
APPENDIX A:
THE GENETIC VARIANCE COMPONENTS OF ENDOSPERM TRAITS
When m QTL with complete marginal and epistatic effects are considered together, the genetic variance of an endosperm trait can be decomposed into 4^m × (4^m − 1)/2 variance and covariance components. Taking m = 2 as an example, the genetic variance can have 120 variance and covariance components in the backcross and F2 populations (not shown). If the two QTL are unlinked, the genetic variance reduces to 83 and 111 components in the two populations. For the F2 population, these components are
Likewise, the components of variance and covariance for the backcross population can be also obtained.
APPENDIX B:
THE RELATION BETWEEN THE PARAMETERS OF THE DIPLOID AND TRIPLOID MODELS IN MAPPING ENDOSPERM TRAITS
To simplify the argument, assume that an endosperm trait value, y, measured in the backcross or F2 population is affected only by a single QTL, Q. The backcross individuals can have two possible QTL genotypes, Qq (w = 1/2) and qq (w = −1/2), each with frequency 1/2. The F2 individuals can have three possible QTL genotypes, QQ (w1 = 1, w2 = −1/2), Qq (w1 = 0, w2 = 1/2), and qq (w1 = −1, w2 = −1/2), with frequencies 1/4, 1/2, and 1/4, respectively. For autogamous plants, the individuals with QQ or qq genotype can produce only one endosperm genotype, QQQ or qqq. The individuals with Qq genotype can produce four kinds of endosperm genotype, QQQ (x = 3/2, z1 = 0, z2 = 0), QQq (x = 1/2, z1 = 1, z2 = 0), Qqq (x = −1/2, z1 = 0, z2 = 1), and qqq (x = −3/2, z1 = 0, z2 = 0), each with frequency 1/4. The frequencies of the four triploid QTL genotypes are 1/8, 1/8, 1/8, and 5/8, respectively, in the backcross population, and they are 3/8, 1/8, 1/8, and 3/8, respectively, in the F2 population. The covariances between the coded variables for the QTL genotypes of a diploid individual and its triploid endosperm are found to be Cov(x, w) = 3/8, Cov(z1, w) = 1/16, Cov(z2, w) = 1/16 in the backcross population, and they are Cov(x, w1) = 3/4, Cov(z1, w1) = 0, Cov(z2, w1) = 0, Cov(x, w2) = 0, Cov(z1, w2) = 1/16, and Cov(z2, w2) = 1/16 in the F2 population.
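The covariances listed above can be verified by direct enumeration. The following sketch uses exact fractions; the sign pattern of the coded variables (for example, w = −1/2 for qq and x = −3/2 for qqq) is the usual contrast coding and is treated here as an assumption, and all identifiers are illustrative.

```python
from fractions import Fraction as F

# Endosperm coding (x, z1, z2) for each triploid genotype (signs assumed).
ENDOSPERM = {
    "QQQ": (F(3, 2), F(0), F(0)),
    "QQq": (F(1, 2), F(1), F(0)),
    "Qqq": (F(-1, 2), F(0), F(1)),
    "qqq": (F(-3, 2), F(0), F(0)),
}
# Plant genotype -> (frequency, coded variables of the diploid model).
BACKCROSS_PLANTS = {"Qq": (F(1, 2), (F(1, 2),)), "qq": (F(1, 2), (F(-1, 2),))}
F2_PLANTS = {
    "QQ": (F(1, 4), (F(1), F(-1, 2))),
    "Qq": (F(1, 2), (F(0), F(1, 2))),
    "qq": (F(1, 4), (F(-1), F(-1, 2))),
}

def endosperm_distribution(plant):
    """Endosperm genotypes produced by a selfed (autogamous) plant."""
    if plant == "QQ":
        return {"QQQ": F(1)}
    if plant == "qq":
        return {"qqq": F(1)}
    return {genotype: F(1, 4) for genotype in ENDOSPERM}

def covariance(plants, endo_index, plant_index):
    """Cov between one endosperm variable and one plant variable."""
    e_u = e_v = e_uv = F(0)
    for plant, (freq, codes) in plants.items():
        for endo, cond in endosperm_distribution(plant).items():
            p = freq * cond
            u, v = ENDOSPERM[endo][endo_index], codes[plant_index]
            e_u += p * u
            e_v += p * v
            e_uv += p * u * v
    return e_uv - e_u * e_v

for name, i in (("x", 0), ("z1", 1), ("z2", 2)):
    print(f"backcross Cov({name}, w) =", covariance(BACKCROSS_PLANTS, i, 0))
    print(f"F2 Cov({name}, w1) =", covariance(F2_PLANTS, i, 0),
          f"Cov({name}, w2) =", covariance(F2_PLANTS, i, 1))
```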
If the diploid models in Equation 6 or 7 are applied to analyze a marker, M, to infer Q along the genome, the regression coefficient of y on the marker M (coded by wM) in the backcross model is given by byM = Cov(y, wM)/V(wM), where Cov(y, wM) is the covariance between the endosperm trait and the marker variable, and V(wM) is the variance of the marker variable. It is easy to obtain
Cov(y, wM) = (1 − 2rQM)[3a/8 + (d1 + d2)/16]
if there is no covariance between the residual error and marker variable. The regression coefficient is byM = (1 − 2rQM)[3a/2 + (d1 + d2)/4] because V(wM) = 1/4. Similarly, the two regression coefficients for the additive and dominance effects of M in the F2 diploid model can be obtained. The regression coefficient of the additive variable is
byMa = (1 − 2rQM)(3a/2)
and the regression coefficient of the dominance variable is
byMd = (1 − 2rQM)²(d1 + d2)/4.
Note that the partial regression coefficients for the additive and dominance effects are the same as byMa and byMd, as wMa and wMd are orthogonal in the F2 population.
The conditional phenotypic variance on the marker M for the backcross diploid model is σ²y·M = σ²y − (σyM)²/V(wM), where σ²y is the phenotypic variance, and σyM denotes the covariance between y and M. The conditional phenotypic variance is
[equation not recoverable from the source]
where σ² is the variance of residual error. For the F2 diploid model, the conditional phenotypic variance on the marker M is σ²y·M = σ²y − (σyMa)²/V(wMa) − (σyMd)²/V(wMd). The conditional phenotypic variance is
[equation not recoverable from the source]
The conditional phenotypic variances are the same for the backcross and F2 models.
APPENDIX C:
CONDITIONAL PROBABILITIES OF ENDOSPERM QTL GENOTYPES
Consider a marker interval, Ij, flanked by markers, Mj and Nj, on a linkage group. For the plants in the F2 population, there are nine observable genotypes for markers Mj and Nj. They are MjNj/MjNj, MjNj/Mjnj, Mjnj/Mjnj, MjNj/mjNj, MjmjNjnj (MjNj/mjnj or Mjnj/mjNj), Mjnj/mjnj, mjNj/mjNj, mjNj/mjnj, and mjnj/mjnj with proportions (1 − r)²/4, r(1 − r)/2, r²/4, r(1 − r)/2, (1 − r)²/2 + r²/2, r(1 − r)/2, r²/4, r(1 − r)/2, and (1 − r)²/4, respectively. For the plants in the backcross population, there are four observable genotypes, MjNj/MjNj, MjNj/Mjnj, MjNj/mjNj, and MjNj/mjnj, with proportions (1 − r)/2, r/2, r/2, and (1 − r)/2, respectively. For autogamous plants, the plants with genotypes MjNj/MjNj, Mjnj/Mjnj, mjNj/mjNj, and mjnj/mjnj each can produce only one progeny (embryo) genotype. The plants with genotypes MjNj/Mjnj, MjNj/mjNj, Mjnj/mjnj, and mjNj/mjnj each can produce three different embryo genotypes. For example, the three embryo genotypes produced by plants with genotype MjNj/Mjnj are MjNj/MjNj, MjNj/Mjnj, and Mjnj/Mjnj. The plants with genotype MjNj/mjnj (Mjnj/mjNj) can produce nine different embryo genotypes. A total of 25 and 16 different combinations of the plant and embryo genotypes are in the F2 and backcross populations, respectively.
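The genotype proportions and the counts of plant and embryo combinations stated above can be reproduced by enumerating gametes. The sketch below is illustrative only: it assumes an F1 of phase MN/mn, a backcross to the MN/MN parent, selfing of every plant, and a recombination fraction 0 < r ≤ 1/2; the function names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

def gametes(hap1, hap2, r):
    """Gamete distribution of a plant carrying haplotypes hap1/hap2 (e.g. 'MN', 'mn')."""
    out = {}
    recombinants = (hap1[0] + hap2[1], hap2[0] + hap1[1])
    for hap, p in ((hap1, (1 - r) / 2), (hap2, (1 - r) / 2),
                   (recombinants[0], r / 2), (recombinants[1], r / 2)):
        out[hap] = out.get(hap, 0) + p
    return out

def observable(h1, h2):
    """Phase-unknown marker genotype: the allele pair at each of the two markers."""
    return ("".join(sorted(h1[0] + h2[0])), "".join(sorted(h1[1] + h2[1])))

def plant_distribution(population, r):
    """Haplotype-pair distribution of the plants in the F2 or backcross population."""
    f1 = gametes("MN", "mn", r)
    other = f1 if population == "F2" else {"MN": F(1)}
    dist = {}
    for (h1, p1), (h2, p2) in product(f1.items(), other.items()):
        dist[(h1, h2)] = dist.get((h1, h2), 0) + p1 * p2
    return dist

def marker_classes(population, r):
    """Proportions of the observable two-marker genotypes."""
    classes = {}
    for (h1, h2), p in plant_distribution(population, r).items():
        key = observable(h1, h2)
        classes[key] = classes.get(key, 0) + p
    return classes

def plant_embryo_combinations(population, r):
    """Distinct (plant genotype, embryo genotype) pairs obtained by selfing each plant."""
    combos = set()
    for h1, h2 in plant_distribution(population, r):
        g = gametes(h1, h2, r)
        for e1, e2 in product(g, g):
            combos.add((observable(h1, h2), observable(e1, e2)))
    return combos

r = F(1, 5)
for population in ("F2", "backcross"):
    print(population, len(marker_classes(population, r)),
          len(plant_embryo_combinations(population, r)))
```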
If an unobservable QTL, Qj, is located in Ij, among the seeds (progeny) collected from the