Abstract
The endosperm, a result of double fertilization in flowering plants, is a triploid tissue whose genetic composition is more complex than that of diploid tissue. We present a new maximum-likelihood-based statistical method for mapping quantitative trait loci (QTL) underlying endosperm traits in an autogamous plant. Genetic mapping of quantitative endosperm traits is qualitatively different from that of traits for other plant organs because the endosperm displays complicated trisomic inheritance and represents a younger generation than its mother plant. Our endosperm mapping method is based on two different experimental designs: (1) a one-stage design in which marker information is derived from the maternal genome and (2) a two-stage hierarchical design in which marker information is derived from both the maternal and offspring genomes (embryos). Under the one-stage design, the position and additive effect of a putative QTL can be well estimated, but the estimates of the dominant and epistatic effects are upward biased and imprecise. The two-stage hierarchical design, which extracts more genetic information from the material, typically improves the accuracy and precision of the dominant and epistatic effects for an endosperm trait. We discuss how different sampling strategies under the two-stage hierarchical design affect the estimation of QTL parameters. Our method will be broadly useful in mapping endosperm traits for many agriculturally important crop plants and will also make it possible to study the genetic significance of double fertilization in the evolution of higher plants.
ONE of the most important events in the evolution of higher plants is the occurrence of double fertilization, a phenomenon independently discovered by Navashin of Russia in 1898 and Guignard of France in 1899 (Friedman 1990, 1998; Jensen 1998). During the process of double fertilization, one of the two sperm cells from a pollen tube fertilizes the haploid egg cell to form a diploid zygote (the new sporophytic generation) and the other sperm cell fertilizes the diploid central cell and fuses with the central cell (polar) nuclei, thus giving rise to the triploid endosperm. Different targets for fertilization by the sperm cells are regarded as evolutionarily significant, because such reproductive behavior provides an opportunity for differential segregation of those traits associated with the production of nutritive tissue (endosperm) for the seed from those traits associated with successful embryo development. With the initiation of zygote and endosperm from separate fertilizations, the opportunity exists for optimal, independent specialization of each tissue (Friedman 1990, 1998).
The endosperm is also tremendously important to human nutrition. Grain quality depends critically upon amylose content in rice, protein content and percentage of amino acid in wheat, gum content in barley, and sugar content in sweet corn. Genetic improvement of such endosperm traits that affect food quality has received considerable attention in plant breeding (Benner et al. 1989; Sadimantara et al. 1997; Mazur et al. 1999; van der Meer et al. 2001). Quantitative genetic models for analyzing the trisomic inheritance of endosperm traits have been developed and applied to practical data analyses in a variety of grain crops (Gale 1976; Mo 1987; Bogyo et al. 1988; Pooni et al. 1992; Zhu and Weir 1994). However, traditional quantitative methods may not efficiently detect the genetic factors underlying endosperm traits because they fail to estimate the map positions of these genetic factors on the chromosomes. DNA-based molecular markers for mapping genetic factors conditioning a quantitative trait, or quantitative trait loci (QTL), are being widely used to map QTL for endosperm traits (Tan et al. 1999; Wang and Larkins 2001; Wang et al. 2001). For example, using 83 simple sequence repeat loci, Wang et al. (2001) detected two QTL, one on the short arm of chromosome 4 and the other on the long arm of chromosome 7, together accounting for 25% of the variance for elongation factor 1α content in maize endosperm.
Current statistical analyses of the association between markers and endosperm traits are based on the assumption that endosperm traits are controlled under the same genetic composition as other tissues developing from the diploid embryo. This assumption is obviously violated because the endosperm has three unique properties. First, the endosperm is triploid and has a more complex genetic composition than the embryo. For a locus with two alleles A and a, four genotypic combinations AAA, AAa, Aaa, and aaa are possible, vs. three genotypes AA, Aa, and aa for the diploid embryo. Second, the endosperm of a cross between two different genotypes will differ from that of the reciprocal cross. Third, the occurrence of the fertilized egg is the beginning of a new generation, so the embryo and endosperm on a plant represent the next generation. A precise resolution into mapping endosperm traits using marker information from diploid organs needs the development of a bridge between these two tissues with different levels of ploidy. Although this is statistically challenging, currently developed computational algorithms, such as the EM algorithm, provide a powerful means for mapping the QTL contained in the triploid endosperm on the basis of marker information from diploids.
In this article, we have developed a maximum-likelihood-based method, implemented with the EM algorithm, to map QTL responsible for endosperm traits using a genetic linkage map of polymorphic markers. For a particular plant, the formation of endosperm QTL genotypes depends on how its polar nuclei, which are the duplication of the female gametes (eggs), are fertilized by the male gametes (pollen). Our models are different for autogamous and allogamous plants. For autogamous plants, such as rice, the sperm cells fertilizing the two central cells, which generate the endosperm, are derived from the same plant. For allogamous plants the sperm cells are derived from a pollen pool of the mapping population. For this reason, the endosperm QTL genotypes will have different segregation patterns between autogamous and allogamous plants. Here we report the development of a statistical method for endosperm mapping in autogamous plants. This method is studied under different statistical strategies by extensive simulation experiments.
EXPERIMENTAL DESIGNS
Consider an endosperm trait measured in a backcross or F_{2} population. Traditional diploid trait mapping, as proposed by Lander and Botstein (1989), uses marker information and phenotypic measurements derived from the same generation and the same ploidy level. However, because the endosperm is triploid, its precise molecular characterization can be difficult. For example, it is impossible to distinguish directly between two endosperm genotypes AAa and Aaa from commonly used dominant (randomly amplified polymorphic DNA or amplified fragment length polymorphism) or codominant marker systems (restriction fragment length polymorphism or microsatellite). For dominant markers, three genotypes AAA, AAa, and Aaa cannot be distinguished from one another. For this reason, marker information in endosperm mapping should be derived not from the triploid endosperm but from other, diploid tissues. Assuming that an endosperm-specific trait is controlled only by the endosperm QTL genotype and that no maternal effects or their interactions affect the trait expression, we need to predict endosperm QTL behavior (generation t + 1) using molecular markers from the maternal genome (generation t) or offspring genome (generation t + 1). It should be pointed out that, although generation t + 1 is used here for the endosperm, it does not mean that the endosperm can reproduce to generate progeny. Marker information can be sampled from a diploid tissue of a maternal plant and/or its progeny’s diploid tissue (e.g., embryo), which represents two different experimental designs for mapping endosperm QTL contained in seeds. Below, these two designs are described.
Consider a backcross plant of an autogamous species. The diploid marker genotype of this plant is determined only by the gamete genotypes of the heterozygous F_{1}, whereas its endosperm QTL genotypes are determined by the combination of the polar nuclei of the two central cells and a sperm nucleus, which this plant produces for the next generation. Within individual plants, the frequencies of polar nuclei genotypes are identical to those of the female gamete (egg) genotypes because the two central cells are formed from the egg cell through mitosis and, thereby, the polar nuclei genotypes can be regarded as the homogeneous duplication of the egg genotypes (Friedman 1990, 1998). It is clear that, for autogamous plants as considered in this study, the frequencies of the sperm genotypes are identical to those of the egg genotypes and of the polar nuclei genotypes. On the basis of these properties, we can calculate the frequencies of the endosperm QTL genotypes that a backcross plant produces, given the diploid marker genotypes of this plant and its embryos.
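The calculation described above can be sketched for the one-stage design as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the coding of each flanking marker as 1 (heterozygous, carrying the P_{1} allele) or 0 (homozygous), coupling-phase linkage in the F_{1}, and the absence of crossover interference are all assumptions of the sketch.

```python
def prob_plant_is_Qq(m_left, m_right, r1, r2):
    """P(backcross plant is Qq | its two flanking marker genotypes).

    m_left, m_right: 1 if the marker is heterozygous, 0 if homozygous.
    r1: recombination fraction between the left marker and the QTL.
    r2: recombination fraction between the QTL and the right marker.
    Assumes no crossover interference (an assumption of this sketch).
    """
    def joint(q):
        # Probability that the F1 gamete carries QTL state q (1 = Q, 0 = q)
        # together with the observed marker alleles.
        left = (1 - r1) if m_left == q else r1
        right = (1 - r2) if m_right == q else r2
        return 0.5 * left * right

    return joint(1) / (joint(1) + joint(0))


def endosperm_probs(m_left, m_right, r1, r2):
    """Conditional probabilities of the four endosperm QTL genotypes."""
    p = prob_plant_is_Qq(m_left, m_right, r1, r2)
    # A selfed Qq plant gives polar nuclei QQ or qq (1/2 each) and sperm
    # Q or q (1/2 each), so each endosperm genotype has probability 1/4.
    # A selfed qq plant can only produce qqq.
    return {"QQQ": p / 4, "QQq": p / 4, "Qqq": p / 4, "qqq": p / 4 + (1 - p)}


if __name__ == "__main__":
    for markers in [(1, 1), (1, 0), (0, 1), (0, 0)]:
        print(markers, endosperm_probs(*markers, r1=0.1, r2=0.1))
```

Because the polar nuclei duplicate the egg genotype, the endosperm distribution of a plant depends only on whether that plant is Qq or qq, which is why a single conditional probability is enough here.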
Two different experimental designs can be used to predict triploid endosperm QTL genotypes on the basis of diploid marker genotypes. The first design is one in which the endosperm is predicted from a diploid tissue of backcross plants (in generation t). We call this design a one-stage design. The second design uses marker genotypes from both backcross plants (maternal genome) and their embryos (offspring genome) to predict the endosperm, which is called a two-stage hierarchical design because two successive generations (t and t + 1) are genotyped. The one-stage design is simpler in terms of the material genotyped, whereas the two-stage hierarchical design is more precise because within-family variation of a backcross plant is considered.
Suppose there are two flanking markers,
STATISTICAL MODEL
We have formulated a statistical mixture model, in which different QTL genotypes in the endosperm are viewed as components of a normal mixture. This mixture model is defined by the frequency of each of the endosperm QTL genotypes and the density corresponding to each genotype.
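A normal mixture of this kind can be illustrated with a short EM routine. The sketch below assumes the one-stage setting at a fixed putative QTL position, so that the mixing proportions (the conditional genotype probabilities of each plant) are known and only the genotypic means and the common residual variance are estimated. The function names, starting values, and fixed iteration count are assumptions made for illustration and are not taken from the article.

```python
import math


def normal_pdf(y, mu, var):
    """Density of N(mu, var) evaluated at y."""
    return math.exp(-(y - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def log_likelihood(y, p, mu, var):
    """Mixture log-likelihood; p[i][k] is the known prior probability
    that observation i has QTL genotype k."""
    return sum(
        math.log(sum(p_ik * normal_pdf(y_i, m, var) for p_ik, m in zip(p_i, mu)))
        for y_i, p_i in zip(y, p)
    )


def em_fit(y, p, n_iter=200):
    """Estimate genotypic means and a common variance by EM."""
    n, K = len(y), len(p[0])
    mean_y = sum(y) / n
    var = sum((v - mean_y) ** 2 for v in y) / n
    sd = math.sqrt(var)
    # Spread the starting means so the components are not identical.
    mu = [mean_y + sd * (k - (K - 1) / 2) / K for k in range(K)]
    for _ in range(n_iter):
        # E step: posterior probability of each genotype for each observation.
        post = []
        for y_i, p_i in zip(y, p):
            w = [p_ik * normal_pdf(y_i, m, var) for p_ik, m in zip(p_i, mu)]
            s = sum(w)
            post.append([x / s for x in w])
        # M step: posterior-weighted means, then the pooled variance.
        for k in range(K):
            w_k = sum(row[k] for row in post)
            if w_k > 0:
                mu[k] = sum(row[k] * y_i for row, y_i in zip(post, y)) / w_k
        var = sum(
            row[k] * (y_i - mu[k]) ** 2
            for row, y_i in zip(post, y)
            for k in range(K)
        ) / n
    return mu, var
```

In a genome scan, a routine of this type would be rerun at each candidate QTL position with the corresponding conditional probabilities, and the resulting log-likelihoods compared across positions.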
Additive-dominant model: The phenotypic value (y) of an endosperm trait due to a single QTL for the ith backcross plant under the one-stage design or the jth autogamous seed of the ith backcross plant under the two-stage hierarchical design can be statistically modeled by
The variables ε_{i}, ε_{ij} are the residuals including the aggregate effect of both polygenes and error effect and distributed as N(0, σ^{2}_{ε}). The probabilities with which x’s and z’s take an assigned value depend on the genomic positions of the QTL in the interval bracketed by flanking markers given in Tables 1 and 2.
Consider an endosperm mapping population composed of M backcross plants and N_{i} randomly selected autogamous seeds from the ith backcross plant. These backcross plants and the embryos of their seeds are genotyped simultaneously, whereas phenotypes are measured for the endosperm of their seeds. The likelihood of the marker data and the endosperm trait values controlled by the putative QTL can be represented under the one-stage or two-stage hierarchical design by
The maximum-likelihood estimates (MLEs) of the unknown parameters Ω = (m, r_{1} or r_{2}, σ^{2}_{ε})^{T} under the one-stage or two-stage hierarchical design can be computed by implementing an expectation-maximization (EM) algorithm (Dempster et al. 1977; Meng and Rubin 1993). The log-likelihood of Equation 3 for the two-stage hierarchical design is given by
Epistatic model: Suppose there are two biallelic QTL
If the two putative QTL are tested on different intervals, the conditional probabilities of the two-QTL genotypes can be calculated independently for each QTL, i.e., p_{ik_{1}k_{2}} = p_{ik_{1}}p_{ik_{2}} (i = 1,..., M; k_{1}, k_{2} = 0,..., 3) for the one-stage design and p_{ijk_{1}k_{2}} = p_{ijk_{1}}p_{ijk_{2}} (j = 1,..., N_{i}) for the two-stage hierarchical design, assuming that there is no crossover interference between the two marker intervals. Denote Θ_{1} and Θ_{2} as the matrices for the conditional probability of QTL genotypes for
The phenotypic value of an endosperm trait due to the two putative QTL can be modeled under the one-stage or two-stage hierarchical design. As our analysis is similar between these two designs, only the two-stage hierarchical design is described. The phenotypic value of an endosperm trait from the ith backcross plant and its jth autogamous seed under the two-stage hierarchical design is written as
HYPOTHESIS TESTING
Additive-dominant model: A number of hypothesis tests can be formulated for endosperm inheritance. The first hypothesis considers the existence of any QTL affecting the expression of an endosperm trait, which is expressed as
Second, a hypothesis test can be made for the additive or dominant effects of the QTL on the endosperm trait, on the basis of
Epistatic model: The existence of any QTL affecting the expression of an endosperm trait under the two-QTL model can be tested on the basis of the hypotheses
Like the hypothesis tests for the additive or dominant effects of individual QTL, the significance of QTL epistasis effects on the expression of an endosperm trait can be tested on the basis of
If two QTL detected are on the same interval, the degree of their linkage can also be tested. Testing the QTL linkage is equivalent to testing r_{2} = 0, where r_{2} is the recombination fraction between the two QTL. This test can be extended to test for a particular value of the recombination fraction.
MONTE CARLO SIMULATION
Simulation scenarios: We performed a series of simulation experiments to examine the statistical properties of our endosperm mapping method. A linkage group length of 100 cM is simulated for a backcross population. Assume that there are six equidistant markers ordered as
In our simulation experiments, we use different sampling strategies for marker information (one-stage vs. two-stage hierarchical design), different levels of heritability (H ^{2} = 0.2 vs. 0.6), and different sample sizes (M = 200 vs. 400). For the two-stage hierarchical design, different sampling strategies are designed on the basis of allocations of a given sample size between the backcross plants and their progeny (seeds). The sampling strategies used here are (1) 400 × 1 (one seed is sampled from each of 400 backcross plants), (2) 40 × 10 (10 seeds are sampled from each of 40 backcross plants), (3) 20 × 20 (20 seeds are sampled from each of 20 backcross plants), and (4) 10 × 40 (40 seeds are sampled from each of 10 backcross plants). In addition to their possible effects on parameter estimation, these different strategies have different utilities in practice. For example, strategy 1 requires genotyping only one embryo per plant, but requires the maintenance of a large backcross population. In strategy 4, only a small backcross population is maintained, but many embryos per plant need to be genotyped.
Additive-dominant model: Suppose there is a QTL located at the midpoint of the linkage group (i.e., 50 cM away from each end of the group), which affects a quantitative endosperm trait. The additive and dominant effects of the QTL are hypothesized as a = 1, d_{1} = 0.8, and d_{2} = 0.5. Given the overall mean μ = 10, the genotypic means of the four possible endosperm QTL genotypes are calculated using Equation 1. In the endosperm progeny of the backcross, the frequencies of these triploid QTL genotypes are 1/8 for QQQ, 1/8 for QQq, 1/8 for Qqq, and 5/8 for qqq. The genetic variance due to this QTL is 1.2. When the heritability H ^{2} of an endosperm trait is given, the residual variance can be calculated, from which the residual effects (and therefore phenotypic values) of the individuals are simulated.
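The genotype frequencies quoted above follow from enumerating the gametes of the selfed backcross plants, and the residual variance follows from the heritability once the genetic variance is known. The sketch below illustrates both steps; the function names are hypothetical, and the heritability is assumed to be defined as genetic variance divided by the sum of genetic and residual variance, which is an assumption of this sketch.

```python
from fractions import Fraction
from itertools import product


def endosperm_from_selfing(plant):
    """Endosperm genotype distribution from selfing a diploid plant,
    given as a two-letter string such as 'Qq' or 'qq'."""
    dist = {}
    for egg, sperm in product(plant, plant):
        # The two polar nuclei duplicate the egg allele; the sperm
        # contributes the third allele copy.
        n_big = 2 * (egg == "Q") + (sperm == "Q")
        genotype = "Q" * n_big + "q" * (3 - n_big)
        dist[genotype] = dist.get(genotype, 0) + Fraction(1, 4)
    return dist


def backcross_endosperm_frequencies():
    """Frequencies over a backcross population of Qq and qq plants (1/2 each)."""
    freqs = {}
    for plant in ("Qq", "qq"):
        for genotype, pr in endosperm_from_selfing(plant).items():
            freqs[genotype] = freqs.get(genotype, 0) + Fraction(1, 2) * pr
    return freqs


def residual_variance(genetic_var, h2):
    """Residual variance implied by H^2 = Vg / (Vg + Ve) (assumed definition)."""
    return genetic_var * (1 - h2) / h2


if __name__ == "__main__":
    for genotype, freq in sorted(backcross_endosperm_frequencies().items()):
        print(genotype, freq)
```

Exact fractions are used for the frequencies so that the enumeration can be checked directly against the values stated in the text.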
The characterization of the threshold for declaring the existence of a QTL is a difficult issue. The permutation test proposed by Churchill and Doerge (1994) is regarded as a useful approach for calculating the threshold because it is not dependent on the distribution of the test statistic. However, permutation tests require expensive computations. We thus use chromosome-wide permutation tests to characterize the thresholds only under a sample size of 400 for the one-stage design and under a 40 × 10 strategy for the two-stage hierarchical design. A set of endosperm phenotypic values for the one-stage design is simulated using the residual variances of σ^{2}_{ε} = 1.2 and 6.4, which correspond to the heritability levels of 0.6 and 0.2, respectively, when a QTL is assumed. Similarly, for the two-stage hierarchical design, two simulation scenarios, with σ^{2}_{ε} = 1.2 and 6.4, are designed. In each case, our model is used to estimate QTL parameters for endosperm inheritance and calculate the corresponding LR.
The distribution of the LR values over 1000 permutation replicates can be approximated by a χ^{2} distribution. The 95th, 99th, and 99.9th percentiles of the distribution of the maximum are used as empirical critical values to declare the existence of a QTL on the linkage group at the significance levels α = 0.05, 0.01, and 0.001. Under the one-stage design, these percentiles are 10.2551, 13.1874, and 15.7615 for H ^{2} = 0.6 and 10.1746, 15.5654, and 23.3611 for H ^{2} = 0.2, respectively. These percentiles under the two-stage hierarchical design (40 × 10) are 12.6697, 15.9342, and 20.3486 for H ^{2} = 0.6 and 12.9303, 15.8864, and 18.7674 for H ^{2} = 0.2.
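The permutation procedure for empirical thresholds can be sketched as follows. To keep the example short, the endosperm mixture-model LR is replaced by a simple single-marker LR statistic (one common mean vs. separate means for the two marker classes); this stand-in, the function names, and the data layout are assumptions of the sketch, and a real analysis would substitute the LR computed from the mapping model at each test position.

```python
import math
import random


def marker_lr(y, g):
    """Stand-in LR statistic: one common mean vs. separate means for the
    two marker classes coded 0/1 in g."""
    n = len(y)
    mean = sum(y) / n
    rss0 = sum((v - mean) ** 2 for v in y)
    rss1 = 0.0
    for cls in (0, 1):
        grp = [v for v, c in zip(y, g) if c == cls]
        if grp:
            m = sum(grp) / len(grp)
            rss1 += sum((v - m) ** 2 for v in grp)
    return n * math.log(rss0 / rss1)


def max_lr(y, markers):
    """Maximum of the statistic over all markers on the linkage group."""
    return max(marker_lr(y, g) for g in markers)


def permutation_threshold(y, markers, alpha=0.05, n_perm=1000, seed=0):
    """Empirical (1 - alpha) percentile of the maximum LR when phenotypes
    are shuffled relative to the marker data."""
    rng = random.Random(seed)
    y_perm = list(y)
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # breaks any marker-trait association
        null_max.append(max_lr(y_perm, markers))
    null_max.sort()
    idx = min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1)
    return null_max[idx]
```

Taking the maximum over the whole linkage group within each permutation is what makes the resulting threshold chromosome-wide rather than pointwise.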
Epistatic model: Suppose there are two QTL affecting a quantitative endosperm trait, the first located on the middle point of the marker interval
We perform LR tests across a grid of locations on the chromosome to infer the most likely genome positions of two QTL. The declaration for the existence of QTL is based on a critical threshold for the LR test statistic that controls the chromosome-wise type I error rate. Permutation tests proposed by Churchill and Doerge (1994) are used to calculate the threshold values for each simulation scenario.
RESULTS
Additive-dominant model: One-stage design: The position and effects of the hypothesized QTL can be reasonably well estimated under this design, but, as expected, with better accuracy and precision for a larger than a smaller sample size and for a higher rather than lower heritability level (Table 3). The mean-squared error (MSE) for the position MLEs among 100 replications of simulation is very high (97.52) at M = 200 and H ^{2} = 0.2, suggesting that the QTL position cannot be precisely estimated in this case. But such a high sampling error can be reduced by a factor of 3 when M is increased to 400 or by a factor of 8 when H ^{2} is increased to 0.6. This information also can be seen in Figure 1, in which a more precise localization of the QTL displays a narrower peak for the profile of the log-likelihood-ratio test statistics across the length of the linkage group.
The genotypic means (μ’s) can be well estimated, although a larger heritability and larger sample size will lead to better estimates (Table 3). These estimates are used to solve for one additive effect (a) and two dominant effects (d_{1} and d_{2}) using the linear equation given in APPENDIX B. The MLEs of the additive and dominant effects display different accuracy and precision. Whereas the MLE of the additive effect has acceptable precision at M = 200 and H ^{2} = 0.2, those of the two dominant effects are highly upward biased and imprecise. Moreover, the dominant effect (d_{2}) due to the intralocus interaction of Q and qq appears not only to be overestimated more seriously, but also to have lower estimation precision than the dominant effect (d_{1}) due to the intralocus interaction of QQ and q. For example, at M = 200 and H ^{2} = 0.2, d_{2} is overestimated by one-fold, whereas d_{1} is overestimated by ∼30%. When the heritability of an endosperm trait and/or the sample size used is increased, the precision of the dominant effect estimation improves, but to a lesser extent than does the precision of the additive effect estimation.
In general, the one-stage design that ignores marker segregation within backcross families can be used to estimate the position and effect of a QTL on the endosperm. But acceptable estimation precision for dominant effects under the one-stage design requires a high heritability level of an endosperm trait and a large sample size.
Two-stage hierarchical design: The precision of the MLEs for the QTL position and effects improves significantly under the two-stage hierarchical design (Table 4) compared to the one-stage design (Table 3). Under the two-stage hierarchical design, the QTL can be localized with high mapping resolution when H ^{2} is high (0.6; Figure 2). For the two-stage hierarchical design, different sampling strategies have different precision of parameter estimation. The estimation precision is best under strategy 40 × 10 when H ^{2} is higher, but the precision is best under strategy 400 × 1 when H ^{2} is lower. For the same design, the precision of parameter estimation differs between different types of parameters. A general trend is that the MLEs of the dominant effects have lower precision than those of the additive effect and the QTL position. But compared to the one-stage design, the estimates of the dominant effects are much less biased for the two-stage hierarchical design (Table 4).
In summary, the two-stage hierarchical design is better than the one-stage design in terms of the accuracy and precision of QTL parameters. The two-stage hierarchical design is particularly advantageous over the one-stage design in estimating the dominant effects of an endosperm trait.
Epistatic model: One-stage design: The accuracy and precision of the estimates of the hypothesized QTL locations and effects, as well as the power to detect a significant QTL, depend on the magnitude of QTL effect, the nature of the effect, the level of heritability, and sample size (Table 5). The QTL of a larger effect can be better mapped to a genomic location than that of a smaller effect. Figure 3, A and B, illustrates the landscapes of the log-likelihood-ratio test statistics as a function of the locations of the two QTL for the one-stage design. The maximum of the landscape is closer to the coordinate of the true positions of the two QTL when an endosperm trait has a higher (Figure 3A) than lower heritability (Figure 3B).
For a small sample size (200) and a low heritability trait (0.20), the additive effect of a large QTL and its additive × additive effect with other QTL can be well estimated. The estimation of the additive effect of a small QTL displays significantly increased accuracy and precision when sample size and/or the heritability is increased. The dominant effects among different alleles at the same QTL cannot be well estimated for a small sample size and low-heritability endosperm trait (Table 5). Both the accuracy and precision of the estimates of dominant effects can be increased with increased sample size and heritability. It seems that the estimation precision for dominant effects is improved more by an increased level of heritability than by an increased sample size. It is difficult to precisely estimate the additive × dominant, dominant × additive, and dominant × dominant epistatic effects of the two QTL for an endosperm trait of low heritability. The estimates of these parameters are not much improved even when the sample size used is increased from 200 to 400.
In general, the one-stage design that ignores marker segregation within backcross families can be used to estimate the position and the additive and additive × additive effects of a large QTL on the endosperm. But it is difficult to achieve acceptable estimation precision for dominant effects and for the additive × dominant, dominant × additive, and dominant × dominant epistatic effects under the one-stage design. It appears that a high heritability level is more important in improving the estimates of these parameters than the increase of sample size.
Two-stage hierarchical design: When an endosperm trait is under low genetic control (H ^{2} = 0.2), the two-stage hierarchical design, which captures both between- and within-family variation, is not advantageous in the estimation precision of two epistatic QTL over the one-stage design, which captures only between-family variation (Table 6). This is in sharp contrast to the result of the one-QTL-fitting model in which the two-stage hierarchical design can provide more precise estimation of QTL parameters than the one-stage design (Tables 3 and 4).
The two-stage hierarchical design displays striking advantages in estimating the epistasis of QTL for an endosperm trait of high heritability (H ^{2} = 0.6; Table 6). In this case, the position of the two epistatic QTL, including one with a smaller effect, can be precisely mapped, as seen from reduced MSEs. For example, the MSE of the position of the smaller QTL is 42.33 under the one-stage design (Table 5), whereas it is reduced to 9.44–22.05 for the same sample size under the two-stage hierarchical design (Table 6). Figure 4, A–C, illustrates the landscapes of the log-likelihood-ratio test statistics of the two putative QTL with epistasis calculated from one simulation run for the different sampling strategies. The peaks of the LR landscapes are close to the true positions of the two QTL, suggesting that their positions can be well estimated.
Aside from precise estimation of additive and additive × additive (i_{aa}) effects, the precision of the estimates of different dominant effects and additive × dominant (j_{ad}), dominant × additive (k_{da}), and dominant × dominant effects (l_{dd}) can be improved for a high heritability endosperm trait from the two-stage hierarchical design (Table 6). It appears that the MLEs of the additive effects, i_{aa}, j_{ad}, and k_{da}, are more precise than those of the dominant effects and l_{dd} among all different sampling strategies. However, different sampling strategies display different estimation precision of the MLEs of the dominant effects and l_{dd}. For example, strategy 10 × 40 can provide more precise estimates of the dominant effects of one Q over two q’s (
In summary, the two-stage hierarchical design is better for the estimation of epistatic QTL than the one-stage design when an endosperm trait has a high heritability. Different sampling designs have different effects on the estimation of dominant effects and dominant × dominant epistatic effects. Depending on the nature of experimental material and breeders’ purposes, these sampling designs can be selectively used.
DISCUSSION
The genetic improvement of grain quality in crop plants has now become a major focus in many plant breeding programs (Sadimantara et al. 1997; Mazur et al. 1999; Tan et al. 1999; van der Meer et al. 2001; Wang et al. 2001). Compared to yield improvement, however, quality improvement will be much more challenging because traits affecting grain quality are endosperm specific, and the endosperm, being a triploid tissue, has complex trisomic inheritance. Also, since the endosperm is one generation ahead of its mother plant, it is difficult to predict the segregation patterns of the endosperm genes on the basis of marker information from the maternal parent. These have been two major obstacles to improving grain quality through a marker-assisted selection strategy. In this article, we employ traditional quantitative genetic principles and powerful statistical technologies to develop a new theoretical method for mapping QTL underlying endosperm traits in autogamous plants.
Unlike traditional diploid mapping, genetic mapping of the triploid endosperm requires estimation of a large number of genetic parameters because there are more copies of alleles at each locus. Statistically, an increased number of unknown parameters to be estimated can create numerous problems in estimation, such as larger biases, larger sampling errors, and lower power. We compare two alternative experimental designs in their capacity to minimize the problems due to an increased number of unknowns in our mapping model. The first, the one-stage design, draws marker information only from the current generation of plants, and the second, the two-stage hierarchical design, draws marker information from both the current generation and the self-fertilized progeny that the plants generate. In practice, the one-stage design is less expensive because no marker information for the progeny of the experimental plants is needed. However, theoretical simulations indicate that QTL parameters, especially dominant effects, can be seriously overestimated under the one-stage design when the heritability of an endosperm trait and/or sample size is not large. The two-stage hierarchical design, which extracts marker information from two successive generations, can improve the accuracy and precision of QTL parameters including dominant effects. This is especially remarkable when an endosperm trait has a low heritability and/or the experiment is based on a limited number of backcross progeny. For the two-stage hierarchical design, different allocations of a given number of samples between the backcross plants and their autogamous progeny also affect the parameter estimation.
The endosperm used as a mapping tissue has several theoretical advantages in its own right. For example, since the endosperm is triploid and contains extra gene copies, a number of hypotheses regarding the role of gene interactions within or between loci in plant development, adaptation, and evolution can be addressed using the endosperm as a model study material. Moreover, the endosperm is formed by the fusion of two polar nuclei with a sperm nucleus, which makes it an ideal material for studying maternal effects and their interactions with nuclear genes on plant behavior. Our mapping model can be readily extended to include maternal effects. Also, our limited knowledge about the genetic mechanisms underlying endosperm traits makes it practically impossible to understand the evolutionary significance of double fertilization in higher plants (Friedman 1990, 1998), which produces the endosperm. The genetic mapping of endosperm provides a powerful means for addressing this fundamental question in plant evolution.
In this article, we extend a one-QTL analysis to include QTL with epistasis in the control of an endosperm trait. The role of epistasis in trait control, evolution, and breeding has been reconciled recently thanks to the use of powerful molecular markers that can effectively dissect a complex trait into its individual QTL components (Doebley et al. 1995; Lark et al. 1995; Lukens and Doebley 1999; Cheverud 2000; Kim and Rieseberg 2001). However, epistasis results from the dependence of different genes activated in a physiological process or biochemical pathway (Phillips 1998), so the precise estimation of epistatic effects on quantitative variation is always difficult. As a result, many statistical methods for QTL mapping in the current literature assume no epistasis, or they are simply based on a two-way analysis of variance specifying the interaction effect of a given pair of markers (see the references listed above). A few methodologies with power to detect epistasis include Kao et al.’s (1999) multiple-interval mapping, Du and Hoeschele’s (2000) finite locus model, and Jannink and Jansen’s (2001) one-dimensional genome search. These methods allow for the determination of epistasis between different QTL or between QTL and polygenic background. Our method can be specifically used to map epistatic QTL affecting the expression of complex traits on triploid endosperm derived from double fertilization in flowering plants.
The idea described in our mapping model can be extended to map pleiotropic QTL affecting both grain yield and grain quality, although this will be a challenging statistical issue. Traits for grain yield, e.g., seed number and seed weight, are located on the diploid tissues of backcross plants. As a consequence, the QTL for grain yield should be modeled in the mode of disomic inheritance. However, endosperm traits for grain quality undergo trisomic inheritance and, thus, they should be modeled differently. A statistical framework should be built to model the pleiotropic effect of a QTL on diploid tissues and triploid endosperm. Such a framework will permit plant breeders to design a more efficient breeding program for selecting superior genotypes with high yield and high quality.
In this study, we report our results in a backcross population for an autogamous plant system. The application of this model in an F_{2} design is not difficult, except for the segregation of more marker genotypes in both the current and progeny generations. For an autogamous plant, the egg and the two polar nuclei are self-fertilized so that the frequencies of male gamete genotypes are identical to those of female gamete genotypes. But in an allogamous plant, such as maize, each female gamete from each mother plant will be pollinated by all possible male gametes from the pollen pool. This difference should be considered when the current model is used to study the genetics of the allogamous endosperm. All of these issues deserve in-depth investigations.
APPENDIX A: DERIVATION OF CONDITIONAL PROBABILITIES
We describe the procedures for deriving the conditional probabilities of endosperm QTL genotypes upon marker genotypes of the current backcross (one-stage design) and marker genotypes of the backcross and its progeny for an autogamous species (two-stage hierarchical design). Suppose two inbred lines P_{1} and P_{2}, with respective genotypes M_{η}M_{η}QQM_{η+1}M_{η+1} and m_{η}m_{η}qqm_{η+1}m_{η+1} at two flanking markers
One-stage design: In this model, we use only marker genotypes from the backcross plants, without considering the marker information from the progeny of the backcross. Thus, our interest here is how to generate endosperm QTL genotypes given a particular marker genotype of the backcross. The endosperm QTL genotypes (generation t + 1) produced by the first backcross genotype for an autogamous plant are the product of the array of the polar nuclei genotypes and the array of the sperm genotypes,
The second backcross genotype G^{(t)}_{101} with the same marker genotype Z^{(t)}_{11} produces one polar nuclei QTL genotype (qq) and one sperm QTL genotype (q), which thus results in a single endosperm QTL genotype, qqq. The conditional probabilities of the endosperm QTL genotypes upon the diploid marker genotypes
Similarly, we can derive the conditional probabilities of the endosperm QTL genotypes upon the other three diploid marker genotypes in the backcross (see Table 1).
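The one-stage calculation described above can be illustrated with a short Python sketch. It is not the authors' implementation: it assumes no crossover interference between the two marker intervals, codes each backcross marker as 1 (heterozygous) or 0 (homozygous for the P_{2} allele), and uses hypothetical function names together with recombination fractions r1 (left marker to QTL) and r2 (QTL to right marker).

```python
def prob_qtl_heterozygous(left, right, r1, r2):
    """P(backcross QTL genotype is Qq | flanking marker genotype).

    Assumes no crossover interference (an assumption of this sketch).
    left, right: 1 if the marker is heterozygous, 0 if homozygous for the
    P2 allele. r1, r2: recombination fractions marker-QTL and QTL-marker.
    """
    # Probability that the F1 gamete carries Q (or q) jointly with the markers;
    # the common factor 1/2 cancels in the ratio.
    carries_Q = (1 - r1 if left else r1) * (1 - r2 if right else r2)
    carries_q = (r1 if left else 1 - r1) * (r2 if right else 1 - r2)
    total = carries_Q + carries_q
    if total == 0:
        raise ValueError("marker genotype has zero probability for these r1, r2")
    return carries_Q / total


def endosperm_probs_one_stage(left, right, r1, r2):
    """Conditional probabilities of endosperm QTL genotypes given the
    backcross marker genotype, for a self-fertilized (autogamous) plant."""
    p_het = prob_qtl_heterozygous(left, right, r1, r2)
    # A Qq plant gives polar nuclei QQ or qq (1/2 each) and sperm Q or q
    # (1/2 each), so each endosperm genotype has probability 1/4.
    # A qq plant gives only qqq.
    return {
        "QQQ": p_het / 4,
        "QQq": p_het / 4,
        "Qqq": p_het / 4,
        "qqq": p_het / 4 + (1 - p_het),
    }


if __name__ == "__main__":
    # Both markers heterozygous, QTL 10% recombination from each marker.
    print(endosperm_probs_one_stage(1, 1, 0.1, 0.1))
```

Calling the function for the four marker codes (1, 1), (1, 0), (0, 1), and (0, 0) gives one row of conditional probabilities per backcross marker genotype under these assumptions.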
Two-stage hierarchical design: When the marker information from both the backcross and its progeny is considered simultaneously, we should see the segregation of both markers and QTL in the progeny of the backcross. Under the two-stage hierarchical design, the endosperm genotypes (in generation t + 1) produced by the first backcross genotype for an autogamous plant are the product of the array of eight polar nuclei genotypes (denoted by GG) and the array of eight sperm genotypes,
Identical endosperm genotypes derived from the first and second backcross genotypes are summed because these two backcross genotypes share the same marker genotype Z^{(t)}_{11}. With similar analyses we can sum the identical endosperm genotypes for the third and fourth, fifth and sixth, and seventh and eighth backcross genotypes.
The conditional probabilities of the endosperm QTL genotypes, conditional upon diploid marker genotypes of the backcross (t) and its autogamous progeny (t + 1) under the two-stage hierarchical design, can be derived according to Bayes’ theorem. For example, the conditional probabilities of endosperm QTL genotypes QQQ, QQq, Qqq, and qqq, conditional upon a diploid marker genotype Z_{111} in the two successive generations, can be calculated as
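The Bayes calculation for the two-stage design can be sketched by brute-force enumeration. The Python code below is an illustration under stated assumptions, not the authors' derivation: it assumes no crossover interference, assumes that the egg and the two polar nuclei carry the same haplotype and that both sperm cells are identical, and scores the progeny (embryo) markers as counts of P_{1} alleles (0, 1, or 2). The function names and the marker coding are hypothetical and do not reproduce the Z subscripts used in the text.

```python
from itertools import product

P1_HAP = (1, 1, 1)  # haplotype M Q M (1 = allele from P1)
P2_HAP = (0, 0, 0)  # haplotype m q m (0 = allele from P2)


def gamete_freqs(hap_a, hap_b, r1, r2):
    """Frequencies of the eight (marker, QTL, marker) gametes produced by a
    diploid with haplotypes hap_a / hap_b, assuming no interference."""
    haps = (hap_a, hap_b)
    freqs = {}
    for start, x1, x2 in product((0, 1), repeat=3):
        # start: haplotype copied at the left marker; x1, x2: crossover flags.
        source = (start, start ^ x1, start ^ x1 ^ x2)
        gamete = tuple(haps[s][k] for k, s in enumerate(source))
        p = 0.5 * (r1 if x1 else 1 - r1) * (r2 if x2 else 1 - r2)
        freqs[gamete] = freqs.get(gamete, 0.0) + p
    return freqs


def endosperm_posterior(bc_markers, embryo_markers, r1, r2):
    """P(endosperm QTL genotype | backcross markers, embryo markers).

    bc_markers: (left, right), 1 = heterozygous, 0 = homozygous for P2 allele.
    embryo_markers: (left, right) counts of P1 alleles in the selfed embryo.
    """
    joint = [0.0, 0.0, 0.0, 0.0]  # indexed by Q dosage in the endosperm
    # The backcross plant carries one F1 gamete and one P2 haplotype.
    for hap, p_hap in gamete_freqs(P1_HAP, P2_HAP, r1, r2).items():
        if (hap[0], hap[2]) != tuple(bc_markers):
            continue
        selfed = gamete_freqs(hap, P2_HAP, r1, r2)
        for (egg, p_f), (sperm, p_m) in product(selfed.items(), repeat=2):
            embryo = (egg[0] + sperm[0], egg[2] + sperm[2])
            if embryo != tuple(embryo_markers):
                continue
            # Two maternal copies (polar nuclei) plus one paternal copy.
            joint[2 * egg[1] + sperm[1]] += p_hap * p_f * p_m
    total = sum(joint)
    if total == 0:
        raise ValueError("marker data have zero probability for these r1, r2")
    names = ("qqq", "Qqq", "QQq", "QQQ")
    return {name: p / total for name, p in zip(names, joint)}


if __name__ == "__main__":
    print(endosperm_posterior((1, 1), (1, 2), 0.1, 0.1))
```

Enumerating the joint distribution and normalizing is slower than a closed-form expression, but it makes the role of the progeny markers explicit: in this sketch they help distinguish QQq (Q from the polar nuclei) from Qqq (Q from the sperm), which the backcross markers alone cannot do.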
APPENDIX B: ESTIMATORS OF MODEL PARAMETERS IN THE M STEP
Additive-dominant model: The MLEs of the unknown QTL parameters are obtained by differentiating the likelihood with respect to each unknown, setting the derivatives equal to zero, and solving the log-likelihood equations. The EM algorithm is used to obtain the MLE of the genetic mean of each endosperm genotype at a putative QTL bracketed by a marker interval.
The expressions of the log-likelihood equations for estimating genotypic means and residual variance in the M step are given as
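One EM iteration for the genotypic means and residual variance can be sketched as follows. The sketch assumes the standard normal-mixture form, in which the mixing proportions for each observation are the marker-conditional genotype probabilities and all genotypes share one residual variance. The function names and the argument layout (one list of conditional probabilities per observation) are hypothetical.

```python
import math


def normal_pdf(y, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((y - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_iteration(y, omega, means, var):
    """One E step and one M step of a normal mixture model.

    y: phenotypic values, one per observation.
    omega: omega[i][j] is the conditional probability of QTL genotype j for
           observation i given its marker genotype (treated as known here).
    means, var: current genotypic means and common residual variance.
    """
    n, k = len(y), len(means)
    # E step: posterior probability of each genotype for each observation.
    post = []
    for yi, wi in zip(y, omega):
        weighted = [w * normal_pdf(yi, m, var) for w, m in zip(wi, means)]
        total = sum(weighted)
        post.append([v / total for v in weighted])
    # M step: posterior-weighted means, then the pooled residual variance.
    new_means = []
    for j in range(k):
        weight = sum(p[j] for p in post)
        if weight > 0:
            new_means.append(sum(p[j] * yi for p, yi in zip(post, y)) / weight)
        else:
            new_means.append(means[j])  # genotype not represented; keep old value
    new_var = sum(
        p[j] * (yi - new_means[j]) ** 2 for p, yi in zip(post, y) for j in range(k)
    ) / n
    return new_means, new_var, post


if __name__ == "__main__":
    y = [9.8, 10.4, 12.1, 12.9, 7.2, 6.9]
    omega = [[0.25, 0.25, 0.25, 0.25]] * 6
    means, var = [13.0, 11.0, 9.0, 7.0], 1.0
    for _ in range(50):
        means, var, _ = em_iteration(y, omega, means, var)
    print(means, var)
```

In practice the iteration would be repeated until the log-likelihood stabilizes, and the whole procedure would be rerun at each putative QTL position along the marker interval.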
After the genotypic means (m) are estimated, a linear transformation is used to estimate the QTL effect parameters (e). We have
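The transformation from genotypic means to effect parameters amounts to solving a small linear system. The Python sketch below assumes an illustrative parameterization that is not taken from the source: an overall mean mu, one additive effect a, and two dominance effects d1 and d2, with the means of QQQ, QQq, Qqq, and qqq written as mu + 1.5a, mu + 0.5a + d1, mu - 0.5a + d2, and mu - 1.5a. The coefficient matrix and function names are hypothetical and should be replaced by the design matrix of the model actually fitted.

```python
# Assumed design matrix: rows are QQQ, QQq, Qqq, qqq; columns are mu, a, d1, d2.
ASSUMED_DESIGN = [
    [1.0, 1.5, 0.0, 0.0],
    [1.0, 0.5, 1.0, 0.0],
    [1.0, -0.5, 0.0, 1.0],
    [1.0, -1.5, 0.0, 0.0],
]


def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0:
            raise ValueError("design matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def effects_from_means(genotypic_means, design=ASSUMED_DESIGN):
    """Return (mu, a, d1, d2) such that design times effects equals the means."""
    return tuple(solve_linear(design, list(genotypic_means)))


if __name__ == "__main__":
    # Means ordered QQQ, QQq, Qqq, qqq.
    print(effects_from_means([13.0, 11.5, 8.7, 7.0]))
```

Because the system is square and invertible under this assumed coding, the effect estimates inherit the maximum-likelihood property of the estimated genotypic means.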
Epistatic model: The two-QTL quantitative genetic model for a triploid endosperm trait can be expressed in matrix form,
To reduce the sampling variances of the MLEs of the QTL effect parameters, especially the dominant × dominant effects, we use the single parameters j_{ad}, k_{da}, and l_{dd} to capture the information of the different additive × dominant, dominant × additive, and dominant × dominant effects, respectively. In this case, the number of unknown parameters in e is reduced to 11. We thus have a new design matrix
Acknowledgments
We thank three anonymous reviewers for their helpful comments on earlier versions of this manuscript. This work is partially supported by a grant from the National Science Foundation (DMS9971586) to C.G., an Outstanding Young Investigators Award of the National Science Foundation of China (30128017), and a University of Florida Research Opportunity Fund (02050259) to R.W. The publication of this manuscript is approved as Journal Series no. R08586 by the Florida Agricultural Experiment Station.
Footnotes

Communicating editor: P. D. Keightley
 Received April 12, 2002.
 Accepted June 27, 2002.
 Copyright © 2002 by the Genetics Society of America