Genetics, Vol. 162, 875-892, October 2002, Copyright © 2002

Statistical Methods for Dissecting Triploid Endosperm Traits Using Molecular Markers: An Autogamous Model

Rongling Wu (a), Chang-Xing Ma (a), Maria Gallo-Meagher (b), Ramon C. Littell (a), and George Casella (a)
(a) Department of Statistics, University of Florida, Gainesville, Florida 32611
(b) Agronomy Department, University of Florida, Gainesville, Florida 32611

Corresponding author: Rongling Wu, 533 McCarty Hall C, University of Florida, Gainesville, FL 32611. E-mail: rwu@stat.ufl.edu

Communicating editor: P. D. KEIGHTLEY


*  ABSTRACT

The endosperm, a result of double fertilization in flowering plants, is a triploid tissue whose genetic composition is more complex than diploid tissue. We present a new maximum-likelihood-based statistical method for mapping quantitative trait loci (QTL) underlying endosperm traits in an autogamous plant. Genetic mapping of quantitative endosperm traits is qualitatively different from traits for other plant organs because the endosperm displays complicated trisomic inheritance and represents a younger generation than its mother plant. Our endosperm mapping method is based on two different experimental designs: (1) a one-stage design in which marker information is derived from the maternal genome and (2) a two-stage hierarchical design in which marker information is derived from both the maternal and offspring genomes (embryos). Under the one-stage design, the position and additive effect of a putative QTL can be well estimated, but the estimates of the dominant and epistatic effects are upward biased and imprecise. The two-stage hierarchical design, which extracts more genetic information from the material, typically improves the accuracy and precision of the dominant and epistatic effects for an endosperm trait. We discuss the effects on the estimation of QTL parameters of different sampling strategies under the two-stage hierarchical design. Our method will be broadly useful in mapping endosperm traits for many agriculturally important crop plants and also make it possible to study the genetic significance of double fertilization in the evolution of higher plants.


ONE of the most important events in the evolution of higher plants is the occurrence of double fertilization, a phenomenon independently discovered by Navashin of Russia in 1898 and Guignard of France in 1899 (FRIEDMAN 1990, 1998; JENSEN 1998). During the process of double fertilization, one of the two sperm cells from a pollen tube fertilizes the haploid egg cell to form a diploid zygote (the new sporophytic generation) and the other sperm cell fertilizes the diploid central cell and fuses with the central cell (polar) nuclei, thus giving rise to the triploid endosperm. Different targets for fertilization by the sperm cells are regarded as evolutionarily significant, because such reproductive behavior provides an opportunity for differential segregation of those traits associated with the production of nutritive tissue (endosperm) for the seed from those traits associated with successful embryo development. With the initiation of zygote and endosperm from separate fertilizations, the opportunity exists for optimal, independent specialization of each tissue (FRIEDMAN 1990, 1998).

The endosperm is also tremendously important to human nutrition. Grain quality depends critically upon amylose content in rice, protein content and percentage of amino acid in wheat, gum content in barley, and sugar content in sweet corn. Genetic improvement of such endosperm traits that affect food quality has received considerable attention in plant breeding (BENNER et al. 1989; SADIMANTARA et al. 1997; MAZUR et al. 1999; VAN DER MEER et al. 2001). Quantitative genetic models for analyzing the trisomic inheritance of endosperm traits have been developed and applied to practical data analyses in a variety of grain crops (GALE 1976; MO 1987; BOGYO et al. 1988; POONI et al. 1992; ZHU and WEIR 1994). However, traditional quantitative methods may not efficiently resolve the detection of genetic factors underlying endosperm traits because they fail to estimate the map positions of these genetic factors on the chromosomes. DNA-based molecular markers for mapping genetic factors conditioning a quantitative trait, or quantitative trait loci (QTL), are being widely used to map QTL for endosperm traits (TAN et al. 1999; WANG and LARKINS 2001; WANG et al. 2001). For example, using 83 simple sequence repeat loci, WANG et al. (2001) detected two QTL, one on the short arm of chromosome 4 and the other on the long arm of chromosome 7, together accounting for 25% of the variance for elongation factor 1α content in maize endosperm.

Current statistical analyses of the association between markers and endosperm traits are based on the assumption that endosperm traits are controlled under the same genetic composition as other tissues developing from the diploid embryo. This assumption is obviously violated because the endosperm has three unique properties. First, the endosperm is triploid and has a more complex genetic composition than the embryo. For a locus with two alleles A and a, four genotypic combinations AAA, AAa, Aaa, and aaa are possible, vs. three genotypes AA, Aa, and aa for the diploid embryo. Second, the endosperm of a cross between two different genotypes will differ from that of the reciprocal cross. Third, the occurrence of the fertilized egg is the beginning of a new generation, so the embryo and endosperm on a plant represent the next generation. A precise resolution into mapping endosperm traits using marker information from diploid organs needs the development of a bridge between these two tissues with different levels of ploidy. Although this is statistically challenging, currently developed computational algorithms, such as the EM algorithm, provide a powerful means for mapping the QTL contained in the triploid endosperm on the basis of marker information from diploids.
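The first two properties can be illustrated with a minimal sketch. It assumes only what the text states: the endosperm carries two identical maternal (polar) nuclei plus one sperm nucleus. The function name `endosperm_genotype` is hypothetical and used only for illustration.

```python
def endosperm_genotype(maternal_gamete: str, sperm: str) -> str:
    """Triploid endosperm genotype: two identical maternal polar nuclei plus one sperm nucleus."""
    alleles = maternal_gamete * 2 + sperm
    # Sort so that the upper-case allele comes first (e.g. "AAa"), as in the text.
    return "".join(sorted(alleles, key=lambda c: (c.lower(), c.islower())))


# Reciprocal crosses between AA and aa parents give different endosperm genotypes.
print(endosperm_genotype("A", "a"))  # AA mother x aa father
print(endosperm_genotype("a", "A"))  # aa mother x AA father

# The four triploid classes arise from the 2 x 2 combinations of gametes.
print(sorted({endosperm_genotype(m, s) for m in "Aa" for s in "Aa"}))
```

The two reciprocal crosses yield AAa and Aaa, respectively, which is why a diploid-style analysis that treats both as a single heterozygous class loses information.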

In this article, we have developed a maximum-likelihood-based method, implemented with the EM algorithm, to map QTL responsible for endosperm traits using a genetic linkage map of polymorphic markers. For a particular plant, the formation of endosperm QTL genotypes depends on how its polar nuclei, which are the duplication of the female gametes (eggs), are fertilized by the male gametes (pollen). Our models are different for autogamous and allogamous plants. For autogamous plants, such as rice, the sperm cells fertilizing the two central cells, which generate the endosperm, are derived from the same plant. For allogamous plants the sperm cells are derived from a pollen pool of the mapping population. For this reason, the endosperm QTL genotypes will have different segregation patterns between autogamous and allogamous plants. Here we report the development of a statistical method for endosperm mapping in autogamous plants. This method is studied under different statistical strategies by extensive simulation experiments.


*  EXPERIMENTAL DESIGNS

Consider an endosperm trait measured in a backcross or F2 population. Traditional diploid trait mapping, as proposed by LANDER and BOTSTEIN (1989), uses marker information and phenotypic measurements derived from the same generation and the same ploidy level. However, because the endosperm is triploid, its precise molecular characterization can be difficult. For example, it is impossible to distinguish directly between the two endosperm genotypes AAa and Aaa with commonly used dominant (randomly amplified polymorphic DNA or amplified fragment length polymorphism) or codominant (restriction fragment length polymorphism or microsatellite) marker systems. For dominant markers, the three genotypes AAA, AAa, and Aaa cannot be distinguished from one another. For this reason, marker information in endosperm mapping should be derived from diploid tissues rather than from the triploid endosperm. Assuming that an endosperm-specific trait is controlled only by the endosperm QTL genotype and that no gene interactions arising from maternal effects affect trait expression, we need to predict endosperm QTL behavior (generation t + 1) using molecular markers from the maternal genome (generation t) or offspring genome (generation t + 1). It should be pointed out that, although generation t + 1 is used here for the endosperm, this does not mean that the endosperm can reproduce to generate progeny. Marker information can be sampled from a diploid tissue of a maternal plant and/or from a diploid tissue of its progeny (e.g., the embryo), which gives two different experimental designs for mapping endosperm QTL contained in seeds. These two designs are described below.

Consider a backcross plant of an autogamous species. The diploid marker genotype of this plant is determined only by the gamete genotypes of the heterozygous F1, whereas its endosperm QTL genotypes are determined by the combination of the polar nuclei of two central cells and a sperm nucleus, which this plant produces for the next generation. Within individual plants, the frequencies of polar nuclei genotypes are identical to those of the female gamete (egg) genotypes because the two central cells are formed from the egg cell through mitosis and, thereby, the polar nuclei genotypes can be regarded as the homogeneous duplication of the egg genotypes (FRIEDMAN 1990, 1998). For autogamous plants, as considered in this study, the frequencies of the sperm genotypes are identical to those of the egg genotypes and of the polar nuclei genotypes. On the basis of these properties, we can calculate the frequencies of the endosperm QTL genotypes that a backcross plant produces, given the diploid marker genotypes of this plant and its embryos.
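The frequency calculation described above can be sketched for a single QTL. The sketch assumes that the backcross is made to the qq parent, so that backcross plants are Qq or qq with probability 1/2 each, and that each plant is selfed; the function name `selfed_endosperm_freqs` is hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def selfed_endosperm_freqs(plant_genotype: str) -> dict:
    """Endosperm QTL genotype frequencies, keyed by the number of Q alleles, for a selfed diploid plant."""
    gametes = defaultdict(Fraction)
    for allele in plant_genotype:  # each allele is transmitted with probability 1/2
        gametes[allele] += Fraction(1, 2)
    freqs = defaultdict(Fraction)
    for egg, p_egg in gametes.items():  # the polar nuclei duplicate the egg genotype
        for sperm, p_sperm in gametes.items():  # sperm come from the same plant (autogamy)
            n_q = 2 * (egg == "Q") + (sperm == "Q")
            freqs[n_q] += p_egg * p_sperm
    return dict(freqs)


# Average over the two backcross plant genotypes (assumed Qq : qq = 1 : 1).
population = defaultdict(Fraction)
for genotype, weight in (("Qq", Fraction(1, 2)), ("qq", Fraction(1, 2))):
    for n_q, p in selfed_endosperm_freqs(genotype).items():
        population[n_q] += weight * p

for n_q in (3, 2, 1, 0):
    print(n_q, population[n_q])
```

Under these assumptions a Qq plant produces QQQ, QQq, Qqq, and qqq endosperm with frequency 1/4 each and a qq plant produces only qqq, so the population frequencies are 1/8, 1/8, 1/8, and 5/8.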

Two different experimental designs can be used to predict triploid endosperm QTL genotypes on the basis of diploid marker genotypes. The first design is one in which the endosperm is predicted from a diploid tissue of backcross plants (in generation t). We call this design a one-stage design. The second design uses marker genotypes from both backcross plants (maternal genome) and their embryos (offspring genome) to predict the endosperm, which is called a two-stage hierarchical design because two successive generations (t and t + 1) are genotyped. The one-stage design is simpler in terms of the material genotyped, whereas the two-stage hierarchical design is more precise because within-family variation of a backcross plant is considered.

Suppose there are two flanking markers, η and η+1, genotyped in the diploid tissue and embryos of a backcross plant. The recombination fraction between the two markers is denoted by r. A putative QTL, located between the two markers (at recombination fraction r1 from η and r2 from η+1), segregates in a trisomic manner and exerts an effect on an endosperm trait. Let G_i^(t) denote the diploid marker genotype of backcross plant i (generation t) at the two flanking markers, G_ij^(t+1) denote the embryo marker genotype of the jth seed (generation t + 1) that this plant produces, and g_k denote the kth QTL genotype of the triploid endosperm (k = 0, 1, 2, 3 is the number of copies of the increasing allele Q). A general expression for the conditional probability of an endosperm QTL genotype under the one-stage and two-stage hierarchical designs can be written, respectively, as

$$p_{ik} = \Pr\left(g_k \mid G_i^{(t)}\right), \quad i = 1, \ldots, M,$$
$$p_{ijk} = \Pr\left(g_k \mid G_i^{(t)}, G_{ij}^{(t+1)}\right), \quad i = 1, \ldots, M;\ j = 1, \ldots, N_i,$$

where M is the number of backcross plants and N_i is the number of seeds derived from backcross plant i that provide both embryo genotypic and endosperm phenotypic information. Expressions for p_ik and p_ijk are derived for all possible marker genotypes of the backcross plant (one-stage design) or for all possible marker genotypes of the backcross plant and all its possible embryo marker genotypes (two-stage hierarchical design). These expressions are given in Table 1 and Table 2, respectively, with detailed derivations described in APPENDIX A.


 

 
Table 1. Conditional probabilities of endosperm QTL genotypes, conditional upon diploid marker genotypes for η and η+1 in a backcross design


 

 
Table 2. Conditional probabilities (p_ijk) of endosperm QTL genotypes, conditional upon two-generation diploid marker genotypes from backcross plants and their embryos
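As an illustration of how the one-stage probabilities can be obtained, the following sketch uses standard backcross interval-mapping arithmetic rather than the entries of Table 1, which are not reproduced here. It assumes no crossover interference within the interval, coupling phase in the F1, a backcross to the qq parent, and selfing of the backcross plant. All function names are hypothetical.

```python
def transmit(a: int, b: int, r: float) -> float:
    """Probability factor for adjacent loci on one F1 gamete: 1 - r if non-recombinant, r otherwise."""
    return 1.0 - r if a == b else r


def plant_qtl_prob(left: int, right: int, r1: float, r2: float) -> float:
    """P(backcross plant is Qq | flanking marker genotypes).

    left/right are 1 if the plant is heterozygous at that marker
    (it received the F1 allele coupled with Q) and 0 if it is homozygous.
    """
    carries_q = transmit(left, 1, r1) * transmit(1, right, r2)
    lacks_q = transmit(left, 0, r1) * transmit(0, right, r2)
    return carries_q / (carries_q + lacks_q)


def endosperm_probs_one_stage(left: int, right: int, r1: float, r2: float) -> list:
    """Conditional probabilities of endosperm genotypes, indexed by k = number of Q alleles."""
    w = plant_qtl_prob(left, right, r1, r2)
    # A selfed Qq plant gives each endosperm class with probability 1/4;
    # a selfed qq plant gives only qqq (k = 0).
    return [(1.0 - w) + w / 4.0, w / 4.0, w / 4.0, w / 4.0]


for markers in ((1, 1), (1, 0), (0, 1), (0, 0)):
    print(markers, endosperm_probs_one_stage(*markers, r1=0.1, r2=0.1))
```

Because the maternal markers only inform the plant's own QTL genotype, the three Q-carrying endosperm classes always receive equal probability under this design, which is consistent with the weaker resolution of dominant effects reported for the one-stage design.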


*  STATISTICAL MODEL

We have formulated a statistical mixture model, in which different QTL genotypes in the endosperm are viewed as components of a normal mixture. This mixture model is defined by the frequency of each of the endosperm QTL genotypes and the density corresponding to each genotype.

Additive-dominant model:
The phenotypic value (y) of an endosperm trait due to a single QTL for the ith backcross plant under the one-stage design or the jth autogamous seed of the ith backcross plant under the two-stage hierarchical design can be statistically modeled by

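$$y_i = \mu + a x_i + d_1 z_{1i} + d_2 z_{2i} + \varepsilon_i \quad \text{or} \quad y_{ij} = \mu + a x_{ij} + d_1 z_{1ij} + d_2 z_{2ij} + \varepsilon_{ij}, \tag{1}$$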

where µ is the overall mean; a is the additive effect of the increasing allele Q at the QTL; d1 is the first dominant effect, i.e., the dominance effect of QQ to q when Q is dominant or that of q to QQ when q is dominant; d2 is the second dominance effect, which reflects the dominance effect of Q to qq when Q is dominant or qq to Q when q is dominant; and x's and z's are the indicator variables describing an endosperm QTL genotype, defined as

(2)

The residuals ε_i and ε_ij include the aggregate effect of both polygenes and error and are distributed as N(0, σ²_ε). The probabilities with which the x's and z's take an assigned value depend on the genomic position of the QTL in the interval bracketed by the flanking markers, as given in Table 1 and Table 2.

Consider an endosperm mapping population composed of M backcross plants and Ni randomly selected autogamous seeds from the ith backcross plant. These backcross plants and the embryos of their seeds are genotyped simultaneously, whereas phenotypes are measured for the endosperm of their seeds. The likelihood of the marker data and the endosperm trait values controlled by the putative QTL can be represented under the one-stage or two-stage hierarchical design by

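$$\ell(\Omega) = \prod_{i=1}^{M} \sum_{k=0}^{3} p_{ik} f_k(y_i) \quad \text{or} \quad \ell(\Omega) = \prod_{i=1}^{M} \prod_{j=1}^{N_i} \sum_{k=0}^{3} p_{ijk} f_k(y_{ij}), \tag{3}$$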

where Ω is the vector of unknown QTL-effect parameters, QTL-position parameters, and residual variance to be estimated; p_ik and p_ijk are the proportions of the normal mixture components (i.e., the endosperm QTL genotypes); and f_k is the normal density corresponding to the kth genotype, with mean µ_k described in Equation 2 and Equation 3. On the basis of quantitative genetic models of endosperm traits (GALE 1976; MO 1987; BOGYO et al. 1988; POONI et al. 1992), the unknown QTL-effect parameters (Ω_e) can be exactly expressed as a linear combination of the QTL genotypic means (Ω_m). We first estimate Ω_m and then solve for Ω_e because the former has a simpler structure.

The maximum-likelihood estimates (MLEs) of the unknown parameters under the one-stage or two-stage hierarchical design can be computed by implementing an expectation maximization (EM) algorithm (DEMPSTER et al. 1977; MENG and RUBIN 1993). The log-likelihood of Equation 3 for the two-stage hierarchical design is given by

$$\log \ell(\Omega) = \sum_{i=1}^{M} \sum_{j=1}^{N_i} \log\left[\sum_{k=0}^{3} p_{ijk} f_k(y_{ij})\right] \tag{4}$$

with derivatives

where we define

$$P_{ijk} = \frac{p_{ijk} f_k(y_{ij})}{\sum_{k'=0}^{3} p_{ijk'} f_{k'}(y_{ij})}, \tag{5}$$

which can be thought of as the posterior probability that the endosperm from the jth seed of the ith backcross plant has QTL genotype k. We then implement the EM algorithm with the expanded parameter set {Ω, P}, where P = {P_ijk}. Conditional on P, we solve for the zeros of (∂/∂Ω_φ) log ℓ(Ω) (Appendix B) to get our estimates of Ω (the M step). The estimates are then used to update P (the E step), and the process is repeated until convergence. The values at convergence are the MLEs. The formulas for estimating the unknown parameters in the M step under both one-stage and two-stage hierarchical designs are given in Appendix B.
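The E and M steps can be sketched for a simplified case. This is not the estimation procedure of Appendix B: the sketch assumes that the QTL position is held fixed, so the conditional probabilities are known constants, and it estimates only the four genotypic means and a common residual variance. The function names and the simulated values are hypothetical.

```python
import math
import random


def normal_pdf(y: float, mean: float, var: float) -> float:
    return math.exp(-((y - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_mixture(y, prior, n_iter=200, tol=1e-8):
    """EM for a normal mixture with known mixing proportions.

    y[n] is a phenotype and prior[n][k] the conditional probability of
    endosperm genotype k for observation n (QTL position held fixed).
    Returns (means, variance, log-likelihood at the last E step).
    """
    n_obs, n_geno = len(y), len(prior[0])
    mean_y = sum(y) / n_obs
    var = sum((v - mean_y) ** 2 for v in y) / n_obs
    sd = math.sqrt(var)
    # Spread the starting means around the sample mean.
    means = [mean_y + (k - (n_geno - 1) / 2.0) * 0.5 * sd for k in range(n_geno)]
    loglik_old = -math.inf
    loglik = loglik_old
    for _ in range(n_iter):
        # E step: posterior probability of each genotype for each observation.
        post, loglik = [], 0.0
        for yn, pn in zip(y, prior):
            w = [pn[k] * normal_pdf(yn, means[k], var) for k in range(n_geno)]
            total = sum(w)
            loglik += math.log(total)
            post.append([wk / total for wk in w])
        # M step: posterior-weighted means and pooled residual variance.
        for k in range(n_geno):
            weight = sum(p[k] for p in post)
            if weight > 0.0:
                means[k] = sum(p[k] * yn for p, yn in zip(post, y)) / weight
        var = sum(p[k] * (yn - means[k]) ** 2
                  for p, yn in zip(post, y) for k in range(n_geno)) / n_obs
        if abs(loglik - loglik_old) < tol:
            break
        loglik_old = loglik
    return means, var, loglik


# Simulated example with hypothetical genotypic means (index k = number of Q alleles).
random.seed(1)
true_means = [8.0, 10.0, 11.0, 12.0]
sim_prior, sim_y = [], []
for _ in range(400):
    w = random.choice([0.95, 0.05])  # hypothetical P(plant is Qq | markers)
    p = [1.0 - w + w / 4.0, w / 4.0, w / 4.0, w / 4.0]
    k = random.choices(range(4), weights=p)[0]
    sim_prior.append(p)
    sim_y.append(random.gauss(true_means[k], 1.0))

est_means, est_var, est_loglik = em_mixture(sim_y, sim_prior)
print(est_means, est_var, est_loglik)
```

In a full analysis this routine would be repeated over a grid of putative QTL positions, with the conditional probabilities recomputed at each position and the log-likelihood profile used to locate the QTL.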

Epistatic model:
Suppose there are two biallelic QTL, the first with alleles Q1 and q1 and the second with alleles Q2 and q2, that epistatically affect an endosperm-specific trait. At each QTL there are four possible genotypes for the triploid endosperm. Thus, the two-QTL genotypes in the endosperm can be arrayed as

$$(Q_1Q_1Q_1,\ Q_1Q_1q_1,\ Q_1q_1q_1,\ q_1q_1q_1)^{T} \otimes (Q_2Q_2Q_2,\ Q_2Q_2q_2,\ Q_2q_2q_2,\ q_2q_2q_2)^{T},$$

where ⊗ is the Kronecker product.

If the two putative QTL are tested on different intervals, the conditional probabilities of the two-QTL genotypes can be calculated independently for each QTL under both the one-stage and the two-stage hierarchical design, assuming that there is no crossover interference between the two marker intervals. Denote by Θ1 and Θ2 the matrices of conditional probabilities of the QTL genotypes for the first QTL, conditional upon the marker interval between η and η+1, and for the second QTL, conditional upon a different marker interval between η' and η'+1, respectively. The conditional probability matrix (Θ) of joint QTL genotypes at the two QTL conditional upon the two marker intervals can be expressed as Θ = Θ1 ⊗ Θ2. If two linked QTL are located within the same marker interval, the conditional probabilities of the two-QTL endosperm genotypes conditional upon the diploid marker genotypes of the flanking markers (bracketing the two putative QTL) must be derived directly. These two-QTL conditional probabilities for the backcross population of an autogamous plant are given on our statistical genetics webpage (http://www.ifasstat.ufl.edu/genetics/~endosperm). In a practical data analysis, however, modeling two QTL within the same marker interval should be considered carefully because the effects of the two QTL are confounded unless an adequately large sample size is used.
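For QTL on different intervals, the joint probabilities follow from a Kronecker product of the two single-QTL probability vectors. The sketch below uses the population frequencies (1/8, 1/8, 1/8, 5/8) for QQQ, QQq, Qqq, and qqq as example inputs; the same operation would apply to a plant's conditional probability vectors. The helper `kron` is a hypothetical name.

```python
from fractions import Fraction


def kron(u, v):
    """Kronecker product of two probability vectors (first index varies slowest)."""
    return [a * b for a in u for b in v]


# Order of classes at each QTL: QQQ, QQq, Qqq, qqq.
theta1 = [Fraction(1, 8), Fraction(1, 8), Fraction(1, 8), Fraction(5, 8)]
theta2 = [Fraction(1, 8), Fraction(1, 8), Fraction(1, 8), Fraction(5, 8)]

joint = kron(theta1, theta2)
print(len(joint), sum(joint))
print(joint[0], joint[-1])  # both QTL triple-Q, and both QTL triple-q
```

The product form is valid only under the stated assumption of independence between the two intervals; it does not apply to two QTL within the same interval.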

The phenotypic value of an endosperm trait due to the two putative QTL can be modeled under the one-stage or two-stage hierarchical design. As our analysis is similar for these two designs, only the two-stage hierarchical design is described. The phenotypic value of an endosperm trait from the jth autogamous seed of the ith backcross plant under the two-stage hierarchical design is written as

(6)

where the x's and z's are the indicator variables describing an endosperm QTL genotype at each QTL for endosperm ij, as defined in expression (2), and the a's, d's, i_aa, j_ad's, k_da's, and l_dd's are the corresponding additive, dominant, additive × additive, additive × dominant, dominant × additive, and dominant × dominant epistatic effects between the two QTL (see Equation 6). The EM algorithm is implemented to estimate all unknown parameters, including the overall mean, QTL effects and positions, and residual variance (Appendix B).


*  HYPOTHESIS TESTING

Additive-dominant model:
A number of hypothesis tests can be formulated for endosperm inheritance. The first hypothesis considers the existence of any QTL affecting the expression of an endosperm trait, which is expressed as

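$$H_0: a = d_1 = d_2 = 0 \quad \text{vs.} \quad H_1: \text{at least one of } a,\ d_1,\ d_2 \text{ is not zero} \tag{7}$$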

for the backcross B1. The test statistic for testing the above hypotheses is calculated as the log-likelihood ratio of the full model (H1) over the reduced model (H0),

$$\mathrm{LR} = -2\left[\log \ell(\tilde{\Omega}) - \log \ell(\hat{\Omega})\right],$$

where $\tilde{\Omega}$ and $\hat{\Omega}$ denote the ML estimates of the unknown parameters under H0 and H1, respectively. The log-likelihood ratio (LR) is asymptotically χ² distributed with 3 d.f. However, the critical threshold value for declaring the existence of an endosperm QTL is generally calculated on the basis of permutation tests (CHURCHILL and DOERGE 1994).

Second, a hypothesis test can be made for the additive or dominant effects of the QTL on the endosperm trait, on the basis of

$$H_0: a = 0 \quad \text{vs.} \quad H_1: a \neq 0, \tag{8}$$

whose log-likelihood ratio test statistic is asymptotically χ² distributed with 1 d.f., and

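$$H_0: d_1 = d_2 = 0 \quad \text{vs.} \quad H_1: \text{at least one of } d_1,\ d_2 \text{ is not zero}, \tag{9}$$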

whose log-likelihood ratio test statistic is asymptotically χ² distributed with 2 d.f.

Epistatic model:
The existence of any QTL affecting the expression of an endosperm trait under the two-QTL model can be tested on the basis of the hypotheses

(10)


The test statistic for testing the above hypotheses is calculated as the log-likelihood ratio of the full model (H1) over the reduced model (H0), which is asymptotically χ² distributed with 15 d.f.

Like the hypothesis tests for the additive or dominant effects of individual QTL, the significance of QTL epistasis effects on the expression of an endosperm trait can be tested on the basis of

(11)


Under the null hypothesis, the LR of Equation 11 is asymptotically χ² distributed with 9 d.f. The individual components of the epistatic effects (additive × additive, additive × dominant, dominant × additive, and dominant × dominant) can also be tested separately.

If two QTL detected are on the same interval, the degree of their linkage can also be tested. Testing the QTL linkage is equivalent to testing r2 = 0, where r2 is the recombination fraction between the two QTL. This test can be extended to test for a particular value of the recombination fraction.


*  MONTE CARLO SIMULATION

Simulation scenarios:
We performed a series of simulation experiments to examine the statistical properties of our endosperm mapping method. A linkage group of length 100 cM is simulated for a backcross population. Assume that there are six equidistant markers, ordered 1-6, on the group. In the backcross, these six markers generate a total of 64 genotypes whose frequencies are simulated on the basis of the recombination fractions between all pairs of two successive markers. We use the Kosambi map function to convert the map distance into the recombination fraction.
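The marker setup can be reproduced with a short sketch. The Kosambi map function is r = 0.5 tanh(2d), with d in Morgans. Multiplying interval-wise probabilities to obtain the 64 multilocus genotype frequencies is an assumption of this sketch (independent recombination in successive intervals); the variable names are hypothetical.

```python
import math
from itertools import product


def kosambi_r(distance_cm: float) -> float:
    """Recombination fraction from map distance (cM) under the Kosambi map function."""
    d = distance_cm / 100.0  # convert to Morgans
    return 0.5 * math.tanh(2.0 * d)


n_markers, length_cm = 6, 100.0
interval_cm = length_cm / (n_markers - 1)  # five intervals of 20 cM each
r = kosambi_r(interval_cm)
print(round(r, 4))

# Frequencies of the 2**6 backcross marker genotypes
# (1 = heterozygous, 0 = homozygous at each marker).
freqs = {}
for genotype in product((0, 1), repeat=n_markers):
    p = 0.5  # either allele at the first marker with probability 1/2
    for a, b in zip(genotype, genotype[1:]):
        p *= (1.0 - r) if a == b else r
    freqs[genotype] = p

print(len(freqs), sum(freqs.values()))
```

With 20-cM intervals the recombination fraction between adjacent markers is about 0.19, and the 64 genotype frequencies sum to 1.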

In our simulation experiments, we use different sampling strategies for marker information (one-stage vs. two-stage hierarchical design), different levels of heritability (H² = 0.2 and 0.6), and different sample sizes (M = 200 and 400 under the one-stage design). For the two-stage hierarchical design, different sampling strategies are designed on the basis of allocations of a given sample size between the backcross plants and their progeny (seeds). The sampling strategies used here, written as plants × seeds per plant, are (1) 400 × 1 (one seed is sampled from each of 400 backcross plants), (2) 40 × 10 (10 seeds are sampled from each of 40 backcross plants), (3) 20 × 20 (20 seeds are sampled from each of 20 backcross plants), and (4) 10 × 40 (40 seeds are sampled from each of 10 backcross plants). In addition to their possible effects on parameter estimation, these different strategies have different utilities in practice. For example, strategy 1 does not require genotyping many embryos per plant, but requires the maintenance of a large backcross population. In strategy 4, only a small backcross population is maintained, but many embryos per plant need to be genotyped.

Additive-dominant model:
Suppose there is a QTL located on the middle point of the linkage group (i.e., 50 cM away from each end of the group), which affects a quantitative endosperm trait. The additive and dominant effects of the QTL are hypothesized as . Given the overall mean µ = 10, the genotypic means of the four possible endosperm QTL genotypes are calculated using Equation 1. In the endosperm progeny of the backcross, the frequencies of these triploid QTL genotypes are 1/8 for QQQ, 1/8 for QQq, 1/8 for Qqq, and 5/8 for qqq. The genetic variance due to this QTL is 1.2. When the heritability H2 of an endosperm trait is given, the residual variance can be calculated, from which the residual effects (and therefore phenotypic values) of the individuals are simulated.
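The variance calculations in this paragraph can be sketched as follows. The sketch assumes the usual definition H² = V_G / (V_G + σ²_ε), which the text does not spell out, and the function names are hypothetical. The genetic variance of 1.2 and the genotype frequencies are taken from the text.

```python
def genetic_variance(means, freqs):
    """Variance of genotypic means weighted by genotype frequencies."""
    overall = sum(f * m for f, m in zip(freqs, means))
    return sum(f * (m - overall) ** 2 for f, m in zip(freqs, means))


def residual_variance(genetic_var: float, heritability: float) -> float:
    """Residual variance implied by H^2 = V_G / (V_G + V_E)."""
    return genetic_var * (1.0 - heritability) / heritability


# Frequencies of QQQ, QQq, Qqq, and qqq in the endosperm progeny of the backcross.
freqs = [1 / 8, 1 / 8, 1 / 8, 5 / 8]

# With no QTL effect all genotypic means are equal and the genetic variance is zero.
print(genetic_variance([10.0, 10.0, 10.0, 10.0], freqs))

# Residual variances implied by the stated genetic variance of 1.2.
for h2 in (0.6, 0.2):
    print(h2, residual_variance(1.2, h2))
```

Under this definition, a genetic variance of 1.2 implies residual variances of about 0.8 at H² = 0.6 and about 4.8 at H² = 0.2.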

The characterization of the threshold for declaring the existence of a QTL is a difficult issue. The permutation test proposed by CHURCHILL and DOERGE (1994) is regarded as a useful approach for calculating the threshold because it does not depend on the distribution of the test statistic. However, permutation tests are computationally expensive. We thus use chromosome-wide permutation tests to characterize the thresholds only under a sample size of 400 for the one-stage design and under a 40 × 10 strategy for the two-stage hierarchical design. A set of endosperm phenotypic values for the one-stage design is simulated using the residual variances that correspond to the heritability levels of 0.6 and 0.2, respectively, when a QTL is assumed. Similarly, for the two-stage hierarchical design, two simulation scenarios, with H² = 0.6 and 0.2, are designed. In each case, our model is used to estimate QTL parameters for endosperm inheritance and calculate the corresponding LR.

The distribution of the LR values over 1000 permutation replicates can be approximated by a χ² distribution. The 95th, 99th, and 99.9th percentiles of the distribution of the maximum LR are used as empirical critical values to declare the existence of a QTL on the linkage group at the significance levels α = 0.05, 0.01, and 0.001, respectively. Under the one-stage design, these percentiles are 10.2551, 13.1874, and 15.7615 for H² = 0.6 and 10.1746, 15.5654, and 23.3611 for H² = 0.2, respectively. These percentiles under the two-stage hierarchical design (40 × 10) are 12.6697, 15.9342, and 20.3486 for H² = 0.6 and 12.9303, 15.8864, and 18.7674 for H² = 0.2.

Epistatic model:
Suppose there are two QTL affecting a quantitative endosperm trait, the first located at the midpoint of the marker interval 2-3 and the second at the midpoint of the marker interval 4-5. The additive and dominant effects of each QTL and the epistatic interaction effects between the two QTL (additive × additive, additive × dominant, dominant × additive, and dominant × dominant) are set to hypothesized values. Given the overall mean µ = 10, the genotypic means of the 16 possible endosperm QTL genotypes at the two QTL are calculated using Equation 6. In the endosperm progeny of the backcross, the frequencies of the triploid QTL genotypes for each QTL are 1/8 for QQQ, 1/8 for QQq, 1/8 for Qqq, and 5/8 for qqq. Thus, when these two QTL are on different marker intervals, the frequencies of their 16 genotypes can be arrayed as (1/8, 1/8, 1/8, 5/8)^T ⊗ (1/8, 1/8, 1/8, 5/8)^T. When these two QTL are on the same marker interval, the frequencies of the QTL genotypes in the endosperm progeny of the backcross plants can be calculated on the basis of the joint probabilities of the QTL-marker genotypes. With the genotypic means and frequencies of the QTL genotypes, the genetic variances due to these two QTL can be calculated for each model.

We perform LR tests across a grid of locations on the chromosome to infer the most likely genome positions of two QTL. The declaration for the existence of QTL is based on a critical threshold for the LR test statistic that controls the chromosome-wise type I error rate. Permutation tests proposed by CHURCHILL and DOERGE (1994) are used to calculate the threshold values for each simulation scenario.
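A permutation threshold of this kind can be sketched generically. The sketch assumes that `statistic` is a user-supplied function returning the maximum test statistic over the chromosome for a given phenotype vector, with marker data held fixed. The toy statistic below is only a stand-in for the maximum LR, and all names are hypothetical.

```python
import math
import random


def permutation_threshold(phenotypes, statistic, n_perm=1000, alpha=0.05, seed=0):
    """Empirical (1 - alpha) percentile of a statistic under random shuffling of phenotypes."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    null_stats = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the marker-phenotype association
        null_stats.append(statistic(shuffled))
    null_stats.sort()
    index = min(n_perm - 1, int(math.ceil((1.0 - alpha) * n_perm)) - 1)
    return null_stats[index]


# Toy data: one marker with two classes and phenotypes with no true effect.
data_rng = random.Random(42)
marker = [i % 2 for i in range(100)]
phenotypes = [data_rng.gauss(10.0, 1.0) for _ in marker]


def toy_statistic(y):
    """Squared difference of class means; a stand-in for the maximum LR over the chromosome."""
    class1 = [v for v, m in zip(y, marker) if m == 1]
    class0 = [v for v, m in zip(y, marker) if m == 0]
    diff = sum(class1) / len(class1) - sum(class0) / len(class0)
    return diff * diff


t05 = permutation_threshold(phenotypes, toy_statistic, alpha=0.05)
t01 = permutation_threshold(phenotypes, toy_statistic, alpha=0.01)
print(t05, t01)
```

Because each replicate requires refitting the model along the whole chromosome, the cost grows linearly with the number of permutations, which is why thresholds were computed only for selected scenarios.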


*  RESULTS

Additive-dominant model: One-stage design:
The position and effects of the hypothesized QTL can be reasonably well estimated under this design, but, as expected, with better accuracy and precision for a larger than a smaller sample size and for a higher rather than lower heritability level (Table 3). The mean-squared error (MSE) for the position MLEs among 100 replications of simulation is very high (97.52) at M = 200 and H2 = 0.2, suggesting that the QTL position cannot be precisely estimated in this case. But such a high sampling error can be reduced by a factor of 3 when M is increased to 400 or by a factor of 8 when H2 is increased to 0.6. This information also can be seen in Fig 1, in which a more precise localization of the QTL displays a narrower peak for the profile of the log-likelihood ratio test statistics across the length of the linkage group.



 
Figure 1. The profiles of the log-likelihood-ratio test statistics calculated as a function of genome position on the simulated linkage group for different sample sizes and heritabilities under the one-stage design. The dotted and solid curves refer to heritabilities of 0.2 and 0.6, respectively. The vertical dotted line refers to the true position of a hypothesized QTL on the linkage group.


 

 
Table 3. The MLEs of QTL parameters and their MSEs (in parentheses) for different heritabilities (H2) and sample sizes (M) under the one-stage design based on 100 repeated simulations

The MLEs of the genotypic means (µ's) can be well estimated, although a larger heritability and larger sample size will lead to better estimates (Table 3). These estimates are used to solve for one additive effect (a) and two dominant effects (d1 and d2) using the linear equation given in Appendix B. The MLEs of the additive and dominant effects display different accuracy and precision. Whereas the MLE of the additive effect has acceptable precision at M = 200 and H2 = 0.2, those of the two dominant effects are highly upward biased and imprecise. Moreover, the dominant effect (d2) due to the intralocus interaction of Q and qq appears to be not only overestimated more seriously, but also has lower estimation precision than the dominant effect (d1) due to the intralocus interaction of QQ and q. For example, at M = 200 and H2 = 0.2, d2 is overestimated by onefold, whereas d1 is overestimated by ~30%. When the heritability of an endosperm trait and/or the sample size used is increased, the precision of the dominant effect estimation improves, but to a lesser extent than does the precision of the additive effect estimation.

In general, the one-stage design that ignores marker segregation within backcross families can be used to estimate the position and effect of a QTL on the endosperm. But acceptable estimation precision for dominant effects under the one-stage design requires a high heritability level of an endosperm trait and a large sample size.

Two-stage hierarchical design:

The precision of the MLEs for the QTL position and effects improves significantly under the two-stage hierarchical design (Table 4) compared to the one-stage design (Table 3). Under the two-stage hierarchical design, the QTL can be localized with high mapping resolution when H2 is high (0.6; Fig 2). For the two-stage hierarchical design, different sampling strategies have different precision of parameter estimation. The estimation precision is best under strategy 40 x 10 when H2 is higher, but the precision is best under strategy 400 x 1 when H2 is lower. For the same design, the precision of parameter estimation differs between different types of parameters. A general trend is that the MLEs of the dominant effects have lower precision than those of the additive effect and the QTL position. But compared to the one-stage design, the estimates of the dominant effects are much less biased for the two-stage hierarchical design (Table 4).



Figure 2. The profiles of the log-likelihood-ratio test statistics calculated as a function of genome position on the simulated linkage group for different sampling designs and heritabilities under the two-stage hierarchical design. The dotted and solid curves refer to heritabilities of 0.2 and 0.6, respectively. The vertical dotted line refers to the true position of a hypothesized QTL on the linkage group.


 
Table 4. The MLEs of QTL parameters and their MSEs (in parentheses) for different heritabilities (H2) and different sampling designs under the two-stage hierarchical design based on 100 repeated simulations

In summary, the two-stage hierarchical design is better than the one-stage design in terms of the accuracy and precision of the QTL parameter estimates. It is particularly advantageous in estimating the dominant effects of an endosperm trait.

Epistatic model: One-stage design:
The accuracy and precision of the estimates of the hypothesized QTL locations and effects, as well as the power to detect a significant QTL, depend on the magnitude of the QTL effect, the nature of the effect, the level of heritability, and the sample size (Table 5). A QTL of larger effect can be mapped to a genomic location better than one of smaller effect. Fig 3, A and B, illustrates the landscapes of the log-likelihood-ratio test statistics as a function of the locations of the two QTL for the one-stage design. The maximum of the landscape is closer to the true positions of the two QTL when an endosperm trait has a higher (Fig 3A) rather than a lower heritability (Fig 3B).




Figure 3. The landscapes of log-likelihood-ratio test statistics as a function of genome positions of two QTL affecting an endosperm trait of heritabilities 0.2 (A) and 0.6 (B) under the one-stage design. τ1 and τ2 denote the genome positions of the two epistatic QTL, respectively. The thick bottom lines indicate true QTL positions.


 
Table 5. The MLEs of QTL parameters and mean square errors (MSE) of the estimates among 100 replicated simulations at different heritability (H2) levels and sample sizes (M) under the one-stage design

For a small sample size (200) and a low-heritability trait (0.20), the additive effect of a large QTL and its additive x additive effect with other QTL can be well estimated. The estimation of the additive effect of a small QTL displays significantly increased accuracy and precision when the sample size and/or the heritability is increased. The dominant effects among different alleles at the same QTL cannot be well estimated for a small sample size and a low-heritability endosperm trait (Table 5). Both the accuracy and precision of the estimates of dominant effects increase with increased sample size and heritability. The estimation precision for dominant effects seems to improve more with an increased level of heritability than with an increased sample size. It is difficult to precisely estimate the additive x dominant, dominant x additive, and dominant x dominant epistatic effects of the two QTL for an endosperm trait of low heritability. The estimates of these parameters do not improve much even when the sample size is increased from 200 to 400.

In general, the one-stage design that ignores marker segregation within backcross families can be used to estimate the position and the additive and additive x additive effects of a large QTL on the endosperm. But it is difficult to achieve acceptable estimation precision for the dominant effects and for the additive x dominant, dominant x additive, and dominant x dominant epistatic effects under the one-stage design. It appears that a high heritability level is more important in improving the estimates of these parameters than an increase in sample size.

Two-stage hierarchical design:

When an endosperm trait is under low genetic control (H2 = 0.2), the two-stage hierarchical design, which captures both between- and within-family variation, is not advantageous in the estimation precision of two epistatic QTL over the one-stage design, which captures only between-family variation (Table 6). This is in sharp contrast to the result of the one-QTL-fitting model in which the two-stage hierarchical design can provide more precise estimation of QTL parameters than the one-stage design (Table 3 and Table 4).


 
Table 6. The MLEs of QTL parameters and mean square errors (MSE) of the estimates among 100 replicated simulations at different heritability levels (H2) and sampling designs under the two-stage hierarchical design

The two-stage hierarchical design displays striking advantages in estimating the epistasis of QTL for an endosperm trait of high heritability (H2 = 0.6; Table 5). In this case, the positions of the two epistatic QTL, including the one with a smaller effect, can be precisely mapped, as seen from the reduced MSEs. For example, the MSE of the position of the smaller QTL is 42.33 under the one-stage design (Table 5), whereas it is reduced to 9.44–22.05 for the same sample size under the two-stage hierarchical design (Table 6). Fig 4, A–C, illustrates the landscapes of the log-likelihood-ratio test statistics of the two putative QTL with epistasis calculated from one simulation run for the different sampling strategies. The peaks of the LR landscapes are close to the true positions of the two QTL, suggesting that their positions can be well estimated.




Figure 4. The landscapes of log-likelihood-ratio test statistics as a function of genome positions of two QTL affecting an endosperm trait of heritabilities 0.2 (A) and 0.6 (B) under a 40 x 10 sampling design of the two-stage hierarchical design. τ1 and τ2 denote the genome positions of the two epistatic QTL, respectively. The thick bottom lines indicate true QTL positions.

Aside from precise estimation of additive and additive x additive (iaa) effects, the precision of the estimates of different dominant effects and additive x dominant (jad), dominant x additive (kda), and dominant x dominant effects (ldd) can be improved for a high heritability endosperm trait from the two-stage hierarchical design (Table 6). It appears that the MLEs of the additive effects, iaa, jad, and kda, are more precise than those of the dominant effects and ldd among all different sampling strategies. However, different sampling strategies display different estimation precision of the MLEs of the dominant effects and ldd. For example, strategy 10 x 40 can provide more precise estimates of the dominant effects of one Q over two q's (d21 and d22), whereas more precise estimates of the dominant effects of two Q's over one q (d11 and d12) can be provided by strategy 400 x 1 (Table 6). Relatively speaking, strategy 10 x 40 can provide a better estimate of ldd than other designs.

In summary, the two-stage hierarchical design is better for the estimation of epistatic QTL than the one-stage design when an endosperm trait has a high heritability. Different sampling designs have different effects on the estimation of dominant effects and dominant x dominant epistatic effects. Depending on the nature of experimental material and breeders' purposes, these sampling designs can be selectively used.


*  DISCUSSION

The genetic improvement of grain quality in crop plants has now become a major focus in many plant breeding programs (SADIMANTARA et al. 1997; MAZUR et al. 1999; TAN et al. 1999; VAN DER MEER et al. 2001; WANG et al. 2001). Compared to yield improvement, however, quality improvement will be much more challenging because traits affecting grain quality are endosperm specific, and the endosperm, being a triploid tissue, has complex trisomic inheritance. Also, since the endosperm is one generation ahead of its mother plant, it is difficult to predict the segregation patterns of the endosperm genes on the basis of marker information from the maternal parent. These have been two major obstacles to improving grain quality through a marker-assisted selection strategy. In this article, we employ traditional quantitative genetic principles and powerful statistical technologies to develop a new theoretical method for mapping QTL underlying endosperm traits in autogamous plants.

Unlike traditional diploid mapping, genetic mapping of the triploid endosperm requires estimation of a large number of genetic parameters because there are more copies of alleles at each locus. Statistically, an increased number of unknown parameters can create numerous problems in estimation, such as larger biases, larger sampling errors, and lower power. We compare two alternative experimental designs in their capacity to minimize the problems due to an increased number of unknowns in our mapping model. The first, the one-stage design, draws marker information only from the current generation of plants, and the second, the two-stage hierarchical design, draws marker information from both the current generation and the self-fertilized progeny that the plants generate. In practice, the one-stage design is less expensive because no marker information for the progeny of the experimental plants is needed. However, theoretical simulations indicate that the QTL parameters, especially dominant effects, can be seriously overestimated under the one-stage design when the heritability of an endosperm trait and/or the sample size is not large. The two-stage hierarchical design, which extracts marker information from two successive generations, can improve the accuracy and precision of the QTL parameter estimates, including those of dominant effects. This is especially remarkable when an endosperm trait has a low heritability and/or the experiment is based on a limited number of backcross progeny. For the two-stage hierarchical design, different allocations of a given number of samples between the backcross plants and their autogamous progeny also affect the parameter estimation.

The endosperm used as a mapping tissue has several theoretical advantages in its own right. For example, since the endosperm is triploid and contains extra gene copies, a number of hypotheses regarding the role of gene interactions within or between loci in plant development, adaptation, and evolution can be addressed using the endosperm as a model study material. Moreover, the endosperm is formed by the fusing of two polar nuclei with a sperm nucleus, which makes it an ideal material to study maternal effects and their interaction effects with nuclear genes on plant behavior. Our mapping model can be readily extended to include maternal effects. Also, our limited knowledge about the genetic mechanisms underlying endosperm traits makes it practically impossible to understand the evolutionary significance of double fertilization in higher plants (FRIEDMAN 1990, 1998), which produces the endosperm. The genetic mapping of endosperm traits provides a powerful means for addressing this fundamental question in plant evolution.

In this article, we extend a one-QTL analysis to include QTL with epistasis in the control of an endosperm trait. The role of epistasis in trait control, evolution, and breeding has been reconciled recently thanks to the use of powerful molecular markers that can effectively dissect a complex trait into its individual QTL components (DOEBLEY et al. 1995; LARK et al. 1995; LUKENS and DOEBLEY 1999; CHEVERUD 2000; KIM and RIESEBERG 2001). However, because epistasis results from the dependence of different genes activated in a physiological process or biochemical pathway (PHILLIPS 1998), the precise estimation of epistatic effects on quantitative variation is always difficult. Many statistical methods for QTL mapping in the current literature therefore assume no epistasis, or they are simply based on a two-way analysis of variance specifying the interaction effect of a given pair of markers (see the references listed above). A few methodologies with power to detect epistasis include KAO et al.'s (1999) multiple-interval mapping, DU and HOESCHELE's (2000) finite locus model, and JANNINK and JANSEN's (2001) one-dimensional genome search. These methods allow for the determination of epistasis between different QTL or between QTL and polygenic background. Our method can be specifically used to map epistatic QTL affecting the expression of complex traits on triploid endosperm derived from double fertilization in flowering plants.

The idea described in our mapping model can be extended to map pleiotropic QTL affecting both grain yield and grain quality, although this will be a challenging statistical issue. Traits for grain yield, e.g., seed number and seed weight, are located on the diploid tissues of backcross plants. As a consequence, the QTL for grain yield should be modeled in the mode of disomic inheritance. However, endosperm traits for grain quality undergo trisomic inheritance and, thus, they should be modeled differently. A statistical framework should be built to model the pleiotropic effect of a QTL on diploid tissues and triploid endosperm. Such a framework will permit plant breeders to design a more efficient breeding program for selecting superior genotypes with high yield and high quality.

In this study, we report our results in a backcross population for an autogamous plant system. The application of this model in an F2 design is not difficult, except for the segregation of more marker genotypes in both the current and progeny generations. For an autogamous plant, the eggs and two polar nuclei cells are self fertilized so that the frequencies of male gamete genotypes are identical to those of female gamete genotypes. But in an allogamous plant, such as maize, each female gamete from each mother plant will be pollinated by all possible male gametes from the pollen pool. This difference should be considered when the current model is used to study the genetics of the allogamous endosperm. All of these issues deserve in-depth investigations.


*  ACKNOWLEDGMENTS

We thank three anonymous reviewers for their helpful comments on earlier versions of this manuscript. This work is partially supported by a grant from National Science Foundation (DMS9971586) to C.G., an Outstanding Young Investigators Award of the National Science Foundation of China (30128017), and a University of Florida Research Opportunity Fund (02050259) to R.W. The publication of this manuscript is approved as Journal Series no. R-08586 by the Florida Agricultural Experiment Station.

Manuscript received April 12, 2002; Accepted for publication June 27, 2002.


*  APPENDIX A

DERIVATION OF CONDITIONAL PROBABILITIES
We describe the procedures for deriving the conditional probabilities of endosperm QTL genotypes upon marker genotypes of the current backcross (one-stage design) and marker genotypes of the backcross and its progeny for an autogamous species (two-stage hierarchical design). Suppose two inbred lines P1 and P2 have respective genotypes $M_{\eta}M_{\eta}QQM_{\eta+1}M_{\eta+1}$ and $m_{\eta}m_{\eta}qqm_{\eta+1}m_{\eta+1}$ at two flanking markers η and η+1 and the QTL they bracket. Denote the recombination fraction between the two markers by r and those between η and the QTL as well as the QTL and η+1 by r1 and r2. The F1 hybrid of the two lines has genotype $M_{\eta}m_{\eta}QqM_{\eta+1}m_{\eta+1}$, which produces a total of eight joint marker-QTL gamete genotypes (assuming generation t), denoted by G, with the corresponding frequencies expressed as

Crossing the F1 with the homozygous P2 generates the B1 backcross generation t, in which there are eight joint marker-QTL diploid genotypes, each corresponding to a joint gamete genotype above. These joint diploid genotypes can be categorized into four groups of marker genotypes (denoted by Z),

For a traditional backcross design, the recombination fraction between the two markers is estimated on the basis of the numbers of these different marker genotypes in the population. The subsequent QTL analysis is performed by associating the marker genotypes with phenotypic values measured on diploid organs of the backcross progeny. However, when mapping endosperm traits, we must consider how the embryo and endosperm are formed through reproductive behavior. Whereas the embryo is formed due to the fusing of one haploid egg and one sperm cell, the formation of the endosperm results from the fusing of two central cells, whose genotype (denoted by GG) is the duplication of the egg, and one sperm cell. Since there are different levels of heterozygosity, each of the eight joint backcross genotypes above produces different compositions of the egg genotypes and polar nuclei genotypes. The first backcross diploid genotype derived from the first gamete genotype G(t)111 of the F1 produces eight egg genotypes (and therefore polar nuclei genotypes), whereas those from the rest of the gamete genotypes produce four (from G(t)101, G(t)110, and G(t)011), two (from G(t)100, G(t)001, and G(t)010), and one egg genotype (from G(t)000). For an autogamous plant, the genotypes and frequencies of sperms it produces are identical to those of eggs. Because the one-stage and two-stage hierarchical designs use different marker information, we derive the conditional probabilities of the endosperm QTL genotypes separately for these two models.

One-stage design:
In this model, we use only marker genotypes from the backcross plants, without considering the marker information from the progeny of the backcross. Thus, our interest here is how to generate endosperm QTL genotypes given a particular marker genotype of the backcross. The endosperm QTL genotypes (generation t + 1) produced by the first backcross genotype for an autogamous plant are the product of the array of the polar nuclei genotypes and the array of the sperm genotypes,

which results in four endosperm QTL genotypes QQQ, QQq, Qqq, and qqq, each with a frequency of 1/4.

The second backcross genotype G(t)101 with the same marker genotype Z(t)11 produces one polar nuclei QTL genotype (qq) and one sperm QTL genotype (q), which thus results in only one single endosperm QTL genotype qqq. The conditional probabilities of the endosperm QTL genotypes upon the diploid marker genotypes Z(t)11 of the backcross can be calculated as

where crossover interference is ignored.

Similarly, we can derive the conditional probabilities of the endosperm QTL genotypes upon the other three diploid marker genotypes in the backcross (see Table 1).
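The one-stage derivation above can be sketched numerically. The following Python fragment is only an illustration under the stated assumptions (no crossover interference, backcross to P2, selfing of the backcross plant); the function names and the 0/1 coding of alleles are hypothetical and are not taken from the article, and the output has not been checked against Table 1.

```python
from fractions import Fraction

def f1_gamete_freqs(r1, r2):
    """Frequencies of the eight F1 gametes, keyed (marker eta, QTL, marker eta+1).

    1 = allele inherited from P1 (M or Q), 0 = allele from P2 (m or q).
    Assumes no crossover interference and 0 < r1, r2 < 1/2.
    """
    freqs = {}
    for a in (0, 1):
        for q in (0, 1):
            for b in (0, 1):
                p1 = r1 if a != q else 1 - r1  # interval marker eta .. QTL
                p2 = r2 if q != b else 1 - r2  # interval QTL .. marker eta+1
                freqs[(a, q, b)] = p1 * p2 / 2
    return freqs

def one_stage_probs(r1, r2):
    """P(endosperm QTL genotype | backcross marker genotype) for all four marker groups."""
    gam = f1_gamete_freqs(r1, r2)
    table = {}
    for a in (0, 1):
        for b in (0, 1):
            total = gam[(a, 1, b)] + gam[(a, 0, b)]
            p_het = gam[(a, 1, b)] / total  # backcross plant is Qq at the QTL
            p_hom = gam[(a, 0, b)] / total  # backcross plant is qq at the QTL
            # A selfed Qq plant gives QQQ, QQq, Qqq, qqq with frequency 1/4 each;
            # a selfed qq plant gives only qqq.
            quarter = p_het / 4
            table[(a, b)] = {"QQQ": quarter, "QQq": quarter,
                             "Qqq": quarter, "qqq": quarter + p_hom}
    return table

if __name__ == "__main__":
    for z, row in one_stage_probs(Fraction(1, 10), Fraction(1, 10)).items():
        print(z, row)
```

Exact fractions are used so that each row can be checked to sum to one; ordinary floats work equally well.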

Two-stage hierarchical design:
When the marker information from both the backcross and its progeny is considered simultaneously, we should see the segregation of both markers and QTL in the progeny of the backcross. Under the two-stage hierarchical design, the endosperm genotypes (in generation t + 1) produced by the first backcross genotype for an autogamous plant are the product of the array of eight polar nuclei genotypes (denoted by GG) and the array of eight sperm genotypes,

which results in nine triploid endosperm marker genotypes, each containing four QTL genotypes QQQ(Q3), QQq(Q2), Qqq(Q1), and qqq(Q0). In the same manner, nine diploid embryo marker genotypes are formed with frequencies

By isolating DNA from embryos, their marker genotypes can be characterized. The second diploid backcross genotype derived from the F1's second gamete genotype G(t)101 has the same marker genotype Z(t)11 as the first diploid backcross genotype. The second backcross genotype produces four polar nuclei genotypes and four sperm genotypes, which are arrayed as

which also results in nine triploid endosperm marker genotypes, but each containing only a single QTL genotype qqq because of no allele Q in this backcross genotype.

The same endosperm genotypes derived from the first and second backcross genotype will be summed up because of their same marker genotype Z(t)11. With similar analyses we can sum up the same endosperm genotypes for the third and fourth, fifth and sixth, and seventh and eighth backcross genotypes.

The conditional probabilities of the endosperm QTL genotypes, conditional upon diploid marker genotypes of the backcross (t) and its autogamous progeny (t + 1) under the two-stage hierarchical design can be derived according to Bayes' theorem. For example, the conditional probability of endosperm QTL genotypes QQQ, QQq, Qqq, and qqq, conditional upon a diploid marker genotype Z111 in the two successive generations can be calculated as

Similarly, we can derive the conditional probabilities of the endosperm QTL genotypes, conditional upon other diploid marker genotypes at the backcross and its autogamous progeny (see Table 2).
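The Bayes calculation for the two-stage hierarchical design can likewise be sketched by brute-force enumeration of eggs and sperm. This is a minimal illustration, assuming no crossover interference, identical egg and sperm distributions (autogamy), and polar nuclei that duplicate the egg genotype, as described above. The function names, the 0/1 allele coding, and the coding of the embryo marker genotype as counts of M alleles are hypothetical choices, and the results have not been checked against Table 2.

```python
from fractions import Fraction
from itertools import product

def meiosis(hap1, hap2, r1, r2):
    """Gamete distribution of a plant carrying haplotypes hap1/hap2 (no interference)."""
    haps = (hap1, hap2)
    out = {}
    for start, x1, x2 in product((0, 1), repeat=3):
        s1 = start ^ x1  # strand copied at the QTL
        s2 = s1 ^ x2     # strand copied at marker eta+1
        gamete = (haps[start][0], haps[s1][1], haps[s2][2])
        p = Fraction(1, 2) * (r1 if x1 else 1 - r1) * (r2 if x2 else 1 - r2)
        out[gamete] = out.get(gamete, 0) + p
    return out

def two_stage_probs(r1, r2, z_backcross, z_embryo):
    """P(endosperm carries k Q alleles | backcross markers, embryo markers), k = 0..3.

    z_backcross: (a, b), marker alleles received from the F1 (1 = M, 0 = m).
    z_embryo:    (n_eta, n_eta1), number of M alleles at each marker in the embryo.
    Returns probabilities in the order [qqq, Qqq, QQq, QQQ].
    """
    p2_hap = (0, 0, 0)
    joint = [0, 0, 0, 0]
    for f1_hap, p_bc in meiosis((1, 1, 1), p2_hap, r1, r2).items():
        if (f1_hap[0], f1_hap[2]) != z_backcross:
            continue
        gametes = meiosis(f1_hap, p2_hap, r1, r2)
        for (egg, p_egg), (sperm, p_sperm) in product(gametes.items(), repeat=2):
            embryo = (egg[0] + sperm[0], egg[2] + sperm[2])
            if embryo == z_embryo:
                # Two polar nuclei duplicate the egg; one sperm is added.
                joint[2 * egg[1] + sperm[1]] += p_bc * p_egg * p_sperm
    total = sum(joint)
    if total == 0:
        raise ValueError("marker genotypes are incompatible")
    return [p / total for p in joint]

if __name__ == "__main__":
    print(two_stage_probs(Fraction(1, 10), Fraction(1, 10), (1, 1), (2, 2)))
```

Summing the joint terms over the backcross genotypes that share a marker genotype mirrors the pooling of the first and second (and subsequent pairs of) backcross genotypes described above.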


*  APPENDIX B

ESTIMATORS OF MODEL PARAMETERS IN THE M STEP
Additive-dominant model:
The MLEs of the unknown QTL parameters are obtained by differentiating the likelihood with respect to each unknown, setting the derivatives equal to zero, and solving the log-likelihood equations. The EM algorithm is used to obtain the MLE of the genetic mean of each endosperm genotype at a putative QTL bracketed by a marker interval.

The expressions of the log-likelihood equations for estimating genotypic means and residual variance in the M step are given as

for the one-stage design and the two-stage hierarchical design, respectively, where Pik and Pijk are the posterior probabilities of the QTL genotypes under the two different designs, respectively (see Equation 5). The MLE of the QTL position is obtained by treating the recombination fraction between the QTL and one marker (r1 or r2) as a fixed parameter.
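Because the displayed M-step equations are not reproduced here, the following Python sketch shows only the standard form such an update takes for a normal mixture whose mixing proportions are the marker-based conditional probabilities. It is an assumption that the article's estimators have this form; the function name `em_step` and its arguments are hypothetical, and the QTL position is treated as fixed, as in the text.

```python
import numpy as np

def em_step(y, prior, mu, sigma2):
    """One EM iteration for a normal mixture with known marker-based priors.

    y      : (n,) phenotypes
    prior  : (n, K) conditional probabilities of the K QTL genotypes given markers
    mu     : (K,) current genotypic means
    sigma2 : current residual variance
    """
    # E step: posterior probability of each QTL genotype for each observation.
    # The normal constant 1/sqrt(2*pi*sigma2) cancels in the normalization.
    dens = np.exp(-(y[:, None] - mu[None, :]) ** 2 / (2.0 * sigma2))
    post = prior * dens
    post /= post.sum(axis=1, keepdims=True)
    # M step: posterior-weighted genotypic means and pooled residual variance.
    mu_new = (post * y[:, None]).sum(axis=0) / post.sum(axis=0)
    sigma2_new = (post * (y[:, None] - mu_new[None, :]) ** 2).sum() / len(y)
    return post, mu_new, sigma2_new

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.concatenate([rng.normal(0.0, 1.0, 50), rng.normal(3.0, 1.0, 50)])
    prior = np.full((100, 2), 0.5)
    mu, sigma2 = np.array([-1.0, 1.0]), 2.0
    for _ in range(50):
        _, mu, sigma2 = em_step(y, prior, mu, sigma2)
    print(mu, sigma2)
```

In a genome scan, this iteration would be repeated to convergence at each candidate QTL position, with `prior` recomputed from the recombination fractions implied by that position.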

After the genotypic means (m) are estimated, a linear transformation is used to estimate the QTL effect parameters (e). We have

where

and

On the basis of invariance property of maximum-likelihood estimates, the estimate of e is the MLE because the estimate of m is the MLE.
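The linear transformation from genotypic means to effect parameters can be illustrated as follows. The design matrix in this sketch is an assumption: it uses a common triploid parameterization (QQQ = µ + 3a/2, QQq = µ + a/2 + d1, Qqq = µ - a/2 + d2, qqq = µ - 3a/2) and is not copied from the article, whose matrix is not reproduced here. The variance line applies the same expression, $A^{-1}(A^{-1})^{T}\sigma^{2}$, that the epistatic model uses for its sampling variances.

```python
import numpy as np

# Assumed design matrix (hypothetical, see lead-in).
# Rows: QQQ, QQq, Qqq, qqq; columns: mu, a, d1, d2.
A = np.array([
    [1.0,  1.5, 0.0, 0.0],
    [1.0,  0.5, 1.0, 0.0],
    [1.0, -0.5, 0.0, 1.0],
    [1.0, -1.5, 0.0, 0.0],
])

def effects_from_means(m_hat, sigma2_hat):
    """Solve m = A e for e and return the diagonal of A^-1 (A^-1)^T sigma^2."""
    A_inv = np.linalg.inv(A)
    e_hat = A_inv @ m_hat
    var_e = np.diag(A_inv @ A_inv.T) * sigma2_hat
    return e_hat, var_e

if __name__ == "__main__":
    m_hat = np.array([13.0, 11.5, 8.7, 7.0])  # illustrative genotypic means
    e_hat, var_e = effects_from_means(m_hat, 1.0)
    print(dict(zip(["mu", "a", "d1", "d2"], e_hat)))
    print(var_e)
```

Under this assumed matrix the dominant effects have larger variance multipliers than the additive effect, which is in line with the lower precision of the dominant-effect estimates reported in the simulations.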

Epistatic model:
The two-QTL quantitative genetic model for a triploid endosperm trait can be expressed in matrix form,

where $m = (\mu_{k_1 k_2})_{16 \times 1}$ is the vector of the genotypic means of the 16 QTL genotypes (k1, k2 = 0, ..., 3), which can be estimated in the M step (a similar M step was described above for the additive-dominant model); e = (µ, a1, a2, d11, d21, d12, d22, iaa, j1ad, j2ad, k1da, k2da, l11dd, l12dd, l21dd, l22dd)^T is the vector of the unknown QTL effects specified in the quantitative inheritance of the endosperm; and A is the design matrix relating the genotypic means to the QTL effects. Based on quantitative genetic models of the triploid endosperm (GALE 1976; MO 1987; BOGYO et al. 1988; POONI et al. 1992), we have

Then, the MLE of e is obtained as $\hat{e} = A^{-1}\hat{m}$. The sampling variance of $\hat{e}$ is calculated by $A^{-1}(A^{-1})^{T}\sigma^{2}$, whose elements on the diagonal are (5/16, 1/9, 1/4, 121/144, 121/144, 3/4, 1, 4/81, 28/81, 28/81, 157/324, 157/324, 775/81, 40/3, 775/81, 40/3). Using the design matrix A, the additive and additive x additive effects can be estimated precisely, with the sampling variances of the MLEs being a small proportion (1/4 ~ 4/81) of the estimated residual variance. Compared to the additive and additive x additive effects, the precision of the MLEs of the different dominant effects and of the additive x dominant and dominant x additive effects is reduced. The MLEs of the different dominant x dominant effects have the lowest precision; their sampling variances are enlarged relative to the residual variance (775/81 ~ 40/3).

To reduce the sampling variances of the MLEs of the QTL effect parameters, especially the dominant x dominant effects, we use