Abstract
Two major aspects have made the genetic and genomic study of polyploids extremely difficult. First, increased allelic or nonallelic combinations due to multiple alleles result in complex gene actions and interactions for quantitative trait loci (QTL) in polyploids. Second, meiotic configurations in polyploids undergo a complex biological process including either bivalent or multivalent formation, or both. For bivalent polyploids, different degrees of preferential chromosome pairings may occur during meiosis. In this article, we develop a maximum-likelihood-based model for mapping QTL in tetraploids by considering the quantitative inheritance and meiotic mechanism of bivalent polyploids. This bivalent polyploid model is implemented with the EM algorithm to simultaneously estimate QTL position, QTL effects, and QTL-marker linkage phases by incorporating the impact of a cytological parameter determining bivalent chromosome pairings (the preferential pairing factor). Simulation studies are performed to investigate the performance and robustness of our statistical method for parameter estimation. The implication and extension of the bivalent polyploid model are discussed.
POLYPLOIDS represent a group of plant species that are of great importance to evolutionary studies and plant breeding (Zeven 1979; Bever and Felber 1992; Hilu 1993; Ramsey and Schemske 1998; Ott and Whitton 2000; Soltis and Soltis 2000). The genetic study of polyploids intrigued earlier pioneering geneticists (Haldane 1930; Mather 1935, 1936; Fisher 1947), who developed a series of theoretical models to study segregation and linkage in autotetraploids. Unfortunately, these seminal models have seen limited practical use, partly because the genetic information they require could not be obtained with ease. The advent of molecular marker technologies has led to a resurgence of interest in the genetic analysis of polyploids (Leitch and Bennett 1997). Much theoretical and empirical emphasis has been placed on marker inheritance and segregation and on the construction of genetic linkage maps in polyploids (Wu et al. 1992; Da Silva et al. 1995; Grivet et al. 1996; Hackett et al. 1998; Ming et al. 1998; Brouwer and Osborn 1999; Ripol et al. 1999; Fjellstrom et al. 2001; Hoarau et al. 2001; Luo et al. 2001; Rajapakse et al. 2001; R. Wu et al. 2001, 2002a; S. Wu et al. 2001).
A significant gap that still remains in the current genetic study of polyploids is a serious lack of powerful statistical methods for mapping quantitative trait loci (QTL) on the basis of the genetic map of polymorphic markers. We know of only three articles that deal with the development of QTL-mapping methodologies (Doerge and Craig 2000; Xie and Xu 2000; Hackett et al. 2001). Considering the availability of marker and phenotype data in a variety of polyploid species ranging from tetraploids to octoploids (Yu and Pauls 1993; Grivet et al. 1996; Meyer et al. 1998; Ming et al. 1998; Brouwer and Osborn 1999; Fjellstrom et al. 2001; Hoarau et al. 2001; Rajapakse et al. 2001), this is a small number. One of these three articles did not use the appropriate biological process of meiosis in polyploids and its application is thus questionable (as noted by Hackett 2001). The other two articles were also based on limiting assumptions. Doerge and Craig assumed a completely preferential chromosome pairing mechanism for meiotic configurations and, therefore, their method is appropriate only for extreme allopolyploids, in which chromosome pairings occur strictly between homologs. On the other hand, Hackett et al. treated bivalent pairings as a random event that occurs only when all chromosomes in the set are homologous (extreme autopolyploids). From a quantitative genetic perspective, none of the three articles has provided adequate estimates of the allelic effects and the dominant effects of different within-locus interaction levels for a putative QTL in polyploids. A major contribution of Hackett et al. (2001) is the implementation of Kempthorne's (1957) partitioning theory within a QTL-mapping framework to estimate additive and dominant effects of genes in polyploids. However, they did not explicitly show how the dominance effects were estimated from their model.
In this article, we have developed a new maximum-likelihood-based statistical infrastructure for mapping QTL in polyploids undergoing bivalent formation during meiosis. Beyond the existing statistical methods, our method integrates quantitative genetic knowledge about gene action and interaction and cytological mechanisms of chromosome pairing to gain better insights into the structure, organization, and function of polyploid genomes. It is observed that for many polyploids there is a higher probability of pairing between more similar chromosomes than between less similar chromosomes (Hickok 1978; Sybenga 1988, 1994, 1995; Allendorf and Danzmann 1997). By implementing powerful expectation-maximization (EM) algorithms, our method can provide simultaneous estimation of QTL position, QTL effects, linkage phase configuration, and cytological parameters. Moreover, results from our method will have potential implications for understanding the genetic architecture of a complex trait and evolutionary relatedness in polyploids. We present extensive simulation studies to investigate the statistical properties of our method built upon bivalent chromosome pairings.
MATHEMATICAL MODEL FOR LINKAGE ANALYSIS
Meiotic pairing: Consider a bivalent tetraploid, in which there are four sets of chromosomes. If chromosomes 1 and 2 are genetically more identical, as are chromosomes 3 and 4, there are three different combinations for the bivalent chromosome pairing. One of the three pairs is between more identical chromosomes 1 and 2 as well as 3 and 4 (
Tetraploid model for three-point linkage analysis: Linkage analysis in most diploid organisms is based on inbred line crosses, such as a backcross or F2. However, for many other species including polyploids, inbred lines are not available and, thus, their linkage analysis should be based on a full-sib family derived from outbred parental lines. In such a full-sib family, numerous cross types of genes can be possible. To simplify our description of linkage analysis in polyploids, we first consider fully informative markers between the two parents. Our mapping model can be readily generalized to consider arbitrary polyploid cross types composed of any type of partially informative markers.
Suppose there is a full-sib family of size n derived from two heterozygous tetraploid parents P and Q. Consider two fully informative markers
(1)
where lines indicate the individual chromosomes on which the QTL is bracketed by the two markers and ⊗ is the Kronecker product. The specific linkage phase combination of parents P and Q, which is not known a priori, must be inferred from these possibilities for correct QTL mapping on the basis of marker and phenotype observations. In general, the linkage phase of the two markers is known before QTL mapping. Thus, we need to determine only the most likely linkage phase combination from 24 × 24 = 576 possibilities of the QTL relative to its two flanking markers.
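The count of 576 follows from assigning four distinct QTL alleles to four chromosomes in each parent. A minimal sketch that enumerates these assignments is given below; the allele labels and the function name are placeholders chosen for illustration.

```python
from itertools import permutations, product

# Hypothetical allele labels; each tuple position stands for one chromosome.
P_ALLELES = ("P1", "P2", "P3", "P4")
Q_ALLELES = ("Q1", "Q2", "Q3", "Q4")

def phase_combinations():
    """Return every (parent P phase, parent Q phase) pair of QTL allele orderings."""
    p_phases = list(permutations(P_ALLELES))  # 4! = 24 orderings for parent P
    q_phases = list(permutations(Q_ALLELES))  # 4! = 24 orderings for parent Q
    return list(product(p_phases, q_phases))

combos = phase_combinations()
print(len(combos))  # 576
```

Each element of `combos` is one candidate linkage phase combination under which the likelihood would have to be evaluated.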
Apart from the effect of different linkage phases on gamete formation frequencies, as is the case in diploid organisms (Wu et al. 2002b), different chromosome pairings (
(2)
where double lines are used to distinguish the two sets of paired chromosomes. For one parent, each of these three different bivalent pairings produces four diploid gamete types at a single locus. When the gametes are mixed from these pairings, a total of six gamete types will be produced for a locus. Thus, under bivalent pairings, parent P generates 36 diploid gametes at the two markers, whose genotypes are arrayed by
The probabilities of these marker gametes,
Table 1. Joint probabilities of marker-QTL gamete genotypes for two fully informative markers under an assumed linkage phase as given in expression (1) for parent P
Similarly, for parent Q, we can write the array of the two-marker gamete genotypes,
With the information of the two parents, we can express the arrays of zygote genotypes for the markers and the QTL, respectively, as
The conditional probabilities of the QTL zygote genotypes upon the marker zygote genotypes can be derived as
STATISTICAL METHOD FOR QTL MAPPING
The mixture model: A fundamental statistical model for QTL mapping is the mixture model (Lander and Botstein 1989). In such a mixture model, each observation y is assumed to have arisen from one of n (n possibly unknown but finite) components, each component being modeled by a density from the parametric family f,
For the mixture model used in genetic mapping, each component represents a class of QTL genotypes and, thus, the mixture model provides a framework by which observations may be clustered together into different classes of QTL genotypes. The mixture proportions represent the relative frequency of occurrence of each QTL genotype in the population. For a particular two-marker genotype,
Linear model of a quantitative trait: The mixture components in the mixture model of Equation 6 follow a normal distribution, with the mean equal to the expected genotypic value (μu1u2v1v2) of a QTL genotype and the variance equal to the residual variance (σ2) within the QTL genotype. The phenotype of a quantitative trait observed for individual i can be described by a linear model,
Because some of the main and interaction effects are not independent, a parameterization process based on effect partitioning is needed to obtain a smaller number of estimable independent parameters (appendix a). After this, estimable parameters include 6 for the main effects, 13 for the diallelic interactions (2 for interactions between alleles from parent P, 2 for parent Q, and 9 for interactions between alleles from different parents), 12 triallelic interactions, and 4 tetraallelic interactions (see also Hackett et al. 2001). These 35 independent effect parameters, plus the overall mean, are denoted by the vector a.
We also used orthogonal polynomials to parameterize the main and interaction effects into linear contrasts, quadratic contrasts, and, if any, cubic contrasts (C.-X. Ma and R. L. Wu, unpublished results). Yet, we do not report the results from this parameterization approach here because of space limitation.
Computational algorithm: A maximum-likelihood approach is used to fit a single QTL affecting a quantitative trait in tetraploids. The likelihood of the phenotypes (y) for n offspring in a full-sib family of two outcrossing tetraploids is expressed as
As seen from above, the total number of QTL effects equals the number of the QTL genotypes in bivalent tetraploids. This permits us to estimate the overall mean and QTL effect parameters from the estimated values
The characterization of linkage phase: Above, we have derived a statistical procedure for estimating the recombination fraction and the preferential pairing factor in polyploids when their chromosome pairings at meiosis follow the bivalent model. The procedure assumes the linkage phase combination of the two markers and QTL as indicated by display (1). However, this represents only one of the 576 possible combinations for the two phase-known flanking markers and the QTL. Optimal estimates of all parameters should be based on a most likely linkage phase combination. Different linkage phases of the QTL relative to its flanking markers can be assigned on the basis of the permutation of four QTL alleles on four different chromosomes for each parent. A most likely linkage phase combination should correspond to the largest likelihood value calculated from Equation 9.
However, a new question arises about the comparisons of the likelihood values among different phase combinations. If we change different linkage phases, we may obtain different estimates for a QTL effect parameter, but we will obtain the same likelihood value. We therefore should pose constraints on allelic effects of the two parents to obtain comparable likelihood values. In fact, the occurrence of a particular linkage phase implies that alleles should be different for both loci under consideration. A total of 576 phase combinations between the QTL and its flanking fully informative markers are based on the condition that four QTL alleles are different for each parent. The direct description of such differences can be provided by allelic effects. Thus, we can pose the inequality constraints of three allelic effects from each parent. Without loss of generality, such constraints can be taken as
Hypothesis tests: After the optimal estimates for the linkage and linkage phase are obtained on the basis of the largest likelihood value, we test for the significance of linkage by calculating the likelihood-ratio test (LRT) statistic,
and
stand for the MLEs for unknown parameters under the full model (at least one element in a is not equal to zero) and reduced model (a = 0), respectively. By formulating similar reduced models, we can also test for the significance of additive effects or dominance effects at different interaction levels.
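For orientation, a likelihood-ratio statistic of this kind is conventionally written as below. The symbols are notational assumptions made here and may differ from those of Equation 12: \hat{\Omega} denotes the MLEs under the full model and \tilde{\Omega} those under the reduced model.

```latex
\mathrm{LR} = -2 \ln \frac{L(\tilde{\Omega} \mid y)}{L(\hat{\Omega} \mid y)}
            = 2\left[\ln L(\hat{\Omega} \mid y) - \ln L(\tilde{\Omega} \mid y)\right]
```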
As in diploid mapping, simulation studies can be used to determine critical threshold values. We can declare the existence of a significant QTL located between two markers
RESULTS
Simulation studies are performed to examine the statistical behavior of our bivalent polyploid model. We first focus our simulation to quantify the effects of trait heritability and sample size on the estimation of QTL parameters and of the bivalent chromosome pairing parameter. Then, we compare the differences of parameter estimates between our method and Doerge and Craig's (2000) method, in which completely preferential bivalent chromosome pairings are assumed, and Hackett et al.'s (2001) method, in which random chromosome pairings are assumed.
Experimental design: Two outcrossing tetraploid parents are simulated for two fully informative markers and a QTL with an assumed linkage phase configuration shown in display (1). The recombination fractions between the two markers and between the first marker and the QTL are given as 0.20 and 0.10, respectively. The preferential pairing factor p = 0.30 is assumed. These two parents are crossed to generate a full-sib family of 200, 400, and 800 offspring. Given a sample size, the observations of each of 36 × 36 = 1296 offspring genotypes at these two markers are simulated on the basis of their respective frequencies (Equation 4).
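The two recombination fractions given above imply a value for the interval between the QTL and the second marker. A worked example follows, assuming no crossover interference; this assumption is made here for illustration only. Here r is the marker-marker recombination fraction, r_1 the fraction between the first marker and the QTL, and r_2 the fraction between the QTL and the second marker.

```latex
r = r_1 + r_2 - 2 r_1 r_2
\quad\Longrightarrow\quad
r_2 = \frac{r - r_1}{1 - 2 r_1} = \frac{0.20 - 0.10}{1 - 2(0.10)} = 0.125
```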
The numbers of offspring within each marker genotype carrying each of 36 QTL genotypes are simulated on the basis of the conditional probability matrices of Equation 5. Because of the QTL effects, offspring with different QTL genotypes will be different for a quantitative trait. The genotypic values of the offspring carrying different QTL genotypes are calculated on the basis of their structures, as given in D–1a (appendix a), using the hypothesized values of the overall mean and 35 effects in the vector a (Table 2). The variance among these genotypic values is the genetic variance explained by this QTL. The phenotypic values of the offspring are calculated as an overall mean of 10, plus the genotypic values and the residual effects distributed as N(0, σ2). Different σ2 values are assigned by assuming different heritability levels 0.20 and 0.40. The heritability is defined as the proportion of the genetic variance to the total phenotypic variance.
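The link between heritability and the residual variance described above can be sketched as follows. The function names and the toy genotypic values are illustrative assumptions, not the values hypothesized in Table 2.

```python
import random

def genetic_variance(values, freqs):
    """Variance of genotypic values weighted by genotype frequencies."""
    mean = sum(f * v for v, f in zip(values, freqs))
    return sum(f * (v - mean) ** 2 for v, f in zip(values, freqs))

def residual_variance(var_g, h2):
    """Solve H2 = Vg / (Vg + sigma2) for sigma2."""
    return var_g * (1.0 - h2) / h2

def simulate_phenotype(genotypic_value, sigma2, overall_mean=10.0, rng=random):
    """Overall mean plus genotypic value plus an N(0, sigma2) residual."""
    return overall_mean + genotypic_value + rng.gauss(0.0, sigma2 ** 0.5)

# Toy genotypic values with equal frequencies (illustrative only).
toy_values = [-1.0, 0.0, 1.0]
toy_freqs = [1 / 3, 1 / 3, 1 / 3]
vg = genetic_variance(toy_values, toy_freqs)
for h2 in (0.20, 0.40):
    print(h2, residual_variance(vg, h2))
```

For a fixed genetic variance, halving the heritability more than doubles the residual variance, which is why the low-heritability scenarios are the harder ones in the simulations.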
For the simulated marker and phenotypic data, we use the bivalent polyploid model to estimate unknown parameters contained in the vector Θ and further obtain the MLEs of Ω using a procedure described in appendix a. By permuting the arrangements of four QTL alleles among the four chromosomes for each parent, we obtain the MLEs of Ω with the constraints, as given in displays (10) and (11), under a total of 576 linkage phase combinations. The phase combination that has the largest likelihood value is regarded as the most likely one, under which the MLEs of Ω are given in Table 2. The simulations are repeated 100 times to calculate the means and standard errors of the MLEs from our model.
The effects of trait heritability and sample size: Using the computational algorithms described in appendix b, we obtain the MLEs of Θ. The recombination fraction between the first marker and the QTL can be accurately estimated for different sample sizes (n) and heritability (H2) levels considered, although its estimation precision increases with sample sizes and heritability levels. The estimate of residual variance (σ2) is considerably downward biased, especially for a trait with low heritability, if the sample size used is <400.
The real genotypic values of the 36 QTL genotypes are determined from a = D–1m (see appendix a). The EM algorithm provides accurate estimates for these genotypic values, even when sample size or heritability is low (results not shown). If the genotypic values can be well estimated, the QTL gene effects (a) can also be well estimated because, according to our parameterization, the sampling variances of â will be reduced relative to those of m̂ [see the structure of D–1(D–1)T in appendix a]. It is shown that the estimators of additive effects of alleles for each parent have only one-sixteenth of the sampling variance of the estimated residual variances. The estimates of dominant effects vary depending upon the type and degree of interactions. If dominant effects are derived from two alleles of the same parent, their estimators will be even more precise than those of the allelic effects. The estimators of dominant effects derived from two alleles of different parents have the lowest precision, whose sampling variances are
Table 2. MLEs of allelic action and interaction effects for a QTL bracketed by two flanking markers under different sample sizes (n) and heritability (H2) levels
As expected, the allelic (or additive) effects can be estimated both more accurately and more precisely than the dominant effects, and the dominant effects of lower-order interactions can be estimated more precisely than the dominant effects of higher-order interactions (Table 2). It is interesting to note that the diallelic dominance effects between two alleles from the same parent can be estimated better than those between two alleles from different parents.
Table 3. The probabilities of detecting a correct linkage phase combination (Pr1), multiple linkage phase combinations including the correct one (Pr2), and an incorrect linkage phase combination (Pr3) from a total of 576 possible phase combinations between two phase-known fully informative markers and a putative QTL for two tetraploid plants
For all kinds of gene effects in bivalent tetraploids, the estimation accuracy and precision increase with sample size and heritability (Table 2). In general, a sample size of 200 can provide reasonably precise estimates of the allelic additive effects for a quantitative trait with a heritability of 0.20. The estimation precision can be improved markedly if n is increased to 400 or if the trait has a higher H2 level. There is not much improvement if n is further increased from 400 to 800, even for a less heritable trait.
For the diallelic dominance effects between two alleles from the same parent, it seems that for a lower heritability (0.20) a sample size of at least 400 is needed to achieve reasonable estimation precision, whereas for a heritability of at least 0.40 a smaller sample size (200) may be adequate, compared to the magnitudes of the actual values of these effects that are hypothesized (Table 2). For the diallelic dominance effects between two alleles from different parents, reasonable estimates need a sample size of at least 400 for a trait with a heritability of at least 0.40. In general, it is difficult to estimate triallelic dominance effects unless the sample size is extremely large (say 800). To obtain reasonable estimates for tetraallelic effects, an extremely large sample size should accompany a highly heritable quantitative trait (see Table 2).
The estimates of all parameters listed in Table 2 were based on an optimal linkage phase combination selected from all possibilities in terms of the estimated likelihood values. The probabilities of detecting a correct linkage phase combination were estimated for different sample sizes and heritability levels (Table 3). When n = 200 and H2 = 0.20, the probability of detecting a correct linkage phase combination is only about one-third. The other probabilities are about one-quarter for detecting two linkage phase combinations and about one-half for detecting an incorrect linkage phase combination. When the sample size or heritability is doubled, the probability of detecting an incorrect linkage phase combination is reduced. If a sample size of 400 is used for a trait of H2 = 0.40, no incorrect linkage phase combination will be detected.
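The bookkeeping behind the three detection probabilities can be sketched as follows. This is an illustration under assumptions: the input is taken to be a mapping from phase labels to maximized log-likelihoods for each simulated data set, and the tie tolerance `tol` is a hypothetical choice, not a value taken from this study.

```python
def top_phases(loglik_by_phase, tol=1e-8):
    """Return all phase labels whose log-likelihood is within tol of the maximum."""
    best = max(loglik_by_phase.values())
    return {ph for ph, ll in loglik_by_phase.items() if best - ll <= tol}

def classify_replicate(loglik_by_phase, true_phase, tol=1e-8):
    """Classify one simulated data set as 'Pr1', 'Pr2', or 'Pr3'."""
    selected = top_phases(loglik_by_phase, tol)
    if selected == {true_phase}:
        return "Pr1"  # only the correct phase combination is selected
    if true_phase in selected:
        return "Pr2"  # several combinations tie, including the correct one
    return "Pr3"      # the correct combination is not among those selected

def tally(replicates, true_phase):
    """Proportion of replicates falling into each category."""
    counts = {"Pr1": 0, "Pr2": 0, "Pr3": 0}
    for loglik_by_phase in replicates:
        counts[classify_replicate(loglik_by_phase, true_phase)] += 1
    return {k: v / len(replicates) for k, v in counts.items()}
```

With 100 replicates per scenario, `tally` would return the three proportions for one combination of sample size and heritability.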
The likelihood-ratio test (LRT) statistics of Equation 12 were used to test for the significance of QTL effects under different sample sizes and heritability levels. Except for a few cases where n = 200 and H2 = 0.20, the QTL can be detected at a significance level of P = 0.05 in all 100 repeated simulations. The critical threshold value was calculated by simulating data sets with QTL effects set to zero and examining the distribution of the LRT (see also Hackett et al. 2001). Using the 95% point of the distribution of the LRT gives a test of significance at a 5% level for the presence of a QTL.
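The threshold calculation described above amounts to taking an empirical quantile of LRT values obtained under the null hypothesis. A minimal sketch follows; the function names and the quantile convention used are assumptions made for illustration.

```python
import math

def empirical_threshold(null_lrts, alpha=0.05):
    """Return the (1 - alpha) empirical quantile of LRT values simulated with no QTL."""
    ordered = sorted(null_lrts)
    # Smallest ordered value with at least a (1 - alpha) share of values at or below it.
    idx = math.ceil((1.0 - alpha) * len(ordered)) - 1
    return ordered[max(idx, 0)]

def is_significant(observed_lrt, null_lrts, alpha=0.05):
    """Declare a QTL when the observed LRT exceeds the empirical threshold."""
    return observed_lrt > empirical_threshold(null_lrts, alpha)
```

Here `null_lrts` would hold one LRT value per data set simulated with all QTL effects set to zero.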
The effects of completely preferential pairings and random pairings: Doerge and Craig (2000) assumed that chromosomes pair strictly between homologs during polyploid meiosis. If this assumption is true, we will have only one bivalent pairing pattern, as opposed to three patterns when incompletely preferential pairings are considered [see expression (2)]. Thus, under this assumption there will be only 16 gamete genotypes at two informative markers and 4 gamete genotypes at one QTL for each parent. Such a (16 × 4) matrix of conditional probabilities with the completely preferential pairing assumption represents an allopolyploid model and is a subset of the (36 × 6) matrix used in our method.
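The gamete counts quoted above can be checked by direct enumeration. The sketch below labels the four chromosomes 1 to 4 and treats the pairing of 1 with 2 and 3 with 4 as the completely preferential pattern; the function names are illustrative.

```python
from itertools import product

# The three ways to split chromosomes 1-4 into two bivalents.
PAIRINGS = [((1, 2), (3, 4)), ((1, 3), (2, 4)), ((1, 4), (2, 3))]

def locus_gametes(pairing):
    """Gamete types at one locus: one chromosome's allele from each bivalent."""
    (a, b), (c, d) = pairing
    return {tuple(sorted(g)) for g in product((a, b), (c, d))}

def two_locus_gametes(pairing):
    """Gamete genotypes at two loci. Recombination within a bivalent lets the
    second locus carry the allele of either paired chromosome."""
    one = locus_gametes(pairing)
    return set(product(one, one))

general_one = set().union(*(locus_gametes(p) for p in PAIRINGS))
general_two = set().union(*(two_locus_gametes(p) for p in PAIRINGS))
allo_one = locus_gametes(PAIRINGS[0])
allo_two = two_locus_gametes(PAIRINGS[0])
print(len(allo_one), len(allo_two), len(general_one), len(general_two))  # 4 16 6 36
```

The first two numbers correspond to the completely preferential case (4 QTL gametes, 16 two-marker gametes) and the last two to the general bivalent case (6 and 36).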
Hackett et al.'s assumption of random bivalent pairings (the autopolyploid model) leads to the same structure of the conditional probability matrix that we have in our bivalent polyploid model. Because our model covers the allo- and autopolyploid model, it can be regarded as the general polyploid model. Here, we make a comparison between Hackett et al.'s method and our method by first looking at the conditional probability matrix derived for the general polyploid model listed in Table 1. From the table it is found that only the conditional probabilities of QTL genotypes of 12 bold-faced marker genotypes contain p and the conditional probabilities of the rest of the 24 marker genotypes do not contain p because p is canceled out. This means that p may have a relatively small influence on the conditional probability matrix and therefore on parameter estimates under the general polyploid model when two markers considered are fully informative. In other words, for fully informative markers, results from the autopolyploid model will be similar to those from the general polyploid model. A small simulation study has confirmed this inference (results not shown).
However, for partially informative markers (R. Wu et al. 2001), some of the QTL genotypes will be collapsed into one so that the corresponding joint genotypic probabilities will be summed up. For example, for a single-dose restriction fragment (simplex) (Pppp), six QTL gamete genotypes (P1P2, P1P3, P1P4, P2P3, P2P4, and P3P4) will be reduced to two (Pp and pp) with each summed from three gamete genotypes. Similar reductions are also true for two flanking simplex markers. In this case, p would not be canceled out in the conditional probability matrix and, therefore, will play an important role in affecting the estimates of QTL position and effect parameters.
DISCUSSION
The development of statistical methods for mapping QTL in polyploids is one of the most difficult tasks in genetic and genomic study. Although quite a few studies of linkage analysis have used polymorphic markers in polyploids (Hackett et al. 1998; Ripol et al. 1999; Luo et al. 2001; R. Wu et al. 2001, 2002a; S. Wu et al. 2001), we know of only three articles published about the statistical developments of QTL mapping in this recalcitrant group of species (Doerge and Craig 2000; Xie and Xu 2000; Hackett et al. 2001), with one, unfortunately, based on an improper biological process of polyploid meiosis (as noted by Hackett 2001). The other two articles require simplifying assumptions, which are not likely to hold in real life. Doerge and Craig's (2000) method can be appropriate only for extreme allopolyploids, in which chromosome pairings occur strictly between two homologs. On the contrary, the assumption used in Hackett et al. (2001) is random bivalent pairings during meiosis and, thus, that method can fit only extreme bivalent autopolyploids having identical chromosomes in the set.
In this article we report on the development of a novel statistical methodology for QTL mapping in bivalent polyploids that represent an important group of polyploids including alfalfa, potato, and wheat. Using extensive simulations, we examined the robustness and performance of this bivalent polyploid method in estimating QTL effects, QTL position, and QTL linkage phase relative to known-phase markers under different sample sizes and heritability levels. We also compared the results from our method and the current methods on the basis of the allo- and autopolyploid model. Our method has four significant improvements over these current statistical methods for QTL mapping in bivalent polyploids. First, our method incorporates a general bivalent pairing mechanism of meiotic configuration by defining a cytological parameter called the preferential pairing factor. The preferential pairing factor (p) is defined as the propensity of bivalent pairings between more similar rather than less similar chromosomes (Sybenga 1988, 1994, 1995, 1996). Different values of this parameter, ranging from 0 to ⅔, describe different degrees of relatedness between the chromosomes in the set. When p = 0, chromosomes pair randomly and our method is automatically reduced to the autopolyploid model. When p = ⅔, only identical chromosomes can pair and our method is reduced to the allopolyploid model. Our method therefore represents a general model for QTL mapping in bivalent polyploids. It can, in particular, be applied to those polyploids whose chromosome origins (auto- vs. allopolyploids) are unknown a priori. According to a recent review by Soltis and Soltis (2000), such polyploids of unknown origin commonly occur in nature. On the basis of the estimate of p, we will be in a better position to study the origin and relatedness of the genomes contained in a polyploid (Sybenga 1996).
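The two limiting cases of p can be checked numerically for a single locus. The sketch below assumes a particular parameterization, namely that the pairing between the more similar chromosomes (1 with 2, 3 with 4) has probability 1/3 + p and each of the other two pairings has probability 1/3 - p/2. This form is an assumption chosen to match the stated range of 0 to 2/3, and the function names are illustrative.

```python
from fractions import Fraction

# The three ways to split chromosomes 1-4 into two bivalents; the first is preferred.
PAIRINGS = [((1, 2), (3, 4)), ((1, 3), (2, 4)), ((1, 4), (2, 3))]

def pairing_probs(p):
    """Assumed form: preferred pairing 1/3 + p, the other two 1/3 - p/2 each."""
    p = Fraction(p)
    third = Fraction(1, 3)
    return [third + p, third - p / 2, third - p / 2]

def gamete_freqs(p):
    """Single-locus diploid gamete frequencies as a function of p."""
    freqs = {}
    for ((a, b), (c, d)), prob in zip(PAIRINGS, pairing_probs(p)):
        for x in (a, b):
            for y in (c, d):
                g = tuple(sorted((x, y)))
                # Each bivalent passes on either chromosome with probability 1/2.
                freqs[g] = freqs.get(g, 0) + prob / 4
    return freqs

for p in (Fraction(0), Fraction(3, 10), Fraction(2, 3)):
    print(p, gamete_freqs(p))
```

Under this assumed form, p = 0 gives all six gametes a frequency of 1/6, while p = 2/3 removes the gametes (1, 2) and (3, 4) and leaves the remaining four at 1/4 each.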
The second improvement of our method is a thorough exploration of QTL action and interaction effects on phenotypes in polyploids. As with diploids, the inheritance mode of QTL in polyploids can be additive or dominant. But compared with diploids, these gene actions and interactions are much more complicated because of an increased number of alleles and allele combinations. Kempthorne (1957) extended the diploid theory of quantitative genetics to partition genetic effects of a QTL into additive and dominant components of different within-locus interaction levels in polyploids. For a cross of two bivalent tetraploids, each having four different alleles at a QTL, we are confronted with 8 allelic or additive effects (4 from each parent), 28 diallelic dominant interaction effects, 48 triallelic dominant interaction effects, and 36 tetraallelic interaction effects. Because these 120 parameters are not completely independent, their dependence needs to be removed to obtain estimable parameters. We used a parameterization process to reduce these parameters to 36 independent ones. Such a reduced space of unknown parameters was also embedded in Hackett et al.'s (2001) QTL-mapping framework, but those authors have not provided a tractable estimation of all these components. In fact, it is impossible to obtain accurate and precise estimates of these 36 independent parameters on the basis of a sample size we can have in practice, using a traditional treatment for QTL mapping in diploids.
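The total of 120 effect terms can be recovered by enumeration when the alleles of both parents are counted (four per parent). The sketch below uses placeholder allele labels and counts every allele subset that occurs within at least one of the 36 offspring QTL genotypes.

```python
from itertools import combinations, product

P = ("P1", "P2", "P3", "P4")  # hypothetical labels for the alleles of parent P
Q = ("Q1", "Q2", "Q3", "Q4")  # hypothetical labels for the alleles of parent Q

# The 36 zygote genotypes at the QTL: two alleles from P and two from Q.
genotypes = [pp + qq for pp, qq in product(combinations(P, 2), combinations(Q, 2))]

def effect_terms(order):
    """Distinct allele subsets of the given size that occur within some genotype."""
    return {subset for g in genotypes for subset in combinations(g, order)}

counts = {k: len(effect_terms(k)) for k in (1, 2, 3, 4)}
print(len(genotypes), counts, sum(counts.values()))
# 36 {1: 8, 2: 28, 3: 48, 4: 36} 120
```

The orders 1 to 4 correspond to allelic, diallelic, triallelic, and tetraallelic terms, respectively.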
The efficient estimation of these 36 quantitative genetic parameters in tetraploid mapping, therefore, offers the third improvement of our method over the current methods. In this article, we incorporate the EM algorithm (Lander and Botstein 1989; Meng and Rubin 1993) and techniques of experimental design to estimate QTL effects at different levels. In a statistical mixture model for QTL mapping, the EM algorithm can provide robust estimates for the expected means of QTL genotypes. This advantage is combined with a parameterization process to provide robust estimates of QTL effects that constitute the QTL genotypic means. Through a parameterization process, the sampling variance of the estimator of each QTL effect is only a small portion of the sampling variance of the estimated residual variance (see appendix a). Also, the influences of the estimator of one QTL effect by other effects are limited within the QTL effects of similar nature [see the structure of D–1(D–1)T]. These two favorable properties of the parameterized QTL effects assure the estimation precision of QTL actions and interactions, as demonstrated in the simulation study.
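The way the EM algorithm estimates genotypic means in a mixture model can be illustrated with a deliberately simplified sketch. It assumes that the prior probabilities of the QTL genotypes given the markers are known and fixed, uses two genotype classes instead of 36, and does not estimate the recombination fraction or the preferential pairing factor. All names and numerical values are illustrative, so this is not the full algorithm of appendix b.

```python
import math
import random

def normal_pdf(y, mean, var):
    """Density of N(mean, var) at y."""
    return math.exp(-((y - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_mixture(y, prior, n_iter=200):
    """EM for genotype means and a common variance.
    prior[i][j] is the known prior probability that individual i has genotype j."""
    n, k = len(y), len(prior[0])
    grand = sum(y) / n
    var = sum((v - grand) ** 2 for v in y) / n
    sd = math.sqrt(var)
    means = [grand + sd * (j - (k - 1) / 2.0) / k for j in range(k)]  # spread starts
    for _ in range(n_iter):
        # E step: posterior probability of each genotype for each individual.
        post = []
        for yi, pi in zip(y, prior):
            w = [pij * normal_pdf(yi, m, var) for pij, m in zip(pi, means)]
            total = sum(w)
            post.append([wj / total for wj in w])
        # M step: posterior-weighted means, then the common residual variance.
        for j in range(k):
            wsum = sum(row[j] for row in post)
            if wsum > 0.0:
                means[j] = sum(row[j] * yi for row, yi in zip(post, y)) / wsum
        var = sum(row[j] * (yi - means[j]) ** 2
                  for row, yi in zip(post, y) for j in range(k)) / n
    return means, var

def simulate(n, true_means, true_var, priors, seed=1):
    """Draw a marker class (prior vector), then a genotype, then a phenotype."""
    rng = random.Random(seed)
    y, prior = [], []
    for _ in range(n):
        pi = rng.choice(priors)
        j = rng.choices(range(len(true_means)), weights=pi)[0]
        y.append(rng.gauss(true_means[j], math.sqrt(true_var)))
        prior.append(pi)
    return y, prior

y, prior = simulate(400, [8.0, 12.0], 1.0, [[0.9, 0.1], [0.1, 0.9]])
means, var = em_mixture(y, prior)
print(means, var)
```

In the full model the estimated genotypic means would then be converted into gene effects through the parameterization of appendix a.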
The correct characterization of linkage phases is a prerequisite for genome mapping in species like polyploids, in which homozygous inbred lines cannot be obtained. In this article, we used a modified EM algorithm to simultaneously estimate linkage and linkage phases. Luo et al. (2001) determined a most likely linkage phase of different markers on the basis of the largest maximum log-likelihood ratio and the lowest estimate of the recombination fraction. However, when this procedure is used for characterizing a most likely linkage phase of a QTL relative to its flanking markers, there is a technical difficulty. When permuting different linkage phases, the same likelihood value will be detected, despite different estimates of QTL effects obtained, and thus gives no way of selecting a most likely linkage phase. In this article, the EM algorithm was performed by posing some constraints on additive effects of different QTL alleles. This modified approach greatly increases the probability of correctly selecting a most likely linkage phase, as shown in the simulation study. The more efficient characterization of linkage phase presents a fourth technical merit of our method.
Our method can be extended to several more general situations. First, we should consider statistical properties of using markers with lower degrees of informativeness to map QTL in polyploids. Hackett et al. (2001) found that less informative markers, e.g., dominant markers, displayed reduced power of QTL detection compared to fully informative markers. But it is unclear how much the estimation precision and power will be reduced for our bivalent polyploid model, which incorporates the preferential pairing factor, when partially informative markers are used. It appears that partially informative markers, e.g., single-, double-, or multidose restriction fragments, occupy a larger portion of polyploid genomes (da Silva et al. 1995; Brouwer et al. 2000; Ming et al. 2001). Second, in this study we assume that the recombination fractions and the preferential pairing factor are identical between the two parents. Because these two parameters are often population or sex specific (e.g., Allendorf and Danzmann 1997), we should study the effects of parent-dependent linkages and preferential pairing factors on the estimates of QTL parameters. Third, in the real world, many polyploids undergo multivalent formation. Our bivalent polyploid model cannot address the issues arising from multivalent formation, which leads to the typical genetic phenomenon of double reduction (Butruille and Boiteux 2000). Last but not least, our bivalent tetraploid model should be extended to study polyploids at a higher ploidy level. The model reported in this article represents a platform on which complicated problems related to polyploid mapping can be solved within our framework, integrating statistics, genetics, computer science, and cytology.
Acknowledgments
This work is partially supported by an Outstanding Young Investigator Award of the National Natural Science Foundation of China (30128017), a University of Florida Research Opportunity Fund (02050259), and a University of South Florida Biodefense grant (7222061-12) to R.W. The publication of this manuscript is approved as journal series no. R-08795 by the Florida Agricultural Experiment Station.
APPENDIX A: PARAMETERIZATION OF GENE EFFECTS
All the main and interaction effects in bivalent tetraploids should be parameterized to obtain a group of estimable parameters. In this article, the parameterization of these gene effects is based on different constraints posed on them. The constraints on the allelic (main) effects are expressed as
By these parameterization constraints, a total of 120 QTL effect parameters contained in the genotypic values Pu1Pu2Qv1Qv2 (1 ≤ u1 < u2 ≤ 4, 1 ≤ v1 < v2 ≤ 4) in Equation 8 can be reduced to 35 independent estimable parameters. Without loss of generality, 6 independent allelic (main) effect parameters are assigned as
The vector (m = (μu1u2v1v2)36×1, 1 ≤ u1 < u2 ≤ 4, 1 ≤ v1 < v2 ≤ 4) of the 36 QTL genotypic values can be expressed in terms of these assigned effect parameters. We have
and a is the vector of gene effects, expressed as
which has two desirable properties: (1) the elements on its diagonal are much smaller than one, ranging from
APPENDIX B: IMPLEMENTATION OF THE EM ALGORITHM
The parameter vector in which we are interested is denoted by Ω. However, direct estimation of this vector is not the most efficient approach from a computational standpoint. As explained in the text, we define a new vector Θ = (m, r1 or r2, σ2, p), which can be more easily estimated by implementing the EM algorithm (Dempster et al. 1977; Meng and Rubin 1993). The log-likelihood of the new vector is given by
The log-likelihood equations for the MLEs of m and σ2 are given as
Footnotes
- Communicating editor: M. A. Asmussen
- Received November 5, 2002.
- Accepted September 28, 2003.
- Copyright © 2004 by the Genetics Society of America