Genetics, Vol. 156, 855-865, October 2000, Copyright © 2000

On the Differences Between Maximum Likelihood and Regression Interval Mapping in the Analysis of Quantitative Trait Loci

Chen-Hung Kaoa
a Institute of Statistical Science, Academia Sinica, Taipei 11529, Taiwan, Republic of China

Corresponding author: Chen-Hung Kao

Communicating editor: Z-B. ZENG


*  ABSTRACT
*TOP
*ABSTRACT
*MAXIMUM-LIKELIHOOD INTERVAL...
*REGRESSION INTERVAL MAPPING
*DIFFERENCES BETWEEN ML AND...
*MULTIPLE-QTL MODEL
*SIMULATION STUDIES
*CONCLUSION AND DISCUSSION
*LITERATURE CITED

The differences between maximum-likelihood (ML) and regression (REG) interval mapping in the analysis of quantitative trait loci (QTL) are investigated analytically and numerically by simulation. The analytical investigation is based on the comparison of the solution sets of the ML and REG methods in the estimation of QTL parameters. Their differences are found to relate to the similarity between the conditional posterior and conditional probabilities of QTL genotypes and depend on several factors, such as the proportion of variance explained by QTL, relative QTL position in an interval, interval size, difference between the sizes of QTL, epistasis, and linkage between QTL. The differences in mean squared error (MSE) of the estimates, likelihood-ratio test (LRT) statistics in testing parameters, and power of QTL detection between the two methods become larger as (1) the proportion of variance explained by QTL becomes higher, (2) the QTL locations are positioned toward the middle of intervals, (3) the QTL are located in wider marker intervals, (4) epistasis between QTL is stronger, (5) the difference between QTL effects becomes larger, and (6) the positions of QTL get closer in QTL mapping. The REG method is biased in the estimation of the proportion of variance explained by QTL, and it may have a serious problem in detecting closely linked QTL when compared to the ML method. In general, the differences between the two methods may be minor, but can be significant when QTL interact or are closely linked. The ML method tends to be more powerful and to give estimates with smaller MSEs and larger LRT statistics. This implies that ML interval mapping can be more accurate, precise, and powerful than REG interval mapping. The REG method is faster in computation, especially when the number of QTL considered in the model is large. Recognizing the factors affecting the differences between REG and ML interval mapping can help an efficient strategy, using both methods in QTL mapping to be outlined.


SINCE LANDER and BOTSTEIN 1989 Down initiated an interval mapping method for systematically searching the entire genome for quantitative trait loci (QTL) using molecular genetic marker data, many efforts have been made to enhance the precision, accuracy, and power of QTL mapping. They include the extension of the statistical model from one-QTL to multiple-QTL (JANSEN 1993 Down; ZENG 1994 Down; KAO et al. 1999 Down), incorporation of random effects in the model (HOESCHELE and VANRADEN 1993A Down, HOESCHELE and VANRADEN 1993B Down), ease of computation (HALEY and KNOTT 1992 Down; MARTINEZ and CURNOW 1992 Down; XU 1998A Down, XU 1998B Down), generalization to different experimental designs (CARBONELL et al. 1992 Down; JIANG and ZENG 1997 Down; SONG et al. 1999 Down; ZENG et al. 2000 Down) and to multiple and categorical trait analyses (HACKETT and WELLER 1995 Down; JIANG and ZENG 1995; HENSHALL and GODDARD 1999 Down), the use of permutation tests, and Bayesian estimation (DOERGE and CHURCHILL 1996 Down; SATAGOPAN et al. 1996 Down; SILLANPAA and ARJAS 1999 Down) in QTL mapping.

The likelihood of the interval mapping model is generally a finite normal mixture (LANDER and BOTSTEIN 1989 Down; JANSEN 1993 Down; ZENG 1994 Down; KAO et al. 1999 Down). In the computation of the maximum-likelihood estimates (MLE) of the finite normal mixture model, the iterative expectation-maximization (EM) algorithm (DEMPSTER et al. 1977 Down) is broadly applicable as Newton-Raphson and Fisher's score methods may turn out to be complicated. When the number of QTL considered in the model increases, the numbers of mixture components and parameters in the likelihood increase dramatically. As a result, maximization of the likelihood through the EM algorithm could become difficult to obtain; moreover, when mapping the entire genome for QTL, the search needs to be performed at every position of the genome. Therefore, the ML estimation by the EM algorithm is often regarded to be complex in analysis and computationally expensive for QTL mapping (HALEY and KNOTT 1992 Down; SATAGOPAN et al. 1996 Down; XU 1998A Down, XU 1998B Down). In view of these difficulties, regression (REG) interval mapping, which regresses the quantitative trait value on the conditional expected genotypic value, was proposed to approximate ML interval mapping to save computation time at one or multiple genomic positions (HALEY and KNOTT 1992 Down; MARTINEZ and CURNOW 1992 Down). Although REG interval mapping lacks some attractive properties, such as consistency and asymptotic efficiency, as compared to ML interval mapping in statistical inference, and may suffer from the lack of interpretability in terms of the genetic model (HALEY and KNOTT 1992 Down; JANSEN 1993 Down), it is often claimed that the two approaches provide virtually similar or identical estimates and test statistics in QTL mapping (HALEY and KNOTT 1992 Down; XU 1998A Down, XU 1998B Down). As a consequence, the REG method has been widely accepted and applied to QTL mapping studies by many researchers (HALEY et al. 1994 Down; WHITTAKER et al. 1996 Down; XU 1996 Down, XU 1998A Down, XU 1998B Down; GOFFINET and MANGIN 1998 Down; LEBRETON et al. 1998 Down; DUPUIS and SIEGMUND 1999 Down; REBAI and GOFFINET 2000 Down).

Although REG may approximate ML interval mapping well in some cases as shown by HALEY and KNOTT 1992 Down and XU 1998A Down, XU 1998B Down, their differences in the estimation of QTL parameters could be significant for practical QTL mapping as shown in this article. Unfortunately, there are few attempts to investigate these differences in the literature. XU 1995 Down pointed out that the estimation of residual variance by REG interval mapping is biased. In this article, the differences between the two approaches in the estimation of and testing for QTL parameters due to several factors, such as heritability, size of interval, relative QTL position in an interval, the difference between QTL effects, epistasis, and linkage between QTL, are investigated both analytically and numerically by simulation. With the understanding of the factors affecting the differences between the two methods, a more efficient, precise, and powerful strategy using both methods can be explored in QTL mapping. The QTL mapping properties under these factors are also investigated and discussed.


*  MAXIMUM-LIKELIHOOD INTERVAL MAPPING
*TOP
*ABSTRACT
*MAXIMUM-LIKELIHOOD INTERVAL...
*REGRESSION INTERVAL MAPPING
*DIFFERENCES BETWEEN ML AND...
*MULTIPLE-QTL MODEL
*SIMULATION STUDIES
*CONCLUSION AND DISCUSSION
*LITERATURE CITED

The differences between the ML and REG interval mappings can be illustrated by investigating the differences between their estimators of mean, genetic effects, and residual variance. To simplify the explanation of their differences, a one-QTL model for a backcross population is first used as an example. Their differences under a multiple-QTL model are discussed later. The one-QTL ML mapping model can be written as

(1)

where yi is the quantitative trait value of individual i, µ is the mean, a is the effect of QTL Q, x*i, taking a value 1/2 (-1/2) for homozygote QQ (heterozygote Qq), denotes the genotype of Q, and {epsilon}i is the environmental deviation and is assumed to follow N(0, {sigma}2). Although the genotype of Q for an individual is usually unobserved and could be QQ or Qq, its distribution can be inferred from its flanking marker genotypes. Suppose the flanking markers are M and N. Then, there are four types of marker genotypes, type 1 MN/MN, type 2 MN/Mn, type 3 MN/mN, and type 4 MN/mn, as shown in Table 1. Given the four marker genotypes, the conditional probabilities for QTL genotypes QQ and Qq, denoted by pi1 and pi2, respectively, at a position between the markers can be calculated based on Haldane's mapping function (HALDANE 1919 Down), and they are listed in Table 1.


 
View this table:
In this window
In a new window

 
Table 1. Conditional probabilities of a putative QTL given flanking marker genotypes for a backcross population

Since the QTL genotype x*i could be homozygote (1/2) or heterozygote (-1/2) for an individual, the likelihood is then a normal mixture with mixing proportions equivalent to the conditional probabilities pi1 and pi2. For n individuals in the sample, the likelihood of the model in Equation 1 is

(2)

where {theta} denotes parameters (pij, µ, a, {sigma}2), Y and X denote the trait value and marker genotypes, Nij, {sigma}2) denotes the normal density function with mean µij and variance {sigma}2, and

are genotypic values of QQ and Qq. In estimation, this normal mixture model can be treated as an incomplete data problem (LITTLE and RUBIN 1987 Down), which regards the QTL genotype xi* as missing data and trait yi and markers Xi as observed data, and the EM algorithm can be implemented to maximize the likelihood and obtain the MLE.

Let the probability distribution of missing data x*i as

The conditional distribution of the observed data, yi and Xi, given the missing data xi* can be considered as an independent sample from a population such that yi|({theta}, Xi, xi*) ~ N + axi*, {sigma}2), and the EM algorithm can be used to obtain the MLE. At a given position, pij's can be determined. By the definition of the EM algorithm, the iteration of the EM-step for obtaining µ, a, and {sigma}2 proceeds as follows:

E-step:
The posterior probabilities of the QTL genotypes x*i's of the n individuals are updated as

and

for i = 1, 2, ... , n (see KAO and ZENG 1997 Down for the detailed procedure of derivation). Note that {pi}i1 for individual i is a function of pi1, pi2, yi, µ, a, and {sigma}2. It is important to clarify the relationship between the conditional probability pi1 and conditional posterior probability {pi}i1 of the QTL genotype in the comparison of ML and REG interval mappings. It is shown later that the more similarity between {pi}i1 and pi1 for each i, the better the approximation of REG to ML interval mapping. Note that pi1 = {pi}i1 if pi1 = 1 or pi1 = 0 (i.e., the QTL is located at the marker) or a = 0. If pi1 {approx} 1 or pi2 {approx} 0 or a {approx} 0, then {pi}i1 {approx} pi1.

M-step:
Find µ, a, and {sigma}2 to satisfy

(3)


(4)


(5)

where

and

are the conditional posterior expectations of x*i and x*i2 given yi and Xi, respectively. In each iteration, new estimates of µ, a, and {sigma}2 are obtained in the M-step. These new estimates are then used to obtain new {pi}i1's and {pi}i2's for the next iteration. The converged values in the iteration are the MLE. A disadvantage of the EM algorithm was that it did not provide the estimates of the covariance matrix of the MLE. However, this disadvantage can be easily removed by using appropriate methods, such as by LOUIS 1982 Down and MENG and RUBIN 1991 Down, associated with the EM algorithm. The null hypothesis, H0, a = 0, for the existence of a QTL is tested by a likelihood-ratio test (LRT), -2 loge(L0/L1), where L0 and L1 are the maximum likelihoods under H0 and no restriction. The larger the LRT statistic at a testing position, the more likely the existence of a QTL at that position.


*  REGRESSION INTERVAL MAPPING
*TOP
*ABSTRACT
*MAXIMUM-LIKELIHOOD INTERVAL...
*REGRESSION INTERVAL MAPPING
*DIFFERENCES BETWEEN ML AND...
*MULTIPLE-QTL MODEL
*SIMULATION STUDIES
*CONCLUSION AND DISCUSSION
*LITERATURE CITED

HALEY and KNOTT 1992 Down commented on ML interval mapping that the iterative procedure of the EM algorithm in obtaining the MLE can be complex and computationally slow to converge. Therefore, they developed REG interval mapping to approximate ML interval mapping for mapping QTL. They claimed that the REG method can ease the computation and produce very similar results as those obtained by the ML method. The one-QTL REG interval mapping model for a backcross population can be formulated as

(6)

where µ, a, and {epsilon}i have the same definitions as the model in Equation 1, and

is the conditional expectation of the QTL genotype given the flanking marker genotype. By treating wi as fixed, the model is a regression model, and this method is called REG interval mapping in QTL mapping. In estimation, both least-squares and maximum-likelihood techniques can be implemented to estimate µ, a, and {sigma}2 in Equation 6. Least-squares estimates (LSE) of µ and a are the solutions of

(7)


(8)

where

Note that the estimates of the regression model will fail if zi = 0 (pi1 = pi2 = 1|2) for every i. However, this situation will not occur because pi1 != pi2 for individuals with type 1 (MN/MN) and type 4 (MN/mn) flanking marker genotypes in the backcross population. The LSE of {sigma}2 is

(9)

where n - 2 is the degree of freedom for the residual sum of squares. The likelihood of the REG mapping model is a normal density

(10)

rather than a normal mixture density. The mixing proportion of pij's in the ML mapping likelihood (Equation 2) is blended into wi of Equation 10. If the maximum-likelihood principle is used in estimation, the MLE of µ and a for maximizing Equation 10 are the same as Equation 7 and Equation 8. The MLE of {sigma}2 has a divisor n instead of n - 2 in Equation 9.


*  DIFFERENCES BETWEEN ML AND REG INTERVAL MAPPING
*TOP
*ABSTRACT
*MAXIMUM-LIKELIHOOD INTERVAL...
*REGRESSION INTERVAL MAPPING
*DIFFERENCES BETWEEN ML AND...
*MULTIPLE-QTL MODEL
*SIMULATION STUDIES
*CONCLUSION AND DISCUSSION
*LITERATURE CITED

By comparing the solution sets between ML and REG interval mappings, it can be seen that the two solution sets have similar expressions, but different contents. In the REG method, the conditional expectations of QTL genotype, wi and zi, are used in estimation. In the ML method, the conditional posterior expectations of QTL genotype, wi* and zi*, play the same role in estimation. The conditional expectations consider only the conditional probabilities of QTL genotypes pi1's, and the conditional posterior expectations consider the posterior probabilities {pi}i1's. It can be seen that the posterior probability {pi}i1 also utilizes phenotypic information as well as marker information. Intuitively, the ML method can provide better estimates than the REG method because {pi}i1 is more informative than pi1. Analytically, the differences between ML and REG interval mapping in estimation will depend on the differences between the two kinds of expectations. The two kinds of expectations are equivalent if and only if {pi}i1 = pi1 and pi1 = 1 (or pi1 = 0) for each i (the QTL is located at a marker). How good the approximation of REG to ML interval mapping is depends on the similarity between pi1's and {pi}i1's. Investigating the factors affecting the similarity between {pi}i1 and pi1 can lead to identifying the differences between the two methods. These factors include (1) proportion of variance explained by a QTL (size of a QTL), (2) the relative QTL position within an interval, and (3) the size of the interval flanking the QTL.

Proportion of variance explained by a QTL:
If the proportion of variance explained by a QTL is small, the ratio of QTL effect a to the environmental deviation {sigma} (a/{sigma}) will be small. Consequently, the densities of normal mixture components are about the same for different genotypes, i.e.,

and pi1 {approx} {pi}i1. The extreme case is a = 0 (µi1 = µi2 = µ) and pi1 = {pi}i1. Therefore, REG mapping can approximate ML mapping well when the proportion is low (the QTL effect a is small). When the proportion is high, QTL effect a becomes relatively large when compared with the environmental deviation {sigma}, and the difference between the two normal densities can become significant. As a result, the approximation of REG to ML interval mapping may not be good for QTL with large effect.

Relative QTL location in an interval:
If a QTL is located on the boundary of a marker interval, pi1 is close to 1 or 0, and the conditional and posterior probabilities will be similar (pi1 {approx} {pi}i1). When the QTL position shifts from the boundary toward the middle of an interval, pi1 and {pi}i1 become more dissimilar to each other. If the QTL is located in the middle, individuals with type 2 or 3 flanking marker genotype have pi1 = 0.5, and pi1 and {pi}i1 will be the most dissimilar. Consequently, the approximation of REG to ML mapping will be better when the QTL is located near the boundary, but it becomes poor as the location moves toward the middle of an interval.

Interval size:
There are four types of flanking marker genotypes (Table 1). Types 1 and 4 are nonrecombinant, and types 2 and 3 are recombinant. Given a position in an interval, the conditional probability pi1 for QQ will be closer to 1 or 0, i.e., pi1 can be closer to {pi}i1, for nonrecombinant individuals than recombinant individuals. As there are more nonrecombinant flanking genotypes in a narrow interval than in a wider interval, the approximation of REG to ML mapping consequently is better for a QTL located in a narrow interval than in a wider interval. Therefore, if QTL are located in the dense marker region, the differences between the two methods will be minor.


*  MULTIPLE-QTL MODEL
*TOP
*ABSTRACT
*MAXIMUM-LIKELIHOOD INTERVAL...
*REGRESSION INTERVAL MAPPING
*DIFFERENCES BETWEEN ML AND...
*MULTIPLE-QTL MODEL
*SIMULATION STUDIES
*CONCLUSION AND DISCUSSION
*LITERATURE CITED

For the one-QTL model, it has been shown that the approximation of REG to ML interval mapping depends on the similarity between the conditional probability {pi}i1 and conditional posterior probability {pi}i1 for each i. The same argument also applies to the multiple-QTL model. When multiple, say, m QTL are considered, the multiple interval mapping (MIM; KAO et al. 1999 Down) model can be written as

(11)

where xij* denotes the genotype of QTL Qj, aj and Ijk are the main and epistatic effects, {delta}jk is an indicator variable for indicating whether the epistasis between Qj and Qk is present or not, and {epsilon}i is the environmental deviation.

For m QTL, there are 2m possible QTL genotypes; hence there are 2m corresponding genotypic values, µij's, with probabilities pij's, j = 1, 2, ... , 2m. The likelihood of the multiple-QTL model is then a 2m normal mixture

(12)

It seems that the derivation of the MLE of µ, a1, a2, ... , am, Ijk, and {sigma}2 and their asymptotic variance-covariance matrix is tedious as the number of QTL considered increases in the model. However, this tedious estimation problem can be easily solved by the general formulas proposed by KAO and ZENG 1997 Down by expanding the genetic design matrix D and conditional probability matrix Q according to the number and positions of testing QTL. These two matrices play the same role as the matrix of independent variable X in regression. Given X matrix in multiple regression, the estimates of regression coefficients and the asymptotic variance-covariance matrix can be easily obtained by formulas ß = (X'X)-1 X'Y and V = (X'X)-1{sigma}2. Given these two matrices D and Q in the multiple-QTL mapping model, the derivation of the MLE and the asymptotic variance-covariance matrix can be systematically obtained by the general formulas. Given the testing QTL positions, pij's can be determined. According to the general formulas, in the E-step, the 2m posterior probabilities of QTL genotypes for n individuals,

are updated. In the M-step, the solutions of the parameter estimation are in the closed form as shown in KAO and ZENG 1997 Down. The asymptotic variances of QTL positions and effects can also be obtained using the general formulas.

The REG interval mapping model for taking m QTL into account can be written as

In the model, wij is the conditional expectation of Qj given its flanking markers. The LSE of µ, a1, a2, ... , am, Ijk, and {sigma}2 as well as their asymptotic variances can be obtained using the standard least-squares technique.

Differences due to QTL effects, epistasis, and linkage:
By the same argument, it is required that pij {approx} {pi}ij for each i and j for REG interval mapping to approximate ML interval mapping well in the multiple-QTL model. Besides the factors, such as the proportion explained by QTL, the relative QTL position in an interval, and interval sizes, discussed in the previous section, the relative sizes of genotypic values of the 2m possible genotypes, µij's, can affect the approximation of REG to ML interval mapping in the multiple-QTL model. If µij's are dissimilar to each other (more disperse), {pi}ij's can be more dissimilar to pij's, and the differences between the two methods can become large. The difference between the sizes of QTL effects and the strength of epistasis between QTL seems to be an appropriate measure to quantify the dissimilarity between the 2m genotypic values. If QTL effects differ from each other significantly or epistasis between QTL is strong, {pi}ij's tend to be dissimilar to pij's. Consequently, the differences between REG and ML interval mapping will be larger if QTL effects differ significantly or the interaction between QTL is strong.

If QTL are linked, they are correlated. Their correlation is 1 - 2r, where r is the recombination fraction between QTL. As QTL (predictors) in the model are correlated, the effects of collinearity, on modeling such as imprecise estimation and losing power in testing for individual parameters, will occur (NETER et al. 1990 Down). When detecting closely linked QTL using the multiple-QTL model, the correlations between the conditional expectations of QTL, wij's, in the REG model tend to be higher than those between the conditional posterior expectations, w*ij's, in the ML model. As a result, the REG method tends to give estimates with large SD and be less powerful in testing for closely linked QTL (to separate closely linked QTL). The closer they are, the worse the approximation of the REG to ML method will be.


*  SIMULATION STUDIES
*TOP
*ABSTRACT
*MAXIMUM-LIKELIHOOD INTERVAL...
*REGRESSION INTERVAL MAPPING
*DIFFERENCES BETWEEN ML AND...
*MULTIPLE-QTL MODEL
*SIMULATION STUDIES
*CONCLUSION AND DISCUSSION
*LITERATURE CITED

Simulations were performed to verify the effects of the above factors, such as the proportion of variance explained by QTL, interval size, QTL position, the difference between QTL effects, epistasis, and linkage, on the approximation of REG to ML interval mapping. Assume two unlinked epistatic QTL with effects (a1 = 1, a2 = 1, I12 = 1) that affected a quantitative trait of interest in a backcross population (epistasis contributes 11.11% of the total genetic variation). For simplicity, 10 equally spaced marker intervals were simulated for each chromosome. Four proportions of variance explained by QTL (h2's), 0.01, 0.1, 0.3, and 0.5, and three different interval sizes, 10, 20, and 40 cM, are simulated. The relative QTL positions are placed in the middle or on the boundary of a marker interval (1, 2, and 4 cM away from the left marker of the three different spaced intervals, respectively). When investigating the effect of epistasis, the main and epistatic QTL effects are further set at (a1 = 1, a2 = 1, I12 = 2) or (a1 = 1, a2 = 1, I12 = 3), and the QTL are placed in the middle of 40-cM intervals. Together the QTL contribute 50% of the quantitative trait variation (the percentages of epistatic variance in the total genetic variance are 33.33 and 52.94%, respectively). When investigating the effect of the difference between QTL effects on the approximation, five unlinked QTL are placed in the middle of 10- or 40-cM intervals with effects (a1 = 1, a2 = 1, a3 = 1, a4 = 1, a5 = 1), (a1 = 4, a2 = 1, a3 = 1, a4 = 2, a5 = 1), or (a1 = 4, a2 = 1, a3 = 1, a4 = -1, a5 = 1), respectively, and together contribute 50% of the trait variation. When investigating the effect of linkage, two QTL are placed in two neighboring 40-cM intervals and are 10, 20, 30, or 40 cM apart from each other (5, 10, 15, and 20 cM from the marker between them). Their effects are set at (a1 = 1, a2 = -1) without epistasis or (a1 = 1, a2 = -1, I12 = 1) with epistasis, and the heritability is assumed to be 0.1, 0.3, or 0.5 for each case. The sample size is 200, and 500 replicates were simulated for all cases.

For simplicity of comparison, the QTL positions are assumed to be known, and the simulation is performed at the positions. When calculating the power of separating closely linked QTL, a successful separation requires the partial LRT statistic for each QTL > {chi}2 ({chi}2 for the epistasis case). Means of the estimated parameter values, their standard deviations (SDs), and mean squared errors (MSEs), as well as the LRT statistics, are recorded. MSE is used to evaluate the approximation of REG to ML interval mapping and the performance of the two methods under various cases. MSE, which is defined as

incorporates two components, one measuring the variability of the estimate (precision), and the other measuring its bias (accuracy). A good method needs to control both variance and bias in estimation.

As expected, if h2 is low (h2 = 0.01), the two methods provide almost identical means and SDs of the estimates for µ, a1, a2, I12, {sigma}2, and h2, and LRT statistics. These results for h2 = 0.01 correspond with the findings of HALEY and KNOTT 1992 Down and XU 1995 Down, XU 1998A Down, XU 1998B Down. When h2 becomes higher (h2 > 0.1), their differences due to the factors of interval size and QTL position become observable, but minor (the ML method generally has a smaller MSE and larger LRT statistic). To shorten the article, only part of the results are presented, and the investigation focuses on the factors of linkage, different QTL sizes, and epistasis under the multiple-QTL model.

Proportion of variance explained by QTL, interval size, and QTL position:
The means of the estimated main and epistatic effects by the ML and REG methods are almost identical and very close to the true values for various h2's, interval sizes, and QTL positions. However, the ML method tends to provide estimates with smaller SD (MSE) and larger LRT statistics when compared to the REG method. For example, the MSEs of â1 by the REG method are 0.178, 0.050, and 0.024 for h2 = 0.1, 0.3, and 0.5, respectively, and the MSEs by the ML method are 0.175, 0.047, and 0.020, respectively (only the result for h2 = 0.5 and QTL located in the middle of the intervals is shown in Table 2). There is a similar pattern for other estimates. The estimates of {sigma}2 and h2 by the REG method are biased, and the estimates by the ML method are (asymptotically) unbiased. For example, the h2 by the REG method is 0.072 (SD 0.033), 0.187 (SD 0.047), and 0.302 (SD 0.051) for h2 = 0.1, 0.3, or 0.5, respectively, and the h2 by the ML method is 0.128 (SD 0.055), 0.316 (SD 0.070), and 0.509 (SD 0.061), respectively. The bias of the REG method in estimating {sigma}2 and h2 becomes obvious as h2 becomes large. Also, the ML method gives larger LRT statistics than the REG method in all cases. The difference in mean LRT statistics between the two methods is negligible: 0.4 (15.2 - 14.8 = 0.4 for 40-cM marker spacing) for h2 = 0.1 (results not shown), but 9.9 (82.5 - 72.6 = 9.9 for 40-cM marker spacing) for h2 = 0.5. Therefore, the difference in the LRT statistic becomes larger as h2 becomes higher. Similar patterns of difference in MSE and LRT statistics, caused by the change of h2, can be observed for other interval sizes and QTL positions.


 
View this table:
In this window
In a new window

 
Table 2. Comparison of maximum likelihood and regression interval mapping of simulated data (h2 = 0.5)

Epistasis:
The means, SDs, and MSEs of the estimates as well as the mean LRT statistics for effect ratios 1:1:1, 1:1:2, and 1:1:3 are listed in Table 3. As interaction becomes stronger, the MSEs of the estimates by both methods become larger, and their differences in MSE and LRT statistics become larger. For example, the MSEs of Î12 by the ML method are 0.075, 0.110, and 0.159 for the three ratios, respectively, and they are 0.127, 0.175, and 0.254 by the REG method, respectively. A similar trend can be observed for other estimates. The means of LRT statistics for the three ratios are 82.5, 80.3, and 75.2 for the ML method, respectively, and they are 72.6, 65.5, and 59.5 for the REG method, respectively. The bias of the REG method in the estimation of {sigma}2 and h2 also becomes much more serious as interaction between QTL gets stronger. The h2 by the REG method is 0.302 (SD 0.051), 0.277 (SD 0.053), and 0.255 (SD 0.053) for the three ratios, respectively (h2 = 0.5). The ML method, however, can estimate h2 and other parameters well for all ratios.


 
View this table:
In this window
In a new window

 
Table 3. Comparison of maximum likelihood and regression interval mapping of simulated data under different strengths of epistasis

Difference between QTL effects:
Table 4 shows the means, SDs, and MSEs of the estimates as well as the mean LRT statistics for QTL effects (a1 = 1, a2 = 1, a3 = 1, a4 = 1, a5 = 1), (a1 = 4, a2 = 1, a3 = 1, a4 = 2, a5 = 1), and (a1 = 4, a2 = 1, a3 = 1, a4 = -1, a5 = 1). When QTL effects are of the same size, the mean LRT statistic of the ML method is 1.8 (81.5 - 79.7) larger than that of the REG method. If there are some relatively large and small QTL, their differences in LRT statistics are 3.7 (84.1 - 80.4) and 5.8 (84.7 - 78.9), respectively, for the other two cases. Also, the estimates by the REG method tend to have larger MSEs, and the h2 and 2 by the REG method are biased. For the case (a1 = 1, a2 = 1, a3 = 1, a4 = 1, a5 = 1, and h2 = 0.1) and 10-cM intervals, the difference in mean LRT statistic between the two methods is at a very micro level (52.3 - 52.2 = 0.08).


 
View this table:
In this window
In a new window

 
Table 4. Comparison of maximum likelihood and regression interval mapping of simulated data under different relative sizes of QTL effects

Linkage:
The powers of separating 10-, 20-, 30-, and 40-cM-apart QTL are 97.2, 99.0, 99.6, and 99.8% for the ML method, respectively, and are 22.0, 60.0, 91.6, and 98.4% for the REG method, respectively (Table 5 and Fig 1). Also, the difference in MSE between the two methods becomes larger as QTL get closer. The MSE ratios of â1 for the two methods are 4.52 (0.226/0.050), 8.67 (0.091/0.011), 4.00 (0.056/0.014), and 2.57 (0.036/ 0.011), respectively (Fig 1C). The estimated h2 by the REG method is seriously biased. The means of h2 by the REG method are 0.037, 0.070, 0.112, and 0.162 for 10-, 20-, 30-, and 40-cM-apart QTL, respectively (h2 = 0.5). The ML method, however, can estimate h2 well (see also Fig 1C). If h2 = 0.3 or 0.1, the power, the MSE ratio of â1, and h2 for the two methods are shown in Fig 1, a–c. If the linked QTL show epistasis, the advantage gained by the ML method becomes even more significant (Fig 1D).



View larger version (33K):
In this window
In a new window
Download PPT slide
 
Figure 1. The power, estimate of proportion of variance explained by QTL (h2), and MSE of effect estimate by the ML and REG methods in the analysis of two linked QTL without (a1 = 1, a2 = -1) and with (a1 = 1, a2 = -1, I12 = 1) epistasis under different genetic distances and proportions of variance explained by QTL (h2). (a) Power of separating two linked QTL with no epistasis. The solid lines from bottom to top denote the power by the ML method for h2 = 0.1, 0.3, and 0.5, respectively. The dotted lines from bottom to top denote the power by the REG method for h2 = 0.1, 0.3, and 0.5, respectively. (b) Estimate of h2. The solid lines from bottom to top denote the estimate of h2 by the ML method for h2 = 0.1, 0.3, and 0.5, respectively. The dotted lines from bottom to top denote the estimate of h2 by the REG method for h2 = 0.1, 0.3, and 0.5, respectively. (c) Ratio of MSE. The solid, dotted, and dashed lines from bottom to top denote the MSE ratio of â1 by the REG method and ML method for h2 = 0.1, 0.3, and 0.5, respectively. (d) Power of separating two linked QTL with epistasis. The solid lines from bottom to top denote the power by the ML method for h2 = 0.1, 0.3, and 0.5, respectively. The dotted lines from bottom to top denote the power by the REG method for h2 = 0.1, 0.3, and 0.5, respectively.


 
View this table:
In this window
In a new window

 
Table 5. Comparison of maximum likelihood and regression interval mapping of simulated data under different strengths of linkage


*  CONCLUSION AND DISCUSSION
*TOP
*ABSTRACT
*MAXIMUM-LIKELIHOOD INTERVAL...
*REGRESSION INTERVAL MAPPING
*DIFFERENCES BETWEEN ML AND...
*MULTIPLE-QTL MODEL
*SIMULATION STUDIES
*CONCLUSION AND DISCUSSION
*LITERATURE CITED

In this article, the differences in QTL parameter estimation and testing for the existence of QTL between the ML and REG methods are investigated both analytically and numerically. It is found that the REG method tends to give estimates with larger MSE and smaller LRT statistics in testing parameters, and it is less powerful in QTL detection when compared with the ML method. Also, the REG method is biased in estimating the residual variance and the proportion of total variance explained by QTL. Therefore, ML interval mapping is more accurate, precise, and powerful than REG interval mapping in QTL mapping. The differences in power, MSE, and LRT statistics between the two methods depend on factors such as size of QTL effect, interval size, relative QTL position in an interval, difference between QTL effects, epistasis, and linkage between QTL, as shown in the article. Their differences in general may be minor, but can be significant in certain situations. The differences become larger as the proportion explained by QTL becomes higher, marker interval becomes wider, QTL position moves from boundary to middle of an interval, the difference of QTL effects is larger, epistasis becomes stronger, and the QTL positions are closer. Especially, the REG method may have a serious problem in detecting closely linked QTL when compared with the ML method. As shown in Table 5 and Fig 1, the difference in detecting closely linked (10–20 cM apart) QTL with opposite effects is quite significant (power 0.22 vs. 0.97 for 10-cM-apart QTL and h2 = 0.5; power 0.60 vs. 0.99 for 20-cM-apart QTL and h2 = 0.5; power 0.09 vs. 0.54 for 10-cM-apart QTL and h2 = 0.3; power 0.34 vs. 0.61 for 20-cM-apart QTL and h2 = 0.3). In addition, the REG method is seriously biased in estimating the proportion of variance explained by QTL (Fig 1B), and it gives the estimates of the effects with much larger MSEs (Fig 1C). The problem of the REG method in detecting closely linked QTL becomes worse if epistasis is present (Fig 1D).

It was often pointed out that there is no significant difference in the estimation of QTL parameter and statistical power of QTL detection between the REG and ML methods with the exception that the estimate of residual variance by the REG method is biased (HALEY and KNOTT 1992 Down; XU 1995 Down, XU 1998A Down, XU 1998B Down). These findings were mostly done by simulation and concentrated on the comparison of mean estimate (accuracy) of QTL effect for low heritability (h2 = 0, 0.008, 0.03, and 0.111 in HALEY and KNOTT 1992 Down) and a QTL positioned in a narrow interval (interval size 10 cM in XU 1998A,B Down) for a one-QTL model. Therefore, their differences due to factors such as different QTL sizes, epistasis, and linkage between QTL had not been identified and needed to be checked using the multiple-QTL model. When multiple QTL are considered simultaneously in the model, the differences in power and estimation between the two methods can be significant for these factors as shown in this article. Using the multiple-QTL model of the ML approach, it has been found that quantitative traits with a somewhat medium to high heritability might be affected by several linked and unlinked QTL, having different sizes, directions, and interaction of effects (KAO et al. 1999 Down; WEBER et al. 1999 Down; ZENG et al. 2000 Down). Also, the linkage map may have wide marker intervals (GRATTAPAGLIA et al. 1996 Down; SATAGOPAN et al. 1996 Down; LI et al. 1997 Down; KAO et al. 1999 Down). As a result, the REG method can be significantly different from the ML method and thus be problematic in practical QTL mapping.

The cost in computation per iteration in the EM algorithm is generally not very expensive (MCLACHLAN and KRISHNAN 1997 Down). If the model is extended to fit five QTL, the ML method needs ~18 iterations to converge and is <10 times slower than the REG method for the 40-cM interval case (the REG method takes ~65 sec and the ML method takes ~608 sec to finish the computation of 500 replicates). Therefore, the ML method should not be regarded as formidably expensive in computation as the computer technology is advancing. XU 1998A Down, XU 1998B Down proposed the iteratively reweighted least-squares (IRWLS) method to correct the bias of the REG method in estimating residual variance. The estimates by both REG and IRWLS methods tend to have larger SD than those by the ML method (Tables 1–4 in XU 1998A Down; Table 7 in XU 1998B Down). As the MLEs have the property of asymptotical efficiency, it should not be surprising that the ML method has the ability to provide the smallest SD among estimates (CASELLA and BERGER 1990 Down). Therefore, ML interval mapping is not only a more powerful but also a more precise method in QTL mapping.

The distributions of most quantitative traits approximate more or less close to normal or can be scaled to normal through simple transformation (FALCONER and MACKAY 1996 Down). Therefore, when mapping QTL, the likelihood is generally modeled as a normal mixture (Equation 2). When applying the EM algorithm to the estimation of a normal mixture model, the estimating Equation 3, Equation 4, and Equation 5 depend on the conditional posterior probabilities of QTL genotypes {pi}ij's, which take the distribution of the residual error into account using normal density. The estimation of the REG method depends on the conditional probabilities pij's, which ignore the distribution of residual error, and the IRWLS method takes only the second moment of residual error into account whatever the underlying residual error distribution is. This is also the reason why the ML method can be better than the REG and IRWLS methods. If the residual error does not follow normal distribution, the mixture model in Equation 2 should take its specific form into account to model the relation between the quantitative trait and QTL in estimation. In practice, although most of the residual errors are normally distributed and the use of the normal mixture model should be safe in most situations, it is important to examine the pattern of residuals, which is a requisite procedure in model selection, to ensure that the final QTL mapping model is appropriate.

The QTL mapping result will be used as a base for follow-up operations, such as marker-assisted selection or gene transfer, on QTL for trait improvement. To ensure the validity of trait improvement, the quality of QTL mapping should be more important than the ease of computation. Researchers using the REG method for mapping QTL need to be concerned with the factors affecting its approximation to the ML method in practice. For example, if there are wide marker intervals along the genome (known data structure), or the QTL effects are not sure to be equally small, or the QTL are linked with epistasis (unknown QTL parameters), the REG method may perform poorly when compared to the ML method. Then, after the analysis of the REG method, there is a need to further use the ML method to finalize the QTL mapping result. As far as computation is concerned, it is suggested that researchers may use the REG method as an initial procedure to obtain preliminary results and further use the ML method as a final procedure to obtain the conclusive results of QTL mapping.


*  ACKNOWLEDGMENTS

This study was supported by grants NCS89-2313-B-001-006 from the National Science Council, Taiwan, Republic of China.

Manuscript received July 26, 1999; Accepted for publication May 30, 2000.


*  LITERATURE CITED
*TOP
*ABSTRACT
*MAXIMUM-LIKELIHOOD INTERVAL...
*REGRESSION INTERVAL MAPPING
*DIFFERENCES BETWEEN ML AND...
*MULTIPLE-QTL MODEL
*SIMULATION STUDIES
*CONCLUSION AND DISCUSSION
*LITERATURE CITED

CARBONELL, E. A., T. M. GERIG, E. BALANSARD, and M. J. ASINS, 1992  Interval mapping in the analysis of nonadditive quantitative trait loci. Biometrics 48:305-315.

CASELLA, G., and R. BERGER, 1990 Statistical Inference. Wadsworth, Belmont, CA.

DEMPSTER, A. P., N. M. LAIRD, and D. B. RUBIN, 1977  Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. 39:1-38.

DOERGE, R. W. and G. A. CHURCHILL, 1996  Permutation test for multiple loci affecting a quantitative character. Genetics 142:284-294.

DUPUIS, J. and D. SIEGMUND, 1999  Statistical methods for mapping quantitative trait loci from a dense set of markers. Genetics 151:373-386[Abstract/Free Full Text].

FALCONER, D. S., and T. F. C. MACKAY, 1996 Introduction to Quantitative Genetics. Longman Group, London.

GOFFINET, B. and B. MANGIN, 1998  Comparing methods to detect more than one QTL on a chromosome. Theor. Appl. Genet. 96:628-633.

GRATTAPAGLIA, D., F. L. G. BERTOLUCCI, R. PENCHEL, and R. R. SEDEROFF, 1996  Genetic mapping of quantitative trait loci controlling growth and wood quality traits in Eucalyptus grandis using a maternal half-sib family and RAPD markers. Genetics 144:1205-1214[Abstract].

HACKETT, C. A. and J. I. WELLER, 1995  Genetic mapping of quantitative trait loci for traits with ordinal distributions. Biometrics 51:1252-1263[Medline].

HALDANE, J. B. S., 1919  The combination of linkage values and the calculation of distances between the loci of linked factors. J. Genet. 8:299-309.

HALEY, C. S. and S. A. KNOTT, 1992  A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69:315-324[Medline].

HALEY, C. S., S. A. KNOTT, and J.-M. ELSEN, 1994  Mapping quantitative trait loci in crosses between outbred lines using least squares. Genetics 136:1195-1207[Abstract].

HENSHALL, J. M. and M. E. GODDARD, 1999  Multiple-trait mapping of quantitative trait loci after selective genotyping using logistic regression. Genetics 151:885-894[Abstract/Free Full Text].

HOESCHELE, I. and P. VANRADEN, 1993a  Bayesian analysis of linkage between genetic markers and quantitative trait loci. I. Prior knowledge. Theor. Appl. Genet. 85:953-960.

HOESCHELE, I. and P. VANRADEN, 1993b  Bayesian analysis of linkage between genetic markers and quantitative trait loci. II. Combining prior knowledge with experimental evidence. Theor. Appl. Genet. 85:946-952.

JANSEN, R. C., 1993  Interval mapping of multiple quantitative trait loci. Genetics 135:205-211[Abstract].

JIANG, C. and Z.-B. ZENG, 1997  Multiple trait analysis of genetic mapping for quantitative trait loci. Genetics 140:1111-1127[Abstract].

KAO, C.-H. and Z.-B. ZENG, 1997  General formulas for obtaining the MLE and the asymptotic variance-covariance matrix in mapping quantitative trait loci when using the EM algorithm. Biometrics 53:359-371.

KAO, C.-H., Z.-B. ZENG, and R. D. TEASDALE, 1999  Multiple interval mapping for quantitative trait loci. Genetics 152:1203-1216[Abstract/Free Full Text].

LANDER, E. S. and D. BOTSTEIN, 1989  Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121:185-199[Abstract/Free Full Text].

LEBRETON, C. M., P. M. VISSCHER, C. S. HALEY, A. SEMIKHODSKII, and S. A. QUARRIE, 1998  A nonparametric bootstrap method for testing close linkage vs. pleiotropy of coincident quantitative trait loci. Genetics 150:931-943[Abstract/Free Full Text].

LI, Z., S. R. M. PINSON, W. D. PARK, A. H. PATERSON, and J. W. STANSEL, 1997  Epistasis for three grain yield components in rice (Oryza sativa L.). Genetics 145:453-465[Abstract].

LITTLE, R. J. A., and D. B. RUBIN, 1987 Statistical Analysis With Missing Data. John Wiley, New York.

LOUIS, T. A., 1982  Finding the observed information matrix when using the EM algorithm. J. R. Stat. Soc. Ser. B 44:226-233.

MARTINEZ, O. and R. N. CURNOW, 1992  Estimating the locations and the sizes of the effects of quantitative trait loci using flanking markers. Theor. Appl. Genet. 85:480-488.

MCLACHLAN, G. F., and T. KRISHNAN, 1997 The EM Algorithm and Extensions. John Wiley, New York.

MENG, X.-L. and B. RUBIN, 1991  Using EM to obtain asymptotic variance-covariance matrix: the SEM algorithm. J. Am. Stat. Assoc. 86:899-909.

NETER, J., W. WASSERMAN and M. H. KUTNER, 1990 Applied Linear Statistical Model. Richard D. Irwin, Tokyo.

REBAI, A. and B. GOFFINET, 2000  More about quantitative trait locus mapping with diallel designs. Genet. Res. 75:243-247[Medline].

SATAGOPAN, J. M., B. S. YANDELL, M. A. NEWTON, and T. C. OSBORN, 1996  A Bayesian approach to detect quantitative trait loci using Markov chain Monte Carlo. Genetics 144:805-816[Abstract].

SILLANPAA, M. J. and E. ARJAS, 1999  Bayesian mapping of multiple quantitative trait loci from incomplete outbred offspring data. Genetics 151:1605-1619[Abstract/Free Full Text].

SONG, J. Z., M. SOLLER, and A. GENIZI, 1999  The full-sib intercross line (FSIL): a QTL mapping design for outcross species. Genet. Res. 77:61-73.

WEBER, K., R. EISMAN, L. MOREY, A. PATTY, and J. SPARKS et al., 1999  An analysis of polygenes affecting wing shape on Chromosome 3 in Drosophila melanogaster. Genetics 153:773-786[Abstract/Free Full Text].

WHITTAKER, J. C., R. THOMPSON, and P. M. VISSCHER, 1996  On the mapping of QTL by regression of phenotype on marker type. Heredity 77:23-32.

XU, S., 1995  A comment on the simple regression method for interval mapping. Genetics 141:1657-1659[Medline].

XU, S., 1996  Mapping quantitative trait loci using four-way crosses. Genet. Res. 68:175-181.

XU, S., 1998a  Further investigation on the regression method of mapping quantitative trait loci. Heredity 80:364-373.

XU, S., 1998b  Iteratively reweighted least squares mapping for quantitative trait loci. Behav. Genet. 28:341-355[Medline].

ZENG, Z.-B., 1994  Precision mapping of quantitative trait loci. Genetics 136:1457-1468[Abstract].

ZENG, Z.-B., J. LIU, L. F. STAM, C.-H. KAO, and J. M. MERCER et al., 2000  Genetic architecture of a morphological shape difference between two Drosophila species. Genetics 154:299-310[Abstract/Free Full Text].




This article has been cited by other articles:


Home page
GeneticsHome page
B. Feenstra, I. M. Skovgaard, and K. W. Broman
Mapping Quantitative Trait Loci by an Extension of the Haley-Knott Regression Method Using Estimating Equations
Genetics, August 1, 2006; 173(4): 2269 - 2282.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
C. A. Anderson, A. F. McRae, and P. M. Visscher
A Simple Linear Regression Method for Quantitative Trait Loci Linkage Analysis With Censored Observations
Genetics, July 1, 2006; 173(3): 1735 - 1745.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
S. Sen, J. M. Satagopan, and G. A. Churchill
Quantitative Trait Locus Study Design From an Information Perspective
Genetics, May 1, 2005; 170(1): 447 - 464.
[Abstract] [Full Text] [PDF]


Home page
J DAIRY SCIHome page
Y. Liu, G. B. Jansen, and C. Y. Lin
Quantitative Trait Loci Mapping for Dairy Cattle Production Traits Using a Maximum Likelihood Method
J Dairy Sci, February 1, 2004; 87(2): 491 - 500.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
J. O. Borevitz, J. N. Maloof, J. Lutes, T. Dabi, J. L. Redfern, G. T. Trainer, J. D. Werner, T. Asami, C. C. Berry, D. Weigel, et al.
Quantitative Trait Loci Controlling Light and Hormone R