Genetics, Vol. 157, 1789-1803, April 2001, Copyright © 2001

Enhanced Efficiency of Quantitative Trait Loci Mapping Analysis Based on Multivariate Complexes of Quantitative Traits

Abraham B. Korola, Yefim I. Ronina, Alexander M. Itskovicha, Junhua Penga, and Eviatar Nevoa
a Institute of Evolution, University of Haifa, Haifa 31905, Israel

Corresponding author: Abraham B. Korol, Institute of Evolution, University of Haifa, Haifa 31905, Israel., korol{at}esti.haifa.ac.il (E-mail)

Communicating editor: G. A. CHURCHILL


*  ABSTRACT
*TOP
*ABSTRACT
*THE PROPOSED METHOD
*RESULTS
*DISCUSSION
*LITERATURE CITED

An approach to increase the efficiency of mapping quantitative trait loci (QTL) was proposed earlier by the authors on the basis of bivariate analysis of correlated traits. The power of QTL detection using the log-likelihood ratio (LOD scores) grows proportionally to the broad sense heritability. We found that this relationship holds also for correlated traits, so that an increased bivariate heritability implicates a higher LOD score, higher detection power, and better mapping resolution. However, the increased number of parameters to be estimated complicates the application of this approach when a large number of traits are considered simultaneously. Here we present a multivariate generalization of our previous two-trait QTL analysis. The proposed multivariate analogue of QTL contribution to the broad-sense heritability based on interval-specific calculation of eigenvalues and eigenvectors of the residual covariance matrix allows prediction of the expected QTL detection power and mapping resolution for any subset of the initial multivariate trait complex. Permutation technique allows chromosome-wise testing of significance for the whole trait complex and the significance of the contribution of individual traits owing to: (a) their correlation with other traits, (b) dependence on the chromosome in question, and (c) both a and b. An example of application of the proposed method on a real data set of 11 traits from an experiment performed on an F2/F3 mapping population of tetraploid wheat (Triticum durum x T. dicoccoides) is provided.


THE detection power and mapping resolution of marker analysis of quantitative traits are the major factors affecting practical applications of quantitative trait loci (QTL) mapping. These characteristics strongly depend on the effect of the QTL in question relative to the phenotypic variance of the trait in the mapping population. The higher the discrepancy between QTL groups (or the contribution of the QTL to the trait heritability H2, the proportion of genetic variation {sigma}2G in total phenotypic variation {sigma}2Ph of the trait) the better the expected QTL detection power and mapping resolution. As shown by LANDER and BOTSTEIN 1989 Down, the expected value of the log-likelihood test statistics increases monotonically with H2:

(1)

Several strategies have been proposed to improve the precision of QTL mapping. These involve development of (i) new experimental designs to suit specific mapping goals and an organism's breeding system, and (ii) new QTL mapping models and algorithms to extract maximum information about QTL locations and effects. One of the improvements includes multilocus (composite) mapping analysis (JANSEN and STAM 1994 Down; ZENG 1994 Down), selective sampling (LEBOWITZ et al. 1987 Down; DARVASI and SOLLER 1994 Down), replicated progeny testing (SOLLER and BECKMANN 1990 Down), and sequential experimentation (MOTRO and SOLLER 1993 Down). For example, in composite mapping the increase in mapping resolution derives from a reduction of the residual variation by taking into account the effects of cosegregating QTL.

In QTL mapping, the experimental design usually includes simultaneous measurements of many related and unrelated quantitative traits and subsequent treatment of the individual traits. Recently, several groups attempted to improve the efficiency of marker analysis of QTL by taking into account possible effects of the putative QTL on several traits simultaneously (KOROL et al. 1987 Down, KOROL et al. 1995 Down, KOROL et al. 1998A Down; AMOS et al. 1990 Down; SCHORK et al. 1994 Down; JIANG and ZENG 1995 Down; RONIN et al. 1995 Down, RONIN et al. 1998 Down, RONIN et al. 1999 Down; WELLER et al. 1996 Down; ALMASY et al. 1997 Down; BOOMSMA and DOLAN 1998 Down; MANGIN et al. 1998 Down; HENSHALL and GODDARD 1999 Down; OLSON et al. 1999 Down; WILLIAMS et al. 1999 Down; ZENG et al. 2000 Down). In the simplest case of two noncorrelated traits, the advantage of joint analysis is in the increase of the multivariate effect according to d2 = ()2 + ()2 (Fig 1A), where dx and dy are the substitution effects of the QTL for traits x and y, and {sigma}x and {sigma}y are the corresponding standard deviations within the QTL groups (residual standard deviations). Consequently, for a population with 1:1 ratio of the alternative QTL groups (like backcross, dihaploid, or recombinant inbreds) the bivariate analogue of H2 could be represented in the form

(2)



View larger version (14K):
In this window
In a new window
Download PPT slide
 
Figure 1. The main sources for improvement of QTL mapping efficiency in multiple-trait analysis: (a) due to the pleiotropic effects; (b) due to correlation between the traits (within the QTL groups) caused by nongenetic effects and segregation of unlinked QTL; or (c) due to combined effect of both foregoing factors.

The situation becomes more complicated when correlated traits are involved. It can be shown (KOROL et al. 1995 Down) that Equation 1 remains valid in bivariate analysis of correlated traits,

(1)

with

(3)

It was shown earlier that either ELOD(x, y) >= ELOD(x) and ELOD(x, y) >= ELOD(y) follow from H2xy >= H2x and H2xy >= H2y, respectively (KOROL et al. 1995 Down; RONIN et al. 1999 Down). Given fixed dx/{sigma}x or H2x = , how will the resolution be affected by other traits being taken into account? Several situations should be considered to explain the expected gain of joint analysis of multiple traits compared to single-trait analysis. For the sake of simplicity, let us consider two traits. As mentioned above, if Rxy = 0, the effect of an additional trait is simply due to the increased Euclidean distance between the (two-dimensional) centers of the QTL groups (see Fig 1A). Consider now the situation when the traits are correlated within each of the QTL groups with residual correlation Rxy != 0. It is easy to see from Equation 3 that if dy != 0 and Rxy != 0 and sign(Rxydxdy) < 0, then H2xy >= H2x and one could expect a respective increase in ELOD. Moreover, the inequality H2xy > H2x holds even if dy = 0 but Rxy != 0, independent of the sign of correlation (Fig 1B). Therefore, we can further assume that the increment in H2xy, compared with H2x, will result in an increased resolution of the mapping analysis (in spite of complications due to certain statistical nonequivalence), no matter how this increment in H2xy was produced, due to (i) the pleiotropic effect of the QTL on x and y, (ii) residual correlation between x and y (within the QTL groups) caused by nongenetic effects or segregation of unlinked QTL, or (iii) the combined effect of both factors (i) and (ii) (Fig 1C). In other words, instead of separate analyses of traits x and y, one can conduct joint analysis of these traits that is formally equivalent to transformation of a two-dimensional phenotype into a one-dimensional phenotype. For the new phenotype, a higher ratio of the between-QTL group difference to the residual variation can be achieved owing to the pleiotropic effect of the QTL on both traits, and residual correlation between the traits caused by nongenetic factors and segregation at other QTL. These expectations, illustrated geometrically in Fig 1, are confirmed by both Monte Carlo simulations and analytical approximations, for marker and interval analysis (KOROL et al. 1995 Down, KOROL et al. 1998A Down; RONIN et al. 1995 Down, RONIN et al. 1999 Down). Although the described approach resembles the principal component analysis (PCA), it differs from PCA significantly. Besides technical differences, the main dissimilarity is that our trait transformations are interval dependent (KOROL et al. 1995 Down), whereas in PCA the transformation is applied to the initial trait complex (WELLER et al. 1996 Down). This difference may be important if the mapping population segregates for more than one QTL (see below).

Clearly, not only statistical reasons are of interest when discussing the advantages of the joint analysis of correlated trait complexes. The multitrait approach allows for an integral evaluation of the effects of genomic segments on a defined group of traits. Because of the internal balance of the organism's systems (SCHMALHAUSEN 1942 Down), such an approach for QTL mapping seems to be much more justified biologically than the usual "trait-by-trait" analysis. It may assist in testing numerous biologically important hypotheses concerning manifold effects of genomic segments on quantitative variation, to distinguish between linkage and pleiotropy as mechanisms of genetic correlation, to address the problem of QTL-by-environment interaction, etc. (KOROL et al. 1994 Down, KOROL et al. 1998B Down; JIANG and ZENG 1995 Down; LEBRETON et al. 1998 Down; RONIN et al. 1999 Down). Such analysis may be of major importance in formulating marker-assisted breeding strategies, dissecting heterosis as a multilocus multitrait phenomenon, developing optimized programs for evaluation and bioconservation of genetic resources, and revealing the genetic architecture of fitness systems in natural populations and of multifactorial diseases in humans.

The increased number of parameters to be estimated complicates the application of this approach when a large number of traits are considered. With n traits to be analyzed simultaneously in the simplest case of a backcross (as well as a dihaploid or recombinant inbred lines) mapping population using single-interval mapping, the model should include (n2 + 5n + 2)/2 parameters [QTL position, n mean values, n effects, n residual variances, and n(n - 1)/2 covariances]. At n = 10, this amounts to 76 parameters.

One possible ad hoc simplification of the estimation aspects is based on a reduction to two-trait analysis (KOROL et al. 1995 Down, KOROL et al. 1998A Down; RONIN et al. 1995 Down, RONIN et al. 1999 Down; JIANG and ZENG 1995 Down) that appeared to be very efficient in allowing for an increase in QTL detection power and mapping resolution. In specific situations, such a reduction to a two-trait analysis may also be justified by the biological nature of the involved traits. However, in real multitrait situations this approach may result in statistical difficulties caused by the large number of trait pairs. Corresponding multiple tests may be interdependent, causing a further complication in defining the critical values of the test statistics. Another possibility is related to the attempt at space transformation, e.g., using the PCA (WELLER et al. 1996 Down; MANGIN et al. 1998 Down) applied to the multivariate trait distribution across the entire data set. Although this approach seems to be very attractive, it cannot directly solve the problem when the mapping population segregates for more than one QTL, especially if some of the effects are relatively strong. Indeed, assume that in such a case the PCA transformation was applied to the initial trait complex (without taking out the effects of the target QTL), i.e., for all individuals independent of their genotypes. Then, the independence of the resulting derivative traits over the entire mapping population cannot guarantee their independence within the alternative QTL groups. Moreover, this problem may exist even in the case of one QTL segregating in the mapping population, because the total (across all individuals) variance-covariance matrix of the initial trait complex may differ from the residual one (i.e., for the matrix characterizing the within-QTL group variation). It is noteworthy that the largest principal components may be irrelevant in such an analysis, as can be seen from Fig 1C (see also OLSON et al. 1999 Down). Nevertheless, in some cases this approach may work (in situations represented by Fig 1A).

Here we present a generalization of our previous two- and three-trait QTL mapping algorithm (KOROL et al. 1995 Down; RONIN et al. 1995 Down), which is free of the mentioned difficulties, to multivariate trait complexes that allow analysis of a large number of traits. It is based on transformation of the initial trait space into a space of a lower dimension. In the simplest case of a single-QTL analysis of a backcross (dihaploid) mapping population, the resulting space is one-dimensional independent of the number of traits, whereas two-QTL analysis for such a population will employ a two-dimensional model (for F2 these will be three- and eight-dimensional models, correspondingly). The main difference between our previous two-trait models (KOROL et al. 1995 Down, KOROL et al. 1998A Down; RONIN et al. 1995 Down, RONIN et al. 1998 Down, RONIN et al. 1999 Down; see also JIANG and ZENG 1995 Down) and the foregoing PCA-based models lies in the fact that the residual variance-covariance matrix was considered interval dependent, in the following sense. Its elements are a subset of the vector of unknown parameters to be estimated by the employed procedure for each interval, so that for QTL residing in different genomic segments the resulting (transformed) traits could be very different. This interval dependence remains a notable characteristic of our new multivariate algorithm.


*  THE PROPOSED METHOD
*TOP
*ABSTRACT
*THE PROPOSED METHOD
*RESULTS
*DISCUSSION
*LITERATURE CITED

The model:
Assume first that only one QTL segregates in the mapping population. Consider the genomic segment carrying this QTL (with alleles Q and q) flanked by markers M1/m1 and M2/m2, with recombination rates r1 and r2 in intervals M1/m1Q/q and Q/qM2/m2. On the basis of the marker scores and the measurements of the trait complex x = (x1, x2, ... , xn), we should test whether or not variation of any trait of x indeed depends on the interval M1/m1M2/m2 and identify the corresponding locus Q/q. The expected joint distributions of the traits x in each of the marker groups, Um1m2 =U1(x), UM1m2(x) = U2(x), Um1M2(x) = U3(x), and UM1M2(x) = U4(x), can be written as

where the proportions {pi}i = {pi}i(r1, r2) depend on unknown recombination rates r1 and r2 and mode of interference. The specification of the n-dimensional densities fqq(x) and fQq(x) depends on the assumptions made about the genetic control of the traits. The simplest case of additive control can be represented by the model

where x = (x1, ... , xn) is the vector of phenotype scores for an arbitrary individual, e = (e2, ... , en) is a vector of random variables that obey multivariate normal distribution with zero expectations for all coordinates and (residual) variance-covariance matrix {Sigma}R = {sij}, m is the vector of trait means, d is the vector of the effects of substitution at the Q/q locus with respect to mean values of x, i.e., dxi = µxi(Qq) - µxi(qq), and gq denotes the genotype at locus Q/q(gq = -1 for qq and 1 for Qq).

Expected improvement owing to multiple-trait analysis:
As in the bivariate case, the QTL detection power should depend on the total contribution of the QTL to multivariate phenotypic variation (VPh) of the correlated trait complex. If VR is the multivariate residual variation (within the QTL groups) and VG is the combined between-QTL-group discrepancy, then

(4)

In the case of noncorrelated traits, the improvement is due to the "Euclidean effect," which grows with the number of traits:

(5)

Clearly, in the general case of correlated traits, the pure Euclidean contribution is only a part of the total effect, so that H2Eu < H2T. Note that an analogue of Equation 5 can be obtained by canonical transformation of the initial trait space (with the within-group covariance matrix associated with the QTL under consideration), allowing the evaluation of the total effect as defined by Equation 4. Then, the multivariate effect of the QTL will be manifested as in Equation 5, but with relative effects (di/{sigma}i) in the new coordinate system. Moreover, using scale transformations x'i = and corresponding angular transformations, one can map the multivariate space into another multivariate space where the QTL affects only one trait, with H2D = being equal to the total contribution of the QTL, as in Equation 4 (where D is the total multivariate effect of the QTL).

The short review in the Introduction indicates that correlation between traits may be no less (if not more) an important factor affecting the detection power of multitrait QTL analysis. Therefore, it is of great interest to evaluate the contribution of correlations between the traits to H2T in Equation 4. Consequently, in the following illustrations, we present the expected improvement due to the Euclidean effect and the additional contribution due to correlations. Moreover, although no effect is expected from correlations if all effects di are 0, situations are possible where for only a small subset of traits di != 0, the remaining traits are still very informative because of their correlations to the foregoing traits (the simplest such example is provided in Fig 1B).

The numerical procedures of interval analysis:
The distinctive feature of our analysis is that all the multivariate transformations are interval-specific (as can be seen from procedures 1 and 2 described below; see also KOROL et al. 1987 Down, KOROL et al. 1995 Down, KOROL et al. 1998A Down; JIANG and ZENG 1995 Down; RONIN et al. 1998 Down), in contrast to the aforementioned attempts based on canonical transformation applied to the entire mixed distribution (WELLER et al. 1996 Down; MANGIN et al. 1998 Down). To explain why this is important, let us consider the simplest situation with a mapping population polymorphic for two unlinked QTL, say Q1/q1 and Q2/q2. A double haploid (or recombinant inbred, backcross, etc.) population will consist of four groups, such as Q1Q1Q2Q2, Q1Q1q2q2, q1q1Q2Q2, and q1q1q2q2. Assume that the residual (nongenetic) multitrait variation is the same in all four groups and can be described by a variance-covariance matrix {Sigma}R.

Two possibilities for incorporating the joint variation of the traits exist when single-QTL mapping analysis is conducted on the basis of markers of one chromosome. These are (a) to consider the general variance-covariance matrix of the traits, which differs from {Sigma}R due to the contribution of both QTL (the higher the individual effects of Q1/q1 and Q2/q2, the higher the difference); and (b) to consider the residual variation for each QTL as a combined result of nongenetic variation and the contribution of all other QTL excluding the one under consideration. The second possibility provides a relevant description of the residual variation for each QTL.

Two different approximated procedures, giving very similar results, were employed to implement this approach. In both, the LOD score serves as the major criterion in interval mapping; the steps of evaluating the QTL effects and QTL position are separated. Both are based on our earlier maximum-likelihood approach (KOROL et al. 1995 Down). Although the proposed procedures are only approximations of the full procedure, their major advantage is that they allow treatment of a large number of traits simultaneously.

Procedure 1: For each interval, a five-step procedure is conducted.

  1. The vector of mean trait values in alternative QTL groups defined by flanking markers M1M2 and m1m2 is evaluated.

  2. The same groups are used to define the elements of the residual (for the current ith interval) covariance matrix, {Sigma}Ri. Throughout this article, we assume no variance-covariance effect (but see KOROL et al. 1995 Down, KOROL et al. 1996A Down), so that {Sigma}R(QQ) = {Sigma}R(qq) and {Sigma}Ri(QQ) = {Sigma}Ri(qq).

  3. Transformation of the trait space, as described earlier, reduces the problem to a single-trait analysis. This step includes solving the problem of eigenvalues and eigenvectors of matrix {Sigma}R followed by scale and angular transformations, resulting in a new space with all effects being absorbed by only one variable ("integral" trait; see also ALLISON et al. 1998 Down).

  4. For the resulting variable, a single-trait analysis is conducted, with the likelihood function being dependent on four parameters, {theta} = (µ, D, {sigma}, r), where µ, D, {sigma}, and r stand for the mean value of the new trait, total substitution effect, residual standard variation, and recombination rate from the left marker, respectively.

  5. After getting the estimates, back transformations can be conducted, making it possible to get more precise estimates of mean values of the QTL groups. Consequently, the analysis could be repeated from step (2) until a convergent result is obtained.

Procedure 2: This is a simplified version of procedure one. It includes three steps and gives approximated results compared with those of procedure 1. However, the differences appear to be very small. For each interval, the three-step analysis is conducted.

  1. The vector of mean trait values in alternative QTL groups defined by flanking markers M1M2 and m1m2 is evaluated.

  2. The same groups are used to define the elements of the residual (for the current ith interval) covariance matrix, {Sigma}Ri.

  3. The entire sample is used to calculate the conditional maximum-likelihood estimate of the QTL position within the interval with all other parameters being fixed at the estimates obtained at steps (1) and (2).

Clearly, two factors influence the results obtained by this procedure. First, the estimates of the QTL effects will be biased downward owing to undetectable double recombinants among the parental (for the flanking markers) haplotypes. With interval size of ~10–15 cM this danger is negligible unless high negative interference is characteristic of multiple exchanges in the considered region of the genome. The second factor results in a slight reduction of the sample size: when the QTL effects and the residual covariance matrix are determined according to the foregoing steps (1) and (2), the recombinants for the flanking markers are ignored. Consequently, the sampling error of the estimates is increased by a factor 1/, where r is the rate of recombination between the flanking markers; for an interval of 10–15 cM the loss of precision is ~5.4–8.5%.

Monte Carlo simulations:
For mapping a population of the dihaploid (or recombinant inbred, backcross, etc.) type, 200 individuals were simulated with one, two, and three unlinked QTL and a trait complex including up to 10 traits. For each chromosome six equidistant markers were simulated, with recombination rate r = 0.1 between the neighbors and no interference and QTL residing in the middle of the third interval. To get the critical level of the test statistics two approaches were employed: Monte Carlo simulations with parameters corresponding to H0 (no QTL in the simulated chromosome) and permutation of the data set corresponding to H1. In both cases, 5000 runs were assayed for each situation. To evaluate the detection power and the precision of the estimated QTL effects and chromosomal position, 500 runs were assayed for each situation. In some isolated examples the numbers of permutation and bootstrap runs were increased to 10,000 and 1000, respectively. The majority of calculations were conducted using the multiple-trait algorithms implemented in the MultiQTL package (http://www.MultiQTL.com). With this program, 1000 permutation runs or 1000 bootstrap runs using a single-QTL model to analyze a 10-variate trait complex for a chromosome with 20 markers and population size 150 genotypes takes a Pentium III 600 MHz ~3.5 min or 2 min, respectively.


*  RESULTS
*TOP
*ABSTRACT
*THE PROPOSED METHOD
*RESULTS
*DISCUSSION
*LITERATURE CITED

QTL detection power and mapping resolution:
Example 1: Improved quality of QTL analysis: To demonstrate the contribution of different factors to the detection power and mapping resolution of multivariate QTL mapping, a series of variants were simulated that differ with respect to the number of traits (from 1 to 10), the type of the covariance matrix, the number of QTL, the effects of the target QTL(s) on the traits, etc. These were based on four 10 x 10 covariance matrices {Sigma}R (Table 1), with a common vector of alternating effects d = (0.25, -0.25, 0.25, -0.25, ... ) and the same residual standard variation si = 1.0 for all traits. Table 2 represents a diversity of examples with a single QTL: covariance matrices of the majority of variants were derived as major minors of corresponding dimension of the matrix for the 10-trait problem.


 
View this table:
In this window
In a new window

 
Table 1. Residual covariance matrices and QTL effects used in the simulations


 
View this table:
In this window
In a new window

 
Table 2. Improvement of the efficiency of QTL mapping owing to joint analysis of multiple-trait complexes

As expected, the increase in H2T (see Equation 4) owing to higher information content of multivariate complexes of greater dimension than those of lower dimension brought about an appreciable improvement in the quality of the QTL mapping analysis. This is manifested in higher LOD values and, correspondingly, a better detection power and higher precision of QTL mapping (Table 2). As expected, the improvement strongly depends on correspondence between the QTL effects and the signs of correlation coefficients (e.g., compare cases 2 and 6). The same mechanism appeared to work already in the two-trait analysis, as manifested by the inequality dxdyRxy < 0 being the necessary condition for ELOD(x, y) > ELOD(x) and/or ELOD(x, y) > ELOD(y) to hold (KOROL et al. 1995 Down). The remarkable fact is that it makes no difference whether the increase in H2T is caused by correlation between the traits or by the Euclidean contribution H2Eu (Fig 2). Indeed, the variants represented in Fig 2 differ qualitatively. These include the number of traits taken from the entire 10-dimensional complex, the values and signs of the effects of the QTL on the selected traits, the values of correlations, and even the covariance matrices in general (e.g., numbers 3, 4, 6, 10, 11, 13–15, 17, and 18). In spite of this diversity, the detection power (P) and mapping precision ({sigma}L) display a unified pattern across variants reflected in the curves P(H2) and {sigma}L(H2) in Fig 2.



View larger version (94K):
In this window
In a new window
Download PPT slide
 
Figure 2. Multivariate heritability as a predictor of the LOD value (affecting QTL detection power) and mapping resolution. Right-hand scale, ELOD ({circ}), left-hand scale, SL (standard deviation of the estimated QTL position; •). The graphs are based on Monte Carlo simulations described in Table 1 and partially represented in Table 2.

The results presented in Table 2 and Fig 2 indicate the high potential for improving the QTL detection power and mapping resolution by employing the information contained in the multivariate trait complex without increasing the sample size. Thus, for the same data set corresponding to the first matrix (case 10) with no di/{sigma}i exceeding 0.25, the detection power grows from 13 to 100% (at significance level 0.01) for single- and 10-trait analyses, respectively. Especially pronounced is the improvement of mapping precision: standard deviation of the estimated QTL position, {sigma}L, decreases from 14.8 cM in single-trait, to 9.3 cM in 2-trait, to 4.0 cM for the matrix A defined in Table 1 (compare cases 1, 2, and 10), or correspondingly, 14.8, 9.4, and 1.4 cM for the matrix C (compare cases 1, 16, and 18). Clearly, this trend reflects the fact that the increasing H2 caused by joint multiple-trait analysis results not only in higher LOD values and detection power, but also in increased probability to find the QTL in the true interval (interval 3; see footnote a in the right column of Table 2). At the level of an individual experiment, the increased resolution derives from the effect of H2 on the form of the LOD as a function of chromosomal position (l): at high H2 values the function LOD(l) is more steep than at small H2 (Fig 3). Clearly, increased precision of the estimated QTL position should also allow a more accurate estimation of the QTL effect. This is indeed the case, as illustrated by Fig 4. The increase in H2 accompanied by a more strict slope of the LOD function may justify a saturation of the chromosomal region in the detected QTL by additional markers. This may allow a reduction of the chances of incorrect QTL location and finer QTL mapping, as well as an attempt at resolving the pleiotropy-linkage alternative (JIANG and ZENG 1995 Down; ALMASY et al. 1997 Down; LEBRETON et al. 1998 Down; KOROL et al. 1998A Down; RONIN et al. 1999 Down).



View larger version (42K):
In this window
In a new window
Download PPT slide
 
Figure 3. The dependence of the LOD function on the number of traits. The numbers in the solid circles indicate the number of traits; the simulated position of the QTL is marked by an arrowhead (based on the last example of Table 1).



View larger version (34K):
In this window
In a new window
Download PPT slide
 
Figure 4. Improved correspondence between the simulated and estimated QTL effects in multiple-trait analysis as compared to single-trait analysis. (a) Single-trait analysis; (b) 10-trait analysis (based on the first example of Table 1).

It is noteworthy that ELOD calculated on the basis of H2T appeared to be a very good predictor of the averaged LOD obtained from Monte Carlo simulations (see the column LODm in Table 2). This indicates that Equation 1 obtained by LANDER and BOTSTEIN 1989 Down for a single trait, and generalized by KOROL et al. 1995 Down for two-trait analysis, also holds in a general multivariate case. The last statement follows from the fact that heritability of a complex of noncorrelated traits with a single QTL affecting only one trait can be represented as H2D = , where D is the multivariate effect and {sigma}2 the residual variance for the "integral" trait described in The numerical procedures of interval analysis.

Example 2: Interval-specific estimation of the covariance matrix: Another comment concerns the interval specificity that is a characteristic of our approach to defining the elements of the residual covariance matrix, {Sigma}R. If, instead of that, one uses the total (interval-independent) covariance matrix defined on the entire sample, the efficiency of mapping may be lowered. The numerical example with a three-trait complex shown in Table 3 illustrates the difference between the two approaches. One can easily see that if the approach based on total covariance matrix is employed, instead of our interval-specific procedure, a reduction in the LOD value (hence lower detection power) and increase in the bias ({delta}) and standard variation ({sigma}) of the estimated QTL effects (di) and, especially, chromosomal position (L), may be obtained. Note that in the foregoing example only a single QTL was simulated in the mapping population. The difference between the methods derives from the noncorrespondence between the residual correlation matrix and the directions of the pleitropic effects. Nevertheless, in some cases, where the total covariance matrix does not differ strongly from {Sigma}R, the loss will be less pronounced (see MANGIN et al. 1998 Down).


 
View this table:
In this window
In a new window

 
Table 3. Comparison of the QTL mapping results obtained by the proposed method (based on interval-specific determination of the residual covariance matrix {Sigma}R) and by using the total covariance matrix ({Sigma}total)

Example 3: Multiple QTL: We now illustrate the efficiency of the proposed algorithm in situations with more than one QTL segregating in the mapping population. We simulated two and three identical unlinked QTL with the residual 10 x 10 covariance matrix equal to that of example 10 and the same pleiotropic effects (see Table 1 and Table 2). As before, 500 Monte Carlo runs were made. The results (Table 4) confirm the previous conclusion: a dramatic improvement can be achieved by use of joint analysis of the correlated traits. Note that segregation for one or two additional QTL resulted in an increase in the residual variances (as compared with Example 1). Consequently, we obtained a slightly lower detection power and a lower mapping precision. For the 10-trait analysis, the standard deviation of the estimated QTL position (SL) increased from 4.0 to 5.0–5.6 cM in case of two QTL and to 5.8–6.7 cM in the case of three QTL. Clearly, this reduction in mapping precision can be recovered by a composite interval mapping approach (ZENG 1994 Down; JANSEN and STAM 1994 Down) but within the framework of multiple-trait analysis.


 
View this table:
In this window
In a new window

 
Table 4. The effect of the number of traits on efficiency of QTL mapping analysis with multiple QTL segregating in the mapping population

Significance of the detected effects:
Testing for significance is a difficult problem in QTL mapping analysis, especially when multiple intervals and/or multiple traits are involved (LANDER and BOTSTEIN 1989 Down; LANDER and KRUGLYAK 1995 Down; WELLER et al. 1998 Down). To get the critical level of the test statistics in the foregoing analysis we employed Monte Carlo simulations with parameters corresponding to H0 (no QTL in the chromosome, with 5000 runs per each variant). Clearly, this technique can also be used for real data analysis, but it would be much more preferable to take into account the distribution properties of the real data set. The best way to do this in testing significance is the permutation test (DOERGE and CHURCHILL 1996 Down). A few different, although related, questions about the significance of the results can be recognized in the multiple-trait procedure: (i) What is the significance level of the detected QTL?, (ii) which traits significantly contributed to the criterion (multivariate LOD score)?, and (iii) which traits depend significantly on the detected QTL? The difference between the second question and the third is caused by the fact that the information value of a trait may derive from its correlation to other traits of the complex, from the pleiotropic effect of the QTL on this trait, or from both these factors (see Fig 1).

Example 4: Selecting significant traits and effects: The foregoing aspects are illustrated in a simulated example with seven quantitative traits and a chromosome with five intervals (10 cM each) with a QTL residing in the middle of the third interval. The pleiotropic effects of the simulated QTL, the residual correlation matrix, and residual variances were as shown in Table 5. The results can be outlined as follows:

  1. To evaluate the significance of the QTL detected by using seven-dimensional mapping analysis, the entire vector of trait values was reshuffled relative to the marker scores (while retaining the structure within the trait complex). For each such permutated data set, the mapping procedure was applied, resulting in a corresponding value of the test statistics LOD score. This process was repeated many times (10,000 in our experiment). The significance of the H0 hypothesis (no effect of the considered chromosome on the multivariate trait complex) is calculated as the proportion of permutation runs that resulted in LOD values equal to or exceeding LOD* obtained on the nonpermutated data.


     
    View this table:
    In this window
    In a new window

     
    Table 5. Permutation test of significance for the contribution of the traits: the multitrait LOD and the pleiotropic effects of the QTL

  2. The second test aimed to evaluate the significance of contributions of each of the traits for the QTL detection power. This test is conducted separately for each trait. For this, the individual values of the trait under consideration are reshuffled relative to the remaining data (the other trait values and marker scores). The resulting data set is treated as before and the proportion of runs with LOD >= LOD* is used as the measure of significance of the trait contribution. The permutations are always performed regarding all the traits included in the model independently of the contribution value of the remaining traits. Clearly, some traits may prove to be insignificant because they contribute the same information as one (or a few) of the remaining traits. Thus, one can exclude insignificant traits from consideration by creating a new trait set that does not include the insignificant traits(s). This procedure should be applied by simple steps, excluding only one trait per step and repeating the permutation test for the remainder. The last warning is important because after excluding one of the traits at some step, the significance of contributions of the remaining traits may change.

  3. The same procedure as in (ii) can be used to test the significance of the QTL effect for each of the traits. Namely, we calculate the proportion of permutated cases where the estimated QTL effect for the considered trait xi fits the condition abs(di) >= abs(d*i), where d*i is the estimated effect on trait xi obtained on initial (not reshuffled) data.

In the example of Table 5, trait 7 displayed the lowest contribution and hence was removed after the first step. Reevaluation of the remaining complex revealed the next candidate to remove, trait 3, and then, similarly, trait 4. All the remaining traits (1, 2, 5, and 6) showed significant contribution. This trait complex provides also the narrowest confidence interval for the estimated QTL position ({sigma}L), as shown by the results of bootstrap analysis. The last result means that maintenance of excessive (noninformative) traits is not neutral, a reduced precision of the estimated QTL position being the penalty. Filtering out of the nonsignificant traits should affect the QTL detection power, but further reduction of the trait complex by removing the significant traits may result in a reduced power and lowered mapping precision (see the characteristics obtained for the last two trait combinations, 1, 2, 5, and, especially, 2, 5, 6).

An example of application to real data:
We illustrate the efficiency of the proposed approach using real data on a wheat mapping population characterized for 11 morphological quantitative traits. The experiment was performed on an F2/F3 mapping population derived from a cross between a highly stripe-rust-resistant wild emmer wheat Triticum dicoccoides (accession no. H52, from Mt. Hermon, Israel) and a T. durum cultivar, Langdon, released in North Dakota. The tetraploid wild emmer, T. dicoccoides, is the progenitor of cultivated wheat; hence, the genetic dissection of quantitative trait differences between the wild species and the cultivated crop is of great interest from the viewpoint of domestication evolution. It is also important for the ever-increasing utilization of T. dicoccoides as a rich genetic resource for wheat improvement. The molecular markers [microsatellites and amplified fragment length polymorphisms (AFLP)] were scored on 150 F2 individuals resulting in a rather dense genetic map (PENG et al. 2000 Down). The quantitative traits were scored on the selfed progeny in field trials conducted in Neve Yaar Agricultural Experimental Station, Israel, during the 1997–1998 cropping season. Eleven quantitative traits were scored on F3 progeny (for ~10 individual plants from each family): plant height (HT), plant heading date—the days from sowing to heading (HD); spike number/plant (SNP); spike weight/plant (SWP) including the grains, hulls, and rachis; single spike weight (SSW); kernel number/plant (KNP); kernel number/spike (KNS); kernel number/spikelet (KNL); 100-grain weight (GWH); grain yield/plant (YLD); and spikelet number/spike (SLS).

A detailed QTL description of the obtained QTL mapping results on these traits will be presented elsewhere (J. H. PENG, A. B. KOROL, T. FAHIMA, Y. I. RONIN and E. NEVO, unpublished results). Here we employ the obtained data only to illustrate the efficiency of the multitrait analysis, using as an example markers of chromosome 7A. With single-trait analysis applied separately to each of the traits, only one significant QTL was found on 7A, for trait GWH, with significance level ~0.01 (Table 6). This level should be corrected for multiple comparisons, taking into account the fact that the analyzed traits are correlated (e.g., by using the method based on factor analysis, as suggested by SPELMAN et al. 1996 Down). Therefore, the corrected significance will be even worse. The mapping precision evaluated by bootstrap analysis is not high ({sigma}L = 74 cM), as one would expect for the modest population size employed (n = 150). Therefore, it makes sense to attempt improvement of the mapping by utilizing the information contained in the entire trait complex, owing to possible pleiotropic effects of the putative QTL and/or correlations between GWH and the remaining traits. This was done exactly in the same way as described in the foregoing simulated example presented in Table 5. First, the entire complex of 11 traits was analyzed and then the traits that did not contribute significantly to the test statistics were removed. The results presented in Table 6 and Fig 5 show a more than twofold increase in the mapping precision ({sigma}L decreased from 74 to 30 cM) and an increase in detection power that is especially clear at higher significance level (98.9% vs. 42.9%).



View larger version (47K):
In this window
In a new window
Download PPT slide
 
Figure 5. Joint analysis of 11 traits scored in F2/F3 mapping population of wheat Triticum durum x T. dicoccoides using markers of chromosome 7A. The results of removing nonsignificant traits are presented. (a) LOD score distribution along chromosome 7A for the 5-trait complex (GWH, YLD, HD, HT, and SWP). (b) Interval distribution of the maximum LOD values along chromosome 7A based on 1000 bootstrap runs. Both graphs are outputs of the MultiQTL package (http://www.MultiQTL.com).


 
View this table:
In this window
In a new window

 
Table 6. Interval analysis of a multitrait complex that includes 11 morphological traits scored in F2/F3 mapping population of wheat Triticum durum x T. dicoccoides (PENG et al. 2000 Down)


*  DISCUSSION
*TOP
*ABSTRACT
*THE PROPOSED METHOD
*RESULTS
*DISCUSSION
*LITERATURE CITED

A multivariate generalization of our previous two-trait QTL mapping analysis (KOROL et al. 1987 Down, KOROL et al. 1995 Down, KOROL et al. 1998A Down; RONIN et al. 1995 Down, RONIN et al. 1999 Down; see also JIANG and ZENG 1995 Down) is proposed here. It is not difficult to extend this method to other situations (e.g., analyzing F3 populations), to deal with linked QTL (similar to the analysis of KOROL et al. 1998A Down; RONIN et al. 1999 Down), to combine it with selective genotyping design (RONIN et al. 1998 Down; HENSHALL and GODDARD 1999 Down), or to adopt composite interval mapping (JANSEN and STAM 1994 Down; ZENG 1994 Down). Especially promising may be its application to fine mapping (Y. I. RONIN, E. BRITVIN, E. NEVO and A. B. KOROL, unpublished results). Indeed, the dramatic increase in mapping resolution derived from using the entire multivariate complex, as compared with univariate or even bivariate analysis, effectively increases the score D2n that was found to affect the mapping resolution (DARVASI and SOLLER 1997 Down). Consequently, it becomes reasonable to saturate the revealed intervals by additional markers even at modest population sizes like 200–500 individuals; usually this is pointless because with small effects no increase in precision is expected by addition of new markers to the map (DARVASI et al. 1993 Down). Therefore, the transition from a single- or even two-trait analysis to treatment of genuine multiple-trait complexes significantly improves all aspects of utilizing the mapping information contained in the data.

However, the application of multivariate complexes not only increases the QTL detection power, mapping resolution, and estimation accuracy but it may also increase the power of discriminating various important hypotheses that concern the genetic architecture of complex traits, such as linkage vs. pleiotropy (SCHORK et al. 1994 Down; JIANG and ZENG 1995 Down; ALMASY et al. 1997 Down; LEBRETON et al. 1998 Down; RONIN et al. 1999 Down), genetic interaction within and across QTL (additive vs. dominant or overdominant effects, and additive vs. epistatic effects or canalization; RONIN et al. 1999 Down; SHOOK and JOHNSON 1999 Down), and QTL-environment interaction (FRY et al. 1998 Down; KOROL et al. 1998B Down). Multivariate QTL analysis may be helpful in genetic dissection of such types of complex traits as multifactorial diseases (MANSFIELD et al. 1997 Down), development (WU et al. 1999 Down), longevity and aging (NUZHDIN et al. 1997 Down), behavior (PLOMIN and CRAIG 1997 Down; WEHNER et al. 1997 Down), fitness-related trait complexes and species differentiation (ZENG et al. 2000 Down), heterosis (XIAO et al. 1995 Down), marker-assisted breeding (LANDE and THOMPSON 1990 Down; VISSCHER et al. 1996 Down), characterizing the regulatory networks of structural genes (DAMERVAL et al. 1994 Down), bridging between gene-structure-and-function studies (e.g., when looking for functions of massively expressed sequence tags; LAHBIB-MANSAIS et al. 1999 Down), analyzing the genetic transmission system (breeding system, recombination, and mutation control; KOROL et al. 1994 Down; BERNACCHI and TANKSLEY 1997 Down), and so on.

As a not-so-remote analogy, one could compare the situation of multivariate QTL analysis with that characteristic of medical diagnoses: excluding simple situations, a good physician will never rely on one trait (symptom or analysis, etc.) Instead, he/she will try to take into account all available information concerning the patient. However, this does not mean that increasing the number of traits to be analyzed simultaneously will necessarily improve the quality of the QTL mapping results. A technical obstacle with high dimensionality is an increasing probability that many loci may affect the analysis along the chromosome, whereas a small-to-moderate population size could hardly justify fitting more than two or three linked QTL simultaneously. Another problem is the interpretation of the results. Therefore, in choosing the initial set of traits for joint QTL analysis, one may find it reasonable to restrict such sets by functionally related traits. The examples presented in this article, on both simulated and real data, show that maintenance of excessive traits in the model may be penalized. These concerns indicate that in spite of high potential and biological "compatibility" of the multiple-trait analysis to the main targets of QTL analysis, a lot of work remains to be done to fully extract the mapping information hidden in the collected data.

An additional complication that is worth mentioning is the possible effect of the model assumption on the obtained results. It was shown earlier that for testing for linkage, erroneous models may lead to valid tests for linkage (WRIGHT and KONG 1997 Down). For example, QTL mapping analysis may be quite robust to violations of the assumption of normality in single-trait situations (KOROL et al. 1996B Down) and more sensitive in multivariate ones. Likewise, the assumption of homoscedastic distributions (i.e., equal residual variances in QTL groups) that is usually applied automatically may be wrong, leading to reduced QTL detection power and biased estimates of parameters. On the contrary, if a correct model is fitted, this may increase the detection power and mapping accuracy compared to situations when no such disturbances exist. We demonstrated these effects earlier for single- and two-trait analysis (KOROL et al. 1995 Down, KOROL et al. 1996A Down). Especially important is the assumption of a single QTL per chromosome, which being violated may lead to the LOD score peaking in the wrong place (see KNOTT and HALEY 1992 Down; WRIGHT and KONG 1997 Down). For the two-trait case we found that joint analysis of correlated traits increases the power of the test aimed to discriminate between the single QTL and two-linked QTL situations (RONIN et al. 1999 Down). All these aspects should be taken into account in multivariate QTL analysis.

The described approach is implemented in the MultiQTL package (http://www.MultiQTL.com) for both single- and two-linked QTL models.


*  ACKNOWLEDGMENTS

We thank M. Soller, J. Weller, G. Churchill, and three anonymous referees for helpful comments and suggestions on the first version of the manuscript. This study was supported by the Israeli Science foundation (grants 02198 and 9048/99), the United States-Israel Binational Science Foundation (grant 4556), and the German-Israeli Cooperation Project (DIP project funded by the Internationales Büro Deutsch-Israelische des BMBF Projektkooperation).

Manuscript received February 20, 2000; Accepted for publication January 10, 2001.


*  LITERATURE CITED
*TOP
*ABSTRACT
*THE PROPOSED METHOD
*RESULTS
*DISCUSSION
*LITERATURE CITED

ALLISON, D. B., B. THIEL, P. ST. JEAN, R. C. ELSTON, and M. C. INFANTE et al., 1998  Multiple phenotype modeling in gene-mapping studies of quantitative traits: power advantages. Am. J. Hum. Genet. 63:1190-1201[Medline].

ALMASY, L., T. D. DYER, and J. BLANGERO, 1997  Bivariate quantitative trait linkage analysis: pleiotropy versus co-incident linkages. Genet. Epidemiol. 14:953-958[Medline].

AMOS, C. I., R. C. ELSTON, G. E. BONNEY, B. J. KEATS, and G. S. BERENSON, 1990  A multivariate method for detecting genetic linkage, with application to a pedigree with an adverse lipoprotein phenotype. Am. J. Hum. Genet. 47:247-254[Medline].

BERNACCHI, D. and S. D. TANKSLEY, 1997  An interspecific backcross of Lycopersicon esculentum x L. hirsutum: linkage analysis and a QTL study of sexual compatibility factors and floral traits. Genetics 147:861-877[Abstract].

BOOMSMA, D. I. and C. V. DOLAN, 1998  A comparison of power to detect a QTL in sib-pair data using multivariate phenotypes, mean phenotypes, and factor scores. Behav. Genet. 28:329-340[Medline].

DAMERVAL, C., A. MAURICE, J. M. JOSSE, and D. DE VIENNE, 1994  Quantitative trait loci underlying gene product variation: a novel perspective for analyzing regulation of genome expression. Genetics 137:289-301[Abstract].

DARVASI, A. and M. SOLLER, 1994  Selective DNA pooling for determination of linkage between a molecular marker and a quantitative trait locus. Genetics 138:1365-1373[Abstract].

DARVASI, A. and M. SOLLER, 1997  A simple method to calculate resolving power and confidence interval of QTL map position. Behav. Genet. 27:125-132[Medline].

DARVASI, A., A. WEINREB, V. MINKE, J. I. WELLER, and M. SOLLER, 1993  Detection marker-QTL linkage and estimating QTL gene effect and map location using a saturated genetic map. Genetics 134:943-951[Abstract].

DOERGE, R. W. and G. A. CHURCHILL, 1996  Permutation tests for multiple loci affecting a quantitative character. Genetics 142:285-294[Abstract].

FRY, J. D., S. V. NUZHDIN, E. G. PASYUKOVA, and T. F. MACKAY, 1998  QTL mapping of genotype-environment interaction for fitness in Drosophila melanogaster.. Genet. Res. 71:133-141[Medline].

HENSHALL, J. M. and M. E. GODDARD, 1999  Multiple-trait mapping of quantitative trait loci after selective genotyping using logistic regression. Genetics 151:885-894[Abstract/Free Full Text].

JANSEN, R. C. and P. STAM, 1994  High resolution of quantitative traits into multiple loci via interval mapping. Genetics 136:1447-1455[Abstract].

JIANG, C. and Z-B. ZENG, 1995  Multiple trait analysis of genetic mapping for quantitative trait loci. Genetics 140:1111-1127[Abstract].

KNOTT, S. A. and C. S. HALEY, 1992  Aspects of maximum likelihood methods for mapping of quantitative trait loci in line crosses. Genet. Res. 60:139-151.

KOROL, A. B., I. A. PREYGEL, and N. BOCHARNIKOVA, 1987  Linkage between loci of quantitative traits and marker loci. 5. Simultaneous analysis of a set of markers and quantitative traits. Genetika 23:1421-1431. (English translation in Sov. Genet. 1988, 23: 996–1004).[Medline].

KOROL, A. B., I. A. PREYGEL and S. I. PREYGEL, 1994 Recombination Variability and Evolution. Chapman & Hall, London.

KOROL, A. B., Y. I. RONIN, and V. M. KIRZHNER, 1995  Interval mapping of quantitative trait loci employing correlated trait complexes. Genetics 140:1137-1147[Abstract].

KOROL, A. B., Y. I. RONIN, Y. TADMOR, A. BAR-ZUR, and V. M. KIRZHNER et al., 1996a  Estimating variance effect of QTL: an important prospect to increase the resolution power of interval mapping. Genet. Res. 67:187-194.

KOROL, A. B., Y. I. RONIN, and V. M. KIRZHNER, 1996b  Linkage between loci of quantitative traits and marker loci. Resolution power of three statistical approaches in single marker analysis. Biometrics 52:426-441[Medline].

KOROL, A. B., Y. I. RONIN, P. HAYES, and E. NEVO, 1998a  Multi-interval mapping of correlated trait complexes: simulation analysis and evidence from barley. Heredity 80:273-284.

KOROL, A. B., Y. I. RONIN, and E. NEVO, 1998b  Approximated analysis of QTL-environmental interaction with no limits on the number of environments. Genetics 148:2015-2028[Abstract/Free Full Text].

LAHBIB-MANSAIS, Y., G. DALIAS, D. MILAN, M. YERLE, and A. ROBIC et al., 1999  A successful strategy for comparative mapping with human ESTs: 65 new regional assignments in the pig. Mamm. Genome 10:145-153[Medline].

LANDE, R. and R. THOMPSON, 1990  Efficiency of marker assisted selection in the improvement of quantitative traits. Genetics 124:743-756[Abstract].

LANDER, E. S. and D. BOTSTEIN, 1989  Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121:185-199[Abstract/Free Full&nb