Abstract
A multiple-trait QTL mapping method using least squares is described. It is presented as an extension of a single-trait method for use with three-generation, outbred pedigrees. The multiple-trait framework allows formal testing of whether the same QTL affects more than one trait (i.e., a pleiotropic QTL) or whether more than one linked QTL are segregating. Several approaches to the testing procedure are presented and their suitability discussed. The performance of the method is investigated by simulation. As previously found, multitrait analyses increase the power to detect a pleiotropic QTL and the precision of its location estimate. With enough information, discrimination between alternative genetic models is possible.
FREQUENTLY in quantitative trait loci (QTL) mapping experiments a number of phenotypic traits are scored. The common procedure has been to search for QTL trait by trait. The traits, however, are often genetically correlated and, hence, the same QTL may affect two or more traits. Where a QTL has pleiotropic effects on two or more traits, using information from the traits simultaneously should improve the power to detect the QTL and the precision of the location estimate. Alternatively, separate QTL for two different traits may map to a similar location, and it is important to be able to test whether the same QTL could be affecting several traits or whether different linked QTL explain the observations. An obvious extension of the single-trait methods, therefore, is multitrait analysis, which can exploit this potential gain in power and precision and allows additional hypotheses about the genetic control of multiple traits to be tested.
Several approaches to multitrait analyses have been proposed. Weller et al. (1996) and Mangin et al. (1998) have both proposed the use of canonical transformations of the original data followed by single-trait analyses on the resulting independent traits. The advantage of using an approach based on canonical transformation is that after transformation, existing single-trait software can be used for QTL analyses. If all canonical traits are analyzed, results for a given location can be back-transformed to give estimates of the effects of a putative QTL at that location on the original traits. Using canonical transformation is not wholly satisfactory, however, as a transformation that produces traits that are either phenotypically or genetically uncorrelated does not ensure that QTL only influence a single canonical trait. This is because different QTL affecting a trait may have different patterns of pleiotropy (e.g., some QTL affecting only one trait, others affecting two or more traits). In this case, it is not possible to find a canonical transform that ensures all QTL only influence one canonical trait. Consequently, it cannot be assumed that QTL found to be affecting two different canonical variables in the same location are actually different QTL, as stated by Weller et al. (1996). One could only conclude that QTL affecting different canonical traits are indeed different if the genetic correlations between traits are the same as the phenotypic correlations and all individual QTL follow the same pattern. Indeed, if all individual QTL have the same pattern of effects on two or more traits, one would expect the genetic correlation between traits to be plus or minus unity or zero.
Maximum-likelihood (ML) approaches have been proposed for a number of different experimental designs, for example, for crosses between inbred lines (Jiang and Zeng 1995), for sib-pair analyses (Eaves et al. 1996), and for half-sib families (Ronin et al. 1995). The main drawbacks of ML approaches are that they are computationally intensive, require specialized software, and can lack robustness. In some circumstances, these multitrait approaches have been shown to have increased power for detecting a QTL and improved precision of parameter estimates. The testing procedure, however, is slow to carry out with maximum likelihood, as standard test-statistic distributions under the null hypothesis are not usually applicable because multiple correlated tests are being performed. Also, extensions to include many QTL are cumbersome within the maximum-likelihood framework.
Recently Wu et al. (1999) used a multitrait least-squares method to analyze tiller number in rice measured at different times during development. Using a population of recombinant inbred lines and treating the measurement at each time as a different trait, they consider a model with a QTL with an effect on each trait. This is equivalent to the pleiotropic model presented in this article. They compare this with single-trait analyses of tiller number at the different times and find that the multitrait analyses are not necessarily more powerful when considering a single trait measured on several occasions over time, but suggest that the benefit comes in estimating the QTL location more precisely. They do not consider any other multitrait models.
Cheverud et al. (1997) and Lebreton et al. (1998) have considered the problem of discriminating between linked QTL and a pleiotropic QTL. In both cases, however, the methods are based on single-trait analyses that may lack power compared with a multitrait alternative. Cheverud et al. (1997) propose a likelihood-ratio test to compare locations estimated for each trait separately vs. a weighted average location. The properties of this test have not been formally evaluated, and it is not clear that an estimated location based on a weighted average will be the same as that obtained if the traits were analyzed simultaneously in a multitrait analysis. The method proposed by Lebreton et al. (1998) does not rely on the use of likelihoods. Using a bootstrap approach, multiple sets of data are resampled and analyzed and the linked-QTL hypothesis is rejected (in favor of the null hypothesis of pleiotropy) if the confidence interval for the distance between the estimated QTL locations (one for each trait) includes zero. This approach examines traits a pair at a time and results would become difficult to interpret if many traits were considered. Almasy et al. (1997) test for pleiotropy vs. linkage in a bivariate analysis using an identity-by-descent method with maximum likelihood. The extension to multiple traits is not clear and the use of maximum likelihood means that specialized software is required and analyses will be more time consuming than a least-squares alternative.
Testing procedures for multitrait analyses have not been well explored. For example, it is not clear whether one should start with one multitrait analysis of all traits, or with a series of single-trait analyses, or concentrate on multitrait analyses of subsets of traits clustered, for example, on their phenotypic or genetic correlations. The answer to such questions is likely to depend both on the objectives of the study (for example, is one interested in maximizing power to detect QTL that jointly affect several traits or, alternatively, to detect QTL with effects that run counter to a particular genetic correlation) and on the actual underlying genetic structure. Part of the reason for the difficulty in pursuing such questions is that performing replicated simulation studies is time consuming when using maximum-likelihood approaches.
In this article, we formally describe the implementation of a straightforward multitrait least-squares analysis for QTL detection and location, including models for pleiotropic and for linked QTL, and we explore alternative testing procedures. The basis of this approach is the single-trait analyses proposed by Haley and Knott (1992) and Haley et al. (1994). We have developed and applied the method to a three-generation pedigree where the QTL can be assumed to be fixed in the grandparental generation. The general principles, however, are applicable to the wide range of different population structures appropriate for QTL mapping and amenable to least-squares analysis (see, for example, Knott et al. 1996, 1997; van Kaam et al. 1998). Alternative approaches would be required for more general pedigree structures with many different relationships and multiple generations.
METHOD
Here we describe the use of standard multiple-trait multivariate regression for the detection of QTL. The approach is described for a population based on the F2 generation of a cross between two lines, where the original lines can be assumed to be homozygous for any QTL of interest. In its simplest form this could be a true F2 from a cross between inbred lines; alternatively, following procedures presented by Haley et al. (1994), the initial lines may be segregating at the markers although fixed for alternative alleles at QTL of major effect. Traits are recorded on the F2 generation and all three generations are genotyped at the markers. We illustrate the method by reference to data on two traits, but the extension to multiple traits is straightforward.
The analysis is in two steps. In the first step, at locations throughout the genome, the probability of each F2 individual being each of the four possible genotypes (accounting for maternal or paternal origin of the alleles) is calculated based on observed marker genotypes. Note that in an inbred line cross the two heterozygous genotypes are indistinguishable. In the second step these probabilities are used as independent variates in a model to analyze the phenotypic data. The approach presented in this article replaces the single-trait least-squares analyses used in the second step by Haley and Knott (1992) and Haley et al. (1994) with a multiple-trait analysis.
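The link between the two steps can be illustrated with a minimal sketch of how genotype probabilities at one location might be turned into regression variates. The column ordering of the probability array and the function name `qtl_coefficients` are assumptions made for illustration; the additive coefficient is taken as P(QQ) minus P(qq) and the dominance coefficient as the probability of being heterozygous, in the spirit of Haley and Knott (1992).

```python
import numpy as np

def qtl_coefficients(geno_probs):
    """Convert genotype probabilities into additive and dominance variates.

    geno_probs: array of shape (n, 4) holding, for each individual,
    P(QQ), P(Qq), P(qQ), P(qq) at one putative QTL location.
    This column ordering is an assumption of the sketch.
    """
    p = np.asarray(geno_probs, dtype=float)
    additive = p[:, 0] - p[:, 3]    # P(QQ) - P(qq)
    dominance = p[:, 1] + p[:, 2]   # probability of either heterozygote
    return additive, dominance

# Three hypothetical individuals: fully informative, uninformative, mostly qq.
probs = np.array([[1.00, 0.00, 0.00, 0.00],
                  [0.25, 0.25, 0.25, 0.25],
                  [0.00, 0.10, 0.10, 0.80]])
a, d = qtl_coefficients(probs)
```

These variates are what would be placed in the location-specific columns of the design matrix in the second step.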
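Basic model: The extension of the single-trait to the multitrait analysis is straightforward and standard statistical procedures can be followed. The basic model is
$$Y = XB + E,$$
where Y is the matrix of observations (one row per individual, one column per trait), X is the design matrix for the fixed effects, covariates, and QTL coefficients, B is the matrix of effects to be estimated (one column per trait), and E is the matrix of residuals.
The solutions to the equations can be obtained as usual as
$$\hat{B} = (X'X)^{-1}X'Y.$$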
For a given location the solutions and standard errors are identical to those obtained fitting the same model to the traits separately; the advantage of the multitrait approach comes in the testing procedure.
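The statement that the multitrait solutions match the trait-by-trait solutions can be checked numerically. The following is a minimal sketch on simulated data; the sample size, effects, and variable names are illustrative assumptions, and `numpy.linalg.lstsq` stands in for whatever least-squares routine is actually used.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical additive coefficients for fully informative F2 genotypes.
a_coef = rng.choice([-1.0, 0.0, 1.0], size=n, p=[0.25, 0.5, 0.25])
X = np.column_stack([np.ones(n), a_coef])        # mean and additive effect
B_true = np.array([[10.0, 5.0],                  # trait means
                   [0.5, 0.25]])                 # additive effects
Y = X @ B_true + rng.normal(size=(n, 2))         # two traits, unit residual SD

# Multitrait least squares: all traits solved in one call.
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Single-trait least squares: one call per trait.
b1, *_ = np.linalg.lstsq(X, Y[:, 0], rcond=None)
b2, *_ = np.linalg.lstsq(X, Y[:, 1], rcond=None)

# Residual SS matrix; its diagonal holds the single-trait residual SS,
# and its off-diagonal elements are what the multitrait tests add.
RSS = Y.T @ Y - B_hat.T @ X.T @ Y
```

The columns of `B_hat` agree with `b1` and `b2`, so any gain from the multitrait analysis has to come from the off-diagonal elements of the residual SS matrix used in testing.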
Significance tests: Focusing on the situation with only two traits, each being controlled by a maximum of one QTL in the linkage group being considered, we are primarily interested in two types of test: (1) Is there evidence for QTL in this linkage group affecting the traits? and (2) If there is evidence for QTL affecting both traits, are there two linked QTL (one affecting each trait) or one pleiotropic QTL? For both of these tests the significance thresholds cannot be obtained from standard tables.
Evidence for QTL? For both traits we consider the test of control by a single QTL vs. no QTL in the linkage group. The procedure is identical to that for single-trait analyses, in that the model given above, with a single QTL, is fitted for a given location in the genome and then repeated for different locations until the whole genome has been scanned. At each location the residual sums of squares (SS) matrix, which we denote RSSp, can be calculated as (Y′Y – B̂′X′Y), where the X matrix contains columns for the fixed effects and covariates that are constant throughout the genome and additional columns associated with the QTL at a given location that are not the same over different locations. The B matrix contains the relevant estimates. The location giving the best fit (see below) is the most likely location for a QTL. Many locations may give evidence for a QTL. Using results from a single scan of the genome, a number of alternative tests can be proposed. The procedure is detailed here and the tests are summarized in Table 1.
Single-trait test: For each trait separately, from the relevant diagonal element of the SS matrix RSSp, we can find the location giving the smallest residual SS when fitting the QTL. This is the most likely location for a single QTL that affects only this trait. The F-ratio of the mean square explained by fitting the QTL to the residual MS can be used to test whether the effect is significant. This is identical to the single-trait analyses described by Haley et al. (1994) and Haley and Knott (1992). For uniformity across the single- and multiple-trait analyses in this article, we present results as approximate likelihood ratios. For single-trait analyses, following Seal (1964), these can be obtained as –{resdf –½(2 – testdf)}ln(RSSp/RSSr), where resdf are the residual degrees of freedom after fitting the QTL and testdf are the degrees of freedom for the test being performed (i.e., the degrees of freedom used to model the QTL effect, e.g., one for an additive QTL with no dominance component fitted). RSSp is the relevant SS from the residual SS matrix having fitted the QTL and RSSr is the SS obtained when fitting only fixed effects and any covariates but not the QTL. RSSr is constant across the genome. This test statistic is approximately equal to testdf × F. For a given location in the genome and under the null hypothesis of no QTL, the test statistic is expected to be distributed approximately as chi square with degrees of freedom equal to testdf.
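As a worked illustration of the single-trait statistic and its closeness to testdf × F, consider the following sketch. The residual SS values are invented for the example, and the function name `single_trait_lr` is hypothetical.

```python
import math

def single_trait_lr(rss_p, rss_r, resdf, testdf):
    """Approximate likelihood ratio for one trait at one location.

    rss_p: residual SS with the QTL fitted
    rss_r: residual SS with fixed effects and covariates only
    """
    return -(resdf - 0.5 * (2 - testdf)) * math.log(rss_p / rss_r)

# Hypothetical values: 400 individuals, mean plus one additive effect fitted.
rss_r, rss_p, resdf, testdf = 400.0, 396.0, 398, 1
lr = single_trait_lr(rss_p, rss_r, resdf, testdf)

# The corresponding F-ratio: MS explained by the QTL over the residual MS.
f_ratio = ((rss_r - rss_p) / testdf) / (rss_p / resdf)
```

For an effect of this size the two quantities differ only in the second decimal place; the agreement is expected to be looser for very large effects.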
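Multiple-trait tests (pleiotropic QTL model): The determinant of the residual SS matrix obtained fitting, at a given location, a QTL affecting both traits, |RSSp|, can be used to identify the location explaining the most variance in the two traits jointly. As stated above, for uniformity we present approximate likelihood-ratio tests, although other test statistics, such as Wilks' Λ, could be used. We can calculate the following statistic (following Seal 1964) to test for the presence of the QTL:
$$-\{\mathrm{resdf} - \tfrac{1}{2}(t + 1 - \mathrm{testdf})\}\,\ln\left(|RSS_p| / |RSS_r|\right),$$
where t is the number of traits and RSSr is the residual SS matrix obtained when fitting only fixed effects and covariates. With t = 1 this reduces to the single-trait statistic given above.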
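A determinant-based statistic of this kind can be sketched as follows. The multiplier used here is Bartlett's standard approximation, chosen because it reduces to the single-trait expression when there is one trait; that choice, and the function name `multitrait_lr`, are assumptions of the sketch and not necessarily the exact form used in the analyses reported here.

```python
import numpy as np

def multitrait_lr(RSS_p, RSS_r, resdf, testdf):
    """Approximate likelihood ratio from two residual SS matrices.

    RSS_p: (t, t) residual SS matrix with the QTL fitted
    RSS_r: (t, t) residual SS matrix with fixed effects and covariates only
    """
    t = RSS_p.shape[0]                       # number of traits
    # slogdet returns (sign, log|det|); log-determinants avoid overflow.
    _, logdet_p = np.linalg.slogdet(RSS_p)
    _, logdet_r = np.linalg.slogdet(RSS_r)
    return -(resdf - 0.5 * (t + 1 - testdf)) * (logdet_p - logdet_r)

# Hypothetical residual SS matrices for two traits.
RSS_r = np.array([[400.0, 100.0], [100.0, 400.0]])
RSS_p = np.array([[380.0, 95.0], [95.0, 390.0]])
stat = multitrait_lr(RSS_p, RSS_r, resdf=398, testdf=1)
```

Evaluating the function at every location and taking the maximum gives the best position for a QTL affecting the traits jointly.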
Table 1. Summary of the alternative tests performed
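Multiple-trait tests (linked-QTL model): A more general multitrait model is one in which each trait is affected by a different, single QTL. The SS matrix for this model can be obtained by finding the best location for each trait separately (as described for the single-trait tests). The X matrix containing only fixed effects and covariates is augmented with columns for each trait (or QTL) that simply contain the relevant function of the genotype probabilities (additive or dominance coefficients, for example) for the best location of that trait. The B̂ matrix contains the estimates obtained when fitting the best location alone for each trait and zero effect for all other locations. For example, in a model fitting only the mean and the additive effect at the putative QTL, the X and B̂ matrices would be
$$X = \begin{pmatrix} 1 & a_{11} & a_{12} \\ \vdots & \vdots & \vdots \\ 1 & a_{n1} & a_{n2} \end{pmatrix}, \qquad \hat{B} = \begin{pmatrix} \hat{\mu}_1 & \hat{\mu}_2 \\ \hat{a}_1 & 0 \\ 0 & \hat{a}_2 \end{pmatrix},$$
where a_ij is the additive coefficient of individual i at the best location for trait j, and μ̂_j and â_j are the estimates of the mean and of the additive effect for trait j. Because B̂ is not the joint least-squares solution for this X, the residual SS matrix for the linked-QTL model, RSSl, is calculated directly as (Y – XB̂)′(Y – XB̂).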
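The construction of the residual SS matrix for the linked-QTL model can be sketched for two traits as follows. Fitting each trait at its own best location and stacking the residuals is numerically the same as forming the augmented X and the zero-padded B̂ described above; the function name `linked_qtl_rss` and its arguments are assumptions made for illustration.

```python
import numpy as np

def linked_qtl_rss(Y, fixed, a1, a2):
    """Residual SS matrix when each of two traits has its own QTL location.

    Y:     (n, 2) phenotypes
    fixed: (n, k) columns for the mean, fixed effects, and covariates
    a1:    (n,) additive coefficients at the best location for trait 1
    a2:    (n,) additive coefficients at the best location for trait 2
    """
    X1 = np.column_stack([fixed, a1])
    X2 = np.column_stack([fixed, a2])
    # Each trait is fitted alone at its own location ...
    b1, *_ = np.linalg.lstsq(X1, Y[:, 0], rcond=None)
    b2, *_ = np.linalg.lstsq(X2, Y[:, 1], rcond=None)
    # ... and the residuals are combined so that the off-diagonal
    # element of the SS matrix is available for the multitrait test.
    resid = np.column_stack([Y[:, 0] - X1 @ b1, Y[:, 1] - X2 @ b2])
    return resid.T @ resid
```

When both traits have the same best location, the result coincides with the residual SS matrix of the pleiotropic model at that location.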
Two linked QTL or one pleiotropic QTL? If both traits are found to be affected by QTL in a given region, we are generally interested in determining whether it is the same QTL affecting both traits or whether there are more than one QTL. We consider two approaches for comparing these alternative hypotheses.
Likelihood-ratio test: A test for two linked QTL vs. one pleiotropic QTL can be accommodated within the multiple-trait framework. The test is based on the ratio of the determinant of the residual SS matrix from the best pleiotropic QTL model, |RSSp|, to that from the best linked-QTL model, |RSSl|. Converting to an approximate likelihood ratio, we wish the test to have 1 d.f. (for the additional location estimated in the linkage model). This can be achieved by setting t and testdf to 1 [or more generally for t traits, this test has (t – 1) d.f.]. If the best locations for the two QTL from single-trait analyses are the same, this test gives a value of 0. As stated for the previous tests, we are no longer within the conditions for the standard significance thresholds to apply and so the distribution of the test statistic under the null hypothesis is required. The null hypothesis for this test is a pleiotropic QTL and, hence, the standard permutation test (for which the null hypothesis is a model with no QTL) cannot be implemented.
Walling et al. (1998) investigated the use of the parametric bootstrap to obtain empirical distributions of the QTL location estimate to estimate standard errors. They proposed using the estimates from the data analysis with the relevant model as parameters for simulation and performed multiple replicate simulations and analyses to obtain the distributions. For the test of linkage vs. pleiotropy an analogous procedure can be used. The estimates obtained from the best pleiotropic QTL model are taken and used as parameters for replicate simulations. The test statistic for linkage vs. pleiotropy can be obtained from each replicate, giving the empirical distribution over the replicate simulations, from which the significance threshold can be obtained. The test statistic obtained from the original data set can be compared with this threshold to determine whether the test is significant. Walling et al. (1998) considered two situations, one where the original marker data were used and a second where the marker data were also simulated for each replicate. They found no consistent difference between these approaches and, as the former is easier and faster to implement, this is the version considered here. From the original analysis we have already calculated the probability of an F2 individual being each of the possible genotypes at all locations throughout the linkage group, and the probabilities for the location of the best-fitting pleiotropic QTL can be used in the simulation.
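One replicate of the parametric bootstrap can be sketched as below: a QTL genotype is drawn for each individual from its genotype probabilities at the best pleiotropic location, and phenotypes are generated from the fitted means, additive effects, and residual covariance. The genotype ordering, the additive-only model, the multivariate normal residuals, and the function name `simulate_pleiotropic_replicate` are all assumptions of this sketch.

```python
import numpy as np

def simulate_pleiotropic_replicate(geno_probs, means, add_effects, resid_cov, rng):
    """Simulate one bootstrap data set under the fitted pleiotropic model.

    geno_probs:  (n, 4) probabilities of QQ, Qq, qQ, qq at the best location
    means:       (t,) fitted trait means
    add_effects: (t,) fitted additive effects of the pleiotropic QTL
    resid_cov:   (t, t) estimated residual covariance matrix
    """
    n = geno_probs.shape[0]
    genotype_values = np.array([1.0, 0.0, 0.0, -1.0])   # additive coding
    # Inverse-CDF sampling of one genotype per individual.
    cum = np.cumsum(geno_probs, axis=1)
    u = rng.random(n)
    idx = np.minimum((u[:, None] > cum).sum(axis=1), 3)
    g = genotype_values[idx]
    # Residuals carry the estimated correlation between traits.
    E = rng.multivariate_normal(np.zeros(len(means)), resid_cov, size=n)
    return np.asarray(means) + np.outer(g, add_effects) + E
```

Each simulated data set would then be analyzed with both the pleiotropic and the linked-QTL models, using the original genotype probabilities, and the upper 5% point of the resulting test statistics taken as the threshold.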
Nonparametric bootstrap: Lebreton et al. (1998) gave an alternative approach, not making direct use of the test statistic derived above, using the nonparametric bootstrap. If there was evidence for QTL affecting both traits, then the data were sampled with replacement to give a data set containing the same number of individuals and reanalyzed fitting a QTL for each trait. The distance between the best QTL locations was calculated and the 95% confidence interval (C.I.) for this distance obtained from replicate samplings. The test for linkage vs. pleiotropy is whether the C.I. for the estimate of the difference in position of the two QTL includes zero. If it does then there is no evidence to exclude the model of the same QTL affecting both traits (i.e., a pleiotropic QTL). This approach does not require the use of the multitrait framework but it is not obvious how it would be extended to more than two traits.
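The resampling scheme can be sketched as follows for an additive-only model. The percentile method for the confidence interval, the use of the signed difference in position, and the function names are assumptions of this sketch and may differ in detail from the procedure of Lebreton et al. (1998).

```python
import numpy as np

def best_location(y, coefs):
    """Index of the location giving the smallest residual SS for one trait.

    coefs: (n, n_locations) additive coefficients, one column per location.
    """
    n = len(y)
    rss = np.empty(coefs.shape[1])
    for j in range(coefs.shape[1]):
        X = np.column_stack([np.ones(n), coefs[:, j]])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        r = y - X @ beta
        rss[j] = r @ r
    return int(np.argmin(rss))

def bootstrap_distance_ci(Y, coefs, positions, n_boot=1000, seed=0):
    """95% percentile interval for the difference in best location of two traits."""
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)      # sample individuals with replacement
        p1 = positions[best_location(Y[idx, 0], coefs[idx])]
        p2 = positions[best_location(Y[idx, 1], coefs[idx])]
        diffs[b] = p1 - p2
    lo, hi = np.percentile(diffs, [2.5, 97.5])
    # Pleiotropy is not rejected when the interval includes zero.
    return lo, hi, bool(lo <= 0.0 <= hi)
```

Phenotypes and genotype coefficients are resampled together by individual, so each bootstrap sample keeps the marker and trait data of an individual paired.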
SIMULATION
To investigate the behavior of this approach to QTL detection, a number of different genetic scenarios were simulated. In all cases a single linkage group and only two traits were considered and there was a maximum of one QTL affecting each trait. The QTL could be pleiotropic, that is, affecting both traits, or affecting only one trait. At most two QTL were simulated in a replicate, one affecting each trait. A total of 400 F2 individuals were simulated, each with a 100-cM chromosome with markers every 10 cM (giving 11 markers in all). Three sizes of additive QTL effect were considered: 0.5, 0.25, and 0.125 residual standard deviations (i.e., explaining 0.11, 0.03, and 0.01 of the total variance for a single trait in the F2). With an F2 design we cannot distinguish between genetic (unlinked to the markers being considered) and environmental residual variance, hence an overall residual correlation of 0.75 was investigated in addition to no residual correlation for the QTL of intermediate effect. When only a single QTL was simulated, this QTL was placed at 50 cM (i.e., the midpoint of the linkage group). Two QTL were placed either 20 cM apart or 60 cM apart, equidistant from the center of the linkage group. A total of 100 replicates of each situation were simulated and analyzed.
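The proportions of variance quoted for the three QTL sizes follow from the additive variance of a QTL in an F2, a²/2 when both alleles have frequency 0.5 and there is no dominance. A short sketch of the arithmetic, with a hypothetical function name:

```python
def f2_variance_explained(a, residual_sd=1.0):
    """Proportion of F2 phenotypic variance due to an additive QTL of effect a.

    Assumes allele frequencies of 0.5 and no dominance, so that the
    QTL variance is a**2 / 2.
    """
    qtl_var = a ** 2 / 2.0
    return qtl_var / (qtl_var + residual_sd ** 2)

# Effects expressed in residual standard deviations, as in the simulations.
proportions = [round(f2_variance_explained(a), 2) for a in (0.5, 0.25, 0.125)]
# proportions is [0.11, 0.03, 0.01]
```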
Significance thresholds: For the tests with a null hypothesis of no QTL (see Table 1), significance thresholds were obtained by simulating F2 individuals with the markers described above (i.e., 100-cM chromosome with 11 equally spaced markers) and phenotypes depending only on a random environmental component without any QTL. A total of 5000 replicate simulations were performed to obtain the chromosomal 5% threshold (to be used as the experimental threshold for the purpose of this study). Residual correlations of 0.25 and 0.75 were investigated as well as no residual correlation. The relevant test statistics were calculated for each replicate and their distribution over multiple replicates was used to obtain the significance thresholds. For the single-trait tests, to account for the fact that two, possibly correlated, traits were being analyzed, for each replicate simulation the higher of the two single-trait test statistics was picked and the distribution of these was used to obtain the threshold. Alternatively, we can assume that the two traits are uncorrelated and use the 2.5% threshold for each trait, which is equivalent to 5% over both traits (following Bonferroni adjustment). For the multiple-trait tests, i.e., the pleiotropic QTL model and the linked-QTL model tests (Table 1), the significance thresholds already account for the fact that more than one trait is being analyzed.
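The two ways of obtaining a single-trait threshold for a pair of traits can be sketched as below. Each input array is assumed to hold, for every null replicate, the largest test statistic found along the chromosome for one trait; the function names are hypothetical.

```python
import numpy as np

def empirical_threshold(null_stats, alpha=0.05):
    """Upper alpha point of the test statistics from null simulations."""
    return float(np.quantile(null_stats, 1.0 - alpha))

def joint_single_trait_threshold(stats_t1, stats_t2, alpha=0.05):
    """Threshold from the higher of the two single-trait statistics per replicate.

    This accounts for whatever correlation exists between the traits.
    """
    return empirical_threshold(np.maximum(stats_t1, stats_t2), alpha)

def bonferroni_threshold(stats_one_trait, n_traits=2, alpha=0.05):
    """Threshold for one trait at alpha / n_traits, treating traits as uncorrelated."""
    return empirical_threshold(stats_one_trait, alpha / n_traits)
```

With uncorrelated traits the two thresholds should be close, whereas with highly correlated traits the joint threshold approaches the ordinary 5% threshold for a single trait and the Bonferroni value becomes conservative.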
To confirm that a straightforward implementation of the permutation test (or simulation with no QTL) would not be appropriate for the linkage vs. pleiotropy test (Table 1) where the null hypothesis is a pleiotropic QTL, sets of data were simulated with a single pleiotropic QTL. As before, three sizes of effect for the QTL were considered (0.5, 0.25, and 0.125 residual standard deviations). One thousand replicate simulations were performed for each situation. The distributions of the test statistics in the different situations were compared.
Testing procedure: Three separate procedures were followed to investigate the optimum strategy for detection of QTL.
- Single-trait approach: Each trait was analyzed separately using the single-trait test. If both traits were significant using the empirical 5% thresholds obtained by simulation, the nonparametric version of the linkage vs. pleiotropy test was implemented.
- Pleiotropy model approach: Initially the pleiotropic QTL model (a single QTL affecting the two traits) was fitted. The test statistic for the QTL having an effect on both traits jointly was calculated and compared with the empirical experimental 5% significance threshold (obtained by simulation) for the null hypothesis of no QTL. If this was significant, a nominal 5% F-test for the effect of the QTL on each trait was used to determine whether both traits were affected by the QTL. If both traits were involved, then the pleiotropic QTL model was tested against a linked-QTL model (two linked QTL, one affecting each of the traits) using the linkage vs. pleiotropy test.
- Linkage model approach: To begin, the linked-QTL model was fitted. If this model was significant (tested against the null hypothesis of no QTL), for each trait the F-ratio for the effect of the QTL on that trait was calculated and tested against the 5% chromosomal threshold for the trait. These chromosomal thresholds are obtained from single-QTL analyses (calculated in the multitrait analyses, see above) not accounting for more than one trait. If both traits were significantly affected by a QTL, the linkage vs. pleiotropy test was carried out.
For the linkage vs. pleiotropy test, both the likelihood-ratio test and the nonparametric bootstrap were performed, each with 1000 replicate samples in the bootstrap.
RESULTS
Significance thresholds: The thresholds for the chromosomal 5% significance level obtained by simulation for the various tests are given in Table 2. The single-trait thresholds obtained from analysis of the two traits are expected to be the same (other than differences due to sampling) and should be consistent over the different residual correlations. Accounting for multiple correlated traits when obtaining the single-trait threshold (T1 + T2 in Table 2) gave approximately the same threshold as using the Bonferroni correction (accounting for two uncorrelated traits) when the simulated traits were uncorrelated, as expected. With an increase in the simulated residual correlation, we expect no change in the single-trait thresholds obtained using the Bonferroni correction whereas those accounting for correlated traits should become less extreme. The results in Table 2 support this expectation, although the changes in threshold values are small. The residual correlation had virtually no effect on the 5% threshold for the pleiotropic QTL model test, whereas the linked-QTL model test threshold became more extreme (i.e., for this statistic the threshold value was slightly reduced).
When simulating a null hypothesis of one pleiotropic QTL, there was some evidence that the distribution of test statistics for the linkage vs. pleiotropy test depended on the magnitude of the effect of the simulated QTL (see Table 3). Although the differences observed were relatively small, we opted to use the bootstrap approaches, which take account of the size of the effect of the QTL to obtain a significance threshold for testing linkage vs. pleiotropy.
Power: Single-trait approach: Table 4 gives the number of runs out of the 100 replicates resulting in the different possible outcomes: two linked QTL (one affecting each trait), or one pleiotropic QTL, or a QTL affecting only one of the traits. The results presented are based on the significance thresholds obtained assuming two correlated traits and using the same residual correlation as used to simulate the data being analyzed.
Multiple-trait approaches: The nonparametric bootstrap proposed by Lebreton et al. (1998) to test for evidence for linked QTL vs. a pleiotropic one lacked power compared with the likelihood-ratio test (Table 5). When a pleiotropic QTL was simulated we would have expected to observe 5% of the runs resulting in two linked QTL (this being the type 1 error we accept in setting the significance thresholds). Both the likelihood-ratio test and the nonparametric methods found fewer significant replicate simulations than expected. As the QTL were simulated to be further apart, discrimination between the linked QTL and pleiotropic models was better with the likelihood-ratio test than with the nonparametric bootstrap approach. With the QTL of smaller effect, neither method was very successful at detecting linkage and nearly all replicates where evidence for a QTL affecting each trait was found resulted in the pleiotropic QTL model not being rejected.
Tables 6 and 7 give the number of runs resulting in the different models: a QTL affecting only one of the traits, one pleiotropic QTL, or two linked QTL (one affecting each trait) for the two multitrait approaches (i.e., the pleiotropic model and the linkage model approaches; see method for a description). The significance thresholds obtained using the same residual correlation as used to simulate the data being analyzed were used. The results from the likelihood-ratio test for linkage vs. pleiotropy are presented. These are the same as presented in Table 5, but given as the percentage of all runs, rather than the percentage of runs where there was significant evidence for a QTL affecting both traits. The number of replicates in the resulting models does not necessarily sum to the total number of significant replicates. This is because although the multitrait model (linkage or pleiotropy) was significant, when the effect of the QTL was tested on the traits separately, they were not significant.
Table 2. 5% significance thresholds
When data were simulated with a QTL affecting only one of the two traits (and the other trait had no QTL affecting it in the linkage group), with no residual correlation between the traits, all three approaches (single-trait and two multitrait approaches) were similar in their power to detect QTL. All the approaches generally resulted in the correct model. By chance we expect the QTL to have a significant effect on the second trait in 5% of the significant runs, this being the type 1 error we were trying to achieve when setting the thresholds. The pleiotropy model approach results in both traits being significant more often than expected, suggesting that the significance criterion used to assess the effect of the QTL on each trait was not stringent enough. When the residual correlation was simulated to be 0.75, the power of the multitrait analyses to detect a QTL increased compared with when no residual correlation was simulated. The best results were observed with the pleiotropy model approach.
Table 3. Test statistic distributions for linkage vs. pleiotropy
When data were simulated with QTL affecting both traits, the multitrait QTL models were significant in more runs than the single-trait analyses. For the largest-effect QTL, all approaches detected QTL in all replicate simulations. The linkage model and single-trait approaches always resulted in a QTL affecting both traits. When the QTL were tightly linked, the pleiotropy model approach performed well; however, when the QTL were 60 cM apart, 50% of runs ended with a model with a QTL affecting only one of the traits. For this large-effect QTL, when there was evidence that QTL were affecting both traits, there was good discrimination between linkage and pleiotropy, especially for the multitrait approaches. For the intermediate-effect QTL, the power to detect at least one significant QTL was high. The highest power to detect the pleiotropic QTL was found with the pleiotropy model approach. For the simulated linked QTL, the linkage model approach gave the highest proportion of runs resulting in the correct QTL model. The power to discriminate between linkage and pleiotropy was much lower for the intermediate-effect QTL than with the large-effect QTL. When a residual correlation between the traits was simulated, the multitrait approaches gave less power when a pleiotropic QTL was simulated, compared with the situation with no residual correlation, although fewer runs resulted in only one trait being significantly affected by a QTL. When the simulated-QTL model was linked QTL, the power of the multitrait models was similar to the situation without a residual correlation. Except for the situation with QTL simulated to be 60 cM apart and analyzed following the pleiotropy model approach, both the single-trait and the multitrait approaches less frequently resulted in a model with only one of the traits affected by a QTL. The power to detect the smallest-effect QTL was low. 
The pleiotropy model approach most frequently picked up that two traits were involved, but for the simulated linked QTL there was a lack of power to detect linkage. The improved performance of the pleiotropy model approach could also reflect the reduced stringency observed when testing the effect of a significant pleiotropic QTL on each trait.
Table 4. Power of the single-trait approach
Parameter estimates: In QTL mapping experiments, we are interested not only in the power to detect any QTL but also in the magnitude of its effect and the location on the chromosome. Table 8 gives the mean parameter estimates obtained when fitting the pleiotropic QTL model and when fitting the single-trait model. The results in Table 8 are from the analyses of the two QTL with larger effect, which were detected with a high frequency.
When the analyses resulted in the same QTL model as that used for the simulation (i.e., a pleiotropic QTL or linked QTL), the parameter estimates were, on average, close to the simulated values. Comparing the pleiotropic QTL model with the single-trait models (see Table 8), the multitrait analyses reduced the standard deviation of the location estimate when a pleiotropic QTL was simulated, except, as expected, when the residual correlation between the traits was high. The additive effect was, on average, slightly overestimated in the single-trait analyses, as expected because the location of the QTL is selected as the best-fitting position. This selection effect was reduced when the pleiotropic QTL model was correctly fitted, giving, on average, a smaller overestimate. If the pleiotropic model was incorrectly fitted when linked QTL were simulated, the mean estimate of location fell, as expected, halfway between the simulated locations of the QTL, and the standard deviation of location was inflated. The estimates of the additive effect were also lower than those simulated, because the fitted QTL could not be at the optimum location for both traits, and their standard deviation was inflated. In this situation, estimates obtained from the single-trait analyses (which would be the same as the linked-model results if all replicates ended in a linked-QTL model) were closer to the simulated parameter values. The estimates from the single-trait analyses were not affected by whether the simulated model was a pleiotropic QTL or linked QTL. When only one of the simulated traits was affected by a QTL, the pleiotropic QTL model and the single-trait model gave very similar results when the QTL effect was large: the pleiotropic QTL model correctly detected the QTL and estimated it to have, on average, zero effect on the second trait. For the smaller-effect QTL, the pleiotropic QTL model was slightly less powerful, which resulted in a higher standard deviation of the location estimate.
The presence of correlated residuals in the data made very little difference to the parameter estimates. Two main differences were observed, both concerning the variance of the location estimate. The first was a larger standard deviation when a pleiotropic QTL was simulated; the second was a lower standard deviation when a QTL affecting only one of the traits was simulated. In both cases, the observed change in the power of these analyses might cause this effect (a decrease in power in the first case and an increase in the second).
DISCUSSION
The multitrait multiple regression approach for QTL mapping performs well in detecting and characterizing QTL and is very easy to implement. If traits are affected by the same QTL, a multitrait analysis increases the power of detection of this QTL compared with single-trait analyses. If the pleiotropic QTL model is the correct one, we would expect that fitting this model would give the highest power and the smallest standard deviations, especially for location, as in this case both traits are being used to estimate the same parameter. When the simulated QTL are some distance apart, the pleiotropy model approach performs less well. The best estimate for the location of the pleiotropic QTL tends to be at the location of one of the simulated QTL, and the evidence for the QTL affecting the second trait at this location will be low. The linkage model approach performs better in this situation.
We have implemented the analyses discussed in standalone software; however, the models presented here could be fitted in a standard statistical package in which multitrait least squares is available. To do this, the genotype probabilities would have to be calculated from the marker data prior to analysis, using specialized software (e.g., Haley et al. 1994). The analysis would involve a call to the least-squares software for each location considered in the genome, altering the X matrix for subsequent calls as required. The tests for the linked-QTL and pleiotropic QTL models could be easily performed from the output of such analyses, using the residual SS matrices. The likelihood ratio for linkage vs. pleiotropy would need to be calculated by constructing the relevant matrices and performing simple matrix manipulations.
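The per-location procedure just described can be sketched with a general-purpose numerical library. The following is a minimal illustration, not the standalone software mentioned above. It assumes that the genotype coefficients at each position have already been computed from the marker data, and it assumes the test statistic takes the form n ln(|R0| / |R1|), where R0 and R1 are the residual sums-of-squares-and-cross-products matrices without and with the QTL. The names `residual_sscp`, `scan_pleiotropic`, and `coef_by_pos` are hypothetical.

```python
import numpy as np

def residual_sscp(X, Y):
    """Residual sums-of-squares-and-cross-products matrix from regressing Y on X."""
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    return resid.T @ resid

def scan_pleiotropic(Y, base, coef_by_pos):
    """Fit a pleiotropic-QTL model at each position and return one statistic per position.

    Y           : (n, t) phenotypes for t traits
    base        : (n, p) design columns shared by all traits (mean, fixed effects)
    coef_by_pos : list of (n, k) genotype-coefficient columns, one entry per position
                  (assumed to be precomputed from the marker data)
    """
    n = Y.shape[0]
    # Model without a QTL: the same for every position, so fit it once.
    _, logdet_null = np.linalg.slogdet(residual_sscp(base, Y))
    stats = []
    for C in coef_by_pos:
        # Only the QTL columns of the X matrix change between calls.
        X = np.hstack([base, C])
        _, logdet_full = np.linalg.slogdet(residual_sscp(X, Y))
        # Assumed form of the statistic: n * ln(|R0| / |R1|).
        stats.append(n * (logdet_null - logdet_full))
    return np.array(stats)

# Toy usage with simulated data (all values are illustrative only).
rng = np.random.default_rng(1)
n, n_pos = 300, 5
coefs = [rng.choice([-1.0, 0.0, 1.0], size=(n, 1), p=[0.25, 0.5, 0.25])
         for _ in range(n_pos)]
base = np.ones((n, 1))
# A QTL at position index 2 with the same additive effect on both traits.
Y = 0.5 * coefs[2] @ np.ones((1, 2)) + rng.standard_normal((n, 2))
stats = scan_pleiotropic(Y, base, coefs)
best = int(np.argmax(stats))
```

Working with log determinants avoids overflow when the residual matrices are large in scale. The linked-QTL model, in which each trait has its own QTL position, needs a different X matrix per trait and is not covered by this sketch.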
Because this is a standard multitrait regression problem, the inclusion of fixed effects and covariates is straightforward. The restriction that the same design matrix is required for all traits should not cause problems: although fixed effects and covariates may be included that do not have a significant effect on some traits, this should not bias the results. The extension to fit multiple QTL through cofactors or by a multidimensional search is also straightforward following Knott et al. (1998). There are two basic alternatives for determining the model to be fitted in terms of fixed effects, covariates, and cofactors. One is to find the best model for each trait separately and then include all the effects in the multiple-trait model; the alternative is to determine the best model in the multiple-trait framework, i.e., considering the effect of the explanatory variables on all traits simultaneously. If the explanatory variables were correlated, the latter approach would tend to result in fitting fewer explanatory variables.
Missing genotype data are not a problem as potential QTL genotype probabilities are obtained from neighboring informative markers (possibly more than two when markers are only partially informative). Individuals with no marker genotypes are best excluded because they provide no information about the location of potential QTL. With maximum likelihood, these individuals may be included, but they provide only distribution information, which, as shown previously (Haley and Knott 1992), has little effect on the results of the analysis in terms of the detection of QTL. If selective genotyping has been carried out, the analyses can still be performed. The estimate of the location of any QTL should not be biased but the effect of any QTL will be overestimated. Making assumptions about the distribution of the trait, a correction for the overestimate could be made. Henshall and Goddard (1999) consider this problem of selective genotyping in half-sib families and propose the use of a logistic regression, which considers the genotype as the dependent variable and the phenotype as the independent one. They find that this approach performs as well as maximum-likelihood alternatives but is much easier to implement. It is limited, however, to situations where the offspring can be one of only two genotypes (e.g., a backcross).
F2 individuals with missing phenotypes would not give information about the presence of a QTL and, hence, could be omitted. In the multitrait situation, however, individuals may be missing only one of several traits and hence they would be wanted in the analysis. In experimental situations, the frequency of missing phenotypes will usually be low, except where factors such as sex limitation are involved (e.g., traits that can only be measured in one sex). For more general application of the multitrait analysis, however, methods for missing phenotypes need to be investigated.
The significance thresholds used to detect the presence of QTL in this simulation study have been obtained by simulation using the same residual correlation between the traits as used to generate the data with QTL. In practice, a permutation test may be implemented. In this case, the phenotypic correlation between the traits will be treated as a residual correlation. In the simulated data, however, any additional correlation between the traits generated by the QTL is small and, hence, ignoring this should not bias the results.
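A permutation threshold of this kind might be obtained as in the following sketch, which is an illustration under stated assumptions rather than the procedure used in this study. Whole rows of genotype coefficients are shuffled relative to the phenotypes, using the same shuffle at every position, so the trait values of each individual stay together and the phenotypic correlation between traits is carried into the null distribution. The statistic is again assumed to be n ln(|R0| / |R1|), and the names `lr_stat`, `permutation_threshold`, and `coef_by_pos` are hypothetical. Family structure, which may call for shuffling within families, is ignored here.

```python
import numpy as np

def lr_stat(Y, base, C):
    """Assumed statistic n * ln(|R0| / |R1|) for adding genotype columns C to base."""
    def logdet_rss(X):
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        resid = Y - X @ beta
        return np.linalg.slogdet(resid.T @ resid)[1]
    return Y.shape[0] * (logdet_rss(base) - logdet_rss(np.hstack([base, C])))

def permutation_threshold(Y, base, coef_by_pos, n_perm=1000, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of the genome-wide maximum statistic.

    Y           : (n, t) phenotypes; rows are never split, so trait
                  correlations within an individual are preserved
    base        : (n, p) design columns that stay attached to the phenotypes
    coef_by_pos : list of (n, k) genotype-coefficient columns, one per position
    """
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    max_stats = np.empty(n_perm)
    for b in range(n_perm):
        # One shuffle per replicate, applied to every position, so the
        # dependence between neighboring positions is kept intact.
        perm = rng.permutation(n)
        max_stats[b] = max(lr_stat(Y, base, C[perm]) for C in coef_by_pos)
    return float(np.quantile(max_stats, 1.0 - alpha))
```

Shuffling the genotype rows rather than the phenotype rows gives the same null distribution when the base design holds only the mean, and it keeps any fixed effects aligned with the phenotypes they belong to.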
In this study we did not consider the more general model where there could be two linked QTL, both with an effect on each trait (i.e., two linked pleiotropic QTL). Such a model can easily be included within the multitrait framework described here and could be fitted in a two-dimensional search analogous to the two-QTL model for the single-trait analyses (for example, Haley and Knott 1992). This model could be tested against the nested ones of one pleiotropic QTL and two linked QTL using the parametric bootstrap to set the significance thresholds. Additionally the effect of the two QTL on each trait could be tested following the approach suggested here.
The results presented here are based on the situation with two traits. Frequently more than two traits are recorded, and the models described here can easily be extended to accommodate more traits. The extreme models would be one QTL affecting each trait and one QTL possibly affecting all of them. In addition, there could be a number of intermediate models, such as one QTL affecting some of the traits but not others. These models can easily be accommodated in the analysis, and test statistics similar to those described here for two traits can be determined (based on the ratio of the determinants of sums of squares matrices). The problem is one of testing, as the null hypothesis will frequently not be the absence of genetic control, and a series of tests for alternative models may be required. As shown here, a permutation test could be performed to test for the presence of QTL. The parametric bootstrap could be adapted for the involvement of more traits to test for linkage or pleiotropy. The null hypothesis model would be simulated and alternative models tested against it. In this case, strategies for looking for QTL become important, as the number of possible models is much greater than in the two-trait situation.
Several strategies can be proposed to determine the genetic architecture of multiple correlated traits. If the traits are genetically uncorrelated, then we expect that QTL would not have pleiotropic effects on two or more traits so that there would be no benefit from a multitrait approach. (Although note that, theoretically at least, a lack of genetic correlation could result from pleiotropic effects at two or more QTL that counteract each other.) Thus, for uncorrelated traits, starting with single-trait analyses is appropriate, although tests of pleiotropy vs. linkage for specific QTL that map to a similar location should be performed by multitrait analysis. If the traits are highly genetically correlated, at least some QTL are expected to affect some or all traits and, if we wish to find these QTL, then the pleiotropy model approach will be the most suitable starting point. For traits that are less highly correlated, the linked QTL model may be a better starting point. This would also be appropriate where the investigator's interest is in locating QTL that run counter to the general trend for more highly correlated traits. In this case one might first analyze the data with a pleiotropic QTL model, fit the pleiotropic QTL found as cofactors, and repeat the analysis with a linked-QTL model. Further work is required to examine the efficiency of these alternative strategies for analyzing data on multiple traits and to determine how the strategies are influenced by both the biological structure that underlies the data and by the nature of the questions being asked.
Acknowledgments
for support from the Royal Society and the Biotechnology and Biological Sciences Research Council.
Footnotes
- Communicating editor: T. F. C. Mackay
- Received December 14, 1999.
- Accepted June 20, 2000.
- Copyright © 2000 by the Genetics Society of America