Approximate Analysis of QTL-Environment Interaction with No Limits on the Number of Environments
 Abraham B. Korol,
 Yefim I. Ronin and
 Eviatar Nevo
 Corresponding author: Abraham B. Korol, Institute of Evolution, University of Haifa, Mount Carmel, Haifa 31905, Israel. Email: korol{at}esti.haifa.ac.il
Abstract
An approach is presented for quantitative trait locus (QTL) mapping analysis that allows for QTL × environment (E) interaction across multiple environments without necessarily increasing the number of parameters. The main distinction of the proposed model lies in how it approximates the dependence of putative QTL effects on environmental states. We hypothesize that the environmental dependence of a putative QTL effect can be represented as a function of the environmental mean value of the trait. Such a description can also be applied to take into account the effects of any cosegregating QTLs from other genomic regions that may likewise vary across environments. Monte Carlo simulations and an example from a barley multiple-environment experiment demonstrate a high potential of the proposed approach for analyzing QTL × E interaction, although the results are approximate by definition. This drawback is compensated by the possibility of utilizing information from a potentially unlimited number of environments with a remarkable reduction in the number of parameters, as compared to previously proposed mapping models with QTL × E interactions.
DIFFERENTIAL expression of a phenotypic trait by genotypes across environments, or genotype × environment (G × E) interaction, is an old problem of primary importance for quantitative genetics and its applications in breeding, conservation biology, theory of evolution, and human genetics (Eberhard and Russel 1966; Falconer 1981; Via and Lande 1987; Tiret et al. 1993; Wu and Stettler 1997). Recent successful attempts to dissect quantitative variation into Mendelian genes employing molecular markers (mapping quantitative trait loci, or QTLs) have shifted the focus of G × E interaction analysis from the genotype to the gene level (e.g., Paterson et al. 1991; Hayes et al. 1993; Asins et al. 1994; Sari-Gorla et al. 1997). For breeding purposes, the primary concern is possible environmental instability in the manifestation of mapped QTLs that might become candidates for marker-assisted selection. To evaluate the stability of QTL effects in crop species, dozens of immortal mapping populations have been developed for trait scoring under various environmental conditions (Hayes 1994).
Several algorithms and computer packages have been proposed to conduct QTL mapping while allowing for QTL × E interaction effects (Hayes et al. 1993; Jansen et al. 1995; Tinker and Mather 1995; Romagosa et al. 1996; Beavis and Keim 1996; Utz and Melchinger 1996). In addition to testing the hypothesis of QTL × E interaction, simultaneous treatment of data from multiple environments provides a significant increase in the statistical power of QTL detection and in the accuracy of the estimates of QTL position and effect (Jansen et al. 1995). However, such an analysis is limited to situations where the environments can be explicitly characterized by some parameters, such as day length or irrigation-fertilization treatments [like the “fixed effects” model of analysis of variance (ANOVA)]. When these characteristics are not available, the application of the “general” QTL × E mapping model (see below) involves a tremendous number of parameters, which grows as the product of the number of identified QTLs and the number of environments where the traits were measured. In such a case, one could think about a “random effects” model, so that the parameters for each QTL include only the main effect, the variance of QTL × E interaction, and the QTL position. Although this option seems very attractive, it has its own drawbacks, especially if we are going to deal with environmental variation associated with different localities. Indeed, some (or many) localities may manifest quite repeatable differences from each other, justifying the “fixed model” approach (Baker 1996). Moreover, information on geographically specific QTL effects may be of practical importance. In such a case, a fixed effects model is, whenever possible, preferable to the random effects model, because the latter hides the biological (geographic) specificity of the QTL effect, compressing all the results into an estimate of variance.
The major goal of this paper is to present an approach of QTL mapping analysis, allowing for QTL × E interaction across a large (in fact, unlimited) number of environments, without the necessity for a corresponding increase in the number of parameters. The proposed model is especially relevant in situations of geographic variation of external conditions, where the fixed model approach is desirable but not easy to implement.
THE MODEL
Approximated description of environmental dependence of QTL effect: The main distinction of the proposed model is in the choice of approximation of the dependence of putative QTL effects on environmental states. In reality, each environment is a complex of abiotic (temperature, humidity, ion concentration, etc.), biotic (parasites, pathogens, competitors, etc.), and agrotechnical features. These could strongly affect the manifestation of quantitative traits and the effects of QTLs but are difficult to characterize quantitatively. As first suggested by Eberhard and Russel (1966), we advocate that the measured trait values of the mapping population (e.g., trait means) may serve as objective integral characteristics of the environmental state. Accordingly, a larger number of traits should provide a better “bioindication.” In the simplest form, one can approximate the environmental dependence of the effect of allele substitution at a QTL by a polynomial over the mean values of the same trait across the environments. The following example of QTL mapping in a barley experiment (Hayes et al. 1996), with measurements conducted in many environments, illustrates the idea (Figure 1). For some putative QTLs, the dependence on the mean value of the respective trait explains a large part of the environmental variation of the QTL effect. The suggested approach does not exclude the possibility of taking into account any additional information, such as temperature, day length, or water regime, that might characterize the environments (e.g., Jansen et al. 1995). These “physical” characteristics can be introduced into the model in parallel with the bioindicatory terms (e.g., a polynomial over the mean values), together with terms characterizing the dependence of the putative QTL effect on the interaction between the physical and bioindicatory factors.
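The idea of describing a QTL effect as a polynomial over environmental trait means can be illustrated with a small two-step sketch: take substitution effects estimated separately in each environment and fit a low-degree polynomial to them by ordinary least squares. This is only an illustration of the approximation, not the joint maximum-likelihood fit used in the mapping model; the function names and the data below are hypothetical.

```python
def fit_polynomial(mu, a, degree):
    """Least-squares coefficients c[0..degree] of a ~ sum_k c[k] * mu**k."""
    n = degree + 1
    # Normal equations (X^T X) c = X^T a, with X[i][k] = mu[i] ** k.
    xtx = [[sum(m ** (j + k) for m in mu) for k in range(n)] for j in range(n)]
    xta = [sum(ai * m ** j for m, ai in zip(mu, a)) for j in range(n)]
    aug = [row + [rhs] for row, rhs in zip(xtx, xta)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[k][n] / aug[k][k] for k in range(n)]


def predict_effect(coeffs, mean_value):
    """Evaluate the fitted polynomial a(mu) at one environmental mean."""
    return sum(c * mean_value ** k for k, c in enumerate(coeffs))


# Hypothetical per-environment trait means and QTL effects (not from the paper).
env_means = [0.0, 1.0, 2.0, 3.0, 4.0]
qtl_effects = [1.0 + 0.5 * m - 0.1 * m ** 2 for m in env_means]
coeffs = fit_polynomial(env_means, qtl_effects, degree=2)
```

With these noise-free inputs the fit recovers the generating quadratic; with real estimates the residual scatter around the curve indicates how much of the environmental variation of the effect the polynomial fails to capture.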
Another approach to analyze QTL × E interaction without direct specification of the physical characteristics of the environments was recently proposed by Romagosa et al. (1996). Their algorithm is based on clustering the environments using a few (e.g., two) detected QTL with most variable effects across environments. Actually, this is a different version of the same general idea of bioindicators as a tool for characterizing “anonymous” environments.
Clearly, the results obtained with the method of QTL × E analysis proposed in this paper will be approximate, capturing, at best, the major part of the QTL × E interaction. However, as will be demonstrated below, the possibility of working with an unlimited number of environments without increasing the number of parameters, as compared to the usual mapping models with QTL × E interactions, may significantly offset the loss-of-accuracy drawback, resulting in increased power to detect QTL × E interactions and in improved accuracy of estimates of QTL genomic location.
Mixture model of interval QTL mapping: Consider a simplified situation when the trait of interest (x) depends on a single QTL, Q/q. We will confine the analysis to dihaploid mapping populations (which also applies to backcrosses and recombinant inbreds), but it can easily be extended to other population structures. Then, for an arbitrary genotype of the mapping population, the trait measurement in the ith environment can be presented as
Assume that Q/q resides in some interval (k, k + 1) of a chromosome marked by a series of marker loci, M_{j}/m_{j}, with recombination rates r_{1} and r_{2} in M_{k}/m_{k} − Q/q and Q/q − M_{k+1}/m_{k+1}, respectively. For simplicity, we confine the analysis to the “no interference” case. For a dihaploid (backcross) mapping population, the expected densities of the trait x in each of the four marker groups, U_{m_{k}m_{k+1}}(x) = U_{1}(x), U_{M_{k}m_{k+1}}(x) = U_{2}(x), U_{m_{k}M_{k+1}}(x) = U_{3}(x), and U_{M_{k}M_{k+1}}(x) = U_{4}(x), can be written as
In a single-environment formulation, one could test whether or not the observed variation of x is associated with segregation in the interval M_{k}/m_{k} − M_{k+1}/m_{k+1} and identify the corresponding locus Q/q. Provided the recombination rate between marker loci is known, the vector of n_{1} parameters specifying the putative QTL can be presented as θ_{n_{1}} = {r,μ,a,σ^{2}}. The assumption of no association between the trait and segregation in the M_{k}/m_{k} − M_{k+1}/m_{k+1} interval can formally be presented by another set of parameters, θ = θ_{n_{0}} = {μ,σ^{2}}. The null hypothesis {H_{0}: θ = θ_{n_{0}}}, as contrasted with the alternative {H_{1}: θ = θ_{n_{1}}}, can be investigated with the likelihood ratio test approach (Wilks 1962). If H_{0} is true, the statistic
In multiple environments, we could apply the foregoing to trait measurements obtained under several environmental conditions. Namely, when comparing the foregoing alternatives H_{0} and H_{1}, QTL × E interaction effects could be included in the model and tested against the alternative of no QTL × E interaction. In other words, an additional group of hypotheses {H_{2}: θ = θ_{n_{2}}} could be considered that assume a dependence of the target QTL effect and, possibly, of the residual variance, on environment. Vector θ_{n_{2}} of the full model, corresponding to H_{2} with environment-specific parameters a_{i},
In the simplified case of only one QTL segregating in the mapping population, no correlation between trait measurements across environments is expected. With this assumption, instead of the test statistic (3), one can build its multi-environmental equivalent χ^{2} (H_{1} vs. H_{0}) with df = 2p + 2 − 2p = 2. If H_{0} is rejected (a_{i} ≠ 0), then the obvious benefit of the corresponding multi-environmental model is the striking increase in the number of measurements, resulting in higher precision of parameter estimates (e.g., Jansen et al. 1995). No less important is the possibility to conduct the following two tests:
The asymptotic distribution of the test statistic (3) in multi-interval mapping remains unknown (see Zeng 1994), but one could use extensive Monte Carlo simulations in order to obtain an empirical critical value of the statistic for each considered situation. Our previous simulation studies (Korol et al. 1995) have shown that the chi-square distribution is a good approximation for the test statistic (3), and here we will demonstrate that it may also be suitable for the test statistic (3b).
Regression specification of QTL × E interaction: Ignoring possible variation of the QTL effect among environments may lead to erroneous breeding decisions in subsequent applications of the mapping results, an accompanying reduction in power, and a loss of precision in the estimated QTL effects and genome location. By contrast, accounting for QTL × E interaction in data obtained in multiple environments can strongly increase the resolution of the mapping experiment (Jansen et al. 1995; Tinker and Mather 1995). However, this advantage is seriously attenuated by the necessity to build into the mapping model a large number of parameters specifying the working hypothesis of the QTL effects. For example, an experiment with 10 environments will require a model with 31 parameters when evaluating a single interval.
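Obtaining an empirical critical value by simulation amounts to generating the test statistic many times under the null hypothesis and reading off the appropriate upper quantile. The sketch below shows only that bookkeeping. The null statistic used here is a toy stand-in (the square of a standard normal draw, which follows a chi-square distribution with 1 d.f.); in an actual analysis it would be the likelihood ratio statistic recomputed on data simulated under H_{0}. All names are hypothetical.

```python
import random


def empirical_critical_value(simulate_null_statistic, alpha, runs=5000, seed=0):
    """Upper-alpha empirical quantile of a statistic simulated under H0."""
    rng = random.Random(seed)
    stats = sorted(simulate_null_statistic(rng) for _ in range(runs))
    index = min(runs - 1, int((1.0 - alpha) * runs))
    return stats[index]


def toy_null_statistic(rng):
    # Stand-in for "simulate data under H0, refit both models, return the
    # likelihood ratio statistic": a chi-square variate with 1 d.f.
    z = rng.gauss(0.0, 1.0)
    return z * z


crit_5pct = empirical_critical_value(toy_null_statistic, alpha=0.05)
crit_1pct = empirical_critical_value(toy_null_statistic, alpha=0.01)
```

Comparing such empirical thresholds with the asymptotic chi-square ones is the kind of check that indicates whether the asymptotic approximation can be trusted in a given situation.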
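The parameter counts quoted in the text can be reproduced with simple bookkeeping. The sketch below assumes, as an inference from the stated numbers rather than from an explicit formula in the source, that the general model carries one QTL position plus a mean, a substitution effect, and a residual variance for each of p environments, whereas the polynomial model replaces the p effects and p variances with polynomial coefficients. Function names are hypothetical.

```python
def general_model_params(p, include_means=True):
    """General model: 1 position + p effects a_i + p residual variances (+ p means)."""
    count = 1 + 2 * p
    return count + p if include_means else count


def polynomial_model_params(p, effect_degree, variance_degree, include_means=True):
    """Polynomial model: 1 position + coefficients of a(mu) and of the variance polynomial."""
    count = 1 + (effect_degree + 1) + (variance_degree + 1)
    return count + p if include_means else count


# 10 environments, general model: 3 * 10 + 1 = 31 parameters.
n_general_10 = general_model_params(10)
# Cubic effect and quadratic variance, environmental means not counted: 1 + 4 + 3 = 8.
n_poly = polynomial_model_params(10, effect_degree=3, variance_degree=2, include_means=False)
```

Under this accounting the number of QTL-specific parameters in the polynomial model does not depend on p at all, which is the property the approach relies on.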
According to the proposed approach, the unknown effects a_{i} and, if desirable, the residual variances
The degrees of polynomials in Equation 4 cannot be predetermined before the mapping analysis. By contrast, the analysis includes model adjustment with a series of polynomials a(μ) = P_{as}(μ) and
An important point of concern with the proposed approach is how to proceed in a situation where the employed model detects a significant QTL effect but no QTL × E interaction. Does it mean that the revealed QTL shows no QTL × E interaction or, alternatively, that this interaction exists but the chosen parametrization (e.g., regression of QTL effects on mean trait values across environments) poorly approximates the real dependence of the QTL effect on environment? One of the possible ways to overcome this obstacle will be presented below.
Obtaining parameter estimates: Maximum likelihood estimates of all of the parameters, including α and β, are obtained by numerical multiparameter optimization of the functions L(θ) from Equations 3a and 3b. Optimization was carried out with a modified gradient method (Himmelblau 1972). To guard against convergence to a local rather than the global maximum, various sets of starting values were used.
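The role of multiple starting values can be seen in a one-dimensional toy problem. The sketch below uses plain fixed-step gradient ascent with a numerical derivative, which is a deliberately simpler technique than the modified gradient method cited above, and a hypothetical objective with one local and one global maximum in place of the actual likelihood.

```python
def gradient_ascent(f, x0, step=0.05, iters=2000, h=1e-5):
    """Fixed-step gradient ascent using a central-difference derivative."""
    x = x0
    for _ in range(iters):
        grad = (f(x + h) - f(x - h)) / (2.0 * h)
        x += step * grad
    return x


def multistart_maximize(f, starts):
    """Run the optimizer from each start and keep the best end point."""
    candidates = [gradient_ascent(f, s) for s in starts]
    return max(candidates, key=f)


def toy_objective(x):
    # Hypothetical surface: local maximum near x = -1, global maximum near x = +1.
    return -(x * x - 1.0) ** 2 + 0.3 * x


from_left = gradient_ascent(toy_objective, -2.0)   # ends at the local maximum
best = multistart_maximize(toy_objective, [-2.0, 0.0, 2.0])
```

A single run started on the left settles on the inferior maximum, whereas the multistart wrapper returns the better one; agreement among runs from several starts is what justifies treating an estimate as the global maximum.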
RESULTS
The efficiency of the proposed method was tested through Monte Carlo simulations. Three groups of situations were simulated: a single QTL (situations S_{1}–S_{3}), two unlinked QTLs (situations S_{4}–S_{5}), and several unlinked QTLs (situation S_{6}) (Table 1).
Single QTL: In the situation with a single QTL, no “between-environment” correlation is expected for the residual (within QTL groups) variation. Thus, the log-likelihood functions of Equations 3a and 3b for the mixture model could be calculated by summing over all environments and employing the polynomials of Equation 4. This assumes implicitly that after removing the effects of the QTL under consideration, the residuals are independent across environments. Clearly, such an idealization is correct if all residual genetic variation of the quantitative trait is taken into account by markers of other genomic regions, such as cofactors (Jansen and Stam 1994; Zeng 1994). This may not be the case, calling into question the applicability of the proposed approach to real data analysis. It is indeed a very serious problem, but as shown in the following section, the conclusions may be fairly promising.
As a first step in demonstrating the idea of our method, we consider here the simplest case of a single QTL. The dependence of the simulated QTL effect and the residual variance on environment was modeled as cubic and quadratic functions, respectively (see Table 1). The simulated experiment included 10 environments with the mean value of the trait (μ_{i}) increasing linearly from μ_{1} = 0 to μ_{10} = 3.6. The target QTL was positioned in the middle of the third of six intervals of a linkage group. Each interval spanned 24 cM. The size of the mapping population (either dihaploid or backcross) was n = 200.
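The geometry of this simulated design can be written out explicitly. The sketch below computes the ten environmental means, the QTL position measured from the first marker of the linkage group (an assumed origin), and the recombination rates between the QTL and its flanking markers. It uses Haldane's map function, which corresponds to the no-interference case; the source does not state which map function was used, so that choice is an assumption, as are all names.

```python
import math


def haldane_recombination(distance_cm):
    """Recombination fraction for a map distance in cM under no interference."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))


n_env = 10
# Means increase linearly from 0 to 3.6, i.e., in steps of 0.4.
env_means = [3.6 * i / (n_env - 1) for i in range(n_env)]

interval_cm = 24.0
# Middle of the third interval: two full intervals plus half of the third.
qtl_position_cm = 2 * interval_cm + interval_cm / 2

r1 = haldane_recombination(interval_cm / 2)   # left flanking marker to QTL
r2 = haldane_recombination(interval_cm / 2)   # QTL to right flanking marker
r_flank = haldane_recombination(interval_cm)  # between the flanking markers
```

Under no interference the three rates satisfy r = r_{1} + r_{2} − 2r_{1}r_{2}, which provides a quick consistency check on the values.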
The results obtained with polynomials of different degrees corroborate the expectation that the best resolution is achievable when the adjusted polynomials are of the same degree as those employed in generating the data (not shown). We found such a correspondence to be more important for approximating the substitution effects a_{i} than the residual variances
Our intention was to compare the general model (MG), specifying all effects a_{i} and residual variances
The results presented in Table 2 show that adequate approximation of a(μ) results in an appreciable increase in the power of both tests: H_{1} vs. H_{0} (presence of a QTL, allowing for a_{i} = const and
With simulated data, it is easy to compare “the adequate” and “nonadequate” approximations simply because we know the employed model. The results in Table 3 illustrate this point. As one can see, the adequate model (MA_{3}) gave the highest power of detection of both the presence of the QTL in question and QTL × E interaction, and the most accurate and precise estimate of QTL location. Note that even the poorest approximate model (MA_{1}) resulted in a higher power of QTL detection and a better estimate of location than the best single-environment model (i.e., for the environment where the QTL effect was the highest). However, the situation is quite different when real data are analyzed, i.e., when no prior information exists on the form of a(μ). Thus, the decision about the adequacy should be justified using statistical criteria. This can be done on the basis of the dependence of the evaluated significance level on the degree of the applied polynomials. The corresponding results for the situation S_{2} are presented in Table 4.
Table 4 illustrates the possibility of deducing the adequate approximation of the QTL × E interaction from the analysis of the obtained LOD scores. The columns β_{et} = β(α) show the power of detection of QTL × E interaction for each of the presented models for three levels of significance (5, 1, and 0.1%). It is noteworthy that the critical values of the test statistic (see Equation 3b) were determined by using: (1) the asymptotic χ^{2} distribution, and (2) Monte Carlo simulations with 5000 runs for each of the models (data in brackets). The obtained results showed a remarkable proximity of these two estimates of the power for all of the models. Clearly, such a correspondence may be disturbed when a QTL not accounted for by the model affects the residual genetic variation, causing correlation between environments (see below). As in Table 3, the highest power of detection of QTL × E interaction and the most precise estimate of QTL location were obtained with model MA_{3}. It is not surprising that MA_{3} is superior to MG. Less expected is the fact that the nonadequate approximations MA_{2} and MA_{4} were also superior to MG, whereas the poorest approximation, MA_{1}, gave the results closest to MG, but with fewer parameters. Thus, it is not mandatory to have the adequate approximation to take advantage of the proposed method. It is sufficient to provide a good approximation. Nevertheless, how can we decide about the adequate model, provided the class of the approximation functions is chosen correctly?
To address the last question, the following procedure was employed. For each run, the data were analyzed using all of the models (MA_{1}–MA_{4}, and MG), and models that detected QTL × E interaction at the level of significance α were chosen. Then, the model that (1) significantly exceeded (at some level α*) all of the simpler models and (2) did not differ significantly (at α*) from the more complex models was selected as adequate. The general model also participated in this competition as the most complex one, because of the number of parameters needed. The resulting distribution of the choices of the adequate model is presented in the last three columns of Table 4. It allowed us to conclude that: (1) model MA_{3} is an adequate model because it was chosen in more than half of the runs where the QTL × E interaction was detected, and with a frequency that is threefold higher than the next best choice; (2) the models of the polynomial class were chosen 25–30 times more often than the exact general model MG. Moreover, even the simplest approximation, MA_{1}, would be selected 4–6 times more frequently than MG.
Two QTLs: When several QTLs segregate simultaneously in the mapping population, their effects will generate correlations between trait measurements across environments, which should be taken into account. One of the possible ways to account for this correlation is through simultaneous analysis of multiple traits, taking the trait values in different environments as different quantitative traits (Korol et al. 1987, 1994, 1995; Jiang and Zeng 1995; Ronin et al. 1995). However, the multiple trait analysis limits the number of environments, because it is associated with an increased number of parameters. The approach proposed in this paper does not have this drawback, but introduces other sources of distortions: (1) correlations caused by unaccounted QTLs, and (2) approximated description of QTL dependence on environment based on the bioindication assumption.
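The selection rule just described can be sketched as a sequence of likelihood ratio comparisons among nested models ordered by complexity. The code below assumes that the models passed in have already shown significant QTL × E interaction, that each has a distinct number of effect parameters, and that simpler models are nested in more complex ones. The log-likelihoods and parameter counts are hypothetical illustration values, not results from Table 4, and the fixed table of 5% chi-square thresholds stands in for whichever level α* is chosen.

```python
# Upper 5% points of the chi-square distribution for df = 1..10.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070,
                6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919, 10: 18.307}


def significantly_better(complex_model, simple_model, crit=CHI2_CRIT_05):
    """Likelihood ratio test of a more complex model against a nested simpler one."""
    df = complex_model["n_params"] - simple_model["n_params"]
    lr = 2.0 * (complex_model["loglik"] - simple_model["loglik"])
    return lr > crit[df]


def select_adequate_model(models):
    """First model that beats every simpler one and is not beaten by any more complex one."""
    ordered = sorted(models, key=lambda m: m["n_params"])
    for i, candidate in enumerate(ordered):
        beats_simpler = all(significantly_better(candidate, s) for s in ordered[:i])
        beaten = any(significantly_better(c, candidate) for c in ordered[i + 1:])
        if beats_simpler and not beaten:
            return candidate["name"]
    return None


# Hypothetical fits: n_params counts only the parameters describing the QTL effect.
fits = [
    {"name": "MA1", "n_params": 2, "loglik": -520.0},
    {"name": "MA2", "n_params": 3, "loglik": -512.0},
    {"name": "MA3", "n_params": 4, "loglik": -505.0},
    {"name": "MA4", "n_params": 5, "loglik": -504.6},
    {"name": "MG", "n_params": 10, "loglik": -502.0},
]
chosen = select_adequate_model(fits)
```

In this illustration the cubic model is retained: it improves significantly on the linear and quadratic fits, while neither the quartic nor the general model adds a significant improvement over it.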
Consider the first problem. We should now evaluate to what extent correlations between environments caused by unaccounted QTLs may affect the efficiency of the proposed approach. The second problem will be treated in the next section and in the discussion.
For the simulated cases of two QTLs segregating in the mapping population (S_{4} and S_{5}), we first analyzed the consequences of applying the proposed approach of accounting for QTL × E interaction while ignoring the correlations caused by the effect of Q_{2}/q_{2} (model 1, Figure 3). Then, we reevaluated the results by applying the proper model (model 2, Figure 3). In these simulations, we considered two situations of relative effects of the “target” QTL (Q_{1}/q_{1}) and of the cosegregating QTL (Q_{2}/q_{2}): Q_{1}/q_{1} and Q_{2}/q_{2} have comparable effects on the target trait though a_{1} < a_{2} (S_{4}), and Q_{2}/q_{2} is much stronger than Q_{1}/q_{1} (S_{5}). The residual variance
First, compare the accuracy of the QTL mapping obtained employing model 1 for situation S_{4}, with those of S_{3} where only the effect of Q_{1}/q_{1} was simulated (the situations S_{3} of Table 2 and S_{4}, model 1, Figure 3). In both cases, the results clearly demonstrate the superiority of the approximated model MA. Hence, provided that the effect of a cosegregating QTL, Q_{2}/q_{2}, does not considerably exceed the effect of the target QTL, Q_{1}/q_{1}, the proposed approach provides accurate results even if the effect of Q_{2}/q_{2} is ignored. However, this may not be the case with larger effects of Q_{2}/q_{2}, as demonstrated by the results for S_{5} (model 1, Figure 3). In general, a correct model should account for the genetic components of the residual variation in the alternative genotypic groups of the target QTL, causing correlation between trait values across environments (e.g., Jiang and Zeng 1995). This is also true for the method proposed here of mapping analysis with data measured in multiple environments.
Two possibilities exist for considering the effects of cosegregating QTLs in the mixture mapping model. The first is to represent all QTL groups (four, in our case, of two QTLs cosegregating in a doubled haploid or backcross population) in the likelihood function. Although this procedure is not feasible for mapping multiple QTLs across the genome, it may be very useful in cases of linked QTLs. The second is to include in the mixture model the effects of the cosegregating QTLs as cofactors derived from regression analysis on marker loci (Zeng 1994; Jansen and Stam 1994). The proposed approximated method is equally applicable in both of these approaches. Here, we demonstrate it using the first approach. Although this mixture formulation is more challenging technically, it allows for a proper analysis of a potential variance effect of the cosegregating QTL (although we do not deal with this problem here). It is not obvious how to model such an effect of a second QTL with regression cofactors. As was shown earlier, a variance effect of a QTL may result in increased accuracy of the mapping model if it is included in the model, and may seriously reduce the accuracy with an inadequate model (Korol et al. 1996).
With two QTLs, four densities f_{q1q1q2q2}(x), f_{q1q1Q2Q2}(x), f_{Q1Q1q2q2}(x), and f_{Q1Q1Q2Q2}(x) should be considered. Consequently, in calculations of the maximum likelihood function, instead of four marker groups for a current interval, it is necessary to characterize 16 marker groups for any pair of nonadjacent intervals. The results of applying the full MG model and the approximated polynomial MA model are presented in Figure 3 (model 2). It is noteworthy that the proposed approximation of the environmental dependence of the QTL effect as a function of the mean value of the trait in a given environment was applied not only to the target QTL Q_{1}/q_{1}, but also to the cosegregating Q_{2}/q_{2}. This approach may be especially attractive when there are many cosegregating QTLs with environment-dependent effects (e.g., as regression cofactors on respective marker loci), because it results in far fewer parameters. As expected, the full model 2 increased the power of detection of the Q_{1}/q_{1} effect on the trait x and of the (Q_{1}/q_{1}) × E interaction (not shown), and also increased the accuracy of the estimates of the chromosome position of Q_{1}/q_{1} and of its effects in the individual environments (model 2, Figure 3). Again, MA performed better than MG. This conclusion is also supported by the values of AIC.
Several QTLs: As we have seen, a strong QTL, if not accounted for by the model, may cause correlations between environments, resulting in reduced accuracy of the estimated parameters. Nevertheless, the distortion caused by a QTL comparable with the target one (e.g., exceeding the target effect no more than two times) is not dramatic (see Figure 3). Including the effects of cosegregating QTLs in the model solves this problem. This can be done by combining the proposed approach with regression cofactors. However, an appreciable proportion of genetic variation for the analyzed trait may still remain in the residuals, because of the combined effect of many small QTLs. This residual genetic variation may be severalfold larger than the effect of the target QTL. Would the resulting correlation between environments preclude the application of the method?
To address this question, let us consider the case S_{6} (for detailed specification see Table 1). Here, the genetic variation of the trait depends on the target QTL (Q_{1}/q_{1}) (with an average h^{2} ~ 2.5% across environments) and 10 additional unlinked QTLs (Q_{2}/q_{2}–Q_{11}/q_{11}). The average (across environments) effect of Q_{2}/q_{2} was h^{2} ~ 10%, whereas the combined average effect of Q_{3}/q_{3}–Q_{11}/q_{11} was 15%. Thus, the total effect of Q_{2}/q_{2}–Q_{11}/q_{11} is 10-fold that of Q_{1}/q_{1}, whereas the effect of Q_{3}/q_{3}–Q_{11}/q_{11} is sixfold that of Q_{1}/q_{1}. One may expect that the power of detection of Q_{1}/q_{1} × E interaction will be very low if the segregation of Q_{2}/q_{2}–Q_{11}/q_{11} is not accounted for by the model, hence causing correlation between the environments. This is indeed the case, as can be seen from Table 5 (first row for N = 1,3,5). It is noteworthy that in this case employment of the asymptotic distribution for the critical values of the test statistic gives seriously upward-biased estimates β_{et} = β_{et} (α) of the power of detection of Q_{1}/q_{1} × E interaction (compared to the estimates β_{es} = β_{es} (α) obtained using Monte Carlo simulations with 5000 runs).
As we see from the foregoing results for the two-QTL situations (S_{4} and S_{5} in Figure 3), an unaccounted QTL will not seriously affect the results for the target QTL if its effect does not exceed the target one by too much (e.g., not more than twofold). In the current situation S_{6}, each of the simulated effects of Q_{3}/q_{3}–Q_{11}/q_{11} fits this condition, whereas this is not true for their combined effect or for the individual effect of Q_{2}/q_{2}. It is interesting to explore whether including Q_{2}/q_{2} in the model as a cofactor will improve the situation. This is indeed an important question, because in practice sufficiently strong QTLs can be compensated for in such a way (Jansen and Stam 1994; Zeng 1994), but this does not guarantee that the residual variation caused by many small polygenes will not exceed the target effect several times over, thus preventing the application of the proposed method.
The data presented in the second row of Table 5 show that including Q_{2}/q_{2} as a cofactor in the model substantially improved the situation by increasing the detection power of Q_{1}/q_{1} × E interaction two- to fivefold (for α ranging from 0.05 to 0.001) and the precision of the estimated location of Q_{1}/q_{1} more than twofold. Note that in this case the χ^{2} distribution appeared to be a very good approximation for the distribution of the test statistic (compare corresponding values of β_{et} and β_{es}) in spite of the noise caused by Q_{3}/q_{3}–Q_{11}/q_{11}. It would be quite desirable to get some idea of the distorting effect of the correlations caused by the joint action of the unaccounted QTLs Q_{3}/q_{3}–Q_{11}/q_{11}. Therefore, for comparison we provide in the third row the results for the case where all the residual genetic variation caused by Q_{3}/q_{3}–Q_{11}/q_{11} is replaced by nongenetic variation. We can conclude that distortion of the basic model assumption of “no correlation between environments” caused by the presence of Q_{3}/q_{3}–Q_{11}/q_{11}, which collectively exceed by a factor of six the effect of the target QTL Q_{1}/q_{1}, is incomparably smaller than that caused by a single QTL, Q_{2}/q_{2}, which exceeds the target QTL only by a factor of four. The same analysis was conducted when, instead of Q_{1}/q_{1}, another QTL was considered as the target (Q_{3}/q_{3} or Q_{5}/q_{5}). The results are presented in the remainder of Table 5 and manifest the same pattern.
Missing data: One can hardly expect that all genotypes will be perfectly represented in all of the environments where the experiment was conducted. Some data will be missing, hence it is of interest to get some idea of how this could affect the power of QTL × E detection. Our approximate model allows us to treat this problem easily. It appeared that with a large number of environments, even if a large proportion of genotypes is not represented in each environment, the resulting power of the test of QTL × E interaction and the location accuracy of the target QTL are quite high. The Monte Carlo simulations presented in Table 6 illustrate this point. It is noteworthy that if only 20–50% of the data are available in each of the 50–100 environments, the approximated model is still very satisfactory even when a suboptimal approximation is used (compare the results for MA_{1}, MA_{2}, and MA_{3} for the two examples with the situation S_{4}). Clearly, an attempt to apply the general model would mean an unrealistic task of estimating about 100–200 parameters, in contrast to our model, which needs only eight parameters.
Example of application: The trait “alpha amylase activity” from a barley QTL × E study presented in Figure 1 (see Hayes et al. 1993, 1996) was used to demonstrate the utility of the proposed procedure. From previous analyses, the largest QTL effect for this trait was associated with segregation on chromosome 1 (Hayes et al. 1996). Thus, according to the simulation results of the previous section, even if one ignores the effects of other genomic segments when dealing with markers of chromosome 1, we did not expect a serious reduction in the efficiency of the mapping analysis. As shown in Figure 1, the estimates of a_{i} for this trait obtained for separate environments can be approximated as a quadratic parabola of the mean value of the trait over the environments. This approximation was used to construct a combined model for testing the QTL × E interaction effect and to estimate the QTL location on chromosome 2 (Table 7).
The first step was to decide whether variation in
An important question is whether the two models, MA or MG, differ significantly provided H_{2} {a_{i} ≠ const} is true. Such a comparison was conducted for both considered situations, i.e., with
DISCUSSION
The simulations and the example from the barley multiple-environment experiment demonstrate the utility of the proposed approximate approach for analyzing QTL × E interaction. Its main benefit is the ability to use data collected from a large number of environments without the necessity of increasing the number of parameters. Earlier, an elegant solution to this problem was proposed by Jansen et al. (1995). Their QTL mapping model includes explicit terms describing the effects of the target QTL and regression cofactors of cosegregating QTLs, the effects of multiple environments, and the QTL × E interactions. However, such an analysis is limited to situations where the environments can be explicitly characterized by some physical attributes. When such characteristics are not available, the application of the general QTL × E mapping model (our foregoing MG model) involves a tremendous number of parameters. The method proposed in this paper overcomes, though in an approximate form, both these obstacles, allowing us to analyze QTL × E interactions across a large (in fact, unlimited) number of “anonymous” environments. Expressing the dependence of a QTL effect on environmental conditions as a function of the environmental mean value of the trait can also be applied to multiple QTLs from independent genomic regions. Therefore, the proposed approach could be very helpful in coping, albeit in an approximate form, with a difficult problem of QTL mapping analysis, i.e., the rapid increase in the number of parameters with an increasing number of effective QTLs and environments. This improves our ability to extract more mapping information when more environments are used to evaluate the quantitative trait.
In addition to the large number of parameters to be estimated, the general model MG of QTL × E interaction fails to account for correlation between environments caused by cosegregating QTLs not included in the model. While the first problem is not critical for our method, the second one may be more serious. The foregoing simulations showed that unaccounted QTLs with a strong individual effect may indeed reduce both the power to detect QTL × E interaction and the accuracy of parameter estimation by the proposed approximate method. Therefore, including such QTLs as cofactors in the model is mandatory in applications. However, such compensation cannot be perfect, and a significant genetic component may remain in the residual variation. An important question is whether this residual genetic variation, which can be severalfold larger than the effect of the target QTL, will produce correlation between the environments that precludes the application of the method. Our simulations allowed us to conclude that the distortion of the basic model assumption of “no correlation between environments” caused by the segregation of several small QTLs, which collectively exceed the effect of the target QTL by a factor of six, is much smaller than that caused by a single strong QTL, which exceeds the target QTL by only a factor of four (see Table 5). Thus, undetectable small QTLs will not seriously reduce the resolving power of the proposed method, even if their combined effect is severalfold higher than that of the target QTL.
An important question is how to identify an adequate approximation of the QTL × E interaction. With simulated data, it is easy to compare adequate and inadequate approximations simply because we know the degrees of the polynomials employed in the simulations. However, the situation is quite different when real data are analyzed. Thus, the decision about the adequacy of the approximation should be justified statistically, i.e., we should decide on the adequate model, provided the class of approximating functions is chosen correctly. This allows us to conclude that: (1) the adequate model MA_{3} was the best, i.e., it was chosen in more than half of the runs where the QTL × E interaction was detected and with a frequency that was threefold higher than the next best choice.
The last and most difficult problem is how to recognize situations in which the applied approximation is not valid. If the opposite is true, i.e., if the dependence of the QTL effect on environmental conditions can indeed be presented in the form of a regression on mean values or any other bioindicators, then the proposed approximate method proved to give higher power for detecting QTL × E interaction than the exact general model (MG). Thus, one can start the procedure using the approximate method, though the general model can also be applied in parallel if the number of environments, and hence the number of parameters for MG, is not unrealistically large. However, if the approximate analysis reveals no significant QTL × E interaction, does this really mean that the QTL effect is independent of environmental conditions? Or might the interaction exist but not be representable as a regression of the target QTL effect on the mean values of the trait or on some other bioindicators?
Consider one of the possible ways to cope with this problem. If the general model is applicable, i.e., the number of parameters is not too large, it may be used as a tool to answer the foregoing question. Rejection of the H_{0} hypothesis “no QTL × E interaction” by MG will mean that our basic assumption (regression on the bioindicator) does not fit the data. If the number of environments is too large, the general model can be applied to randomly chosen groups of environments. Then, the significance of the interaction may be evaluated from the obtained distribution of the tests using the Bonferroni correction. For example, with N = 100 environments, one can produce k = 20 samples, each including data from m = 10 randomly chosen environments. Let α be the accepted level of significance for the QTL × E interaction test for the whole set of samples. Then, assuming independence of these samples, one can reject the H_{0} hypothesis if at least one of the samples achieves the significance level α/k. Clearly, because independence is postulated, which cannot hold when mk > N, this is a conservative test of QTL × E interaction. Nevertheless, it seems preferable to the standard way of analyzing multiple-environment data, in which the data from each environment are treated separately and the final conclusion is derived from the analysis of the estimated QTL effects across environments (Paterson et al. 1991; Stuber et al. 1992; Utz and Melchinger 1996).
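The subsampling test described above can be sketched as follows. This is only a minimal illustration, not the procedure used in the paper: the function name `subsample_interaction_test` and the callable `pvalue_for_subset` (which would fit the general model MG to a given group of environments and return the P value of its QTL × E interaction test) are hypothetical placeholders.

```python
import random
from typing import Callable, List, Sequence, Tuple


def subsample_interaction_test(
    n_env: int,
    k: int,
    m: int,
    alpha: float,
    pvalue_for_subset: Callable[[Sequence[int]], float],
    seed: int = 0,
) -> Tuple[bool, List[float]]:
    """Bonferroni-corrected QTL x E test over k random groups of m environments.

    pvalue_for_subset is a hypothetical user-supplied function that runs the
    general-model interaction test on the listed environments (by index)
    and returns its P value.
    """
    rng = random.Random(seed)
    threshold = alpha / k  # Bonferroni-adjusted per-sample significance level
    pvalues = []
    for _ in range(k):
        # Draw m distinct environments out of n_env for this sample.
        subset = rng.sample(range(n_env), m)
        pvalues.append(pvalue_for_subset(subset))
    # Reject H0 ("no QTL x E interaction") if any sample reaches alpha / k.
    reject = min(pvalues) <= threshold
    return reject, pvalues


if __name__ == "__main__":
    # Dummy stand-in for the model fit: always reports P = 0.001.
    reject, pvals = subsample_interaction_test(
        n_env=100, k=20, m=10, alpha=0.05,
        pvalue_for_subset=lambda envs: 0.001,
    )
    print(reject, len(pvals))
```

With N = 100, k = 20, and α = 0.05, the per-sample threshold is α/k = 0.0025. Each group is drawn independently of the others, so the groups can share environments, which is one reason the independence assumption is only approximate.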
The foregoing test based on the general model may result in the same conclusion as the approximate model, i.e., “no QTL × E interaction.” By contrast, if the general model detects QTL × E interaction but the approximate model does not, this indicates that the proposed bioindicator(s) is not informative and that other explanatory factors should be sought. Further studies are needed to develop better algorithms for applying the proposed approach to a large number of environments (when direct utilization of the general model is impossible). However, even in its current form, the drawbacks of the proposed method are compensated by the possibility of working with an unlimited number of environments with missing data, and with a remarkable reduction in the number of parameters needed, as compared to the usual way of testing for QTL × E interactions based on ANOVA treatment of QTL estimates obtained from single-environment analysis.
Acknowledgments
We are grateful to the North American Barley Genome Mapping Project (Dr. P. M. Hayes) for the data set employed in our illustrative example. We are very thankful to the anonymous referees for their relevant comments and suggestions, which helped to improve the manuscript. This work was supported by the Israeli Ministry of Absorption and Ministry of Science and the Ancell-Teicher Research Foundation for Genetics and Molecular Evolution.
Footnotes

Communicating editor: B. S. Weir
 Received December 9, 1996.
 Accepted December 31, 1997.
 Copyright © 1998 by the Genetics Society of America