Genetics, Vol. 156, 2043-2050, December 2000, Copyright © 2000

A Mixed-Model Approach to Mapping Quantitative Trait Loci in Barley on the Basis of Multiple Environment Data

Hans-Peter Piephoa
a Institut für Nutzpflanzenkunde, FB 11, Universität Kassel, 37213 Witzenhausen, Germany

Corresponding author: Hans-Peter Piepho, Universität Kassel, Steinstrasse 19, 37213 Witzenhausen, Germany., piepho{at}wiz.uni-kassel.de (E-mail)

Communicating editor: J. B. WALSH


*  ABSTRACT
*TOP
*ABSTRACT
*MATERIALS AND METHODS
*EXAMPLE
*DISCUSSION
*LITERATURE CITED

In this article, I propose a mixed-model method to detect QTL with significant mean effect across environments and to characterize the stability of effects across multiple environments. I demonstrate the method using the barley dataset by the North American Barley Genome Mapping Project. The analysis raises the need for mixed modeling in two different ways. First, it is reasonable to regard environments as a random sample from a population of target environments. Thus, environmental main effects and QTL-by-environment interaction effects are regarded as random. Second, I expect a genetic correlation among pairs of environments caused by undetected QTL. I show how random QTL-by-environment effects as well as genetic correlations are straightforwardly handled in a mixed-model framework. The main advantage of this method is the ability to assess the stability of QTL effects. Moreover, the method allows valid statistical inferences regarding average QTL effects.


THERE are several different strategies to map quantitative trait loci (QTL; KEARSEY and FARQUHAR 1998 Down), e.g., single-marker locus analysis (LIU 1998 Down); simple interval mapping (IM; LANDER and BOTSTEIN 1989 Down); composite interval mapping (CIM; ZENG 1993 Down, ZENG 1994 Down), also called multiple QTL mapping (MQM; JANSEN and STAM 1994 Down); simplified CIM (TINKER and MATHER 1995 Down); marker regression (KEARSEY and HYNE 1994 Down; WU and LI 1994 Down); Bayesian methods (SILLANPAA and ARJAS 1998 Down); and multiple interval mapping (MIM; KAO et al. 1999 Down; ZENG et al. 1999 Down). The latter methods have been shown to yield better power of QTL detection than IM and single-marker locus analysis (LIU 1998 Down; LYNCH and WALSH 1998 Down). In this article the main focus is on CIM for its simplicity and its importance in practice.

Within a frequentist framework, two different model-fitting approaches can be distinguished among procedures for QTL mapping by CIM: those based on maximum-likelihood (ML) estimation (LANDER and BOTSTEIN 1989 Down; ZENG 1994 Down) and those using multiple linear regression (HALEY and KNOTT 1992 Down; MARTINEZ and CURNOW 1992 Down, MARTINEZ and CURNOW 1994 Down). The latter approach usually gives results very similar to the former, though it must be seen as an approximate method. The main advantage of the multiple regression approach is its computational simplicity (MARTINEZ and CURNOW 1994 Down). The computational tradeoff may become quite considerable in complex situations, i.e., when the model is extended to cover several random effects, as in this article. I therefore prefer the regression approach to ML and also to Bayesian methods using cofactors (SILLANPAA and ARJAS 1998 Down).

In this article, I analyze the barley dataset by the North American Barley Genome Mapping Project (HAN and ULLRICH 1993 Down). This dataset comprises 16 trials performed in different environments. The aim of the analysis is to detect QTL with significant mean effect across environments and to characterize the stability of effects across multiple environments. A need for mixed modeling arises in two ways. First, for an assessment of mean effects and of stability, it is necessary to regard environments as a random factor. Second, I expect a genetic correlation among pairs of environments caused by undetected QTL (KOROL et al. 1998 Down). Random QTL-by-environment effects as well as genetic correlations are straightforwardly handled in a mixed-model framework. The main advantage of my method is the ability to assess the stability of QTL effects. Moreover, the method allows valid statistical inferences regarding average QTL effects.


*  MATERIALS AND METHODS
*TOP
*ABSTRACT
*MATERIALS AND METHODS
*EXAMPLE
*DISCUSSION
*LITERATURE CITED

The data:
I use the six-row Steptoe/Morex mapping population by the North American Barley Genome Mapping Project (HAN and ULLRICH 1993 Down). The data comprise 150 doubled haploid (DH) lines, 223 markers, and 16 environments in the United States and Canada in 1991 and 1992. The same 150 genotypes were tested in all environments. The data are partially replicated with two replications. I use genotype-by-environment means for different analyses.

The model for a single environment:
The model will be adjusted to DH and backcross progeny data, but it is readily extended for F2 and other populations. Assume that the F1 cross is M1QM2/m1qm2 with respect to the two flanking markers bordering the interval of interest (M1 and M2) and the QTL (Q). Define a random variable gi from the ith genotype taking value gi = 1 if the F1 gamete carries the Q allele and gi = 0 if the F1 gamete carries the q allele. In the regression approach to QTL mapping (HALEY and KNOTT 1992 Down; MARTINEZ and CURNOW 1992 Down), the phenotype is regressed on the expected value of gi, given the flanking markers. Table 1 gives explicit expressions for zi = E(gi|flanking markers) assuming no interference (MARTINEZ and CURNOW 1992 Down). Missing marker data can be handled using the method of MARTINEZ and CURNOW 1994 Down. In the case of backcross progeny, the genetic effect for zi constitutes a mixture of additive and dominance effects, whereas for doubled haploids, the genetic effects are confined to additive effects. For CIM using DH data, the following basic regression model can be used (LYNCH and WALSH 1998 Down, p. 465),

(1)

where yi is a phenotypic observation for the ith genotype (i = 1, ... , N); µ is an intercept term; {alpha} is the effect of the putative QTL; c1i, c2i, ... are cofactors corresponding to markers on the map, which control for other QTL; {gamma}1, {gamma}2 ... are the associated regression coefficients; and ei is a residual accounting for both environmental variation and unexplained genetic variation. The residual genetic variation modeled by ei is regarded as random. If the unexplained residual genetic variation captured by ei is made up of a sum of small genetic contributions, the normality assumption for ei may be a suitable approximation. Note, however, that ei will also contain a component due to the different genetic effects at the putative QTL, so a more realistic model is a mixture of normal distributions, with the number of components depending on the number of genotypes at the putative QTL. The normality assumption for ei is a matter of convenience, allowing model fitting by ordinary least squares in the familiar regression framework, rather than by ML. Several authors have pointed out that using least squares in place of ML tends to cause only a marginal loss of information (HALEY and KNOTT 1992 Down; MARTINEZ and CURNOW 1992 Down). Also, ML estimation has large sample optimality properties only when the model is correctly specified. In most applications, the true underlying genetic model will be complex and any fitted model can at best be regarded as an attempt to approximate the true model as closely as possible (BURNHAM and ANDERSON 1998 Down). If there are a number of QTL not accounted for by the fixed part of the model and errors are normal, the residual will be a mixture of an unspecified number of component normals. Approximating this mixture by a normal distribution should be adequate in most circumstances. Model (1) may be expressed in vector notation as

(2)

where ß = (µ, {alpha}, {gamma}')' with {gamma} = ({gamma}1, {gamma}2, ... )', x'i =(1, zi, c'i), and c'i = (c1i, c2i, ... ). The parameter vector ß is easily extended to include multiple QTL and effects for epistasis (MORENO-GONZALEZ 1992 Down), but this is not elaborated here.


 
View this table:
In this window
In a new window

 
Table 1. Expectation of g conditional on flanking markers for a DH population (MARTINEZ and CURNOW 1992 Down)

Model for data from multiple environments:
The basic model (1) may be taken as a building block for more refined modeling to account for the design. Here, I am particularly interested in modeling data from multiple-environment trials (METs). Specifically, I contend that a realistic model for genotype-by-environment effects is needed to make unbiased inferences regarding QTL effects and positions. Moreover, the problem of genetic correlation among environments needs to be taken into account in case the same set of genotypes is tested in the different environments (KOROL et al. 1998 Down). Let yij be the observation of the ith genotype (i = 1, ... , N) in the jth environment (j = 1, ... , M). The proposed model is

(3)

This model is derived by regarding all regression parameters in (1) as means across environments and allowing an environment-specific deviation. Thus, the environmental main effect uj corresponds to µ, aj is a deviation of the QTL effect from the average {alpha} in the jth environment, gkj (k = 1, 2, ... ) is a deviation from the average cofactor regression coefficient {gamma}k (k = 1, 2, ... ) in the jth environment, and dij is a random deviation for the ith genotype in the jth environment from the "average" residual effect ei. Note that as opposed to (1), ei in (3) is a residual genetic effect for the ith genotype, which is free of experimental error. The experimental error is now captured by dij, which models both error and residual genotype-by-environment interaction. An alternative way of deriving (3) is to allow a separate model of the form (1) for each environment, to then combine all models into a single model, and finally to partition model parameters into average effects across environments and environment-specific deviations.

All environment-specific deviations (uj, aj, gkj, and dij) and the residual genetic effect ei are regarded as random normal deviates. To fully state the model, I need to specify the variances and covariances of all random terms. The variance-covariance structure should allow sufficient generality to realistically model real data. Before stating the full second moment assumptions, model (3) is modified to allow more generality. Model (3) contains a residual genetic effect ei and a residual dij. This model corresponds to the usual factorial partitioning of main effects and interaction for genotype-by-environment data in plant breeding and quantitative genetics (LYNCH and WALSH 1998 Down). To highlight the fact that dij contains both error and genotype-by-environment interaction, I may write dij = fij + hij, where fij is the interaction and hij is error. The customary assumption in mixed-model analyses, henceforth denoted as compound symmetry assumption, is that ei, fij, and hij are identically distributed with zero mean and constant variances, {sigma}2e, {sigma}2f, and {sigma}2h, respectively. This assumption is quite restrictive because it implies constancy of genetic correlation among pairs of environments as well as constancy across environments of genetic variances within environments. For this reason, I replace the term ei + dij by a term eij and initially assume that the random vector ei = (e1i, ... , eiM)' has unstructured variance-covariance matrix var(ei) = R, where R is symmetric and positive definite. I then explore various structured models for R, which have fewer parameters than in the unstructured case. The compound symmetry assumption is just a special case of this more general model with R = JM{sigma}2e + IM{sigma}2, where JM is a M x M matrix of ones everywhere, IM is the M-dimensional identity matrix, and {sigma}2 = {sigma}2f + {sigma}2h. The modified scalar model reads

(4)

In vector notation this can be written as

(5)

where ß is as defined in (1) and bj = (uj, aj, g'j)' with gj = (g1j, g2j, ... )'. The main interest of our analysis is in the average QTL effects in ß, while the random vector bj basically plays the role of an error term. To cater for generality, correlation among elements in bj is allowed. Details are discussed in the next paragraphs. A desirable feature of the model is this: instead of dropping certain genetic effects completely, I can choose whether to move them to ß and bj, or to eij. I now discuss suitable variance-covariance structures for eij and bj.

Genotype-by-environment effects: I regard the genetic components in eij as random. When testing the same set of genotypes in the various environments, as has been assumed so far, genetic correlation among observations on the same genotype made in different environments needs to be allowed for. Many articles on the mapping of QTL based on multienvironment data corresponding to this design (JANSEN et al. 1995 Down; BEAVIS and KEIM 1996 Down; ROMAGOSA et al. 1996 Down; SARI-GORLA et al. 1997 Down; KOROL et al. 1998 Down) have employed a multiple regression model with independent errors. This model implicitly assumes absence of genetic correlation, which is an unrealistic assumption. It is to be expected that in the presence of positive genetic correlation, the information on QTL position provided by data from a sample of environments is smaller than when genetic correlation is absent. This is best explained by a simple example (not specifically designed for QTL analysis). Consider two observations y1 and y2. The mean (y1 + y2)/2 has variance , where {sigma}21 and {sigma}22 are the variances of y1 and y2, respectively, and {rho} is the correlation. Clearly, the variance of the mean is an increasing function of the correlation {rho}. Similarly, in QTL mapping, accuracy of parameter estimates is expected to decrease with genetic correlation. Ignoring genetic correlation can therefore lead to overoptimistic inferences, i.e., to spurious detections and to inappropriate standard errors for parameter estimates. For this reason, I principally allow correlations among elements in ei. A variety of models for R = var(ei) can be considered. The most complex choice leaves R unstructured, and the simplest model is R = IM{sigma}2. While the unstructured model is often the most realistic one, it may entail an unnecessarily large number of parameters, when the number of environments (M) is large. Overparameterization may be avoided by imposing a certain variance-covariance structure. I consider models commonly used for the analysis of genotype-by-environment data, i.e., compound symmetry (R = JM{sigma}2e + IM{sigma}2), heteroscedastic (R = D and R = JM{sigma}2g + D, where D is a diagonal matrix; SHUKLA 1972 Down), and various factor-analytic models, which are characterized by a component {lambda}{lambda}', where {lambda} = ({lambda}1, ... , {lambda}M)' is a vector of factor loadings associated with the individual environments (PIEPHO 1998A Down). The simplest factor-analytic variance-covariance structure is given by R = {lambda}{lambda}' + IM{sigma}2, corresponding to the model eij = {lambda}jui + wij, where ui is a standard normal score for the ith genotype and wij is normal with zero mean and variance {sigma}2. Environments with a large absolute value for the factor loading {lambda}j will have residuals eij more widely spread out than environments with small {lambda}j. I note in passing that model (3) is also applicable if a different set of genotypes is tested in each environment. In this case eij from different environments are stochastically independent; i.e., R is diagonal. The diagonal elements may be either homogeneous (R = IM{sigma}2) or heterogeneous (R = D).

QTL-by-environment effects and environmental main effect: In my analysis inferences are to be drawn with respect to a target population of environments. I regard the testing environments as a random sample from the target. The purpose of a mixed-model analysis is to reveal mean QTL effects across environments. Here, random QTL-by-environment interaction essentially plays the role of an error term. Moreover, the stability of QTL effects across environments is an important aspect. The larger the variance of QTL-by-environment effects the lower the stability. Finally, I can make environment-specific inferences, employing best linear unbiased predictions (BLUPs) of QTL-by-environment effects.

It is necessary to allow for correlation among the effects in bj. For example, a perfect correlation must be assumed for regression coefficients pertaining to adjacent markers, since both are linear in the additive genetic effect of the flanked QTL (WHITTAKER et al. 1996 Down). Also, variances of two components in (bj) corresponding to a pair of markers will have to be heterogeneous, considering the explicit expressions given in WHITTAKER et al. 1996 Down(p. 25, right, bottom). Moreover, different QTL may be responding similarly to differential environments, giving rise to positive correlation among regression coefficients corresponding to markers adjacent to different QTL. For these reasons I make the general assumption that

(6)

where G is symmetrical and positive definite, but otherwise unstructured. Note that this assumption also ensures scale invariance, i.e., invariance to the particular coding for markers (0/1, 1/2, -1/1). Again, explicit modeling of the variance-covariance structure is worthwhile to keep the number of parameters low, although many parsimonious structures suffer from lack of scale invariance. I can consider the same structures as for R. It should be stressed that G does not contain genetic effects, but merely QTL-by-environmental effects and an environmental main effect.

The effect of the putative QTL in the jth environment is given by ({alpha} + aj). The diagonal element in G corresponding to aj can therefore be interpreted as a measure of stability of the effect of the putative QTL. The larger the variance of aj, the more variable/less stable are the environment-specific QTL effects ({alpha} + aj). The breeder will seek a large absolute value for {alpha} and a small variance for aj, i.e., a high stability. This interpretation of a variance component as a measure of QTL effect stability is akin to approaches for assessing yield stability in MET data (LIN et al. 1986 Down; BECKER and LEON 1988 Down; PIEPHO 1998B Down).

It is convenient to write the full model for yij in matrix form as

(7)

where y = (y11, ... , yN1, y12, ... , yNM)', 1M is a vector of M ones, X = (x1, ... , xN)', ß = (µ, {alpha}, {gamma}')', IM is the M-dimensional identity matrix, b = (b'1, ... , b'M)', and e = (e11, ... , eN1, e12, ... , eNM)'. Apart from the residual variance, this model is similar to the random effects model for longitudinal data (LAIRD and WARE 1982 Down). The first two moments under this model are

(8)

and

(9)

Why analysis of means across environments is problematic:
If the same set of genotypes is tested in different environments, it is tempting to compute genotype means across environments and subject these to standard CIM. Such an analysis assumes that means of different genotypes, = M-1(1'M {otimes} IN)y, are stochastically independent and have homoscedastic errors (M is the number of environments). This assumption is problematic with model (5), under which I have for the means

(10)

and

(11)

where {sigma}2R = M-21'MR1M. The important point to note here is that all elements in are correlated among one another due to the term M-1XGX' on the right-hand side of (11). Thus, the means violate the assumptions underlying a standard QTL analysis. An appropriate means analysis could be based on generalized least squares using (11), but this is not recommended for two reasons. First, using (11) requires an estimate of G, which necessitates an analysis of replicate data, thus annihilating the computational advantages of the means analysis. Second, and more importantly, the means themselves are unweighted and thus ignore the genetic correlation structure. Therefore the means analysis is not optimal.

Model selection:
In what follows I assume random environments and consider CIM for mean QTL effects across environments. Model selection is necessary regarding three aspects: (1) the markers to be used as cofactors, (2) the variance-covariance structure for R, and (3) the model for QTL-by-environment interaction, i.e., for G. These selection problems are briefly discussed. Unfortunately, the three problems cannot be tackled in an entirely independent manner. For example, in a joint analysis of the data, the chosen variance-covariance model will have an effect on the selected set of markers and vice versa. From a theoretical point of view it seems desirable to handle these three model components simultaneously. Due to the large number of candidate models (i.e., combinations of choices for 1–3), however, this simultaneous approach is not usually feasible in practice. Thus, some form of sequential approach is preferable. While this may entail the risk of missing some good-fitting models, it has the important advantage of reducing the total number of models to be considered. I suggest to first select the markers, then the variance-covariance structure for R, and finally the model for QTL-by-environment interaction (G). At each step, I use the Schwarz Bayesian Criterion (SBC) to choose among options (WOLFINGER 1993 Down; MCQUARRIE and TSAI 1998 Down). The criterion is given by SBC = log L - p log(n), where L is the maximized likelihood, p is the number of parameters, and n is the number of observations. Models with large values for SBC are preferred.

Cofactor selection: I select cofactors by multiple linear regression of means on marker types using the marker pair selection (MPS) approach by H.-P. PIEPHO and H. G. GAUCH (unpublished results). This procedure has three distinctive features: (i) markers are selected in adjacent pairs to increase the chance of selecting flanking markers while reducing the risk of selecting markers not linked to QTL; (ii) an exhaustive search per chromosome is used in place of simple forward selection, which reduces the risk of missing the best fitting model; (iii) a model selection criterion such as SBC is employed to select the final model among a sequence of models. Among a selected pair, I use the marker that fits best as a cofactor for CIM. The procedure was developed for models with a single error term. Extension to the mixed model (4) for the replicate data is straightforward in principle but not generally feasible at present, mainly because of the prohibitive workload of having to fit a multitude of complex models by ML or restricted maximum likelihood (REML). This is the reason for my suggestion to apply MPS to means . The approach is of an ad hoc nature, considering the fact that strictly speaking the means violate the independence assumption. As more efficient software becomes available, applying MPS to the replicate data using a mixed-model framework is preferable.

Genotype-by-environment interaction (R): I initially assume the most general model for QTL-by-environment interaction on the basis of the genetic model corresponding to the selected cofactors (WOLFINGER 1993 Down). At this stage, the model does not yet contain the covariate zi for the pair of markers flanking the putative QTL. Interactions are regarded as fixed, although at later steps of the analysis I take QTL-by-environment interaction as random. No specific model is assumed for the interactions. The model I select for R is used subsequently when modeling QTL-by-environment interaction and when scanning the genome for putative QTL.

QTL-by-environment interaction and environmental main effect (G): I now take QTL-by-environment interactions and the environmental main effect as random. Thus, I have the task of selecting an appropriate variance-covariance structure for G. The need for invariance under recoding of the markers dictates the unstructured model for G, while the parsimony principle suggests that simpler approximating structures may be worth considering. I propose to generally fit an unstructured model, except when the dimension of G is large and an unstructured model is difficult to fit.

Scanning the genome:
The same model as selected for both R and G in the previous steps is used when scanning the genome for QTL. Of course, R and G are reestimated at each putative QTL position. Note that G is extended at the scanning stage by the covariate zi for the two flanking markers. Assuming the same structure as selected for the case where xi contains only the cofactors may not be optimal. It would not be practicable, however, to select a different model at each step during the genome scan. Also, the type of model finally chosen for R and G not only depends on model fit but also on ease of estimation. Some structures for R and G may be well fitting but difficult to estimate (convergence problems, difficulty in choosing good starting values, etc.), thus making them infeasible for automated QTL scans, where the same model has to be estimated a large number of times.


*  EXAMPLE
*TOP
*ABSTRACT
*MATERIALS AND METHODS
*EXAMPLE
*DISCUSSION
*LITERATURE CITED

I used the barley data to exemplify the proposed methods. MPS picked the marker pair M81/M82. Between these two, M82 had a better fit than M81 if fitted alone. Subsequently, the cofactor M82 and the interactions with environments were included in the fixed part of the model and various structures were fitted to R by the REML method. At this stage, xi did not yet contain a covariate zi for the putative QTL. All mixed-model analyses were done using ASREML (GILMOUR et al. 1999 Down). The same genotypes have been tested in all environments, so genetic correlation needs to be modeled in R. Since there are M = 16 environments and hence R is a M x M matrix, there are = 136 parameters for the unstructured model. The results for different models are shown in Table 2. On the basis of SBC the factor-analytic model R = JM{sigma}2e + {lambda}{lambda}' + D was selected.


 
View this table:
In this window
In a new window

 
Table 2. Fitting information for different models for R

The Wald F-statistic for cofactor(M82)-by-environment interaction in b was 10.84, which is significant at {alpha} = 5%. Thus, the interaction term was retained. In the next model-building step, the environmental main effect and interaction with M82 was regarded as random and different structures were fitted to G, using the model R = JM{sigma}2e + {lambda}{lambda}' + D. The results are given in Table 3. The unstructured model was selected for G, although D fitted slightly better according to SBC. The unstructured model is preferred based on the a priori reasoning that it ensures invariance to shifts in the covariates xi. This invariance is also useful when G is augmented by a covariate for the putative QTL.


 
View this table:
In this window
In a new window

 
Table 3. Fitting information for different models for G

I computed parameter estimates and standard errors of the fixed effects based on the selected models for R and G. For comparison, I also computed parameter estimates and standard errors on the basis of a multiple linear regression with means . The results are reported in Table 4. The parameter estimates do not differ much in the two analyses, but the standard errors are larger with the mixed-model analysis. The analysis shows that it is not appropriate to ignore correlations among genotypes as is done in a simple analysis of means.


 
View this table:
In this window
In a new window

 
Table 4. Parameter estimates for fixed effects and standard errors (SE) based on analysis of means and a mixed-model analysis with R = JM{sigma}2e + {lambda}{lambda}' + D and unstructured G (includes a cofactor for M82, but no effect for putative QTL)

The next step is to scan the genome for a QTL by CIM. For this purpose, xi needs to be augmented by the covariate zi for the putative QTL. This again raises the question of how to model G. A priori, the unstructured model seems most appropriate, especially since there is only one cofactor so that parsimony is not a pressing issue. Thus, I used the unstructured model. The window size was 10 cM; i.e., for putative QTL within 10 cM of a cofactor, the cofactor was dropped from the model. The step size of the chromosome scan was 1 cM. During the chromosome scan, parameter estimates for G and R from the present putative QTL position were used as starting values at the next position. This resulted in convergence of the REML algorithm within a few iterations (typically 5–10). [One referee noted that this choice of starting values may cause convergence to a local, but not the global maximum of the (restricted) log-likelihood. In my experience the likelihood of this problem is small with a short step size. In the present example the problem was not observed. The same referee indicated that using fixed starting values gives more stable results.] At each position, I computed a Wald statistic (F) for the test of the null hypothesis of no QTL. Conditionally on the position, F asymptotically follows a {chi}2 distribution with 1 d.f. On chromosome segments with the same set of cofactors for CIM, the type I error rate was controlled using the quick method proposed by DAVIES 1977 Down, DAVIES 1987 Down. An overall threshold controlling the genome-wise type I error rate was obtained by the Bonferroni method. The critical value was 12.83. The F profile for the seven barley chromosomes is shown in Fig 1. There is a significant QTL on chromosome 3 near the 58.9-cM position, where the F-statistic reaches a maximum of 20.62. Chromosome 5 shows some indication of a second QTL, though the F values are not significant. For comparison, I also computed the F profile on the basis of a regression analysis using means . The critical value was 13.01. The profiles are similar in shape, though there are some notable differences to the mixed-model analysis. The peak on chromosome 3 is much more pronounced, reaching a maximum of ~80, while for the mixed-model analysis the maximum value was ~20. At the estimated QTL position on chromosome 3 (58.9 cM), the estimated QTL effect was = -0.475(±0.105) based on the mixed-model analysis. Since the estimated QTL position is within 10 cM of the cofactor M82, this cofactor was dropped for the analysis at this position. The estimate for G was



View larger version (27K):
In this window
In a new window
Download PPT slide
 
Figure 1. F profile for composite interval mapping based on mixed model for genotype-by-environment data (left, critical threshold at F = 12.83) and on simple regression model for genotype means across environments (right, critical threshold at F = 13.01). Chromosomes 1–7.

Thus, the variance of environmental main effects was 2.14, which is notably larger than the variance of the QTL effects across environments (0.137). The variance of QTL effects corresponds to a standard deviation of 0.370, which is fairly large relative to the average QTL effect {alpha}. This shows that the detected QTL is not very stable across environments. For example, assuming normality, the probability that the QTL effect in a randomly chosen environment (given by {alpha} + aj) has positive sign (and thus the opposite sign of {alpha}) is {Phi}(-0.475/0.370) = 0.10, where {Phi} is the cumulative density function of the standard normal distribution. Table 5 gives BLUPs of aj. For 1 out of the 16 environments (SKg92), the resulting estimate of {alpha} + aj has a positive sign. This finding is in good agreement with the estimated probability of 10%. Thus, despite the relatively large average QTL effect, some surprises are possible in specific environments.


 
View this table:
In this window
In a new window

 
Table 5. BLUPs of QTL-by-environment effects aj at position 58.9 cM on chromosome 3 in mixed-model analysis with R = JM{sigma}2e + {lambda}{lambda}' + D and unstructured G


*  DISCUSSION
*TOP
*ABSTRACT
*MATERIALS AND METHODS
*EXAMPLE
*DISCUSSION
*LITERATURE CITED

A common feature of QTL analyses is that QTL effects depend on environment. Many researchers have dealt with this problem by analyzing each environment separately. This approach is quite useful, if one is interested in the particular test environments. As pointed out by TINKER and MATHER 1995 Down, "separate analysis by environment circumvents the problem of dealing with QTL-by-environment interaction and avoids complications due to environmental heterogeneity. However, the results of separate analyses are difficult to interpret, and they do not take advantage of the built-in replication provided by multiple environments." Quite frequently, test environments are just a sample from a target population, and the breeder is interested in making broad inferences not restricted to the particular test environments (MELCHINGER et al. 1998 Down). This objective calls for a mixed-model analysis with random environments (BEAVIS and KEIM 1996 Down). The present article has shown how to implement such an analysis for CIM. The approach is easily extended to cover multiple-QTL and epistatic effects by appropriate modifications of ß and bj (MORENO-GONZALEZ 1992 Down; WANG et al. 1999 Down). This is useful as an additional analysis step, when several QTL have been detected by CIM.

A simple alternative to a mixed-model analysis of MET QTL data is to proceed in two steps as follows: first, the quantitative trait is analyzed by ANOVA techniques to obtain (adjusted) genotype means across environments. Second, the means together with the marker data are submitted to a routine for QTL analysis. My theoretical considerations and analysis of a real dataset led me to conclude that such analyses may lead to inappropriate inferences, mainly because standard error estimates are inappropriate. A mixed-model framework allows more valid inferences to be obtained by incorporation of different random components of variance that appropriately account for the environmental and genetic structure of the data.

WANG et al. 1999 Down use a mixed-model approach similar to the one presented here. There are some differences, the main one being that WANG et al. 1999 Down do not allow for genetic correlation. They exploit map information to model the equivalent of my G. This approach requires a specific genetic model, including epistasis and specification of multiple QTL. It assumes that all genetic effects are modeled by G and that all correlation among effects is solely due to the map. It is to be expected that the approach is susceptible to misspecification of the model. By contrast the model used in this article allows unexplained genetic effects and genetic correlation to be subsumed in the residual eij. As a result there is no need to assume a specific overall model. Moreover, using a general model for G allows for covariance among QTL that is due to correlated response to differential environmental condition. Such correlation can arise even for QTL on different chromosomes, which would be regarded as independent under the model used by WANG et al. 1999 Down. My model could be easily extended to exploit the map in modeling the correlation among cofactors and the putative QTL in G, but I prefer to work with a general model for G both for ease of computation and to reduce the danger of model misspecification. Also, if the approach is extended to multiple QTL and epistatic effects, it is still preferable to work with general G for the same reasons. When the dimension of G becomes large, it is reasonable to consider more parsimonious models such as those used for R.

In this article I have demonstrated how to use mixed models for assessing mean and stability of QTL effects based on genotype-by-environment means from MET data. My mixed-model framework is easily extended to other settings, e.g., when spatial heterogeneity needs to be modeled at the plot level (nearest neighbor analyses; MOREAU et al. 1999 Down) and when the experimental designs give rise to random effects (incomplete blocks, etc.; HALEY and KNOTT 1992 Down). Error variance heterogeneity among different environments in MET can be accounted for by weighting (CULLIS et al. 1996 Down), though in my experience weighting has little effect on final parameter estimates and standard errors.

My analysis has assumed random environments. In a model with fixed environments, the focus is on studying QTL-by-environment interactions. It has been stressed by KOROL et al. 1998 Down that the number of interaction parameters in models such as (4) increases linearly with the number of environments. These authors used the regression approach of EBERHART and RUSSELL 1966 Down to model interactions with fewer parameters. The regression approach by EBERHART and RUSSELL 1966 Down was originally proposed for the analysis of genotype-by-environment interaction, but it can be applied in the same way to model QTL-by-environment interaction, as demonstrated by KOROL et al. 1998 Down. In fact, there are a large number of different models for genotype-environment interaction (LIN et al. 1986 Down; BECKER and LEON 1988 Down; VAN EEUWIJK et al. 1996 Down; PIEPHO 1998B Down), all of which are potentially useful for modeling QTL-by-environment interaction. This potential seems to have gone largely unnoticed in QTL work. For example, extensions of the Eberhart-Russell regression such as the additive main effects multiplicative interaction model (GAUCH 1988 Down) typically explain a much larger fraction of the total interaction and so promise improved performance (ROMAGOSA et al. 1996 Down). These approaches can be incorporated into my mixed model by taking b as fixed and imposing some structure such as in Eberhart-Russell regression.


*  ACKNOWLEDGMENTS

Thanks are due to Hugh Gauch Jr. and Susan McCouch for inspiring discussions. H.F. Utz (University of Hohenheim, Germany) is thanked for helpful comments on an earlier draft. Support of the Heisenberg Programm of the Deutsche Forschungsgemeinschaft is gratefully acknowledged. Part of the research for this article was conducted while the author was visiting the Department of Biometrics and the Department of Plant Breeding, College of Agriculture and Life Sciences, Cornell University, Ithaca, NY.

Manuscript received March 22, 2000; Accepted for publication July 25, 2000.


*  LITERATURE CITED
*TOP
*ABSTRACT
*MATERIALS AND METHODS
*EXAMPLE
*DISCUSSION
*LITERATURE CITED

BEAVIS, W. D., and P. KEIM, 1996 Identification of quantitative trait loci that are affected by environment, pp. 123–149 in Genotype-by-Environment Interaction, edited by M. S. KANG and H. G. GAUCH JR. CRC Press, Boca Raton, FL.

BECKER, H. C. and J. LÉON, 1988  Stability analysis in plant breeding. Plant Breed. 101:1-23.

BURNHAM, K. P., and D. R. ANDERSON, 1998 Model Selection and Inference. Springer, New York.

CULLIS, B. R., F. M. THOMSON, J. A. FISHER, A. R. GILMOUR, and R. THOMPSON, 1996  The analysis of the NSW wheat variety data base. II. Variance component estimation. Theor. Appl. Genet. 92:28-39.

DAVIES, R. B., 1977  Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika 64:247-254[Abstract/Free Full Text].

DAVIES, R. B., 1987  Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika 74:33-43[Abstract/Free Full Text].

EBERHART, S. A. and W. A. RUSSELL, 1966  Stability parameters for comparing varieties. Crop Sci. 6:36-40[Abstract/Free Full Text].

GAUCH, H. G., 1988  Model selection and validation for yield trials with interaction. Biometrics 44:705-715.

GILMOUR, A. R., B. R. CULLIS, S. J. WELHAM and R. THOMPSON, 1999 ASREML. User manual, ftp://ftp.res.bbsrc.ac.uk/pub/aar/.

HALEY, C. S. and S. A. KNOTT, 1992  A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69:315-324[Medline].

HAN, F. and S. E. ULLRICH, 1993  The North American Barley Genome Mapping Project: Mapping of quantitative trait loci associated with malting quality. Barley Genet. Newsl. 23:84-97.

JANSEN, R. C. and P. STAM, 1994  High resolution of quantitative traits into multiple loci via interval mapping. Genetics 136:1447-1455[Abstract].

JANSEN, R. C., J. W. VAN OOIJEN, P. STAM, C. LISTER, and C. DEAN, 1995  Genotype-by-environment interaction in genetic mapping of multiple quantitative trait loci. Theor. Appl. Genet. 91:33-37.

KAO, C. H., Z-B. ZENG, and R. D. TEASDALE, 1999  Multiple interval mapping for quantitative trait loci. Genetics 152:1203-1216[Abstract/Free Full Text].

KEARSEY, M. J. and A. G. L. FARQUHAR, 1998  QTL analysis in plants; where are we now? Heredity 80:137-142.

KEARSEY, M. J. and V. HYNE, 1994  QTL analysis: a simple `marker regression' approach. Theor. Appl. Genet. 89:698-702.

KOROL, A. B., Y. I. RONIN, and E. NEVO, 1998  Approximate analysis of QTL-environment interaction with no limits on the number of environments. Genetics 148:2015-2028[Abstract/Free Full Text].

LAIRD, N. M. and J. H. WARE, 1982  Random effects model for longitudinal data. Biometrics 38:963-974[Medline].

LANDER, E. S. and D. BOTSTEIN, 1989  Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121:185-199[Abstract/Free Full Text].

LIN, C. S., M. R. BINNS, and L. P. LEVKOVITCH, 1986  Stability analysis: where do we stand? Crop Sci. 26:894-900[Abstract/Free Full Text].

LIU, B.-H., 1998 Statistical Genomics. CRC Press, Boca Raton, FL.

LYNCH, M., and B. WALSH, 1998 Genetics and Analysis of Quantitative Traits. Sinauer, Sunderland, MA.

MARTINEZ, O. and R. N. CURNOW, 1992  Estimating the locations and the sizes of the effects of quantitative trait loci using flanking markers. Theor. Appl. Genet. 85:480-488.

MARTINEZ, O. and R. N. CURNOW, 1994  Missing markers when estimating quantitative trait loci using regression mapping. Heredity 73:198-206.

MCQUARRIE, A. D. R., and C.-L. TSAI, 1998 Regression and Time Series Model Selection. World Science Publishing Company, Singapore.

MELCHINGER, A. E., H. F. UTZ, and C. C. SCHÖN, 1998  Quantitative trait locus (QTL) mapping using different testers and independent population samples in maize reveals low power of QTL detecting and large bias in estimates of QTL effects. Genetics 149:383-403[Abstract/Free Full Text].

MOREAU, L., H. MONOD, A. CHARCOSSET, and A. GALLAIS, 1999  Marker-assisted selection with spatial analysis of unreplicated field trials. Theor. Appl. Genet. 98:234-242.

MORENO-GONZALEZ, J., 1992  Genetic models to estimate additive and non-additive effects of marker-associated QTL using multiple regression techniques. Theor. Appl. Genet. 85:435-444.

PIEPHO, H. P., 1998a  Empirical best linear unbiased prediction in cultivar trials using factor analytic variance-covariance structures. Theor. Appl. Genet. 97:195-201.

PIEPHO, H. P., 1998b  Methods for comparing the yield stability of cropping systems—A review. J. Agron. Crop Sci. 180:193-213.

ROMAGOSA, I., S. E. ULLRICH, F. HAN, and P. M. HAYES, 1996  Use of additive main effects and multiplicative interaction model in QTL mapping for adaption in barley. Theor. Appl. Genet. 93:30-37.

SARI-GORLA, M., T. CALINSKI, Z. KACZMAREK, and P. KRAJEWSKI, 1997  Detecting QTL x environment interaction in maize by a least squares interval mapping method. Heredity 78:146-157.

SHUKLA, G. K., 1972  Some statistical aspects of partitioning genotype-environmental components of variability. Heredity 29:237-245[Medline].

SILLANPÄÄ, M. J. and E. ARJAS, 1998  Bayesian mapping of multiple quantitative trait loci from incomplete inbred line cross data. Genetics 148:1373-1388[Abstract/Free Full Text].

TINKER, N. A. and D. E. MATHER, 1995  Methods for QTL analysis with progeny replicated in multiple environments. J. Quant. Trait Loci 1. http://probe.nalusda.gov:8000/otherdocs/jqtl/.

VAN EEUWIJK, F. A., J. B. DENIS and M. S. KANG, 1996 Incorporating additional information on genotypes and environments in models for two-way genotype by environment tables, pp. 15–50 in Genotype-by-Environment Interaction, edited by M. S. KANG and H. G. GAUCH JR. CRC Press, Boca Raton, FL.

WANG, D. L., J. ZHU, Z. K. LI, and A. H. PATERSON, 1999  Mapping QTLs with epistatic effects and QTL x environment interaction by mixed linear model approaches. Theor. Appl. Genet. 99:1255-1264.

WHITTAKER, J. C., R. THOMPSON, and P. M. VISSCHER, 1996  On the mapping of QTL by regression of phenotype on marker-type. Heredity 77:23-32.

WOLFINGER, R. D., 1993  Covariance structure selection in general mixed models. Commun. Stat. A 22:1079-1106.

WU, W.-R. and W.-M. LI, 1994  A new approach for mapping quantitative trait loci using complete genetic marker linkage maps. Theor. Appl. Genet. 89:535-539.

ZENG, Z-B., 1993  Theoretical basis of separation of multiple linked gene effects on mapping quantitative trait loci. Proc. Natl. Acad. Sci. USA 90:10972-10976[Abstract/Free Full Text].

ZENG, Z-B., 1994  Precision mapping of quantitative trait loci. Genetics 136:1457-1466[Abstract].

ZENG, Z-B., C. H. KAO, and C. J. BASTEN, 1999  Estimating the genetic architecture of quantitative traits. Genet. Res. 74:279-289[Medline].