## Abstract

Environment-specific quantitative trait loci (QTL) refer to QTL that express differently in different environments, a phenomenon called QTL-by-environment (*Q* × *E*) interaction. *Q* × *E* interaction is a difficult problem extended from traditional QTL mapping. The mixture model maximum-likelihood method is commonly adopted for interval mapping of QTL, but the method is not optimal in handling QTL interacting with environments. We partitioned QTL effects into main and interaction effects. The main effects are represented by the means of QTL effects in all environments and the interaction effects are represented by the variances of the QTL effects across environments. We used the Markov chain Monte Carlo (MCMC) implemented Bayesian method to estimate both the main and the interaction effects. The residual error covariance matrix was modeled using the factor analytic covariance structure. A simulation study showed that the factor analytic structure is robust and can handle other structures as special cases. The method was also applied to *Q* × *E* interaction mapping for the yield trait of barley. Eight markers showed significant main effects and 18 markers showed significant *Q* × *E* interaction. The 18 interacting markers were distributed across all seven chromosomes of the entire genome. Only 1 marker had both the main and the *Q* × *E* interaction effects. Each of the other markers had either a main effect or a *Q* × *E* interaction effect but not both.

GENOTYPE-BY-ENVIRONMENT (*G* × *E*) interaction is a very important phenomenon in quantitative genetics. With the advanced molecular technology and statistical methods for quantitative trait loci (QTL) mapping (Lander and Botstein 1989; Jansen 1993; Zeng 1994), *G* × *E* interaction analysis has shifted to QTL-by-environment (*Q* × *E*) interaction. In the early stage of QTL mapping, almost all statistical methods were developed in a single environment (Paterson *et al.* 1991; Stuber *et al.* 1992). Data from different environments were analyzed separately and the conclusions were drawn from the separate analyses of QTL across environments. These methods do not consider the correlation of data under different environments and thus may not extract maximum information from the data. Composite interval mapping for multiple traits can be used for *Q* × *E* interaction if different traits are treated as the same trait measured in different environments (Jiang and Zeng 1995). This multivariate composite interval mapping approach makes good use of all data simultaneously and increases statistical power of QTL detection and accuracy of the estimated QTL positions. However, the number of parameters of this method increases dramatically as the number of environments increases. Therefore, the method may not be applied when the number of environments is large. Several other models have been proposed to solve the problem of a large number of environments (Jansen *et al.* 1995; Beavis and Keim 1996; Romagosa *et al.* 1996). These methods were based on some special situations and assumptions. One typical assumption was independent errors or constant variances across environments. These assumptions are often violated in real QTL mapping experiments.

Earlier investigators realized the problem and adopted the mixed-model methodology to solve the problem (Piepho 2000; Boer *et al.* 2007). Under the mixed-model framework, people can choose which model effects are random and which are fixed. The mixed-model methodology is very flexible, leading to an easy way to model genetic and environmental correlation between environments using a suitable error structure. Piepho (2000) proposed a mixed model to detect QTL main effect across environments. Similar to the composite interval mapping analysis, his model incorporated one putative QTL and a few cofactors. The *Q* × *E* effects in the model were assumed to be random, which greatly reduced the number of estimated parameters. However, the fact that only one QTL is included in the model means that Piepho's (2000) model remains a single-QTL model rather than a multivariate model. Boer *et al*. (2007) proposed a step-by-step mixed-model approach to detecting QTL main effects, *Q* × *E* interaction effects, and QTL responses to specific environmental covariates. In the final step, Boer *et al*. (2007) rewrote the model to include all QTL in a multiple-QTL model and reestimated their effects.

In this study, we extended the Bayesian shrinkage method (Xu 2003) to map *Q* × *E* interaction effects of QTL. In the original study (Xu 2003), we treated each marker as a putative QTL and used the shrinkage method to simultaneously estimate marker effects of the entire genome. In the multiple-environment case, we can still use this approach to simultaneously evaluate marker effects under multiple environments but we can further partition the marker effects into main and *Q* × *E* interaction effects. For any particular marker, the mean of the marker effects represents the main effect and the variance of the marker effects represents the *Q* × *E* interaction effect for that marker. Under the Bayesian framework, we assigned a normal prior with zero mean and an unknown variance to each marker main effect and a multivariate normal prior with zero vector mean and homogeneous diagonal variance–covariance matrix to the *Q* × *E* interaction effects of each maker. In multiple environments, the structure of the error terms might be very complicated since we need to consider the correlation of the same genotype under different environments. In our analysis, we used different variance–covariance structures to model the error terms. The simplest case was the homogeneous diagonal matrix, and the most complex choice was an unstructured matrix. We also used a heterogeneous diagonal matrix whose parameters are somewhere between the two models. Finally, we considered several factor analytic models. The reason to use the factor analytic structure is that it can separate genetic effects into common effects and environment-specific effects. In addition, the factor analytic structure is parsimonious and thus can substantially reduce the computational burden of the mixed-model analyses.

## THEORY AND METHOD

#### Hierarchical model:

Let be an *m* × 1 column vector for the observed phenotypic values of individual *j* measured from *m* environments for , where *n* is the sample size. Let *q* be the number of QTL included in the model. The multivariate linear model is(1)In the above model, is an *m* × 1 vector for the intercepts. The dependent variable *Z*_{jk} is a genotype indicator variable for individual *j* at marker *k* and it is defined as for the two genotypes of a backcross (BC) individual or for the three genotypes of an individual. The regression coefficient is an *m* × 1 vector of QTL effects for the *m* environments. Finally, is an *m* × 1 vector for the residual errors. To model the *Q* × *E* interaction, we assume that γ_{k} follows a multivariate normal distribution,(2)where 1* _{m}* is a unity vector with dimension

*m*, I

_{m×m}is an

*m*×

*m*identity matrix, α

_{k}is the mean value representing the main QTL effect, and σ

_{k}

^{2}is the variance of γ

_{k}representing the

*Q*×

*E*interaction. This type of model with further modeling on γ

_{k}is called a hierarchical model. In the hierarchical model, the first moment parameter α

_{k}is the main effect and the second moment parameter σ

_{k}

^{2}represents the degree of

*Q*×

*E*interaction. The residual error vector ξ

_{j}is assumed to be multivariate normal with density(3)where Θ is an

*m*×

*m*variance–covariance matrix, which can be chosen from a class of available forms (to be discussed later). We have now defined the data and parameters. The nest step of the Bayesian analysis is to choose the prior distribution and infer the posterior distribution for each parameter.

#### Prior distribution:

We often have enough information from the data to estimate β and thus a flat (uninformative) prior was chosen for β; *i.e.*, *p*(β)=1. The main effect for the *k*th QTL was assigned the following normal prior,(4)where φ^{2}_{k} is the prior variance. A scaled inverse chi-square distribution was assigned to φ^{2}_{k}, which is(5)

A special case of this prior is τ=ω=0, leading to , called Jeffreys' prior. However, as mentioned by ter Braak *et al*. (2005), this prior is improper and leads to an improper posterior distribution. The revised prior is proposed by them and is claimed to lead to a proper posterior distribution. The revised prior is(6)where 0<δ≤0.5. In this study we used the proper prior to avoid any potential problems caused by the improper posterior distribution. The same scaled inverse chi-square distribution was also assigned to σ_{k}^{2},(7)

Finally, we assumed , where is a common residual variance and was assigned the same scaled inverse prior,(8)

Other structures of Θ are considered and described in a later section.

#### Posterior distribution:

The Markov chain Monte Carlo (MCMC) algorithm was used to implement the Bayesian shrinkage analysis. In the MCMC sampling, we need to derive only the fully conditional posterior distribution for each parameter. For example, the fully conditional posterior distribution for β is denoted by *p*(β | …), where the dots after the vertical bar represent the data and all other parameters. Except for the prior of β, all other priors we chose in the previous section are conjugate. Therefore, the fully conditional posterior has the same form as the prior distribution. Derivation of the posterior distribution was not given and we simply provided the parameters of the fully conditional posterior distribution for each variable.

The posterior distribution for β is multivariate normal(9)where(10)and(11)

The posterior for γ_{k} is also multivariate normal,(12)where(13)and(14)

The posterior distribution for α_{k} is normal,(15)where(16)

and(17)

We now discuss the posterior distributions for all the variance components, σ^{2}, σ^{2}_{k}, and φ^{2}_{k} for *k* = 1,…, *q*. All of them are scaled inverse chi squares as given below,(18)(19)and(20)where(21)

#### MCMC sampling:

Since all the fully conditional posterior distributions have closed-form distributions, either a normal or a scaled inverse chi-square, Gibbs sampler was used for sampling all the variables, which is summarized below:

Initialize all variables by sampling the values from their prior distributions.

Sample the parameters sequentially from their corresponding posterior distributions.

Repeat the sampling cycle until the chain reaches a desired length.

The posterior sample contains all the observations after burn-in deletion and chain thinning. Post-MCMC analysis was performed on the posterior sample. We often ran multiple chains and took the average posterior statistics across the chains as the Bayesian estimates of the parameters.

#### Covariance structure:

We now introduce several alternative covariance structures for the residual errors.

##### Identity matrix:

The simple structure described earlier, Θ=I_{m}σ^{2}, is called the scaled identity matrix structure. This assumption is oversimplified and should be relaxed in real data analysis.

##### Diagonal matrix:

The covariance matrix is defined as(22)which represents uncorrelated residual errors but has taken into account nonhomogenous residual variances for different environments. This assumption may hold in most situations. Each *d* was assigned a scaled inverse chi-square distribution and the fully conditional posterior distribution for *d*_{i} is(23)where(24)

##### Unstructured matrix:

The unstructured covariance matrix has been used before by Jiang and Zeng (1995) for multivariate QTL mapping. The only restriction for matrix Θ is positive definite. We assigned an inverse Wishart prior distribution to Θ. This prior is the multivariate version of the scaled inverse chi-square distribution,(25)where τ > *m* − 1 is the prior degree of belief and ω>0 is a positive definite matrix with the same dimension as matrix Θ. The posterior distribution remains inverse Wishart and thus(26)where(27)

is an *m*×*m* sum of squares matrix, different from the SS defined in Equation 20.

##### Factor analytic structured matrix:

The covariance matrix has the following structure,(28)It is called factor analytic structure because this structure has been used in factor analysis. This factor analytic structure was derived on the basis of the following latent variable linear model for the residual errors,(29)where *u*_{j} is an *r* × 1 latent factor (*r* < *m*) with a(30)distribution, B is an *m*× *r* matrix called factor loading, and is a vector of independent errors and is a diagonal matrix for the independent error variances.

Under the factor analytic structure, the MCMC algorithm requires sampling *B* and *u*_{j} for *j*=1,…, *n*, in addition to other parameters. We now describe the prior and posterior of these new variables. The prior for *u*_{j} is standardized multivariate normal given in Equation 30. The fully conditional posterior distribution remains multivariate normal,(31)where(32)and(33)

The factor loadings are represented by an *m* × *r* matrix *B*. Let be the *l*th column of matrix *B* for *l* = 1,…, *r*. We now rewrite Equation 29 as(34)

Given and knowing that(35)

Equation 34 is a typical multivariate regression problem. The fully conditional posterior distribution of *B*_{l} is multivariate normal,(36)where(37)and(38)

Having provided the fully conditional posterior distribution for every variable, we are now ready to conduct the Gibbs sampler to infer the empirical posterior distribution for each variable.

## APPLICATIONS

#### Barley data analysis:

We used barley data obtained from the North American Barley Genome Mapping Project (Tinker *et al.* 1996) to demonstrate the application of the new method. In the barley QTL mapping project, there were 127 mapped markers covering 1500 cM of the barley genome. Seven traits were investigated in the project. In this study, we used the yield trait analysis for the demonstration. The doubled haploid (DH) population was initiated from the cross between Harrington and TR306. The DH population consisted of 145 lines, each grown in 28 different environments. The data set was updated after it was first published in 1996, but the difference between the original and the updated data was minor so that we could still compare the current result with that from the original study.

We used six different covariance structures to analyze the data, which were (1) the homogeneous (constant) variance , (2) the heterogeneous variances , (3) unstructured matrix Θ (positive definite), (4) the first-order factor analytic structure , (5) the second-order factor analytic structure , and (6) the third-order factor analytic structure . The length of the Markov chain consisted of 200,000 sweeps. The first 100,000 sweeps were deleted as burn-in and thereafter the chain was thinned by keeping 1 observation in every 100 sweeps, producing 1000 observations in the collected posterior sample for post-MCMC analysis.

To test the significance of the QTL effects, we conducted a permutation test to generate the null distribution of each main effect and each *Q* × *E* interaction effect. In the permutation analysis, we repeated the MCMC sampling method as described before but reshuffled the phenotypic values. The permutation analysis was proposed by Che and Xu (2010), who called it permutation inside the Markov chain. In the permutation analysis, the length of the Markov chain was 200,000 sweeps. The first 100,000 sweeps were deleted as burn-in and the chain thinning rate was 1/25. The posterior sample contained 4000 observations. From the null distribution, we drew a confidence interval for each estimated effect. An effect was claimed to be significant if the estimated value fell outside of the 99% confidence interval of the null distribution.

Among the six covariance structures, the second structure Θ=*D* (a diagonal matrix) detected the maximum number of QTL (main and *Q* × *E* interaction effects). Although more QTL does not mean better, it is hard to use cross-validation to evaluate different structures under the MCMC implemented Bayesian analysis. Bayes factors are often used to evaluate different models. However, the complexity of our proposed model makes the calculation of the Bayes factors difficult. Therefore, we used the Bayesian information criteria (BIC) to evaluate the performance of the six different models. The BIC score was calculated using(39)where *L* is the likelihood function evaluated at the estimated parameters, *p* is the number of parameters, and *n* is the sample size. The Bayesian estimates of the parameters in *L* are the posterior means of β, γ, and Θ. The BIC scores are shown in Table 1, which indicates that the second (heterogeneous residual variance) model performed better than all other models. The first-order factor analytic model was the second best model with a BIC score slightly larger than that of the best model. The result of the best model is depicted in Figure 1, where the posterior means of the main effects and the *Q* × *E* interaction effects are plotted against the genome locations of the markers. Figure 1 also gives the 99% confidence intervals for the main and *Q* × *E* interaction effects. Eight markers showed significant main effects and 18 markers showed significant *Q* × *E* interaction. The 18 interacting markers were distributed across all seven chromosomes of the entire genome. Only 1 marker had both the main and the *Q* × *E* interaction effects. Each of the other markers had either a main effect or a *Q* × *E* interaction effect but not both. The estimated main and *Q* × *E* interaction effects for the markers are given in Table 2.

We also performed an individual marker analysis to compare the result with that of the Bayesian analysis. For the individual marker analysis, QTL mapping was conducted separately for each environment. The average estimated effect for each marker across the 28 environments represented the main effect while the variance of the estimated effects across the environments represented the *Q* × *E* interaction effect. The estimated main and *Q* × *E* interaction effects of the single-marker analysis are shown in Figure 2. We can see that Figure 2 is quite similar to Figure 1 in the Bayesian analysis. The main difference between the two figures is the different sharpness of the marker effects. The Bayesian analysis generated very clean (sharp) signals of the plots.

#### Arabidopsis data analysis:

The barley data contain many environments, which is hard to find in most studies. So we also applied our model to recombinant inbred line data of Arabidopsis (Loudet *et al.* 2002), where two parents initiating the line cross were Bay-0 and Shahdara, with Bay-0 as the female parent. Flowering time was recorded for each line in two environments: long day (16-hr photoperiod) and short day (8-hr photoperiod). The population contained 420 lines. A total of 38 microsatellite markers were used for QTL mapping. We inserted a pseudomarker in every 2 cM of the genome and had a total of 200 markers (38 true markers plus 162 pseudomarkers) in our analysis.

The variance of *Q* × *E* interaction may not be estimated accurately due to small environments. So in small environments the variance would then simply serve as a tool to shrink the environment-specific QTL effects. The bias of would lead to biased estimation of main effect as well. Although the MCMC algorithm remains the same as before, we need to revise our post-MCMC procedure. We use vector γ_{k} to estimate *Q* × *E* interaction effects of the *k*th marker. The differences between vector γ_{k} and its mean represent *Q* × *E* interaction effects. In the two-environments case, we can just use the differences between the two components of γ_{k} as the *Q* × *E* interaction effects because vector γ_{k} is a 2 × 1 vector. Since there are only two environments, we did not use the factor analytic model to analyze the data. The BIC scores for the three models (homogeneous, heterogeneous, and unstructured covariance matrices) are 3798.49, 3775.64, and 3645.55. Figure 3 shows the main and *Q* × *E* interaction effects of the Arabidopsis data under the unstructured covariance model. The 95% confidence intervals for the main and *Q* × *E* interaction effects are also given. Four markers showed significant main effects and six markers showed significant *Q* × *E* interaction effects.

#### Simulation study:

The barley data analysis did not show the advantage of fitting appropriate covariance structures over the simple diagonal covariance matrix because the 28 different environments did not seem to be correlated. Therefore, we conducted two simulation experiments (simulations 1 and 2) in this section to demonstrate the importance of covariance structure to the Bayesian analysis of *Q* × *E* interaction. We also did a two-environment simulation (simulation 3) to demonstrate the fitness of our model for small environments.

In simulation 1, we used the real marker information from the North American Barley Genome Mapping Project (Tinker *et al.* 1996) to simulate the genome. We simulated 127 markers from seven chromosomes with marker distances exactly the same as the real data. We simulated 145 DH lines in 28 environments. The intercept β was given values ranging from 200 to 605 for the 28 different environments. We assumed that 10 of the 127 markers had main effects and also *Q* × *E* interaction effects in the 28 environments. In the simulation experiment, we chose the factor analytic covariance structure with *B* defined as a 28×3 matrix, indicating that correlations had occurred between different environments. The true values of β, the QTL effects, the *B* matrix, and the *D* matrix are given in Tables 4 and 5. The simulated data were analyzed using the six different covariance structures described earlier in the barley data analysis. We expected that the first three structures (homogeneous variance, heterogeneous variance, and unstructured matrices) would perform poorly but the last three structures (first-order, second-order, and third-order factor analytic structures) would perform better, especially the third-order factor analytic structure.

In the MCMC-implemented Bayesian analysis, the length of the Markov chain was 50,000 sweeps. The first 25,000 sweeps (burn-in period) were deleted. The chain thinning rate was 1 in 50. The empirical posterior sample contained 500 observations for the post-MCMC analysis. The MCMC experiment with the same simulated data was repeated a few times to make sure that the chain had converged to the stationary distribution.

The results of the simulation studies are depicted in Figure 4 for the average main effects of 20 replicates and in Figure 5 for the average *Q* × *E* interaction effects of 20 replicates. These two figures show that the estimated QTL effects agreed well with the true effects. Figures 4 and 5 also show that the first four covariance structures (homogeneous residual, heterogeneous residual, unstructured covariance, and first-order factor analytic structure) have some notable background noise, indicating some false positives had occurred. However, the last two factor analytic structures have very little background noise. From the two figures, the background noise of the first four covariance structures may not be very clear. So we calculated standard deviations of each marker's main effect among the 20 replicates and plotted them in Figure 6, from which we can see the difference among the six models. The performance of the last two factor analytic models is very stable for the majority of the markers without main effects. While the first four structures, especially the first two, cannot achieve such a nice performance, which means that among the 20 replicates these models generated some false main effects. Table 3 shows the average BIC scores for the six different covariance structures. We see that the factor analytic structure outperformed the other three models. This is consistent with our expectation. The lowest BIC occurred in the second-order factor analytic structure. However, the third-order factor analytic structure (the true model) was just slightly higher in value than the second-order structure. The log-likelihood value of the third-order factor was higher than that of the second-order factor. Table 4 gives the average estimated main and *Q* × *E* interaction effects obtained from the 20 replicates based on the second-order factor analytic model. When we compared the estimated main effects and the true effects, we noted that large main effects were estimated quite accurately but small effects were shrunken to zero. The *Q* × *E* interaction effects were always detectable regardless of the sizes of the effects. Table 5 gives the true intercepts and the residual error variances (the *D* matrix) along with their estimated values. The true *B* matrix is also given in Table 5. The estimated *B* was not given here because the columns of *B* are independent and thus are exchangeable. This does not affect the estimate of the covariance structure. We checked the average estimated variance–covariance matrix and did observe three separate environment groups.

Although the stability test and BIC scores showed the advantage of the factor analytic model, the differences of marker main effects for the six models are not very obvious in Figure 4. So we performed the second simulation experiment (simulation 2) to further demonstrate the advantage of the factor analytic model. We focused mainly on comparison of heterogeneous diagonal structure and three factor analytic models. In this simulation, 100 DH lines in eight environments were generated with 30 markers. The distance between two nearby markers was 30 cM. The intercept β was given values ranging from 200 to 305 for the eight environments. We assumed that 3 of the 30 markers had main effects and also *Q* × *E* interaction effects in the eight environments. We also chose the factor analytic covariance structure with *B* defined as an 8×2 matrix. The first column of *B* had values of 20, 10, 10, 5, 0, 0, 0, 0. The second column had values of 0, 0, 0, 0, 15, 15, 10, 2. Matrix *D* was an identity matrix. Figure 7 shows the estimated main effects of the four models. We can clearly see many false positive main effects in the heterogeneous diagonal structure. The BIC scores for the four models are 4264, 3362, 2495, and 2558, which also are in favor of the factor analytic models.

In simulation 3, we also generated 100 DH lines using the same marker information given by simulation 2, but this time we simulated only two environments. The intercepts were 200 and 215 for the two environments. The factor analytic covariance structure was used with *B* defined as a 2×2 matrix. The first column of *B* had values of 1 and 2. The second column had values of 2 and 1. Matrix *D* was an identity matrix. The MCMC and post-MCMC analyses of these data used the same setup as Arabidopsis data analysis. Figure 8 gives the comparison of the true and the estimated main and *Q* × *E* interaction effects. From Figure 8, the true and the estimated marker effects are very close to each other for all three models. The promising results also demonstrate that our proposed method is a good choice to handle data with small environments.

## DISCUSSION

The importance of this study is reflected by two major contributions to *Q* × *E* study, the multiple-QTL model and the factor analytic covariance structure. The multiple-QTL model for *Q* × *E* is an extension of the Bayesian shrinkage analysis for mapping QTL in a single environment (Xu 2003). The factor analytic covariance structure is available in the literature but has never been applied to QTL mapping. Other covariance structures may be considered in future studies, *e.g.*, the autoregressive model of order 1 [AR(1)] and compound symmetry (CS) covariance structures. These alternative structures can be used to fit models when the environments represent temporal or spatial variation. The 28 environments in the barley experiments represent 28 different locations (spatial variation). However, the information about the location was not available to us. We believe that the factor analytic structure is robust and can be fit to a wide variety of covariance structures, ranging from the simplest diagonal matrix to the most complicated unstructured matrix, by choosing different orders of the factors. This has been demonstrated by the similarity of the diagonal matrix and the first-order factor analytic model in our data analyses. The factor analytic model is also easy to fit under the general linear model framework. Both the factor loadings and the factors themselves have normal posterior distributions and can be sampled using the Gibbs sampler approach.

The most significant contribution of this study was to use the variance of QTL effects across environments to measure the size of the *Q* × *E* interaction for a particular QTL. This has significantly simplified the *Q* × *E* study. If the number of environments were small, however, the variance would not be accurately estimated. In this case, one should use some kind of linear contrast of the environment-specific effects as a measure of the *Q* × *E* interaction. Arabidopsis data and simulation 3 are two examples of such a treatment. The variance would then simply serve as a tool to shrink the environment-specific QTL effects. The MCMC sampling procedure remains the same, but the post-MCMC analysis needs to be modified.

The method developed in the current study applies only to plants where the same genotype can be replicated in multiple environments. In animals where the same genotype cannot be replicated (except identical twins), some modification is required. For example, if an F_{2} family is raised in three environments, each animal may have a different genotype from other animals. This argument also applies to QTL-by-sex interaction, where the same individual cannot be split into male and female. The modification is not trivial and thus deserves further study.

Although the environment-specific QTL effects, denoted by vector for the *k*th marker, are used only to draw the posterior distributions for the main and *Q* × *E* interaction effects, they may be interesting parameters in their own rights. The posterior mean of each can be used to predict the molecular breeding value of each line in a particular environment. This information may facilitate marker-assisted selection (using a few markers) or genome selection (using all markers of the entire genome). Genome selection has been an important strategy for animal (Meuwissen *et al.* 2001) and plant breeding (Xu 2003).

The Bayesian method presented here applies only to multiple-marker analysis; *i.e.*, each marker is treated as a putative QTL. If the markers are not evenly placed in the genome, one may insert some pseudomarkers in regions not well covered by markers. In the regions with saturated markers, one may use only a few selected markers to avoid a potential multicollinearity problem. With the current molecular technology, genomes of most species of agricultural importance may be saturated very soon with high-density markers. Pseudomarker insertion will no longer be necessary, but marker selection will become important. One strategy for marker selection is to include one marker in every *d* cM for the Bayesian model. The optimal strategy may be the moving interval approach proposed by Wang *et al*. (2005), in which a fixed number of putative QTL were included in the model for each chromosome and the position of the putative QTL can move (jump) among a few neighboring markers. This approach may be adopted in the second stage of mapping, *i.e*., fine mapping after the important QTL regions have been identified.

One drawback of the MCMC-implemented Bayesian method is the slow computation process due to the large number of environments and the high dimensionality of the model. A quick method may be the posterior mode estimation in which only the conditional posterior modes are presented as the Bayesian estimates for the parameters of interest. Although the estimates are no longer Bayesian estimates, the results may be comparable. This quick posterior mode estimation may provide preliminary results to be used for further analysis using the fully Bayesian analysis.

Finally, the entire data analyses were conducted using a program developed in R. Interested readers may visit our website (www.statgen.ucr.edu) to download the program and the sample data to test the method and analyze their own data.

## Acknowledgments

This project was supported by the National Plant Genome Initiative of the U.S. Department of Agriculture Cooperative State Research, Education, and Extension Service grant 2007-02784 (to S.X.).

## Footnotes

Communicating editor: D. W. Threadgill

- Received June 27, 2010.
- Accepted August 21, 2010.

- Copyright © 2010 by the Genetics Society of America