## Abstract

The doubled-haploid (DH) barley population (Harrington × TR306) developed by the North American Barley Genome Mapping Project (NABGMP) for QTL mapping consisted of 145 lines and 127 markers covering a total genome length of 1270 cM. These DH lines were evaluated in ∼25 environments for seven quantitative traits: heading, height, kernel weight, lodging, maturity, test weight, and yield. We applied an empirical Bayes method that simultaneously estimates 127 main effects for all markers and 127(127−1)/2=8001 interaction effects for all marker pairs in a single model. We found that the largest main-effect QTL (single marker) and the largest epistatic effect (single pair of markers) explained ∼18 and 2.6% of the phenotypic variance, respectively. On average, the sum of all significant main effects and the sum of all significant epistatic effects contributed 35 and 6% of the total phenotypic variance, respectively. Epistasis seems to be negligible for all the seven traits. We also found that whether two loci interact does not depend on whether or not the loci have individual main effects. This invalidates the common practice of epistatic analysis in which epistatic effects are estimated only for pairs of loci of which both have main effects.

EPISTATIC effects are statistically defined as interactions between effects of alleles from two or more genetic loci (Fisher 1918). Interactions, however, are simply deviations from additivity in a general linear model; as such they are often treated as statistical errors. Cockerham (1954) showed that epistatic effects can be partitioned into various epistatic components, *e.g*., and effects, etc. Epistasis is now considered as an important source of genetic variation for quantitative traits. Because different components involve interactions of different numbers and different types of alleles, some components are more important than others. Especially, the component is shown to be heritable (Goodnight 1988) and thus much attention has been paid to the study of effects in response to selection and evolution (Goodnight 2000; Jannink 2003).

Epistasis is an important source of variation contributing to speciation (Wright 1931) because breakdown of a certain combination of alleles already adapted to a local environment will decrease the fitness of the recombinants. However, the importance of epistasis in quantitative traits among less diversified populations is less clear. Some studies showed that the epistatic variance can account for a large proportion of the genetic variance of quantitative traits among progeny of line crosses (Carlborg *et al*. 2005; Malmberg and Mauricio 2005; Malmberg *et al*. 2005). However, the reported epistatic effects of QTL are most likely biased because many studies that did not show any significant epistatic effects are perhaps not reported in the literature, a phenomenon called the Beavis effect (Beavis 1994; Xu 2003b). The censorship of small or null epistatic effects biases the reported results upward. Although efficient methods have been developed for mapping QTL with main effects (Lander and Botstein 1989; Sillanpaa and Arjas 1998; Xu 2003a; Yi *et al*. 2003a; Wang *et al*. 2005; Zhang *et al*. 2005), methods for mapping QTL with epistatic effects are still premature. These methods either utilize models including a single epistatic effect at a time (Holland 1998; Malmberg *et al*. 2005) or apply a model selection strategy that searches for multiple epistatic effects (Carlborg *et al*. 2000; Yi *et al*. 2003b, 2005). These methods may not guarantee that all important epistatic effects are detected. Recently, Xu (2007) developed an empirical Bayes method that can simultaneously estimate main effects of all individual markers and epistatic effects of all pairs of markers. The algorithm is computationally efficient so that large sets of data can be analyzed within very short computing time.

A doubled-haploid barley population was developed by the North American Barley Genome Mapping Project (Tinker *et al*. 1996). Each genotype was replicated ∼25 times. QTL were mapped for seven agronomic traits. On average, there were three to six QTL contributing to the genetic variance of each trait. The results were quite reliable due to the relatively dense marker map, the reasonable sample size, and, more importantly, the large number of replications. Because of this, the data set has been analyzed many times by various investigators to test new statistical models (Xu 2003a; Yi *et al*. 2003a,b; Zhang *et al*. 2005; Xu 2007). However, epistatic effects have not been tested in this barley population for all the traits recorded in the experiment. Xu (2007) analyzed only the trait kernel weight (KWT) to demonstrate the application of the empirical Bayes method. No general conclusion was made in that study. We conducted a genomewide analysis of epistatic effects for all the traits using the new method (Xu 2007). The genomewide analysis employed was a true multiple-effect analysis that required no variable selection. All markers and marker pairs were included in a single model and their effects were estimated simultaneously. Since the genome coverage of the markers was quite high, no QTL or QTL pairs would be missed. The results are reliable so that conclusions can be made inclusively about the relative importance of epistasis in the genetic variance of quantitative traits.

## MATERIALS AND METHODS

#### Experimental population:

Data were retrieved from the North American Barley Genome Mapping Project (NABGMP) website (http://gnome.agrenv.mcgill.ca/). The experimental design and results were reported by Tinker *et al*. (1996). For the article to be self-contained, the experiment is briefly described here. The population consisted of 145 doubled-haploid (DH) lines of a cross between two related Canadian two-row barley lines, Harrington and TR306. The cross was made by the NABGMP to (1) construct a molecular marker map and (2) locate QTL that affect traits of economic importance. These DH lines were evaluated in 25 replications (environments) for seven quantitative traits: heading (HED), height (HGT), KWT, lodging (LDG), maturity (MAT), test weight (TWT), and yield (YLD). The number of replicates for an individual trait varied from 15 to 29 with an average of 25. The map consisted of 127 markers (mostly RFLPs) distributed over seven chromosomes with an average marker interval of 10.5 cM. The genome coverage of the markers was 1270 cM in length. Tinker *et al*. (1996) identified, on average, three to six QTL per trait, collectively explaining 35–50% of the genetic variance. None of the traits were controlled by a major QTL. Some QTL had interaction effects with the environments, but many showed effects that were consistent across environments. Epistatic effects were not investigated in the original study.

The purpose of this analysis was to conduct a genomewide investigation on the epistatic effects for the seven traits. For simplicity, we took the average phenotypic value of each line across the environments as the input phenotype for that line. Because of the large number of replicates, the average phenotypic value of each line approximately represents the genotypic value of that line. All QTL detected would represent those showing consistent effects across environments. The genotype of each marker was coded as for the TR306 allele and for the Harrington allele. A missing genotype was coded as 0. There were ∼4.9% of the marker genotypes with missing values.

#### Statistical analysis:

Missing marker genotypes were imputed using information from the nearest nonmissing flanking markers. We first used the genotypes of flanking markers to calculate the conditional probability of the missing marker genotype. We then sampled the genotype of the missing marker from this conditional probability. This is called marker imputation. The missing markers were imputed one at a time from one end of the chromosome to the other end. If two or more consecutive markers were missing, the imputed marker genotype for the first missing marker, combined with the first nonmissing marker in the other side of the second missing marker, was used to calculate the probability of the second missing marker genotype. For example, consider five markers in the order of ABCDE and markers BCD have missing genotypes. The genotype of marker B is generated using information from markers A and E. The genotype of marker C is generated from the imputed genotype of marker B and the genotype of marker E. The genotype of marker D is generated from the imputed genotype of marker C and the genotype of marker E. Once all missing markers were imputed for all individuals, we had a set of imputed marker genotypes for the population. This data set was used as an input data set to conduct the epistatic analysis. We generated 20 imputed samples of the marker genotype data and thus analyzed the data 20 times, one from each imputed marker data set. The estimated parameters represented the average estimates of the 20 imputed samples. The marker distribution was quite even across the genome and thus, for simplicity, we treated each marker as a putative QTL; *i.e*., we estimated only the effects of markers. If a QTL was located between two markers, its effect would be picked up by the two flanking markers. Hereafter, we use markers and putative QTL interchangeably.

The empirical Bayes method (Xu 2007) was used to analyze the data. The model is briefly reintroduced here, but the technical detail of the method is referred to the original study (Xu 2007). Let be the number of DH lines and be the number of markers. The vector of phenotypic values for a trait is described by the following linear model,(1)where is an vector, is the population mean, is an vector of the genotype indicators for locus *l* , takes one of two values depending on which parental allele has been passed to line *i* for locus *l*, is the additive (main) effect for locus *l* and is the epistatic effect between loci *l* and , and is the residual error vector with an assumed distribution. The notation represents a direct product of vectors and . Excluding , the total number of QTL effects is , including additive effects and epistatic effects. We now use *j* to index the *j*th genetic effect (including additive and epistatic effects) for . We can rewrite model (1) as(2)

Comparing model (2) to model (1), we can see that and if the *j*th effect is a main effect, and and if the *j*th effect is an epistatic effect. Therefore, model (2) is a general model for both the main and the epistatic effects. As far as the method of estimation is concerned, distinction between a main effect and an epistatic effect is unnecessary. For convenience of presentation, we always assumed that has been centered and rescaled so that and . In other words, each *X* variable has been standardized to have a zero mean and a unity standard deviation.

Since the number of model effects is times as large as the sample size, the ordinary least-squares method would not work. The empirical Bayes method (Xu 2007) adopted a random model approach by treating each QTL effect, say , as a random variable sampled from a distribution. The random model regression analysis is essentially a Bayesian regression method (Lindley and Smith 1972).

We used the two-step approach of Xu (2007) to estimate the QTL effects. The first step was a typical random model variance-component analysis. The maximum-likelihood method (Hartley and Rao 1967) was used to estimate the population mean and all the variance components. The number of variance components was 8128, requiring a special algorithm to handle a model with such a large number of parameters (Xu 2007). The second step involved best linear unbiased prediction (BLUP) estimation of the QTL effects given the estimated variance components (Robinson 1991).

The estimated variance components were used only to shrink the estimation of the QTL effects. Because each QTL had its own estimated variance (), the shrinkage was selective, meaning that spurious and small-effect QTL would be shrunken to zero and large-effect QTL would be subject to virtually no shrinkage. This is the very reason that the shrinkage approach can handle a super-saturated model. Once the BLUP estimates of QTL effects were obtained, the genetic variance explained by a QTL took the square of . Therefore, the total additive variance was and the total epistatic variance was , neglecting the covariance caused by linkage. The overall genetic variance was . The corresponding proportions of the phenotypic variance contributed by the additive, the epistatic, and the total genetic variances were defined as , , and , respectively, where is the observed phenotypic variance. Note that only effects deemed to be “significant” contributed to the calculation of , , and .

How large of an estimated effect is sufficiently large to be declared as “significant”? One can convert each estimated effect into a *t*-test statistic, , and compare with a critical value. We found that either was close to zero (trivial) or notably deviated from zero (nontrivial). When was trivial, the estimation error was also close to zero. However, the speed of approaching zero for is slower than that for . When was nontrivial, was roughly constant across all the nontrivial effects. Therefore, we may simply compare with a critical value drawn from the distribution of . All effects larger than the critical value are declared as significant. The critical value was obtained from analysis of 20 reshuffled samples (permutation test). The percentile of the distribution of the estimated effects of the reshuffled sample was a good approximation of the critical value, where is a controlled experimental type I error.

## RESULTS

The data were analyzed using a SAS IML program written by Xu (2007). The computer program can be downloaded from our website (http://www.statgen.ucr.edu). The program took ∼2 min to converge for each trait on a Pentium PC with a 3.60-GHz processor and 3.00 GB RAM. The numbers of replications, the estimated means , and the phenotypic variances are listed in Table 1 for all seven traits. The estimated QTL effects are plotted three dimensionally (3D) in Figure 1. Clearly, additive effects (diagonals of the plots) dominate in each of the seven traits. We chose the critical value at an experimental type I error of for each trait. The critical values obtained from the average of 20 reshuffled samples are also listed in Table 1. The total number of QTL effects detected (*N*) ranged from 13 (KWT) to 21 (MAT and TWT) with an average of 18. The number of significant additive effects () ranged from 8 (LDG) to 17 (TWT) with an average of 12. The number of significant epistatic effects () ranged from 1 (KWT) to 11 (LDG) with an average of 6. Table 1 shows that, on average, additive variance () dominates over epistatic variance () for all traits, with an average = 0.35 and an average = 0.06. The average was 0.41 with KWT having the highest (0.47) and TWT the lowest (0.35). The highest single additive effect () explained ∼18% of the phenotypic variance (KWT) and the highest single epistatic effect () explained 2.6% of the phenotypic variance (LDG). Clearly, the total genetic variance contributed by significant QTL effects was predominantly determined by the additive variance. The result was consistent for all the seven traits.

Among the detected epistatic effects, we investigated the proportion of the epistatic effects between pairs of loci that both lack main effects []. We found that these epistatic effects accounted for 90% of the total number of epistatic effects. The remaining 10% of the epistatic effects involved pairs of loci in which only one had a significant main effect []. There are no significant epistatic effects that involve pairs of loci in which both had significant main effects. This means that whether a locus interacts with another locus does not depend on whether or not the locus has a main effect.

## DISCUSSION

The epistatic effect model used to analyze the barley data is an oversaturated model. The usual solution to the problem in regression is through variable selection (Schwarz 1978). For a model as large as , an exhaustive search would require evaluation of all possible models, which is impossible to achieve within a reasonable time frame, even using a supercomputer. A heuristic search is possible but may not guarantee finding the optimal model. Bayesian model selection (George and McMulloch 1993), by taking advantage of Markov chain Monte Carlo (MCMC) sampling, is a more efficient algorithm than both the exhaustive and the heuristic searches. We employed stochastic search variable selection (SSVS), which is a Bayesian model selection algorithm (George and McMulloch 1995; Yi *et al*. 2003a), to analyze this data set, but it failed because was too small for this large model. Bayesian shrinkage analysis (Xu 2003a) also failed to generate any meaningful results. We then tried the LASSO algorithm (Tibshirani 1996) and it failed for most of the traits. The failure of all these algorithms was reflected by the strange result: most of the estimated QTL effects were extremely large so that the total genetic variance (the sum of squares of the estimated QTL effects) was larger than the phenotypic variance by many orders of magnitude. The reasons for the failure of LASSO are unclear, but small sample size may be responsible for this. The MCMC-based analyses (SSVS and Bayesian shrinkage) use hierarchical models, which involve an extremely large number of parameters. The epistatic model is a linear model. Our primary interest is in the regression coefficients (additive and epistatic effects, both denoted by , ). However, the MCMC-based methods also infer the variance components (), which are the parameters of the prior distributions of the regression coefficients. The methods draw all the variables (regression coefficients and variance components) sequentially from their conditional posterior distributions. The methods try to infer too much from the data and thus require large sample sizes. The simulation experiments conducted by Xu (2007) showed that the MCMC-based methods performed satisfactorily when the sample size was 600.

The method we employed here is a Bayesian method in terms of estimation of the QTL effects. However, the variance components () were supposed to be hyperparameters of the normal priors for the regression coefficients. Because there are so many of them, their values are hard to determine *a priori*. Therefore, we estimated them first using the variance-component analysis and then substituted these prior variances by the estimated values, an approach called empirical Bayes (Carlin and Louis 1996). The empirical Bayes method not only generated meaningful results, but also was efficient in terms of high computing speed.

Given the fact that the empirical Bayes method is also a Bayes method, why is it more robust to small sample sizes than the MCMC-based full Bayes methods? Several special properties of the empirical Bayes method may contribute to the robustness. First, the empirical Bayes method does not use a hierarchical model, and thus it infers a smaller number of parameters than the full Bayes methods. Although the empirical Bayes still estimates the variance components, these variance components are estimated separately using a marginal maximum-likelihood (ML) method before the Bayes analysis. By “marginal ML” we mean that the likelihood function is only a function of the variance components and the regression coefficients have been integrated out. The estimates of the variance components do not depend on the regression coefficients. This is clearly in contrast to the full Bayes methods in which the regression coefficients and the variance components are inferred sequentially with the estimate of one parameter depending on estimates of all other parameters. Second, the empirical Bayes method does not involve MCMC sampling for inference of the parameter distribution. The MCMC-based full Bayes methods failed not because they generated wrong estimates but because the Markov chains had never converged to the stationary distribution of the parameters and thus never generated meaningful estimates of the parameters. Third, the empirical Bayes method employed in this study infers the marginal posterior mean of each regression coefficient, with other regression coefficients integrated out. Therefore, the estimate of one regression coefficient does not depend on the estimates of other regression coefficients. This clearly has improved the robustness of the method to small sample size.

The justification for the robustness of the empirical Bayes method to small sample sizes does not mean that the empirical Bayes method can deal with a population with an infinitely small sample size. There is a limit below which any method will fail because the data are just too small to contain sufficient information to make any inferences. Is a sample size of 145 DH lines sufficient for QTL mapping? The answer is probably no in general (Beavis 1994; Melchinger *et al*. 1998), but 145 lines seem to be sufficient to infer large-effect QTL in this particular barley population. One reason is that the phenotypic value of each line in each environment was measured by the average value of several plants, all with the same genotype (Tinker *et al*. 1996). The mean phenotypic value (plot mean) of a line in a particular environment already represented approximately the genotypic value of that line in that environment. The variance of plot means across environments was largely due to interaction. In addition, each line was replicated in many (∼25) environments. After taking the average of the trait values across the environments, we have further reduced the residual error and made the mean value close to the true genotypic value. Certainly, we would not be able to handle such a large number of effects with 145 lines if the trait value for each line were measured from a single plant. Nevertheless, 145 is not a large number and thus our conclusions are still limited.

Although the empirical Bayes method worked well for this data set, the sample size was not sufficiently large to shrink all small effects to zero. Many spurious QTL effects still occurred. Therefore, we adopted a permutation test to find a suitable criterion to select those QTL with effects sufficiently large to be declared as significant. Theoretically, the critical value should be obtained from many reshuffled samples, say 1000. In this study, the number of effects in the model was so large that 20 reshuffled samples appeared to be sufficient. We found that the variance of the critical values obtained from multiple reshuffled samples was small. For example, the average critical value for HED at obtained from 20 reshuffled samples was 0.0699 and the standard deviation was 0.00563 (coefficient of variation, CV = 8%). When taking the average of 20 reshuffled samples, we have reduced the CV to . The CVs of other traits are all <2% (data not shown). Therefore, we used the critical value obtained from 20 reshuffled samples for each trait as an approximation of the true critical value. We ranked the estimated QTL effects in descending order obtained from both the original sample and the reshuffled sample and investigated the patterns of change of the estimated effects. Figure 2 shows the patterns of the change for the top 500 effects (truncated from a total of 8128 effects) for four of the seven traits. The reshuffled samples showed drastically different patterns of change from the original samples. The shaded area of a reshuffled sample, *e.g*., HED-PERM, resembles a triangle whereas that of the original sample does not. Therefore, by simply comparing the patterns of change for the estimated effects from the original sample, we can tell that some large estimated effects cannot be explained by chance.

Usually, significance tests are not required in Bayesian analysis; only frequentists emphasize significance tests. We employed a permutation test to make inferences about the significance of a QTL effect. This is a hybrid approach between Bayesian and frequentist approaches. The purpose of this study was not for QTL detection; rather it was intended to assess the importance of epistasis relative to additivity for economically important quantitative traits in barley. If QTL detection were the purpose, simply reporting all QTL effects that explain more than -proportion of the total phenotypic variance would suffice, where may be a natural choice. If a QTL explains less than -proportion of the total phenotypic variance, it will not be biologically important, even if it may be statistically significant. However, the purpose of this study was to evaluate the genomewide QTL effects. On one hand, we wanted to include every estimated QTL effect that cannot be explained by chance. On the other hand, we wanted to exclude all the noisy effects. Therefore, we adopted the permutation test to separate the true QTL from the spurious QTL. Of course, the thresholds drawn from the permutation test are arbitrary. For the thresholds chosen in this study, each QTL effect deemed to be significant has a chance of to be false positive. The empirical Bayes method has already shrunk a large number of effects to zero, but still left many effects deviating from zero. These deviations, although very small individually, collectively contribute to a large proportion of the trait variance, because of the extremely large number of epistatic effects included in the model. We noted that the residual variance has been consumed by the large number of epistatic effects; *i.e*., the estimated residual error variance is close to zero in every case. The situation is equivalent to multiple regression using the least-squares method where the residual variance decreases as the number of independent variables increases until it reaches zero as . Once , the least-squares method becomes invalid. The empirical Bayes method still works even if *p* ≫ *n*, but the residual variance is consumed by so many of the spurious effects. If all the estimated effects were included in the calculation of the genetic variance without using some kind of testing criterion, the overall epistatic variance would always dominate over the additive variance (a useless conclusion). This explains why significance tests have been conducted in this study.

The epistatic effects defined in our model are different from the orthogonal contrasts defined by Cockerham (1954) and recently reiterated by Kao and Zeng (2002) and Zeng *et al*. (2005). Kao and Zeng called Cockerham's epistatic effects the statistical parameters and the epistatic effects defined here the genetic parameters. The genetic parameters were also called physiological parameters by Cheverud and Routman (1995). The orthogonal contrasts can be expressed as linear functions of the genetic effects defined in our model. This means that Cockerham's epistatic effects are functions of the main effects defined in our model. This may justify Kao and Zeng (2002) for estimating epistatic effects only for loci that both have main effects because main effects contribute to the orthogonalized epistatic effects. The presence of epistatic effects between two loci defined in our study does not depend on whether or not the two loci both have main effects. This has been proved by the analysis of this experiment. Take trait HED, for example; we detected five epistatic effects, but none of the effects involved two loci that both had significant main effects. One epistatic effect involved a pair of loci of which only one had a significant main effect (1/5 = 0.2). The remaining four epistatic effects (4/5 = 0.8) involved pairs of loci that both lack main effects. In traditional QTL mapping, people normally scan the genome for QTL with main effects. Once QTL with main effects are identified, an epistatic model is fit to examine the epistatic effects only between the QTL that have main effects. This approach certainly has no logical basis. We would not be able to detect any epistatic effects for the barley data if this approach had been taken.

Our conclusion for the barley data analysis was that epistatic variance does contribute to the genetic variance. However, the cumulative contribution from significant epistatic effects is very small relative to that from the additive effects. This discovery was consistent for all the seven traits investigated. A recent study using F_{2} progeny of a cross between two inbred mice for obesity-related traits showed that many epistatic effects were significant, but they were all small in magnitude relative to the additive effects (Yi *et al*. 2006). The largest main-effect QTL contributed ∼20% of the trait variance but the largest epistatic effect accounted for only ∼5% of the trait variance. The result from the mice experiment was similar to that from the barley analysis. Yi *et al*. (2006) used a Bayesian model selection method to analyze the mice data. Although the model used by Yi *et al*. (2006) was not saturated, it was a multiple-effects model in which multiple main effects and epistatic effects were simultaneously estimated in a single model, a feature also shared by the method used this study. Therefore, simultaneous estimation of all model effects seemed to support the notion that epistasis is not an important contributor to the genetic variance of less diversified varieties of crops or breeds of animals.

## Acknowledgments

We are grateful to two anonymous reviewers for their useful comments and suggestions on the first version of this manuscript, which have significantly improved its current presentation. This research was supported by the National Institutes of Health grant R01-GM55321 and the National Science Foundation grant DBI-0345205 to S.X.

## Footnotes

Communicating editor: N. Takahata

- Received October 6, 2006.
- Accepted January 13, 2007.

- Copyright © 2007 by the Genetics Society of America