# The Role of Epistasis in the Manifestation of Heterosis: A Systems-Oriented Approach

A. E. Melchinger^{*,1}, H. F. Utz^{*}, H.-P. Piepho^{†}, Z.-B. Zeng^{‡} and C. C. Schön^{§}

^{*}Institute of Plant Breeding, Seed Science and Population Genetics, ^{†}Bioinformatics Unit, Institute of Crop Production and Grassland Research and ^{§}State Plant Breeding Institute, University of Hohenheim, 70599 Stuttgart, Germany and ^{‡}Program in Statistical Genetics, Department of Statistics, North Carolina State University, Raleigh, North Carolina 27695

^{1}*Corresponding author:* Institute of Plant Breeding, Seed Science, and Population Genetics, University of Hohenheim, Fruwirthstrasse 21, 70599 Stuttgart, Germany. E-mail: melchinger{at}uni-hohenheim.de

## Abstract

Heterosis is widely used in breeding, but the genetic basis of this biological phenomenon has not been elucidated. We postulate that additive and dominance genetic effects as well as two-locus interactions estimated in classical QTL analyses are not sufficient for quantifying the contributions of QTL to heterosis. A general theoretical framework for determining the contributions of different types of genetic effects to heterosis was developed. Additive × additive epistatic interactions of individual loci with the entire genetic background were identified as a major component of midparent heterosis. On the basis of these findings we defined a new type of heterotic effect denoted as augmented dominance effect $d_i^*$ that comprises the dominance effect at each QTL minus half the sum of additive × additive interactions with all other QTL. We demonstrate that genotypic expectations of QTL effects obtained from analyses with design III using testcrosses of recombinant inbred lines and composite-interval mapping precisely equal genotypic expectations of midparent heterosis, thus identifying genomic regions relevant for expression of heterosis. The theory for QTL mapping of multiple traits is extended to the simultaneous mapping of newly defined genetic effects to improve the power of QTL detection and distinguish between dominance and overdominance.

THE concept of heterosis is widely used in plant and animal breeding. In many species, the controlled crossing of selected parental components, mainly inbred lines, is employed to maximize heterosis and thus agronomic performance of the resulting F_{1} hybrids. However, the understanding of this biological phenomenon is limited and the genetic basis of heterosis has yet to be elucidated.

Quantitative genetics has contributed to our understanding of heterosis by (i) formulating genetic models explaining heterosis on the basis of different modes of gene action such as dominance, overdominance, and epistasis (for review see Lamkey and Edwards 1999); (ii) devising the theory for the design and analysis of experiments investigating these types of gene action (for review see Lynch and Walsh 1998); and (iii) gathering a plethora of experimental data supporting or questioning these theories. One of the experimental approaches proposed for investigating the relative importance of different types of gene action is the analysis of generation means (Kearsey and Pooni 1996). Six basic generations (the two parental lines and their F_{1} and F_{2} as well as backcrosses of the F_{1} to each of the parents) are used to estimate the magnitude of additive, dominance, and epistatic effects affecting the quantitative trait under study. However, these parameters capture the net contribution of gene effects summed over all loci and consequences of summation may be pronounced if positive and negative effects at individual loci cancel each other. An alternative approach is the partitioning of the genetic variance into independent components due to additive, dominance, and epistatic effects (Fisher 1918). However, as pointed out by Lynch and Walsh (1998), unless information on gene frequencies of the reference population is available, variance components provide limited information on the relative importance of the different modes of gene action because dominance and epistasis can greatly affect additive or dominance components of variance. This limitation can be overcome by the use of populations derived from a cross between two inbred lines such as the North Carolina experiment III (design III) proposed by Comstock and Robinson (1952). 
A random sample of F_{2} individuals derived from a cross between two homozygous inbred lines is backcrossed to each of the parents, yielding a population with gene and genotype frequencies equivalent to an F_{2}. A one-way analysis of variance (ANOVA) of phenotypic means and differences of the F_{2} backcrosses yields estimates of the dominance and additive genetic variance with nearly equal precision and their ratio provides a weighted estimate of the squared degree of dominance.

A major step toward the analysis of the type of gene action at individual genetic loci affecting quantitative traits (QTL) was the advent of molecular marker technology. The first marker-aided study estimated the predominant type of gene action in segregating F_{2} populations of maize (Edwards *et al*. 1987). Additive and dominance gene effects at individual QTL as well as digenic epistatic interactions were estimated on the basis of contrasts of marker genotype classes. However, these estimates are not sufficient for making inferences about the genetic basis of heterosis. The performance of an F_{1} hybrid or its filial generations and midparent heterosis (MPH) have distinct genetic components. Assuming the presence of digenic epistasis, hybrid performance is a function of the sum of dominance effects and dominance × dominance (*dd*) epistasis. To date, a general mathematical derivation of the contributions of different genetic effects to MPH is still lacking, but quantitative genetic expectations of MPH have been presented in a descriptive manner (*e.g*., Van Der Veen 1959). While additive × additive (*aa*) interactions contribute to MPH, *dd* interactions may not, depending on the metric used for describing genotypic values. In this study we show that QTL exhibiting significant dominance effects may not contribute to MPH if the sum of their *aa* epistatic interactions with the genetic background is positive. Furthermore, we show that significant heterotic QTL can result from *aa* epistatic interactions with the genetic background if positive and negative alleles are contributed with equal frequency by each parent. Consequently, we define a new genetic effect ($d_i^*$) that allows us to express MPH as the sum of individual QTL effects and demonstrate that design III is suitable for mapping of QTL with genotypic expectations equivalent to the augmented dominance effect $d_i^*$.

The objectives of this study were to (i) develop a general theoretical framework for determining the contributions of the different types of genetic effects to heterosis, (ii) dissect MPH into its underlying components using composite-interval mapping (CIM) and a variant of design III, (iii) give quantitative genetic expectations of heterotic QTL, and (iv) extend the theory for the joint analysis of multiple traits (Jiang and Zeng 1995) to the simultaneous mapping of different newly defined genetic effects to improve the power of QTL detection and distinguish between dominance and overdominance. Results from an experiment on heterosis in Arabidopsis for which the presented theory has formed the quantitative genetic basis are in an accompanying article (Kusterer *et al*. 2007, this issue).

## GENETIC EFFECTS CONTRIBUTING TO HETEROSIS

When assessing the type of gene action contributing to heterosis, we need to express heterosis as a function of additive, dominance, and epistatic gene effects. To date, no generalized mathematical derivation showing the respective contributions of the different genetic effects has been developed for an arbitrary number of QTL and all types of higher-order epistatic effects.

MPH for a quantitative trait is defined as the difference between the genotypic value of an F_{1} hybrid ($G_{F_1}$) and the mean genotypic value of its two homozygous parents ($\bar{G}_{P}$):

$$\mathrm{MPH} = G_{F_1} - \bar{G}_{P} = G_{F_1} - \tfrac{1}{2}\left(G_{P1} + G_{P2}\right). \tag{1}$$

Let P1 and P2 differ at the loci set $Q = \{1, \ldots, q\}$ affecting the quantitative trait of interest. Let $v_i$ be an indicator variable for the genotype at QTL $i$ taking values 0, 1, 2 if homozygous P1, heterozygous, or homozygous P2, respectively. The types of genetic effects contributing to $G$ are described by the $3^q$ parameters $\alpha_{AD}$, which define the genetic effects of type additive at the loci set $A$ ($A \subseteq Q$) and of type dominance at the loci set $D$ ($D \subseteq Q \setminus A$, the complement of $A$ in $Q$). For exemplification of $\alpha_{AD}$ for $q = 3$ see supplemental Table S1 at http://www.genetics.org/supplemental/.

The coefficients and meaning of parameters $\alpha_{AD}$ in the genotypic value $G$ depend on the choice of the metric. In the quantitative genetic literature, two main metrics have been described for populations derived from a cross between two inbred lines: the F_{2} metric and the F_{∞} metric (Van Der Veen 1959; Yang 2004). The F_{2} metric model, a special case of Cockerham's (1954) model for partitioning the genetic variance into eight orthogonal contrasts due to additive, dominance, and epistatic effects, defines genetic effects as deviations from the mean of the F_{2} population in linkage equilibrium. The F_{∞} model defines genetic effects as contrasts between different genotypes without reference to any population (Yang 2004). Both models can be translated into each other by a linear transformation. In the presence of epistasis they differ with respect to interpretation of genetic effects and the structure of variance components. With the F_{2} metric and an F_{2} population in linkage equilibrium the genetic variance is partitioned into orthogonal components that represent sums of squared additive, dominance, and epistatic effects, providing insight into the relative importance of different types of gene action. Consequently, in the analysis of heterosis with design III, we prefer the F_{2} metric. With the F_{∞} metric, additional complexity is introduced, because besides the dominance effect at a QTL and its *aa* epistatic interactions, *dd* epistatic interactions with the genetic background also contribute to MPH (Van Der Veen 1959). Consequently, genetic expectations of individual heterotic QTL become unwieldy and more difficult to interpret with the F_{∞} metric.

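Under the F_{2} metric, variables $x_{V,i}$ and $y_{V,i}$ determine the coefficient of $\alpha_{AD}$ in $G$ with $x_{V,i} = -1, 0, 1$ and $y_{V,i} = -\tfrac{1}{2}, \tfrac{1}{2}, -\tfrac{1}{2}$ when $v_i = 0, 1, 2$, respectively.

Thus, we can express the genotypic value of genotype $V = (v_1, \ldots, v_q)$ as

$$G_V = \sum_{A \subseteq Q}\ \sum_{D \subseteq Q \setminus A} \Big(\prod_{i \in A} x_{V,i}\Big)\Big(\prod_{j \in D} y_{V,j}\Big)\,\alpha_{AD}, \tag{2}$$

with $\sum_{A \subseteq Q}$ indicating summation over all possible subsets $A$ within the set $Q$. For $A = \emptyset$ and $D = \emptyset$, $\alpha_{AD} = \mu$; for $D = \emptyset$, $\alpha_{AD} = \alpha_{A}$; for $A = \emptyset$, $\alpha_{AD} = \alpha_{D}$.

Let us assume an F_{2} individual with $V = (v_i, v_j, v_k) = (2, 1, 0)$. Then, $x_{V,i} = 1$, $x_{V,j} = 0$, $x_{V,k} = -1$, $y_{V,i} = -\tfrac{1}{2}$, $y_{V,j} = \tfrac{1}{2}$, and $y_{V,k} = -\tfrac{1}{2}$. Thus,

$$\begin{aligned}
G_V = {} & \mu + a_i - a_k - \tfrac{1}{2} d_i + \tfrac{1}{2} d_j - \tfrac{1}{2} d_k - aa_{ik} + \tfrac{1}{2} ad_{ij} - \tfrac{1}{2} ad_{ik} + \tfrac{1}{2} da_{ik} - \tfrac{1}{2} da_{jk} \\
& - \tfrac{1}{4} dd_{ij} + \tfrac{1}{4} dd_{ik} - \tfrac{1}{4} dd_{jk} - \tfrac{1}{2} ada_{ijk} - \tfrac{1}{4} add_{ijk} + \tfrac{1}{4} dda_{ijk} + \tfrac{1}{8} ddd_{ijk}.
\end{aligned}$$

The parameter $\mu$ denotes the genotypic expectation of the F_{2} generation in linkage equilibrium. In accordance with the definition of Falconer and Mackay (1996, p. 109) $a_i$ denotes the additive or homozygous effect at QTL $i$ and $d_i$ the dominance or heterozygous effect. The additive effect at locus $i$ is positive ($+a_i$) when the trait-increasing allele is contributed by P2 and negative ($-a_i$) when contributed by P1. The degree of dominance is expressed as $d_i/a_i$. Epistatic interactions between loci $i$, $j$, and $k$ are denoted by the sequence of their effect types with the loci as subscripts (*e.g*., $aa_{ik}$, $da_{ik}$, $add_{ijk}$, with the order of the letters matching the order of the subscripts), adopting the notation of Yang (2004). Analogously, the genotypic values of the parental homozygous lines P1 and P2 and the F_{1} hybrid are

$$G_{P1} = \sum_{A \subseteq Q}\ \sum_{D \subseteq Q \setminus A} (-1)^{|A|}\left(-\tfrac{1}{2}\right)^{|D|}\alpha_{AD}$$

and

$$G_{P2} = \sum_{A \subseteq Q}\ \sum_{D \subseteq Q \setminus A} \left(-\tfrac{1}{2}\right)^{|D|}\alpha_{AD}$$

and

$$G_{F_1} = \sum_{D \subseteq Q} \left(\tfrac{1}{2}\right)^{|D|}\alpha_{D},$$

where $|A|$ and $|D|$ denote the number of elements in sets $A$ and $D$, respectively. Then, MPH can be calculated as

$$\mathrm{MPH} = \sum_{\substack{D \subseteq Q \\ |D|\ \mathrm{odd}}} \left(\tfrac{1}{2}\right)^{|D|-1}\alpha_{D}\ -\ \sum_{\substack{A \subseteq Q,\ A \neq \emptyset \\ |A|\ \mathrm{even}}}\ \sum_{D \subseteq Q \setminus A} \left(-\tfrac{1}{2}\right)^{|D|}\alpha_{AD}. \tag{3}$$

This generalized derivation shows that under the F_{2} metric the quantitative genetic expectation of MPH is affected by the dominance effects at QTL and by epistatic interactions including an odd number of dominance terms (*e.g*., *ddd* but not *dd*). In addition, epistatic effects including additive terms also contribute to MPH, because MPH is based on the deviation of the F_{1} hybrid from the mean of the two homozygous parental lines. However, only effects with an even number of additive terms (*e.g*., *aa* and *aad* but not *ad* and *aaa*) contribute to MPH.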
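The parity rules stated above can be checked by brute force. The following is a minimal sketch, not part of the original derivation: it enumerates all $3^q$ effects for $q = 3$ using the F_{2}-metric coding from the text, and the function names and the one-character-per-locus labels (`'a'`, `'d'`, `'-'`) are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

# F2-metric coding for v = 0 (P1 homozygote), 1 (heterozygote), 2 (P2 homozygote)
X = {0: Fraction(-1), 1: Fraction(0), 2: Fraction(1)}
Y = {0: Fraction(-1, 2), 1: Fraction(1, 2), 2: Fraction(-1, 2)}


def effect_coefficients(v):
    """Coefficient of every effect in the genotypic value of genotype v.

    Effects are labelled with one character per locus: 'a' (locus in A),
    'd' (locus in D) or '-' (locus in neither set); all '-' is the mean mu.
    """
    coefs = {}
    for label in product("ad-", repeat=len(v)):
        c = Fraction(1)
        for effect, vi in zip(label, v):
            if effect == "a":
                c *= X[vi]
            elif effect == "d":
                c *= Y[vi]
        coefs["".join(label)] = c
    return coefs


def mph_coefficients(q):
    """Coefficient of every effect in MPH = G_F1 - (G_P1 + G_P2) / 2."""
    f1 = effect_coefficients((1,) * q)
    p1 = effect_coefficients((0,) * q)
    p2 = effect_coefficients((2,) * q)
    return {k: f1[k] - (p1[k] + p2[k]) / 2 for k in f1}


example = effect_coefficients((2, 1, 0))   # genotype V = (2, 1, 0) from the text
mph = mph_coefficients(3)
contributing = sorted(k for k, c in mph.items() if c != 0)
print(contributing)
```

Under these assumptions, only effects with an even number of additive terms survive in `contributing`, and purely dominance effects survive only when they involve an odd number of loci.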

Considering only digenic epistasis, Equation 3 can be written as

$$\mathrm{MPH} = \sum_{i \in Q} d_i - \tfrac{1}{2}\sum_{i \in Q}\ \sum_{j \in Q_i} aa_{ij},$$

with $Q_i$ denoting the loci set $Q$ excluding element $i$.

To express MPH as the sum of individual QTL effects we define a new type of heterotic genetic effect $d_i^*$ that we denote as augmented dominance effect and that includes the dominance effect of QTL $i$ ($d_i$) minus half the sum of its additive × additive epistatic interactions ($aa_{ij}$) with all other QTL irrespective of linkage:

$$d_i^* = d_i - \tfrac{1}{2}\sum_{j \in Q_i} aa_{ij}. \tag{4}$$

Then, MPH can be expressed as the sum of augmented dominance QTL effects:

$$\mathrm{MPH} = \sum_{i \in Q} d_i^*. \tag{5}$$

The effect of additive × additive epistasis on MPH and $d_i^*$ is demonstrated with a numerical example for four QTL in the supplemental information (supplemental Table S2 at http://www.genetics.org/supplemental/). Depending on gene dispersion and the magnitude of the *aa* epistatic effects, QTL with significant dominance effects $d_i$ may not contribute to MPH if the sum of their *aa* epistatic interactions with the genetic background is positive [supplemental Table S2, **S** = (+, +, +, +)]. Furthermore, significant heterotic QTL can result from *aa* epistatic interactions with the genetic background if positive and negative alleles are contributed with equal frequency by each parent, as would be expected for an elite × elite cross [supplemental Table S2, **S** = (+, +, −, −)].
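As a numerical sanity check of Equations 4 and 5, the sketch below evaluates the F_{1} and both parents under an F_{2}-metric model with digenic epistasis and compares MPH with the sum of augmented dominance effects. All effect values are randomly drawn and purely hypothetical (they are not the values of supplemental Table S2), and the helper names are illustrative.

```python
import random
from itertools import combinations, permutations


def genotypic_value(v, mu, a, d, aa, ad, dd):
    """F2-metric genotypic value with digenic epistasis.

    aa and dd are symmetric q x q tables; ad[i][j] is the interaction of the
    additive effect at locus i with the dominance effect at locus j.
    """
    q = len(v)
    x = [vi - 1 for vi in v]                      # -1, 0, 1 for v = 0, 1, 2
    y = [0.5 if vi == 1 else -0.5 for vi in v]    # -1/2, 1/2, -1/2
    g = mu + sum(x[i] * a[i] + y[i] * d[i] for i in range(q))
    for i, j in combinations(range(q), 2):
        g += x[i] * x[j] * aa[i][j] + y[i] * y[j] * dd[i][j]
    for i, j in permutations(range(q), 2):
        g += x[i] * y[j] * ad[i][j]
    return g


def symmetric_table(q, rng):
    """Random symmetric table of pairwise effects with a zero diagonal."""
    t = [[0.0] * q for _ in range(q)]
    for i, j in combinations(range(q), 2):
        t[i][j] = t[j][i] = rng.uniform(-1, 1)
    return t


rng = random.Random(1)          # hypothetical effects, fixed seed
q, mu = 4, 10.0
a = [rng.uniform(-1, 1) for _ in range(q)]
d = [rng.uniform(0, 1) for _ in range(q)]
aa, dd = symmetric_table(q, rng), symmetric_table(q, rng)
ad = [[rng.uniform(-1, 1) if i != j else 0.0 for j in range(q)] for i in range(q)]

f1 = genotypic_value([1] * q, mu, a, d, aa, ad, dd)
p1 = genotypic_value([0] * q, mu, a, d, aa, ad, dd)
p2 = genotypic_value([2] * q, mu, a, d, aa, ad, dd)
mph = f1 - (p1 + p2) / 2

# Augmented dominance effects (Equation 4)
d_star = [d[i] - 0.5 * sum(aa[i][j] for j in range(q) if j != i) for i in range(q)]
print(mph, sum(d_star))
```

The two printed numbers agree up to floating-point rounding, whatever values are chosen for the *ad* and *dd* interactions.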

The quantitative genetic expectation of the parental difference (PD) is found to be

$$\mathrm{PD} = G_{P2} - G_{P1} = 2\sum_{\substack{A \subseteq Q \\ |A|\ \mathrm{odd}}}\ \sum_{D \subseteq Q \setminus A} \left(-\tfrac{1}{2}\right)^{|D|}\alpha_{AD}. \tag{6}$$

Note that under the F_{2} metric the formulas for homozygous genotypes such as parents P1 and P2 involve also dominance terms ($\alpha_{AD}$ with $D \neq \emptyset$) besides purely additive terms ($\alpha_{A}$). As a consequence, PD is affected by the additive effects at the QTL as well as by epistatic interactions including an odd number of additive and an arbitrary number of dominance terms (*e.g*., *ad* but not *aa*). Considering only digenic epistasis, PD reduces to

$$\mathrm{PD} = 2\sum_{i \in Q} a_i^*,$$

where

$$a_i^* = a_i - \tfrac{1}{2}\sum_{j \in Q_i} da_{ij}. \tag{7}$$

In accordance with the term suggested for the effect $d_i^*$, $a_i^*$ is denoted as augmented additive effect. It includes the additive effect for QTL $i$ ($a_i$) minus half the sum of dominance × additive epistatic interactions ($da_{ij}$) with all other QTL, corresponding exactly to the net contribution of QTL $i$ to the parental difference.

## IDENTIFICATION OF AUGMENTED QTL EFFECTS

On the basis of the results from the previous section it becomes obvious that we need to revise our approaches for the identification of genomic regions contributing to heterosis. Instead of identifying QTL with maximum dominance and *dd* interactions that increase F_{1} performance we need to identify genomic regions that contribute to MPH, *i.e*., that yield significant results in the QTL analysis due to the dominance effect at QTL $i$ and its epistatic interaction with the genetic background. Thus, specific experimental designs are needed that identify QTL with genotypic expectations that precisely equal the augmented dominance effect $d_i^*$. To our knowledge this criterion is met only by genetic effects estimated with design III (Comstock and Robinson 1952). The original design III comprised the analysis of F_{2} individuals backcrossed to their parental inbred lines and was devised for estimating the average degree of dominance over all loci. We modified the statistical analysis of design III to accommodate the analysis of testcrosses of recombinant inbred lines (RILs) because they are “immortal” test units that can be shared between research groups and can be repeatedly phenotyped.

#### Design III with RILs:

Let us assume a random population of RILs derived from the cross between the two homozygous lines P1 and P2. Further, we assume that the RILs are backcrossed to their parental lines, yielding testcross progenies $H_1$ and $H_2$. Gene effects of the design III testcross progenies are expressed with regard to the corresponding gene-orthogonal population, *i.e*., the F_{2}. The parental line exhibiting superior average testcross performance across both testers is denoted as P2. With testcrosses evaluated in a randomized complete block design we obtain phenotypic trait values $Y_{tpk}$ of the testcross progeny $H_t$ of RIL $p$ ($p = 1, \ldots, n$) crossed with tester $t$ ($t = 1, 2$ for design III) in the $k$th block ($k = 1, \ldots, r$). Following Comstock and Robinson (1952), we perform two linear transformations $Z_s$ ($s = 1, 2$) on the performance data $Y_{tpk}$ of testcross progenies $H_t$ with pair means $Z_{1pk} = (Y_{1pk} + Y_{2pk})/2$ and pair differences $Z_{2pk} = Y_{1pk} - Y_{2pk}$. Thus, $Z_{spk}$ denotes the phenotypic value of transformation $Z_s$ for RIL $p$ grown in the $k$th block and $Z_{sp}$ the progeny mean value of $Z_s$ for RIL $p$. Expected mean squares from the ANOVA are given in Table 1.

#### Marker-based estimation of augmented effects $a_i^*$ and $d_i^*$:

The first to present the statistical theory for estimation of the type of gene action at individual QTL with design III were Cockerham and Zeng (1996). They presented genotypic expectations of marker contrasts using single-marker ANOVA and extended the analysis to test for two-locus interactions between linked QTL. They demonstrated that genotypic expectations of QTL mapped with design III are biased in the presence of epistasis: dominance effects are confounded with *aa* epistasis and additive effects with *da* interactions, regardless of linkage. However, interactions of individual QTL with the entire genetic background were not accounted for and, most importantly, they did not make the connection to genetic expectations of MPH. While Cockerham and Zeng (1996) considered the confounding epistatic effects in their analysis a limitation, we claim that they are favorable for the identification of genomic regions contributing to heterosis. The following derivations show that genotypic expectations of QTL identified with design III precisely equal the augmented dominance effect $d_i^*$.

Cockerham and Zeng (1996) defined four orthogonal single-marker contrasts ($C_1$, $C_2$, $C_3$, $C_4$) among the means of testcross progenies from F_{2} individuals or F_{3} lines, estimating additive ($C_1$) and dominance effects ($C_3$) as well as digenic epistasis of linked QTL ($C_2$, $C_4$). Here, we extend their methods to the analysis of heterosis and to the analysis of RILs. We define two orthogonal single-marker contrasts on the basis of progeny mean values $Z_{sp}$ for pair means ($Z_1(m)$) and pair differences ($Z_2(m)$) (see the appendix for a generalized derivation). While contrasts $C_1$ and $C_3$ in the notation of Cockerham and Zeng (1996) correspond to $Z_1(m)$ and $Z_2(m)$ in our notation, contrasts $C_2$ and $C_4$, which provide tests for epistasis among linked QTL, cannot be calculated for RILs because they rely on comparisons of the heterozygous *vs*. homozygous marker classes.

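With epistasis restricted to digenic effects, we obtain

$$E[Z_1(m)] = 4\sum_{i \in Q} D_{mi}\, a_i^* \tag{8}$$

$$E[Z_2(m)] = 8\sum_{i \in Q} D_{mi}\, d_i^*, \tag{9}$$

$D_{mi}$ being the linkage disequilibrium between QTL $i$ and marker $m$ (Weir 1996). For RILs, $D_{mi}$ can be calculated from the recombination frequency $r_{mi}$ between QTL $i$ and marker $m$ as $D_{mi} = (1 - 2r_{mi})/[4(1 + 2r_{mi})]$ (for further details see the appendix).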
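The dependence of the linkage disequilibrium on the recombination frequency can be sketched as follows. This is an illustration under stated assumptions rather than the appendix derivation: RILs are assumed to be developed by repeated selfing, so that the proportion of recombinant lines follows the classical Haldane-Waddington relation $R = 2r/(1 + 2r)$, both alleles are assumed to have frequency 1/2, and doubled-haploid lines are included for comparison. The function names are hypothetical.

```python
def ril_recombinant_fraction(r):
    """Proportion of recombinant RILs after repeated selfing (Haldane-Waddington)."""
    return 2 * r / (1 + 2 * r)


def ld_ril(r):
    """Linkage disequilibrium D between two loci among selfing-derived RILs."""
    return (1 - 2 * ril_recombinant_fraction(r)) / 4


def ld_dhl(r):
    """Linkage disequilibrium D among doubled-haploid lines derived from the F1."""
    return (1 - 2 * r) / 4


for r in (0.0, 0.1, 0.3, 0.5):
    print(r, ld_ril(r), ld_dhl(r))
```

Under these assumptions both population types give $D = 1/4$ for complete linkage and $D = 0$ for unlinked loci, while for intermediate linkage the additional meioses during selfing make $D$ smaller in RILs.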

As can be seen from Equation 9, the expectation of $Z_2(m)$ is a multiple of the sum of the augmented dominance effects $d_i^*$ for all QTL with $D_{mi} > 0$, weighted by their linkage disequilibrium $D_{mi}$ to the marker $m$. Thus, estimation of dominance effects of QTL linked to marker $m$ is confounded with digenic epistatic interactions of type additive × additive with the entire genetic background. Analogously, the linear contrast $Z_1(m)$ is a multiple of the sum of augmented additive effects $a_i^*$ for all QTL with $D_{mi} > 0$, weighted by their linkage disequilibrium $D_{mi}$ to the marker.

Expected mean squares for an ANOVA that includes contrasts $Z_1(m)$ and $Z_2(m)$ for design III evaluated in a randomized complete block design are presented in Table 1. Appropriate statistical tests can be derived by standard statistical theory. From expected mean squares, it becomes evident that the power of detecting significant QTL effects linked to marker $m$ depends mainly on the segregation variances of the variables $Z_1$ and $Z_2$ with respect to QTL effects not accounted for by the marker $m$, *i.e*., the genetic background variation.

With single-marker ANOVA the QTL position (here reflected by parameter $D_{mi}$) and QTL effects (here $a_i^*$ or $d_i^*$) are confounded. Hence, the effects of linked QTL cannot be separated, and effects of other unlinked QTL, which cause background noise, are not accounted for. The above shortcomings have been overcome by CIM (Zeng 1994; Jansen and Stam 1994), which allows separate estimation of the QTL position and the QTL effect as well as separation of linked QTL and control of genetic background variation. We have adopted this method for identification of genomic regions affecting heterosis, using the model

$$Z_{sp} = b_{0s} + b_s^*\, x_p^* + \sum_{l=1}^{h} b_{ls}\, x_{lp} + \varepsilon_{sp}, \tag{10}$$

where $Z_{sp}$ is the linear function $Z_s$ calculated for the $p$th RIL, $b_{0s}$ is the mean effect of the model for $Z_s$, $b_s^*$ is the augmented effect of the putative QTL on $Z_s$, $x_p^*$ is the conditional probability of the dummy variable $\theta_m$ given the observed genotypes at the marker loci $m$ and ($m + 1$) flanking the putative QTL ($\theta_m$ takes value 0 or 2 if the genotype of the RIL at the QTL is homozygous P1 or P2, respectively), $b_{ls}$ are partial regression coefficients of $Z_{sp}$ on the $l$th marker assuming $h$ markers included in the model as cofactors, $x_{lp}$ is an indicator variable taking value 0 or 2 depending on the genotype at marker $l$, and $\varepsilon_{sp}$ is the residual effect on $Z_s$ for the $p$th RIL. CIM analyses can be individually performed for the two linear functions $Z_s$ from design III. Tests of significance are straightforward as described for CIM (Zeng 1994) with the null hypothesis $H_0$: $b_s^* = 0$ and the alternative hypothesis $H_A$: $b_s^* \neq 0$.

If we assume complete linkage between the marker $m$ and one of the QTL ($r_{mi} = 0$), we obtain the following genotypic expectations for the contrast of the two homozygous marker classes at marker $m$:

$$E[Z_1(m)] = a_i^* + 4\sum_{j \in Q_i} D_{mj}\, a_j^* \tag{11}$$

$$E[Z_2(m)] = 2\,d_i^* + 8\sum_{j \in Q_i} D_{mj}\, d_j^*. \tag{12}$$

The advantages of CIM in comparison to single-marker ANOVA become obvious immediately: (i) the position of the QTL $i$ can be estimated and (ii) the effect of the linked QTL $j$ should be blocked by the use of cofactors (Jansen and Stam 1994; Zeng 1994). Thus, it is possible to test contrasts for QTL (not markers) and the genotypic expectation for $Z_2(i)$, the contrast of $Z_2$ between the two unobservable homozygous genotype classes at QTL $i$, reduces to $E[Z_2(i)] = 2\,d_i^*$. Consequently, a genome scan with $Z_2$ localizes genomic regions affecting MPH and $b_2^*/2 = d_i^*$. Accordingly, $a_i^*$, the contribution of QTL $i$ to the parental difference, is equal to $b_1^*$. The advantage of design III is that genotypic expectations of QTL mapped with $Z_2$ precisely equal their net contribution to MPH. However, the contribution of the main dominance effect $d_i$ cannot be assessed independently from the sum of *aa* epistatic interactions of QTL $i$ with the genetic background, which is a limitation of design III.

#### Extension of heterotic QTL analyses:

In addition to separate CIM of $Z_1$ and $Z_2$ we propose to apply the theory of a joint analysis for multiple traits as suggested by Jiang and Zeng (1995) to the simultaneous mapping of augmented QTL effects $a_i^*$ and $d_i^*$. Depending on the correlation structure of data on multiple traits a joint analysis may increase the power of the likelihood-ratio (LR) tests for QTL detection and may allow us to distinguish between pleiotropy and close linkage of QTL for individual traits. Following Jiang and Zeng (1995), we regard $Z_1$ and $Z_2$ as two different traits and rewrite the model from Equation 10 in matrix notation,

$$\mathbf{Z} = \mathbf{x}^*\mathbf{b}^* + \mathbf{X}\mathbf{B} + \boldsymbol{\mathcal{E}}, \tag{13}$$

where $\mathbf{Z}$ is a matrix of $Z_{sp}$, $\mathbf{x}^*$ is a column vector of $x_p^*$, $\mathbf{b}^*$ is a row vector of $b_s^*$, $\mathbf{X}$ and $\mathbf{B}$ are two matrices controlling the genetic background variation, and $\boldsymbol{\mathcal{E}}$ is a matrix of $\varepsilon_{sp}$. The power of QTL detection using the model from Equation 13 is significantly increased by a joint LR test if the product of the effects $b_1^*$ (corresponding to $a_i^*$) and $b_2^*$ (corresponding to $2d_i^*$) and the correlation $\rho_{12}$ between residuals $\varepsilon_{1p}$ and $\varepsilon_{2p}$ have different signs (Jiang and Zeng 1995). In the joint QTL analysis, the hypotheses to be tested are $H_0$: $b_1^* = 0$, $b_2^* = 0$ *vs*. $H_A$: $b_1^* \neq 0$ or $b_2^* \neq 0$.

On the basis of experimental results from the analysis of complex traits it seems reasonable to assume that marker–trait associations detected with the model in Equation 10 will rarely explain >50% of the genotypic variance even with large sample sizes (Schön *et al*. 2004). Thus, the residual component $\varepsilon_{sp}$ must be subdivided into two components; *i.e*., $\varepsilon_{sp} = g_{sp} + e_{sp}$, with $g_{sp}$ reflecting the genetic component in $\varepsilon_{sp}$ not accounted for by the putative QTL and the cofactors in the model, and $e_{sp}$ being the experimental error. As a consequence, residual components $\varepsilon_{1p}$ and $\varepsilon_{2p}$ can be correlated within RILs but are independent among RILs. The correlation $\rho_{12}$ between residuals $\varepsilon_{1p}$ and $\varepsilon_{2p}$ can be approximated using the derivation of covariances presented in the appendix; assuming digenic epistasis and absence of linkage, this leads to Equation 14. Hence, it follows that $\rho_{12}$ depends mainly on the variation in sign and magnitude of $a_i^*$ and $d_i^*$. When P1 and P2 are elite inbred lines, we expect that about half of the trait-increasing alleles are contributed by P1 (*i.e*., $a_i^* < 0$) and half by P2 (*i.e*., $a_i^* > 0$). With directional dominance ($d_i^* > 0$), the correlation of residuals is in this case expected to be close to zero. Thus, the joint LR test statistic will approximate the sum of the individual LR test statistics and no increase in the power of QTL detection can be expected from a genome scan using the multivariate model from Equation 13. However, when analyzing progenies from crosses between an elite parent (P2) and an exotic donor line (P1) with a much smaller proportion of positive alleles than the elite parent, the joint LR test statistic can be advantageous for finding heterotic QTL contributed by P1. If P2 contributes the majority of positive dominant alleles, *i.e*., $a_i^* > 0$ and $d_i^* > 0$, then $\rho_{12} > 0$. In genome regions where a positive dominant allele originates from P1 ($a_i^* < 0$; $d_i^* > 0$), the product of effects $a_i^*$ and $d_i^*$ will be negative, while $\rho_{12} > 0$. Thus, the power of detecting heterotic QTL from exotic donor parents will be increased with the joint LR test.
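The sign argument for $\rho_{12}$ can be illustrated with a small enumeration. This sketch covers only the genetic part of the correlation and rests on simplifying assumptions that go beyond the text: QTL are unlinked, there is no epistasis (so $a_i^* = a_i$ and $d_i^* = d_i$), and all effect values are hypothetical. Under the F_{2}-metric coding, a RIL with allele indicators $s_i = +1$ (P2) or $-1$ (P1) then has $Z_1 = \tfrac{1}{2}\sum_i s_i a_i$ and $Z_2 = \sum_i s_i d_i - \sum_i a_i$.

```python
from itertools import product
from math import sqrt


def z_correlation(a, d):
    """Correlation of Z1 and Z2 over all equally frequent RIL genotypes.

    Assumes unlinked QTL and no epistasis; s_i = +1 for P2 and -1 for P1 alleles.
    """
    z1, z2 = [], []
    for s in product((-1, 1), repeat=len(a)):
        z1.append(sum(si * ai for si, ai in zip(s, a)) / 2)
        z2.append(sum(si * di for si, di in zip(s, d)) - sum(a))
    n = len(z1)
    m1, m2 = sum(z1) / n, sum(z2) / n
    cov = sum((u - m1) * (w - m2) for u, w in zip(z1, z2)) / n
    v1 = sum((u - m1) ** 2 for u in z1) / n
    v2 = sum((w - m2) ** 2 for w in z2) / n
    return cov / sqrt(v1 * v2)


d = [1.0, 0.5, 1.0, 0.5]                                 # directional dominance
rho_elite = z_correlation([1.0, 1.0, -1.0, -1.0], d)    # increasing alleles split evenly
rho_exotic = z_correlation([1.0, 1.0, 1.0, 1.0], d)     # P2 carries all increasing alleles
print(rho_elite, rho_exotic)
```

In this toy setting the correlation vanishes when trait-increasing alleles are evenly dispersed between the parents and is clearly positive when they are concentrated in P2, in line with the elite × elite and elite × exotic scenarios described above.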

In addition to increasing the power of QTL detection with multiple traits, the statistical method suggested by Jiang and Zeng (1995) also provides a test for separating pleiotropy from close linkage. In our case, the analogous test can be applied to distinguish between the hypotheses of dominance *vs*. overdominance in genomic regions with significant effects $a_i^*$ and $d_i^*$. LR tests for the null hypothesis $H_0$: $z(1) = z(2)$ *vs*. $H_A$: $z(1) \neq z(2)$, where $z(1)$ and $z(2)$ are the positions of the QTL affecting $Z_1$ and $Z_2$, respectively, have been described by Jiang and Zeng (1995). Under the null hypothesis, the same genetic locus contributes to the augmented effects $a_i^*$ and $d_i^*$. If both effects are of similar magnitude, acceptance of $H_0$ would imply that dominant gene action is prevalent. In contrast, acceptance of $H_A$ would imply that one locus shows additive gene action ($a_i^* > 0$; $d_i^* = 0$) and at the second locus the heterozygote outperforms both homozygotes, thus exhibiting overdominance ($a_j^* = 0$; $d_j^* > 0$).

## GENOTYPIC EXPECTATION OF BETTER PARENT HETEROSIS

When analyzing heterosis in self-pollinating crops such as wheat, rice, or tomato the reference base for the calculation of heterosis is often not the midparent value but the superiority over the better parent (*e.g*., Semel *et al*. 2006). However, if one is interested in the genetic causes of heterosis, it is most plausible to compare the hybrid with the average performance of both parental lines (and not only the better parent) because the F_{1} inherited half its nuclear genome from each parent. As can be seen from the following derivations, better parent heterosis (BPH) can be expressed as a function of augmented dominance and additive effects. Consequently, the genetic causes of BPH are more complex than those of MPH and include the latter.

Defining P2 to be the better-performing parent, the genotypic expectation of BPH can be calculated as

$$\mathrm{BPH} = G_{F_1} - G_{P2} = \mathrm{MPH} - \tfrac{1}{2}\,\mathrm{PD}.$$

Thus, we obtain that BPH is affected by genetic effects with an odd number of dominance terms and effects with at least one additive and an arbitrary number of dominance terms. Considering only digenic epistasis,

$$\mathrm{BPH} = \sum_{i \in Q} \left(d_i^* - a_i^*\right).$$

BPH can be estimated using phenotypic data $-H_2$ (*i.e*., backcrosses of RILs to P2). If we assume complete linkage between the marker $m$ and one of the QTL contributing to BPH ($r_{mi} = 0$), we obtain the following genotypic expectation for the contrast of the two homozygous genotypic marker classes at marker $m$:

$$E[-H_2(m)] = \left(d_i^* - a_i^*\right) + 4\sum_{j \in Q_i} D_{mj}\left(d_j^* - a_j^*\right). \tag{15}$$

Using CIM for QTL mapping with $-H_2$, the position of the QTL $i$ can be estimated and the effect of the linked QTL $j$ is blocked by the use of cofactors. Hence, analogously to $Z_2(i)$, the genotypic expectation for $-H_2(i)$, the contrast of $-H_2$ between the two unobservable homozygous genotype classes at QTL $i$, reduces to $E[-H_2(i)] = d_i^* - a_i^*$.

## EXTENSION OF ANALYSES TO QUADRIGENIC EPISTASIS

Considering the special case of quadrigenic epistasis and using Equation 1, the genotypic expectations of PD and MPH can be calculated. It becomes obvious that effects of higher-order epistasis contribute to PD and MPH. Even if individual effects are small, their summed effects may be large. In the case of quadrigenic interactions, the sum of effects *aaaa* comprises individual effects $aaaa_{ijkl}$ and the sum of *aadd* effects comprises individual effects $aadd_{ijkl}$. When looking at expectations of the marker contrast at marker $m$ for pair means $Z_1$ and pair differences $Z_2$ and assuming $r_{mi} = 0$ and linkage equilibrium between QTL, we find that the estimate $b_2^*/2$ from CIM yields QTL effect estimates that account not only for $d_i^*$ but also for the contribution of *daa* and *aaaa* interactions of QTL $i$ with the genetic background. However, compared with the contribution of digenic interactions to MPH, effects of type *daa* and *aaaa* are only half accounted for and effects *ddd* and *aadd* are not accounted for at all when estimating the heterotic effect at QTL $i$ with CIM. Similar results are obtained for a comparison between genotypic expectations of PD and those of the corresponding QTL effect estimates for $Z_1$, which differ mainly in the contribution of effects *dda* and *addd*. With arbitrary linkage disequilibrium among QTL the expectations above become rather unwieldy, but it can be shown that $b_s^*$ explains a larger proportion of all types of higher-order epistatic effects contributing to midparent heterosis or to the parental difference if linkage between QTL is present.

## GENETIC CONSTITUTION OF VARIANCES

Generalized derivations of the genetic constitution of the variances of pair means (*Z*_{1}) and pair differences (*Z*_{2}) for two QTL are given in the appendix. Contributions of genetic effects to the variances of pair means and pair differences for RILs, summed over all QTL and assuming arbitrary linkage and digenic epistasis, are presented in Table 2. If segregating QTL show intermediate linkage, the linkage disequilibrium coefficient *D _{ij}* and consequently the genotypic expectations of these variances differ for RILs and doubled-haploid lines (DHLs), but deviations are small (see appendix). For unlinked (*r _{ij}* = 0.5) and completely linked (*r _{ij}* = 0) loci, variances for RILs and DHLs are identical.

Using the definitions of *a _{i}** given in Equation 7 and *d _{i}** given in Equation 4 and defining the quadratic forms

(16)

and

(17)

the variances of *Z*_{1} and *Z*_{2} can be expressed as shown in Table 2. For a direct comparison with the results obtained for F_{2} progenies, Table 2 also presents variances (F_{2}) and (F_{2}), which can be readily transformed into those presented by Cockerham and Zeng (1996) with σ_{m}^{2} = (F_{2}) and = (F_{2}). Regardless of the type of population used for producing the testcrosses (*i.e*., RIL or F_{2}), it can be seen that the two quadratic forms are the main components of the variances of *Z*_{1} and *Z*_{2}, respectively. The bias due to digenic epistatic interactions of types *aa* and *dd* as well as *ad* and *da* is small, especially for F_{2}, as also pointed out by Cockerham and Zeng (1996). Thus, we postulate that the variances of *Z*_{1} and *Z*_{2} for RILs are reasonable approximations for one-fourth of the quadratic form of the *a _{i}** and for the quadratic form of the *d _{i}**, respectively. As a result, the estimate of obtained from the ANOVA of design III is a close approximation of the variance of heterotic effects at QTL segregating in the cross P1 × P2.

## AVERAGE DEGREE OF DOMINANCE

Originally, design III was devised to provide an estimate of the average degree of dominance over loci calculated from the ratio of dominance to additive variance. The ratio proposed by Comstock and Robinson (1952) is an approximation of the ratio of the quadratic forms of augmented dominance (*d _{i}**) and additive (*a _{i}**) effects, respectively, rather than of the ratio of dominance and additive variance. Therefore, the ratio should be denoted the augmented degree of dominance. Estimation of the augmented degree of dominance can be biased by linked QTL if linkage equilibrium among them has not been reached. Genetic effects at linked QTL contributing to the quadratic form of additive effects have the same sign when loci are in coupling, while signs are different when loci are in repulsion. If two elite parents are crossed and coupling and repulsion linkages occur with equal probabilities, the effects of linked loci are likely to cancel in this quadratic form. However, estimates of the augmented degree of dominance are likely to be inflated because *D _{ij}* is positive by definition and, in hybrid breeding, loci with high positive augmented dominance effects (*d _{i}**) are favored in reciprocal recurrent selection. Thus, the contribution of genetic effects at linked loci to the quadratic form of augmented dominance effects is generally positive irrespective of their linkage phase and will be strongly affected by the presence of epistasis and the magnitude of the linkage disequilibrium between QTL. The numerical example in supplemental Table S2 (http://www.genetics.org/supplemental/) clearly shows the effect of *aa* epistasis on estimates of the augmented degree of dominance for unlinked loci. Altogether, the ratio of variance components obtained with design III is not always a useful estimate of the average degree of dominance, thus questioning the interpretation of many experimental results on the relative importance of dominance, overdominance, and epistasis in the expression of quantitative traits.
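
The sign argument for linked loci can be illustrated with a small numerical sketch. It assumes, purely for illustration, a quadratic form of the shape Q(e) = Σ_i Σ_j D_ij e_i e_j with D_ii = 1 and D_ij ≥ 0; Equations 16 and 17 are not reproduced in this text and may differ in scaling. The function name `quadratic_form`, the effect values, and the value D = 0.6 are hypothetical.

```python
def quadratic_form(effects, D):
    """Sum over all locus pairs (including i == j) of D[i][j] * e_i * e_j."""
    n = len(effects)
    return sum(D[i][j] * effects[i] * effects[j]
               for i in range(n) for j in range(n))


# Hypothetical disequilibrium matrix: loci 0-1 and loci 2-3 are linked,
# all other pairs are in linkage equilibrium.
D = [
    [1.0, 0.6, 0.0, 0.0],
    [0.6, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.6],
    [0.0, 0.0, 0.6, 1.0],
]

# Additive-type effects: one linked pair in coupling (same sign),
# one linked pair in repulsion (opposite signs).
a_star = [1.0, 1.0, 1.0, -1.0]
# Dominance-type effects: all positive, as assumed for selected heterotic QTL.
d_star = [1.0, 1.0, 1.0, 1.0]

q_a = quadratic_form(a_star, D)  # cross terms of the two pairs cancel
q_d = quadratic_form(d_star, D)  # cross terms of the two pairs add up

print("additive-type form:", q_a)
print("dominance-type form:", q_d)
```

Under these assumptions the additive-type form equals the plain sum of squared effects, whereas the dominance-type form exceeds it, so a ratio of the two would be pushed upward by linkage alone.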

## DISCUSSION

#### Genotypic components of trait performance and heterosis:

Elucidating the genetic basis of heterosis has been the aim of a number of studies making use of advances in molecular biology. Some studies compared specific molecular traits such as differential gene expression in the F_{1} hybrid and the parental inbred lines (*e.g*., Guo *et al*. 2006; Swanson-Wagner *et al*. 2006). A number of authors used testcross progenies of F_{3} lines or RILs with the parental inbred lines as testers for QTL studies on heterosis (Stuber *et al*. 1992; Xiao *et al*. 1995; Li *et al*. 2001; Luo *et al*. 2001). Estimation of QTL effects was separately performed for the backcrosses to each parental line, except for a recent study with triple testcross progenies in maize, which also used 2*Z*_{1} and *Z*_{2} for QTL mapping (Frascaroli *et al*. 2007). For the backcrosses to the better-performing parent, this corresponds to the marker contrast −*H*_{2}(*m*) defined in Equation 15. Thus, with interval mapping (Lander and Botstein 1989) their analyses yielded estimates for QTL contributing to BPH (*d _{i}** − *a _{i}**) for the progenies backcrossed to one parent and for poorer parent heterosis (*d _{i}** + *a _{i}**) for the progenies backcrossed to the other parent. Consequently, conclusions on gene action at individual QTL can be made only within this context.

Semel *et al*. (2006) used a set of tomato introgression lines to exclude the confounding effects of genetic background variation and epistasis from the analysis of heterosis. Parental genotypes and the F_{1} hybrid differed exclusively in one defined chromosome segment while the entire genetic background originated from the elite parent. For a multitude of traits, the authors compared the phenotypic means of the elite parent, the homozygous introgression line, and the hybrid between the two. On the basis of these comparisons, the type of gene action at QTL was determined. However, as pointed out earlier, trait performance and heterosis have different genotypic expectations. Employing the quantitative genetic theory derived in this article, it can be shown in terms of the F_{2} metric that genotypic expectations of QTL effects contributing to MPH and BPH comprise both main and epistatic effects, despite the fact that the entire genetic background originates from the elite parent (Melchinger *et al*. 2007).

The use of an immortalized F_{2} population was proposed by Hua *et al*. (2003) for identifying genomic regions that contribute to MPH. RILs were derived from a heterotic rice cross and intermated for construction of an immortalized F_{2} population. Heterosis was calculated as the phenotypic deviation of each immortal F_{2} from the mean of its two RIL parents. Digenic interactions were estimated on the basis of the interaction effect of two marker loci. Genotypic expectations of the QTL main effects and interactions were not given by the authors. On the basis of our findings, we postulate that the immortalized F_{2} design has great value for estimating the dominance effect *d _{i}* and certain types of digenic epistatic interactions but does not identify QTL with genotypic expectations that equal precisely *d _{i}**, their contribution to MPH (derivations not shown).

We conclude that none of the available experimental designs of quantitative genetics has the potential to separate the dominance effect *d _{i}* and half the sum of *aa* epistatic interactions confounded in the augmented dominance effect *d _{i}** of heterotic QTL. Design III identifies heterotic QTL but does not allow separation of dominance and *aa* epistasis. For the time being, the only solution to this problem is to identify QTL contributing to heterosis with design III and to estimate the augmented QTL effect *d _{i}**. In genomic regions exhibiting significant heterosis, the augmented dominance effect *d _{i}** can be dissected into its components (*d _{i}* and half the sum of *aa* interactions with the genetic background) by additionally employing the immortal F_{2} design and estimating the magnitude of the dominance effect *d _{i}*. With RILs, the same lines can be used for generating the progenies for design III and the immortal F_{2} population, and marker data generated for the RILs can readily be employed in both types of analysis. The optimal dimensions of experimental studies applying our approach with respect to population size and population type as well as marker density have yet to be determined. Epistatic interactions of QTL *i* with the genetic background are expected to vary considerably across RILs. Implications of this biological variation on estimates of *d _{i}** will be addressed in a separate study.

#### Epistasis and heterosis:

As has been shown in this study, even small epistatic interactions can be important for the expression of heterosis because their contribution to the genotypic expectation of augmented QTL effects sums up over many effects. On the basis of results from classical quantitative genetic experiments, we have reason to assume that epistasis plays an important role in the inheritance of quantitative traits and heterosis (*e.g*., Jinks and Jones 1958) but results from QTL studies on the importance of epistasis have been rather ambiguous (*e.g*., Stuber *et al*. 1992; Cockerham and Zeng 1996; Frascaroli *et al*. 2007). In QTL analyses, the power of detecting QTL with epistatic effects is generally low mainly due to the problem of multiple testing in two- or multidimensional genome scans (Lander and Botstein 1989) or due to the necessity of *a priori* model selection with one-dimensional scans (Kao *et al*. 1999). We need to keep in mind that QTL with significant epistatic interaction effects might not be representative for the majority of QTL with small effects contributing to gene networks that control the expression of quantitative traits. Hence, we are likely to introduce an ascertainment bias as pointed out by Kroymann and Mitchell-Olds (2005).
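
To give a feeling for the multiple-testing burden mentioned above, the following sketch counts the tests in a one-dimensional and a two-dimensional scan and applies a plain Bonferroni correction. Bonferroni is used here only as a simple, conservative stand-in that ignores correlation between neighboring positions; it is not the procedure of the cited studies. The number of 200 test positions and both function names are hypothetical.

```python
from math import comb


def scan_sizes(n_positions):
    """Number of tests in a one-dimensional scan and in a scan of all pairs."""
    return {
        "one_dimensional": n_positions,
        "two_dimensional": comb(n_positions, 2),
    }


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise level at alpha."""
    return alpha / n_tests


sizes = scan_sizes(200)  # hypothetical number of genome positions
for name, n_tests in sizes.items():
    print(name, n_tests, bonferroni_threshold(0.05, n_tests))
```

With 200 positions the pairwise scan involves 19,900 tests, so the per-test level drops by about two orders of magnitude relative to the one-dimensional scan, which illustrates why small epistatic effects are hard to detect.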

#### Type of epistasis:

When estimating the most prominent type of epistatic interactions in the expression of quantitative traits and heterosis in rice, a preponderance of *aa* epistatic effects was identified compared with *ad* or *dd* interactions (Yu *et al*. 1997; Hua *et al*. 2003). In self-pollinated crops, it is well known that coadapted gene complexes are favored by selection. As a consequence, half the sum of mainly positive *aa* interactions enters *d _{i}** with a negative sign, thus decreasing MPH and the power for detection of heterotic QTL. It is questionable if the results on heterosis from self-pollinated crops are directly applicable to cross-pollinated crops (Frascaroli *et al*. 2007). Economic seed production, however, requires the development of inbred lines with high grain yield. Combined with the management of separate heterotic pools, it is highly probable that coadapted gene complexes are selected during inbred line development. As a consequence, if the sum of *aa* epistatic interactions in the parents increases due to selection, MPH should decrease over time unless the sum of dominance effects at QTL influencing heterosis increases proportionally. A decrease in relative superiority of hybrids compared with their inbred lines has been described for maize (Duvick 1999). This decrease in relative heterosis can be the result of the accumulation of favorable dominant alleles at individual QTL, but it can also be explained by overdominance in the presence of *aa* epistatic effects contributed by the parents. If this is the case, the outcome of marker-assisted selection programs aiming at the transfer of QTL for maximization of heterosis will strongly depend on the presence of favorable epistatic interactions with the genetic background in the respective germplasm and will be difficult to predict.
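
The effect of positive *aa* interactions on MPH can be checked numerically. The sketch below assumes a digenic model in the F_{2} metric with additive codes −1, 0, 1 and dominance codes −1/2 (homozygote) and +1/2 (heterozygote); this coding, all function names, and all parameter values are illustrative assumptions. It computes the augmented dominance effects as the dominance effect minus half the sum of *aa* interactions with all other loci and compares their sum with MPH obtained directly from the genotypic values of P1, P2, and the F_{1}.

```python
from itertools import combinations


def genotypic_value(x, z, mu, a, d, aa, ad, dd):
    """Genotypic value under a digenic model in the assumed F2-metric coding.

    x[i] is the additive code (-1, 0, 1) and z[i] the dominance code
    (-0.5 for homozygotes, +0.5 for heterozygotes) at locus i.
    """
    n = len(a)
    g = mu + sum(a[i] * x[i] + d[i] * z[i] for i in range(n))
    for i, j in combinations(range(n), 2):
        g += aa[i][j] * x[i] * x[j] + dd[i][j] * z[i] * z[j]
        # ad[i][j]: additive at locus i by dominance at locus j.
        g += ad[i][j] * x[i] * z[j] + ad[j][i] * x[j] * z[i]
    return g


def augmented_dominance(d, aa):
    """d_i minus half the sum of aa interactions with all other loci."""
    n = len(d)
    return [d[i] - 0.5 * sum(aa[i][j] for j in range(n) if j != i)
            for i in range(n)]


def midparent_heterosis(mu, a, d, aa, ad, dd):
    """F1 value minus the mean of the two homozygous parents."""
    n = len(a)
    p1 = genotypic_value([-1] * n, [-0.5] * n, mu, a, d, aa, ad, dd)
    p2 = genotypic_value([1] * n, [-0.5] * n, mu, a, d, aa, ad, dd)
    f1 = genotypic_value([0] * n, [0.5] * n, mu, a, d, aa, ad, dd)
    return f1 - 0.5 * (p1 + p2)


# Hypothetical three-locus example.
mu = 10.0
a = [2.0, -1.0, 0.5]
d = [1.0, 0.5, 0.8]
aa = [[0.0, 0.4, 0.2], [0.4, 0.0, 0.6], [0.2, 0.6, 0.0]]  # symmetric
ad = [[0.0, 0.3, -0.1], [0.2, 0.0, 0.1], [0.0, -0.2, 0.0]]
dd = [[0.0, 0.1, 0.3], [0.1, 0.0, -0.2], [0.3, -0.2, 0.0]]

d_star = augmented_dominance(d, aa)
mph = midparent_heterosis(mu, a, d, aa, ad, dd)

print("augmented dominance effects:", d_star)
print("sum of augmented effects:", sum(d_star))
print("MPH from genotypic values:", mph)
```

In this example all *aa* interactions are positive, so every augmented effect is smaller than the corresponding dominance effect and MPH falls below the plain sum of dominance effects, while the *ad* and *dd* terms cancel out of MPH.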

In conclusion, the results presented here are important in two ways. First, they provide the quantitative genetic theory to express heterosis as the sum of individual QTL effects. Second, they allow the assessment of epistatic interactions of individual QTL with the entire genetic background, thus extending the concept of epistasis from single-gene to system-level interactions. We suggest the use of CIM and design III with RILs to identify QTL expressing maximum heterosis (*i.e*., maximum *d _{i}**). All analyses can be performed with an extended version of the software PLABQTL (Utz and Melchinger 1996; http://www.uni-hohenheim.de/plantbreeding/software/index.html). Permutation tests for determining the significance threshold (Doerge and Churchill 1996) and cross-validation for unbiased QTL estimation (Utz *et al*. 2000) can be readily applied. Using the joint likelihood-ratio test for augmented effects *a _{i}** and *d _{i}** will improve identification of heterotic QTL from elite × exotic crosses and provide a first test to distinguish between dominance and overdominance. Applying models accounting for multilocus epistasis and using molecular tools to finely dissect genomic regions contributing to heterosis will allow an assessment of the relative contribution of epistatic interactions in the manifestation of heterosis.

## APPENDIX: GENERAL DERIVATION OF EXPECTATIONS, VARIANCES, AND COVARIANCES OF PAIR MEANS (*Z*_{1}) AND PAIR DIFFERENCES (*Z*_{2}) AS WELL AS MARKER CONTRASTS *Z*_{1}(*m*) AND *Z*_{2}(*m*)

Let denote the coefficient of parameter _{AD} in the conditional genotypic expectation of the testcross progeny *H _{t}* (*t* = 1, 2) of a RIL with genotype *v _{i}* at the *i*th QTL. Then, we obtain from Equation 1

(A1)

with and

Let **E** denote the vector of genetic effects _{AD} for two QTL *i* and *j* under digenic epistasis, and let **G**_{t} denote the vector of conditional genotypic expectations of testcross progeny *H _{t}* (*t* = 1, 2) with design III given the parental RIL has genotype *v _{i}v _{j}* (with *v _{i}* = 0, 2; *v _{j}* = 0, 2). Then, **G**_{t} = **H**_{t}**E** with elements of the matrices **H**_{t} equal to coefficients calculated according to Equation A1.

By calculating with *p _{st}* equal to the *st*th element of for design III, we get where denotes the vector of conditional genotypic expectations of *Z _{s}*, given the genotype of the parental RIL.

To simplify formulas, to allow extension to multiple QTL, and to provide a generalized formula for RILs and DHLs, we use the parameter *D _{ij}* to quantify linkage disequilibrium between loci *i* and *j*. *D _{ij}* can be calculated from the recombination frequency *r _{ij}* between loci *i* and *j* by the formulas and with *g* being the number of random-mating generations prior to selfing for the development of RILs or production of DHLs (Frisch and Melchinger 2006).
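
As a rough numerical illustration of why RILs and DHLs agree at *r* = 0 and *r* = 0.5 but differ in between, the sketch below compares the expected proportion of recombinant lines in the two population types for the simplest case *g* = 0. It uses the classical Haldane and Waddington result R = 2r/(1 + 2r) for RILs developed by selfing and assumes R = r for DHLs produced from F_{1} plants. The quantity 1 − 2R is only a stand-in for the association between two loci; it is not the formula for *D _{ij}*, which is not reproduced in this text. All function names are hypothetical.

```python
def recombinant_fraction_dh(r):
    """Doubled haploids from the F1: a single meiosis, so the fraction is r."""
    return r


def recombinant_fraction_ril_selfing(r):
    """Haldane-Waddington result for RILs developed by selfing."""
    return 2.0 * r / (1.0 + 2.0 * r)


def association(recombinant_fraction):
    """Correlation between -1/+1 genotype codes at two loci in inbred lines."""
    return 1.0 - 2.0 * recombinant_fraction


for r in (0.0, 0.05, 0.2, 0.5):
    big_r_dh = recombinant_fraction_dh(r)
    big_r_ril = recombinant_fraction_ril_selfing(r)
    print(r, association(big_r_dh), association(big_r_ril))
```

The two population types coincide at the extremes, and for intermediate linkage the RILs show the weaker association because recombination accumulates over several selfing generations.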

Under the assumption of Mendelian segregation we defineand

Expectations, variances, and covariances of *Z _{s}* are given byand

Expectations of marker contrasts *Z*_{1}(*m*) and *Z*_{2}(*m*) for the case of two QTL *i* and *j* linked to marker locus *m* with marker classes *u* (*u* = 0, 2) are given as follows. We define the parental genotypes as 0_{m}0_{i}0_{j} (P1) and 2_{m}2_{i}2_{j} (P2). Recombination frequencies between the three loci are denoted *r _{mi}*, *r _{mj}*, and *r _{ij}*, respectively. The frequencies of the four possible QTL genotypes (*ij* = 22, 20, 02, 00) conditional on the marker genotype *u* at the marker locus *m* are given for RILs and DHLs in Table A1. Thus, we obtain the vector and the conditional expectations of linear functions *Z _{s}* can be calculated as *Z _{s|m}*

From this, we obtain the orthogonal marker contrasts *Z _{s}*(*m*) = *Z _{s|m}*(*u* = 2) − *Z _{s|m}*(*u* = 0) = ((*u* = 2) − (*u* = 0))

## Acknowledgments

We thank two anonymous reviewers for their valuable contributions. This project was supported by the Deutsche Forschungsgemeinschaft (German Research Foundation) under the priority research program “Heterosis in Plants” (research grants ME931/4-1 and ME931/4-2, PI 377/7-1 and PI 377/7-2).

## Footnotes

Communicating editor: J. B. Walsh

- Received June 13, 2007.
- Accepted August 28, 2007.

- Copyright © 2007 by the Genetics Society of America