Abstract
Genomic prediction methods based on multiple markers have potential to include nonadditive effects in prediction and analysis of complex traits. However, most developments assume a Hardy–Weinberg equilibrium (HWE). Statistical approaches for genomic selection that account for dominance and epistasis in a general context, without assuming HWE (e.g., crosses or homozygous lines), are therefore needed. Our method expands the natural and orthogonal interactions (NOIA) approach, which builds incidence matrices based on genotypic (not allelic) frequencies, to include genome-wide epistasis for an arbitrary number of interacting loci in a genomic evaluation context. This results in an orthogonal partition of the variances, which is not warranted otherwise. We also present the partition of variance as a function of genotypic values and frequencies following Cockerham’s orthogonal contrast approach. Then we prove for the first time that, even not in HWE, the multiple-loci NOIA method is equivalent to construct epistatic genomic relationship matrices for higher-order interactions using Hadamard products of additive and dominant genomic orthogonal relationships. A standardization based on the trace of the relationship matrices is, however, needed. We illustrate these results with two simulated F1 (not in HWE) populations, either in linkage equilibrium (LE), or in linkage disequilibrium (LD) and divergent selection, and pure biological dominant pairwise epistasis. In the LE case, correct and orthogonal estimates of variances were obtained using NOIA genomic relationships but not if relationships were constructed assuming HWE. For the LD simulation, differences were smaller, due to the smaller deviation of the F1 from HWE. Wrongly assuming HWE to build genomic relationships and estimate variance components yields biased estimates, inflates the total genetic variance, and the estimates are not empirically orthogonal. The NOIA method to build genomic relationships, coupled with the use of Hadamard products for epistatic terms, allows the obtaining of correct estimates in populations either in HWE or not in HWE, and extends to any order of epistatic interactions.
- GenPred
- shared data resource
- genomic selection
- genetic variance components
- dominance
- epistasis
- genomic models
- NOIA approach
DOMINANCE and epistasis may play an important role in the genetic determinism of complex traits of interest, such as human health or economic traits in livestock and crops. The existence of interactions within and across loci is supported by classic quantitative genetic studies, QTL mapping, and the wide application of crossbreeding as a breeding strategy. Nowadays, genomics provides tools to understand the effects of the genes and their interactions and to offer new directions for genetic improvement (Mäki-Tanila and Hill 2014). In quantitative genetics, the partition of the variance in statistical components due to additivity, dominance, and epistasis does not reflect the biological (or functional) effect of the genes but it is most useful for prediction, selection, and evolution (Huang and Mackay 2016).
In livestock populations, one of the main reasons why dominance or higher-order interaction terms have not been considered in genetic evaluations is that pedigree relationships are not informative enough. However, genomic selection methods are beginning to demonstrate their potential to include nonadditive effects in evaluation models. Inclusion of dominant or/and epistatic effects in genomic evaluation has been proposed by several authors (Toro and Varona 2010; Su et al. 2012; Vitezica et al. 2013; Nishio and Satoh 2014; Jiang and Reif 2015). Most epistatic models only consider additive-by-additive epistatic interactions (e.g., Su et al. 2012; Jiang and Reif 2015), although dominant-by-dominant and dominant-by-additive interactions may play a major role in heterosis and also in inbreeding and outbreeding depression (Lynch and Walsh 1998, p.223). Moreover, to date these models assume Hardy–Weinberg equilibrium (HWE) (e.g., Vitezica et al. 2013). New statistical approaches to genomic selection that account for dominance and epistasis in a general context (i.e., in populations not in HWE, like crosses or inbred populations) are needed both in animal and plant breeding and for QTL association studies.
Genomic evaluation models can fit marker or haplotypic additive genetic effects either explicitly, estimating the effect of each marker (Meuwissen et al. 2001), or implicitly through the so-called “genomic” relationship matrix (VanRaden 2008; Goddard 2009; Yang et al. 2010), which uses an equivalent model from which the marker effects can be inferred by backsolving. Dominance and higher-order interaction terms can also be modeled using the “genomic” relationship approach. Several approaches exist (Su et al. 2012; Muñoz et al. 2014; Jiang and Reif 2015) but none has addressed the issue of orthogonality of the model, additive and dominant components, and all possible interactions.
For plant and animal breeders and evolutionary geneticists, a meaningful partition of the variance is such that estimates can be interpreted in the classical terms, as variances of breeding values, dominant deviations, epistatic deviations, and so on (Hill et al. 2008). A nonorthogonal partition may lead to the erroneous conclusion that assortative mating and inclusion of dominance and/or epistasis can yield higher genetic gains as opposed to the consideration of additivity and random mating. For instance, Muñoz et al. (2014) concluded that dominance accounted for 39% of the total genetic variance when they used a nonorthogonal partition, vs. 24% when they used the orthogonal partition in Vitezica et al. (2013).
In this study, we develop a general procedure to estimate genomic relationship matrices for interaction terms of any order, expanding the natural and orthogonal interactions (NOIA) approach (Álvarez-Castro and Carlborg 2007) to the scope of the covariances between individuals. We present how to compute epistatic relationships from genotypes. Our results generalize the results of Cockerham (1954) to genomic models (something that had not been proven so far) and to any population, either in HWE or not. In particular, we show that the use of Hadamard products to obtain high-order relationships is correct, something that is frequently used (Su et al. 2012; Muñoz et al. 2014; Jiang and Reif 2015), but so far unproven for the general case. Two simulated examples, assuming linkage equilibrium (LE) or linkage disequilibrium (LD) and selection, are used to illustrate the approach.
Theory
First, we review the state-of-the-art approaches for orthogonal partitions of genotypic values, including the NOIA model which generalizes these results to any population. Then we extend the NOIA model to genomic prediction and show how this new model translates, for main effects, to the use of genomic-additive and dominant-relationship matrices (built according to the NOIA model) and, for interaction effects, into Hadamard products.
Orthogonal models of gene effects
The epistatic model proposed by Cockerham (1954), following Fisher (1918), partitioned the genetic variance caused by two genes in LE into orthogonal variance components. For two loci A and B (with alleles A,a and B,b) and allele frequencies equal to Cockerham (1954) defined eight contrasts
which are orthogonal with respect to genotypic frequencies
is the joint frequency of genotypes j and k at each locus). The scales
(Table 1, see also Kao and Zeng 2002) satisfy two requirements:
and
where
correspond to the genotype
and
at the locus
and
are the contrast scales for the nine genotypes
These conditions allow the orthogonal estimation of genetic effects which are independent to each other in a statistical sense.

Note that and
and
are the additive and dominant contrasts for the locus A (for the locus B), and the epistatic components correspond to the relationships among these contrasts. Thus,
and
define the additive-by-additive, additive-by-dominant, dominant-by-additive, and dominant-by-dominant epistatic contrasts. Using the orthogonal contrast scales and in matrix notation, the genotypic values for an individual assuming two loci j and k is:
where the vector
was called E by Álvarez-Castro and Carlborg 2007) contains the mean and statistical additive
dominant (d), and epistatic
effects.
Thus, in a general context, the genotypic values are split in additive (or breeding) values, dominance deviations (intralocus interactions), and epistatic deviations (interloci interactions) (Falconer and Mackay 1996). In the classical theory, the genetic effects are defined specifically in reference to a population. The reference population is an “ideal” random mating population where HWE and LE are assumed (Cockerham 1954; Kempthorne 1954). Under these idealized conditions, epistatic variance can be partitioned into orthogonal additive-by-additive, additive-by-dominant, etc., variance components.
The orthogonal property is very important and useful. The partition of the genetic effect is directly related to the partition of the genetic variance. The genetic variance can be partitioned into eight independent components and there is no genetic covariance among them. The additive effects contribute to the additive variance, the dominance effects contribute to the dominance variance, and so on. This is a desirable property in modeling because the estimation of one genetic (i.e., additive) effect will not be affected by the presence or absence of other genetic effects in the model (i.e., epistasis). Note that the orthogonal property in Cockerham’s model applies only when allelic frequencies are exactly one-half (Zeng et al. 2005).
Later on, a model called general 2 allele model (G2A) (Zeng et al. 2005) was proposed for genetic effects regardless of the frequencies of the alleles at each locus. This model is a multi-loci, two-allele model which is orthogonal in populations under strict HWE and LE. The G2A model represents the Cockerham’s model in a multiple regression context aswhere
and
are defined as
and
Note that
and
contain the coefficients for the additive and dominant contrasts. This model is identical to the “breeding” or classical parameterization presented in Vitezica et al. (2013) for genomic evaluation, where genotypic values are split in additive (or breeding) values
and dominant deviations
respectively called
and
in Vitezica et al. (2013). Thus,
with
and
In this model, note that both
and
are shifted to have a mean of zero for a population in HWE
Thus, the mean breeding value is zero in HWE because
and the mean of dominant deviations is also zero because
These equations correspond to the first requirement of orthogonality in the Cockerham (1954) model. Since the means of
and
contrasts are scaled to zero for the population, the effects in the model are all orthogonal in HWE and LE. Thus, the substitution effect (Falconer and Mackay 1996) contributes to the additive variance, the dominance deviation contributes to the dominance variance, etc. There is no covariance between the genetic effects due to the orthogonal property of the model. For the allele frequency p = 0.5, the Cockerham model is a particular case of the G2A and Vitezica models, thus
Here, the definition of the additive and dominant contrast is the same whether or not other (independently segregating) loci or epistatic effects are fitted in the regression model.
Finally, another orthogonal model called NOIA was proposed by Álvarez-Castro and Carlborg (2007), and whose most relevant feature is that it can deal with departures from HWE while keeping statistical orthogonality. For the NOIA model and assuming the notation by Falconer and Mackay (1996), and
in
are defined as
(1)and
(2)where
and
are the genotypic frequencies for the genotypes
and
for the locus A. It can be shown that these contrasts and their Kronecker products, which model epistasis, fit Cockerham’s requirements of orthogonality when assuming LE but even in absence of HWE (Álvarez-Castro and Carlborg 2007). When the denominator is zero (monomorphic markers), the marker is ignored. Note that under the assumption of HWE, G2A and the model in Vitezica et al. (2013) are a particular case of the NOIA model when
and the denominator
Expanding the NOIA model, genomic relationship matrices for interaction terms can be obtained for any population, in HWE or not, as is detailed in the next section. The straight relationship between breeding values and α as “regression of value on gene dosage” only holds in HWE (Falconer 1985). Note that α here has a least-squares meaning (regression of value on gene dosage), but when HWE does not hold α is not a substitution effect in the sense of “breeding value” (Falconer 1985). Indeed, it is not clear what a breeding value is in absence of HWE.
Equivalent genomic model with epistasis
Using the NOIA model, the additive values of a set of individuals are, for multiple loci, with
containing one column per locus coded as in Equation 1 for an individual. Also, the dominant value of an individual can be parameterized as in Equation 2 and
for a set of individuals.
A linear model including additive, dominant, and higher-order interaction terms can be written as:where
is the vector of phenotypic records,
is the population mean,
is the total genotypic value,
is the additive value of the individual (breeding value if the population is in HWE),
is the dominant value,
is the second-order epistatic value,
is the third order epistatic value, and so on.
The additive genomic (co)variance relationship matrix will be computed from Equation 1 as:where
is a matrix with
rows (number of individuals) and
columns (number of markers) containing “additive” coefficients (Equation 1), as
where
is a row vector for the ith individual with m columns. For individual 1, the vector
is equal to
and the element
for the
marker is equal to
On the other hand,
corresponds to the expected variance of
with
(Searle 1982). In other words,
standardizes the cross-product matrix
to a variance of 1. In a Hardy–Weinberg population,
is equal to the heterozygosity of the markers
(Vitezica et al. 2013). Overall,
in HWE is similar to the VanRaden (2008) genomic-relationship matrix.
For the dominance, the genomic (co)variance matrix will be:where the matrix
is
and the elements of the vector
for the ith individual are equal to
The
standardizes the cross-product matrix
to a variance of 1. Again, for Hardy–Weinberg populations,
as expected (Vitezica et al. 2013).
Álvarez-Castro and Carlborg (2007) proved that the coefficients of the incidence matrix for second-order epistatic effects between two loci can be computed as the Kronecker products of the respective incidence matrices for single locus effects, that isand subsequently (e.g.,
for third- and higher-order epistatic effects. This results in orthogonality of epistatic effects. For instance, consider individual
with two loci:
In fact, the Kronecker product above reorders the effects as follows:
Following this idea, for the interactions, such as additive-by-dominant interactions, the matrix
can be written using Kronecker products of each row of the preceding matrices as
For instance, for individual 1, the incidence matrix of epistatic effects is:
For instance, the second element of
contains the additive-by-dominant interaction for the respective loci 1 and 2. For individual 2, this is
and so on. The matrix
has as many columns as marker interactions, and as many rows as individuals. This matrix is of a very large size (e.g., for a 50K SNP chip and 1000 individuals the matrix contains
elements).
Computation of this cross-product matrix across individuals can take a considerable time because the matrix
is very large, e.g.,
operations. However, there is an algebraic shortcut that allows easy computation of
and the rest of the epistatic matrices, even for third and higher orders. Using Searle (1982), p. 265:
And this product is
Thus, the
element of
has the form
In Searle (1982), p.265, we have that
Applying this,
has the form
However, each of the products
is scalar because
are row vectors. Thus, we have proven that, for instance, the cross-product matrix of the additive-by-dominant interaction can be put as the direct product of the unscaled cross-product matrices of additive and dominance:
However, to standardize and get meaningful relationships we need to divide by the trace:
But
and thus
unless all elements in the diagonals of the
matrices equal 1.
Assume that we know
and the traces
and
Then
and
Thus
which results in
which is very simple to do because the element
of
is the product of the elements
of the respective matrices
and
scaled by
Thus, this is the Hadamard product of both matrices divided by the average of its diagonal. Therefore, the covariance matrix can be calculated as:
Accordingly,
and, e.g.,
The normalization factor based on the traces was already used by Xu (2013) but several authors ignore it (Muñoz et al. 2014). Here we presented the reasoning for pairwise interactions but it extends to third and higher order interactions.
We have thus shown how to proceed to an orthogonal decomposition of the variances in any population in HWE or not, assuming LE. We have also shown that orthogonal high-order (epistatic) genomic-relationship matrices involved the Hadamard products of additive and dominant genomic-relationship matrices.
Genetic variances in the presence of two-locus epistasis and LE
In the simulation part of this article we simulate and estimate two-loci epistasis. Here we present the decomposition of the expected variance in such a situation, to recover the expected parameters from the simulation. We will use properties of Cockerham’s (1954) idea of orthogonal weighted partitioning in genetic systems, applied to populations not necessarily in HWE, although assuming LE. The expression relates genotypic values with statistical effects in
and
is an incidence matrix including an overall mean and all contrasts. Orthogonality of
is warranted in LE if it is constructed by the NOIA system.
For instance, in a two-loci case,
has nine values, ordered as AABB, AaBB, aaBB…aabb.
where
is a 3×3 matrix for locus k with alleles A/a. In
the first column contains 1, and the second and third columns are equal to Equations 1 and 2.
The orthogonal system has the two properties described before: and
for
These properties allow a straightforward estimation of statistical effects and variance components. The statistical effects can be obtained from weighted linear regression:
where
is a diagonal matrix (for two loci, a 9×9 matrix) with genotypic frequencies (weights) in the diagonal. This approach extends to any order of interaction. The total variance can be obtained from
All the variance components can be obtained from
where
is a diagonal matrix that contains the effects in
in its diagonal. Matrix
contains, in its diagonal, the variance components associated to each contrast; in the case of two loci, these are nine. Because the system is orthogonal, the out-of-diagonal terms in
contain 0. Because the Kronecker product reorders effects in the linear system, for the two-loci case we have:
For the F1 case that will be simulated, genotypic frequencies are as follows. Consider two loci k, A/a, and j, B/b. The allelic frequencies in the parental populations 1 and 2 are
and
and
and
The F1 population has genotypic frequencies for locus A
for the three genotypes (AA,Aa,aa) and similarly for locus B. In the F1 population, each locus has the following genotypic frequencies:
The frequencies of the nine genotypes at the two loci are, assuming LE, the Kronecker product of each locus’ frequencies:
For instance, let Only the double heterozygote has a functional value different from 0. Let
and
This gives
corresponding to the nine sums of squares attributed to the overall mean
and
From this, we get
and
For n pairs of epistatic loci, the procedure is repeated n times and the n variances are added. The procedure is generalizable to any number of interactions. An R code for this example is provided in Supplemental Material, File S1.
Example
Simulated data
We simulated an F1 population deviating strongly from HWE and two scenarios: (1) QTL in LE and directly observed, and (2) QTL in LD and estimation via markers.
In the LE scenario, we simulated a quantitative trait with pure biological (or functional) dominant-by-dominant effects (see Table 2) of pairs of QTL loci exclusive pairs from n loci), such that the double heterozygote has a value
sampled from a Bernouilli distribution (−1,1) and all other genotypes have value 0. It is important to recall that, when allelic frequencies differ from 0.5 and across loci, this dominant-by-dominant epistatic interaction generates statistical additive, dominance, and the three kinds of epistatic variances. Thus, we consider this as an extreme case for illustration purposes. Expected true variance components were obtained as explained in the previous section from the simulated effects and allelic frequencies.
Total genetic values were calculated for 1000 individuals by adding the genotypic values of 200 pairs of unlinked interacting loci from 400 biallelic QTL. Each individual has two alleles that each come from a different population, say line 1 and line 2, with different allelic frequencies both drawn from a U-shaped Beta (0.5,0.5) distribution. Allelic frequencies were simulated as negatively correlated (−0.5) across the populations to mimic divergent selection. The proportion of loci not in HWE (P-value < 0.001) turned out to be 0.94. The heritability was equal to 1 and the total genetic (and phenotypic) variance was equal to 39. No residual/environmental variance was simulated.
In the LD scenario, from a historical common population in drift- mutation equilibrium, two divergent lines were divergently selected based on phenotype for 10 generations, followed by an F1 cross. The simulation included 30 chromosomes of 1 M with, potentially, 5000 markers and 100 QTL each. We used QMSim (Sargolzaei and Schenkel 2009) and our own software. A quantitative trait with heritability equal to 0.5 was simulated in the creation and selection of lines 1 and 2, using the same biological dominant-by-dominant mechanism as above. Lines included 20 males and 20 females per generation, with a number of offspring of 10 per couple. Thus, selection was quite strong. To generate the F1, 80 males and 80 females were drawn from the last generations of line 1 and 2 to generate 10 offspring per couple. Therefore, the data set consisted of 800 crossed individuals. In the F1, the heritability was equal to one and no residual variance was simulated. The mean phenotypic values were equal to 6.69, −1.87, and 3.20 for lines 1 and 2 in generation 10 and in F1, respectively, for a total genetic variance of 29 in the F1. The pairs of epistatic loci and the functional effects of their interactions were generated at random at the beginning of the simulation and did not change with time.
The number of segregating loci in the F1 was 10,225 SNPs and 1432 QTL. The overall deviation from HWE was measured as the proportion of loci not in HWE (P-value < 0.001) on 200 individuals from line 1, line 2, and F1 (Table 3). LD was monitored from QMSim output in generation 10 for line 1 and 2, and in the F1 animals (Table 3). Levels and decay of LD (r2) with genetic map distance are in agreement with the values simulated by Schopp et al. (2017) for a scenario of “ancestral long-range LD” in plant breeding. Expected pseudotrue variance components were obtained (incorrectly assuming LE) from the simulated effects and allelic frequencies in the F1 as explained in the previous section.
Variance components were estimated for the two scenarios (LE and LD) with two different models, i.e., assuming HWE (model HWE) or not (model NOIA). The general model was:where the covariance matrices
and
(involved in the construction of
and
were constructed either assuming HWE as in Vitezica et al. (2013) or using the NOIA approach as in Equations 1 and 2. The genotypes at the QTL were directly used to construct the matrices.
Genetic
and
variances were estimated by Bayesian methods using Gibbs sampling using the software gibbs2f90 (Misztal et al. 2002), available at http://nce.ads.uga.edu/wiki/. After estimation, variances estimated assuming HWE equilibrium were transformed to a proper scale, multiplying the estimates by
(Legarra 2016), where
is a relationship matrix (e.g.,
or
This is needed because, in absence of HWE, these relationship matrices do not have a mean of 1 in the diagonal and 0 overall; see Legarra (2016) for a detailed explanation. This was not needed for the NOIA approach as in all cases
Finally, to ascertain if variance component estimates were empirically orthogonal, we computed their correlation in the posterior distribution.
Data availability
Programs and data are available at http://genoweb.toulouse.inra.fr/∼zvitezic/simuepi_genetics.
Results
Table 4 shows the statistics of the different relationship matrices for the F1 population in LE. The diagonal elements do not have an average diagonal of 1, as the divisors
and so on in Vitezica et al. (2013) are conceived for populations in HWE. In addition, in absence of HWE, the mean of the matrix is not zero, in particular for the dominant matrix. Both phenomena affect the interpretation of variance components (Legarra 2016). However, NOIA is free of this problem [although the mean of
and
is actually not zero, it is very small (<0.005)]. Similar results were observed in the scenario with LD. Whereas divergence in the LE scenario was very strong (correlation of allelic frequencies across lines 1 and 2 equal to −0.5), here divergence was much lower, even after 10 generations of divergent selection (correlation of allelic frequencies across populations of 0.70, both for markers and QTL). The number of loci not in HWE in the F1 was 11%, which is not too high. In the LE simulation, the number of loci not in HWE was 94%.
In the scenario with LE, estimates of variance components for the two different models (HWE and NOIA) are shown in Table 5. Most of the variance goes to nonepistatic effects as expected (Falconer and Mackay 1996; Hill et al. 2008; Huang and Mackay 2016). However, the HWE and NOIA models radically differ in their estimates. The NOIA model retrieves the simulated variance components (last column called “true” in Table 5). The HWE model (which incorrectly assumes HWE) gives a completely different partitioning of the variance, yielding an estimate of total genetic variance of 58 whereas the simulated variance is 39. After correction as in Legarra (2016), the sum of genetic variances in the HWE model yields a correct estimate of total genetic variance. However, the partitioning in the different components is still incorrect and quite different from NOIA.
Estimates of variance components fitting, exclusively, additive relationships were computed for the LE scenario. The NOIA model is orthogonal; the estimate (a value of 15) is the same as the estimate with the complete model with all direct and epistatic components (Table 5). When only additive effects are fit, NOIA assigns dominant and epistatic components to the residual. The HWE model is not orthogonal; the estimate fitting additive relationships only (a value of 25) is different from the one obtained fitting all components (a value of 12 in Table 5). When only additive effects are estimated and the correction is used in the HWE model, results are similar to NOIA values.
In the LD scenario, the variance component estimates are similar in both models, assuming HWE or NOIA (Table 6), although the NOIA model obtains a correct estimate of total genetic variance even with LD, while the HWE model overestimates it by 20%. In Table 6, the simulated variance components are pseudotrue variance components because they do not account for LD. Note that LD among loci has an effect on estimations that cannot be ignored. Differences between the HWE and NOIA models are less clear than in the LE scenario, nonetheless because divergence of lines is much lower, as well as the deviations from HW in the F1. The sum of nonadditive variances is higher than the additive variance, and, in particular, the additive-by-additive variance is quite high. This could be due to the selection of linked epistatic pairs of loci, something that the expected variances under LE cannot consider. Empirically, neither the HWE nor NOIA models were orthogonal under LD. Both models estimated different values when only additive relationships were fit than when the model fit all variance components (estimates of 17 vs. 9, respectively).
Lastly, in Figure 1 we show the a-posteriori correlations across estimates for the LE scenario. The HWE analysis shows a correlation of −0.24 between additive and dominant variance components, which shows difficulty in disentangling both components using this model. This correlation is reduced to −0.07 in NOIA. High-order terms are more difficult to estimate than linear ones. The a-posteriori correlations between epistatic variance components were not always the lowest for NOIA. Nevertheless, the mean absolute correlation among all (additive, dominant, and epistatic) variance components is lower for NOIA (0.11) than for HWE (0.22).
A-posteriori correlations among variance component estimates (var_e is the residual variance) in LE scenario. Left: assuming HWE. Right: using the orthogonal NOIA model. Size of the ● is proportional to absolute magnitude of the correlation.
Discussion
In this work, we propose using the NOIA model, which was conceived for QTL studies, to account for epistasis and partition the variance in a genomic evaluation context. The main advantage of the NOIA model is that the assumption of HWE is relaxed, so that the model applies equally well to populations in HWE or other populations not in HWE, such as F1 (e.g., corn, pigs), three-way crosses, or backcrosses, all of which are quite common in agriculture and livestock. In this genomic evaluation framework, the NOIA model can be used to define appropriate genomic-relationship matrices for the additive, dominant, and epistatic genetic effects for each individual. This was initially shown by Varona et al. (2014), a work that we generalize and complete here.
We have also shown that using Hadamard products of relationship matrices is equivalent to the direct estimation of loci-based epistatic effects for all possible levels of interactions, including dominance, something that had not been proven before. Jiang and Reif (2015) make an asymptotic proof for pairwise additive interactions, whereas here we provide a complete proof including dominance and for any order of interactions. The Hadamard product relies on the orthogonal property of the model because no covariance exists between main genetic effects (i.e., additive and epistatic effects). However, the derivation of the epistatic relationship matrices using the Hadamard product depends on the assumption of noninbreeding and random mating, or, in other words, HWE (Cockerham 1954). Henderson (1985) suggested using the Hadamard product of the (pedigree-based) additive and dominant relationship matrices to obtain the epistatic relationship matrices. Henderson’s approach was extended to the genomic framework by Xu (2013) for an F2 design. For instance, Xu et al. (2014) predicted hybrid performance in a rice F2 population. Recently, the Hadamard product of additive and dominant genomic-relationship matrices was used to construct epistatic relationship matrices in pigs (Su et al. 2012), pine trees (Muñoz et al. 2014), and wheat and maize (Jiang and Reif 2015), often without standardization of the resulting matrices or incorrectly assuming HWE. In absence of HWE, and to get meaningful estimates of variances that sum to the phenotypic variance, a further standardization (Legarra 2016) may be needed—this has not been considered by other authors.
In simulating an F1 population, we observed that even if there is only biological epistasis (dominant-by-dominant epistasis), the statistical additive and dominant effects capture an important part of the variation. This is as expected because epistasis contributes to additive and dominant genetic variances (Hill et al. 2008; Mäki-Tanila and Hill 2014). In LE, the difference between the HWE and NOIA model estimates can be large. Both yield different partitions of the genetic variance and, worse, HWE is neither correctly scaled nor orthogonal. Thus, inclusion of more genetic effects in analysis assuming HWE can dramatically change estimates. In some studies, the nonorthogonality can be seen because introducing new variance components dramatically changes previous estimates, something that does not occur for orthogonal models. Our approach based on the orthogonal NOIA model allows correct partitioning of the genetic variance, even in Hardy–Weinberg disequilibrium, through the correct use of Hadamard products to obtain epistatic relationships.
Although the NOIA model removes the assumption of HWE, another factor that generates nonindependence between loci and lack of orthogonality is LD. The LD affects the partition into additive, dominance, and epistatic components, such that an orthogonal partition is not possible (Hill and Mäki-Tanila 2015). In fact, LD may introduce genetic covariances between different genetic effects and complicates the definition of genetic effects and the partition of the genetic variance, in particular in presence of epistasis (Zeng et al. 2005). In addition, there are suspicions of favorable epistatic combinations of linked loci, leading to outbreeding depression and recombination loss (Dickerson 1969; Lynch 1991). Still, in our simulation in LD, the NOIA model returned the correct total genetic variance, while the model assuming HWE overestimated the total genetic variance by 20%. In F1 and other crosses, the NOIA model is a more realistic model able to deal with the HWE assumption even if estimates may be biased due to lack of orthogonality of the effects due to LD. Note that a tight linkage is needed to yield substantial LD in outbred populations (Hill and Mäki-Tanila 2015). For reliable parameter estimation, it is important to work with the most “reliable” statistical model before claiming high importance of the epistatic variance.
It is tempting to fit high-order epistasis terms given the relative ease of computing the direct products. There are two words of caution: First, the products tend quickly to the identity matrix, in which case there is no hope of distinguishing genetic components from residual environment. This can already be observed in Table 5 and Table 6: only the first components are estimated with some accuracy, and in Figure 1 epistatic terms have a-posteriori high correlations with residual variance. Second, variance decomposition in terms of additive, dominant, and epistatic terms does not indicate the relative importance of additive and nonadditive gene actions (Falconer and Mackay 1996; Hill et al. 2008; Huang and Mackay 2016).
We have not undertaken the change of reference operation included in the NOIA framework, which allows to transform genetic effects in a reference system (say, statistical effects) into another one (say, functional genetic effects). In principle, this is feasible by a multiple loci extension of the operations in Álvarez-Castro and Carlborg (2007). This should be a line of future research.
Conclusions
Our approach to construct NOIA genomic relationships is flexible and general for populations not in HWE, such as in inbred lines or F1 crosses of plants and animals, but also for populations in HWE. Genetic effects in NOIA are orthogonal in LE regardless of the genotypic frequencies in the population. Further, we have proven that any genetic modeling with interactions generalizes to Hadamard products of additive and dominant relationships, something that had not been proven before for the general case; but with special care for standardization, something that is not always done. The underlying orthogonal NOIA model is useful on the partition of additive and nonadditive genetic variation and is able to recover the total genetic variance. Assuming HWE when it is not present can yield biased and nonorthogonal partitions of the genetic variance. For reliable parameter estimation, it is important to work with the most reliable statistical model such as the NOIA model which can deal with lack of HWE.
Acknowledgments
We are grateful to Mehdi Sargolzaei for a modified version of QMSim and to members of the EpiSel project for their helpful and constructive comments. Discussions with Simon Teyssèdre (RAGT, Rodez, France) are greatly acknowledged. This work has been financed by the Institut National de la Recherche Agronomique Sélection Génomique metaprogram, project EpiSel (Z.V., A.L.), as well as the project CGL2016-80155 of the Spanish Ministerio de Economía y Competitividad (L.V.). The project was partly supported by Toulouse Midi-Pyrénées bioinformatics platform.
Footnotes
Supplemental material is available online at www.genetics.org/lookup/suppl/doi:10.1534/genetics.116.199406/-/DC1.
Communicating editor: M. P. L. Calus
- Received December 20, 2016.
- Accepted May 11, 2017.
- Copyright © 2017 by the Genetics Society of America