## Abstract

Dominance genetic effects are rarely included in pedigree-based genetic evaluation. With the availability of single nucleotide polymorphism markers and the development of genomic evaluation, estimates of dominance genetic effects have become feasible using genomic best linear unbiased prediction (GBLUP). Usually, studies involving additive and dominance genetic effects ignore possible relationships between them. It has been often suggested that the magnitude of functional additive and dominance effects at the quantitative trait loci are related, but there is no existing GBLUP-like approach accounting for such correlation. Wellmann and Bennewitz (2012) showed two ways of considering directional relationships between additive and dominance effects, which they estimated in a Bayesian framework. However, these relationships cannot be fitted at the level of individuals instead of loci in a mixed model, and are not compatible with standard animal or plant breeding software. This comes from a fundamental ambiguity in assigning the reference allele at a given locus. We show that, if there has been selection, assigning the most frequent as the reference allele orients the correlation between functional additive and dominance effects. As a consequence, the most frequent reference allele is expected to have a positive value. We also demonstrate that selection creates negative covariance between genotypic additive and dominance genetic values. For parameter estimation, it is possible to use a combined additive and dominance relationship matrix computed from marker genotypes, and to use standard restricted maximum likelihood algorithms based on an equivalent model. Through a simulation study, we show that such correlations can easily be estimated by mixed model software and that the accuracy of prediction for genetic values is slightly improved if such correlations are used in GBLUP. However, a model assuming uncorrelated effects and fitting orthogonal breeding values and dominant deviations performed similarly for prediction.

- genomic model
- additive genetic effects
- dominance genetic effects
- correlation
- shared data resource
- GenPred
- Genomic Selection

FROM quantitative genetics theory, statistical additive genetic values (also called breeding values) of individuals are obtained from average allele substitution effects which are functions of functional additive and dominant gene/marker effects (Falconer and Mackay 1996). Dominance deviations are differences between genotypic values and breeding values, and only include a part of the dominant effects of the genes/markers (Falconer and Mackay 1996). Additive genetic variance includes variation due to the functional additive and dominant effects, and dominance genetic variance involves only the functional dominant effects.

The inclusion of dominance in genomic evaluation models has been proposed by several authors (Su *et al.* 2012; Vitezica *et al.* 2013, 2016; Ertl *et al.* 2014; Muñoz *et al.* 2014; Aliloo *et al.* 2016, 2017; Xiang *et al.* 2016). In those studies, additive and dominant marker effects are considered uncorrelated. However, QTL analyses show that the *magnitudes* of these effects are dependent (*e.g.*, Bennewitz and Meuwissen 2010).

In the genomic era, the relationship between magnitudes of additive and dominant gene/marker effects has scarcely been modeled (Wellmann and Bennewitz 2011, 2012; Bennewitz *et al.* 2017). Wellmann and Bennewitz (2011) reviewed evidence for association of magnitudes of additive and dominant effects across QTL. These magnitudes have been shown to be related through the dominance coefficients QTL with large absolute additive effect are likely to be associated with large dominance coefficients, while small additive effects tend to be associated with small dominance coefficients (Caballero and Keightley 1994). These results suggest that across QTL loci it is possible to construct joint distributions of additive and dominant effects. Wellmann and Bennewitz (2012) suggested a general hierarchical Bayesian model where absolute additive QTL effects and dominance coefficients were assumed to be dependent with

Recently, the dependencies between additive and dominant gene effects were considered in a Bayesian model for association analysis (Bennewitz *et al.* 2017). For genomic prediction, methods commonly used are linear mixed models and best linear unbiased prediction (BLUP), *i.e.*, genomic BLUP (GBLUP) methods (Su *et al.* 2012; Vitezica *et al.* 2013). In these models, the relationship between additive and dominant marker effects is ignored. Examining the relationship between and including it in such genomic models could improve the accuracy of predictions.

Even though Bayesian models can take into account the dependencies between additive and dominant gene effects, their implementation by Markov Chain Monte Carlo methods is not straightforward. These models need customary implementations and would be computationally slow for large data sets. In addition, if the relationship between functional additive and dominant marker effects cannot be described as covariance structure of additive and dominance effects on individuals, then the standard mixed model animal breeding software, *e.g.*, DMU (Madsen and Jensen 2013), BLUPf90 family (Misztal *et al.* 2002), or ASREML (Gilmour *et al.* 2009), cannot be used for estimating such relationships.

In this study, we present a novel method to quantify the importance of the relationship between additive and dominant effects of QTL using a GBLUP-like method. This method relies on the fact that allele substitution effects contain functional additive and dominant effects that, after phenotypic selection, tend to offset each other. Our approach is based on fixing the most frequent allele as the reference allele. By simulation, we evaluate the benefit of accounting for this relationship (between ) explicitly in the genomic model for genetic parameter estimation and the prediction of genetic values, comparing it with the model ignoring the relationship between and with the classical orthogonal model based on breeding values and dominance deviations (Vitezica *et al.* 2013).

## Materials and Methods

### Theory

Consider a quantitative trait that is determined by biallelic quantitative trait loci. For each locus, let the midpoint of the genotypic values of the two homozygotes be the origin (0). Relative to this origin, the genotypic value of a homozygote is defined as either a or and the genotypic value of the heterozygote is defined as *d*, which is the amount the heterozygote deviates from the origin. Note that the functional value a can be either positive or negative (a point that is rarely explicit in many textbooks), and the magnitude of the difference between the two homozygotes is therefore the absolute value . For many traits and species, an advantage of heterozygosity is observed, known as heterosis or the related phenomenon inbreeding depression (*i.e.*, Lynch and Walsh 1998, chapter 10). This phenomenon is typically modeled as a regression of the phenotype on the degree of heterozygosity, for which several metrics exist (*i.e.*, Silió *et al.* 2016; Xiang *et al.* 2016).

Using a genotypic model (*i.e.*, Su *et al.* 2012; Vitezica *et al.* 2016) for additive and dominant genotypic effects u and v on individuals we use the notation and respectively, where a and d are vectors of functional additive and dominant QTL effects across individuals, respectively, and Z and W are the respective incidence matrices. Considering one QTL, the matrix W has entries 0, 1, and 0 for genotypes *BB*, *Bb*, and *bb*, respectively. For the matrix Z, there are two ways of coding genotypes depending on the selected reference allele: if allele *B* is the selected reference allele, then matrix Z has entries 1, 0, and −1 for *BB*, *Bb*, and *bb*, respectively (case 1 in Table 1); if allele *b* is selected to be the reference allele, matrix Z is coded as −1, 0, and 1 for genotypes *BB*, *Bb*, and *bb*, respectively (case 2 in Table 1). Thus, in case 1, additive genotypic effects ** u** for

*BB*,

*Bb*, and

*bb*are 0, and respectively, and in case 2, additive genotypic effects u for

*BB*,

*Bb*, and

*bb*are and respectively. The dominant genotypic effects v are always and 0 for genotypes

*BB*,

*Bb*, and

*bb*, respectively. These two cases can be found in Table 1.

Phenotypic (directional) selection, either artificial by breeding or natural, operates to change the phenotype in a desired direction; throughout this paper we will use the shorter term “selection” for this, and without loss of generality we will assume that the direction of selection is upwards. Selection acts mainly on the statistical additive component of genetic values. Thus, considering a segregating QTL, then selection will act on the allele substitution effectTherefore, one would expect that in a long-term selected population, QTL with large allele substitution effects are fixed and only QTL with small allele substitution effects segregate. Thus, if there is selection, it is expected that no segregating QTL will have a very large allele substitution effect after some generations of selection, leading to:(1)where ε is small. From this we can derive and we see that *a* and will tend to have the same sign and magnitudes that are positively associated. At this point, a refers to an arbitrary allele (*B* or *b*) whose frequency is and both a and d can have positive or negative values. The allelic frequency and therefore can also be positive or negative, and we see from that there is no association between sign of a and signs of either or We will call this “random allele coding” (RAC).

An alternative presentation of the same idea is considering allelic frequency at equilibrium after several rounds of selection (Crow and Kimura 2009; Charlesworth and Charlesworth 2010). If the fitnesses at the locus are d, and a (the actual values are functions of a and d that depend on the selection intensity and the part of genetic variance explained by the locus; Charlesworth and Charlesworth 2010, box 3.7), the frequency at equilibrium is which is a rewriting of Equation (1). Thus, and have the same sign.

Consider now the equilibria. If loci tend to fixation toward the favorable homozygote. If (overdominance or underdominance), equilibria are *stable* when (overdominance) and there is maintenance of polymorphisms in the population, but if (underdominance) the equilibrium is unstable and any random event will lead loci to fixation (Crow and Kimura 2009; Charlesworth and Charlesworth 2010). Thus, selection tends to maintain primarily overdominant alleles and therefore, after a selection process, it is expected that (either across segregating loci or across repeated evolutionary histories of the same locus).

Now we extend the reasoning to several loci with random effects drawn from some distributions. Consider a collection of loci with elements , and respectively, and are treated independent and identically distributed across loci. We can formalize the above finding to the covariance between a and as with and For RAC we see from that there is no association between sign of and signs of either or This we formalize as:(2)The last identity holds because has zero mean due to having a symmetric distribution with mean 0.5 [could be a uniform distribution between 0 and 1 or a U-shaped distribution (Hill *et al.* 2008)] for the RAC, and assuming that conditional mean and variance of do not depend on This means that postulating in the model a covariance between a and d has no meaning (or interest) because this covariance is 0 under these assumptions.

However, returning to the one-locus case, we can measure its allelic frequency and we can arbitrarily fix the most frequent allele as the reference allele such that We will indicate this hereinafter as “major allele coding” (MAC). In such case, allelic frequency p is denoted as (). Additive effect a still refers to the allele whose frequency is p (actually now is ) and is denoted as Still, can have positive or negative values. The functional dominance effect d is invariant to the reference allele. Equation (1) still holds:By the restriction the term in becomes strictly positive and we see that and d will tend to have the same sign. In addition, because and ε is small, then it turns out that

Consider now the covariance, across several loci, between functional additive () and dominant (d) effects; then we formalize the above finding to(3)where the last inequality follows fromwhere the first term equals and the second term is zero since for segregating loci. If the conditional variance of is not depending on then where both terms are larger than zero.

Thus, we have shown that, by a careful coding of the model, it is possible to express an *after selection* dependency between functional additive and dominance effects at biallelic QTL as a covariance.

So far, we did not consider the dominant coefficient If, across loci, there is a biological relationship between the magnitudes of a and d (*e.g.*, Bennewitz and Meuwissen 2010), this mechanism will reinforce the previously mentioned dependency between functional additive and dominance effects. In Appendix A, we consider and we argue that it often holds that when [*BayesD3* model in Wellmann and Bennewitz (2012)] and when [*BayesD2* model in Wellmann and Bennewitz (2012)].

### Covariance between genotypic additive and dominant effects

In Hardy–Weinberg equilibrium (see Table 1), the covariance between genotypic additive effects and dominance effects (for a random individual *j* drawn from a population) for one QTL can be derived as:where the expectations can be computed from expectations of conditional expectations (because the cross product is always zero across the three possible genotypes), and respectively. Therefore,As we have already noted, (there will be tendency that only overdominant mutations eventually remain in heterozygous state) and therefore the term is always positive, irrelevant of the coding. When coding genotypes as in the RAC, and thus . Therefore, holds for RAC. When coding genotypes as in the MAC, and there will be a tendency that and *d* have the same sign [see Equation (2)], so that In other words, after long-term selection there will be a tendency that the allele with the positive effect will be most frequent, and hence is positive. To conclude, there will be tendency that both and *d* are positive. Therefore, and are both positive, and hence for the MAC coding(4)The intuition behind this is that after long-term selection, individual genotypic additive and dominance effects tend to offset each other in polymorphic loci (otherwise there would be fixation).

The expression in Equation (4) extends, assuming linkage equilibrium, to for the case of multiple QTL.

For a statistical analysis based on mixed models and genomic data, the assumption is and matrices ** Z** and

**are known. From the**

*W**α*being small property, we have that signs of

*a*and

*d*are uncorrelated for RAC but positively correlated for MAC. Therefore, we obtain for RAC [Equation (2)] and for MAC [Equation (3)].

### Variance component estimation

According to the theory sketched before, the variance–covariance structure between genotypic additive and dominant effects is:(5)where ** Z** and

**contain genotypic codings,**

*W***and**

*a***are additive and dominant SNP effects, is the additive variance for SNP effects, is the dominance variance for SNP effects, and is covariance between additive and dominance SNP effects. For different analyses, elements in matrix**

*d***will be coded using RAC (the reference allele is chosen at random for each locus) or using MAC (the reference allele is the most frequent for each locus).**

*Z*This (co)variance structure in Equation (5) cannot be fit in usual BLUP or restricted maximum likelihood (REML) software directly, because the covariance matrix is not factorizable as a Kronecker product of a relationship structure times a covariance matrix. A solution for this issue is to use an equivalent model with two additional unknown random effects ** u*** and

*** (Fernández**

*v**et al.*2017). Let and ; these effects have no biological meaning

*per se*. The variance structures for and are and respectively. Then, the (co)variance structure for all the four random effects is:(6)where K is an additive-dominance unscaled relationship matrix is a covariance matrix that associates to SNP effects. Equation (6) is a typical correlated random effects structure, and such structures can be fit using mixed model software.

Nevertheless, variance components in are associated to the scale of SNP effects. To adjust variance components in the matrix to be associated to the scale of individuals, similar to Vitezica *et al.* (2016) and Xiang *et al.* (2016), a genomic relationship matrix is introduced, where is the trace of the relationship matrix K and is twice the number of individuals involved in the matrix The is the average of the diagonal elements in the relationship matrix Let a constant then matrix . Then, the Equation (6) will change to:(7)where and are estimated genotypic additive variance, genotypic dominance variance, and covariance between genotypic additive and dominance effects at the level of individuals, and these can be estimated using REML implemented in standard animal breeding software.

Still, these estimated variance components cannot be interpreted as the genetic variances in the population (Legarra 2016). The estimated variance components in Equation (7) should be scaled to the expected variance components of a population as in Legarra (2016) (see Appendix B), as follows:(8)where is the expected genotypic additive variance, is the expected genotypic dominance variance, is the covariance between expected genotypic additive and dominance effects, and statistics for respectively. Thus, the total genetic variance is and the expected correlation between genotypic additive effects u and genotypic dominance effects **v** is:(9)Replacing in the formula above by its expectation (see Appendix B), which is negative, and using Equations (3) and (7) to see that we obtain that is negative, as also shown in Equation (4).

We derived formulae that can be used to estimate the covariance between genotypic additive and dominant effects. Based on this, four hypothesis can be proposed: (1) association between additive and dominant effects is captured in genomic model by a covariance if the MAC is used, but cannot be captured if the RAC is used; (2) based on Equation (9), the covariance is negative when the absolute additive effects and dominance coefficients of QTL are positively correlated [*e.g.*, *BayesD3* in Wellmann and Bennewitz (2012)]; (3) a long-term directional selection is a possible cause for different from 0; and (4) the predictive ability of a genomic model could be improved if the is included. A simulation study was used to test these hypotheses.

### Genomic models

To obtain mixed models with centered a and d (equivalently, centered u and v) two regressions are needed: in MAC and RAC, a regression on the proportion of heterozygotes [or its counterpart the genomic inbreeding, see Xiang *et al.* (2016)] and in MAC, a regression on the proportion of major alleles.

In this study, the full genomic model (M1) can be written aswhere y is a vector of user defined phenotypic values, μ is the overall mean, and (only included in MAC, not for RAC) models the regression of phenotype on proportion of most frequent alleles, *e.g.*, m is a vector with elements , matrix is the element in the incidence matrix for the additive effects for random individual j with MAC, *N* is the number of SNP markers used, and A is the regression coefficient, which needs to be estimated; models the inbreeding depression (Vitezica *et al.* 2016; Xiang *et al.* 2016) (*e.g.*, ), where f is inbreeding coefficient and b is the inbreeding depression parameter per unit of inbreeding, which needs to be estimated. Vectors * ** *, and

**v**are genotypic additive and dominance individual effects as in Equation (7), matrix

**I**is an identity matrix to assign genotypic effects u and

**v**to the corresponding phenotypic records, and e is the overall residual. The expectation of both u and v are zero after inclusion of and in the model. The covariance structure of random effects

**, and**

**v**are as in the Equation (7). If u has n levels, then an additional n levels for (from ) need to be declared to achieve the factorizable structure of the covariance matrix. Similarly, declare variables with levels 1 to and then levels of v are from

The M1 model was compared to a submodel with covariance between u and v equal to zero (Vitezica *et al.* 2016; Xiang *et al.* 2016), aswhere was only included for MAC, but not for RAC; and For both M1 and M2, variance components and associated SE were estimated by GREML using AIREMLf90 (Misztal *et al.* 2002) in different scenarios. Estimated genetic parameters were scaled as in Equation (8).

In addition, a model (M3) with orthogonal breeding values and dominance deviations (Vitezica *et al.* 2013) including genomic inbreeding depression was used. For M3, only RAC was investigated.where and Orthogonal incidence matrices and were coded as follows:

for genotypes ** ****for genotypes and p is the allele frequency of the second allele for each locus. Note that M3-RAC is strictly equivalent to M3-MAC as the cross products and **** are invariant to MAC or RAC coding, and therefore only M3-RAC is investigated.**

The variance proportions of genotypic additive variance () was calculated as where was the variance of genotypic additive effects; was the phenotypic variance, equal to the sum of variance of total genotypic effects () and residual variance (). The dominant variance proportion () was calculated as where was variance of genotypic dominance effects. The broad sense heritability was calculated as Note that variance proportions of total genotypic variance () is not the sum of and because the covariance between the genotypic additive and dominant effects is not equal to zero.

The goodness of fit of the models was measured by the The superiority of M1 over M2 was tested by a likelihood ratio test (LRT), which was calculated as The differences were assumed to follow a distribution with 1 d.f.

### Simulation

Phenotypic and genotypic data sets were simulated by the software QMSim, version 1.10 (Sargolzaei and Schenkel 2009). The trait was designed to be controlled by both additive and dominant gene actions, and the population mimicked a pig population. Heritability in the narrow sense was 0.38 and the additive genotypic variance was 0.66. The phenotypic variance was 1.74 and the dominance genotypic variance was set to 0.174. No polygenic effect was simulated.

The simulation steps are presented in Figure 1. In the first simulation step for creating the historical population (HP), 2500 discrete generations with a constant population size of 500 were simulated. To mimic the bottleneck of a pig population, from generation 2501 onwards, the population size was gradually reduced to 65 at generation 2530. Then, from the generation 2531 to 2535, the population size gradually increased from 65 to 220. At the last generation of the HP (generation 2535), the number of males was 20 and the number of females was 200. The LD decay of chromosome 1 was checked and it was in line with that in a real Danish Landrace/Yorkshire pig population (Wang *et al.* 2013). In the second simulation step, a recent population 1 (RP1) was generated by randomly mating the 20 males and 200 females from the last generation in the HP. Each female had 10 offspring and the sex proportions for progeny were fixed to 0.5. The goal of the RP1 was to expand the population size. The RP1 only had one generation (generation 2536), and there were 1000 males and 1000 females existing in RP1. In the next simulation step, 100 males and 500 females were randomly selected from the RP1 as the founders of recent population 2 (RP2). The RP2 had 13 generations (generation 2537–2550). For each of these generations, 100 males and 500 females who either had the highest phenotypic values (for a scenario with phenotypic selection) or were randomly selected (for a scenario with no selection) from the former generation were kept as parents for the next generation. Each female had ∼15 progeny (from 7 to 22) to mimic a real pig population. The sex ratio for the progeny was fixed as 0.5. For the last generation in RP2, there were ∼7500 individuals in total. All the simulation steps were repeated 10 times to create 10 independent data sets for the further analysis.

The genome consisted of 18 autosomes of 120 cM each. For the first generation in the HP, each chromosome had 200 segregating biallelic QTL loci and 18,200 biallelic marker loci randomly located (thus, in total, 3600 QTL loci and 327,600 marker loci). In the first generation, allele frequencies for QTL and markers loci were 0.5, and the recurrent mutation rate was

Since software QMSim cannot simulate dominance effects, in the RP2 (from generation 2537 onwards), the option of “ebv_est = external_bv” in QMSim was used to base selection decisions on a user provided file. In this study, selection decisions were made according to phenotypic values (phenotypic selection) and thus, the provided file contained user-defined phenotypes for each individual.

These phenotypic values were simulated as follows. For each QTL, additive effects a and dominance effects d were constant across generations. The additive effects a of 3600 QTL loci were drawn from a Student’s *t*-test distribution with 2.5 d.f. [ in R] (Wellmann and Bennewitz 2012). Then, dominance functional effects d were drawn in two different ways: *BayesD2* and *BayesD3*, as is proposed in Wellmann and Bennewitz (2012). For *BayesD2*, the magnitudes of a and d were related as and Bennewitz and Meuwissen (2010) showed that dominance coefficients δ follow a normal distribution with mean 0.193 and SD 0.312. Therefore, the dominance coefficients were simulated as and (element-wise multiplication) in R. For the *BayesD3*, the magnitudes of a and d were related as and Wellmann and Bennewitz (2012) assumed dominance coefficients δ follow a conditional normal distribution as where s is a scaling parameter and with . To generate the dominance coefficients δ following a distribution similar to that in *BayesD2*, was chosen and was used in R for *BayesD3*. The bivariate plots between a and d in the 10 simulated data sets are shown in the supplemental files (Supplemental Material, Figure S1 for *BayesD2* and Figure S2 for *BayesD3*). The mean empirical correlation (s.e.) between and δ over 10 repetitions was, for *BayesD2* and for *BayesD3*.

Then, for each individual, based on the genotypes of 3600 QTL loci and the corresponding QTL effects, “true” genotypic additive effects **u** and dominance effects v of each individual were calculated. Residual effects were sampled from a normal distribution. Afterward, the phenotype for each individual was calculated as the sum of an overall mean, true additive effects, dominance effects, and residual effects. Internally, QMSim used these phenotypes as the selection criteria for the next generation in scenarios with selection. In scenarios with no selection, replacement at each generation was done at random.

At the last generation of RP2, on average ∼650 QTL loci were segregating. Once the simulation was finished, each SNP marker simulated by QMSim was retained for the subsequent analyses if the minor allele frequency was larger than 0.05. In total, there were ∼50K segregating markers retaining. The parameter file for QMSim can be found in the supplemental material.

Four scenarios were studied (Table 2). Two scenarios had phenotypic selection: *SelYBD3* (“Selection Yes BayesD3”), where *a* and d were related as in *BayesD3*; and *SelYBD2* (“Selection Yes BayesD2”), where a and d were related as in *BayesD2*. Two scenarios had no selection: *SelNBD3* (“Selection No BayesD3”) and *SelNBD2* (“Selection No BayesD2”), with a and d related as in *BayesD3* and *BayesD2*, respectively. Within each scenario, MAC (most frequent allele as reference) in combination with M1 and M2 (M1-MAC and M2-MAC) and RAC (random allele as reference) in combination with M1, M2, and M3 (M1-RAC, M2-RAC, and M3-RAC) were applied. For each scenario, 10 replicates were run.

### Predictive abilities

The effect on predictive abilities of including the covariance between the genotypic additive and dominance effects in the genomic model was investigated. Prediction was performed by using M1, M2, and M3 in the scenarios with selection. In each replicate, 20% of individuals in the last generation of RP2 were randomly masked and put into the validation population; the remaining 80% of individuals were used as the training population. The predictive ability was measured as the correlation, in the validation population, between true total genetic values *g* that were known from the simulation and the estimated total genetic values for MAC and for RAC. Bias was measured as the regression coefficient of total g on estimated genetic values . The Hotelling–Williams *t*-test at a 95% confidence level was applied to evaluate the significance for the differences of predictive abilities between M1, M2, and M3 within different scenarios. Table 2 summarizes the analyses for the different scenarios.

### Data availability

The authors state that all data necessary for confirming the conclusions presented in the manuscript are represented fully within the manuscript. Supplemental material available at Figshare: https://doi.org/10.25386/genetics.6111068.

## Results and Discussion

### Regressions on proportion of most frequent alleles and on genomic inbreeding in M1

Average regression coefficients (s.e.) of phenotypes on the proportion of the most frequent alleles (*A*) in the model were 12.61 (1.32) for *SelYBD3*, 12.10 (1.97) for *SelYBD2*, 0.02 (2.32) for *SelNBD3*, and 0.02 (2.51) for *SelNBD2*. The positive *A* for scenarios with selection is in agreement with the tendency that after long-term selection, additive effects for most frequent alleles are positive. The average (s.e.) inbreeding depression parameters per unit of inbreeding (*b*) were −6.36 (1.73) for *SelYBD3*, −6.20 (1.45) for *SelYBD2*, −6.63 (1.34) for *SelNBD3*, and −6.18 (1.52) for *SelNBD2*. These negative inbreeding depression parameters confirm the phenomenon of inbreeding depression.

### Variance components

Variance components were calculated using QTL (to obtain the true values) and also, using markers, for the four scenarios combining models M1 (with correlated additive and dominant genotypic effects) or M2 (with uncorrelated effects) with MAC (most frequent allele as reference) or RAC (random allele as reference). This gives four combinations (M1-MAC, M2-MAC, M1-RAC, and M2-RAC). Results were the mean of 10 replicates. Variance proportions of genotypic additive (), genotypic dominance (), and total genotypic variance over phenotypic variance are presented in Figure 2. All the other variance components are in Table S1 in the supplemental material.

For scenarios with phenotypic selection (*SelYBD2* and *SelYBD3*), estimated variance components in M1-MAC were not statistically significantly different from the true ones, while results in the other cases (M1-RAC, M2-MAC, and M2-RAC) were slightly but statistically different from the true ones. Similarly, all the calculated genetic parameters ( and ) in M1-MAC were not significantly different from the true values, while values in the other cases (M1-RAC, M2-MAC, and M2-RAC) were slightly different from the true ones in these two scenarios.

For scenarios with random selection (*SelNBD2* and *SelNBD3*), estimated genetic parameters in different cases did not dramatically deviate from the true ones. In all cases, and values were slightly overestimated. Estimates from M1-MAC did not show advantages over those from other combinations.

Overall, the variance components estimated with model M1 were close to the true values in different scenarios. In animal and plant breeding, most traits have experienced long-term selection. Thus, to estimate the genetic parameters more precisely, a genomic model considering covariances between genotypic additive and dominance effects (like M1) seems appropriate.

### Genetic correlations

True and estimated genetic correlations between genotypic additive effects and genotypic dominance effects () for combinations M1-MAC and M1-RAC in different scenarios are shown in Figure 3. When RAC was used, the was almost 0 in any scenario. However, when the most frequent allele was the reference (MAC), scenarios with phenotypic selection (*SelYBD3* and *SelYBD2*) yielded negative estimates of In such two scenarios, the absolute values of estimated were slightly lower than the absolute values of true For scenarios with random selection (*SelNBD3* and *SelNBD2*), both true and estimated were around zero.

Results showed that when a and d are related (Bennewitz and Meuwissen 2010; Wellmann and Bennewitz 2012), a negative correlation may be generated between the genotypic additive and dominance effects after long-term selection. The long-term directional selection seems to be a precondition for producing such correlation. M1-MAC can capture part of such correlation.

### LRT

The goodness of fit of M1 and M2 in different scenarios is shown as in the third and fourth columns in Table 3 for both MAC and RAC, respectively. For each allele coding way, M1 always had smaller numeric values of than M2 within different scenarios, which indicated that the M1 fitted the data set better than the M2.

For all the four scenarios, LRT showed no significant differences in fitting the data set between M1 and M2 () when RAC was applied. However, when MAC was used, M1 fitted the data set significantly better than M2 in scenarios with phenotypic selection (*SelYBD3* and *SelYBD2*). For scenarios with random selection (*SelNBD3* and *SelNBD2*), there were no significant differences of goodness of fit between M1 and M2, no matter which allele coding was applied. However, it can be seen that within each scenario, there was a tendency that M1 fitted the data set better than M2, which suggested that a genomic model including the relationships between genotypic additive and dominance effects fits the simulated data better than a model without considering such relationships.

### Predictive abilities

Based on the results of LRT, M1 only provided a better fit for the data set than M2 in scenarios with phenotypic selection (*SelYBD3* and *SelYBD2*) when MAC was used for coding SNP markers. Besides, in animal breeding, most traits have experienced long-term selection. Therefore, predictive abilities obtained from using M1 and M2 were only compared in these two scenarios (M1-MAC and M2-MAC). Predictive abilities were also computed for M2-RAC, because this model is commonly used in other dominance studies (Su *et al.* 2012; Vitezica *et al.* 2016; Xiang *et al.* 2016). Furthermore, M3-RAC was also applied so that the predictive abilities from coding genotypes in a classical (orthogonal) manner (Falconer and Mackay 1996; Vitezica *et al.* 2013) can be compared to that from coding genotypes in a genotypic manner directly.

Predictive abilities and unbiasedness in different scenarios are compared in Table 4. Overall, the Hotelling–Williams *t*-test indicated that predictive abilities from M1 were significantly higher than those from M2 (), but similar to those from M3 (). When MAC was applied, predictive abilities from M1 (M1-MAC) were ∼1% and 0.4% higher than those from M2 (M2-MAC) for *SelYBD3* and *SelYBD2*, respectively. This result is in agreement with LRT showing that goodness of fit of M1 was significantly superior to that of M2. In addition, it also implies that the relationships between additive and dominant effects were appropriately captured. The differences of predictive abilities between M1-MAC and M2-RAC increased to 1.6% and 0.7% for *SelYBD3* and *SelYBD2*, respectively. These differences indicated that consideration of the covariance between genotypic additive and dominance effects via using MAC would possibly generate a small extra genetic gain. For M2, when MAC was applied (M2-MAC), the predictive abilities were ∼0.7% and 0.3% higher than using RAC (M2-RAC). This result indicated that predictive abilities benefited from the inclusion of regression of phenotype on the proportion of the most frequent alleles in the model, even if the covariance between additive and dominance effects were not considered in the M2. This result again recommends the use of MAC in other genomic evaluation models to enhance their predictive abilities, which is in line with the smaller numeric values of for M1 than for M2. In terms of unbiasedness, the regression coefficients observed from M1 were slightly closer to 1 than those from M2 in scenario *SelYBD2*, but there was no clear trend in scenario *SelYBD3*.

However, when comparing the predictive abilities from M3-RAC with other combinations, except for M1-MAC, M3-RAC performed slightly better than M2-MAC and M2-RAC. This phenomenon indicates that when the covariance between additive and dominant effects is ignored, genomic evaluation via coding genotypes in the genotypic manner may work worse than coding genotypes in the classical, orthogonal manner (in terms of breeding values and dominance deviations). However, when considering the covariance between additive and dominance effects, coding genotypes in the MAC (M1-MAC) increased predictive abilities ∼0.7% and 0.3% compared to M3-RAC, which suggested the use of MAC in combination with models considering the covariance between additive and dominant effects. In terms of unbiasedness, the regression coefficients observed from M3 were slightly further from 1 than those from M1 and M2.

Overall, the comparisons of the different models and allele codings confirmed that the predictive ability of a genomic model could slightly improve if the is included (M1-MAC). The inclusion of such covariance does not need extra computing time. The computing time for estimating variance components is similar for M1-MAC and M2-MAC, which is slightly shorter than that for M3-RAC/M3-MAC.

Wellmann and Bennewitz (2012) also derived an reproducing kernel Hilbert space (RKHS) model for estimating total genetic values that assumed a correlation between additive and dominance effects, which was also a genomic BLUP model. This model was derived on the basis of *BayesD2*. A property in the RKHS model is that the covariance among two genotypic values depends on the assumed *a priori* joint distribution of *a* and *d*, *i.e.*, , and which were obtained from theoretical arguments and used for computing the Kernel matrix. We prefer instead to estimate these parameters explicitly from the data sets. Even though standard software can be used for prediction in the RKHS model, it cannot be used for the estimation of genetic parameters.

### Conclusions

We first presented a novel and simple way of incorporating, in populations under selection, the correlation between genotypic additive and dominance effects in a genomic model that can be implemented using standard mixed model software. Our study showed that: if the population is in HW equilibrium and the absolute additive effects and dominance coefficients of QTL are positively correlated [*e.g.*, *BayesD3* in Wellmann and Bennewitz (2012)], after directional selection, a negative correlation between genotypic additive and dominance effects is expected. A genomic model based on a reference allele that is the most frequent one at each locus can capture part of the negative genetic correlation between genotypic additive and dominant effects, but this cannot be captured if the RAC is used. This new approach is applied to a directional selected trait controlled by both additive and dominant gene actions. When such correlation is taken into account in the model, accuracies of estimated total genetic values can be improved by up to 1.5% in genomic evaluation while bias is slightly reduced.

## Acknowledgments

T.X. was supported by the Fundamental Research Funds for the Central Universities (project number 2662018QD001). O.F.C. was supported by the center for Genomic Selection in Animals and Plants (GenSAP) funded by the Danish Council for Strategic Research. A.L. and Z.G.V. thank financing from INRA SelGen metaprogram projects X-Gen and SelDir. We are grateful to the genotoul bioinformatics platform Toulouse Midi-Pyrenees (Bioinfo Genotoul) for providing computing resources. Useful comments from two anonymous reviewers are acknowledged.

## Appendix A

Here, we consider and argue that it often holds that when [*BayesD3* model in Wellmann and Bennewitz (2012)] and when [*BayesD2* model in Wellmann and Bennewitz (2012)].

In the main part of the paper, we have derived that the covariance between additive and dominance effects is determined by and since and we see that the result holds when when and when This latter property often holds; for example it holds when which we show below to hold for the bivariate normal case.

Here, we derive an expression for when assuming a bivariate normal distribution on For simplicity of notation assume that and and define the centered random variables and First, we write which reduces to For the bivariate normal distribution the central moments are and and from this we obtain that

Finally, we note that, strictly speaking, cannot follow a normal distribution, so the above expression is only an approximation.

## Appendix B

Consider a set of individuals with genetic values (in a broad sense, these can be understood as genotypic values) g, and these genetic values are assumed drawn from a certain distribution, *i.e.*, and Since the genetic values are unknown and drawn from a sampling distribution, the variances of these genetic values will also have a certain distribution (Sorensen *et al.* 2001; Legarra 2016). Legarra (2016) showed that, on average, the expectation of the variance for these genetic values is where is the average of the diagonal of G and is the average of matrix and is the difference between the two. Thus, the expected variance is associated with relationship matrix Only if is equal to 1, the expectation of variance for genetic values can be identical to the estimated variance component (Speed and Balding 2015). In this study, the submatrices and do not yield respective and equal to 1. Thus, the estimated variance components were scaled to the expected variance components of a population using, for instance, and

The values of the statistics can be derived analytically. For one SNP locus, if the population is in Hardy–Weinberg equilibrium and p is the frequency of the allele whose homozygote has the genotypic value the frequencies of different genotypes in the Z and W matrices for one locus are:Therefore, the elements in the matrix with corresponding frequencies are:For instance, in a one-locus model, value appears with frequency (the number of animals heterozygote) times (the number of homozygotes) in matrix Extending the reasoning to all loci, from the above table, (because an individual cannot be homozygote and heterozygote at the same time); where Thus, If (MAC), is negative because but if RAC is used then because

## Footnotes

Supplemental material available at Figshare: https://doi.org/10.25386/genetics.6111068.

*Communicating editor: M. Calus*

- Received April 7, 2018.
- Accepted May 8, 2018.

- Copyright © 2018 by the Genetics Society of America