# On the Additive and Dominant Variance and Covariance of Individuals Within the Genomic Selection Scope

^{*}Université de Toulouse, Unité Mixte de Recherche 1289, Institut National de la Recherche Agronomique/Institut National Polytechnique de Toulouse, F-31326 Castanet-Tolosan, France

^{†}Departamento de Anatomía, Embriología y Genética, Universidad de Zaragoza, 50013 Zaragoza, Spain

^{‡}Instituto de Biocomputación y Física de los Sistemas Complejos, Universidad de Zaragoza, 50018 Zaragoza, Spain

^{§}Institut National de la Recherche Agronomique, UR 631 Station d'Amélioration Génétique des Animaux, F-31326 Castanet-Tolosan, France

- 1Corresponding author: Unité Mixte de Recherche 1289 TANDEM, ENSAT, Avenue de l'Agrobiopole, Postal Box 32607, 31326 Auzeville Tolosane, France. E-mail: zulma.vitezica{at}ensat.fr

## Abstract

Genomic evaluation models can fit additive and dominant SNP effects. Under quantitative genetics theory, additive or “breeding” values of individuals are generated by substitution effects, which involve both “biological” additive and dominant effects of the markers. Dominance deviations include only a portion of the biological dominant effects of the markers. Additive variance includes variation due to the additive and dominant effects of the markers. We describe a matrix of dominant genomic relationships across individuals, **D**, which is similar to the **G** matrix used in genomic best linear unbiased prediction. This matrix can be used in a mixed-model context for genomic evaluations or to estimate dominant and additive variances in the population. From the “genotypic” value of individuals, an alternative parameterization defines additive and dominance as the parts attributable to the additive and dominant effect of the markers. This approach underestimates the additive genetic variance and overestimates the dominance variance. Transforming the variances from one model into the other is trivial if the distribution of allelic frequencies is known. We illustrate these results with mouse data (four traits, 1884 mice, and 10,946 markers) and simulated data (2100 individuals and 10,000 markers). Variance components were estimated correctly in the model, considering breeding values and dominance deviations. For the model considering genotypic values, the inclusion of dominant effects biased the estimate of additive variance. Genomic models were more accurate for the estimation of variance components than their pedigree-based counterparts.

- variance
- dominance
- relationship
- genomic evaluation
- mixed models

GENOMIC evaluation models typically fit only marker or haplotypic additive effects, either explicitly, estimating the effect of each marker (Meuwissen *et al.* 2001; VanRaden 2008; De Los Campos *et al.* 2009), or implicitly through the so-called “genomic” relationship matrix (VanRaden 2008; Goddard 2009), which uses an equivalent model from which the marker effects can be inferred by backsolving. The additive substitution effect represents the average change in genotypic value that results when an A_{1} allele is randomly substituted for an A_{2} allele in a population (Falconer 1981; Lynch and Walsh 1998). The substitution effects in quantitative genetics are of primary interest, as they capture a large part of dominant and higher-order interactions across genes and alleles (*i.e.*, epistasis) (*e.g.*, Cockerham 1954; Kempthorne 1954; Falconer 1981). In addition, selection acts on additive substitution effects because alleles, not genotypes, are passed from parents to offspring. However, dominance is of theoretical and practical interest, because it is heavily used in crosses of animal breeds and plant lines (*e.g.*, in pigs, poultry, or corn). In principle, assortative mating or mate allocation (when the breeder chooses specific couples of individuals as parents for the next generation, for instance, to exploit favorable dominant combinations) can boost the field performances of livestock and crops (Varona and Misztal 1999; Toro and Varona 2010).

In livestock populations, one of the main reasons why dominance effects have not been widely used or estimated *per individual* is that pedigree relationships are not informative enough, as large full-sib families are typically needed for any accurate estimate. In prolific species such as chickens and pigs, the litter effect is highly confounded with family. In addition, the prediction of dominant values is typically very cumbersome because it involves complex computations (*e.g.*, Misztal *et al.* 1998; Mrode and Thompson 2005).

Recently, genomic evaluations have renewed the interest in the prediction of nonadditive genetic effects (*e.g.*, Toro and Varona 2010; Su *et al.* 2012; Wellmann and Bennewitz 2012). One of the reasons is that it is much easier to work with dominance, knowing for each evaluated locus which animals are heterozygotes, but also that prediction of the genotypic value of future matings is straightforward (Toro and Varona 2010).

An example of the richness of parametric genomic prediction methods for the estimation of marker effects is that they have equivalent models and interpretations through the genetic covariance between individuals [*i.e.*, genomic relationships (VanRaden 2008; Goddard 2009; Yang *et al.* 2010)] and estimation of the base population variances (Forni *et al.* 2011; Legarra *et al.* 2011b; Sillanpaa 2011). These equivalences are not yet completely described for the case of dominance effects in genomic evaluations (*e.g.*, Toro and Varona 2010; Su *et al.* 2012; Wellmann and Bennewitz 2012). These works are either incomplete or, in part, induce erroneous interpretations of correct models. The aim of this study is to show the equivalences between additive and dominant effects at the marker and the population levels and to present how to compute from genotypes the covariances between individuals due to dominant deviations, *i.e.*, “dominant genomic relationships”. Real data and a simulated example are used to illustrate the principles.

## Theory

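A model including additive and dominant effects of the SNP markers can be written as

$$y_i = \mu + \sum_{j=1}^{n} t_{ij} a_j + \sum_{j=1}^{n} x_{ij} d_j + e_i \tag{1}$$

(Toro and Varona 2010) or, in matrix form for a set of individuals,

$$\mathbf{y} = \mathbf{1}\mu + \mathbf{T}\mathbf{a} + \mathbf{X}\mathbf{d} + \mathbf{e},$$

where $y_i$ is the phenotypic value of the individual *i*, $\mu$ is the population mean, and $e_i$ is a residual. An additive effect $a_j$ and a dominant effect $d_j$ are included for each of the *n* SNP markers. The covariate $t_{ij}$ = 1, 0, and −1, for SNP genotypes $A_1A_1$, $A_1A_2$, and $A_2A_2$, respectively. For the dominant component, $x_{ij}$ = 0, 1, and 0 for SNP genotypes $A_1A_1$, $A_1A_2$, and $A_2A_2$, respectively.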
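The coding of the two covariates in model (1) can be sketched in a few lines of Python. This is only an illustration: the function name `code_genotypes` and the convention of storing genotypes as counts of allele A1 (2, 1, 0) are assumptions made here, not part of the original work.

```python
import numpy as np

def code_genotypes(n_a1):
    """Return the (T, X) covariates of model (1) from counts of allele A1.

    n_a1: array (individuals x markers) with 2 = A1A1, 1 = A1A2, 0 = A2A2.
    """
    n_a1 = np.asarray(n_a1)
    T = n_a1 - 1                  # additive covariate: 1, 0, -1
    X = (n_a1 == 1).astype(int)   # dominant covariate: 1 only for heterozygotes
    return T, X

# Toy data (two individuals, three markers), invented for illustration.
counts = np.array([[2, 1, 0],
                   [1, 1, 2]])
T, X = code_genotypes(counts)
print(T)
print(X)
```

With these matrices, the genotypic value implied by model (1) for given effect vectors `a` and `d` would be `T @ a + X @ d`.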

In model (1), the meaning of “dominant” and “additive” SNP effects is clear from the formulation of the model and basically parallels the biological intuitive meaning. However, its transposition to classical, pedigree-based concepts (in particular, partitioning of the variance) is not that obvious. In the following, and to avoid ambiguities, we talk either about (*biological*) “genotypic” additive and dominant values or about (*statistical*) “breeding” values and dominance deviations. Quoting Hill *et al.* (2008, p. 1) the “biological” value is “the observations of dominance […] at the level of gene action at individual loci, exemplified by a table of genotypic values,” and the “statistical” values are “the observations of variance due to these components in analysis of data from a population.” Traditional treatments of quantitative genetics distinguish carefully between biological effects and statistical effects (*e.g.*, Crow and Kimura 1970; Falconer 1981; Hill *et al.* 2008). For instance, even when the biological action of genes is mostly dominant or epistatic, most of the genetic variation transforms into additive in the sense of transmission to offspring or of a “substitution” effect of a gene in a population (*e.g.*, Hill *et al.* 2008).

For instance, a recent study by Su *et al.* (2012) partitions genetic variance into additive and dominant, in such a way that these estimated variances are not directly comparable to pedigree-based estimates. In this work we explain why this is so and how partitioning should proceed. We also present a dominant genomic relationship matrix that can be introduced in genomic evaluation linear mixed models similar to genomic best linear unbiased prediction (GBLUP) (VanRaden 2008). The first part of this work is just a short review of classical treatments (*e.g.*, Crow and Kimura 1970; Falconer 1981).

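Consider one locus. Following model (1), the genotypic value *G* of an individual is

$$G = \begin{cases} a & \text{for } A_1A_1 \\ d & \text{for } A_1A_2 \\ -a & \text{for } A_2A_2 \end{cases}$$

where the values *a* and *d* are deviations from the midpoint of the two homozygous genotypes (Falconer 1981). The genetic mean of the population is therefore

$$E(G) = (p - q)a + 2pqd,$$

where *p* is the frequency of $A_1$ and $q = 1 - p$.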

### Classical parameterization

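Classically, the genotypic values are split into additive (or breeding) values and dominant deviations, as in Falconer (1981),

$$G = E(G) + u + v,$$

where *u* and *v* stand for breeding values and dominant deviations. The breeding value for an individual is

$$u = \begin{cases} (2 - 2p)\alpha & \text{for } A_1A_1 \\ (1 - 2p)\alpha & \text{for } A_1A_2 \\ -2p\alpha & \text{for } A_2A_2 \end{cases}$$

where $\alpha = a + d(q - p)$ is the substitution effect of the gene. So, the breeding values of a set of individuals are $\mathbf{u} = \mathbf{z}\alpha$ (with **z** coded as in VanRaden 2008),

$$z_i = \begin{cases} 2 - 2p & \text{for } A_1A_1 \\ 1 - 2p & \text{for } A_1A_2 \\ -2p & \text{for } A_2A_2 \end{cases} \tag{2}$$

in individual *i*. Matrix $\mathbf{Z}$ including all markers is identical to matrix **T** in (1) but centered.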

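Also, the dominant deviation of an individual is

$$v = \begin{cases} -2q^2 d & \text{for } A_1A_1 \\ 2pq\,d & \text{for } A_1A_2 \\ -2p^2 d & \text{for } A_2A_2 \end{cases}$$

So $\mathbf{v} = \mathbf{w}d$ with

$$w_i = \begin{cases} -2q^2 & \text{for } A_1A_1 \\ 2pq & \text{for } A_1A_2 \\ -2p^2 & \text{for } A_2A_2 \end{cases}$$

Note that $\mathbf{W} \neq \mathbf{X}$ in model (1).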

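The genetic variance is

$$\sigma_G^2 = 2pq\,[a + d(q - p)]^2 + (2pqd)^2.$$

We can partition the genetic variance into components due to individual additive values (breeding values, *u*) and dominance deviations (*v*). In fact, $E(u) = 0$ and the *additive genetic variance* is

$$\sigma_A^2 = 2pq\,\alpha^2 = 2pq\,[a + d(q - p)]^2,$$

with $\alpha = a + d(q - p)$ the substitution effect.

Also, like breeding values, dominance deviations sum to zero ($E(v) = 0$) and the *dominance genetic variance* is

$$\sigma_D^2 = (2pqd)^2.$$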
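The single-locus partition can be checked numerically. The sketch below assumes Hardy–Weinberg genotype frequencies and uses arbitrary illustrative values (p = 0.3, a = 1, d = 0.5) that do not come from the original work; the function name `single_locus_partition` is likewise hypothetical.

```python
import numpy as np

def single_locus_partition(p, a, d):
    """Classical partition of genotypic values at one locus under Hardy-Weinberg."""
    q = 1.0 - p
    freq = np.array([p**2, 2 * p * q, q**2])        # A1A1, A1A2, A2A2
    g = np.array([a, d, -a])                        # genotypic values
    alpha = a + d * (q - p)                         # substitution effect
    u = np.array([2 - 2 * p, 1 - 2 * p, -2 * p]) * alpha       # breeding values
    v = np.array([-2 * q**2, 2 * p * q, -2 * p**2]) * d        # dominance deviations
    mean_g = freq @ g
    return {
        "mean_g": mean_g,
        "resid": g - mean_g - u - v,                # should be all zeros
        "mean_u": freq @ u,
        "mean_v": freq @ v,
        "var_g": freq @ (g - mean_g) ** 2,
        "var_a": freq @ u**2,
        "var_d": freq @ v**2,
        "cov_uv": freq @ (u * v),
    }

out = single_locus_partition(p=0.3, a=1.0, d=0.5)
for key, value in out.items():
    print(key, value)
```

Under these assumptions the genotypic values are recovered exactly as the mean plus *u* plus *v*, both components have zero mean, their covariance vanishes, and the total variance equals the sum of the additive and dominance variances.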

Partitioning the total genetic variance, we can see that $\sigma_G^2 = \sigma_A^2 + \sigma_D^2$, where the first term is the additive genetic variance and the second term corresponds to the dominance genetic variance. Note that this parameterization implies that, in a population in Hardy–Weinberg equilibrium (*i.e.*, without inbreeding), the covariance between the breeding value and the dominance deviation is zero (Falconer 1981). Without inbreeding, the inclusion of dominance in the model is easy (Henderson 1985). However, its inclusion is more complicated with inbreeding, first, because there may be inbreeding depression and, second, because the covariance between inbred individuals with dominance is no longer a function of just $\sigma_A^2$ and $\sigma_D^2$ (De Boer and Hoeschele 1993; Lynch and Walsh 1998).

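If *a* and *d* effects are considered to be random and the covariance between them equal to zero, the covariance of additive individual effects, **u**, is

$$\mathrm{Cov}(\mathbf{u}) = \mathbf{z}\mathbf{z}'\sigma_\alpha^2 = \mathbf{z}\mathbf{z}'\left[\sigma_a^2 + (q - p)^2\sigma_d^2\right],$$

where $\sigma_a^2$ and $\sigma_d^2$ are the SNP variances for additive and dominant components, respectively.

Now, we have that $\sigma_A^2 = 2pq\,\sigma_a^2 + 2pq\,(q - p)^2\sigma_d^2$, assuming that the covariance between *a* and *d* is zero. So in fact

$$\mathrm{Cov}(\mathbf{u}) = \frac{\mathbf{z}\mathbf{z}'}{2pq}\,\sigma_A^2 = \mathbf{G}\sigma_A^2 \tag{3}$$

as expected, which is the classical **G** matrix of GBLUP (VanRaden 2008). It is well known that in a base population with Hardy–Weinberg equilibrium, the average of the diagonal of **G** is 1, whereas the average off-diagonal is 0.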

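As for the dominant deviations, their covariance is

$$\mathrm{Cov}(\mathbf{v}) = \mathbf{w}\mathbf{w}'\sigma_d^2.$$

If we take $\sigma_D^2 = (2pq)^2\sigma_d^2$ then

$$\mathrm{Cov}(\mathbf{v}) = \frac{\mathbf{w}\mathbf{w}'}{(2pq)^2}\,\sigma_D^2 = \mathbf{D}\sigma_D^2. \tag{4}$$

In a base population with Hardy–Weinberg conditions, it turns out that the expected diagonal element of **D** is

$$\frac{p^2(-2q^2)^2 + 2pq\,(2pq)^2 + q^2(-2p^2)^2}{(2pq)^2} = q^2 + 2pq + p^2,$$

which is equal to 1. In addition, the expected off-diagonal element of **D** is 0. In effect, for two unrelated individuals it can be written as

$$\frac{\left[p^2(-2q^2) + 2pq\,(2pq) + q^2(-2p^2)\right]^2}{(2pq)^2},$$

whose numerator is 0. Both features correspond to proper definitions of dominant relationships in a base population (Hayes *et al.* 2009).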
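The two Hardy–Weinberg properties of **D** stated above can be verified for any allele frequency. The following is a minimal sketch; the helper name `expected_D_elements` and the test frequencies are chosen here for illustration only.

```python
import numpy as np

def expected_D_elements(p):
    """Expected diagonal and off-diagonal elements of D at one locus under HW."""
    q = 1.0 - p
    freq = np.array([p**2, 2 * p * q, q**2])            # A1A1, A1A2, A2A2
    w = np.array([-2 * q**2, 2 * p * q, -2 * p**2])     # dominance covariates
    scale = (2 * p * q) ** 2
    diag = freq @ w**2 / scale          # E(w_i^2) / (2pq)^2
    offdiag = (freq @ w) ** 2 / scale   # E(w_i) E(w_j) / (2pq)^2, unrelated i and j
    return diag, offdiag

for p in (0.1, 0.3, 0.5, 0.9):
    print(p, expected_D_elements(p))
```

The diagonal expectation stays at 1 and the off-diagonal expectation at 0 (up to floating-point error) regardless of the allele frequency, which is what makes the scaling by $(2pq)^2$ a natural choice.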

### Alternative (genotypic) parameterization

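From the genotypic value, we can also define $u^*$ and $v^*$ as the genotypic additive and dominant effects, *i.e.*, the parts attributable to the additive and dominance biological effects of the markers. Note that $u^*$ is not a breeding value.

Thus, we have

$$u^* = \begin{cases} a & \text{for } A_1A_1 \\ 0 & \text{for } A_1A_2 \\ -a & \text{for } A_2A_2 \end{cases} \qquad v^* = \begin{cases} 0 & \text{for } A_1A_1 \\ d & \text{for } A_1A_2 \\ 0 & \text{for } A_2A_2 \end{cases}$$

The mean of genotypic values of individuals due to additive effects is $E(u^*) = (p - q)a$ and that due to dominant effects is $E(v^*) = 2pqd$.

Therefore, after centering,

$$u^* - E(u^*) = \begin{cases} (2 - 2p)\,a & \text{for } A_1A_1 \\ (1 - 2p)\,a & \text{for } A_1A_2 \\ -2p\,a & \text{for } A_2A_2 \end{cases}$$

which is $\mathbf{z}a$, with $\mathbf{z}$ as in (2) and as in VanRaden (2008). Also,

$$v^* - E(v^*) = \begin{cases} -2pq\,d & \text{for } A_1A_1 \\ (1 - 2pq)\,d & \text{for } A_1A_2 \\ -2pq\,d & \text{for } A_2A_2 \end{cases}$$

or $\mathbf{h}d$ with **h** as

$$h_i = \begin{cases} -2pq & \text{for } A_1A_1 \\ 1 - 2pq & \text{for } A_1A_2 \\ -2pq & \text{for } A_2A_2 \end{cases}$$

Note that incidence matrix $\mathbf{H}$ is simply **X** in (1) but centered. Remember that $\sigma_{A^*}^2$ is the variance of genotypic additive values, and $\sigma_{D^*}^2$ is the variance of genotypic dominant values. Then

$$\sigma_{A^*}^2 = 2pq\,\sigma_a^2$$

and

$$\sigma_{D^*}^2 = 2pq\,(1 - 2pq)\,\sigma_d^2.$$

The covariances across genotypic additive values, $\mathbf{u}^*$, are

$$\mathrm{Cov}(\mathbf{u}^*) = \mathbf{z}\mathbf{z}'\sigma_a^2$$

and introducing $\sigma_{A^*}^2$,

$$\mathrm{Cov}(\mathbf{u}^*) = \frac{\mathbf{z}\mathbf{z}'}{2pq}\,\sigma_{A^*}^2 = \mathbf{G}\sigma_{A^*}^2,$$

which is also the classical **G** matrix of genomic BLUP (VanRaden 2008), but with a different variance component.

As for the covariance in genotypic values due to dominance, this is

$$\mathrm{Cov}(\mathbf{v}^*) = \mathbf{h}\mathbf{h}'\sigma_d^2.$$

Introducing $\sigma_{D^*}^2 = 2pq\,(1 - 2pq)\,\sigma_d^2$, then

$$\mathrm{Cov}(\mathbf{v}^*) = \frac{\mathbf{h}\mathbf{h}'}{2pq\,(1 - 2pq)}\,\sigma_{D^*}^2 = \mathbf{D}^*\sigma_{D^*}^2.$$

This is as in Su *et al.* (2012). For model (1), **H** corresponds to the incidence matrix **X** (shifted by a constant). However, note that $\mathbf{D}^* \neq \mathbf{D}$.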

Therefore, the breeding (or classical) and the genotypic models are equivalent models to explain the data (**y**) but their interpretation must be different. The first model is expressed in terms of breeding values and dominant deviations and the second model in terms of genotypic additive and dominant effects of markers. In particular, the first model provides direct estimates of breeding values, whereas the second model is more obviously interpretable in terms of dominant and additive effects of the markers. The interpretation of the variances is also different; for instance, $\sigma_A^2$ is the variance useful in selection or in prediction of potential selection response. In particular, to compare with pedigree-based estimates, $\sigma_A^2$ should be used, not $\sigma_{A^*}^2$. Also, the dominant relationship matrices are different. For instance, for three individuals carrying the three possible genotypes ($A_1A_1$, $A_1A_2$, and $A_2A_2$) at one locus, and for given values of $\sigma_d^2$ and $p$ (the frequency of $A_1$), the covariance matrices are $\mathbf{w}\mathbf{w}'\sigma_d^2$ for the breeding model and $\mathbf{h}\mathbf{h}'\sigma_d^2$ for the genotypic model, with $\mathbf{w}' = (-2q^2,\ 2pq,\ -2p^2)$ and $\mathbf{h}' = (-2pq,\ 1 - 2pq,\ -2pq)$. Thus, the dominant deviations across individuals have stronger covariances when two individuals share a rare allele (−0.41) in the classical parameterization.
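The three-genotype comparison can be reproduced numerically. The sketch below assumes $\sigma_d^2 = 1$ and $p = 0.3$; these two values are an assumption made here for illustration (they are not given in the text above), although with them the covariance between the $A_1A_1$ and $A_1A_2$ individuals in the breeding model rounds to −0.41.

```python
import numpy as np

# Assumed illustrative values; not taken from the original text.
p, var_d = 0.3, 1.0
q = 1.0 - p

# Rows and columns are ordered A1A1, A1A2, A2A2.
w = np.array([-2 * q**2, 2 * p * q, -2 * p**2])             # breeding model
h = np.array([-2 * p * q, 1 - 2 * p * q, -2 * p * q])       # genotypic model

cov_breeding = np.outer(w, w) * var_d     # Cov(v)  = w w' sigma_d^2
cov_genotypic = np.outer(h, h) * var_d    # Cov(v*) = h h' sigma_d^2

print(np.round(cov_breeding, 2))
print(np.round(cov_genotypic, 2))
```

Under these assumed values, the genotypic model gives the same covariance between the heterozygote and either homozygote, whereas the breeding model distinguishes the homozygote for the rare allele from the homozygote for the common one.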

### Going from genotypic to breeding additive and dominance variances

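Assume that we have variances estimated in either model. Then we have the identities

$$\sigma_A^2 = 2pq\,\sigma_a^2 + 2pq\,(q - p)^2\sigma_d^2, \qquad \sigma_D^2 = (2pq)^2\sigma_d^2,$$

$$\sigma_{A^*}^2 = 2pq\,\sigma_a^2, \qquad \sigma_{D^*}^2 = 2pq\,(1 - 2pq)\,\sigma_d^2.$$

It can be verified that $\sigma_A^2 + \sigma_D^2 = \sigma_{A^*}^2 + \sigma_{D^*}^2$. Also, if $p = q = 0.5$ all variances are identical ($\sigma_A^2 = \sigma_{A^*}^2$ and $\sigma_D^2 = \sigma_{D^*}^2$), and if $\sigma_d^2 = 0$ then $\sigma_A^2 = \sigma_{A^*}^2$. From these identities, conversion across models is immediate.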
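Eliminating the SNP variances from the single-locus identities gives a direct conversion from genotypic to breeding variances. The function below is a sketch of that algebra; its name and the numerical inputs are hypothetical.

```python
def genotypic_to_breeding(var_a_star, var_d_star, p):
    """Convert single-locus genotypic variances into breeding-model variances.

    Uses sigma_D^2 = 2pq / (1 - 2pq) * sigma_D*^2 and
    sigma_A^2 = sigma_A*^2 + (q - p)^2 / (1 - 2pq) * sigma_D*^2.
    """
    q = 1.0 - p
    denom = 1.0 - 2.0 * p * q
    var_d = 2.0 * p * q / denom * var_d_star
    var_a = var_a_star + (q - p) ** 2 / denom * var_d_star
    return var_a, var_d

# Illustrative values only.
var_a, var_d = genotypic_to_breeding(var_a_star=1.0, var_d_star=0.5, p=0.3)
print(var_a, var_d, var_a + var_d)
```

The total genetic variance is preserved by the conversion, and at $p = 0.5$ the function returns its inputs unchanged.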

### Extension to multiple loci

The extension to multiple loci assumes linkage equilibrium and uncorrelated marker effects, which are ordinary assumptions (VanRaden 2008; Gianola *et al.* 2009).

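Then

$$\sigma_A^2 = \sum_{j} 2p_jq_j\,\sigma_a^2 + \sum_{j} 2p_jq_j\,(q_j - p_j)^2\sigma_d^2, \qquad \sigma_D^2 = \sum_{j} (2p_jq_j)^2\sigma_d^2,$$

$$\sigma_{A^*}^2 = \sum_{j} 2p_jq_j\,\sigma_a^2, \qquad \sigma_{D^*}^2 = \sum_{j} 2p_jq_j\,(1 - 2p_jq_j)\,\sigma_d^2.$$

Also, the covariance matrices are simple extensions of (3) and (4):

$$\mathrm{Cov}(\mathbf{u}) = \frac{\mathbf{Z}\mathbf{Z}'}{\sum_j 2p_jq_j}\,\sigma_A^2 = \mathbf{G}\sigma_A^2 \tag{5}$$

$$\mathrm{Cov}(\mathbf{v}) = \frac{\mathbf{W}\mathbf{W}'}{\sum_j (2p_jq_j)^2}\,\sigma_D^2 = \mathbf{D}\sigma_D^2 \tag{6}$$

$$\mathrm{Cov}(\mathbf{u}^*) = \frac{\mathbf{Z}\mathbf{Z}'}{\sum_j 2p_jq_j}\,\sigma_{A^*}^2 = \mathbf{G}\sigma_{A^*}^2 \tag{7}$$

$$\mathrm{Cov}(\mathbf{v}^*) = \frac{\mathbf{H}\mathbf{H}'}{\sum_j 2p_jq_j\,(1 - 2p_jq_j)}\,\sigma_{D^*}^2 = \mathbf{D}^*\sigma_{D^*}^2 \tag{8}$$

For instance, Su *et al.* (2012) estimated several variance components using markers. These are parameterized in terms of genotypic effects; *i.e.*, $\sigma_{A^*}^2$ and $\sigma_{D^*}^2$. Assuming, for instance, that their allelic frequencies are drawn from a given distribution, the terms $\sum_j 2p_jq_j$, etc., can be computed (by Monte Carlo or algebraic integration). Then, the results of Su *et al.* (2012) could be converted as follows:

$$\sigma_D^2 = \frac{\sum_j (2p_jq_j)^2}{\sum_j 2p_jq_j\,(1 - 2p_jq_j)}\,\sigma_{D^*}^2, \qquad \sigma_A^2 = \sigma_{A^*}^2 + \frac{\sum_j 2p_jq_j\,(q_j - p_j)^2}{\sum_j 2p_jq_j\,(1 - 2p_jq_j)}\,\sigma_{D^*}^2.$$

In this example, the dominance variance ($\sigma_D^2$) is not 13% of the genetic variance ($\sigma_A^2 + \sigma_D^2$), but 9.7%. The genotypic model overestimates the dominance genetic variance and, consequently, underestimates additive genetic variance.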
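The multilocus conversion factors depend only on the allele frequencies, so they can be approximated by Monte Carlo once a frequency distribution is assumed. The sketch below assumes, purely for illustration, that frequencies follow a Beta(2, 2) distribution; this choice, the seed, and the function name `conversion_ratios` are assumptions made here and are not stated in the text above.

```python
import numpy as np

def conversion_ratios(p):
    """Factors converting genotypic variances to breeding variances.

    Returns (dom_ratio, add_shift) such that
        sigma_D^2 = dom_ratio * sigma_D*^2
        sigma_A^2 = sigma_A*^2 + add_shift * sigma_D*^2
    """
    p = np.asarray(p, dtype=float)
    q = 1.0 - p
    het = 2.0 * p * q
    denom = np.sum(het * (1.0 - het))
    dom_ratio = np.sum(het**2) / denom
    add_shift = np.sum(het * (q - p) ** 2) / denom
    return dom_ratio, add_shift

rng = np.random.default_rng(1)
freqs = rng.beta(2.0, 2.0, size=200_000)    # assumed frequency distribution
dom_ratio, add_shift = conversion_ratios(freqs)

# A dominance share of 13% under the genotypic model, rescaled to the breeding model.
share_breeding = 0.13 * dom_ratio
print(dom_ratio, add_shift, share_breeding)
```

Because the two factors always sum to 1, the total genetic variance is unchanged and only its split moves. Under the assumed Beta(2, 2) distribution the dominance factor is close to 0.75, which would bring a 13% genotypic dominance share to roughly 9.7%; a different frequency distribution would give a different factor.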

### Inbreeding

It is not clear whether the inbreeding is taken automatically into account in the models. The model in Equation 1 makes no hypothesis of Hardy–Weinberg equilibrium. Inbreeding (deviations from Hardy–Weinberg) is accounted for because genotypes are observed. However, properties of the breeding and genotypic models are no longer well defined; for instance, covariances between **u** and **v** appear. But because the genotypic and breeding models are equivalent to model (1), the genotypic value can always be correctly inferred. However, model (1) does not account for inbreeding depression. This needs further investigation.

## Examples

### Mouse data analysis

Legarra *et al.* (2008) analyzed phenotypes and genotypes of mouse data (Valdar *et al.* 2006), including 1884 individuals and 10,946 markers. For illustration, we have reanalyzed the data and estimated ($\sigma_A^2$, $\sigma_D^2$) and ($\sigma_{A^*}^2$, $\sigma_{D^*}^2$) using the same data set and model (1) (plus an extra “cage” effect, as in the original work). Variance components under genotypic and breeding models were estimated for four traits (weight, growth speed, body length, and body mass index). Estimation was by Bayesian inference, using flat priors for the variances and a Gibbs sampling algorithm. We used the publicly available software GS3 (Legarra *et al.* 2011a) for estimation. Both the software and the mouse data are available at http://genoweb.toulouse.inra.fr/~alegarra/gs3dist.tar.gz.

### Simulated data

We simulated a quantitative trait with additive and dominant QTL action, so that the parts of total variance attributable to additive and dominance effects (in terms of breeding values and dominant deviations) were 0.20 and 0.10, respectively. Simulation was as in Toro and Varona (2010), including 10 chromosomes of 1 M and 1000 markers each. A population was generated by mutation and drift over 1000 generations with an effective population size of 100. After 1000 generations, the population was expanded to 1000 individuals (500 per sex) and remained at 1000 with random mating for 2 discrete generations. Therefore, the data set consisted of 2100 individuals. These 2100 individuals were genotyped and phenotyped and then used to estimate variance components. No inbreeding occurred within the 2100 individuals.

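Four different models were used for variance component estimation. To verify whether the correct genetic parameters could be estimated using markers, the genomic model involving breeding values and dominance deviations (MGD) was

$$\mathbf{y} = \mathbf{1}\mu + \mathbf{u} + \mathbf{v} + \mathbf{e}, \qquad \mathrm{Cov}(\mathbf{u}) = \mathbf{G}\sigma_A^2, \quad \mathrm{Cov}(\mathbf{v}) = \mathbf{D}\sigma_D^2,$$

where the covariance matrices **G** and **D** were constructed as in (5) and (6).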
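The construction of the genomic relationship matrices from genotypes can be sketched as follows. This is an illustrative sketch rather than the authors' program: the function name, the use of observed allele frequencies when none are supplied, and the simulated toy genotypes are all assumptions made here.

```python
import numpy as np

def genomic_relationships(n_a1, p=None):
    """Build G, D (breeding model) and D* (genotypic model) from A1 allele counts.

    n_a1: array (individuals x markers) with 2 = A1A1, 1 = A1A2, 0 = A2A2.
    p:    frequencies of A1 per marker; observed frequencies are used if None.
    """
    n_a1 = np.asarray(n_a1, dtype=float)
    if p is None:
        p = n_a1.mean(axis=0) / 2.0
    q = 1.0 - p
    het = 2.0 * p * q
    Z = n_a1 - 2.0 * p                                    # 2-2p, 1-2p, -2p
    W = np.where(n_a1 == 2, -2.0 * q**2,
                 np.where(n_a1 == 1, het, -2.0 * p**2))   # -2q^2, 2pq, -2p^2
    H = (n_a1 == 1).astype(float) - het                   # -2pq, 1-2pq, -2pq
    G = Z @ Z.T / het.sum()
    D = W @ W.T / (het**2).sum()
    D_star = H @ H.T / (het * (1.0 - het)).sum()
    return G, D, D_star

# Toy genotypes simulated under Hardy-Weinberg and linkage equilibrium.
rng = np.random.default_rng(0)
true_p = rng.uniform(0.1, 0.9, size=1000)
genotypes = rng.binomial(2, true_p, size=(300, 1000))
G, D, D_star = genomic_relationships(genotypes)
print(np.diag(G).mean(), np.diag(D).mean(), np.diag(D_star).mean())
```

In such a simulated base population the average diagonal of each matrix should be close to 1. The matrices could then be passed to any mixed-model software that accepts user-defined covariance structures.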

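A pedigree-based alternative model including additive and dominant relationship matrices (MADped) used

$$\mathbf{y} = \mathbf{1}\mu + \mathbf{u} + \mathbf{v} + \mathbf{e}, \qquad \mathrm{Cov}(\mathbf{u}) = \mathbf{A}\sigma_A^2, \quad \mathrm{Cov}(\mathbf{v}) = \mathbf{D}_{\mathrm{ped}}\sigma_D^2,$$

where **A** is a pedigree-based relationship matrix and $\mathbf{D}_{\mathrm{ped}}$ is a pedigree-based dominant relationship matrix (Cockerham 1954; Henderson 1985). A simplified model, $\mathbf{y} = \mathbf{1}\mu + \mathbf{u} + \mathbf{e}$, with two different variance structures, either $\mathrm{Cov}(\mathbf{u}) = \mathbf{G}\sigma_A^2$ (MG) or $\mathrm{Cov}(\mathbf{u}) = \mathbf{A}\sigma_A^2$ (MA), was also tested. These two variance structures define a genomic model with an additive genomic relationship matrix (MG) and a pedigree-based model (MA), respectively. Genetic ($\sigma_A^2$, $\sigma_D^2$) and residual ($\sigma_e^2$) variances were computed by restricted maximum likelihood, using the software remlf90 (Misztal *et al.* 2002) (available at http://nce.ads.uga.edu/wiki/doku.php).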

Variance components under genotypic parameterization ($\sigma_{A^*}^2$, $\sigma_{D^*}^2$) as in Su *et al.* (2012) were also calculated using the model

$$\mathbf{y} = \mathbf{1}\mu + \mathbf{u}^* + \mathbf{v}^* + \mathbf{e},$$

with covariance structures as in (7) and (8), thus estimating variances of the additive and dominant effects of the markers (MG*D*).

In addition, we verified the ability of each model to correctly predict breeding values and dominant deviations for each individual in the last generation. Regression of true breeding value (TBV) on estimated breeding value (EBV) was used as a measure of the inflation of the prediction model, where a regression coefficient of 1 denotes no inflation. The predictive ability of each model was also evaluated by accuracy, computed as the correlation between TBV and EBV. Results were the mean of the 10 replicates for each model.
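The two validation statistics can be computed directly from paired true and estimated values. The following is a minimal sketch; the function name and the toy vectors are invented for illustration.

```python
import numpy as np

def inflation_and_accuracy(tbv, ebv):
    """Regression slope of TBV on EBV (inflation) and their correlation (accuracy)."""
    tbv = np.asarray(tbv, dtype=float)
    ebv = np.asarray(ebv, dtype=float)
    cov = np.cov(tbv, ebv)                  # 2 x 2 sample covariance matrix
    slope = cov[0, 1] / cov[1, 1]           # Cov(TBV, EBV) / Var(EBV)
    accuracy = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
    return slope, accuracy

# Toy example: estimates that rank individuals perfectly but are shrunk by half.
ebv = np.array([1.0, 2.0, 3.0, 4.0])
tbv = 2.0 * ebv
slope, accuracy = inflation_and_accuracy(tbv, ebv)
print(slope, accuracy)
```

In this toy case the slope of 2 signals that the estimates are too compressed (the opposite of inflation), while the accuracy of 1 shows that the ranking is unaffected. The two statistics therefore capture different aspects of prediction quality.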

Simulated data sets (phenotypes, genotypes, and pedigree) are available at http://genoweb.toulouse.inra.fr/~zvitezic/simu_for_genetics.tar.gz. Programs for simulation of the data sets and for construction of the different relationship matrices are available on request from the authors.

## Results and Discussion

### Mouse data analysis

The results for variance components are shown in Table 1.

Estimates of the partitioning of the genetic variance clearly differ; for instance, for growth the proportion of genetic variance attributed to dominance effects halves if one uses the classical parameterization. In addition, broad-sense heritability is sometimes much higher than its additive estimate (*e.g.*, in growth speed, additive heritability is 0.025 whereas broad-sense heritability is 0.036). The ratio between $\sigma_{D^*}^2$ and $\sigma_D^2$ ranges from 1.36 to 1.50 for all the traits and depends only on the distribution of the allelic frequencies of markers; it is not constant because of Monte Carlo noise in the estimates. The correlations between estimated $\mathbf{u}$ and $\mathbf{u}^*$ and between estimated $\mathbf{v}$ and $\mathbf{v}^*$ in this data set were 0.99 and 0.86, respectively. This illustrates that estimated individual effects are different for the different parameterizations, in particular for dominance effects. We stress that the correlation between total estimated genetic values $\mathbf{u} + \mathbf{v}$ and $\mathbf{u}^* + \mathbf{v}^*$ is 1, because both models are equivalent.

### Simulated data

Estimates of additive and dominance variances for the four models are presented in Figure 1. Estimates of additive genetic variance were 20.7 ± 4.1, 21.4 ± 3.3, 21.4 ± 3.3, and 20.5 ± 4.2 for the models MA, MG, MGD, and MADped, respectively. As was expected, the SD of estimates was greater in MA (MADped) than in MG (MGD). This means that the amount of statistical information to estimate variances is greater with markers than with pedigree. The additive genetic variance was well estimated also when models included dominant effects. These results do not agree with Su *et al.* (2012), who found lower estimates of additive genetic variance when the model included dominance and epistasis effects. The expanded model of Su *et al.* (2012) does not properly handle the fact that the (additive) substitution effects α include part of biological dominance. For instance, when only additive effects are included in their data analysis, part of biological dominance is included; but when dominance is fitted, this part goes to the dominance variation. The additive variance estimated by the increasingly complex models (with dominance and epistasis) in Su *et al.* (2012) does not have the same meaning throughout all models, whereas in the breeding parameterization it is always the same, irrespective of whether additive only or additive and dominance effects are fitted.

The dominance variance estimated with the MGD model was 13.2 ± 3.5, which agrees with the simulated dominance variance of 10. Estimates of the proportion of dominance variance to phenotypic variance (“dominance heritability”) are shown in Figure 2. Dominance variance estimation had a large standard error in MADped (Figure 1 and Figure 2). These results illustrate the difficulties in obtaining a good estimate of dominance variance from pedigree information. In addition, the results show how genomic information allows one to obtain an accurate estimation of dominant deviations. Further, in practice, use of dominance through the implementation of mate allocations using markers is straightforward, contrary to pedigree-based methods (Varona and Misztal 1999).

Under the genotypic MG*D* model, the additive and dominance variances were 14.7 ± 4.4 and 19.9 ± 5.3, respectively. The dominance variance was overestimated; it was 57.5% instead of 38% of the genetic variance (Figure 1). Consequently, this genotypic parameterization underestimates additive genetic variance. These results show that estimates from the genotypic model are not directly comparable with pedigree-based estimates, although comparable values can be derived from them using allele frequencies.

The degree of inflation from the prediction methods is indicated by the coefficient of regression of TBV on EBV (Table 2). The optimal prediction model for genetic merit of young individuals would have a regression coefficient close to 1. For each effect, the differences among the models were small, and the approaches achieved very similar inflation.

The accuracies for each model (correlations between TBV and EBV) are shown in Table 2. Compared with the MA and MADped models, the genomic prediction methods (MG and MGD) increased accuracy by ∼18% for additive effects. For dominant effects, accuracy from MGD was higher (by 32.5%) than from MADped (Table 2). In addition, the genomic models including dominance have the advantage that they provide a simple framework, compared with pedigree models, to estimate dominance effects (*e.g.*, Su *et al.* 2012; Ertl *et al.* 2013).

### Conclusions

Models using genomic additive and dominant relationships can recover information correctly. The parameterization is largely a matter of convenience, but the parameterization in terms of breeding values and substitution effects is more adequate for selection (both for ranking animals and for predicting genetic improvement). The genotypic model allows simpler estimates of the results of mate allocations. Translating estimates from one model to the other one is simple if the distribution of the allelic frequencies is available. We note, however, that estimates from the genotypic model are not comparable with pedigree-based estimates, although the transposition is fairly easy using the identities that we provide.

## Acknowledgments

We are grateful to members of the X-GEN project, Miguel Toro, and Luis Alberto Garcia-Cortes for their helpful and constructive comments. This work has been financed by INRA SELGEN metaprogram, project X-GEN (Z.G.V. and A.L.), as well as the Comisión Interministerial de Ciencia y Tecnología (CICYT) of Spain, project AGL2010-15903 (L.V.). This project was partly supported by the Toulouse Midi-Pyrénées bioinformatic platform.

## Footnotes

*Communicating editor: D.-J. De Koning*

- Received July 21, 2013.
- Accepted October 4, 2013.

- Copyright © 2013 by the Genetics Society of America