## Abstract

In plant breeding, heritability is often calculated (i) as a measure of precision of trials and/or (ii) to compute the response to selection. It is usually estimated on an entry-mean basis, since *the phenotype* is usually an aggregated value, as genotypes are replicated in trials, which stands in contrast with animal breeding and human genetics. When this was first proposed, assumptions such as balanced data and independent genotypic effects were made that are often violated in modern plant breeding trials/analyses. Due to this, multiple alternative methods have been proposed, aiming to generalize heritability on an entry-mean basis. In this study, we propose an extension of the concept for heritability on an entry-mean to an entry-difference basis, which allows for more detailed insight and is more meaningful in the context of selection in plant breeding, because the correlation among entry means can be accounted for. We show that under certain circumstances our method reduces to other popular generalized methods for heritability estimation on an entry-mean basis. The approach is exemplified via four examples that show different levels of complexity, where we compare six methods for heritability estimation on an entry-mean basis to our approach (example codes: https://github.com/PaulSchmidtGit/Heritability). Results suggest that heritability on an entry-difference basis is a well-suited alternative for obtaining an overall heritability estimate, and in addition provides one heritability per genotype as well as one per difference between genotypes.

THE idea behind measures of heritability as used in plant breeding is relatively simple: they express the proportion of the total phenotypic variance that is attributable to the average effects of genes, which in turn determines the degree of resemblance between relatives (Falconer and Mackay 2005, chapter 10). Lourenço *et al.* (2017) phrase it as “the extent to which a phenotype is genetically determined.” A phenotype is the composite of an organism’s observable traits. It results from (i) the expression of the organism’s genotype, (ii) the influence of environmental factors, and (iii) the interactions between both. Thus, heritability investigates the relationship between observed/phenotypic values with phenotypic variance and their respective underlying true genotypic values with genotypic variance . We can furthermore dissect g and ** into additive, dominance, and epistasis components to extract average effects of alleles and breeding values , with variance . Depending on whether genotypic values or breeding values are considered, we refer to broad-sense heritability or narrow-sense heritability , respectively (***e.g.*, Xu 2013). Accordingly, there is a clear distinction between and . However, note that the approach presented in this article is relevant for both measures, which is why we will refer to *heritability* in general throughout this article, unless it is necessary to refer to one of the two specifically. The true genotypic/breeding values (and their variances) are of course unknown, but can be estimated/predicted from phenotypic data.

Piepho and Möhring (2007) pointed out that heritability was originally proposed in the context of animal breeding where the observations are collected from individuals, and are used in mixed models to estimate genetic parameters or breeding values by utilizing the additive genetic relationship of all individuals. It is important to note that in animal breeding, each individual usually has its own and unique genotype. Hence, an *animal* and a *genotype* refer to the same thing: a single individual. Naturally, this means that genotypes/animals cannot be truly replicated in or across environments. Instead, the same genotype/animal may only be measured repeatedly over time (Mrode and Thompson 2014, chapter 1.3.2) and/or via its progeny/relatives (Mrode and Thompson 2014, chapter 3.4). Repeated observations of individuals are either modeled by adding a random uncorrelated permanent environment effect to the mixed model or by treating repeated observations as different traits.

However, in plant breeding, we usually have crop cultivars/lines/varieties that are represented by a large group of individual plants with exactly the same genotype. This is mainly because most crop cultivars are clones, inbred lines, or hybrids. Thus, in contrast to animal breeding, a *genotype* in plant breeding does not refer to just a single individual, but to all individual plants belonging to the same cultivar and thus sharing the same genotype. In other words, a genotype in animal breeding is a single, genetically unique individual, whereas a genotype in plant breeding is usually a group of genetically identical individuals (though there are exceptions; see the *Discussion*). This allows for true replication of a genotype in and across multiple environments, even at a single time point and thus without repeated measurements in time. Notice that plant breeders may also decide to have repeated measures over time and/or account for observations of related genotypes, but the salient feature of plant genotypes as opposed to animal genotypes is the availability of true replication.

As a consequence, the multiple observations of the same cultivar are usually aggregated to obtain a single phenotypic value. Levels of aggregation range from individual plants to means of many plants, with the same genotype tested across several locations and years in designed experiments. The need for this aggregation in plant breeding results is an additional step that is not necessary in animal breeding. Since we do not consider analyses of individual plants in this article, the phenotypic value for a genotype will always be assumed to be some sort of mean value. In this context, heritability is referred to as heritability on an entry-mean (*i.e.*, genotype-mean) basis, which stands in contrast with heritability in animal breeding and human genetics. This difference can also be deduced from the level of the heritability, which is usually much lower for complex traits in humans and animals compared to the entry-mean heritability of complex traits in plant breeding. Nowadays, the aggregation of the multiple observations into a mean phenotypic value per genotype is usually either done by best linear unbiased estimation (BLUE) or best linear unbiased prediction (BLUP). Note that the use of BLUPs here is not the same as in the framework of breeding value estimation typical for animal breeders (see *Materials and Methods*). The choice between BLUE and BLUP in plant breeding depends on the goal of the analysis and both are commonly used in practice [see Piepho *et al.* (2008), for more information and a review on the decision-making].

In early work on measures of heritability for plant breeding trials (*e.g.*, Hanson and Robinson 1963), it was assumed that (i) the trial design is completely balanced/orthogonal, (ii) genotypic effects are independent, and (iii) variances and covariances are constant, so that the arithmetic mean across all observations for a genotype is the natural choice to aggregate to a single phenotypic value. We will from now on refer to such a scenario as the *simple*, *balanced setting*. Only in a simple, balanced setting does heritability have multiple direct interpretations. Specifically, is (I) the ratio of genetic variance to phenotypic variance, (II) the slope coefficient of a linear regression of the genotypic values on the phenotypic values , (III) the squared correlation between genotypic values and phenotypic values , and (IV) the ratio of response to selection to selection differential [see Falconer and Mackay (2005), pages 160 and 186]. For the simple, balanced setting, these interpretations can be represented by(1)where is often also referred to as the genetic gain and S is the mean phenotypic value of the selected genotypes, expressed as a deviation from the population mean. Note that when heritability is estimated via (1)-IV, it is referred to as the realized heritability, as it requires to be known, which means that selection has already occurred and that the offspring have been observed. By rewriting (1)-IV, we obtain the more directly interpretable *breeder*’*s equation*:(2)which allows the prediction of the expected response to selection and is the key reason why heritability plays such an important role in plant (and animal) breeding. Hanson and Robinson (1963, page 128) get to the heart of it: “Heritability has value primarily as a method of quantifying the concept of whether progress from selection for a plant character is relatively easy or difficult to make in a breeding program. A plant breeder, through experience, can perhaps rate a series of characters on their response to selection. Heritability gives a numerical description of this concept.”

The standard method of estimating for phenotypic mean values is by plugging estimates for and into (1)-I. Given the simple, balanced setting, these variances can easily be estimated from mean squares and their respective expected mean squares of a conventional ANOVA (Yan 2014, chapter 1). For a single environment, where genotypes are tested in replicates in a completely randomized design (CRD), the observed data may be modeled as(3)where is the kth observation of the ith genotype, μ is the intercept, is the effect for the ith genotype, and is the plot error effect corresponding to . The phenotypic value of genotype i can be obtained as the arithmetic mean across replicates, , which is also the BLUE in this simple, balanced setting. Assuming that and are random, with independent distribution, zero mean, and variances and , respectively, we can calculate the phenotypic variance of as:(4)When genotypes are tested in a multienvironment trial (MET), where an environment denotes a year-by-location combination and CRDs are used at all environments, the observed data may be modeled as(5)where is the kth observation of the ith genotype at the tth environment, μ is the intercept, is the main effect for the ith genotype, is the main effect for the tth environment, is the th genotype-by-environment interaction effect, and is the plot error effect corresponding to . Again, since we have a simple, balanced setting, we can obtain phenotypic values as . Assuming that , , , and are random, with independent distributions, zero mean, and variances , , and , respectively, one may then calculate the phenotypic variance as(6)where is the number of environments (Hallauer *et al.* 2010, page 59). This approach assumes a single variance for genotype-by-environment interactions , even when multiple locations were tested across multiple years. In the latter case, one may instead model the environmental effects via separate year, and location main and interaction effects as(7)where is the kth observation of the ith genotype in the mth year and the qth location, is the intercept, is the main effect of the ith genotype, is the main effect for the mth year, is the main effect for the qth location, is the th year-by-location interaction effect, is the th genotype-by-year interaction effect, is the th genotype-by-location interaction effect, is the th genotype-by-year-by-location interaction effect, and is the plot error effect corresponding to . Once more, we have the phenotypic values for each genotype as . Assuming , , , , , , , and are random, with independent distribution, zero mean, and variances , , , , , , , and , respectively, we can calculate the phenotypic variance as:(8)where and are the number of years and locations, respectively (Becker 2011; Yan 2014). Note that when a MET is conducted either at multiple locations within a single year or at a single location but across multiple years, (8) simplifies to (6). We will refer to heritability estimates involving (4), (6), or (8) and (1)-I as the *standard heritability* .

By examining (4), (6), and (8), it can be seen that the calculation of always involves the genotypic variance and all variances. However, it does not incorporate purely environmental variance components, such as in the context of (6), and , , and in the context of (8) [but see Yan (2014, page 3)]. Hence, referring to as the phenotypic variance may be considered misleading, as it neglects the purely environmental effects and therefore is definitely not the variance of phenotypic mean values. However, a key feature of the perspective on heritability put forward in this paper is that , as defined in (3)–(8), actually coincides with half the variance of a difference between two phenotypic mean values in the simple, balanced setting. Taking (5) as an example, the variance of a genotype mean is(9)whereas the variance of a difference between means of genotypes i and j is , which in this simple, balanced setting reduces to(10)It can be argued that this focus on genotype “differences” makes more sense than a focus on genotype means or effects themselves, since the goal is to select the best-performing genotype(s) to maximize and the ranking of genotypes is uniquely determined by all pairwise differences, whereas the individual genotypic effects and the absolute (mean) performance level as such do not inform about the ranking (Searle *et al.* 1992; Piepho *et al.* 2008). Accordingly, the correct ranking of genotypes and thus the precision of estimating genotype differences is more relevant than the precision of the genotype effect estimates. Furthermore, we found that especially in some older publications, the definitions of heritability refer to differences: Knight (1948) defines heritability as “the portion of the observed variance for which differences in heredity are responsible,” while Hanson and Robinson (1963, page 125) state that the “concept of heritability originated as an attempt to describe whether differences actually observed between individuals arose from the differences in genetic makeup between the individuals or resulted from different environmental forces.” Finally, it is no coincidence that two published alternative heritability estimation methods also involve the variance of a difference between genotypes: Cullis *et al.* (2006) proposed to calculate heritability as(11)where is the mean variance of a difference of two BLUPs for the genotypic effect and is the genotypic variance. The BLUE counterpart was suggested by Piepho and Möhring (2007) as(12)where is the mean variance of a difference of two genotypic BLUEs and is the genotypic variance. Both measures reduce to in the simple, balanced setting.

These considerations suggest that heritability is best defined in terms of pairwise comparisons among genotypes, and this is in fact the key perspective taken in this paper. We aim to provide two things. First, an elaboration of how a heritability on an entry-difference basis can be defined. Second, a comparison of how the application of compares to other published methods for estimating heritability on an entry-mean basis. To do so, we give a derivation of and subsequently apply it alongside six other heritability measures in four example analyses with real data.

## Materials and Methods

### Mixed model theory

We assume a standard linear mixed model for the observed data vector **y**, which is of the form(13)where β and **u** are vectors of fixed and random effects, respectively, X and Z are the associated design matrices, and ε is a vector of random residual errors. The random effects u and ε are assumed to be independently distributed as and , such that , where denotes the multivariate normal distribution with a mean vector given as the first argument and a variance–covariance matrix as the second. To obtain estimates for β as well as predictions for u, the mixed model equations (Henderson 1986; Searle *et al.* 1992) can be solved as(14)where(15)Note that we will explicitly use the words “estimate/estimator” and “predict/predictor” for BLUEs and BLUPs, respectively. In the context of variety trials, genotype effects g may be considered to be fixed or random. The choice between these two options depends on the goal of the analysis and both are commonly used in practice [see Piepho *et al.* (2008) for a review on decision-making as well as Appendix A]. Accordingly, we consider both as complementing cases in this article.

If g is assumed to be a fixed effect, we obtain genotypic BLUEs as a subset of with ** as the submatrix of associated with ****g**. If g is assumed to be a random effect, we obtain genotypic BLUPs as a subset of with ** as the submatrix of associated with ****g**. Note that in the case where g is the only fixed/random effect in the model, its estimators/predictors and their variance matrices are not subsets of, but equal to, and , respectively. Notice further that BLUPs can always be computed from entry means (BLUEs) using a stage-wise approach (see Appendix A). Only in the simple, balanced setting is the correlation of genotypic BLUEs and BLUPs equal to unity, and thus the ranking of genotypes does not change. On a side note, it should be mentioned that one may want to use linear combinations of , such as, *e.g.*, *least squares means* (SAS Institute Inc. 2013) or *estimated marginal means* (Searle *et al.* 2012) as phenotypic mean values instead. While this certainly results in different estimated values and (co)variances, it does not affect the heritability measures proposed in this article. This is because for all methods that rely on , the focus lies on differences, and any additional variance brought in via the linear combination with nongenetic effects cancels out when computing differences.

### Example data sets

To exemplify the approach proposed in this article, we use four examples that show different levels of complexity. In examples 1 and 2, a single-environment data set (Wright 2017) is analyzed, while in example 3 a subset (*i.e.*, a single environment) of the MET data set in example 4 (Hadasch *et al.* 2016) is analyzed.

*Example 1:*

The R-package *agridat* (Wright 2017) provides yield data from an oat trial at a single environment. The trial had 24 genotypes and was laid out as an α-design, with three complete replicates and six incomplete blocks of size four within each replicate. Yet, in this first example, we will analyze the data as if the trial were laid out as CRD, and thus ignore the information about the complete replicates and incomplete blocks. Therefore, the model is simply(16)where is the kth observation of the ith genotype, μ is the intercept, is the effect for the ith genotype, and is the plot error effect corresponding to with . Accordingly, this example is a simple, balanced setting.

*Example 2:*

Here, we analyzed the same data set as in example 1, but included effects accounting for the trial’s design in the model(17)where is the observation of the ith genotype in the oth block of the kth replicate, μ is the intercept, is the effect for the ith genotype, is the effect for the kth replicate, is the effect for the oth block in the kth replicate, and is the plot error effect corresponding to with . To recover interblock information, was modeled as random with , whereas was taken as fixed.

*Example 3:*

This data set was taken from Hadasch *et al.* (2016). It comprises plot data of 89 lettuce varieties (*i.e.*, “geno1”–“geno89”) tested at three environments (*i.e.*, “env1”–“env3”), each laid out as a randomized complete block design (RCBD). The measured trait was resistance to downy mildew scored on a scale ranging from 0 to 5. We note that despite the acknowledged discreteness of this response variable, residual plots did not indicate an appreciable deviation of residuals from the normal distribution. All 89 varieties were genotyped with a total of 300 markers [*i.e.*, 95 single nucleotide polymorphisms and 205 amplified fragment length polymorphism markers, see Hayes *et al.* (2014) for details] so that a marker matrix was provided. The biallelic marker for the **i**th genotype, and the wth marker with alleles (*i.e.*, the reference allele) and , was coded as 1 for , −1 for , and 0 for and . The kinship matrix K was obtained as(18)We note that for our purposes, the scaling of ** M** is largely immaterial, but see Appendix D for different scaling options of

**and the associated interpretations of genetic variance. In this example, we analyzed only the second environment (“env2”) and thus have 89 genotypes at a single environment laid out as RCBD with three replicates, which we modeled as(19)where is the observation of the ith genotype in the kth replicate, μ is the intercept, is the additive effect for the ith genotype, is the effect for the kth replicate, and is the plot error effect corresponding to with . For simplicity, we assume here that there are no genetic effects for dominance or epistasis, but an extension of the model to cover this case is straightforward (Wellmann and Bennewitz 2011; Dias**

*M**et al.*2018; Viana

*et al.*2018).

*Example 4:*

In contrast to example 3, we here analyzed all three environments jointly. The data set is not completely balanced, as (i) environment env1 has two complete replicates, while environments env2 and env3 have three, (ii) there are no observations for genotypes geno38 and geno49 at environment env1, and (iii) there are no observations for genotype geno81 at either env1 or env3. The linear mixed model used for this analysis was(20)where is the observation of the ith genotype in the kth replicate at the tth environment, μ is the intercept, is the additive effect for the ith genotype, is the effect for the tth environment, is the th genotype-by-environment interaction, is the effect for the kth replicate at the tth environment, and is the plot error effect corresponding to . We allowed for heterogeneous error variances between environments, assuming , where ⊕ is the direct sum operator, and and are the number of observations and plot error variance at the tth environment, respectively. Finally, and were taken as fixed effects, while was taken as random with .

*Modeling of genotypic main effect in all examples:*

Note that some of the heritability measures used in this article require estimates from both the model with a fixed genotype main effect as well as the model with a random genotype main effect. For example, in (12) needs an estimate for as well as for . Thus, in this article we always fit both potential models in parallel, and accordingly obtain both and ** for the same example. For models where g is assumed to be a random effect, we assume . Additionally, we make use of the marker information in examples 3 and 4, and thus estimate additive genotypic effects a by assuming .**

Further, notice that for each example the variance components in the model with the fixed genotype main effect are fixed to the corresponding estimates in the model where g is assumed to be a random effect. This is done to rule out the potential influence of an estimation error on variance component estimation between the two models (Schmidt *et al.* 2019).

### Pairwise heritability

As shown in (1), heritability can be interpreted and thus estimated in multiple ways. To derive a *pairwise heritability* or *heritability on an entry-difference basis* , we can exploit and slightly alter (1)-III so that it applies to phenotypic and genotypic differences instead of their respective mean values. We here first derive under the assumption of independent genotypic effects with constant variance and afterward generalize to correlated genotypic effects with nonconstant (co)variances. For both scenarios, we differentiate between and .

*Independent genotypic effects with constant variance:*

Given independent genotypic effects with constant variance (as in the simple, balanced setting), and irrespective of whether or ** is used, the variance of the true difference between genotypes i and j is(21)As stated before, the conditional variance, or prediction error variance, of the predicted BLUPs given the observed data is . However, notice that the marginal variance of is , where is the submatrix of G associated with ****g** (see Searle *et al.* 1992, chapter 7.4.d.). As a result, we find that the variance of a difference between two BLUPs of genotypes i and j is(22)where is the prediction error variance of a difference between BLUPs of genotypes i and j, and can be obtained via . Furthermore, we find the covariance to be(23)Hence, making use of (21), (22), and (23), the squared correlation between the true difference and its predictor, which is the heritability of the predictor of the difference , is found to be(24)This pairwise heritability is a quantity that explicitly accounts for unbalanced data, is analogous to the coefficient of determination (CD) (Piepho 2019), and can be reported in its own right. Yet, as the number of genotypes increases, the number of genotype differences can quickly get very large . Thus, it is not a convenient statistic to report. However, the statistic can be averaged per genotype, so that a genotype-specific average heritability is obtained, *i.e.*(25)Note that (25) is similar in spirit to the repeatability for an individual genotype’s BLUP as is commonly reported in animal breeding (Laloë 1993; Mrode and Thompson 2014). Finally, if we go further and instead average (24) across all pairs of genotypes, we obtain a single average pairwise heritability as(26)Hence, has the interpretation of an arithmetic mean of all pairwise heritabilities and is (given independent genotypic effects with constant variance) the algebraically equivalent simplification of , since the pairwise heritability in (24) has a constant denominator.

If is used, we take the marginal variance of a difference between two BLUEs of genotypes i and j as(27)where is the variance of a difference between BLUEs of genotypes i and j, and can be obtained via . Moreover,(28)Analogous to (24), only this time applying (21), (27), and (28), the squared correlation between the true difference and its estimator, which is the heritability of the estimator of the difference , is found to be(29)In accordance with (25) and (26), we could now take the arithmetic mean of per genotype and across all pairs. However, the inconvenient difference compared to in (24) is that the pairwise heritabilities in (29) do not have a constant denominator. As a result, taking the arithmetic mean across all does not lead to the algebraically equivalent simplification of . However, if we take the harmonic mean instead of the arithmetic mean for all pairwise heritabilities as(30)the resulting average does indeed coincide with . Correspondingly, we take the harmonic mean per genotype as:(31)Notice that since we here assume independent genotypic effects with constant variance, both and can, in fact, also be obtained by averaging the numerator and denominator in (24) and (29) separately across pairs (see Appendix B).

*Generalization to correlated genotypic effects with nonconstant variances:*

So far, we have assumed independent genotypic effects with constant variance . Yet, the approach outlined above naturally extends to the case where effects have nonconstant variance, (*i.e.*, ) and/or are correlated (*i.e.*, ), *e.g.*, due to pedigree- or marker-based kinship. The variance of the true difference between genotypes i and j then becomes(32)When BLUP is used, the pairwise heritability in (24) accordingly generalizes to(33)When BLUE is used, the pairwise heritability in (29) generalizes to(34)These pairwise heritabilities can be averaged as before, *i.e.*, as per Equations (25) and (26) for BLUP, and Equations (31) and (30) for BLUE, to obtain and . Unfortunately, the average over all pairwise heritabilities cannot generally be simplified algebraically for either BLUE or BLUP, since in both cases neither the denominator nor the numerator of the pairwise heritability is constant. But a simplification is forthcoming if, as an approximation, we average the numerator and denominator of the pairwise heritability separately (see Appendix C).

### Other proposals to compute heritability

Besides , (11), and (12), we calculated three additional generalized heritability measures that are in common usage.

*H*^{2} Oakey:

^{2}Oakey:

An alternative measure of heritability, which is sometimes referred to as *the generalized heritability*, can be traced back to Laloë (1993) and was proposed by Oakey *et al.* (2006) in the context of plant breeding. It has gained popularity due to its ability to account for heterozygosity/covariances in (*e.g.*, Mathews *et al.* 2008; Rodríguez-Álvarez *et al.* 2018). In this approach, contrasts of the true and predicted genotypic effects are defined as and , respectively. Note that, unlike for where all differences/contrasts between genotype pairs are considered, the contrast vector c can be any linear combination of genotypic effects where the elements of c sum to 0. This allows different genotypic effects to have different heritabilities, while at the same time reducing to a single scalar quantity. Similar to our derivation, they make use of (1)-III, with the difference that it applies to and , and thus represents the squared correlation between the true and predicted genotypic contrasts defined by c. The vector c is then chosen so that this squared correlation (*i.e.*, the heritability) is maximized. *Components of the full heritability* are defined as(35)where c is an eigenvector of the matrix with constraint and λ is the associated eigenvalue. The largest among the eigenvalues, , equals the maximized squared correlation of and . Note that and due to constraints on , eigenvalues will be zero. The generalized heritability is then defined as the mean of all nonzero eigenvalues:(36)In more recent work, Rodríguez-Álvarez *et al.* (2018) show how random spatial variation in plant breeding experiments can be analyzed using tensor product P-splines, which is a two-dimensional smoothing technique. In this context, they denote the trace of D as the *effective dimension* of the genotypic effects and reexpress as:(37)They point out that the notion of effective dimension is well known in the smoothing context, where it can be interpreted as a complexity measure for a given model and its component effects.

*Simulated:*

This method was proposed by Piepho and Möhring (2007), and is also based on the squared correlation between g and in (1)-III. It first defines the variance–covariance matrix of all random effects u and the target genotypic effects g as(38)so that . They then define Ω as the variance–covariance matrix of the joint distribution of the true and predicted genotypic effects:(39)where . Subsequently, Ω can be decomposed as by singular value decomposition or Cholesky decomposition. Finally, values for g and ** can be simulated (***i.e.*, and ) for an experiment, with the same design and genotypic relationships as those underlying the actual data y, as(40)where is a vector with simulated random independent standard normal deviates. Thus, for each of the simulation runs, a new vector of independent standard normal deviates is randomly generated, so that we can obtain , and compute the squared sample correlation of and . By running a large number of simulations, we can obtain the simulated expected squared correlation of predicted and true genotypic effects as(41)Notice that even in the *simple*, *balanced setting*, this method is not exactly identical to , but it is asymptotically equivalent for an increasing number of genotypes:(42)It can be argued that is preferable over, *e.g.*, or , as it captures the entire variance–covariance structure in the data in a more comprehensive manner (Piepho and Möhring 2007; Schmidt *et al.* 2019). Further notice that this approach also allows direct simulation of the response to selection (see Appendix E).

*Reliability:*

In animal breeding and thus mostly in the context of a single observation per genotype, the reliability , also known as the CD, is a popular statistic expressing the squared correlation between predicted and true breeding value, and is closely related to heritability (Laloë *et al.* 1996; Laloë and Phocas 2003; Kuehn *et al.* 2007; Piepho 2019). We can estimate the reliability for the (genotypic/breeding value of the) ith entry as(43)where is the ith diagonal element of the matrix and is ith diagonal element of the matrix (Mrode and Thompson 2014, chapter 9.3.4.). Accordingly, does not account for the off-diagonal elements of either or [but see Mrode and Thompson (2014), Appendix D]. We can further obtain the mean reliability as

*Computation of heritability measures:*

As summarized in Table 1, we computed eight overall heritability measures (,, , , , , , and ), three genotype-wise heritability measures (, , and ), and two pair-wise heritability measures ( and ) for each of the four examples. When assuming for examples 3 and 4, we are estimating narrow-sense heritabilities, otherwise broad-sense heritabilities were computed. The former is not possible for , , and , however, as these measures implicitly assume a constant genotypic variance and no covariances. We therefore based estimates of these heritability measures on the alternative model that assumes and accordingly completely ignores marker information. Finally, we used an *ad hoc* solution to compute via (6) in example 4, where we had slightly unbalanced data and environment-specific error variance estimates. Although only 86 out of 89 genotypes were present at all three environments, and one environment only had two replicates instead of three, we considered Thus, we took the respective maximum value, which is often done in practice. Furthermore, we computed the mean error variance as .

*Software:*

All statistical analyses were done by REML with the mixed model package ASReml-R version 3.0 (Gilmour *et al.* 2009) in the R Statistical Computing Environment (R Core Team 2015) and SAS 9.4 (SAS Institute Inc. 2013).

### Data availability

The data used in examples 1 and 2 are the “john.alpha” data set provided in the R package agridat (Wright 2017), which is available at https://cran.r-project.org/web/packages/agridat/index.html. Supplemental material and the data used in examples 3 and 4 are available at https://figshare.com/articles/Lettuce_trial_phenotypic_and_marker_data_/8299493. R code is available at https://github.com/PaulSchmidtGit/Heritability.

## Results

In this section, we present overall heritability estimates first, followed by genotype-wise and finally pairwise heritability estimates. The results are accompanied by minimal interpretations, whereas an elaboration on general reasons for differences between estimates of the different methods is provided in the *Discussion*.

### Overall heritability

Figure 1 shows estimates for overall heritability obtained via all eight methods for all examples. As expected for the simple, balanced setting of example 1, all estimates are identical, except for the slightly smaller estimates obtained for and . A similar picture is found for example 2, with the exception of displaying a notably larger estimate than the rest, which is due to the method’s inability to account for the variance of the random incomplete block effect. Examples 3 and 4 show a more heterogeneous picture, with displaying estimates that are strikingly lower than the rest.

### Genotype-wise heritability

In example 1, estimates for the pairwise heritability measures and were constant across all pairs (≈0.580), and hence equal to , , , and , which in turn were equal to , , and . Analogously, the slightly lower estimates were also constant (≈0.556) across genotypes and, therefore, equal to .

In example 2, estimates for genotype-wise heritability measures were no longer constant, neither between nor within each method. Instead, each method found two unique, but relatively similar, estimates and each one for half of the genotypes, respectively, *i.e.*, 0.80792 and 0.80801 for , 0.80911 and 0.80916 for , and 0.77537 and 0.77547 for .

Figure 2 details the heritability estimates per genotype for examples 3 and 4. It can be seen that the estimates are much more heterogeneous compared to examples 1 and 2. In fact, all of the 89 genotype-wise estimates per method were different from each other in both example 3 and example 4. In both of the latter, and irrespective of the heritability method, genotypes geno49, geno82, and geno88 showed noticeably lower estimates than the rest. This is related to the fact that the additive variances of these three genotypes stand out with a size of only ≈30% of the average (results not shown). We additionally found relatively low estimates for geno49 and geno81 (*i.e.*, two of the genotypes with missing observations) in example 4 (Figure 2).

In example 3, estimates between the three methods all showed correlations larger than 0.9, with and displaying the highest correlation of ∼0.959. In example 4, the correlation between these two was still high (≈0.932), while the other two correlation estimates dropped below 0.76 (Figure 2).

### Pairwise heritability

Figure 3 details the estimated pairwise heritability measures for all examples. As mentioned above, estimates for and in example 1 were constant across all genotype pairs (≈0.580). In example 2, estimates for each method split into two clusters, respectively, yet even the overall estimate ranges were relatively small (0.802–0.817 for and 0.803–0.818 for ). The heterogeneous estimates in example 2 are due to the recovery of interblock information via the random incomplete block effect in (17). It results from the varying number of times two genotypes (or even their neighboring genotypes) appear together in the same incomplete block and thus can be compared directly (John and Williams 1995).

Estimates in example 3 mostly form a single cluster for both methods, respectively, ranging from ∼0.70–0.88 for and 0.80–0.92 for , so that the cluster is located left of/above the diagonal in Figure 3. There are three exceptionally low estimates for both methods, which are those of the differences geno49–geno82, geno49–geno88, and geno82–geno88, *i.e.*, the three genotypes with relatively small additive variances as mentioned before. Generally speaking, genotype pairs with a lower additive genotypic covariance tend to show higher estimates for both and (Figure 3).

In most aspects, results from example 4 are similar to those in example 3. The most striking difference is the second cluster of lower pairwise heritability estimates. As highlighted in Figure 3, this cluster consists only of the 88 differences from/to genotype geno81, *i.e.*, the genotype present only at a single of the three environments. Notice that while not highlighted, the differences from/to the two genotypes present at only two of the environments (*i.e.*, geno38 and geno49) also form similar clusters, but they are merely located at the lower left border of the main cluster and thus do not stand out as drastically. Furthermore, the main cluster here lies more or less on the diagonal. Finally, the range of estimates for is smaller than in example 3. This is mostly due to the variance that for example 4 in (20) could be estimated separately via the genotype-by-environment interaction effect as , whereas fo example 3 the variance is confounded within the variance of the genotype main effect, since there is no genotype-by-environment interaction effect in (19), as data from only a single environment are present.

## Discussion

### Variations of the mixed model for replicated plant data

A consequence of having multiple individuals represent the same genotype is that the definition of the genetic variance requires careful consideration; rather unproblematic are genetically completely homogeneous cultivars such as inbred lines, doubled-haploid (DH) lines, clones, or hybrids. Yet, it should be noted that when it comes to genetically heterogeneous population cultivars in rye, for example, the matter is no longer straightforward (Bernal-Vasquez *et al.* 2017). Note that estimates for , increased with decreasing covariance as seen in the *Results* section for pairwise heritability. Note further that when we assumed for the (additive) genotype main effect in the analysis of example 4, for simplicity we did not correspondingly make use of K for modeling the genotype–environment interaction effect, but instead assumed . Yet, to account for marker-by-environment interaction, one may instead assume *e.g.*, , where ⊗ denotes the Kronecker product. Although Schulz-Streeck *et al.* (2012) found that the improvement of accounting for marker-by-environment interaction can be negligible, the reader may easily decide to include K for random genotype-interaction effects and it may actually be advisable if the variances for those effects are expected to be relatively large (Bernal-Vasquez *et al.* 2017).

### Relatedness of heritability methods

When both and ** are proportional to identity matrices, all heritability measures give identical estimates. However, this only holds for a linear random model, since even a single fixed effect [***e.g.*, μ in (16)] results in nonzero off-diagonal elements in . Yet, it should be noted that such a purely random model is not practically relevant in plant (and animal) breeding.

When and ** have compound symmetry structure (***i.e.*, simple, balanced setting), all methods except and give identical estimates, which we confirmed in the simple, balanced setting of example 1. Furthermore, both methods display that same value across all genotypes and pairs. Since we always had to assume for , , and , example 3 technically reduces to a simple, balanced setting for these measures as well, since it involves a single environment laid out as RCBD with balanced data. Accordingly, these three methods yielded identical estimates in that scenario as well (Figure 1). The estimates for differ, because off-diagonal elements are ignored in its estimation. However, notice that with increasing , the off-diagonal elements of approach 0, and thus approaches the other measures and, as stated before, so does .

In the case where and ** is an unstructured matrix, and give identical estimates. This is analogously true for and , and for (rather than ) being an unstructured matrix. Rodríguez-Álvarez ***et al.* (2018) show a connection between and . They point out that given , the mean reliability can be expressed as and thus as a special case of , ignoring the number of zero eigenvalues. This is in agreement with Laloë (1993), and our results confirm this relationship for examples 1 and 2 (results not shown). However, we did not find it to be true for examples 3 and 4, where . Note that in their article Rodríguez-Álvarez *et al.* (2018) refer to the mean reliability as in Cullis *et al.* (2006), even though the latter method is based on differences [see (11)], whereas the former is based on individual genotypes [see (43)].

Interestingly, though not practically very relevant, and give identical estimates per genotype for diagonal and **.**

Finally, when displays covariances (*e.g.*, as ) and is an unstructured matrix, none of the five measures that are appropriate for this case (*i.e.*, , , , , and ) give identical results. Nevertheless, with the exception of , heritability estimates within examples 3 and 4 were relatively similar (Figure 1). This is true even for the methods that did not include kinship information (*i.e.*, *, , and ). Yet, when Ould Estaghvirou **et al.* (2013) simulated phenotypic and marker data for a single location, they found that these three methods gave lower estimates than the rest.

The most striking results are those for in examples 3 and 4. Notice that Lourenço *et al.* (2017) also found outstandingly low estimates for in models with for two real data sets and one simulated data set. This raises the question whether is suited for cases where . While it is true that estimates are equal to in the simple, balanced setting (of example 1), this is only a necessary, but not a sufficient, condition for to be generally applicable also with correlated genotypes. It can be argued that and are similar in the sense that both are based on contrasts between genotypes. However, the important difference is that for the contrasts of interest (*i.e.*, all genotype pairs) are chosen according to the breeders’ goals, while the contrasts in are determined by the data set. Laloë (1993) shows that the smallest and largest nonzero eigenvalues, , are the lower and upper limit for the heritability of all possible contrasts of genotypes. Accordingly, they are also the lower and upper limit for all , which is confirmed by our results (Figure 4). It may be noted that is a simple average of “canonical” heritabilities (eigenvalues) corresponding to canonical contrasts and derived from the canonical decomposition of covariance matrices. As such, low heritabilities indicate a low design efficiency, as the smallest eigenvalue appears to be a good indicator of the robustness of the design and a measure of the part of the genetic trend that can be predicted (Laloë and Phocas 2003). In contrast, a pairwise difference is a combination of low and high heritability contrasts that corresponds to the weighted mean of canonical heritabilities. This may go some way toward explaining the observed discrepancies in estimates of and the alternatives considered here.

### Genotype-specific information gained from

Figure 2 and Figure 3 are good examples of how can give a more in-depth insight into the outcome of a plant breeding trial. More precisely, the results show that investigating genotype-specific, or even genotype-contrast-specific, heritability measures can (i) summarize how variable/ambiguous an overall heritability estimate is and (ii) point out individual genotypes that stand out.

The first point may seem obvious by now, but it must be realized that whenever only a single (overall) heritability estimate is obtained, there is no information whatsoever about the dispersion of that estimate, only about its central tendency. We argue that investigating and/or gives more comprehensive insight, since they explicitly show the variability across genotypes and thus, bringing us to the second point, point out exceptional genotypes such as geno38, geno49, geno81, geno82, and geno88 in example 4. Notice that we identified two separate reasons for why these five genotypes were exceptional (low additive variance and missing observations; see *Results* section). Moreover, looking at the cluster of estimates for geno81 in example 4 in Figure 3, it becomes clear how a single genotype can display notably lower pairwise heritability estimates than the rest. These estimates are obviously decreasing the overall heritability estimate for this example, which would be especially unfavorable in a scenario where the breeder does not have great interest in this particular genotype. In the end, none of this would become apparent by only looking at overall heritability estimates (Figure 1).

### What do we really want heritability for and how does help?

There are two main reasons why heritability on an entry-mean basis is of interest in plant breeding. On one hand, it is plugged into the breeder’s Equation (2) to predict the response to selection. On the other hand, it is a descriptive measure used to assess the usefulness and precision of results from cultivar evaluation trials. In the simple, balanced setting, heritability is suited for both purposes. However, whenever we depart from the simple, balanced setting, is no longer suited for either of the two purposes. Furthermore, any alternative measure can ultimately only aim to generalize heritability as a descriptive measure. Regarding the purpose of heritability as a mean to estimate the response to selection, we reiterate the view of Piepho and Möhring (2007): instead of trying to approximate heritability using some *ad hoc* measure, one should simulate the response to selection directly. As stated before, this can be done via the same simulation-based approach used to obtain and is exemplified in their work (also see Appendix E).

In terms of the descriptive function of heritability, we believe that is a valuable extension of heritability on an entry-mean basis, because its genotype-wise and pair-wise estimates give more detailed insight, and ultimately allow for better decision-making by the breeder. Furthermore, we would like to argue that its derivation, calculation, and interpretation is rather intuitive, which makes us optimistic about its acceptance in practice. Our view is further supported by Laloë (1993), who also suggested extending reliability to pairwise contrasts in the animal breeding framework.

### Conclusions

Irrespective of whether a breeder ultimately decides to apply , we think that understanding the idea behind it alone raises awareness for the mentioned problems with heritability outside the scope of simple, balanced settings and thus can potentially improve the use of heritability in breeding programs. In the end, all heritability measures aim to inform about the same underlying subject matter, and, albeit via different methodologies, they all do. What should therefore be held above all, especially given the mentioned ambiguity problems in a nonsimple, unbalanced setting, is to report how heritability was estimated. Hanson and Robinson (1963, page 612) put it in this crucial context of reproducible science by pointing out that “One must extrapolate in the real world, and one must use estimates of heritability derived by someone else, especially with new crops in new environments. Therefore, it is important to specify exactly how published estimates were obtained in order that others may extrapolate.”

## Acknowledgments

We thank Ivan Simko and Ryan J. Hayes (US Department of Agriculture-Agricultural Research Service, Crop Improvement and Protection Research Unit, Salinas, CA) for providing the lettuce data. We also wish to thank two anonymous reviewers for their very helpful comments. This research was funded by the German Research Foundation (Deutsche Forschungsgemeinschaft grant PI 377/18-1).

## Appendix

### Appendix A: Computing From in a Stage-Wise Approach

Consider the analysis of a series of trials where the aim is to obtain . One may use a single-stage analysis with a random genotype main effect. Alternatively, it is also possible to use a stage-wise analysis, where the first stage obtains from which are predicted in the second stage. Both approaches are equivalent, if variance components are known. In practice, small differences are encountered, as variances need to be estimated (Piepho *et al.* 2012a).

#### Single-stage analysis

Assume that we can write the general single-stage model in (13) as(45)where is a vector of fixed across-environment effects (subscript A), is a vector of random across-environment effects with , is a vector of random between-environment effects (subscript B) with , is a vector of random within-environment effects (subscript W) with , ε is a vector of plot errors with , and y is the observed data vector with , where [see Piepho *et al.* (2012a)]. Thus, G here is a block diagonal matrix with blocks , **,** and ** on its diagonal. Given a MET similar to example 4, would be a general intercept, the random genotype main effect (***i.e. g*), the random environment main effect and random genotype-by-environment interaction effect, the random block effect at each environment, and ε the plot error. Accordingly, would correspond to the kinship matrix K, whereas and would be proportional to identity matrices.

#### Two-stage analysis

Model (45) has a two-stage representation, with the first stage model given by(46)where effects are defined as in (45), while the subscript 1 denotes that the corresponding parameters are estimated in this first stage and thus . Notice that in contrast to (45), is missing here, since the genotype main effect needs to be taken as fixed at this stage (Piepho *et al.* 2012a) and is therefore comprised in . For simplicity, we can assume that here the genotypic main effects are the only fixed effects so that are the genotype means. Accordingly, we can obtain genotypic BLUEs as(47)with variance(48)In the second stage we then have(49)where and just like in (45), the random genotype main effect where corresponds to the kinship matrix K. Notice that and . Accordingly, we have and for the mixed model equations we find(50)It is important to note that we can write(51)and(52)Finally, plugging in (48) and (49) into (50), while making use of (51) and (52), it can be seen that the mixed model equation solutions for the second stage are equivalent to those for the single-stage analysis.

### Appendix B: An Alternative Estimation for in the Case of Independent Genotypic Effects with Constant Variance by Averaging Separately Across Numerator and Denominator

Given independent genotypic effects with constant variance, averaging numerator and denominator of in (24) separately first leads to the algebraically equivalent simplification(53)Analogously, we find for in (29) that

(54)### Appendix C: An Approximation for in the General Case

It can be shown (Ould Estaghvirou *et al.* 2013, Additional Files 1) that(55)where with being a vector of ones. Thus, is a matrix that centers for the overall mean, such that . Analogously, we have(56)such that and finally(57)such that . By inserting (55), (56), and (57) into (34), we get(58)where, loosely speaking, captures the genotypic variance, captures the environmental variance, and hence captures the phenotypic variance. Notice that (58) coincides with Method 4 of Ould Estaghvirou *et al.* (2013). Based on (33), we find correspondingly

### Appendix D: A Standardization of M To Directly Capture the Total Genotypic Variance

As shown in Appendix C, captures the average genotypic variance of a difference between two genotypes. In accordance with the notion of this article, we could obtain half the average variance of a difference as an estimate for *the total genotypic variance*. Notice that this approach is in line with the average semi variance used by Piepho (2019) and can be defined as(60)In the case of , we can do some useful rearrangements:(61)where is column mean-centered. Notice that is equal to what is referred to as Z in VanRaden (2008) (page 4416). Notice further that the standardization presented here works for an arbitrary coding of markers and arbitrary scale. As a result, the trace of directly returns. We can standardize further:(62)and define . Now the total genotypic variance simplifies to(63)which simplifies (58) and (59) to(64)and(65)Thus, if is standardized as shown above, the genotypic variance we are trying to capture is conveniently estimated directly as the variance for a. However, it should be noted that both and are singular, and thus may lead to problems depending on the statistical software being used: ASReml-R 3.0 (Gilmour *et al.* 2009) will show an error message, while PROC MIXED in SAS 9.4 (SAS Institute Inc. 2013) will effectively compute a modified version of the mixed model equations (Piepho *et al.* 2012b; SAS Institute Inc. 2017).

We would like to point out that neither nor are equal to the genomic relationship matrices proposed in VanRaden (2008). Their first method obtains the general relationship matrix as:(66)where are the mean frequencies of the second allele across all genotypes at marker and can also be expressed as . Their second method obtains the matrix as where L is a diagonal with elements(67)Since we have , we find that opposed to and therefore , (66) and (67) additionally require a division involving allele frequencies. It may also be pointed out that in a population with Hardy–Weinberg equilibrium, corresponds to the heterozygosity of the markers. However, it should be kept in mind that completely genetically homogeneous cultivars—such as inbred lines, DH-lines, clones, or hybrids—are definitely not in Hardy–Weinberg equilibrium. Finally, note that if we approximate via (59) and implement into , we find that even in the general case. Analogously, we have .

### Appendix E: Directly Simulate Response to Selection

As proposed by Piepho and Möhring (2007), one may directly simulate the response to selection via the same approach used to obtain in this article. Since in (40) we simulate pairs of true and predicted genotypic values, we can select a subset with the best values for and compute the response to selection for the sth simulation run as(68)where is the selected subset of genotypes in the sth simulation run and is the number of selected genotypes (*i.e.* ). Analogous to (41), we can then obtain the simulated expected response to selection as(69)Note that, in principle, the simulation-based approach could also be adapted to simulate the joint distribution of true and estimated genotypic values. Furthermore, one could then compare the simulated expected response to selection values from both simulation approaches to determine whether to ultimately model the genotypic main effect as fixed or random. Yet, not least because of the ability to account for kinship information, we generally expect an analysis with random genotypic effects to be more useful than one with fixed genotypic effects [see Piepho *et al.* (2008)].

## Footnotes

Supplemental material available at FigShare: https://doi.org/10.25386/genetics.7370168.

*Communicating editor: A. Charcosset*

- Received March 18, 2019.
- Accepted June 17, 2019.

- Copyright © 2019 by the Genetics Society of America