## Abstract

Heritability is a central parameter in quantitative genetics, from both an evolutionary and a breeding perspective. For plant traits heritability is traditionally estimated by comparing within- and between-genotype variability. This approach estimates broad-sense heritability and does not account for different genetic relatedness. With the availability of high-density markers there is growing interest in marker-based estimates of narrow-sense heritability, using mixed models in which genetic relatedness is estimated from genetic markers. Such estimates have received much attention in human genetics but are rarely reported for plant traits. A major obstacle is that current methodology and software assume a single phenotypic value per genotype, hence requiring genotypic means. An alternative that we propose here is to use mixed models at the individual plant or plot level. Using statistical arguments, simulations, and real data we investigate the feasibility of both approaches and how these affect genomic prediction with the best linear unbiased predictor and genome-wide association studies. Heritability estimates obtained from genotypic means had very large standard errors and were sometimes biologically unrealistic. Mixed models at the individual plant or plot level produced more realistic estimates, and for simulated traits standard errors were up to 13 times smaller. Genomic prediction was also improved by using these mixed models, with up to a 49% increase in accuracy. For genome-wide association studies on simulated traits, the use of individual plant data gave almost no increase in power. The new methodology is applicable to any complex trait where multiple replicates of individual genotypes can be scored. This includes important agronomic crops, as well as bacteria and fungi.

- marker-based estimation of heritability
- GWAS
- genomic prediction
*Arabidopsis thaliana*- one-
*vs.*two-stage approaches

NARROW-SENSE heritability is an important parameter in quantitative genetics, determining the response to selection and representing the proportion of phenotypic variance that is due to additive genetic effects (Jacquard 1983; Nyquist and Baker 1991; Ritland 1996; Visscher *et al.* 2006, 2008; Holland *et al.* 2010; Sillanpaa 2011). This definition of heritability goes back to Fisher (1918) and Wright (1920) almost a century ago. In plant species for which replicates of the same genotype are available (inbred lines, doubled haploids, clones), a different form of heritability, broad-sense heritability, is traditionally estimated by the intraclass correlation coefficient for genotypic effects, using estimates for within- and between-genotype variance. Broad-sense heritability is also referred to as repeatability and gives the proportion of phenotypic variance explained by heritable (additive) and nonheritable (dominance, epistasis) genetic variance.

With the arrival of high-density genotyping there is growing interest in marker-based estimation of narrow-sense heritability (WTCCC 2007; Yang *et al.* 2010, 2011; Speed *et al.* 2012; Vattikuti *et al.* 2012; Visscher and Goddard 2014). These estimates are obtained from mixed models containing random additive genetic effects, whose covariance structure is estimated from genetic markers. While marker-based heritability estimates have received much attention in human genetics, these are rarely reported for plant traits, despite their relevance in evolutionary genetics, in the dissection of complex traits, and in the ongoing debate on missing heritability (Manolio *et al.* 2009; Eichler *et al.* 2010; Brachi *et al.* 2011; Lee *et al.* 2011; Zuk *et al.* 2012). Heritability estimates are also of great relevance to plant breeders, as they give a measure for the breeding potential of a trait. In addition, state-of-the-art phenotyping platforms are making experiments more reproducible, increasing the relevance of marker-based estimation of heritability, as well as comparison of estimates from different experiments.

Although marker-based heritability estimation for plant traits can in principle be achieved within the same analytic framework that has been developed for human traits, there are important differences. First, heritability of human traits is usually estimated using panels of unrelated individuals, to avoid confounding with geographical or environmental effects (Browning and Browning 2011). For most plant species panels of unrelated individuals are not available, as plant genotypes will often share ancestry or adaptation, inducing dependence between genotypes. At the same time, however, plant genotypes can be evaluated under the same experimental conditions (*e.g.*, common garden or growth chamber), and the usual assumption is that this eliminates genotype–environment correlations. Second, plant genotypes are often phenotyped in several genetically identical replicates. This is the case for all so-called immortal populations (Keurentjes *et al.* 2007; Wijnen and Keurentjes 2014). Mixed-model analysis can then be performed either on the individual plant (or plot) data or on genotypic means. In the literature on multi-environment trials (Smith *et al.* 2001, 2005; Oakey *et al.* 2006; Piepho and Williams 2006; Piepho *et al.* 2006, 2012; Boer *et al.* 2007; Verbyla *et al.* 2007; Stich *et al.* 2008; Möhring and Piepho 2009; Van Eeuwijk *et al.* 2010; Welham *et al.* 2010; Malosetti *et al.* 2013) these approaches are referred to as respectively one-stage and two-stage. These works consider mostly populations for which a pedigree is available, typically experimental populations. In the context of genomic prediction and genome-wide association studies (GWAS) for natural populations, mixed-model analysis is usually performed using a two-stage approach. The (usually tacit) assumption is that the genotypic means and kinship coefficients contain all the relevant information for estimating the genetic and residual variance. Here we investigate the feasibility of marker-based estimation of heritability with one- and two-stage approaches and look at how heritability estimates affect genomic prediction with the best linear unbiased predictor (G-BLUP) and GWAS. Although our analysis of observed phenotypes focuses on the model plant *Arabidopsis thaliana*, the asymptotic variances of different heritability estimators were also computed for diverse panels of *Zea mays* (Riedelsheimer *et al.* 2012; Van Heerwaarden *et al.* 2012) and *Oryza sativa* (Zhao *et al.* 2011).

In both published data (Atwell *et al.* 2010) and new experiments, we found very large standard errors and sometimes unrealistically high estimates of heritability, which could not be explained by varying linkage disequilibrium (Speed *et al.* 2012). Much better heritability estimates were obtained when mixed-model analysis was performed at the individual plant level. These estimates were based on kinship information as well as additional information on within-genotype variability, and in simulations they were found to be up to 13 times more accurate than heritability estimates based on genotypic means. In genomic prediction, correlation between simulated and predicted genetic effects increased in some cases by as much as 49%. This is a substantial improvement that shows the importance of accurate heritability estimates in plant breeding programs.

All reported heritability estimates can be obtained using our R-package heritability, which is freely available from CRAN (http://cran.r-project.org/web/packages/heritability/index.html). In contrast to existing packages such as emma (Kang *et al.* 2008), rrblup (Endelman 2011), and synbreed (Wimmer *et al.* 2012), it provides confidence intervals for heritability estimates. We also present software for GWAS: our program scan_GLS can efficiently perform GWAS directly on the individual plant or plot-level data, as well as on the means, incorporating a nondiagonal error covariance structure.

## Materials and Methods

We assume a natural population of *n* genotypes, where for each genotype a quantitative trait is measured on a number of genetically identical replicates of immortal lines. These replicates can refer to either individual plants (*e.g.*, *Arabidopsis*) or plots in a field trial (*e.g.*, maize). For convenience we use the terms “replicate” and “individual plant data” as synonyms throughout. In either case, the observations on replicates should not be confused with having multiple observations on the same individual. We focus on inbred lines, but apart from a few different constants, all expressions are valid for any diploid species. We do not partition the environmental variance into different contributing factors (see, *e.g.*, Visscher *et al.* 2008); hence the environmental variance just equals the error variance. For simplicity we first present marker-based estimation of heritability for a completely randomized design, in the absence of additional covariates. The expressions for more general designs are given in *Appendix A*, which also provides a brief overview of the existing methodology for marker-based heritability estimation. Details on G-BLUP, GWAS, and the phenotypic data are given in *Appendixes C*, *D*, and *E*.

### Genotypic data

We analyze simulated and real traits for subpopulations of the RegMap, which contains 1307 worldwide accessions of *A. thaliana* that have been genotyped at 214,051 SNP markers (Horton *et al.* 2012). We considered four subpopulations: 298 Swedish accessions [Swedish RegMap (Horton *et al.* 2012; Long *et al.* 2013)], 204 French accessions [French RegMap (Horton *et al.* 2012; Brachi *et al.* 2013)], 350 accessions from the HapMap population (Li *et al.* 2010), and a subset of 250 accessions that we refer to as the *structured RegMap* (accession IDs are given in Supporting Information, Table S1). The structured RegMap accessions were chosen to have large differences in genetic relatedness (*i.e.*, variation in kinship coefficients, across pairs of accessions). These differences were hence largest in the structured RegMap and smallest in the HapMap (Figure S1 and Figure S2). For the asymptotic distributions of heritability estimators given below, we also considered marker-based kinship matrices for three populations of crop plants: the panel described in Van Heerwaarden *et al.* (2012) (*Z. mays*, 400 accessions), the panel described in Riedelsheimer *et al.* (2012) (*Z. mays*, 280 accessions), and the panel from Zhao *et al.* (2011) (*O. sativa*, 413 accessions).

### Phenotypic data

We estimated heritabilities for two traits from the literature and four traits from new experiments. Two flowering traits from Atwell *et al.* (2010) were analyzed: days to flowering time under long day and vernalization (LDV) and days to flowering time under long day (LD). In two new experiments (*Appendix E*), leaf area (LA) 13 days after sowing was measured for the Swedish RegMap and the HapMap, using the same automated phenotyping platform. In the final experiment, the HapMap was phenotyped for bolting time (BT) and leaf width (LW). Trait descriptions and abbreviations are given in Table 1. In all experiments the individual plants were grouped into complete blocks, but due to nongerminating seeds and dead plants, there is some unbalance for all of the traits.

### Marker-based estimation of heritability

Let *K* denote the genetic relatedness or kinship matrix of the genotypes, with elements (1)(see, *e.g.*, Patterson *et al.* 2006 and Goddard *et al.* 2010). The numbers *x _{i}*

_{,}

*∈ {0, 2} denote the minor allele count [because we focus on inbred lines, there is a small difference with the standard expression for outbreeders under Hardy–Weinberg equilibrium, where*

_{l}*x*

_{i}_{,}

*∈ {0, 1, 2}, and the constant 4 in (1) is replaced by 2] at marker*

_{l}*l*for genotype

*i*, and

*f*∈ [0, 1] is the minor allele frequency at marker

_{l}*l*. The phenotypic response of replicate

*j*of genotype

*i*is modeled as (2)where

*G*= (

*G*

_{1}, … ,

*G*) has a distribution, and the errors

_{n}*E*

_{i}_{,}

*have independent normal distributions with variance . We aim to estimate the heritabilitywhich has been referred to as the “chip” heritability (Speed*

_{j}*et al.*2012) and under certain assumptions equals narrow-sense heritability (

*Appendix A*). In contrast to the

*line*heritability , the denominator contains the residual variance for a single individual . Given restricted maximum-likelihood (REML) estimates and ,

*h*

^{2}can be estimated by (3)where the subscripts

*r*stress the fact that and are estimates directly obtained from the replicates. Alternatively,

*h*

^{2}can be estimated using the genotypic means (4)Given REML estimates and for model (4), we have the heritability estimate (5)where the subscripts

*m*indicate that the estimates are based on genotypic means. We omit the letters

*r*and

*m*either if these are clear from the context or when a statement holds for both models (2) and (4). In human association studies,

*r*is usually 1, and and are identical.

### Repeatability and broad-sense heritability

Given a completely randomized design with r replicates, repeatability, or intraclass correlation can be estimated by(6)where MS(G) and MS(Env) are the mean sums of squares for genotype and residual error, obtained from analysis of variance (ANOVA). In the case that MS(*G*) < MS(Env), is set to zero. See Singh *et al.* (1993) or (in the context of sib analysis) Lynch and Walsh (1998, p. 563). Here we stick to the widely used notation , although repeatability equals broad-sense heritability (*H*^{2}) only under the assumption that all differences between genotypes are indeed genetic and not due to, *e.g.*, genotype–environment correlation. In any case, no additional information on the genetic structure is used here, and hence the mean sums of squares for genotype contain all genetic effects, not only the additive ones. Consequently, is an estimator of (an upper bound on) broad-sense heritability.

It is known to have a small bias (Nyquist and Baker 1991; Singh *et al.* 1993), which tends to zero when the number of genotypes increases. Additional fixed effects can be included in the ANOVA, which may reduce MS(Env) and give a higher heritability estimate. In the case that the genotypes have differing numbers of replicates, genetic variance is estimated with , where (7)is the effective number of replicates (Lynch and Walsh 1998, p. 559). For a balanced design, .

### Confidence intervals

Confidence intervals for broad-sense heritability *H*^{2} were obtained from classical theory (*e.g.*, Lynch and Walsh 1998, p. 563) (for easy reference see File S1). Confidence intervals for *h*^{2} were constructed using and and the inverse average-information (AI) matrix, obtained from the AI-REML algorithm (Gilmour *et al.* 1995). This 2 × 2 matrix provides an estimate of the covariance matrix of . The delta method (see, *e.g.*, Van der Vaart 2000) applied to the function then gives the asymptotic distribution of , from which a confidence interval for *h*^{2} is calculated. In the case that the heritability is low or high, confidence intervals may be partly outside [0, 1]. We set all negative values to zero and all values larger than one equal to one. An alternative is application of the delta method to the function . This gives confidence intervals for , which are then back transformed to confidence intervals for heritability. The latter intervals are referred to as “log-transformed” and the other intervals as “standard.”

When the likelihood as a function of *h*^{2} is monotonically increasing, the confidence intervals can become numerically unstable, alternating between [0, 1] and [1, 1]. However, we found that using the AI matrix obtained by setting *h*^{2} = 1 − *ε*, the interval was always very close to [0, 1], for any *ε* > 0. We therefore defined the interval to be [0, 1] in the case of a monotonically increasing likelihood.

### Simulations

Each simulated trait consists of *r* = 3 replicates for a subset of 200 accessions randomly drawn from a subpopulation of the *Arabidopsis* RegMap (Horton *et al.* 2012). We simulate according to model (2), except that the genetic effects *G _{i}* are not purely polygenic, but a mixture of QTL effects at a small number of markers and a polygenic signal with genetic structure given by the kinship matrix defined in (1). More details are given in

*Appendix B*. Different genetic architectures were considered, but all simulated genetic effects were additive. Hence, broad- and narrow-sense heritabilities were equal, and , , and were in this case different estimators of the (same) simulated heritability.

For the comparison of heritability estimates we simulated traits for the structured RegMap, the HapMap, and the Swedish and French RegMap. For comparing one- and two-stage association mapping and genomic prediction only the structured RegMap and the HapMap were considered. For genomic prediction and the comparison of heritability estimates we simulated three levels of heritability, for each subpopulation: low (0.2), medium (0.5), and high (0.8). For each heritability level, 5000 traits were simulated. For the comparison of one- and two-stage association mapping, 1000 traits were simulated with heritability 0.5. In all simulations, a completely randomized design was assumed.

## Results

### Asymptotic variance

Applying general mixed-model theory (Casella and McCulloch 2006, p. 387) to model (2), it follows that the REML estimators are asymptotically Gaussian with covariance (8)where for , *X* = 1* _{nr}*, and

*Z*is the

*nr*×

*n*incidence matrix assigning plants to genotypes. The number of replicates

*r*is considered fixed here. The asymptotic arguments are all with respect to

*n*, the number of genotypes. The behavior of as a function of

*n*,

*h*

^{2}, and

*K*can also be derived from approximations recently proposed in Visscher and Goddard 2014, independently from the present work. Although this is not made explicit in the notation, , depends on the true (and usually unknown) values and through

*P*and

*V*. If are estimated based on genotypic means following model (4), has asymptotic covariance , which is obtained if we replace in (8)

*ZKZ*by

^{t}*K*and

*X*= 1

*by*

_{nr}*X*= 1

*and substitute in the definition of*

_{n}*P*. The asymptotic variance of the heritability estimators and can now be obtained by application of the delta method (Van der Vaart 2000) to the function , with gradientIt then follows that given the true and ,(9)It is easily verified that both variances do not depend on the absolute values of and , but only on the ratio . To explore the potential gains in accuracy due to the use of individual plant data, we computed and the ratio of the two standard deviations, for several populations in

*A. thaliana*,

*Z. mays*, and

*O. sativa*(Table 2) and three heritability levels (

*h*

^{2}= 0.2, 0.5, 0.8).

The use of individual plant data gave a substantial improvement in accuracy for all populations, which was largest for the *Arabidopsis* HapMap [81% reduction with respect to , for *r* = 4 and *h*^{2} = 0.8] and smallest for the structured RegMap, the rice population from Zhao *et al.* (2011), and the maize population from Riedelsheimer *et al.* (2012) (13–15% reduction when *r* = 2 and *h*^{2} = 0.2). For the maize population from Van Heerwaarden *et al.* (2012) and the other *Arabidopsis* populations the improvements were similar. The standard deviation of decreased with increasing heritability and increasing numbers of replicates. The ratio of the standard deviations of over those of also decreased substantially with increasing heritability, but remained similar when increasing the number of replicates beyond 2.

It should be emphasized that the ratios in Table 2 are asymptotic: although the expressions in (9) depend on the kinship matrix *K* of a finite population, they may be good approximations only for (very) large populations with population structure similar to that defined by *K*. Such populations could be modeled as a function of population genetic processes such as drift and selection (Rousset 2002; Fu *et al.* 2003). However, for real plant populations marker information is available for at most a few thousand genotypes and more often several hundred. To assess variance and bias of and in such populations we used simulated traits.

### Heritability estimates for simulated data

We analyzed simulated traits for the four populations of *A. thaliana*, for different genetic architectures. We also used the simulations to compare these marker-based estimators with the broad-sense heritability estimator , which ignores genetic relatedness, and to investigate the quality of heritability confidence intervals.

Heritability estimates for the structured RegMap and HapMap are given in Figure 1 and Table 3. Apart from in the HapMap, none of the estimators showed marked bias. In particular, had negligible bias, despite the fact that it does not account for genetic relatedness. As in the asymptotic results, had lower variance than , the difference being largest for the HapMap. The magnitude of the differences was larger than expected from the asymptotic variances in Table 2, and in the HapMap standard errors were up to 13 times larger. As in the asymptotic framework, differences were smallest for the structured RegMap and largest for the HapMap.

Remarkably, also was considerably more accurate than . The additional marker information appeared to be less important here than the loss of information on within-genotype variability. The estimator , which includes both sources of information, outperformed . Differences were largest for the structured RegMap (12.7% reduction when *h*^{2} = 0.2, 42.6% reduction when *h*^{2} = 0.8), where the large differences in relatedness provide more information than in the case of the HapMap (almost no difference when *h*^{2} = 0.2 and 7% reduction when *h*^{2} = 0.8). Simulation results for the Swedish and French RegMap are given in Figure S3 and Table S2. In terms of standard deviations of , , and , these populations were somewhere in between the structured RegMap and HapMap.

For a substantial proportion of the simulated traits in both the HapMap and the structured RegMap, we observed a phenomenon that was not apparent from the asymptotic arguments above: was close to or equal to 1, in which case the likelihood as a function of *h*^{2} was monotonically increasing. This behavior is most likely to occur when the genetic relatedness matrix is close to compound symmetry (*i.e.*, all genotypes being equally related), in which case the likelihood is constant in *h*^{2} (File S3). The small sample size can then lead to monotone likelihood profiles (Figure S4). Indeed = 1 occurred most often for the HapMap: even for a simulated heritability of 0.5, > 0.99 for 882 of the 5000 traits (Figure 1). For the HapMap and simulated *h*^{2} = 0.2, we also observed = 0.

The simulations were repeated for a genetic architecture where only 10% of the genetic variance consists of polygenic effects and the remaining 90% is determined by a single QTL (File S4). Even in this extreme scenario, still performed very well, having almost negligible bias. Also the improvement in standard error relative to remained similar.

### Confidence intervals

For all simulated traits we calculated both standard and log-transformed confidence intervals for narrow-sense heritability (*h*^{2}), using genotypic means as well as individual plant data. Also confidence intervals for broad-sense heritability (*H*^{2}) were calculated. (We recall that only additive genetic effects were simulated, and hence *h*^{2} = *H*^{2}.) Coverage and width of confidence intervals are given in Table 4 and Table S3 (Swedish and French RegMap). The standard 95% intervals obtained from the individual observations (*i.e.*, those associated with ) had around 95% coverage in the simulations, while coverage was below 95% for the intervals based on genotypic means. The latter intervals also had larger width, which relates to the higher variance of in simulations. For the HapMap average width was larger than 0.5, even at a simulated heritability of 0.2. For the simulated traits with , the corresponding confidence interval was [0, 1].

Differences between the log-transformed and the standard intervals were small for . For the use of the log-transformed intervals gave bigger improvements, although coverage was still below 95%. Also the confidence intervals for broad-sense heritability had insufficient coverage. This is due to model misspecification: the analysis of variance does not incorporate the genetic structure, and uncertainty in the estimates is therefore underestimated. Coverage was still around 93% for the HapMap, whereas for the structured RegMap it decreased to 67.6% for *h*^{2} = 0.8.

### Heritability estimates for real data

The heritability estimates , , and were calculated for the six traits. Genotypic means were calculated as the best linear unbiased estimators (BLUEs) for the genetic effects, in a linear model containing fixed effects for both genotype and additional design effects (*Appendix A*). In the calculation of the nondiagonal covariance structure of the BLUEs was taken into account. In the case of and , design effects were directly included in the mixed model.

For the flowering traits from Atwell *et al.* (2010), our estimates of broad-sense heritability were 0.858 for LDV and 0.966 for LD (Table 5). For several reasons (File S2), these values were lower than those reported in supplementary table 7 of Atwell *et al.* (2010): 0.94 for LDV and 0.99 for LD. For both traits, marker-based heritability estimates based on individual plant data () were substantially lower, but still high (0.80 for LDV and 0.93 for LD). Marker-based heritability estimates obtained from genotypic means () were very different: much lower in case of LDV (0.51) and 1 for LD. (Table 5 and Figure 2). In the latter case, the likelihood as a function of *h*^{2} was monotonically increasing, just as we observed for some of the simulated traits. The estimate of residual variance (not reported) was virtually zero in this case, and the confidence interval was equal to the whole unit interval. Also in the leaf area experiments there were substantial differences between and , especially for the Swedish RegMap ( *vs.* ). Heritability estimates were larger for the HapMap ( *vs.* ), which has greater genetic diversity. Again, confidence intervals associated with were wider than those associated with . In the final experiment, heritability estimates for bolting time (BT) were very similar to those for LD: = 1, and and were close to 1. For LW we found , much lower than . The latter was slightly larger than the broad-sense heritability estimate , but the confidence intervals largely overlap.

### Genomic prediction with G-BLUP

The G-BLUP depends on the estimated genetic and residual variance and hence on the estimated heritability (Henderson 1975; Robinson 1991). Therefore genomic prediction with G-BLUP could potentially be improved when individual plant data are used, instead of genotypic means. We compared G-BLUP based on genotypic means and individual plant data using simulated traits, each including a training set of *n* = 200 genotypes and a validation set of *m* = 50 genotypes, for which only marker information was available. For both sets of genotypes, we obtained the G-BLUP () predicting the true (simulated) genetic effects *G* and calculated the prediction accuracy in terms of the correlation (*r*) between and *G*.

For the training and validation sets, accuracy decreased when the estimated heritability was too far from the simulated heritability (Figure 3, Figure 4, and Table S5). For the validation sets there were larger differences in accuracy across simulations, because of the additional randomness in the selection of the validation genotypes, creating varying degrees of connectedness with the training sets. When using individual plant data, heritability estimates were never far from the simulated heritability, and prediction accuracy was close to a constant, depending on the simulated heritability. Using only genotypic means, heritability was often severely over- or underestimated, in which case accuracy decreased substantially. This decrease was largest when heritability was underestimated, which follows from the mathematical expressions for G-BLUP (*Materials and Methods* and File S5).

In the HapMap population with simulated heritability 0.8, the estimate was between 0.7 and 0.9 for 4998 of the 5000 simulated traits, and accuracy (*r*) averaged over these simulations was 0.431 (Table 6). The heritability estimate based on means () was, however, between 0.1 and 0.3 for 8.4% of the simulated traits, and for 2.6% of them it was even below 0.1. Averaged over the latter group of traits, accuracy was only 0.289. Consequently, for these traits an improvement in accuracy of 49% could be realized by genomic prediction based on individual plant data instead of genotypic means. Averaged over all simulated traits, the prediction accuracy obtained from genotypic means was not much lower, because of the large “safe zone” where the approaches performed similarly: accuracy was at least 0.4 once the estimated heritability was above 0.5. It is impossible, however, to determine whether a *given* trait is within this zone, using genotypic means only. When individual plant data are available, the heritability estimates and could be compared, and the G-BLUP based on genotypic means could be used if and were similar.

Similar results were obtained for prediction accuracy observed in cross-validations on the six observed traits (Figure 5). Averaged over 500 validation sets, prediction accuracies obtained with individual plant data and means were almost the same for LDV, LD, and BT (respectively 0.77, 0.82, and 0.67). For leaf area 13 days after sowing (Swedish RegMap) [LA(S)], leaf area 13 days after sowing (HapMap) [LA(H)], and LW, average accuracies were respectively 0.22, 0.27, and 0.04 when using genotypic means and 0.23, 0.28, and 0.11 when using individual plant data. For LDV, the heritability estimates obtained using genotypic means followed a bimodal pattern, which was largely due to accession 8233, for which only a single observation was available and genotypically is an outlier (Figure S6). Despite the fact that the standard errors and correlations among the genotypic means were taken into account, training sets including this accession produced much lower heritability estimates than in cases where it was assigned to the validation set. This, however, did not lead to lower prediction accuracy. More generally, the relation between low prediction accuracy and underestimating heritability was less clear than in Figure 4, which may be due to the additional uncertainty in the estimation of fixed effects and the fact that this concerns a subsampling of observed traits, rather than simulated traits. Nonetheless, genomic prediction based on individual plant data again performed at least as well as the standard approach based on means and much better in the case of LW.

### GWAS

Many state-of-the-art methods for GWAS (Kang *et al.* 2010; Lippert *et al.* 2011; Lipka *et al.* 2012; Zhou and Stephens 2012) use the same mixed model as in marker-based estimation of heritability, apart from the additional marker effect (*Appendix D*). When testing the significance of this marker effect, the estimated genetic and residual variances (and hence the heritability) determine the correction for population structure or genetic effects elsewhere in the genome. This suggests that poor estimation of the variance components and may also affect association mapping, especially when these are estimated in a model without marker effects and then kept fixed when calculating generalized least-squares (GLS) estimates of marker effects (Kang *et al.* 2010). The GLS estimate of the fixed effects (including the marker effect) is unbiased since the expectation of *Y* is *Xβ*. Hence , even conditionally on a poor estimate of . However, *P*-values for the significance of marker effects will be more sensitive to poor estimates of than the effect estimates themselves. We investigated this with GWAS on simulated traits for the HapMap and RegMap populations, which were performed on genotypic means (two-stage) and individual plant data (one-stage). Rank correlations between one- and two-stage −log_{10} *P*-values were almost 1 when was close to the simulated heritability and decreased when under- or overestimated heritability (Figure S7). However, even then correlations were still high, and the resulting loss in power appeared to be limited (Figure 6). For the RegMap (results not shown) differences between receiver operating characteristic curves were even smaller. In contrast to the accuracy of genomic prediction, these differences remained small when restricting the set of simulated traits to those for which underestimated the simulated heritability (curves not shown). This may be explained by the fact that the correlation between and tends to zero for large numbers of genotypes (Casella and McCulloch 2006).

### Software

In most of our analyses and simulations we used the commercial R package asreml (Butler *et al.* 2009), which contains a fast implementation of the AI algorithm. We also made our own implementation of the AI algorithm, together with functions for estimating heritability. These are contained in the R package heritability, which is freely available online. In contrast to other packages such as emma and synbreed, it provides confidence intervals for heritability.

For GWAS on simulated traits we developed the command-line program scan_GLS, which is available on request. scan_GLS performs generalized least-squares calculations conditional on variance components estimated in a model without markers, as proposed by Kang *et al.* (2010), and can efficiently handle genetically identical individuals. State-of-the-art association mapping software can perform association mapping on genetically identical individuals only when these are given different identifiers, *i.e.*, as if they were different genotypes. This leads to large genotypic data files and increases computation time. The GLS calculations in scan_GLS are more efficient since they use the fact that *ZKZ ^{t}* and

*K*have the same rank. This has been proposed in Lippert

*et al.*(2011) (supplement), but to our knowledge this has not been implemented in the Fast-LMM software. Although we did not find one-stage GWAS to be more powerful in the present study, the ability to perform fast association mapping for genetically identical individuals is useful in the context of a compressed kinship matrix (Bradbury

*et al.*2007; Zhang

*et al.*2010; Lipka

*et al.*2012). scan_GLS also includes a function to perform GLS calculations with nondiagonal residual variance structure, allowing association mapping with extra (possibly nongenetic) random effects.

## Discussion

We have presented new methodology for marker-based estimation of heritability for plant traits, which accounts for genetic relatedness and includes information on within-genotype variability, available from replicates. Our approach offers an alternative to mixed-model analysis based on genotypic means, which is the current practice in GWAS and genomic prediction with G-BLUP. Although mixed models can indeed estimate heritability from only kinship coefficients and genotypic means, we observed very large standard errors and sometimes unrealistic estimates of heritability, in both published data and new experiments. Using simulations and statistical arguments we showed that marker-based estimation of heritability based on genotypic means has indeed severe limitations when applied to commonly used association panels in *A. thaliana*. The main reason for this is the lack of information on within-genotype variability: heritability estimates are exclusively based on the (usually small) differences between genotypes. This is feasible in human cohorts with many thousands of individuals, but gives insufficient information in plant populations with only several hundred different genotypes, even if standard errors of genotypic means are taken into account. Much more accurate heritability estimates were obtained with mixed-model analysis at the individual plant or plot level. The resulting heritability estimates had accuracies similar to those reported for human diseases (see *e.g.*, Speed *et al.* 2012). In our simulations with 200 genotypes, the difference in accuracy relative to that of estimates obtained from genotypic means was larger than what was expected from the asymptotic variances. The reason for this larger difference appeared to be the monotone likelihood profile occurring in a substantial part of the simulations, giving heritability estimates of zero or one. However, even without this phenomenon, heritability estimations using individual plant data are more precise, and the asymptotic approximations appear to provide a lower bound on the gain in accuracy.

Mixed-model analysis at the individual plant level can also improve accuracy of G-BLUP, in particular when the actual heritability is underestimated. Too much shrinkage then leads to lower prediction accuracy. In GWAS, where the interest is in the estimated marker effects rather than the variance components, inclusion of the individual plant gave almost no increase in power. However, the possibility to include covariates observed at the individual plant level may be an important practical advantage. While two-stage procedures are usually considered preferable in complex multiexperiment settings (Welham *et al.* 2010; Piepho *et al.* 2012), a one-stage approach may give a more convenient and less error-prone analysis of a single experiment with a simple design.

State-of-the-art phenotyping platforms can measure plant traits with increasing accuracy and throughput. Compared to human traits, the key advantages are that phenotyping is performed under experimental conditions and can include different individuals of the same genotype. Our findings suggest that to fully exploit these advantages, statistical analysis at the individual plant level is necessary. Obviously, this requires the availability of the individual plant data, as well as covariates. For many studies in the literature this information is, however, not available, since most online resources store only genotypic means. More specifically, our results are relevant in the light of the missing heritability debate. Although the aim here was not to propose specific explanations for missing heritability, any such explanation clearly requires an accurate estimate of heritability in the first place. Standard errors for heritability are commonly reported for human traits, but usually absent in the *A. thaliana* literature. We demonstrated that these standard errors can be extremely large when using mixed models at a genotypic means level. Explaining missing heritability based on such estimates then becomes an unreasonable goal.

For two flowering traits, from both a published and a new experiment, = 1. In these cases, also and the broad-sense heritability estimates () were very high. This can partly be explained by the discrete scale (in whole days) on which the traits were measured. Some genotypes therefore had exactly equal phenotypic values in all or many of the replicates. The heritability estimates for LD were also affected by the fact that in the Atwell *et al.* (2010) data, all nonflowering plants were given a phenotypic value of 200. However, the estimates = 1 also occurred for some of our simulated (Gaussian) traits and have a more fundamental reason: the small number of genotypes (compared to human studies) leads to monotone likelihood profiles and very large standard errors, making it hard to make any statement about heritability.

Recently, Speed *et al.* (2012) showed that heritability estimates may become biased when linkage disequilibrium (LD) is not constant over the genome and proposed LD-adjusted kinship (LDAK) matrices to correct for this. To assess the effect on and we recalculated these heritability estimates, using an LD-adjusted kinship matrix (Figure S5 and Table S4). For all traits the estimates were very close to the values obtained using the unadjusted kinship matrix. The estimates on the other hand were quite different for several traits. Nevertheless, the same problems occurred: very large confidence intervals and estimates that appear biologically unrealistic. We conclude that is more sensitive to the choice of kinship matrix than and that the different behavior of and cannot be explained by LD being different over the genome.

The mixed-models analysis at the individual plant level proposed here can be extended in several ways, for example by partitioning the genetic variance into different chromosomal contributions, as in Yang *et al.* (2011), or by including epistatic effects. The latter has been proposed for experimental populations with known pedigrees (Oakey *et al.* 2007), but marker-based estimation of epistatic effects for natural populations appears possible only with more advanced statistical methodology, for example semiparametric mixed models and reproducing kernel Hilbert spaces (Gianola and Van Kaam 2008; Howard *et al.* 2014). Another direction for future research is the estimation of heritability in the presence of additional random effects, which would increase the applicability to agricultural field trials (where the raw data are usually at *plot* rather than individual plant level). Although our approach allows for missing values, it does assume that all design effects can be modeled as fixed. Field trials are often laid out in incomplete blocks, containing only a small number of the genotypes under study. Differences between such blocks are usually best modeled using random block effects. The definition of heritability in such contexts is, however, far from obvious: Oakey *et al.* (2007) proposed generalized heritability, but for natural populations this definition is not equivalent to the classical definition of heritability (Oakey *et al.* 2007, p. 813). In this case the ratio of the estimated genetic variance over the total phenotypic variance could be used as a lower bound on heritability. Nevertheless, we have demonstrated that mixed-model analysis at the individual plant or plot level offers important advantages over mixed models for genotypic means.

## Acknowledgments

We are grateful to Bjarni Vilhjálmsson, Arthur Korte, Susanna Atwell, and Magnus Nordborg for valuable discussions on heritability. We thank Doug Speed for his comments on the use of his LDAK software. Jian Yang is acknowledged for his comments on the calculation of confidence intervals in his GCTA software and Brian Cullis for feedback on the use of asreml-R. Christian Riedelsheimer, Tobias Schrag, and Albrecht Melchinger are acknowledged for providing the SNP data from Riedelsheimer *et al.* (2012). Finally, we thank two referees for helpful suggestions regarding the terminology.

W.K. was partially funded by the Learning from Nature project of the Dutch Technology Foundation (STW), which is part of the Netherlands Organisation for Scientific Research (NWO). The work was also funded by the Generation Challenge Program–Integrated Breeding Platform project 2.2.5 (M.M. and F.v.E.), the Technological Top Institute Green Genetics (P.J.F.), NWO Earth and Life Sciences (ALW) (P.J.F.), and the Centre for BioSystems Genomics (R.K. and J.K.).

Author contributions:W.K. and F.v.E. designed the research. W.K. performed the research. W.K. wrote the article, with input from M.M., B.E., F.v.E., P.J.F., M.B., J.K., and R.K. Software was written by W.K. (R package heritability) and M.B. (scan_GLS). Experimental data were obtained by P.J.F. [traits LA(S) and LA (H)] and R.K. (traits BT and LW).

## Appendix A

### Marker-Based Estimation of Heritability

We start with a brief summary of existing methodology and the required assumptions and then define marker-based heritability estimates given genetically identical replicates measured in an unbalanced design.

#### Marker-based estimation of heritability given a single observation per genotype

Phenotypic variance can be partitioned aswhere the genetic variance is decomposed into additive, dominance, and interaction effects (Falconer and Mackay 1996), the dominance effect being absent for inbred lines. Heritability in the broad sense is defined as , while the narrow-sense heritability takes into account only the additive genetic effects, which determine breeding values and the selection response. Defining the residual variance to be the sum of the environmental and nonadditive genetic variance terms (), we can write (A1)Marker-based estimates of narrow-sense heritability can be obtained using mixed models with random genetic effects. The covariances between these effects are modeled by a genetic relatedness matrix (GRM) estimated from markers, with elements given by (1). Given a single observation per genotype, the standard infinitesimal model is (A2)where *μ* is the intercept, *G* = (*G*_{1}, …, *G _{n}*) has a distribution, and the errors

*E*have independent normal distributions with variance . The optional term models the effect of

_{i}*k*additional covariates. Under model (A2), heritability can be estimated by (A3)where and are REML estimates of additive genetic and residual variance (Yang

*et al.*2011; Speed

*et al.*2012).

It has been noted (*e.g.*, Hayes *et al.* 2009) that using model (A2) with genetic relatedness matrix (1) is equivalent to assuming that the effects of the standardized marker scores are drawn independently from Gaussian distributions with variance . Consequently, the model can account only for additive genetic effects, and nonadditive effects will get into the residual variance. This is why is an estimate of narrow-sense heritability. The preceding argument also highlights the fact that is a marker-based estimate, which requires the (frequently made) assumptions that every causal locus is tagged by one of the markers, and that LD is constant across the genome (Speed *et al.* 2012). For the case that LD is not constant over the genome, Speed *et al.* (2012) proposed LD-adjusted kinship matrices.

The parameter can be interpreted as additive genetic variance only if *K* is scaled appropriately. For the kinship matrix used here this is the case, but for different types of kinship matrix (*e.g.*, identity by state), it is necessary to divide each coefficient by tr(*PKP*)/(*n* − 1), for *P* = *I _{n}* −

**11**′/

*n*. Under the condition that tr(

*PKP*) = (

*n*− 1),

#### Marker-based estimation of heritability given genetically identical replicates, for unbalanced designs

When for each genotype a number of genetically identical individuals are observed, (A2) generalizes to (A5)where *Y _{i}*

_{,}

*is the phenotypic response of replicate*

_{j}*j*of genotype

*i*,

*x*

_{i}_{,}

*is a vector of fixed effects,*

_{j}*G*= (

*G*

_{1}, … ,

*G*) has a distribution, and the errors

_{n}*E*

_{i}_{,}

*have independent normal distributions with variance . Equation A5 is similar to Equation 2 apart from the additional covariates and the generally unbalanced design. Analogous to the case of balanced designs, we use the estimator defined in (3) for REML estimates and obtained for model (A5).*

_{j}Heritability estimates based on genotypic means can be constructed as in Equations 4 and 5, except that these genotypic means are no longer equal to the arithmetic averages . We fit the linear model (A6)where is the vector of phenotypic observations, *r _{i}* is the number of replicates of genotype

*i*,

*X*is the design matrix of the extra fixed effects, and is the vector of independent Gaussian errors with variance . This model is identical to (A5), apart from the factor genotype now being fixed instead of random. The intercept is included in the incidence matrix

_{C}*X*, which has dimension

_{G}*N*×

*n*, for .

We obtain least-squares estimates for the genotypic effects in model (A6), which form the basis for the subsequent analysis. In mixed-model terminology, this is the BLUE for *g*. In (A6) the only random term is *E*, *i.e.*, we consider an ordinary linear model, as the additional covariates are all assumed to be fixed. The least-squares estimator is given by (A7)where *X* = [*X _{G} X_{C}*]. This vector has covariance matrix (A8)Consequently, the covariance matrix of isThe mixed model (4), used for the estimation of heritability, now naturally generalizes to (A9)where, to avoid having twice the letter

*g*in the same equation, we abused the notation by writing instead of . Hence the matrix

*r*

^{−1}

*I*in (4) is replaced by

_{n}*R*. Given REML estimates and for this model, we use in (5) as heritability estimate. As in the case of a completely randomized design, is a so-called two-stage estimator. The fact that genotype is taken as a fixed effect in the first stage (A6) and as random in the subsequent mixed-model analysis may appear inelegant, but is common in two-stage analyses and necessary to avoid shrinking twice.

In the case of a completely randomized design with *r _{i}* replicates without further covariates, andNote that in contrast to the estimator of broad-sense heritability described below, it is not necessary to define an average number of replicates. In the case of a randomized complete block design with

*r*replicates,

*X*is the

_{C}*nr*× (

*r*− 1) design matrix for the factor block, and is the

*n*×

*n*matrix with elements (

*r*− 1)/

*n*. Hence,

*R*is not diagonal, even though the design is balanced. The off-diagonal elements are equal in this case and small (provided

*n*is sufficiently large). With

*n*= 200 and

*r*= 3 for example, the diagonal elements of

*R*are 0.336667 and the off-diagonal elements are 0.003333. For unbalanced designs, some of the off-diagonal elements may be larger and more influential.

## Appendix B

### Simulations

Data for genotypes *i* = 1, … , *n* with replicates *j* = 1, … , *r* were simulated following the model (B1)where *K* was scaled such that tr(*PKP*) = *n* − 1, for *P* = *I _{n}* −

**11**′/

*n*(see Equation A4).

We assumed *q* QTL located at marker positions, with effect sizes *α _{m}* (

*m*= 1, … ,

*q*) and minor allele frequencies

*f*(

_{m}*m*= 1, … ,

*q*);

*i.e.*, genotypes

*AA*and

*aa*occur with frequencies

*f*and (1 −

_{m}*f*),

_{m}*α*being the effect of a single allele. Let

_{m}*x*

_{i}_{,}

*∈ {0, 1, 2} denote the marker score at QTL*

_{m}*m*for genotype

*i*. In the case of

*A. thaliana*inbred lines,

*x*

_{i}_{,}

*∈ {0, 2} and (B2)whereis the total (additive) genetic variance. For outbreeding species under Hardy–Weinberg equilibrium, the constant 4 is replaced by a 2. In (B2) we assume linkage equilibrium between the QTL.*

_{m}For each simulated trait, QTL locations were sampled randomly from all available markers whose minor allele frequency >10%. To enforce linkage equilibrium between the sampled QTL, the sampled locations were discarded and drawn again if the difference between and *v*_{2} = 4(*α*_{1}, … , *α _{q}*)

*S*(

*α*

_{1}, … ,

*α*)′ was too large,

_{q}*S*being the sample covariance matrix of the marker scores at QTL positions. It was required that min(

*v*

_{1},

*v*

_{2})/max(

*v*

_{1},

*v*

_{2}) > 0.97.

Let (B3)denote the proportion of the genetic variance explained by the QTL. In our main set of simulations we choose , *γ* = 0.5, *q* = 20 QTL. For the comparison of one- and two-stage association mapping, we choose *γ* = 0.75, *q* = 10. To achieve the desired level of heritability (for given *γ* and ), we set and , and *α*_{1}, … , *α _{q}* were chosen equally large, such that for each

*m*. The QTL effects were given random signs. In File S4 we repeat the simulations for

*γ*= 0.1 and

*q*= 1.

For the purpose of genomic prediction, 50 additional genotypes were drawn randomly from the same subpopulation of the RegMap. These were assigned only genotypic values, defined as the sum of QTL effects and polygenic effects *g _{i}* (

*i*= 201, … , 250). The polygenic effects were simulated such that ,

*K*

_{total}being the kinship matrix of the training and validation sets combined.

## Appendix C

### Genomic Prediction with G-BLUP

Genomic prediction with G-BLUP relies on the same models used for marker-based estimation of heritability and can be based on individual plant data (one-stage) or genotypic means (two-stage). We now provide the expressions for one- and two-stage G-BLUP, which are based on models (A5) and (A9) in *Appendix A*.

#### One- and two-stage G-BLUP for the training and validation sets

First we consider the G-BLUP for the training set, for which phenotypic observations are available. Including the intercept in the design matrix, model (A5) can be rewritten as (C1)where *N* is the total number of individuals and *Z* is the *N* × *n* incidence matrix assigning individuals to genotypes. The BLUP of *G* = (*G*_{1}, … , *G _{n}*)

*and the BLUE of*

^{t}*β*are given by (C2)where

*N*is the total number of individuals,

*Z*is the

*N*×

*n*incidence matrix assigning individuals to genotypes, and . Numerically and can be more conveniently obtained by solving the mixed-model equations (see,

*e.g.*, Henderson 1975 or Robinson 1991). Similar expressions hold for predictions based on the genotypic means following model (A9), (C3)where, as in (A9), denotes the vector of genotypic means obtained from preliminary linear model analysis (not necessarily the arithmetic averages). Since we assume that all covariates have been accounted for already in this preliminary analysis, the only fixed effect in (C3) is an intercept, with design matrix

*X*= 1

*.*

_{n}Assuming the one-stage model (C1) and given by (C2), the genetic effects *G*_{pred} = (*G _{n}*

_{+1}, … ,

*G*

_{n}_{+}

*)*

_{m}*of*

^{t}*m*unobserved (but genotyped) genotypes can be predicted by the conditional means(C4)where

*K*

_{pred.obs}is the

*m*×

*n*matrix of kinship coefficients for the unobserved

*vs.*observed genotypes. Assuming the two-stage model (A9) and given by (C3), the predictor is given by

#### Prediction of new observations and cross-validation

For new observations *Y*_{pred} at the individual plant level, we have the one-stage predictor (C6)where *X*_{pred} and *Z*_{pred} are the corresponding design matrices. For predictions with genotypic means, we replace by and *β* by the estimate obtained within the linear model (A6) in the preliminary stage.

## Appendix D

### Genome-Wide Association Studies

In mixed-model-based GWAS (Kang *et al.* 2010; Lippert *et al.* 2011; Lipka *et al.* 2012; Zhou and Stephens 2012) the phenotype of genotype *i* is modeled as (D1)where *x _{i}* is the marker score,

*γ*is the marker effect, and the genotypic effects

*G*= (

*G*

_{1}, … ,

*G*) follow a distribution. This model assumes a single observation per genotype. If observations on genetically identical individuals are available, the

_{n}*Y*’s can be replaced by genotypic means , as in (A7)–(A9). Model (D1) then generalizes to (D2)This amounts to a two-stage approach: first the genotypic means are calculated using (A7), and next association mapping is performed following (D2). In practice a GWAS is often performed on the arithmetic averages , which implicitly assumes a balanced completely randomized design, without any missing values or replicate effects. Here we use the more general model (D2). Apart from the additional marker effect, this is the same model we used in (A9) to construct the heritability estimate .

_{i}Alternatively, association mapping can be based on the one-stage model (D3)where the term (*X _{C}*)

_{i}_{,}

*models additional covariates, as in (A7). In the two-stage GWAS, this information is accounted for in the genotypic means . In models (D2) and (D3) we test the hypothesis*

_{j}β*γ*= 0 using the

*F*-test, conditional on estimates of the variance components obtained from a model without markers (Kang

*et al.*2010). When marker effects are small, these estimates are a good approximation of the exact estimates, obtained when genetic and residual variances are reestimated for each marker.

## Appendix E

### Phenotypic Data

#### Collection of phenotypic data

For the two flowering traits from Atwell *et al.* (2010) (LDV and LD), details can be found in the original publication. In the case of LD, all plants that had not flowered by the end of the experiment were given a phenotypic value of 200 days. The leaf area trait [LA(S)/LA(H)] was measured in two separate experiments on the Swedish RegMap and the HapMap, using the same phenotyping platform. The plants were imaged top down for projected leaf area every day. The images were taken using near-infrared light (790 nm) to not influence the photoperiod during the night measurements. In our final experiment, BT and LW were measured for the HapMap. BT was noted as the number of days after vernalization on which the plant started to bolt. LW was measured on photographs taken from the longest leaf 2 weeks after flowering.

#### Plant growth conditions

In the leaf area experiments, seeds from the *Arabidopsis* Swedish RegMap (298 accessions) and the *Arabidopsis* HapMap (350 accessions) were stratified at 4° for 4 days and sown on Rockwool blocks that had been covered with a black foamed PVC sheet to prevent algal growth and provide a uniform background for automated image analysis. The growth conditions were a light intensity of 200 μmol⋅m^{−2}⋅sec^{−1}, 10-hr short day, 20°/18° day/night cycle, and 70% relative humidity (RH). In the experiment measuring BT and LW on the HapMap, seeds were sown on filter paper with demi water and stratified at 4° in dark conditions for 5 days. Following stratification, seeds were transferred to a culture room (16-hr LD, 24°) to induce seed germination for 42 hr. Germinating seeds were then transplanted to wet Rockwool blocks of 4 × 4 cm in a climate chamber with a light intensity of 125 μmol⋅m^{−2}⋅sec^{−1} 16-hr LD, 20°/18° day/ night cycle, and 70% RH. All plants were watered every morning for 5 min at 9 am with 1/1000 Hyponex solution (Hyponex, Osaka, Japan). Nineteen days after germination, all plants were vernalized for 6 weeks in a cold room (12 hr light, 4°). After the 6-week vernalization period plants were transferred back to the same climate chamber in the same order, but given more space to grow.

#### Genotypic means

All experiments were laid out as randomized complete block designs, apart from the final HapMap experiment (BT and LW), where the same randomization was used within each replicate. In all experiments we included a replicate (complete block) effect. The numbers of replicates in the different experiments were six (LD and LDV), four [LA(S)/LA(H)], and three (BT and LW). Due to nongerminating seeds and dead plants, some accessions had lower numbers of replicates. For the leaf area experiments [LA(S)/LA(H)] we additionally included a row and column effect to model the within-image position of each plant (*x* = 1, 2, 3 and *y* = 1, 2, 3, 4). These factors correct for technical artifacts, which are known to be consistent across replicates.

## Footnotes

Supporting information is available online at http://www.genetics.org/lookup/suppl/doi:10.1534/genetics.114.167916/-/DC1.

*Communicating editor: A. H. Paterson*

- Received July 25, 2014.
- Accepted December 9, 2014.

- Copyright © 2015 by the Genetics Society of America