## Abstract

As one of the world’s most important food crops, the potato (*Solanum tuberosum* L.) has spurred innovation in autotetraploid genetics, including in the use of SNP arrays to determine allele dosage at thousands of markers. By combining genotype and pedigree information with phenotype data for economically important traits, the objectives of this study were to (1) partition the genetic variance into additive *vs.* nonadditive components, and (2) determine the accuracy of genome-wide prediction. Between 2012 and 2017, a training population of 571 clones was evaluated for total yield, specific gravity, and chip fry color. Genomic covariance matrices for additive (**G**), digenic dominant (**D**), and additive × additive epistatic (**G**#**G**) effects were calculated using 3895 markers, and the numerator relationship matrix (**A**) was calculated from a 13-generation pedigree. Based on model fit and prediction accuracy, mixed model analysis with **G** was superior to **A** for yield and fry color but not specific gravity. The amount of additive genetic variance captured by markers was 20% of the total genetic variance for specific gravity, compared to 45% for yield and fry color. Within the training population, including nonadditive effects improved accuracy and/or bias for all three traits when predicting total genotypic value. When six F_{1} populations were used for validation, prediction accuracy ranged from 0.06 to 0.63 and was consistently lower (0.13 on average) without allele dosage information. We conclude that genome-wide prediction is feasible in potato and that it will improve selection for breeding value given the substantial amount of nonadditive genetic variance in elite germplasm.

- tetraploid
- nonadditive effects
- genome-wide prediction
- potato
- GenPred
- Shared Data Resources
- Genomic Selection

CULTIVATED potato (*Solanum tuberosum* L.) is unique among the major, global food crops in that it is autotetraploid and clonally propagated. As of 2018, there are 12 public breeding programs in the U.S. with a mandate to release varieties for commercial production, as well as several additional programs with a focus on germplasm enhancement. The variety development process begins with botanical seed from the sexual hybridization of heterozygous clones. Seedlings are grown in a greenhouse and one or more tubers from each plant are retained for subsequent vegetative propagation. Crossing and seedling tuber production takes 1–2 years—depending on the breeding program—followed by 1–2 years of field selection based primarily on visual assessment of tuber appearance, plant maturity, and yield components (tuber number and size), with some postharvest evaluation of processing traits such as fry color and specific gravity. Quantitative measurement of these traits in replicated and/or multi-location trials begins in field year (FY) 3 or 4 and continues for several years. Because it takes 3–4 years to establish clones as disease-free plantlets *in vitro* and to produce foundation seed, new varieties are typically released 10–12 years after crossing, by which time dozens of traits have been evaluated. The duration of the potato breeding cycle—from sexual hybridization to incorporating new clones as parents—is shorter than the time to variety release, but it is still 5–7 years for most U.S. programs. Until now, U.S. breeding programs have primarily used phenotypic selection in combination with genetic markers for a handful of major resistance genes (Lopez-Pardo *et al.* 2013).

The use of phenotype means for parent selection is not ideal because the estimates contain additive and nonadditive genetic effects, but only the former are efficiently transmitted to offspring (Gallais 2003). A number of previous studies have investigated nonadditive genetic variance in potato using factorial mating designs to estimate general and specific combining abilities (Plaisted *et al.* 1962; Tai 1976; Brown and Caligari 1989; Maris 1989; Neele *et al.* 1991; Gopal 1998; Bradshaw *et al.* 2000). General combining ability (GCA) is equivalent to the covariance between half-sibs, which equals for autotetraploid loci in panmictic and linkage equilibrium (Kempthorne 1955c; Gallais 2003). The symbols V_{a}, V_{d}, and V_{aa} are the genetic variances for additive, digenic dominance, and additive × additive epistasis effects, respectively. Specific combining ability (SCA) is the covariance between full-sibs minus twice the covariance between half-sibs, which leads to an expression containing only nonadditive genetic variances The SCA/GCA ratio therefore provides an indication of the importance of nonadditive genetic variance. Although a wide range of values for SCA/GCA is found in the aforementioned references for potato, the general conclusion is that nonadditive genetic variance is important in many contexts.

For pedigreed populations, an alternative approach to estimating additive genetic variance is via the numerator relationship, or **A**, matrix. Kerr *et al.* (2012) were the first to publish a complete, recursive algorithm for **A** in autotetraploids, which has been applied to potato (Slater *et al.* 2014) and blueberry (Amadeu *et al.* 2016) populations. Mixed model analysis with **A** also enables selection on additive values calculated by BLUP (Henderson 1975). Although pedigree-BLUP is the cornerstone of genetic improvement for quantitative traits, the method has several limitations: (1) it depends on accurate pedigree records; (2) it neglects genetic covariance between founders; and (3) the covariance is based on expected, rather than realized, parental contribution.

Genomic selection (GS) has the potential to overcome these limitations by replacing **A** with a genomic covariance, or **G**, matrix estimated from markers (Bernardo 1994; Nejati-Javaremi *et al.* 1997; Habier *et al.* 2007; VanRaden 2008) or by estimating marker effects directly (Meuwissen *et al.* 2001). There have been several studies on GS in autotetraploid species (Annicchiarico *et al.* 2015; Li *et al.* 2015; Slater *et al.* 2016; Habyarimana *et al.* 2017; Sverrisdóttir *et al.* 2017), but none have used nonadditive genomic covariance matrices to partition genetic variance or predict total genotypic value. Both topics are addressed in this manuscript, building on analogous studies at the diploid level (Su *et al.* 2012; Vitezica *et al.* 2013; Xu 2013; Muñoz *et al.* 2014; Jiang and Reif 2015) and the classical theory of average effects in tetraploids.

## Theory

In a series of articles in 1955 (Kempthorne 1955a,b,c), which were further distilled in the monograph *An Introduction to Genetic Statistics* (Kempthorne 1957), Kempthorne developed the theory of average effects for arbitrary ploidy, drawing upon the same mathematical methods used in the ANOVA for factorial experiments. Key results from this literature are reproduced here, as well as details on the derivation omitted by Kempthorne.

For an autotetraploid locus in panmictic equilibrium, assuming random bivalent formation (*i.e.*, random chromosome segregation) and no inbreeding, the genotypic value (*g _{ijkl}*) of genotype

*ijkl*(each index ranges from 1 to the number of alleles, and permutations of the indices are distinct) can be orthogonally decomposed into the population mean (

*μ*) plus four additive effects (

*α*) corresponding to the four genes, six digenic dominance effects (

_{i}*β*) for all pairs of genes, four trigenic interactions (

_{ij}*γ*), and one quadrigenic term (

_{ijk}*δ*):(1)Equation 1 uses standard notation from the analysis of factorial experiments, where the symbols denote the regression coefficients and the regressors are implied to be indicator variables. Focusing on the additive effects and grouping the other parameters into a residual term, Equation 1 becomes a regression of genotypic value on allele dosage (Fisher 1941). The average effects minimize the sum of squared residuals for the population, which is equivalent to a sum over genotypes weighted by genotype frequency

_{ijkl}*p*(=

_{ijkl}*p*under the assumptions of the model):(2)Taking the derivative with respect to the additive effect for each allele and equating the result to zero generates a set of normal equations that can be solved (Supplemental Material, Supporting Methods in File S1) to produce the following linear constraint:(3)which is identical to the result for diploids. By substituting Equation 3 into the normal equations (Equation S3, File S1), the solution for the additive effect of an allele becomes the average genotypic value of all individuals with that allele (multiple doses contribute separately), relative to the population mean:(4)The residuals from the regression equation for the additive effects, which we denote by are known as the dominance deviation. In diploids, this deviation uniquely defines one dominance effect for each genotype; but in tetraploids the dominance deviation comprises digenic, trigenic, and quadrigenic effects (Equation 1). The tetraploid solution for the digenic effects corresponds to regressing the dominance deviation on the dosage of pairs of alleles, which involves minimizing the following sum of squared residuals:(5)Taking the derivative with respect to the digenic effect for each allele pair and equating the result to zero generates a set of normal equations that can be solved (Supporting Methods, File S1) to produce the following linear constraint for any allele

_{i}p_{j}p_{k}p_{l}*i*:(6)which is the same result for diploids. Substituting Equation 6 into the normal equations (Equation S6, File S1) leads to the solution(7)By now the pattern is clear and the least-squares solution for the trigenic effects can be written as(8)Having solved for the additive, digenic, and trigenic terms; the quadrigenic effect

*δ*is the residual in Equation 1.

_{ijkl}The breeding value (BV) of an individual is defined as twice the mean genotypic value of its progeny relative to the population mean. Under the model assumptions, all six possible gene pairs for tetraploid genotype *ijkl* have equal frequency in its gametes which, in conjunction with Equation 7, leads to the following expression:(9)Equation 9 shows that breeding value equals the total additive value (*u*) plus 1/3 of the total digenic dominance (*v*), but it is conventional to refer to the additive value as “breeding value” because the contribution of digenic dominance diminishes exponentially: 1/3* ^{t}* after

*t*generations (Gallais 2003). This is analogous to the situation in diploids (and polyploids) with regard to additive × additive epistasis, as it contributes to a breeding value with a coefficient of 1/2 but is generally omitted when referring to breeding value.

## Materials and Methods

### Training population

Phenotype data for a training population (TP) of 571 round white clones was collected between 2012 and 2017 at the University of Wisconsin (UW–Madison) Hancock Agricultural Research Station (number of clones trialed per year is in Table S1 in File S1). Between 2012 and 2015, all clones were entries in the National Chip Processing Trial (NCPT) which were contributed by 11 public U.S. breeding programs. In 2016 and 2017, FY3 and FY4 selections from the UW–Madison breeding program were included in addition to the NCPT clones. The NCPT uses a two-tier evaluation system, with one plot per location for tier 1 clones and two plots per location for tier 2 clones. FY3 clones were evaluated with a single plot, and FY4 clones were evaluated with two plots in 2016 and one plot in 2017. All plots contained 15 seed pieces, planted with 30-cm in-row spacing and 91 cm between rows. Trials were planted in late April and harvested in early September, with vine desiccation 2–3 weeks before harvest.

Phenotype data for three traits—yield, specific gravity, and fry color—are included in this study. Total yield is based on the weight of all harvested tubers and reported in Mg ha^{−1}. Specific gravity was determined by water displacement, using 2–3 kg of tubers per plot (Wang *et al.* 2017). Fry color was measured in March of each year after 6 months of storage (1 month at 12.8° for wound healing, followed by 5 months at 8.9°), using 1-mm slices fried for 130 sec in vegetable oil at 182°. For the 2012–2014 trials, fry color was measured on a 1–10 visual scale, while for the 2015–2017 trials it was measured on the *L** lightness scale using the D25 HunterLab Colorimeter (Hunter Associates Laboratory). Fry color measurements on the visual scale (*x*) were converted to *L** using the formula *L** = −1.37*x* + 63.7, which is based on a linear regression analysis of 70 clones phenotyped with both methods.

TP samples were genotyped with either version 1 or version 2 of the Solanaceae Coordinated Agricultural Project potato SNP array, which have in common a set of 8303 markers used for this study (Hamilton *et al.* 2011; Felcher *et al.* 2012). Tetraploid genotype calls (coded 0–4) were made using version 1.6 of the ClusterCall package (Schmitz Carley *et al.* 2017) in R (R Development Core Team 2015), which calibrates the relationship between signal intensity from the array and allele dosage for each marker based on multiple F_{1} populations. In addition to the Atlantic × Superior, Wauseon × Lenape, and Rio Grande × Premier Russet populations used by Schmitz Carley *et al.* (2017); two more calibration populations were used: Waneta × Pike (da Silva *et al.* 2017; *n* = 184) and A06084-1TE × Castle Russet (*n* = 245). Default parameters were used except for *min.train* = 3, which required a marker to be called in at least three of the five calibration families. The curated marker set contained 3895 polymorphic SNPs with ≥95% concordance across samples (File S2).

The numerator relationship matrix **A** was calculated with R package AGHmatrix (Amadeu *et al.* 2016) using pedigree records maintained by the authors as well as a public database (van Berloo *et al.* 2007). After removing uninformative ancestors, there were 185 founders (clones with no parent) and 1138 total clones in the pedigree (File S3).

### Genomic covariance matrices

In the *Theory* section, average effects were derived for an arbitrary number of alleles. For biallelic SNPs, additional simplifications are possible. Consider alleles *B* and *b* with frequencies *p* and *q*, respectively. In this case, Equation 3 becomes which reduces to the following well-known formulas involving (10)If *X* denotes the dosage of *B*, then the total additive value is(11)where *W* is a centered genotype because 4*p* is the population mean of *X*. To obtain a similarly parsimonious expression for the total digenic dominance *v*, we follow the example of Wricke and Weber (1986) and introduce the parameter When combined with Equation 6, the result is (Supporting Methods, File S1)(12)(13)For mixed model analysis, *G _{ij}* quantifies the covariance between the additive values for clones

*i*and

*j*relative to the additive genetic variance (14)The covariance in Equation 14 involves the expectation with respect to but the additive genetic variance is based on the theory of average effects in which the expectation is with respect to genotypes. To relate the two variance components, the expectation with respect to both parameters is used:(15)Upon substituting Equation 15 into Equation 14 and extending the analysis to

*m*loci in linkage equilibrium, the result is(16)The digenic dominance matrix

*D*is defined similarly as the covariance between dominance values relative to the dominance genetic variance, based on the expectation with respect to (17)Using the following result for the dominance genetic variance(18)the

_{ij}**D**matrix is(19)For the covariance between additive × additive epistatic effects, we used the Hadamard product of the

**G**matrix, denoted by

**G**#

**G**(Henderson 1985; Su

*et al.*2012).

### Mixed model analysis of the TP data set

Stage-wise analysis of the multi-year TP data set was performed using ASReml-R version 3 (Butler *et al.* 2009) and a diagonal weight matrix to account for the varying precision of the estimates in the first stage (Smith *et al.* 2001; Damesa *et al.* 2017). Stage one was an analysis within a year, including blocking effects when present and modeling the genotype effect for each clone as fixed. The covariance matrix () for the vector of genotype effect estimates () in year *j* was obtained from the inverse of the coefficient matrix of the mixed model equations (Henderson 1975), which is returned as *Cfixed* in the asreml object. Stage two was a multi-year analysis based on the following linear model:(20)In Equation 20, the parameter *φ* is the intercept, *g _{i}* is a random effect for genotype,

*y*is a fixed effect for year, (

_{j}*gy*)

*is a random effect for the genotype × year interaction, and the variance of the residual*

_{ij}*f*is (

_{ij}*ω*)

^{ij}^{−1}, where

*ω*is the

^{ij}*i*-th diagonal element of from stage one (Damesa

*et al.*2017). File S4 contains the genotype effect estimates () and corresponding weights (

*ω*) used in the multi-year analysis.

^{ij}After fitting a baseline model with independent genotype effects (var[*g _{i}*] = V

_{g}), five genetic models with different covariance structures for

*g*were tested (Table 1). The variance of the genotype × year interaction (var[

_{i}*gy*] = V

_{ij}_{gy}) was estimated in the baseline model and constrained at that value for the other models. This allowed for the partitioning of the genetic variance (V

_{g}) into additive (V

_{a}) and residual (V

_{r}) genetic components for models A and G. Models G+GG, G+D, and G+GG+D involved the estimation of nonadditive variance components for digenic dominance (V

_{d}) and/or additive × additive epistasis (V

_{aa}). Variances are reported using the standardization proposed by Legarra (2016) to make them comparable, in which the parameter estimate is multiplied by the difference between the mean of the diagonal elements and the mean of all elements of the covariance matrix Goodness of fit was assessed by the Akaike information criterion (AIC), defined as the deviance minus twice the number of variance parameters (Piepho 2009).

Our objective was to compare how well the different covariance models predicted the total genotypic value *g* of unobserved clones. Each of the models in Table 1 has the form where *θ* is a sum of average effects and *r* is the residual genetic effect. Genomic predictions were calculated as from a stage-two analysis (Equation 20) without response values () for clones in the validation set, using the variance parameter estimates from ASReml-R and custom scripts to solve Henderson’s mixed model equations (Henderson 1975). The validation data were calculated as from a stage-two analysis with all clones, assuming independent clone effects (*i.e.*, the baseline model). The reliability () of the validation data was calculated from the prediction error variance (PEV) by for clone *i* (Clark *et al.* 2012). To estimate prediction accuracy () from predictive ability (*i.e.*, the Pearson correlation between the genomic predictions and validation data, ), the latter was divided by the square root of the mean reliability (*i.e.*, broad-sense heritability) of the validation data (Dekkers 2007). Because the mean-squared error of the accuracy estimate worsens as the reliability of the validation data decreases (Ould Estaghvirou *et al.* 2013), only validation data with reliability ≥0.6 were used.

### Genome-wide prediction in F_{1} populations

As part of various research projects, six unselected F_{1} populations (Table 2) were evaluated at the same location as the TP during the same time period. Populations W12011, W12012, and W12060 were evaluated for yield and specific gravity with a single plot of 12 plants per clone in 2015 and 2016. Populations W9817 and W10010 were evaluated for yield with two 8-plant plots in 2013 and one 20-plant plot in 2014 (Rak and Palta 2015). Population W×L was evaluated with a single plot for specific gravity for 4 years (2012–2015), but yield was only measured in 2014 and 2015; there were 6 plants per plot in 2012 and 10 plants per plot in 2013–2015 (Frederick 2017). Phenotype data were analyzed separately for each F_{1} population using a linear model with fixed effects for year and independent random effects for clone. Genetic and residual variance components were estimated with ASReml-R and used to calculate BLUPs for validation. The BLUPs and corresponding reliabilities (which were used to estimate accuracy from predictive ability, as described above) are provided in File S5; however, yield BLUPs for W12011 and W12060 were excluded because of low reliability (<0.6).

The F_{1} populations were genotyped using the same SNP array as the TP. Tetraploid genotype calls (coded 0–4) were made in two ways: (1) as described above (using *CC.anypop* in the ClusterCall package) for the same set of 3895 markers selected for the TP, and (2) using the *CC.bipop* function. Polymorphic markers with identical calls for the two approaches were used for prediction (File S6). Genomic predictions for each F_{1} population were calculated as from a stage-two analysis of the entire TP (Equation 20), using the TP variance estimates with multi-population covariance matrices (“**G**_New” from Wientjes *et al.* 2017) to account for different allele frequencies in the TP *vs.* F_{1} population. If and denote scaled versions of the matrices defined previously:(21)then, using the subscripts 1 and 2 to denote the different populations, the covariance matrices are

## Results

Pedigree information was used to calculate the numerator relationship, or **A**, matrix for a TP of 571 clones. All but 13 clones had a pedigree depth (maximum number of generations from a founder) of at least 7, with a median depth of 10 generations (distribution in Figure S1 in File S1). For autotetraploids, diagonal element *A _{ii}* is related to the inbreeding coefficient

*F*—defined as the probability that two randomly chosen genes, sampled without replacement, are identical by descent—via the equation

_{i}*A*= 1+3

_{ii}*F*(Gallais 2003). Values for A

_{i}*ranged from 1 to 1.55, with a mean of 1.05, so the most inbred clone in the population had*

_{ii}*F*= 0.18.

_{i}In the G-BLUP method of GS, the covariance between additive values is proportional to a **G** matrix calculated from markers (instead of **A**). We calculated **G** using 3895 polymorphic markers from the potato SNP array for which accurate allele dosage information was available. The overall scaling of **G** is such that the mean of the diagonal elements equals 1 at panmictic equilibrium, which is very near the observed value of 0.99. As further confirmation, the observed frequency of heterozygotes was inspected as a function of allele frequency and found to be in close agreement with the expected values under panmixis (Figure S2 in File S1).

**G** has been called the “realized relationship matrix” because it captures Mendelian segregation around the expected value **A** (Hayes *et al.* 2009). This connection is the motivation for regression analysis between **G** and **A** (VanRaden 2008) and is shown in Figure 1 for our potato data set. The dashed line is the fit of the linear model when all off-diagonal elements are used (*G* = 0.66*A*–0.06, *R*^{2} = 0.41), which underestimated **G** at high values of **A**. By excluding very distant relationships (*A* < 0.05), the model fit improved overall and at the upper end, as shown by the solid black line (*G* = 0.79*A*–0.09, *R*^{2} = 0.51). These results are based on the assumption of no double reduction, *i.e.*, that diploid gametes do not contain genes from sister chromatids; but in potato the probability of double reduction varies from 0 at the centromere to as high as 0.07 at the telomere (Bourke *et al.* 2015). In the context of a polygenic trait, with genes distributed across the entire chromosome, the effective value of the double reduction parameter is expected to be small. When the probability of double reduction was increased to 0.05 for computing **A**, the goodness of fit and intercept for the regression were unaffected but the slope decreased from 0.79 to 0.73.

Figure 2 compares the off-diagonal elements of the digenic dominance covariance matrix **D** against the corresponding elements of **G**. For close relationships, *G _{ij}* ≥ 0.4,

**G**and

**D**were highly correlated (

*r*= 0.81); but when these pairs were excluded the correlation dropped to 0.08. The overall scaling of

**D**is such that the mean of its diagonal elements equals 1 at panmictic equilibrium, which is very near the observed value of 0.99.

A potential concern when using genomic relationship matrices is that estimates of genetic variance may be too low due to insufficient marker density. Yang *et al.* (2010) presented a method to assess (and correct) this issue. First, the markers are randomly partitioned, such that one half represent QTL and the other half markers; **G** (or **D**) is then calculated using each half separately; and finally **G**_{QTL} is regressed onto **G**_{mark}. For both the **G** and **D** matrices in our data set, the mean regression coefficient after 100 iterations was 1.00 (SD 0.01), indicating sufficient marker density under the assumption that markers are sampled from the same distribution as QTL.

### Variance components

Phenotype data for three economically important traits were analyzed: total yield, specific gravity (as a proxy for dry matter content), and potato chip fry color after 6 months of storage. Initially, genotype effects were modeled as independent to estimate the total genetic variance (V_{g}) and the variance of the genotype × year interaction (V_{gy}). The V_{g} estimate was higher than V_{gy} for all traits, ranging from 2.7× higher for yield to 5.6× higher for specific gravity (Table 3).

By adding another random effect to the baseline model, with covariance proportional to **A** or **G**, the genetic variance (V_{g}) was partitioned into additive (V_{a}) and residual genetic variance (V_{r}), the latter corresponding to the independent clone effect (Table 1). Both **A** and **G** lowered the AIC compared to the baseline model for all traits, with the **G** matrix producing a better fit for yield and fry color *vs.* the **A** matrix for specific gravity (Figure 3). Using **A**, the proportion of genetic variance due to additive effects was 0.52 for specific gravity, 0.59 for yield, and 0.76 for fry color (Figure 4). When **G** was used, the additive genetic variance estimates were reduced by 0.12–0.18 of the total genetic variance, depending on the trait.

For specific gravity and fry color, including additive × additive epistasis lowered the AIC compared to the additive G-BLUP model; but the AIC increased when dominance was included (Figure 3). For these traits, a substantial amount of the estimated additive variance in the G model became additive × additive epistasis in the G+GG model: V_{a} dropped from 34 to 20% of V_{g} for specific gravity and from 63 to 45% for fry color, with 44–51% of the genetic variance captured by **G**#**G** (Figure 4, SE in Table S2 in File S1). For yield, neither dominance nor additive × additive epistasis improved the AIC compared to G-BLUP, and only 10% of the genetic variance was captured by **D** or **G**#**G** compared to 45% for the residual clone effect.

### Prediction accuracy

Many studies use random cross-validation to assess the accuracy of genome-wide prediction. However, in the context of a pedigreed breeding population, this approach leads to training-set individuals that are descendants of individuals in the validation set, which is unrepresentative of how GS will be used in practice and may produce unrealistically high accuracies. To avoid this pitfall, we used the pedigree depth metric to partition the population into a set of 168 candidates for selection (depth ≥12) and a set of 403 clones ancestral to this group (depth <12) as the training set. The selection candidates were further narrowed by excluding clones with insufficiently reliable data for validation, leaving 54 clones for yield, 132 clones for specific gravity, and 49 clones for fry color (with mean reliability in the range 0.71–0.72 for all traits).

Figure 5 shows the accuracy (left) and regression coefficient (right) when using each of the models in Table 1 to predict total genotypic value in the validation set. Prediction accuracy using only the **A** matrix was just over 0.5 for total yield *vs.* 0.4 for specific gravity and fry color. Replacing **A** with **G** improved accuracy for yield by 0.03 and fry color by 0.06, but decreased the accuracy for specific gravity by 0.07, which is consistent with the trend observed for AIC. Including dominance improved yield accuracy by 0.01 and including epistasis improved specific gravity accuracy by 0.05. Based on the AIC results for fry color, we expected higher accuracy with the G+GG model, but this was not observed. Including epistasis reduced prediction bias for fry color and specific gravity, with regression coefficients >0.9 for the G+GG model compared to 0.75–0.78 for G-BLUP.

To investigate the effect of training population size on accuracy, 200 random subsets of the TP were taken at *N* = 100, 200, and 300 clones (Figure S3 in File S1). For all three traits, prediction accuracy decreased as the TP was reduced. Fry color accuracy was the most sensitive, dropping by 0.28 when population size was reduced from 403 to 100, compared to accuracy decreases of 0.14 and 0.19 for yield and specific gravity, respectively.

We also determined accuracy when using the entire (*N* = 571) TP to predict yield and specific gravity in six unselected F_{1} populations (Table 4). The F_{1} populations ranged in size from 48 to 167 clones, and the number of polymorphic markers ranged from 1376 to 2311 (Table 2). Several of the parents had little pedigree relationship to the TP because they were russet clones, which is a distinct market category from the round, chip-processing type. Prediction accuracies ranged from 0.06 to 0.63 with G-BLUP, with no discernible connection between accuracy and pedigree relationship to the TP. The G+GG+D model performed very similarly, with no difference in average accuracy (to two decimal places) across the eight cases in Table 4. To assess the value of tetraploid allele dosage, predictions were made with “diploidized” marker data (**G**_{dip}) in which the three heterozygotes were recoded to be identical. The accuracy of the **G**_{dip} model was consistently lower, with an average loss of 0.13.

## Discussion

This is the first study to connect the classical theory of genetic variance partitioning in tetraploids with covariance matrices constructed from genome-wide allele dosage information. Our results are based on a two-stage analysis in which the genotype estimate for each clone × environment combination was calculated in stage one by assuming independent effects, and in stage-two genomic covariance matrices were used. We note that not all references to two-stage analysis in the literature employ this convention; often a single genotype mean across all environments is estimated in the first stage. In our data set, there were 849 genotype × year means estimated in stage one for the 571 clones (File S4), and this partial replication is expected to improve the precision of the variance component estimates compared to analyzing a single mean per genotype in stage two (Kruijer *et al.* 2015).

The partial replication across years also allowed for explicit modeling of a residual genetic effect with no covariance structure in addition to the additive, dominance, and epistatic random effects. We interpret the residual genetic effect to include higher-order nonadditive effects (*e.g.*, trigenic dominance, additive × dominance epistasis) and genetic variance not captured by the markers. The latter might appear to be ruled out based on our analysis of the **G** and **D** matrices using the Yang *et al.* (2010) method, but this assumes QTL are drawn from the same distribution as the markers. In reality, low-frequency alleles are underrepresented on the potato 8303 SNP array and are expected to contribute residual genetic variance (Vos *et al.* 2015). The estimates for V_{d} and V_{aa} were sensitive to whether the residual genetic effect was included in the model. Without it, V_{aa} for yield was estimated at 40 (SE 14) Mg^{2} ha^{−2}, which is 45% of the total genetic variance (88 Mg^{2} ha^{−2}). When the residual effect was included, most of this variance shifted into V_{r}. This phenomenon and the large SE of the estimates (Table S2 in File S1) suggest that the partitioning of the nonadditive genetic variance is uncertain, probably because of limited population size.

At first glance, our finding that more of the genetic variance was additive for yield (45%) compared to specific gravity (20%) seems unexpected. Diallel studies have typically found the ratio between SCA and GCA to be higher for yield. Tai (1976) reported SCA/GCA = 3.8 for yield *vs.* 0.6 for specific gravity, and Bradshaw *et al.* (2000) reported SCA/GCA = 0.75 for yield *vs.* 0.06 for specific gravity. Whereas these earlier studies used unselected populations, the clones in our TP had been selected for high specific gravity at least once, and in many cases for 2–3 years. Specific gravity (which is closely correlated with dry matter content) is one of the most important traits for the chip processing market, and strong selection early in the variety-development process is practiced because the trait shows relatively little genotype × environment interaction (Table 3; Wang *et al.* 2017). Our results suggest that, in the tail of the phenotype distribution for specific gravity, the partitioning of genetic variance is shifted toward nonadditive effects.

A unique feature of our study compared to previous reports of GS in autotetraploid species, such as alfalfa (Annicchiarico *et al.* 2015; Li *et al.* 2015) and potato (Habyarimana *et al.* 2017; Sverrisdóttir *et al.* 2017), was the use of SNP array data rather than genotyping by sequencing (GBS). A major benefit of the SNP array for polyploids is the ability to accurately determine allele dosage (Voorrips *et al.* 2011; Schmitz Carley *et al.* 2017), but the cost of the array is determined by sales volume and can be prohibitively expensive for GS in small breeding programs or minor crops. GBS methods achieve low per-sample costs by pooling many samples into one library for sequencing; but much higher read depth (per sample–marker combination) is needed for accurate genotype assignment in tetraploids compared to heterozygous diploids. By comparing GBS with KASP markers in potato, Uitdewilligen *et al.* (2013) recommended 60–80× read depth to differentiate the three heterozygous genotypes, which agrees well with theoretical calculations (J. B. Endelman, unpublished data). For GS, a pressing question is whether paying for more sequencing to improve estimates of allele dosage provides a return on investment in terms of prediction accuracy and ultimately genetic gain. A completely general answer may be elusive due to complex interactions between GBS method, population, and phenotype; but our finding that diploidization of the marker data consistently reduced prediction accuracy in the F_{1} populations (by 0.13 on average) highlights the need for further research.

In plant breeding, both the total genotypic value and additive value are relevant for selection. For many crops, the unit of commercial production (*i.e.*, inbred, F_{1} hybrid, or vegetative clone) is the same genotype evaluated by the breeder and therefore selection should be based on total genotypic value. When selecting new parents, however, only the additive value should be considered because nonadditive effects are less efficiently transmitted to progeny. We have demonstrated the feasibility of this paradigm for potato using the G+GG+D model, although questions remain regarding its optimal implementation. For many issues, further progress will require larger populations genotyped with less ascertainment bias.

## Acknowledgments

Financial support was provided by Potatoes USA, the U.S. Department of Agriculture National Institute of Food and Agriculture, Award 2014-67013-22418, and U.S. Department of Agriculture Hatch Projects 1002731 and 1013047.

Author contributions: J.B.E. designed the study. J.B.E. and C.A.S.C. analyzed the data and drafted the manuscript. All authors contributed germplasm, data, or financial resources.

## Footnotes

Supplemental material is available online at www.genetics.org/lookup/suppl/doi:10.1534/genetics.118.300685/-/DC1.

*Communicating editor: M. Sillanpää*

- Received January 1, 2018.
- Accepted February 27, 2018.

- Copyright © 2018 by the Genetics Society of America