## Abstract

The classic diallel takes a set of parents and produces offspring from all possible mating pairs. Phenotype values among the offspring can then be related back to their respective parentage. When the parents are diploid, sexed, and inbred, the diallel can characterize aggregate effects of genetic background on a phenotype, revealing effects of strain dosage, heterosis, parent of origin, epistasis, and sex-specific versions thereof. However, its analysis is traditionally intricate, unforgiving of unplanned missing information, and highly sensitive to imbalance, making the diallel unapproachable to many geneticists. Nonetheless, imbalanced and incomplete diallels arise frequently, albeit unintentionally, as by-products of larger-scale experiments that collect F_{1} data, for example, pilot studies or multiparent breeding efforts such as the Collaborative Cross or the *Arabidopsis* MAGIC lines. We present a general Bayesian model for analyzing diallel data on dioecious diploid inbred strains that cleanly decomposes the observed patterns of variation into biologically intuitive components, simultaneously models and accommodates outliers, and provides shrinkage estimates of effects that automatically incorporate uncertainty due to imbalance, missing data, and small sample size. We further present a model selection procedure for weighing evidence for or against the inclusion of those components in a predictive model. We evaluate our method through simulation and apply it to incomplete diallel data on the founders and F_{1}'s of the Collaborative Cross, robustly characterizing the genetic architecture of 48 phenotypes.

THE diallel is one of the oldest designs in genetics and one whose analysis is notoriously complex. The premise is simple: given a set of *J* parents, generate and phenotype offspring from all *J* × *J* reciprocal crosses and from these data estimate genetic parameters that characterize how the parental genomes and sex influence phenotypic variation. Using this design one can estimate the average parental contribution to the phenotype and the effect of specific combinations with other parents. When the parents are inbred strains, one can also estimate parent-of-origin effects. Despite the potential wealth of information contained in a diallel, there has been much to discourage its use in practice. Controversies about the interpretation of estimated parameters have been inextricably confounded with controversies about the analysis methods themselves, and much of the discussant literature is steeped in terminology unfamiliar to potential users. Indeed, to the outsider, the diallel emerges as an arcane puzzle that is perhaps best avoided in favor of simpler designs.

The diallel originated in animal and plant breeding as an extension of the idea that, from a breeding perspective, you should judge the value of an individual by the phenotypes of its offspring (Christie and Shattuck 1992 and references therein). It was originally defined by Schmidt (1919) as the set of all possible *J*^{2} pairwise crosses and was later introduced into the mainstream genetics literature by Jinks and Hayman (1953). In the decade that followed, the diallel, whose definition quickly broadened to encompass any set of F_{1} crosses between *J* > 2 parents, caught the attention of an active group of quantitative geneticists who went on to develop a series of elaborations of the design and its analysis. Among the simplest and most popular analytic decompositions is that of Griffing (1956). If η* _{jk}* is the mean phenotype or predicted value for the cross of parent

*j*with parent

*k*, then the parental effects can be modeled as

*g*is the main effect of parent

_{j}*j*, and

*s*is the statistical interaction of

_{jk}*j*and

*k*, that is, the deviation from the combined main effects induced by the specific pairing of parents

*j*and

*k*. Following the terminology introduced by Sprague and Tatum (1942) and used throughout diallel literature,

*g*is the generalized combining ability (GCA) of parent

_{j}*j*whereas

*s*is the specific combining ability (SCA) of the parents

_{jk}*j*and

*k*. GCA captures aggregate effects of additive genetics whereas SCA reflects aggregate genetic effects that lead to departures from additivity, such as dominance and epistasis.

Numerous extensions to Griffing's model have been proposed to extract more subtle effects from the diallel. These include decomposing SCA into dominance, heterosis, and epistasis components (Hayman 1957; Gardner and Eberhart 1966), into reciprocal effects (Griffing 1956), their further decomposition into maternal and paternal effects (Cockerham and Weir 1977; Zhu and Weir 1996), and sex-linked variants thereof (Carbonell *et al.* 1983). Conversely, interest in obtaining GCAs with fewer than *J*^{2} crosses has motivated variants of the design such as the half-diallel (Griffing 1956) and the partial diallel (Kempthorne and Curnow 1961), among others (see Christie and Shattuck 1992; Lynch and Walsh 1998), which have themselves led to technical innovation (*e.g.*, Greenberg *et al.* 2010).

Disagreement about the precise meaning of parameters estimated from a diallel cross has presented a theoretical stumbling block to their interpretation. The parents could be inbred lines, independent outbreeding populations (such as open-pollinating varieties of corn), or outbred individuals (Eberhart and Gardner 1966). They could have been chosen deliberately, sampled randomly, or some compromise of these. The intention of the experiment could be to draw inferences about the parents themselves, the populations each parent represents, or the broader population from which all parents were drawn. Joint consideration of such factors weaves through much of the foundational diallel literature from 1950 to 1970 and has continued to represent a source of controversy (Baker 1978; Wright 1985).

A more practical stumbling block arises from the difficulty of estimating parameters from diallel data that are incomplete, imbalanced, or contaminated with outliers. Although some diallel crosses are created deliberately, a significant number arise as a by-product of intermediate stages in a multiparent breeding program. Such accidental diallels can contain valuable information, but their often haphazard patterns of missingness make them an imperfect match for well-studied designs. For many incomplete diallels it has been unclear how to analyze the data without discarding observations, drastically reducing the scope of inference, or making other significant compromises.

Even when traditional analysis methods accommodate the design, choice about which parameters (*e.g.*, explicit models of dominance, SCA, etc.) should be included in the model can alter the estimates and interpretation of the other parameters. The option of model selection by significance testing of individual terms, frequently proposed in older literature, provides some guidance but is unsatisfactory in that included parameters are estimated in a way that disregards uncertainty in model choice. The notion that any *a priori* plausible effect should be excluded from modeling seems to us artificial and out of step with modern approaches to applied statistical inference (*e.g.*, Gelman and Hill 2007).

We propose a general and efficient method for the analysis of diallel crosses and applying it to a data set of 48 phenotypes collected from an incomplete eight-strain diallel that arose serendipitously from the establishment of the Collaborative Cross (Churchill *et al.* 2004; Chesler *et al.* 2008; Collaborative Cross Consortium 2012). Our methods of analysis provide an inferential framework that is robust to imbalance in the design, missing data, and outliers. We model a wide range of effects including additive, heterosis, epistatic, parent-of-origin, and sex-specific variants thereof. This structure accomplishes two important goals. The first is familiar and constant interpretation of parameters across models. The second is stable and coherent estimation and prediction that is achieved through hierarchical Bayesian shrinkage and model selection.

## Statistical Models and Methods

We describe our hierarchical decomposition of the diallel in stages. Starting with the simplest submodel, which describes only additive strain dosage effects, we present successive elaborations, building up to the model depicted in Figure 1 with parameters listed in Table 1. We then state the full model in compact form, detailing its efficient estimation and methods for choosing informative submodels. Models are described by a quoted string of characters in alphabetical order in the first column of Table 1 where, following marginality restrictions (Venables and Ripley 2002), some components imply the presence of others unless otherwise stated.

### The “a” model

Consider an incomplete diallel arising from crosses among *J* inbred strains. Index the mothers with *j* and the fathers with *k*, such that any mating pair is (*j*, *k*) ∈ {1, … , *J*}^{2}. Let *y _{i}* be the phenotype value of individual

*i*and denote the mother, the father, and the mother–father pair relevant to individual

*i*as

*j*[

*i*],

*k*[

*i*], and (

*j*,

*k*)[

*i*]. For a continuous phenotype with normally distributed errors, we model

_{j}and father

_{k}are the contributions from mother and father, and ε

*∼*

_{i}*N*(0, σ

^{2}) is a deviation due to the specific individual (

*i.e.*, the residual). In this simplest model, each parental contribution amounts to a “dose” of the underlying strain;

*i.e.*,

*a*is the dose of the parental strain

_{j}*j*(equivalently for

*a*) and the effects are additive in the sense that two doses of strain

_{k}*j*increase the expected phenotype value twice as much as one dose. The estimated effects

**a**to be the sum of squared errors (

*i.e.*, quadratic loss),

*a*'s hierarchically, as if drawn from a common normal distribution,

_{j}*a*that is dynamically shrunk toward zero to the extent that its supporting data are few and their dispersion

_{j}In the “a” model, *a _{j}* coincides with the GCA for strain

*j*described by Sprague and Tatum (1942). Nonetheless, we make further comparisons with GCA and SCA only parenthetically, as these constructs hinder interpretation of later models. When the “a” model is seen as an estimation problem with primary interest on

**a**, the variance of additive effects

*Appendix A*.

### Accommodating outliers

When outliers are suspected, maybe as a result of erratic measurement error, it can be desirable to model the phenotype as being sampled from a distribution with heavier tails than the normal distribution. To simultaneously accommodate and detect outliers we model the individual deviations ε* _{i}* as if drawn from a scale mixture of normal densities,

*is modeled as 1/ν*

_{i}_{ε}times a draw from a chi-square distribution with ν

_{ε}d.f. As a result, ε

*is now*

_{i}*t*-distributed as ε

*| σ*

_{i}^{2}, ν

_{ε}∼

*t*(ν

_{ε}, 0, σ

^{2}). The scale λ

*for each data point has a prior mean*

_{i}*E*(λ

*| ν*

_{i}_{ε}) = 1.

*A posteriori*, λ

*acts as an indicator of the*

_{i}*i*th data point's outlier status, with

*E*(λ

*| ν*

_{i}_{ε}, Data) ≪ 1, suggesting a highly deviant observation. Setting ν

_{ε}→ ∞ implies that residual errors closely reflect a normal distribution, while lower values of ν

_{ε}imply increasing probability of large outliers (West 1984; Carlin and Louis 2008). For ν

_{ε}we consider values of 15 (only slightly heavier tailed than the normal), 3 (large outliers likely; the lowest integer ν

_{ε}for which the

*t*has finite variance), and 6 (an intermediate value, advocated in,

*e.g.*, Greenberg

*et al.*2010).

### Inbreeding and dominance effects: the “Bab” model

Hybrid vigor, or heterosis, describes the change in phenotype value due to heterozygosity when crossing two inbred lines (Lynch and Walsh 1998). It is conventional to model this effect as a dominance term that describes the deviation of hybrid individuals from the expected average of homozygous phenotypes. However, we believe the diallel is more naturally modeled in a converse manner, with inbreds, not hybrids, as the deviant type. In a full diallel, inbreds are in the minority, corresponding to 1/*J*th of the crosses. Even when the diallel is sparse, inbred crosses would seldom outnumber hybrids. We therefore define our predictive baseline as primarily modeling effects in hybrids, but accommodating inbred-specific effects through a deviation. First, Equation 2 is elaborated as_{(jk)[i]} is a deviation specific to mother–father pair (*jk*). In the “Bab” model, this pair effect is then modeled as a strain-specific inbred penalty *b _{j}* drawn from a common distribution centered at fixed effect β

_{inbred};

*i.e.*,

*a*. In our case,

_{j}*a*now estimates the dosage effect of strain

_{j}*j*when combined with another sampled strain. Defining the additive effects this way results in more stable and precise estimates for all effects in the diallel.

### Parent-of-origin effects: the “Babm” model

The full diallel includes reciprocal crosses of both mother_{j} × father_{k} and father_{j} × mother_{k}. This allows us to estimate strain-specific effects of maternity (which could include the uterine environment for mammals, mitochondrial effects, etc.) *vs.* paternity. We model these parent-of-origin effects as a symmetric deviation about the “Bab” model. If *m _{j}* is the “maternal” contribution from mothers of strain

*j*, then we revise Equations 3 and 4 to

### Symmetric and asymmetric effects: “Babmvw”

Departures from the “Babm” model corresponding to statistical interactions between pairs of strains can be represented through two additional layers of effects, revising Equation 9 to

### Sex-specific effects

Sex-specific effects can add considerable complication to a statistical model because it can not only double the number of parameters but also change the meaning of the parameters if, for example, the effect is expressed as an offset for one sex. Here we expand our model while preserving the meaning of the existing parameters, modeling the effect of an individual being female *vs.* being male through a symmetric deviation about an unsexed mean. Let ψ encode femaleness,*q*, adding of the term ψ(sex) · *q* to the model pushes the expected phenotype up *q*/2 for females and down *q*/2 for males. We can view females and males as modeled by Equations 14 and 15, respectively,*q _{j}* ∈ {

*a*,

_{j}*b*,

_{j}*m*},

_{j}*q*∈ {

_{jk}*v*,

_{jk}*w*},

_{jk}*q*∈ {

*v*,

*w*},

_{female}and β

_{female.inbred}are modeled as fixed effects, and each strain-specific sex-deviation component

*q*is modeled as random with its own variance

### The full model

Including fixed covariates **x**_{i} and *R* random-effect components *r* ∈ {1, … , *R*}, the “full model” (B_{s}Sa_{s}b_{s}m_{s}v_{s}w_{s}) is_{ε} ≠ ∞ described as the “full model with outliers” or “

### Prior elicitation

We model prior belief about the fixed effects μ, β_{inbred}, β_{female}, and β_{female.inbred}, using diffuse normal priors ^{2} and _{s}, b_{s}, m_{s}, w_{s}, v_{s}, we model prior belief using inverted chi-square distributions of the form τ^{2} ∼ μ_{τ} × Inv – χ^{2}(ν_{τ}) [equivalently described by the inverted gamma IG(ν_{τ}/2, μ_{τ}/2)], which is conjugate with our normal-likelihood and fixed-effect priors. To model vague prior information on reasonably scaled data, we apply weak priors with * _{τ}* = 0.02 and μ

_{τ}= 2 for the variance parameters. The information in the variance priors is equivalent to adding 0.02 datapoints from an additional strain.

### Setting shrinkage priors to beat the maximum-likelihood estimate

Within the class of unbiased regression estimators, the standard maximum-likelihood estimate *i.e.*, admissible) when judged by mean squared error (MSE) (*i.e.*, dominated) by biased estimators that employ shrinkage (*e.g.*, Parmigiani and Inoue 2009). In particular, ridge regression and generalized ridge regression will dominate the maximum-likelihood estimate (MLE) under MSE loss if**X** is a design matrix and **Q** is any generalized ridge shrinkage matrix (using Theorem 3.19 in Gross 2003). In our case, **X** specifies individuals’ parentage (see *Appendix B*), and **Q** is a diagonal matrix with **Q**^{−1} large enough for the above condition to likely hold. Thus, for large values of β* _{j}*, one would want large

*i.e.*, ∝ 1, as given by ν

_{τ}= 2, μ

_{τ}= 0, after,

*e.g.*, Gelman 2006, or our choice of a nonzero

*μ*) will often be preferable to the traditional Jeffreys prior [∝ 1/τ

_{τ}^{2}; μ

_{τ}= 0, ν

_{τ}= 0 (Jeffreys 1946)], which includes strong prior weight around

### Posterior estimation

We estimate posterior distributions for all parameters, using an efficient Gibbs sampling scheme. Details are provided in *Appendices B* and *C*.

### Posterior inference

We obtain raw posteriors by collecting sampled values for each parameter at each MCMC iteration. However, the Gibbs sampling scheme above gives marginal posteriors that appear overly vague for parameters that are grouped by a common variance [*e.g.*, **a** = (*a*_{1}, … , *a _{j}*)]. This is due to the inherent lack of identifiability in a highly parameterized mixed model. For example, a range of alternative values for

**b**,

**v**, and

**w**will produce identical predictions for

*y*. Hierarchical modeling means these alternatives are distinguished by their different posterior probabilities, but less so when the posteriors of

_{i}*t*= 1, … ,

*T*we calculate recentered versions of the grouped parameters

_{q}denotes the group of effects assigned to the same dispersion parameter

_{q}| is the number of such effects. The resulting posterior intervals for

*et al.*(2010), which gives a point estimate by finding the maximum of the hierarchical likelihood [

*h*-likelihood (Lee and Nelder 1996)] suggested by our model.

### Posterior prediction

Phenotype predictions for unobserved individuals that both anticipate sampling variation and incorporate all posterior uncertainty are easily obtained from MCMC output. We predict the phenotype *Y*(*j*, *k*, *s*), for a future individual with mother *j*, father *k*, and sex *s*, by constructing the appropriate design matrix **x**_{jks} and using the Gibbs samples to estimate the posterior predictive mean *t* ∈ 1, … , *T*. Prediction intervals for *Y*(*j*, *k*, *s*) are achieved from quantiles of a set of new draws ^{2(t)} is the *t*th draw of σ^{2} in the Gibbs sampler, but with *Z _{t}* being a new independent draw from

*N*(0, 1) or

*t*(ν

_{ε}) random noise.

### Model selection by information criteria

In comparing the suitability of different submodels estimated by Gibbs sampling, we consider the popular deviance information criterion (DIC) (Spiegelhalter *et al.* 2002), defined as **θ** collects all parameters, ℓ(·| **y**) is the log-likelihood, and *T* iterations. Lower DIC suggests a more parsimonious model containing fewer parameters that are poorly estimated.

### Bayesian model selection by exclusionary Gibbs group sampling

With potentially >400 models of interest under consideration (resulting from different combinations of effects), it is impractical to use information criteria for model selection because each model requires its own Markov chain. Furthermore, although the DIC minimizer is often a successful predictor of future *Y*(*j*, *k*, *s*), it is seldom as parsimonious as the true model in simulation. In particular, our Bayesian adaptive shrinkage means that even the full model B_{s}Sa_{s}b_{s}m_{s}v_{s}w_{s} can perform well against a model that is better informed. To deal with selection of parameter subsets *q* ∈ 1, … , *Q* in a way that better identifies valuable components, we consider a zero-inflated mixture prior (George and Foster 2000; Ishwaran and Rao 2005) on * _{q}* of being inactive, that is, equal to zero and having all corresponding effects

_{q}equal to zero. We develop an algorithm to draw from the conditional distribution

*l*∈

*L*as

*e.g.*, Guan and Stephens 2011). In this study, we set

*= π*

_{r}*= 0.5 for all*

_{q}*r*and

*q*. More details of the algorithm are presented in

*Appendix D*.

## Experimental Materials and Methods

### Phenotype data from a diallel of the Collaborative Cross founders

We collected data on multiple phenotypes (Supporting Information, File S1) in a diallel of eight inbred mouse strains (abbreviated names in parentheses), A/J (AJ), C57BL/6J (B6), 129S1/SvImJ (129), NOD/LtJ (NOD), NZO/H1LtJ (NZO), CAST/EiJ (CAST), PWK/PhJ (PWK), and WSB/EiJ (WSB), which are the founder strains of the Collaborative Cross (CC) (Churchill *et al.* 2004; Chesler *et al.* 2008; Collaborative Cross Consortium 2012). It would be expected that genetic effects present in the diallel will replicate in the CC itself, which motivates our interest in this population. Table 2 lists the phenotypes collected, transformations used to normalize each phenotype before diallel analysis, and the completeness of the data for each phenotype. All CC F_{1} animals had free access to standard laboratory chow containing 6% fat by weight (LabDiet 5K52) and acidified drinking water throughout the phenotyping protocol. Mice were on a 12-hr:12-hr light:dark cycle beginning at 6:00 am and were housed two to five animals per pen in pressurized, individually ventilated cages.

### Blood composition (ADVIA)

Whole blood was obtained in the morning from the retro-orbital sinus of nonfasted animals 7 weeks of age. Collection of 200 μl from each animal was performed using an EDTA-coated microhematocrit tube directed into a 1.5-ml microcentrifuge tube containing 2 μl of 10% EDTA. Samples were analyzed on the Bayer Advia 120 autoanalyzer within the 4 hr following collection.

### Blood pressure

Animals were acclimated to a dedicated room for blood pressure (BP) measurement on the Friday before a 5-day testing period beginning the following Monday. Systolic blood pressure and pulse were measured using the BP-2000 tail-cuff system (Visitech Systems, Apex, NC). Four unanesthetized mice were placed on a warmed platform (37°) and each was held in place using a magnetic restraining cover. The tail is placed through a cuff and held with a magnetic sensor unit that detects when blood flow stops and starts. Each day 30 measurements are obtained per mouse. Mice were trained to the apparatus for the first 3 days of the testing period and data were collected for analysis in the last 2 days, for average values based upon 60 total measurements. Animals were 10 weeks of age when blood pressure was measured.

### Plasma chemistries

Whole blood was obtained from 8-week-old animals after a 4-hr period of food removal in the morning (7:00–11:00 am). Collection of 150 μl from each animal was performed using a heparin-coated microcapillary tube inserted into the retro-orbital sinus and directed into a 1.5-ml microcentrifuge tube containing 2 μl of 1000 units/ml heparin. Samples were placed on ice prior to centrifugation at 10,000 rpm in a refrigerated microcentrifuge for 10 min. Plasma was collected into a clean tube for analysis on the Beckman (Fullerton, CA) CX-Delta5 Chemistry autoanalyzer.

### Densitometry and body composition

Body composition was assessed when mice were 16 weeks of age by dual-energy X-ray absorptiometry (DEXA), using a Lunar PIXImus densitometer (GE Medical Systems) after mice were anesthetized intraperitoneally with tribromoethanol (0.2 ml 2% solution per 10 g body weight). Because the skull is so bone dense, it is omitted from the DEXA analysis. Mice were weighed using an Ohaus Navigator scale with InCal calibration to accommodate animal movement.

### Electrocardiography

Unanesthetized mice aged 12 weeks were placed on the ECGenie (Mouse Specifics, Quincy, MA) for analysis of electrocardiogram parameters. Recording is initiated when the paws of the animal contact a 3-lead electrode plate. Data are analyzed using manufacturer's software.

## Simulations

We assess the performance of our methods by simulation, evaluating their ability to infer genetic parameters and to predict future phenotypes on an 8 × 8 diallel of these inbred strains. Two genetic architectures are considered, one simple, with additive effects only, and one more complex, with additive, inbreeding, maternal, and sex-specific maternal effects. We refer to our general Bayesian model as “BayesDiallel” and to its associated model selection procedure as “BayesSpike”. See File S1.

### Estimation and prediction of additive genetic effects in a simulated diallel

Gibbs sampler approaches can be difficult to compare with non-Bayesian methods or even with each other, given their indefinite approach to a point estimate. For the outlier model with low degrees of freedom, the posterior may possibly be multimodal. Furthermore, given the high dimensionality and structure of our decomposition of the diallel, not all parameters receive the same information content per complete diallel replicate, and whereas some parameters are better informed by incomplete diallels than others.

To start, we compare BayesDiallel and BayesSpike to estimates obtained from ordinary linear regression, using Griffing's model in Equation 1. We simulate an 8 × 8 complete diallel with five replicate individuals in each cell and assume all individuals are of the same sex. The first two columns of Table 3 (top section) list the values of the simulated parameters: eight additive strain effects (*a*_{1}, … , *a*_{8}), an intercept (μ), and noise variance (σ^{2}). Sampling from a normal distribution, these are used to generate 5 × 64 = 320 simulated phenotypes for the diallel. Columns 3 and 4 give MLEs and 95% confidence intervals from linear regression using Griffing's model for GCA (Equation 1 without the *s _{jk}*, which is directly competitive with the a model in Table 1) and for GCA + SCA (Equation 1 including

*s*, which is directly competitive with the “av” model, as defined in Table 1). In this setting the GCA model is at a distinct advantage: it knows

_{jk}*a priori*the true architecture and can thus save degrees of freedom from fitting specious parameters. The misinformed GCA + SCA model, however, risks overfitting unless the point estimates for the SCA effects happen to be small. We then consider several options for analyzing the diallel within the (unsexed) BayesDiallel framework: the full model, which we fit with and without consideration of outliers, despite the fact that outliers are not simulated; the model that minimizes the DIC among all 2

^{6}= 64 unsexed models; the true model, which is a Bayesian version of the GCA; the full model fit by hglm, which maximizes the

*h*-likelihood (see

*Statistical Models and Methods*); and the component-switching BayesSpike, which attempts explicitly to learn components of value. For the BayesSpike models we report the marginal posterior median (after Meng 2008); for the hglm fit, we report the point estimate and 95% confidence interval; for all other BayesDiallel methods we report the posterior intervals, calculated as the central 95% quantiles of the posterior distribution (Carlin and Louis 2008). For the Bayesian methods, we consider their ability to select out parameters either through severe shrinking or through formal selection. In the “top false” row in Table 3, we report for the BayesSpike the largest MIP of untrue fixed and random components, assuming a (generous) prior MIP of 0.5 per component; for the remaining Bayesian methods, we report the estimated effect for the largest untrue fixed component and the τ

^{2}

_{q}estimate for the largest untrue random component

*q*.

Diallel analysis should (at minimum) help the researcher predict phenotypes of new individuals from sampled crosses. We generate 1000 new simulated diallels on the basis of the same genetic architecture (*i.e.*, using the parameters in Table 3, column 2). Separately we use the point estimates from each method to predict the expected phenotype of the new individuals in each cell and compare these predictions with each of the 1000 subsequently observed data sets. The bottom row in Table 3 compares predicted with observed values, reporting the MSE as the average over simulations (upper row) and as its 95% central quantile (lower row).

Athough μ tends to have a wide confidence/prediction interval relative to the additive effects, Table 3 shows that the BayesDiallel models can meet or improve on the MLE for GCA in terms of prediction error and point estimates, despite the fact that the full model has 82 parameters. Against the GCA + SCA MLE, however, any method using hierarchical shrinkage is twice as successful in forecasting new phenotypes. This advantage to hierarchical models would reduce with smaller σ^{2}. The symmetric and asymmetric effects (v and w) tend to be the false components most likely to enter the model when using BayesSpike, although they are typically estimated with a much smaller magnitude than the additive effect.

Table 4 shows results from 100 simulation experiments, with 1000 test data sets per experiment. Values in this table measure the mean discrepancy in estimated effects and prediction MSE. Noise level, σ^{2} = 120, represents a lower limit on the performance of any estimator. We see that when the model is overspecified, as in the GCA + SCA model, lack of shrinkage severely affects the consistency of the MLE.

### Inferring a complex genetic architecture: a “BSabm_{s}” model

To investigate the performance of the Bayesian model when many effects are present, we simulated a more complex genetic architecture that included sex (“S”), maternal (“m”), sex-specific maternal (“m_{s}”), and inbreeding (“Bb”). In this case the nonzero effects were **a** = (−7.21, −5.77, −2.88, −0.72, 0.72, 2.16, 5.05, 8.65), **b** = (4.12, −3.88, −3.88, 2.12, 3.12, −2.88, 0.12, 1.12), **m** = (1, 1.63, −2.54, 7.24, 9.76, −14.18, 0.19, −3.08), *φ*^{(m)} = (3.75, 3.25, 4.25, −11.75, 0.75, −0.25, −4.75, 4.75), μ = 10, β_{female} = 4, β_{inbreed} = –4, and σ^{2} = 120. These were chosen so that ∼20% of variation was explained by the “a” component, 20% from “m”, 12% from “m_{s}”, and 0.5% from “b”. We generated 295 simulated data sets of 8 × 8 diallels with five replicates of each sex (640 individuals per data set). To each diallel we applied all of the methods tested above, except model selection using DIC (which was intractable for multiple replications when the number of models was 400+ in sex-effect models). Table 5 provides a summary of prediction and estimation error analogous to Table 4. In this case, we see that all of the BayesDiallel models are able to adapt to the active components, producing acceptable prediction error. Predictions are weaker for strain-specific “b” and “m_{s}” effects, reflecting the smaller amount of data that inform them. Griffing's models are shown for comparison, but as expected, perform poorly in this realm.

Figure 2 plots the results from fitting the Bayesian model to 50 of the simulated diallels described in Table 5. The black line in Figure 2, A and B, describes the set of parameters used for the simulation, plotting for each component *q* ∈ {*Q*, *L*} (*i.e.*, among the *Q* random components or *R* fixed-effect components) the parameter value β* _{q}* if it is a fixed effect or the SD of its vector

**q**if it is a random effect. In Figure 2A, each colored line summarizes estimates from the full model applied to one simulated data set, plotting

*E*(β

*| Data) and SD(*

_{q}*E*(

**q**|Data)), as appropriate. Figure 2B does the same for the BayesSpike, using median(

*β*| Data) and SD(median(

_{q}**q**|Data)). Figure 2, A and B, shows that whereas the full model shrinks spurious components, BayesSpike forces them to zero. Figure 2C plots the posterior MIP,

_{q}_{0}= 0.5, with the black line now indicating the median. Figure 2D shows MIPs for BayesSpike allowing for outliers at 6 d.f. and appears similar to Figure 2C. However, Figure 2 E and F, which plot the same data in a different way, reveal an important difference. They show the log

_{10}of the Bayes factor, calculated as

## Application to Data

We apply our Bayesian models and BayesSpike procedure to data on 48 phenotypes, collected on mice from a diallel of founders of the Collaborative Cross. We start with an analysis of mouse weight, for which we were able to collect almost a full diallel. We then describe our semiautomated analysis of 48 phenotypes, commenting on select examples that demonstrate robustness and stability in the presence of sparsely sampled data, and richly characterize genetic and parent-of-origin architecture when data are abundant.

### Analysis of weight data in a diallel of the Collaborative Cross founders

Body weight data were obtained for 292 female and 302 male mice (Figure 3). Shaded cells in Figure 3, A and B, represent observed phenotypes with the degree of shading indicating the average weight of mice in each group (log_{10} scale). Crossed boxes indicate the absence of phenotyped animals. Although mostly complete, this diallel is imbalanced: cells contained between 2 and 20 mice, with 1?13 male and 1?7 female. Visual inspection of this diallel reveals some striking trends. The dark banding in column and row 5 shows that the genome of the “New Zealand Obese” (NZO) mouse exerts a strong weight-inducing effect on its progeny (Taylor *et al.* 2001). The asymmetry of the banding, however, suggests this effect is transmitted more strongly from the mother than from the father. A similar asymmetry is apparent for strain AJ (row and column 1). Moreover, Figure 3, A and B, shows means only, unmoderated by shrinkage effects that would factor in the different numbers of mice that contribute to those estimates.

We applied the full model with outliers (B_{s}Sa_{s}b_{s}m_{s}v_{s}w_{s}O_{6}) to the body weight data. Highest posterior density (HPD) (Box and Tiao 1973) intervals of 163 effects parameters based on 8000 posterior samples from four independent Monte Carlo Markov chains are shown in Figure 4. Parameters are divided into four groups: general effects, which include the inbreeding penalty (“inbreed.overall”; B) and strain-specific effects of additive genetics (“additive”; a), inbreeding (“inbreed”; b), and parent-of-origin effects (“maternal”; m); strain pair-specific effects, which encompass effects peculiar to specific strain pairs (v and w); sex-specific effects, which include sex-specific deviations of the general effects (S, B_{s}, a_{s}, b_{s}, m_{s}); and sex/strain pair-specific effects, encompassing sex-specific deviations from the strain pair-specific effects (v_{s}, w_{s}). The general effects clearly show the high additive dosage effect of NZO (“additive:NZO”), plus some evidence for mothers transmitting this effect more strongly (“maternal:NZO”). A more striking parent-of-origin effect is evident in CAST (“maternal:CAST”), which has its 95% HPD most displaced from zero and indicates CAST mothers transmit low body weight more strongly than CAST fathers. The sex-specific effects include an expected drop in weight for females (“female.overall”) but few other strong deviations from zero. The strain pair-specific and sex/strain pair-specific effects are typically more vague. They represent fewer observations and so are more strongly subject to Bayesian adaptive shrinkage, which pulls extreme but sparsely supported means toward the middle. Figure 3, C and D, shows means of the (posterior) predictive distribution: that is, the average value of mice in a new diallel of the same strains that would be expected on the basis of the diallel model and the observed data. These Bayesian predictions incorporate all uncertainty due to finite sampling of the data and prior uncertainty about the parameters.

### Semiautomated analysis of 48 phenotypes in a diallel of the CC founders

We applied the BayesSpike procedure for automated selection of diallel components to all 48 of the phenotypes in Table 2. Figure 5 lists the posterior MIPs for each diallel component, assuming a prior MIP in each case of _{q}_{0} and π* _{q}* are the prior and posterior MIPs for component

*q*∈ {

*Q*,

*L*} (where

*Q*is the set of all effects grouped by a common variance, and

*L*is the set of fixed effects),

*c*= (2|{

_{Q}*Q*,

*L*}|

_{0})

^{−1},

*c*= (–log[min(π

_{q}

_{q}_{0}, 1 – π

_{q}_{0})])

^{−1}, and ‖{

*Q*,

*R*}‖

_{0}= 13 is the number of components considered in the selection. For each phenotype, the posterior MIPs (rounded to 2 decimal places) are based on 10,000 samples from five independent Markov chains.

The numbers given for each effect in Figure 5 indicate how strongly, and in what direction, the data shift opinion about which components should be included. For example, the MIP values for systolic blood pressure (SystolicMean) are mostly near 0.5, indicating little posterior certainty and reflecting the fact that only 188 mice had measurements for this phenotype and these encompassed only 24 of 64 diallel combinations (Figure 6). Yet despite the sparsity of this data set, the BayesSpike returns stable, if vague, posterior opinion about inclusions, and the full model, fitted to the weight data above, provides stable, if vague, posterior distributions (Figure 7) for 175 effects and variance parameters, 188 outlier parameters, 2 × 64 predicted new crosses, and any further combination of parameters that is of interest. In particular, the data set contains no inbreds, which means that posterior information about inbreeding parameters (Figure 7, rightmost column) is almost as diffuse as the original priors, and BayesSpike probabilities indicate the absence of evidence for or against their inclusion. In contrast, HDL cholesterol (HDL), for which there are considerably more data (631 animals with 62/64 cells covered), provides strong evidence for an overall effect of sex (“S”), strain-specific effects of additive (“a”) and dominance (as inbreeding; “b”), and symmetric effects, *i.e.*, effects that are specific to F_{1}'s between particular pairs of strains but for which the mother/father assignment does not matter. It also shows strong evidence against any sex-specific effects (*e.g.*, sex-specific additive effects “a_{s}”) beyond that explained by an overall shift in mean.

Strong evidence for a genetic effect need not imply that the effect exerts strong influence on the phenotype. Figure 8 plots strain-specific effects for some of the phenotypes listed in Figure 5. In this “straw plot”, each colored line tracks the posterior mean for the relevant CC founder strain, with predicted means of inbreds given in the bottom two rows (these contain a double dose of additive effects and so appear magnified). For ease of comparison, all values are shown on the scale of standard deviations of the transformed phenotype (transformations listed in Table 2). The percentage of eosinophils (EOS) (second straw plot) seems only moderately affected by additive genetics in this diallel. Nonetheless, as Figure 5 attests, the evidence for those effects is extremely strong, as is the evidence against substantial contributions from any other genetic, sex, or parent-of-origin effects, save a single symmetric increase attributed to the F_{1} combination of PWK and CAST (95% HPD is between 0.06 and 1.7, posterior mean = 0.88, posterior probability of being ≤0 = 0.99). The weight phenotype (rightmost straw plot) is the same as that described in Figures 3 and 4 and is marked by the effect of NZO escaping the ±2 SD axis boundaries. A similar, if more moderate effect of NZO is evident for HDL cholesterol (Figure 8, fourth plot). White blood cell count (WBC) is among the few phenotypes that show strong evidence for sex-specific effects, as well as unsexed effects (Figure 5, Figure 8). In particular, examination of its posterior HPD intervals (Figure 9) reveals that the sex specificity is confined to epistatic effects (v_{s}, w_{s}) involving strains AJ, WSB, and 129, with a wider range of epistatic effects existing irrespective of sex. These posteriors accommodate, through our outlier model the potential for erratic output from the measuring equipment. Figure 10 plots distributions of the posterior data weight (*i.e.*, the datapoint reliability) attributed to white cell counts from each individual, distinguishing one observation so incongruous as to merit down-weighting to 1/10th of a data point on average. From this robust analysis, white blood cell count emerges as a phenotype that, although apparently free of parent-of-origin effects, has an otherwise complex genetic architecture.

## Discussion

We describe an efficient and general framework for modeling effects of genetics, parent-of-origin, and sex on phenotypes collected for diallels of inbred strains. By deploying a fully Bayesian approach with conjugate priors, imbalance and missing data translate to vagueness in the posterior rather than instability of the estimates. By adopting an MCMC approach to estimation, we provide a flexible environment in which the posterior distribution of arbitrary combinations of parameters, including prediction of new data, can be easily obtained. Moreover, to satisfy all inferential tastes, we describe a rapid and powerful formulation of Bayesian model selection for weighing the evidence in support of each category of effect in the diallel.

Nonetheless, our approach is motivated by our bias: as geneticists focusing on model organisms, we are primarily concerned with characterizing the genetic architecture within the set of *J*^{2} genome combinations with a view to subsequent hypothesis-driven experiments. We less often seek to infer formally parameters relating to the superpopulation of individuals or species from which those inbred strains were drawn. Interestingly, traditional literature on diallel analysis has tended to oppose the use of random effects in our context. It espouses what we call the “random parents commonplace”: that modeling parental contributions as random effects is valid only if the parents have been drawn at random from a larger population and it is the variance parameters of this larger population that the investigator seeks to estimate. It further asserts that when the above conditions are not met, for example, if we are interested in parental effects present in the cross, or if the parents were chosen deliberately, then parental contributions should be modeled as fixed effects (*e.g.*, Griffing 1956; Eberhart and Gardner 1966; Baker 1978). This commonplace persists in current literature, reiterated in, for example, Greenberg *et al.* (2010), who nonetheless develop an elegant Bayesian hierarchical model tailored to analyze a specific type of outbred diallel.

We consider this view in need of updating. Stein (1955) shocked the statistical world by showing that when simultaneously estimating three or more means as part of the same decision problem, fixed-effects modeling is dominated by biased methods that use shrinkage. Trading off errors in one dimension with those in another, the resulting estimates are drawn closer together when dimensions look similar, with useful shrinkage even when those dimensions are unrelated (Parmigiani and Inoue 2009). In the diallel, dimensions are related, making the argument for hierarchical modeling yet stronger. Although it is sometimes hard to concoct a rationale for parental effects coming from a common distribution, it should be easy to intuit that they lie on a common scale and that knowledge about *a*_{1}, … , *a _{j}*

_{−1}provides information about how we would expect

*a*to appear. Bayesian updating provides a coherent rendering of this intuition, allowing information from the data and uncertainty from the priors to propagate through the hierarchy and inform posterior estimates of parameters and predictions of new effects (Bernardo and Smith 2000; Sorensen and Gianola 2004). To make our point, we demonstrate by simulation how a fixed-effects GCA model with 10 parameters is matched or beaten by Bayesian shrinkage models with

_{j}*n*+ 175 parameters in its own backyard (Tables 3 and 4).

When there is prior belief that phenotypic similarity will tend to follow overall genetic similarity (*e.g.*, Kang *et al.* 2008), relatedness (*e.g.*, Lynch and Walsh 1998), spatial distribution, or some other structure that can be incorporated into an expected *J* × *J* correlation matrix **A**, then this could aid estimation, potentially being incorporated by replacing Equation 4 with

The use of a mixed model makes it tempting to interpret variance parameters as heritabilities. However, the very depth of genetic characterization afforded by the diallel, as well as legitimate aspects of the random parents commonplace described above, suggests a more reflective approach. Our decomposition of effects groups into a, b, m, v, w spans many possible reduced models for quantitative inheritance. When we further explore the heritability, or the amount of variance explained, by each group of effects, we find that, in a diallel breeding structure, the variance contributions of the groups interact. For an expedient method of calculating *q*, we promote a calculation using the estimated *Appendix A*).

In our model we do not treat explicit variance heterogeneity (Rodriguez *et al.* 1993). However, our outlier approach implicitly adapts to variance heterogeneity by assigning reduced weights λ* _{i}* to measurements it deems to be more variable. Postprocess exploration of

*|*

_{i}*j*,

*k*,

*S*) can be achieved by modeling 1/λ

*∼ exp{x*

_{i}_{i}

^{T}

**γ**}, replacing the update for λ

*with Gibbs step*

_{i}*values. Ideally,*

_{i}**γ**would have a reduced form including just additive, or sex-specific additive, components to retain model parsimony. See Rönnegård and Valdar (2011) for an example of such a double generalized linear model in practice.

Bayesian models are often slower to compute and more difficult to interpret, and can require a large and ill-defined set of prior choices. Although it is often the case that estimators using reasonable hyperpriors can outperform the MLE over a large parameter space, for most loss functions neither MLE, nor ridge, nor Bayes estimators completely dominate each other. Given a mixed-model decomposition of the diallel, including the possibility of outliers, unobserved diallel combinations, and unknown sparsity among the parameters, stable MLE estimates can be difficult to achieve. Penalized estimates [*e.g.*, from group LASSO (Hastie *et al.* 2009)] represent an opportunity, but these can often be interpreted as a Bayesian prior hypothesis. We demonstrate the use of the *h*-likelihood (using hglm) for obtaining mostly accurate point estimates for our diallel model and in less computational time. Although useful for this purpose, the hglm approach does not easily provide the rich flexibility of our Gibbs sampler for, among other things, modeling outliers and uncertainty about component inclusions. We supply an algorithm with reasonable default priors, but users are still permitted to choose their own, including flat Gelman or Stein priors, Jeffreys priors, or heavily informed priors from the inverse gamma family. For the BayesSpike model, one might consider *a priori* that all components have an equal 50% chance of inclusion, that they have different anticipated prior weights, or that the prior proportion of active groups is itself unknown and must be learned in the model.

Approaches that model hierarchically, borrowing strength across data sources and explicitly defining higher-order components, are becoming essential to navigate the complex high-dimensional spaces created by high-throughput data of all types. However, such hierarchies should be designed to empower researchers, not intimidate or perplex them. We emphasize interpretability of all parameters in the model in the hope they can at some level be interpreted and critiqued by nonexperts. We provide a framework for high-, medium-, and low-level analysis of imperfectly sampled diallels. In doing so, we succinctly summarize results from a vast amount of original data on the genetic architecture of phenotypes in founders of the Collaborative Cross and their F_{1} hybrids.

Our diallel Gibbs sampler and BayesSpike software are provided free of charge as R packages R/BayesDiallel and R/BayesSpike as soon as practicably in File S1.

## Acknowledgments

We thank Vasyl Zhabotynsky and Fred Wright for helpful discussions. The authors acknowledge partial support from the National Institutes of Health (NIH) Center of Excellence in Genome Sciences grants P50 MH090338 and P50 MH006582 (to A.B.L. and W.V.), from NIH grant GM076468 (to K.L.S. and G.A.C.), from The Jackson Laboratory (G.A.C.), and from the University of North Carolina Lineberger Comprehensive Cancer Center (A.B.L. and W.V.).

## Appendix A: Heritability in the Diallel

Heritability is the proportion of phenotypic variance explained by genetic effects. Its exact definition depends on what effects are being considered and in which population phenotypic variance is measured. Consider any linear genetic model, *y _{i}* =

**x**

_{i}

^{T}

**β**+ ε

*, where*

_{i}**β**∈ ℝ

^{p}is an explanatory vector of common effects, and ε

*∼*

_{i}*N*(0, σ

^{2}) are iid environmental effects, for independently sampled individuals

*i*∈ 1, …,

*N*in the population. The vector

**x**

_{i}∈ ℝ

^{p×1}represents a draw from the genetic diversity of haplotypes/strains/alleles as distributed in the population, represented by stacked matrix

**x**

_{i}and

**β**were randomly distributed and independent, and

*y*) =

_{i}*E*[(

*y*– μ

_{i}*)*

_{Y}^{2}] has expectation that can be calculated

Equation A1 provides a method to decompose observed variability of *Y* to the contributions of specific effects or groups of effects. If we assume that diallels of future strains will have effects distributed *E*(**ββ**^{T}) is a diagonal matrix with *E*[**x**_{i}**x**_{i}^{T}] ∈ ℝ^{p × p} is a very structured design matrix. If we considered mother strain *j* and father strain *k* to be independent draws from a pool of *J* strains, and gender of the offspring to be an independent Bernoulli draw, then we can calculate explicit expected dosage amounts of each of the a, b, m, v, w, a_{s}, b_{s}, m_{s}, v_{s}, and w_{s} effects. For additive effect *a _{j}*

_{′}, for strain

*j*′, located at position {

*x*}

_{i}_{a(j′)}along the

**x**

_{i}covariate vector, expected

*J*separate

*a*

_{j}_{′}terms, the variance contribution of additive effects can be approximated as

in a diallel with no sex-specific effects. We see that heritability contributions of inbreeding effects *b* contribute at a reduced level of 1/*J* to the variability of the phenotype. This is because in a complete diallel, inbred subjects compose only 1/*J* of the population.

## Appendix B: Gibbs Sampling Scheme for the Full Model

Collecting all fixed-effect (*e.g.*, β_{female}) and strain-specific random-effect parameters (*e.g.*, *a _{j}*) in a single vector of

*p*regression coefficients

_{M}**β**, we construct an

*n*×

*p*design matrix

_{M}**X**= [

**x**

_{1}

^{T}…

**x**

_{n}

^{T}]

^{T}and consider the regression problem

**X** is both sparse and highly structured in our diallel model. For instance, in the “av” model (additive and symmetric effects), *p _{M}* = 1 +

*J*+ (

*J*+ 1) ×

*J*/2, and the last (

*J*+ 1) ×

*J*/2 positions in

**x**

_{i}are mostly zeros with a single 1 at the position corresponding to mother/father pair (

*j*,

*k*)[

*i*]. Let the parameter groups a, a

_{s}, m … be enumerated as groups

*q*∈ 1, … ,

*Q*and

*q*) denote the group of coefficients

*q*assigned to the same shrinkage parameter

_{j}**Q**matrix, as suggested from Equation 17, with diagonal terms

which can be efficiently sampled by taking the Cholesky square root of **X**^{T}**X** + σ^{2}**Q**^{−1}, where the diagonal of matrix **X**^{T}**X** counts the number of subjects with relevant *j*, *k* pairs for each *q* group. We deploy Gibbs sampling (Geman and Geman 1984; Gelfand and Smith 1990; Casella and George 1992), using C-level BLAS (Dongarra 2002) code compiled in package format for R (R Development Core Team 2011), and find that the 2011 Macintosh vecLib LAPACK (Apple Developers) is sufficient to generate as many as a 8000 draws in 26.2 sec from the posterior, given that *p _{M}* has an upper bound of 244 when the number of strains is 8. However, when

*p*gets larger, doing so as the number of strains increases, it is necessary to invert conditional subset groups of

_{M}**β**. The advantage of a single Cholesky draw is to reduce autocorrelation of the Gibbs sampler. In this case, maximum autocorrelation at 1-lag for

*p*= 164 variables is 0.177, and max 20-lag is 0.086, with 95% of the 1-lag being between ±0.05. Given

_{M}**β**, we draw

where |_{q}| is the count of members of group *q*, and draw σ^{2} as

## Appendix C: Gibbs Sampling Scheme for the Full Model with Outliers

Computational complexity changes with the outlier model, where * _{i}* as a weight for subject

*i*from

then we can reweight by defining components

becomes a new draw of **β**. Since the reweighting is an *n* > *p _{M}*.

## Appendix D: Exclusionary Gibbs Group Sampling in the Diallel

To sample efficiently from Equation 18, we consider the residual vector * _{j}* for

*j*not in

*q*). Integrating

provides a Gibbs decision to turn on or off parameters in group *q*. Moreover, to avoid potentially slow numerical integration, we construct a Metropolis–Hastings importance sample from *F*^{+} in expectation.

## Footnotes

Supporting information is available online at http://www.genetics.org/lookup/suppl/doi:10.1534/genetics.111.132563/-/DC1.

*Edited by Lauren M. McIntyre, Dirk-Jan de Koning, and 4 dedicated Associate Editors*

- Received July 11, 2011.
- Accepted October 5, 2011.

- Copyright © 2012 by the Genetics Society of America

Available freely online through the author-supported open access option.