## Abstract

Partial diallel crossing designs are in common use among evolutionary geneticists, as well as among plant and animal breeders. When the goal is to make statements about populations represented by a given set of lines, it is desirable to maximize the number of lines sampled given a set number of crosses among them. We propose an augmented round-robin design that accomplishes this. We develop a hierarchical Bayesian model to estimate quantitative genetic parameters from our scheme. For example, we show how to partition genetic effects into specific and general combining abilities, and the method provides estimates of heritability, dominance, and genetic correlations in the face of complex and unbalanced designs. We test our approach with simulated and real data. We show that although the models slightly overestimate genetic variances, main effects are assessed accurately and precisely. We also illustrate how our approach allows the construction of posterior distributions of combinations of parameters by calculating narrow-sense heritability and a genetic correlation between activities of two enzymes.

CROSSES among inbred lines (diallel crosses) have a long history in quantitative genetics (Sprague and Tatum 1942; Lynch and Walsh 1998). When one is interested in easily measurable properties of a particular, and limited, set of lines, the full diallel provides an efficient means of estimation of many parameters of interest to, for example, animal and plant breeders (Lynch and Walsh 1998). In evolutionary genetics, however, a common aim is to estimate the properties of populations the inbred lines represent. This requires a large sample of lines from a given population, and the crosses have to be replicated to control for environmental variance. Furthermore, the phenotypes assayed can be complicated or expensive to determine. Because the number of crosses in the full diallel grows very rapidly with the number of lines, this design is not practical for such purposes (Kempthorne and Curnow 1961; Lynch and Walsh 1998).

A variety of designs where only a subset of possible crosses is performed have been developed over the years (Lynch and Walsh 1998). Ideally, one would like a scheme where the number of crosses grows linearly with the number of lines sampled, while maintaining the ability to estimate important parameters relevant to evolutionary processes, such as narrow-sense heritability. The round-robin design (Wayne *et al.* 2004), where each line participates in one cross as a male and one as female, comes close. This design is related to the circulant crosses proposed by Kempthorne and Curnow (1961). The round-robin scheme is attractive because the number of crosses is equal to the number of lines, and thus the number of lines sampled with a given number of crosses is maximized. However, this design provides low power to estimate the main effects of lines [general combining ability (Sprague and Tatum 1942; Lynch and Walsh 1998)], and for X-linked traits assayed in males the task is impossible (Wayne *et al.* 2004).

Another possibility is to use only inbred lines themselves (*e.g.*, Ayroles *et al.* 2009a). While useful for association mapping, this approach precludes separate estimates of general and specific (*i.e.*, pertaining to particular cross) combining ability. Furthermore, the effect of inbreeding on the genetic architectures of traits is largely unknown (Lynch and Walsh 1998; Charlesworth and Charlesworth 1999). Recent studies show that mRNA levels of many genes, especially those involved in metabolism and stress response, are appreciably altered in inbred lines of *Drosophila melanogaster* (Kristensen *et al.* 2006; Ayroles *et al.* 2009b), potentially changing the associations of nucleotide variants with phenotypes. Inbreeding may thus confound the results of quantitative genetic experiments.

While partial diallel designs are an attractive option when the number of crosses is limited, estimation of quantitative genetic parameters from such schemes is in general difficult (Lynch and Walsh 1998). Traditional ANOVA and maximum-likelihood approaches can deal with relatively simple designs, especially in the absence of replication (Lynch and Walsh 1998). Development of Bayesian methods, coupled to Markov chain Monte Carlo (MCMC) sampling, greatly expanded the flexibility to estimate parameters from crossing schemes of arbitrary complexity (Sorensen and Gianola 2002). Moreover, Bayesian posterior distributions of parameters of interest take account of uncertainty in estimates of all other elements of a model. Finally, MCMC methods allow sampling from distributions of arbitrary combinations of parameters. This feature greatly expands the set of questions that can be probed statistically. Animal breeders have long embraced Bayesian statistics (Gianola and Fernando 1986; Blasco 2001; Sorensen and Gianola 2002; Thompson *et al.* 2005) and these methods are popular in population genetics and quantitative trait mapping (Beaumont and Rannala 2004). In contrast, applications in evolutionary quantitative genetics are still rare (Walsh 2001; O'Hara *et al.* 2008). We therefore set out to explore the utility of a Bayesian approach in estimating parameters of interest to evolutionary quantitative geneticists. In particular, we examined hierarchical modeling, a method that is popular in the social sciences (Gelman and Hill 2007) but has not been widely adopted by geneticists. Our purpose is not to demonstrate the superiority of the Bayesian perspective—we believe that the choice between Bayesian and maximum-likelihood approaches should be dictated by pragmatic considerations. Rather, we seek to demonstrate that Bayesian modeling can provide an accurate and useful alternative to traditional methods, even in the face of a complicated experimental design.

We propose an augmented round-robin crossing design and a hierarchical Bayesian model to estimate quantitative genetic parameters from it. This approach is motivated by our interest in assessing within- and between-population variation in a variety of metabolic traits, building on the work of Clark (1989). We augment the round-robin design with crosses between populations and measurements of the inbred lines lines themselves. Thus, each line participates in four crosses and is also “selfed,” enabling us to separate general and specific combining ability. We replicate the crosses to measure environmental effects.

The nesting of environmental replicates within crosses, and lines within populations, is naturally modeled using a hierarchical Bayesian approach (Gelman *et al.* 2004; Gelman and Hill 2007). We implement such a scheme and perform simulations to show that it performs well in estimating quantitative genetic parameters of interest to us. We illustrate the utility of our approach with an example using real data. The techniques we describe should be useful in a variety of settings in evolutionary quantitative genetics, as well as in animal and plant breeding.

## METHODS

#### Simulated data:

The crossing scheme we simulated is depicted in Figure 1A. We deterministically set mean values for four populations (population means: *A* = 8.0, *B* = 9.0, *C* = 8.5, and *D* = 6.0) and generated 15 line samples from a normal distribution with the mean equal to the corresponding population mean. We ran two sets of simulations—with high and low narrow-sense heritability (*h*^{2}) (Lynch and Walsh 1998). We list the parameters of these simulations in Table 1. The standard deviations (SD) for the normal distributions we used to generate line means were 1.5 for the high- and 0.5 for the low-heritability sets. To simulate crosses, we drew values from normal distributions with means equal to midparental values and standard deviations 1.0 (high heritability) and 0.4 (low heritability). Cross values were then adjusted for inbreeding and outcrossing by subtracting a value drawn from a normal distribution with mean 2.0 and SD 0.25 for inbred lines and adding a value drawn from a normal distribution with mean 1.0 and SD 0.1 for the between-population crosses. In keeping with the real data sets we are generating in the lab, we hierarchically simulated two kinds of environmental effects (Figure 1B). First, we generated two “blocks” of replicates by drawing two values for each cross from a normal distribution with mean equal to the cross mean and SD 0.7 (high heritability) and 1.0 (low heritability). Next, we generated “replicates” from blocks by drawing three values for each block mean from a normal distribution with SD 0.5 (high heritability) and 1.5 (low heritability).

In the absence of epistasis, narrow-sense heritability can be estimated by the formula(Lynch and Walsh 1998). For crosses of inbred lines, can be estimated by calculating and by , assuming epistatic interactions are weak (Lynch and Walsh 1998). (additive genetic variance) is the variance among line means from a given population. In our case, the true is the square of the SD of the distribution from which our simulation drew the line means for each population (see supporting information, File S1 for details). Dominance is the deviation of each cross from the mean of the two parental lines. Thus, the true in our simulations is the square of the SD of the distribution used to pick cross values from parental line means. The environmental variance, , is the sum of the block and replicate variances. Plugging in the values for these parameters enumerated above, *h*^{2} for high- and low-heritability cases is 0.56 and 0.07, respectively.

In our experiments, we measure enzyme kinetics on each replicate. We do this by providing a substrate to fly extracts and measuring changes in optical density of a substrate or product of an enzymatic reaction with time. The slope of the resulting line is the maximum enzyme rate, *V*_{max}. This rate is thus measured with some error, which is in practice variable from reaction to reaction. To investigate the effect of this on the inferences of model parameters, we simulated *V*_{max} values from normal distributions centered on the corresponding replicate values. To mimic the real world variation in assay quality, standard deviations for these distributions were drawn from truncated normal distributions (constraining SD to be above zero) with mean 20 times the replicate mean and SD 4 times the replicate value. We then generated points from these distributions and reestimated the means (replicate values) and their standard deviations.

In measuring enzyme activity, some reactions run aberrantly and produce outliers. To assess the impact of outlier observations on inference, we duplicated each data set, randomly drew 1% of observations, and multiplied each by a value drawn from a uniform distribution between 3 and 6. The regression standard deviations were multiplied by the same number. Thus, each data set has a “regular” version and a version with outliers.

Each simulated data set consists of 1080 sets of estimated enzyme rates with their standard deviations. We implemented the simulations in R (R Development Core Team 2008) and generated 500 data sets for each of the heritability values. The R script we used, with detailed annotations, is included in File S1.

#### Real data:

To check the performance of our model on real data, we chose two well-characterized enzymes from the pentose phosphate pathway, glucose-6-phosphate dehydrogenase (G6PD, EC 1.1.1.49) and 6-phosphogluconate dehydrogenase (6PGD, EC 1.1.1.44). We performed the assays as described in Clark and Keith (1989). We chose 92 *D. melanogaster* lines from five populations and inbred them for 12 generations. Nineteen lines came from The Netherlands [courtesy of Z. Bochdanovits (Bochdanovits and Jong 2003)], Ithaca, New York (collected in 2004 by E. M. Hill-Burns and B. P. Lazzaro), and Tasmania (sent in 2003 courtesy of A. A. Hoffman); 17 were from Beijing [provided by C. Aquadro (Begun and Aquadro 1995)]; and 18 were from Zimbabwe [ZH lines are from Harare and ZS lines are from Sengawa, provided by C. Aquadro (Begun and Aquadro 1993), and ZW lines are from Victoria Falls, provided by W. Ballard]. The names of the lines are listed in the data file (enz_data.tsv) in File S4. We crossed the lines in a scheme similar to the one depicted in Figure 1A and replicated the crosses in two blocks, with three replicates for each block. Five male flies were collected per replicate, weighed, crushed, and resuspended in buffer (Clark and Keith 1989). We performed the kinetic assays in 96-well plates, each plate containing extracts from one replicate of a particular kind of cross (*i.e.*, inbred lines or within- or between-population crosses). The complete data set is provided in the enz_data.tsv file in File S4.

#### Normal model for simulated data:

We describe the important features of our Bayesian hierarchical model here. A full description of the distributions of parameters and the Gibbs sampling scheme can be found in FileS2.

Traditionally, data similar to ours are analyzed using linear mixed models. The effects of, say, line are either “random,” if one is interested in the population the lines come from, or “fixed,” if one is interested in the particular sample of lines (Box and Tiao 1973), although different interpretations of these terms are also in use (Gelman 2005). Bayesians consider all parameters as random variables; thus the terms fixed and random effects are confusing. We use the terms “sample” and “population” parameters, when referring to groups of variables, or specifically identify them as means or variances. Depending on the biological question we are considering, we may be interested in either or both kinds of variables, and the strength of the Bayesian approach is that it allows us to simultaneously estimate them all (Box and Tiao 1973; Gelman 2005).

Bayesian analysis of mixed-effects models, together with MCMC sampling from posterior distributions, is generally successful (Box and Tiao 1973; Sorensen and Gianola 2002; Gelman *et al.* 2004). However, a number of computational difficulties exist (Gilks and Roberts 1996; Gelman *et al.* 2004), largely due to posterior correlations between the population and sample parameters (Gelfand *et al.* 1995; Gilks and Roberts 1996; Gelman *et al.* 2004). Bayesian hierarchical models make use of the data structure to alleviate these difficulties using two basic techniques. First, the population parameters are centered on means for the corresponding level [*i.e.*, the sample parameter values—hierarchical centering (Gelfand *et al.* 1995)], rather than on zero as is customary in mixed-effects models (Lynch and Walsh 1998; Sorensen and Gianola 2002). Thus, for example, in our model the block values are distributed normally with the mean among all blocks derived from the same cross () and variance .

Second, for a given level in the hierarchy, the likelihood for a sample parameter depends on the information from the level below, while the prior comes from the level above (Gelman *et al.* 2004; Gelman and Hill 2007). To illustrate this, we consider estimation of a block mean, . The likelihood iswhere is the mean of all the replicate values that belong to the same block *j* (hence the notation *i* ∈ *j*), and is the variance among replicates. The prior iswhere is the value for cross *k* that the block belongs to, and is the among-block variance. Given that the normal prior is conjugate to the normal likelihood, the posterior distribution of is(1)(Box and Tiao 1973; Gelman *et al.* 2004; Gelman and Hill 2007), where *n _{j}* is the number of replicates in each block

*j*and the other parameters are as before. Thus, each value is pulled toward the relevant cross mean, and the strength of the pulling is dependent on the precision of the estimate of from the data on one hand and the precision of the estimate of the cross mean on the other (Gelman

*et al.*2004; Gelman and Hill 2007). Note also that unbalanced designs (where

*n*are not the same across blocks, for example) are easily dealt with in this framework. The grand mean does not have a level above it, and therefore we assign it an improper flat prior in the model. It has been shown that with the types of model we are considering here, this choice of prior for the grand mean still leads to a proper posterior (Gelman

_{j}*et al.*2004). An additional complication is that this parameter is estimated from a small number of populations and thus the assumption of normality seems unrealistic. Instead, we assumed that population means come from a Student's

*t*distribution with 3 d.f.

To estimate line means, we consider all the crosses a given line participates in. Because there are three types of crosses—the selfed inbred lines and the within- and between-population crosses—we first have to correct for cross type effects. In principle, this is straightforward. Taking the within-population crosses as the baseline, we regress the cross means on two indicator variables: one takes the value of 1 when the cross is a selfed inbred line and 0 otherwise; the other takes the value of 1 only when the cross is between populations. The intercept is then the specific combining ability, or the cross mean after correcting for cross type. Since the coefficients of inbreeding and outcrossing (β^{inbr} and β^{outc}) are not further modeled (and are given flat priors), this is the well-known fixed-slope, variable-intercept regression (Gelman and Hill 2007). However, in practice it turns out that this approach leads to posterior correlations between the regression coefficients and the among-block variance (Gelman *et al.* 2004), leading to poor mixing and often to convergence of estimates to zero (not shown).

To alleviate this problem, we implemented parameter expansion (Gelman *et al.* 2004, 2008) when estimating the cross means. To do this, we modeled block means aswhere and are the line means for the female and male parents of the cross, *X* is the matrix of indexes for inbreeding and outcrossing, , and is the coefficient for specific combining ability, which we model as normally distributed with mean 0 and variance . Note that = of Equation 1. Thus, we abandon hierarchical centering for this level and use a mixed model. To break the posterior correlation among , , and , we introduce parameter expansion by replacing with a product of two values:The three values, α^{SCA}, γ_{k}, and σ_{γ} are not independently identified in the model and thus do not converge on any value (Gelman *et al.* 2004, 2008). However, the parameters we are interested in can be recovered at each iteration of the Markov chain:andThese variables are defined in the model and converge on the appropriate values, but because they are products of undefined parameters that are free to wander in the sample space, the posterior correlations are broken, leading to faster and more accurate convergence (Gelman *et al.* 2004, 2008).

In our implementation of the Markov chain, we update the -vector as a batch by first calculating the mean of μ^{bl} values that belong to the same cross *k*, subtracting the current values for and α^{SCA}γ_{k}, and regressing the resulting value on *X* without an intercept as suggested in section 18.4 of Gelman and Hill (2007).

Once the inbreeding and outcrossing effects have been controlled for, we use the values of μ^{SCA} to calculate line effects or general combining abilities. For each line *l*, we examine all the crosses it participates in (five in total). We take the value of the selfed cross as is. For the other crosses, we subtract the line mean for the opposite parent from twice the value for each cross. Thus, we have five estimates of the line mean from five different crosses. The likelihood for the line effect is a normal distribution with mean equal to average of the five estimates and variance . The variance among line effects is twice the general combining ability variance ().

Likelihoods for estimates of variances are inverse-χ^{2}. We allow the three kinds of crosses to have different and , on the basis of *a priori* biological considerations and some preliminary analyses of real data. We assume a single value for both and . We used flat improper priors on σ [σ_{(·)} ∝ 1 (Gelman 2006; Gelman and Hill 2007)]. We also tried proper Gamma priors, but these tend to shrink variance estimates to zero (Gelman 2006; Gelman and Hill 2007), leading to slow mixing and often to convergence failure (not shown).

Each replicate value in our data set is the slope of the corresponding kinetic curve or the enzyme maximal rate (*V*_{max}). We treated these two ways. First, we simply took the point estimates of *V*_{max}. Second, we modeled *V*_{max} values hierarchically with the prior being the mean value for the relevant block of replicates and the likelihood based on estimates of replicate values and their standard deviations computed as described at the end of the *Simulated data* section.

#### Student's *t* models for outliers:

For data sets with outliers, constructed as discussed above, we implemented a further extension of our model. Instead of a normal, we used Student's *t* distributions for replicate means:Student's *t* distributions have fatter tails than the normal, and therefore estimates of should be more robust to outliers (Gelman *et al.* 2004). For a given value of the variance, samples generated from a *t* distribution are expected to contain more observations that are far from the mean compared to samples generated from a normal. Thus, outlier observations do not inflate estimates of the variance, and the extent to which this is true depends on the degrees of freedom. As the degrees of freedom increase, a *t* distribution approaches the normal with the corresponding mean and variance.

We report analyses of the simulated data sets using *t* distributions with 3 and 6 d.f. Three degrees of freedom is the smallest value for which the mean and the variance for the *t* distribution are finite.

We implemented sampling from *t* distributions on the basis of their interpretation as mixtures of normals (Gelman *et al.* 2004). To improve mixing, we used parameter expansion for variance estimates (as suggested in section 11.8 of Gelman *et al.* 2004). Details of the sampling scheme are in File S2.

#### Computation:

Because we assume normal or *t* distributions for sample parameters throughout, the posterior distributions for all variables are available in closed form. Thus, we can take advantage of the efficiency of Gibbs sampling (Gilks *et al.* 1996) to construct the marginal posterior distributions for all parameters. We used blockwise updating for variables of the same level, with the exception of line effects, which had to be updated one at a time. The full sampling schemes for both the normal and the Student's *t* model can be found in File S2.

To set initial values, we first calculated approximate point estimates of parameters. For example, block means are means of the corresponding replicates, cross means are means of blocks, and so on. We then picked starting values from overdispersed distributions centered on these approximate estimates. For example, starting values for block means were picked from normal distributions with means equal to the approximate block estimates and variances 4 times the approximate estimate of . We picked initial values for standard errors from uniform distributions with lower bounds 0.2 times the approximate estimate and upper bounds 5 times the estimate.

We implemented the sampling algorithms in R. We provide an example R script that implements the Student's *t* model, with detailed annotations, in File S2. We ran three independent chains for each data set, with 500 iterations of burn-in followed by 1000 sampling iterations. We processed the chains using the coda package in R (version 0.13-4) (Plummer *et al.* 2009). If we suspected lack of convergence, we looked at time-series graphs of parameters. In a number of cases, lack of convergence was clearly due to insufficient burn-in because an initial value happened to be far from the eventual estimate. In these cases, we reran all three chains with new initial values.

For each variable, we compiled several statistics across the 500 high-heritability and 500 low-heritability data sets. For each data set, we noted whether the true value fell within the posterior 95% credibility interval, the fractional difference (*i.e.*, the absolute difference divided by the true value) between the median of the posterior distribution and the true value, the time-series estimate of the coefficient of variation (CVR) (the standard error divided by the true value; time-series SE calculated by the summary function in coda) of the posterior distribution, and the Gelman–Rubin convergence diagnostic (Gelman and Rubin 1992, as implemented in coda).

#### Restricted maximum-likelihood estimates of parameters:

As an alternative to our Bayesian hierarchical models, we used restricted maximum likelihood (REML) to estimate parameters using a traditional mixed-effects model. We employed the REML implementation provided by the lmer() function in the lme4 package in R (Bates and Maechler 2009). Gelman and Hill (2007) provide several examples of the use of this function for the analysis of hierarchical models. We used point estimates of *V*_{max} as the response variable and treated the replicate, block, cross (SCA), and line (GCA) effects as random. Population and cross-type effects were fixed. We describe the details of our model, the R code, and the results in File S3.

#### Model and computation for real data:

To analyze real kinetic data for G6PD and 6PGD, we implemented a version of the Student's *t* model with 3 d.f. described above. The model was programmed in R. We provide the annotated script, along with the raw data, in File S4. Three complications arise in the real data that we did not incorporate into our simulations. First, a small subset (0.4%) of the data is missing. Second, because the number of lines in each population is unequal, the design is somewhat unbalanced. Third, we performed the enzyme assays in 96-well plates and inspection of slope estimates clearly indicated the presence of plate effects.

In the analyses presented here, we did not implement missing data imputation, treating the missing data as not collected. Thus, the problem reduces to slight imbalance in the replication structure. The model detailed above does not assume equal sample sizes at any level of the hierarchy, and thus we did not have to make any changes in our algorithms to accommodate the missing data and the imbalance in the crossing design. We corrected for plate effects within the Gibbs sampler, as described in File S3.

We ran the Gibbs sampler for both enzymes simultaneously, updating variables at each level in blocks as described above for simulated data, one enzyme at a time. We generated five chains, sampling 20,000 iterations per chain after a 2000-iteration burn-in. We then thinned the output, retaining 2000 values per chain.

## RESULTS

#### Cross design and analysis:

Our choice of cross design and analysis methods is motivated by our empirical research. We study evolution of metabolic function within and between populations, using *D. melanogaster* as a model organism. Looking at within-population variation, we are interested in the potential for adaptation to new conditions (and hence in narrow-sense heritability), as well as the effects of deleterious mutations that are maintained by mutation–selection balance (and hence in the effects of inbreeding). Between-population variation should be informative about local adaptation, and thus we want to compare population means and look for potential genetic incompatibility by examining between-population crosses. The regulatory network that binds enzymes together can be probed by measuring genetic correlations among enzyme activities (Clark 1989). For this, we need estimates of line effects (general combining abilities).

Given these considerations, we developed a modified round-robin crossing design (Figure 1A). It combines round-robin (Wayne *et al.* 2004) crosses within populations, where each line participates in one cross as a male and in another cross as a female. To this, we added the inbred lines themselves and crosses between populations. Each line participates in two interpopulation crosses, once as a male and once as a female, each with a different population. This makes the between-population crosses balanced both at the line level (all lines participate in an equal number of crosses) and at the population level (each population is crossed to all other populations an equal number of times). Like the regular round-robin crossing design, the number of crosses grows linearly with the number of lines evaluated, allowing adequate sampling to estimate population parameters. Since each line is crossed five times, we can additionally separate specific and general combining ability.

To estimate environmental effects, we perform structured replications of our crosses (Figure 1B). Each cross is carried out on two batches of food (block level) and three samples are taken from each batch (replicate level). This allows us to capture two kinds of random environmental effects: variations in food quality from batch to batch and fluctuations within vials where we rear the flies. We then perform enzyme kinetic assays on each replicate, obtaining estimates of enzyme maximal rates with some error (Clark 1989; Clark and Keith 1989).

As is clear from Figure 1B, our data are structured hierarchically. Each level is completely nested in the level above, with the exception of the cross level that is not perfectly nested within lines (two different lines participate in within- and between-population crosses) or populations. To make use of this structure in the data, it is natural to employ hierarchical models (Gelman and Hill 2007) to estimate the parameters of interest. Furthermore, a Bayesian approach is attractive because it allows for models of arbitrary complexity and permits a full account of uncertainty in all parameters (Sorensen and Gianola 2002; Gelman *et al.* 2004; Gelman and Hill 2007). However, the limitation of this approach is that the posterior distributions of variables are not available, and thus one has to resort to numerical methods to estimate them (MCMC in this case). While in theory MCMC methods should accurately estimate posterior distributions given an infinite chain (Gilks *et al.* 1996), in practice one has to demonstrate that they do so after a reasonable number of iterations. Therefore, we generated simulated data and used them to test the performance of our models.

#### Analysis of simulated data:

We constructed simulated data sets to answer three main questions. First, we wanted to know if our combination of crossing design and modeling approach can be used to estimate parameters of interest to us and to quantify the accuracy and precision of such estimates. Second, we wished to know whether taking full account of uncertainty in enzyme activity estimates was necessary or if we could save on computational time and use point estimates of slope values. Third, because high-throughput processing of enzymatic assays yields a small but significant number of outliers, we were interested in gauging their effect on inference and testing strategies to minimize the influence of outliers.

To answer these questions, we generated two sets of simulated crosses with environmental replication: one with low narrow-sense heritability (*h*^{2} = 0.07) and one with high (*h*^{2} = 0.56; see methods for details). We also added outlier observations to each group of data sets. We chose values of fixed parameters (Table 1) in the simulations to fall in the range we typically see with real data. We generated 500 data sets with each type of simulation and analyzed them under three models. The first used point estimates of kinetic curve slopes and assumed normal distributions of replicates. The second also assumed normal distributions for replicates, but modeled the uncertainty in their estimation. For data sets with outliers, we also used a third model that assumed Student's *t* distributions for replicates. Because *t* distributions have fatter tails than the normal distribution, we expect estimates of sample parameters from this model to be robust to outliers (Gelman *et al.* 2004). On the basis of *a priori* biological considerations, we modeled environmental variances ( and ) separately for each type of cross, although they were the same in the simulations. The results we show are for the variances from the within-population crosses. The results for the variances from other cross types were indistinguishable.

We first turn to estimates of population parameters. For technical reasons, it is easier to estimate standard deviations rather than variances when programming in R, and this has an added advantage of bringing the values for these parameters to the same scale as sample variables. Population parameters are notoriously difficult to estimate accurately (Lynch and Walsh 1998; Sorensen and Gianola 2002; O'Hara *et al.* 2008). We quantified the quality of our estimates several ways. To determine the accuracy of our models, we plotted the range of differences between the estimated medians of posterior distributions and the true values (Figure 2, top panels). Precision of the estimates is reflected in the extent of the spread of their posterior distributions, quantified by standard errors of the sampling Markov chains (Figure 2, bottom panels). We scaled these statistics by true values to enable comparisons among variables. Another measure of accuracy is the fraction of the time the true value falls within the posterior 95% credibility interval (95% C.I.) (Table 2). Formally, under the Bayesian paradigm this statistic does not carry the same straightforward interpretation as for frequentist confidence intervals (Box and Tiao 1973). However, from a practical standpoint this measure still reflects the accuracy of parameter estimation, as well as the degree to which the uncertainty of the estimate is captured by a model. Finally, we assessed convergence properties of the various models using the Gelman–Rubin diagnostic (Gelman and Rubin 1992). Values of this statistic close to 1.0 indicate that the particular set of chains has converged. We considered a particular run to have failed to converge if the value of the diagnostic was >1.5 and counted the number of such instances for every model–data set combination (listed as percentage of failure to converge in Table 2). We tried a range of cutoff values (not shown). Our conclusions are not sensitive to the particular value chosen.

Overall, our model and cross combination yields estimates of population parameters that are close to reality (Figures 2 and 3). When outliers are absent, full treatment of uncertainty in slope estimates does not make an appreciable difference, with the exception of among-replicate standard deviations (Figure 2). In the latter case, however, the model yields overestimates that are falsely precise. In contrast, in the presence of outliers the model with point estimates yields results that are both incorrect and imprecise (Figure 2 and left set of columns in Table 2). Standard deviations are generally overestimated (with the exception of σ_{bl}), leading to drastic underestimation of heritability when true *h*^{2} is high (model 3 in Figure 3A). Furthermore, the model fails to converge more frequently than any other data set–model combination (Table 2).

The deficiencies of the model with point estimates can be corrected to a large degree by modeling the uncertainty of slope estimates (model 4 in Figure 2 and Table 2). Most of the estimates (with the exception of σ_{bl}) are further improved by using *t* models for replicates (models 5 and 6 in Figure 2). However, in every case our models overestimate the genetic standard deviations(σ_{SCA} and σ_{gca}) by ∼20%. This does not result in misestimates of *h*^{2} when true heritability is high (Figure 3A), although for data sets with low heritability, where these patterns are similar but exaggerated, we tend to overestimate *h*^{2} by almost twofold (Figure 3B).

Modeling the slopes also improves the probability of convergence (right set of columns in Table 2), although problems still persist for estimates of σ_{bl} when heritability is low. Convergence problems are essentially eliminated with Student's *t* models (columns 5 and 6 in the right set of Table 2).

Despite problems with estimating population parameters, our models do well when evaluating sample variables (Figure 4 for high-heritability data sets; the results for low-heritability data sets and for population means in both groups of data sets are indistinguishable). For data sets with outliers, the model with point estimates still produces unbiased estimates, but they are significantly less precise. However, although the credibility intervals are wider (model 3 in Figure 4), they are only slightly more likely to include the true observation (Table 2). We can improve the precision of estimates by modeling the slopes. Even more precise estimates can be obtained with *t* models. As is the case with standard deviations, the model with point estimates often fails to converge when outliers are present. Modeling slopes alleviates the problem, and using *t* models results in further improvements (Table 2).

As we detail in the methods section, posterior distributions of sample parameters at a given level in the hierarchy incorporate information from the higher levels through prior distributions. Thus, if our models perform correctly, extreme observations at a given level should be pulled toward the values for corresponding variables above them. For example, line effects of lines with true means below the corresponding population means should be overestimated. Conversely, line means for lines with values higher than their population values should be underestimated. Furthermore, the more extreme the true value is for a line, the more severe the discrepancy, with opposite signs for values that are low or high. To see if this indeed occurs, we plotted differences between estimated and true values of line effects against differences between true line effects and corresponding population means (botttom right panel of Figure 4; both differences scaled by the corresponding true population mean). As expected, we see an appreciable negative correlation between the two variables (Pearson's *r* = −0.18).

It is not our main purpose to compare maximum-likelihood and Bayesian methods of inference. Nevertheless, we analyzed our simulated data sets using REML estimates of a traditional mixed-effects model (see File S3 and methods for details). As expected, we find that for estimates of environmental standard deviations and fixed effects, this model behaves similarly to our normal model with point estimates of slopes. However, the genetic effect variances (specific and general combining ability) are frequently underestimated, especially when heritability is low (see the figures in File S3), leading to underestimates of heritability. This problem is particularly severe when heritability is low and outliers are present. Furthermore, when heritability is low, the estimates of σ_{gca}, and particularly σ_{SCA}, become unstable, with deviations from true values highly dependent on the particular data set.

#### Analysis of real kinetic data:

Simulated data sets are very useful when assessing model accuracy because the correct results are known *a priori*. However, simulations are necessarily idealized, and although we did our best to make them realistic (by, for example, including outliers), it is impossible to simulate every conceivable scenario. We therefore wanted to assess the performance of our model on a real data set. We analyzed data for two well-characterized enzymes in the pentose phosphate pathway, G6PD and 6PGD of *D. melanogaster*. Assays for these enzymes are well established (Clark and Keith 1989) and a number of studies have documented genetic variation in their activities (*e.g.*, Bijlsma 1980; Wilton *et al.* 1982; Clark 1989).

We used a version of our Student's *t* model with 3 d.f. (model 5 discussed above), but with correction for assay–plate effects (see methods for details). The choice of model was dictated by the presence of outliers in data sets for both enzymes. Furthermore, the outliers inflated the estimates of σ_{rep} to the extent that problems arose with evaluation of block standard deviations: estimates of σ_{bl} came close to zero, and this created numerical errors and lack of mixing in the posterior distributions of other parameters.

We present estimates of selected parameters in Table 3. Because previous studies employed only inbred lines and different replication strategies to control for environmental effects, it is difficult to make direct comparisons between our results and preceding findings. However, we can say that the relative magnitudes of σ_{rep} and σ_{bl} we see are broadly similar to those reported by Clark (1989), although we find a larger genetic component of variation in G6PD and 6PGD than in that study. Furthermore, we find that the magnitudes of environmental standard deviations vary among cross types (Table 3).

One way to assess the quality of our line effect estimates is to calculate the genetic correlation between the activities of the two enzymes. A positive correlation has been repeatedly documented before (Bijlsma 1980; Wilton *et al.* 1982; Clark 1989). Since we were estimating the parameters for each enzyme simultaneously in our Gibbs sampler, we were able to sample from the distribution of the correlation between line effects. We indeed see the expected positive genetic correlation (Table 3). Strikingly, our estimate is almost identical to the one obtained by Clark (1989): his value is 0.31 (0.18, 0.43), whereas ours is 0.30 (0.17, 0.42).

## DISCUSSION

We set out to develop a combination of a sparse partial diallel crossing scheme and modeling approach to estimate a number of parameters of interest in evolutionary quantitative genetics. For evolutionary applications, we are interested in learning about population-level processes, and therefore large samples of lines are required. The full diallel, which incorporates crosses among all lines, even when excluding reciprocal crosses and inbred lines, is not ideal for such applications because the number of crosses grows quickly with the number of lines assessed (Kempthorne and Curnow 1961; Lynch and Walsh 1998). We settled on a modified round-robin design that incorporates inbred lines and crosses within and between populations. Each line is selfed and also crossed four times. Thus, it is possible to partition genetic variation into specific and general combining abilities (Sprague and Tatum 1942; Lynch and Walsh 1998) and still have the number of crosses grow linearly with the number of lines.

While in principle this crossing design should be useful for our purposes, it is not straightforward to analyze using traditional methods. An important stumbling block is the inclusion of three different types of crosses, making it necessary to control for cross-type effects when estimating SCA. Other difficulties include measurement errors for phenotypes (in our case, enzyme activities), two levels of environmental replication, and slight imbalances in the crossing scheme caused by unequal sampling of lines from populations. Furthermore, we are interested in evaluating distributions of deterministic combinations of parameters (for example, narrow-sense heritability and genetic correlations between enzyme activities).

Bayesian approaches can fulfill these requirements (Blasco 2001; Sorensen and Gianola 2002) and have been extensively used in animal breeding and quantitative trait mapping (Gianola and Fernando 1986; Sorensen and Gianola 2002; Beaumont and Rannala 2004). Taking advantage of the structure of our data, we implemented a set of hierarchical Bayesian models (Gelman *et al.* 2004; Gelman and Hill 2007). We made extensive use of recently developed computational techniques—parameter expansion (Gelman *et al.* 2004, 2008) and hierarchical centering (Gelfand *et al.* 1995; Gilks and Roberts 1996)—to speed and improve convergence. Our emphasis is on estimating parameters given a model, rather than significance testing. Although model comparison techniques (Raftery 1996) and methods for integrating over various models (Phillips and Smith 1996) are available, in our case the structure of the data and biological considerations drive the selection of the basic modeling framework. We consider three types of models that differ only in the statistical approach to dealing with replicates and choose among these approaches on the basis of their performance on simulated data.

We used four simulated data sets: two with low heritability and with and without outliers among slope values and two with high heritability (also with and without outliers). We then assessed precision and accuracy of estimates from our models, compared with true values. As expected (Gelman 2005), estimates of sample-level parameters (*e.g.*, line effects and cross-type coefficients) are more accurate than evaluations of population-level variables (*e.g.*, standard deviations and heritability). In particular, genetic standard deviations tend to be overestimated by our models, although narrow-sense heritability is still accurately estimated when it is high. This is consistent with an analysis of an approximate circulant cross of Scotts pine data (Waldmann and Ericsson 2006). This data set had only one level of replication, one type of cross, and one population. Thus, overestimation of genetic effects is probably not due to the complexity of our experimental scheme. We find our combination of crossing design and model can effectively partition genetic components of variation and adequately estimate and control for cross-type effects. The latter is particularly important, because large inbreeding and outcrossing effects are present in the real data sets we seek to model (Table 3).

Overall, we find that when no outliers are present taking point estimates of slopes instead of modeling them does not appreciably affect inference. However, in the presence of outliers, modeling the slopes is essential. Further improvements in sample parameter estimates can be achieved when we use Student's *t* models for replicates. While analyzing empirical data, we further found that *t* models are often necessary to prevent Markov chains from getting stuck on estimates of σ_{bl} close to zero, a behavior that leads to numerical problems (values too small for computers to handle) and failure of proper mixing (Markov chains not moving adequately through the values that belong to the posterior distribution of a given parameter).

As a comparison, we used REML to estimate the parameters of a mixed-effects model to analyze our simulated data sets. We found that in general, this approach gives results that are similar to our normal model with point estimates of slopes. However, in contrast to our Bayesian estimates, the genetic variances are underestimated by the mixed-effects model. This is particularly pronounced when outliers are present and the heritability is low. In this case, estimates of σ_{gca} and particularly σ_{SCA} become fairly unstable, with the deviations from true values highly dependent on the data set (see File S3). This problem does not appear to arise in our Bayesian estimates.

An important feature of the Bayesian approach is the emphasis on parameter estimation over hypothesis testing. In the traditional non-Bayesian framework, one would, for example, test the significance of the additive genetic effect. If the effect is significant, the line means are estimated separately, and if it is not, line effects are pooled to estimate, say, population parameters. In the hierarchical Bayesian framework, we use the relative information from the population and line levels to partially pool line effects, with the degree of pooling determined by the data (Gelman and Hill 2007). We illustrate this behavior in the bottom right plot of Figure 4, where we see that estimates of means for lines with extreme values are shrunk to their respective population means.

Another example of this approach is our separate treatment of environmental variances for each cross type. Thus, instead of testing for the significance of the gene-by-environment interaction and seeking a biological interpretation if it is significant, we are able to ask a biologically driven question while building our model. We find differences among some of the cross types (Table 3). However, even when differences are not “significant” (for example, posterior intervals for σ_{rep} of 6PGD in within- and between-population crosses overlap substantially), we do not eliminate the extra parameters from the model. While this treatment does not capture all the possible kinds of gene-by-environment interactions, it provides an example of a biologically driven attempt to dissect them.

Because it is easy to construct posterior distributions of any deterministic combination of parameters, such tests can be driven by biological interest rather than statistical convenience. We provide examples of estimates of narrow-sense heritability and genetic correlations between activities of two enzymes: G6PD and 6PGD. We show that our approach recovers the well-known positive correlation between these two enzymes (Bijlsma 1980; Wilton *et al.* 1982; Clark 1989) that is likely due to the toxicity of an intermediate compound (Hughes and Lucchesi 1977, 1978). Because the previous studies used only inbred lines to calculate this correlation, concerns have been raised (Zera and Harshman 2001) that this type of observation is an artifact of making recessive deleterious alleles homozygous through inbreeding. Since we estimate our correlation values from general combining abilities on the basis of heterozygous as well as inbred lines, we can exclude this caveat.

The combination of crossing scheme and modeling approach we propose provides a powerful framework for estimating parameters important in evolutionary quantitative genetics. Although our primary motivation was the assessment of natural variation in metabolic attributes including enzyme activities, it can be readily extended to any situation when phenotypes are measured with error and are laborious or expensive to assess.

## Acknowledgments

We thank B. Logsdon and two anonymous reviewers for helpful comments on the manuscript and A. Coventry for help with computational methods. This research was funded by a National Institutes of Health grant to A.G.C. and L.H.

## Footnotes

Supporting information is available online at http://www.genetics.org/cgi/content/full/genetics.110.115055/DC1.

Communicating editor: K. W. Broman

- Received February 1, 2010.
- Accepted February 6, 2010.

- Copyright © 2010 by the Genetics Society of America