## Abstract

Genetic studies usually focus on quantifying and understanding the existence of genetic control on expected phenotypic outcomes. However, there is compelling evidence suggesting the existence of genetic control at the level of environmental variability, with some genotypes exhibiting more stable and others more volatile performance. Understanding the mechanisms responsible for environmental variability not only informs medical questions but is relevant in evolution and in agricultural science. In this work fully sequenced inbred lines of *Drosophila melanogaster* were analyzed to study the nature of genetic control of environmental variance for two quantitative traits: starvation resistance (SR) and startle response (SL). The evidence for genetic control of environmental variance is compelling for both traits. Sequence information is incorporated in random regression models to study the underlying genetic signals, which are shown to be different in the two traits. Genomic variance in sexual dimorphism was found for SR but not for SL. Indeed, the proportion of variance captured by sequence information and the contribution to this variance from four chromosome segments differ between sexes in SR but not in SL. The number of studies of environmental variation, particularly in humans, is limited. The availability of full sequence information and modern computationally intensive statistical methods provides opportunities for rigorous analyses of environmental variability.

- Bayesian inference
- environmental sensitivity
- genetic control of environmental variance
- genomic models
- random regression models

ENVIRONMENTAL sensitivity can be defined either as mean phenotypic changes of a given genotype in different environments or as differences in the environmental variance of different genotypes in the same environment (Jinks and Pooni 1988). The first definition gives rise to genotype-by-environment interaction at the level of the mean, a topic that has a long history due to its relevance in livestock and plant breeding as well as in medical genetics. Recent articles on specific areas are Huquet *et al.* (2012) in livestock, El-Soda *et al.* (2014) in plants, and Hutter *et al.* (2013) in humans.

The second definition implies that environmental variance is under genetic control and is the subject of the present work. Genetic control of environmental variation has implications in evolutionary biology, in animal and plant improvement, and in medicine. In evolutionary biology, a fundamental problem is to understand the forces that maintain phenotypic variation. Most of the models assume that environmental variance is constant and explain the observed levels of variation by invoking a balance between a gain of genetic variance by mutation and a loss by different forms of selection and drift. Zhang and Hill (2005) discuss models where environmental variance is partly under genetic control and study conditions for its maintenance under stabilizing selection. From a breeding point of view, if environmental variance is under genetic control, it would be possible to decrease variation by selection leading to more homogeneous products (Mulder *et al.* 2008). In human health the study of variation has shown a recent revival due to the role it may play in understanding complex diseases (Geiler-Samerotte *et al.* 2013). For example, the question of how a phenotype (*e.g.*, blood pressure) varies over time within an individual and whether this variability is subject to genetic control can be of clinical relevance. Further, many traits such as cancer originate from rare events taking place in a few cells. Such events could be the result of stochastic cell-to-cell variation that has been shown to be under genetic control in *Saccharomyces cerevisiae* (Ansel *et al.* 2008) and in *Arabidopsis thaliana* (Jimenez-Gomez *et al.* 2011; Shen *et al.* 2012). Feinberg and Irizarry (2010) provide evidence from human and mouse tissues that differentially methylated regions could be one explanation for this genetically controlled variation in gene expression.

The literature on the subject can be distinguished between attempts at documenting the existence of genetic factors influencing variation and those aimed at finding the specific genes involved. The latter are more recent and involve fewer studies (Ansel *et al.* 2008; Jimenez-Gomez *et al.* 2011; Shen *et al.* 2012; Hutter *et al.* 2013) and the focus in some is on analysis of the marginal phenotypic variation (Yang *et al.* 2012a). This is different from the subject of the present study, where the focus is on the conditional variance, given the genotype. This distinction is important. For example, with repeated measurements, it amounts to studying the variance either between or within individuals.

In domestic animals and plants, statistical documentation for genetic control of environmental variance stems from analyses of outbred populations (reviewed in Hill and Mulder 2010). The evidence includes litter size in pigs (Sorensen and Waagepetersen 2003), adult weight in snails (Ros *et al.* 2004), uterine capacity in rabbits (Ibáñez *et al.* 2008), body weight in poultry (Rowe *et al.* 2006; Wolc *et al.* 2009), slaughter weight in pigs (Ibáñez *et al.* 2007), and litter size and weight at birth in mice (Gutierrez *et al.* 2006). Usually, support for the model is based on reporting estimates of the model parameters and on comparisons involving the quality of fit of the genetically heterogeneous variance model with that of models posing homogeneous variances. A better fit of the heterogeneous variance model can be due to its flexibility to account for specific features of the data not necessarily related to variance heterogeneity, such as unaccounted lack of normality (Yang *et al.* 2011). Spurious results can never be excluded using a model-based approach for the assessment of genetic control of environmental variances, particularly when a properly conducted analysis involves complex, highly parameterized hierarchical models.

More direct evidence for genetic control of environmental variation comes from analyses of genetically homogeneous populations. These data, consisting of pure lines that are genotypically different, with replicated genotypes within lines, are well suited to investigate genetic heterogeneity of environmental variation. The variance between genetically identical individuals within lines reflects environmental variation, and a different environmental variance across lines indicates that a genetic component is operating. Such results were reported for dry matter grain yield per plot from three genetically homogeneous single-cross maize hybrids by Yang *et al.* (2012b). More compelling evidence using bristle number in isogenic lines of *Drosophila melanogaster* reared in a common macroenvironment under controlled laboratory conditions was provided by Mackay and Lyman (2005). This design has the added advantage that the statistical method involved is straightforward: a comparison of within-line sampling variances across lines. A similar approach was followed by Morgante *et al.* (2015) who used inbred lines of *D. melanogaster* reared in a common macroenvironment under controlled laboratory conditions. Three quantitative traits were analyzed and the results provided a strong indication of environmental variance heterogeneity between inbred lines, with a pattern of behavior that differed among traits. In this work we extend the work of Morgante *et al.* (2015) by incorporating sequence information in the analysis. This allows a separation of the variation (in environmental variance) between lines (in principle, entirely of genetic origin) into a fraction captured by sequence information and a remainder. Here we restrict the analysis to two traits: starvation resistance and startle response. For each trait the environmental variances of males and females were analyzed as two correlated traits to investigate the existence of genomic variation in sexual dimorphisms. The total genomic variance is also partitioned into contributions from four chromosome segments.

This article is organized as follows. *Material and Methods* provides a brief description of the data, of the model for the variance between lines, and of the two sets of statistical models employed, both of which treat records in males and females as two correlated traits. In *Results* we report and compare inferences based on the two sets of models. The models are implemented using empirical Bayesian methods. The *Discussion* provides a final overview and implications of the results obtained. A number of technical details are relegated to supporting information, File S1. These include a formal derivation of the probability model for the variance between lines and details of a spectral decomposition that plays an important computational role in the Markov chain Monte Carlo strategy implemented, as well as a full description of the Bayesian models. Finally, File S1 also provides an overview of the Monte Carlo implementation of the Bartlett test to study overall heterogeneity of within-line variance, between lines.

## Materials and Methods

### The data

The traits studied are starvation resistance (SR) and startle response (SL). The data belong to the *D. melanogaster* Genetics Reference Panel (DGRP) (Mackay *et al.* 2012). The inbred lines were obtained by 20 generations of full-sib mating from isofemale lines collected from the Raleigh, North Carolina population, which have full genome sequences.

The DGRP lines are not completely homozygous. Most lines remained segregating for ≤2% sites after 20 generations of full-sib inbreeding; however, a number of lines had ∼20% of markers segregating on one or more autosomal arms due to the segregation of large polymorphic inversions. In previous work we explicitly tested whether the log-sampling variances may show an association with heterozygosity and no such tendency was found (Morgante *et al.* 2015). Therefore, rather than excluding lines, all were included in the analysis without accounting for the level of heterozygosity.

The experimental design involves replicates within line for each sex. For each sex, there are lines ( = 197 for SR and = 167 for SL), replicates per line ( = 5 for SR, = 2 for SL). The phenotypes analyzed are the log-sampling variances within each line, sex, and replicate of SR and SL. Each of the log-sampling variances from each sex are computed from phenotypes ( = 10 for SR; = 20 for SL) collected within each replicate. Resistance to starvation was quantified by placing two 1-day-old flies in culture vials containing nonnutritive medium (1.5% agar and 5 ml water) and scoring survival every 8 hr until all flies were dead. SL was assessed by placing single 3- to 7-day-old adult flies, collected under carbon dioxide exposure, into vials containing 5 ml culture medium and leaving them overnight to acclimate to their new environment. On the next day, between 8 am and 12 pm (2–6 hr after lights on), each fly was subjected to a gentle mechanical disturbance, and the total amount of time the fly was active in the 45 sec immediately following the disturbance was recorded. A similar data set and the same traits were also used by Ober *et al.* (2012) to study predictive ability, using whole-genome sequences.

#### Genetic marker data:

Briefly, the genome of *D. melanogaster* contains four pairs of chromosomes: an pair (males are *XY* and females are *XX*) and three autosomes labeled 2, 3, and 4. The fourth chromosome is very tiny and it is ignored. The “genetic markers” used in the present analyses were called from raw sequence data as described in Huang *et al.* (2014). Markers were filtered such that minor allele frequencies were >0.05 and were called in at least 80% of the lines, leaving a total of 1,493,351 SNPs from four chromosome arms (2*L*, 2*R*, 3*L*, and 3*R*). The numbers of genetic markers within each of these sets were 406,577, 327,967, 390,711, and 368,096, respectively.

One of the objectives of this study was to study whether sexual dimorphism could be detected at the autosomal level. Therefore sex chromosomes were excluded. As a check some of the analyses were repeated including the sex chromosomes. These are reported briefly and we show that the results of both analyses are almost identical.

#### Sexual dimorphism:

The term genetic (or genomic) variance in sexual dimorphism is used throughout this article. It should be understood as differences in the genetic (or genomic) variances between males and females (of log-sampling variances) and a genetic (or genomic) correlation between log-sampling variances of males and females different from 1.

Sexual dimorphism itself is manifested in the significance of the effect of sex. This would result in different means of log-sampling variances in males and females. Despite this difference, the variance in log-sampling variance in the two sexes may be equal, or the correlation between log-sampling variances in the two sexes may be equal to one. This would give rise to sexual dimorphism but no variance in sexual dimorphism. The converse is also true. In this work we focus mainly on the variance in sexual dimorphism.

### Statistical models

The data are analyzed with two sets of two-trait models, which regard records in males and females as two correlated traits. The first set comprises relatively simple two-trait models, which allocate the total variance in log-sampling variance into components between and within lines. These models are appealing in the sense that they are free from assumptions regarding gene action and lead directly to an estimate of broad sense heritability of the log-sampling variance of SR or SL, defined as the ratio of the between-line component relative to the total variance. This set of models provides also a description of broad sense genetic correlation between sexes. Differences in these parameters between sexes provide a first indication of genetic variance in sexual dimorphism.

The data are analyzed subsequently using a more parameterized two-trait model. Here the component in log-sampling variance between lines is studied in more detail, by quantifying the proportion of this variance captured by genetic marker information present in each of four chromosome arms and a remainder, not captured by genetic marker information. For each trait, we study whether chromosome arms contribute differently to the total genetic variance (variance between lines) and whether this contribution varies between sexes. The general two-trait model provides estimates of within-chromosome correlation of genomic values between sexes. Differences in these parameters between sexes provide a more detailed picture and disclose what we call genomic variance in sexual dimorphism.

The dependent variable analyzed is the logarithm of the sampling variance in a particular sex, line, and replicate. This is a measure of residual variance that is interpreted as environmental variance. The justification for this interpretation is that lines are highly homozygous, and, as indicated above, the small amount of heterozygosity remaining after 20 generations of full-sib mating was found not to be related to residual variability. The sample variance in a particular sex, line, and replicate (ignoring here the subscripts for sex, line, and replicate) is the scalar (1)where is the phenotype and is the mean of the observations. It can be shown (see File S1) that the distribution of can be approximated by (2)where m is a subscript for males (f is a subscript for females), *j* is a subscript for the line, *k* is subscript for the replicate, and is the variance of *y _{i}* for the specific line, sex, and replicate subclass. This normal distribution has known variance and unknown mean. Hereinafter we refer to the dependent variable

*z*as the log-sampling variance.

### The general two-trait model

We begin with a description of the general two-trait model since the simpler models are special cases. The general two-trait model partitions the line effects into a component explained by genetic marker effects, *g*, and a component due to genetic effects that are not associated with markers, *h*. Additionally, the component due to genetic markers *g* is subdivided into contributions from the *C* = 4 chromosome arms . The model also includes replicate effects *r* that contribute to within-line variation.

In (2), the model for the mean in males (m) is assumed to be (3)and in females (f) (4)In these expressions, the *μ*’s are scalar means for each sex, the *g*’s are contributions to line effects from each of *C* = 4 chromosome arms that can be associated with genetic marker information (genomic effects or genomic values), the *h*’s represent a component of the line mean that cannot be captured by regression on markers, and the *r*’s are the contribution to within-line effects from replicates.

The genomic value for chromosome arm *k* (a scalar) is defined as the sum of the effects of all the markers in chromosome arm *k*; that is, for line *j*, chromosome arm *k*, in males, (5)with an equivalent expression for females. In (5), *p _{k}* is the number of markers in chromosome arm

*k*, the scalar

*w*is the observed (centered and scaled) label for marker

_{ijk}*i*in chromosome arm

*k*in line

*j*, and is the effect of marker

*i*in chromosome arm

*k*in males. The × 1 vectors of genomic effects for males and females for chromosome arm

*k*are and , respectively, where

*W*= {

_{k}*w*} is the observed ×

_{ijk}*p*matrix of marker genotypes of chromosome arm

_{k}*k*. The joint distribution of vectors and is assumed to be (6)where the

*I*’s represent

*p*×

_{k}*p*identity matrices, is the prior uncertainty variance of marker effects for males (females) in chromosome arm

_{k}*k*, and is their prior covariance.

Due to the centering the rank of *W _{k}* is It follows from these assumptions and from standard properties of the multivariate normal distribution that the model for the joint distribution of genomic values in males and females is the singular multinormal (SN) distribution, (7)Above, is the genomic variance in males in chromosome arm

*k*, is the genomic variance in females in chromosome arm

*k*, is the genomic covariance between males and females in chromosome arm

*k*, is the genomic relationship matrix of rank of chromosome arm

*k*, and SN denotes the singular normal distribution (details in File S1). A more compact notation for (7) is (8)where (9)Similarly, collecting the

*h*’s in vectors with elements for each sex and the

*r*’s in vectors of

*n*

_{t}= (

*n*

_{ℓ}×

*n*

_{r}) elements for each sex, their joint distribution is assumed to be (10)and (11)The identity matrix

*I*is of order

*n*

_{ℓ}×

*n*

_{ℓ}in (10) and of order (

*n*

_{ℓ}

*n*

_{r}) × (

*n*

_{ℓ}

*n*

_{r}) in (11). The vectors of log-sample variances for males and females,

*z*

_{m}and

*z*

_{f}, each with

*n*

_{t}=

*n*

_{ℓ}

*n*

_{r}elements, are conditionally normally distributed, given

*g*,

*h*, and

*r*, and take the form (12a) (12b)where for starvation resistance and for startle response. In these expressions 1 is an

*n*

_{t}× 1 vector of ones, and

*Z*is an observed

*n*

_{t}×

*n*

_{ℓ}incidence matrix (of ones and zeros) that associates each of the

*n*

_{r}log-sampling variances of a given sex and line to a common element

*g*or

*h*. The vectors

*r*contain the effects of the

*n*

_{t}=

*n*

_{ℓ}

*n*

_{r}replicates.

For males, inferences are reported in terms of ratios (13)with and a similar expression for females. When the elements of *W* are scaled so that the average of the diagonal elements of *WW*′ is 1, (13) quantifies the proportion of the variance in log-sampling variance between lines (total genetic variance), captured by genetic marker information associated with chromosome arm *i*. The ratio involving marker effects from all the chromosome arms is (14)with a similar expression for females.

We also report within-chromosome arm genomic correlations (15)and broad sense heritabilities for each sex, here defined as (16a) (16b)These quantify the variance in log-sample variance observed between lines (which is the total genetic variance), as a proportion of the total variance (sum of the between- and within-line components). In the denominator of (16), the conditional variance in expression (2) is excluded because it is data dependent: the number of phenotypes per replicate used to compute *z*. Therefore, expressions (16a) and (16b) are interpreted as broad sense heritability of sample variances computed with a large number of replicates.

Finally, we also compute the total genomic correlation between sexes, defined as (17)and the broad sense genetic correlation between sexes, defined as

(18)### The genomic variance of chromosome-specific components

The models specified by (3), (4), and (7) quantify the relative contribution from each of the chromosome arms 2*L*, 2*R*, 3*L*, and 3*R* by fitting separate variance components for each of those segments. Here we describe the basis for this partitioning. The point of departure is to assume that the 2*n*_{ℓ} × 1 vector of genomic effects is equal to the sum of the vectors of genomic effects belonging to *C* chromosome components. Then (19)where *g _{i}* is the 2

*n*

_{ℓ}× 1 vector of genomic values for males and females, associated with component

*i*,

*i*= 1,…,

*C*;

*p*is the number of marker genotypes in component

_{i}*i*; and

*Vg*is the 2 × 2 genomic covariance matrix from chromosome component

_{i}*i*given by (9). Expression (19) holds under the model assumption that the components of the vector of marker effects are realizations from

*b*∼

_{i}*N*(0,

*I*⊗

*Vb*), with , . Then In these expressions, is the vector of marker effects of chromosome component

_{i}*i*, and

### Analyses based on simple models

Two simple two-trait models are implemented that partition the total variance in log-sampling variance into a between-line and a within-line component. The difference between the two models is whether marker information is included or not in the analysis.

#### Model *H*:

This model does not include marker information. Dropping the regression on markers (and therefore all the genetic marker effects ) results in a model with line and replicate effects only. The line effects in model *H* are assumed to be identically and independently distributed. The variance–covariance structure of marginal distribution of the observations *z* (with respect to lines) is block diagonal, where the elements within a block = 1 and specify the covariance between observations within lines. Due to the balanced nature of the data, maximum-likelihood estimates of variance components under Gaussian assumptions are equal to method of moments estimates that can be derived from standard ANOVA.

#### Model *G*:

This model includes marker information. Dropping the effects *h* and including an overall *g* effect leads to a model with replicate and line effects also, but in contrast to model *H*, here line effects have a correlated structure given by the genomic relationship matrix. In this model, the *g*’s are realizations from a common distribution . In model *G*, the variance–covariance structure of marginal distribution of the observations *z* includes a dominating block-diagonal structure, with elements given by the diagonals of the genomic relationship matrix (which, on average, = 1), and small off-block diagonals that describe the weak genetic marker-based measure of covariation among lines defined by the genomic relationship matrix.

Inferences from model *G* are reported as (20)for males, with a similar expression for females, and as (21)The ratio (20) describes the broad sense heritability or variance between lines as a proportion of the total variance, when only marker information is included in the model. The correlation (21) can be interpreted as the total genetic correlation between sexes.

Similarly, inferences from model *H* are reported as (22)for males, with a similar expression for females, and as (23)The ratio (22) describes the broad sense heritability or variance between lines as a proportion of the total variance, when marker information is excluded from the model. The correlations (23) and (21) can be interpreted as the total (broad sense) genetic correlation between sexes. The ratios (20) and (22) correspond to (16).

### Implementation of the models

The models were implemented using a Bayesian, Markov chain Monte Carlo approach, whereby the hyperparameters of the dispersion parameters of the Bayesian model were estimated by maximum likelihood, and conditional on these, the model was fitted using Markov chain Monte Carlo. To illustrate, consider determining a value of the scale parameter of an inverse chi-square distribution, which is the prior assumed for the variance components associated with *r*. This is achieved by equating the mode of the inverse chi-square prior density, for a fixed value of the degrees of freedom, to the maximum-likelihood estimate and solving for the scale parameter. The decision to use this approach rather than classical likelihood is based on the need to obtain measures of uncertainty that account as much as possible for the limited amount of information in the data without resorting to large sample theory, which is inherent in classical likelihood. This is relevant in the present study where the data are of limited size and results in marginal posterior distributions that are expected to be asymmetric, particularly in the case of the more complex, general two-trait model. The absence of symmetry of posterior distributions is well captured by the Bayesian MCMC methods, as a glance at results in Table 2 reveals. In a situation like this, summarizing results in the form of posterior means and standard deviations would be misleading. In the presence of asymmetry, posterior means are poor indicators of points of high probability mass. We therefore summarize inferences in terms of posterior modes and posterior intervals.

Details of the prior distributions of the parameters of the Bayesian models and of the Markov chain Monte Carlo algorithm can be found in File S1.

### Data availability

The data used in the study is available at http://dgrp2.gnets.ncsu.edu/data.html.

## Results

The first step of the analysis consisted of computing descriptive statistics and graphs (boxplots) based on which, by simple graphical inspection, we assessed the existence of differences in log-sample variance between lines and sexes. Subsequently we tested more formally the existence of variance heterogeneity between lines, using a Monte Carlo implementation of Bartlett’s test. This led to rejection of the null hypothesis of homogeneity of variance for both traits.

In a second step we performed more formal analyses, using two sets of models. The first set consists of simple models that partition the total variance in log-sampling variance into between- and within-line components. The second set employs more complex models whereby the variance between lines (in principle of genetic origin) is partitioned into components from four chromosome segments that capture variation due to linear regression on genetic markers and a remainder.

### Descriptive analyses

For starvation resistance, raw means (variances) of are, for males, 3.92 (0.53) and for females 4.47 (0.57). The raw value of the correlation of between sexes is 0.43. For startle response, these values are, for males, 3.09 (0.41) and for females 3.02 (0.44). The raw correlation is 0.73.

Figure 1 provides a summary of the empirical distribution of SR records. Figure 1A displays boxplots of the records by line sorted in decreasing order of log-sampling variance within lines, from left to right. The corresponding plots for SL are shown in Figure 2A. The results for males are similar and are not shown. Figure 1B and Figure 2B show scatter plots of the within-line log-sampling variance of males *vs.* females for SR and SL, respectively. Figure 1 and Figure 2 indicate marked heterogeneity of within-line variance across lines for both traits. Figure 1B and Figure 2B suggest that there is a poor linear relationship of within-line log-sample variance between sexes for SR but not for SL. In the case of SR, this is indicative of sexual dimorphism at the level of the mean (as indicated above, the mean is larger in females than in males).

### Testing for heterogeneity of environmental variance

Before fitting the parameterized models a Monte Carlo implementation of the Bartlett test was carried out within sex (technical details in File S1) to test for differences in the sampling variances (Equation 1). The *P*-values in both traits are of the order of 10^{−16}, leading to a clear rejection of the hypothesis of variance homogeneity between lines.

### Analyses based on model *H* and model *G*

MCMC estimates of posterior modes and 95% posterior intervals are shown in Table 1. These simple models lead to symmetric posterior distributions, as a glance at the numbers reveals. Posterior means and medians are almost indistinguishable from the posterior modes (not shown). Both models lead to very similar inferences. For SR, males show a little higher broad sense heritability than females. The total genetic correlation between sexes departs clearly from 1. These two sources of evidence provide a first indication of genetic variance in sexual dimorphism for this trait.

For SL the results indicate very similar broad sense heritabilities in the two sexes and total genetic correlations that are very close to 1. There is no indication of genetic variance in sexual dimorphism for this trait.

### Inclusion of sex chromosomes

As a small check, the analysis based on model *G* was repeated, including genetic marker information from the sex chromosomes. For SR, the modal values of the variance components between and within lines, and the between-line covariance between males and females, were as follows: including sex chromosomes, the estimates were 0.2278, 0.1966, and 0.1240, respectively. Excluding sex chromosomes, the estimates were 0.2308, 0.2009, and 0.1266, respectively. The modal values of the genetic correlation between sexes, including and excluding sex chromosomes, are 0.5857 and 0.5880, respectively. The differences are statistically indistinguishable. The same occurred for SL (not shown): inclusion or exclusion of sex chromosomes led to virtually identical results.

### Analysis based on the general two-trait model

Results summarized in terms of estimates of posterior modes and 95% posterior intervals of the proportion of the total genetic variance captured by genetic marker information from each of four chromosome arms (Equation 13), and of the within-chromosome arms genomic correlations (Equation 15), are displayed in Table 2. SR shows clear signals from chromosome arm 2*R* only in females and from chromosome 3*R* in both sexes. Judging by the 95% posterior intervals, the only within-chromosome genomic correlation that deserves mention is that associated with chromosome 3*R* (although the value of zero is included in the posterior interval). The remaining ones involve association between variables that hardly display variability at all. In this case the correlation does not contribute to phenotypic similarity and is very difficult to estimate. This is well captured by the Bayesian MCMC approach that results in wide posterior intervals reflecting large posterior uncertainty.

In contrast to SR, SL shows a very similar pattern in both sexes. There are clear signals from chromosome arms 2*L* and 3*L* in males and females. The genomic correlations between sexes in these chromosome arms are high and the posterior intervals do not include the value of zero.

The last row of Table 2 shows the overall proportion of the variance between lines captured by all the genetic marker information, as well as the overall genomic correlation. In the case of SR, this proportion is rather small in males (27%), it is 2.3 times larger in females (63%), and the total genomic correlation between sexes is moderate to small (0.36). However, for SL, the pattern is again very different. The total proportion of the genetic variance (variance between lines) captured by the markers is almost 100% in both sexes and the total genomic correlation between sexes is high (0.72). Therefore the general two-trait model substantiates the results suggested by the analyses based on model *H* and model *G*, indicating the existence of genomic variance in sexual dimorphism for SR and the lack of it in the case of SL.

In contrast to the results from Table 1, the posterior distributions in Table 2 show clear signs of asymmetry, which is attenuated when the signal is very strong. For example, the posterior mode, mean, and median in chromosome arm 2*L* for SR in males are 0.008, 0.035, and 0.019, respectively. For chromosome arm 3*R* for SR in females, which shows a moderate value, these numbers are 0.20, 0.24, and 0.23. However, for SL, chromosome arm 3*L* in females has a strong effect, and mode, mean, and median are 0.57, 0.55, and 0.55.

From the general two-trait model we also computed posterior modes and 95% posterior intervals of broad sense heritabilities and correlation defined in (16) and (18). These parameters have the same interpretation as those displayed in Table 1, inferred with models *H* and *G*. For SR, the estimates of (16a) and (16b) are 0.66 (0.51; 0.73) and 0.61 (0.48; 0.69), respectively. For the broad sense genetic correlation (Equation 18), the number is 0.52 (0.42; 0.65). These numbers are in good agreement with those in Table 1. For SL, the modal estimates of (16a) and (16b) are 0.87 (0.74; 0.94) and 0.85 (0.71; 0.93), respectively, and the number for (18) is 0.77 (0.57; 0.96). These broad sense heritabilities are a little higher than those reported in Table 1, and the broad sense genetic correlation is lower.

## Discussion

In previous work (Morgante *et al.* 2015) we produced evidence of genetic control of environmental variance for quantitative traits, using *Drosophila* inbred lines. Due to the design of the experiment and the possibility to control macroenvironmental variation in the laboratory, this evidence must be regarded as substantial, because differences between lines in environmental variance are a reflection of genetic variation. Here we extend this work by incorporating full sequence information and modern computationally intensive statistical methods. This provides opportunities for a more detailed investigation of environmental variability. This study revealed different mechanisms operating in the two traits and sexes. Specifically, for starvation resistance, the analyses indicated that the total variance in variance between lines is a little smaller in females than in males and the broad sense genetic correlation between sexes is markedly <1 (0.58). The fraction of the between-line variance captured by genomic markers is almost 2.3 times larger in females than in males. Further, the strength of the genomic signals across the four chromosome segments differs in the two sexes. On the other hand there is no indication of genetic or genomic variance in sexual dimorphism in startle response, where total variance in variance between lines is almost the same in both sexes, the broad sense genetic correlation between sexes is almost 1, and practically all the variance in environmental variance between lines is captured by marker information. In this trait, in agreement with the absence of genetic and genomic variance in sexual dimorphism, the genomic signals in the four chromosome segments are very similar in males and females.

### The marker-based model (*G*) and the marker-free model (*H*) lead to the same partitioning of the total variance into a between-line and a within-line component and the same broad sense genetic correlation

The analyses based on models *H* and *G* confirm the existence of a genetic component acting on environmental variance. Both models yielded very similar results for the estimates of broad sense heritabilities (∼0.65 for SR in males and 0.60 for SR in females and ∼0.70 for SL in both sexes) and for the broad sense genetic correlations (0.58 for SR and 0.97 for SL). These results are indicative of genetic variance in sexual dimorphism for SR and lack of it for SL. Sexual dimorphism has also been reported at the level of the mean by, for example, Mackay *et al.* (2012) and Zhou *et al.* (2012) who observed very different patterns for the number of dead flies *vs.* starvation time in males and females.

Genetic marker information, included in model *G* and excluded in model *H*, has an undetectable effect on the subdivision of the total variance. This is due to the presence of replicates within lines (genotypes) and the fact that lines have very small relationships among them. In the case of model *H*, the between-line component is completely defined by the covariation of the elements of *z* within lines. On the other hand, model *G* leads to a covariance structure of *z* that has dominating diagonal blocks, consisting of the diagonal elements of the (scaled) genomic relationship matrix, and off-diagonals that describe the weak genetic marker-based relationships among lines. In the case of model *G*, the elements of the diagonal blocks are 1 on average, in contrast to model *H*, where they are all exactly 1. In model *G*, the off-diagonal elements, in principle, contribute to additive genetic variance or to narrow sense heritability. Therefore estimates of the variance between lines or of broad sense heritability based on model *G* may be difficult to interpret and are expected to differ from estimates based on model *H*. However, our analyses do not detect such differences, presumably due to the very dominating block-diagonal structure, very similar in both models, in relation to the amount of data available. This situation is similar to what is encountered in the analysis of outbred populations, using genomic best linear unbiased prediction. In pedigreed populations with strong family structures, estimates of heritability obtained using pedigree-based or genetic marker-based covariance matrices lead to similar results (de los Campos *et al.* 2013). In the present data, the “strong family structure” is represented by the replicated genotypes within lines.

### Partition of variance between lines using genetic markers from different chromosome arms reveals further differences between traits and sexes

The results from models *H* and *G* are further supported by the general two-trait model that allows a partitioning of the total variance between lines into contributions from four chromosomal segments and a remainder not captured by markers. For SR, the partitioning indicates a strong signal generated by chromosomal segment 2*R* in females and clearly detectable signals from chromosome arm 3*R* in both sexes. In contrast, for SL, segments 2*L* and 3*L* generate strong signals in both sexes. We conclude that sexes show different signal behavior in the case of SR and a similar one in the case of SL. Our data do not allow an investigation of the nature of the differences of the signals between males and females.

Another line of evidence supporting these results was sought by carrying out an informal genome-wide association analysis for each trait, within sexes. Even though the experiment is underpowered for SNP effect detection, we simply counted the number of markers whose effects were associated with *P*-values arbitrarily chosen to be <0.01 and the results are shown in Table 3. A glance at the numbers in Table 3 shows that the genomic regions enriched with genetic markers with stronger signals agree well with the pattern displayed in Table 2. That is, for SR, males and females show the largest numbers in 3*R* followed by another large number (4954) only for females in arm 2*R*. On the other hand, for SL, both males and females show the largest numbers in Table 3 in relation to arms 2*L* and 3*L*. Arriving at the same conclusion using different approaches provides stronger support for our findings.

### The nature of the variance explained by genetic markers

In the present experiment the observed variance between inbred lines provides an adequate description of the total genetic variance of the trait, which can be inferred directly from models *H* and *G*. The analysis based on the general two-trait model indicates that the linear regression of the log-sampling variances on genetic markers captures only a fraction of this genetic variation in the case of SR and practically all of it in the case of SL. Further, in the case of SR this fraction is less than half as large in males as in females (27% *vs.* 63%).

Possible explanations of the difference between the estimated broad sense heritability and the proportion of variance explained by linear regression on sequence-derived SNPs may include rare variants not captured by the sequence-derived SNPs, structural variation, and epistatic interactions (the latter was found to be an important mechanism acting at the level of the mean for both SR and SL by Huang *et al.* 2012). Dominance is not a plausible explanation because the great majority of the data involve homozygotes only. Further work is clearly needed to elucidate the factors that can explain why the linear regression on genetic markers accounts for different proportions of the total genetic variance (variance between lines), depending on the trait and the sex.

## Acknowledgments

The study was funded by the Danish Strategic Research Council (GenSAP: Centre for Genomic Selection in Animals and Plants, contract no. 12-132452). GDLC and DS acknowledge financial support from NIH grants GMR01101219 and GMR01099992.

## Footnotes

*Communicating editor: G. A. Churchill*Supporting information is available online at www.genetics.org/lookup/suppl/doi:10.1534/genetics.115.180273/-/DC1.

- Received July 4, 2015.
- Accepted August 5, 2015.

- Copyright © 2015 by the Genetics Society of America