## Abstract

The interaction between genotype and environment is recognized as an important source of experimental variation when complex traits are measured in the mouse, but the magnitude of that interaction has not often been measured. From a study of 2448 genetically heterogeneous mice, we report the heritability of 88 complex traits that include models of human disease (asthma, type 2 diabetes mellitus, obesity, and anxiety) as well as immunological, biochemical, and hematological phenotypes. We show that environmental and physiological covariates are involved in an unexpectedly large number of significant interactions with genetic background. The 15 covariates we examined have a significant effect on behavioral and physiological tests, although they rarely explain >10% of the variation. We found that interaction effects are more frequent and larger than the main effects: half of the interactions explained >20% of the variance and in nine cases exceeded 50%. Our results indicate that assays of gene function using mouse models should take into account interactions between gene and environment.

IT is widely recognized that environmental variables, such as who carries out the experiment and when, and physiological variables, such as sex and weight, are confounds that need to be accounted for during the collection of mouse phenotypes. Many articles attest to the effect of these variables on phenotypic values (*e.g*., Chesler *et al.* 2002a; Champy *et al.* 2004) and point out the need for rigorous standardization of laboratory practice (Henderson 1970; Crabbe *et al.* 1999; Brown *et al.* 2005). It is also acknowledged that the size and even direction of environmental effects on a phenotype can vary with genotype, a phenomenon known as gene-by-environment interaction, and this has been documented in studies of rodents over the past 50 years (*e.g*., Cooper and Zubek 1958).

Following a report on the importance of laboratory-by-strain interaction (Crabbe *et al.* 1999), recent interest has focused on the prevalence and size of such interactions, as well as their ability to increase power in genetic mapping experiments (Wang *et al.* 2006). Table 1 summarizes the available data and shows that the picture of how much genetic and environmental factors interact is piecemeal: our knowledge of the relative size of interaction and main effects is limited to a handful of phenotype–covariate combinations.

During an investigation of the genetic basis of complex traits in 2448 genetically heterogeneous stock (HS) mice (1220 female, 1228 male) (Solberg *et al.* 2006), we collected environmental and physiological covariates. The mice we used were descended from eight inbred strains (A/J, AKR/J, BALBc/J, CBA/J, C3H/HeJ, C57BL/6J, DBA/2J, and LP/J) (Demarest *et al.* 2001), incorporating more genetic variation from a single cross than has hitherto been assessed in mice. The generality of our findings is enhanced by our use of a battery of tests that includes both behavioral and a broad range of physiological phenotypes (Solberg *et al.* 2006), summarized in Table 2 (the names of all phenotypes are given in Table 3).

## METHODS

#### Animals:

Original Northport HS mice were obtained from Robert Hitzemann at the Oregon Health Sciences Unit, Portland, Oregon. At the time the animals arrived they had passed 50 generations of pseudorandom breeding (Demarest *et al.* 2001). A breeding colony in open cages was established at Oxford University to generate animals for phenotyping. The animals' pedigree comprising the parents and grandparents of the phenotyped animals was recorded.

#### Phenotypes and covariates:

The phenotypes used in this study and the protocol used to collect them are fully described in Solberg *et al.* (2006) and summarized in Table 2. We collected 15 covariates (Table 4). Seven are mouse-specific covariates (short names quoted in brackets where needed): sex, age, cage identifier (*i.e*., a unit of shared environment), weight at 9 weeks (“weight”), number of animals in a cage (“cage density”), sibship (“family”), and which litter the mouse came from (“litter”; *e.g*., “3” means the animal came from his parents' third litter); three are test-specific covariates: experimenter, test order, and apparatus (if more than one was used); and five covariates are for the time of the experiment: year, season (the group of three months), month, hour (time rounded to the nearest hour), and “study day,” defined as the number of days from start of the study on January 20, 2003.

In the analysis, we fitted statistical models for each phenotype, first testing the significance of each covariate as a main effect and then its interaction with genetic background. Covariates were either treated as continuous variables [age, cage density, litter, study day (continuous), weight] or encoded as categorical factors taking discrete levels (apparatus, cage, experimenter, sex, hour, month, season, year, and family). Note that although hour could have been treated as continuous, that would have allowed detection of only linear trends between time and phenotype, whereas as a factor it can be used to detect nonlinear relationships.

#### Statistical analysis:

All analysis was carried out using the R statistical package (R Development Core Team 2004), along with the add-on packages lme4 (Pinheiro and Bates 2000), MASS (Venables and Ripley 2002), and regress (Clifford and McCullagh 2005).

We applied normalizing transformations to each phenotype, guided by the Box–Cox procedure (Venables and Ripley 2002), and in most cases this comprised a simple exponentiation or log transform to correct skewness (see Table 5). Phenotypes with symmetrical but highly long-tailed distributions were corrected with a simplified Blom transformation (Blom 1958), in which the value is replaced by the probit of its empirical distribution function probability. Asymmetric highly skewed long-tailed distributions best modeled as exponential or gamma distributions were excluded from the analysis, as were categorical phenotypes and latency phenotypes that require survival analysis. After transformation, each phenotype was trimmed by removing values more than 3 standard deviations from the mean to moderate the effects of outliers.

#### Modeling the heritability and the effect of common environment:

We used a variance-components approach to model the effect of genetic background. Here the genetic effect on an animal's phenotype is a value drawn from a normal distribution constrained such that the genetic effects of different animals correlate with their relatedness. First we fitted a standard additive genetic, common environmental error, unique environmental error (ACE) model to obtain estimates of the proportion of phenotypic variance attributable to additive genetic effects (*i.e*., the heritability) and to shared environmental effects. Second, we used an approximation to the ACE model that could be extended to test for the effect of individual environmental covariates.

We formulated the ACE model as follows. Let *n* be the total number of animals, be the number of cages, μ be the grand mean, be the phenotype of the *i*th animal in the *j*th cage, be that animal's additive genetic random effect, be its value for covariate *c*, be the fixed effect associated with covariate *c*, *C* be the set of fixed-effect covariates, be the random effect of cage *j*, and be the random effect of uncorrelated environmental noise. Then(1)where, the *n*-vector , the -vector , and the *n*-vector , where is the additive genetic relationship matrix (*e.g*., see Lynch and Walsh 1998) computed from the pedigree. We estimated the heritability of each phenotype, *i.e*., the proportion of variance attributable to additive genetic variation, as and the size of the common environmental effect as , where is the phenotypic variance. The set of covariates chosen for *C* was sex, litter, and, for phenotypes not directly related to body mass, weight. Fitting was done by restricted estimate maximum likelihood (REML), using the R package regress.

#### Testing main effects of covariates:

For each phenotype we tested the significance of individual covariates using an approximation to the ACE model above. We employed a random family effect as a surrogate for the genetic effect, replacing the random effect , specific to individual *i*, with a random effect , specific to family *q*. As explained below, this substitution amounts to a reparameterization that affects in a predictable fashion only the estimated variance of random terms. Also, because we wish to examine the effects of individual environmental covariates, we excluded a catch-all random effect for cage, which would otherwise be heavily confounded with any individual environmental covariate. Using notation similar to that above, the model for testing the significance of covariate was(2)where are the fixed effects associated with covariate *c*, is the component of the design matrix representing the *i*th animal's value for covariate *c*, and are defined similarly for , and is such that if there are nuclear families then the . We measured the significance of the covariate as the improvement in fit conferred by covariate after certain basic covariates (*C*) had already been included. The set *C* usually comprised sex and, for phenotypes not directly related to body mass, weight. When was weight, *C* comprised only sex; when was sex, *C* was empty. The significance of the fixed effect was assessed using an approximation to the sequential *F*-test based on the Wald test (Pinheiro and Bates 2000). We fit all models by REML using the lmer function from the R package lme4 (Pinheiro and Bates 2000).

#### Testing interaction effects between covariates and family:

We define the “interaction model” for the covariate and family by adding a term to the main-effects model in Equation 3 to allow each family to have its own effect for that covariate. For factor covariates, the interaction model included a random intercept nested within family, *i.e*.,(3)where is the fixed effect associated with category *k* of covariate , and is the random effect for category *k* in family *q*, such that if there are unique combinations of category and family then the -vector . For continuous covariates, the interaction model included a random slope for conditioned on family, *i.e*.,(4)where is the fixed coefficient of covariate , is the random deviation from that coefficient in family *q*, and the correlation between the random intercept *f* and slope *u* is unrestricted. We assessed the significance of the interaction model (Equation 3 or Equation 4) by a likelihood-ratio test (LRT) with the corresponding main-effects model. Note that by using the change in the number of degrees of freedom to parameterize the chi-square distribution used for the LRT, our *P*-values for interaction effects are slightly conservative (Self and Liang 1987). We used the Dunn–Šidák correction, an exact form of the Bonferroni correction (Sahai and Ageel 2000), to take account of the number of tests performed. For *N* tests, the corrected 5% threshold is .

The magnitude of a covariate's effect is defined as the percentage of phenotypic variance it explains, estimated in the model used to test its significance. For fixed effects, this is the percentage of the total sum of squares attributable to the effect in a sequential ANOVA table after fitting the other covariates (known in some literature as η^{2}; Olejnik and Algina 2003). For random effects, it is the estimated variance of the effect expressed as a percentage of the total phenotypic variance. Where the random effect is based on an interaction with family, we report the percentage variance as twice that of the estimated amount, in accordance with the reparameterization formulas described below.

Our use of family as a surrogate for the genetic effect means we underestimate the effect size of interactions by a factor of two. However, this difference is entirely superficial. Suppose the *n* animals are sorted in order of their nuclear families. When fitting the family effect, the *n*-vector of random effects is distributed as , where the matrix **F** is block diagonal such that is 1 if *i* and *j* are in the same sibship and 0 otherwise (note that parents are not included in the analysis because phenotypes were collected only on the offspring). The covariance matrix for all random effects is therefore(5)where is the environmental variance when using family for the genetic effect. This models all animals within a sibship as if they were genetically identical and all sibships as nuclear. Treating sibships as nuclear is reasonable in our case since the sparsity of our additive genetic relationship matrix means that , where when *i = j*, 0.5 when *i* and *j* are sibs, and 0 otherwise, and we found empirically that in this data set the likelihood ratios using the full pedigree **A** matrix were very close to those obtained using the nuclear approximation **S**. Using the approximation **S** for **A**, our heritability models a covariance matrix(6)

Substituting the equality and equating the coefficients of **F** and **I**, it follows that such that when estimated, , which agrees with our observed discrepancy between family-effect size and heritability. Similarly . Thus the two models are reparameterizations of each other. When fitted, they have identical likelihood ratios, and hence is an estimate of the true additive genetic variance.

Our estimates of the variance attributable to gene-by-environment effects also rely on the use of the family surrogate. Applying a similar argument to that above we can show that those variance estimates are also half what they would be if we used the **S** matrix. The variance of the interaction model for categorical covariates (Equation 4) is(7)where is the variance of the interaction and is its correlation matrix, which is simply **F** but with when animals *i* and *j* are in different categories. If we were to use **S** (an approximation for **A**) in place of **F** we would have(8)with being the interaction between the categorical covariate and the additive genetic effect. However, since and , it follows that and therefore . For interactions between a continuous covariate *x* and family (Equation 5) the variance is(9)where when **x** is the *n*-vector of *x* for the *n* animals. If we were to use **S**-approximation for **A** the variance would be(10)Substituting as before, , which implies . Hence, in all cases the estimated variance of an additive genetic component is simply twice that of the corresponding family component.

## RESULTS

Of the 102 phenotypes available for analysis (Solberg *et al.* 2006), 88 could be accommodated in our linear mixed modeling framework (see methods). We obtained data for 15 covariates (Table 4): age, apparatus (for those tests where multiple units were used), cage (a variable indicating animals that were housed in the same cage), cage density (the number of animals in a cage), experimenter, family (defined as the offspring of two parents), sex, hour, litter (a number representing the birth order of each litter for a given sire and dam), month, season, study day, test order, weight, and year. An average of 10.3 covariates were recorded per phenotype (since not all phenotype–covariate combinations were available), leading to an average of 69.4 phenotypes measured per covariate. In total, we performed 1804 statistical tests. The significance of results is reported as the negative base 10 logarithm of the *P*-value (log *P*) of the relevant test. We took account of multiple testing by using the Dunn–Šidák correction, which for α = 5% comparisonwise error rate yielded a significance threshold of log *P* = 4.55.

We assessed initially the importance of three physiological covariates (sex, weight, and age). We fitted the covariates sequentially in the order sex, then weight, then age, so that, for instance, our reported significance for weight refers to how much it improved the fit of a model that already included sex. We included family in all models to ensure tested covariates were significant over and above genetic effects. Family, modeled as a random effect, is highly correlated with heritability (correlation of 0.89) and so acts a surrogate for the effect of additive genetic variation (see methods). We report estimates of heritability for all phenotypes in Table 5.

The effects of sex, weight, and age were relatively small (Figure 1b, “main effect” rows): sex effects explained >10% of the variance for 14 phenotypes; in more than half of the cases the effect was <5%; weight accounted for >10% of the variance for three phenotypes; all age effects were <2% (see appendix).

We estimated the significances and effects of the remaining covariates by adding each to a model that already included family, sex, and weight. Significant main effects of covariates were more common in physiological than behavioral phenotypes (33% of the time *vs.* 13%; see Table 6). Overall, 21 of the 258 significant effects explained >10% of the variance; the five cases of when a covariate explained >25% of the variance involved sex. Table 6 provides a summary for each covariate, splitting results by category of phenotype. Figure 1 plots log *P*-values and the percentage of phenotypic variance explained by significant covariates. Figure 2 summarizes the variance explained by significant covariates for the 16 subcategories of phenotype.

We then extended our model to test for gene-by-covariate interactions, taking the main-effects models reported above and then assessing how much adding interaction terms improved the fit. We found 389 significant interaction effects. Figure 3 illustrates the interaction between sex and family on the percentage of B-cells (%B220^{+}) among white blood cells. It shows that the effect of sex is often marked within families but its direction can vary between families. Similarly, Figure 4 illustrates the interaction between family and season on mean adrenal weight measured at 10 weeks. It shows seasonal means (spring in green, summer in red, autumn in brown, and winter in blue) for 28 families. In 11 families, adrenal glands are heaviest when harvested in winter, whereas in 9 families they are heaviest in summer. The seasonal effects are strong within but inconsistent across families, reflecting the greater importance of interaction over main effects.

The distribution of the 389 significant interaction effects differed from that of the main effects (Figure 1 and appendix). Remarkably, half of the effects could explain >20% of the variance. In nine cases the interaction could explain >50% of the variance. The largest numbers of interactions were with month (65 significant effects), season (55), sex (53), litter (51), and cage density (40). There were only 13 significant interactions with experimenter.

Physiological phenotypes showed the largest number of interactions with covariates (56% of interactions tested were significant; Table 7). Largest effects were found on mean cellular hemoglobin concentration, serum sodium and serum chloride concentrations, and plethysmography measures. There were fewer interactions with behavioral phenotypes (5% of interactions tested were significant, amounting to 11 in total), although the effect sizes were much the same on average (mean of 18.1% for behavior compared with a mean of 18.6% for physiology; see Figure 2).

## DISCUSSION

We have carried out the first systematic analysis of a range of covariates across multiple phenotypes (see appendix). We have estimated the heritability of 88 phenotypes, assessed the impact of a number of environmental factors, and measured the size of gene-by-environment interactions. Our large data set provides the most robust assessments to date of these measures in both behavioral and physiological domains.

We found large interactions between gene and environment and report that the effects are not restricted to behavioral phenotypes (see appendix). We do not believe this is an artifact of our analysis. Our calculations of percentage variance for random interaction effects and for fixed main effects are only roughly comparable with each other (see methods) and the interaction effects are subject to a slight upward bias. However, that is not sufficient to account for the substantially higher effect of significant interactions (18.6%) compared with significant main effects (3.7%). Second, inhomogeneity of phenotype variance across families is also unlikely to account for our findings since in many cases the rank order of covariate effects differs between families (Ungerer *et al.* 2003) as illustrated in Figures 3 and 4.

We report the effects of covariates as the percentage of phenotypic variance they explain and in doing so provide one assessment of how environmental covariates influence a phenotype. But the true nature of this interaction is more complex. For example, the concentration of alanine transaminase is subject to gene-by-environment interactions of month, accounting for 48.49% of the phenotypic variance, of season, accounting for 45.51%, and of litter, accounting for 18.17%. Yet these effects combine, with further covariates, to produce 100%. How is this possible?

The correlational structure of our data complicates an assessment of the relative importance of different covariates and interactions. The observed phenotypic variance is the sum of the variances of the covariates minus twice the covariances between the covariates. This means that two covariates could have individual effects of 50% but a summed effect of 60% if they are positively correlated (or one of <50% if they are negatively correlated). An observed covariate effect, just like an observed QTL effect, therefore includes a portion of the effect of any element that correlates with it; an actual month effect will partly manifest as observed litter and season effects and vice versa. A more comprehensive analysis would build a complete picture of each phenotype in the context of a path diagram or structural equation model that enumerated all relationships, both raw effects and correlations, between actors (*e.g*., Lynch and Walsh 1998).

The importance of gene-by-environment interactions has been emphasized in the analysis of mouse behavior and largely ignored in studies of mouse physiology. In the light of this, we designed our phenotyping protocol to minimize the effects of covariates on behavioral measures. All such tests were automated, so that the experimenter's intervention was limited to placing animals in the apparatus. This may explain why some covariates, previously suspected to influence behavioral phenotypes, were found to make a small contribution to the variance: time of day (hour) was a nonsignificant (or hardly significant with negligible effect) contributor to all measures including those that utilize exploration as a measure of anxiety (elevated plus maze, which had observations from 9 different hours of the day, and open field, which had observations from 10), despite the fact that exploratory activity has been reported to vary throughout the day (Aschoff 1981). The order in which animals are tested is also considered to have an important effect on behavior (Harro 1997), but we found no evidence for this: its effect was nonsignificant on all phenotypes measured.

Physiological phenotypes were not so controlled. There are no automated ways of administering an intraperitoneal glucose tolerance test, for example, and we observed large experimenter effects on these tests. This raises the question as to whether some phenotypes are more susceptible to interaction effects than others. Differences in the assessment protocols cannot be the only factor that accounts for the smaller number of interactions in behavioral tests. There are a number of covariates common to all phenotypes whose effects we could not ameliorate: month, season, year, sex, and weight. All of these covariates impinge more on physiological than on behavioral phenotypes (Tables 6 and 7).

Importantly, we observed many significant and large gene-by-environment interactions in our analysis of physiological phenotypes. Biochemical measures showed strong (>10% effect) gene-by-environment interactions with month (in 14 of 16 biochemical phenotypes), sex (12), season (9), and litter (8). We saw a similar pattern of strong seasonal and sex effects for hematology, immunology, plethysmography (which also had a strong hour interaction), and the glucose tolerance test (which also had a strong experimenter interaction). This has profound implications for QTL studies.

QTL detection experiments suffer when covariates are not adequately accommodated in the experimental design and subsequent analysis. First, a QTL may owe some, or indeed all, of its significance to an environmental effect confounded with the allelic variant. When a phenotype is strongly affected by who performed the experiment, any nonfunctional variant that correlates with the experimenter will manifest as a significant, but spurious, effect. The random nature of recombination means that in any experimental cross a fully balanced design is impossible and so confounds of this type are ineluctable. While the impact of covariates can be minimized by regressing out their effects prior to mapping (*e.g*., Valdar *et al.* 2006), this is highly conservative, since in the converse scenario, where experimenter acts as a surrogate variable for an actual QTL effect, the QTL will be missed.

Second, an interaction between a QTL and an environmental covariate may conceal the effect of both, even when covariate and QTL are in the model. For instance, if mice with allele *a* fear experimenter John more than experimenter Alice, but mice with allele *A* fear Alice more than John and all four conditions occur in about equal proportion, then neither experimenter nor QTL will have an observed effect. To recover the genetic effect in this case it is necessary to model the interaction in the mapping procedure (*e.g*., Wang *et al.* 2006).

Our analyses are limited by the relatively small number of covariates that we collected. We have no information on temperature fluctuation and humidity levels [shown to be important for behavioral tests of nociception (Chesler *et al.* 2002a,b)], which might explain month and seasonal effects. We have no information on noise levels that are significantly increased during working hours (Milligan *et al.* 1993). The predominance of significant temporal covariates reflects the importance of many other unknown environmental factors whose effect is moderated through the animals' genotypes. Thus the dissection of complex phenotypes in the mouse will require far more sophisticated observation and analysis of these interactions than has hitherto been attempted.

## Acknowledgments

W.V. gratefully acknowledges receipt of an Access to Research Infrastructures fellowship under Örjan Carlborg, Uppsala University, Sweden, and additionally thanks Mike Neale, Tom Price, and Peter Visscher for helpful discussions. This work was funded by grants from the Wellcome Trust and the European Union Framework 6 Programme, contract no. LHSG-CT-2003-503265.

## Footnotes

Communicating editor: G. Gibson

- Received April 26, 2006.
- Accepted July 23, 2006.

- Copyright © 2006 by the Genetics Society of America