The interaction between genotype and environment is recognized as an important source of experimental variation when complex traits are measured in the mouse, but the magnitude of that interaction has not often been measured. From a study of 2448 genetically heterogeneous mice, we report the heritability of 88 complex traits that include models of human disease (asthma, type 2 diabetes mellitus, obesity, and anxiety) as well as immunological, biochemical, and hematological phenotypes. We show that environmental and physiological covariates are involved in an unexpectedly large number of significant interactions with genetic background. The 15 covariates we examined have a significant effect on behavioral and physiological tests, although they rarely explain >10% of the variation. We found that interaction effects are more frequent and larger than the main effects: half of the interactions explained >20% of the variance and in nine cases exceeded 50%. Our results indicate that assays of gene function using mouse models should take into account interactions between gene and environment.

IT is widely recognized that environmental variables, such as who carries out the experiment and when, and physiological variables, such as sex and weight, are confounds that need to be accounted for during the collection of mouse phenotypes. Many articles attest to the effect of these variables on phenotypic values (e.g., Chesler et al. 2002a; Champy et al. 2004) and point out the need for rigorous standardization of laboratory practice (Henderson 1970; Crabbe et al. 1999; Brown et al. 2005). It is also acknowledged that the size and even direction of environmental effects on a phenotype can vary with genotype, a phenomenon known as gene-by-environment interaction, and this has been documented in studies of rodents over the past 50 years (e.g., Cooper and Zubek 1958).

Following a report on the importance of laboratory-by-strain interaction (Crabbe et al. 1999), recent interest has focused on the prevalence and size of such interactions, as well as their ability to increase power in genetic mapping experiments (Wang et al. 2006). Table 1 summarizes the available data and shows that the picture of how much genetic and environmental factors interact is piecemeal: our knowledge of the relative size of interaction and main effects is limited to a handful of phenotype–covariate combinations.

View this table:

Recent reports of gene-by-environment interactions in mouse

During an investigation of the genetic basis of complex traits in 2448 genetically heterogeneous stock (HS) mice (1220 female, 1228 male) (Solberg et al. 2006), we collected environmental and physiological covariates. The mice we used were descended from eight inbred strains (A/J, AKR/J, BALBc/J, CBA/J, C3H/HeJ, C57BL/6J, DBA/2J, and LP/J) (Demarest et al. 2001), incorporating more genetic variation from a single cross than has hitherto been assessed in mice. The generality of our findings is enhanced by our use of a battery of tests that includes both behavioral and a broad range of physiological phenotypes (Solberg et al. 2006), summarized in Table 2 (the names of all phenotypes are given in Table 3).

View this table:

Summary of phenotypes analyzed, number of animals, and mean age (in days) at which the animals were analyzed

View this table:

Phenotypes assessed in the project



Original Northport HS mice were obtained from Robert Hitzemann at the Oregon Health Sciences Unit, Portland, Oregon. At the time the animals arrived they had passed 50 generations of pseudorandom breeding (Demarest et al. 2001). A breeding colony in open cages was established at Oxford University to generate animals for phenotyping. The animals' pedigree comprising the parents and grandparents of the phenotyped animals was recorded.

Phenotypes and covariates:

The phenotypes used in this study and the protocol used to collect them are fully described in Solberg et al. (2006) and summarized in Table 2. We collected 15 covariates (Table 4). Seven are mouse-specific covariates (short names quoted in brackets where needed): sex, age, cage identifier (i.e., a unit of shared environment), weight at 9 weeks (“weight”), number of animals in a cage (“cage density”), sibship (“family”), and which litter the mouse came from (“litter”; e.g., “3” means the animal came from his parents' third litter); three are test-specific covariates: experimenter, test order, and apparatus (if more than one was used); and five covariates are for the time of the experiment: year, season (the group of three months), month, hour (time rounded to the nearest hour), and “study day,” defined as the number of days from start of the study on January 20, 2003.

View this table:

Covariates used in the study

In the analysis, we fitted statistical models for each phenotype, first testing the significance of each covariate as a main effect and then its interaction with genetic background. Covariates were either treated as continuous variables [age, cage density, litter, study day (continuous), weight] or encoded as categorical factors taking discrete levels (apparatus, cage, experimenter, sex, hour, month, season, year, and family). Note that although hour could have been treated as continuous, that would have allowed detection of only linear trends between time and phenotype, whereas as a factor it can be used to detect nonlinear relationships.

Statistical analysis:

All analysis was carried out using the R statistical package (R Development Core Team 2004), along with the add-on packages lme4 (Pinheiro and Bates 2000), MASS (Venables and Ripley 2002), and regress (Clifford and McCullagh 2005).

We applied normalizing transformations to each phenotype, guided by the Box–Cox procedure (Venables and Ripley 2002), and in most cases this comprised a simple exponentiation or log transform to correct skewness (see Table 5). Phenotypes with symmetrical but highly long-tailed distributions were corrected with a simplified Blom transformation (Blom 1958), in which the value is replaced by the probit of its empirical distribution function probability. Asymmetric highly skewed long-tailed distributions best modeled as exponential or gamma distributions were excluded from the analysis, as were categorical phenotypes and latency phenotypes that require survival analysis. After transformation, each phenotype was trimmed by removing values more than 3 standard deviations from the mean to moderate the effects of outliers.

View this table:

Transformations, heritabilities, and common environment effects for 88 phenotypes listed in order of heritability

Modeling the heritability and the effect of common environment:

We used a variance-components approach to model the effect of genetic background. Here the genetic effect on an animal's phenotype is a value drawn from a normal distribution constrained such that the genetic effects of different animals correlate with their relatedness. First we fitted a standard additive genetic, common environmental error, unique environmental error (ACE) model to obtain estimates of the proportion of phenotypic variance attributable to additive genetic effects (i.e., the heritability) and to shared environmental effects. Second, we used an approximation to the ACE model that could be extended to test for the effect of individual environmental covariates.

We formulated the ACE model as follows. Let n be the total number of animals, Math be the number of cages, μ be the grand mean, Math be the phenotype of the ith animal in the jth cage, Math be that animal's additive genetic random effect, Math be its value for covariate c, Math be the fixed effect associated with covariate c, C be the set of fixed-effect covariates, Math be the random effect of cage j, and Math be the random effect of uncorrelated environmental noise. ThenMath(1)where, the n-vector Math, the Math-vector Math, and the n-vector Math, where Math is the Math additive genetic relationship matrix (e.g., see Lynch and Walsh 1998) computed from the pedigree. We estimated the heritability of each phenotype, i.e., the proportion of variance attributable to additive genetic variation, as Math and the size of the common environmental effect as Math, where Math is the phenotypic variance. The set of covariates chosen for C was sex, litter, and, for phenotypes not directly related to body mass, weight. Fitting was done by restricted estimate maximum likelihood (REML), using the R package regress.

Testing main effects of covariates:

For each phenotype we tested the significance of individual covariates using an approximation to the ACE model above. We employed a random family effect as a surrogate for the genetic effect, replacing the random effect Math, specific to individual i, with a random effect Math, specific to family q. As explained below, this substitution amounts to a reparameterization that affects in a predictable fashion only the estimated variance of random terms. Also, because we wish to examine the effects of individual environmental covariates, we excluded a catch-all random effect for cage, which would otherwise be heavily confounded with any individual environmental covariate. Using notation similar to that above, the model for testing the significance of covariate Math wasMath(2)where Math are the fixed effects associated with covariate c, Math is the component of the design matrix representing the ith animal's value for covariate c, Math and Math are defined similarly for Math, and Math is such that if there are Math nuclear families then the Math Math. We measured the significance of the covariate Math as the improvement in fit conferred by covariate Math after certain basic covariates (C) had already been included. The set C usually comprised sex and, for phenotypes not directly related to body mass, weight. When Math was weight, C comprised only sex; when Math was sex, C was empty. The significance of the fixed effect Math was assessed using an approximation to the sequential F-test based on the Wald test (Pinheiro and Bates 2000). We fit all models by REML using the lmer function from the R package lme4 (Pinheiro and Bates 2000).

Testing interaction effects between covariates and family:

We define the “interaction model” for the covariate Math and family by adding a term to the main-effects model in Equation 3 to allow each family to have its own effect for that covariate. For factor covariates, the interaction model included a random intercept nested within family, i.e.,Math(3)where Math is the fixed effect associated with category k of covariate Math, and Math is the random effect for category k in family q, such that if there are Math unique combinations of category and family then the Math-vector Math. For continuous covariates, the interaction model included a random slope for Math conditioned on family, i.e.,Math(4)where Math is the fixed coefficient of covariate Math, Math is the random deviation from that coefficient in family q, and the correlation between the random intercept f and slope u is unrestricted. We assessed the significance of the interaction model (Equation 3 or Equation 4) by a likelihood-ratio test (LRT) with the corresponding main-effects model. Note that by using the change in the number of degrees of freedom to parameterize the chi-square distribution used for the LRT, our P-values for interaction effects are slightly conservative (Self and Liang 1987). We used the Dunn–Šidák correction, an exact form of the Bonferroni correction (Sahai and Ageel 2000), to take account of the number of tests performed. For N tests, the corrected 5% threshold is Math.

The magnitude of a covariate's effect is defined as the percentage of phenotypic variance it explains, estimated in the model used to test its significance. For fixed effects, this is the percentage of the total sum of squares attributable to the effect in a sequential ANOVA table after fitting the other covariates (known in some literature as η2; Olejnik and Algina 2003). For random effects, it is the estimated variance of the effect expressed as a percentage of the total phenotypic variance. Where the random effect is based on an interaction with family, we report the percentage variance as twice that of the estimated amount, in accordance with the reparameterization formulas described below.

Our use of family as a surrogate for the genetic effect means we underestimate the effect size of interactions by a factor of two. However, this difference is entirely superficial. Suppose the n animals are sorted in order of their Math nuclear families. When fitting the family effect, the n-vector of random effects is distributed as Math, where the matrix F is block diagonal such that Math is 1 if i and j are in the same sibship and 0 otherwise (note that parents are not included in the analysis because phenotypes were collected only on the offspring). The covariance matrix for all random effects is thereforeMath(5)where Math is the environmental variance when using family for the genetic effect. This models all animals within a sibship as if they were genetically identical and all sibships as nuclear. Treating sibships as nuclear is reasonable in our case since the sparsity of our additive genetic relationship matrix means that Math, where Math when i = j, 0.5 when i and j are sibs, and 0 otherwise, and we found empirically that in this data set the likelihood ratios using the full pedigree A matrix were very close to those obtained using the nuclear approximation S. Using the approximation S for A, our heritability models a covariance matrixMath(6)

Substituting the equality Math and equating the coefficients of F and I, it follows that Math such that when estimated, Math, which agrees with our observed discrepancy between family-effect size and heritability. Similarly Math. Thus the two models are reparameterizations of each other. When fitted, they have identical likelihood ratios, and hence Math is an estimate of the true additive genetic variance.

Our estimates of the variance attributable to gene-by-environment effects also rely on the use of the family surrogate. Applying a similar argument to that above we can show that those variance estimates are also half what they would be if we used the S matrix. The variance of the interaction model for categorical covariates (Equation 4) isMath(7)where Math is the variance of the interaction and Math is its correlation matrix, which is simply F but with Math when animals i and j are in different categories. If we were to use S (an approximation for A) in place of F we would haveMath(8)with Math being the interaction between the categorical covariate and the additive genetic effect. However, since Math and Math, it follows that Math and therefore Math. For interactions between a continuous covariate x and family (Equation 5) the variance isMath(9)where Math when x is the n-vector of x for the n animals. If we were to use S-approximation for A the variance would beMath(10)Substituting Math as before, Math, which implies Math. Hence, in all cases the estimated variance of an additive genetic component is simply twice that of the corresponding family component.


Of the 102 phenotypes available for analysis (Solberg et al. 2006), 88 could be accommodated in our linear mixed modeling framework (see methods). We obtained data for 15 covariates (Table 4): age, apparatus (for those tests where multiple units were used), cage (a variable indicating animals that were housed in the same cage), cage density (the number of animals in a cage), experimenter, family (defined as the offspring of two parents), sex, hour, litter (a number representing the birth order of each litter for a given sire and dam), month, season, study day, test order, weight, and year. An average of 10.3 covariates were recorded per phenotype (since not all phenotype–covariate combinations were available), leading to an average of 69.4 phenotypes measured per covariate. In total, we performed 1804 statistical tests. The significance of results is reported as the negative base 10 logarithm of the P-value (log P) of the relevant test. We took account of multiple testing by using the Dunn–Šidák correction, which for α = 5% comparisonwise error rate yielded a significance threshold of log P = 4.55.

We assessed initially the importance of three physiological covariates (sex, weight, and age). We fitted the covariates sequentially in the order sex, then weight, then age, so that, for instance, our reported significance for weight refers to how much it improved the fit of a model that already included sex. We included family in all models to ensure tested covariates were significant over and above genetic effects. Family, modeled as a random effect, is highly correlated with heritability (correlation of 0.89) and so acts a surrogate for the effect of additive genetic variation (see methods). We report estimates of heritability for all phenotypes in Table 5.

The effects of sex, weight, and age were relatively small (Figure 1b, “main effect” rows): sex effects explained >10% of the variance for 14 phenotypes; in more than half of the cases the effect was <5%; weight accounted for >10% of the variance for three phenotypes; all age effects were <2% (see appendix).

Figure 1.—

Main effects and interactions. (a) The log P (i.e., the −log10 of the P-value) for main and interaction effects of 12 covariates. Each box shows significance scores for one covariate on all applicable phenotypes. The shaded bar marks the corrected 5% threshold for significance (log P = 4.55). For example, Apparatus has significant main effects for a few phenotypes but significant interactions for none, whereas Hour has few significant main effects but has significant interaction effect for a number of phenotypes. (b) The estimated percentage of variance significant effects contributed to the phenotype. Note that log P's are capped at 20 for display purposes and that results for test order, which had no significant effects, are not shown.

View this table:

Significant main effects and interactions of 13 covariates in 88 phenotypes

We estimated the significances and effects of the remaining covariates by adding each to a model that already included family, sex, and weight. Significant main effects of covariates were more common in physiological than behavioral phenotypes (33% of the time vs. 13%; see Table 6). Overall, 21 of the 258 significant effects explained >10% of the variance; the five cases of when a covariate explained >25% of the variance involved sex. Table 6 provides a summary for each covariate, splitting results by category of phenotype. Figure 1 plots log P-values and the percentage of phenotypic variance explained by significant covariates. Figure 2 summarizes the variance explained by significant covariates for the 16 subcategories of phenotype.

View this table:

Summary of main effects

Figure 2.—

Main and interaction effects of covariates on 88 phenotypes from 16 experimental tests. The y-axis gives the percentage variance explained by significant covariates; the x-axis lists the test performed with the number of phenotypes measured from that test in parentheses. Physiological tests are listed first and behavioral tests second. Boxes show the median (central line) and interquartile range (IQR; box perimeter), whiskers indicate the furthest data point <1.58 IQRs from the median, and circles show outliers.

We then extended our model to test for gene-by-covariate interactions, taking the main-effects models reported above and then assessing how much adding interaction terms improved the fit. We found 389 significant interaction effects. Figure 3 illustrates the interaction between sex and family on the percentage of B-cells (%B220+) among white blood cells. It shows that the effect of sex is often marked within families but its direction can vary between families. Similarly, Figure 4 illustrates the interaction between family and season on mean adrenal weight measured at 10 weeks. It shows seasonal means (spring in green, summer in red, autumn in brown, and winter in blue) for 28 families. In 11 families, adrenal glands are heaviest when harvested in winter, whereas in 9 families they are heaviest in summer. The seasonal effects are strong within but inconsistent across families, reflecting the greater importance of interaction over main effects.

Figure 3.—

Interaction between sex and family for the immunological phenotype percentage of B-cells among lymphocytes in 2056 mice. For each of 69 families (x-axis) we plot means (solid circles) and standard errors (bars) of the phenotype value for males (blue) and females (pink). The y-axis gives the phenotype as the square root of the percentage of white blood cells presenting B220. The graph shows that sex can have a strong effect within families but that the direction of the effect varies between families (interaction log P = 10.7). For example, in families plotted on the left, males are enriched in the B-cell compared with females, whereas for families on the right this sex effect is reversed. The graph also illustrates the marginal effects on the trait of family (differing overall heights; heritability = 59.9%) and sex (females higher overall; main effect log P = 13.0).

Figure 4.—

Interaction between season and family for the physiological phenotype mean adrenal weight in 696 mice. For each of 28 families (x-axis) we plot the seasonal means (solid circles) and standard errors (bars) of the phenotype for animals phenotyped in winter (blue), spring (green), summer (red), and autumn (brown). The y-axis gives the phenotype as the logarithm to the base 10 of the mean weight in grams of adrenal glands at 10 weeks old. The graph shows that the effect of season is consistent within family but can vary between families. For example, for the rightmost family adrenal glands are lightest in animals tested in summer and heaviest in autumn. Yet the rank order of seasons varies considerably through the graph.

The distribution of the 389 significant interaction effects differed from that of the main effects (Figure 1 and appendix). Remarkably, half of the effects could explain >20% of the variance. In nine cases the interaction could explain >50% of the variance. The largest numbers of interactions were with month (65 significant effects), season (55), sex (53), litter (51), and cage density (40). There were only 13 significant interactions with experimenter.

Physiological phenotypes showed the largest number of interactions with covariates (56% of interactions tested were significant; Table 7). Largest effects were found on mean cellular hemoglobin concentration, serum sodium and serum chloride concentrations, and plethysmography measures. There were fewer interactions with behavioral phenotypes (5% of interactions tested were significant, amounting to 11 in total), although the effect sizes were much the same on average (mean of 18.1% for behavior compared with a mean of 18.6% for physiology; see Figure 2).

View this table:

Summary of interaction effects between covariates and family


We have carried out the first systematic analysis of a range of covariates across multiple phenotypes (see appendix). We have estimated the heritability of 88 phenotypes, assessed the impact of a number of environmental factors, and measured the size of gene-by-environment interactions. Our large data set provides the most robust assessments to date of these measures in both behavioral and physiological domains.

We found large interactions between gene and environment and report that the effects are not restricted to behavioral phenotypes (see appendix). We do not believe this is an artifact of our analysis. Our calculations of percentage variance for random interaction effects and for fixed main effects are only roughly comparable with each other (see methods) and the interaction effects are subject to a slight upward bias. However, that is not sufficient to account for the substantially higher effect of significant interactions (18.6%) compared with significant main effects (3.7%). Second, inhomogeneity of phenotype variance across families is also unlikely to account for our findings since in many cases the rank order of covariate effects differs between families (Ungerer et al. 2003) as illustrated in Figures 3 and 4.

We report the effects of covariates as the percentage of phenotypic variance they explain and in doing so provide one assessment of how environmental covariates influence a phenotype. But the true nature of this interaction is more complex. For example, the concentration of alanine transaminase is subject to gene-by-environment interactions of month, accounting for 48.49% of the phenotypic variance, of season, accounting for 45.51%, and of litter, accounting for 18.17%. Yet these effects combine, with further covariates, to produce 100%. How is this possible?

The correlational structure of our data complicates an assessment of the relative importance of different covariates and interactions. The observed phenotypic variance is the sum of the variances of the covariates minus twice the covariances between the covariates. This means that two covariates could have individual effects of 50% but a summed effect of 60% if they are positively correlated (or one of <50% if they are negatively correlated). An observed covariate effect, just like an observed QTL effect, therefore includes a portion of the effect of any element that correlates with it; an actual month effect will partly manifest as observed litter and season effects and vice versa. A more comprehensive analysis would build a complete picture of each phenotype in the context of a path diagram or structural equation model that enumerated all relationships, both raw effects and correlations, between actors (e.g., Lynch and Walsh 1998).

The importance of gene-by-environment interactions has been emphasized in the analysis of mouse behavior and largely ignored in studies of mouse physiology. In the light of this, we designed our phenotyping protocol to minimize the effects of covariates on behavioral measures. All such tests were automated, so that the experimenter's intervention was limited to placing animals in the apparatus. This may explain why some covariates, previously suspected to influence behavioral phenotypes, were found to make a small contribution to the variance: time of day (hour) was a nonsignificant (or hardly significant with negligible effect) contributor to all measures including those that utilize exploration as a measure of anxiety (elevated plus maze, which had observations from 9 different hours of the day, and open field, which had observations from 10), despite the fact that exploratory activity has been reported to vary throughout the day (Aschoff 1981). The order in which animals are tested is also considered to have an important effect on behavior (Harro 1997), but we found no evidence for this: its effect was nonsignificant on all phenotypes measured.

Physiological phenotypes were not so controlled. There are no automated ways of administering an intraperitoneal glucose tolerance test, for example, and we observed large experimenter effects on these tests. This raises the question as to whether some phenotypes are more susceptible to interaction effects than others. Differences in the assessment protocols cannot be the only factor that accounts for the smaller number of interactions in behavioral tests. There are a number of covariates common to all phenotypes whose effects we could not ameliorate: month, season, year, sex, and weight. All of these covariates impinge more on physiological than on behavioral phenotypes (Tables 6 and 7).

Importantly, we observed many significant and large gene-by-environment interactions in our analysis of physiological phenotypes. Biochemical measures showed strong (>10% effect) gene-by-environment interactions with month (in 14 of 16 biochemical phenotypes), sex (12), season (9), and litter (8). We saw a similar pattern of strong seasonal and sex effects for hematology, immunology, plethysmography (which also had a strong hour interaction), and the glucose tolerance test (which also had a strong experimenter interaction). This has profound implications for QTL studies.

QTL detection experiments suffer when covariates are not adequately accommodated in the experimental design and subsequent analysis. First, a QTL may owe some, or indeed all, of its significance to an environmental effect confounded with the allelic variant. When a phenotype is strongly affected by who performed the experiment, any nonfunctional variant that correlates with the experimenter will manifest as a significant, but spurious, effect. The random nature of recombination means that in any experimental cross a fully balanced design is impossible and so confounds of this type are ineluctable. While the impact of covariates can be minimized by regressing out their effects prior to mapping (e.g., Valdar et al. 2006), this is highly conservative, since in the converse scenario, where experimenter acts as a surrogate variable for an actual QTL effect, the QTL will be missed.

Second, an interaction between a QTL and an environmental covariate may conceal the effect of both, even when covariate and QTL are in the model. For instance, if mice with allele a fear experimenter John more than experimenter Alice, but mice with allele A fear Alice more than John and all four conditions occur in about equal proportion, then neither experimenter nor QTL will have an observed effect. To recover the genetic effect in this case it is necessary to model the interaction in the mapping procedure (e.g., Wang et al. 2006).

Our analyses are limited by the relatively small number of covariates that we collected. We have no information on temperature fluctuation and humidity levels [shown to be important for behavioral tests of nociception (Chesler et al. 2002a,b)], which might explain month and seasonal effects. We have no information on noise levels that are significantly increased during working hours (Milligan et al. 1993). The predominance of significant temporal covariates reflects the importance of many other unknown environmental factors whose effect is moderated through the animals' genotypes. Thus the dissection of complex phenotypes in the mouse will require far more sophisticated observation and analysis of these interactions than has hitherto been attempted.


W.V. gratefully acknowledges receipt of an Access to Research Infrastructures fellowship under Örjan Carlborg, Uppsala University, Sweden, and additionally thanks Mike Neale, Tom Price, and Peter Visscher for helpful discussions. This work was funded by grants from the Wellcome Trust and the European Union Framework 6 Programme, contract no. LHSG-CT-2003-503265.


  • Communicating editor: G. Gibson

  • Received April 26, 2006.
  • Accepted July 23, 2006.


View Abstract