## Abstract

Functional dependencies between genes are a defining characteristic of gene networks underlying quantitative traits. However, recent studies show that the proportion of the genetic variation that can be attributed to statistical epistasis varies from almost zero to very high. It is thus of fundamental as well as instrumental importance to better understand whether different functional dependency patterns among polymorphic genes give rise to distinct statistical interaction patterns or not. Here we address this issue by combining a quantitative genetic model approach with genotype–phenotype models capable of translating allelic variation and regulatory principles into phenotypic variation at the level of gene expression. We show that gene regulatory networks with and without feedback motifs can exhibit a wide range of possible statistical genetic architectures with regard to both type of effect explaining phenotypic variance and number of apparent loci underlying the observed phenotypic effect. Although all motifs are capable of harboring significant interactions, positive feedback gives rise to higher amounts and more types of statistical epistasis. The results also suggest that the inclusion of statistical interaction terms in genetic models will increase the chance to detect additional QTL as well as functional dependencies between genetic loci over a broad range of regulatory regimes. This article illustrates how statistical genetic methods can fruitfully be combined with nonlinear systems dynamics to elucidate biological issues beyond reach of each methodology in isolation.

MANY, if not most, biologists are prone to believe that genetic interactions are common in the genetic architecture of complex traits. It is, however, more debatable how important these interactions are in contributing to the expression of phenotypes in individuals and in determining population responses to selection, maintenance of genetic variation, and speciation processes. Studies of genetic interactions, or epistasis, are commonly based on hierarchal genetic models with additivity as the main effect and dominance and epistasis modeled, if at all, as single- and multilocus deviations from the main effects. Using these models, hybridization experiments have shown an important overall contribution of epistasis to the phenotypic differences among (Doebley *et al.* 1995) and within (Hard *et al.* 1992; Lair *et al.* 1997; Carroll *et al.* 2001, 2003) species. The same observations have been made in studies that dissect quantitative genetic variation into contributions from individual quantitative trait loci (QTL) using epistatic genetic models (Carlborg and Haley 2004). Phillips (1998) predicted that interaction between gene products that form molecular machines and signaling pathways would become increasingly important to genetic analysis and reinforce the concept of epistasis. His predictions are supported by the appearance of the first genomewide mapping studies of epistatic interactions underlying gene expression in yeast (Brem and Kruglyak 2005), by the detection of epistatic pairs of genes, and by interpretation of these observations in terms of regulatory pathways (Brem *et al.* 2005).

Recent studies seeking to estimate how epistatic effects of individual loci contribute to phenotypic variance show that the proportion of genetic variation that can be attributed to statistical epistasis varies greatly among studies, where a very high proportion of the genetic variance is due to epistasis in some studies and virtually none in others (Carlborg and Haley 2004). As the studies are based on similar analytical approaches this suggests that there are biological reasons for the observed differences, and it is of importance both for understanding and for exploiting genetic variance to settle whether different functional dependency patterns between polymorphic genes (regulatory architectures) give rise to distinct statistical interaction patterns or not (we define gene A to be functionally dependent on gene B if the rate of change of expression of gene A changes when the level of gene B changes). If they do, statistical interaction patterns may reveal insights about underlying biological mechanisms. If not, it means that allelic variation within a given regulatory architecture determines the statistical interaction pattern and that very little can be inferred about the underlying architecture from observed epistatic patterns alone.

Our work is part of an ongoing effort to understand population-level variation in terms of individual-level genotype–phenotype maps. Attempts to refine the concept of epistasis have been made [*e.g.*, “physiological epistasis” (Cheverud and Routman 1995) and “functional epistasis” (Hansen and Wagner 2001)] and studies have addressed the genetics of biological network models (Wagner 1994; Frank 1999; Omholt *et al.* 2000; You and Yin 2002; Peccoud *et al.* 2004; Cooper *et al.* 2005; Moore and Williams 2005; Segre *et al.* 2005; Welch *et al.* 2005; Azevedo *et al.* 2006; Omholt 2006). In this work we study the relationship between statistical epistasis and functional dependency by doing quantitative genetic analysis of synthetic data sets obtained from genotype–phenotype models where phenotypic variation at the level of gene expression arises from allelic variation in model parameters. Using three-locus motifs of gene regulatory networks we elucidate the effects of no and one-way functional dependency in four regulatory situations in a no-feedback setting and the effects of one-way and two-way functional dependency in four regulatory situations in a negative- as well as in a positive-feedback setting. Our approach, where mathematical models generating phenotypic variability based on how genes work and interact are embedded into a statistical genetics context, illustrates how statistical methodology can be combined with nonlinear systems dynamics to elucidate biological issues beyond reach of each of them in isolation.

## METHODS

#### Network motifs:

We made use of 12 gene regulatory models, each with a unique regulatory motif (Figure 1). Motifs 1–4 involve one-way functional dependency only (*i.e.*, regulatory actions), motifs 5–8 include two-way functional dependency in the form of negative feedback, and motifs 9–12 have two-way functional dependency in the form of positive feedback. It should be noted that although the motifs contain only three genes each, the models reflect a level of abstraction where the functional dependency does not necessarily involve direct biochemical interaction. That is, the models implicitly account for the possible presence of numerous nonpolymorphic additional genes in the networks. The models thus potentially capture a wide range of regulatory situations.

#### Model framework and equations:

For modeling the gene regulatory motifs, we use the sigmoid formalism (Mestl *et al.* 1995; Plahte *et al.* 1998) for diploid organisms (Omholt *et al.* 2000). A gene regulatory network is described by a set of ordinary differential equations (ODEs),(1)where the 2*n*-vector contains the expression levels of the two alleles for each of genes in the gene regulatory network, while the vectors , , , and contain allelic parameter values. Each allele, (the *i*th allele of gene *j*) has the parameters , the maximal production rate of the allele, and , the relative decay rate of the expression product. In addition, for each gene regulating the expression of allele , there is a threshold parameter and a steepness parameter used to describe the dose-response relationship between and the resulting production rate of . We assume for simplicity the allele products to be equally efficient as regulators and use just their sum () in the regulatory function.

We have used the Hill function (Hill 1910) in our simulations to generate a flexible dose-response relationship between regulator and production at the regulated gene,(2)where gives the amount of regulator needed to get 50% of maximal production rate while *p* determines the steepness of the response. The Hill equation describes Michaelis–Menten-like regulation for and more switchlike response as increases. If the regulatory effect is inhibitory, the regulatory function is used. Concerning our choice of the gene regulatory function, the Hill function is widely used in modeling of gene regulatory networks (Becskei *et al.* 2001; de Jong 2002; Rosenfeld *et al.* 2002). There is a large body of literature supporting the presence of sigmoidal gene regulation functions, and the relationship can be due to cooperativity (Veitia 2003), multiple transcription-factor binding sites, multiple phosphorylations (Mariani *et al.* 2004), and spatially constrained kinetics (Savageau 1995). Thermodynamic modeling of *cis*-regulatory architecture also yields sigmoidal relationships (Bintu *et al.* 2005), and a recent empirical study of the -phage P_{R} promotor in *Escherichia coli* identifies regulatory functions closely resembling the Hill function (Rosenfeld *et al.* 2005).

Table 1 contains diploid ODE models of the diagrams in Figure 1. In all the equations and , and we use the notation for the dose-response relationships. In those motifs where the regulatory functions involve double inputs, we made use of the logical functions(3)

#### Simulations:

We created 2000 homozygous parental lines for each regulatory model. Functional genetic variation between the lines was introduced by assigning allelic values for the regulatory parameters of each gene in the model (*i.e*., production rates, regulation thresholds, regulation steepness, and relative decay rates) through sampling from uniform distributions except for relative decay rates that were held fixed (Table 2). Then 1000 ideal F_{2} populations of 960 individuals each were constructed by randomly crossing pairs of the 2000 inbred lines. The populations were ideal in the sense that the three genes in the network were at exact intermediate allele frequencies and in perfect Hardy–Weinberg and linkage equilibrium. The genotypes at each gene were recorded for each F_{2} individual. The phenotype *P* for each individual was obtained by solving the differential equations describing the regulatory model to find the stable equilibrium level of gene 3 (*y*_{3}). To avoid artifacts arising from numerically solving the differential equations, equilibrium values <0.01 were set to 0, and F_{2} populations in which all phenotypes were 0 were discarded (this happened for 3 of 12,000 F_{2} populations). For the remaining populations, the equilibrium levels were standardized to a unitary variance. To explore the statistical significance of the epistatic variance generated by the 12 interacting gene network motifs, we also generated F_{2} populations (1000 populations for each motif and heritability) with two different (broad sense) heritabilities (0.2 and 0.05), This was done by adding independent normally distributed noise to the standardized equilibrium levels of the individuals in the population with phenotypes generated without noise.

Since we scaled the noise to predefined heritabilities the somewhat arbitrary absolute ranges from which parameters are sampled become less important than the relative variation between parameters. We paid particular attention to the latter type to ensure that the full spectrum of different regulatory situations could be reached for every regulatory function. The chosen values of α and γ gave steady-state levels in the range (20, 40) for a constitutively expressed gene and (0, 40) for a regulated gene. As these ranges overlap with the range (10, 30) for θ, it allowed the regulatory function to attain values close to the limits 0 and 1. This ensured a range of behaviors from all regulated genes being switched off to all being switched on for each regulatory function. For simplicity we fixed the decay rates, but since the production rates are under genetic control we should not lose any generality by this.

#### Genetic model and estimation of parameters and variance components:

Following Zeng *et al.* (2005), and extending to three loci, the full genetic model(4)was used, and using the metric we letwhere {*AA*, *Aa*, *aa*} gives the genotype at gene *i*. On the basis of these equations and the simulated genotypes, we constructed a design matrix, *X*, containing vectors of regression variables for marginal effects and elementwise products of these variables for two-way interaction effects and three-way interaction effects such that(5)

The dimensions of *X* are *n* × 27, where *n* is the number of simulated individuals.

In our ideal populations there is no covariance between the columns of *X*. This allows us to estimate parameters in both the full genetic model (4) and any reduced model by regressing the simulated phenotypes on *X* and then just extracting the results from the columns associated with the particular model of interest.

#### Significance testing:

We tested the significance of terms in various genetic models by the general linear hypothesis test (Montgomery *et al.* 2001), with the test statistic(6)where the full model (FM) has parameters, is the number of parameters removed in the reduced model (RM), and is the number of individuals in the F_{2} population. Under the null hypothesis that none of the removed parameters are different from zero, the test statistic has the distribution .

## RESULTS

#### Variance component signatures:

To assess to which degree different functional dependency patterns between genes generate distinguishable statistical signatures, *e.g.*, the relative amount of variance explained by marginal, two-way interaction, and three-way interaction effects, these effects were estimated for the 12 motifs in the F_{2} populations using phenotypes without noise. For all motifs, the expression level phenotype varied over the 1000 simulated populations from fully monogenic on one extreme to displaying large levels of statistical epistasis (maximum ranging from 29 to 88%; Figure 2). However, the positive-feedback motifs (motifs 9–12) generate larger amounts of statistical epistasis than the no- or negative-feedback cases (motifs 1–4 and 5–8). All positive-feedback motifs produce some data sets with >80% epistatic variance (maximum range from 81 to 88%), a level not reached by any of the other motifs (maximum range from 29 to 50%). Moreover, on average, motifs with an upstream inhibitor of the positive-feedback loop (motifs 9 and 10) generate more epistatic variance than motifs with upstream activation of the positive-feedback loop (motifs 11 and 12). The distribution of the marginal variance for motifs 1–8 is rather similar with the only exception being motif 4 that produces data sets with particularly low levels of epistatic variance.

Figure 3 depicts in more detail the marginal-, two-way, and three-way epistatic variance distributions for a typical representative of the low epistatic variance group (motif 1, Figure 3A) and a typical representative of the high epistatic variance group (motif 10, Figure 3B). Among the 300 F_{2} populations with the lowest level of epistatic variance, both motifs display an almost entirely marginal statistical genetic signature of the expression level phenotype with very low levels of epistatic variance (<2% for motif 1, <1% for motif 10). The marginal effects for motif 1 are mainly from one gene (averages of 97, 86, and 83% of the variance for the three 100-population bins) and even more so for motif 10 (averages of 100, 100, and 98%). In the 200 populations with the highest epistatic variance, the average proportion of the variance explained by the single largest gene alone decreases to ∼55% for motif 1 and 40% for motif 10, and the levels of epistatic variance are considerably higher for motif 10 (25–88% of the variance) than for motif 1 (12–50%). It is also notable that a substantial portion of the explained genetic variance in motif 10 is due to three-way interactions (up to nearly 45% for some populations).

The general picture is that even though all motifs are capable of generating a flexible range of variance components, network motifs with positive feedback can generate higher amounts of two-way as well as three-way epistatic variance than motifs containing no feedback or negative feedback.

#### Statistical significance of two-way and three-way epistatic components:

The statistical significance of the observed epistasis was explored using the simulated mapping populations with broad sense heritabilities (*H*^{2}) of 0.2 and 0.05. First, reduced models containing only marginal parameters were compared to full models containing all two-way interaction parameters. This was done for all three genes at once (19 *vs.* 7 parameters) and all three pairwise combinations of the three genes (9 *vs.* 5 parameters). Results for *H*^{2} = 0.2 are summarized in Figure 4 (*H*^{2} = 0.05 exhibited a similar pattern among the motifs and the results are not shown). We find that significant interactions occur much more often than expected by chance (5% significance level) for all motifs when using the full model including all three genes and their two-way interactions (27–62% of all populations). This is also true for the pairwise combinations of genes 1 and 3 and genes 2 and 3 (17–45% and 18–57% of populations, respectively). The percentage of significant interactions between gene 1 and 2 ranges from 3 to 26%, but for motifs 3 (3%) and 4 (6%) there are no more significant interactions than expected by chance (type I errors). As gene 1 and gene 2 in these two motifs are the only pairs in the whole study where either gene is functionally independent of the other, the simulated data sets show correspondence between the type of functional relationship between genes and the significance of the statistically detectable interactions. However, although this provides us with a conceptual link between functional dependency and statistical epistasis, it should be noted that our analysis does not allow us to refine this link much further.

The significance of three-way interactions is generally lower than that for the two-way interactions (Table 3), but in the case of positive-feedback motifs, the level clearly exceeds what would be expected by chance.

Here we observe that the capacity to generate statistically significant two-way epistatic interactions is a generic feature of regulatory networks and that the capacity to generate statistically significant three-way interactions at the gene expression level is an additional generic feature of positive-feedback motifs.

#### Statistical significance of two-way interaction parameters:

To get a better view of how the various two-way interaction parameters (additive-by-additive, additive-by-dominance, dominance-by-additive, and dominance-by-dominance interactions) contribute to statistically significant two-way interactions in the various motifs, the significance of individual two-way interaction parameters was tested for all three pairwise combinations of the three genes (6 *vs.* 5 parameters) in the populations with *H*^{2} = 0.2. The results are summarized in Figure 5, and we see that the positive-feedback motifs (especially motifs 9 and 10) frequently generate significance for all four types of interactions, while this is much less pronounced for the other motifs. Additive-by-additive interaction is the most frequently significant type of interaction for all pairs in all motifs. It is most frequent in pairs involving gene 3 and is in some cases significant in nearly half of the populations. Although significant additive-by-dominance and dominance-by-additive interactions are in general rather infrequent, they do appear more often than expected by chance, especially for motifs 9 and 10 where *da*_{23} and *ad*_{23} are significant in 20–37% of the populations. Except for motifs 3, 4, and 6, single *ad* or *da* parameters are significant in >10% of the populations. Significant dominance-by-dominance interactions occur more often than expected by chance only for positive-feedback motifs.

#### Number of genes detected:

The fact that the capacity to generate statistically significant two-way and three-way interactions seems to be a generic feature of regulatory nets suggests that the search for such interactions may quite generally reveal more QTL underlying a complex trait than by solely searching for QTL independently of each other. To test this we did a three-step search for significant QTL effects in the populations with *H*^{2} = 0.2. First the power to detect QTL independently was evaluated by testing for the significance of the marginal additive and dominance effects for each gene (3 *vs.* 1 parameters). Then the additional power to detect functionally dependent QTL using genetic models including epistasis was explored by testing for pairwise interaction effects in addition to the marginal effects (9 *vs.* 5 parameters) and finally we looked for trigenic interaction effects in addition to marginal and pairwise interaction effects (27 *vs.* 18 parameters). Figure 6 summarizes the distribution of the number of significant QTL after each step. We see that when interactions are taken into account, there is for all motifs a considerable increase in the number of populations where three QTL are detected. Compared to testing for marginal effects only, an additional search for two-way interactions increased the number of cases where three QTL were detected by 22–36% more populations for motifs with no feedback or negative feedback and by 47–87% more populations for positive-feedback motifs. A further search for trigenic interactions increased the number of populations where three QTL were detected by 28–53% in the no-feedback and negative-feedback motifs and by 74–133% in the positive-feedback motifs (relative to testing for marginal effects only).

## DISCUSSION

#### Possible shortcomings of our approach:

In our network models we have focused on *cis*-regulatory mutations changing the dose-response relationship and the maximal production rate. As mutations may influence the gene expression patterns from given regulatory motifs in ways that are not accounted for here, we do not pretend to have generated an exhaustive list of epistatic expression patterns. However, the behavior that can be generated from our models is quite extensive and we expect it to cover the majority of situations normally encountered. In addition, regulatory variation is indeed identified as an important factor in explaining complex traits (Yan *et al.* 2002). There are still few reports characterizing the mutations underlying QTL effects in terms of biological parameters. Two examples are described in a genetical genomics study in mouse (Schadt *et al.* 2003), where allelic variation in transcript decay rates at the C5 gene and variation in the number of copies of the Alad gene, leading to higher production rates of transcript, were found to underlie two *cis*-acting eQTL. In addition, a genomewide study of regulatory variation underlying self-linkages in yeast (Ronald *et al.* 2005) identified a high proportion of *cis-*acting (promoters, transcription factor-binding sites, mRNA stability) variation.

The set of regulatory motifs in this study is not a complete collection of three-gene motifs, but we have included well-documented elements such as feed-forward loops and double input (Lee *et al.* 2002; Shen-Orr *et al.* 2002). We also have a strong focus on feedback that is ubiquitous in biological systems (Thomas and D'Ari 1990; Cinquin and Demongeot 2002) and contributes vital systemic features, where, *e.g.*, negative feedback is associated with homeostasis and positive feedback is a necessary prerequisite for multistationarity (Plahte *et al.* 1995). Several regulatory motifs including feedback have been shown to be involved in the regulation of gene expression (Lee *et al.* 2002; Davidson *et al.* 2003; Wray *et al.* 2003) and it is likely that it will be an important component in the regulation of other complex traits as well.

In our simulations we include environmental variation by adding random noise to the equilibrium values of the regulatory systems. This is the standard way of doing simulations of quantitative genetic data and gives no covariance between genotype and environment. In many transcriptional regulatory systems external factors play an active role in regulating gene expression, for instance, in responses to stress conditions and utilization of nutrients. The approach used here could be expanded by including environmental variables as inputs to the regulatory functions. This would probably lead to significant genotype-by-environment interactions in much the same manner as we find genotype-by-genotype interactions in this study.

#### Testable predictions:

Our studies confirm that traditional quantitative genetic models are, at least to some extent, able to detect functional dependencies within gene regulatory structures. This might seem like an obvious conclusion, but in our opinion it is not. Most evaluations of epistatic QTL-mapping methods have not aimed at exploring the ability of the method to detect various types of biological gene (actions and) interactions, but rather at demonstrating and testing the properties of these methods for mapping of QTL whose inheritance conforms to standard quantitative genetics nomenclature (Sen and Churchill 2001; Carlborg and Andersson 2002; Kao and Zeng 2002). Such simulations are thus useful for comparing mapping methods, but do not have any strong implications on the causal functional dependencies underlying the genetic interaction effects. In contrast to this, our simulations are based on the systemic features of a gene rather than its statistical effects. The genetic variance that can be detected by the statistical genetics model in our simulations thus emerges from polymorphisms describing allelic differences in properties affecting the expression of a gene in the context of a network of other genes. Our approach provides several new testable predictions concerning the ability of QTL-mapping methods to detect functional polymorphisms and dependencies in a genetical genomics context.

First, the amount of statistical epistasis generated by a biological network depends on system-level features such as the existence and sign of feedback. Regulatory structures with positive feedback are capable of generating more statistical epistasis than those with negative feedback, and these interactions are thus easier to detect in a QTL-mapping study.

Second, the amount of statistical epistasis that can be detected for a particular regulatory structure will vary widely depending on which of the regulatory parameters are affected by the genetic polymorphism. Figures 3 and 4 clearly show how the same regulatory structure can generate very different amounts of statistical epistasis: although polymorphisms are segregating at all loci, a three-gene network can statistically appear to be everything from a single major gene to a three-gene network with two- and three-way interactions. This also implies that in mapping studies where there are low levels of statistical epistasis such as in Flint *et al.* (2004), there can still be functional relationships and network structures causally connecting the QTL.

Third, there is no clear pattern discerning one-way and two-way functional dependencies when it comes to the amount of statistical interaction. An example of this is that although all motifs with positive feedback show high amounts of epistatic variance (Figure 2), the gene pair most frequently showing significant epistasis differs between the motifs even though all motifs have the same underlying structure (Figure 4).

Fourth, the results strongly suggest that the inclusion of statistical interaction terms in the genetic model will increase the chance to detect additional QTL as well as functional dependencies between genetic loci. It thus seems worthwhile to put more effort into development of methods for mapping functional dependencies and to interpret statistical epistatic estimates in functional terms. Our simulations identify additive-by-additive as the most commonly produced interaction, and it is therefore a strong candidate for inclusion in a reduced-interaction model. On the other hand, since the other types of interactions are less frequent, such patterns are of particular interest when it comes to biological interpretation of mapping results.

Although we in this article have limited ourselves to studying statistical epistasis patterns in a genetical genomics context, it should be noted that in addition to accounting for the possible presence of numerous other genes in the networks studied, polymorphisms in a given gene in our models can in principle influence the gene expression of another gene in the network through very complex routes involving higher-order phenotypic levels. In general, the relationship between genetic polymorphisms, regulatory dynamics, and statistical variance components can be monitored and analyzed at any phenotypic level, and there is no limit to how many systemic levels the genotype-to-phenotype models can include or how sophisticated these models can be. Fortunately, systems biology methodologies enabling us to make empirically well-founded mathematical genotype–phenotype models of more complex multilevel phenotypes are emerging very fast. This will open the way for a systematic investigation of the systemic conditions under which different types of functional dependency between polymorphic genes make detectable contributions to the genetic variance components of complex traits.

## Acknowledgments

This study was supported by the National Programme for Research in Functional Genomics in Norway (FUGE) in the Research Council of Norway (grant no. NFR153302). Ö.C. is thankful to the Knut and Alice Wallenberg foundation for financial support. Visits by A.B.G. to the Linneaus Centre for Bioinformatics were supported by the Access to Research Infrastructures (ARI) program (project no. HPRI-CT-2001-00153).

## Footnotes

Communicating editor: S. Muse

- Received April 3, 2006.
- Accepted September 18, 2006.

- Copyright © 2007 by the Genetics Society of America