Genetics, Vol. 176, 1151-1167, June 2007, Copyright © 2007
doi:10.1534/genetics.106.067348
Linnaeus Centre for Bioinformatics, Uppsala University, SE-75124 Uppsala, Sweden
1 Corresponding author: Linnaeus Centre for Bioinformatics, Uppsala University, Bio Medical Centre Box 598, SE-75124 Uppsala, Sweden.
E-mail: jose.alvarez-castro{at}lcb.uu.se
| ABSTRACT |
|---|
The term statistical epistasis refers to the use of statistical tools to analyze gene interactions. FISHER (1918) provided the basis of the study of gene effects of a trait using parameters that represent the average effects of allele substitutions over the population and lead to a decomposition of the genetic variance. COCKERHAM (1954) and KEMPTHORNE (1954) complemented this work with a subdivision of the epistatic variance into separate components. FISHER (1958) perceived epistasis as a nuisance effect whose evolutionary consequences would thus be equivalent to those of environmental variation. Although such an approach could be suitable for studying phenotypic change in very large random-mating populations, it might not be reasonable otherwise. In fact, the theory of speciation by hybrid incompatibilities (DOBZHANSKY 1936; MULLER 1942) and the shifting balance theory (WRIGHT 1931, 1977) are two major theories that exemplify the crucial role of epistasis as a driving force in evolution. The evolutionary consequences of epistasis in the context of these theories, and in general in speciation and in adaptation in subdivided populations, have been studied by inspecting the components of the genetic variance (GOODNIGHT 1988, 1995, 2000; WADE and GOODNIGHT 1998; BARTON and TURELLI 2004; TURELLI and BARTON 2006).
CHEVERUD and ROUTMAN (1995) and CHEVERUD (2000) analyze and discuss the efficiency of statistical epistasis for studying the evolution of complex traits. They underline the difference between genotypic and genetic values and suggest studying epistasis by focusing on genotypic values, as they represent natural effects of allele substitutions regardless of the allele frequencies in the population under study. Their view is in accordance with the first definition of the term epistasis by BATESON (1909), and they refer to this as physiological epistasis because the aim is to capture the interactions of the genes at the level of the organism rather than at a population level (see PHILLIPS 1998 for a comprehensive, historical dissection of this duality). HANSEN and WAGNER (2001b) further inspected the relationship between physiological and statistical epistasis. They prefer to use the term functional epistasis instead of physiological epistasis, as it reflects the functional properties of the gene interactions in determining the expression of a trait. Their multilinear model incorporates this in the form of a simplified genotype-phenotype map based on genetic values that capture the main role of gene interactions in evolution. The loss of generality of the multilinear model is rewarded by analytical tractability. A key concept in HANSEN and WAGNER's (2001b) development is their change-of-reference tool, which allows the description of epistatic interactions as allele substitutions made on any reference genotype. In particular, this allows inspection of evolutionary properties of a population by means of describing the (multilinear) epistasis parameters using the mean of the population as a reference point (HANSEN and WAGNER 2001a; HERMISSON et al. 2003; CARTER et al. 2005; HANSEN et al. 2006). BARTON and TURELLI (2004) have developed a model to analyze the consequences of epistasis in the presence of genetic drift.
Their theoretical framework complements that of HANSEN and WAGNER (2001b) and implements a new notation with the purpose of providing more transparent results than the previous approaches. Functional (or physiological) epistasis has also been referred to as biological epistasis, and has even been split into genetical epistasis and biological epistasis, when discussing how to integrate systems biology and quantitative trait loci (QTL) analysis (MOORE 2005; MOORE and WILLIAMS 2005).
In the context of QTL analysis, YANG (2004) and Zeng and collaborators (KAO and ZENG 2002; ZENG et al. 2005) have reviewed and analyzed several statistical models used for obtaining estimates of epistasis. For two major reasons, they stress the use of orthogonal statistical models. First, the measurement of genetic effects of reduced models is consistent in orthogonal models. This enables a straightforward comparison of nested models for performing model selection. Second, each genetic effect in an orthogonal model can be independently estimated and plays a role in the computation of its component of variance alone. ZENG et al. (2005) have developed the G2A model, a multilocus two-allele model that is orthogonal in populations under strict Hardy-Weinberg and linkage equilibrium, regardless of the frequencies of the alleles at each locus. WANG and ZENG (2006) have extended this model to a multiallele framework with linkage disequilibrium, particularly focusing on the decomposition of the genetic variance. YANG (2004) has built an explicit two-locus two-allele model that is generally orthogonal regarding the frequencies of the genotypes in the populations and has implemented it with a tool for measuring the bias in the estimates of genetic effects caused by linkage disequilibrium.
Here we establish a formal link between the models of statistical and functional epistasis through a unified, formal framework: the natural and orthogonal interactions (NOIA) model. We provide a mathematical description of genetic systems that leads to a conceptual interpretation of the relationship between statistical and functional epistasis and a set of explicit expressions to translate between statistical and functional estimates and between genetic effects in different populations. The resulting model incorporates general statistical and functional formulations of genotypic values on genetic effects that improve both the existing statistical and the functional models of gene interactions. We also provide a graphical interpretation of the functional formulation of NOIA, similar to that of the statistical models, as linear weighted regressions of the genotypic values on the gene content (FISHER 1918). The slope of the functional regression is constrained to the one of an unweighted regression, which provides a characterization of when the functional and the statistical formulations are equivalent.
| THE NOIA MODEL |
|---|
We begin by using a single genotype, say G11, as a reference point from which to measure the genetic effects, resulting in the following formulation of the model, G = S · E:

$$\begin{pmatrix} G_{11} \\ G_{12} \\ G_{22} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 2 & 0 \end{pmatrix} \cdot \begin{pmatrix} R \\ a \\ d \end{pmatrix} \tag{1}$$

The first column of S illustrates that the phenotypes are measured as deviations from the reference point, here R = G11. The second column illustrates that one additive effect is added to R for each A2 allele and the third column that the dominance effect is added to the heterozygote. The genetic effects are thus effects of allelic substitutions on the reference genotype A1A1. The extension of this NOIA functional formulation to several loci is obtained as the Kronecker product of the S matrices of the single loci (APPENDIX A). For two loci, A and B, with genetic-effect design matrices SA and SB, respectively, this reads
$$G_{AB} = S_{AB} \cdot E_{AB} = (S_B \otimes S_A) \cdot E_{AB} \tag{2}$$

where $\otimes$ denotes the Kronecker product. By using the properties of the Kronecker product we get $S_{AB}^{-1} = S_B^{-1} \otimes S_A^{-1}$ and, hence, the genetic effects can be obtained by solving the system

$$E_{AB} = (S_B^{-1} \otimes S_A^{-1}) \cdot G_{AB} \tag{3}$$

If $S_A = S_B = S$, with S as in (1), the formulations (2) and (3) describe the effects of allelic substitutions on the reference genotype A1A1B1B1 (or simply "1111"), as both loci have G11 as their respective reference points. This is convenient for constructing the model, but insufficient for a functional genotype-phenotype map, which should be able to describe the genetic effects as effects of substitutions from any reference point. Therefore, we implement the model with a change-of-reference tool.
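As a minimal numerical sketch of the reference-genotype formulation, the following NumPy snippet solves G = S · E for one locus with G11 as the reference and then extends the system to two loci with a Kronecker product. The genotypic values are hypothetical, and the ordering of the two-locus vectors (locus A varying fastest, i.e., `np.kron(S_B, S_A)`) is an assumption of this sketch.

```python
import numpy as np

# One-locus design matrix with G11 as reference.
# Rows: G11, G12, G22; columns: R, a, d.
S = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 1.0],
              [1.0, 2.0, 0.0]])

# Hypothetical genotypic values (illustration only).
G = np.array([10.0, 13.0, 14.0])
R, a, d = np.linalg.solve(S, G)  # E = S^{-1} . G

# Two loci with the same reference (G11) at each locus.
# Assumed ordering: locus A varies fastest within the genotype vector.
S_A = S_B = S
S_AB = np.kron(S_B, S_A)
G_AB = np.array([10.0, 13.0, 14.0,   # B1B1 with A1A1, A1A2, A2A2
                 11.0, 15.0, 15.0,   # B1B2
                 12.0, 14.0, 18.0])  # B2B2

# The inverse of a Kronecker product is the Kronecker product of the inverses.
S_AB_inv = np.kron(np.linalg.inv(S_B), np.linalg.inv(S_A))
E_AB = S_AB_inv @ G_AB

print(R, a, d)
print(np.round(E_AB, 3))
```

The first entry of `E_AB` is the genotypic value of the reference genotype 1111, and multiplying `S_AB` by `E_AB` recovers all nine genotypic values.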
Modeling genetic effects as allele substitutions on any genotype:
Here we provide a simple way to compute the genetic-effect design matrix for using any individual of the population as a reference point of the genetic system. This enables us to describe all genotypic values in the genetic system as sets of allele substitutions on any particular (reference) individual in the population and also to use the mean of the population under study as the reference point.
The general expression for the one-locus functional genetic-effect design matrix, SF, is

$$S_F = \begin{pmatrix} 1 & -p_{12} - 2p_{22} & -p_{12} \\ 1 & 1 - p_{12} - 2p_{22} & 1 - p_{12} \\ 1 & 2 - p_{12} - 2p_{22} & -p_{12} \end{pmatrix} \tag{4}$$

where p11, p12, and p22 (with p11 + p12 + p22 = 1) are the weights of the genotypes A1A1, A1A2, and A2A2 that define the reference point, for instance the genotype frequencies of the population whose mean is used as reference. Its inverse is

$$S_F^{-1} = \begin{pmatrix} p_{11} & p_{12} & p_{22} \\ -\tfrac{1}{2} & 0 & \tfrac{1}{2} \\ -\tfrac{1}{2} & 1 & -\tfrac{1}{2} \end{pmatrix} \tag{5}$$

This expression is very useful to inspect some particularities of the one-locus and multilocus NOIA functional formulations. By isolating E in (1), the general expression of the genetic effects of the one-locus system is $E = S^{-1} \cdot G$. From this expression and (5) it becomes clear that the reference point is in fact R = p11G11 + p12G12 + p22G22, and that the genetic effects are always defined in the same way, regardless of the reference point used, as $a = \tfrac{1}{2}(G_{22} - G_{11})$ and $d = G_{12} - \tfrac{1}{2}(G_{11} + G_{22})$. This is the same definition of genetic effects as in, for instance, Cockerham's F2 model (ZENG et al. 2005). The general two-locus functional formulation of the NOIA model can be obtained by inserting two single-locus genetic-effect design matrices (4) in expression (2). In this expression (not shown), the frequencies at each locus affect the single-locus effects at the other locus. This is in accordance with the definition of epistasis: the effects of the allele substitutions at one gene depend on the genetic background. The (pairwise) epistatic effects in the two-locus case, on the other hand, are independent of the frequencies. This logic, only the highest-order effects being independent of the frequencies, extends to higher-order terms of epistasis when more loci are involved.
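The following sketch builds a one-locus functional design matrix that is consistent with the definitions stated in the text (R = p11G11 + p12G12 + p22G22, a as half the difference between the homozygotes, d as the deviation of the heterozygote from their midpoint) and checks its inverse numerically. The function name `functional_design`, the frequencies, and the genotypic values are assumptions made for illustration.

```python
import numpy as np

def functional_design(p11, p12, p22):
    """One-locus functional design matrix for R = p11*G11 + p12*G12 + p22*G22.

    Rows: G11, G12, G22; columns: R, a, d.
    """
    n = p12 + 2.0 * p22  # weighted mean number of A2 alleles
    return np.array([[1.0, 0.0 - n, 0.0 - p12],
                     [1.0, 1.0 - n, 1.0 - p12],
                     [1.0, 2.0 - n, 0.0 - p12]])

# Reference at the mean of a population with p11 = 0.09, p12 = 0.42, p22 = 0.49.
S_F = functional_design(0.09, 0.42, 0.49)
S_F_inv = np.linalg.inv(S_F)

# Hypothetical genotypic values (illustration only).
G = np.array([10.0, 13.0, 14.0])
R, a, d = S_F_inv @ G

# Putting all the weight on G11 recovers the single-genotype reference.
S_ref11 = functional_design(1.0, 0.0, 0.0)

print(np.round(S_F_inv, 3))
print(R, a, d)
```

In this sketch the first row of the inverse holds the weights (so R is the weighted mean), while the second and third rows do not depend on the frequencies, which is why a and d keep the same definition for any reference point.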
Translating genetic effects from one to another reference genotype:
Expressions (4) and (5) enable us to change the reference point from which to describe the genetic effects. Given a description of the genetic system from reference point R1, $G = S_1 \cdot E_1$, and a description of the same genetic system from a different reference point R2, $G = S_2 \cdot E_2$, it is straightforward to get to the expression

$$E_2 = S_2^{-1} \cdot S_1 \cdot E_1 \tag{6}$$

This expression is useful to change the reference of the genetic effects, i.e., to translate the genetic effects associated with a reference point to the genetic effects associated with a different reference point.
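A change of reference can be sketched numerically for two loci as follows. The example moves the reference from genotype 1111 to the genotype that is A2A2 at locus A and B1B1 at locus B, using hypothetical genetic effects with additive-by-additive epistasis only. The helper `functional_design`, the effect values, and the ordering convention `np.kron(S_B, S_A)` are assumptions of this sketch.

```python
import numpy as np

def functional_design(p11, p12, p22):
    """One-locus functional design matrix (rows G11, G12, G22; columns R, a, d)."""
    n = p12 + 2.0 * p22
    return np.array([[1.0, 0.0 - n, 0.0 - p12],
                     [1.0, 1.0 - n, 1.0 - p12],
                     [1.0, 2.0 - n, 0.0 - p12]])

S_11 = functional_design(1.0, 0.0, 0.0)  # single-locus reference G11
S_22 = functional_design(0.0, 0.0, 1.0)  # single-locus reference G22

# Assumed ordering kron(S_B, S_A): locus A varies fastest.
S_1 = np.kron(S_11, S_11)  # reference A1A1 B1B1
S_2 = np.kron(S_11, S_22)  # reference A2A2 B1B1

# Hypothetical effects from reference 1, ordered as
# (R, aA, dA, aB, aA x aB, dA x aB, dB, aA x dB, dA x dB).
E_1 = np.array([10.0, 2.0, 0.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0])

# Both descriptions give the same genotypic values: S_2 . E_2 = S_1 . E_1.
E_2 = np.linalg.solve(S_2, S_1 @ E_1)

print(np.round(E_2, 3))
```

With these values the additive effect at locus B changes from 1 to 2 because it is now measured on an A2A2 background, whereas the additive-by-additive effect stays at 0.5.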
When are the genetic effects of allele substitutions orthogonal?
The NOIA functional formulation is orthogonal for several populations when the mean of the population is used as the reference point of the model. These populations fulfill

$$p_{12}\,(p_{11} - p_{22}) = 0 \tag{7}$$
This expression is derived in APPENDIX C and its graphical interpretation is in the next section. For the populations fulfilling (7), the NOIA functional formulation is an orthogonal statistical formulation that can therefore be used to properly estimate genetic effects in QTL studies as justified by YANG (2004) and Zeng and collaborators (KAO and ZENG 2002; ZENG et al. 2005).
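Orthogonality of the functional formulation can be checked numerically by computing the frequency-weighted cross-products of the columns of the design matrix. The sketch below assumes the one-locus functional matrix implied by R = p11G11 + p12G12 + p22G22 and the fixed definitions of a and d; under that assumption the cross-product between the additive and dominance columns works out to p12(p11 − p22). The function names are hypothetical.

```python
import numpy as np

def functional_design(p11, p12, p22):
    """One-locus functional design matrix (rows G11, G12, G22; columns R, a, d)."""
    n = p12 + 2.0 * p22
    return np.array([[1.0, 0.0 - n, 0.0 - p12],
                     [1.0, 1.0 - n, 1.0 - p12],
                     [1.0, 2.0 - n, 0.0 - p12]])

def weighted_cross_products(p11, p12, p22):
    """Return S^T diag(p) S; the model is orthogonal when this is diagonal."""
    p = np.array([p11, p12, p22])
    S = functional_design(p11, p12, p22)
    return S.T @ np.diag(p) @ S

f2 = weighted_cross_products(0.25, 0.50, 0.25)   # ideal F2 frequencies
hw3 = weighted_cross_products(0.09, 0.42, 0.49)  # Hardy-Weinberg, unequal homozygotes

# Entry [1, 2] is the additive-by-dominance cross-product.
print(f2[1, 2], hw3[1, 2])
```

The additive-by-dominance entry vanishes for the F2 frequencies but not for the Hardy-Weinberg population with unequal homozygote frequencies, so only the former gives an orthogonal functional formulation.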
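A general orthogonal statistical model:
The explicit and general orthogonal [regardless of whether or not condition (7) holds] expression of the statistical one-locus genetic-effect design matrix, SS, is

$$S_S = \begin{pmatrix} 1 & -p_{12} - 2p_{22} & \dfrac{-2p_{12}p_{22}}{p_{11} + p_{22} - (p_{11} - p_{22})^2} \\[2ex] 1 & 1 - p_{12} - 2p_{22} & \dfrac{4p_{11}p_{22}}{p_{11} + p_{22} - (p_{11} - p_{22})^2} \\[2ex] 1 & 2 - p_{12} - 2p_{22} & \dfrac{-2p_{11}p_{12}}{p_{11} + p_{22} - (p_{11} - p_{22})^2} \end{pmatrix} \tag{8}$$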
The scalars of the SS matrix fulfill the conditions to be orthogonal scales sensu COCKERHAM (1954) (APPENDIX C). The first two columns of the functional (4) and statistical (8) genetic-effect design matrices are the scalars of the reference point and the scales related to additive effects and are identical in the two formulations. The differences between the two one-locus formulations are in the third column, the scales for dominance. The expressions for these dominance orthogonal scales can be obtained by computing the values of the dominance deviations in the graphical interpretation (APPENDIX C). In the same way as in the functional formulation, the general one-locus statistical formulation of the NOIA model (8) can be easily extended to a general multilocus case by taking the Kronecker product of single-locus genetic-effect design matrices (2). This resembles the way it has been done for particular cases of statistical formulations (ZENG et al. 2005). The statistical formulation (8) reduces to the functional one (4) whenever the conditions for orthogonality of the functional formulation (7) hold. The only exception is when the frequency of one of the genotypes is one, where the denominators in the third column of the statistical genetic-effect design matrix (8) are zero. This intuitively makes sense as no meaningful statistical formulation can be expected in a population in which only one genotype is present.
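The inverse of the statistical genetic-effect design matrix (8) is

$$S_S^{-1} = \begin{pmatrix} p_{11} & p_{12} & p_{22} \\[1ex] \dfrac{-p_{11}(p_{12} + 2p_{22})}{D} & \dfrac{p_{12}(p_{11} - p_{22})}{D} & \dfrac{p_{22}(p_{12} + 2p_{11})}{D} \\[2ex] -\tfrac{1}{2} & 1 & -\tfrac{1}{2} \end{pmatrix} \tag{9}$$

where $D = p_{11} + p_{22} - (p_{11} - p_{22})^2$.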
Unlike in the general functional formulation (5), the additive effects, reflected in the second row of this inverse matrix, change depending on the allele frequencies in the population. This is a consequence of the parameters of the model no longer being natural effects of allele substitutions, but instead average effects of allele substitutions over the population. To clarify the difference between the meaning of the parameters in the statistical and the functional model formulations, we use ES = (µ, α, δ)T for the genetic effects vector in the one-locus statistical formulation instead of EF = (R, a, d)T that was used in the functional formulation (1). In ES we use µ for denoting the mean of the population as in other statistical epistasis models (e.g., ZENG et al. 2005). However, we prefer to denote the statistical genetic effects as Greek letters for making a clear distinction between statistical and functional genetic effects. The vector EF follows the notation of the unweighted regression model by CHEVERUD and ROUTMAN (1995) regarding the functional genetic effects, although we prefer to use R instead of C for the reference point (CHEVERUD 2000).
Taking into account this notation of the vectors of genetic effects, and interpreting the genetic-effect design matrices as statistical matrices instead of functional matrices, expression (6) holds for the statistical formulation, and therefore it enables us to translate statistical genetic effects of one population into how they would look in a different population. The statistical formulation of the NOIA model (8) can be used for estimating multilocus genetic effects in the exact same way as previous models (APPENDIX C).
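A short numerical check of the orthogonality of the statistical formulation is sketched below. It uses the dominance scales −2p12p22/D, 4p11p22/D, and −2p11p12/D with D = p11 + p22 − (p11 − p22)², and genotype frequencies that deliberately depart from Hardy-Weinberg proportions. The function name `statistical_design` and the frequencies are assumptions made for illustration.

```python
import numpy as np

def statistical_design(p11, p12, p22):
    """One-locus statistical design matrix (rows G11, G12, G22; columns mu, alpha, delta)."""
    n = p12 + 2.0 * p22                    # mean number of A2 alleles
    den = p11 + p22 - (p11 - p22) ** 2     # zero only if a single genotype is present
    return np.array([[1.0, 0.0 - n, -2.0 * p12 * p22 / den],
                     [1.0, 1.0 - n,  4.0 * p11 * p22 / den],
                     [1.0, 2.0 - n, -2.0 * p11 * p12 / den]])

# Hypothetical genotype frequencies, not in Hardy-Weinberg proportions.
p = np.array([0.2, 0.5, 0.3])
S_S = statistical_design(*p)

# Frequency-weighted cross-products of the columns.
M = S_S.T @ np.diag(p) @ S_S
off_diagonal = M - np.diag(np.diag(M))

S_S_inv = np.linalg.inv(S_S)
print(np.round(off_diagonal, 12))
print(np.round(S_S_inv, 4))
```

All off-diagonal cross-products vanish for these frequencies. The first row of the inverse returns the genotype frequencies (so µ is the population mean) and the third row is (−1/2, 1, −1/2), while the second row depends on the frequencies.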
Obtaining functional estimates of genetic effects from statistical estimates:
Let us denote by SF and EF the genetic-effect design matrix and the vector of genetic effects in the functional formulation and by SS and ES the corresponding ones in the statistical formulation. In the one-locus case, the vectors of genetic effects are EF = (R, a, d)T and ES = (µ, α, δ)T. By implementing this notation in (1) we have G = SF · EF and G = SS · ES. Hence, the expressions for the transformations of genetic effects between the two formulations of the NOIA model are

$$E_F = S_F^{-1} \cdot S_S \cdot E_S, \qquad E_S = S_S^{-1} \cdot S_F \cdot E_F \tag{10}$$
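The transformation between the two formulations can be sketched for one locus as follows, with both descriptions referred to the mean of the same population. The statistical effects (µ, α, δ) = (5, 1, 2) are hypothetical, and the helper functions are the assumed one-locus matrices used for illustration throughout.

```python
import numpy as np

def functional_design(p11, p12, p22):
    """One-locus functional design matrix (columns R, a, d)."""
    n = p12 + 2.0 * p22
    return np.array([[1.0, 0.0 - n, 0.0 - p12],
                     [1.0, 1.0 - n, 1.0 - p12],
                     [1.0, 2.0 - n, 0.0 - p12]])

def statistical_design(p11, p12, p22):
    """One-locus statistical design matrix (columns mu, alpha, delta)."""
    n = p12 + 2.0 * p22
    den = p11 + p22 - (p11 - p22) ** 2
    return np.array([[1.0, 0.0 - n, -2.0 * p12 * p22 / den],
                     [1.0, 1.0 - n,  4.0 * p11 * p22 / den],
                     [1.0, 2.0 - n, -2.0 * p11 * p12 / den]])

p = (0.09, 0.42, 0.49)  # Hardy-Weinberg proportions with A1 frequency 0.3
S_F = functional_design(*p)
S_S = statistical_design(*p)

E_S = np.array([5.0, 1.0, 2.0])            # hypothetical (mu, alpha, delta)
E_F = np.linalg.solve(S_F, S_S @ E_S)      # statistical -> functional
E_S_back = np.linalg.solve(S_S, S_F @ E_F) # functional -> statistical

print(np.round(E_F, 6))
```

In this example the reference point equals the population mean and the dominance effect is unchanged, while the additive effect moves from the average effect 1 to the natural effect 1.8.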
| PREVIOUS MODELS AS PARTICULAR CASES OF NOIA |
|---|
The F2 and F∞ models:
The F2 model is orthogonal in an ideal F2 population, in which the genotype frequencies are p11 = 1/4, p12 = 1/2, p22 = 1/4. The genetic-effects design matrix of the F2 model can be obtained by inserting the genotype frequencies of an ideal F2 population in the NOIA statistical formulation (8), and its reference point µ is, thus, the mean of an F2 population. For the multilocus case, the description of the system is obtained by first computing the correct genetic-effect design matrices for the individual loci and then computing the Kronecker product of the single-locus genetic-effect design matrices, as shown in (2). The F∞ model, which is orthogonal for (and thus adapts to the mean of) a population with frequencies p11 = 1/2, p12 = 0, p22 = 1/2, is also a particular case of the general NOIA statistical formulation that can be explicitly obtained in the same way as explained for the F2 model above. One unsurprising remark about the F∞ population is that it fails in offering estimates of dominance effects, due to the absence of heterozygotes.
The G2A model:
ZENG et al. (2005) provided the genetic-effect design matrix of the G2A model for the one-locus case as
![]() | (11) |
![]() | (12) |
The unweighted regression model:
CHEVERUD and ROUTMAN's (1995) unweighted regression model (see also CHEVERUD 2000; ZENG et al. 2005) is a particular case of NOIA in which p11 = p12 = p22 = 1/3 for each locus. Since these frequencies, like those of the F2 and the F∞ models, fulfill criterion (7), the unweighted regression model can be considered as a particular case of both the functional (Equation 4) and the statistical (Equation 8) formulations of NOIA. The reference point of this model is the unweighted mean of the genotypic values of all genotypes, R = (1/3)G11 + (1/3)G12 + (1/3)G22, and the definition of genetic effects is the same as in the F2 model, as explained in relation to expression (5).
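The way previous models arise as particular cases can be illustrated by inserting the corresponding genotype frequencies into the one-locus matrices. The sketch below does so for the F2, F∞, and unweighted regression frequencies, using the assumed helper functions `functional_design` and `statistical_design` (hypothetical names).

```python
import numpy as np

def functional_design(p11, p12, p22):
    """One-locus functional design matrix (columns R, a, d)."""
    n = p12 + 2.0 * p22
    return np.array([[1.0, 0.0 - n, 0.0 - p12],
                     [1.0, 1.0 - n, 1.0 - p12],
                     [1.0, 2.0 - n, 0.0 - p12]])

def statistical_design(p11, p12, p22):
    """One-locus statistical design matrix (columns mu, alpha, delta)."""
    n = p12 + 2.0 * p22
    den = p11 + p22 - (p11 - p22) ** 2
    return np.array([[1.0, 0.0 - n, -2.0 * p12 * p22 / den],
                     [1.0, 1.0 - n,  4.0 * p11 * p22 / den],
                     [1.0, 2.0 - n, -2.0 * p11 * p12 / den]])

cases = {
    "F2": (0.25, 0.50, 0.25),
    "Finf": (0.50, 0.00, 0.50),
    "unweighted": (1 / 3, 1 / 3, 1 / 3),
}

# For each case, compare the statistical and the functional matrices.
same = {name: np.allclose(statistical_design(*p), functional_design(*p))
        for name, p in cases.items()}

S_f2 = statistical_design(*cases["F2"])
S_finf = statistical_design(*cases["Finf"])
print(same)
print(S_f2)
print(S_finf)
```

For all three sets of frequencies the statistical and functional matrices coincide in this sketch, in line with these models being both functional and statistical descriptions.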
| GRAPHICAL INTERPRETATION OF NOIA |
|---|
($\alpha_i$ and $\delta_{ij}$, which are directly related to the decomposition of the genetic variance), we provide the expressions that give those values as functions of the parameters of the NOIA functional formulation (Greek letters). The additive variance is the variance of the average effects, $\alpha_i$, and the dominance variance is the variance of the dominance deviations, $\delta_{ij}$ (COCKERHAM 1954; FALCONER and MACKAY 1996; LYNCH and WALSH 1998). The extension of this to several loci is straightforward. The additive-by-additive variance, for instance, is the variance of the additive-by-additive average effects, $(\alpha\alpha)_{ij}$, and these would be obtained in a multilocus genetic system as the products of the additive-by-additive genetic effects and the corresponding orthogonal scales in the multilocus genetic-effect design matrix.
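The decomposition of the genetic variance with orthogonal scales can be sketched for one locus as follows: each variance component is the frequency-weighted mean square of one genetic effect multiplied by its column of scales. The genotypic values and frequencies are hypothetical, and `statistical_design` is the assumed one-locus statistical matrix.

```python
import numpy as np

def statistical_design(p11, p12, p22):
    """One-locus statistical design matrix (columns mu, alpha, delta)."""
    n = p12 + 2.0 * p22
    den = p11 + p22 - (p11 - p22) ** 2
    return np.array([[1.0, 0.0 - n, -2.0 * p12 * p22 / den],
                     [1.0, 1.0 - n,  4.0 * p11 * p22 / den],
                     [1.0, 2.0 - n, -2.0 * p11 * p12 / den]])

p = np.array([0.09, 0.42, 0.49])   # genotype frequencies
G = np.array([10.0, 13.0, 14.0])   # hypothetical genotypic values

S_S = statistical_design(*p)
mu, alpha, delta = np.linalg.solve(S_S, G)

# Components: weighted mean squares of effect times orthogonal scale.
V_A = np.sum(p * (S_S[:, 1] * alpha) ** 2)
V_D = np.sum(p * (S_S[:, 2] * delta) ** 2)

# Total genetic variance computed directly from the genotypic values.
V_G = np.sum(p * (G - mu) ** 2)

print(V_A, V_D, V_G)
```

Because the scales are orthogonal, the two components add up to the total genetic variance without any covariance term.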
), thus making criterion (7) hold. Consequently, the regression is parallel to the line through G11 and G22 only when the functional formulation of the NOIA model is orthogonal. In Figure 3A, the regression is made on what we label as an HW3 population, in which p1 = 0.3 and the Hardy-Weinberg proportions hold, thus leading to p11 = 0.09, p12 = 0.42, p22 = 0.49. In this case criterion (7) does not hold, and the slope of the regression differs from the line defined by G11 and G22, leading to a change in the additive values. In this particular case they even change signs.
A graphical interpretation of the functional formulation:
The NOIA functional formulation can also be interpreted as a regression on the gene content, although this is not a typical linear regression anymore. Here, the slope of the regression remains constant regardless of the allele frequencies. In particular, it always remains at the same value as in the cases in which it is orthogonal, i.e., the slope of the line defined by G11 and G22 (Figures 2 and 3B). This is actually the same slope as for an unweighted regression on the gene content. This constraint of the functional regression becomes apparent when comparing Figure 3A with 3B. Figure 3A represents the statistical formulation, showing a normal least-squares linear regression in an HW3 population, as defined above, which is not parallel to the line through G11 and G22. Figure 3B represents the functional regression (for the same population) that fits the data under the constraint of retaining the same slope as the regression in Figure 2, i.e., by being parallel to the line defined by G11 and G22. This constraint enables us to perform the regression in populations in which only one genotype is present; i.e., it allows us to use one single genotype as a reference point of the functional formulation. This is not possible for the statistical formulation, as already commented above in relation to expression (8). The equivalent parameters to $\alpha_i$ and $\delta_{ij}$ are, in the functional formulation, the additive effects of natural allele substitutions in individuals, ai, and the deviations from those, dij.
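The two regressions can be compared numerically. The sketch below computes the frequency-weighted least-squares slope (the statistical regression) and the unweighted slope (the constraint used by the functional regression) for hypothetical genotypic values in an HW3 population, and then the deviations from a line with the unweighted slope passing through the population mean. All values and the helper name `weighted_slope` are assumptions made for illustration.

```python
import numpy as np

def weighted_slope(x, y, w):
    """Slope of the weighted least-squares regression of y on x."""
    w = w / w.sum()
    x_mean = np.sum(w * x)
    y_mean = np.sum(w * y)
    return np.sum(w * (x - x_mean) * (y - y_mean)) / np.sum(w * (x - x_mean) ** 2)

x = np.array([0.0, 1.0, 2.0])        # gene content: number of A2 alleles
G = np.array([10.0, 13.0, 14.0])     # hypothetical genotypic values
p = np.array([0.09, 0.42, 0.49])     # HW3 genotype frequencies

statistical_slope = weighted_slope(x, G, p)          # average effect
functional_slope = weighted_slope(x, G, np.ones(3))  # equals (G22 - G11) / 2

# Functional regression: line with the unweighted slope through the
# population mean; the residuals play the role of the deviations d_ij.
x_mean = np.sum(p * x)
G_mean = np.sum(p * G)
functional_fit = G_mean + functional_slope * (x - x_mean)
functional_deviations = G - functional_fit

print(statistical_slope, functional_slope)
print(np.round(functional_deviations, 6))
```

With these numbers the weighted slope is 1.6 and the constrained slope is 2, so the two regression lines are not parallel in this population.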
| NUMERICAL EXAMPLE |
|---|
The statistical formulation of NOIA in QTL analysis:
The statistical formulation of NOIA can be used in QTL analysis in the same way as other statistical models of epistasis (APPENDIX C). Here we show how to use NOIA to translate statistical estimates, as they come from the analysis of experimental data, into what would come from other experimental designs and into estimates of functional epistasis. Let us first consider the estimates of genetic effects ZENG et al. (2005) obtained for an HW347 simulated population using the G2A model as a starting point. Following the logic of expressions (6) and (10), from G = SG2A · EG2A and G = SHW · EHW, we obtain $E_{HW} = S_{HW}^{-1} \cdot S_{G2A} \cdot E_{G2A}$ to translate the G2A estimates into what they would have been if the statistical formulation of NOIA had been used instead. The simulated populations in ZENG et al. (2005) consisted of 100,000 individuals, meaning that the random departures from the Hardy-Weinberg proportions are certainly negligible and the G2A model is, thus, virtually orthogonal in the population under study. The only differences between the G2A estimates and the NOIA estimates are, therefore, the signs of the additive effects (as illustrated when obtaining the G2A model as a particular case of NOIA). This can be seen in the first row of Table 1: the genetic effects obtained from ZENG et al. (2005) are all positive.
Following (10), the functional estimates are obtained as $E_{HF} = S_{HF}^{-1} \cdot S_{HS} \cdot E_{HS}$, where H represents HW347, F represents functional, and S represents statistical in the subscripts. The genetic-effect design matrices needed for the operation are the Kronecker products of the matrices for the individual loci as in (2) and as in (B10) in APPENDIX B. A, B, and C are the three biallelic loci affecting the trait and the frequencies of the A2, B2, and C2 alleles are qA = 0.3, qB = 0.4, and qC = 0.7. Thus, we have $S_{HF}^{-1} = S_{CF}^{-1} \otimes S_{BF}^{-1} \otimes S_{AF}^{-1}$ (utilizing that the Kronecker product is interchangeable with the inverse operation) and $S_{HS} = S_{CS} \otimes S_{BS} \otimes S_{AS}$, where the matrices for the individual loci are derived using expressions (4) and (5) for the functional formulation and (8) and (9) for the statistical formulation. The functional genetic effects in the resulting vector EHF are the effects of allele substitutions performed on a fictitious genotype whose genotypic value would be the mean of the HW347 population. Thus, the reference point of the functional description has not changed after the transformation from the statistical description. We change the reference to the genotypic value of a real individual below in this section.
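The three-locus translation can be sketched as follows for a population in Hardy-Weinberg and linkage equilibrium with A2, B2, and C2 allele frequencies 0.3, 0.4, and 0.7. The vector of statistical effects is randomly generated for illustration (it is not the set of estimates in Table 1), and the helper functions and the Kronecker ordering (locus A varying fastest) are assumptions of this sketch.

```python
import numpy as np
from functools import reduce

def functional_design(p11, p12, p22):
    """One-locus functional design matrix (columns R, a, d)."""
    n = p12 + 2.0 * p22
    return np.array([[1.0, 0.0 - n, 0.0 - p12],
                     [1.0, 1.0 - n, 1.0 - p12],
                     [1.0, 2.0 - n, 0.0 - p12]])

def statistical_design(p11, p12, p22):
    """One-locus statistical design matrix (columns mu, alpha, delta)."""
    n = p12 + 2.0 * p22
    den = p11 + p22 - (p11 - p22) ** 2
    return np.array([[1.0, 0.0 - n, -2.0 * p12 * p22 / den],
                     [1.0, 1.0 - n,  4.0 * p11 * p22 / den],
                     [1.0, 2.0 - n, -2.0 * p11 * p12 / den]])

def hw_frequencies(q):
    """Hardy-Weinberg genotype frequencies (p11, p12, p22) for A2 frequency q."""
    return ((1.0 - q) ** 2, 2.0 * q * (1.0 - q), q ** 2)

q = {"A": 0.3, "B": 0.4, "C": 0.7}

def multilocus(design):
    """Kronecker product of single-locus matrices, ordered C, B, A."""
    mats = [design(*hw_frequencies(q[locus])) for locus in ("C", "B", "A")]
    return reduce(np.kron, mats)

S_HS = multilocus(statistical_design)
S_HF = multilocus(functional_design)

# Hypothetical statistical effects (27 parameters for three biallelic loci).
rng = np.random.default_rng(0)
E_HS = rng.normal(size=27)

# Statistical -> functional, both referred to the population mean.
E_HF = np.linalg.solve(S_HF, S_HS @ E_HS)

print(S_HS.shape, np.round(E_HF[:4], 4))
```

In this sketch the first element (the reference point, equal to the population mean) and the highest-order interaction term are the same in both vectors, while lower-order effects generally differ.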
Translating genetic effects into an ideal F2 population:
In the HW347 population, we do not consider the functional genetic effects as being meaningful per se. Here we use them as an intermediate step to compute, as depicted in one of the change-of-reference arrows in the ideograph (Figure 1), the genetic effects as they would appear in an ideal F2 population (third row in Table 1). These calculations are done using (6), here taking the form $E_{F2} = S_{F2}^{-1} \cdot S_{HF} \cdot E_{HF}$. The genetic-effect design matrices are computed as in (2) and (3), or more explicitly as in (B3) in APPENDIX B, by $S_{F2}^{-1} = S_{CF2}^{-1} \otimes S_{BF2}^{-1} \otimes S_{AF2}^{-1}$ and $S_{HF} = S_{CF} \otimes S_{BF} \otimes S_{AF}$, where the matrices of the individual loci are again computed from (4) and (5). The vector $E_{F2}$ (third row in Table 1) gives the average effects of substitutions in an F2 population. These are, therefore, the values that would be obtained in a QTL experiment by means of the F2 model in an ideal F2 population. And, in fact, these values are the same values ZENG et al. (2005) estimate from an F2 simulated population, built on the same genetic system (except for the sign differences in the genetic effects involving one additive effect, as explained above).
Genetic effects as allele substitutions on a particular genotype:
Here we use the NOIA model to obtain estimates at the reference point of the phenotypic value of a real genotype, G111111. In this way, the functional parameters get a direct genetic interpretation as natural effects of allele substitutions made on one particular individual. We shorten G111111 to read R1 in the subscripts, and hence expression (6) takes the form $E_{R1} = S_{R1}^{-1} \cdot S_{HF} \cdot E_{HF}$, where $S_{R1}^{-1} = S_{CR1}^{-1} \otimes S_{BR1}^{-1} \otimes S_{AR1}^{-1}$ and $S_{HF} = S_{CF} \otimes S_{BF} \otimes S_{AF}$. The functional estimates of genetic effects as the natural effects of allele substitutions on the reference genotype G111111 (i.e., the resulting $E_{R1}$ vector) are shown in the last row of Table 1. These can of course be easily transformed into natural effects of allele substitutions from any other reference genotype by means of another change of reference operation.
General remarks:
All cases in Table 1, except for HW347S in the first row, are either purely functional or both functional and statistical descriptions of a genetic system in which the highest level of epistasis present is pairwise epistasis. This is why, as pointed out above in relation to expression (5), all the genetic effects of the interactions remain constant throughout these cases. The additive and dominance effects, on the other hand, do not necessarily remain constant between cases in Table 1. The values of the genetic effects of a functional and a statistical description of the same population are different because of the different meaning of the parameters in the functional and the statistical formulations of the NOIA model. The values of the genetic effects of (functional or statistical) descriptions of the system from different reference points are different because the single-locus genetic effects depend on the genetic background, i.e., because of epistasis.
| DISCUSSION |
|---|
Since NOIA overcomes the duality of functional and statistical models of epistasis, it enables us to obtain estimates of both functional and statistical genetic effects from data. The NOIA statistical formulation achieves orthogonality regardless of the genotype frequencies in the population and is therefore convenient for QTL detection and estimation and for an orthogonal decomposition of the genetic variance. The NOIA model is implemented with a tool to transform those orthogonal estimates into functional estimates. When expressed from the mean of the population under study, these functional estimates represent effects of allele substitutions performed on a fictitious genotype. Using the change-of-reference tool of the NOIA model, the reference point of the functional formulation can be changed to any real genotype, and therefore the NOIA model handles natural effects of allele substitutions on those genotypes, which is the genuine point of functional models. All these possibilities are represented in Table 1 as the result of a numerical example that illustrates the practical use of the theory provided within this article. The transformations in Table 1 can be explained using the classical concepts of cell means and factor effects (SEARLE 1971; COFFMAN et al. 2005). Indeed, expressions (6) and (10) are based on the fact that the genetic values (cell means) remain constant and they can therefore be used for linking and translating between genetic (factor) effects that entail different interpretations (statistical, functional, and both from different reference points).
The NOIA statistical formulation:
The statistical formulation of the NOIA model is an explicit, orthogonal description of multilocus two-allele models. Previous statistical epistasis models can thus be obtained as particular cases of NOIA. Orthogonality is a key property for statistical epistasis models to be appropriate for QTL analysis methods based on model selection. The F∞ model, for instance, lacks this property in commonly used experimental populations (KAO and ZENG 2002; YANG 2004; ZENG et al. 2005). The classical F2 model, on the other hand, is orthogonal in ideal F2 populations in which the frequencies are p11 = 1/4, p12 = 1/2, p22 = 1/4. However, in QTL studies there are always deviations from these genotype frequencies due to sampling errors, leading to a number of problems related to QTL detection and estimation as thoroughly pointed out by YANG (2004) and by Zeng and collaborators (KAO and ZENG 2002; ZENG et al. 2005). These problems involve a bias in the estimates of genetic effects that will dramatically increase whenever segregation distortion affects at least one of the loci of the genetic system. The generality of the NOIA statistical formulation allows us to describe gene interactions of multilocus genetic systems in populations regardless of the gene frequencies of the alleles at the loci affecting the trait under study, thus avoiding the bias caused by sampling errors and segregation distortion. Furthermore, by changing the reference of the orthogonal statistical estimates to a common reference point in NOIA, it is possible to compare the estimates of genetic effects coming from different QTL experiments affected by specific sampling errors or carried out using different experimental designs. This is an original feature of NOIA and we have proved its validity and accuracy by successfully transforming genetic effects between two simulated populations with different genotype frequencies but the same underlying genetics (Table 1).
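The consistency of estimates between nested models can be illustrated with a population-level weighted least-squares fit (genotype frequencies as weights, no sampling noise). The sketch compares the additive effect from a full model (additive plus dominance) with that from a reduced additive-only model, once with scales adapted to the actual frequencies and once with the scales of an ideal F2 population. The distorted frequencies, the genotypic values, and the helper names are hypothetical.

```python
import numpy as np

def statistical_design(p11, p12, p22):
    """One-locus statistical design matrix (columns mu, alpha, delta)."""
    n = p12 + 2.0 * p22
    den = p11 + p22 - (p11 - p22) ** 2
    return np.array([[1.0, 0.0 - n, -2.0 * p12 * p22 / den],
                     [1.0, 1.0 - n,  4.0 * p11 * p22 / den],
                     [1.0, 2.0 - n, -2.0 * p11 * p12 / den]])

def wls(X, y, w):
    """Weighted least-squares coefficients for design X, response y, weights w."""
    W = np.diag(w)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

p = np.array([0.15, 0.50, 0.35])   # hypothetical frequencies under segregation distortion
G = np.array([10.0, 13.0, 14.0])   # hypothetical genotypic values

S_adapted = statistical_design(*p)            # scales adapted to the actual frequencies
S_f2 = statistical_design(0.25, 0.50, 0.25)   # ideal F2 scales

adapted_full = wls(S_adapted, G, p)
adapted_reduced = wls(S_adapted[:, :2], G, p)  # drop the dominance column
f2_full = wls(S_f2, G, p)
f2_reduced = wls(S_f2[:, :2], G, p)

print(adapted_full[1], adapted_reduced[1])
print(f2_full[1], f2_reduced[1])
```

With the adapted scales the mean and additive estimates are identical in the full and reduced models, whereas with the F2 scales the additive estimate changes when the dominance term is dropped.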
YANG's (2004) genetic effects model can, like NOIA, deal with departures from the Hardy-Weinberg proportions, but his model is explicitly developed only for the two-locus case, whereas NOIA is not constrained regarding the number of loci. The epistasis model of WANG and ZENG (2006) is particularly focused on the decomposition of the genetic variance. Their model is more general than the current NOIA statistical formulation regarding the number of alleles and the computation of genetic covariances due to linkage disequilibrium. However, this model is valid only for populations under strict Hardy-Weinberg proportions and not developed using the convenient algebraic notation that simplifies the computation of the model for the particular population under study. This notation (together with the generality regarding genotype frequencies) allows us, in particular, to implement in NOIA a tool to translate, and therefore to compare, statistical estimates of genetic effects, as explained above. Finally, WANG and ZENG's (2006) model, the F2 model, and the G2A model do not provide a link between statistical epistasis and functional epistasis, which is the main motivation for the NOIA model.
The NOIA functional formulation:
The algebraic structure of the NOIA functional formulation resembles the statistical formulation but instead of being based on average effects of allele substitutions in populations, it uses natural (nonaverage) effects of allele substitutions as parameters. The graphical interpretation of these parameters is also akin to the classical linear regression of the genotypic values on the gene content that defines the average effects (see Figures 2 and 3B). The connection we provide between the functional and the statistical formulations enables us to feed the first one with estimates of genetic effects obtained by means of QTL mapping studies on biallelic systems, as explained in the text and illustrated by means of the numerical example. Several studies have analyzed general key properties of gene interactions using functional epistasis models. Hansen and collaborators, for instance, have found directionality of gene interactions to determine the way in which short- and long-term genetic architecture evolves in the face of selection (CARTER et al. 2005; HANSEN 2006; HANSEN et al. 2006). The NOIA model enables us now to study directionality in particular traits of particular populations, by using just data on orthogonal gene interactions from QTL studies and transforming them into functional estimates in which directionality can be inspected.
CHEVERUD and ROUTMAN (1995; CHEVERUD 2000) made a challenging attempt in the direction of linking statistical and functional (physiological) epistasis. Their unweighted regression model can be understood as a simultaneously functional and statistical description of genetic effects for a specific reference point and can be obtained as a particular case of NOIA. However, their model is not implemented with a change-of-reference tool, which causes two major practical problems. First, as a statistical model of epistasis, it is only orthogonal (and therefore appropriate for QTL detection and estimation) in populations in which every single genotype is present in the same quantity. Second, as a functional model, it cannot deal with natural effects of allele substitutions in real genotypes. In addition, several errors in the use and interpretation of the unweighted regression model have been pointed out (ZENG et al. 2005). HANSEN and WAGNER's (2001b) and BARTON and TURELLI's (2004) functional epistasis models do incorporate change-of-reference tools. The first one is formulated for multiple alleles and for constrained gene effects and interactions and the second one, like the current NOIA formulation, is a general formulation for two alleles. We find the algebraic notation of the NOIA functional formulation to be an advantage over these functional epistasis models. It is in fact by means of a parallel notation in the functional and the statistical formulations of the NOIA model that we developed both a graphical interpretation of functional epistasis and a transformation tool that enables us to feed the NOIA formulation with estimates of genetic effects from real data.
Future extensions of NOIA:
As discussed above, the theoretical framework of the NOIA model presents considerable advantages over the previous formulations of epistasis, in particular in the analysis of real QTL experiments. Consequently, we are in the process of implementing NOIA in the context of QTL interval mapping with Haley-Knott regressions (HALEY and KNOTT 1992). We also aim to extend NOIA to multiple alleles and linkage disequilibrium, this last implementation motivated by the fact that even for unlinked loci, there is nonrandom association of alleles due to sampling in the experimental populations used in QTL mapping, resulting in biased estimates.
Closing perspective:
The formal framework we propose in this article, together with the implementations we currently pursue, comprises theoretical developments and conceptual elucidations on the mathematical description of the genetic effects underlying a trait. Such a fundamental framework is reflected in graphical interpretations analogous to the classical regressions on the gene content provided by FISHER (1918) and will aid in the study of epistasis at different levels, including the role of epistasis in evolution, the response to selection in animal and plant breeding programs, and the analysis of multifactorial disease. Marker-assisted selection is a promising strategy for improving selection response for traits that are difficult to measure in individuals used for breeding or that manifest themselves late in life. The efficiency of marker-assisted selection relies on the precision with which estimates of genetic effects of individual or combinations of loci obtained in one genetic background can predict their effect in another. The generality of the NOIA model as well as its transformation and change-of-reference tools can allow the breeders to estimate the genetic effects in one experimental design and use these estimates to predict the effect of the same locus or loci in a particular genotype of a breeding individual or an average effect in any breeding population. This cannot be done with the currently available models. Another example where the NOIA model will fundamentally change the way science could proceed is in the mapping of loci underlying multifactorial disease. For example, we are on the verge of performing massive association studies on a grand scale. In these studies, deviations from ideal population conditions include sampling errors, segregation distortion, linkage disequilibrium, and (when the association studies are based on haplotypes) multiple alleles.
The aim of these studies is to statistically detect loci affecting disease, but to functionally predict the effects of allele substitutions on an individual genotype basis to be able to suggest appropriate treatments or develop treatment regimes. The currently available models are far from suitable for this purpose, whereas the NOIA model is designed to do just this.
| APPENDIX A: THE NOIA MODEL FOR THE GENERAL MULTILOCUS CASE |
|---|
The vector of genotypic values, G:
The way in which the scalars of the vector G are sorted can be obtained by means of the Kronecker product of the single-locus vectors of genotypic values, in which we then substitute the products of single-locus genotypic values by the corresponding multilocus genotypic values (for instance, $G^{A}_{11} G^{B}_{12}$ by $G^{AB}_{1112}$, or simply G1112). It is worth stressing that the Kronecker product of subsequent loci added to the genetic system must be computed to the left of the previous ones, as shown in the example below, which makes the vector expand downward as new loci are considered.
The vector of genetic effects, E:
This is obtained in a similar way as the G vector, by the Kronecker product of single-locus genetic-effects vectors. In this case, we first replace the reference point by a one in the single-locus vectors and next compute the Kronecker product of the subsequent loci to the left of the previous ones. Then, to obtain E from the resulting vector, we just replace the products of the genetic effects by the corresponding interactions (for instance, dB · aA by adAB, or by just ad in the two-locus case) and the first scalar of the vector, which shall be one, by the reference point R. Greek letters are used instead of the Latin letters in the statistical formulation. As was the case for G, adding new loci makes the vector E expand downward.
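The sorting rules for G and E described above can be sketched in a few lines of Python. This is only an illustration: the helper names (`kron_labels`, `effect_label`) are hypothetical, and the label convention (indices of locus A written first, then those of locus B) is inferred from the G1112 and ad examples in the text.

```python
def kron_labels(left, right):
    """Kronecker product of two label vectors: each entry of `left`
    is paired, in turn, with every entry of `right`."""
    return [(l, r) for l in left for r in right]

# Single-locus genotype labels in the order G11, G12, G22.
genotypes = ["11", "12", "22"]

# The new locus (B) is placed to the LEFT of the previous one (A).
# Labels are written with the A indices first, then the B indices.
G_labels = ["G" + a + b for b, a in kron_labels(genotypes, genotypes)]

# Single-locus effect vector with the reference replaced by a one.
effects = ["1", "a", "d"]

def effect_label(b, a):
    """Name a product of single-locus effects (b from locus B, a from locus A)."""
    if a == "1" and b == "1":
        return "R"          # the leading one becomes the reference point
    if b == "1":
        return a + "A"      # marginal effect of locus A
    if a == "1":
        return b + "B"      # marginal effect of locus B
    return a + b            # interaction, A letter first (e.g. dB * aA -> "ad")

E_labels = [effect_label(b, a) for b, a in kron_labels(effects, effects)]

print(G_labels)
print(E_labels)
```

Because locus B sits on the left of the product, the A genotype cycles fastest, so the vectors grow downward in blocks of three as each new locus is added.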
The genetic-effect design matrix, S:
Once the single-locus genetic-effects design matrices are expressed at the desired single-locus reference point, the multilocus S matrix for the complete system can be obtained as the Kronecker product (for subsequent loci, to the left of the previous ones) of the single loci, as already explained in the text using expressions (2) and (3) and also in APPENDIX B using (B3). We could also describe the system by multiplying the S matrices of subsequent loci to the right of the previous ones. In this case the vectors of genotypic values and genetic effects would need to be sorted in a different way, in which the new scalars that appear due to considering new loci would have to be inserted before the previous ones, instead of afterward.
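The equivalence of the two orderings can be checked numerically. The sketch below is an assumption-laden illustration rather than the authors' code: the two single-locus matrices are assumed forms of a G11-referenced and a G12-referenced functional coding (G11 = R, G12 = R + a + d, G22 = R + 2a, and its G12-referenced counterpart), and `kron` is a hypothetical pure-Python helper.

```python
def kron(X, Y):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_x for y in row_y]
            for row_x in X for row_y in Y]

# Assumed single-locus design matrices; rows are G11, G12, G22,
# columns are (R, a, d).
S_A = [[1, 0, 0], [1, 1, 1], [1, 2, 0]]      # reference G11 (assumption)
S_B = [[1, -1, -1], [1, 0, 0], [1, 1, -1]]   # reference G12 (assumption)

S_left = kron(S_B, S_A)    # new locus to the left of the previous one
S_right = kron(S_A, S_B)   # new locus to the right of the previous one

# Position 3*b + a in the "left" ordering corresponds to position
# 3*a + b in the "right" ordering (a, b = row indices at loci A and B).
perm = [3 * a + b for b in range(3) for a in range(3)]
S_resorted = [[S_right[perm[i]][perm[j]] for j in range(9)]
              for i in range(9)]

same_system = (S_resorted == S_left)
print(same_system)
```

Both products therefore describe the same system. Only the order in which the scalars of G and E are listed changes.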
Example:
Here we develop an example of a functional formulation using a real genotype as a reference point. Let us consider the simplest multilocus case, consisting of two loci, A and B, as in expression (2). This example deals with a case very similar to that expression, the only difference being that there we assumed that both genetic-effect design matrices came from expression (1), hence leading to $G^{AB}_{1111}$, or simply G1111, as reference, whereas in this example we use as a reference the genotypic value G1112 instead. We follow the same order as above, and therefore we begin by building the vector of genotypic values:
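$$G_{AB} = (G_{1111}, G_{1211}, G_{2211}, G_{1112}, G_{1212}, G_{2212}, G_{1122}, G_{1222}, G_{2222})^{T} \tag{A1}$$

Next, the vector of genetic effects is

$$E_{AB} = (R, a_A, d_A, a_B, aa, da, d_B, ad, dd)^{T}, \qquad R = G_{1112} \tag{A2}$$

The single-locus genetic-effect design matrices SA and SB are given in (1) and in (B7) in APPENDIX B, respectively. Therefore, the two-locus genetic-effect design matrix is, by computing just the Kronecker product of these two matrices,

$$S_{AB} = S_B \otimes S_A \tag{A3}$$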
From (A1), (A2), and (A3) we have GAB = SAB · EAB, which describes every genotypic value in a two-locus two-alleles genetic system as the result of a set of allele substitutions from the reference genotype G1112.
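As a rough numerical check of this example, one can pick arbitrary genetic effects, compute the genotypic values, and confirm that the reference is recovered at G1112. The single-locus matrices below are assumptions standing in for expressions (1) and (B7), which are not reproduced here. The effect values are invented for illustration only, and `kron` is a hypothetical helper.

```python
def kron(X, Y):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_x for y in row_y]
            for row_x in X for row_y in Y]

# Assumed single-locus design matrices (rows G11, G12, G22; columns R, a, d).
S_A = [[1, 0, 0], [1, 1, 1], [1, 2, 0]]      # locus A, reference G11 (assumption)
S_B = [[1, -1, -1], [1, 0, 0], [1, 1, -1]]   # locus B, reference G12 (assumption)

S_AB = kron(S_B, S_A)   # locus B to the left of locus A

# Genetic effects in the order (R, aA, dA, aB, aa, da, dB, ad, dd);
# the numbers are arbitrary illustrative values.
E_AB = [10, 2, 1, 3, 1, -1, 2, 1, 0]

# G_AB = S_AB . E_AB, sorted as G1111, G1211, G2211, G1112, ...
G_AB = [sum(s * e for s, e in zip(row, E_AB)) for row in S_AB]

G1111, G1112, G2212 = G_AB[0], G_AB[3], G_AB[5]
print(G1112 == E_AB[0])             # the reference genotype carries the value R
print((G2212 - G1112) / 2 == E_AB[1])  # aA is half the 11 -> 22 substitution at A
```

Under these assumptions the additive effect of locus A is read directly as an allele substitution made on the reference background (genotype 12 at locus B), which is the sense in which the parameters are "natural" effects.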
| APPENDIX B: THE CHANGE-OF-REFERENCE OPERATION |
|---|
The functional change-of-reference operation:
Recall that we have described a one-locus biallelic genetic system using G11 as a reference (Equation 1). Now, the genetic-effect design matrix that leads to a reference point R2 = p11G11 + p12G12 + p22G22, $S_{R_2}$, can be obtained from the genetic-effect design matrix for any other reference point R1, $S_{R_1}$, as
![]() | (B1) |
where the change-of-reference matrix for the reference point R2 is a square matrix in which each column is filled with one of the coefficients of the linear combination of genotypes that equals the new reference:

$$\begin{pmatrix} p_{11} & p_{12} & p_{22} \\ p_{11} & p_{12} & p_{22} \\ p_{11} & p_{12} & p_{22} \end{pmatrix} \tag{B2}$$

Below we show that this operation leads to the same matrix, independently of the starting reference point R1 and, immediately afterward, we use an example to illustrate the logic that led us to this operation.
We obtained expression (4) by performing a change-of-reference operation as shown in (B1), starting from the genetic-effect design matrix in (1) and without specifying the values of the frequencies in the change-of-reference matrix (B2). An extension to the general multilocus change-of-reference operation is straightforward. First the change of reference is performed separately for each locus, and then the S matrix of the complete system is obtained by taking the Kronecker product of the new single-locus reference matrices, in reverse order. For n loci this reads

$$S = S_n \otimes S_{n-1} \otimes \cdots \otimes S_1 \tag{B3}$$

where each single-locus matrix $S_i$, i = 1, ..., n, can be obtained as in (B1).
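The following sketch illustrates one way to realize a single-locus change of reference and its multilocus extension. It is not claimed to be the exact form of (B1). It is derived under two stated assumptions: the weights p11, p12, p22 sum to one, and the single-locus effects a and d are unchanged by the change of reference, so that only the reference column of the design matrix needs re-centering. The starting matrices and all function names are hypothetical.

```python
from functools import reduce

def kron(X, Y):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_x for y in row_y]
            for row_x in X for row_y in Y]

def change_reference(S_R1, p):
    """Design matrix for the reference R2 = sum_k p[k] * G_k, starting
    from the design matrix for any other reference R1 (weights sum to 1)."""
    n = len(S_R1)
    # Weighted mean of the rows: R2 expressed in terms of (R1, a, d).
    mean_row = [sum(p[k] * S_R1[k][j] for k in range(n)) for j in range(n)]
    # Re-center the effect columns; the reference column stays all ones.
    return [[1] + [S_R1[i][j] - mean_row[j] for j in range(1, n)]
            for i in range(n)]

def multilocus(S_list):
    """Kronecker product of single-locus matrices, each new locus to the left."""
    return reduce(lambda acc, S: kron(S, acc), S_list)

# Assumed G11-referenced single-locus matrix (rows G11, G12, G22; columns R, a, d).
S_G11 = [[1, 0, 0], [1, 1, 1], [1, 2, 0]]

S_G12 = change_reference(S_G11, [0, 1, 0])            # reference moved to G12
S_back = change_reference(S_G12, [1, 0, 0])           # and back to G11
p = [0.25, 0.5, 0.25]                                 # illustrative frequencies
S_from_G11 = change_reference(S_G11, p)
S_from_G12 = change_reference(S_G12, p)

# Two loci: A referenced at G11, B referenced at G12.
S_AB = multilocus([S_G11, S_G12])

print(S_G12)
print(S_back == S_G11, S_from_G11 == S_from_G12)
```

In this sketch the result does not depend on the starting reference, and moving the reference to G12 and back returns the original matrix, as the consistency requirement discussed next demands.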
The transitive property:
For the change-of-reference operation to be consistent, the resulting matrix o