Abstract
We use the orthogonal contrast scales proposed by Cockerham to construct a genetic model, called Cockerham's model, for studying epistasis between genes. The properties of Cockerham's model in modeling and mapping epistatic genes under linkage equilibrium and disequilibrium are investigated and discussed. Because of its orthogonal property, Cockerham's model has several advantages in partitioning genetic variance into components, interpreting and estimating gene effects, and application to quantitative trait loci (QTL) mapping when compared to other models, and thus it can facilitate the study of epistasis between genes and be readily used in QTL mapping. The issues of QTL mapping with epistasis are also addressed. Real and simulated examples are used to illustrate Cockerham's model, compare different models, and map for epistatic QTL. Finally, we extend Cockerham's model to multiple loci and discuss its applications to QTL mapping.
GENES interact when they express their effects; i.e., the effects of genotypes at one locus depend on what genotypes are present at other loci. Interaction (epistasis) between genes affecting qualitative trait variation has been demonstrated for a long time since Gregor Mendel in 1865. Although the evidence of epistasis between genes controlling quantitative traits [quantitative trait loci (QTL)] has been reported by traditional techniques, such as variance component analyses (Brim and Cockerham 1961; Leeet al. 1968; Stuber and Moll 1971), epistasis between individual QTL generally has been difficult to discern by traditional techniques. The recent advances in molecular biology have allowed fine-scale genetic marker maps of various organisms to be constructed for the study of individual QTL. Using such maps, statistical methods for estimating the positions and effects of individual QTL (QTL mapping) have been proposed (Lander and Botstein 1989; Jansen 1993; Zeng 1994; Kaoet al. 1999; Sen and Churchill 2001). The problem of epistasis has been considered in some QTL mapping studies (e.g., Stuberet al. 1992; Cheverud and Routman 1995; Doebleyet al. 1995; Cockerham and Zeng 1996; Kaoet al. 1999; Goodnight 2000; Zenget al. 2000), but not sufficiently, and many theoretical and statistical issues involved with epistasis have not been discussed. Here, we discuss a genetic model, called Cockerham's model, in relation to QTL mapping with epistasis. We also investigate the model properties under linkage disequilibrium.
Fisher (1918) first partitioned genetic variance into components corresponding to additive, dominance, and epistatic variances using the least-squares principle. Cockerham (1954) further partitioned the epistatic variance into components using orthogonal contrasts. Kempthone (1957) and Hayman and Mather (1955) adopted the same epistasis model. Hayman and Mather (1955) and Mather (1967) proposed other epistasis models for modeling epistasis. Van Der Veen (1959) reviewed the genetic models of digenic epistasis published by then and summarized them into three categories:
The pure-line-metric or F_{∞}-metric model (F_{∞} denotes the population derived from selfing F_{2} individuals for t generations as t → ∞): The parameters in the F_{∞}-metric model show orthogonality with respect to genotypic frequencies of an F_{∞} population under linkage equilibrium.
The F_{2}-metric model (corresponding to Cockerham's model): The parameters in the F_{2}-metric model are mutually orthogonal with respect to genotypic frequencies of an F_{2} population under linkage equilibrium.
The mixed-metric model (corresponding to Hayman and Mather's model): The mixed-metric model is a mixture of the Cockerham's model and F_{∞}-metric model, and it can be transformed to the F_{2}-metric model by subtraction of the mean.
Later, Crow and Kimura (1970), Mather and Jinks (1982), Haley and Knott (1992), and Kearsey and Pooni (1996) applied the F_{∞}-metric model to the study of epistasis between genes, and Goodnight (2000) adopted an alternative model modified from Cockerham (1954) to study gene interaction. Although these three models can be translated to each other by addition or subtraction of a constant (see Table 1 of Van Der Veen's 1959 article), they have different meanings in interpreting gene effects, show different structures of variance components, and possess different properties in statistical estimation, which may affect the study of QTL as shown in this article.
In this article, we start from the traditional partition of genetic variance into variance components using Cockerham's (1954) orthogonal contrasts, then lead up to a definition of the genetic parameters for genetic effects, and present Cockerham's epistasis model. The properties of Cockerham's model in modeling and mapping epistatic genes are investigated when genes are in linkage equilibrium and disequilibrium. The differences between Cockerham's model and the other models are compared, and the advantages of Cockerham's model are discussed. It shows that Cockerham's model is a more appropriate model than the other models for the study of epistasis between genes and QTL mapping in the populations, such as F_{2} and backcross. Real and simulated examples are used to illustrate Cockerham's model, compare different genetic models in the analysis of epistasis between genes, and map for epistatic QTL. Finally, we generalize Cockerham's model to multiple loci and discuss its applications to QTL mapping.
COCKERHAM'S GENETIC MODEL
Cockerham (1954) used eight orthogonal contrast scales to partition the genetic variance contributed by two genes into eight components and to define the genotypic value of a genotype to find the correlation between relatives in a population. His definition of genotypic value using the orthogonal scales leads the way to construct a genetic model, which is called Cockerham's model, for modeling epistasis and defining gene effects in a population. In this section, the orthogonal contrast scales are introduced to present Cockerham's model, and the genetic parameters of Cockerham's model are defined. The similarities and differences between Cockerham's model and alternative models are compared, and their variance component structures are presented.
Orthogonal contrasts: Assuming that allele frequencies at one locus are uncorrelated with frequencies at another locus (two loci are in linkage equilibrium), Cockerham (1954) partitioned the genetic variance caused by two loci, A and B, each with two alleles (A, a, and B, b), of a diploid organism using the orthogonal contrast scales in Table 2 of his article. The scales Wt′ 's, which are functions of genotypic frequencies p_{ij}'s, have to satisfy two requirements
∑i,jpijWtij=0and∑i,jpijWtijWt′ij=0,
where i (j) indexed by 2, 1, or 0 refers to the genotype AA (BB), Aa (Bb), or aa (bb) at locus A (B), and W_{tij} is the scale component of genotype ij for the tth contrast. The first requirement ensures that deviations around the mean are compared (the scales W_{tij}'s are contrasts). The second requirement ensures that the contrasts are orthogonal. W_{1} and W_{2} (W_{3} and W_{4}) are the linear and quadratic orthogonal contrasts for locus A (locus B). W_{5} is the linear × linear contrast. W_{6} is the linear × quadratic contrast. W_{7} is the quadratic × linear contrast. W_{8} is the quadratic × quadratic contrast. Cockerham's orthogonal contrast scales serve the same purpose as the orthogonal contrasts for partitioning the sum of squares due to treatment into independent single-degree-of-freedom components in experimental design (Steel and Torrie 1981). The statistical linear and quadratic terms correspond to the genetical additive and dominance terms, respectively. Cockerham used these orthogonal scales to partition the genetic variance and find the partition of variance σt2 due to orthogonal scale W_{t} by
σt2=(∑i,jpijGijWtij)2(∑i,jpijWtij2),
where G_{ij} denotes the genotypic value of the genotype ij. He also defined G_{ij} in terms of the scales as
Gij=E0+∑t=18EtWtij,
(1)
where E_{t}'s are the corresponding coefficients, by solving the equations themselves, and used it to find the correlation between relatives in a population. His idea of defining the genotypic value by the orthogonal contrast scales leads up to Cockerham's genetic model for modeling epistasis between genes.
Cockerham's genetic model: We now apply Cockerham's orthogonal contrast scales to the F_{2} population to derive Cockerham's model for the F_{2} population. For an F_{2} population, the genotypic frequencies of the nine genotypes AABB, AABb, AAbb, AaBB, AaBb, Aabb, aaBB, aaBb, and aabb are 1/16, 1/8, 1/16, 1/8, 1/4, 1/8, 1/16, 1/8, and 1/16, respectively, and Cockerham's orthogonal contrasts can be modified as shown in Table 1 (see also Cockerham and Zeng 1996). By solving Equation 1 with the scales in Table 1, the unique solutions of the coefficients in terms of the genotypic values are
E0=G2216+G218+G2016+G128+G114+G108+G0216+G018+G0016,
(2)
E1=G228+G214+G208−G028−G014−G008,
(3)
E2=G128+G114+G108−G2216−G218−G2016−G0216−G0116−G0016,
(4)
E3=G228+G124+G028−G208−G104+G008,
(5)
E4=G218+G114+G018−G2216−G128−G0216−G2016−G1081G0016,
(6)
E5=(G22−G02)−(G20−G00)4=(G22−G20)−(G02−G00)4,
(7)
E6=(2G21−G22−G20)−(2G01−G02−G00)4,
(8)
E7=(2G12−G22−G02)−(2G10−G20−G00)4,
(9)
E8=2(2G11−G21−G01)−(2G12−G22−G02)4−(2G10−G20−G00)4=2(2G11−G12−G10)−(2G21−G22−G20)4−(2G01−G02−G00)4.
(10)
If the two genes are in linkage equilibrium, E_{0} is the mean of the genotypic values, G¯.. , and therefore can be denoted as μ. Coefficient E_{1} is equivalent to (G¯2.−G¯0.)∕2 , which is one-half of the difference in genotypic value between the two homozygote means of AA and aa and thus is defined as the genetic parameter of additive effects of gene A, a_{1}. Coefficient E_{2} is equivalent to (2G¯1.−G¯2.−G¯0.)∕2 , which represents the departure in genotypic value of the heterozygote mean of Aa from the midpoint between the two homozygote means of AA and aa and thus is defined as the genetic parameter of dominance effect of gene A, d_{1}. The same argument leads us to define coefficients E_{3} and E_{4} as the genetic parameters of additive and dominance effects of gene B, a_{2} and d_{2}. If the substitution effects at one locus depend on genotypes at the other locus, there is an interaction between the two genes in the usual sense. Coefficient E_{5} quantifies the difference between additive effects of gene A (gene B), (G_{2*} − G_{0*})/2 [(G_{*2} − G_{*0})/2], in the background of two different homozygotes of gene B (gene A), BB and bb (AA and aa), and this difference is defined as the genetic parameter of additive × additive epistatic effect, i_{aa}. The larger the difference is, the stronger the interaction is. The same argument leads to the definitions of E_{6}, E_{7}, and E_{8} as the genetic parameters of additive × dominance, i_{ad}; dominance × additive, i_{da}; and dominance × dominance, i_{dd}; epistatic effects between genes A and B. The definitions of these nine genetic parameters are summarized in Table 2. After defining the genetic parameters of genetic effects, Equation 1 can be expressed more succinctly as
Gij=μ+a1x1+d1z1+a2x2+d2z2+iaawaa+iadwad+idawda+iddwdd,
(11)
by defining the coded variables as
x1={1ifAisAA0ifAisAa−1ifAisaa,}x2={1ifBisBB0ifBisBb−1ifBisbb,}z1={1∕2ifAisAa−1∕2otherwise,}z2={1∕2ifBisBb−1∕2otherwise,}waa=x1×x2,wad=x1×z2,wda=z1×x2,wdd=z1×z2.
The coded variables of this model are mutually independent to each other due to orthogonality. The model
can also be represented in a different form as Table 3. Note that the marginal means of the three genotypes, G¯2 , G¯1 , and G¯0 , for locus A are μ + a_{1} − d_{1}/2, μ + d_{1}/2, and μ − a_{1} − d_{1}/2, respectively, as the segregation ratio is 1:2:1. There are similar forms for locus B. The grand mean G¯.. is equivalent to μ.
TABLE 1
The eight orthogonal contrast scales (W's) for the F_{2} population
TABLE 2
Definition of genetic parameters
Genetic variance structure: When applying Cockerham's model to modeling genotypic values in a population, the structure of variance components for the total genetic variance, V_{G}, contributed by the two genes, each with two alleles, is shown in appendix c. From appendix c, we can see that the total genetic variance is composed of genetic variance of individual effects and covariances between different effects, and it will change with gene frequencies (p's) and linkage disequilibrium (D). Certainly, the relative strengths of genetic effects will vary according to the change in gene frequency and linkage disequilibrium. For an F_{2} population (p_{A} = p_{B} = 0.5), the total genetic variance reduces to Equation 34 and contains covariances between different genetic effects through linkage. If genes are unlinked in the F_{2} population (p_{A} = p_{B} = 0.5 and D = 0), the total genetic variance can be partitioned into eight independent components without covariance as
VG=12a12+14d12+12a22+14d22+14iaa2+18iad2+18ida2+116idd2.
(12)
Each variance component is contributed by its own genetic parameter. For example, the additive variance component of gene A, a12∕2 , is contributed by its additive effect, a_{1}, and it has no genetic covariance with other effects. This property greatly facilitates the evaluation of the contribution of an effect to the genetic variance. The other models, such as F_{∞}-metric and mixed-metric models, do not have such a property (see Equation 18).
Linkage disequilibrium:The coded variables in Cockerham's
model (the scales in Table 1) are orthogonal and contrast to each other when the ratio of genotypic frequencies is 1:2:1:2:4:2:1:2:1 (genes are unlinked) in an F_{2} population. Therefore, the definition of the genetic parameters in Table 2 is appropriate for interpreting the gene effects and the genetic variance can be partitioned (Equation 12) as if genes are unlinked. If there is segregation distortion and/or linkage, the ratio will deviate from 1:2:1:2:4:2:1:2:1 (Table 6) and there will be covariances between some genetic effects (Equation 34). To take linkage disequilibrium into account in using Cockerham's model, we introduce statistical parameters to contrast with genetic parameters in interpreting gene effects when genes are in linkage disequilibrium (see next section).
TABLE 3
Cockerham's model (the F_{2}-metric model)
TABLE 4
The F_{∞}-metric model
F_{∞}-metric and mixed-metric models:The F_{∞}-metric model can be expressed in equation form as Equation 11 by coding
x1={1ifAisAA0ifAisAa−1ifAisaa,}x2={1ifBisBB0ifBisBb−1ifBisbb,}z1={1ifAisAa0otherwise,}z2={1ifBisBb0otherwise,}
and w_{aa} = x_{1} × x_{2}, w_{ad} = x_{1} × z_{2}, w_{da} = z_{1} × x_{2}, and w_{dd} = z_{1} × z_{2}, where the coded variables for epistasis are just the products of marginal variables. Equivalently, the F_{∞}-metric model can be illustrated by Table 4. It is easy to check that the coded variables of the F_{∞}-metric model do not have the property of orthogonal contrast. Also, the marginal means of one locus are involved in the genetic parameter of another locus and their epistasis parameters, and the difference in genotypic values between the two homozygotes is not equal to the genetic parameter of additive effect a_{1} (a_{2}). For example, G¯2.=(μ+a1+d2∕2+iad∕2) , (G¯2.−G¯0.)=a1+iad∕2 , and G¯..=μ+d1∕2+d2∕2+idd∕4 (Table 4). This result deviates from the usual definition in the one-locus analysis. The solutions of the marginal genetic parameters, a_{1}, d_{1}, a_{2}, d_{2}, in terms of the genotypic values for the F_{∞}-metric model are
μ=G22+G20+G02+G004,
(13)
a1=(G22+G20)−(G02+G00)4,
(14)
d1=2(G12−G10)−(G22+G20)−(G02+G00)4,
(15)
a2=(G22+G02)−(G20+G00)4,
(16)
d2=2(G21−G01)−(G22+G02)−(G20+G00)4,
(17)
and the solutions of epistasis genetic parameters, i_{aa}, i_{ad}, i_{da}, and i_{dd}, are the same as those in Cockerham's model. Apparently, most of the heterozygotes are excluded in the estimation of μ and marginal parameters, making the F_{∞}-metric model difficult in interpreting the gene action for the F_{2} population.
The equation form for the mixed-metric model, which is a mixture of Cockerham's model and the F_{∞}-metric model with the first part of marginal effects from the F_{∞}-metric model and the latter part of epistatic effects from Cockerham's model, is trivial (not shown), and it is tabulated in Table 5. The coded variables of the mixed-metric model are orthogonal, but not contrasts. Except
for μ, the solutions of the genetic parameters of the mixed-metric model are the same as those of Cockerham's model. The solution of μ is not equal to G¯.. . By subtracting d_{1}/2 + d_{2}/2, the mixed-metric model will become Cockerham's model. In Table 5, the marginal means of one locus involve the dominance effect of another locus, which deviates from the one-locus analysis. For example, the marginal mean of genotype AA, G¯2 , is μ + a_{1} + d_{2}/2. Except for μ, the solutions of the genetic parameters of the mixed-metric model are the same as those of Cockerham's model.
TABLE 5
The mixed-metric model
As the F_{∞}-metric model is not an orthogonal model, the total genetic variance contributed by two genes in linkage equilibrium is
VG=12a12+14d12+12a22+14d22+14iaa2+14iad2+14ida2+316idd2+12a1iad+12a2ida−14d1iaa−14d2iaa+14d1idd+14d2idd,
(18)
which consists of the covariances between marginal and epistatic gene effects. These covariances make the evaluation of the contribution of an individual effect to the total genetic variance difficult. The genetic variance structure of the mixed-metric model is the same as that of Cockerham's model. Note that the genetic variance structures of Cockerham's model and the F_{∞}-metric model cannot be translated to each other by adding or subtracting a constant value, and therefore they are different models from this point.
MODELING QUANTITATIVE TRAITS
When applying Cockerham's model to analyze a quantitative trait, controlled by two epistatic genes A and B, from a sample of size n of an F_{2} population, the trait value of the kth individual with genotype ij can be modeled as
yijk=Gij+εijk=μ+a1x1+d1z1+a2x2+d2z2+iaawaa+iadwad+idawda+iddwdd+εijk,
(19)
where ε_{ijk} is a residual. Let P∼ij and n_{ij} denote the observed frequency and sample size of genotype ij where nij=n×P∼ij . In expectation, E(n_{ij}) = n × P_{ij} and E(y_{ij.}) = n × P_{ij} × G_{ij}, where P_{ij} is the population frequency of genotype ij and depends on the linkage strength between genes (Table 6). Note that the ratio of P_{ij}'s reduces to 1:2:1:2:4:2:1:2:1 if genes are unlinked (D = 0).
Least-squares estimates of genetic parameters: The least-squares estimates (LSE) of the genetic parameters in Equation (19) have similar formulations as those of Equations 2, 3, 4, 5, 6, 7, 8, 9 and 10 except that G_{ij} is replaced with y¯ij . For example, the LSE of a_{1} is
a^1=y‒22.8+y‒21.4+y‒20.8−y‒02.8−y‒01.4−y‒00.8.
(20)
When genes are unlinked (the segregation ratio is 1:2:1:2:4:2:1:2:1), the expectation of a^1 is
E(a^1)=G‒2.−G‒0.2,
(21)
which corresponds to the additive effect of gene A.
However, when genes are linked, a^1 is not a measure of the difference between the two homozygote means as the ratio is no longer 1:2:1:2:4:2:1:2:1. Likewise, the LSE of the genetic parameters are appropriate estimates of the nine gene effects when genes are unlinked, but they are not appropriate estimates when genes are linked. To remedy this problem, statistical parameters of gene effects are introduced for interpretation in contrast to genetic parameters of gene effects.
TABLE 6
Genotypic frequencies (P's) in terms of allele frequencies (p's) and the linkage disequilibrium coefficient (D)
Statistical parameters of gene effects: When the derivatives of the error sum of squares in Equation 19 with respect to every genetic parameter in turn are set equal to zero, it gives the nine normal equations. For example, the normal equation with respect to a_{1} is
y2.−y0.=(n22+n21+n20−n02−n01−n00)μ^+(n22+n21+n20+n02+n01+n00)a^1+(−n22−n21−n20+n02+n01+n00)d^12+(n22−n20−n02+n00)a^2+(−n22+n21−n20+n02−n01+n00)d^2+(n22−n20+n02−n00)i^aa+(−n22+n21−n20−n02+n01−n00)i^ad2+(−n22+n20+n02−n00)i^da2+(n22−n21+n20−n02+n01−n00)i^dd4.
By taking expectation, the expected normal equations can be obtained and expressed in terms of genotypic values G_{ij}'s, population genotypic frequencies P_{ij}'s, and genetic parameters E's, as shown from Equations A1, A2, A3, A4, A5, A6, A7, A8 and A9 (appendix a). For simplicity, the left-hand sides of these nine expected normal equations are denoted as β_{0}, β_{1}, … , β_{8}. Then, Equation A1 can be written as
β0=P22G22+P21G21+P20G20+P12G12+P11G11+P10G10+P02G02+P01G01+P00G00,
which is the mean genotypic value in the population. Equation A2 can be written as
β1=P22G22+P21G21+P20G20−P02G02−P01G01−P00G00P22+P21+P20+P02+P01+P00+,
and it can be reformulated as
β1=(P22G22+P21G21+P20G20)∕(P22+P21+P20)2−(P02G02+P01G01+P00G00)∕(P02+P01+P00)2=G‒2.−G‒0.2
since P_{22} + P_{21} + P_{20} = P_{02} + P_{01} + P_{00} = ¼ in the F_{2} population. That is, β_{1} quantifies one-half of the difference in genotypic value between the two homozygote means of gene A; i.e., β_{1} is a quantity to measure the additive effect of gene A, no matter whether genes are in linkage equilibrium or not. Further, as the genotypic frequencies of gene A (B) have relationship 2P_{2.} = 2P_{0.} = P_{1.} = ½(2P_{.2} = 2P_{.0} = P_{.1} = ½) in the F_{2} population despite linkage,
β2=2(P12G12+P11G11+P10G10−P22G22−P21G21−P20G20−P02G02−P01G01−P00G00)=2(P12G12+P11G11+P10G10)∕(P12+P11+P10)2−(P22G22+P21G21+P20G20)∕(P22+P21+P20)2−(P02G02+P01G01+P00G00)∕(P02+P01+P00)2=2G‒1.−G‒2.−G‒0.2
(23)
which measures the dominance effect of gene A. For epistasis parameters,
β5=(P22G22−P20G20)−(P02G02−P00G00)P22+P20+P02+P00=(P22G22−P02G02)−(P20G20−P00G00)P22+P20+P02+P00,
which is a weighted version of the additive × additive epistasis. When genes are unlinked, β_{5} reduces to the genetic parameter i_{aa}. When genes are linked, the genetic parameter i_{aa} is still valid for the additive × additive effect since marginal means are not involved in it. Similarly, the genetic parameters i_{ad}, i_{da}, and i_{dd} are still appropriate to measure the additive × dominance, dominance × additive, and dominance × dominance effects under linkage disequilibrium, and β_{6}, β_{7}, and β_{8} are weighted versions of the epistatic effects, and they all reduce to i_{ad}, i_{da}, and i_{dd} if genes are in linkage equilibrium.
Given genotypic values G's, the quantities, β's, will have different values according to different strengths of linkage (ratios of genotypic frequencies). On the contrary, the genetic parameters, E's, will not change according to different strengths of linkage. Therefore, we define β's as statistical parameters to contrast with the genetic parameters of gene effects. The genetic parameters can be obtained directly from Cockerham's model, but the statistical parameters cannot. However, there exists a one-to-one relationship between the two kinds of parameters as shown below. It allows that once the genetic parameters are obtained from the model the statistical parameters can be obtained by transformation.
Relationship between genetic and statistical parameters: In a population, the frequency of the gamete AB, P_{AB}, can be expressed in terms of allele frequencies (p's) and the linkage disequilibrium coefficient D (Weir 1996) as
PAB=pApB+D,
where D is equivalent to (1 − 2r)/4 (r is the recombination fraction between loci A and B). If the union of gametes is random, the genotypic frequencies P_{ij}'s are products of gametic frequencies (Table 6). The expected normal equations from Equations A1, A2, A3, A4, A5, A6, A7, A8 and A9 can be further expressed in terms of the genetic parameters (E's), the statistical parameters (β's), the population allele frequencies (p's), and the linkage disequilibrium coefficient D as shown in Equations B1, B2, B3, B4, B5, B6, B7, B8 and B9 in appendix b. In the F_{2} population, the allele frequencies p_{A}, p_{a}, p_{B}, and p_{b} are one-half, and the nine expected normal equations in appendix b reduce to the following:
β0=μ+λ2iaa+λ2idd4,
(24)
β1=a1+λa2−λ2iad2−λida2,
(25)
β3=λa1+a2−λiad2−λ2ida2
(26)
β22=d12+λ2d22−λ2iaa,
(27)
β42=λ2d12+d22−λ2iaa,
(28)
β5=41+λ2[λ2μ−λ2d12−λ2d22+(1+λ24)iaa+λ2idd4],
(29)
β62=2(−λ22a1−λ2a2+12iad2+λ2ida2),
(30)
β72=2(−λ2a1−λ22a2+λ2iad2+12ida2),
(31)
β84=λ2μ+λ2iaa+idd4,
(32)
where λ = 1 − 2r. The statistical parameters (β's) are functions of the genetic parameters (E's) and a linkage parameter (λ), and vice versa. The approximation of the genetic parameter to its corresponding statistical parameter depends on the strength of linkage and the sizes of other genetic parameters. In matrix equation, the above equations can be also expressed as
B=TE,
(33)
where
B′=[β0β1β22β3β421+λ24β5β62β72β84]
contains the statistical parameters,
E′=[μa1d12a2d22iaaiad2ida2idd4]
contains the genetic parameters, and
T=[10000λ200λ2010λ00−λ2−λ00010λ2−λ20000λ0100−λ−λ2000λ201−λ2000−λ20−λ20−λ21+λ2400λ20−λ20−λ001λ00−λ0−λ200λ10λ20000λ2001]
is a symmetric and nonsingular matrix with components associated with the linkage parameter λ. The inverse of T is
T−1=1(1−λ2)2×[11+λ20−λ21+λ20−λ21+λ2−2λ00λ41+λ2010−λ00−λ2λ0−λ21+λ2011+λ20λ41+λ22λ00−λ21+λ20−λ0100λ−λ20−λ21+λ20λ41+λ2011+λ22λ00−λ21+λ2−2λ02λ02λ4(1+λ2)00−2λ0−λ20λ001−λ00λ0−λ200−λ10λ41+λ20−λ21+λ20−λ21+λ2−2λ0011+λ2]
(Wolfram 1992). The two kinds of parameters have a one-to-one relationship. When genes are in linkage equilibrium (λ = 0), T is diagonal, and β's are equal to E's. When genes are not in linkage equilibrium (λ ≠ 0), they are different, but transferable.
Random mating: Linkage disequilibrium decays after random mating. If the F_{2} progeny are further randomly mated, linkage disequilibrium is mitigated by a factor 1 − r, 0 < r < 0.5, gradually in each generation. The general formula of the linkage disequilibrium coefficient in generation F_{t} under random mating is
λt=(1−r)t−2λ,
where t ≥ 2 is the number of generations. As t gets larger, λ_{t} approaches zero; i.e., linkage equilibrium will be gradually attained in later generations by random mating. After random mating, λ_{t} changes (becomes smaller), as do the genotypic frequencies (P_{ij}'s), and accordingly the statistical parameters (β's) change and become closer to the genetic parameters (E's). Therefore, the statistical parameters (β's) depend on the population frequencies (P_{ij}'s) and will have different values in different generations. When λ_{t} approaches 0, the ratio of the genotypic frequencies approaches 1:2:1:2:4:2:1:2:1, and the statistical parameters (β's) will approach the genetic parameters (E's). Hence, the genetic parameters of genes in linkage disequilibrium estimated in the F_{2} population can be regarded as the gene effects in later generations when linkage equilibrium is attained.
Variance components: The genetic variance contributed by two genes in the F_{2} population is
Var(G)=12a12+14d12+12a22+14d22+14iaa2+18iad2+18ida2+116(1−λ4)idd2+λa1a2−λ22a1iad−λ2a1ida+λ22d1d2−λ2d1iaa−λ22a2ida−λ2d2iaa+λ4[1−λ2]iaaidd+λ4iadida
(34)
(appendix c). The genetic variance is composed of the variances and covariances of genetic parameters. If genes are in linkage equilibrium or attain equilibrium in later generations by random mating (λ = 0), the covariances disappear and the genetic variance will be partitioned into eight independent components (Equation 12).
QTL MAPPING USING COCKERHAM'S MODEL
In this section, we apply Cockerham's model to construct a statistical epistasis model to map for epistatic QTL and analyze epistasis between QTL. The problems when epistasis is present and ignored in QTL mapping are also investigated. By taking epistasis into account in QTL mapping, the accuracy of estimation and power of detection can be improved.
Mapping epistatic QTL: Assume that a quantitative trait y is controlled by two interacting QTL, Q_{A} and Q_{B}, located at positions p_{1} and p_{2}, in two different intervals, I_{1} and I_{2}. The statistical QTL mapping model can be written as
yi=μ+a1x1∗+d1z1∗+a2x2∗+d2z2∗+iaax1∗x2∗+iadx1∗z2∗+idaz1∗x2∗+iddz1∗z2∗+εi,i=1,2,⋯,n,
(35)
where ε_{i} follows N(0, σ^{2}), and the codes for variables x1∗(x2∗) and z1∗(z2∗) follow the codes of x_{1} (x_{2}) and z_{1} (z_{2}) in Cockerham's model (Equation 11). As Q_{A} (Q_{B}) is not located at the marker, its genotypes, i.e., the value of x1∗ and z1∗ (x2∗ and z2∗ ), are not observable. However, through its flanking markers, the conditional genotypic distribution of Q_{A} (Q_{B}) can be inferred on the basis of Haldane's mapping function (Haldane 1919) as listed in Table 2 of Kao and Zeng (1997). The joint conditional genotypic distribution of Q_{A} and Q_{B} in intervals I_{1} and I_{2} can be obtained using the property of conditional independence between them (Kao and Zeng 1997). Let p_{ij}'s, j = 1, 2, ⋯ , 9, denote the conditional probabilities of the nine possible QTL genotypes for individual i. The likelihood of the statistical model is a mixture of nine normals as
L(μ,a1,d1,a2,d2,iaa,iad,ida,idd,σ2∣Y,X,Z)=Πi=1n[∑j=132pijN(μij,σ2)],
(36)
where p_{ij}'s and μ_{ij}'s are the mixing proportions and genotypic values of the nine genotypes for individual i. To obtain the maximum-likelihood estimates (MLE) of the genetic parameters and their asymptotic variance-covariance matrix for the normal mixture model, the general formulas by Kao and Zeng (1997) based on the expectation-maximization (EM) algorithm (Dempsteret al. 1977) can be used. The general formulas are based on two matrices, the genetic design matrix D and the conditional probability matrix Q. Here, the genetic design matrix is a matrix with dimension 9 × 8 as
D=[W1W2W3W4W5W6W7W8]
(37)
where W_{i}'s, i = 1, 2, ⋯ , 9, are the orthogonal contrast scales of Cockerham's model in Table 1, and the conditional probability matrix Q is a 9^{2} × 3^{2} matrix with elements associated with the mixing proportions. By applying the matrices D and Q to the general formulas, the MLE and the asymptotic variance-covariance matrix can be obtained.
The proposed statistical QTL mapping model in Equation 35 can be used to search for epistatic QTL as well as to analyze epistasis between QTL by taking epistasis into account. In QTL mapping, we usually first search for QTL by ignoring epistasis. When epistasis is ignored, the accuracy in estimation and power of detection could be affected (see below). Thus, it is very likely that the detected epistatic QTL are those with relatively large marginal effects and the undetected epistatic QTL are those with relatively minor marginal effects. By taking epistasis into account, Equation 35 can be used to search for the undetected minor epistatic QTL by testing hypotheses
H0:a2=d2=iaa=iad=ida=idd=0;H1:atleastoneofthemisnot0;
(38)
given the detected QTL with marginal effects a_{1} and d_{1} in the model. Note that hypotheses in (38) can consider only additive effect and a part of the four epistatic effects in testing. Alternatively, Equation 35 can be used to test for the existence of epistasis between two detected QTL by setting hypotheses
H0:iaa=iad=ida=idd=0;H1:atleastoneofthemisnot0;
(39)
given their marginal effects in the model. Certainly, the hypotheses in (39) can contain individual epistasis parameters in the analysis. The hypotheses in (38) and (39) can be tested using the likelihood-ratio test (LRT) statistic,
LRT=−2logL0L1,
where L_{0} and L_{1} are the likelihoods under H_{0} and H_{1}. The critical value of the LRT statistic for rejecting H_{0} can be chosen from χ^{2} distribution on the basis of the Bonferroni argument.
What are the problems if epistasis is present and ignored? Although epistasis is an ubiquitous phenomenon (Wright 1980), many QTL mapping methods ignore epistasis in the analysis for simplicity. It is important to investigate the problems if epistasis is present and ignored and further to solve the problems and analyze epistasis in QTL mapping. When the quantitative trait affected by the two epistatic QTL, Q_{A} and Q_{B}, is regressed on a marker M along the genome to infer QTL, under Cockerham's model, the regression coefficient for the additive effect of M is
aM=(1−2rAM)a1+(1−2rBM)a2−12(1−2rAB)(1−2rBM)iad−(1−2rAB)(1−2rAM)ida,
(40)
where r_{AM}, r_{BM}, and r_{AB} are the recombination fractions between Q_{A} and M, Q_{B} and M, and Q_{A} and Q_{B}, respectively, and the regression coefficient for the dominance effect is
dM=(1−2rAM)2d1+(1−2rBM)2d2−(1−2rAM)(1−2rBM)iaa.
(41)
If the marker M is coincident with Q_{A}, the coefficient a_{M}, which reduces to the estimate of additive effect of Q_{A}, is confounded by the additive effect of Q_{B} and their epistatic effects, i_{ad} and i_{da}, via linkage, and the coefficient d_{M}, which is the estimate of dominance effect of Q_{A}, is confounded by the dominance effect of Q_{B} and their epistatic effects, i_{aa}, via linkage. When the quantitative trait is regressed on both Q_{A} and Q_{B}, the partial regression coefficient for the additive effect of Q_{A}, given the additive effect of Q_{B} is
aA.Ba=a1−12(1−2rAB)ida,
(42)
and the partial regression coefficient for the dominance effect of Q_{A} given the dominance effect of Q_{B} is
dA.Bd=d1−(1−2rAB)1+(1−2rAB)2iaa.
(43)
Again, the partial regression coefficients a_{A.Ba} and d_{A.Bd} are confounded by their epistasis, i_{da} and i_{aa}, respectively, via linkage. If Q_{A} and Q_{B} are unlinked (r_{AB} = 0.5), the confounding of epistasis disappears and the coefficients (Equations 40, 41, 42 and 43) are all unbiased for a_{1} and d_{1}. It implies that if epistasis between QTL is present and ignored in QTL mapping, the estimation of the marginal effects and positions of QTL are asymptotic unbiased if the epistatic QTL are unlinked. But, if the epistatic QTL are linked, the estimates of QTL positions and marginal effects are biased and confounded by epistatic effects via linkage. This unbiasedness property for unlinked QTL attributes to the orthogonal property of Cockerham's model. The approaches of interval mapping (Lander and Botstein 1989; Jansen 1993; Zeng 1994; Kaoet al. 1999), which test every position within marker intervals along the entire genome for QTL detection, share the same problems and properties under Cockerham's model.
The similar investigation on the problems if epistasis is present and ignored in QTL mapping can also be done for the F_{∞}-metric and mixed-metric models. Under the F_{∞}-metric model, the regression coefficient for the additive effect of a marker M is
aM=(1−2rAM)a1+(1−2rBM)a2+12(1−2rAM)[1−(1−2rBM)2]iad+12(1−2rBM)(1−(1−2rAM)2)ida,
(44)
and the regression coefficient for the dominance effect of a marker M is
dM=(1−2rAM)2d1+(1−2rBM)2d2−(1−2rAM)(1−2rBM)iaa+12(1−2rAM)2(1−2rBM)2idd.
(45)
If the marker M is coincident with Q_{A}, the coefficients, a_{M} and d_{M}, reduce to the estimates of the additive and dominance effects of Q_{A}. The estimate of the additive effect of Q_{A} is confounded by the additive effect of Q_{B}, a_{2}, and their epistatic effects, i_{ad} and i_{da}, and the estimate of dominance effect of Q_{A} is confounded by the dominance effect of Q_{B} and their epistatic effects, i_{aa} and i_{dd}. When the quantitative trait is regressed on both Q_{A} and Q_{B}, the partial regression coefficient for the additive effect of Q_{A} given the additive effect of Q_{B} is
aA.Ba=a1+12iad−12(1−2rAB)ida,
(46)
and the partial regression coefficient for the dominance effect of Q_{A} given the dominance effect of Q_{B} is
dA.Bd=d1−1−2rAB1+(1−2rAB)2iaa+12idd.
(47)
Again, the partial regression coefficients of the additive and dominance effects are confounded by epistatic effects, and they are biased estimates of the additive and dominance effects. If r_{AB} = 0.5, the four coefficients in Equations 44, 45, 46 and 47 are still biased. For example, when r_{AB} = 0.5, a_{A} = a_{1} + i_{ad}/2 and a_{A.Ba} = a_{1} + i_{ad}/2, which are all biased estimates of a_{1}. Therefore, the F_{∞}-metric model always has the problems of confounding and is biased in estimation if epistasis is present and ignored whether the QTL are linked or not. This implies that QTL mapping could be problematic for the F_{∞}-metric model if epistasis is ignored. As the mixed-metric model is also orthogonal, it possesses the same properties as those of Cockerham's model in the QTL analysis.
When epistasis is present and ignored in QTL mapping, the genetic variance contributed by epistasis is not controlled in the model and becomes a part of the genetic residual. Thus, the sampling variances of the effects are inflated, and the power of detecting QTL decreases. If epistasis is taken into account, the epistatic variance can be controlled, and the power will increase. The increase in power depends on the size of the epistatic effect. The larger the epistatic effect that can be controlled in mapping, the larger the increase in power that can be gained. In conclusion, by taking epistasis into account in QTL mapping, the chance of finding more QTL and the accuracy of estimating QTL positions and effects can be improved.
ADVANTAGES OF COCKERHAM'S MODEL
Cockerham's model has several advantages in the study of epistasis as compared to the F_{∞}-metric and mixed-metric models. When genes are in linkage equilibrium, the advantages include the following:
The genetic variance can be partitioned into eight independent components (Equation 12), and there is no genetic covariance. Each component is contributed by its corresponding genetic parameter. This is a desirable property in modeling. On the contrary, the F_{∞}-metric does not have such a property (Equation 18).
The marginal means of one locus do not involve the parameters of another locus and the epistasis parameters, which would make Cockerham's model readily interpretable (Table 3). The marginal means of locus A are (a_{1} − d_{1}/2), d_{1}/2, and (−a_{1} − d_{1}/2), which correspond to the one-locus analysis (differing by a constant d_{1}/2) despite epistasis. For the F_{∞}-metric model, the marginal means of locus A are (a_{1} + d_{2}/2 + i_{ad}/2), (d_{1} + d_{2}/2 + i_{dd}/2), and (−a_{1} + d_{2}/2 − i_{ad}/2), which are confounded by the genetic parameter of dominance effect of locus B (d_{2}) and their epistasis parameters, i_{ad} and i_{dd}. In the mixed-metric model, the marginal means of locus A are a_{1} + d_{2}/2, d_{1} + d_{2}/2, and −a_{1} + d_{2}/2, which are confounded by the genetic parameters of dominance effect of locus B (d_{2}). Both the F_{∞}-metric and mixed-metric models do not follow the definition in the one-locus analysis.
The difference between the two homozygote means, (G_{2.} − G_{0.})/2[(G_{.2} − G_{.0})/2], estimates the genetic parameter a_{1} (a_{2}) of locus A (B), and the departure of the heterozygote mean to the midpoint between the two homozygote means, (2G_{1.} − G_{2.} − G_{0.})/2[(2G_{.1} − G_{.2} − G_{.0})/2], estimates the genetic parameter d_{1} (d_{2}) of locus A (B). They follow the same definition of additive and dominance effects in the one-locus analysis. In the F_{∞}-metric model, they estimate a_{1} + i_{ad}/2 (a_{2} + i_{da}/2) and d_{1} + i_{dd}/2 (d_{2} + i_{dd}/2) and violate the definition in the one-locus analysis.
With the orthogonal property, the estimation of one genetic (marginal or epistatic) effect will not be affected by the presence or absence of other genetic effects in the model. Essentially, when epistasis is
present and ignored, the estimation of the marginal effects and the location of epistatic QTL is still asymptotically unbiased and not affected by epistasis. This advantage ensures that QTL mapping can be first performed without taking epistasis into account without causing a problem under Cockerham's model. The F_{∞}-metric model does not have such property (see qtl mapping using cockerham's model).
TABLE 7
The means of trait LBIL in the F_{2} population from Doebley et al. (1995)
EXAMPLES
In this section, real and simulated data were used to illustrate Cockerham's model, compare the differences between Cockerham's model and other models, verify the properties in statistical estimation, and map for epistatic QTL.
Real data: Doebley et al. (1995) crossed two corn inbred lines, Teosinte-M1L × Teosinte-M3L, to generate 183 F_{2} progeny, and they concluded that two unlinked markers UMC107 (Q_{A}) and BV302 (Q_{B}) are the candidate QTL for trait LBIL (average length of vegetative internodes in the primary lateral branch) in QTL analysis. Among the 183 progeny, 21 individuals have a missing trait and one individual has a missing genotype. Therefore, only the 161 individuals with complete trait and genotype information were used in the analysis. The observed allele frequencies are p^A=0.4720 , p^a=0.5280 , p^B=0.5062 , and p^b=0.4938 . The genotypic frequencies are 0.050, 0.137, 0.019, 0.124, 0.261, 0.149, 0.068, 0.130, and 0.062 for genotypes AABB, AABb, AAbb, AaBB, AaBb, Aabb, aaBB, aaBb, and aabb, respectively, which significantly deviate from the expected frequencies for two unlinked genes. The small sample size of AAbb in 3 individuals is responsible for the deviation.
The observed genotypic means (y¯ij.′s ) and sample sizes (n_{ij}'s) of the data are listed in Table 7. If all eight genetic parameters (full model) are considered, the estimated genetic parameters by Cockerham's model, the F_{∞}-metric model, and the mixed-metric model are listed in Table 9. In Table 9, except for μ, the estimates of the eight genetic parameters by Cockerham's model and the mixed-metric model are the same. Cockerham's model and the F_{∞}-metric model have different estimates of marginal effects, but the same estimates of epistatic effects (see Cockerham's genetic model for the reasons). The estimates of a_{1} and d_{1} are 15.11 (P value 0.0008) and −3.92 (P value 0.5035), respectively, for Cockerham's model, and they are 24.25 (P value 0.0008) and 5.15 (P value 0.5617), respectively, for the F_{∞}-metric model. The estimates of a_{2} and d_{2} are 19.46 (P value 0.0001) and −5.66 (P value 0.3336), respectively, for Cockerham's model, and they are 17.59 (P value 0.0001) and 3.40 (P value 0.3336), respectively, for the F_{∞}-metric model. Very likely, the marginal effects of Q_{A} and Q_{B} are mostly additive, and their dominance effects are not significant. The estimate of i_{aa} is 2.68 (P value 0.7054). Analytically, it means that the additive effects of Q_{B} (Q_{A}) in the background of AA (BB) and aa (bb), which are (Y¯22−Y¯20)∕2=20.27[(Y¯22−Y¯02)∕2=26.93] and (Y¯02−Y¯00)∕2=14.91[(Y¯20−Y¯00)∕2=21.57] , differ by 2.68, and this difference is not statistically significant at the 5% level (Figure 1a). The estimate of i_{ad} is −18.28 (P value 0.0411). Analytically, it means that the dominance effects of Q_{B} in the background of AA and aa, which are 21.68[(2Y¯01−Y¯02−Y¯00)∕2] and −14.88[(2Y¯21−Y¯22−Y¯20)∕2] , are significantly different at the 5% level. The significance of additive-by-dominance interaction can be illustrated by Figure 1b. In Figure 1b, the cross between the two lines tells that genotype Bb performs better than BB in the background of aa, but it does worse in the background of AA. The estimate of i_{da}, 3.75, is not significant (P value 0.6725) as illustrated by the three nearly parallel lines in Figure 1c. The estimate of i_{dd}, −18.13, is not statistically significant at the 5% level (P value 0.1227), although it shows that there is a cross between lines in Figure 1d. The proportion of the genetic variance in the total variance (model R^{2}) is 23.66% (Table 8).
The estimates of the statistical parameters are β^0=55.67 , β^1=8.10 , β^2=4.24 , β^3=21.91 , β^4=3.10 , β^5=8.59 , β^6=0.71 , β^7=−7.52 , and β_{8} = −38.88 following the definitions, or they can be obtained by using Equations A1, A2, A3, A4, A5, A6, A7, A8 and A9 by plugging in observed genotypic frequencies in Table 7 and the nine estimated genetic parameters in Table 9. Although the values of the statistical and genetic parameters are expected to be very close for unlinked genes, they are very different based on this data set. The difference occurs because the observed segregation ratio deviates from the expected segregation ratio.
If only the significant effects, a_{1}, a_{2}, and i_{ad} (reduced model), are considered for Cockerham's model, the estimates of a_{1}, a_{2}, and i_{ad} are 15.27 (SD 4.13, P value
0.0003), 19.13 (SD 4.04, P value < 0.0001), and −18.44 (SD 8.27, P value <0.0271), respectively, which are very close to the estimates in the full model. This shows that the estimation of one genetic parameter in Cockerham's model will not be affected by the presence or absence of other genetic parameters due to its orthogonal property. However, the F_{∞}-metric model does not have such a property. If the reduced model is considered for the F_{∞}-metric model, the estimates of a_{1}, a_{2}, and i_{ad} become 24.48 (SD 6.28, P value 0.0001), 19.12 (SD 4.04, P value < 0.0001), and −18.44 (SD 8.27, P value 0.0271), respectively. The estimate of a_{2} changes from 17.59 in the full model to 19.12 in the reduced model due to the confounding of i_{da}/2 = 3.75/2 (estimated in the full model) by Equation 46. Both models have the same model R^{2} = 0.2121.
Figure 1.
Epistasis plot of the four types of epistasis from the data of Doebley et al. (1995) in Table 7. (a) Additive-by-additive epistasis. (b) Additive-by-dominance epistasis. (c) Dominance-by-additive epistasis. (d) Dominance-by-dominance epistasis.
If only the significant effects are taken into account in calculating the variance components, the additive effect of Q_{A}, a_{1}, contributes ~34.05% to the total genetic variance (Equation 12), the additive effect of Q_{B}, a_{2}, contributes ~52.04% to the total genetic variance, and the epistatic effect, i_{ad}, contributes ~13.90% to the total genetic variance under Cockerham's model. There is no genetic covariance between effects for unlinked loci. The mixed-metric model has the same genetic variance structure as Cockerham's model. The genetic variance and covariance components under the F_{∞}-metric model can be obtained using Equation 18.
Simulation: Assume that a quantitative trait is affected by two unlinked epistatic QTL. The first QTL, Q_{A}, is located at 52 cM on the first chromosome, and the second QTL, Q_{B}, is located at 93 cM on the second chromosome. There are 11 15-cM equally spaced markers on each chromosome. The additive and dominance effects
of Q_{A} are a_{1} = 3 and d_{1} = 1. Q_{B} has additive effect a_{2} = 1 and no dominance effect. The additive-by-additive epistatic effect is i_{aa} = 2, and the other three epistatic effects are assumed to be zero. With these parameter settings, the marginal effects of Q_{A} and Q_{B} contribute 76 and 8% to the total genetic variance, and epistasis contributes 16% to the total genetic variance. The heritability of the quantitative trait is assumed to be 0.2, or equivalently the environmental variance is 25. The sample size is 200, and the number of simulated replicates is 500. When using the statistical model in Equation 35 for QTL mapping, a stepwise selection procedure (Kaoet al. 1999) was adopted to detect QTL and analyze epistasis, and the critical value for claiming significance was chosen as χk,0.05∕202 , where k is the number of parameters in testing.
TABLE 8
Two-way ANOVA of Doebley et al.'s (1995) data in Table 7
TABLE 9
Results from analysis of Doebley et al.'s (1995) data in Table 7 by Cockerham's model, the F_{∞}-metric model, and the mixed-metric model
The simulation results are shown in Table 10. When epistasis is ignored in QTL mapping, the powers to detect Q_{A} and Q_{B} are 1.0 and 0.238 (χ1,0.05∕202=9.14 ; χ2,0.05∕202=11.98 ), respectively. The mean estimates of positions of Q_{A} and Q_{B} are 51.25 with standard deviation (SD) 7.73 and 89.63 with SD 24.19, respectively. The means of the estimated additive and dominance effects of Q_{A} are 2.9941 (SD 0.5969) and 0.9816 (SD 0.9018). The mean estimate of the additive effect of Q_{B} (from significant replicates) is 1.8567 (SD 0.4196), which is poorly estimated. If the mean of the estimated effect of Q_{B} is calculated on the basis of all 500 replicates, it is 1.1214 (SD 0.7956), which is much closer to the true value. This corresponds to the theoretical proof of asymptotical unbiasedness for marginal effect in estimation if epistasis is present and ignored under Cockerham's
model (Equation 42). The estimates of environmental variance and heritability are 24.26 (SD 3.29) and 0.2158 (SD 0.0445), respectively. If epistasis is considered, the powers to detect Q_{A} and Q_{B} are 1.0 and 0.5, respectively. The power of detecting Q_{B} improves from 0.238 without epistasis to 0.5 with epistasis. The mean estimates of positions of Q_{A} and Q_{B} are 50.99 (SD 7.95) and 90.46 (SD 18.67), respectively. The estimated additive and dominance effects of Q_{A} have means 2.9658 (SD 0.5700) and 1.0024 (SD 0.8697), respectively. The mean of the estimated additive effect for Q_{B} is 1.3314 (SD 0.6368) from significant replicates, and it is 1.0447 (SD 0.6941) from all replicates. The mean of the estimated epistatic effects is 1.9897 (SD 0.948). The estimates of environmental variance and heritability are 24.02 (SD 3.50) and 0.225 (SD 0.0494).
TABLE 10
Simulation results of mapping epistatic QTL in the F_{2} population
TABLE 11
The three orthogonal contrast scales (W's) for the backcross population
CONCLUSION AND DISCUSSION
We use the orthogonal contrast scales proposed by Cockerham (1954) to define gene effects and to construct a genetic model, called Cockerham's model, for the study of epistasis between genes. The properties of Cockerham's model in modeling and mapping epistatic genes are investigated, and its variance component structure is also derived when genes are in linkage equilibrium and disequilibrium. The differences between Cockerham's model and other models in analyzing epistasis and mapping epistatic QTL are also compared. There are several advantages of using Cockerham's model in modeling epistasis of genes because of its orthogonal property. The advantages can benefit the study of QTL mapping. The issues of QTL mapping when epistasis is involved are also discussed. Real and simulated examples are used to illustrate Cockerham's model, verify its statistical properties, and map for epistatic QTL.
Parameterization of epistasis: Different types and degrees of epistasis can be also quantified by Cockerham's model. For example, if a_{1} = a_{2} = d_{1} = d_{2} = 3i_{aa}/2 = 3i_{ad}/2 = 3i_{da}/2 = 3i_{dd}/2, the genes show classical complementary interaction with a 9:7 ratio among different genotypic values, and the marginal effects of A and B
contribute 42.86% each and the epistatic effects contribute 14.28% to the total genetic variance. If a_{1} = a_{2} = d_{1} = d_{2} = −i_{aa}/2 = −i_{ad} 2 = −i_{da}/2 = −i_{dd}/2, the genotypes show classical duplicate interaction with a 15:1 ratio. The contributions of marginal effects of A and B and epistatic effects to the total genetic variance are 20, 20, and 60%, respectively. Other classical epistasis, such as recessive, dominant, and suppression, can also be quantified by Cockerham's model. The parameterization of epistasis can facilitate the study of epistasis in quantitative trait analyses.
TABLE 12
Cockerham's epistasis model for the backcross population
Backcross populations: Cockerham's model for a backcross population can be also obtained on the basis of the same orthogonal contrast principle. When two loci A and B are considered, the F_{1} progeny, which produce four different digenic gametes, AB, Ab, aB, and ab, with expected frequencies (1 − r)/2, r/2, r/2, and (1 − r)/2, respectively, are backcrossed to one of the parents. The allele frequencies p_{A}, p_{a}, p_{B}, and p_{b} in the gametes of the F_{1} progeny are one-half. The scales for partitioning the genetic variance of two unlinked genes in the backcross population are listed in Table 11, and Cockerham's model for the backcross population is shown in Table 12 or can be expressed succinctly as
Gij=μ+a1x1+a2x2+w12(x1x2),
(48)
where i (j) is 2 or 1 to denote the genotype AA (BB) or Aa (Bb) or QTL A (B), and x_{1} (x_{2}) is the orthogonal contrast scale ½ or −½ if genotype of A (B) is AA (BB) or Aa (Bb). The unique solutions of μ, a_{1}, a_{2}, and w_{12} are
μ=G22+G21+G12+G114,a1=G22+G21−G12−G112,a2=G22−G21+G12−G112,w12=(G22−G21)−(G12−G11)=(G22−G12)−(G21−G11).
When genes are unlinked, the parameters μ=G¯.. , a1=G¯2.−G¯1. , a1=G¯.2−G¯.1 are the mean and marginal effects, and w_{12} measures the difference in genotypic value between the two genotypes, AA (BB) and Aa (Bb), of the A (B) gene in the background of two different genotypes, BB (AA) and Bb (Aa), of the B (A) gene. Therefore, the parameters μ, a_{1}, a_{2}, and w_{12} are defined as the genetic parameters of the mean, the marginal effects of genes A and B, and their epistatic effect. The advantages and properties of Cockerham's model shown in the F_{2} population are also true in the backcross population. Likewise, the statistical parameters can also be derived to contrast with the genetic parameters for interpreting gene effects in linkage disequilibrium.
Cockerham's model for multiple loci: It is straightforward to extend Cockerham's model to consider multiple loci without further setting the orthogonal contrast scales for partitioning the genetic variance. The extension is based on a regression principle. In regression analysis, interaction among independent variables can generally be described in terms of a product term. Following the same principle, the model considering m loci for a backcross population can be written as
G=μ+∑j=1majxj+∑j<hmwjk(xjxk)+∑j<k<lmwjkl(xjxkxl)+⋯,
where w is the genetic parameter of epistatic effect between genes. The coded variables for the epistasis are just the product of x_{j}'s. For m loci, there are 2^{m} − 1 components consisting of m components for marginal effects and 2^{m} − m − 1 epistatic components. Of the 2^{m} − m − 1 epistatic components there are m(m − 1)/2 second-order epistasis components, m(m − 1)(m − 2)/3 third-order components, etc. Generally, higher order epistasis (order higher than two) contributes very little to the total genetic variation and can be ignored. The extension of Cockerham's model to a multiple-locus model in the F_{2} population is also straightforward, but trivial. Statistical parameters for the multiple-locus model can also be derived.
Cockerham's model distinguishes itself from other models by the property of orthogonal contrast and will show the advantages in partitioning variance, genetic interpretation, statistical estimation, and QTL mapping over the other models as described in this article. When Cockerham's multiple-locus model is used to form the base of a multiple interval mapping (MIM) model (Kaoet al. 1999) for QTL mapping, the likelihood of the MIM model is a 3^{m} (2^{m}) normal mixture for the F_{2} (backcross) population, and the general formulas by Kao and Zeng (1997) can be applied to obtain the MLE and the asymptotic variance-covariance matrix by using the coded variables to set up the genetic design matrix D and the conditional independence property to construct the conditional probability matrix Q. The orthogonalized MIM model can facilitate the search of epistatic QTL, enhance the resolution of QTL mapping, and help outline a better QTL mapping strategy.
Acknowledgments
This article is dedicated to the memory of Dr. C. Clark Cockerham. The authors are grateful to Dr. Christopher Basten for his suggestions and two anonymous reviewers for helpful comments. C.-H.K. was supported by grants NSC90-2313-B-001-019 from the National Science Council, Taiwan, Republic of China. Z.-B.Z. was funded by GM45344 from the National Institutes of Health and no. 9801300 from the U.S. Department of Agriculture, Plant Genome Program.
APPENDIX B
When the genotypic frequencies (P_{ij}'s) are expressed in terms of allele frequencies (p's) and the linkage disequilibrium coefficient (D) as shown in Table 6, and the left-hand sides of the expected normal equations in appendix a are replaced with statistical parameters, they can be expressed as the following equations.
β0=μ−τAa1−τA2d12−τBa2−τB2d22+(τAτB+2D)iaa+(τAτB2+4τBD)iad2+(τA2τB+4τAD)ida2+(τAτB+4D)2idd4,
(B1)
where τ_{A} = p_{a} − p_{A}, τ_{B} = p_{b} − p_{B}, π_{A} = p_{a}p_{A}, π_{B} = p_{b}p_{B} (Weir 1996),
β1=11−2πA{−τAμ+(1−2πA)a1+τAd12+(τAτB+2D)a2+(τAτB2+4DτB)d12−[(1−2πA)τB+2DτA]iaa−[(1−2πA)τB2]+(4τAτBD+8D2)iad−(τAτB+2D)ida−(τAτB2+4DτB)idd4},
(B2)
β3=11−2πB{τBμ+(τAτB+2D)a1+(τBτA2+4DτA)d12+(1−2πB)a2+τBd22−(τA(1−2τB)+2DτB)iaa−(τAτB+2D)iad2−[(1−2πB)τA2+4DτAτB+8D2]ida2−(τBτA2+4DτA)idd4}
(B3)
β22=−τA2μ+τAa1+d12+(τA2τB+4τAD)a2+(τAτB+4D)2d22−(τAτB+2D)iaa−(τAτB2+4DτB)iad2−τBida2−τB2idd4,
(B4)
β42=−τA2μ+(τB2τA+4τBD)a1+(τAτB+4D)2d12τBa2+d22−(τAτB+2D)iaa−τAiad2−(τBτA2+4DτA)ida2−τA2idd4,
(B5)
β5=1(1−2πA)(1−πB)+2τAτBD+4D2×{(τAτB+2D)μ=[(1−2πA)τB+2τAD]a1−(τAτB+2D)d12−(τA(1−2πB)+2DτB)a2−(τAτB+2D)d22+[(1−2πA)(1−2πB)+2τAτBD+4D2]iaa+[(1−2πA)τB+2τAD]iad2+[(1−2πB)τA+2τBD]ida2+(τAτB+2D)idd4},
(B6)
β62=11−2πA{(τAτB2+4DτB)μ=[(1−2πA)τB2+4τAτBD+8D2]a1−(τAτB2+4DτB)d12−(τAτB+2D)a2−τAd22+[(1−2πA)τB+2τAD]iaa+(1−2πA)iad2+(τAτB+2D)ida2+τAidd4},
(B7)
β72=11−2πB{(τBτA2+4DτA)μ−(τAτB+2D)a1−τBd12−[(1−2πA)τB2+4τAτBD+8D2]a2−(τBτA2+4DτA)d22+[(1−2πB)τA+2τBD]iaa+(τAτB+2D)iad2+(1−2πB)ida2+τBidd4},
(B8)
β84={(τAτB2+4D)2μ−(τAτB2+2DτB)a1−τB2d12−[τA2τB+4τAD]a2−τA2d22+(τAτB+2D)iaa+τAiad2+τBida2+idd4}.
(B9)
APPENDIX C
The variance components contributed by two genes each with two alleles in a population are shown (see appendix b for the definitions of π_{A}, π_{B}, τ_{A}, τ_{B}):
(C1)
The total genetic variance is composed of variances and covariances of different genetic parameters. With inbreeding, the genetic variance becomes even more complicated and is provided by Weir and Cockerham (1977).
- Received May 10, 2001.
- Accepted January 9, 2002.
- Copyright © 2002 by the Genetics Society of America