Genetics, Vol. 160, 1243-1261, March 2002, Copyright © 2002

Modeling Epistasis of Quantitative Trait Loci Using Cockerham's Model

Chen-Hung Kao (a) and Zhao-Bang Zeng (b)
(a) Institute of Statistical Science, Academia Sinica, Taipei 11529, Taiwan, Republic of China
(b) Bioinformatics Research Center, Departments of Statistics and Genetics, North Carolina State University, Raleigh, North Carolina 27695-7566

Corresponding author: Chen-Hung Kao, Academia Sinica, Taipei 11529, Taiwan, Republic of China. E-mail: chkao{at}stat.sinica.edu.tw

Communicating editor: M. A. F. NOOR


*  ABSTRACT
*TOP
*ABSTRACT
*COCKERHAM'S GENETIC MODEL
*MODELING QUANTITATIVE TRAITS
*QTL MAPPING USING COCKERHAM'S...
*ADVANTAGES OF COCKERHAM'S MODEL
*EXAMPLES
*CONCLUSION AND DISCUSSION
*APPENDIX A
*APPENDIX B
*APPENDIX C
*LITERATURE CITED

We use the orthogonal contrast scales proposed by Cockerham to construct a genetic model, called Cockerham's model, for studying epistasis between genes. The properties of Cockerham's model in modeling and mapping epistatic genes under linkage equilibrium and disequilibrium are investigated and discussed. Because of its orthogonal property, Cockerham's model has several advantages in partitioning genetic variance into components, interpreting and estimating gene effects, and application to quantitative trait loci (QTL) mapping when compared to other models, and thus it can facilitate the study of epistasis between genes and be readily used in QTL mapping. The issues of QTL mapping with epistasis are also addressed. Real and simulated examples are used to illustrate Cockerham's model, compare different models, and map for epistatic QTL. Finally, we extend Cockerham's model to multiple loci and discuss its applications to QTL mapping.


GENES interact when they express their effects; i.e., the effects of genotypes at one locus depend on which genotypes are present at other loci. Interaction (epistasis) between genes affecting qualitative trait variation has been recognized since the work of Gregor Mendel in 1865. Although evidence of epistasis between genes controlling quantitative traits [quantitative trait loci (QTL)] has been reported using traditional techniques, such as variance component analyses (BRIM and COCKERHAM 1961; LEE et al. 1968; STUBER and MOLL 1971), epistasis between individual QTL has generally been difficult to discern with those techniques. Recent advances in molecular biology have allowed fine-scale genetic marker maps of various organisms to be constructed for the study of individual QTL. Using such maps, statistical methods for estimating the positions and effects of individual QTL (QTL mapping) have been proposed (LANDER and BOTSTEIN 1989; JANSEN 1993; ZENG 1994; KAO et al. 1999; SEN and CHURCHILL 2001). The problem of epistasis has been considered in some QTL mapping studies (e.g., STUBER et al. 1992; CHEVERUD and ROUTMAN 1995; DOEBLEY et al. 1995; COCKERHAM and ZENG 1996; KAO et al. 1999; GOODNIGHT 2000; ZENG et al. 2000), but not sufficiently, and many theoretical and statistical issues involved with epistasis have not been discussed. Here, we discuss a genetic model, called Cockerham's model, in relation to QTL mapping with epistasis. We also investigate the model properties under linkage disequilibrium.

FISHER 1918 first partitioned genetic variance into components corresponding to additive, dominance, and epistatic variances using the least-squares principle. COCKERHAM 1954 further partitioned the epistatic variance into components using orthogonal contrasts. KEMPTHORNE (1957) and HAYMAN and MATHER 1955 adopted the same epistasis model. HAYMAN and MATHER 1955 and MATHER 1967 proposed other models of epistasis. VAN DER VEEN 1959 reviewed the genetic models of digenic epistasis published by then and summarized them into three categories: the F2-metric, F∞-metric, and mixed-metric models.

Later, CROW and KIMURA 1970, MATHER and JINKS 1982, HALEY and KNOTT 1992, and KEARSEY and POONI 1996 applied the F∞-metric model to the study of epistasis between genes, and GOODNIGHT 2000 adopted an alternative model modified from COCKERHAM 1954 to study gene interaction. Although these three models can be translated to each other by addition or subtraction of a constant (see Table 1 of the VAN DER VEEN 1959 article), they have different meanings in interpreting gene effects, show different structures of variance components, and possess different properties in statistical estimation, which may affect the study of QTL as shown in this article.


 

 
Table 1. The eight orthogonal contrast scales (W's) for the F2 population

In this article, we start from the traditional partition of genetic variance into variance components using COCKERHAM's (1954) orthogonal contrasts, then lead up to a definition of the genetic parameters for genetic effects, and present Cockerham's epistasis model. The properties of Cockerham's model in modeling and mapping epistatic genes are investigated when genes are in linkage equilibrium and disequilibrium. Cockerham's model is compared with the other models, and its advantages are discussed. We show that Cockerham's model is more appropriate than the other models for the study of epistasis between genes and for QTL mapping in populations such as the F2 and backcross. Real and simulated examples are used to illustrate Cockerham's model, compare different genetic models in the analysis of epistasis between genes, and map for epistatic QTL. Finally, we generalize Cockerham's model to multiple loci and discuss its applications to QTL mapping.


*  COCKERHAM'S GENETIC MODEL

COCKERHAM 1954 used eight orthogonal contrast scales to partition the genetic variance contributed by two genes into eight components and to define the genotypic value of a genotype to find the correlation between relatives in a population. His definition of genotypic value using the orthogonal scales leads the way to construct a genetic model, which is called Cockerham's model, for modeling epistasis and defining gene effects in a population. In this section, the orthogonal contrast scales are introduced to present Cockerham's model, and the genetic parameters of Cockerham's model are defined. The similarities and differences between Cockerham's model and alternative models are compared, and their variance component structures are presented.

Orthogonal contrasts:
Assuming that allele frequencies at one locus are uncorrelated with frequencies at another locus (two loci are in linkage equilibrium), COCKERHAM 1954 partitioned the genetic variance caused by two loci, A and B, each with two alleles (A, a, and B, b), of a diploid organism using the orthogonal contrast scales in Table 2 of his article. The scales Wt's, which are functions of genotypic frequencies pij's, have to satisfy two requirements

where i (j), taking the value 2, 1, or 0, refers to the genotype AA (BB), Aa (Bb), or aa (bb) at locus A (B), and Wtij is the scale component of genotype ij for the tth contrast. The first requirement ensures that deviations around the mean are compared (the scales Wtij's are contrasts). The second requirement ensures that the contrasts are orthogonal. W1 and W2 (W3 and W4) are the linear and quadratic orthogonal contrasts for locus A (locus B). W5 is the linear x linear contrast. W6 is the linear x quadratic contrast. W7 is the quadratic x linear contrast. W8 is the quadratic x quadratic contrast. Cockerham's orthogonal contrast scales serve the same purpose as the orthogonal contrasts for partitioning the sum of squares due to treatment into independent single-degree-of-freedom components in experimental design (STEEL and TORRIE 1981). The statistical linear and quadratic terms correspond to the genetical additive and dominance terms, respectively. Cockerham used these orthogonal scales to partition the genetic variance and find the partition of variance σ²_t due to orthogonal scale Wt by

where Gij denotes the genotypic value of the genotype ij. He also defined Gij in terms of the scales as

(1)

where Et's are the corresponding coefficients, by solving the equations themselves, and used it to find the correlation between relatives in a population. His idea of defining the genotypic value by the orthogonal contrast scales leads up to Cockerham's genetic model for modeling epistasis between genes.


 

 
Table 2. Definition of genetic parameters

Cockerham's genetic model:
We now apply Cockerham's orthogonal contrast scales to the F2 population to derive Cockerham's model for the F2 population. For an F2 population, the genotypic frequencies of the nine genotypes AABB, AABb, AAbb, AaBB, AaBb, Aabb, aaBB, aaBb, and aabb are 1/16, 1/8, 1/16, 1/8, 1/4, 1/8, 1/16, 1/8, and 1/16, respectively, and Cockerham's orthogonal contrasts can be modified as shown in Table 1 (see also COCKERHAM and ZENG 1996). By solving Equation 1 with the scales in Table 1, the unique solutions of the coefficients in terms of the genotypic values are

(2)


(3)


(4)


(5)


(6)


(7)


(8)


(9)


(10)

If the two genes are in linkage equilibrium, E0 is the mean of the genotypic values, Ḡ.., and therefore can be denoted as µ. Coefficient E1 is equivalent to (Ḡ2. - Ḡ0.)/2, which is one-half of the difference in genotypic value between the two homozygote means of AA and aa and thus is defined as the genetic parameter of additive effect of gene A, a1. Coefficient E2 is equivalent to Ḡ1. - (Ḡ2. + Ḡ0.)/2, which represents the departure in genotypic value of the heterozygote mean of Aa from the midpoint between the two homozygote means of AA and aa and thus is defined as the genetic parameter of dominance effect of gene A, d1. The same argument leads us to define coefficients E3 and E4 as the genetic parameters of additive and dominance effects of gene B, a2 and d2. If the substitution effects at one locus depend on genotypes at the other locus, there is an interaction between the two genes in the usual sense. Coefficient E5 quantifies the difference between additive effects of gene A (gene B), (G2* - G0*)/2 [(G*2 - G*0)/2], in the background of two different homozygotes of gene B (gene A), BB and bb (AA and aa), and this difference is defined as the genetic parameter of additive x additive epistatic effect, iaa. The larger the difference is, the stronger the interaction is. The same argument leads to the definitions of E6, E7, and E8 as the genetic parameters of the additive x dominance, iad; dominance x additive, ida; and dominance x dominance, idd, epistatic effects between genes A and B. The definitions of these nine genetic parameters are summarized in Table 2. After defining the genetic parameters of genetic effects, Equation 1 can be expressed more succinctly as

(11)

by defining the coded variables as

The coded variables of this model are mutually uncorrelated because of orthogonality. The model can also be represented in tabular form, as in Table 3. Note that the marginal means of the three genotypes, Ḡ2., Ḡ1., and Ḡ0., for locus A are µ + a1 - d1/2, µ + d1/2, and µ - a1 - d1/2, respectively, as the segregation ratio is 1:2:1. There are similar forms for locus B. The grand mean Ḡ.. is equivalent to µ.
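The orthogonal-contrast property and the solution of the coefficients can be checked numerically. The sketch below is not the authors' code: it assumes an additive code of 1, 0, -1 and a dominance code of -1/2, 1/2, -1/2 for AA, Aa, aa (chosen because they reproduce the marginal means quoted above), and it assumes that the four epistatic variables are products of the marginal codes. All function and variable names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

# Assumed coding; genotype index 2, 1, 0 stands for AA, Aa, aa (or BB, Bb, bb).
X = {2: F(1), 1: F(0), 0: F(-1)}              # additive code
Z = {2: F(-1, 2), 1: F(1, 2), 0: F(-1, 2)}    # dominance code
FREQ = {2: F(1, 4), 1: F(1, 2), 0: F(1, 4)}   # 1:2:1 marginal F2 frequencies

GENOTYPES = list(product((2, 1, 0), repeat=2))  # (i, j) for loci A and B


def scales(i, j):
    """Eight coded variables for genotype ij, in the (assumed) order
    a1, d1, a2, d2, iaa, iad, ida, idd."""
    x1, z1, x2, z2 = X[i], Z[i], X[j], Z[j]
    return [x1, z1, x2, z2, x1 * x2, x1 * z2, z1 * x2, z1 * z2]


def weight(i, j):
    """Genotypic frequency of ij for unlinked loci in an F2."""
    return FREQ[i] * FREQ[j]


def is_orthogonal_contrast_set():
    """Check that every scale has weighted mean 0 (contrast) and that every
    pair of distinct scales has weighted cross-product 0 (orthogonality)."""
    for t in range(8):
        if sum(weight(i, j) * scales(i, j)[t] for i, j in GENOTYPES) != 0:
            return False
        for s in range(t + 1, 8):
            cross = sum(weight(i, j) * scales(i, j)[t] * scales(i, j)[s]
                        for i, j in GENOTYPES)
            if cross != 0:
                return False
    return True


def solve_parameters(G):
    """Recover mu and the eight coefficients from genotypic values G[(i, j)].
    With orthogonal scales each coefficient is sum(p*W*G) / sum(p*W*W)."""
    mu = sum(weight(i, j) * G[(i, j)] for i, j in GENOTYPES)
    coefs = []
    for t in range(8):
        num = sum(weight(i, j) * scales(i, j)[t] * G[(i, j)]
                  for i, j in GENOTYPES)
        den = sum(weight(i, j) * scales(i, j)[t] ** 2 for i, j in GENOTYPES)
        coefs.append(num / den)
    return mu, coefs
```

Under these assumptions, the first coefficient returned by `solve_parameters` equals one-half of the difference between the AA and aa marginal means, which matches the verbal definition of a1.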


 

 
Table 3. Cockerham's model (the F2-metric model)

Genetic variance structure:
When Cockerham's model is applied to genotypic values in a population, the structure of variance components for the total genetic variance, VG, contributed by the two genes, each with two alleles, is shown in Appendix C. From Appendix C, we can see that the total genetic variance is composed of the genetic variances of individual effects and the covariances between different effects, and it changes with gene frequencies (p's) and linkage disequilibrium (D). Accordingly, the relative strengths of genetic effects vary with gene frequency and linkage disequilibrium. For an F2 population, the total genetic variance reduces to Equation 34 and contains covariances between different genetic effects through linkage. If genes are unlinked in the F2 population, the total genetic variance can be partitioned into eight independent components without covariance as

(12)

Each variance component is contributed by its own genetic parameter. For example, the additive variance component of gene A is contributed by its additive effect, a1, and it has no genetic covariance with other effects. This property greatly facilitates the evaluation of the contribution of an effect to the genetic variance. The other models, such as the F∞-metric and mixed-metric models, do not have such a property (see Equation 18).
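A small numerical check of this partition for unlinked loci in an F2 is sketched below. It assumes the same coding as the marginal means quoted earlier imply (additive code 1, 0, -1; dominance code -1/2, 1/2, -1/2) and product-form epistatic variables; these codes and all names are illustrative assumptions rather than a transcription of Equation 12.

```python
from fractions import Fraction as F
from itertools import product

# Assumed coding for genotype index 2, 1, 0 (AA, Aa, aa).
X = {2: F(1), 1: F(0), 0: F(-1)}
Z = {2: F(-1, 2), 1: F(1, 2), 0: F(-1, 2)}
FREQ = {2: F(1, 4), 1: F(1, 2), 0: F(1, 4)}   # unlinked F2: 1:2:1 at each locus


def coded(i, j):
    """Coded variables in the order a1, d1, a2, d2, iaa, iad, ida, idd."""
    x1, z1, x2, z2 = X[i], Z[i], X[j], Z[j]
    return [x1, z1, x2, z2, x1 * x2, x1 * z2, z1 * x2, z1 * z2]


def variance_components(effects):
    """Contribution of each effect: effect^2 times the variance of its coded
    variable (each coded variable has mean zero under these frequencies)."""
    comps = []
    for t, e in enumerate(effects):
        var_t = sum(FREQ[i] * FREQ[j] * coded(i, j)[t] ** 2
                    for i, j in product(FREQ, repeat=2))
        comps.append(e * e * var_t)
    return comps


def total_genetic_variance(effects):
    """Variance of the genotypic value computed directly over nine genotypes."""
    vals = {(i, j): sum(e * c for e, c in zip(effects, coded(i, j)))
            for i, j in product(FREQ, repeat=2)}
    mean = sum(FREQ[i] * FREQ[j] * v for (i, j), v in vals.items())
    return sum(FREQ[i] * FREQ[j] * (v - mean) ** 2 for (i, j), v in vals.items())
```

With every effect set to 1, the components come out as 1/2, 1/4, 1/2, 1/4, 1/4, 1/8, 1/8, and 1/16, and for any set of effects the components sum to the directly computed total, so no covariance terms remain.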

Linkage disequilibrium:
The coded variables in Cockerham's model (the scales in Table 1) are orthogonal contrasts when the ratio of genotypic frequencies is 1:2:1:2:4:2:1:2:1 (genes are unlinked) in an F2 population. In that case, the definition of the genetic parameters in Table 2 is appropriate for interpreting the gene effects, and the genetic variance can be partitioned as in Equation 12. If there is segregation distortion and/or linkage, the ratio will deviate from 1:2:1:2:4:2:1:2:1 (Table 6) and there will be covariances between some genetic effects (Equation 34). To take linkage disequilibrium into account in using Cockerham's model, we introduce statistical parameters to contrast with genetic parameters in interpreting gene effects when genes are in linkage disequilibrium (see next section).
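How linkage changes the F2 genotypic ratio can be sketched as follows. The code assumes the usual gametic parameterization (frequency of AB equal to pA*pB + D, with allele frequencies of one-half) and D = (1 - 2r)/4 for gametes produced by a coupling-phase F1, with random union of gametes; the additive code 1, 0, -1 is also an assumption, and the function names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product


def gamete_freqs(r):
    """Gametic frequencies from an AB/ab F1 with recombination fraction r."""
    D = (1 - 2 * r) / 4
    return {("A", "B"): F(1, 4) + D, ("A", "b"): F(1, 4) - D,
            ("a", "B"): F(1, 4) - D, ("a", "b"): F(1, 4) + D}


def genotype_freqs(r):
    """P[(i, j)], where i and j count A and B alleles (2, 1, 0),
    obtained as products of gametic frequencies."""
    g = gamete_freqs(r)
    P = {(i, j): 0 for i, j in product((2, 1, 0), repeat=2)}
    for (h1, f1), (h2, f2) in product(g.items(), repeat=2):
        i = (h1[0] == "A") + (h2[0] == "A")
        j = (h1[1] == "B") + (h2[1] == "B")
        P[(i, j)] += f1 * f2
    return P


def cov_additive_codes(r):
    """Covariance between the additive codes of the two loci (codes 1, 0, -1);
    both codes have mean zero, so the covariance is the mean cross-product."""
    X = {2: 1, 1: 0, 0: -1}
    return sum(p * X[i] * X[j] for (i, j), p in genotype_freqs(r).items())
```

For r = 1/2 the frequencies reduce to 1/16, 1/8, ..., 1/16 and the covariance is zero; for r = 1/10 the covariance is 2/5, which illustrates the source of the covariance terms between genetic effects under linkage.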


 

 
Table 4. The F∞-metric model


 

 
Table 5. The mixed-metric model


 

 
Table 6. Genotypic frequencies (P's) in terms of allele frequencies (p's) and the linkage disequilibrium coefficient (D)

F∞-metric and mixed-metric models:
The F∞-metric model can be expressed in the same equation form as Equation 11, with its own coding of the marginal variables and with the coded variables for epistasis taken simply as the products of the marginal variables. Equivalently, the F∞-metric model can be illustrated by Table 4. It is easy to check that the coded variables of the F∞-metric model do not have the property of orthogonal contrast. Also, the marginal means of one locus are involved in the genetic parameter of another locus and their epistasis parameters, and one-half of the difference between the two homozygote marginal means is not equal to the genetic parameter of additive effect a1 (a2) (Table 4). This result deviates from the usual definition in the one-locus analysis. The solutions of µ and the marginal genetic parameters, a1, d1, a2, d2, in terms of the genotypic values for the F∞-metric model are

(13)


(14)


(15)


(16)


(17)

and the solutions of the epistasis genetic parameters, iaa, iad, ida, and idd, are the same as those in Cockerham's model. Apparently, most of the heterozygotes are excluded in the estimation of µ and the marginal parameters, which makes the F∞-metric model difficult to use for interpreting gene action in the F2 population.

The equation form for the mixed-metric model, which is a mixture of Cockerham's model and the F∞-metric model with the first part of marginal effects from the F∞-metric model and the latter part of epistatic effects from Cockerham's model, is trivial (not shown), and it is tabulated in Table 5. The coded variables of the mixed-metric model are orthogonal, but not contrasts. Except for µ, the solutions of the genetic parameters of the mixed-metric model are the same as those of Cockerham's model. The solution of µ is not equal to Ḡ... By subtracting d1/2 + d2/2, the mixed-metric model will become Cockerham's model. In Table 5, the marginal means of one locus involve the dominance effect of another locus, which deviates from the one-locus analysis. For example, the marginal mean of genotype AA, Ḡ2., is µ + a1 + d2/2.

As the F∞-metric model is not an orthogonal model, the total genetic variance contributed by two genes in linkage equilibrium is

(18)

which includes the covariances between marginal and epistatic gene effects. These covariances make it difficult to evaluate the contribution of an individual effect to the total genetic variance. The genetic variance structure of the mixed-metric model is the same as that of Cockerham's model. Note that the genetic variance structures of Cockerham's model and the F∞-metric model cannot be translated to each other by adding or subtracting a constant value, and in this respect they are different models.


*  MODELING QUANTITATIVE TRAITS

When applying Cockerham's model to analyze a quantitative trait, controlled by two epistatic genes A and B, from a sample of size n of an F2 population, the trait value of the kth individual with genotype ij can be modeled as

(19)

where εijk is a residual. Let p̂ij and nij denote the observed frequency and sample size of genotype ij, so that p̂ij = nij/n. In expectation, p̂ij equals Pij and nij equals nPij, where Pij is the population frequency of genotype ij and depends on the linkage strength between genes (Table 6). Note that the ratio of Pij's reduces to 1:2:1:2:4:2:1:2:1 if genes are unlinked.

Least-squares estimates of genetic parameters:
The least-squares estimates (LSE) of the genetic parameters in Equation 19 have formulations similar to those of Equations 2–10 except that Gij is replaced with the sample mean ȳij. of genotype ij. For example, the LSE of a1 is

(20)

When genes are unlinked (the segregation ratio is 1:2:1:2:4:2:1:2:1), the expectation of â1 is

(21)

which corresponds to the additive effect of gene A. However, when genes are linked, â1 is not a measure of the difference between the two homozygote means as the ratio is no longer 1:2:1:2:4:2:1:2:1. Likewise, the LSE of the genetic parameters are appropriate estimates of the nine gene effects when genes are unlinked, but they are not appropriate estimates when genes are linked. To remedy this problem, statistical parameters of gene effects are introduced for interpretation in contrast to genetic parameters of gene effects.

Statistical parameters of gene effects:
When the derivatives of the error sum of squares in Equation 19 with respect to every genetic parameter in turn are set equal to zero, it gives the nine normal equations. For example, the normal equation with respect to a1 is

(22)

By taking expectation, the expected normal equations can be obtained and expressed in terms of genotypic values Gij's, population genotypic frequencies Pij's, and genetic parameters E's, as shown in Equations A1–A9 (Appendix A). For simplicity, the left-hand sides of these nine expected normal equations are denoted as ß0, ß1, ... , ß8. Then, Equation A1 can be written as

which is the mean genotypic value in the population. Equation A2 can be written as

and it can be reformulated as

since the marginal frequencies of AA and aa are both 1/4 in the F2 population. That is, ß1 quantifies one-half of the difference in genotypic value between the two homozygote means of gene A; i.e., ß1 is a quantity to measure the additive effect of gene A, no matter whether genes are in linkage equilibrium or not. Further, as the marginal genotypic frequencies of gene A (B) keep the 1:2:1 ratio in the F2 population despite linkage,

(23)

which measures the dominance effect of gene A. For epistasis parameters,

which is a weighted version of the additive x additive epistasis. When genes are unlinked, ß5 reduces to the genetic parameter iaa. When genes are linked, the genetic parameter iaa is still valid for the additive x additive effect since marginal means are not involved in it. Similarly, the genetic parameters iad, ida, and idd are still appropriate to measure the additive x dominance, dominance x additive, and dominance x dominance effects under linkage disequilibrium, and ß6, ß7, and ß8 are weighted versions of the epistatic effects, and they all reduce to iad, ida, and idd if genes are in linkage equilibrium.

Given genotypic values G's, the quantities ß's will take different values under different strengths of linkage (ratios of genotypic frequencies). In contrast, the genetic parameters, E's, do not change with the strength of linkage. Therefore, we define ß's as statistical parameters to contrast with the genetic parameters of gene effects. The genetic parameters can be obtained directly from Cockerham's model, but the statistical parameters cannot. However, there exists a one-to-one relationship between the two kinds of parameters as shown below, so that once the genetic parameters are obtained from the model, the statistical parameters can be obtained by transformation.

Relationship between genetic and statistical parameters:
In a population, the frequency of the gamete AB, PAB, can be expressed in terms of allele frequencies (p's) and the linkage disequilibrium coefficient D (WEIR 1996) as

where D is equivalent to (1 - 2r)/4 (r is the recombination fraction between loci A and B). If the union of gametes is random, the genotypic frequencies Pij's are products of gametic frequencies (Table 6). The expected normal equations in Equations A1–A9 can be further expressed in terms of the genetic parameters (E's), the statistical parameters (ß's), the population allele frequencies (p's), and the linkage disequilibrium coefficient D as shown in Equations B1–B9 in Appendix B. In the F2 population, the allele frequencies pA, pa, pB, and pb are one-half, and the nine expected normal equations in Appendix B reduce to the following:

(24)


(25)


(26)


(27)


(28)


(29)


(30)


(31)


(32)

The statistical parameters (ß's) are functions of the genetic parameters (E's) and a linkage parameter (λ), and vice versa. How closely a genetic parameter approximates its corresponding statistical parameter depends on the strength of linkage and the sizes of the other genetic parameters. In matrix form, the above equations can also be expressed as

(33)

where

contains the statistical parameters,

contains the genetic parameters, and

is a symmetric and nonsingular matrix with components associated with the linkage parameter λ. The inverse of T is

(WOLFRAM 1992). The two kinds of parameters have a one-to-one relationship. When genes are in linkage equilibrium (λ = 0), T is diagonal, and ß's are equal to E's. When genes are not in linkage equilibrium (λ ≠ 0), they differ, but each set can be obtained from the other by transformation.

Random mating:
Linkage disequilibrium decays under random mating. If the F2 progeny are further randomly mated, linkage disequilibrium is reduced by a factor of 1 - r (0 < r < 0.5) in each generation. The general formula of the linkage disequilibrium coefficient in generation Ft under random mating is

where t ≥ 2 is the number of generations. As t gets larger, λt approaches zero; i.e., linkage equilibrium will be gradually attained in later generations by random mating. After random mating, λt changes (becomes smaller), as do the genotypic frequencies (Pij's), and accordingly the statistical parameters (ß's) change and become closer to the genetic parameters (E's). Therefore, the statistical parameters (ß's) depend on the population frequencies (Pij's) and will have different values in different generations. When λt approaches 0, the ratio of the genotypic frequencies approaches 1:2:1:2:4:2:1:2:1, and the statistical parameters (ß's) will approach the genetic parameters (E's). Hence, the genetic parameters of genes in linkage disequilibrium estimated in the F2 population can be regarded as the gene effects in later generations when linkage equilibrium is attained.
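The decay can be illustrated with a short sketch written in terms of the disequilibrium coefficient D rather than λt. It assumes only what the text states: D equals (1 - 2r)/4 in the F2 and is multiplied by 1 - r with each further generation of random mating. The function name is hypothetical.

```python
def ld_coefficient(r, t):
    """Linkage disequilibrium coefficient D in generation F_t (t >= 2),
    assuming D = (1 - 2r)/4 in the F2 and a decay factor of (1 - r)
    for each further generation of random mating."""
    if t < 2:
        raise ValueError("t must be at least 2 (the F2 generation)")
    return (1 - r) ** (t - 2) * (1 - 2 * r) / 4


if __name__ == "__main__":
    # Tightly linked loci (r = 0.1) approach equilibrium slowly.
    for t in (2, 3, 5, 10, 20):
        print(t, ld_coefficient(0.1, t))
```

Because the decay factor is 1 - r, tightly linked loci retain most of their disequilibrium over many generations, while unlinked loci (r = 0.5) are in equilibrium from the F2 onward.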

Variance components:
The genetic variance contributed by two genes in the F2 population is

(34)

(Appendix C). The genetic variance is composed of the variances and covariances of genetic parameters. If genes are in linkage equilibrium or attain equilibrium in later generations by random mating, the covariances disappear and the genetic variance will be partitioned into eight independent components (Equation 12).


*  QTL MAPPING USING COCKERHAM'S MODEL

In this section, we apply Cockerham's model to construct a statistical epistasis model to map for epistatic QTL and analyze epistasis between QTL. The problems when epistasis is present and ignored in QTL mapping are also investigated. By taking epistasis into account in QTL mapping, the accuracy of estimation and power of detection can be improved.

Mapping epistatic QTL:
Assume that a quantitative trait y is controlled by two interacting QTL, QA and QB, located at positions p1 and p2, in two different intervals, I1 and I2. The statistical QTL mapping model can be written as

(35)

where εi follows N(0, σ²), and the codes for variables x*1 (x*2) and z*1 (z*2) follow the codes of x1 (x2) and z1 (z2) in Cockerham's model (Equation 11). As QA (QB) is not located at the marker, its genotypes, i.e., the values of x*1 and z*1 (x*2 and z*2), are not observable. However, through its flanking markers, the conditional genotypic distribution of QA (QB) can be inferred on the basis of Haldane's mapping function (HALDANE 1919) as listed in Table 2 of KAO and ZENG 1997. The joint conditional genotypic distribution of QA and QB in intervals I1 and I2 can be obtained using the property of conditional independence between them (KAO and ZENG 1997). Let pij's, j = 1, 2, ..., 9, denote the conditional probabilities of the nine possible QTL genotypes for individual i. The likelihood of the statistical model is a mixture of nine normals as

(36)

where pij's and µij's are the mixing proportions and genotypic values of the nine genotypes for individual i. To obtain the maximum-likelihood estimates (MLE) of the genetic parameters and their asymptotic variance-covariance matrix for the normal mixture model, the general formulas by KAO and ZENG 1997 based on the expectation-maximization (EM) algorithm (DEMPSTER et al. 1977) can be used. The general formulas are based on two matrices, the genetic design matrix D and the conditional probability matrix Q. Here, the genetic design matrix is a matrix with dimension 9 x 8 as

(37)

where Wi's, i = 1, 2, ..., 8, are the orthogonal contrast scales of Cockerham's model in Table 1, and the conditional probability matrix Q is a 9² x 3² matrix with elements associated with the mixing proportions. By applying the matrices D and Q to the general formulas, the MLE and the asymptotic variance-covariance matrix can be obtained.
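The structure of the nine-component normal mixture can be sketched as follows. This is an illustration only, not the EM procedure of KAO and ZENG 1997: it evaluates the log-likelihood for given parameter values, and it assumes the additive code 1, 0, -1, the dominance code -1/2, 1/2, -1/2, product-form epistatic columns, and the column order a1, d1, a2, d2, iaa, iad, ida, idd. The conditional probabilities are taken as given inputs, and all names are hypothetical.

```python
import math
from itertools import product

# Assumed coding for genotype index 2, 1, 0 (QQ, Qq, qq) at each QTL.
X = {2: 1.0, 1: 0.0, 0: -1.0}
Z = {2: -0.5, 1: 0.5, 0: -0.5}


def design_matrix():
    """9 x 8 genetic design matrix, one row per QTL genotype in the order
    AABB, AABb, AAbb, AaBB, AaBb, Aabb, aaBB, aaBb, aabb."""
    rows = []
    for i, j in product((2, 1, 0), repeat=2):
        x1, z1, x2, z2 = X[i], Z[i], X[j], Z[j]
        rows.append([x1, z1, x2, z2, x1 * x2, x1 * z2, z1 * x2, z1 * z2])
    return rows


def normal_pdf(y, mean, sigma2):
    """Density of N(mean, sigma2) at y."""
    return math.exp(-(y - mean) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


def mixture_loglik(y, probs, mu, effects, sigma2):
    """Log-likelihood of trait values y, where probs[i][j] is the conditional
    probability that individual i carries QTL genotype j (j = 0..8)."""
    means = [mu + sum(d * e for d, e in zip(row, effects))
             for row in design_matrix()]
    total = 0.0
    for y_i, p_i in zip(y, probs):
        total += math.log(sum(p * normal_pdf(y_i, m, sigma2)
                              for p, m in zip(p_i, means)))
    return total
```

In an actual analysis the conditional probabilities would come from the flanking-marker genotypes and the assumed QTL positions, and the parameters would be updated iteratively rather than supplied.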

The proposed statistical QTL mapping model in Equation 35 can be used to search for epistatic QTL as well as to analyze epistasis between QTL by taking epistasis into account. In QTL mapping, we usually first search for QTL by ignoring epistasis. When epistasis is ignored, the accuracy in estimation and power of detection could be affected (see below). Thus, it is very likely that the detected epistatic QTL are those with relatively large marginal effects and the undetected epistatic QTL are those with relatively minor marginal effects. By taking epistasis into account, Equation 35 can be used to search for the undetected minor epistatic QTL by testing hypotheses

(38)

given the detected QTL with marginal effects a1 and d1 in the model. Note that hypotheses in (38) can consider only additive effect and a part of the four epistatic effects in testing. Alternatively, Equation 35 can be used to test for the existence of epistasis between two detected QTL by setting hypotheses

(39)

given their marginal effects in the model. Certainly, the hypotheses in (39) can contain individual epistasis parameters in the analysis. The hypotheses in (38) and (39) can be tested using the likelihood-ratio test (LRT) statistic,

where L0 and L1 are the likelihoods under H0 and H1. The critical value of the LRT statistic for rejecting H0 can be chosen from the χ² distribution on the basis of the Bonferroni argument.
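A minimal sketch of such a test is given below. It assumes the standard form of the statistic, -2 ln(L0/L1), a χ² reference distribution with degrees of freedom equal to the number of parameters tested, and a Bonferroni adjustment that divides the significance level by the number of tests; the number of tests is left to the user. The function names are hypothetical, and the χ² tail probability is computed with the standard library only.

```python
import math


def lrt_statistic(loglik_h0, loglik_h1):
    """Likelihood-ratio statistic -2 ln(L0 / L1) from two log-likelihoods."""
    return 2.0 * (loglik_h1 - loglik_h0)


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df,
    using the recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    with h = x / 2, started at df = 1 (erfc) or df = 2 (exponential)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        q, a = math.exp(-half), 1.0
    else:
        q, a = math.erfc(math.sqrt(half)), 0.5
    while 2 * a < df:
        q += half ** a * math.exp(-half) / math.gamma(a + 1)
        a += 1.0
    return q


def rejects_h0(loglik_h0, loglik_h1, df, n_tests=1, alpha=0.05):
    """Bonferroni-adjusted decision: reject H0 if the p-value is below
    alpha divided by the number of tests performed."""
    p_value = chi2_sf(lrt_statistic(loglik_h0, loglik_h1), df)
    return p_value < alpha / n_tests
```

For example, testing the four epistatic effects jointly would use df = 4, while a single epistatic parameter would use df = 1.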

What are the problems if epistasis is present and ignored?
Although epistasis is a ubiquitous phenomenon (WRIGHT 1980), many QTL mapping methods ignore epistasis in the analysis for simplicity. It is important to investigate the problems that arise if epistasis is present and ignored and, further, to solve these problems and analyze epistasis in QTL mapping. When the quantitative trait affected by the two epistatic QTL, QA and QB, is regressed on a marker M along the genome to infer QTL, under Cockerham's model, the regression coefficient for the additive effect of M is

(40)

where rAM, rBM, and rAB are the recombination fractions between QA and M, QB and M, and QA and QB, respectively, and the regression coefficient for the dominance effect is

(41)

If the marker M is coincident with QA, the coefficient aM, which reduces to the estimate of additive effect of QA, is confounded by the additive effect of QB and their epistatic effects, iad and ida, via linkage, and the coefficient dM, which is the estimate of dominance effect of QA, is confounded by the dominance effect of QB and their epistatic effects, iaa, via linkage. When the quantitative trait is regressed on both QA and QB, the partial regression coefficient for the additive effect of QA, given the additive effect of QB is

(42)

and the partial regression coefficient for the dominance effect of QA given the dominance effect of QB is

(43)

Again, the partial regression coefficients aA.Ba and dA.Bd are confounded by their epistasis, ida and iaa, respectively, via linkage. If QA and QB are unlinked, the confounding of epistasis disappears and the coefficients (Equations 40–43) are all unbiased for a1 and d1. This implies that if epistasis between QTL is present and ignored in QTL mapping, the estimates of the marginal effects and positions of QTL are asymptotically unbiased if the epistatic QTL are unlinked. But, if the epistatic QTL are linked, the estimates of QTL positions and marginal effects are biased and confounded by epistatic effects via linkage. This unbiasedness property for unlinked QTL is attributable to the orthogonal property of Cockerham's model. The approaches of interval mapping (LANDER and BOTSTEIN 1989; JANSEN 1993; ZENG 1994; KAO et al. 1999), which test every position within marker intervals along the entire genome for QTL detection, share the same problems and properties under Cockerham's model.
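The confounding can be illustrated numerically for the special case of a marker coincident with QA. The sketch below computes the population regression of the genotypic value on the additive code of QA alone; it is not a transcription of Equations 40–43. It assumes the additive code 1, 0, -1, the dominance code -1/2, 1/2, -1/2, product-form epistatic variables, and F2 gametic frequencies determined by D = (1 - 2r)/4. All names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

# Assumed coding for genotype index 2, 1, 0.
X = {2: F(1), 1: F(0), 0: F(-1)}
Z = {2: F(-1, 2), 1: F(1, 2), 0: F(-1, 2)}


def genotype_freqs(r):
    """F2 genotypic frequencies from gametes of an AB/ab F1, where each
    gamete is (number of A alleles, number of B alleles)."""
    D = (1 - 2 * r) / 4
    gametes = {(1, 1): F(1, 4) + D, (1, 0): F(1, 4) - D,
               (0, 1): F(1, 4) - D, (0, 0): F(1, 4) + D}
    P = {}
    for (g1, f1), (g2, f2) in product(gametes.items(), repeat=2):
        key = (g1[0] + g2[0], g1[1] + g2[1])
        P[key] = P.get(key, 0) + f1 * f2
    return P


def genotypic_value(i, j, mu, effects):
    """Genotypic value of genotype ij under the assumed coding."""
    a1, d1, a2, d2, iaa, iad, ida, idd = effects
    x1, z1, x2, z2 = X[i], Z[i], X[j], Z[j]
    return (mu + a1 * x1 + d1 * z1 + a2 * x2 + d2 * z2
            + iaa * x1 * x2 + iad * x1 * z2 + ida * z1 * x2 + idd * z1 * z2)


def additive_regression_on_A(r, mu, effects):
    """Population regression coefficient of the genotypic value on the
    additive code of QA, ignoring QB and all epistatic terms."""
    P = genotype_freqs(r)
    mean_g = sum(p * genotypic_value(i, j, mu, effects) for (i, j), p in P.items())
    cov = sum(p * (genotypic_value(i, j, mu, effects) - mean_g) * X[i]
              for (i, j), p in P.items())
    var = sum(p * X[i] ** 2 for (i, j), p in P.items())
    return cov / var
```

With r = 1/2 the coefficient equals a1 whatever the other effects are; with r < 1/2 it departs from a1 whenever a2, iad, or ida is nonzero, in line with the confounding described above.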

A similar investigation of the problems that arise when epistasis is present but ignored in QTL mapping can also be carried out for the F∞-metric and mixed-metric models. Under the F∞-metric model, the regression coefficient for the additive effect of a marker M is

(44)

and the regression coefficient for the dominance effect of a marker M is

(45)

If the marker M is coincident with QA, the coefficients aM and dM reduce to the estimates of the additive and dominance effects of QA. The estimate of the additive effect of QA is confounded by the additive effect of QB, a2, and their epistatic effects, iad and ida, and the estimate of the dominance effect of QA is confounded by the dominance effect of QB and their epistatic effects, iaa and idd. When the quantitative trait is regressed on both QA and QB, the partial regression coefficient for the additive effect of QA given the additive effect of QB is

(46)

and the partial regression coefficient for the dominance effect of QA given the dominance effect of QB is

(47)

Again, the partial regression coefficients of the additive and dominance effects are confounded by epistatic effects, and they are biased estimates of the additive and dominance effects. If QA and QB are unlinked, the four coefficients in Equations 44-47 are still biased. For example, when and , which are all biased estimates of a1. Therefore, the F∞-metric model always suffers from confounding and gives biased estimates if epistasis is present but ignored, whether the QTL are linked or not. This implies that QTL mapping could be problematic for the F∞-metric model if epistasis is ignored. As the mixed-metric model is also orthogonal, it possesses the same properties as Cockerham's model in the QTL analysis.

When epistasis is present and ignored in QTL mapping, the genetic variance contributed by epistasis is not controlled in the model and becomes a part of the genetic residual. Thus, the sampling variances of the effects are inflated, and the power of detecting QTL decreases. If epistasis is taken into account, the epistatic variance can be controlled, and the power will increase. The increase in power depends on the size of the epistatic effect. The larger the epistatic effect that can be controlled in mapping, the larger the increase in power that can be gained. In conclusion, by taking epistasis into account in QTL mapping, the chance of finding more QTL and the accuracy of estimating QTL positions and effects can be improved.


*  ADVANTAGES OF COCKERHAM'S MODEL

Cockerham's model has several advantages in the study of epistasis as compared to the F∞-metric and mixed-metric models. When genes are in linkage equilibrium, the advantages include the following:

  1. The genetic variance can be partitioned into eight independent components (Equation 12), and there is no genetic covariance. Each component is contributed by its corresponding genetic parameter. This is a desirable property in modeling. In contrast, the F∞-metric model does not have such a property (Equation 18).

  2. The marginal means of one locus do not involve the parameters of the other locus or the epistasis parameters, which makes Cockerham's model readily interpretable (Table 3). The marginal means of locus A are (a1 - d1/2), d1/2, and (-a1 - d1/2), which correspond to the one-locus analysis (differing by a constant d1/2) despite epistasis. For the F∞-metric model, the marginal means of locus A are (a1 + d2/2 + iad/2), (d1 + d2/2 + idd/2), and (-a1 + d2/2 - iad/2), which are confounded by the dominance effect of locus B (d2) and the epistasis parameters iad and idd. In the mixed-metric model, the marginal means of locus A are a1 + d2/2, d1 + d2/2, and -a1 + d2/2, which are confounded by the dominance effect of locus B (d2). Neither the F∞-metric model nor the mixed-metric model follows the definition in the one-locus analysis.

  3. The difference between the two homozygote means, (G2. - G0.)/2 [(G.2 - G.0)/2], estimates the genetic parameter a1 (a2) of locus A (B), and the departure of the heterozygote mean from the midpoint between the two homozygote means, (2G1. - G2. - G0.)/2 [(2G.1 - G.2 - G.0)/2], estimates the genetic parameter d1 (d2) of locus A (B). They follow the same definition of additive and dominance effects as in the one-locus analysis. In the F∞-metric model, they estimate a1 + iad/2 (a2 + ida/2) and d1 + idd/2 (d2 + idd/2) and violate the definition in the one-locus analysis.

  4. With the orthogonal property, the estimation of one genetic (marginal or epistatic) effect is not affected by the presence or absence of other genetic effects in the model. Essentially, when epistasis is present but ignored, the estimation of the marginal effects and the locations of epistatic QTL is still asymptotically unbiased and not affected by epistasis. This advantage ensures that, under Cockerham's model, QTL mapping can first be performed without taking epistasis into account without causing a problem. The F∞-metric model does not have such a property (see QTL MAPPING USING COCKERHAM'S MODEL).
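Advantages 3 and 4 can be checked numerically. The sketch below is only an illustration, not the authors' code. It assumes expected F2 frequencies (1/4, 1/2, 1/4) at two unlinked loci, an additive scale of (1, 0, -1) for both models, a dominance scale of (-1/2, 1/2, -1/2) for Cockerham's model and (0, 1, 0) for the F∞-metric model, and epistatic scales formed as products. The parameter values in `theta_c` are hypothetical.

```python
import numpy as np

FREQ = np.array([0.25, 0.50, 0.25])        # expected F2 frequencies of AA, Aa, aa
X = np.array([1.0, 0.0, -1.0])             # additive scale (assumed for both models)
Z_COCKERHAM = np.array([-0.5, 0.5, -0.5])  # dominance scale assumed for Cockerham's model
Z_FINF = np.array([0.0, 1.0, 0.0])         # dominance scale assumed for the F-infinity metric

def full_design(z):
    """9 genotypes x 9 columns: mu, a1, d1, a2, d2, iaa, iad, ida, idd."""
    rows = []
    for i in range(3):          # genotype at locus A
        for j in range(3):      # genotype at locus B
            x1, z1, x2, z2 = X[i], z[i], X[j], z[j]
            rows.append([1.0, x1, z1, x2, z2, x1 * x2, x1 * z2, z1 * x2, z1 * z2])
    return np.array(rows)

def fit_ignoring_epistasis(z, values):
    """Frequency-weighted least squares on mu, a1, d1, a2, d2 only."""
    root_w = np.sqrt(np.outer(FREQ, FREQ).ravel())
    design = full_design(z)[:, :5]
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], values * root_w, rcond=None)
    return coef

# Hypothetical Cockerham parameters: mu, a1, d1, a2, d2, iaa, iad, ida, idd
theta_c = np.array([10.0, 3.0, 1.0, 1.0, 0.5, 2.0, -1.0, 0.4, 0.8])
values = full_design(Z_COCKERHAM) @ theta_c      # the nine genotypic values

# The same genotypic values expressed as full-model F-infinity parameters
theta_f = np.linalg.solve(full_design(Z_FINF), values)

reduced_c = fit_ignoring_epistasis(Z_COCKERHAM, values)
reduced_f = fit_ignoring_epistasis(Z_FINF, values)

print("Cockerham  a1, d1: full", theta_c[1:3], "reduced", reduced_c[1:3])
print("F-infinity a1, d1: full", theta_f[1:3], "reduced", reduced_f[1:3])
```

Under these assumptions, dropping the epistatic terms leaves the Cockerham marginal effects unchanged, whereas the F∞-metric marginal effects shift by iad/2 and idd/2, in line with advantage 3.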


*  EXAMPLES

In this section, real and simulated data are used to illustrate Cockerham's model, compare Cockerham's model with other models, verify its properties in statistical estimation, and map epistatic QTL.

Real data:
DOEBLEY et al. (1995) crossed two corn inbred lines, Teosinte-M1L x Teosinte-M3L, to generate 183 F2 progeny, and concluded from QTL analysis that two unlinked markers, UMC107 (QA) and BV302 (QB), are candidate QTL for the trait LBIL (average length of vegetative internodes in the primary lateral branch). Among the 183 progeny, 21 individuals have missing trait values and one individual has a missing genotype. Therefore, only the 161 individuals with complete trait and genotype information were used in the analysis. The observed allele frequencies are , and . The genotypic frequencies are 0.050, 0.137, 0.019, 0.124, 0.261, 0.149, 0.068, 0.130, and 0.062 for genotypes AABB, AABb, AAbb, AaBB, AaBb, Aabb, aaBB, aaBb, and aabb, respectively, which significantly deviate from the expected frequencies for two unlinked genes. The small number of AAbb individuals (3) is responsible for the deviation.
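As a rough illustration, the marginal genotype frequencies, the allele frequencies, and the genotype counts implied by the nine quoted genotypic frequencies can be recovered as follows. This is a sketch based only on the rounded frequencies above and the sample size of 161. The variable names are ours, and the 1:2:1 x 1:2:1 expectation is the usual F2 expectation for two unlinked loci, assumed here for comparison.

```python
import numpy as np

N = 161  # individuals with complete trait and genotype information

# Rows: AA, Aa, aa; columns: BB, Bb, bb (frequencies as quoted in the text)
freq = np.array([
    [0.050, 0.137, 0.019],
    [0.124, 0.261, 0.149],
    [0.068, 0.130, 0.062],
])

locus_a = freq.sum(axis=1)            # marginal frequencies of AA, Aa, aa
locus_b = freq.sum(axis=0)            # marginal frequencies of BB, Bb, bb
p_A = locus_a[0] + locus_a[1] / 2     # implied frequency of allele A
p_B = locus_b[0] + locus_b[1] / 2     # implied frequency of allele B

counts = np.rint(freq * N).astype(int)   # implied genotype counts
# Expected counts for two unlinked loci segregating 1:2:1 (assumption)
expected = N * np.outer([0.25, 0.5, 0.25], [0.25, 0.5, 0.25])
ratio = counts / expected                 # observed relative to expected

print("locus A genotype frequencies:", locus_a, "p_A =", round(p_A, 3))
print("locus B genotype frequencies:", locus_b, "p_B =", round(p_B, 3))
print("implied counts:\n", counts)
print("observed/expected:\n", ratio.round(2))
```

The AAbb cell (first row, third column) has the smallest observed-to-expected ratio, which agrees with the remark that this class is the main source of the deviation.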

The observed genotypic means (Gij's) and sample sizes (nij's) of the data are listed in Table 7. If all eight genetic parameters (full model) are considered, the genetic parameters estimated by Cockerham's model, the F∞-metric model, and the mixed-metric model are listed in Table 9. In Table 9, except for µ, the estimates of the eight genetic parameters by Cockerham's model and the mixed-metric model are the same. Cockerham's model and the F∞-metric model have different estimates of marginal effects, but the same estimates of epistatic effects (see COCKERHAM'S GENETIC MODEL for the reasons). The estimates of a1 and d1 are 15.11 (P value 0.0008) and -3.92 (P value 0.5035), respectively, for Cockerham's model, and they are 24.25 (P value 0.0008) and 5.15 (P value 0.5617), respectively, for the F∞-metric model. The estimates of a2 and d2 are 19.46 (P value 0.0001) and -5.66 (P value 0.3336), respectively, for Cockerham's model, and they are 17.59 (P value 0.0001) and 3.40 (P value 0.3336), respectively, for the F∞-metric model. Very likely, the marginal effects of QA and QB are mostly additive, and their dominance effects are not significant. The estimate of iaa is 2.68 (P value 0.7054). Analytically, it means that the additive effects of QB (QA) in the background of AA (BB) and aa (bb), which are and , differ by 2.68, and this difference is not statistically significant at the 5% level (Figure 1A). The estimate of iad is -18.28 (P value 0.0411). Analytically, it means that the dominance effects of QB in the background of AA and aa, which are and , are significantly different at the 5% level. The significance of the additive-by-dominance interaction is illustrated in Figure 1B, where the crossing of the two lines shows that genotype Bb performs better than BB in the background of aa, but worse in the background of AA. The estimate of ida, 3.75, is not significant (P value 0.6725), as illustrated by the three nearly parallel lines in Figure 1C. The estimate of idd, -18.13, is not statistically significant at the 5% level (P value 0.1227), although the lines in Figure 1D cross. The proportion of the genetic variance in the total variance (model R2) is 23.66% (Table 8).
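The relation between the two sets of marginal estimates can be verified by simple arithmetic. The sketch below assumes that the two models differ only by a shift of 1/2 in the dominance scale, so that the F∞-metric marginal effects equal the Cockerham marginal effects minus half of the corresponding epistatic effect. This is consistent with the expressions a1 + iad/2 and d1 + idd/2 given under advantage 3, but the function `to_f_infinity` is our own illustration rather than part of the original analysis.

```python
# Full-model estimates under Cockerham's model, as quoted in the text
cockerham = {"a1": 15.11, "d1": -3.92, "a2": 19.46, "d2": -5.66,
             "iaa": 2.68, "iad": -18.28, "ida": 3.75, "idd": -18.13}

def to_f_infinity(c):
    """Convert Cockerham marginal effects to the F-infinity metric.

    Assumes the dominance scales differ only by a constant shift of 1/2,
    so the four epistatic effects carry over unchanged.
    """
    f = dict(c)
    f["a1"] = c["a1"] - c["iad"] / 2
    f["d1"] = c["d1"] - c["idd"] / 2
    f["a2"] = c["a2"] - c["ida"] / 2
    f["d2"] = c["d2"] - c["idd"] / 2
    return f

f_inf = to_f_infinity(cockerham)

# F-infinity estimates reported in the text, for comparison
reported = {"a1": 24.25, "d1": 5.15, "a2": 17.59, "d2": 3.40}
for name, value in reported.items():
    print(f"{name}: converted {f_inf[name]:.3f}, reported {value:.2f}")
```

The converted values agree with the reported F∞-metric estimates up to rounding of the published figures.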



Figure 1. Epistasis plot of the four types of epistasis from the data of DOEBLEY et al. (1995) in Table 7. (A) Additive-by-additive epistasis. (B) Additive-by-dominance epistasis. (C) Dominance-by-additive epistasis. (D) Dominance-by-dominance epistasis.


 
Table 7. The means of trait LBIL in the F2 population from DOEBLEY et al. (1995)


 
Table 8. Two-way ANOVA of the DOEBLEY et al. (1995) data in Table 7


 
Table 9. Results from analysis of the DOEBLEY et al. (1995) data in Table 7 by Cockerham's model, the F∞-metric model, and the mixed-metric model

The estimates of the statistical parameters are , and following the definitions, or they can be obtained from Equation A9 by plugging in the observed genotypic frequencies in Table 7 and the nine estimated genetic parameters in Table 9. Although the values of the statistical and genetic parameters are expected to be very close for unlinked genes, they differ markedly in this data set because the observed segregation ratio deviates from the expected segregation ratio.

If only the significant effects, a1, a2, and iad (reduced model), are considered for Cockerham's model, the estimates of a1, a2, and iad are 15.27 (SD 4.13, P value 0.0003), 19.13 (SD 4.04, P value < 0.0001), and -18.44 (SD 8.27, P value 0.0271), respectively, which are very close to the estimates in the full model. This shows that the estimation of one genetic parameter in Cockerham's model is not affected by the presence or absence of other genetic parameters, owing to its orthogonal property. The F∞-metric model, however, does not have such a property. If the reduced model is considered for the F∞-metric model, the estimates of a1, a2, and iad become 24.48 (SD 6.28, P value 0.0001), 19.12 (SD 4.04, P value < 0.0001), and -18.44 (SD 8.27, P value 0.0271), respectively. The estimate of a2 changes from 17.59 in the full model to 19.12 in the reduced model due to the confounding of (estimated in the full model) by Equation 46. Both models have the same model R2.

If only the significant effects are taken into account in calculating the variance components, the additive effect of QA, a1, contributes ~34.05% to the total genetic variance (Equation 12), the additive effect of QB, a2, contributes ~52.04% to the total genetic variance, and the epistatic effect, iad, contributes ~13.90% to the total genetic variance under Cockerham's model. There is no genetic covariance between effects for unlinked loci. The mixed-metric model has the same genetic variance structure as Cockerham's model. The genetic variance and covariance components under the F∞-metric model can be obtained using Equation 18.

Simulation:
Assume that a quantitative trait is affected by two unlinked epistatic QTL. The first QTL, QA, is located at 52 cM on the first chromosome, and the second QTL, QB, is located at 93 cM on the second chromosome. There are 11 markers, equally spaced at 15-cM intervals, on each chromosome. The additive and dominance effects of QA are and . QB has additive effect and no dominance effect. The additive-by-additive epistatic effect is , and the other three epistatic effects are assumed to be zero. With these parameter settings, the marginal effects of QA and QB contribute 76 and 8% to the total genetic variance, and epistasis contributes 16% to the total genetic variance. The heritability of the quantitative trait is assumed to be 0.2, or equivalently the environmental variance is 25. The sample size is 200, and the number of simulated replicates is 500. When using the statistical model in Equation 35 for QTL mapping, a stepwise selection procedure (KAO et al. 1999) was adopted to detect QTL and analyze epistasis, and the critical value for claiming significance was chosen as , where k is the number of parameters in testing.
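The stated variance shares and environmental variance can be cross-checked with a few lines of arithmetic. The sketch below is hedged in two ways. First, the effect values a1 = 3, d1 = 1, a2 = 1, and iaa = 2 are our inference from the mean estimates reported in the simulation results, not values quoted from the text. Second, the component formulas a²/2, d²/4, and iaa²/4 are the F2 variance components we assume for unlinked loci under Cockerham's model (Equation 12 is not reproduced here).

```python
def variance_components(a1, d1, a2, d2, iaa):
    """Genetic variance components for two unlinked loci in an F2 population.

    Assumed form under Cockerham's model with iad = ida = idd = 0, as in the
    simulation: a^2/2 + d^2/4 per locus and iaa^2/4 for epistasis.
    """
    return {
        "QA": a1**2 / 2 + d1**2 / 4,
        "QB": a2**2 / 2 + d2**2 / 4,
        "epistasis": iaa**2 / 4,
    }

# ASSUMED effect sizes, inferred from the reported mean estimates
components = variance_components(a1=3.0, d1=1.0, a2=1.0, d2=0.0, iaa=2.0)

genetic = sum(components.values())
shares = {name: value / genetic for name, value in components.items()}

heritability = 0.2
environmental = genetic * (1 - heritability) / heritability

for name, share in shares.items():
    print(f"{name}: {100 * share:.0f}% of the genetic variance")
print(f"genetic variance {genetic:.2f}, environmental variance {environmental:.2f}")
```

With these assumed values, the shares and the environmental variance reproduce the 76%, 8%, 16%, and 25 stated above, which suggests that the inferred settings are at least consistent with the text.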

The simulation results are shown in Table 10. When epistasis is ignored in QTL mapping, the powers to detect QA and QB are 1.0 and 0.238, respectively. The mean estimates of the positions of QA and QB are 51.25 with standard deviation (SD) 7.73 and 89.63 with SD 24.19, respectively. The means of the estimated additive and dominance effects of QA are 2.9941 (SD 0.5969) and 0.9816 (SD 0.9018). The mean estimate of the additive effect of QB from the significant replicates is 1.8567 (SD 0.4196), which is a poor estimate. If the mean of the estimated effect of QB is calculated on the basis of all 500 replicates, it is 1.1214 (SD 0.7956), which is much closer to the true value. This agrees with the theoretical result that, under Cockerham's model, the estimation of marginal effects is asymptotically unbiased when epistasis is present but ignored (Equation 42). The estimates of environmental variance and heritability are 24.26 (SD 3.29) and 0.2158 (SD 0.0445), respectively. If epistasis is considered, the powers to detect QA and QB are 1.0 and 0.5, respectively. The power of detecting QB thus improves from 0.238 without epistasis to 0.5 with epistasis. The mean estimates of the positions of QA and QB are 50.99 (SD 7.95) and 90.46 (SD 18.67), respectively. The estimated additive and dominance effects of QA have means 2.9658 (SD 0.5700) and 1.0024 (SD 0.8697), respectively. The mean of the estimated additive effect of QB is 1.3314 (SD 0.6368) from the significant replicates, and it is 1.0447 (SD 0.6941) from all replicates. The mean of the estimated epistatic effect is 1.9897 (SD 0.948). The estimates of environmental variance and heritability are 24.02 (SD 3.50) and 0.225 (SD 0.0494), respectively.


 
Table 10. Simulation results of mapping epistatic QTL in the F2 population


*  CONCLUSION AND DISCUSSION

We use the orthogonal contrast scales proposed by COCKERHAM (1954) to define gene effects and to construct a genetic model, called Cockerham's model, for the study of epistasis between genes. The properties of Cockerham's model in modeling and mapping epistatic genes are investigated, and its variance component structure is derived for genes in linkage equilibrium and disequilibrium. Cockerham's model is also compared with other models in analyzing epistasis and mapping epistatic QTL. Because of its orthogonal property, Cockerham's model has several advantages in modeling epistasis between genes, and these advantages can benefit QTL mapping. The issues of QTL mapping when epistasis is involved are also discussed. Real and simulated examples are used to illustrate Cockerham's model, verify its statistical properties, and map epistatic QTL.

Parameterization of epistasis:
Different types and degrees of epistasis can also be quantified by Cockerham's model. For example, if , the genes show classical complementary interaction with a 9:7 ratio among different genotypic