Originally published as Genetics Published Articles Ahead of Print on September 19, 2005.
Genetics, Vol. 172, 663-686, January 2006, Copyright © 2006
doi:10.1534/genetics.105.046417
High-Resolution Association Mapping of Quantitative Trait Loci: A Population-Based Approach
Ruzong Fan*,1, Jeesun Jung and Lei Jin*
* Department of Statistics, Texas A&M University, College Station, Texas 77843 and
Department of Human Genetics, University of Pittsburgh, Graduate School of Public Health, Pittsburgh, Pennsylvania 15261
1 Corresponding author: Department of Statistics, Texas A&M University, 447 Blocker Bldg., College Station, TX 77843.
E-mail: rfan{at}stat.tamu.edu
ABSTRACT
In this article, population-based regression models are proposed for high-resolution linkage disequilibrium mapping of quantitative trait loci (QTL). Two regression models, the "genotype effect model" and the "additive effect model," are proposed to model the association between the markers and the trait locus. The markers can be either diallelic or multiallelic. If only one marker is used, the method is similar to the classical setting of Nielsen and Weir, and the additive effect model is equivalent to the haplotype trend regression (HTR) method of Zaykin et al. If data from two or more markers with phase ambiguity are used, the proposed models can analyze the data directly. Using analytical formulas, we show that the genotype effect model captures the additive and dominance effects simultaneously, whereas the additive effect model captures the additive effect only. On the basis of the two models, F-test statistics are proposed to test association between the QTL and the markers. A simulation study shows that the two models have reasonable type I error rates for data sets of moderate sample size. Noncentrality parameter approximations of the F-test statistics are derived for power calculation and comparison, and a simulation study shows that these approximations work very well. Using the noncentrality parameter approximations, we compare the power of the two models with that of HTR. In addition, a simulation study based on the haplotype frequencies of 10 SNPs in the angiotensin-1 converting enzyme (ACE) gene is performed to compare the methods.
IN genetics research, one important goal is to locate and identify genetic variants related to complex traits. With the development of dense maps of single-nucleotide polymorphisms (SNPs) and high-resolution microsatellites in the human genome, enormous amounts of genetic data on human chromosomes are becoming available (INTERNATIONAL SNP MAP WORKING GROUP 2001; KONG et al. 2002; INTERNATIONAL HAPMAP CONSORTIUM 2003; HapMap project, http://www.hapmap.org). The opportunities for genomewide scans to map complex disease genes are tremendous. It is therefore important to build appropriate models and useful algorithms for association mapping of complex diseases. Such tools can identify the genetic variants underlying complex traits in human, animal, or plant studies.
In recent years, there has been great interest in linkage disequilibrium (LD) mapping (or association studies) of quantitative traits of complex diseases. One approach uses diallelic markers such as SNPs; it has received much attention and has a substantial literature (FULKER et al. 1999; GEORGE et al. 1999; ABECASIS et al. 2000a,b, 2001; SHAM et al. 2000; FAN et al. 2005). Another approach uses haplotype data, which may consist of a set of SNPs (SCHAID et al. 2002; ZAYKIN et al. 2002; SCHAID 2004). Haplotype data may provide more information on the relation between DNA variants and complex traits than any single SNP. It is therefore important to investigate models and algorithms based on haplotype data. SCHAID et al. (2002) and ZAYKIN et al. (2002) proposed score tests for association between complex traits and haplotypes, which can be ambiguous owing to the unknown linkage phase of different haplotypes. ZAYKIN et al. (2002) called their method haplotype trend regression (HTR); it is very close to the method of SCHAID et al. (2002) (see SCHAID 2004, p. 355, for further explanation). HTR does not assume that haplotype phases are known. MEUWISSEN and GODDARD (2000) introduced a haplotype-based approach that assumes known haplotype phases and uses mixed models to model the haplotype effects. MORRIS et al. (2004) used a Markov chain Monte Carlo algorithm based on the shattered coalescent model for fine mapping.
On the other hand, current genotyping technology directly yields genotypes rather than haplotypes. It is therefore of interest to build models that use genotype information directly. Such models include the main effects of each marker and do not require phase information across the markers. If phase is unknown, the haplotypes would presumably need to be estimated first, using a reconstruction algorithm such as PHASE or the EM algorithm (DEMPSTER et al. 1977; M. STEPHENS et al. 2001; STEPHENS and DONNELLY 2003). This may introduce bias into the subsequent analysis, which would need to be investigated. Comparing genotype-based and haplotype-based models is therefore of real interest. Interestingly, MORRIS et al. (2004) and CLAYTON et al. (2004) observed that SNP haplotypes may be only slightly more powerful, or even less powerful, for fine mapping than the corresponding unphased genotypes.
Suppose that a quantitative trait locus (QTL) is located in a chromosome region in which one or more markers are typed. In our previous research, the markers were assumed to be diallelic (FAN and XIONG 2002); in the current article, they can be either diallelic or multiallelic. Suppose that a population sample is available, and that both the trait value and the marker genotypes are observed for each individual. We propose two regression models for association mapping of QTL based on population genetic data: the "genotype effect model" and the "additive effect model." These two models extend our previous research on high-resolution LD mapping of QTL using diallelic markers (FAN and XIONG 2002). The models can easily be fitted with any statistical software or implemented in a widely used language such as C++. Using analytical formulas, we show that the genotype effect model captures the additive and dominance effects simultaneously, whereas the additive effect model captures the additive effect only. On the basis of the two models, F-test statistics are proposed to test association between the QTL and the markers. To investigate the robustness of the proposed models and the related F-test statistics, simulation studies are performed to calculate type I error rates. The noncentrality parameters of the F-test statistics are derived for power calculation and comparison. When two diallelic markers are used, the proposed models are also compared with the haplotype trend regression method (ZAYKIN et al. 2002) by simulation and type I error rate analysis. Finally, a simulation study based on the haplotype frequencies of 10 SNPs in the angiotensin-1 converting enzyme (ACE) gene (KEAVNEY et al. 1998) compares the power of the proposed models with that of the haplotype trend regression method.
A program, CLAM_QTL, written in C++ to implement the proposed models and methods, can be downloaded from http://www.stat.tamu.edu/~rfan/software.html/.
METHODS
Analysis by one marker:
Population models:
Consider a quantitative trait locus Q located on an autosome. Suppose that there are two alleles Q1 and Q2 at the trait locus, with frequencies q1 and q2, respectively. In a region of the QTL Q, suppose that one marker A is typed. It may be diallelic, such as a single-nucleotide polymorphism, or multiallelic, such as a microsatellite marker. Let us denote the alleles of marker A by A1, ..., Am, where m is the number of alleles. Suppose that marker A is in Hardy-Weinberg equilibrium (HWE). Let the frequency of $A_i$ be $p_{A_i}$. There are $J_A = m(m + 1)/2$ possible genotypes, which can be listed as $A_1A_1, \ldots, A_mA_m, A_1A_2, \ldots, A_1A_m, \ldots, A_{m-1}A_m$. Accordingly, let $\beta_{11}, \ldots, \beta_{mm}, \beta_{12}, \ldots, \beta_{1m}, \ldots, \beta_{m-1,m}$ be the corresponding effects of the listed genotypes on the quantitative trait. Let y be the trait value of an individual with genotype $G_A = A_iA_j$. Under an assumption of normality, the trait value can be modeled as

$$y = w'\gamma + \beta_{ij} + e, \qquad (1)$$

where w is a column vector of covariates, $\gamma$ is a column vector of regression coefficients of w, and e is the error term. Assume that e is normal $N(0, \sigma_e^2)$. In addition to the covariate effects, there are $J_A = m(m + 1)/2$ parameters $\beta_{ij}$ in model (1), where $\beta_{ij} = \beta_{ji}$. Model (1) treats each genotype effect as one parameter; hence, we call it the genotype effect model. In practice, model (1) may involve a large number of parameters.
Now let us denote the effect of allele $A_i$ by $\alpha_i$, $i = 1, \ldots, m$. Suppose that the genetic effect is additive in the sense that $\beta_{ij} = \alpha_i + \alpha_j$, $i, j = 1, \ldots, m$. If an individual has quantitative trait value y and genotype $G_A = A_iA_j$, model (1) can be modified as

$$y = w'\gamma + \alpha_i + \alpha_j + e. \qquad (2)$$

In addition to the covariate effects, there are m parameters $\alpha_i$, $i = 1, \ldots, m$, in model (2). Compared with model (1), model (2) may substantially reduce the number of parameters. Since it models only the additive effect, we call it the additive effect model.
Property of model coefficients and association tests:
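As in traditional quantitative genetics, let a be the effect of genotype Q1Q1, d be the effect of genotype Q1Q2, and −a be the effect of genotype Q2Q2 (FALCONER and MACKAY 1996). Let $\alpha_Q = a + (q_2 - q_1)d$ be the average effect of gene substitution and $\delta_Q = 2d$ be the dominance deviation. In addition, let $\mu = a(q_1 - q_2) + 2dq_1q_2$ be the aggregate effect of the QTL on the trait mean in the population. For i = 1, 2, ..., m, let us denote $D_{A_iQ} = P(Q_1A_i) - p_{A_i}q_1$. These are measures of LD between QTL Q and marker A, where $P(Q_1A_i)$ is the frequency of haplotype $Q_1A_i$. In APPENDIX A, we show that the regression coefficients of model (1) are given by

$$\beta_{ij} = \mu + \alpha_Q\left(\frac{D_{A_iQ}}{p_{A_i}} + \frac{D_{A_jQ}}{p_{A_j}}\right) - \delta_Q\,\frac{D_{A_iQ}D_{A_jQ}}{p_{A_i}p_{A_j}}, \qquad (3)$$

and the regression coefficients of model (2) are given by

$$\alpha_i = \frac{\mu}{2} + \alpha_Q\,\frac{D_{A_iQ}}{p_{A_i}}, \quad i = 1, \ldots, m. \qquad (4)$$

Hence, $\beta_{ij} = \alpha_i + \alpha_j$ when $\delta_Q = 0$, i.e., when there is no dominance effect. Suppose that marker A and QTL Q are in linkage equilibrium; i.e., $D_{A_1Q} = \cdots = D_{A_mQ} = 0$. Then Equation 3 implies $\beta_{ij} = \mu$, and Equation 4 implies $\alpha_i = \mu/2$. Hence, models (1) and (2) are reduced to

$$y = w'\gamma + \mu + e. \qquad (5)$$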
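The Python sketch below is a numerical sanity check of the structure of Equations 3 and 4. It assumes HWE and random union of haplotypes. It computes the mean trait value of each marker genotype directly from the conditional QTL allele frequencies $P(Q_1 \mid A_i) = q_1 + D_{A_iQ}/p_{A_i}$. It then compares that mean with the closed form $\mu + \alpha_Q(u_i + u_j) - \delta_Q u_iu_j$, where $u_i = D_{A_iQ}/p_{A_i}$ and $\delta_Q = 2d$. It also checks that the dominance term has zero HWE-weighted covariance with a constant and with every allele count. That is why the additive coefficients reduce to $\mu/2 + \alpha_Q u_i$. The function names and example frequencies are hypothetical; this is an illustration, not the derivation in APPENDIX A.

```python
# Sketch: closed-form genotype means of model (1) vs. a direct computation.
# Assumes HWE at marker A, random union of haplotypes, and genotypic values
# a (Q1Q1), d (Q1Q2), -a (Q2Q2). All names here are illustrative.

def direct_means(q1, p, D, a, d):
    """Mean trait value of genotype AiAj from P(Q1 | Ai) = q1 + D_i / p_i."""
    pq = [q1 + Di / pi for Di, pi in zip(D, p)]
    m = len(p)
    means = {}
    for i in range(m):
        for j in range(i, m):
            x, y = pq[i], pq[j]
            means[(i, j)] = (a * x * y                          # Q1Q1
                             + d * (x * (1 - y) + y * (1 - x))  # Q1Q2
                             - a * (1 - x) * (1 - y))           # Q2Q2
    return means


def closed_form_means(q1, p, D, a, d):
    """mu + alpha_Q (u_i + u_j) - delta_Q u_i u_j, with u_i = D_i / p_i."""
    q2 = 1 - q1
    mu = a * (q1 - q2) + 2 * d * q1 * q2
    alpha_q = a + (q2 - q1) * d
    delta_q = 2 * d
    u = [Di / pi for Di, pi in zip(D, p)]
    m = len(p)
    return {(i, j): mu + alpha_q * (u[i] + u[j]) - delta_q * u[i] * u[j]
            for i in range(m) for j in range(i, m)}


def dominance_moments(p, D, d):
    """HWE-weighted moments of the dominance term -delta_Q u_i u_j with a
    constant and with each allele count; all vanish because sum(D) == 0."""
    delta_q = 2 * d
    u = [Di / pi for Di, pi in zip(D, p)]
    m = len(p)
    moments = [0.0] * (m + 1)  # [constant, count(A1), ..., count(Am)]
    for i in range(m):
        for j in range(i, m):
            f = p[i] ** 2 if i == j else 2 * p[i] * p[j]
            r = -delta_q * u[i] * u[j]
            moments[0] += f * r
            moments[i + 1] += f * r  # one copy of A_i
            moments[j + 1] += f * r  # one copy of A_j (A_i again if i == j)
    return moments


q1, p, D = 0.5, [0.2, 0.3, 0.5], [0.04, -0.01, -0.03]  # hypothetical values
direct = direct_means(q1, p, D, a=1.0, d=1.0)
closed = closed_form_means(q1, p, D, a=1.0, d=1.0)
print(max(abs(direct[g] - closed[g]) for g in direct))  # ~0 up to rounding
print(dominance_moments(p, D, d=1.0))                    # all ~0
```

The second check is the reason the additive effect model simply drops the dominance term, rather than absorbing part of it, when marker A is in HWE.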
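Assume that the additive genetic effect is significantly present but the dominance genetic effect is not; i.e., $\alpha_Q \neq 0$ but $\delta_Q = 0$. To test association between marker A and QTL Q, one may test the hypotheses $H_{a0}$: $\alpha_1 = \cdots = \alpha_m$ vs. $H_{a1}$: at least two $\alpha_i$'s are not equal.

To see this, note that $H_{a0}$: $\alpha_1 = \cdots = \alpha_m$ is equivalent to $D_{A_1Q}/p_{A_1} = \cdots = D_{A_mQ}/p_{A_m}$, since $\alpha_Q$ is significantly different from 0. Thus, under $H_{a0}$, $D_{A_iQ} = c\,p_{A_i}$ for some common constant c. Because the LD coefficients always sum to zero ($\sum_i P(Q_1A_i) = q_1$), this gives $0 = \sum_{i=1}^m D_{A_iQ} = c\sum_{i=1}^m p_{A_i} = c$. So $D_{A_1Q} = \cdots = D_{A_mQ} = 0$ under $H_{a0}$.

Hence, the hypotheses $H_{a0}$: $\alpha_1 = \cdots = \alpha_m$ vs. $H_{a1}$: at least two $\alpha_i$'s are not equal to each other are equivalent to $D_{A_1Q} = \cdots = D_{A_mQ} = 0$ vs. at least one $D_{A_iQ}$ is not equal to 0. Model (2) can therefore be used to map the QTL by association analysis.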
On the other hand, assume that both additive and dominance genetic effects are significantly present at the putative QTL Q; i.e., $\alpha_Q \neq 0$ and $\delta_Q \neq 0$. To test association between marker A and QTL Q, one may test the hypotheses $H_{ad0}$: $\beta_{11} = \cdots = \beta_{mm} = \beta_{12} = \cdots = \beta_{1m} = \cdots = \beta_{m-1,m}$ vs. $H_{ad1}$: at least two $\beta_{ij}$'s are not equal.
Relation to our previous work:
If marker A has only two alleles, A1 and A2, FAN and XIONG (2002) proposed the following model for association mapping of the QTL Q:

$$y = w'\gamma + \mu + x_A\alpha_A + z_A\delta_A + e, \qquad (6)$$

[Relations (7), which define the dummy variables $x_A$ and $z_A$, are not recoverable from the source.]

Here $\alpha_A$ and $\delta_A$ are the regression coefficients of the dummy variables $x_A$ and $z_A$; closed-form expressions for them in terms of the LD between A and Q are given in FAN and XIONG (2002). It can be shown that model (6) is equivalent to model (1), and the regression coefficients of the two models can be expressed in terms of each other. Similarly, model (2) is equivalent to $y = w'\gamma + \mu + x_H\alpha_A + e$, with a corresponding relation between the coefficients. The advantage of model (6) is that, when A is diallelic, the association effect is decomposed into the sum of additive and dominance effects. If A has more than two alleles, model (1) extends model (6), and model (2) extends the model $y = w'\gamma + \mu + x_H\alpha_A + e$.
Regression models:
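Assume that N individuals from a population are available for study. Let us list their trait values as $y_1, \ldots, y_N$ and their genotypes as $G_{A1}, \ldots, G_{AN}$. For individual k, let $x_{ii}(k)$ be the indicator function of genotype $A_iA_i$ and $x_{ij}(k)$ be the indicator function of genotype $A_iA_j$. That is, they are dummy variables defined by

$$x_{ii}(k) = \begin{cases} 1, & G_{Ak} = A_iA_i,\\ 0, & \text{otherwise,}\end{cases} \qquad x_{ij}(k) = \begin{cases} 1, & G_{Ak} = A_iA_j,\\ 0, & \text{otherwise,}\end{cases}$$

for $i \neq j$. Let $X_k = (x_{11}(k), \ldots, x_{mm}(k), x_{12}(k), \ldots, x_{m-1,m}(k))'$, k = 1, 2, ..., N; i.e., $X_k$ is a column vector of the genotype indicator functions of individual k. Here the superscript ′ denotes a vector/matrix transpose. Denote $\beta = (\beta_{11}, \ldots, \beta_{mm}, \beta_{12}, \ldots, \beta_{m-1,m})'$. The regression corresponding to model (1) can be written as

$$y_k = w_k'\gamma + X_k'\beta + e_k, \quad k = 1, 2, \ldots, N. \qquad (8)$$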
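Similarly, let $z_i(k)$ be the number of copies of allele $A_i$ in genotype $G_{Ak}$, i = 1, 2, ..., m, for individual k. That is, $z_i(k)$ is a dummy variable defined by

$$z_i(k) = \begin{cases} 2, & G_{Ak} = A_iA_i,\\ 1, & G_{Ak} = A_iA_j,\ j \neq i,\\ 0, & \text{otherwise,}\end{cases}$$

and $Z_k = (z_1(k), \ldots, z_m(k))'$. To use model (2) for data analysis, the corresponding regression model is

$$y_k = w_k'\gamma + Z_k'\alpha + e_k, \quad k = 1, 2, \ldots, N, \qquad (9)$$

where $\alpha = (\alpha_1, \ldots, \alpha_m)'$.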
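The Python sketch below shows the coding behind regressions (8) and (9). It assumes that genotypes are stored as unordered pairs of allele labels 1, ..., m; the helper names are hypothetical. It also makes the parameter count concrete: model (1) uses $J_A = m(m+1)/2$ genotype indicators, whereas model (2) uses m allele counts.

```python
from itertools import combinations


def genotype_order(m):
    """Genotype listing used for model (1): A1A1..AmAm, then A1A2..A(m-1)Am."""
    homozygotes = [(i, i) for i in range(1, m + 1)]
    heterozygotes = list(combinations(range(1, m + 1), 2))
    return homozygotes + heterozygotes


def x_vector(genotype, m):
    """X_k of regression (8): one 0/1 indicator per genotype class."""
    g = tuple(sorted(genotype))
    return [1 if g == cls else 0 for cls in genotype_order(m)]


def z_vector(genotype, m):
    """Z_k of regression (9): number of copies of A_1, ..., A_m."""
    return [list(genotype).count(i) for i in range(1, m + 1)]


m = 3
print(len(genotype_order(m)), "genotype classes; model (2) uses", m, "allele effects")
print(x_vector((3, 1), m))  # A1A3 -> [0, 0, 0, 0, 1, 0]
print(z_vector((3, 1), m))  # one A1, one A3 -> [1, 0, 1]
```

Every $X_k$ contains exactly one 1, and every $Z_k$ sums to 2. This is why regression (9) needs no separate intercept when there are no covariates.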
F-tests and noncentrality parameter approximations:
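It is well known that the additive variance is $\sigma_a^2 = 2q_1q_2\alpha_Q^2$ and the dominance variance is $\sigma_d^2 = (2q_1q_2d)^2$. Let $\sigma^2$ be the total variance of the trait. Assume that there are no covariates. Let us denote $y = (y_1, \ldots, y_N)'$, $X = (X_1, \ldots, X_N)'$, and $e = (e_1, \ldots, e_N)'$. Then model (8) can be expressed as $y = X\beta + e$. By standard regression theory, the coefficients can be estimated by $\hat\beta = (X'X)^{-1}X'y$. Let H be the $(J_A - 1) \times J_A$ matrix defined by

$$H = (\mathbf{1}_{J_A-1},\; -I_{J_A-1}),$$

where $\mathbf{1}_{J_A-1}$ is a column vector of ones and $I_{J_A-1}$ is the identity matrix. Then $(H\beta)' = (\beta_{11} - \beta_{22}, \ldots, \beta_{11} - \beta_{mm}, \beta_{11} - \beta_{12}, \ldots, \beta_{11} - \beta_{1m}, \ldots, \beta_{11} - \beta_{m-1,m})$. Hence, the hypothesis $H_{ad0}$ is equivalent to $H\beta = (0, \ldots, 0)'$. From GRAYBILL (1976), Chap. 6, the test statistic of the hypothesis $H_{ad0}$ is the noncentral $F(J_A - 1, N - J_A)$ statistic defined by

$$F_{m,ad} = \frac{(H\hat\beta)'[H(X'X)^{-1}H']^{-1}(H\hat\beta)/(J_A - 1)}{(y - X\hat\beta)'(y - X\hat\beta)/(N - J_A)},$$

with noncentrality parameter $\lambda_{m,ad} = (H\beta)'[H(X'X)^{-1}H']^{-1}(H\beta)/\sigma^2$. Under the assumption of a large sample size N, we show in APPENDIX C the approximation

$$\lambda_{m,ad} \approx \frac{N\left(R_{AQ}^2\sigma_a^2 + R_{AQ}^4\sigma_d^2\right)}{\sigma^2}, \qquad (10)$$

where $R_{AQ}^2 = \sum_{i=1}^m D_{A_iQ}^2/(p_{A_i}q_1q_2)$ is a measure of LD between marker A and QTL Q (CROW and KIMURA 1970; HEDRICK 1987; MORTON and WU 1988; SHAM et al. 2000). Note that $R_{AQ}^2$ is the $\chi^2$-statistic of the m × 2 table of haplotype frequencies of marker A and trait locus Q. Approximation (10) shows that the noncentrality parameter of the test of no genetic effects in model (1) is reduced by a factor of $R_{AQ}^2$ for the additive variance and by a factor of $R_{AQ}^4$ for the dominance variance.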
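With no covariates, model (8) has one free mean per genotype class. The general linear hypothesis statistic $F_{m,ad}$ therefore coincides with the one-way analysis-of-variance F statistic across genotype classes. The Python sketch below uses that equivalence. It is our illustration, not the CLAM_QTL code. It assumes that every genotype class is observed; otherwise the degrees of freedom count only the classes present in the sample. The function name and toy data are hypothetical.

```python
from collections import defaultdict


def f_genotype_effect(traits, genotypes):
    """F statistic for H_ad0 (all genotype means equal) with no covariates.

    Regression (8) without covariates is a cell-means model, so the general
    linear hypothesis F reduces to a one-way ANOVA over genotype classes.
    Returns (F, (numerator df, denominator df)), where the df use the number
    J of genotype classes observed (J = J_A when all classes occur).
    """
    groups = defaultdict(list)
    for y, g in zip(traits, genotypes):
        groups[tuple(sorted(g))].append(y)  # A1A2 and A2A1 are one class
    n, J = len(traits), len(groups)
    grand = sum(traits) / n
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    ss_between = sum(len(v) * (means[g] - grand) ** 2 for g, v in groups.items())
    ss_within = sum((y - means[g]) ** 2 for g, v in groups.items() for y in v)
    F = (ss_between / (J - 1)) / (ss_within / (n - J))
    return F, (J - 1, n - J)


traits = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
genos = [(1, 1), (1, 1), (1, 2), (2, 1), (2, 2), (2, 2)]
print(f_genotype_effect(traits, genos))  # (16.0, (2, 3))
```

With covariates, the ANOVA shortcut no longer applies, and the full matrix form above (or a comparison of residual sums of squares of nested regressions) is needed.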
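Similarly, let us denote $Z = (Z_1, \ldots, Z_N)'$. Then model (9) can be expressed as $y = Z\alpha + e$. The coefficients can be estimated by $\hat\alpha = (Z'Z)^{-1}Z'y$. Let K be the $(m - 1) \times m$ matrix defined by

$$K = (\mathbf{1}_{m-1},\; -I_{m-1}).$$

Then $(K\alpha)' = (\alpha_1 - \alpha_2, \ldots, \alpha_1 - \alpha_m)$. Hence, the hypothesis $H_{a0}$ is equivalent to $K\alpha = (0, \ldots, 0)'$. From GRAYBILL (1976), Chap. 6, the test statistic of the hypothesis $H_{a0}$ is the noncentral $F(m - 1, N - m)$ statistic defined by

$$F_{m,a} = \frac{(K\hat\alpha)'[K(Z'Z)^{-1}K']^{-1}(K\hat\alpha)/(m - 1)}{(y - Z\hat\alpha)'(y - Z\hat\alpha)/(N - m)},$$

with noncentrality parameter $\lambda_{m,a} = (K\alpha)'[K(Z'Z)^{-1}K']^{-1}(K\alpha)/\sigma^2$. Under the assumption of a large sample size N, we show in APPENDIX D the following approximation:

$$\lambda_{m,a} \approx \frac{N R_{AQ}^2\sigma_a^2}{\sigma^2}. \qquad (11)$$

Hence, $\lambda_{m,a}$ is reduced by a factor of $R_{AQ}^2$ for the additive variance. The dominance variance is not present in $\lambda_{m,a}$.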
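The Python sketch below evaluates the LD measure $R_{AQ}^2$ in two ways: from the closed form, and as the Pearson $\chi^2$ computed on the m × 2 table of haplotype frequencies. It then computes the large-sample noncentrality parameters. Here we take $\lambda_{m,ad} \approx N(R_{AQ}^2\sigma_a^2 + R_{AQ}^4\sigma_d^2)/\sigma^2$ and $\lambda_{m,a} \approx NR_{AQ}^2\sigma_a^2/\sigma^2$ as the forms of approximations (10) and (11). As a cross-check, `ncp_from_means` directly computes N times the HWE-weighted variance of the genotype means, divided by $\sigma^2$. The function names, the LD pattern, and $\sigma^2 = 3.0$ are hypothetical.

```python
def r2_aq(q1, p, D):
    """R_AQ^2 = sum_i D_i^2 / (p_i q1 q2)."""
    q2 = 1 - q1
    return sum(Di ** 2 / (pi * q1 * q2) for Di, pi in zip(D, p))


def chi2_of_table(q1, p, D):
    """Pearson chi-square on the m x 2 table of haplotype frequencies
    P(A_i Q_l), with expected values p_i q_l."""
    q = (q1, 1 - q1)
    total = 0.0
    for pi, Di in zip(p, D):
        observed = (pi * q[0] + Di, pi * q[1] - Di)  # P(A_i Q1), P(A_i Q2)
        for obs, ql in zip(observed, q):
            total += (obs - pi * ql) ** 2 / (pi * ql)
    return total


def ncp_approx(N, q1, p, D, a, d, sigma2):
    """Large-sample noncentrality parameters (lambda_ad, lambda_a)."""
    q2 = 1 - q1
    alpha_q = a + (q2 - q1) * d
    var_a = 2 * q1 * q2 * alpha_q ** 2
    var_d = (2 * q1 * q2 * d) ** 2
    R2 = r2_aq(q1, p, D)
    return N * (R2 * var_a + R2 ** 2 * var_d) / sigma2, N * R2 * var_a / sigma2


def ncp_from_means(N, q1, p, D, a, d, sigma2):
    """N * Var(genotype means) / sigma2 over HWE genotype frequencies."""
    pq = [q1 + Di / pi for Di, pi in zip(D, p)]  # P(Q1 | A_i)
    m = len(p)
    pairs = [(i, j) for i in range(m) for j in range(i, m)]
    freq = {(i, j): p[i] ** 2 if i == j else 2 * p[i] * p[j] for i, j in pairs}
    mean = {(i, j): a * (pq[i] + pq[j] - 1)
            + d * (pq[i] + pq[j] - 2 * pq[i] * pq[j]) for i, j in pairs}
    mbar = sum(freq[g] * mean[g] for g in pairs)
    return N * sum(freq[g] * (mean[g] - mbar) ** 2 for g in pairs) / sigma2


q1, p, D = 0.5, [0.25] * 4, [0.05, 0.05, -0.05, -0.05]  # hypothetical
lam_ad, lam_a = ncp_approx(200, q1, p, D, a=1.0, d=1.0, sigma2=3.0)
print(r2_aq(q1, p, D), chi2_of_table(q1, p, D))
print(lam_ad, lam_a, ncp_from_means(200, q1, p, D, a=1.0, d=1.0, sigma2=3.0))
```

The agreement between `ncp_approx` and `ncp_from_means` reflects the fact that, under HWE, the additive and dominance parts of the genotype means are uncorrelated.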
Analysis by two/multiple markers:
Population models and association tests:
If genetic data on two or more markers are available, models (1) and (2) can be extended for association studies of QTL. Importantly, data on two or more markers may contain phase ambiguity, i.e., double heterozygotes of unknown phase. In the following, we generalize models (1) and (2) to analyze genetic data on two markers directly; the same principle applies to multiple-marker data.
In addition to marker A, assume that a second marker B is typed, which has n alleles denoted by B1, ..., Bn. Suppose that marker B is also in Hardy-Weinberg equilibrium. Let the frequency of allele $B_k$ be $p_{B_k}$. There are $J_B = n(n + 1)/2$ possible genotypes, which can be listed as $B_1B_1, \ldots, B_nB_n, B_1B_2, \ldots, B_1B_n, \ldots, B_{n-1}B_n$. Let y be the trait value of an individual with genotype $G_A$ at marker A and genotype $G_B$ at marker B. Analogous to relations (7), define coding variables $x_{Ai}$ and $z_{Aij}$ for marker A and $x_{Bk}$ and $z_{Bkl}$ for marker B.

[Relations (12), which define these coding variables, are not recoverable from the source.]
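To extend model (2) by using both markers A and B in the analysis, consider the following model:

$$y = w'\gamma + \xi + \sum_{i=1}^{m-1}\alpha_{Ai}x_{Ai} + \sum_{k=1}^{n-1}\alpha_{Bk}x_{Bk} + e. \qquad (13)$$

In addition to the covariate effects, model (13) has $m + n - 1$ parameters: $\xi$, $\alpha_{Ai}$, and $\alpha_{Bk}$, i = 1, ..., m − 1, k = 1, ..., n − 1. To see why model (13) extends model (2), note that model (2) is equivalent to $y = w'\gamma + \xi + \sum_{i=1}^{m-1}\alpha_{Ai}x_{Ai} + e$. Indeed, if only the information of marker A is used in the analysis, the parameters of the two forms are related by $\alpha_m = \xi/2$ and $\alpha_i = \alpha_{Ai} + \xi/2$, i = 1, ..., m − 1. Like model (2), model (13) takes only the additive effect into account; hence, we call it an additive effect model. Similarly, model (1) can be extended to

$$y = w'\gamma + \xi + \sum_{i=1}^{m-1}\alpha_{Ai}x_{Ai} + \sum_{k=1}^{n-1}\alpha_{Bk}x_{Bk} + \sum_{1 \le i < j \le m}\delta_{Aij}z_{Aij} + \sum_{1 \le k < l \le n}\delta_{Bkl}z_{Bkl} + e. \qquad (14)$$

In addition to the covariate effects, model (14) has $J_A + J_B - 1$ parameters: $\xi$, $\alpha_{Ai}$, $\alpha_{Bk}$, $\delta_{Aij}$, and $\delta_{Bkl}$. Model (14) takes both additive and dominance effects into account, and it is called the genotype effect model. Again, model (1) is equivalent to $y = w'\gamma + \xi + \sum_{i=1}^{m-1}\alpha_{Ai}x_{Ai} + \sum_{1 \le i < j \le m}\delta_{Aij}z_{Aij} + e$.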
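The reparameterization between model (2) and the marker-A part of model (13) can be checked numerically. The coding of $x_{Ai}$ is an assumption here: we take $x_{Ai}$ to count the copies of $A_i$, for i = 1, ..., m − 1, with $A_m$ as the reference allele. Under that assumption, $\alpha_m = \xi/2$ and $\alpha_i = \alpha_{Ai} + \xi/2$ reproduce every genotype mean of model (2). All names and effect values are hypothetical.

```python
from itertools import combinations_with_replacement


def mean_model2(alpha, genotype):
    """Additive effect model (2): alpha_i + alpha_j for genotype AiAj."""
    i, j = genotype
    return alpha[i - 1] + alpha[j - 1]


def mean_reduced(xi, alpha_A, genotype):
    """xi + sum_i alpha_Ai x_Ai, ASSUMING x_Ai counts copies of A_i
    (i = 1..m-1), so that A_m is the reference allele."""
    return xi + sum(a * genotype.count(i + 1) for i, a in enumerate(alpha_A))


def to_reduced(alpha):
    """Invert alpha_m = xi/2 and alpha_i = alpha_Ai + xi/2."""
    xi = 2 * alpha[-1]
    return xi, [a - xi / 2 for a in alpha[:-1]]


alpha = [0.3, -0.1, 0.7, 0.2]  # hypothetical allele effects, m = 4
xi, alpha_A = to_reduced(alpha)
for g in combinations_with_replacement(range(1, 5), 2):
    assert abs(mean_model2(alpha, g) - mean_reduced(xi, alpha_A, g)) < 1e-12
print(xi, alpha_A)
```

Dropping one allele as a reference removes the linear dependence among allele counts (they always sum to 2), which is why model (13) can carry an explicit intercept $\xi$.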
Denote $X_A = (x_{A1}, \ldots, x_{A(m-1)})'$, $X_B = (x_{B1}, \ldots, x_{B(n-1)})'$, and $X_{AB} = (X_A', X_B')'$. Let us denote the additive variance-covariance matrix of the indicator variables $x_{Ai}$, $x_{Bk}$ by $V_A = \mathrm{Cov}(X_{AB}, X_{AB})$. Similarly, let $Z_A = (z_{A12}, \ldots, z_{A1m}, z_{A23}, \ldots, z_{A2m}, \ldots, z_{A(m-1)m})'$, $Z_B = (z_{B12}, \ldots, z_{B1n}, z_{B23}, \ldots, z_{B2n}, \ldots, z_{B(n-1)n})'$, and $Z_{AB} = (Z_A', Z_B')'$. Let us denote the dominance variance-covariance matrix of the indicator variables $z_{Aij}$, $z_{Bkl}$ by $V_D = \mathrm{Cov}(Z_{AB}, Z_{AB})$. For k = 1, 2, ..., n, let us denote $D_{B_kQ} = P(Q_1B_k) - p_{B_k}q_1$, which are measures of LD between QTL Q and marker B. In APPENDIX E, we show that the regression coefficients of models (13) and (14) are given by

[Equations (15) are not recoverable from the source.]

Equations 15 show that both the measures of LD and the gene effects (i.e., $\alpha_Q$ and $\delta_Q$) are contained in the regression coefficients. Models (13) and (14) thus account simultaneously for the LD and for the effects of the putative trait locus Q. The gene substitution effect $\alpha_Q$ is contained only in $\alpha_{Ai}$ and $\alpha_{Bk}$, and the dominance effect $\delta_Q$ is contained only in $\delta_{Aij}$ and $\delta_{Bkl}$. Therefore, $V_A$ is called the additive variance-covariance matrix, and $V_D$ is called the dominance variance-covariance matrix. Model (14) orthogonally decomposes the genetic effect into a sum of additive and dominance effects.
In FAN and XIONG (2002), regression models were proposed for LD mapping of QTL with diallelic markers; models (13) and (14) extend them to multiallelic markers. On the basis of Equations 15, models (13) and (14) can be used to test association between the trait locus Q and the two markers A and B.

Assume that the additive genetic effect is significantly present but the dominance genetic effect is not; i.e., $\alpha_Q \neq 0$ but $\delta_Q = 0$. To test association between markers A and B and QTL Q, one may test the hypotheses $H_{ABa0}$: $\alpha_{A1} = \cdots = \alpha_{A(m-1)} = \alpha_{B1} = \cdots = \alpha_{B(n-1)} = 0$ vs. $H_{ABa1}$: at least one $\alpha_{Ai}$ or $\alpha_{Bk}$ is not equal to 0. To see this, note that $H_{ABa0}$ is equivalent to $D_{A_1Q} = \cdots = D_{A_mQ} = D_{B_1Q} = \cdots = D_{B_nQ} = 0$, since $\alpha_Q$ is significantly different from 0.

On the other hand, assume that both additive and dominance genetic effects are significantly present at the putative QTL Q; i.e., $\alpha_Q \neq 0$ and $\delta_Q \neq 0$. To test association between markers A and B and QTL Q, one may test the hypothesis $H_{ABad0}$: $\alpha_{A1} = \cdots = \alpha_{A(m-1)} = \alpha_{B1} = \cdots = \alpha_{B(n-1)} = \delta_{A12} = \cdots = \delta_{A1m} = \cdots = \delta_{A(m-1)m} = \delta_{B12} = \cdots = \delta_{B1n} = \cdots = \delta_{B(n-1)n} = 0$ vs. $H_{ABad1}$: at least one $\alpha_{Ai}$, $\alpha_{Bk}$, $\delta_{Aij}$, or $\delta_{Bkl}$ is not equal to 0. This test is appropriate here because both $\alpha_Q$ and $\delta_Q$ are significantly different from 0.
Regression models, F-tests, and noncentrality parameter approximations:
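Assume that N individuals from a population are available for study, with trait values $y_1, \ldots, y_N$, genotypes $G_{A1}, \ldots, G_{AN}$ at marker A, and genotypes $G_{B1}, \ldots, G_{BN}$ at marker B. For individual s, let $x_{Ai}(s)$, $x_{Bk}(s)$, $z_{Aij}(s)$, and $z_{Bkl}(s)$ be the corresponding coding functions of genotypes $G_{As}$ and $G_{Bs}$. Let us denote $X_s = (1, x_{A1}(s), \ldots, x_{A(m-1)}(s), x_{B1}(s), \ldots, x_{B(n-1)}(s))'$ and $Z_s = (z_{A12}(s), \ldots, z_{A(m-1)m}(s), z_{B12}(s), \ldots, z_{B(n-1)n}(s))'$. Denote $\alpha_{AB} = (\xi, \alpha_{A1}, \ldots, \alpha_{A(m-1)}, \alpha_{B1}, \ldots, \alpha_{B(n-1)})'$ and $\delta_{AB} = (\delta_{A12}, \ldots, \delta_{A(m-1)m}, \delta_{B12}, \ldots, \delta_{B(n-1)n})'$. The regression corresponding to model (14) can be written as

$$y_s = w_s'\gamma + X_s'\alpha_{AB} + Z_s'\delta_{AB} + e_s, \quad s = 1, \ldots, N, \qquad (16)$$

and the regression corresponding to model (13) is obtained by dropping the term $Z_s'\delta_{AB}$. On the basis of regression (16), one may construct an F-test statistic $F_{AB,ad}$ to test the null hypothesis $H_{ABad0}$, in the same way as $F_{m,ad}$ or $F_{m,a}$ (GRAYBILL 1976, Chap. 6). Under $H_{ABad0}$, $F_{AB,ad}$ follows the central $F(J_A + J_B - 2, N - J_A - J_B + 1)$ distribution. Assume that the sample size N is large enough for large-sample theory to apply. Under the alternative hypothesis $H_{ABad1}$, $F_{AB,ad}$ follows the noncentral $F(J_A + J_B - 2, N - J_A - J_B + 1)$ distribution, and it can be shown that the corresponding noncentrality parameter is approximated by

[The displayed approximations for $F_{AB,ad}$, and for the analogous test $F_{AB,a}$ of $H_{ABa0}$ based on model (13), are not recoverable from the source.]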
The haplotype trend regression method:
If only one marker A is used in the analysis, the proposed model (2) is equivalent to the HTR method of ZAYKIN et al. (2002). For data on two or more markers, however, the proposed models differ from the haplotype trend regression method. Assume that M markers are typed in a region of the trait locus Q. Assume also that, on the basis of the genotypes at these markers, J haplotypes $h_1, \ldots, h_J$ can be determined, with frequencies $P(h_1), \ldots, P(h_J)$. For each individual, we may define an expected haplotype score vector as follows (SCHAID et al. 2002; ZAYKIN et al. 2002). The expected haplotype score vector is a column vector $(c_1, \ldots, c_J)'$ of J elements, based on the genotype combination $(G_1, \ldots, G_M)$ of an individual at the markers. For instance, the score vector is $(1, 0, \ldots, 0)'$ if the haplotype pair $h_1/h_1$ is the only possible phase of the genotype combination $(G_1, \ldots, G_M)$. In general, $c_j$ is the conditional probability of haplotype $h_j$ given the genotype combination $(G_1, \ldots, G_M)$ at the markers; i.e.,

$$c_j = P(h_j \mid G_1, \ldots, G_M), \quad j = 1, \ldots, J.$$

Hence, the sum $\sum_{j=1}^J c_j$ of the expected haplotype scores is equal to 1.
For the purpose of explanation, consider two diallelic markers A and B. Let us denote the two alleles of marker A by A1 and A2, and the two alleles of marker B by B1 and B2. Table 1 gives the score vector for each genotype combination of markers A and B. To understand the entries of Table 1, consider the genotype combination ($G_A = A_1A_1$, $G_B = B_1B_1$) as an example. Two copies of haplotype A1B1 can be formed from this genotype combination, so the score for haplotype A1B1 is 1 and the scores for the other three haplotypes are all 0. Denote the genotype of an individual at marker A by $G_A$ and the genotype at marker B by $G_B$. Let us denote

$$c_1 = P(A_1B_1 \mid G_A = A_1A_2, G_B = B_1B_2) = \frac{P(A_1B_1)P(A_2B_2)}{2P(A_1B_1)P(A_2B_2) + 2P(A_1B_2)P(A_2B_1)} = c_4;$$

i.e., $c_1$ is the conditional probability of haplotype A1B1 given the double heterozygote ($G_A = A_1A_2$, $G_B = B_1B_2$). Similarly, let

$$c_2 = P(A_1B_2 \mid G_A = A_1A_2, G_B = B_1B_2) = \frac{P(A_1B_2)P(A_2B_1)}{2P(A_1B_1)P(A_2B_2) + 2P(A_1B_2)P(A_2B_1)} = c_3.$$

For the double heterozygote ($G_A = A_1A_2$, $G_B = B_1B_2$), the expected scores are $c_1, c_2, c_2, c_1$ for haplotypes A1B1, A1B2, A2B1, and A2B2. The scores of the other genotype combinations are provided in Table 1. The corresponding model of the haplotype trend regression method can then be written as

$$y = w'\gamma + \beta_1 I_{A_1B_1} + \beta_2 I_{A_1B_2} + \beta_3 I_{A_2B_1} + \beta_4 I_{A_2B_2} + e, \qquad (17)$$

where $I_{A_iB_k}$ denotes the expected score of haplotype $A_iB_k$ given in Table 1.
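The score entries of Table 1 can be generated mechanically. Enumerate every ordered pair of haplotypes compatible with the two-marker genotype, and weight each pair by the product of the haplotype frequencies (assuming HWE). Then give each haplotype in a pair half of that pair's weight. The Python sketch below does this for two diallelic markers. The function name and frequencies are hypothetical. In practice, the haplotype frequencies would first have to be estimated, for example by an EM algorithm.

```python
from itertools import product


def expected_scores(gA, gB, hap_freq):
    """Expected haplotype scores (conditional probabilities summing to 1)
    for genotype combination (gA, gB) at two markers.

    hap_freq maps a haplotype (allele at A, allele at B) to its population
    frequency; random union of haplotypes (HWE) is assumed.
    """
    scores = {h: 0.0 for h in hap_freq}
    total = 0.0
    for h1, h2 in product(hap_freq, repeat=2):  # ordered haplotype pairs
        if (sorted((h1[0], h2[0])) == sorted(gA)
                and sorted((h1[1], h2[1])) == sorted(gB)):
            w = hap_freq[h1] * hap_freq[h2]
            total += w
            scores[h1] += 0.5 * w  # each haplotype of the pair gets
            scores[h2] += 0.5 * w  # half of the pair's weight
    return {h: s / total for h, s in scores.items()}


freq = {(1, 1): 0.4, (1, 2): 0.1, (2, 1): 0.2, (2, 2): 0.3}  # hypothetical
dh = expected_scores((1, 2), (1, 2), freq)  # double heterozygote: c1, c2, c2, c1
print(dh)
print(expected_scores((1, 1), (1, 1), freq))  # A1A1, B1B1: score 1 for A1B1
```

For the double heterozygote, this reproduces $c_1 = P(A_1B_1)P(A_2B_2)/[2P(A_1B_1)P(A_2B_2) + 2P(A_1B_2)P(A_2B_1)]$. Only for that genotype combination does the phase ambiguity make the scores depend on the haplotype frequencies.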
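In the general case of M markers, let $I_j$ be the expected score of haplotype $h_j$, j = 1, 2, ..., J. In terms of conditional probabilities, $I_j$ can be expressed as

$$I_j = P(h_j \mid G_1, \ldots, G_M).$$

The haplotype trend regression model is then

$$y = w'\gamma + \sum_{j=1}^{J}\beta_j I_j + e. \qquad (18)$$

For j = 1, ..., J, let us denote $D_{h_jQ} = P(Q_1h_j) - P(h_j)q_1$, which are measures of LD between QTL Q and the haplotypes. Here $P(Q_1h_j)$ is the frequency of haplotype $Q_1h_j$. In APPENDIX F, we show that the regression coefficients of model (18) satisfy the matrix equation

[Equation (19) and the accompanying definitions are not recoverable from the source.]

Suppose that the haplotypes and QTL Q are in linkage equilibrium; i.e., $D_{h_1Q} = \cdots = D_{h_JQ} = 0$. Then Equation 19 implies $\beta_1 = \cdots = \beta_J = \mu$. Hence, model (18) is reduced to (5). To test association between the haplotypes and the trait locus, one may test the null hypothesis $\beta_1 = \cdots = \beta_J$ with the related F-test statistic.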
Again, assume that N individuals from a population are available for study, with trait values and genotype information. On the basis of regression (18), one may construct an F-test statistic $F_{HTR}$ to test the null hypothesis $\beta_1 = \cdots = \beta_J$ (GRAYBILL 1976); under linkage equilibrium, the common value is $\mu$. Under the null hypothesis, $F_{HTR}$ follows the central $F(J - 1, N - J)$ distribution. Under the alternative hypothesis that at least two $\beta_j$'s are not equal to each other, $F_{HTR}$ follows the noncentral $F(J - 1, N - J)$ distribution. Assume that the sample size N is large enough for large-sample theory to apply. Then it can be shown that the corresponding noncentrality parameter is approximated by

[The displayed approximation is not recoverable from the source.]

The advantage of model (17) is that it models the haplotype effects directly through the parameters $\beta_j$. In practice, however, the expected scores or haplotype frequencies must be calculated before the haplotype trend regression model can be built. In contrast, the proposed models (13) and (14) can be used to analyze genetic data directly. Moreover, we have derived analytical formulas for the regression coefficients of the HTR method and for the noncentrality parameter of the test statistic $F_{HTR}$. These quantities were not worked out in the original article by ZAYKIN et al. (2002). Our analytical coefficient equations and noncentrality parameter approximations can be readily used for power evaluation.
RESULTS
Type I error rates:
To evaluate the robustness of the proposed models, we calculate the type I error rates of the test statistics $F_{m,ad}$, $F_{m,a}$, $F_{AB,ad}$, $F_{AB,a}$, and $F_{HTR}$ at the 0.05 significance level. The results are presented in Tables 2 and 3. Four test cases are considered:

- null: no major gene effect (a = d = 0);
- additive: an additive mode of inheritance (a = 1, d = 0);
- dominant: a dominant mode of inheritance (a = d = 1);
- recessive: a recessive mode of inheritance (a = 1 and d = 0.5).

The total variance is fixed at $\sigma^2 = 1.0$. The trait allele frequency is taken as $q_1 = q_2 = 0.5$ except in the null test case. In Table 2, only one marker A is used in the analysis, and the number m of alleles ranges from 2 to 6. The allele frequencies are specified separately for m = 2, 3, 4, 5, and 6 [values not recoverable from the source].
To calculate the type I error rates, 10,000 data sets are simulated for each test case, each containing either 200 or 300 individuals. In each test case in Table 2, the data sets are generated under the assumption of linkage equilibrium between QTL Q and marker A; i.e., $D_{A_1Q} = \cdots = D_{A_mQ} = 0$. That is, there is no association between QTL Q and marker A. For each data set, we fit either model (8) or model (9) and calculate the F-test statistic $F_{m,ad}$ or $F_{m,a}$. Because the data sets are generated under linkage equilibrium, an empirical test statistic larger than the critical value of the related F-distribution at the 0.05 significance level is treated as a false positive. Type I error rates are calculated as the proportions of the 10,000 simulated data sets that give significant results at the 0.05 significance level.
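The Python sketch below shows one way a single null data set for Table 2 might be generated. Each individual receives two haplotypes drawn from the marker-QTL haplotype frequencies. Under linkage equilibrium, those frequencies factor as $p_{A_i}q_l$. The function name, the residual standard deviation, and the frequencies are hypothetical; the original simulation settings are not reproduced here.

```python
import random


def simulate_sample(N, hap_freq, a, d, sigma_e, rng):
    """Simulate N individuals for one marker A and the QTL Q.

    hap_freq maps (marker allele, QTL allele) -> haplotype frequency. Each
    individual receives two haplotypes drawn from hap_freq (random union of
    haplotypes). Genotypic values are a (Q1Q1), d (Q1Q2), -a (Q2Q2), plus
    normal noise with standard deviation sigma_e.
    """
    haps = list(hap_freq)
    weights = [hap_freq[h] for h in haps]
    value = {2: a, 1: d, 0: -a}  # keyed by the number of Q1 alleles
    traits, genotypes = [], []
    for _ in range(N):
        h1, h2 = rng.choices(haps, weights=weights, k=2)
        n_q1 = (h1[1] == 1) + (h2[1] == 1)
        traits.append(value[n_q1] + rng.gauss(0.0, sigma_e))
        genotypes.append(tuple(sorted((h1[0], h2[0]))))  # unphased marker genotype
    return traits, genotypes


# Linkage equilibrium: P(A_i Q_l) = p_i * q_l, so the marker carries no signal.
p, q = [0.2, 0.3, 0.5], [0.5, 0.5]  # hypothetical frequencies
le = {(i + 1, l + 1): p[i] * q[l] for i in range(3) for l in range(2)}
rng = random.Random(2006)
traits, genos = simulate_sample(200, le, a=1.0, d=1.0, sigma_e=0.8, rng=rng)
print(len(traits), genos[:5])
```

Repeating this 10,000 times gives the empirical type I error rate. For each data set, one computes $F_{m,ad}$ or $F_{m,a}$. One then counts how often the statistic exceeds the 0.95 quantile of the matching F distribution, which is available, for example, from SciPy's `scipy.stats.f.ppf`.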
For the test statistic $F_{m,a}$, the Table 2 results show that the type I error rates are around the 0.05 nominal significance level in all cases. Hence, the proposed model (9) is robust for data sets of sample size N = 200. For the test statistic $F_{m,ad}$, the type I error rates are around the 0.05 nominal significance level when $m \le 5$, for data sets of sample size N = 200. For m = 6 and N = 200, the type I error rates of $F_{m,ad}$ are inflated in the dominant and recessive test cases (9.11 and 7.04%, respectively). This is partly due to the large number of degrees of freedom of $F_{m,ad}$ when m = 6, namely $J_A - 1 = m(m + 1)/2 - 1 = 20$. The inflation may also be related to the mode of inheritance, since it occurs in the dominant and recessive cases. When the sample size increases to N = 300, the type I error rates of $F_{m,ad}$ are around the 0.05 nominal significance level for m = 6. Model (8) is therefore less robust than model (9).
In Table 3, two markers A and B are used in the analysis, each with two alleles (m = n = 2). The allele frequencies of A and B are fixed [values not recoverable from the source]. In each test case, linkage equilibrium is assumed between QTL Q and markers A and B; i.e., $D_{A_1Q} = D_{B_1Q} = 0$. Denote $D_{AB} = P(A_1B_1) - p_{A_1}p_{B_1}$, which is the measure of LD between A and B. Here $P(A_1B_1)$ is the frequency of haplotype A1B1. Relation (20) expresses the haplotype frequencies of A, B, and Q in terms of the allele frequencies and LD coefficients [display not recoverable from the source]. Two scenarios are considered for the markers: (1) linkage equilibrium, i.e., $D_{AB} = 0$, and (2) linkage disequilibrium, i.e., $D_{AB} \neq 0$. No third-order linkage disequilibrium is assumed among markers A and B and QTL Q; that is, $D_{AQB} = 0$.

Again, 10,000 data sets are simulated for each test case, each containing 200 individuals. The simulation proceeds as follows. First, the haplotype frequencies are calculated from the allele frequencies and LD coefficients by relation (20) (THOMSON and BAUR 1984). Then data sets are simulated using these haplotype frequencies. On the basis of $F_{AB,ad}$, $F_{AB,a}$, or the HTR method, type I error rates are calculated as the proportions of the 10,000 simulated data sets that give significant results at the 0.05 significance level. The Table 3 results show that the type I error rates are around the 0.05 nominal significance level in all cases. Hence, the proposed models (13) and (14) and the HTR method are robust for data sets of sample size N = 200.

Table 4 shows the type I error rates (percentages) of the test statistics $F_{ABC,ad}$, $F_{ABC,a}$, and $F_{HTR}$ at the 0.05 significance level when three diallelic markers A, B, and C are used in the analysis. The third-order LD measures $D_{ABC}$, $D_{AQC}$, and $D_{BQC}$ are defined in the same way as $D_{AQB}$, and the fourth-order measure is defined accordingly (BENNETT 1954). As with relation (20), the haplotype frequencies at the three markers A, B, and C and at QTL Q are calculated from the allele frequencies and LD coefficients, here by relation (3.14) of WEIR (1996, p. 119). Data sets are then simulated using these haplotype frequencies. Since this article concerns population data, each individual carries two haplotypes, each sampled according to the haplotype frequencies. The Table 4 results show that the proposed models and the HTR method give correct type I error rates for data sets of sample size N = 200.
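We take relation (20) to be the standard decomposition of three-locus haplotype frequencies into allele frequencies, pairwise LD, and third-order LD (in the sense of BENNETT 1954); that reading is our assumption. The Python sketch below computes the eight haplotype frequencies for diallelic loci A, B, and Q. The function name and parameter values are hypothetical.

```python
from itertools import product


def three_locus_freqs(pA, pB, pQ, D_AB, D_AQ, D_BQ, D_ABQ=0.0):
    """Haplotype frequencies P(A_i B_k Q_l) for three diallelic loci.

    pX is the frequency of allele 1 at locus X, and each D is defined for
    allele 1 at every locus involved. Replacing allele 1 by allele 2 at a
    locus flips the sign of every LD term that involves that locus.
    """
    fA, fB, fQ = (pA, 1 - pA), (pB, 1 - pB), (pQ, 1 - pQ)
    s = (1, -1)
    freqs = {}
    for i, k, l in product((0, 1), repeat=3):
        freqs[(i + 1, k + 1, l + 1)] = (
            fA[i] * fB[k] * fQ[l]
            + fA[i] * s[k] * s[l] * D_BQ
            + fB[k] * s[i] * s[l] * D_AQ
            + fQ[l] * s[i] * s[k] * D_AB
            + s[i] * s[k] * s[l] * D_ABQ
        )
    return freqs


# Null scenario: QTL in linkage equilibrium with both markers, D_AQB = 0,
# and the two markers in LD with each other (hypothetical values).
h = three_locus_freqs(pA=0.6, pB=0.5, pQ=0.5, D_AB=0.1, D_AQ=0.0, D_BQ=0.0)
print({key: round(v, 4) for key, v in h.items()})
```

Summing over the QTL allele recovers the two-marker frequencies $p_{A_i}p_{B_k} \pm D_{AB}$. Any chosen LD values must keep all eight frequencies nonnegative.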
Power calculation and comparison:
Let $h^2 = \sigma_{ga}^2/\sigma^2$ be the heritability. Figure 1 shows power curves of the test statistics $F_{4,a}$, $F_{4,ad}$, $F_{2,a}$, and $F_{2,ad}$ against the disequilibrium coefficient, for a dominant mode of inheritance (a = d = 1.0) at the 0.05 significance level. The curves are based on the noncentrality parameter approximations $\lambda_{m,a}$ and $\lambda_{m,ad}$. $F_{4,a}$ and $F_{4,ad}$ are calculated when A has four alleles of equal frequency; i.e., $p_{A_1} = p_{A_2} = p_{A_3} = p_{A_4} = 0.25$. The measures of LD used in Figure 1, A and B, and in Figure 1, C and D, are specified separately [values not recoverable from the source].

$F_{2,a}$ and $F_{2,ad}$ are calculated by collapsing the four alleles into two. In Figure 1, A and C, alleles A1 and A2 are collapsed into one allele and alleles A3 and A4 into the other. In Figure 1, B and D, alleles A1 and A3 are collapsed into one allele and alleles A2 and A4 into the other. For $F_{2,a}$ and $F_{2,ad}$, a simple calculation shows that the measures of LD are 0, 0 in Figure 1A and also 0, 0 in Figure 1C; the corresponding values for Figure 1, B and D, are not recoverable from the source. Hence, QTL Q is in linkage equilibrium with the marker after the alleles are collapsed in Figure 1, A and C. The other parameters are $q_1 = 0.50$, $h^2 = 0.25$, and N = 200.
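Collapsing alleles adds their frequencies and their LD coefficients, because $P(Q_1A_i) + P(Q_1A_j) - (p_{A_i} + p_{A_j})q_1 = D_{A_iQ} + D_{A_jQ}$. The Python sketch below shows how one grouping can cancel the LD entirely while another preserves it. The LD pattern is hypothetical and is not the one used for Figure 1. Because $(D_i + D_j)^2/(p_i + p_j) \le D_i^2/p_i + D_j^2/p_j$, collapsing never increases $R_{AQ}^2$. Any power gain from collapsing can therefore come only from the smaller degrees of freedom.

```python
def collapse(p, D, groups):
    """Merge marker alleles. groups lists the 0-based allele indices that
    form each new allele; frequencies and LD coefficients add within a group."""
    return ([sum(p[i] for i in g) for g in groups],
            [sum(D[i] for i in g) for g in groups])


def r2_aq(q1, p, D):
    """R_AQ^2 = sum_i D_i^2 / (p_i q1 q2)."""
    q2 = 1 - q1
    return sum(Di ** 2 / (pi * q1 * q2) for Di, pi in zip(D, p))


q1, p = 0.5, [0.25] * 4
D = [0.05, -0.05, 0.05, -0.05]                  # hypothetical LD pattern
p12, D12 = collapse(p, D, [[0, 1], [2, 3]])     # {A1, A2} vs. {A3, A4}
p13, D13 = collapse(p, D, [[0, 2], [1, 3]])     # {A1, A3} vs. {A2, A4}
print(D12, r2_aq(q1, p12, D12))                 # LD cancels within groups
print(D13, r2_aq(q1, p13, D13), r2_aq(q1, p, D))
```

In this toy pattern, the second grouping keeps all of the LD while halving the marker degrees of freedom of $F_{m,a}$ (from 3 to 1). This is the kind of trade-off that can make a two-allele test slightly more powerful than the four-allele test.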
From Figure 1, we may see the following:
- F4,ad is slightly less powerful than F4,a, and F2,ad is slightly less powerful than F2,a. This is because the test statistic Fm,ad has more degrees of freedom than Fm,a. Note that the noncentrality parameter approximation $\lambda_{m,ad}$ of Fm,ad is given by Equation 10. The contribution of the dominance effect is $NR_{AQ}^4\sigma_d^2/\sigma^2$. It depends on both the dominance effect d and the magnitude of the factor $R_{AQ}^4$, and it can be substantial when both are large enough. Hence, including a dominance component in the model improves the power of QTL detection only when $R_{AQ}^4\sigma_d^2$ is large enough to compensate for the extra degrees of freedom. Note that $R_{AQ}^4\sigma_d^2$ is the product of the dominance variance $\sigma_d^2$ and the LD measure $R_{AQ}^4$; that is, it is the dominance variance reduced by the factor $R_{AQ}^4$. Even when $\sigma_d^2$ is large, $R_{AQ}^4\sigma_d^2$ can be small when the LD coefficients are not large, i.e., when $R_{AQ}^4$ is small.
- When the measures of LD are high, the power of the test statistics is high. On the other hand, the power is minimal if all measures of LD are close to 0.
- The dependence of power on the measures of LD can also be seen by comparing Figure 1A with Figure 1C and Figure 1B with Figure 1D. The power of F4,ad and F4,a in Figure 1A is higher than that of F4,ad and F4,a in Figure 1C, respectively; the power of each test statistic in Figure 1B is higher than that of the same test statistic in Figure 1D. This is because the measures of LD in Figure 1A are equal to or higher than those in Figure 1C, and the measures of LD in Figure 1B are equal to or higher than those in Figure 1D.
- In Figure 1B and Figure 1D, the power of F4,ad is slightly lower than that of F2,ad, and the power of F4,a is slightly lower than that of F2,a.
- In Figure 1A and Figure 1C, the power of F2,ad and F2,a is minimal. This is because the measures of LD are 0 after the alleles are collapsed in these two panels.
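The degrees-of-freedom trade-off in the first point can be illustrated with the noncentral F distribution. The sketch computes power = P(F' > critical value) for a given noncentrality λ, using the Poisson-mixture representation of the noncentral F distribution and a continued-fraction evaluation of the regularized incomplete beta function (the Numerical Recipes `betacf` scheme). The λ values are hypothetical, not values from Equations 10 and 11. The df pairs (3, 196) and (9, 190) assume that, for a four-allele marker and N = 200, the additive effect model has m − 1 = 3 and the genotype effect model m(m + 1)/2 − 1 = 9 numerator degrees of freedom.

```python
import math


def _betacf(a, b, x, max_iter=1000, eps=1e-15):
    """Continued fraction for I_x(a, b) by the modified Lentz method (Numerical Recipes betacf)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return math.exp(log_front) * _betacf(a, b, x) / a
    return 1.0 - math.exp(log_front) * _betacf(b, a, 1.0 - x) / b


def f_cdf(f, df1, df2, lam=0.0):
    """CDF of the (noncentral) F distribution as a Poisson mixture of beta CDFs."""
    if f <= 0.0:
        return 0.0
    x = df1 * f / (df1 * f + df2)
    if lam == 0.0:
        return betainc(df1 / 2.0, df2 / 2.0, x)
    half = lam / 2.0
    k_max = int(half + 12.0 * math.sqrt(half) + 50)   # Poisson tail beyond this is negligible
    total = 0.0
    for k in range(k_max + 1):
        weight = math.exp(-half + k * math.log(half) - math.lgamma(k + 1))
        total += weight * betainc(df1 / 2.0 + k, df2 / 2.0, x)
    return total


def f_critical(df1, df2, alpha=0.05):
    """Upper-alpha critical value of the central F distribution, by bisection."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, df1, df2) < 1.0 - alpha:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if f_cdf(mid, df1, df2) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def power(df1, df2, lam, alpha=0.05):
    """P(F' > critical value) for noncentrality parameter lam."""
    return 1.0 - f_cdf(f_critical(df1, df2, alpha), df1, df2, lam)


n = 200
lam_a = 12.0                        # hypothetical noncentrality of F4,a
for extra in (0.0, 4.0, 8.0):       # hypothetical dominance contribution to F4,ad
    print(f"extra={extra}: F4,a power={power(3, n - 4, lam_a):.3f}, "
          f"F4,ad power={power(9, n - 10, lam_a + extra):.3f}")
```

With equal λ the test with more numerator degrees of freedom is always less powerful, so F4,ad catches up only when the dominance term adds enough noncentrality.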
Figure 2 shows power curves of the test statistics F4,a, F4,ad, F3,a, and F3,ad against the disequilibrium coefficient for a dominant mode of inheritance a = d = 1.0 at a 0.05 significance level. F4,a and F4,ad are calculated as in Figure 1. F3,a and F3,ad are calculated by collapsing two of the four alleles into a new allele: in Figure 2, A and C, alleles A1 and A2 are collapsed into one allele; in Figure 2, B and D, alleles A1 and A3 are collapsed into one allele. For F3,a and F3,ad, the measures of LD in each panel follow from a simple calculation of the same kind. In addition to the features seen in Figure 1, Figure 2 shows that in panels A and C the power of F4,ad is higher than that of F3,ad and the power of F4,a is higher than that of F3,a, whereas in panels B and D the power of F4,ad is slightly lower than that of F3,ad and the power of F4,a is slightly lower than that of F3,a. Hence, the way the alleles are collapsed affects power.
From Figures 1 and 2, we see that the power of F4,a and F4,ad is relatively stable, although it may be slightly lower than that of F3,a, F3,ad, F2,a, and F2,ad in certain circumstances. In contrast, the power of F3,a, F3,ad, F2,a, and F2,ad depends heavily on how the alleles are collapsed. This shows the advantage of using multiallelic markers directly in an association study of QTL. For multiallelic marker data, the proposed test statistics Fm,a and Fm,ad can be used directly to test for association between the marker and the QTL. As shown in Figures 1 and 2, Fm,a is usually more powerful than Fm,ad because Fm,ad has more degrees of freedom.
Figure 3 shows power curves of the test statistics F4,a and F4,ad against the heritability h2 at a 0.05 significance level for a dominant mode of inheritance a = d = 1.0 and for a recessive mode of inheritance a = 1.0, d = 0.5, respectively. As with Figures 1 and 2, Figure 3 is based on the noncentrality parameter approximations (10) and (11). In Figure 3, A and B, the power can be high once h2 > 0.1; in Figure 3, C and D, which use a different set of LD measures, the power can be high once h2 > 0.15. Figure 4 shows power curves of the test statistics F4,a and F4,ad against the trait allele frequency q1 or a marker allele frequency at a 0.05 significance level. The power depends on both the measures of linkage disequilibrium and the trait allele frequency q1 or the marker allele frequency.
Comparison with the haplotype trend regression method:
Assume that two diallelic markers A and B are used in the analysis. Figures 5 and 6 show power curves of the test statistics FAB,a, FHTR, and FAB,ad against the heritability h2 at a 0.05 significance level. The related parameters are given in the figure legends. The power curves of FAB,a, FHTR, and FAB,ad are calculated from the corresponding noncentrality parameter approximations.
In Figure 5, no third-order linkage disequilibrium is assumed; i.e., DAQB = 0. In Figure 6, A and B, weak third-order linkage disequilibrium is assumed; i.e., DAQB = 0.025. It can be seen that the genotype effect model can be less powerful than the HTR method, and the HTR method can be less powerful than the additive effect model in the case of no or weak third-order linkage disequilibrium among the two markers and the QTL (Figure 5 and Figure 6, A and B). In Figure 6, C and D, strong third-order linkage disequilibrium is assumed; i.e., DAQB = 0.065. In the case that strong third-order linkage disequilibrium exists, the HTR method can be more powerful (Figure 6, C and D).
Note that in Figure 6, A and B, the maximum possible value of DAQB is 0.025, and in Figure 6, C and D, it is 0.065; larger values would give some haplotypes negative frequencies. Thus, the power curves of the haplotype trend regression method in Figures 5 and 6 represent two extreme situations: (1) no third-order linkage disequilibrium (Figure 5) and (2) the strongest possible third-order linkage disequilibrium (Figure 6). In practice, third-order linkage disequilibrium is likely to lie between these two extremes, and the power of the haplotype trend regression method should lie between the two extreme power curves. Note that the proposed genotype effect model and additive effect model use only second-order (pairwise) linkage disequilibrium. Hence, the power of FAB,a and FAB,ad is the same in Figures 5 and 6.
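The bound on DAQB arises because every haplotype frequency is linear in DAQB with coefficient +1 or −1. The sketch below assumes that DAQB is Bennett's third-order coefficient entering the frequencies through the standard three-locus expansion; it returns the interval of DAQB values that keeps all eight haplotype frequencies nonnegative. The allele frequencies and pairwise D values are hypothetical, not the parameters behind Figure 6.

```python
import itertools


def base_freqs(p, d2):
    """Three-locus haplotype frequencies with the third-order coefficient set to 0,
    together with the sign (+1 or -1) with which that coefficient enters each one."""
    out = {}
    for hap in itertools.product((0, 1), repeat=3):      # 0 = allele 1, 1 = allele 2
        s = [1 if a == 0 else -1 for a in hap]
        m = [p[k] if a == 0 else 1 - p[k] for k, a in enumerate(hap)]
        f = m[0] * m[1] * m[2]
        for (i, j), d in d2.items():
            f += m[3 - i - j] * s[i] * s[j] * d
        out[hap] = (f, s[0] * s[1] * s[2])
    return out


def third_order_range(p, d2):
    """Interval of D_AQB that keeps every haplotype frequency nonnegative."""
    base = base_freqs(p, d2)
    upper = min(f for f, sign in base.values() if sign < 0)   # needs f - D >= 0
    lower = -min(f for f, sign in base.values() if sign > 0)  # needs f + D >= 0
    return lower, upper


# Hypothetical values; loci ordered (A, Q, B)
lo, hi = third_order_range((0.5, 0.3, 0.4), {(0, 1): 0.02, (0, 2): 0.05, (1, 2): 0.03})
print(f"D_AQB must lie in [{lo:.3f}, {hi:.3f}]")
```

At either end of the interval one haplotype frequency reaches exactly 0, which is the constraint described in the text.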
Figure 7 shows power curves of the test statistics FABC,a, FABC,ad, and FHTR against the heritability h2 at a 0.05 significance level when three diallelic markers A, B, and C are used in the analysis. The related parameters are given in the figure legend. Figure 7 shows that FHTR has the lowest power. This is due to the large number of degrees of freedom of FHTR, which has an F(7, N − 8) null distribution with N = 200; in contrast, FABC,a has an F(3, N − 4) and FABC,ad an F(6, N − 7) null distribution. The low power of FHTR most likely reflects the biallelic QTL situation considered here. When there are multiple QTL haplotypes and strong LD between QTL and marker haplotypes, haplotype-based methods are expected to have good power.
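The degrees of freedom quoted above can be reproduced by counting regression parameters. The sketch assumes that the additive effect model contributes m − 1 terms per marker, the genotype effect model m(m + 1)/2 − 1 terms per marker (additive plus dominance, with no between-marker interaction terms), and HTR one term per possible haplotype minus one, with every haplotype present. The denominator df is N minus the number of regression parameters including the intercept.

```python
import math


def df_additive(alleles):
    return sum(m - 1 for m in alleles)


def df_genotype(alleles):
    return sum(m * (m + 1) // 2 - 1 for m in alleles)


def df_htr(alleles):
    return math.prod(alleles) - 1      # assumes every possible haplotype occurs


def f_dfs(df1, n):
    """(numerator, denominator) df of the regression F-test with an intercept."""
    return df1, n - df1 - 1


markers = [2, 2, 2]                    # three diallelic markers A, B, C
n = 200
for name, rule in [("additive", df_additive), ("genotype", df_genotype), ("HTR", df_htr)]:
    print(name, f_dfs(rule(markers), n))
```

The HTR count grows multiplicatively with the number of markers, while the genotype-based counts grow additively, which is the source of the power gap in Figure 7.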
Comparison based on ACE haplotype frequencies:
To examine a more realistic scenario, we take the haplotype information of the ACE gene as an example. Ten diallelic polymorphisms in the ACE gene spanning 26 kb were genotyped (KEAVNEY et al. 1998). The order of the 10 polymorphisms is T-5991C, A-5466C, T-3892C, A-240T, T-93C, T1237C, G2215A, I/D, G2350A, and 4656(CT)3/2. Table 5 lists 10 haplotypes, of which the first 7 are the most frequent (http://www.well.ox.ac.uk/~mfarrall/oxhap_freq.html). In these 10 haplotypes, allele I at marker I/D always occurs with allele A at marker G2350A, and allele D at marker I/D always occurs with allele G at marker G2350A. Hence, the two markers can be treated as one. Similarly, markers T-5991C and A-5466C can be treated as one, and markers A-240T and T-93C can be treated as one. Therefore, the 10 haplotypes can be regarded as containing seven markers.
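Checking which markers are interchangeable can be automated: two markers carry the same information in a set of haplotypes when their alleles correspond one to one. The haplotypes below are a toy example, not the ACE haplotypes of Table 5; markers 0 and 1, and markers 3 and 4, are constructed to be redundant in the way I/D and G2350A are.

```python
def redundant_pairs(haplotypes):
    """Pairs of marker columns whose alleles correspond one to one."""
    n_markers = len(haplotypes[0])
    pairs = []
    for i in range(n_markers):
        for j in range(i + 1, n_markers):
            forward, backward, same = {}, {}, True
            for hap in haplotypes:
                a, b = hap[i], hap[j]
                # a bijection maps each allele at i to one allele at j, and back
                if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
                    same = False
                    break
            if same:
                pairs.append((i, j))
    return pairs


def markers_to_keep(haplotypes):
    """Keep the first marker of each redundant group (redundancy is transitive)."""
    dropped = {j for _, j in redundant_pairs(haplotypes)}
    return [k for k in range(len(haplotypes[0])) if k not in dropped]


toy = [("T", "A", "C", "I", "A"),
       ("C", "C", "C", "D", "G"),
       ("T", "A", "T", "D", "G"),
       ("C", "C", "T", "I", "A")]
print(redundant_pairs(toy))   # [(0, 1), (3, 4)]
print(markers_to_keep(toy))   # [0, 2, 3]
```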
ABECASIS et al. (2000a,b) and FAN et al. (2005) found that markers I/D and G2350A show the strongest association with the circulating ACE level. Thus, markers I/D and G2350A are treated as a putative trait locus Q. A quantitative trait influenced by the putative locus Q is simulated for each panel of Figure 8, A–D. The empirical power curves of the test statistics FHTR, Fa, and Fad are plotted against the heritability h2 at a 0.05 significance level in Figure 8. Here Fa is the test statistic based on the additive effect model, and Fad is the test statistic based on the genotype effect model. The empirical power curves SFHTR, SFa, and SFad in Figure 8 are calculated as follows. First, the interval (0.01, 0.25) of the heritability h2 is divided into 24 subintervals, which gives 25 end points. Each end point determines a set of parameters for the power curve, and 2500 data sets are simulated using that set of parameters. For each data set, the empirical statistics FHTR, Fa, and Fad are calculated. The simulated power is the proportion of the 2500 simulated data sets for which the empirical statistic exceeds the critical value of the corresponding F-distribution at the 0.05 significance level.
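The grid-and-replicate procedure can be sketched with a one-marker stand-in for the ACE analysis: a diallelic marker A in LD with a diallelic QTL Q, and an F-test from regressing the trait on the count of marker allele A1. Several choices here are assumptions of the sketch, not the article's settings. Heritability is taken as the QTL genotypic variance (Falconer's σ_a² + σ_d²) divided by the trait variance. Genotypic values are a, d, −a. The F(1, N − 2) critical value is approximated as the square of a t quantile from the Cornish-Fisher series (Abramowitz and Stegun 26.7.5), which is accurate for large N. The demonstration uses 40 rather than 2500 replicates to keep it quick.

```python
import math
import random
from statistics import NormalDist


def f_crit_1(df2, alpha=0.05):
    """Approximate F_{1-alpha}(1, df2) as t_{1-alpha/2}(df2)**2 (Cornish-Fisher t quantile)."""
    x = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    g1 = (x**3 + x) / 4.0
    g2 = (5 * x**5 + 16 * x**3 + 3 * x) / 96.0
    g3 = (3 * x**7 + 19 * x**5 + 17 * x**3 - 15 * x) / 384.0
    t = x + g1 / df2 + g2 / df2**2 + g3 / df2**3
    return t * t


def simulate_f(n, p1, q1, d_aq, a, d, h2, rng):
    """One data set; returns the F statistic for regressing y on the A1 count."""
    haps = [(1, 1), (1, 0), (0, 1), (0, 0)]          # (carries A1, carries Q1)
    w = [p1 * q1 + d_aq, p1 * (1 - q1) - d_aq,
         (1 - p1) * q1 - d_aq, (1 - p1) * (1 - q1) + d_aq]
    value = {2: a, 1: d, 0: -a}                       # genotypic value by Q1 count
    q2 = 1.0 - q1
    var_g = 2 * q1 * q2 * (a + d * (q2 - q1)) ** 2 + (2 * q1 * q2 * d) ** 2
    sd_e = math.sqrt(var_g * (1.0 - h2) / h2)          # so that var_g / var(y) = h2
    xs, ys = [], []
    for _ in range(n):
        u, v = rng.choices(haps, weights=w, k=2)       # two independent haplotypes
        xs.append(u[0] + v[0])
        ys.append(value[u[1] + v[1]] + rng.gauss(0.0, sd_e))
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    ssr = sxy * sxy / sxx
    return ssr / ((syy - ssr) / (n - 2))


def empirical_power_curve(reps=2500, n=200, seed=1, **model):
    """Split (0.01, 0.25) into 24 subintervals; at each of the 25 end points, record the
    fraction of `reps` simulated data sets whose F exceeds the 0.05 critical value."""
    rng = random.Random(seed)
    crit = f_crit_1(n - 2)
    curve = []
    for k in range(25):
        h2 = 0.01 + 0.01 * k
        hits = sum(simulate_f(n, h2=h2, rng=rng, **model) > crit for _ in range(reps))
        curve.append((h2, hits / reps))
    return curve


curve = empirical_power_curve(reps=40, p1=0.5, q1=0.5, d_aq=0.2, a=1.0, d=0.0)
```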
In Figure 8, A and C, the curves are plotted for a dominant mode of inheritance a = d = 1.0; in Figure 8, B and D, they are plotted for an additive mode of inheritance a = 1.0, d = 0. In Figure 8, A and B, all 10 haplotypes are used in the simulations; in Figure 8, C and D, only the 7 most frequent haplotypes are used. Figure 8, A–D, shows that the proposed additive effect model has power similar to that of the HTR method. In Figure 8, A and C, where dominance effects are present, the genotype effect model has power similar to that of the additive effect model and the HTR method. In Figure 8, B and D, the genotype effect model is less powerful because there is no dominance effect. Hence, the genotype effect model is useful only if the dominance effect is large enough to compensate for the extra degrees of freedom.
Simulation study:
To evaluate the accuracy of the noncentrality parameter approximations, we performed simulations for the power curves in Figures 1, 2, 5, 6, and 7. The results are presented as supplemental information (http://www.genetics.org/supplemental/) and show that the approximations are excellent.
DISCUSSION
, it can be more powerful than Fm,a.
The noncentrality parameter approximations (10) and (11) clearly show that the power depends on the quantity $R_{AQ}^2$. That is, the noncentrality parameter of the test statistics for the null hypothesis of no genetic effects is reduced by a factor of $R_{AQ}^2$ for the additive variance and by a factor of $R_{AQ}^4$ for the dominance variance. For a single diallelic marker A, our previous research and the work of colleagues have derived similar formulas that support this argument (SHAM et al. 2000; FAN and XIONG 2002, 2003; FAN and JUNG 2003; FAN et al. 2005; JUNG et al. 2005). This bears on the debate over appropriate measures of LD for diallelic and multiallelic markers (HEDRICK 1987; DEVLIN and RISCH 1995; PRITCHARD and PRZEWORSKI 2001; WEISS and CLARK 2002). For multiallelic markers or haplotypes, a satisfactory measure of LD has not been derived, as noted by ARDLIE et al. (2002, p. 306). For two diallelic loci A and Q, ARDLIE et al. (2002) favor $r^2 = D_{AQ}^2/(p_1p_2q_1q_2)$, the square of the correlation between alleles at the two loci, where $p_1, p_2$ and $q_1, q_2$ are the allele frequencies at A and Q. For multiallelic marker data, this article extends previous research by defining $R_{AQ}^2$ and deriving Equations 10 and 11. HAYES et al. (2003) introduced a multilocus approach for estimating LD and past effective population size that uses chromosome segment homozygosity (CSH), which was introduced by SVED (1971). The dependence of the noncentrality parameter on this squared-correlation measure of LD has been shown both in our study and by SHAM et al. (2000).
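For a single diallelic marker, the reductions by $R_{AQ}^2$ and $R_{AQ}^4$ can be checked exactly. Assuming HWE and random union of haplotypes, the variance of the conditional mean genotypic value given the marker genotype equals $r^2\sigma_a^2 + r^4\sigma_d^2$, where $\sigma_a^2$ and $\sigma_d^2$ are Falconer's additive and dominance variances (FALCONER and MACKAY 1996). The sketch enumerates marker genotypes and compares both sides. The parameter values are hypothetical, and this covers only the diallelic case, not the multiallelic derivation of Equations 10 and 11.

```python
def haplotype_freqs(p1, q1, d):
    """Frequencies of (marker allele, QTL allele) haplotypes; 1 codes allele 1."""
    return {(1, 1): p1 * q1 + d, (1, 0): p1 * (1 - q1) - d,
            (0, 1): (1 - p1) * q1 - d, (0, 0): (1 - p1) * (1 - q1) + d}


def r_squared(p1, q1, d):
    """Squared correlation between allele-1 indicators at the two loci."""
    return d * d / (p1 * (1 - p1) * q1 * (1 - q1))


def falconer_variances(q1, a, dom):
    """Additive and dominance variances for genotypic values a, dom, -a."""
    q2 = 1 - q1
    return 2 * q1 * q2 * (a + dom * (q2 - q1)) ** 2, (2 * q1 * q2 * dom) ** 2


def marker_explained_variance(p1, q1, d, a, dom):
    """Var of E[genotypic value | marker genotype] under random union of haplotypes."""
    h = haplotype_freqs(p1, q1, d)
    pm = {1: p1, 0: 1 - p1}
    value = {2: a, 1: dom, 0: -a}
    weights, means = [], []
    for m1 in (0, 1):
        for m2 in (0, 1):
            # QTL alleles on the two haplotypes are independent given the marker alleles
            e = sum(h[(m1, x1)] / pm[m1] * h[(m2, x2)] / pm[m2] * value[x1 + x2]
                    for x1 in (0, 1) for x2 in (0, 1))
            weights.append(pm[m1] * pm[m2])
            means.append(e)
    mu = sum(w * e for w, e in zip(weights, means))
    return sum(w * (e - mu) ** 2 for w, e in zip(weights, means))


p1, q1, d, a, dom = 0.3, 0.2, 0.04, 1.0, 0.5      # hypothetical values
r2 = r_squared(p1, q1, d)
var_a, var_d = falconer_variances(q1, a, dom)
print(marker_explained_variance(p1, q1, d, a, dom), r2 * var_a + r2 ** 2 * var_d)
```

Because $r^2 \le 1$, the dominance variance is always discounted more heavily than the additive variance, which is why a dominance term rarely pays for its extra degrees of freedom unless LD is strong.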
In FULKER et al. (1999), ABECASIS et al. (2000a,b, 2001), and SHAM et al. (2000), an association between-family and association within-family ("AbAw") approach is proposed that decomposes the genetic association into between-pair and within-pair effects on the basis of variance component models. The AbAw approach uses a single diallelic marker. Instead, we have proposed variance component models that use multiple diallelic markers, in which the association is decomposed into additive and dominance components (FAN and XIONG 2002, 2003; FAN and JUNG 2003; FAN et al. 2005; JUNG et al. 2005). In FAN and JUNG (2003), FAN et al. (2005), and JUNG et al. (2005), we compared our method with the AbAw approach and found that our method has advantages over it. Model (1) or (2) uses only one marker. If multiple markers or multiallelic markers are available, the models are easily generalized to analyze the data. For instance, model (14) generalizes model (1) when two markers are available, and model (13) generalizes model (2). If only one marker is used, the proposed model (2) is equivalent to the haplotype trend regression method of ZAYKIN et al. (2002), which is very close to the method of SCHAID et al. (2002). However, the proposed models differ from the haplotype trend regression method for two- or multiple-marker data. If both markers are diallelic, the genotype effect model can be less powerful than the HTR method, and the HTR method can be less powerful than the additive effect model, when third-order linkage disequilibrium among the two markers and the QTL is absent or weak. If strong third-order linkage disequilibrium exists, the HTR method can be more powerful.
Basically, the proposed models are genotype based. They can be used to analyze any number of markers directly, and the markers can be either diallelic or multiallelic. A simulation study based on ACE haplotype frequencies shows that the proposed additive effect model has power similar to that of the haplotype-based HTR method. At the same time, the proposed models have the simplicity of not requiring estimation of expected haplotype scores, whereas the HTR method must calculate expected haplotype scores before building the model. The proposed models decompose the main marker effects into a sum of additive and dominance effects. In the presence of haplotype effects, it is important to estimate those effects, and haplotype-based methods are more relevant (STRAM et al. 2003; TREGOUET et al. 2004).
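To make the contrast concrete, the expected haplotype scores that HTR regresses on can be computed for two diallelic markers as below, treating the haplotype frequencies as known. In practice they are estimated, for example by the EM algorithm (DEMPSTER et al. 1977). Only the double heterozygote has ambiguous phase, whereas the genotype-based models simply use the allele counts. The frequencies are hypothetical, and this is an illustration of the idea rather than the HTR software.

```python
def expected_haplotype_counts(g_a, g_b, h):
    """Expected number of copies of each two-marker haplotype in one individual.

    g_a, g_b -- counts (0, 1, 2) of allele 1 at markers A and B
    h        -- haplotype frequencies keyed by (a, b), 1 = allele 1, 0 = allele 2
    """
    # ordered haplotype pairs consistent with the unphased genotype
    pairs = [(u, v) for u in h for v in h
             if u[0] + v[0] == g_a and u[1] + v[1] == g_b]
    total = sum(h[u] * h[v] for u, v in pairs)
    counts = {hap: 0.0 for hap in h}
    for u, v in pairs:
        w = h[u] * h[v] / total        # posterior probability of this ordered pair
        counts[u] += w
        counts[v] += w
    return counts


h = {(1, 1): 0.4, (1, 0): 0.1, (0, 1): 0.1, (0, 0): 0.4}   # hypothetical
print(expected_haplotype_counts(1, 1, h))   # double heterozygote: phase uncertain
print(expected_haplotype_counts(2, 1, h))   # phase known: one (1, 1) and one (1, 0)
```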
One potential problem with this generalization is that the number of parameters can be very large. One then needs to select important alleles for the analysis and search for the genetic variants that are truly associated with the trait. At first glance, model (1), (2), (13), or (14) seems complicated, with too many terms. However, the models are manageable in light of the recent discovery of haplotype structure in the human genome. Although a haplotype block may contain many SNPs, only a few SNPs are needed to identify each haplotype in the block uniquely, and within a block there are only two to four common haplotypes (ARNHEIM et al. 2003; DALY et al. 2001; GOLDSTEIN 2001; PATIL et al. 2001; REICH et al. 2001; RIOUX et al. 2001; J. C. STEPHENS et al. 2001; GABRIEL et al. 2002; NORDBORG and TAVARÉ 2002; PHILLIPS et al. 2003). This implies that model (1), (2), (13), or (14) contains only a few terms and hence is manageable. Moreover, model (1) or (2) already takes the haplotype structure into account and is potentially more powerful. In practice, one may want to collapse some alleles to reduce the number of parameters. However, collapsing may decrease linkage disequilibrium and therefore reduce power. The proposed regression models, which handle multiallelic markers directly, can be fitted to alleviate this problem.
In the mathematical derivations, we assume HWE. It is unclear how to construct tests that allow for deviations from HWE; this requires further research. In addition, we illustrate that the false-positive rate of the genotype effect test is too high for markers with more than five alleles in a sample of 200 individuals. This is due to the large number of possible genotypes and hence to sparseness in the contingency table. This problem could be overcome by using exact tests or permutation procedures.
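The sparseness argument can be quantified under an assumption of equal allele frequencies and HWE: an m-allele marker has m(m + 1)/2 genotypes, and the expected number of individuals per genotype class in a sample of 200 shrinks quickly with m. This is a rough illustration; markers with unequal allele frequencies give even sparser cells for genotypes involving rare alleles.

```python
def genotype_cells(m, n=200):
    """Number of genotypes and expected counts per homozygote / heterozygote class
    for an m-allele marker with equal allele frequencies under HWE."""
    p = 1.0 / m
    n_genotypes = m * (m + 1) // 2
    return n_genotypes, n * p * p, n * 2 * p * p


for m in range(2, 9):
    g, hom, het = genotype_cells(m)
    print(f"m={m}: {g} genotypes ({g - 1} df), "
          f"expected {hom:.1f} per homozygote, {het:.1f} per heterozygote")
```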
The models of this article are based on population data. If both population data and pedigree data, including sibships, are available, model (1) or (2) can be generalized to perform high-resolution combined LD mapping and linkage analysis of QTL by variance component models in the spirit of our previous work. In fact, we may generalize regression (1) or (2) by adding a polygenic effect, and log-likelihoods can then be constructed on the basis of variance component models. This would extend our research to the combined analysis of population and pedigree data using diallelic or multiallelic markers or haplotypes. It is well known that association studies based on population data are prone to false positives due to population stratification and population history. A valid approach would be to use linkage information from pedigree data to locate the QTL in a broad chromosomal region. Combined linkage and association mapping can then be performed for fine mapping on the basis of both population and pedigree data (FAN and XIONG 2003). This would help overcome the drawbacks of either a linkage study or association mapping alone: the low resolution of linkage analysis and the high false-positive rate of association studies. At the same time, it would combine the strengths of the two methods: the low false-positive rate of linkage analysis and the high resolution of association mapping.
APPENDIX A
![]() |
j. Then model (1) can be rewritten as
(A1)
. Given Equation A1, taking the expectation of $y x_{ii}$ leads to
. On the other hand, a true random-effect model describing the trait value is y = w
+ g + e, where
![]() |
and
gives
(A2)
shows Equation 3 when i = j.
If $i \neq j$,
. Multiplying both sides of Equation A1 by xij and taking the expectation leads to E(yxij) = E(xij)[w
+ ßij]. Again, utilizing
,
,
, and
gives
(A3)
shows Equation 3 when $i \neq j$.
APPENDIX B
![]() |
(B1)
(B2)
. For $i \neq j$, the expectation
. For the elements on the right-hand side, Equations A2 and A3 lead to
, since
. Plugging the above quantities into matrix Equation B2 gives Equation 4 as
![]() |
is
![]() |
$(M + ab')^{-1} = M^{-1} - (M^{-1}a)(b'M^{-1})/(1 + b'M^{-1}a)$.
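The rank-one inversion identity used in these appendixes is easy to check numerically. The sketch inverts M + ab' directly by Gauss-Jordan elimination and compares the result with the identity applied to M^{-1}. The matrix and vectors are arbitrary test values; M is taken diagonal, the kind of matrix that appears in Appendix C.

```python
def inverse(mat):
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(mat)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(mat)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def sherman_morrison(m_inv, a, b):
    """(M + a b')^{-1} = M^{-1} - (M^{-1} a)(b' M^{-1}) / (1 + b' M^{-1} a)."""
    n = len(a)
    m_inv_a = [sum(m_inv[i][k] * a[k] for k in range(n)) for i in range(n)]
    b_m_inv = [sum(b[k] * m_inv[k][j] for k in range(n)) for j in range(n)]
    denom = 1.0 + sum(b[k] * m_inv_a[k] for k in range(n))
    return [[m_inv[i][j] - m_inv_a[i] * b_m_inv[j] / denom for j in range(n)]
            for i in range(n)]


m = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 4.0]]
a = [1.0, -1.0, 2.0]
b = [0.5, 1.0, -1.0]
direct = inverse([[m[i][j] + a[i] * b[j] for j in range(3)] for i in range(3)])
via_identity = sherman_morrison(inverse(m), a, b)
print(max(abs(x - y) for r1, r2 in zip(direct, via_identity) for x, y in zip(r1, r2)))
```

The identity only requires inverting M itself, which is trivial when M is diagonal; that is what makes the closed-form noncentrality approximations tractable.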
APPENDIX C
. If the sample size N is large enough, the law of large numbers implies the approximation
(C1)
is a diagonal matrix, whose elements on the diagonal are given by the elements of
. That is, if
, then
. Let H be a $(J_A - 1) \times J_A$ matrix defined by
![]() |
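such that H maps the coefficient vector to the vector of contrasts $(\beta_{11}-\beta_{22}, \ldots, \beta_{11}-\beta_{mm}, \beta_{11}-\beta_{12}, \ldots, \beta_{11}-\beta_{1m}, \ldots, \beta_{11}-\beta_{m-1,m})'$. From approximation (C1), we have the approximation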
![]() |
![]() |
. Applying the matrix inversion identity $(M + ab')^{-1} = M^{-1} - (M^{-1}a)(b'M^{-1})/(1 + b'M^{-1}a)$ again, we have
![]() |
(C2)
![]() |
, we have
![]() |
![]() |
, and so
. Hence, the noncentrality parameter approximation (10) is valid.
APPENDIX D
![]() |
Applying the identity $(M + ab')^{-1} = M^{-1} - (M^{-1}a)(b'M^{-1})/(1 + b'M^{-1}a)$, the inverse is
![]() |
![]() |
)
= (
1
2, ...,
1
m). On the other hand, we have the approximation
![]() |
![]() |
![]() |
. Thus, the noncentrality parameter
![]() |
APPENDIX E
, which are measures of LD between markers A and B. Here P(AiBk) is the frequency of haplotype AiBk. It can be shown that for $i \neq j$, $k \neq l$, $j \neq j'$, $l \neq l'$, $(i, j) \neq (i', j')$, and $(k, l) \neq (k', l')$,
(E1)
![]() |
Since EZA
B is a vector of 0's by the quantities in (E1), it can be shown that VD = Cov(ZA
B, ZA
B) = E(ZA
BZA
B
). Moreover, the quantities in (E1) imply that the covariance matrix Cov(XA
B, ZA
B) is a 0 matrix.
Taking the variance-covariance between y and xAi, xBk, zAij, and zBkl on the basis of relation (14), we obtain the regression coefficients (15) of models (13) and (14).
APPENDIX F
(F1)
The elements of the matrix on the left-hand side of the above equation can be calculated as follows:
![]() |
![]() |
![]() |
Plugging the above quantities into matrix Equation F1 gives Equation 19.
LITERATURE CITED
ABECASIS, G. R., L. R. CARDON and W. O. C. COOKSON, 2000a A general test of association for quantitative traits in nuclear families. Am. J. Hum. Genet. 66: 279–292.
ABECASIS, G. R., W. O. C. COOKSON and L. R. CARDON, 2000b Pedigree tests of linkage disequilibrium. Eur. J. Hum. Genet. 8: 545–551.
ABECASIS, G. R., W. O. C. COOKSON and L. R. CARDON, 2001 The power to detect linkage disequilibrium with quantitative traits in selected samples. Am. J. Hum. Genet. 68: 1463–1474.
ARDLIE, K. G., L. KRUGLYAK and M. SEIELSTAD, 2002 Patterns of linkage disequilibrium in the human genome. Nat. Rev. Genet. 3: 299–309.
ARNHEIM, N., P. CALABRESE and M. NORDBORG, 2003 Review article: hot and cold spots of recombination in the human genome: the reason we should find them and how this can be achieved. Am. J. Hum. Genet. 73: 5–16.
BENNETT, J. H., 1954 On the theory of random mating. Ann. Eugen. 18: 311–317.
BOERWINKLE, E., R. CHAKRABORTY and C. F. SING, 1986 The use of measured genotype information in the analysis of quantitative phenotype in man. I. Models and analytical methods. Ann. Hum. Genet. 50: 181–194.
CANTOR, R. M., G. K. CHEN, P. PAJUKANTA and K. LANGE, 2005 Association testing in a linked region using large pedigrees. Am. J. Hum. Genet. 76: 538–542.
CLAYTON, D., J. CHAPMAN and J. COOPER, 2004 The use of unphased multilocus genotype data in indirect association studies. Genet. Epidemiol. 27: 415–428.
CROW, J. F., and M. KIMURA, 1970 An Introduction to Population Genetics Theory. Harper & Row, New York.
DALY, M. J., J. D. RIOUX, S. F. SCHAFFNER, T. J. HUDSON and E. S. LANDER, 2001 High-resolution haplotype structure in the human genome. Nat. Genet. 29: 229–232.
DEMPSTER, A. P., N. M. LAIRD and D. B. RUBIN, 1977 Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 39: 1–38.
DEVLIN, B., and N. RISCH, 1995 A comparison of linkage disequilibrium measures for fine-scale mapping. Genomics 29: 311–322.
FALCONER, D. S., and T. F. C. MACKAY, 1996 Introduction to Quantitative Genetics, Ed. 4. Longman, London.
FAN, R. Z., and J. S. JUNG, 2003 High resolution joint linkage disequilibrium and linkage mapping of quantitative trait loci based on sibship data. Hum. Hered. 56: 166–187.
FAN, R. Z., and M. M. XIONG, 2002 High resolution mapping of quantitative trait loci by linkage disequilibrium analysis. Eur. J. Hum. Genet. 10: 607–615.
FAN, R. Z., and M. M. XIONG, 2003 Combined high resolution linkage and association mapping of quantitative trait loci. Eur. J. Hum. Genet. 11: 125–137.
FAN, R. Z., C. SPINKA, L. JIN and J. S. JUNG, 2005 Pedigree linkage disequilibrium mapping of quantitative trait loci. Eur. J. Hum. Genet. 13: 216–231.
FULKER, D. W., S. S. CHERNY, P. C. SHAM and J. K. HEWITT, 1999 Combined linkage and association sib-pair analysis for quantitative traits. Am. J. Hum. Genet. 64: 259–267.
GABRIEL, S. B., S. F. SCHAFFNER, H. NGUYEN, J. M. MOORE, J. ROY et al., 2002 The structure of haplotype blocks in the human genome. Science 296: 2225–2229.
GEORGE, V., H. K. TIWARI, X. F. ZHU and R. C. ELSTON, 1999 A test of transmission/disequilibrium for quantitative traits in pedigree data, by multiple regression. Am. J. Hum. Genet. 65: 236–245.
GOLDSTEIN, D. B., 2001 Islands of linkage disequilibrium. Nat. Genet. 29: 109–111.
GRAYBILL, F. A., 1976 Theory and Application of the Linear Model. Wadsworth & Brooks/Cole Advanced Books & Software, Pacific Grove, CA.
HAYES, B. J., P. M. VISSCHER, H. C. MCPARTLAN and M. E. GODDARD, 2003 Novel multilocus measure of linkage disequilibrium to estimate past effective population size. Genome Res. 13: 635–643.
HEDRICK, P. W., 1987 Gametic disequilibrium measures: proceed with caution. Genetics 117: 331–341.
INTERNATIONAL HAPMAP CONSORTIUM, 2003 The International HapMap Project. Nature 426: 789–796.
INTERNATIONAL SNP MAP WORKING GROUP, 2001 A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409: 928–933.
JUNG, J. S., R. Z. FAN and L. JIN, 2005 Combined linkage and association mapping of quantitative trait loci by multiple markers. Genetics 170: 881–898.
KEAVNEY, B., C. A. MCKENZIE, J. M. CONNELL, C. JULIER, P. J. RATCLIFFE et al., 1998 Measured haplotype analysis of the angiotensin-1 converting enzyme gene. Hum. Mol. Genet. 7: 1745–1751.
KONG, A., D. F. GUDBJARTSSON, J. SAINZ, G. M. JONSDOTTIR, S. A. GUDJONSSON et al., 2002 A high resolution recombination map of the human genome. Nat. Genet. 31: 241–247.
LANGE, K., J. S. SINSHEIMER and E. SOBEL, 2005 Association testing with Mendel. Genet. Epidemiol. 29: 36–50.
MEUWISSEN, T. H. E., and M. E. GODDARD, 2000 Fine mapping of quantitative trait loci using linkage disequilibria with closely linked marker loci. Genetics 155: 421–430.
MORRIS, A. P., J. C. WHITTAKER and D. J. BALDING, 2004 Little loss of information due to unknown phase for fine-scale linkage-disequilibrium mapping with single-nucleotide-polymorphism genotype data. Am. J. Hum. Genet. 74: 945–953.
MORTON, N. E., and D. WU, 1988 Alternative bioassays of kinship between loci. Am. J. Hum. Genet. 42: 173–177.
NIELSEN, D. M., and B. S. WEIR, 1999 A classical setting for associations between markers and loci affecting quantitative traits. Genet. Res. 74: 271–277.
NIELSEN, D. M., and B. S. WEIR, 2001 Association studies under general disease models. Theor. Popul. Biol. 60: 253–263.
NORDBORG, M., and S. TAVARÉ, 2002 Linkage disequilibrium: what history has to tell us. Trends Genet. 18: 83–90.
PATIL, N. P., A. J. BERNO, D. A. HINDS, W. A. BARRETT, J. M. DOSHI et al., 2001 Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science 294: 1719–1723.
PHILLIPS, M. S., R. LAWRENCE, R. SACHIDANANDAM, A. P. MORRIS, D. J. BALDING et al., 2003 Chromosome-wide distribution of haplotype blocks and the role of recombination hot spots. Nat. Genet. 33: 382–387.
PRITCHARD, J. K., and M. PRZEWORSKI, 2001 Linkage disequilibrium in humans: model and data. Am. J. Hum. Genet. 69: 1–14.
REICH, D. E., M. CARGILL, S. BOLK, J. IRELAND, R. C. SABETT et al., 2001 Linkage disequilibrium in the human genome. Nature 411: 199–204.
RIOUX, J. D., M. J. DALY, M. S. SILVERBERG, K. LINDBLAD, H. STEINHART et al., 2001 Genetic variation in the 5q31 cytokine gene cluster confers susceptibility to Crohn disease. Nat. Genet. 29: 223–228.
SCHAID, D. J., 2004 Evaluating associations of haplotypes with traits. Genet. Epidemiol. 27: 348–364.
SCHAID, D. J., C. M. ROWLAND, D. E. TINES, R. M. JACOBSON and G. A. POLAND, 2002 Score tests for association between traits and haplotypes when linkage phase is ambiguous. Am. J. Hum. Genet. 70: 425–434.
SHAM, P. C., S. S. CHERNY, S. PURCELL and J. K. HEWITT, 2000 Power of linkage versus association analysis of quantitative traits, by use of variance-components models, for sibship data. Am. J. Hum. Genet. 66: 1616–1630.
STEPHENS, J. C., J. A. SCHNEIDER, D. A. TANGUAY, J. CHOI, T. ACHARYA et al., 2001 Haplotype variation and linkage disequilibrium in 313 human genes. Science 293: 489–493.
STEPHENS, M., and P. DONNELLY, 2003 A comparison of Bayesian methods for haplotype reconstruction from population genotype data. Am. J. Hum. Genet. 73: 1162–1169.
STEPHENS, M., N. SMITH and P. DONNELLY, 2001 A new statistical method for haplotype reconstruction from population data. Am. J. Hum. Genet. 68: 978–989.
STRAM, D. O., C. A. HAIMAN, J. N. HIRSCHHORN, D. ALTSHULER, L. N. KOLONEL et al., 2003 Choosing haplotype-tagging SNPs based on unphased genotype data using a preliminary sample of unrelated subjects with an example from the multiethnic cohort study. Hum. Hered. 55: 179–190.
SVED, J. A., 1971 Linkage disequilibrium and homozygosity of chromosome segments in finite populations. Theor. Popul. Biol. 2: 125–141.
THOMSON, G., and M. P. BAUR, 1984 Third order linkage disequilibrium. Tissue Antigens 24: 250–255.
TREGOUET, D. A., S. ESCOLANO, L. TIRET, A. MALLET and J. L. GOLMARD, 2004 A new algorithm for haplotype-based association analysis: the stochastic-EM algorithm. Ann. Hum. Genet. 68: 165–177.
WEIR, B. S., 1996 Genetic Data Analysis II, Ed. 2. Sinauer Associates, Sunderland, MA.
WEIR, B. S., and C. C. COCKERHAM, 1977 Two-locus theory in quantitative genetics, pp. 247–269 in Proceedings of the International Conference on Quantitative Genetics, edited by E. POLLAK, O. KEMPTHORNE and T. B. BAILEY. Iowa State University Press, Ames, IA.
WEISS, K. M., and A. G. CLARK, 2002 Linkage disequilibrium and the mapping of complex traits. Trends Genet. 18: 19–24.
ZAYKIN, D. V., P. H. WESTFALL, S. S. YOUNG, M. A. KARNOUB, M. J. WAGNER et al., 2002 Testing association of statistically inferred haplotypes with discrete and continuous traits in samples of unrelated individuals. Hum. Hered. 53: 79–91.
Communicating editor: G. GIBSON
for a dominant mode of inheritance a = d = 1.0 at a 0.05 significance level. F4,ad and F4,a are calculated when marker A has four equal frequency alleles; i.e., 

(C and D) 
(C and D) the parameters are given by
,
. The other parameters are h2 = 0.15 and N = 250.
(C and D) the parameters are given by
. The other parameters are
and N = 200.
and (C and D)
.
and N = 200.