False Discovery Rate in Linkage and Association Genome Screens for Complex Disorders
Chiara Sabatti (a), Susan Service (b), and Nelson Freimer (b)
(a) Departments of Human Genetics and Statistics, University of California, Los Angeles, California 90095-7088
(b) Center for Neurobehavioral Genetics, Neuropsychiatric Institute, and Departments of Psychiatry and Human Genetics, University of California, Los Angeles, California 90095-1761
Corresponding author: Chiara Sabatti, UCLA School of Medicine, 695 Charles E. Young Dr. S., Los Angeles, CA 90095-7088. E-mail: csabatti{at}mednet.ucla.edu
Communicating editor: G. CHURCHILL
| ABSTRACT |
|---|
We explore the implications of the false discovery rate (FDR) controlling procedure in disease gene mapping. With the aid of simulations, we show how, under models commonly used, the simple step-down procedure introduced by Benjamini and Hochberg controls the FDR for the dependent tests on which linkage and association genome screens are based. This adaptive multiple comparison procedure may offer an important tool for mapping susceptibility genes for complex diseases.
RECENT developments in methods for controlling for multiple comparisons in statistical testing may strongly influence strategies for mapping disease genes. The importance of correcting for multiple comparisons in genome screens is well known.
In 1995, Benjamini and Hochberg introduced the false discovery rate (FDR), a new notion of global error for multiple testing situations. The idea of FDR is to use, as a measure of global error, the expected proportion of false rejections of the null hypothesis among the total number of rejections. The use of the proportion of type I errors among the total number of "significant" results leads to a global cutoff value that is adaptive to the data set. That is, if a higher percentage of the null hypotheses tested are truly false, the FDR procedure will identify a lower (less stringent) cutoff level than the universal Bonferroni cutoff. Therefore, FDR defines an adaptive marginal search, which is most effective for the identification of loci with secondary effects. On the other hand, if all the null hypotheses are true (none of the analyzed markers is linked with the disease), controlling FDR is equivalent to controlling the family-wise error rate (FWER).
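The equivalence in the all-null case can be made explicit. As a sketch, write $V$ for the number of true null hypotheses that are rejected and $R$ for the total number of rejections; these two symbols are notation assumed here for illustration and are not taken from the text.

```latex
\mathrm{FDR} = E\!\left[\frac{V}{\max(R, 1)}\right],
\qquad
\mathrm{FWER} = \Pr(V \ge 1).
% If every null hypothesis is true, each rejection is a false one, so V = R and
% V / \max(R, 1) = \mathbf{1}\{V \ge 1\}; taking expectations gives FDR = FWER.
```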
The FDR procedure was first suggested for genetic mapping by Weller et al. (1998).
To illustrate both the FDR approach and the implications of these novel findings, consider a linkage genome screen done under the sparse map assumption and based on $n = 400$ markers. Let $H_i$, $i = 1, \ldots, n$, be the null hypothesis of no linkage with the $i$th marker, and let $H_0$ be the global null hypothesis that all the $H_i$ are true.
Let $p_1, \ldots, p_n$ be the P values associated with each of the test statistics ($n = 400$) and let $p_{(1)} \le \cdots \le p_{(n)}$ be their ordered counterparts, with $H_{(i)}$ the hypothesis corresponding to $p_{(i)}$. According to the Bonferroni rule, one can reject $H_0$ if $p_{(1)} < \alpha/n$, where $\alpha$ is the desired level for the test of $H_0$; for $\alpha = 0.05$, this translates to a lod score of 3.3. Benjamini and Hochberg (BH) proposed the following stepwise procedure: proceed through $i = 1, 2, \ldots, n$ and find the last index for which $p_{(i)} \le i \cdot \alpha/n$. Denote this index by $k$ and reject all $H_{(i)}$ with $i = 1, \ldots, k$. These progressively less stringent cutoff values can be translated into a series of decreasing lod scores, shown in Fig 1: if the locus with the highest lod score has to pass a 3.3 threshold to be significant alone, the second locus is compared with a score of 3, and the third locus with a threshold of 2.8.
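The two rules just described can be sketched in a few lines of Python. This is a minimal illustration that works directly on P values rather than lod scores; the function names `benjamini_hochberg` and `bonferroni` and the example P values are assumptions made here, not part of the original analysis.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the (sorted) indices of the hypotheses rejected by the BH rule."""
    n = len(p_values)
    # order[0] points at the smallest P value p(1), order[n-1] at p(n).
    order = sorted(range(n), key=lambda i: p_values[i])
    k = 0  # last rank i with p(i) <= i * alpha / n; 0 means nothing is rejected
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / n:
            k = rank
    return sorted(order[:k])


def bonferroni(p_values, alpha=0.05):
    """Return the indices of the hypotheses with P value below alpha / n."""
    n = len(p_values)
    return [i for i, p in enumerate(p_values) if p < alpha / n]


# Hypothetical screen: 400 markers, three of them with small P values.
p_values = [1e-4, 2e-4, 3.5e-4] + [0.5] * 397
print(bonferroni(p_values))          # [0]: only the first passes 0.05 / 400
print(benjamini_hochberg(p_values))  # [0, 1, 2]: cutoffs 1.25e-4, 2.5e-4, 3.75e-4
```

Note that the loop keeps scanning after a failed comparison: a P value that misses its own cutoff is still rejected if a larger one later meets its cutoff, which is what "for the last time" means in the description above.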
While the rule described above (BH) was proposed by Benjamini and Hochberg for independent tests (a sparse map assumption in the context of linkage), recent theoretical developments assure that it can control FDR even in the case of dependent tests, as in the continuous-map assumption.
To achieve this, one must consider the result of Benjamini and Yekutieli (2001): the BH procedure controls the FDR at level $\alpha \cdot n_0/n$, where $n_0$ is the number of true null hypotheses, if the test statistics are positive-regression dependent on each one from the subset (PRDS) of test statistics corresponding to the true null hypotheses. Technically, the definition of PRDS is as follows. The set $D$ is called increasing if $x \in D$ and $y \ge x$ imply that $y \in D$ as well. The random variables $X_1, \ldots, X_n$ are PRDS on $I_0$ if, for any increasing set $D$, and for each $i \in I_0$, $P((X_1, \ldots, X_n) \in D \mid X_i = x)$ is nondecreasing in $x$. This definition is a specific formal requirement for what we may call "positive dependence," and in the context of genome screens, PRDS can be loosely interpreted as follows: if two markers are linked [or in linkage disequilibrium (LD)] and neither is related to the disease, the P values of the tests conducted at each marker tend to be positively correlated, as one would expect.
In the case of linkage we can prove that the lod score statistic (or a specific approximation of it) satisfies the PRDS requirement. In the case of association studies, we illustrate the meaning of PRDS with respect to a specific model and conduct a simulation study.
| Significance cutoffs for a linkage genome screen under FDR |
|---|
Using a specific approximation of the lod score statistics, one can show that they satisfy the PRDS requirement. Consider the Gaussian models for genetic linkage analysis proposed by Feingold et al. (1993). Let $X \sim N(\mu, \Sigma)$ be a vector of test statistics, each testing the hypothesis $H_i$ that $\mu_i = 0$ against the alternative $\mu_i > 0$, for $i = 1, \ldots, m$. For $i \in I_0$, the set of true null hypotheses, $\mu_i = 0$; otherwise $\mu_i > 0$. If for each $i \in I_0$, and for each $j \ne i$, $\sigma_{ij} \ge 0$ (where $\sigma_{ij}$ denotes the entries of $\Sigma$), then the distribution of $X$ is PRDS over $I_0$. We can conclude, because the covariances are nonnegative, that the tests are PRDS on $I_0$. Hence the cutoff values illustrated in Fig 1 are guaranteed to control the FDR, even when we relax the independence assumption.
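The claim for Gaussian statistics with nonnegative covariances can be checked numerically on a toy case. The sketch below is not the linkage model itself: it assumes an equicorrelated covariance (every pair of tests shares the same correlation `rho`), and the function names, effect size `mu`, and default settings are all hypothetical choices made for illustration. It estimates the FDR of the BH rule by averaging the proportion of false rejections over replicates.

```python
import math
import random


def bh_reject(p_values, alpha):
    """Indices rejected by the BH rule: ranks 1..k, k the last i with p(i) <= i*alpha/n."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / n:
            k = rank
    return order[:k]


def estimate_fdr(n=50, n_alt=5, mu=3.0, rho=0.5, alpha=0.05, reps=2000, seed=1):
    """Monte Carlo estimate of the FDR of BH for equicorrelated one-sided z-tests.

    Tests 0..n_alt-1 have mean mu (false nulls); the rest have mean 0 (true nulls).
    Each pair of statistics has covariance rho >= 0 through a shared component.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        shared = rng.gauss(0.0, 1.0)  # common component inducing positive dependence
        p_values = []
        for i in range(n):
            mean = mu if i < n_alt else 0.0
            x = mean + math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            p_values.append(0.5 * math.erfc(x / math.sqrt(2.0)))  # one-sided P value
        rejected = bh_reject(p_values, alpha)
        false_rejections = sum(1 for i in rejected if i >= n_alt)
        total += false_rejections / max(len(rejected), 1)
    return total / reps


if __name__ == "__main__":
    # The theoretical bound for these settings is alpha * n0 / n = 0.05 * 45 / 50 = 0.045.
    print(estimate_fdr())
```

Because the estimate is a Monte Carlo average, it fluctuates around the true FDR; comparing it with the bound $\alpha \cdot n_0/n$ is only meaningful up to simulation error.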
| Significance cutoffs for an association genome screen under FDR |
|---|
Depending on the population of origin of the sample and on the distance between markers and their location in the genome, association tests at different markers either may be independent or may display varying degrees of dependence. In the case of independence between markers, we know that the BH rule controls for FDR. In the case of dependent markers, intuition suggests that PRDS should hold and hence the BH rule should also control FDR. In the absence of a general model for dependency among association tests, we illustrate that PRDS should hold by using a simple example and by testing the performance of the BH rule directly with simulations.
Suppose we conduct a screen using $N$ cases and $N$ controls and testing for association at two loci with single-nucleotide polymorphisms (SNPs). Their joint distribution can be represented by the parameters $\Pr(\mathrm{SNP}_1 = 0) = p$, $\Pr(\mathrm{SNP}_2 = 0) = q$, and $\Pr(\mathrm{SNP}_1 = 0 \text{ and } \mathrm{SNP}_2 = 0) = pq + \delta$, where $\delta$ is the departure from independence between the two loci. If we focus on the distribution of the test of association derived from these two markers, it is possible to see, through computations, that the distribution of their P values ($p_1$, $p_2$) has a property required by PRDS (see Fig 2). While this case of two markers serves as an illustration of the implications of PRDS and why it is reasonable to think that it should hold, a genome screen clearly involves more than two markers. We should investigate PRDS more carefully, for all combinations of multiple statistics, a task clearly impossible in the absence of a simpler model for their dependency. We therefore resorted to a simulation study.
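To make the two-locus parameterization concrete, the sketch below builds the full joint table of the two SNPs from the two marginal frequencies and the dependence term, and reports the implied correlation. The function names and the example values are hypothetical, and the dependence term is called `delta` here purely as an assumption about notation.

```python
import math


def joint_distribution(p, q, delta):
    """Joint probabilities of (SNP1, SNP2), each coded 0/1.

    p = Pr(SNP1 = 0), q = Pr(SNP2 = 0), and Pr(SNP1 = 0, SNP2 = 0) = p*q + delta.
    The other three cells follow from the marginals.
    """
    table = {
        (0, 0): p * q + delta,
        (0, 1): p * (1 - q) - delta,
        (1, 0): (1 - p) * q - delta,
        (1, 1): (1 - p) * (1 - q) + delta,
    }
    if min(table.values()) < -1e-12:
        raise ValueError("delta is outside the range allowed by p and q")
    return table


def allele_correlation(p, q, delta):
    """Correlation between the indicators of allele 0 at the two loci."""
    return delta / math.sqrt(p * (1 - p) * q * (1 - q))


if __name__ == "__main__":
    table = joint_distribution(0.3, 0.4, 0.05)
    for cell, prob in sorted(table.items()):
        print(cell, round(prob, 4))
    print("correlation:", round(allele_correlation(0.3, 0.4, 0.05), 4))
```

With `delta = 0` the table factorizes into the product of the marginals, which is the independent-marker case; positive `delta` corresponds to the two alleles coded 0 occurring together more often than expected.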
For brevity we report here only on the simulations for haplotype-based tests. The results of single-marker-based tests are entirely comparable and are available in a companion technical report (Sabatti et al. 2002).
Under nearly all simulation conditions reported in Table 1, the BH method achieves control of the FDR: the average estimated FDRs are <0.05. In the high-power, high-dependence setting for the haplotype test, the average FDR is 0.052. Under the high-dependence settings, the variance of the results is considerably greater than that in the other scenarios and indeed, 0.05 is within the 95% bootstrap confidence interval for all the scenarios. The BH FDR method controls only the expected value of the FDR, so that in one replicate the FDR may actually be higher (data not shown). In general, even under high levels of dependency, then, the FDR is controlled at the appropriate level, as suggested by our previous analysis. The column named "no. of false positives" reports the average number of false positives per replicate; while this number is larger using BH than using Bonferroni, it is still <<1, indicating that indeed we do not need to be as strict as FWER procedures suggest.
The FDR method leads to a considerable increase in power when compared with FWER. On average, the marginal power estimates of FDR are 25% greater than those of FWER. The increased power of FDR over FWER is even more dramatic when one requires that multiple loci be detected, with the increase in power for identifying all three loci averaging >100%. The power for all methods decreases with increased dependency between markers. This decrease is apparent not only in the power estimates themselves, but also by the fact that both FDR and FWER are usually controlled at a level <<0.05 (the cutoff we had specified). This result argues in favor of the necessity of developing adequate resampling-based evaluations of FDR so that the dependence between markers is incorporated to increase the power of the study. This is the goal of a separate investigation.
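The kind of power comparison discussed above can be illustrated with a small simulation. This sketch does not reproduce the haplotype-based simulations or the figures quoted in the text: it assumes independent one-sided Gaussian test statistics, three true loci with a common hypothetical effect size `mu`, and illustrative function names. It reports, for Bonferroni and for BH, the marginal power per locus and the probability of detecting all three loci.

```python
import math
import random


def n_detected(p_signal, p_null, alpha, method):
    """Number of true loci declared significant by 'bonferroni' or 'bh'."""
    all_p = sorted(p_signal + p_null)
    n = len(all_p)
    if method == "bonferroni":
        cutoff = alpha / n
    else:
        # BH: use the cutoff i*alpha/n at the last rank i with p(i) <= i*alpha/n.
        cutoff = 0.0
        for rank, p in enumerate(all_p, start=1):
            if p <= rank * alpha / n:
                cutoff = rank * alpha / n
    return sum(1 for p in p_signal if p <= cutoff)


def compare_power(n=400, n_loci=3, mu=3.5, alpha=0.05, reps=500, seed=7):
    """Return {method: (marginal power, power to detect all loci)}."""
    rng = random.Random(seed)

    def p_value(mean):
        # One-sided P value of a unit-variance Gaussian statistic with the given mean.
        return 0.5 * math.erfc((mean + rng.gauss(0.0, 1.0)) / math.sqrt(2.0))

    counts = {"bonferroni": [0, 0], "bh": [0, 0]}  # [loci detected, replicates with all detected]
    for _ in range(reps):
        p_signal = [p_value(mu) for _ in range(n_loci)]
        p_null = [p_value(0.0) for _ in range(n - n_loci)]
        for method in counts:
            detected = n_detected(p_signal, p_null, alpha, method)
            counts[method][0] += detected
            counts[method][1] += int(detected == n_loci)
    return {m: (c[0] / (reps * n_loci), c[1] / reps) for m, c in counts.items()}


if __name__ == "__main__":
    for method, (marginal, all_loci) in compare_power().items():
        print(method, "marginal power:", marginal, "all loci:", all_loci)
```

Since every P value below $\alpha/n$ is also rejected by BH, the BH power in this sketch can never be lower than the Bonferroni power in any replicate; how large the gain is depends on the assumed effect size and number of true loci.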
We have also applied the BH procedure to a recently collected data set from a genome-wide LD study of a complex trait, bipolar disorder (Table 2; Ophoff et al. 2002).
| Is FDR the appropriate measure of global error for disease gene mapping? |
|---|
The FDR is a powerful, relatively novel measure of global error in multiple testing. The BH controlling strategy is simple and effective in a wide range of circumstances and in particular for disease mapping, as we illustrate. It has been used extensively in gene expression studies (see, for example, Efron et al. 2001).
In addition to what we have illustrated in this article, the application of FDR in genome screens has other advantages. It is becoming clearer that to identify the genes responsible for complex disease, a variety of strategies will need to be employed. Multiple phenotypes may be analyzed at the same time; for example, expression levels of candidate genes may be monitored together with the analysis of genotypes at thousands of loci. The BH strategy illustrated in this article can be easily adapted to control for multiple comparisons in these more diverse settings.
| ACKNOWLEDGMENTS |
|---|
C. Sabatti acknowledges support from National Institutes of Health (NIH) grant R01 MH49499 and S. Service and N. Freimer from NIH grants R01 MH49499, R01 NS37484, R01 NS40024, R01 HL66289, and K02 MH01375.
Manuscript received January 9, 2003; Accepted for publication February 21, 2003.
| LITERATURE CITED |
|---|
ABRAMOVICH, F., Y. BENJAMINI, D. DONOHO and I. JOHNSTONE, 2000 Adapting to unknown sparsity by controlling the false discovery rate. Technical Report 2000-19. Department of Statistics, Stanford University, Stanford, CA.
BENJAMINI, Y. and Y. HOCHBERG, 1995 Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57:289-300.
BENJAMINI, Y. and D. YEKUTIELI, 2001 The control of the false discovery rate in multiple testing under dependency. Ann. Stat. 29:1165-1188.
DUPUIS, J., P. O. BROWN, and D. O. SIEGMUND, 1995 Statistical methods for linkage analysis of complex traits from high-resolution maps of identity by descent. Genetics 140:843-856.
EFRON, B., R. TIBSHIRANI, J. STOREY, and V. TUSHER, 2001 Empirical Bayes analysis of a microarray experiment. J. Am. Stat. Assoc. 96:1151-1160.
FEINGOLD, E., P. O. BROWN, and D. SIEGMUND, 1993 Gaussian models for genetic linkage analysis using complete high-resolution maps of identity by descent. Am. J. Hum. Genet. 53:234-251.
LANDER, E. and D. BOTSTEIN, 1986 Strategies for studying heterogeneous genetic traits in humans by using a linkage map of restriction fragment length polymorphisms. Proc. Natl. Acad. Sci. USA 83:7353-7357.
LANDER, E. and D. BOTSTEIN, 1989 Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121:185-190.
LANDER, E. and L. KRUGLYAK, 1995 Genetic dissection of complex traits: guidelines for interpreting and reporting linkage results. Nat. Genet. 11:241-247.
OPHOFF, R., M. ESCAMILLA, S. SERVICE, M. SPESNY, and D. MESHI et al., 2002 Genome-wide linkage disequilibrium mapping of severe bipolar disorder in a population isolate. Am. J. Hum. Genet. 71:565-574.
SABATTI, C., S. SERVICE and N. FREIMER, 2002 UCLA Statistical Department Preprint. University of California, Los Angeles.
SERVICE, S., D. TEMPLE LANG, N. FREIMER, and L. SANDKUIJL, 1999 Linkage-disequilibrium mapping of disease genes by reconstruction of ancestral haplotypes in founder populations. Am. J. Hum. Genet. 64:1728-1738.
WELLER, J. I., J. Z. SONG, D. W. HEYEN, H. A. LEWIN, and M. RON, 1998 A new approach to the problem of multiple comparisons in the genetic dissection of complex traits. Genetics 150:1699-1706.