Originally published as Genetics Published Articles Ahead of Print on April 19, 2006.
Genetics, Vol. 173, 1747-1760, July 2006, Copyright © 2006
doi:10.1534/genetics.105.042648
Two-Stage Designs in Case-Control Association Analysis
Yijun Zuo*, Guohua Zou† and Hongyu Zhao‡,1
* Department of Statistics and Probability, Michigan State University, East Lansing, Michigan 48824,
† Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100080, People's Republic of China and
‡ Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, Connecticut 06520
1 Corresponding author: Department of Epidemiology and Public Health, Yale University School of Medicine, 60 College St., 200 LEPH, New Haven, CT 06520-8034.
E-mail: hongyu.zhao@yale.edu
DNA pooling is a cost-effective approach for collecting information on marker allele frequency in genetic studies. It is often suggested as a screening tool to identify a subset of candidate markers from a very large number of markers, to be followed up by more accurate and informative individual genotyping. In this article, we investigate several statistical properties and design issues related to this two-stage design, including the selection of candidate markers for second-stage analysis, the statistical power of this design, and the probability that truly disease-associated markers are ranked among the top after second-stage analysis. We have derived analytical results on the proportion of markers to be selected for second-stage analysis. For example, to detect disease-associated markers with an allele frequency difference of 0.05 between cases and controls using an initial sample of 1000 cases and 1000 controls, our results suggest that when the measurement errors are small (0.005), 3% of the markers should be selected. Regarding the statistical power to identify disease-associated markers, we find that the measurement errors associated with DNA pooling have little effect on the power of the two-stage design. This contrasts with the one-stage pooling scheme, where measurement errors may have a large effect on statistical power. As for the probability that disease-associated markers are ranked among the top in the second stage, we show that, for reasonably large sample sizes, there is a high probability that at least one disease-associated marker is ranked among the top when the allele frequency differences between cases and controls are at least 0.05, even when the errors associated with DNA pooling in the first stage are not small. Therefore, the two-stage design with DNA pooling as a screening tool offers an efficient strategy for genomewide association studies, even when the measurement errors associated with DNA pooling are nonnegligible. For any disease model, we find that all the statistical results depend essentially on the population allele frequency and on the allele frequency differences between cases and controls at the disease-associated markers. The general conclusions hold whether the second stage uses an entirely independent sample or combines the first-stage samples with an independent set of samples.
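The abstract does not spell out the first-stage statistic. To illustrate how pooling error and the selected fraction interact, here is a minimal Python sketch under several stated assumptions, none taken from the article: Hardy-Weinberg equilibrium; one pool per group with an independent additive measurement error of SD `sigma_e` on the estimated allele frequency; a two-sided z statistic standardized by its null SD; and nearly all markers being null, so keeping the top fraction `frac_selected` corresponds to a fixed normal threshold. The helper names and the symmetric split of the difference `delta` around the population frequency `p` are hypothetical choices. This is not the authors' analytical derivation.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def pooled_diff_sd(p_case, p_ctrl, n_case, n_ctrl, sigma_e):
    """SD of (case pool estimate - control pool estimate).

    Assumes Hardy-Weinberg equilibrium (2n sampled alleles per group)
    and one pool per group, each carrying an independent additive
    measurement error with SD sigma_e (an assumption of this sketch).
    """
    var = (p_case * (1 - p_case) / (2 * n_case)
           + p_ctrl * (1 - p_ctrl) / (2 * n_ctrl)
           + 2 * sigma_e ** 2)
    return math.sqrt(var)


def stage1_selection_prob(p, delta, n_case, n_ctrl, sigma_e, frac_selected):
    """Chance that a marker with case-control difference delta is kept.

    Hypothetical helper: assumes nearly all markers are null, so keeping
    the top frac_selected of |z| is a two-sided cut at a fixed N(0, 1)
    quantile, and splits delta symmetrically around p.
    """
    c = STD_NORMAL.inv_cdf(1 - frac_selected / 2)
    p_case, p_ctrl = p + delta / 2, p - delta / 2
    sd_null = pooled_diff_sd(p, p, n_case, n_ctrl, sigma_e)  # standardizes z
    sd_alt = pooled_diff_sd(p_case, p_ctrl, n_case, n_ctrl, sigma_e)
    diff = NormalDist(delta, sd_alt)  # approximate law of the observed difference
    # |diff / sd_null| > c  <=>  diff > c * sd_null  or  diff < -c * sd_null
    return (1 - diff.cdf(c * sd_null)) + diff.cdf(-c * sd_null)
```

For example, `stage1_selection_prob(0.3, 0.05, 1000, 1000, 0.005, 0.03)` gives, under these assumptions, the chance that a marker with a 0.05 frequency difference survives a 3% cut in a 1000-case, 1000-control first stage. Setting `delta=0` returns `frac_selected` itself, a quick check that the threshold matches the intended selection proportion. Increasing `sigma_e` lowers the selection probability, which is why the proportion carried forward has to grow as pooling error grows.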