Genome-wide association studies (GWASs) have been effectively identifying the genomic regions associated with a disease trait. In a typical GWAS, an informative subset of the single-nucleotide polymorphisms (SNPs), called tag SNPs, is genotyped in case/control individuals. Once the tag SNP statistics are computed, the genomic regions that are in linkage disequilibrium (LD) with the most significantly associated tag SNPs are believed to contain the causal polymorphisms. However, such LD regions are often large and contain many additional polymorphisms. Following up all the SNPs included in these regions is costly and infeasible for biological validation. In this article we address how to characterize these regions cost effectively with the goal of providing investigators a clear direction for biological validation. We introduce a follow-up study approach for identifying all untyped associated SNPs by selecting additional SNPs, called follow-up SNPs, from the associated regions and genotyping them in the original case/control individuals. We introduce a novel SNP selection method with the goal of maximizing the number of associated SNPs among the chosen follow-up SNPs. We show how the observed statistics of the original tag SNPs and human genetic variation reference data such as the HapMap Project can be utilized to identify the follow-up SNPs. We use simulated and real association studies based on the HapMap data and the Wellcome Trust Case Control Consortium to demonstrate that our method shows superior performance to the correlation- and distance-based traditional follow-up SNP selection approaches. Our method is publicly available at http://genetics.cs.ucla.edu/followupSNPs.

  • Received December 27, 2010.
  • Accepted March 22, 2011.
View Full Text