RT Journal Article
SR Electronic
T1 Will Big Data Close the Missing Heritability Gap?
JF Genetics
JO Genetics
FD Genetics Society of America
SP genetics.300271.2017
DO 10.1534/genetics.117.300271
A1 Kim, Hwasoon
A1 Grueneberg, Alexander
A1 Vazquez, Ana I.
A1 Hsu, Stephen
A1 de los Campos, Gustavo
YR 2017
UL http://www.genetics.org/content/early/2017/09/11/genetics.117.300271.abstract
AB Despite the important discoveries reported by Genome-Wide Association studies, for most traits and diseases the prediction R-squared (R-sq.) achieved with genetic scores remains considerably lower than the trait heritability. Modern biobanks will soon deliver unprecedentedly large biomedical data sets: Will the advent of Big Data close the gap between the trait heritability and the proportion of variance that can be explained by a genomic predictor? We addressed this question using Bayesian methods and a data analysis approach that produces a surface response relating prediction R-sq. with sample size and model complexity (e.g., number of SNPs). We applied the methodology to data from the interim release of the UK Biobank. Focusing on human height as a model trait and using 80,000 records for model training, we achieved a prediction R-sq. in testing (n=22,221) of 0.24 (95% CI: 0.23-0.25). Our estimates show that prediction R-sq. increases with sample size reaching an estimated plateau at values that ranged from 0.1 to 0.37 for models using 500 and 50,000 (GWA-selected) SNPs, respectively. Soon much larger data sets will become available. Using the estimated surface response, we forecast that larger sample sizes will lead to further improvements in prediction R-sq. We conclude that Big Data will lead to a substantial reduction of the gap between trait heritability and the proportion of inter-individual differences that can be explained with a genomic predictor. However, even with the power of Big Data, for complex traits, we anticipate that the gap between prediction R-sq. and trait heritability will not be fully closed.