Genome-Wide Regression & Prediction with the BGLR Statistical Package
Paulino Pérez, Gustavo de los Campos

Abstract

Many modern genomic data analysis require implementing regressions where the number of parameters (p, e.g., the number of marker effects) exceeds sample size (n). Implementing these large-p-with-small-n regressions poses several statistical and computational challenges, some of which can be confronted using Bayesian methods. This approach allows integrating various parametric and non-parametric shrinkage and variable selection procedures in a unified and consistent manner. The BGLR R-package implements a large collection Bayesian regression models, including parametric variable selection and shrinkage methods and semi-parametric procedures (Bayesian reproducing kernel Hilbert spaces regressions, RKHS). The software was originally developed for genomic applications; however, the methods implemented are useful for many non-genomic applications as well. The response can be continuous (censored or not) or categorical (either binary, or ordinal). The algorithm is based on a Gibbs Sampler with scalar updates and the implementation takes advantage of efficient compiled C and Fortran routines. In this article we describe the methods implemented in BGLR, present examples of the use of the package and discuss practical issues emerging in real-data analysis.

  • Received March 22, 2014.
  • Accepted June 26, 2014.