This article has a correction.
The availability of genomewide dense markers brings opportunities and challenges to breeding programs. An important question concerns the ways in which dense markers and pedigrees, together with phenotypic records, should be used to arrive at predictions of genetic values for complex traits. If a large number of markers are included in a regression model, marker-specific shrinkage of regression coefficients may be needed. For this reason, the Bayesian least absolute shrinkage and selection operator (LASSO) (BL) appears to be an interesting approach for fitting marker effects in a regression model. This article adapts the BL to arrive at a regression model where markers, pedigrees, and covariates other than markers are considered jointly. Connections between BL and other marker-based regression models are discussed, and the sensitivity of BL with respect to the choice of prior distributions assigned to key parameters is evaluated using simulation. The proposed model was fitted to two data sets from wheat and mouse populations, and evaluated using cross-validation methods. Results indicate that inclusion of markers in the regression further improved the predictive ability of models. An R program that implements the proposed model is freely available.

GENOMEWIDE dense marker maps are now available for many plant and animal species (e.g., Wang et al. 2005). An important challenge is how this information should be incorporated into statistical models for predicting genetic values in animal and plant breeding programs, or for predicting disease risk.

A standard quantitative genetic model assumes that genetic ($g$) and environmental ($e$) effects act additively to produce phenotypic outcomes ($y$) according to the rule $y = g + e$. The information set now available for predicting genetic values may include, in addition to phenotypic records, a pedigree, molecular markers, or both.

Several methodologies have been proposed for incorporating dense marker data into regression models. A distinction can be made between methods that explicitly regress phenotypic records on markers via the regression function $g(\mathbf{x}_i) = \mathbf{x}_i'\boldsymbol{\beta}$, where $\mathbf{x}_i$ is a vector of marker covariates and $\boldsymbol{\beta}$ is a vector of regression coefficients, e.g., $y_i = \mathbf{x}_i'\boldsymbol{\beta} + e_i$, and those that view genetic values as a function of the subject and use marker information to build a (co)variance structure between subjects. The first group of methods includes standard Bayesian regression (BR) with random coefficients, i.e., a Bayesian model where regression coefficients are assigned the same Gaussian prior, and other shrinkage methods such as Bayes A or Bayes B of Meuwissen et al. (2001), and specifications described in Gianola et al. (2003). The second type of approach was suggested by Gianola et al. (2006) and Gianola and van Kaam (2008), who proposed using reproducing kernel Hilbert spaces regression (RKHS), with the information set consisting of SNP (single-nucleotide polymorphism) genotypes, possibly supplemented by genealogies. As discussed in de los Campos et al. (2009), in this approach marker information is used to create a prior (co)variance structure between genomic values, $\mathrm{Cov}(g_i, g_j) = K(\mathbf{x}_i, \mathbf{x}_j)\sigma^2_g$, where $K(\cdot, \cdot)$ is some positive-definite function and $\sigma^2_g$ is a parameter to be estimated from the data.

The two types of approaches lead to predictions of genomic values for quantitative traits. An advantage of explicitly regressing phenotypes on marker covariates is that the model can produce information about genomic regions that may affect the trait of interest. However, a main difficulty is that the number of regression coefficients ($p$) is typically large, often larger than the number of records ($n$), with $p \gg n$. Therefore, a crucial aspect is how this methodology can cope with the curse of dimensionality and with collinearity.

With whole-genome scans, many markers are likely to be located in regions that are not involved in the determination of traits of interest. On the other hand, some markers may be in linkage disequilibrium with some QTL or in regions harboring genes involved in the infinitesimal component of the trait. This suggests that differential shrinkage of marker effects should be a feature of the model, as noted by Meuwissen et al. (2001). Tibshirani (1996) proposed a regression method (least absolute shrinkage and selection operator, LASSO) that combines the good features of subset selection (i.e., variable selection) with the shrinkage produced by BR. Recently, Park and Casella (2008) presented a Bayesian version of the LASSO method (Bayesian LASSO, BL) and suggested a Gibbs sampler for its implementation. Alternatives to the Gibbs sampler of Park and Casella are discussed in Hans (2008). While the BL described by Park and Casella is appealing for the reasons mentioned above, it does not accommodate pedigree information or regression on (co)variates other than the markers for which a different shrinkage approach may be desired.

Several authors have considered combining pedigree and marker data into a single model in the context of QTL analysis (e.g., Fernando and Grossman 1989; Bink et al. 2002, 2008). Here, in this spirit, the BL is modified and extended to accommodate pedigree information as well as covariates other than markers.

The main objectives of this article are to (1) discuss the use of BL and related methods in the context of linear regression of quantitative traits on molecular markers, (2) evaluate the sensitivity of BL with respect to the choice of the prior for the regularization parameter, (3) extend the BL so that pedigrees or regressions on covariates other than markers can also be included in the model, and (4) evaluate the methodology using data from a self-pollinated wheat population and an outcross mouse population. The article is organized as follows: the first section, *Bayesian LASSO*, introduces the BL as presented in Park and Casella (2008) and discusses connections between BL and closely related methods, such as those proposed by Meuwissen et al. (2001) or variants proposed by other authors. *Monte Carlo study* evaluates the sensitivity of BL with respect to the choice of prior for the regularization parameter. *Bayesian regression coupled with LASSO* presents an extension of BL, treating effects of different types of regressors with different priors. In *Data analysis*, the proposed methodology is applied to two data sets representing a collection of wheat lines and a population of mice. Concluding remarks are provided in the final section of the article. An R function (R Development Core Team 2008) that fits the model and the data sets used in this article are made available (see supporting information, File S1 and File S2).


Tibshirani (1996) proposed using the sum of the absolute values of the regression coefficients (or L1 norm) as a penalty in regression models, to simultaneously produce variable selection and shrinkage of coefficients; the proposed methodology was termed LASSO. In LASSO, estimates are obtained by solving the constrained optimization problem

$$\hat{\boldsymbol{\beta}} = \operatorname*{argmin}_{\boldsymbol{\beta}} \sum_{i=1}^{n} \left( y_i - \mathbf{x}_i'\boldsymbol{\beta} \right)^2 \quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_j| \le t, \tag{1}$$

where $\mathbf{x}_i$ is a vector of covariates, $\boldsymbol{\beta}$ is the corresponding vector of regression coefficients, and $t$ is an arbitrary positive constant. Above, it is assumed that data are centered, i.e., $y_i$ has zero mean. Optimization problem (1) is equivalent to

$$\hat{\boldsymbol{\beta}} = \operatorname*{argmin}_{\boldsymbol{\beta}} \left\{ \sum_{i=1}^{n} \left( y_i - \mathbf{x}_i'\boldsymbol{\beta} \right)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \right\} \tag{2}$$

(e.g., Tibshirani 1996), for some value of the smoothing parameter $\lambda$. It is known that the solution to (2) may involve zeroing out some elements of $\boldsymbol{\beta}$, and there are many ways of illustrating why this may be so. One manner is to examine the shape of the feasible set in (1) (e.g., Tibshirani 1996); another way is to consider the Bayesian interpretation of the LASSO. From (2), it follows that the solution can be viewed as the posterior mode in a Bayesian model with Gaussian likelihood, $\prod_{i=1}^{n} N(y_i \mid \mathbf{x}_i'\boldsymbol{\beta}, \sigma^2_{\varepsilon})$, and a prior on $\boldsymbol{\beta}$ that is the product of $p$ independent, zero-mean, double-exponential (DE) densities; that is, $p(\boldsymbol{\beta} \mid \lambda) = \prod_{j=1}^{p} \frac{\lambda}{2} \exp\left( -\lambda |\beta_j| \right)$. In contrast, BR is obtained by assuming the same likelihood and a prior on $\boldsymbol{\beta}$ that is the product of $p$ independent normal densities; that is, $p(\boldsymbol{\beta} \mid \sigma^2_{\beta}) = \prod_{j=1}^{p} N(\beta_j \mid 0, \sigma^2_{\beta})$, where $\sigma^2_{\beta}$ is a variance parameter common to all regression coefficients. The difference between these two priors is illustrated in Figure 1: the DE density places more mass at zero and has thicker tails than the Gaussian distribution. From this perspective, relative to BR, LASSO produces stronger shrinkage of regression coefficients that are close to zero and less shrinkage of those with large absolute values.
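The zeroing-out behavior of (2) can be made concrete in the special case of an orthonormal design, where the LASSO solution is the soft-thresholded least-squares estimate, $\hat{\beta}_j = \mathrm{sign}(\hat{\beta}^{\,\text{OLS}}_j)\,(|\hat{\beta}^{\,\text{OLS}}_j| - \lambda/2)_+$, while ridge (the BR posterior mode) only shrinks proportionally. The following Python/numpy sketch (ours; the software accompanying the article is in R, and the specific coefficient values here are illustrative) contrasts the two:

```python
import numpy as np

def soft_threshold(b_ols, lam):
    """LASSO solution for an orthonormal design: shrink each OLS estimate
    toward zero and set to exactly zero those with |b| <= lam / 2."""
    return np.sign(b_ols) * np.maximum(np.abs(b_ols) - lam / 2.0, 0.0)

def ridge_shrink(b_ols, k):
    """Ridge (BR posterior-mode) counterpart: proportional shrinkage,
    estimates never become exactly zero."""
    return b_ols / (1.0 + k)

b_ols = np.array([-2.0, -0.3, 0.1, 0.8, 3.0])  # hypothetical OLS estimates
b_lasso = soft_threshold(b_ols, lam=1.0)       # -> [-1.5, 0.0, 0.0, 0.3, 2.5]
b_ridge = ridge_shrink(b_ols, k=1.0)           # all entries remain nonzero
```

Small coefficients are zeroed out and large ones are shifted by a constant, which mirrors the DE prior's behavior of shrinking near-zero effects strongly while shrinking large effects comparatively less.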

Figure 1.—

Densities of a normal and of a double-exponential distribution (both with null mean and with unit variance).

Parameter $\lambda$, sometimes referred to as a regularization parameter, plays a central role: as it approaches zero, the solution to (2) tends to ordinary least squares, while large values of $\lambda$ penalize the L1 norm of $\boldsymbol{\beta}$, $\sum_{j=1}^{p}|\beta_j|$, heavily. In the Bayesian view of LASSO, $\lambda$ controls the prior on $\boldsymbol{\beta}$, with large values of this parameter associated with more informative (sharper) priors.

By construction, the non-Bayesian LASSO solution admits at most n − 1 nonzero regression coefficients (e.g., Park and Casella 2008). This is not desirable in models with dense marker-based regressions since, a priori, there is no reason why the number of markers with effectively nonzero effects should be smaller than the number of observations. This problem does not arise in BL, which is discussed next.

A computationally convenient hierarchical formulation of a DE distribution is obtained by exploiting the fact that the DE density can be represented as a mixture of scaled Gaussian densities (e.g., Andrews and Mallows 1974; Rosa 1999), where the mixing process of the variances is an exponential distribution. Following Park and Casella (2008),

$$\frac{\lambda}{2} \exp\left( -\lambda |\beta_j| \right) = \int_{0}^{\infty} N\!\left( \beta_j \mid 0, \tau_j^2 \right) \mathrm{Exp}\!\left( \tau_j^2 \,\middle|\, \frac{\lambda^2}{2} \right) \mathrm{d}\tau_j^2 .$$

Above, $\beta_j$ is the unknown effect of the $j$th marker and $\tau_j^2$ is a variance parameter (measuring prior uncertainty) associated with $\beta_j$. Using this, Park and Casella (2008) suggested the following hierarchical model (BL),

$$p(\mathbf{y} \mid \boldsymbol{\beta}, \sigma^2_{\varepsilon}) = \prod_{i=1}^{n} N\!\left( y_i \mid \mathbf{x}_i'\boldsymbol{\beta}, \sigma^2_{\varepsilon} \right) \tag{3}$$

$$p(\boldsymbol{\beta}, \boldsymbol{\tau}^2, \sigma^2_{\varepsilon}, \lambda^2) = \prod_{j=1}^{p} \left[ N\!\left( \beta_j \mid 0, \tau_j^2 \sigma^2_{\varepsilon} \right) \mathrm{Exp}\!\left( \tau_j^2 \mid \lambda^2 \right) \right] \chi^{-2}\!\left( \sigma^2_{\varepsilon} \mid \mathrm{df}_{\varepsilon}, S_{\varepsilon} \right) G\!\left( \lambda^2 \mid \alpha_1, \alpha_2 \right). \tag{4}$$

Above, $N(y_i \mid \mathbf{x}_i'\boldsymbol{\beta}, \sigma^2_{\varepsilon})$ and $N(\beta_j \mid 0, \tau_j^2\sigma^2_{\varepsilon})$ are normal densities centered at $\mathbf{x}_i'\boldsymbol{\beta}$ and 0, with variances $\sigma^2_{\varepsilon}$ and $\tau_j^2\sigma^2_{\varepsilon}$, respectively; $\chi^{-2}(\sigma^2_{\varepsilon} \mid \mathrm{df}_{\varepsilon}, S_{\varepsilon})$ is a scaled-inverted chi-square density with degrees of freedom $\mathrm{df}_{\varepsilon}$ and scale $S_{\varepsilon}$; $\mathrm{Exp}(\tau_j^2 \mid \lambda^2)$ is an exponential density indexed by a single parameter, $\lambda^2$; and $G(\lambda^2 \mid \alpha_1, \alpha_2)$ is a Gamma distribution, with shape parameter $\alpha_1$ and rate parameter $\alpha_2$.
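The scale-mixture identity above can be checked by simulation: drawing $\tau_j^2$ from the exponential mixing distribution and then $\beta_j \mid \tau_j^2 \sim N(0, \tau_j^2)$ should yield a marginal that matches the DE, whose variance is $2/\lambda^2$ and whose excess kurtosis is 3 (heavier tails than a Gaussian). A small Monte Carlo sketch in Python/numpy (ours, with an arbitrary $\lambda = 2$; the article's software is in R):

```python
import numpy as np

rng = np.random.default_rng(0)
lam, n = 2.0, 200_000

# Mixing distribution: tau2 ~ Exp(rate = lam^2 / 2), i.e. mean 2 / lam^2
tau2 = rng.exponential(scale=2.0 / lam**2, size=n)

# Conditional normal (taking sigma^2 = 1 for simplicity): beta | tau2 ~ N(0, tau2)
beta = rng.normal(0.0, np.sqrt(tau2))

# Marginally, beta should be double exponential with rate lam:
# variance 2 / lam^2 = 0.5 here, and excess kurtosis approximately 3
var_hat = beta.var()
kurt_hat = ((beta - beta.mean()) ** 4).mean() / beta.var() ** 2 - 3.0
```

The empirical variance and kurtosis agree closely with the DE values, illustrating why the hierarchical Gaussian-exponential representation is equivalent to the DE prior and convenient for Gibbs sampling.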

The role of the $\tau_j^2$'s becomes clearer by changing variables in (3) and (4) from $\beta_j$ to $b_j = \beta_j/\tau_j$. After this change of variables, the product of the likelihood function and of the joint prior for the regression coefficients, $\prod_{i=1}^{n} N(y_i \mid \mathbf{x}_i'\boldsymbol{\beta}, \sigma^2_{\varepsilon}) \prod_{j=1}^{p} N(\beta_j \mid 0, \tau_j^2\sigma^2_{\varepsilon})$, becomes $\prod_{i=1}^{n} N(y_i \mid \tilde{\mathbf{x}}_i'\mathbf{b}, \sigma^2_{\varepsilon}) \prod_{j=1}^{p} N(b_j \mid 0, \sigma^2_{\varepsilon})$, where $\tilde{x}_{ij} = x_{ij}\tau_j$ and $\mathbf{b}$ is a vector of regression coefficients with homogeneous variance. Thus, one way of viewing this class of regression models is as a standard BR model with additional unknowns, the $\tau_j$'s, which assign different weights to the columns of $\mathbf{X}$, with $\tau_j = 0$ being equivalent to removing the $j$th covariate from the model.

Park and Casella (2008) presented a set of fully conditional distributions that allows fitting the BL model via the Gibbs sampler. Some of these distributions are discussed next, to illustrate main features of the algorithm.

Location parameters:

In the Gibbs sampler of Park and Casella (2008), the fully conditional distribution of the regression coefficients is multivariate normal with mean (covariance matrix) equal to the solution (inverse of the coefficient matrix) of the system of equations

$$\left\{ \mathbf{X}'\mathbf{X} + \mathrm{Diag}\!\left( \tau_j^{-2} \right) \right\} \hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y}. \tag{5}$$

Recall that ordinary least-squares estimates are obtained by solving $\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y}$ and that the counterpart of (5) in BR is $\{ \mathbf{X}'\mathbf{X} + \mathbf{I}\,\sigma^2_{\varepsilon}/\sigma^2_{\beta} \} \hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y}$. A key aspect of BL is that it produces a shrinkage that is marker specific, contrary to BR. Since $\sigma^2_{\varepsilon}$ is a scaling factor common to all regression coefficients, the differential shrinkage is due to the $\tau_j^2$'s. If $\tau_j^2$ is large, i.e., a large variance is associated with the effect of the $j$th marker, the quantity added to the $j$th diagonal element, $\tau_j^{-2}$, will be small. Conversely, if a small variance is associated with the effect of the $j$th coefficient, $\tau_j^{-2}$ will be large. Adding a large constant to the $j$th diagonal element shrinks the least-squares estimate toward zero and reduces the variance of its fully conditional distribution.
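The marker-specific shrinkage produced by system (5) can be sketched numerically. In this Python/numpy illustration (ours; the data, true effects, and $\tau_j^2$ values are hypothetical), coefficients with small prior variances are pulled strongly toward zero, while those with large prior variances stay close to their least-squares estimates:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 4
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, 0.0, 0.0, -1.0])
y = X @ beta_true + rng.normal(size=n)

# Marker-specific prior variances tau2_j: large -> little shrinkage,
# small -> strong shrinkage of the corresponding coefficient
tau2 = np.array([10.0, 0.01, 0.01, 10.0])

# Coefficient matrix of system (5): X'X + Diag(1 / tau2_j)
C = X.T @ X + np.diag(1.0 / tau2)
beta_hat = np.linalg.solve(C, X.T @ y)           # conditional posterior mean
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)     # ordinary least squares
```

Coefficients 2 and 3 (with $\tau_j^2 = 0.01$, so $\tau_j^{-2} = 100$ added to the diagonal) are shrunk essentially to zero, whereas coefficients 1 and 4 are nearly unshrunk, in line with the discussion above.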

Variances of the regression coefficients:

An important aspect of the algorithm is how samples of the regression coefficients affect realizations of the variances of marker effects. In BL, the fully conditional posterior distributions of the $\tau_j^{-2}$'s can be shown to be inverse Gaussian (e.g., Chhikara and Folks 1989), with mean $\sqrt{\lambda^2\sigma^2_{\varepsilon}/\beta_j^2}$ and scale parameter $\lambda^2$. For a given $\lambda$, a small absolute value of $\beta_j$ will lead to a fully conditional distribution of $\tau_j^{-2}$ with a large mean, which in turn will generate relatively small values of $\tau_j^2$.
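This feedback between $\beta_j$ and $\tau_j^2$ can be sketched with numpy's inverse Gaussian (Wald) sampler. The values of $\lambda^2$, $\sigma^2_{\varepsilon}$, and the two marker effects below are hypothetical; the point is only that a small $|\beta_j|$ produces draws of $\tau_j^{-2}$ with a large mean, and hence small $\tau_j^2$'s:

```python
import numpy as np

rng = np.random.default_rng(2)
lam2, sigma2 = 4.0, 1.0  # hypothetical lambda^2 and residual variance

def draw_tau2(beta_j, size=100_000):
    """Draw tau_j^2 via its fully conditional: 1 / tau_j^2 is inverse Gaussian
    with mean sqrt(lam2 * sigma2 / beta_j^2) and scale parameter lam2."""
    mu = np.sqrt(lam2 * sigma2 / beta_j**2)
    inv_tau2 = rng.wald(mu, lam2, size=size)  # numpy's Wald = inverse Gaussian
    return 1.0 / inv_tau2

tau2_small = draw_tau2(0.05)   # small marker effect -> small variances
tau2_large = draw_tau2(2.00)   # large marker effect -> larger variances
```

On average the draws associated with the small effect are much smaller than those for the large effect, which is exactly the mechanism that drives the marker-specific shrinkage in system (5).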

The λ parameter of the exponential prior:

In the standard LASSO, $\lambda$ controls the trade-off between goodness of fit and model complexity, and this may be crucial in defining the ability of a model to uncover signal. Small values of $\lambda$ produce better fit, in the sense of a smaller residual sum of squares ($\lambda = 0$ gives ordinary least squares); as $\lambda$ increases, the penalty on model complexity increases (in optimization problem (1) the feasible set is smaller). On the other hand, in BL, $\lambda$ controls the shape of the prior distribution assigned to the $\tau_j^2$'s. In general, the exponential prior assigns more density to small values of the $\tau_j^2$'s than to large ones, and this may be reasonable for most SNPs under the expectation that most of their effects are nil.

In BL, $\lambda$ can be treated as any other unknown. If, as in (4), a Gamma prior is assigned to $\lambda^2$, the fully conditional posterior distribution of $\lambda^2$ is also Gamma, with shape and rate parameters equal to $\alpha_1 + p$ and $\alpha_2 + \frac{1}{2}\sum_{j=1}^{p}\tau_j^2$, respectively. The expectation of this Gamma distribution is $(\alpha_1 + p)/(\alpha_2 + \frac{1}{2}\sum_{j=1}^{p}\tau_j^2)$, so a large value of $\sum_{j=1}^{p}\tau_j^2$ will lead to a relatively small $\lambda^2$, and the opposite will occur if the sum of the variances of the regression coefficients is small.
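This Gibbs update is a one-liner in practice. The sketch below (ours, with hypothetical prior parameters and a handful of illustrative $\tau_j^2$ draws) computes the shape and rate of the fully conditional Gamma and draws one value of $\lambda^2$:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical values: Gamma(alpha1, alpha2) prior and current tau2 draws
alpha1, alpha2 = 1.1, 0.1
tau2 = np.array([0.5, 0.1, 0.9, 0.2, 0.3])
p = tau2.size

shape = alpha1 + p                    # alpha1 + p
rate = alpha2 + tau2.sum() / 2.0      # alpha2 + (1/2) * sum of tau2_j

# Fully conditional mean of lambda^2 and one Gibbs draw
lam2_mean = shape / rate
lam2_draw = rng.gamma(shape, 1.0 / rate)  # numpy parameterizes Gamma by scale = 1/rate
```

With these numbers the conditional mean is $(1.1 + 5)/(0.1 + 1.0) \approx 5.55$; larger sums $\sum_j \tau_j^2$ would pull the draw of $\lambda^2$ down, as described in the text.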

Relationship between LASSO and other regression models used in genomic selection:

Standard BR may not be suitable for regressing phenotypes on a large number of markers because shrinkage of regression coefficients is homogeneous across markers (Fernando et al. 2007). In contrast, in BL the variance is marker specific, producing shrinkage whose extent is related to the absolute value of the estimated regression coefficient.

Meuwissen et al. (2001) recognized that marker-specific variances may be needed and suggested regression models based on marginal priors that are also mixtures of scaled-Gaussian distributions (“Bayes A”) or mixtures of scaled-Gaussian distributions and of a point mass at zero (“Bayes B”). In these models, the likelihood is as in (3) and, in Bayes A, the prior is

$$p(\boldsymbol{\beta}, \boldsymbol{\sigma}^2_{\beta}, \sigma^2_{\varepsilon}) = \prod_{j=1}^{p} \left[ N\!\left( \beta_j \mid 0, \sigma^2_{\beta_j} \right) \chi^{-2}\!\left( \sigma^2_{\beta_j} \mid \mathrm{df}_{\beta}, S_{\beta} \right) \right] \chi^{-2}\!\left( \sigma^2_{\varepsilon} \mid \mathrm{df}_{\varepsilon}, S_{\varepsilon} \right). \tag{6}$$

The first two components of (6), $\prod_{j=1}^{p} N(\beta_j \mid 0, \sigma^2_{\beta_j})\,\chi^{-2}(\sigma^2_{\beta_j} \mid \mathrm{df}_{\beta}, S_{\beta})$, are the counterparts of the first two components of (4), $\prod_{j=1}^{p} N(\beta_j \mid 0, \tau_j^2\sigma^2_{\varepsilon})\,\mathrm{Exp}(\tau_j^2 \mid \lambda^2)$, with $\sigma^2_{\beta_j} = \tau_j^2\sigma^2_{\varepsilon}$. The difference between BL and Bayes A (or Bayes B) is how the priors of the variances of the marker-specific regression coefficients ($\sigma^2_{\beta_j}$ in Bayes A and $\tau_j^2\sigma^2_{\varepsilon}$ in BL) are specified. At this level, Bayes A and BL differ in two respects:

  1. In Bayes A, the prior assumption is that the marker-specific variances are independent random variables following the same scaled-inverted chi-square distribution with known prior degrees of belief $\mathrm{df}_{\beta}$ and scale $S_{\beta}$. In BL, the assumption is that these variances are independent as well, but each follows the same exponential distribution with unknown parameter $\lambda^2$. The conditional (given $\lambda$) marginal prior of $\beta_j$ in BL is DE, while in Bayes A the marginal prior of $\beta_j$ is a scaled $t$-distribution. Although a $t$-distribution may place more density at zero than the Gaussian prior of BR, the density at zero is larger in the DE. This issue was recognized by Meuwissen et al. (2001), leading to the development of Bayes B.

    Xu (2003) employed an improper prior for the marker-specific variances; if $\beta_j \mid \sigma^2_{\beta_j} \sim N(0, \sigma^2_{\beta_j})$ and $p(\sigma^2_{\beta_j}) \propto (\sigma^2_{\beta_j})^{-1}$, then $p(\beta_j) \propto |\beta_j|^{-1}$. Similar to the exponential prior, this density decreases monotonically with $|\beta_j|$. However, unlike the exponential distribution, where $\lambda$ can be used to “tune” the shape of the distribution, this prior does not have parameters to allow any control. In addition, as noted by Ter Braak et al. (2005), $p(\sigma^2_{\beta_j}) \propto (\sigma^2_{\beta_j})^{-1}$ yields an improper posterior. As an alternative, these authors suggested using $p(\sigma^2_{\beta_j}) \propto (\sigma^2_{\beta_j})^{-\gamma}$ with $\gamma < 1$; although this prior is improper, it does not yield an improper posterior. As with the exponential prior, the implied marginal prior of $\beta_j$ is decreasing with respect to $|\beta_j|$. Ter Braak (2006) further discussed the role of $\gamma$, which, as $\lambda$ in the BL, controls the shape of the prior density on the variance of the regression coefficients.

  2. A second difference is that, in Bayes A, the values of parameters $\mathrm{df}_{\beta}$ and $S_{\beta}$ are specified as known a priori. On the other hand, in BL there is an extra level in the model: $\lambda^2$ is assigned a Gamma distribution, and information from all regression coefficients is pooled to estimate it. This difference is illustrated in Figure 2: in Bayes A, $\mathrm{df}_{\beta}$ and $S_{\beta}$ control, as $\lambda$ does in BL, the trade-offs between goodness of fit and model complexity.

    Figure 2.—

    Graphical representation of the hierarchical structure of the Bayesian LASSO (top) and Bayes A (bottom). In the Bayesian LASSO, the variances of the marker effects are $\tau_j^2\sigma^2_{\varepsilon}$, $j = 1, \ldots, p$, with counterparts $\sigma^2_{\beta_j}$ in Bayes A.

Yi and Xu (2008) discuss an extension of Bayes A where a prior is assigned to $\mathrm{df}_{\beta}$ and $S_{\beta}$, and these quantities are treated as nuisances, as $\lambda$ is in BL. However, as argued earlier, the DE seems to be a better choice if the assumption is that most markers have no effect on the trait of interest.


Although in the BL $\lambda$ can be treated as unknown, it is not clear how sensitive results might be with respect to the choice of the hyperparameters $\alpha_1$ and $\alpha_2$. Park and Casella (2008, p. 683) recognized that this may be an issue: “The prior density for $\lambda$ should approach 0 sufficiently fast as $\lambda \to \infty$ (to avoid mixing problems) but should be relatively flat and place high probability near the maximum likelihood estimate.” The main problem in applying this recommendation is that one does not know in advance what the maximum-likelihood estimate is.

The sensitivity of BL with respect to the choice of the prior distribution of $\lambda$ was investigated here by fitting the model under different priors to simulated data. In addition to the conjugate Gamma prior on $\lambda^2$, we also considered (see File S1 and File S2)

$$p(\lambda \mid \alpha_1, \alpha_2, \lambda_{\max}) \propto \mathrm{Beta}\!\left( \frac{\lambda}{\lambda_{\max}} \,\middle|\, \alpha_1, \alpha_2 \right), \quad \lambda \in [0, \lambda_{\max}]. \tag{7}$$

The above distribution gives great flexibility for specifying a relatively flat prior over a wide range of values. The uniform prior appears as a special case when $\alpha_1 = \alpha_2 = 1$. When the Beta prior is used, the fully conditional distribution of $\lambda$ does not have closed form; however, draws from the distribution can be obtained using the Metropolis–Hastings algorithm (see File S1 and File S2).
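A random-walk Metropolis–Hastings update for $\lambda$ under a Beta-based prior can be sketched as follows. This Python/numpy sketch is ours (the article's sampler is in R, in File S1 and File S2), and the values of the $\tau_j^2$'s, the prior parameters, and the proposal standard deviation are all hypothetical. It targets $p(\lambda \mid \boldsymbol{\tau}^2) \propto \prod_j \mathrm{Exp}(\tau_j^2 \mid \lambda^2/2)\, \mathrm{Beta}(\lambda/\lambda_{\max} \mid a, b)$:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical current state: tau2 draws and a Beta(a, b) prior on lambda / lam_max
tau2 = rng.exponential(scale=0.5, size=200)
a, b, lam_max = 1.1, 1.1, 20.0

def log_target(lam):
    """Unnormalized log p(lambda | tau2): exponential likelihood of the tau2's
    (rate lambda^2 / 2) times a Beta(a, b) prior on lambda / lam_max."""
    if not (0.0 < lam < lam_max):
        return -np.inf
    u = lam / lam_max
    log_prior = (a - 1.0) * np.log(u) + (b - 1.0) * np.log(1.0 - u)
    log_lik = tau2.size * np.log(lam**2 / 2.0) - lam**2 * tau2.sum() / 2.0
    return log_lik + log_prior

lam, accepted = 1.0, 0
for _ in range(2000):
    prop = lam + rng.normal(0.0, 0.25)  # random-walk proposal
    if np.log(rng.uniform()) < log_target(prop) - log_target(lam):
        lam, accepted = prop, accepted + 1
```

Proposals outside $(0, \lambda_{\max})$ are rejected automatically because the log target is $-\infty$ there; in a full sampler this update would be one step inside the Gibbs loop.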

Data-generating process:

Data were simulated in a simple setting, such that problems could be identified easily, while the phenotypic and genotypic structure attempted to resemble those encountered in real data sets.

Data were generated under the additive model

$$y_i = \sum_{j=1}^{p} x_{ij}\beta_j + \varepsilon_i,$$

where $y_i$ is the phenotype for individual $i$, $\beta_j$ is the effect of allele substitution at marker $j$ ($j = 1, \ldots, p$), and $x_{ij}$ is the code for the genotype of subject $i$ at locus $j$. Residuals were independently sampled from a standard normal distribution; that is, $\varepsilon_i \sim N(0, 1)$.
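A data-generating process of this form is straightforward to sketch. The Python/numpy code below is an illustration of ours, not the simulation script used in the article: genotype codes are drawn as 0/1 with allele frequency 0.5, marker positions and effect sizes are arbitrary, and no linkage disequilibrium is imposed (the study's X0 and X1 scenarios differ precisely in their LD structure):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 300, 280

# Genotype codes (0/1 here, an assumption for illustration; independent markers)
X = rng.binomial(1, 0.5, size=(n, p)).astype(float)

# Ten markers with nonnull substitution effects; the remaining 270 are zero
beta = np.zeros(p)
nonzero = rng.choice(p, size=10, replace=False)
beta[nonzero] = rng.normal(0.0, 1.0, size=10)

# Additive model with standard normal residuals
y = X @ beta + rng.normal(0.0, 1.0, size=n)
```

This matches the structure described above: 280 markers, 270 with no effect, and phenotypes produced by summing marker contributions plus a standard normal residual.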

Two scenarios regarding the genotypic distribution were considered. In scenario X0, markers were in low linkage disequilibrium (LD), with almost no correlation between adjacent markers (Table 1). In scenario X1 a relatively high LD was considered (Table 1).


Absolute values of the correlation between marker genotypes (average across markers and 100 Monte Carlo simulations) by scenario (X0, low linkage disequilibrium; X1, high linkage disequilibrium)

The effects of allele substitutions Math were kept constant across simulations and set to zero for all markers except for 10 (Figure 3). The locations of markers with nonnull effects were chosen such that different situations regarding effects of linked markers were represented.

Figure 3.—

Positions (chromosome and marker number) and effects of markers (there were 280 markers, 270 of which had no effect).

Choice of prior distribution of λ:

For each Monte Carlo (MC) replicate, five variations of BL were fitted; four of them (BL1–BL4) involved a Gamma prior, $G(\lambda^2 \mid \alpha_1, \alpha_2)$, with different choices of the parameters $\alpha_1$ and $\alpha_2$. In BL5, the prior on $\lambda$ was the Beta-based prior (7) (Figure 4).

Figure 4.—

Unnormalized density of the five priors evaluated in the MC study (BL1–BL4 use Gamma priors on $\lambda^2$, and BL5 uses a prior for $\lambda$ based on a Beta distribution; the densities in this figure are the corresponding densities for $\lambda$).


Table 2 shows the average (across 100 MC replicates) of posterior means of the residual variance, the regularization parameter, and the correlation between the true and the estimated quantity for several features (phenotypes, genomic values, and marker effects). $\mathrm{Cor}(\mathbf{y}, \hat{\mathbf{y}})$ is a goodness-of-fit measure, $\mathrm{Cor}(\mathbf{g}, \hat{\mathbf{g}})$ measures how well the model estimates genomic values, and $\mathrm{Cor}(\boldsymbol{\beta}, \hat{\boldsymbol{\beta}})$ evaluates how well the model estimates marker effects.


Posterior mean of the residual variance, $\sigma^2_{\varepsilon}$, of the regularization parameter, $\lambda$, and correlation between the true and estimated value for several items ($\mathbf{y}$, phenotypes; $\mathbf{g}$, true genomic values; $\boldsymbol{\beta}$, marker effects; all quantities averaged over 100 MC replicates)

The posterior mean and standard deviation of $\lambda$ were influenced by the prior (Table 2). The posterior mean was shrunk toward the prior mode, and the posterior standard deviation was larger for more dispersed priors (see Table 2 and Figure 4). These results suggest that there is not much information about $\lambda$ in the type of samples evaluated. On the other hand, model goodness of fit and the ability of the model to uncover signal were not affected markedly by the choice of prior. This suggests that, while it may be difficult to learn about $\lambda$ from data, inferences on quantities of interest (e.g., genetic values) may be robust with respect to values of $\lambda$ over a fairly wide range. For example, differences in $\mathrm{Cor}(\mathbf{g}, \hat{\mathbf{g}})$ or in $\mathrm{Cor}(\boldsymbol{\beta}, \hat{\boldsymbol{\beta}})$ were small when the prior was changed.

A relatively flat prior based on a Beta distribution (BL5) produced a more dispersed posterior distribution of $\lambda$, and mixing was not as good as when the sharper Gamma priors (BL1–BL4) were used. For example, the average (across MC replicates) effective sample sizes (e.g., Plummer et al. 2008) for the residual variance were 1468, 1155, 1091, 1138, and 578 for BL1–BL5, respectively.


In practice, the information set available for prediction of genomic values may include components other than genetic markers. For example, data may cluster into known contemporary groups (e.g., individuals may be measured under different experimental conditions), or a pedigree may be available in addition to genetic markers. It is natural to treat the various classes of predictors in a different way. From a penalized-likelihood point of view, this amounts to using penalty functions that are specific to each class of predictors. From a Bayesian standpoint, treating predictors differently may be achieved by assigning different priors. A straightforward extension of the BL is described next.

The data structure is denoted as $\{y_i, \mathbf{x}_{Fi}, \mathbf{x}_{Mi}, i\}$, where $y_i$ is the phenotype of subject $i$; $\mathbf{x}_{Fi}$ is a vector of covariates whose effects are treated as in a standard BR, with a normal prior and variance common to all regressions; $\mathbf{x}_{Mi}$ is a vector of marker covariates whose effects are assigned a double-exponential prior, as in BL; and $i$ is a label that allows tracking subjects in a pedigree. The equation for the data is

$$y_i = \mu + \mathbf{x}_{Fi}'\boldsymbol{\beta}_F + \mathbf{x}_{Mi}'\boldsymbol{\beta}_M + u_i + \varepsilon_i,$$

where $\mu$ is an intercept; $\boldsymbol{\beta}_F$ and $\boldsymbol{\beta}_M$ are regressions of $y_i$ on $\mathbf{x}_{Fi}$ and $\mathbf{x}_{Mi}$, respectively; $u_i$ is an infinitesimal genetic effect pertaining to individual $i$, for which the prior (co)variance structure is determined by a pedigree; and $\varepsilon_i$ is a model residual, assumed to be distributed identically and independently of other residuals. The likelihood function is

$$p(\mathbf{y} \mid \mu, \boldsymbol{\beta}_F, \boldsymbol{\beta}_M, \mathbf{u}, \sigma^2_{\varepsilon}) = \prod_{i=1}^{n} N\!\left( y_i \mid \mu + \mathbf{x}_{Fi}'\boldsymbol{\beta}_F + \mathbf{x}_{Mi}'\boldsymbol{\beta}_M + u_i, \sigma^2_{\varepsilon} \right). \tag{8}$$

Prior specification (4) is modified as

$$\begin{aligned}
p(\boldsymbol{\beta}_F, \boldsymbol{\beta}_M, \mathbf{u}, \boldsymbol{\tau}^2, \lambda, \sigma^2_F, \sigma^2_u, \sigma^2_{\varepsilon}) \propto{}& \prod_{k} N\!\left( \beta_{Fk} \mid 0, \sigma^2_F \right) \prod_{j} \left[ N\!\left( \beta_{Mj} \mid 0, \tau_j^2\sigma^2_{\varepsilon} \right) \mathrm{Exp}\!\left( \tau_j^2 \mid \lambda^2 \right) \right] N\!\left( \mathbf{u} \mid \mathbf{0}, \mathbf{A}\sigma^2_u \right) \\
& \times \chi^{-2}\!\left( \sigma^2_F \mid \mathrm{df}_F, S_F \right) \chi^{-2}\!\left( \sigma^2_u \mid \mathrm{df}_u, S_u \right) \chi^{-2}\!\left( \sigma^2_{\varepsilon} \mid \mathrm{df}_{\varepsilon}, S_{\varepsilon} \right) p(\lambda),
\end{aligned} \tag{9}$$

where $\sigma^2_F$, $\sigma^2_u$, and $\sigma^2_{\varepsilon}$ are the variances of $\boldsymbol{\beta}_F$, $\mathbf{u}$, and $\boldsymbol{\varepsilon}$, respectively; the $\mathrm{df}$'s and $S$'s are the prior degrees of freedom and scale parameters of the corresponding distributions; $\mathbf{A}$ is a (co)variance structure computed from the genealogy (for example, a numerator-relationship matrix); and $p(\lambda)$ is the prior on $\lambda$, which may be as in (4) or (7).

In the model defined by (8) and (9), all fully conditional distributions (except that of $\lambda$, if a nonconjugate prior such as (7) is chosen) have closed form, so a Gibbs sampler (with a Metropolis–Hastings step) can be used to draw samples from the joint posterior distribution (see File S1 and File S2). To distinguish the above model from the standard BL, we refer to it as Bayesian regression coupled with LASSO (BRL).


Two data sets were analyzed with the BRL model. The first set pertains to a collection of wheat lines (see File S1 and File S2); the second set contains information from a population of mice (publicly available online).

The wheat data set is from the Global Wheat program of the International Maize and Wheat Improvement Center (CIMMYT). This program conducted several international trials across a wide variety of environments. For this study, we took a subset of 599 wheat lines derived from 25 years of Elite Spring Wheat Yield Trials (ESWYT) conducted from 1979 through 2005. The environments represented in these trials were grouped into four macroenvironments. The phenotype considered here was average grain yield performance of the 599 wheat lines evaluated in one of the macroenvironments. An association mapping study based on a reduced number of these ESWYT trials is presented in Crossa et al. (2007).

The Browse application of the International Crop Information System (ICIS), as described in McLaren et al. (2005), was used for deriving the relationship matrix A between the 599 lines; it accounts for selection and inbreeding.

A total of 1447 Diversity Array Technology (DArT) markers were generated by Triticarte (Canberra, Australia). The DArT markers may take on two values, denoted by their presence or absence.

The mouse data come from an experiment carried out to detect and locate QTL for complex traits in a mouse population (Valdar et al. 2006a,b). These data have already been analyzed for comparing genome-assisted genetic evaluation methods (Legarra et al. 2008). The data file consists of 1884 individuals (168 full-sib families), each genotyped for 10,946 polymorphic markers. The trait analyzed here was body mass index (BMI), precorrected by body weight, season, month, and day. Mice were housed in 359 cages; on average, each litter was allocated to 2.84 cages.

Three models were fitted to each of the data sets: P (standing for pedigree) is a pedigree-based model where markers were not included; M is a model where the only genetic component is the regression on markers; P&M (standing for pedigree and markers) includes regressions on markers and an additive effect with (co)variance structure computed from the pedigree. For both data sets, phenotypes were standardized to have a sample variance equal to one, so that results are easily compared across data sets.

In the mouse data set, the elements of $\boldsymbol{\beta}_F$ were the effects of the cages where groups of mice were reared. In the wheat data set, the component $\mathbf{x}_{Fi}'\boldsymbol{\beta}_F$ was omitted because there was no such set of regressors.

Models were first fitted to the entire data set. Subsequently, a fivefold cross-validation (CV) was carried out with assignment of individuals into folds at random. The CV yields predictions of phenotypes, $\hat{y}_{if}$ ($f = 1, \ldots, 5$), obtained from a model in which all observations in the $f$th fold were excluded. The ability of each model to predict out-of-sample data was evaluated via the correlation between phenotypes and predictions from CV. Inferences for each fit were based on 70,000 samples (after 5000 were discarded as burn-in). Convergence was checked by inspection of trace plots and with estimates of effective sample size for (co)variance components computed using the coda package of R (Plummer et al. 2008). The hyperparameters of the prior distributions of the variance components and of $\lambda$ were chosen such that the prior on $\lambda$ was flat over a wide range of values.
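The fivefold CV layout described above can be sketched as follows. This Python/numpy sketch is ours and is only a skeleton: `fit_predict` stands in for whatever model (P, M, or P&M) is trained on the four retained folds, and the fold count matches the wheat data set's 599 lines purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(6)
n_obs, n_folds = 599, 5

# Random assignment of individuals to folds (sizes as equal as possible)
reps = -(-n_obs // n_folds)  # ceiling division
folds = rng.permutation(np.repeat(np.arange(n_folds), reps)[:n_obs])

def cv_correlation(y, fit_predict):
    """For each fold f, train on the remaining folds and predict the held-out
    phenotypes; return Cor(y, y_hat) computed over all observations.
    `fit_predict(train_mask, test_mask)` is a placeholder for model fitting."""
    y_hat = np.empty_like(y, dtype=float)
    for f in range(n_folds):
        test = folds == f
        y_hat[test] = fit_predict(~test, test)
    return np.corrcoef(y, y_hat)[0, 1]
```

Each observation is predicted exactly once, by a model that never saw it during fitting, and the single correlation over all held-out predictions is the criterion reported in the tables below.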


Table 3 shows summaries of the posterior distributions of the variance components and of $\lambda$ by model and data set. In both populations, a moderate reduction in the posterior mean of the residual variance was observed when the P&M model was fitted, relative to P. Using model P in the wheat population gave a posterior mean of the heritability of grain yield of 0.34, while in the mouse population the posterior mean of the heritability of body mass index was 0.11. These results are in agreement with previous reports for these traits and populations (Valdar et al. 2006b and Legarra et al. 2008 for the mouse data and Crossa et al. 2007 for the wheat data). The inclusion of markers (P&M) reduced the estimate of the variance of the infinitesimal additive effect, relative to P. This happens because, in P&M, part of the infinitesimal additive effect is captured by the regression on markers (e.g., Habier et al. 2007; Bink et al. 2008). In the models for body mass index, the variance between cages was reduced only slightly when the effects of the markers were fitted.


Posterior means (standard deviations) of variance components for yield in wheat and body mass index in mice, and of $\lambda$, for each of the models, by data set

Figure 5 gives absolute values of the posterior means of marker effects. In the mouse data there are several regions showing groups of markers with relatively large estimated effects. This is not evident in the wheat data set where fewer markers were available.

Figure 5.—

Absolute values of the posterior means of effects of allele substitution in a model including markers and pedigree information (P&M), by data set.

From a breeder's perspective, a relevant question is whether or not the P, M, and P&M models lead to different ranking of individuals on the basis of the estimated genetic values. Table 4 shows the rank (Spearman) correlation of estimated genetic values. As expected, these correlations were high, but not perfect. The correlation between predicted genetic values from M and P&M was larger than that of the estimates from P and P&M, suggesting that the inclusion of markers in the model is probably critical. This was clearer in the mouse data set, where (a) the extent of additive relationships was not as strong as in the wheat population and (b) a much larger number of markers were available.


Rank correlation (Spearman) between genetic values estimated from models including different sources of genetic information (pedigree, markers, and pedigree and markers), by data set (mouse data set above diagonal, wheat data set below diagonal)

Figure 6 shows scatter plots of predicted genomic values in P and P&M for both data sets. Although the correlation between genetic values estimated from different models was high, using P and P&M would lead to different sets of selected individuals. The difference was more marked in the mouse data set, illustrating that the impact of considering markers in breeding decisions depends on the data structure and on how informative the pedigree and markers are. Also, the dispersion of predicted genetic values was larger when markers were fitted, and this is consistent with the smaller posterior mean of the residual variance observed for P&M (Table 3). An interpretation of this result is that, in certain contexts, markers may help to uncover genetic variance that would not be captured if only pedigree-based predictions were used.

Figure 6.—

Predicted genetic value using markers and pedigree (P&M) vs. using pedigree only (P), by data set.

The aforementioned results indicate that incorporation of markers into a genetic model can influence inferences and breeding decisions. In contrast, cross-validation allows comparing models from the standpoint of their ability to predict future outcomes. Table 5 shows the correlation between phenotypic records and predictions from cross-validation. Two CV correlations were considered:

  1. $r_{y,\hat{y}}$, the correlation between phenotypic records and their prediction from CV; that is, $r_{y,\hat{y}} = \mathrm{Cor}(y_i, \hat{y}_i)$, where $\hat{y}_i = \hat{\mu}_i + \hat{u}_i$ in P, $\hat{y}_i = \hat{\mu}_i + \hat{g}_i$ in M, and $\hat{y}_i = \hat{\mu}_i + \hat{g}_i + \hat{u}_i$ in P&M. Here, $\hat{\mu}_i$ is the CV estimate of nongenetic effects, and $\hat{u}_i$ and $\hat{g}_i$ are the pedigree-based and marker-based CV estimates of genetic value, respectively.

  2. $r_{\tilde{y},\hat{a}}$, the correlation between the CV estimate of the genetic value, $\hat{a}_i$ (pedigree-based, marker-based, or both, depending on the model), and phenotypic records adjusted with CV estimates of nongenetic effects, $\tilde{y}_i = y_i - \hat{\mu}_i$. In the wheat population the adjustment involves only the overall mean; in the mouse data set it also involves the nongenetic covariates included in the model.
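A sketch of how these two CV correlations can be computed (all data and notation hypothetical; a single binary covariate stands in for the nongenetic effects, and simulated noisy genetic values stand in for CV predictions):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 300

# Hypothetical quantities: g_true is the true genetic value, sex a
# nongenetic binary covariate, y the phenotypic record.
g_true = rng.normal(size=n)
sex = rng.integers(0, 2, size=n)
y = 2.0 + 1.5 * sex + g_true + rng.normal(size=n)

# Stand-ins for cross-validation estimates: nongenetic effects estimated
# by group means, genetic values predicted with error.
mu_hat = np.where(sex == 1, y[sex == 1].mean(), y[sex == 0].mean())
g_hat = g_true + rng.normal(scale=0.5, size=n)

# (1) correlation between records and their full prediction
r1 = np.corrcoef(y, mu_hat + g_hat)[0, 1]

# (2) correlation between adjusted records and predicted genetic values
r2 = np.corrcoef(y - mu_hat, g_hat)[0, 1]
print(r1, r2)
```

Because the first correlation credits the model for predicting nongenetic effects as well, it is typically larger than the second, which isolates the genetic part of the prediction.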


Table 5.—Rank correlation (Spearman) between phenotypic values or corrected phenotypic records and predictions from cross-validation, by population and model (P, pedigree-based model; M, marker-based model; P&M, pedigree and marker information)

Overall, P&M models had better predictive ability than models based on pedigrees or markers only. In the wheat data set, the relative increases in correlation observed when markers were included in the model were 13% for the correlation with phenotypic records and 42% for the correlation with adjusted records. In the mouse data set the relative increases were 14 and 100%, respectively. We conclude that there are sizable benefits from using markers in breeding decisions and that the relative size of this contribution depends on the data structure and on how informative the pedigree and the set of markers are.


Additive models with infinitesimal effects are ubiquitous in animal and plant breeding. For many decades, predictions of genetic values have been made using phenotypic records and pedigrees, i.e., some sort of family-based evaluation. Markers capture Mendelian segregation and may enhance prediction of genomic values, independently of the mode of gene action.

With highly dense markers, marker-specific shrinkage may be needed. Priors on marker effects based on mixtures of scaled-Gaussian distributions allow this type of shrinkage and constitute a promising tool for genomic additive models. This family includes, among others, the t and the double-exponential (DE) distributions. Models with marginal priors from the t family have been proposed for marker-based regressions (e.g., Meuwissen et al. 2001).

If the hypothesis that most markers do not have any effect holds, a DE prior may be a better choice than the t. For this reason, the Bayesian LASSO appears to be an interesting alternative for performing regressions on markers, at least under an additive model.

Our results indicate that in the type of samples that are relevant for genomic selection (i.e., $p \gg n$, with many more markers than records) the choice of prior for the regularization parameter $\lambda$ matters in terms of inferences about this unknown. However, estimates of genetic values and of marker effects may be robust with respect to the choice of prior over a wide range. To circumvent the potential influence of the prior, we proposed an alternative formulation of the BL in which the prior on $\lambda$ is specified via a Beta distribution. Unlike the Gamma prior, this prior allows vague prior preferences to be expressed over a wide range of values of $\lambda$.
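The contrast between the two prior specifications can be sketched numerically. All hyperparameter values below are ours, chosen only for illustration, and we assume the Beta prior is placed on $\lambda/\lambda_{\max}$ for a user-chosen upper bound $\lambda_{\max}$:

```python
import numpy as np
from scipy import stats

lam_max = 200.0      # hypothetical upper bound for lambda
a1, a2 = 1.1, 1.1    # near-uniform Beta => vague over (0, lam_max)

def beta_prior_density(lam):
    # Change of variable: p(lam) = Beta(lam/lam_max; a1, a2) / lam_max
    return stats.beta.pdf(lam / lam_max, a1, a2) / lam_max

# A Gamma prior on lam^2 (shape r, rate d), as often used for the BL
r, d = 0.5, 0.01     # hypothetical hyperparameters
def gamma_induced_density(lam):
    # Induced density on lam: p(lam) = p_Gamma(lam^2) * |d(lam^2)/d(lam)|
    return stats.gamma.pdf(lam**2, r, scale=1.0 / d) * 2.0 * lam

grid = np.linspace(1.0, 199.0, 100)
beta_vals = beta_prior_density(grid)
gamma_vals = gamma_induced_density(grid)

# Ratio of largest to smallest prior density over the grid: the Beta prior
# is nearly flat, while the Gamma-induced prior varies by many orders of
# magnitude, strongly down-weighting large values of lambda.
print(beta_vals.max() / beta_vals.min())
print(gamma_vals.max() / gamma_vals.min())
```

With these (illustrative) settings, the Beta specification spreads prior mass almost evenly over the whole interval, whereas the Gamma specification concentrates it near small $\lambda$, which is one way a Gamma prior can dominate inferences when the likelihood is weakly informative about $\lambda$.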

Two data analyses carried out with the proposed model showed that (a) markers may capture fractions of additive variance that would be lost if pedigrees were the only source of genetic information used, (b) considering markers has a sizable impact on selection decisions, and (c) models including marker and pedigree information had better predictive ability than pedigree-based or marker-based models.


We greatly appreciate the suggestions of two anonymous reviewers and of the associate editor. The Wellcome Trust Centre for Human Genetics, Oxford, is gratefully acknowledged for making the mouse data available. Vivi Arief from the School of Land Crop and Food Sciences of the University of Queensland, Australia, is thanked for assembling the historical wheat phenotypic and molecular marker data and for computing additive relationships between wheat lines. Financial support from the Wisconsin Agriculture Experiment Station, NSF grant DMS-044371, and the Chaire d'Excellence Pierre de Fermat program of the Midi-Pyrénées Region, France, is acknowledged.


  • Received February 5, 2009.
  • Accepted March 7, 2009.

