
Originally published as Genetics Published Articles Ahead of Print on August 24, 2007.

Genetics, Vol. 177, 1255-1258, October 2007, Copyright © 2007
doi:10.1534/genetics.107.077487


Derivation of the Shrinkage Estimates of Quantitative Trait Locus Effects

Shizhong Xu¹

Department of Botany and Plant Sciences, University of California, Riverside, California 92521

¹ Address for correspondence: 900 University Ave., University of California, Riverside, CA 92521-0124.
E-mail: xu@genetics.ucr.edu

Manuscript received June 12, 2007. Accepted for publication August 3, 2007.


    ABSTRACT
 
The shrinkage estimate of a quantitative trait locus (QTL) effect is the posterior mean of the QTL effect when a normal prior distribution is assigned to the QTL. This note gives the derivation of the shrinkage estimate under the multivariate linear model. An important lemma regarding the posterior mean of a normal likelihood combined with a normal prior is introduced. The lemma is then used to derive the Bayesian shrinkage estimates of the QTL effects.


The Bayesian shrinkage estimation of quantitative trait locus (QTL) effects was first introduced by XU (2003) and later formalized by WANG et al. (2005). A multivariate version was recently developed by YANG and XU (2007). The main purpose of shrinkage estimation is to avoid variable selection when mapping multiple QTL. Once a normal prior distribution is assigned to each regression coefficient in the QTL mapping program, the method can handle substantially more QTL effects than the classical maximum-likelihood (ML) method. In addition, the shrinkage method produces much clearer QTL signals along the genome than the ML method. As a result, shrinkage mapping appears to point to a new direction for future research in QTL mapping.

The key issue of shrinkage estimation is the normal prior distribution assigned to the regression coefficient (QTL effect). More importantly, different regression coefficients are assigned different normal priors. Because the variances in the prior distributions determine the degrees of shrinkage, assigning different prior variances to different regression coefficients allows the method to differentially shrink regression coefficients. A smaller prior variance will cause the regression coefficient to shrink more while a larger prior variance will lead to less shrinkage. This phenomenon is called selective shrinkage.

After incorporating the normal prior distribution into the likelihood function, we can derive the posterior distribution of the regression coefficient, which remains normal because the normal prior is conjugate. The posterior mean and posterior variance are used to generate a posterior sample of the regression coefficient. The formulas for the posterior mean and posterior variance are mathematically attractive (see XU 2003; WANG et al. 2005; YANG and XU 2007). However, because of page limitations, their derivation was not provided in these articles.

The derivation of the univariate shrinkage estimate closely follows BOX and TIAO's (1973, Appendix A1.1) combination of a univariate normal likelihood and a univariate normal prior. The derivation of the multivariate shrinkage estimate follows the general Bayesian linear model of LINDLEY and SMITH (1972) and the best linear unbiased prediction (BLUP) of ROBINSON (1991). The derivations presented by these authors were written for statisticians and are often difficult for readers in the genetics community to follow. I regularly receive e-mails and calls from readers asking for the derivation. These readers (almost all genetics professionals and students) are often interested in extending the shrinkage method to QTL mapping in different mapping populations, and understanding the derivation of these formulas is crucial to developing new shrinkage methods. Simply pointing them to the above references rarely helps, because intermediate steps are needed to arrive at the shrinkage estimate presented by XU (2003), and doing so can give an impression of irresponsibility. Therefore, I prepared a short note on the derivation and distributed it to interested readers. The note summarizes the derivation in language that is easy to understand for geneticists with basic statistical training. Given the increasing interest in the derivation from the QTL mapping community, it is more efficient to publish the note in GENETICS, where the very first shrinkage method (XU 2003) was published.


    THEORY AND MODEL
 
Shrinkage estimates:
Let $y_j$ be an $m \times 1$ vector for the phenotypic values of m traits collected from the jth individual for $j = 1, \ldots, n$, where n is the sample size. This vector is described by the following linear model,

$$y_j = b_0 + \sum_{k=1}^{p} Z_{jk} b_k + e_j \tag{1}$$

where $b_0$ is an $m \times 1$ vector for the population means (or intercept), $Z_{jk}$ is an $m \times q$ design matrix (determined by the genotypes of the jth individual at the kth locus), $b_k$ is a $q \times 1$ vector for the regression coefficients (QTL effects) for locus k ($k = 1, \ldots, p$), $e_j$ is an $m \times 1$ vector of residual errors with an assumed $N(0, D)$ distribution, and D is an $m \times m$ positive definite covariance matrix. When the kth regression coefficient is considered, all other regression coefficients are treated as constants and thus model (1) can be rewritten as

$$\Delta y_j = Z_{jk} b_k + e_j \tag{2}$$

where

$$\Delta y_j = y_j - b_0 - \sum_{k' \neq k} Z_{jk'} b_{k'} \tag{3}$$

is the phenotypic value adjusted by all other regression coefficients that are not currently under consideration. Let us describe $b_k$ by the following normal prior, $p(b_k) = N(b_k \mid \mu_k, \Sigma_k)$, where $\mu_k$ is a $q \times 1$ vector for the means and $\Sigma_k$ is a $q \times q$ prior variance–covariance matrix. The posterior distribution of $b_k$ is multivariate normal with mean

$$E(b_k \mid \text{data}) = \left( \sum_{j=1}^{n} Z_{jk}^{T} D^{-1} Z_{jk} + \Sigma_k^{-1} \right)^{-1} \left( \sum_{j=1}^{n} Z_{jk}^{T} D^{-1} \Delta y_j + \Sigma_k^{-1} \mu_k \right) \tag{4}$$

and variance–covariance matrix

$$\mathrm{var}(b_k \mid \text{data}) = \left( \sum_{j=1}^{n} Z_{jk}^{T} D^{-1} Z_{jk} + \Sigma_k^{-1} \right)^{-1} \tag{5}$$

In shrinkage analysis, we often set $\mu_k = 0$ for $k = 1, \ldots, p$; as such, the posterior mean becomes

$$E(b_k \mid \text{data}) = \left( \sum_{j=1}^{n} Z_{jk}^{T} D^{-1} Z_{jk} + \Sigma_k^{-1} \right)^{-1} \sum_{j=1}^{n} Z_{jk}^{T} D^{-1} \Delta y_j \tag{6}$$

This posterior mean is called the shrinkage estimate of the regression coefficient $b_k$. When $\Sigma_k^{-1} \to 0$, the prior is flat, leading to the usual least-squares estimate,

$$\hat{b}_k = \left( \sum_{j=1}^{n} Z_{jk}^{T} D^{-1} Z_{jk} \right)^{-1} \sum_{j=1}^{n} Z_{jk}^{T} D^{-1} \Delta y_j \tag{7}$$

When $\Sigma_k \to 0$, we have $\Sigma_k^{-1} \to \infty$, which leads to $\left( \sum_{j=1}^{n} Z_{jk}^{T} D^{-1} Z_{jk} + \Sigma_k^{-1} \right)^{-1} \to 0$ and thus $E(b_k \mid \text{data}) \to 0$, an estimate shrunken to zero. Therefore, matrix $\Sigma_k$ serves as a factor to determine the degree of shrinkage for the estimate of $b_k$. Because $\Sigma_k$ varies across loci, the degree of shrinkage also varies across k. To prove the shrinkage estimate, I first introduce the following lemma:

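LEMMA. Assume that parameter b can be inferred from two independent sources of information. Let $p_1(b) = N(b \mid \mu_1, \Sigma_1)$ and $p_2(b) = N(b \mid \mu_2, \Sigma_2)$ be the distributions of the two sources of information. When we combine $p_1(b)$ and $p_2(b)$, the distribution of b remains multivariate normal, $N(b \mid \mu, \Sigma)$, with mean $\mu = (\Sigma_1^{-1} + \Sigma_2^{-1})^{-1} (\Sigma_1^{-1} \mu_1 + \Sigma_2^{-1} \mu_2)$ and variance–covariance matrix $\Sigma = (\Sigma_1^{-1} + \Sigma_2^{-1})^{-1}$.

Proof of the lemma. The distribution of b given the two sources of information is described by

$$p(b) = C \exp\left\{ -\tfrac{1}{2} (b - \mu_1)^{T} \Sigma_1^{-1} (b - \mu_1) - \tfrac{1}{2} (b - \mu_2)^{T} \Sigma_2^{-1} (b - \mu_2) \right\} \tag{8}$$

where C is a constant with respect to b. When deriving a distribution, we are interested only in the kernel of the distribution. A kernel of a distribution is the central part of the distribution function, the part that remains when constants are disregarded. In the above distribution, the logarithm of the kernel is

$$-\tfrac{1}{2} (b - \mu_1)^{T} \Sigma_1^{-1} (b - \mu_1) - \tfrac{1}{2} (b - \mu_2)^{T} \Sigma_2^{-1} (b - \mu_2) \tag{9}$$

which is further expressed by

$$-\tfrac{1}{2} b^{T} (\Sigma_1^{-1} + \Sigma_2^{-1}) b + b^{T} (\Sigma_1^{-1} \mu_1 + \Sigma_2^{-1} \mu_2) - \tfrac{1}{2} (\mu_1^{T} \Sigma_1^{-1} \mu_1 + \mu_2^{T} \Sigma_2^{-1} \mu_2) \tag{10}$$

We can see that this kernel involves another constant, $-\tfrac{1}{2} (\mu_1^{T} \Sigma_1^{-1} \mu_1 + \mu_2^{T} \Sigma_2^{-1} \mu_2)$, which can be ignored also. Therefore, the actual kernel that contains only the linear and quadratic functions of b is

$$-\tfrac{1}{2} b^{T} (\Sigma_1^{-1} + \Sigma_2^{-1}) b + b^{T} (\Sigma_1^{-1} \mu_1 + \Sigma_2^{-1} \mu_2) \tag{11}$$

Let $\Sigma = (\Sigma_1^{-1} + \Sigma_2^{-1})^{-1}$ and $\mu = \Sigma (\Sigma_1^{-1} \mu_1 + \Sigma_2^{-1} \mu_2)$. The kernel is simplified into

$$-\tfrac{1}{2} b^{T} \Sigma^{-1} b + b^{T} \Sigma^{-1} \mu \tag{12}$$

which turns out to be the kernel of $N(b \mid \mu, \Sigma)$, because $-\tfrac{1}{2} (b - \mu)^{T} \Sigma^{-1} (b - \mu)$ expands to Equation 12 plus a term that does not involve b. Therefore, we conclude that $p(b) = N(b \mid \mu, \Sigma)$.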
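The lemma can be checked numerically in the scalar case. The sketch below is only an illustration, not part of the original derivation: the function names and the input values are hypothetical. It combines two normal sources by adding their precisions and taking the precision-weighted mean, and then confirms that the sum of the two log densities differs from the combined log density only by a constant that does not depend on b.

```python
import math


def combine_two_normals(mu1, var1, mu2, var2):
    """Combine two independent normal sources of information about a scalar b.

    Returns the mean and variance of the combined normal distribution,
    i.e. the scalar version of the lemma.
    """
    precision = 1.0 / var1 + 1.0 / var2      # precisions (inverse variances) add
    var = 1.0 / precision
    mu = var * (mu1 / var1 + mu2 / var2)      # precision-weighted average of the means
    return mu, var


def log_normal_density(b, mu, var):
    """Log density of N(mu, var) evaluated at b."""
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (b - mu) ** 2 / var


# Hypothetical inputs chosen only for illustration.
mu, var = combine_two_normals(mu1=1.0, var1=4.0, mu2=3.0, var2=1.0)

# If the lemma holds, log p1(b) + log p2(b) differs from log N(b | mu, var)
# only by a constant that does not depend on b.
gaps = [
    log_normal_density(b, 1.0, 4.0)
    + log_normal_density(b, 3.0, 1.0)
    - log_normal_density(b, mu, var)
    for b in (-2.0, 0.0, 1.5, 5.0)
]
print(mu, var, max(gaps) - min(gaps))
```

The last printed number is the spread of the gap across the four test values of b; it should be zero up to rounding error.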

Derivation of the shrinkage estimates:
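We now use the above lemma to derive the shrinkage estimate of $b_k$. The two sources of information for $b_k$ come from the data ($\Delta y_1, \ldots, \Delta y_n$) and the prior. Information from the data is used to infer $b_k$ through the maximum-likelihood method. The log-likelihood function is

$$L(b_k) = -\tfrac{1}{2} \sum_{j=1}^{n} (\Delta y_j - Z_{jk} b_k)^{T} D^{-1} (\Delta y_j - Z_{jk} b_k) + \text{constant} \tag{13}$$

The maximum-likelihood estimate of $b_k$ is

$$\hat{b}_k = \left( \sum_{j=1}^{n} Z_{jk}^{T} D^{-1} Z_{jk} \right)^{-1} \sum_{j=1}^{n} Z_{jk}^{T} D^{-1} \Delta y_j \tag{14}$$

and the variance of this estimate is

$$\mathrm{var}(\hat{b}_k) = \left( \sum_{j=1}^{n} Z_{jk}^{T} D^{-1} Z_{jk} \right)^{-1} \tag{15}$$

Let $\mu_1 = \hat{b}_k$ and $\Sigma_1 = \mathrm{var}(\hat{b}_k)$. After some algebraic manipulation on the likelihood function, we find that Equation 13 has the following normal kernel with respect to $b_k$:

$$-\tfrac{1}{2} (b_k - \mu_1)^{T} \Sigma_1^{-1} (b_k - \mu_1) \tag{16}$$

Therefore, the distribution of $b_k$ inferred from the data is $N(b_k \mid \mu_1, \Sigma_1)$. The second source of information for $b_k$ is the prior distribution $N(b_k \mid \mu_k, \Sigma_k)$. If we let $\mu_2 = \mu_k$ and $\Sigma_2 = \Sigma_k$, the distribution of $b_k$ from the second source of information is $N(b_k \mid \mu_2, \Sigma_2)$. According to the lemma, the posterior mean of $b_k$ is

$$E(b_k \mid \text{data}) = (\Sigma_1^{-1} + \Sigma_2^{-1})^{-1} (\Sigma_1^{-1} \mu_1 + \Sigma_2^{-1} \mu_2) = \left( \sum_{j=1}^{n} Z_{jk}^{T} D^{-1} Z_{jk} + \Sigma_k^{-1} \right)^{-1} \left( \sum_{j=1}^{n} Z_{jk}^{T} D^{-1} \Delta y_j + \Sigma_k^{-1} \mu_k \right) \tag{17}$$

and the posterior variance is

$$\mathrm{var}(b_k \mid \text{data}) = (\Sigma_1^{-1} + \Sigma_2^{-1})^{-1} = \left( \sum_{j=1}^{n} Z_{jk}^{T} D^{-1} Z_{jk} + \Sigma_k^{-1} \right)^{-1} \tag{18}$$

This concludes the derivation of the shrinkage estimate of $b_k$.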
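The two routes to the posterior can be compared numerically. The sketch below is a hypothetical illustration under simplifying assumptions that are mine, not the article's: two traits (m = 2) and a single effect per locus, so that the design matrix for each individual is a vector of length 2 and every quantity attached to the effect is a scalar. It computes the posterior mean and variance once directly from Equations 4 and 5 and once by combining the maximum-likelihood estimate with the prior through the lemma. All names and data values are made up.

```python
def inverse_2x2(D):
    """Inverse of a 2 x 2 matrix stored as nested lists."""
    (a, b), (c, d) = D
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def bilinear(u, M, v):
    """Scalar u' M v for length-2 vectors u, v and a 2 x 2 matrix M."""
    return sum(u[r] * M[r][c] * v[c] for r in range(2) for c in range(2))


def data_information(Z, dy, D):
    """Return sum_j Z_j' D^-1 Z_j and sum_j Z_j' D^-1 dy_j for one locus."""
    D_inv = inverse_2x2(D)
    zdz = sum(bilinear(z, D_inv, z) for z in Z)
    zdy = sum(bilinear(z, D_inv, y) for z, y in zip(Z, dy))
    return zdz, zdy


def posterior_direct(Z, dy, D, prior_mean, prior_var):
    """Posterior mean and variance from Equations 4 and 5 (scalar effect)."""
    zdz, zdy = data_information(Z, dy, D)
    post_var = 1.0 / (zdz + 1.0 / prior_var)
    return post_var * (zdy + prior_mean / prior_var), post_var


def posterior_via_lemma(Z, dy, D, prior_mean, prior_var):
    """Same posterior, obtained by combining the ML estimate with the prior."""
    zdz, zdy = data_information(Z, dy, D)
    var1 = 1.0 / zdz          # variance of the ML estimate (Equation 15)
    mu1 = var1 * zdy          # ML estimate (Equation 14)
    post_var = 1.0 / (1.0 / var1 + 1.0 / prior_var)
    return post_var * (mu1 / var1 + prior_mean / prior_var), post_var


# Hypothetical data: four individuals, two traits, one effect at the locus.
Z = [[1.0, 1.0], [-1.0, -1.0], [1.0, 0.5], [-1.0, 0.5]]
dy = [[0.9, 1.2], [-1.1, -0.7], [1.3, 0.4], [-0.8, 0.6]]
D = [[1.0, 0.3], [0.3, 1.0]]

print(posterior_direct(Z, dy, D, prior_mean=0.0, prior_var=0.5))
print(posterior_via_lemma(Z, dy, D, prior_mean=0.0, prior_var=0.5))
```

The two printed pairs should agree up to rounding error. Passing a very large `prior_var` should reproduce the maximum-likelihood estimate, and a very small one should drive the posterior mean toward zero.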

Univariate version of the shrinkage estimate:
The shrinkage estimate of the regression coefficient given by XU (2003) is a special case of the general shrinkage estimate. The regression model of XU (2003) is

$$y_j = b_0 + \sum_{k=1}^{p} z_{jk} b_k + e_j \tag{19}$$

where every variable in the equation is a scalar rather than a matrix. When focused on the kth regression coefficient, the model is rewritten as

$$\Delta y_j = z_{jk} b_k + e_j \tag{20}$$

where $\Delta y_j = y_j - b_0 - \sum_{k' \neq k} z_{jk'} b_{k'}$ is the adjusted data. Let us assume $e_j \sim N(0, \sigma_0^2)$, where $\sigma_0^2$ is the univariate version of matrix D. Assume that the prior distribution for $b_k$ is $N(b_k \mid 0, \sigma_k^2)$. Therefore, the univariate versions of $\mu_k$ and $\Sigma_k$ are $0$ and $\sigma_k^2$, respectively. Substituting all the parameters of Equations 4 and 5 by their univariate counterparts, we have

$$E(b_k \mid \text{data}) = \left( \sum_{j=1}^{n} z_{jk}^2 + \frac{\sigma_0^2}{\sigma_k^2} \right)^{-1} \sum_{j=1}^{n} z_{jk} \Delta y_j \tag{21}$$

and

$$\mathrm{var}(b_k \mid \text{data}) = \left( \sum_{j=1}^{n} z_{jk}^2 + \frac{\sigma_0^2}{\sigma_k^2} \right)^{-1} \sigma_0^2 \tag{22}$$

These equations are exactly the same as Equations 5 and 6 given by XU (2003).
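A small numerical illustration may help show how the prior variance controls the degree of shrinkage in the univariate case. The sketch below is hypothetical (the function name, the +1/-1 genotype coding, and the data values are assumptions made for illustration). It evaluates the univariate posterior mean and variance, in which the ratio of the residual variance to the prior variance is added to the sum of squared genotype indicators, for a small, a moderate, and a large prior variance.

```python
def univariate_shrinkage(z, dy, resid_var, prior_var):
    """Posterior mean and variance of a scalar QTL effect (Equations 21 and 22).

    z         : genotype indicators z_jk for one locus, one per individual
    dy        : adjusted phenotypes for the same individuals
    resid_var : residual variance
    prior_var : prior variance of the effect at this locus
    """
    szz = sum(zj * zj for zj in z)
    szy = sum(zj * yj for zj, yj in zip(z, dy))
    denom = szz + resid_var / prior_var       # the variance ratio controls the shrinkage
    return szy / denom, resid_var / denom


# Hypothetical data: four individuals with genotypes coded as +1 / -1.
z = [1.0, -1.0, 1.0, -1.0]
dy = [2.0, -1.0, 1.5, -1.5]

least_squares = sum(a * b for a, b in zip(z, dy)) / sum(a * a for a in z)
for prior_var in (0.01, 1.0, 100.0):
    post_mean, post_var = univariate_shrinkage(z, dy, resid_var=1.0, prior_var=prior_var)
    print(prior_var, post_mean, post_var)
print("least squares:", least_squares)
```

With these inputs the estimate under the smallest prior variance is pulled almost to zero, while the estimate under the largest prior variance is close to the least-squares value, which is the selective shrinkage behavior described in the text.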


    DISCUSSION
 
There are several alternative ways to derive the shrinkage estimate, such as using the conditional distribution of multivariate normal variables (GIRI 1996). The method presented in this note is a generalization of BOX and TIAO's (1973, Appendix A1.1) combination of a univariate normal likelihood and a univariate normal prior. Using the method of BOX and TIAO (1973), we can extend the lemma to the situation of inferring b from more than two independent sources of information. Let m be the number of sources of information (independent of each other) used to infer b, and let the distribution from the ith source be $N(b \mid \mu_i, \Sigma_i)$ for $i = 1, \ldots, m$. The posterior distribution of b combining all the sources of information is $N(b \mid \mu, \Sigma)$, where

$$\mu = \left( \sum_{i=1}^{m} \Sigma_i^{-1} \right)^{-1} \sum_{i=1}^{m} \Sigma_i^{-1} \mu_i \tag{23}$$

and

$$\Sigma = \left( \sum_{i=1}^{m} \Sigma_i^{-1} \right)^{-1} \tag{24}$$

One can use mathematical induction to prove Equations 23 and 24, starting from $m = 2$ (given in the lemma) and moving to $m = 3$ and so on.
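The induction argument can be mirrored numerically in the scalar case. In the hypothetical sketch below (function names and input values are assumptions for illustration), one function combines all sources at once by summing precisions and precision-weighted means, and the other applies the two-source lemma repeatedly, folding in one source at a time. If the extension holds, both should give the same answer.

```python
def combine_all_at_once(sources):
    """Combine any number of scalar normal sources given as (mean, variance) pairs."""
    precision = sum(1.0 / v for _, v in sources)
    var = 1.0 / precision
    mu = var * sum(m / v for m, v in sources)
    return mu, var


def combine_one_at_a_time(sources):
    """Apply the two-source lemma repeatedly, as in the induction argument."""
    mu, var = sources[0]
    for m, v in sources[1:]:
        new_var = 1.0 / (1.0 / var + 1.0 / v)
        mu = new_var * (mu / var + m / v)
        var = new_var
    return mu, var


# Three hypothetical sources of information about the same scalar parameter.
sources = [(1.0, 4.0), (3.0, 1.0), (0.0, 2.0)]
print(combine_all_at_once(sources))
print(combine_one_at_a_time(sources))
```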

Bayesian shrinkage estimation refers to the biased estimation of a regression coefficient toward zero, using a prior variance as a factor to control the degree of shrinkage. A normal prior is often selected because it is conjugate, so the posterior distribution remains normal. A normal posterior simplifies the MCMC sampling process because the Gibbs sampler can be used to draw the regression coefficient. Other prior distributions have been proposed, e.g., the mixture prior of two normal distributions (GEORGE and MCCULLOCH 1993; YI et al. 2003) and the spike and slab model (ISHWARAN and RAO 2005). A t-distribution may also be used as a prior for the regression coefficient. However, the posterior distribution under a nonnormal prior rarely has an explicit distributional form, making Gibbs sampling impossible and thus complicating the MCMC sampling process.
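To make the Gibbs step concrete, the following is a minimal sketch of one sweep over the regression coefficients in the univariate model, assuming a zero prior mean and treating the intercept, the residual variance, and the prior variances as fixed. The function and argument names are hypothetical, and a full sampler would also update those other parameters. Each effect is redrawn from its normal conditional posterior after adjusting the phenotypes for all other loci.

```python
import random


def gibbs_update_effects(y, Z, b0, b, resid_var, prior_vars, rng):
    """One Gibbs sweep over the QTL effects in the univariate model.

    Each b[k] is redrawn from its normal conditional posterior (prior mean 0),
    holding the intercept, the variances, and all other effects fixed.
    """
    n, p = len(y), len(b)
    for k in range(p):
        # Phenotypes adjusted for the intercept and all other loci.
        dy = [
            y[j] - b0 - sum(Z[j][l] * b[l] for l in range(p) if l != k)
            for j in range(n)
        ]
        szz = sum(Z[j][k] ** 2 for j in range(n))
        szy = sum(Z[j][k] * dy[j] for j in range(n))
        denom = szz + resid_var / prior_vars[k]
        post_mean, post_var = szy / denom, resid_var / denom
        b[k] = rng.gauss(post_mean, post_var ** 0.5)   # gauss takes a standard deviation
    return b


# Hypothetical single-locus data, so successive draws are independent.
y = [2.0, -1.0, 1.5, -1.5]
Z = [[1.0], [-1.0], [1.0], [-1.0]]
rng = random.Random(1)                                 # fixed seed for reproducibility
draws = [gibbs_update_effects(y, Z, 0.0, [0.0], 1.0, [1.0], rng)[0] for _ in range(20000)]
print(sum(draws) / len(draws))   # should be close to the analytic posterior mean
```

With a single locus the analytic posterior mean is available in closed form, so the average of the draws offers a simple check on the update.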

The shrinkage method for regression analysis may also be called the random model approach to regression analysis, or simply random regression, because each regression coefficient is treated as a random effect with a (prior) normal distribution. It is well known that there is no limit on the number of random effects that can be handled by a random model. The success of a random linear model analysis, however, depends on the variance components chosen for the random model. If a random model contains an excessively large number of regression coefficients, most of them will be zero or close to zero. The sparse nature of the regression coefficients cannot be characterized by the random linear model alone; it must be accompanied by an efficient method to choose the variance components. In QTL mapping, the number of variance components can be extremely large, making subjective selection of the variance components impossible. Therefore, the variance components must be estimated from the data.

The most convenient way to estimate the variance components is to use the maximum-likelihood method. The estimated variance components are used in place of the prior variances to estimate the regression coefficients. As far as the estimation of regression coefficients is concerned, this is called the empirical Bayes method (XU 2007). To reflect the sparse nature of the regression coefficients, a prior distribution is often assigned to each variance component. This is called hierarchical modeling (GELMAN 2005). Furthermore, the prior distribution should be highly concentrated around zero. Many different prior distributions can be chosen for the variance components, but the scaled inverse chi-square distribution is the most convenient and flexible prior with such a property (LINDLEY and SMITH 1972). The exponential distribution (TIBSHIRANI 1996) and the half-t distribution (GELMAN 2006) have also been used. The choice of prior for the variance components in random regression analysis is a very active research area, and more efficient priors may be developed in the future.
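As an illustration of why the scaled inverse chi-square prior is convenient, the sketch below draws the prior variance of a single effect from its conditional posterior. It relies on the standard conjugacy result for a zero-mean normal effect with a scaled inverse chi-square prior on its variance. The hyperparameter names `nu` (degrees of freedom) and `tau_sq` (scale), the function name, and the input values are assumptions made for this example, not notation from the cited articles.

```python
import random


def draw_prior_variance(b_k, nu, tau_sq, rng):
    """Draw the variance of one effect given its current value b_k.

    Assumes b_k ~ N(0, variance) and a scaled inverse chi-square prior on the
    variance with nu degrees of freedom and scale tau_sq. The conditional
    posterior is then scaled inverse chi-square with nu + 1 degrees of freedom,
    which can be drawn as (nu * tau_sq + b_k**2) / chi-square(nu + 1).
    """
    df = nu + 1.0
    chi_sq = rng.gammavariate(df / 2.0, 2.0)   # chi-square(df) = Gamma(df/2, scale 2)
    return (nu * tau_sq + b_k * b_k) / chi_sq


rng = random.Random(2007)                      # fixed seed for reproducibility
draws = [draw_prior_variance(b_k=1.0, nu=5.0, tau_sq=0.6, rng=rng) for _ in range(50000)]
print(sum(draws) / len(draws))                 # theoretical mean is (5*0.6 + 1) / (6 - 2) = 1
```

In a hierarchical sampler a draw of this kind would alternate with the draw of the effect itself, so that each regression coefficient carries its own sampled prior variance.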

In random regression analysis, the variance of a regression coefficient is not the primary interest of the investigator; rather, it is used only to control the magnitude of the shrinkage. If the regression coefficients are batched (clustered) so that regression coefficients in the same batch share the same prior distribution, the variance may be estimated accurately and its estimate may be meaningful (GELMAN 2005). In this case, the primary interest shifts from the regression coefficients to the variances of the regression coefficients, and the method is better called the analysis of variance (ANOVA) (GELMAN 2005). In the usual shrinkage analysis, the regression coefficients are not batched; i.e., every regression coefficient has its own prior variance, and the estimated variance for a regression coefficient may vary drastically across the posterior sample. This variation may look alarming, but it does not seriously harm the Bayesian shrinkage estimates of the regression coefficients. One can reduce the variation of the sampled variance across the posterior sample by using a proper prior distribution for the variance (GELMAN 2005).


    LITERATURE CITED
 

BOX, G. E. P., and G. C. TIAO, 1973 Bayesian Inference in Statistical Analysis. Wiley & Sons, New York.

GELMAN, A., 2005 Analysis of variance–why it is more important than ever. Ann. Stat. 33: 1–53.

GELMAN, A., 2006 Prior distributions for variance parameters in hierarchical models. Bayesian Anal. 1: 515–533.

GEORGE, E. I., and R. E. MCCULLOCH, 1993 Variable selection via Gibbs sampling. J. Am. Stat. Assoc. 88: 881–889.

GIRI, N. C., 1996 Multivariate Statistical Analysis. Marcel Dekker, New York.

ISHWARAN, H., and J. S. RAO, 2005 Spike and slab variable selection: frequentist and Bayesian strategies. Ann. Stat. 33: 730–773.

LINDLEY, D. V., and A. F. M. SMITH, 1972 Bayes estimates for the linear model. J. R. Stat. Soc. Ser. B 34: 1–41.

ROBINSON, G. K., 1991 That BLUP is a good thing: the estimation of random effects. Stat. Sci. 6: 15–32.

TIBSHIRANI, R., 1996 Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 58: 267–288.

WANG, H., Y. M. ZHANG, X. LI, G. L. MASINDE, S. MOHAN et al., 2005 Bayesian shrinkage estimation of quantitative trait loci parameters. Genetics 170: 465–480.

XU, S., 2003 Estimating polygenic effects using markers of the entire genome. Genetics 163: 789–801.

XU, S., 2007 An empirical Bayes method for estimating epistatic effects of quantitative trait loci. Biometrics 63: 513–521.

YANG, R., and S. XU, 2007 Bayesian shrinkage analysis of quantitative trait loci for dynamic traits. Genetics 176: 1169–1185.

YI, N., V. GEORGE and D. B. ALLISON, 2003 Stochastic search variable selection for identifying quantitative trait loci. Genetics 164: 1129–1138.

Communicating editor: B. J. WALSH





Copyright © 2007 by the Genetics Society of America.