- THIS ARTICLE
-
Abstract
- Full Text (PDF)
- Alert me when this article is cited
- Alert me if a correction is posted
- SERVICES
- Similar articles in this journal
- Similar articles in PubMed
- Alert me to new issues of the journal
- Download to citation manager
- Reprints & Permissions
- CITING ARTICLES
- Citing Articles via HighWire
- Citing Articles via Google Scholar
- GOOGLE SCHOLAR
- Articles by Yi, N.
- Search for Related Content
- PUBMED
- PubMed Citation
- Articles by Yi, N.
Genetics, Vol. 167, 967-975, June 2004, Copyright © 2004
doi:10.1534/genetics.104.026286
A Unified Markov Chain Monte Carlo Framework for Mapping Multiple Quantitative Trait Loci
Nengjun Yi1
Section on Statistical Genetics, Department of Biostatistics, University of Alabama, Birmingham, Alabama 35294-0022
1 Address for correspondence: Department of Biostatistics, Ryals Public Health Bldg., 1665 University Blvd., University of Alabama, Birmingham, Alabama 35294-0022.
E-mail: nyi{at}ms.soph.uab.edu
>ABSTRACT
THE MULTIPLE-QTL MODEL
THE COMPOSITE SPACE FOR...
PRIOR SPECIFICATIONS
POSTERIOR CALCULATION AND...
DISCUSSION
ACKNOWLEDGEMENTS
LITERATURE CITED
In this article, a unified Markov chain Monte Carlo (MCMC) framework is proposed to identify multiple quantitative trait loci (QTL) for complex traits in experimental designs, based on a composite space representation of the problem that has fixed dimension. The proposed unified approach includes the existing Bayesian QTL mapping methods using reversible jump MCMC algorithm as special cases. We also show that a variety of Bayesian variable selection methods using Gibbs sampling can be applied to the composite model space for mapping multiple QTL. The unified framework not only results in some new algorithms, but also gives useful insight into some of the important factors governing the performance of Gibbs sampling and reversible jump for mapping multiple QTL. Finally, we develop strategies to improve the performance of MCMC algorithms.
MANY complex traits are controlled by multiple genetic [quantitative trait loci (QTL)] and environmental factors. Mapping QTL is the process of estimating the number of QTL, their genomic positions, and genetic effects conditional on the observed phenotypic data and marker data. This is essentially a problem of model selection (e.g., BROMAN and SPEED 2002; SILLANPää and CORANDER 2002). QTL mapping is complicated by the fact that the number of QTL and hence the dimensionality of the parameter space are unknown. Recently, the Bayesian methods and Markov chain Monte Carlo (MCMC) algorithms have been applied to jointly infer the number of QTL, their genomic positions, and genetic effects. The reversible-jump MCMC algorithm introduced by GREEN (1995) can move between models of different dimension and has become an almost routine tool in Bayesian QTL mapping (HOESCHELE 2001). Using the reversible-jump MCMC method, we can in principle jointly infer the genetic model of a complex trait and the associated genetic parameters, including the number, positions, and genetic effects of the identified QTL. Recently, a variety of reversible-jump algorithms have been conducted to map QTL in both experimental designs (SATAGOPAN and YANDELL 1996; SILLANPää and ARJAS 1998, 1999; STEPHENS and FISCH 1998; YI and XU 2000; GAFFNEY 2001) and pedigrees (HEATH 1997; UIMARI and HOESCHELE 1997; XU and YI 2000; UIMARI and SILLANPää 2001; YI and XU 2001).
The reversible-jump MCMC algorithm is a very general and widely applicable technique (GREEN 1995, 2003). It appears to be suited for implementing model selection procedures across a wide range of possible genetic architectures. However, this flexible method has been deemed somewhat "difficult" to understand, cumbersome to conduct, and difficult to tune. It also has been noted that the reversible-jump MCMC is usually subject to poor mixing and slow convergence. Therefore, there seems to be a need for further methodological work on improving the efficiency of reversible jump. The improved frameworks have been established recently for conventional statistical models (GODSILL 2001; BROOKS et al. 2003; GREEN 2003). It is clear that Bayesian QTL mapping can benefit from renewed research efforts.
For conventional linear models, a variety of MCMC methods have been proposed for variable selection, including the variable selection algorithms of SMITH and KOHN (1996) and KUO and MALLICK (1998), the MCMC model combination (MC3) technique of RAFTERY et al. (1997), the Gibbs variable selection of DELLAPORTAS et al. (2002), and the stochastic search variable selection of GEORGE and MCCULLOCH (1993). For certain situations, these different methods have their own advantages. To date, however, they have been rarely applied to the area of mapping QTL (but see BROMAN and SPEED 2002; YI et al. 2003b). The variable selection methods, originally derived from diverse procedures, have been recently shown to relate closely to Green's reversible-jump MCMC (CLYDE 1999; NTZOUFRAS 1999; GODSILL 2001; DELLAPORTAS et al. 2002). GODSILL (2001) recently introduced a composite model space framework that embraces not only all of the above variable selection methods, but also the reversible jump. The composite space method, which is a modification of the product space used by CARLIN and CHIB (1995), provides an interesting viewpoint on model selection, since it allows MCMC simulation to be performed, at least conceptually, on a fixed dimension space. Under the composite space representation, the Bayesian variable selection methods described above and the reversible-jump algorithm can be shown to derive straightforwardly from a general framework. These relationships between the methods can aid our understanding of MCMC model selection procedures and may assist in the development of improved procedures.
In this study, we propose a composite space presentation for the multiple-QTL model and develop a unified MCMC framework for exploring the posterior of the composite space. The proposed unified approach includes the existing Bayesian QTL mapping methods using reversible-jump MCMC algorithm as special cases. We also show that a variety of Bayesian variable selection methods using Gibbs sampling can be applied to map multiple QTL. The unified framework sheds some light upon the important factors governing the performance of Gibbs sampling and reversible jump for mapping multiple QTL. We also develop strategies to improve the performance of MCMC algorithms.
ABSTRACT
>THE MULTIPLE-QTL MODEL
THE COMPOSITE SPACE FOR...
PRIOR SPECIFICATIONS
POSTERIOR CALCULATION AND...
DISCUSSION
ACKNOWLEDGEMENTS
LITERATURE CITED
![]() | (1) |
2). The definitions of xij and ßj depend on the experimental design. For an F2 cross, for example, we have that
![]() |
The above model is typically used in Bayesian mapping implemented via the reversible-jump MCMC algorithm. In this model, the number of QTL is treated as a random variable, and thus the total number of possible effects is unknown. In practical implementation of reversible-jump MCMC, we usually assume that the random variable l has an upper bound K. Model (1) can be rewritten as
![]() | (2) |
j is an indicator variable that denotes that the jth QTL is included (
j = 1) in the model or excluded from the model (
j = 0). Note that the number of QTL does not explicitly appear in model (2). This parameter equals the number of 1's in
. Model (2) is similar to that used in Bayesian variable selection for the linear regression model (e.g., KUO and MALLICK 1998). The idea of adding the indicator variable in the model facilitates setting up MCMC algorithms. As in the linear regression model, we treat K as known and thus in model (2) the total number of possible effects is fixed.
The choice of the constant K depends on the method and the aim in the analysis. In marker analysis, each marker is treated as a potential QTL and thus K equals the number of markers (BALL 2001; BROMAN and SPEED 2002; XU 2003; YI et al. 2003b). In QTL mapping, one does not know a priori how many QTL to expect for a given trait. We here propose two methods for choosing K: (1) As in almost all existing Bayesian mapping methods, we assume that there are at most K QTL in the entire genome, and (2) we assume that there are at most Kc QTL on the cth chromosome. Then we have K =
cKc. As an extreme case, we could assume that each marker interval is associated with a QTL and thus Kc is identical to the number of marker intervals on the cth chromosome. The assumption that there is at most one QTL on a marker interval is not a fundamental requirement for the proposed method. Generally, the value of K can be smaller than the number of marker intervals. The value of K should account for the data information and the previous results obtained by using other QTL mapping methods. As is seen later, alternatively, we can use particular prior distributions on the parameters in the model to relieve the influence of K on the performance of the proposed algorithms. In particular, these prior distributions account for the sample size n, the marker information, and the upper bound of QTL K.
In the following sections, we first propose a composite space representation of the problem for mapping multiple QTL based on model (2). We then discuss the specifications of the prior distributions on the unknowns. Finally, we develop a unified MCMC framework for exploring the composite model space.
ABSTRACT
THE MULTIPLE-QTL MODEL
>THE COMPOSITE SPACE FOR...
PRIOR SPECIFICATIONS
POSTERIOR CALCULATION AND...
DISCUSSION
ACKNOWLEDGEMENTS
LITERATURE CITED
. Hereafter, we call the vector
the model index, which indicates which QTL are present in the model.
In marker analysis, model (2) is essentially a usual linear regression model in that each coefficient xij is observed. In QTL mapping, the coefficients in the model,
, are unobservable, and the locations of K QTL,
, are also unknown. Denote
and
= (ß, µ,
2). We partition (
, x,
) into (
, x
, 
) and (
, x
,
), representing the unknowns included (
j = 1) or excluded (
j = 0) from the model, respectively, where 
= (ß
, µ,
2) and
= ß
. Hereafter, we call (
,
, x,
) the composite space for the multiple-QTL model. For the detailed description about the composite space for model uncertainty problems, the reader is referred to GODSILL (2001)(2003).
Under model (2), the likelihood function for a particular
depends only upon the parameters (x
, 
) used by that model, i.e.,
![]() | (3) |
We assume that the prior distribution of (
,
, x,
) can be partitioned as
![]() | (4) |
The full posterior distribution of the composite model space (
,
, x,
) can now be expressed as
![]() | (5) |
Note that here we have suppressed the notation for conditional on the observed marker data.
In the above posterior distribution, p(
) is the prior distribution of the model index. p(
, x
, 
|
) is the joint prior distribution of the used unknowns, which can be partitioned into three components:
![]() | (6) |
The prior for the unused unknowns, p(
, x
,
|
, 
, x
, 
), may be called "pseudo-prior." It is reasonable to assume that (
, x
,
) is a priori independent of (
, x
, 
). The pseudo-prior can be factorized into three components:
![]() | (7) |
In Equations 6 and 7, p(
|
) and p(
|
) are the prior distributions of the locations of QTL. p(x
|
) and p(x
|
) are the probability distributions of QTL genotype indicators, which can be calculated using multipoint methods (JIANG and ZENG 1997). p(
|
, x
) is the prior distribution of the used parameters, which may depend on x
. p(
|
) is the prior distribution of the unused genetic effects.
The key feature of the composite model space is that the dimension remains fixed even when the model index
or the number of QTL changes. This remarkable feat is achieved by augmenting the varying dimensional space (
, 
, x
, 
) to the fixed dimensional space (
,
, x,
). Simulation of p(
,
, x,
|y) then can be addressed via standard MCMC algorithms for a distribution of fixed dimension (GODSILL 2001). Thus convergence properties of these algorithms are inherited from standard MCMC theory. Furthermore, the composite space approach provides a method to use the important parameters for models other than the current model for efficient proposal design (GODSILL 2003; GREEN 2003).
ABSTRACT
THE MULTIPLE-QTL MODEL
THE COMPOSITE SPACE FOR...
>PRIOR SPECIFICATIONS
POSTERIOR CALCULATION AND...
DISCUSSION
ACKNOWLEDGEMENTS
LITERATURE CITED
For the specification of the model index, most Bayesian variable selection implementations have used independence priors of the form
![]() | (8) |
Under this prior, each QTL enters the model independently of the other QTL, with probability p(
j = 1) = 1 p(
j = 0) = wj. In QTL mapping, a reasonable reduction may be to set wj
w, yielding
![]() | (9) |
. The hyperparameter w is the prior expected proportion of QTL included in the model. In particular, setting w = 1/2 yields the popular uniform prior
![]() | (10) |
, i.e.,
![]() | (11) |
The locations of QTL are assumed to be independent a priori and uniformly distributed across the entire genome or the corresponding regions. If we suppose that there are at most Kc QTL on the cth chromosome, Kc QTL could be uniformly distributed on this chromosome. Since it is usually difficult to distinguish multiple QTL on a marker interval (e.g., LYNCH and WALSH 1998), it may be reasonable to assume that there is at most one QTL on a marker interval although this is not a fundamental requirement for our method. Furthermore, this assumption can limit the model space.
The prior for the overall mean µ is normally distributed with mean
0 and variance
20. We could choose
0 = 0 or
and
. We choose an inverse
(a, b) as the prior of
2. GAFFNEY (2001) suggested a = 3 and
, which has prior mean and variance equal to s2y/2. Alternatively, we could take p(
2)
1/
2 or p(
2)
1.
We could use three types of prior distributions for the genetic effect ß. First, we could use a normal prior for each vector of genetic effects, i.e., ßj
N(0,
), j = 1, ... , K, where the prior mean of zero reflects indifference between positive and negative values, and
is the prior covariance matrix. The covariance matrix
could be chosen to be diagonal. In this prior specification, the prior distribution for each QTL is identical and is independent of
j. Most Bayesian mapping methods have used this type of prior distribution. This prior has been used by KUO and MALLICK (1998) in Bayesian variable selection for a linear regression model. Second, we could use the prior p(ßj|
j) = (1
j)N(0,
) +
jN(0, c2
), where c2 is a predetermined constant. This prior has been used by GEORGE and MCCULLOCH (1993) and DELLAPORTAS et al. (2002) for a linear regression model. Third, we could use p
N
, where c2 is a hyperparameter. This prior has been used extensively in Bayesian variable selection for the conventional linear model.
The prior distributions on the model index
and the genetic effects ß may be the most critical factors influencing the performance of the algorithms and thus deserve careful attention. The hyperparameter w or L in the prior of
controls the expected proportion of genetic effects and the number of QTL included in the model. The prior covariance matrix or the hyperparameter c in the prior of ß controls the expected size of genetic effects included in the model. Small w or L and large prior variance or c would concentrate the prior on parsimonious models with large effects, and large w or L and small prior variance or c would concentrate on saturated models with small effects. The reasonable choices of c and w would account for the sample size n, the marker information, and the upper bound of QTL K. For the conventional linear model, FERNANDEZ et al. (2001) recommended c = max{n, K2}. GEORGE and FOSTER (2000) proposed treating c and w as unknown parameters and using empirical Bayes estimates of c and w based on the data. Finally, we could consider hyperprior distributions on w or L and
or c (e.g., CHIPMAN et al. 2001; GAFFNEY 2001).
ABSTRACT
THE MULTIPLE-QTL MODEL
THE COMPOSITE SPACE FOR...
PRIOR SPECIFICATIONS
>POSTERIOR CALCULATION AND...
DISCUSSION
ACKNOWLEDGEMENTS
LITERATURE CITED
,
, x,
|y), which includes the existing reversible-jump algorithms as special cases and also provides some new methods for mapping multiple QTL. It is seen that standard Gibbs samplers applied to the composite model space produce several well-known Bayesian variable selection methods developed for the linear regression model, while a more sophisticated Metropolis-Hastings (M-H) approach produces a version of the reversible-jump algorithm. We also propose strategies to improve efficiency of Bayesian mapping.
The full conditional posterior distributions for 
and
are given by
![]() | (12) |
![]() | (13) |
The full conditional posterior distributions for elements of 
, e.g., µ, ß
, and
2, can be easily derived from Equation 12. These posteriors have standard forms and thus can be easily sampled (e.g., GELMAN et al. 1995). It can be seen that the parameters unused in the model do not influence the posterior of 
. Since the unused parameters do not contribute to the likelihood, the posterior of
is identical to its prior. For some algorithms developed later, the values of
are used to update the model index and thus
needs to be generated from the pseudo-prior. The pseudo-prior is likely to influence the performance of these algorithms and hence it should be specified with caution.
Since the position
j is highly dependent on xj, we jointly update
j and xj. The joint full conditional posterior distribution for the location and genotype indicator of the jth QTL is
![]() | (14) |
j (xj) represents all elements of
(x) except
j (xj). This posterior is not a standard distribution, and thus the M-H algorithm is needed to update
j and xj jointly. We first propose a new location
*j from q
, and then generate genotype indicator x*j at this new location for all individuals from the posterior
. The proposals for the new location and the genotype indicator are then accepted or rejected simultaneously with probability
![]() | (15) |
(YI and XU 2001). Two schemes can be employed to propose a new location,
*j: (1) local move, propose
*j from Uniform(
j d,
j + d), where d is a predetermined tuning parameter; and (2) long range move, propose
*j uniformly from the corresponding region (GAFFNEY 2001). Note that under the conjugate prior, the parameter
can be integrated out from the posterior distribution (14). Therefore, the joint full conditional posterior distribution becomes
![]() | (16) |
.
The full conditional posterior distribution of xij is given by
![]() | (17) |
Note that when the jth QTL is not included in the model, the posteriors of the genetic effects, location, and genotype indicators for this QTL are identical to the corresponding priors. The values of these unknowns are required to update the indicator variable of the QTL. Therefore, we first describe the methods for updating the model index on the basis of the values of these unknowns sampled from their priors. However, sampling from the priors does not update or make use of our current knowledge about these unknowns and hence cannot possibly produce an optimal sampler.
The standard MCMC procedures, Gibbs sampler and M-H algorithm, can be applied to update the model index
. Several different methods can be developed as follows.
Method I:
The full conditional posterior distribution of the indicator variable
j is given by
![]() | (18) |
j represents all elements of
except
j. This posterior is a Bernoulli distribution and thus easily sampled. The sampling can be implemented sequentially or in random order.
This Gibbs sampler includes several Bayesian variable selection methods as special cases, depending on the prior specifications of ßj (NTZOUFRAS 1999; DELLAPORTAS et al. 2002). KUO and MALLICK (1998) use a prior distribution p(ßj), which is independent of
j so that p(ßj|
j = 1) = p(ßj|
j = 0). Then, the third term on the right-hand side of (18) can be omitted. Similar to GEORGE and MCCULLOCH (1993), DELLAPORTAS et al. (2002) use a mixture of normal distribution for model parameters, i.e., p(ßj|
j) = (1
j)N(0,
) +
jN(0, c2
), so that p(ßj|
j = 1) = N(0, c2
) and p(ßj|
j = 0) = N(0,
). Then, the third term should remain in (18).
We also can apply the M-H algorithm to update the model index
conditional on other unknowns. The M-H algorithm is not based on sampling directly from the full conditional, but on a proposal for a move from
to
', followed by acceptance or rejection of this proposal. Although the M-H sampler can in principle update multiple components of
simultaneously, we discuss only the simplest strategy where only one component in
is proposed; thus at each iteration we actually propose to add or delete one QTL. We assume that the jth element of
is proposed with probability q(
';
); then the acceptance probability, using the standard M-H algorithm, is given by min(1, r), where the acceptance ratio r is
![]() | (19) |
= (
j = 1 s,
j),
' = (
j = s,
j), and s = 1 or 0 corresponding to adding or deleting one QTL, respectively.
The proposals q(
';
) can be set to pa or pd/(l + 1), depending on s = 1 or 0, where l is the number of 1's in
, which equals the current number of QTL, and pa and pd are constants satisfying pa + pd = 1. This proposal scheme is equivalent to that commonly used in Bayesian QTL mapping. Alternatively, we can set [p(
)q(
;
')]/[p(
')q(
';
)] = 1 (GAFFNEY 2001). Under our composite model space, however, two new schemes can be developed, borrowing the idea of variable selection: (1) We pick one of the K variables (QTL) at random and either delete or add it if it is currently or not, respectively, in the model; thus we have that q1(
';
) = q1(
;
') = 1/K, or (2) we can update
j for all j = 1, ... , K sequentially or in random order; thus we have that q1(
';
) = q1(
;
') = 1. Under both these schemes, the move proposal probability cancels from the acceptance probability ratio (19).
The above M-H algorithm is equivalent to a reversible-jump algorithm with reflecting boundaries at 0 and K QTL. To describe this relationship, we assume s = 1, which corresponds to adding one QTL into the model. The reversible jump can proceed to generate a new location and the genotype indicators at the new location from the priors and the associated effects ßj from p(ßj|
j = 0). Then, the acceptance ratio is given, using the reversible-jump algorithm of GREEN (1995)(2003), by (19). This reversible-jump algorithm has been widely used in Bayesian QTL mapping (e.g., HEATH 1997; SILLANPää and ARJAS 1998, 1999; STEPHENS and FISCH 1998; YI and XU 2000).
Method II:
The above algorithm is conditional on the value of ßj sampled from the pseudo-prior when ßj is proposed to add into the model. More efficient MCMC algorithms by using blocking strategies can be devised to yield improved performance. Under the linear model (2), we can choose the pseudo-prior for ßj to be the conditional posterior for ßj with
j = 1; that is, set
![]() |
ßj means all elements of
except ßj. The sampling step for
j then reduces to
![]() | (20) |
(GODSILL 2001). This approach is equivalent to a sampling scheme, which first draws
j from p(
j|
j, x,
ßj, y) and then draws ßj from p(ßj|
j = s,
j,
, x,
, y). This scheme actually draws jointly for (
j, ßj). This blocking procedure can be viewed as equivalent to that used by GEWEKE (1996).
A Metropolis-Hastings version of the above blocking procedure can be easily designed. Assume that the jth element of
is proposed with probability q(
';
). The acceptance ratio is given by
![]() | (21) |
= (
j = 1 s,
j),
' = (
j = s,
j), and s = 1 or 0 corresponding to adding or deleting one QTL, respectively.
Using the identity
, the acceptance ratio (21) for s = 1 then becomes
![]() | (22) |
= (
j = 0,
j) and
' = (
j = 1,
j).
This M-H algorithm is equivalent to the reversible-jump algorithm, which proceeds to generate a new location and the genotype indicators at the new location from the priors and the associated effects ßj from the full conditional posterior, p(ßj|
', x, 
, y). This reversible-jump algorithm is similar to that developed by YI and XU (2001)(2002).
Method III:
Under the linear model (2), in fact, all parameters
can be integrated out from the conditional posterior distribution (18), i.e.,
![]() | (23) |
Therefore,
can be updated independent of
. This method is equivalent to that by SMITH and KOHN (1996) for the conventional linear regression model. BROMAN and SPEED (2002) have applied this method to marker selection in a backcross design. This approach is equivalent to a blocking scheme, which first draws 
from the full conditional posterior p(
|
j = 1,
j, x, y) and then draws
j from p(
j = s|
j,
, x,
, y). This equivalence can be seen more explicitly from the following M-H version.
Assume that a proposal for a move from
= (
j = 1 s,
j) to
' = (
j = s,
j) with probability q(
';
); the acceptance ratio is given, using the standard M-H procedure, by
![]() | (24) |
Using the identity p(
j = s|
j, x, y) = p(
j = s, 
|
j, x, y)/p(
|
, x, y), the acceptance ratio then becomes
![]() | (25) |
This is exactly the acceptance ratio for the reversible-jump sampler, which proceeds to generate a new location and the genotype indicators at the new location from the priors and all the associated parameters 
from the full conditional posterior, p(
|
, x, y). Such a scheme can be viewed as equivalent to the MC3 method of RAFTERY et al. (1997) for linear regressions. Note that the method developed by GAFFNEY (2001), in which all the associated effects ß
are sampled from p(ß
|
, x, µ,
2, y), is close to the above approach.
The major difference among these three methods is the proposal distribution on the genetic effects. Proposing the new genetic effects from the prior does not seem to be the best choice. It places an extraordinary burden on prior specification for the effects (GAFFNEY 2001). In methods II and III, the corresponding parameters are integrated out from the posteriors, or equivalently the blocking strategies are used. Since the acceptance probabilities are independent of parameter values, the samplers would lead to excellent exploration of model space (GODSILL 2001). This advantage results from the use of the full conditional posterior as reversible-jump proposals. This would suggest that reversible-jump proposals should be designed to approximate as close as possible the full conditionals.
General formula and improved strategies:
We can derive a general formula that includes the algorithms discussed above as special cases. Consider a proposal from the current state of the composite space (
,
, x,
) to a new state (
',
', x',
') with the proposal distribution q(
',
', x',
';
,
, x,
). Using the standard M-H procedure, the acceptance probability for this proposal is given by
![]() | (26) |
The proposal can be split into three components:
![]() |
The first component q1 proposes a move to a new model index
'. The second term q2 is the proposal for the unknowns used by model
'. The third term is the proposal probability for the remaining unused unknowns, which is chosen to equal the pseudo-prior p
. The acceptance ratio then reduces to
![]() | (27) |
This is exactly the acceptance ratio for the reversible-jump sampler with the proposal distribution factored into two components, q1(.) and q2(.) (GREEN 1995, 2003; GODSILL 2001, 2003). This derivation of reversible jump is obtained purely from an application of the standard M-H method to fixed-dimensional composite model space. We see that the acceptance probability is independent of the value of any parameters that are unused by both models k and k'. Hence sampling of these unused unknowns is only a "conceptual" step, which need not be performed in practice. The aim of including these unused parameters is to build a fixed-dimensional model space.
The performance of the above M-H sampler is determined by the proposal distributions q1(.) and q2(.). The optimal choice of proposal q2
should be the full conditional p
. This scheme produces an M-H sampler with the posterior model p(
|y) as the target distribution and thus leads to excellent exploration of model space (GODSILL 2001). Unfortunately, this full conditional is not available analytically. We have to design a proposal that approximates as closely as possible the full conditional. As in all existing Bayesian mapping methods, we use three sequential steps to propose the values of unknowns
and then have the factorization
![]() | (28) |
Conditional on (
, 
, x
), model (2) is a conventional linear model and thus q23 can be taken to be the full conditional posterior p(
|
, x, y), which results in the acceptance probability independent of 
. Sampling (
, x
) is a special problem in QTL mapping. Therefore, performance of MCMC mapping procedures should depend highly on the specifications of q21 and q22. In all the previous algorithms, the location
'j and the genotypes x'j are proposed from their priors. This sampling scheme may be suboptimal since each locus is chosen with equal probability no matter which one has weak or strong linkage evidence. Sampling
'j from the prior also means that the information about the jth QTL is totally lost as soon as we delete this QTL from the model; this usually causes low acceptance probability and greatly influences the mixing behavior. To improve performance of reversible jump, it may be desirable to choose a location with stronger linkage evidence. The proposal q21
then has unequal probability over the genome. LEE and THOMAS (2000) developed a method to propose a location by scanning the unoccupied regions of the entire genome for evidence of linkage of the trait residuals. Although the method of LEE and THOMAS (2000) has greatly improved the acceptance ratio, it largely increases computational load. With the composite model space approach, we are able to design an algorithm in which the values for any locus can be retained until this locus is next visited. An efficient scheme could be designed as follows: If the jth QTL was ever included in the model, the last location of this QTL is directly taken; otherwise, a new location is sampled from the prior. Clearly this easy-to-use method makes use of our current knowledge about the QTL locations and thus should improve the performance of MCMC algorithms.
ABSTRACT
THE MULTIPLE-QTL MODEL
THE COMPOSITE SPACE FOR...
PRIOR SPECIFICATIONS
POSTERIOR CALCULATION AND...
>DISCUSSION
ACKNOWLEDGEMENTS
LITERATURE CITED
The main difficulty with the existing reversible-jump algorithms is that the acceptance probability is too low. Another major challenge remains to ascertain convergence of the reversible-jump sampler and obtain a rapidly converging sampler. This is especially true in searching for multiple QTL across the entire genome. With the existing reversible-jump algorithms, the information about a QTL is totally lost as soon as we delete this QTL from the model; this greatly influences the acceptance probability and the mixing behavior. The key to the proposed composite space approach is that transdimensional problems can be considered by looking only at a fixed dimension space. As GODSILL (2003) notes, therefore, convergence properties of the reversible-jump algorithm are inherited from the standard M-H scheme on the composite model space. A further advantage is that in principle the parameters for models other than the current model can be stored and then used for an efficient proposal design when a model is revisited. In this study, we develop strategies to achieve this advantage, which may improve the MCMC procedures for mapping QTL.
The proposed unified procedure includes various Bayesian model and variable selection methods. These methods are derived from a single framework, and thus may be incorporated into a unified computer program. Future work includes full-scale simulation studies and real data analyses, which can verify the mathematical derivations involved in the theory and test the efficiency of the proposed methods. The statistical properties of the Bayesian approach rest squarely on the specification of the prior distributions on the unknowns. Substantial effort will be devoted to prior selection and investigating the robustness of the proposed methods.
In this study, we ignored gene-by-environmental interactions and gene-by-gene interactions (epistatic effects). A growing number of experiments provide strong evidence of the presence of gene-by-environment interactions and epistasis for many complex traits. Thus it is important to include potential interactions into the proposed methods. Recently, YI and XU (2002) and YI et al. (2003a) developed reversible-jump algorithms for searching for complex epistatic QTL across the entire genome. The composite space sampler proposed can in principle be extended to include interactions and thus may improve efficiency of detecting complex interacting QTL.
ABSTRACT
THE MULTIPLE-QTL MODEL
THE COMPOSITE SPACE FOR...
PRIOR SPECIFICATIONS
POSTERIOR CALCULATION AND...
DISCUSSION
>ACKNOWLEDGEMENTS
LITERATURE CITED
ABSTRACT
THE MULTIPLE-QTL MODEL
THE COMPOSITE SPACE FOR...
PRIOR SPECIFICATIONS
POSTERIOR CALCULATION AND...
DISCUSSION
ACKNOWLEDGEMENTS
>LITERATURE CITED
BALL, R. D., 2001 Bayesian methods for quantitative trait loci mapping based on model selection: approximate analysis using the Bayesian information criterion. Genetics 159: 13511364.
BROMAN, K. W., and T. P. SPEED, 2002 A model selection approach for identification of quantitative trait loci in experimental crosses. J. R. Stat. Soc. B 64: 641656.[CrossRef]
BROOKS, S. P., P. GIUDICI and G. O. ROBERTS, 2003 Efficient construction of reversible MCMC proposal distributions. J. R. Stat. Soc. B 65: 356.
CARLIN, B. P., and S. CHIB, 1995 Bayesian model choice via Markov chain Monte Carlo. J. Am. Stat. Assoc. 88: 881889.[CrossRef]
CHIPMAN, H., E. I. EDWARDS and R. E. MCCULLOCH, 2001 The practical implementation of Bayesian model selection, pp. 65116 in Model Selection, edited by P. LAHIRI. Institute of Mathematical Statistics, Beachwood, OH.
CLYDE, M. A., 1999 Bayesian model averaging and model search strategies, pp. 157185 in Bayesian Statistics 6, edited by J. M. BERNARDO, J. O. BERGER, A. P. DAWID and A. F. M. SMITH. Oxford University Press, London/New York/Oxford.
DELLAPORTAS, P., J. J. FORSTER and I. NTZOUFRAS, 2002 On Bayesian model and variable selection using MCMC. Stat. Comput. 12: 2736.[CrossRef]
FERNANDEZ, C., E. LEY and M. F. J. STEEL, 2001 Benchmark priors for Bayesian model averaging. J. Econom. 100: 381427.[CrossRef]
GAFFNEY, P. J., 2001 An efficient reversible jump Markov chain Monte Carlo approach to detect multiple loci and their effects in inbred crosses. Ph.D. Dissertation, Department of Statistics, University of Wisconsin, Madison, WI.
GELMAN, A., J. CARLIN, H. STERN and D. RUBIN, 1995 Bayesian Data Analysis. Chapman & Hall, London.
GEORGE, E. I., and D. P. FOSTER, 2000 Calibration and empirical Bayes variable selection. Biometrika 87: 731747.
GEORGE, E. I., and R. E. MCCULLOCH, 1993 Variable selection via Gibbs sampling. J. Am. Stat. Assoc. 88: 881889.[CrossRef]
GEWEKE, J., 1996 Variable selection and comparison in regression, pp. 609620 in Bayesian Statistics 5, edited by J. M. BERNARDO, J. O. BERGER, A. P. DAWID and A. F. M. SMITH. Oxford University Press, London/New York/Oxford.
GODSILL, S. J., 2001 On the relationship between MCMC model uncertainty methods. J. Comput. Graph. Stat. 10: 230248.[CrossRef]
GODSILL, S. J., 2003 Proposal densities, and product space methods, pp. 199203 in Highly Structured Stochastic System, edited by P. J. GREEN, N. L. HJORT and S. RICHARDSON. Oxford University Press, London/New York/Oxford.
GREEN, P. J., 1995 Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82: 711732.
GREEN, P. J., 2003 Trans-dimensional Markov chain Monte Carlo, pp. 179198 in Highly Structured Stochastic System, edited by P. J. GREEN, N. L. HJORT and S. RICHARDSON. Oxford University Press, London/New York/Oxford.
HEATH, S. C., 1997 Markov chain Monte Carlo segregation and linkage analysis for oligogenic models. Am. J. Hum. Genet. 61: 748760.[Medline]
HOESCHELE, I., 2001 Mapping quantitative trait loci in outbred pedigrees, pp. 599644 in Handbook of Statistical Genetics, edited by D. J. BALDING, M. BISHOP and C. CANNINGS. John Wiley & Sons, New York.
JIANG, C., and Z-B. ZENG, 1997 Mapping quantitative trait loci with dominant and missing markers in various crosses from two inbred lines. Genetica 101: 4758.[CrossRef][Medline]
KUO, L., and B. MALLICK, 1998 Variable selection for regression models. Sankhya. Ser. B 60: 6581.
LEE, J. K., and D. C. THOMAS, 2000 Performance of Markov chain Monte Carlo approaches for mapping genes in oligogenic models with an unknown number of loci. Am. J. Hum. Genet. 67: 12321250.[Medline]
LYNCH, M., and B. WALSH, 1998 Genetics and Analysis of Quantitative Traits. Sinauer Associates, Sunderland, MA.
NTZOUFRAS I., 1999 Aspects of Bayesian model and variable selection using MCMC. Ph.D. Dissertation, Department of Statistics, Athens University of Economics and Business, Athens, Greece.
RAFTERY, A. E., D. MADIGAN and J. A. HOETING, 1997 Bayesian model averaging for linear regression models. J. Am. Stat. Assoc. 92: 179191.[CrossRef]
SATAGOPAN, J. M., and B. S. YANDELL, 1996 Estimating the number of quantitative trait loci via Bayesian model determination. Special Contributed Paper Session on Genetic Analysis of Quantitative Traits and Complex Disease, Biometric Section, Joint Statistical Meeting, Chicago.
SILLANPää, M. J., and E. ARJAS, 1998 Bayesian mapping of multiple quantitative trait loci from incomplete inbred line cross data. Genetics 148: 13731388.
SILLANPää, M. J., and E. ARJAS, 1999 Bayesian mapping of multiple quantitative trait loci from incomplete outbred offspring data. Genetics 151: 16051619.
SILLANPää, M. J., and J. CORANDER, 2002 Model choice in gene mapping: what and why. Trends Genet. 18: 301307.[CrossRef][Medline]
SMITH, M., and R. KOHN, 1996 Nonparametric regression using Bayesian variable selection. J. Econom. 75: 317344.[CrossRef]
STEPHENS, D. A., and R. D. FISCH, 1998 Bayesian analysis of quantitative trait locus data using reversible jump Markov chain Monte Carlo. Biometrics 54: 13341347.[CrossRef]
UIMARI, P., and I. HOESCHELE, 1997 Mapping linked quantitative trait loci using Bayesian method analysis and Markov chain Monte Carlo algorithms. Genetics 146: 735743.[Abstract]
UIMARI, P., and M. J. SILLANPää, 2001 Bayesian oligogenic analysis of quantitative and qualitative traits in general pedigrees. Genet. Epidemiol. 21: 224242.[CrossRef][Medline]
XU, S., 2003 Estimating polygenic effects using markers of the entire genome. Genetics 163: 789801.
XU, S., and N. YI, 2000 Mixed model analysis of quantitative trait loci. Proc. Natl. Acad. Sci. USA 97: 1454214547.
YI, N., and S. XU, 2000 Bayesian mapping of quantitative trait loci for complex binary traits. Genetics 155: 13911403.
YI, N., and S. XU, 2001 Bayesian mapping of quantitative trait loci under complicated mating designs. Genetics 157: 17591771.
YI, N., and S. XU, 2002 Mapping quantitative trait loci with epistatic effects. Genet. Res. 79: 185198.[CrossRef][Medline]
YI, N., D. B. ALLISON and S. XU, 2003a Bayesian model choice and search strategies for mapping multiple epistatic quantitative trait loci. Genetics 165: 867883.
YI, N., V. GEORGE and D. B. ALLISON, 2003b Stochastic search variable selection for identifying multiple quantitative trait loci. Genetics 164: 11291138.
This article has been cited by other articles:
![]() |
S. Banerjee, B. S. Yandell, and N. Yi Bayesian Quantitative Trait Loci Mapping for Multiple Traits Genetics, August 1, 2008; 179(4): 2275 - 2289. [Abstract] [Full Text] [PDF] |
||||
![]() |
N. Yi and S. Xu Bayesian LASSO for Quantitative Trait Loci Mapping Genetics, June 1, 2008; 179(2): 1045 - 1055. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. Chen and C. Kendziorski A Statistical Framework for Expression Quantitative Trait Loci Mapping Genetics, October 1, 2007; 177(2): 761 - 771. [Abstract] [Full Text] [PDF] |
||||
![]() |
H. Huang, C. D. Eversley, D. W. Threadgill, and F. Zou Bayesian Multiple Quantitative Trait Loci Mapping for Complex Traits Using Markers of the Entire Genome Genetics, August 1, 2007; 176(4): 2529 - 2540. [Abstract] [Full Text] [PDF] |
||||
![]() |
N. Yi, D. Shriner, S. Banerjee, T. Mehta, D. Pomp, and B. S. Yandell An Efficient Bayesian Model Selection Approach for Interacting Quantitative Trait Loci Models With Many Effects Genetics, July 1, 2007; 176(3): 1865 - 1877. [Abstract] [Full Text] [PDF] |
||||
![]() |
N. Yi, S. Banerjee, D. Pomp, and B. S. Yandell Bayesian Mapping of Genomewide Interacting Quantitative Trait Loci for Ordinal Traits Genetics, July 1, 2007; 176(3): 1855 - 1864. [Abstract] [Full Text] [PDF] |
||||
![]() |
B. S. Yandell, T. Mehta, S. Banerjee, D. Shriner, R. Venkataraman, J. Y. Moon, W. W. Neely, H. Wu, R. von Smith, and N. Yi R/qtlbim: QTL with Bayesian Interval Mapping in experimental crosses Bioinformatics, March 1, 2007; 23(5): 641 - 643. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. J. Sillanpaa and M. Bhattacharjee Association Mapping of Complex Trait Loci With Context-Dependent Effects and Unknown Context Variable Genetics, November 1, 2006; 174(3): 1597 - 1611. [Abstract] [Full Text] [PDF] |
||||
![]() |
B. Feenstra, I. M. Skovgaard, and K. W. Broman Mapping Quantitative Trait Loci by an Extension of the Haley-Knott Regression Method Using Estimating Equations Genetics, August 1, 2006; 173(4): 2269 - 2282. [Abstract] [Full Text] [PDF] |
||||
![]() |
































