Abstract
The MetropolisHastings algorithm used in analyses that estimate the number of QTL segregating in a mapping population requires the calculation of an acceptance probability to add or drop a QTL from the model. Expressions for this acceptance probability need to recognize that sets of QTL are unordered such that the number of equivalent sets increases with the factorial of the QTL number. Here, we show how accounting for this fact affects the acceptance probability and review expressions found in the literature.
IN a Bayesian quantitative trait loci (QTL) analysis it is possible to consider the number of QTL to be an unobserved parameter subject to estimation (Sillanpää and Arjas 1998). Markov chain Monte Carlo (MCMC) implementations of such an analysis can use the MetropolisHastings algorithm to move between models with differing numbers of QTL and hence differing dimensionalities (Sorensen and Gianola 2002, Chap. 11). The MetropolisHastings algorithm requires calculation of the probability to accept a move from the current model to the model proposed for the next iteration of the Markov chain. We denote this acceptance probability α below. Several expressions for the calculation of α have been presented in the literature (Heath 1997; Uimari and Hoeschele 1997; Sillanpää and Arjas 1998; Stephens and Fisch 1998; Lee and Thomas 2000). Here, we show that a careful derivation of α, and in particular of the prior for the vector of QTL parameters, indicates that some of these expressions for α contain an error or at least a risk being misinterpreted.
The QTL model and derivation of α: Linear models in previous reports all differ in details that are not pertinent to the present derivation. In a very general model, the vector of phenotypes y can be given as
The posterior density π(N_{qtl}, μ, σ^{2}, θ) given observable and prior information is
In some previous expressions for α, authors have equated p(θ^{+}N_{qtl} + 1) to p(θN_{qtl})p(θ_{N}_{qtl+1}) such that these terms cancel out of Equation 4. Here we show that if θ is unordered, this equality does not hold. The vector θ has N_{qtl} subvectors, where each subvector contains the parameters for a locus. Consider a model with QTL defined by the specific subvectors θ_{1}, θ_{2},..., θ_{N}_{qtl}. There are N_{qtl}! ways of assigning these θ_{i} to the subvectors of θ. That is, for example, {θ_{1}, θ_{2},..., θ_{N}_{qtl}} represents the same model as {θ_{2}, θ_{1},..., θ_{N}_{qtl}}. Since there are N_{qtl}! permutations that result in the same model, the prior for θ is
This expression is identical to that given in Sillanpää and Arjas (1998) under step 1.2 in their Appendix A, except that they square the N_{qtl} + 1 term in the denominator. The following reasoning reinforces our belief that Equation 7 is correct: consider running the sampler in the absence of phenotypic data. In that case, p(yN_{qtl}, μ, σ^{2}, θ) = 1 for all models, and the posterior should be equal to the prior. Running a MetropolisHastings sampler using (7) will indeed give samples from a Poisson distribution with mean λ, while the acceptance probabilities given in Sillanpää and Arjas (1998) would not.
Other derivations of α in the literature: In their derivation for α, Uimari and Hoeschele (1997) discuss a reversiblejump step moving between one and two QTL. In their model, they force the second QTL to be to the right of the first QTL. Thus, their model does not allow for the two permutations discussed above. Consequently, the results given in this note are irrelevant to their derivation. Heath (1997) does not discuss the number of permutations that results in the same vector of QTL parameters. The equation he provides for α (his Equation B2) appears to decompose the stationary distribution π into the product of the prior with the likelihood as we do in Equation 4. A (N_{qtl} + 1) term nevertheless appears in the denominator of Equation B2 (he denotes N_{qtl} by k) and we therefore believe Equation B2 to be incorrect. Stephens and Fisch (1998) also appear to make the mistake that is the main point of this note. In addition, they decompose the QTL position into a marker interval and a recombination frequency within the interval. They account for the prior of the recombination frequency but do not account for the prior of the specific interval in which the QTL is placed. Lee and Thomas (2000) present what is, to the best of our ability to discern, a correct acceptance probability with respect to the main point of this note. Their derivation of α, however, is unclear. They document the need for the (N_{qtl} + 1)^{–1} term because of the random choice of which QTL to drop (their Equation A4). They also recognize the λ× (N_{qtl} + 1)^{–1} term as the ratio of the Poisson priors. Since they do not discuss the permutations of the vector of QTL parameters, however, the attentive reader will be puzzled by their final expression for α, Equation A5. A different variant of the problem presented in this note arises in Bink (2002). That study presents research on a finite polygenic model in which unlinked loci are modeled, that is, the QTL have no position. Our derivation above is general to whether or not the QTL have positions and therefore is also relevant to this case. Consequently, the term in the denominator of the acceptance probability given (Bink 2002, p. 248) should not be squared. Finally, Sorensen and Gianola (2002, pp. 690–699) order θ by labeling each of its subvectors according to when the subvector was sampled. The MetropolisHastings proposal they discuss is different from that discussed above in that it allows only the last QTL sampled to be dropped. Thus, they have generalized Uimari and Hoeschele's (1997) ordered vector of two QTL to an ordered vector of an arbitrary number of QTL, and this note is also irrelevant to this derivation. It is heartening to see that, despite this difference in ordered vs. unordered QTL vector, Sorensen and Gianola (2002) arrive at the same expression for α as we do in Equation 7. Indeed, considering the reasoning given above of running the sampler in the absence of phenotypic data, it is necessary that they should arrive at the same expression.
The fact that this error has remained in the literature for over 5 years underscores the view that while Bayesian analysis using Markov chain Monte Carlo is incredibly flexible and therefore powerful, the devil is in the details. Furthermore, incorrect analyses can give results that seem quite reasonable.
Acknowledgments
We acknowledge discussions with Dr. Mikko Sillanpää.
Footnotes

Communicating editor: R. W. Doerge
 Received June 13, 2003.
 Accepted October 15, 2003.
 Copyright © 2004 by the Genetics Society of America