## Abstract

In 2003, Xu obtained remarkably precise estimates of QTL positions despite the many markers simultaneously in his Bayesian model. We extend his model and Gibbs algorithm to ensure a valid posterior distribution and convergence to it, without changing the attractiveness of the method.

STIMULATED by Meuwissen *et al.* (2001), Xu (2003) presented a Bayesian model for marker analysis in which each marker effect has its own specific variance with, on the log scale, a flat prior for each variance, where *y _{i}* denotes the phenotypic value for the

*i*th individual (

*i*= 1, … ,

*n*),

*b*

_{0}is the population mean,

*b*is the QTL effect associated with the

_{j}*j*th marker (

*j*= 1, … ,

*p*),

*x*is a dummy variable indicating genotype, and

_{ij}*e*∼

_{i}*N*(0, σ

^{2}

_{0}) is the residual error. In a backcross (BC) population, the variable

*x*is either 1 or −1, depending on whether the

_{ij}*i*th individual is homozygous or heterozygous. The priors of Xu (2003) are

*p*∝ 1,

*p*∼ 1/σ

^{2}

_{0},

*b*

_{j}∼

*N*, and

*p*∝ 1/σ

^{2}

_{j}; thus

*p*∝ 1. This is a daring model because O'Hagan (1994)(Sect. 9.61) and Gelman

*et al*. (2004) warn that such priors yield improper (

*i.e*., invalid) posterior distributions in variance components models. Nevertheless, in simulations Xu's (2003) Gibbs algorithm did very well in yielding precise estimates of the QTL effects and positions. The Bayesian estimates of the marker effects were all near zero for markers that did not coincide with a QTL. We were concerned that this might be an artifact of the Gibbs algorithm, which is prone to converge slowly for correlated parameters.

For illustration of the power of the method, Figure 1a shows results from our C++ implementation for the simulated BC population of Figure 6 of Xu (2003), where *p* = 301, except that we used *n* = 200 instead of 300. The positions of the first six QTL stand out clearly with the lines with solid circles representing the effect means—the fifth QTL being spread over the interval [590–615]—whereas the remaining five QTL that each explain 0.6% of the phenotypic variance are barely visible, except perhaps for the tenth QTL. The 95%-credible intervals cover the true effects for the first six QTL, although only the first three exclude zero. The mean, 2.5%, and 97.5% points for markers not coinciding with QTL are typically near zero (<0.005 in absolute value). The estimates near 450 cM illustrate the danger of nonconvergence. The peak is at 455 cM, close to the true QTL position. However, the LOD profile for these data (Figure 1b), calculated using a one-QTL model (Dupuis and Siegmund 1999) and phenotypic data from which the true effects of the other QTL are subtracted, peaks at 450 cM, the scores at 445, 450, 455, and 460 cM being 7.1, 10.4, 9.5, and 7.2. The 1.5-LOD interval (Dupuis and Siegmund 1999) includes both 450 and 455 cM. As Figure 1a contains little probability mass at 450 cM, it thereby suggests a more precise QTL position than warranted.

To ensure a valid posterior we extend the prior to *p* ∝ ^{−1+δ}, which yields a proper posterior for the QTL effect for 0 < δ ≤ ^{1}/_{2} (appendix). Xu's prior (δ = 0) is just excluded, because it yields a posterior of *b _{j}* with infinite mass near zero. The new prior requires a change in step 5 in Xu's Gibbs algorithm. In the notation of Gelman

*et al.*(2004), the new σ

^{2}

_{j}must be sampled from inv χ

^{2}; that is, new with χ

^{2}

_{1−2δ}a random number sampled from a chi-square distribution with (1 − 2δ) d.f. Another solution is to use an inv χ

^{2}(ν,

*s*

^{2}) prior (Meuwissen

*et al.*2001), yielding a valid posterior for positive ν and

*s*

^{2}. However, with this approach we found that the 2.5 and 97.5% points for markers not coinciding with QTL were typically >100 times larger than those with Xu's method, even for small values such as ν =

*s*

^{2}

*=*10

^{−4}.

We examined the danger of slow convergence (bad mixing) by adding a Metropolis step as step 5b in Xu (2003) in which σ^{2}_{j} and σ^{2}_{j+1} are swapped (*j =* 1, … , *p* − 1) with an acceptance probability (A5). This step is not conditional on *b _{j}* and

*b*

_{j}_{+1}as these are integrated out (Brown

*et al.*2002) and ensures appropriate mixing of close marker variances and, thereby, of close marker effects.

Figure 1b shows the results obtained for δ = 0.001 with the swap step. As in Figure 1a the major QTL stand out and the markers not coinciding with QTL have near zero means and percentage points. The improved mixing is visible near 450 cM where now the largest effect is where the LOD peaks instead of being at 455 cM. Also the peak at 615 cM in Figure 1a has decreased in importance in favor of the peak at 590 cM in Figure 1b, in agreement with the LOD profile for this QTL (not shown). In a one-QTL Bayesian model with *p* = 1 and fixed , (A6) in the appendix is proportional to the posterior density of a QTL at the location of the marker. The posterior distribution and credible interval for a single QTL can thus be obtained by calculating (A6) at all markers and normalizing the obtained values to a sum of 1. We use this one-QTL model for its analytical tractability. We expect the multiple-QTL model to give credible intervals that are somewhat larger than those of the one-QTL model for phenotypic data from which the true effects of the other QTL are subtracted. As 95% credible intervals we thus obtained for QTL 4–6 and 10 [450, 455], [590, 610], [150, 1330], and [75, 1425]. It is thus no surprise that QTL 6 and 10 do not stand out in Figure 1b. In further simulations we found that the swap step often has little effect on the results, showing that the original Gibbs algorithm mixes reasonably well. We view the swap step as an extra safeguard against bad mixing of consecutive σ^{2}_{j}'s. In addition, parameter expansion (Gelman *et al*. 2004) could be beneficial for the mixing for each (*b _{j}*, σ

^{2}

_{j}) pair.

In this all-marker model, mixing has the drawback that the size of the QTL effect is shrunken compared to that of the unmixed case. The sum of *b _{j}*'s at 450 and 455 cM has a posterior mean of 1.39 in Figure 1a and a mean of 0.89 in Figure 1b, whereas the true QTL effect is 1.58. The reason is that the model shrinks each effect more the smaller its

*t*-value is. For a better size estimate, nearby markers can be suppressed, as suggested by Xu (2003) in the case of closely linked markers. We see virtue in the extended model in QTL selection models with epistatic and

*G*×

*E*effects, where the number of predictors at each location may be very large.

## APPENDIX

### Posterior of *b*_{j}:

_{j}

Xu's (2003) Bayesian model and our extension of it can be expressed without reference to variance components σ^{2}_{j}. Let the prior of *b _{j}* given σ

^{2}

_{j}be

*N*and the prior for σ

^{2}

_{j}be

*p*∝

^{δ−1}. Then, by integrating out σ

^{2}

_{j}, the (unconditional) prior for

*b*becomes A1The last proportionality uses Box and Tiao's (1973) Equation A2.1.2. The prior is improper, for δ =

_{j}^{1}/

_{2}flat and for δ = 0 a hyperbola. For the posterior distribution of

*b*, given

_{j}*b*

_{0}and the other QTL effects

*b*(

_{k}*k*≠

*j*), we obtain, with A2where and , the least-squares estimate of

*b*given the other parameters. For δ

_{j}*=*0, the integral does not exist, as can be shown as follows. For any ε between 0 and 1, with

*a*the minimum of exp(−

^{1}/

_{2}

*f*(

_{j}*b*−

_{j}*b̂*)

_{j}^{2}) over

*b*in the interval [0, 1];

_{j}*a*is a positive number. As the integrand is positive over the whole range of

*b*,

_{j}*C*= ∞; the posterior of

*b*is thus improper and no valid Bayesian analysis is available. For 0 < δ ≤

_{j}^{1}/

_{2}, A3with Γ(·) the gamma function and M(· , · , ·) the first Kummer function (Abramowitz and Stegun 1972). For δ =

^{1}/

_{2},

*C*= (2πσ

^{2}/

**x**

^{T}

**x**)

^{1/2}and (A2) is the normal density. Equation A2 has an infinite mode at 0 and, for

*b̂*) > 2(1 − 2δ)

_{j}/se(b̂_{j}^{1/2}, a finite local mode between

^{1}/

_{2}

*b̂*and

_{j}*b̂*. The second mode thus starts to appear if, for small δ, the

_{j}*z*-ratio >2. This may also explain why δ = 0 works well in practice to pick up QTL.

### Swap step:

Let **G** be a 2 × 2 diagonal matrix with elements σ^{2}_{j} and σ^{2}_{j+1}, , , and **ỹ** the phenotypic data residualized with respect to *b*_{0} and all current marker effects, except *b _{j}* and

*b*

_{j+}_{1}. Because

*p*is bivariate normal

*N*with

**b̃**=

**V**

^{−1}

**X**

^{T}

**ỹ**, the posterior distribution of the variance components σ

^{2}

_{j}and σ

^{2}

_{j+1}has density A4

(Box and Tiao 1973; O'Hagan 1994; Gelman *et al.* 2004). If and at the beginning of the swap step (the additional Metropolis step 5b), the acceptance probability of the swap to and is A5

The two conditional probabilities in (A5) are calculated by inserting (*b*, *a*) and (*a*, *b*) for in (A4), respectively. Note that **G** and **V** in (A4) depend on σ^{2}_{j} and σ^{2}_{j+1}. The calculation is easy and quick as (A4) contains the determinant and inverse of 2 × 2 matrices only.

### Posterior of σ^{2}_{j}:

The univariate analogue of (A4) is A6The posterior of σ^{2}_{j} can be shown to be proper for 0 < δ < ^{1}/_{2}.

## Acknowledgments

We thank Shizhong Xu and Jean-Luc Jannink for constructive comments that improved this note.

## Footnotes

Communicating editor: C. Haley

- Received January 4, 2005.
- Accepted April 4, 2005.

- Genetics Society of America