## Abstract

A Bayesian methodology has been developed for multiple quantitative trait loci (QTL) mapping of complex binary traits that follow liability threshold models. Unlike most QTL mapping methods where only one or a few markers are used at a time, the proposed method utilizes all markers across the genome simultaneously. The outperformance of our Bayesian method over the traditional single-marker analysis and interval mapping has been illustrated via simulations and real data analysis to identify candidate loci associated with colorectal cancer.

TREMENDOUS advances have been achieved over the last decade in the identification of genes underlying many heritable traits with the greatest progress limited almost entirely to those with Mendelian inheritance patterns and well-defined quantitative traits that have relatively large and consistent effects. However, many common pathologies afflicting the greatest number of individuals are not due to simple Mendelian traits. Recent emphasis has been shifted to map complex traits, which are caused by the sum of a complex interaction between gene products and environmental stimuli. Complicating the analysis of these types of traits is the prediction that many are also controlled by genes that have small effects individually, but whose cumulative action is the cause of significant interindividual variation. Due to the complex and often subtle nature of phenotypic variation, traits with complex etiologies have proven far more resistant to genetic analysis. Most of the available quantitative trait loci (QTL) mapping methods map only one or a few QTL at a time and therefore are not efficient for mapping such complex traits. Forward and stepwise selection procedures have been proposed in searching for multiple QTL. Though simple, these methods have their limitations, such as the uncertainty of number of QTL, the sequential model building that makes it unclear how to assess the significance of the associated tests, etc.

To overcome this problem, Bayesian QTL mapping (Satagopan *et al*. 1996; Sillanpaa and Arjas 1998; Stephens and Fisch 1998; Yi and Xu 2000, 2001; Hoeschele 2001) has been developed, in particular, for detection of multiple QTL by treating the number of QTL as a random variable and specifically modeling it using reversible-jump Markov chain Monte Carlo (MCMC) (Green 1995). Due to the change of dimensionality, care must be taken in determining the acceptance probability for such a dimension change, which in practice may not be handled correctly (Ven 2004). To avoid such a problem by the uncertain dimensionality of parameter space, Yi (2004) and Xu (2003) proposed a unified Bayesian framework to identify multiple QTL using all markers across the genome. The method of Xu (2003) is based on a shrinkage idea to simultaneously evaluate marker effects of the entire genome under the random regression model by assigning each marker a normal prior with mean 0 and an effect-specific variance. The effect-specific prior variance was further assigned a vague prior such that the variance was estimated from the data. Those markers that have no effect on the trait will be essentially shrunk down to 0. Similarly, Yi (2004) adapted the stochastic search variable selection (SSVS) approach of George and McCulloch (1993) to the QTL mapping framework. SSVS is a variable selection method that keeps all possible variables in the model and limits the posterior distribution of nonsignificant variables in a small neighborhood of 0 and therefore eliminates the need to remove nonsignificant variables from the model. In principle, Xu (2003) and Yi (2004)'s methods are similar and both have the ability to control the genetic variances of a large number of QTL where each has small effect (Wang *et al*. 2005). Due to the simplicity of Xu (2003), we decide to go with the unified shrinkage method in this article.

Some quantitative traits do not have continuous measurements, but rather are qualitative traits with, for example, binary measurements. This research is mainly motivated by a colorectal cancer susceptibility study. One hundred thirty-five backcross mice of A/J × SPRET/EiJ F_{1} (ASP F_{1}) hybrids to A/J were given intraperitoneal injections of the alkylating carcinogen azoxymethane where ∼40% of the mice developed tumors. The goal of the study was to identify susceptibility genes for colorectal cancer. Mapping genes for such binary traits is more complicated than that for continuous traits as current QTL mapping methods are mainly limited to test association between a marker and a binary trait with simple chi-square tests. As an alternative, Hackett and Weller (1995), Xu and Atchley (1996), and Visscher *et al*. (1996) proposed interval mapping procedures for complex binary disease traits assuming that the binary traits are controlled by an underlying normally distributed liability. The quantitative liability is then modeled by the usual quantitative genetics model. Due to the apparent success of the unified Bayesian framework to identify multiple QTL using all markers across the genome for normally distributed quantitative traits, the systematic investigation of the unified Bayesian mapping for complex binary traits would be interesting and useful. In this article, we propose the Bayesian methodology in mapping complex binary traits and investigate its performance via extensive simulations. Detailed convergence diagnostics are also presented.

## STATISTICAL METHODS

#### The liability and threshold model:

In the liability model (Wright 1934a,b; Falconer 1965; Falconer and Mackay 1996), a binary trait is assumed to be controlled by a latent liability variable, which is normally distributed. That is, suppose *d _{i}* and

*y*(

_{i}*i*= 1,…,

*n*) are the binary phenotype and the underlying liability, respectively, of the

*i*th individual; then the threshold model assumes that there is a fixed threshold in the scale of liability,

*t*, which determines

*d*. Specifically,

_{i}*d*= 1, if

_{i}*y*>

_{i}*t*; otherwise

*d*= 0.

_{i}#### Marker analysis:

Here we allow only QTL to be located on markers. An extension to allow QTL to be located between markers is discussed in the following section.

For backcross QTL data, we can describe the liability *y _{i}* by the following linear model,(1)where

*K*is the total number of markers; μ is the overall population mean,

*x*is a dummy variable indicating the genotype of the

_{ij}*j*th marker of individual

*i*with

*x*= 1 or 0 if the marker genotype is homozygote or heterozygote, respectively,

_{ij}*a*is the partial regression coefficient, and

_{j}*e*is the residual error with a distribution of . Note,

_{i}*a*describes the genetic effect of the

_{j}*j*th marker that partly absorbs the effects of all QTL located between markers

*j*− 1 and

*j*+ 1, as shown by Zeng (1993).

Similarly, for an *F*_{2} population, the liability *y _{i}* can be related to QTL as(2)where

*x*and

_{ij}*w*are defined as and

_{ij}*w*= −1 for genotype

_{ij}*AA*,

*x*= 0 and

_{ij}*w*= 1 for

_{ij}*Aa*, and and

*w*= −1 for

_{ij}*aa*with

*AA*,

*Aa*, and

*aa*referring to the three marker genotypes at each locus;

*a*and

_{j}*b*are the additive and dominance effects of the

_{j}*j*th marker; and μ and

*e*are defined the same as in the backcross model.

_{i}Since the latent liability *y* is unobserved, the mean μ and residual variance can be set arbitrarily. For model identifiability, we set *t* = 0 and σ_{e} = 1 throughout the presentation.

To employ the frequentist method via a likelihood function requires calculation of the conditional probability, *p _{i}*, of

*d*= 1 given {

_{i}*x*,

_{ij}*j*= 1, 2,…,

*K*}, which can be approximated by the logistic model(3)(Xu and Atchley 1996), with for the backcross, for the F

_{2}, and . Therefore the log likelihood can be approximated as(4)where

*p*is defined above. The maximum-likelihood estimators (MLE) can be obtained by directly maximizing

_{i}*L*. However, in practice, the number of markers is often comparable to or even larger than the number of observations. Under such circumstances, the maximum-likelihood method will have very low efficiency or even fail.

Below we describe the Bayesian method which overcomes such a problem. We describe here only the basis for the backcross population. With a minor modification, it can be easily extended to an F_{2} design.

##### Bayesian modeling:

In the Bayesian framework, both the data and the parameters are treated as random, with random variables being classified as observed and unobserved. The goal of Bayesian analysis is to combine the prior distribution of the unobserved variables with the observed data to obtain a posterior distribution of the unknown variables. The observed data in our QTL setup are the binary responses , and the marker genotypes **X** = {*x _{ij}*}, for

*i*= 1, 2,…,

*n*and

*j*= 1, 2,…,

*K*. Our unobserved variables are the liability , the regression coefficients, , and the variances associated with .

By Bayes' theorem, the joint posterior density of the parameters {**Y**, **B**, **V**}, given the observed data {**S**, **X**}, is(5)Inference is performed conditional on **X** and we suppress this conditioning notation for the remainder of the article.

We assume that the joint prior distribution of *p*(**B**, **V**) is(6)Specifically, we choose , and , where ϕ(*x*, σ^{2}) is the density function of normal distribution with mean zero and variance σ^{2}.

The first term in (5) is the conditional distribution of the data given all the unknowns, which equals(7)where *I*(*A*) is an indicator function, taking the value of 1 if condition *A* is true and 0 otherwise. Note that *p*(**S** | **Y**, **B**, **V**) = *p*(**S** | **Y**) because **S** depends solely on **Y**. The second term in (5) is the conditional distribution of the liability. Because the liability is normally distributed and independent of each other given other variables, we have(8)

A MCMC method is used to generate the joint posterior distribution of all unknowns given in (5). Let ψ = {**Y**, **B**, **V**} be the collection of unknown variables. For a given *j*, if the conditional distribution of ψ_{j} given the rest of variables has a known standard density, new ψ_{j} are drawn via Gibbs samples. Otherwise, we use the Metropolis–Hastings algorithm to draw a new sample according to a proposed density . will be accepted with probability min(1, *r*), where(9)Here a negative subscript (− *j*) denotes a vector with the *j*th element removed.

We first initialize ψ as follows. The overall mean μ and the genetic effects of all QTL are initialized with zero while the variances are initialized with 1. Given the initial values of (**B**, **V**), we generate *y _{i}* from the corresponding truncated normal distributions,(10)(11)where Φ(

*x*) is the standardized normal distribution function. We describe below the steps for a single MCMC iterative sweep

*t*+ 1. Superscripts (

*t*) signify the current variables, and we begin with the initialized values for

*t*= 0:

Step 1. Updating μ: μ

^{(t+1)}is drawn from the posterior normal distribution with mean and variance 1/*n*.Step 2. Updating

*a*for_{j}*j*= 1,…,*K*: is drawn from the posterior normal distribution with mean(12)and variance(13)Step 3. Updating for

*j*= 1,…,*K*: is sampled from a scale-inverted χ^{2}-distribution, , where is a χ^{2}-distribution with 1 d.f.Step 4. Updating

*y*for_{i}*i*= 1,…,*n*: is drawn from a truncated normal distribution (10) if*d*= 1 or (11) if_{i}*d*= 0._{i}

After this round of sampling, we have completed one sweep of the MCMC and are ready to continue our sampling for the next round by repeating steps 1–4 with the new ψ. When the chain converges, the sampled parameters approximately follow the joint posterior distribution. From the joint posterior sample, one can easily obtain the desired Bayesian estimates, such as the posterior means and variances.

##### Simulations and real data analysis:

The performance of the proposed Bayesian method is evaluated by analyzing a set of simulated backcross data.

A single chromosome with a length of 15 morgans is simulated. On this chromosome, 301 evenly spaced markers (300 intervals, each 5 cM long) are located and two sets of QTL are simulated. In the first setup, four QTL are put along the genome with positions and effects listed in Table 1. Here the QTL are only loosely linked and each QTL explains 20% of the total liability variance. To investigate the ability of the method to identify small QTL effects and to separate closely linked QTL, 11 QTL are evenly placed on the first half of the chromosome, *i.e*., from 0 to 750 cM in the second setup. The QTL are numbered from 1 to 11 with effects in descending order: . The first QTL explains 43.54% of the total liability variance and the second QTL explains half the size of the first one and so on. The proportion of the total liability variance explained by the 11 QTL is *H* = 89.12%. To demonstrate the advantages of the proposed method over the traditional regression method, we simulate 300 individuals, which is smaller than the number of markers; traditional regression analysis fails if all markers are included as covariates. Further, we have simulated 500 individuals to see how sample size affects QTL mapping. For each simulated datum, the sampled parameter values from the first 50,000 sweeps of the chain (burn-in period) were discarded from the analysis. Then we performed an additional 500,000 MCMC sweeps. After the burn in, the final sample of observations was selected every 50 sweeps to reduce serial correlation, resulting in 10,000 samples from the posterior.

Table 1 shows some summary statistics and Figure 1 shows the estimated QTL-effect profiles (the posterior means of the marker effects) against marker positions along the genome for normal data (Figure 1a) and binary data (Figure 1b) from 10,000 sample states under setup one with sample size 300.

Both profiles show clear signals of QTL at the simulated positions. In most cases the estimated QTL effects are close to the true values (Table 1). However, the analysis based on the binary data has reduced efficiency relative to the analysis on the normal liability data directly. This is expected because of the reduced information in binary data. The histograms of the posterior distribution are also presented in Figure 2. The posterior distributions are nearly normal shaped.

The results for setup two where some QTL are closely linked (sample size = 300) are presented in Figure 3. For comparison, we also perform a simple chi-square test at each marker. From Figure 3a, where −log_{10}(*P-*value) of the single-marker analysis is plotted, we see many markers that are significantly associated with the simulated trait. Because the single-marker analysis fit only one marker at a time, those markers that are correlated with the simulated QTL will also be highly associated with the trait. On the other hand, in the Bayesian analysis result (Figure 3b), five large QTL are quite clear and four of them are located at the simulated positions with effects also close to the true values. To improve the estimate power, we simulated 500 backcross individuals on the basis of the same setup as above and applied both single-marker and Bayesian analyses on them. The results are depicted in Figure 4. Clearly, the mapping efficiency has been improved quite a lot when sample size increases from 300 to 500 (Figure 4b), where six clear QTL are given at the true positions with effects very close to true values. The smallest QTL that the Bayesian method can pick up in this example explains 2.72% of the liability variance.

##### Real data analysis:

Azoxymethane (AOM) is an alkylating carcinogen that causes strain-specific susceptibility to the development of colorectal tumors in mice. Susceptible mice strains treated with AOM exhibit genetic and pathologic changes similar to those in nonfamilial or sporadic human colorectal cancer. We previously characterized 32 inbred mouse strains for their susceptibility to AOM-induced colorectal tumors and identified the A/J strain as having one of the highest sensitivities, with nearly 100% of mice developing colorectal tumors (our unpublished results). In contrast, the genetically distinct strain from the related *Mus spretus* species, SPRET/EiJ, was found to be completely resistant.

For the current study, we set up a backcross of ASP F_{1} hybrids to A/J to generate backcross progeny. Two- to 3-month-old mice were given intraperitoneal injections of AOM at 10 mg/kg of body weight once a week for 4 weeks. Subsequently, mice were killed by carbon dioxide asphyxiation 20 weeks after the last AOM dose. A tail clip, a liver sample, and the colon were dissected from each mouse. Each colon was gently flushed with phosphate buffer saline solution and cut open along its longitudinal axis. The position and size of tumors were recorded before excising any colon tumors for histological verification. Tails, livers, and colons were then placed into labeled Eppendorf tubes and stored at −80°. DNA was isolated from liver or tail samples from each mouse by phenol-chloroform extraction. One hundred thirty-five backcross mice were genotyped using the Illumina SNP genotyping platform. Two hundred fifty-four informative markers were used that distinguished the A/J strain from the SPRET/EiJ strain. ASP F_{1} mice were found to be nearly tumor free with ∼5% of mice developing a single colorectal tumor. Analysis of the 135 mice in the backcross population revealed that 40% of mice developed AOM-induced colorectal tumors (Figure 5).

Since ∼5% of ASP F_{1} mice still developed a single colorectal tumor, we recategorized the backcross animals into two groups, with one group being tumor free or having a single tumor while the second group was backcross mice developing more than one tumor. The analysis was based on this recategorized binary trait and is summarized in Figure 6. Due to the small sample size, no significant results were identified. However, one region on chromosome 6 is promising. This region encompasses a previously detected susceptibility locus for AOM-induced colorectal tumors (Ruivenkamp *et al*. 2003). We are currently working on collecting another 140 backcross mice to confirm the findings on chromosome 6. For comparison, the analysis based on a single chi-square test was also performed. Again, the smallest *P*-value is on chromosome 6, which is consistent with the Bayesian analysis, but it does not survive the genomewide 5% significance level either. Another interesting region indicated by our analysis is located on chromosome 11, which deserves further investigation as well.

#### Interval mapping:

In this section, we extend the QTL mapping on markers to a more general situation where QTL are allowed to be located at any position within marker intervals.

##### Bayesian modeling:

Wang *et al*. (2005) developed a method for normal phenotypes that allows a QTL to take a position varying within a marker interval rather than fixed at a marker. They assumed that each marker interval is associated with a QTL, and thus the number of putative QTL is identical to the number of intervals. In their model, the conditional distribution of the QTL genotype given the marker genotypes is derived from a Markov model under the assumption of no segregation inference. The main difference between interval mapping and marker analysis is that for interval mapping, the QTL positions and QTL genotypes are unobserved, while for marker analysis, the QTL positions are fixed at markers and also QTL genotypes are observed. A major advantage of the Bayesian analysis is that the unobserved QTL positions and QTL genotypes can be treated as random variables and sampled via the MCMC procedure. Therefore, for interval mapping, we need only to add additional steps to the MCMC procedure described above for sampling the QTL positions and QTL genotypes.

For interval mapping, the observed variables are the same as those of the marker analysis, which include **S** and the marker genotypes **M** = {*m _{ij}*} (

*i*= 1, 2,…,

*n*;

*j*= 1, 2,…,

*K*). (Here we change the notation for the marker genotypes from

**X**to

**M**since we reserve

**X**for QTL genotypes. In the marker analysis, the QTL are markers themselves and therefore we refer to marker genotypes as

**X**.) For unobserved variables, besides those in the marker analysis {

**Y; B; V**}, there are additional variables, which include the QTL positions,

**λ**= {λ

_{j}} for

*j*= 1, 2,…,

*p*; and the QTL genotypes

**X**= {

*x*} for

_{ij}*i*= 1, 2,…,

*n*and

*j*= 1, 2,…,

*p*, where

*p*is the total number of intervals, which in general ≤

*K*− 1, and the equality holds only if one chromosome is considered in the analysis.

Now the joint posterior density of the parameters {**Y**, **λ**, **X**, **B**, **V**}, given the observed data {**S**, **M**}, is(14)Again, inference is performed conditional on **M** and we suppress this conditioning notation for the remainder of the article.

The last term in (14) is the joint prior, and we assume(15)The prior distributions for μ, *a _{j}*, and are the same as in the marker analysis while for the QTL positions and genotypes, we choose the following prior distributions,

*p*(λ

_{j}) = 1/

*d*and

_{j}*x*= −1 or 1 with probability each for

_{ij}*i*= 1, 2,…,

*n*,

*j*= 1, 2,…,

*p*, where

*d*is the length of the

_{j}*j*th marker interval.

To sample unobserved variables, first we sample **Y**, **B**, **V** the same way as described in steps 1–4 for the marker analysis except that all *x _{ij}*, which are known in the marker analysis, should be replaced by , samples from the

*t*th iteration. Then the following extra steps are needed to sample

**λ**,

**X**:

Step 5. Update QTL genotypes: The QTL genotype (

*i*= 1, 2,…,*n*;*j*= 1, 2,…,*p*) is updated one individual and one locus at a time, on the basis of their flanking marker information. Specifically, is sampled from the conditional probability distribution*p*(*x*), which equalswhere and are the two flanking markers of the*j*th QTL in the individual*i*; is the conditional probability of the QTL (at ) given the two flanking markers; and the probability(16)Step 6. Update QTL positions: is sampled via a Metropolis–Hastings approach since there is no closed form for the conditional posterior probability density of a QTL position. We first sample a new position uniformly from the neighborhood of with δ being a tuning parameter, which we take a value of 2 cM in subsequent simulations. will then be accepted or rejected according to the probability min(1, α), where(17)If neither nor is within δ-distance away from the ends of the interval, . However, if or/and are within δ-distance from one end of the interval, then(18)where is the distance of from the nearest end of the interval; similarly, is the distance of from the nearest end of the interval.

After these two extra steps for sampling **λ**, **X**, one MCMC iteration is done and we then continue our sampling for the next round by repeating steps 1–6 with the new ψ.

##### Simulation studies:

For interval mapping, a backcross population of 500 individuals was simulated. We investigated a single large chromosome of 24 M that was covered by 121 evenly spaced markers (120 intervals, each 20 cM long). Two sets of QTL are simulated. In the first setup, four QTL are placed at 50, 450, 850, and 1250 cM with effects 1, −1, 1, and −1, respectively. The second simulation setups are described in Table 2. Figure 7 shows the estimated QTL effect against QTL positions along the genome for continuous and binary data for setup one. The QTL signals are clearly shown at the simulated positions for both cases. The QTL effects are almost equal to the true values in the normal case but a little larger than the true values in the binary case. Figure 8 presents the QTL positions and effects along with the real positions and effects for continuous and binary data, respectively, for setup two. It is shown that the Bayesian shrinkage method performs better for normal data than for binary data, especially for those QTL that locate in the region beyond 100 cM and have small effects. Parameters of closely linked QTL are hard to estimate. For example, the following QTL pairs, (5, 6), (10, 11), and (16, 17), are so tightly linked that they are inseparable in both normal and binary data cases. QTL 7 and 8, which are closely linked and have the same size but with opposite directions, are not detected in either data set. Nevertheless, for binary data, our method can clearly localize separable QTL and QTL with large effects. A much denser marker set along with a larger population size is required to resolve QTL with tight linkage and small effect.

#### Convergence diagnostics tests:

Two convergence diagnostics tests are performed using convergence diagnosis and output analysis (CODA) software (Best *et al*. 1995) to make sure that the sample is representative of the underlying stationary distribution. The parameters we consider here include μ and the effect of four markers at which the true QTL are located. Figure 9 shows the autocorrelation function against the lag for the setup one simulation described in the *Marker analysis* section, where the number of markers is larger than the number of samples. The autocorrelation appears to be very small after lag 20 for all five parameters. The small autocorrelation shows no indication of slow convergence for the chain. For the same data set, we run five parallel Markov chains with the same length 10,000 using very different initial values to test Gelman and Rubin statistics. Figure 10 shows plots for the 50 and 97.5% quantiles of the sampling distribution for the shrink factor for above five parameters. The plots in Figure 10 show that both the median and the 97.5% quantiles stabilize around the expected value 1.0, indicating that the Markov chains converge to their limiting distribution.

## DISCUSSION

The Bayesian method developed in Xu (2003) has been extended to complex binary traits assuming the threshold model. The results outperform the single-marker chi-square test. The major advantage of using the threshold model is that once the underlying liability is generated, all other unknowns have conditional posterior distributions identical to those already given in the Bayesian analysis of normal data. This methodology can be easily generalized to multiple-ordered categorical traits. As an alternative to the approach of the liability augmentation, one can directly use the relationship (3) between the binary phenotype and the model effects. The problem is that normal is not a conjugate prior to logistic distribution and therefore Gibbs sampling will not work in this case. The Metropolis–Hastings algorithm or some complicated sampling scheme like ARMS (Gilks and Wild 1992) can be used but it is less efficient than the Gibbs sampling. The method has been implemented in C and the source codes are available from http://www.bios.unc.edu/∼hhuang/QTLBinary.

The most striking feature of the Bayesian analysis is that the estimated profiles show clear signal only at a few positions with all remaining markers having effects that are close to zero. This shrinkage effect has been explained in detail in Xu (2003). The key point is that the priors of the variances of QTL effects are allowed to vary across markers instead of being fixed. This is demonstrated clearly in the special forms of (12) and (13). A large *a _{j}* will have large variance σ

_{j}and thus a negligible , which will lead to a small shrinkage effect. However, if a QTL has a small effect, a small will be expected, which will generate a large ratio and dominate over ; thus

*a*will be very likely shrunk toward zero.

_{j}In this article, we have considered the cases that QTL are fixed at the observed markers and that one QTL can be located within each marker interval. For densely distributed markers, the former method is a good approximation to the latter one since the two methods make almost no difference. However, the first method may lead to biased results when the marker density is not high enough. One assumption made in the second method is that each interval contains at most one QTL, which may not be true if two markers are far apart. Yi (2004) has developed a new method that assumes that, on average, there is at most one QTL in every *d* cM. The pseudomarkers are introduced to make every interval the same length. The genotypes at pseudomarkers are treated as missing and can be easily handled in a Bayesian framework.

Several frequency-based model selection methods have been developed for mapping multiple QTL. To reduce modeling space, Carlborg and Andersson (2002) included only those markers that are significantly associated with the trait from the single-locus model in the multiple-marker model. Coffman *et al*. (2005) first selected one marker per linkage group, regardless of whether that marker is significantly associated with the trait or not, and then fitted their model on the basis of the chosen markers. Both approaches may identify multiple linked QTL in a biased way as mentioned in Kao *et al*. (1999). Our method keeps all possible models under consideration and therefore avoids potential selection bias of the above approaches. Although the variable dimension of our model is very high, our Bayesian model is simple and straightforward since standard Gibbs are mainly used in each MCMC iteration. We have tested the method on 300 individuals with 2000 markers and it took less than only 1 hr to generate 100,000 sweeps on a 3-GH linux machine. Convergence diagnostic tests also show that our method converges quickly and well.

Our method is particularly designed for the situation where no any prior information for the number of QTL and their positions is available. However, it should be noted that the method can be modified to include such prior information. For example, instead of using a noninformative prior for , we may use different priors for on the basis of the available information from other studies. For example, if there is strong evidence of a QTL in a region, we may let σ^{2}'s in the region follow a uniform prior on [*a*, *b*] with *a* being large. On the other hand, if there is strong evidence that no such QTL exists in another region, we may then let σ^{2}'s in this region follow a uniform prior on [0, *c*] with *c* being small. Of course, introducing such region-specific priors complicates the MCMC procedure but would be beneficial especially when the sample size is small.

Further, our model currently can handle only main QTL effects. Another important extension is to include epistatic effects in the model. Several articles (Yi and Xu 2002; Narita and Sasaki 2004) have discussed this issue for normal traits. To include epistatic effects, an upper bound of the number of QTL in the model (Yi *et al*. 2005) has to be placed, which is extremely useful and necessary since the number of variables dramatically increases when epistatic effects are considered.

Finally, it worth mention that we have run marker analysis only on the real data. We have found that a significantly amount of markers are in segregation distortion, which will affect the genetic map that the interval mapping heavily relies on. However, marker analysis is unrelated to the genetic map and therefore is more robust to such segregation distortion. We are currently carefully investigating the causes of such segregation distortion, which itself may provide us insights on the etiology of colorectal cancer.

## Acknowledgments

The authors thank the associate editor and two referees for helpful comments and suggestions, which have led to an improvement of this article. This work has been partially supported by National Institutes of Health grants MH070504, GM074175, CA105417, and CA079869.

## Footnotes

Communicating editor: J. B. Walsh

- Received August 17, 2006.
- Accepted April 28, 2007.

- Copyright © 2007 by the Genetics Society of America