- THIS ARTICLE
-
Abstract
- Full Text (PDF)
- Alert me when this article is cited
- Alert me if a correction is posted
- SERVICES
- Similar articles in this journal
- Similar articles in PubMed
- Alert me to new issues of the journal
- Download to citation manager
- Reprints & Permissions
- CITING ARTICLES
- Citing Articles via HighWire
- Citing Articles via Google Scholar
- GOOGLE SCHOLAR
- Articles by Yi, N.
- Articles by Allison, D. B.
- Search for Related Content
- PUBMED
- PubMed Citation
- Articles by Yi, N.
- Articles by Allison, D. B.
Bayesian Model Choice and Search Strategies for Mapping Interacting Quantitative Trait Loci
Nengjun Yia,b, Shizhong Xud, and David B. Allisona,b,ca Department of Biostatistics, University of Alabama, Birmingham, Alabama 35294
b Section on Statistical Genetics, University of Alabama, Birmingham, Alabama 35294
c Clinical Nutrition Research Center, University of Alabama, Birmingham, Alabama 35294
d Department of Botany and Plant Sciences, University of California, Riverside, California 92521
Corresponding author: Nengjun Yi, University of Alabama, Birmingham, AL 35294-0022., nyi{at}ms.soph.uab.edu (E-mail)
Communicating editor: G. CHURCHILL
| ABSTRACT |
|---|
Most complex traits of animals, plants, and humans are influenced by multiple genetic and environmental factors. Interactions among multiple genes play fundamental roles in the genetic control and evolution of complex traits. Statistical modeling of interaction effects in quantitative trait loci (QTL) analysis must accommodate a very large number of potential genetic effects, which presents a major challenge to determining the genetic model with respect to the number of QTL, their positions, and their genetic effects. In this study, we use the methodology of Bayesian model and variable selection to develop strategies for identifying multiple QTL with complex epistatic patterns in experimental designs with two segregating genotypes. Specifically, we develop a reversible jump Markov chain Monte Carlo algorithm to determine the number of QTL and to select main and epistatic effects. With the proposed method, we can jointly infer the genetic model of a complex trait and the associated genetic parameters, including the number, positions, and main and epistatic effects of the identified QTL. Our method can map a large number of QTL with any combination of main and epistatic effects. Utility and flexibility of the method are demonstrated using both simulated data and a real data set. Sensitivity of posterior inference to prior specifications of the number and genetic effects of QTL is investigated.
MOST complex traits important to evolution, animal and plant breeding, and medical genetics are influenced by multiple genetic and environmental inputs. It has been speculated that epistases (interactions among two or more genetic loci) are ubiquitous in the genetic control and evolution of complex traits (![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
Statistical methodology for epistatic modeling in gene mapping is in its infancy. Early methods of QTL mapping were built mainly on single-QTL models in which one QTL is fitted to the model at a time and only marginal QTL effects are detected (e.g., ![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
There has been a surge of interest in Bayesian approaches to mapping multiple QTL. Earlier Bayesian approaches to QTL mapping were developed to estimate the locations and the effect parameters for multiple QTL with a prespecified number of QTL (![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
In this article, we treat the number of QTL as a random variable. Under the current number of QTL, we introduce indicator variables to represent the number of effects and the actual effects presented in the model. We use the methodology of the Bayesian model and variable selection to develop strategies for searching for multiple QTL with complex epistatic patterns and estimating the associated parameters simultaneously. Incorporation of epistasis not only necessitates determining the genetic model conditional on the number of QTL, but also necessitates a new algorithm for inferring the number of QTL since adding or deleting QTL will depend on the presence of either main or epistatic effects. An efficient reversible jump MCMC algorithm is proposed to update the number of QTL under complicated epistatic models. We also develop a reversible jump MCMC method to infer main and epistatic effects conditional on the number of QTL.
| METHODS |
|---|
Epistatic model:
We describe the method primarily for a mapping population with only two segregating genotypes, e.g., a backcross, double haploid lines (DHLs), or recombinant inbred lines (RILs). Assume that a complex trait is affected by l QTL in a mapping population. Among the l QTL, some may exhibit only main effects, some may show both main effects and epistasis, and others may show only epistasis. For a continuously distributed trait, the observed phenotypic value of individual i, yi, can be described by the following linear model,
![]() |
(1) |
where n is the number of individuals in the mapping population, µ is the overall mean, l is the number of QTL on the genome, aq is the main effect of putative QTL q, bq1q2 is the epistatic effect between putative QTL q1 and q2, xiq is the indicator variable denoting the genotype of putative QTL q for individual i and is defined by 0.5 or -0.5 for the two genotypes in the mapping population (![]()
![]()
2e),
q is a binary indicator variable for main effect of putative QTL q, taking value one if QTL q has main effect and zero otherwise, and
q1q2 is a binary indicator variable for epistatic effect between QTL q1 and q2, taking value one if QTL q1 and q2 interact and zero otherwise. Hereafter, we call these binary indicator variables effect indicators. Apparently, an effect indicator
taking value zero is equivalent to setting zero to the corresponding effect or removing the corresponding effect from the model. As will be seen, however, the introduction of the effect indicator variables facilitates our description and analysis of the problem. Finally, it is worth noting that each QTL included in model (1) has at least one nonzero effect indicator.
In practice, we observe the phenotypic values
and a set of marker genotypes
, where K is the number of markers. For a mapping population with only two segregating genotypes, mik takes one of two values, e.g., 0 and 1, denoting two different genotypes. Assume that marker linkage maps have been developed on the basis of observed marker data so that the locations of markers on each chromosome are known a priori. The observed marker data are used to infer the probability distribution of the unobserved putative QTL. Since it may be difficult to distinguish multiple QTL on a marker interval (e.g., ![]()
Based on the observed data, our aim is to search all types of QTL and to infer their locations, main effects, and epistatic effects simultaneously. There are several special statistical problems, which complicate mapping multiple interacting QTL. First, one does not know a priori how many QTL to expect for a given trait. Therefore, the number of QTL has to be treated as an unknown parameter and inferred along with other parameters in the model. Under the epistatic model, the l putative QTL are chosen on the basis of either their significant main effects or their epistatic effects; this complicates inference about the number of QTL. Second, the locations of multiple QTL are unknown. Therefore, one needs to search for QTL on the entire genome. Third, mapping multiple interacting QTL requires accommodating a very large number of potential genetic effects, even when one assumes only a moderate number of QTL. For example, if there are 10 QTL, 10 parameters would be required to model the main effects, and 45 parameters to model digenic epistasis. In fact, many of these possible effects, particularly the epistatic effects, are not significantly different from zero and thus should be removed from the model. However, one does not know a priori which effects are significantly different from zero. Moreover, in the traditional frequentist framework, such a backward selection procedure model would increase the type I error rate if not accommodated in some way (![]()
Prior distributions and joint posterior distribution:
One can view the problem of mapping epistatic QTL as one of model determination and variable selection. Conditional on the observed data y and M, we seek to determine the number of genes and associated effect indicator variables and to estimate the locations of identified genes and the model parameters. Hereafter, we denote the number of QTL by l, locations by
, vector of the effect indicators by
, main effects by a, epistatic effects by b, genotypic indicator for the genotypes of putative QTL by x, and all model parameters by
In some situations, the genotypes of some markers and the phenotypic values of some individuals are missing. For convenience of description, all marker genotypes and phenotypic values are assumed known, although they can be imputed (see the next section).
We employ a Bayesian framework in which statistical inference is based on the joint posterior distribution of all unknowns. The joint posterior distribution of all unknowns, (l,
,
,
, x), can be written as
![]() |
(2) |
The first term in this equation is the conditional distribution of the phenotypic data given all unknowns and has the form

where
The second term p(x|l,
, M) is the probability distribution of QTL genotypic indicator conditional on the number, locations of QTL and marker genotypes. Assuming that there is at most one QTL on any marker interval, p(x|l,
, M) can be factorized into the products
where mLiq and mRiq represent the left and the right flanking marker genotypes, respectively. Methods with which to calculate p(xiq|
iq, mLiq, mRiq) in different experimental designs are described in ![]()
The third term p(l,
,
,
) in Equation 2 is the joint prior distribution of the parameters (l,
,
,
). Assuming prior conditional independence of the parameters, we can factorize the joint prior distribution into the following products:

The prior distribution of l, p(l), is chosen to be uniform (0, lmax), or Poisson(lmean), for a prespecified integer lmax or Poisson mean lmean. To specify the conditional prior for the vector of QTL positions
, p(
|l), we assume that (1) a QTL is uniformly distributed across the entire genome and (2) there is at most one QTL on a marker interval. Under these two assumptions, the conditional prior distribution of QTL locations can be calculated as

where p(
1) is uniform across the entire genome, and p(
q|
1, ... ,
q-1) is uniform across the regions of the unoccupied marker intervals (without QTL). The prior distribution p(
|l) reflects our subjective belief on the genetic effects conditional on the number of QTL. When no information regarding the genetic architecture of the trait is available, p(
|l) can be decomposed into the products of all elements, i.e.,
and a natural choice for p(
q) and p(
q1q2) is uniform at two states of 0 and 1. This prior specification gives the same prior weight to all models conditional on the number of QTL and is widely used as a noninformative prior in variable selection problems (![]()
N(
0,
20), aq
N(
,
2), and bq1q2
N(
,
2), with prespecified prior means
0 and
and variances
20 and
2. The prior for
2e is assumed to be uniform on a predefined interval. The lower and upper bounds for
2e are usually set to zero and the phenotypic variance present in the data, respectively.
Posterior computations:
The calculation of the above joint posterior distributions is analytically intractable, and thus the MCMC approach is required to obtain observations from the joint posterior distribution. We use the reversible jump MCMC method of ![]()
a.Update the missing marker genotypes and phenotypic values.
b.Update the model parameters (
).
c.Update the genotypic indicators (x).
d.Update the locations of QTL (
).
e.Update the effect indicator (
): add or delete a main or epistatic effect.
f.Add one new QTL with main effect or epistatic effects between existing QTL, or delete an existing QTL.
g.Add two new QTL with only an epistatic effect between themselves, or delete two existing QTL.
It is noted that deleting an effect and deleting one or two QTL from the model may result in some QTL having all corresponding effect indicators equal to zero. If it occurs, we remove such QTL from the model. A complete pass through these steps is referred to as a sampling iteration. The algorithm starts from an initial point (l(0),
(0),
(0),
(0), x(0)) and then proceeds to update each of the groups of the unknowns in turn until a certain criterion of convergence is reached (![]()
(t),
(t),
(t), x(t)):t = 1, 2, ...} are random drawings from the joint posterior distribution p(l,
,
,
, x|y, M), and samples of individual parameters can be regarded as random drawings from the appropriate marginal posterior distribution. These posterior samples are used to draw inferences about parameters of interest.
Except for the effect indicator vector and the number of QTL, other unknowns do not result in a change in the dimension of the parameter space. Therefore, these unknowns can be updated using traditional Metropolis-Hastings (M-H) algorithms (![]()
, and x, model (1) is a conventional linear model, and thus the model parameters
can be generated using the Gibbs sampler (![]()
![]()
, x, and
, the missing phenotypic values can be sampled from normal distributions (![]()
![]()
![]()
![]()
q, we propose a new location
*q from uniform (
q - d,
q + d), where d is a predetermined tuning parameter, and generate genotypic indicators at the new location for all individuals. The proposals for the new location and the genotypic indicators are then accepted or rejected using the M-H algorithm (![]()
![]()
![]()
![]()
The effect indicator vector
represents which effects are present in the model. Conditional on other unknowns, updating
is a variable selection problem and thus a variety of Bayesian variable selection methods can be applied to update
(![]()
, corresponding to possible QTL effects for model (1), including l main effects and l(l - 1)/2 digenic epistasis. To update
, we randomly select one of all l(l + 1)/2 elements and then propose a change for this element. If the element associated with the epistasis between QTL q'1 and q'2 is selected, for example, then we propose to change
to
' where
with all other components remaining the same. In other words, we propose to add the epistasis bq'1q'2 into the model if the effect bq'1q'2 is out of the current model, or delete the epistasis bq'1q'2 from the model if the effect bq'1q'2 is included in the current model. The proposal can be accepted or rejected according to the reversible jump MCMC method (![]()
can be found in Appendix A.
To detect multiple QTL with any combination of main and epistatic effects, we can add four types of QTL into the current model. These four types of QTL include a new QTL with only main effect, a new QTL having only epistatic effects with the current QTL, a new QTL having both main effect and epistatic effects with the current QTL, and two new QTL with only epistatic effects between themselves. We develop two novel reversible jump steps to search for all four types of QTL. Specifically, we use step f to search the first three types of QTL and step g to search the fourth type of QTL. With these two steps, we are able to identify QTL with complex epistatic patterns.
In step f, we randomly decide whether to propose (a) adding one new QTL with main effect or epistatic effects between existing QTL or (b) deleting an existing QTL with equal probability. Let j(l + 1|l) and j(l - 1|l) = 1 - j(l + 1|l) be the probabilities of choosing either move type. Of course, j(l - 1|l) = 0 if l = 0 and j(l + 1|l) = 0 if l = lmax, where lmax is the maximum value allowed for l, and otherwise we choose j(l - 1|l) = j(l|l - 1) = 0.5, for l = 2, 3, ... , lmax - 1. Adding one QTL requires generating additional parameters for the new QTL, i.e., new location, genotypic indicators, effect indicators, and selected new effects. A detailed description for generating these additional parameters and the acceptance probability are given in Appendix B. To delete one QTL, we randomly select one QTL among the existing QTL and then calculate the relevant acceptance probability. The acceptance probability for deleting QTL is also derived in Appendix B.
In step g, we make a random choice between attempting to add two new QTL with only epistatic effects or deleting two existing QTL, with probabilities j(l + 2|l) and j(l - 2|l) = 1 - j(l + 2|l), respectively. Similar to step f, we choose j(l - 2|l) = j(l|l - 2) = 0.5, for l = 2, 3, ..., lmax - 2, and j(l - 2|l) = 0 if l
1 and j(l + 2|l) = 0 if l
lmax - 1. To add two QTL, we need to generate two new locations on the genome, genotypic indicators at these two locations for all individuals, and an epistatic effect for these two QTL. A detailed sampling scheme for generating these unknowns and the acceptance probability are given in Appendix C. The acceptance probability for deleting two QTL is also given in Appendix C.
| SIMULATION STUDIES AND REAL DATA ANALYSIS |
|---|
Simulation studies:
The applicability of the proposed method was demonstrated by analyzing three simulation experiments. Each experimental population was a backcross containing 300 segregating individuals. In designs I and II, one chromosome of length 150 cM was simulated, and 16 codominant markers were placed on the simulated chromosome with positions shown at the numerical labels of the horizontal axis of Fig 1. Zero and three QTL were simulated in designs I and II, respectively. The overall mean and the residual variance were set to µ = 0 and
. In design II, three QTL were placed at 25, 63, and 103 cM, respectively, and the nonzero main and epistatic effects of the three loci were a1 = -0.70, a2 = 0.75, and b23 = 1.0, where aq is the main effect of QTL q, and bq1q2 is the epistatic effect between QTL q1 and q2. Therefore, the third QTL had no main effect but interaction with the second QTL. The proportions of phenotypic variance explained by the nonzero effects were
9, 10, and 8%, respectively. In design III, four chromosomes with length 100 cM each were simulated, and 11 codominant markers were placed on each chromosome with positions shown at the numerical labels of the horizontal axis of Fig 3. We simulated seven QTL with complex epistatic patterns controlling the expression of a quantitative trait. The locations of the simulated QTL, the genetic effects (main and epistatic effects), and the heritability explained by each effect are shown in Table 1. The overall proportion of phenotypic variance explained by all QTL was
55%. The overall mean and the residual variance were µ = -1 and
respectively. To evaluate the ability of our method for handling missing markers and missing phenotypic values, we randomly generated missing markers of 15% and missing phenotypic values of 10% in design III.
|
|
|
|
Both designs I and II were replicated five times and the averaged results were reported. For all analyses, the MCMC were started with no QTL in the model, and the same prior distributions for the overall mean and the effect indicator vector were used. We tried to run the MCMC sampler using other starting values for the number of QTL and found the results to be consistent. The prior for the overall mean was N(0, 2). The residual variance took uniform(0, V), where V represents the phenotypic variance observed in each design. The prior for the indicator vector
was taken as uniform on the space of
as described earlier. The tuning parameter of proposals in the random-walk Metropolis-Hastings algorithm for updating QTL locations was chosen to be 2.0 cM. Different prior distributions for the number of QTL and the genetic effects were chosen for the analyses of three designs and are described later.
In each analysis, the MCMC sampler was run for 3 x 105 cycles after discarding the first 2000 cycles for the burn-in period. The chain was thinned (one iteration in every 15 cycles was saved) to reduce serial correlation in the stored samples so that the total number of samples kept in the post-Bayesian analysis was 2 x 104 (![]()
Zero-QTL model:
In design I, we tested whether any spurious loci were detected when there is no true QTL. The priors for all main and epistatic effects were chosen to be N(0, 0.5). The prior distribution of l was taken as uniform(0, 6) and Poisson distribution with three different prior means (see Table 2). For all analyses, we checked the plots of the changes in the number of QTL against the number of the iterations (not shown here). These plots showed that the MCMC sampler mixes well, with excursions into high values being short lived. The estimated posterior probability distributions for the number of QTL, averaged over five replicates, are given in Table 2. The posterior probability p(l = x|y, M) (x = 0, 1, 2, ...) was obtained by counting the number of samples in which the number of QTL is l, divided by the total of number of samples. For all prior specifications, the posterior probabilities of l = 0 have the highest value. Table 2 also gives Bayes factors B(0, 1) (= p(l = 0|y, M)/p(l = 1|y, M) x p(l = 1)/p(l = 0)), comparing a model with no QTL against a model with one QTL (![]()
![]()
|
Three-QTL model:
In the analyses of design II, we were primarily interested in checking for sensitivity to the choice of prior specifications for the number and genetic effects of QTL. The prior distribution for the number of QTL was taken as uniform(0, 10) and Poisson distribution with four different prior means (see Table 3). The priors for all main and epistatic effects were chosen to be N(0,
2) and three different values for
2 were considered (see Table 3). The prior distributions of the number and genetic effects of QTL appear in the acceptance ratios (see Appendix A AC) and thus may influence the posterior distributions of the parameters of interest and the mixing behavior of the chain.
|
As observed in the analyses of design I, the plots of the changes in the number of QTL against the number of the iterations showed that the MCMC sampler mixes well, changing frequently but always centralized around the posterior mode of the QTL number (not shown here). The estimated posterior probability distributions for the number of QTL are given in Table 3. In all analyses, the posterior mode of the number of QTL coincided with the simulated value and the posterior mean approximately equaled the true value. The posterior mode of the number of QTL was not affected by the choice of prior mean for the number of QTL and prior variance for genetic effects. However, the posterior probabilities were affected by these prior specifications. When taking Poisson distribution as the prior for the number of QTL, increasing the prior mean favored a higher number of QTL. Similarly, reducing the prior variance for genetic effects also favored a higher number of QTL. Table 3 also gives Bayes factors B(3, 2) and B(3, 4) for all analyses. Apparently, Bayes factors did not seem to be affected by the prior of l, but were affected by the choice of prior variance of genetic effects. However, all analyses supported a model with three QTL.
QTL locations were estimated using the posterior QTL intensity function (![]()
![]()
![]()
Seven-QTL model: In design III, we primarily evaluated the ability of the proposed method for mapping complex epistatic genes. The prior for the number of QTL was taken as uniform(0, 20). The priors for all main and epistatic effects were chosen to be N(0, 1). We tried to run the MCMC sampler using other prior distributions for the number of QTL and the main and epistatic effects and found the results to be similar. When Poisson with a small prior mean was taken as prior distribution for the number of QTL, however, the chain did not mix well and converged slowly. For uniform or Poisson distributions with suitable prior means, the chains mixed well.
The estimated posterior probability distribution for the number of QTL is given in Fig 2. It is immediately apparent from Fig 2 that the estimated posterior mode of the number of QTL coincides with the simulated value and the estimated posterior mean approximately equals the true value. The posterior probability of the true value of the QTL number was much greater than those of other values. Therefore, the number of QTL was estimated accurately using our method.
The profiles of posterior QTL intensity are shown in Fig 3 (left side). The regions of 95% highest posterior density (HPD; ![]()
|
Inference for the QTL effects and the proportion of phenotypic variance explained by each effect was obtained conditional on the estimated number of QTL (estimated mode) and the estimated loci falling into the corresponding 95% HPD regions. Although the empirical marginal posterior distribution for each effect and heritability can be depicted, for simplicity we report only the posterior means and standard deviations. These estimates were calculated to average over all models conditional on l = 7 with the effect set to zero if the effect was not selected. Table 4 gives the estimated posterior means and posterior standard deviations for the main and epistatic effects of the identified QTL, as well as the heritability of each effect. Compared with Table 1, we can see that most of the effects and the heritabilities of the effects were estimated accurately. Table 4 also gives the posterior probability that each effect is included in the model (i.e., that the corresponding indicator is nonzero). For all simulated main and epistatic effects, the corresponding posterior probabilities were estimated to be >90%. However, the posterior probabilities for all nonexisting effects were small.
The estimates of the overall mean and the residual variance were -0.911 and 1.052, respectively. The posterior standard deviations for these estimates were 0.137 and 0.102, respectively. Therefore, the overall mean and the residual variance were estimated with high precision.
We reanalyzed the simulated data using the nonepistatic model, where only main effects of QTL are fit in the model. The priors for the number, locations, and main effects of QTL and the tuning parameter for updating the locations of QTL were set to be the same as those in the above analysis. The estimated posterior probabilities for the number of QTL are given in Fig 2. Obviously, the estimated posterior mode was much smaller than the true number of QTL. The posterior mean was estimated to be 3.14, which was also smaller than the true number of QTL. Therefore, it is immediately apparent that some QTL remained undetected when ignoring epistasis. The profiles of the posterior QTL intensity are depicted in Fig 3 (right side). Clearly, all QTL with only epistatic effects were not detected in this analysis. Surprisingly, the two QTL simulated on chromosome one were actually not detected, although these two QTL have main effects. It was found that the main effects were estimated with bias. Finally, the estimates of the overall mean and the residual variance were -0.653 and 1.672, respectively. As expected, the residual variance was estimated to be greater than the true value because the epistatic variation was absorbed into the residual variance.
Real data analysis:
Data from the North American Barley Genome Mapping Project (![]()
where
is the mean and s is the standard deviation of y. The standardized records were subject to the Bayesian analysis.
The priors for the number, locations, and main and epistatic effects of QTL and the tuning parameter for updating the locations of QTL were set to be the same as those in the analysis of design III described above. The length of the chain was also set to be the same as that in the simulated studies. The posterior probability distribution of the number of QTL is depicted in Fig 4, showing that the variation of the trait is most likely contributed by seven or eight loci with an equal chance (
30%), and a slight chance (
18%) contributed by nine loci. The estimated posterior expectation of the number of QTL was 7.98. The posterior probability that there are at least seven loci was
92%. Therefore, a conservative conclusion is that at least seven loci are contributing to the genetic variation. In contrast, using the interval mapping method (![]()
![]()
![]()
|
The posterior QTL intensity is shown in Fig 5. The 95% HPD regions and the posterior modes of the locations of the seven QTL are given in Table 5. Seven QTL were localized at six chromosomes with width of 95% HPD from 20 to 38 cM. The three regions, on chromosomes one, four, and seven, respectively, were declared by ![]()
![]()
|
|
Conditional on the estimated number of QTL (l = 7) and the estimated loci falling into the corresponding 95% HPD regions, we calculated the main and epistatic effects and the heritability explained by each effect (Table 5). Four QTL exhibited negative main effects, and the other three QTL showed positive main effects. The posterior standard deviations for the estimates of the main effects were small, indicating that the estimates were fairly precise. Table 5 also gives the posterior probability that each effect is included in the model. The estimated posterior probability that each main effect is included in the model was >85%. It seems that there were four nonzero epistatic effects, i.e., b14, b35, b45, and b46, whose heritabilities are 2.5, 1, 0.9, and 1.1%, respectively. The posterior probability including the strongest epistatic effect, b14 in the model, was
85%, and other posterior probabilities were <55%. Therefore, we can conclude that there is at least one epistatic effect for heading in this DH population. The proportion of phenotypic variance explained by each main effect ranged from 5.3 to 11.3%. However, the overall proportion of phenotypic variance explained by the seven QTL was
60%.
| DISCUSSION |
|---|
Epistatic variance can be an important source of variation for complex traits. However, detecting complex epistatic genes is difficult primarily due to analytical methods, insufficient sample sizes, and study designs that exclude epistasis (![]()
![]()
![]()
![]()
![]()
![]()
Our approach differs from the existing reversible jump algorithms for QTL mapping in several essential ways. First, most of the reversible jump algorithms have been developed on the basis of nonepistatic models. In nonepistatic models, only an effect parameter is associated with one QTL and thus it is relatively easy to implement the reversible jump for adding one QTL or deleting one QTL. In epistatic models, however, many epistatic effects are involved when adding or deleting a QTL. In our previous study (![]()
to represent the number of effects and the actual effects presented in the model. The number of effects and the epistatic effects included in the model are proposed from the prior distribution of
. With such an algorithm, we have a chance at each step to pick a "good" combination of effects for the new QTL in arbitrarily complicated situations. The acceptance rate for adding or deleting one QTL in our method is typically 35%. Importantly, the acceptance rates are rarely influenced by the true number of QTL and the pattern of epistasis. These properties make our algorithm generally useful for mapping QTL under complicated epistatic models. Second, our method can search QTL with only epistatic effects between themselves, which is achieved in step g. Since this type of QTL does not depend on other types of QTL, we search them in a separate step, which simplifies the MCMC algorithm. In our simulation studies, the acceptance rate for step g is typically 0.51%. Third, our approach uses a new reversible jump step to determine the genetic models in terms of the number of effects. We have treated this problem as a variable selection problem. Model updating is performed by local moves where single terms are added or deleted from the model. When a variable is added or deleted from the model, all remaining parameters are unchanged. Such a procedure has been successfully applied to model determination and variable selection in a log-linear model (![]()
![]()
2530%. For our complex simulated data sets and the real data, these proportions are satisfactory and show that our proposal based on the local moves and the fully conditional posterior distribution is sensible.
The aim of our approach is similar to the MIM method proposed by ![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
Our method appears to work well when tested using both the complex simulated data and real data set. However, we do not want to claim that the proposed algorithm is absolutely the optimal one. Improvements in several aspects can be made, e.g., computational efficiency, acceptance rates for adding or deleting QTL, and mixing behavior. The proposed reversible jump algorithm is computationally intensive. For analysis of a single data set of average size, our current program takes several hours, which makes a full-scale simulation study with many replicates of each choice of parameters a difficult task. However, the time required could be greatly reduced by optimizing the program codes and using a more efficient sampler (![]()
![]()
![]()
| ACKNOWLEDGMENTS |
|---|
We are very grateful to Dr. Alfred Bartolucci, the associate editor, and two anonymous reviewers for their helpful comments and criticisms. This work was supported in part by the National Institutes of Health (NIH) no. GM55321; the U.S. Department of Agriculture National Research Initiative Competitive Grants Program 00-35300-9245 to S.X.; and NIH grants R01ES009912, P41RR006009, R01DK054298, and P30DK56336 to D.B.A.
Manuscript received March 10, 2003; Accepted for publication May 19, 2003.
| APPENDIX A |
|---|
UPDATING THE INDICATOR VECTOR OF QTL EFFECTS
AND CORRESPONDING EFFECTS
Assume that the current number of QTL is l and that there are a total of l(l + 1)/2 possible QTL effects for model (1), including l main effects and l(l - 1)/2 digenic epistasis. We randomly select one of the l(l + 1)/2 possible effects. If the epistasis between QTL q'1 and q'2 is picked, then we propose to change
to
', where
with all other components remaining the same.
If
, i.e., the epistasis bq'1q'2 is currently out of the model, we propose to add this epistasis into the model; this needs to generate an additional effect parameter from a proposal density q(bq'1q'2). We take the following fully conditional posterior distribution as the proposal density

where
and
and
2 are prior mean and variance of QTL effects, respectively. The proposal move is accepted with probability min(1, r), where

and
. If the proposed move is accepted, update
by
' and add the effect bq'1q'2 into the model; otherwise leave the state unchanged.
If
i.e., the epistasis bq'1q'2 is in the current model, we propose deleting this epistasis from the model. The proposal move is accepted with probability min(1, r), where

where
' is
with bq'1q'2 removed, and q(bq'1q'2) takes the form

where
If the proposed move is accepted, update
by
' and delete the effect bq'1q'2 from the model; otherwise the state remains unchanged.
| APPENDIX B |
|---|
ADDING OR DELETING ONE QTL
We first describe the sampling scheme for generating unknowns for the added QTL, i.e., new location, new genotype indicators for all individuals, effect indicators, and selected new effects, and then derive the acceptance probabilities for adding and deleting one QTL.
The unknowns for new QTL are generated as follows:
- Sample a location
* uniformly from all unoccupied intervals on the whole genome. - Sample the genotype indicator
for all individuals from the conditional distribution 
(3) A total of l + 1 possible new effects are associated with this proposed QTL, including one main effect and l epistatic effects between the proposed QTL and all existing QTL. In most situations, however, not all these effects are significant. Therefore, we first generate a random integral k between 1 and l + 1 and then pick k effects of l + 1 possible new effects at random. The proposed new effects, denoted by ß* = (ß*1, ... , ß*k), can be sampled sequentially from the conditional distributions,

These conditional distributions are univariate normal and thus can be easily sampled from.
The change in the number of QTL from l to l + 1, together with the proposed parameters, is accepted with probability min(1, r), where the acceptance ratio is

where
' is the indicator vector of QTL effects after adding the proposed QTL,
' = (
, ß*), x' = (x, x*), and

Deleting a QTL is simply the reverse process. A QTL is randomly chosen among the existing QTL. The chosen QTL, together with all corresponding parameters, is then proposed to delete from the model with probability min(1, r), where

where ß* is the main and epistatic effects of the deleted QTL, and
',
', and x' are
,
, and x, respectively, with the items corresponding to the deleted QTL removed.
| APPENDIX C |
|---|
ADDING OR DELETING TWO EPISTATIC QTL
In step g, we need to propose two QTL with only an epistatic effect between themselves. The associated parameters can be generated as follows:
- Sample two locations,
*1 and
*2, uniformly from all unoccupied intervals on the whole genome. - Sample the genotype indicators
and
, for the two proposed locations, respectively, from the conditional distributions






