- THIS ARTICLE
-
Abstract
- Full Text (PDF)
- Alert me when this article is cited
- Alert me if a correction is posted
- SERVICES
- Similar articles in this journal
- Similar articles in PubMed
- Alert me to new issues of the journal
- Download to citation manager
- Reprints & Permissions
- CITING ARTICLES
- Citing Articles via HighWire
- Citing Articles via Google Scholar
- GOOGLE SCHOLAR
- Articles by Yi, N.
- Articles by Xu, S.
- Search for Related Content
- PUBMED
- PubMed Citation
- Articles by Yi, N.
- Articles by Xu, S.
Bayesian Mapping of Quantitative Trait Loci Under the Identity-by-Descent-Based Variance Component Model
Nengjun Yia and Shizhong Xuaa Department of Botany and Plant Sciences, University of California, Riverside, California, 92521-0124
Corresponding author: Nengjun Yi, Department of Botany and Plant Sciences, University of California, Riverside, CA 92521., yi{at}genetics.ucr.edu (E-mail)
Communicating editor: J. B. WALSH
| ABSTRACT |
|---|
Variance component analysis of quantitative trait loci (QTL) is an important strategy of genetic mapping for complex traits in humans. The method is robust because it can handle an arbitrary number of alleles with arbitrary modes of gene actions. The variance component method is usually implemented using the proportion of alleles with identity-by-descent (IBD) shared by relatives. As a result, information about marker linkage phases in the parents is not required. The method has been studied extensively under either the maximum-likelihood framework or the sib-pair regression paradigm. However, virtually all investigations are limited to normally distributed traits under a single QTL model. In this study, we develop a Bayes method to map multiple QTL. We also extend the Bayesian mapping procedure to identify QTL responsible for the variation of complex binary diseases in humans under a threshold model. The method can also treat the number of QTL as a parameter and infer its posterior distribution. We use the reversible jump Markov chain Monte Carlo method to infer the posterior distributions of parameters of interest. The Bayesian mapping procedure ends with an estimation of the joint posterior distribution of the number of QTL and the locations and variances of the identified QTL. Utilities of the method are demonstrated using a simulated population consisting of multiple full-sib families.
THE identity-by-descent (IBD)-based variance component analysis is a powerful statistical method for quantitative trait loci (QTL) mapping in outbred populations, such as humans. This method requires fewer assumptions than other methods with regard to the genetic model underlying the expression of the trait in question. For instance, knowledge of the actual genetic mechanism of the trait, such as the number of loci, the number of alleles per locus, the allelic frequencies, or the marker linkage phases, is not absolutely required (![]()
![]()
![]()
![]()
![]()
![]()
Existing methods of QTL mapping under the IBD-based variance component model are developed under a single QTL model. When multiple QTL exist in the same chromosome, a proportion of effects of QTL not included in the model is confounded with the effect of the QTL fitted in the model and the remaining proportion is absorbed into the polygenic component. As a consequence, the single-QTL model can lead to biased estimates of QTL positions and effects because of the interference between QTL located on the same chromosome (e.g., ![]()
![]()
![]()
![]()
![]()
Almost all methods of QTL mapping under the IBD-based variance component model are developed for normally distributed traits. However, many complex human diseases, such as breast cancer and type I diabetes, are dichotomous or binary. Although these traits have a simple qualitatively expressed phenotype, their genetic architectures are generally complex, involving multiple genetic factors. Furthermore, the expression of the phenotype is often sensitive to environmental variation. As a consequence, these traits are usually called complex binary diseases and are commonly formulated under a threshold model. This model assumes a latent continuous variable (called the liability) controlling the expression of the binary trait (![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
Bayes methods of QTL mapping have been developed, in particular, for detection of multiple QTL (![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
In this article, we develop a Bayes method to map QTL for both normally distributed and binary traits under the IBD-based variance component model. We treat the additive and dominance effects of QTL as random variables so that their variances are directly estimated. The proposed method is implemented via the reversible jump MCMC, where we allow simultaneous estimation of the number of QTL and the locations and variances of the identified QTL.
| THE GENETIC MODEL |
|---|
Linear model for normally distributed traits:
Consider n individuals in a population of interest. The population is assumed to consist of many independent families. For convenience of presentation, only full-sib families are considered with phenotypic values of the parents excluded from the data. Parents of these full-sib families are randomly sampled from a large outbred population in Hardy-Weinberg and linkage equilibrium. Extended pedigree structures can be handled following the same procedure as described for full-sib families.
Let y represent an n x 1 vector for the observed phenotypic values. When the observed phenotype is controlled by multiple genes acting independently, y can be described by the linear model
![]() |
(1) |
with E(aj) = E(dj) = E(g) = E(e) = 0, Var(aj) =
j
2aj, Var(dj) =
j
2dj, Var(g) = A
2A, and Var(e) = I
2e, where ß is a p x 1 vector of covariate effects (fixed effects, including the overall mean), X is an n x p design matrix relating ß to y, l is the number of QTL on the chromosome(s) (or chromosomal segments) of interest, g is an n x 1 vector of the additive effects of the polygene (collective additive effects of all QTL residing on other chromosomes), e is an n x 1 vector of residual errors, aj and dj are n x 1 vectors for the additive and dominance effects of the genotypes at the jth QTL, respectively,
j = (
jii')nxn is an IBD matrix with element
jii' being the proportion of genes IBD shared by individuals i and i' at the jth QTL,
j = (
jii')nxn is a matrix with element
jii' indicating whether individuals i and i' share two IBD alleles at the jth QTL,
2aj and
2dj are the additive and dominance variances of the jth QTL, respectively, A = (Aii')nxn is the additive genetic relationship matrix with element Aii' being twice the coancestry coefficient between offspring i and i' (not conditional on marker information),
2A is the additive variance of the polygene (see Table 1), I is an identity matrix, and
2e is the residual variance. Note that the dominance effect of the polygene is assumed to be absent in model (1).
|
The expectation and variance-covariance matrix of y are
![]() |
(2) |
and
![]() |
(3) |
respectively. When families are independent, there are no covariances between effects of members from different families. Therefore, the matrices
j,
j, and A are all blockdiagonal. If individuals i and i' are full sibs,
jii' takes one of the four states {(0 + 0)/2, (0 + 1)/2, (1 + 0)/2, (1 + 1)/2}. If i and i' are from different families, then
jii' = 0. The four states show that individuals i and i' share no alleles, share the paternal but not the maternal alleles, share the maternal but not the paternal alleles, and share both alleles identity-by-descent, respectively. Similarly,
jii' takes one of the two values {1, 0}. If individuals i and i' share both alleles IBD,
jii' = 1; otherwise,
jii' = 0. Note that
jii' and
jii' vary from sib pair to sib pair but element Aii' of matrix A is a constant across all sib pairs. Because of this,
2aj,
2dj, and
2A can be separated, which is the theoretical basis of the variance component model of QTL mapping.
Multipoint inference of the IBD matrices
j and
j:
In linkage analysis, the IBD matrix of a QTL is not observable because the QTL genotype cannot be seen. In full-sib families without inbreeding, the diagonal elements of the matrices
j and
j are unity, but the off-diagonal elements vary depending on how many IBD alleles are shared by the two siblings. Thus the rationale is to infer the distributions of IBD variables
jii' and
jii' using markers in the same linkage group. In outbred populations, markers may be partially informative, and thus it is important to use a multipoint method to extract the maximum amount of marker information. A number of multipoint methods have been proposed to calculate the distributions of IBD (![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
Consider M ordered markers on the chromosome of interest. If a marker is fully informative, the IBD states of the marker shared by sibs are observed. Otherwise, the probabilities of IBD states of a marker can be inferred based on the observed genotypes of this marker (![]()
k
M - 1. What we want is to calculate the probabilities of IBD states of the QTL conditional on the probabilities of IBD states of all markers, i.e., Pr(
qii'|
1ii', ... ,
Mii'). When the marker IBD states are not directly observed, Pr(
qii'|
1ii', ... ,
Mii') should be denoted by Pr(
qii'|IM), where IM means the marker information. From Bayes' theorem, the probability of IBD states of QTL given the marker information is

where Pr(
qii') is the prior distribution of the IBD state, equal to 1/4 for each of the four possible IBD states. After some algebraic manipulations, we have



and

where 1 = (1 1 1 1)T, Dk = diag(pk00 pk01 pk10 pk11), D(00) = diag(0 0 0 1), D(01) = diag(0 0 1 0), D(10) = diag(0 1 0 0), D(11) = diag(1 0 0 0), and Tkl is the transition matrix between
kii' and
lii',

where
kl = r2kl + (1 - rkl)2 and rkl is the recombination fraction between loci k and l.
Since
qii' = 1 if
qii' = 1, and
qii' = 0 otherwise, at the time when the probability distribution of
qii' is calculated, that of
qii' is also generated as a by-product. The conditional expectations of
qii' and
qii' are calculated by

and

Both the conditional distributions and conditional expectations of the IBD matrices {
j} and {
j} can be used in the Bayesian analysis of QTL mapping. With the conditional distribution method, the states of {
j} and {
j} have to be sampled from their conditional distributions in the Bayesian procedure. In a maximum-likelihood (ML) analysis of IBD-based variance components, ![]()
j} and {
j} by their conditional expectations.
| BAYESIAN MAPPING FOR NORMALLY DISTRIBUTED TRAITS |
|---|
The observables are the phenotypic values y = {yi}ni=1, the covariate data X, and the marker data IM. The locations of markers on chromosomes are known a priori. The list of unobservables contains the number of QTL l, the locations of QTL
= {
j}lj=1, and the model effects
= (ß,
2a1, · · ·,
2al,
2d1, · · · ,
2dl,
2A,
2e), where
j denotes the distance of the jth QTL from one end of the chromosome in which the QTL resides. With the IBD-based variance component approach, the polygenic effects and the additive and dominance effects have been integrated out in the likelihood, and, therefore, we do not have to generate them in the MCMC process.
From Bayes' theorem, the joint posterior distribution of the parameters {l,
,
} given the observables and prior information is
![]() |
(4) |
Here, we have suppressed the notation for conditional on the observed markers and covariate data. The first term in Equation 4 is the conditional distribution of the phenotypic data given all unknowns, which is usually called the likelihood, and has the following form:
![]() |
(5) |
The second term in Equation 4 is the joint prior distribution of l,
, and
, the parameters of our interest. Assuming prior independence of the parameters, we can factorize the joint prior distribution p(l,
,
) into the following products:
![]() |
(6) |
The prior distribution of the number of QTL, l, is assumed to be truncated Poisson with mean µl and a predefined maximum number L. When no information regarding the QTL locations is available, the prior of
j takes the uniform distribution on the chromosome(s) into consideration. The prior distributions of ß,
2A,
2e,
2aj, and
2dj are assumed to be uniforms on predefined intervals, although other priors can be used. The lower and upper bounds for all variance components are usually set to zero and the phenotypic variance present in the data, respectively.
A Markov chain Monte Carlo method is used to generate the joint posterior distribution of all unknowns given in Equation 4. The idea of MCMC is to simulate a random walk in the space of the unknowns. The random walk eventually converges to a stationary distribution. The stationary distribution represents the posterior distribution of the unknowns. Various approaches have been suggested to conduct the MCMC. The Metropolis-Hastings algorithm is a general term for a family of Markov chain simulation methods that are useful for drawing samples from Bayesian posterior distributions (![]()
![]()
![]()
![]()
The proposed MCMC algorithm starts from an initial point (l0,
0,
0) and proceeds to update each of the unknowns in turn. For an initial QTL position, we can calculate the corresponding expectations of the IBD matrices using the multipoint method. Updating
and
is implemented using the Metropolis algorithm. Updating the QTL number l requires a change in the dimension of the model and thus needs a reversible jump step. More specifically, given the current state of (l,
,
), we proceed with the MCMC as follows.
- Updating
: All elements of
are updated simultaneously. New proposals for the elements of
are sampled from the symmetric uniform densities around their previous values. The proposals are accepted simultaneously with probability 
(7) - where
* represents the proposed
. If the proposal is not accepted, then the state remains unchanged. - Updating
and l: we adopt the method of SILLANPAA and ARJAS 1998 ,
SILLANPAA and ARJAS 1999 to implement the reversible jump MCMC. They used three different types of moves: (1) modify the QTL locations when the number of QTL is unchanged; (2) add one new QTL to the model; and (3) delete an existing QTL from the model. If l = 0, we cannot delete a QTL. On the other hand, if l = L, the maximum number of QTL assigned a priori, we cannot add a QTL. When 0 < l < L, the different types of moves have proposal probabilities of pm =
, pa =
, or pd =
to remain unchanged, to add, or to delete a QTL, respectively. Note that other values of pm, pa, and pd can be used.
When the number of QTL remains unchanged, we modify the locations of the existing QTL. Following ![]()
![]()
are modified one at a time using the Metropolis algorithm. For the jth QTL, a proposal
*j is sampled from a uniform distribution on the interval [
j - d,
j + d], where d is the tuning parameter. The expectations of the IBD matrices for new location
*j, denoted by
*j and
*j, are then formed according to the new position (
*j) and the marker information. The proposal is accepted with probability
![]() |
(8) |
where
-j means all elements of
excluding the jth element,

and

If the proposal is not accepted, the state remains unchanged, and the algorithm proceeds to update the location of the next QTL.
When pa happens, we add a QTL to the model. The location of the new QTL,
l+1, is proposed from the uniform density on the chromosome(s) under consideration. The expectations of the IBD matrices for the new QTL, denoted by
l+1 and
l+1, are then formed according to the new position (
l+1) and the marker information. The additive and dominance variances of the new QTL,
2al+1 and
2dl+1, are drawn from their priors. The proposal is accepted with probability
![]() |
(9) |
where V* =
lj=1
j
2aj +
l+1
2al+1 +
lj=1
j
2dj +
l+1
2dl+1 + A
2A + I
2e. If the proposal is accepted, we add a QTL to the model, and the new QTL location and the variances are all accepted simultaneously; otherwise, the QTL number remains unchanged.
When pd happens, we delete a QTL from the model, each existing QTL being equally likely to be deleted. If the j'th existing QTL is proposed to be deleted from the model, the acceptance probability is
![]() |
(10) |
where V* =
lj=1
j
2aj -
j'
2aj' +
lj=1
j
2dj -
j'
2dj' + A
2A + I
2e. If the proposal is accepted, we delete a QTL from the model. Otherwise, the number of QTL remains unchanged.
In Equation 9 and Equation 10, the first term of the acceptance ratio is the likelihood ratio, and the second term contains the prior ratio and the proposal ratio. Note that here the Jacobian of the transformation is one because adding one QTL or deleting one QTL does not influence the parameters of the other QTL.
| BAYESIAN MAPPING FOR BINARY TRAITS |
|---|
For a complex disease trait, the observed phenotype is defined in a binary fashion, i.e., wj = 1 if an individual is affected, and wj = 0 otherwise. We propose an underlying normal variable yj that determines the binary observation. The link between yj and wj is through a threshold t, i.e.,
![]() |
(11) |
The liability y is nothing new but an unobservable quantitative trait that can be described by Equation 1. The threshold model is overparameterized so that some constraints must be superimposed. As usual, we take t = 0 and
2e = 1 (![]()
![]() |
(12) |
In the threshold model, the observables are the binary phenotypic values w = {wi}ni=1, the covariate data, and the marker data. The unobservables include the vector of liability y = {yi}ni=1, the number of QTL l, the locations of QTL
= {
j}lj=1, and the model effects
= (ß,
2a1, · · · ,
2al,
2d1, · · · ,
2dl,
2A).
The joint posterior distribution of the unobservables, {y, l,
,
}, after suppressing the notation for conditioning on the marker data and the covariate data, can be expressed as
![]() |
(13) |
where the likelihood has the form of
![]() |
(14) |
where 1(X
A) is the indicator function, taking the value of 1 if X is contained in A, and 0 otherwise. Note that p(w|y, l,
,
) = p(w|y) because w is solely determined by y. The second term in Equation 13 is the conditional distribution of the liability given all other unknowns and has the same form as Equation 5. Finally, the last term in Equation 13 is the joint prior distribution of l,
, and
.
From this joint posterior density, it can be seen that the posterior distribution of the parameters l,
, and
can be computed using the method for normally distributed data if the liability y is known. To implement the MCMC algorithm for binary traits, therefore, we need to generate the liability y. Since it is difficult to simulate directly from the conditional distribution of y given w and other parameters, we use the method of the Gibbs sampler (![]()
Starting with some initial values y(0) = (y(0)1, · · ·, y(0)n)T consistent in signs with the observed data w, we proceed to generate y(t+1) = (y(t+1)1, · · · , y(t+1)n)T, t = 0, 1 · · ·, sequentially in the following manner. Given the current state (y(t), l(t),
(t),
(t)) of all unknowns, we simulate



To carry out the above simulations, we use the fact

Since y = (y1, · · · , yn)T has a N(Xß, V) distribution, it follows that f(yi|yj, j
i; l,
,
) is also normal with mean

and variance

where y-i = (y1, · · ·, yi-1, yi+1, · · ·, yn)T. Thus, f(yi|yj, j
i; wi, l,
,
) is a truncated (above 0 if wi = 1; below 0 if wi = 0) normal distribution. Devroye's algorithm is then used to simulate a truncated normal variable (![]()
| A SIMULATION STUDY |
|---|
Design of the simulation experiment:
The proposed method is illustrated in the analysis of a simulated example. We simulated a single chromosome of length 100 cM. Eleven codominant markers were placed on the chromosome with a marker distance of 10 cM. Four equally frequent alleles were simulated at each marker locus. Two QTL residing at positions 25 and 75 cM, respectively, and 12 additional independent loci of equal effect (called polygene) were simulated for a continuous phenotype. The dominance effect of the polygene was assumed to be absent. The variances of the first QTL were
2a1 =
2d1 = 0.6 and those of the second QTL were
2a2 =
2d2 = 0.4. The polygenic variance was set at
2A = 1.0. The normally distributed phenotype of each individual took the sum of an overall mean, values of QTL additive and dominance effects, polygenic effect, and a residual error sampled from a standardized normal distribution. In addition, we generated a binary phenotype for each individual using the normally distributed phenotypic value as the underlying liability. The binary phenotype took 1 if the liability exceeded 0, and 0 otherwise. The overall mean was set to 0, which led to 50% population incidence. We simulated 500 full-sib families, each with 6 siblings (a total of 3000 individuals). For comparison, we analyzed both the normal data and the binary data. No phenotypic values and marker linkage phases were recorded for all parents.
In each of the MCMC analyses, we ran a single long chain with 5 x 105 cycles of simulations. The first 200 samples were discarded and the chain was then thinned (we saved one iteration in every 50 cycles) to reduce serial correlation in the stored samples so that the total number of samples kept in each analysis was 104.
The initial value for the QTL number was set at 1 and the corresponding initial location was given at the 50-cM position of the chromosome. The prior Poisson mean of the number of QTL was µl = 2 and the maximum number of QTL was L = 4. The starting values were 0.0 for the overall mean, 1.0 for both the polygenic and residual variances, and 0.5 for both the additive and dominance variances of the QTL. The prior for the overall mean was uniform over the range [-2.0, 2.0]. The priors for all variance components were chosen as uniform on [0.0, 4.0], the right endpoint being equal to the true phenotypic variance of the liability. The prior for the QTL locations was uniform over the whole chromosome. The tuning parameter of the proposal distribution was set at 2 cM for QTL locations and 0.1 for the overall mean and all variance components.
We used the QTL intensity function of ![]()
![]()
![]()
Results:
The Bayesian posterior QTL intensities for both the normal and the binary data are presented in Fig 1. The QTL intensity graphs are concentrated around the true locations of the simulated QTL. One peak of the posterior QTL intensity for the normal data is on [24, 25] and the other on [76, 77]. The corresponding peaks for the binary data are on [23, 24] and [74, 75], respectively. These results support quite strongly a model having two QTL in the chromosome. Comparing the shapes of the posterior QTL intensities for the binary and normal data analyses, we can see that binary data analysis does lose some information, but the information retained is still sufficient to detect both QTL.
|
The approximate posterior distributions on the QTL number, obtained from the two types of data, are presented in Table 2. The posterior expectations are essentially the same for the binary data and the normal data and coincide with the simulated number of QTL. As expected, the posterior variance of QTL number for the normal data is smaller than that for the binary data. Finally, the posterior modes of the QTL numbers overlap with the true number of QTL in both data analyses.
|
The posterior samples for the overall mean, the polygenic variance, and the residual variance from the normal data analysis, and those for the overall mean and the polygenic variance from the binary data analysis, are depicted in Fig 2 and Fig 3, respectively. The posterior means and standard errors of these parameters are given in Table 3. From the figures and the table, it appears that estimates of these parameters are close to the simulated values with small standard errors from the normal data. However, the polygenic variance is greatly overestimated and the estimated standard error is large from the binary data analysis although the mean is still estimated accurately.
|
|
|
The chromosome regions with sufficiently high posterior QTL intensity are given in Table 3. The posterior samples, in which QTL locations fall into these regions, are used to estimate the QTL variances. Fig 4 and Fig 5 depict such posterior samples for the QTL variances from the normal data and the binary data analyses, respectively. The posterior densities of QTL variances are concentrated around the corresponding true values for both types of analyses, although the binary data analysis shows a wider distribution. More explicitly, we calculated the means and the standard errors of the posterior samples for the QTL variances (see Table 3). For the normal data, the estimated posterior means of the additive and dominance variances of the QTL are close to the simulated true values, and the standard errors are small. In the binary data analysis, however, the additive and dominance variances of the QTL are overestimated, and the standard errors are larger than those for the normal data.
|
|
The point estimates and the estimation errors of the QTL locations are also given in Table 3. For both types of data, the estimated QTL locations are very close to the corresponding true values. However, the standard errors are much larger for the binary data than those for the normal data.
For comparison, the simulated normal data set was analyzed using the ML method (![]()
2a1 = 0.8253 and
2d1 = 0.8900. The estimated variances of the second QTL are
2a2 = 0.7839 and
2d2 = 0.5667. The ML estimates of QTL variances are not so close to the corresponding true values as the estimated posterior means in the reversible jump MCMC analysis.
|
| DISCUSSION |
|---|
We have presented here a Bayesian procedure that allows detection of multiple QTL for both normally distributed and binary traits under IBD-based variance component analysis. The proposed Bayesian procedure is implemented via a reversible jump MCMC algorithm, which enables moves to be made between models with different numbers of QTL. We model complex binary traits under the classical threshold model of quantitative genetics (![]()
![]()
Bayesian mapping statistics have been well studied in the context of line-crossing experiments (![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
2a = 2
(1 -
)â2. Although the biallelic model may be proper in most situations, it can potentially fail if the actual number of polymorphic alleles is not two. The IBD-based variance component Bayesian mapping goes up another level in the hierarchical model by treating the variance as the parameter of interest. As a result, information about the number of alleles is not required, which has significantly improved the robustness of the Bayes method.
The IBD-based variance component approach of QTL mapping is based on the proportion of alleles with identity-by-descent shared by relatives. Inference on IBD matrices does not need the information of marker linkage phases. The IBD states of markers are calculated in advance and remain as a prior in the Bayesian procedure. With the proposed method, we can completely avoid the complicated problem concerning sampling of additive and dominance effects of QTL and marker and QTL genotypes. The IBD-based variance component Bayesian mapping is not limited to nuclear families. In the simulation study, we used full-sib families to demonstrate the utility of the method solely for the purpose of simplicity. In theory, the Bayes method described in this article can be applied easily to extended or complex pedigrees by using a general algorithm to compute the IBD proportion shared by an arbitrary relative pair. Such a general algorithm has been given by ![]()
In the simulation study, we put two QTL on a single chromosome and 12 independent loci on other unspecified chromosomes as a "polygene." Our model always includes a polygenic term to absorb background QTL. The polygenic variance varies depending on the model effects fitted. Without this polygenic term, variances due to QTL on unmarked chromosomes or chromosome regions will join the residual variance. A large residual variance will reduce the power of QTL detection. In practice, we can take one of three strategies for QTL mapping. One strategy is to analyze one chromosome at a time. Marker information on other chromosomes can be completely ignored. In this case, variances of QTL on other chromosomes will be collected by the polygene (not the residual). The second strategy is to include the previously identified QTL (at other chromosomes) in the model instead of being absorbed into the polygenic term. The other strategy is to map QTL simultaneously for all chromosomes. In the latter situation, if the marker map is sufficiently dense, the polygenic term can be ignored because eventually all QTL will be identified and their effects will be explicitly estimated. These three mapping strategies may produce similar results.
The reversible jump MCMC sampler performed well for the simulated normally distributed and binary data. A plot of the changes in the number of QTL against the number of iterations for the binary data used in the simulation study is presented in Fig 7. It shows that the reversible jump MCMC algorithm mixes well over the number of QTL. A similar plot was obtained for the normally distributed data. We detected no influence of initial values of unknowns on the mixing of the MCMC sampler. For example, with the simulated data, starting with l0 = 4, the QTL number l quickly dropped to 1 after several hundred iterations and subsequently behaved the same as starting with l0 = 1. The mixing behavior of the number of QTL is greatly affected by the prior distributions for QTL additive and dominance variances (![]()
0.4% for both the simulated normally distributed and binary data. These acceptance rates were similar to those in ![]()
were found to be rather high in general and were
80% for the simulated data.
|
A major implementation issue in MCMC is to determine the effective sample sizes. This issue is related to the assessment of convergence of the chain, the serial correlation between the samples, and the burn-in period. For a single long chain, one can examine time series graphs of a simulated sequence to judge whether the chain is sufficiently long (![]()
We only use marker information to infer the IBD matrices of QTL and allow multiple QTL located in the same marker interval in the proposed method. Alternatively, we can infer the IBD matrices of a QTL considered by utilizing the IBD information of all markers and other existing QTL in the same chromosome and assume at most one QTL in any marker interval is acceptable. However, our treatment simplifies the MCMC algorithm and does not result in a significant reduction in efficiency particularly when the marker map is dense. It has been shown that, conditional on the IBD states of a marker locus, the IBD matrices of a QTL on one side of the marker are not correlated to those of a QTL on the other side (![]()
![]()
Finally, Bayesian statistics have the inherent flexibility introduced by their incorporation of multiple levels of randomness and the resultant ability to combine information from different sources. Therefore, the Bayesian approach could be extended to allow more complicated models under more complicated data structures, e.g., the epistatic model. When a new QTL is proposed, its interactive effects with all existing QTL should be proposed and, if accepted, the epistatic effects should be included in the model. A QTL is finally added to the model if at least one effect caused by the QTL is accepted. This will involve additional reversible jumps on the dimension of the model even if the number of QTL remains the same.
| ACKNOWLEDGMENTS |
|---|
We thank Dr. Damian Gessler for his helpful comments on the manuscript. We also thank two anonymous reviewers for their critical comments on an earlier version of the manuscript. This research was supported by the National Institutes of Health Grant GM55321 and the United States Department of Agriculture National Research Initiative Competitive Grants Program 97-35205-5075 to S.X.
Manuscript received December 6, 1999; Accepted for publication May 19, 2000.
| LITERATURE CITED |
|---|
ALBERT, J. H. and S. CHIB, 1993 Bayesian analysis of binary and polychotomous response data. J. Am. Stat. Assoc. 88:669-679.
ALMASY, L. and J. BLANGERO, 1998 Multipoint quantitative-trait linkage analysis in general pedigrees. Am. J. Hum. Genet. 62:1198-1211[Medline].
AMOS, C. I., 1994 Robust variance-components approach for assessing genetic linkage in pedigrees. Am. J. Hum. Genet. 54:535-543[Medline].
BINK, M. G. A. M., L. L. G. JANSS and R. L. QUAAS, 1998 Mapping a polyallelic quantitative trait locus using simulated tempering. Proceedings of the 6th World Congress on Genetics Applied to Livestock Production Science, Armidale, Australia, Vol. 26, pp. 277280.
CHAN, J. S. K. and A. C. KUK, 1997 Maximum likelihood estimation for probit-linear mixed models with correlated random effects. Biometrics 88:86-97.
DEVROYE, T., 1986 Non-uniform Random Variable Generation. Springer-Verlag, New York.
DUGGIRALA, R., J. T. WILLIAMS, S. WILLIAMS-BLANGERO, and J. BLANGERO, 1997 A variance component approach to dichotomous trait linkage analysis using a threshold model. Genet. Epidemiol. 14:987-992[Medline].
FULKER, D. W. and L. R. CARDON, 1994 A sib-pair approach to interval mapping of quantitative trait loci. Am. J. Hum. Genet. 54:1092-1103[Medline].
FULKER, D. W., S. S. CHERNY, and L. R. CARDON, 1995 Multipoint interval mapping of quantitative trait loci, using sib pairs. Am. J. Hum. Genet. 56:1224-1233[Medline].
GEMAN, S. and D. GEMAN, 1984 Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Machine Intell. 6:721-741.
GESSLER, D. D. G. and S. XU, 1996 Using the expectation or the distribution of identity-by-descent for mapping quantitative trait loci under the random model. Am. J. Hum. Genet. 59:1382-1390[Medline].
GEYER, C. J., 1992 Practical Markov chain Monte Carlo. Stat. Sci. 7:473-511.
GOLDGAR, D. E., 1990 Multipoint analysis of human quantitative genetic variation. Am. J. Hum. Genet. 47:957-967[Medline].
GREEN, P. J., 1995 Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82:711-732
GRIGNOLA, F. E., I. HOESCHELE, and B. TIER, 1996 Mapping quantitative trait loci via residual maximum likelihood: I. Methodology. Genet. Sel. Evol. 28:479-490.
HACKETT, C. A. and J. I. WELLER, 1995 Genetic mapping of quantitative trait loci for traits with ordinal distributions. Biometrics 51:1252-1263[Medline].
HALEY, C. S. and S. A. KNOTT, 1992 A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69:315-324[Medline].
HASTINGS, W. K., 1970 Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57:97-109
HEATH, S. C., 1997 Markov chain Monte Carlo segregation and linkage analysis of oligogenic models. Am. J. Hum. Genet. 61:748-760[Medline].
JANSEN, R. C., 1993 Interval mapping of multiple quantitative trait loci. Genetics 135:205-211[Abstract].
KAO, C. H., Z-B. ZENG, and R. D. TEASDALE, 1999 Multiple interval mapping for quantitative trait loci. Genetics 152:1203-1216
KRUGLYAK, L. and E. S. LANDER, 1995 Complete multipoint sib-pair analysis of qualitative and quantitative traits. Am. J. Hum. Genet. 57:439-454[Medline].
LYNCH, M., and B. WALSH, 1998 Genetics and Analysis of Quantitative Traits. Sinauer Associates, Sunderland, MA.
MENDELL, N. R. and R. C. ELSTON, 1974 Multifactorial qualitative traits: genetic analysis and prediction of recurrence risks. Biometrics 30:41-57[Medline].
METROPOLIS, N., A. W. ROSENBLUTH, M. N. ROSENBLUTH, A. H. TELLER, and E. TELLER, 1953 Equation of state calculations by fast computing machines. J. Chem. Phys. 21:1087-1091.
RAO, S. and S. XU, 1998 Mapping quantitative trait loci for ordered categorical traits in four-way crosses. Heredity 81:214-224.
REBAI, A., 1997 Comparison of methods for regression interval mapping in QTL analysis with non-normal traits. Genet.



















