Genetics, Vol. 156, 411-422, September 2000, Copyright © 2000

Bayesian Mapping of Quantitative Trait Loci Under the Identity-by-Descent-Based Variance Component Model

Nengjun Yia and Shizhong Xua
a Department of Botany and Plant Sciences, University of California, Riverside, California, 92521-0124

Corresponding author: Nengjun Yi, Department of Botany and Plant Sciences, University of California, Riverside, CA 92521., yi{at}genetics.ucr.edu (E-mail)

Communicating editor: J. B. WALSH


*  ABSTRACT
*TOP
*ABSTRACT
*THE GENETIC MODEL
*BAYESIAN MAPPING FOR NORMALLY...
*BAYESIAN MAPPING FOR BINARY...
*A SIMULATION STUDY
*DISCUSSION
*LITERATURE CITED

Variance component analysis of quantitative trait loci (QTL) is an important strategy of genetic mapping for complex traits in humans. The method is robust because it can handle an arbitrary number of alleles with arbitrary modes of gene actions. The variance component method is usually implemented using the proportion of alleles with identity-by-descent (IBD) shared by relatives. As a result, information about marker linkage phases in the parents is not required. The method has been studied extensively under either the maximum-likelihood framework or the sib-pair regression paradigm. However, virtually all investigations are limited to normally distributed traits under a single QTL model. In this study, we develop a Bayes method to map multiple QTL. We also extend the Bayesian mapping procedure to identify QTL responsible for the variation of complex binary diseases in humans under a threshold model. The method can also treat the number of QTL as a parameter and infer its posterior distribution. We use the reversible jump Markov chain Monte Carlo method to infer the posterior distributions of parameters of interest. The Bayesian mapping procedure ends with an estimation of the joint posterior distribution of the number of QTL and the locations and variances of the identified QTL. Utilities of the method are demonstrated using a simulated population consisting of multiple full-sib families.


THE identity-by-descent (IBD)-based variance component analysis is a powerful statistical method for quantitative trait loci (QTL) mapping in outbred populations, such as humans. This method requires fewer assumptions than other methods with regard to the genetic model underlying the expression of the trait in question. For instance, knowledge of the actual genetic mechanism of the trait, such as the number of loci, the number of alleles per locus, the allelic frequencies, or the marker linkage phases, is not absolutely required (GOLDGAR 1990 Down; SCHORK 1993 Down; AMOS 1994 Down; FULKER and CARDON 1994 Down; XU and ATCHLEY 1995 Down; ALMASY and BLANGERO 1998 Down). Conventionally, the method decomposes the overall genetic variance into several variance components, one being due to the segregation of a putative QTL at the position being tested and the other due to the effect of a polygenic term (the collective effects of all other quantitative loci affecting the trait). The key to separating the contribution of a putative QTL from that of the polygene is the differentiated proportion of IBD alleles shared by relatives at the QTL and the polygene. The IBD proportion varies from one locus to another, which provides the capability of locating QTL on the chromosome.

Existing methods of QTL mapping under the IBD-based variance component model are developed under a single QTL model. When multiple QTL exist in the same chromosome, a proportion of effects of QTL not included in the model is confounded with the effect of the QTL fitted in the model and the remaining proportion is absorbed into the polygenic component. As a consequence, the single-QTL model can lead to biased estimates of QTL positions and effects because of the interference between QTL located on the same chromosome (e.g., HALEY and KNOTT 1992 Down; GRIGNOLA et al. 1996 Down). In theory, effects of multiple QTL can be simultaneously fitted in the same model, but this can be difficult to implement in practice because even the number of QTL is unknown. For line-crossing experiments, JANSEN 1993 Down and ZENG 1994 Down developed the idea of composite interval mapping in which selected markers in untested regions are fitted in the model as cofactors to absorb effects of background QTL. Recently, KAO et al. 1999 Down developed a multiple interval mapping (MIM) approach, designed particularly for multiple QTL in line crosses. Yet, extension of the MIM to the IBD-based variance component approach under the maximum-likelihood framework is not straightforward.

Almost all methods of QTL mapping under the IBD-based variance component model are developed for normally distributed traits. However, many complex human diseases, such as breast cancer and type I diabetes, are dichotomous or binary. Although these traits have a simple qualitatively expressed phenotype, their genetic architectures are generally complex, involving multiple genetic factors. Furthermore, the expression of the phenotype is often sensitive to environmental variation. As a consequence, these traits are usually called complex binary diseases and are commonly formulated under a threshold model. This model assumes a latent continuous variable (called the liability) controlling the expression of the binary trait (LYNCH and WALSH 1998 Down). Methods of QTL mapping under the threshold model have been developed in line crosses (HACKETT and WELLER 1995 Down; VISSCHER et al. 1996 Down; XU and ATCHLEY 1996 Down; REBAI 1997 Down; RAO and XU 1998 Down). YI and XU 1999A Down, YI and XU 1999B Down recently developed a random model approach to directly estimate and test the QTL variances in outbred populations. Under a single-QTL model, DUGGIRALA et al. 1997 Down investigated the IBD-based variance component method using the Mendell-Elston algorithm (MENDELL and ELSTON 1974 Down) to approximate the likelihood function.

Bayes methods of QTL mapping have been developed, in particular, for detection of multiple QTL (SATAGOPAN and YANDELL 1996 Down; SATAGOPAN et al. 1996 Down; UIMARI and HOESCHELE 1997 Down; HEATH 1997 Down; SILLANPAA and ARJAS 1998 Down, SILLANPAA and ARJAS 1999 Down; STEPHENS and FISCH 1998 Down). In Bayesian analysis, Markov chain Monte Carlo (MCMC) methods are commonly used to evaluate complex integrals to summarize posterior distributions of all unknowns. A recent development in MCMC methodology is the reversible jump algorithm, an extension of the Metropolis-Hastings sampler, which permits posterior samples to be collected from posterior distributions with varying dimensions (GREEN 1995 Down). Bayes methods, implemented via reversible jump MCMC, can yield posterior densities for not only the QTL locations and the corresponding effects of a specified number of QTL but also the QTL number itself. The reversible jump MCMC has been used to map QTL for normally distributed traits in both line-crossing experiments (SATAGOPAN and YANDELL 1996 Down; SILLANPAA and ARJAS 1998 Down, SILLANPAA and ARJAS 1999 Down; STEPHENS and FISCH 1998 Down) and complex pedigrees (HEATH 1997 Down; UIMARI and HOESCHELE 1997 Down).

In this article, we develop a Bayes method to map QTL for both normally distributed and binary traits under the IBD-based variance component model. We treat the additive and dominance effects of QTL as random variables so that their variances are directly estimated. The proposed method is implemented via the reversible jump MCMC, where we allow simultaneous estimation of the number of QTL and the locations and variances of the identified QTL.


*  THE GENETIC MODEL
*TOP
*ABSTRACT
*THE GENETIC MODEL
*BAYESIAN MAPPING FOR NORMALLY...
*BAYESIAN MAPPING FOR BINARY...
*A SIMULATION STUDY
*DISCUSSION
*LITERATURE CITED

Linear model for normally distributed traits:
Consider n individuals in a population of interest. The population is assumed to consist of many independent families. For convenience of presentation, only full-sib families are considered with phenotypic values of the parents excluded from the data. Parents of these full-sib families are randomly sampled from a large outbred population in Hardy-Weinberg and linkage equilibrium. Extended pedigree structures can be handled following the same procedure as described for full-sib families.

Let y represent an n x 1 vector for the observed phenotypic values. When the observed phenotype is controlled by multiple genes acting independently, y can be described by the linear model

(1)

with E(aj) = E(dj) = E(g) = E(e) = 0, Var(aj) = {Pi}j{sigma}2aj, Var(dj) = {Delta}j{sigma}2dj, Var(g) = A{sigma}2A, and Var(e) = I{sigma}2e, where ß is a p x 1 vector of covariate effects (fixed effects, including the overall mean), X is an n x p design matrix relating ß to y, l is the number of QTL on the chromosome(s) (or chromosomal segments) of interest, g is an n x 1 vector of the additive effects of the polygene (collective additive effects of all QTL residing on other chromosomes), e is an n x 1 vector of residual errors, aj and dj are n x 1 vectors for the additive and dominance effects of the genotypes at the jth QTL, respectively, {Pi}j = ({pi}jii')nxn is an IBD matrix with element {pi}jii' being the proportion of genes IBD shared by individuals i and i' at the jth QTL, {Delta}j = ({delta}jii')nxn is a matrix with element {delta}jii' indicating whether individuals i and i' share two IBD alleles at the jth QTL, {sigma}2aj and {sigma}2dj are the additive and dominance variances of the jth QTL, respectively, A = (Aii')nxn is the additive genetic relationship matrix with element Aii' being twice the coancestry coefficient between offspring i and i' (not conditional on marker information), {sigma}2A is the additive variance of the polygene (see Table 1), I is an identity matrix, and {sigma}2e is the residual variance. Note that the dominance effect of the polygene is assumed to be absent in model (1).


 
View this table:
In this window
In a new window

 
Table 1. Description of symbols used in the text

The expectation and variance-covariance matrix of y are

(2)

and

(3)

respectively. When families are independent, there are no covariances between effects of members from different families. Therefore, the matrices {Pi}j, {Delta}j, and A are all blockdiagonal. If individuals i and i' are full sibs, {pi}jii' takes one of the four states {(0 + 0)/2, (0 + 1)/2, (1 + 0)/2, (1 + 1)/2}. If i and i' are from different families, then {pi}jii' = 0. The four states show that individuals i and i' share no alleles, share the paternal but not the maternal alleles, share the maternal but not the paternal alleles, and share both alleles identity-by-descent, respectively. Similarly, {delta}jii' takes one of the two values {1, 0}. If individuals i and i' share both alleles IBD, {delta}jii' = 1; otherwise, {delta}jii' = 0. Note that {pi}jii' and {delta}jii' vary from sib pair to sib pair but element Aii' of matrix A is a constant across all sib pairs. Because of this, {sigma}2aj, {sigma}2dj, and {sigma}2A can be separated, which is the theoretical basis of the variance component model of QTL mapping.

Multipoint inference of the IBD matrices {Pi}j and {Delta}j:
In linkage analysis, the IBD matrix of a QTL is not observable because the QTL genotype cannot be seen. In full-sib families without inbreeding, the diagonal elements of the matrices {Pi}j and {Delta}j are unity, but the off-diagonal elements vary depending on how many IBD alleles are shared by the two siblings. Thus the rationale is to infer the distributions of IBD variables {pi}jii' and {delta}jii' using markers in the same linkage group. In outbred populations, markers may be partially informative, and thus it is important to use a multipoint method to extract the maximum amount of marker information. A number of multipoint methods have been proposed to calculate the distributions of IBD (FULKER et al. 1995 Down; KRUGLYAK and LANDER 1995 Down; ALMASY and BLANGERO 1998 Down; XU and GESSLER 1998 Down). The method of KRUGLYAK and LANDER 1995 Down is the most efficient with regard to extracting the maximum amount of information from markers. However, their method is also the most intensive in terms of computational time and complexity. The method of FULKER et al. 1995 Down, on the other hand, is an approximate method, but it is a much simpler and faster algorithm. The method of XU and GESSLER 1998 Down is a compromise between the two. The multipoint method used in this study is a modified version of XU and GESSLER 1998 Down and is described below.

Consider M ordered markers on the chromosome of interest. If a marker is fully informative, the IBD states of the marker shared by sibs are observed. Otherwise, the probabilities of IBD states of a marker can be inferred based on the observed genotypes of this marker (XU and GESSLER 1998 Down). Denote the probabilities of IBD states of marker locus k by pk00, pk01, pk10, pk11, respectively, for the four states. Assume that the QTL is located between marker k and k + 1 for 1 <= k <= M - 1. What we want is to calculate the probabilities of IBD states of the QTL conditional on the probabilities of IBD states of all markers, i.e., Pr({pi}qii'|{pi}1ii', ... , {pi}Mii'). When the marker IBD states are not directly observed, Pr({pi}qii'|{pi}1ii', ... , {pi}Mii') should be denoted by Pr({pi}qii'|IM), where IM means the marker information. From Bayes' theorem, the probability of IBD states of QTL given the marker information is

where Pr({pi}qii') is the prior distribution of the IBD state, equal to 1/4 for each of the four possible IBD states. After some algebraic manipulations, we have



and

where 1 = (1 1 1 1)T, Dk = diag(pk00 pk01 pk10 pk11), D(00) = diag(0 0 0 1), D(01) = diag(0 0 1 0), D(10) = diag(0 1 0 0), D(11) = diag(1 0 0 0), and Tkl is the transition matrix between {pi}kii' and {pi}lii',

where {Psi}kl = r2kl + (1 - rkl)2 and rkl is the recombination fraction between loci k and l.

Since {delta}qii' = 1 if {pi}qii' = 1, and {delta}qii' = 0 otherwise, at the time when the probability distribution of {pi}qii' is calculated, that of {delta}qii' is also generated as a by-product. The conditional expectations of {pi}qii' and {delta}qii' are calculated by

and

Both the conditional distributions and conditional expectations of the IBD matrices {{Pi}j} and {{Delta}j} can be used in the Bayesian analysis of QTL mapping. With the conditional distribution method, the states of {{Pi}j} and {{Delta}j} have to be sampled from their conditional distributions in the Bayesian procedure. In a maximum-likelihood (ML) analysis of IBD-based variance components, GESSLER and XU 1996 Down discovered that the conditional distribution and expectation methods have virtually no difference, but the conditional expectation method is computationally much more efficient than the conditional distribution method. Because of this, we use only the conditional expectations in the proposed method, i.e., replacing {{Pi}j} and {{Delta}j} by their conditional expectations.


*  BAYESIAN MAPPING FOR NORMALLY DISTRIBUTED TRAITS
*TOP
*ABSTRACT
*THE GENETIC MODEL
*BAYESIAN MAPPING FOR NORMALLY...
*BAYESIAN MAPPING FOR BINARY...
*A SIMULATION STUDY
*DISCUSSION
*LITERATURE CITED

The observables are the phenotypic values y = {yi}ni=1, the covariate data X, and the marker data IM. The locations of markers on chromosomes are known a priori. The list of unobservables contains the number of QTL l, the locations of QTL {lambda} = {{lambda}j}lj=1, and the model effects {theta} = (ß, {sigma}2a1, · · ·, {sigma}2al, {sigma}2d1, · · · , {sigma}2dl, {sigma}2A, {sigma}2e), where {lambda}j denotes the distance of the jth QTL from one end of the chromosome in which the QTL resides. With the IBD-based variance component approach, the polygenic effects and the additive and dominance effects have been integrated out in the likelihood, and, therefore, we do not have to generate them in the MCMC process.

From Bayes' theorem, the joint posterior distribution of the parameters {l, {lambda}, {theta}} given the observables and prior information is

(4)

Here, we have suppressed the notation for conditional on the observed markers and covariate data. The first term in Equation 4 is the conditional distribution of the phenotypic data given all unknowns, which is usually called the likelihood, and has the following form:

(5)

The second term in Equation 4 is the joint prior distribution of l, {lambda}, and {theta}, the parameters of our interest. Assuming prior independence of the parameters, we can factorize the joint prior distribution p(l, {lambda}, {theta}) into the following products:

(6)

The prior distribution of the number of QTL, l, is assumed to be truncated Poisson with mean µl and a predefined maximum number L. When no information regarding the QTL locations is available, the prior of {lambda}j takes the uniform distribution on the chromosome(s) into consideration. The prior distributions of ß, {sigma}2A, {sigma}2e, {sigma}2aj, and {sigma}2dj are assumed to be uniforms on predefined intervals, although other priors can be used. The lower and upper bounds for all variance components are usually set to zero and the phenotypic variance present in the data, respectively.

A Markov chain Monte Carlo method is used to generate the joint posterior distribution of all unknowns given in Equation 4. The idea of MCMC is to simulate a random walk in the space of the unknowns. The random walk eventually converges to a stationary distribution. The stationary distribution represents the posterior distribution of the unknowns. Various approaches have been suggested to conduct the MCMC. The Metropolis-Hastings algorithm is a general term for a family of Markov chain simulation methods that are useful for drawing samples from Bayesian posterior distributions (HASTINGS 1970 Down). The Metropolis algorithm and the Gibbs sampler are two commonly used special cases of the Metropolis-Hastings algorithm (METROPOLIS et al. 1953 Down; GEMAN and GEMAN 1984 Down). The reversible jump MCMC is an extension of the Metropolis-Hastings sampler, permitting posterior samples to be collected from posterior distributions with varying dimensions (GREEN 1995 Down). With our Bayesian analysis, the number of QTL is treated as an unobservable, which naturally leads us to consider the problem within the general framework of variable dimensional parameter estimation.

The proposed MCMC algorithm starts from an initial point (l0, {lambda}0, {theta}0) and proceeds to update each of the unknowns in turn. For an initial QTL position, we can calculate the corresponding expectations of the IBD matrices using the multipoint method. Updating {theta} and {lambda} is implemented using the Metropolis algorithm. Updating the QTL number l requires a change in the dimension of the model and thus needs a reversible jump step. More specifically, given the current state of (l, {lambda}, {theta}), we proceed with the MCMC as follows.

  • Updating {theta}: All elements of {theta} are updated simultaneously. New proposals for the elements of {theta} are sampled from the symmetric uniform densities around their previous values. The proposals are accepted simultaneously with probability

    (7)

  • where {theta}* represents the proposed {theta}. If the proposal is not accepted, then the state remains unchanged.

  • Updating {lambda} and l: we adopt the method of SILLANPAA and ARJAS 1998 Down, SILLANPAA and ARJAS 1999 Down to implement the reversible jump MCMC. They used three different types of moves: (1) modify the QTL locations when the number of QTL is unchanged; (2) add one new QTL to the model; and (3) delete an existing QTL from the model. If l = 0, we cannot delete a QTL. On the other hand, if l = L, the maximum number of QTL assigned a priori, we cannot add a QTL. When 0 < l < L, the different types of moves have proposal probabilities of pm = , pa = , or pd = to remain unchanged, to add, or to delete a QTL, respectively. Note that other values of pm, pa, and pd can be used.

When the number of QTL remains unchanged, we modify the locations of the existing QTL. Following SILLANPAA and ARJAS 1998 Down, SILLANPAA and ARJAS 1999 Down, we do not fix the order of QTL when updating the QTL locations. Elements of {lambda} are modified one at a time using the Metropolis algorithm. For the jth QTL, a proposal {lambda}*j is sampled from a uniform distribution on the interval [{lambda}j - d, {lambda}j + d], where d is the tuning parameter. The expectations of the IBD matrices for new location {lambda}*j, denoted by {Pi}*j and {Delta}*j, are then formed according to the new position ({lambda}*j) and the marker information. The proposal is accepted with probability

(8)

where {lambda}-j means all elements of {lambda} excluding the jth element,

and

If the proposal is not accepted, the state remains unchanged, and the algorithm proceeds to update the location of the next QTL.

When pa happens, we add a QTL to the model. The location of the new QTL, {lambda}l+1, is proposed from the uniform density on the chromosome(s) under consideration. The expectations of the IBD matrices for the new QTL, denoted by {Pi}l+1 and {Delta}l+1, are then formed according to the new position ({lambda}l+1) and the marker information. The additive and dominance variances of the new QTL, {sigma}2al+1 and {sigma}2dl+1, are drawn from their priors. The proposal is accepted with probability

(9)

where V* = {Sigma}lj=1{Pi}j{sigma}2aj + {Pi}l+1{sigma}2al+1 + {Sigma}lj=1{Delta}j{sigma}2dj + {Delta}l+1 {sigma}2dl+1 + A{sigma}2A + I{sigma}2e. If the proposal is accepted, we add a QTL to the model, and the new QTL location and the variances are all accepted simultaneously; otherwise, the QTL number remains unchanged.

When pd happens, we delete a QTL from the model, each existing QTL being equally likely to be deleted. If the j'th existing QTL is proposed to be deleted from the model, the acceptance probability is

(10)

where V* = {Sigma}lj=1{Pi}j{sigma}2aj - {Pi}j'{sigma}2aj' + {Sigma}lj=1{Delta}j{sigma}2dj - {Delta}j'{sigma}2dj' + A{sigma}2A + I{sigma}2e. If the proposal is accepted, we delete a QTL from the model. Otherwise, the number of QTL remains unchanged.

In Equation 9 and Equation 10, the first term of the acceptance ratio is the likelihood ratio, and the second term contains the prior ratio and the proposal ratio. Note that here the Jacobian of the transformation is one because adding one QTL or deleting one QTL does not influence the parameters of the other QTL.


*  BAYESIAN MAPPING FOR BINARY TRAITS
*TOP
*ABSTRACT
*THE GENETIC MODEL
*BAYESIAN MAPPING FOR NORMALLY...
*BAYESIAN MAPPING FOR BINARY...
*A SIMULATION STUDY
*DISCUSSION
*LITERATURE CITED

For a complex disease trait, the observed phenotype is defined in a binary fashion, i.e., wj = 1 if an individual is affected, and wj = 0 otherwise. We propose an underlying normal variable yj that determines the binary observation. The link between yj and wj is through a threshold t, i.e.,

(11)

The liability y is nothing new but an unobservable quantitative trait that can be described by Equation 1. The threshold model is overparameterized so that some constraints must be superimposed. As usual, we take t = 0 and {sigma}2e = 1 (ALBERT and CHIB 1993 Down). The expectation of the liability y is the same as that of Equation 2, but the variance-covariance matrix is

(12)

In the threshold model, the observables are the binary phenotypic values w = {wi}ni=1, the covariate data, and the marker data. The unobservables include the vector of liability y = {yi}ni=1, the number of QTL l, the locations of QTL {lambda} = {{lambda}j}lj=1, and the model effects {theta} = (ß, {sigma}2a1, · · · , {sigma}2al, {sigma}2d1, · · · , {sigma}2dl, {sigma}2A).

The joint posterior distribution of the unobservables, {y, l, {lambda}, {theta}}, after suppressing the notation for conditioning on the marker data and the covariate data, can be expressed as

(13)

where the likelihood has the form of

(14)

where 1(X {isin} A) is the indicator function, taking the value of 1 if X is contained in A, and 0 otherwise. Note that p(w|y, l, {lambda}, {theta}) = p(w|y) because w is solely determined by y. The second term in Equation 13 is the conditional distribution of the liability given all other unknowns and has the same form as Equation 5. Finally, the last term in Equation 13 is the joint prior distribution of l, {lambda}, and {theta}.

From this joint posterior density, it can be seen that the posterior distribution of the parameters l, {lambda}, and {theta} can be computed using the method for normally distributed data if the liability y is known. To implement the MCMC algorithm for binary traits, therefore, we need to generate the liability y. Since it is difficult to simulate directly from the conditional distribution of y given w and other parameters, we use the method of the Gibbs sampler (CHAN and KUK 1997 Down). The details are as follows.

Starting with some initial values y(0) = (y(0)1, · · ·, y(0)n)T consistent in signs with the observed data w, we proceed to generate y(t+1) = (y(t+1)1, · · · , y(t+1)n)T, t = 0, 1 · · ·, sequentially in the following manner. Given the current state (y(t), l(t), {lambda}(t), {theta}(t)) of all unknowns, we simulate

{Vvdash}

{Vvdash}

To carry out the above simulations, we use the fact

Since y = (y1, · · · , yn)T has a N(Xß, V) distribution, it follows that f(yi|yj, j != i; l, {lambda}, {theta}) is also normal with mean

and variance

where y-i = (y1, · · ·, yi-1, yi+1, · · ·, yn)T. Thus, f(yi|yj, j != i; wi, l, {lambda}, {theta}) is a truncated (above 0 if wi = 1; below 0 if wi = 0) normal distribution. Devroye's algorithm is then used to simulate a truncated normal variable (DEVROYE 1986 Down).


*  A SIMULATION STUDY
*TOP
*ABSTRACT
*THE GENETIC MODEL
*BAYESIAN MAPPING FOR NORMALLY...
*BAYESIAN MAPPING FOR BINARY...
*A SIMULATION STUDY
*DISCUSSION
*LITERATURE CITED

Design of the simulation experiment:
The proposed method is illustrated in the analysis of a simulated example. We simulated a single chromosome of length 100 cM. Eleven codominant markers were placed on the chromosome with a marker distance of 10 cM. Four equally frequent alleles were simulated at each marker locus. Two QTL residing at positions 25 and 75 cM, respectively, and 12 additional independent loci of equal effect (called polygene) were simulated for a continuous phenotype. The dominance effect of the polygene was assumed to be absent. The variances of the first QTL were {sigma}2a1 = {sigma}2d1 = 0.6 and those of the second QTL were {sigma}2a2 = {sigma}2d2 = 0.4. The polygenic variance was set at {sigma}2A = 1.0. The normally distributed phenotype of each individual took the sum of an overall mean, values of QTL additive and dominance effects, polygenic effect, and a residual error sampled from a standardized normal distribution. In addition, we generated a binary phenotype for each individual using the normally distributed phenotypic value as the underlying liability. The binary phenotype took 1 if the liability exceeded 0, and 0 otherwise. The overall mean was set to 0, which led to 50% population incidence. We simulated 500 full-sib families, each with 6 siblings (a total of 3000 individuals). For comparison, we analyzed both the normal data and the binary data. No phenotypic values and marker linkage phases were recorded for all parents.

In each of the MCMC analyses, we ran a single long chain with 5 x 105 cycles of simulations. The first 200 samples were discarded and the chain was then thinned (we saved one iteration in every 50 cycles) to reduce serial correlation in the stored samples so that the total number of samples kept in each analysis was 104.

The initial value for the QTL number was set at 1 and the corresponding initial location was given at the 50-cM position of the chromosome. The prior Poisson mean of the number of QTL was µl = 2 and the maximum number of QTL was L = 4. The starting values were 0.0 for the overall mean, 1.0 for both the polygenic and residual variances, and 0.5 for both the additive and dominance variances of the QTL. The prior for the overall mean was uniform over the range [-2.0, 2.0]. The priors for all variance components were chosen as uniform on [0.0, 4.0], the right endpoint being equal to the true phenotypic variance of the liability. The prior for the QTL locations was uniform over the whole chromosome. The tuning parameter of the proposal distribution was set at 2 cM for QTL locations and 0.1 for the overall mean and all variance components.

We used the QTL intensity function of SILLANPAA and ARJAS 1998 Down to detect the number and locations of QTL. In practice, we divided the chromosome into intervals of equal length (according to the Haldane distance) and then calculated the proportion of QTL in each interval from the MCMC samples. The interval length was chosen to be 1 cM long. As done in STEPHENS and FISCH 1998 Down and SILLANPAA and ARJAS 1999 Down, the QTL variances were estimated from samples in which QTL locations fall into the regions with sufficiently high estimated QTL intensities.

Results:
The Bayesian posterior QTL intensities for both the normal and the binary data are presented in Fig 1. The QTL intensity graphs are concentrated around the true locations of the simulated QTL. One peak of the posterior QTL intensity for the normal data is on [24, 25] and the other on [76, 77]. The corresponding peaks for the binary data are on [23, 24] and [74, 75], respectively. These results support quite strongly a model having two QTL in the chromosome. Comparing the shapes of the posterior QTL intensities for the binary and normal data analyses, we can see that binary data analysis does lose some information, but the information retained is still sufficient to detect both QTL.



View larger version (19K):
In this window
In a new window
Download PPT slide
 
Figure 1. Histograms of the posterior QTL intensity for (a) normally distributed data and (b) binary data. The true number of QTL is two, located at positions 25 and 75 cM, respectively.

The approximate posterior distributions on the QTL number, obtained from the two types of data, are presented in Table 2. The posterior expectations are essentially the same for the binary data and the normal data and coincide with the simulated number of QTL. As expected, the posterior variance of QTL number for the normal data is smaller than that for the binary data. Finally, the posterior modes of the QTL numbers overlap with the true number of QTL in both data analyses.


 
View this table:
In this window
In a new window

 
Table 2. Inferred posterior distribution and posterior mean of the number of QTL

The posterior samples for the overall mean, the polygenic variance, and the residual variance from the normal data analysis, and those for the overall mean and the polygenic variance from the binary data analysis, are depicted in Fig 2 and Fig 3, respectively. The posterior means and standard errors of these parameters are given in Table 3. From the figures and the table, it appears that estimates of these parameters are close to the simulated values with small standard errors from the normal data. However, the polygenic variance is greatly overestimated and the estimated standard error is large from the binary data analysis although the mean is still estimated accurately.



View larger version (26K):
In this window
In a new window
Download PPT slide
 
Figure 2. Approximate posterior distributions of the (a) overall mean, (b) polygenic variance, and (c) residual variance for normally distributed data.



View larger version (29K):
In this window
In a new window
Download PPT slide
 
Figure 3. Approximate posterior distributions of the (a) overall mean and (b) polygenic variance for binary data.


 
View this table:
In this window
In a new window

 
Table 3. The highest posterior QTL intensity intervals, Bayesian estimates of QTL locations, the additive and dominance variances of the QTL, and polygenic and residual variances

The chromosome regions with sufficiently high posterior QTL intensity are given in Table 3. The posterior samples, in which QTL locations fall into these regions, are used to estimate the QTL variances. Fig 4 and Fig 5 depict such posterior samples for the QTL variances from the normal data and the binary data analyses, respectively. The posterior densities of QTL variances are concentrated around the corresponding true values for both types of analyses, although the binary data analysis shows a wider distribution. More explicitly, we calculated the means and the standard errors of the posterior samples for the QTL variances (see Table 3). For the normal data, the estimated posterior means of the additive and dominance variances of the QTL are close to the simulated true values, and the standard errors are small. In the binary data analysis, however, the additive and dominance variances of the QTL are overestimated, and the standard errors are larger than those for the normal data.



View larger version (42K):
In this window
In a new window
Download PPT slide
 
Figure 4. Approximate posterior distributions of the QTL additive and dominance variances for normally distributed data. (a) The first QTL additive variance determined from interval 18 cM to ~29 cM. (b) The first QTL dominance variance determined from interval 18 cM to ~29 cM. (c) The second QTL additive variance determined from interval 72 cM to ~84 cM. (d) The second QTL dominance variance determined from interval 72 cM to ~84 cM.



View larger version (52K):
In this window
In a new window
Download PPT slide
 
Figure 5. Approximate posterior distributions of the QTL additive and dominance variances for binary data. (a) The first QTL additive variance determined from interval 17 cM to ~30 cM. (b) The first QTL dominance variance determined from interval 17 cM to ~30 cM. (c) The second QTL additive variance determined from interval 64 cM to ~86 cM. (d) The second QTL dominance variance determined from interval 64 cM to ~86 cM.

The point estimates and the estimation errors of the QTL locations are also given in Table 3. For both types of data, the estimated QTL locations are very close to the corresponding true values. However, the standard errors are much larger for the binary data than those for the normal data.

For comparison, the simulated normal data set was analyzed using the ML method (XU and ATCHLEY 1995 Down). Fig 6 shows the likelihood profile along the chromosome when a single QTL was fitted to the model. We can see that the likelihood-ratio profile shows two peaks, the higher peak being at position 25 cM and the lower peak at position 76 cM, overlapping with the true locations of the two simulated QTL (at 25 and 75 cM, respectively). This result is consistent with results of the reversible jump MCMC analysis. The ML estimates of variances of the first QTL are 2a1 = 0.8253 and 2d1 = 0.8900. The estimated variances of the second QTL are 2a2 = 0.7839 and 2d2 = 0.5667. The ML estimates of QTL variances are not so close to the corresponding true values as the estimated posterior means in the reversible jump MCMC analysis.



View larger version (18K):
In this window
In a new window
Download PPT slide
 
Figure 6. Likelihood-ratio profile of ML mapping from normally distributed data. There are two QTL, located at positions 25 and 75 cM, respectively.


*  DISCUSSION
*TOP
*ABSTRACT
*THE GENETIC MODEL
*BAYESIAN MAPPING FOR NORMALLY...
*BAYESIAN MAPPING FOR BINARY...
*A SIMULATION STUDY
*DISCUSSION
*LITERATURE CITED

We have presented here a Bayesian procedure that allows detection of multiple QTL for both normally distributed and binary traits under IBD-based variance component analysis. The proposed Bayesian procedure is implemented via a reversible jump MCMC algorithm, which enables moves to be made between models with different numbers of QTL. We model complex binary traits under the classical threshold model of quantitative genetics (LYNCH and WALSH 1998 Down). The Bayes method of mapping QTL for binary traits is developed based on the idea of data augmentation. This treatment provides an easy way to generate the values of the normally distributed latent variable, which in turn allows the use of Bayesian mapping for normally distributed traits. The methodology can be easily generalized to multiple ordered categorical traits (ALBERT and CHIB 1993 Down).

Bayesian mapping statistics have been well studied in the context of line-crossing experiments (SATAGOPAN and YANDELL 1996 Down; SATAGOPAN et al. 1996 Down; SILLANPAA and ARJAS 1998 Down, SILLANPAA and ARJAS 1999 Down; STEPHENS and FISCH 1998 Down), in which the parameter of primary interest is the average effect of allelic substitution (the first moment). The variance of the allelic substitution is given as a prior and no further inference is made to the prior variance. When applied to complex pedigrees, the Bayes method can be formulated to fit different models of QTL variation (e.g., biallelic or normal QTL effects; HEATH 1997 Down; UIMARI and HOESCHELE 1997 Down; BINK et al. 1998 Down). In contrast to line-crossing experiments in which the number of alleles at the QTL is known exactly, in outbred populations, the number of alleles per locus is rarely known. In most cases, we can assume two alleles in a relatively homogeneous population, as made by HEATH 1997 Down and UIMARI and HOESCHELE 1997 Down. As a consequence, one must also estimate the allelic frequency to infer the genetic variance via 2a = 2(1 - 2. Although the biallelic model may be proper in most situations, it can potentially fail if the actual number of polymorphic alleles is not two. The IBD-based variance component Bayesian mapping goes up another level in the hierarchical model by treating the variance as the parameter of interest. As a result, information about the number of alleles is not required, which has significantly improved the robustness of the Bayes method.

The IBD-based variance component approach of QTL mapping is based on the proportion of alleles with identity-by-descent shared by relatives. Inference on IBD matrices does not need the information of marker linkage phases. The IBD states of markers are calculated in advance and remain as a prior in the Bayesian procedure. With the proposed method, we can completely avoid the complicated problem concerning sampling of additive and dominance effects of QTL and marker and QTL genotypes. The IBD-based variance component Bayesian mapping is not limited to nuclear families. In the simulation study, we used full-sib families to demonstrate the utility of the method solely for the purpose of simplicity. In theory, the Bayes method described in this article can be applied easily to extended or complex pedigrees by using a general algorithm to compute the IBD proportion shared by an arbitrary relative pair. Such a general algorithm has been given by ALMASY and BLANGERO 1998 Down. For large and complex pedigrees, however, the computational burden may be prohibitive, due to the need for computation of the variance-covariance matrix V and its inverse.

In the simulation study, we put two QTL on a single chromosome and 12 independent loci on other unspecified chromosomes as a "polygene." Our model always includes a polygenic term to absorb background QTL. The polygenic variance varies depending on the model effects fitted. Without this polygenic term, variances due to QTL on unmarked chromosomes or chromosome regions will join the residual variance. A large residual variance will reduce the power of QTL detection. In practice, we can take one of three strategies for QTL mapping. One strategy is to analyze one chromosome at a time. Marker information on other chromosomes can be completely ignored. In this case, variances of QTL on other chromosomes will be collected by the polygene (not the residual). The second strategy is to include the previously identified QTL (at other chromosomes) in the model instead of being absorbed into the polygenic term. The other strategy is to map QTL simultaneously for all chromosomes. In the latter situation, if the marker map is sufficiently dense, the polygenic term can be ignored because eventually all QTL will be identified and their effects will be explicitly estimated. These three mapping strategies may produce similar results.

The reversible jump MCMC sampler performed well for the simulated normally distributed and binary data. A plot of the changes in the number of QTL against the number of iterations for the binary data used in the simulation study is presented in Fig 7. It shows that the reversible jump MCMC algorithm mixes well over the number of QTL. A similar plot was obtained for the normally distributed data. We detected no influence of initial values of unknowns on the mixing of the MCMC sampler. For example, with the simulated data, starting with l0 = 4, the QTL number l quickly dropped to 1 after several hundred iterations and subsequently behaved the same as starting with l0 = 1. The mixing behavior of the number of QTL is greatly affected by the prior distributions for QTL additive and dominance variances (STEPHENS and FISCH 1998 Down). As expected, increasing the upper bounds of the uniform prior distributions has the general effect of decreasing the acceptance rates of adding a new QTL to the model and deleting a QTL from the model. With the upper bound c = 4.0 used in our simulation studies, the observed acceptance rates for both adding and deleting steps were ~0.4% for both the simulated normally distributed and binary data. These acceptance rates were similar to those in SILLANPAA and ARJAS 1998 Down. The acceptance rates for both adding and deleting steps increased to 3% when the upper bound was set at c = 1.0. By comparison, the acceptance proportions for updating QTL locations and model effects {theta} were found to be rather high in general and were ~80% for the simulated data.



View larger version (33K):
In this window
In a new window
Download PPT slide
 
Figure 7. The trace of the number of QTL for the binary data, for 104 saved samples after burn-in.

A major implementation issue in MCMC is to determine the effective sample sizes. This issue is related to the assessment of convergence of the chain, the serial correlation between the samples, and the burn-in period. For a single long chain, one can examine time series graphs of a simulated sequence to judge whether the chain is sufficiently long (GEYER 1992 Down). Subsampling the chain at regular intervals is a common practice to reduce the serial correlation between the samples. With the reversible jump MCMC algorithm, however, it is difficult to calculate autocorrelation of the MCMC samples for all parameters because the dimension keeps changing from one cycle to another. When the dimension changes, the identities of the QTL also change. The parameters in one cycle of the iteration may be different from those in the next cycle of iteration. In our simulation studies, therefore, we empirically determined the lengths of the chain, subsampling intervals and the burn-in period. We used the plots of the changes in the number of QTL against the number of iterations to determine an approximate burn-in period. The length of subsampling intervals was chosen to eliminate obvious changing trends for all parameters.

We only use marker information to infer the IBD matrices of QTL and allow multiple QTL located in the same marker interval in the proposed method. Alternatively, we can infer the IBD matrices of a QTL considered by utilizing the IBD information of all markers and other existing QTL in the same chromosome and assume at most one QTL in any marker interval is acceptable. However, our treatment simplifies the MCMC algorithm and does not result in a significant reduction in efficiency particularly when the marker map is dense. It has been shown that, conditional on the IBD states of a marker locus, the IBD matrices of a QTL on one side of the marker are not correlated to those of a QTL on the other side (XU and ATCHLEY 1995 Down). In case of two tightly linked QTL in the same marker interval, the IBD matrices of these two QTL are always highly correlated no matter what methods are used to infer them. The correlation between two QTL can be completely eliminated by the assumption that any marker interval includes at most one QTL. Under this assumption, the acceptance probabilities for adding one QTL and deleting one QTL should be modified (STEPHENS and FISCH 1998 Down).

Finally, Bayesian statistics have the inherent flexibility introduced by their incorporation of multiple levels of randomness and the resultant ability to combine information from different sources. Therefore, the Bayesian approach could be extended to allow more complicated models under more complicated data structures, e.g., the epistatic model. When a new QTL is proposed, its interactive effects with all existing QTL should be proposed and, if accepted, the epistatic effects should be included in the model. A QTL is finally added to the model if at least one effect caused by the QTL is accepted. This will involve additional reversible jumps on the dimension of the model even if the number of QTL remains the same.


*  ACKNOWLEDGMENTS

We thank Dr. Damian Gessler for his helpful comments on the manuscript. We also thank two anonymous reviewers for their critical comments on an earlier version of the manuscript. This research was supported by the National Institutes of Health Grant GM55321 and the United States Department of Agriculture National Research Initiative Competitive Grants Program 97-35205-5075 to S.X.

Manuscript received December 6, 1999; Accepted for publication May 19, 2000.


*  LITERATURE CITED
*TOP
*ABSTRACT
*THE GENETIC MODEL
*BAYESIAN MAPPING FOR NORMALLY...
*BAYESIAN MAPPING FOR BINARY...
*A SIMULATION STUDY
*DISCUSSION
*LITERATURE CITED

ALBERT, J. H. and S. CHIB, 1993  Bayesian analysis of binary and polychotomous response data. J. Am. Stat. Assoc. 88:669-679.

ALMASY, L. and J. BLANGERO, 1998  Multipoint quantitative-trait linkage analysis in general pedigrees. Am. J. Hum. Genet. 62:1198-1211[Medline].

AMOS, C. I., 1994  Robust variance-components approach for assessing genetic linkage in pedigrees. Am. J. Hum. Genet. 54:535-543[Medline].

BINK, M. G. A. M., L. L. G. JANSS and R. L. QUAAS, 1998 Mapping a polyallelic quantitative trait locus using simulated tempering. Proceedings of the 6th World Congress on Genetics Applied to Livestock Production Science, Armidale, Australia, Vol. 26, pp. 277–280.

CHAN, J. S. K. and A. C. KUK, 1997  Maximum likelihood estimation for probit-linear mixed models with correlated random effects. Biometrics 88:86-97.

DEVROYE, T., 1986 Non-uniform Random Variable Generation. Springer-Verlag, New York.

DUGGIRALA, R., J. T. WILLIAMS, S. WILLIAMS-BLANGERO, and J. BLANGERO, 1997  A variance component approach to dichotomous trait linkage analysis using a threshold model. Genet. Epidemiol. 14:987-992[Medline].

FULKER, D. W. and L. R. CARDON, 1994  A sib-pair approach to interval mapping of quantitative trait loci. Am. J. Hum. Genet. 54:1092-1103[Medline].

FULKER, D. W., S. S. CHERNY, and L. R. CARDON, 1995  Multipoint interval mapping of quantitative trait loci, using sib pairs. Am. J. Hum. Genet. 56:1224-1233[Medline].

GEMAN, S. and D. GEMAN, 1984  Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Machine Intell. 6:721-741.

GESSLER, D. D. G. and S. XU, 1996  Using the expectation or the distribution of identity-by-descent for mapping quantitative trait loci under the random model. Am. J. Hum. Genet. 59:1382-1390[Medline].

GEYER, C. J., 1992  Practical Markov chain Monte Carlo. Stat. Sci. 7:473-511.

GOLDGAR, D. E., 1990  Multipoint analysis of human quantitative genetic variation. Am. J. Hum. Genet. 47:957-967[Medline].

GREEN, P. J., 1995  Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82:711-732[Abstract/Free Full Text].

GRIGNOLA, F. E., I. HOESCHELE, and B. TIER, 1996  Mapping quantitative trait loci via residual maximum likelihood: I. Methodology. Genet. Sel. Evol. 28:479-490.

HACKETT, C. A. and J. I. WELLER, 1995  Genetic mapping of quantitative trait loci for traits with ordinal distributions. Biometrics 51:1252-1263[Medline].

HALEY, C. S. and S. A. KNOTT, 1992  A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69:315-324[Medline].

HASTINGS, W. K., 1970  Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57:97-109[Abstract/Free Full Text].

HEATH, S. C., 1997  Markov chain Monte Carlo segregation and linkage analysis of oligogenic models. Am. J. Hum. Genet. 61:748-760[Medline].

JANSEN, R. C., 1993  Interval mapping of multiple quantitative trait loci. Genetics 135:205-211[Abstract].

KAO, C. H., Z-B. ZENG, and R. D. TEASDALE, 1999  Multiple interval mapping for quantitative trait loci. Genetics 152:1203-1216[Abstract/Free Full Text].

KRUGLYAK, L. and E. S. LANDER, 1995  Complete multipoint sib-pair analysis of qualitative and quantitative traits. Am. J. Hum. Genet. 57:439-454[Medline].

LYNCH, M., and B. WALSH, 1998 Genetics and Analysis of Quantitative Traits. Sinauer Associates, Sunderland, MA.

MENDELL, N. R. and R. C. ELSTON, 1974  Multifactorial qualitative traits: genetic analysis and prediction of recurrence risks. Biometrics 30:41-57[Medline].

METROPOLIS, N., A. W. ROSENBLUTH, M. N. ROSENBLUTH, A. H. TELLER, and E. TELLER, 1953  Equation of state calculations by fast computing machines. J. Chem. Phys. 21:1087-1091.

RAO, S. and S. XU, 1998  Mapping quantitative trait loci for ordered categorical traits in four-way crosses. Heredity 81:214-224.

REBAI, A., 1997  Comparison of methods for regression interval mapping in QTL analysis with non-normal traits. Genet.