- THIS ARTICLE
-
Abstract
- Full Text (PDF)
- Alert me when this article is cited
- Alert me if a correction is posted
- SERVICES
- Similar articles in this journal
- Similar articles in PubMed
- Alert me to new issues of the journal
- Download to citation manager
- Reprints & Permissions
- CITING ARTICLES
- Citing Articles via Google Scholar
- GOOGLE SCHOLAR
- Articles by Yi, N.
- Articles by Xu, S.
- Search for Related Content
- PUBMED
- PubMed Citation
- Articles by Yi, N.
- Articles by Xu, S.
Bayesian Mapping of Quantitative Trait Loci Under Complicated Mating Designs
Nengjun Yia and Shizhong Xuaa Department of Botany and Plant Sciences, University of California, Riverside, California 92521
Corresponding author: Shizhong Xu, Department of Botany and Plant Sciences, University of California, Riverside, CA 92521., xu{at}genetics.ucr.edu (E-mail)
Communicating editor: Y.-X. FU
| ABSTRACT |
|---|
Quantitative trait loci (QTL) are easily studied in a biallelic system. Such a system requires the cross of two inbred lines presumably fixed for alternative alleles of the QTL. However, development of inbred lines can be time consuming and cost ineffective for species with long generation intervals and severe inbreeding depression. In addition, restriction of the investigation to a biallelic system can sometimes be misleading because many potentially important allelic interactions do not have a chance to express and thus fail to be detected. A complicated mating design involving multiple alleles mimics the actual breeding system. However, it is difficult to develop the statistical model and algorithm using the classical maximum-likelihood method. In this study, we investigate the application of a Bayesian method implemented via the Markov chain Monte Carlo (MCMC) algorithm to QTL mapping under arbitrarily complicated mating designs. We develop the method under a mixed-model framework where the genetic values of founder alleles are treated as random and the nongenetic effects are treated as fixed. With the MCMC algorithm, we first draw the gene flows from the founders to the descendants for each QTL and then draw samples of the genetic parameters. Finally, we are able to simultaneously infer the posterior distribution of the number, the additive and dominance variances, and the chromosomal locations of all identified QTL.
THE availability of dense molecular marker maps provides a large opportunity to locate genes responsible for variation of quantitative traits in plants, animals, and humans. Experimental design and methodology are two important issues in quantitative trait loci (QTL) mapping. Most QTL mapping techniques require designed line crosses, e.g., F2 or backcross (BC). These crosses do not exist in natural populations and are not commonly used in some plant species in the breeding industry. It is not economical to design such line cross experiments solely for the purpose of QTL mapping if these crosses are not regularly used in a breeding program. Almost all natural populations and most domesticated plant populations consist of complicated pedigree structures. Even if inbred lines are used, different crosses may be connected by some common ancestors. A mating design combining information from multiple crosses is more powerful than one involving a single cross (![]()
![]()
![]()
![]()
![]()
A variety of methods have been developed for QTL mapping (![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
Bayesian analysis is preferable because of its convenience and flexibility in the use of full pedigrees and mapping multiple QTL, although it is computationally very demanding. Bayesian mapping fully takes into account the uncertainties associated with all unknowns in the QTL mapping problem, including the number and locations of QTL, effects of QTL, and the genotypes of markers and QTL. In plant line-crossing experiments, Bayesian mapping has been developed by using the Markov chain Monte Carlo algorithm, in particular, for detection of multiple QTL (![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
There are several differences between plants and animals or humans in the context of general pedigrees: (i) many plant species are self-compatible and one must deal with a system that involves a mixture of selfing and outcrossing; (ii) inbreeding and line crossing are common mating designs in most plant breeding populations; (iii) a breeding population of plants usually contains fewer founders than an animal or human population and the founders can be pure inbred lines; and (iv) family sizes of plants are usually large compared with animals and humans. These differences provide additional opportunities for detecting QTL. Unfortunately, the Bayesian mapping methods developed for human and animal pedigrees cannot handle the unique properties for plant pedigrees. This poses unique challenges for plant geneticists to develop new QTL mapping statistics.
In this article, we develop a Bayesian method of QTL mapping under arbitrarily complicated mating designs, including a group of independent or related F2 or backcross populations and complicated multiple-generation cross populations derived from inbred or outbred founders. The method is so flexible that it can handle a variety of genetic models, such as arbitrary number of QTL alleles, dominance effects, and fixed and random models. The Bayesian method is implemented via a reversible jump Markov chain Monte Carlo (MCMC) algorithm, which allows simultaneous estimation of the number, the locations, and effects of identified QTL.
| STATISTICAL METHODS |
|---|
Mixed model:
Assume that the mapping population consists of n individuals with arbitrary pedigree relationships, and among the n individuals there are m founders and (n - m) nonfounders. The whole population may consist of a single large pedigree or multiple independent pedigrees. A founder in a pedigree is defined as an individual with no parents included in the pedigree. In contrast, a nonfounder is defined as an individual with both parents included in the pedigree. Founders are assumed to be unrelated but can be inbred. The descendants may be related in an arbitrary way.
Let y represent an n x 1 vector for the observed values of a quantitative trait. When the trait is controlled by multiple genes acting independently, y can be described by the linear model
![]() |
(1) |
where X is a known design matrix for a vector of nongenetic effects b (including the overall mean), l is the number of QTL on all chromosomes, upj and umj are n x 1 vectors for the paternal and maternal allelic effects for the jth QTL, vj is an n x 1 vector for the dominance effects for the jth QTL, and e is the vector of residual (environmental) effects. This model is written in the original form of the animal model (![]()
Denote aj as a 2m x 1 vector for the effects of the founder alleles (with m ancestors, each with two alleles) and dj as a vector of interaction effects (dominance effects) between all possible pairs of the 2m founder alleles at the jth QTL. The dimension of dj is m(2m + 1). The dimension of dj can be reduced greatly in some mating designs where it is impossible for some founder alleles to be combined in any descendant. The QTL effects of all individuals can be expressed as linear functions of the allelic effects and their interactions in the founders, i.e., upj = Zpjaj, umj = Zmjaj, and vj = Wjdj, leading to
![]() |
(2) |
This model is written in the form of a reduced animal model (![]()
2e) distribution.
The observables in model (2) include the phenotypic values, y = {yi}ni=1, the covariate X, and the marker data M*. The marker data include the locations of markers on chromosomes and the observed (possibly incomplete) marker genotypes. The observed marker genotypes in some individuals may not be fully informative and the patterns of allelic inheritance of such markers may also be unknown. The list of unobservables includes the number of QTL l, the QTL locations
= {
j}lj=1, the complete marker genotype matrix M, the QTL allelic inheritance matrices Zp = {Zpj}lj=1 and Zm = {Zmj}lj=1, the QTL allelic effects a = {aj}lj=1, the QTL dominance effects d = {dj}lj=1, and the residual variance
2e. The location parameter,
j, is expressed as the distance of the jth QTL from one end of the chromosome. The complete marker genotype means marker genotype with known linkage phase. A complete genotype for a nonfounder means a known allelic inheritance pattern. The QTL dominance design matrices are suppressed in the list of unknowns because they are completely determined by the QTL allelic inheritance matrices.
In a Bayesian framework, the unknowns in the model are considered to be drawn from appropriate prior distributions. The joint posterior distribution of all unobservables
= {l,
, a, d, b, M, Zp, Zm,
2e} given the observables {y, X, M*} and prior information can be expressed as
![]() |
(3) |
The likelihood function p(y|
) depends on the distribution of y. For normally distributed traits, it has the form
![]() |
(4) |
where R = y - Xb -
lj=1(Zpj + Zmj)aj -
lj=1Wjdj.
The priors of the allelic and dominance effects of QTL, p(aj) and p(dj), depend on the inbreeding coefficients of the founders (see Appendix). The inclusion of inbreeding coefficients of founders enables the proposed method to treat inbred founders. The prior distribution of the number of QTL, p(l), is assumed to be a truncated Poisson distribution with mean µ and a predefined maximum number lmax. When no information regarding the locations is available, the prior probability that a QTL is on a chromosome is proportional to the length of the chromosome. Within a chromosome, each QTL has a uniform distribution of residing at any location on that chromosome. The prior distributions of b and
2e are assumed to be uniform on predefined intervals, although other priors can be used. Finally, p(Zp, Zm, M|l,
, M*) is the joint conditional distribution of QTL allelic inheritance matrices and complete marker genotypes.
In some situations, we need to add an extra layer to the hierarchical model. The distributions p(aj) and p(dj) depend on other unknown quantities
2aj and
2dj, the allelic and dominance variance of the jth QTL. In other words, we replace p(aj) by p(aj,
2aj) = p(aj|
2aj)p(
2aj) and p(dj) by p(dj,
2dj) = p(dj|
2dj)p(
2dj). The parameters of interest now are
2aj and
2dj, with aj and dj being treated as missing values. The joint posterior distribution of all variables is then factorized as
![]() |
(5) |
where Va = {
2aj}lj=1, Vd = {
2dj}lj=1, and the distributions p(aj|
2aj) and p(dj|
2dj) are multivariate normal as given in the Appendix We use uniform prior distributions for p(
2aj) and p(
2dj) within some predetermined intervals. Other terms in Equation 5 are the same as in Equation 3.
Equation 3 and Equation 5 correspond to two different approaches in QTL mapping, i.e., the fixed-model and the random-model approaches, respectively. If there are only a few founders who are not randomly sampled from a large reference population, our interest may be only in the values of the actual allelic effects and the dominance effects for the founders at hand. Under the fixed-model approach the priors for the allelic and dominance effects are treated as variable with known distributions. The fixed-model approach is very common in designed line-crossing experiments, e.g., F2 and BC designs, where the average effect of allelic substitution is the parameter of interest. When the founders are randomly sampled from a reference population, we are usually interested in the variances of the genetic effects in the population from which the founders are sampled. In this case, the distributions for the allelic and dominance effects of founders depend on some unknown parameters. When the number of founders is so small that a meaningful estimate of the allelic or dominance variance cannot be inferred from the limited number of alleles sampled, we may still use the fixed-model approach, even if the founders are a random sample. In this study, we concentrate on the random model approach.
Reversible jump MCMC:
In Bayesian analysis, inferences about the parameters of interest are based on the joint posterior distribution of all unknowns. Since the joint posterior distribution does not have a standard form, MCMC samplers are used to generate samples from the joint posterior distribution (![]()
![]()
![]()
![]()
- Updating QTL allelic effects a = {aj}lj=1;
- Updating QTL dominance effects d = {dj}lj=1;
- Updating QTL allelic variances Va = {
2aj}lj=1; - Updating QTL dominance variances Vd = {
2dj}lj=1; - Updating the fixed effects b and residual variance
2e; - Updating complete marker genotypes M and QTL allelic inheritance matrices Zp and Zm;
- Updating QTL locations
= {
j}lj=1; - Birth of a QTL (adding one new QTL to the model) or death of a QTL (removing one existing QTL from the model).
The proposed algorithm starts from an initial point and proceeds to update each of the unknowns in turn. One complete pass over these eight update steps defines a cycle of iteration. Updating steps (a)(g) are conventional and do not alter the dimension of the variable vector. We use Metropolis-Hastings algorithms to implement steps (a)(e) and (g), and the Gibbs sampler to update step (f). Step (h) involves changing QTL number by one and making necessary corresponding changes to (a, d, Va, Vd, Zp, Zm,
). A reversible jump step is needed to change the number of QTL.
Several methods have been available for updating marker genotypes in general pedigrees (e.g., ![]()
![]()
![]()
![]()
![]()
![]()
2e is also straightforward.
Updating QTL effects:
The allelic effects of QTL are updated founder by founder and locus by locus. Denote ajk as a 2 x 1 vector for the allelic effects of the kth founder at the jth QTL (Appendix). To update ajk, two random variables,
1 and
2, are simulated independently from the symmetric uniform distribution around zero (random walk). The length of this uniform distribution is determined empirically and should result in a reasonable rate of average acceptance rate. A new proposal value of ajk takes aj*k = ajk + (
1, fk
1 + (1 - fk)
2)T, where fk is the inbreeding coefficient of the kth founder. The new proposal is accepted with probability
![]() |
(6) |
where
* means all elements of
except that ajk is replaced by the proposal aj*k.
The dominance effects of QTL are updated for two founders at a time, again in a locus-by-locus basis. Denote djkk' as a vector of dominance effects between alleles of founder k and alleles of founder k' at the jth QTL. The dimension of djkk' is three or four, depending on whether k equals k' (see the Appendix). To update djkk', a new proposal dj*kk' is simulated by random walk, denoted by

if k' = k; otherwise,

where
1,
2,
3, and
4 are sampled independently from the symmetric uniform distribution around zero. The new proposal is accepted with probability
![]() |
(7) |
where
* means all elements of
except that djkk' is replaced by the proposal dj*kk'.
Updating QTL variances:
QTL allelic and dominance variances are updated locus by locus. To update
2aj and
2dj, new proposals
2*aj and
2*dj are sampled from the symmetric uniform densities around their previous values. The proposals are accepted with probabilities
![]() |
(8) |
respectively.
Updating QTL allelic inheritance Zp and Zm: Each allele in the descendants can be traced back to one of the founder alleles. This is reflected by apj = Zpjaj and amj = Zmjaj, where each row of matrices Zpj and Zmj has one element taking 1 and all other elements being 0. It is not convenient to generate realizations of Zpj and Zmj directly, but we can easily generate a sample of Zpj and Zmj indirectly through the following recursive approach.
Consider a pedigree with m founders. Let the 2m founder alleles be numbered consecutively from 1 to 2m. Then allele 2k - 1 and 2k are the two alleles of the kth founder. Assume that individuals are entered into the pedigree in a chronological order so that the parents are evaluated before their progeny. Denote Zpj(i) and Zmj(i) as the ith rows of Zpj and Zmj, respectively; i.e., Zpj(i) and Zmj(i) store the allele identifications of the paternal and maternal alleles of individual i, respectively. For example, if the paternal allele of individual i is traced back to allele 3 of the founders and the maternal allele is traced back to allele 10 of the founders, then the third element of Zpj(i) is 1 and all other elements are 0, and the tenth element of Zmj(i) is 1 and all other elements are 0. We now describe the recurrent process of building matrices Zpj and Zmj. Assume that we have already built matrices Zpj and Zmj up to the first (i - 1)th rows and are ready to build the ith rows. If the ith individual is a founder, say the kth founder, then the (2k - 1)th element of Zpj(i) and the (2k + 1)th element of Zmj(i) are 1. If the ith individual is not a founder but the progeny of individuals i1 (father) and i2 (mother), then

where upij and umij are the paternal and maternal segregation (meiosis) indicators, respectively, for individual i at the jth QTL. If the paternal allele of the father is passed to individual i, then upij = 1; otherwise, upij = 0. The value of umij is similarly defined but for the allelic inheritance of the mother. Note that Zpj(i1), Zmj(i1), Zpj(i2), and Zmj(i2) have been previously built because i1
i - 1 and i2
i - 1. Therefore, to trace the allelic origin, one only needs to simulate the segregation indicators for each descendant.
Simulating the segregation indicators (upij, umij) is straightforward. The segregation indicators (upij, umij) can take four possible values, i.e., (1, 1), (1, 0), (0, 1), and (0, 0). The conditional posterior distribution is thus a discrete distribution over the four possible allelic inheritance patterns and depends on the position of the QTL, the segregation indicators of flanking loci (markers or QTL), the phenotypic value of the progeny, and other parameter values in the model. We then sample a value from the posterior distribution and convert the segregation indicators into the design matrices using the recursive equation. The recursive algorithm for QTL allelic inheritance can be applied to complicated designs with mixture of outcrossing and selfing.
Updating QTL locations:
Similar to the method of ![]()
![]()
are modified one at a time using the Metropolis algorithm. For the jth QTL, a proposal
*j is sampled from a symmetric uniform distribution in the neighborhood of the previous value
j. In the meantime, new proposals for the segregation indicator matrices, denoted by Up*j and Um*j, are generated according to the method of updating QTL allelic inheritance matrices. The new allelic inheritance matrices, Zp*j and Zm*j, are calculated using the recursive equations. Denote the generating distribution of (Up*j, Um*j) by q(Up*j, Um*j). The proposals are accepted with probability
![]() |
(9) |
where
* contains all elements of
except
j, Zpj, Zmj. Let GLj(GRj) denote the complete genotypes of all pedigree members at the left (right) flanking locus of the corresponding location. If the proposals are accepted, we update the location of the jth QTL and also modify the segregation indicators, the allelic inheritance, and dominance design matrices at the jth QTL at the same time.
Updating QTL number: The reversible jump mechanism is needed to change the QTL number in the model. In this study, a reversible pair is used: birth/death of a QTL. In every cycle of the simulation, we make a random choice between attempting to add one new QTL into the model or delete one existing QTL from the model, with probabilities pa and pd = 1 - pa, respectively. Of course, pa = 0 if l = lmax and pd = 0 if l = 0, and otherwise we choose pa = 0.5, for 0 < l < lmax.
For a birth step, we need to generate a new location
l+1, a new allelic variance
2al+1, a new dominance variance
2dl+1, a new vector of allelic effects of the founder alleles al+1, a new vector of dominance effects between all possible pairs of the founder alleles dl+1, and new inheritance and dominance design matrices of all pedigree members Zpl+1, Zml+1, and Wl+1 for the new QTL. The new location
l+1 and the variances
2al+1 and
2dl+1 are sampled from the corresponding prior densities. The allelic effects al+1 and the dominance effects dl+1 are then simulated from the distributions p(al+1|
2al+1) and p(dl+1|
2dl+1) described in the Appendix The segregation indicator matrices, denoted as Upl+1 and Uml+1, are generated from q(Upl+1, Uml+1) using the method of updating QTL allelic inheritance matrices, and new inheritance and dominance design matrices Zpl+1, Zml+1, and Wl+1 are then calculated. The proposal is accepted with probability
![]() |
(10) |
where
* = (
,
l+1, al+1, dl+1, Zpl+1, Zml+1) with l in
replaced by (l + 1); GLl+1(GRl+1) denotes the complete genotypes of all pedigree members at the left (right) flanking locus of the location
l+1.
The death step is somewhat simpler. A random choice is made among the existing QTL, and the chosen QTL is then proposed to delete from the model. If the jth existing QTL is proposed to delete, the acceptance probability for the deletion is
![]() |
(11) |
where
* means all elements of
except the items corresponding to the jth QTL. Other terms of this equation are defined similarly as in Equation 10.
| A SIMULATION STUDY |
|---|
Design of the simulation experiment:
The proposed method was evaluated empirically by analyzing a simulated large complex pedigree. The pedigree consists of 2020 individuals covering three discrete generations. The pedigree is depicted in Fig 1. Twenty founders were randomly sampled from an outbred base population; i.e., founders were noninbred and genetically unrelated to each other. These founders are numbered from 1 to 20 and represented by open circles in the first row of Fig 1. With completely random mating among the founders, 20 full-sib families were formed, each represented by an open square in the second row of Fig 1. Because of the complete randomness, some founders (11, 13, and 18) were not represented in the next generation while others (e.g., 1, 6, 9, etc.) were overrepresented. Each full-sib family contains 50 members so that a total of 1000 individuals are available in the F1 generation (the second row of Fig 1). From the 1000 individuals, we randomly selected 20 as parents of the next generation. These 20 parents were randomly mated to form 20 full-sib families in the F2 generation. Each full-sib family is represented by an open square in the third row of Fig 1. Again, each family consists of 50 members, leading to 1000 individuals in the F2. The mating was completely arbitrary, including selfing, full-sib, and half-sib mating. Although we did not simulate parent-offspring mating (overlapping generation), nothing prevents us from doing that.
|
A quantitative trait was modeled as being controlled by three QTL residing on two chromosomes of length 100 and 70 cM, respectively. The 40 = 2 x 20 allelic effects and 820 =
dominance effects of each QTL in the founders were simulated from their corresponding normal distributions. The residual variance was set at
2e = 1.0. The fixed effect contains only the overall mean, which was set at b = 0.0. The true locations, and allelic and dominance variances of the three simulated QTL are given in Table 1. Marker data were generated for all individuals. Eleven and 8 codominant markers were respectively placed on the two chromosomes with a marker distance of 10 cM between two neighboring markers. Six equally frequent alleles were simulated at each marker locus. With this assignment of marker allele frequencies, many loci were partially informative and some were even uninformative at all. No phenotypic records were available for the 20 founders. The linkage phases of markers in the founders were reshuffled and eventually reconstructed via the MCMC process. Two sets of data were analyzed: data I include all 2020 individuals and data II include only founders and the F1 generation, a total of 1020 individuals.
|
The initial value for the QTL number was set at two and the corresponding locations were at 50 cM of chromosome 1 and 40 cM of chromosome 2, respectively. The prior Poisson mean of the QTL number was µ = 2 and the maximum number of QTL was lmax = 6. The starting values were 0.05 for all QTL allelic and dominance variances and 0.0 and 2.0 for the overall mean and the residual variance, respectively. The initial marker linkage phases were assigned randomly for each founder. The initial QTL allelic inheritance for each individual was determined by the initial QTL locations and the initial complete genotypes of the flanking markers.
A flat prior was assigned to the overall mean. The priors for all variance components were chosen to be uniform on (0.0, 2.0], the right endpoint being equal to the true phenotypic variance. The prior for the QTL locations was uniform over the whole genome. The tuning parameters of the proposal distributions were chosen to be 2.0 cM for QTL locations and 0.05 for all other parameters.
The proposed MCMC sampler was run for 5 x 105 cycles in each of the MCMC analyses. The first 400 samples (burn-in) were discarded. To reduce serial correlation in the samples, we saved only one in every 50 cycles of simulations so that the total number of samples kept in the analysis was 104.
Results:
The estimated posterior distributions of the QTL number in the analyses of the two data sets are given in Table 2. In each of the data sets, it is immediately apparent that there are three QTL controlling the trait. The posterior expectations are essentially the same as the true number of QTL for both data sets. The posterior modes of QTL numbers are consistent with the true number of QTL as well. The posterior for data II is more widely spread than that for data I, indicating that QTL can be more accurately detected using the extended families.
|
QTL locations were estimated using the posterior QTL intensity function (![]()
![]()
|
|
The overall mean and the residual variance were estimated from all MCMC samples. The posterior distributions of these two parameters for data I are depicted in Fig 4. The posterior mean and standard error are 0.2175 and 0.2790 for the overall mean, and 1.2037 and 0.0552 for the residual variance, respectively. It can be seen that the overall mean and the residual variance were slightly overestimated.
|
Following the idea of ![]()
k) and fd(
k) be the cumulative distribution functions associated with the allelic and dominance variances of a putative QTL in small interval
k. We used the means of samples in
k to assess fa(
k) and fd(
k). Therefore, fa(
k) and fd(
k) can be expressed as

and

respectively, where l(m) is the number of QTL in the mth posterior sample,
(m)q is Note that fa(
k) and fd(
k) are meaningful only when a sufficient number of samples are contained in
k. The plots of fa(
k) and fd(
k) for data I are presented in Fig 2. The chromosome regions with sufficiently high posterior QTL intensity are given in Table 3. The posterior samples in which QTL locations fell into these regions were used to estimate the QTL variances. Fig 5 depicts the posterior distributions for the QTL variances from the analysis of data I. We also calculated the means and the standard errors of the posterior samples for the QTL variances (see Table 3). In most situations, it appears that the estimates of QTL variances are close to the corresponding true values with small standard errors. The posterior means and the estimation errors of the QTL locations are also given in Table 3. It can be seen that the estimated QTL locations are very close to the corresponding true values.
|
|
| DISCUSSION |
|---|
There are many statistical methods and computer programs available for QTL mapping. Most of them are specialized in one or two particular types of designs, e.g., BC, F2, or multiple nuclear families. Here, we introduce a unified methodology of QTL mapping for arbitrarily complicated mating designs, ranging from a simple line cross to multiple independent sib-pairs. Although we developed the method based on a random-model approach, it works equally well for a fixed model. The difference between the random and the fixed models under the Bayesian framework is judged only by whether the distributions of allelic and dominance effects of QTL, i.e., p(aj) and p(dj), are treated as priors or not. If they are treated as prior distributions, the parameters involved in the prior distributions are assessed before the experiment, and there is no attempt to estimate them. The model is then called the fixed model. On the other hand, if p(aj) and p(dj) are not the ultimate priors but the distribution of missing values aj and dj, we are then interested in the parameters in p(aj) and p(dj), e.g.,
2aj, which in turn need to be assigned a prior. We are essentially interested in making an inference for
2aj. In this case, the model is called a random model. For the fixed model, the update steps for QTL variances in the proposed algorithm are no longer required. Therefore, programming-wise, the difference between a random and a fixed model depends on the turning on/off of a single statement.
The ability to handle arbitrarily complicated pedigrees and the flexibility of switching between fixed and random models possessed by our Bayesian mapping arise from the use of an "allelic approach" as opposed to the traditional "genotypic approach." In this study, we dealt exclusively with the allelic (haplotype) values rather than the genotypic values. We also sampled the "allelic inheritance" from parents to offspring rather than sampling the "genotypic transition." As a result, there is no need to consider the number of alleles and the total number of genotypes per locus in the mapping population. Instead, consideration is needed only when we assess the prior distribution of the founder alleles. This treatment has greatly simplified the algorithm and increased the robustness of the method.
In QTL mapping experiments of plants, founders are not usually a random sample from a reference population. They are often selected to be complementary for some traits of interest. As a consequence, the fixed-model approach can be used. Under a random model, however, the update steps for the additional parameters included in the priors p(aj) and p(dj) depend on the form of their prior distributions. We have only explained the reversible jump MCMC algorithm under normal-effects QTL. Under the model of multiple QTL with discrete effects, we can modify the update steps (c) and (d) in the proposed algorithm. Assume that there are k alleles at a certain QTL with allelic effects {ai}ki=1, dominance effects {dij}i
j, and frequencies {pi}ki=1 in the base population, where pi, ai, and dij are the frequency and effect of the ith allele and the dominance effect between alleles i and j, respectively. The priors for ai and dij can be assigned as independent normal with known mean and variance. The prior for {pi}ki=1 can take a symmetric Dirichlet. In this case, the parameters of interest are {ai}ki=1, {dij}i
j, and {pi}ki=1. The frequencies {pi}ki=1 can be updated using a Gibbs sampler because its full conditional distribution also remains Dirichlet (![]()
![]()
j, we first simulate a proposal for each parameter by using a random walk, then update the alleles of each founder by sampling from a multinomial distribution, and finally use the Metropolis-Hastings algorithm to update these parameters. Similar algorithms for the biallelic QTL model have been proposed in ![]()
![]()
A problem may arise in real data analysis in which the number of alleles of a putative QTL in the base population is unknown, nor are the distributions of the effects of the QTL. In this situation, two strategies should be considered:
- We can use the fixed-model approach to solve the random-model problem; that is, we first estimate the allelic and dominance effects of founders and then convert them into the QTL variances (
XU 1998 ). This method is expected to be efficient in the case where the number of founders is small.
- In the case of many founders, one may use normal distributions to approximate the prior distributions p(aj) and p(dj). In fact, drawing inferences about the multiallelic QTL variance via the normal distribution is a natural way to characterize genetic variation in the base population. In addition, normal distribution of the allelic effects is usually a very robust assumption. This has been verified in the context of ML mapping by
XU and ATCHLEY 1995 who found that, for data simulated under a biallelic model, the analysis based on normal distribution provided very accurate estimates of QTL variances.
Further investigation is needed to investigate the robustness of normal distribution or other distributions in the framework of Bayesian analysis.
The proposed reversible jump MCMC algorithm performed well for the simulated data. The following points are noteworthy. First, the update step of QTL locations is different from existing algorithms in Bayesian mapping (e.g., ![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
l+1, GLl+1, GRl+1) was used to generate Zpl+1 and Zml+1 in the normal-effects model approach, although such proposal distributions of the genotypes of new QTL worked well in line crosses and the biallelic-effects model (![]()
![]()
![]()
3 days on a SUN SPARC 5 workstation. The most time-consuming parts of our program were the updates of complete marker genotypes, QTL allelic inheritance, and dominance design matrices. The computational speed can be improved substantially with more efficient programming skills.
The assessment of the convergence and autocorrelation of the MCMC with the use of the reversible jump sampler remains a significant problem because the dimension keeps changing from one cycle to another. When the dimension changes, the identities of the QTL also change. The parameters in one cycle of the iteration may be different from those in the next cycle of iteration. Therefore, the convergence criteria developed for the MCMC with fixed dimension are hardly applicable to the reversible jump MCMC (e.g., ![]()
![]()
75%).
We used the method of ![]()
![]()
Following the convention in human pedigree analysis, we have assumed that all founders are included in the model. In open-pollinated trees, however, seeds collected from one tree (mother) are usually pollinated from multiple unknown trees (fathers). Because the fathers (founders) are not identified, their contribution to the progeny is difficult to evaluate. Our model, in theory, can include these founders in the pedigree but treat their marker genotypes as missing. A method that excludes the missing founders while still analyzing the data properly is under development.
| ACKNOWLEDGMENTS |
|---|
We thank Claus Vogl and Lori Weingartner for helpful comments on the manuscript. This research was supported by the National Institutes of Health grant GM55321 and the U.S. Department of Agriculture National Research Initiative Competitive Grants Program 97-35205-5075 to S.X.
Manuscript received April 10, 2000; Accepted for publication December 19, 2000.
| APPENDIX |
|---|
The prior distributions for allelic and dominance effects of the founders:
Denote ajk as a 2 x 1 vector for the allelic effects of the kth founder at the jth QTL, and djkk' as a vector of interaction effects (dominance effects) between the kth and the k'th founder alleles at the jth QTL. The dimension of djkk' is 3 or 4, depending on whether k equals k'. Explicitly, ajk = (ajk(1), ajk(2))T, djkk = (djkk(1), djkk(2), djkk(3))T, and djkk' = (djkk'(1), djkk'(2), djkk'(3), djkk'(4))T (k
k'). The dimension of djkk is 3 because each founder carries two alleles and potentially contributes 3 possible interactions in the descendants. The dimension of djkk' is 4 because 2 founders carry a total of 4 alleles and potentially contribute 2 x 2 = 4 possible interactions.
The priors for ajk and djkk' are assumed to be independent normals:

If the inbreeding coefficient, fk, for the kth founder equals 1, the two elements of ajk are identical, and so are the three elements of djkk. Therefore, ajk and djkk each turns into a scalar, ajk(1) and djkk(1). Similarly, if both fk and fk' are unity, djkk'(1) = djkk'(2) = djkk'(3) = djkk'(4). Note that
2aj and
2dj are set at prefixed constants under a fixed model. Otherwise, they are treated as unknown parameters to be estimated. If fk < 1, we have

and

If fk = 1 and fk'
1, then djkk'(1) = djkk'(2) and djkk'(3) = djkk'(4). Therefore, djkk' is equivalent to the 2 x 1 normal random vector (djkk'(1), djkk'(3))T, with covariance

Similarly, if fk' = 1 and fk
1, djkk' is equivalent to 2 x 1 normal random vector (djkk'(1), djkk'(2))T, with covariance

Finally, for the case where fk
1 and fk'
1, we can get

and

Under the assumption that the founders are independent from each other, the priors p(aj) and p(dj) can be expressed as the following forms, respectively:

| LITERATURE CITED |
|---|
ALMASY, L. and J. BLANGERO, 1998 Multipoint quantitative-trait linkage analysis in general pedigree. Am. J. Hum. Genet. 62:1198-1211[Medline].
AMOS, C. I., 1994 Robust variance-components approach for assessing genetic linkage in pedigrees. Am. J. Hum. Genet. 54:535-543[Medline].
BINK, M. C. A. M. and A. M. VAN ARENDONK, 1999 Detection of quantitative trait loci in outbred populations with incomplete marker data. Genetics 151:409-420
BINK, M. C. A. M., L. L. G. JANSS, and R. L. QUAAS, 2000 Markov chain Monte Carlo for mapping a quantitative trait locus in outbred populations. Genet. Res. 75:231-241[Medline].
BROOKS, S. P., 1997 Discussion to Richardson and Green (1997). J. R. Stat. Soc. Ser. B 59:774-775.
CANTET, R. J. G. and C. SMITH, 1991 Reduced animal model for marker assisted selection using best linear unbiased prediction. Genet. Sel. Evol. 23:221-233.
FERNANDO, R. L. and M. GROSSMAN, 1989 Marker-assisted selection using best linear unbiased prediction. Genet. Sel. Evol. 21:467-477.
GELMAN, A. J. B., H. S. CARLIN, H. S. STERN and D. B. RUBIN, 1995 Bayesian Data Analysis. Chapman & Hall, London.
GEMAN, S. and D. GEMAN, 1984 Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Machine Intell. 6:721-741.
GREEN, P. J., 1995 Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82:711-732
HASTINGS, W. K., 1970 Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57:97-109
HEATH, S. C., 1997 Markov chain Monte Carlo segregation and linkage analysis for oligogenic models. Am. J. Hum. Genet. 61:748-760[Medline].
HOESCHELE, I., P. UIMARI, F. E. GRIGNOLA, Q. ZHANG, and K. M. GAGE, 1997 Advances in statistical methods to map quantitative trait loci in outbred populations. Genetics 147:1445-1457[Abstract].
JANSEN, R. C., 1993 Interval mapping of multiple quantitative trait loci. Genetics 135:205-211[Abstract].
KAO, C. H., Z. B. ZENG, and R. D. TEASDALE, 1999 Multiple interval mapping for quantitative trait loci. Genetics 152:1203-1216
LANDER, E. S. and D. BOTSTEIN, 1989 Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121:185-199
LYNCH, M., and B. WALSH, 1998 Genetics and Analysis of Quantitative Traits. Sinauer Associates, Sunderland, MA.
METROPOLIS, N., A. W. ROSENBLUTH, M. N. ROSENBLUTH, A. H. TELLER, and E. TELLER, 1953 Equation of state calculations by fast computing machines. J. Chem. Phys. 21:1087-1091.
MURANTY, H., 1996 Power of tests for quantitative trait loci detection using full-sib families in different schemes. Heredity 76:156-165.
RICHARDSON, S. and P. J. GREEN, 1997 On Bayesian analysis of mixtures with an unknown number of components. J. R. Stat. Soc. Ser. B 59:731-792.
SATAGOPAN, R. J., and B. S. YANDELL, 1996 Estimating the number of quantitative trait loci via Bayesian model determination. Special Contributed Paper Session on Genetic Analysis of Quantitative Traits and Complex Diseases. Biometric Section, Statistical Meeting, Chicago, IL.
SATAGOPAN, J. M., B. S. YANDELL, M. A. NEWTON, and T. G. OSBORN, 1996 A Bayesian approach to detect quantitative trait loci using Markov chain Monte Carlo. Genetics 144:805-816[Abstract].















