Bayesian Mapping of Multiple Quantitative Trait Loci From Incomplete Outbred Offspring Data
Mikko J. Sillanpää (a) and Elja Arjas (a, b)
(a) Rolf Nevanlinna Institute, FIN-00014 University of Helsinki, Finland
(b) National Public Health Institute, FIN-00300 Helsinki, Finland
Corresponding author: Mikko J. Sillanpää, Rolf Nevanlinna Institute, Research Institute of Mathematics, Statistics, and Computer Science, P.O. Box 4, FIN-00014 University of Helsinki, Finland. E-mail: mjs{at}rolf.helsinki.fi
Communicating editor: Z-B. ZENG
| ABSTRACT |
|---|
A general fine-scale Bayesian quantitative trait locus (QTL) mapping method for outcrossing species is presented. It is suitable for the analysis of complete and incomplete data from experimental designs of F2 families or backcrosses. The extent to which parents and grandparents are genotyped is optional, as is the assumption that the QTL alleles in the crossed lines are fixed. Grandparental origin indicators are used, but without discarding the original genotype or allelic origin information. The method treats the number of QTL in the analyzed chromosome as a random variable and allows some QTL effects from other chromosomes to be taken into account in a composite interval mapping manner. A block-update of the ordered genotypes (haplotypes) of the whole family is sampled once at each marker locus during every round of the Markov chain Monte Carlo (MCMC) algorithm used in the numerical estimation. As a byproduct, the method gives the posterior distributions of the linkage phases in the family, and it can therefore also be used as a haplotyping algorithm. The Bayesian method is tested and compared with two frequentist methods using simulated data sets, considering two different parental crosses and three different levels of available parental information. The method is implemented as a software package and is freely available under the name Multimapper/outbred at http://www.rni.helsinki.fi/~mjs/.
INBRED line-cross designs are routinely used for quantitative trait locus (QTL) mapping in experimental organisms, because all F1 individuals are then fully heterozygous and show perfect coupling between the alleles at the QTL and at nearby marker loci. Furthermore, the biallelic nature of the design suits the genetic tradition in which QTL are treated as biallelic and all heterozygous QTL effects are considered jointly as a dominance effect. Depending on the organism, producing inbred lines is not always practical or even possible.
Presently, there are QTL mapping methods suitable for the analysis of outbred populations (for a review see HOESCHELE et al. 1997).
When interval mapping, in which a putative QTL is placed between markers, is applied to outbred offspring data, the linkage phases (haplotypes) of the parents must be considered. They are needed to determine whether paternally (or maternally) derived alleles at two neighboring loci are of the same grandparental origin. By comparison, the grandparental line origin of alleles in inbred line-cross offspring is automatically known at all marker positions.
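To make the role of linkage phase concrete, here is a minimal Python sketch; the function names are hypothetical and this is not code from Multimapper/outbred. It assumes a parent's two haplotypes are known (phase known), labels them "a" and "b", and traces each allele the parent transmitted at a heterozygous marker back to one of them. A change of origin between neighbouring informative markers then signals a recombination. Without phase information the labels "a" and "b" could not be attached to the two haplotypes in the first place.

```python
def grandparental_origins(hap_a, hap_b, transmitted):
    """Trace each allele a parent transmitted back to one of its two haplotypes.

    hap_a, hap_b: the parent's two phased haplotypes (one allele per marker);
        hap_a is taken to be inherited from the parent's own father ("a")
        and hap_b from its mother ("b").
    transmitted: the alleles the offspring received from this parent
        (None where unknown).
    Returns "a", "b", or None (origin not identifiable) for every marker.
    """
    origins = []
    for x, ya, yb in zip(transmitted, hap_a, hap_b):
        if x is None or ya == yb:
            origins.append(None)  # missing allele or homozygous parent
        elif x == ya:
            origins.append("a")
        elif x == yb:
            origins.append("b")
        else:
            raise ValueError(f"allele {x!r} is not carried by the parent")
    return origins


def adjacent_recombinations(origins):
    """For neighbouring markers (s, s + 1) whose origins are both known,
    report whether the transmitted alleles came from different grandparents."""
    return {
        (s, s + 1): origins[s] != origins[s + 1]
        for s in range(len(origins) - 1)
        if origins[s] is not None and origins[s + 1] is not None
    }


origins = grandparental_origins(["A", "C", "E"], ["B", "D", "F"], ["A", "D", "F"])
print(origins)                           # ['a', 'b', 'b']
print(adjacent_recombinations(origins))  # {(0, 1): True, (1, 2): False}
```

In an inbred line cross the lookup in `grandparental_origins` is unnecessary, since every F1 carries one haplotype from each line by construction.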
Recently we presented a Bayesian QTL mapping method for incomplete inbred line-cross data.
If the F2 family sizes in the studied plant or animal organism are relatively small, information from several families has to be combined. A complication arising from family pooling is that there will then typically be a large number of founders, and therefore of possible QTL alleles, in the data. (Note also that the applicability of marker covariates needs to be considered.) To keep the maximum number of QTL genotypes low (≤ 4) in the combined data, one can assume one of the following alternatives: (1) The grandparents in each family have been drawn from the same two gene pools (lines), in which case they all represent two different QTL alleles at each trait locus (different QTL alleles are assumed to be fixed in these two lines). (2) All families to be combined are related and share the same two grandparents, i.e., all parents belong to the same F1 generation (fixation of different QTL alleles in the two lines is again assumed). (3) All families in the combined data are related and share the same four grandparents (numbered 1 to 4) in such a way that one parent in each family is always a progeny of grandparents 1 and 2 and the other parent is always a progeny of grandparents 3 and 4; parents descending from grandparents 1 and 4, or 2 and 3, are excluded. Here it is assumed that different QTL alleles are fixed in all four grandparental lines and that these lines show somewhat different phenotypic values. If these assumptions are met, the resulting offspring population will have four different QTL alleles segregating at each trait locus.
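Why the three alternatives all keep the number of QTL genotypes at four or fewer can be checked by enumeration. The sketch below is illustrative only; the function name is hypothetical. It assumes each parent is described by the two grandparental QTL alleles it carries and lists the ordered offspring genotypes (allele from the first parent, allele from the second).

```python
from itertools import product


def offspring_qtl_genotypes(parent_a, parent_b):
    """Possible offspring QTL genotypes as ordered pairs
    (allele from parent_a, allele from parent_b).

    Each parent is given by the two grandparental QTL alleles it carries;
    under Mendelian segregation all pairs are equally likely.
    """
    return sorted(set(product(parent_a, parent_b)))


# Alternatives (1) and (2): both parents are F1s of lines "1" and "2".
two_lines = offspring_qtl_genotypes(("1", "2"), ("1", "2"))
# Alternative (3): parents descend from grandparents 1 x 2 and 3 x 4.
four_lines = offspring_qtl_genotypes(("1", "2"), ("3", "4"))

print(["".join(g) for g in two_lines])   # ['11', '12', '21', '22']
print(["".join(g) for g in four_lines])  # ['13', '14', '23', '24']
```

In both cases at most four genotypes arise. If the parental origin of the alleles is ignored, 12 and 21 collapse and the two-line case has only three genotypes.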
In the following, we focus mainly on data from a one-family experiment. Our model is described next, followed by the results from simulation experiments and a discussion. In two appendixes, parameter estimation and summary measures for statistical inference are considered.
| MODEL |
|---|
We use the notation of …, the number of background controls (Nbc), incomplete and complete background control genotype information including parents (Xo and X*o), the number of QTL genotypes (Ngen), QTL genotypic effect (regression coefficient) vectors (b1, b2, ... , bNqtl), genotypic effects for background controls (C), residual variance (σ²), fixed marker map m, and consistency between complete and incomplete information (A* ~ A).
Let I = (Ii) be the indicator vector whose element Ii = 1{yi observed} equals one or zero depending on whether yi is observed. Let H* and H be the corresponding complete and incomplete (observed) haplotype information (genotype plus allelic origin information: paternal/maternal) at the marker positions. In each case, we indicate the split between maternally and paternally inherited haplotypes by writing H* = (H*F, H*M) and H = (HF, HM). Here H* and H are taken to be (Nind + 2) x N matrices, where N is the number of markers in the chromosome considered. Note that incomplete haplotype information often contains the complete genotypes but not the allelic origins.
In the chosen experimental design, let $\gamma = (\gamma_1, \ldots, \gamma_{N_{gen}})$ be the vector containing all possible QTL genotypes at any locus, so that their actual allelic forms are unknown. These QTL genotypes correspond to combinations of QTL alleles that were present in the crossed grandparents (founders) and that were transmitted to the F1 parents. Let $\gamma^k = (\gamma^k_1, \ldots, \gamma^k_{N_{bc(k)gen}})$ be the vector containing all possible background control genotypes, and let Nbc(k)gen be their number (maximally four) at the kth background control. Let B = (Bi), where Bi is a vector of covariates (e.g., age, sex, or treatment) for offspring i. Let $\beta$ be a vector of regression coefficients of these covariates (including also class means if some covariate is a classification variable). If there are no individual-level covariates, we let all Bi reduce to Bi = 1 and $\beta$ to a common regression intercept $\beta = a$. Here we consider only the case where no covariate values are missing.
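We consider the following composite interval mapping model:

$$
y_i = B_i \beta + \sum_{q=1}^{N_{qtl}} \sum_{j=1}^{N_{gen}} b_{qj}\, \mathbf{1}\{x_{qi} = \gamma_j\} + \sum_{k=1}^{N_{bc}} \sum_{j=1}^{N_{bc(k)gen}} C_{kj}\, \mathbf{1}\{X^*_{ik} = \gamma^k_j\} + e_i, \qquad e_i \sim N(0, \sigma^2). \tag{1}
$$

Here $\mathbf{1}\{x_{qi} = \gamma_j\}$ and $\mathbf{1}\{X^*_{ik} = \gamma^k_j\}$ are indicator variables.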
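We use the shorthand notation $\theta = (b_1, \ldots, b_{N_{qtl}}, \sigma^2, \beta, C)$ and $\Theta = (\theta, \mathbf{x}, l, H^*, X^*_o, N_{qtl})$, where $\mathbf{x}$ denotes the offspring QTL genotypes and $l$ the QTL locations. Under natural conditional independence assumptions, the joint prior density of $\Theta$ can be presented in the product form

$$
p(\Theta \mid m) = p(\theta \mid N_{qtl})\, p(\mathbf{x} \mid H^*, l, m, N_{qtl})\, p(l \mid N_{qtl})\, p(N_{qtl})\, p(H^* \mid m)\, p(X^*_o). \tag{2}
$$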
The posterior density of $\Theta$ is then given, up to proportionality, by

$$
p(\Theta \mid y, I, m) \propto p(y \mid \Theta, I, m)\, p(\Theta \mid m), \tag{3}
$$

where $p(y \mid \Theta, I, m)$ is the likelihood function (normal density) constructed from those independent residuals ei in (1) for which the observation indicator Ii = 1. Here the complete background control genotypes are determined uniquely from X*o.
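As a minimal sketch of this likelihood, assuming model (1) with independent normal residuals, the Python function below (hypothetical name and argument layout, not the authors' C code) sums normal log densities over the individuals with Ii = 1. Genotypes are stored as indices into the genotype vectors, so each indicator sum in (1) reduces to a single table lookup per locus.

```python
import math


def log_likelihood(y, observed, covariates, beta, qtl_geno, b, bc_geno, C, sigma2):
    """Normal log-likelihood of the phenotypes under a model of the form (1).

    y[i]            phenotype of offspring i (ignored if observed[i] is False)
    covariates[i]   covariate vector B_i ([1.0] for a pure intercept)
    beta            covariate coefficients
    qtl_geno[i][q]  index j of offspring i's genotype gamma_j at QTL q
    b[q][j]         effect of genotype j at QTL q
    bc_geno[i][k]   index of offspring i's genotype at background control k
    C[k][j]         effect of genotype j at background control k
    sigma2          residual variance
    """
    ll = 0.0
    for i, yi in enumerate(y):
        if not observed[i]:
            continue  # I_i = 0: no contribution
        mean = sum(x * coef for x, coef in zip(covariates[i], beta))
        # Each indicator sum picks out exactly one coefficient per locus.
        mean += sum(b[q][j] for q, j in enumerate(qtl_geno[i]))
        mean += sum(C[k][j] for k, j in enumerate(bc_geno[i]))
        resid = yi - mean
        ll += -0.5 * math.log(2.0 * math.pi * sigma2) - resid**2 / (2.0 * sigma2)
    return ll
```

Unobserved phenotypes are simply skipped, which is how the indicator vector I enters the likelihood.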
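The ingredients of the prior density (2) are specified as follows. Denote the complete haplotype information at the marker positions of the ith offspring by H*i, and similarly that of the male and female parents by H*M and H*F. We consider the simple product-form prior for the complete haplotypes in the family:

$$
p(H^* \mid m) = p(H^*_F \mid m)\, p(H^*_M \mid m) \prod_{i=1}^{N_{ind}} p(H^*_i \mid H^*_F, H^*_M, m).
$$

For each offspring i, the prior can be further factorized and computed as the product

$$
p(H^*_i \mid H^*_F, H^*_M, m) = p(\omega^F_{1,i}(H^*))\, p(\omega^M_{1,i}(H^*)) \prod_{s=1}^{N-1} p(\omega^F_{s+1,i}(H^*) \mid \omega^F_{s,i}(H^*))\, p(\omega^M_{s+1,i}(H^*) \mid \omega^M_{s,i}(H^*)). \tag{4}
$$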
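Here $\omega^F_{s,i}(x)$ ($\omega^M_{s,i}(x)$) is a function of haplotype information x, and it determines the grandparental origin of the maternal (paternal) allele of individual i at marker locus s. The probabilities

$$
p(\omega^M_{1,i}(H^*) = F) = p(\omega^M_{1,i}(H^*) = M) = p(\omega^F_{1,i}(H^*) = F) = p(\omega^F_{1,i}(H^*) = M) = \tfrac{1}{2}
$$

are the prior probabilities of the different grandparental origins under Mendelian segregation for paternally and maternally inherited alleles at marker locus 1 in offspring i. When only maternally inherited alleles of offspring i are considered,

$$
p(\omega^F_{s+1,i}(H^*) \mid \omega^F_{s,i}(H^*)) =
\begin{cases}
1 - r_{s,s+1}, & \omega^F_{s+1,i}(H^*) = \omega^F_{s,i}(H^*), \\
r_{s,s+1}, & \omega^F_{s+1,i}(H^*) \neq \omega^F_{s,i}(H^*),
\end{cases} \tag{5}
$$

is the probability that the marker at position s + 1 has grandparental origin $\omega^F_{s+1,i}(H^*)$ provided that the marker at position s has grandparental origin $\omega^F_{s,i}(H^*)$. Here $r_{s,s+1}$ is the recombination fraction between markers s and s + 1. The structure of $p(\omega^M_{s+1,i}(H^*) \mid \omega^M_{s,i}(H^*))$, derived for paternally inherited alleles, is similar.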
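The per-offspring haplotype prior in (4) and (5) is a two-state Markov chain along the markers, one chain per parental meiosis. The Python sketch below evaluates it on the log scale. It is an illustration, not the authors' implementation: the function names are hypothetical, and converting map distances to recombination fractions with the Haldane map function is an assumption, since the text only speaks of a fixed marker map m.

```python
import math


def haldane_r(d_cM):
    """Recombination fraction for a map distance in cM (Haldane map function,
    an assumption of this sketch)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cM / 100.0))


def log_origin_prior(origins, positions_cM):
    """Log prior of the grandparental origins ("F"/"M") of one parental
    meiosis along the markers: 1/2 at marker 1, then a switch with
    probability r_{s,s+1} between neighbouring markers, as in (5)."""
    logp = math.log(0.5)
    for s in range(len(origins) - 1):
        r = haldane_r(positions_cM[s + 1] - positions_cM[s])
        logp += math.log(r if origins[s + 1] != origins[s] else 1.0 - r)
    return logp


def log_offspring_prior(maternal, paternal, positions_cM):
    """Product over the maternal and paternal meioses, as in (4)."""
    return (log_origin_prior(maternal, positions_cM)
            + log_origin_prior(paternal, positions_cM))
```

For markers every 10 cM, a sequence that keeps its origin contributes log(1 - r) per interval and one that switches contributes log(r), so the probabilities of all 2^N origin sequences of one meiosis sum to one.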
Let the complete background control marker information in parents F and M be X*o,F and X*o,M, respectively. We assume the following prior form for background control genotypes in the other chromosomes:

$$
p(X^*_o) = p(X^*_{o,F})\, p(X^*_{o,M}) \prod_{i=1}^{N_{ind}} p(X^*_{o,i} \mid X^*_{o,F}, X^*_{o,M}), \qquad p(X^*_{o,i} \mid X^*_{o,F}, X^*_{o,M}) \propto p(X^*_{o,i})\, \mathbf{1}\{X^*_{o,i} \sim X^*_{o,F},\ X^*_{o,i} \sim X^*_{o,M}\}.
$$

We also assume marker independence and that all (consistent) genotypes are a priori equally likely.
The prior distribution of the number of QTL is assumed to be truncated Poisson.
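For concreteness, a truncated Poisson prior can be computed as below, using the hyperparameter values from the simulation analysis (Poisson mean 2, at most three QTL per chromosome). Taking the support to be {0, 1, ..., n_max} is an assumption of this sketch; the text does not state whether zero QTL is allowed.

```python
import math


def truncated_poisson_pmf(lam, n_max):
    """Prior probabilities P(N_qtl = n), n = 0..n_max, for a Poisson(lam)
    distribution truncated to {0, ..., n_max} and renormalized."""
    weights = [math.exp(-lam) * lam**n / math.factorial(n) for n in range(n_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


print([round(p, 4) for p in truncated_poisson_pmf(2.0, 3)])
# [0.1579, 0.3158, 0.3158, 0.2105]
```

Truncation only renormalizes the retained Poisson weights, so with mean 2 one and two QTL remain equally likely a priori.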
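As in …, the prior of the QTL genotypes factorizes as

$$
p(\mathbf{x} \mid H^*, l, m, N_{qtl}) = \prod_{q=1}^{N_{qtl}} p(x_q \mid x_1, \ldots, x_{q-1}, H^*, l, m) = \prod_{q=1}^{N_{qtl}} \prod_{i=1}^{N_{ind}} p(x_{qi} \mid H^*_{qi,LR}, r_q).
$$

Note that QTL are not automatically (conditionally) independent of each other.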
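The QTL analysis of the offspring is done in terms of parental haplotypes. The numbers of possible QTL alleles and QTL genotypes in BC and F2 designs are given in Table 1. Given the QTL genotype vector $\gamma = (\gamma_1, \ldots, \gamma_{N_{gen}})$, the prior probabilities for s = 1, ... , Ngen are calculated from the equation

$$
p(x_{qi} = \gamma_s \mid H^*_{qiL}, H^*_{qiR}, r_q) = \prod_{P \in \{F, M\}} \frac{p(\gamma^P_s \mid \omega^P_{L(q),i}(H^*))\, p(\omega^P_{R(q),i}(H^*) \mid \gamma^P_s)}{p(\omega^P_{R(q),i}(H^*) \mid \omega^P_{L(q),i}(H^*))}, \tag{6}
$$

where $\gamma^F_s$ ($\gamma^M_s$) denotes the grandparental origin of the maternal (paternal) allele in QTL genotype $\gamma_s$.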
Here $H^*_{qiL} = (\omega^F_{L(q),i}(H^*), \omega^M_{L(q),i}(H^*))$ and $H^*_{qiR} = (\omega^F_{R(q),i}(H^*), \omega^M_{R(q),i}(H^*))$ are the left and right ordered flanking object (QTL or marker) genotypes in grandparental-origin form. Haplotype coding and the evaluation of the probability in (6) in F2 and backcross designs are illustrated in Figure 1 and Figure 2.
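One standard way to evaluate a flanking-object probability of this kind is shown below. It assumes no crossover interference, so that grandparental origin follows a Markov chain along the chromosome and the QTL origin given both flanks is proportional to (left to QTL transition) x (QTL to right transition). The two meioses are treated as independent given the flanks. Function names, the 0/1 origin coding, and the recombination fractions are hypothetical; whether the paper's (6) takes exactly this form is also an assumption.

```python
def switch_prob(r, a, b):
    """Transition probability of grandparental origin across an interval with
    recombination fraction r (a, b are origin codes 0/1)."""
    return r if a != b else 1.0 - r


def qtl_origin_probs(left, right, r_left, r_right):
    """P(origin at the QTL = o | origin at left flank, origin at right flank)
    for one meiosis, o in {0, 1}, assuming no interference."""
    unnorm = {o: switch_prob(r_left, left, o) * switch_prob(r_right, o, right)
              for o in (0, 1)}
    total = sum(unnorm.values())  # equals P(right origin | left origin)
    return {o: w / total for o, w in unnorm.items()}


def qtl_genotype_probs(left_pair, right_pair, r_left, r_right):
    """Joint probabilities of the four (maternal, paternal) QTL origin pairs,
    e.g. corresponding to genotypes 13, 14, 23, 24 in an F2."""
    mat = qtl_origin_probs(left_pair[0], right_pair[0], r_left, r_right)
    pat = qtl_origin_probs(left_pair[1], right_pair[1], r_left, r_right)
    return {(m, p): mat[m] * pat[p] for m in (0, 1) for p in (0, 1)}
```

When both flanks share the same origins, almost all prior mass falls on the matching QTL genotype; when the flanks disagree, the closer flank dominates.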
| SIMULATION ANALYSIS |
|---|
To test the performance of this method, an outcrossing F2 population of Nind = 200 offspring was generated by a simulation program provided by J. W. Van Ooijen (Centre for Biometry Wageningen, CPRO-DLO, The Netherlands). We considered two 100-cM chromosomes, each with 11 evenly spaced markers, one every 10 cM. The simulated trait had a genetic (QTL) variance of 4.47 and a phenotypic variance of 6.35, giving a heritability of 4.47/6.35 ≈ 0.7. Two sets of parental crosses were generated: in the first set the parental mating type was fully informative (AB x CD) at all marker loci, and in the second set the degree of informativeness, as well as the corresponding linkage phases, varied from locus to locus. The simulated true parental cross of the second set is shown in Figure 3; it is "underlying" in the sense that after the simulation this information was "forgotten" and not used in the Bayesian analyses (as explained below). The genotype-specific phenotype effects and the locations of the three simulated QTL are given in Table 2. All haplotypic assignments in the offspring were assumed unknown. In the statistical analyses, three specifications regarding the amount of parental information were considered: (1) all genotypes and haplotypic assignments in parents were assumed known; (2) all genotypes were assumed known but their phases unknown in parents; and (3) all parental and grandparental marker information was assumed unknown (missing). The performance of our method was compared to that of "all-markers" interval mapping (IM; …)
In addition, the simulated data, in which each QTL had four alleles, were analyzed (in cases 1 and 3) under the incorrect assumption of fixed grandparental lines (the grandfathers were assumed to originate from the same line). This was done to see how this erroneous assumption influences the results.
In all Bayesian analyses described here, our C program implementing a Metropolis-Hastings chain was run for 5,000,000 cycles on a Pentium II/266 MHz computer. No burn-in samples were discarded, but the chain was thinned so that only every fifth iteration was saved, resulting in 1,000,000 sampled values for each parameter. After a preprocessing stage (see Appendix 1), background controls were chosen. When analyzing a real data set, they can be determined by single-marker regression or by performing several analyses. Here, however, we simply chose marker 3 on chromosome 1 and marker 4 on chromosome 2 as background controls; very likely, a few reanalyses would have led to the same conclusion. As no covariates (age, sex, etc.) were used, there was a common intercept ($\beta = a$ and Bi = 1 for all i). With practically no other load on the computer, running times were about 9 hr. The initial number of QTL was three, with locations 20.0, 50.0, and 80.0 cM. The Poisson mean (hyperparameter) was set to $\lambda = 2$ and the maximum number of QTL (in the analyzed chromosome) to three. The prior of the residual standard deviation was uniform over [0.0, 2.55], the right endpoint being equal to the phenotypic standard deviation estimated from the data. The prior of the intercept was uniform on [-13, 13], the priors of the QTL genotypic regression coefficients were independent normal distributions with mean zero and variance 100, and the prior of the background control genotypic regression coefficients was uniform on [-13, 13]. Finally, the prior of the QTL locations was uniform over [0, 100]. The control parameter values used in the final analyses are given in Table 3. The proposal distribution for the genotypic effects (coefficients) was N(0, 0.5) in cases where the addition of a new QTL to the model was proposed.
In the IM and MQM/02 analyses, walking speed was set to 0.5 cM, which is the smallest admissible value in the MAPQTL software. We used the same background controls in MQM/02 as in the Bayesian analyses.
| RESULTS |
|---|
The Bayesian posterior QTL intensities (see Appendix 2) on chromosome 1, when all parental information was present (case 1) or when parental linkage phases were absent (case 2), are shown in Figure 4 (top) when all markers are fully informative, and in Figure 5 (top) when marker informativeness varies from marker to marker. The curves consisting of the pointwise medians and the 2.5 and 97.5% quantiles of the posterior distribution of the phenotypic effects of the four genotypes, as functions of the putative QTL location, are shown in the same figures when all parental information is present (left) or when parental linkage phases are unknown (right). Approximate posterior distributions of the number of QTL on chromosome 1, obtained from these four analyses, are shown in Table 4. The analyses where all parental information was absent (case 3) are not summarized in figures or tables. This is because case 3 is, in theory, not fully identifiable, so that the probabilistic summary measures (the posterior QTL intensity and the posterior distribution of the number of QTL) are not unique. These problems are considered further in the DISCUSSION.
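A posterior QTL intensity of this kind can be estimated directly from the saved MCMC output. In the sketch below, which is illustrative and not the authors' definition, each saved cycle contributes its list of sampled QTL locations, and the intensity in each equal-width bin is assumed to be the posterior expected number of QTL in the bin divided by the bin width (QTL per cM). The function name and the bin layout are hypothetical.

```python
def qtl_intensity(sampled_locations, chrom_length_cM, n_bins):
    """sampled_locations: one list of QTL positions (cM) per saved MCMC cycle;
    its length is the sampled number of QTL in that cycle.
    Returns, for each of n_bins equal-width bins, the posterior expected
    number of QTL in the bin divided by the bin width (QTL per cM)."""
    width = chrom_length_cM / n_bins
    counts = [0] * n_bins
    for locs in sampled_locations:
        for x in locs:
            j = min(int(x / width), n_bins - 1)  # x == length goes to last bin
            counts[j] += 1
    n_cycles = len(sampled_locations)
    return [c / n_cycles / width for c in counts]
```

Because the number of QTL varies between cycles, multiplying the summed intensity by the bin width recovers the posterior mean number of QTL on the chromosome.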
Table 5 gives a brief summary of our findings concerning the localization of QTL as suggested by the QTL intensities in Figure 4 and Figure 5. The table refers directly to (approximate) posterior probabilities that a particular chromosomal region of high QTL-intensity concentration contains a given number of QTL; the corresponding posterior expectations are also given. The analyses support quite strongly the hypothesis of two QTL on chromosome 1.
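Region-level summaries of this type follow from counting, in every saved cycle, how many sampled QTL fall inside the region. The Python sketch below assumes the same input layout as a list of sampled QTL locations per cycle; the function name and region bounds are hypothetical.

```python
from collections import Counter


def region_count_posterior(sampled_locations, lo, hi):
    """Approximate posterior distribution of the number of QTL in [lo, hi]
    (in cM), and its posterior expectation, from saved MCMC cycles."""
    counts = Counter(sum(lo <= x <= hi for x in locs) for locs in sampled_locations)
    n = len(sampled_locations)
    probs = {k: c / n for k, c in sorted(counts.items())}
    expectation = sum(k * p for k, p in probs.items())
    return probs, expectation
```

For example, a region such as [41 cM, 63 cM] can be passed as `lo=41.0, hi=63.0` to obtain the probabilities of zero, one, two, ... QTL in that region.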
In the analyses where all markers were fully informative (Figure 4, top), the two posterior QTL-intensity graphs (from cases 1 and 2) were nearly identical, regardless of whether parental linkage phase information was available. Both graphs were nicely concentrated around the left QTL at 32.7 cM. The graphs surrounding the right (weaker) QTL at 58 cM were much wider, with some bias to the left. However, the true simulated QTL still lies inside the regions [41 cM, 60 cM] and [41 cM, 63 cM] of elevated posterior QTL intensity. In this case (Figure 5, top left), the MQM analysis localized both QTL on chromosome 1 well, but the IM analysis localized only the left QTL. (Note that the posterior QTL-intensity graphs covering the regions [41 cM, 60 cM] and [41 cM, 63 cM] are multimodal. This is apparently the same phenomenon that is typical of LOD-score curves at marker points: because of marker genotyping, there is often more evidence against placing a putative QTL exactly at a marker locus than against placing it somewhere nearby.) It remains somewhat unclear why, of the two modes, the one farther from the true simulated QTL at 58 cM ended up higher in the first case.
It can be seen from Figure 5 that the analysis with nonconstant marker information (case 1) results in high posterior QTL intensities around both simulated QTL on chromosome 1. The IM and MQM analyses localized the "left" QTL at 32.7 cM quite well, but both methods localized the "right" QTL at 58 cM poorly. Somewhat surprisingly, with the Bayesian method the left, more influential QTL was localized less accurately than the right QTL when linkage phases were available in parents. This may be because there is a highly informative marker very close to the right QTL, whereas this is not the case for the left QTL (see Table 6). As expected, localization was somewhat less accurate when the parental genotypes or their linkage phases were not available.
Consider next the estimation of the phenotypic effects, indicated by asterisks in Figure 4 and Figure 5. As expected, the estimation was most successful in the case (Figure 4, left) where marker information was complete and complete parental information was available. With nonconstant marker information, but still assuming complete knowledge of the parental genotypes and linkage phases, the estimates were somewhat less accurate, with some of the true values just outside the 95% credible boundaries (Figure 5, left). When analyzing real data, the true labeling of the phenotypic effects [i.e., the assignment of the QTL genotypes (13, 14, 23, 24) to the true grandparental alleles] is almost always unknown (except for QTL genes that have been positionally cloned). If parental genotype and/or linkage phase information is missing, the labeling of the genotypic effects according to the grandparental origin of the alleles also becomes nonunique in the simulated case. Therefore, when comparing the phenotypic effect estimates with the true values used in the simulation, we have to make sure that each estimate is matched with the correct combination of two grandparental QTL alleles. Such reassignment of the QTL genotypes is indicated by circles on the right-hand side of Figure 4 and Figure 5. Note that on chromosome 1 the genotype labels are not consistent with each other in case 2.
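The reassignment of estimates to true genotype labels can be sketched as a small search. The assumption here, which is ours, is that unknown parental phase only permits swapping grandparents 1 and 2 within one parent and/or 3 and 4 within the other, giving four candidate relabelings. The sketch keeps the one whose relabeled estimates are closest to the simulated truth in squared error. All names are hypothetical.

```python
from itertools import product

GENOTYPES = ("13", "14", "23", "24")


def relabel(genotype, swap12, swap34):
    """Apply a within-parent swap of grandparental labels to a genotype."""
    a, b = genotype
    if swap12:
        a = {"1": "2", "2": "1"}[a]
    if swap34:
        b = {"3": "4", "4": "3"}[b]
    return a + b


def best_relabeling(estimates, truth):
    """estimates, truth: dicts genotype label -> phenotypic effect.
    Returns the mapping (estimated label -> true label) among the four
    phase-induced relabelings that minimizes the squared error."""
    best = None
    for swap12, swap34 in product((False, True), repeat=2):
        mapping = {g: relabel(g, swap12, swap34) for g in GENOTYPES}
        err = sum((estimates[g] - truth[mapping[g]]) ** 2 for g in GENOTYPES)
        if best is None or err < best[0]:
            best = (err, mapping)
    return best[1]
```

This matching uses the true values, so it is only usable in simulation studies; with real data the labels remain as ambiguous as described above.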
The performance of the IM and MQM methods in the estimation of the phenotypic coefficients of the putative QTL was not particularly good. Moreover, they do not provide confidence intervals for such point estimates. Confidence intervals would have to be determined separately, for example, by employing bootstrap techniques.
The point estimates of QTL locations and their support regions are summarized in Table 7 for four different analyses of chromosome 1.
When considering chromosome 2 (which was analyzed only in cases 1 and 3), the posterior QTL-intensity graphs (see Figure 6) were all nicely concentrated around the simulated true QTL at 41.2 cM, regardless of whether the markers were fully informative or not. Also, the IM and MQM methods were able to localize the QTL at 41.2 cM quite well.
The performance of the analyses (cases 1 and 3) in which it was incorrectly assumed that the grandparental lines are fixed (figures not shown) was quite poor on chromosome 1. The only exception was the case where all parental information was available and all markers were fully informative: the simulated QTL at 32.7 cM was then localized rather well, and there was also some indication of QTL activity around the QTL at 58 cM. When fixation was assumed in the situation where all markers were fully informative but all parental information was absent, only the latter QTL produced a high (but broad) QTL-intensity concentration.
| DISCUSSION |
|---|
We have presented here a Bayesian procedure for mapping multiple QTL from incomplete outbred offspring data, thus extending our earlier method.
We tested the performance of our method by using simulated F2 data sets (two informativeness levels) with varying degrees of parental marker information (three levels). It seems intuitively plausible, and it also became clear from our simulations, that the availability of parental linkage phase information is more important when the markers are not fully informative. The situation in which some of the offspring marker genotypes are also missing was not considered in the test analyses.
Standardization of the phenotypic data is recommended before applying Bayesian QTL mapping in practice. The same proposal windows and other control parameters can then be applied to different data sets, instead of performing separate test trials for each. Another possible advantage is improved numerical accuracy, since floating-point numbers have their finest absolute resolution for values of small magnitude.
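A minimal sketch of the standardization step and of transforming results back afterwards, using only the Python standard library; the function names are hypothetical. Because y = mean + sd * z, genotypic effects estimated on the standardized scale scale by sd, the intercept additionally regains the mean, and the residual variance scales by sd squared.

```python
import statistics


def standardize(y):
    """Return z-scores of the phenotypes plus the (mean, sd) used, so that
    estimates can later be transformed back to the original scale."""
    mean = statistics.fmean(y)
    sd = statistics.stdev(y)
    return [(v - mean) / sd for v in y], mean, sd


def to_original_scale(effect_std, intercept_std, sigma2_std, mean, sd):
    """Back-transform estimates obtained on the standardized scale:
    genotypic effects scale by sd, the intercept also regains the mean,
    and the residual variance scales by sd**2."""
    return effect_std * sd, intercept_std * sd + mean, sigma2_std * sd**2
```

Priors specified on the standardized scale (for example, uniform ranges for the intercept) then translate into data-dependent ranges on the original scale, which is worth keeping in mind when reporting results.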
The marker covariates can be chosen by applying simple linear regression at each marker (putative QTL) position, omitting individuals whose genotype at that locus is unknown (because data augmentation would require linkage phase information). In doing so, one should pay attention to how much information a potential covariate marker carries and how many missing values it has. If a region of interest contains no fully informative markers, one can often find two closely linked markers each of which is informative with respect to only one (and a different) parent.
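The single-marker screening can be sketched as follows. This is a simplification under stated assumptions: each marker genotype class is treated as a factor level, R² is used as the ranking criterion (the text does not name one), and individuals with an unknown genotype are dropped exactly as described. Names are hypothetical.

```python
from collections import defaultdict


def single_marker_r2(phenotypes, genotypes):
    """R^2 of a regression of the phenotype on genotype class at one marker,
    omitting individuals whose genotype (or phenotype) is unknown (None)."""
    groups = defaultdict(list)
    for y, g in zip(phenotypes, genotypes):
        if y is not None and g is not None:
            groups[g].append(y)
    if len(groups) < 2:
        return 0.0  # no contrast between genotype classes
    values = [y for ys in groups.values() for y in ys]
    grand = sum(values) / len(values)
    ss_total = sum((y - grand) ** 2 for y in values)
    ss_between = sum(len(ys) * (sum(ys) / len(ys) - grand) ** 2
                     for ys in groups.values())
    return ss_between / ss_total if ss_total > 0 else 0.0


def rank_markers(phenotypes, marker_genotypes):
    """Order candidate covariate markers by single-marker R^2 (highest first)."""
    return sorted(marker_genotypes,
                  key=lambda name: single_marker_r2(phenotypes, marker_genotypes[name]),
                  reverse=True)
```

R² ignores how many individuals were dropped, so in practice the number of missing genotypes per marker should be inspected alongside the ranking, as the paragraph above advises.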
Parental mating type usually varies from locus to locus in outcrossing experiments. Thus, systematically applying an index that describes the proportion of informative meioses locally present in the data will help the analyst quantify how well a QTL can be localized in different regions of the chromosome. One such measure is displayed in Table 6. The influence of marker informativeness (cf. marker polymorphism in …)
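One simple index of this kind, not necessarily the measure used in Table 6, is the expected fraction of meioses (one per parent per offspring) in which the transmitted parental allele, and hence its grandparental origin, can be identified from the unordered offspring genotype. The sketch below computes it for a single marker by enumerating the four equally likely transmissions; the function name is hypothetical.

```python
from itertools import product


def informative_meiosis_fraction(female, male):
    """Expected fraction of meioses in which the allele transmitted by a
    parent can be identified from the unordered offspring genotype, at one
    marker with parental genotypes such as ("A", "B")."""
    transmissions = list(product(range(2), range(2)))  # (index from female, from male)
    informative = 0
    for kf, km in transmissions:
        child = sorted((female[kf], male[km]))
        # All transmissions that would yield the same unordered child genotype.
        compatible = [(f, m) for f, m in transmissions
                      if sorted((female[f], male[m])) == child]
        for parent, idx in ((female, 0), (male, 1)):
            if parent[0] == parent[1]:
                continue  # homozygous parent: origin never identifiable
            if len({parent[t[idx]] for t in compatible}) == 1:
                informative += 1
    return informative / (2 * len(transmissions))


print(informative_meiosis_fraction(("A", "B"), ("C", "D")))  # 1.0
print(informative_meiosis_fraction(("A", "B"), ("A", "B")))  # 0.5
```

Under this index an AB x AC mating is also fully informative, while AB x AB loses every heterozygous offspring and AB x AA loses all meioses of the homozygous parent.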
The phenotypic effects can be estimated reliably only in chromosomal regions in which the posterior QTL intensity is sufficiently high. As an alternative to the locationwise posterior densities for phenotypic effects shown in Figure 4 and Figure 5, the posterior density can be constructed as an expectation over several pointwise values (of phenotypic effects), each being associated with a putative QTL location within a particular region of high posterior QTL intensity. One such posterior density is shown in Figure 7.
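Such a region-averaged posterior can be approximated by pooling the sampled effects of every QTL whose sampled location lies in the region, across all saved cycles. The data layout below (a list of (location, effects) pairs per cycle) and the function name are assumptions of this sketch.

```python
def pooled_effect_samples(samples, lo, hi, genotype):
    """samples: per saved MCMC cycle, a list of (location_cM, effects) pairs,
    where effects maps a QTL genotype label to its sampled phenotypic effect.
    Collects the sampled effect of `genotype` for every QTL located in
    [lo, hi]; a histogram of the result approximates the region-averaged
    posterior density of that effect."""
    return [effects[genotype]
            for cycle in samples
            for loc, effects in cycle
            if lo <= loc <= hi]


samples = [[(32.0, {"13": 1.0})],
           [(35.0, {"13": 1.2}), (58.0, {"13": -0.5})],
           []]
print(pooled_effect_samples(samples, 30.0, 40.0, "13"))  # [1.0, 1.2]
```

Pooling weights each location by how often the sampler visited it, so the result is an expectation over putative QTL locations within the region rather than a pointwise posterior.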
There appear to be two possible philosophies about how the indexing of QTL genotypes should be interpreted. Considering QTL genotype 13, for example, the first interpretation says that lines 1 and 3 are names for the parental haplotypes. In this case the remaining uncertainty concerning linkage phase is in how the grandparental alleles are assigned to these haplotypes. According to the second interpretation, lines 1 and 3 are names for the grandparental lines (alleles), and uncertainty is in the assignment of the parental haplotypes to these lines. Obviously, these two ways of thinking lead to different results only when there is some uncertainty in the parental linkage phases. We have adopted here the first interpretation, even though the second one is in some sense more fundamental in the context of QTL mapping.
We stress that in situations where all parental information is missing (case 3), it will be problematic to assign unique grandparental origins to the estimated phenotype effects. In this situation, both parents have symmetric pairs of haplotype configurations that are a posteriori equally likely to be the correct underlying mating structure. As a consequence, the correspondence between QTL genotypes (13, 14, 23, and 24; cf. Figure 1) and their grandparental alleles is not unique. In our program, the assignment can actually change from one iteration cycle to another within one MCMC run, and all the more so between different runs. (In practice such changes are rare because of the strong local dependence between offspring and their parents and between adjacent loci.) In case 3, the parental phase reconstruction can also change suddenly, in some region of the chromosome, to a symmetric mating type. (This can be checked only with simulated data.) The resulting posterior QTL-intensity curves can then also differ in such regions between MCMC runs.
In cases 1 and 2, the very strong local dependence between parents and offspring and between adjacent loci will in practice prevent such phase transitions within a single MCMC run. Therefore, to avoid problems of this kind, we strongly recommend that at least one of the parents be genotyped at several marker loci along the chromosome, spaced as evenly as possible.
Locally, of course, if there is a fully informative (reference) marker, in case 3 we can also avoid such identifiability problems, and the resulting averaging in estimation, by fixing the assignments (segregation indicators) arbitrarily at the reference marker. As long as the genetic distance from the marker is short, haplotype assignments can then be made in a way that is, with high probability, consistent with that chosen at the reference marker locus. If this informative marker is near a contemplated QTL, this technique will also facilitate the estimation of the corresponding phenotypic effects by keeping the four haplotypic assignments (and thus the corresponding QTL allele combinations) apart. A drawback of this technique is that it works only locally, because simultaneous haplotype assignments at two or more marker positions might not agree with the true haplotype configuration. As a consequence, estimation would need a new MCMC run for each such local assignment.
| ACKNOWLEDGMENTS |
|---|
M.S. thanks Matti Taskinen for his advice in the programming work, and Päivi Hurme and Outi Savolainen for many useful discussions about the designs. We are grateful to Johan Van Ooijen for providing his simulation program, which was used to generate test data sets, and to Pekka Uimari and three anonymous referees for their constructive comments on the manuscript. This work was supported by a research grant (no. 38352) from the Academy of Finland, and by the ComBi Graduate School.
Manuscript received July 6, 1998; Accepted for publication December 28, 1998.
| APPENDIX 1 |
|---|
PREPROCESSING AND PARAMETER ESTIMATION
Before the actual statistical analysis, the data go through a preprocessing stage in which we infer as much of the marker genotype and linkage phase information as possible by direct logical deduction from the known parts of the family structure. The deduction rules (applied sequentially until no new assignments are made) are similar to the genotyping rules of …
Consider a multiallelic marker on the chromosome to be analyzed, at which the genotypes of the parents are still unknown after the logical deductions. Suppose further that genotypes or complete haplotypes are imputed for the two parents by updating them one at a time. Once the genotype of one parent has been imputed, some offspring genotypes may then uniquely determine the genotype of the other parent, so that single-parent updates can get stuck. To avoid this and to make the sampler work more efficiently, the genotypes of the parents are considered jointly, and they have to form a pair that is consistent with the offspring genotypes. Therefore, at each marker locus we go through all possible allele combinations in the parents, one at a time, and check whether each is consistent with the offspring genotypes; all inconsistent pairs are eliminated. In a backcross, one additionally needs to check consistency between the genotypes of related parents.
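The joint screening of parental genotype pairs can be sketched as below. Assumptions of this sketch: genotypes are unordered allele pairs, parental alleles are restricted to alleles observed among the offspring (a real implementation would also have to allow unobserved alleles), and the extra backcross check on related parents is omitted. The function name is hypothetical.

```python
from itertools import combinations_with_replacement


def consistent_parent_pairs(offspring_genotypes):
    """All (female, male) unordered genotype pairs that can produce every
    observed offspring genotype (None = missing) by Mendelian transmission."""
    observed = [tuple(sorted(g)) for g in offspring_genotypes if g is not None]
    alleles = sorted({a for g in observed for a in g})
    candidates = list(combinations_with_replacement(alleles, 2))
    pairs = []
    for female in candidates:
        for male in candidates:
            producible = {tuple(sorted((f, m))) for f in female for m in male}
            if all(g in producible for g in observed):
                pairs.append((female, male))
    return pairs


print(consistent_parent_pairs([("A", "C"), ("A", "D"), ("B", "C"), ("B", "D")]))
# [(('A', 'B'), ('C', 'D')), (('C', 'D'), ('A', 'B'))]
```

A sampler can then propose uniformly from the returned list, which is the "consistent pair of genotypes" used in step 2 below.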
Sometimes a block-update is preferred over a single-site update in MCMC applications to pedigrees.
In the following, we describe only those parts of the estimation algorithm that differ from those in …
- Step 2. The following is repeated for each marker, j = 1, ... , N: A new ordered genotype proposal (family-block) at the jth position is constructed as follows:
- 1. If one or both genotypes in parents are unknown, a consistent pair of genotypes is proposed. Each consistent genotype-pair is considered as equally likely.
- 2. If unknown, their allelic origins are also proposed considering each configuration as equally likely.
- 3. Incomplete offspring genotypes are completed by taking one allele (with equal transmission probabilities) from each parent. These transmissions simultaneously specify the allelic origins and the grandparental origins, which are then updated accordingly.
- 4. Unknown allelic origins of known offspring genotypes are determined by using deduction. Origins of a homozygote can be assigned randomly, and an offspring allele not found in one parent must originate from the other parent. If some origins are left uncertain, they are proposed with equal probabilities.
- 5. Grandparental origins are determined for offspring alleles having a heterozygous parent, but are randomly assigned for alleles inherited from homozygotes.
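For the offspring part of the family-block proposal, sub-steps 3 to 5 can be condensed into the following simplified Python sketch. It assumes that the parental genotypes and phases at marker j have already been proposed (sub-steps 1 and 2), encodes grandparental origin as haplotype index 0 or 1, and draws uniformly among the transmissions consistent with each offspring's genotype. The function name is hypothetical, and the acceptance step is not included.

```python
import random


def propose_offspring_origins(female, male, offspring, rng):
    """Simplified sketch of sub-steps 3-5 at one marker.

    female, male: phased parental genotypes (allele on haplotype 0, allele on
        haplotype 1); the haplotype index plays the role of grandparental origin.
    offspring: unordered offspring genotypes as tuples, or None if untyped.
    Returns, for each offspring, a sampled (origin from female, origin from male).
    """
    all_choices = [(f, m) for f in (0, 1) for m in (0, 1)]
    proposal = []
    for g in offspring:
        if g is None:
            # Untyped: one allele from each parent with equal probabilities.
            choices = all_choices
        else:
            target = sorted(g)
            choices = [(f, m) for f, m in all_choices
                       if sorted((female[f], male[m])) == target]
            if not choices:
                raise ValueError(f"genotype {g} is inconsistent with the parents")
        # Remaining ambiguity (homozygous parent, AB x AB offspring AB, ...)
        # is resolved uniformly at random.
        proposal.append(rng.choice(choices))
    return proposal
```

Passing an explicit `random.Random` instance keeps the proposals reproducible, which is convenient when comparing runs of an MCMC sampler.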
The family-block proposal H*new(j) is accepted, separately for each marker j, with probability

If the proposal for marker j is accepted, then H*(t)(j) = H*new(j); otherwise H*(t)(j) = H*(t-1)(j). Here the notation H*(t)(j) refers to the family-block haplotype at the jth marker in the tth round, while

$$
H^{*(t,\mathrm{new}(j))} = (H^{*(t)}(1), \ldots, H^{*(t)}(j-1), H^*_{\mathrm{new}}(j), H^{*(t-1)}(j+1), \ldots, H^{*(t-1)}(N)),
$$

$$
H^{*(t,j)} = (H^{*(t)}(1), \ldots, H^{*(t)}(j), H^{*(t-1)}(j+1), \ldots, H^{*(t-1)}(N)),
$$

and

$$
f_{j,i}(H^*_1, H^*_2) = p(\omega^F_{j+1,i}(H^*_1) \mid \omega^F_{j,i}(H^*_2))\, p(\omega^F_{j,i}(H^*_2) \mid \omega^F_{j-1,i}(H^*_1))\, p(\omega^M_{j+1,i}(H^*_1) \mid \omega^M_{j,i}(H^*_2))\, p(\omega^M_{j,i}(H^*_2) \mid \omega^M_{j-1,i}(H^*_1)).
$$
Step 3. Random-walk proposals for regression parameters are generated in three different blocks: (1) mean, environmental covariates, and residual standard deviation; (2) all QTL genotypic coefficients; and (3) all background control coefficients. Denote by $L_1$ ($L_2$) the likelihood and by $p_1$ ($p_2$) the normal density prior for the QTL genotypic coefficients evaluated at the new (old) values. The proposals are accepted separately for each block with probability $\min\{1, L_1 p_1/(L_2 p_2)\}$. If accepted, the parameters of the block take their proposed values in round $t$; otherwise they keep their values from round $t-1$. (In block 3, the acceptance ratio is evaluated separately for each background control.)
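The block update of Step 3 is a random-walk Metropolis step: because a Gaussian random-walk proposal is symmetric, the proposal densities cancel and the acceptance probability reduces to $\min\{1, L_1 p_1/(L_2 p_2)\}$. The sketch below is a hedged illustration for one block of regression coefficients only. It assumes a normal linear model with a fixed residual standard deviation `sigma`, independent $N(0, \tau^2)$ priors, and a hypothetical step size `step`; the function names are illustrative and not taken from Multimapper/outbred.

```python
import math
import random
from typing import List, Sequence, Tuple


def log_likelihood(y: Sequence[float], X: Sequence[Sequence[float]],
                   beta: Sequence[float], sigma: float) -> float:
    """Normal log-likelihood of y under the linear predictor X @ beta."""
    ll = 0.0
    for yi, xi in zip(y, X):
        mu = sum(b * x for b, x in zip(beta, xi))
        ll += -0.5 * math.log(2.0 * math.pi * sigma**2) - (yi - mu) ** 2 / (2.0 * sigma**2)
    return ll


def log_normal_prior(beta: Sequence[float], tau: float) -> float:
    """Independent N(0, tau^2) log-prior (an assumption of this sketch)."""
    return sum(-0.5 * math.log(2.0 * math.pi * tau**2) - b * b / (2.0 * tau**2) for b in beta)


def rw_block_update(beta: List[float], y, X, sigma: float, tau: float,
                    step: float, rng: random.Random) -> Tuple[List[float], bool]:
    """One random-walk Metropolis update of a whole coefficient block."""
    proposal = [b + rng.gauss(0.0, step) for b in beta]
    # log(L1 * p1) - log(L2 * p2): new values versus old values.
    log_ratio = (
        log_likelihood(y, X, proposal, sigma) + log_normal_prior(proposal, tau)
        - log_likelihood(y, X, beta, sigma) - log_normal_prior(beta, tau)
    )
    # Accept with probability min{1, L1*p1 / (L2*p2)}.
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return proposal, True
    return beta, False
```

Working on the log scale avoids underflow of $L_1$ and $L_2$ for moderately large data sets, and comparing the uniform draw against `exp(min(0, log_ratio))` avoids taking the logarithm of a draw that could be exactly zero. Evaluating the ratio separately for each background control, as in block 3, would amount to perturbing one coefficient at a time while holding the others fixed.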
Step 4. Imputation for the missing background control markers is done as in …
| APPENDIX 1 |
|---|
As in …, consider the approximate posterior QTL intensities on intervals $1, 2, \ldots, N_{\text{bins}}$, where the intensity on interval $j$ is obtained from the Monte Carlo simulation of $N_{\text{cycs}}$ iteration cycles. In a backcross or an F2 intercross, let
… (7)
… and $\mu^{(k)}_{q} = \sum_{x=1}^{N_{\text{gen}}} \ldots$. If fixation of QTL alleles in different grandparental lines is assumed, we can use distribution functions similar to those presented for F2 in …
| LITERATURE CITED |
|---|
GREEN, P. J., 1995 Reversible jump Markov Chain Monte Carlo computation and Bayesian model determination. Biometrika 82:711-732.
HALEY, C. S., S. A. KNOTT, and J.-M. ELSEN, 1994 Mapping quantitative trait loci in crosses between outbred lines using least squares. Genetics 136:1195-1207.
HEATH, S. C., 1997 Markov chain Monte Carlo segregation and linkage analysis for oligogenic models. Am. J. Hum. Genet. 61:748-760.
HOESCHELE, I., P. UIMARI, F. E. GRIGNOLA, Q. ZHANG, and K. M. GAGE, 1997 Advances in statistical methods to map quantitative trait loci in outbred populations. Genetics 147:1445-1457.
JANSEN, R. C., 1993 Interval mapping of multiple quantitative trait loci. Genetics 135:205-211.
JANSEN, R. C., 1996 A general Monte Carlo method for mapping multiple quantitative trait loci. Genetics 142:305-311.
JANSEN, R. C., and P. STAM, 1994 High resolution of quantitative traits into multiple loci via interval mapping. Genetics 136:1447-1455.
JANSEN, R. C., D. L. JOHNSON, and J. A. M. VAN ARENDONK, 1998 A mixture model approach to the mapping of quantitative trait loci in complex populations with an application to multiple cattle families. Genetics 148:391-399.
JANSS, L. L., G. R. THOMPSON, and J. A. M. VAN ARENDONK, 1995 Application of Gibbs sampling for inference in a mixed major gene-polygenic inheritance model in animal populations. Theor. Appl. Genet. 91:1137-1147.
JENSEN, C. S., and A. KONG, 1997 Blocking Gibbs sampling for linkage analysis in large pedigrees with many loops. Manuscript available at MCMC preprint service (http://www.stats.bris.ac.uk/MCMC/).
JENSEN, C. S., and N. SHEEHAN, 1998 Problems with determination of noncommunicating classes for Monte Carlo Markov Chain applications in pedigree analysis. Biometrics 54:416-425.
KAO, C.-H., and Z.-B. ZENG, 1997 General formulas for obtaining the MLEs and the asymptotic variance-covariance matrix in mapping quantitative trait loci when using the EM algorithm. Biometrics 53:653-665.
KNOTT, S. A., D. B. NEALE, M. M. SEWELL, and C. S. HALEY, 1997 Multiple marker mapping of quantitative trait loci in an outbred pedigree of loblolly pine. Theor. Appl. Genet. 94:810-820.
KONG, A., 1991 Analysis of pedigree data using methods combining peeling and Gibbs sampling, pp. 379-385 in Computer Science and Statistics: Proceedings of the 23rd Symposium on the Interface, edited by E. M. KERAMIDAS and S. M. KAUFMAN. Interface Foundation, Fairfax Station, VA.
KRUGLYAK, L., 1997 The use of a genetic map of biallelic markers in linkage studies. Nat. Genet. 17:21-24.
KRUGLYAK, L., M. J. DALY, and E. S. LANDER, 1995 Rapid multipoint linkage analysis of recessive traits in nuclear families, including homozygosity mapping. Am. J. Hum. Genet. 56:519-527.
LANDER, E. S., and P. GREEN, 1987 Construction of multilocus genetic linkage maps in humans. Proc. Natl. Acad. Sci. USA 84:2363-2367.
LIN, S., 1995 A scheme for constructing an irreducible Markov Chain for pedigree data. Biometrics 51:318-322.
LIN, S., E. THOMPSON, and E. WIJSMAN, 1994 Finding noncommunicating sets for Markov Chain Monte Carlo estimation on pedigrees. Am. J. Hum. Genet. 54:695-704.
MALIEPAARD, C., and J. W. VAN OOIJEN, 1994 QTL mapping in a full-sib family of an outcrossing species, pp. 140-146 in Biometrics in Plant Breeding: Applications of Molecular Markers, edited by J. W. VAN OOIJEN and J. JANSEN. CPRO-DLO, Wageningen, The Netherlands.
RICHARDSON, S., and P. J. GREEN, 1997 On Bayesian analysis of mixtures with an unknown number of components. J. R. Stat. Soc. Ser. B 59:731-792.
SATAGOPAN, J. M., and B. S. YANDELL, 1996 Estimating the number of quantitative trait loci via Bayesian model determination. Special Contributed Paper Session on Genetic Analysis of Quantitative Traits and Complex Diseases, Biometric Section, Joint Statistical Meetings, Chicago, IL (available at ftp://ftp.stat.wisc.edu/pub/yandell/revjump.html/).
SATAGOPAN, J. M., B. S. YANDELL, M. A. NEWTON, and T. C. OSBORN, 1996 A Bayesian approach to detect quantitative trait loci using Markov Chain Monte Carlo. Genetics 144:805-816.
SHEEHAN, N., and A. THOMAS, 1993 On the irreducibility of a Markov chain defined on a space of genotype configurations by a sampling scheme. Biometrics 49:163-175.
SILLANPÄÄ, M. J., and E. ARJAS, 1998 Bayesian mapping of multiple quantitative trait loci from incomplete inbred line cross data. Genetics 148:1373-1388.
SOBEL, E., and K. LANGE, 1996 Descent graphs in pedigree analysis: application to haplotyping, location scores, and marker-sharing statistics. Am. J. Hum. Genet. 58:1323-133