help button home button Genetics AJP: Gastro and Liver
HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS

This Article
Right arrow Abstract Freely available
Right arrow Full Text (PDF)
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Right arrow reprints & permissions
Citing Articles
Right arrow Citing Articles via HighWire
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Pérez-Enciso, M.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Pérez-Enciso, M.
Genetics, Vol. 163, 1497-1510, April 2003, Copyright © 2003

Fine Mapping of Complex Trait Genes Combining Pedigree and Linkage Disequilibrium Information: A Bayesian Unified Framework

Miguel Pérez-Encisoa
a Institut National de la Recherche Agronomique, Station d'Amélioration Génétique des Animaux, 31326 Castanet-Tolosan, France

Corresponding author: Miguel Pérez-Enciso, Station d'Amélioration Génétique des Animaux, BP 27, Cedex 31326 Castanet-Tolosan, France., mperez{at}toulouse.inra.fr (E-mail)

Communicating editor: C. HALEY


*  ABSTRACT
*TOP
*ABSTRACT
*THEORY
*SIMULATION
*RESULTS
*DISCUSSION
*APPENDIX
*LITERATURE CITED

We present a Bayesian method that combines linkage and linkage disequilibrium (LDL) information for quantitative trait locus (QTL) mapping. This method uses jointly all marker information (haplotypes) and all available pedigree information; i.e., it is not restricted to any specific experimental design and it is not required that phases are known. Infinitesimal genetic effects or environmental noise ("fixed") effects can equally be fitted. A diallelic QTL is assumed and both additive and dominant effects can be estimated. We have implemented a combined Gibbs/Metropolis-Hastings sampling to obtain the marginal posterior distributions of the parameters of interest. We have also implemented a Bayesian variant of usual disequilibrium measures like D' and r2 between QTL and markers. We illustrate the method with simulated data in "simple" (two-generation full-sib families) and "complex" (four-generation) pedigrees. We compared the estimates with and without using linkage disequilibrium information. In general, using LDL resulted in estimates of QTL position that were much better than linkage-only estimates when there was complete disequilibrium between the mutant QTL allele and the marker. This advantage, however, decreased when the association was only partial. In all cases, additive and dominant effects were estimated accurately either with or without disequilibrium information.


AN ultimate goal of quantitative trait loci (QTL) studies is to clone the gene(s) responsible for the genetic differences between individuals and, eventually, identify the causal mutation(s). Certainly, this is a daunting task that will be accomplished only gradually. One of the most severe limitations, at the moment, is that the QTL position is estimated with too large an error to allow positional cloning when a classical linkage analysis is employed. The 95% confidence interval for the QTL position usually spans over 5–20 cM, at a minimum. The wide confidence interval occurs because the number of meioses in the genotyped pedigree is usually very small; only between two and three generations are generally employed. Linkage disequilibrium (LD)-based methods, in contrast, capitalize on the number of generations that occurred since the appearance of mutation and can produce extremely accurate estimates of the gene position, within kilobases in some instances (HASTBACKA et al. 1994 Down). Nevertheless, the chance of success of the LD strategy depends on a number of population parameters, such as the degree of admixture in the sampled population, the actual level of association between the causal mutation and the polymorphisms, or the correct ascertainment of phases and of genotypes at the QTL. Of course these parameters are usually unknown but do dramatically affect the results (TERWILLIGER and WEISS 1998 Down). In fact, a pure LD analysis is likely to result in a large number of false positives as illustrated recently, e.g., in Alzheimer's disease (EMAHAZION et al. 2001 Down).

A promising approach is thus to combine both linkage and linkage disequilibrium (LDL) methods to add their advantages in a single unified theoretical framework. More specifically, there is an urgent need for robust methods that provide accurate estimation of the QTL position. Consider for the sake of illustration a simple design where a number of nuclear families are typed, i.e., parents and offspring. The theoretical advantages of combining linkage disequilibrium and pedigree (linkage) information in QTL analysis are manifold: (i) A marker for which a parent is homozygous does not contribute information in a linkage analysis, yet it does in LD analysis; (ii) conversely, two parents may share the same haplotype but not necessarily the same QTL genotypes, and a pure LD analysis would be misleading but the phenotype of offspring together with the ascertainment of alleles transmitted can be used to determine which are the most likely QTL genotypes of the parents; (iii) an individual without relatives but with phenotype records can be included in the LD analysis, in contrast to a pure linkage study; and (iv) a comparison of the analyses including or not the LD information can assess the validity of the LD model assumptions (i.e., one mutation t generations ago).

Several authors have addressed the problem of combining LD and linkage mapping for quantitative trait loci (ZHAO et al. 1998 Down; ALLISON et al. 1999 Down; ALMASY et al. 1999 Down; FULKER et al. 1999 Down; WU and ZENG 2001 Down; FARNIR et al. 2002 Down; MEUWISSEN et al. 2002 Down), whereas XIONG and JIN 2000 Down proposed a method suited to disease susceptibility genes. ZHAO et al. 1998 Down developed a semiparametric procedure based on the score-estimating equation approach and that addressed the particular case of single-nucleotide polymorphisms. This is one of the first articles to provide a theoretical framework for LDL mapping but the estimating equation approaches are difficult to implement in practice; they require complex computations adapted to each family structure. For instance, the method sums over all possible phases and computes their probabilities, which is extremely complex to do in practice beyond a few markers. The statistical properties of these estimators are also unknown.

FULKER et al. 1999 Down developed a sib-pair analysis in a likelihood framework. The approach followed by ALLISON et al. 1999 Down is a generalization of the transmission disequilibrium test (TDT) for quantitative traits (ALLISON 1997 Down), where a between- and within-family association parameter is modeled via a mixed model. Neither the FULKER et al. 1999 Down nor ALLISON et al. 1999 Down methods are very suited to analyzing complex pedigrees as they consider sib pairs (FULKER et al. 1999 Down) or parent-offspring trios (ALLISON et al. 1999 Down) and their theoretical framework is difficult to generalize to more complex settings. TDT in particular is not an optimum choice to deal with very polymorphic markers like microsatellites and makes use of only a limited amount of the total information contained in a typical pedigree. MEUWISSEN et al. 2002 Down, in turn, proposed to model the QTL alleles as a random variable, where the covariance between base population haplotypes allows the inclusion of the LD information (MEUWISSEN and GODDARD 2000 Down), and the covariance between non-base population haplotypes was computed as in FERNANDO and GROSSMAN 1989 Down and GODDARD 1992 Down. They estimated the position via maximum likelihood. The model followed by these authors is different from the usual LD, where a diallelic QTL is assumed. The key issue in their method is to compute the identity-by-descent probabilities between the base population haplotypes, and this was done by considering the number of identity-by-state alleles shared by any two haplotypes, along the lines also suggested by MCPEEK and STRAHS 1999 Down. They assumed that phases are known, which is a reasonable assumption only if families are very large, e.g., as in dairy cattle. Otherwise, QTL positioning can be dramatically affected if a phase is incorrectly specified. FARNIR et al. 2002 Down developed an analytical approach for combining linkage and LD in half-sib families, where the disequilibrium information is incorporated via TERWILLIGER'S 1995 Down approach. Their method would be very cumbersome to generalize to more complex populations; in addition, phases are assumed to be known and it is not a true multipoint method. The method of WU and ZENG 2001 Down, intended for natural populations, is also difficult to apply to complex pedigrees.

Here we present a Bayesian method that combines linkage and LD information for QTL mapping within a unified theoretical framework. Our LDL method uses jointly all marker information, as well as all available pedigree information; i.e., it is not restricted to any specific experimental design and it is not required that phases be known. If desired, infinitesimal genetic effects or environmental noise (fixed) effects can also be fitted. A diallelic QTL is assumed and both additive and dominant effects can be estimated. We have implemented a combined Gibbs/Metropolis-Hastings sampling to obtain the marginal posterior distributions of the parameters of interest. We illustrate the method with simulated data.


*  THEORY
*TOP
*ABSTRACT
*THEORY
*SIMULATION
*RESULTS
*DISCUSSION
*APPENDIX
*LITERATURE CITED

We assume that the goal of the analysis is to fine map a QTL that has been previously located within a given genome region. The genetic model presupposes that a single mutation occurred t generations ago on a gene affecting the trait studied. Thus, initially, a single ancestral (founder) haplotype harbored the mutation. The number of haplotypes carrying the mutation increases in successive generations provided that the mutation is not lost and, due to recombination, the initial allele combination is eroded. The amount of disequilibrium between markers and QTL decreases proportionally to genetic distance and to the number of generations elapsed since mutation. Here we use the population model for linkage disequilibrium decay described in MORRIS et al. 2000 Down, with modifications described below. Briefly, a binary variable Ski is defined such that, at any kth marker locus and ith individual, the locus will be either identical by descent (IBD) with the original haplotype carrying the mutation (Ski = -) or not (Ski = +), with minus and plus signs standing for the mutant and wild haplotype alleles, respectively. By convention we denote the QTL by locus 0. A Markov chain Monte Carlo (MCMC) method was provided by MORRIS et al. 2000 Down to obtain the transition probabilities of a locus being IBD or not at locus k + 1 conditional on being IBD or not at locus k.

Now suppose that the QTL additive and dominance effects are a and d, respectively; i.e., the mean phenotype of the individuals homozygous for the wild allele (+/+) minus that of individuals homozygous for the mutant allele (-/-) is 2a, whereas the mean phenotype of heterozygous individuals, (+/-) or (-/+), is d. Suppose further that a number m of individuals have been typed for DNA markers, contained in matrix M, and that phenotypic measurements (y) are available on a subset of n individuals. The linear explicative model is

(1)

where ß is a fixed-effects (environmental/nongenetic effects) vector; wa is a vector with indicator variables taking values 1 or -1 if the QTL genotype of each individual is +/+ or -/-, respectively, and zero for heterozygous individuals; wd contains values 1 if individual QTL genotype is +/- or -/+, zero otherwise; and u and e contain the infinitesimal genetic values (polygenic effects) and residuals, respectively, whereas X and Z are incidence matrices. The matrix X* contains X plus two additional columns for wa and wd; similarly vector ß* is ß plus elements a and d.

The goal of the analysis is to obtain estimates of the set of parameters, {theta} = {S0, a, d, u, ß, {sigma}2u, {sigma}2e, {delta}, t, T, H}, where S0 is a matrix containing the IBD status of the two individual QTL alleles with the causal mutation, taking values + or -; {sigma}2u is the infinitesimal genetic variance; {sigma}2e, the residual variance; and {delta} is the QTL position. T is a QTL segregation indicator vector containing, for each individual and haplotype, a binary variable specifying whether the QTL allele is IBD with the paternal or maternal parental allele (THOMPSON 1994 Down). Note that S0 needs to be specified only for the base population individuals (those without known parents) and that the QTL genotypes for the whole population are unambiguously determined once S0 and T are specified. Finally, H is a vector containing the phases (paternal or maternal) for each of the markers. It can be seen that wa and wd in (1) are completely determined by S0 and T and are not additional random variables; a redundant notation was used solely for the sake of clarity in (1). The main symbols that are used throughout the article are detailed in Table 1 for the reader's convenience.


 
View this table:
[in this window]
[in a new window]

 
Table 1. Main symbols used

The Bayesian inference is based upon the posterior distribution of the parameters,

(2)

where p(y, M|{theta}) is the likelihood (in the Bayesian sense), and p({theta}) is the a priori distribution for the parameters. Note that phenotypes and markers are conditionally independent. Ideally, inferences about each of the parameters in {theta}, say {theta}l, should be based on the marginal posterior distribution, i.e.,

(3)

where {theta}-l indicates the vector of parameters except the lth unknown. Typically this multidimensional integral is unfeasible and we need to resort to stochastic procedures like Gibbs or Metropolis-Hastings sampling schemes (SORENSEN and GIANOLA 2002 Down). In the following, we describe all conditional distributions that we need to sample from. Unless otherwise stated, we make the usual assumptions of flat priors for all parameters, except for p(u) = Normal(0, A{sigma}2u), where A is the additive relationship matrix between individuals (LYNCH and WALSH 1998 Down).

The rest of this section is devoted to presenting the main conditional distributions to sample from to obtain the posterior distribution of the parameters of interest. For the reader less interested in the mathematical details, this part can be summarized as follows. For the base population individuals (those without ancestors genotyped) we use their marker haplotypes and the phenotypic information of their descendants, in addition to the prior allele frequencies, to ascertain the more likely QTL genotypes. The LD signal is incorporated into the model via the distribution p(M|{theta}), which quantifies the probability of an individual carrying a certain marker haplotype conditional on its QTL genotype and other population parameters, like the age of the mutation. We assume a star-shaped genealogy. We suppose that base population individuals are genotyped for most of the markers but not that phases are known; they are inferred from the offspring genotypes. LD or allele frequency priors do not contribute any information to obtain the genotypes of the descendant individuals (conditionally on the genotypes of the base population) and are sampled following the most likely recombinants as inferred from marker information. Once the QTL alleles are sampled, most of the remaining parameters are obtained via a classical Gibbs sampling within the mixed-model context (SORENSEN and GIANOLA 2002 Down). In contrast, Metropolis-Hastings is required for the QTL position; here we identify where recombinants have occurred at two alternative positions and the resulting likelihoods using available phenotypic information are compared (UIMARI and SILLANPAA 2001 Down).

Base population QTL genotypes (S0):
In the absence of LD information, only the phenotypes of the individuals that have received a given base population allele provide information about the likely value of that allele. This is illustrated in the simple pedigree of Fig 1; the solid lines represent the transmitted alleles, stored in T. Suppose that we are sampling the IBD status of first individual and first allele (S011), conditional on all other parameters including the S0 of the remaining individuals (denoted by {theta}_). The phenotypes of individuals 1, 5, 6, and 7 influence the probability p(S011|{theta}_, y, M). In contrast, p(S012|{theta}_, y, M), corresponding to the second QTL allele, involves only the phenotype of individual 1, as this allele was not transmitted. If that individual does not have phenotype recorded, p(S012|{theta}_, y, M) is strictly proportional to the prior frequencies for each QTL allele, when LD information is not being used. We denote by {psi}i the set of individuals that have received at least one allele for individual i and have phenotypes, i.e., {psi}1 = {1, 5, 6, 7}, {psi}2 = {5, 6}, and {psi}3 = {psi}4 = {3, 4, 8}. Note that the set {psi} may vary from iteration to iteration as a new T is sampled. If LD information is being used, p(S0|{theta}_, y, M) also depends on the marker alleles of the base population individuals. Using all sources of information, the QTL IBD status of the ith base population individual can be sampled from the fully conditional distribution,

(4)

where yj is the phenotype of the jth individual having received at least one allele from individual i, and S0- denotes the rest of IBD status not sampled. We now show which are the distributions involved in (4). The first term is a product of Normal densities N(ej, {sigma}2e), with

where, x'j is the column vector of X corresponding to the jth individual's observation.



View larger version (11K):
[in this window]
[in a new window]
[Download PPT slide]
 
Figure 1. Representation of a pedigree via the transmission coefficients T. Each small circle represents an allele of the QTL, identical-by-descent alleles are connected with a solid line, and individual genotypes, 1–8, are boxed with dashed lines.

The distribution p(Mi|{theta}) in (4) is the probability of having marker alleles linked in haplotype 1 or 2 (say Mi1 or Mi2) conditional on a given QTL genotype, its position relative to DNA markers, and the parameter governing the LD decay (t). Both haplotypes are conditionally independent; thus p(Mi|{theta}) = p(Mi1, Mi2|S0i1, S0i2, t, Hi, {delta}) = p(Mi1|S0i1, t, Hi, {delta})p(Mi2|S0i2, t, Hi, {delta}), where Mi1 contains the marker alleles received from the father and Mi2, those of mother's origin. Consider the marker alleles of a given individual i at haplotype h (Mih); in our notation L markers are to the left and R markers to the right of the current QTL position. Then,

where Mihk denotes the allele at marker k (starting from the QTL) of haplotype h, ith individual. Note that k takes negative values for markers to the left of the QTL. Dropping subscripts i and h and the conditioning on t, H, and on {delta} for clarity, we find

This process is repeated sequentially from the QTL position toward the extremes of the interval,

(5)

where Sk is the IBD state of marker allele k of individual i with the original mutant haplotype.

At any marker locus, k, the locus will be either IBD with the original haplotype carrying the mutation (Sk = -) or not (Sk = +). The term p(Mk|Sk) contains the marker allele probabilities conditional on Sk; p(Mk|Sk = +) is simply given by the population allele frequencies. In contrast, p(Mk|Sk = -) will be 1 for the allele that carried the mutant haplotype and 0 for the remaining alleles. The vector p(M|SL = ... S-1 = S1 = SR = -) is the original haplotype that carried the mutation. Of course this haplotype is unknown but can be inferred as shown by MORRIS et al. 2000 Down. Here we have preferred to consider both Sk and p(Mk|Sk) as nuisance parameters; i.e., we are not usually interested directly in them, and thus we integrate them out in (5). As a result, p(Mk|Sk = -) is no longer 0's and 1's but can take any value between the two extremes. The Appendix shows how p(Mk|Sk) is updated.

The transition probabilities p(Sk|Sk-1) can be obtained as detailed in MORRIS et al. 2000 Down and depend on the effective size and time since mutation. Four transition probabilities need to be specified, which are

and

(MORRIS et al. 2000 Down), where {phi} is the ratio of 1 M/1 Mb DNA (typically 1/100), {delta}k,k+1 is the distance (morgans) between loci k and k + 1, and {alpha} is the probability of recombining with a haplotype carrying the mutation. This parameter is in fact highly confounded with t (KAPLAN et al. 1995 Down) and we did not try to estimate it; rather, we set {alpha} = 0.001. This had a negligible impact on the results.

Expression (5) is extremely difficult to compute. However, we can rearrange as

Thus, starting from the outermost marker, R, it is feasible to compute QR using the recursive formula

with initial values and similarly Note that each coefficient qk is a vector with two elements corresponding to states Sk = + and Sk = -. At the end of the computations we obtain the probabilities of individual haplotypes given S0 = + and S0 = -. There can be numerical problems in obtaining QR or QL for a large number of markers as the number of possible haplotypes increases exponentially with the number of markers, especially for highly polymorphic markers like microsatellites. However, since the relevant statistic is the ratio Q(S0 = +)/Q(S0 = -), qR and qL can be initialized to a very large number.

Finally, p(S0i1, S0i2) in Equation 4 is the a priori probability of the IBD state of the two QTL alleles with the original mutant haplotype. When the individual is not inbred, p(S0i1, S0i2) = p(S0i1)p(S0i2), where the prior probabilities are the same for any base population allele. If the ith individual is known to be inbred from the available pedigree with inbreeding coefficient fi, p(S0i1, S0i2) = (1 - fi)p(S0i1)p(S0i2) + fip(S0i2){eta}(S0i1|S0i2), with {eta} being an indicator 1/0 function that makes S0i1 take the same value as S0i2. The prior probability of an allele being identical by descent with the original mutant haplotype is {alpha} if the base population individuals have been sampled at random from the population, i.e., p(S0i = -) = {alpha} and p(S0i = +) = 1 - {alpha} for every individual. Otherwise, e.g., case/control study or selective genotyping, the probabilities have to be modified accordingly (MORRIS et al. 2000 Down).

In summary, to sample the IBD states at the QTL position we evaluate Equation 4 at all four possible QTL genotypes, i.e., (+/+), (+/-), (-/+), (-/-), for each base population individual in turn, and we take a random number according to the genotype probabilities. Both alleles are thus sampled simultaneously. Nevertheless, this strategy can be ameliorated by sampling larger blocks of base population IBD states. Suppose IBD states of base population individuals 1 through c are sampled; then

(6a)

where j {isin} {psi} means any individual having received at least one allele from any of individuals 1 through c. An issue of interest is to determine which S0 elements are to be sampled together to minimize the risk of reducibility. Here we sampled jointly those origins that coincided in the maximum number of individuals. For instance, if only four origins were to be sampled together in the pedigree of Fig 1, two blocks with the IBD status of individuals (1, 2) and (3, 4) rather than (1, 3) and (2, 4) would be chosen. Note that a pure linkage approach can be easily implemented sampling from

(6b)

instead of from (6a).

The rest of the sampling distributions required are detailed in the Appendix. Once all variables are initialized, the Markov chain Monte Carlo (MCMC) chain consists of iterating successively via Equation 4 or Equation 6aEquation 6b plus updating the phases (H), p(M|S), and the transmission indicators (T). Obviously, in a linkage-only approach,(6b), the sampling is simplified by not sampling p(M|S) and time since mutation (t). The procedure is otherwise identical.

Two-marker disequilibrium measures:
LD measurements like D' (HEDRICK 1987 Down; LEWONTIN 1988 Down) rely on the possibility of ascertaining the linkage phases and the alleles themselves, which is not possible with quantitative traits because the QTL genotypes are not known. Nevertheless, phases and QTL alleles are generated each iteration so we can define a Bayesian estimate of D' between any marker and the QTL, computing D' at the current configuration using the formula where i is the ith allele of the marker, with frequency pi, the marker has n1 alleles, index j refers to the jth QTL allele, with frequency qj, and is the usual measure for diallelic markers. Here we provided the mean of the posterior distribution, obtained as D' averaged over iterations. We also computed the recommended measure by PRITCHARD and PRZEWORSKI 2001 Down denoted by r2 (or {Delta}2 in DEVLIN and RISCH 1995 Down), which is defined as One of the interesting properties of r2 is that r2 times the number of haplotypes is distributed as a chi square with n1 - 1 d.f. (WEIR 1996 Down), although this is an approximation and does not hold for large r (HUDSON 1985 Down). Nevertheless that property is not needed here as we are able to derive the full posterior distribution of r2 between any marker and the QTL and assess the relevant highest density region that covers the point 0 (no disequilibrium). Here we report that to make it comparable with D'. Both D' and r were calculated using only the base population individuals.


*  SIMULATION
*TOP
*ABSTRACT
*THEORY
*SIMULATION
*RESULTS
*DISCUSSION
*APPENDIX
*LITERATURE CITED

Two population types that can typically be found in livestock, with "simple" and "complex" pedigrees, were simulated. The simple population consisted of 40 unrelated full-sib families, 10 offspring per family. The complex population was a four-generation pedigree, with a base population of 80 unrelated parents that produced 40 full-sib families of size 5 (generation 2), whereas generations 3 and 4 consisted of 20 full-sib families (5 offspring per family). Parents were chosen at random except in generation 1, where all parents had an equal number of offspring. Both simple and complex pedigrees had a total of 480 individuals. The explored region spanned 25 cM and contained six microsatellites at positions 0, 5, 10, 15, 20, and 25 cM, together with 10 single-nucleotide polymorphisms (SNPs) located at positions 11, 12, 13, 14, 16, 17, 18, 19, 21, and 22 cM. SNP allele frequencies were 0.3 and 0.7, whereas there were six alleles at equal frequencies for each microsatellite. The QTL was located in position 18 cM, its additive effect was a = 1, there was no dominance (d = 0), and the residual variance was . Phenotypic records were simulated for generation 2 in the simple population and for all individuals in the complex pedigree. All individuals were genotyped. The mutant QTL allele frequency in the population studied was 0.3. Two situations were considered: The mutant QTL allele was either completely associated with SNP allele "2" (frequency = 0.3) in position 18 cM or partially associated with the SNP allele "1" (frequency = 0.7). In the former case, all haplotypes with the SNP allele 2 in position 18 cM carried the mutant QTL allele; in the latter case, initially ~42% (0.3/0.7) of haplotypes with SNP allele 1 harbored the QTL mutant allele. The original haplotype carrying the mutation was 1111111111211111 with complete association and 1111111111111111 in the second case. It was assumed that the mutant allele appeared 100 generations ago, and the decay in disequilibrium was simulated following the model in MORRIS et al. 2000 Down. We compared the results using the LDL method (Equation 6a) with those when only linkage information was used (Equation 6b).

Three replicates of each case were run, resulting in 12 analyses in total. The only fixed effect included in the analyses was the general mean. The maximum change in QTL position was set to 0.5 cM in each direction. We ran 50,000 iterations of the MCMC chain, discarding the first 4000 iterations. Eight origins were sampled jointly; thus p(S0i1, S0i2, ... , S0c2|y, M, {theta}-) can take 28 = 256 values because the QTL is assumed to be diallelic. Phases were updated in blocks of six. Each complete iteration took ~3.5 sec on an alpha workstation with processor 21164A. The computing time per iteration is highly dependent on the number of paths and phases updated simultaneously.


*  RESULTS
*TOP
*ABSTRACT
*THEORY
*SIMULATION
*RESULTS
*DISCUSSION
*APPENDIX
*LITERATURE CITED

Table 2 presents the mean and SD of the marginal posterior distributions for the main parameters in the case of complete association. The posterior distributions for the additive and dominant effects in the first replicate are plotted in Fig 2A and provide a whole picture about the uncertainty regarding these parameters. Results were very similar for all replicates so only one is presented. The estimates of the genetic effects and the residual variance were quite accurate, and the SDs of their posterior distributions were small, indicating that there is enough information in the data to estimate these parameters. The 95% highest density region contained the true values of a, d, and {sigma}2e in all cases. In particular, it was correctly detected that gene action was additive. A rigorous test of dominance, nevertheless, would imply computing the Bayes factors between the two competing models. Interestingly, there was little difference between using or not using the linkage disequilibrium information. This means that most, if not all, information to estimate the QTL genetic effects comes from classical linkage analysis. The effect of population structure was also negligible. However, including LD does affect the estimate of the QTL position (Table 2, Fig 3) with complete association between the SNP and the QTL alleles: (1) The mode of the posterior distribution always coincided with the true position and this was not necessarily the case in the linkage-only approach; (2) LDL estimates were always less biased; and (3) the SDs of the posterior distributions were always smaller in the LDL than in the linkage-only method. In general, the relative advantage of LDL over linkage-only was larger in the two-generation than in the complex pedigrees. This can occur because more meioses are available for mapping in the four- than in the two-generation pedigree but also because in the complex pedigree there were fewer offspring per family, making it less accurate for estimating the QTL genotype and the marker phases of the base population individuals, and this has a much larger effect on LDL than in linkage-only analysis.



View larger version (18K):
[in this window]
[in a new window]
[Download PPT slide]
 
Figure 2. Marginal posterior probabilities of additive (a) and dominant effects (d), expressed in residual standard deviation units. The thick line corresponds to the LDL estimate and the thin shaded line, to the linkage-only estimate. (a) First replicate of the simple pedigree, complete association; (b) first replicate of the simple pedigree, incomplete association. The true values were a = 1 and d = 0.



View larger version (22K):
[in this window]
[in a new window]
[Download PPT slide]
 
Figure 3. Marginal posterior probabilities of QTL location with complete association between QTL and SNP genotype. (Left) Simple population graphs; (right) complex population graphs. The three replicates are shown below each other. The solid thick lines refer to estimates obtained using linkage and linkage disequilibrium, and the thin shaded lines refer to estimates obtained using linkage information only. The QTL was located in position 18 cM (indicated by the arrowhead).


 
View this table:
[in this window]
[in a new window]

 
Table 2. Posterior distribution statistics: complete association

Results concerning the incomplete association scenario are presented in Table 3 and Fig 4. As expected, the estimates of the QTL effects were similar to those in Table 2, albeit the SDs were somewhat larger in particular for the dominance effect. Replicate 2 of the complex pedigree had unusually large SD of the posterior distributions of a and d. But more importantly, the accuracy of the QTL position was generally much smaller with incomplete than with complete association (note that the scales of the y-axes are different in Fig 3 and Fig 4). It is also apparent that the mode of the posterior distribution coincided with the true position only once (replicate 1, complex pedigree) although it was close, positions 0.16–0.17 M, in the remaining replicates with the LDL approach. In some instances (replicate 1, simple pedigree) the posterior density was very flat and covered almost the whole region under study. In principle, linkage-only estimates should not be greatly affected by either complete or incomplete association, because the accuracy depends mainly on the informativity of markers to identify recombinant haplotypes. This seems to be the case if we exclude the rather outlying replicate 1 (simple pedigree, Fig 4). The average SD of the QTL position posterior density was 4 cM in the linkage-only approach for both complete and incomplete association scenarios. In contrast, it was 2.2 and 3 cM using LDL in the complete and incomplete scenarios, respectively.



View larger version (24K):
[in this window]
[in a new window]
[Download PPT slide]
 
Figure 4. Marginal posterior probabilities of QTL location with incomplete association between QTL and SNP genotype. (Left) Simple population graphs; (right) complex population graphs. The three replicates are shown below each other. The solid thick lines refer to estimates obtained using linkage and linkage disequilibrium, and the thin shaded lines refer to estimates obtained using linkage information only. The QTL was located in position 18 cM (indicated by the arrowhead).


 
View this table:
[in this window]
[in a new window]

 
Table 3. Posterior distribution statistics: incomplete association

Contrary to the estimates of QTL genetic effects or position, the LD decay parameter t was loosely estimated (Table 2 and Table 3). This means that there is little information in the data to estimate them. In fact, we observed that p(M|{theta}) was quite flat for different values of t. A positive reading is that the exact figures for t did not affect the final results to a large extent, as we found similar output when we fitted these parameters to a variety of values, in agreement with previous results (MEUWISSEN and GODDARD 2000 Down).

Finally, Fig 5 draws a plot of the simple disequilibrium measures between each marker and the QTL, D' and r, for the three simple pedigrees. D' and r measures obtained under both statistical methods LDL and linkage-only are plotted. The two top and bottom plots correspond to the complete and incomplete LD scenarios, respectively. The most striking feature is, perhaps, the extreme differences in behavior between D' and r. Under complete LD, the pattern of r was much more stable behavior than that of D', as there was very little variation between replicates and r peaked clearly at the QTL position (18 cM). In contrast, D' had a much larger variability between replicates and was clearly multimodal in several instances. Nevertheless, these two measures showed clear maxima at or close to the true QTL position under complete disequilibrium. The picture changes dramatically in the incomplete LD scenario. Here r had maxima only at the nearest microsatellites (15 and 20 cM) but a very flat curve was apparent in clear contrast with the complete LD case. The pattern for D' was not as affected by incomplete LD (Fig 5, bottom left) although the profile was somewhat flatter than that with complete LD. Again, we observed a large variability between replicates. It is apparent that the LD statistics D' and r were higher when using LDL than when using linkage-only methods, although the general pattern was comparable (compare thick solid lines vs. thin shaded lines in Fig 5).



View larger version (19K):
[in this window]
[in a new window]
[Download PPT slide]
 
Figure 5. Plots of disequilibrium measures D' and r between each marker and the QTL. The top (bottom) row corresponds to the three replicates with complete (incomplete) association in the simple pedigree. Estimates obtained with the LDL method are shown as thick solid lines and those with linkage only, as thin shaded lines. The QTL was located in position 18 cM (indicated by the arrowhead).


*  DISCUSSION
*TOP
*ABSTRACT
*THEORY
*SIMULATION
*RESULTS
*DISCUSSION
*APPENDIX
*LITERATURE CITED

We have provided a coherent and unified theoretical framework to combine linkage and LD information, as exemplified in Equation 4, Equation 6a, and Equation 6b. The method worked well with simulated data. Here we have used the exponential growth model as described by MORRIS et al. 2000 Down but the Bayesian framework is flexible and other population models can be incorporated by modifying p(M|{theta}) appropriately in Equation 4 or Equation 6aEquation 6b. An important feature of the method presented here is that it provides the joint haplotype probability conditional on the QTL genotype, i.e., p(M-L, ... MR, | S0, {theta}-), whereas MORRIS et al. 2000 Down wrote the likelihood as , which differs from that used here, Equation 5. Take, without loss of generality, two markers. MORRIS et al. 2000 Down(p. 162, bottom) used

where

and

In contrast, we used the actual joint distribution, which is

(Equation 5). Unless complete independence exists (which does not make sense in a haplotype analysis), a joint distribution is not equal to the product of the marginals, and our approach should provide more power, even in a LD-only analysis, than that of MORRIS et al. 2000 Down.

Our results show that it is indeed possible to go beyond the 20-cM confidence interval to locate QTL in populations of reasonable size with moderate family sizes and without an extremely dense genotyping. But they also point out that the advantages of combining LD information into the usual linkage framework should not be overemphasized and that its impact may vary dramatically depending on a number of factors. First, the usefulness of LDL over linkage-only methods is heavily dependent on the nature of the association, e.g., on whether there is complete LD between the marker and the QTL allele. Second, in the population structure, for accurate LD mapping it is extremely important to determine correctly the phases and the QTL genotypes. Having a small number of base population individuals with large families seems a better option than having a complex pedigree spanning several generations, although the optimum structure will depend on the strength of LD; e.g., if LD is extreme, a large number of base populations animals will be better because we will have more "independent" haplotypes. Finally, chance will affect the results: Mendelian transmission, recombination, and environmental noise are stochastic processes that may result in very different data sets starting from identical initial conditions. A sample of this variability is in Fig 3 and Fig 4, and very interesting experimental results are presented, e.g., in EMAHAZION et al. 2001 Down.

Our relatively pessimistic conclusions contrast with much more optimistic views of the advantages of LDL mapping in livestock, more specifically in dairy cattle (FARNIR et al. 2002 Down; MEUWISSEN et al. 2002 Down). Of course parts of the discrepancies are due to the different methodological approaches. It should also be mentioned that the accuracy of QTL estimates may also be affected by the method of computing the posterior distribution from the MCMC samples (HOTI et al. 2002 Down). However, the dairy cattle population structure is ideally suited for LD mapping; very large families and small effective population sizes make it possible to accurately estimate phases and QTL genotypes and reduce genetic heterogeneity. This is not the case for most livestock species and certainly not the case in humans. Results from the group of M. Georges are very illustrative (RIQUET et al. 1999 Down; FARNIR et al. 2002 Down). Initially, RIQUET et al. 1999 Down located a QTL using only LD information, but that position was shifted to a significantly different position in a later analysis that combined LD and linkage. The primary reason was that sires had different genotypes assigned in each analysis. The population sizes that we used here prevented us from an accurate estimation of both the QTL genotypes of base populations and of some of the phases; these two facts together make it that no one-to-one correspondence between haplotype and QTL genotype can be established unequivocally. As a result, linkage-only methods do not compare too badly with the LDL strategy. MCMC methods take care of the uncertainty but at the price of increasing the variance of the posterior density and thus the accuracy.

In this work, we have also proposed Bayesian equivalents for the classical LD measures D' and Interestingly, r and D' exhibited distinct behaviors depending on whether there was a complete association between the QTL and the SNP (Fig 5); r decreased more markedly than D' as we moved away from the QTL with complete association, but the reverse was true with incomplete association. NORDBORG and TAVARE 2002 Down have shown that the D' measure fluctuates more widely than r, which is in agreement with our results. It is important to note that there may be a large variability in disequilibrium decay, as has been evidenced by simulation (e.g., NORDBORG and TAVARE 2002 Down; PRITCHARD and PRZEWORSKI 2001 Down) or with experimental data (REICH et al. 2001 Down). In particular, it is difficult to compare LD measures of SNPs with those of microsatellites. Disequilibrium measures depend necessarily on allele frequencies and, as argued (NORDBORG and TAVARE 2002 Down), they should because gene history and frequency are inextricably linked. Here disequilibrium measures decreased much more rapidly with SNPs than with multiallelic markers. It is also important to bear in mind that the pattern in disequilibrium decay between QTL and marker does not necessarily parallel the posterior distribution of the QTL position, as is evident from comparing the graphs in Fig 3 and Fig 4 (simple pedigree) with those in Fig 5.

Certainly, further extensions and testing of this approach are warranted, particularly to overcome some of the potential risks of using LD. First of all, stratification may cause spurious disequilibrium. In principle, a LDL methodology should be more robust than a pure LD strategy but this remains to be tested and it is uncertain whether stratification has such a large impact on quantitative traits mapping as it does with binary traits. Genetic heterogeneity is also a major problem in quantitative trait loci mapping. In this case there will be a number nf of original haplotypes carrying a distinct or the same mutation affecting the trait. In our model, this amounts to considering more than either + or - IBD states; an IBD indicator variable should be included and probabilities p(M|S0 = k, k = 1, nf) should be estimated. In the likely case that nf is not known, a reversible-jump MCMC strategy could be used. LIU et al. 2001 Down and MORRIS et al. 2002 Down have recently presented an alternative approach to allow for multiple mutations in a pure LD-mapping strategy. Missing markers are dealt with by using only available information for computing phases and segregation indicators. This is a reasonable approximation if the percentage of missing genotypes and the pedigree's complexity are not large; otherwise the transmission coefficients T are not properly calculated. This should not be too much of a concern in the special case of fine mapping, where one is usually analyzing a few generations and very dense genotyping. However, this is a much more important limitation in marker-assisted selection or in linkage analysis of complex populations. Here we have implicitly assumed a star-shaped genealogy, which is not realistic in many instances. The dependence among sampled base population haplotypes, i.e., the fact that recombination histories are correlated, can be included in the model via, e.g., coalescent techniques assuming a given effective size (MEUWISSEN and GODDARD 2001 Down). A simple strategy is to consider that prior allele states in any two haplotypes are not independent, i.e., p(S0i, S0i') != p(S0i)p(S0i'), but rather use the additive relationship coefficient ({rho}i,i'), computed using all available pedigrees as a measure of association; then p(S0i, S0i') = (1 - {rho}ii')p(S0i)p(S0i') + {rho}ii' p(S0i){eta}(S0i|S0i'), as explained in the THEORY section. Much more complicated is the issue of conditioning on the actual known pedigree previous to the first genotyped individuals and their observed marker alleles.

To conclude, fine mapping complex trait genes is a topic of very active research and a major challenge in both human and animal genetics. Given the diversity of genetic architectures and population histories, it is unlikely that a single statistical approach will be valid for all cases. One of the advantages of the Bayesian approach presented here is that the different sources of knowledge are conditionally independent (Equation 4 and Equation 6aEquation 6b) so that we can consider, e.g., different population genetic models to model LD simply by changing equation p(M|{theta}) appropriately. Additionally, the degree of uncertainty about the parameters can be fully described via the marginal posterior distribution.


*  ACKNOWLEDGMENTS

We are thankful for helpful discussions with M. Sillanpää, D. Milan, J. M. Elsen, B. Goffinet, and L. L. G. Janss. A referee is thanked for the suggestions. This work was funded by the Bureau des Ressources Génétiques, project no. 20, and Action en Bioinformatique (France).

Manuscript received September 4, 2002; Accepted for publication December 19, 2002.


*  APPENDIX
*TOP
*ABSTRACT
*THEORY
*SIMULATION
*RESULTS
*DISCUSSION
*APPENDIX
*LITERATURE CITED

SAMPLING DISTRIBUTIONS
Mixed-model effects (a, d, u, and ß):
The mixed-model equations (HENDERSON 1984 Down) are, conditional on wa, wd, {sigma}2u, and {sigma}2e,

(A1)

or Cb = d, where C is the left-hand-side matrix in (A1) above, d is the right-hand-side vector, and b contains ß* and u, with WANG et al. 1993 Down showed that the fully conditional distribution of any element bi of b = [ß*, u] is

(A2)

where di is the ith element of the right-hand-side vector, and cij is element (i, j) of C, which has dimension N.

Variance components ({sigma}2u and {sigma}2e):
The fully conditional distributions are

(A3)

and

(A4)

(WANG et al. 1993 Down), where {chi}-2q stands for an inverted chi-square distribution with q d.f. Equation A3 and Equation A4 assume a naïve ignorance prior. Conjugate informative priors with prior varianceO2 and {nu} d.f., respectively, result in posteriori conditional distributions of the type (QF + O2{nu}){chi}-2q+{nu}, where QF is the quadratic form in (A3) or (A4) (WANG et al. 1993 Down; SORENSEN and GIANOLA 2002 Down).

Linkage disequilibrium parameters [t, p(Mk,j|Sk)]:
The fully conditional distribution of t is not a known distribution and, thus, we resort to Metropolis-Hastings sampling. A new proposed age of mutation tnew is accepted with probability

(A5)

The probabilities p(Mkj|Sk) contain the allele probabilities for each allele j of marker k conditional on the IBD state of the marker with the original mutant haplotype. This variable is updated each iteration as follows. For each base population individual, the probabilities that at marker k the alleles are IBD with the mutant haplotype are calculated given the IBD state at the QTL position, S0 equaling either + or -. The original frequencies of allele j at marker k in the nonmutant population are obtained from

where F is the number of base population haplotypes, {eta}ihjk is an indicator variable taking value = 1 if the individual i has allele j at marker k and haplotype h, and zero otherwise. Similarly, we compute

which is the probability that the original mutant haplotype contains allele j at marker k. An alternative option is to sample the original mutant haplotype as in MORRIS et al. 2000 Down. However, and unless we are interested in reconstructing the original haplotype, we prefer the approach here, whereby the founder haplotype, that where QTL mutation occurred, is treated as a nuisance parameter and integrated out.

Phase sampling (H):
Phases that could not be determined unambiguously were sampled using a block Gibbs sampling algorithm. A parameterizable number of marker phases were sampled jointly for each individual in turn. The algorithm works as follows. First, unknown phases for a given individual are identified, say nh unknown phases. Second, an indicator variable is constructed taking all possible values (2nh). For instance, suppose that there are four markers and that the phases of first and last markers are known or not sampled (i.e., missing marker), then the indicator variable may take values -00-, -01-, -10-, and -11-, where "-" stands for not sampled, "0" for paternal, and "1" for maternal origin. Finally, the probability associated with each value is calculated using all available marker information and current phases in parents and offspring and a new phase block is sampled. Here a maximum of six phases were sampled jointly.

Segregation indicators (T):
T was usually updated together with the QTL position, as explained below. A new proposal for T was sampled conditioning on marker and phase information using Mendelian rules.

QTL position ({delta}):
This is one of the most critical steps of the Bayesian procedure. A variety of strategies have been proposed in the literature (SATAGOPAN et al. 1996 Down; HEATH 1997 Down; UIMARI and HOESCHELE 1997 Down; SILLANPAA and ARJAS 1998 Down). In a typical sampling scheme, individual S0 would be updated conditional on the other genotypes, but this is a risky option as the chain will get stuck easily (JANSS et al. 1995 Down). UIMARI and SILLANPAA 2001 Down proposed a dual sampling scheme. In some iterations, {delta} is updated using the acceptance ratio

(A6)

However, using (A6) may prevent {delta} from "jumping" between adjacent marker intervals because the above acceptance ratio is very sensitive to the percentage of QTL recombinant haplotypes, which in turn depends on the marker interval. In other iterations T and {delta} were updated simultaneously. A new T was generated as described using a new position, {delta}new, and both Tnew and {delta}new were accepted with probability

(A6)

(UIMARI and SILLANPAA 2001 Down). Otherwise T and {delta} remained unchanged. Here, sampling was normally performed via (A6'), except every five iterations when (A6) was used.


*  LITERATURE CITED
*TOP
*ABSTRACT
*THEORY
*SIMULATION
*RESULTS
*DISCUSSION
*APPENDIX
*LITERATURE CITED

ALLISON, D. B., 1997  Transmission-disequilibrium tests for quantitative traits. Am. J. Hum. Genet. 60:676-690.[Medline]

ALLISON, D. B., M. HEO, N. KAPLAN, and E. R. MARTIN, 1999  Sibling-based tests of linkage and association for quantitative traits. Am. J. Hum. Genet. 64:1754-1763.[Medline]

ALMASY, L., J. T. WILLIAMS, T. D. DYER, and J. BLANGERO, 1999  Quantitative trait locus detection using combined linkage, disequilibrium analysis. Genet. Epidemiol. 17(Suppl. 1):S31-S36.

DEVLIN, B. and N. RISCH, 1995  A comparison of linkage disequilibrium measures for fine-scale mapping. Genomics 29:311-322.[Medline]

EMAHAZION, T., L. FEUK, M. JOBS, S. L. SAWYER, and D. FREDMAN et al., 2001  SNP association studies in Alzheimer's disease highlight problems for complex disease analysis. Trends Genet. 17:407-413.[Medline]

FARNIR, F., B. GRISART, W. COPPIETERS, J. RIQUET, and P. BERZI et al., 2002  Simultaneous mining of linkage and linkage disequilibrium to fine map quantitative trait loci in outbred half-sib pedigrees: revisiting the location of a quantitative trait locus with major effect on milk production on bovine chromosome 14. Genetics 161:275-287.[Abstract/Free Full Text]

FERNANDO, R. L. and M. GROSSMAN, 1989  Marker assisted selection using best linear unbiased prediction. Genet. Sel. Evol. 21:467-477.

FULKER, D. W., S. S. CHERNY, P. C. SHAM, and J. K. HEWITT, 1999  Combined linkage and association sib pair analysis for quantitative traits. Am. J. Hum. Genet. 64:259-267.[Medline]

GODDARD, M. E., 1992  A mixed model analysis of data on multiple genetic markers. Theor. Appl. Genet. 83:878-886.

HASTBACKA, J., A. DE LA CHAPELLE, M. M. MAHTANI, G. CLINES, and M. P. REEVE-DALY et al., 1994  The diastrophic dysplasia gene encodes a novel sulfate transporter: positional cloning by fine-structure linkage disequilibrium mapping. Cell 78:1073-1087.[Medline]

HEATH, S. C., 1997  Markov Chain Monte Carlo segregation and linkage analysis for oligogenic models. Am. J. Hum. Genet. 61:748-760.[Medline]

HEDRICK, P. W., 1987  Gametic disequilibrium measures: proceed with caution. Genetics 117:331-341.[Abstract/Free Full Text]

HENDERSON, C. R., 1984 Applications of Linear Models in Animal Breeding. University of Guelph, Guelph, ON, Canada.

HOTI, F. J., M. J. SILLANPAA, and L. HOLMSTROM, 2002  A note on estimating the posterior density of a quantitative trait locus from a Markov chain Monte Carlo sample. Genet. Epidemiol. 22:369-376.[Medline]

HUDSON, R. R., 1985  The sampling distribution of linkage disequilibrium under an infinite alleles model without selection. Genetics 109:611-631.[Abstract/Free Full Text]

JANSS, L. L. G., R. THOMPSON, and J. A. VAN ARENDONK, 1995  Application of Gibbs sampling for inference in a mixed major gene-polygenic inheritance model in animal populations. Theor. Appl. Genet. 91:1137-1147.

KAPLAN, N., W. G. HILL, and B. S. WEIR, 1995  Likelihood methods for locating disease genes in nonequilibrium populations. Am. J. Hum. Genet. 56:18-32.[Medline]

LEWONTIN, R. C., 1988  On measures of gametic disequilibrium. Genetics 120:849-852.[Abstract/Free Full Text]

LIU, J. S., C. SABATTI, J. TENG, B. J. KEATS, and N. RISCH, 2001  Bayesian analysis of haplotypes for linkage disequilibrium mapping. Genome Res. 11:1716-1724.[Abstract/Free Full Text]

LYNCH, M., and B. WALSH, 1998 Genetic Analysis of Quantitative Traits. Sinauer, Sunderland, MA.

MCPEEK, M. S. and A. STRAHS, 1999  Assessement of linkage disequilibrium by the decay of haplotype sharing, with application to fine scale genetic mapping. Am. J. Hum. Genet. 65:858-875.[Medline]

MEUWISSEN, T. H. E. and M. E. GODDARD, 2000  Fine mapping of quantitative trait loci using linkage disequilibria with closely linked marker loci. Genetics 155:421-430.[Abstract/Free Full Text]

MEUWISSEN, T. H. and M. E. GODDARD, 2001  Prediction of identity by descent probabilities from marker-haplotypes. Genet. Sel. Evol. 33:605-634.[Medline]

MEUWISSEN, T. H., A. KARLSEN, S. LIEN, I. OLSAKER, and M. E. GODDARD, 2002  Fine mapping of a quantitative trait locus for twinning rate using combined linkage and linkage disequilibrium mapping. Genetics 161:373-379.[Abstract/Free Full Text]

MORRIS, A. P., J. C. WHITAKER, and D. J. BALDING, 2000  Bayesian fine-scale mapping of disease loci, by hidden Markov models. Am. J. Hum. Genet. 67:155-169.[Medline]

MORRIS, A. P., J. C. WHITTAKER, and D. J. BALDING, 2002  Fine-scale mapping of disease loci via shattered coalescent modeling of genealogies. Am. J. Hum. Genet. 70:686-707.[Medline]

NORDBORG, M. and S. TAVARÉ, 2002  Linkage disequilibrium: what history has to tell us. Trends Genet. 18:83-90.[Medline]

PRITCHARD, J. K. and M. PRZEWORSKI, 2001  Linkage disequilibrium in humans: models and data. Am. J. Hum. Genet. 69:1-14.[Medline]

REICH, D. E., M. CARGILL, S. BOLK, J. IRELAND, and P. C. SABETI et al., 2001  Linkage disequilibrium in the human genome. Nature 411:199-204.[Medline]

RIQUET, J., W. COPPIETERS, N. CAMBISANO, J. J. ARRANZ, and P. BERZI et al., 1999  Fine-mapping of quantitative trait loci by identity by descent in outbred populations: application to milk production in dairy cattle. Proc. Natl. Acad. Sci. USA 96:9252-9257.[Abstract/Free Full Text]

SATAGOPAN, J. M., B. S. YANDELL, M. A. NEWTON