Multitrait Fine Mapping of Quantitative Trait Loci Using Combined Linkage Disequilibria and Linkage Analysis
 ^{*} Danish Institute of Agricultural Sciences, Department of Animal Breeding and Genetics, Research Centre Foulum, DK8830 Tjele, Denmark
 ^{†} Department of Dairy Science, Virginia Polytechnic and State University, Blacksburg, Virginia 240610315
 1 Corresponding author: Danish Institute of Agricultural Sciences, Department of Animal Breeding and Genetics, Research Centre Foulum, P.O. Box 50, DK8830 Tjele, Denmark. Email: mogens.lund{at}agrsci.dk
Abstract
A novel multitrait finemapping method is presented. The method is implemented by a model that treats QTL effects as random variables. The covariance matrix of allelic effects is proportional to the IBD matrix, where each element is the probability that a pair of alleles is identical by descent, given marker information and QTL position. These probabilities are calculated on the basis of similarities of marker haplotypes of individuals of the first generation of genotyped individuals, using “gene dropping” (linkage disequilibrium) and transmission of markers from genotyped parents to genotyped offspring (linkage). A small simulation study based on a granddaughter design was carried out to illustrate that the method provides accurate estimates of QTL position. Results from the simulation also indicate that it is possible to distinguish between a model postulating one pleiotropic QTL affecting two traits vs. one postulating two closely linked loci, each affecting one of the traits.
IN recent years a number of genome scans have been reported. The aim of these studies was to identify regions on the genome with genes affecting traits of interest. This is achieved by linkage analysis, which utilizes information on recombination events that occur between genetic markers when gametes are formed and transmitted from parents to offspring. The low number of recombination events between closely linked markers restricts the precision of quantitative trait loci (QTL) mapping using linkage analysis. In genome scans in livestock populations a QTL can typically be mapped to a chromosomal region of 1030 cM. To utilize QTL in selective breeding, or to identify functional genes, a higher level of resolution of position estimates is required.
An assumption common to linkage analysis methods is that founder individuals with no parents are unrelated and noninbred. It follows from this assumption that the probability of genes in founders being identical by descent (IBD) at marker loci or QTL is zero. However, similarities in haplotypes of closely linked markers around a given position provide information on the probability of founder genes being identical by descent in this position. Because individuals that carry a mutant gene will also be IBD in a chromosomal region surrounding the gene, linkage disequilibrium between haplotypes and QTL loci can be utilized to map QTL (Meuwissen and Goddard 2000). Meuwissen et al. (2002) showed that this information from linkage disequilibrium could be combined with linkage information in a model with random QTL effects.
The advantage of this method is that it utilizes information on both the historic recombination events, since the mutation in the QTL occurred, and the recombinations observed in the recorded family structure. Therefore, closely linked markers provide much more information than pure linkage analysis and the QTL can be mapped to a region of 13 cM (Meuwissen and Goddard 2000).
In most QTLmapping experiments several traits are measured. Usually, QTL are mapped for individual traits using singletrait analyses. However, there are several reasons why it is important to infer QTLrelated parameters using multitrait methods.
First, many of the traits are environmentally and genetically correlated. To use all information optimally the correlation structure between traits should be taken into account in the analysis. This can increase the statistical power of detection. However, analyzing many traits jointly will not necessarily lead to higher power of detection because an increased number of parameters must be inferred. At least, when only two traits are considered it has been shown, in the framework of linkage analysis, that the statistical power to detect a QTL can be increased by utilizing information from correlated traits. Such increase in power was demonstrated using regression methods (Jiang and Zeng 1995; Calinskiet al. 2000; Knott and Haley 2000), a maximumlikelihood method (Korolet al. 1995), and variance component models (Almasyet al. 1997; P. Sørensen, M. S. Lund, B. Guldbrandtsen, J. Jensen and D. Sorensen, unpublished data). It is particularly important to utilize the correlated information when mapping QTL for low heritability traits that are correlated to a trait of higher heritability (P. Sørensen, M. S. Lund, B. Guldbrandtsen, J. Jensen and D. Sorensen, unpublished data). In this case, the singletrait analysis can have substantially lower statistical power compared to a multipletrait analysis, which combines phenotypic information from both traits. This is important, because utilizing identified QTL through markerassisted selection for traits with low heritability will have a relatively higher impact on practical breeding work.
Second, decomposing the genetic correlation and thereby assessing the effect of the QTL on all traits in the breeding goal is important. The contribution of a QTL to the genetic correlation between two traits does not necessarily follow the overall genetic correlation. Such information is relevant when the value of the QTL for selection is assessed.
Third, knowledge of the genetic background of the QTL contribution to genetic correlations is valuable. A genetic correlation can be the result of pleiotropic effects of a single QTL affecting more than one trait or of linkage disequilibrium between two or more QTL, each affecting one trait only (Falconer and Mackay 1996). The important issue is to decide whether one or two QTL are involved. Specifically, if a chromosomal region is found to affect two traits, it is important to distinguish whether the association is due to a pleiotropic QTL that affects both traits or due to close linkage of two QTL, each affecting one trait. If the aim is selective breeding, and the QTL contribute to an unfavorable genetic correlation, it is crucial to know if it is due to one pleiotropic QTL or to two linked QTL. If there is one pleiotropic QTL, then the genetic correlation is fixed; if there are two linked loci, selection of individuals in which a recombination between the QTL has changed the linkage phase can change the unfavorable correlation. If the aim is to clone the functional gene, it is obviously important that the linked QTL are not modeled as one pleiotropic QTL with a position somewhere between the true QTL positions (Martinez and Curnow 1992).
The necessary tests to distinguish between onelocus or twoloci models can, in principle, be performed using multitrait linkage analysis. However, the QTL position is estimated with accuracy too low to distinguish between a single gene with pleiotropic effects or two loci, each affecting one trait. Combining fine mapping in a multitrait framework is expected to result in higher power to carry out this test.
The objectives of this study are, first, to present a method for multitrait fine mapping using combined linkage disequilibrium and linkage analysis and, second, to show via a small example that the method gives accurate position estimates of closely linked QTL, each affecting one trait. Finally, the small example provides evidence that the proposed method can be used for the rather difficult task of distinguishing between a model postulating one pleiotropic QTL affecting two traits and one postulating two closely linked QTL, each affecting one trait.
METHODS
Model: The multivariate mixed model can be written as
The random variables u, q, and e are assumed to be multivariate normally distributed and mutually uncorrelated. Specifically:
u ∼ MVN(0, G), where G = G_{0} ⊗ A and
q_{i}M,p_{i} ∼ MVN(0, K_{i}), where K_{i} = K_{0i} ⊗ IBD_{i}_{}_{M,pi} and
IBD matrix: The following four steps were used to calculate the IBD matrices for QTL i. First, a gametic relationship matrix (Fernando and Grossman 1989) was generated on the basis of linkage disequilibrium (see steps one and two below) and linkage information (step three below). This matrix was then transformed into the IBD matrix to diminish the dimensionality (step four below).
Step one: For each position p_{i}, a table of IBD probabilities, conditional on haplotype similarities, was generated. This is denoted (P_{IBDN1,Nr,}_{pi}). Similarities between a pair of haplotypes were defined as the number of alleles at consecutive marker loci, to the left and to the right of p_{i}, that were identical by state (IBS). The similarities are denoted as (Nl, Nr). If the linkage group consists of 11 marker loci and p_{i} is in marker bracket k (k = 1,..., 10) there are k markers to the left and 11  k markers to the right of p_{i}. Thus, there can be 0k markers IBS to the left of p_{i} and 011  k markers IBS to the right of p_{i}. This means that the dimension of P_{IBDNl,Nr,}_{pi} is (k + 1) × (11  k). The probabilities of haplotypes being IBD at p_{i} for each of the possible similarities were assessed using the genedrop method of Meuwissen and Goddard (2000), using 100,000 replicated gene drops.
For each individual in the conceptual base population, two haplotypes were sampled. At each position p_{i} a QTL locus was assumed. All QTL alleles were unique, such that any alleles that were IBS in subsequent generations were also IBD. Subsequent generations were simulated by randomly sampling parents from the previous generation to produce N_{o} offspring. Marker and QTL alleles were then transmitted from parents to offspring, allowing for recombinations between each locus according to Haldane’s map function (Haldane 1919). After 100 generations the similarities of all pairs of haplotypes were assessed and the number of pairs in each group (N_{Nl,Nr,}_{pi}) was updated. Within each group, the number of haplotypes that were IBD in the putative QTL position p_{i} was assessed. This number is denoted (N_{IBDNl,Nr,}_{pi}). After 100,000 gene drops, the IBD probabilities at each position p_{i}, conditional on haplotype similarities, were calculated as P_{IBDNl,Nr,}_{pi} = N_{IBD,Nl,Nr,}_{pi}/N_{Nl,Nr,}_{pi}.
Step two: In this step we return to the data set to be analyzed. First, the founder haplotypes were established and the similarity of each pair of haplotypes NlNr(i, j) was assessed. The probability of identity by descent between haplotype i and haplotype j at position p_{i} was found in the appropriate table that was generated in step one (P_{IBDNl,Nr,}_{pi}) and inserted as element (i, j) in the gametic relationship matrix.
Step three: In the case of individuals with genotyped parents, IBD probabilities were calculated on the basis of the inheritance of marker alleles from parents to offspring. The element (i, j) of the gametic relationship matrix was computed using the recurrence equations of Wang et al. (1995).
Step four: The IBD matrix was computed using the linear relationship between the IBD matrix and the gametic relationship matrix (Georgeet al. 2000).
Statistical analysis: Inferences were drawn using restricted maximum likelihood (Patterson and Thompson 1971). Conditional on the IBD matrix for the QTL effects, IBD_{iM,pi}, the restricted likelihood of the multivariate mixed model, assuming a single QTL, is
The AIREML algorithm is based on first and second derivatives of the restricted log likelihood (Jensenet al. 1997). It was implemented by combining it with the expectationmaximization (EM) algorithm (Dempster et al. 1977), to ensure that parameter estimates stay within the parameter space (Jensenet al. 1997). There are cases, however, when estimates of the elements of K_{0} are expected to fall at the boundary of the parameter space. Specifically, if a biallelic QTL has a pleiotropic effect on two or more traits, then the QTL correlation between the traits is unity, which has to be taken into account to check for convergence. This was achieved here using two different criteria. One of these checked for small values of the vector of first derivatives of the restricted log likelihood. If the algorithm converges to a point inside the parameter space, then the values of the vector of first derivatives of the restricted log likelihood should approach zero. However, if the estimates are at the boundary of the parameter space, then the vector of first derivatives is not necessarily zero. Therefore the other convergence criterion requires that changes in estimates of the (co)variance components between successive rounds are smaller than a minimum, welltuned value.
Simulation of data: The proposed method was tested using simulated data sets based on a granddaughter design (Welleret al. 1990), which is presently the most commonly used design in QTL mapping in dairy populations. It is assumed that a previous linkage analysis study had identified a 10cM region including one or two QTL that affect two traits. The aim is to test if there is information to distinguish between one pleiotropic QTL affecting both traits and two QTL, each affecting one trait.
Pedigree: The simulation was built on a founder population created 100 generations ago with a population size of 100 (50 males and 50 females). For each of the subsequent 100 generations, 50 males and 50 females were produced by randomly sampling parents from the previous generation. After the first 100 generations, individuals of the genotyped population were sampled. Ten sires and 200 dams (later referred to as grandsires and granddams) were produced by sampling parents randomly from the 100th generation. Each sire was mated to 20 dams and each mating produced one son.
Marker and QTL alleles: Marker alleles were sampled for the 11 biallelic loci placed with a distance of 1 cM between each locus. Two QTL were positioned 5 cM apart in marker bracket 3 and marker bracket 8. In the founder population, the two alleles of each marker locus were sampled with equal probability, and all QTL alleles were assigned a unique number. Marker and QTL alleles were transmitted from parents to offspring during a period spanning 100 generations. Recombinations were sampled according to Haldane’s mapping function (Haldane 1919).
From among the QTL alleles that were still present after 100 generations, one was sampled at random for each QTL as being the mutant gene, while all other QTL alleles were assumed to be of the wild type. The frequency of the mutant gene was restricted to be between 0.2 and 0.8. This resulted in different QTL variances between replicates (0.120.2) as indicated in Table 1.
Phenotypes: For each of the 200 sons in the last generation of the pedigree, a daughter yield deviation (DYD; Van Raden and Wiggans 1991) was simulated on the basis of 100 daughters. Phenotypic records were simulated for two traits using the following model for the DYD of the ith son,
Analysis of simulated data: When the IBD matrices were calculated, marker haplotypes were assumed known for grandsires and unknown for granddams. This means that the maternally inherited haplotypes in the sons were assumed to be founder haplotypes and to contribute with linkage disequilibrium information accordingly.
To mimic a situation where linkage analysis has identified a 10cM region including one or two QTL that affect two traits, each trait of the simulated data was analyzed initially using a singletrait model (i.e., model (1) with t = 1). A significant QTL was identified if the null hypothesis with no QTL was rejected. If a significant QTL was detected for both traits, the data were analyzed subsequently using a pleiotropic model (i.e., model (1) with t = 2, n_{qtl} = 1, and
Significance thresholds: First, the significance threshold for the singletrait tests was found by simulation. Five hundred replicates were simulated under the null hypothesis of no QTL segregating. Each replicate was simulated with the same structure, the same genetic variance, and the same environmental variance as the data sets including a QTL. The simulated data sets were then analyzed using a singletrait model with and without a QTL fitted. For each replicate, the test statistic based on the ratio of the log likelihoods of the models was calculated, and the significance threshold was set such that 5% of the test statistics were higher than the threshold.
Second, the significance threshold for the tests for pleiotropy vs. linkage was found by simulation as follows. Two hundred and fifty replicates were simulated under the null hypothesis, in which a single pleiotropic QTL was assumed to affect both traits. The effect of the pleiotropic QTL on each trait was the same as the effect of the two linked QTL. In each replicate, both traits were analyzed using a singletrait model. If both tests were significant, the traits were analyzed jointly using the pleiotropic model and the twoQTL model. The significance threshold was assessed on the basis of the empirical distribution of the test statistic (ratio of log likelihoods) over replicates.
RESULTS
Figure 1 shows the mean profile log likelihood for the QTL positions, over 100 replicates. On average, the ridge of the loglikelihood curve indicates that the estimates of the positions are at interval 3 for the QTL affecting trait 1 and interval 8 for the QTL affecting trait 2. These average estimates agree well with the input parameters.
Figure 2 shows the number of times the different intervals had the highest loglikelihood value for the QTL affecting trait 1 (light gray columns) and for the QTL affecting trait 2 (dark gray columns). For both QTL the most frequently estimated position was the correct input parameter used in the simulations. The histograms show some skewness: This is because the QTL were placed at the end of a confined interval and because of the small size of the experiment.
When data were simulated on the basis of the twoQTL model, the null hypothesis, which assumes that the state of nature is represented by a singleQTL model, was rejected in 56% of the replicates.
DISCUSSION
We have presented a multitrait method for finemapping detection of QTL, based on combined linkage disequilibria and linkage analysis, extending the work of Meuwissen and Goddard (2000). The performance of the method was illustrated using simulated data. The results (Figure 2) showed that the position of two closely linked loci, each affecting one trait, could be estimated correctly on average, with the proposed method. The two histograms of the estimated positions of the QTL affecting trait one and trait two were well separated and overlap only in areas where the probability mass was small. The results of Meuwissen and Goddard (2000) suggested that the precision in terms of marker intervals was fairly constant over different marker densities. It is therefore expected that QTL that are more closely linked than those in the present study (e.g., 1 cM) can still be distinguished from a pleiotropic QTL if a denser marker map is used (e.g., 0.25 cM). Furthermore, in the simulations, random drift over the 100 generations might have caused a proportion of the markers to become monomorphic in the genotyped individuals. This might have adversely affected the method to distinguish between one pleiotropic QTL and two QTL each affecting one trait.
More precise inferences about the position of one pleiotropic QTL affecting two traits using multitrait linkage analysis, compared to singletrait analyses, have been reported in the literature (Jiang and Zeng 1995; Almasyet al. 1997; P. Sørensen, M. S. Lund, B. Guldbrandtsen, J. Jensen and D. Sorensen, unpublished data). Similar results are expected to hold for the method proposed here as well, even though we did not investigate this problem.
An important and difficult question that we addressed is whether the proposed method can distinguish between a model postulating one pleiotropic QTL and another postulating two closely linked QTL, each affecting one trait. This is a problem of model choice, in a situation where models are not nested and have the same number of parameters. This issue was discussed from a frequentist perspective by Cox (1961, 1962). Within the constraints of the paradigm, we have avoided a frontal comparison of the models and instead posed the question as follows. When data are simulated under the twoQTL model, in what proportion of times over conceptual replications is the null hypothesis postulating one pleiotropic QTL rejected? The test statistic was the ratio of the log likelihoods of both models, whose small sample distribution was disclosed via Monte Carlo simulation. With a choice of a type I error rate of 5% and given the size of the present simulation experiment and the parameters used to simulate the data, the power of the test was 56%—a long way from the minimum possible power of 5%, which would result if the models were indistinguishable. It is a rather subtle task to discern between the two models, and a larger experiment is needed to arrive at more conclusive results. It is not the purpose of this article to carry out indepth simulation studies. Via the small example presented, we wish to illustrate that the two models can in principle be distinguished by the proposed method. The test performed herein was constructed to answer the specific question above. If real data are analyzed a more appropriate testing scheme might include a number of nested tests resulting from stepwise reductions starting from a full model with two pleiotropic QTL.
When applied to real data more information can be expected compared to that in the simulation in this study. This is because larger granddaughter designs and more informative markers are available. However, this could be counterbalanced by lower frequencies of QTL alleles or different patterns of linkage disequilibrium. If a larger part of the linkage disequilibrium were due to very strong selection or recent population admixture, larger areas of linkage disequilibrium would be expected. This may impair the possibility of fine mapping the QTL. It would be useful to study further the properties of the proposed method in such situations.
The method can be used in other designs involving more complex pedigrees. The main requirement is to be able to establish the haplotypes of the firstgeneration individuals of the genotyped population. In livestock populations, there is often sufficient information to reconstruct haplotypes very accurately and this requirement can be fulfilled. In species with small offspring groups, the method should be modified to take account of the uncertainty of haplotypes.
Currently, the age of mutation and the structure of the historic population are used as fixed parameters. Meuwissen and Goddard (2000) found that the method was rather insensitive to changes in these parameters, except when the mutation was recently introduced into the population. Uncertainty about the age of mutation could be taken into account by making analyses conditional on different ages and producing a profile likelihood over ages.
A Bayesian implementation of this method would be desirable. While not exempt from problems, the Bayesian approach is conceptually more transparent for model comparison than the one presented here. This is accomplished conditional on the available data, via the posterior probability of the respective models, for example, as in Sillanpää and Arjas (1998) and Yi and Xu (2000).
Acknowledgments
Daniel Sorensen acknowledges funding for research visits from the Wellcome Trust Biomedical Research Collaboration Grant 056266/Z/98/Z.
Footnotes

Communicating editor: J. B. Walsh
 Received April 25, 2002.
 Accepted October 14, 2002.
 Copyright © 2003 by the Genetics Society of America