Abstract
A simple procedure to calculate the Bayes factor between linked and pleiotropic QTL models is presented. The Bayes factor is calculated from the marginal prior and posterior densities of the locations of the QTL under a linkage and a pleiotropy model. The procedure is computed with a Gibbs sampler, and it can be easily applied to any model including the location of the QTL as a variable. The procedure was compared with a multivariate leastsquares method. The proposed procedure showed better results in terms of power of detection of linkage when low information is available. As information increases, the performance of both procedures becomes similar. An example using data provided by an Iberian by Landrace pig intercross is presented. The results showed that three different QTL segregate in SSC6: a pleiotropic QTL affects myristic, palmitic, and eicosadienoic fatty acids; another pleiotropic QTL affects palmitoleic, stearic, and vaccenic fatty acids; and a third QTL affects the percentage of linoleic acid. In the example, the Bayes factor approach was more powerful than the multivariate leastsquares approach.
QUANTITATIVE trait loci (QTL) mapping is one of the most active fields in statistical genetics. Since the publication of genetic maps of several livestock species (Archibaldet al. 1995; Barendseet al. 1997), efforts in animal genetics have been focused on the detection of QTL, making use of the available maps. Usually, the experiments to map QTL involve the recording of several traits that are genetically correlated. Following Falconer and Mckay (1996), sources of genetic correlation are pleiotropy, i.e., one gene affects the genetic expression of two or more different traits, and genetic linkage, i.e., two genes affecting separate traits are situated closely together on the same chromosome, preventing the genes from segregating independently at meiosis.
Several authors have tried to solve the problem of distinguishing between linked and pleiotropic QTL. Thus, Cheverud et al. (1997) proposed a likelihoodratio test between QTL locations for each trait separately and a weighted average location. Almasy et al. (1997) used a maximumlikelihood approach from a bivariate analysis, and Lebreton et al. (1998) proposed a bootstrap procedure to reject the linked QTL hypothesis when confidence intervals for the difference between QTL locations included zero. Later on, Knott and Haley (2000) proposed a multipletrait leastsquares analysis to test linkage from the pleiotropic null hypothesis. And, more recently, Lund et al. (2003) developed a likelihoodratiobased test, using a multitrait finemapping method that uses linkage disequilibrium information.
From a Bayesian point of view, comparison between alternative models is developed by calculating Bayes factors (Kass and Raftery 1995). The Bayes factor is the ratio between the marginal probabilities of the data, given the tested models, and after integrating out all parameters in both models. The Bayes factor automatically implies the posterior probabilities for each model, and it does not assume any model as either the null or the alternative hypothesis. A Bayes factor approach to discriminate between nested models has been proposed for the detection of QTL by Varona et al. (2001).
The objective of this article is to propose a Bayes factor to distinguish between linked and pleiotropic QTL. We first present the general procedure to calculate the Bayes factor by the calculation of marginal probabilities of data at a given set of parameters (Chib 1995; Varonaet al. 2001). Second, we compare the proposed procedure with the algorithm of Knott and Haley (2000) by computer simulation. And finally, we compare the performance of both methods, using data on fatty acid composition from an experiment with Iberian × Landrace F_{2} pigs.
THEORY
The Bayes factor discriminates between two candidate models. In the current application, the first candidate model is the linkage QTL model, where the likelihood of the bivariate data (y_{1}, y_{2}) is described by a probability function conditioned to a set of parameters for each trait (θ_{1}, θ_{2}), which can include additive, dominant, polygenic, systematic, and residual effects, as well as the different QTL locations for both traits (λ_{1}, λ_{2}), affecting each QTL at a different trait:
The second candidate model is the pleiotropy QTL model, where the likelihood of the bivariate data is described by a probability function conditioned on the same set of parameters (additive, dominant, polygenic, systematic, and residual effects) for each trait as in (1), but including only one location (λ_{p}):
The Bayes factor (BF) is defined as the ratio of marginal probabilities of the data under models (1) and (2),
If we assumed p_{l}(θ_{1}, θ_{2}) = p_{p}(θ_{1}, θ_{2}), and we set the location of the QTL to an arbitrary value k,
It must be noted that the BF is the ratio between the marginal probabilities of the data [p_{l}(y_{1}, y_{2}) and p_{p}(y_{1}, y_{2})], under both models. The marginal probabilities of the data are the integration constants of the posterior distributions of the models. We used an arbitrary location (k) only to facilitate the calculation of these integration constants in the scope of nested models. A very interesting application of this approach can be found in Chib (1995).
In summary, the information required to calculate the Bayes factor between both models consists of the prior and the posterior densities of the QTL locations under the linkage and pleiotropy models.
MONTE CARLO SIMULATION
The simulation modeled an F_{2} design, assuming the original lines were homozygous at the QTL for alternative alleles. We used the Haldane mapping function to compute map distances using recombination fractions. Two population sizes (400 and 800 individuals) and two percentages of the total F_{2} generation variance explained by the QTL (5 and 15%) for each of the two simulated traits were considered. We also simulated two different marker maps, a lowdensity map, with markers located at 0, 30, and 60 cM, and a highdensity map with 61 markers, one every centimorgan.
For each combination of population size, percentage of variance explained by the QTL, and marker map, four different situations were simulated:
Case I (pleiotropy): The QTL for both traits are located at position 30 cM.
Case II (close linkage): The QTL for the first trait is located at position 27.5 cM and the QTL for the second trait at position 32.5 cM.
Case III (linkage): The QTL for the first trait is located at position 20 cM and the QTL for the second trait at position 40 cM.
Case IV (loose linkage): The QTL for the first trait is located at position 10 cM and the QTL for the second trait at position 50 cM.
The phenotypic data for both traits were simulated with a general mean (μ= 100), the QTL effect, and a random residual term (e), sampled from a univariate normal distribution with a constant phenotypic variance of 100. For simplicity, residual effects for both traits were assumed to be uncorrelated. One hundred replicates were simulated for each combination of population size, percentage of variance, marker map, and situation (cases IIV).
Knott and Haley: Each replicate was analyzed using the algorithm proposed by Knott and Haley (2000) for linkage detection taking the pleiotropy model as the null hypothesis and the linkage model as the alternative hypothesis. In the linkage model there are two linked locations, each affecting a different trait. For each trait (j = 1, 2), the model of analysis was the same used for the simulation,
We calculated the following statistic on the basis of the Knott and Haley (2000) procedure,
Bayes factor: The Bayes factor to distinguish between the linkage and pleiotropy models was calculated for each replicate through the implementation of a Bayesian bivariate analysis. The likelihood of the model under the linkage QTL model was
Under the pleiotropy QTL model, the likelihood is similar, with the only difference being that both λ_{1} and λ_{2} are replaced with λ_{p}. It must be noted that under the linkage model, the probabilities pr(QQ)_{λ}_{i}, pr(Qq)_{λ}_{i}, and pr(qq)_{λ}_{i} can be different for each trait, as the location of the QTL varies, but under the pleiotropy QTL model those probabilities are the same for both traits, because the location is always the same.
Prior distributions for mean, additive, and dominance effects and the residual variances were bounded uniform priors (0, 500). In the linkage model, the prior distribution of the location is a discrete uniform distribution on the integer values in the intervals [0 cM:60 cM] × [0 cM:60 cM]. Therefore, the prior density for any pair of locations was
The Bayes factor to discriminate between linkage or pleiotropy was calculated using Equation 3 with values p_{l}(λ_{1} =λ_{2} = k) = 1/L_{2} and p_{p}(λ_{p} = k) = 1/L, to arrive at
Then,
A Bayes factor >1 indicates evidence of the linkage model, and a Bayes factor <1 indicates greater posterior probability for the pleiotropy model. Moreover, the Bayes factor also indicates the magnitude of the evidence in favor of each model. For this reason, we also classified the output of the BF into “strong” evidence for pleiotropy (BF < 0.1), “moderate” evidence for pleiotropy (0.1 > BF > 0.333), “slight” evidence for pleiotropy (0.333 > BF > 1), slight evidence for linkage (1 > BF > 3), moderate evidence for linkage (3 > BF > 10), and strong evidence for linkage (BF > 10). The transformation between the Bayes factor and posterior probability of the linkage model [1/(1 + BF)] and the pleiotropy model [BF/(1 + BF)] must be also noted.
Testing pleiotropy vs. linkage in an Iberian × Landrace pig intercross experiment for fatty acid composition: As an example, we used data from an F_{2} experiment with Landrace and Iberian pigs as parental populations. Comprehensive reports of the experimental design and results are given by PerezEnciso et al. (2000), Varona et al. (2002), Ovilo et al. (2002), and Clop et al. (2003). The pedigree consisted of 3 Iberian boars, 31 Landrace sows, 6 F_{1} boars, 73 F_{1} sows, and 577 F_{2} animals. A total of 369 individuals, from 58 fullsib families, were recorded for percentage of the 10 most important fatty acids (see Table 1). In addition, they were also genotyped for the following markers: S0035, SW1057, S0087, SW316, S0228, SW1881, and SW2419, at locations 0.0, 44.3, 57.7, 81.2, 96.0, 108.7, and 145.3 cM in SSC6, respectively. Genetic distances between markers were calculated using the BUILD option of the Crimap program (Greenet al. 1990).
We first performed a genomic scan with a single QTL analysis for each of the 10 fatty acids, using the algorithm of Haley et al. (1994) with the model
For those traits where a significant QTL was detected, the procedures of Knott and Haley (2000) and the proposed Bayes factor procedure were performed. Computation procedures followed those described in the monte carlo simulation section.
RESULTS
Simulation study: The results of the simulation study with a lowdensity map are presented in Tables 2 and 3, and the results with a highdensity map are presented in Tables 4 and 5. First, we compare the results of the KH procedure with the BF approach when a lowdensity map was simulated (Table 2). In case I, pleiotropy was simulated and it represents the null hypothesis under the KH procedure. Therefore, the values in the table for case I are a measurement of the type I error, and they were in the range of expected values (1 and 5%). On the contrary, the percentage of replicates that provides a BF smaller than one (linkage model) ranged from 0 to 13%, being smaller as the information provided by the data was greater.
When linkage was simulated (cases II, III, and IV), the values for the KH procedure represent empirical power for a type I error of 1 and 5%. In case II (close linkage), the empirical power at 5% using the KH procedure ranged between 4% in the less informative situation (5% of variance explained by the QTL and 400 individuals) to 22% in the most informative situation (15% of variance explained by the QTL and 800 individuals). At the 1% significance level, it ranged from 0 to 8% of significant cases. When the BF procedure is used, the number of replicates with a BF greater than one is 14 in the less informative situation (400 individuals and 5% of variance) and 22 in the most informative situation (800 individual and 15% of variance). The performance of both methods is similar, and only in the scenario with lower information does the BF procedure detect the linkage between the QTL in both traits in a greater percentage of replicates (14 vs. 4%).
In case III (linkage), the percentage of replicates significant at 5% using the KH procedure ranged from 31 to 96% and from 6 to 83% at significance of 1%. With the BF procedure, the results were similar; the percentage of replicates yielding linkage as the most probable model ranged from 51 to 93%, depending on the scenario of the simulation. As in the previous case, the performance of both methods was similar, and only in the scenario with low information did the BF procedure detect the linkage in a greater percentage of replicates (51 vs. 31%).
Finally, in case IV (loose linkage), the empirical power of the KH algorithm ranged from 68 to 100% at the 5% significance level and from 44 to 100% at 1%. The percentage of replicates that indicates linkage with the BF method ranged from 92 to 100%. As before, for 92% of the replicates linkage was the most likely situation in the scenario where 5% of variance is explained by the QTL and the population size of 400 individuals. In that situation, the KH method detected only 68% of replicates as significant.
In Table 3, the results of BF in the lowdensity map are classified according to the evidence supporting pleiotropy vs. linkage. In case I, when 400 individuals were simulated, most of the replicates yielded a BF between 1.0 and 3.0 (slight evidence of pleiotropy). On the contrary, when 800 individuals were simulated, the most frequent output was a BF between 3.0 and 10.0 (moderate evidence of pleiotropy). In case II, where the QTL differ in 5 cM, a few percent of cases indicated linkage (1422%), and most of the replicates produced a BF that indicates slight evidence supporting pleiotropy or even moderate evidence in the most informative situation (40%). In case III, most of the replicates indicate linkage and in the most informative case 59% of the replicates indicate strong evidence of pleiotropy. Finally, in case IV and 400 individuals, 53 and 93% of the replicates showed strong evidence of linkage when 400 individuals were simulated, depending of the percentage of variance simulated (5 or 15%). For 800 individuals, the BF fell in the category of strong evidence of pleiotropy (BF < 0.1) in all replicates.
The results comparing BF vs. KH using a highdensity marker map are presented in Table 4. It must be noted that all QTL locations tested here are on the location of fully informative markers, becoming in essence a singlemarker analysis. As with the previous marker map, the results of the KH and the BF procedure were equivalent, and only in the low informative cases were there some differences: 5 vs. 14% in case I, 11 vs. 25% in case II, 66 vs. 85% in case III, and 80 vs. 100% in case IV. It should be noted that the linkage is detectable in most of the replicates in case II, in contrast with the simulations with the lowdensity marker map.
In Table 5, the results of the BF with the highdensity marker map are classified according to its magnitude. It is remarkable that, as the information increases, a higher number of replicates provide strong evidence of pleiotropy (case I) or strong evidence of linkage (case III and case IV), whereas in case II (close linkage), most of the cases provided slight or moderate evidence of linkage or pleiotropy.
Bayes factor analysis of SSC6 in an Iberian × Landrace pig intercross for fatty acid composition: A summary of the maximum F values of genomic scans from the single QTL using the algorithm of Haley et al. (1994) is presented in Table 6. As previously reported by Clop et al. (2003), results show that for this example many QTL behave with dominance. The maximums of the genomic scans along SSC6 of myristic (C14:0), palmitic (C16:0), palmitoleic [C16:1(n9)], stearic (C18:0), vaccenic [C18:1(n7)], linoleic [C18:2(n6)], and eicosadienoic [C20:2(n6)] acids were significant at a nominal level of significance of 5% (Fvalue > 3.0). These traits were selected for testing linkage vs. pleiotropy with the KH and the BF procedures.
The posterior mode, mean, and standard deviation of the location, and the posterior mean and standard deviation for the additive and dominance effects for the selected traits are presented in Table 7. In addition, Table 8 shows the results of the Bayes factor’s posterior probability of the pleiotropy model and the significance under the KH procedure. The Bayes factor ranged from 0.197 for the bivariate analysis of palmitoleic and stearic acids to 9.804 for the bivariate analysis of palmitic and palmitoleic acids. Posterior probability of the linkage model ranged between 0.165 and 0.908 for the same bivariate analyses. On the contrary, the procedure of KH provides only three significant values at 5% and seven at 10%.
As an example, a bidimensional plot of the posterior distribution, where the Bayes factor indicates the suitability of the linkage model, is presented in Figure 1. The figure corresponds to the posterior density of the bivariate analysis of palmitoleic and linoleic acids, with a Bayes factor of 9.174. Figure 2 shows the posterior density for an example where the Bayes factor determines the suitability of the pleiotropy model. Figure 2 corresponds to the bivariate analysis of palmitoleic and stearic acids, with a BF of 0.197.
DISCUSSION
The Bayesian procedure proposed in this article provides a derivation of the Bayes factor and the posterior probability for each model. In contrast to the likelihood or bootstrapbased approaches (Almasyet al. 1997; Cheverudet al. 1997; Lebretonet al. 1998; Knott and Haley 2000; Lundet al. 2003), which require determining a null hypothesis model, the Bayes factor does not need to set any null hypothesis to contrast with. In this case, both the pleiotropy and the linkage models are considered as candidate models or hypotheses, and the odds between the marginal probabilities of the data under each model determine which one adjusts better to the data. For that reason, classical concepts of hypothesis testing like power or level of significance cannot be applied directly. Moreover, while the likelihoodbased approach makes use of the likelihood function on the maximumlikelihood estimates, the Bayes factor includes the information provided by data after integrating along the parametric space. Thus, all the available information is used to discriminate between the alternative models.
Another advantage of the Bayes factor is that the output is a probability, which is easier to compare with the results of other experiments. In contrast, the P values of the frequentist or the classical hypothesis testing scope cannot easily deal with comparing between different replications of the experiment.
The proposed algorithm is easy to compute from the output of a Gibbs sampler or from any other Markov chain Monte Carlo method. This fact represents an advantage over other approximations to the Bayes factor or posterior probabilities, such as the harmonic mean (Newton and Raftery 1994) or the reversiblejump Markov chain Monte Carlo method (Green 1995). The example presented here was a simple case in which the model of analysis for both traits consisted of a simple regression. However, the procedure can be easily adapted to any model to analyze QTL of inbred or outbred populations. The only prerequisite is including the location of the QTL as a parameter in the model, thus making available the posterior distribution of the QTL location for both traits. As a consequence, all Bayesian procedures to detect QTL (Hoescheleet al. 1997) can be easily adapted to discriminate between linkage and pleiotropy for different traits. It is even possible to include more than two traits in the analysis, allowing for the comparison of alternative models, when either the same gene influences all traits or at least one of the traits has a different genetic regulation. It is also possible to include the information available from linkage disequilibrium as described by Lund et al. (2003), which might increase considerably the power of discrimination between the alternative models.
The results of the simulation study showed that for the KH method, the levels of significance are set assuming the pleiotropy model as the null hypothesis, and, as expected, the percentage of the replicates that exceeded the significant thresholds in all scenarios corresponded to the type I error, when location of the QTL was the same for both traits in the simulation (case I). On the contrary, when the BF is used, no model is set as null or alternative hypothesis. For this reason, when the information increases due to the percentage of variance explained by the QTL or the number of individuals included in the analysis, the percentage of replicates that lead to the conclusion of the correct model increases. For example, with both marker maps, none of the replicates support linkage with a population size of 800 individuals and 15% of variance explained by the QTL in case I (pleiotropy). On the contrary, with a population size of 400 individuals and 5% of the variance explained by the QTL, 13 and 14% of the replicates yield the linkage model as more probable with the low and highdensity marker maps, respectively.
To compare both procedures, for the BF method, the percentage of replicates supporting linkage when it is the true model can be compared to the power of the procedure under a frequentist scope, although in the Bayesian paradigm the concept of power cannot be directly applied. However, from now on we refer to power for both procedures for simplicity. In this sense, the BF procedure had greater statistical power than the KH algorithm when the information available is low, i.e., when 400 individuals and 5% of variance are explained by the QTL (cases II, III, and IV with both marker maps). When more data are available, or the percentage of variance explained by the QTL is higher, the results of both procedures were similar. As expected, the differences in power depending on the density of the map were relevant with both procedures, with a higher incidence in the case with low distance between the QTL (case II). The percentage of cases that yielded the linkage map as most probable was 22 with the lowdensity map vs. 100 with the highdensity map when 800 individuals were simulated and the percentage of variance explained by the QTL was 15.
Another important property of the Bayes factor approach is that it does not rely on any asymptotic results, as likelihoodbased methods do. This is because the BF procedure provides exact results with any amount of data. However, as the information increases, the probability of the “best” model also increases, as pointed out by GarciaCortes et al. (2001) within the scope of a variance components model. This effect can be observed in our simulation results. The percentage of replicates with higher (or lower) BF indicating strong evidence of pleiotropy (or linkage) increases as the information provided by the number of data, the percentage of variance explained by the QTL, or the density of the marker map increases. For example, with the lowdensity map and case IV (loose linkage) of the simulation, the number of replicates yielding a BF < 0.1 (strong evidence of linkage) increased from 20 to 93% when the population increased from 400 to 800. It increased from 20 to 75% when the percentage of variance explained by the QTL increased from 5 to 15% and from 20 to 53% when the lowdensity marker map was replaced by a highdensity map.
With the real data set, the posterior mode estimate of the locations and the additive and dominance posterior means presented in Table 7 did not differ substantially from results obtained by the procedure of Haley et al. (1994) presented in Table 6. However, the posterior mean estimates of the locations are distant from the posterior mode estimates because of the asymmetry of the posterior distributions or the presence of multiple modes.
The Bayes factor and posterior probability results shown in Table 8 suggest the presence of a QTL at ∼3342 cM in SSC6, which affects myristic (C14:0), palmitic (C16:0), and eicosadienoic [C20:2(n6)] fatty acids, with Bayes factors of 0.387 (C14:0 and C16:0), 0.574 [C14:0 and C20:2(n6)] and 0.229 [C16:0 and C20:2(n6)], respectively. Another QTL affects palmitoleic [C16:1(n9)], stearic (C18:0), and vaccenic [C18:1(n7)] fatty acids, around position 08 cM, with Bayes factors ranging from 0.197 to 0.537. Finally, another QTL that affects only linoleic [C18:2(n6)] fatty acid was detected at ∼113 cM, and it has no relation to the QTL affecting other fatty acids. The KH procedure provides results in a similar direction. However, while the BF procedure indicates linkage as the more suitable model in 15 out of 21 analyses, the KH procedure detects only three times the significance at 5% and seven times at 10%. In the analyses where the KH procedure is significant at 5%, the BF produces posterior probabilities of the linkage model of 0.908, 0.902, and 0.639, respectively. Moreover, in all the cases where the KH indicates significant results at 10%, the BF provides posterior probabilities of the linkage model >0.565. The amount of data available (369 individuals) may suggest a greater power of the BF procedure, confirming its greater power when the available information is low.
In the graphical interpretation (Figures 1 and 2), the density on the diagonal of the posterior density under the linkage model indicates the probability of the pleiotropy model. The BF factor is calculated from the ratio of the prior and posterior probabilities of this diagonal. In Figure 1, the posterior density of the diagonal is 0.109 times the prior density (L), indicating the suitability of the linkage model. However, in Figure 2 the posterior density of the diagonal is 5.076 times the prior density, indicating a greater suitability of the pleiotropy model. However, it must be stated that in the model termed here “pleiotropy model,” QTL affecting both traits are located on the same position. It is not possible with a statistical approach to distinguish between the effects of one or two fully linked genes. It should be mentioned that the procedure proposed here assumes discrete prior and posterior distributions of the location parameters in both linkage and pleiotropy models, facilitating the calculation of Σ_{λ1=λ2}p_{l}(λ_{1}, λ_{2}y_{1}, y_{2}) by counting the correlated samples where λ_{1} =λ_{2} under the linkage model. However, the continuous distributions for the location parameters can also be assumed with equivalent results (results not presented), but the use of density calculation techniques (Silverman 1986) is required for the calculation of ∫_{λ1=λ2} p_{l}(λ_{1}, λ_{2}y_{1}, y_{2}).
It must be remarked that we assumed null correlation between the residuals in our study. However, the procedure will be similar when correlated residuals are assumed, although the residual correlation may influence the results. The comparison of the power of the BF procedure with respect to other available procedures with several scenarios of residual correlations can be a very interesting subject for future research.
In conclusion, the BF approach proposed here could be an interesting alternative to existing methods for testing pleiotropy vs. linkage. The procedure used all available information summarized in the marginal probability of data given the model. It is very easy to generalize to models more complicated than the ones used in this example. In terms of power, it provides the same or even better results than the Knott and Haley (2000) procedures. Finally, the BF does not need to invoke any asymptotic results and it provides the results in probabilistic terms or in the magnitude of the evidence.
Acknowledgments
We thank Meritxell Arqué and all the personnel in Nova Genètica, in particular Paco Márquez, Pere Borràs, and Eva Ramells, for technical assistance. We also thank Isabel Díaz and all the personnel in Centro Tecnológico de la Carne (IRTA) for their collaboration. We also thank Rodolfo Cantet for useful comments. The Iberian boars were a donation from the Servicio de Investigación Agraria ‘El Dehesón del Encinar’ (Toledo, Spain), and the rest of the animals were produced in Nova Genètica (Solsona, Spain). This work was funded by the Spanish Comisión Interminsterial de Ciencia y Technologia grants AGF990284C02 and AGL20001229C03.
Footnotes

Communicating editor: C. Haley
 Received August 1, 2003.
 Accepted October 30, 2003.
 Copyright © 2004 by the Genetics Society of America