Abstract
The discovery of epistatically interacting QTL is hampered by the intractability and low power to detect QTL in multidimensional genome searches. We describe a new method that maps epistatic QTL by identifying loci of high QTL by genetic background interaction. This approach allows detection of QTL involved not only in pairwise but also higherorder interaction, and does so with onedimensional genome searches. The approach requires large populations derived from multiple related inbredline crosses as is more typically available for plants. Using maximum likelihood, the method contrasts models in which QTL allelic values are either nested within, or fixed over, populations. We apply the method to simulated doubledhaploid populations derived from a diallel among three inbred parents and illustrate the power of the method to detect QTL of different effect size and different levels of QTL by genetic background interaction. Further, we show how the method can be used in conjunction with standard twolocus QTL detection models that use twodimensional genome searches and find that the method may double the power to detect firstorder epistasis.
TRAITS that show continuous variation among individuals (quantitative traits) are affected by the environment and by many genes [quantitative trait loci (QTL)] that act singly and in interaction with each other. Interaction among QTL, or epistasis, is implicated in a number of important processes. Epistatic variance, as its fraction of the total genetic variance increases, may reduce the resemblance of offspring to their parents (Lynch and Walsh 1998), affect the importance of genetic drift and population structure in evolution (Wright 1980; Wade 1992), increase the genetic variance remaining in a population after a bottleneck (Goodnight 1987, 1988), and lead to either heterosis (Lynch and Walsh 1998) or its reverse, outbreeding depression, which can cause speciation events (Parker 1992; Orr 1995).
Methods to study epistasis using quantitative genetic methods that are based strictly on individual phenotypes lack power because epistasis contributes little to the resemblance among relatives (Cheverud and Routman 1995). The advent of complete DNAmarker linkage maps offers a new possibility for the study of epistasis: QTL mapping allows for the direct evaluation of interaction among identified loci (e.g., simulation model 4 of Haley and Knott 1992; Chaseet al. 1997). Nevertheless, such an approach confronts two important problems: low power to detect firstorder epistasis (between just two QTL) and the intractability of seeking higherorder epistasis (involving multiple QTL).
To evaluate QTL interactions, methods must search for multiple QTL simultaneously. Such a multidimensional search (Kaoet al. 1999) necessitates many statistical tests, and a high statistical threshold must be adopted to avoid falsepositives among those tests. As an indication, Lander and Botstein (1989, Appendix 6) suggest that an mfold higher threshold be used to declare significance for an mdimensional search as for a onedimensional search. This high threshold would drastically reduce power to find significant QTL interactions. To date, researchers have implemented two partial solutions to this quandary. First, they have increased detection power by restricting the search for QTL interactions to portions of the genome (Fijnemanet al. 1996). This restriction ensures a feasible number of tests but leaves other portions of the genome unexplored. Second, they have sought QTL interactions that affect only those QTL that have detectable main effects (Larket al. 1995). This option also drastically reduces the number of tests but fails to discover interacting QTL that have no strong individual effect (Fijnemanet al. 1996; Hollandet al. 1997).
Despite the problem of an appropriate statistical threshold, methods for twodimensional searches have been developed (Haley and Knott 1992; Chaseet al. 1997; Holland 1998; Wanget al. 1999). These methods will allow the detection of firstorder interactions but not necessarily of higherorder interactions. The ability to detect higherorder interactions is desirable because such interactions have been documented (Allard 1988, 1999; Doebleyet al. 1995; AlonsoBlancoet al. 1998). Furthermore, the metabolic pathways that presumably underlie quantitative traits involve multiple interacting gene products (enzymes) and regulatory loci that could generate higherorder epistasis (McMullenet al. 1998).
A possible resolution to both the search dimension problem and the higherorder interaction problem would be to perform a onedimensional search for QTL that interact with the genetic background. In the simplest case of two loci, the alleles present at one locus form the genetic background for alleles present at the other locus. Thus, a firstorder interaction between these two loci would cause each locus to interact with its genetic background. Higherorder epistasis may also cause each locus to interact with its genetic background. Higherorder epistasis may also cause QTLbygeneticbackground interaction (Doebleyet al. 1995; AlonsoBlancoet al. 1998). Detecting such interactions has become more important given improvements in methods to transfer alleles from one background to another using markerassisted selection (Hospital and Charcosset 1997). Such efforts could be fruitless if the transferred alleles failed to affect the phenotype in a new genetic background as they had in their background of origin. Charcosset et al. (1994) and Rebaï et al. (1994, 1997) developed methods to detect QTLbygeneticbackground interaction that employ genotyped progeny resulting from a diallel mating design. These methods use leastsquares tests applied directly to DNA marker data.
In this article, we develop a new method that maps within marker intervals the loci of greatest QTLbygeneticbackground interaction by simultaneous analysis of multiple related inbredline crosses. Using maximum likelihood, the method contrasts models in which QTL allelic values are either nested within populations or are fixed over populations. High likelihood ratios between these models indicate QTLbygeneticbackground interaction. As a further benefit, the method allows statistical control of genetic noise due to other, nonfocal, QTL using multiple regression on marker data [Jansen’s multipleQTL model (MQM) method; Jansen and Stam 1994]. We apply the method to simulated doubledhaploid populations derived from a diallel among three parents and evaluate its power to detect QTL that interact with either the polygenic background or with each other. Finally, we show how the method can be used in conjunction with standard twolocus models.
METHODOLOGY
Consider three doubledhaploid parents, denoted A, B, and C, a diallel of the three possible crosses between them, A × B, A × C, and B × C, and the doubledhaploid populations derived from each of these crosses. Using these populations, QTL may be mapped using the MQM procedure (Jansen and Stam 1994) within each population. The linear model for such a procedure is
For missing QTL or marker information, Jansen and Stam (1994) have shown that maximumlikelihood estimates for the parameters μ_{i}, α_{i}, β_{ic}, and ε_{ij} can be obtained within each population by an expectationmaximization procedure using weighted multiple regression. Given this procedure, the support level for the presence of a QTL at a map location, using information from all populations simultaneously, derives from the likelihood ratio between (FULL) and a noQTL null model. (FULL) contains three more estimated parameters than the noQTL model, that is, one QTL effect per population.
A reduction in the number of estimated parameters is possible if we assume the QTL does not interact with genetic background. In that case, we may consider the allelic value of a QTL fixed over populations and represent the value of the homozygote of the allele from parent X at the QTL locus analyzed as g_{X}, irrespective of genetic background (i.e., population) in which this genotype occurs. The regression coefficients α_{1}, α_{2}, and α_{3} then respectively estimate ½(g_{A}  g_{B}), ½(g_{A}  g_{C}), and ½ (g_{B}  g_{C}) which we denote
Having defined (FULL) and (REDUCED), we see that a QTLbygeneticbackground interaction would cause a difference in their likelihoods. Thus, when using the models to fit a QTL at a locus, a large likelihood ratio between the models provides evidence of a QTL at that locus that interacts with genetic background, in other words, a QTL that interacts epistatically with other loci. In the presence of epistasis, a general relationship between the regression coefficients of (FULL) can be expressed
Further analytical exploration of d reveals its importance in mapping epistatic QTL. We address two questions: first, what is the maximal value of d relative to the substitution effect of the QTL in which it occurs, and second, for a firstorder interaction between two loci, locus 1 and locus 2, how are the corresponding values of d_{1} and d_{2} related?
To answer the first question, consider that in a doubledhaploid population, a QTL with allelesubstitution effect α induces an additive genetic variance of α^{2}. Assuming that α_{1}, α_{2}, and α_{3} are not equal (as is indeed impossible if at least one is nonzero and epistasis is absent), we ask what the maximal value of d may be, given a mean QTL variance over the three populations of
To analyze the relationship between the deviation contrasts d_{1} and d_{2} of two interacting loci, consider the vector of genetic values
SIMULATIONS
Genome generation: The genome consisted of 10 chromosomes of 200 cM each. Two hundred marker loci were randomly distributed over the genome. Each marker locus was assumed to be triallelic, with allele frequencies, in order of abundance, of 0.5, 0.3, and 0.2. For each of three doubledhaploid parents denoted A, B, and C below, alleles were randomly assigned at each marker locus with the probability of each allele depending on its frequency. Given this procedure, the probability that two parents shared an allele at a given marker locus was (0.5)^{2} + (0.3)^{2} + (0.2)^{2} = 0.38. Thus, for a given population, the expected number of segregating markers was 200^{*}(1  0.38) = 124.
In a first set of simulations (set 1), we explored the power to detect QTL interacting with the polygenic background. Doubledhaploid populations from all three crosses (A × B, A × C, and B × C) were generated by doubling simulated gametes from F_{1}’s of each of the crosses. We assumed an isolated QTL to be present at the center of each of six of the chromosomes. Each parent carried a different QTL allele and QTL substitution effects (α_{1}, α_{2}, α_{3}) were picked randomly but were subject to the constraints of Equations 1 and 2 to obtain desired deviation contrasts and average QTL effects. On each of two chromosomes, the fraction of the phenotypic variance caused by additive QTL effects, averaged over the three populations, was
In a second set of simulations (set 2) we evaluated how best to combine QTLbybackground interaction information with standard twolocus models to detect firstorder epistasis. We simulated two QTL, each at the center of a chromosome. Genetic values for the alleles derived from each parent were generated as given in Table 2, scaled so that the total genetic variance caused by the QTL together was a fraction
QTL analysis, set 1: For each population separately, three markers were chosen per chromosome to be candidate cofactors in the MQM procedure (leading to 3 × 10 = 30 candidate cofactors over the entire genome). Segregating markers were chosen that allowed the most uniform chromosome coverage. Because the same markers did not necessarily segregate in each population, different sets of candidate cofactors resulted per population. Using all candidate cofactors we calculated a biasadjusted residual variance for each population. These variances are unbiased (Jansen 1994) and we used them for all further estimations on the three populations. Again, for each population separately, we used a backward elimination procedure to retain in the model only those cofactors that explained significant proportions of variance. To determine whether to retain a cofactor, we used a threshold T such that P(F_{1,d.f.} > T) = 0.02, with d.f. equaling the number of residual degrees of freedom in the allcofactor model, that is, d.f. = [number of individuals in population  (number of candidate cofactors + 1)].
To locate QTL we then scanned the full genome in 5cM steps. We first calculated the likelihood of the data under (FULL) in the absence of a QTL (L_{NoQTL}), but using all retained cofactors with the exception of cofactors within 25 cM of the putative QTL. The likelihood of the data under (FULL) was then calculated in the presence of a QTL (L_{Full}). Finally, we calculated the likelihood of the data under (REDUCED) (L_{Reduced}). From these likelihoods we calculated three likelihood ratios:
We ran simulations with population sizes of 50, 100, and 200 doubledhaploid individuals per population, and with deviation ratios of zero to three in onehalf increments. To determine genomewide significance thresholds for these three statistics we performed 3000 simulation runs on individuals generated without genetic variance. We chose as threshold the 95th percentile value of the genomewide maxima of the statistics. The power of (FULL) or (REDUCED) to detect a QTL under specified conditions is the fraction of simulated QTL for which LR_{Full} or LR_{Reduced} exceeded their thresholds. Similarly, the power to detect a QTLbygeneticbackground interaction is the fraction of QTL for which LR_{Deviation} exceeded its threshold.
QTL analysis, set 2: We used two methods to detect pairs of interacting QTL, one with information from the deviation contrast method and one without it. In the first, we assumed that population A × B was part of a diallel, as would be typical for a population in an applied plant breeding program. That diallel was analyzed as indicated above and results were used to determine which regions of the genome should be paired for analysis using the twolocus model for epistasis,
The overall analysis proceeded as follows. The maximal LR_{Deviation} (Equation 4) for each chromosome determined the position of maximal support for the presence of an epistatic QTL. If the sum of LR_{Deviation} for two chromosomes exceeded a threshold T, and the signs of the deviation contrast d calculated from Equation 1 were opposite, 90cM regions surrounding the points of maximal support were analyzed in each population separately using (BILOC). As T increases, the number of locus pairs analyzed using (BILOC) declines and therefore the likelihood ratio necessary to ensure a 5% type I error rate also declines. We obtained thresholds for BILOC at different T from 1000 simulation runs on individuals generated without genetic variance.
In the second method, we assumed that the population A × B was analyzed for epistatic QTL without the benefit of information from a diallel analysis. (BILOC) was therefore applied over the whole genome leading to a much greater number of tests. We report the power to detect both QTL of an interacting pair, mapped to within 25 cM of their simulated positions. For both methods, we used significance thresholds for a 5% type I error rate on a populationwise basis. If several populations were analyzed, a further Bonferroni correction would be applied.
RESULTS
QTLbybackground interaction: Analysis results from a single simulation run in Figure 1 illustrate the type of output from the method. The likelihood ratios between (FULL) and a noQTL model show support for the presence of QTL at their simulated locations, irrespective of whether the QTL is epistatic to others. In contrast, the likelihood ratios between (FULL) and (REDUCED) specifically identify QTL that are epistatic to others segregating in the background of the populations. Using the regression coefficients estimated from (FULL), a deviation contrast can be calculated using Equation 1. Consistent with theory, the estimated deviation contrasts for two QTL involved in firstorder epistatic interaction are of similar magnitude but opposite sign. At the loci of maximal LR_{Deviation}, the deviation ratios estimated for the two QTL were 2.15 and 2.09, values that exceed in magnitude the ratio of 1.73 =
Empirical powers to detect QTL with nonnull deviation contrasts are graphed in Figure 2. The magnitude of the deviation contrast increases with both the average variation caused by the QTL and with the deviation ratio. For a given
Note that two forms of type I error may occur in QTLbygeneticbackground mapping: a significant deviation contrast may be declared either in the absence of true genetic variation or in the presence of a QTL that does not interact with genetic background. We set significance thresholds using simulations of genomes without genetic variance and therefore obtained α = 5% for the first form of type I error. Figure 2 shows the second form of type I error rate on a per QTL basis as the “detection power” when the deviation ratio is zero. This rate depended on the variation caused by the QTL and on the population size (Figure 2). Because the rate is given on a per QTL basis, it can be <5%, for example, when
High values for either LR_{Full} or LR_{Reduced} should indicate the presence of a QTL segregating within the diallel. In effect the difference between (FULL) and (REDUCED) is that QTL allele values are nested within populations in (FULL) while they are considered fixed in (REDUCED). Because (REDUCED) estimates one parameter less than (FULL) it may be more powerful to detect a segregating QTL if its assumption of fixed allele effects is correct. Based on our simulations, (REDUCED) is indeed more powerful than (FULL) for QTL with a deviation ratio lower than one (Figure 3). The gain in power, however, appeared relatively small, and for QTL with larger deviation ratios, a possibly large loss in power occurred. In contrast, (FULL) was impervious to changes in the deviation ratio. Methods have been described in the literature that model nested QTL effects over multiple populations (Xieet al. 1998; Xu 1998). Our simulation results indicate that for methods that seek to gain power by assuming fixed allele effects over multiple populations, a consequence of epistasis might precisely be loss of power. Methods applied to complex human or animal pedigrees that combine segregation and linkage analysis to map QTL (e.g., Lin 1999) often assume fixed allele effects because the sizes of the families they analyze are small. This assumption may also be made for plant populations. In either case, epistasis may reduce the QTL detection power of these methods.
Firstorder QTL interaction: If a population is embedded within a diallel, information from a prior analysis using (FULL) and (REDUCED) will provide evidence as to which regions of the genome carry epistatic QTL. That information, in turn, may be used to decrease the number of pairwise tests that must be carried out to detect firstorder epistasis using (BILOC). Under a null model containing no QTL, reducing the number of tests performed should also reduce the level of highest excursion of (BILOC)’s likelihood ratio. In consequence, a lower threshold could be used to declare significant firstorder epistasis and higher power should result. We investigated the effectiveness of such a twostep procedure by simulation.
In a twostep procedure, for an interacting pair of QTL to be detected, they must both pass both steps. Statistical thresholds for both steps must be chosen to attain the desired overall type I error rate (here, α = 5%). High stringency in step 1 reduces the power to pass step 1 but also drastically reduces the number of tests performed in step 2, thereby increasing the power to pass step 2. A tradeoff between the powers to pass each step results (Figure 4a). Under the conditions simulated, the stringency for step 1 that is optimal for obtaining the greatest overall power occurs for a minimal sum of the deviation likelihood ratios (R LR_{Deviation}) at two loci under consideration between 8 and 13 (Figure 4b). The rationale for using R LR_{Deviation} in step 1 derives from the result that even for two loci that are simulated to be interacting, the test statistics associated with each locus were independent. By simulation we found correlations between LR_{Deviation} statistics for two loci of r = 0.01 and r = 0.03 for
With this twostep procedure the powers obtained to pass both steps and map both QTL to within 25 cM of their simulated positions were 46 and 11% for QTL pairs with
DISCUSSION
Combining across and withinpopulation information: The twostep procedure that we have explored constitutes a method to combine information obtained across populations (QTLbygeneticbackground interaction) and within populations (QTLbyQTL interaction). A possible drawback of this combination method is that interacting QTL will not be found unless test statistics for both steps (R LR_{Deviation} and LR_{Epistasis}) exceed minimal thresholds. Low correlations found by simulation between R LR_{Deviation} and LR_{Epistasis} statistics (r = 0.21 and r = 0.10 for
Thus far, we have applied across and withinpopulation models in separate analyses. As a further refinement, however, it would be possible to combine the linear models used in each analysis. A combinedmodels approach would test a twolocus extension of (REDUCED) against a twolocus extension of (FULL) combined with (BILOC) interaction regression parameters within each population [call these models (REDUCED)^{2} and (FULL)^{2}, respectively]. In the case of a threepopulation diallel, (FULL)^{2} would differ from (REDUCED)^{2} by five parameters: one parameter per locus for genetic background interaction and one parameter per population in the diallel for locusbylocus interaction. Relative to performing separate analyses, the combinedmodel analysis would gain power by estimating nuisance parameters (the mean and cofactor parameters) only once. It would also more powerfully detect epistatic interactions when they were manifest in all three populations rather than in only a single population as we simulated (Table 2). Finally, we note that while the compound statistic appears to increase the power to detect epistatic QTL, it also creates interpretation problems. If a QTL pair was found to be significant, further analysis would be required to determine what sources contributed to that significance.
We also note that we have only discussed combining information in the ideal situation where two loci interact with each other but with no further loci. When a locus interacts with more than one other locus, the simple equation linking deviation ratios among two interacting loci, d_{1} + d_{2} = 0, will no longer hold. For three loci, d_{1} + d_{2} + d_{3} = 0 will hold if no threeway interaction occurs among loci. This zero sum, however, does not imply any simple pairwise relationship between, say, d_{1} and d_{2}. As the complexity of interactive sets of QTL increases, therefore, QTLbygeneticbackground interaction information will likely become less useful for the detection of firstorder interactions.
Generalization to other population structures: When QTL affecting the same trait are mapped in several populations they are generally not found in the same locations over the populations (e.g., Brummeret al. 1997; Orfet al. 1999). Beavis (1994) pointed out a number of sources of difference between different QTL experiments that reduce reproducibility. First, when the power to detect a QTL is less than one, it may be detected in some experiments but not others. A QTL’s detection is enhanced or reduced by colinearity with error residuals caused by random measurement variation and by colinearity with other QTL caused by random sampling of particular genotypes among segregating progeny. With proper randomization, these sampling variations will not be repeated from experiment to experiment. Second, a QTL will be detected only in populations where it segregates: even in the absence of known coancestry, some parents may carry identicalinstate alleles. Third, QTLbyenvironment interaction may affect QTL substitution effects in experiments conducted in different environments. Finally, QTLbygeneticbackground interaction may affect QTL substitution effects in experiments conducted with different populations. This plethora of possible causes leading to discrepancies between QTL mapping results makes the design of experiments to support/reject the QTLbygeneticbackground interaction hypothesis challenging. Clearly, to attribute observed differences in QTL effects to that hypothesis requires mating designs that bring single alleles of known origin into different genetic backgrounds. The diallel mating design achieves this task, but other designs are possible.
For populations produced from matings among N inbred lines a maximum of N alleles may segregate at a given locus. For a (REDUCED) model assuming QTL allelic values g_{i} (i = 1... N) that are fixed over populations, N  1 values may be estimated with the constraint
As a final example consider two inbred parents (A and B) and two recombinantinbredline or doubledhaploid populations, one produced from the F_{2} generation and the other produced from a BC_{1} generation. Here N = 2 and P = 2, and the populations have different genetic backgrounds because in the first, segregating alleles come from each parent equally, while in the second, segregating alleles come from one parent 75% of the time. Beavis et al. (1994) noted such a background difference as a possible factor explaining differences between their results and those of Stuber et al. (1992) for QTL affecting yield in maize (Zea mays L.). In this case, (FULL) estimates two substitution effects (g_{A1}g_{B1}) and (g_{A2}g_{B2}) while the algebraic identity leading to (REDUCED) is particularly simple: (g_{A1}g_{B1}) = (g_{A2}g_{B2}) = (g_{A}g_{B}). This example is a special case of the mechanism discussed by Goodnight (1988) whereby founder events convert epistasis into additivity: partial fixation at loci “will tend to incorporate their epistatic interaction into additive genetic variance.” Thus, (g_{A1}g_{B1}) and (g_{A2}g_{B2}) may differ if an interaction between loci in the F_{2}derived population is converted to a main effect in the BC_{1}derived population.
As illustrated in Table 2, a similar mechanism can lead to QTLbygeneticbackground interaction when more than two parents are used but the epistatically interacting QTL they carry are only biallelic. When several populations are derived from such parents, one may expect that in some populations some but not all of the interacting QTL will be segregating. In such cases, the epistatic variance will be converted to additive variance at the segregating loci, leading to detectable deviation contrasts. For crop species that have gone through major bottlenecks in the course of their domestication, such as soybean [Glycine max (L.) Merrill.] in the United States (Gizliceet al. 1993), this mechanism would appear as a simple and possibly frequent explanation for QTLbygeneticbackground interaction.
Implications for markerassisted selection: Research reports and theory surrounding markerassisted selection often sidestep the issue of QTLbygeneticbackground interaction. For example, Hospital and Charcosset (1997) discuss optimal markerassisted QTL introgression approaches “provided the expression of the gene is not reduced in the recipient genomic background.” However, within a markerassisted selection program in which progeny from different crosses are routinely genotyped, the methods described would make it possible to systematically uncover QTL that interact with genetic background. Within such a context, the information obtained could refine genotypebased selection indices, either by avoiding QTL that interact with genetic background or by improving the prediction of the genetic value conferred by specific QTL combinations.
Acknowledgments
We thank two anonymous reviewers for suggestions and improvements. A National Science Foundation North Atlantic Treaty Organization Postdoctoral Fellowship in Science and Engineering, DGE9902466, supported J.L.J.’s work on this research.
Footnotes

Communicating editor: J. A. M. van Arendonk
 Received March 26, 2000.
 Accepted October 9, 2000.
 Copyright © 2001 by the Genetics Society of America