Genetics, Vol. 162, 951-960, October 2002, Copyright © 2002

A Penalized Likelihood Method for Mapping Epistatic Quantitative Trait Loci With One-Dimensional Genome Searches

Martin P. Boera, Cajo J. F. ter Braaka, and Ritsert C. Jansenb
a Biometris, 6700 AC Wageningen, The Netherlands
b Groningen Bioinformatics Centre, University of Groningen, 9700 AV Groningen, The Netherlands

Corresponding author: Martin P. Boer, Bornsesteeg 47, PO Box 100, 6700 AC Wageningen, The Netherlands., m.p.boer{at}plant.wag-ur.nl (E-mail)

Communicating editor: P. D. KEIGHTLEY


*  ABSTRACT
*TOP
*ABSTRACT
*METHODOLOGY
*SIMULATIONS
*RESULTS
*DISCUSSION
*LITERATURE CITED

Epistasis is a common and important phenomenon, as indicated by results from a number of recent experiments. Unfortunately, the discovery of epistatic quantitative trait loci (QTL) is difficult since one must search for multiple QTL simultaneously in two or more dimensions. Such a multidimensional search necessitates many statistical tests, and a high statistical threshold must be adopted to avoid false positives. Furthermore, the large number of (interaction) parameters in comparison with the number of observations results in a serious danger of overfitting and overinterpretation of the data. In this article we present a new statistical framework for mapping epistasis in inbred line crosses. It is based on reducing the high dimensionality of the problem in two ways. First, epistatic QTL are mapped in a one-dimensional genome scan for high interactions between QTL and the genetic background. Second, the dimension of the search is bounded by penalized likelihood methods. We use simulated backcross data to illustrate the new approach.


EPISTASIS is a common and important phenomenon, as indicated by results from an increasing number of recent experiments [see, for example, LARK et al. 1995 Down for soybean; HOLLAND et al. 1997 Down for oat; NAGASE et al. 2001 Down for mice]. Unfortunately, mapping the epistatic quantitative trait loci (QTL) is a serious and difficult challenge. Most of the existing methods for detecting epistasis use a higher dimensional search; see, for example, CHASE et al. 1997 Down, HOLLAND 1998 Down, KAO et al. 1999 Down, ZENG et al. 1999 Down, CARLBORG et al. 2000 Down, SEN and CHURCHILL 2001 Down. JANNINK and JANSEN 2001 Down proposed using multiple related inbred line crosses for detecting epistasis. In their approach epistatic QTL are mapped by identifying loci of high interaction between QTL and genetic background in a one-dimensional genome scan by combining information from the different crosses.

This article presents a one-dimensional search approach for detecting interacting QTL in single populations, which we call EpiMQM (epistatic multiple QTL mapping), an extension of MQM (JANSEN 1993 Down; JANSEN and STAM 1994 Down). EpiMQM uses penalized likelihood methods to fit the interaction between a QTL and its genetic background. Penalized likelihood methods were also used by WHITTAKER et al. 2000 Down in the context of marker-assisted selection. We describe the statistical machinery and subsequently present results from a number of illustrative simulations, mimicking crossings between a recipient strain and a derived congenic strain in mice (e.g., MOEN et al. 1996 Down).


*  METHODOLOGY
*TOP
*ABSTRACT
*METHODOLOGY
*SIMULATIONS
*RESULTS
*DISCUSSION
*LITERATURE CITED

We describe the EpiMQM method for inbred line crosses. Without loss of generality, we here describe our approach for the case of a backcross population. The EpiMQM method consists of two steps, quite similar to standard MQM mapping. In the first step of EpiMQM, we use penalized multiple regression of the trait on the main effects of all the markers. The model is found by optimizing a certain criterion that we describe later in this section. In the second step, a one-dimensional genome search for QTL is done. This search combines interval mapping (IM; LANDER and BOTSTEIN 1989 Down) with penalized multiple regression of the trait on the QTL and its QTL by background interactions. In this step we test the presence of a QTL at a certain map position, while the other QTL form the genetic background and are approximated by nearby markers.

Step 1 of EpiMQM: penalized regression on the markers:
Let n denote the number of individuals. The number of markers is denoted by m. The linear model for the ith individual in a backcross population is given by

where yi denotes the trait value of the ith individual, µ is the mean of the population, {alpha}j is the substitution effect for the jth marker, and {epsilon}i ~ N(0, {sigma}2) is the residual error for individual i. The variable xij indicates the genotype of marker j of individual i. If the marker is homozygous (heterozygous), then xij = 1/2 (xij = -1/2). At first we assume that all markers are completely informative. Later in this section we show how the method can easily be extended to the case with missing data. In vector notation, the above model can be written as Y = Xß + {epsilon}, where

In standard multiple regression, the estimators of {alpha}j are not restricted; i.e., it is implicitly assumed that quite large values of {alpha}j are not unreasonable. Penalized regression can be regarded as an approach for estimating ß from the data, subject to the belief that smaller values of the parameters {alpha}j are more likely than larger values. For this reason it is assumed here that the parameters {alpha}j are outcomes of m independent draws from normal priors; i.e., {alpha}j ~ N(0, {sigma}2{alpha}j), where the parameter {sigma}2{alpha}j indirectly puts a penalty on larger values of the parameter {alpha}j (GOLDSTEIN and SMITH 1974 Down; DRAPER and SMITH 1981 Down; GELMAN et al. 1995 Down). No prior information for the additive effect of marker j is given if , while implies that {alpha}j = 0 (j = 1, 2, ... , m). In Bayesian terms, the parameters {alpha}j are shrunk toward a prior mean of zero. The penalized likelihood is given by

(1)

where {phi}(x, {sigma}2) is the probability density function of the normal distribution with mean x and variance {sigma}2. Note that the penalized likelihood approach can be regarded as a special case of a mixed model, where Equation 1 is the joint probability density of the responses and the unobserved random effects. Maximization of Equation 1 gives, taking

where Q is a (m + 1) x (m + 1) diagonal matrix: Q = diag(0, {lambda}1, {lambda}2, ... , {lambda}m). In this first step of the EpiMQM method we assume that the penalties {lambda}j are all equal to {lambda}. In this form, with Q = diag(0, {lambda}, {lambda}, ... , {lambda}), the method is known as ridge regression (e.g., DRAPER and SMITH 1981 Down; WHITTAKER et al. 2000 Down). The first diagonal element of Q is zero because there is no penalty on the mean µ. The inverse matrix of (XTX + Q) exists if {lambda} is positive, which implies that the estimators and 2 are unique, even if the matrix XTX is singular. In contrast with standard regression, the dimension of the model is not equal to the number of free parameters. The effective dimension, denoted by deff, can be defined by

as described by HASTIE and TIBSHIRANI 1990 Down and EILERS and MARX 1996 Down. Note that deff = m + 1 if {lambda} = 0 and XTX is nonsingular. For {lambda} -> {infty}, the estimators of all parameters, except for the mean µ, will be equal to zero, and thus deff = 1. Fig 1A shows the effective dimension as a function of the parameter {lambda} for a simulated backcross example (see SIMULATIONS for further details). HASTIE and TIBSHIRANI 1990 Down also show that deff can be used for an unbiased estimation of {sigma}2,



View larger version (12K):
In this window
In a new window
Download PPT slide
 
Figure 1. (a) Effective dimension deff as a function of the penalty parameter {lambda} for simulation I. Note that the horizontal axis has the base 10 log scale. For {lambda} close to zero the ridge regression method is approximately equal to regression on all markers, while for {lambda} -> {infty} the method reduces to the model with no markers. (b) Bayesian information criterion (BIC) as a function of the parameter {lambda}. The function has a minimum at {lambda} {cong} 187, with effective dimension deff {cong} 5.8.

In this article we use the unbiased estimator instead of the maximum-likelihood estimator.

If there are missing marker scores, the parameters can be estimated via the expectation-maximization (EM) algorithm (DEMPSTER et al. 1977 Down). The basic idea of the EM algorithm is to replace the incomplete observations by complete observations, weighting the complete observations by specified or updated probabilities (e.g., JANSEN 1994 Down, JANSEN 2001 Down). The EM algorithm consists of the following two steps:

  1. Expectation step: Specify or update the weight matrix W, which is a diagonal matrix of conditional probabilities P(genotype|yi, ß, {sigma}2).

  2. Maximization step: Update estimates of ß and {sigma}2,

    where X denotes here the augmented design matrix and Y denotes the vector with the augmented trait data (JANSEN 2001 Down).

These two steps are iterated until convergence to a stationary point of the likelihood. In the final step we use the unbiased estimator for {sigma}2; i.e., we divide by n - deff instead of n, where deff is defined by deff = trace ((XTWX + Q)-1 XTWX).

The extra penalty parameter {lambda} introduces a new problem: What is the optimal choice for the value of {lambda}? For the main effects of the markers we use the Bayesian information criterion (BIC; KASS and RAFTERY 1995 Down) to find the best balance between model complexity and fit to the data,

where {kappa} = ln n and Lmax is the maximum likelihood. Note that both Lmax and deff depend on the penalty parameter. The optimal penalty is found by minimizing the BIC. The effective dimension at the minimum of the BIC is denoted by dmain, since it gives an approximation of the dimension of the model with only main effects. Fig 1B shows the BIC as a function of the parameter {lambda} for simulation example I (see SIMULATIONS). For this example, the BIC has a minimum at {lambda} {cong} 187, with effective dimension dmain {cong} 5.8. Akaike's information criterion (AIC) tends to select more complex models (and thus lower values for {lambda}), since {kappa} = 2 for AIC. JANSEN 1994 Down used {kappa} = 6 for the selection of marker cofactors in MQM mapping, i.e., a more stringent penalty for the number of free parameters. This criterion is equivalent to BIC if the population size is 400.

Other approaches than AIC and BIC can be used to select a value for {lambda}, in particular, (generalized) cross-validation (e.g., EILERS and MARX 1996 Down), which aims to maximize predictive accuracy. Another method that can be used is restricted maximum likelihood (REML), as noted by FERNANDO and GROSSMAN 1989 Down. Another quite different approach that can be used is to choose a penalty {lambda} on the basis of a fixed dimension dmain as proposed by HASTIE and TIBSHIRANI 1990 Down in the context of smoothing splines. The choice for the value of dmain should reflect the prior belief for the number of QTL in the segregating population. In the next section we describe how this method can be used for modeling the interaction between a QTL and its genetic background.

Step 2 of EpiMQM: one-dimensional genome scan for epistatic QTL:
We describe how the combination of interval mapping and penalized regression can be used to search for epistatic QTL along the genome. Suppose we test for the presence of a QTL at a certain map position. This can be seen as adding an extra marker to the regression model, where all marker scores are missing. The model reads

(2)

where {alpha}0 is the additive effect of the putative QTL, {delta}j is the interaction effect between the QTL and marker j, and xi0 is an indicator variable for the QTL of individual i. If the incomplete observations are replaced by the complete observations (see, e.g., JANSEN 2001 Down), the model can be written as Y = Xß + {epsilon}, where X is the augmented design matrix, Y the vector with the augmented trait data, and ß = (µ, {alpha}0, {alpha}1, ... , {alpha}m, {delta}1, {delta}2, ... , {delta}m)T. For this model the penalty matrix Q has the general form,

where {lambda}j is a penalty on the additive effect of marker j and {xi}j is the penalty on the interaction between the putative QTL and marker j. The first two diagonal elements of the matrix Q are zero because we do not put a penalty on the mean µ and the QTL main effect {alpha}0. As noted by one of the referees, it might be interesting to put a small penalty on the QTL main effect {alpha}0, to reduce the bias in the estimates of this parameter (GORING et al. 2001 Down). However, in this article we concentrate on the primary goal of a genome scan for finding the locations of the QTL. We further assume that {lambda}j = {lambda} and {xi}j = {xi} if the distance between putative QTL and marker j is at least 10 cM, otherwise {lambda}j = {xi}j = {infty} (and thus {alpha}j = {delta}j = 0). Thus, a marker is not used in the model if the distance between the marker and the putative QTL is too small. This corresponds to the MQM method (JANSEN 1994 Down), where a selected cofactor is dropped if the cofactor lies within a preset window of 20 cM, or any other size of the window set by the user, around the putative QTL. Note that the penalty matrix Q and the effective dimension deff are functions of {lambda}, {xi}, and p, the map position of the putative QTL. For this reason we also denote the effective dimension deff by the extended notation deff({lambda}, {xi}, p).

Before we can perform a scan for QTL we need to choose values for the penalties {lambda} and {xi}. We could use BIC to find optimum values {lambda} and {xi} at each map position p under study. However, we used a slightly more heuristic procedure to save computing time. The values of {lambda} and {xi} at a position p can be chosen such that

where depi is the effective dimension for epistatic interactions. The value of depi can be chosen by the user. As noted earlier, this is similar to the approach proposed by HASTIE and TIBSHIRANI 1990 Down for smoothing splines. It turns out that the effective dimensions deff({lambda}, {infty}, p) and deff({lambda}, {xi}, p) hardly vary when moving the map position p of the QTL along the genome with fixed values of {lambda} and {xi}. Therefore we decided to calculate {lambda} and {xi} at an arbitrary map position and to use these values when the putative QTL is moved along the genome. In the simulation studies in this article we used the center of the second chromosome as the map position to calculate {lambda} and {xi}.

If we choose depi close to zero, we implicitly assume that there are no epistatic interactions. If there is strong evidence, for instance from earlier experiments, that there are many interacting QTL, a high value for depi can be chosen. However, high values of depi (and thus {xi} close to zero) will result in a decrease in power to detect QTL, since there are too many free interaction parameters in the model.

The evidence for a QTL at a certain map position can be expressed as usual in the LOD score, which is defined by

where 0 and 1 are the maximum-likelihood estimates under the null hypothesis H0 and alternative hypothesis H1, respectively. There are two tests of interest; one is to test for the presence of a QTL, and another is to test for the presence of interactions between a putative QTL and its genetic background. In this article we demonstrate the first test. The hypotheses for testing for the presence of a QTL are defined by


*  SIMULATIONS
*TOP
*ABSTRACT
*METHODOLOGY
*SIMULATIONS
*RESULTS
*DISCUSSION
*LITERATURE CITED

We present three simulated backcross examples. In each of these examples the genome consists of three chromosomes of 100 cM each and with one QTL at 35 cM on chromosome 1, a second at 53 cM on chromosome 2, and a third at 22 cM on chromosome 3. The markers are placed at a distance of 10 cM apart. The data are simulated for 200 backcross individuals. The model for the phenotype yi is given by

where nQTL = 3 is the number of QTL, {alpha}j is the additive effect of QTL j, and {delta}kl is the interaction effect between QTL k and l. The indicator variables qij depend on the genotype of QTL j for individual i. If QTL j of individual i is homozygous, then qij = 1/2, otherwise qij = -1/2. If all three QTL are unlinked, the total genetic variance is given by

The random environmental variable is normally distributed with mean zero and variance {sigma}2e. The heritability in the broad sense (FALCONER and MACKAY 1996 Down), denoted by h2, is the fraction of the total variance explained by the genetic variance: .

The parameter values for the three sets of simulations are given in Table 1. In examples I and II it is assumed that there is a strong interaction between QTL 1 and 2, while their main effects are relatively low compared to QTL 3. Further, QTL 3 interacts weakly with QTL 1 and 2. Examples I and II differ only in the heritability h2. In example III it is assumed that the three QTL have no main effects, while the values of the interactions between the QTL are set equal to the values in examples I and II. There are two reasons why we include this simulation set in which all genetic variance is explained by epistatic interactions. First, previous work by FIJNEMAN et al. 1996 Down, FIJNEMAN et al. 1998 Down suggests that cancer susceptibility genes in mice are often completely counteracting; i.e., these interacting genes show no significant main effects. The second reason is that in most methods for QTL mapping it is difficult to search for QTL with no main effects.


 
View this table:
In this window
In a new window

 
Table 1. Parameter values for the three sets of simulations

Example I consists of one single simulation, whereas examples II and III consist of a set of 1000 simulations. The first example is used to illustrate the EpiMQM method, whereas the other two examples are used to study the power of this new method. We obtained 5% significance thresholds from 10,000 simulation runs on individuals generated without genetic variance. In each simulation run the maximum LOD score along the genome was calculated by using a fixed stepsize of 1 cM. The LOD threshold value for depi = d is denoted by Td. Fig 2 shows that a high value for depi will lead to a decrease in power to detect QTL, because of the high threshold values. In Fig 2 we also plotted threshold values using the formula given by LANDER and BOTSTEIN (1989). For interval mapping (IM) they showed that in the limit of an infinitely dense map and a large progeny size, the LOD score along the genome varies according to the square of an Ornstein-Uhlenbeck diffusion process. The 5% significance threshold T can be approximated by T = x/(2 ln 10), where x is defined by the equation

(3)



View larger version (12K):
In this window
In a new window
Download PPT slide
 
Figure 2. Threshold values as a function of the effective dimension for interactions depi. The solid line is the threshold value obtained from 10,000 simulation runs. The dashed line is the approximated threshold given by Equation 3. For a further explanation see text.

In this formula {alpha} = 0.05 is the significance level, C is the number of chromosomes, G is the genome length in morgans, and {chi}2f(x) denotes the cumulative distribution function of the {chi}2 distribution with f degrees of freedom. We calculated an approximation of the threshold by setting the degrees of freedom f to depi + 1, i.e., to the difference in effective number of degrees of freedom between the null hypothesis H0 and the alternative hypothesis H1. Fig 2 shows that the differences between the thresholds obtained from simulations and Equation 3 are small. Therefore it seems reasonable to use this formula for the threshold if it will take too much computer time to obtain threshold values by simulation. In this article we use the threshold values obtained from simulations. Note that this procedure controls the experimentwise error of falsely detecting a QTL (WELLER et al. 1998 Down).


*  RESULTS
*TOP
*ABSTRACT
*METHODOLOGY
*SIMULATIONS
*RESULTS
*DISCUSSION
*LITERATURE CITED

Example I:
As described earlier, the EpiMQM method consists of two steps: penalized regression on the markers and a one-dimensional scan searching for interacting QTL. For this simulation example the first step was already discussed in METHODOLOGY (see Fig 1 and Fig 2).

Fig 3A shows the LOD scores along the genome for depi = 0; i.e., interactions between QTL and genetic background are excluded from the model. For this simulation only QTL 3 has a LOD score above the threshold T0 = 2.1. The reason is that QTL 3 has a strong main effect, whereas the other two QTL have weak main effects. Fig 3B gives the LOD scores if we include interactions between a QTL and its genetic background by setting depi = 3. Although increasing depi increases the threshold value from T0 = 2.1 to T3 = 3.7, all three QTL are detected, with LOD scores far above the threshold value.



View larger version (16K):
In this window
In a new window
Download PPT slide
 
Figure 3. LOD scores along the genome for simulation I for depi = 0 (a) and depi = 3 (b). The horizontal dashed lines are the 5% significance threshold values T0 = 2.1 (a) and T3 = 3.7 (b), obtained from 10,000 simulation runs. (a) Only QTL 3 is detected, since it has a relatively large main effect, whereas the other two QTL have weak main effects. (b) All three QTL are detected, with LOD scores far above the threshold value.

In Fig 4 the squared interaction effect {delta}2j between QTL and marker j is plotted for three positions of the putative QTL, namely at the maximum LOD score of each chromosome. The figure clearly shows that, if the putative QTL is located on chromosome 1, there are strong interactions with markers on chromosome 2. Similarly, if the putative QTL is located on chromosome 2, there are strong interactions with markers on chromosome 1. Both curves indicate that there is a strong interaction between the QTL found on the first and second chromosomes. If the putative QTL is located on chromosome 3, the figure shows that there are weak interactions with markers on the first and second chromosomes.



View larger version (20K):
In this window
In a new window
Download PPT slide
 
Figure 4. Squared interaction effects {delta}2j between putative QTL and the markers. The three curves are for three different positions of the putative QTL, namely at the maximum LOD scores of the three chromosomes (see Fig 3B). For a further description see text.

Example II:
In the previous example we calculated the LOD scores for just two settings of depi, namely 0 and 3. Now we study how the power to detect QTL depends on depi. The same parameter values are used as in example I, except that the heritability h2 is set to a lower value (0.30). Fig 5A shows how the power to detect QTL depends on depi. QTL 1 and 2 have a low power to be detected if depi = 0, since these QTL have very weak main effects. A small increase in depi leads to a higher power for all three QTL. For QTL 3, with a large main effect and weak interaction effects, the highest power to be detected is obtained if the value of depi lies between 1 and 3. The other two strongly interacting QTL have the highest chance to be detected if depi lies in a range between 3 and 5. Thus, for this example, depi = 3 is a good choice.



View larger version (18K):
In this window
In a new window
Download PPT slide
 
Figure 5. Power to detect QTL for the simulation sets II (a) and III (b).

Example III:
This example concerns three QTL with only epistatic interactions and no main effects (see Table 1). Fig 5B shows the power to detect the QTL as a function of depi. As desired, the method fails to detect QTL if depi = 0, since the QTL have no main effect. However, the chance of detecting QTL increases rapidly with depi. For intermediate values of depi (2 <= depi <= 10), the strongly interacting QTL 1 and 2 are detected in >95% of the simulations, while the weakly interacting QTL 3 is detected in at least 30% of the simulations.


*  DISCUSSION
*TOP
*ABSTRACT
*METHODOLOGY
*SIMULATIONS
*RESULTS
*DISCUSSION
*LITERATURE CITED

Relation to other QTL mapping methods:
The EpiMQM method can be considered as a generalization of IM (LANDER and BOTSTEIN 1989 Down), composite interval mapping (CIM; ZENG 1994 Down), and MQM (JANSEN and STAM 1994 Down). To clarify the relation between the existing QTL mapping methods and our new method, we introduce the term selection degree, which is defined as s = 1/(1 + {lambda}). Thus, s = 1 if {lambda} = 0, and s = 0 if {lambda} -> {infty}. In IM, no markers are selected as cofactors, which in terms of penalized regression means that s = 0. In the original CIM method (ZENG 1994 Down), all markers are selected as cofactors (except for the flanking markers of the putative QTL), which corresponds to s = 1. In the EpiMQM method, markers can also be partially selected as cofactors; i.e., the selection degree can have any value in the range [0, 1]. Thus, the IM and CIM methods can be regarded as the end points of the EpiMQM method, if in the CIM method the maximum-likelihood estimator of {sigma}2 is replaced by the unbiased estimator.

The disadvantage of the IM method is that there are no markers in the model to control the genetic background. In contrast, if almost all markers are used as cofactors in the model to control the genetic background we may use more markers than necessary. This can reduce the power of the test and increase the sampling variance of estimates, particularly when the sample size is small (JANSEN 1994 Down; ZENG 1994 Down). The penalized regression method offers an alternative for subset selection methods to keep the effective dimension of the genetic background relatively small. The EpiMQM method can be extended such that the MQM method fits in the new statistical framework, by using different penalties {lambda}j and selection degrees sj for each marker j. For the markers that are selected as cofactors by backward selection in the MQM method we have sj = 1, while for the other markers we have sj = 0.

EpiMQM is also related to Bayesian methods for QTL mapping (e.g., HOESCHELE and VANRADEN 1993 Down; DU and HOESCHELE 2000 Down; WAAGEPETERSEN and SORENSEN 2001 Down), as both approaches use prior information and thereby also control the dimensionality of the model. Bayesian methods combine the prior information with the data to produce posterior densities for, for example, the number of QTL, the QTL positions, and their effects. Despite its Bayesian flavor, EpiMQM remains a frequentist analysis in which the evidence for a QTL is evaluated through LOD scores derived in a hypothesis testing framework (see MALIEPAARD et al. 2001 Down for a comparison of Bayesian vs. frequentist analysis). Moreover, EpiMQM uses several approximations, which lead to a smaller search space than used in a full Bayesian analysis. First, EpiMQM uses BIC, which is a numerical approximation to the Bayes factor (KASS and RAFTERY 1995 Down). Second, EpiMQM scans for one focal QTL at a time, using a model in which the effects of the remaining QTL are approximated by marker effects. Third, instead of dealing with all (pairwise) interactions among QTL simultaneously, EpiMQM considers only interactions of the focal QTL with the genetic background (the markers). Fourth, the dimensionality of the EpiMQM model is an approximation, and it is an effective dimension governed by a penalty parameter. These approximations used in the EpiMQM method lead to an optimization problem that can be solved in a modest amount of computer time. The user needs to specify only the effective dimension of the epistatic effects. In contrast, the Bayesian analysis must be carried out by a stochastic algorithm [reversible jump Markov chain Monte Carlo (RJMCMC)] that often requires much computer time. Although much progress has been made in recent years to improve the mixing properties of RJMCMC algorithms for QTL analysis (UIMARI and SILLANPAA 2001 Down; BINK et al. 2002 Down), it remains difficult, if not impossible, to ascertain that the RJMCMC algorithm has reached convergence in a practical application (COWLES and CARLIN 1996 Down). It then helps if an approximate solution is known in advance (GELMAN et al. 1995 Down), which can be provided by EpiMQM. Another approach could be to use our method in a Bayesian framework, i.e., fitting a single QTL with polygenic background.

Extensions and improvements of the EpiMQM method:
There are several ideas to further develop the penalized regression method, which need to be investigated in more detail. One idea for improvement concerns the selection of markers near the putative QTL. In the present form, if the moving QTL gets close to a marker, say a preset window of 10 or 20 cM, the marker is dropped as cofactor. The penalty for main effects jumps from some fixed value {lambda}j = {lambda} to {lambda}j = {infty} if the marker enters the QTL window. An alternative and more general solution for markers near a putative QTL is to define the penalty as {lambda}j = {psi}(r), where {psi} is a decreasing function of the recombination frequency r between marker j and the putative QTL, with {psi}(0) = {infty} and {psi}(1/2) = {lambda}. Defining some smooth function, for example {psi}(r) = {lambda}/(2r), seems more elegant then defining a QTL window.

Another extension of the penalized regression method can be obtained by using other penalty matrices Q. One of the problems in model selection procedures is that closely linked markers can cause statistical artifacts (JANSEN 2001 Down). For example, if two markers j and j + 1 are closely linked, and there is no real QTL in this region, then one would expect that {alpha}j + {alpha}j+1 {cong} 0. But we can still get large values for {alpha}j and {alpha}j+1, with {alpha}j {cong} -{alpha}j+1, which we might take as evidence for two linked QTL with opposite effects. However, in most cases this will be a statistical artifact due to the near collinearity between the two markers. With penalized regression we can reduce the chance of these statistical artifacts by putting a penalty on the differences {alpha}j - {alpha}j-1; see, for instance, EILERS 1991 Down.

Application of penalized likelihood methods to other population structures and marker-assisted selection:
The penalized likelihood method is of interest for a wide range of problems in statistical genetics. One example is the use of ridge regression in marker-assisted selection. WHITTAKER et al. 2000 Down showed that replacing the selection of a subset of markers by ridge regression can improve the mean response to selection and reduce the variability of selection response. A further advantage is that ridge regression can be applied even if XTX is singular, which often occurs after several generations of selection as the population becomes increasingly inbred. The problem of singular or near singular XTX matrices also often occurs in QTL mapping methods applied to a population with a dense marker map. This singularity of the XTX matrix can be caused by the absence of recombination between two markers. The XTX matrix will also be singular if there are more markers than phenotypic observations. Therefore, penalized likelihood methods are also valuable for QTL mapping of populations with a dense marker map.

The EpiMQM model in Equation 2 can be easily extended with fixed effects such as year effects. Other examples where penalized regression methods are of interest are experiments with genotype-by-environment interactions and in multiple related crosses. If we apply the standard MQM method, the number of parameters for the genetic background of a QTL, approximated by nearby markers, can be very high. Therefore it may be interesting to control the dimension of the genetic background by using penalized likelihood methods.


*  ACKNOWLEDGMENTS

We thank Marco Bink, Bas Engel, and Paul Goedhart for useful discussions of statistical methods for QTL mapping. The comments of two anonymous referees are gratefully acknowledged. The first author was supported by grant no. 925-01-001 from the Netherlands Organization for Scientific Research (NWO).

Manuscript received January 8, 2002; Accepted for publication July 18, 2002.


*  LITERATURE CITED
*TOP
*ABSTRACT
*METHODOLOGY
*SIMULATIONS
*RESULTS
*DISCUSSION
*LITERATURE CITED

BINK, M. C. A. M., P. UIMARI, M. J. SILLANPÄÄ, L. L. G. JANSS, and R. C. JANSEN, 2002  Multiple QTL mapping in related plant populations via a pedigree-analysis approach. Theor. Appl. Genet. 104:751-762.[Medline]

CARLBORG, Ö, L. ANDERSSON, and B. KINGHORN, 2000  The use of a genetic algorithm for simultaneous mapping of multiple interacting quantitative trait loci. Genetics 155:2003-2010.[Abstract/Free Full Text]

CHASE, K., F. R. ADLER, and K. G. LARK, 1997  EPISTAT: a computer program for identifying and testing interactions between pairs of quantitative trait loci. Theor. Appl. Genet. 94:724-730.

COWLES, M. K. and B. P. CARLIN, 1996  Markov chain Monte Carlo convergence diagnostics: a comparative review. J. Am. Stat. Assoc. 91:883-904.

DEMPSTER, A. P., N. M. LAIRD, and D. B. RUBIN, 1977  Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 39:1-38.

DRAPER, N. R., and H. SMITH, 1981 Applied Regression Analysis, Ed. 2. Wiley, New York.

DU, F.-X. and I. HOESCHELE, 2000  Estimation of additive, dominance and epistatic variance components using finite locus models implemented with a single-site Gibbs and a descent graph sampler. Genet. Res. 76:187-198.[Medline]

EILERS, P. H. C., 1991  Penalized regression in action: the estimation of pollution rises from daily averages. Environmetrics 2:25-47.

EILERS, P. H. C. and B. D. MARX, 1996  Flexible smoothing with B-splines and penalties. Stat. Sci. 11:89-121.

FALCONER, D. S., and T. F. C. MACKAY, 1996 Introduction to Quantitative Genetics. Longman Group, Essex, United Kingdom.

FERNANDO, R. L. and M. GROSSMAN, 1989  Marker assisted selection using best linear unbiased prediction. Genet. Sel. Evol. 21:467-477.

FIJNEMAN, R. J. A., S. S. DE VRIES, R. C. JANSEN, and P. DEMANT, 1996  Complex interactions of new quantitative trait loci, Sluc1, Sluc2, Sluc3, and Sluc4, that influence the susceptibility to lung cancer in the mouse. Nat. Gen. 14:465-467.[Medline]

FIJNEMAN, R. J. A., R. C. JANSEN, M. A. VAN DER VALK, and P. DEMANT, 1998  High frequency of interactions between lung cancer susceptibility genes in the mouse: mapping of Sluc5 to Sluc14. Cancer Res. 58:4794-4798.[Abstract/Free Full Text]

GELMAN, A., J. B. CARLIN, H. S. STERN and D. B. RUBIN, 1995 Bayesian Data Analysis. Chapman & Hall, Suffolk, United Kingdom.

GOLDSTEIN, M. and A. F. M. SMITH, 1974  Ridge-type estimators for regression analysis. J. R. Stat. Soc. Ser. B 36:284-291.

RING, H. H. H., J. D. TERWILLIGER, and J. BLANGERO, 2001  Large upward bias in estimation of locus-specific effects from genomewide scans. Am. J. Hum. Genet. 69:1357-1369.[Medline]

HASTIE, T. J., and R. J. TIBSHIRANI, 1990 Generalized Additive Models. Chapman & Hall, London.

HOESCHELE, I. and P. M. VANRADEN, 1993  Bayesian analysis of linkage between genetic markers and quantitative loci. I. Prior knowledge. Theor. Appl. Genet. 85:953-960.

HOLLAND, J. B., 1998  EPISTACY: a SAS program for detecting two-locus epistatic interactions using genetic marker information. J. Hered. 89:374-375.[Free Full Text]

HOLLAND, J. B., H. S. MOSER, L. S. O'DONOUGHUE, and M. LEE, 1997  QTLs and epistasis associated with vernalization responses in oat. Crop Sci. 37:1306-1316.[Abstract/Free Full Text]

JANNINK, J.-L. and R. C. JANSEN, 2001  Mapping epistatic quantitative trait loci with one-dimensional genome searches. Genetics 157:445-454.[Abstract/Free Full Text]

JANSEN, R. C., 1993  Interval mapping of multiple quantitative trait loci. Genetics 135:205-211.[Abstract]

JANSEN, R. C., 1994  Controlling the type I and II errors in mapping quantitative trait loci. Genetics 138:871-881.[Abstract]

JANSEN, R. C., 2001 Quantitative trait loci in inbred lines, pp. 567–597 in Handbook of Statistical Genetics, edited by D. J. BALDING, M. BISHOP and C. CANNINGS. Wiley, New York.

JANSEN, R. C. and P. STAM, 1994  High resolution of quantitative traits into multiple loci via interval mapping. Genetics 136:1447-1455.[Abstract]

KAO, C.-H., Z-B. ZENG, and R. D. TEASDALE, 1999  Multiple interval mapping for quantitative trait loci. Genetics 152:1203-1216.[Abstract/Free Full Text]

KASS, R. E. and A. E. RAFTERY, 1995  Bayes factors. J. Am. Stat. Assoc. 90:773-795.

LANDER, E. S. and D. BOTSTEIN, 1989  Mapping Mendelian factors underlying quantitative traits by using RFLP linkage maps. Genetics 121:185-199.[Abstract/Free Full Text]

LARK, K. G., K. CHASE, F. ADLER, L. M. MANSUR, and J. H. ORF, 1995  Interactions between quantitative trait loci in soybean in which trait variation at one locus is conditional upon a specific allele at another. Proc. Natl. Acad. Sci. USA 92:4656-4660.[Abstract/Free Full Text]

MALIEPAARD, C., M. J. SILLANPÄÄ, J. W. VAN OOIJEN, R. C. JANSEN, and E. ARJAS, 2001  Bayesian versus frequentist analysis of multiple quantitative trait loci with an application to an outbred apple cross. Theor. Appl. Genet. 103:1243-1253.

MOEN, C. J. A., P. C. GROOT, A. A. M. HART, M. SNOEK, and P. DEMANT, 1996  Fine mapping of colon tumor susceptibility (Scc) genes in the mouse, different from the genes known to be somatically mutated in colon cancer. Proc. Natl. Acad. Sci. USA 93:1082-1086.[Abstract/Free Full Text]

NAGASE, H., J.-H. MAO, J. P. DE KONING, T. MINAMI, and A. BALMAIN, 2001  Epistatic interactions between skin tumor modifier loci in interspecific (spretus/musculus) backcross mice. Cancer Res. 61:1305-1308.[Abstract/Free Full Text]

SEN, S. and G. A. CHURCHILL, 2001  A statistical framework for quantitative trait mapping. Genetics 159:371-387.[Abstract/Free Full Text]

UIMARI, P. and M. SILLANPÄÄ, 2001  A Bayesian MCMC linkage analysis with segregation indicators for complex pedigrees. Genet. Epidemiol. 21:224-242.[Medline]

WAAGEPETERSEN, R. and D. SORENSEN, 2001  A tutorial on reversible jump MCMC with a view toward applications in QTL-mapping. Int. Stat. Rev. 69:49-61.

WELLER, J. I., J. Z. SONG, D. W. HEYEN, H. A. LEWIN, and M. RON, 1998  A new approach to the problem of multiple comparisons in the genetic dissection of complex traits. Genetics 150:1699-1706.[Abstract/Free Full Text]

WHITTAKER, J. C., R. THOMPSON, and M. C. DENHAM, 2000  Marker-assisted selection using ridge regression. Genet. Res. 75:249-252.[Medline]

ZENG, Z-B., 1994  Precision mapping of quantitative trait loci. Genetics 136:1457-1468.[Abstract]

ZENG, Z-B., C.-H. KAO, and C. J. BASTEN, 1999  Estimating the genetic architecture of quantitative traits. Genet. Res. 74:279-289.[Medline]




This article has been cited by other articles:


Home page
GeneticsHome page
A. Manichaikul, J. Y. Moon, S. Sen, B. S. Yandell, and K. W. Broman
A Model Selection Approach for the Identification of Quantitative Trait Loci in Experimental Crosses, Allowing Epistasis
Genetics, March 1, 2009; 181(3): 1077 - 1086.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
J. C. Reif, B. Kusterer, H.-P. Piepho, R. C. Meyer, T. Altmann, C. C. Schon, and A. E. Melchinger
Unraveling Epistasis With Triple Testcross Progenies of Near-Isogenic Lines
Genetics, January 1, 2009; 181(1): 247 - 257.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
M. P. Boer, D. Wright, L. Feng, D. W. Podlich, L. Luo, M. Cooper, and F. A. van Eeuwijk
A Mixed-Model Quantitative Trait Loci (QTL) Analysis for Multiple-Environment Trial Data Using Environmental Covariables for QTL-by-Environment Interactions, With an Example in Maize
Genetics, November 1, 2007; 177(3): 1801 - 1813.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
A. Baierl, M. Bogdan, F. Frommlet, and A. Futschik
On Locating Multiple Interacting Quantitative Trait Loci in Intercross Designs
Genetics, July 1, 2006; 173(3): 1693 - 1703.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
G. Diao, D. Y. Lin, and F. Zou
Mapping Quantitative Trait Loci With Censored Observations
Genetics, November 1, 2004; 168(3): 1689 - 1698.
[Abstract] [Full Text] [PDF]


Home page
Crop Sci.Home page
D. W. Podlich, C. R. Winkler, and M. Cooper
Mapping As You Go: An Effective Approach for Marker-Assisted Selection of Complex Traits
Crop Sci., September 1, 2004; 44(5): 1560 - 1571.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
M. Bogdan, J. K. Ghosh, and R. W. Doerge
Modifying the Schwarz Bayesian Information Criterion to Locate Multiple Interacting Quantitative Trait Loci
Genetics, June 1, 2004; 167(2): 989 - 999.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
N. Yi, S. Xu, and D. B. Allison
Bayesian Model Choice and Search Strategies for Mapping Interacting Quantitative Trait Loci
Genetics, October 1, 2003; 165(2): 867 - 883.
[Abstract] [Full Text] [PDF]