Abstract
Mapping of quantitative trait loci (QTL) for backcross and F_{2} populations may be set up as a multiple linear regression problem, where marker types are the regressor variables. It has been shown previously that flanking markers absorb all information on isolated QTL. Therefore, selection of pairs of markers flanking QTL is useful as a direct approach to QTL detection. Alternatively, selected pairs of flanking markers can be used as cofactors in composite interval mapping (CIM). Overfitting is a serious problem, especially if the number of regressor variables is large. We suggest a procedure denoted as marker pair selection (MPS) that uses model selection criteria for multiple linear regression. Markers enter the model in pairs, which reduces the number of models to be considered, thus alleviating the problem of overfitting and increasing the chances of detecting QTL. MPS entails an exhaustive search per chromosome to maximize the chance of finding the best-fitting models. A simulation study is conducted to study the merits of different model selection criteria for MPS. On the basis of our results, we recommend the Schwarz Bayesian criterion (SBC) for use in practice.
PROCEDURES for detecting multiple quantitative trait loci (QTL) are of growing interest to plant breeders and geneticists. The currently most widely used methods are interval mapping (IM; Lander and Botstein 1989, 1994) and composite interval mapping (CIM; Jansen 1993; Zeng 1993). In CIM, a chromosome is scanned for the presence of a QTL, while controlling for the genetic background using some markers as cofactors in a multiple regression framework (Zeng 1993; Jansen and Stam 1994). IM and CIM may be implemented using the maximum-likelihood (ML) method (Zeng 1994). Alternatively, an approximate least-squares method can be used (Haley and Knott 1992; Martinez and Curnow 1992; Whittaker et al. 1996). In this article we use the least-squares method.
Whittaker et al. (1996) showed that the least-squares method for IM and CIM in backcross (BC_{1}) and F_{2} populations can be cast as a standard multiple linear regression of phenotype on marker type, since information on the QTL is absorbed by the flanking markers (Stam 1991). Therefore, the problem of QTL detection essentially reduces to the problem of finding the appropriate pairs of markers. In CIM, we have the additional task of selecting cofactors for controlling the genetic background. It may be argued that cofactors are useful for controlling genetic background only if they are closely linked to a QTL. In fact, the best control is expected for markers flanking the QTL. Thus, the dual problem of QTL detection and selection of cofactors is seen to be a single problem of finding the flanking markers of QTL (Lebreton and Visscher 1998). This puts us into the general framework of model selection in multiple regression for which there is a vast literature (see, e.g., Miller 1990; Draper and Smith 1998; McQuarrie and Tsai 1998).
A peculiarity of multiple regression for QTL mapping is that there is no single true model, because there is no fixed set of markers. If we drop a pair of flanking markers from the analysis, the dropped pair can be replaced by adjacent markers. Similarly, if a different marker system is used, marker loci will change, but still the flanking markers will absorb the QTL effects, leading to a different model conditional on the markers. Thus, the term “true model” has to be used with this peculiarity in mind.
We believe that multiple likelihood-ratio (LR) testing (equivalently F-testing if linear least squares is used) for model selection is problematic for several reasons. Most importantly, multiple LR tests without adjustments are known to tend toward overfitting (Gelfand and Ghosh 1998). For example, in multiple regression, using an F-to-enter statistic at the nominal α = 5% level in a forward selection procedure can easily give a true significance level (false positive rate) >50% (Miller 1990). Furthermore, only nested models can be compared with LR tests. Also, the sequence of models to be compared in a model-building process is not unique.
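A rough calculation illustrates how quickly the nominal level is exceeded. It is only a back-of-the-envelope sketch, not the computation of Miller (1990): it assumes that none of the m candidate regressors has a true effect, that the m tests are independent, and that the best of the m candidates is tested at level α in the first step of forward selection. The symbol m is introduced here for illustration only.

\[
P(\text{at least one false entry}) = 1 - (1 - \alpha)^{m}, \qquad 1 - 0.95^{14} \approx 0.51 .
\]

Under these assumptions, as few as 14 candidate regressors push the false positive rate of the first step above 50%. Markers on the same chromosome are correlated, so the actual rate in QTL mapping will differ from this figure.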
One reaction to the problems connected with multiple LR testing is to consider a Bayesian framework (see, e.g., Draper 1995; Sillanpää and Arjas 1998). A somewhat intermediate approach, which is computationally much less demanding than many of the Bayesian methods, is to use information criteria such as Akaike’s information criterion (AIC) or criteria assessing the mean squared error of prediction such as Mallows’ C_{p} (Burnham and Anderson 1998; Gelfand and Ghosh 1998; McQuarrie and Tsai 1998). Some selection criteria such as the Schwarz (1978) Bayesian criterion (SBC) involve a Bayesian approach to model selection. A number of different selection criteria are compared in the present article.
Our suggested procedure is denoted as marker pair selection (MPS). Instead of implementing a standard model selection procedure, we exploit knowledge of the genetic mechanisms underlying the data. Our MPS procedure has three distinctive features: (i) markers are selected in adjacent pairs to increase the chance of selecting flanking markers while reducing the risk of selecting nonflanking markers; (ii) an exhaustive search per chromosome is used in place of simple forward selection, which increases the chance of finding the best-fitting model; and (iii) a model selection criterion such as SBC is employed to select the final model among a sequence of models.
MATERIALS AND METHODS
In this section, we describe the model for mapping QTL in a backcross population as well as the method for parameter estimation. We then develop an MPS algorithm on the basis of a modified forward selection procedure, which generates a sequence of models with an increasing number of markers. From this sequence, a best-fitting model may be selected according to one of the criteria given in Table 2. The performance of MPS is studied by means of simulation.
Model and parameter estimation: Consider a backcross m_{L}m_{L}qqm_{R}m_{R} × M_{L}m_{L}QqM_{R}m_{R}, where M_{i} and m_{i} (i = L, R) denote the left and right flanking marker alleles, while Q, q are the QTL alleles. The recombination frequency between left and right markers is denoted as θ, while r_{L} and r_{R} are the recombination frequencies of the markers and the QTL. Let Y_{j} be the phenotypic value (e.g., yield) of an individual in the backcross population. Then, conditional on the QTL genotype, we write
Model selection criteria: We use criteria in Table 2 to select the best-fitting model among candidate models. A very thorough and concise review of these criteria can be found in McQuarrie and Tsai (1998). Model selection criteria can be broadly classified as either efficient or consistent (McQuarrie and Tsai 1998). Efficient criteria are based on the presumption that the generating or true model is of infinite dimension and/or that the set of candidate models does not contain the true model. The goal is to select the model that best approximates the true model. In large samples, a selection criterion that chooses the model with minimum mean squared error (MSE) is said to be asymptotically efficient. Examples of efficient criteria are AIC, AIC_{c}, C_{p}, final prediction error (FPE), R_{p}, and leave-one-out cross-validation (PRESS; Table 2). Efficient criteria seek to minimize some measure of discrepancy between true model and selected model. The two most common measures are the Kullback-Leibler discrepancy and mean squared error. Some efficient criteria seek to minimize the former (e.g., AIC), while others try to minimize the latter (e.g., FPE). Both discrepancy measures are asymptotically equivalent (McQuarrie and Tsai 1998, p. 7).
Consistent criteria are designed for cases where the true model has low dimension and is assumed to be among the candidate models. A consistent criterion identifies the correct model asymptotically (as sample size increases) with probability one. Examples are SBC, HQ (Hannan and Quinn 1979), HQ_{c}, and GM (Geweke and Meese 1981; Table 2; McQuarrie and Tsai 1998). It is not clear, on a priori grounds, which type of criterion is more appropriate. The objective of QTL mapping is to detect as many of the true QTL as possible, while not detecting false QTL, i.e., to find the true genetic model. If the number of QTL is small, we might expect a consistent criterion to stand a better chance of correctly detecting QTL, while efficient criteria may perform better in more complex cases. Note, however, that optimality of different model selection criteria is based on asymptotic arguments. Therefore, in this article we study the small sample behavior of different criteria by means of simulation.
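The contrast between an efficient and a consistent criterion can be illustrated with a minimal sketch. It assumes the common Gaussian linear-model forms AIC = n log(RSS/n) + 2p and SBC = n log(RSS/n) + p log(n); the versions in Table 2 may be scaled differently. The function names, the parameter count p = 1 + 2k (intercept plus two markers per pair), and the RSS values are hypothetical and chosen for illustration only.

```python
import math

def aic(rss, n, p):
    """AIC for a Gaussian linear model with p regression parameters (assumed form)."""
    return n * math.log(rss / n) + 2 * p

def sbc(rss, n, p):
    """Schwarz Bayesian criterion; the penalty per parameter grows with log(n)."""
    return n * math.log(rss / n) + p * math.log(n)

def select_model(rss_sequence, n, criterion):
    """Return the order k minimizing the criterion.

    rss_sequence[k] is the residual sum of squares of model M_k with k marker
    pairs; p = 1 + 2k is an assumption of this sketch.
    """
    scores = [criterion(rss, n, 1 + 2 * k) for k, rss in enumerate(rss_sequence)]
    return min(range(len(scores)), key=scores.__getitem__)

# Hypothetical RSS values for models M_0, ..., M_4 fitted to n = 200 individuals.
rss_sequence = [400.0, 300.0, 250.0, 244.0, 240.0]
n = 200
best_aic = select_model(rss_sequence, n, aic)
best_sbc = select_model(rss_sequence, n, sbc)
print("AIC selects k =", best_aic, "; SBC selects k =", best_sbc)
```

With these made-up numbers the heavier penalty of SBC stops one marker pair earlier than AIC, which is the kind of behavior the simulation study is designed to examine.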
In case there are more markers than observations, the full model is not estimable, and hence Mallows’ C_{p} and GM are not applicable due to lack of an error variance estimate based on the full model. We might continue the forward selection until the error variance estimate stabilizes, but this raises the problem of determining when stabilization has taken place. Incidentally, sequential F-testing will not work for our procedure, since models in the sequence are not necessarily nested.
Subset selection of markers: In what follows, we first point out the need to select adjacent pairs of markers rather than individual markers. We then make a few remarks regarding applicability of standard subset selection procedures to our problem. Finally, suggestions are given for modifications exploiting the biology of the problem at hand and the procedure is described in algorithmic form.
The effect and position of a QTL can be estimated from the regression coefficients of two flanking markers. A subset selection procedure can be used to find markers that are likely to flank a QTL. If one marker is selected, we will also have to include one of the adjacent markers, because two flanking markers are needed in the estimation procedure. Whittaker et al. (1996) state that “an exception to this rule might be when markers are fitted as cofactors to absorb the effect of QTL which, although too small to be mapped individually, contribute a significant portion of genetic variance.” In our procedure, we include pairs of adjacent markers as a general rule. An obvious requirement for entry of a pair of adjacent markers into the model is that the sign of their estimates be the same, for otherwise the estimated model is not consistent with the presence of a QTL between the pair (Whittaker et al. 1996).
Since we are in a multiple regression framework, standard procedures for subset selection could be used, such as forward selection, etc. (Miller 1990; Draper and Smith 1998). By so doing, however, we would ignore all we know about the relationship among markers. A potential payoff is expected if this knowledge is taken into account. Backward selection is not used here, because it does not work when the number of markers exceeds the sample size. If the number of markers is large, an overall exhaustive search is usually prohibitive due to the large number of possible models. Forward selection or “stepwise” regression (Efroymson 1960) are the most feasible approaches among standard techniques. It is well known, however, that these methods are not guaranteed to find the best-fitting subsets (Miller 1990). They work best when the regressor variables are nearly uncorrelated (Weisberg 1985, p. 195). Marker data from the same chromosome are correlated, so simple forward selection is problematic, mainly because the best-fitting submodel is likely to be missed, while spurious variables may enter the model (Weisberg 1985). Particularly, some of the variables selected first may not be included in the best model (see Miller 1990, p. 48, for a striking example). A genome-wide exhaustive search assures that the best-fitting model will not be missed, but has the disadvantage of a high computational burden.
In this article, we propose a modified forward selection strategy based on an article by Gabriel and Pun (1979; see also Miller 1990, p. 64). These authors suggested that in some situations it may be possible to find groups of regressors, within which an exhaustive search is possible. The grouping needs to be such that if two variables x_{i} and x_{j} are in different groups then their regression sum of squares is additive. This requirement is fulfilled for orthogonal variables. For orthogonal groups, performing an exhaustive search over all possible models is equivalent to an exhaustive search per group and is thus guaranteed to find the best-fitting model, with enormous savings in computational effort. Marker data from different chromosomes are stochastically independent. Thus, in large samples, they are nearly orthogonal, conditional on the observed data. This suggests that it is useful to do an exhaustive search for each chromosome and that the regression sum of squares for markers from different chromosomes is nearly additive. Of course, in small samples we may fail to find the best-fitting model due to chance correlation among markers from different chromosomes. However, the probability of missing the best-fitting model is expected to be very much smaller than with simple forward selection. In this article, a model will be called sign consistent if its estimated regression coefficients are of the same sign for each marker pair in the model. Our MPS procedure is described as Algorithm 1.
Algorithm 1: Make the following definitions: i_{c} is a counter for the number of marker pairs selected for the cth chromosome; k is the total number of marker pairs in the current model; C is the total number of chromosomes; RSS_{min} is the smallest residual sum of squares of sign-consistent models of order k found so far; M_{k} is the selected model of order k.
1. For each chromosome set i_{c} = 0. Set k = 0. Fit the model with just an intercept and record the residual sum of squares (RSS_{total}). Record this model as M_{0}.
2. Set k ← k + 1. Set RSS_{min} = RSS_{total} (from step 1). For c = 1 to C do the following: From the current model drop the i_{c} marker pairs from the cth chromosome (but keep all pairs from other chromosomes) and do an exhaustive search for models with i_{c} + 1 marker pairs from the cth chromosome. Consider entry of a set of i_{c} + 1 pairs of markers only if the resulting model is sign consistent. For a current model that is sign consistent, compute the residual sum of squares (RSS_{current}). If RSS_{current} < RSS_{min} then set RSS_{min} = RSS_{current}, set c_{min} = c, and record the current model as M_{k}.
3. If in step 2 no sign-consistent model of order k can be found, stop. Else set i_{c} ← i_{c} + 1 for chromosome c_{min} and go back to step 2.
4. Apply a model selection criterion to select the best-fitting model in the sequence of models M_{k} (k = 0, 1, 2, ...) generated by steps 1, 2, and 3.
A remark regarding step 2 is in order. If a sign inconsistency is observed for a pair of markers to be entered, this suggests that the pair may not flank a QTL. Thus, such pairs should not be considered. Checking sign consistency upon entry does not, however, prevent a sign change in an entered pair later in the model-building process. If, while other pairs are being added, a sign change occurs in a pair from another chromosome, that pair may be a false positive, suggesting there is an increasing risk of detecting false positives and that the selection procedure should be terminated. Therefore, we stop the selection process when no sign-consistent model of order k is found.
We should point out that it is impossible that different orders of chromosomes lead to different results with Algorithm 1. This is because step 2 tries to add a pair of markers on each chromosome. In step 3 the algorithm then chooses the one chromosome for which addition of a pair gives the best fit. This will be the same chromosome, regardless of the order in which chromosomes are tried.
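Steps 1 to 3 of Algorithm 1 can be sketched in a few lines of Python with NumPy. This is an illustrative sketch under stated assumptions rather than the implementation used for the simulations: all function names are hypothetical, marker pairs on the same chromosome are not allowed to share a marker (a simplification that avoids duplicated columns), at most two pairs per chromosome are allowed as in the simulation study, and the toy data are generated with a simple Markov chain of recombination events in place of a real genetic map.

```python
import itertools
import numpy as np

def fit_rss(y, columns):
    """Least-squares fit of y on an intercept plus marker columns: (RSS, slopes)."""
    X = np.column_stack([np.ones(len(y))] + columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid), beta[1:]

def sign_consistent(slopes):
    """True if the two coefficients of every marker pair have the same sign."""
    return all(slopes[i] * slopes[i + 1] > 0 for i in range(0, len(slopes), 2))

def columns_for(markers, model):
    """model maps chromosome -> tuple of left indices j; a pair is markers (j, j+1)."""
    cols = []
    for c, pairs in model.items():
        for j in pairs:
            cols += [markers[c][:, j], markers[c][:, j + 1]]
    return cols

def marker_pair_selection(y, markers, max_pairs_per_chrom=2):
    """Return the model sequence [(M_0, RSS), (M_1, RSS), ...] of steps 1 to 3."""
    model = {c: () for c in range(len(markers))}            # step 1: i_c = 0
    rss_total, _ = fit_rss(y, [])
    sequence = [(dict(model), rss_total)]
    while True:
        best = (rss_total, None, None)                       # step 2: RSS_min = RSS_total
        for c, M in enumerate(markers):
            size = len(model[c]) + 1
            if size > max_pairs_per_chrom:
                continue
            # Exhaustive search over all sets of i_c + 1 pairs on chromosome c.
            for pairs in itertools.combinations(range(M.shape[1] - 1), size):
                if any(b - a < 2 for a, b in zip(pairs, pairs[1:])):
                    continue                                 # assumption: no shared marker
                trial = {**model, c: pairs}
                rss, slopes = fit_rss(y, columns_for(markers, trial))
                if sign_consistent(slopes) and rss < best[0]:
                    best = (rss, c, pairs)
        if best[1] is None:                                  # step 3: nothing found, stop
            break
        model[best[1]] = best[2]
        sequence.append((dict(model), best[0]))
    return sequence

# Toy data: 3 chromosomes, 6 markers each, recombination fraction 0.1 between neighbors.
rng = np.random.default_rng(1)
n, n_chrom, n_markers, r = 200, 3, 6, 0.1
markers = []
for _ in range(n_chrom):
    M = np.empty((n, n_markers))
    M[:, 0] = rng.integers(0, 2, n)
    for j in range(1, n_markers):
        recomb = rng.random(n) < r
        M[:, j] = np.where(recomb, 1 - M[:, j - 1], M[:, j - 1])
    markers.append(M)
# Phenotype driven by markers 2 and 3 (0-based) of chromosome 0, plus noise.
y = 2.0 * (markers[0][:, 2] + markers[0][:, 3]) + rng.normal(0.0, 0.5, n)
sequence = marker_pair_selection(y, markers)
for k, (m, rss) in enumerate(sequence):
    print(k, m, round(rss, 1))
```

Step 4 would then apply a criterion such as SBC to the recorded residual sums of squares. Because each chromosome is searched afresh at every step, a pair chosen early can be replaced later, which is what distinguishes this search from simple forward selection.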
Note that in the model sequence obtained from Algorithm 1, the best model with k pairs does not necessarily contain all markers that are in the best model with k - 1 pairs or less. An important reason for allowing the implicit drop of one or two markers during each step of the model-building process is that there may be two adjacent QTL on the same chromosome with the same sign of the associated genetic effect. The pair of markers selected first is likely to lie between the two QTL. If left in the model, a ghost QTL will be detected. Allowing a pair to be dropped from the model during model building reduces the risk of detecting ghost QTL. For a chromosome with six markers and two QTL in the intervals (2, 3) and (5, 6) the model sequence may look like the hypothetical example shown in Table 3. The first pair tries to explain as much of the phenotypic variation as possible. However, only marker 3 is a flanking marker. Marker 4 is included because it accounts for the QTL in the interval (5, 6). In the next step, marker 4 is dropped while the flanking pairs (2, 3) and (5, 6) enter. SBC selects the four-marker model as fitting best (smallest value of criterion), while the full model fits slightly worse. Were simple forward selection applied to the above example, we would first select the pair (3, 4), and this would remain in the model throughout. Thus, there is no more flexibility to end up with the “true” marker model (2, 3, 5, 6), and a ghost QTL will be detected.
Simulation study: We simulated BC_{1} populations for various settings. The number of chromosomes ranged from 12 to 20, while the number of QTL was between zero and five. Equal spacing of markers (10 or 20 cM) along a 100-cM chromosome and absence of interference were assumed. The number of crossovers per chromosome was simulated according to a Poisson distribution with parameter equal to the length of the chromosome in morgans, which is in accordance with Haldane’s mapping function. For each setting, we performed 100 simulation runs. Assuming Poisson sampling, the standard error for an expected count μ (e.g., number of false positives) is (μ/100)^{0.5}, e.g., 0.2 for μ = 4. Due to high positive correlation among statistics of the same type as computed for different selection criteria (number of false positives, etc.), the accuracy of comparisons was deemed reasonable. Algorithm 1 was used, allowing a maximum of two QTL per chromosome to limit the computational burden of the exhaustive search. This does not imply, however, that such limitation is needed in practical applications where there is only one sequence of models to be generated instead of 100 or more in simulations. A QTL was considered as detected when an estimated QTL position was within 15 cM of the true QTL position. While the 15-cM margin is somewhat arbitrary, rankings of model selection criteria according to different performance measures were rather insensitive to changes in the margin. If the hth QTL is detected, â_{h} is the estimate of the hth QTL effect based on (8). Otherwise â_{h} = 0. As an aggregate measure of bias we computed
We considered 14 examples with different QTL numbers, positions, and effect sizes (Table 4). Heritabilities were computed as described in the appendix. Example 1 is adapted from Lander and Botstein (1989). In this example, there are five QTL with decreasing effect size. This pattern of “tapered effects” (Burnham and Anderson 1998), i.e., few large effects and many small effects, is very typical of many real applications (Kearsey and Farquhar 1998). Lander and Botstein (1989) used a marker spacing of 20 cM for IM. From Darvasi et al. (1993) and Piepho (2000) it can be conjectured that using a much smaller spacing does not usually provide a dramatic gain in accuracy and power for loosely linked QTL. For detecting closely linked QTL, however, a finer spacing is necessary. In all examples except two, we used a spacing of 10 cM. We also included one example with a spacing of 5 cM. We do not consider finer marker spacings since this would increase the problem of multicollinearity and thus of instability of parameter estimates (Melchinger et al. 1998). The Lander and Botstein (1989) example was modified in different ways, i.e., by changing the error variance (heritability) and marker spacing. We included some other examples with fewer and with more markers. Examples 9 and 10 are adapted from Beavis (1994), who used examples with 10 and 40 QTL of the same, but small effect. Also, examples with two QTL on the same chromosome were included (examples 7, 8, 13, and 14; see Table 4).
If markers are densely spaced, it may happen that two adjacent markers are perfectly correlated, so that the design matrix for a model that includes these two markers is not of full column rank. If two markers are perfectly correlated, there is no information as to the position and effect of a QTL between the markers and the approach of Whittaker et al. (1996) breaks down, unless some constraint is imposed. For simplicity, we rejected the corresponding model in simulations. The problem occurred very rarely with a marker spacing of 10 cM and never with a marker spacing of 20 cM, but became more serious with spacings of 5 cM and smaller (results not shown). In practice one would include a check for collinearity (Sari-Gorla et al. 1997), drop one of two perfectly (or very highly) correlated markers, and include in its place whichever of the adjacent (pseudo) noncollinear markers fits best.
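The crossover mechanism assumed for the simulated BC_{1} populations can be sketched as follows. The sketch places crossovers as a Poisson process along the chromosome with an expected rate of one crossover per morgan; this is equivalent to drawing a Poisson number of crossovers with mean equal to the chromosome length in morgans and placing them uniformly, and it is used here only because the Python standard library has no Poisson sampler. The function name and the 0/1 genotype coding are hypothetical.

```python
import math
import random

def simulate_backcross_chromosome(marker_positions_cM, length_cM, rng):
    """Marker genotypes (0/1) of one backcross individual, assuming no interference."""
    # Crossover positions: exponential gaps with mean 1 morgan (= 100 cM).
    crossovers = []
    pos = rng.expovariate(1.0) * 100.0
    while pos < length_cM:
        crossovers.append(pos)
        pos += rng.expovariate(1.0) * 100.0
    allele = rng.randint(0, 1)      # parental strand at the left end of the chromosome
    genotypes = []
    k = 0
    for m in marker_positions_cM:   # positions are assumed to be sorted
        # Switch strand once for every crossover to the left of this marker.
        while k < len(crossovers) and crossovers[k] < m:
            allele = 1 - allele
            k += 1
        genotypes.append(allele)
    return genotypes

rng = random.Random(42)
positions = list(range(0, 101, 10))           # 11 markers, 10 cM apart, on 100 cM
population = [simulate_backcross_chromosome(positions, 100.0, rng)
              for _ in range(20000)]

# Observed recombination fraction between the first two markers versus the
# value implied by Haldane's mapping function, r = (1 - exp(-2d)) / 2.
observed = sum(g[0] != g[1] for g in population) / len(population)
expected = 0.5 * (1.0 - math.exp(-2.0 * 0.10))
print(round(observed, 3), round(expected, 3))
```

The comparison at the end is a simple sanity check: for markers 10 cM apart, the simulated recombination fraction should be close to the value of about 0.09 given by Haldane's mapping function.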
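A check of this kind can be sketched with a rank computation on the design matrix of a marker pair. The function name and the toy genotype vectors are hypothetical; the sketch covers only perfect collinearity, and a practical check for very highly correlated markers would need a threshold that is not specified here.

```python
import numpy as np

def pair_is_estimable(left, right):
    """True if intercept, left marker, and right marker are linearly independent."""
    X = np.column_stack([np.ones(len(left)), left, right])
    return bool(np.linalg.matrix_rank(X) == X.shape[1])

left = np.array([0, 1, 1, 0, 1, 0], dtype=float)
identical = left.copy()                                     # no recombinants in the sample
recombinant = np.array([0, 1, 0, 0, 1, 0], dtype=float)     # one recombinant individual

print(pair_is_estimable(left, identical))     # perfectly correlated pair
print(pair_is_estimable(left, recombinant))   # pair with a recombinant
```

A model containing a pair for which the check fails would be rejected, as was done in the simulations.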
RESULTS
The number of detected QTL is usually quite stable across criteria (Table 5). SBC tends to select the simplest models and thus the average number of correctly detected QTL is usually smaller than for other criteria, but the difference is very small most of the time, except for more extreme cases such as examples 9 and 10, where the difference is somewhat more pronounced. HQ_{c} and FPE4 also tend to select simpler models. In all cases investigated, SBC clearly has the smallest false positive rate (Table 6) and the most favorable percentage of correct detections among all detections (Table 7), often followed by HQ_{c} and FPE4. For these two types of counts, SBC is generally markedly superior to some other quite popular criteria such as s^{2} and C_{p}. For example, with example 2, SBC has an average number of 0.63 false detections, while s^{2} and C_{p} have 6.06 and 7.36 false detections, respectively. SBC is followed by HQ_{c} (1.07) and FPE4 (1.31) in this example. In the example with no QTL (example 12), SBC picks the correct model (model with no markers) 92% of the time, which is by far better than any other criterion. Only FPE4 and HQ_{c} come anywhere near this figure (59 and 74%, respectively). It should be noted that all criteria select from the same sequence of models. The difficult task is to strike the right balance between underfitting and overfitting, i.e., to find Ockham’s Hill (MacKay 1992), and it is this task for which model selection criteria are designed. Obviously, SBC is best at finding a suitable cutoff; i.e., it detects when the sequence starts picking up more noise than pattern.
Examples 9 and 10, which were chosen mainly to see how the criteria performing best in most cases would perform under circumstances very favorable to other criteria such as AIC, are extreme cases in many respects. The effects are all equal and not tapered as in many of the other examples. In contrast to other examples, SBC has a markedly smaller number of correct detections in example 9 (Table 5), so the fact that it still has the most favorable rate of correct detections relative to the total number of detections does not have an unambiguous interpretation. If we are more concerned about false positives, SBC is clearly favorable, while other criteria fare better regarding the number of correct detections.
Examples 13 and 14 have the same QTL as example 7, but are different in that the number of markers exceeds the number of individuals. Thus, there is a larger potential for overfitting. Note that the criteria C_{p} and GM are not applicable because the number of markers exceeds the sample size. While for both examples 13 and 14 the number of correct detections is about the same for all criteria, SBC is the clear winner in terms of the ratio of true detections among all detections (Table 7). For many of the other criteria, the number of false detections (Table 6) increases dramatically for examples 13 and 14 compared to example 7, showing that the problem of overfitting increases with the number of markers. SBC is the only criterion for which the number of false positives does not change markedly relative to example 7.
A comparison of examples 2, 3, 4, and 5 in Tables 6 and 7 shows that all criteria select simpler models as σ^{2} increases and as sample size decreases. Increasing the sample size from 200 (example 2) to 500 (example 5) results in a mild increase in the number of correct detections (Table 5) and in the proportion of correct detections among all detections (Table 7). Reducing marker spacing (compare examples 1 and 2 and examples 7 and 14) increased the number of false positives and reduced the proportion of correct detections, indicating that the risk of overfitting increases with the number of markers. Note, however, that in example 2 the number of correct detections is also increased relative to example 1.
Bias, as assessed by the overall measure SSE(α), is comparable for all selection criteria (Table 8). The only exception to this rule is SBC, which due to its tendency to select simpler models than other criteria has a notably smaller number of detected QTL and so has somewhat larger aggregate bias than other criteria in some examples, mainly due to undetected QTL. Bias decreases with smaller variance σ^{2} (examples 2-4 and examples 9 and 10). This corroborates the finding of Utz and Melchinger (1994) that heritability is among the main factors determining bias. Our results for examples 9 and 10 are somewhat more diverse than those of Beavis (1994), who almost exclusively found large upward biases for examples similar to ours on the basis of a comparable range of heritabilities. Note that Beavis (1994) used IM, which has been shown by Utz and Melchinger (1994) to be associated with more severe biases than CIM. We should point out that MPS is more akin to CIM, which may go some way toward explaining the contrasting results. Moreover, bias assessment is necessarily somewhat arbitrary, for it depends on the definition of QTL detection. We consider a QTL as detected when the model has a position estimate within 15 cM from the true QTL. Changing the margin to 10 or 20 cM leads to different bias estimates.
We summarize the results as follows: Since the number of correct detections is usually quite constant across criteria, we think that the number of false positives and the fraction of correct detections are the most meaningful performance measures. From the overall picture of simulation results, SBC emerges as the best criterion, with FPE4 and HQ_{c} as the closest competitors. While SBC, HQ_{c}, and FPE4 tend to find slightly fewer QTL than other criteria, they do much better in avoiding the risk of detecting spurious QTL.
DISCUSSION
Features of MPS: MPS is a new procedure that addresses the goal of finding as many QTL as possible, while limiting the risk of detecting spurious QTL. It contains three important building blocks specifically designed to achieve this goal: (i) selection of marker pairs, (ii) augmentation of a forward selection procedure by an exhaustive search per chromosome, and (iii) application of a model selection criterion to select the final model from a sequence of models. None of these building blocks is in itself new. The novelty here is the way in which these components are integrated into a single algorithm and how they are applied to QTL mapping, exploiting our knowledge of the underlying biology. The two main differences between MPS and CIM are the way in which cofactors are selected and how the final model is selected. MPS implicitly uses marker pairs of other QTL as cofactors, while conventional CIM can use a wide variety of ways in which cofactors are selected (forward selection, using the best five markers, using two markers per chromosome, etc.). MPS uses criteria such as SBC to select a model, while CIM uses multiple LR tests.
MPS can be used as a standalone procedure for detecting QTL and estimating their effect and position in BC_{1} populations, or equivalently for recombinant inbred and doubled haploid lines. It is also applicable for F_{2} populations, if one is interested only in additive effects, but not in dominance effects (Whittaker et al. 1996). Alternatively, for any kind of unselected population, MPS may serve as a supplement to CIM in two ways. First, the LOD profile produced by CIM can be overlaid with the positions of the pairs of markers selected by MPS. Peaks in the profile associated with selected pairs can then be given more credibility than other peaks. Second, MPS can be used to select cofactors. While Algorithm 1 will likely select pairs of markers accounting for QTL, it may not be efficient to use all markers selected by Algorithm 1 in CIM. Instead, it may be preferable to use only a subset of markers selected by Algorithm 1. We therefore suggest using the markers detected by Algorithm 1 and performing an exhaustive search, using some model selection criterion. The resulting subset may be submitted to CIM. It is as yet an open question whether use of all pairs of markers selected by Algorithm 1 in CIM is inferior to using a subset of them. This question deserves further study.
The equivalence of IM/CIM as applied to adjacent marker pairs and the approach by Whittaker et al. (1996) used in MPS is restricted to the case where IM/CIM does not map a QTL exactly at a marker, i.e., where the RSS has a local minimum in the open interval r_{L} ∈ (0, θ) and this minimum is smaller than the RSS at r_{L} = 0 and r_{L} = θ. In our experience this case will be the rule in real applications. As pointed out by a referee, if IM/CIM maps a true QTL exactly at a marker, it is possible that with the approach of Whittaker et al. (1996) estimates of β_{L} and β_{R} have opposite sign, where one marker corresponds to the mapped QTL. In this case it may happen that Algorithm 1 fails to detect the QTL due to the requirement of sign consistency, while IM/CIM finds the QTL. Our procedure can be modified to account for the problem. Consider three adjacent markers 1, 2, and 3 and assume there is a QTL close to marker 2. As we scan the chromosome, adjacent pairs (1, 2) and (2, 3) will be tried, possibly in conjunction with a set of additional pairs on the same chromosome. We suggest scrutinizing fits of pairs (1, 2) and (2, 3) with the same set of additional pairs on the same chromosome (this set may be empty). If the sign of the regression coefficient for marker 2 is the same for both pairs, and if the signs of regression coefficients for markers 1 and 3 agree with each other and are opposite to the sign for marker 2, the QTL would go unnoticed by Algorithm 1. Thus, in such cases we could fit the pair (1, 3), again with the same set of additional pairs. The pair (1, 3) would be considered further only if the signs of the regression coefficients for markers 1 and 3 change compared to the corresponding fits with pairs (1, 2) and (2, 3). If the pair (1, 3) is selected for the current model order, the dropped marker 2 could be considered again for higher model orders.
We have not incorporated this modification in our description of Algorithm 1 and in the simulation, because in our experience the problem is not very common, and the expected gain in power is small. Also, the modification increases the risk of detecting false positives.
Instead of an exhaustive search per chromosome as implemented in our Algorithm 1, we could adopt a simple forward selection procedure, possibly improved by some measures to exploit knowledge of the biology. For example, if at one step markers 2 and 3 have been selected on a chromosome, it is sensible to allow the pair (1, 4) to be selected in subsequent steps, providing pairs (1, 2) and (3, 4) lead to regression estimates of the same sign for a pair. This makes sure that the “correct” pairs can be selected in case there are two isolated QTL in the intervals (1, 2) and (3, 4) on the same chromosome, and at an earlier step pair (2, 3) was selected. Also, we could allow a selected pair of markers to move one position to the left or to the right as more marker pairs are being added. Thus, e.g., having selected the pair (2, 3) at some stage, we would allow this pair to be replaced by (1, 2) or (3, 4) later in the selection process, if this improves the fit. One can think of more modifications of simple forward selection. In fact, the modified algorithm may become fairly complicated and unrealistic to program when one attempts to cover all the possible QTL and marker configurations that may occur in reality. While our partially exhaustive search is computationally more demanding, it has the virtue of simplicity and at the same time covers many of the features lacking in a simple forward selection algorithm.
We observed that occasionally MPS selects more than one pair of markers for a large QTL, leading to overfitting of that QTL. We could augment our Algorithm 1 by a step that tries to reduce the model whenever there are two or more pairs of markers on the same chromosome. Our investigations (results not shown) suggest that this modification will slightly increase efficiency when in fact there is only one QTL on the chromosome, while it may degrade the performance of the algorithm when there is more than one QTL. For simplicity, we have not included such modifications in our simulation study.
Comparison of MPS to other procedures: In conventional CIM, cofactors are usually selected on the basis of simple forward selection, with markers entering the model individually rather than in pairs. Often, the selection is semiautomated or fully automated, with no check of whether or not the selected cofactors match QTL detected later in the CIM scan across the chromosome. Such a check would be useful as a guard against overfitting. It has been observed that inclusion of too many cofactors that are not associated with a QTL will reduce power to identify QTL relative to IM (Zeng 1993; Beavis 1994). Since MPS selects markers in pairs, the likelihood of selecting spurious markers as cofactors is reduced. Moreover, a stringent criterion such as SBC further reduces the risk of overfitting.
A more detailed comparison of CIM and MPS would be rewarding, but is beyond the scope of this article. For CIM there are many parameters that would have to be considered in simulations: window size, definition of critical threshold, definition of when a LOD peak detects a QTL and when it must be considered as a “subpeak” of another detecting peak, selection of cofactors, ML or least squares, etc. In fact, CIM could be modified by taking up some or all of the ingredients that make up MPS, i.e., selecting markers in pairs, requiring sign consistency, using SBC or some other criterion instead of LOD thresholds, doing an exhaustive search per chromosome to select cofactors, etc. Thus, a detailed comparison will have to be fairly extensive and should include various blends of MPS and CIM. Such a study is left for future work.
Recently, Kao et al. (1999) suggested a method termed multiple interval mapping (MIM) that was judged superior to CIM. Using MIM, several QTL can be fitted simultaneously, allowing for complex models of gene action including epistasis. Due to the potentially large complexity of models fitted by MIM, there is a severe danger of overfitting. The authors mention a number of model selection strategies, including use of AIC and SBC, but, for their suggested procedure, they adopt a stepwise selection procedure in conjunction with multiple LR testing and a Bonferroni adjustment on tests for epistasis. The results presented in our article strongly suggest that the performance of MIM could be considerably improved by using a selection criterion such as SBC in place of multiple LR tests. In fact, MPS can be used to first select potential regions in the genome for fitting QTL. This can be followed by MIM restricted to the selected regions. Restricting application of MIM to the regions selected by MPS is expected to reduce the inherent danger of overfitting.
Further remarks on model selection: Procedures for mapping QTL aim at finding as many true QTL as possible, while avoiding the risk of detecting spurious QTL. Significance testing as is commonly used for IM, CIM, and MIM is not necessarily the best strategy to achieve this goal (Gelfand and Ghosh 1998). For IM, there are a number of methods for determining the appropriate threshold so that the genomewise type I error rate is controlled at a predetermined value, such as 5% (Lander and Botstein 1989, 1994; Churchill and Doerge 1994; Rebaï et al. 1994, 1995; Doerge and Churchill 1996; Goffinet and Mangin 1998; Dupuis and Siegmund 1999). Most of these methods operate under a global null hypothesis of no QTL anywhere in the genome, which is rather restrictive, but see Doerge and Churchill (1996) and Goffinet and Mangin (1998). Controlling the genomewise error rate in a sequential model-building process is an inherently difficult problem. Also, significance tests do not allow a comparison of nonnested models. Forcing a nested model sequence entails the risk of detecting ghost QTL and missing better-fitting models. Moreover, having controlled the genomewise rate of false positives among tests under the null hypothesis of no QTL at 5%, the rate of false positives among QTL detections, which is a different quantity that is usually of greater interest, can easily be 50% or more (Southey and Fernando 1998).
Model selection criteria are based on a philosophy that is essentially different from that underlying significance testing (Burnham and Anderson 1998). A common basis of many criteria is the notion that the more information is gathered, the greater is the model complexity that the data can support (Buckland et al. 1997). While not guaranteeing the absence of false positives among detections, criteria such as SBC do a better job of striking the balance between the contrasting objectives of finding as many real QTL as possible and at the same time keeping the risk of fitting spurious QTL low. Moreover, the difficult task of finding an appropriate adjustment for multiple testing to control a genomewise error rate is obviated, and nonnested models can be compared.
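To illustrate how a criterion such as SBC compares nonnested models without any significance test, the sketch below uses a common textbook form for least-squares regression, SBC = n log(RSS/n) + p log(n), where p is the number of fitted regression parameters. This form is an assumption made for illustration; the definition used in the article may differ by constants that do not affect the ranking. The function name and all numbers are invented.

```python
import math


def sbc(rss, n, p):
    """Schwarz Bayesian criterion in a common least-squares form.

    rss: residual sum of squares of the fitted model
    n:   number of observations
    p:   number of regression parameters, including the intercept
    Smaller values indicate the preferred model.
    """
    return n * math.log(rss / n) + p * math.log(n)


n = 200
# Model A: intercept plus one marker pair (p = 3).
# Model B: intercept plus two marker pairs (p = 5) and a slightly smaller RSS.
sbc_a = sbc(120.0, n, 3)
sbc_b = sbc(118.0, n, 5)
print("prefer A" if sbc_a < sbc_b else "prefer B")
```

The same comparison applies to two models with equal p but different marker pairs, which is a nonnested comparison that a likelihood-ratio test cannot handle directly.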
To limit the computational burden in simulations, we have not used computer-intensive methods of model selection in this article, such as leave-d-out cross-validation and bootstrapping (Hjorth 1994; McQuarrie and Tsai 1998). It is interesting to note, however, that there exist several asymptotic equivalence relationships between cross-validation and the selection criteria used here (see Table 2), for example, between FPE and PRESS as well as between SBC and leave-d-out cross-validation when d = n(1 − 1/(log(n) − 1)) (Shao 1996; McQuarrie and Tsai 1998).
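To give a feel for the size of the validation set implied by the relationship d = n(1 − 1/(log(n) − 1)), the following short sketch evaluates d for a few sample sizes. It assumes the natural logarithm; the function name and the chosen values of n are illustrative only.

```python
import math


def leave_d_out_size(n):
    """Validation-set size d = n * (1 - 1 / (log(n) - 1)), natural log assumed."""
    if math.log(n) <= 2.0:
        raise ValueError("the formula gives a positive d only when log(n) > 2")
    return n * (1.0 - 1.0 / (math.log(n) - 1.0))


for n in (100, 200, 1000):
    d = leave_d_out_size(n)
    print(f"n = {n}: d = {d:.1f} ({d / n:.0%} of the data held out)")
```

Under this relationship the held-out fraction d/n grows with n, so most of the data are used for validation rather than for fitting, in contrast to the single held-out observation underlying PRESS.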
APPENDIX
Let z = g_{1}α_{1} + g_{2}α_{2}, where α_{1} and α_{2} are additive genetic effects of two QTL and g_{1} and g_{2} are coded 0 and 1 depending on the genotype at the QTL. For the nonrecombinant genotypes we have either g_{1} = g_{2} = 0 or g_{1} = g_{2} = 1. For the recombinant genotypes g_{1} = 0 and g_{2} = 1 or g_{1} = 1 and g_{2} = 0. Let r be the recombination fraction between the two QTL. It can be shown that
Acknowledgments
We thank John Whittaker and three anonymous reviewers for helpful comments. This article was written while the first author was visiting the Department of Biometrics and the Department of Plant Breeding, College of Agriculture and Life Sciences, Cornell University, Ithaca, New York. Support of the Heisenberg Programm of the Deutsche Forschungsgemeinschaft is gratefully acknowledged.
Footnotes

Communicating editor: C. Haley
 Received January 27, 2000.
 Accepted October 6, 2000.
 Copyright © 2001 by the Genetics Society of America