Abstract
Genome-wide scans for quantitative trait loci (QTL) have traditionally been summarized with plots of logarithm of odds (LOD) scores. A valuable modification is to supplement such plots with an additional vertical axis displaying quantiles of adjusted P values and to label local maxima of the LOD scores with location-specific adjusted P values. This provides a visible gradation of genome-wide significance for the LOD score curve, instead of the stark dichotomy that a single threshold yields. Adjusted P values give the genome-wide significance of individual LOD scores and are obtained through a straightforward modification of the familiar algorithm for generating permutation-based thresholds.
Two of the most popular methods for performing genome scans to detect quantitative trait loci (QTL) are interval mapping (Lander and Botstein 1989) and an approximation to interval mapping using least squares (Haley and Knott 1992). Numerous extensions have been developed for both the interval mapping (Zeng 1994; Jiang and Zeng 1995; Kao et al. 1999) and the least-squares (Haley et al. 1994; Knott and Haley 2000) approaches. While the choice of analysis method for a genome scan may be debated, the method of summarizing the analysis is rarely questioned. Invariably, a logarithm of odds (LOD) score curve, or a scalar multiple of one, is used.
A major attraction of the LOD score is that it acts as a type of profile likelihood, evincing at each analysis point the relative support for the presence of a QTL at that location. This profile nature, however, can make the interpretation of significance problematic. The issue is one of false positives and has been discussed by Lander and Botstein (1989) and Lander and Kruglyak (1995). Essentially, the LOD score threshold that must be exceeded to declare significance should be higher when examining a collection of LOD scores (as in a genome scan) than when examining a single analysis point. Thresholds may be computed in a variety of ways. Two of the most common approaches are an analytical method (Lander and Botstein 1989; Lander and Kruglyak 1995) and an empirical method based on permutation tests (Churchill and Doerge 1994; Doerge and Churchill 1996; Nettleton and Doerge 2000). The thresholds are calculated so as to control the false-positive rate at level α for the entire genome scan. More specifically, a threshold for a given study is the LOD score value such that, when the null hypothesis is true for that study, the maximum LOD score of the genome scan exceeds the threshold with probability α. The threshold thus provides a sharp demarcation line between significance and nonsignificance for the genome scan. However, except when a LOD score happens to equal the threshold exactly, a threshold does not provide a precise level of genome-wide significance for individual LOD score values.
It is desirable to know the genome-wide significance, or adjusted P value (Westfall and Young 1993), of each analysis point represented in the LOD score curve. The adjusted P value for a given LOD score within a genome scan is the probability, given that the null hypothesis (such as no QTL present anywhere in the genome) is true, that the maximum LOD score of a genome scan is at least as large as the LOD score in question. As such, it is a statement about the significance of a particular test statistic within the context of a genome scan. This differs from a standard unadjusted P value, which considers only the marginal significance of the LOD score at a given analysis point, without accounting for the multitude of other LOD scores obtained over the entire genome scan. As originally discussed by Nettleton and Doerge (2000) and detailed later in this article, it is a relatively straightforward process to modify the algorithm used to generate permutation-based thresholds (Churchill and Doerge 1994) to obtain the adjusted P values. Furthermore, such a modified algorithm also provides a Monte Carlo approximation to the null distribution of the maximum LOD score from the genome scan.
The immediate use of the adjusted P values is to state the precise significance level of the test for a candidate QTL found in a genome scan. These values may appear in a textual description of the genome scan or may be used graphically to label local peaks of interest in a LOD score curve. In addition, quantiles of the approximate null distribution of the maximum LOD score give multiple threshold values, each corresponding to an adjusted P value. These LOD score values can be marked on a traditional LOD score plot, with the corresponding adjusted P values shown on an additional labeled vertical axis. It is even possible to display the adjusted P values directly, by plotting the entire genome scan as an adjusted P-value curve, analogous to a LOD score curve.
METHODS
As noted by Nettleton and Doerge (2000), adjusted P values are readily obtained through a modification of the algorithm used to generate a permutation-based threshold (Churchill and Doerge 1994). The algorithm for generating a permutation-based threshold is presented below. This algorithm is appropriate when the null hypothesis states that there are no QTL anywhere in the genome, so that the phenotypic and genotypic information are mutually independent.
1. Individuals in the experiment are labeled with unique numbers 1 through n.
2. The phenotypic data are shuffled by taking a random permutation of the indices 1, ..., n and matching the ith phenotypic trait value to the individual whose index is the ith element of the permuted indices. This permuted vector of traits is matched with the original (unpermuted) genotype information for all individuals.
3. A genome scan for QTL effects is performed on the resulting permuted data set, and the largest test statistic (LOD score) is recorded.
4. Steps 2 and 3 are repeated a total of N times (N is often 1000), yielding N maximal test statistics, one from each genome scan.
5. The N maximal test statistics are ordered from smallest to largest.
6. The 100(1 − α) percentile of the N ordered values is the estimated experiment-wise threshold value for controlling the type I error at level α.
With α = 0.05 and N = 1000, the 950th value of the ordered maximal test statistics would be the estimated threshold value. Note that while N = 1000 is usually sufficient to obtain a threshold for α = 0.05, on the order of N = 10,000 shuffled data sets are recommended for α = 0.01 to obtain stable estimates (Churchill and Doerge 1994).
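The threshold algorithm in steps 1 to 6 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in this article: the function names `marker_regression_scan`, `permutation_maxima`, and `permutation_threshold` are hypothetical, and the toy scan uses single-marker regression (LOD = −(n/2) log10(1 − r²) at each marker) as a stand-in for interval mapping. Any function that returns one LOD score per analysis point could be passed as `scan`.

```python
import math
import random
from typing import Callable, List, Sequence


def marker_regression_scan(genotypes: Sequence[Sequence[int]],
                           phenotypes: Sequence[float]) -> List[float]:
    """Toy stand-in for interval mapping (assumption): single-marker regression,
    with LOD = -(n/2) * log10(1 - r^2) at each marker.
    genotypes[i][j] is the genotype code (0, 1, 2) of individual i at marker j."""
    n = len(phenotypes)
    ybar = sum(phenotypes) / n
    syy = sum((y - ybar) ** 2 for y in phenotypes)
    lods = []
    for j in range(len(genotypes[0])):
        g = [row[j] for row in genotypes]
        gbar = sum(g) / n
        sgg = sum((x - gbar) ** 2 for x in g)
        sgy = sum((x - gbar) * (y - ybar) for x, y in zip(g, phenotypes))
        r2 = sgy * sgy / (sgg * syy) if sgg > 0 and syy > 0 else 0.0
        # Clamp guards against log10(0) when r^2 rounds to 1.
        lods.append(-(n / 2) * math.log10(max(1.0 - r2, 1e-300)))
    return lods


def permutation_maxima(genotypes: Sequence[Sequence[int]],
                       phenotypes: Sequence[float],
                       scan: Callable[[Sequence[Sequence[int]], Sequence[float]], List[float]],
                       n_perm: int = 1000, seed: int = 1) -> List[float]:
    """Steps 2-5: individuals are identified by list position (step 1); phenotypes
    are permuted against fixed genotypes, and each scan's maximum LOD is kept.
    Returns the N maxima sorted in ascending order."""
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_perm):                            # step 4: repeat N times
        shuffled = list(phenotypes)
        rng.shuffle(shuffled)                          # step 2: permute trait values
        maxima.append(max(scan(genotypes, shuffled)))  # step 3: record max LOD
    maxima.sort()                                      # step 5: order the maxima
    return maxima


def permutation_threshold(sorted_maxima: Sequence[float], alpha: float = 0.05) -> float:
    """Step 6: the 100(1 - alpha) percentile of the ascending maxima."""
    # Small tolerance guards against floating-point error in N * (1 - alpha).
    k = max(1, math.ceil(len(sorted_maxima) * (1 - alpha) - 1e-9))
    return sorted_maxima[k - 1]  # N = 1000, alpha = 0.05 -> 950th value
```

With N = 1000 and α = 0.05, `permutation_threshold` returns the 950th sorted maximum, matching the worked example above; with α = 0.01 it returns the 990th, which rests on only 10 maxima in the upper tail and illustrates why larger N is recommended at that level.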
To calculate adjusted P values, the only portion of the above algorithm that must be modified is step 6. Instead of extracting a single value from the N ordered values to form a threshold, one simply calculates, for each observed LOD score statistic x from the original genome scan, the proportion of the N maximal test statistics that are greater than or equal to x. Counting the maximal statistics greater than or equal to a given x is simplified by the fact that the N maximal test statistics were sorted in step 5. The modification necessary to obtain adjusted P values can be incorporated into the above algorithm as follows:
6*. For each LOD score x from the genome scan of the original unpermuted data, calculate the number of maximal test statistics from step 5 that are greater than or equal to x and divide by N.
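Step 6* reduces to a counting operation on the sorted maxima. A minimal Python sketch (the function name `adjusted_p_values` is hypothetical) uses binary search from the standard-library `bisect` module, so once the maxima are sorted each observed LOD score costs only O(log N).

```python
from bisect import bisect_left
from typing import List, Sequence


def adjusted_p_values(observed_lods: Sequence[float],
                      sorted_maxima: Sequence[float]) -> List[float]:
    """Step 6*: for each observed LOD x, the proportion of the N permutation
    maxima that are >= x. sorted_maxima must be in ascending order (step 5)."""
    n = len(sorted_maxima)
    # bisect_left returns the number of maxima strictly less than x,
    # so n - bisect_left(...) counts the maxima greater than or equal to x.
    return [(n - bisect_left(sorted_maxima, x)) / n for x in observed_lods]
```

For example, with maxima `[0.5, 1.0, 1.5, 2.0, 2.5]`, observed LOD scores 1.5, 3.0, and 0.1 receive adjusted P values 0.6, 0.0, and 1.0. Ties count toward the numerator, as step 6* requires.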
The adjusted P values calculated in step 6* effectively make use of the entire distribution of maximal test statistics from a set of N genome scans, whereas a permutation-based threshold uses only the 100(1 − α) percentile of those statistics.
It is important to note that, in addition to an α = 0.05 threshold, any other quantile of interest may be calculated from the set of N maximal LOD scores. In practice, this means that it is a simple task to add various threshold levels of adjusted P values to standard LOD score plots. An example is provided with the simulated data below.
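Several threshold levels can be read off the same sorted maxima by applying the step 6 rule once per level. The sketch below is illustrative: the function name `lod_thresholds` and the default levels are assumptions, not values used in this article. The returned LOD values are where horizontal reference lines, or tick marks on a secondary adjusted-P axis, would be placed.

```python
import math
from typing import Dict, Iterable, Sequence


def lod_thresholds(sorted_maxima: Sequence[float],
                   levels: Iterable[float] = (0.20, 0.10, 0.05, 0.01)) -> Dict[float, float]:
    """Map each adjusted-P level to the LOD value at the 100(1 - level) percentile
    of the ascending permutation maxima (the same rule as step 6)."""
    n = len(sorted_maxima)
    out = {}
    for level in levels:
        # Small tolerance guards against floating-point error in n * (1 - level).
        k = max(1, math.ceil(n * (1 - level) - 1e-9))
        out[level] = sorted_maxima[k - 1]
    return out
```

Because every level uses the same set of N maxima, the resulting thresholds are automatically ordered: a smaller adjusted P level never maps to a lower LOD value.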
SIMULATED EXAMPLE
The simulated data set is for 200 animals from an F2 cross in the rat. The genetic lengths of the chromosomes were taken from Dracheva et al. (2000), but the map content was simulated, and the true (simulated) map was used in the subsequent QTL analysis. The entire genome is 1872 cM in length. Markers were generated at regular 10-cM intervals (188 markers total). A single QTL with additive effect 0.5 (σ² = 1.0) and no dominance effect was simulated at 85 cM from the left end of chromosome 2. The data were evaluated with interval mapping to obtain LOD scores at 1-cM intervals, yielding a total of 2001 analysis points. A LOD score plot for a genome scan of these data appears in Figure 1.
In addition to providing the scale for the LOD curve on the left-hand vertical axis, Figure 1 also provides the scale for the unadjusted P values on the right-hand vertical axis. Similar dual-scale plots appeared in Lander and Kruglyak (1995). One noteworthy feature of Figure 1 is that it demonstrates that the naïve unadjusted threshold of P = 0.05, corresponding to LOD = 1.30, is clearly too low. Because a true QTL is present only on chromosome 2, the seven of the 20 QTL-free chromosomes that exceed this threshold do so by chance variation alone. A LOD score value that maintains the correct size across the entire genome is the permutation-based threshold of LOD = 3.55, obtained from 1000 resampled data sets. This LOD score corresponds to an unadjusted P value of P = 0.000283. However, as Figure 1 shows, the permutation-based threshold merely dichotomizes the genome scan analysis points into two distinct groups: those with genome-wide significance (adjusted P value) >0.05 and those with genome-wide significance <0.05. The relative significance of portions of the LOD score curve within each of the two regions is not apparent from this LOD plot, and the gradation of genome-wide significance is not discernible.
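The right-hand axis of Figure 1 converts LOD scores to unadjusted P values. The article does not state the reference distribution, but the pairs quoted above (LOD = 1.30 with P = 0.05, and LOD = 3.55 with P ≈ 0.000283) are consistent, up to rounding, with referring the likelihood-ratio statistic 2 ln(10) × LOD to a χ² distribution with 2 degrees of freedom (additive and dominance effects in an F2). Under that assumption the χ² survival function is exp(−x/2), so P = 10^(−LOD). The sketch below encodes this assumption; the function names are hypothetical.

```python
import math


def unadjusted_p_2df(lod: float) -> float:
    """Pointwise (unadjusted) P value for a LOD score, ASSUMING the likelihood-ratio
    statistic 2 * ln(10) * LOD is chi-square with 2 df under the null hypothesis."""
    lr = 2.0 * math.log(10.0) * lod
    # The chi-square survival function with 2 df is exp(-x / 2), so P = 10 ** (-LOD).
    return math.exp(-lr / 2.0)


def lod_for_unadjusted_p_2df(p: float) -> float:
    """Inverse mapping under the same 2-df assumption: LOD = -log10(P)."""
    return -math.log10(p)
```

Under this assumption, `unadjusted_p_2df(1.30)` is about 0.050 and `unadjusted_p_2df(3.55)` is about 0.00028. If the scan used a different number of degrees of freedom (for example, a backcross with only an additive effect), the mapping would change.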
The plot in Figure 2 depicts the genome scan results from chromosome 2, where the single QTL was generated. The genome scan is portrayed with a LOD score curve and supplemented with quantiles of adjusted P values (shown on the right-hand vertical axis). Furthermore, two of the peaks are labeled with position and adjusted P value. The lower peak is 110.28 cM from the left end of the chromosome and has a LOD score of 2.85, which corresponds to an adjusted P value of 0.240. This value was obtained by determining that 240 of the 1000 maximal genome-scan LOD scores from the resampled data sets were greater than or equal to 2.85. The higher peak is at 82.84 cM and has a LOD score of 8.00, which corresponds to an adjusted P value <0.001: the observed LOD score of 8.00 was greater than the maximal genome-scan LOD scores from all 1000 resampled data sets. Unlike the stark dichotomy of genome-wide significance seen in Figure 1, the adjusted P-value scale on the right-hand vertical axis of Figure 2 provides a meaningful gradient of genome-wide significance for the LOD score curve.
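The two peak labels in Figure 2 follow directly from counts: 240 of 1000 maxima at or above 2.85 gives 0.240, and no maximum at or above 8.00 means the estimate can only be bounded by 1/1000. A small reporting helper that applies this convention might look like the sketch below; the name `format_adjusted_p` and the choice of decimal places are assumptions for illustration.

```python
def format_adjusted_p(count_ge: int, n_perm: int) -> str:
    """Report an adjusted P value from the number of permutation maxima >= the
    observed LOD. A zero count is reported as '<1/N' rather than as 0, because
    N permutations cannot resolve probabilities below 1/N (hypothetical helper)."""
    if count_ge == 0:
        return f"<{1 / n_perm:g}"
    decimals = len(str(n_perm)) - 1  # e.g. N = 1000 -> 3 decimal places
    return f"{count_ge / n_perm:.{decimals}f}"
```

For the values in Figure 2, `format_adjusted_p(240, 1000)` gives `"0.240"` and `format_adjusted_p(0, 1000)` gives `"<0.001"`.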
DISCUSSION
Adjusted P values are a valuable tool for summarizing genome-wide scans. Depicted graphically in a supplemented LOD score plot, they allow the LOD score evidence for a QTL at a single analysis point to be interpreted both marginally and within the context of a whole genome scan. Furthermore, labeling individual LOD score peaks with adjusted P values permits precise statements of genome-wide significance for particular locations of interest. Such supplemented LOD score plots retain all of the information of standard LOD score plots, enabling inferences to be made on values of the LOD score, the adjusted P values, or both.
The emphasis in this article has been on how adjusted P values may be combined with traditional LOD score plots to improve the summary of genome-wide scans. One major improvement is the use of LOD score values corresponding to quantiles of the adjusted P values to provide a meaningful gradient of genome-wide significance for the LOD score curve. This emphasis differs markedly from that of Nettleton and Doerge (2000), who investigated the variability of permutation-based genome-wide thresholds. These authors presented adjusted P values (referred to in their article as permutation P values) as analogues of permutation thresholds. Their prime interest was in assessing when a LOD score exceeded a genome-wide threshold for a given fixed value of α, while accounting for the Monte Carlo error of the permutation procedure. As such, their emphasis was on a single value of α for genome-wide significance, rather than on presenting gradations of significance. Moreover, neither adjusted nor unadjusted P values were displayed on any of their LOD score plots.
The Monte Carlo error of the permutation procedure is important to consider, especially when one is interested in precise values of very small adjusted P values. Churchill and Doerge (1994) originally suggested using at least 1000 permutations for estimating thresholds corresponding to an adjusted P value of 0.05 and suggested that as many as 10,000 permutations might be required for stable estimates of a threshold corresponding to an adjusted P value of 0.01. Depending on the stability required for the adjusted P values, these numbers could be refined further using techniques described by Nettleton and Doerge (2000). Note, however, that the variability of the adjusted P values resides in their calculated values and not in their relative ordering. This may be seen in part by observing that both adjusted and unadjusted P values are monotonic (nonincreasing) functions of the values on the LOD score curve.
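A rough way to gauge this Monte Carlo error, much simpler than the refinements of Nettleton and Doerge (2000), is the binomial standard error of an estimated proportion: each of the N permutation maxima either does or does not reach the observed LOD score, so an adjusted P value p = count/N has standard error of about sqrt(p(1 − p)/N). This is a standard approximation offered for illustration, not a procedure used in this article; the function name is hypothetical.

```python
import math


def monte_carlo_se(p: float, n_perm: int) -> float:
    """Approximate Monte Carlo standard error of a permutation estimate p = count / N,
    treating each permutation maximum as an independent Bernoulli trial."""
    return math.sqrt(p * (1.0 - p) / n_perm)
```

For p = 0.05 and N = 1000 the standard error is about 0.0069 (roughly 14% of p). For p = 0.01 it is about 0.0031 (roughly 31%) with N = 1000, but about 0.0010 (roughly 10%) with N = 10,000, which is in line with the larger number of permutations recommended for the 0.01 level.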
An intriguing property of adjusted P values is that they provide a common scale on which to present the results from different models, phenotypes, or crosses. This might be accomplished by plotting the adjusted P values directly, instead of as a supplemental vertical axis. One example of multiple models would be comparing models of varying complexity in the search for epistasis (genetic interaction). Another would be comparing log-transformed and untransformed versions of a particular phenotype. For an initial application comparing multiple competing models in a one-dimensional genome scan, see Broman (2003). The comparison of multiple phenotypes would be particularly interesting with related measurements, such as alternative ways of assessing cognitive ability in Alzheimer’s disease or different measures of the severity of rheumatoid arthritis. With multiple crosses, researchers would have a meaningful way of comparing the relative evidence for a QTL from a backcross and from an intercross, for example.
The P-value adjustments employed in this article were based on a single-step resampling method. It is also possible to perform a multiple-step resampling adjustment, as described in Westfall and Young (1993, Chap. 2). In the multiple-step method, the rank (in terms of LOD score) of a given statistic within the observed genome scan determines which simulated order statistic the observed statistic is compared against. That is, the observed maximum is ranked against the simulated maxima, the second-highest observed statistic is ranked against the distribution of the second-highest statistics from the simulations, and so on, until the smallest observed statistic is ranked against the distribution of the minima from the simulated data sets. The greatest disparity between the single- and multiple-step methods occurs for the smallest observed statistics, which are of the least interest. For the simulated data used in this article, the differences between the adjusted P values from the single- and multiple-step methods were negligible. Because calculating the multiple-step adjustments takes substantially longer than calculating the single-step adjustments, researchers are advised to use the single-step adjustments.
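For comparison, the order-statistic ranking described in the preceding paragraph can be sketched as follows. This is a hypothetical illustration of that description only, not a full implementation of the procedures in Westfall and Young (1993): it requires the complete set of LOD scores from every permuted scan (not just the maxima), which is one source of its extra cost, and it does not enforce monotonicity of the adjusted P values across ranks. The function name is an assumption.

```python
from bisect import bisect_left
from typing import List, Sequence


def multistep_adjusted_p(observed: Sequence[float],
                         permuted_scans: Sequence[Sequence[float]]) -> List[float]:
    """Order-statistic ranking: the k-th largest observed LOD is compared with the
    k-th largest LOD from each permuted scan. Every permuted scan must contain
    the same number of analysis points as `observed`."""
    m = len(observed)
    n = len(permuted_scans)
    # Sort each permuted scan in descending order once.
    desc = [sorted(scan, reverse=True) for scan in permuted_scans]
    # Ranks of the observed statistics, from largest (rank 0) to smallest (rank m - 1).
    order = sorted(range(m), key=lambda i: observed[i], reverse=True)
    p = [0.0] * m
    for rank, i in enumerate(order):
        # Null distribution of the rank-th largest statistic, sorted ascending.
        kth = sorted(scan[rank] for scan in desc)
        # Proportion of permuted order statistics >= the observed value.
        p[i] = (n - bisect_left(kth, observed[i])) / n
    return p
```

Rank 0 reproduces the single-step comparison against the permutation maxima, so the two methods agree for the largest observed LOD score and can diverge only for lower-ranked statistics.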
In summary, the calculation of single-step adjusted P values is straightforward and has been detailed above. The material necessary to perform the adjustments is generated naturally in the course of determining a permutation-based threshold. Quantiles of adjusted P values provide a meaningful gradation of genome-wide significance for elements of a LOD score curve. Furthermore, adjusted P values provide precise levels of significance for all analysis points in a genome scan, in particular for local and global maxima of LOD score curves. They provide a natural mechanism for the simultaneous presentation of genome scans involving multiple models, multiple phenotypes, or multiple crosses. They even provide an appropriate framework in which to investigate epistatic gene action. Adjusted P values are a valuable supplement to LOD score curves and are an asset for both exploring and summarizing genetic models of quantitative traits.
Acknowledgments
I thank K. Broman, G. Churchill, O. Nerman, S. Schreyer, and two anonymous referees for providing critical feedback on earlier versions of this manuscript. Financial support for this work was provided by Arexis, AstraZeneca, and Chalmers University.
Footnotes

Communicating editor: G. Churchill
 Received March 17, 2003.
 Accepted April 10, 2003.
 Copyright © 2003 by the Genetics Society of America