Adjusted P Values for Genome-Wide Scans
Theodore C. Lystig
Department of Mathematical Statistics, Chalmers University of Technology, 412 96 Göteborg, Sweden
Corresponding author: Theodore C. Lystig, Chalmers University of Technology, Eklandagatan 86, 412 96 Göteborg, Sweden. E-mail: lystig{at}math.chalmers.se
Communicating editor: G. CHURCHILL
| ABSTRACT |
|---|
Genome-wide scans for quantitative trait loci (QTL) have traditionally been summarized with plots of logarithm of odds (LOD) scores. A valuable modification is to supplement such plots with an additional vertical axis displaying quantiles of adjusted P values and labeling local maxima of the LOD scores with location-specific adjusted P values. This provides a visible gradation of genome-wide significance for the LOD score curve, instead of the stark dichotomy that a single threshold yields. Adjusted P values give genome-wide significance of individual LOD scores and are obtained through a straightforward modification of the familiar algorithm for generating permutation-based thresholds.
TWO of the most popular methods for performing genome scans to detect quantitative trait loci (QTL) are interval mapping (![]()
A major attraction of the LOD score is that it acts as a type of profile likelihood, evincing at each analysis point the relative support for the presence of a QTL at that location. The profile nature of the LOD score can make the interpretation of significance problematic. The issue is one of false positives and has been discussed by ![]()
for the entire genome scan. More specifically, a threshold for a given study is the value of the LOD score such that the maximum of the observed genome scan LOD score will exceed the threshold with probability α only when the null hypothesis is true for that particular study. The threshold thus provides a sharp demarcation line between significance and nonsignificance for the genome scan. However, apart from the single case of a LOD score being exactly equal to the calculated value, a threshold does not provide a precise level of genome-wide significance for the individual LOD score values.
It is desirable to know the genome-wide significance or adjusted P values (![]()
The immediate use of the adjusted P values is as a means of stating the precise significance level for the test of the presence of a candidate QTL found in a genome scan. These values may appear in a textual description of the genome scan or may be used graphically to label local peaks of interest in a LOD score curve. The quantiles from the approximate null distribution of the maximum LOD score can be used to ascertain multiple threshold values, which then correspond to adjusted P values. The LOD score values of these quantiles can be used to place multiple threshold values for adjusted P values onto a traditional LOD score plot; the adjusted P values would appear as an additional labeled vertical axis. Of course, it is even possible to directly display the adjusted P values by plotting the entire genome scan with an adjusted P-value curve, similar to a LOD score curve.
| METHODS |
|---|
As noted by ![]()
1. Individuals in the experiment are labeled with unique numbers one through n.
2. The phenotypic data are shuffled by taking a random permutation of the indices 1, ... , n and matching the ith phenotypic trait value to the individual with index given by the ith element of the permuted indices. This permuted vector of traits is matched with the original (unpermuted) genotype information for all individuals.
3. A genome scan for QTL effects is performed on the resulting permuted data set, and the largest test statistic (LOD score) is recorded.
4. Steps 2 and 3 are repeated a total of N times (N is often 1000), yielding N maximal test statistics, one from each genome scan.
5. The N maximal test statistics are ordered from smallest to largest.
6. The 100(1 - α) percentile of the N ordered values is the estimated experiment-wise threshold value for controlling the type I error at level α.
With α = 0.05 and N = 1000, the 950th value of the ordered maximal test statistics would be the estimated threshold value. Note that while N = 1000 is usually sufficient to obtain a threshold for α = 0.05, on the order of N = 10,000 shuffled data sets are recommended for α = 0.01 to obtain stable estimates.
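The permutation steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the article's implementation: `scan_lod` is a hypothetical stand-in that computes a single-marker regression LOD score at each marker instead of performing interval mapping, and the function names, the numeric genotype coding, and the fixed seed are all choices made for the example.

```python
import numpy as np


def scan_lod(genotypes, phenotype):
    """Hypothetical stand-in for a genome scan (NOT interval mapping).

    genotypes: (n, m) array of numeric marker codes; phenotype: (n,) array.
    Returns one LOD score per marker, LOD = -(n / 2) * log10(1 - r^2),
    where r is the marker-phenotype correlation.
    """
    n = phenotype.shape[0]
    g = genotypes - genotypes.mean(axis=0)
    y = phenotype - phenotype.mean()
    r = (g * y[:, None]).sum(axis=0) / np.sqrt((g ** 2).sum(axis=0) * (y ** 2).sum())
    return -(n / 2.0) * np.log10(1.0 - r ** 2)


def permutation_maxima(genotypes, phenotype, n_perm=1000, seed=0):
    """Steps 2 to 5: shuffle phenotypes, rescan, keep each maximum, then sort."""
    rng = np.random.default_rng(seed)
    maxima = np.empty(n_perm)
    for b in range(n_perm):
        shuffled = rng.permutation(phenotype)             # step 2
        maxima[b] = scan_lod(genotypes, shuffled).max()   # step 3
    return np.sort(maxima)                                # step 5


def threshold(sorted_maxima, alpha=0.05):
    """Step 6: the 100(1 - alpha) percentile of the ordered maxima."""
    n_perm = sorted_maxima.shape[0]
    # Rounding guards against floating-point noise in (1 - alpha) * n_perm.
    k = int(np.ceil(np.round((1.0 - alpha) * n_perm, 8)))
    return sorted_maxima[k - 1]   # e.g. the 950th of 1000 ordered values
```

Only the phenotype vector is permuted; the genotype matrix is left in its original order, which matches step 2.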
To calculate adjusted P values, the only portion of the above algorithm that must be modified is step 6. Instead of extracting a single value from the N ordered values to form a threshold, one simply calculates (for each observed LOD score statistic x from the original genome scan) the proportion of the N maximal test statistics that are greater than or equal to x. Note that determining the number of maximal statistics greater than a given x is facilitated by the fact that the N maximal test statistics were sequentially ordered in step 5. The modification necessary to obtain adjusted P values can be incorporated into the above algorithm as follows:
- 6*. For each LOD score x from the genome scan of the original unpermuted data, calculate the number of maximal test statistics from step 5 that are greater than or equal to x and divide by N.
The adjusted P values calculated in step 6* effectively make use of the entire distribution of maximal test statistics from a set of N genome scans, while a permutation-based threshold uses only the 100(1 - α) percentile of those statistics.
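Step 6* amounts to a counting operation on the ordered maxima. The sketch below is one possible way to write it with NumPy; the function name `adjusted_p_values` and its argument names are assumptions made for illustration.

```python
import numpy as np


def adjusted_p_values(observed_lod, sorted_maxima):
    """Step 6*: proportion of the N permutation maxima >= each observed LOD."""
    observed_lod = np.asarray(observed_lod, dtype=float)
    sorted_maxima = np.asarray(sorted_maxima, dtype=float)
    n_perm = sorted_maxima.shape[0]
    # With side="left", searchsorted returns the number of maxima strictly
    # less than x, so n_perm minus that count is the number >= x.
    n_below = np.searchsorted(sorted_maxima, observed_lod, side="left")
    return (n_perm - n_below) / n_perm
```

Because the maxima are already ordered in step 5, each lookup is a binary search, so a full scan of analysis points is adjusted in one vectorized call.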
It is important to note that in addition to an α = 0.05 threshold, any other quantile of interest may be calculated from the set of N maximal LOD scores. In practice, this means that it is a simple task to add various threshold levels of adjusted P values onto standard LOD score plots. An example is provided with the simulated data below.
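As a sketch of how several threshold levels might be read off the same set of maxima, the function below returns the LOD value at the 100(1 - p) percentile for each requested level p. The name `lod_thresholds` and the default levels are illustrative assumptions; the returned LOD values could then serve as tick positions for a second vertical axis.

```python
import numpy as np


def lod_thresholds(maxima, levels=(0.5, 0.2, 0.1, 0.05, 0.01)):
    """Map each adjusted-P level p to the 100(1 - p) percentile of the maxima."""
    sorted_maxima = np.sort(np.asarray(maxima, dtype=float))
    n_perm = sorted_maxima.shape[0]
    out = {}
    for p in levels:
        # Index of the ordered value, as in step 6 of the threshold algorithm.
        k = int(np.ceil(np.round((1.0 - p) * n_perm, 8)))
        k = min(max(k, 1), n_perm)   # keep the index inside the array
        out[p] = sorted_maxima[k - 1]
    return out
```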
| SIMULATED EXAMPLE |
|---|
The simulated data set is for 200 animals from an F2 cross in the rat. The genetic length of the chromosomes is taken from ![]()
2 = 1.0) and no dominance effect was simulated at 85 cM from the left end of chromosome 2. The data were evaluated with interval mapping to obtain LOD scores at 1-cM intervals, yielding a total of 2001 analysis points. A LOD score plot for a genome scan of these data appears in Fig 1.
In addition to providing the scale for the LOD curve on the left-hand vertical axis, Fig 1 also provides the scale for the unadjusted P values on the right-hand vertical axis. Similar dual scale plots appeared in ![]()
The plot in Fig 2 depicts the genome scan results from chromosome 2, where the single QTL was generated. The genome scan is portrayed with a LOD score curve and supplemented with quantiles of adjusted P values (shown on the right-hand vertical axis). Furthermore, two of the peaks are labeled with position and adjusted P value. The lower peak is 110.28 cM from the left end of the chromosome and has a LOD score of 2.85, which corresponds to an adjusted P value of 0.240. This value was calculated by determining that 2.85 was less than or equal to 240 of the 1000 maximal genome scan LOD scores from the resampled data sets. The higher peak is at 82.84 cM and has a LOD score of 8.00, which corresponds to an adjusted P value <0.001. The observed LOD score of 8.00 was greater than the maximal genome scan LOD scores from all 1000 resampled data sets. Unlike the stark dichotomy of genome-wide significance seen in Fig 1, the adjusted P-value scale present on the right-hand vertical axis in Fig 2 provides a meaningful gradient of genome-wide significance for the LOD score curve.
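The two peak labels follow directly from the counts: 240 of 1000 maxima gives 0.240, and a count of zero is reported as an upper bound of 1/N rather than as exactly zero. A small helper along these lines could produce such labels; `format_adjusted_p` is a hypothetical name, and the choice of decimal places is an assumption for the example.

```python
def format_adjusted_p(n_at_least, n_perm):
    """Format count / n_perm, or '<1/n_perm' when no maximum reaches the score.

    n_at_least: number of permutation maxima >= the observed LOD score.
    """
    digits = len(str(n_perm - 1))   # 3 decimals for N = 1000, 4 for N = 10,000
    if n_at_least == 0:
        return f"<{1.0 / n_perm:.{digits}f}"
    return f"{n_at_least / n_perm:.{digits}f}"


print(format_adjusted_p(240, 1000))   # lower peak, LOD 2.85
print(format_adjusted_p(0, 1000))     # higher peak, LOD 8.00
```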
| DISCUSSION |
|---|
Adjusted P values are a valuable tool to assist in the summary of genome-wide scans. Depicted graphically in a supplemented LOD score plot, they enable the LOD score evidence at a single analysis point for the presence of a QTL to be interpreted both marginally and within the context of a whole genome scan. Furthermore, labeling individual LOD score peaks with adjusted P values permits precise statements of genome-wide significance for particular locations of interest. Such supplemented LOD score plots retain all of the information of standard LOD score plots, enabling inferences to be made on values of the LOD score, the adjusted P values, or both.
The emphasis in this article has been on how adjusted P values may be combined with traditional LOD score plots to improve the summary of genome-wide scans. One major improvement is the use of LOD score values corresponding to quantiles of the adjusted P values to provide a meaningful gradient of genome-wide significance for the LOD score curve. Such emphasis differs markedly from that of ![]()
, while accounting for the Monte Carlo error of the permutation procedure. As such, the emphasis was on a single individual value of α for genome-wide significance, instead of on presenting gradations of significance. Moreover, neither adjusted nor unadjusted P values were displayed on any of their LOD score plots.
The Monte Carlo error of the permutation procedure is important to consider, especially when one is interested in precise values of very small adjusted P values. ![]()
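One rough way to gauge this Monte Carlo error is to treat the count of maxima at or above an observed LOD score as binomial, which gives a standard error of sqrt(p(1 - p)/N) for an estimated adjusted P value p. The sketch below rests on that binomial approximation, which is an assumption introduced here for illustration and not a procedure taken from this article; `mc_standard_error` is a hypothetical name.

```python
import math


def mc_standard_error(p_hat, n_perm):
    """Binomial standard error of an adjusted P value estimated from n_perm scans."""
    return math.sqrt(p_hat * (1.0 - p_hat) / n_perm)


# Relative error grows as p shrinks, which is why small P values need larger N.
for p_hat, n_perm in [(0.240, 1000), (0.05, 1000), (0.01, 1000), (0.01, 10000)]:
    se = mc_standard_error(p_hat, n_perm)
    print(f"p = {p_hat}, N = {n_perm}: SE = {se:.4f}, SE/p = {se / p_hat:.2f}")
```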
An intriguing property of adjusted P values is that they provide a common scale on which to present the results from different models, phenotypes, or crosses. This might be accomplished by plotting the adjusted P values directly, instead of as a supplemental vertical axis. One example of multiple models would be when comparing models of varying complexity in the search for epistasis (genetic interaction). Another version of multiple models that could be compared would be both logarithmic and untransformed versions of a particular phenotype. For an initial application comparing multiple competing models in a one-dimensional genome scan, see ![]()
The P-value adjustments employed in this article were based on a single-step resampling method. It is also possible to perform a multiple-step resampling method adjustment, as described in ![]()
In summary, the calculation of single-step adjusted P values is straightforward and has been detailed above. The material necessary to perform the adjustments is generated naturally in the course of determining a permutation-based threshold. Quantiles of adjusted P values provide a meaningful gradation of genome-wide significance for elements of a LOD score curve. Furthermore, the adjusted P values are available to provide precise levels of significance for all analysis points in genome scans, in particular local and global maxima of LOD score curves. They provide a natural mechanism for the simultaneous presentation of genome scans involving multiple models, multiple phenotypes, or multiple crosses. They even provide an appropriate framework in which to investigate epistatic gene action. Adjusted P values are a valuable supplement to LOD score curves and are an asset for both exploring and summarizing genetic models of quantitative traits.
| ACKNOWLEDGMENTS |
|---|
I thank K. Broman, G. Churchill, O. Nerman, S. Schreyer, and two anonymous referees for providing critical feedback on earlier versions of this manuscript. Financial support for this work was provided by Arexis, AstraZeneca, and Chalmers University.
Manuscript received March 17, 2003; Accepted for publication April 10, 2003.
| LITERATURE CITED |
|---|
BROMAN, K. W., 2003 Mapping quantitative trait loci in the case of a spike in the phenotype distribution. Genetics 163:1169-1175.
CHURCHILL, G. A. and R. W. DOERGE, 1994 Empirical threshold values for quantitative trait mapping. Genetics 138:963-971.
DOERGE, R. W. and G. A. CHURCHILL, 1996 Permutation tests for multiple loci affecting a quantitative character. Genetics 142:285-294.
DRACHEVA, S. V., E. F. REMMERS, S. CHEN, L. CHANG, and P. S. GULKO et al., 2000 An integrated genetic linkage map with 1137 markers constructed from five F2 crosses of autoimmune disease-prone and -resistant inbred rat strains. Genomics 63:202-226.
HALEY, C. S. and S. A. KNOTT, 1992 A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69:315-324.
HALEY, C. S., S. A. KNOTT, and J.-M. ELSEN, 1994 Mapping quantitative trait loci in crosses between outbred lines using least squares. Genetics 136:1195-1207.
JIANG, C. and Z-B. ZENG, 1995 Multiple trait analysis of genetic mapping for quantitative trait loci. Genetics 140:1111-1127.
KAO, C.-H., Z-B. ZENG, and R. D. TEASDALE, 1999 Multiple interval mapping for quantitative trait loci. Genetics 152:1203-1216.
KNOTT, S. A. and C. S. HALEY, 2000 Multitrait least squares for quantitative trait loci detection. Genetics 156:899-911.
LANDER, E. and L. KRUGLYAK, 1995 Genetic dissection of complex traits: guidelines for interpreting and reporting linkage results. Nat. Genet. 11:241-247.
LANDER, E. S. and D. BOTSTEIN, 1989 Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121:185-199. [corrigendum: Genetics 136: 705 (1994)].
NETTLETON, D. and R. W. DOERGE, 2000 Accounting for variability in the use of permutation testing to detect quantitative trait loci. Biometrics 56:52-58.
WESTFALL, P. H., and S. S. YOUNG, 1993 Resampling-Based Multiple Testing: Examples and Methods for P-Value Adjustment. John Wiley & Sons, New York.
ZENG, Z-B., 1994 Precision mapping of quantitative trait loci. Genetics 136:1457-1468.