| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |
Genetics, Vol. 175, 855-865, February 2007, Copyright © 2007
doi:10.1534/genetics.106.059808
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Center for Developmental and Health Genetics and Department of Biobehavioral Health, Pennsylvania State University, University Park, Pennsylvania 16803
1 Address for correspondence: 101 Amy Gardner House, University Park, PA 16803.
E-mail: fzj100{at}psu.edu
| ABSTRACT |
|---|
|
|
|---|
However, many traits do not meet this requirement well because of their extremely nonlinear nature. As such, they are best characterized in terms of sudden transitions to qualitatively distinct phenotypic states as opposed to quantitative extensions of previous states. The most notorious of these is probably the death of the organism, but many other traits such as "cancer onset" or "flowering" can also be interpreted in this way. In a typical time-to-event (or time-to failure) experiment one follows a sample over time and records the times (e.g., hours, days) at which the event occurs to a given individual. The resulting phenotypic distribution is usually right-skewed and includes censored cases, that is, subjects who are randomly lost for follow-up or for whom no event has been observed at the end of the experiment.
Standard QTL search models can perform poorly under this type of data structure, because a suitable transformation may not be available to adjust the distribution to normality. Moreover, censored cases are usually treated as missing, which reduces the power to detect QTL, or else they are inappropriately treated as observed event (or failure) times. Recently, several models have been proposed that address these limitations (BROMAN 2003; DIAO et al. 2005; MORENO et al. 2005). However, without exception, these models make the strong assumption that the QTL effect remains constant across the duration of the experiment. This assumption may be violated when the gene(s) underlying the QTL are up- or downregulated on a biologically meaningful timescale, which can either increase or decrease the probability to experience the event at a given time point. In such situations, models that assume temporal constancy can fail to detect QTL in an initial whole-genome scan (ABRAHAMOWICZ et al. 1996).
The aim of this article is to develop the idea of time-varying QTL effects, as discussed in the functional mapping literature, in the context of time-to-event (or time-to-failure) analysis. To achieve this we apply an extension of the Cox model (EC model) (THERNEAU and GRAMBSCH 2000) within an interval-mapping framework. In its simplest form, this model assumes that the QTL effect changes at some time point t0 and follows a linear function before and after this change point. The approximate time point at which this change occurs is estimated. We consider four types of QTL effects: early-acting (EA), late-acting (LA), inversely acting (IA), and proportionally acting (PA) QTL.
To clarify these different QTL effects, consider a simulated experiment in which a sample of mice is exposed to a toxin at time zero. The sample is followed for 20 days and the death times of the mice are recorded. Mice surviving beyond day 20 are censored. We are interested in identifying QTL that influence differential survival in response to toxic exposure. Suppose the sample under study is a backcross with two possible genotypes AA and AB at a given locus. When we plot the survival curves for the two different genotypes at the ith locus, we may observe the four basic theoretical scenarios shown in Figure 1. Scenario 1 illustrates the presence of an EA QTL: The survival curves diverge between day 5 and day 10 and are indistinguishable thereafter. Scenario 2 shows a LA QTL: The survival curves begin to diverge only after day 15. Scenario 3 illustrates an IA QTL: Between day 5 and day 10 individuals carrying the AB genotype die at a faster rate compared to individuals with the AA genotype. At about day 10 we observe an effect reversal with the AA genotype experiencing increased death rates compared to the AB genotype. Last, scenario 4 provides an example of a PA QTL. In this latter situation the survival curve of the AA genotype is consistently lower compared to that of the AB genotype, which indicates that, at any given time point in the course of the study, the difference in the death rates between these genotypes is constant.
|
| METHODS |
|---|
|
|
|---|
ij = Pr(gi = j | mi), the conditional probability of a particular QTL genotype given the observed marker genotypes of the two markers bracketing the interval. Let gi = 0, 1 according to whether an individual has QTL genotypes AA, AB, respectively and let ß be the parameter estimating the QTL effect. We consider the Cox proportional hazards model and the extended Cox model to obtain estimates for ß. For simplicity, we treat the special case of no additional covariates in each of the models.
CPH:
Let f(t) denote the density function and F(t) the distribution function of the random failure (survival) times T. The survival function S(t) is defined as 1 – F(t) = Pr{T > t), which expresses the probability that an individual survives beyond time t. The corresponding hazard function (or hazard rate) is given by
![]() |
The hazard function has the interpretation
![]() |
It is the probability that an individual dies (or fails) in the instant immediately following time t given that the individual has survived until t. Often we are interested in formally testing the effect of a measured covariate (e.g., genotype) on the hazard. In such cases, COX (1972) argued to model the hazard as
![]() | (1) |
is the regression coefficient, and
0(t) is an unspecified (infinite-dimensional) baseline hazard function. Modifying (1) to include the relevant conditional probabilities and notation introduced above yields
![]() | (2) |
The estimation of ß is carried out through Cox's partial likelihood (COX 1972, 1975). A modified version of this likelihood for the present genetic application has previously been employed by MORENO et al. (2005). Conventionally the parameter
ij is not estimated directly but is profiled out of the likelihood function by letting it assume a series of plausible fixed values and evaluating the likelihood at each of these values. The value for
ij that yields the largest profile-likelihood estimate for ß is taken as an approximation of
ij. For the kth value of
ij the partial likelihood is proportional to
![]() | (3) |
denotes the risk set, that is, the collection of all individuals still at risk just prior to time point ti. The advantageous feature of (3) is that it is free of t because the baseline hazards in the numerator and the denominator cancel. This makes (2) a useful model in situations where the distribution of the failure/survival times is not straightforward. However, as argued in the Introduction, there is often reason to believe that the QTL effect changes over time. Therefore, to obtain appropriate estimates of the QTL effect the parameter ß must be allowed to change according to some function of time, say z(t). This can be accomplished with an extension of the Cox model.
EC model:
A simple but flexible method to detect time-dependent covariates is through change-point analysis (KLEIN and MOESCHBERGER 1997; TABLEMAN and KIM 2004). This method assumes a simple two-piece linear form for z(t). Suppose that the QTL effect changes at some discrete time point t0. We can specify a model that estimates the proportional hazard before t0 and after t0. Thus, a simple version of the EC model can take the form
![]() | (4) |
![]() |
ij. One strategy is to evaluate the partial likelihood over a two-dimensional grid of plausible fixed values for t0 and
ij. The combination of values that yields the highest profile-likelihood estimate provides approximations for these parameters. Hence, the partial-likelihood function for the kth combination of fixed values for t0 and
ij is
![]() | (5) |
Simulation study:
Data generation:
To quantify the advantages of the EC model over the CPH model we extend our discussion of the simulation experiment mentioned in the Introduction. In brief, we generated survival/failure times for N = 250 backcross individuals. The event line for each individual (i = 1,..., n) was generated separately, so that it formed a Markov chain with transition probabilities between time points given by
, which is the form of the EC model with the baseline hazard being replaced by the constant environmental effect
, where
is fixed at 2. Individuals who survived beyond 50 days were right-censored. The parameters ß1 and ß2 were each set at different combinations of –0.55, –0.45, –0.35, 0, 0.35, 0.45, and 0.55. A set of change-point values (t0) was chosen at equal spacing on either side of the expected failure time of the AA genotype, which is
7 days. This resulted in the change-point values 3, 5, 7, 9, and 11. For each unique combination of fixed values for ß1, ß2, and t0, we simulated 100 experiments. Due to the high computational demands, we restricted our analysis to a single QTL locus.
Figure 2A shows how the concepts of EA, LA, IA, and PA QTL can be expressed in terms of the simulated parameters ß1 and ß2. Note that if both ß1 and ß2 are positive or negative the model reduces to proportionality and the CPH model should be able to detect the QTL effect (see Figure 2B). In all other cases, the proportionality assumption is violated, on a continuum, from slightly (e.g., borderline EA–PA) to strongly (e.g., far corner IA), and we expect the CPH model to perform poorly. On the contrary, we expect the EC model to detect all four types of QTL effects (see Figure 2C). Naturally, for ß1 = ß2
0 we should see no effect. We utilize the idealized scenarios illustrated in Figure 2, A–C, as a heuristic guide in evaluating the simulation results.
|
| RESULTS |
|---|
|
|
|---|
|
|
-level = 0.05 and a sample size of N = 250 is
98% (SCHOENFELD 1981, 1983), but is expected to be lower if the model does not account for time-dependent effects (FLEMING et al. 1980). Generally, the numeric findings further substantiate the visual results presented above. The EC model (Table 1) recovers the simulated parameters accurately, although with a small upward bias. The estimates are well contained with their 95% confidence intervals. As anticipated, the CPH model displays serious bias in recovering the estimated QTL effect, which is largest for IA QTL (Table 2). This observation follows directly from the construction of Cox's partial likelihood for the CPH model, which makes inferences about the QTL effect by summing over all failure times. This can lead to a cancellation of the effect if the effect points in opposite directions for different time lags of the experiment. Tables 1 and 2 also provide empirical power estimates of the two models by reporting the percentage of simulation runs that resulted in significant detection. The EC model clearly outperforms the CPH model with the largest power difference for LA and IA QTL. The power to detect PA QTL is approximately the same for the two models. From these empirically derived results, the EC model seems to be generally more flexible than the CPH model. However, it should be noted that standard errors obtained with the EC model are consistently higher. This can be due to the fact that ß1 and ß2 are partially a function of the additional variability induced by the parameter t0.
|
Numeric model results:
The CPH and the EC models were applied to these data in a search for QTL influencing differential survival. The EC model was evaluated over a spatiotemporal grid with an average position step size of 2–3 cM and an average temporal step size of 1–3 hr within the 25th and 75th percentiles of the failure distribution. The CPH model was evaluated over an average position step size of 2–3 cM. The genomewide significance thresholds were derived empirically from 1000 permutations of the data (CHURCHILL and DOERGE 1994), using the EC and CPH models. They were LRT = 10.0 and LRT = 12.01 for the CPH and EC models, respectively. Results from the whole-genome scan are shown in Figure 4 and are further summarized in Table 3. The CPH model located QTL on chromosomes (Chr) 1, 5, and 13. The EC model located these same QTL, which demonstrates its ability to recover QTL that meet the proportionality assumption. However, a slight shift in the location estimate for QTL 4 (Chr 5) was observed, with the EC model estimating the QTL location at 27 cM and the CPH model estimating it at 34 cM. Besides the above QTL, the EC model detected additional QTL on Chr 1, 2, 6, and 15. These latter QTL appear to be time dependent because the CPH model failed to detect them.
|
|
Visual model results:
To provide visual support for the numeric results of Table 3, we plot the spatiotemporal LRT profiles for the relevant chromosomes on the basis of the EC model (Figure 5). For the time-dependent QTL (QTL 1, 3, 5, 6, and 8) we note that the surfaces show LRT values exceeding genomewide significance thresholds only for a small range of time points within the tested temporal window. However, at least within this search window, QTL 6 and 8 appear like PA QTL because the LRT values are generally large for all tested time points. This is supported by the observation that, with the CPH model, these QTL fall only slightly short of genomewide significance thresholds (Table 3) and would probably have been detected with larger sample sizes. For the PA QTL (QTL 2, 4, and 7), we note large LRT values for all tested time points within the temporal window. Nonetheless, particularly QTL 2 exhibits a decline in later-tested time points, which corresponds with the fact that the EC model identifies this QTL as an EA QTL.
|
|
|
| DISCUSSION |
|---|
|
|
|---|
We found evidence for EA and LA types of QTL in our analysis of a data set of survival times of mice that were exposed to L. monocytogenes. Our results show that even those QTL that were detected with the CPH model, which explicitly assumes a constant QTL effect, exhibited some time dependency. As discussed, this suggests that the distinction between EA-, LA-, IA-, and PA QTL is not as clear cut as shown in Figure 2A, but is partially a function of the time point at which the change occurs. These concepts should therefore rather be taken as a heuristic guideline to classify the general behavior of a locus over time. With respect to the example data, this temporal characterization can provide a rough sketch of the genomewide genetic regulation in response to L. monocytogenes exposure.
It should be noted that no IA QTL were detected in these data. However, there are physiological arguments to support the existence of these types of QTL. In longevity studies, for example, high levels of a gene product, such as growth hormone, may be beneficial to survival early during development, but detrimental at later stages where they may promote the development of cancer. Carrying an "increasing allele" at a locus that aids in the production of growth hormone can therefore have an inverse impact on survival throughout the course of the study. This complicates the notion of "allelic substitution" as it is used in quantitative genetic analysis. Preliminary results from other data sets indeed show some support for IA QTL.
Because the function of time that governs the QTL effect was not known a priori but was essentially modeled after the data, there remains the question of whether these temporal effects are real. As with any type of QTL experiment, it is therefore necessary to carry out confirmation studies using the same or other crosses to test if the temporal effect can be replicated. From a theoretical standpoint this raises the interesting question of to what extent the observed temporal effect is a property of a single locus or its genetic background. This question must be addressed experimentally. In agreement with the functional mapping literature, the present approach views the "genetic architecture" as a dynamic process involving the up- and downregulation of genes that influence variability in the phenotype. The analytical dissection of this process should take the temporal dimension into consideration.
| LITERATURE CITED |
|---|
|
|
|---|
ABRAHAMOWICZ, M., T. MACKENZIE and J. M. ESDAILE, 1996 Time-dependent hazard ratio: modeling hypothesis testing with application in lupus nephritis. J. Am. Stat. Assoc. 91: 1432–1439.[CrossRef]
BOYARTCHUK, V. L., K. W. BROMAN, R. E. MOSHER, S. E. F. D'ORAZIO, M. N. STARNBACH et al., 2001 Multigenic control of Listeria monocytogenes susceptibility in mice. Nat. Genet. 27: 259–260.[CrossRef][Medline]
BROMAN, K. W., 2003 Mapping quantitative trait loci in the case of a spike in the phenotype distribution. Genetics 163: 1169–1175.
CHANG-XING, M., G. CASELLA and R. WU, 2002 Functional mapping of quantitative trait loci underlying the character process: a theoretical framework. Genetics 161: 1751–1762.
CHEVERUD, J. M., E. J. ROUTMAN, F. A. M. DUARTE, B. VAN SWINDEREN, K. GOTHRAN et al., 1996 Quantitative trait loci for murine growth. Genetics 142: 1305–1319.[Abstract]
CHURCHILL, G. A., and R. W. DOERGE, 1994 Empirical threshold values for quantitative trait mapping. Genetics 138: 963–971.[Abstract]
COX, D. R., 1972 Regression methods and life-tables (with discussion). J. R. Stat. Soc. Ser. B 34: 187–220.
COX, D. R., 1975 Partial likelihood. Biometrika 62: 269–276.
DIAO, G., and D. Y. LIN, 2005 Semiparametric methods for mapping quantitative trait loci with censored data. Biometrics 61: 789–798.[CrossRef][Medline]
DIAO, G., D. Y. LIN and F. ZOU, 2004 Mapping quantitative trait loci with censored observations. Genetics 168: 1689–1698.
FLEMING, T. R., J. R. O'FALLON, P. C. O'BRIAN and D. P. HARRINGTON, 1980 Modified Kolmogorov-Smirnov test procedure with applications to arbitrary right-censored data. Biometrics 36: 607–625.[CrossRef]
GRAMBSCH, P. M., and T. M. THERNEAU, 1994 Proportional hazards tests and diagnostics based on weighted residuals. Biometrika 81: 515–526.
HENCKAERTS, E., H. GEIGER, C. J. LANGER, P. REBOLLO, G. VAN ZANT et al., 2002 Genetically determined variation in the number of phenotypically defined hematopoietic progenitor and stem cells and in their response to early acting cytokines. Blood 99: 3947–3954.
JOHANNES, F., D. A. BLIZARD, A. LIONIKAS, D. H. LANG, D. J. VANDENBERGH et al., 2006 QTL influencing baseline hematocrit in the C57BL/6J and DBA/2J lineage: age-related effects. Mamm. Genome 17: 689–699.[CrossRef][Medline]
KLEIN, J. P., and M. L. MOESCHBERGER, 1997 Survival Analysis: Techniques for Censored and Truncated Data. Springer, New York.
LEHMANN, E. L., 1975 Nonparametrics: Statistical Methods Based on Ranks. Holden-Day, San Francisco.
LIONIKAS, A., D. A. BLIZARD, D. J. VANDENBERGH, M. G. GLOVER, J. T. STOUT et al., 2003 Genetic architecture of fast- and slow-twitch skeletal muscle weight in 200-day-old mice of the C57BL/6J and DBA/2J lineage. Physiol. Genomics 16: 141–152.
MORENO, C. R., J. M. ELSEN, P. LE ROY and V. DUCROCQ, 2005 Interval mapping methods for detecting QTL affecting survival and time-to-event phenotypes. Genet. Res. 85: 139–149.[CrossRef][Medline]
SCHEMPER, M., 1992 Cox analysis of survival data with non-proportional hazard functions. Statistician 41: 455–465.[CrossRef]
SCHOENFELD, D., 1981 The asymptotic properties of nonparametric tests for comparing survival distributions. Biometrika 68: 316–319.
SCHOENFELD, D., 1982 Partial residuals for the proportional hazards regression model. Biometrika 69: 239–241.
SCHOENFELD, D., 1983 Sample-size formula for the proportional-hazards regression model. Biometrics 39: 499–503.[CrossRef][Medline]
TABLEMAN, M., and J. S. KIM, 2004 Survival Analysis Using S: Analysis of Time-to-Event Data. Chapman & Hall/CRC, Boca Raton, FL.
THERNEAU, T. M., and P. M. GRAMBSCH, 2000 Modeling Survival Data: Extending the Cox Model. Springer, New York.
WU, R., and M. LIN, 2006 Functional mapping—how to map and study the genetic architecture of dynamic complex traits. Nat. Rev. Genet. 7: 229–237.[CrossRef][Medline]
WU, R., C.-X. MA, J. ZHU and G. CASELLA, 2002 Mapping epigenetic quantitative trait loci (QTL) altering a developmental trajectory. Genome 45: 28–33.[Medline]
WU, W., Y. ZHOU, W. LI, D. MAO and Q. CHEN, 2002 Mapping of quantitative trait loci based on growth models. Theor. Appl. Genet. 105: 1043–1049.[CrossRef][Medline]
WU, R., C.-X. MA, M. LIN, Z. WANG and G. CASELLA, 2004 Functional mapping of quantitative trait loci underlying growth trajectories using the transform-both-sides of the logistic model. Biometrics 60: 729–738.[CrossRef][Medline]
Communicating editor: S. MUSE
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |