Originally published as Genetics Published Articles Ahead of Print on December 30, 2005.
Genetics, Vol. 172, 1821-1828, March 2006, Copyright © 2006
doi:10.1534/genetics.105.050914
Genetic Association Analysis of Human Longevity in Cohort Studies of Elderly Subjects: An Example of the PON1 Gene in the Danish 1905 Birth Cohort
Qihua Tan*,†,1, Lene Christiansen†, Lise Bathum*,†, Shuxia Li†, Torben A. Kruse* and Kaare Christensen†
* Department of Clinical Biochemistry and Genetics, Odense University Hospital, DK-5000 Odense, Denmark and † Institute of Public Health, University of Southern Denmark, DK-5000 Odense, Denmark
1 Corresponding author: Department of Clinical Biochemistry and Genetics (KKA), Odense University Hospital, Sdr. Blvd. 29, DK-5000 Odense C, Denmark.
E-mail: qihua.tan@ouh.fyns-amt.dk
ABSTRACT
Although the case-control or the cross-sectional design has been popular in genetic association studies of human longevity, such a design is prone to false positive results due to sampling bias and a potential secular trend in gene-environment interactions. To avoid these problems, the cohort or follow-up study design has been recommended. With the observed individual survival information, the Cox regression model has been used for single-locus data analysis. In this article, we present a novel survival analysis model that combines population survival with individual genotype and phenotype information in assessing the genetic association with human longevity in cohort studies. By monitoring the changes in the observed genotype frequencies over the follow-up period in a birth cohort, we are able to assess the effects of the genotypes and/or haplotypes on individual survival. With the estimated parameters, genotype- and/or haplotype-specific survival and hazard functions can be calculated without any parametric assumption on the survival distribution. In addition, our model estimates haplotype frequencies in a birth cohort over the follow-up time, which is not observable in the multilocus genotype data. A computer simulation study was conducted to specifically assess the performance and power of our haplotype-based approach for given risk and frequency parameters under different sample sizes. Application of our method to paraoxonase 1 genotype data detected a haplotype that significantly reduces carriers' hazard of death and thus reveals and stresses the important role of genetic variation in maintaining human survival at advanced ages.
The current genetic association studies on human aging and longevity are dominated by the case-control (cases represent long lived and controls the younger aged) or cross-sectional design. In these studies, subjects of different ages are genotyped, and frequencies of a particular gene variant are compared across the observed ages to infer genetic association (DE BENEDICTIS et al. 2001). In terms of analytical methods, new statistical approaches have been proposed to help analyze genotype data collected in cross-sectional studies with improved power (TAN et al. 2004). LEWIS and BRUNNER (2004) explored the validity of the basic assumptions in the cross-sectional approach, i.e., no secular change in both risk and initial frequency of the gene under study. Their study concluded that the assumptions are questionable when gene frequency differs in populations and gene-environment interactions exist and suggested conducting long-term follow-up studies to ensure verifiable results. Although the cohort study design is expensive due to the long time of follow up, it is practically affordable to carry out follow-up studies on aged subjects given the high mortality rate at advanced ages. In this case, the study aims at investigating the genetic effect on human survival at extreme ages, an important topic nowadays because of the largely increased mean life span in the developed countries (VAUPEL et al. 1998). In the literature, association studies using follow-up design on oldest-old subjects such as nonagenarians (CHRISTIANSEN et al. 2004; HURME et al. 2005) and centenarians (BLANCHE et al. 2001; LOUHIJA et al. 2001) have already been conducted. With the collected genotype and survival information, traditional statistical methods have been employed for single-locus analysis, for example, frequency comparison using a simple $\chi^2$ (trend) test or estimating genotype relative risk using the traditional Cox regression model.
In the context of human disease gene mapping, multilocus approaches such as the haplotype-based association analysis have been introduced (SCHAID et al. 2002; ZHAO et al. 2003). Haplotype-based analysis exhibits more power due to its functional and statistical advantages over the single-locus approach in linkage disequilibrium mapping (AKEY et al. 2001; CLARK 2004; SCHAID 2004). Unfortunately, haplotype analysis in population-based human longevity studies encounters the problem of missing phases in the long-lived subjects because genotype information is unavailable from their parents. We present a novel survival analysis model that combines population survival with individual genotype and phenotype information in assessing the genetic association with human longevity in cohort studies. By monitoring the changes in the observed genotype frequencies over the follow-up period in a birth cohort, we are able to assess the effects of the genotypes and/or haplotypes on individual survival. With the estimated parameters, genotype- and/or haplotype-specific survival and hazard functions can be calculated without any parametric assumption on the survival distribution. In addition, our model estimates haplotype frequencies in a birth cohort over the follow-up time, which is not observable in the multilocus genotype data. A computer simulation study was conducted to evaluate the performance of our model. For given haplotype risk and frequency parameters, we assess the power of our model under different sample sizes and years of follow up. The model is applied to measure the association of paraoxonase 1 (PON1) gene polymorphism with human survival at advanced ages in the Danish 1905 birth cohort followed from 1998 to 2005. Application of our model helped us to detect a haplotype of a PON1 gene that significantly reduces the carrier's hazard of death over the follow up.
METHODS
The basic model for genotype-based analysis:
We suppose that we start our follow-up study in a birth cohort of old subjects from initial age $x_0$. For each individual, we obtain genotype information at a biallelic locus (e.g., a SNP locus) for assessing its influence on conditional survival after age $x_0$. Combination of the two alleles (1 and 2) forms three genotypes (11, 12, and 22). If the frequencies of carriers of the three genotypes are $p_{11}$, $p_{12}$, and $p_{22}$ at intake (age $x_0$), then the conditional survival (conditional on the fact that each individual has survived to age $x_0$) of the birth cohort can be expressed as the mean of the survivals for carriers of the three genotypes; i.e.,
$$\bar{s}(x) = p_{11}s_{11}(x) + p_{12}s_{12}(x) + p_{22}s_{22}(x), \tag{1}$$
where $s_{11}(x)$, $s_{12}(x)$, and $s_{22}(x)$ are genotype-specific conditional survival functions for carriers of the corresponding genotypes. Age x ranges from $x_0$ to $x_0 + t$, where t is the follow-up time in years. From (1), it is straightforward to calculate the proportion of genotype carriers in the birth cohort at age x during the follow up,
$$p_{ij}(x) = \frac{p_{ij}\,s_{ij}(x)}{\bar{s}(x)}, \quad ij \in \{11, 12, 22\}. \tag{2}$$
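If, relative to the reference genotype 22, the genotypes 11 and 12 affect survival with the relative risks (i.e., the relative rate of death among those carrying a specific genotype over that of the reference genotype) $r_{11}$ and $r_{12}$, respectively, then in a proportional hazard model we have the hazard of death at age x for carriers of the three genotypes as
$$\mu_{11}(x) = r_{11}\mu_0(x), \quad \mu_{12}(x) = r_{12}\mu_0(x), \quad \mu_{22}(x) = \mu_0(x), \tag{3}$$
where $\mu_0(x)$ is the hazard of death for the 22 genotype or the baseline hazard function. With (3), we obtain the survival functions for carriers of the three genotypes as
$$s_{11}(x) = s_0(x)^{r_{11}}, \quad s_{12}(x) = s_0(x)^{r_{12}}, \quad s_{22}(x) = s_0(x) = \exp\{-H_0(x)\}, \quad H_0(x) = \int_{x_0}^{x}\mu_0(u)\,du. \tag{4}$$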
When unobserved heterogeneity in individual frailty is modeled by a gamma distribution with mean 1 and variance $\sigma^2$, we obtain the genotype-specific survivals (see APPENDIX):
$$s_{ij}(x) = \left[1 + \sigma^2 r_{ij} H_0(x)\right]^{-1/\sigma^2}, \quad r_{22} = 1, \tag{5}$$
where $H_0(x)$ is the cumulative baseline hazard function. The risk and frequency parameters are estimated by maximizing the likelihood of the genotype data observed over the follow-up period (from $x_0$ to $x_0 + t$),
$$L \propto \prod_{x = x_0}^{x_0 + t}\,\prod_{g} p_g(x)^{n_g(x)}, \tag{6}$$
where $n(x) = \{n_g(x)\}$ and $p(x) = \{p_g(x)\}$ are vectors of the observed and the fitted frequencies of all the genotypes at age x. In the estimation, a parametric form of the baseline hazard function can be assigned. However, by introducing the mean cohort survival available from population statistics into (1), a nonparametric baseline hazard function can be estimated. This is done using a two-step procedure (YASHIN et al. 1999; TAN et al. 2001) in which we start with an initial guess of the risk and frequency parameters and then numerically solve (1) to obtain a nonparametric baseline hazard function. This hazard function is introduced into (6) to estimate the risk and frequency parameters. The estimated parameters are put back into (1) to calculate an updated baseline hazard function. This process continues until the likelihood (6) converges (TAN et al. 2001). To obtain the statistical significance for the risk parameters, we shuffle the ages at last observation for all the subjects to conduct the permutation tests.
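The step of numerically solving (1) for the baseline can be illustrated with a minimal sketch. It assumes the homogeneity form (4), so that each genotype-specific survival is the baseline survival raised to the power of its relative risk, and it uses bisection, which is one possible root finder and not necessarily the one used by the authors. The function names, the initial frequencies, the relative risks, and the cohort survival value are all hypothetical.

```python
def solve_baseline_survival(cohort_survival, freqs, risks, tol=1e-12):
    """Find s0 in [0, 1] with sum_g p_g * s0**r_g == cohort_survival (equation 1)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        mean_survival = sum(p * mid ** r for p, r in zip(freqs, risks))
        # The mean survival increases with s0, so bisection applies.
        if mean_survival < cohort_survival:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fitted_genotype_frequencies(cohort_survival, freqs, risks):
    """Genotype frequencies among survivors at a given age (equation 2)."""
    s0 = solve_baseline_survival(cohort_survival, freqs, risks)
    return [p * s0 ** r / cohort_survival for p, r in zip(freqs, risks)]


if __name__ == "__main__":
    # Hypothetical values: genotypes 11, 12, 22 with 22 as the reference.
    initial = [0.25, 0.50, 0.25]
    relative_risks = [0.8, 0.9, 1.0]
    print(fitted_genotype_frequencies(0.4, initial, relative_risks))
```

In the two-step procedure these fitted frequencies would enter the likelihood, the parameters would be updated, and the baseline would be solved again. With frailty, the same idea would be applied to the cumulative baseline hazard in (5) instead of the baseline survival.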
The extended model for haplotype-based analysis:
When genotypes at closely linked loci are available, haplotype-based analysis can be performed. Different from genotypes that are observable for each individual, individual haplotypes cannot be determined explicitly without knowing phases. We start with assuming that all the haplotypes occurring at the typed loci are unambiguously observed and denote the collection of them with $H$. Under Hardy-Weinberg equilibrium (HWE), the frequency of the (ordered) haplotype pair $(h_i, h_j)$ can be calculated as
$$p_{ij} = p_i\,p_j, \tag{7}$$
where $p_i$ and $p_j$ are the haplotype frequencies at initial age $x_0$ for haplotypes $h_i$ and $h_j$. We further assume that the relative risk on hazard of death for carriers of haplotype $h_i$ is $r_i$. For carriers of haplotype pair $(h_i, h_j)$, the hazard of death at follow-up age x is $\mu_{ij}(x) = r_i r_j \mu_0(x)$ in a proportional hazard model. Similar to the situation of genotype-based analysis, both homogeneity (no unobserved risk factor) and heterogeneity or frailty models can be fitted. In any case, the mean conditional survival of the birth cohort is the weighted survival for carriers of the different haplotype pairs; i.e.,
$$\bar{s}(x) = \sum_{h_i \in H}\,\sum_{h_j \in H} p_{ij}\,s_{ij}(x). \tag{8}$$
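Here $s_{ij}(x)$ is the conditional survival function for carriers of the haplotype pair $(h_i, h_j)$. The frequency of the haplotype pair $(h_i, h_j)$ at age x in the birth cohort is
$$p_{ij}(x) = \frac{p_{ij}\,s_{ij}(x)}{\bar{s}(x)}. \tag{9}$$
For an observed multilocus genotype g, let $S(g)$ denote the set of haplotype pairs $(h_i, h_j)$ that are consistent with g. With this relationship, we can express the frequency of the multilocus genotype g at follow-up age x in the birth cohort as
$$P_g(x) = \sum_{(h_i, h_j) \in S(g)} p_{ij}(x). \tag{10}$$
The haplotype risk and frequency parameters are estimated by maximizing the likelihood of the multilocus genotype data observed over the follow-up period,
$$L \propto \prod_{x = x_0}^{x_0 + t}\,\prod_{g} P_g(x)^{n_g(x)}, \tag{11}$$
where $n_g(x)$ is the observed number of carriers of genotype g at age x. With the estimated parameters, the frequency of haplotype $h_i$ at age x in the birth cohort is
$$p_i(x) = \sum_{h_j \in H} p_{ij}(x). \tag{12}$$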
All computer codes written in the GAUSS programming language are freely available upon contacting the corresponding author.
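The following Python sketch is not the authors' GAUSS code; it only illustrates how a haplotype frequency among survivors could be computed from the haplotype-pair frequencies under HWE. It assumes, hypothetically, that haplotype risks act multiplicatively within a pair and that frailty, when used, is gamma distributed with mean 1 and variance `sigma2`. The function name and the baseline cumulative hazard value passed in are illustrative; the frequency 0.090 and relative risk 0.688 are the T-L-Q estimates reported in the example application, with all other haplotypes pooled as reference.

```python
import math


def haplotype_frequencies_at_age(hap_freqs, hap_risks, cum_hazard, sigma2=0.0):
    """Haplotype frequencies among survivors, given the baseline cumulative hazard.

    Assumes HWE pair frequencies p_i * p_j and a multiplicative pair risk r_i * r_j.
    """
    def pair_survival(r_i, r_j):
        exposure = r_i * r_j * cum_hazard
        if sigma2 == 0.0:
            return math.exp(-exposure)  # homogeneity model
        return (1.0 + sigma2 * exposure) ** (-1.0 / sigma2)  # gamma frailty

    n = len(hap_freqs)
    # Survival-weighted frequencies of ordered haplotype pairs.
    weights = [
        [hap_freqs[i] * hap_freqs[j] * pair_survival(hap_risks[i], hap_risks[j])
         for j in range(n)]
        for i in range(n)
    ]
    mean_survival = sum(sum(row) for row in weights)
    # Summing over the partner haplotype gives the marginal haplotype frequency.
    return [sum(row) / mean_survival for row in weights]


if __name__ == "__main__":
    # T-L-Q versus all other haplotypes pooled; cum_hazard = 1.0 is hypothetical.
    print(haplotype_frequencies_at_age([0.090, 0.910], [0.688, 1.0], 1.0, sigma2=0.1))
```

Working with ordered pairs keeps the bookkeeping simple: each row sum is already the frequency of one haplotype among surviving chromosomes, so no factor of one half is needed for heterozygous pairs.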
SIMULATION STUDY
The critical value at $\alpha$ = 0.05 in the null distribution is used in calculating the power. In Table 1, we show the results from our simulation study using 500 replicates. We can see that, overall, our model produces unbiased estimates for the fixed haplotype risk and frequency parameters used in generating the multilocus genotype data. The precision of the estimated relative risks, as indicated by their percentiles, goes up with increasing sample size and longer follow-up time. The top part of Table 1 contains the results for a relatively frequent haplotype (frequency of 0.284). For a strong relative risk of r = 0.6, the model can have acceptable power in capturing the parameter even within a follow-up period of <5 years using a small birth cohort (N = 500). However, for the same sample size, the model is unable to detect a low risk haplotype (r = 0.8). A large birth cohort (N = 1000) is required to obtain an acceptable power (80%) after 7 years of follow-up. For a haplotype with a lower frequency of 0.116, the model still exhibits acceptable power in a small birth cohort (N = 500) after 7 years of follow-up. When the sample size is doubled, the power can be as high as 91% after 5 years of follow-up. To detect a low frequency and low risk haplotype (r = 0.8), a long follow-up (>7 years) and a large sample of >2000 are required.
In Table 1, we also report the simulation result on a haplotype with a frequency of 0.284 and a risk of 0.8 but ignoring the effect of hidden frailty ($\sigma^2$ = 0). During all the follow-up years, the effect of the haplotype is obviously underestimated, meaning conservative results. At the same time, the power is largely reduced. Finally, a large birth cohort of 10,000 individuals (over 93 years of age) was simulated. This time we assume that there is a harmful haplotype that increases the carrier's hazard of death with a relative risk of 1.5, and the frequency of the haplotype at the initial age is chosen as 0.284. In Figure 1, we show the theoretical, the simulated, and the estimated frequency trajectories for the haplotype after 7 years of follow-up. Both the theoretical (solid line) and the simulated (dashed line) frequency patterns are well captured by our model (the dashed-dotted line). Note also that all three lines start from the initial frequency we set in the simulation.
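A comparison of a theoretical and a simulated haplotype frequency, of the kind described for the harmful haplotype (relative risk 1.5, initial frequency 0.284, 10,000 individuals, 7 years), can be sketched as follows. This is an illustration under stated assumptions and not the authors' simulation code: the Gompertz baseline and its parameters `a` and `b` are hypothetical, the frailty variance of 0.1 is taken from the value suggested in the APPENDIX, and haplotype risks are assumed to act multiplicatively per copy carried.

```python
import math
import random


def gompertz_cum_hazard(a, b, t):
    """Cumulative hazard after t years for the hazard a * exp(b * t)."""
    return a / b * (math.exp(b * t) - 1.0)


def theoretical_frequency(p, r, cum_hazard, sigma2):
    """Haplotype frequency among survivors under HWE and gamma frailty."""
    numerator = 0.0
    denominator = 0.0
    for copies in (0, 1, 2):
        prob = math.comb(2, copies) * p ** copies * (1.0 - p) ** (2 - copies)
        survival = (1.0 + sigma2 * r ** copies * cum_hazard) ** (-1.0 / sigma2)
        numerator += prob * (copies / 2.0) * survival
        denominator += prob * survival
    return numerator / denominator


def simulated_frequency(n, p, r, cum_hazard, sigma2, seed=1):
    """Simulate n individuals and return the haplotype frequency in survivors."""
    rng = random.Random(seed)
    survivors = 0
    copies_in_survivors = 0
    for _ in range(n):
        copies = (rng.random() < p) + (rng.random() < p)  # two HWE draws
        frailty = rng.gammavariate(1.0 / sigma2, sigma2)  # mean 1, variance sigma2
        if rng.random() < math.exp(-frailty * r ** copies * cum_hazard):
            survivors += 1
            copies_in_survivors += copies
    return copies_in_survivors / (2.0 * survivors)


if __name__ == "__main__":
    H7 = gompertz_cum_hazard(0.2, 0.08, 7.0)  # hypothetical baseline
    print(theoretical_frequency(0.284, 1.5, H7, 0.1))
    print(simulated_frequency(10000, 0.284, 1.5, H7, 0.1))
```

Repeating the simulation over many replicates and recording how often the test statistic exceeds the critical value of the null distribution would give a power estimate in the spirit of Table 1.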
EXAMPLE APPLICATION
In Table 2, we show the genotype counts and number of typed individuals at the three loci. Chi-square tests on each locus showed that all three loci are in Hardy-Weinberg equilibrium ($\chi^2$ = 0.005, P = 0.944 at PON1 C-107T; $\chi^2$ = 0.100, P = 0.752 at PON1 M55L; and $\chi^2$ = 3.190, P = 0.074 at PON1 Q192R). We present our results on genotype-based analysis in Table 2, where, as described in METHODS, risk and frequency parameters are estimated for each genotype against the reference. Although only a subsample was used, our model detected PON1 192 Q/Q and Q/R genotypes as of potential influence (P = 0.062 for Q/R and P = 0.076 for Q/Q genotypes), a result consistent with that of CHRISTIANSEN et al. (2004). In Figure 2, we show the observed and the estimated genotype frequency patterns for all genotypes at the three loci.
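A Hardy-Weinberg chi-square test of this kind can be sketched for a biallelic locus as follows. The function names are illustrative and the genotype counts in the usage line are hypothetical, since the counts of Table 2 are not reproduced here. The P value uses the identity that the upper tail of a chi-square distribution with 1 d.f. equals `erfc(sqrt(x / 2))`.

```python
import math


def hwe_chi_square(n11, n12, n22):
    """Chi-square statistic for Hardy-Weinberg equilibrium at a biallelic locus."""
    n = n11 + n12 + n22
    p = (2 * n11 + n12) / (2.0 * n)  # frequency of allele 1
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n11, n12, n22)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_p_value_1df(statistic):
    """Upper-tail probability of a chi-square variable with 1 d.f."""
    return math.erfc(math.sqrt(statistic / 2.0))


if __name__ == "__main__":
    # Hypothetical genotype counts for genotypes 11, 12, and 22.
    stat = hwe_chi_square(120, 230, 100)
    print(stat, chi_square_p_value_1df(stat))
```

Applied to the reported statistics, `chi_square_p_value_1df` returns values that round to the P values quoted above (0.944, 0.752, and 0.074), which is consistent with tests on 1 d.f.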
On the basis of genotype information from the three loci, we carried out a haplotype-based analysis using our extended model. By taking individuals with genotype information available at all three loci, we obtained a sample size of 451 for haplotype analysis. We first tested the linkage disequilibrium between the markers using GENECOUNTING software (ZHAO et al. 2002). Very strong marker-marker disequilibrium was found between markers C-107T and M55L ($\chi^2$ = 225.36, P = 0), M55L and Q192R ($\chi^2$ = 240.29, P = 0), and C-107T and Q192R ($\chi^2$ = 17.54, P = 0). In Table 3, we show the estimated initial frequency and the relative risk for each haplotype by assigning the rest as reference. Our results show that haplotype T-L-Q exhibits a significantly (P = 0.043) beneficial effect that reduces the carrier's hazard of death with a relative risk of 0.688 and an initial frequency of 0.090. In Figure 3, we show the estimated haplotype frequencies for haplotypes T-L-Q and T-L-R calculated using (12). The increasing frequency of the T-L-Q haplotype in survivors of the 1905 birth cohort illustrates the beneficial effect of the haplotype, which is in contrast to the frequency of the T-L-R haplotype. In addition, on the basis of the parameter estimates, we calculated the conditional survival functions for carriers and noncarriers of the T-L-Q haplotype as presented in Figure 4. A comparison of the survival curves shows that, on average, carriers of the T-L-Q haplotype may live approximately 1 year longer than noncarriers of the haplotype (Figure 4).
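How a difference in mean remaining life span can be read from two conditional survival curves is sketched below: the area under each curve is the mean remaining lifetime, and the difference of the two areas is the gain. The sketch does not reproduce Figure 4. It assumes a hypothetical Gompertz baseline (`a`, `b`), ignores frailty, and treats carriers as having a single hazard ratio of 0.688 relative to noncarriers; the function name is illustrative.

```python
import math


def remaining_life_expectancy(relative_risk, a, b, horizon=30.0, step=0.01):
    """Area under the conditional survival curve, by the trapezoid rule.

    Survival after t years is exp(-relative_risk * H0(t)) with a Gompertz
    baseline cumulative hazard H0(t) = a / b * (exp(b * t) - 1).
    """
    def survival(t):
        return math.exp(-relative_risk * a / b * (math.exp(b * t) - 1.0))

    n_steps = int(round(horizon / step))
    total = 0.5 * (survival(0.0) + survival(horizon))
    for i in range(1, n_steps):
        total += survival(i * step)
    return total * step


if __name__ == "__main__":
    # Hypothetical baseline parameters for survival from the initial age.
    noncarriers = remaining_life_expectancy(1.0, 0.2, 0.08)
    carriers = remaining_life_expectancy(0.688, 0.2, 0.08)
    print(carriers - noncarriers)
```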
DISCUSSION
Although our model can support the parametric form of the baseline hazard function, it is important to point out that, by incorporating the cohort-specific survival function available from population statistics, our model conducts parameter estimation without imposing any parametric form of the hazard function. This means that our model provides a nonparametric approach in analyzing survival data at advanced ages. This is important because, at extreme ages, validity of the parametric survival functions, such as the Gompertz or the Gompertz-Makeham models, has been seriously questioned (DRIVER 2001). In addition, when the sample size is limited at advanced ages, there will be a considerable error in estimating the survival distribution, which consequently leads to unreliable results. Because our model describes the genotype or haplotype frequency patterns in the survivors over the follow-up period, the fitted frequency trajectory over the observed period can be examined with the estimated parameters (Figures 2 and 3) for each genotype or haplotype. This is especially useful in haplotype-based analysis because, unlike in the single-locus analysis, we do not actually observe the individual haplotype in population studies. Moreover, since the parameters are estimated by monitoring the genotype or haplotype frequencies in the survivors over the follow-up period in a birth cohort, exact individual life span is not necessarily required to apply the model. This means that censoring is not a problem at all in our analysis. Finally, although the method is introduced using SNP markers, extending it to multiallelic loci involves only more genotypes or haplotypes in the analysis.
Here, we emphasize the importance of the frailty modeling in our analysis. It is well known that, at advanced ages, the cause-specific mortality curves start to converge as a result of heterogeneity in an individual's frailty composition (VAUPEL et al. 1998). As a result, ignoring the existence of the competing risk factors can substantially underestimate the risks of genotypes or haplotypes. Frailty modeling in our genotype- and haplotype-based analyses can help us to assess the risk parameters in a more realistic manner. Most importantly, since haplotype analysis is biologically as well as statistically advantageous over the single-locus approach, our haplotype model with frailty modeling provides a powerful method in data analysis. In addition, we point out that the applicability of our model is theoretically not limited to longevity studies. Application of our model to any time to event data, for example, the age of onset of a disease, is feasible provided that the population probability distribution of the time to onset of the disease is available.
On the basis of the mixed results from conducted follow-up studies on apolipoprotein E gene and longevity, LEWIS and BRUNNER (2004) suggested that adequate cohort studies with longer follow up (>5 years) be conducted to obtain reliable results. It is interesting to see from Table 1 that their conclusion seems to comply with our simulation. Since the cohort design avoids the validity issues concerning a cross-sectional design (secular change in the risk and initial frequency of the observed gene), our simulation result is promising because, with a proper analytical approach, the effect of a gene on human extreme age survival can be detected within an affordable period of follow-up time.
APPENDIX
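For an individual with frailty $z$ who carries a genotype with relative risk $r$, the hazard of death at age x is $\mu(x \mid z) = z\,r\,\mu_0(x)$. The mean hazard of death for a heterogeneous population carrying the genotype is
$$\bar{\mu}(x) = r\,\mu_0(x)\,E[z \mid X > x], \tag{A1}$$
where $E[z \mid X > x]$ is the mean frailty among survivors to age x. We assume that $z$ is gamma distributed with mean 1 and variance $\sigma^2$. Then $E[z \mid X > x]$ in (A1) can be derived as $1/[1 + \sigma^2 r H_0(x)]$. Substituted into (A1), we get
$$\bar{\mu}(x) = \frac{r\,\mu_0(x)}{1 + \sigma^2 r H_0(x)}, \tag{A2}$$
where $H_0(x)$ is the cumulative baseline hazard function. Correspondingly, we have the mean survival for the genotype carriers,
$$s(x) = \left[1 + \sigma^2 r H_0(x)\right]^{-1/\sigma^2}. \tag{A3}$$
Reliable estimation of $\sigma^2$ requires a large sample size (EWBANK 2002). In small-scale investigations, $\sigma^2$ can be determined by a grid search for the peak of the likelihood for tentatively assigned values of $\sigma^2$ (YASHIN et al. 2000; TAN et al. 2001). On the basis of our experiences in fitting the gamma-frailty model to large population data sets, one can alternatively fit a frailty model by simply setting $\sigma^2$ to 0.1. This can be conservative compared with some empirical results (YASHIN et al. 2000; TAN et al. 2001; EWBANK 2002). However, we think it is applicable for small data sets.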
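The link between (A1), (A2), and (A3) can be checked with the standard gamma-frailty argument, sketched here under the assumption that frailty $z$ acts multiplicatively on the hazard and has a gamma distribution with shape $1/\sigma^2$ and scale $\sigma^2$ (mean 1, variance $\sigma^2$). The symbol $u$ is introduced only for this sketch.

```latex
% Laplace transform of a gamma variable z with mean 1 and variance \sigma^2:
%   E[e^{-zu}] = (1 + \sigma^2 u)^{-1/\sigma^2}.
% Setting u = r H_0(x) gives the mean survival (A3):
\begin{align*}
s(x) &= E\!\left[e^{-z\,r H_0(x)}\right]
      = \left[1 + \sigma^2 r H_0(x)\right]^{-1/\sigma^2}, \\
% The mean hazard is minus the derivative of \ln s(x), which gives (A2):
\bar{\mu}(x) &= -\frac{d}{dx}\ln s(x)
      = \frac{1}{\sigma^2}\cdot\frac{\sigma^2 r\,\mu_0(x)}{1 + \sigma^2 r H_0(x)}
      = \frac{r\,\mu_0(x)}{1 + \sigma^2 r H_0(x)}, \\
% Comparing with (A1) identifies the mean frailty among survivors:
E[z \mid X > x] &= \frac{1}{1 + \sigma^2 r H_0(x)}, \\
% and the homogeneity model is recovered as the frailty variance vanishes:
\lim_{\sigma^2 \to 0} s(x) &= e^{-r H_0(x)}.
\end{align*}
```

The last line shows why ignoring frailty ($\sigma^2 = 0$) reduces the survival function to the proportional hazard form without heterogeneity.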
LITERATURE CITED
AALEN, O., 1998 Heterogeneity in survival analysis. Stat. Med. 7: 1121-1137.
AKEY, J., L. JIN and M. XIONG, 2001 Haplotypes vs single marker linkage disequilibrium tests: What do we gain? Eur. J. Hum. Genet. 9: 291-300.
BELLIZZI, D., G. ROSE, P. CAVALCANTE, G. COVELLO, S. DATO et al., 2005 A novel VNTR enhancer within the SIRT3 gene, a human homologue of SIR2, is associated with survival at oldest ages. Genomics 85: 258-263.
BLANCHE, H., L. CABANNE, M. SAHBATOU and G. THOMAS, 2001 A study of French centenarians: are ACE and APOE associated with longevity? C. R. Acad. Sci. III 324: 129-135.
BONAFE, M., F. MARCHEGIANI, M. CARDELLI, F. OLIVIERI, L. CAVALLONE et al., 2002 Genetic analysis of Paraoxonase (PON1) locus reveals an increased frequency of Arg192 allele in centenarians. Eur. J. Hum. Genet. 10: 292-296.
CHRISTIANSEN, L., L. BATHUM, H. FREDERIKSEN and K. CHRISTENSEN, 2004 Paraoxonase 1 polymorphisms and survival. Eur. J. Hum. Genet. 12: 843-847.
CLARK, A. G., 2004 The role of haplotypes in candidate gene studies. Genet. Epidemiol. 27: 321-333.
DE BENEDICTIS, G., Q. TAN, B. JEUNE, K. CHRISTENSEN, S. V. UKRAINTSEVA et al., 2001 Recent advances in human gene-longevity association studies. Mech. Ageing Dev. 122: 909-920.
DRIVER, C., 2001 The Gompertz function does not measure ageing. Biogerontology 2: 61-65.
EWBANK, D. C., 2002 Mortality differences by APOE genotype estimated from demographic synthesis. Genet. Epidemiol. 22: 146-155.
HEINECKE, J. W., and A. J. LUSIS, 1998 Paraoxonase-gene polymorphisms associated with coronary heart disease: Support for the oxidative damage hypothesis? Am. J. Hum. Genet. 62: 20-24.
HOUGAARD, P., 1991 Modeling heterogeneity in survival analysis. J. Appl. Prob. 28: 695-701.
HURME, M., T. LEHTIMAKI, M. JYLHA, P. J. KARHUNEN and A. HERVONEN, 2005 Interleukin-6 -174G/C polymorphism and longevity: a follow-up study. Mech. Ageing Dev. 126: 417-418.
LEWIS, S. J., and E. J. BRUNNER, 2004 Methodological problems in genetic association studies of longevity - the apolipoprotein E gene as an example. Int. J. Epidemiol. 33: 962-970.
LOUHIJA, J., M. VIITANEN, H. AGUERO-TORRES and R. TILVIS, 2001 Survival in Finnish centenarians in relation to apolipoprotein E polymorphism. J. Am. Geriatr. Soc. 49: 1007-1008.
SCHAID, D. J., 2004 Evaluating associations of haplotypes with traits. Genet. Epidemiol. 27: 348-364.
SCHAID, D. J., C. M. ROWLAND, D. E. TINES, R. M. JACOBSON and G. A. POLAND, 2002 Score tests for association between traits and haplotypes when linkage phase is ambiguous. Am. J. Hum. Genet. 70: 425-434.
TAN, Q., G. DE BENEDICTIS, A. I. YASHIN, M. BONAFE, M. DELUCA et al., 2001 Measuring the genetic influence in modulating human lifespan: gene-environment and gene-sex interactions. Biogerontology 2: 141-153.
TAN, Q., A. I. YASHIN, K. CHRISTENSEN, B. JEUNE, G. DE BENEDICTIS et al., 2004 Multidisciplinary approaches in genetic studies on human aging and longevity. Curr. Genomics 5: 409-416.
VAUPEL, J. W., K. G. MANTON and E. STALLARD, 1979 The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography 16: 439-454.
VAUPEL, J. W., J. R. CAREY, K. CHRISTENSEN, T. E. JOHNSON, A. I. YASHIN et al., 1998 Biodemographic trajectories of longevity. Science 280: 855-860.
WATSON, A. D., J. A. BERLINER, S. Y. HAMA, B. N. LA DU, K. F. FAULL et al., 1995 Protective effect of high density lipoprotein associated paraoxonase. Inhibition of the biological activity of minimally oxidized low density lipoprotein. J. Clin. Invest. 96: 2882-2891.
YASHIN, A. I., G. DE BENEDICTIS, J. W. VAUPEL, Q. TAN, K. F. ANDREEV et al., 1999 Genes, demography, and lifespan: the contribution of demographic data in genetic studies on ageing and longevity. Am. J. Hum. Genet. 65: 1178-1193.
YASHIN, A. I., G. DE BENEDICTIS, J. W. VAUPEL, Q. TAN, K. F. ANDREEV et al., 2000 Genes and longevity: lessons from studies on centenarians. J. Gerontol. 55A: B1-B10.
ZHAO, H., R. PFEIFFER and M. H. GAIL, 2003 Haplotype analysis in population genetics and association studies. Pharmacogenomics 4: 171-178.
ZHAO, J. H., S. LISSARRAGUE, L. ESSIOUX and P. C. SHAM, 2002 GENECOUNTING: haplotype analysis with missing genotypes. Bioinformatics 18: 1694-1695.
Communicating editor: R. W. DOERGE