
Originally published as Genetics Published Articles Ahead of Print on October 11, 2005.

Genetics, Vol. 172, 693-699, January 2006, Copyright © 2006
doi:10.1534/genetics.105.049122

Detection of Genes for Ordinal Traits in Nuclear Families and a Unified Approach for Association Studies

Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, Connecticut 06520-8034

1 Corresponding author: Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT 06520-8034.
E-mail: heping.zhang{at}yale.edu

Manuscript received August 2, 2005. Accepted for publication October 6, 2005.

ABSTRACT

There is growing interest in genomewide association analysis using single-nucleotide polymorphisms (SNPs), because traditional linkage studies are not as powerful in identifying genes for common, complex diseases. Tests for linkage disequilibrium have been developed for binary and quantitative traits. However, since many human conditions and diseases are measured on an ordinal scale, methods need to be developed to investigate the association of genes and ordinal traits. Thus, in the current report we propose and derive a score test statistic that identifies genes that are associated with ordinal traits when gametic disequilibrium between a marker and trait loci exists. Through simulation, the performance of this new test is examined for both ordinal traits and quantitative traits. The proposed statistic not only accommodates and is more powerful for ordinal traits, but also has similar power to that of existing tests when the trait is quantitative. Therefore, our proposed statistic has the potential to serve as a unified approach to identifying genes that are associated with any trait, regardless of how the trait is measured. We further demonstrated the advantage of our test by revealing a significant association (P = 0.00067) between alcohol dependence and a SNP in the growth-associated protein 43.


To identify genes underlying an inheritable disease, it is critical to establish the linkage of the disease locus with a known gene or marker (usually a DNA polymorphism) (SPIELMAN et al. 1993). While classic linkage analysis has been applied successfully in mapping disease genes for many Mendelian diseases and for some complex diseases such as breast cancer (e.g., HALL et al. 1990), major challenges, limitations, and failures remain in using classic linkage analysis to map complex diseases. New techniques and methodologies must be developed to address them and to identify gene–disease associations more accurately. Some challenges that have been studied thus far include population admixture (SPIELMAN et al. 1993) and limited and imprecise information on the density of combination (RABINOWITZ 1997).

Using insulin-dependent diabetes mellitus (IDDM) as the disease of interest, SPIELMAN et al. (1993) proposed a transmission/disequilibrium test (TDT) and demonstrated its power in establishing a strong linkage between 5'-flanking polymorphism on chromosome 11 and the susceptibility to IDDM. Specifically, the TDT compares the frequency of the marker allele of interest transmitted from heterozygous parents to their affected children with that of the nontransmitted marker allele. In contrast to the classic approaches, TDT has two important features. First, it uses an affected offspring in a parent-child trio to serve as his or her own case and control in an artificially created matched pair, thus eliminating the effect of population admixture. Second, it tests for linkage when a population association has been established between alleles at a marker and the trait status.
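The comparison of transmitted and nontransmitted alleles described above is commonly computed as a McNemar-type chi-square statistic. The following is a minimal sketch under that assumption; the function names and the counts in the usage note are hypothetical and are not taken from the article.

```python
from math import erfc, sqrt


def tdt_statistic(b: int, c: int) -> float:
    """McNemar-type TDT chi-square.

    b: heterozygous parents who transmitted allele A to an affected child
    c: heterozygous parents who transmitted allele a to an affected child
    """
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    return (b - c) ** 2 / (b + c)


def chi2_1df_pvalue(stat: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    For Z standard normal, P(Z^2 > s) = erfc(sqrt(s / 2)).
    """
    return erfc(sqrt(stat / 2.0))
```

For example, with hypothetical counts b = 60 and c = 40, `tdt_statistic(60, 40)` is 4.0 and `chi2_1df_pvalue(4.0)` is about 0.0455. Only heterozygous parents enter the counts, because a homozygous parent transmits the same allele regardless of linkage.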

The properties and success of TDT have led to many useful extensions in two major directions. First, efforts have been made to consider data beyond the parent-child trio design. Examples include the use of sibships (SPIELMAN and EWENS 1998) and nuclear families (LUNETTA et al. 2000; RABINOWITZ and LAIRD 2000). Second, the TDT has been extended to deal with quantitative traits (ALLISON 1997; RABINOWITZ 1997). Furthermore, LIU et al. (2002) proposed a unified framework for TDT when the trait distribution belongs to an exponential family.

While methods for linkage and association analysis have been well established for dichotomous and quantitative traits, there is a lack of methodological development in analyzing ordinal traits. As illustrated by ZHANG et al. (2003) and FENG et al. (2004), many human conditions (e.g., cancer and most behavioral and psychiatric disorders) are measured on discrete, ordinal scales. An unnecessary collapse of trait levels could reduce the power in genetic analyses (ZHANG et al. 2003; FENG et al. 2004). Although ZHANG et al. (2003) and FENG et al. (2004) developed a basic framework to conduct segregation and linkage analyses for pedigree data, methods have not been developed for the association analysis of ordinal traits.

The purpose of this study is to develop a score test statistic to detect genes that are associated with an ordinal trait when gametic disequilibrium between marker and trait loci exists. To demonstrate the benefit of this test, we compare the new test statistic with existing test statistics in terms of the type I error approximation and power estimation for ordinal as well as quantitative and binary traits. While the primary motive of the new test is to deal with ordinal traits, the new test becomes the standard TDT when the trait is binary. Our simulations demonstrate that the new test has comparable power to other established tests when the trait is quantitative. Thus, the new test can serve as a unified test for any trait.


METHOD AND MODEL
Suppose that there are n nuclear families and si siblings in the ith family, Formula. Let Formula be the trait vector from the ith family, whose components take values in Formula, where the level is ordered, but is not necessarily on a linear scale. This trait value reflects the severity or stage of a certain condition such as cancer or diabetes.

At trait locus t, we assume that there is a trait increasing allele D, and we use d to denote the wild-type allele(s). Let Formula represent the genotype at trait locus t for the ith family. Let Formula be the number of copies of allele D in genotype Formula.

We consider a diallelic marker with alleles A and a. Let Formula be the marker data. The likelihood contributed by the ith family at locus t is the probability Formula of the observed marker data, given the vector Formula of observed phenotypes, and given that t is the disease locus. As a standard assumption, we assume that this probability is independent among different families. In addition, as in WHITTEMORE and TU (2000), we assume (a) that the trait and marker loci are closely linked such that, given the family's genotypes at a trait locus t, the family's phenotypes and marker genotypes are independent and (b) that given Formula, the traits of the family members are conditionally independent. Thus, we have

Formula
It remains to specify the distribution for the ordinal trait; namely, we assume that it follows the following proportional odds logistic model conditional on the genotype at the trait locus,

Formula
where Formula are ascending level parameters, Formula, and β is the genetic effect. Formula and β are referred to as penetrance parameters. Here, we defined Formula to reflect an additive model, but it can be modified to reflect a dominant or recessive model.
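To make the proportional odds model concrete, the sketch below computes the category probabilities of an ordinal trait given the number of copies g of allele D. The equation itself is not reproduced in this text, so the parameterization used here, logit P(Y ≤ k | g) = α_k − βg for k = 1, ..., K − 1, is an assumption (one common convention), and all names and parameter values are hypothetical.

```python
import math


def expit(z: float) -> float:
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probs(alphas, beta, g):
    """Return [P(Y = 1 | g), ..., P(Y = K | g)] under the ASSUMED model
    logit P(Y <= k | g) = alpha_k - beta * g, with K = len(alphas) + 1.

    alphas: strictly ascending level parameters alpha_1 < ... < alpha_{K-1}
    beta:   genetic effect (additive in g under this assumption)
    g:      number of copies of the trait-increasing allele (0, 1, or 2)
    """
    if any(a2 <= a1 for a1, a2 in zip(alphas, alphas[1:])):
        raise ValueError("level parameters must be strictly ascending")
    # Cumulative probabilities P(Y <= k | g); the last level has probability 1.
    cumulative = [expit(a - beta * g) for a in alphas] + [1.0]
    # Category probabilities are successive differences of the cumulative ones.
    return [cumulative[0]] + [
        cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))
    ]
```

Under this sign convention a positive β shifts probability toward higher trait levels as g increases, which matches the description of D as a trait increasing allele. A dominant or recessive model could be sketched by passing a recoded g (for example 1 if g ≥ 1 else 0).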

Let Formula and Formula be the counts of children whose trait values are greater or less than Formula, respectively, and Formula be the numbers of copies of transmitted A alleles at the marker locus. We show in the APPENDIX that the score statistic to test the null hypothesis that Formula, namely, none of the genes that are linked and in gametic disequilibrium with the marker is associated with the trait, is

Formula
Our test is based on Formula that follows Formula asymptotically. We refer to it as the O-TDT.

It is noteworthy that T belongs to a general class of score statistics Formula, where the weight function Formula depends on the trait distribution. For example, it reduces to the original TDT when Formula and to RABINOWITZ's (1997) test when Formula for a normally distributed quantitative trait, where Formula is the sample average.
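The general class of weighted score statistics can be illustrated in a simplified setting. The sketch below assumes parent-child trios with fully observed parental marker genotypes (one child per family), which is narrower than the nuclear-family setting of the article; the function name, data layout, and variance formula for this special case are assumptions for illustration only.

```python
def weighted_tdt(trios, weight):
    """Chi-square (1 df) statistic U^2 / V for trios.

    trios:  iterable of (father_A, mother_A, child_A, trait), where the first
            three entries count copies of marker allele A (0, 1, or 2)
    weight: function mapping a trait value to its weight w(y)

    Under Mendelian transmission, given the parents, the child's A count has
    mean (father_A + mother_A) / 2 and variance (number of heterozygous
    parents) / 4.
    """
    score = 0.0
    variance = 0.0
    for father_a, mother_a, child_a, trait in trios:
        w = weight(trait)
        expected = (father_a + mother_a) / 2.0
        n_het = (father_a == 1) + (mother_a == 1)
        score += w * (child_a - expected)
        variance += w * w * n_het / 4.0
    if variance == 0.0:
        raise ValueError("no informative families")
    return score ** 2 / variance
```

With the weight `lambda y: 1.0 if y == 1 else 0.0` (trait coded 1 for affected), this sketch reproduces the McNemar form (b − c)² / (b + c) of the original TDT. A mean-centered weight such as `lambda y: y - ybar`, with `ybar` the sample average, gives a quantitative-trait version in the spirit of the weighting described above.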


SIMULATION
Our simulation serves two purposes. First, the type I error of our score test with respect to specific nominal levels (0.05, 0.01, and 0.0001) was assessed to validate the asymptotic behavior of the test statistic in practical settings. Second, we evaluated the power of our test in comparison with other established test statistics.

Simulation experiment design:

The data were generated as follows. First, the parents' genotypes were generated according to specified coefficients of linkage disequilibrium or haplotype frequencies as delineated in Table 1.


TABLE 1

Haplotype frequencies with P(D) = P(A) = 0.3 and δ = 0.11
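The haplotype frequencies implied by the allele frequencies and the disequilibrium coefficient can be sketched as follows, assuming the standard definition δ = P(DA) − P(D)P(A). The function name is hypothetical, and the values it returns are computed from that definition rather than copied from the table.

```python
def haplotype_frequencies(p_d: float, p_a: float, delta: float) -> dict:
    """Two-locus haplotype frequencies from allele frequencies and delta.

    ASSUMES delta = P(DA) - P(D) * P(A), the usual coefficient of linkage
    (gametic) disequilibrium between trait allele D and marker allele A.
    """
    freqs = {
        "DA": p_d * p_a + delta,
        "Da": p_d * (1.0 - p_a) - delta,
        "dA": (1.0 - p_d) * p_a - delta,
        "da": (1.0 - p_d) * (1.0 - p_a) + delta,
    }
    if min(freqs.values()) < 0.0:
        raise ValueError("delta is outside the range allowed by the allele frequencies")
    return freqs
```

With P(D) = P(A) = 0.3 and δ = 0.11, this definition gives approximately 0.20, 0.10, 0.10, and 0.60 for haplotypes DA, Da, dA, and da. Parental genotypes can then be drawn by sampling two haplotypes per parent from these frequencies.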

 
After the parental genotypes were generated, the offspring genotypes were generated depending on the purpose of the simulation. Under the null hypothesis, the trait is not associated with a locus linked to the marker. This is used to assess the type I error. To evaluate the power, the trait and marker loci are 1 cM apart. Finally, conditional on the trait genotype, the trait was generated by two models for different comparison purposes:

  1. A nonproportional odds model was used to generate an ordinal trait. Because our score test was derived from a proportional odds model, we deliberately generated data from nonproportional odds to assess the robustness of our score test with respect to the proportionality assumption.
  2. A Gaussian model was used to generate a quantitative trait to evaluate the performance of the O-TDT for the quantitative trait. Again, the proportionality was not assumed.
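The offspring-generation step can be sketched as below. This is an illustration, not the authors' simulation code: it assumes that a 1 cM distance corresponds to a recombination fraction of about 0.01, represents each haplotype as a (trait allele, marker allele) pair, and uses hypothetical function names.

```python
import random


def transmit(hap1, hap2, theta, rng):
    """Return the gamete one parent passes on.

    hap1, hap2: the parent's two haplotypes, each (trait_allele, marker_allele)
    theta:      recombination fraction between trait and marker loci
    rng:        a random.Random instance
    """
    # Pick which parental haplotype supplies the trait allele.
    first, second = (hap1, hap2) if rng.random() < 0.5 else (hap2, hap1)
    if rng.random() < theta:
        # Recombinant gamete: trait allele from one haplotype, marker from the other.
        return (first[0], second[1])
    return first


def make_child(father, mother, theta, rng):
    """Child genotype as a pair of gametes, one from each parent."""
    return (transmit(father[0], father[1], theta, rng),
            transmit(mother[0], mother[1], theta, rng))
```

A call such as `make_child((("D", "A"), ("d", "a")), (("d", "a"), ("d", "a")), 0.01, random.Random(1))` produces one offspring; the trait value would then be drawn conditional on the child's trait genotype using the chosen penetrance model.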

Type I error comparison:

In Table 2, we compare the nominal levels of type I error with those estimated empirically by the simulation in 10,000 replications when ordinal traits were generated from nonproportional odds models. For the nonproportional odds model, Table 3 delineates the penetrance distribution, namely the distribution of the ordinal trait given the genotype at the trait locus.


TABLE 2

Type I error comparison for ordinal traits

 

TABLE 3

Conditional and marginal distributions for ordinal traits generated from nonproportional odds models

 
In Table 4, we compare the nominal levels of type I error with those estimated empirically by the simulation in 10,000 replications when quantitative traits were generated from a Gaussian distribution. Once the quantitative traits are generated, the observed trait values are regarded as discrete and ordered quantities and hence can be treated as ordinal numbers, allowing the use of O-TDT.


TABLE 4

Type I error comparison for quantitative traits

 
It is clear from Tables 2 and 4 that the empirical type I errors estimated from the simulation replications are numerically close to the nominal levels. On a relative scale, however, some deviations between the empirical and nominal levels of type I errors can be observed for α = 0.0001, but given the very small size of the nominal level such deviations are not unexpected from our 10,000 replications.
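The remark about the smallest nominal level can be checked with a short calculation. Treating each replication as an independent Bernoulli trial (an assumption of this sketch, with a hypothetical function name), the Monte Carlo standard error of an empirical rejection rate is sqrt(α(1 − α)/R).

```python
from math import sqrt


def mc_standard_error(alpha: float, replications: int) -> float:
    """Standard error of an empirical rejection rate when the true rate is alpha."""
    return sqrt(alpha * (1.0 - alpha) / replications)


def expected_rejections(alpha: float, replications: int) -> float:
    """Expected number of rejections under the null hypothesis."""
    return alpha * replications
```

With 10,000 replications, the expected number of rejections at α = 0.0001 is 1, and the standard error is about 0.0001, the same size as the nominal level itself. At α = 0.05 the standard error is about 0.0022, which is small relative to the level.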

Power comparison:

Table 5 compares power of TDT and O-TDT at three significance levels when ordinal traits were generated from nonproportional odds models as described in Simulation experiment design. We do not compare Q-TDT with O-TDT for ordinal traits, because the ordinal scale is not numerically meaningful, and the use of the Q-TDT is not appropriate. Table 6 compares power of Q-TDT and O-TDT at three significance levels when quantitative traits were generated from the model as described in Simulation experiment design.


TABLE 5

Power comparison for ordinal traits that are characterized in Table 3

 

TABLE 6

Power comparison for quantitative traits

 
Table 5 demonstrates that dichotomizing an ordinal trait can lead to a substantial loss of power. Figure 1 highlights the gain of power by O-TDT relative to the use of TDT when ordinal traits were generated from nonproportional odds models. Figure 1 includes two choices of K (4 and 5) and two choices for the number of families (200 and 400).


FIGURE 1.—

Power comparison between O-TDT and TDT for ordinal traits generated from nonproportional odds models.

 


RESULTS
We now test the association between a candidate single-nucleotide polymorphism (SNP) and alcohol dependence. The data were from the Collaborative Study on the Genetics of Alcoholism (COGA), which was a six-center study aimed at identifying susceptibility genes for alcohol dependence (BEGLEITER et al. 1995). There were 143 families with 1614 individuals in the study. We focused on a particular candidate SNP, rs714697, from gene GAP43 (growth-associated protein 43) in the chromosome region 3q13.1. We selected this SNP because a prior study (SAUNDERS et al. 1995) reported that alcohol teratogenesis may be due in part to inhibition of neuronal differentiation by ethanol and that alcohol dose-dependently (0–0.5%) decreased GAP43/B50 protein levels by up to 92% in immature LA-N-5 cells. In addition, another study (BLENNOW 2004) suggested that GAP43 was associated with cerebrospinal fluid level.

The alcohol dependence measure that we analyzed was based on several diagnostic systems, including the Diagnostic and Statistical Manual of Mental Disorders, Ed. 3, Revised (DSM-III-R). This measure was recorded on an ordinal scale with four levels (pure unaffected, never drank, unaffected with some symptoms, and affected).

We applied our test to the ordinal alcohol dependence measure and found a highly significant association (P = 0.00067) between rs714697 and alcohol dependence. However, when we employed a standard TDT by dichotomizing the ordinal alcohol dependence into affected and unaffected, the P-value was 0.01. Thus, the use of the original ordinal scale reveals a much more significant association.


DISCUSSION
It has been observed that traditional linkage studies are not as powerful as association studies for the identification of genes contributing to the risk for common, complex diseases (RISCH and MERIKANGAS 1996). There has been growing interest in genomewide association analyses using SNPs. For example, KLEIN et al. (2005) successfully employed this approach to identify the complement factor H gene on chromosome 1 for age-related macular degeneration, the major cause of blindness in the elderly. Even though many human conditions and diseases are measured on an ordinal scale, the existing analytic approaches were developed for binary or quantitative traits. The main objective of this report is to present a score statistic to detect genes associated with an ordinal trait and demonstrate its power through simulation and real data. We observed that the statistic belongs to a general class of test statistics that have been useful for association analyses of binary and quantitative traits. Through simulation studies, we discovered that our proposed score statistic (i.e., O-TDT) can serve as a unified test for all types of traits (binary, ordinal, or quantitative). The new test is more powerful when the trait is ordinal and comparable to existing tests when the trait is binary or quantitative. The analysis of SNP rs714697 supports the notion that it is highly worthwhile to analyze a trait in its original scale instead of dichotomizing it.

Although we presented the O-TDT test for a diallelic marker, particularly a SNP, the test can be extended for association studies to detect multiple SNPs (HOH et al. 2001) or haplotypes (ZHANG et al. 2004) that may affect the trait.


APPENDIX: SCORE STATISTIC
Given phenotypes, the marker probability is

Formula
where Formula, Formula for Formula, and Formula. Note that

Formula

It is simple to see that Formula. Then,

Formula

Under the null hypothesis that Formula, we have

Formula
and

Formula

For convenience, we drop the two irrelevant parameters in Formula from now on. Therefore,

Formula
Let Formula be the coefficient of linkage disequilibrium. We have

Formula

Formula
and

Formula

Therefore,

Formula
In the absence of association Formula between the marker and trait loci, the score function equals zero under the null hypothesis. However, in the presence of association Formula, ignoring all constants, the score function becomes

Formula
for all families. The foregoing score function depends on the nuisance level parameters Formula. We use the empirical distribution function of the trait values but do not estimate Formula's directly. Hence, we replace Formula with Formula, where Formula and Formula are the counts of children whose trait values are greater or less than Formula, respectively.
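The replacement of the level parameters by empirical counts can be sketched as follows. The exact expression is not reproduced in this text, so this is an assumed form: under a parameterization logit P(Y ≤ k | g) = α_k − βg, the derivative of log P(Y = k | g) with respect to β at β = 0 is g[P(Y < k) − P(Y > k)], and the two probabilities are estimated here by the pooled proportions of children with smaller and larger trait values. The function name, the pooling across families, and the sign are assumptions.

```python
def rank_weights(traits):
    """Empirical weights (n_less - n_greater) / N for each child's trait value.

    traits: trait values of all children, pooled (an assumption of this sketch)
    Returns one weight per child, in the same order as the input.
    """
    n = len(traits)
    if n == 0:
        raise ValueError("no trait values supplied")
    weights = []
    for y in traits:
        n_less = sum(1 for t in traits if t < y)      # children with a smaller value
        n_greater = sum(1 for t in traits if t > y)   # children with a larger value
        weights.append((n_less - n_greater) / n)
    return weights
```

For trait values [1, 2, 2, 4] the weights are [-0.75, 0.0, 0.0, 0.75]. Only the ordering of the trait levels matters, so the weights do not change under any monotone relabeling of the levels, and the overall sign does not affect a squared test statistic.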

Under the null hypothesis, the conditional expectation values Formula and Formula are

Formula
and

Formula

The results of RABINOWITZ and LAIRD (2000) can be applied to estimate Formula, Formula, and Formula and then to obtain Formula and Formula.


ACKNOWLEDGEMENTS
We thank John Myers for his careful reading and comments on an earlier draft of this manuscript. Data were provided by the Collaborative Study on the Genetics of Alcoholism (U10AA008401). This research was supported in part by grants DA012468, DA017713, and DA016750 from the National Institute on Drug Abuse.


LITERATURE CITED

ALLISON, D. B., 1997 Transmission-disequilibrium test for quantitative traits. Am. J. Hum. Genet. 60: 676–690.

BEGLEITER, H., T. REICH, V. HESSELBROCK, B. PORJESZ, T. K. LI et al., 1995 The collaborative study on the genetics of alcoholism. Alcohol Health Res. World 19: 228–236.

BLENNOW, K., 2004 Cerebrospinal fluid protein biomarkers for Alzheimer's disease. J. Am. Soc. Exp. Neurother. 1: 213–225.

FENG, R., J. LECKMAN and H. P. ZHANG, 2004 Linkage analysis of ordinal traits for pedigree data. Proc. Natl. Acad. Sci. USA 101: 16739–16744.

HALL, J. M., M. K. LEE, B. NEWMAN, J. E. MORROW, L. A. ANDERSON et al., 1990 Linkage of early-onset familial breast cancer to chromosome 17q21. Science 250: 1684–1689.

HOH, J., A. J. WILLE and J. OTT, 2001 Trimming, weighting, and grouping SNPs in human case-control association studies. Genome Res. 11: 2115–2119.

KLEIN, R. J., C. ZEISS, E. Y. CHEW, J. Y. TSAI, R. S. SACKLER et al., 2005 Complement factor H polymorphism in age-related macular degeneration. Science 308: 385–389.

LIU, Y., D. TRITCHLER and S. B. BULL, 2002 A unified framework for transmission-disequilibrium test analysis of discrete and continuous traits. Genet. Epidemiol. 22: 26–40.

LUNETTA, K. L., S. V. FARAONE, J. BIEDERMAN and N. M. LAIRD, 2000 Family-based tests of association and linkage that use unaffected sibs, covariates, and interactions. Am. J. Hum. Genet. 66: 605–614.

RABINOWITZ, D., 1997 A transmission disequilibrium test for quantitative trait loci. Hum. Hered. 47: 342–350.

RABINOWITZ, D., and N. LAIRD, 2000 A unified approach to adjusting association tests for population admixture with arbitrary pedigree structure and arbitrary missing marker information. Hum. Hered. 50: 211–223.

RISCH, N., and K. MERIKANGAS, 1996 The future of genetic studies of complex human diseases. Science 273: 1516–1517.

SAUNDERS, D. E., J. H. HANNIGAN, C. S. ZAJAC and N. L. WAPPLER, 1995 Reversal of alcohol's effects on neurite extension and on neuronal GAP43/B50, N-myc, and c-myc protein levels by retinoic acid. Dev. Brain Res. 86: 16–23.

SPIELMAN, R. S., and W. J. EWENS, 1998 A sibship test for linkage in the presence of association: the sib transmission/disequilibrium test. Am. J. Hum. Genet. 62: 450–458.

SPIELMAN, R. S., R. E. MCGINNIS and W. J. EWENS, 1993 Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am. J. Hum. Genet. 52: 506–516.

WHITTEMORE, A. S., and I. P. TU, 2000 Detecting disease genes using family data. I. Likelihood-based theory. Am. J. Hum. Genet. 66: 1328–1340.

ZHANG, H. P., R. FENG and H. T. ZHU, 2003 A latent variable model of segregation analysis for ordinal traits. J. Am. Stat. Assoc. 98: 1023–1034.

ZHANG, K., Z. H. QIN, J. S. LIU, T. CHEN, M. S. WATERMAN et al., 2004 Haplotype block partitioning and tag SNP selection using genotype data and their applications to association studies. Genome Res. 14: 908–916.

Communicating editor: Y.-X. FU