Abstract
The recombination rates in meioses of females and males are often different. Some genes that affect development and behavior in mammals are known to be imprinted, and >1% of all mammalian genes are believed to be imprinted. When the gene is imprinted and the recombination fractions are sex specific, the conventional transmission disequilibrium test (TDT) is shown to be still valid for testing for linkage. The power function of the TDT is derived, and the effect of the degree of imprinting on the power of the TDT is investigated. It is learned that imprinting has little effect on the power when the female and male recombination rates are equal. On the basis of case–parents trios, the transmissions from the heterozygous fathers/mothers to their affected children are separated as paternal and maternal, and two TDT-like statistics, TDTp and TDTm, are consequently constructed. It is found that the TDTp possesses a higher power than the TDT for maternal imprinting genes, and the TDTm is more powerful than the TDT for paternal imprinting genes. On the basis of the parent-of-origin effects test statistic (POET), a novel statistic, TDT incorporating imprinting (TDTI) is proposed to test for linkage in the presence of linkage disequilibrium, which is shown to be more powerful than the TDT when parent-of-origin effects are significant but slightly less powerful than the TDT when parent-of-origin effects are negligible. The validity of the TDT and TDTI is assessed by simulation. The power approximation formulas for the TDT and TDTI are derived and the simulation results show that they are accurate. The simulation study on power comparison shows that the TDTI outperforms the TDT for imprinted genes. The improvement can be substantial in the case of complete paternal/maternal imprinting.
The transmission disequilibrium test (TDT) is a powerful and widely used approach for finding genes underlying complex, common human diseases. It was introduced originally by Spielman et al. (1993) on the basis of case–parents trios, to test directly for linkage between the marker locus and a disease susceptibility locus (DSL) when association due to linkage disequilibrium (LD) is present. The TDT essentially tests for the equality of the expected number of transmissions and that of nontransmissions of a marker allele from heterozygous parents to their affected offspring. When both parents' marker genotypes were unavailable, Spielman and Ewens (1998) extended the TDT to the sibship (S)-TDT for use in sibships with at least one affected individual and one unaffected individual. When only one of the parents was available, Sun et al. (1999) proposed a test, termed the 1-TDT, for use with marker genotypes of the affected individuals and the available parents. Sebastiani et al. (2004) proposed a robust TDT to handle incomplete genotypes on both parents and children, which does not rest on any assumption about a missing data mechanism.
Genomic imprinting, also known as “parent-of-origin effect,” is an important epigenetic factor. More than 1% of all mammalian genes are believed to be imprinted (Morison et al. 2001). Morison et al. (2001) constructed an imprinted-gene database, which contained 489 records at the time of submission (http://www.otago.ac.nz/IGC). The transcriptional activity of an imprinted gene depends on the parent of origin of the allele. Imprinting can happen when either the paternal or the maternal copy of a gene is inactivated. DNA methylation and the differential packing density of DNA by histone proteins are two mechanisms known to be involved in the process of imprinting, which is determined by chromosomal region of DNA or by differences in chromatin structure (Hall 1990; Ainscough and Surani 1996; Bartolomei and Tilghman 1997; Strauch et al. 2000; Knapp and Strauch 2004). Reviews of the mechanisms and function of genomic imprinting can be found in Pfeifer (2000), Reik and Walter (2001), and Wilkins and Haig (2003). Parent-of-origin effects have been demonstrated in several genetic disorders, such as Beckwith–Wiedemann, Prader–Willi, and Angelman syndromes (Falls et al. 1999). Recently, Weinberg (1999) and Zhou et al. (2007) developed tests for the detection of parent-of-origin effects, and Strauch and Baur (2005) reported genetic loci that show parent-of-origin effects in linkage analyses for alcoholism-related traits.
According to a rule by Haldane (1922), it is more common to have crossovers in the homogametic sex (e.g., XX) than in the heterogametic sex (e.g., XY) (Ott 1999, pp. 211–212). In other words, recombination rates are sex specific. The recombination rate for human females is on the average 60% higher than that for human males (Fann and Ott 1995; Broman et al. 1998), but the rates can be highly variable in some regions of the chromosome (Wu et al. 2005). In linkage analysis, sex-specific recombination rates are a consequence of imprinting and Smalley (1993) suggested the utilization of this information for possible identification of traits undergoing imprinting.
A number of investigators worked on the statistical power of the TDT. For example, Xiong and Guo (1998) demonstrated that the power was a function of several genetic parameters including the recombination fraction, penetrance, age of mutant disease allele, marker allele frequency, recurrent mutation rates at the marker and/or disease locus, and initial LD. Knapp (1999) presented a rigorous method for obtaining the power of the TDT for samples consisting of families with either a single affected child or affected sib pairs, based on the asymptotic normality of the maximum-likelihood estimator.
The statistical power of tests for genetic linkage would be affected when the genes are imprinted. Much work has been done recently to incorporate parent-of-origin effects into existing tests for linkage. Weinberg et al. (1998) investigated the effects in genetic studies by a log-linear approach, where some parameters were introduced to describe the parental effects. Knapp and Strauch (2004) derived the asymptotic distribution of the likelihood-ratio test, allowing for imprinting under the null hypothesis of no linkage, by extending Holmans' (1993) possible triangle test for affected sib pairs. Wu et al. (2005) proposed a robust generalized minimax test for linkage, based on alleles that were shared identical by descent by different affected sib pairs, allowing for parent-of-origin effects and sex-specific recombination rates. In the presence of genomic imprinting, the methods for incorporating parent-of-origin effects into linkage analysis of quantitative traits can be found in the work of Hanson et al. (2001), Shete and Amos (2002), and Shete et al. (2003), among others. Strauch (2005) presented an overview of linkage and association methods that take account of imprinting.
Since the recombination fractions are sex specific and imprinting is known or believed to play an important role in many genetically complex traits, such as type I diabetes, polycystic ovarian syndrome (Bennett et al. 1997), atopy, celiac disease, cancer, epilepsy, or bipolar disorder (Knapp and Strauch 2004), we study in this article the linkage analysis based on case–parents trios in the presence of genomic imprinting and sex-specific recombination rates.
In this article, we first show that the conventional TDT can still be used in the case of imprinted genes and sex-specific recombination fractions. On the basis of the normal approximation, the power calculation formula is derived accordingly. The effects of imprinting on the power of the TDT are investigated. It is illustrated that the difference between female and male recombination fractions plays an important role in the evaluation of effects of imprinting on the power of the TDT. The motivation to improve the power of the TDT for imprinted loci is exhibited. On the basis of the stratification of transmitted alleles from heterozygous parent to offspring into paternal and maternal, two TDT-like statistics, TDTp and TDTm, are introduced. A novel test statistic, TDT incorporating imprinting (TDTI) taking parent-of-origin effects into account is proposed, which actually depends on the TDT, the TDTp, and the TDTm. The asymptotic distribution of the TDTI is derived and the power calculation formula is given accordingly. The TDT and TDTI are shown to control the statistical size well. The power comparison of the TDT and TDTI demonstrates that the TDTI has higher power than the TDT in the presence of moderate to large parent-of-origin effects.
METHODS
Background:
Consider the marker locus having two alleles M1 and M2, with population frequencies g and g′ = 1 − g, respectively. Note that M1 and M2 may represent two groups of alleles. It is convenient to use 0, 1, and 2 to denote the genotypes M2M2, M1M2, and M1M1, respectively, where the number indicates how many copies of marker allele M1 the genotype carries. Denote the two alleles at a DSL as D and d, with population frequencies p and q = 1 − p, respectively. To specify the source of alleles D and d, let d/d, D/d, d/D, and D/D denote the four ordered genotypes of a child at a DSL. The allele on the left side of the slash (/) is paternal and the one on the right side is maternal. The corresponding risks of being affected are denoted by ϕd/d, ϕD/d, ϕd/D, and ϕD/D, respectively. The risk for an individual with one copy of D is assumed to be less than that with two copies of D and greater than that with no copies of D. The population disease prevalence is then given as ϕ = p²ϕD/D + pqϕD/d + pqϕd/D + q²ϕd/d. Strauch et al. (2000) introduced the degree of imprinting I = (ϕD/d − ϕd/D)/2, ranging from (ϕd/d − ϕD/D)/2 to (ϕD/D − ϕd/d)/2, to measure parent-of-origin effects. Generally, I < 0 signifies paternal imprinting or maternal expression, and I > 0 signifies maternal imprinting or paternal expression. I = 0 implies that the two heterozygote risks are identical, i.e., no imprinting. There are two extreme cases: one is complete paternal imprinting or complete maternal expression, i.e., ϕD/d = ϕd/d and ϕd/D = ϕD/D, and the other is complete maternal imprinting or complete paternal expression, i.e., ϕD/d = ϕD/D and ϕd/D = ϕd/d.
For a simple representation of some of our results later, let R denote the difference between the sum of two homozygote risks and the sum of two heterozygote risks, i.e., R = ϕD/D − ϕD/d − ϕd/D + ϕd/d, and Δ denote the difference between two ratios P(D | affected child)/P(D | random man) and P(d | affected child)/P(d | random man), where P(D | affected child) represents the probability that a chromosome of an affected child has a disease allele D at a DSL, and the other probabilities P(D | random man), P(d | affected child), and P(d | random man) are similarly defined. It is derived that Δ = (p(2ϕD/D − ϕD/d − ϕd/D) + q(ϕD/d + ϕd/D − 2ϕd/d))/(2ϕ) and this ratio difference Δ is a positive quantity according to the relative magnitude of the four risk parameters.
Denote the three genotypic relative risks (Risch and Merikangas 1996) as γ1p = (ϕD/d)/(ϕd/d), γ1m = (ϕd/D)/(ϕd/d), and γ2 = (ϕD/D)/(ϕd/d). Denote γ1 = (γ1p + γ1m)/2, the average of the two heterozygous genotypic relative risks. It follows immediately that 1 ≤ γ1p, γ1m ≤ γ2, and 1 ≤ γ1 ≤ γ2. On the basis of those three genotypic relative risks, we have Δ = [pγ2 + (q − p)γ1 − q]/(p²γ2 + 2pqγ1 + q²), I/ϕ = (γ1p − γ1m)/[2(p²γ2 + 2pqγ1 + q²)], and R/ϕ = (γ2 − 2γ1 + 1)/(p²γ2 + 2pqγ1 + q²). It is noted in the following text that the power approximation using a normal distribution depends on the values of three genotypic relative risks γ1p, γ1m, and γ2, not directly on the original four risks ϕd/d, ϕD/d, ϕd/D, and ϕD/D. This facilitates the choice of the risks in the simulation studies. For simplicity, the homozygote risk ϕD/D is set to 0.8 in the simulation study.
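The three summary quantities above are easy to evaluate numerically. The following is a minimal sketch, assuming only the formulas stated in this paragraph; the function name `risk_summaries` and the example risk values are illustrative and do not come from the article.

```python
def risk_summaries(p, gamma_1p, gamma_1m, gamma_2):
    """Return Delta, I/phi and R/phi from the genotypic relative risks."""
    q = 1.0 - p
    gamma_1 = (gamma_1p + gamma_1m) / 2.0  # average heterozygote relative risk
    # Prevalence divided by phi_{d/d}: p^2*gamma_2 + 2pq*gamma_1 + q^2
    scaled_prevalence = p**2 * gamma_2 + 2 * p * q * gamma_1 + q**2
    return {
        "Delta": (p * gamma_2 + (q - p) * gamma_1 - q) / scaled_prevalence,
        "I_over_phi": (gamma_1p - gamma_1m) / (2 * scaled_prevalence),
        "R_over_phi": (gamma_2 - 2 * gamma_1 + 1) / scaled_prevalence,
    }


# Made-up risks for illustration: phi_d/d = 0.1, phi_D/d = 0.3, phi_d/D = 0.5, phi_D/D = 0.8
summary = risk_summaries(p=0.2, gamma_1p=3.0, gamma_1m=5.0, gamma_2=8.0)
print(summary)
```

Working with relative risks rather than the four raw risks keeps the inputs scale free, which is the property the text relies on when choosing simulation settings.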
For a given γ2, γ1 can take any value ranging from 1 to γ2. Denote an arbitrary γ1 in [1, γ2] as 1 + β(γ2 − 1) with β ∈ [0, 1]. β = 0 (γ1 = 1) means that the mode of inheritance is recessive, β = 1/2 (γ1 = (1 + γ2)/2) and I = 0 mean that it is additive (Knapp 1999), and β = 1 (γ1 = γ2) means that it is dominant. Either the recessive or the dominant mode implies that the degree of imprinting is zero. For illustration, we choose β to be 1/4, 1/2, and 3/4, or equivalently γ1 to be (3 + γ2)/4, (1 + γ2)/2, and (1 + 3γ2)/4, equally spaced between 1 and γ2. When γ1 (β) is given, we can derive the range of the degree of imprinting I from the diamond of inheritance (Strauch et al. 2000) as follows: [ϕd/d min(β, 1 − β)(1 − γ2), ϕd/d min(β, 1 − β)(γ2 − 1)]. In particular, when γ1 = (3 + γ2)/4 (β = 1/4) or γ1 = (1 + 3γ2)/4 (β = 3/4), ϕd/d(1 − γ2)/4 ≤ I ≤ ϕd/d(γ2 − 1)/4; when γ1 = (1 + γ2)/2 (β = 1/2), ϕd/d(1 − γ2)/2 ≤ I ≤ ϕd/d(γ2 − 1)/2.
Let θf and θm be, respectively, the female and male recombination rates in meioses between the marker locus and a DSL and θ = (θf + θm)/2 denote the sex-averaged recombination rate. It is obvious that the null hypothesis of no linkage is equivalent to θf = θm = 1/2; i.e., θ = 1/2. It is reported that θf > θm for most chromosomal regions, and the recombination rate for human females is greater than that for human males by an average of 60% (Fann and Ott 1995; Broman et al. 1998). The coefficient of LD between the marker locus and a DSL is δ = P(DM1) − pg, where P(DM1) is the frequency of haplotype DM1. The four population haplotype frequencies are then, respectively, P(DM1) = pg + δ, P(DM2) = pg′ − δ, P(dM1) = qg − δ, and P(dM2) = qg′ + δ. The marker allele M1 in this article is taken to be positively associated with the disease allele D; i.e., δ > 0. Note that the replacement of M1 by M2 will change the sign of δ but not its magnitude.
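As a quick check on the haplotype parameterization, the sketch below builds the four haplotype frequencies from p, g, and δ and rejects values of δ that would make a frequency negative. The function names are illustrative; the bound δmax = min(p(1 − g), g(1 − p)) is the one quoted later in the text from Deng and Chen (2001).

```python
def delta_max(p, g):
    """Largest positive LD coefficient compatible with allele frequencies p and g."""
    return min(p * (1 - g), g * (1 - p))


def haplotype_frequencies(p, g, delta):
    """Frequencies of haplotypes DM1, DM2, dM1, dM2 given the LD coefficient delta."""
    q, g_prime = 1 - p, 1 - g
    freqs = {
        "DM1": p * g + delta,
        "DM2": p * g_prime - delta,
        "dM1": q * g - delta,
        "dM2": q * g_prime + delta,
    }
    if min(freqs.values()) < 0:
        raise ValueError("delta is outside the admissible range for these allele frequencies")
    return freqs


# Setting used in the simulation section: g = 0.4 and delta = 0.9 * delta_max
freqs = haplotype_frequencies(p=0.1, g=0.4, delta=0.9 * delta_max(0.1, 0.4))
print(freqs)
```

The marginal sums recover the allele frequencies: DM1 + DM2 gives p and DM1 + dM1 gives g.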
Hardy–Weinberg equilibrium in the parental generation is assumed throughout this article, and the frequencies of the three genotypes 0, 1, and 2 at the marker locus are then g′², 2gg′, and g², respectively. For n independent families, each one is characterized by the marker genotype trio FMC, where F, M, and C represent the marker genotype of the father, mother, and affected child, respectively. On the basis of the marker genotype trio, all the families are classified into 15 categories. Let NFMC denote the number of families that fall into category FMC (see Table 1 for more details). We are interested in the conditional probability of the family in the jth category given that the child is a case, which is denoted by sj (1 ≤ j ≤ 15). Bayes' theorem is employed to get these conditional probabilities and the detailed expressions of sj (1 ≤ j ≤ 15) are reported in Zhou et al. (2007). It is noted that {NFMC} follows a multinomial distribution with parameters (n, s1,…, s15).
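The count of 15 categories follows from Mendelian consistency alone. The sketch below enumerates every (F, M, C) trio in which the child's genotype can arise from the parents' genotypes; the ordering it produces is an assumption and need not match the ordering used in Table 1.

```python
from itertools import product

# Copies of M1 that a parent with marker genotype 0, 1, or 2 can pass on
TRANSMITTABLE = {0: (0,), 1: (0, 1), 2: (1,)}


def trio_categories():
    """List all Mendelian-consistent (father, mother, child) marker genotype trios."""
    categories = []
    for father, mother in product((0, 1, 2), repeat=2):
        child_genotypes = {a + b for a in TRANSMITTABLE[father] for b in TRANSMITTABLE[mother]}
        categories.extend((father, mother, child) for child in sorted(child_genotypes))
    return categories


categories = trio_categories()
print(len(categories))
```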
For 1 ≤ j ≤ 15, let us consider an arbitrary family in the jth category. It is feasible to count the times marker allele M1 was transmitted and not transmitted from the heterozygous parents to their affected child, which are denoted, respectively, by $u_j$ and $v_j$. Moreover, let $u_{pj}$/$v_{pj}$ ($u_{mj}$/$v_{mj}$) denote the number of times that marker allele M1 was transmitted/not transmitted from the heterozygous father (mother) to the affected offspring, and let $w_{pj}$ be 1 if the father carries more copies of M1 than the mother does and 0 otherwise and $w_{mj}$ be 1 if the mother carries more copies of M1 than the father does and 0 otherwise. The detailed values of $u_{pj}$, $v_{pj}$, $u_{mj}$, $v_{mj}$, $w_{pj}$, and $w_{mj}$ (1 ≤ j ≤ 15) are shown in Table 1.
Table 1. Classification of all nuclear families each with a single affected child, together with the corresponding conditional probabilities of the family in each category, given that the child is a case
TDT with imprinting:
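The total numbers of times allele M1 is transmitted and not transmitted from heterozygous parents to their offspring among n independent case–parents trios can be expressed as $u^T N$ and $v^T N$, respectively, where $u = (u_1, \ldots, u_{15})^T$, $v = (v_1, \ldots, v_{15})^T$, and $N = (N_1, \ldots, N_{15})^T$ is the vector of the 15 category counts NFMC ordered as in Table 1.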
So the TDT statistic can be expressed as
$$\mathrm{TDT} = \frac{(u - v)^T N}{\sqrt{(u + v)^T N}}. \qquad (1)$$
Note that the conventional TDT (Spielman et al. 1993) is actually the square of the right side of Equation 1. It is concluded in the appendix that the asymptotic distribution of the TDT (1) is a normal distribution with mean $\sqrt{n}\,\mu$ and variance σ²; see Corollary 1 in the appendix for the detailed expressions of the mean μ and variance σ². Furthermore, Corollary 2 shows that this asymptotic distribution is a standard normal under the null hypothesis of no linkage. So, in the presence of linkage disequilibrium, the TDT (1) can still be used to test for linkage when the genes are imprinted and when the recombination fractions are sex specific. The validity of the TDT will be verified in the simulation study by checking the values of the empirical type I error rates.
For a given significance level α, the rejection region of the two-sided test of the null hypothesis of no linkage is |TDT| > zα/2, where zα/2 is the upper α/2 point of a standard normal distribution (e.g., when α = 0.05, zα/2 = 1.96). Thus we have the power approximation formula,
$$P(|\mathrm{TDT}| > z_{\alpha/2}) \approx P\!\left(Z > \frac{z_{\alpha/2} - \sqrt{n}\,\mu}{\sigma}\right) + P\!\left(Z < \frac{-z_{\alpha/2} - \sqrt{n}\,\mu}{\sigma}\right), \qquad (2)$$
where Z is a standard normal random variable. The accuracy of the power approximation Equation 2 will be validated in the simulation study. Note that the principle of the power approximation Equation 2 is the same as that in the first approximation method developed in Knapp (1999). We can also employ the second approximation method in Knapp (1999) and the χ²-approximation method in Deng and Chen (2001) to estimate the power of the TDT. Our simulation results show that these three methods have equally good performance.
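The normal approximation to the power can be evaluated directly once μ and σ are known. The sketch below assumes that the TDT is approximately normal with mean √n·μ and standard deviation σ, and it treats μ and σ as inputs, since their expressions are given in the appendix and are not reproduced here. The function name `tdt_power` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def tdt_power(n, mu, sigma, alpha=0.05):
    """Two-sided power of |TDT| > z_{alpha/2} when TDT ~ N(sqrt(n) * mu, sigma^2)."""
    std = NormalDist()
    z = std.inv_cdf(1 - alpha / 2)              # upper alpha/2 point, about 1.96 for alpha = 0.05
    shift = sqrt(n) * mu
    upper = 1 - std.cdf((z - shift) / sigma)    # P(Z > (z - sqrt(n) mu) / sigma)
    lower = std.cdf((-z - shift) / sigma)       # P(Z < (-z - sqrt(n) mu) / sigma)
    return upper + lower


# Under the null hypothesis (mu = 0, sigma = 1) the power reduces to the size alpha
print(tdt_power(n=100, mu=0.0, sigma=1.0))
```

When √n·|μ| is large, one of the two tail terms is essentially zero, which is the simplification used later to obtain a closed-form sample size.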
Imprinting effects on the power of the TDT:
It is interesting to find from the expressions of the mean and variance of the TDT (1) that the degree of imprinting I is always accompanied by the difference between the sex-specific recombination fractions θf − θm, which implies that this difference is vital in the evaluation of imprinting effects on the power of the TDT. One direct consequence is that the power of the TDT depends only on the sum of two heterozygote risks ϕD/d + ϕd/D, but not on their difference, or equivalently the degree of imprinting I, when the female and male recombination rates are identical (θf = θm).
To investigate extensively the imprinting effects on the power of the TDT, various sets of parameter values have been selected for power calculations. According to the numerical results, the variances σ² are very close to one and the value of |μ| is so large that either the first or the second probability on the right-hand side of Equation 2 is almost zero. So the required sample size n to achieve a given power, e.g., 80%, assuming no imprinting can be solved through 2δ²Δ²(1 − 2θ)²n = (z0.025 + z0.2)²(2gg′ + δΔ(1 − 2g)). It follows immediately that the analytical power of the TDT with this sample size n when the degree of imprinting I varies is
$$\Phi\!\left(z_{0.2} + \frac{(\theta_f - \theta_m)(z_{0.025} + z_{0.2})}{1 - 2\theta}\cdot\frac{I}{\phi\Delta}\right), \qquad (3)$$
where Φ(·) is the cumulative distribution function of a standard normal random variable. It is clear from Equation 3 that the power of the TDT is monotonic in I.
Note that ϕΔ = ϕd/d(γ2 − 1)(βq + (1 − β)p) and the range of the degree of imprinting I is [ϕd/d min(β, 1 − β)(1 − γ2), ϕd/d min(β, 1 − β)(γ2 − 1)]. So the range of I/(ϕΔ) in Equation 3 is independent of γ2, δ, and g, which implies that the pattern of the analytical power of the TDT associated with different values of γ2, δ, and g vs. the degree of imprinting I remains almost the same except for the different scaling of I. This property is also observed while plotting the graphs of the analytical power (calculated from Equation 2), having different parameter values, against the degree of imprinting. Thus, it could be sufficient to show only the graphs corresponding to, for example, γ2 = 4, δ = 0.9δmax = 0.9 min(p(1 − g), g(1 − p)) (Deng and Chen 2001), and g = 0.4. We compare the powers calculated, respectively, by Equations 2 and 3 with various sets of parameter values and find that they are almost identical.
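The sample-size relation quoted above, 2δ²Δ²(1 − 2θ)²n = (z0.025 + z0.2)²(2gg′ + δΔ(1 − 2g)), can be solved for n in one line. The following is a sketch under the same no-imprinting assumption; the function name and the example parameter values are illustrative, and the result is left unrounded.

```python
from statistics import NormalDist


def tdt_sample_size(delta, Delta, theta, g, power=0.8, alpha=0.05):
    """Number of trios solving 2 d^2 D^2 (1-2t)^2 n = (z_a + z_b)^2 (2 g g' + d D (1-2g))."""
    std = NormalDist()
    z_sum = std.inv_cdf(1 - alpha / 2) + std.inv_cdf(power)   # z_{0.025} + z_{0.2} by default
    numerator = z_sum**2 * (2 * g * (1 - g) + delta * Delta * (1 - 2 * g))
    denominator = 2 * delta**2 * Delta**2 * (1 - 2 * theta) ** 2
    return numerator / denominator


# Illustrative additive setting: p = 0.1, gamma_2 = 4, gamma_1 = 2.5, hence Delta = 1.5 / 1.3
n_required = tdt_sample_size(delta=0.054, Delta=1.5 / 1.3, theta=0.015, g=0.4)
print(n_required)
```

In practice the value would be rounded up to the next integer number of families.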
Moreover, when γ1 = (1 + γ2)/2 (β = 1/2), we have ϕΔ = ϕd/d(γ2 − 1)/2, which is independent of p. So the power of the TDT is independent of p if γ1 = (1 + γ2)/2 (the additive mode of inheritance). Generally, for any given γ2 and γ1 = 1 + β(γ2 − 1), the power of the TDT, as a function of the degree of imprinting I ranging from ϕd/d min(β, 1 − β)(1 − γ2) to ϕd/d min(β, 1 − β)(γ2 − 1), attains its minimum and maximum, Φ(z0.2 − min(β, 1 − β)(θf − θm)(z0.025 + z0.2)/((1 − 2θ)(βq + (1 − β)p))) and Φ(z0.2 + min(β, 1 − β)(θf − θm)(z0.025 + z0.2)/((1 − 2θ)(βq + (1 − β)p))), at the two boundary points of I, respectively, under the situation of θf > θm. It can be proved that min(β, 1 − β)/(βq + (1 − β)p) is a monotonically increasing function of β for 0 ≤ β ≤ 1/2 and is a monotonically decreasing function of β when 1/2 ≤ β ≤ 1. It follows at once that the minimum and maximum powers over β and I are attained at β = 1/2 and I = ±ϕd/d(1 − γ2)/2 and are Φ(z0.2 − (θf − θm)(z0.025 + z0.2)/(1 − θf − θm)) and Φ(z0.2 + (θf − θm)(z0.025 + z0.2)/(1 − θf − θm)), respectively, which depend only on the two sex-specific recombination fractions. Note that β = 1/2 and I = ϕd/d(1 − γ2)/2 if and only if γ1p = 1 and γ1m = γ2, i.e., complete paternal imprinting, and β = 1/2 and I = ϕd/d(γ2 − 1)/2 if and only if γ1p = γ2 and γ1m = 1, i.e., complete maternal imprinting. Thus the analytical power attains its minimum and maximum in the cases of complete paternal imprinting and complete maternal imprinting, respectively.
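The bounds above rest on a short identity, sketched here from the definitions given earlier: γ1 = 1 + β(γ2 − 1), ϕ = ϕd/d(p²γ2 + 2pqγ1 + q²), and the expression of Δ in terms of the genotypic relative risks.

$$
\phi\Delta = \phi_{d/d}\,[\,p\gamma_2 + (q - p)\gamma_1 - q\,] = \phi_{d/d}\,[\,p(\gamma_2 - 1) + (q - p)\beta(\gamma_2 - 1)\,] = \phi_{d/d}(\gamma_2 - 1)\,[\,\beta q + (1 - \beta)p\,]
$$

$$
\left.\frac{I}{\phi\Delta}\right|_{I = \pm\phi_{d/d}\min(\beta,\,1-\beta)(\gamma_2 - 1)} = \pm\frac{\min(\beta,\,1 - \beta)}{\beta q + (1 - \beta)p}, \qquad \text{which equals } \pm 1 \text{ at } \beta = 1/2 .
$$

At β = 1/2 the denominator is (p + q)/2 = 1/2 whatever the value of p, which is why the extreme powers depend only on θf and θm.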
The interval [Φ(z0.2 − (θf − θm)(z0.025 + z0.2)/(1 − θf − θm)), Φ(z0.2 + (θf − θm)(z0.025 + z0.2)/(1 − θf − θm))] is termed the maximum range of imprinting effects on the power of the TDT. For example, when θf = 0.146 and θm = 0.084, which are the sex-specific recombination fractions between ABO and the locus for the nail-patella syndrome (NPS1) (Ott 1999), the maximum range of imprinting effects on the power of the TDT is [0.7311, 0.8571]; when θf = 0.05 and θm = 0.02, the maximum range of imprinting effects is [0.7737, 0.8243]; when θf = 0.01 and θm = 0.006, the maximum range is [0.7968, 0.8032]. The parent-of-origin effect on the power of the TDT is negligible in the last case.
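The three intervals quoted above can be recomputed from the two boundary expressions. The sketch below assumes a target power of 80% and a two-sided level of 0.05, as in the text; the function name is illustrative.

```python
from statistics import NormalDist


def max_imprinting_range(theta_f, theta_m, power=0.8, alpha=0.05):
    """Minimum and maximum analytical TDT power over the degree of imprinting."""
    std = NormalDist()
    z_alpha = std.inv_cdf(1 - alpha / 2)   # z_{0.025} when alpha = 0.05
    z_beta = std.inv_cdf(power)            # z_{0.2} when power = 0.8
    shift = (theta_f - theta_m) * (z_alpha + z_beta) / (1 - theta_f - theta_m)
    return std.cdf(z_beta - shift), std.cdf(z_beta + shift)


for theta_f, theta_m in [(0.146, 0.084), (0.05, 0.02), (0.01, 0.006)]:
    low, high = max_imprinting_range(theta_f, theta_m)
    print(f"theta_f={theta_f}, theta_m={theta_m}: [{low:.4f}, {high:.4f}]")
```

The printed intervals should agree with those quoted in the text up to rounding, and they collapse to the single value 0.8 when θf = θm.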
In short, to assess thoroughly the parent-of-origin effects on the power of the TDT, it is first advised to draw the graph of the power (Equation 3) vs. the degree of imprinting with γ2 = 4 and β = 1/2 (for which the power does not depend on p), which exhibits the maximum range of imprinting effects on the power of the TDT. If it is observed from the graph that the imprinting effect on the power of the TDT is substantial, then more graphs for specific parameter values may be needed. This greatly facilitates the assessment of the effect of genomic imprinting on the power of the TDT, since many parameters are involved. The parent-of-origin effects on the power depend largely on the female and male recombination fractions and are negligible when their difference is small, e.g., both rates <1 cM.
Improve the power of the test:
The TDT (1) essentially tests for the equality of the expected number of transmissions of allele M1 from heterozygous parents to the affected children and that of nontransmissions (Spielman et al. 1993). Taking parent-of-origin effects into account, it is natural to stratify the transmission/nontransmission numbers according to whether the father or the mother is the source.
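In fact, the difference between the probability of the marker allele M1 being transmitted from the heterozygous father to the affected child and the probability of the marker allele M1 not being transmitted (equivalent to M2 being transmitted) from the heterozygous father to the affected child can be given as
$$\frac{\delta(1 - 2\theta_m)\,[\,p(\phi_{D/D} - \phi_{d/D}) + q(\phi_{D/d} - \phi_{d/d})\,]}{\phi}, \qquad (4)$$
and similarly we have, for the heterozygous mother,
$$\frac{\delta(1 - 2\theta_f)\,[\,p(\phi_{D/D} - \phi_{D/d}) + q(\phi_{d/D} - \phi_{d/d})\,]}{\phi}. \qquad (5)$$
The details of the derivation of Equations 4 and 5 are sketched in the appendix. It follows immediately from Equations 4 and 5 that the difference between the probabilities of M1 being transmitted and not being transmitted is
$$2\delta\left[(1 - 2\theta)\Delta + (\theta_f - \theta_m)\frac{I}{\phi}\right]. \qquad (6)$$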
When the two heterozygous genotypic risks ϕD/d and ϕd/D vary between ϕd/d and ϕD/D and the other parameter values remain unchanged, moving the degree of imprinting I from its leftmost point to its rightmost point, i.e., from the case of complete paternal imprinting to the case of complete maternal imprinting, changes the quantity p(ϕD/D − ϕd/D) + q(ϕD/d − ϕd/d) from 0 to its maximum ϕD/D − ϕd/d, while the quantity p(ϕD/D − ϕD/d) + q(ϕd/D − ϕd/d) changes from its maximum ϕD/D − ϕd/d to 0. Equivalently speaking, in the case of complete maternal imprinting, the difference between the numbers of marker allele M1 being transmitted and nontransmitted from the heterozygous fathers to the children would provide strong evidence for linkage, while the difference between the numbers of marker allele M1 being transmitted and nontransmitted from the heterozygous mothers to the children would provide weak evidence for linkage. In the case of complete paternal imprinting, the situation is the reverse. So it is expected that the TDTp, the paternal version of the TDT, would provide more evidence for linkage when the degree of imprinting is large and positive. Similarly, the TDTm, the maternal version of the TDT, would provide more evidence for linkage when the degree of imprinting is large and negative. Detailed expressions of the TDTp and the TDTm are given in the following section. A suitable combination of the TDT, the TDTp, and the TDTm is expected to be more powerful than the TDT.
When there is little imprinting, the TDT is expected to be powerful in conducting linkage analysis. It is also understood from Equation 6 that when θf = θm, the difference between probabilities of the marker allele M1 being transmitted and not being transmitted from heterozygous parents to offspring is independent of the difference of two heterozygous genotype risks ϕD/d and ϕd/D, but depends on the sum of these two risks. So it again implies that the power of the TDT is independent of the degree of imprinting I when θf = θm.
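A short worked step shows why the imprinting term is tied to θf − θm. Write the heterozygote risks as ϕD/d = S/2 + I and ϕd/D = S/2 − I, where S = ϕD/d + ϕd/D is a symbol introduced here only for illustration. The sketch assumes that the paternal and maternal transmission differences equal δ(1 − 2θm)/ϕ and δ(1 − 2θf)/ϕ times the two quantities discussed above.

$$
p(\phi_{D/D} - \phi_{d/D}) + q(\phi_{D/d} - \phi_{d/d}) = \phi\Delta + I, \qquad
p(\phi_{D/D} - \phi_{D/d}) + q(\phi_{d/D} - \phi_{d/d}) = \phi\Delta - I
$$

$$
\frac{\delta}{\phi}\left[(1 - 2\theta_m)(\phi\Delta + I) + (1 - 2\theta_f)(\phi\Delta - I)\right]
= 2\delta\left[(1 - 2\theta)\Delta + (\theta_f - \theta_m)\frac{I}{\phi}\right]
$$

The coefficients of ϕΔ add up to 2(1 − 2θ) because θf + θm = 2θ, while the coefficients of I differ by 2(θf − θm), so the term in I vanishes exactly when θf = θm.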
Proposed test statistic:
Note that the numbers of paternal transmission and paternal nontransmission of the marker allele M1 are, respectively, Tp = N101 + N122 + N112 and NTp = N100 + N121 + N110, and the numbers of maternal transmission and maternal nontransmission of the marker allele M1 are, respectively, Tm = N212 + N011 + N112 and NTm = N211 + N010 + N110; see Table 1 for details. The cell N111 is ignored in counting Tp, NTp, Tm, and NTm because the origin of M1 in the affected child is ambiguous. The numbers of total transmission and nontransmission are then, respectively, T = Tp + Tm + N111 and NT = NTp + NTm + N111. Using the constant vectors $u_p$, $v_p$, $u_m$, and $v_m$ with components given in Table 1, the TDTp and the TDTm can be expressed, respectively, as
$$\mathrm{TDT}_p = \frac{(u_p - v_p)^T N}{\sqrt{(u_p + v_p)^T N}} = \frac{T_p - NT_p}{\sqrt{T_p + NT_p}}, \qquad (7)$$
$$\mathrm{TDT}_m = \frac{(u_m - v_m)^T N}{\sqrt{(u_m + v_m)^T N}} = \frac{T_m - NT_m}{\sqrt{T_m + NT_m}}. \qquad (8)$$
The asymptotic distributions of the TDTp and the TDTm are given in the appendix.
Zhou et al. (2007) proposed the parent-of-origin effects test statistic (POET) to test for parent-of-origin effects, which essentially tests for the equality of the expected numbers of two groups of families, the first one comprising the families in which the father carries more copies of M1 than the mother does, and the second group comprising the families in which the mother carries more copies of M1 than the father does. The POET (Zhou et al. 2007) can be expressed as
$$\mathrm{POET} = \frac{(w_p - w_m)^T N}{\sqrt{(w_p + w_m)^T N}}, \qquad (9)$$
where $w_p$ and $w_m$ are the indicator vectors of the first and the second group of families, respectively, with components given in Table 1. The asymptotic distribution of the POET is given in Equation A4 and is a standard normal distribution under the null hypothesis of no imprinting. On the basis of Equation A4, a large positive value of the POET suggests maternal imprinting, or equivalently paternal expression, and a large negative value of the POET suggests paternal imprinting, or equivalently maternal expression.
In conducting linkage analysis, it is discussed in the previous section that the TDTp could be more powerful than the TDT for a large positive value of I, and the TDTm could be more powerful than the TDT for a large negative value of I. When the degree of imprinting is small or zero, we retain the TDT itself, which is commonly used and is powerful for testing for linkage. Thus, we propose an extension of the transmission disequilibrium test incorporating imprinting (TDTI)
$$\mathrm{TDTI} = \mathrm{TDT}_m \cdot I\{\mathrm{POET} < -z_{\alpha'/2}\} + \mathrm{TDT} \cdot I\{|\mathrm{POET}| \le z_{\alpha'/2}\} + \mathrm{TDT}_p \cdot I\{\mathrm{POET} > z_{\alpha'/2}\}, \qquad (10)$$
where I{comparison statement} = 1 when the comparison statement holds and is 0 otherwise, and α′ is the significance level in testing for imprinting by the POET. It is proved in the appendix that all the joint distributions of the POET and the TDTm/TDT/TDTp are asymptotically normal. So the asymptotic distribution of the TDTI is a mixture of three two-dimensional normal distributions. Furthermore, under the null hypothesis of no linkage, the POET and the TDTm/TDT/TDTp are asymptotically independent and the TDTI is asymptotically distributed as N(0, 1).
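For a significance level α, the rejection region about the null hypothesis of no linkage in the presence of linkage disequilibrium, i.e., θf = θm = 1/2 with δ ≠ 0, can be expressed as |TDTI| > zα/2 and the power of the TDTI is P(|TDTI| > zα/2). This significance level α may be different from α′ on the right side of Equation 10. For simplicity, α and α′ are taken to be 0.05 in our simulation study. The power approximation formula for the TDTI is then
$$P(|\mathrm{TDT}_m| > z_{\alpha/2},\ \mathrm{POET} < -z_{\alpha'/2}) + P(|\mathrm{TDT}| > z_{\alpha/2},\ |\mathrm{POET}| \le z_{\alpha'/2}) + P(|\mathrm{TDT}_p| > z_{\alpha/2},\ \mathrm{POET} > z_{\alpha'/2}), \qquad (11)$$
which is asymptotically the sum of three probabilities, each computed from a two-dimensional normal distribution.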
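The selection rule behind the TDTI can be sketched from a table of category counts keyed by the genotype trio "FMC". The transmission counts follow the cell lists given in this section. The POET is computed here as a signed square-root statistic on the two family groups, which is an assumption about its exact form; Zhou et al. (2007) should be consulted for the authoritative definition. Function and variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

FATHER_MORE = ("100", "101", "201", "211", "212")   # father carries more copies of M1
MOTHER_MORE = ("010", "011", "021", "121", "122")   # mother carries more copies of M1


def signed_root(a, b):
    """(a - b) / sqrt(a + b), or 0.0 when there are no informative counts."""
    return (a - b) / sqrt(a + b) if a + b > 0 else 0.0


def tdti_statistics(counts, alpha_prime=0.05):
    """Return TDT, TDTp, TDTm, POET (assumed form) and the selected TDTI."""
    def n(key):
        return counts.get(key, 0)

    t_p, nt_p = n("101") + n("122") + n("112"), n("100") + n("121") + n("110")
    t_m, nt_m = n("212") + n("011") + n("112"), n("211") + n("010") + n("110")
    t, nt = t_p + t_m + n("111"), nt_p + nt_m + n("111")
    stats = {
        "TDT": signed_root(t, nt),
        "TDTp": signed_root(t_p, nt_p),
        "TDTm": signed_root(t_m, nt_m),
        "POET": signed_root(sum(map(n, FATHER_MORE)), sum(map(n, MOTHER_MORE))),
    }
    cutoff = NormalDist().inv_cdf(1 - alpha_prime / 2)
    if stats["POET"] > cutoff:        # evidence of maternal imprinting: use paternal transmissions
        stats["TDTI"] = stats["TDTp"]
    elif stats["POET"] < -cutoff:     # evidence of paternal imprinting: use maternal transmissions
        stats["TDTI"] = stats["TDTm"]
    else:                             # no clear parent-of-origin effect: keep the TDT
        stats["TDTI"] = stats["TDT"]
    return stats


# Made-up counts for illustration only
example = tdti_statistics({"101": 30, "100": 10, "011": 12, "010": 12, "111": 8})
print(example)
```

Linkage would then be declared when |TDTI| exceeds zα/2, with α chosen separately from α′.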
SIMULATION RESULTS
There are a lot of parameters involved in the simulation. In the following simulation study, we fix the marker allele frequency g at 0.4, the homozygote risk ϕD/D at 0.8, the significance level α at 0.05, and the α′ in Equation 10 at 0.05. The other parameters take some discrete values or a range of interval, and the coefficient of LD is taken as δ = 0.9 min(0.6p, 0.4(1 − p)). The actual sizes/powers of the TDT and TDTI are estimated as the proportions of rejecting the null hypothesis of no linkage in 10,000 replicates performed under the null/alternative hypothesis.
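One way to generate case–parents trios under this model is rejection sampling: draw parental haplotypes under Hardy–Weinberg equilibrium, form gametes with sex-specific recombination, and keep the family only if the child is affected. The sketch below is an assumption about how such a simulation could be organized, not the authors' program; all names are illustrative, and the risks are keyed by the ordered DSL genotype with the paternal allele on the left.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_trio(rng, p, g, delta, theta_f, theta_m, risks):
    """Rejection-sample one case-parents trio; returns marker genotypes (F, M, C)."""
    q, g_prime = 1 - p, 1 - g
    haplotypes = [("D", 1), ("D", 0), ("d", 1), ("d", 0)]   # (DSL allele, copies of M1)
    weights = [p * g + delta, p * g_prime - delta, q * g - delta, q * g_prime + delta]

    def gamete(parent, theta):
        i = rng.randrange(2)                       # haplotype supplying the marker allele
        j = 1 - i if rng.random() < theta else i   # recombination takes the DSL allele from the other
        return parent[j][0], parent[i][1]

    while True:
        father = rng.choices(haplotypes, weights, k=2)
        mother = rng.choices(haplotypes, weights, k=2)
        pat_dsl, pat_marker = gamete(father, theta_m)       # male meiosis
        mat_dsl, mat_marker = gamete(mother, theta_f)       # female meiosis
        if rng.random() < risks[f"{pat_dsl}/{mat_dsl}"]:    # keep only families with a case
            return (father[0][1] + father[1][1],
                    mother[0][1] + mother[1][1],
                    pat_marker + mat_marker)


def tdt_from_trios(trios):
    """TDT of Equation 1 computed from a list of (F, M, C) marker genotype trios."""
    transmitted = not_transmitted = 0
    for f, m, c in trios:
        u = c - (f == 2) - (m == 2)                 # M1 copies received from heterozygous parents
        transmitted += u
        not_transmitted += (f == 1) + (m == 1) - u
    total = transmitted + not_transmitted
    return (transmitted - not_transmitted) / sqrt(total) if total else 0.0


def empirical_size(rng, replicates, n, p, g, delta, risks, alpha=0.05):
    """Proportion of replicates rejecting no linkage when theta_f = theta_m = 0.5."""
    cutoff = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(replicates):
        trios = [simulate_trio(rng, p, g, delta, 0.5, 0.5, risks) for _ in range(n)]
        rejections += abs(tdt_from_trios(trios)) > cutoff
    return rejections / replicates
```

A call such as `empirical_size(random.Random(1), 10000, 100, 0.5, 0.4, 0.18, risks)` with a dictionary like `{"D/D": 0.8, "D/d": 0.45, "d/D": 0.45, "d/d": 0.1}` would mirror the replicate count used in the study, although it may take a while in pure Python.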
Sizes of the TDT and the TDTI:
To investigate the validity of the TDT and the TDTI, it is necessary to check the sizes of the TDT and the TDTI, i.e., the proportion of rejecting the null hypothesis of no linkage when θf = θm = 0.5. For illustration purposes, we choose the homozygote risk ϕd/d = 0.1; the disease allele frequency p = 0.01, 0.1, 0.5, and 0.8; and the sample size n = 100, to evaluate the sizes of the TDT and the TDTI. For the completeness of this investigation, we choose the following 13 representative pairs of γ1p and γ1m, which are scattered uniformly in the diamond (Strauch et al. 2000) composed of {(γ1p, γ1m) | 1 ≤ γ1p, γ1m ≤ γ2}: (γ1p, γ1m) = (γ2, γ2), ((1 + γ2)/2, γ2), ((1 + 3γ2)/4, (1 + 3γ2)/4), (γ2, (1 + γ2)/2), (1, γ2), ((3 + γ2)/4, (1 + 3γ2)/4), ((1 + γ2)/2, (1 + γ2)/2), ((1 + 3γ2)/4, (3 + γ2)/4), (γ2, 1), (1, (1 + γ2)/2), ((3 + γ2)/4, (3 + γ2)/4), ((1 + γ2)/2, 1), (1, 1). It is noted that (γ1p, γ1m) = (γ2, γ2) corresponds to the common dominant mode of inheritance, (γ1p, γ1m) = ((1 + γ2)/2, (1 + γ2)/2) corresponds to the additive mode of inheritance, (γ1p, γ1m) = (1, 1) corresponds to the common recessive mode of inheritance, (γ1p, γ1m) = (γ2, 1) indicates complete maternal imprinting, and (γ1p, γ1m) = (1, γ2) indicates complete paternal imprinting.
For those 13 pairs of values of (γ1p, γ1m), we estimated the actual sizes of the TDT and the TDTI. The results are listed in Table 2, where γ1p and γ1m are expressed equivalently as γ1 and I. All the entries in Table 2 show that the sizes of the TDT and the TDTI are consistent with the nominal 0.05, which signifies the validity of the TDT and the TDTI. Because those 13 pairs of (γ1p, γ1m) are spread uniformly over the diamond, the simulated sizes of the TDT and the TDTI listed in Table 2 show that it is valid to use the TDT and the TDTI to test for linkage when the disease gene is imprinted and the recombination fractions are sex specific.
Table 2. Type I error rates of the TDT and the TDTI at a significance level of 0.05 for a simulation with 10,000 replicates with no linkage, having the marker allele frequency g = 0.4, the homozygote risks ϕD/D = 0.8 and ϕd/d = 0.1, the coefficient of LD δ = 0.9 min(0.6p, 0.4(1 − p)), and the sample size n = 100
In fact, we have selected a number of other parameter values involved in evaluating the TDT and TDTI. The patterns of sizes of the TDT and the TDTI are similar to those reported above, and these results are omitted for brevity.
Power approximation formulas of the TDT and the TDTI:
The precision of the power approximation formulas (2) and (11) is evaluated by simulation and their comparisons are made on the basis of the simulation results. To investigate the accuracy of Equations 2 and 11 in calculating the powers of the TDT and the TDTI, various choices of the disease allele frequency p and the homozygote risk ϕd/d are taken, and the following three scenarios on imprinting are considered: (a) complete paternal imprinting, (b) general paternal imprinting, and (c) no imprinting, while the other parameter values remain unchanged. For a given set of parameter values, the sample sizes necessary to achieve 80% power in the TDT and the TDTI are calculated by Equations 2 and 11, respectively. The actual powers of the TDT and the TDTI corresponding to these particular sample sizes are then obtained by simulation with 10,000 replicates.
Table 3 lists the results when the female and male recombination fractions are 0.02 and 0.01, respectively. All the entries of Table 3 show a strong agreement between the analytical powers and the actual powers. Table 3 demonstrates that both Equations 2 and 11 perform well. It is also observed from Table 3 that the sample sizes needed by the TDTI are always smaller than those needed by the TDT in the cases of complete/general paternal imprinting, and conversely the sample sizes for the TDT are smaller in the case of no imprinting. This may imply that the power of the TDTI is greater than that of the TDT when the genes are paternally imprinted, and the power of the TDTI is slightly smaller than that of the TDT when the genes are not imprinted.
Table 3. Sample sizes necessary to gain 80% power in the TDT and the TDTI (α = 0.05) according to Equations 2 and 11, respectively, with θf = 0.02, θm = 0.01, ϕD/D = 0.8, g = 0.4, and δ = 0.9 min(0.6p, 0.4(1 − p)).
It is observed from Table 3 that the sample sizes necessary to gain 80% power in the TDT decrease in the order of complete paternal imprinting, general paternal imprinting, and no imprinting. We also investigated the required sample sizes in the case of complete/general maternal imprinting (results not shown for brevity) and found that these sample sizes decrease in the order of no imprinting, general maternal imprinting, and complete maternal imprinting. However, for the TDTI, it is observed that the sample size necessary to gain 80% power in the cases of complete/general imprinting is smaller than that in the case of no imprinting. This is equivalent to saying that the power of the TDTI in the case of complete/general imprinting is greater than that in the case of no imprinting when the sample size is fixed.
We also obtained similar findings when the parameters take other values, and they are not shown here for brevity. The power comparison is conducted further in the following two sections.
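The step of turning a power approximation into a required sample size can be sketched as follows. Equations 2 and 11 are not repeated in this section, so the sketch does not implement them. It assumes, purely for illustration, a two-sided test whose statistic is approximately N(√n·μ, σ²), where `mu` and `sigma` are hypothetical inputs standing in for the quantities that the power formulas supply.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def approx_power(n, mu, sigma, alpha=0.05):
    """Two-sided power when the statistic is roughly N(sqrt(n) * mu, sigma**2).

    mu and sigma are hypothetical placeholders, not values from the paper.
    """
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = sqrt(n) * mu
    upper = 1 - STD_NORMAL.cdf((z - shift) / sigma)   # reject in the upper tail
    lower = STD_NORMAL.cdf((-z - shift) / sigma)      # reject in the lower tail
    return upper + lower


def required_sample_size(mu, sigma, target=0.80, alpha=0.05, n_max=1_000_000):
    """Smallest n whose approximate power reaches the target (linear search)."""
    for n in range(1, n_max + 1):
        if approx_power(n, mu, sigma, alpha) >= target:
            return n
    raise ValueError("target power not reached within n_max")


if __name__ == "__main__":
    n = required_sample_size(mu=0.2, sigma=1.0)
    print(n, round(approx_power(n, 0.2, 1.0), 4))
```

The simulated power at the returned n can then be compared with the 80% target, which is the comparison Table 3 reports for the actual formulas.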
Powers of the TDT and the TDTI:
To compare the powers of the TDT and the TDTI, a simulation with a large number of different parameter values was conducted. For simplicity, Table 4 shows only the simulation results with 10,000 replicates when the sample size n = 100 or 200, under three scenarios concerning θf and θm, namely (a) θf = θm = 0, (b) θf = θm = 0.001, and (c) θf = 0.001 and θm = 0.0006, and three scenarios concerning imprinting, namely (a) complete paternal imprinting, (b) general paternal imprinting, and (c) no imprinting. Columns 4 and 5, 6 and 7, and 8 and 9 in Table 4 exhibit the powers of the TDT and the TDTI with complete paternal imprinting genes, general paternal imprinting genes, and no imprinting genes, respectively. Table 4 shows that for complete/general paternal imprinting genes, the TDTI has a greater power than the TDT for testing for linkage, while the TDT has a slightly higher power than the TDTI for no imprinting genes. For example, when p = 0.1, ϕd/d = 0.1, n = 100, θf = 0.001, and θm = 0.0006, the powers of the TDT and the TDTI are 60.50% vs. 70.77% for complete paternal imprinting genes and 60.25% vs. 64.02% for general paternal imprinting genes; in both cases the TDTI has higher power than the TDT, while for no imprinting genes the TDT (power = 60.98%) has <1% higher power than the TDTI (power = 60.22%). Note that the TDT is <1% more powerful than the TDTI in all except one case of no imprinting genes in Table 4. Similar findings in the case of complete/general maternal imprinting are omitted for brevity.
Table 4. Powers of the TDT and the TDTI for complete paternal imprinting genes, general paternal imprinting genes, and no imprinting genes, having ϕD/D = 0.8, g = 0.4, and δ = 0.9 min(0.6p, 0.4(1 − p)).
Power comparisons of the TDT and the TDTI:
The previous section exhibits the powers of the TDT and the TDTI for a variety of parameter values and shows that the TDTI outperforms the TDT in testing for linkage in the cases of complete/general imprinting, where the degree of imprinting or sample size is fixed at some values. In this section, we continue to perform power comparisons of the TDT and the TDTI and focus on the pattern of the powers vs. various degrees of imprinting or sample sizes when all other parameter values remain unchanged.
We fix the parameter values as follows: disease allele frequency p = 0.1, the homozygote risk ϕd/d = 0.12, and two recombination fractions θf = 0.001 and θm = 0.0006. Figure 1a exhibits the simulated powers of the TDT and the TDTI with 100,000 replicates when the degree of imprinting I varies in the interval [(ϕd/d − ϕD/D)/2, (ϕD/D − ϕd/d)/2] and the sample size n = 200. Figure 1a illustrates that the power of the TDTI is greater than that of the TDT for a moderate to large degree of imprinting, and the improvement may be substantial in the case of complete paternal/maternal imprinting. For example, in the case of complete maternal imprinting, the powers of the TDTI and the TDT are, respectively, 89.40 and 79.54%. Also noted is that for a small degree of imprinting, the TDT is slightly more powerful to detect linkage than the TDTI. For example, in the case of no imprinting, the powers of the TDT and the TDTI are, respectively, 79.30 and 78.54%.
Figure 1. The actual powers of the TDT and the TDTI are plotted against (a) the degree of imprinting I ∈ [(ϕd/d − ϕD/D)/2, (ϕD/D − ϕd/d)/2] when the sample size n = 200 and (b) the sample size n ∈ [100, 250] in increments of 15 in the case of general paternal imprinting, having p = 0.1, g = 0.4, δ = 0.054, ϕD/D = 0.8, ϕd/d = 0.12, I = −0.3, θf = 0.001, and θm = 0.0006. Powers are based on 100,000 replicates and assessed at the 5% level.
Overall, the proposed TDTI for testing for linkage outperforms the conventional TDT when the degree of imprinting is moderate to large. When the degree of imprinting is negligible, there is a slight loss of power from using the TDTI.
Figure 1b plots the powers of the TDT and the TDTI vs. the sample size n ∈ [100, 250] in increments of 15 when the degree of imprinting is I = 4(ϕd/d − ϕD/D)/9 (general paternal imprinting). Figure 1b illustrates that the power of the TDTI is greater than that of the TDT. For example, the powers of the TDT/TDTI are, respectively, 69.52/77.19% for n = 160, 73.72/81.42% for n = 175, and 77.34/84.86% for n = 190. It is noted that for small (large) sample sizes, the powers of both the TDT and the TDTI are small (large), so the improvement of the TDTI over the TDT is less pronounced; both powers are then close to 0 (1). The pattern of the powers of the TDT/TDTI in the case of general maternal imprinting is similar and the details are not reported here for brevity.
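As a quick arithmetic check of the parameter values quoted for Figure 1, the following Python lines recompute the range of I, the general paternal imprinting value 4(ϕd/d − ϕD/D)/9, and the coefficient of LD from δ = 0.9 min(0.6p, 0.4(1 − p)). The variable names are illustrative only.

```python
phi_DD = 0.8   # homozygote risk for D/D
phi_dd = 0.12  # homozygote risk for d/d
p = 0.1        # disease allele frequency

# Range of the degree of imprinting I used for Figure 1a.
i_min = (phi_dd - phi_DD) / 2
i_max = (phi_DD - phi_dd) / 2

# Degree of imprinting used for Figure 1b (general paternal imprinting).
i_general_paternal = 4 * (phi_dd - phi_DD) / 9

# Coefficient of LD, delta = 0.9 * min(0.6 p, 0.4 (1 - p)).
delta = 0.9 * min(0.6 * p, 0.4 * (1 - p))

print(round(i_min, 4), round(i_max, 4))
print(round(i_general_paternal, 4))
print(round(delta, 4))
```

The value of I rounds to −0.3 and δ to 0.054, in agreement with the Figure 1 legend.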
DISCUSSION
The TDT is still applicable to the cases of imprinted genes and sex-specific recombination fractions. It is interesting to note that the degree of imprinting I and the difference between the sex-specific recombination fractions θf − θm appear simultaneously in the power expression for calculating the power of the TDT. It is therefore deduced that the power of the TDT is independent of the degree of imprinting when the female and male recombination fractions are identical. But this is not the case for the powers of the TDTp, TDTm, and TDTI. Furthermore, the difference between the two sex-specific recombination fractions plays an important role in the evaluation of the imprinting effects on the power of the TDT. It may be concluded that parent-of-origin effects on the power of the TDT are negligible in fine mapping, where the genetic distance between markers is usually <1 cM.
Note that the data we employed in the TDTI are just the same as those in the TDT, so no extra data are needed in the proposed test statistic. In fact, the TDT uses 11 categories of families with at least one heterozygous parent and the TDTI uses 13 categories of families. Only categories of 000 and 222 are excluded from our analysis. Thus the TDTI would provide more information about linkage than the TDT. Meanwhile, as seen from the expression of the TDTI (10), it is simple and easy to use in practice. Like the conventional TDT, the TDTI requires the presence of LD to detect linkage. Furthermore, the TDTI is applicable to populations that are not in Hardy–Weinberg equilibrium, for example, under the population stratification demographic model. The details are omitted here for brevity.
It is observed from the simulation results that the proposed TDTI accounting for parent-of-origin effects is not always more powerful than the conventional TDT that ignores parent-of-origin effects. More precisely, in the presence of moderate to large parent-of-origin effects, the TDTI is more powerful than the TDT, while the TDT is more powerful than the TDTI in the absence of parent-of-origin effects, although the difference is relatively small. For the conventional TDT, the stronger the parent-of-origin effects are, the more severe the power loss is. Hanson et al. (2001), Shete and Amos (2002), and Wu et al. (2005) arrived at a similar conclusion.
It may therefore be useful to assess prior knowledge about whether the disease gene is imprinted, for example, by use of epidemiological data. If imprinting is suspected, the proposed test statistic is recommended. Otherwise the conventional TDT is recommended. Note that the TDTI loses only a little power even if there is actually no imprinting.
In addition to fixing α′ in Equation 10 at a particular value, there is some flexibility in determining whether the degree of imprinting is significant. In our experience, a large α′ in Equation 10 may be chosen when there is a prior belief that the disease gene is imprinted. Conversely, a small α′ may be taken when there is no such prior belief.
An advantage of the proposed test statistic is that it requires no specification of the model of the genetic mechanism of the studied disease. In practice, the mechanism of many genetic diseases is unknown. Incorporating parent-of-origin effects into linkage analysis would be particularly useful for the study of known imprinted genes, and thus it provides a useful tool for genetic dissection of disease genes.
APPENDIX
Derivation of Equations 4 and 5:
For illustration, we give the details of the derivation of Equation 4 (the counterpart for Equation 5 is skipped):
where the haplotype on the left side of the slash (/) is paternal and the one on the right side is maternal, the asterisk denotes an arbitrary allele, and, for example, Cp = DM1 indicates that the paternal haplotype of the child is DM1. The last equality is based on the following four equations: P(F = M1M2, Cp = DM1) = pgg′ + g′δ − δθm, P(F = M1M2, Cp = DM2) = pgg′ − gδ + δθm, P(F = M1M2, Cp = dM1) = qgg′ − g′δ + δθm, and P(F = M1M2, Cp = dM2) = qgg′ + gδ − δθm. The principle of their derivation is described in Zhou et al. (2007). Thus we have Equation 4.
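The four joint probabilities quoted above can be checked numerically with a short Python sketch. It assumes the usual complements q = 1 − p and g′ = 1 − g, which are not restated in this section, and it borrows the parameter values of Figure 1 as an example. Under these assumptions the four terms sum to 2gg′, and the M1-bearing and M2-bearing paternal haplotypes each have total probability gg′ regardless of θm.

```python
def father_haplotype_probs(p, g, delta, theta_m):
    """The four joint probabilities P(F = M1M2, Cp = .) quoted in the Appendix.

    Assumes q = 1 - p and g' = 1 - g (stated here as an assumption, since the
    definitions are given elsewhere in the paper).
    """
    q, g_prime = 1 - p, 1 - g
    return {
        "DM1": p * g * g_prime + g_prime * delta - delta * theta_m,
        "DM2": p * g * g_prime - g * delta + delta * theta_m,
        "dM1": q * g * g_prime - g_prime * delta + delta * theta_m,
        "dM2": q * g * g_prime + g * delta - delta * theta_m,
    }


# Example values taken from the Figure 1 setting.
probs = father_haplotype_probs(p=0.1, g=0.4, delta=0.054, theta_m=0.0006)
total = sum(probs.values())                  # should equal 2 * g * g'
marker_m1 = probs["DM1"] + probs["dM1"]      # should equal g * g'
marker_m2 = probs["DM2"] + probs["dM2"]      # should equal g * g'

print(total, marker_m1, marker_m2)
```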
Joint distribution of two TDT-like statistics:
We first give a theorem to describe the joint distribution of two TDT-like statistics and two corollaries to describe the asymptotic distributions of the TDT, the TDTp, and the TDTm. Then we provide a theorem that shows the asymptotic distributions of the TDTI.
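Theorem 1. Let u1 = (u11,…, u1m)T, v1 = (v11,…, v1m)T, u2 = (u21,…, u2m)T, v2 = (v21,…, v2m)T be four constant vectors of length m and (N1,…, Nm, Nm+1)T be a multinomially distributed random variable of size n and with parameters (r1,…, rm, rm+1), where r1 + ⋯ + rm+1 = 1. Denote N = (N1,…, Nm)T; then we have in law

$$
\begin{pmatrix}
\dfrac{u_1^{T}N - v_1^{T}N}{\sqrt{u_1^{T}N + v_1^{T}N}} - \sqrt{n}\,\dfrac{a_1 - b_1}{\sqrt{a_1 + b_1}} \\[2ex]
\dfrac{u_2^{T}N - v_2^{T}N}{\sqrt{u_2^{T}N + v_2^{T}N}} - \sqrt{n}\,\dfrac{a_2 - b_2}{\sqrt{a_2 + b_2}}
\end{pmatrix}
\longrightarrow
N\!\left(
\begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}
\right),
$$

where a1 = u1Tr, b1 = v1Tr, a2 = u2Tr, b2 = v2Tr, σ11 = βT(u1, v1)TΣ(u1, v1)β, σ12 = σ21 = βT(u1, v1)TΣ(u2, v2)γ, σ22 = γT(u2, v2)TΣ(u2, v2)γ, Σ = diag(r) − rrT, r = (r1,…, rm)T, β = ((x1 + 3y1)/(2(x1 + y1)3/2), −(3x1 + y1)/(2(x1 + y1)3/2))T, and γ = ((x2 + 3y2)/(2(x2 + y2)3/2), −(3x2 + y2)/(2(x2 + y2)3/2))T are evaluated at x1 = a1, y1 = b1, x2 = a2, and y2 = b2, respectively. Particularly, we have, for k = 1, 2,

$$
\frac{u_k^{T}N - v_k^{T}N}{\sqrt{u_k^{T}N + v_k^{T}N}} - \sqrt{n}\,\frac{a_k - b_k}{\sqrt{a_k + b_k}} \longrightarrow N(0, \sigma_{kk}).
$$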
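Proof. Note that N/n is the maximum-likelihood estimator of the parameter vector r and from the asymptotic normality of the maximum-likelihood estimator, we have in law

$$
\sqrt{n}\,(N/n - r) \longrightarrow N(0, \Sigma)
$$

(Rao 1973). It follows immediately that

$$
\sqrt{n}\left[(u_1, v_1, u_2, v_2)^{T}N/n - (a_1, b_1, a_2, b_2)^{T}\right] \longrightarrow N\!\left(0,\ (u_1, v_1, u_2, v_2)^{T}\,\Sigma\,(u_1, v_1, u_2, v_2)\right).
$$

Let

$$
g(x_1, y_1, x_2, y_2) = \left(\frac{x_1 - y_1}{\sqrt{x_1 + y_1}},\ \frac{x_2 - y_2}{\sqrt{x_2 + y_2}}\right)^{T};
$$

then we have

$$
\sqrt{n}\left[g\!\left((u_1, v_1, u_2, v_2)^{T}N/n\right) - g(a_1, b_1, a_2, b_2)\right] \longrightarrow N\!\left(0,\ G\,(u_1, v_1, u_2, v_2)^{T}\,\Sigma\,(u_1, v_1, u_2, v_2)\,G^{T}\right),
$$

where

$$
G = \frac{\partial g}{\partial (x_1, y_1, x_2, y_2)} = \begin{pmatrix} \beta^{T} & 0 \\ 0 & \gamma^{T} \end{pmatrix}.
$$

Note that the derivatives are evaluated at x1 = a1, y1 = b1, x2 = a2, and y2 = b2. After some transformations, we complete the proof.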
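The variance formula σ11 = βT(u1, v1)TΣ(u1, v1)β can be evaluated numerically with a small pure-Python sketch. It takes a1 = u1Tr and b1 = v1Tr, assumed here to be the expectations of u1TN/n and v1TN/n, and uses the hypothetical choice u1 = (1, 0)T, v1 = (0, 1)T with two equally likely cells. In that balanced case the formula returns a variance of 1, as expected for a standardized statistic.

```python
from math import isclose


def multinomial_cov(r):
    """Sigma = diag(r) - r r^T for the first m cell probabilities."""
    m = len(r)
    return [[(r[i] if i == j else 0.0) - r[i] * r[j] for j in range(m)]
            for i in range(m)]


def dot(x, y):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


def quad(x, S, y):
    """Bilinear form x^T S y for plain Python sequences."""
    return sum(x[i] * S[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))


def sigma11(u, v, r):
    """beta^T (u, v)^T Sigma (u, v) beta, with beta evaluated at a = u^T r, b = v^T r."""
    a, b = dot(u, r), dot(v, r)  # assumed definitions of a1 and b1
    beta = ((a + 3 * b) / (2 * (a + b) ** 1.5),
            -(3 * a + b) / (2 * (a + b) ** 1.5))
    S = multinomial_cov(r)
    # 2 x 2 matrix (u, v)^T Sigma (u, v)
    M = [[quad(u, S, u), quad(u, S, v)],
         [quad(v, S, u), quad(v, S, v)]]
    return quad(beta, M, beta)


if __name__ == "__main__":
    # Hypothetical example: two informative cells with equal probability 0.2.
    value = sigma11(u=[1, 0], v=[0, 1], r=[0.2, 0.2])
    print(isclose(value, 1.0, rel_tol=1e-9))
```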
Remark. By matrix multiplication, we find that(A1)where
,
, and
. Moreover, when
and
for all j, we have
(A2)Note that Equation A2 in this situation is just the theorem in Zhou et al. (2007). We also find that
In particular, when a1 = b1 and a2 = b2, we have
(A3)
Let u = ,
,
,
,
,
, and
with components listed in Table 1; by using Theorem 1 and Equations A1 and A2, we have the following two corollaries after tedious algebra:
Corollary 1. For the test statistic (1), we have in lawwhere
For the test statistic (7), we have in law
where
For the test statistic (8), we have in law
where
For the test statistic (9), we have in law
(A4)
(Zhou et al. 2007), where
Corollary 2. When θf = θm = 0.5, we have
,
,and
; when I = 0, we have
.
Proof. When θf = θm = 0.5, it is obvious that a = b, d1 = a + b, ap = bp, and am = bm. When I = 0, we have aI = bI. Furthermore,
. Thus we have proven Corollary 2.
Theorem 2 (distribution of the TDTI). Under the null hypothesis of no linkage, i.e., θf = θm = 0.5, the POET and the TDT/TDTp/TDTm are asymptotically independent, and
.
Proof. When θf = θm = 0.5, we have that a = b, ap = bp, and am = bm. So by Equation A3 and the eight constant vectors uI, vI, u, v, up, vp, um, and vm with components listed in Table 1, the covariances of the POET and the TDTp, the TDTm, and the TDT are, respectively,
After tedious algebra, we have
It follows immediately that Cov(POET, TDT) = Cov(POET, TDTp) = Cov(POET, TDTm) = 0 when θf = θm = 0.5. So the POET and the TDT/TDTp/TDTm are asymptotically independent. Moreover, by Corollary 2, it is easy to show that
under the null hypothesis of no linkage. Hence the theorem is proved.
Acknowledgments
The authors are grateful to the associate editor and two anonymous reviewers for insightful comments that improved greatly the presentation of the material. This work was partially supported by a research grant of the Hong Kong Research Grant Council, by the National Natural Science Foundation of China (10329102, 10561008), and by the Scientific Research Fund of Huaihua University.
Footnotes
Communicating editor: Y.-X. Fu
- Received March 21, 2006.
- Accepted December 21, 2006.
- Copyright © 2007 by the Genetics Society of America