Abstract
I show that fine-scale localization of a survival-related locus can be accomplished on the basis of deviations from Hardy–Weinberg equilibrium and linkage disequilibrium at closely linked marker loci. The method is based on χ2-tests, which can be performed on age-specific samples of alive (or dead) individuals as well as on combined samples of alive and dead individuals.
Conventional tools for the analysis of QTL can locate loci underlying the variation of continuous quantitative traits to a genomic region of ∼30 cM (Deng et al. 2000). Fine-scale mapping (∼1 cM) is required to narrow these candidate genomic regions, and appropriate techniques have been developed for complex diseases and for quantitative traits under Gaussian distributions (Spielman et al. 1993; Feder et al. 1996; Nielsen et al. 1998; Deng et al. 2000, 2003). Although survival has become an emerging research field in human health (Puca et al. 2001) and animal breeding (Kleinbaum 1996), no appropriate fine-mapping techniques are available for survival traits. The objective of this article is to adapt the QTL fine-mapping method of Deng et al. (2000) to survival data.
Take as the starting point a survival-related QTL with two alleles, A1 and A2, and allelic frequencies p and q = 1 − p, respectively. Under the proportional hazards framework (Cox 1972), $\phi_{11} = [S_0(t)]^{\exp(a)}$ is the survival probability at time t for an individual with genotype A1A1, where S0(t) is the baseline survival function and a is the genotypic value of the A1A1 genotype. In a similar way, we define
$$\phi_{12} = [S_0(t)]^{\exp(d)}$$
and
$$\phi_{22} = [S_0(t)]^{\exp(-a)},$$
d and −a being the genotypic values for the A1A2 and A2A2 genotypes, respectively. Without loss of generality, we can assume that S0(t) represents a random variable for the combined effects of all the rest of the polymorphic loci and all random environmental effects. As in the original research of Feder et al. (1996), Nielsen et al. (1998), and Deng et al. (2000), a large population under random mating is assumed and thus Hardy–Weinberg (HW) equilibrium holds in each generation of individuals at birth. The proportion of survivors at time t (πt) is stated as
$$\pi_t = p^2\phi_{11} + 2pq\phi_{12} + q^2\phi_{22},$$
with
$$P(A_1 \mid ALIVE_t) = (p^2\phi_{11} + pq\phi_{12})/\pi_t$$
and
$$P(A_1A_1 \mid ALIVE_t) = p^2\phi_{11}/\pi_t$$
being the allelic (A1) and genotypic (A1A1) frequencies within the group of alive individuals at time t (ALIVEt), respectively (the remaining frequencies can be easily derived following Deng et al. 2000). Deviation from HW equilibrium at the survival QTL can be measured by the disequilibrium coefficient of Weir (1996),
$$D_{A_1A_1} = P(A_1A_1 \mid ALIVE_t) - [P(A_1 \mid ALIVE_t)]^2,$$
or, following Deng et al. (2000), by the function between observed and expected homozygosities,
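As a numerical illustration of the quantities defined above, the following minimal Python sketch evaluates genotype-specific survival probabilities under a proportional hazards model and the resulting allelic and genotypic frequencies among survivors. The function names and the baseline value S0(t) = 0.16 are illustrative assumptions, not part of the original analysis.

```python
import math

def survival_probs(s0, a, d):
    """Genotype-specific survival under proportional hazards: S0(t) ** exp(g),
    where g is the genotypic value (a, d, -a) on the log-hazard scale."""
    return s0 ** math.exp(a), s0 ** math.exp(d), s0 ** math.exp(-a)

def qtl_among_alive(p, phi11, phi12, phi22):
    """Survivor proportion, A1 and A1A1 frequencies among the alive, and the
    HW disequilibrium coefficient, assuming HW proportions at birth."""
    q = 1.0 - p
    pi_t = p * p * phi11 + 2 * p * q * phi12 + q * q * phi22
    p_a1 = (p * p * phi11 + p * q * phi12) / pi_t
    p_a1a1 = p * p * phi11 / pi_t
    d_hw = p_a1a1 - p_a1 ** 2
    return pi_t, p_a1, p_a1a1, d_hw

# Additive model with a = -0.69, d = 0; S0(t) = 0.16 is an assumed value.
phi = survival_probs(0.16, -0.69, 0.0)
pi_t, p_a1, p_a1a1, d_hw = qtl_among_alive(0.5, *phi)
print(f"pi_t={pi_t:.4f}  P(A1|alive)={p_a1:.4f}  D={d_hw:.4f}")
```

Because a negative genotypic value lowers the hazard, allele A1 is enriched among survivors in this example, and the disequilibrium coefficient departs from zero even though the population is in HW equilibrium at birth.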
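Previous derivations can be easily adapted to a marker locus closely linked to the survival QTL, with alleles M1 and M2 and allelic frequencies r and s = 1 − r. As in Deng et al. (2000),
$$P(M_1 \mid ALIVE_t) = r + D_{A_1M_1}[p(\phi_{11} - \phi_{12}) + q(\phi_{12} - \phi_{22})]/\pi_t$$
is the allelic frequency of M1 and
$$P(M_1M_1 \mid ALIVE_t) = [\phi_{11}(pr + D_{A_1M_1})^2 + 2\phi_{12}(pr + D_{A_1M_1})(qr - D_{A_1M_1}) + \phi_{22}(qr - D_{A_1M_1})^2]/\pi_t$$
is the genotypic frequency of M1M1, where
$$D_{A_1M_1} = P_{A_1M_1} - pr$$
is the linkage disequilibrium (LD) measure between A1 and M1 (Crow and Kimura 1970) and PA1M1 is the frequency of haplotypes carrying both A1 and M1. According to Deng et al. (2000), the HW disequilibrium among ALIVEt individuals at the marker locus is
$$D_{M_1M_1} = P(M_1M_1 \mid ALIVE_t) - [P(M_1 \mid ALIVE_t)]^2 = D_{A_1M_1}^2(\phi_{11}\phi_{22} - \phi_{12}^2)/\pi_t^2,$$
it being nonzero when
$$D_{A_1M_1} \neq 0$$
and
$$\phi_{11}\phi_{22} \neq \phi_{12}^2.$$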
A wide range of combinations of ϕ11, ϕ12, and ϕ22 provide a value different from zero and, in practice, the HW disequilibrium at the marker locus solely reflects the LD in the whole generation (Deng et al. 2000). In a similar way, HW disequilibrium for the marker locus among alive individuals can be derived as
as described by Feder et al. (1996) and Nielsen et al. (1998) for affected individuals of complex traits. Both FM1 and DM1M1 statistics converge to the key point that HW disequilibrium at a marker locus corresponds to the whole-generation LD between the marker locus and the QTL (Deng et al. 2000). Alternatively, one could use a direct measure of LD like the pexcess statistic proposed by Bengtsson and Thomson (1981). For a survival QTL, pexcess becomes
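$$p_{excess} = [P(M_1 \mid ALIVE_t) - P(M_1 \mid DEAD_t)]/[1 - P(M_1 \mid DEAD_t)],$$
where
$$P(M_1 \mid DEAD_t) = r - D_{A_1M_1}[p(\phi_{11} - \phi_{12}) + q(\phi_{12} - \phi_{22})]/(1 - \pi_t)$$
is the allelic frequency of M1 within the group of dead individuals at time t (DEADt). Therefore, pexcess is proportional to DA1M1 and reaches its maximum at the marker with the greatest LD with the QTL (Nielsen et al. 1998).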
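The dependence of the marker statistics on LD can be checked numerically. The sketch below, written under the assumptions of HW proportions at birth and random union of gametes, computes the M1 frequency among alive and dead individuals, the HW disequilibrium among the alive, and pexcess from haplotype frequencies. The function name, the dictionary keys, and the survival probabilities used in the example are hypothetical choices for illustration.

```python
def marker_summary(p, r, ld, phi11, phi12, phi22):
    """Marker M1 frequencies among alive and dead individuals, HW
    disequilibrium among the alive, and p_excess (alive versus dead)."""
    q = 1.0 - p
    a1m1 = p * r + ld   # frequency of haplotype A1-M1
    a2m1 = q * r - ld   # frequency of haplotype A2-M1
    pi_t = p * p * phi11 + 2 * p * q * phi12 + q * q * phi22

    def m1_freqs(f11, f12, f22, total):
        # f11, f12, f22: probability of belonging to the group for each
        # QTL genotype; total: overall proportion of that group.
        allele = (p * f11 * a1m1 + f12 * (q * a1m1 + p * a2m1)
                  + q * f22 * a2m1) / total
        homozygote = (f11 * a1m1 ** 2 + 2 * f12 * a1m1 * a2m1
                      + f22 * a2m1 ** 2) / total
        return allele, homozygote

    m1_alive, m1m1_alive = m1_freqs(phi11, phi12, phi22, pi_t)
    m1_dead, _ = m1_freqs(1 - phi11, 1 - phi12, 1 - phi22, 1 - pi_t)
    return {
        "m1_alive": m1_alive,
        "m1_dead": m1_dead,
        "d_hw_alive": m1m1_alive - m1_alive ** 2,
        "p_excess": (m1_alive - m1_dead) / (1 - m1_dead),
    }

# Assumed survival probabilities for A1A1, A1A2, A2A2 (illustrative only).
for ld in (0.0, 0.05, 0.10, 0.20):
    res = marker_summary(0.5, 0.5, ld, 0.4, 0.16, 0.0256)
    print(ld, round(res["d_hw_alive"], 5), round(res["p_excess"], 5))
```

In this sketch the HW disequilibrium among survivors scales with the square of LD, whereas the allele frequency contrast behind pexcess scales linearly with it, so both vanish when the marker and the QTL are in linkage equilibrium.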
To test for the statistical significance of the HW disequilibrium measures ($D_{M_1M_1}$ and $F_{M_1}$) and the LD measure (pexcess), two χ2-tests can be easily applied. Following Deng et al. (2000), the χ2-test statistic for HW disequilibrium is derived as
$$\chi^2_{HW} = 2n\tilde{D}_{M_1M_1}^2 / \{[\tilde{P}(M_1 \mid ALIVE_t)]^2[1 - \tilde{P}(M_1 \mid ALIVE_t)]^2\},$$
where the tilde (∼) denotes an estimated value from the sample and 2n is the total sample size of individuals. The $\chi^2_{HW}$ test has $m(m - 1)/2 = 1$ d.f., m being the number of alleles at the marker locus being tested (m = 2). On the other hand, the χ2 for pexcess (Weir 1996; Deng et al. 2000) is stated as
with m − 1 = 1 d.f.
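Both tests can be sketched in a few lines of Python. The HW statistic below uses the standard biallelic form given by Weir (1996), with the number of sampled individuals as the multiplier. For the alive versus dead comparison, the sketch substitutes an ordinary Pearson χ2 on the 2 × 2 table of allele counts, which also has 1 d.f. but is not necessarily the exact statistic used here. Function names and the example counts are hypothetical.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variate with 1 d.f."""
    return math.erfc(math.sqrt(x / 2.0))

def hw_chi2(n_m1m1, n_m1m2, n_m2m2):
    """HW disequilibrium test at a biallelic marker from genotype counts.
    Assumes the marker is polymorphic in the sample."""
    n_ind = n_m1m1 + n_m1m2 + n_m2m2
    p_m1 = (2 * n_m1m1 + n_m1m2) / (2 * n_ind)
    d_hat = n_m1m1 / n_ind - p_m1 ** 2
    stat = n_ind * d_hat ** 2 / (p_m1 ** 2 * (1 - p_m1) ** 2)
    return stat, chi2_sf_1df(stat)

def allele_contrast_chi2(alive_m1, alive_m2, dead_m1, dead_m2):
    """Pearson chi-square on the 2 x 2 table of allele counts
    (alive/dead by M1/M2); a stand-in for the p_excess test."""
    a, b, c, d = alive_m1, alive_m2, dead_m1, dead_m2
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, chi2_sf_1df(stat)

# Hypothetical counts: 200 alive individuals, and 200 alleles per group.
print(hw_chi2(110, 80, 10))
print(allele_contrast_chi2(60, 40, 40, 60))
```

Plotting either statistic against marker position would then give the profile whose peak is used for fine mapping.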
To illustrate the tests outlined above, extensive computer simulations were performed for a biallelic survival QTL and several biallelic markers. These computer simulations were carried out under a wide range of inheritance models (additive, dominant, recessive, partial dominant, and partial recessive), sample sizes, and ages and under a Weibull assumption for the baseline survival function (Ducrocq et al. 1988a,b; Ibrahim et al. 2001). For the five genetic models, both tests showed reduced power at greater distances between the QTL and the marker loci, although the power decayed more quickly for the HW disequilibrium test than for the pexcess test and it was higher for the pexcess test than for the HW disequilibrium test (Figure 1). These results agree with previous QTL fine-mapping research (Nielsen et al. 1998; Deng et al. 2000) and they are not surprising because, in models where both the survival QTL and the marker locus have only two alleles, HW disequilibrium is proportional to the square of LD (Nielsen et al. 1998). Whereas the pexcess test provided a similar power for all genetic models, the HW disequilibrium test showed substantial discrepancies. Within this context, the pexcess test seemed preferable if samples of both alive and dead individuals were accessible, although such samples could be unavailable if the study was not previously scheduled. On the other hand, the average type I error of both tests was close to the expected level of 0.05, slightly higher for
than for
(Figure 2). This larger variation in
was consistent with previous research (Nielsen et al. 1998; Deng et al. 2000). As expected, the average power of both tests at the different marker positions increased as the amount of available information increased (e.g., a larger number of sampled individuals; Figure 3) or the selection criterion became stricter (e.g., older ages used to differentiate between alive and dead individuals; Figure 4). These results agreed with those of Deng et al. (2000).
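The way power is estimated in such simulations can be sketched with a small Monte Carlo experiment. This is a simplified illustration, not the simulation design used for the figures: marker genotypes of survivors are drawn directly from their expected frequencies (no drift or population history), the HW statistic follows the standard biallelic form of Weir (1996), and all function names, the seed, the number of replicates, and the survival probabilities are assumptions.

```python
import random

CRIT_1DF = 3.841  # 5% critical value of a chi-square with 1 d.f.

def alive_marker_genotype_probs(p, r, ld, phi11, phi12, phi22):
    """P(M1M1), P(M1M2), P(M2M2) among survivors from haplotype frequencies,
    assuming HW proportions at birth."""
    q, s = 1.0 - p, 1.0 - r
    a1m1, a2m1 = p * r + ld, q * r - ld
    a1m2, a2m2 = p * s - ld, q * s + ld
    pi_t = p * p * phi11 + 2 * p * q * phi12 + q * q * phi22
    m1m1 = (phi11 * a1m1 ** 2 + 2 * phi12 * a1m1 * a2m1
            + phi22 * a2m1 ** 2) / pi_t
    m2m2 = (phi11 * a1m2 ** 2 + 2 * phi12 * a1m2 * a2m2
            + phi22 * a2m2 ** 2) / pi_t
    return m1m1, 1.0 - m1m1 - m2m2, m2m2

def hw_statistic(n11, n12, n22):
    """Biallelic HW chi-square from genotype counts (0 if monomorphic)."""
    n = n11 + n12 + n22
    p1 = (2 * n11 + n12) / (2 * n)
    if p1 in (0.0, 1.0):
        return 0.0
    d_hat = n11 / n - p1 * p1
    return n * d_hat ** 2 / (p1 ** 2 * (1 - p1) ** 2)

def rejection_rate(probs, n_ind, n_rep, rng):
    """Fraction of replicate samples in which HW equilibrium is rejected."""
    rejected = 0
    for _ in range(n_rep):
        draws = rng.choices((0, 1, 2), weights=probs, k=n_ind)
        counts = [draws.count(g) for g in (0, 1, 2)]
        if hw_statistic(*counts) > CRIT_1DF:
            rejected += 1
    return rejected / n_rep

rng = random.Random(2007)          # arbitrary seed for reproducibility
phi = (0.4, 0.16, 0.0256)          # assumed survival probabilities
for ld in (0.0, 0.125, 0.25):
    probs = alive_marker_genotype_probs(0.5, 0.5, ld, *phi)
    print(ld, rejection_rate(probs, 200, 500, rng))
```

With LD set to zero the rejection rate estimates the type I error, and with nonzero LD it estimates power, which is the distinction drawn between Figures 1 and 2.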
Figure 1. Comparison of the statistical power of the -test (open boxes) and the -test (solid boxes) under various genetic models: (A) additive (a = −0.69, −a = 0.69, d = 0), (B) dominant (a = −0.69, −a = 0.69, d = a), (C) recessive (a = −0.69, −a = 0.69, d = −a), (D) partial dominant (a = −0.69, −a = 0.69, d = a/2), and (E) partial recessive (a = −0.69, −a = 0.69, d = −a/2). The bottom and top edges of the boxes represent the sample 25th and 75th percentiles; the whiskers span the range of the results. Under a specific set of simulation parameters, the population started at generation zero, with complete association between allele A1 (p = 0.5) at the QTL and marker allele M1 (r = 0.5), and evolved for 50 generations under random mating and genetic drift. A dense set of marker loci, positioned at 0.25-cM intervals and spanning 0–2 cM from the QTL, was simulated, with the recombination rate obtained from Haldane's map function (Ott 1991). The first 100 populations for which p at the start and at the end of evolution did not differ by >5% were retained. The effective population size per generation was 15,000. Survival records were simulated under a proportional hazards model (Cox 1972), assuming a Weibull distribution with parameters ρ = 1.5 and λ = 0.001 for the baseline survival function ($S_0(t) = \exp[-(\lambda t)^{\rho}]$), and the threshold between alive and dead individuals was set at t = 1500. In each case, 5000 samples of 200 individuals (sampling with replacement) were drawn from each of the 100 simulated populations and the statistical power was calculated as the percentage of times that the null hypothesis of no disequilibrium was rejected.
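The deterministic ingredients of this simulation setup can be sketched as follows. The code converts map distance to a recombination fraction with Haldane's map function, gives the expected LD after 50 generations of random mating when drift is ignored (a simplification, since the simulations included drift), and evaluates the Weibull baseline survival at the threshold age. The parameterization S0(t) = exp(−(λt)^ρ) and the function names are assumptions made for this illustration.

```python
import math

def haldane_recombination(d_cm):
    """Recombination fraction for a map distance in cM (Haldane)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def expected_ld(d0, c, generations):
    """Expected LD after random mating, ignoring drift: D_t = D_0 (1 - c)^t."""
    return d0 * (1.0 - c) ** generations

def weibull_baseline(t, lam, rho):
    """Baseline survival S0(t) = exp(-(lam * t) ** rho) (assumed form)."""
    return math.exp(-((lam * t) ** rho))

# Complete association with p = r = 0.5 gives D_0 = 0.5 - 0.5 * 0.5 = 0.25.
d0 = 0.25
for step in range(9):              # markers at 0, 0.25, ..., 2 cM
    d_cm = 0.25 * step
    c = haldane_recombination(d_cm)
    print(f"{d_cm:4.2f} cM  c={c:.5f}  D_50={expected_ld(d0, c, 50):.4f}")

print("S0(1500) =", round(weibull_baseline(1500, 0.001, 1.5), 4))
```

Under this parameterization roughly 16% of baseline individuals survive past t = 1500, so the alive group is a fairly extreme sample.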
Figure 2. Comparison of type I error of the -test (open square) and the -test (solid square) under the five inheritance models: additive, dominant, recessive, partial dominant, and partial recessive. Type I error for both tests was calculated as the percentage of times that the null hypothesis of no disequilibrium was rejected when the simulations were performed under the null hypothesis of no linkage disequilibrium. As described for Figure 1, simulation parameters were p = 0.5, 2n = 200, t = 1500, λ = 0.001, and ρ = 1.5. The square represents the average value and the whiskers span the range of the results.
Figure 3. Average power under different sample sizes (2n) for the (A) - and (B) -tests (the simulation process is described in Figure 1).
Figure 4. Comparison of statistical power for various temporal cut points (t). The average results for the (A) - and (B) -tests are presented (the simulation process is described in Figure 1).
In conclusion, LD is captured and magnified in extreme samples of elderly individuals, where QTL genotypes and alleles are disproportionately represented. The disequilibrium must be highest at the QTL, since it is the underlying factor that determines the selection criterion, and it decreases as the degree of linkage between the QTL and the markers decreases. This relation between HW disequilibrium and/or LD and the physical distance between a panel of linked marker loci and a QTL is the key point that provides a straightforward basis for QTL fine mapping using the peaks of the disequilibrium measures and/or test statistics.
Footnotes
Communicating editor: J. B. Walsh
- Received October 24, 2006.
- Accepted February 25, 2007.
- Copyright © 2007 by the Genetics Society of America