Considerations on Study Designs Using the Extreme Sibpairs Methods Under Multilocus Oligogenic Models
Chi Gu (a) and D. C. Rao (a, b)
(a) Division of Biostatistics, Washington University School of Medicine, St. Louis, Missouri 63110
(b) Departments of Psychiatry and Genetics, Washington University School of Medicine, St. Louis, Missouri 63110
Corresponding author: Chi Gu, Washington University School of Medicine, Box 8067, 600 S. Euclid Ave., St. Louis, MO 63110. E-mail: gc{at}wubios.wustl.edu
Communicating editor: G. A. CHURCHILL
| ABSTRACT |
|---|
Several issues pertinent to study designs employing extreme sibpairs (ESP) methods to detect complex oligogenic quantitative trait loci (QTL) are investigated in the setting of genome-wide multipoint scans. We demonstrate that when stringent α-levels are imposed (e.g., α = 0.000022 as recommended by Lander and Kruglyak), the power to detect a susceptibility locus could drop from 83.6% under a one-locus model down to a hopeless 22.8% under a two-locus model of the same heritability and gene frequency (p = 0.1). We introduce the notion of joint power, that is, the power to detect linkage to at least one location over a given panel of markers across a genomic region, and describe the effect of several design factors on such joint power in a multipoint scan. Moreover, the power of analysis conditional on the IBD sharing of ESPs at a known/detected locus is examined and shown to increase substantially (to 93.3% under the previous two-locus model) in detecting novel trait loci. We conclude that with such remedies, the ESP design continues to be a relatively powerful design for mapping oligogenic QTL. However, when the effect of individual contributing loci becomes less tractable, especially when their contributions are "asymmetric," deliberation on balancing the two types of statistical errors and a careful examination of possible contributions from multiple genetic factors and/or interaction effects are a must in designing an efficient study.
As a powerful design in the case of quantitative traits influenced by a single gene, the extreme sibpairs (ESP) method can dramatically enhance the power by selecting sibpairs from extreme tails of the trait distribution. [...] α-level in genomic scans, (2) how do we minimize the effect of epistatic interaction on the power or utilize the information, and (3) is the effect of selective sampling still monotonic, and how does it affect the design of such a study?
This investigation is devoted to an examination of the power of ESP designs when a quantitative trait is determined by two genetic loci. We show that although the effect of selective sampling is no longer monotonic, the ESP method remains powerful when used with caution. We see that the power under two-locus models is indeed severely compromised if very stringent α-levels are imposed, making it necessary to strike a balance between false positives and false negatives.
We begin with characterization of two-locus models using the notion of genotypic values within each trait locus. Trait distributions under several two-locus models are presented to demonstrate the increased challenge of devising selective sampling methods. Properties of two ESP statistics are then briefly reviewed. We then describe our method to calculate power of ESPs (either pointwise or jointly) by computing the probability of multipoint IBD distribution conditional on their trait outcomes and estimating the correlation structure of ESP test scores at any two marker locations.
Finally, we present power calculations over a variety of two-locus models and the effects of various kinds of ESP design. Important practical issues concerning oligogenic QTL mapping such as asymmetric contributions of loci, schemes of conditional analysis, utilization of different types of ESPs, threshold of trait selection, and overselection are carefully discussed.
| METHODS |
|---|
We assume that the quantitative phenotype X derives from an additive effect of the overall phenotypic mean (µ), several biallelic loci (g), and a residual term (e),
$$X = \mu + g + e,$$
where g is the genotypic mean conditioned on the genotypes of a person at the two susceptibility loci. Two types of linkage tests using extreme sibpairs are discussed in this article; both are described under "Test statistics" below.
Two-locus model:
We consider, for the most part, two types of two-locus models in this article: an additive model with no interaction and an epistatic model with multiplicative interaction. Unless otherwise stated, the two contributing loci are assumed to have the same gene frequencies and contribute equally to the total heritability of the trait.
Let us denote the two alleles at the first major locus by a and A and the alleles at the second locus by b and B, each with its own allele frequencies. We assume that the alleles a and b contribute to increased risk of the disease/trait under consideration. The residuals are allowed to be correlated among relatives. The genotypic means g, depending on the genotype of a person at the two loci, take values according to the underlying epistatic model as follows. Let -a1, d1, and a1 be the "locus-specific" genotypic values for genotypes AA, aA, and aa, respectively, at the first locus and k - a2, k + d2, and k + a2 be the genotypic values for genotypes BB, bB, and bb, respectively, at the second locus, where d1 and d2 measure locus-specific dominance effects, and k indicates genetic displacement due to the second locus. The additive interaction between the two loci is defined in Table 1.
In a similar fashion, for the multiplicative model we use K1, K1e1, and K1b1 as the "genotypic values" for genotypes AA, aA, and aa, respectively, at the first locus and K2, K2e2, and K2b2 for genotypes BB, bB, and bb, respectively, at the second locus. The multiplicative interaction is defined in Table 2.
The genotypic variance and the heritability under such models may be calculated from the genotypic values and the genotype frequencies.
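As a minimal sketch of such a calculation for the additive model, the Python below computes the genotypic variance and heritability over the nine two-locus genotypes. It rests on assumptions that are ours, not the article's: Hardy-Weinberg proportions at each locus, linkage equilibrium between the loci, and a two-locus genotypic mean equal to the sum of the two locus-specific values. The function names and the `resid_var` argument are hypothetical.

```python
from itertools import product


def locus_freqs(p):
    """Genotype frequencies (AA, aA, aa) for risk-allele frequency p.

    Assumption: Hardy-Weinberg proportions (random mating).
    """
    q = 1.0 - p
    return [q * q, 2.0 * p * q, p * p]


def additive_two_locus(p1, a1, d1, p2, a2, d2, k=0.0, resid_var=1.0):
    """Return (genotypic variance, heritability) for an additive two-locus model.

    Assumption: g is the SUM of the locus-specific genotypic values and the
    two loci are in linkage equilibrium.
    """
    vals1 = [-a1, d1, a1]             # genotypes AA, aA, aa
    vals2 = [k - a2, k + d2, k + a2]  # genotypes BB, bB, bb
    f1, f2 = locus_freqs(p1), locus_freqs(p2)
    # Frequency and genotypic mean of each of the nine two-locus genotypes.
    cells = [(f1[i] * f2[j], vals1[i] + vals2[j])
             for i, j in product(range(3), repeat=2)]
    mean = sum(f * g for f, g in cells)
    var_g = sum(f * (g - mean) ** 2 for f, g in cells)
    return var_g, var_g / (var_g + resid_var)


if __name__ == "__main__":
    var_g, h2 = additive_two_locus(0.1, 1.0, 0.0, 0.1, 1.0, 0.0)
    print(f"genotypic variance = {var_g:.4f}, heritability = {h2:.4f}")
```

With no dominance (d1 = d2 = 0), each locus contributes 2pq·a² to the variance, which gives a quick sanity check on the output; the displacement k shifts the mean but not the variance.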
Selection of sibpairs on their trait values:
The trait values are divided into a certain number of intervals with specified probabilities. Individuals with trait values that fall in either the top or the bottom intervals are said to have extreme-high or extreme-low trait values. An extreme sibpair has its members sampled from either one or both of the extreme tails.
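The selection rule just described can be sketched in a few lines of Python. The labels follow the abbreviations used in this article, read here as ED for one sib in each tail, HC for both sibs in the upper tail, and LC for both sibs in the lower tail; that reading, the function names, and the use of deciles as default cutoffs are illustrative assumptions.

```python
from statistics import quantiles


def decile_cutoffs(trait_values):
    """Return (10th percentile, 90th percentile) of a sample of trait values."""
    cuts = quantiles(trait_values, n=10)  # nine cut points
    return cuts[0], cuts[-1]


def pair_type(x1, x2, low_cut, high_cut):
    """Classify a sibpair as 'ED', 'HC', 'LC', or None (not an extreme pair).

    Assumption: a sib is extreme-high if x >= high_cut and extreme-low if
    x <= low_cut; only pairs with BOTH sibs in a tail are classified here.
    """
    def tail(x):
        if x >= high_cut:
            return "high"
        if x <= low_cut:
            return "low"
        return None

    t1, t2 = tail(x1), tail(x2)
    if t1 is None or t2 is None:
        return None
    if t1 != t2:
        return "ED"
    return "HC" if t1 == "high" else "LC"


if __name__ == "__main__":
    low, high = decile_cutoffs([float(v) for v in range(1, 101)])
    for pair in [(99.0, 2.0), (95.0, 97.0), (3.0, 5.0), (50.0, 99.0)]:
        print(pair, pair_type(*pair, low, high))
```

Pairs with only one sib in a tail are returned as `None` in this sketch; a design that also samples such pairs would need an extra label.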
We know that, under a single-gene model, the strength of the selective sampling method comes from the fact that by sampling sibpairs from the extreme tails of the trait distribution, the probability of their IBD sharing (2 or 0) is dramatically increased. It is only natural to expect that, under multilocus models, this enhancement will be weakened, and that its magnitude will be less clear and will depend heavily on the mode of interaction of the multiple loci. In Fig 1A, we plot the trait distribution of a two-locus model of additive gene interaction (no epistasis), assuming that both loci contribute equally to the trait (symmetric model). We see that the effect of selection is nearly linear as in the one-locus case, except that its magnitude is smaller. In another case of a (symmetric) two-locus multiplicative model, as plotted in Fig 1B, the selection of high trait values becomes less effective because multiple distinct genotypes linger near the upper extreme tail.
In Fig 2, we plot the trait distribution of an asymmetric two-locus model with additive gene action. The effect of selection on HC extreme sibpairs becomes more complicated since six genotypes for extreme sibpairs (including all possible genotypes at locus A) are now crowded in the upper extreme tail. Clearly the power to detect linkage to locus A will be much reduced compared to that to locus B, although both loci contribute the same heritability of 9.6% to the overall heritability.
Test statistics:
We consider two types of ESP study designs. One is Risch-Zhang's extreme sibpairs statistic (RISCH and ZHANG 1995). Let πki be the proportion of genes shared IBD by the ith sibpair with k sibs affected, and nk be the number of such sibpairs in the sample. If ED pairs are used, the Risch-Zhang statistic is computed from the IBD proportions of those pairs.
As an alternative, if there are sufficient numbers of both ED and HC sibpairs in the sample, the combined EDAC statistic (GU et al. 1996) may be used.
Suppose that the statistics are calculated at each marker in a panel of markers. We denote by Xk the statistic at the kth marker on the map. The power of ESP tests depends on, among other things, the expected IBD distribution conditional on the phenotypes of a sibpair.
The formulas to calculate the power and/or sample size at a single marker locus are summarized below for convenience.
For the ED test, the power at significance level α is given by Equation 1, which is expressed in terms of Φ, the cumulative distribution function of the standard normal. The necessary number of ED sibpairs to achieve a power of 1 - β is obtained by solving Equation 1 for the sample size.
For the EDAC test, the power is given analogously by Equation 2. To calculate the necessary sample size for applying the EDAC test for a given power of 1 - β and significance level α, we first determine the ratios (r0, r2) between the numbers of ED pairs and the expected numbers of LC and HC pairs in the sample. We then express n0 and n2 in terms of these ratios and the number of ED pairs and calculate the required number of ED sibpairs for the given power.
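To illustrate how power and sample size relate to the expected IBD distribution, here is a generic normal-approximation sketch in Python. It is not a transcription of Equation 1 or Equation 2. It assumes a one-sided z-test of whether the mean IBD proportion of ED pairs falls below its null value of 1/2 (null variance 1/8 per pair), with the IBD distribution (z0, z1, z2) conditional on the pair's phenotypes supplied by the user. All function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()
NULL_MEAN = 0.5             # expected IBD proportion for sibs under no linkage
NULL_SD = sqrt(1.0 / 8.0)   # SD of the IBD proportion under (1/4, 1/2, 1/4)


def ibd_moments(z0, z1, z2):
    """Mean and SD of the IBD proportion (0, 1/2, 1) under P(IBD=k) = zk."""
    mean = 0.5 * z1 + z2
    var = 0.25 * z1 + z2 - mean ** 2
    return mean, sqrt(var)


def ed_power(n_pairs, z, alpha):
    """Approximate power of a one-sided test for reduced sharing in ED pairs."""
    mean, sd = ibd_moments(*z)
    crit = _STD_NORMAL.inv_cdf(1.0 - alpha)
    shift = sqrt(n_pairs) * (NULL_MEAN - mean)
    return _STD_NORMAL.cdf((shift - crit * NULL_SD) / sd)


def ed_sample_size(z, alpha, power):
    """Smallest number of ED pairs reaching the target power (same test)."""
    mean, sd = ibd_moments(*z)
    z_alpha = _STD_NORMAL.inv_cdf(1.0 - alpha)
    z_beta = _STD_NORMAL.inv_cdf(power)
    return ceil(((z_alpha * NULL_SD + z_beta * sd) / (NULL_MEAN - mean)) ** 2)


if __name__ == "__main__":
    z = (0.40, 0.50, 0.10)  # illustrative IBD distribution, not from the article
    n = ed_sample_size(z, alpha=0.0001, power=0.90)
    print(n, round(ed_power(n, z, alpha=0.0001), 4))
```

Under the null distribution (1/4, 1/2, 1/4) the function returns a power equal to α, which is a useful check that the critical value and null variance are wired correctly.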
Calculating the power of multipoint ESP scans:
Under the multilocus/multipoint setting, to correctly calculate the power of the ESP tests, one needs a correct estimate of the expected IBD proportions across a set of markers conditional on the quantitative trait values of the sibpairs. The expected IBD sharing can be expressed as a sum of the products of the IBD probabilities at the trait loci conditional on the phenotypic values and the IBD probabilities at the marker loci conditional on the IBD sharings at the trait loci,
$$P(\pi_m \mid P) = \sum_{\pi_t} P(\pi_t \mid P)\, P(\pi_m \mid \pi_t),$$
where P(πt|P) is calculated by assuming that the trait values of a sibpair follow a bivariate normal distribution. The conditional probability P(πm|πt) may be calculated by the chromosome-based IBD distribution (CB-IBDD) algorithm discussed in GU et al. (1995), as follows. Consider the union map of both marker loci and trait loci. We first calculate the probabilities of all IBD configurations with the IBD sharing at the trait loci fixed at πt. We then derive the conditional probability P(πm|πt) as a proportion of the total sum of all such probabilities.
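The sum-of-products structure can be illustrated for the simplest case of one marker linked to one trait locus. The Python sketch below does not implement the CB-IBDD algorithm; it substitutes the standard two-point IBD transition probabilities for a sibpair, written in terms of ψ = θ² + (1 - θ)² for recombination fraction θ. The posterior IBD distribution at the trait locus is taken as an input, and the function names are hypothetical.

```python
def ibd_transition(theta):
    """T[t][m] = P(marker IBD = m | trait-locus IBD = t) for a sibpair.

    Standard two-point result with psi = theta^2 + (1 - theta)^2, the chance
    that IBD status through one parent is the same at both loci.
    """
    psi = theta ** 2 + (1.0 - theta) ** 2
    flip = 1.0 - psi
    return [
        [psi * psi, 2.0 * psi * flip, flip * flip],
        [psi * flip, 1.0 - 2.0 * psi * flip, psi * flip],
        [flip * flip, 2.0 * psi * flip, psi * psi],
    ]


def marker_ibd_given_phenotype(trait_ibd_post, theta):
    """P(marker IBD = m | phenotypes) = sum_t P(t | phenotypes) * T[t][m]."""
    trans = ibd_transition(theta)
    return [sum(trait_ibd_post[t] * trans[t][m] for t in range(3))
            for m in range(3)]


if __name__ == "__main__":
    posterior = [0.45, 0.45, 0.10]  # illustrative P(trait IBD | phenotypes)
    for theta in (0.0, 0.1, 0.5):
        probs = marker_ibd_given_phenotype(posterior, theta)
        print(theta, [round(p, 4) for p in probs])
```

At θ = 0 the marker inherits the trait-locus distribution unchanged, and at θ = 0.5 it returns to the null (1/4, 1/2, 1/4), which brackets how quickly the signal decays with distance.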
The power to detect linkage under multilocus models can be measured in two ways. One is the pointwise power, i.e., the power to detect linkage at a particular marker location, which is calculated by Equation 1 and Equation 2. The other is the total power of the scan over a panel of markers (called joint power in the sequel), i.e., the power to detect linkage at at least one of the marker locations. We assume that the ESP test scores across all the marker locations jointly follow a multivariate normal distribution and estimate the covariance between the test scores at two marker locations i and j from Pkl, Pk, and P'l, where Pkl is the probability that a pair shares k genes IBD at locus i and l genes IBD at locus j, Pk is the probability that the pair shares k genes IBD at locus i, and P'l is the probability that the pair shares l genes IBD at locus j, all conditional on the sibpair's phenotypes. The calculation involves only joint probabilities of IBD vectors, which are obtained as described at the beginning of this section.
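For two marker locations, the joint power under a bivariate normal model can be evaluated by one-dimensional numerical integration. The Python sketch below makes simplifying assumptions that are ours: each test score is normal with unit variance under the alternative, with means `m1` and `m2` and correlation `rho`, and both tests use the same one-sided critical value. The function names are hypothetical, and a full panel of markers would need a higher-dimensional version of the same idea.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def pointwise_power(mean, alpha):
    """P(Z > c) for Z ~ N(mean, 1) and c the one-sided critical value."""
    crit = _STD_NORMAL.inv_cdf(1.0 - alpha)
    return 1.0 - _STD_NORMAL.cdf(crit - mean)


def joint_power(m1, m2, rho, alpha, steps=4000):
    """P(Z1 > c or Z2 > c) for a bivariate normal pair with correlation rho.

    Computes 1 - P(Z1 <= c, Z2 <= c) by midpoint integration over Z1, using
    the conditional normal distribution of Z2 given Z1. Requires |rho| < 1.
    """
    crit = _STD_NORMAL.inv_cdf(1.0 - alpha)
    cond_sd = sqrt(1.0 - rho * rho)
    lower = min(m1, crit) - 8.0   # density of Z1 is negligible below this
    width = (crit - lower) / steps
    both_below = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * width
        cond_mean = m2 + rho * (x - m1)
        both_below += (_STD_NORMAL.pdf(x - m1)
                       * _STD_NORMAL.cdf((crit - cond_mean) / cond_sd)
                       * width)
    return 1.0 - both_below


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9):
        print(rho, round(joint_power(2.0, 2.0, rho, alpha=0.01), 4))
```

Positive correlation between test scores lowers the joint power relative to independent tests, which is one reason closely spaced markers add less than their number suggests.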
Conditional analysis:
It is quite common in analyses of complex traits that one would detect, in a genome-wide scan, a number of weak-to-moderate signals among which many are possibly false positives.
One way to enhance the weak (but real) signals out of their noisy background is via signal amplification by conducting analysis at the interesting loci conditional on results at the other "candidate" or "detected" loci.
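The conditioning step can be pictured as a simple subsetting operation: keep only the extreme sibpairs with a chosen IBD sharing at the detected locus, then evaluate sharing at the locus of interest within that subset. The Python sketch below shows only this bookkeeping. The data layout (a dict of IBD counts per locus), the function names, and the toy values are hypothetical and are not taken from the article.

```python
def select_by_ibd(pairs, known_locus, ibd_value):
    """Keep the sibpairs whose IBD count at the detected locus equals ibd_value."""
    return [pair for pair in pairs if pair[known_locus] == ibd_value]


def mean_ibd_proportion(pairs, locus):
    """Average proportion of genes shared IBD (count / 2) at the given locus."""
    if not pairs:
        raise ValueError("no sibpairs left after selection")
    return sum(pair[locus] for pair in pairs) / (2.0 * len(pairs))


if __name__ == "__main__":
    # Toy IBD counts (0, 1, or 2) at a detected locus "A" and a second locus "B".
    ed_pairs = [
        {"A": 0, "B": 0},
        {"A": 0, "B": 1},
        {"A": 1, "B": 1},
        {"A": 2, "B": 2},
        {"A": 0, "B": 0},
    ]
    subset = select_by_ibd(ed_pairs, "A", 0)
    print(len(subset), mean_ibd_proportion(subset, "B"))
    print(len(ed_pairs), mean_ibd_proportion(ed_pairs, "B"))
```

The trade-off is visible even in a toy example: conditioning sharpens the contrast at the second locus but shrinks the number of pairs available for the test.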
| RESULTS |
|---|
Power and/or required sample sizes are estimated under a variety of two-locus models. For fixed gene frequencies, fixed mode of epistatic gene action, and fixed ratio of genotypic values within each locus, we let the genotypic value vary to achieve a given overall heritability. Pointwise power, joint power over the two trait-gene locations, and joint power over a panel of 10 markers are all calculated. Cases of both linked and unlinked trait loci are considered. In the case of linked loci, the markers are assumed to be 10 cM apart and the trait loci are located at the fourth and the seventh markers. In the case of unlinked loci, the positions of the loci and the markers remain the same, but the first and the last five markers are assumed to be from two unlinked groups (different chromosomes).
Balancing power and false-positive rate under multilocus models:
For 100 ED sibpairs sampled from the top and bottom 10th percentiles of the trait distribution, we calculated the power to detect linkage at the marker on top of the trait locus A, the joint power to detect linkage at either A or B, and the joint power to detect linkage by scanning the panel of 10 markers. Displayed in Table 3 is the power to detect linkage under single-locus models compared with that under two-locus models, with various preset false-positive rates (α-levels). It is worth noting that not only is the pointwise power severely reduced under two-locus models, but more stringent significance levels also reduce power drastically under two-locus models. Similar results hold when the two loci are linked.
Even when the overall heritability is fairly high, under two-locus models, the pointwise power is close to useless (always <23%) if the stringent α-level of LANDER and KRUGLYAK (1995) is used to infer "significant linkage." However, the joint power increases to 41.2% if two tests are performed on the markers right on top of the two trait loci. Using a less stringent α = 0.0001 (corresponding to one false positive in 18 scans with 400 markers), the power is 60% if the two loci are unlinked and 92% if linked (Table 4 and Table 5). As shown in columns 2-4 of Table 4 and Table 5, scanning multiple markers does improve power, and a couple of markers that are right on top of the two loci are as powerful as scanning a panel of 10 markers placed 10 cM apart. When the two trait loci are linked, scanning multiple markers can still achieve moderate power for somewhat lower heritabilities if a less stringent threshold of significance is used. For example, under the previous model, one achieves a power of 70% at α = 0.01.
Enhancing power by EDAC and conditional analysis:
As we showed in previous work on ESP methods, combining ED pairs with concordant pairs can enhance the power of a linkage analysis.
Displayed in Table 6 is the pointwise power at the location of locus A calculated for 100 ED pairs under a symmetric additive model with p = 0.1, in contrast to the power of EDAC tests that also utilize HC and EC pairs available in the samples. The EDAC method consistently enhances power. For example, at one of the significance levels considered, the power of EDAC is 95.7% as compared to 62.4% for the ED-only design.
The enhanced power by conditional analysis is shown in column 5 of Table 4 and Table 5. Note that since we used an ED-only design, the strategies used in the conditional analyses differed depending on whether the underlying disease loci were linked or not. Fig 3 shows the effect on power of different selection strategies. It is clear that for linked loci, selection on IBD = 0 and selection on IBD = 2 result in very different power as the trait heritability increases. Assuming a symmetric multiplicative model with two linked loci, in Table 7 we display the numbers of ED sibpairs required to achieve a pointwise power of 90% at various significance levels to detect linkage to locus B using unconditional analysis, compared with the numbers required when the conditional analysis is performed by requiring extreme sibpairs to share IBD of 0 at locus A. Sample sizes are reduced substantially using conditional analysis, and the reduction is more substantial when heritability is small. A similar gain in power also holds under additive models, as shown in Table 8.
Effect of polychotomization:
The thresholds of trait values used to select extreme sibpairs are of importance in determining the actual power and cost-effectiveness of a QTL study.
In Table 9, we display the effect of polychotomization by calculating the required ED sample sizes for a power of 90% at α = 0.0001 using different thresholds to select extreme sibpairs, under a symmetric model with additive gene action and additive effect at both loci. For a slightly different symmetric model, we show the required sample sizes of HC pairs to detect loci A and B, respectively, in Fig 4. Both examples demonstrate that, under the symmetric models, the power increases with more extreme thresholds for selection.
Under asymmetric models, however, the monotonicity between power and the selection threshold is likely violated at one locus while maintained at the other, and the power to detect the two loci may not be comparable. For the model used in Fig 2, we plot the required sample sizes of HC pairs (at the locations of A and B, respectively) for a power of 90% at α = 0.0001 against the upper thresholds (percentiles) for extremely high trait values (Fig 5). We see that the power to detect A is slightly reduced when the threshold is >90th percentile, but increases again when it is >98th percentile.
For severely asymmetric models, overselection of extreme trait values could result in a quite noticeable and unexpected power reduction. Let us modify the previous asymmetric model so that it has a dominant effect at each locus and consider various gene frequencies at A: p1 = 0.1, 0.2, and 0.5. The required sample sizes of HC pairs to detect linkage at locus A are plotted in Fig 6. The gene frequencies at locus B are calculated so that locus B gives the same heritability as locus A. Under all the models, power increases with the selection threshold up to a certain level before declining unexpectedly. This shows that selecting for extreme trait values remains an effective strategy when moderately high thresholds are employed (see Table 10, where the power at loci A and B, respectively, is presented under several pairs of selection thresholds).
It seems that, under the models studied, higher gene frequencies at the problematic locus require less extreme selection to avoid power loss. The effect of selection on the power is also less significant in such cases (the curve in Fig 6 is flatter). It is worth noting that, in the above example, at locus B, the effect of selection on power is still proportional to selection (Fig 7A), and the power loss due to overselection occurs much later for ED pairs than for HC pairs (Fig 7B).
| DISCUSSION |
|---|
As the number of complex trait loci increases, the linkage information carried at any given locus tends to weaken, even when the loci are closely linked. We have shown that the extreme sibpair (ESP) methods remain powerful for detection of QTL of oligogenic traits in genome-wide scans. However, the power may be much reduced as compared to single-locus models and hopelessly so when stringent significance thresholds are used. Therefore, closer scrutiny is necessary when applying stringent significance thresholds in a genomic search. A balance must be maintained between the power to detect linkage and the accuracy of a linkage claim, i.e., between false negatives and false positives. An α-level set for infinitely many markers (continuous genome) and an exceptionally low false-positive rate may not be the best choice.
We provided a general algorithm for estimating the power of ESP methods under multilocus models and using multipoint analysis. It is clear that the power of an ESP design depends on the actual genetic model as well as the selection scheme for ESPs. The multilocus nature makes it harder to optimize study designs, since possible interaction among the trait loci and the possible asymmetry of all the genetic effects complicate the choice of a study design for detecting all the trait loci in a single genomic scan. As we have shown here, conditional analysis may improve the chances to delineate the epistatic interactions and detect linkage of smaller effects, by weighting test scores at certain locations according to the results at other locations already detected. We studied the type of conditional analysis that conditions on the IBD sharing of a sibpair at some detected/known genes. This may also be viewed as a type of subgroup analysis in which a subgroup of the sample is selected to enrich the homogeneity within the group. It will not only enhance the power of analysis but also assist in revealing interaction effects. Other weighting schemes may be applied in a similar fashion depending on the types of extreme sibpairs used and the actual results at the detected sites, and that certainly calls for further investigation. The ultimate solution to the problem of complex diseases, however, may depend on cleverly designed replication studies and the application of novel meta-analytical procedures tailored to pool genetic studies (GU et al. 2001).
| ACKNOWLEDGMENTS |
|---|
This work was supported in part by National Institutes of Health grant GM28719 and National Institute of Mental Health grant MH31302.
Manuscript received April 2, 2001; Accepted for publication January 28, 2002.
| LITERATURE CITED |
|---|
CAREY, G. and J. A. WILLIAMSON, 1991 Linkage analysis of quantitative traits: increased power by using selected samples. Am. J. Hum. Genet. 49:786-796.
CHEVERUD, J. M. and E. J. ROUTMAN, 1995 Epistasis and its contribution to genetic variance components. Genetics 139:1455-1461.
COX, N. J., M. FRIGGE, D. L. NICOLAE, P. CONCANNON, and C. L. HANIS et al., 1999 Loci on chromosomes 2 (niddm1) and 15 interact to increase susceptibility to diabetes in Mexican Americans. Nat. Genet. 21(2):213-215.
EAVES, L. and J. MEYER, 1994 Locating human quantitative trait loci: guidelines for the selection of sibling pairs for genotyping. Behav. Genet. 24(5):443-455.
FULKER, D. W., L. R. CARDON, J. C. DEFRIES, W. J. KIMBERLING, and B. F. PENNINGTON et al., 1991 Multiple regression analysis of sib-pair data on reading to detect quantitative trait loci. Read. Writ. Interdisc. J. 3:299-313.
GU, C. and D. C. RAO, 1997a A linkage strategy for detection of human quantitative-trait loci: I. Generalized relative risk ratios and power of sibpairs with extreme trait values. Am. J. Hum. Genet. 61:200-210.
GU, C. and D. C. RAO, 1997b A linkage strategy for detection of human quantitative-trait loci: II. Optimization of study designs based on extreme sibpairs and generalized relative risk ratios. Am. J. Hum. Genet. 61:211-222.
GU, C., B. K. SUAREZ, T. REICH, and A. A. TODOROV, 1995 A chromosome-based method to infer IBD scores for missing and ambiguous markers. Genet. Epidemiol. 12:871-876.
GU, C., A. A. TODOROV, and D. C. RAO, 1996 Combining extremely concordant sibpairs with extremely discordant sibpairs provides a cost effective way to linkage analysis of QTL. Genet. Epidemiol. 13:513-533.
GU, C., M. A. PROVINCE, and D. C. RAO, 2001 Meta-analysis of genetic studies, pp. 255-272 in Genetic Dissection of Complex Traits: Challenges for the Next Millennium. Academic Press, San Diego.
KAO, C.-H., Z-B. ZENG, and R. D. TEASDALE, 1999 Multiple interval mapping for quantitative trait loci. Genetics 152:1203-1216.
KEMPTHORNE, O., 1957 An Introduction to Genetic Statistics. John Wiley & Sons, New York.
LANDER, E. and L. KRUGLYAK, 1995 Genetic dissection of complex traits: guidelines for interpreting and reporting linkage results. Nat. Genet. 11:241-247.
RAO, D. C., 1998 CAT scans, PET scans, and genomic scans. Genet. Epidemiol. 15:1-18.
RAO, D. C. and C. GU, 2001 False positives and false negatives in genome scans, pp. 487-498 in Genetic Dissection of Complex Traits: Challenges for the Next Millennium. Academic Press, San Diego.
RISCH, N. and K. MERIKANGAS, 1996 The future of genetic studies of complex human diseases. Science 273:1516-1517.
RISCH, N. and H. ZHANG, 1995 Extreme discordant sib pairs for mapping quantitative trait loci in humans. Science 268:1584-1589.
ZENG, Z-B., 1994 Precision mapping of quantitative trait loci. Genetics 136:1457-1468.


, and 
, and 


and
. Results are shown when the loci are (a) unlinked and (b) linked.
, required sample sizes of HC pairs for a power of 90% at 
; see 
. The gene frequencies and the total heritabilities for each curve are displayed in the inset.