Abstract
Several issues pertinent to study designs employing extreme sibpairs (ESP) methods to detect complex oligogenic quantitative trait loci (QTL) are investigated in the setting of genomewide multipoint scans. We demonstrate that when stringent α-levels are imposed (e.g., α = 0.000022 as recommended by Lander and Kruglyak), the power to detect a susceptibility locus can drop from 83.6% under a one-locus model to a hopeless 22.8% under a two-locus model of the same heritability (h^{2} = 0.5) and gene frequency (p = 0.1). We introduce the notion of joint power, that is, the power to detect linkage to at least one location over a given panel of markers across a genomic region, and describe the effect of several design factors on such joint power in a multipoint scan. Moreover, the power of analysis conditional on the IBD sharing of ESPs at a known/detected locus is examined and shown to increase substantially (to 93.3% under the previous two-locus model) in detecting novel trait loci. We conclude that with such remedies, the ESP design continues to be a relatively powerful design for mapping oligogenic QTL. However, when the effect of individual contributing loci becomes less tractable, especially when their contributions are “asymmetric,” deliberation on balancing the two types of statistical errors and a careful examination of possible contributions from multiple genetic factors and/or interaction effects are a must in designing an efficient study.
As a powerful design in the case of quantitative traits influenced by a single gene, the extreme sibpairs (ESP) method can dramatically enhance power by selecting sibpairs from the extreme tails of the trait distribution (Carey and Williamson 1991; Fulker et al. 1991; Eaves and Meyer 1994; Risch and Zhang 1995). However, it is well known that for complex traits, an increased number of quantitative trait loci (QTL) reduces the power to detect linkage to any one of them, and it is not as clear how such selection techniques affect the power under multilocus oligogenic models. Possible epistasis among multiple loci may further reduce power to detect a particular QTL (Cheverud and Rothman 1995). As a result, several problems arise immediately: (1) given that the effect of an individual locus will be much smaller, should we still use a stringent α-level in genomic scans; (2) how do we minimize the effect of epistatic interaction on the power, or utilize the information it carries; and (3) is the effect of selective sampling still monotonic, and how does it affect the design of such a study?
This investigation is devoted to an examination of the power of ESP designs when a quantitative trait is determined by two genetic loci. We show that although the effect of selective sampling is no longer monotonic, the ESP method remains powerful when used with caution. We see that the power under two-locus models is indeed severely compromised if very stringent α-levels are imposed, making it necessary to strike a balance between false positives and false negatives (Rao 1998; Rao and Gu 2001). To amplify the effect of a particular locus out of all possible epistatic interactions, we propose to conduct analysis at other sites conditional on the identity by descent (IBD) sharing at the known susceptibility loci and/or the loci where significant linkage was detected in earlier scans. Much reduced sample sizes are required when such conditional analysis is used. Properties of similar conditional schemes for analysis of experimental crosses were thoroughly studied by Zeng and colleagues in their composite interval mapping (CIM) and multiple interval mapping (MIM) methods (Zeng 1994; Kao et al. 1999).
We begin with a characterization of two-locus models using the notion of genotypic values within each trait locus. Trait distributions under several two-locus models are presented to demonstrate the increased challenge of devising selective sampling methods. Properties of two ESP statistics are then briefly reviewed. We then describe our method to calculate the power of ESPs (either pointwise or jointly) by computing the probability of the multipoint IBD distribution conditional on their trait outcomes and estimating the correlation structure of ESP test scores at any two marker locations.
Finally, we present power calculations over a variety of two-locus models and the effects of various kinds of ESP design. Important practical issues concerning oligogenic QTL mapping, such as asymmetric contributions of loci, schemes of conditional analysis, utilization of different types of ESPs, threshold of trait selection, and overselection, are carefully discussed.
METHODS
We assume that the quantitative phenotype X derives from an additive effect of the overall phenotypic mean (μ), the genotypic contribution of several biallelic loci (g), and a residual term (e): X = μ + g + e.
Two-locus model: We consider, for the most part, two types of two-locus models in this article: an additive model with no interaction and an epistatic model with multiplicative interaction. Unless otherwise stated, the two contributing loci are assumed to have the same gene frequencies and to contribute equally to the total heritability of the trait.
Let us denote the two alleles at the first major locus by a and A, with frequencies p_{1} and q_{1} = 1 − p_{1}, and the alleles at the second locus by b and B, with frequencies p_{2} and q_{2} = 1 − p_{2}. We assume that the alleles a and b contribute to increased risk of the disease/trait under consideration. The residuals are allowed to be correlated among relatives (correlation ρ). The genotypic means g, depending on the genotype of a person at the two loci, take values according to the underlying epistatic model as follows. Let −a_{1}, d_{1}, and a_{1} be the “locus-specific” genotypic values for genotypes AA, aA, and aa, respectively, at the first locus and k − a_{2}, k + d_{2}, and k + a_{2} be the genotypic values for genotypes BB, bB, and bb, respectively, at the second locus, where d_{1} and d_{2} measure locus-specific dominance effects and k indicates the genetic displacement due to the second locus. The additive interaction between the two loci is defined in Table 1.
In a similar fashion, for the multiplicative model we use K_{1}, K_{1}e_{1}, and K_{1}b_{1} as the “genotypic values” for genotypes AA, aA, and aa, respectively, at the first locus and K_{2}, K_{2}e_{2}, and K_{2}b_{2} for genotypes BB, bB, and bb, respectively, at the second locus. And the multiplicative interaction is defined in Table 2, with K = K_{1}K_{2}.
The genotypic variance and the heritability under such models may be calculated as in Kempthorne (1957), and their values depend on the gene frequencies, the genotypic displacement within each trait locus, and the magnitude of epistatic interaction between the two loci.
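As a rough illustration of this calculation, the sketch below enumerates the nine two-locus genotypes and computes the broad-sense heritability directly. Several things here are assumptions rather than statements from the source: Tables 1 and 2 are not reproduced in this text, so the additive table is taken to be the sum of the two locus-specific values and the multiplicative table to be K times the product of the locus factors (1, e_{i}, b_{i}); the loci are taken to be in Hardy-Weinberg and linkage equilibrium; and the residual variance is set to 1. The function names are hypothetical.

```python
from itertools import product

def genotype_freqs(p):
    """Hardy-Weinberg frequencies for 0, 1, 2 copies of the risk allele (frequency p)."""
    q = 1.0 - p
    return [q * q, 2.0 * p * q, p * p]

def additive_values(a1, d1, a2, d2, k):
    """g[i][j] for i, j risk-allele copies at loci 1 and 2 (assumed: sum of locus values)."""
    locus1 = [-a1, d1, a1]              # AA, aA, aa
    locus2 = [k - a2, k + d2, k + a2]   # BB, bB, bb
    return [[x + y for y in locus2] for x in locus1]

def multiplicative_values(K, e1, b1, e2, b2):
    """g[i][j] = K * f1[i] * f2[j] (assumed form of the multiplicative table)."""
    f1 = [1.0, e1, b1]
    f2 = [1.0, e2, b2]
    return [[K * x * y for y in f2] for x in f1]

def heritability(g, p1, p2, resid_var=1.0):
    """Broad-sense h^2 = Var(g) / (Var(g) + Var(e)); resid_var = 1 is an assumption."""
    w1, w2 = genotype_freqs(p1), genotype_freqs(p2)
    cells = list(product(range(3), repeat=2))
    mean = sum(w1[i] * w2[j] * g[i][j] for i, j in cells)
    var = sum(w1[i] * w2[j] * (g[i][j] - mean) ** 2 for i, j in cells)
    return var / (var + resid_var)

# Symmetric additive model (p_i = 0.1, a_i = 1, d_i = 0, k = 0)
h2_symmetric = heritability(additive_values(1.0, 0.0, 1.0, 0.0, 0.0), 0.1, 0.1)
# Asymmetric additive model (p_1 = 0.40, a_1 = 0.5, p_2 = 0.015, a_2 = 2, k = 2)
h2_asymmetric = heritability(additive_values(0.5, 0.0, 2.0, 0.0, 2.0), 0.40, 0.015)
```

With a unit residual variance, these two parameter sets give heritabilities close to the values quoted for the same models elsewhere in the article (0.264 and 19.3%), which is consistent with, but does not confirm, the unit-variance assumption.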
Selection of sibpairs on their trait values: The trait values are divided into a certain number of intervals with specified probabilities. Individuals with trait values that fall in either the top or the bottom interval are said to have extreme-high or extreme-low trait values. An extreme sibpair has its members sampled from either one or both of the extreme tails.
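The selection rule can be sketched in a few lines. The labels ED, HC, and LC follow the usage later in the article (one sib in each tail, both in the upper tail, both in the lower tail); the function names, the default 10% tails, and the use of empirical sample percentiles as cut points are illustrative assumptions.

```python
def tail_thresholds(trait_values, lower_frac=0.10, upper_frac=0.10):
    """Empirical cut points for the bottom `lower_frac` and top `upper_frac` of a sample."""
    ordered = sorted(trait_values)
    n = len(ordered)
    low_cut = ordered[max(0, int(lower_frac * n) - 1)]
    high_cut = ordered[min(n - 1, n - int(upper_frac * n))]
    return low_cut, high_cut

def classify_pair(x1, x2, low_cut, high_cut):
    """Return 'HC', 'LC', or 'ED' for an extreme sibpair, or None otherwise."""
    def tail(x):
        if x >= high_cut:
            return "high"
        if x <= low_cut:
            return "low"
        return None

    t1, t2 = tail(x1), tail(x2)
    if t1 == "high" and t2 == "high":
        return "HC"   # both sibs in the upper tail
    if t1 == "low" and t2 == "low":
        return "LC"   # both sibs in the lower tail
    if {t1, t2} == {"high", "low"}:
        return "ED"   # one sib in each tail
    return None       # at least one sib is not extreme

low_cut, high_cut = tail_thresholds(list(range(1, 101)))
```

Pairs with only one extreme member are discarded here, which matches the definition of ED, HC, and LC pairs but is only one possible reading of "either one or both of the extreme tails."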
We know that, under a single-gene model, the strength of the selective sampling method comes from the fact that by sampling sibpairs from the extreme tails of the trait distribution, the probability of their IBD sharing (2 or 0) is dramatically increased. It is only natural to expect that, under multilocus models, this enhancement will be weakened and that its magnitude will be less clear and will depend heavily on the mode of interaction of the multiple loci. In Figure 1a, we plot the trait distribution of a two-locus model of additive gene interaction (no epistasis), assuming that both loci contribute the same to the trait (symmetric model) with p_{i} = 0.20, k = 0, a_{i} = 1, and d_{i} = 0. We see that the effect of selection is nearly linear as in the one-locus case, except that its magnitude is smaller. In another case of a (symmetric) two-locus multiplicative model, as plotted in Figure 1b for p_{i} = 0.2, b_{i} = 4, e_{i} = 2, and K = 1, the selection of high trait values becomes less effective because multiple distinct genotypes linger near the upper extreme tail.
In Figure 2, we plot the trait distribution of an asymmetric two-locus model with additive gene action, for which p_{1} = 0.40, a_{1} = 0.5, d_{1} = 0, p_{2} = 0.015, k = 2.0, a_{2} = 2, and d_{2} = 0. The effect of selection on high-concordant (HC) extreme sibpairs, in which both sibs fall in the upper tail, becomes more complicated since six genotypes for extreme sibpairs (including all possible genotypes at locus A) are now crowded in the upper extreme tail. Clearly the power to detect linkage to locus A will be much reduced compared to that to locus B, although both loci contribute the same heritability of 9.6% to the overall h^{2} = 19.3%.
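The contrast between the two loci can be explored with a small Monte Carlo sketch, shown here under explicit assumptions that are not taken from the source: loci A and B are unlinked and in Hardy-Weinberg equilibrium, the residuals are independent standard normal variables (ρ = 0, unit variance), IBD status is observed without error, and HC pairs are those with both sibs above the empirical 90th percentile. The function name and defaults are hypothetical.

```python
import random

def simulate_hc_sharing(n_pairs=100_000, upper_frac=0.10, seed=1):
    """Mean IBD proportion at loci A and B among simulated high-concordant sibpairs."""
    rng = random.Random(seed)
    freqs = (0.40, 0.015)                 # risk-allele frequencies p_1, p_2
    values = ([-0.5, 0.0, 0.5],           # locus A: -a_1, d_1, a_1
              [0.0, 2.0, 4.0])            # locus B: k - a_2, k + d_2, k + a_2
    pairs = []
    for _ in range(n_pairs):
        traits = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]   # residuals (assumed N(0, 1))
        sharing = []
        for locus in range(2):
            # Four parental alleles: father = indices 0, 1; mother = indices 2, 3.
            parental = [int(rng.random() < freqs[locus]) for _ in range(4)]
            copies = [0, 0]
            ibd = 0
            for parent in range(2):
                picks = [2 * parent + int(rng.random() < 0.5) for _ in range(2)]
                ibd += int(picks[0] == picks[1])   # same parental allele transmitted
                for sib in range(2):
                    copies[sib] += parental[picks[sib]]
            for sib in range(2):
                traits[sib] += values[locus][copies[sib]]
            sharing.append(ibd / 2.0)
        pairs.append((traits[0], traits[1], sharing[0], sharing[1]))
    all_traits = sorted(t for pair in pairs for t in pair[:2])
    cut = all_traits[int((1.0 - upper_frac) * len(all_traits))]
    hc = [pair for pair in pairs if pair[0] >= cut and pair[1] >= cut]
    mean_a = sum(pair[2] for pair in hc) / len(hc)
    mean_b = sum(pair[3] for pair in hc) / len(hc)
    return len(hc), mean_a, mean_b

n_hc, mean_ibd_a, mean_ibd_b = simulate_hc_sharing()
```

The excess of the mean IBD proportion over 1/2 at each locus is what drives the linkage signal for HC pairs, so comparing `mean_ibd_a` and `mean_ibd_b` gives an informal view of why locus A is the harder one to detect under this model.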
Test statistics: We consider two types of ESP study designs. One is the Risch-Zhang extreme sibpairs statistic (Risch and Zhang 1995), which uses a single type of extreme sibpairs. Let k denote the number of sibs “affected” (i.e., in the highest tail) in a sibpair (e.g., k = 1 for extremely discordant, ED, pairs), π_{ki} be the proportion of genes shared IBD by the ith sibpair with k sibs affected, and n_{k} be the number of such sibpairs in the sample. If ED pairs are used, the Risch-Zhang statistic is given by
As an alternative, if there are sufficient numbers of both ED and HC sibpairs in the sample, the combined EDAC statistic (Gu et al. 1996) is calculated as
Suppose that the statistics are calculated at each marker in a panel of markers. We denote by X^{k} the statistic at the kth marker on the map. The power of ESP tests depends on, among other things, the expected IBD distribution conditional on phenotypes of a sibpair:
For the ED test (see Risch and Zhang 1995), the power is
To calculate the necessary sample size for applying the EDAC test for a given power of 1 − β and significance level α, we first determine the ratios (r_{0}, r_{2}) of the expected numbers of low-concordant (LC) and HC pairs to the number of ED pairs in the sample. We then substitute n_{0} = r_{0} · n_{1} and n_{2} = r_{2} · n_{1} and calculate the required number of ED sibpairs (n_{1}) for the given power as shown in Gu et al. (1996); the sample sizes of LC (n_{0}) and HC (n_{2}) pairs are then computed from their respective ratios to n_{1}.
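For orientation, a generic normal-approximation version of an ED-only test, its power, and the matching sample size can be sketched as follows. This is not claimed to be the exact Risch-Zhang or EDAC formula: it assumes a one-sided z-test on the mean IBD proportion of ED pairs, fully informative markers (so that the IBD proportion of a sibpair has null mean 1/2 and null variance 1/8), and a user-supplied expected IBD proportion under the alternative. All function names and the value 0.4 used below are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()
NULL_MEAN = 0.5        # expected IBD proportion of a sibpair under no linkage
NULL_VAR = 1.0 / 8.0   # its variance under no linkage (fully informative marker)

def ed_z_score(ibd_proportions):
    """One-sided z-score; large values indicate deficient sharing among ED pairs."""
    n = len(ibd_proportions)
    mean_pi = sum(ibd_proportions) / n
    return (NULL_MEAN - mean_pi) / sqrt(NULL_VAR / n)

def ed_power(n_pairs, expected_pi, alpha, alt_var=NULL_VAR):
    """Approximate power when ED pairs have mean IBD proportion `expected_pi`."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    shift = (NULL_MEAN - expected_pi) * sqrt(n_pairs)
    return 1.0 - STD_NORMAL.cdf((z_alpha * sqrt(NULL_VAR) - shift) / sqrt(alt_var))

def ed_sample_size(expected_pi, alpha, power, alt_var=NULL_VAR):
    """Number of ED pairs (not rounded) giving the requested power."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    z_beta = STD_NORMAL.inv_cdf(power)
    delta = NULL_MEAN - expected_pi
    return ((z_alpha * sqrt(NULL_VAR) + z_beta * sqrt(alt_var)) / delta) ** 2

# Hypothetical input: ED pairs expected to share 40% of alleles IBD at the tested marker.
n_required = ed_sample_size(expected_pi=0.4, alpha=0.0001, power=0.90)
```

Because the required n grows with the square of the critical value, tightening α from 0.01 to 0.000022 inflates the sample size considerably for the same expected sharing, which is the trade-off examined in the Results.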
Calculating the power of multipoint ESP scans: Under the multilocus/multipoint setting, to correctly calculate the power of the ESP tests, one needs to get the correct estimate of the expected IBD proportions across a set of markers conditional on the quantitative trait values of the sibpairs. The expected IBD sharing can be expressed as a sum of the products of the IBD probabilities at the trait loci conditional on the phenotypic values and the IBD probabilities at the marker loci conditional on the IBD sharings at the trait loci,
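The sum of products described above can be sketched for the simplest case of one trait locus and one marker. The transition probabilities between IBD states at two linked loci in a sibpair are the standard ones, written in terms of ψ = θ^{2} + (1 − θ)^{2} for recombination fraction θ; the trait-locus IBD probabilities passed in below are hypothetical inputs, and the extension to two linked trait loci (which requires the joint IBD distribution at both) is not shown.

```python
def ibd_transition(theta):
    """T[i][j] = P(marker IBD = j | trait-locus IBD = i) for a sibpair."""
    psi = theta ** 2 + (1.0 - theta) ** 2
    return [
        [psi ** 2, 2 * psi * (1 - psi), (1 - psi) ** 2],                 # trait IBD = 0
        [psi * (1 - psi), psi ** 2 + (1 - psi) ** 2, psi * (1 - psi)],   # trait IBD = 1
        [(1 - psi) ** 2, 2 * psi * (1 - psi), psi ** 2],                 # trait IBD = 2
    ]

def marker_ibd_distribution(trait_ibd_probs, theta):
    """Sum over trait-locus IBD states of P(trait IBD | phenotypes) * P(marker IBD | trait IBD)."""
    t = ibd_transition(theta)
    return [sum(trait_ibd_probs[i] * t[i][j] for i in range(3)) for j in range(3)]

def expected_ibd_proportion(ibd_probs):
    """Expected proportion of alleles shared IBD given P(IBD = 0, 1, 2)."""
    return 0.5 * ibd_probs[1] + ibd_probs[2]

# Hypothetical P(IBD = 0, 1, 2 | phenotypes) at the trait locus for an ED pair.
trait_probs = [0.5, 0.4, 0.1]
marker_probs = marker_ibd_distribution(trait_probs, theta=0.1)
marker_pi = expected_ibd_proportion(marker_probs)
```

A useful check is that the deviation of the expected IBD proportion from 1/2 shrinks by the factor 2ψ − 1 = (1 − 2θ)^{2} when moving from the trait locus to the marker, so the signal decays smoothly with distance.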
The power to detect linkage under multilocus models can be measured in two ways. One is the pointwise power, i.e., the power to detect linkage at a particular marker location, which is calculated by Equations 1 and 2. The other is the total power of the scan over a panel of markers (called joint power in the sequel), i.e., the power to detect linkage at least at one of the marker locations. We assume that the joint distribution of the ESP test scores across all the marker locations follows a multivariate normal distribution and estimate the covariance between the test scores at two marker locations as
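To make the distinction between pointwise and joint power concrete, the following Monte Carlo sketch simulates correlated normal test scores along a marker map and counts how often the score at the trait marker, or the maximum score over the panel, exceeds the critical value. It rests on simplifying assumptions that stand in for the covariance estimate used in the article: a single trait locus sitting exactly on a marker, unit-variance scores under both hypotheses, Haldane's map function, and a correlation of exp(−4d) between scores at markers d Morgans apart, with the mean score decaying by the same factor away from the trait locus. The function name and the noncentrality value 3.5 are hypothetical.

```python
import random
from math import exp, sqrt
from statistics import NormalDist

def scan_power(delta, trait_index, n_markers=10, spacing_cm=10.0,
               alpha=0.0001, n_sim=20_000, seed=7):
    """Return (pointwise power at the trait marker, joint power over the panel)."""
    rng = random.Random(seed)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    r = exp(-4.0 * spacing_cm / 100.0)   # assumed correlation between adjacent markers
    means = [delta * r ** abs(k - trait_index) for k in range(n_markers)]
    hits_point = 0
    hits_joint = 0
    for _ in range(n_sim):
        noise = rng.gauss(0.0, 1.0)
        scores = []
        for k in range(n_markers):
            if k > 0:
                # First-order autoregressive step reproduces correlation r**|j - k|.
                noise = r * noise + sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0)
            scores.append(means[k] + noise)
        hits_point += scores[trait_index] > z_alpha
        hits_joint += max(scores) > z_alpha
    return hits_point / n_sim, hits_joint / n_sim

# Hypothetical noncentrality of 3.5 at the fourth of 10 markers spaced 10 cM apart.
pointwise, joint = scan_power(delta=3.5, trait_index=3)
```

Because both counts come from the same simulated scans, the joint estimate can never fall below the pointwise one; how much larger it is depends on marker spacing and on how quickly the signal decays away from the trait locus.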
Conditional analysis: It is quite common in analyses of complex traits that one would detect, in a genomewide scan, a number of weak-to-moderate signals, many of which are possibly false positives (Rao and Gu 2001).
One way to enhance the weak (but real) signals out of their noisy background is via signal amplification by conducting analysis at the interesting loci conditional on results at the other “candidate” or “detected” loci. For example, Cox et al. (1999) performed allele-sharing analysis of their diabetic families at markers across chromosome 15 weighted by the evidence for linkage at NIDDM1 on chromosome 2 and obtained a much clearer signal at the CYP19 locus (LOD score of 4.0 vs. 1.3). We consider here the type of conditional analysis based on the IBD status at “known” or detected loci. Namely, depending on the type of extreme sibpairs used in the analysis, we calculate test scores at other locations using only those extreme sibpairs that have specific IBD sharings at the detected/known genes. In other words, we zero-weight the extreme sibpairs with unfavorable IBD sharings at the detected/known genes in the calculation of ESP test scores at other locations. As shown in the Results, conditioning on the IBD status can increase the power substantially.
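The zero-weighting scheme amounts to filtering the sample before computing the test score. The sketch below assumes ED pairs, a one-sided z-score on the mean IBD proportion with null variance 1/8, and that IBD = 0 at the known locus is the "favorable" state; the data are a made-up toy example, and the function names are hypothetical.

```python
from math import sqrt

def z_from_ibd(ibd_proportions):
    """One-sided z-score for deficient IBD sharing (null mean 1/2, null variance 1/8)."""
    n = len(ibd_proportions)
    return (0.5 - sum(ibd_proportions) / n) / sqrt(0.125 / n)

def conditional_z(pairs, favorable_ibd_at_known=(0,)):
    """Score the tested locus using only pairs with a favorable IBD count at the known locus.

    `pairs` holds (IBD count at the known locus, IBD proportion at the tested locus).
    Pairs with any other IBD count at the known locus receive zero weight.
    """
    kept = [pi_test for ibd_known, pi_test in pairs if ibd_known in favorable_ibd_at_known]
    return z_from_ibd(kept), len(kept)

# Toy ED sample: (IBD count at known locus A, IBD proportion at tested locus B).
toy_pairs = [(0, 0.0), (0, 0.0), (0, 0.5), (0, 0.0),
             (1, 0.5), (1, 1.0), (2, 0.5), (2, 1.0)]
z_unconditional = z_from_ibd([pi for _, pi in toy_pairs])
z_conditional, n_kept = conditional_z(toy_pairs)
```

In this toy sample the conditional score is larger even though half the pairs are dropped. Whether that holds in practice depends on the underlying model and on which IBD states are treated as favorable, and the score is only comparable to a standard normal if the two loci segregate independently under the null.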
RESULTS
Power and/or required sample sizes are estimated under a variety of two-locus models. For fixed gene frequencies, fixed mode of epistatic gene action, and fixed ratio of genotypic values within each locus, we let the genotypic value vary to achieve a given overall heritability. Pointwise power, joint power over the two trait-gene locations, and joint power over a panel of 10 markers are all calculated. Cases of both linked and unlinked trait loci are considered. In the case of linked loci, the markers are assumed to be 10 cM apart and the trait loci are located at the fourth and the seventh markers. In the case of unlinked loci, the positions of the loci and the markers remain the same, but the first and the last five markers are assumed to be from two unlinked groups (different chromosomes).
Balancing power and false-positive rate under multilocus models: For 100 ED sibpairs sampled from the top and bottom 10th percentiles of the trait distribution, we calculated the power to detect linkage at the marker on top of the trait locus A, the joint power to detect linkage at either A or B, and the joint power to detect linkage by scanning the panel of 10 markers. Displayed in Table 3 is the power to detect linkage under single-locus models compared with that under two-locus models, with various preset false-positive rates (α-levels). It is worth noting not only that the pointwise power is severely reduced under two-locus models, but also that more stringent significance levels reduce power drastically under two-locus models. Similar results hold when the two loci are linked.
Even when the overall heritability is fairly high (h^{2} = 0.50), under two-locus models, the pointwise power is close to useless (always <23%) if the stringent α = 0.000022 (Lander and Kruglyak 1995) is used to infer “significant linkage.” However, the joint power increases to 41.2% if two tests are performed on the markers right on top of the two trait loci. Using a less stringent α = 0.0001 (corresponding to one false positive in ~18 scans with 400 markers), the power is ~60% if the two loci are unlinked and 92% if linked (Tables 4 and 5). As shown in columns 2–4 of Tables 4 and 5, scanning multiple markers does improve power, and testing a couple of markers right on top of the two loci is as powerful as scanning a panel of 10 markers placed 10 cM apart. When the two trait loci are linked, scanning multiple markers can still achieve moderate power for somewhat lower heritabilities, provided that a less stringent threshold of significance is used. For example, under the previous model, one achieves a power of 70% at α = 0.01 for h^{2} = 25%.
Enhancing power by EDAC and conditional analysis: As we showed in previous work on ESP methods (Gu et al. 1996; Gu and Rao 1997a,b), combining different types of extreme sibpairs in the sample can significantly enhance the power to detect linkage, and for a variety of single-locus models it is also a more cost-effective design than pursuing only a single type of extreme sibpairs. Our analysis verifies that similar conclusions hold for a variety of two-locus models.
Displayed in Table 6 is pointwise power at the location of locus A calculated for 100 ED pairs under a symmetric additive model with p = 0.1, in contrast to the power of EDAC tests that also utilize HC and EC pairs available in the samples. The EDAC method consistently enhances power. For example, at the significance level of α = 0.00074, the power of EDAC is 95.7% as compared to 62.4% for the ED-only design (h^{2} = 0.4).
The enhanced power by conditional analysis is shown in column 5 of Tables 4 and 5. Note that since we used an ED-only design (n_{ED} = 100), the strategies used in the conditional analyses differed depending on whether the underlying disease loci were linked or not. Figure 3 shows the effect on power of different selection strategies. It is clear that for linked loci, selection on IBD = 0 and on IBD = 2 results in very different power as the trait heritability increases. Assuming a symmetric multiplicative model with two linked loci, in Table 7 we display the numbers of ED sibpairs required to achieve a pointwise power of 90% at various significance levels to detect linkage to locus B using unconditional analysis, compared with the numbers required when the conditional analysis is performed by requiring extreme sibpairs to share IBD of 0 at locus A. Sample sizes are reduced substantially using conditional analysis, and the reduction is more substantial when heritability is small. A similar gain in power also holds under additive models, as shown in Table 8.
Effect of polychotomization: The thresholds of trait values used to select extreme sibpairs are of importance in determining the actual power and cost-effectiveness of a QTL study (see Gu and Rao 1997b for a discussion on determining optimal thresholds). Under multilocus models, the effect of polychotomization on power is no longer monotonic, nor is it comparable among trait loci, especially when the genetic contributions of the two trait loci are severely asymmetric. Nevertheless, over a variety of two-locus models our calculations show that ESP methods enhance power to detect linkage (as long as one maintains balance between the detection of either or both of the trait loci in the same scan and avoids overselection under “severely asymmetric” models, which may result in “unexpected” power decay).
In Table 9, we display the effect of polychotomization by calculating the required ED sample sizes for a power of 90% at α = 0.0001 using different thresholds to select extreme sibpairs, under a symmetric model with additive gene action and additive effect at both loci (p_{i} = 0.1, a_{i} = 1.0, d_{i} = 0, and k = 0; ρ = 0, h^{2} = 0.264). For a slightly different symmetric model (p = 0.2, k = 2), we show the required sample sizes of HC pairs to detect loci A and B, respectively, in Figure 4. Both examples demonstrate that, under the symmetric models, the power increases with more extreme thresholds for selection.
Under asymmetric models, however, the monotonicity between power and the selection threshold is likely violated at one locus while maintained at the other, and the power to detect the two loci may not be comparable. For the model used in Figure 2 (p_{1} = 0.40, a_{1} = 0.5, d_{1} = 0, p_{2} = 0.015, k = 2.0, a_{2} = 2, and d_{2} = 0), we plot the required sample sizes of HC pairs (at the locations of A and B, respectively) for a power of 90% at α = 0.0001 against the upper thresholds (percentiles) for extremely high trait values (Figure 5). We see that the power to detect A is slightly reduced when the threshold is >90th percentile, but increased again when it is >98th percentile.
For severely asymmetric models, overselection of extreme trait values could result in quite noticeable unexpected power reduction. Let us modify the previous asymmetric model so that it has a dominant effect at each locus (d_{1} = d_{2} = 1) and consider various gene frequencies at A: p_{1} = 0.1, 0.2, and 0.5. The required sample sizes of HC pairs to detect linkage at locus A are plotted in Figure 6. The gene frequencies at locus B are calculated so that it gives the same heritability as locus A. Under all the models, power increases with selection threshold up to a certain level before declining unexpectedly. This shows that selecting for extreme trait values remains an effective strategy when moderately high thresholds are employed (see Table 10, where the power at loci A and B, respectively, is presented under several pairs of selection thresholds).
It seems that, under the models studied, higher gene frequencies at the problematic locus require less extreme selection to avoid power loss. The effect of selection on the power is also less significant in such cases (the curve in Figure 6 is flatter). It is worth noting that, in the above example, at locus B, the effect of selection on power is still proportional to selection (Figure 7a), and the power loss due to overselection occurs much later for ED pairs than for HC pairs (Figure 7b).
DISCUSSION
An increasing number of complex trait loci tends to weaken the linkage information carried at any given locus, even when the loci are closely linked. We have shown that the extreme sibpair (ESP) methods remain powerful for detection of QTL of oligogenic traits in genomewide scans. However, the power may be much reduced as compared to single-locus models, and hopelessly so when stringent significance thresholds are used. Therefore, closer scrutiny is necessary when applying stringent significance thresholds in a genomic search. A balance must be maintained between the power to detect linkage and the accuracy of a linkage claim, i.e., between false negatives and false positives (Rao 1998; Rao and Gu 2001). For example, the α-level derived for infinitely many markers (continuous genome) and an exceptionally low false-positive rate may not be the best choice. Moreover, in light of a recent article by Risch and Merikangas (1996), a candidate gene method may be the one to be used for fine-mapping of the trait genes. An extremely stringent significance level will often miss loci with small effects.
We provided a general algorithm for estimating the power of ESP methods under multilocus models and using multipoint analysis. It is clear that the power of an ESP design depends on the actual genetic model as well as the selection scheme for ESPs. The multilocus nature makes it harder to optimize study designs, since possible interaction among the trait loci and the possible asymmetry of the genetic effects complicate the choice of a study design for detecting all the trait loci in a single genomic scan. As we have shown here, conditional analysis may improve the chances to delineate the epistatic interactions and detect linkage of smaller effects, by weighting test scores at certain locations according to the results at other locations already detected. We studied the type of conditional analysis that conditions on the IBD sharing of a sibpair at some detected/known genes. This may also be viewed as a type of subgroup analysis in which a subgroup of the sample is selected to enrich the homogeneity within the group. It will not only enhance the power of analysis but also assist in revealing interaction effects. Other weighting schemes may be applied in a similar fashion depending on the types of extreme sibpairs used and the actual results at the detected sites, and that certainly calls for further investigation. The ultimate solution to the problem of complex diseases, however, may depend on cleverly designed replication studies and the application of novel meta-analytical procedures tailored to pool genetic studies (Gu et al. 2001).
Acknowledgments
This work was supported in part by National Institutes of Health grant GM28719 and National Institute of Mental Health grant MH31302.
Footnotes

Communicating editor: G. A. Churchill
 Received April 2, 2001.
 Accepted January 28, 2002.
 Copyright © 2002 by the Genetics Society of America