## Abstract

In the context of genomewide association studies where hundreds of thousands of polymorphisms are tested, stringent thresholds on the raw association test *P*-values are generally used to limit false-positive results. Instead of using thresholds based on raw *P*-values as in Bonferroni and sequential Sidak (SidakSD) corrections, we propose here to use a weighted-Holm procedure with weights depending on the allele frequency of the polymorphisms. This method is shown to substantially improve the power to detect associations, in particular by favoring the detection of rare variants with high genetic effects over more frequent ones with lower effects.

The development of high-throughput genotyping technologies, allied to a deeper understanding of the pattern of human sequence variation, has recently offered the opportunity to perform association studies with hundreds of thousands of single-nucleotide polymorphisms (SNPs) expected to cover the whole genome. Recent successes in human complex diseases such as type II diabetes mellitus (Scott *et al*. 2007; Sladek *et al*. 2007; Zeggini *et al*. 2007), Crohn's disease (Duerr *et al*. 2006; Rioux *et al*. 2007), prostate cancer (Yeager *et al*. 2007), and coronary artery disease (Helgadottir *et al*. 2007; Samani *et al*. 2007) provided compelling proof-of-principle. Most recent genomewide association studies (GWAS) identified “at-risk” SNPs with minor allele frequency (MAF) ∼0.30 and associated with an allelic odds ratio ∼1.30. This could be explained by the fact that the SNP arrays used to perform GWAS were designed under the hypothesis that genetic susceptibility to complex diseases involves common variants conferring moderate to low relative risk (*i.e*., the common disease–common variant hypothesis), so that these arrays are enriched in common variants. However, examples from the literature (see Cambien and Tiret 2007) show that rare variants with stronger effects may also contribute to the genetic architecture of complex diseases, and it might then be of interest to develop statistical methods that favor their detection over that of more frequent variants.

The statistical analysis of such a large number of SNPs requires imposing stringent thresholds on the association tests' *P*-values to control the risk of false positives. In the context of multiple-testing procedures (MTPs) controlling the familywise error rate (FWER), several methods based on raw *P*-values, including Bonferroni, sequential Sidak (SidakSD), and Holm (Holm 1979), are available. These standard MTPs can be made more powerful by weighting *P*-values according to specific criteria (Benjamini and Hochberg 1997; Kropf and Lauter 2002; Kropf *et al*. 2004; Westfall *et al*. 2004). In the field of GWAS, Roeder *et al*. (2007) have recently shown that grouping SNPs into *a priori* sets, subsequently used for defining weights, could considerably improve the power to detect association. Instead of grouping SNPs, we propose here to weight the association tests' *P*-values directly according to each SNP's MAF estimated in the control sample and to apply the weighted-Holm (WH) procedure (Benjamini and Hochberg 1997). This strategy is expected to favor the detection of rare SNPs with a high odds ratio (OR) over that of frequent SNPs with a moderate OR. A simulation study was carried out to study the statistical properties of the proposed WH procedure and demonstrated that this very simple WH procedure can greatly improve the power of GWAS over standard MTPs controlling the FWER.

Following the general notation of Benjamini and Hochberg (1997), we define, for each SNP *i* (*i* = 1, …, *N*) with MAF $f_i$ in the control sample, a standardized weight $w_i$ as

$$w_i = \frac{f_i^{-1}}{\frac{1}{N}\sum_{k=1}^{N} f_k^{-1}},$$

so that the weights average 1 (*i.e*., sum to *N*). Let $p_1, p_2, \dots, p_N$ be the *P*-values of the association test for each SNP *i* (*i* = 1, 2, …, *N*) and let $q_i = p_i / w_i$ be their weighted counterparts. Rank these in increasing order ($q_{(1)} \le q_{(2)} \le \dots \le q_{(N)}$) and finally denote by $w_{(i)}$ the standardized weight corresponding to the SNP associated with $q_{(i)}$. To control the FWER at level α, our proposed sequential WH procedure declares significant at level α any SNP whose $q_{(i)}$ satisfies

$$q_{(j)} \le \frac{\alpha}{\sum_{k=j}^{N} w_{(k)}} \quad \text{for } j = 1, \dots, i.$$

The smaller the MAF, the larger the weight $w_i$ and thus the greater the power for rejecting the null hypothesis of no association with the disease. When all weights are fixed to 1, $\sum_{k=j}^{N} w_{(k)} = N - j + 1$ and this procedure is equivalent to the standard Holm procedure (Holm 1979).
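The weighting and the step-down rule can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code; it assumes the weights are the inverse MAFs rescaled to sum to *N*, and the function names `inverse_maf_weights` and `weighted_holm` are hypothetical.

```python
from typing import List, Sequence


def inverse_maf_weights(mafs: Sequence[float]) -> List[float]:
    """Weights proportional to 1/f_i, rescaled so that they sum to N (mean 1)."""
    inverse = [1.0 / f for f in mafs]
    scale = len(inverse) / sum(inverse)
    return [x * scale for x in inverse]


def weighted_holm(pvalues: Sequence[float], weights: Sequence[float],
                  alpha: float = 0.05) -> List[bool]:
    """Step-down weighted-Holm procedure; returns one rejection flag per SNP."""
    n = len(pvalues)
    # Weighted P-values q_i = p_i / w_i, visited in increasing order.
    order = sorted(range(n), key=lambda i: pvalues[i] / weights[i])
    rejected = [False] * n
    remaining = float(sum(weights))  # sum of w_(k) for k = j, ..., N
    for i in order:
        if pvalues[i] / weights[i] > alpha / remaining:
            break  # the first non-rejection stops the step-down procedure
        rejected[i] = True
        remaining -= weights[i]
    return rejected
```

For instance, with hypothetical MAFs 0.05, 0.30, 0.45 and *P*-values 0.02, 0.30, 0.60, the rarest SNP gets a weight of about 2.35, so its weighted *P*-value (≈0.0085) passes the first threshold 0.05/3 and it is rejected; with all weights set to 1 (plain Holm), nothing is rejected because 0.02 > 0.05/3.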

#### Simulated data:

Data were simulated under a balanced case–control design of 1000 individuals genotyped for *N* independent SNPs. A 10-locus disease model was considered with independent genetic effects. For the *N* − 10 nondisease SNPs, MAFs were randomly drawn from a uniform distribution *U*[0.05–0.50]. For the disease SNPs, MAFs in controls were randomly drawn from *U*[0.05–0.15] for 3 of them, from *U*[0.15–0.25] for another 3, and from *U*[0.30–0.40] for the remaining 4. The corresponding frequencies in cases were set to be compatible with predefined allelic effects, as described below. For each particular setting, MAFs were fixed across the 2000 simulated replicates. Genotypes were then generated under the assumption of Hardy–Weinberg equilibrium in cases and in controls. Association was tested at the 0.05 significance level, using a simple allelic χ²-test statistic with 1 d.f. Different patterns of association and different values for the number *N* of SNPs (1000, 2000, 5000, 10,000, 20,000, 50,000) were investigated. The performance of our proposed WH procedure was compared, in terms of type I error and power, with that of several standard unweighted MTPs, including Bonferroni, Holm, and SidakSD (Holm 1979). The allele frequencies used for computing the weights $w_i$ were estimated from the control sample of each replicate.
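One simulated replicate can be sketched with the Python standard library alone. The sketch assumes 500 cases and 500 controls (our reading of the balanced 1000-individual design) and derives the case MAF from the control MAF through an allelic odds ratio (odds of the minor allele in cases = OR × odds in controls); all function names are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def case_maf(control_maf: float, allelic_or: float) -> float:
    """Minor-allele frequency in cases implied by an allelic odds ratio."""
    odds = allelic_or * control_maf / (1.0 - control_maf)
    return odds / (1.0 + odds)


def minor_allele_count(n_individuals: int, maf: float, rng: random.Random) -> int:
    """Minor alleles carried by n individuals sampled under Hardy-Weinberg equilibrium."""
    # Under HWE the two alleles of an individual are drawn independently.
    return sum((rng.random() < maf) + (rng.random() < maf) for _ in range(n_individuals))


def allelic_chi2_pvalue(case_minor: int, control_minor: int,
                        n_cases: int, n_controls: int) -> float:
    """P-value of the 1-d.f. allelic chi-square test on the 2 x 2 table of allele counts."""
    a, b = case_minor, 2 * n_cases - case_minor
    c, d = control_minor, 2 * n_controls - control_minor
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # monomorphic in the whole sample: no information
    chi2 = (a + b + c + d) * (a * d - b * c) ** 2 / denom
    # Upper tail of a chi-square with 1 d.f.: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def simulate_replicate(control_mafs: Sequence[float], odds_ratios: Sequence[float],
                       n_cases: int = 500, n_controls: int = 500,
                       seed: Optional[int] = None) -> Tuple[List[float], List[float]]:
    """Return (P-values, control-sample MAF estimates) for one replicate."""
    rng = random.Random(seed)
    pvalues, estimated_mafs = [], []
    for f, allelic_or in zip(control_mafs, odds_ratios):
        controls = minor_allele_count(n_controls, f, rng)
        cases = minor_allele_count(n_cases, case_maf(f, allelic_or), rng)
        pvalues.append(allelic_chi2_pvalue(cases, controls, n_cases, n_controls))
        # MAF estimated in controls, as used for the weights (an estimate of 0 would
        # make 1/f undefined; this is very unlikely for MAF >= 0.05 and 1000 alleles).
        estimated_mafs.append(controls / (2 * n_controls))
    return pvalues, estimated_mafs
```

Nondisease SNPs correspond to an odds ratio of 1.0, in which case the case and control MAFs coincide.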

#### Results:

For each studied MTP, we first checked that, under the complete null hypothesis (no SNP associated with the disease), the proportion of replicates in which at least one nondisease SNP was found significantly associated with the disease was always in agreement with the 0.05 threshold for the FWER (data not shown). False-positive SNPs tended, however, to be slightly less frequent with the WH procedure than with the unweighted methods (data not shown). These observations also hold under the different patterns of association considered (see below), confirming that all the studied MTPs correctly controlled the FWER at the desired level of 0.05.
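Under the complete null hypothesis, the weighted-Holm procedure rejects at least one SNP exactly when its first step succeeds, that is, when $\min_i p_i / w_i \le \alpha / \sum_k w_k$. The sketch below uses this to estimate the FWER by Monte Carlo. Unlike the simulations described here, it assumes fixed weights that are independent of the *P*-values and uniformly distributed null *P-*values; the function names are hypothetical.

```python
import random
from typing import List, Sequence


def inverse_maf_weights(mafs: Sequence[float]) -> List[float]:
    """Weights proportional to 1/f_i, rescaled so that they sum to N."""
    inverse = [1.0 / f for f in mafs]
    scale = len(inverse) / sum(inverse)
    return [x * scale for x in inverse]


def weighted_holm_rejects_any(pvalues: Sequence[float], weights: Sequence[float],
                              alpha: float) -> bool:
    """True when the weighted-Holm procedure rejects at least one hypothesis."""
    # If the first (smallest) weighted P-value fails alpha / sum(w), the
    # step-down procedure stops immediately with no rejection.
    return min(p / w for p, w in zip(pvalues, weights)) <= alpha / sum(weights)


def estimate_fwer(weights: Sequence[float], alpha: float = 0.05,
                  n_replicates: int = 2000, seed: int = 0) -> float:
    """Share of null replicates with at least one (false) rejection."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_replicates):
        # Complete null hypothesis: every P-value is uniform on [0, 1).
        pvalues = [rng.random() for _ in weights]
        hits += weighted_holm_rejects_any(pvalues, weights, alpha)
    return hits / n_replicates
```

With weights from MAFs spread over [0.05, 0.50], the estimate should fall close to, and on average not above, 0.05, since the first-step thresholds $w_i\alpha/\sum_k w_k$ add up to α.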

Three different patterns of association were considered when investigating the MTPs' power. For ease of presentation, results obtained with the Bonferroni procedure are not shown since they were very similar to those obtained with the Holm method. In the first situation, disease SNPs with MAF <0.15 were associated with an allelic OR of 1.8, those with MAF between 0.15 and 0.25 with an allelic OR of 1.5, and the last 4 SNPs, with MAF between 0.30 and 0.40, with an OR of 1.3. As shown in Figure 1, the WH procedure was more powerful than the Holm and SidakSD procedures. Interestingly, the power improvement increased with the number of tested SNPs, ranging from an ∼1.3% increase (*e.g*., 0.961 *vs*. 0.947) when *N* = 5000 to a 3.0% increase (*e.g*., 0.817 *vs*. 0.790) when *N* = 50,000. The good performance of the WH procedure relative to the other methods was also reflected in the average number of disease SNPs identified per replicate: when *N* = 5000, the WH procedure detected 3.0% more true disease SNPs than SidakSD, and 7.5% more when *N* = 50,000. When conditioning on at least one disease SNP being identified, the number of detected SNPs was still higher with the WH method, although the gain in efficiency was less pronounced. To illustrate how weighting influenced these results, Table 1 shows the power to detect the effect of each individual SNP when the total number of simulated SNPs was *N* = 20,000. As expected, the WH procedure favored the detection of SNPs with low MAF and high ORs over SNPs with higher MAF but lower ORs, while controlling the FWER at the expected level. The same observations hold for other values of *N* (data not shown).
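The summaries used in this paragraph (power, average number of true disease SNPs detected per replicate, the same average conditional on at least one detection, and per-SNP power as in Table 1) can all be computed from the sets of SNPs declared significant in each replicate. The sketch below assumes that "power" denotes the proportion of replicates in which at least one true disease SNP is detected, which the text does not state explicitly; `PowerSummary` and `summarize` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Set


@dataclass
class PowerSummary:
    any_power: float                 # share of replicates with >= 1 true disease SNP detected
    mean_detected: float             # mean number of true disease SNPs detected per replicate
    mean_detected_given_any: float   # same mean, over replicates with >= 1 detection
    per_snp_power: Dict[int, float]  # detection rate of each disease SNP


def summarize(detected: Sequence[Set[int]], disease_snps: Set[int]) -> PowerSummary:
    """detected[r] holds the indices of SNPs declared significant in replicate r."""
    n = len(detected)
    hits = [len(d & disease_snps) for d in detected]  # true positives per replicate
    positive = [h for h in hits if h > 0]
    return PowerSummary(
        any_power=len(positive) / n,
        mean_detected=sum(hits) / n,
        mean_detected_given_any=sum(positive) / len(positive) if positive else float("nan"),
        per_snp_power={s: sum(s in d for d in detected) / n for s in sorted(disease_snps)},
    )
```

False positives (SNPs outside `disease_snps`) are ignored here; they enter the FWER check instead.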

In the second pattern of association considered, the allelic effects of the 10 disease SNPs were defined such that each corresponded to an attributable fraction of 0.05. Results are summarized in Figure 2. Again, the WH procedure outperformed the unweighted MTPs in terms of both power and average number of detected disease SNPs. When *N* = 1000, the power improvement over SidakSD was ∼16% (0.684 *vs*. 0.589), rising to 41% (0.244 *vs*. 0.172) when *N* = 50,000. In terms of average number of identified disease SNPs, the efficiency of WH over SidakSD increased from 20% when *N* = 1000 to 41% when *N* = 50,000. As in the first pattern, the power to detect SNPs with low MAF but higher OR was increased by the WH procedure (Table 2).
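The text does not give the attributable-fraction formula used. Assuming Levin's formula applied at the allele level, with the allelic OR standing in for the relative risk, $AF = f(OR - 1)/[1 + f(OR - 1)]$, the OR matching a target AF is $OR = 1 + AF/[f(1 - AF)]$. The sketch below (hypothetical function names) computes this OR and the implied case MAF.

```python
def or_from_attributable_fraction(control_maf: float, af: float) -> float:
    """Allelic OR giving attributable fraction `af`, assuming Levin's formula
    AF = f(OR - 1) / (1 + f(OR - 1)) with the OR used in place of the relative risk."""
    return 1.0 + af / (control_maf * (1.0 - af))


def attributable_fraction(control_maf: float, allelic_or: float) -> float:
    """Inverse of the function above, under the same assumption."""
    excess = control_maf * (allelic_or - 1.0)
    return excess / (1.0 + excess)


def case_maf(control_maf: float, allelic_or: float) -> float:
    """Case MAF such that odds in cases = OR x odds in controls."""
    odds = allelic_or * control_maf / (1.0 - control_maf)
    return odds / (1.0 + odds)
```

Under this assumption, an AF of 0.05 requires an OR of about 2.05 for a control MAF of 0.05 but only about 1.15 for a MAF of 0.35, which is consistent with rare disease SNPs again carrying the larger effects in this pattern.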

However, a closer look at Tables 1 and 2 reveals that, in the two patterns of association considered so far, SNPs with low MAF had much more power to be detected than common SNPs, whatever the MTP used. Therefore, a third situation was considered, in which genetic effects were defined such that all disease SNPs had approximately the same chance of being detected. For a given MAF in controls, the MAF in cases was computed using the arcsine transformation such that the power (uncorrected for multiple testing) to detect an allelic case–control difference was 50% at the 0.05 significance level. Results are reported in Figure 3. Again, the WH procedure is more powerful than the unweighted methods, but the gain in power is modest: it reached only ∼1.8% (0.566 *vs*. 0.556) when *N* = 50,000, and the average number of disease sites detected was only 3.9% larger than with the unweighted procedures. Under this scenario, we found that both Holm and SidakSD corrections tend to disadvantage rare disease susceptibility alleles in favor of more frequent ones, as illustrated in Table 3. One explanation for this result could be that the variance of the conditional distribution of the test statistic is lower for frequent alleles. As expected, the WH procedure does the opposite, with a 6.7% power increase for SNP1 (lowest MAF) and a 2% decrease for SNP10 (highest MAF).
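The arcsine construction can be sketched with Cohen's effect size $h = 2\arcsin\sqrt{p_{\text{case}}} - 2\arcsin\sqrt{p_{\text{control}}}$. With *n* alleles per group, the normal approximation gives power $1-\beta$ when $h\sqrt{n/2} = z_{1-\alpha/2} + z_{1-\beta}$, and $z_{1-\beta} = 0$ for 50% power. The sketch assumes 1000 alleles per group (500 individuals) and ignores the opposite tail; function names are hypothetical.

```python
import math
from statistics import NormalDist


def case_maf_for_power(control_maf: float, n_alleles_per_group: int = 1000,
                       alpha: float = 0.05, power: float = 0.5) -> float:
    """Case MAF giving the target (uncorrected) power on the arcsine scale."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)  # 0 for 50% power
    # Required effect size h on the 2*asin(sqrt(p)) scale.
    h = (z_alpha + z_beta) / math.sqrt(n_alleles_per_group / 2.0)
    return math.sin(math.asin(math.sqrt(control_maf)) + h / 2.0) ** 2


def odds_ratio(case_maf: float, control_maf: float) -> float:
    """Allelic odds ratio between cases and controls."""
    return (case_maf / (1.0 - case_maf)) / (control_maf / (1.0 - control_maf))
```

For a control MAF of 0.10 this gives a case MAF of about 0.128 (OR ≈ 1.32). Rarer SNPs need larger ORs to reach the same 50% power.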

To illustrate how the proposed weighting could affect the results of an association study, we compared the results of the weighted-Holm and Holm procedures applied to the analysis of 397,857 SNPs in a sample of 1926 coronary artery disease patients and 2938 controls participating in the Wellcome Trust Case–Control Consortium (WTCCC) initiative (Wellcome Trust Case–Control Consortium 2007). Although no SNP reached statistical significance after correction for multiple testing with either the Holm or the weighted-Holm procedure, Table 4 clearly indicates that the weighted-Holm procedure alters the order of “significance” in favor of less frequent SNPs compared with the Holm procedure. The same phenomenon holds for the SidakSD method (data not shown).

In this brief report, we showed how the weighted-Holm procedure could be used to slightly improve the power of association analyses when the number of tested SNPs is large and when MTPs controlling the FWER are considered. This simple procedure consists of weighting association-test *P*-values by the inverse allele frequencies. It is most appropriate when the polymorphisms underlying the disease of interest are assumed to have low allele frequency and larger genetic effects, rather than moderate/high frequency and smaller effects. Such SNPs with low MAF but larger genetic effects are expected to be of greater clinical importance, since they could “constitute potential diagnostic and prognostic tools” (Cambien and Tiret 2007), even though they may be less easily tagged by current DNA chips (Evans *et al*. 2008). In view of this finding, our method may be better suited to genomewide scans of nonsynonymous variants (Evans *et al*. 2008), which are particularly useful for detecting “rare variants with intermediate penetrances” rather than common SNPs with moderate effects. It must be emphasized that all simulations assumed a uniform distribution of allele frequencies, which is not realistic since SNPs with low MAF are expected to be more frequent over the genome than SNPs with higher MAF. However, a uniform distribution may represent the content of currently available DNA chips, whose number of SNPs is now increasing up to 1 million, as reflected by the frequency spectrum in HapMap (see Figure 5 in the International HapMap Consortium 2005). Another assumption underlying these simulations is independence, *i.e*., absence of linkage disequilibrium (LD), between SNPs. While one cannot rule out the possibility that LD affects the behavior of our WH procedure, it must be stressed that LD is not taken into account in most published GWAS when selecting SNPs that deserve replication in independent studies.
This issue is becoming less important as current DNA chips tend to rely on tagging SNPs, which eliminates part of the dependency between SNPs. All this work used the inverse MAF function to define the weighted *P*-values. We also examined other functions (identity, log, square root, entropy), but the inverse function was the one that best improved the power to detect a low-MAF allele. Weights giving more chance to frequent SNPs did not substantially outperform the SidakSD or the Holm procedure (data not shown). Further work would, however, be required to investigate this point in depth.

Moreover, since in the simulations the weights used with the WH procedure were derived from allele-frequency estimates obtained on each control sample replicate, they were not completely independent of the *P*-values. We do not think this invalidates the approach, since all simulations showed that the WH procedure controls the FWER at the desired level. However, it could affect power, since the weights of rare variants are then estimated with less precision than those of common variants. Additional work will be needed to investigate this issue, for example by evaluating the relevance of accounting for the variance of the allele-frequency estimates in the weights.

Compared with the approach proposed by the WTCCC (Wellcome Trust Case–Control Consortium 2007), which consists of using Bayes factors, our proposed WH procedure makes no assumption about the effect sizes of the different SNPs. However, as shown by our simulations, the performance of the WH procedure depends strongly on the effects of the different SNPs. Should the SNPs with low MAF have effects (measured here by the allelic odds ratio) similar to those of SNPs with high MAF, the procedure would be of no interest. Indeed, it would reduce the power to detect the high-MAF SNPs, which would then be the most easily detectable ones. In this respect, it does just the opposite of the Bayes factor approach. However, in the situation where the expected power to detect SNPs with low and high MAFs is the same with unweighted approaches (see Table 3), the proposed approach will boost the SNPs with low MAF, which are also the most likely to be nonsynonymous SNPs themselves or proxies for nonsynonymous SNPs. It could be of interest to compare the WH procedure with a Bayes factor approach in which priors are higher for SNPs with low MAFs than for SNPs with high MAFs.

As noted by Dudoit *et al*. (2003, p. 74), “within the class of multiple testing procedures that control a given type I error rate at an acceptable level α, one seeks procedures that maximize *power*, that is, minimize a suitably defined type II error rate.” In our work, we chose to control the FWER because it is a more conservative error criterion than the false discovery rate (FDR) and because most published GWAS rely on the FWER rather than the FDR. Our simulations showed that the WH procedure controls the FWER at the correct level. However, the application of the weighted FDR procedure proposed by Benjamini and Hochberg (1997) in the context of GWAS also deserves attention.

## Acknowledgments

We thank François Cambien, Laurence Tiret, and Philippe Broët for helpful discussions on an earlier version of this manuscript. We also are grateful to anonymous reviewers for their helpful comments. This study makes use of data generated by the Wellcome Trust Case–Control Consortium. A full list of the investigators who contributed to the generation of the data is available from http://www.wtccc.org.uk. C.D. is funded by a grant from the Région Ile-de-France. Funding for the project was provided by the Wellcome Trust under award 076113.

## Footnotes

Communicating editor: M. W. Feldman

- Received April 5, 2008.
- Accepted June 27, 2008.

- Copyright © 2008 by the Genetics Society of America