Group Sequential Methods and Sample Size Savings in Biomarker-Disease Association Studies
R. Aplenc<sup>a,b</sup>, H. Zhao<sup>b</sup>, T. R. Rebbeck<sup>a</sup>, and K. J. Propert<sup>a</sup>
<sup>a</sup> Department of Biostatistics and Epidemiology, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania 19104-6021
<sup>b</sup> Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104
Corresponding author: R. Aplenc, 9th Fl., Blockley Hall, 423 Guardian Dr., Philadelphia, PA 19104-6021. E-mail: raplenc@cceb.med.upenn.edu
Communicating editor: Z-B. ZENG
| ABSTRACT |
|---|
Molecular epidemiological association studies use valuable biosamples and incur costs. Statistical methods for early genotyping termination may conserve biosamples and costs. Group sequential methods (GSM) allow early termination of studies on the basis of interim comparisons. Simulation studies evaluated the application of GSM using data from a case-control study of GST genotypes and prostate cancer. Group sequential boundaries (GSB) were defined in the EAST-2000 software and were evaluated for study termination when early evidence suggested that the null hypothesis of no association between genotype and disease was unlikely to be rejected. Early termination of GSTM1 genotyping, which demonstrated no association with prostate cancer, occurred in >90% of the simulated studies. On average, 36.4% of biosamples were saved from unnecessary genotyping. In contrast, for GSTT1, which demonstrated a positive association, inappropriate termination occurred in only 6.6%. GSM may provide significant cost and sample savings in molecular epidemiology studies.
Although group sequential methods (GSM) are routinely used to monitor randomized clinical trials, they have not yet been widely applied to molecular epidemiology (ME) studies. In clinical trials, GSM allow early closure of one or more treatment arms on the basis of interim analysis.
Early closure for "futility," in which the study is unlikely to lead to rejection of the null hypothesis, is becoming more commonly used in clinical trials. Although ME studies lack this ethical imperative for early closure, such studies would benefit from early closure for futility for several reasons. First, such studies often use biologic samples that are difficult to obtain or limited in quantity. Second, genotype assessment incurs both material and labor costs. Thus, early closure for failure to reject the null hypothesis may save samples, reagents, labor, and opportunity costs. Finally, clearly defined interim analysis procedures would provide investigators with a formal tool for evaluating their data on an ongoing basis.
Previous investigators have described the importance of early closure for null effects.
Current molecular epidemiology studies, however, have practical characteristics that preclude these approaches. First, the finite number of available samples and limits on funding time lines prevent the use of an "open" GSM whose sample size is potentially unlimited. Second, almost all molecular epidemiology studies acquire genotype data on a group of samples simultaneously. Thus, the most appropriate GSM must evaluate sequential groups of genotype data rather than sequential individual genotypes. Finally, current studies often evaluate a small number of genotypes (<10), thus making the sample itself the primary limiting variable.
We evaluated group sequential boundary methods because of their widespread use and the availability of commercial GSM software. In GSM, the interim "looks" are frequently equally spaced and their number is predefined at the design stage. These criteria may be relaxed during study conduct. In a case-control study, the test statistic is the χ² value corresponding to the odds ratio of disease between cases and controls. In the case of early stopping for futility, if the χ² test statistic is less than a predefined value, called a boundary value, then it is unlikely that genotyping additional samples will give a statistically significant result. Therefore genotyping stops once the χ² test statistic crosses this boundary. Stopping boundaries may be defined by commercial software packages such as EAST-2000 (Cambridge, MA; http://www.cytel.com) or PEST (Reading, UK; http://www.rdg.ac.uk/mps/mps_home/software/pest4/pest4.htm) or by writing local software.
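The futility check described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the test statistic is the uncorrected Pearson χ² for a 2×2 genotype-by-disease table, and the function names (`chi_square_2x2`, `odds_ratio`, `stop_for_futility`) are hypothetical. The boundary value itself would come from software such as EAST-2000 and is simply passed in here.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 table.

    a = cases with the genotype,    b = cases without it
    c = controls with the genotype, d = controls without it
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        return 0.0  # degenerate table: treat as no evidence of association
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def odds_ratio(a, b, c, d):
    """Odds of carrying the genotype in cases relative to controls."""
    return (a * d) / (b * c)


def stop_for_futility(chi2, boundary):
    """Stop genotyping when the statistic falls below the futility boundary.

    A boundary of None means stopping is not permitted at this look
    (assumed convention for looks where the boundary is undefined).
    """
    return boundary is not None and chi2 < boundary
```

Representing an undefined boundary as `None` mirrors the situation in which early closure is not possible at a given look, so genotyping continues regardless of the interim result.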
Fig 1 demonstrates the evolution of a test statistic in a hypothetical study with eight looks. The study would terminate early to accept the null hypothesis if the path of the test statistic crossed the boundary at any point, as occurs at look number 6. For some choices of parameter values, early closure is not possible. For example, the boundary shown in Fig 1 does not allow closure at the first look, where it is undefined. Therefore, irrespective of the results obtained at the first look, a second round of genotyping would be required.
Simulation studies were used to evaluate the application of GSM. Two previously published data sets of GST genotype and prostate cancer risk were used for the simulations.
O'Brien-Fleming (OBF) stopping boundaries for both rejection and failure of rejection of the null hypothesis at each interval of genotype data acquisition were defined using EAST-2000.
For all simulations, the overall two-sided type I error was set at α = 0.05. Since the sample pool was fixed (N = 675 for GSTM1 and 725 for GSTT1), the power was defined by the sample size, null genotype frequencies in controls, and OR. We chose not to specify a type II error rate to examine the performance of the GSM method over a range of genotype frequencies and ORs. Genotype frequencies in controls were set at 10%, at 50%, and at the genotype frequency observed in the data set used. The observed genotype frequencies were 38% for GSTM1 and 28% for GSTT1. ORs of 1.6, 1.8, and 2.0 were examined. The OR of 1.6 was chosen to correspond to that observed for GSTT1. An OR of 2.0 corresponds to that often used as the target "clinically significant association" for many epidemiological studies. The OR of 1.8 was chosen to be intermediate between these two. For these simulations, the interval of genotype data acquisition was termed a "look." Each look contained a multiple of 90 genotypes to simulate genotype acquisition from a 96-well PCR-based genotyping method (e.g., 90 genotypes and 6 control samples per PCR run).
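To make the dependence of power on sample size, control genotype frequency, and OR concrete, the following sketch uses a standard normal approximation for comparing two proportions in a fixed-sample design. It is an illustration under stated assumptions, not the calculation performed by EAST-2000: the function names are hypothetical, and the split of the sample pool into cases and controls must be supplied by the user because it is not given in this section.

```python
from math import sqrt
from statistics import NormalDist


def case_frequency(p1, odds_ratio):
    """Genotype frequency in cases implied by control frequency p1 and the OR."""
    return odds_ratio * p1 / (1 - p1 + odds_ratio * p1)


def approx_power(n_cases, n_controls, p1, odds_ratio, alpha=0.05):
    """Approximate two-sided power of a fixed-sample two-proportion z-test.

    Uses the unpooled standard error; a rough guide only.
    """
    p2 = case_frequency(p1, odds_ratio)
    se = sqrt(p1 * (1 - p1) / n_controls + p2 * (1 - p2) / n_cases)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    z = abs(p2 - p1) / se
    # Probability that the statistic lands in either rejection tail
    return NormalDist().cdf(z - z_crit) + NormalDist().cdf(-z - z_crit)


# Hypothetical case/control split of a 675-sample pool, for illustration only
for oratio in (1.6, 1.8, 2.0):
    print(oratio, round(approx_power(340, 335, 0.38, oratio), 3))
```

With the OR fixed at 1.0 the function returns the type I error rate, which is a quick sanity check on the approximation.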
In addition to the simulation parameters defined by the baseline frequencies and OR, three different look strategies were examined. The first strategy had two looks, with the interim look occurring after 50% of the samples had been genotyped. The second strategy used the maximum number of possible looks, given the sample size and the restriction that each look (except the last) must include a multiple of 90 samples. The third strategy chosen was intermediate between these. Thus simulations for GSTM1 examined two, three, or seven looks; two, four, or eight looks were examined for GSTT1.
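The maximum-look strategy can be illustrated with a short schedule generator. The rule used here is an inference rather than something stated in the text: each look adds one batch of 90 samples, and the final look absorbs whatever remainder is left. Under that assumption the schedule reproduces the seven looks reported for GSTM1 (N = 675) and the eight looks reported for GSTT1 (N = 725). The function name `max_look_schedule` is hypothetical.

```python
def max_look_schedule(n_total, batch=90):
    """Cumulative sample sizes for the maximum-look strategy.

    Assumed rule: floor(n_total / batch) looks, each interim look ending on
    a multiple of `batch`, with the last look taking all remaining samples.
    """
    n_looks = n_total // batch
    if n_looks < 1:
        return [n_total]  # fewer samples than one batch: a single look
    return [batch * k for k in range(1, n_looks)] + [n_total]


for n in (675, 725):
    schedule = max_look_schedule(n)
    print(n, len(schedule), schedule)
```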
A total of 1000 replications were performed for each of the 27 combinations of baseline gene frequency, OR, and number of looks. Simulations were done separately for the GSTM1 and GSTT1 data sets. For each replication, prostate cancer cases and controls were randomly sampled from the true data sets without replacement and in proportion to their relative frequencies. The observed OR and χ² test statistic were calculated for each look. The χ² test statistic was then compared to the boundary value calculated by EAST-2000 for study termination. If the test statistic was less than the boundary for early stopping, i.e., if the test statistic "crossed the boundary," then the run terminated. If the test statistic did not cross the boundary, then an additional look was selected and the test statistic recalculated, accounting for the information gained in the prior look. This procedure was repeated until the test statistic crossed a boundary or all genotypes were sampled. All simulation studies and analyses were performed using STATA v7.0 (College Station, TX).
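A single replication of this procedure might look like the sketch below. It is a simplified illustration, not the authors' STATA code: subjects are drawn in one random order without replacement (which keeps cases and controls only approximately in proportion at each look), the statistic is the uncorrected Pearson χ² on the cumulative data, and the futility boundaries in the example are placeholder numbers, not EAST-2000 output. All names are hypothetical.

```python
import random


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 table."""
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if margins == 0 else n * (a * d - b * c) ** 2 / margins


def run_replicate(subjects, look_sizes, futility_bounds, rng):
    """Simulate one study with futility stopping.

    subjects        : list of (is_case, has_genotype) tuples
    look_sizes      : cumulative sample sizes, ending at len(subjects)
    futility_bounds : one boundary per look (None = stopping not allowed)
    Returns (number of samples genotyped, stopped_early).
    """
    order = rng.sample(subjects, len(subjects))  # random order, no replacement
    for n, bound in zip(look_sizes, futility_bounds):
        table = [[0, 0], [0, 0]]  # rows: case/control, cols: genotype yes/no
        for is_case, has_genotype in order[:n]:  # cumulative data at this look
            table[0 if is_case else 1][0 if has_genotype else 1] += 1
        chi2 = chi_square_2x2(table[0][0], table[0][1], table[1][0], table[1][1])
        if n < len(subjects) and bound is not None and chi2 < bound:
            return n, True  # statistic crossed the futility boundary
    return len(subjects), False


# Toy data (hypothetical): 360 subjects, genotype unrelated to disease status
rng = random.Random(1)
subjects = [(i % 2 == 0, rng.random() < 0.38) for i in range(360)]
looks = [90, 180, 270, 360]
bounds = [None, 0.1, 0.5, None]  # placeholder values, NOT EAST-2000 output
print(run_replicate(subjects, looks, bounds, rng))
```

Repeating `run_replicate` many times and tabulating the returned sample sizes would give the early-termination rate and average sample number for one parameter combination.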
In the above, we dealt with the potential for early closure by using the boundary values themselves (on the χ² test statistic scale). This method allows application of these methods to test statistics that are not built into standard group sequential software packages. However, it should be noted that an alternative means of conducting monitoring of a molecular epidemiology trial would be to use directly the methods developed for a comparison of two binomials. These methods are available in, for example, EAST-2000.
Results for GSTM1 simulations are shown in Table 1. Overall, 91.5% of the simulations terminated early with a range of 4.5 to 100%. The median genotyped sample size was 459. Thus, use of GSM decreased the median sample size by 32%. Results for the GSTT1 simulations are shown in Table 2. On average, only 6.6% of the GSTT1 simulations terminated early. The median sample size was 714 with the sample size of 725 representing the entire data set. This low frequency of termination is appropriate as an association between GSTT1 genotype and outcome was present in the data set.
Our simulations indicate that GSM may provide significant improvements for case-control molecular epidemiology studies. Our approach of evaluating genotype data in multiples of 90 more closely reflects laboratory data acquisition and is thus directly applicable to large molecular epidemiology studies. For GSTM1 simulations with 80% power, assuming a cost of $3.00 per genotype, the use of GSM would save $650 from a total cost of $2025, in addition to savings in technician time and reagents. These sample size savings came at a relatively small cost to the overall power of the study. The average difference in study power between a fixed sample design and a GSM design for GSTM1 simulations with 80% power was 3.3% (average fixed sample size power was 86.2%; average GSM design power was 82.9%). For these simulations, the average difference in study power between a one-look and a maximum-look strategy was also small (3.3%).
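The savings arithmetic can be checked directly from figures reported in the text: a pool of 675 GSTM1 samples, a median genotyped sample size of 459, and an assumed cost of $3.00 per genotype. The helper name `genotyping_savings` is hypothetical, and the sketch covers genotyping cost only, not technician time or reagents.

```python
def genotyping_savings(n_total, n_genotyped, cost_per_genotype):
    """Return (samples saved, fraction of samples saved, dollars saved)."""
    saved = n_total - n_genotyped
    return saved, saved / n_total, saved * cost_per_genotype


saved, fraction, dollars = genotyping_savings(675, 459, 3.00)
print(saved, round(100 * fraction), dollars)  # prints: 216 32 648.0
print(675 * 3.00)                             # total cost: 2025.0
```

The 216 samples saved correspond to the 32% reduction in median sample size and to roughly $650 of the $2025 total.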
A number of observations may be made regarding the effects of varying model parameters on the probability of early stopping. First, the frequency of early stopping decreased as the study power increased. Although power is affected by the baseline frequency, OR, and sample size, the frequency of early stopping was "monotonic" in power. Thus, in all cases lower-power studies had higher rates of termination and terminated at earlier looks than did models with higher power. This corresponds to the intuition that studies with low power should be more likely to close early because the a priori chance of finding a significant association is very small, even if an association exists. However, appropriately powered models closed appropriately early in the GSTM1 simulations and had low rates of inappropriate closure in the GSTT1 simulations.
The baseline genotype frequency in controls (p1) directly affects the statistical power. GSTM1 models with a baseline frequency p1 = 0.38 or p1 = 0.50 and GSTT1 models with p1 = 0.28 had the highest power for a given OR and number of looks. These higher-power models closed later and had larger average sample numbers. Likewise, simulations with larger OR closed later and had larger average sample numbers than simulations with lower OR for the same p1 and number of looks.
Finally, increasing the number of looks decreased the study power and in general decreased the average sample number. Interestingly, for GSTM1 models with a typical power of 80%, an intermediate number of looks had higher average sample numbers than models with either the minimum or the maximum number of looks. Models with two looks obtained enough genotype information at the first look to close early at a high rate, with attendant sample size savings. This is consistent with the results of similar analyses in clinical trials.
Since our simulations indicate that an intermediate look strategy may give a higher average sample number for studies with 80% power, investigators may wish to choose either a minimum- or a maximum-look strategy. Since the power cost of additional looks is relatively small, the optimal number of looks will be determined largely by the opportunity cost of multiple data analyses as well as by the need to conserve samples and costs. If samples are limited or expensive to assay, investigators may wish to perform multiple looks to minimize the average sample number. However, if neither sample conservation nor cost minimization is an overriding concern, then investigators may wish to perform only one interim analysis.
| ACKNOWLEDGMENTS |
|---|
Supported by the Doris Duke Charitable Foundation (R.A.) and the Leonard and Madilyn Abramson Endowed Chair, National Institutes of Health grant R01-CA85074 (T.R.R.).
Manuscript received September 27, 2002; Accepted for publication December 9, 2002.
| LITERATURE CITED |
|---|
GOULD, A. L., 1983 Abandoning lost causes (early termination of unproductive clinical trials). Proceedings of the Biopharmaceutical Sciences, American Statistical Association, Washington, DC, pp. 31-34.
JENNISON, C., and B. W. TURNBULL, 2000 Group Sequential Methods With Application to Clinical Trials. Chapman Hall/CRC Press, New York.
KAAKS, R., I. VAN DER TWEEL, P. A. VAN NOORD, and E. RIBOLI, 1994 Efficient use of biological banks for biochemical epidemiology: exploratory hypothesis testing by means of a sequential t-test. Epidemiology 5:429-438.
O'BRIEN, P. C. and T. R. FLEMING, 1979 A multiple testing procedure for clinical trials. Biometrics 35:549-556.
O'NEILL, R. T. and C. ANELLO, 1978 Case-control studies: a sequential approach. Am. J. Epidemiol. 108:415-424.
PASTERNAK, B. S. and R. E. SHORE, 1980 Group sequential methods for cohort and case-control studies. J. Chronic Dis. 33:365-373.
POCOCK, S. J., 1982 Interim analyses for randomized clinical trials: the group sequential approach. Biometrics 38:153-162.
REBBECK, T. R., A. H. WALKER, J. M. JAFFE, D. L. WHITE, and A. J. WEIN et al., 1999 Glutathione S-transferase-mu (GSTM1) and -theta (GSTT1) genotypes in the etiology of prostate cancer. Cancer Epidemiol. Biomarkers Prev. 8:283-287.
SATAGOPAN, J. M., D. A. VERBEL, E. S. VENKATRAMAN, K. E. OFFIT, and C. B. BEGG, 2002 Two-stage designs for gene-disease association studies. Biometrics 58:163-170.
SCHOENFELD, D. A., 2001 A simple algorithm for designing group sequential clinical trials. Biometrics 57:972-974.
VAN DER TWEEL, I. and P. A. VAN NOORD, 2000 Sequential analysis of matched dichotomous data from prospective case-control studies. Stat. Med. 19:3449-3464.
WHITEHEAD, J., 1999 A unified theory for sequential clinical trials. Stat. Med. 18:2271-2286.