Abstract
Analysis of raw pooled data from distinct studies of a single question generates a single statistical conclusion with greater power and precision than conventional metaanalysis based on withinstudy estimates. However, conducting analyses with pooled genetic data, in particular, is a daunting task that raises important statistical issues. In the process of analyzing data pooled from nine studies on the human leptin receptor (LEPR) gene for the association of three alleles (K109R, Q223R, and K656N) of LEPR with body mass index (BMI; kilograms divided by the square of the height in meters) and waist circumference (WC), we encountered the following methodological challenges: data on relatives, missing data, multivariate analysis, multiallele analysis at multiple loci, heterogeneity, and epistasis. We propose herein statistical methods and procedures to deal with such issues. With a total of 3263 related and unrelated subjects from diverse ethnic backgrounds such as AfricanAmerican, Caucasian, Danish, Finnish, FrenchCanadian, and Nigerian, we tested effects of individual alleles; joint effects of alleles at multiple loci; epistatic effects among alleles at different loci; effect modification by age, sex, diabetes, and ethnicity; and pleiotropic genotype effects on BMI and WC. The statistical methodologies were applied, before and after multiple imputation of missing observations, to pooled data as well as to individual data sets for estimates from each study, the latter leading to a metaanalysis. The results from the metaanalysis and the pooling analysis showed that none of the effects were significant at the 0.05 level of significance. Heterogeneity tests showed that the variations of the nonsignificant effects are within the range of sampling variation. Although certain genotypic effects could be population specific, there was no statistically compelling evidence that any of the three LEPR alleles is associated with BMI or waist circumference in the general population.
WHEN many studies on the same topic differ in terms of statistical inferences and conclusions thereof, combining the information from these separate studies by either metaanalysis or raw data pooling provides a means by which data from the individual studies can be combined to enhance statistical power. The primary advantages of such analyses include (1) reduction of type I errors by consolidating many tests of the same hypothesis with many samples into a single test with one pooled sample; (2) increased statistical power; and (3) direct tests of heterogeneity among samples/populations.
In genetic studies, analysis of pooled data can be especially challenging even if all studies investigated relationships between genotypes at the same gene with the same phenotypes. Genetic data sets usually consist of individuals related to other individuals, resulting in correlated observations. Therefore, methodologies accounting for such correlated observations should be employed. Typically, familial correlations depend upon the degree of relationship between pedigree members. With respect to genotypes, multiple alleles across multiple loci, or markers, within the same gene may be of interest, leading to the need to evaluate multilocus analyses for direct and epistatic effects, i.e., interactions among the multiple alleles. Interactions among genotypes and covariates (i.e., genebyenvironment interaction or effect modification) should also be considered in modeling; the use of appropriate covariates can define more precisely the effects of individual alleles. Pleiotropic (either relational or mosaic) effects of genotypes are also of interest and can be investigated through multivariate analysis. Finally, missing observations on genotypes and other variables are not uncommon.
Codes for the same genotypes and discrete variables are often different from study to study. In addition, there may be increased numbers of missing observations in pooled data because of different lists of covariates and alleles. The number of members in each pedigree is usually different within and among studies. The configurations of degrees of relationship among pedigree members are not the same over the different pedigrees. These characteristics of the pooled data require statistical modeling, allowing flexible construction of the residual covariance matrix. Moreover, the number of independent variables in a model can be very large when all the main effects and interaction effects are included. Heterogeneity of samples (e.g., in demographic characteristics) among the different studies is also a concern and necessitates analysis adjusted for study profiles or study effects. To our knowledge, neither the guidelines nor the statistical software for handling such methodological and practical issues are currently well developed in the genetics literature, though some guidelines do exist in other contexts (e.g., Cooper and Hedges 1994).
In this article, we illustrate these issues and demonstrate some appropriate (and in some cases ad hoc) statistical methodologies and procedures. We used pooled raw data on body mass index (BMI; measured as the weight in kilograms divided by the square of the height in meters) and waist circumference (WC) and the three amino acid substitutions (the polymorphisms or the allelic variants) K109R at exon 2, Q223R at exon 4, and K656N at exon 12 in the leptin receptor (LEPR) gene on human chromosome 1p. In the rest of this article, to denote any variant at a particular exon, we use the word “allele” rather than “polymorphism” in accordance with Elston (2000). The exoncoding nomenclature and the marker names for the alleles used are as in Chung et al. (1997). The alleles at each exon within the LEPR gene are diallelic and the LEPR gene is a candidate gene influencing human obesity.
Interindividual variation in human BMI, direct measures of fat mass, and fat distribution have been clearly shown to have a strong genetic component (Comuzzie and Allison 1998). Mice homozygous for inactivating mutations of Lepr become massively obese (Chuaet al. 1996). Rare instances of human obesity secondary to homozygosity for inactivating LEPR mutant alleles have been identified (Clémentet al. 1998). Recently, several studies have assessed the role of the three common LEPR alleles in interindividual variations in BMI and WC. These analyses yielded inconsistent results (e.g., Gotodaet al. 1997; Matsuokaet al. 1997; Silveret al. 1997; Chagnonet al. 1999; del Giudiceet al. 2000). In an attempt to resolve the physiological significance of the alleles at the LEPR gene with obesityrelated phenotypes, teams of investigators representing nine studies provided their raw data for a pooling analysis (see appendix a).
SUBJECTS
A total of 3263 individuals were included in this study (Table 1). Sixtytwo percent of these individuals are related to one or more subjects in the data set. The largest number of generations among the family pedigrees in the pooled data was two. Descriptive statistics are presented in Table 2 along with percentages of missing observations. The subjects are ethnically diverse—i.e., AfricanAmerican, Caucasian, Danish, Finnish, FrenchCanadian, and Nigerian. Approximately half are female (Table 2).
STATISTICAL ANALYSIS
General model: In general, genetic models can be represented in the following form:
The function f of the phenotype(s) depends upon the model implemented. For example, f(phenotype(s)) is the squared difference of phenotypes of sibling pairs in the HasemanElston regression. The phenotypes can be univariate or multivariate. Genetic effects can be random as in variance components analysis or fixed as in usual association studies. Covariates are discrete and/or continuous, and their coefficients are usually fixed, although modeling random coefficients is possible. The model becomes a mixedeffects model when fixed and random effects are simultaneously included. Interaction effects can be among genetic effects (i.e., locusbylocus interaction, epistasis), among main covariates, and between genetic and main covariates (i.e., locusbyenvironment interactions). The expectation of the vector error term is zero but its covariance may not necessarily be in the form of σI, where I is the nbyn identity matrix and n is the number of subjects under study.
Preliminary analysis: The main objective of the preliminary analysis was to test the association between alleles at each exon and BMI and WC. We analyzed BMI and WC separately for each exon. The main covariates were sex and age and no interaction effects were modeled. In this preliminary analysis, no missing data imputation was employed. The purpose of these approaches was to overview a crude overall association.
For the association analysis, we utilized the ASSOC routine in the S.A.G.E. (1997) software (George and Elston 1987), which allows for familial (residual) correlations. The siblingbased permutation test, a valid joint test for association in the presence of linkage (Allisonet al. 1999), was also utilized to control for potential population admixture. The permutation of observations was performed within each sibship, which therefore leads to inferences conditional on sibship and hence on family. This step eliminates the possibility of confounding by population admixture. In other words, no matter what population the families came from, inferences conditional upon sibship are equivalent to those conditional upon population stratification, because sibships are “subunits” of a population with a uniform degree of admixture. This has been more fully described and demonstrated elsewhere (e.g., Allisonet al. 1999).
We also analyzed the proportion of alleles shared identicalinstate (IIS) by sibling pairs because the alleles at the three exons are of interest. Specifically, we regressed the square of the phenotypic difference between the sibling pairs (cf. HasemanElston procedure; Haseman and Elston 1972) on the proportion of alleles shared IIS by sibling pairs. We also regressed the grandmeancentered cross product of phenotypes between the sibling pairs (cf. New HasemanElston procedure; Elstonet al. 2000) on the proportion of alleles shared IIS by sibling pairs. This type of “IIS analysis” is also valid for association and linkage analysis. We used IIS instead of identicalbydescent (IBD) because, when the marker alleles are causative of variations in the phenotype, IIS linkage analysis should be more powerful than IBD linkage analysis and we were not interested in whether the phenotypes of interest are linked to unknown alleles (in which case IBD analysis would have been more powerful). All of these analyses were adjusted for age and sex.
For this preliminary analysis we took two methodological approaches: metaanalysis and pooling analysis. For the former, we applied the association analyses described above to each data set from each study and then “metaanalyzed” the results followed by heterogeneity analysis (Hedges and Olkin 1985). To combine the P values, we used Fisher's approach (Fisher 1954). For the latter, we applied association and linkage analyses to a single pooled data set. Finally, we compared the results of the metaanalysis with those obtained from the pooling analysis.
Main analysis: The objectives of the main analysis were to test the simultaneous effects of alleles in multiple exons, to test the epistatic effects of alleles in these exons, and to test effect modification by main covariates such as age, sex, diabetes, and ethnicity. BMI and WC were modeled as separate univariate phenotypes and as a combined multivariate phenotype. All exons were simultaneously included in all models. The main covariates included continuous age polynomial variables up to the third degree (i.e., age, age^{2}, age^{3}). In addition, discrete variables (ethnicity, diabetes, sex, and study effects) were dummy coded. The powers of the variable were included because the range of age was wide (3–94 years), and BMI and WC are nonlinearly associated with age over that range, especially in children (RollandCacheraet al. 1982).
Two and threeway interaction effects among genotypes and twoway interaction effects between genotype and main covariates were included in the analysis. However, with regard to twoway interaction effects, age^{2} and age^{3} were excluded to limit the number of independent variables in the models. The study effects were included only as main covariates without any related interaction terms. We used the S.A.G.E. ASSOC routine, fitted ordinary least squares (OLS) regression models, and conducted general linear model (GLM) multivariate analyses for the main analyses. In this main analysis, only pooling analyses were conducted. In the following section, the theoretical and practical issues that arose in conducting the main analysis are described.
Multiple imputation for missing values: There were many missing observations for the phenotypes and covariates, e.g., diabetic status (Table 2), and for genotypes at exon 2 in particular (Table 3). Because deletion of missing values (e.g., listwise deletion) from analyses can introduce biases and inefficient use of collected data (Schafer 1997), we employed the multiple imputation (MI) method proposed by Rubin (1978, 1987, 1996) and described in Schafer (1997), assuming that the missing observations occurred at random as defined in Little and Rubin (1987). Briefly, imputed values for one missing observation are randomly drawn multiple times independently from an underlying probability model, e.g., from a normal distribution whose mean and variance were determined by regression analysis; i.e., the predicted value is the mean, and the mean squared error (MSE) is the variance. For the nonmissing observations, imputation is not necessary.
The MI method is appealing because it accommodates variation due to random imputation in the inference procedure (note that single imputed values are not true observations). In principle, more imputations provide for better inference with increased accuracy. Practically, however, three to five imputations appear satisfactory in terms of efficiency of estimation even with 50% observations missing (Schafer and Olsen 1998). Specifically, the efficiency of estimation by means of MI is 1/(1 + γ/m), where γ is the fraction of missing information and m is the number of imputations (Rubin 1987). This returns 86 and 91% efficiency for m = 3 and 5, respectively, with γ = 0.5, i.e., 50% observations missing. Herein the missing observations were imputed five times.
Random values from the normal distribution can be imputed for missing values of continuous variables. For missing discrete categorical values, however, the categorical values nearest to the randomly generated continuous values can be imputed. This imputation method for the categorical variables is acceptable in general even under normality assumptions (Schafer and Olsen 1998). However, this technique does not guarantee producing imputations with coherent genotypes (i.e., genotypes satisfying Mendel's laws) for members within a family of multiple generations. To avoid possible inconsistencies of imputed genotypes among family members, we employed “restricted” imputation. For instance, when all family members had missing genotypes, only offspring genotypes were imputed for computational simplicity and convenience, or the “aa” genotype was excluded from possible offspring genotypes when the parental mating type was AAAA or AAAa. See appendix b for statistical inferences that include Pvalue calculations on the basis of MI.
Practical issues and ad hoc methods: The interaction terms and dummy codes for genotypes and discrete variables created models with many independent variables, making interpretation of results difficult. Therefore, we applied OLS backward elimination to select significant variables in the presence of genotype variables. In doing so, we first “stacked” the five imputed pooled data sets into one single data set and then assigned a weight of onefifth in count to every data point in the stacked data set, so that original nonmissing values have a weight of 1 and imputed values have a weight of ⅕. By “stack” we mean accumulation of (the five rectangular) missingimputed data sets by case. This procedure still does not account for familial correlations or variations due to MIs and therefore does not produce unbiased standard errors. However, this ad hoc method can help identify significant contributors to specific models; the OLS method provides unbiased coefficient estimates and reasonably good compatibility for testing compared to its counterpart that adjusts for correlations (M. A. Province, T. Rice and D. C. Rao, unpublished results). Therefore, for subsequent statistical analyses we took residualized BMI and WC resulting from OLS regression on some “significant” variables retained from the backward elimination using the stacked data set.
Although the S.A.G.E. ASSOC routine was developed to allow familial correlations and to produce consistent standard error estimates, it has several practical limitations in terms of the number of interactions that can be included. However, one can include more than one locus by coding the other loci as covariates since ASSOC allows multiple covariates to be included in the model. ASSOC provides the estimated coefficients for each covariate and their standard errors. Through the use of the statistics, we can test the effect of the additional loci by a Waldtype test. The choice of which locus is utilized as a genetic locus (“marker” in the S.A.G.E. ASSOC terminology) and which loci are treated as covariates should not affect the inferences about genotypic effects at each locus because the Wald (for the covariates) and likelihoodratio tests (for the genetic locus, the marker) are asymptotically equivalent. On the other hand, most programs for OLS regression are easy and flexible to run and produce unbiased point estimates, but they do produce biased estimates of standard errors. To adjust the bias of the standard errors of the OLS regression, we calculated a “correction factor”: the ratio of average estimated variances of the point estimates from S.A.G.E. ASSOC to that of estimated variances from the OLS regression. This calculation resulted in “corrected” flexible use of OLS in various modelfitting procedures in the presence of correlations among observations across data points. We also tried the OLS approach with the stacked data, using weights of ⅕ to compare the estimates and P values obtained from the MI inference described in appendix b.
With respect to multivariate analysis for testing pleiotropic effects, many approaches can be employed (e.g., Amoset al. 1990; Allisonet al. 1998; Manginet al. 1998). In this study, we employed GLM multivariate analysis without controlling for familial correlations, because P values could be adjusted by using the correction factors described above.
RESULTS
Descriptive statistics: Demographic descriptive statistics are presented in Tables 1 and 2 along with percentages of missing observations. The range of age is large, as are the ranges of BMI and WC. The estimated allele frequencies at exons 2, 4, and 12 were 0.23, 0.48, and 0.20, respectively (Table 3). Using the maximumlikelihood (ML) test for departure from HardyWeinberg equilibrium (HWE) described by Lynch and Walsh (1998, pp. 60–61) there was no evidence for a departure from HWE for alleles at any exon (exon 2, P = 0.285; exon 4, P = 0.597; and exon 12, P = 0.537). The MI resulted in little change in statistical significance (exon 2, P = 0.960; exon 4, P = 0.770; and exon 12, P = 0.374). On the basis of the ML test in Lynch and Walsh (1998, pp. 98–99), however, the three exons are (as expected) all significantly in pairwise linkage disequilibrium regardless of the imputation (exon 2 vs. exon 4, P < 0.001; exon 2 vs. exon 12, P < 0.001; and exon 4 vs. exon 12, P < 0.001).
Table 3 also presents genotypespecific mean BMI and WC by locus. Within each exon, the mean values are almost the same before or after imputation. However, before imputation, overall means of BMI and WC at exon 2 are smaller than those at the other exons because the genotypes at exon 2 in the Baltimore study (Table 1) are all missing and Baltimore study subjects have relatively larger BMI and WC compared to subjects in the other studies. After imputation, on the other hand, the overall mean values become almost the same across the exons. These results, unadjusted for covariates or familial correlations, provide a preliminary view of the association between the phenotypes and genotypes.
Preliminary analysis: Results of the ASSOC analyses are presented in terms of differences of the estimated effects of the two genotypes, “wildtype” homozygote and heterozygote, on BMI and WC, separately, from those of the “mutant” homozygous genotypes (Table 4) after adjusting for age and sex. For example, subjects heterozygous (K109R) for the exon 2 allele had a metaestimate effect size of 0.03 on BMI when compared to subjects with K109K genotype. No single effect was significant from either the individual studies or the metaanalysis (Table 4). The number of subjects for each estimate is presented in Table 5, which also shows P values to assess the significance of the phenotypic variation due to genotypic variation, by exon. The calculation of these P values was based on the contribution of the genotypic variation to the likelihood. The effects were again not significant for either phenotype for any exon from individual studies or from the metaanalysis. Table 6 presents results of the siblingbased permutation association test from individual studies, from the metaanalysis, and from the pooling analysis. In this siblingbased permutation association test, only studies with related subjects were analyzed, because the test requires siblings within each family. Table 6 also shows absence of significant genotypic effects on the phenotypes.
Results of the regression on proportions of alleles shared IIS are presented in Tables 7 and 8. These results were obtained from the regression of squared phenotype differences and the regression of grandmean centered phenotype cross products, respectively. There was no statistical evidence that the alleles examined are associated with (or linked to) significant variation in either phenotype. However, the result of such IIS analysis of one study (Quebec family study) showed a significant linkage of the Q223R allele to BMI (P = 0.04), which is in agreement with the result of the IBD linkage analysis reported in Chagnon et al. (1999; P = 0.02).
Heterogeneity analysis: Heterogeneity analysis (Tables 4, 7, and 8) showed that the nonsignificant results are consistent over the three loci and over all of the individual studies, which implies that the variation in effects over the studies is within the range of sampling variation.
Main analysis: Missing data imputation: The results of missing data imputation are presented in Table 3 with a comparison of descriptive statistics before and after imputation, as presented earlier. Due to the “restricted” imputation adopted herein, ~5% of subjects still have missing genotypes for at least one exon even after imputation. Mean BMIs and WCs over the genotypes at exon 2 were increased after imputation. However, frequencies of alleles at each exon changed little.
Backward elimination and effect modifications: Significance of various effect modifications was evaluated by testing for interaction effects among genotypes and main covariates. We tested such interaction effects by OLS regression with the weighted stacked data. Backward elimination—starting with a full model with main genotype effects, epistatic effects (i.e., interaction effects among genotypes), main covariate effects, and interaction among genotypes and covariates—was applied to identify significant interaction effects. The OLS backward elimination results are listed in Table 9.
Only the interaction effect between R109R at exon 2 and sex was significant for BMI, implying that male subjects with R109R genotype at exon 2 have significantly higher BMI than the other subjects. However, the contribution of this interaction effect to the variations of BMI is minimal (increase in R^{2} < 0.01%). The nonsignificant allelebyenvironment interaction effects suggest that the genotypic effects, if any, might not be modified by the main covariates such as diabetes, sex, age, and ethnicity. In terms of epistatic effects, however, subjects with K109R at exon 2 and N656N at exon 12 (see HT2HM12 in Table 9) appeared to have significantly higher BMIs. Age, sex, ethnicity, diabetes, and study effects are major contributors to the variation of BMI and WC. Therefore, we took the “residualized,” or adjusted, BMI and WC for sex, diabetes, ethnicity, age polynomials, and study effects for the following analyses.
Joint effects of multiple alleles at different loci and epistasis: The results for BMI from the S.A.G.E. ASSOC routine are displayed in Table 10 with the five imputed pooled data sets. No single genotype at any exon has significant effect on BMI. The joint effects of all the genotypes at all exons are also not significant. These results are consistent with those from OLS regression without controlling for the familial correlations (Table 10). Similar results were observed (Table 11) with respect to WC. Interestingly, estimated coefficients, their standard errors, and P values for individual genotypes obtained from the OLS regressions after employing the estimation process described in appendix b were almost exactly the same as those from the OLS regressions with the stacked data (with weights of ⅕; Tables 10 and 11).
The correction factors from both BMI and WC analyses were, however, <1, which is counterintuitive, implying that the standard errors estimated without controlling for familial correlations are bigger than those that include such controlling. In regard to testing epistasis, i.e., two and threeway interaction effects among genotypes over all exons, we applied the OLS regression to accommodate all of the interactions in a particular model. The results are displayed in terms of P values for simultaneous effect of such interactions in Table 12. This analysis shows that the epistatic effects among genotypes of the three exons are not jointly significant from the results of MI nor from the results of weighted OLS, regardless of the presence of the main genotype effects in models after adjusting for sex, diabetes, ethnicity, age polynomials, and study effects.
Multivariate analysis for pleiotropic effects: The results for testing the pleiotropic effects are shown in Table 13. Although P values vary with imputation, no main effects or interaction effects are significant. These results indicate that the polymorphisms do not have any statistically significant simultaneous effects on BMI and WC.
DISCUSSION
We presented practical and ad hoc statistical methodologies that can accommodate many of the challenges encountered in pooled genetic data analysis. Although data pooling may be ideal for enhancement of power of statistical inferences, it is a daunting task to manage and analyze pooled data, as we have seen so far. Even a task as seemingly simple as coordinating the pooling of different data sets by creating a coherent coding system and uniform variable names can prove to be time consuming. Developing a coherent coding system for the pedigree members is important and creating appropriate dummy family members is often required for application of software. Therefore, investigators planning a pooling study should be aware of the large amount of time required for data management before the pooled data are analyzed.
In addition, the availability of statistical software that can handle the analytic problems raised here is limited. This deficiency creates a particular type of problem because, even after all data management issues are resolved, analysts are sometimes forced to change software and/or write additional programs with specific computer languages to conduct appropriate and necessary procedures and analyses. For example, one statistical software package may be able to provide a multipleimputed data set but not be able to generate the estimates from the imputed data using advanced statistical analytical approaches. To obtain such estimates, data analysts generally need to apply a different software package to the multipleimputed data; it is important to note that some types of software are not compatible with all forms of electronic data. This emphasizes another consideration. While there are many forms of genetic data analysis software available, few are flexible enough to meet the rapidly increasing genetic statistical needs. For example, statistical methodologies and theories exist, such as GLM and generalized estimating equations, that can flexibly account for varying familial correlation matrices due to different pedigree structures. At the same time, to our knowledge, there are no statistical packages that can easily handle such variations.
As such, the procedures and analyses proposed herein should be an example for conducting a pooled analysis in the absence of “bona fide” methodologies, although some methodological issues are still in question. For example, the empirical correction factors obtained in this study were all <1 (Tables 10 and 11), implying that the estimated standard errors from OLS methods are bigger than those from methods accounting for familial correlations. This result may initially seem to be counterintuitive, because treating correlated observations as if they were uncorrelated often yields smaller standard errors due to inflated information. However, appropriate control for correlation may yield smaller standard errors. For example, suppose that random variables X and Y are perfectly correlated [i.e., corr(X, Y) = 1] and X = Y + δ for some nonrandom δ. Now, we have n pairs of observed X and Y. If we apply a paired ttest to testing the null hypothesis of δ = 0, the result will be significant no matter how small a nonzero δ, because the estimated standard error of δ, the denominator of the test statistic, will be 0. But if we apply a twosample ttest, which ignores the correlation, then the results will depend on the magnitude of δ and the number of subjects, because the estimated standard error will not be zero. However, the question of whether this reasoning also applies to the situation described in this article is still unanswered. If this were the case in general, P values based on OLS methods would provide only an upper limit of “true” P values. Thus, the OLSbased P values may not provide a reasonable conclusion about hypotheses tests unless the OLS P values are large. Therefore, when the P values from OLS methods are borderline (between 0.05 and 0.10), application of correlationadjusting methods may be needed for more accurate P values.
The MI method adds (or more precisely “allows for”) uncertainty around the unknown missing genotypes by generating multiple randomly imputed genotypes where the imputed values are the predicted values plus some random error for which the expected squared value is equal to the estimated variance of the prediction. This is done multiple times and the variance in the results that occur from imputation to imputation enter into the calculation of standard errors and P values, thereby “penalizing” one for uncertainty rather than artifactually augmenting one's certainty. In some cases, the variance around the imputation is zero because the missing genotypes are known without error by Mendel's laws. In those cases, the imputation adds the correct amount of uncertainty; it just happens that that amount is zero. More broadly, the justification of the regression imputation is that genotypes can be predicted on the basis of observed phenotypes and covariates just as phenotypes can be predicted on the basis of observed genotypes. Although imputing such genotypes does not in and of itself create new information, the regression imputation of missing genotypes in this way allows one to use the full information that is available in a data set by, for example, not requiring one to drop subjects who are missing genotypic information at one locus but who have information at other loci when conducting a multilocus analysis; i.e., it avoids listwise deletions.
Under the null hypothesis of no association, there is no reason to suspect that such imputation, if properly accounted for, would bias the expected results. However, under an alternative hypothesis, MI methods can give different results. This results from (a) reducing possible biases if the data are missing at random (MAR) but not missing completely at random (MCAR) in the terminology of Little and Rubin (1987); (b) increasing the precision of estimates by using all of the information available in the data set in contrast to analysis of only complete cases; and (c) thereby increasing power. In our example, the nonsignificant results were not changed after missing values were imputed. That is, analysis of complete cases and MI yielded equivalent results (data not shown). This further consolidates the null association and serves as a form of sensitivity analysis. MI is now generally accepted and widely advocated among statisticians. However, it has not made significant headway into many applied areas, including genetic research. We hope that this illustration of its use may encourage further adoption of it or something analogous by genetic researchers.
In terms of combining P values, when the dimension of the parameter vector is one, the procedure described in appendix b [when dim(Q_{i}) = 1] is well justified theoretically and by simulation studies (e.g., Rubin 1996). Therefore, it is surprising that the overall combined P values based on weighted OLS are all very close to the P values based on that procedure (Tables 10 and 11), despite the fact that the weighting does not account for incompleteness of random missing imputation. This could support the use of weighting because of its simplicity, effectiveness, and reliability, which in turn may allow its use to be extended to the case of multidimensional parameters. In the cases of multidimensions, on the other hand, the particular procedure adopted herein [appendix b; when dim(Q_{i}) > 1] can be used as only a rough guide—providing an estimate of the range of P values between onehalf and twice the calculated value (Liet al. 1991), although the use of this procedure has been advocated because of its computational simplicity in comparison with other procedures. However, the P values obtained from this procedure and from the weighted OLS are again very similar (Table 12). Calculation of the overall P values for pleiotropic effects (Table 13) based on weighting was not possible because of software limitation.
With respect to allelebyallele interaction analysis, it should be pointed out that when several loci, or markers, are in close physical proximity to each other, as in the current case, and interactions among loci are tested for, such interactions, if observed, may be due to linkage disequilibrium and not true epistasis. To understand why, consider a hypothetical situation of two diallelic loci, A and B, with alleles A_{1} and A_{2} and B_{1} and B_{2}, respectively, in close proximity to each other. Assume that they are in equilibrium and that neither has any individual or interactive effect on the phenotype. Then at some point in history, an allelic variation might have occurred at locus C, which is also very close to A and B, and that C now has alleles C_{1} and C_{2}, with the C_{2} allele conferring a predisposition to increased phenotypic values. Suppose further that C_{2} arose on a chromosome with alleles A_{2} and B_{2} and, due to tight linkage, to this day C_{2} occurs primarily (though not necessarily exclusively) on chromosomes with the A_{2}, B_{2} haplotype. Finally, assume that loci A and B are genotyped in a study but C is not (i.e., C is unobserved). Then, given a sufficiently large sample, one will detect an interaction “effect” of the A and B loci. However, this is solely due to the fact that when A_{2} and B_{2} occur together, they represent a haplotype with a higher likelihood of having a C_{2} allele at the C locus. For the present study, such “phase” information is not available. However, if a polymorphism under study does not cause variation but is both linked to and in disequilibrium with a polymorphism that causes variation in the phenotype, power to detect epistasis may be enhanced through the use of estimated haplotypes rather than single nucleotide polymorphisms (Fallin and Schork 2000; Fallinet al. 2001). The power will depend critically on the degree of disequilibrium between the polymorphism and the causative allele. Nevertheless, the haplotype analysis should be very close (though not identical) to our analyses testing for epistasis.
With respect to comparison between metaanalysis and pooling analysis, both methods should yield similar results in terms of estimated effects and their significance, as was the case in this study. Metaanalysis can be as powerful as pooling analysis in certain particular situations (e.g., Olkin and Sampson 1998). However, pooling analysis has advantages over metaanalysis, including the ability to run analyses in a consistent fashion across studies, to test certain assumptions (e.g., normality), and to run analyses beyond those the original investigators ran. In addition, pooling analysis enables us to consistently provide the same analysis to each data set and impose a degree of quality control on the data analyses that is uniform across all data sets. Finally, raw data pooling allows examination for outliers, use of transformations, and full use of the statistician's usual armamentarium. In this sense, raw data pooling may be preferable to metaanalysis of published summary statistics. Nevertheless, if all analyses are conducted identically and correctly both within each data set separately and in the pooled data set, then there may be no theoretical difference in power.
Interaction of study effects with genotypic effects was assessed by means of testing heterogeneity of effects among studies. The heterogeneity of the effects was not significant, as presented earlier. Because of this nonsignificance and concerns about possible overfitting, we did not include studybyallele interaction effects in the main pooling analysis; i.e., we estimated equal genetic effects across the studies. Although the study effects are large (Table 9), this does not mean that the genotypic effects are different over the studies, but rather that average levels of the phenotypes are different in the different studies. However, we acknowledge that modeling study effects alone may not capture all possible heterogeneity across the study populations due to differences in sampling schemes, degrees of demographic homogeneity within samples, and so on. For example, the Danish samples (Echwaldet al. 1997) were collected under a unique sampling scheme. Moreover, the Finnish samples (Oksanenet al. 1998) may be relatively homogenous compared to the other study samples, although the significance of this remains open to questions (Abbott 2000).
An alternative approach would be a mixedeffects model with cohort as a random effect. Such a model could include empirical Bayes estimation and testing (Carlin and Louis 2000), which we are currently exploring. Such mixedeffects (random effect for studies) model approaches to testing heterogeneity are certainly a reasonable alternative that can be pursued in future work.
On the basis of the results from the backward elimination procedures (Table 9), subjects who are heterozygous at exon 2 and homozygous at exon 4 are significantly greater in imputed missing BMI compared to the other subjects (Table 9). Although these subjects (there were five such) could deserve more investigation, only one subject with this combination is extremely obese, with BMI 51. It is therefore unlikely that this particular allele interaction can cause obesity and there is a limitation in epistasis analysis because we used only three exons; i.e., we do not know the pathways of how the three exons interact with unknown alleles in other exons and introns. Furthermore, if we had adjusted for the number of tests, each adjusted P value would have been much higher than that reported in this article. For example, if all tests were independent, a multipletest adjusted P value p_{b} may be written as p_{b} = 1 − (1 − p)^{t}, where t is the number of tests and p is an unadjusted P value. It follows that p_{b} is 0.06 even with t = 6 and P = 0.01. This further confirms that it is unlikely that all the nonsignificant results from this study are due to type 2 error, although, when the tests are dependent, the adjusted P value will be <0.06 but >0.01. Moreover, even 1% of variation of phenotype due to their allelic variants would have been detected with >99% power with 3000 subjects. As far as confidence intervals of point estimates are concerned, they can be immediately computed from the standard errors provided in the tables.
The lack of association between the amino acid substitutions and obesity indices, despite a large sample, suggests that the substitutions do not affect the phenotypes. While amino acid substitutions may result in either nonfunctioning or poorly functioning proteins, or even functional proteins (if the effect is silent), it is important to note that among complex traits with multiple pathways, such as obesity, the absence of association might not necessarily indicate a lack of effect. It may simply be that persons with the amino acid substitution compensated by other means or that additional genotypic factors may be involved and need to be taken into account before the phenotype becomes manifest. However, the lack of association does not rule out the possibility that the three alleles may influence intermediate traits, or phenotypes, not examined as part of the analyses conducted in this article.
In conclusion, conducting appropriate statistical procedure and analysis of pooled genetic data requires careful data management and flexible adaptation of methods and software to effectively model biological effects of the genes under study. In the absence of welldeveloped guidelines, we hope that the procedures and methods illustrated herein can be useful as an example for future pooling of genetic studies of quantitative trait loci.
Acknowledgments
We thank Drs. Jose Fernandez and Gary Gadbury for their valuable input. This study was supported in part by the National Institutes of Health grants R01DK51716, R01DK52431, R01ES09912, F33DK09919, P30DK26687, R01HD29569, R01GM28356, and P41RR03655.
APPENDIX A: DATA PROVIDERS
See Table 1 for the study names.
Finnish Studies 1 and 2: Markku Koulu, M. Karvonen, U. Pesonen, A. Rissanen, M. Laakso, and M. Uusitupa.
QFS and HFS: Claude Bouchard and Yvonne Chagnon.
MIFS: Trudy L. Burns and Patricia A. Donohoue.
Baltimore Study: Ross E. Andersen, Alan R. Shuldiner, and Kristi Silver.
Danish Study: Soren Echwald, Olaf Pedersen, and T. I. A. Sørensen.
Nigerian Study and AfAm Study: Philip Behn and M. Alan Permutt.
APPENDIX B: INFERENCE BASED ON MULTIPLE IMPUTATION (Shafer 1997)
We conducted appropriate analysis with each imputed complete data set to obtain five estimates, Q_{i}, i = 1, … , m(= 5), and their estimated variance U_{i} = Var(Q_{i}), i = 1, … , m(= 5). In the following, dim(Q_{i}) denotes the dimension of the parameter vector Q_{i}.
When dim(Q_{i}) = 1, the final estimate, its variance T, and sampling distribution are
When dim(Q_{i}) = k > 1, calculation of an “overall” P value for testing H_{0}: Q = Q_{0} can be performed on the basis of the Wald test statistic; that is,
Footnotes

Communicating editor: C. Haley
 Received October 30, 2000.
 Accepted July 5, 2001.
 Copyright © 2001 by the Genetics Society of America