Abstract
To date, few methods have been developed explicitly for metaanalysis of linkage analyses. Moreover, the methods that have been developed or suggested generally depend on certain ideal situations and have not been widely applied. In this article, we apply standard statistical theory and metaanalytic techniques in novel ways to five published papers discussing the evidence of linkage of body mass index (BMI) to the region of the human genome containing the OB gene. These methods are “inference based,” meaning that they allow one to make statements about the statistical significance of the entire body of evidence. As currently developed, they do not allow specific statements to be made about the amount of variance explained by any putative locus or allow precise confidence intervals to be placed around the putative location of a linked locus. By applying these techniques to the literature on linkage in the human OB gene region, we are able to show that the evidence for linkage somewhere in the region is extremely strong (P = 1.5 × 10^{−5}).
POOLING data to increase the precision of one's estimates and conclusions dates back at least to the early 1900s (Pearson 1904, 1933a,b; Tippett 1931; Fisher 1954) and was termed metaanalysis by Glass (1976). Since then, the use of metaanalysis has increased dramatically. However, there has been relatively little use of metaanalysis in assessing evidence of linkage between human diseases and genetic markers.
Recently, several authors have suggested the possibility of developing methods for metaanalysis of linkage studies. Lander and Kruglyak (1995, p. 245) state “careful metaanalysis of all studies may be useful to assess whether the overall evidence for linkage is convincing … . To combine results among studies it is always best to pool the raw data and reanalyze the entire data set. Lod scores can be added across studies, but only when they are computed with the same methods, with the same set of markers, and at the same map position.” Lander and Kruglyak (1995) then state that other metaanalytic methods are available and cite Cox and Hinkley (1974). The metaanalytic technique mentioned by Cox and Hinkley (1974, p. 80) involves combining P values from several independent samples as outlined by Fisher (1954, p. 99). Recently, Li and Rao (1996) presented a metaanalytic method in which each study used the Haseman–Elston procedure (Haseman and Elston 1972) and the same markers.
We agree with Lander and Kruglyak (1995) that the preferable way to metaanalyze data is by obtaining the raw data from each study and conducting a pooled analysis (Jeng et al. 1995). In the context of pooling raw data, multipoint procedures (e.g., Kearsey and Hyne 1994; Hyne and Kearsey 1995; Fulker et al. 1995; Xu and Atchley 1995) will have great utility. However, in many situations this is not feasible and the metaanalyst must work with the data available in published reports. Moreover, Li and Rao's (1996) method is limited in that not every study uses the Haseman–Elston procedure or the same markers. Herein, we present metaanalytic techniques that can be used under worst-case conditions, including:
Different studies use different genetic markers.
Different studies use different statistical techniques to test for linkage.
Some studies include multiple hypothesis tests by using multiple markers, multiple statistical techniques, or multiple phenotypic cutoff points. This creates issues of nonindependent multiple testing that must be managed.
Not all studies report all of the information required for easy extraction of the data.
The techniques we will present are applicable even under these difficult circumstances.
GENERAL PRINCIPLES
Assume that there are m independent studies assessing linkage of a disease or trait to markers within a region of the genome. Suppose that a P value can be obtained on each of the m data sets where the P value indicates the probability of obtaining data as extreme or more extreme than the data observed under the null hypothesis of no linkage in the region. Fisher (1954, p. 99) showed that m independent P values can be combined into a single test of significance. Specifically, under the null hypothesis,
χ^{2} = −2 Σ_{i=1}^{m} ln(P_{i})   (1)
follows a χ^{2} distribution with 2m degrees of freedom, where P_{i} denotes the P value from the ith study.
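Fisher's combination rule can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software: the function names `chi2_sf_even` and `fisher_combine` are ours, and the P values in the usage line are hypothetical. Because the statistic always has an even number of degrees of freedom (2m), its upper-tail probability has a closed form and needs only the standard library.

```python
import math

def chi2_sf_even(x, df):
    """Upper-tail probability of a chi-square variate with an even number of d.f."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0    # (x/2)^k / k! for k = 0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

def fisher_combine(p_values):
    """Return (chi-square statistic, d.f., combined P) for independent P values."""
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("P values must lie in (0, 1]")
    stat = -2.0 * sum(math.log(p) for p in p_values)
    df = 2 * len(p_values)
    return stat, df, chi2_sf_even(stat, df)

# Hypothetical P values, for illustration only (not taken from the studies).
stat, df, p = fisher_combine([0.04, 0.20, 0.01])
print(f"chi-square = {stat:.2f} on {df} d.f., combined P = {p:.4f}")
```

The rule assumes that the P values are independent, which is why each study must first be reduced to a single P value.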
In this article, we illustrate the application of this technique to published evidence for linkage in the human OB gene region to body mass index (BMI; kg/m^{2}). Most of the exposition involves the application of standard statistical methods and metaanalytic “tricks of the trade” to derive one P value from each study. Following the example, a reiteration and discussion of the general approach are included.
EXAMPLE
To our knowledge, there are five published studies concerning linkage of BMI with markers in the human OB gene region. A sixth study (Stirling et al. 1995) examined linkage of diabetes mellitus to markers in the OB region but only examined associations (not linkage) with obesity and, therefore, is not included. We review each study below. The use of only published data is addressed in the discussion section.
Clement et al. (1996): Clement et al. (1996) evaluated linkage to BMI dichotomized as “greater than 35” or “less than or equal to 35” with markers ranging from D7S651 to D7S509. The main results are displayed in Table 1. The data consist of sib-pair analyses testing whether the mean proportion of alleles shared identical by descent (IBD) among sib pairs differs from the expectation of 1/2 under the null hypothesis. The extraction of a single P value from this study is not easy because (1) in some cases the information offered is imprecise; (2) the data really consist of two different studies, one of obese-lean pairs and one of obese-obese pairs; and (3) multiple markers are used.
However, the first problem of incomplete data can be solved rather easily in this case. Because the exact t values are provided along with n (the sample size), it is easy to obtain the exact P values by integrating the t distribution with n − 1 degrees of freedom (d.f.). In this case, the second problem of having two separate studies can also be solved. Although the two samples involved contain overlapping individuals, it has been shown that the IBD statuses of different sibling pairs from within the same sibship are pairwise independent (e.g., Hodge 1984; Amos et al. 1989). Therefore, test statistics obtained on the two samples will be independent, and this one paper (Clement et al. 1996) can then be considered to simply contain two statistically independent studies, each of which yields one P value. The final problem, multiple markers, is more challenging. Had a multipoint procedure been used to yield a single P value for the point in the interval that yielded the maximum evidence of linkage, the single P value could be corrected and used. However, because a multipoint procedure was not used, it is necessary to combine the eight P values observed into a single P value. Because the markers are linked to each other, test statistics obtained at each marker, and therefore the P values of each marker, are not statistically independent. Any method of combining them must take this nonindependence into account.
To accomplish this, we begin by converting each P value to a corresponding (standard normal) Z-score by means of the inverse standard normal distribution function Φ^{−1}, that is, Z = Φ^{−1} (1 − P). Under the null hypothesis of no linkage, the eight Z-scores have a multivariate normal distribution with zero means, unit variances, and a correlation matrix R. According to Carey and Williamson (1993), the correlation of IBD status (and therefore Z-scores based on IBD status) between two markers is equal to (1 − 2θ)^{2}, where θ denotes the recombination fraction between the markers. Note that one centimorgan (cM) corresponds approximately to θ of 0.01, equivalent, on average, to 1 million base pairs (bp) (Department of Energy 1992).
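In this case, since the tests on individual markers are one-sided, we defined more extreme data in terms of the sum of the Z-scores,
S = Σ_{i} Z_{i}.   (2)
Under the null hypothesis, S is normally distributed with mean zero and variance equal to the sum of all the elements of R, so a single P value is obtained by referring S, divided by the square root of that variance, to the upper tail of the standard normal distribution.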
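The marker-combination steps just described (P values to Z-scores, the (1 − 2θ)^{2} correlation, and the sum of the Z-scores) can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: the function names are ours, recombination fractions are approximated as 0.01 per cM and capped at 0.5 (a simplification that is reasonable only for closely spaced markers), and the marker positions and P values in the usage lines are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, s.d. 1

def marker_correlation(pos_cm):
    """Correlation matrix of IBD-based Z-scores: r_ij = (1 - 2*theta_ij)^2."""
    n = len(pos_cm)
    corr = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Assumption: theta = 0.01 per cM, capped at 0.5 (unlinked).
            theta = min(abs(pos_cm[i] - pos_cm[j]) / 100.0, 0.5)
            corr[i][j] = (1.0 - 2.0 * theta) ** 2
    return corr

def combine_one_sided(p_values, corr):
    """Single one-sided P value from the sum of correlated Z-scores."""
    z = [STD_NORMAL.inv_cdf(1.0 - p) for p in p_values]  # Z = Phi^{-1}(1 - P)
    total = sum(z)
    variance = sum(sum(row) for row in corr)  # Var(sum of Z) = sum of all r_ij
    return 1.0 - STD_NORMAL.cdf(total / math.sqrt(variance))

# Hypothetical example: two markers 1 cM apart with made-up P values.
corr = marker_correlation([0.0, 1.0])
print(f"correlation = {corr[0][1]:.4f}")
print(f"combined P  = {combine_one_sided([0.30, 0.45], corr):.4f}")
```

Because highly correlated markers add little independent information, the variance of the sum grows with the correlation, and the combined P value stays close to the individual ones.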
Duggirala et al. (1996): Although a multiple-marker linkage analysis was conducted by Duggirala et al. (1996), in contrast to the previous case of Clement et al. (1996), all the information contained in their multiple markers is already combined into a single P value for the interval investigated by their use of a multipoint procedure. For BMI, the P value was 0.003. However, one needs to correct this P value by taking into account that it was calculated from a multipoint procedure including markers spanning a 211-cM interval (D7S531 to D7S483). This can be accomplished by using the formula provided by Lander and Kruglyak (1995, p. 244). Let us denote a corrected P value by P*; then, according to their formula, we have P* = 1 − exp(−μ(T)), where
Borecki et al. (1994): Borecki et al. (1994) used only one marker in the area of the human OB gene. This one marker was KELL, located at 7q33. Four hundred two sibling pairs were included and the Haseman–Elston procedure was used. Thus, this study already yields a single P value requiring no correction. However, the P value is reported as 0.000. This implies that the actual P value was less than 0.001. From the published text there is no way of knowing how much less. Although it might have been reasonable to use the value 0.001 in the analysis, we contacted the original authors who provided us with an exact t of −4.4383 with 400 d.f., which translates to a single P value of 4.8 × 10^{−6}.
Norman et al. (1996): Norman et al. (1996) also used the Haseman–Elston procedure with 488 sibling pairs from the Pima Indian community. Only two markers were used, D7S530 and HCPA2. These markers are estimated to be 1 cM away from each other (Duggirala et al. 1996). Z-scores obtained for the two markers therefore will be highly correlated. The exact correlation between these two statistics is estimated to be 0.96 using again the formula from Carey and Williamson (1993). As was done with the data from Clement et al. (1996), the P values observed in Norman et al. (1996) were converted to Z statistics. The probability of obtaining two Z's as great or greater than the two Z's observed was again calculated in terms of the sum of the Z-scores, which yielded a single P value of 0.544 for this study.
Reed et al. (1996): The data for Reed et al. (1996) present a unique challenge. The data consist of 213 concordant obese sibling pairs. Reed et al. (1996) used markers contained in and surrounding the interval D7S504 through D7S1875. They combined the marker information into haplotypes and conducted their analysis by looking at sharing of haplotypes rather than alleles. This aspect of their analysis simplifies the extraction of a single P value since significance is assessed only for IBD sharing at the single haplotype rather than at each individual locus. The challenge of the data is twofold. First, Reed et al. (1996) used two different statistical approaches to analyze their data. They used the sibling-pair approach as described, and they used transmission disequilibrium testing (TDT; Spielman et al. 1993) on a subset of offspring for whom heterozygote parents were available. Second, within each statistical analysis, Reed et al. (1996) analyzed the data three times, each at a different cutpoint of BMI to define obesity. The results from Reed et al. (1996) are reproduced here in Tables 2 and 3.
We begin by considering the data for the sibling-pair sets in Table 2, in which for each BMI cutoff, the corresponding nominal P value was provided. Hence, by applying Φ^{−1} to those P values, the corresponding Z-scores are obtained. In this situation, the 3 × 3 correlation matrix R for the Z-scores can be calculated by assuming that the IBD status for the sibling pairs is independently and identically distributed (i.i.d.). Under this assumption, the correlation between the Z-score for a sample and the Z-score for a subset of that sample is the square root of the proportion of the larger sample that is contained in the subset. For example, the correlation between the Z-scores for sibling pairs above 35 and sibling pairs above 30 is the square root of the number of pairs above 35 divided by the number of pairs above 30.
Turning to the TDT results in Table 3, a single P value can be obtained in a similar fashion. The chi-squares in Table 3 are converted to Z-scores by taking their square root. Assuming that the data are i.i.d., the correlation among the Z's can again be estimated as the square root of the number of subjects in a subset divided by the number of subjects in the larger set. For example, the estimated correlation of the Z-score in subjects with a BMI ≥40 and the Z-score for subjects with a BMI >30 is the square root of the number of subjects with a BMI ≥40 divided by the number with a BMI >30.
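The nested-cutoff correlation rule used for both the sibling-pair and the TDT results can be sketched as below. This is only an illustration: the function name is ours, each smaller sample is assumed to be a strict subset of every larger one, and the sample sizes in the usage line are hypothetical rather than the counts reported in Tables 2 and 3.

```python
import math

def nested_correlation(sizes):
    """Correlation matrix for Z-scores computed on nested samples.

    sizes[i] is the number of subjects retained at cutoff i. Under the
    i.i.d. assumption, the correlation between two nested samples is the
    square root of (smaller sample size / larger sample size).
    """
    k = len(sizes)
    return [
        [math.sqrt(min(sizes[i], sizes[j]) / max(sizes[i], sizes[j]))
         for j in range(k)]
        for i in range(k)
    ]

# Hypothetical sample sizes at three increasingly strict BMI cutoffs.
for row in nested_correlation([100, 64, 36]):
    print([round(r, 3) for r in row])
```

The resulting matrix plays the role of R when the three correlated Z-scores are reduced to a single P value.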
The final challenge with the data from Reed et al. (1996) is that one now has two P values, one from TDT and one from sibling-pair testing. If the correlation among these two tests could be determined, then one could combine these into a single P value. However, it is not immediately apparent how to estimate this correlation. There are several alternatives. First, one could, on some a priori grounds of preference, choose one test over another. For example, one might argue that because all of the other studies are using a sibling-pair approach rather than TDT it would be more appropriate to combine sibling-pair data rather than the TDT data and be consistent with the others. Second, one could multiply the lowest P value by two as a form of Bonferroni correction. However, this is overly conservative because it does not take the correlation between the two tests into account. Third, one could estimate the correlation via simulation. Fourth, one could conduct the overall metaanalysis with the results of the sibling-pair tests and then conduct the analysis again using the results of the TDT tests as a form of sensitivity analysis (Greenhouse and Iyengar 1994). This is the strategy that we adopt.
Overall metaanalysis: The (single) P values for the five papers calculated as described above are presented in Table 4. In the primary overall metaanalysis the P values were pooled using Fisher's method described in the Introduction. For Reed et al. (1996) the P value used was that from the sib-pair study. Based on all these P values, the χ^{2} for the overall analysis was 44.10 with 12 d.f. (P = 1.5 × 10^{−5}). These results apparently provide strong evidence of linkage somewhere in the OB region. When the P value for Reed et al. (1996) was replaced with that for TDT the results were even more significant, the overall χ^{2} being 47.73 (P = 3.5 × 10^{−6}). Thus, these results suggest that there is clear evidence for linkage of BMI to something in the OB region regardless of whether one uses the TDT or sib-pair analyses from Reed et al. (1996).
As a sensitivity analysis, each study result was removed from the analysis, and the chi-square statistic with 10 d.f. (from the remaining study results) was computed. The corresponding P values are given in Table 4. This table shows that while the Borecki et al. (1994) study provides the extreme significance observed, even excluding this study, the remaining results still provide a significant value, again regardless of whether one uses the TDT or sib-pair analyses from Reed et al. (1996). It follows that the strong overall statistical evidence is not necessarily due to a particular study with extreme significance but due to the accumulation of many study results.
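The overall test and the leave-one-out sensitivity analysis can be sketched as follows. The first loop simply evaluates the upper-tail area for the two overall statistics quoted above (44.10 and 47.73 on 12 d.f.). The second part drops each result in turn and recomputes Fisher's statistic on the remainder. The function names are ours, and the six P values labeled A to F are hypothetical placeholders, not the entries of Table 4.

```python
import math

def chi2_sf_even(x, df):
    """Upper-tail probability of a chi-square variate with an even number of d.f."""
    half = x / 2.0
    term = 1.0    # (x/2)^k / k! for k = 0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

def leave_one_out(p_by_study):
    """Recompute Fisher's statistic and its P value with each study removed in turn."""
    out = {}
    for left_out in p_by_study:
        rest = [p for name, p in p_by_study.items() if name != left_out]
        stat = -2.0 * sum(math.log(p) for p in rest)
        out[left_out] = (stat, chi2_sf_even(stat, 2 * len(rest)))
    return out

# Tail areas for the two overall statistics quoted in the text (12 d.f.).
for stat in (44.10, 47.73):
    print(f"chi-square {stat:.2f}, 12 d.f.: P = {chi2_sf_even(stat, 12):.2e}")

# Hypothetical P values for six results; NOT the entries of Table 4.
example = {"A": 0.20, "B": 0.05, "C": 0.30, "D": 0.001, "E": 0.50, "F": 0.10}
for name, (stat, p) in leave_one_out(example).items():
    print(f"without {name}: chi-square = {stat:.2f} on 10 d.f., P = {p:.4f}")
```

If the combined P value stays small whichever result is dropped, the conclusion does not rest on a single study.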
Apart from the overall significance of these results, one may question whether the variation of the strength of the evidence (in terms of the P value) seen from study to study is simply random variation or represents some statistically significant heterogeneity in the linkage. Although this is clearly a legitimate question, it is not possible to conduct such a heterogeneity test in the current situation when only the inferential information available from P values is used, since the heterogeneity of the strength of statistical evidence across the studies is not identifiable from the heterogeneity of the linkage per se.
Summary of metaanalytic approach used: In this section we reiterate in general terms the approach taken in the metaanalysis. The first step is to extract a single P value from each study. The studies contained herein provide an illustration of several different techniques for extracting the single P value from each study. The techniques used can briefly be summarized as follows:
If a single P value was provided from a multipoint procedure, then the Lander–Kruglyak correction in Equation 3 was applied; Duggirala et al. (1996) case.
If a separate P value for each of several markers is used, then the Z-scores and the correlations among them are obtained by using the inverse standard normal distribution function Φ^{−1} and the formula in Carey and Williamson (1993), respectively. Since the P values are one-sided for linkage studies, a single P value was determined in terms of the sum of the Z-scores in Equation 2, which is normally distributed; cases of Clement et al. (1996) and Norman et al. (1996).
For the multiple cutoff study defining affected or unaffected sib pairs (e.g., Reed et al. 1996), the correlations among Z-scores were estimated by means of the proportion of sample sizes. For the one-sided sib-pair tests, the sum in (2) was used for extracting the single P value. In contrast, for the two-sided TDT, the single P value was extracted by means of the quadratic form Q in (3), which is chi-square distributed.
For a single marker study, no correction needs to be applied; Borecki et al. (1994) case.
Once a single P value is derived for each study, the P values are pooled using Fisher's method in Equation 1.
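The quadratic-form step mentioned in the summary above for two-sided tests can be sketched as follows. The text does not spell out Q here, so this sketch assumes the standard form Q = Z'R^{−1}Z, which is chi-square distributed with k d.f. when the k Z-scores are multivariate normal with correlation matrix R. The function names are ours, and the chi-square values and correlations in the usage lines are hypothetical.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer d.f."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:  # even d.f.: finite Poisson-type sum
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd d.f.: two-sided normal tail plus a finite series.
    root = math.sqrt(x)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    density = math.exp(-half) / math.sqrt(2.0 * math.pi)
    return math.erfc(root / math.sqrt(2.0)) + 2.0 * density * total

def quadratic_form_p(z, corr):
    """Return (Q, P) with Q = z' R^{-1} z referred to chi-square on len(z) d.f."""
    w = solve(corr, z)  # w = R^{-1} z
    q = sum(zi * wi for zi, wi in zip(z, w))
    return q, chi2_sf(q, len(z))

# Hypothetical TDT chi-squares at three cutoffs and a hypothetical R.
z = [math.sqrt(c) for c in (4.0, 2.5, 1.0)]
R = [[1.0, 0.8, 0.6], [0.8, 1.0, 0.75], [0.6, 0.75, 1.0]]
q, p = quadratic_form_p(z, R)
print(f"Q = {q:.3f} on {len(z)} d.f., P = {p:.4f}")
```

Unlike the sum of Z-scores, Q does not depend on the signs of the individual Z-scores in the uncorrelated case, which is why it suits two-sided statistics such as the TDT chi-squares.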
DISCUSSION
The above example indicates that by judicious use of standard statistical theory and metaanalytic techniques, one can metaanalyze data from multiple linkage studies of the same phenotype even under a number of worst-case conditions. We believe such methods will become essential to understanding the overall significance of the research literature when raw data are not available. This is all the more important in the field of gene mapping for complex traits because the power to detect significance in any one study is often well below what one would desire. This power can be enhanced by pooling the data in multiple studies.
The efficacy of these procedures is demonstrated by the example involving the assessment of linkage of BMI to the genome region containing the human OB gene. Taken individually, no one study is terribly convincing, except Borecki et al. (1994). Moreover, a cursory glance at the body of literature suggests that the results may be inconsistent across studies (i.e., some studies obtain significant results and others do not), which might lead readers to believe that the overall body of evidence does not support linkage in this region. However, as the foregoing analyses demonstrate, such a conclusion would be erroneous. Thus, in this case at least, metaanalysis of the entire body of data is able to demonstrate clarity, whereas individual studies or informal review of the individual studies yields no such clarity.
We wish to stress again that we do not consider our approach outlined in the current paper to be the optimal approach. Clearly, a better approach is to obtain the raw data of all of the investigators, whether the data were published or not, and conduct a pooled analysis. However, for the near future, we suspect that there will be many situations in which metaanalysts have only one alternative available to them: the statistical integration of data from multiple published studies that have used different statistical methods and different markers and, in some cases, have presented incomplete information. Moreover, although we hope that our paper acts as a call for the presentation of more complete information, such calls have been issued by metaanalysts for over two decades (Jackson 1980; Feinstein 1995). Our subjective assessment is that such calls have resulted in only a modest increase in the completeness of information presented in published reports. Thus, we suspect that metaanalysts will continue to be faced with similar situations.
A limitation of the metaanalysis conducted herein concerns the potential for publication bias. Publication bias occurs when the probability of a study being published is dependent upon the results of the study. Publication bias has been shown to exist in other fields (e.g., Allison et al. 1996). Again, although calls have been issued to the research community to correct this problem (e.g., Chalmers et al. 1990), the problem does not appear to be going away (e.g., Dickersin and Min 1993). There are several ways to approach the problem of publication bias. Clearly, the best way is to obtain unpublished studies as well as published studies. We did not attempt to do that in the context of this paper because our primary goal was to present our approach to conducting the metaanalysis as an example. The second approach one might take is to calculate a quantity called the “fail-safe sample size,” which is the number of unpublished studies needed to overturn the overall conclusion of statistical significance from the published studies. Procedures for calculating the fail-safe sample size are presented in Iyengar and Greenhouse (1988). We calculated the fail-safe sample sizes for the current example and obtained values of 15 and 18 depending on whether Reed et al.'s (1996) sib-pair test or TDT data were used, respectively. Because it seems very unlikely that there are 15 to 18 unpublished studies on this topic, the overall conclusion of statistical significance seems reasonably safe.^{1}
Finally, the methods presented herein only allow one to conduct inferential tests of whether or not there is significant linkage in a particular region. The appropriate conclusion to such an analysis, assuming a statistically significant result is obtained, is that there is statistically significant evidence for linkage somewhere in the region examined, in this case the region from D7S531 to D7S483, which is the same as Duggirala et al.'s (1996) interval. Unfortunately, one is unable, based solely on an analysis of the type presented herein, to estimate where in that interval the putative QTL lies or in any way to place a confidence interval around that estimated location. It is hoped that methods are developed to accomplish this goal in the future.
In conclusion, as more studies assessing the linkage between genetic markers and complex diseases or traits are possible, we suspect that there will be increasing need for metaanalysis to objectively and quantitatively pool the resulting information. We hope that the procedures outlined in this paper are a useful step in that direction.
Acknowledgments
This research was supported by National Institutes of Health grants R29DK47256, R01DK51716, and P30DK26687.
Footnotes

^{1} Since preparation of this article, we have become aware that there are at least three additional articles on this topic (Bray et al. 1996; Oksanen et al. 1997; Hasstedt et al. 1997). The interested reader is referred directly to those papers.

Communicating editor: R. R. Hudson
 Received February 28, 1997.
 Accepted October 23, 1997.
 Copyright © 1998 by the Genetics Society of America