Abstract
Linkage disequilibrium, the nonrandom association of alleles from different loci, can provide valuable information on the structure of haplotypes in the human genome and is often the basis for evaluating the association of genomic variation with human traits among unrelated subjects. But, linkage phase of genetic markers measured on unrelated subjects is typically unknown, and so measurement of linkage disequilibrium, and testing whether it differs significantly from the null value of zero, requires statistical methods that can account for the ambiguity of unobserved haplotypes. A common method to test whether linkage disequilibrium differs significantly from zero is the likelihoodratio statistic, which assumes HardyWeinberg equilibrium of the marker phenotype proportions. We show, by simulations, that this approach can be grossly biased, with either extremely conservative or liberal type I error rates. In contrast, we use simulations to show that a composite statistic, proposed by Weir and Cockerham, maintains the correct type I error rates, and, when comparisons are appropriate, has similar power as the likelihoodratio statistic. We extend the composite statistic to allow for more than two alleles per locus, providing a global composite statistic, which is a strong competitor to the usual likelihoodratio statistic.
LINKAGE disequilibrium (LD), the nonrandom association of alleles from different loci, can provide valuable information on the structure of haplotypes of the human genome. This may prove useful for studying the association of genomic variation with human traits because haplotypebased methods can offer a powerful approach for disease gene mapping (Dalyet al. 2001; Gabrielet al. 2002). The measurement and testing of LD among measured genetic variants is often based on pairs of loci; statistical analyses measure the departure of the joint frequency of pairs of alleles from two loci on a haplotype from random pairing of alleles. Statistical evaluation of LD is well developed when haplotypes are directly observed (Hedrick 1987; Weir 1996). But, it is common to measure genetic markers on unrelated subjects without knowing the haplotype origin (linkage phase) of the marker alleles. In this case, a common way to test for LD is to enumerate all pairs of haplotypes that are consistent with each subject's observed marker phenotypes, calculate maximumlikelihood estimates (MLEs) of the haplotype frequencies, and use these estimates to construct a likelihoodratio statistic—twice the difference between the loglikelihood based on MLEs and the loglikelihood based on independence of alleles from different loci (Excoffier and Slatkin 1995; Hawley and Kidd 1995; Longet al. 1995; Slatkin and Excoffier 1996). This method, however, requires the assumption of random pairing of haplotypes, which implies that each of the loci has genotype proportions that fit HardyWeinberg equilibrium (HWE) proportions (see appendix). It has been shown that departure from HWE proportions, which we denote HardyWeinberg disequilibrium (HWD), can bias estimates of haplotype frequencies (Fallin and Schork 2000). The impact of HWD on the statistical properties of the likelihoodratio statistic is not well known.
An alternative method that allows for unknown linkage phase was provided by Weir (1979) and Weir and Cockerham (1989) and discussed in the book by Weir (1996). They explicitly incorporate the ambiguity of the double heterozygote by using a composite measure of LD. The composite test measures the association of alleles from different loci on the same haplotype (intragametic LD) as well as on different haplotypes (intergametic LD). The advantages of this approach are that HWD at either locus is incorporated into the test statistic and the statistic is rapidly computed. Weir (1979) showed that this composite statistic provides the correct type I error rate when testing LD whether or not there is departure from HWE at either locus.
The first purpose of this report is to demonstrate the impact of HWD on the statistical properties of the likelihoodratio statistic. An advantage of the likelihoodratio method is that it allows for more than two alleles at either locus and provides a global test for LD among any of the pairs of alleles from the loci. The second purpose of this report is to extend the method of Weir and Cockerham to a global test of LD that allows for more than two alleles at either of the loci.
METHODS
To provide the necessary background, some of the developments of Weir and Cockerham (1989) are briefly reviewed (see also Weir 1996, pp. 94 and 125). Suppose that locus A has J possible alleles, A_{1}, A_{2},..., A_{J}, and locus B has K possible alleles, B_{1}, B_{2},..., B_{K}. Assuming that alleles are codominant, the probabilities of the marker phenotypes at the A locus can be expressed in terms of allele frequencies (p_{Aj}) and coefficients for HWD, D_{Aij},
Linkage disequilibrium when phase is unknown: When haplotypes are directly observed, linkage disequilibrium is measured by the intragametic LD,
One could also measure the nonrandom association of alleles A_{j} and B_{k} from different haplotypes, called the intergametic LD:
When linkage phase is unknown, the underlying pair of haplotypes is ambiguous for the doubleheterozygous phenotypes, and so one cannot directly measure the intragametic LD. To surmount this issue, Weir and Cockerham proposed a composite measure of LD, the sum of the intra and intergametic disequilibria:
When there are only two alleles per locus, there is one composite LD, say Δ_{A}_{1}_{B}_{1}, and an estimator is
When there are only two alleles per locus, there are eight phenotype categories, and the counts of these categories can be represented by the vector X. This emphasizes that n_{A}_{1}_{B}_{1} is a linear combination of the elements of the X vector, and so too are p̂_{A}_{1} and p̂_{B}_{1.} Hence,
When there are more than two alleles at either locus, all possible pairs of LD coefficients can be estimated. For J alleles at locus A and K alleles at locus B, there are a total of (J – 1)(K – 1) composite coefficients. To extend the work of Weir and Cockerham, we first use the vector X to denote phenotype counts for all possible distinguishable twolocus phenotypes. The sum of the elements of this vector is N, the total number of subjects. Suppose that L is the length of vector X; then, each composite LD can be written as a function of linear combinations of terms from the vector X. To see this, we first define counting vectors, α, β, and γ, each of length L. The vectors α and β are used to count alleles for loci A and B, respectively. A subscript on these vectors indicates the type of allele that is counted. For example, α_{j} counts alleles of type A_{j}. The ith element of α_{j} is denoted α_{j,i}, which has a value of 1, 0.5, or 0, according to whether the ith phenotype category has 2, 1, or 0 alleles of type A_{j}. The vector β_{k} counts alleles of type B_{k} in a similar manner. Allele frequencies can be estimated by these count vectors, such as
Variances and covariance: When more than two alleles exist at either locus, there is more than one composite LD coefficient. These coefficients are correlated, because they depend on the multinomial count vector X and because the same alleles can overlap between different coefficients. To derive the covariance matrix of the LD coefficients, we use Fisher's formula, which is a special case for a Taylor series approximation for functions that depend on the relative frequencies of the multinomial categories, X_{i}/N in our case. For a more complete description of Fisher's formula, see Bailey (1961, p. 285). The covariance of the functions T_{1} and T_{2} (e.g., T_{1}
Substituting these derivatives into expression (1) provides a way to estimate the covariance matrix for all the LD coefficients. To test the null hypothesis of no composite LD and no higherorder disequilibria, we compute the covariance matrix by using the vector of probabilities, Q, computed under the null hypothesis of no linkage disequilibrium and assuming that all disequilibrium parameters between loci (intra and intergametic disequilibria and higherorder terms) are zero, but we allow for HWD by including appropriate disequilibria coefficients. Under these assumptions, the probabilities of the marker phenotypes at two loci are
Parameter estimates for allele frequencies and HWD coefficients are substituted into expression (2) to estimate the Q vector under the null hypothesis.
Testing: To test the null hypothesis that all of the composite LD parameters are zero and that there are no higherorder disequilibria, we use a global chisquare statistic,
Simulations: To evaluate the type I error rates and power of the composite chisquare and likelihoodratio statistics, simulations were performed. The composite chisquare statistic was computed two ways: first by allowing for HWD as illustrated in expression (2) and second assuming HWE (i.e., forcing D_{A}_{12} and D_{B}_{12} coefficients equal to zero). Although our motivation is not to require HWE, we evaluated the statistical properties of the composite test with assumed HWE for two reasons. First, we wish to evaluate whether the composite test loses power when in fact data are simulated under the assumption of HWE. Second, it may be tempting to first test for HWE before testing LD; if there is no statistical departure from HWE, then we assume HWE when using the composite test for LD. This practice might be valid if there were significant gains in power by assuming HWE whenever appropriate.
For simulations under the null hypothesis of no LD, the distribution of twolocus phenotypes was simulated using expression (2) assuming two alleles per locus, with allele frequencies p_{A}_{1} and p_{B}_{1} equal to either 0.2 or 0.5. The amount of departure from HWE was simulated according to the fraction of its extreme values. For locus A, the fraction of HWD is f_{HWD,}_{A} =–1 or +1 according to whether D_{A}_{12} is equal to its minimum or maximum value [
We also performed simulations under the null hypothesis of no LD for three alleles per locus. In this case, there are three types of heterozygotes and hence three D coefficients for HWD at each locus. The patterns of HWD can be complex, as the range of each D coefficient depends on allele frequencies and the other D coefficients. To simplify our evaluations, we assumed equal allele frequencies at each locus (p_{Ai} = p_{Bi} = ⅓), and we assumed that only alleles 1 and 2 at each locus departed from HWE, so that there is only one D coefficient for each locus (i.e., only D_{A}_{12} and D_{B}_{12} were nonzero). The composite and likelihoodratio statistics have 4 d.f. when there are three alleles per locus.
To evaluate power, we assumed two alleles per locus, so that there is only one LD parameter. Because the likelihoodratio statistic is biased when there is HWD, all simulations for power were computed assuming HWE, to assure that the power of the various statistics was evaluated at approximately the same type I error rates. The amount of LD was simulated according to the fraction of its extreme values, with f_{LD} = – 1 when D_{A}_{1}_{B}_{1} = max(–p_{A}_{1}p_{B}_{1}, –p_{A}_{2}p_{B}_{2}), and f_{LD} =+1 when D_{A}_{1}_{B}_{1} = min(p_{A}_{2}p_{B}_{1}, p_{A}_{1}p_{B}_{2}); the parameter f_{LD} is equivalent to the familiar normalized
All simulations were based on 50 unrelated subjects and 1000 simulated data sets. Simulations and statistical analyses were conducted with SPLUS software (Insightful). The code to compute the composite test is available upon request by sending an email to schaid{at}mayo.edu.
RESULTS
Type I error rates: The estimated type I error rates, with an expected nominal rate of 0.05, are illustrated in Figure 1A for when allele frequencies are p_{A}_{1} = p_{B}_{1} = 0.2. Figure 1 illustrates that the composite chisquare statistic generally achieves the expected nominal error rate of 0.05 over all 25 simulated combinations of values for f_{HWD,}_{A} and f_{HWD,}_{B}. For 1000 simulations, the 95% confidence interval for the simulated type I error rate is 0.036–0.064. For the data in Figure 1, the type I error rate for the composite statistic ranged from 0.038 to 0.068, and only 1 of 25 values exceeded the upper 95% confidence limit. In contrast, the composite statistic that assumed HWE (Figure 1B) was either overly conservative when there was negative HWD at either locus or anticonservative when there was positive HWD at either locus, and the joint effects of HWD at both loci tended to accentuate these trends. The type I error rate for the composite test with assumed HWE ranged from 0.017 to 0.263, with 18 of 25 values falling outside the 95% confidence interval (C.I.). The likelihoodratio statistic tended to be liberal when the HWD at both loci was in the same direction (Figure 1C). The type I error rate for the likelihoodratio statistic ranged from 0.042 to 0.2141, with 10 of 25 values falling outside the 95% C.I.
The trends in Figure 2, for when allele frequencies are p_{A}_{1} = p_{B}_{1} = 0.5, tend to follow similar patterns as those in Figure 1. Contrasting Figures 1 and 2 emphasizes that the impact of HWD on the type I error rate depends not only on the values of f_{HWD,}_{A} and f_{HWD,}_{B}, but also on the allele frequencies. The composite statistic maintains the appropriate error rate of 0.05 (range of simulated values 0.034–0.068, with 2 of 25 falling outside the 95% C.I.), but the composite statistic with assumed HWE can be grossly conservative or liberal (range of simulated values 0.000–0.274, with 21 of 25 falling outside the 95% C.I.). The likelihoodratio statistic can also have large departures from the nominal 0.05 error rate (range of simulated values 0.000–0.783, with 11 of 25 falling outside the 95% C.I.), with the largest departure occurring when both loci have extremely large negative values of f_{HWD,}_{A} and f_{HWD,}_{B}, which implies an excessive number of heterozygotes at both loci. This situation is the worst for maximizing the likelihood, because double heterozygotes are ambiguous for linkage phase. In the extreme, with no homozygotes at either locus, the likelihood method fails because there are no unambiguous haplotypes to help estimate the relative frequencies of the different linkage phases among the double heterozygotes. But, an extreme excess number of homozygotes (both f_{HWD,}_{A} and f_{HWD,}_{B} having values of 0.8) also led to an inflated type I error rate for the likelihoodratio statistic (error rates of 0.14 and 0.13 for allele frequencies of 0.2 and 0.5, respectively). Simulations for a nominal error rate of 0.01 demonstrated similar patterns as those illustrated in Figures 1 and 2 (results not shown). Furthermore, simulations with unequal allele frequencies (i.e., p_{A}_{1} = 0.2, p_{B}_{1} = 0.5, and p_{A}_{1} = 0.5, p_{B}_{1} = 0.2) also showed trends similar to those illustrated in Figures 1 and 2 (results not shown).
Simulation results for three alleles per locus, with alleles 1 and 2 at each locus departing from HWE and yet no LD between the loci, are presented in Figure 3. Similar to the case of two alleles per locus, the composite statistic maintains the appropriate error rate of 0.05 (range of simulated values 0.032–0.063, with 2 of 25 falling outside the 95% C.I.; see Figure 3A); the composite statistic with assumed HWE can be grossly conservative or liberal (range of simulated values 0.006–0.187, with 18 of 25 falling outside the 95% C.I.; see Figure 3B); and the likelihoodratio statistic can have large departures from the nominal 0.05 error rate (range of simulated values 0.044–0.416, with 17 of 25 falling outside the 95% C.I.; see Figure 3C). Again, the largest departure occurred when both loci had extremely large negative values of f_{HWD,}_{A} and f_{HWD,}_{B}—an excessive number of heterozygotes at both loci.
Power: The power of the three statistics is presented in Figure 4. These simulations assumed HWE, so that all tests could be compared with the same approximate type I error rate. Figure 4 illustrates that all three statistics have similar power, although there is a small power advantage of the likelihoodratio statistic when p_{A}_{1} = p_{B}_{1} = 0.2 and there is negative LD between the loci (see Figure 4, left side). Surprisingly, there was no power difference between the composite test that allowed for HWD and that which assumed HWE. These results suggest that there is no advantage, in terms of power, to assume HWE for the composite statistic and that there can be a significant disadvantage in terms of robustness of the type I error rate to departures from HWE.
DISCUSSION
Our simulation results illustrate that when linkage phase is unknown, departures from HWE can have dramatic effects on the commonly used likelihoodratio statistic for testing LD. Gross departures from HWE, particularly an excess number of heterozygotes, can increase the rate of falsepositive conclusions regarding LD. In contrast, the composite statistic provides a robust method to test for LD between loci. This statistic is based on estimates of composite LD and their covariances under the null hypothesis of no LD and no higherorder disequilibria. Our methods are direct extensions of those by Weir and Cockerham, where we derive the covariance between composite measures of LD. An alternative statistic, proposed by Weir (1979), is based on the goodnessoffit of the observed phenotype frequencies to their null expected values and is implemented in SAS (2003). For large sample sizes, the Waldtype of statistic that we propose and the goodnessoffit statistic by Weir are expected to give similar results. For sparse data, due to some rare alleles, we speculate that the goodnessoffit statistic may not be well approximated by the chisquare distribution, as is often found for other goodnessoffit statistics. Our approach, based on covariances of composite LD measures, can use the singular values of the covariance matrix to assess the numerical stability of the statistic and reduce the degrees of freedom according to the rank of the covariance matrix, if needed. Further work is needed to compare the small sample properties of our proposed statistic and the goodnessoffit statistic.
Although it may be tempting to first test for HWE and then decide whether or not to assume HWE in the composite statistic, our simulations suggest that assuming HWE does not provide any power advantage, yet it could inflate the type I error rate. This suggests that the composite statistic should be used for routine testing for LD regardless of whether or not HWE exists at either locus.
Several forces could cause departure from HWE, and a critically important cause could be error in the measurement of genotypes. For this reason, departures from HWE are often used as a crude measure of quality control. This approach, however, does not provide adequate guidelines on when a marker should be excluded from the analysis (i.e., the threshold of statistical significance for concern) or whether particular subjects should be excluded. An alternative approach is to incorporate genotyping errors into methods of analysis, an approach that has been successful in linkage analysis of pedigree data (Sobelet al. 2002). Because departures from HWE could be caused by genotyping errors, explicit models of genotyping error could be incorporated into the usual likelihood models for haplotype frequencies, so that departures from HWE would be absorbed into parameters that measure genotyping error rates. More work along this type of modeling may prove beneficial. Although our simulations are limited in terms of the many different patterns of LD that could arise when more than two alleles exist at either locus, the broad range of LD that we explored for the simple case of two alleles per locus suggests that the composite statistic has power similar to that of the likelihoodratio statistic. It may be possible to construct situations where the likelihoodratio statistic has greater power, yet the potential inflation of the type I error rate does not seem to warrant routine use of this method.
Our work has focused entirely on determination of an appropriate way to test for LD, regardless of whether either locus attains HWE. We have not addressed the best way to estimate the amount of LD when there are departures from HWE. Numerous authors have discussed the statistical properties of competing measures of LD when linkage phase of double heterozygotes is known (Hedrick 1987; Devlin and Risch 1995; Zabetianet al. 2003), but there is little understanding about measures of LD when linkage phase is unknown and there are departures from HWE. The composite measure offers appeal, but it can be difficult to interpret for several reasons. First, it is an additive measure of the departure of the observed genotype frequency from that expected if there were no LD. This is analogous to the measure D when linkage phase of the double heterozygotes is known (i.e., D_{AB} = p_{AB} – p_{A}p_{B}). Hence, this type of additive measure will depend on allele frequencies; see Hedrick (1987) for more discussion. Second, the composite measure depends not only on the association of alleles between two loci on the same gamete (the usual D value), but also on the association of alleles between the two loci on different gametes. This latter type of association is typically ignored, but may occur when there are departures from HWE. The composite measure of LD is confounded between LD and HWD. Clearly, more work is required to determine the best measure of LD when the assumption of HWE is violated.
In conclusion, our results suggest that testing for the presence of LD between two loci with unknown linkage phase should be performed by the composite statistic. We have extended the work of Weir and Cockerham to allow for more than two alleles at either of the loci, and so this general composite statistic is a strong competitor to the traditional likelihoodratio statistic.
Acknowledgments
This research was supported by United States Public Health Services, National Institutes of Health, contract grant no. GM65450.
APPENDIX
The random pairing of haplotypes implies that the genotypes at each locus are expected to have genotype proportions in HWE. We can show why this occurs for the case of two loci; it is straightforward to extend our arguments to more loci. Let A_{j}B_{k} denote a haplotype. If haplotypes are randomly paired, the probability of the pair (A_{j}B_{k}, A_{j}_{′}B_{k}_{′}) is
Footnotes

Communicating editor: R. W. Doerge
 Received April 10, 2003.
 Accepted September 10, 2003.
 Copyright © 2004 by the Genetics Society of America