Abstract
A substantial body of theory has been developed to assess the effect of evolutionary forces on the distribution of genotypes, both single and multilocus, within populations. One area where the potential for application of this theory has not been fully appreciated concerns the extent to which population samples differ. Within populations, the divergence of genotype or haplotype frequencies from that expected under Hardy-Weinberg (HW) or linkage equilibrium can be measured as disequilibria coefficients. To assess population samples for concordant equilibria, an analytical framework for comparing disequilibria coefficients between populations is necessary. Here we present log-linear models to evaluate such hypotheses. These models have broad utility ranging from conventional population genetics to genetic epidemiology. We demonstrate the use of these log-linear models (1) as a test for genetic association with disease and (2) as a test for different levels of linkage disequilibria between human populations.
The extent to which the varied influence of evolutionary forces such as natural selection and random genetic drift contributes to differences between population samples is of substantial interest. For example, the importance of linkage disequilibrium (LD) for population-based gene mapping approaches has focused attention on assessing the genomic distribution of LD (Huttley et al. 1999) and on the extent to which this distribution differs between populations (Kruglyak 1999; Lonjou et al. 1999). Theory predicts that the limit of LD will be greater in human populations with historically restricted sizes, giving such populations an advantage for gene mapping. Another example is the extent to which an individual’s genotype at a specific locus accounts for their susceptibility to disease. Comparing population samples that differ with respect to their disease status can assess a causative disease role for variation at a locus. Although these two examples appear quite distinct, as we argue below, the effect of genetic predisposition to disease on genetic variation is analogous to the effect of natural selection in wild populations. Thus, the assessment of such seemingly disparate lines of inquiry can be unified into a single analytic framework.
To illustrate the traditional approach employed to detect genetic differentiation between groups, we consider an epidemiological example where a sample is divided into groups, say “affected” and “unaffected,” and tested for homogeneity of allele or genotype frequencies (see, for example, Chiano and Clayton 1998; Cox and Bell 1989; Schaid and Jacobsen 1999). Tests for genotype association implicitly allow for more complex genetic etiologies (e.g., heterozygote resistance or susceptibility) than allele-based tests, but have larger degrees of freedom, and thus may have reduced statistical power. Chiano and Clayton (1998) recently proposed an additional test for association that accommodates complex genetic causation with reduced degrees of freedom relative to the genotype test. This test is restricted in its application since it makes the assumption that heterozygote genotypes do not cause disease. However, empirical evidence that this assumption is violated exists in a number of experimental systems. In one example, the F_{1} progeny of a cross between inbred mouse strains NZ black and NZ white exhibit the immunological disorder lupus, which is a phenotype absent from both parental strains (Theofilopoulos and Dixon 1985).
In these traditional approaches, Hardy-Weinberg (HW) disequilibrium is treated as a confounding factor, and corrections are applied to eliminate the impact of HW departures. In addition to allele and/or genotype differences, however, it is important to understand the basis for differential departures from HW equilibrium (HWE).
The pattern of departure from HWE should reflect the underlying genetic etiology of a phenotype, suggesting that testing for HWE may also be used to assess whether a gene influences predisposition to a trait. In an epidemiological context, for example, the resistance to human immunodeficiency virus (HIV) infection of individuals homozygous for the Δ32-CCR5 deletion allele results in a significant departure from HWE among HIV-uninfected high-risk individuals, and thus an excess of Δ32-CCR5 homozygotes in this group (Dean et al. 1996). Departure from HWE in affecteds has also been used for fine-scale gene mapping (Feder et al. 1996; Nielsen et al. 1999). Of course, phenomena other than selection may cause departure from HWE: admixture of genetically differentiated populations leads to a characteristic excess of homozygotes, referred to as the Wahlund effect (Hartl and Clark 1990); nonrandom mating can cause either excess homozygosity or heterozygosity; and laboratory errors stemming from difficulty in discriminating between alleles, or difficulty in sampling all alleles, can cause either excess homozygosity or heterozygosity. Departures from HWE can be measured using disequilibria coefficients (Weir 1996, p. 132).
Log-linear models present a natural framework for analysis of disequilibria coefficients between populations (Aston and Wilson 1986) and can be implemented using standard statistical software packages. We present a log-linear model approach for the comparison of single-locus disequilibria coefficients between populations. We also present log-linear modeling approaches for the comparison of disequilibria coefficients arising from nonrandom associations between loci. The latter disequilibrium is often referred to, nonrigorously, as linkage disequilibrium. While we adhere to this convention, it should be pointed out that there can be interlocus genotypic disequilibria, and that linkage is not essential for such disequilibria to occur.
STATISTICAL MODELS
A model for the effect of a selective process on genetic variation: The effect of genetic predisposition to a trait on deviations from HWE has been explored largely in the context of natural selection affecting wild populations. Here we apply analogous methodology to populations that have been divided as above into the groups “affected” and “unaffected” by selection, which we define generally as any process that differentiates individuals into two phenotypic groups on the basis of their genotypes at a locus. In Table 1 we present a simple model describing the consequences of selection on genetic variation. For a biallelic locus, having alleles A and a with frequencies p_{A} + p_{a} = 1, genotype frequencies prior to selection are simply those expected under HWE. In modeling a process of natural selection, differential survival of genotypes can be represented by the ratios of the fitness coefficients ω_{ij}, where i and j represent the alleles (A or a in our example). The ratios of fitness coefficients, in turn, are delimited by the corresponding selection coefficients (s_{ij}). The product of a fitness coefficient with the expected frequency (under HWE) of its corresponding genotype gives the frequency of that genotype in the unaffected group. For the sum of genotype frequencies to equal 1 in the postselection unaffected group, frequencies are normalized by dividing each genotype’s frequency in the unaffected group by the term
Because genotype frequencies are determined by the ω_{ij} in the unaffected group and s_{ij} in the affected group, variation at a causative locus will exhibit differential departures from HWE in the affected and unaffected groups (Table 1). This suggests a novel null hypothesis for comparing affected with unaffected groups: that the disequilibrium coefficient(s) at a locus are the same in affecteds and unaffecteds. Both the traditional allele and genotype association tests are indirect assessments of this null hypothesis.
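The selection model described above can be sketched numerically. The following is a minimal illustration, not the authors' implementation. It assumes that s_{ij} = 1 − ω_{ij} (as the later expressions for the affected sample suggest), that the normalizing term is the sum of the weighted HWE frequencies, and that the additive coefficient takes the standard form D = P_{AA} − p_{A}^{2}. The function names and the fitness values are hypothetical.

```python
def hwe_frequencies(p_A):
    """Genotype frequencies expected under HWE for a biallelic locus."""
    p_a = 1.0 - p_A
    return {"AA": p_A ** 2, "Aa": 2.0 * p_A * p_a, "aa": p_a ** 2}


def group_frequencies(p_A, omega):
    """Return (unaffected, affected) genotype frequencies after selection.

    omega maps each genotype to its fitness coefficient in [0, 1].
    Assumption: the affected group is weighted by s = 1 - omega.
    """
    hwe = hwe_frequencies(p_A)

    def normalize(raw):
        total = sum(raw.values())  # assumed normalizing term (mean weight)
        if total == 0:
            raise ValueError("group is empty under these coefficients")
        return {g: v / total for g, v in raw.items()}

    unaffected = normalize({g: hwe[g] * omega[g] for g in hwe})
    affected = normalize({g: hwe[g] * (1.0 - omega[g]) for g in hwe})
    return unaffected, affected


def additive_D(freqs):
    """Additive HW disequilibrium coefficient D = P_AA - p_A^2."""
    p_A = freqs["AA"] + 0.5 * freqs["Aa"]
    return freqs["AA"] - p_A ** 2


# Hypothetical coefficients: reduced fitness of the aa homozygote.
omega = {"AA": 0.9, "Aa": 0.9, "aa": 0.5}
unaffected, affected = group_frequencies(0.5, omega)
print(additive_D(unaffected), additive_D(affected))
```

With these illustrative values the two groups depart from HWE in opposite directions, which is the kind of differential departure the null hypothesis of concordant equilibria is meant to detect.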
Below we present log-linear models for tests of concordant equilibria. Although the models and examples considered are for biallelic loci, multiallelic loci can be readily accommodated. Examples of applying the models presented here are available at http://cbis.anu.edu.au/publications.html as generalized linear interactive modeling (GLIM), SAS, or R transcript files.
Testing groups for concordant equilibria at a single locus: Testing for concordance with HWE in a sample is predominantly performed using an additive statistical model. Consider a biallelic locus with alleles A and a, genotypic frequencies P_{AA}, P_{Aa}, P_{aa}, and allele frequencies p_{A} + p_{a} = 1. Departure from equilibrium expectation is commonly evaluated by whether a disequilibrium coefficient, namely
Disequilibria coefficients from the multiplicative and additive statistical frameworks have different properties: The disequilibrium coefficient of the additive model, D, is a function of the three fitness coefficients and the allele frequency, e.g., from the unaffected group
In addition to measuring the difference in disequilibria between two samples, the above model also assesses the extent to which two samples are differentiated by allele frequency. In the fully saturated model, allele frequency is evaluated as M_{A} = P_{Aa}/(2P_{aa}) (Weir 1996). This can be reformulated as (p_{A}ω_{Aa})/(p_{a}ω_{aa}) for the unaffected sample and [p_{A}(1 − ω_{Aa})]/[p_{a}(1 − ω_{aa})] for the affected sample. From these equations it can be seen that M_{A} of the two samples will be equal when ω_{Aa} = ω_{aa}. It is important to note that the τM_{A} term could still be significant when this relationship is true.
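The reformulation of M_{A} above can be checked numerically. The sketch below is illustrative only. M_{A} = P_{Aa}/(2P_{aa}) is taken from the text. The expression M_{AA} = 4P_{AA}P_{aa}/P_{Aa}^{2} is an assumption, chosen because it equals 1 under HW proportions. The affected group is assumed to be weighted by 1 − ω_{ij}, and all numerical values are hypothetical.

```python
def group_frequencies(p_A, omega, affected=False):
    """Post-selection genotype frequencies for one group.

    Assumption: unaffecteds are weighted by omega, affecteds by 1 - omega.
    """
    p_a = 1.0 - p_A
    hwe = {"AA": p_A ** 2, "Aa": 2.0 * p_A * p_a, "aa": p_a ** 2}
    raw = {g: hwe[g] * ((1.0 - omega[g]) if affected else omega[g])
           for g in hwe}
    total = sum(raw.values())
    return {g: raw[g] / total for g in raw}


def multiplicative_coefficients(freqs):
    """Return (M_A, M_AA) from genotype frequencies.

    M_A follows the text; M_AA is an assumed form equal to 1 under HWE.
    Both are undefined when the relevant genotype class is empty.
    """
    M_A = freqs["Aa"] / (2.0 * freqs["aa"])
    M_AA = 4.0 * freqs["AA"] * freqs["aa"] / freqs["Aa"] ** 2
    return M_A, M_AA


# Hypothetical coefficients with omega_Aa = omega_aa.
omega = {"AA": 0.9, "Aa": 0.6, "aa": 0.6}
unaffected = multiplicative_coefficients(group_frequencies(0.3, omega))
affected = multiplicative_coefficients(
    group_frequencies(0.3, omega, affected=True))
print(unaffected, affected)
```

In this example M_{A} is the same in both groups (p_{A}/p_{a}) because ω_{Aa} = ω_{aa}, while the assumed M_{AA} differs between groups.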
Example test of groups for concordant equilibria at a single locus: The log-linear models were implemented in GLIM. We illustrate application of the single-locus model to +/Δ32 CCR5 genotype data from longitudinal AIDS cohorts (Dean et al. 1996). We include only homosexual men from the DCG, MAC, and SFCC cohorts. The frequencies of genotypes in each group are presented in Table 2. HIV− refers to individuals who have not contracted, but are at risk for exposure to, HIV (the resistant group). HIV+ refers to individuals who have contracted HIV (the susceptible group). There are four independent parameters in the model and the complete sequential addition of model terms is shown in Table 3. Parameters are estimated by maximum likelihood assuming a multinomial sampling of genotypes. The fit of a model is measured as the likelihood-ratio test statistic or deviance
When the full hierarchy of models is being considered there are some general principles that can be employed to guide interpretation. First, a nonsignificant residual deviance for model 4 does not mean that both interaction terms (τM_{A}, τM_{AA}) are not significant. This principle also applies to the two-locus models. Second, for Pearson residuals, calculated as
The significance of group (τ) and allele (M_{A}) terms indicates that the HIV− and HIV+ groups are unequal in size, and in the combined sample the + and Δ32 alleles are unequal in frequency. The term M_{AA} does not contribute significantly to this model, consistent with the combined affecteds and unaffecteds being drawn from a population in HWE. The group-by-allele interaction term (τM_{A}) is also not significant, indicating no difference in allele frequency between groups. We note that because the model adjusts for departures from HW, the group-by-allele interaction term under this model will be different from that resulting from the “standard” allele frequency goodness-of-fit test. The group-by-monogenic disequilibria interaction term in the model is highly significant. This result indicates that group-specific HW disequilibria (i.e., τM_{AA}) coefficients significantly improve the fit of the log-linear model. A two-tailed Fisher’s exact test on the genotype distribution was also highly significant (P ≤ 10^{-8}), affirming the validity of the asymptotic approximation used for
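The deviance and Pearson residuals referred to in this example are standard quantities. The sketch below uses their usual textbook forms, G = 2Σ O ln(O/E) and (O − E)/√E, which are assumed here because the expressions are not reproduced in the text. The genotype counts are hypothetical and are not the Table 2 data, and the expected counts are those of a simple HWE fit rather than of the log-linear models fitted in GLIM.

```python
import math


def deviance(observed, expected):
    """Likelihood-ratio statistic G = 2 * sum(O * ln(O / E)).

    Cells with an observed count of zero contribute nothing.
    """
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)


def pearson_residuals(observed, expected):
    """Per-cell Pearson residuals (O - E) / sqrt(E)."""
    return [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]


def hwe_expected(n_AA, n_Aa, n_aa):
    """Expected genotype counts under HWE given the observed counts."""
    n = n_AA + n_Aa + n_aa
    p_A = (2 * n_AA + n_Aa) / (2.0 * n)
    p_a = 1.0 - p_A
    return [n * p_A ** 2, 2.0 * n * p_A * p_a, n * p_a ** 2]


# Hypothetical genotype counts (AA, Aa, aa).
counts = [30, 40, 30]
expected = hwe_expected(*counts)
print(deviance(counts, expected))
print(pearson_residuals(counts, expected))
```

Large residuals of one sign in the homozygote cells and the opposite sign in the heterozygote cell point to the direction of the HW departure.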
Testing groups for concordant equilibria at two loci: As for HW, departures from linkage equilibrium may arise from a number of evolutionary processes: selection, random genetic drift, nonrandom mating, admixture of genetically differentiated populations, and mutation. There is also a substantial body of theory concerning the distributional properties of the disequilibria coefficients that can exist between two loci (see Weir 1996).
Data for assessing the occurrence of nonrandomness between loci can take two general forms: (1) phase-known data, where specific chromosomal or gametic combinations of variants are explicitly known; and (2) genotypic data with phase unknown. We treat each class of data separately.
Consider phase-known data of two loci with alleles A/a and B/b from n groups. Under the full model the log of expected frequencies of the four possible gametes in group i (i = 1, 2, ..., n) can be parameterized as
Analyzing phase-unknown genotypic data for disequilibria is more complex and involves numerous disequilibria terms. Assuming phase for double heterozygotes is unknown and that only nine genotypic classes can be distinguished, under the full model the log of genotype frequencies can be expressed as
If gametic phase is known, it may be desirable to explicitly evaluate all disequilibria terms. In this case, the terms S_{AB} and Q_{AB} can be replaced by M_{AB} and M_{A/B} (Weir and Wilson 1986). The latter two terms represent the intra- and intergametic digenic disequilibria, respectively.
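For phase-known data, a simple per-group summary of intragametic association can be computed directly from gamete counts. The sketch below is illustrative and is not the parameterization of the log-linear models above. It reports the familiar additive coefficient D_{AB} = P_{AB} − p_{A}p_{B} and the cross-product ratio of the four gamete counts, a multiplicative measure that equals 1 under linkage equilibrium. The function name and the counts are hypothetical.

```python
def gametic_summary(n_AB, n_Ab, n_aB, n_ab):
    """Return (D_AB, cross_product_ratio) for one group's gamete counts.

    D_AB is the additive gametic disequilibrium P_AB - p_A * p_B.
    The cross-product ratio is (n_AB * n_ab) / (n_Ab * n_aB); it is
    undefined when either repulsion-phase count is zero.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    P_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    D_AB = P_AB - p_A * p_B
    ratio = (n_AB * n_ab) / (n_Ab * n_aB)
    return D_AB, ratio


# Hypothetical gamete counts (AB, Ab, aB, ab) for two groups.
group_1 = gametic_summary(25, 25, 25, 25)
group_2 = gametic_summary(40, 10, 10, 40)
print(group_1, group_2)
```

Comparing such summaries between groups is only descriptive; a formal test of concordance would fit the models with and without the group-by-disequilibrium interaction term.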
Example test of groups for concordant equilibria at two loci, phase unknown: Because the equations presented for phase-unknown data above are overparameterized, we assume no quadrigenic disequilibria and set the term
DISCUSSION
Tests of genetic differentiation have commonly involved directly comparing allele and/or genotype frequencies between groups, with possible adjustment for multiple testing. We have noted that an outcome of selective processes is the differential departures from population genetic equilibria at causative genes in affected and unaffected groups. Here, approaches have been presented to directly test the null hypothesis that the disequilibria coefficients in different groups are the same. To discuss the attributes of these tests we focus primarily on the application of testing for genetic association.
Properties of the single-locus test: The single-locus test for concordant equilibria between groups has several advantages over standard association tests. The log-linear models allow partitioning of the differentiation between samples into the contributions of alleles and interallelic interaction. These contributions are confounded in the conventional genotype goodness-of-fit test. Disentangling these effects enables explicit assessment of hypotheses concerning their roles, reducing the degrees of freedom for tests of complex genetic etiologies. For biallelic loci the degrees of freedom for the concordant equilibria test are equal to that for allele association tests and are intermediate between those for allele and genotype association tests for multiallelic loci. Moreover, the test for concordant equilibria can detect genetic associations where the underlying genetic causation is complex. Alternatively, the full hierarchy of models (Table 3, models 4-6) may be considered and the basis for rejection of the null hypothesis can be identified. The latter approach avoids the necessity of applying a multiple test correction that arises when both allele and genotype association tests are performed. In our example, after rejecting the null hypothesis for model 4 (Table 3) we see that the genotype distribution difference arises not from allele frequency differences but as a consequence of the differential frequency of Δ32/Δ32 homozygotes in the two groups.
The log-linear approach should provide improvements in power relative to the conventional association tests. The conventional genotype goodness-of-fit test is a special case of the log-linear model: the deviance (Δ) from model 4 and the G value from a likelihood-ratio genotype goodness-of-fit test will be identical when the pooled sample exhibits perfect HW equilibria. If the pooled sample is not in HW equilibrium, both allele and genotype goodness-of-fit tests can suffer from changes in type I error from the assumed α level (Schaid and Jacobsen 1999). The incorporation of a disequilibria term in the log-linear model prior to testing for concordant allele frequency or equilibria will have the effect of reducing this bias, similar to the influence of other corrections aimed at removing bias from HW disequilibrium (Schaid and Jacobsen 1999).
Our numerical analysis implies that the monogenic disequilibria coefficients from the log-linear model will be equal only when genotypes do not differ in their risk of disease. Different genetic etiologies (e.g., multiplicative or additive genetic allele interactions) should therefore be detectable. Interestingly, the D coefficients in the affected and unaffected samples are identical when additivity occurs in a biallelic system (ω_{AA} = 1, ω_{Aa} = 0.5, ω_{aa} = 0). The further instances in which D of the two samples can be equal indicate that additive statistical models are inappropriate to assess groups for concordant equilibria.
Sparse data will cause an apparent increased type I error rate for the conventional allele and genotype goodness-of-fit tests as well as the log-linear models presented here (because the results concerning the distribution of the test statistic are asymptotic). The solution for such sparse data analysis is the same: generating the null distribution of the test statistic by resampling from the expected tables with the constraint that the permuted table marginals are the same as those of the observed table.
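One way to generate a null distribution with fixed marginals is to permute group labels among individuals, which leaves both the group sizes and the pooled genotype counts unchanged. The sketch below illustrates this idea under stated assumptions. It uses the generic likelihood-ratio G statistic for group-by-genotype homogeneity as a stand-in for the log-linear model deviance. The function names, the number of permutations, and the counts are all hypothetical.

```python
import math
import random


def g_statistic(table):
    """Likelihood-ratio G for independence in a groups x genotypes table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            if o > 0:
                e = row_totals[i] * col_totals[j] / n
                g += o * math.log(o / e)
    return 2.0 * g


def permutation_p_value(table, statistic, n_permutations=1000, seed=0):
    """Estimate a P value by shuffling group labels among individuals.

    Every permuted table has the same row and column marginals as the
    observed table.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(table), len(table[0])
    # Expand the table into one (group, genotype) label pair per individual.
    groups = [i for i, row in enumerate(table) for c in row for _ in range(c)]
    genotypes = [j for row in table for j, c in enumerate(row)
                 for _ in range(c)]
    observed = statistic(table)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(groups)
        permuted = [[0] * n_cols for _ in range(n_rows)]
        for i, j in zip(groups, genotypes):
            permuted[i][j] += 1
        if statistic(permuted) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical sparse table: rows are groups, columns are AA, Aa, aa.
sparse = [[12, 5, 0], [9, 6, 4]]
print(permutation_p_value(sparse, g_statistic, n_permutations=500))
```

Any statistic computed from the table can be passed in place of `g_statistic`, so the same resampling loop could in principle be applied to a model deviance.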
One of the potential advantages of comparing disequilibria coefficients between groups is that examination of coefficients should provide insights into the relationship between genotype and phenotype (Hernandez and Weir 1989). For analyses involving the phase-unknown two-locus model, consideration must be given to four additional di- and trigenic coefficients. While the biological basis for significance of such coefficients may not be straightforward, it seems likely that biological meaning can be attributed to them (Weir and Cockerham 1989). For example, large protein complexes involving several different genes, or multiple copies of the same gene, may be candidates in which different combinations of alleles from the member proteins can impact on the functional attributes of such complexes.
A shortcoming of HW tests on a single group is that under some combinations of ω_{ij}, a group may experience high levels of selection and yet retain HW proportions (M_{AA} = 1). This occurs when the heterozygote and homozygote coefficients fulfill the relationship ω^{2}_{Aa} = ω_{AA}ω_{aa} (Lewontin and Cockerham 1959), resulting in the appearance of HWE in the unaffected group. Similarly, the inverse situation (s^{2}_{Aa} = s_{AA}s_{aa}) will also result in apparent HWE in the affected group. Under the latter condition, the fine-scale mapping method of Nielsen et al. (1999) will fail. However, methods that utilize a reference sample will be informative when either the affected or unaffected group fulfills this condition, since the coefficients for the other group are not in accord with this relationship (i.e., if
A further potential shortcoming of HW-based tests for association would appear to be sensitivity to distortions arising from population admixture, since this can result in HW disequilibrium. If genetically differentiated populations that differ in disease incidence are inadvertently pooled in a disproportionate way between affecteds and unaffecteds, even unlinked markers can exhibit association. As for the standard allele and genotype goodness-of-fit tests, such potential confounding can be avoided by appropriate matching of affecteds and unaffecteds with regard to ethnic background. If affected and unaffected individuals are matched for ethnic background, and if there is no disease association, M_{AA} coefficients of the affected group, unaffected group, and total sample will all be the same, but not equal to 1.
As pointed out above, factors other than disease association can also lead to departure from HWE. Clearly, elimination of laboratory error as a potential source for departure from HWE is an essential first step. Excluding laboratory error, it is commonly assumed in genetic epidemiological studies that HW departure, manifest as excess homozygotes, in the random population of unaffecteds necessitates population admixture, and specific methods are employed to reduce the impact of this bias in evaluation of allele and genotypic frequencies between affecteds and unaffecteds (Chiano and Clayton 1998; Schaid and Jacobsen 1999). Yet available data for single-nucleotide polymorphisms do not support extensive genetic differentiation among the intensively studied populations from northwestern Europe (Cavalli-Sforza et al. 1994, p. 268; Goddard et al. 2000), for example, or even among the major ethnic groups (Barbujani et al. 1997). Moreover, as indicated above, under some combinations of penetrances the affected group may be in HWE while the unaffected group is in HW disequilibrium. Thus, a presumption of admixture should be avoided.
An important alternative to admixture is the operation of natural selection. For natural selection to cause detectable HW disequilibrium in a population the following are required: substantial fitness differences between genotypes, the selected genotype(s) be reasonably common, and the selective force also be reasonably common. Given these constraints, natural selection is not expected to be a frequent cause of HW disequilibrium. However, the classic example of the malarial resistance conferred by the β-globin allele Hbβ^{S} in African populations (Allison 1964), and HW disequilibrium where malaria is endemic, clearly demonstrates that the influence of natural selection on endemic human genetic variation is not just a theoretical possibility. Furthermore, by virtue of their involvement in regulating important biological functions, human candidate disease genes might be reasonably considered a priori to have a higher likelihood of being subject to natural selection than anonymous markers. Thus, selective origins for HW disequilibrium in random population unaffecteds should not be automatically dismissed.
Possible applications of the two-locus tests: The potential utility of testing for concordant equilibria in studies of affected and unaffected individuals is not restricted to single-locus comparisons. Many diseases may be polygenic, and epistatic genetic interactions, both within and between loci, are likely to be important in the etiology of the disease phenotype. One consequence of this interlocus dependence can be disequilibria between the loci. Either the phase-known or the phase-unknown two-locus model may therefore be used to test for a role of epistatic interactions in genetic association studies.
There is also considerable value in comparing disequilibria coefficients between natural populations. In cases where evolutionary parameters are known to differ between population samples, a formal comparison of disequilibria coefficients would provide a valuable test of theoretical expectations. Additionally, comparing disequilibria coefficients may be used as an exploratory tool to assess whether differences exist between population samples.
The causes of genetic differentiation between wild populations will almost certainly be more complex than the genetic model we have outlined for epidemiological studies (Table 1). In comparing wild populations, the differential incidence of any evolutionary process that can cause departures from HW or linkage equilibrium is a candidate for detected genetic differentiation. The interpretation of genetic differences between populations will therefore require combining knowledge of theory with knowledge of population attributes.
Summary: The ease with which log-linear models can be modified to incorporate different terms has been illustrated here by our addition of terms to log-linear models of HW and linkage equilibrium. Increasingly complex data sets aimed at characterizing patterns of genetic differentiation using multiallelic loci or multiple single-nucleotide polymorphisms from multiple genes (Goddard et al. 2000) can also be readily accommodated by including additional terms (Zhang et al. 1990). The models we have presented enable population samples to be formally tested for concordant equilibria, providing a biologically intuitive framework for the examination of genetic differentiation. The extensive theory describing the effect of evolutionary processes on disequilibria coefficients can then serve as a rich backdrop from which to understand the nature of biological processes contributing to the genetic differentiation between samples.
Acknowledgments
We thank John Hopper, whose comments initiated this work, Robert Attenborough and Simon Easteal for comments on the manuscript, and Michelle Vella for assisting us with implementing the models in SAS.
Footnotes

Communicating editor: A. H. D. Brown
 Received March 26, 2000.
 Accepted September 11, 2000.
 Copyright © 2000 by the Genetics Society of America