## Abstract

When the classical χ^{2} goodness-of-fit test for Hardy-Weinberg (HW) equilibrium is used on samples with related individuals, the type I error can be greatly inflated. In particular the test is inappropriate in population isolates where the individuals are related through multiple lines of descent. In this article, we propose a new test for HW (the QL-HW test) suitable for any sample with related individuals, including large inbred pedigrees, provided that their genealogy is known. Performed conditional on the pedigree structure, the QL-HW test detects departures from HW that are not due to the genealogy. Because the computation of the QL-HW test becomes intractable for very polymorphic loci in large inbred pedigrees, a simpler alternative, the GCC-HW test, is also proposed. The statistical properties of the QL-HW and GCC-HW tests are studied through simulations considering a sample of independent nuclear families, a sample of extended outbred genealogies, and samples from the Hutterite population, a North American highly inbred isolate. Finally, the method is used to test a set of 143 biallelic markers spanning 82 genes in this latter population.

WHEN there are no disturbing forces such as selection, mutation, or migration that would change allele frequencies over time and when there is random mating in a very large population, the two alleles of an individual are known not to be associated. As a consequence, the genotype frequencies are given by the product of the allele frequencies and the population is said to be in Hardy-Weinberg equilibrium (HWE). Although it is one of the most basic and oldest tests in genetics, HWE testing has lately regained serious attention from human geneticists. Nielsen* et al*. (1999) proposed using it to fine map susceptibility loci. Indeed, as shown by these authors, when considering case-only samples and thus conditioning on phenotypes, departure from HW is expected at susceptibility loci acting in a nonmultiplicative manner. More recently, Lee (2003) proposed using the test as a genome-wide screening tool for association. As outlined by Weinberg and Morris (2003), this latter strategy should be considered very carefully. It has important caveats that include little power to detect some plausible genetic models and nonnegligible probability of detecting departures from HW (HWD) due to other reasons, which include selection not driven by the phenotype under study, population stratification, and genotyping errors. In the present context of intensive use of SNP markers, for which the proportion of genotyping errors that are not detectable by routine Mendelian incompatibility checks is fairly significant, as outlined by Douglas *et al.* (2002), testing for HWE is an additional tool with which to identify genotyping errors. Xu *et al.* (2002) have shown how systematic use of HW testing in case samples and in control samples may help to avoid false detection of association signals driven by genotyping errors in only one of the two samples (either cases or controls).

Classical approaches to testing for HWE include Pearson's χ^{2} goodness-of-fit test (Gof-HW) and the corresponding likelihood-ratio test for unrelated individuals. However, these tests are not suitable for use in samples containing related individuals as they assume independence among the observations. Neglecting these correlations may lead to spurious conclusions (as we demonstrate in results). Further, permutation-based tests where alleles are randomly reassigned to genotypes to get null distributions are not suitable. Indeed, for such tests to be valid, the alleles must be exchangeable under the null hypothesis, which is not generally the case when considering related individuals. In this article, we propose a method for solving the problem of testing for HWE in a sample with any kind of related individuals, provided that their relationship is known. The method is designed to be suitable for large inbred pedigrees in which the relatedness among individuals may be considerable with many or all individuals related through multiple lines of descent. In addition to requiring correction for nonindependence among the individuals, testing for HWE in large inbred pedigrees also requires precise specification of the hypothesis to be tested. This is because, by definition, in an inbred individual *i*, a correlation exists between *i*'s two alleles, which is measured by the inbreeding coefficient *h _{i}*. When the genealogy of the population is available, the inbreeding coefficient can be computed for any individual so that the correlation between the two alleles of any individual due to the pedigree structure can be measured. We propose a test performed conditional on the relationship among the genotyped individuals specified by the pedigree. In this case, the null hypothesis of the test is that, conditional on the pedigree information, the random variable representing assignment of genotypes to genotyped individuals follows the distribution that would be obtained if the founders' alleles were i.i.d. draws from a distribution (

*p*

_{1}, … ,

*p*), where

_{a}*p*is the frequency of allele

_{i}*i*,

*p*> 0, and ∑

_{i}*p*= 1 and if, at each meiosis in the pedigree, one of the two parental alleles was passed on with chance 1/2 each (notation defined throughout the article is summarized in Table 1). Deviation from this hypothesis could arise if, within the population, conditional on relationship, mate choice further depended on the genotypes at a given locus. It could also arise due to genotyping errors, selection, mutation, or, at a susceptibility locus, conditioning on phenotype information.

_{i}We have recently proposed (Bourgain *et al.* 2003) a valid test for case-control association studies in samples with related individuals that is suitable for large inbred pedigrees too complex for exact likelihood computations. Inference is based on the quasi-likelihood score function proposed by McPeek *et al.* (2004) and a quasi-likelihood score (QLS) statistic for case-control association is constructed. Under certain model assumptions and regularity conditions, this test is asymptotically the locally most powerful test of a general class of linear tests, which includes the χ^{2} test for case-control association corrected for the correlation within and between the individuals. Here, we use a similar idea to construct a quasi-likelihood score test for HWE (QL-HW test). Because it is based on genotypes rather than on alleles, the QL-HW statistic is more complex than the QLS. Allele frequencies do not factor out of the mean and derivative vectors and out of the covariance matrix, as they do in the QLS. Further, whereas the QLS requires only the inbreeding and kinship coefficients to summarize the correlation within an individual and between any pair of individuals, the QL-HW requires the values of the nine condensed identity coefficients for any pair of individuals. We use the efficient algorithm of Abney* et al.* (2000) to compute these coefficients in complex inbred pedigrees. Because the computation of the QL-HW test may become intractable for very polymorphic loci in large inbred pedigrees, we propose a simpler alternative, the generalized corrected χ^{2} test for Hardy-Weinberg (GCC-HW test). In the case of an outbred sample with a biallelic locus, the GCC-HW test is equivalent to the Gof-HW test with a variance correction that appropriately accounts for inter-individual correlations. In the case of an inbred sample or when considering loci with more than two alleles, the GCC-HW test is a generalization of the corrected χ^{2} test. GCC-HW is also performed conditional on the pedigree structure.

In what follows we first describe the QL-HW and GCC-HW statistics. Type I error of the two tests is assessed in samples of outbred nuclear or extended families and in samples of Hutterites from South Dakota, a North American religious and highly inbred isolate with known genealogy. The inflated type I error introduced in the classical Gof-HW test by the presence of related individuals, using either the χ^{2} distribution or a permutation procedure to get significance, is illustrated in the outbred samples. Power of the QL-HW and GCC-HW tests to detect deviations from HWE in samples of inbred and outbred individuals is evaluated. Finally, the QL-HW test is applied to a set of 143 intragenic biallelic markers typed in samples of Hutterites.

## METHODS

### Model:

Consider a sample of *N* individuals and a locus with *a* different alleles. Define *M* = *a*(*a* + 1)/2, the number of possible distinct genotypes. Let *Y* = (*Y*_{1}, … , *Y _{N}*)

^{T}be a vector of length

*N*(

*M*− 1) (a [

*N*(

*M*− 1)] vector) with

*Y*= (

_{i}*Y*

_{i}_{1}, … ,

*Y*

_{i}_{(}

_{M}_{−1)})

^{T}a (

*M*− 1) vector of indicators, where

*Y*equals 1 if individual

_{ig}*i*has genotype

*g*and 0 otherwise. Define

*E*(

*Y*) = μ = (μ

_{1}, … , μ

*)*

_{N}^{T}with μ

*= (μ*

_{i}

_{i}_{1}, … , μ

*, … ,*

_{ig}*μ*

_{i}_{(}

_{M}_{−1)})

^{T}. As recognized by Weir (1996), departures from HWE can be characterized in several ways, including use of fixation indices (correlations between any pair of alleles within individuals) and disequilibrium coefficients (difference between a frequency and its value expected when there is no association between alleles). We propose considering a model based on a fixation index, the model of heterozygote excess and deficiency (the HED model) studied by Rousset and Raymond (1995), and we extend it to explicitly include

*N*different inbreeding coefficients (

*h*) for the

_{i}*N*individuals. Table 2 gives the corresponding expression of μ

*, the*

_{i}*i*th component of μ. This model has

*a*distinct parameters: the (

*a*− 1) independent allele frequencies

*p*= (

*p*

_{1}, … ,

*p*, … ,

_{k}*p*

_{a}_{−1}) and the fixation index

*r*with the following constraints on these parameters, 12where the constraints on

*r*are derived from the constraint that each genotype frequency lies between 0 and 1. Under the null hypothesis,

*r*= 0 while

*r*≠ 0 under the alternative hypothesis.

*p*is a nuisance parameter to be estimated.

In the simple case of outbred individuals and a biallelic marker, where there are three possible genotypes whose frequencies must sum to 1, our model for μ is equivalent to the fully parameterized model that has any two of the three genotype frequencies as free parameters. With a biallelic marker and inbred individuals, there is not a unique two-parameter model for μ that takes into account inbreeding. Either the HED model or the model based on a disequilibrium coefficient could be a natural choice. However, it turns out that in the biallelic case with inbreeding, these two models result in strictly equivalent score tests, so there is no need to choose between them. Thus, in the context of SNP studies where most deviations from HWE are likely to result from genotyping errors, our model is very general in allowing for deviation from HWE.

For multiallelic markers, our choice of model for μ narrows the alternative hypothesis to excess of all kinds of homozygous genotypes and lack of all kinds of heterozygous genotypes when *r* > 0 or the contrary when *r* < 0. A primary reason for choosing such a model is that it is relatively parsimonious in that it has only *a* free parameters. Furthermore, it is motivated by previous findings using microsatellites in which excess of all kinds of heterozygotes has been observed, in particular in the HLA region (Ober *et al.* 1997; Robertson *et al.* 1999). Testing for an excess or a deficit of homozygosity is also of interest for detection of genotyping errors (Sobel *et al.* 2002; see discussion). We believe that the HED model, in which the departure from HWE for homozygous genotypes depends on the frequency of the corresponding allele, is better suited to detection of genotyping errors than is the model based on a disequilibrium coefficient, in which the departure from HWE is the same for all homozygous genotypes. More complex models involving more than *a* parameters may be chosen when evidence for such alternative models is available. Otherwise, the increase in degrees of freedom of the test will reduce its power when the true alternative is indeed in the direction specified in our model.

### Covariance matrix:

For a pair of individuals in an inbred pedigree there are 15 possible ways in which the four alleles of the pair at a particular locus can be identical by descent (IBD; see Figure 1 and also Lange 1997, Chap. 5). When the paternal and the maternal allele of an individual do not need to be distinguished, these 15 configurations reduce to 9, referred to as identity states. Following Jacquard (1974), for a pair of individuals (*i*, *j*), we let Δ^{ij}_{k}, *k* = 1, … , 9, denote the conditional probability of identity state *k* given the relationship between the individuals *i* and *j*. These Δ* _{k}*'s are called condensed identity coefficients. In outbred individuals, Δ

^{ij}

_{7}, Δ

^{ij}

_{8}, and Δ

^{ij}

_{9}are, respectively, the probability of sharing two, one, or zero alleles IBD given the pedigree, and all other . Abney

*et al.*(2000) have implemented an efficient algorithm to compute the condensed identity coefficients in large complex pedigrees. Note that the inbreeding coefficients of individuals

*i*and

*j*can be obtained from the identity coefficients: and . Let

**∑**= var

_{0}(

*Y*) be the [

*N*(

*M*− 1) ×

*N*(

*M*− 1)] covariance matrix of

*Y*under the null hypothesis when

*r*= 0. Table 3 gives the entries of

**∑**as a function of

*p*and the 9 condensed identity coefficients. Define

**Ω**to be the covariance matrix of

*Y*under the alternative hypothesis, where

**Ω**may depend on both

*p*and

*r*. To derive a test statistic, we require only that

**Ω**be differentiable and invertible and that, under the null hypothesis,

**Ω**=

**∑**.

### Quasi-score test for Hardy-Weinberg equilibrium (QL-HW test):

We propose basing inference on the quasi-score function *U* (see, *e.g.*, McCullagh and Nelder 1989, Chap. 9):

Here *D _{r}* is a [

*N*(

*M*− 1)] vector and

*D*is a [(

_{p}*N*(

*M*− 1)) × (

*a*− 1)] matrix with the

*k*th column equal to (∂μ/∂

*p*). Table 4 gives the expressions for and (∂μ

_{k}*/∂*

_{ig}*p*). To construct our Hardy-Weinberg test statistic, we use a similar idea to that used in Bourgain

_{k}*et al.*(2003) for the problem of case-control association testing. In analogy with the classical likelihood score statistic when there is a nuisance parameter (see,

*e.g.*, Cox and Hinkley 1974, Chap. 9), we use

*U*to build a quasi-score statistic

*W*

_{QL–HW}:

Here *I ^{rr}* is the (

*r*,

*r*) component of [

*E*(

*UU*

^{T})]

^{−1}, where

*E*(

*UU*

^{T}) is analogous to the information matrix. In our case, the value of

*r*under the null hypothesis is

*r*

_{0}= 0.

*p̂*

_{0}is the quasi-likelihood estimate of the nuisance parameter

*p*when

*r*=

*r*

_{0}. In other words,

*p*=

*p̂*

_{0}is the solution of 3subject to constraints (1), assuming that there is exactly one such solution. Because we cannot derive a closed-form solution for Equation 3, we obtain

*p̂*

_{0}by use of the Newton-Raphson method with Fisher scoring as presented by McCullagh and Nelder (1989) for the quasi-likelihood framework.

*p̂*

_{0}is obtained by iterating 4until convergence occurs. Here,

*p̂*

^{i}

_{0}is the approximation to

*p̂*

_{0}at step

*i*,

**∑̂**,

*D̂*, and μ̂

_{p}_{0}are computed at

*r*= 0 and . To reduce the number of iterations before convergence, we start the iteration procedure with

*p̂*

^{1}

_{0}equal to the best linear unbiased estimator (BLUE) of

*p*proposed by McPeek

*et al.*(2004). This estimator is based on allele information and does not make use of the additional information of genotypes,

*i.e.,*which alleles are paired within the individuals. Finally, 5where

**∑̂**,

*D̂*,

_{p}*D̂*, and μ̂

_{r}_{0}are evaluated at

*r*= 0 and

*p*=

*p̂*

_{0}. As demonstrated in Heyde (1997), under the null hypothesis, the quasi-score statistic should follow a χ

^{2}distribution with 1 d.f. provided that Var

_{0}(

*U*)

^{−1/2}

*U*∼ ℳ𝒱𝒩(0,

*I*) under the null hypothesis. We tested the accuracy of the χ

^{2}approximation to the null distribution of

*W*

_{QL–HW}(see

*Null distribution and power study of the W*).

_{QL–HW}and W_{GCC–HW}statisticsTo efficiently perform the calculations required in Equation 4, at each step, we take the Cholesky decomposition of **∑̂**, *i.e.,* we find an upper triangular matrix **B** such that **B**^{T}**B** = **∑̂**, using an algorithm that simultaneously computes *b*_{0} = **B**^{−T}*D̂ _{p}* and

*b*

_{1}=

**B**

^{−T}(

*Y*− μ̂

_{0}) at little extra cost (Graybill 1976). Then Equation 4 becomes , where

*b*

^{T}

_{0}

*b*

_{0}is an [(

*a*− 1) × (

*a*− 1)] matrix, also inverted using its Cholesky decomposition.

*W*

_{QL–HW}is computed with a similar technique in Equation 5.

We conclude this section by describing the form of the QL-HW test statistic in the case of *N* individuals belonging to *F* independent families. For each *f*, 1 ≤ *f* ≤ *F*, let *N _{f}* be the number of sampled individuals in family

*f*, and let

*Y*, μ

_{f}*,*

_{f}*D*,

_{pf}*D*, and

_{rf}**∑**

*be the vectors or matrices previously described but restricted to the*

_{f}*N*members of family

_{f}*f*. Allele frequencies are then computed using Equation 4, where and . Similarly

*W*

_{QL–HW}is computed using Equation 5, where all terms of the form

*X*

^{T}

**∑̂**

^{−1}

*B*(where

*X*and

*B*are either vectors or matrices) in the formula are computed as .

### GCC-HW test:

As the number of alleles per locus and the number of individuals in a family increase, the calculations of the Cholesky decompositions required in (4) and (5) become much more computationally intensive. In practice, with this method, we were not able to analyze Hutterite samples of 400 individuals for loci with more than five alleles. To overcome this limitation, we developed an alternative test, the GCC-HW test, which is computationally simpler than the QL-HW test and therefore suitable for an even wider set of situations (larger samples, larger number of alleles per locus). For outbred samples in the biallelic case, the GCC-HW test is Pearson's Gof-HW corrected using the variance that appropriately accounts for inter-individual correlations. In inbred samples or when considering multiallelic loci, the GCC-HW test is a generalization of this approach. We start by presenting this alternative test in the case of a biallelic locus in an outbred sample with related individuals and present its general expression in a second step. We then clarify how it differs from the QL-HW test.

### Special case—corrected χ^{2} test for HW for biallelic loci in outbred samples with related individuals:

Let us first consider a biallelic locus genotyped in an outbred sample with independent individuals. In this case, the QL-HW test is the same as the Gof-HW test, which can be written as , where *N _{ij}* is the observed count for genotype

*ij*and

*p̂*is the allele frequency estimate of allele 1 obtained by naive counting [

_{n}*p̂*= (2

_{n}*N*

_{11}+

*N*

_{12})/2

*N*].

We next give an equivalent form for Gof-HW in terms of a vector-valued function which can be left unspecified except that is has the property that where *r*_{0} = 0. Here *D _{r}*,

*D*,

_{p}*Y*, and μ

_{0}are the vectors and matrix described in

*Quasi-score test for Hardy-Weinberg equilibrium (QL-HW test)*.

**K**is the [(

*M*− 1)

*N*× (

*M*− 1)

*N*] covariance matrix of

*Y*. Because all the individuals are outbred and independent,

**K**is a block-diagonal matrix with

**K**=

**I**

*⊗*

_{N}**K**

_{ind}, where

**K**

_{ind}is a 2 × 2 matrix whose entries depend only on

*p*and are given in the first part of Table 3. It turns out that the Gof-HW test is equal to

*W*

_{χ2HW}, where 6

Here, *Ŝ _{r}* is computed at (

*r*,

*p*) = (

*r*

_{0},

*p̂*

_{0}).

*p*=

*p̂*

_{0}is the solution of

*S*(

_{p}*r*

_{0},

*p*) = 0 and is actually equal to

*p̂*.

_{n}_{rr}denotes the (

*r*,

*r*) entry of var(

*S*)

^{−1}computed at (

*r*,

*p*) = (

*r*

_{0},

*p̂*

_{0}). It is actually equal to 1/

*N*.

One way to extend (6) so that it is valid when the individuals are related is to consider the same *S*(*r*, *p*) function (with **K** = **I*** _{N}* ⊗

**K**

_{ind}, thus neglecting the inter-individual correlations) and to recompute

_{rr}to take the inter-individual correlations into account. We have var

_{0}(

*Y*) =

**∑**, with elements given in Table 3, where all

*h*= 0 in this case. Using and noting that for outbred individuals, we get , where

_{i}**∑̂**,

**K̂**, and

*D̂*are computed at (

_{r}*r*,

*p*) = (

*r*

_{0},

*p̂*

_{0}) (here again,

*p̂*

_{0}=

*p̂*). We assess the accuracy of the χ

_{n}^{2}approximation to the null distribution of this corrected χ

^{2}(

*W*

_{CC}_{–}

*) statistic in*

_{HW}*Null distribution and power study of the W*.

_{QL–HW}and W_{GCC–HW}statistics### General form of GCC-HW test:

There does not appear to be a Pearson χ^{2} test corresponding to the HED model for multiallelic loci. Furthermore, in the case of inbreeding, while one can extend the Gof-HW test to allow for a mean level of inbreeding, it does not take into account the different inbreeding coefficients for different individuals. To generalize the *W*_{CC–HW} statistic, we consider the following framework. Let and define to be the solution of *C*(*r*, *p*) = 0 subject to constraints (1) and (2), assuming that there is exactly one such solution. Although the framework is more general, we consider here *C* to be the form 7for some invertible matrix **L** that may depend on *r* and *p*, with *D _{i}*,

**L**, and μ not depending on

*Y*. Assume that is consistent for the parameters of the HED model and is asymptotically normal. Following the arguments used by Cox and Hinkley (1974)(Chap. 9) to derive the score test, under certain regularity conditions, we consider the expansion of

*C*(

*r*,

*p*) in a Taylor series about where

Then, 8and 9

Under the assumptions made on , 10should have an asymptotic χ^{2} distribution with dim(*r*) d.f. Here is the (*r*, *r*) entry of var. To test the null hypothesis *r* = *r*_{0}, we plug in *r* = *r*_{0} in (10). *p* is a nuisance parameter and we use its restricted estimator *p̂*_{0}, where *p* = *p̂*_{0} is the solution of *C _{p}*(

*r*

_{0},

*p*) = 0, subject to constraints (1), assuming that there is exactly one such solution. Under certain regularity conditions, this is asymptotically equivalent to the case when the true

*p*is used.

The *W*_{GCC–HW} statistic has the general form defined in (10), where *C*(*r*, *p*) is given by (7) with **L** set to **K**, the matrix defined in the subsection *Special case—corrected χ ^{2} test for HW for biallelic loci in outbred samples with related individuals*.

are approximated with Equations 8 and 9, respectively. Doing so, we can consider the same HED model as the one used in the QL-HW test, which allows a distinct *h _{i}* for each individual, by simply using for μ the model described in Table 2. To test for HW, we use

*r*

_{0}= 0. The

**K**matrix used in

*C*(

*r*,

*p*) is the covariance matrix of

*Y*where the intra-individual correlations are taken into account but where the inter-individuals correlations are neglected. Thus,

**K**is still a block diagonal matrix but with distinct matrix elements

**K**

*of size (*

_{i}*M*− 1) × (

*M*− 1), whose entries depend on

*p*and

*h*, and

_{i}**K**

^{−1}is a block-diagonal matrix with matrix element

**K**

^{−1}

_{i}. We show in the appendix that

*W*

_{GCC–HW}reduces to , where

*Ĉ*=

_{r}*C*(

_{r}*r*

_{0},

*p̂*

_{0}) and 11computed at (

*r*,

*p*) = (

*r*

_{0},

*p̂*

_{0}). In outbred samples,

*p̂*

_{0}corresponds to the naive counting estimator in which allele frequencies are estimated by their observed proportions. In inbred samples,

*C*(

_{p}*r*

_{0},

*p*) = 0 may not have a closed-form solution, in which case we use the Newton-Raphson method, as we did for the QL-HW test:

*p̂*

_{0}is obtained by iteration using Equation 4, where

**∑**is replaced by

**K**. We start the iteration procedure with

*p̂*

^{1}

_{0}equal to the BLUE (McPeek

*et al.*2004) computed under the pretense that all the individuals have their correct inbreeding coefficients but are independent. Note that, because

*D*

_{r}^{T}

**K**

^{−1}

*D*= 0 when considering a biallelic locus in an outbred sample,

_{p}*W*

_{CC–HW}is a special case of

*W*

_{GCC–HW}. Whereas the computation of

*W*

_{QL–HW}requires the Cholesky decomposition of

**∑**, a covariance matrix of size [(

*M*− 1)

*N*× (

*M*− 1)

*N*], the computation of

*W*

_{GCC–HW}requires only the Cholesky decompositions of the

*N*

**K**

*matrices of size [(*

_{i}*M*− 1) × (

*M*− 1)]. It is thus much less computationally intensive. Similarly to the

*W*

_{QL–HW}quasi-score statistic, we assess the accuracy of the χ

^{2}approximation to the null distribution of

*W*

_{GCC–HW}in results.

When the *N* individuals belong to *F* independent families, *W*_{GCC–HW} and *p̂*_{0} are computed using the same formulas as in the one-family case presented above but with all the terms of the form *X*^{T}**K̂**^{−1}∑̂K̂^{−1}*B* computed as .

### Explanation of the difference between QL-HW and GCC-HW:

Because both have an appropriate variance correction, both QL-HW and GCC-HW are valid tests that are approximately χ^{2} distributed in sufficiently large samples under regularity conditions. The difference is that, in the definition of GCC-HW, the matrix **K** is plugged in for **L** in Equation 7, while in the definition of QL-HW, the matrix **∑** is plugged in for **L**. This difference potentially affects the power of the statistics. For instance, by an argument similar to that given in Bourgain *et al.* (2003), the QL-HW should be asymptotically more powerful than GCC-HW for local alternatives (*i.e.,* alternatives close to the null hypothesis) under certain regularity conditions, assuming that the HED model holds, and, in fact, should be asymptotically locally most powerful of all statistics of the type defined by Equation 7. We compare actual power of GCC-HW and QL-HW under simulated scenarios in results.

### Null distribution and power study of the *W*_{QL–HW} and *W*_{GCC–HW} statistics:

To test the accuracy of the χ^{2} approximation to the null distributions of the *W*_{QL–HW} and *W*_{GCC–HW} statistics, we assessed their distributions under the null hypothesis using (i) a sample of 30 outbred independent nuclear families with two parents and two sibs typed (sample nuc), (ii) a sample of 456 individuals from 10 extended outbred genealogies (sample gaw), and (iii) two samples of 611 (sample H1) and 306 (sample H2) Hutterites. The 10 extended genealogies were simulated for the Genetic Analysis Workshop 12 (Almasy *et al.* 2001). The different genealogies are independent but the mean within-genealogy kinship coefficient is 0.0415 ([0.030–0.054]). The Hutterites are a North American religious isolate originating from Tyrol whose entire population can be traced back to 90 ancestors in the 1700s/1800s. The complete genealogy of the sample could be constructed from a ≥12,000-person Hutterite pedigree. This yielded a 1623-person pedigree that included all known ancestors of the individuals in samples H1 and H2. The nine identity coefficients for all pairs of individuals, from which the inbreeding coefficients can also be obtained, were computed using the efficient algorithm implemented by Abney *et al.* (2000). Genotype information for a marker was simulated by randomly drawing alleles independently from the same allele frequency distribution for (i) the parents of the nuclear families or (ii) the founders of the 10 genealogies or (iii) the founders of the 1623-person pedigree. Mendelian transmission of these alleles was then simulated (i) in the nuclear families, (ii) in the 10 genealogies, or (iii) throughout the 1623-person pedigree. The validity of the χ^{2} null distribution for *W*_{QL–HW} and *W*_{GCC–HW} was assessed by comparing the proportion of simulations showing a statistic whose value is greater than the χ^{2} threshold for a nominal type I error and the value of this nominal type I error. For the outbred nuc and gaw samples, the empirical type I error of the classical Gof-HW test was also assessed, using either the χ^{2} distribution or a permutation procedure (where alleles are randomly reassigned to genotypes 5000 times) to obtain significance.

For the Hutterite samples, in addition to the χ^{2} approximation, we also had to develop a parametric bootstrap procedure to assess significance, as follows: for a given data set, we estimated the allele frequencies (*p̂*_{data}) using Equation 4 (with **∑̂** for *W*_{QL–HW} and **K̂** for *W*_{GCC–HW}) and computed the value of the statistic *W*_{data} (either *W*_{QL–HW} or *W*_{GCC–HW}). *p̂*_{data} was then used to simulate 5000 replicates of the original data set under the null hypothesis (following the procedure described in the previous paragraph). The statistic was computed in each replicate. The empirical significance was assessed as the proportion of simulated statistic values greater than *W*_{data}. In principle, one could double-check the type I error of this parametric bootstrap procedure by performing a nested set of simulations, but we did not attempt this for computational reasons.

Simulations were also used to assess power. Even though HW testing is used mostly to identify potential genotyping errors in SNP studies, generating a genotyping error model would also have created Mendelian incompatibilities. As Mendelian incompatibilities are routinely checked before any other test, considering such a genotyping error model would have implied evaluation of a combined Mendelian checking/HW testing approach, whereas we wanted to focus solely on HW testing. Thus, we chose to consider a model without Mendelian incompatibilities. Further, we required this model to be suitable for samples with fixed genealogy. We simulated a selection pattern where the allele transmission from parent to offspring is not equally likely for the two parental alleles (we retain independence of children's genotypes conditional on parents' genotypes). For a biallelic marker, a single parameter 0 < *s* < 1 is required to describe our model. Table 5 gives the corresponding transmission probabilities from parents to offspring as a function of *s*. Here, we considered the situation where, for parental couples who are compatible with both homozygous and heterozygous children, the probability that any given child is heterozygous is greater than would be expected under Mendelian inheritance: *s* > ^{1}/_{2}. Power was assessed in the samples nuc, H1 and H2 using 5000 replicates. For the Hutterite samples, founders' alleles were simulated assuming HWE in the founding population, with founding allele frequency taken to be 0.5, 0.7, or 0.85. In the case of nuc, we assume that the families are sampled from a population that is in a steady state for our selection model. We give in Table 6 the population genotype frequencies corresponding to this steady state in an infinite-size population under random mating. When *s* > ^{1}/_{2}, the genotype frequency distribution depends only on *s* and not on *p* (whatever the initial allele frequency, *p* = 0.5 in the steady state). The parental genotypes of the nuc families were sampled independently from the steady-state genotype distribution for various choices of *s* > 0.5. In such a nuclear family sampling scheme, the classical Gof-HW test would consider only the 60 independent parents. To measure the gain in power obtained while using QL-HW on the global sample instead of the Gof-HW test on the parent sample, we derived analytically the power of the Gof-HW test using a noncentral χ^{2} distribution.

### Analysis of biallelic markers:

We used the *W*_{QL–HW} statistic to test for HW in a set of 143 biallelic markers in 82 genes that were genotyped in samples of Hutterites. A variety of genotyping methods were used: multiplex PCR and immobilized probe linear array system (LAS) (Mirel* et al.* 2002) for 48 markers (34%), single base extension with fluorescent polarization (SBE-FP; Chen *et al.* 1999) for 44 markers (31%), dot blot with allele-specific oligos (Ober *et al.* 2000) for 26 markers (18%), restriction fragment length polymorphism analysis (RFLP) for 13 markers (9%), size separation on acrylamide gel for 9 insertion/deletion (in/del) polymorphisms (6%), allele-specific (AS) PCR for 1 marker (0.7%), SSCP (Gonen *et al.* 1999) for 1 marker (0.7%), and denaturing high-performance liquid chromatography (DHPLC; Oefner and Underhill 1995) for 1 marker (0.7%). All markers were checked for Mendelian incompatibilities in the general pedigree using PEDCHECK (O'Connell and Weeks 1998).

## RESULTS

### Null Distribution of the *W*_{QL–HW} and *W*_{GCC–HW} statistics:

Table 7 presents the empirical type I error of the classical Gof-HW, *W*_{QL–HW}, and *W*_{GCC–HW} statistics in the outbred samples of nuclear families and in the extended outbred genealogies. The classical Gof-HW test, whatever the significance procedure used, either the χ^{2} distribution or the permutation test, is anti-conservative. While the inflation in the type I error is moderate in the nuclear families (type I error of 7% instead of 5%), it becomes substantial in the extended GAW12 genealogies (type I error of 10%). In contrast, when the χ^{2} distribution is used to assess significance of the *W*_{QL–HW} or *W*_{GCC–HW} statistics, the empirical type I error lies within the 95% confidence interval of the nominal type I error (5%) for all three allelic distributions considered in the founders. In the two Hutterite samples (results shown in Table 8), the empirical type I error for both *W*_{QL–HW} and *W*_{GCC–HW} is smaller than the nominal one. Use of the χ^{2} approximations to the null distributions of the statistics would thus lead to conservative tests in the Hutterite sample. We note, however, that when simulating a sample consisting of 10 independent copies of the 1623-person Hutterite pedigree and running the two statistics on a sample of 6110 Hutterites (10 times sample H1), the empirical type I error for a biallelic locus with frequency 0.5 in the founders is not statistically different from the nominal one when assuming a χ^{2} distribution. Similarly, when considering only 2 outbred genealogies out of the 10 from the gaw sample (corresponding to 106 individuals deriving from 41 founders), the χ^{2} distribution provides valid type I errors for both the *W*_{QL–HW} and the *W*_{GCC–HW} tests. The asymptotic conditions required for the *W*_{QL–HW} and *W*_{GCC–HW} to have a χ^{2} distribution under the null hypothesis should thus be met in most samples, except in the extreme case where all the individuals belong to a single inbred pedigree. In this case, the central limit theorem either may not apply or may require a larger sample size to be a good approximation, and the test may be conservative. We give in Table 8 the thresholds corresponding to a 5% nominal type I error, for three different allelic distributions, in the two Hutterite samples. These thresholds vary with sample sizes and allele frequencies. We would thus recommend use of empirical *P*-values obtained by the parametric bootstrap (as described in methods), for samples in which all the individuals come from a single inbred pedigree.

### Power of the *W*_{QL–HW} and *W*_{GCC–HW} statistics:

Figure 2 displays the power of *W*_{QL–HW} in the nuc sample (30 nuclear families) for values of *s* varying between 0.5 and 0.75, where power was computed using the χ^{2} approximation. Figure 2 also displays the power of the Gof-HW test in the sample of 60 independent parents that can be extracted from the nuclear families. The gain in power obtained while using *W*_{QL–HW} instead of Gof-HW can be substantial: when *s* = 0.65, the power of *W*_{QL–HW} is 79% instead of 55% for the Gof-HW test. There is still a notable increase in power at more realistic values of *s* close to 0.5. The power of *W*_{GCC–HW} was also computed for the nuc sample, but it is not shown in the plot as the results were almost indistinguishable from those for *W*_{QL–HW}. Thus, the simpler *W*_{GCC–HW} appears to be approximately as powerful as the *W*_{QL–HW} in this sampling scheme. Because they allow one to use all the information available in the families and thus to increase the sample size, the QL-HW and GCC-HW tests may significantly enhance the power to detect HWD when familial data are available.

In the Hutterite sample, founder genotypes are not available, so the Gof-HW test in founders cannot be performed. Thus, we compare power only for the *W*_{QL–HW} and *W*_{GCC–HW} tests. To see a power difference between the two tests, we set *s* = 0.6, even though this value is probably unrealistically high for selection in a human population. Table 9 displays the results, where significance is assessed using a correct 5% nominal threshold obtained by simulation under the true model (the thresholds are shown in Table 8). For the purpose of simulation studies, this method of assessing significance was used because it is much faster than the parametric bootstrap that we use in the real data examples. Because the same method is used with both statistics, we expect this to have a negligible impact on the power comparison. The *W*_{QL–HW} performs slightly better than the *W*_{GCC–HW} in the Hutterite samples; however, the power difference between the two tests is quite small. Thus, use of *W*_{GCC–HW} instead of *W*_{QL–HW} in situations in which *W*_{QL–HW} cannot be computed (*e.g.*, a highly polymorphic marker in a pedigree with many individuals typed) should not have too strong an impact on power. Conversely, *W*_{QL–HW} should be preferred when computable. Note that the power of both methods in the Hutterite samples is expected to increase when the allele frequency becomes closer to 0.5. Indeed, as can be seen from Table 5, our model is such that selection acts only when there is at least one heterozygous parent. When the allele frequency becomes closer to 0.5, the proportion of heterozygous parents increases and the impact of selection increases as a result.

### Marker testing for HWE in the Hutterites:

The results for the 14 markers, of the 143 biallelic markers tested, that showed an associated *P*-value <5% by the *W*_{QL–HW} are shown in Table 10. Four of these markers (HLA-G_689, LTA_26, OR12D2_2, OR2H3_587) are located within the HLA region, which has been associated with homozygous deficiencies in this population (Robertson *et al.* 1999). The genotyping methods used to type the 14 markers in Table 10 were not very different from those used to type the 143 markers except that the only marker typed with allele-specific PCR was among the 14 markers with a *P*-value ≤0.05, markers typed by LAS were underrepresented (34% of all markers but only 14% of the markers with a *P*-value ≤0.05), and markers typed by dot blot were overrepresented (18% of all markers but 28% of the markers with a *P*-value ≤0.05) (see Figure 3). Thus, it is unlikely that any particular genotyping method was more error prone or contributing disproportionately. We further investigated the genotyping history of these 14 markers. Two markers, HLA-G_689 and OR12D2_2, produced an excess of Mendelian errors and had already been either discarded (for OR12D2_2) or retyped using another method (for HLA-G_689). Two additional markers, PSG4_1 and PLAUR_210, had a high rate of ambiguous genotypes. Overall, it is likely that the deviations from HWE for these 4 markers were due to genotyping errors, although only 2 were previously identified as problematic by Mendelian error checking.

Finally, the remaining 10 markers with a *P*-value ≤0.05 of 143 (6.9%) represent slightly more than what is expected just by chance (but the observed proportion is not significantly different from 5% at the 5% level). Given that the *P*-values associated with each of these 10 markers are not dramatically small and that we had at least 1 other marker typed in 9 of these genes that was not significant, we believe that the small deviations from expectation in these 9 genes might be due to chance. We cannot rule out, however, that these departures reflect the real action of evolutionary forces.

## DISCUSSION

We demonstrated in this article that the classical Gof-HW test has inflated type I error in samples with related individuals when significance is assessed either with a χ^{2} approximation or with a permutation test. Sing and Rothman (1975) previously discussed the problem of testing for HWE in small human populations with nonnegligible correlations among the individuals. Whereas they corrected the expectation of the χ^{2} in a very particular case of correlation among the observations, we propose in this article a general approach suitable for any sample with related individuals provided that their genealogy is known. Our QL-HW test models the correlation between the two alleles of each individual due to the pedigree structure and, by making use of the quasi-likelihood framework, takes into account the correlations among the individuals in the construction of the test.

We chose to base our test on the model of heterozygote excess or deficiency studied by Rousset and Raymond (1995), and we extended it to allow a different inbreeding coefficient for each individual in the sample. In the context of SNP studies, where deviations from HWE are likely to result from genotyping errors, our model is very general in allowing for deviations from HWE. For multiallelic markers, our model specifically tests for an excess or a deficiency of homozygosity. A primary reason for choosing such a model is parsimony. Furthermore, it is motivated by previous findings using microsatellites in which excess of all kinds of heterozygotes has been observed, in particular in the HLA region (Ober *et al.* 1997; Robertson *et al.* 1999). This model is also relevant for detection of genotyping errors. Indeed, as noted by Sobel *et al.* (2002), when automated software or trained technicians genotype by scoring bands on a gel, the most common error is false homozygosity. An error occurs in this case when an allele amplifies insufficiently or a band falls outside a prescribed range. In addition, these authors also discuss the possibility of false heterozygosity when a stutter band is misinterpreted as a second allele. Thus, we expect our model to be useful for detecting genotype errors in the cases of both SNPs and multiallelic markers. Alternatively, a model involving more parameters could be incorporated into our quasi-likelihood score test framework, if the available evidence favored such a model.

We also propose a simpler alternative test, the GCC-HW test, to be used when the QL-HW test might not be computationally feasible, for instance, in the case of highly polymorphic loci in large inbred pedigrees. We found that for a variety of study designs, ranging from many nuclear families to two or more extended genealogies, the χ^{2} approximation to the null distributions of both the QL-HW and the GCC-HW was very accurate. It was only in the extreme case of a single large inbred pedigree that the tests were rather conservative. In that case, we recommend using the parametric bootstrap to assess significance. Further, we showed that the use of the GCC-HW test in place of the QL-HW test has only a very minor effect on the power to detect deviation from HWE. Finally, we illustrated on nuclear family data how the QL-HW test can improve the power to detect departure from HWE over the classical Gof-HW test.

We believe that, by making the HWE test available for a larger variety of samples and by increasing sample sizes through the inclusion of related individuals that are typically discarded, both the QL-HW and GCC-HW tests are of crucial interest in several relevant situations for geneticists. In particular, as illustrated by the data analysis presented here, HWE testing is an additional tool for genotyping error detection. Indeed, of the four markers with HWD and strong suspicion of typing errors after scrupulous checking, only two had been previously detected by regular Mendelian error checking. We thus strongly recommend that HW tests be routinely performed to detect genotyping errors, particularly while working with SNPs. Finally, used very cautiously, the QL-HW and GCC-HW tests could also be considered as possible tools to detect association in samples with related individuals. The program to run the QL-HW and GCC-HW tests is available on request from C.B. (dbSNP home page, http://www.ncbi.nlm.nih.gov/SNP/index.html).

## APPENDIX:

### EXPRESSION OF *W*_{GCC–HW}

To derive *W*_{GCC–HW}, we need to evaluate (*r̂* − *r*) and at (*r*, *p*) = (*r*_{0}, *p̂*_{0}) using approximations (8) and (9), respectively, which require the computation of **A** ≡ *Ê*^{−1}_{} and **B** ≡ . We use the notation

It follows from (7) with **L** = **K** that , , and . Because *C _{p}*(

*r*

_{0},

*p̂*

_{0}) = 0 and letting

*Ĉ*=

_{r}*C*(

_{r}*r*

_{0},

*p̂*

_{0}), (8) implies that |

_{}≈ α

^{rr}

*Ĉ*

_{r}. Similarly, (9) implies

Note that and , where . Plugging these expressions in (10) gives, after some algebra, the final expression of den_{GCC} given in (11).

## Acknowledgments

The authors acknowledge Dina Newman for technical assistance and two anonymous reviewers for their thoughtful comments. The GAW12 data are available thanks to the Genetic Analysis Workshop grant GM31575. This study was supported by National Institutes of Health grants HL56399, HL66533, HG001645, and DK55889.

## Footnotes

Communicating editor: K. W. Broman

- Received May 28, 2004.
- Accepted September 8, 2004.

- Genetics Society of America