Genetics, Vol. 158, 875-883, June 2001, Copyright © 2001

An Unconditional Exact Test for the Hardy-Weinberg Equilibrium Law: Sample-Space Ordering Using the Bayes Factor

Luis E. Montoya-Delgado (a), Telba Z. Irony (b), Carlos A. de B. Pereira (c), and Martin R. Whittle (d)
(a) Universidad del Cauca, Popayan, Cauca, Colombia
(b) Division of Biostatistics, Center for Devices and Radiological Health, Food and Drug Administration, Rockville, Maryland 20850
(c) Instituto de Matematica e Estatistica, Universidade de São Paulo, 05008-090 São Paulo, SP, Brazil
(d) Genomic Engenharia Molecular Ltda, 01332-903 São Paulo, SP, Brazil

Corresponding author: Martin R. Whittle, Genomic Engenharia Molecular Ltda, Rua Itapeva 500, cj 5AB, 01332-903 São Paulo, SP, Brazil. E-mail: mwhittle@genomic.com.br

Communicating editor: G. A. CHURCHILL


*  ABSTRACT

Much forensic inference based upon DNA evidence is made assuming that the Hardy-Weinberg equilibrium (HWE) is valid for the genetic loci being used. Several statistical tests to detect and measure deviation from HWE have been devised, each having advantages and limitations. The limitations become more obvious when testing for deviation within multiallelic DNA loci is attempted. Here we present an exact test for HWE in the biallelic case, based on the ratio of weighted likelihoods under the null and alternative hypotheses, the Bayes factor. This test does not depend on asymptotic results and minimizes a linear combination of type I and type II errors. By ordering the sample space using the Bayes factor, we also define a significance (evidence) index, P value, using the weighted likelihood under the null hypothesis. We compare it to the conditional exact test for the case of sample size n = 10. Using the idea behind the method of χ² partition, the test is used sequentially to test equilibrium in the multiple allele case and then applied to two short tandem repeat loci, using a real Caucasian data bank, showing its usefulness.


ONE of the major uses of data from human multiallelic DNA loci is forensic inference. Because of the increasing use of variable number of tandem repeats (VNTR) and short tandem repeats (STR) loci, the importance of the Hardy-Weinberg equilibrium (HWE) has been reinforced (DEVLIN et al. 1991; GEISSER and JOHNSON 1992; DEKA et al. 1995; AYRES and BALDING 1998; SHOEMAKER et al. 1998) as a useful assumption in the analysis of DNA evidence, as used in human identification and paternity studies. The conclusions reached by analyzing such evidence depend on its probabilistic evaluation, and this evaluation is simplified if HWE is shown to be valid.

To test the HWE, the χ² test, the conditional exact test, and the likelihood-ratio test are usually used. For a complete discussion of these procedures, including some comparisons, see, for instance, HERNÁNDEZ and WEIR (1989), GUO and THOMPSON (1992), MAISTE (1993), WEIR (1996), and LAZZERONI and LANGE (1997). The conditional exact test is analogous to Fisher's exact test for contingency tables. There is a sufficient statistic, under the null hypothesis, that is considered to be known. Hence the p value is based on the conditional probabilities of the sample points given the value of the statistic.

MAISTE (1993) and MAISTE and WEIR (1995) contrasted these tests and claimed to show that the exact conditional test performs better. A problem with this test is how to define an order in the sample space to calculate the p value. In fact, this is the main difficulty when dealing with high-dimension sample spaces (KEMPTHORNE and FOLKS 1971). In addition, EMIGH (1980) makes a useful comparison of the various equilibrium tests.

In cases where the number of alleles per locus or the sample size is large, a technique to generate a Markov chain is used. The objective is to estimate the p values for the exact conditional test. For more details see GUO and THOMPSON (1992) and LAZZERONI and LANGE (1997). Today it is possible to perform the exact conditional test (or at least an approximation of it) for any sample size.

The χ² test is highly dependent on asymptotic results. In addition to being inefficient when the sample is insufficiently large, it may fail whenever there are categories (genotypes) of low expected frequencies. In the problem we consider, these genotypes are present in large numbers because of the inherent genetic structure and so it is not unusual to find an allele that appears only once or twice in a database, even when there is a considerable total number (≈5000) of points.

Being a nonasymptotic test, the conditional exact test is, on the other hand, useful in cases where small samples are dealt with. It depends on a multinomial distribution and requires the ranking of all the possible samples with the same frequencies of alleles and same sample size. However, this ranking of the possible samples is problematic as we enlarge the size of the sample. Another disadvantage of this test is that it imposes unrealistic probabilities on the data points whose probabilities had already been determined from conditioning. Recall that, by conditioning, the sample space is severely reduced and the probabilities may be drastically increased.

AYRES and BALDING (1998) propose a computational methodology to estimate the inbreeding coefficient, which allows one to measure the deviation from HWE under a model of inbreeding. SHOEMAKER et al. (1998) describe a Bayesian methodology to study the independence between pairs of alleles in a given locus; in this case, they consider the inbreeding coefficient and the disequilibrium coefficient (HERNÁNDEZ and WEIR 1989).

The problem of estimating the allelic frequencies from the Bayesian perspective was considered by GUNEL and WEARDEN (1995). CHOW and FONG (1992) studied this same problem, but as a particular case of the simultaneous estimation of related proportions.

The aim of this study is to develop an exact test on the basis of the comparison between weighted likelihoods (DICKEY and LIENTZ 1970) under the null and alternative hypotheses. The ratio of these two functions is the Bayes factor (BF). The distribution of the BF under the null hypothesis defines a natural order in the sample space. Therefore the test is exact and unconditional and does not depend on asymptotic results. In addition, the test is desirable, in terms of decision theory, in that it minimizes a linear combination of type I and type II errors. The approach suggested by BARNARD (1945) may be used to construct a non-Bayesian test that is unconditional and exact. In fact it considers suprema in the place of weighted averages.

The weight used to calculate the weighted likelihoods under each of the hypotheses is based on the a priori preferences, which are derived from the choice of an a priori distribution over the parametric space. The dimension of the subspace defined by the hypothesis of HWE is smaller than that of the original parametric space. Therefore, to calculate the weighted likelihood under the null hypothesis, we use line integrals. In the same manner, type I and type II errors are weighted average errors (IRONY and PEREIRA 1995). In other words, suppose α1 and β1 are the errors associated with the Neyman-Pearson test of simple hypotheses H0: θ = θ0 vs. H1: θ = θ1, where θ0 is a parametric element of the equilibrium curve and θ1 is outside the curve. Consider now the set of all such pairs. The weighted errors for the unconditional exact test are the weighted averages of these Neyman-Pearson errors.

Usually, a test of hypothesis consists of comparing the supremum of the likelihood in the subset of the parameter space corresponding to the null hypothesis with the supremum in the whole parameter space. The test presented in this article consists of comparing the averages of the likelihoods of these sets. That is, instead of comparing suprema the test compares averages. The most important property of the test is that it minimizes linear combinations of the two kinds of errors. For instance, if α and β are errors of the first and second types, the test presented here minimizes α + β (DEGROOT 1989). Thus for a fixed α, it maximizes (1 - β) - α and consequently also the power of the test.

We define the significance level of the test by ordering the sample space using the Bayes factor. The BF, following GOOD (1983), is the ratio of the weighted likelihood under the null hypothesis to the weighted likelihood under the alternative hypothesis. To compare two sample points, s and t, we calculate the BF of these points. The order of the sample points follows the order of the BFs. After ordering the sample space, consider s as the sample observation. We define the P value as the sum of the weighted likelihood under the null hypothesis over the set of points smaller than or equal to s. The idea of this Bayesian significance level is not new: it was suggested by KEMPTHORNE and FOLKS (1971). Note that the significance level as such uses the whole sample space in its calculation and so may not follow the likelihood principle (ROYALL 1997). Hence our P value may not be considered a full Bayesian procedure. A full Bayesian significance test for equilibrium and for contingency tables can be found in PEREIRA and STERN (1999).
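In symbols, the ordering and the P value just described can be sketched as follows. The notation is introduced here only for illustration: f_0 and f_1 stand for the weighted likelihoods under the null and alternative hypotheses, and \mathcal{X} for the sample space.

```latex
\mathrm{BF}(t) = \frac{f_0(t)}{f_1(t)}, \qquad
P(s) = \sum_{t \in \mathcal{X} \,:\, \mathrm{BF}(t) \le \mathrm{BF}(s)} f_0(t).
```

Because f_0 sums to one over the sample space, the sample point with the largest BF has P = 1, and points that are poorly supported by the null hypothesis relative to the alternative receive small values.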

The significance level, the P value, takes the alternative hypothesis into consideration in its calculation, which controls the type II error. Recall that the Bayes factor is the ratio between the probabilities under the two hypotheses. Usually this is not considered for the standard P value, which can cause problems such as rejecting the null hypothesis even when, under the alternative hypothesis, the observed sample has a lower probability (PEREIRA and WECHSLER 1993).

A program to compute our P value is available for the MatLab environment at the website http://www.ime.usp.br/~cpereira/signifpr.html.

A more complex test environment where two loci are considered simultaneously is presented by DEVLIN et al. (1996). A useful future project will be to construct a Bayesian alternative test for this situation and extend it to more than two loci.


*  METHODS

To exemplify the use of the test (see Examples), data from two STR loci were analyzed: D17S250 (WEBER et al. 1990) and MYC (POLYMEROPOULOS et al. 1992). Genomic DNA was obtained from unrelated, predominantly Caucasian individuals undergoing paternity testing nationwide by Genomic Engenharia Molecular Ltda. Alleles were amplified by PCR in the presence of [α-32P]dCTP, separated on DNA sequencing gels by electrophoresis, and visualized by autoradiography. Allele sizing was done by running adjacent M13 sequence ladders.

Definition and presentation of the exact unconditional test
Consider a single autosomal biallelic locus, comprising alleles a1 and a2, which does not undergo mutation. Let (a1, a1), (a1, a2), and (a2, a2) be the genotypes at this locus, and denote by p1, p2, and p3 the respective proportions of these genotypic classes in the population (p1 + p2 + p3 = 1). Let us suppose that the system is codominant; that is, distinct genotypic classes define distinct phenotypic classes. In this way, in a sample of size n, the frequencies of members in each class n1, n2, and n3, satisfying the condition n1 + n2 + n3 = n, can be observed.

In a panmictic population obeying Mendelian rules, equilibrium is attained in one reproductive generation and this assures the existence of a real number p ∈ (0, 1), such that the genotypic proportions satisfy the relations

\[ p_1 = p^2, \qquad p_2 = 2p(1 - p), \qquad p_3 = (1 - p)^2. \qquad (1) \]

Hence, to decide whether or not equilibrium exists, it is necessary to test the null hypothesis H0: ω = (p1, p2, p3) ∈ Ω0, where

\[ \Omega_0 = \{ (p_1, p_2, p_3) \in \Omega : p_1 = p^2,\ p_2 = 2p(1 - p),\ p_3 = (1 - p)^2,\ p \in (0, 1) \}, \]

against the alternative hypothesis H1: ω ∈ Ω1 = Ω - Ω0, where

\[ \Omega = \{ (p_1, p_2, p_3) : p_i \ge 0,\ p_1 + p_2 + p_3 = 1 \}. \]

Consequently, the statistical problem of interest is the construction of a procedure to test the following two alternative hypotheses,

\[ H_0 : \omega \in \Omega_0 \]

and

\[ H_1 : \omega \in \Omega_1, \]

where Ω, Ω0, and Ω1 are defined as above (see Figure 1).
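The equilibrium curve can be made concrete with a small sketch. The function names below are hypothetical, chosen for illustration only. The membership check relies on the identity p2² = 4·p1·p3, which holds for a point of the simplex exactly when it can be written as (p², 2p(1 - p), (1 - p)²).

```python
import math

def hwe_proportions(p):
    """Genotype proportions (p1, p2, p3) on the equilibrium curve for allele frequency p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    q = 1.0 - p
    return (p * p, 2.0 * p * q, q * q)

def on_hwe_curve(p1, p2, p3, tol=1e-9):
    """True when (p1, p2, p3) is in the simplex and satisfies p2^2 = 4 * p1 * p3."""
    in_simplex = min(p1, p2, p3) >= 0.0 and math.isclose(p1 + p2 + p3, 1.0, abs_tol=tol)
    return in_simplex and math.isclose(p2 * p2, 4.0 * p1 * p3, abs_tol=tol)
```

For example, `on_hwe_curve(*hwe_proportions(0.3))` holds, whereas a population with proportions (0.1, 0.8, 0.1) has an excess of heterozygotes and lies off the curve.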



 
Figure 1. The HWE curve. The complete parametric space is shown by the shaded area and the Hardy-Weinberg equilibrium is represented by the curve.

Assuming that the sample elements are obtained independently by a multivariate Bernoulli process with the sample size n fixed in advance, and representing the data by d = (n1, n2, n3), the likelihood function is given by

\[ L(\omega \mid d) = \frac{n!}{n_1!\, n_2!\, n_3!}\; p_1^{n_1} p_2^{n_2} p_3^{n_3}, \qquad (2) \]

where ω ∈ Ω.
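A minimal sketch of this likelihood, assuming it is the usual multinomial probability of d = (n1, n2, n3) given ω = (p1, p2, p3). It works on the log scale, which is an implementation choice made here (not taken from the article) so that samples of several thousand individuals do not overflow. The function name is hypothetical.

```python
import math

def log_multinomial_likelihood(counts, probs):
    """log L(omega | d) for counts d = (n1, n2, n3) and proportions omega = (p1, p2, p3)."""
    n = sum(counts)
    # log of n! / (n1! n2! n3!) through lgamma, avoiding huge factorials
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(k + 1) for k in counts)
    log_kernel = 0.0
    for k, p in zip(counts, probs):
        if k > 0:
            if p <= 0.0:
                return float("-inf")  # an observed class with zero probability
            log_kernel += k * math.log(p)
    return log_coef + log_kernel
```

As a check by hand, the sample (1, 8, 1) evaluated at the equilibrium point (0.25, 0.5, 0.25) gives 90 × 0.25 × 0.5⁸ × 0.25 ≈ 0.02197.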

Let us represent the researcher's prior preferences by a Dirichlet density function (see WILKS 1968) with parameter vector (α1, α2, α3), αi > 0. That is, if Γ[·] represents the Gamma function,

\[ f(\omega) = \frac{\Gamma[\alpha_1 + \alpha_2 + \alpha_3]}{\Gamma[\alpha_1]\,\Gamma[\alpha_2]\,\Gamma[\alpha_3]}\; p_1^{\alpha_1 - 1} p_2^{\alpha_2 - 1} p_3^{\alpha_3 - 1} \qquad (3) \]

is the function that defines the prior preferences of the possible parameter points ω = (p1, p2, p3). From the Bayesian perspective, to choose this density is to choose a conjugate prior since the posterior density will also be a Dirichlet density with parameters (A1, A2, A3), where Ai = αi + ni, i = 1, 2, or 3.
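The conjugacy claim can be checked in one line, assuming the multinomial likelihood and the Dirichlet prior in their standard forms: multiplying the two kernels simply adds the exponents.

```latex
f(\omega \mid d) \;\propto\;
\underbrace{p_1^{n_1} p_2^{n_2} p_3^{n_3}}_{\text{likelihood kernel}}
\times
\underbrace{p_1^{\alpha_1 - 1} p_2^{\alpha_2 - 1} p_3^{\alpha_3 - 1}}_{\text{prior kernel}}
= p_1^{A_1 - 1} p_2^{A_2 - 1} p_3^{A_3 - 1},
\qquad A_i = \alpha_i + n_i .
```

The right-hand side is again a Dirichlet kernel, so only the parameter vector needs updating when data arrive.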

Considering now (3) as the weighting system, the weighted likelihood average over Ω is given by

(4)

Also note that Ω0 is a line inside the simplex Ω and hence the weighted likelihood average over Ω0 is the ratio of two line integrals as follows:

(5)

Let us suppose that the a priori density over Ω is uniform. Thus, making αi = 1 for i = 1, 2, 3 in Equations 4 and 6, the exact values of the weighted likelihoods over Ω1 and Ω0 are given, respectively, by

(7)

and

Consequently if we assume that a priori the two hypotheses have equal probabilities, 0.5, we obtain

(8)

A test for the hypothesis of HWE consists of comparing BF [d] with unity. In this case we have a test that minimizes the sum of the average of the two types of errors.
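A numerical sketch of the Bayes factor under the uniform prior is given below. It rests on assumptions that are not confirmed by the surviving text: the line integral over Ω0 is taken with respect to arc length along the curve (p², 2p(1 - p), (1 - p)²), the integral is approximated by a midpoint rule, and the weighted likelihood over Ω1 is taken to be the standard Dirichlet-multinomial value 2/((n + 1)(n + 2)). All function names are hypothetical, and the output has not been checked against the published tables.

```python
import math

def arc_weight(p):
    """Length element of the curve (p^2, 2p(1-p), (1-p)^2) with respect to p."""
    return 2.0 * math.sqrt(6.0 * p * p - 6.0 * p + 2.0)

def null_weighted_likelihood(n1, n2, n3, grid=2000):
    """Average of the multinomial likelihood along the HWE curve (midpoint rule)."""
    n = n1 + n2 + n3
    coef = math.factorial(n) // (
        math.factorial(n1) * math.factorial(n2) * math.factorial(n3))
    num = den = 0.0
    for k in range(grid):
        p = (k + 0.5) / grid
        q = 1.0 - p
        w = arc_weight(p)
        # likelihood on the curve: coef * (p^2)^n1 * (2pq)^n2 * (q^2)^n3
        num += coef * p ** (2 * n1 + n2) * q ** (n2 + 2 * n3) * 2.0 ** n2 * w
        den += w
    return num / den

def alt_weighted_likelihood(n1, n2, n3):
    """Average of the likelihood over the simplex under a uniform prior."""
    n = n1 + n2 + n3
    return 2.0 / ((n + 1) * (n + 2))

def bayes_factor(n1, n2, n3, grid=2000):
    """Ratio of the null to the alternative weighted likelihood."""
    return null_weighted_likelihood(n1, n2, n3, grid) / alt_weighted_likelihood(n1, n2, n3)
```

Values above one favor equilibrium and values below one favor the alternative. This direct evaluation is meant for small samples only; for counts in the thousands the multinomial coefficient no longer fits in a float and a log-scale version would be needed.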

Sometimes the exact calculation of Equation 8 is not feasible and so it is useful to show approximations to its determination. An approximation to Equation 8 (using Taylor's expansion) is given by

Note that although we have no closed form for general Dirichlet priors, using Equations 5 and 6 and numerical integration, we can easily compute the Bayes factor for any choice of the prior parameters. In computing the P value using the program mentioned above for general Dirichlet priors, one needs only to adjust the data input. Instead of inputting the vector (x1, x2, x3), one must use (A1 - 1, A2 - 1, A3 - 1).

In this discussion we emphasize the use of uniform priors only for the purpose of a fair comparison with the alternative classical methods. Recall that with the uniform prior, the posterior is the normalized likelihood function. In the next section we provide a comparison of this proposed test for HWE to the conditional exact test.

Comparison between the unconditional and the conditional exact tests
In this section we compare the exact unconditional test proposed in the previous section to the traditional conditional exact test. Considering all possible samples of size n = 10, we calculate the P value for each of these samples. To distinguish the two different ways of computing the probabilities, we refer to p value in the conditional test and to P value in the unconditional one (see PEREIRA and WECHSLER 1993). Table 1 lists the BF and the P values. Table 2 and Table 3 show, respectively, the P values and p values, multiplied by 100, for the unconditional and for the conditional exact tests.


 

 
Table 1. Bayes factor and unconditional exact test P value


 

 
Table 2. Unconditional exact test (two-tailed): P value multiplied by 100


 

 
Table 3. Conditional exact test (two-tailed): p value multiplied by 100
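The unconditional P values for every sample of a given size can be sketched by enumerating the sample space, ordering it by the Bayes factor, and summing the null weighted likelihoods of all points ranked at or below each point. The sketch below makes the same assumptions as any numerical treatment of the line integral must: arc-length weighting on the equilibrium curve, a midpoint rule, and the uniform-prior value 2/((n + 1)(n + 2)) for the alternative. Function names are hypothetical, and the numbers it produces have not been verified against Tables 1 and 2.

```python
import math

def null_weighted_likelihood(n1, n2, n3, grid=2000):
    """Average of the multinomial likelihood along the HWE curve (midpoint rule)."""
    n = n1 + n2 + n3
    coef = math.factorial(n) // (
        math.factorial(n1) * math.factorial(n2) * math.factorial(n3))
    num = den = 0.0
    for k in range(grid):
        p = (k + 0.5) / grid
        q = 1.0 - p
        w = math.sqrt(6.0 * p * p - 6.0 * p + 2.0)  # arc-length weight up to a constant
        num += coef * p ** (2 * n1 + n2) * q ** (n2 + 2 * n3) * 2.0 ** n2 * w
        den += w
    return num / den

def sample_space(n):
    """All genotype count vectors (n1, n2, n3) with n1 + n2 + n3 = n."""
    return [(a, b, n - a - b) for a in range(n + 1) for b in range(n + 1 - a)]

def unconditional_p_values(n, grid=2000):
    """Map each sample point to the sum of f0 over points with BF no larger than its own."""
    f1 = 2.0 / ((n + 1) * (n + 2))          # uniform-prior average over the simplex
    points = sample_space(n)
    f0 = {d: null_weighted_likelihood(*d, grid=grid) for d in points}
    bf = {d: f0[d] / f1 for d in points}
    table = {}
    for s in points:
        # the small relative tolerance keeps points tied with s inside the sum
        table[s] = sum(f0[t] for t in points if bf[t] <= bf[s] * (1.0 + 1e-9))
    return table
```

Calling `unconditional_p_values(10)` returns one P value for each of the 66 sample points of size 10. The enumeration grows quadratically with n, so this direct approach suits small samples only.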

From Table 2 and Table 3 it can be seen that the conditional exact test is more conservative than the unconditional exact test. The unconditional test is observed to minimize the sum of the average of the two types of errors. For a sample where (n1, n2, n3) = (1, 8, 1) the unconditional exact test rejects the hypothesis of equilibrium (P = 0.036744 ≈ 0.04), whereas the conditional exact test does not reject it (p = 0.20).

The p value of the conditional exact test can be obtained from the weighted likelihood ratios. For example, let us fix the sample size at n = 10 and suppose that the total number of observed elements that show the allele a1 is 9. That is, T = 2n1 + n2 = 9. To compute the p value of the conditional exact test, it is enough to consider all possible sample points for which T = 9. To determine the conditional probability of a sample point (n1, n2, n3) given that 2n1 + n2 = 9 we have only to divide the BF obtained at this point by the sum of the BF of all points having T = 9. With these probabilities calculated we compute the p values in the usual manner, adding to the probability of each point all the smaller ones. Table 4 illustrates this calculation. Note that column 4, which is equal to column 3, is the second column divided by its sum.


 

 
Table 4. Conditional p value based on BF for the case where n = 10 and T = 9

In the general case, we fix the sample size at n and the total number of observed elements that have the allele a1 at T = t. Consider the statistic T = 2n1 + n2, where n1 and n2 are random variables that denote the number of individuals observed in the sample who have the genotypes (a1, a1) and (a1, a2), respectively. Then, for each d = (n1, n2, n3) with 2n1 + n2 = t and n1 + n2 + n3 = n, whatever the value of p in (0, 1), the conditional probability Pr[d | T = t] is given by the equation

where

Note that the sample space for n = 10 has a total of 66 sample points, in contrast to the conditional test, where T = 10, which considers only a sample set of 6 points: (0, 10, 0), (1, 8, 1), (2, 6, 2), (3, 4, 3), (4, 2, 4), and (5, 0, 5).
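The conditional test can be sketched with the standard closed form for the distribution of genotype counts given the allele count, Pr[d | T = t] = n!/(n1! n2! n3!) × 2^n2 / C(2n, t). This form follows from dividing the multinomial probability on the equilibrium curve by the binomial probability of T, and it is assumed here rather than copied from the article. Function names are hypothetical.

```python
import math

def conditional_probability(n1, n2, n3):
    """Pr[d | T = t] with t = 2*n1 + n2; the allele frequency p cancels out."""
    n = n1 + n2 + n3
    t = 2 * n1 + n2
    coef = math.factorial(n) // (
        math.factorial(n1) * math.factorial(n2) * math.factorial(n3))
    return coef * 2 ** n2 / math.comb(2 * n, t)

def conditional_sample_set(n, t):
    """All (n1, n2, n3) with n1 + n2 + n3 = n and 2*n1 + n2 = t."""
    points = []
    for n1 in range(t // 2 + 1):
        n2 = t - 2 * n1
        n3 = n - n1 - n2
        if n3 >= 0:
            points.append((n1, n2, n3))
    return points

def conditional_p_value(observed):
    """Sum of conditional probabilities no larger than that of the observed point."""
    n = sum(observed)
    t = 2 * observed[0] + observed[1]
    p_obs = conditional_probability(*observed)
    probs = [conditional_probability(*d) for d in conditional_sample_set(n, t)]
    return sum(pr for pr in probs if pr <= p_obs * (1.0 + 1e-12))
```

For n = 10 and T = 10, `conditional_sample_set(10, 10)` lists the six points named above, and `conditional_p_value((1, 8, 1))` evaluates to 36916/184756, about 0.1998, in line with the p = 0.20 quoted in the comparison.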

Hierarchical sequential testing for multiple alleles
The ideal situation to build the significance test for the multiallelic case would be to consider an ordering in the whole space. However, the dimensionality of the parameter space is extremely large. For instance, consider a locus with 20 alleles. For this example there are m(m + 1)/2 = 210 genotypic classes, so the parameter space is a simplex of dimension 209. Hence the number of possible sample points increases drastically. Theoretically, substituting line integrals with surface integrals, we can proceed exactly as in the biallelic case but at an extremely high computational cost. Next we present a sequential procedure that trades some precision for a lower computational cost.

Let us consider a single autosomal locus with multiple codominant alleles. Let (ai, aj) be the genotype referring to the alleles ai and aj, i = 1, · · ·, m; j = i, · · ·, m, and

is the proportion of the genotypic class (ai, aj) in the population.

Because the system is codominant, in a sample of size n, the frequencies of the elements in each genotypic class n1, n2, · · ·, n_{m(m+1)/2}, with \sum_{i=1}^{m(m+1)/2} n_i = n, can be identified.

Assuming that the sample elements are obtained independently, by using a Bernoulli (multivariate) process, we see that the likelihood function is proportional to

where

In a manner analogous to the case of the biallelic locus, the condition of equilibrium given in Equation 1 is characterized in the general case by Equation 9, because of the following statement: in a panmictic population that obeys Mendelian laws, equilibrium is attained in one reproductive generation and there are u1, u2, · · ·, um ∈ (0, 1) with \sum_{j=1}^{m} u_j = 1, such that they obey the relation

(9)

Although we can define the exact unconditional test using the BF (as in the case with two alleles), great difficulties arise when calculating the surface integrals in this case. Upon examining the population HWE determined by Equation 9, we can prove that the following statement, a property of the multinomial distribution, is true. This comprises the basis of the procedure that we propose to test the hypothesis of equilibrium in a situation of multiple alleles.

Statement: Let a1, a2, · · ·, am be the alleles under consideration. If for any i1 = 1, 2, · · ·, m, the condition of HWE given in Equation 1 is satisfied by the biallelic system ai1, AI1, where

and if for every j = 2, · · ·, m - 1, the condition of HWE is satisfied by the biallelic system aij, AIj, whatever ij in Ij-1, where

then the system with alleles a1, a2, · · ·, am satisfies the condition of HWE given by Equation 9.

Thus if a system with m alleles, A0 = {a1, a2, · · ·, am}, obeys the law of HWE, this law is obeyed by "any system" obtained by partitioning A0 into at least two nonempty subsets and upon considering each element of this partition as "an allele."

The idea behind the HWE test for the multiallelic case is based on the chi-square partition in contingency tables (see EVERITT 1977, for instance). Consider a data bank, sample S, for a specific locus and consider also that there are m (a positive integer) different alleles, a1, a2, · · ·, am. The order of testing, from the lowest allele frequency to the highest, has the objective of working, in each step, with the largest possible sample, so that the errors are as small as possible in every step.

The sequential procedure, to test the hypothesis of HWE, is as follows.

Procedure:

  • Step 1: Without loss of generality, call a1 the least frequent allele in sample S.

  • Step 2: Divide the sample S into three mutually exclusive sets: S11, all individuals with genotype (a1, a1); S1., all individuals with genotype (a1, ai), i ≠ 1; and S.., all individuals not having the allele a1.

  • Step 3: Apply the unconditional exact test for the biallelic case in the partition (S11, S1., S..). If HWE is rejected stop and declare the population to be in disequilibrium. If HWE is accepted go to step 4.

  • Step 4: If S.. is composed of elements with only one allele involved, stop and declare the population to be in equilibrium. If more than one allele is involved in the elements of S.., rename S.. as S and go to step 1.
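The four steps above can be sketched as a loop. The biallelic decision is passed in as a function `rejects(n11, n1x, nxx)`, a hypothetical hook standing in for the unconditional exact test at whatever cutoff the analyst chooses; the data layout (a dictionary from allele pairs to counts) and all names are likewise assumptions made for illustration.

```python
from collections import Counter

def allele_counts(genotypes):
    """Count allele copies from a dict {(allele_i, allele_j): number of individuals}."""
    counts = Counter()
    for (a, b), k in genotypes.items():
        counts[a] += k
        counts[b] += k
    return counts

def sequential_hwe(genotypes, rejects):
    """Return (in_equilibrium, steps); each step records (allele, n11, n1x, nxx)."""
    current = {g: k for g, k in genotypes.items() if k > 0}
    steps = []
    while True:
        counts = allele_counts(current)
        if len(counts) <= 1:
            return True, steps                     # step 4: a single allele remains
        # step 1: least frequent allele (ties broken by name for reproducibility)
        rare = min(counts, key=lambda a: (counts[a], str(a)))
        # step 2: homozygotes, heterozygotes carrying the allele, and everyone else
        n11 = current.get((rare, rare), 0)
        n1x = sum(k for (a, b), k in current.items() if (a == rare) != (b == rare))
        rest = {g: k for g, k in current.items() if rare not in g}
        nxx = sum(rest.values())
        steps.append((rare, n11, n1x, nxx))
        if rejects(n11, n1x, nxx):                 # step 3: stop at the first rejection
            return False, steps
        current = rest                             # step 4: continue with S..
```

With a rule that never rejects, a three-allele sample is reduced in two steps: the rarest allele is tested against the pooled remainder, then the next rarest within the individuals that do not carry the first.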

Examples
In the examples we illustrate the sequence in which the tests for equilibrium were performed and present the Bayes factors with their respective P values, together with the p values of the corresponding χ² tests.

Let ma be the number of alleles present in the locus being studied. For each i = 1, · · ·, ma - 1, we define

  • ai, the allele chosen to carry out the ith test;

  • (ai)c, the set made up of the remaining alleles to carry out the ith test;

  • ni1, the number of genotypes (ai, ai);

  • ni2, the number of genotypes (ai, (ai)c);

  • ni3, the number of genotypes ((ai)c, (ai)c); and

  • BFi, the Bayes factor corresponding to the ith test.

EXAMPLE 1: The following data were obtained from the STR locus MYC in which 19 alleles are observed and na = 5714 (Table 5).


 

 
Table 5. Testing results for Example 1 (p value for χ² test)

In this case, therefore, the hypothesis of HWE should be rejected.

EXAMPLE 2: Here the data were obtained from the STR locus D17S250 in which 21 alleles are seen and na = 5592 (Table 6).


 

 
Table 6. Testing results for Example 2 (p value for χ² test)

The hypothesis is not rejected in this example.


*  DISCUSSION

A Bayesian test of hypothesis was presented in this article. However, its evaluation and comparison with the alternative conditional test are done under a classical perspective. To start this discussion we recall that the generalized Neyman-Pearson (GNP) test is an optimal test under both perspectives, classical and Bayesian (DEGROOT 1989, Section 8.2). In the GNP situation, one compares two probability (density) functions, f0, the probability function under the null hypothesis, against f1, the one under the alternative hypothesis. Having chosen a constant k > 0, if f0 > (>=) k f1, then the null hypothesis is accepted. On the other hand, if f0 <= (<) k f1, then the null hypothesis is rejected. Note that, in fact, we compare the values of the two likelihoods at the sample point that effectively occurred, to make the decision. If α and β are, respectively, the probabilities of the two kinds of errors, the GNP is the test that minimizes the linear combination α + kβ. With an adequate choice of k as a function of losses and prior probabilities, this linear combination is the minimum expected loss, which makes the test optimal under the Bayesian perspective. PRESS (1989) presents a complete description of the Bayesian method.
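The optimality of the threshold rule can be seen from a short calculation for a discrete sample space. The symbol R for the rejection region is introduced here for illustration; α is the probability of R under f0 and β the probability of its complement under f1.

```latex
\alpha + k\beta
  = \sum_{x \in R} f_0(x) + k \sum_{x \notin R} f_1(x)
  = k + \sum_{x \in R} \bigl[ f_0(x) - k f_1(x) \bigr].
```

The sum on the right is smallest when R contains exactly the points whose bracketed term is negative, that is, the points with f0(x) < k f1(x). Points with f0(x) = k f1(x) contribute zero and may be placed on either side, which is why the inequalities in the rule can be taken strict or weak.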

The HWE case is different in that both hypotheses are composite. That is, each hypothesis is represented as a set of probability functions. One of the difficulties is that the two sets have different dimensions. The alternative hypothesis is bidimensional although the null hypothesis is unidimensional. The idea of the test, under the classical point of view, is to define as the two probability functions, f0 and f1, the averages of the likelihood over the parametric sets defined by the null and the alternative hypotheses, respectively. Having now two probability functions, we apply the GNP procedure to define the critical region. To compute the average p value we have to order the sample space. We say that a sample point xi is higher than xj, denoted xj <= xi, if f0(xi)/f1(xi) > f0(xj)/f1(xj). To define the P value (not p value) we consider the sum of f0(xj) over all sample points xj <= xo, where xo is the sample point effectively observed. Since we cover the whole sample space, we have called the test an unconditional exact test.

The unconditional test is opposite to the one that considers as the likelihood under the null hypothesis the conditional probability function given the observed value of T = 2n1 + n2, which is a sufficient statistic under the null hypothesis (HALDANE 1954; CHAPCO 1976; ELSTON and FORTHOFER 1977). To compute the p value in this case one must look only at the sample points with the same value of 2n1 + n2 as the observed one. This is the reason to call this test a conditional test. Tentatively, GEISSER and JOHNSON (1992) presented an unconditional test that was based on quantiles. However, it seems not to be appropriate, as discussed by DEVLIN et al. (1993a,b). CANNINGS and EDWARDS (1969), without conditioning, presented a way to estimate a deviation from the HWE. However, they did not discuss hypothesis testing.

Turning here to our procedure, from the Bayesian point of view, where a posterior density is defined over the parametric space, one could say that the test is fully conditional. The reason is that we compute a conditional density for the parameter given the observed sample point. Considering the uniform prior in the parametric space and 0.5 as the prior probability of the null hypothesis being true, the ratio of the average likelihoods, as presented, is the posterior odds, which compared with a chosen k would define the testing procedure presented. This is a test that minimizes ᾱ + kβ̄, where ᾱ and β̄ are the averages of the probabilities of errors of types I and II, respectively (PEREIRA 1985).

As far as we know, PEREIRA and ROGATKO (1984) presented the first Bayesian article for testing the HWE. However, ALTHAM (1971) presented a Bayesian estimation of a parameter that can be used to evaluate the HWE. It does not mention the HWE because it is described in a different context. PEREIRA and ROGATKO (1984) defined an ad hoc way to define the likelihoods, which could not be properly supported. They also presented credible sets that could be used to test HWE. The value of the credibility was used to define the size of the first kind of error.

LINDLEY (1988) considers a Bayesian estimation for equilibrium parameters in the case of two alleles. The parameters studied are obtained from an alternative parameterization that on the one hand allows the use of Gaussian priors, but on the other complicates the interpretation at the moment of assessing the a priori distribution.

The hierarchical sequential procedure described in this article is based on intuition. Recall that multinomial likelihoods can be factorized using partitions on the set of categories. Also, the HWE is a special kind of association among alleles at a specific locus. Whenever we conclude that a specific allele is in HWE association with all the others, we believe we do not have to use it again when testing the remaining alleles. It could be argued that, by using this procedure, the probability of rejecting HWE may increase as alleles are being eliminated. However, since the sample size is decreasing, the power of the tests will decrease. Hence, it is reasonable to believe that there is a compensation and that the procedure will do the job fairly. Note also that the sequence order depends on the rarity of the alleles in such a way that the sample size reduction occurs as slowly as possible.

Today, the use of Bayesian ideas in genetics is a reality. More recently, an interesting article by VIELAND (1998) suggested that the future of genetic data analysis is strongly related to the Bayesian paradigm.


*  ACKNOWLEDGMENTS

The views expressed are those of the authors and not necessarily those of the FDA.

Manuscript received September 17, 1999; Accepted for publication March 1, 2001.


*  LITERATURE CITED

ALTHAM, P. M. E., 1971  Exact Bayesian analysis of an intraclass 2 x 2 table. Biometrika 58:679-680.

AYRES, K. L. and D. J. BALDING, 1998  Measuring departures from Hardy-Weinberg: a Markov chain Monte Carlo method for estimating the inbreeding coefficient. Heredity 80:769-777.

BARNARD, G. A., 1945  A new test for 2 x 2 tables. Nature 156:177.

CANNINGS, C. and A. M. F. EDWARDS, 1969  Expected genotypic frequencies in a small sample: deviation from the Hardy-Weinberg equilibrium. Am. J. Hum. Genet. 21:245-247.

CHAPCO, W., 1976  An exact test of the Hardy-Weinberg law. Biometrics 32:183-189.

CHOW, M. and D. K. H. FONG, 1992  Simultaneous estimation of the Hardy-Weinberg proportions. Can. J. Stat. 20:291-296.

DEGROOT, M. H., 1989 Probability and Statistics, Ed. 2. Addison Wesley, Reading, MA.

DEKA, R., L. JIN, M. SHRIVER, and L. YU, 1995  Population genetics of dinucleotide (dC - dA)n: (dG - dT)n polymorphisms in world populations. Am. J. Hum. Genet. 56:461-474.

DEVLIN, B., N. RISCH, and S. ROEDER, 1991  Forensic DNA tests and Hardy-Weinberg equilibrium. Science 253:1039-1041.

DEVLIN, B., N. RISCH, and S. ROEDER, 1993a  Statistical evaluation of DNA fingerprinting: a critique of the NRC report. Science 259:748-749.

DEVLIN, B., N. RISCH, and S. ROEDER, 1993b  NRC report on DNA typing. Science 260:1057-1058.

DEVLIN, B., N. RISCH, and S. ROEDER, 1996  Disequilibrium mapping: composite likelihood for pairwise disequilibrium. Genomics 36:1-16.

DICKEY, J. M. and B. P. LIENTZ, 1970  The weighted likelihood ratio, sharp hypotheses about chances, the order of a Markov chain. Ann. Math. Stat. 41:214-266.

ELSTON, R. C. and R. FORTHOFER, 1977  Testing the Hardy-Weinberg equilibrium in small samples. Biometrics 33:536-542.

EMIGH, J. M., 1980  A comparison of tests for Hardy-Weinberg equilibrium. Biometrics 36:627-642.

EVERITT, B. S., 1977 The Analysis of Contingency Tables. Chapman & Hall, London.

GEISSER, S. and W. JOHNSON, 1992  Testing Hardy-Weinberg equilibrium on allelic data from VNTR loci. Am. J. Hum. Genet. 51:1084-1088.

GOOD, I. J., 1983 Good Thinking: The Foundations of Probability and Its Applications. University of Minnesota Press, Minneapolis.

GUNEL, E. and S. WEARDEN, 1995  Bayesian estimation and testing of gene frequencies. Theor. Appl. Genet. 91:534-543.

GUO, S. and E. THOMPSON, 1992  Performing the exact test for Hardy-Weinberg proportion for multiple alleles. Biometrics 48:361-372.

HALDANE, J. B. S., 1954  An exact test for randomness of mating. J. Genet. 52:631-635.

HERNÁNDEZ, J. L. and B. S. WEIR, 1989  A disequilibrium coefficient approach to Hardy-Weinberg testing. Biometrics 45:53-70.

IRONY, T. Z. and C. A. PEREIRA, 1995  Bayesian hypothesis test: using surface integrals to distribute prior information among hypotheses. Resenhas 2:27-46.

KEMPTHORNE, O., and L. FOLKS, 1971 Probability, Statistics, and Data Analysis. The Iowa State University Press, Ames, IA.

LAZZERONI, L. C. and K. LANGE, 1997  Markov chains for Monte Carlo tests of genetic equilibrium in multidimensional contingency tables. Ann. Stat. 27:138-168.

LINDLEY, D. V., 1988  Statistical inference concerning Hardy-Weinberg equilibrium. Bayesian Stat. 3:307-326.

MAISTE, P. J., 1993 Comparison of statistical tests for independence at genetic loci with many alleles. Ph.D. Thesis, North Carolina State University, Raleigh, NC.

MAISTE, P. J. and B. S. WEIR, 1995  A comparison of tests for independence in the FBI RFLP data bases. Genetica 96:125-138.

PEREIRA, C. A. DE B., 1985 Testing hypotheses defined in spaces of different dimensions: Bayesian vision and classical interpretation. Associate Professor Thesis, Instituto de Matemática e Estatística, University of São Paulo, Brazil (in Portuguese).

PEREIRA, C. A. DE B. and A. ROGATKO, 1984  The Hardy-Weinberg equilibrium under a Bayesian perspective. Braz. J. Genet. 4:689-707.

PEREIRA, C. A. and J. M. STERN, 1999  Evidence and credibility: full Bayesian significance test for precise hypotheses. Entropy 1:69-80.

PEREIRA, C. A. DE B. and S. WECHSLER, 1993  On the concept of P-value. Braz. J. Prob. Stat. 7:159-177.

POLYMEROPOULOS, M. H., H. XIAO, and C. R. MERRIL, 1992  Dinucleotide repeat polymorphism at the human c-myc oncogene locus (MYC). Hum. Mol. Genet. 1:65.

PRESS, S. J., 1989 Bayesian Statistics: Principles, Models, and Applications. John Wiley, New York.

ROYALL, R., 1997 Statistical Evidence: A Likelihood Paradigm. Chapman & Hall, London.

SHOEMAKER, J., I. PAINTER, and B. S. WEIR, 1998  A Bayesian characterization of Hardy-Weinberg disequilibrium. Genetics 149:2079-2088.

VIELAND, V. J., 1998  Bayesian linkage analysis, or: how I learned to stop worrying and love the posterior probability of linkage. Am. J. Hum. Genet. 63:947-954.

WEBER, J. L., A. E. KWITEC, P. E. MAY, F. S. COLLINS, and D. H. LEDBETTER, 1990  Dinucleotide repeat polymorphism at the D17S250 and D17S261 loci. Nucleic Acids Res. 18:4640.

WEIR, B. S., 1996 Genetic Data Analysis II. Sinauer Associates, Sunderland, MA.

WILKS, S. S., 1968 Mathematical Statistics. Wiley, New York.



