Abstract
Much forensic inference based upon DNA evidence is made assuming that the Hardy-Weinberg equilibrium (HWE) is valid for the genetic loci being used. Several statistical tests to detect and measure deviation from HWE have been devised, each having advantages and limitations. The limitations become more obvious when testing for deviation within multiallelic DNA loci is attempted. Here we present an exact test for HWE in the biallelic case, based on the ratio of weighted likelihoods under the null and alternative hypotheses, the Bayes factor. This test does not depend on asymptotic results and minimizes a linear combination of type I and type II errors. By ordering the sample space using the Bayes factor, we also define a significance (evidence) index, the P value, using the weighted likelihood under the null hypothesis. We compare it to the conditional exact test for the case of sample size n = 10. Using the idea underlying the method of χ^{2} partition, the test is applied sequentially to test equilibrium in the multiple-allele case; application to two short tandem repeat loci, using a real Caucasian data bank, shows its usefulness.
ONE of the major uses of data from human multiallelic DNA loci is forensic inference. Because of the increasing use of variable number of tandem repeats (VNTR) and short tandem repeats (STR) loci, the importance of the Hardy-Weinberg equilibrium (HWE) has been reinforced (Devlin et al. 1991; Geisser and Johnson 1992; Deka et al. 1995; Ayres and Balding 1998; Shoemaker et al. 1998), since it is a useful assumption in the analysis of DNA evidence, as used in human identification and paternity studies. The conclusions reached by analyzing such evidence depend on its probabilistic evaluation, and this evaluation is simplified if HWE is shown to be valid.
To test for HWE, the χ^{2} test, the conditional exact test, and the likelihood-ratio test are usually used. For a complete discussion of these procedures, including some comparisons, see, for instance, Hernandez and Weir (1989), Guo and Thompson (1992), Maiste (1993), Weir (1996), and Lazzeroni and Lange (1997). The conditional exact test is analogous to Fisher’s exact test for contingency tables: under the null hypothesis there is a sufficient statistic, whose observed value is treated as known, and the p value is based on the conditional probabilities of the sample points given the value of that statistic.
Maiste (1993) and Maiste and Weir (1995) contrasted these tests and concluded that the conditional exact test performs best. A problem with this test is defining an order on the sample space with which to calculate the p value; indeed, this is the main difficulty when dealing with high-dimensional sample spaces (Kempthorne and Folks 1971). In addition, Emigh (1980) makes a useful comparison of the various equilibrium tests.
In cases where the number of alleles per locus or the sample size is large, a technique to generate a Markov chain is used. The objective is to estimate the p values for the exact conditional test. For more details see Guo and Thompson (1992) and Lazzeroni and Lange (1997). Today it is possible to perform the exact conditional test (or at least an approximation of it) under any sample size.
The χ^{2} test is highly dependent on asymptotic results. In addition to being inefficient when the sample is insufficiently large, it may fail whenever there are categories (genotypes) with low expected frequencies. In the problem we consider, such genotypes are present in large numbers because of the inherent genetic structure, so it is not unusual to find an allele that appears only once or twice in a database, even when the database contains a considerable total number of points (≈5000).
Being a nonasymptotic test, the conditional exact test is, on the other hand, useful when dealing with small samples. It depends on a multinomial distribution and requires ranking all the possible samples with the same allele frequencies and the same sample size. This ranking, however, becomes problematic as the sample size grows. Another disadvantage of this test is that it imposes unrealistic probabilities on the data points, whose probabilities have already been determined by conditioning: recall that conditioning severely reduces the sample space, so the probabilities may be drastically increased.
Ayres and Balding (1998) propose a computational methodology to estimate the inbreeding coefficient, which allows one to measure the deviation from HWE under a model of inbreeding. Shoemaker et al. (1998) describe a Bayesian methodology to study the independence between pairs of alleles in a given locus; in this case, they consider the inbreeding coefficient and the disequilibrium coefficient (Hernandez and Weir 1989).
The problem of estimating the allelic frequencies, under the Bayesian perspective, was considered by Gunel and Wearden (1995). Chow and Fong (1992) studied this same problem but as a particular case of the simultaneous estimation of the related proportions.
The aim of this study is to develop an exact test on the basis of the comparison between weighted likelihoods (Dickey and Lientz 1970) under the null and alternative hypotheses. The ratio of these two functions is the Bayes factor (BF). A distribution of the BF under the null hypothesis defines a natural order in the sample space. Therefore the test is exact and unconditional and does not depend on asymptotic results. In addition, the test is desirable, in terms of decision theory, in that it minimizes a linear combination of type I and type II errors. The approach suggested by Barnard (1945) may be used to construct a nonBayesian test that is unconditional and exact. In fact it considers suprema in the place of weighted averages.
The weight used to calculate the weighted likelihoods under each of the hypotheses is based on the a priori preferences, which are derived from the choice of an a priori distribution over the parametric space. The dimension of the subspace defined by the hypothesis of HWE is smaller than that of the original parametric space. Therefore, to calculate the weighted likelihood under the null hypothesis, we use line integrals. In the same manner, type I and type II errors are weighted average errors (Irony and Pereira 1995). In other words, suppose α_{1} and β_{1} are the errors associated with the Neyman-Pearson test of the simple hypotheses H_{0}: θ = θ_{0} vs. H_{1}: θ = θ_{1}, where θ_{0} is a parameter point on the equilibrium curve and θ_{1} is outside the curve. Consider now the set of all such pairs. The weighted errors for the unconditional exact test are the weighted averages of these Neyman-Pearson errors.
Usually, a test of hypothesis consists of comparing the supremum of the likelihood over the subset of the parameter space corresponding to the null hypothesis with the supremum over the whole parameter space. The test presented in this article instead compares the averages of the likelihood over these sets; that is, instead of comparing suprema, the test compares averages. The most important property of the test is that it minimizes linear combinations of the two kinds of errors. For instance, if α and β are the errors of the first and second types, the test presented here minimizes α + β (DeGroot 1989). Thus, for a fixed α, it maximizes (1 − β) − α and consequently also the power of the test.
We define the significance level of the test by ordering the sample space using the Bayes factor. The BF, following Good (1983), is the ratio of the weighted likelihood under the null hypothesis to the weighted likelihood under the alternative hypothesis. To compare two sample points, s and t, we calculate the BF at each point; the order of the sample points follows the order of the BFs. After ordering the sample space, consider s as the sample observation. We define the P value as the sum of the weighted likelihood under the null hypothesis over the set of points smaller than or equal to s. The idea of this Bayesian significance level is not new: it was suggested by Kempthorne and Folks (1971). Note that such a significance level uses the whole sample space in its calculation and therefore may not follow the likelihood principle (Royall 1997). Hence our P value may not be considered a fully Bayesian procedure. A fully Bayesian significance test for equilibrium and for contingency tables can be found in Pereira and Stern (1999).
The significance level, the P value, takes the alternative hypothesis into account in its calculation, which controls the type II error. Recall that the Bayes factor is the ratio between the probabilities under the two hypotheses. Usually this is not considered for the standard p value, which can cause problems such as rejecting the null hypothesis even when, under the alternative hypothesis, the observed sample has a lower probability (Pereira and Wechsler 1993).
A program to compute our P value is available for the MatLab environment at the website http://www.ime.usp.br/~cpereira/signifpr.html.
A more complex test environment where two loci are considered simultaneously is presented by Devlin et al. (1996). A future useful project will be to construct a Bayesian alternative test for this situation and extend it for more than two loci.
METHODS
To exemplify the use of the test (see Examples), data from two STR loci were analyzed: D17S250 (Weber et al. 1990) and MYC (Polymeropoulos et al. 1992). Genomic DNA was obtained from unrelated, predominantly Caucasian individuals undergoing paternity testing nationwide by Genomic Engenharia Molecular Ltda. Alleles were amplified by PCR in the presence of [α-^{32}P]dCTP, separated on DNA sequencing gels by electrophoresis, and visualized by autoradiography. Allele sizing was done by running adjacent M13 sequence ladders.
Definition and presentation of the exact unconditional test
Consider a single autosomal biallelic locus, comprising alleles a_{1} and a_{2}, which does not undergo mutation. Let
In a panmictic population obeying Mendelian rules, equilibrium is attained in one reproductive generation and this assures the existence of a real number p ∈ (0, 1), such that the genotypic proportions satisfy the relations
Assuming that the sample elements are obtained independently, by using a Bernoulli multivariate process, prefixing the sample size n, and representing the data by d = (n_{1}, n_{2}, n_{3}), we have that the likelihood function is given by
Let us represent the researcher prior preferences by a Dirichlet density function (see Wilks 1968) with parameter vector (α_{1}, α_{2}, α_{3}), α_{i} > 0. That is, if Γ[·] represents the Gamma function,
Considering now (3) as the weighting system, the weighted likelihood average over Ω is given by
Also note that Ω_{0} is a line inside the simplex Ω and hence the weighted likelihood average over Ω_{0} is the ratio of two line integrals as follows:
Let us suppose that the a priori density over Ω is uniform. Thus, setting α_{i} = 1 for i = 1, 2, 3 in Equations 4 and 6, the exact values of the weighted likelihoods over Ω_{1} and Ω_{0} are given, respectively, by
Consequently if we assume that a priori the two hypotheses have equal probabilities, 0.5, we obtain
A test for the hypothesis of HWE consists of comparing BF [d] with unity. In this case we have a test that minimizes the sum of the average of the two types of errors.
Sometimes the exact calculation of Equation 8 is not feasible, so approximations are useful. An approximation to Equation 8 (using Taylor’s expansion) is given by
Note that, although we have no closed form for general Dirichlet priors, Equations 5 and 6 together with numerical integration allow us to compute the Bayes factor easily for any choice of the prior parameters. To compute the P value with the program mentioned above under a general Dirichlet prior, one needs only to adjust the data input: instead of inputting the vector (x_{1}, x_{2}, x_{3}), one must use (A_{1} − 1, A_{2} − 1, A_{3} − 1).
In this discussion we emphasize the use of uniform priors only for the purpose of a fair comparison with the alternative classical methods. Recall that with the uniform prior, the posterior is the normalized likelihood function. In the next section we provide a comparison of this proposed test for HWE to the conditional exact test.
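As a concrete numerical illustration of the uniform-prior case, the Bayes factor can be sketched as follows. This is our reading of the construction above, not the authors' program: the average likelihood under the alternative is the Dirichlet(1, 1, 1) marginal of the multinomial counts, which reduces in closed form to 2·n!/(n + 2)!, and the average likelihood under the null is taken as the ratio of two line integrals along the equilibrium curve θ(p) = (p², 2p(1 − p), (1 − p)²) with arc-length weighting, evaluated on a simple grid. The function name and grid scheme are ours.

```python
from math import factorial, sqrt

def bayes_factor(n1, n2, n3, grid=20000):
    """Sketch of the BF for HWE with a uniform prior: average likelihood
    on the equilibrium curve (f0, arc-length weighting) over the average
    likelihood on the whole simplex (f1)."""
    n = n1 + n2 + n3
    coef = factorial(n) // (factorial(n1) * factorial(n2) * factorial(n3))
    # f1: marginal likelihood of (n1, n2, n3) under a uniform Dirichlet
    # prior on (theta1, theta2, theta3); integrates to 2 * n! / (n + 2)!.
    f1 = 2 * factorial(n) / factorial(n + 2)
    # f0: ratio of two line integrals along theta(p) = (p^2, 2p(1-p), (1-p)^2).
    num = den = 0.0
    for k in range(1, grid):
        p = k / grid
        speed = 2 * sqrt(6 * p * p - 6 * p + 2)  # |theta'(p)|, the arc-length weight
        lik = coef * (2 ** n2) * p ** (2 * n1 + n2) * (1 - p) ** (n2 + 2 * n3)
        num += lik * speed / grid
        den += speed / grid
    f0 = num / den
    return f0 / f1
```

With equal prior probabilities 0.5 for the two hypotheses, a value below 1 favors rejection; for instance, the heterozygote-excess sample (1, 8, 1) of the next section yields a BF well below that of the near-equilibrium sample (2, 6, 2).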
Comparison between the unconditional and the conditional exact tests
In this section we compare the exact unconditional test proposed in the previous section to the traditional conditional exact test. Considering all possible samples of size n = 10, we calculate the P value for each of these samples. To distinguish the two different ways of computing the probabilities, we write p value for the conditional test and P value for the unconditional one (see Pereira and Wechsler 1993). Table 1 lists the BFs and the P values. Tables 2 and 3 show, respectively, the P values and p values, multiplied by 100, for the unconditional and conditional exact tests.
From Tables 2 and 3 it can be seen that the conditional exact test is more conservative than the unconditional exact test. The unconditional test is the one that minimizes the sum of the averages of the two types of errors. For a sample where (n_{1}, n_{2}, n_{3}) = (1, 8, 1), the unconditional exact test rejects the hypothesis of equilibrium (P = 0.036744 ≈ 0.04), while the conditional exact test does not reject it (p = 0.20). The p value of the conditional exact test can be obtained from the weighted likelihood ratios. For example, fix the sample size at n = 10 and suppose that the total number of observed copies of allele a_{1} is 9; that is, T = 2n_{1} + n_{2} = 9. To compute the p value of the conditional exact test, it is enough to consider all possible sample points for which T = 9. To determine the conditional probability of a sample point (n_{1}, n_{2}, n_{3}) given that 2n_{1} + n_{2} = 9, we have only to divide the BF obtained at this point by the sum of the BFs of all points having T = 9. With these probabilities calculated, we compute the p values in the usual manner, adding to the probability of each point the probabilities of all smaller ones. Table 4 illustrates this calculation. Note that column 4, which is equal to column 3, is the second column divided by its sum.
In the general case, we maintain the sample size at n and the total number of observed elements that have the allele a_{1} at T = t. Considering the statistic T defined by T = 2n_{1} + n_{2}, where n_{1} and n_{2} are random variables that denote the number of individuals observed in the sample who have the genotype (a_{1}, a_{1}) and (a_{1}, a_{2}), respectively, this means that for each d = (n_{1}, n_{2}, n_{3}) with 2n_{1} + n_{2} = t and
Note that the sample space for n = 10 has a total of 66 sample points, in contrast to the conditional test, where T = 10, which considers only a sample set of 6 points: (0, 10, 0), (1, 8, 1), (2, 6, 2), (3, 4, 3), (4, 2, 4), and (5, 0, 5).
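The conditional calculation illustrated in Table 4 can be mirrored directly with the classical conditional distribution of (n_{1}, n_{2}, n_{3}) given T, which is what the BF normalization above reproduces. The sketch below (function and variable names are ours) enumerates all sample points compatible with the observed T = 2n_{1} + n_{2}, computes their conditional probabilities, and sums the probabilities of every point no more probable than the observed one:

```python
from math import factorial

def conditional_p_value(n1, n2, n3):
    """Conditional exact test for HWE: condition on T = 2*n1 + n2 (the
    allele-a1 count, sufficient under H0) and sum the conditional
    probabilities of all points no more probable than the observed one."""
    n, t = n1 + n2 + n3, 2 * n1 + n2

    def prob(h1):
        # h1 homozygotes (a1,a1), h2 heterozygotes, h3 the rest, given T = t
        h2 = t - 2 * h1
        h3 = n - h1 - h2
        return (factorial(n) * factorial(t) * factorial(2 * n - t) * 2 ** h2
                / (factorial(2 * n) * factorial(h1) * factorial(h2) * factorial(h3)))

    # feasible homozygote counts: h2 >= 0 and h3 >= 0
    points = [prob(h1) for h1 in range(max(0, t - n), t // 2 + 1)]
    observed = prob(n1)
    return sum(q for q in points if q <= observed + 1e-12)
```

For the sample (1, 8, 1) discussed above, this reproduces the conditional p value of 0.20 reported in the text.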
Hierarchical sequential testing for multiple alleles
The ideal way to build the significance test for the multiallelic case would be to consider an ordering of the whole sample space. However, the dimensionality of the parameter space grows very quickly: a locus with 20 alleles, for instance, has 20 · 21/2 = 210 genotypic classes, so the parameter space has dimension 210, and the number of possible sample points increases drastically. Theoretically, substituting surface integrals for line integrals, we could proceed exactly as in the biallelic case, but at an extremely high computational cost. Next we present a sequential procedure that trades some precision for a large saving in cost.
Let us consider a single autosomal locus with multiple codominant alleles. Let (a_{i}, a_{j}) be the genotype referring to the alleles a_{i} and a_{j}, i = 1, · · ·, m; j = i, · · ·, m, and
Because the system is codominant, in a sample of size n, the frequencies of the elements in each genotypic class are n_{1}, n_{2}, · · ·, n_{m(m+1)/2}, with
Assuming that the sample elements are obtained independently, by using a Bernoulli (multivariate) process, we see that the likelihood function is proportional to
In a manner analogous to the case of the biallelic locus, the condition of equilibrium given in Equation 1 is characterized in the general case by Equation 9, because of the following statement: in a panmictic population that obeys Mendelian laws, equilibrium is attained in one reproductive generation and there are u_{1}, u_{2}, · · ·, u_{m} ∈ (0, 1) with
Although we can define the exact unconditional test using the BF (as in the case with two alleles), great difficulties arise when calculating the surface integrals in this case. Upon examining the population HWE determined by Equation 9, we can prove that the following statement, a property of the multinomial distribution, is true. This comprises the basis of the procedure that we propose to test the hypothesis of equilibrium in a situation of multiple alleles.
Statement: Let a_{1}, a_{2}, · · ·, a_{m} be the alleles under consideration. If, for every i_{1} = 1, 2, · · ·, m, the condition of HWE given in Equation 1 is satisfied by the biallelic system a_{i_{1}}, A_{i_{1}}, where
Thus if a system with m alleles, A_{0} = {a_{1}, a_{2}, · · ·, a_{m}}, obeys the law of HWE, this law is obeyed by “any system” obtained by partitioning A_{0} into at least two nonempty subsets and considering each element of this partition as “an allele.”
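This collapsing property is easy to check numerically. The short sketch below (our own illustration, with made-up allele frequencies) builds HWE genotype proportions for four alleles, pools alleles a_{2}, a_{3}, a_{4} into a single pseudo-allele, and confirms that the collapsed biallelic system again satisfies the equilibrium relations:

```python
# Illustrative check of the collapsing property (frequencies are made up).
m = 4
u = [0.1, 0.2, 0.3, 0.4]                        # allele frequencies, summing to 1
# Under HWE: P(a_i, a_i) = u_i^2 and P(a_i, a_j) = 2 u_i u_j for i < j.
geno = {(i, j): (u[i] ** 2 if i == j else 2 * u[i] * u[j])
        for i in range(m) for j in range(i, m)}
# Collapse {a_2, a_3, a_4} into one pseudo-allele, the complement of a_1.
p = u[0]
f11 = geno[(0, 0)]                              # genotype (a_1, a_1)
f12 = sum(geno[(0, j)] for j in range(1, m))    # a_1 with the pooled allele
f22 = 1.0 - f11 - f12                           # pooled allele homozygotes
assert abs(f11 - p ** 2) < 1e-12                # p^2
assert abs(f12 - 2 * p * (1 - p)) < 1e-12       # 2p(1-p)
assert abs(f22 - (1 - p) ** 2) < 1e-12          # (1-p)^2
```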
The idea behind the HWE test for the multiallelic case is based on the chi-square partition in contingency tables (see Everitt 1977, for instance). Consider a data bank, sample S, for a specific locus, and suppose there are m (a positive integer) different alleles, a_{1}, a_{2}, · · ·, a_{m}. Testing in order from the smallest allele frequency to the highest has the objective of working, at each step, with the biggest possible sample and hence with the smallest possible errors.
The sequential procedure, to test the hypothesis of HWE, is as follows.
Procedure:
Step 1: Without loss of generality, call a_{1} the least frequent allele in sample S.
Step 2: Divide the sample S into three mutually exclusive sets:
S_{11}, all individuals with genotype (a_{1}, a_{1});
S_{1}., all individuals with genotype (a_{1}, a_{i}), i ≠ 1; and
S.., all individuals not having the allele a_{1}.
Step 3: Apply the unconditional exact test for the biallelic case in the partition (S_{11}, S_{1}., S..). If HWE is rejected stop and declare the population to be in disequilibrium. If HWE is accepted go to step 4.
Step 4: If S.. is composed of elements with only one allele involved, stop and declare the population to be in equilibrium. If more than one allele is involved in the elements of S.., rename S.. as S and go to step 1.
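The four steps above can be sketched as a loop. The code below is our reading of the Procedure, not the authors' implementation: the biallelic test is left as an injected callable (for instance, the BF-based test of the previous sections), since any biallelic test fits in step 3, and the rejection threshold `alpha` is an illustrative parameter.

```python
from collections import Counter

def sequential_hwe(genotype_counts, biallelic_test, alpha=0.05):
    """Hierarchical sequential HWE procedure (steps 1-4 of the text).

    genotype_counts: {(i, j): count} with i <= j.
    biallelic_test(n1, n2, n3): returns a significance value for the
    collapsed biallelic system (an injected, user-chosen test).
    """
    counts = dict(genotype_counts)
    results = []
    while True:
        freq = Counter()                       # allele counts in current S
        for (i, j), c in counts.items():
            freq[i] += c
            freq[j] += c                       # (i, i) correctly adds 2c
        if len(freq) < 2:                      # step 4: one allele left
            return results, "equilibrium"
        rare = min(freq, key=freq.get)         # step 1: least frequent allele
        # Step 2: partition S into S_11, S_1., S..
        n1 = counts.get((rare, rare), 0)
        n2 = sum(c for (i, j), c in counts.items()
                 if (i == rare) != (j == rare))
        n3 = sum(c for (i, j), c in counts.items()
                 if i != rare and j != rare)
        # Step 3: biallelic test on the partition
        p = biallelic_test(n1, n2, n3)
        results.append((rare, p))
        if p < alpha:
            return results, "disequilibrium"
        # Step 4: drop all carriers of the rare allele and continue
        counts = {g: c for g, c in counts.items() if rare not in g}
```

With m alleles the procedure runs at most m − 1 biallelic tests, stopping early the first time equilibrium is rejected.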
Examples
In the examples we illustrate the sequence in which the tests for equilibrium were performed and present the Bayes factors with their respective P values, together with the p values of the corresponding χ^{2} tests.
Let m_{a} be the number of alleles present in the locus being studied. For each i = 1, · · ·, m_{a}  1, we define
a_{i}, the allele chosen to carry out the ith test;
(a_{i})^{c}, the set made up of the remaining alleles to carry out the ith test;
n_{i1}, the number of genotypes (a_{i}, a_{i});
n_{i2}, the number of genotypes (a_{i}, (a_{i})^{c});
n_{i3}, the number of genotypes ((a_{i})^{c}, (a_{i})^{c}); and
BF_{i}, the Bayes factor corresponding to the ith test.
Example 1: The following data were obtained from the STR locus MYC in which 19 alleles are observed and n_{a} = 5714 (Table 5).
In this case, therefore, the hypothesis of HWE should be rejected.
Example 2: Here the data were obtained from the STR locus D17S250 in which 21 alleles are seen and n_{a} = 5592 (Table 6).
The hypothesis is not rejected in this example.
DISCUSSION
A Bayesian test of hypothesis was presented in this article. However, its evaluation and comparison with the alternative conditional test are done under a classical perspective. To start this discussion we recall that the generalized Neyman-Pearson (GNP) test is an optimal test under both perspectives, classical and Bayesian (DeGroot 1989, Section 8.2). In the GNP situation, one compares two probability (density) functions: f_{0}, the probability function under the null hypothesis, against f_{1}, the one under the alternative hypothesis. Having chosen a constant k > 0, if f_{0} > (≥) kf_{1}, then the null hypothesis is accepted; on the other hand, if f_{0} ≤ (<) kf_{1}, then the null hypothesis is rejected. Note that, in fact, we compare the values of the two likelihoods at the sample point that effectively occurred in order to make the decision. If α and β are, respectively, the probabilities of the two kinds of errors, the GNP test is the one that minimizes the linear combination α + kβ. With k adequately chosen as a function of losses and prior probabilities, this linear combination is the minimum expected loss, which makes the test optimal under the Bayesian perspective as well. Press (1989) presents a complete description of the Bayesian method.
The HWE case is different in that both hypotheses are composite; that is, each hypothesis is represented by a set of probability functions. One difficulty is that the two sets have different dimensions: the alternative hypothesis is bidimensional, whereas the null hypothesis is unidimensional. The idea of the test, from the classical point of view, is to define as the two probability functions, f_{0} and f_{1}, the averages of the likelihood over the parametric sets defined by the null and the alternative hypotheses, respectively. Having two probability functions, we apply the GNP procedure to define the critical region. To compute the P value we have to order the sample space. We say that a sample point x_{i} is higher than x_{j}, denoted x_{j} ≤ x_{i}, if f_{0}(x_{i})/f_{1}(x_{i}) > f_{0}(x_{j})/f_{1}(x_{j}). To define the P value (not p value) we consider the sum of f_{0}(x_{j}) over all sample points x_{j} ≤ x_{o}, where x_{o} is the sample point effectively observed. Because we cover the whole sample space, we have called the test an unconditional exact test.
The unconditional test stands in contrast to the test that takes as the likelihood under the null hypothesis the conditional probability function given the observed value of T = 2n_{1} + n_{2}, a sufficient statistic under the null hypothesis (Haldane 1954; Chapco 1976; Elston and Forthofer 1977). To compute the p value in that case one looks only at the sample points with the same value of 2n_{1} + n_{2} as the observed one, which is the reason for calling it a conditional test. Geisser and Johnson (1992) presented an unconditional test based on quantiles; however, it seems not to be appropriate, as discussed by Devlin et al. (1993a,b). Cannings and Edwards (1969) presented, without conditioning, a way to estimate deviation from HWE; however, they did not discuss hypothesis testing.
Turning to our procedure from the Bayesian point of view, where a posterior density is defined over the parametric space, one could say that the test is fully conditional. The reason is obvious: we compute a conditional density for the parameter given the observed sample point. Considering the uniform prior on the parametric space and 0.5 as the prior probability that the null hypothesis is true, the ratio of the average likelihoods, as presented, is the posterior odds, which, compared with a chosen k, defines the testing procedure presented. This is a test that minimizes
As far as we know, Pereira and Rogatko (1984) is the first Bayesian article on testing the HWE, although Altham (1971) had earlier presented Bayesian estimation of a parameter that can be used to evaluate the HWE; that article does not mention the HWE because it is set in a different context. Pereira and Rogatko (1984) defined the likelihoods in an ad hoc way that could not be properly supported. They also presented credible sets that could be used to test HWE, with the credibility value used to define the size of the type I error.
Lindley (1988) considers Bayesian estimation of equilibrium parameters in the case of two alleles. The parameters studied come from an alternative parameterization that, on the one hand, allows the use of Gaussian priors but, on the other, complicates interpretation when the a priori distribution is assessed.
The hierarchical sequential procedure described in this article is based on intuition. Recall that multinomial likelihoods can be factorized using partitions on the set of categories. Also, the HWE is a special kind of association among alleles at a specific locus. Whenever we conclude that a specific allele is in HWE association with all the others, we believe we do not have to use it again when testing the remaining alleles. It could be argued that, by using this procedure, the probability of rejecting HWE may increase as alleles are being eliminated. However, since the sample size is decreasing, the power of the tests will decrease. Hence, it is reasonable to believe that there is a compensation and that the procedure will do the job fairly. Note also that the sequence order depends on the rarity of the alleles in such a way that the sample size reduction occurs as slowly as possible.
Today, the use of Bayesian ideas in genetics is a reality. More recently, an interesting article by Vieland (1998) suggested that the future of genetic data analysis is strongly related to the Bayesian paradigm.
Acknowledgments
The views expressed are those of the authors and not necessarily those of the FDA.
Footnotes

Communicating editor: G. A. Churchill
 Received September 17, 1999.
 Accepted March 1, 2001.
 Copyright © 2001 by the Genetics Society of America