# A Clarification of the Hardy–Weinberg Law

- Alan E. Stark^{1}

- ^{1} *Address for correspondence:* 3/20 Seaview St., Balgowlah, NSW, Australia 2093. E-mail: ae_stark{at}ihug.com.au

## Abstract

C. C. Li showed that Hardy–Weinberg proportions (HWP) can be maintained in a large population by nonrandom mating as well as random mating. In particular he gave the mating matrix for the symmetric case in the most general form possible. Thus Li showed that, once HWP are attained, the same proportions can be maintained by what he called pseudorandom mating. This article shows that, starting from any genotypic distribution at a single locus with two alleles, the same in each sex, HWP can be reached in one round of nonrandom mating with no change in allele frequency. In the model that demonstrates this fact, random mating is represented by a single point in a continuum of nonrandom possibilities.

Li (1988) made a valuable contribution when he showed that Hardy–Weinberg (HW) frequencies can be maintained in large populations with nonrandom mating (Hardy 1908; Weinberg 1908, 1963). However, textbooks that invoke random mating as a justification for use of HW frequencies in population models and, in addition, maintain that random mating is a necessary condition, continue to be published. For example, Holsinger (2001, p. 914) states “… the Hardy-Weinberg Law provides a way to estimate allele frequencies … provided we are willing to assume that all of the assumptions apply to the population in which we are interested.” The important point to note here is that one of Holsinger's assumptions is that individuals “choose mates at random.”

Random mating is sometimes described as pairing in ignorance of the genotypes of potential mates. But even lack of such knowledge does not ensure pairings with frequencies satisfying the formal criterion of randomness. Moreover, it is claimed frequently that assortative mating is incompatible with HW frequencies. But this is not generally true. Random mating is not a realistic assumption for human population models and therefore it should be invoked only with careful qualification based on correct mathematical foundations. Because it is simple to describe, resort to the assumption of random mating has been too facile. But this does not justify presentations of the most basic concept of population genetics that contain dubious and even formally incorrect notions.

Li (1988, p. 736) wrote that “An infinite number of patterns of deviations from random mating exists for autosomal loci that would make a population pseudo-random mating. This could be a contributing factor to the robustness of the Hardy-Weinberg law … makes the study of the mating pattern a worthwhile subject.”

The next section gives the notation together with a brief outline of Li's model, for the reader's convenience. The final section gives the model that yields HWP from an arbitrary genotypic distribution in one round of nonrandom mating.

## LI'S MODEL

Consider a population with respect to a single locus having alleles *A* and *B* with respective frequencies *q* and *p*, the same in males and females. Denote frequencies of genotypes *AA*, *AB*, and *BB* by *f*_{0}, *f*_{1}, and *f*_{2}. Table 1 gives Li's symmetrical mating model, which he introduces with the remark: “…When reciprocal crosses have the same frequency, the general pattern will be symmetrical” (Li 1988, p. 733). Thus the roles of males and females can be reversed without changing the model. This case is simpler than Li's more general model but is suitable for the present purpose. The 3 × 3 matrix of cell frequencies is denoted by [*f*_{ij}], *i* = 0, 1, 2; *j* = 0, 1, 2.

Both row totals and column totals give Hardy–Weinberg proportions: *f*_{0} = *q*^{2}, *f*_{1} = 2*pq*, *f*_{2} = *p*^{2}. Thus the parental population is in HW form and it is simple to show, under the usual assumptions, that the distribution of genotypes among offspring is the same. Note that *f*_{11} = 4*f*_{02}.
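The role of the condition *f*_{11} = 4*f*_{02} can be seen in a short derivation. This is a sketch that assumes Mendelian segregation and no selection, and it uses only the symmetry of [*f*_{ij}] and its HW row totals; the primed symbols for offspring frequencies are notation introduced here, not Li's.

$$
\begin{aligned}
f_0' &= f_{00} + \tfrac{1}{2}\,(f_{01} + f_{10}) + \tfrac{1}{4}\,f_{11}
      = f_{00} + f_{01} + \tfrac{1}{4}\,f_{11} \\
     &= (q^2 - f_{02}) + \tfrac{1}{4}\,f_{11}
      \qquad \text{since } f_{00} + f_{01} + f_{02} = q^2 , \\
f_2' &= f_{22} + f_{12} + \tfrac{1}{4}\,f_{11}
      = (p^2 - f_{02}) + \tfrac{1}{4}\,f_{11} .
\end{aligned}
$$

Hence *f*_{0}′ = *q*^{2} and *f*_{2}′ = *p*^{2} exactly when *f*_{11} = 4*f*_{02}, and then *f*_{1}′ = 1 − *q*^{2} − *p*^{2} = 2*pq*.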

There are four parameters or constraints in Li's model, taking up the 4 d.f. in [*f*_{ij}]: these are *q*, Sewall Wright's fixation index *F*, here taking the value zero, *a*, and *b*. Parameters *a* and *b* are constrained by the requirement that the elements of [*f*_{ij}] be nonnegative. In Li's model, random mating is defined by the pair of conditions *a* = 0 and *b* = 0.

A compact formula giving a range of nonrandom mating tables that reproduce HWP is given by Stark (2005). This was adapted from a more general formula that was used by Stark (1980) to classify systems of partial inbreeding. Other relevant sources are Stark (1977) and A. E. Stark (unpublished results).

## HWP FROM AN ARBITRARY DISTRIBUTION WITH NONRANDOM MATING

Here it is demonstrated that HW proportions can be obtained in one round of nonrandom, as well as random, mating from any genotypic distribution shared by males and females. Suppose that, in generation *t*, the population has gene frequencies *q*_{t} and *p*_{t} and genotypic frequencies *f*_{0}(*t*) = *q*_{t}^{2} + *F*_{t}*p*_{t}*q*_{t}, *f*_{1}(*t*) = 2*p*_{t}*q*_{t}(1 − *F*_{t}), and *f*_{2}(*t*) = *p*_{t}^{2} + *F*_{t}*p*_{t}*q*_{t}. The possible values of *F*_{t} are constrained by *q*_{t} to ensure that {*f*_{0}(*t*), *f*_{1}(*t*), *f*_{2}(*t*)} is a valid set of genotypic frequencies, but generally *F*_{t} is in the interval (−1, 1). Without loss of generality take *q*_{t} in the interval 0 < *q*_{t} ≤ 1/2.

Consider a mating system in which the frequency of *i* × *j* couples in generation *t* giving rise to offspring in generation *t* + 1 is

$$
f_{ij}(t) = f_i(t)\,f_j(t)\left[\,1 + \mu\,\frac{d_i(t)\,d_j(t)}{S_t} + \nu\,\frac{e_i(t)\,e_j(t)}{T_t}\right], \tag{1}
$$

where the entries in (1) are defined as follows.

Equation 1 was derived from Fisher's identity, whose properties and background references are set out fully in Lancaster (1969, p. 90). The set of values {*d*_{0}(*t*), *d*_{1}(*t*), *d*_{2}(*t*)}, denoted by **d**(*t*), was constructed by first assigning values 0, 1, and 2, respectively, to genotypes *AA*, *AB*, and *BB* and then correcting by deducting the mean with respect to the distribution of genotypic frequencies, which is 2*p*_{t}; that is, *d*_{i}(*t*) = *i* − 2*p*_{t}. Thus the mean of **d**(*t*), defined by Σ_{i} *f*_{i}(*t*)*d*_{i}(*t*), is zero and the variance *S*_{t} of **d**(*t*) is calculated from Σ_{i} *f*_{i}(*t*)*d*_{i}(*t*)^{2}. The set of values {*e*_{0}(*t*), *e*_{1}(*t*), *e*_{2}(*t*)} is denoted by **e**(*t*). Since the mean of **e**(*t*), defined by the expression Σ_{i} *f*_{i}(*t*)*e*_{i}(*t*), is zero, the variance of **e**(*t*), denoted *T*_{t}, is calculated from Σ_{i} *f*_{i}(*t*)*e*_{i}(*t*)^{2}. In (1) the quantity *e*_{i}(*t*)*e*_{j}(*t*)/*T*_{t} is the product of a pair of standardized elements, that is, elements whose mean is zero and standard deviation is unity.

This article requires a special case of Equation 1, namely the case when μ = 0. Then (1) reduces to

$$
f_{ij}(t) = f_i(t)\,f_j(t)\left[\,1 + \nu\,\frac{e_i(t)\,e_j(t)}{T_t}\right]. \tag{2}
$$

Although Equation 1 is not used in its entirety in what follows, (1) is included because otherwise (2) gives little insight to the reader as to its origins. Fisher's identity was conceived originally as a means of expressing a bivariate distribution in canonical form to explore its properties. For a 3 × 3 matrix such as that in Table 1 this entails finding two sets of variable values each with mean zero and standard deviation unity. In this article the procedure is reversed by starting from a canonical form, such as (1), but giving it desired properties. It turns out that **d**(*t*) as defined above, combined with setting the correlation coefficient μ to be zero, generates HWP in the offspring. Other values of μ produce, in time, non-HWP. Straightforward calculation shows that the distribution of genotypes in offspring of generation *t* + 1 obtained from (2) is

$$
f_0(t+1) = q_t^{2}, \qquad f_1(t+1) = 2\,p_t\,q_t, \qquad f_2(t+1) = p_t^{2},
$$

that is, the offspring are distributed in HWP.
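The "straightforward calculation" can also be checked numerically. The following is a minimal sketch, not the author's code: it assumes that (2) has the product form *f*_{ij}(*t*) = *f*_{i}(*t*)*f*_{j}(*t*)[1 + ν*e*_{i}(*t*)*e*_{j}(*t*)/*T*_{t}], and it takes **e**(*t*) to be any vector with zero mean that is uncorrelated with **d**(*t*) under the genotypic frequencies. The particular scaling *e*_{0} = −1 and the function names `genotype_freqs`, `e_vector`, `mating_matrix`, and `offspring_freqs` are assumptions made for illustration; only the ratio *e*_{i}*e*_{j}/*T*_{t} enters (2), so the scaling does not affect the result.

```python
import numpy as np

def genotype_freqs(q, F):
    """Frequencies of AA, AB, BB given freq(A) = q and fixation index F."""
    p = 1.0 - q
    f = np.array([q * q + F * p * q, 2 * p * q * (1 - F), p * p + F * p * q])
    if np.any(f <= 0):
        raise ValueError("q and F must give strictly positive genotype frequencies")
    return f

def e_vector(f):
    # ASSUMPTION: one convenient scaling (e0 = -1) of the vector that has
    # zero mean and zero covariance with d under the weights f.
    return np.array([-1.0, 2 * f[0] / f[1], -f[0] / f[2]])

def mating_matrix(f, nu):
    """Couple frequencies f_ij = f_i f_j [1 + nu e_i e_j / T] (assumed form of (2))."""
    e = e_vector(f)
    T = np.sum(f * e**2)  # variance of e, since its mean is zero
    return np.outer(f, f) * (1.0 + nu * np.outer(e, e) / T)

def offspring_freqs(M):
    """Offspring genotype frequencies under Mendelian segregation."""
    a = np.array([1.0, 0.5, 0.0])  # chance that AA, AB, BB transmits allele A
    b = 1.0 - a
    return np.array([a @ M @ a, a @ M @ b + b @ M @ a, b @ M @ b])

# Example: a population out of HWP (F = 0.2) with a nonrandom mating pattern (nu = 0.1).
q, F, nu = 0.3, 0.2, 0.1
f = genotype_freqs(q, F)
M = mating_matrix(f, nu)
print("row totals:", M.sum(axis=1))    # should reproduce the parental frequencies f
print("offspring :", offspring_freqs(M))
```

With these inputs the row totals reproduce the parental genotypic frequencies and the offspring frequencies equal (*q*^{2}, 2*pq*, *p*^{2}); setting `nu = 0.0` gives the random-mating table and the same offspring distribution.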

The admissible values of ν in (2) are obtained by setting various elements of the mating matrix to zero. Given *q*_{t} and −*q*_{t}/*p*_{t} < *F*_{t} < 1, the upper limit is obtained as

$$
\nu = -\frac{T_t}{e_0(t)\,e_1(t)}
$$

when *f*_{01}(*t*) = 0.
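To see where limits of this kind come from, it may help to solve a single cell for ν. The sketch below assumes that (2) has the product form *f*_{ij}(*t*) = *f*_{i}(*t*)*f*_{j}(*t*)[1 + ν*e*_{i}(*t*)*e*_{j}(*t*)/*T*_{t}] with every *f*_{i}(*t*) > 0; that form is a reconstruction from the description of (1), not a quotation.

$$
f_{ij}(t) = 0
\;\Longleftrightarrow\;
1 + \nu\,\frac{e_i(t)\,e_j(t)}{T_t} = 0
\;\Longleftrightarrow\;
\nu = -\frac{T_t}{e_i(t)\,e_j(t)} .
$$

Under this assumption, cells with *e*_{i}(*t*)*e*_{j}(*t*) < 0 bound ν from above and cells with *e*_{i}(*t*)*e*_{j}(*t*) > 0 bound it from below, so the admissible interval is fixed by the tightest bound of each kind.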

The lower limit is set in two intervals: given *q*_{t} and −*q*_{t}/*p*_{t} < *F*_{t} ≤ (*p*_{t} − *q*_{t})/(2*p*_{t}), the left lower limit is obtained as

$$
\nu_{ll} = -\frac{T_t}{e_0(t)^{2}}
$$

when *f*_{00}(*t*) = 0; and given *q*_{t} and (*p*_{t} − *q*_{t})/(2*p*_{t}) ≤ *F*_{t} < 1, the right lower limit is obtained as

$$
\nu_{lr} = -\frac{T_t}{e_1(t)^{2}}
$$

when *f*_{11}(*t*) = 0.

As an example of (2), when the lower limits meet, that is, when *f*_{00}(*t*) = 0 and *f*_{11}(*t*) = 0, *F*_{t} = (*p*_{t} − *q*_{t})/(2*p*_{t}) (equivalently, *f*_{1}(*t*) = 2*f*_{0}(*t*) = *q*_{t}), *e*_{0}(*t*) = −1, *e*_{1}(*t*) = 1, *e*_{2}(*t*) = −*q*_{t}/(2 − 3*q*_{t}), *T*_{t} = *q*_{t}(3 − 4*q*_{t})/(2 − 3*q*_{t}), and ν_{lr} = −*T*_{t}, the mating matrix is as given in Table 2.

In summary, given any gene frequency *q* and any admissible *F*, a value of ν can be chosen from an interval governed by *q* and *F*, according to the limits given above, to produce HWP in one generation. Random mating is defined by the single value ν = 0 contained in this interval. Thus Li's contribution and this one completely remove the necessity of random mating as a requirement for the establishment and maintenance of HWP. Hardy and Weinberg showed that random mating was a sufficient condition and this is the concept that has taken hold in the genetics community for almost 100 years.
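The interval of admissible ν can be found numerically by requiring every cell of the mating matrix to be nonnegative. The sketch below is illustrative only: it assumes the product form *f*_{ij}(*t*) = *f*_{i}(*t*)*f*_{j}(*t*)[1 + ν*e*_{i}(*t*)*e*_{j}(*t*)/*T*_{t}] for (2) and the scaling *e*_{0} = −1, and the names `genotype_freqs`, `e_vector`, and `nu_interval` are hypothetical. Rather than hard-coding which cell binds, it scans all nine cells, and it then evaluates the point where *f*_{1}(*t*) = 2*f*_{0}(*t*), at which the two lower limits coincide.

```python
import numpy as np

def genotype_freqs(q, F):
    """Frequencies of AA, AB, BB given freq(A) = q and fixation index F."""
    p = 1.0 - q
    return np.array([q * q + F * p * q, 2 * p * q * (1 - F), p * p + F * p * q])

def e_vector(f):
    # ASSUMPTION: scaling e0 = -1; e has zero mean and zero covariance with d.
    return np.array([-1.0, 2 * f[0] / f[1], -f[0] / f[2]])

def nu_interval(q, F):
    """Return (lower, upper, T): the range of nu keeping every f_ij >= 0."""
    f = genotype_freqs(q, F)
    e = e_vector(f)
    T = float(np.sum(f * e**2))
    prod = np.outer(e, e)
    # A cell with e_i e_j > 0 vanishes at a negative nu (a lower bound);
    # a cell with e_i e_j < 0 vanishes at a positive nu (an upper bound).
    lower = max(-T / x for x in prod[prod > 0])
    upper = min(-T / x for x in prod[prod < 0])
    return lower, upper, T

# Point where the lower limits meet: F chosen so that f1 = 2 * f0.
q = 0.4
p = 1.0 - q
F_meet = (p - q) / (2 * p)
lower, upper, T = nu_interval(q, F_meet)
print("F at meeting point:", F_meet)
print("T:", T, " closed form:", q * (3 - 4 * q) / (2 - 3 * q))
print("admissible nu:", (lower, upper))

# A population already in HWP (F = 0): nu = 0 (random mating) lies inside the interval.
print("interval for F = 0:", nu_interval(q, 0.0)[:2])
```

At the meeting point the computed *T*_{t} agrees with *q*_{t}(3 − 4*q*_{t})/(2 − 3*q*_{t}) and the lower limit equals −*T*_{t}; in both cases shown, ν = 0 falls strictly inside the interval.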

## Acknowledgments

I thank Francisco M. Salzano and Paulo A. Otto for their encouragement.

## Footnotes

Communicating editor: M. W. Feldman

- Received February 12, 2006.
- Accepted May 5, 2006.

- Copyright © 2006 by the Genetics Society of America