Abstract
While nonrandom associations between zygotes at different loci (zygotic associations) frequently occur in HardyWeinberg disequilibrium populations, statistical analysis of such associations has received little attention. In this article, we describe the joint distributions of zygotes at multiple loci, which are completely characterized by heterozygosities at individual loci and various multilocus zygotic associations. These zygotic associations are defined in the same fashion as the usual multilocus linkage (gametic) disequilibria on the basis of gametic and allelic frequencies. The estimation and test procedures are described with details being given for three loci. The sampling properties of the estimates are examined through Monte Carlo simulation. The estimates of threelocus associations are not free of bias due to the presence of twolocus associations and vice versa. The power of detecting the zygotic associations is small unless different loci are strongly associated and/or sample sizes are large (>100). The analysis of zygotic associations not only offers an effective means of packaging numerous genic disequilibria required for a complete characterization of multilocus structure, but also provides opportunities for making inference about evolutionary and demographic processes through a comparative assessment of zygotic association vs. gametic disequilibrium for the same set of loci in nonequilibrium populations.
MULTILOCUS associations are most commonly studied at the gametic level. In this case, linkage disequilibrium or more appropriately gametic disequilibrium can be used to sufficiently describe the nonrandom associations of alleles at different loci ordered within gametes (Bennett 1954; Weir 1996). The evidence of gametic disequilibrium is important in inferring about the history of a population, the evolutionary forces governing these loci, and the location of the loci on the chromosomes. This approach to studying multilocus associations is appropriate for a haploid population where different gametes can be counted directly or for a HardyWeinberg equilibrium population where gametic frequencies can be inferred from genotypic (zygotic) frequencies. However, natural populations are rarely at equilibrium because of many disturbing forces such as inbreeding, population structure, and selection. In a nonequilibrium population, a complete characterization of multilocus associations requires gametic and many other genic disequilibria (Cockerham and Weir 1973; Weir 1979). Even with a moderate number of loci each with a few alleles, the number of genic disequilibria to be characterized and estimated can quickly increase beyond comprehension. Thus, with a large number of loci each with many alleles, it is necessary to have a single measure that is similar to gametic disequilibrium, but at zygote level.
Recently, Yang (2000) described characterization and estimation of such a measure for a pair of loci, which is called zygotic association. According to Yang (2000), the zygotic association is simply the deviation of twolocus zygotic frequencies from products of singlelocus zygotic frequencies, but is composed of all nonallelic genic disequilibria at the two loci. Thus, in experimental population genetic studies, the zygotic association can be estimated directly by comparing the two and singlelocus zygotic frequencies observed in a sample of diploid individuals. Haldane (1949) was probably the first to recognize that the zygotic association can be generated as a result of partial inbreeding even in a linkage (gametic) equilibrium population. Subsequent studies have shown that such zygotic associations may arise from mixed selfing random mating (Bennett and Binet 1956; Allardet al. 1968; Weir and Cockerham 1973), associative overdominance (Ohta and Cockerham 1974; Charlesworth 1991), admixture of two or more distinct gene pools (Barton and Gale 1993), or heterotic selection (Mitton 1997). Thus, knowledge of extent and patterns of zygotic associations at two or more loci is essential for inferring about evolutionary and demographic processes. However, while there is substantial literature on gametic disequilibria at three or more loci (e.g., Bennett 1954; Brown 1975; Hill 1975; Thomson and Baur 1984; Barton 2000), equivalent development for multilocus zygotic associations is not yet available. In this article, we first describe the joint distributions of zygotes at multiple loci and their relationships with heterozygosities and zygotic associations. We then describe statistical procedures of estimating and testing multilocus zygotic associations from a sample of diploid individuals, with the details being given for the case of three loci. The sampling properties of the estimates are examined by computer simulation.
THEORY AND ANALYSIS
Consider a diploid population in which individual genotypes are known at m loci (e.g., codominant phenotypic markers such as the MN blood groups, allozymes, and microsatellites). Then, a genotype at a particular locus can be unambiguously recognized as a homozygote or heterozygote, depending on whether or not the two alleles at the locus are the same. Just as frequencies of genes or gametes at one or more loci are needed for defining and characterizing gametic disequilibria, frequencies of zygotes at one or more loci are required for defining and characterizing multilocus zygotic associations. These zygotic frequencies and their relationships with heterozygosities and multilocus zygotic associations are described below.
One locus: At a given locus, say locus j, the probability of an individual genotype being heterozygous or homozygous is defined as
Two loci: When two loci, say loci j and l, are considered, the joint distribution of indicators, X_{j} and X_{l}, is
To see how the zygotic association (ω_{jl}) is related to different genic disequilibria including gametic disequilibrium, it is necessary to first identify the relationships between joint frequencies of homozygotes and heterozygotes in (2) and genotypic frequencies,
Three loci: When three or more loci are considered jointly, two alternative approaches can be used to describe zygotic associations at these loci. The first is Bartlett’s (1935) multiplicative approach based on the multiway contingency table. In the case of three loci, the absence of threelocus zygotic association but the presence of all three pairwise associations implies that
Unlike the twolocus associations (e.g., ω_{jl}) where the minimum is always zero for both ω_{jl} > 0 and ω_{jl} < 0 (cf. Equation 3), the threelocus zygotic associations may sometimes be bounded away from zero. In other words, both positive and negative ω_{jlo} values may be constrained by their own minimum and maximum values. Let us first define two quantities,
Table 1 shows some numerical examples to illustrate effects of heterozygosities, which are pairwise zygotic associations on the ranges of the threelocus association (ω_{jlo}). For example, with H_{j} = H_{l} = 0.05 and H_{o} = 0.1, the ranges for ω_{jl}, ω_{jo}, and ω_{lo} are 0.0025 ≤ω_{jl} ≤ 0.0475, 0.005 ≤ω_{jo} ≤ 0.045, and 0.005 ≤ω_{lo} ≤ 0.045, respectively. Each of these three ranges is divided by 19 to obtain 20 equally divided values from the minimum to the maximum. Thus, there are 8000 (20 ω_{jl}’s × 20 ω_{jo}’s × 20 ω_{lo}’s) configurations of the twolocus zygotic associations that can be used to define the ranges for ω_{jlo}. Of these 8000 configurations, 240 have the ranges with ω_{jlo} < 0, 1496 have the ranges with ω_{jlo} > 0, and 2158 have the ranges of
In the absence of twolocus gametic disequilibria (i.e., D_{jl} = D_{jo} = D_{lo} = 0), the expressions of zygotic frequencies in f are greatly simplified. For example, the frequency of homozygotes at all three loci, j, l, and o, is given by
The combined effect of two and threelocus gametic disequilibria on the values of ω_{jlo} is also examined (numerical results are not presented). The joint contribution of two and threelocus gametic disequilibria to ω_{jlo} greatly cloaks their relationships with ω_{jlo}. However, there are clearly cases where the ω_{jlo} values exceed the limits of ω_{jlo} under the cases of no pairwise disequilibria. For example, for p_{J} = p_{L} = p_{O} = 0.5, the ranges for D_{jl}, D_{jo}, and D_{lo} are all from 0.25 to 0.25, but permissible values of D_{jlo} are determined by different combinations of these pairwise disequilibria with the given gene frequencies (Thomson and Baur 1984). It is found that when D_{jl} = D_{jo} = D_{lo} = D_{jlo} =0.25, ω_{jlo} is 0.5, which exceeds the limit of 0.125 in the case of no twolocus disequilibria (D_{jl} = D_{jo} = D_{lo} = 0).
More than three loci: The extension to four or more loci following Bennett (1954) is straightforward. For example, the joint distribution of indicators X_{j}, X_{l}, X_{o}, and X_{q} for loci j, l, o, and q, respectively, is given by
STATISTICAL INFERENCE
Maximumlikelihood estimation: For m loci, there are 2^{m} possible classes of zygotes with two extreme classes being mlocus homozygotes (00 · · · 0) and mlocus heterozygotes (11 · · · 1). A total of 2^{m}  1 parameters can be estimated. Here we focus on the estimation for the case of three loci (m = 3), letting j = 1, l = 2, and o = 3 for convenience. Table 3 lists the eight classes of zygotes with the expected frequencies of f(000), f(001), f(010), f(011), f(100), f(101), f(110), and f(111) as obtained from (5). Seven parameters are estimable: three heterozygosities (H_{1}, H_{2}, and H_{3}), three twolocus zygotic associations (ω_{12}, ω_{13}, and ω_{23}), and one threelocus zygotic association (ω_{123}). If a sample of n individuals is taken from a diploid population and if the numbers of each class in the sample are assumed to be multinomially distributed, frequencies of these classes can be estimated using the maximumlikelihood (ML) method. Let n_{abc} be the numbers of the abcth class of zygotes with a, b, and c representing indicators X_{1}, X_{2}, and X_{3}, respectively. Thus the ML estimates of f’s are given by fˆ(abc) = n_{abc}/n to satisfy the maximized multinomial likelihood,
Sampling variances of linear combinations of multinomial variables are known exactly. For example, var(Ĥ_{1}) = H_{1}(1  H_{1})/n. The sampling variances of zygotic association estimates involve quadratic functions of observed heterozygosities and can be calculated using Fisher’s (1954) expression for the approximate variance of a function of multinomial observations n_{abc}, for example, with expectations E(n_{abc}) = nf(abc). The sampling variances of
Hypothesis testing: Since the ML estimate
Simulation: Monte Carlo simulation is carried out to examine the performance of the estimators and test statistics for the four zygotic associations, ω_{12}, ω_{13}, ω_{23}, and ω_{123}. The eight frequencies of zygote classes, f(X_{1}X_{2}X_{3}), can be constructed from given values of the four zygotic associations and three heterozygosities, H_{1}, H_{2}, and H_{3} (cf. Table 3). For each of the 18 configurations given in Table 1, we consider three values (maximum, minimum, and zero) of threelocus zygotic association (ω_{123}). Thus, there are a total of 54 populations constructed. From each population, 10,000 replicate samples of sizes n = 30, 100, and 300 are drawn. Estimation and test are made for each simulated sample and descriptive statistics are calculated across all the samples.
Table 4 presents means and standard deviations (SD) of estimates from the simulated samples for 8 of the 54 constructed populations described above. The simulation results are given only for n = 30 and n = 300. It is evident that the averages of estimated zygotic associations are very close to their theoretical values when there is no or little association. In this case, bias is expected to be negligible as it arises only from the factor of (n  1)/n. However, when such a case is not true, there can be a substantial amount of bias in the estimates. For example, for the case of H_{1} = 0.1, H_{2} = 0.3, and H_{3} = 0.5 with ω_{12} = 0.023, ω_{13} = 0.026, ω_{23} = 0.103, and ω_{123} = 0.032, the respective averaged estimates of ω_{12}, ω_{13}, ω_{23}, and ω_{123} are 0.023, 0.012, 0.099, and 0.025 for n = 30 and 0.024, 0.012, 0.103, and 0.027 for n = 300. While
While means of estimated zygotic associations for the two sample sizes in Table 4 are similar, the larger sample leads to a much smaller SD. It is thus no surprise to see that the larger sample leads to a much greater power of detecting nonzero zygotic associations. The estimated powers for the cases of no zygotic associations in Table 4 are close to 0.05 as expected because a 5% significance level is used to reject these null hypotheses. To further explore the effect of sample sizes on the power, we calculate the powers of detecting threelocus associations in the presence of twolocus associations (i) ω_{12} = ω_{13} =ω_{23} = 0.0213 and (ii) ω_{12} = 0.0226, ω_{13} = 0.0263, and ω_{23} = 0.0263 (cf. Table 1) for sample sizes of 30, 100, 300, 500, and 1000 (the three loci are indexed as j = 1, l = 2, and o = 3). The critical value with a 5% significance level, c_{0.05}, which determines the rejection region for the hypothesis H_{0}, ω_{123} = 0, is
It is also evident from Figure 1 that low heterozygosities at individual loci cause a very strong asymmetry between positive and negative associations. The unbalanced intensities of associations from both positive and negative sides result in unequal powers unless the sample size is very large. Because of the asymmetry, Lewontin’s (1964) normalized associations as often used in the literature (e.g., Hedrick 1987; Zapata 2000) may give a false impression about intensities of multilocus associations. For example, Brown (1975) showed in his Table VII that, with the same amount of normalized threelocus gametic disequilibrium at both sides (T′= ±0.99), there is a substantial difference in sample size requirements. In the case of gene frequencies equal to 0.2 and two twolocus gametic disequilibria being 0.4 with the third one being zero, Brown (1975) found that a sample size of n = 8402 is required to detect T′=0.99 with the power of 0.9, but only n = 82 is needed to detect T′=+0.99 with the same amount of power. Had he not given the range of T (0.0016 to 0.0224), one would be led to believe that the negative disequilibrium is much more difficult to detect than its positive counterpart. The reverse conclusion would be drawn in the cases where the asymmetry is skewed toward the negative side. The truth is that it is the actual, not normalized, association that determines the power and sample size requirement regardless of whether the association is positive or negative. Thus, the use of normalized measures for such purposes should be treated with caution.
DISCUSSION
This article describes measures of zygotic associations at more than two loci and their estimation with samples from diploid populations. These measures are defined as departures of joint zygotic frequencies from the expected values of zero zygotic associations (cf. Equations 2 and 5). This is very similar to the definition for gametic disequilibria for two or more loci, which is based on gametic and allelic frequencies (e.g., Bennett 1954). Thus, it is of little surprise to see that the measures of multilocus zygotic associations share most of the statistical properties by the usual gametic disequilibria. However, the meanings of the two sets of measures are quite different. In fact, a comparative assessment of zygotic associations vs. gametic disequilibria may provide some important insights into adaptive significance of genotypes at different loci. For example, if strong zygotic association but little gametic disequilibrium between a pair of loci is observed, then the study population may undergo natural selection favoring highly heterozygous individuals without distinguishing among different homozygotes in large and predominantly outcrossing populations (Mitton 1997). The assessment would be most sensitive with quantitative trait loci (QTL) that directly affect components of fitness. However, a lack of zygotic associations may also mean that selection discriminates among different homozygotes (e.g., favoring common homozygotes, but selecting against rare homozygotes). Thus, extra care is needed to choose homozygous QTL with similar selection advantages for such an analysis.
There are a variety of methods of estimating and interpreting multilocus gametic disequilibria from haploid data or diploid data from a HardyWeinberg equilibrium population (e.g., Bennett 1954; Brownet al. 1980; Barton 2000). In contrast, with the diploid data from a HardyWeinberg disequilibrium population, a complete characterization of multilocus associations also requires other types of genic disequilibria (Cockerham and Weir 1973; Weir 1979). However, the exceedingly large number of genic disequilibria encountered for multiple alleles at many loci makes such detailed characterization difficult for comparing multilocus organizations among several populations. The multilocus zygotic associations analyzed here summarize different genic disequilibria with no need to consider whether or not the study population is in HardyWeinberg equilibrium. The estimation and hypothesis testing are quite straightforward as they are merely the direct adoption of the procedures used for diallelic haploid data. Thus, our method presents a simple solution to the analysis of complex multilocus structures in diploid populations.
Of course, such a highly compacted summary in the multilocus zygotic associations represents a severe loss of information. In particular, since the analysis is based on the frequencies of zygote classes, it completely ignores haplotype information such as linkages between different loci. Thus, when significant zygotic associations are detected, there is a need to determine which genic disequilibria are important. In light of great current interest in the linkage (gametic) disequilibrium approach to finescale QTL mapping (e.g., Pritchard and Przeworski 2001; Reichet al. 2001), it is essential to determine if gametic disequilibrium is important in the presence of significant zygotic associations. As shown earlier, if the study population is in HardyWeinberg equilibrium, then there are direct relationships between zygotic associations and various orders of gametic disequilibria (cf. Equations 4b and 11). In this case, it is definitely more informative to work directly with the raw genotypic data instead of the collapsed data based on zygote classes so that haplotype frequencies and gametic disequilibrium can be inferred. However, in the presence of HardyWeinberg disequilibrium, which may often be the case in natural populations, gametic disequilibrium may be inflated because many other types of nonallelic disequilibria may also cause the multilocus associations. The knowledge about the inflation may be gained through the comparative assessment of gametic vs. zygotic associations mentioned above.
In estimating and testing for multilocus zygotic associations, we adopt Bennett’s (1954) additive approach, with frequencies of different zygote classes being expressed as a linear function of the zygotic associations and heterozygosities (Table 3). This approach enables us to explicitly give estimates and to elucidate the sampling properties of these estimates. However, our tests for two and threelocus associations are not independent as shown in the simulation results (Table 3). Hill (1975) discussed the use of the multiplicative approach (or loglinear model analysis) for developing an independent test for no threelocus association, but with the presence of twolocus associations. Another possibility is the exact test as suggested by Zaykin et al. (1995). In the exact test, the probability of the observed multilocus genotypic (zygotic) array conditional on the genotypic arrays expected under an appropriate hypothesis of zero zygotic association is evaluated to determine whether or not it lies in the tail of the empirical distribution generated by permutation. For example, the conditional probability required for testing if ω_{123} = 0, given the presence of all three twolocus associations, is given by
Acknowledgments
I thank Dr. YunXin Fu and a reviewer for helpful comments. This research was partially supported by the Natural Sciences and Engineering Research Council of Canada grant OGP0183983.
Footnotes

Communicating editor: Y.X. Fu
 Received April 25, 2001.
 Accepted February 19, 2002.
 Copyright © 2002 by the Genetics Society of America