Genetics, Vol. 161, 435-445, May 2002, Copyright © 2002

Analysis of Multilocus Zygotic Associations

Rong-Cai Yang
Alberta Agriculture, Food and Rural Development, Edmonton, Alberta T6H 5T6, Canada; Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, Alberta T6G 2P5, Canada; and Department of Renewable Resources, University of Alberta, Edmonton, Alberta T6G 2H1, Canada

Corresponding author: Rong-Cai Yang, Food and Rural Development, No. 301, J. G. O'Donoghue Bldg., 7000 - 113 St., Edmonton, Alberta T6H 5T6, Canada. E-mail: rongcai.yang@gov.ab.ca

Communicating editor: Y.-X. FU


*  ABSTRACT

While nonrandom associations between zygotes at different loci (zygotic associations) frequently occur in Hardy-Weinberg disequilibrium populations, statistical analysis of such associations has received little attention. In this article, we describe the joint distributions of zygotes at multiple loci, which are completely characterized by heterozygosities at individual loci and various multilocus zygotic associations. These zygotic associations are defined in the same fashion as the usual multilocus linkage (gametic) disequilibria on the basis of gametic and allelic frequencies. The estimation and test procedures are described with details being given for three loci. The sampling properties of the estimates are examined through Monte Carlo simulation. The estimates of three-locus associations are not free of bias due to the presence of two-locus associations and vice versa. The power of detecting the zygotic associations is small unless different loci are strongly associated and/or sample sizes are large (>100). The analysis of zygotic associations not only offers an effective means of packaging numerous genic disequilibria required for a complete characterization of multilocus structure, but also provides opportunities for making inference about evolutionary and demographic processes through a comparative assessment of zygotic association vs. gametic disequilibrium for the same set of loci in nonequilibrium populations.


MULTILOCUS associations are most commonly studied at the gametic level. In this case, linkage disequilibrium, or more appropriately gametic disequilibrium, is sufficient to describe the nonrandom associations of alleles at different loci within gametes (BENNETT 1954; WEIR 1996). Evidence of gametic disequilibrium is important for inferring the history of a population, the evolutionary forces acting on the loci, and the location of the loci on the chromosomes. This approach to studying multilocus associations is appropriate for a haploid population, where different gametes can be counted directly, or for a Hardy-Weinberg equilibrium population, where gametic frequencies can be inferred from genotypic (zygotic) frequencies. However, natural populations are rarely at equilibrium because of disturbing forces such as inbreeding, population structure, and selection. In a nonequilibrium population, a complete characterization of multilocus associations requires gametic and many other genic disequilibria (COCKERHAM and WEIR 1973; WEIR 1979). Even with a moderate number of loci, each with a few alleles, the number of genic disequilibria to be characterized and estimated quickly becomes unmanageable. Thus, with a large number of loci each with many alleles, it is necessary to have a single measure that is similar to gametic disequilibrium, but defined at the zygotic level.

Recently, YANG (2000) described the characterization and estimation of such a measure for a pair of loci, which is called zygotic association. According to YANG (2000), the zygotic association is simply the deviation of two-locus zygotic frequencies from products of single-locus zygotic frequencies, but is composed of all nonallelic genic disequilibria at the two loci. Thus, in experimental population genetic studies, the zygotic association can be estimated directly by comparing the two- and single-locus zygotic frequencies observed in a sample of diploid individuals. HALDANE (1949) was probably the first to recognize that zygotic association can be generated as a result of partial inbreeding even in a linkage (gametic) equilibrium population. Subsequent studies have shown that such zygotic associations may arise from mixed selfing and random mating (BENNETT and BINET 1956; ALLARD et al. 1968; WEIR and COCKERHAM 1973), associative overdominance (OHTA and COCKERHAM 1974; CHARLESWORTH 1991), admixture of two or more distinct gene pools (BARTON and GALE 1993), or heterotic selection (MITTON 1997). Thus, knowledge of the extent and patterns of zygotic associations at two or more loci is essential for making inferences about evolutionary and demographic processes. However, while there is a substantial literature on gametic disequilibria at three or more loci (e.g., BENNETT 1954; BROWN 1975; HILL 1975; THOMSON and BAUR 1984; BARTON 2000), an equivalent development for multilocus zygotic associations is not yet available. In this article, we first describe the joint distributions of zygotes at multiple loci and their relationships with heterozygosities and zygotic associations. We then describe statistical procedures for estimating and testing multilocus zygotic associations from a sample of diploid individuals, with details given for the case of three loci. The sampling properties of the estimates are examined by computer simulation.


*  THEORY AND ANALYSIS

Consider a diploid population in which individual genotypes are known at m loci (e.g., codominant phenotypic markers such as the MN blood groups, allozymes, and microsatellites). Then, a genotype at a particular locus can be unambiguously recognized as a homozygote or heterozygote, depending on whether or not the two alleles at the locus are the same. Just as frequencies of genes or gametes at one or more loci are needed for defining and characterizing gametic disequilibria, frequencies of zygotes at one or more loci are required for defining and characterizing multilocus zygotic associations. These zygotic frequencies and their relationships with heterozygosities and multilocus zygotic associations are described below.

One locus:
At a given locus, say locus j, the probability of an individual genotype being heterozygous or homozygous is defined as

$$f(X_j) = H_j^{X_j}(1 - H_j)^{1 - X_j}, \qquad X_j \in \{0, 1\} \tag{1}$$

where the indicator Xj takes the value 1 or 0 according to whether the genotype at the jth locus is a heterozygote or a homozygote, and Hj is the population heterozygosity at locus j. Thus, f(1) = Hj and f(0) = 1 - Hj. If the population is in Hardy-Weinberg equilibrium, then the heterozygosity (Hj) reduces to the gene diversity, or expected heterozygosity under Hardy-Weinberg equilibrium (hj). The relationship between Hj and hj is given in YANG (2000, Equation 5). Such a relationship has been the basis for detecting Hardy-Weinberg disequilibrium (WEIR 1996).
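As a minimal sketch of how the indicator Xj could be obtained from codominant marker data, the snippet below codes each single-locus genotype as 1 (heterozygote) or 0 (homozygote) and takes the sample proportion of heterozygotes as an estimate of Hj. The function names, the tuple representation of a genotype, and the allele labels are illustrative assumptions, not part of the article.

```python
from typing import List, Tuple

# Hypothetical representation: a genotype is the pair of alleles seen at one locus.
Genotype = Tuple[str, str]


def het_indicator(genotype: Genotype) -> int:
    """Return X = 1 for a heterozygote and X = 0 for a homozygote."""
    first, second = genotype
    return int(first != second)


def heterozygosity(genotypes: List[Genotype]) -> float:
    """Sample proportion of heterozygotes at one locus (an estimate of H)."""
    indicators = [het_indicator(g) for g in genotypes]
    return sum(indicators) / len(indicators)


# Four individuals scored at one locus with made-up allele labels.
sample = [("A1", "A1"), ("A1", "A2"), ("A2", "A2"), ("A2", "A1")]
print([het_indicator(g) for g in sample])  # [0, 1, 0, 1]
print(heterozygosity(sample))              # 0.5
```

Only the heterozygote/homozygote distinction is retained, so different homozygotes (and different heterozygotes) at a multiallelic locus are pooled into a single class.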

Two loci:
When two loci, say loci j and l, are considered, the joint distribution of indicators, Xj and Xl, is

$$f(X_j X_l) = f(X_j)\,f(X_l) + (-1)^{X_j + X_l}\,\omega_{jl} \tag{2}$$

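where f(Xj), for example, is given in (1) and ωjl is the zygotic association between loci j and l (YANG 2000). Thus, f(11) = HjHl + ωjl, f(10) = Hj(1 - Hl) - ωjl, f(01) = (1 - Hj)Hl - ωjl, and f(00) = (1 - Hj)(1 - Hl) + ωjl. The marginal frequencies for the individual loci are f(11) + f(10) = Hj and f(11) + f(01) = Hl. These relationships enable ωjl to be expressed in one of the following five ways:

$$\omega_{jl} = f(11) - H_j H_l = H_j(1 - H_l) - f(10) = (1 - H_j)H_l - f(01) = f(00) - (1 - H_j)(1 - H_l) = f(11)f(00) - f(10)f(01)$$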

Clearly, the zygotic association (ωjl) is bounded by the marginal zygotic frequencies at the two individual loci,

$$-\min[H_j H_l,\ (1 - H_j)(1 - H_l)] \le \omega_{jl} \le \min[H_j(1 - H_l),\ (1 - H_j)H_l] \tag{3}$$
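The two-locus relationships can be sketched numerically as follows. The sketch assumes that (2) takes the additive form f(XjXl) = f(Xj)f(Xl) + (-1)^(Xj+Xl) ωjl and that the bounds in (3) follow from requiring all four joint frequencies to be nonnegative; the function names are hypothetical.

```python
def two_locus_freqs(h_j: float, h_l: float, omega_jl: float) -> dict:
    """Joint frequencies f(Xj, Xl) under the assumed additive form of (2)."""
    f_j = {1: h_j, 0: 1.0 - h_j}
    f_l = {1: h_l, 0: 1.0 - h_l}
    return {
        (xj, xl): f_j[xj] * f_l[xl] + (-1) ** (xj + xl) * omega_jl
        for xj in (0, 1)
        for xl in (0, 1)
    }


def omega_bounds(h_j: float, h_l: float) -> tuple:
    """Smallest and largest omega_jl that keep all four frequencies >= 0."""
    lower = -min(h_j * h_l, (1.0 - h_j) * (1.0 - h_l))
    upper = min(h_j * (1.0 - h_l), (1.0 - h_j) * h_l)
    return lower, upper


low, high = omega_bounds(0.05, 0.05)
freqs = two_locus_freqs(0.05, 0.05, high)
print(low, high)
print(freqs)
```

At the upper bound one of the two single-heterozygote classes is emptied, and at the lower bound either the double-heterozygote or the double-homozygote class is emptied, which is why the permissible range depends only on the two heterozygosities.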

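Given that the variance of Xj is Var(Xj) = Hj(1 - Hj) and the covariance between Xj and Xl is Cov(Xj, Xl) = ωjl, the correlation between heterozygosities at loci j and l is given by

$$\rho_{jl} = \frac{\omega_{jl}}{\sqrt{H_j(1 - H_j)\,H_l(1 - H_l)}}$$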

To see how the zygotic association (ωjl) is related to different genic disequilibria including gametic disequilibrium, it is necessary to first identify the relationships between joint frequencies of homozygotes and heterozygotes in (2) and genotypic frequencies,

(4a)

where, for example, jlPuyvz is the frequency of genotypes at loci j and l from the union of gametes ju ly and jv lz (u, v = 1, 2, ... , r; y, z = 1, 2, ... , s). Then, using COCKERHAM and WEIR's (1973) disequilibrium functions for the two-locus frequencies (e.g., jlPuyuy), the zygotic association at loci j and l can be expressed in terms of individual genic disequilibria,

(4b)

where jpu, for example, is the frequency of allele u at locus j. Clearly, each genic disequilibrium (D) in (4b) is the deviation of a frequency from that based on random association of genes, after accounting for any lower order disequilibria. For example, the gametic disequilibrium (jlDuy··) is the deviation of the frequency of gamete ju ly from the product of the frequencies of alleles u and y at loci j and l. It is also evident from (4b) that even in a gametic equilibrium population (jlDuy·· = 0), nonzero zygotic associations can arise from other forces such as partial inbreeding, as a result of "identity disequilibrium" (jlDuyvz ≠ 0). On the other hand, in a Hardy-Weinberg equilibrium population, the zygotic association is a function of gametic disequilibrium only. YANG (2000) has described in detail the interrelationships among gene frequencies, genic disequilibria, and two-locus zygotic associations.

Three loci:
When three or more loci are considered jointly, two alternative approaches can be used to describe zygotic associations at these loci. The first is BARTLETT's (1935) multiplicative approach based on the multiway contingency table. In the case of three loci, the absence of three-locus zygotic association but the presence of all three pairwise associations implies that

$$f(111)\,f(100)\,f(010)\,f(001) = f(110)\,f(101)\,f(011)\,f(000)$$

where f(111), for example, is the joint frequency of heterozygotes at the three loci. However, no explicit formulas for these joint zygotic frequencies can be given, and numerical solutions must usually be sought. The second approach is the additive formulation of BENNETT (1954), in which the joint frequencies of heterozygosities at three loci, for example, are linear functions of heterozygosities and two- and three-locus zygotic associations. Because of its relative simplicity for estimation and hypothesis testing, the additive approach is used for the subsequent development of multilocus zygotic associations. Thus, the joint distribution of indicators Xj, Xl, and Xo for loci j, l, and o is given by

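$$f(X_j X_l X_o) = f(X_j)f(X_l)f(X_o) + (-1)^{X_j + X_l}\omega_{jl}f(X_o) + (-1)^{X_j + X_o}\omega_{jo}f(X_l) + (-1)^{X_l + X_o}\omega_{lo}f(X_j) - (-1)^{X_j + X_l + X_o}\omega_{jlo} \tag{5}$$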

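where f(Xj) and ωjl, for example, are given in (1) and (2), and ωjlo is the three-locus zygotic association. Three-locus independence is implied by zero three-locus association, but with the presence of all pairwise associations, i.e.,

$$f_0(X_j X_l X_o) = f(X_j)f(X_l)f(X_o) + (-1)^{X_j + X_l}\omega_{jl}f(X_o) + (-1)^{X_j + X_o}\omega_{jo}f(X_l) + (-1)^{X_l + X_o}\omega_{lo}f(X_j) \tag{6}$$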
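The additive three-locus formulation can be sketched in code as below. This assumes the sign convention in which the three-locus association enters f(111) with a positive sign, so that f(111) = HjHlHo + Hjωlo + Hlωjo + Hoωjl + ωjlo; the function name and argument layout are hypothetical.

```python
from itertools import product


def three_locus_freqs(h, omega2, omega3=0.0):
    """Eight zygote-class frequencies under the assumed additive form of (5).

    h      -- (H_j, H_l, H_o), heterozygosities at the three loci
    omega2 -- (w_jl, w_jo, w_lo), pairwise zygotic associations
    omega3 -- three-locus zygotic association; 0.0 gives f0 as in (6)
    """
    w_jl, w_jo, w_lo = omega2
    f = [{0: 1.0 - x, 1: x} for x in h]  # single-locus frequencies from (1)
    out = {}
    for xj, xl, xo in product((0, 1), repeat=3):
        single = f[0][xj] * f[1][xl] * f[2][xo]
        pairs = (
            (-1) ** (xj + xl) * w_jl * f[2][xo]
            + (-1) ** (xj + xo) * w_jo * f[1][xl]
            + (-1) ** (xl + xo) * w_lo * f[0][xj]
        )
        triple = -((-1) ** (xj + xl + xo)) * omega3
        out[(xj, xl, xo)] = single + pairs + triple
    return out


# Heterozygosities and pairwise associations of case B in Figure 1.
freqs = three_locus_freqs((0.1, 0.3, 0.5), (0.0226, 0.0263, 0.1026))
for cell in sorted(freqs):
    print(cell, round(freqs[cell], 5))
```

Summing the eight frequencies over any one locus recovers the two-locus form in (2), which is a convenient check on the signs.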

Unlike the two-locus associations (e.g., ωjl) where the minimum is always zero for both ωjl > 0 and ωjl < 0 (cf. Equation 3), the three-locus zygotic associations may sometimes be bounded away from zero. In other words, both positive and negative ωjlo values may be constrained by their own minimum and maximum values. Let us first define two quantities,

where , for example, is obtained using (6). Thus, following the development by THOMSON and BAUR (1984) for the three-locus gametic disequilibrium, the maximum and minimum values for ωjlo > 0 [denoted as ω+jlo(max) and ω+jlo(min)] are

(7a)

and those for ωjlo < 0 [denoted as ω-jlo(max) and ω-jlo(min)] are

(7b)

Since any of the eight f0(XjXlXo) values can be negative, neither ω+jlo(min) nor ω-jlo(min) is necessarily zero. Clearly, the β1 and β2 values can be used to determine the sign and range of ωjlo,

(8)

but some pairs of the β1 and β2 values (e.g., β1 < 0 and β2 < 0) would not lead to a definable ωjlo. Given 0 <= Hj, Hl, Ho <= 1, all profiles of heterozygosities {Hj Hl Ho} with no two-locus associations (i.e., ωjl = ωjo = ωlo = 0) would produce the condition of β1 >= 0 and β2 >= 0 [i.e., ] and thus definable ωjlo values. However, given a heterozygosity profile, not all configurations of two-locus zygotic associations {ωjl ωjo ωlo} would lead to the conditions given in (8) under which ωjlo can be defined.

Table 1 shows some numerical examples to illustrate the effects of heterozygosities and pairwise zygotic associations on the ranges of the three-locus association (ωjlo). For example, with Hj = Hl = 0.05 and Ho = 0.1, the ranges for ωjl, ωjo, and ωlo are -0.0025 <= ωjl <= 0.0475, -0.005 <= ωjo <= 0.045, and -0.005 <= ωlo <= 0.045, respectively. Each of these three ranges is divided into 19 equal intervals to obtain 20 equally spaced values from the minimum to the maximum. Thus, there are 8000 (20 ωjl's x 20 ωjo's x 20 ωlo's) configurations of the two-locus zygotic associations that can be used to define the ranges for ωjlo. Of these 8000 configurations, 240 have ranges with ωjlo < 0, 1496 have ranges with ωjlo > 0, and 2158 have ranges of ω-jlo(max) <= ωjlo <= ω+jlo(max), but the remaining 4106 configurations do not lead to any definable ωjlo. In each of these three cases, we identify a configuration that leads to the maximum range of ωjlo (many other configurations may also lead to the same maximum range in each case).
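The grid search described above can be sketched as follows. The heterozygosity profile (0.05, 0.05, 0.1) is inferred from the two-locus ranges quoted in the text, and the classification rule is an assumption: ωjlo is taken to be definable when some value keeps all eight zygote-class frequencies nonnegative under the additive form of (5). The article's β1 and β2 definitions are not reproduced here, so this sketch is not guaranteed to return the counts reported above, particularly for configurations that sit exactly on a boundary.

```python
from itertools import product

# Cells to which a positive three-locus association adds frequency ...
PLUS_CELLS = [(1, 1, 1), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
# ... and cells from which it removes frequency (assumed sign convention).
MINUS_CELLS = [(0, 0, 0), (1, 1, 0), (1, 0, 1), (0, 1, 1)]


def f0(h, w2):
    """Class frequencies with the three-locus association set to zero."""
    w_jl, w_jo, w_lo = w2
    f = [{0: 1.0 - x, 1: x} for x in h]
    out = {}
    for xj, xl, xo in product((0, 1), repeat=3):
        out[(xj, xl, xo)] = (
            f[0][xj] * f[1][xl] * f[2][xo]
            + (-1) ** (xj + xl) * w_jl * f[2][xo]
            + (-1) ** (xj + xo) * w_jo * f[1][xl]
            + (-1) ** (xl + xo) * w_lo * f[0][xj]
        )
    return out


def omega3_range(h, w2):
    """Interval of omega_jlo keeping all eight frequencies >= 0, or None."""
    base = f0(h, w2)
    lower = max(-base[c] for c in PLUS_CELLS)
    upper = min(base[c] for c in MINUS_CELLS)
    return (lower, upper) if lower <= upper else None


def pair_bounds(ha, hb):
    """Two-locus bounds obtained from nonnegativity of the four joint classes."""
    return -min(ha * hb, (1 - ha) * (1 - hb)), min(ha * (1 - hb), (1 - ha) * hb)


def grid(lo, hi, points=20):
    """Equally spaced values from lo to hi inclusive."""
    step = (hi - lo) / (points - 1)
    return [lo + i * step for i in range(points)]


def classify(h, points=20, eps=1e-12):
    """Count configurations by the sign pattern of the permissible range."""
    hj, hl, ho = h
    grids = [
        grid(*pair_bounds(hj, hl), points),
        grid(*pair_bounds(hj, ho), points),
        grid(*pair_bounds(hl, ho), points),
    ]
    counts = {"negative": 0, "positive": 0, "spans_zero": 0, "undefined": 0}
    for w2 in product(*grids):
        rng = omega3_range(h, w2)
        if rng is None:
            counts["undefined"] += 1
        elif rng[1] < -eps:
            counts["negative"] += 1
        elif rng[0] > eps:
            counts["positive"] += 1
        else:
            counts["spans_zero"] += 1
    return counts


print(classify((0.05, 0.05, 0.1)))
```

The tolerance `eps` decides how configurations whose range ends exactly at zero are classified; changing it moves boundary cases between categories.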


 

 
Table 1. Ranges for three-locus zygotic associations

As in the two-locus case, we wish to learn how the three-locus zygotic association (ωjlo) is related to genic disequilibria. To focus on the relationships between zygotic associations and gametic disequilibria, we assume that a zygote is formed from the random union of two gametes (i.e., the population is in Hardy-Weinberg equilibrium). Because, under this assumption, the zygotic frequencies are just products of the gametic frequencies, the three-locus zygotic and gametic frequencies are directly related. For convenience, we consider only the case of two alleles at each of the three loci, and the notation in this case is simplified to reduce the superscripts and subscripts. The frequencies of the two alleles J and j at locus j are pJ and pj (= 1 - pJ), those of the two alleles L and l at locus l are pL and pl (= 1 - pL), and those of the two alleles O and o at locus o are pO and po (= 1 - pO). The gametic disequilibrium between the j-l loci is denoted as Djl, between the j-o loci as Djo, between the l-o loci as Dlo, and the three-locus gametic disequilibrium as Djlo. These seven parameters are used to obtain the vector g of eight three-locus gametic frequencies, where gJLO, for example, is

$$g_{JLO} = p_J p_L p_O + p_J D_{lo} + p_L D_{jo} + p_O D_{jl} + D_{jlo}$$

(BROWN 1975; THOMSON and BAUR 1984). Under random mating, the zygotic frequencies are just appropriate sums of elements of the matrix gg', but these elements are not arranged in any particular order of heterozygosity. Fortunately, an (8 x 8) matrix G is found to enable the relationships between heterozygosities and gametic frequencies to be expressed directly, i.e., f = Gg, where f = [f(000) f(001) f(010) f(011) f(100) f(101) f(110) f(111)]' and

(9)

Note that G is a symmetric matrix and the eight gametic frequencies in its first row or column are identical to those in g whereas rows (columns) 2 to 8 are just different rearrangements of the eight gametic frequencies required to obtain the desired zygotic frequencies in f.

In the absence of two-locus gametic disequilibria (i.e., Djl = Djo = Dlo = 0), the expressions of zygotic frequencies in f are greatly simplified. For example, the frequency of homozygotes at all three loci, j, l, and o, is given by

$$f(000) = (1 - H_j)(1 - H_l)(1 - H_o) - 2D_{jlo}(1 - 2p_J)(1 - 2p_L)(1 - 2p_O) + 8D_{jlo}^2 \tag{10}$$

Here Hj, for example, is the same as the expected heterozygosity under Hardy-Weinberg equilibrium [hj = 2pJ(1 - pJ)]. Given the relationship between ωjl and Djl (YANG 2000), ωjl = 0, for example, in the absence of the pairwise gametic disequilibria. Thus, the three-locus zygotic association can be expressed in terms of gene frequencies and three-locus gametic disequilibrium

$$\omega_{jlo} = 2D_{jlo}(1 - 2p_J)(1 - 2p_L)(1 - 2p_O) - 8D_{jlo}^2 \tag{11}$$

(cf. Equation 5). Table 2 lists the values of three-locus zygotic association (ωjlo) in the presence of three-locus gametic disequilibrium (Djlo) but absence of the pairwise disequilibria for various gene frequencies (pJ <= pL <= pO). In this case, the range of Djlo is defined by the gene frequencies, -pJ pL pO <= Djlo <= pJ pL(1 - pO). The values of ωjlo are calculated for two sizes of negative Djlo (-pJ pL pO and -0.5pJ pL pO) and two sizes of positive Djlo [pJ pL(1 - pO) and 0.5pJ pL(1 - pO)]. The higher the absolute values that Djlo can take, the larger the absolute values of ωjlo. It is evident from (11) that ωjlo is always negative if Djlo < 0, but can be either positive or negative if Djlo > 0, with ωjlo being positive only if 0 < Djlo < (1 - 2pJ)(1 - 2pL)(1 - 2pO)/4. As pJ, pL, and pO approach 0.5, the range of Djlo expands and the absolute values of ωjlo increase. In the cases of pJ = 0.5 or pL = 0.5 or pO = 0.5, ωjlo is always negative because the product (1 - 2pJ)(1 - 2pL)(1 - 2pO) is then zero.
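The link between gametic and zygotic associations under random union of gametes can be checked numerically with the sketch below. It builds the eight gametic frequencies from the standard additive parameterization, forms zygote classes by pairing gametes at random, and compares the resulting three-locus zygotic association with the closed form ωjlo = 2Djlo(1 - 2pJ)(1 - 2pL)(1 - 2pO) - 8D²jlo. That closed form is an assumption here: it is the expression consistent with the sign conditions stated in the paragraph above, and the function names are hypothetical.

```python
from itertools import product


def gamete_freqs(p, d2=(0.0, 0.0, 0.0), d3=0.0):
    """Eight gametic frequencies; key (1, 1, 1) is gamete JLO, (0, 0, 0) is jlo.

    p  -- (pJ, pL, pO), frequencies of the upper-case alleles
    d2 -- (Djl, Djo, Dlo), pairwise gametic disequilibria
    d3 -- Djlo, three-locus gametic disequilibrium
    """
    d_jl, d_jo, d_lo = d2
    out = {}
    for a, b, c in product((1, 0), repeat=3):
        fa, fb, fc = (x if k else 1.0 - x for x, k in zip(p, (a, b, c)))
        sa, sb, sc = (1 if k else -1 for k in (a, b, c))
        out[(a, b, c)] = (
            fa * fb * fc
            + sa * sb * d_jl * fc
            + sa * sc * d_jo * fb
            + sb * sc * d_lo * fa
            + sa * sb * sc * d3
        )
    return out


def zygote_class_freqs(g):
    """Frequencies of the eight heterozygosity classes under random union."""
    f = {cell: 0.0 for cell in product((0, 1), repeat=3)}
    for g1, p1 in g.items():
        for g2, p2 in g.items():
            cell = tuple(int(u != v) for u, v in zip(g1, g2))
            f[cell] += p1 * p2
    return f


def three_locus_association(f):
    """omega_jlo computed from class frequencies via the additive form."""
    h = [sum(v for cell, v in f.items() if cell[t] == 1) for t in range(3)]

    def pair(s, t):
        f11 = sum(v for cell, v in f.items() if cell[s] == 1 and cell[t] == 1)
        return f11 - h[s] * h[t]

    return (f[(1, 1, 1)] - h[0] * h[1] * h[2]
            - h[0] * pair(1, 2) - h[1] * pair(0, 2) - h[2] * pair(0, 1))


p, d3 = (0.2, 0.3, 0.4), 0.01
omega = three_locus_association(zygote_class_freqs(gamete_freqs(p, d3=d3)))
closed = 2 * d3 * (1 - 2 * p[0]) * (1 - 2 * p[1]) * (1 - 2 * p[2]) - 8 * d3 ** 2
print(omega, closed)
```

With all gene frequencies at 0.5 and Djlo at either end of its range (±0.125), the same computation gives ωjlo = -0.125, the limit quoted later for the case of no two-locus disequilibria.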


 

 
Table 2. Three-locus gametic and zygotic associations

The combined effect of two- and three-locus gametic disequilibria on the values of ωjlo is also examined (numerical results are not presented). The joint contribution of two- and three-locus gametic disequilibria to ωjlo greatly obscures their relationships with ωjlo. However, there are clearly cases where the ωjlo values exceed the limits of ωjlo obtained with no pairwise disequilibria. For example, for pJ = pL = pO = 0.5, the ranges for Djl, Djo, and Dlo are all from -0.25 to 0.25, but permissible values of Djlo are determined by different combinations of these pairwise disequilibria with the given gene frequencies (THOMSON and BAUR 1984). It is found that when , ωjlo is -0.5, which exceeds the limit of -0.125 in the case of no two-locus disequilibria (Djl = Djo = Dlo = 0).

More than three loci:
The extension to four or more loci following BENNETT (1954) is straightforward. For example, the joint distribution of indicators Xj, Xl, Xo, and Xq for loci j, l, o, and q, respectively, is given by

(12)

where f(Xj), ωjl, and ωjlo, for example, are given in (1), (2), and (5), and ωjloq is the four-locus zygotic association. In other words, the frequencies of the 16 zygote classes for loci j, l, o, and q can be uniquely defined in terms of the four heterozygosities for individual loci, the six pairwise zygotic associations, the four three-locus zygotic associations, and the one four-locus association. The three products of pairwise zygotic associations in the last term of (12) arise from the "two-locus" recombination, a distinct feature inherent in the associations for more than three linked loci (BENNETT 1954; LEWONTIN 1964; COCKERHAM and TACHIDA 1986). A set of functions, f0(XjXlXoXq), can be defined in the same manner as f0(XjXlXo) in (6) to provide the basis for defining the range of ωjloq. The higher order zygotic associations are required for deriving higher moments of the number of heterozygous loci (YANG 2000) or covariances of two-locus sample zygotic associations (WEIR 1996, Chap. 4).


*  STATISTICAL INFERENCE

Maximum-likelihood estimation:
For m loci, there are 2^m possible classes of zygotes, with the two extreme classes being m-locus homozygotes (00 · · · 0) and m-locus heterozygotes (11 · · · 1). A total of 2^m - 1 parameters can be estimated. Here we focus on the estimation for the case of three loci (m = 3), letting j = 1, l = 2, and o = 3 for convenience. Table 3 lists the eight classes of zygotes with the expected frequencies of f(000), f(001), f(010), f(011), f(100), f(101), f(110), and f(111) as obtained from (5). Seven parameters are estimable: three heterozygosities (H1, H2, and H3), three two-locus zygotic associations (ω12, ω13, and ω23), and one three-locus zygotic association (ω123). If a sample of n individuals is taken from a diploid population and if the numbers in each class in the sample are assumed to be multinomially distributed, the frequencies of these classes can be estimated using the maximum-likelihood (ML) method. Let nabc be the number of zygotes in the abcth class, with a, b, and c representing indicators X1, X2, and X3, respectively. Thus the ML estimates of the f's are given by f̂(abc) = nabc/n, which maximize the multinomial likelihood,

$$L \propto \prod_{a,b,c} f(abc)^{n_{abc}}$$


 

 
Table 3. Joint frequencies of zygotes at three loci

Various one- and two-locus marginal frequencies are given by sums of the three-locus frequencies, as indicated by dots for the indices summed. For example, f̂(1··) = f̂(100) + f̂(101) + f̂(110) + f̂(111) and f̂(11·) = f̂(110) + f̂(111). Note that the one-locus marginal frequencies, f̂(1··), f̂(·1·), and f̂(··1), are the estimates of heterozygosities at loci 1, 2, and 3, respectively. The zygotic associations for two loci (e.g., ω12) and for all three loci (ω123) are estimated as

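$$\hat{\omega}_{12} = \hat{f}(11\cdot) - \hat{H}_1\hat{H}_2 \tag{13a}$$

and

$$\hat{\omega}_{123} = \hat{f}(111) - \hat{H}_1\hat{\omega}_{23} - \hat{H}_2\hat{\omega}_{13} - \hat{H}_3\hat{\omega}_{12} - \hat{H}_1\hat{H}_2\hat{H}_3 \tag{13b}$$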

respectively. These ML estimates are biased as indicated from their expected values,

Sampling variances of linear combinations of multinomial variables are known exactly. For example, Var(Ĥ1) = H1(1 - H1)/n. The sampling variances of zygotic association estimates involve quadratic functions of observed heterozygosities and can be calculated using FISHER's (1954) expression for the approximate variance of a function of multinomial observations, nabc for example, with expectations n f(abc). The sampling variances of ω̂12 and ω̂123 are

(14a)

and

(14b)

where with t = 1, 2, 3. Equations 14a and 14b are essentially the same as Equations 3 and 13 of BROWN (1975) for the sampling variances of two- and three-locus gametic disequilibria.
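The plug-in estimates described for (13a) and (13b) can be sketched as below, assuming the estimators are obtained by substituting the observed class proportions nabc/n into the additive form of (5). The function name, the dictionary layout of the counts, and the example counts are illustrative assumptions; the sampling variances in (14a) and (14b) are not reproduced.

```python
from itertools import product


def estimate_associations(counts):
    """Plug-in estimates of H1, H2, H3, w12, w13, w23, and w123.

    counts -- dict mapping a class (a, b, c), with 1 = heterozygous at
              that locus, to the number of sampled individuals in it
    """
    n = sum(counts.values())
    f = {cell: counts.get(cell, 0) / n for cell in product((0, 1), repeat=3)}

    # One-locus marginal frequencies estimate the heterozygosities.
    h = [sum(v for cell, v in f.items() if cell[t] == 1) for t in range(3)]

    def pair(s, t):
        """Two-locus estimate: double-heterozygote marginal minus H_s * H_t."""
        f11 = sum(v for cell, v in f.items() if cell[s] == 1 and cell[t] == 1)
        return f11 - h[s] * h[t]

    w12, w13, w23 = pair(0, 1), pair(0, 2), pair(1, 2)
    w123 = (f[(1, 1, 1)] - h[0] * w23 - h[1] * w13 - h[2] * w12
            - h[0] * h[1] * h[2])
    return {"H": h, "w12": w12, "w13": w13, "w23": w23, "w123": w123}


# Made-up counts for a sample of 100 individuals.
example = {
    (0, 0, 0): 40, (0, 0, 1): 10, (0, 1, 0): 8, (0, 1, 1): 7,
    (1, 0, 0): 9, (1, 0, 1): 6, (1, 1, 0): 5, (1, 1, 1): 15,
}
print(estimate_associations(example))
```

Because these estimates are sample central moments of the indicators, they coincide with the covariance and the third-order cross moment of (X1, X2, X3) computed with divisor n.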

Hypothesis testing:
Since the ML estimate ω̂i is approximately normally distributed, i.e., ω̂i ~ N[E(ω̂i), Var(ω̂i)], a test statistic (χ²i) that is constructed after setting ωi to zero in both E(ω̂i) and Var(ω̂i) is distributed as chi square with 1 d.f., where subscript i indexes 12, 13, 23, and 123 for the three loci. For example, the test statistic for the estimated zygotic association at loci 1 and 2 (ω̂12),

is used to test for ω12 = 0.
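A rough illustration of a 1-d.f. test for ω12 = 0 is sketched below. It does not reproduce the article's statistic, whose exact form depends on the expectation and variance expressions that are not shown here. Instead it assumes the large-sample null variance H1(1 - H1)H2(1 - H2)/n, under which the statistic equals the Pearson chi square for the 2 x 2 table of the two indicators. The function name and the counts are hypothetical.

```python
# Upper 5% point of the chi-square distribution with 1 d.f.
CRITICAL_5PCT_1DF = 3.841


def chi_square_two_locus(n11: int, n10: int, n01: int, n00: int) -> float:
    """Approximate chi-square statistic for testing omega_12 = 0.

    n11 -- heterozygous at both loci      n10 -- heterozygous at locus 1 only
    n01 -- heterozygous at locus 2 only   n00 -- homozygous at both loci
    """
    n = n11 + n10 + n01 + n00
    h1 = (n11 + n10) / n
    h2 = (n11 + n01) / n
    if h1 in (0.0, 1.0) or h2 in (0.0, 1.0):
        raise ValueError("a locus with no variation carries no information")
    w12 = n11 / n - h1 * h2
    # Assumed large-sample variance of the estimate when omega_12 = 0.
    var0 = h1 * (1.0 - h1) * h2 * (1.0 - h2) / n
    return w12 ** 2 / var0


stat = chi_square_two_locus(30, 20, 20, 30)
print(stat, stat > CRITICAL_5PCT_1DF)
```

Finite-sample corrections such as the (n - 1)/n factor mentioned for the expected values would change the statistic slightly for small n.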

Simulation:
Monte Carlo simulation is carried out to examine the performance of the estimators and test statistics for the four zygotic associations, ω12, ω13, ω23, and ω123. The eight frequencies of zygote classes, f(X1X2X3), can be constructed from given values of the four zygotic associations and three heterozygosities, H1, H2, and H3 (cf. Table 3). For each of the 18 configurations given in Table 1, we consider three values (maximum, minimum, and zero) of three-locus zygotic association (ω123). Thus, there are a total of 54 populations constructed. From each population, 10,000 replicate samples of sizes n = 30, 100, and 300 are drawn. Estimation and testing are carried out for each simulated sample, and descriptive statistics are calculated across all the samples.
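A simulation of this kind can be sketched with the standard library alone. The sketch assumes the additive form of (5) for constructing the class frequencies and plug-in estimates for each sample; the parameter values, the seed, and the default of 2000 replicates (chosen here for speed, whereas the article uses 10,000) are illustrative assumptions.

```python
import random
import statistics
from itertools import product

CELLS = list(product((0, 1), repeat=3))


def class_freqs(h, w2, w3):
    """Population frequencies of the eight zygote classes (assumed form of (5))."""
    w12, w13, w23 = w2
    f = [{0: 1.0 - x, 1: x} for x in h]
    out = []
    for a, b, c in CELLS:
        out.append(
            f[0][a] * f[1][b] * f[2][c]
            + (-1) ** (a + b) * w12 * f[2][c]
            + (-1) ** (a + c) * w13 * f[1][b]
            + (-1) ** (b + c) * w23 * f[0][a]
            - (-1) ** (a + b + c) * w3
        )
    return out


def estimate(counts):
    """Return plug-in estimates (w12, w13, w23, w123) from eight class counts."""
    n = sum(counts)
    f = dict(zip(CELLS, (k / n for k in counts)))
    h = [sum(v for cell, v in f.items() if cell[t]) for t in range(3)]

    def pair(s, t):
        return sum(v for cell, v in f.items() if cell[s] and cell[t]) - h[s] * h[t]

    w12, w13, w23 = pair(0, 1), pair(0, 2), pair(1, 2)
    w123 = f[(1, 1, 1)] - h[0] * w23 - h[1] * w13 - h[2] * w12 - h[0] * h[1] * h[2]
    return w12, w13, w23, w123


def simulate(h, w2, w3, n, reps=2000, seed=1):
    """Means and SDs of the four estimates over replicate multinomial samples."""
    weights = class_freqs(h, w2, w3)
    if min(weights) < 0:
        raise ValueError("parameter values imply a negative class frequency")
    rng = random.Random(seed)
    results = []
    for _ in range(reps):
        counts = [0] * len(CELLS)
        for k in rng.choices(range(len(CELLS)), weights=weights, k=n):
            counts[k] += 1
        results.append(estimate(counts))
    columns = list(zip(*results))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    return means, sds


means, sds = simulate((0.3, 0.4, 0.5), (0.02, 0.02, 0.02), 0.0, n=100)
print("means:", [round(m, 4) for m in means])
print("SDs:  ", [round(s, 4) for s in sds])
```

The `ValueError` guard matters in practice: as noted for Table 1, many combinations of heterozygosities and pairwise associations do not correspond to a valid set of class frequencies.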

Table 4 presents means and standard deviations (SD) of estimates from the simulated samples for 8 of the 54 constructed populations described above. The simulation results are given only for n = 30 and n = 300. It is evident that the averages of estimated zygotic associations are very close to their theoretical values when there is little or no association. In this case, bias is expected to be negligible as it arises only from the factor of (n - 1)/n. However, when this is not the case, there can be a substantial amount of bias in the estimates. For example, for the case of H1 = 0.1, H2 = 0.3, and H3 = 0.5 with ω12 = 0.023, ω13 = 0.026, ω23 = 0.103, and ω123 = -0.032, the respective averaged estimates of ω12, ω13, ω23, and ω123 are 0.023, 0.012, 0.099, and -0.025 for n = 30 and 0.024, 0.012, 0.103, and -0.027 for n = 300. While ω̂12 and ω̂23 are almost identical to their theoretical values, ω̂13 is less than one-half of its true value and ω̂123 also underestimates the magnitude of ω123. However, when ω123 is set to zero, the estimates of all three two-locus zygotic associations are unbiased; conversely, ω̂123 is also an unbiased estimate of ω123 when there are no two-locus associations.


 

 
Table 4. Powers of detecting three-locus zygotic associations

While the means of estimated zygotic associations for the two sample sizes in Table 4 are similar, the larger sample leads to a much smaller SD. It is thus no surprise that the larger sample leads to a much greater power of detecting nonzero zygotic associations. The estimated powers for the cases of no zygotic associations in Table 4 are close to 0.05, as expected, because a 5% significance level is used to reject these null hypotheses. To further explore the effect of sample size on the power, we calculate the powers of detecting three-locus associations in the presence of two-locus associations (i) ω12 = ω13 = ω23 = 0.0213 and (ii) ω12 = 0.0226, ω13 = 0.0263, and ω23 = 0.1026 (cf. Table 1) for sample sizes of 30, 100, 300, 500, and 1000 (the three loci are indexed as j = 1, l = 2, and o = 3). The critical value with a 5% significance level, c0.05, which determines the rejection region for the hypothesis H0: ω123 = 0, is

Thus, the power (the probability of rejecting the false H0) is given by

where Φ(x) is the cumulative distribution function of the standard normal variate x. The results of the power calculations are displayed in Figure 1. The power is very small when zygotic associations are close to zero and when sample sizes are small (<100). These results corroborate those of BROWN (1975) and THOMPSON et al. (1988) on detecting gametic disequilibria at two or three loci. On the other hand, BROWN et al. (1980) and YANG (2000) have concluded that the multilocus association in the variance of the number of heterozygous loci (σ²K) is detectable in a sample of moderate size (>=30). However, the magnitude of such association in σ²K may be appreciably larger than an individual association examined here because it is the sum of gametic disequilibria or zygotic associations between all pairs of loci.



 
Figure 1. Power to detect three-locus zygotic associations with samples of sizes n = 30 (♦), n = 100 (◊), n = 300 (•), n = 500 (○), and n = 1000 (▲) for two cases: (A) when heterozygosities at three loci are H1 = H2 = 0.05 and H3 = 0.1, and three pairwise zygotic associations are ω12 = ω13 = ω23 = 0.0213; (B) when heterozygosities at three loci are H1 = 0.1, H2 = 0.3, and H3 = 0.5, and three pairwise zygotic associations are ω12 = 0.0226, ω13 = 0.0263, and ω23 = 0.1026.

It is also evident from Figure 1 that low heterozygosities at individual loci cause a very strong asymmetry between positive and negative associations. The unbalanced intensities of associations on the positive and negative sides result in unequal powers unless the sample size is very large. Because of the asymmetry, LEWONTIN's (1964) normalized associations, as often used in the literature (e.g., HEDRICK 1987; ZAPATA 2000), may give a false impression about the intensities of multilocus associations. For example, BROWN (1975) showed in his Table VII that, with the same amount of normalized three-locus gametic disequilibrium at both sides , there is a substantial difference in sample size requirements. In the case of gene frequencies equal to 0.2 and two two-locus gametic disequilibria being -0.4 with the third one being zero, BROWN (1975) found that a sample size of n = 8402 is required to detect with the power of 0.9, but only n = 82 is needed to detect with the same amount of power. Had he not given the range of T (-0.0016 to 0.0224), one would be led to believe that the negative disequilibrium is much more difficult to detect than its positive counterpart. The reverse conclusion would be drawn in cases where the asymmetry is skewed toward the negative side. In fact, it is the actual, not the normalized, association that determines the power and sample size requirement, regardless of whether the association is positive or negative. Thus, the use of normalized measures for such purposes should be treated with caution.


*  DISCUSSION

This article describes measures of zygotic associations at more than two loci and their estimation with samples from diploid populations. These measures are defined as departures of joint zygotic frequencies from the values expected with zero zygotic associations (cf. Equation 2 and Equation 5). This is very similar to the definition of gametic disequilibria for two or more loci, which is based on gametic and allelic frequencies (e.g., BENNETT 1954). Thus, it is of little surprise that the measures of multilocus zygotic associations share most of the statistical properties of the usual gametic disequilibria. However, the meanings of the two sets of measures are quite different. In fact, a comparative assessment of zygotic associations vs. gametic disequilibria may provide some important insights into the adaptive significance of genotypes at different loci. For example, if strong zygotic association but little gametic disequilibrium between a pair of loci is observed, then the study population may be undergoing natural selection favoring highly heterozygous individuals without distinguishing among different homozygotes in large and predominantly outcrossing populations (MITTON 1997). The assessment would be most sensitive with quantitative trait loci (QTL) that directly affect components of fitness. However, a lack of zygotic associations may also mean that selection discriminates among different homozygotes (e.g., favoring common homozygotes, but selecting against rare homozygotes). Thus, extra care is needed to choose homozygous QTL with similar selection advantages for such an analysis.

There are a variety of methods for estimating and interpreting multilocus gametic disequilibria from haploid data or diploid data from a Hardy-Weinberg equilibrium population (e.g., BENNETT 1954; BROWN et al. 1980; BARTON 2000). In contrast, with diploid data from a Hardy-Weinberg disequilibrium population, a complete characterization of multilocus associations also requires other types of genic disequilibria (COCKERHAM and WEIR 1973; WEIR 1979). However, the exceedingly large number of genic disequilibria encountered for multiple alleles at many loci makes such detailed characterization difficult for comparing multilocus organizations among several populations. The multilocus zygotic associations analyzed here summarize different genic disequilibria with no need to consider whether or not the study population is in Hardy-Weinberg equilibrium. The estimation and hypothesis testing are quite straightforward as they are merely the direct adoption of the procedures used for diallelic haploid data. Thus, our method presents a simple solution to the analysis of complex multilocus structures in diploid populations.

Of course, such a highly compacted summary in the multilocus zygotic associations represents a severe loss of information. In particular, since the analysis is based on the frequencies of zygote classes, it completely ignores haplotype information such as linkages between different loci. Thus, when significant zygotic associations are detected, there is a need to determine which genic disequilibria are important. In light of great current interest in the linkage (gametic) disequilibrium approach to fine-scale QTL mapping (e.g., PRITCHARD and PRZEWORSKI 2001; REICH et al. 2001), it is essential to determine if gametic disequilibrium is important in the presence of significant zygotic associations. As shown earlier, if the study population is in Hardy-Weinberg equilibrium, then there are direct relationships between zygotic associations and various orders of gametic disequilibria (cf. Equation 4b and Equation 11). In this case, it is definitely more informative to work directly with the raw genotypic data instead of the collapsed data based on zygote classes so that haplotype frequencies and gametic disequilibrium can be inferred. However, in the presence of Hardy-Weinberg disequilibrium, which may often be the case in natural populations, gametic disequilibrium may be inflated because many other types of nonallelic disequilibria may also cause the multilocus associations. Knowledge about the inflation may be gained through the comparative assessment of gametic vs. zygotic associations mentioned above.

In estimating and testing for multilocus zygotic associations, we adopt BENNETT's (1954) additive approach, in which the frequencies of the different zygote classes are expressed as linear functions of the zygotic associations and heterozygosities (Table 3). This approach enables us to give explicit estimates and to elucidate their sampling properties. However, our tests for two- and three-locus associations are not independent, as shown by the simulation results (Table 3). HILL (1975) discussed the use of the multiplicative approach (log-linear model analysis) to develop an independent test for no three-locus association in the presence of two-locus associations. Another possibility is the exact test suggested by ZAYKIN et al. (1995). In the exact test, the probability of the observed multilocus genotypic (zygotic) array, conditional on the genotypic arrays expected under an appropriate hypothesis of zero zygotic association, is evaluated to determine whether it lies in the tail of the empirical distribution generated by permutation. For example, the conditional probability required for testing whether ω_{123} = 0, given the presence of all three two-locus associations, is given by

where n_{ab+}, n_{a+c}, and n_{+bc} are the marginal total counts of the abth, acth, and bcth classes of zygotes at locus pairs 12, 13, and 23, respectively. However, neither log-linear model analysis nor the exact test allows for the explicit expression of the multilocus zygotic associations.
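The permutation idea behind such tests can be sketched for the simpler two-locus case, where shuffling the heterozygosity indicators at one locus keeps both single-locus margins fixed while breaking any association between loci. This is a Monte Carlo approximation under assumed 0/1 coding (1 = heterozygote), with hypothetical function names; it is not the three-locus conditional test described above, which would require sampling arrays with all two-locus margins held fixed.

```python
import numpy as np

def omega2(h1, h2):
    # two-locus association: double-heterozygote frequency minus product of heterozygosities
    return np.mean(h1 * h2) - np.mean(h1) * np.mean(h2)

def permutation_test_two_locus(h1, h2, n_perm=2000, seed=0):
    """Monte Carlo permutation test of zero two-locus zygotic association.

    h1, h2 : equal-length sequences of 0/1 heterozygosity indicators.
    Returns the observed association and a two-sided permutation p-value.
    """
    rng = np.random.default_rng(seed)
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    if h1.shape != h2.shape:
        raise ValueError("h1 and h2 must have the same length")

    observed = omega2(h1, h2)
    exceed = 0
    for _ in range(n_perm):
        # shuffling one locus preserves its heterozygosity but removes association
        permuted = omega2(h1, rng.permutation(h2))
        if abs(permuted) >= abs(observed) - 1e-12:
            exceed += 1

    # add-one correction so the estimated p-value is never exactly zero
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value
```

As with the exact test, this procedure yields a significance level but not a decomposition of the association into its components, which is the limitation noted above.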


*  ACKNOWLEDGMENTS

I thank Dr. Yun-Xin Fu and a reviewer for helpful comments. This research was partially supported by the Natural Sciences and Engineering Research Council of Canada grant OGP0183983.

Manuscript received April 25, 2001; Accepted for publication February 19, 2002.


*  LITERATURE CITED

ALLARD, R. W., S. K. JAIN, and P. L. WORKMAN, 1968 The genetics of inbreeding populations. Adv. Genet. 14:55-131.

BARTLETT, M. S., 1935 Contingency table interactions. J. R. Stat. Soc. Suppl. 2:248-252.

BARTON, N. H., 2000 Estimating multilocus linkage disequilibria. Heredity 84:373-389.

BARTON, N. H., and K. S. GALE, 1993 Genetic analysis of hybrid zones, pp. 13–45 in Hybrid Zones and the Evolutionary Process, edited by R. G. HARRISON. Oxford University Press, New York.

BENNETT, J. H., 1954 On the theory of random mating. Ann. Eugen. 18:311-317.

BENNETT, J. H. and F. E. BINET, 1956 Association between Mendelian factors with mixed selfing and random mating. Heredity 10:51-55.

BROWN, A. H. D., 1975 Sample sizes required to detect linkage disequilibrium between two or three loci. Theor. Popul. Biol. 8:184-201.

BROWN, A. H. D., M. W. FELDMAN, and E. NEVO, 1980 Multilocus structure of natural populations of Hordeum spontaneum. Genetics 96:523-536.

CHARLESWORTH, D., 1991 The apparent selection on neutral marker loci in partially inbreeding populations. Genet. Res. 57:159-175.

COCKERHAM, C. C. and H. TACHIDA, 1986 Linkage disequilibria in finite populations. Theor. Popul. Biol. 29:293-311.

COCKERHAM, C. C. and B. S. WEIR, 1973 Descent measures for two loci with some applications. Theor. Popul. Biol. 4:300-330.

FISHER, R. A., 1954 Statistical Methods for Research Workers, Ed. 12. Oliver & Boyd, London.

HALDANE, J. B. S., 1949 The association of characters as a result of inbreeding and linkage. Ann. Eugen. 15:15-23.

HEDRICK, P. W., 1987 Gametic disequilibrium measures: proceed with caution. Genetics 117:331-341.

HILL, W. G., 1975 Tests for association of gene frequencies at several loci in random mating diploid populations. Biometrics 31:881-888.

LEWONTIN, R. C., 1964 The interaction of selection and linkage. I. General considerations; heterotic models. Genetics 49:49-67.

MITTON, J. B., 1997 Selection in Natural Populations. Oxford University Press, Oxford.

OHTA, T. and C. C. COCKERHAM, 1974 Detrimental genes with partial selfing and effects on a neutral locus. Genet. Res. 23:191-200.

PRITCHARD, J. K. and M. PRZEWORSKI, 2001 Linkage disequilibrium in humans: model and data. Am. J. Hum. Genet. 69:1-14.

REICH, D. E., M. CARGILL, S. BOLK, J. IRELAND, and P. C. SABETI et al., 2001 Linkage disequilibrium in the human genome. Nature 411:199-204.

THOMPSON, E. A., S. DEEB, D. WALKER, and A. G. MOTULSKY, 1988 The detection of linkage disequilibrium between closely linked markers: RFLPs at the AI-CIII apolipoprotein genes. Am. J. Hum. Genet. 42:113-124.

THOMSON, G. and M. P. BAUR, 1984 Third order linkage disequilibrium. Tissue Antigens 24:250-255.

WEIR, B. S., 1979 Inferences about linkage disequilibrium. Biometrics 35:235-254.

WEIR, B. S., 1996 Genetic Data Analysis II. Sinauer Associates, Sunderland, MA.

WEIR, B. S. and C. C. COCKERHAM, 1973 Mixed self and random mating at two loci. Genet. Res. 21:247-262.

YANG, R.-C., 2000 Zygotic associations and multilocus statistics in a nonequilibrium diploid population. Genetics 155:1449-1458.

ZAPATA, C., 2000 The D' measure of overall gametic disequilibrium between pairs of multiallelic loci. Evolution 54:1809-1812.

ZAYKIN, D., L. ZHIVOTOVSKY, and B. S. WEIR, 1995 Exact tests for association between alleles at arbitrary numbers of loci. Genetica 96:169-178.



