Genetics, Vol. 155, 1449-1458, July 2000, Copyright © 2000

Zygotic Associations and Multilocus Statistics in a Nonequilibrium Diploid Population

Rong-Cai Yang
Research Division, Alberta Agriculture, Food and Rural Development, Edmonton, Alberta T6H 5T6, Canada and Department of Renewable Resources, University of Alberta, Edmonton, Alberta T6G 2H1, Canada

Corresponding author: Rong-Cai Yang, Alberta Agriculture, Food and Rural Development, #202, 7000 - 113 St., Edmonton, Alberta T6H 5T6, Canada. E-mail: rongcai.yang@agric.gov.ab.ca

Communicating editor: A. H. D. BROWN


*  ABSTRACT

The usual approach to characterizing and estimating multilocus associations in a diploid population assumes that the population is in Hardy-Weinberg equilibrium. The purpose of this study is to develop a set of summary statistics that can be used to characterize and estimate the multilocus associations in a nonequilibrium population. The concept of "zygotic associations" is first expanded to facilitate the development. The summary statistics are calculated using the distribution of a random variable, the number of heterozygous loci ($K$) found in diploid individuals in the population. In particular, the variance of $K$ consists of single-locus and multilocus components with the latter being the sum of zygotic associations between pairs of loci. Simulation results show that the multilocus associations in the variance of $K$ are detectable in a sample of moderate size (≥30) when the sum of all pairwise zygotic associations is greater than zero and when gene frequency is intermediate. The method presented here is a generalization of the well-known development for the Hardy-Weinberg equilibrium population and thus may be of more general use in elucidating the multilocus organizations in nonequilibrium and equilibrium populations.


The extent and patterns of nonrandom associations between linked as well as independent loci provide important information about the history of a population, the evolutionary forces governing these loci, and the location of the loci on the chromosomes. Such multilocus associations may arise from many demographic and evolutionary events including epistatic selection, random drift due to population growth and decline, mixing of two or more distinct gene pools, nonrandom mating, and mutation, regardless of whether or not the loci are physically linked (e.g., HEDRICK et al. 1978; BROWN 1979; BARTON and CLARK 1990).

A number of statistical measures have been proposed to characterize the multilocus associations, but the literature has focused on characterizing gametic disequilibria, i.e., nonrandom associations of alleles at two loci ordered within gametes (e.g., HEDRICK 1987). While these measures are useful for analyzing haploid data or diploid data from a Hardy-Weinberg equilibrium population, they may not be appropriate for a nonequilibrium diploid population in which a complete characterization of two-locus associations also requires other types of disequilibria (COCKERHAM and WEIR 1973; WEIR 1979). For example, in a hybrid population arising from mixing of genes from two or more populations or species, alleles derived from the same population or species tend to cluster together in the same individuals, because of the WAHLUND (1928) effect, because of strong selection against hybrids, or both. The resulting Hardy-Weinberg disequilibria at individual loci and multilocus associations across loci may be persistent and may be detectable for a number of generations after an initial mixing of gene pools. Thus, the multilocus associations in the hybrid population need to be characterized at the zygote level.

A related issue in characterizing and testing the multilocus associations is that most of the proposed measures are defined for a pair of loci only. When there are a large number of loci, each having many alleles, the pairwise measures may be too numerous to manage and interpret readily. For example, for 20 loci, each with four alleles in a nonequilibrium population, there are 6 independent Hardy-Weinberg disequilibria for each of the 20 loci, and 9 gametic disequilibria, 9 nongametic disequilibria, 54 trigenic disequilibria, and 45 quadrigenic disequilibria for each of 190 locus pairs. Furthermore, unless a stringent significance level is imposed, the large number of required pairwise tests under commonly used significance levels of 5 and 1% may produce spurious associations (KARLIN and PIAZZA 1981; WEIR 1996, pp. 133–135). Therefore, it is desirable to have a set of summary statistics that adequately describe the extent and patterns of multilocus structure in a nonequilibrium population.

The objective of this study is to develop such a set of summary statistics. The concept of "zygotic associations" (HALDANE 1949; BENNETT and BINET 1956; ALLARD et al. 1968) is first expanded to facilitate the development. The summary statistics are calculated using the distribution of a random variable, the number of heterozygous loci ($K$) found in diploid individuals in the population. A similar method by BROWN et al. (1980) has been used to analyze multilocus data collected from haploid, inbred, or random mating populations (e.g., BROWN et al. 1980; WHITTAM et al. 1983; NEVO and BEILES 1989; MAYNARD SMITH et al. 1993; YEH et al. 1994; HAUBOLD et al. 1998), but it considers only gametic disequilibrium. Numerical analyses are also carried out to depict the dependence of the zygotic associations on gene frequencies and various disequilibria and to examine the sensitivity of our method for detecting the multilocus zygotic associations.


*  ZYGOTIC ASSOCIATIONS

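Let us consider a diploid population in which individual genotypes are known at each of $m$ loci. Two of these $m$ loci are indexed by $j$ and $l$, with alleles $j_u$, $u = 1, 2, \ldots, r$, and $l_y$, $y = 1, 2, \ldots, s$, respectively. Frequencies of genotypes at loci $j$ and $l$ formed by the union of gametes $j_u l_y$ and $j_v l_z$ are written as ${}_{jl}P_{uyvz} = {}_{jl}P_{vzuy}$. WEIR (1979) described various marginal totals that are sums of genotypic frequencies, with dots indicating the indices summed. For example, one-locus genotypic frequencies for $j_u j_v$ and $l_y l_z$ are denoted by

$$ {}_{j}P_{u.v.} = \sum_{y=1}^{s}\sum_{z=1}^{s} {}_{jl}P_{uyvz}, \qquad {}_{l}P_{.y.z} = \sum_{u=1}^{r}\sum_{v=1}^{r} {}_{jl}P_{uyvz}, $$

and frequencies of alleles $j_u$ and $l_y$ are given by

$$ {}_{j}p_{u} = \sum_{v=1}^{r} {}_{j}P_{u.v.}, \qquad {}_{l}p_{y} = \sum_{z=1}^{s} {}_{l}P_{.y.z}. $$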

Following BENNETT and BINET (1956) and ALLARD et al. (1968), we now define a zygotic association between loci $j$ and $l$ as a deviation of joint frequencies of double heterozygotes from products of frequencies of heterozygotes at the two loci:

$$ {}_{jl}\omega_{uyvz} = {}_{jl}P_{uyvz} - {}_{j}P_{u.v.}\,{}_{l}P_{.y.z}, \qquad u \neq v,\ y \neq z. \tag{1} $$

The other three zygotic associations, ${}_{jl}\omega_{uyuz}$, ${}_{jl}\omega_{uyvy}$, and ${}_{jl}\omega_{uyuy}$, can be similarly defined by substituting appropriate allele indexes in (1). It is easy to find the ranges of these zygotic associations. For example, the range of ${}_{jl}\omega_{uyvz}$ is

(2a)

This dependence of the zygotic association on the marginal frequencies at single loci suggests a need to normalize the zygotic association ${}_{jl}\omega_{uyvz}$,

(2b)

which is analogous to LEWONTIN's (1964) normalized gametic disequilibrium.

When summing over all alleles at loci $j$ and $l$, we obtain an overall measure of zygotic associations ($\omega_{jl}$) and the following relations:

(3)

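Thus, the sum $\sum_{u=1}^{r}\sum_{v=1}^{r}\sum_{y=1}^{s}\sum_{z=1}^{s} {}_{jl}P_{uyvz} = 1$ can be expanded into four classes of genotypic frequencies: (i) frequency of being homozygous at both loci; (ii) homozygous at locus $j$ and heterozygous at locus $l$; (iii) heterozygous at locus $j$ and homozygous at locus $l$; and (iv) heterozygous at both loci,

$$ 1 = [(1 - H_j)(1 - H_l) + \omega_{jl}] + [(1 - H_j)H_l - \omega_{jl}] + [H_j(1 - H_l) - \omega_{jl}] + [H_j H_l + \omega_{jl}], \tag{4} $$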

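where $H_j$ and $H_l$ are the population heterozygosities at loci $j$ and $l$,

$$ H_j = h_j - \sum_{u=1}^{r} {}_{j}D_{u.u.}, \qquad H_l = h_l - \sum_{y=1}^{s} {}_{l}D_{.y.y}, \tag{5} $$

with $h_j$ ($= 1 - \sum_{u=1}^{r} {}_{j}p_u^2$) and ${}_{j}D_{u.u.}$ ($= -\sum_{v \neq u} {}_{j}D_{u.v.}$), for example, being the gene diversity (or expected heterozygosity under Hardy-Weinberg equilibrium) and Hardy-Weinberg disequilibrium for allele $u$ at locus $j$, respectively.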


*  MULTILOCUS HETEROZYGOSITY

Number of heterozygous loci (K):
When a diploid individual is randomly taken from the population (defined above), it can be either homozygous or heterozygous at a given locus. If all $m$ loci are evaluated, then the random variable $K$ is simply the number of heterozygous loci found in the randomly chosen diploid individual from the population. Thus, $K$ is the sum of $m$ indicator variables, $K = \sum_{j=1}^{m} X_j$, where $X_j$ takes either 1 or 0, depending on whether the $j$th locus is heterozygous or homozygous. The probability that this locus is heterozygous is $H_j$, the population heterozygosity at the $j$th locus, and the probability that it is homozygous is $1 - H_j$. $K$ can take any integer value from 0 to $m$. If $K = 0$, then all $m$ loci are homozygous; if, on the other hand, $K = m$, then all $m$ loci are heterozygous.
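As a minimal sketch of this definition, the following Python snippet scores the indicator variables and their sum for one individual. The genotype encoding (a pair of allele labels per locus) and the function names are illustrative assumptions, not part of the original method description.

```python
from typing import List, Tuple

# Assumed encoding: a diploid genotype at one locus is a pair of allele labels.
Genotype = Tuple[int, int]


def heterozygosity_indicators(individual: List[Genotype]) -> List[int]:
    """Return X_j for each locus: 1 if heterozygous, 0 if homozygous."""
    return [1 if a != b else 0 for a, b in individual]


def count_heterozygous_loci(individual: List[Genotype]) -> int:
    """K is the sum of the m indicator variables X_j for one individual."""
    return sum(heterozygosity_indicators(individual))


# Hypothetical individual scored at m = 4 loci.
example = [(1, 1), (1, 2), (2, 1), (3, 3)]
print(heterozygosity_indicators(example))  # [0, 1, 1, 0]
print(count_heterozygous_loci(example))    # 2
```

Only the heterozygous or homozygous state of each locus enters $K$, so the number of alleles per locus does not matter for this step.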

Moments of K:
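The expected value of $K$ is

$$ E(K) = \sum_{j=1}^{m} E(X_j) = \sum_{j=1}^{m} H_j, \tag{6} $$

and the second to fourth central moments are given by, letting $x_j = X_j - E(X_j)$,

$$ \sigma^2_K = E\Big[\Big(\sum_{j=1}^{m} x_j\Big)^2\Big] = \sum_{j=1}^{m} E(x_j^2) + 2\sum_{j<l} E(x_j x_l), \tag{7a} $$

$$ E\{[K - E(K)]^3\} = \sum_{j=1}^{m} E(x_j^3) + 3\sum_{j \neq l} E(x_j^2 x_l) + 6\sum_{j<l<k} E(x_j x_l x_k), \tag{7b} $$

and

$$ E\{[K - E(K)]^4\} = \sum_{j=1}^{m} E(x_j^4) + 4\sum_{j \neq l} E(x_j^3 x_l) + 6\sum_{j<l} E(x_j^2 x_l^2) + 12\sum_{j}\sum_{\substack{l<k \\ l,k \neq j}} E(x_j^2 x_l x_k) + 24\sum_{j<l<k<q} E(x_j x_l x_k x_q), \tag{7c} $$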

where, for example, $E(x_j^2 x_l)$ is the $\{21\}$th central mixed moment of variables $X_j$ and $X_l$ for loci $j$ and $l$ (ELANDT-JOHNSON 1971, pp. 106–107). It is evident from (7a)–(7c) that evaluating the $i$th central moment of $K$ requires a specification of joint genotypic frequencies for $i$ loci, which include various associations for genes at up to $i$ loci. For example, the variance (second central moment) of $K$ is a function of single-locus heterozygosities and two-locus associations only and is independent of higher-order associations involving three or more loci. Similar arguments can be carried out for the third or higher central moments of $K$. If there is complete interlocus independence, (7a)–(7c) reduce to (3)–(5) of BROWN et al. (1980), but we use heterozygosity $\{H_j\}$ instead of gene diversity $\{h_j\}$ to measure genetic variation at individual loci. When the population is in Hardy-Weinberg equilibrium, the heterozygosity equals the gene diversity (cf. Equation 5).

Variance of K:
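The variance of $K$ as given in (7a) has two components, one being the sum of variances at individual loci and the other being the sum of covariances between pairs of loci,

$$ \sigma^2_K = \sum_{j=1}^{m} \mathrm{Var}(X_j) + 2\sum_{j<l} \mathrm{Cov}(X_j, X_l), \tag{8} $$

where $\mathrm{Var}(X_j) = H_j - H_j^2$ and $\mathrm{Cov}(X_j, X_l) = \omega_{jl}$ as computed using the joint probability distribution between loci $j$ and $l$ (Table 1). Thus,

$$ \sigma^2_K = \sum_{j=1}^{m} (H_j - H_j^2) + 2\sum_{j<l} \omega_{jl}. \tag{9a} $$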
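The decomposition of the variance of $K$ into single-locus variances plus twice the zygotic association can be checked numerically for two loci. The sketch below assumes the joint distribution of the two indicators takes the form implied by $\mathrm{Cov}(X_j, X_l) = \omega_{jl}$ with marginals $H_j$ and $H_l$; the input values are hypothetical and chosen only for illustration.

```python
from typing import Dict, Tuple


def joint_distribution(h_j: float, h_l: float, omega: float) -> Dict[Tuple[int, int], float]:
    """Joint frequencies of (X_j, X_l) with marginals H_j, H_l and covariance omega."""
    return {
        (0, 0): (1 - h_j) * (1 - h_l) + omega,
        (0, 1): (1 - h_j) * h_l - omega,
        (1, 0): h_j * (1 - h_l) - omega,
        (1, 1): h_j * h_l + omega,
    }


def variance_of_k_by_enumeration(dist: Dict[Tuple[int, int], float]) -> float:
    """Var(K) computed directly from the joint distribution, with K = X_j + X_l."""
    mean = sum((xj + xl) * p for (xj, xl), p in dist.items())
    return sum(((xj + xl) - mean) ** 2 * p for (xj, xl), p in dist.items())


def variance_of_k_by_components(h_j: float, h_l: float, omega: float) -> float:
    """Var(K) as single-locus variances plus twice the zygotic association."""
    return h_j * (1 - h_j) + h_l * (1 - h_l) + 2 * omega


# Hypothetical values: H_j = 0.4, H_l = 0.3, omega_jl = 0.05.
dist = joint_distribution(0.4, 0.3, 0.05)
print(round(variance_of_k_by_enumeration(dist), 10))
print(round(variance_of_k_by_components(0.4, 0.3, 0.05), 10))
```

Both routes should give the same number, which illustrates why the multilocus component of the variance involves only pairwise associations.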


 
Table 1. Joint frequency distribution of indicator variables $X_j$ and $X_l$ in terms of heterozygosities ($H_j$ and $H_l$) and zygotic associations ($\omega_{jl}$) at loci $j$ and $l$

It is evident from (1) and (3) that $\omega_{jl} = \sum_{u=1}^{r}\sum_{y=1}^{s} [{}_{jl}P_{uyuy} - {}_{j}P_{u.u.}\,{}_{l}P_{.y.y}]$, for example. Following COCKERHAM and WEIR (1973) and WEIR (1979), the two-locus frequencies $\{{}_{jl}P_{uyuy}\}$ are expressed in terms of gene frequencies and various genic disequilibria. Given these results and those in (5) for $\{H_j\}$, $\sigma^2_K$ in (9a) can be rewritten as

(9b)

where each genic disequilibrium ($D$) is the deviation of a frequency from that based on random association of genes and accounting for any lower-order disequilibria. Definitions and properties of these disequilibria are detailed in many places (e.g., WEIR 1979). Here it suffices to recognize that there are five types of disequilibria: (i) single-locus digenic disequilibria (i.e., Hardy-Weinberg disequilibria, ${}_{j}D_{u.u.}$ and ${}_{l}D_{.y.y}$); (ii) two-locus digenic disequilibria for gametic genes (i.e., gametic disequilibria, ${}_{jl}D_{uy..}$); (iii) two-locus digenic disequilibria for nongametic genes (i.e., nongametic disequilibria, ${}_{jl}D_{u..y}$); (iv) trigenic disequilibria (${}_{jl}D_{uyu.}$ and ${}_{jl}D_{uy.y}$); and (v) quadrigenic disequilibria (${}_{jl}D_{uyuy}$).

Table 2 lists six special cases of $\sigma^2_K$ as given in (9a) or (9b). The first two cases assume that there are no zygotic associations between pairs of loci for all $m$ loci ($\sum_{j<l} \omega_{jl} = 0$), but case 1 further assumes Hardy-Weinberg equilibrium in the population. When genotypes (zygotes) result from random union of gametes, all nongametic disequilibria including Hardy-Weinberg disequilibria at all loci disappear (e.g., ${}_{j}D_{u.u.} = {}_{jl}D_{u..y} = {}_{jl}D_{uy.y} = {}_{jl}D_{uyuy} = 0$). This leads to $\sigma^2_{K(3)}$ as given in case 3. $\sigma^2_{K(3)}$ was previously derived (cf. Equation 15 of BROWN et al. 1980). Case 4 states a well-established fact that nonzero quadrigenic disequilibria occur under Hardy-Weinberg disequilibrium, even in a population that is in gametic equilibrium (e.g., HALDANE 1949; BENNETT and BINET 1956; WEIR and COCKERHAM 1973).


 
Table 2. Single-locus and multilocus components of variance of $K$, $\sigma^2_K$, under six special cases

The last two cases in Table 2 are not directly obtainable from (9a) or (9b), but rather serve to illustrate the difficulty of finding the maximum value of $\sigma^2_K$ because the upper bound for ${}_{jl}\omega_{uyvz}$ in (2a) is not unique. Case 5 portrays a scenario where all $m$ loci are absolutely associated (CLEGG et al. 1976). The final case constructs a population of hypothetical multilocus zygotes with maximum variance of heterozygosity by ranking the $\{H_j\}$ such that $H_1 > H_2 > H_3 > \cdots > H_m$. Similar expressions for these two cases were given by BROWN et al. (1980) and BROWN and BURDON (1983) for haploid and random mating populations.


*  NUMERICAL ANALYSIS

Relationships between zygotic associations and genic disequilibria:
It is evident from (9a) and (9b) that the overall measure of zygotic associations between a pair of loci is a complex function of gametic, nongametic, trigenic, and quadrigenic disequilibria weighted appropriately by gene frequencies. The range of values for each of these disequilibria is defined by gene frequencies and disequilibria of lower orders. To further explore such intricate interrelationships among zygotic associations, gene frequencies, and various genic disequilibria, numerical calculations are carried out. For simplicity, let us assume that there are two alleles (1 and 2) at each of the two loci. Frequencies of the ten possible genotypes are denoted as $P_{1111}$, $P_{1112}$, $P_{1212}$, $P_{1121}$, $P_{1122}$, $P_{1221}$, $P_{1222}$, $P_{2121}$, $P_{2122}$, and $P_{2222}$, dropping the identifiers for the two loci. These genotypic frequencies are grouped into four classes ($f_{00}$, $f_{01}$, $f_{10}$, and $f_{11}$) based on whether genotypes at individual loci are homozygous or heterozygous (Table 3). The marginal totals for the individual loci are, respectively, $f_{0.} = f_{00} + f_{01}$, $f_{1.} = f_{10} + f_{11}$, $f_{.0} = f_{00} + f_{10}$, and $f_{.1} = f_{01} + f_{11}$. Thus, the overall measure of zygotic associations ($\omega$) can be calculated using the relations given in Table 1. To gauge the relationships between zygotic associations, gene frequencies, and various disequilibria, the two-locus genotypic frequencies are expressed in terms of disequilibrium functions (cf. Table 6.1 of WEIR and COCKERHAM 1989). All types of disequilibria except for Hardy-Weinberg disequilibria affect the zygotic associations because they are genic disequilibria between the two loci.
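The grouping of the ten genotypes into the four classes can be sketched as follows. The classification rule is an assumption read off the index convention used for the two-locus frequencies (first and third indices are the alleles at locus $j$, second and fourth those at locus $l$), and the example frequencies are hypothetical, generated from random union of gametes with frequencies 0.4, 0.1, 0.1, and 0.4.

```python
from typing import Dict


def heterozygosity_classes(genotype_freqs: Dict[str, float]) -> Dict[str, float]:
    """Group two-locus genotype frequencies into f00, f01, f10, f11.

    Keys are index strings 'uyvz'; locus j is heterozygous when u != v and
    locus l is heterozygous when y != z (assumed convention).
    """
    classes = {"f00": 0.0, "f01": 0.0, "f10": 0.0, "f11": 0.0}
    for code, freq in genotype_freqs.items():
        x_j = 1 if code[0] != code[2] else 0
        x_l = 1 if code[1] != code[3] else 0
        classes[f"f{x_j}{x_l}"] += freq
    return classes


def overall_zygotic_association(genotype_freqs: Dict[str, float]) -> float:
    """omega = f11 - f1. * f.1, the covariance of the two indicators."""
    f = heterozygosity_classes(genotype_freqs)
    f1_dot = f["f10"] + f["f11"]   # heterozygous at locus j
    f_dot1 = f["f01"] + f["f11"]   # heterozygous at locus l
    return f["f11"] - f1_dot * f_dot1


# Hypothetical genotype frequencies (they sum to 1).
example = {
    "1111": 0.16, "1112": 0.08, "1212": 0.01, "1121": 0.08, "1122": 0.32,
    "1221": 0.02, "1222": 0.08, "2121": 0.01, "2122": 0.08, "2222": 0.16,
}
print(heterozygosity_classes(example))
print(round(overall_zygotic_association(example), 6))
```

With these inputs both single-locus heterozygosities are 0.5 and the double-heterozygote class exceeds their product, so the association is positive.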


 
Table 3. Frequencies of homozygotes and heterozygotes in terms of frequencies of 10 possible genotypes at two loci (j and l)


 
Table 4. Joint frequencies of nine genotypes at loci j and l in terms of their single-locus genotypic frequencies and zygotic associations


 
Table 5. Mean, standard deviation, skewness, and kurtosis of $s^2_K$ under zero zygotic association for two gene frequencies ($p$) and three Hardy-Weinberg disequilibria ($D$)


 
Table 6. Properties of sample variance of $K$, $s^2_K$, and its use to detect zygotic association $\omega_{1111}$ with two gene frequencies ($p$), three Hardy-Weinberg disequilibria ($D$), and three zygotic associations ($\omega_{1111}$), as estimated from 10,000 samples of size $n = 30$

We examine the effects of three genic disequilibria (gametic, trigenic, and quadrigenic disequilibria) on the distribution of zygotic associations. Since we assume equal gene frequencies ($p$) at both loci, the nongametic disequilibrium and gametic disequilibrium are equal, and so are the two trigenic disequilibria. To illustrate the three-way relationship, the effect of gene frequencies and gametic disequilibria on zygotic associations is depicted in Figure 1. In this case, the zygotic association is $\omega = 2(1 - 2p)^2 D + 4D^2$, where $D$ ($= D_{11..} = -D_{12..} = -D_{21..} = D_{22..}$) is the gametic disequilibrium. The maximum zygotic association ($\omega = 0.25$) is obtained at $p = 0.5$ and $D = \pm 0.25$, but while $\omega$ always increases with $D > 0$, it can be negative with $D < 0$ for some gene frequencies, as shown in Figure 1. The zygotic association is affected little by trigenic disequilibria, but increases with positive and decreases with negative quadrigenic disequilibria (the 3D plots for trigenic and quadrigenic disequilibria are not presented).
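The closed-form expression for the zygotic association under gametic disequilibrium alone can be cross-checked against a direct calculation. The sketch below assumes random union of gametes with equal gene frequency $p$ at both loci, so that the double-heterozygote frequency is twice the sum of the coupling and repulsion gamete products; the function names are illustrative.

```python
def zygotic_association_closed_form(p: float, d: float) -> float:
    """omega = 2(1 - 2p)^2 D + 4 D^2."""
    return 2.0 * (1.0 - 2.0 * p) ** 2 * d + 4.0 * d ** 2


def zygotic_association_from_gametes(p: float, d: float) -> float:
    """omega from gamete frequencies, assuming random union of gametes."""
    q = 1.0 - p
    g11, g12, g21, g22 = p * p + d, p * q - d, p * q - d, q * q + d
    double_het = 2.0 * g11 * g22 + 2.0 * g12 * g21  # coupling + repulsion
    h = 2.0 * p * q                                  # heterozygosity at each locus
    return double_het - h * h


for p, d in [(0.5, 0.25), (0.5, -0.25), (0.3, 0.05), (0.1, -0.005)]:
    print(p, d,
          round(zygotic_association_closed_form(p, d), 6),
          round(zygotic_association_from_gametes(p, d), 6))
```

The last pair ($p = 0.1$, $D = -0.005$) gives a negative association, an instance of the sign change noted for negative gametic disequilibrium away from $p = 0.5$.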



Figure 1. Dependence of zygotic associations on gene frequency and gametic disequilibrium.

Estimating zygotic associations from variance of multilocus heterozygosity:
The variance of $K$ in (9a) suggests that the average zygotic association ($\bar{\omega}$) over the $m(m - 1)/2$ pairs of loci may be obtained by

$$ \bar{\omega} = \frac{\sigma^2_K - \sigma^2_{K(2)}}{m(m - 1)}, \tag{10} $$

where $\sigma^2_{K(2)} = \sum_{j=1}^{m}(H_j - H_j^2)$ is for case 2 of Table 2. To estimate $\bar{\omega}$ from a sample of $n$ diploid individuals with $m$ polymorphic loci, one needs to estimate $\sigma^2_K$ and the single-locus heterozygosities, $\{H_j\}$. There are several discussions of procedures for estimating these parameters from a sample taken from a random mating population or haploid population (e.g., BROWN et al. 1980; BROWN and BURDON 1983; CHAKRABORTY 1984). Essentially the same estimation procedure is used in the following simulation study.

The nonequilibrium population for two loci each with two alleles is constructed using the fact that each two-locus genotypic frequency can be written as a sum of the product of single-locus frequencies and its zygotic association (Table 4). For a given gene frequency ($p$) at a locus, Hardy-Weinberg disequilibrium ($D = D_{1.1.} = -D_{1.2.} = -D_{2.1.} = D_{2.2.}$) is bounded by

$$ \max[-p^2, -(1 - p)^2] \le D \le p(1 - p), \tag{11} $$

so that the frequencies of the three genotypes at this locus are completely described by $p$ and $D$: $P_{1.1.} = p^2 + D$, $P_{1.2.} = 2p(1 - p) - 2D$, and $P_{2.2.} = (1 - p)^2 + D$. We simulate three $D$ values: zero and half the maximum and minimum possible values as given in (11). While bounds of nine individual zygotic associations can be computed from the single-locus genotypic frequencies using (2a), we choose to compute only the four associations ($\omega_{1111}$, $\omega_{2121}$, $\omega_{1212}$, and $\omega_{2222}$) since the remaining five ($\omega_{1112}$, $\omega_{1121}$, $\omega_{1122}$, $\omega_{1222}$, and $\omega_{2122}$) are simply functions of those four associations as explained in Table 4. For simplicity, a further assumption in our simulation is that only one zygotic association is present in the population and the other three are zero. Under this assumption, the bounds of these four zygotic associations are
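The single-locus part of this construction can be sketched in a few lines. The bounds used below are the ones implied by requiring all three genotype frequencies to be non-negative; the function names and the choice to raise an error outside the bounds are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def hw_disequilibrium_bounds(p: float) -> Tuple[float, float]:
    """Range of D that keeps P11, P12, and P22 non-negative."""
    lower = max(-p * p, -(1.0 - p) ** 2)
    upper = p * (1.0 - p)
    return lower, upper


def genotype_frequencies(p: float, d: float) -> Dict[str, float]:
    """Single-locus genotype frequencies described by p and D."""
    lower, upper = hw_disequilibrium_bounds(p)
    if not lower <= d <= upper:
        raise ValueError(f"D = {d} is outside [{lower}, {upper}] for p = {p}")
    return {
        "11": p * p + d,
        "12": 2.0 * p * (1.0 - p) - 2.0 * d,
        "22": (1.0 - p) ** 2 + d,
    }


def simulated_d_values(p: float) -> List[float]:
    """Half the minimum, zero, and half the maximum possible D."""
    lower, upper = hw_disequilibrium_bounds(p)
    return [lower / 2.0, 0.0, upper / 2.0]


for p in (0.1, 0.3, 0.5):
    for d in simulated_d_values(p):
        print(p, round(d, 4), genotype_frequencies(p, d))
```

For $p = 0.5$ the three simulated values are $-0.125$, $0$, and $0.125$, matching the values of $D$ quoted later for the power comparison.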

(12)

We simulate three values of zygotic association: zero and half the maximum and minimum possible values as given in (12).

From each of 27 constructed populations [3 gene frequencies ($p$ = 0.1, 0.3, and 0.5) × 3 values of Hardy-Weinberg disequilibrium × 3 values of zygotic association], 10,000 replicate samples of size $n = 30$ or $n = 100$ are drawn. For a sample of $n$ diploid individuals, let $X_{tj}$ be 1 or 0 according to whether the $t$th individual in the sample is heterozygous or homozygous at the $j$th locus. Then the number of heterozygous loci for this individual is $K_t = \sum_{j=1}^{m} X_{tj}$. We compute the sample mean as $\bar{K} = \frac{1}{n}\sum_{t=1}^{n} K_t$ and the sample variance as

$$ s^2_K = \frac{1}{n}\sum_{t=1}^{n} (K_t - \bar{K})^2. \tag{13a} $$

Using various expectations of indicators defined for the sample (WEIR et al. 1990; WEIR 1996, pp. 142–144), it is easily seen that while the sample mean is an unbiased estimator of $E(K)$ [$E(\bar{K}) = E(K)$], the sample variance (13a) is not an unbiased estimator of $\sigma^2_K$, i.e., $E(s^2_K) = [(n - 1)/n]\sigma^2_K$, because we have divided by $n$ rather than the customary $(n - 1)$ in computing (13a). Clearly, the bias should be negligible unless the sample size is very small.

Under the null hypothesis of no zygotic association ($H_0$), we estimate $\sigma^2_{K(2)}$ by computing the sample variance, $s^2_{K(2)}$, as the sum of sample variances for $m$ loci $\{s^2_j\}$,

$$ s^2_{K(2)} = \sum_{j=1}^{m} s^2_j = \sum_{j=1}^{m} \frac{1}{n}\sum_{t=1}^{n} (X_{tj} - \bar{X}_j)^2, \tag{13b} $$

where $\bar{X}_j = \frac{1}{n}\sum_{t=1}^{n} X_{tj}$. While the estimator $s^2_{K(2)}$ in (13b) is slightly biased for the same reason as in computing $s^2_K$, its expectation and sampling variance can be readily calculated by inserting the appropriate results in (7) under interlocus independence (see also Equations 3–5 of BROWN et al. 1980) into the well-known formulas of KENDALL and STUART (1977, Equations 10.8 and 10.9),

(14a)

and

(14b)
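The sample quantities used in the simulation can be sketched as below, assuming the data arrive as an $n \times m$ matrix of 0/1 heterozygosity indicators. Both variances divide by $n$, as described in the text; the estimator of the average zygotic association divides the difference between the two variances by $m(m - 1)$, which is an assumption based on averaging over the $m(m - 1)/2$ locus pairs. The tiny data set is hypothetical.

```python
from typing import List

Matrix = List[List[int]]  # rows = individuals, columns = loci, entries 0 or 1


def sample_variance_of_k(x: Matrix) -> float:
    """s^2_K: variance of K_t across individuals, dividing by n."""
    n = len(x)
    k = [sum(row) for row in x]
    mean_k = sum(k) / n
    return sum((v - mean_k) ** 2 for v in k) / n


def null_variance_of_k(x: Matrix) -> float:
    """s^2_K(2): sum over loci of the single-locus sample variances."""
    n, m = len(x), len(x[0])
    total = 0.0
    for j in range(m):
        mean_j = sum(row[j] for row in x) / n
        total += sum((row[j] - mean_j) ** 2 for row in x) / n
    return total


def average_zygotic_association(x: Matrix) -> float:
    """Estimated mean pairwise association from the two variances."""
    m = len(x[0])
    return (sample_variance_of_k(x) - null_variance_of_k(x)) / (m * (m - 1))


# Hypothetical sample: heterozygosity at the two loci is perfectly associated.
sample = [[1, 1], [1, 1], [0, 0], [0, 0]]
print(sample_variance_of_k(sample))         # 1.0
print(null_variance_of_k(sample))           # 0.5
print(average_zygotic_association(sample))  # 0.25
```

In this example the estimate equals the sample covariance between the two indicators, as expected for $m = 2$.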

Two one-tailed tests are used to determine if the sample variance $s^2_K$ is significantly greater than its expectation under zero zygotic association, $\sigma^2_{K(2)}$. In the first test, assuming that the distribution of $K$ under $H_0$ approximates a normal distribution, the statistic

(15)

has a $\chi^2$ distribution with $n$ d.f., where $n$ is the number of diploid individuals in the sample and $\sigma^2_{K(2)}$ is estimated using (13b). [The chi-square test (15) would have d.f. = $(n - 1)$ if the customary $(n - 1)$ were used to compute $s^2_K$.] The null hypothesis ($H_0$) is rejected if $X^2_{s^2_K}$ exceeds 43.77 or 124.34, the upper-tailed 5% critical value of the $\chi^2$ distribution with d.f. = 30 or d.f. = 100, respectively. MANLY (1985, p. 331) defined a similar statistic for haploid data, but because $s^2_K$ was computed from a sample of $n^2$ "dependent" gamete pairs (comparisons) for $n$ haplotypes (BROWN et al. 1980), the appropriate degrees of freedom for the chi-square test are yet to be determined. Furthermore, HAUBOLD et al. (1998) recently provided a more appropriate formula to estimate $\sigma^2_{K(2)}$ for haploid data with an account of the interdependence between the gamete pairs. In the second test, assuming that the sampling distribution of $s^2_K$ approximates normality, BROWN et al. (1980) suggested a test criterion of rejecting $H_0$ if $s^2_K > L$, the upper-tailed 5% critical value for $s^2_K$. In our simulation, $L$ is estimated by

(16)
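The first test can be illustrated with a short sketch. The form of the statistic used here, $n s^2_K / s^2_{K(2)}$, is an assumption inferred from the stated $n$ degrees of freedom and from the expected values of the statistic reported for the simulations; it is not quoted from the displayed equation. The critical values are the two given in the text, and the input variances are hypothetical.

```python
import math

# Upper-tailed 5% critical values quoted in the text for d.f. = 30 and d.f. = 100.
CRITICAL_VALUES_5PCT = {30: 43.77, 100: 124.34}


def chi_square_statistic(n: int, s2_k: float, s2_k_null: float) -> float:
    """Assumed form of the test statistic: n * s^2_K / s^2_K(2)."""
    return n * s2_k / s2_k_null


def reject_null(n: int, s2_k: float, s2_k_null: float) -> bool:
    """One-tailed test: reject H0 when the statistic exceeds the critical value."""
    return chi_square_statistic(n, s2_k, s2_k_null) > CRITICAL_VALUES_5PCT[n]


# Hypothetical sample values for n = 30 individuals.
print(chi_square_statistic(30, 0.8, 0.5))
print(reject_null(30, 0.8, 0.5))
print(reject_null(30, 0.5, 0.5))

# A chi-square variable with n d.f. has mean n and standard deviation sqrt(2n).
print(round(math.sqrt(2 * 30), 2), round(math.sqrt(2 * 100), 2))  # 7.75 14.14
```

Under the null hypothesis the two variances are expected to be similar, so the statistic should fall near $n$; only a clear excess of $s^2_K$ over $s^2_{K(2)}$ leads to rejection.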

Statistical properties of sample zygotic association and $s^2_K$ are examined for the simulated samples of sizes $n = 30$ and $n = 100$. Despite the slight downward bias in the mean values of $s^2_{K(2)}$ by a factor of $(n - 1)/n$, its observed standard deviations are very close to their expected values even for $n = 30$ (Table 5), suggesting that (14b) is an adequate approximation to the sampling variance of $s^2_{K(2)}$. Table 5 also shows that Hardy-Weinberg disequilibrium ($D$) affects $\sigma^2_{K(2)}$ in an interesting way. Avoidance of mating between relatives ($D < 0$) increases heterozygosity whereas inbreeding ($D > 0$) decreases it. Thus, $\sigma^2_{K(2)}$ is expected to be greater for $D < 0$ or smaller for $D > 0$ than that for the equilibrium population ($D = 0$). However, this is not true when the gene frequency approaches $p = 0.5$. At $p = 0.5$, the maximum $\sigma^2_{K(2)}$ is obtained only when the population is in Hardy-Weinberg equilibrium ($D = 0$), and any change in heterozygosity due either to avoidance of mating between relatives or to inbreeding would result in a smaller $\sigma^2_{K(2)}$. Negligible skewness and kurtosis suggest that the normality of the sampling distribution of $s^2_{K(2)}$ required for the test criterion (16) is probably adequate even though our simulation results are limited to two loci only. As expected, the estimates of zygotic association in all simulated populations are zero or very close to zero. The increase of sample size from $n = 30$ to $n = 100$ (not presented) has improved the results only slightly.

The means of $\hat{\omega}_{1111}$ are close to their respective theoretical values and the sampling variances of $\hat{\omega}_{1111}$ increase with increasing gene frequencies at $n = 30$ (Table 6). The increase of sample sizes from 30 to 100 reduces the sampling variances and downward bias of estimated $\sigma^2_K$ (results not presented for $n = 100$). The $X^2_{s^2_K}$ test statistics are close to their expected values of 30.0 for $n = 30$ and 100.0 for $n = 100$ when zygotic association is small at low gene frequencies, but fluctuate with large positive or negative zygotic associations at more intermediate gene frequencies. The standard deviations of the chi-square statistics are also close to their expectations of 7.75 for $n = 30$ and 14.14 for $n = 100$ in most cases, but sizable discrepancies occur in the cases of large positive or negative zygotic associations. Similar patterns of sampling behaviors and properties are revealed for $\omega_{1212} = \omega_{2121}$ and $\omega_{2222}$.

Judging from the estimated powers of the two test statistics, the zygotic associations are detectable only when they are positive and when the gene frequencies are close to 0.5 (Table 6). Figure 2 further shows that the powers increase with the large, positive zygotic associations and that zero powers are obtained for the large, negative zygotic associations when $p = 0.5$ and $D = 0.125$. Similar patterns are observed for other values of $p$ and $D$. It is of interest to note that, unlike the nonlinear relationship in Figure 2, a linear relationship of zygotic associations with the variances of $K$ or with chi-square values is observed (results not shown). The power should be 0.05 for the cases of no zygotic associations as a 5% significance level is used to reject these null hypotheses. According to this criterion, both tests perform reasonably well. While test (16) is slightly more powerful than test (15) in most cases, the two tests essentially provide the same amount of power across the range of zygotic associations. The increase of sample size from 30 to 100 results in an increase in the power of detecting the zygotic associations. Hardy-Weinberg disequilibrium ($D$) has little effect on the detection. For example, with $p = 0.5$, $\omega_{1111} = 0.0938$ for both $D = -0.125$ and $D = 0.125$. The power estimates with $n = 30$ are 0.810 for $D = -0.125$ and 0.816 for $D = 0.125$, according to the chi-square test criterion (15).



Figure 2. The relationships between zygotic associations and the estimated powers of two tests as given in Equation 15 (dashed lines) and Equation 16 (solid lines). Each point represents the power estimated from 10,000 simulated samples of sizes $n = 30$ (•) and $n = 100$ (▲).


*  DISCUSSION

A wide range of molecular data, from isozymes to newly developed microsatellite markers, is now available for population genetic analysis. The average heterozygosity across all the loci scored has been routinely used to summarize the molecular data at hand. In the presence of nonrandom associations within and among loci, there is a need to characterize various genic disequilibria (e.g., COCKERHAM and WEIR 1973; WEIR 1979; WEIR and COCKERHAM 1989), but the number of disequilibria for multiple alleles and many loci quickly increases beyond comprehension. This article has expanded the earlier concept of zygotic associations to effectively summarize those disequilibria within and between pairs of loci [cf. (9a) and (9b)]. The measure of zygotic associations shares most of the properties of gametic disequilibrium, but at the zygote level (Table 4). Further, we have developed a method to compute a set of summary statistics that are used to characterize and estimate the multilocus associations in the nonequilibrium population. This development substantiates and complements the earlier development of BROWN et al. (1980) for a Hardy-Weinberg equilibrium population in which the multilocus associations are a function of only one type of two-locus disequilibria, gametic disequilibria. For the equilibrium population, our method reduces to that of BROWN et al. (1980) because, in this case, the gametic frequencies can be inferred from the zygotic frequencies at individual loci. However, our method should be of more general use in elucidating the multilocus organizations in nonequilibrium and equilibrium diploid populations. For haploid data such as those from genetic assessment of bacterial or inbred plant populations, the procedures of BROWN et al. (1980) and HAUBOLD et al. (1998) should be used to construct the distribution of $K$ through comparing all possible pairs of gametes in a population and to estimate different moments of $K$ with an account of the interdependence between the gamete pairs for detecting multilocus associations.

Our method may be particularly useful for characterizing and estimating the multilocus associations in hybrid populations. Because these populations arise from the mixing of two or more distinct gene pools, strong Wahlund effect and selection against heterozygotes may frequently occur, thereby maintaining Hardy-Weinberg disequilibrium and zygotic associations for a long time. Given that alleles derived from the same parental populations or species tend to cluster together in the same individuals, the majority of pairwise zygotic associations should be positive, leading to an easier detection of the multilocus associations from our summary statistics. BARTON and GALE (1993) have recently proposed a somewhat different method of estimating the multilocus associations from the variance of hybrid index for a hybrid population arising from the mixing of two parental gene pools. While their method is based on essentially the same strategy to summarize the multilocus data, it is of limited value in (i) detecting multilocus associations for hybrid populations arising from the mixing of more than two parental gene pools; (ii) using unfixed but informative markers for the multilocus analysis; and (iii) analyzing hybrid populations that are not in Hardy-Weinberg equilibrium.

The zygotic associations and multilocus statistics presented here are for a single nonequilibrium population. When many nonequilibrium populations are studied, the total variance of $K$ may be partitioned into components due to the single-locus and multilocus effects of population subdivision. The method of partitioning the total variance of $K$ among several haploid populations by BROWN and FELDMAN (1981) can conceivably be extended to account for population subdivision related to Hardy-Weinberg disequilibrium and zygotic associations in nonequilibrium populations. However, the analysis of variance (ANOVA) of the $K$ values may be considered for the multilocus diploid data with a complex hierarchical population structure. In this case, the ANOVA procedure of estimating hierarchical F-statistics by YANG (1998) may be extended to assess the effects of multilocus population subdivision at different levels of hierarchy.

The two sample sizes ($n = 30$ and $n = 100$) in our simulation probably represent the two ends of what may be used in most experimental population genetic studies for measuring multilocus heterozygosity. The sample of $n = 30$ appears to provide sufficient power to detect zygotic associations, agreeing with the assertion of BROWN et al. (1980) that the multilocus statistics can be used with relatively small samples (on the order of 30). The increase in the sample size to $n = 100$ results only in a slight to moderate reduction in the sampling variance of $s^2_K$ and an increase in the powers of detecting the multilocus associations (Figure 2). We have not simulated samples of very small sizes that may occur in practice. With small sample sizes, the validity of the assumed distributions for the sample variance of $K$ as required by tests (15) and (16) may not be warranted. In this case, the recently developed permutation test (GUO and THOMPSON 1992) may be a preferred alternative to detect zygotic associations because it requires no assumptions about the distributions of multilocus statistics. In the permutation test, the null distribution [i.e., the distribution of $s^2_{K(2)}$] is generated by randomly shuffling the single-locus zygotes among individuals in the sample. This is very similar to the randomization scheme described by HAUBOLD et al. (1998) for haploid data, but it retains Hardy-Weinberg disequilibrium in the zygotes. However, the permutation test can be computationally intensive, particularly when the sample size is large. Thus, tests (15) and (16) should be useful for analyzing samples of moderate to large sizes.
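The shuffling scheme described above can be sketched as follows. This is a minimal illustration, not the implementation of GUO and THOMPSON (1992): it shuffles the 0/1 heterozygosity indicators at each locus independently among individuals, which for the variance of $K$ is equivalent to shuffling whole single-locus genotypes, and it leaves each locus's own heterozygosity unchanged. The number of permutations, the seed, and the `(count + 1)/(permutations + 1)` p-value convention are assumptions.

```python
import random
from typing import List

Matrix = List[List[int]]  # rows = individuals, columns = loci, entries 0 or 1


def sample_variance_of_k(x: Matrix) -> float:
    """Variance of the number of heterozygous loci, dividing by n."""
    n = len(x)
    k = [sum(row) for row in x]
    mean_k = sum(k) / n
    return sum((v - mean_k) ** 2 for v in k) / n


def permutation_p_value(x: Matrix, n_permutations: int = 999, seed: int = 0) -> float:
    """One-tailed p-value for excess variance of K under per-locus shuffling."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    observed = sample_variance_of_k(x)
    columns = [[row[j] for row in x] for j in range(m)]
    exceed = 0
    for _ in range(n_permutations):
        shuffled = []
        for col in columns:
            permuted_col = col[:]
            rng.shuffle(permuted_col)  # break associations between loci
            shuffled.append(permuted_col)
        permuted = [[shuffled[j][t] for j in range(m)] for t in range(n)]
        if sample_variance_of_k(permuted) >= observed:
            exceed += 1
    return (exceed + 1) / (n_permutations + 1)


# Hypothetical sample of n = 30 with perfectly associated heterozygosity.
associated = [[1, 1]] * 15 + [[0, 0]] * 15
print(permutation_p_value(associated))
```

Because each permutation rescans the whole sample, the cost grows with both the sample size and the number of permutations, which is the computational burden noted in the text.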


*  ACKNOWLEDGMENTS

I thank three reviewers for comments and constructive criticisms on earlier versions of the manuscript. This research has been supported in part by the Natural Sciences and Engineering Research Council of Canada grant OGP0183983.

Manuscript received July 14, 1999; Accepted for publication April 3, 2000.


*  LITERATURE CITED

ALLARD, R. W., S. K. JAIN, and P. L. WORKMAN, 1968 The genetics of inbreeding populations. Adv. Genet. 14:55-131.

BARTON, N. H., and A. G. CLARK, 1990 Population structure and process in evolution, pp. 115–173 in Population Biology, edited by K. WÖHRMANN and S. K. JAIN. Springer-Verlag, Berlin.

BARTON, N. H., and K. S. GALE, 1993 Genetic analysis of hybrid zones, pp. 13–45 in Hybrid Zones and the Evolutionary Process, edited by R. G. HARRISON. Oxford University Press, New York.

BENNETT, J. H., and F. E. BINET, 1956 Association between Mendelian factors with mixed selfing and random mating. Heredity 10:51-55.

BROWN, A. H. D., 1979 Enzyme polymorphism in plant populations. Theor. Popul. Biol. 15:1-42.

BROWN, A. H. D., and J. J. BURDON, 1983 Multilocus diversity in an outcrossing weed, Echium plantagineum L. Aust. J. Biol. Sci. 36:503-509.

BROWN, A. H. D., and M. W. FELDMAN, 1981 Population structure of multilocus associations. Proc. Natl. Acad. Sci. USA 78:5913-5916.

BROWN, A. H. D., M. W. FELDMAN, and E. NEVO, 1980 Multilocus structure of natural populations of Hordeum spontaneum. Genetics 96:523-536.

CHAKRABORTY, R., 1984 Detection of nonrandom association of alleles from the distribution of the number of heterozygous loci in a sample. Genetics 108:719-731.

CLEGG, M. T., J. F. KIDWELL, M. G. KIDWELL, and N. J. DANIEL, 1976 Dynamics of correlated genetic systems. I. Selection in the region of the Glued locus of Drosophila melanogaster. Genetics 83:793-810.

COCKERHAM, C. C., and B. S. WEIR, 1973 Descent measures for two loci with some applications. Theor. Popul. Biol. 4:300-330.

ELANDT-JOHNSON, R. C., 1971 Probability Models and Statistical Methods in Genetics. Wiley, New York.

GUO, S. W., and E. A. THOMPSON, 1992 Performing the exact test of Hardy-Weinberg proportion for multiple alleles. Biometrics 48:361-372.

HALDANE, J. B. S., 1949 The association of characters as a result of inbreeding and linkage. Ann. Eugen. 15:15-23.

HAUBOLD, B., M. TRAVISANO, P. B. RAINEY, and R. R. HUDSON, 1998 Detecting linkage disequilibrium in bacterial populations. Genetics 150:1341-1348.

HEDRICK, P. W., 1987 Gametic disequilibrium measures: proceed with caution. Genetics 117:331-341.

HEDRICK, P. W., S. K. JAIN, and L. HOLDEN, 1978 Multilocus systems in evolution. Evol. Biol. 11:101-184.

KARLIN, S., and A. PIAZZA, 1981 Statistical methods for assessing linkage disequilibrium at the HLA-A, B, C loci. Ann. Hum. Genet. 45:79-94.

KENDALL, M., and A. STUART, 1977 The Advanced Theory of Statistics. Vol. 1. Distribution Theory, Ed. 4. Griffin, London.

LEWONTIN, R. C., 1964 The interaction of selection and linkage. I. General considerations; heterotic models. Genetics 49:49-67.

MANLY, B. F. J., 1985 The Statistics of Natural Selection on Animal Populations. Chapman and Hall, London.

MAYNARD SMITH, J., N. H. SMITH, M. O'ROURKE, and B. G. SPRATT, 1993 How clonal are bacteria? Proc. Natl. Acad. Sci. USA 90:4384-4388.

NEVO, E., and A. BEILES, 1989 Genetic diversity of wild emmer wheat in Israel and Turkey: structure, evolution, and application in breeding. Theor. Appl. Genet. 77:421-455.

WAHLUND, S., 1928 Zusammensetzung von Populationen und Korrelationserscheinungen vom Standpunkt der Vererbungslehre aus betrachtet. Hereditas 11:65-106.

WEIR, B. S., 1979 Inferences about linkage disequilibrium. Biometrics 35:235-254.

WEIR, B. S., 1996 Genetic Data Analysis II. Sinauer Associates, Sunderland, MA.

WEIR, B. S., and C. C. COCKERHAM, 1973 Mixed self and random mating at two loci. Genet. Res. 21:247-262.

WEIR, B. S., and C. C. COCKERHAM, 1989 Complete characterization of disequilibrium at two loci, pp. 86–110 in Mathematical Evolutionary Theory, edited by M. W. FELDMAN. Princeton University Press, Princeton, NJ.

WEIR, B. S., J. REYNOLDS, and K. G. DODDS, 1990 The variance of sample heterozygosity. Theor. Popul. Biol. 37:235-253.

WHITTAM, T. S., H. OCHMAN, and R. K. SELANDER, 1983 Multilocus genetic structure in natural populations of Escherichia coli. Proc. Natl. Acad. Sci. USA 80:1751-1755.

YANG, R.-C., 1998 Estimating hierarchical F-statistics. Evolution 52:950-956.

YEH, F. C., J. SHI, R.-C. YANG, J. HONG, and Z. YE, 1994 Genetic diversity and multilocus associations in Cunninghamia lanceolata (Lamb.) Hook. from People's Republic of China. Theor. Appl. Genet. 88:465-471.



