Genetics, Vol. 165, 447-450, September 2003, Copyright © 2003


Letter to the Editor

Gametic and Zygotic Associations

Rong-Cai Yang
Alberta Agriculture, Food and Rural Development, Edmonton, Alberta T6H 5T6, Canada, and Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, Alberta T6G 2P5, Canada

Corresponding author: Rong-Cai Yang, Food and Nutritional Science, 410 Agriculture/Forestry Centre, University of Alberta, Edmonton, AB T6G 2P5, Canada. E-mail: rong-cai.yang@ualberta.ca

NONRANDOM associations between genes at different loci are often assessed in population genetic and evolutionary studies because such associations provide a basis for inferring past demographic and genetic events, such as population history and the evolutionary forces governing the loci. The current intense interest in association studies stems largely from the prospect of exploiting the relation between the extent of association and the recombination fraction for fine-scale mapping of quantitative trait loci (QTL) controlling complex diseases in humans (ARDLIE et al. 2002) or quantitative traits of economic or adaptive importance in animals and plants (FARNIR et al. 2002). In either case, the focus has been on gametic association, commonly called linkage disequilibrium (LD). Several statistical measures have been proposed to characterize LD (see HEDRICK 1987 for review), but the use of these measures is often limited to a pair of alleles at two loci. With the increasing availability of multiallelic systems such as microsatellites, pairwise LD measures may be too numerous to be readily manageable and interpretable in initial genome-wide studies. More importantly, unless a stringent significance level is imposed, the large number of pairwise tests required may produce spurious associations at the commonly used significance levels of 5% and 1% (KARLIN and PIAZZA 1981).

Recently, SABATTI and RISCH (2002) suggested the use of haplotype homozygosity as a possible measure of LD to circumvent the problem of measuring multilocus associations involving multiple alleles and loci. When zygotes result from the random union of gametes (i.e., Hardy-Weinberg equilibrium), as assumed by SABATTI and RISCH (2002), LD can be estimated from observed homozygosities and heterozygosities. The advantage of this approach is that the homozygosities and heterozygosities are defined independently of the number of alleles per locus, thereby allowing one to measure LD between highly polymorphic markers. In the presence of Hardy-Weinberg disequilibrium, as is often the case in natural populations, however, LD is only one of several genic disequilibria that are required for a complete characterization of nonrandom associations at different loci (COCKERHAM and WEIR 1973). In a similar but independent development, YANG (2000, 2002) advocated a direct characterization and test of zygotic associations at multiple loci regardless of whether or not the population is in Hardy-Weinberg equilibrium. The purposes of this letter are (i) to elucidate the relationship between the two approaches of SABATTI and RISCH (2002) and of YANG (2000, 2002) and (ii) to point out possible bias in calculating LD if other nonzero genic disequilibria are ignored.

For simplicity, only the case of two loci (say j and l), each with multiple alleles ($j_1, j_2, \ldots, j_r$; $l_1, l_2, \ldots, l_s$), is considered. Frequencies of zygotes at loci j and l formed by the union of gametes $j_u l_y$ and $j_v l_z$ ($u, v = 1, 2, \ldots, r$ and $y, z = 1, 2, \ldots, s$) are written as $P^{uy}_{vz}$. At each locus, a zygote can be either homozygous (denoted as 0) or heterozygous (denoted as 1). Thus, there are four classes of zygotic frequencies at the two loci: (i) double homozygotes [f(00)], (ii) homozygotes at locus j and heterozygotes at locus l [f(01)], (iii) heterozygotes at locus j and homozygotes at locus l [f(10)], and (iv) double heterozygotes [f(11)].

The marginal zygotic frequencies at the two individual loci are f(0·) = f(00) + f(01) = 1 - $H_j$, f(1·) = f(10) + f(11) = $H_j$, f(·0) = f(00) + f(10) = 1 - $H_l$, and f(·1) = f(01) + f(11) = $H_l$, where $H_j$ and $H_l$ are the population heterozygosities at loci j and l, respectively. SABATTI and RISCH (2002) used these relations to set up a two-way contingency table of homozygosities and heterozygosities at the two loci, expressed entirely in terms of double and single homozygosities. In the above notation, the four classes of zygotic frequencies given in Equation 9 of SABATTI and RISCH (2002) are f(00), f(·0) - f(00), f(0·) - f(00), and 1 - f(·0) - f(0·) + f(00). YANG (2000, 2002) explicitly described the nonrandom association between zygotes at loci j and l, called zygotic association (ω), in terms of the following relations:

$$
\begin{aligned}
f(00) &= f(0\cdot)\,f(\cdot 0) + \omega, & f(01) &= f(0\cdot)\,f(\cdot 1) - \omega,\\
f(10) &= f(1\cdot)\,f(\cdot 0) - \omega, & f(11) &= f(1\cdot)\,f(\cdot 1) + \omega.
\end{aligned}
$$
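The zygotic association can be read off the 2 x 2 table of homozygote and heterozygote classes as ω = f(00)f(11) - f(01)f(10), which is the same quantity as f(00) - f(0·)f(·0). A minimal Python sketch of this bookkeeping follows; the function names are hypothetical and the four input frequencies are made-up illustrative values, not data from any of the cited studies.

```python
def marginals(f00, f01, f10, f11):
    """Return (f(0.), f(1.), f(.0), f(.1)) for the 2 x 2 table of zygotic classes.

    f(1.) and f(.1) are the heterozygosities H_j and H_l.
    """
    return f00 + f01, f10 + f11, f00 + f10, f01 + f11


def zygotic_association(f00, f01, f10, f11):
    """Return omega = f(00) f(11) - f(01) f(10)."""
    if abs(f00 + f01 + f10 + f11 - 1.0) > 1e-9:
        raise ValueError("the four zygotic frequencies must sum to 1")
    return f00 * f11 - f01 * f10


# Made-up frequencies, for illustration only.
classes = (0.40, 0.20, 0.10, 0.30)
f0_, f1_, f_0, f_1 = marginals(*classes)
omega = zygotic_association(*classes)

# omega is also the deviation of f(00) from the product of its two margins.
print(omega, classes[0] - f0_ * f_0)
```

Because the table has fixed margins, a single number ω describes the departure of all four cells from independence.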

Furthermore, YANG (2002) showed that this zygotic association can be expressed as a function of various genic disequilibria, including LD, using the disequilibrium functions of COCKERHAM and WEIR (1973),

(1)

where $p_u$ and $q_y$ are the frequencies of allele u at locus j and allele y at locus l, respectively, and each genic disequilibrium is the deviation of a frequency from that expected under random association of genes after accounting for any lower-order disequilibria. For example, LD ($D^{uy}_{\cdot\cdot}$) is the deviation of the frequency of gamete $j_u l_y$ from the product of the frequencies of allele u at locus j and allele y at locus l, with $\sum_u D^{uy}_{\cdot\cdot} = \sum_y D^{uy}_{\cdot\cdot} = 0$.

When zygotes result from random union of gametes, all nongametic disequilibria including Hardy-Weinberg disequilibrium disappear (e.g., ). Thus, Equation 1 reduces to

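$$\omega = \sum_{u=1}^{r}\sum_{y=1}^{s}\left[2\,p_u q_y D^{uy}_{\cdot\cdot} + \left(D^{uy}_{\cdot\cdot}\right)^2\right] \qquad (2)$$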
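The reduction to Equation 2 can be sketched in a few lines. The derivation below assumes only random union of gametes, so that a double homozygote arises when two identical gametes unite and single-locus homozygosities are sums of squared allele frequencies; it is a reader's check rather than the derivation given in the cited papers.

```latex
\begin{aligned}
f(00) &= \sum_{u}\sum_{y}\left(p_u q_y + D^{uy}_{\cdot\cdot}\right)^2, \qquad
f(0\cdot) = \sum_{u} p_u^2, \qquad
f(\cdot 0) = \sum_{y} q_y^2, \\
\omega &= f(00) - f(0\cdot)\,f(\cdot 0)
        = \sum_{u}\sum_{y}\left[\,2\,p_u q_y D^{uy}_{\cdot\cdot}
          + \left(D^{uy}_{\cdot\cdot}\right)^2\right],
\end{aligned}
```

where the last step uses $\sum_u\sum_y p_u^2 q_y^2 = \left(\sum_u p_u^2\right)\left(\sum_y q_y^2\right)$.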

Furthermore, if there are only two alleles per locus (u, y = 1, 2), the zygotic association becomes

$$\omega = 4D^2 + 2(p_1 - p_2)(q_1 - q_2)D \qquad (3)$$

where D is LD. Equation 3 is essentially the same as Equation 3 of SABATTI and RISCH (2002) and is the basis for the homozygosity measure of gametic disequilibrium.
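A quick numerical check of the two-allele case can be sketched as follows. It assumes random union of gametes and the quadratic form ω = 4D² + 2(p1 - p2)(q1 - q2)D, which follows from squaring the four gamete frequencies; the function names and the test values of p1, q1, and D are hypothetical.

```python
def zygotic_classes_hwe(p1, q1, D):
    """Four zygotic classes for two biallelic loci under random union of gametes."""
    p2, q2 = 1.0 - p1, 1.0 - q1
    gametes = [p1 * q1 + D, p1 * q2 - D, p2 * q1 - D, p2 * q2 + D]
    if min(gametes) < -1e-12:
        raise ValueError("D is outside the range allowed by the allele frequencies")
    f00 = sum(g * g for g in gametes)   # two identical gametes unite
    hom_j = p1 * p1 + p2 * p2           # f(0.)
    hom_l = q1 * q1 + q2 * q2           # f(.0)
    return f00, hom_j - f00, hom_l - f00, 1.0 - hom_j - hom_l + f00


def omega_direct(p1, q1, D):
    """omega computed from the four classes as f(00) f(11) - f(01) f(10)."""
    f00, f01, f10, f11 = zygotic_classes_hwe(p1, q1, D)
    return f00 * f11 - f01 * f10


def omega_quadratic(p1, q1, D):
    """omega computed from the quadratic form in D."""
    p2, q2 = 1.0 - p1, 1.0 - q1
    return 4.0 * D * D + 2.0 * (p1 - p2) * (q1 - q2) * D


for p1, q1, D in [(0.2, 0.4, 0.05), (0.3, 0.3, -0.08), (0.5, 0.1, 0.02)]:
    print(p1, q1, D, omega_direct(p1, q1, D), omega_quadratic(p1, q1, D))
```

The two columns printed for each parameter set should agree up to floating-point rounding.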

Evidently, because the zygotic association is a composite measure, a direct one-to-one relationship between zygotic and gametic associations is possible only when there are two alleles at each of the two loci and all nongametic disequilibria are absent (i.e., Equation 3). Thus, with knowledge of ω and the allelic frequencies (p's and q's), LD can be calculated by solving the quadratic equation $4D^2 + 2(p_1 - p_2)(q_1 - q_2)D - \omega = 0$. In the special case of $p_1 = p_2 = 0.5$ or $q_1 = q_2 = 0.5$,

$$D = \pm\tfrac{1}{2}\sqrt{\omega} \qquad (4a)$$

Unfortunately, ω must be nonnegative for a real solution for D to exist. In all other cases,

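$$D = -\tfrac{1}{4}\left\{(p_1 - p_2)(q_1 - q_2) \pm \sqrt{(p_1 - p_2)^2 (q_1 - q_2)^2 + 4\omega}\right\} \qquad (4b)$$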

with the condition $\omega \ge -[(p_1 - p_2)(q_1 - q_2)/2]^2$. It remains unclear which of the two solutions at a given ω is the correct solution for D.
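As a minimal sketch, the two candidate values of D can be obtained by solving the quadratic 4D² + 2aD - ω = 0 with a = (p1 - p2)(q1 - q2). The function name `solve_ld` is hypothetical, and the roots are returned in ascending order rather than labeled, because nothing in ω alone says which root is the true D. The sketch does not check that a root lies within the range of D permitted by the allele frequencies.

```python
import math


def solve_ld(omega, p1, q1, tol=1e-12):
    """Real roots of 4 D^2 + 2 a D - omega = 0, with a = (p1 - p2)(q1 - q2).

    Returns (smaller_root, larger_root), or None when the discriminant is
    negative, i.e., when omega < -[(p1 - p2)(q1 - q2) / 2]^2.
    """
    a = (2.0 * p1 - 1.0) * (2.0 * q1 - 1.0)  # p1 - p2 = 2 p1 - 1 since p2 = 1 - p1
    disc = a * a + 4.0 * omega
    if disc < -tol:
        return None
    root = math.sqrt(max(disc, 0.0))  # clamp tiny negative rounding error
    return (-a - root) / 4.0, (-a + root) / 4.0


print(solve_ld(0.1476, 0.1, 0.1))   # two real roots
print(solve_ld(-1.0, 0.3, 0.3))     # no real root
```

When p1 = 0.5 or q1 = 0.5, a is zero and the roots reduce to the symmetric pair of the special case.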

Numerical calculations were carried out to examine the patterns of the solutions for D. Consider first the case where all genic disequilibria except LD are zero. For a given set of gene frequencies, LD falls in the range $D^-_{\max} \le D \le D^+_{\max}$, where $D^-_{\max} = \max(-p_1 q_1, -p_2 q_2)$ and $D^+_{\max} = \min(p_1 q_2, p_2 q_1)$, with $p_2 = 1 - p_1$ and $q_2 = 1 - q_1$. Using the disequilibrium functions of COCKERHAM and WEIR (1973), I construct the frequencies of the 10 genotypes at loci j and l (with the two double heterozygotes being distinguished) and then group them into the four frequency classes of homozygotes and heterozygotes [f(00), f(01), f(10), and f(11)]. The zygotic association is calculated as ω = f(00)f(11) - f(10)f(01). Given the gene frequencies and ω, the two solutions for D obtained from Equation 4b are $D_1$ (taking the negative sign at "±") and $D_2$ (taking the positive sign at "±"). Table 1 presents the solutions for six gene frequencies that are equal at the two loci (i.e., $p_1 = q_1$ = 0.0, 0.1, 0.2, 0.3, 0.4, and 0.5), at five levels of D ($D^-_{\max}$, $0.5D^-_{\max}$, 0, $0.5D^+_{\max}$, and $D^+_{\max}$). The equality of the two zygotic frequencies f(10) and f(01) is expected when gene frequencies are equal at the two loci. It is evident from Table 1 that when gene frequencies are low, only $D_1$ is the correct solution, but as gene frequencies increase toward intermediate values ($p_1 = q_1 = 0.5$), $D_1$ is the correct solution for D ≥ 0 and $D_2$ is the correct solution for D < 0.


 
Table 1. Solutions for linkage disequilibrium ($D_1$ and $D_2$) from zygotic associations (ω) in Hardy-Weinberg equilibrium populations
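The kind of scan summarized in Table 1 can be sketched as below for the case with LD as the only nonzero disequilibrium. This is an illustrative reconstruction under stated assumptions, not the author's program: the zygotic classes are built directly from gamete frequencies under random union of gametes, the helper names are hypothetical, and the roots are reported as "smaller" and "larger" rather than as $D_1$ and $D_2$.

```python
import math


def ld_bounds(p1, q1):
    """Admissible range of D for two biallelic loci."""
    p2, q2 = 1.0 - p1, 1.0 - q1
    return max(-p1 * q1, -p2 * q2), min(p1 * q2, p2 * q1)


def omega_hwe(p1, q1, D):
    """omega from the four zygotic classes under random union of gametes."""
    p2, q2 = 1.0 - p1, 1.0 - q1
    f00 = sum(g * g for g in (p1 * q1 + D, p1 * q2 - D, p2 * q1 - D, p2 * q2 + D))
    hom_j, hom_l = p1 * p1 + p2 * p2, q1 * q1 + q2 * q2
    f01, f10 = hom_j - f00, hom_l - f00
    f11 = 1.0 - hom_j - hom_l + f00
    return f00 * f11 - f01 * f10


def roots(omega, p1, q1):
    """(smaller, larger) roots of 4 D^2 + 2 a D - omega = 0."""
    a = (2.0 * p1 - 1.0) * (2.0 * q1 - 1.0)   # (p1 - p2)(q1 - q2)
    s = math.sqrt(max(a * a + 4.0 * omega, 0.0))
    return (-a - s) / 4.0, (-a + s) / 4.0


def matching_root(p, D, tol=1e-6):
    """Which root recovers the true D when p1 = q1 = p."""
    lo, hi = roots(omega_hwe(p, p, D), p, p)
    hits = [name for name, r in (("smaller", lo), ("larger", hi)) if abs(r - D) < tol]
    return "both" if len(hits) == 2 else (hits[0] if hits else "none")


for p in (0.1, 0.2, 0.3, 0.4, 0.5):
    d_min, d_max = ld_bounds(p, p)
    levels = (d_min, 0.5 * d_min, 0.0, 0.5 * d_max, d_max)
    print(p, [matching_root(p, D) for D in levels])
```

Under these assumptions the discriminant equals [(p1 - p2)(q1 - q2) + 4D]², so the true D is the larger root whenever D ≥ -(p1 - p2)(q1 - q2)/4 and the smaller root otherwise. This is why a single root suffices at low gene frequencies but the correct root switches with the sign of D at p1 = q1 = 0.5.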

Because ω is a summary statistic at the zygote level, it may represent a loss of haplotype information such as gametic disequilibrium. In other words, zero zygotic association (ω = 0) does not preclude the existence of nonzero gametic disequilibrium (D ≠ 0), as is evident from Equation 3. Thus, with ω = 0, the nontrivial solution for LD derived from Equation 4b, $D = -(p_1 - p_2)(q_1 - q_2)/2$, is not necessarily zero unless the allele frequencies are symmetric ($p_1 = p_2 = 0.5$ or $q_1 = q_2 = 0.5$). For example, if $p_1 = q_1 = 0.3$, the nontrivial solution for LD is D = -0.08, but the zygotic frequencies are f(00) = 0.3364, f(01) = f(10) = 0.2436, and f(11) = 0.1764, leading to $\omega = (0.3364)(0.1764) - (0.2436)^2 = 0$.
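The arithmetic behind this example can be written out step by step. The worked calculation below assumes random union of gametes and uses $p_1 = q_1 = 0.3$, $p_2 = q_2 = 0.7$, and $D = -0.08$.

```latex
\begin{aligned}
\text{gametes: } & p_1 q_1 + D = 0.01, \quad p_1 q_2 - D = 0.29, \quad
                   p_2 q_1 - D = 0.29, \quad p_2 q_2 + D = 0.41, \\
f(00) &= 0.01^2 + 0.29^2 + 0.29^2 + 0.41^2 = 0.3364, \\
f(0\cdot) &= f(\cdot 0) = 0.3^2 + 0.7^2 = 0.58, \\
f(01) &= f(10) = 0.58 - 0.3364 = 0.2436, \qquad
f(11) = 1 - 2(0.58) + 0.3364 = 0.1764, \\
\omega &= 4D^2 + 2(p_1 - p_2)(q_1 - q_2)D
        = 4(0.0064) + 2(0.16)(-0.08) = 0.
\end{aligned}
```

The zygotic table factorizes exactly (0.3364 = 0.58², 0.1764 = 0.42², 0.2436 = 0.58 × 0.42) even though the gametes are far from linkage equilibrium.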

In the presence of all genic disequilibria, the relationship between zygotic and gametic associations becomes far less clear (cf. Equation 1). Table 2 presents five selected examples of solutions for LD ($D_1$ and $D_2$) from zygotic associations (ω). For each of five gene frequencies that are equal at the two loci (i.e., $p_1 = q_1$ = 0.1, 0.2, 0.3, 0.4, and 0.5), the minimum and maximum values of the Hardy-Weinberg disequilibria (HWD), the nonallelic digenic disequilibria, including both gametic (D) and nongametic (D') disequilibria, the trigenic disequilibria (TRID), and the quadrigenic disequilibria (QD) are determined just as those of LD are determined for Table 1. As with LD, the strength of each genic disequilibrium is represented by five levels (maximum negative, half-maximum negative, zero, half-maximum positive, and maximum positive). Thus, a total of 3125 (5 × 5 × 5 × 5 × 5) combinations are examined. Frequencies of the 10 genotypes are calculated using the disequilibrium functions of COCKERHAM and WEIR (1973) involving these genic disequilibria, and the 4 zygotic frequencies are simply the appropriate sums of the 10 genotypic frequencies. In the first example, all nonallelic genic disequilibria (D = D', TRID, and QD) are zero, the zygotic association is zero (ω = 0) as expected, and the first solution ($D_1 = 0$) corresponds to the absence of gametic disequilibrium (D = 0). However, because one or more nonallelic genic disequilibria are present in each of the remaining four examples, there is no correspondence between either of the two solutions ($D_1$ or $D_2$) and D. In the third and fifth examples, there is no LD (D = 0), but because of nonzero TRID and/or QD, neither solution is zero. In particular, the fifth example represents a well-known scenario in which nonzero quadrigenic disequilibrium between two unlinked loci is present in a population undergoing mixed selfing and random mating, with s being the proportion of selfing (e.g., WEIR and COCKERHAM 1973).
For the case of two alleles at each of the two loci, the zygotic association is , where


 
Table 2. Selected examples of solutions for linkage disequilibrium ($D_1$ and $D_2$) from zygotic associations (ω) in Hardy-Weinberg disequilibrium populations

Clearly, $D^{uy}_{uy} \neq 0$ unless s = 0 or s = 1. Thus, because of the nonzero zygotic association, neither $D_1$ nor $D_2$ is even close to zero for a population in gametic equilibrium (D = 0).

While the selected examples in Table 2 are somewhat arbitrary, the point is clear: there is little correspondence between gametic and zygotic associations when other types of genic disequilibria are present. SABATTI and RISCH (2002, p. 1718) also noted that "unfortunately, the relation between homozygosity and recombination fraction is not always direct ..." although they considered only the haplotype homozygosity and heterozygosity in a Hardy-Weinberg equilibrium population. The main value of zygote-based measures may lie in (i) their ability to quickly detect suspected "hot spots" of association in genome-wide scans (SABATTI and RISCH 2002) and (ii) the comparative assessment of gametic vs. zygotic associations to infer the adaptive significance of genotypes at different loci (YANG 2002). For genome scanning, the primary purpose of zygotic association analysis, just like that of LD analysis, is to detect markers that are tightly linked to QTL. In such detection, spurious associations (false positives) between markers and QTL may occur in two ways. First, strong associations between unlinked loci may arise from many evolutionary factors (see below for a discussion). Genetic designs and statistical tests are now available to avoid these kinds of false-positive findings (GIBSON and MUSE 2002). Second, the huge number of comparisons required to scan the genome for association will inevitably produce abundant false positives unless a significance level much more stringent than 5% or 1% is imposed (KARLIN and PIAZZA 1981).
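To give a sense of scale for the multiple-comparison problem, the sketch below counts the pairwise tests among a set of loci and applies a Bonferroni adjustment. The Bonferroni rule is an assumption chosen here for illustration; the letter does not prescribe a particular correction, and the number of loci (1000) is hypothetical.

```python
from math import comb


def n_pairwise_tests(n_loci):
    """Number of distinct locus pairs among n_loci loci."""
    return comb(n_loci, 2)


def expected_false_positives(alpha, n_tests):
    """Expected number of false positives if every null hypothesis is true."""
    return alpha * n_tests


def bonferroni_level(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


n_loci = 1000  # hypothetical number of marker loci in a genome scan
n_tests = n_pairwise_tests(n_loci)
for alpha in (0.05, 0.01):
    print(alpha, n_tests,
          expected_false_positives(alpha, n_tests),
          bonferroni_level(alpha, n_tests))
```

With multiallelic markers, testing allele pairs rather than locus pairs would multiply the number of comparisons further.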

Most current LD studies, whether on evolution or on QTL mapping, focus on patterns of LD as predicted by simple demographic models of population expansion or contraction, but they often acknowledge the impact of other factors such as natural selection, random drift, admixture or gene flow, and inbreeding (e.g., PRITCHARD and PRZEWORSKI 2001; ARDLIE et al. 2002). In essence, these factors cause departures from Hardy-Weinberg equilibrium, thereby producing zygotic association even in a gametic equilibrium population (cf. YANG 2000, Table 2, case 4). Thus, if these factors are present but ignored, LD will inevitably be over- or underemphasized in evolution or QTL-mapping studies.

ACKNOWLEDGMENTS

This research was partially supported by the Natural Sciences and Engineering Research Council of Canada grant OGP0183983.

Manuscript received July 3, 2002; Accepted for publication February 5, 2003.

LITERATURE CITED

ARDLIE, K. G., L. KRUGLYAK, and M. SEIELSTAD, 2002 Patterns of linkage disequilibrium in the human genome. Nat. Rev. Genet. 3:299-309.

COCKERHAM, C. C., and B. S. WEIR, 1973 Descent measures for two loci with some applications. Theor. Popul. Biol. 4:300-330.

FARNIR, F., B. GRISART, W. COPPIETERS, J. RIQUET, P. BERZI et al., 2002 Simultaneous mining of linkage and linkage disequilibrium to fine map quantitative trait loci in outbred half-sib pedigrees: revisiting the location of a quantitative trait locus with major effect on milk production on bovine chromosome 14. Genetics 161:275-287.

GIBSON, G., and S. V. MUSE, 2002 A Primer of Genome Science. Sinauer Associates, Sunderland, MA.

HEDRICK, P. W., 1987 Gametic disequilibrium measures: proceed with caution. Genetics 117:331-341.

KARLIN, S., and A. PIAZZA, 1981 Statistical methods for assessing linkage disequilibrium at the HLA-A, B, C loci. Ann. Hum. Genet. 45:79-94.

PRITCHARD, J. K., and M. PRZEWORSKI, 2001 Linkage disequilibrium in humans: model and data. Am. J. Hum. Genet. 69:1-14.

SABATTI, C., and N. RISCH, 2002 Homozygosity and linkage disequilibrium. Genetics 160:1707-1719.

WEIR, B. S., and C. C. COCKERHAM, 1973 Mixed self and random mating at two loci. Genet. Res. 21:247-262.

YANG, R.-C., 2000 Zygotic associations and multilocus statistics in a nonequilibrium diploid population. Genetics 155:1449-1458.

YANG, R.-C., 2002 Analysis of multilocus zygotic associations. Genetics 161:435-445.



