Corresponding author: Root Gorelick, Arizona State University, P.O. Box 871501, Tempe, AZ 85287-1501. E-mail: cycad{at}asu.edu
Communicating editor: M. W. FELDMAN
| ABSTRACT |
|---|
We present a mathematically precise formulation of total linkage disequilibrium between multiple loci as the deviation from probabilistic independence and provide explicit formulas for all higher-order terms of linkage disequilibrium, thereby combining J. Dausset et al.'s 1978 definition of linkage disequilibrium with H. Geiringer's 1944 approach. We recursively decompose higher-order linkage disequilibrium terms into lower-order ones. Our greatest simplification comes from defining linkage disequilibrium at a single locus as allele frequency at that locus. At each level, decomposition of linkage disequilibrium is mathematically equivalent to number theoretic compositions of positive integers; i.e., we have converted a genetic decomposition into a mathematical decomposition.
A precise measurement of linkage disequilibrium is required for studying virtually any phenomenon in multilocus population genetics. This is especially true for explicit multilocus models that investigate the contributions of physiological epistasis to additive genetic variance.
The classical definition of linkage disequilibrium, $D$, follows the probability theory definition of deviation from independence. Independence of two events, $B$ and $C$, means that $\Pr(BC) = \Pr(B) \cdot \Pr(C)$, where $\Pr$ is probability and $BC$ is the joint occurrence of $B$ and $C$, so that the deviation from independence is measured as $D = \Pr(BC) - \Pr(B) \cdot \Pr(C)$. Changing notation slightly to let $A_k^{(i)}$ designate the $k$th allele at the $i$th locus gives the linkage disequilibrium between the alleles at two loci, $D_2$, as $D_2 = \Pr(A_k^{(1)}A_k^{(2)}) - \Pr(A_k^{(1)}) \cdot \Pr(A_k^{(2)})$, where $A_k^{(1)}A_k^{(2)}$ represents the joint occurrence of $A_k^{(1)}$ and $A_k^{(2)}$ in a single haploid gamete. In most modern interpretations of probability theory, the primitive concept of "probability" is interpreted as a relative frequency; therefore, $\Pr(A_k^{(1)})$ is the same as the frequency of allele $k$ at locus 1.
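The two-locus definition can be illustrated with a minimal sketch. It assumes gametes are given as a list of (locus 1 allele, locus 2 allele) tuples and that probabilities are estimated as relative frequencies; the function name `two_locus_d` and the sample data are hypothetical.

```python
from collections import Counter

def two_locus_d(gametes, allele1, allele2):
    """Return Pr(a1 a2) - Pr(a1) * Pr(a2) for a list of haploid gametes.

    Each gamete is a (locus-1 allele, locus-2 allele) tuple, and every
    probability is estimated as a relative frequency in the sample.
    """
    n = len(gametes)
    counts = Counter(gametes)
    p_joint = counts[(allele1, allele2)] / n
    p1 = sum(c for (a, _), c in counts.items() if a == allele1) / n
    p2 = sum(c for (_, b), c in counts.items() if b == allele2) / n
    return p_joint - p1 * p2

# Hypothetical sample of 10 gametes (illustrative data only).
sample = [("A", "B")] * 4 + [("A", "b")] + [("a", "B")] + [("a", "b")] * 4
d_ab = two_locus_d(sample, "A", "B")  # joint 0.4, allele frequencies 0.5 and 0.5
```

Here the joint frequency of `A` with `B` exceeds the product of the allele frequencies, so the coefficient is positive; for `A` with `b` it has the same magnitude and the opposite sign.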
The quintessential examples of linkage disequilibrium are coadapted gene complexes, in which several loci are tightly linked because they provide a large selective advantage if they occur together. In these cases, linkage disequilibrium is maintained by selection. Coadapted gene complexes are implicit in Wright's shifting-balance hypothesis.
Methodologically, we follow Geiringer's lead and decompose higher-order linkage disequilibrium into lower-order linkage disequilibrium terms. In other words, we take a top-down approach to defining multilocus linkage disequilibrium, rather than the bottom-up approach followed by virtually everyone since.
In this article, we first define linkage disequilibrium at a single locus as the allele frequency at this locus, which greatly simplifies notation. Second, we extend the definition of linkage disequilibrium to multiple loci by invoking compositions of positive integers. Our decomposition of multilocus linkage disequilibrium is entirely consistent with the standard definitions for two loci, as well as its previous extensions to three, four, and six loci.
| DECOMPOSITION OF MULTILOCUS LINKAGE DISEQUILIBRIUM |
|---|
Define the one-locus coefficient of linkage disequilibrium, $D_1$, as $D_1(A_k^{(i)}) = \Pr(A_k^{(i)})$. This definition may appear paradoxical, but it dramatically simplifies notation for the decomposition of multilocus linkage disequilibrium. In elementary algebra we have the analogous problem of defining the algebraic expression $x^n$ when $n = 0$.
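Following earlier two- and three-locus treatments, and writing allele frequencies as $D_1$, the two- and three-locus coefficients of linkage disequilibrium are

$$D_2\bigl(A_k^{(1)}A_k^{(2)}\bigr) = \Pr\bigl(A_k^{(1)}A_k^{(2)}\bigr) - D_1\bigl(A_k^{(1)}\bigr)\,D_1\bigl(A_k^{(2)}\bigr)$$

$$D_3\bigl(A_k^{(1)}A_k^{(2)}A_k^{(3)}\bigr) = \Pr\bigl(A_k^{(1)}A_k^{(2)}A_k^{(3)}\bigr) - D_1\bigl(A_k^{(1)}\bigr)\,D_2\bigl(A_k^{(2)}A_k^{(3)}\bigr) - D_1\bigl(A_k^{(2)}\bigr)\,D_2\bigl(A_k^{(1)}A_k^{(3)}\bigr) - D_1\bigl(A_k^{(3)}\bigr)\,D_2\bigl(A_k^{(1)}A_k^{(2)}\bigr) - D_1\bigl(A_k^{(1)}\bigr)\,D_1\bigl(A_k^{(2)}\bigr)\,D_1\bigl(A_k^{(3)}\bigr)$$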
Let $D_n$ be the coefficient of linkage disequilibrium between $n$ loci. Then the pattern here is that $D_n = \Pr(A_k^{(1)}A_k^{(2)} \cdots A_k^{(n)})$ minus all possible products of lower-order linkage disequilibrium coefficients, such that each term has all of its subscripts adding up to $n$. The key to writing down an explicit formula for $D_n$ is that the phrase "all possibilities of the subscripts adding up to $n$" refers to partitions of the positive integer $n$. A partition $\lambda$ of a positive integer $n$ is a multiset of positive integers that adds up to $n$; i.e., $\lambda = \{n_1, n_2, \ldots, n_m\}$ such that $\sum_{i=1}^{m} n_i = n$. The set of all partitions of $n$ is designated $p(n)$; e.g., $p(5) = \{\{5\}, \{4, 1\}, \{3, 2\}, \{2, 2, 1\}, \{3, 1, 1\}, \{2, 1, 1, 1\}, \{1, 1, 1, 1, 1\}\}$. To define multilocus linkage disequilibrium, we have to add over all partitions, excluding the trivial partition $\lambda = \{n\}$, and permute over all alleles for a given number of loci. However, the order of elements of the partition matters, and hence we construct the number-theoretic compositions $c$ of the positive integer $n$. For example, the compositions corresponding to the partition $\lambda = \{2, 2, 1\}$ are the ordered triples $(2, 2, 1)$, $(2, 1, 2)$, and $(1, 2, 2)$. Using these mathematical notions we can generalize the two- and three-locus cases to define linkage disequilibrium between $n$ loci as
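$$D_n\bigl(A_k^{(1)} A_k^{(2)} \cdots A_k^{(n)}\bigr) = \Pr\bigl(A_k^{(1)} A_k^{(2)} \cdots A_k^{(n)}\bigr) - \sum_{\substack{\text{all compositions } c \text{ of } n \\ c \neq (n)}} \Bigl[\, \prod_{n_i \in c} D_{n_i}(\ldots) \Bigr] \qquad (1a)$$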
where $n_i \in c$ means that $n_i$ is a scalar component of the vector $c$. Equivalently,
![]() |
(1b) |
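The partitions and compositions used in Equation 1 can be enumerated directly. The following is a minimal sketch with hypothetical helper names (`partitions`, `compositions`); it represents partitions as non-increasing tuples and compositions as ordered tuples.

```python
def partitions(n, largest=None):
    """Yield the partitions of n as non-increasing tuples of positive integers."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def compositions(n):
    """Yield the compositions (ordered partitions) of n as tuples."""
    if n == 0:
        yield ()
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest

parts_of_5 = list(partitions(5))    # the 7 partitions of 5
comps_of_5 = list(compositions(5))  # the 16 compositions of 5
# Compositions that reorder the partition {2, 2, 1}.
orderings_221 = sorted(c for c in comps_of_5 if sorted(c) == [1, 2, 2])
```

A positive integer $n$ has $2^{n-1}$ compositions, so enumerating them stays cheap only for a modest number of loci.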
The only way to decompose $n$ into a single positive integer is $c = (n)$. Therefore, we can also write the highest-order coefficient of linkage disequilibrium as $D_n(\ldots) = \sum_{c = (n)} \bigl[\prod_{n_i \in c} D_{n_i}(\ldots)\bigr]$, where the summation has only a single term and the product has only a single factor. Therefore, Equation 1a yields
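$$\Pr\bigl(A_k^{(1)} A_k^{(2)} \cdots A_k^{(n)}\bigr) = \sum_{\text{all compositions } c \text{ of } n} \Bigl[\, \prod_{n_i \in c} D_{n_i}(\ldots) \Bigr] \qquad (2)$$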
which we use below.
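The recursion behind Equations 1a and 2 can be sketched numerically. This sketch makes two assumptions that go beyond the text: haplotype frequencies are stored in a dict keyed by allele tuples, and the "(...)" arguments are read as every way of assigning loci to the parts of a composition, which amounts to summing over set partitions of the loci. All names (`set_partitions`, `joint`, `d_coeff`) and the frequencies are hypothetical.

```python
from math import prod

def set_partitions(items):
    """Yield every partition of a list into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[first] + block] + smaller[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + smaller

def joint(freqs, haplotype, loci):
    """Probability that a gamete carries haplotype[i] at every locus i in loci."""
    return sum(f for h, f in freqs.items()
               if all(h[i] == haplotype[i] for i in loci))

def d_coeff(freqs, haplotype, loci):
    """Linkage disequilibrium coefficient among `loci`; with one locus it is
    the allele frequency."""
    loci = list(loci)
    value = joint(freqs, haplotype, loci)
    for blocks in set_partitions(loci):
        if len(blocks) > 1:  # skip the trivial single-block case, c = (n)
            value -= prod(d_coeff(freqs, haplotype, b) for b in blocks)
    return value

# Hypothetical haplotype frequencies for three biallelic loci (sum to 1).
FREQS = {
    (0, 0, 0): 0.25, (0, 0, 1): 0.05, (0, 1, 0): 0.05, (0, 1, 1): 0.10,
    (1, 0, 0): 0.10, (1, 0, 1): 0.05, (1, 1, 0): 0.05, (1, 1, 1): 0.35,
}
hap = (1, 1, 1)
d3 = d_coeff(FREQS, hap, [0, 1, 2])
# Equation 2: summing the products over all decompositions recovers Pr(haplotype).
rebuilt = sum(prod(d_coeff(FREQS, hap, b) for b in blocks)
              for blocks in set_partitions([0, 1, 2]))
```

Under these assumptions `rebuilt` agrees with `joint(FREQS, hap, [0, 1, 2])` up to floating-point error, which is the content of Equation 2.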
Equations 1a and 1b have never been written explicitly for general multilocus linkage disequilibrium, even though special cases have been given previously. Measuring the deviation of the $n$-locus haploid genotype from probabilistic independence gives

$$\mathbf{D}_n\bigl(A_k^{(1)} A_k^{(2)} \cdots A_k^{(n)}\bigr) = \Pr\bigl(A_k^{(1)} A_k^{(2)} \cdots A_k^{(n)}\bigr) - \prod_{i=1}^{n} D_1\bigl(A_k^{(i)}\bigr) \qquad (3)$$

which we call total linkage disequilibrium, $\mathbf{D}_n$, where we have again replaced $\Pr(A_k^{(i)})$ with $D_1(A_k^{(i)})$. We refer to $\mathbf{D}_n$ as total linkage disequilibrium because, as we show below, all of the nonboldface linkage disequilibrium coefficients $D_1, D_2, D_3, \ldots, D_n$ can be independent from one another and contribute to $\mathbf{D}_n$. Equation 3 has a simple heuristic interpretation: $\mathbf{D}_n(A_k^{(1)} \cdots A_k^{(n)})$ measures how far the haploid genotype at all $n$ loci deviates from probabilistic independence.
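Total linkage disequilibrium, read as the joint haplotype probability minus the product of the single-locus allele frequencies, can be computed in a few lines. This is a minimal sketch assuming haplotype frequencies are stored in a dict keyed by allele tuples; `total_ld` and the example frequencies are hypothetical.

```python
from math import prod

def total_ld(freqs, haplotype):
    """Total linkage disequilibrium: Pr(haplotype) minus the product of the
    frequencies of its alleles at each locus."""
    allele_freqs = [
        sum(f for h, f in freqs.items() if h[i] == haplotype[i])
        for i in range(len(haplotype))
    ]
    return freqs.get(haplotype, 0.0) - prod(allele_freqs)

# Hypothetical population in which three loci are completely associated.
coupled = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
print(total_ld(coupled, (1, 1, 1)))  # 0.5 - 0.5**3 = 0.375
```

When the loci are probabilistically independent, the joint probability equals the product of allele frequencies and the function returns zero up to rounding.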
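We are now ready to derive the relationship between $\mathbf{D}_n$ and $D_n$. In Equation 3, substitute $\sum_{\text{all compositions } c \text{ of } n} \bigl[\prod_{n_i \in c} D_{n_i}(\ldots)\bigr]$ for $\Pr(A_k^{(1)} \cdots A_k^{(n)})$ (see Equation 2), yielding

$$\mathbf{D}_n\bigl(A_k^{(1)} \cdots A_k^{(n)}\bigr) = \sum_{\text{all compositions } c \text{ of } n} \Bigl[\, \prod_{n_i \in c} D_{n_i}(\ldots) \Bigr] - \prod_{i=1}^{n} D_1\bigl(A_k^{(i)}\bigr).$$

The last term in this equation is simply the value of $\prod_{n_i \in c} D_{n_i}(\ldots)$ for the composition $c = (1, 1, 1, \ldots, 1)$, i.e.,

$$\prod_{n_i \in (1, 1, \ldots, 1)} D_{n_i}(\ldots) = \prod_{i=1}^{n} D_1\bigl(A_k^{(i)}\bigr).$$

Therefore, Equation 3 becomes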
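$$\mathbf{D}_n\bigl(A_k^{(1)} \cdots A_k^{(n)}\bigr) = \sum_{\substack{\text{all compositions } c \text{ of } n \\ c \neq (1, 1, \ldots, 1)}} \Bigl[\, \prod_{n_i \in c} D_{n_i}(\ldots) \Bigr] \qquad (4a)$$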
or, equivalently,
![]() |
(4b) |
Equations 4a and 4b provide the crucial link between deviations from independence ($\mathbf{D}_n$) and the recursively defined linkage disequilibrium coefficients $D_n$; in every composition $c$, $\sum_i n_i = n$.
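As a small worked check of Equation 4a, the cases $n = 2$ and $n = 3$ can be expanded by hand. The sketch assumes that the products range over every assignment of loci to the parts of each composition (one reading of the "(...)" arguments), and that the three-locus coefficient is the joint probability minus the three $D_1 D_2$ products and the triple product of allele frequencies. The allele subscript $k$ is dropped for brevity.

```latex
\begin{align*}
\mathbf{D}_2 &= \Pr(A^{(1)}A^{(2)}) - D_1(A^{(1)})\,D_1(A^{(2)})
              = D_2(A^{(1)}A^{(2)}), \\[4pt]
\mathbf{D}_3 &= \Pr(A^{(1)}A^{(2)}A^{(3)}) - D_1(A^{(1)})\,D_1(A^{(2)})\,D_1(A^{(3)}) \\
             &= D_3(A^{(1)}A^{(2)}A^{(3)})
                + D_1(A^{(1)})\,D_2(A^{(2)}A^{(3)})
                + D_1(A^{(2)})\,D_2(A^{(1)}A^{(3)})
                + D_1(A^{(3)})\,D_2(A^{(1)}A^{(2)}).
\end{align*}
```

For two loci the total and the highest-order coefficient coincide; from three loci onward the total also collects products involving lower-order coefficients.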
| DISCUSSION |
|---|
We have converted the genetics problem of decomposing linkage disequilibrium into the mathematical problem of decomposing positive integers into their additive parts, all while maintaining the convenient heuristic definition of total linkage disequilibrium as the deviation from independence.
One immediate consequence of our decomposition approach is that the single highest-order coefficient of linkage disequilibrium, $D_n$, cannot be examined in isolation. Because $\mathbf{D}_n$ is a sum of products of coefficients of all orders (Equation 4a), we need to examine all lower-order linkage disequilibrium coefficients, $D_{n_i}(\ldots)$ with $n_i < n$. All of the subscripted linkage disequilibrium coefficients $D_1, D_2, D_3, \ldots, D_n$ can be independent from one another and all contribute to $\mathbf{D}_n$, which we therefore call total linkage disequilibrium.
Multilocus definitions of linkage disequilibrium have not been used very often in empirical studies because of the large number of inputs and linkage disequilibrium coefficients that must be analyzed ($2^n - 1$). Currently, even third-order linkage disequilibrium is seldom measured.
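The count $2^n - 1$ can be illustrated by listing the locus subsets involved. This minimal sketch assumes the count refers to one coefficient per non-empty subset of the $n$ loci for a fixed choice of alleles; `locus_subsets` is a hypothetical helper name.

```python
from itertools import combinations

def locus_subsets(n):
    """Return every non-empty subset of loci 1..n, smallest subsets first."""
    loci = range(1, n + 1)
    return [s for k in range(1, n + 1) for s in combinations(loci, k)]

subsets_3 = locus_subsets(3)
# Three single loci, three pairs, and one triple: 2**3 - 1 = 7 subsets.
```

The list doubles in length (plus one) with each added locus, which is why higher-order analyses quickly become unwieldy.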
One important theoretical application is the analysis of multilocus epistasis.
| ACKNOWLEDGMENTS |
|---|
We thank Phil Hedrick, Tom Dowling, and two anonymous reviewers for their helpful comments.
Manuscript received November 26, 2003; Accepted for publication December 18, 2003.
| LITERATURE CITED |
|---|
ANDREWS, G. E., 1976 The Theory of Partitions. Addison-Wesley, Reading, MA.
ARDLIE, K. G., L. KRUGLYAK, and M. SEIELSTAD, 2002 Patterns of linkage disequilibrium in the human genome. Nat. Rev. Genet. 3:299-309.
BENNETT, J. H., 1954 On the theory of random mating. Ann. Eugen. 18:311-317.
BULMER, M. G., 1980 The Mathematical Theory of Quantitative Genetics. Clarendon Press, Oxford.
CHEVERUD, J. M. and E. J. ROUTMAN, 1995 Epistasis and its contribution to genetic variance components. Genetics 139:1455-1461.
DAUSSET, J., L. LEGRAND, V. LEPAGE, L. CONTÚ, and A. MARCELLI-BARGE et al., 1978 A haplotype study of HLA complex with special reference to HLA-DR series and to Bf, C2 and glyoxalase I polymorphisms. Tissue Antigens 12:297-307.
DOBZHANSKY, T., 1948 Genetics of natural populations. XVIII. Experiments on chromosomes of Drosophila pseudoobscura from different geographical regions. Genetics 33:797-808.
GEIRINGER, H., 1944 On the probability theory of linkage in Mendelian heredity. Ann. Math. Stat. 15:25-57.
GOODNIGHT, C. J., 1988 Epistasis and the effect of founder events on the additive genetic variance. Evolution 42:441-454.
GOODNIGHT, C. J., 1995 Epistasis and the increase in additive genetic variance: implications for phase 1 of Wright's shifting balance process. Evolution 49:502-511.
HASTINGS, A., 1984 Linkage disequilibrium, selection and recombination at three loci. Genetics 106:153-164.
HEDRICK, P. W., 2000 Genetics of Populations, Ed. 2. Jones & Bartlett, Sudbury, MA.
LAKOFF, G., and R. E. NÚÑEZ, 2000 Where Mathematics Comes From. Basic Books, New York.
LEWONTIN, R. C., 1974 The Genetic Basis of Evolutionary Change. Columbia University Press, New York.
LYNCH, M., 1991 The genetic interpretation of inbreeding depression and outbreeding depression. Evolution 45:622-629.
PALOPOLI, M. F. and C.-I WU, 1996 Rapid evolution of a coadapted gene complex: evidence from the segregation distorter (SD) system of meiotic drive in Drosophila melanogaster. Genetics 143:1675-1688.
RAWSON, P. D. and R. S. BURTON, 2002 Functional coadaptation between cytochrome c and cytochrome c oxidase within allopatric populations of a marine copepod. Proc. Natl. Acad. Sci. USA 99:12955-12958.
THOMSON, G. and M. P. BAUR, 1984 Third order linkage disequilibrium. Tissue Antigens 24:250-255.
WADE, M. J. and C. J. GOODNIGHT, 1998 Perspective: the theories of Fisher and Wright in the context of metapopulations: when nature does many small experiments. Evolution 52:1537-1553.
WAGNER, G. P. and M. D. LAUBICHLER, 2000 Character identification in evolutionary biology: the role of the organism. Theory Biosci. 119:20-40.
WAGNER, G. P., M. D. LAUBICHLER, and H. BAGHERI-CHAICHIAN, 1998 Genetic measurement theory of epistatic effects. Genetica 102(103):569-580.
WRIGHT, S., 1931 Evolution in Mendelian populations. Genetics 16:97-159.