Originally published as Genetics Published Articles Ahead of Print on June 18, 2008.
Genetics, Vol. 179, 1705-1712, July 2008, Copyright © 2008
doi:10.1534/genetics.107.085654
The Direction of Linkage Disequilibrium: A New Measure Based on the Ancestral-Derived Status of Segregating Alleles
K. Ryo Takahasi*,1 and Hideki Innan†,1,2
* Population and Quantitative Genomics Team, Genomic Sciences Center, RIKEN, Yokohama 230-0045, Japan and
† Department of Evolutionary Studies of Biological Systems, Graduate University for Advanced Studies, Hayama 240-0193, Japan
2 Corresponding author: Department of Evolutionary Studies of Biological Systems, Graduate University for Advanced Studies, Hayama 240-0193, Japan.
E-mail: innan_hideki{at}soken.ac.jp
A new measure of directional linkage disequilibrium is developed for detecting epistatic selection on interacting genes. Simulations show that by orienting the direction of linkage disequilibrium on the basis of the ancestral-derived status of alleles, the new measure indeed improves the power to detect a positive fitness interaction between two new mutations.
LINKAGE disequilibrium (LD) measures the nonrandom association of alleles at distinct loci. Because strong LD tends to be eroded in a recombining genome, significant deviation from random combination (linkage equilibrium) may be indicative of close proximity of a pair of loci along a chromosome (linkage) or fitness interactions among cosegregating variants (statistical epistasis). For a system of two loci with two alleles each (say, A/a and B/b), there are four possible combinations of alleles (or haplotypes), and a simple form of LD is expressed as the excess haplotype frequency over the expectation under linkage equilibrium (LEWONTIN and KOJIMA 1960),
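$$
\begin{aligned}
D_{AB} &= x_{AB} - p_A q_B,\\
D_{Ab} &= x_{Ab} - p_A q_b,\\
D_{aB} &= x_{aB} - p_a q_B,\\
D_{ab} &= x_{ab} - p_a q_b,
\end{aligned}
\qquad (1)
$$

where x_AB, x_Ab, x_aB, and x_ab are the haplotype frequencies, p_A and p_a (= 1 − p_A) are the allele frequencies at the A/a locus, and q_B and q_b (= 1 − q_B) are those at the B/b locus. The four coefficients are equal in magnitude: D_AB = D_ab = −D_Ab = −D_aB.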
The present study proposes an orientation of directional LD based on ancestor–descendant relationships of alleles and illustrates the applicability of the new directional measure in detecting epistatic interactions among cosegregating variants. We first focus on a process eventually leading to the joint fixation of two novel mutations a and b, which originated independently from their parental alleles A and B, respectively. If we further assume that the substitution process is facilitated by a positive fitness interaction between the two mutant alleles, we would observe an excess frequency of the double mutant ab and hence a transient formation of positive Dab (= DAB) in the process of double fixation. By properly utilizing the knowledge about ancestor–descendant relationships of the alleles, we expect that the directional departure from the neutral expectation would be more effectively detected. In contrast, when the appropriate information is not available, we have to rely solely on the absolute value, here denoted |D| (= |DAB| = |DAb| = |DaB| = |Dab|), while ignoring the direction of LD. The power of detecting epistatic effects would then be largely diminished.
To characterize the biased nature inherent in the transient LD, we here introduce a directional measure of LD, defined as
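$$
r^{*} = \frac{D_{ab}}{\sqrt{p_A\, p_a\, q_B\, q_b}}, \qquad (2)
$$

where a and b denote the derived (mutant) alleles at the two loci, so that r* is positive when the derived alleles tend to occur together on the same haplotype (−1 ≤ r* ≤ 1). Its squared value is therefore equivalent to an oft-used measure of LD,

$$
r^{2} = \frac{D^{2}}{p_A\, p_a\, q_B\, q_b}. \qquad (3)
$$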
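The orientation in Eq. (2) can be illustrated with a short sketch (the function name is hypothetical). It assumes the haplotype frequencies are known exactly and that a and b are correctly identified as the derived alleles. Relabeling the derived allele at one locus flips the sign of r* but leaves r² in Eq. (3) unchanged, which is why r* carries information that r² does not.

```python
import math


def directional_r(x_AB, x_Ab, x_aB, x_ab):
    """r* of Eq. (2): D_ab scaled by sqrt(p_A p_a q_B q_b).

    The labels matter: a and b must be the derived (mutant) alleles.
    """
    p_a = x_aB + x_ab          # derived-allele frequency at the first locus
    q_b = x_Ab + x_ab          # derived-allele frequency at the second locus
    denom = math.sqrt((1.0 - p_a) * p_a * (1.0 - q_b) * q_b)
    if denom == 0.0:
        raise ValueError("r* is undefined unless both loci are polymorphic")
    D_ab = x_ab - p_a * q_b
    return D_ab / denom


r_star = directional_r(0.5, 0.1, 0.1, 0.3)   # haplotypes AB, Ab, aB, ab
r_sq = r_star ** 2                           # Eq. (3): the usual r^2

# Swapping which allele is called derived at the B/b locus relabels
# AB<->Ab and aB<->ab: r* changes sign while r^2 is unchanged.
r_star_flipped = directional_r(0.1, 0.5, 0.3, 0.1)
print(round(r_star, 4), round(r_star_flipped, 4), round(r_sq, 4))
# 0.5833 -0.5833 0.3403
```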
In the following, we show by stochastic simulations that the statistical power to detect positive epistatic interactions is indeed enhanced by applying the directional measure r*. Two forms of fitness interactions are considered:
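[Equations (4) and (5), the fitness schemes of the two models referred to below as the coadaptation and compensation models, and Equation (6) are not recoverable from the source.]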
Simulations show that, conditional on double fixation, positive r* may arise shortly after the introduction of the second mutation b, even when the allelic combinations are selectively neutral (i.e., s = 0 in the above simulations). As evolution proceeds, recombination breaks up this transient positive r* under neutrality, and the distribution rapidly becomes symmetrical about r* = 0 (Figure 1A, distribution in solid lines). In contrast, when the mutations interact epistatically, positive r* persists for a longer period despite the counteracting effect of recombination (Figure 1A, shaded distribution), until one of the mutant alleles becomes fixed. During this phase of evolution, incorporating the allelic status into the analysis allows positive r* caused by epistatic interactions to be discriminated from the symmetrically distributed r* expected under neutrality (Figure 1A). The distinction is less clear when the unsigned measure |r| is used in the absence of the appropriate ancestral information (Figure 1B). By applying the directional measure r*, the power to detect positive epistatic interactions may increase by as much as 1.5-fold (Figures 2 and 3). Even under weak epistasis (s = 0.01), r* retains relatively high detection power for some time after the second mutational event, whereas without the ancestral information the power quickly drops off and approaches the nominal level of 5%. Higher LD appears to be produced in the compensation model because selection there disfavors the haplotypes Ab and aB, which would otherwise reduce the absolute value of LD.
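The simulation scheme is only partly described in this text (Eqs. (4)–(6) are not recoverable), so the following is a hypothetical sketch rather than the authors' code. It assumes a haploid Wright–Fisher population of N chromosomes with selection, then recombination at rate c, then multinomial sampling; it uses w_ab = 1 + s and fitness 1 for the other haplotypes as a stand-in for a coadaptation-type interaction, and it places the single copy of b on a randomly chosen chromosome. The defaults follow parameter values quoted in a figure caption of the article (N = 100, p0 = 0.1, c = 0.05, s = 0.05).

```python
import numpy as np

# Haplotype order: AB, Ab, aB, ab (a and b are the derived alleles).
RECOMB_SIGN = np.array([-1.0, 1.0, 1.0, -1.0])


def next_generation(counts, s, c, rng):
    """One generation of a haploid two-locus Wright-Fisher model (a sketch).

    Assumed fitnesses (hypothetical stand-in for a coadaptation scheme):
    w_ab = 1 + s and 1 for the other three haplotypes.
    """
    N = int(counts.sum())
    x = counts / N
    w = np.array([1.0, 1.0, 1.0, 1.0 + s])
    x = x * w / np.dot(x, w)                         # selection
    D = x[0] * x[3] - x[1] * x[2]                    # equals D_AB of Eq. (1)
    x = np.clip(x + c * D * RECOMB_SIGN, 0.0, None)  # recombination at rate c
    return rng.multinomial(N, x / x.sum())           # genetic drift


def r_star(counts):
    """r* of Eq. (2) from haplotype counts (both loci must be polymorphic)."""
    N = counts.sum()
    p_a = (counts[2] + counts[3]) / N
    q_b = (counts[1] + counts[3]) / N
    D_ab = counts[3] / N - p_a * q_b
    return D_ab / np.sqrt(p_a * (1 - p_a) * q_b * (1 - q_b))


def run(s=0.05, c=0.05, N=100, p0=0.1, seed=1, max_gen=100_000):
    """Record r* each generation while both loci segregate, then continue
    until both loci are monomorphic (or max_gen is reached).

    Returns (trajectory of r*, whether a and b both became fixed).
    """
    rng = np.random.default_rng(seed)
    n_a = round(p0 * N)
    counts = np.array([N - n_a, 0, n_a, 0])
    # Assumption: b arises as a single copy on a randomly chosen chromosome.
    if rng.random() < n_a / N:
        counts[2] -= 1                               # aB -> ab
        counts[3] += 1
    else:
        counts[0] -= 1                               # AB -> Ab
        counts[1] += 1
    trajectory = []
    for _ in range(max_gen):
        n_mut_a = counts[2] + counts[3]
        n_mut_b = counts[1] + counts[3]
        poly_a, poly_b = 0 < n_mut_a < N, 0 < n_mut_b < N
        if not (poly_a or poly_b):
            break                                    # both loci monomorphic
        if poly_a and poly_b:
            trajectory.append(float(r_star(counts)))  # r* defined only here
        counts = next_generation(counts, s, c, rng)
    return trajectory, bool(counts[3] == N)
```

Conditioning on double fixation, as in the analysis above, would amount to keeping only runs whose second return value is True and aligning their trajectories by the number of generations since b arose.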
Overall, our simulations show that for moderate levels of linkage (c = 0.02 or 0.05 in the above examples), the power to detect positive epistatic interactions may be substantially enhanced by explicitly taking the ancestral-derived status of alleles into account. They also suggest that when linkage is too tight (e.g., c ≤ 0.01), the relative power may not increase because, conditional on double fixation, positive r* tends to persist even under neutrality. When recombination rates are much higher, discernible levels of LD would not arise in any case.

We have also assumed a fixed value p0 = 0.1 for the initial frequency of the allele a. When the first mutation is initially more abundant (e.g., p0 >> 0.2), the amount of LD remains low during evolution, because the increase of allele a is then driven primarily by random genetic drift in the absence of the second mutation b. Accordingly, the positive epistatic interaction, which drives the joint increase of the two mutations and is the prerequisite for generating significant LD, would play only a minor role (cf. TAKAHASI and TAJIMA 2005).

We also found that other directional LD measures (such as Dab itself) successfully distinguished the characteristic pattern of LD under epistatic selection from the neutral expectation; in the above examples, the statistical power of Dab was comparable to or slightly lower than that of our new measure r*. D' (LEWONTIN 1964) was an exception: because |D'| equals 1 whenever at least one of the four possible haplotypes is absent, its distribution tends to concentrate at the boundaries, even under neutrality.

Although the above analysis clearly illustrates the potential advantage of the directional measure, its direct application seems impractical because it relies on a number of conditions that are generally unknown in practice; specifically, the simulations are conditioned on double fixation, and the performance of each LD measure is evaluated by conditioning on the time since the second mutational event (as in Figures 2 and 3).
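To illustrate why D' behaves differently, the sketch below (function names hypothetical) computes Lewontin's normalized D' = D/D_max, choosing D_max according to the sign of D in the usual way, alongside r. When one haplotype is missing, |D'| equals 1 even though r is moderate.

```python
import math


def d_prime(x_AB, x_Ab, x_aB, x_ab):
    """Normalized D' = D_AB / D_max computed from haplotype frequencies."""
    p_A, p_a = x_AB + x_Ab, x_aB + x_ab
    q_B, q_b = x_AB + x_aB, x_Ab + x_ab
    D = x_AB - p_A * q_B                        # D_AB of Eq. (1)
    if D > 0:
        d_max = min(p_A * q_b, p_a * q_B)
    else:
        d_max = min(p_A * q_B, p_a * q_b)
    return D / d_max if d_max > 0 else 0.0


def r_value(x_AB, x_Ab, x_aB, x_ab):
    """Correlation coefficient r = D_AB / sqrt(p_A p_a q_B q_b)."""
    p_A, p_a = x_AB + x_Ab, x_aB + x_ab
    q_B, q_b = x_AB + x_aB, x_Ab + x_ab
    return (x_AB - p_A * q_B) / math.sqrt(p_A * p_a * q_B * q_b)


# Haplotype Ab is absent: D' reaches its bound of 1 although r is modest.
print(round(d_prime(0.2, 0.0, 0.5, 0.3), 4), round(r_value(0.2, 0.0, 0.5, 0.3), 4))
# 1.0 0.3273
```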
For practical purposes, we may wish to base our analysis on some property that can be directly observed or at least inferred from the available data. To illustrate this possibility, we conducted an additional series of simulations and investigated the relative power of r* conditional on the observed frequencies of mutant alleles at a pair of loci (Tables 2 and 3).
In this new series of simulations, we consider a more realistic situation in which mutant alleles are repeatedly introduced at both loci and the two-locus frequency dynamics are fully simulated, without postulating a fixed initial frequency p0 as in the above case. Recurrent mutation rates are assumed to be low, and whenever an allele is lost at one of the loci, a new mutant allele is immediately introduced so that alleles always segregate at both loci. This procedure saves a considerable amount of computational time by skipping the waiting period for the next mutational event, and it does not affect our results because we are interested only in situations where both loci are polymorphic, so that some degree of LD may persist between them. Moreover, unlike the above analysis conditioned on double fixation, the new simulations also include mutant alleles that are eventually lost from the population when computing the distributions of LD, because in practice we do not know which of the segregating alleles will be fixed in future generations.
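One way to implement the reintroduction step is sketched below. The text does not specify where a new mutation arises or how a fixed mutant is treated, so both rules here are assumptions: a mutant that has fixed is relabeled as the new ancestral allele, and the new mutation hits one chromosome chosen uniformly at random. The function name and the haplotype ordering are hypothetical.

```python
import numpy as np

# Haplotype order: AB, Ab, aB, ab (a and b are the current derived alleles).
LOCI = {
    # relabel: index permutation swapping the ancestral/derived labels at the locus
    # mutate: ancestral-background index -> index after a new mutation at the locus
    "A": {"relabel": [2, 3, 0, 1], "mutate": {0: 2, 1: 3}},
    "B": {"relabel": [1, 0, 3, 2], "mutate": {0: 1, 2: 3}},
}


def maintain_polymorphism(counts, rng):
    """Introduce a single new mutant at any locus that has become monomorphic.

    Hypothetical rules: a derived allele that has fixed is relabeled as the
    ancestral allele, and the new mutation hits one chromosome chosen
    uniformly at random.
    """
    counts = np.array(counts, dtype=int)
    N = int(counts.sum())
    for rules in LOCI.values():
        derived = list(rules["mutate"].values())
        n_derived = int(counts[derived].sum())
        if n_derived == N:                      # derived allele fixed
            counts = counts[rules["relabel"]]   # it becomes the new ancestral allele
            n_derived = 0
        if n_derived == 0:                      # no derived allele segregating
            i = int(rng.choice(4, p=counts / N))  # necessarily an ancestral background
            counts[i] -= 1
            counts[rules["mutate"][i]] += 1
    return counts
```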
In every generation, the mutant allele frequencies (pa and qb) and the LD measures (r* and |r|) were recorded. The allele-frequency space at each locus was then partitioned (somewhat arbitrarily) into 10 classes of equal width 0.1, and the pattern of LD was studied separately for each class of mutant allele frequencies. Excluding the two terminal classes 0.0–0.1 and 0.9–1.0, there are 36 [= 8 × (8 + 1)/2] two-locus frequency configurations. For each of these combinations of frequency classes, we first investigated the null distributions of LD (conditional on the given allele-frequency combination), assuming neutrality. On the basis of a total of 10^6 data sets collected separately for each of the 36 configurations, the 5% significance level conditional on mutant allele frequencies was determined independently. As depicted in Figure 4, the significance levels based on the directional measure r* depend strongly on the two-locus frequency configuration. When the mutant allele frequencies at the two loci are nearly equal [as for the frequency combination (0.8–0.9, 0.8–0.9) in Figure 4], the 5% significance level grows rapidly as the linkage between the loci becomes tight, and even under neutrality it reaches the maximum value of unity roughly when Nc ≤ 1. In contrast, when the frequency difference between the loci is large [as for the frequency combination (0.8–0.9, 0.1–0.2) in Figure 4], the neutral threshold stays low over a wide range of recombination rates.
In a similar manner, simulations with the selection models (4) and (5) were then conducted to evaluate the power of our new measure r* relative to its unsigned version |r|. As shown in Tables 2 and 3, adding the ancestral information may substantially increase the power to detect epistatic interactions, especially when the mutant allele has attained a relatively high frequency (0.6–0.9) at one locus while remaining at low frequency (0.2–0.4) at the other. Under these unbalanced frequency configurations, the detection power of the unsigned measure |r| may be negligibly small, whereas the directional measure r* retains relatively high power. Consequently, under the coadaptation model of epistatic selection, the maximum increase in power reaches almost 50-fold for the parameter values studied here (s = 0.05 and c = 0.02; Table 2). Under the compensation model, the detection power may be enhanced even more substantially despite a smaller selection coefficient (s = 0.02 with c = 0.02; Table 3). Although we show only a limited set of simulation results here, the superiority of r* over |r| was confirmed for a much wider range of parameters under both selection models.

Note that the test described above involves additional parameters and conditions that must also be inferred from the available data (e.g., the ancestor–descendant relationship of alleles, the linkage phase in the case of diploid organisms, and the recombination rate). Misinference of any of these factors would almost certainly bias the LD-based test. In particular, the allelic status may be misspecified when an appropriate outgroup sequence is not available or at sites with high mutation rates. Such misinference reduces the relative performance of r*; that is, it erodes the advantage of using a directional measure. Indeed, additional simulations showed that when the ancestral-derived status of the alleles is incorrectly assigned at one of the two loci, the detection power of r* is almost completely lost.
The detection power decreases linearly with the rate of erroneous inference; the resulting relationship between the error rate and the relative power is shown in Figure 5 for the coadaptation model, and the compensation model yields essentially the same pattern. These results imply that reliable outgroup information is key to an apt application of our directional measure.
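A brief sketch of how power and the error-rate effect could be computed from replicate statistics (names hypothetical). Mis-assigning the ancestral-derived status at exactly one locus negates D_ab and hence r*, so if a fraction e of pairs is mis-polarized, the expected power is the mixture (1 − e)·P(r* > t) + e·P(−r* > t), which is linear in e. This is one way to read the linear decrease described above, not a reproduction of Figure 5.

```python
import numpy as np


def power(stats, threshold):
    """Fraction of replicate statistics exceeding the neutral 5% threshold."""
    return float(np.mean(np.asarray(stats, dtype=float) > threshold))


def expected_power_with_error(r_star_values, threshold, error_rate):
    """Expected power of r* when, with probability error_rate, the
    ancestral-derived status at exactly one locus is mis-assigned
    (which flips the sign of r*). The result is linear in error_rate.
    """
    r = np.asarray(r_star_values, dtype=float)
    return (1.0 - error_rate) * power(r, threshold) + error_rate * power(-r, threshold)


def relative_power(r_star_values, r_star_threshold, abs_r_threshold):
    """Power of r* divided by the power of |r| (= |r*|) on the same replicates."""
    r = np.asarray(r_star_values, dtype=float)
    base = power(np.abs(r), abs_r_threshold)
    if base == 0.0:
        raise ValueError("|r| has zero power; relative power is undefined")
    return power(r, r_star_threshold) / base
```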
Moreover, as with any other LD-based method, an incorrect inference of the linkage relationship may cause a serious error in the above test, because the expected level of LD under neutrality is closely related to the degree of linkage between loci (i.e., the recombination rate). This is especially so when the difference in mutant allele frequencies between the two loci is small [as for (0.8–0.9, 0.8–0.9); see Figure 4], because the threshold then changes radically with the population-scaled recombination rate, and a precise estimate of the (scaled) recombination rate becomes a prerequisite for a proper test. When the frequency difference is large, however, an erroneous estimate of the recombination rate is much less serious, since the neutral threshold is nearly invariant across a wide range of recombination rates [as for (0.8–0.9, 0.1–0.2); see Figure 4]; in other words, sufficient power would be retained even with a conservative estimate of the recombination rate. Notably, this is exactly when the relative advantage of using the directional measure r* is substantial, as demonstrated above (see Tables 2 and 3).

The present study demonstrates that when LD-based methods are used to detect epistatic interactions among cosegregating variants, there is a potential advantage in properly polarizing the direction of LD by additionally considering the ancestor–descendant relationships of alleles. Apart from the difficulties of ancestral inference, many practical problems remain, as discussed above, but these are shared by all LD-based tests, and the directional measure appears to improve test performance in general. Since the relative performance of r* depends on the frequencies of the interacting variants (Tables 2 and 3), in practice we may first look for pairs of loci (or nucleotide sites) that show a considerable amount of directional LD for a given combination of mutant allele frequencies at the two loci. Applying this primary screening to available polymorphism data would make it easier to choose candidate pairs of loci for further functional assays. Although only a few examples of positive interlocus interactions among naturally occurring molecular variants have been suggested (e.g., RAWSON and BURTON 2002; CAICEDO et al. 2004), we expect that the search for epistatic interactions in the wild would be much facilitated by explicitly taking the ancestral-derived status of the segregating alleles into account.
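A primary screen of this kind might start from a matrix of polarized haplotypes, as in the following sketch (names hypothetical; ancestral states are assumed to have been inferred already, e.g., from an outgroup). It reports r* together with both derived-allele frequencies, so that each pair can then be compared with a neutral threshold for its frequency configuration.

```python
import itertools

import numpy as np


def pairwise_r_star(haplotypes):
    """r* for every pair of polymorphic sites in a 0/1 haplotype matrix.

    haplotypes: array of shape (n_sequences, n_sites); 1 marks the derived
    allele and 0 the ancestral allele.
    Returns a dict {(i, j): (p_i, p_j, r_star)}.
    """
    H = np.asarray(haplotypes, dtype=float)
    freqs = H.mean(axis=0)                       # derived-allele frequency per site
    results = {}
    for i, j in itertools.combinations(range(H.shape[1]), 2):
        p, q = freqs[i], freqs[j]
        denom = p * (1 - p) * q * (1 - q)
        if denom == 0:
            continue                             # a monomorphic site: r* undefined
        x_dd = np.mean(H[:, i] * H[:, j])        # derived-derived haplotype frequency
        results[(i, j)] = (float(p), float(q), float((x_dd - p * q) / np.sqrt(denom)))
    return results


H = [[1, 1, 0],
     [1, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]
for pair, (p, q, r) in pairwise_r_star(H).items():
    print(pair, p, q, round(r, 3))
```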
CAICEDO, A. L., J. R. STINCHCOMBE, K. M. OLSEN, J. SCHMITT and M. D. PURUGGANAN, 2004 Epistatic interaction between Arabidopsis FRI and FLC flowering time genes generates a latitudinal cline in a life history trait. Proc. Natl. Acad. Sci. USA 101: 15670–15675.
HILL, W. G., and A. ROBERTSON, 1968 Linkage disequilibrium in finite populations. Theor. Appl. Genet. 38: 226–231.
INNAN, H., and W. STEPHAN, 2001 Selection intensity against deleterious mutations in RNA secondary structures and rate of compensatory nucleotide substitutions. Genetics 159: 389–399.
KIMURA, M., 1985 The role of compensatory neutral mutations in molecular evolution. J. Genet. 64: 7–19.
LANGLEY, C. H., and J. F. CROW, 1974 The direction of linkage disequilibrium. Genetics 78: 937–941.
LANGLEY, C. H., Y. N. TOBARI and K.-I. KOJIMA, 1974 Linkage disequilibrium in natural populations of Drosophila melanogaster. Genetics 78: 932–936.
LEWONTIN, R. C., 1964 The interaction of selection and linkage. I. General considerations; heterotic models. Genetics 49: 49–67.
LEWONTIN, R. C., and K. KOJIMA, 1960 The evolutionary dynamics of complex polymorphisms. Evolution 14: 450–472.
RAWSON, P. D., and R. S. BURTON, 2002 Functional coadaptation between cytochrome c and cytochrome c oxidase within allopatric populations of a marine copepod. Proc. Natl. Acad. Sci. USA 99: 12955–12958.
TAKAHASI, K. R., and F. TAJIMA, 2005 Evolution of coadaptation in a two-locus epistatic system. Evolution 59: 2324–2332.
Communicating editor: M. W. FELDMAN
[Figure caption, truncated: "… 0). In the coadaptation model, s = 0.05. Other parameters are N = 100, p0 = 0.1, and c = 0.05."]