Spectrum of Nonrandom Associations Between Microsatellite Loci on Human Chromosome 11p15
Carlos Zapata (a), Santiago Rodríguez (a), Guillermo Visedo (a), and Felipe Sacristán (b)
a Departamento de Biología Fundamental, Facultad de Biología, Universidad de Santiago, 15782 Santiago de Compostela, Spain
b Hospital Juan Canalejo, 15006 La Coruña, Spain
Corresponding author: Carlos Zapata, Departamento de Biología Fundamental, Area de Genética, Facultad de Biología, Universidad de Santiago, 15782 Santiago de Compostela, Spain. E-mail: bfcazaba{at}usc.es
Communicating editor: D. CHARLESWORTH
| ABSTRACT |
|---|
Most evidence about nonrandom association of alleles at different loci, or gametic disequilibrium, across extensive anonymous regions of the human genome is based on the analysis of overall disequilibrium between pairs of microsatellites. However, analysis of interallelic associations is also necessary for a more complete description of disequilibrium. Here, we report a study characterizing the frequency and strength of both overall and interallelic disequilibrium between pairs of 12 microsatellite loci (CA repeats) spanning 19 cM (14 Mb) on human chromosome 11p15, in a large sample (810 haplotypes deduced from 405 individuals) drawn from a single population. Characterization of disequilibrium was carried out, taking into account the sign of the observed disequilibria. This strategy facilitates detection of associations and gives more accurate estimates of their intensities. Our results demonstrate that the incidence of disequilibrium over an extensive human chromosomal region is much greater than is commonly considered for populations that have expanded in size. In total, 44% of the pairs of microsatellite loci and 18% of the pairs of alleles showed significant nonrandom association. All the loci were involved in disequilibrium, although both the frequency and strength of interallelic disequilibrium were distributed nonuniformly along 11p15. These findings are especially relevant since significant associations were detected between loci separated by as much as 17–19 cM (7 cM on average). It was also found that the overall disequilibrium masks complicated patterns of association between pairs of alleles, dependent on their frequency and size. We suggest that the complex mutational dynamics at microsatellite loci could explain the allele-dependent disequilibrium patterns. These observations are also relevant to evaluation of the usefulness of microsatellite markers for fine-scale localization of disease genes.
MULTILOCUS genetic theory represents an alternative to the classical theory of population genetics, which considers single genes as the units of selection.
Much empirical work over the past 30 years, based on protein loci, seemed to demonstrate that disequilibrium in outcrossing species was important only for very tightly linked protein loci.
... 0.1 cM (100 kb) of DNA.
This prevailing view began to change during the past decade. It was demonstrated that the lack of statistical power of traditionally used tests for disequilibrium was actually an important barrier, hindering detection of disequilibria between allozyme loci in populations.
... 4 cM) anonymous regions of human chromosomes, have provided some revealing results.
... 4 cM, although a few associations have been detected between polymorphisms separated by >18 cM. Third, the distribution of disequilibrium seems to be nonuniform across the human genome. These findings disclose that disequilibrium extends over wider regions of the human genome than formerly thought, which reopens questions about the importance of the multilocus genetic systems. In addition, the available evidence strongly suggests that the human genome is not only a structural and functional mosaic ...
Nevertheless, our current knowledge of the levels of disequilibrium along extensive human chromosomal regions is still limited, in some respects. So far, associations between pairs of microsatellite loci over extensive anonymous human regions have been characterized by testing the null hypothesis of overall gametic equilibrium and by using probabilities resulting from the significance tests as a measure of the strength of overall disequilibrium. This kind of analysis is useful, although clearly insufficient to discern the actual amount (frequency and strength) of disequilibrium existing along human chromosomes. Overall disequilibrium analysis has the advantage of summarizing disequilibria between all possible pairs of alleles at multiallelic loci in a single test of significance and as a single measure of disequilibrium intensity, which can be particularly useful for loci with a high number of alleles, such as microsatellite markers. Nevertheless, an analysis based solely on overall disequilibrium necessarily forfeits important information underlying multiallelic systems. It must be remembered that the null hypothesis of no overall disequilibrium specifies that there is no disequilibrium for any pair of alleles at two loci. In this way, a given locus pair is judged to be in significant overall disequilibrium irrespective of whether one or many pairs of alleles deviate from random association. Undoubtedly, quantification of the levels of disequilibrium across the human genome requires an assessment of how many of the possible pairs of alleles across locus pairs are in significant disequilibrium. 
On the other hand, probabilities resulting from the significance tests are not the best tools for determining the strength (weak, moderate, or strong) of disequilibria, or for comparisons across loci, because statistical power is dependent on the sample size, the statistical tests, the number of alleles, their frequencies, and whether the association is positive or negative.
Alleles at multiallelic loci cannot be reasonably considered as a homogeneous whole, and thus, in addition to overall disequilibrium, allele-dependent disequilibrium analyses are clearly needed. A multiallelic locus is a very complex genetic system, comprising alleles differentiated in frequency, age, and evolutionary history.
In this article, we describe the amount of overall and interallelic disequilibrium between all possible pairs of 12 dinucleotide (CA repeats) microsatellites, located on the telomeric region of the short arm of human chromosome 11 (11p15), from a large sample of the Galician population [northwest (NW) Spain]. Distribution of nonrandom associations along 11p15 and allele-dependent disequilibrium patterns were also investigated. The significance and intensity of overall and interallelic disequilibria were obtained by taking into account the sign of the deviations from random association. This new strategy for analyzing disequilibria between multiallelic loci provides both more statistical power for detecting nonrandom associations and more precise estimates of the strength of the disequilibrium. The results of our investigations provide new insights concerning two-locus disequilibrium across extensive regions of the human genome.
| MATERIALS AND METHODS |
|---|
Study subjects:
Peripheral blood samples were obtained from 405 random unrelated individuals (169 males and 236 females) and five families (53 individuals). They were all Caucasian individuals of Galicia (NW Spain). Genomic DNA was isolated from blood samples using the QIAamp kit (QIAGEN, Chatsworth, CA). All samples were collected with the approval of the appropriate institutional review board.
Microsatellite markers and genotyping:
All individuals were genotyped for 12 dinucleotide (CA)n repeat loci spanning chromosome region 11p15. Genotyping was performed using the polymerase chain reaction (PCR). The loci typed in this study, ordered from telomere (D11S4177) to centromere (D11S926), their chromosomal assignments [Location Data Base (LDB), http://cedar.genetics.soton.ac.uk/public_html/ldb.html], and their amplification primers (GÉNÉTHON, ftp://ftp.genethon.fr/pub/Gmap/Nature-1995) are shown in Table 1. PCR amplifications were performed in accordance with conditions described in GÉNÉTHON, with some modifications to optimize each marker. General amplification reactions were carried out in a final volume of 25 µl, containing 120 ng genomic DNA, 10 mM Tris-HCl (pH 9.0), 50 mM KCl, 0.1% Triton X-100, 1.5 mM MgCl2, 50 pmol of each primer, 200 µM dNTP mix, and 1 unit Taq DNA polymerase. Glycerol 10% (v/v) was added to the PCR buffer to maximize efficient specific amplification.
Estimation of haplotype frequencies and deviations from Hardy-Weinberg proportions:
Maximum-likelihood estimates of two-locus haplotype frequencies were obtained from genotype data (405 individuals), using an expectation-maximization (EM) algorithm.
... χ²) distribution with 1 d.f.
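The EM estimation of two-locus haplotype frequencies from unphased genotypes can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `em_haplotype_frequencies`, the uniform starting point, and the stopping rule are hypothetical choices. Only double heterozygotes are ambiguous in phase, so the E-step splits each of them between its two possible haplotype pairs in proportion to the current frequency estimates, and the M-step recounts haplotypes.

```python
from collections import Counter
from itertools import product


def em_haplotype_frequencies(genotypes, n_iter=200, tol=1e-10):
    """Estimate two-locus haplotype frequencies from unphased genotypes.

    genotypes: list of ((a1, a2), (b1, b2)) pairs, one per individual,
    giving the two alleles at locus A and the two alleles at locus B.
    """
    alleles_a = sorted({a for (ga, _) in genotypes for a in ga})
    alleles_b = sorted({b for (_, gb) in genotypes for b in gb})
    haplotypes = list(product(alleles_a, alleles_b))
    # Assumed starting point: all haplotypes equally frequent.
    freqs = {h: 1.0 / len(haplotypes) for h in haplotypes}
    total = 2 * len(genotypes)  # two gametes per individual

    for _ in range(n_iter):
        counts = Counter()
        for (a1, a2), (b1, b2) in genotypes:
            if a1 != a2 and b1 != b2:
                # E-step: a double heterozygote is either a1b1/a2b2 or a1b2/a2b1.
                w_first = freqs[(a1, b1)] * freqs[(a2, b2)]
                w_second = freqs[(a1, b2)] * freqs[(a2, b1)]
                denom = w_first + w_second
                share = w_first / denom if denom > 0 else 0.5
                counts[(a1, b1)] += share
                counts[(a2, b2)] += share
                counts[(a1, b2)] += 1.0 - share
                counts[(a2, b1)] += 1.0 - share
            else:
                # Phase is unambiguous when at most one locus is heterozygous.
                counts[(a1, b1)] += 1
                counts[(a2, b2)] += 1
        # M-step: new frequencies are expected counts divided by 2n.
        new_freqs = {h: counts[h] / total for h in haplotypes}
        delta = max(abs(new_freqs[h] - freqs[h]) for h in haplotypes)
        freqs = new_freqs
        if delta < tol:
            break
    return freqs
```

With a toy data set in which the unambiguous individuals carry only A1B1 and A2B2 haplotypes, the double heterozygotes are resolved almost entirely into that same pair, which illustrates how the estimates depend on the phase-known part of the sample.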
Estimation of gametic disequilibrium:
Two-locus disequilibrium was studied at two different levels. One analysis estimated the disequilibrium for each pair of alleles or haplotype. The second involved an overall disequilibrium analysis, condensing the information of disequilibrium between all alleles at the two loci. Two-locus haplotypes with singletons or alleles occurring once in the sample were excluded from the disequilibrium analysis.
Let us consider two loci A and B, having $A_i$ ($i = 1, \ldots, k$) and $B_j$ ($j = 1, \ldots, l$) alleles, respectively. Let $p_i$ and $q_j$ be the frequencies of alleles $i$ and $j$, respectively, and $X_{ij}$ be the relative frequency of the haplotype $A_iB_j$. Disequilibrium for each possible haplotype $A_iB_j$ may be considered separately by collapsing the data into $A_i$ vs. not $A_i$ ($A_{\bar{i}}$) at the A locus and $B_j$ vs. not $B_j$ ($B_{\bar{j}}$) at the B locus. In this way, the full array of possible two-locus haplotypes is partitioned into $k \times l$ separate $2 \times 2$ contingency tables. The frequencies of the haplotype classes $A_iB_j$, $A_iB_{\bar{j}}$, $A_{\bar{i}}B_j$, and $A_{\bar{i}}B_{\bar{j}}$ become $X_{ij}$, $p_i - X_{ij}$, $q_j - X_{ij}$, and $1 - p_i - q_j + X_{ij}$, respectively. We consider $A_iB_j$ and $A_{\bar{i}}B_{\bar{j}}$ to be the haplotype classes in coupling, and $A_iB_{\bar{j}}$ and $A_{\bar{i}}B_j$ to be the repulsion classes. The strength of disequilibrium was analyzed in terms of D' coefficients because of their useful properties.
...
$$D' = \sum_{i=1}^{k} \sum_{j=1}^{l} p_i q_j \, |D'_{ij}|,$$
which makes use of the absolute values of $D'_{ij}$ weighted by the frequencies of the haplotypes expected at gametic equilibrium.
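The collapsing of each allele pair into a 2 x 2 table and the frequency-weighted overall coefficient can be illustrated with a short sketch. The definition of the interallelic coefficient did not survive in the text above, so the code assumes the standard Lewontin normalization, D'ij = Dij / Dmax with Dij = Xij - pi qj; the function names `d_prime_ij` and `overall_d_prime` are hypothetical.

```python
def d_prime_ij(x_ij, p_i, q_j):
    """Normalized disequilibrium for haplotype AiBj (assumed Lewontin D')."""
    d = x_ij - p_i * q_j
    if d >= 0:
        # Positive association: bounded by the repulsion-class expectations.
        d_max = min(p_i * (1 - q_j), (1 - p_i) * q_j)
    else:
        # Negative association: bounded by the coupling-class expectations.
        d_max = min(p_i * q_j, (1 - p_i) * (1 - q_j))
    return d / d_max if d_max > 0 else 0.0


def overall_d_prime(hap_freqs):
    """Overall D' = sum over i, j of p_i * q_j * |D'_ij|.

    hap_freqs: dict mapping (allele_at_A, allele_at_B) to haplotype
    frequency; haplotypes missing from the dict are treated as frequency 0.
    """
    p, q = {}, {}
    for (i, j), x in hap_freqs.items():
        p[i] = p.get(i, 0.0) + x
        q[j] = q.get(j, 0.0) + x
    return sum(
        p[i] * q[j] * abs(d_prime_ij(hap_freqs.get((i, j), 0.0), p[i], q[j]))
        for i in p
        for j in q
    )
```

Under this normalization, a haplotype that is absent from the sample gets D'ij = -1 whatever the allele frequencies, for example `d_prime_ij(0.0, 0.1, 0.2)`.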
The null hypothesis of gametic equilibrium for each pair of alleles ($D_{ij} = 0$) was tested by the $\chi^2_{ij}$ statistic, which approximates a χ² distribution with 1 d.f., where n is the number of individuals sampled.
... χ² test with Yates's correction to avoid spurious rejection of the null hypothesis when expectations are too small.
... χ² distributions, with (k - 1)(l - 1) d.f.
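As an illustration of a per-haplotype test, the sketch below computes a Yates-corrected chi-square on the collapsed 2 x 2 table and a one-sided P value for positive association. The exact statistic used in the study is not recoverable from the text above, so this is the textbook Yates-corrected form applied to haplotype counts, under the assumption that the estimated frequencies can be treated as counts among 2n gametes. The function name `yates_chi2_one_sided` is hypothetical.

```python
import math


def yates_chi2_one_sided(x_ij, p_i, q_j, n_individuals):
    """Return (chi2, one-sided P for D_ij > 0) for haplotype AiBj.

    Assumption: frequencies are converted to counts among 2n gametes.
    """
    n_gametes = 2 * n_individuals
    a = n_gametes * x_ij                      # AiBj (coupling)
    b = n_gametes * (p_i - x_ij)              # Ai, not Bj (repulsion)
    c = n_gametes * (q_j - x_ij)              # not Ai, Bj (repulsion)
    d = n_gametes * (1 - p_i - q_j + x_ij)    # not Ai, not Bj (coupling)
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins <= 0:
        return 0.0, 1.0  # a monomorphic margin carries no information
    cross = a * d - b * c
    # Yates's continuity correction, floored at zero.
    deviation = max(abs(cross) - n_gametes / 2, 0.0)
    chi2 = n_gametes * deviation ** 2 / margins
    # Upper tail of a chi-square with 1 d.f. via the complementary error function.
    p_two_sided = math.erfc(math.sqrt(chi2 / 2))
    p_one_sided = p_two_sided / 2 if cross > 0 else 1 - p_two_sided / 2
    return chi2, p_one_sided
```

For example, with hypothetical values `x_ij = 0.02`, `p_i = q_j = 0.1`, and 405 individuals, the haplotype is twice as frequent as expected at equilibrium and the one-sided test is clearly significant.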
Estimation of disequilibrium by sign:
Overall disequilibrium includes both positive and negative interallelic gametic associations. It was demonstrated that the statistical power of the chi-square test, for detecting disequilibrium between two loci with two alleles each, depends strongly on the sign of disequilibrium.
The aforementioned observations can be extended to multiallelic systems because significance of each interallelic association by the $\chi^2_{ij}$ statistic and its intensity by the $D'_{ij}$ coefficient are obtained by collapsing the data to a 2 x 2 contingency table, which has the same framework as those systems of two loci with two alleles each. For multiallelic systems, it is easy to verify that if $p_i < 0.5$ and $q_j < 0.5$, haplotypes involving the most or the least frequent "alleles" turn out to be the coupling classes ($A_iB_j$ and $A_{\bar{i}}B_{\bar{j}}$). If so, positive interallelic associations can be more easily detected by the chi-square tests. In addition, they provide a more accurate estimation of the strength of disequilibrium because positive $D'_{ij}$ estimates will have smaller sampling variance than negative $D'_{ij}$ ones of the same intensity under otherwise equivalent conditions. The reverse is true when repulsion classes ($A_iB_{\bar{j}}$, $A_{\bar{i}}B_j$) contain the most or the least frequent alleles. The significance of each positive (or negative) interallelic association was determined by the $\chi^2_{ij}$ statistic (one sided) with Yates' correction.
We derived two different measures of the strength of global disequilibrium, depending on the sign of the interallelic associations, which are defined as

where piqj(+) and piqj(-) are the expected frequencies of the haplotypes with positive [D'ij(+)] and negative [D'ij(-)] association, respectively. The range of D'(+) and D'(-) also varies from 0 to a maximum value close or equal to 1 (C. ZAPATA, unpublished results).
Significance of overall positive gametic disequilibrium was tested by means of standard simultaneous-inference statistical procedures. Thus, the null hypothesis of overall gametic equilibrium specifies that there is no disequilibrium for any pair of alleles (i.e., $D_{ij} = 0$) and was tested against the alternative $D_{ij} > 0$ from the individual $\chi^2_{ij}$ (one sided) with Yates' correction. Because of the large number of $D_{ij} > 0$ involved at each locus pair, it is likely that at least one $\chi^2_{ij}$ would be nominally significant, even if there were no real disequilibrium. The usual experimentwise error rate of 0.05 for multiple comparisons was controlled using the Bonferroni method.
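The Bonferroni step can be sketched in a few lines. This is an illustrative reading of the procedure described above rather than the authors' code: a locus pair is declared to be in overall positive disequilibrium when the smallest one-sided P value among its m positive interallelic comparisons does not exceed 0.05 / m. The function name and the example P values are hypothetical.

```python
def bonferroni_overall_test(p_values, alpha=0.05):
    """Test overall equilibrium from per-haplotype one-sided P values.

    Returns (significant, smallest_p, per_test_threshold).
    """
    m = len(p_values)
    if m == 0:
        return False, None, None  # no positive associations to test
    threshold = alpha / m          # Bonferroni-adjusted per-test level
    smallest = min(p_values)
    return smallest <= threshold, smallest, threshold
```

With four comparisons the per-test threshold is 0.0125, so a smallest P value of 0.001 rejects overall equilibrium, whereas a smallest P value of 0.02 does not, even though it is nominally significant.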
Relationship of disequilibrium with frequency of recombination and physical distance:
The correlation between recombination frequency (or physical distance) and the strength of association between loci was investigated using the Pearson product-moment correlation coefficient (r) and Kendall's nonparametric coefficient of rank correlation (τ).
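The two correlation coefficients can be computed directly from paired values of distance and disequilibrium strength. The sketch below is a plain-Python illustration with hypothetical function names; the Kendall coefficient is the simple tau-a form, which assumes no special treatment of ties, and no significance test is included.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def kendall_tau_a(x, y):
    """Kendall rank correlation (tau-a): (concordant - discordant) / all pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)
```

A negative value of either coefficient, computed on distance against D', corresponds to disequilibrium weakening as loci lie farther apart.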
| RESULTS |
|---|
Polymorphisms and Hardy-Weinberg proportions:
A total of 405 individuals were scored for their genotype at 12 microsatellite loci (CA repeats) distributed along 11p15 (Fig 1). The allele sizes and their frequency distributions at the 12 loci are shown in Fig 2. The distributions of allele frequencies vary considerably from locus to locus and appear to be bimodal at some loci. In addition, alleles of extreme size at the loci generally occur at very low frequencies. Levels of polymorphism at the 12 loci are shown in Table 2. The number of alleles per locus ranged from 8 (D11S926) to 15 (D11S1331) and averaged 12. An analysis of individual loci shows that the number of alleles is always higher in our sample than in that reported by GÉNÉTHON from a smaller sample size (data not shown). Of the 142 alleles, 69 (49%) had a frequency ≤3%. This high proportion of alleles occurring at low frequency distributes quite uniformly over loci. All systems are highly polymorphic, and unbiased expected allelic diversity estimates range from 0.62 (D11S1323) to 0.85 (D11S4121), with an average (±SE) of 0.75 ± 0.02. Overall, the distributions of allele frequencies and the levels of polymorphism found in our population are comparable to those reported in other studies of dinucleotide repeat markers.
From our genotypic data, it is not possible to distinguish the coupling and repulsion double heterozygotes, and estimates of two-locus haplotype frequencies were obtained using the EM algorithm. We examined whether there is evidence of deviations from HWP, as any departure from these conditions may lead to erroneous estimates of haplotype frequencies.
Interallelic disequilibrium:
A total of 6797 two-locus haplotypes can be evaluated between the alleles of the 12 microsatellite loci studied when only alleles occurring more than once in our sample (122 alleles) are considered. Fig 3A shows the distribution of the magnitude of interallelic disequilibrium measured by D'ij across all the pairwise comparisons. It can be seen that D'ij values span the whole range of possible values (from -1 to +1), but the proportion of haplotypes with negative deviations from random association is considerably higher than that with positive deviations. Of the 6797 haplotypes, 4287 (63%) showed negative deviations, whereas 2510 (37%) gave positive ones. This discrepancy is due mainly to the high proportion of D'ij values that are -1. Thus, a total of 2839 of the 6797 (42%) D'ij are -1, indicating the absence of many of the possible haplotypes. In contrast, only 0.6% (42/6797) of D'ij are found to be +1. On the other hand, only 364 of 6797 (5%) pairs of alleles are in significant disequilibrium (P < 0.05) by the χ² test (two sided).
The distribution of D'ij values can be more easily understood by taking into account that all the alleles, except one (at the D11S1323 locus), had a frequency <0.5 in the sample (Fig 2). This means that most coupling haplotype classes in the 2 x 2 contingency tables used to obtain the D'ij estimates involve the most or the least frequent alleles (see MATERIALS AND METHODS). This structure of the haplotype classes has two important consequences. First, sampling error would be sufficient to explain the high proportion of D'ij = -1, because those haplotypes carrying two alleles at very low frequencies are unlikely to be detected in the sample. Indeed, the proportion of D'ij = -1 decreased dramatically (from 42 to 8%) when rarer alleles (frequency ≤3%) were removed from the analysis (Fig 3B). Second, the sampling variance of negative interallelic associations is expected to be greater than that of positive ones, as described in MATERIALS AND METHODS. Therefore, negative associations will tend to overestimate disequilibrium intensity. In addition, negative associations will be less likely to be detected by significance tests in comparison to positive ones. These predictions are well supported by the analysis of our data. Thus, mean values of D'ij(-) and D'ij(+) across loci were -0.756 ± 0.006 and 0.164 ± 0.004, respectively. These differences are not completely explained by the high occurrence of both D'ij = -1 and alleles at a low frequency. If both D'ij = ±1 and rarer alleles (≤3%) are ignored, mean values of D'ij(-) and D'ij(+) were -0.258 ± 0.007 and 0.082 ± 0.002, respectively. On the other hand, percentages of significant D'ij(-) and D'ij(+) values calculated by the χ² test with Yates' correction (one sided) were 4% (160/4287) and 18% (456/2510), respectively.
Positive interallelic disequilibrium:
The aforementioned observations make it advisable to confine our attention to only interallelic disequilibrium with positive sign. Cases of significant positive interallelic disequilibrium (456/2510) are distributed across all 66 possible pairs of loci (Fig 4). However, percentages of significant disequilibrium between pairs of alleles, as well as their intensities, were not distributed uniformly across locus pairs. Percentages ranged from 6% (D11S4188 x D11S926) to 42% (D11S4124 x D11S1760) and averaged 18 ± 0.9%. The means of D'ij(+) over significant comparisons ranged from 0.171 ± 0.066 (D11S4188 x D11S926) to 0.565 ± 0.251 (D11S1323 x D11S1331) and averaged 0.345 ± 0.011. It must be noted that a substantial proportion of alleles are in moderate or strong disequilibrium (Fig 5). By way of illustration, 144 of 456 (32%) significant positive nonrandom associations exhibited D'ij(+) values >0.40. As is shown in detail below, the occurrence of patterns of significant interallelic associations dependent on the frequency and size of the alleles allows us to exclude that the detected associations can be explained merely by type I error.
Overall disequilibrium:
Pairs of loci in significant overall disequilibrium by likelihood-ratio χ² tests, as well as their intensities by D' disequilibrium coefficient, are given in Table 3. Only 4 of 66 pairwise comparisons (6%) were significantly different from 0 (P < 0.05). These results could be explained by type I error and seem to suggest that little global disequilibrium occurs between microsatellites located at 11p15. Nevertheless, likelihood-ratio χ² tests consider both positive as well as negative interallelic disequilibrium, which might diminish statistical power.
Positive overall disequilibrium:
Overall disequilibrium, analyzed by considering only those positive interallelic associations, presents quite a different picture. Pairs of loci in significant overall positive disequilibrium are shown in Table 4. Also shown is the lowest probability of those Yates-corrected $\chi^2_{ij}$ values for each locus pair, which allows us to reject the null hypothesis of no global disequilibrium (i.e., all Dij = 0) by using the Bonferroni criterion. It is noteworthy that a substantial number of pairs of loci are now in significant overall disequilibrium, despite the fact that the Bonferroni correction is believed to be very conservative (i.e., maintains the null hypothesis too frequently).
Disequilibrium depending on the allele frequency:
The objective of this analysis was to ask whether the amount of positive disequilibrium is related to the allele frequency at loci. To investigate this, we stratified the sample of two-locus haplotypes (2510) into RR, RC, and CC, where R and C indicate rarer (frequency ≤3%) and more common (frequency >3%) alleles, respectively. Then the amount of disequilibrium was analyzed separately for each haplotype data set. The results are shown in Table 5. It can be seen that disequilibrium is not randomly distributed with respect to these three haplotype data sets. The percentage of significant disequilibria is considerably higher in RR haplotypes (61%) than in RC (17%) and CC (14%) haplotypes. Differences of disequilibrium between RR haplotypes and any of the other haplotype classes are highly significant (P << 0.001) based on χ² tests for 2 x 2 contingency tables (1 d.f.). In addition, although CC haplotypes show somewhat less disequilibrium than RC haplotypes, differences are still significant (P < 0.05).
Overall, these results provide evidence that the frequency of significant disequilibrium depends strongly on the allele frequencies. This cannot be exclusively attributable to differences in statistical power, because the power of the χ² test is expected to decline as allele frequencies become extreme.
... χ² tests (i.e., that reject the null hypothesis too frequently). If so, some disequilibrium occurring between rarer alleles (frequency ≤3%) could be artifactual. However, the χ² test is robust and generally provides actual significance levels close to or smaller than the usual nominal level of 0.05.
... χ² test with Yates' correction, which is commonly considered to be conservative.
... χ² with Yates' correction by Monte Carlo simulations under the conditions of our experiment. We constructed populations for a system of two loci with two alleles each, with genotype frequencies following HWP and haplotype frequencies at gametic equilibrium, for given allelic frequencies. Then, 10,000 replicate random genotype samples of size 405 were drawn from each of these populations with replacement. Haplotype frequencies were determined from each of the 10,000 random genotype samples. The χ² test with Yates' correction (one sided) was computed for each sample, and type I error probabilities were calculated as the proportion of significant positive disequilibria at the α = 0.05 nominal significance level. Simulation results show that those empirical type I errors of the χ² test with Yates' correction are always less than the designed type I error of α = 0.05, especially when allele frequencies are extreme (Table 6). Therefore, it can be concluded that those significant interallelic disequilibria detected at 11p15 are genuine and not an artifact resulting from the lack of rigor in the tests for significance. On the contrary, it is likely that disequilibrium between rarer alleles is being underestimated.
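The logic of the Monte Carlo check can be sketched as follows. This is a simplified illustration, not a reproduction of the simulations reported above: it samples 2n gametes directly from a population at gametic equilibrium instead of sampling genotypes and re-estimating haplotype frequencies, and it uses the textbook Yates-corrected 2 x 2 chi-square. The function names, the seed, and the default arguments are hypothetical.

```python
import math
import random


def one_sided_yates_p(a, b, c, d):
    """One-sided P for positive association in a 2 x 2 table of gamete counts.

    Tables with no positive deviation return 1.0 by convention, because only
    positive disequilibria are counted as rejections here.
    """
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    cross = a * d - b * c
    if margins == 0 or cross <= 0:
        return 1.0
    deviation = max(cross - n / 2, 0.0)   # Yates's continuity correction
    chi2 = n * deviation ** 2 / margins
    return 0.5 * math.erfc(math.sqrt(chi2 / 2))


def empirical_type1_error(p, q, n_individuals=405, replicates=10000,
                          alpha=0.05, seed=0):
    """Proportion of equilibrium samples declared in positive disequilibrium."""
    rng = random.Random(seed)
    # Haplotype classes AB, Ab, aB, ab at gametic equilibrium.
    weights = [p * q, p * (1 - q), (1 - p) * q, (1 - p) * (1 - q)]
    n_gametes = 2 * n_individuals
    rejections = 0
    for _ in range(replicates):
        counts = [0, 0, 0, 0]
        for h in rng.choices(range(4), weights=weights, k=n_gametes):
            counts[h] += 1
        if one_sided_yates_p(*counts) < alpha:
            rejections += 1
    return rejections / replicates
```

Calling `empirical_type1_error(0.03, 0.03)` and `empirical_type1_error(0.3, 0.3)` gives a rough way to compare how conservative the corrected test is for rare versus common alleles under these simplified conditions.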
Disequilibrium depending on the allele size:
We examined whether the amount of positive disequilibrium is related to allele size. Fig 2 shows that all alleles at the upper and lower extremes of the size distributions (except allele 92 at locus D11S4121) are found at low frequencies (≤3%). Accordingly, those alleles in the left and right tails of the size distributions that were present at a frequency ≤3% and that were separated by alleles more frequent than 3% were designated extreme-size (E) alleles, while the remaining alleles were designated intermediate-size (I) alleles. For example, only alleles 121, 143, and 145 at the D11S1318 locus are E alleles under this criterion (see Fig 2). Having established the categories of allele sizes, we stratified two-locus haplotypes as EE, EI, and II, and disequilibrium was analyzed for each haplotype class separately (Table 7). The results obtained show that disequilibrium is very heterogeneous across these haplotype data sets. Thus, EE haplotypes exhibit much greater relative amounts of disequilibrium than do the remaining haplotypes (Table 7). We found that disequilibrium occurs in 74% of EE haplotypes, but in only 21 and 15% of EI and II haplotypes, respectively. These differences in the amount of disequilibrium become highly significant (P < 0.001) when evaluated by χ² tests in 2 x 2 contingency tables. In addition, this pattern of disequilibrium dependent on allele size cannot be explained by variations in recombination, since the mean of the recombination frequency within each haplotype class is very similar (~7 cM). To minimize differences in statistical power among comparisons that arise from variable polymorphisms associated with size classes, the data were reanalyzed excluding two-locus haplotypes with alleles at frequencies >3%. The results of this analysis, however, do not qualitatively change the conclusions, which retain the same aforementioned pattern of size-dependent disequilibrium (Table 7). Therefore, the occurrence of interallelic disequilibrium is not only dependent on the allele frequency but also on the size of the alleles.
Relationship of disequilibrium with frequency of recombination and physical distance:
The 12 microsatellites chosen for this study span ~19 cM, or 14 Mb, on chromosome 11p15. Recombination frequencies (in centimorgans) and physical distances (in megabases) for pairwise comparisons in significant overall disequilibrium are shown in Table 4. It can be seen clearly that disequilibrium is not confined to the closest pairs of loci. The largest recombination frequencies between the loci in disequilibrium were 17–19 cM (three pairs), whereas the largest physical distances were 12–14 Mb (six pairs). The recombination frequency and physical distance across significant pairs averaged 7.55 ± 1.02 cM and 5.96 ± 0.82 Mb, respectively. Note that disequilibrium was detected even between D11S4177 and D11S4121, the most distant pair in the set that we studied (19 cM and 14 Mb). We determined whether significant overall nonrandom associations decrease as recombination frequency or physical distance increases. Since loci located more distantly from one another undergo recombination more frequently, they are expected to exhibit less disequilibrium than more closely linked loci if evolutionary agents generating or randomizing nonrandom associations act uniformly across the chromosomal region studied. For the comparison, we classified the number of significant pairwise comparisons according to the same arbitrary criterion used in previous disequilibrium studies.
... (χ² = 0.07, P = 0.79) or physical distance (χ² = 0.26, P = 0.61).
We looked for a negative relationship of the strength of overall and interallelic disequilibrium with the recombination frequency and physical distance for the 12 microsatellite loci spanning much of the 11p15 chromosome region. Only the pairs of alleles and loci in significant disequilibrium were included in this analysis. A weak but significant negative correlation was found by the Mantel test (one sided) between D'(+) and recombination frequency (r = -0.226, τ = -0.177, P = 0.037, n = 29). In contrast, a negative but not significant correlation was detected with physical distance (r = -0.179, τ = -0.151, P = 0.079, n = 29). On the other hand, D'ij(+) values do not appear to be correlated with recombination frequency (r = -0.068, τ = -0.041, P = 0.098, n = 456) or with physical distance (r = -0.074, τ = -0.042, P = 0.071, n = 456). However, we found significant negative correlations between D'ij(+) and recombination frequency when only alleles present at higher frequencies are considered, and indeed the strength of the correlation showed an increasing trend with allele frequency (e.g., r = -0.192, τ = -0.095, P = 0.019, n = 96 for alleles at a frequency >6%; and r = -0.284, τ = -0.128, P = 0.017, n = 50 for alleles at a frequency >9%). A similar trend was detected between the strength of interallelic disequilibrium and physical distance (r = -0.198, τ = -0.071, P = 0.015, n = 96 for alleles at a frequency >6%; and r = -0.209, τ = -0.079, P = 0.055, n = 50 for alleles at a frequency >9%).
| DISCUSSION |
|---|
This study is the most complete characterization to date of disequilibrium patterns between pairs of microsatellite markers over an extensive (19 cM and 14 Mb) anonymous region of the human genome. We provide, for the first time, estimates of the frequency and strength of both overall and interallelic disequilibria on the basis of 12 microsatellite loci (CA repeats) spanning human chromosome 11p15. We also derived a novel approach for analyzing disequilibrium between pairs of multiallelic loci on the basis of the sign of the interallelic associations. Our results demonstrate that the statistical power is much greater, for given allelic frequencies, in detecting positive rather than negative interallelic associations, which is in agreement with previous evidence on two-allele systems.
We found that all the loci studied were involved in significant disequilibrium, and disequilibrium was thus distributed over the entire 11p15 region. In total, 44% of pairs of loci were in significant overall disequilibrium, whereas 18% of pairs of alleles deviated significantly from random association. The strength of the significant overall disequilibria lay within a narrow interval of low values [D'(+) from 0.050 to 0.138]. In contrast, the strength of the significant interallelic disequilibria was very heterogeneous, ranging from weak to strong, with a substantial proportion (32%) of pairs showing strong nonrandom association [D'ij(+) > 0.40]. This does not contradict the finding that the strength of overall disequilibria across pairs of loci was weak: D'(+) is a measure of overall disequilibrium, which will adopt low values if the proportion of haplotypes at gametic equilibrium outweighs the proportion of haplotypes in disequilibrium.
Overall, the amount of disequilibrium detected in 11p15 is considerable given the high recombination frequencies for loci in disequilibrium (7 cM on average). Disequilibrium was detected even between the most distant locus pair in the set studied (D11S4177 x D11S4121, 19 cM and 14 Mb). Table 8 shows that the percentage of pairs of microsatellites in overall disequilibrium in 11p15 (44%) is much higher (~3–9 times higher) than previously reported for pairs of dinucleotide microsatellites along extensive anonymous regions of human chromosomes in fast-growing populations (5–14%).
A variety of factors may cause disequilibrium in populations, such as founder effects, genetic drift, stratification or admixture of populations, selection, and recent mutations (see Introduction for references). Founder effects appear to be an unlikely explanation for the observed disequilibria in 11p15, as the available genetic evidence on the demographic history of the Galician population is compatible with a population expansion model in Europe during the Upper Paleolithic, and there is no evidence for recent founder effects.
1/(1 + 4Nec)
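The expression 1/(1 + 4Nec) has the form of the classical approximation for the expected squared correlation between alleles at two neutral loci under the balance of genetic drift and recombination, with Ne read as the effective population size and c as the recombination fraction. Assuming that reading, the sketch below shows how quickly the expectation falls as c increases. The value Ne = 10,000 is a hypothetical round figure chosen only for illustration, and treating 7 cM as c = 0.07 is an approximation that ignores the mapping function.

```python
def expected_r2(n_e, c):
    """Approximate expected squared correlation under drift and recombination: 1 / (1 + 4 * Ne * c)."""
    return 1.0 / (1.0 + 4.0 * n_e * c)


n_e = 10_000  # hypothetical effective population size (assumption, not from the study)

# Recombination fractions from tightly linked markers up to roughly 7 cM.
for c in (0.0001, 0.001, 0.01, 0.07):
    print(f"c = {c:<6}  expected r^2 = {expected_r2(n_e, c):.5f}")
```

Under these assumptions the expected value at c = 0.07 is of the order of 0.0004, which gives a sense of how little disequilibrium drift alone would be expected to maintain between markers this far apart in a large population.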
A more likely explanation for the observed disequilibria may be related to the mutational dynamics of microsatellite loci. The analysis of patterns of disequilibrium dependent on allele frequency and size offers a new approach for interpreting associations between the microsatellites occurring in 11p15 on the basis of their mutational dynamics. An interesting finding of this analysis is that alleles present at low frequency (roughly 3% or less), as well as alleles of extreme size, are in disequilibrium much more often than the remaining alleles. This can be interpreted as a consequence of the high mutation rate (say 10⁻³) and complex mutational dynamics of microsatellite loci (see Introduction for references). Many of the disequilibria detected between rarer alleles might be the result of relatively recent mutational events. Low-frequency alleles are more likely to have arisen recently in the population (by the introduction of new mutations) than alleles of moderate frequency, because time is necessary for new alleles to become common.
With regard to the higher frequency of disequilibrium in haplotypes bearing alleles of extreme size, previous studies have already reported evidence for a relationship between disequilibrium and allele size for tightly linked microsatellite loci in human populations. A positive correlation between the sizes of two alleles in each haplotype was reported for two pairs of microsatellites separated by 212 and 7 kb.
Gametic disequilibrium analysis in humans has proven extremely useful for fine mapping of a large number of loci that have major effects in rare Mendelian diseases. In contrast, the usefulness of whole-genome association studies, recently proposed as a means of identifying the numerous genes of weak effect that underlie susceptibility to common diseases, remains controversial.
0.20 [D'ij(+) 0.47]; indeed d² was in some cases as high as 0.42 [D'ij(+) = 0.74].
We detected no overall relationship between the strength of interallelic disequilibrium and either recombination frequency or physical distance. A variety of factors may disturb the expected negative relationship between disequilibrium strength and recombination frequency (or physical distance) for distant markers.
The amount of disequilibrium may differ greatly across chromosomes, so there is a clear need for a whole-genome gametic disequilibrium map.
| FOOTNOTES |
|---|
We dedicate this article to our friend, colleague, and coauthor Guillermo Visedo, who died on March 8, 1998.
| ACKNOWLEDGMENTS |
|---|
We thank Deborah Charlesworth and three anonymous reviewers for useful comments. We also thank E. Vázquez Martul of the Hospital Juan Canalejo (La Coruña, Spain) for blood samples. This work was supported by grants XUGA 20002B95 and 20001B97 (to C.Z.) from the Xunta de Galicia (Spain).
Manuscript received November 17, 2000; Accepted for publication April 13, 2001.
| LITERATURE CITED |
|---|
AGRESTI, A., 1990 Categorical Data Analysis. John Wiley & Sons, New York.
BERNARDI, G., 1995 The human genome: organization and evolutionary history. Annu. Rev. Genet. 29:445-476.
BODMER, W. F., 1986 Human genetics: the molecular challenge. Cold Spring Harbor Symp. Quant. Biol. 51:1-13.
BOEHNKE, M., 2000 A look at linkage disequilibrium. Nat. Genet. 25:246-247.
BROWN, A. H. D., 1975 Sample sizes required to detect linkage disequilibrium between two or three loci. Theor. Popul. Biol. 8:184-201.
BUDOWLE, B., R. CHAKRABORTY, A. M. GIUSTI, A. J. EISENBERG, and R. C. ALLEN, 1991 Analysis of the VNTR locus D1S80 by the PCR followed by high-resolution PAGE. Am. J. Hum. Genet. 48:137-144.
CAMILLI, G. and K. D. HOPKINS, 1978 Applicability of chi-square to 2 x 2 contingency tables with small expected cell frequencies. Psychol. Bull. 85:163-167.
CHAKRABORTY, R. and K. WEISS, 1988 Admixture as a tool for finding linked genes and detecting that difference from allelic association between loci. Proc. Natl. Acad. Sci. USA 85:9119-9123.
COLLINS, A., C. LONJOU, and N. E. MORTON, 1999 Genetic epidemiology of single-nucleotide polymorphisms. Proc. Natl. Acad. Sci. USA 96:15173-15177.
CURIE-COHEN, M., 1982 Estimates of inbreeding in a natural population: a comparison of sampling properties. Genetics 100:339-358.
D'AGOSTINO, R. B., W. CHASE, and A. BELANGER, 1988 The appropriateness of some common procedures for testing the equality of two independent binomial populations. Am. Stat. 42:198-202.
DEKA, R., L. JIN, M. D. SHRIVER, L. M. YU, and S. DECROO et al., 1995 Population genetics of dinucleotide (dC-dA)n · (dG-dT)n polymorphisms in world populations. Am. J. Hum. Genet. 56:461-474.
DEMPSTER, A. P., N. M. LAIRD, and D. B. RUBIN, 1977 Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. B 39:1-38.
EAVES, I. A., T. R. MERRIMAN, R. A. BARBER, S. NUTLAND, and E. TUOMILEHTO-WOLF et al., 2000 The genetically isolated populations of Finland and Sardinia may not be a panacea for linkage disequilibrium mapping of common disease genes. Nat. Genet. 25:320-323.
ELLEGREN, H., 2000 Heterogeneous mutation processes in human microsatellite DNA sequences. Nat. Genet. 24:400-402.
EVERITT, B. S., 1997 The Analysis of Contingency Tables. Chapman & Hall, London.
FALLIN, D. and N. J. SCHORK, 2000 Accuracy of haplotype frequency estimation for biallelic loci, via the Expectation-Maximization algorithm for unphased diploid genotype data. Am. J. Hum. Genet. 67:947-959.
FALUSH, D. and Y. IWASA, 1999 Size-dependent mutability and microsatellite constraints. Mol. Biol. Evol. 16:960-969.
FRANKLIN, I. and R. C. LEWONTIN, 1970 Is the gene the unit of selection? Genetics 65:707-734.
FREIMER, N. B., S. K. SERVICE, and M. SLATKIN, 1997 Expanding on population studies. Nat. Genet. 17:371-373.
GARZA, J. C., M. SLATKIN, and N. B. FREIMER, 1995 Microsatellite allele frequencies in humans and chimpanzees, with implications for constraints on allele size. Mol. Biol. Evol. 12:594-603.
GOLDSTEIN, D. B. and D. D. POLLOCK, 1997 Launching microsatellites: a review of mutation processes and methods of phylogenetic inference. J. Hered. 88:335-342.
GORDON, D., I. SIMONIC, and J. OTT, 2000 Significant evidence for linkage disequilibrium over a 5 cM region among Afrikaners. Genomics 66:87-92.
GUO, S. W. and E. A. THOMPSON, 1992 Performing the exact test of Hardy-Weinberg proportion for multiple alleles. Biometrics 48:361-372.
HAAS, H., B. BUDOWLE, and G. WEILER, 1994 Horizontal polyacrylamide gel electrophoresis for the separation of DNA fragments. Electrophoresis 15:153-158.
HÄSTBACKA, J., A. DE LA CHAPELLE, M. M. MAHTANI, G. CLINES, and M. P. REEVE-DALY et al., 1994 The diastrophic dysplasia gene encodes a novel sulfate transporter: positional cloning by fine-structure linkage disequilibrium mapping. Cell 78:1073-1087.
HEDRICK, P. W., 1987 Gametic disequilibrium measures: proceed with caution. Genetics 117:331-341.
HEDRICK, P., S. JAIN, and L. HOLDEN, 1978 Multilocus systems in evolution. Evol. Biol. 11:101-184.
HILL, W. G., 1974 Estimation of linkage disequilibrium in randomly mating populations. Heredity 33:229-239.
HILL, W. G. and A. ROBERTSON, 1968 Linkage disequilibrium in finite populations. Theor. Appl. Genet. 38:226-231.
HOLM, S., 1979 A simple sequentially rejective multiple test procedure. Scand. J. Stat. 6:65-70.
HUTTLEY, G. A., M. W. SMITH, M. CARRINGTON, and S. J. O'BRIEN, 1999 A scan for linkage disequilibrium across the human genome. Genetics 152:1711-1722.
JIN, L., C. MACAUBAS, J. HALLMAYER, A. KIMURA, and E. MIGNOT, 1996 Mutation rates among alleles at a microsatellite locus: phylogenetic evidence. Proc. Natl. Acad. Sci. USA 93:15285-15288.
JORDE, L. B., 2000 Linkage disequilibrium and the search for complex disease genes. Genome Res. 10:1435-1444.
KARLIN, S. and A. PIAZZA, 1981 Statistical methods for assessing linkage disequilibrium at the HLA-A, B, C loci. Ann. Hum. Genet. 45:79-94.
KENDALL, F. B. A. and A. STUART, 1979 The Advanced Theory of Statistics. Vol 2. Inference and Relationship. Charles Griffin & Company, London.
KEREM, B. S., J. M. ROMMENS, J. A. BUCHANAN, D. MARKIEWICZ, and T. K. COX et al., 1989 Identification of the cystic fibrosis gene: genetic analysis. Science 245:1073-1080.
KOSAMBI, D. D., 1944 The estimation of the map distance from recombination values. Ann. Eugen. 12:172-175.
KRUGLYAK, L., 1999 Prospects for whole-genome linkage disequilibrium mapping of common disease genes. Nat. Genet. 22:139-144.
LAAN, M. and S. PÄÄBO, 1997 Demographic history and linkage disequilibrium in human populations. Nat. Genet. 17:435-438.
LEWONTIN, R. C., 1964 The interaction of selection and linkage. I. General considerations: heterotic models. Genetics 49:49-67.
LEWONTIN, R. C., 1974 The Genetic Basis of Evolutionary Change. Columbia University Press, New York.
LEWONTIN, R. C., 1985 Population genetics. Annu. Rev. Genet. 19:81-102.
LU, H. L. and S. NÈGRE, 1993 Use of glycerol for enhanced efficiency and specificity of PCR amplification. Trends Genet. 9:297.
MACAUBAS, C., L. JIN, J. HALLMAYER, A. KIMURA, and E. MIGNOT, 1997 The complex mutation pattern of a microsatellite. Genome Res. 7:635-641.
MANTEL, N., 1967 The detection of disease clustering and a generalized regression approach. Cancer Res. 27:209-220.
NEI, M., 1987 Molecular Evolutionary Genetics. Columbia University Press, New York.
NEI, M. and D. GRAUR, 1984 Extent of protein polymorphism and the neutral mutation theory. Evol. Biol. 17:73-118.
NEI, M. and W. H. LI, 1973 Linkage disequilibrium in subdivided populations. Genetics 75:213-219.
OHTA, T., 1982 Linkage disequilibrium due to random genetic drift in finite subdivided populations. Proc. Natl. Acad. Sci. USA 79:1940-1944.
OTT, J., 1999 Analysis of Human Genetic Linkage. The Johns Hopkins University Press, Baltimore.
OTT, J. and D. RABINOWITZ, 1997 The effect of marker heterozygosity on the power to detect linkage disequilibrium. Genetics 147:927-930.
PENA, S. D. J., K. T. SOUZA, M. ANDRADE, and R. CHAKRABORTY, 1994 Allelic associations of two polymorphic microsatellites in intron 40 of the human von Willebrand factor gene. Proc. Natl. Acad. Sci. USA 91:723-727.
PETERSON, A. C., A. DI RIENZO, A. E. LEHESJOKI, A. DE LA CHAPELLE, and M. SLATKIN et al., 1995 The distribution of linkage disequilibrium over anonymous genome regions. Hum. Mol. Genet. 4:887-894.
RICE, W. R., 1989 Analyzing tables of statistical tests. Evolution 43:223-225.
RISCH, N. J., 2000 Searching for genetic determinants in the new millennium. Nature 405:847-856.
RISCH, N. and K. MERIKANGAS, 1996 The future of genetic studies of complex human diseases. Science 273:1516-1517.
ROBERTSON, A. and W. G. HILL, 1984 Deviations from Hardy-Weinberg proportions: sampling variances and use in estimation of inbreeding coefficients. Genetics 107:703-718.
ROTHMAN, K. J., 1990 No adjustments are needed for multiple comparisons. Epidemiology 1:43-46.
SALAS, A., D. COMAS, M. V. LAREU, J. BERTRANPETIT, and A. CARRACEDO, 1998 mtDNA analysis of the Galician population: a genetic edge of European variation. Eur. J. Hum. Genet. 6:365-375.
SAMADI, S., F. ERARD, A. ESTOUP, and P. JARNE, 1998 The influence of mutation, selection and reproductive systems on microsatellite variability: a simulation approach. Genet. Res. 71:213-222.
SCHLÖTTERER, C. and D. TAUTZ, 1992 Slippage synthesis of simple sequence DNA. Nucleic Acids Res. 20:211-215.
SCHLÖTTERER, C., R. RITTER, B. HARR, and G. BREM, 1998 High mutation rate of a long microsatellite allele in Drosophila melanogaster provides evidence for allele-specific mutation rates. Mol. Biol. Evol. 15:1269-1274.
SCHNEIDER, S., D. ROESSLI, and L. EXCOFFIER, 2000 Arlequin, ver. 2000. Genetics and Biometry Laboratory, University of Geneva, Switzerland. Available from http://anthropologie.unige.ch/arlequin/.
SHERRINGTON, R., G. MELMER, M. DIXON, D. CURTIS, and B. MANKOO et al., 1991 Linkage disequilibrium between two highly polymorphic microsatellites. Am. J. Hum. Genet. 49:966-971.
SLATKIN, M., 1994 Linkage disequilibrium in growing and stable populations. Genetics 137:331-336.
SLATKIN, M. and L. EXCOFFIER, 1996 Testing for linkage disequilibrium in genotypic data using the Expectation-Maximization algorithm. Heredity 76:377-383.
SMOUSE, P. E., J. C. LONG, and R. R. SOKAL, 1986 Multiple regression and correlation extensions of the Mantel test of matrix correspondence. Syst. Zool. 35:627-632.
SOKAL, R. R. and F. J. ROHLF, 1995 Biometry. W. H. Freeman, New York.
STEPHENS, J. C., D. BRISCOE, and S. J. O'BRIEN, 1994 Mapping by admixture linkage disequilibrium in human populations: limits and guidelines. Am. J. Hum. Genet. 55:809-824.
TAILLON-MILLER, P., I. BAUER-SARDIÑA, N. L. SACCONE, J. PUTZEL, and T. LAITINEN et al., 2000 Juxtaposed regions of extensive and minimal linkage disequilibrium in human Xq25 and Xq28. Nat. Genet. 25:324-328.
TAKAHATA, N. and Y. SATTA, 1997 Evolution of the primate lineage leading to modern humans: phylogenetic and demographic inferences from DNA sequences. Proc. Natl. Acad. Sci. USA 94:4811-4815.
TERWILLIGER, J. D. and K. M. WEISS, 1998 Linkage disequilibrium mapping of complex disease: fantasy or reality? Curr. Opin. Biotechnol. 9:578-594.
THOMPSON, E. A., S. DEEB, D. WALKER, and A. G. MOTULSKY, 1988 The detection of linkage disequilibrium between closely linked markers: RFLPs at the AI-CIII apolipoprotein genes. Am. J. Hum. Genet. 42:113-124.
UPTON, G. J. C., 1982 A comparison of alternative tests for the 2 x 2 comparative trial. J. R. Stat. Soc. A 145:86-105.
VALDES, A. M., M. SLATKIN, and N. B. FREIMER, 1993 Allele frequencies at microsatellite loci: the stepwise mutation model revisited. Genetics 133:737-749.
WEI, C. C., F. T. CHIANG, K. S. LIN, and L. I. LIN, 1999 The spectrum of microsatellite loci on chromosomes 7 and 8 in Taiwan aboriginal populations: a comparative population genetic study. Hum. Genet. 104:333-340.
WEIR, B. S., 1979 Inferences about linkage disequilibrium. Biometrics 35:235-254.
WEIR, B. S. and C. C. COCKERHAM, 1978 Testing hypotheses about linkage disequilibrium with multiple alleles. Genetics 88:633-642.
WEISSENSTEINER, T. and J. S. LANCHBURY, 1996 Strategy for controlling preferential amplification and avoiding false negatives in PCR typing. BioTechniques 21:1102-1108.
WILSON, J. F. and D. B. GOLDSTEIN, 2000 Consistent long-range linkage disequilibrium generated by admixture in a Bantu-Semitic hybrid population. Am. J. Hum. Genet. 67:926-935.
ZAPATA, C., 2000 The D' measure of overall gametic disequilibrium between pairs of multiallelic loci. Evolution 54:1809-1812.
ZAPATA, C. and G. ALVAREZ, 1992 The detection of gametic disequilibrium between allozyme loci in natural populations of Drosophila. Evolution 46:1900-1917.
ZAPATA, C. and G. ALVAREZ, 1993 On the detection of nonrandom associations between DNA polymorphisms in natural populations of Drosophila. Mol. Biol. Evol. 10:823-841.
ZAPATA, C. and G. ALVAREZ, 1997a On Fisher's exact test for detecting gametic disequilibrium between DNA polymorphisms. Ann. Hum. Genet. 61:71-77.
ZAPATA, C. and G. ALVAREZ, 1997b Testing for homogeneity of gametic disequilibrium among populations. Evolution 51:606-607.
ZAPATA, C. and G. VISEDO, 1995 Gametic disequilibrium and physical distance. Am. J. Hum. Genet. 57:190-191.
ZAPATA, C., G. ALVAREZ, and C. CAROLLO, 1997 Approximate variance of the standardized measure of gametic disequilibrium D'. Am. J. Hum. Genet. 61:771-774.
ZERBA, K. E., A. M. KESSLING, J. DAVIGNON, and C. F. SING, 1991 Genetic structure and the search for genotype-phenotype relationships: an example from disequilibrium in the Apo B gene region. Genetics 129:525-533.