- THIS ARTICLE
-
Abstract
- Full Text (PDF)
-
All Versions of this Article:
genetics.105.047431v1
genetics.105.047431v2
171/4/2085 most recent - Alert me when this article is cited
- Alert me if a correction is posted
- SERVICES
- Email this article to a friend
- Similar articles in this journal
- Similar articles in PubMed
- Alert me to new issues of the journal
- Download to citation manager
- Reprints & Permissions
- CITING ARTICLES
- Citing Articles via HighWire
- Citing Articles via Google Scholar
- GOOGLE SCHOLAR
- Articles by Clarke, G. M.
- Articles by Cardon, L. R.
- Search for Related Content
- PUBMED
- PubMed Citation
- Articles by Clarke, G. M.
- Articles by Cardon, L. R.
Originally published as Genetics Published Articles Ahead of Print on August 22, 2005.
Genetics, Vol. 171, 2085-2095, December 2005, Copyright © 2005
doi:10.1534/genetics.105.047431
Disentangling Linkage Disequilibrium and Linkage From Dense Single-Nucleotide Polymorphism Trio Data
Geraldine M. Clarke1 and Lon R. Cardon
Wellcome Trust Centre for Human Genetics, Oxford University, Oxford OX3 7BN, United Kingdom
1 Corresponding author: Wellcome Trust Centre for Human Genetics, Oxford University, Roosevelt Dr., Oxford OX3 7BN, United Kingdom.
E-mail: gclarke{at}well.ox.ac.uk
Parent-offspring trios are widely collected for disease gene-mapping studies and are being extensively genotyped as part of the International HapMap Project. With dense maps of markers on trios, the effects of LD and linkage can be separated, allowing estimation of recombination rates in a model-free setting. Here we define a model-free multipoint method on the basis of dense sequence polymorphism data from parent-offspring trios to estimate intermarker recombination rates. We use simulations to show that this method has up to 92% power to detect recombination hotspots of intensity 25 times background over a region of size 10 kb typed at density 1 marker per 2.5 kb and almost 100% power to detect large hotspots of intensity >125 times background over regions of size 10 kb typed with just 1 marker per 5 kb (
= 0.05). We found strong agreement at megabase scales between estimates from our method applied to HapMap trio data and estimates from the genetic map. At finer scales, using Centre d'Etude du Polymorphisme Humain (CEPH) pedigree data across a 10-Mb region of chromosome 20, a comparison of population recombination rate estimates obtained from our method with estimates obtained using a coalescent-based approximate-likelihood method implemented in PHASE 2.0 shows detection of the same coldspots and most hotspots: The Spearman rank correlation between the estimates from our method and those from PHASE is 0.58 (p < 2.216).
LINKAGE disequilibrium (LD) refers to the presence of nonrandom association among two or more alleles at distinct loci. The degree of LD between alleles is determined by an evolutionary process combining the effects of natural selection, mutation, genetic drift, and population admixture. Appreciable LD is likely to be present only between two loci that are tightly linked. For this reason, attempts have been made to use the magnitude of LD between two loci to obtain estimates of recombination fractions between loci, e.g., SERRE et al. (1990). Transmission of haplotypes from parents to offspring is dependent on both LD and linkage (SPIELMAN et al. 1993). Specifically, the extent of LD between two distinct loci decays exponentially at a rate proportional to the recombination fraction c between the loci: If the disequilibrium between the two loci is D0 at time 0, then after k generations the disequilibrium will be
![]() | (1) |
More sophisticated methods applying models to account for stochastic effects have been developed and used successfully for the fine mapping of disease genes using allelic associations in isolated populations, for example, those in HASTBACKA et al. (1992), KAPLAN and WEIR (1995), KAPLAN et al. (1995), and DEVLIN et al. (1996). Also, many coalescent model-based methods exist for the estimation of recombination rates from patterns of sequence variation in population genetic data. Methods include moment-based estimators (WALL 2000) and full-likelihood (GRIFFITHS and MARJORAM 1996; KUHNER et al. 2000; NIELSON 2000; FEARNHEAD and DONNELLY 2001) and approximate-likelihood approaches (HUDSON 2001; FEARNHEAD and DONNELLY 2002; MCVEAN et al. 2002; LI and STEPHENS 2003). Methods are further described in STUMPH and MCVEAN (2003).
We define a hotspot of intensity s to be a region of length 10 kb where crossovers occur a frequency of more than s times that in the surrounding region. We use simulations to show that, using 200 trio families, our method can reliably detect recombination hotspots of intensity 10 across a region typed with 20 markers at a density of 1 marker per 2.5 kb (P < 0.025). Hotspots of an intensity >25 can be detected by typing as few as 5 markers at a density of 1 marker per 2.5 kb (P < 0.001). We show that the number of markers required to achieve a given accuracy decreases with marker density. Results suggest that this method is able to detect fine-scale variation in recombination rate according to the density of typed markers: That is, the more dense the marker spacing is, the more fine scale the variation that can be detected. To validate our approach, we estimated recombination rates using HapMap Centre d'Etude du Polymorphisme Humain (CEPH) trio data from various chromosomes and compared averages over a megabase scale to estimates from the genetic map (KONG et al. 2002). At a finer scale, we compared recombination rate estimates in European (CEPH) populations between adjacent SNPs on a 10-Mb region of chromosome 20 made using our model-free method to independent estimates made from a coalescent model-based approximate-likelihood approach implemented in the program PHASE 2.0 (STEPHENS et al. 2001; STEPHENS and DONNELLY 2003). These independent estimates are described in detail in EVANS and CARDON (2005).
- For all pairs of markers, we first estimate the probability of observing each haplotype in the parental population. This provides an estimate of LD in the parental generation.
- For each pair of markers, we write down the likelihood of transmitting the observed offspring haplotype counts conditional on the LD in the parental generation, that is, assuming the LD in the parental generation is a known parameter. The likelihood of transmitted haplotype counts at each pair of markers is a function of the recombination fraction between the markers. The recombination fraction is defined as the probability that a transmitted haplotype constitutes a new combination of alleles different from that of either parental haplotype. Maximum-likelihood estimates (MLEs) of the recombination fraction are then obtained, independently, at all pairs of markers.
- Finally, for any given pair of adjacent markers, all estimates from marker pairs that overlap the adjacent pair, up to a given distance apart, are scaled according to the interval spanned by the pair of markers relative to the interval spanned by the adjacent markers under consideration. Each scaled estimate represents a new estimate of the recombination fraction between the adjacent pair of markers. The original estimate between each pair of markers and all new estimates are then averaged (weighted by variance) to form a final, new estimate of the recombination fraction between each pair of adjacent markers. These steps are now described in greater detail.
Step 1determine LD in the parental generation:
Consider any two biallelic markers with alleles arbitrarily labeled A and a at the first locus and B and b at the second locus. Let c be the recombination fraction between the markers. We assume that haplotypes are available or have been estimated from a sample of N parent-offspring trios. The four possible haplotypes are ab, aB, Ab, and AB and the observed number of these haplotypes in the parental generation is denoted by m = (mab, maB, mAb, mAB), with mab + maB + mAb + mAB = 4N. The likelihood of a specific sample configuration is
![]() | (2) |
and is denoted by
and likewise for paB, pAb, and pAB. Let estimates of major allele frequencies at each locus be given by
and
. Using the standard notation for an LD coefficient, we let
![]() |
![]() |
Step 2infer recombination fraction c between markers:
Let p'ab be the frequency of haplotype ab in the offspring generation. Then, by conditioning on all possible parental haplotype combinations from which an ab haplotype can be transmitted, it is easy to see that
![]() | (3) |
![]() |
![]() |
![]() |
Let the observed number of haplotypes in the offspring generation be denoted by
![]() |
![]() | (4) |
of c are obtained by selecting the value of c that maximizes Equation 4. The information I(c) for c is defined as minus the second derivative of Equation 4:
![]() | (5) |
is the observed information and is inversely proportional to the asymptotic variance of
. Let
denote the variance of estimate
. Observe that when
is small, as would be the case in recombination hotspots, the likelihood has less dependence on the exact value of c and MLEs will have large variance. Conversely, when
is large, estimates of c will be more precise with small variance.
Step 3use multiple closely linked flanking markers to refine estimate of c:
Let di,j be the distance, in base pairs, between any two markers i and j. Let ci,j represent the recombination fraction between markers i and j and let
be the MLE of ci,j obtained using the method described above. Individual estimates of
can be highly variable. To refine our estimate
between adjacent markers i and i + 1, we combine estimates
obtained at multiple pairs of markers j and k up to w markers away from markers i and i + 1, i.e., pairs of markers in S := {(j, k) : max(0, i + 1 w)
j < k
min(i + w, L)}. These represent all the markers that overlap markers i and i + 1 and are within w markers of them and are referred to as the flanking markers. We assume that the rate at which recombination occurs remains constant over the region of the chromosome spanned by the flanking markers. Since recombination fractions are approximately additive over short distances and our markers are densely spaced, we assume that the intermarker recombination fractions estimated between pairs of flanking markers are scaled estimates of the intermarker recombination fraction between markers i and i + 1. Specifically, let
![]() |
is
![]() |
We form a new estimate
of ci,i+1, by taking the weighted average of all estimates
, (j, k)
S. Specifically, let
![]() | (6) |
![]() |
|
If estimates are independent and N large then
is approximately distributed as a normal random variable with mean ci,i+1 and variance
(EDWARDS 1992), where
![]() | (7) |
However, estimates
, (j, k)
S, are, in fact, dependent. For the analysis presented here, we violate the assumption of independence and use the formula given in Equation 7 to get rough approximations of normal confidence intervals. Refer to the DISCUSSION for alternative possibilities for computing the variance.
Simulations:
To test the accuracy of our method for estimating recombination fractions between loci we used the program fin (software available on request from G. A. T. Mcvean, mcvean{at}stats.ox.ac.uk) to generate SNP genotype data simulated from a standard coalescent neutral model of genetic variation. The basic simulation involved 400 sequences of 400 SNPs with an overall population recombination rate,
= 4Ner = 400, where Ne is the effective population size and r is the per generation recombination rate, that is, the probability of a crossover per generation across the 1-Mb region. For an effective population size of Ne = 10,000, then r = 0.01 over the 1-Mb region, equivalent to a genetic distance of 1 cM/Mb, which is approximately the rate seen in human data derived from observed crossovers in pedigrees (KONG et al. 2002). Average SNP spacing is 1 SNP per d = 2.5 kb. The population recombination rate was set at a constant background value over the simulated region but increased to size s = 10 times the background rate at three evenly spaced intervals of length 10 kb. Hence, for this simulation, there were 385 markers with intermarker recombination rate value 0.748; the remaining 15 markers were simulated to be in three hotspots of 5 markers with intermarker recombination rate value 7.48. Thus, in total, the overall population recombination rate is
= 15 x 7.48 + 385 x 0.748 = 400, as required, and crossovers between markers in the hotspot were simulated to occur 10 times more frequently than those in the surrounding sequence. Samples were then randomly mated using the program tdtsim (software available on request from A. Morris, amorris{at}well.ox.ac.uk) to produce 200 offspring samples with zero recombination. We show that recombination in current generation meioses does not have a significant impact on estimated recombination rates using our method. Simulations were repeated as described for hotspot intensities s = 1, 25, 125, and 625. Obviously, simulations with s = 1 have uniform recombination rate across the region, i.e., no hotspots, and are used for comparison. Intermarker recombination fraction estimates were estimated for each simulation and converted to genetic distances in centimorgans per megabase as follows. c is multiplied by 100, to convert to centimorgans, and then divided by the intermarker interval length in megabases, to convert to centimorgans per megabase. To ensure a mean value of 1 cM/Mb across the region, estimates are scaled by dividing by the estimated total genetic distance across the 1-Mb region. Since recombination fractions are <0.01, conversion to genetic distances requires no mapping function.
|
Figure 3 shows intermarker recombination rates, in centimorgans per megabase, from simulations described for Figure 2. Each row corresponds to a different true hotspot intensity, s = 10, 25, 125, and 625, and illustrates results at 10, 20, 40, and 60 flanking markers. Estimates plotted in centimorgans per megabase appear smoothed because: (1) The intermarker recombination fraction estimates depend on the intermarker distances and (2) the method assumes constancy of recombination rate over the distance covered by the flanking markers that are used to refine each intermarker estimate. Plots indicate that hotspots start to appear at as few as 10 flanking markers; large hotspots of 625 times background are well defined at this point (Figure 3D). Clearly, the larger the hotspot intensity is, the easier it is to detect and using more markers increases the ability to detect hotspots. The last column of Figure 3 indicates that when 60 flanking markers are used to refine the intermarker recombination rate estimates, the method can successfully detect hotspots of each intensity. Note that the presence of peaks at locations other than model hotspots may still be valid since our data are randomly generated.
|
Figure 4 shows results similar to those given in Figure 3 but where the current generation of offspring was created, allowing an expected value of four recombinations in their parental meioses per 1 Mb of simulated sequence. The solid crosses indicate the site of a recombination that occurred in the current generation. Clearly, recombination in the current generation does not have a significant impact on our estimates of population recombination rates. Our method appears to capture historical recombination rate information.
|
Consistency:
To assess more formally whether our method was able to consistently detect hotspots and to analyze the impact of marker density, simulations were repeated 1000 times for each hotspot intensity, s = 1, 10, 25, 125, 625 times background and for new marker densities 1 marker per 2 and 5 kb. We refer to a set of 1000 simulations as a simulation group. For each simulation we used our method to estimate intermarker recombination fractions for various numbers of flanking markers. For each set of estimated intermarker recombination fractions, we estimated hotspot intensity. Estimated hotspot intensity is calculated as the estimated mean genetic distance (centimorgans per megabase) at the simulated hotspot sites divided by that at the simulated nonhotspot sites.Figure 5 shows the estimated mean hotspot intensity by flanking markers for each simulation group at marker density 1 marker per 2.5 kb. Dotted lines indicate 95% bootstrap percentile confidence intervals (EFRON and TIBSHIRANI 1993). For any value of flanking markers used, Figure 5 shows, as indicated in Figure 2, that estimates of hotspot intensity are relatively correct although they are not linearly related; i.e., the line for 625 times background is still higher than the line for 125 times background but not 5 times greater. Clearly the method can detect hotspots of different intensity but problems of scale are related to the way that the method utilizes nearby markers to refine estimates. The effect of refining estimates over multiple pairs of markers also smoothes them.
|
To formally test whether a hotspot has been detected in simulations from groups where s > 1, we consider a null hypothesis that the simulation true hotspot intensity is 1. For a type I error rate of 0.05, we use the 95th percentile of estimated hotspot intensity value from simulation groups where s = 1 as critical values and accept the alternative hypothesis that a simulation true hotspot intensity is >1 if the estimated hotspot intensity from that simulation is greater than the corresponding critical value. We define the power of our test in a given simulation group to be the proportion of simulations from that group with estimated hotspot intensities greater than the corresponding critical value.
Figure 6 shows the power of our test to detect a hotspot for each simulated true hotspot intensity, s = 10, 25, 125, and 625 times background, and marker density 1 marker per 2, 2.5, and 5 kb, according to the number of flanking markers. Power appears to increase with number of flanking markers used, up to a point: Graphs for hotspot intensities 10 and 25 times background indicate that using too many flanking markers can result in a decline of power. In regions of low hotspot intensity, using too many markers will over-smooth estimates to the extent that hotspots cannot be distinguished. Figure 6 also suggests that the more dense the marker spacing is, the greater the power to detect a hotspot for any given number of flanking markers. Clearly, the more dense the markers are, the more information in any given interval and so the more accurate the estimates over that interval.
|
Results suggest that this method can be used to detect variation in recombination rates. The resolution at which estimates can be accurately made will depend on the marker density in conjunction with the number of trios studied (results varying the number of trios are not shown here). Greater density of markers means accuracy at a greater resolution. Increasing the flanking markers will tend to increase the accuracy at the expense of resolution.
Real data at high resolution:
Existing genetic maps estimate recombination rates over megabase scales. We can obtain estimates at megabase scales using our model-free method by using a sliding-windows approach to average estimates over a given interval. We used HapMap (March 2005 release, www.hapmap.org) SNP genotypes from 30 CEPH trios of European ancestry (DAUSSET et al. 1990) on chromosomes 3 and 22 to compare estimates made in this way to those in the genetic map for a European population obtained by a pedigree-based method (KONG et al. 2002). Figure 7 shows results of these comparisons and indicates strong agreement between the methods. Further comparisons on all other chromosomes indicate similar results.
|
Read data at low resolution:
We compared European population recombination rates, across a 10-Mb section of chromosome 20, estimated from our method to those from independent estimates made using the approximate-likelihood-based method implemented in PHASE 2.0. Estimates were based on SNP genotype data from 12 CEPH pedigrees. The raw genotype data and samples have been described by KE et al. (2004). The black line in Figure 8 shows estimates from our method, in centimorgans per megabase, between 5355 adjacent SNP markers genotyped from 21 trios from 12 CEPH pedigrees. The red line shows PHASE recombination rate estimates, in centimorgans per megabase, between 4513 of the 5355 adjacent SNP markers, genotyped from 46 founders from the same 12 CEPH pedigrees (EVANS and CARDON 2005). The original estimates from PHASE are population recombination rates,
, and are dependent on both effective population size and per generation recombination rate. To scale PHASE estimates to centimorgans per megabase, we assumed a constant effective population size of Ne = 10,000. The Spearman rank correlation of recombination rate estimates made using each method between the 4513 common SNPs is 0.58 (p < 2.2 x 1016), indicating strong positive correlation between estimates from the two methods.
|
To understand why the method works even when there are no observed recombinants, we consider the relationship between the possible distribution of offspring haplotypes and the size of LD between any two loci in the parental generation. If there are no recombinants, then the offspring haplotypes are simply a sample of the parental haplotypes. When LD is high in the parental generation, random sampling of haplotypes without recombination will tend to produce a sample with high LD too, in which case Equation 1 suggests that our method will estimate a low value of recombination fraction c. As the LD decreases in the parental generation, random sampling of haplotypes without recombination will tend to produce samples with an increasing range of possible LD values, in which case Equation 1 suggests that estimates of c will be, in general, higher. The strength of our method comes from combining estimates from neighboring markers.
We note that our method ignores dependence between markers when combining estimates. Further work is required to understand possible biases that may result or to devise alternative ways of combining estimates. We have shown that the method is able to capture variation in recombination rate but not absolute value. Our estimates are dependent upon a variety of interdependent parameters, namely, the original hotspot intensity, the number of flanking markers used, the marker density, and the number of parent-offspring trios available.
The Spearman rank correlation coefficient between our recombination rate estimates and those made from PHASE across a 10-Mb section of chromosome 22 was 0.58. We do not expect perfect correlation with PHASE for at least three reasons:
- Our method ignores the effects of evolutionary sampling, which we regard as a distinguishing feature since it removes the need for accompanying assumptions about population demography.
- PHASE estimates population recombination rates, which include the effective population size; our method estimates recombination fractions. To compare the two, we have assumed a fixed effective population size. Variability in the effective population size would account for some of the differences between recombination rate estimates made from each method.
- Our method uses multiple overlapping markers to refine estimates. We therefore expect that our approach will produce estimates at a lower resolution than estimates from PHASE for the same number of markers. Our method requires more markers than PHASE to achieve results at an equivalent resolution to that of PHASE.
- PHASE estimates population recombination rates, which include the effective population size; our method estimates recombination fractions. To compare the two, we have assumed a fixed effective population size. Variability in the effective population size would account for some of the differences between recombination rate estimates made from each method.
The aim of our method is not to outperform PHASE. PHASE is model based and can allow for patterns of population-level association, or LD, that can arise by chance as well as simply as a result of recombination rate variation. The trade-off in allowing for this complexity is extended computation times. The time taken to estimates rates using PHASE is dependent on the amount of LD: Chromosomal segments with low levels of LD took a minimum of 34 days to estimate recombination rates between just 70 markers, and some chromosomal segments with regions of high LD took over a month to estimate rates for that many markers.
To address the question of how to select the number of flanking markers w, we consider the case where all markers are an equal distance apart and let the variance for each original recombination fraction estimated from step 2 of the method be V. Then, the variance of the refined estimate can be approximated:
![]() |
89%; w = 20 reduces it by
99.998%. Power curves in Figure 6 suggest that the power of our method to detect hotspots of intensity <100 using markers of any density begins to decline at
w = 20 markers. Obviously, for hotspots of small intensity, smoothing over so many markers causes depletion of the estimate to the extent that it cannot be detected from nearby estimates. For hotspots of intensity >100, power is maintained for larger values of w. Hence, we suggest w = 20 as a minimum for this method. Increasing w > 20 will depend on the aim of the project. Clearly, as w increases, estimates are smoothed and detection of smaller hotspots becomes problematic. However, as w increases, there will be greater accuracy of recombination rate variation, albeit at a lower resolution. Recall that the method assumes that the recombination rate is constant over the interval spanned by the flanking markers. Hence, given a specific marker density, the chosen value of w will define the scale at which recombination rate variation can be detected. If density is 1 marker per 2.5 kb and w = 20, for example, then rates can be accurate to a scale of 50 kb. If finer scales of recombination rate variation are required, then a greater density of markers will be required; e.g., markers at density 1 per 1 kb and w = 20 can give an accuracy to a scale of 20 kb.
We have assumed that haplotypes can be accurately measured in parental and offspring generations. However, for data collected on nuclear families, haplotypes in parents may not be uniquely resolved. Typing of additional siblings may help to resolve parental haplotypes but since our method uses the parental haplotypes only to gain a measure of LD in the parental generation, an alternative approach would be to use composite LD measures to estimate LD using parental genotypes instead (SCHAID 2004). A necessary condition for ambiguity in offspring haplotype estimation between two loci is that one parent and the offspring are doubly heterozygous and the other parent is heterozygous at at least one of the loci (DUDBRIDGE et al. 2000). The probability that an offspring is heterozygous at a locus given his parental genotypes is
. If the probability of being heterozygous at a locus is H, then the probability of ambiguity in offspring haplotype is
![]() |
![]() |
, probabilities of ambiguity in offspring haplotype could range from
to
. On average the heterozygosity at any given locus in HapMap is 
(www.hapmap.org). Hence ambiguity in offspring haplotype estimation in HapMap reduces in range from
to
. The impact is further reduced by the averaging of estimates over multiple markers. Methods to incorporate haplotype uncertainty into the likelihood of transmitted haplotypes are possibilities for further work. To treat recombination fractions between pairs of markers flanking an adjacent pair as a scaled version of that between the adjacent pair we have assumed that recombination fractions are additive. This is approximately true over short distances (<2 cM). To allow for the effects of interference, the method could be modified to convert each estimate of a recombination fraction made in step 2 to a map distance via an appropriate mapping function, e.g., Kosambi. Map distances between markers flanking an adjacent pair are then scaled versions of the map distance between the adjacent pair and the method could proceed as described in step 3, but by replacing recombination fractions with map distances.
To determine normal confidence intervals presented in this article we have assumed independence between individual estimates made for each pair of markers. Further work might include the more accurate estimation of distribution of our estimator, using bootstrap or jackknife techniques (EFRON and TIBSHIRANI 1993).
DAUSSET, J., H. CANN, D. COHEN, M. LAHTROP, J. M. LALOUEL et al., 1990 Centre d'Etude du Polymorphisme Humain (CEPH): collaborative genetic mapping of the human genome. Genomics 6: 575577.[CrossRef][Medline]
DEVLIN, B., N. RISCH and S. ROEDER, 1996 Disequilibrium mapping: composite likelihood for pairwise disequilibrium. Genomics 36: 116.[CrossRef][Medline]
DUDBRIDGE F., B. P. C. KOELEMAN, J. A. TODD and D. G. CLAYTON, 2000 Unbiased application of the transmission/disequilibrium test to multilocus haplotypes. Am. J. Hum. Genet. 66: 20092012.[CrossRef][Medline]
EDWARDS, A., 1992 Likelihood. John Hopkins University Press, Baltimore.
EFRON, B., and R. TIBSHIRANI, 1993 An Introduction to the Bootstrap. Chapman & Hall, London/New York.
EVANS, D. M., and L. R. CARDON, 2005 A comparison of linkage equilibrium patterns and estimated population recombination rates across multiple populations. Am. J. Hum. Genet. 76: 681687.[CrossRef][Medline]
FEARNHEAD, P., and P. DONNELLY, 2001 Estimating recombination rates from population genetic data. Genetics 159: 12991318.
FEARNHEAD, P., and P. DONNELLY, 2002 Approximate likelihood methods for estimating local recombination rates. J. R. Stat. Soc. Ser. B 64: 657680.[CrossRef]
GRIFFITHS, R. C., and P. MARJORAM, 1996 Ancestral inference from samples of DNA sequences with recombination. J. Comput. Biol. 3: 479502.[Medline]
HASTBACKA, J., A. DE LA CHAPELLE, I. KAITILA, P. SISTONEN, A. WEAVER et al., 1992 Linkage disequilibrium mapping in isolated founder populations: diastrophic dysplasia in Finland. Nat. Genet. 2: 204211.[CrossRef][Medline]
HILL, W. G., and B. S. WEIR, 1994 Maximum likelihood estimation of gene location by linkage disequilibrium. Am. J. Hum. Genet. 54: 705714.[Medline]
HUDSON R. R., 2001 Two locus sampling distributions and their application. Genetics 159: 18051817.
INTERNATIONAL HAPMAP CONSORTIUM, 2003 The international HapMap project. Nature 426: 789796.[CrossRef][Medline]
KAPLAN, N. J., and B. S. WEIR, 1995 Are moment bounds on the recombination fraction between a marker and a disease locus too good to be true? Allelic association mapping revisited for simple genetic diseases in the Finnish population. Am. J. Hum. Genet. 57: 14861498.[Medline]
KAPLAN, N. N., W. G. HILL and B. S. WEIR, 1995 Likelihood methods for locating disease genes in nonequilibrium populations. Genetics 56: 1832.
KE, X., S. HUNT, W. TAPPER, R. LAWRENCE, G. STAVRIDES et al., 2004 The impact of SNP density on fine-scale patterns of linkage disequilibrium. Hum. Mol. Genet. 13: 577588.
KONG, A., D. F. GUDBJARTSSON, J. SAINZ, G. M. JONSDOTTIR, S. A. GUDJONSSON et al., 2002 A high-resolution recombination map of the human genome. Nat. Genet. 31: 241247.[CrossRef][Medline]
KUHNER, M. K., J. YAMATO and J. FELSENSTEIN, 2000 Maximum likelihood of recombination rates from population data. Genetics 156: 13931401.
LANDER, E. S., and D. BOTSTEIN, 1986 Mapping complex genetic traits in humans: new methods using a complete RFLP linkage map. Cold Spring Harbor Symp. Quant. Biol. 51: 4962.
LI, N., and M. STEPHENS, 2003 Modeling linkage disequilibrium and identifying recombination hotspots using single nucleotide polymorphism data. Genetics 165: 22132233.
MCVEAN, G. A. T., P. ADWALLA and P. FEARNHEAD, 2002 A coalescent-based method for detecting and estimating recombination from gene sequences. Genetics 160: 12311241.
NIELSON, R., 2000 Estimation of population parameters and recombination rates from single nucleotide polymorphisms. Genetics 154: 931942.
SCHAID, D. J., 2004 Linkage disequilibrium testing when linkage phase is unknown. Genetics 166: 505512.
SERRE, J. L., B. SIMON-BOUY, E. MORNET, B. JAUME-ROIG, A. BALASSOPOULOU et al., 1990 Studies of RFLP closely linked to the cystic fibrosis locus throughout Europe lead to new considerations in populations genetics. Hum. Genet. 84: 449454.[Medline]
SPIELMAN, R. S., R. E. MCGINNIS and W. J. EWENS, 1993 Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am. J. Hum. Genet. 52: 506516.[Medline]
STEPHENS, M., and P. DONNELLY, 2003 A comparison of Bayesian methods for haplotype reconstruction from population genotype data. Am. J. Hum. Genet. 73: 11621169.[CrossRef][Medline]
STEPHENS, M., N. SMITH and P. DONNELLY, 2001 A new statistical method for haplotype reconstruction from population data. Am. J. Hum. Genet. 68: 978989.[CrossRef][Medline]
STUMPH, M. P. H., and G. A. T. MCVEAN, 2003 Estimating recombination rates from population-genetic data. Nat. Genet. Rev. 4: 959968.
WALL, J., 2000 A comparison of estimators of the population recombination rate. Mol. Biol. Evol. 17: 839850.
Communicating editor: P. J. OEFNER
This article has been cited by other articles:
![]() |
A. Tenesa, P. Navarro, B. J. Hayes, D. L. Duffy, G. M. Clarke, M. E. Goddard, and P. M. Visscher Recent human effective population size estimated from linkage disequilibrium Genome Res., April 1, 2007; 17(4): 520 - 526. [Abstract] [Full Text] [PDF] |
||||
![]() |
P. M. Visscher and W. G. Hill Estimation of Recombination Rate and Detection of Recombination Hotspots From Dense Single-Nucleotide Polymorphism Trio Data Genetics, August 1, 2006; 173(4): 2415 - 2417. [Full Text] [PDF] |
||||
- THIS ARTICLE
-
Abstract
- Full Text (PDF)
-
All Versions of this Article:
genetics.105.047431v1
genetics.105.047431v2
171/4/2085 most recent - Alert me when this article is cited
- Alert me if a correction is posted
- SERVICES
- Email this article to a friend
- Similar articles in this journal
- Similar articles in PubMed
- Alert me to new issues of the journal
- Download to citation manager
- Reprints & Permissions
- CITING ARTICLES
- Citing Articles via HighWire
- Citing Articles via Google Scholar
- GOOGLE SCHOLAR
- Articles by Clarke, G. M.
- Articles by Cardon, L. R.
- Search for Related Content
- PUBMED
- PubMed Citation
- Articles by Clarke, G. M.
- Articles by Cardon, L. R.
















,
,
,
,
, and
of the intermarker recombination fractions are made for each pair of markers. New estimates for the interval between markers 2 and 3 are made from markers 1 and 3, markers 2 and 4, and markers 1 and 4 as
,
, and
, respectively. A weighted average of these new estimates provides a final refined estimate,
, of the intermarker recombination fraction between markers 2 and 3.


are converted to genetic distances in centimorgans per megabase.








