Recent studies report a surprisingly high degree of marker-to-marker linkage disequilibrium (LD) in ruminant livestock populations. This has important implications for QTL mapping and marker-assisted selection. This study evaluated LD between microsatellite markers in a number of breeding populations of layer chickens using the standardized chi-square (χ2′) measure. The results show appreciable LD among markers separated by up to 5 cM, decreasing rapidly with increased separation between markers. The LD within 5 cM was strongly conserved across generations and differed among chromosomal regions. Using marker-to-marker LD as an indication for marker-QTL LD, a genome scan of markers spaced 2 cM apart at moderate power would have good chances of uncovering most QTL segregating in these populations. However, of markers showing significant trait associations, only 57% are expected to be within 5 cM of the responsible QTL, and the remainder will be up to 20 cM away. Thus, high-resolution LD mapping of QTL will require dense marker genotyping across the region of interest to allow for interval mapping of the QTL.
The possibility of shifting selection from the phenotypic to the genomic level through marker-assisted selection (MAS) based on DNA-level polymorphisms was first proposed by Soller and Beckmann (1982, 1983). It has aroused enormous interest and great investment in developing an appropriate genomic infrastructure, including complete marker maps of the major agricultural species, culminating with the sequencing of the chicken genome (completed) and the cattle genome (in progress). Nevertheless, although innumerable QTL mapping experiments have been successfully carried out, actual implementation of MAS in commercial breeding populations has been slow to come. At the animal breeding level, there appear to be a number of reasons for this. Most QTL mapping experiments have been performed in crosses between strongly differentiated breeds (Andersson 2001), and it is not clear to what extent QTL identified in such crosses are segregating within populations. Also, there are strong indications that QTL effects observed in a particular population depend on genetic background and can differ by population (Dekkers and Hospital 2002). Thus, to ensure successful MAS, it is necessary to conduct experiments studying QTL directly within commercial production populations.
For species with large family sizes, specifically dairy cattle (see Khatkar et al. 2004 for a review), QTL have been detected within breeds but on a within-family basis. Because of their potentially loose linkage to the QTL, markers detected in such within-family studies are likely to be in linkage equilibrium (LE) with the QTL across the population. For these so-called LE markers the linkage phase between markers and the QTL must be established separately for each family and, once established, phased markers must be traced across generations (Soller and Medjugorac 1999). This limits the use of these markers for MAS (Dekkers 2004).
An alternative approach, based on candidate gene analysis (Rothschild and Soller 1997), avoids these problems. Candidate gene markers are expected to be close enough to the causative mutation such that associations are consistent across families because of population-wide linkage disequilibrium (LD) (Dekkers 2004). This allows candidate gene associations to be detected within breeding populations and to be used immediately for selection in those same populations by selection on so-called LD markers (Dekkers 2004). One of the first implementations of this approach in animal breeding was for the estrogen receptor gene for litter size in pigs (Rothschild et al. 1994; Short et al. 1997; Dekkers 2004). At present, however, because of the targeted and time-consuming nature of candidate gene studies, the scope of a candidate gene approach for identifying QTL across the genome is limited. Costs of targeted candidate gene SNP detection and of SNP genotyping are, however, rapidly dropping, which may change the situation dramatically.
An attractive alternative to both LE and candidate gene mapping, which shares the advantages of both, is to look for population-wide LD between anonymous markers and QTL, using a dense marker map. Genome-wide scans for marker-QTL LD are feasible through application of selective DNA pooling (Darvasi and Soller 1994; Lipkin et al. 1998; Mosig et al. 2001) for the initial scan. Current chicken/cattle maps provide about one microsatellite marker per centimorgan, and there are now 2.8 million SNPs available in chickens, which, after correcting for SNPs appearing in more than one line, amounts to one marker per 374 bp (2,833,578 variant sites in the 1.06-Gb genome) (International Chicken Polymorphism Map Consortium 2004a). Availability of the complete chicken and cattle genome sequences means that essentially unlimited numbers of microsatellite and SNP markers can be readily obtained to saturate the genome to any desired degree. Thus, implementation of genome-wide scans for LD between anonymous markers and QTL in commercial populations is feasible, but depends on the extent of LD that exists in these populations. Theoretical analyses, based on the well-known Sved (1971) equation relating LD to effective population size and map distance, predict LD over very small regions in populations with large effective population size, such that a 0.1- or even 0.01-cM spacing might be required for an effective LD scan. In this context, the report by Farnir et al. (2000) of extensive marker-to-marker LD over several tens of centimorgans in a study of Dutch Holsteins has important implications for LD mapping and LD-based MAS within commercial animal breeding populations. These results have been supported in further studies in cattle (Vallejo et al. 2003) and sheep (McRae et al. 2002).
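The dependence of expected LD on effective population size and map distance can be illustrated numerically. The sketch below assumes the usual form of the Sved (1971) expectation, 1/(1 + 4Nec), and treats map distance in morgans as an approximation to the recombination fraction c, which is reasonable only for short distances. The function name and the Ne values are illustrative and are not taken from this study.

```python
def expected_ld(ne, distance_cm):
    """Expected LD under the assumed Sved (1971) form 1 / (1 + 4 * Ne * c).

    distance_cm is converted to morgans and used as the recombination
    fraction c (an approximation that holds for short distances).
    """
    c = distance_cm / 100.0
    return 1.0 / (1.0 + 4.0 * ne * c)


if __name__ == "__main__":
    # Illustrative effective population sizes, not estimates from the paper.
    for ne in (100, 1000, 10000):
        cells = ", ".join(
            f"{d} cM: {expected_ld(ne, d):.3f}" for d in (0.01, 0.1, 1.0, 5.0)
        )
        print(f"Ne = {ne}: {cells}")
```

Under these assumptions, expected LD at a fixed spacing falls quickly as Ne grows, which is why very dense marker spacing is predicted to be necessary in populations with large effective size.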
The purpose of this study was to examine the extent of marker-to-marker LD (henceforth “marker-LD”) in a number of closed breeding lines of commercial egg-laying chickens of the Hy-Line International chicken breeding organization (henceforth Hy-Line). Under the assumption that the presence and extent of marker-LD correspond to the presence and extent of marker-QTL LD, marker-LD provides a basis for determining whether it is feasible to search for marker-QTL LD in commercial chicken lines and allows the marker density and population size required for such a search to be quantified.
MATERIALS AND METHODS
For purposes of this study, Hy-Line provided marker genotype data for three commercially relevant lines, which were under intense selection for various production traits. Lines 1, 2, and 3 had been closed for 12, 30, and 6 generations, respectively, at the generation at which the first set of DNA samples was taken (henceforth, G1). In all data sets, individuals sampled from each line were taken from the set of males chosen for entry into a progeny test program on the basis of estimated breeding values based on pedigree and sib performance. Individuals with the widest possible pedigree representation were sampled, but in some cases half-sibs were chosen to obtain sufficient numbers.
To initiate the study, marker genotype data for 22 individuals from the G1 generation of each of lines 1 and 2 were evaluated for LD. This data set, henceforth termed the “LD22 data set,” consisted of two subsets: (i) the LD22 “whole-chromosome” data set, consisting of markers covering chromosomes 3, 4, and 5, and (ii) the LD22 “cluster” data set, consisting of clusters of closely linked markers on chromosomes 1, 2, 6 (line 2 only), and 8. On the basis of results from the LD22 data set, a dense marker scan was implemented for chromosomes 4 and 5 of lines 1 and 3, to study the degree to which population-wide marker-LD was conserved across generations. Wang (2003) previously reported QTL in these populations affecting the traits under selection on chromosome 4, but not on chromosome 5. This data set, denoted the “LD96 data set,” consisted of marker genotype data for 32 individuals of each line in each of three generations (a total of 96 individuals per line). For line 1, the data came from generations G1, G2, and G6 and for line 3 from generations G5, G6, and G7.
Markers and genotyping:
The number of microsatellite markers used for each line and chromosome, the average spacing between markers, and the average number of alleles per marker are in Table 1. Complete marker details are in Heifetz (2004). The number of alleles per marker averaged ∼3.0. Where possible, map locations and distances between markers were according to the consensus map (http://iowa.thearkdb.org/). In some instances, a marker was not in the consensus map, but located on one of the individual maps on which the consensus map is based. In such cases, additional markers to both sides of the marker in question were identified that were present in both the consensus map and the individual map, and the marker in question was positioned on the consensus map proportional to these two markers. Resulting distances were in good agreement with the recently published sequence of the chicken genome (International Chicken Polymorphism Map Consortium 2004b). In any event, map distances represent proportions of recombination and will track the effects of recombination on LD more closely than distances on the sequence map, which may not be as closely tied to recombination rates. A plot of LD against physical distance, however, would be a convenient means of identifying recombination hotspots. An analysis of our data from this point of view will be presented elsewhere.
DNA was isolated from whole blood and diluted to 25 μg/ml. PCR genotyping was done using primer sequences given in http://iowa.thearkdb.org/. Marker genotypes were determined following size separation on an ABI 377 sequencer (Applied Biosystems, Foster City, CA), using ABI Genotyper 2.0 software (Applied Biosystems).
Measures of linkage disequilibrium:
Simulation studies (Zhao et al. 2005) have shown that the standardized chi-square, χ2′ (Yamazaki 1977; Hedrick 1987), is the preferred measure of LD for multiallelic markers for purposes of QTL mapping. The measure termed r2 (Hill and Robertson 1968) is very useful for biallelic markers, since it stands in inverse proportion to the sample size needed to demonstrate significance. But when pooling across allele pairs for multiallelic markers, r2 weighted by the product of allele frequencies was found to be strongly undervalued (Zhao et al. 2004) and this was also found in our data (not shown). Hence this measure will not be considered further. For comparison purposes, the measure D′, which was used in other livestock studies (Farnir et al. 2000; McRae et al. 2002; Vallejo et al. 2003), was included in some analyses. Definitions of D′ and χ2′ are

$$D' = \sum_{i}\sum_{j} P(A_i)\,P(B_j)\,\left|\frac{D_{ij}}{D_{ij}^{\max}}\right|$$

and

$$\chi^{2\prime} = \frac{\chi^{2}}{2N(n-1)}, \qquad \chi^{2} = 2N\sum_{i}\sum_{j}\frac{D_{ij}^{2}}{P(A_i)\,P(B_j)},$$

where Dij = P(AiBj) − P(Ai)P(Bj), P(Ai) is the frequency of allele i at locus A, P(Bj) is the frequency of allele j at locus B, and

$$D_{ij}^{\max} = \begin{cases} \min\left[P(A_i)P(B_j),\ (1-P(A_i))(1-P(B_j))\right] & \text{if } D_{ij} < 0, \\ \min\left[P(A_i)(1-P(B_j)),\ (1-P(A_i))P(B_j)\right] & \text{if } D_{ij} > 0, \end{cases}$$

N is the population size, and n is the number of alleles at the marker with the smaller number of alleles.
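As a concrete illustration of the two measures, the following minimal sketch computes D′ and χ2′ from a table of two-marker haplotype frequencies. It assumes the standard multiallelic definitions (D′ as the frequency-weighted sum of |Dij/Dmax| and χ2′ as the sum of Dij²/[P(Ai)P(Bj)] divided by n − 1); the function name and the example frequency tables are hypothetical and are not data from this study.

```python
def ld_measures(hap_freq):
    """Return (D', chi-square') for a table of haplotype frequencies.

    hap_freq[i][j] is the frequency of haplotype A_i B_j; all entries
    are assumed to sum to 1. Both formulas are the standard multiallelic
    definitions, assumed here rather than taken from the paper's software.
    """
    p = [sum(row) for row in hap_freq]           # allele frequencies, locus A
    q = [sum(col) for col in zip(*hap_freq)]     # allele frequencies, locus B
    d_prime = 0.0
    chi2_sum = 0.0
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            if pi == 0.0 or qj == 0.0:
                continue                         # allele absent from sample
            d = hap_freq[i][j] - pi * qj
            if d < 0:
                d_max = min(pi * qj, (1 - pi) * (1 - qj))
            else:
                d_max = min(pi * (1 - qj), (1 - pi) * qj)
            if d_max > 0:
                d_prime += pi * qj * abs(d) / d_max
            chi2_sum += d * d / (pi * qj)
    # n = number of alleles at the marker with fewer observed alleles
    n = min(sum(1 for x in p if x > 0), sum(1 for x in q if x > 0))
    return d_prime, chi2_sum / (n - 1)


if __name__ == "__main__":
    # Hypothetical table: a rare allele with one haplotype unobserved.
    rare = [[0.05, 0.00],
            [0.45, 0.50]]
    print(ld_measures(rare))
```

In the hypothetical rare-allele table, one haplotype is absent, so D′ reaches its maximum while χ2′ stays small, which is the kind of behavior that motivates preferring χ2′ for multiallelic markers with low-frequency alleles.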
To calculate the various LD measures between marker pairs, maximum-likelihood estimates of all two-marker haplotype frequencies were obtained using the software Arlequin (Schneider et al. 2000; Genetic and Biometry Lab, University of Geneva), which uses the genotypes of each individual for all markers as input. Individuals with a missing genotype for a given marker were excluded when computing LD for that marker.
Critical 0.05 and 0.01 significance levels of the χ2′-statistic were obtained empirically on the assumption that LD is not expected for nonsyntenic marker pairs, and hence the distribution of χ2′-values obtained for such marker pairs represents the distribution under the null hypothesis. Critical values were determined separately for each data set by ranking χ2′-values for all nonsyntenic marker pairs and taking the LD value of the pair whose rank corresponded to the desired significance level.
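The ranking procedure can be sketched as follows. This is a minimal illustration that assumes the critical value is the observed null value whose rank, counted from the largest, equals the significance level times the number of nonsyntenic pairs; the exact tie and rounding conventions used in the study are not stated, and the function name and input values are hypothetical.

```python
def empirical_critical_value(null_values, alpha):
    """Critical LD value from an empirical null distribution.

    null_values: LD values of nonsyntenic marker pairs (the assumed null).
    alpha: desired significance level, e.g. 0.05 or 0.01.
    The rounding rule for the rank is an assumption of this sketch.
    """
    ranked = sorted(null_values, reverse=True)   # largest value first
    rank = max(1, round(alpha * len(ranked)))    # rank matching alpha
    return ranked[rank - 1]


if __name__ == "__main__":
    # Hypothetical null values, evenly spread for illustration only.
    null = [i / 1000 for i in range(1, 1001)]
    print(empirical_critical_value(null, 0.05))
    print(empirical_critical_value(null, 0.01))
```

A syntenic marker pair would then be declared significant at level alpha when its χ2′-value is at least as large as the returned critical value.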
Prediction equation for LD:
The well-known equation of Sved (1971), relating LD generated by drift to distance between markers and effective population size (Ne), is a useful way to summarize the extent and decline of LD with distance in a population. This equation was used as the basis for fitting the following model (model 1) to observed LD,

$$LD_{ij} = \frac{1}{1 + 4 b_j d_{ij}} + e_{ij}, \tag{1}$$

where LDij is the observed LD for marker pair i of data set j, dij is the distance in morgans for marker pair i of data set j, bj is a coefficient that describes the decline of LD with distance for data set j, and eij is a random residual. Parameter bj was estimated separately for each data set, using the nonlinear fit command option of JMP (JMP software 5.1.2, 1989–2004; SAS Institute, Cary, NC). For this purpose, χ2′-values of marker pairs that were up to 100 cM apart were included.
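A least-squares fit of this kind can be sketched without specialized software. The example below assumes that model 1 has the Sved form LD = 1/(1 + 4bd) + e and estimates b by a golden-section search on the residual sum of squares. This is a stand-in for the JMP nonlinear fit used in the study, not a reproduction of it; the search bounds, function names, and synthetic data are assumptions of the sketch.

```python
def predicted_ld(b, d_morgans):
    """Assumed model 1 prediction: 1 / (1 + 4 * b * d), d in morgans."""
    return 1.0 / (1.0 + 4.0 * b * d_morgans)


def sse(b, pairs):
    """Residual sum of squares for (distance_morgans, observed_ld) pairs."""
    return sum((ld - predicted_ld(b, d)) ** 2 for d, ld in pairs)


def fit_b(pairs, lo=0.0, hi=1000.0, tol=1e-6):
    """Golden-section search for b; assumes the SSE is unimodal on [lo, hi]."""
    inv_phi = (5 ** 0.5 - 1) / 2
    a, c = lo, hi
    x1 = c - inv_phi * (c - a)
    x2 = a + inv_phi * (c - a)
    f1, f2 = sse(x1, pairs), sse(x2, pairs)
    while c - a > tol:
        if f1 < f2:                      # minimum lies in [a, x2]
            c, x2, f2 = x2, x1, f1
            x1 = c - inv_phi * (c - a)
            f1 = sse(x1, pairs)
        else:                            # minimum lies in [x1, c]
            a, x1, f1 = x1, x2, f2
            x2 = a + inv_phi * (c - a)
            f2 = sse(x2, pairs)
    return (a + c) / 2


if __name__ == "__main__":
    # Synthetic, noise-free data generated with b = 10 (illustration only).
    data = [(d, predicted_ld(10.0, d)) for d in (0.01, 0.05, 0.1, 0.2, 0.5)]
    print(round(fit_b(data), 3))
```

With real data the residual sum of squares need not be unimodal, so a general-purpose nonlinear least-squares routine with standard errors would be the more appropriate tool.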
Tests for chromosomal and regional differences in LD:
Under the assumption that presence and extent of marker-LD corresponds to presence and extent of marker-QTL LD, regions with high marker-LD would be expected to also exhibit greater marker-QTL LD. Such regions would be priority regions for QTL mapping, since they would require lower marker density (on the genetic map distance scale) for equivalent power. In addition, outlier behavior of a chromosomal region with respect to LD might in itself be a “selection signature,” indicating presence of a QTL under selection in that region (Kim and Nielsen 2004). Differences in LD between chromosomes were evaluated using marker pairs that were up to 20 cM apart on chromosomes 4 and 5 from the LD22 and LD96 data sets. Since markers were not evenly distributed within regions, it was necessary to take differences in map distance between markers into account when evaluating regional differences in LD. This was done by conducting the analysis on residuals from model 1 fitted to each data set. In addition, residuals were divided by their predicted value to stabilize the variance, since variance of LD (measured by r2) is expected to be proportional to the square of expected LD (Hill 1981). Thus, observations used for analysis were

$$Y_{ijk} = \frac{e_{ijk}}{1/(1 + 4\hat{b}_j d_{ijk})} = e_{ijk}\,(1 + 4\hat{b}_j d_{ijk}),$$

where eijk is the residual from model 1 for marker pair i on chromosome k for data set j, dijk is the distance in morgans for that marker pair, and b̂j is the estimate of the decline of LD with distance for data set j. To test for differences in LD between chromosomes, the following model (model 2) was used,

$$Y_{ijk} = L_j + C_k + LC_{jk} + \varepsilon_{ijk}, \tag{2}$$

where Lj is the effect of line j, Ck is the effect of chromosome k, LCjk is the interaction effect of line j and chromosome k, and εijk is a residual. Differences in LD between chromosomal regions on chromosomes 3, 4, and 5, were evaluated using regions of 10 cM that had three or more markers in at least one of the lines. 
This resulted in four data sets: two LD22 data sets with 90 and 128 marker pairs for lines 1 and 2 in 24 regions, respectively, and two LD96 data sets with 101 and 77 marker pairs for lines 1 and 3 in 16 regions, respectively. Mean marker distance within 10-cM regions was 4.5 cM for the LD22 data set and 4.1 cM for the LD96 data set. Marker-LD was corrected for distance and heterogeneous variance using the procedure described above and analyzed by the following model (model 3) to test for differences between regions,

$$Y_{ijk} = L_j + R_k + LR_{jk} + \varepsilon_{ijk}, \tag{3}$$

where Yijk is the distance-corrected and variance-stabilized LD for marker pair i in region k of line j, Lj is the effect of line j, Rk is the effect of region k, LRjk is the interaction effect of line j and region k, and εijk is the residual of marker pair i in region k of line j.
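The distance correction and variance stabilization can be written in a few lines. The sketch assumes that the predicted LD follows the Sved form 1/(1 + 4bd), so that dividing the residual by its predicted value is the same as computing observed/predicted − 1; the function name and the numerical inputs are illustrative only.

```python
def stabilized_residual(observed_ld, d_morgans, b_hat):
    """Residual from the assumed model 1, divided by its predicted value.

    observed_ld: observed LD for a marker pair.
    d_morgans:   map distance between the markers, in morgans.
    b_hat:       estimated decline coefficient for the data set.
    """
    predicted = 1.0 / (1.0 + 4.0 * b_hat * d_morgans)
    return (observed_ld - predicted) / predicted


if __name__ == "__main__":
    # Illustrative marker pair 5 cM apart, using b = 7.92 from the text.
    print(stabilized_residual(0.5, 0.05, 7.92))
```

The resulting values are on a relative scale, so pairs at different distances contribute comparably to the tests for line, chromosome, and region effects.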
Correlations between generations:
Pairwise correlations of LD values between generations were calculated using all marker pairs that were present in all generations for each line of the LD96 data set. Expected correlations were obtained by simulation, using the methods described in Zhao et al. (2004). Linkage disequilibrium between markers with four alleles was simulated over 100 generations of random mating in a population with effective size 100. Correlations between LD in alternate generations were obtained by averaging correlations among generations 91–100 using all marker pairs that were segregating in these generations. For syntenic markers >20 cM apart, the maximum was taken as 100 cM and for the nonsyntenic markers, marker pairs that were 300–350 cM apart were used.
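The between-generation correlation restricted to shared marker pairs can be sketched as below. This is a minimal illustration of a Pearson correlation computed over marker pairs present in both generations; the dictionary layout, marker names, and LD values are hypothetical, and the simulation used to obtain expected correlations is not reproduced here.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def generation_correlation(ld_gen_a, ld_gen_b):
    """Correlate LD values over marker pairs present in both generations.

    Each argument maps a (marker, marker) tuple to an LD value.
    """
    shared = sorted(set(ld_gen_a) & set(ld_gen_b))
    return pearson([ld_gen_a[k] for k in shared],
                   [ld_gen_b[k] for k in shared])


if __name__ == "__main__":
    # Hypothetical LD values for two generations of one line.
    g_a = {("m1", "m2"): 0.10, ("m1", "m3"): 0.55, ("m2", "m3"): 0.90,
           ("m4", "m5"): 0.30}
    g_b = {("m1", "m2"): 0.15, ("m1", "m3"): 0.50, ("m2", "m3"): 0.85}
    print(generation_correlation(g_a, g_b))
```

Marker pairs missing from either generation, such as the fourth pair in the hypothetical example, are dropped before the correlation is computed.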
RESULTS
LD among nonsyntenic markers:
In the data sets analyzed, 15–31% of nonsyntenic marker pairs had D′-values between 0.50 and 1.00, and 4–15% had D′-values >0.90 (data not shown). Using the χ2′-measure resulted in a 100-fold reduction: only 0.08–0.30% of nonsyntenic pairs had values between 0.50 and 1.00, and only 0.02–0.20% had values >0.90 (data not shown). With rare exceptions (e.g., in cases of epistasis or where two markers are each in LD with QTL that are under selection), LD is not expected among markers on different chromosomes. Thus, for these data sets it is clear that D′ had an unacceptable proportion of high values that do not appear to correspond to the reality of the situation. For this reason, χ2′ was used in all subsequent analyses of the data.
Observed LD among nonsyntenic markers was used to derive critical χ2′-values to declare significant LD for syntenic markers. Critical values were first obtained separately by line and generation within data sets. Examination showed that critical values differed between data sets, but within data sets values did not differ between lines or among generations within lines (data not shown). Consequently, lines and generations were pooled within data sets and used to derive critical values by data set (Table 2). As expected, critical LD values were in inverse proportion to the number of individuals used to calculate the LD values. Thus, they were highest for the LD22 data set (based on 22 individuals), somewhat less for individual generations of the LD96 data set (32 individuals), and distinctly lower for the pooled-generation data of the LD96 data set (96 individuals).
Distribution of LD for syntenic markers:
Figure 1 illustrates the decline of LD with map distance between the markers in a pair, on the basis of the distribution of χ2′-values as a function of distance. Values are for the LD96 data set of line 3, using marker data from 96 individuals across three generations. This picture of high LD at shorter distances that decays rapidly as distance increases was typical of all other data sets and agrees with previous results (Farnir et al. 2000; Pritchard and Przeworski 2001) and with theory (Sved 1971).
Tables 3 and 4 summarize the frequency distribution of χ2′ against genetic distance for the LD22 (Table 3) and LD96 data sets (Table 4). For all data sets, results for d ≥ 20 cM were almost identical to results for nonsyntenic chromosomes for the corresponding data sets (not shown), with only 10% of χ2′-values ≥0.20 and none ≥0.50. For shorter distances, in the LD22 whole chromosome data (Table 3), the proportion of χ2′-values >0.50 depended strongly on distance between markers and on line, declining from 33–34% for d ≤ 5 cM, to 9–13% for d in the range 5–10 cM, to 6% for d in the range 10–20 cM. The proportion of χ2′-values >0.80 followed a similar trend, going from 15–23% for d ≤ 5 cM, to 3–5% for d in the range 5–10 cM, to 1–3% for d in the range 10–20 cM. Both lines behaved very similarly, with line 1 showing a somewhat higher degree of LD than line 2. Results for the marker clusters conformed to those above, with strong LD for d ≤ 5 cM, appreciable but lower LD for d in the range 5–10 cM, and greater LD for line 1 than for line 2. When considering the LD96 results (Table 4), although the average amount of LD was greater in line 3 than in line 1 (see later), differences among generations within lines were small and not significant (data not shown). The pooled data sets behaved similarly to the individual generations for d ≤ 5 cM, but for longer distances (d > 5 cM), the pooled data sets showed a higher proportion of χ2′-values in the lowest LD class (0.0–0.1 bin) and a smaller proportion in all other bins (data not shown). The pooled data sets, consisting of 96 observations per line, have lower chance variation than the individual-generation data sets of 32 observations. Thus, these results show that chance variation is not a major source of the high LD values for short marker-to-marker distances, but is for longer distances. Consequently, by increasing sample size it should be possible to reduce the contribution to marker-QTL LD of more remote QTL. 
This is important for LD mapping. Considering the data averaged over generations for line 1 and d < 5 cM, 24% of markers showed moderate to high LD (χ2′ ≥ 0.50), and 11% showed very high LD (χ2′ ≥ 0.80); corresponding values for line 3 were 44 and 29%, respectively. The difference between the two lines was highly significant by a chi-square contingency test (P < 0.01). This accords with the history of the lines, since line 3 underwent a hybridization episode in G1, which would be expected to contribute to increased LD in generations G5–G7, and is evaluated further below, with a model that includes correction for differences in marker distances. For line 1, the proportion of LD values in the highest LD bin for the LD96 data set at the closest distances (<5 cM) was 0.11, only about half the corresponding proportion (0.23) for the LD22 data set.
The important conclusion from Tables 3 and 4 is the very high degree of marker-LD at marker-to-marker distances <5 cM, with an average over all lines and data sets, including clusters, of 34% of marker pairs showing moderate to high LD (χ2′ ≥ 0.50) and 18% showing very high LD (χ2′ ≥ 0.80).
Comparing lines, chromosomes, and regions:
Estimates of the decline of LD with distance (bj) obtained from model 1 were 7.92, 11.81, 18.34, and 11.36 for LD22 line 1, LD22 line 2, LD96 line 1, and LD96 line 3, respectively. Estimates for LD22 line 2 and LD96 line 3 were not significantly different on the basis of a t-test, but all other pairwise differences among the estimates were highly significant (P < 0.0001). The smaller estimate for line 1 compared to line 2 in the LD22 data set indicates a greater degree of LD in line 1, which can be attributed to the fact that line 1 is a more recent cross (12 generations vs. 30 generations for line 2). The smaller estimate for line 3 compared to line 1 in the LD96 data set can be attributed to the fact that line 3 was formed at G1 by crossing two lines and hence would still have large amounts of residual short-range LD at the G5–G7 generations studied here. On the basis of the Sved (1971) prediction equation for LD, and assuming that the standard deviation of LD is equal to the predicted LD, as shown by Hill (1981) for r2, predicted and observed (in parentheses) proportions of LD >0.80 for LD22 line 1 and line 2 and for LD96 line 1 and line 3 are: 0.32 (0.23), 0.23 (0.15), 0.10 (0.11), and 0.24 (0.29), respectively. The mean predicted and observed proportion of LD across the four lines was 0.22 (0.20). Thus, the observed proportion of LD >0.80 was close to that predicted.
Analysis of LD corrected for distance and variance using model 2 also showed a highly significant difference (P = 0.0003) between lines 1 and 2 in the LD22 data set and a significant difference (P = 0.02) between lines 1 and 3 in the LD96 data set. The model 2 analysis did not show significant differences between chromosomes 4 and 5 across lines and the interaction of chromosome and data set was also not significant (data not shown). Thus, this analysis shows that these two chromosomes were not characterized by different degrees of LD.
When evaluating differences in LD between regions, after correcting for genetic distance and heterogeneous variance, following model 3, the interaction of line and region was not significant and hence was dropped from the model. In the LD22 data set, the line effect was significant (P < 0.05) and the region effect was highly significant (P < 0.0001), the latter explaining ∼26% of the total variance. In the LD96 data set, the line effect was not significant but the region effect was again significant (P < 0.005), explaining ∼18% of the total variance.
Correlations between generations:
Table 5 shows observed Pearson correlations of χ2′ for marker pairs between generations of the LD96 data set, according to map distance between markers. For each correlation the expected correlation calculated according to the simulation is also given. For nonsyntenic markers, correlations were consistently low, but were positive and significantly different from zero (P < 0.0007) for both lines, even over four or five generations. For such markers, expected correlations are 0.25 for adjacent generations, similar to what was observed, but close to zero for generations further apart. For syntenic markers at d ≥ 20 cM, correlations were about the same as for nonsyntenic markers for line 1 but about twice as large for line 3. Again, this is probably due to the recent hybridization event in the history of line 3.
For marker distances of <10 cM, correlations among generations were very high for both lines (0.82–0.96) and did not decline much with the number of generations that separated the LD measures. Thus, these results show that marker LD across short distances was strongly conserved in these populations. For somewhat longer distances (10 < d ≤ 20 cM), correlations among generations were reduced considerably and ranged from 0.56 to 0.84. A χ2-test of the difference between observed and expected did not reveal a significant difference.
DISCUSSION
Standardized chi-square, χ2′, was a more effective measure than either r2 or D′ to evaluate marker-LD in the studied populations: r2 was strongly undervalued in multiallelic situations (data not shown), and D′ often gave high values for widely separated or nonsyntenic marker pairs. In contrast, χ2′ maintained a full range of values from 0 to 1.0, even with multiallelic markers, and rarely gave high values for widely separated or nonsyntenic marker pairs. For this reason, and also on the basis of simulation results of Zhao et al. (2005), χ2′ was used as the definitive measure of LD in this study. Moderately high and statistically significant values for D′ have been reported for widely separated and nonsyntenic marker pairs in studies of dairy cattle (Farnir et al. 2000; Vallejo et al. 2003) and sheep (McRae et al. 2002). These have been attributed to the population structure imposed by the small effective numbers of sires in livestock populations (Farnir et al. 2000). It is possible that a population structure of this sort is also responsible for some of the high D′-values obtained in the present study, although an attempt was made to reduce the importance of this factor by choosing individuals that represented as broad a pedigree as possible. A plausible explanation for the high nonsyntenic values of D′ in this and other livestock studies relates to the technical artifact that when one or more of the haplotypes expected in a sample are not observed, D′ must equal 1.0. Consequently, when one of the alleles in a haplotype is at low frequency in a population, haplotypes containing this allele are also expected to be in low frequency in the population. In this situation, one or more of the expected haplotypes may not be present in the sampled individuals, which causes D′ to inflate to 1.0. In studies of human populations, where the D′-measure is commonly used, biallelic markers are the rule, and markers for which one of the alleles is at low frequency are not included. 
In contrast, in livestock studies where multiallelic microsatellites are used, low-frequency alleles and haplotypes are common. This could explain the high D′-values between nonsyntenic markers obtained in this study and in the other livestock studies listed above. In contrast, χ2′ is unaffected by this artifact (Zhao et al. 2005).
The main finding of the present marker-LD study is the presence in the study populations of widespread LD among markers separated by ≤5 cM. Although there were significant differences in the degree of LD between lines 1 and 2, overall LD levels were high. Taking a rough unweighted average across lines and chromosomes, ∼30% of marker pairs in the 0- to 5-cM distance range showed χ2′-values ≥0.50, and 14% showed values ≥0.90. LD dropped off rapidly with increasing distance between markers, with ∼15% of pairs separated by 5–10 cM showing LD ≥0.50, and only 5% showing LD ≥0.90. For markers separated by 10–20 cM, the corresponding values were 4 and 0.6%, respectively. It should be emphasized that these results apply to the specific populations of the present study, which are highly selected and partially inbred. The situation in other populations may be very different. Indeed, differences in LD among lines 1, 2, and 3 were consistent with the histories of these lines, and overall levels of LD in these lines were consistent with their relatively small effective population sizes. In contrast to the results reported here, results reported by the International Chicken Polymorphism Map Consortium (2004b) hint that SNP haplotype blocks in their populations are rarely as large as 0.3 cM.
The relationship of LD to distance found in this study for markers separated by ≤5 cM was comparable to that found by Farnir et al. (2000), using the D′-measure in the Dutch Black and White dairy cattle population. A study of 26 markers on sheep chromosome 6 in a mixed population of Romney, Coopworth, and Perendale sheep also showed comparable values of D′ for markers separated by ≤5 cM (McRae et al. 2002). A study of LD in North American Holstein cattle included only four marker pairs separated by ≤5 cM; highly significant LD was found for one of these pairs (Vallejo et al. 2003).
The finding of LD over extended regions (5–10 cM) implies that testing a candidate gene by an association test in a population of this sort could result in significant associations through linked QTL as far as 5–10 cM from the candidate gene marker, which greatly increases the noise factor. Thus, a candidate gene association test should be accompanied by a marker-LD analysis, to give some idea of the degree of LD in the population and hence of the potential region with which the candidate gene marker might be associating. In addition, marker-LD in a population could provide a criterion for constructing and testing a population a priori for its suitability with respect to candidate gene testing. One could decide on the degree of marker-LD against distance that would be acceptable and construct the population accordingly. Similarly, a two-stage approach might be used to lead from moderate- to high-resolution LD QTL mapping. The initial screen could be at a lower marker density to identify suggestive regions, and then a second analysis would be implemented, using a much higher density of markers in these regions. This would be similar to the combined linkage disequilibrium and linkage analysis mapping procedures that have been proposed (Meuwissen et al. 2002), in which family-based linkage analysis is used to determine general QTL location, and LD analysis is used for high-resolution mapping.
Examination of the Sved (1971) prediction equation for LD as a function of distance shows that the coefficient bj stands in inverse proportion to the extent of LD; that is, the greater the observed LD is, the smaller the estimate of bj. Thus, the effect of bj is similar to the effect of effective population size (Ne). Nevertheless, simulations show that bj is a biased estimate of Ne when based on χ2′ (Zhao et al. 2005). Consequently, the bj estimates obtained in the present study do not provide estimates of Ne.
This study was unique in showing that short-range LD is strongly conserved in consecutive generations—and almost to the same extent across an interval of four or five generations. Thus, short-range LD established in a given generation can be expected to persist over a number of generations. This is important when using marker-QTL LD for MAS. Another important finding was the presence of significant differences in marker LD among chromosomal regions within lines. To the extent that such differences are generated by drift, it may be useful to concentrate mapping efforts in regions that have high LD since in these regions the likelihood of detecting marker-QTL associations may also be higher. However, if regions of high LD represent selection signatures (Kim and Nielsen 2004), regions of high marker LD may be regions where selection has already increased the frequency of favorable alleles to high levels or even fixation, and in this case, depending on the approach to fixation, these regions may not be useful for mapping QTL that are still segregating in the population. Further experimental and theoretical studies are needed to clarify these possibilities.
It is of interest to explore the implications of these findings for QTL mapping based on marker-QTL LD, on the assumption that marker-QTL LD occurs at the same levels as marker-marker LD. At a marker spacing of 2 cM, a randomly placed QTL will be within 5 cM of ∼5 markers, within 5–10 cM of ∼5 markers, and within 10–20 cM of ∼10 markers. If the proportion of marker pairs within a given range R that have a level of LD greater than a set threshold T is denoted LDR, and the number of markers in this range is denoted mR, then by elementary probability calculations and assuming independence of LD between adjacent markers, the likelihood that a QTL will have LD > T with at least one marker in the given range is PR = 1 − (1 − LDR)^mR. Using the approximate levels of LD found in this study as given above (LD≤5cM = 0.30), it can then readily be calculated that there is a likelihood of P≤5cM = 0.83 that a QTL is in LD at χ2′ ≥ 0.50 with a marker that is within 5 cM of the QTL. The corresponding likelihood for χ2′ ≥ 0.50 for a marker in the range 5–10 cM is P5–10cM = 0.56 and for a marker in the range 10–20 cM is P10–20cM = 0.34. Thus, at this marker spacing, for any given QTL, there is a likelihood of P0–20cM = 1 − (1 − P≤5cM)(1 − P5–10cM)(1 − P10–20cM) = 0.95 that marker-QTL LD at χ2′ ≥ 0.50 will be found with at least one marker that is within 20 cM of the QTL, and on average, LD at χ2′ ≥ 0.50 will be found for 2.65 markers within 20 cM of the QTL. Even at moderate statistical power of the experiment, therefore, a genome scan at 2-cM spacing would potentially be able to uncover most of the QTL segregating in these populations. At a marker spacing of 5 cM, similar calculations show that about two-thirds of QTL would be in LD at χ2′ ≥ 0.50 with at least 1 marker within 20 cM. In this case, an average QTL would be in LD with 1.0 markers, and very high statistical power would be required to realize the LD mapping potential of the population.
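The arithmetic in this paragraph can be checked with a short sketch. Only the value 0.30 for markers within 5 cM is quoted in the text. The proportions 0.15 (5–10 cM) and 0.04 (10–20 cM) used below are assumptions, back-solved so that the reported likelihoods of 0.56 and 0.34 are reproduced. The marker counts for 5-cM spacing (2, 2, and 4) are likewise an assumption, obtained by scaling the 2-cM counts. All names are hypothetical.

```python
from math import prod

# Proportion of marker pairs with chi-square' >= 0.50 in each distance range.
# 0.30 is quoted in the text; 0.15 and 0.04 are ASSUMED (back-solved) values.
LD_PROPORTION = {"<=5 cM": 0.30, "5-10 cM": 0.15, "10-20 cM": 0.04}

# Approximate number of markers in each range around a randomly placed QTL.
MARKERS_2CM = {"<=5 cM": 5, "5-10 cM": 5, "10-20 cM": 10}  # from the text
MARKERS_5CM = {"<=5 cM": 2, "5-10 cM": 2, "10-20 cM": 4}   # ASSUMED by scaling


def p_at_least_one(ld_proportion: float, n_markers: int) -> float:
    """Chance that at least one of n independent markers exceeds the threshold."""
    return 1.0 - (1.0 - ld_proportion) ** n_markers


def summarize(markers: dict) -> tuple:
    """Return per-range likelihoods, the overall likelihood within 20 cM,
    and the expected number of markers in LD with the QTL."""
    per_range = {r: p_at_least_one(LD_PROPORTION[r], m) for r, m in markers.items()}
    overall = 1.0 - prod(1.0 - p for p in per_range.values())
    expected = sum(LD_PROPORTION[r] * m for r, m in markers.items())
    return per_range, overall, expected


for label, markers in (("2 cM", MARKERS_2CM), ("5 cM", MARKERS_5CM)):
    per_range, overall, expected = summarize(markers)
    print(label, {r: round(p, 2) for r, p in per_range.items()},
          "overall:", round(overall, 2), "expected markers:", round(expected, 2))
```

With these inputs the 2-cM case reproduces the reported 0.83, 0.56, 0.34, 0.95, and 2.65. The 5-cM case gives an overall likelihood close to 0.70 and about 1.06 markers, in line with "about two-thirds" and "1.0 markers". The independence assumption stated in the text is built into both the per-range and the overall calculation.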
This said, the evaluation of statistical power for marker-QTL LD association tests in agricultural populations is a complex problem that remains to be adequately addressed.
Nonetheless, whether at a marker spacing of 2 or 5 cM, only 57% of positive marker-QTL association tests will be with markers that are within 5 cM of the QTL, while the remainder will be distributed among markers in the range 5–10 cM (28%) and 10–20 cM (15%). Thus, a finding of marker-QTL association in these populations does not automatically position the QTL close to the marker. Achieving this will require genotyping additional markers, at a spacing of 0.25–1 cM, in the regions of interest.
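The 57/28/15% split can be reproduced as the share of markers expected to be in LD with the QTL that fall in each distance range. The sketch below is hedged in the same way as the text's own calculation: only the proportion 0.30 within 5 cM is quoted in this section, whereas 0.15 and 0.04 are assumed values consistent with the reported likelihoods. It also assumes that every marker in LD above the threshold is equally likely to give a positive test. The names are hypothetical.

```python
# Proportion of marker pairs with chi-square' >= 0.50 per distance range.
# 0.30 is quoted in the text; 0.15 and 0.04 are ASSUMED values.
LD_PROPORTION = {"<=5 cM": 0.30, "5-10 cM": 0.15, "10-20 cM": 0.04}

# Total map length covered by each range, counting both sides of the QTL.
RANGE_WIDTH_CM = {"<=5 cM": 10.0, "5-10 cM": 10.0, "10-20 cM": 20.0}


def association_shares(spacing_cm: float) -> dict:
    """Share of markers in LD with the QTL that fall in each distance range."""
    expected = {
        r: LD_PROPORTION[r] * RANGE_WIDTH_CM[r] / spacing_cm
        for r in LD_PROPORTION
    }
    total = sum(expected.values())
    return {r: count / total for r, count in expected.items()}


for spacing in (2.0, 5.0):
    shares = association_shares(spacing)
    print(f"{spacing} cM spacing:",
          {r: f"{100 * s:.0f}%" for r, s in shares.items()})
```

The marker spacing cancels out of the ratio, which is why the same split applies at 2 and 5 cM: denser markers raise the number of associations detected but not the proportion that lie close to the QTL.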
An optimal strategy for detecting QTL using population-wide LD remains to be worked out, but it might involve the use of selective DNA pooling (Darvasi and Soller 1994; Lipkin et al. 1998) for the initial scan at close marker spacing, followed by individual genotyping of suggestive markers with an association test in the second stage to confirm marker-QTL LD. Multitrait (Korol et al. 1995) and multilocus (de Koning et al. 2001) methods might be useful at this stage to increase statistical power. In addition, multilocus haplotype analysis may increase mapping precision compared to single-marker methods (Meuwissen and Goddard 2000; Lee et al. 2004), although that has been questioned in recent research, which showed that single-marker regression analysis can be just as effective for fine mapping (Grapes et al. 2004). Studying LD across two or more generations should also be effective in narrowing the region to which the QTL has been mapped.
We thank Hy-Line International and in particular the staff of the molecular biology lab at Hy-Line (Karol Field, Amy McCarron, and Kara Pinegar) for providing the marker genotype data on which this study was based. E.H. was supported in part by a grant from Hy-Line International.
1 Present address: Department of Animal Science, Iowa State University, Ames, IA 50011.
Communicating editor: T. H. D. Brown
- Received April 19, 2005.
- Accepted August 2, 2005.
- Copyright © 2005 by the Genetics Society of America