Precision and High-Resolution Mapping of Quantitative Trait Loci by Use of Recurrent Selection, Backcross or Intercross Schemes
Z. W. Luo, Chung-I Wu, M. J. Kearsey

Abstract

Dissecting quantitative genetic variation into genes at the molecular level has been recognized as the greatest challenge facing geneticists in the twenty-first century. Tremendous efforts in the last two decades were invested to map a wide spectrum of quantitative genetic variation in nearly all important organisms onto their genome regions that may contain genes underlying the variation, but the candidate regions predicted so far are too coarse for accurate gene targeting. In this article, the recurrent selection and backcross (RSB) schemes were investigated theoretically and by simulation for their potential in mapping quantitative trait loci (QTL). In the RSB schemes, selection plays the role of maintaining the recipient genome in the vicinity of the QTL, which, at the same time, are rapidly narrowed down over multiple generations of backcrossing. With a high-density linkage map of DNA polymorphisms, the RSB approach has the potential of dissecting the complex genetic architecture of quantitative traits and enabling the underlying QTL to be mapped with the precision and resolution needed for their map-based cloning to be attempted. The factors affecting efficiency of the mapping method were investigated, suggesting guidelines under which experimental designs of the RSB schemes can be optimized. Comparison was made between the RSB schemes and the two popular QTL mapping methods, interval mapping and composite interval mapping, and showed that the scenario of genomic distribution of QTL that was unlocked by the RSB-based mapping method is qualitatively distinguished from those unlocked by the interval mapping-based methods.

THE benchmark article by Lander and Botstein (1989) stimulated enormous interest in locating quantitative trait loci (QTL) in experimental and natural populations. Research efforts in the last decade were focused on mass production of high-throughput DNA polymorphic markers (Dib et al. 1996; Wang et al. 1998; Marth et al. 2001) and development of analytical methods for detecting the presence and inferring the locations of QTL in marker linkage maps (Lander and Botstein 1989; Haley and Knott 1992; Luo and Kearsey 1992; Zeng 1994; Rabinowitz 1997; Mott et al. 2000). A recent comprehensive review based on 47 experimental studies of QTL mapping in plants, however, revealed that the current QTL mapping practice entails tremendous research effort and financial investment but yields QTL map localizations that are far from being satisfactory for identifying and isolating the quantitative trait genes at the molecular level (Kearsey and Farquhar 1998). The analysis showed that QTL were usually mapped with low accuracy and poor resolution (∼10–30 cM) and that the proportion of quantitative genetic variation determined by the QTL detected was very low (∼5%). Little progress has been made so far in cloning quantitative trait genes on the basis of inferred map location of QTL despite the claim in Alpert and Tanksley (1996) that a yeast artificial chromosome (YAC) clone bearing a major QTL affecting fruit weight in tomato was successfully obtained. However, the gene (fw2.2) was finally identified after 20 years' journey in narrowing down the candidate genomic region that contains fw2.2 (Frary et al. 2000).

Theoretical investigations (Boehnke 1994; Guo and Lange 2000) suggested that the major bottleneck in narrowing down the confidence interval of QTL location is the limited number of informative meioses obtainable in most mapping populations in the literature. Experimental strategies using historically accumulated recombinations between markers and QTL have been suggested as an efficient approach to improving map resolution of QTL. These include several alternatives. First, Darvasi and Soller (1995) demonstrated that the confidence interval of QTL location inferred from a conventional F2 mapping population might be reduced by up to fivefold if the F2 population is expanded into a so-called advanced intercross line (AIL) by continued intercross. Improvement in the mapping resolution in an AIL is due to breakdown of linkage disequilibrium between the QTL and their linked marker loci. However, an appropriate statistical method still needs to be developed to model and analyze the data from an AIL experiment (Manly and Olson 1999). Second, the rate of dissipation in linkage disequilibrium between QTL and nearby marker loci in genetically isolated natural populations with good genealogical records may be modeled in terms of the recombination fraction between the loci and demographic parameters defining the evolutionary history of the populations. This approach may, in the best case, enable QTL to be located in a region of <1 cM (Hill and Weir 1994; de la Chapelle and Wright 1998; Luo et al. 2000; Luo and Wu 2001). However, it must be pointed out that much uncertainty exists in this population-based analysis if evolutionary details of the populations are not appropriately taken into account (Zollner and von Haeseler 2000). Statistical modeling of linkage disequilibrium involved with QTL has arisen as a new challenge to modern quantitative genetics (Luo et al. 2000; Luo and Wu 2001).
Third, use of congenic lines was shown to be effective in narrowing intervals of inferred QTL location, provided the QTL effect was so large that genotypes at the QTL could be assigned accurately (Darvasi 1997; Nadeau and Frankel 2000). Another major limitation of this approach is that tightly linked QTL would not be resolved if their genetic effects act in the same direction in the congenic strains. Fourth, construction of chromosome substitution lines enables precise identification of the chromosomes that carry QTL. Recombinant progeny from backcrossing the appropriate chromosome substitution strain to its host strain may be used to test whether more than one QTL accounts for trait phenotypic difference among the substitution chromosomes and to locate each QTL with considerable map resolution (Nadeau et al. 2000).

A breeding scheme with repeated backcrossing and selection was proposed long ago by Wright (1952) for isolating quantitative trait genes of large effect, but little progress was made in QTL analysis until a recent series of elegant theoretical studies by Hill (1997, 1998). On the basis of his theory of directional selection for quantitative traits in finite populations (Hill 1969), he formulated the probability that quantitative trait genes of specified effects remain segregating in a backcross family undergoing one generation of truncation selection for nonrecurrent parental phenotype. The results were extended to multiple generations by an approximation that did not take into account the change in gene frequency under repeated selection and random drift. Use of multiple generations in the recurrent selection backcross (RSB) scheme is essential for accumulating a sufficient amount of recombination both between closely linked QTL and between the QTL and nearby markers. It has been clear that the RSB is an effective approach only in isolating QTL of large effect from their surrounding genome regions; QTL with small to medium effect have a high probability of being lost during the breeding scheme. An alternative way, suggested in Hill (1998), is to intersperse one generation of intercrossing among the selected individuals between consecutive backcrossings. However, it is less clear what impact the recurrent selection backcross inter se intercross (RSBI) scheme will have on maintaining QTL of small effect on the one hand and on separating the QTL from their surrounding marker loci on the other. Moreover, many important questions remain to be answered. How can the basic idea of RSB or RSBI be extended to dissect complex genetic variation into QTL? How robust is the strategy to various models of QTL effect? 
What precision and resolution in the QTL mapping may be expected by use of the RSB or RSBI schemes if advantage is fully taken of the fast development of single nucleotide polymorphic (SNP) markers? The extremely dense distribution of SNP markers over the whole genome may provide at least one polymorphic site in each of the functional genes in the genome (Marth et al. 2001), thus allowing full control of the genetic architecture that underlies complex quantitative genetic variation. What are the major factors affecting the RSB or RSBI for their efficiency in QTL mapping? In an attempt to address these questions, this article provides an exact theoretical prediction of mean and variance of heterozygosity at a marker locus linked to one or two QTL with any degree of recombination for any number of generations of the RSB schemes. This builds up a theoretical basis for the RSB-based QTL mapping. More complicated models were investigated and the above questions were explored by numerical evaluation of the theoretical predictions and by extensive computer simulation. Comparisons were made between the RSB(I) strategy and the routine methods of QTL mapping for their precision and resolution in identifying locations of multiple linked QTL.

THEORY AND METHODS

The breeding scheme: The theoretical analysis considers a breeding scheme initiated from two inbred lines P1 and P2 that are assumed to be fixed for different alleles at m marker loci and q loci affecting a quantitative trait. The QTL alleles increasing the phenotype are fixed in P1 and those decreasing the phenotype in P2. Effects of individual QTL are scaled in units of the standard deviation of the residual variation of the trait, which is assumed to be normally distributed with mean zero and variance 1.0. Various models for genetic effects of the QTL are considered in the following analysis.

The two inbred lines are crossed to generate an F1 family and a random sample of F1 individuals is backcrossed to recurrent parental line P2 to produce F independent backcross families with a constant size of N. These families are defined as the first generation of the breeding program. In each of these backcross families, n individuals with the top-scoring phenotype for a quantitative trait are selected. The selected individuals are either backcrossed to the recurrent parent or randomly intercrossed to produce the next generation of the families. The breeding program lasts for T generations. Let $B_{it}$ ($SB_{it}$) or $I_{it}$ ($SI_{it}$) denote the ith backcross or intercross family (i = 1, 2, ..., F) at generation t before (or after) selection. A simple diagram describing the RSB breeding scheme is given in Figure 1.

Figure 1. A diagram illustrating a recurrent selection and backcross (RSB) breeding scheme.

Hill (1997, 1998) exploited the probability that genes at one or two QTL of specified effects remain segregating after one generation of truncation selection. This provides a direct comparison with the situation where the probability is reduced by one-half, on average, if the locus is free of the selection pressure. Instead of calculating the segregating probability at the selected locus, we consider heterozygosity maintained at any marker loci, which may or may not be linked to the QTL in any generation of the breeding program. In all QTL mapping strategies, map location of a putative QTL is inferred from information of marker genotype and trait phenotype observed from a specified mapping population. In the RSB scheme, the genetic contribution of QTL to the trait phenotype is reflected as efficiency of selection in counterbalancing against the loss of the QTL allele of the nonrecurrent parent due to repeated backcross and genetic drift, while the genetic marker in the system provides information about genome location and extent to which the selection affects it. The closer the marker locus is to the selected locus (QTL), the more efficient the selection will be in maintaining the recipient allele at the marker locus, and thus the more likely the marker locus is in heterozygous status. Therefore, the marker heterozygosity serves as a natural and rational measure for its location relative to the QTL. When the genome regions bearing QTL are covered with densely distributed markers, the closest approximation for map location of the QTL may be inferred as the location at which the marker heterozygosity reaches its peak value. Thus, the theoretical analyses below are focused on heterozygosity of a marker locus that is linked to QTL with an arbitrary value of recombination frequency.

The theoretical analyses comprise four sections. The first two relate to the dynamic change in genetic structure of the RSB breeding populations under two loci (one marker and one QTL) and three loci (one marker and two QTL) models. The third section develops the calculation of the mean and variance of the marker heterozygosity under the above two models. When the RSB scheme is interrupted by incorporation of intercrossing, prediction of the dynamic change in linkage disequilibria between multiple loci becomes intractable. To investigate the impact of the RSBI, theoretical analysis is restricted to one QTL only and is described in the last section.

Two loci model: The model considers one marker locus and one QTL. The two alleles at the marker locus are denoted by M and m, respectively, and those at the QTL by A and a. For each of the F independent backcross families, let $X_t = (x_{t1}\ x_{t2}\ x_{t3}\ x_{t4})'$ and $Y_t = (y_{t1}\ y_{t2}\ y_{t3}\ y_{t4})'$ be the two vectors that represent the distribution of the four possible genotypes at generation t in $B_{it}$ (before selection) and in $SB_{it}$ (after selection), respectively. In other words, element $x_{tj}$ (or $y_{tj}$) is a random variable for the number of individuals with genotype j in $B_{it}$ (or $SB_{it}$), where j = 1, 2, 3, 4 corresponds, respectively, to the joint genotypes MA/ma, ma/ma, mA/ma, and Ma/ma at the marker and QTL.

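The stochastic change in population genetic structure during an RSB program is described fully by the following probability distributions:

$$
\begin{aligned}
p_t^{R} &= \Pr\{X_t = R\} = \Pr\{x_{tk} = r_k,\ k = 1, 2, 3, 4\},\\
q_t^{S} &= \Pr\{Y_t = S\} = \Pr\{y_{tk} = s_k,\ k = 1, 2, 3, 4\},\\
\xi_t^{S \mid R} &= \Pr\{Y_t = S \mid X_t = R\} = \Pr\{y_{tk} = s_k,\ k = 1, 2, 3, 4 \mid x_{tk} = r_k,\ k = 1, 2, 3, 4\},\\
\eta_t^{R \mid S} &= \Pr\{X_t = R \mid Y_{t-1} = S\} = \Pr\{x_{tk} = r_k,\ k = 1, 2, 3, 4 \mid y_{t-1,k} = s_k,\ k = 1, 2, 3, 4\}.
\end{aligned}
$$

These can be evaluated in a recursive formulation that is initiated from

$$
\begin{aligned}
p_1^{R} &= \Pr\{X_1 = R\} = \frac{N!}{2^{N} \prod_{k=1}^{4} r_k!}\,(1 - c)^{r_1 + r_2}\, c^{\,r_3 + r_4},\\
q_1^{S} &= \Pr\{Y_1 = S\} = \sum_{R} \Pr\{X_1 = R\} \Pr\{Y_1 = S \mid X_1 = R\} = \sum_{R} p_1^{R}\, \xi_1^{S \mid R}.
\end{aligned}
\tag{1}
$$

In general,

$$
\begin{aligned}
p_t^{R} &= \Pr\{X_t = R\} = \sum_{S} \Pr\{Y_{t-1} = S\} \Pr\{X_t = R \mid Y_{t-1} = S\} = \sum_{S} q_{t-1}^{S}\, \eta_t^{R \mid S},\\
q_t^{S} &= \Pr\{Y_t = S\} = \sum_{R} \Pr\{X_t = R\} \Pr\{Y_t = S \mid X_t = R\} = \sum_{R} p_t^{R}\, \xi_t^{S \mid R}.
\end{aligned}
\tag{2}
$$

In the above, $\sum_R$ (or $\sum_S$) represents summation over all possible $r_k$ under the constraint $\sum_{k=1}^{4} r_k = N$ (or over all possible $s_k$ under $\sum_{k=1}^{4} s_k = n$). The conditional probability of the selection outcome can be calculated as

$$
\xi_t^{S \mid R} = \Pr\{Y_t = S \mid X_t = R\} = \prod_{k=1}^{4} \binom{r_k}{s_k} \sum_{i=1}^{4} s_i\, I_i(S, R; \mu),
\tag{3}
$$

in which

$$
I_i(S, R; \mu) = \int_{-\infty}^{\infty} [1 - \Phi_i(x)]^{s_i - 1} \prod_{j=1}^{4} \Phi_j(x)^{r_j - s_j} \prod_{j \neq i}^{4} [1 - \Phi_j(x)]^{s_j} f_i(x)\, dx,
\tag{4}
$$

where $\mu = (\mu_1\ \mu_2\ \mu_1\ \mu_2)'$, $\mu_1$ and $\mu_2$ are the means of QTL genotypes Aa and aa, respectively, and $f_i(x)$ and $\Phi_i(x)$ are, respectively, the probability density function and the probability distribution function of a normal distribution with mean $\mu_i$ and variance 1.0. The conditional probability of the genotype counts in the next backcross family is multinomial,

$$
\eta_t^{R \mid S} = \Pr\{X_t = R \mid Y_{t-1} = S\} = \frac{N!}{\prod_{k=1}^{4} r_k!} \prod_{k=1}^{4} \phi_k^{\,r_k},
\tag{5}
$$

where $\phi_1 = (1 - c)s_1/2n$, $\phi_2 = [(1 - c)s_1 + 2s_2 + s_3 + s_4]/2n$, $\phi_3 = [c s_1 + s_3]/2n$, and $\phi_4 = [c s_1 + s_4]/2n$. The divisor is 2n because the n selected individuals are the parents of the next family, so that $\sum_k \phi_k = 1$ when $\sum_k s_k = n$.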
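The two multinomial pieces of the recursion (the initial distribution in Equation 1 and the backcross transition in Equation 5) can be sketched as follows. This is a minimal illustration rather than the authors' program: the function names are hypothetical, and the transition probabilities are divided by twice the number of selected parents (2n), an assumption made here so that they sum to one. The selection step (Equations 3 and 4) is not included.

```python
from math import factorial, isclose


def compositions(total, parts):
    """Yield every tuple of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def multinomial_pmf(counts, probs):
    """Multinomial probability of `counts` given category probabilities `probs`."""
    coef = factorial(sum(counts))
    for r in counts:
        coef //= factorial(r)  # stays an exact integer at every step
    p = float(coef)
    for r, pk in zip(counts, probs):
        p *= pk ** r
    return p


def initial_probs(c):
    """Genotype probabilities (MA/ma, ma/ma, mA/ma, Ma/ma) in the first
    backcross family, F1 (MA/ma) x recurrent parent (ma/ma)."""
    return ((1 - c) / 2, (1 - c) / 2, c / 2, c / 2)


def backcross_probs(s, c):
    """Offspring genotype probabilities after backcrossing the selected
    parents s = (s1, s2, s3, s4) to ma/ma.
    Assumption: the divisor is 2n with n = sum(s)."""
    s1, s2, s3, s4 = s
    two_n = 2 * (s1 + s2 + s3 + s4)
    return (
        (1 - c) * s1 / two_n,
        ((1 - c) * s1 + 2 * s2 + s3 + s4) / two_n,
        (c * s1 + s3) / two_n,
        (c * s1 + s4) / two_n,
    )


if __name__ == "__main__":
    N, c = 6, 0.1
    total = sum(multinomial_pmf(r, initial_probs(c)) for r in compositions(N, 4))
    print("sum of initial distribution over all R:", total)
    print("transition probabilities:", backcross_probs((2, 1, 0, 0), c))
```

Enumerating every configuration R, as `compositions` does, is only feasible for small families, which is the same limitation the numerical analysis runs into.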

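Three loci model: There are three possible patterns of relative locations of one marker locus and two QTL under this model. Let $c_1$ and $c_2$ be recombination frequencies between loci 1 and 2 and between loci 2 and 3, respectively. Assuming there is no recombination interference, the recombination frequency between the first and the third loci will be $c = c_1(1 - c_2) + c_2(1 - c_1)$. Analogous to the above two loci model, the breeding scheme at generation t can be described by two random vectors, $X_t = (x_{t1}\ x_{t2} \ldots x_{t8})'$ and $Y_t = (y_{t1}\ y_{t2} \ldots y_{t8})'$, whose component $x_{ti}$ (or $y_{ti}$) denotes the number of individuals with the marker-QTL genotype i (i = 1, 2, ..., 8 corresponding to AMB/amb, amb/amb, Amb/amb, aMB/amb, AMb/amb, amB/amb, AmB/amb, and aMb/amb, accordingly) in $B_{it}$ (or $SB_{it}$). The probability distributions of these random vectors are given by

$$
p_t^{R} = \Pr\{X_t = R\} = \Pr\{x_{tk} = r_k,\ k = 1, 2, \ldots, 8\} = \sum_{S} \Pr\{Y_{t-1} = S\} \Pr\{X_t = R \mid Y_{t-1} = S\} = \sum_{S} q_{t-1}^{S}\, \eta_t^{R \mid S}
\tag{6}
$$

$$
q_t^{S} = \Pr\{Y_t = S\} = \Pr\{y_{tk} = s_k,\ k = 1, 2, \ldots, 8\} = \sum_{R} \Pr\{X_t = R\} \Pr\{Y_t = S \mid X_t = R\} = \sum_{R} p_t^{R}\, \xi_t^{S \mid R}.
\tag{7}
$$

These can be evaluated from the initial condition

$$
p_1^{R} = \frac{N!}{2^{N} \prod_{k=1}^{8} r_k!}\, [(1 - c_1)(1 - c_2)]^{r_1 + r_2} \times [c_1(1 - c_2)]^{r_3 + r_4}\, [(1 - c_1)c_2]^{r_5 + r_6}\, (c_1 c_2)^{r_7 + r_8}
\tag{8}
$$

and the conditional probabilities

$$
\xi_t^{S \mid R} = \Pr\{Y_t = S \mid X_t = R\} = \prod_{k=1}^{8} \binom{r_k}{s_k} \sum_{i=1}^{8} s_i \int_{-\infty}^{\infty} [1 - \Phi_i(x)]^{s_i - 1} \prod_{j=1}^{8} \Phi_j(x)^{r_j - s_j} \prod_{j \neq i}^{8} [1 - \Phi_j(x)]^{s_j} f_i(x)\, dx
\tag{9}
$$

$$
\eta_t^{R \mid S} = \Pr\{X_t = R \mid Y_{t-1} = S\} = \frac{N!}{\prod_{k=1}^{8} r_k!} \prod_{k=1}^{8} \lambda_k^{\,r_k},
\tag{10}
$$

where $f_i(x)$ and $\Phi_i(x)$ are defined similarly to those in Equation 4. $\Lambda = (\lambda_1\ \lambda_2 \ldots \lambda_8)'$ can be calculated from the product of the matrix G and $S = (s_1\ s_2 \ldots s_8)'$ as

$$
\Lambda = \frac{1}{n}\, G S,
\tag{11}
$$

where the matrix G has a form that depends on the relative location of the marker to the QTL. Column j of G holds the probabilities with which a selected parent of genotype j, backcrossed to amb/amb, produces each of the eight offspring genotypes, so every column sums to one and, with the divisor n, so do the $\lambda_k$. In the matrices below, $c = c_1(1 - c_2) + c_2(1 - c_1)$ as defined above, and the common factor 1/2 has been taken out.

If the marker is located to the left of the linked QTL,

$$
G = \frac{1}{2}
\begin{pmatrix}
(1-c_1)(1-c_2) & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
(1-c_1)(1-c_2) & 2 & 1 & 1-c & 1-c_1 & 1 & 1-c_2 & 1\\
c_1 c_2 & 0 & 1 & 0 & c_1 & 0 & c_2 & 0\\
c_1 c_2 & 0 & 0 & 1-c & 0 & 0 & 0 & 0\\
(1-c_1)c_2 & 0 & 0 & 0 & 1-c_1 & 0 & 0 & 0\\
(1-c_1)c_2 & 0 & 0 & c & 0 & 1 & c_2 & 0\\
c_1(1-c_2) & 0 & 0 & 0 & 0 & 0 & 1-c_2 & 0\\
c_1(1-c_2) & 0 & 0 & c & c_1 & 0 & 0 & 1
\end{pmatrix};
$$

when the marker is between the linked QTL,

$$
G = \frac{1}{2}
\begin{pmatrix}
(1-c_1)(1-c_2) & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
(1-c_1)(1-c_2) & 2 & 1 & 1-c_2 & 1-c_1 & 1 & 1-c & 1\\
c_1(1-c_2) & 0 & 1 & 0 & c_1 & 0 & c & 0\\
c_1(1-c_2) & 0 & 0 & 1-c_2 & 0 & 0 & 0 & 0\\
(1-c_1)c_2 & 0 & 0 & 0 & 1-c_1 & 0 & 0 & 0\\
(1-c_1)c_2 & 0 & 0 & c_2 & 0 & 1 & c & 0\\
c_1 c_2 & 0 & 0 & 0 & 0 & 0 & 1-c & 0\\
c_1 c_2 & 0 & 0 & c_2 & c_1 & 0 & 0 & 1
\end{pmatrix};
$$

and if the marker is located to the right of the linked QTL,

$$
G = \frac{1}{2}
\begin{pmatrix}
(1-c_1)(1-c_2) & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
(1-c_1)(1-c_2) & 2 & 1 & 1-c_2 & 1-c & 1 & 1-c_1 & 1\\
c_1(1-c_2) & 0 & 1 & 0 & c & 0 & c_1 & 0\\
c_1(1-c_2) & 0 & 0 & 1-c_2 & 0 & 0 & 0 & 0\\
c_1 c_2 & 0 & 0 & 0 & 1-c & 0 & 0 & 0\\
c_1 c_2 & 0 & 0 & c_2 & 0 & 1 & c_1 & 0\\
(1-c_1)c_2 & 0 & 0 & 0 & 0 & 0 & 1-c_1 & 0\\
(1-c_1)c_2 & 0 & 0 & c_2 & c & 0 & 0 & 1
\end{pmatrix}.
$$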
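Because the three G matrices differ only in the physical order of the loci, they can also be generated from first principles, which gives a way to check individual entries. The sketch below is an illustration under stated assumptions, not the authors' code: the function name `transmission_matrix`, the bit encoding of genotypes, and the `order` argument are all hypothetical, and recombination in the two intervals is taken to be independent (no interference), as in the text.

```python
from itertools import product

# Locus names in the order used by the genotype labels (A, M, B).
LOCI = ("A", "M", "B")

# Genotypes 1..8 as (A, M, B) bits; 1 marks the nonrecurrent-parent allele:
# AMB, amb, Amb, aMB, AMb, amB, AmB, aMb (each paired with amb).
GENOTYPES = [
    (1, 1, 1), (0, 0, 0), (1, 0, 0), (0, 1, 1),
    (1, 1, 0), (0, 0, 1), (1, 0, 1), (0, 1, 0),
]


def transmission_matrix(order, c1, c2):
    """Return the 8x8 matrix G for a physical locus `order`, e.g.
    ("M", "A", "B") for the marker left of both QTL. c1 and c2 are the
    recombination frequencies of the first and second interval.
    G[k][j] is the probability that a parent of genotype j, backcrossed to
    amb/amb, yields an offspring of genotype k."""
    G = [[0.0] * 8 for _ in range(8)]
    for j, parent in enumerate(GENOTYPES):
        # origin[pos] = 1 if the gamete copies the parent's first haplotype
        # at physical position pos, 0 if it copies the amb haplotype.
        for origin in product((0, 1), repeat=3):
            p = 0.5
            p *= c1 if origin[0] != origin[1] else 1 - c1
            p *= c2 if origin[1] != origin[2] else 1 - c2
            gamete = [0, 0, 0]
            for pos, name in enumerate(order):
                locus = LOCI.index(name)
                gamete[locus] = parent[locus] if origin[pos] else 0
            G[GENOTYPES.index(tuple(gamete))][j] += p
    return G


if __name__ == "__main__":
    G = transmission_matrix(("M", "A", "B"), 0.1, 0.2)
    for row in G:
        print(" ".join(f"{x:6.3f}" for x in row))
```

Multiplying the returned matrix by the vector of selected counts and dividing by the number of selected parents gives the offspring genotype probabilities used in the multinomial transition.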

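Mean and variance of the marker heterozygosity: The marginal probabilities $\pi_{tk} = \Pr\{x_{tk} = r_k\}$ and $\pi_{tij} = \Pr\{x_{ti} = r_i, x_{tj} = r_j\}$ can be computed from the above joint probability distributions as

$$
\pi_{tk} = \sum_{r_l,\ l \neq k} \Pr\Big\{x_{tl} = r_l,\ l = 1, 2, \ldots, K \,\Big|\, \sum_{i=1}^{K} r_i = N\Big\}
\quad\text{and}\quad
\pi_{tij} = \sum_{r_l,\ l \neq i, j} \Pr\Big\{x_{tl} = r_l,\ l = 1, 2, \ldots, K \,\Big|\, \sum_{i=1}^{K} r_i = N\Big\}
$$

with K = 4 or 8 for the two or three loci model, respectively. Below, $\pi_{tk}(i)$ denotes $\Pr\{x_{tk} = i\}$ and $\pi_{tkl}(i, j)$ denotes $\Pr\{x_{tk} = i, x_{tl} = j\}$. The marker heterozygotes are genotypes 1 and 4 (MA/ma and Ma/ma) in the two loci model and genotypes 1, 4, 5, and 8 in the three loci model. Thus, the expected mean and variance of the marker heterozygosity can be calculated as

$$
H_t =
\begin{cases}
\dfrac{1}{N} \displaystyle\sum_{i=1}^{N} \big[\pi_{t1}(i) + \pi_{t4}(i)\big]\, i & \text{for the two loci model}\\[2ex]
\dfrac{1}{N} \displaystyle\sum_{i=1}^{N} \big[\pi_{t1}(i) + \pi_{t4}(i) + \pi_{t5}(i) + \pi_{t8}(i)\big]\, i & \text{for the three loci model}
\end{cases}
\tag{12}
$$

and

$$
V_t = \frac{F - 1}{N^{2} F} \Bigg\{ \sum_{i=1}^{N} \big[\pi_{t1}(i) + \pi_{t4}(i)\big]\, i^{2} + 2 \sum_{i=0}^{N} \sum_{j=0}^{N} \pi_{t14}(i, j)\, ij - \Big[ \sum_{i=0}^{N} \big[\pi_{t1}(i) + \pi_{t4}(i)\big]\, i \Big]^{2} \Bigg\}
\tag{13}
$$

for the two loci model or

$$
V_t = \frac{F - 1}{N^{2} F} \Bigg\{ \sum_{i=1}^{N} \big[\pi_{t1}(i) + \pi_{t4}(i) + \pi_{t5}(i) + \pi_{t8}(i)\big]\, i^{2} + 2 \sum_{i=0}^{N} \sum_{j=0}^{N} \big[\pi_{t14} + \pi_{t15} + \pi_{t18} + \pi_{t45} + \pi_{t48} + \pi_{t58}\big](i, j)\, ij - \Big[ \sum_{i=0}^{N} \big[\pi_{t1}(i) + \pi_{t4}(i) + \pi_{t5}(i) + \pi_{t8}(i)\big]\, i \Big]^{2} \Bigg\}
\tag{14}
$$

for the three loci model. In theory, the above analysis may be extended to any number of marker loci and QTL but the algebra involved would be very tedious.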

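Recurrent selection and backcross or intercross scheme: Here we examined the impact of introducing intercrossing among the selected individuals into the RSB scheme on maintenance of heterozygosity at a single QTL. To model the dynamics of the RSBI, let $X_t = (x_{t1}\ x_{t2}\ x_{t3})'$ and $Y_t = (y_{t1}\ y_{t2}\ y_{t3})'$ be two vectors whose elements $x_{ti}$ (or $y_{ti}$) are the random numbers of individuals with genotype i in one of the independent families before (or after) selection (i = 1, 2, 3 corresponding to the QTL genotypes AA, Aa, and aa, respectively). Probability distributions of these random vectors are given by

$$
p_t^{R} = \Pr\{X_t = R\} = \Pr\{x_{tk} = r_k,\ k = 1, 2, 3\} = \sum_{S} \Pr\{Y_{t-1} = S\} \Pr\{X_t = R \mid Y_{t-1} = S\} = \sum_{S} q_{t-1}^{S}\, \eta_t^{R \mid S}(X)
\tag{15}
$$

$$
q_t^{S} = \Pr\{Y_t = S\} = \Pr\{y_{tk} = s_k,\ k = 1, 2, 3\} = \sum_{R} p_t^{R} \prod_{k=1}^{3} \binom{r_k}{s_k} \sum_{i=1}^{3} s_i \int_{-\infty}^{\infty} [1 - \Phi_i(x)]^{s_i - 1} \prod_{j=1}^{3} \Phi_j(x)^{r_j - s_j} \times \prod_{j \neq i}^{3} [1 - \Phi_j(x)]^{s_j} f_i(x)\, dx.
\tag{16}
$$

The conditional probability $\eta_t^{R \mid S}(X)$ depends on whether the selected individuals are backcrossed to the recurrent line (X = B) or intercrossed to each other (X = I) and has the forms

$$
\eta_t^{R \mid S}(B) = \binom{N}{r_2} \left[\frac{2s_1 + s_2}{2n}\right]^{r_2} \left[\frac{s_2 + 2s_3}{2n}\right]^{r_3}
\tag{17}
$$

$$
\eta_t^{R \mid S}(I) = \frac{N!}{\prod_{k=1}^{3} r_k!} \prod_{k=1}^{3} \phi_k^{\,r_k},
\tag{18}
$$

where $\phi_1 = [4s_1(s_1 - 1) + 4s_1 s_2 + s_2(s_2 - 1)]/4n(n - 1)$, $\phi_2 = [2s_1 s_2 + 4s_1 s_3 + s_2(s_2 - 1) + 2s_2 s_3]/2n(n - 1)$, and $\phi_3 = [s_2(s_2 - 1) + 4s_2 s_3 + 4s_3(s_3 - 1)]/4n(n - 1)$. These are the offspring genotype probabilities when two distinct selected individuals are mated at random, and they sum to one because $\sum_k s_k = n$; in Equation 17 the two bracketed probabilities likewise sum to one, with $r_1 = 0$ and $r_2 + r_3 = N$. The dynamics of the probability distributions can be readily evaluated using the initial condition $p_1^{R} = \Pr\{X_1 = R\} = (N!/r_2!\, r_3!)(1/2)^{r_2 + r_3}$. Similarly, the mean and variance of heterozygosity at the QTL in generation t of the RSBI scheme are given by

$$
H_t = \frac{1}{N} \sum_{i=1}^{N} \pi_{t2}(i)\, i
\quad\text{and}\quad
V_t = \Bigg[ \sum_{i=1}^{N} \pi_{t2}(i)\, i^{2} - \Big( \sum_{i=1}^{N} \pi_{t2}(i)\, i \Big)^{2} \Bigg] \times \frac{F - 1}{N^{2} F},
$$

respectively, where $\pi_{t2}(i) = \Pr\{x_{t2} = i\}$.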
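The offspring genotype probabilities for the two mating options at a single QTL can be checked with a few lines of code. The sketch below derives them directly from Mendelian segregation, assuming that intercross parents are two distinct selected individuals drawn at random; under that assumption the cross terms are $4s_1s_2$ and $4s_2s_3$, and the three probabilities sum to one. The function names are hypothetical.

```python
def backcross_probs(s):
    """Offspring probabilities (AA, Aa, aa) when the selected individuals
    s = (s1, s2, s3) are backcrossed to the recurrent parent aa."""
    s1, s2, s3 = s
    n = s1 + s2 + s3
    return (0.0, (2 * s1 + s2) / (2 * n), (s2 + 2 * s3) / (2 * n))


def intercross_probs(s):
    """Offspring probabilities (AA, Aa, aa) when two distinct selected
    individuals are mated at random (assumption: no selfing)."""
    s1, s2, s3 = s
    n = s1 + s2 + s3
    pairs = n * (n - 1)  # ordered pairs of distinct parents
    phi1 = (4 * s1 * (s1 - 1) + 4 * s1 * s2 + s2 * (s2 - 1)) / (4 * pairs)
    phi2 = (2 * s1 * s2 + 4 * s1 * s3 + s2 * (s2 - 1) + 2 * s2 * s3) / (2 * pairs)
    phi3 = (s2 * (s2 - 1) + 4 * s2 * s3 + 4 * s3 * (s3 - 1)) / (4 * pairs)
    return (phi1, phi2, phi3)


if __name__ == "__main__":
    # One AA and one Aa parent: the only possible mating is AA x Aa,
    # which gives AA and Aa offspring in equal proportions.
    print(intercross_probs((1, 1, 0)))
    print(backcross_probs((0, 5, 0)))
```

The case with one AA and one Aa parent is a convenient hand check, because only one mating is possible and its offspring ratio is known.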

NUMERICAL ANALYSES AND SIMULATION STUDY

For simplicity but without loss of generality, we considered one chromosome on which there were 100 evenly distributed marker loci. Map distance between a pair of adjacent loci is 1 cM (approximately equivalent to a recombination frequency of 0.01 under Haldane's mapping function). Among the 100 loci, 1 or 2 were assigned to be within the genes underlying a quantitative trait in the numerical analyses of the theoretical model represented in the previous section. In the simulation study discussed below, 3 of the polymorphic marker loci were assumed to be located at the same positions as 3 QTL on the chromosome. The rest of the markers were devoid of effect on the quantitative trait. Assignment of some marker loci to the same polymorphic sites within QTL was based on two considerations. First, single nucleotide polymorphisms within coding sequences of functional genes have proved abundant in genomes whose complete sequence data are available (e.g., yeast, Caenorhabditis elegans, Drosophila, Arabidopsis) or are to be available (e.g., human, mice, pig, cattle, rice). Second, use of extremely dense marker maps allows us to exploit the maximum efficiency of the RSB mapping strategy in identifying the QTL map locations.
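The conversion between map distance and recombination frequency used here can be made concrete. The sketch below implements Haldane's mapping function, $r = \tfrac{1}{2}(1 - e^{-2d})$ with d in Morgans, and the no-interference rule for combining two adjacent intervals; the function names are illustrative only.

```python
import math


def haldane_recombination(distance_cm):
    """Recombination frequency for a map distance in centimorgans under
    Haldane's mapping function (no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))


def combine_intervals(c1, c2):
    """Recombination frequency between the outer loci of two adjacent
    intervals with recombination frequencies c1 and c2, no interference."""
    return c1 * (1.0 - c2) + c2 * (1.0 - c1)


if __name__ == "__main__":
    print(haldane_recombination(1.0))   # close to 0.01 for adjacent markers
    print(haldane_recombination(10.0))
    print(combine_intervals(haldane_recombination(3.0), haldane_recombination(7.0)))
```

Under Haldane's function the two rules agree: combining a 3-cM and a 7-cM interval gives the same recombination frequency as a single 10-cM interval.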

Numerical analyses: For a given set of parameters defined in the previous theoretical analysis, the mean and variance of heterozygosity maintained at a marker locus can be worked out numerically. The only technical difficulty in the numerical analysis is the limitation in computer time and memory for evaluating Equations 2, 6, and 7. These equations involve summation of a huge number of terms. The total number of terms in the summation is equivalent to the number of different configurations of $r_i$ (i = 1, 2, ..., K with K = 4 or 8 under the two and three loci model, respectively) such that $N = \sum_{i=1}^{K} r_i$. A general formula for this number is given by $c(K, N) = \prod_{i=2}^{K} (N + i - 1)/(K - 1)!$, which takes a value of 264,385,836 when K = 8 and N = 50. Thus, numerical analysis demonstrated here has to be restricted to the cases with a small family size.
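The count of configurations is the number of ways to split N individuals among K genotype classes, which equals the binomial coefficient $\binom{N + K - 1}{K - 1}$. A short check (the function name is illustrative) reproduces the figure quoted for K = 8 and N = 50 and shows how much smaller the two loci problem is:

```python
from math import comb


def n_configurations(K, N):
    """Number of ways to distribute N individuals over K genotype classes."""
    return comb(N + K - 1, K - 1)


if __name__ == "__main__":
    print(n_configurations(8, 50))  # three loci model, family size 50
    print(n_configurations(4, 50))  # two loci model, family size 50
    print(n_configurations(8, 10))  # three loci model, family size 10
```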

Figure 2 illustrates distributions of the means and variances of heterozygosity at 100 marker loci that were linked with (a) one, (b) two, or (c) three QTL in the various RSB breeding programs. Figure 2, a and b, were obtained from theoretical predictions and Figure 2c was calculated from averages of 100 repeated simulations of the RSB scheme whose parameters were given as scheme 1 in Table 1. The pattern of change in the mean and variance of heterozygosity maintained at the marker loci directly reflected their locations relative to that of the QTL. The peak of the mean curves always occurred at the same locations as the QTL regardless of the QTL effects and the other parameters. On the other hand, there was a rapid decline in the marker heterozygosity as its mapping distance from the QTL increased. However, the change in the variance over the marker loci showed two different patterns. Whenever there was a substantial level of heterozygosity maintained around the selected loci, the variance was observed to be lower at the selected loci than at their nearby markers (scheme 2 in Figure 2a and the scheme in Figure 2c), while the variance peaked at the selected loci relative to their neighboring markers if the heterozygosity had drifted to a low level (schemes 1–4 in Figure 2a and all schemes in Figure 2b). The difference in the pattern of the variance curves can be explained by noting three facts. First, the variance of heterozygosity in the present context is equivalent to the variance in frequency of the gene favored by selection. The variance in gene frequency due to genetic drift is proportional to the gene frequency. Second, selection efficiency (i.e., the absolute change in frequency of the gene under selection) is inversely proportional to the allele frequency. The smaller the frequency, the more efficient the selection will be in driving up the frequency.
Third, the hitchhiking effect of the selected locus on the nearby neutral markers depends on the selection efficiency at the selected locus and the linkage disequilibrium between the selected and the neutral loci. The low heterozygosity maintained at the selected locus indicates low effectiveness of selection at the locus and thus the dynamics of heterozygosity at the nearby marker loci were less affected by hitchhiking but dominated by genetic drift. This resulted in a much lower allele frequency at the marker loci and thus a smaller variance in their allele frequencies. In contrast, a high gene frequency at the QTL reflects that the influence of genetic drift on frequency is effectively counterbalanced by selection. The strong hitchhiking effect maintained a high gene frequency for those markers in the vicinity of the selected loci before complete breakage of linkage disequilibria between the QTL and markers, while drift in the marker gene frequencies was balanced less effectively than that in the QTL gene frequencies. Thus, a larger variance in gene frequency might be observed at these markers than at the nearby selected loci. The analysis revealed that not only the mean but also the variance of the marker heterozygosity provide essential information regarding locations of the QTL and regarding how genes at both QTL and marker loci in the RSB populations evolve under a complicated combination of evolutionary factors of recombination, genetic drift, and selection.

TABLE 1

The parameters defining the 16 simulated breeding schemes

Figure 2. Expected value (solid symbols) and the variance (open symbols) of heterozygosity at the marker loci linked to varying numbers of QTL under different design parameters: (a) one QTL and all schemes with T = 20; (b) two QTL and all schemes with n = 2, N = 10, F = 50, and T = 20; (c) three QTL and all schemes with n = 5, N = 50, F = 20, and $h^2$ = 0.5; and (d) one QTL and all schemes with n = 5, N = 50, F = 20, and d = 0.5, where n, N, F, T, $h^2$, and d are accordingly the number of individuals selected, the family size, the number of families, the number of generations of consecutive selection and backcrossing, heritability, and additive effect of QTL. Red arrows indicate location of the QTL.

It is clear from the figures that there was rapid fixation at the QTL with small effect in the RSB breeding program. However, the program may be improved in properly designed experiments. For a given number of individuals involved in the experiment, the breeding scheme with larger family sizes but smaller number of families (scheme 2 in Figure 2a) was superior in maintaining a high level of heterozygosity to that with smaller family size but larger number of families (scheme 3 in Figure 2a). Selection intensity played a significant role in slowing down the allele loss at QTL due to genetic drift, particularly when the QTL had a low genetic contribution to the trait. Figure 2b presents the analysis of two linked QTL under three models of genetic effects: (i) the additive and equal effect model (schemes 1 and 4), where the two QTL contributed equally and additively to the trait; (ii) the additive and unequal effect model (scheme 2) under which the two QTL affected the trait additively but the first QTL had twice as large an effect as the second QTL; and (iii) the epistatic model (scheme 3), where the individuals carrying the trait-increasing allele at each of the two QTL had a genotypic value that was fourfold those carrying only one such allele. A distinct feature observed from comparing the three models is that epistasis had remarkable influence on the marker heterozygosity. Under the epistatic model, the individuals carrying the increasing allele at every QTL had a much better chance of being selected in comparison to the corresponding additive model. This anticipates a strong trend for the nonrecombinant gametes carrying all increasing alleles to be selected and in turn results in a substantial increase in the level of heterozygosity at these loci on the one hand and a noticeable decrease in recombination between the loci on the other.

Loss of heterozygosity at the QTL with small effect was very fast in the RSB program. Reducing the amount of backcrossing by incorporation of intercrossing into the program was effective in slowing down the allelic loss particularly when intercrossing was frequent (scheme 2 in Figure 2d), but this caused a large fluctuation in the gene frequency. The effect of the RSBI on inferring the QTL locations is discussed in the following simulation study.

Simulation study: We developed a series of computer simulation programs that offer a high degree of flexibility in mimicking the RSB or RSBI schemes specified with various mating design parameters, different genetic architectures of quantitative traits, and arbitrary linkage relationships between the marker loci and QTL. Genetic recombination between linked loci in a single meiosis was simulated by the “random walk” procedure described elsewhere (Luo and Kearsey 1992). Chiasma interference, sexual differentiation in recombination frequency, and segregation distortion were assumed to be absent in the simulation model.

As described in the above numerical analysis, we considered only one chromosome in the simulation study. There were 100 evenly distributed marker loci on the chromosome, 3 of them affecting a quantitative trait. Map distance between adjacent loci is constantly 1 cM. For a given proportion of quantitative genetic variation explained by the 3 QTL ($h^2$), three models under which $h^2$ was resolved into genetic effect of QTL were considered to investigate robustness of the mapping strategy to various QTL effect models. These include (i) the additive equal effect (AEE) model, under which each allele increasing the trait delivered an additive contribution of $d = [2h^2/3(1 - h^2)]^{1/2}$ to the trait phenotype; (ii) the additive unequal effect (AUE) model, under which genetic effects of the increasing alleles at the three QTL are, respectively, d, d/2, and d/4 with $d = [32h^2/21(1 - h^2)]^{1/2}$; and (iii) the epistatic effect model (EEM), under which any individual carrying k (≤2) increasing alleles would have a genotypic effect of $[2kh^2/3(1 - h^2)]^{1/2}$, but the genotypic effect was 2.0 when it carried all 3 increasing alleles (i.e., k = 3). The individual phenotype was determined by its genotypic effect plus a number that was randomly sampled from a standard normal distribution.
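The two additive formulas can be evaluated and cross-checked numerically. The sketch below assumes, as an inference from the formulas rather than a statement in the text, that each QTL with effect $d_i$ contributes genetic variance $d_i^2/2$ (the additive variance of an F2 with equal allele frequencies) and that the residual variance is 1.0; under that assumption both expressions for d return the specified $h^2$. The function names are hypothetical.

```python
from math import sqrt


def aee_effect(h2):
    """Per-QTL effect d under the additive equal effect (AEE) model."""
    return sqrt(2.0 * h2 / (3.0 * (1.0 - h2)))


def aue_effects(h2):
    """Effects (d, d/2, d/4) under the additive unequal effect (AUE) model."""
    d = sqrt(32.0 * h2 / (21.0 * (1.0 - h2)))
    return (d, d / 2.0, d / 4.0)


def implied_h2(effects):
    """Heritability implied by the effects, assuming variance d^2 / 2 per QTL
    and unit residual variance (an assumption of this sketch)."""
    genetic_variance = sum(d * d for d in effects) / 2.0
    return genetic_variance / (genetic_variance + 1.0)


if __name__ == "__main__":
    for h2 in (0.15, 0.5):
        print(h2, aee_effect(h2), aue_effects(h2))
        print("  implied h2 (AEE):", implied_h2((aee_effect(h2),) * 3))
        print("  implied h2 (AUE):", implied_h2(aue_effects(h2)))
```

At $h^2$ = 0.5 the largest AUE effect comes out at about 1.23 residual standard deviations.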

The breeding program was initiated with F backcross families obtained from two homozygous inbred parental lines, which were fixed with different alleles at the 100 loci. After the start of the breeding program, three mating strategies might be performed: (i) RSB, (ii) recurrent selection and either intercrossing or backcrossing in every Δt generation(s) (RSBI$_{\Delta t}$), and (iii) recurrent selection and either intercrossing or backcrossing whenever the family mean at generation t was not lower than that at generation 1 (RSBI$_p$). Intercrossing was simulated as random mating among the selected individuals. This includes the possibility of selfing of some selected individuals, but this would not seriously influence our results. The parameters defining the 16 simulated schemes are listed in Table 1. Each of the simulated schemes was repeated 100 times unless otherwise stated.
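A stripped-down version of the RSB strategy, restricted to one family and one QTL without markers, can be simulated in a few lines. This is only a sketch of the selection and backcross cycle, not the simulation program used in the study: the function name, the default parameter values, and the coding of the genotypic value (Aa = `effect`, aa = 0, plus standard normal noise) are assumptions made for illustration.

```python
import random


def simulate_rsb_single_qtl(effect, N=50, n=5, T=20, seed=0):
    """Simulate T generations of recurrent selection and backcrossing in one
    family at a single QTL. Returns the proportion of Aa individuals among
    the n selected individuals in each generation."""
    rng = random.Random(seed)
    parents = [1] * n  # 1 = Aa, 0 = aa; the F1 parents are all Aa
    history = []
    for _ in range(T):
        family = []
        for _ in range(N):
            parent = rng.choice(parents)
            # An Aa parent backcrossed to aa transmits A with probability 1/2.
            genotype = 1 if (parent == 1 and rng.random() < 0.5) else 0
            phenotype = effect * genotype + rng.gauss(0.0, 1.0)
            family.append((phenotype, genotype))
        family.sort(reverse=True)          # truncation selection:
        parents = [g for _, g in family[:n]]  # keep the n top phenotypes
        history.append(sum(parents) / n)
    return history


if __name__ == "__main__":
    print(simulate_rsb_single_qtl(effect=1.0))
    print(simulate_rsb_single_qtl(effect=0.25))
```

Averaging such trajectories over F families and many replicates gives the kind of mean and standard deviation of QTL heterozygosity that the simulation study reports; an intercross generation could be added by drawing both parents from the selected group.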

Means and standard deviations of heterozygosity maintained at the QTL were calculated from the repeated simulation trials for all these simulated schemes and are listed in Table 2. It can be seen from the table that there was a general trend in loss of heterozygosity at the QTL as the breeding schemes evolved (from $T_1$ to $T_3$). However, the rate of loss of the heterozygosity was influenced by almost all parameters defining the breeding program. Heritability was a dominant factor influencing the RSB schemes. Given the other parameters, the selected alleles at the QTL, which contributed 50% of phenotypic variance, were maintained at an unchanged frequency after 50 generations of RSB (scheme 1), whereas the alleles had almost completely vanished after 30 generations of RSB if the QTL explained only 15% of the phenotypic variance (scheme 15). However, incorporation of intercrossing in the RSB breeding schemes was effective in reducing loss of the increasing alleles at the QTL (scheme 16). Comparison of heterozygosity at the QTL between schemes 15 and 16 revealed that use of family phenotypic mean was an effective way to determine the switch between backcrossing and intercrossing during the breeding program such that the selected alleles may be effectively protected from being lost. As has been shown in the previous numerical analysis, the genetic model of the QTL effect influenced the heterozygosity loss remarkably. Epistasis in the QTL effects dramatically slowed down the loss of the QTL heterozygosity (scheme 9) compared to the corresponding additive models (schemes 7 and 8). A higher level of heterozygosity was maintained at more closely linked QTL (scheme 9) than at the less closely linked QTL (scheme 5).
For a given experimental size (N × F), the scheme with larger family size but smaller number of families (scheme 4) was more effective than the scheme with smaller family size but larger number of families (scheme 1) for maintenance of the heterozygosity at the selected loci.

Locations of the QTL were inferred in the simulation study as the locations of the marker loci at which the heterozygosity curve at generation 30 reached its peak values, and the accuracy in locating the QTL was evaluated as the percentage of repeated simulations in which the peaks of that curve occurred at the simulated QTL locations. Table 3 lists the percentages of correctly located QTL and the means and standard deviations of the inferred QTL locations. A major factor determining the accuracy of QTL mapping was the size of the QTL genetic effect. The QTL with an effect as large as 1.23 units of the residual standard deviation was located correctly in 87% of the repeated simulation trials, but the accuracy dropped to only 15% if the effect was quartered (scheme 17). As the trait had a very low heritability, increasing alleles at the QTL were almost completely lost after 30 generations of RSB (refer to scheme 15 in Table 2) and the QTL in this scheme was poorly mapped. However, the loss in mapping accuracy was substantially recovered in scheme 16, in which the RSBIp was performed. The actual map distance between linked QTL showed an obvious influence on their mapping accuracy: The closer the QTL were linked, the poorer their mapping accuracy (scheme 1 vs. 3). For a given experimental size, the QTL in the scheme with a larger family size but smaller number of families (scheme 4) were located more accurately than those with a smaller family size but larger number of families (scheme 1). Comparison between schemes 5 and 6 showed that stronger selection resulted in worse rather than better precision in mapping the QTL.
Epistasis in the QTL genetic effects was preferred in the RSB schemes for a better maintenance of genetic heterozygosity at the loci (Table 2), but it hindered rather than improved accumulation of recombination between the QTL and between them and the nearby marker loci and thus resulted in a reduced accuracy.
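
The peak-based inference and the accuracy summary described above can be sketched as follows. This is a minimal illustration rather than the simulation code used in the study: the function name `locate_qtl`, the array layout (one row per simulation replicate, one column per marker), the single-QTL simplification, and the toy values are all assumptions made here for clarity.

```python
import numpy as np

def locate_qtl(het_curves, true_index, spacing_cm=1.0):
    """Summarize peak-based QTL localization over simulation replicates.

    het_curves : (replicates, markers) array of marker heterozygosity at the
                 generation examined (hypothetical layout).
    true_index : marker index of the simulated QTL (single QTL assumed).
    spacing_cm : map distance between adjacent markers, in cM.
    """
    het_curves = np.asarray(het_curves, dtype=float)
    # Inferred location per replicate: marker at which heterozygosity peaks
    # (argmax returns the first marker if several share the maximum).
    peaks = het_curves.argmax(axis=1)
    percent_correct = 100.0 * np.mean(peaks == true_index)
    positions = peaks * spacing_cm  # cM measured from the first marker
    return percent_correct, positions.mean(), positions.std(ddof=1)

# Toy data (invented): 4 replicates, 5 markers, simulated QTL at marker 2.
toy = [[0.1, 0.3, 0.6, 0.2, 0.0],
       [0.0, 0.2, 0.5, 0.4, 0.1],
       [0.1, 0.5, 0.4, 0.2, 0.0],
       [0.0, 0.1, 0.7, 0.3, 0.1]]
pct, mean_pos, sd_pos = locate_qtl(toy, true_index=2)
print(pct, mean_pos, sd_pos)
```

In the toy data, three of the four replicates peak at the simulated locus, so the percentage of correct locations is 75% and the mean inferred position is 1.75 cM from the first marker.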

TABLE 2

Means and standard deviations (in parentheses) of heterozygosity maintained at the QTL at different generation times in the simulated breeding schemes

Although there was variation in the percentage of correct identification of the QTL locations over the various schemes considered in the simulation, the QTL in all these schemes were, on average, mapped to locations that were not significantly different from their actual map locations. The standard deviations of the estimated QTL locations were in the range of 0.49–4.14 cM, and the change in the standard deviation among the different breeding schemes was consistent with change in the percentage of correct QTL locations inferred.

The above discussion evaluated the precision of QTL mapping under the RSB or RSBI schemes. It is also important to examine the resolution of QTL mapping by use of the marker heterozygosity distribution. An ad hoc measure of resolution for an estimated QTL location at the $i$th marker locus can be $\lambda_i = H_i - (H_{i-1} + H_{i+1})/2$, where $H_i$ denotes the heterozygosity of the $i$th marker locus. Means and standard errors of the resolution estimates over repeated simulations of all the schemes are tabulated in Table 4. All parameters considered here influenced the mapping resolution. As expected, there was a trend of improvement in the mapping resolution as the breeding schemes progressed, provided a substantial amount of heterozygosity was maintained at the stage when the QTL locations were examined. When the other parameters were fixed, the trait heritability played an important role in determining the mapping resolution. The higher the heritability, the better the QTL was resolved. In contrast to its positive effect on maintenance of heterozygosity, epistasis in the QTL genetic effect showed a negative influence on the mapping resolution of the QTL due to a reduced number of recombinants between the linked QTL and between them and their nearby marker loci during the selection and backcrossing process. This effect was more obvious with the QTL surrounded by other QTL. Population designs with smaller family size but larger number of families (i.e., scheme 1) were more effective in achieving a better mapping resolution. The better resolution observed in such designs may be explained by the fact that with larger numbers of families there is a better chance of maintaining a wider range of different recombinants between the linked QTL themselves and between the QTL and the markers. Selection intensity was less important for mapping resolution.
Comparison of the mapping resolution between the RSB schemes and the RSBI schemes showed that reducing the number of backcrossing generations and allowing intercrossing in an RSB scheme reduced the mapping resolution. More frequent intercrossing tended to worsen the mapping resolution, provided that frequent backcrossing had not driven the increasing allele at the QTL to a very low frequency.
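
The ad hoc resolution measure defined above, the heterozygosity at the candidate marker minus the average heterozygosity of its two flanking markers, can be computed as in the following sketch. The function name `resolution` and the two toy heterozygosity profiles are hypothetical and serve only to show how a sharp peak scores higher than a broad one.

```python
import numpy as np

def resolution(het, i):
    """Ad hoc resolution at marker i: H_i - (H_{i-1} + H_{i+1}) / 2."""
    het = np.asarray(het, dtype=float)
    if i <= 0 or i >= len(het) - 1:
        # The measure needs a flanking marker on each side.
        raise ValueError("marker i must have a neighbor on both sides")
    return het[i] - 0.5 * (het[i - 1] + het[i + 1])

# Invented profiles with the same peak height at marker 2.
sharp = [0.0, 0.1, 0.8, 0.1, 0.0]  # heterozygosity drops quickly off the peak
broad = [0.5, 0.7, 0.8, 0.7, 0.5]  # heterozygosity decays slowly off the peak
lam_sharp = resolution(sharp, 2)   # about 0.7
lam_broad = resolution(broad, 2)   # about 0.1
print(lam_sharp, lam_broad)
```

Both profiles reach the same peak heterozygosity, but the measure is larger for the sharp profile, which is the sense in which a larger value indicates a better-resolved QTL.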

TABLE 3

The percentage of correctly inferred QTL locations and means and standard deviations (in parentheses) of estimated QTL map locations

Comparison of the RSB schemes to the interval mapping-based methods: Interval mapping and its later extended versions have been the most popular methods for QTL mapping in humans, plants, and animals. This section compares the RSB-based QTL mapping approach with the two most frequently cited interval mapping methods in the literature, interval mapping (IM; Lander and Botstein 1989) and composite interval mapping (CIM; Zeng 1994), under the constraint of a constant capacity of genotyping 1000 individuals for 100 linked and evenly spaced marker loci. Of the marker loci, 3 were assumed to be QTL, which together explained 30% of the phenotypic variance of a quantitative trait. The map distance between any pair of adjacent marker loci was 1 cM (∼1% recombination frequency). For the IM and CIM analyses, a backcross family with 1000 individuals was generated and analyzed by use of QTL Cartographer (Basten et al. 1994), the computer software for carrying out the IM and CIM analyses. The RSB breeding schemes were simulated with 20 independent families and 50 individuals for each of these families, yielding the same population size as the interval mapping analyses. Selection in the RSB program was for 50 generations at a constant intensity of 10%. It has already been pointed out in the above simulation study that both the mean and the variance of the marker heterozygosity are informative about the locations of the marker loci relative to the selected QTL. To combine the information from these two statistics, a measure of QTL presence was calculated as $\tau_{it} = H_{it}/V_{it}$ when $V_{it} > 0$ and 0 otherwise, where $H_{it}$ and $V_{it}$ are, respectively, the mean and variance of the $i$th marker heterozygosity at generation $t$.
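
A minimal sketch of the measure of QTL presence is given below. It reads the measure as the ratio of the mean to the variance of marker heterozygosity, which is the interpretation suggested by the guard against a zero variance; that reading, the choice to take the mean and variance across families, the use of the sample variance, the function name `qtl_presence`, and the toy values are all assumptions of this sketch rather than details confirmed by the text.

```python
import numpy as np

def qtl_presence(het_by_family):
    """Mean/variance measure of QTL presence at each marker.

    het_by_family : (families, markers) array of marker heterozygosity at one
                    generation (hypothetical layout).
    Returns mean / variance per marker, and 0 where the variance is 0.
    """
    h = np.asarray(het_by_family, dtype=float)
    mean = h.mean(axis=0)
    var = h.var(axis=0, ddof=1)  # sample variance across families (assumed)
    tau = np.zeros_like(mean)
    # Divide only where the variance is positive; other entries stay at 0.
    np.divide(mean, var, out=tau, where=var > 0)
    return tau

# Invented data: 3 families, 3 markers. Marker 0 is fixed in every family,
# marker 1 keeps high heterozygosity in all families, marker 2 is drifting.
toy = [[0.0, 0.9, 0.2],
       [0.0, 1.0, 0.6],
       [0.0, 0.8, 0.1]]
tau = qtl_presence(toy)
print(tau, int(tau.argmax()))
```

Under this reading, a marker that stays heterozygous in every family has a high mean and a low variance and therefore stands out, whereas a marker fixed in all families has zero variance and is assigned a value of 0.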

Figure 3 shows the distribution of the likelihood-ratio test statistics from the IM and CIM analyses and the distribution of the measure of QTL presence in the RSB schemes. In the RSB schemes, the QTL locations were accurately and unambiguously identified as the chromosome locations at which $\tau_{it}$ reached its peak value. In addition, the method was very robust to various models of the QTL effects on the trait. The presence of the QTL on the simulated chromosome was strongly evident from the interval mapping methods but, in sharp contrast, these methods did not provide clear-cut inference of the locations of the QTL. The interval mapping predictions of the QTL locations worsened when there was epistasis in the QTL effects.

TABLE 4

Means and standard errors (in parentheses) of the resolution for the QTL mapping at different generation times in the simulated breeding schemes

DISCUSSION

The present article develops a theoretical framework for predicting the mean and variance of heterozygosity maintained at marker loci linked to one or two QTL for any number of generations using the recurrent selection and backcross schemes previously proposed by Wright (1952) and studied by Hill (1998). The theoretical prediction takes appropriate account of the dynamic change in linkage disequilibria between the QTL themselves and between the QTL and the marker loci due to selection, recombination, and genetic drift during the breeding program. In principle, it is tractable to extend the analysis to more than the three loci modeled here because the distribution of multiple loci linkage disequilibria under the present setting is equivalent to that under a multiple loci haplotype model of linkage disequilibria. Nevertheless, numerical evaluation of the multiple loci system will be computationally very demanding when the experimental size is large. More complicated models were investigated in the simulation study. The analyses demonstrated that distributions of mean and variance of heterozygosity at different marker loci in the RSB breeding program provide sufficient information regarding their relative locations to the QTL under selection in the program and regarding the evolutionary driving factors behind the breeding populations. Appropriate use of these statistics may provide a simple but efficient alternative approach for mapping complex quantitative genetic variation at a substantially improved precision and resolution. The major features of the RSB-based QTL mapping schemes can be summarized as follows:

  1. The mapping strategy is powerful in identifying the polymorphic sites, which are in close linkage (i.e., 1 or 2 cM) to the QTL. Use of the dense marker maps that include the genomic polymorphisms within the QTL (cSNP, for instance) enables precise identification of the map locations of the QTL. In general, the error in the inference of the QTL locations will not be beyond one or two times the coverage density of the marker maps.

  2. Maintenance of the selected genes at a nontrivial frequency is a prerequisite for achieving both precision and resolution in inferring their map locations under the RSB framework. For QTL with large genetic effects on the trait phenotype, selection on the trait is usually efficient enough to counterbalance dilution of the recipient genome regions around the selected loci over repeated backcrossing and genetic drift that causes loss of the selected genes over the breeding process. Selection in the RSB schemes is usually not able to prevent QTL with small effects from quick gene fixation. Instead of repeatedly backcrossing the selected individuals to the recurrent parental line, incorporation of intercrossing among the selected individuals at some stages of the breeding program is effective in maintaining the selected QTL alleles for long enough to break down linkage disequilibrium between the QTL and the nearby marker loci so that the QTL may be isolated from their closely linked genome regions.

  3. Many factors that can be managed by experimentalists play an important role in determining success of the RSB-based approach in the QTL mapping. The number of generations of the breeding program affects the degree of breakage in linkage disequilibria between the QTL and between them and their nearby marker loci. A prerequisite for precision and high-resolution mapping of the QTL using this program is sufficient breakage of the linkage disequilibria. In principle, the theory developed in this study can easily be extended to provide the estimate of the number of generations required to reach independent segregation between the linked QTL and between the QTL and their linked marker loci. After establishment of linkage equilibrium between these loci, selection will maintain a slow change in gene frequency at the QTL and, at the same time, genetic drift will drive the marker gene to be fixed very quickly, yielding a clear scenario of the QTL locations as shown in Figure 3. For a given size of experiment and whenever possible, experimental designs with a larger family size but smaller number of families are superior in reducing the effect of genetic drift and thus maintaining higher allele frequencies at the selected loci, but may be inferior in obtaining a better mapping resolution of the QTL in comparison to the designs with smaller family size but a larger number of families.

  4. The RSB-based QTL mapping approach is quite robust to various models of genetic effects of QTL even though there is a difference in the mapping resolution for the QTL affecting quantitative traits under different models of genetic effects. Positive epistasis in the QTL effects enhanced effectiveness of their selection. This resulted in two consequences: an increased heterozygosity at the QTL but a decreased resolution in their mapping when compared to the corresponding additive model.

  5. The QTL mapping approach discussed above has several advantages over the interval mapping method and its extended versions. It takes full advantage of a dense marker map and uses accumulated recombinations between the QTL and their nearby marker loci. This yields precise identification of the QTL locations or accurate inference of their locations at a mapping resolution as close as 1 cM or less. In contrast, there is an upper limit in marker density beyond which the efficiency of the interval mapping methods will not be improved further (Lander and Schork 1994). In addition, the informative meioses are practically limited in the segregating populations for which the interval mapping methods were developed. Together, these limitations leave the QTL locations inferred by the interval mapping methods far less satisfactory than the RSB mapping results when judged by the same criterion (Figure 3). The RSB mapping approach is robust to errors in estimation of map distances between the marker loci whereas methods based on interval mapping are sensitive to these errors (Luo and Kearsey 1992). The practical implementation of the RSB mapping schemes does not need any complicated statistical modeling of the experimental data. Given that many years of considerable research efforts to isolate genes affecting complex traits have resulted in slow progress, we would not consider the long duration of the RSB breeding program to be an expensive investment for significant improvement in mapping precision and resolution in the QTL locations that may lead directly to cloning of QTL. It may not be strictly appropriate to compare the multiple generation approach of RSB-based QTL mapping to interval mapping analysis, which is based on the populations of a single segregating generation.
Among the several multiple generation approaches suggested in the literature, use of the advanced intercross line (AIL) has been shown theoretically to have potential for improving the resolution of QTL localization (Darvasi and Soller 1995; Xiong and Guo 1997). These studies consistently revealed diminishing returns in resolution after eight generations of continued intercrossing in an AIL scheme. This property of the AIL approach is in sharp contrast to that of the RSB scheme, in which the mapping resolution has been observed in our study to increase steadily as the breeding scheme progresses until complete breakage of linkage disequilibrium between the QTL and their linked marker loci. However, a direct and comprehensive comparison between these two schemes requires that an appropriate statistical method be established to analyze experimental data from these breeding schemes.
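
Point 3 above notes that the number of generations governs how far linkage disequilibrium between a QTL and its nearby markers is broken down. A rough feel for this can be obtained from the following idealized sketch, which is not the theory developed in this article. It assumes that the donor allele at the QTL is retained in a heterozygous carrier in every backcross generation, ignores drift and intercrossing, and converts map distance to recombination fraction with Haldane's map function, which is an assumption made here since the map function used in the simulations is not stated. Under those assumptions the chance that the donor allele at a marker with recombination fraction r is still carried along after t generations is (1 - r)^t. The function names are hypothetical.

```python
import math

def haldane_r(d_cm):
    """Recombination fraction for a map distance in cM (Haldane's function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def expected_marker_heterozygosity(d_cm, generations):
    """Chance that the donor allele at a marker d_cm from the QTL is still
    linked to the retained QTL allele after the given number of backcross
    generations (idealized: perfect retention of the QTL allele, no drift)."""
    return (1.0 - haldane_r(d_cm)) ** generations

# Print the expected retention at several distances and generation counts.
for d in (1, 5, 10, 20):
    row = [round(expected_marker_heterozygosity(d, t), 3) for t in (10, 30, 50)]
    print(f"{d:>2} cM:", row)
```

The printed rows show retention falling with both distance and generation number, which illustrates why long breeding programs combined with dense marker maps sharpen the heterozygosity peak around a selected locus.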

Figure 3. Comparison of the RSB schemes to interval mapping (IM) and composite interval mapping (CIM) of a trait controlled by three QTL, which explained 30% of phenotypic variation under different models of their genetic effects and varying map distances between linked QTL: (a) additive equal QTL effect and 20 cM between adjacent QTL, (b) additive unequal QTL effect and 20 cM between adjacent QTL, (c) epistatic QTL effect and 20 cM between adjacent QTL, and (d) epistatic QTL effect and 10 cM between adjacent QTL. The red arrow indicates the simulated location of the QTL.

Selection in the RSB breeding program is based entirely on the trait phenotype. Appropriate use of marker information will be an effective way of improving the efficiency of selection for the QTL with small effects (Luo et al. 1997). The marker-assisted RSB mapping will enable not only the QTL of large effects but also those with small effects to be maintained during the breeding procedure, and thus the complete genetic architecture underlying the polygenic variation will, in principle, be uncovered. Other useful information that can be extracted during the breeding program is the change in pattern of association between the different recombinant marker genotypes and the trait phenotype. A careful assay of the different marker recombinants for their trait phenotype enables direct assessment of whether or not a specific recombinant is informative for resolving the QTL from their nearby marker loci (Mackay 2001). Taking this information into account in selection allows an efficient accumulation of these informative recombinants and thus further improvement in resolution of the QTL mapping. Although this study serves as only an initial step in illuminating the potential of appropriate use of genetic tools in dissecting complex genetic variation at a molecular level, the analyses and results demonstrated in this article suggest a potentially better future for map-based cloning of genes affecting most evolutionarily, agronomically, and medically important complex traits, as has been done successfully in Frary et al. (2000).

Acknowledgments

We thank Professor W. G. Hill for his discussion and encouragement to the research presented here and Dr. Zhao-Bang Zeng for his advice on using the QTL cartographer software. This work was initiated when Z.W.L. was supported by a collaboration grant from the University of Chicago. Z.W.L. is supported by a start-up research grant from the School of Biosciences and a grant from the Medical Research Committee of the University of Birmingham, by China's Basic Research Program “973”, the National Science Foundation of China, the QiuShi Foundation, and the Changjiang Scholarship. C.-IW. is supported by National Institutes of Health and National Science Foundation grants. M.J.K. is supported by research grants from the United Kingdom Biotechnology and Biological Sciences Research Council.

Footnotes

  • Communicating editor: Y.-X. Fu

  • Received October 10, 2001.
  • Accepted March 13, 2002.

LITERATURE CITED
