Abstract
Sewall Wright suggested that genes of large effect on a quantitative trait could be isolated by recurrent backcrossing with selection on the trait. Loci [quantitative trait loci (QTL)] at which the recurrent and nonrecurrent lines have genes of different large effect on the trait would remain segregating, while other loci would become fixed for the gene carried by the recurrent parent. If the recurrent line is inbred and the backcrossing and selection are conducted in a series of replicate lines, in each of which only one backcross parent is selected for each generation, the lines will become congenic to the recurrent parent except for the QTL of large effect and closely linked regions of the genome, and these regions can be identified using a dense set of markers that differ between the parental lines. Such lines would be particularly valuable for subsequent fine-scale mapping and gene cloning; but by chance, even QTL of large effect will be lost from some lines. The probability that a QTL of specified effect remains segregating is computed as a function of its effect on the trait, the intensity of selection, and the number of generations of backcrossing. Analytical formulas are given for one or two loci, and simulation is used for more. It is shown that the method could have substantial discriminating ability and thus potential practical value.
With modern molecular methods, it is becoming possible to map quantitative trait loci (QTL), those regions of the genome that affect traits with polygenic expression. The precision with which the QTL can be located depends on the populations available, as well as the size and design of the experiment and the statistical methods adopted. To understand the genetic control of a trait, the mapping of QTL is only the start; it is subsequently necessary to identify, if possible, whether a QTL actually comprises a single genetic locus and then the actual gene(s) involved. This has not yet been accomplished for a continuous trait, although Alpert and Tanksley (1996) obtained a clone containing a QTL in tomato. Precise mapping and cloning require well-defined stocks. Multigeneration backcross lines, more specifically congenic lines, for which all the genome except the region of interest comes from an inbred line, are likely to be of particular value for QTL identification (Démant and Hart 1986; Tanksley and Nelson 1996). Sets of congenic lines can be used to obtain narrow intervals for a QTL, providing its effect is sufficiently large that genotypes can be assigned accurately (Darvasi 1997).
While it is desirable to isolate QTL with both small and large effects, in practice the former cannot be mapped with sufficient precision to be useful for subsequent location and cloning; in any case, more information about biological processes is likely to come from a study of those with large effect. A method that can be used to isolate genes (i.e., QTL) of large effect was proposed long ago by Wright (1952). He suggested that recurrent backcrossing be practiced in which the nonrecurrent line is, say, of high performance and the recurrent line is of low performance, with the parents of the backcross individuals selected each generation for high performance of the trait of interest. Backcrossing leads to a halving of frequency each generation of genes that do not affect the trait and are not linked to those that do. Genes with large effect on the trait are, however, more likely to be present in the selected individuals, so their expected frequency is >0.5 and the backcross line should eventually remain segregating only for genes of large effect. In simple terms, a gene would need to confer a twofold higher fitness to survive indefinitely in a large backcross population. In practice, lines are finite in size, so even genes with very large effect may be lost, albeit slowly. While it may be feasible to maintain one large backcross line with several parents each generation, this provides only one replicate and requires that selection be practiced between and within families, so between-family environmental covariances, important in species such as mice, increase the errors in selection.
The recurrent backcrossing enables all but very closely linked markers and QTL to recombine, thereby facilitating detailed mapping using the dense maps now available, for example, in the mouse (Dietrich et al. 1996). The method is likely to be of most use in species such as Drosophila, mice, and Arabidopsis, which have a short generation interval and for which inbred lines are available. It has been proposed by Snell (1958) as a means of identifying histocompatibility genes, and it has been used by Beebe et al. (1997) to identify loci associated with resistance to Leishmania major in mice. The principles and methods, particularly for detection of loci associated with a disease with all-or-none expression, have been investigated by N. J. Schork, A. M. Beebe, B. Thiel, P. St. Jean, and R. L. Coffman (unpublished data), in which the scheme generally involves taking sublines from individuals who show resistance so that a pedigree of resistant individuals is built up. The method is then analogous to that of identity-by-descent (ibd) mapping (Houwen et al. 1994; Guo 1995; Charlier et al. 1996), in which regions of chromosomes in related individuals with extreme phenotypes for a quantitative trait carrying putative QTL are identified by a genomewide scan, and for which relevant theory has been developed (Thomas et al. 1994; N. J. Schork, A. M. Beebe, B. Thiel, P. St. Jean, and R. L. Coffman, unpublished data). The use of backcrossing with selection to identify QTL by their linkage to markers has similarities to the selection scheme practiced by G. Bulfield from an inbred cross in which changes in marker frequency are monitored in replicate lines to infer QTL position (Keightley and Bulfield 1993; Keightley et al. 1996; Ollivier et al. 1997), but differs in that the backcross selected lines are congenics and immediately useful for precision mapping and gene cloning.
A formal backcrossing scheme, which has the particular benefit that it leads to the production of several independent lines that are congenic for small but different parts of the genome, is to maintain a series of separate single-family lines during the backcrossing, with each family maintained by only one selected parent each generation. The presence of QTL of large effect is then detected by undertaking a genome-wide scan of molecular markers only after several generations of backcrossing and only on the one selected individual of each line, and by identifying regions that have remained segregating in many of the families. Subsequently, these regions that remain segregating can be investigated more finely by further developing lines by continued backcrossing, but now maintaining segregating the marker or flanking markers that are identified as close to QTL in the initial screen and by progeny testing within families to identify the marker-associated effect.
In this paper, the properties of the method of backcrossing with selection to develop such independent congenic lines are examined, for example, in terms of the probability that QTL remain segregating as a function of the size of its effect on the trait, the number of generations of backcrossing, and its linkage to other QTL. Ways to analyze and interpret the data are considered.
ANALYSIS
It is assumed that backcrossing is practiced recurrently to a completely inbred line. The nonrecurrent and recurrent lines are assumed to be homozygous for different alleles at QTL of interest. A total of M families (lines) are maintained independently. In each generation, in each backcross family, n individuals are recorded for the quantitative trait, and the highest scoring individual is selected and backcrossed again to the recurrent line. Backcrossing and selection are continued for t generations, F1 being generation 0. Variation that is not explained by the QTL, because of within-family environmental and residual genetic deviations in the trait (likely to be significant only in early backcross generations) is assumed to be normally distributed, N(0,1). Effects, a, of QTL are expressed in units of this within-family (mainly environmental) standard deviation.
Single locus: This simple example serves as a reference for others. It is assumed that a QTL with allele A from the nonrecurrent and A′ from the recurrent parent confers an increase in the heterozygote of a SD on the trait, and that it is unlinked to any other QTL affecting the trait. It is necessary to compute the probability P(n,a) that the offspring of generation t + 1 selected from a heterozygous backcross parent of generation t is itself heterozygous for the locus. This probability can be readily computed using binomial probabilities for the number k of heterozygous offspring among the n recorded and order statistics for the probability that the highest scoring offspring is heterozygous. For example, if one of the k heterozygotes has phenotypic value x for the trait with probability ϕ(x)dx and is the highest in the family, it implies that k − 1 heterozygotes and n − k homozygotes have performance less than x, the probabilities for each individual being Φ(x) and Φ(x + a), respectively, where ϕ(x) and Φ(x) denote the density and cumulative distribution functions, respectively, of the standardized normal distribution. Then, using the method of Hill (1969), the probability that an AA′ heterozygote is selected is as follows:
$$P(n,a) = \sum_{k=1}^{n} \binom{n}{k} \left(\tfrac{1}{2}\right)^{n} k \int_{-\infty}^{\infty} \phi(x)\,[\Phi(x)]^{k-1}\,[\Phi(x+a)]^{n-k}\,dx \qquad (1)$$
Table 1. Probability P(n,a) that a heterozygote with a QTL of effect a SD units is selected from the offspring of a backcross mating, with selection of the best one from n.
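These values can be approximated, for small values of a, by
$$P(n,a) \approx \tfrac{1}{2}\left(1 + \tfrac{1}{2}\,i\,a\right) \qquad (2)$$
where i is the intensity of selection, i.e., the expected standardized deviation of the best of n individuals.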
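The single-locus calculation can be sketched numerically as follows. This is a minimal illustration, not the original computation: it evaluates the binomial and order-statistic sum described above by Simpson's rule, and compares it with the small-effect approximation (1 + ia/2)/2 implied by the estimator given later in the paper. The function names (`p_select_het`, `selection_intensity`, `integrate`) and the integration limits are assumptions made for this sketch, as is taking i to be the expected value of the largest of n standard normal deviates.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def integrate(f, lo=-10.0, hi=10.0, steps=4000):
    """Composite Simpson's rule on [lo, hi]; steps must be even."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * f(lo + j * h)
    return total * h / 3.0

def p_select_het(n, a):
    """P(n,a): probability that the best of n backcross offspring is heterozygous.

    Sums over k heterozygotes among the n offspring; x is measured as a
    deviation from the heterozygote mean, so homozygotes fall below x with
    probability Phi(x + a).
    """
    total = 0.0
    for k in range(1, n + 1):
        weight = math.comb(n, k) * 0.5 ** n * k
        total += weight * integrate(
            lambda x: norm_pdf(x) * norm_cdf(x) ** (k - 1) * norm_cdf(x + a) ** (n - k)
        )
    return total

def selection_intensity(n):
    """Assumed definition of i: expected largest of n standard normal deviates."""
    return integrate(lambda x: n * x * norm_pdf(x) * norm_cdf(x) ** (n - 1))

def p_select_het_approx(n, a):
    """Small-effect approximation (1 + i*a/2)/2."""
    return 0.5 * (1.0 + selection_intensity(n) * a / 2.0)

for a in (0.0, 0.25, 0.5, 1.0):
    print(a, round(p_select_het(4, a), 3), round(p_select_het_approx(4, a), 3))
```

For n = 4 and a = 1 the exact sum gives the value 0.735 quoted in the text, while the linear approximation is somewhat higher, as expected once a is no longer small.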
Repeated backcrossing: In the first few generations of backcrossing, the variance is likely to be inflated by segregation at other loci, so that the accuracy of selection and probabilities of retention will be lower than those shown in Table 1, where the effect of the gene is expressed in terms of the within-family environmental SD. This problem is visited again later, but first, let us assume for simplicity that the (unit) within-family variance remains constant. The probability P(n,a)t that the QTL remains segregating for t generations in a line maintained with one parent and the same selection intensity each generation is therefore the t-th power of the single-generation probability:
$$[P(n,a)]^{t} \qquad (3)$$
Figure 1. Probability that a QTL of effect a remains segregating for t generations of backcrossing when the best individual of n is selected each generation: P(n,a)t plotted against t for a = 0, 0.5, 1, 1.5, and 2, and n = 4 and 12.
Some examples using Equation 3 are given in Figure 1. The main point is that the difference between the probability of continued segregation of a QTL of large effect and that of a QTL of small effect becomes wider as more generations of backcrossing are undertaken and more intense selection is practiced. Of course, if backcrossing is continued too long, even those of very large effect are lost. In principle, there is some intermediate optimum time for discriminating among QTL of specified effects, if such can be defined.
Since a number of independent replicate lines can be kept, it is useful to reconsider these results in terms of the distribution of the number of lines in which a QTL of specified effect would be segregating. If M lines are maintained, then the expected number in which there is segregation at generation t is MP(n,a)t. The actual number segregating, m, has a binomial distribution, but the Poisson distribution gives an adequate approximation and results are more readily generalized. Hence, we assume that m has a Poisson distribution with parameter MP(n,a)t. Some examples are given in Table 2, for experiments in which 10 or 20 lines are maintained. For example, if 10 lines are maintained with selection of one from n = 4 for t = 4 generations, a QTL with effect of 0.25 SD has a probability of <2% of remaining segregating in four or more lines, whereas a QTL with effect of 1.5 SD or more has a <1% chance of being lost from all 10 lines and a 66% chance of remaining segregating in four or more lines.
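The Poisson calculation for the number of segregating lines can be illustrated with a short sketch. It takes the single-generation probability as an input rather than recomputing it; the value 0.735 used below is the P(4,1) quoted elsewhere in the text, and the function names are hypothetical.

```python
import math

def retention_probability(p_one_generation, t):
    """Probability of remaining segregating for t generations (power of the per-generation value)."""
    return p_one_generation ** t

def poisson_pmf(m, lam):
    return math.exp(-lam) * lam ** m / math.factorial(m)

def prob_at_least(m_min, lam):
    """P(m >= m_min) for a Poisson variable with parameter lam."""
    return 1.0 - sum(poisson_pmf(m, lam) for m in range(m_min))

M, t = 10, 4
lam_qtl = M * retention_probability(0.735, t)      # QTL with a = 1, n = 4
lam_neutral = M * retention_probability(0.5, t)    # neutral locus

for label, lam in (("QTL, a = 1", lam_qtl), ("neutral", lam_neutral)):
    print(label,
          "expected lines:", round(lam, 2),
          "P(lost from all):", round(poisson_pmf(0, lam), 3),
          "P(3 or more):", round(prob_at_least(3, lam), 3))
```

Comparing the two rows shows the discrimination the text describes: a neutral locus rarely persists in three or more of 10 lines after four generations, whereas a QTL of 1 SD frequently does.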
Table 2. Probability distribution of the number m of a total of M lines in which a QTL of effect a SD units remains segregating for t generations with selection of the best one from n each generation.
Correction for background genetic variation: With segregation at other loci than that being analyzed, there will be additional variation within families, particularly in early generations. With additive genes and genetic variation VA caused by background genetic variation in the F2, and assuming for simplicity that the background variation is caused by very many unlinked loci, each of very small effect, there will be VA/2 in the first backcross and VA/2^t in the tth backcross. There will also be variance a^2 VE/4 caused by the QTL under consideration in the first backcross, which with additive gene action implies a^2 VE/2 in the F2, where VE is the within-family environmental variance. Consider a simple case, where VA = VE and a = 1, so the total genetic variance in the F2 would be 3VE/2, and the within-family environmental plus background genetic variance in the backcross would be [1 + (½)^t]VE. Hence, the effect of the QTL in units of the within-family (environmental plus background genetic) SD would be 0.816, 0.894, 0.942, … in backcross generations t = 1, 2, 3,… For example, P(4, 0.816) = 0.698, P(4, 0.894) = 0.714, and P(4, 0.942) = 0.724, whereas P(4,1) = 0.735. Hence, the probability that the QTL remains segregating to generations 1, 2, 3, and 4 is 0.697, 0.498, 0.361, and 0.263, respectively, whereas the equivalent values assuming that P(4,1) is appropriate each generation are 0.735, 0.541, 0.397, and 0.292, respectively. It seems that the quantitative differences are rather small, and that the simple calculations are adequate unless the segregation variance, VA, is much larger than VE (i.e., heritability in the F2 considerably in excess of one-half) and if, because of the selection, it declines much more slowly than one-half per generation, i.e., there are other QTL of large effect or linked in coupling phase.
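The rescaling of the QTL effect by the inflated within-family variance can be checked with a few lines of arithmetic. The sketch below assumes, as in the worked case above, that the background variance in backcross t is VA/2^t; the per-generation probabilities are copied from the text rather than recomputed, and the function name is hypothetical.

```python
import math

def scaled_effect(a, va_over_ve, t):
    """QTL effect in units of the within-family SD in backcross generation t.

    Within-family variance is taken as VE * (1 + (VA/VE) / 2**t).
    """
    within_family_var = 1.0 + va_over_ve * 0.5 ** t
    return a / math.sqrt(within_family_var)

# Worked case from the text: VA = VE and a = 1.
effects = [scaled_effect(1.0, 1.0, t) for t in (1, 2, 3)]
print([round(e, 3) for e in effects])

# Per-generation probabilities quoted in the text for these scaled effects (n = 4).
per_generation = [0.698, 0.714, 0.724]
cumulative = []
running = 1.0
for p in per_generation:
    running *= p
    cumulative.append(running)
print([round(c, 3) for c in cumulative])
```

The cumulative products agree with the quoted retention probabilities to within rounding of the tabulated inputs.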
Two loci: The usefulness of the method depends not only on keeping QTL of large effect segregating, but also on losing those of small or negative effect that may be maintained by linkage to a QTL of large effect. Let us consider a model where there are two additive (i.e., nonepistatic) QTL, A1 and A2, on the same chromosome, with effects a1 and a2 within family standard deviations, respectively. The recombination fraction is r between the loci. In any generation where the backcross parent is a double heterozygote, A1A2/A′1A′2, there are four possible offspring genotypes selected in the next generation: the double heterozygote with both A1 and A2 present, i.e., A1A2/A′1A′2 with probability P12(n,a1,a2), or the single heterozygote with only A1, i.e., A1A′2/A′1A′2 with probability P12′(n,a1,a2), or only A2, or both lost. An extension of Equation 1 can be used. For example, the probability that the double heterozygote is selected is given by
Table 3. Probabilities of selection of each alternative genotype for a two-locus model with additive effects a1 between alleles A1 and A′1 at locus 1 and a2 between A2 and A′2 at locus 2, and recombination fraction r between the loci; selection of the best one from n.
Table 3 gives the probabilities for a series of examples in which the sum of the effects of the two loci is the same (a1 + a2 = 1), but their relative sizes differ (a1 = 0.5, 0.75, and 1.5). Results for a1 = 1 and a2 = 0 can be obtained from Table 1 by noting that P12(n,1,0) = (1 − r)P(n,1), where P(n,1) is given by Equation 1. If linkage is loose, it is seen that the extreme cases of equal effects (a1 = a2) and a2 < 0 give quite different outcomes, but as linkage becomes tight, the survival probability of the double heterozygote depends little on the relative size of effects of the two loci.
These probabilities can be approximated, providing values of ia1 and ia2 are not too large, for example:
The examples in Table 3 are for nonepistatic loci, i.e., with additive effects in heterozygotes over loci. Equation 4 changes in a straightforward way if this is not the case, and probabilities that one or both of the loci continue to segregate change correspondingly. For example, consider the case where the double heterozygote is 1 SD superior to each single heterozygote and the double homozygote. For complete linkage (r = 0), results are therefore the same as in each example in Table 3, whereas for r = 0.2, P12 = 0.6376, P12′= P1′2 = 0.0604, and P1′2′ = 0.2416, and for unlinked loci (r = 0.5), P12 = 0.4502, and P12′ = P1′2 = P1′2′ = 0.1832, i.e., single heterozygotes are less likely to be selected than in the additive case.
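The two-locus selection probabilities can be sketched with a formulation that is equivalent to the multinomial sum but simpler to code: each offspring falls into one of four genotype classes with frequencies (1 − r)/2, r/2, r/2, and (1 − r)/2, and the probability that the best of n belongs to a given class is n times the integral of that class's weighted density multiplied by the mixture distribution function raised to the power n − 1. This mixture form, the function names, and the integration limits are choices made for this sketch, not taken from the paper.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def integrate(f, lo=-10.0, hi=12.0, steps=4400):
    """Composite Simpson's rule on [lo, hi]; steps must be even."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * f(lo + j * h)
    return total * h / 3.0

def class_selection_probs(n, freqs, means):
    """Probability that the best of n offspring belongs to each genotype class."""
    def mixture_cdf(x):
        return sum(f * norm_cdf(x - m) for f, m in zip(freqs, means))

    probs = []
    for f_c, m_c in zip(freqs, means):
        def integrand(x, f_c=f_c, m_c=m_c):
            return n * f_c * norm_pdf(x - m_c) * mixture_cdf(x) ** (n - 1)
        probs.append(integrate(integrand))
    return probs

def two_locus_probs(n, r, v12, v1, v2, v0=0.0):
    """Returns [P12, P12', P1'2, P1'2'] given class values in within-family SD.

    v12: both A1 and A2 present; v1: A1 only; v2: A2 only; v0: neither.
    """
    freqs = [(1 - r) / 2, r / 2, r / 2, (1 - r) / 2]
    return class_selection_probs(n, freqs, [v12, v1, v2, v0])

# Additive loci with a1 = a2 = 0.5, and the epistatic example from the text.
print([round(p, 4) for p in two_locus_probs(4, 0.2, 1.0, 0.5, 0.5)])
print([round(p, 4) for p in two_locus_probs(4, 0.2, 1.0, 0.0, 0.0)])
```

The second call corresponds to the case in which only the double heterozygote is 1 SD superior, and reproduces the values quoted for r = 0.2.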
To consider the passage over several generations of each of the genotypic classes, it is necessary to include the probabilities of the single locus segregants. We construct the 3 × 3 transition matrix B, for which the rows and columns identify the following states: (1) A1 and A2, (2) A1 but not A2, and (3) A2 but not A1; the elements bi,j specify the transition probability from state i at generation t to state j at generation t + 1. (Alternatively, a 4 × 4 matrix can be used, with the fourth row and column denoting the case where neither A1 nor A2 is segregating; because this is an absorbing state, the computation of segregation probabilities is not affected.) Hence, using the single locus formulas from the preceding section,
Table 4. Probabilities of segregation after t generations of alternative genotypes for a two-locus model with additive effects a1 between alleles A1 and A′1 at locus 1 and a2 between A2 and A′2 at locus 2, with recombination fraction r between the loci; selection of the best one from n each generation.
For the example given in Table 3, results for segregation probabilities for four and eight generations are given in Table 4. Although the probability that both loci remain segregating is not greatly affected by the relative magnitude of the gene effects (and no probabilities of retention are high in this example because the total effect is only a1 + a2 = 1 and n = 4), the QTL of smaller or negative effect has a low probability of remaining segregating alone for many generations, so the method does have some discriminating power.
Marker segregation: The previous analyses have been restricted to the fate of the QTL, but their segregation has to be detected by means of molecular markers. The calculations for individual markers are straightforward and are simply a special case of the two-locus analysis given above. Let us assume that A1 is the QTL and A2 is the marker, i.e., a1 = a and a2 = 0, with the recombination fraction between the loci equal to r. Then the elements of B are given by b1,1 = (1 − r)P(n,a), b1,2 = rP(n,a), b1,3 = r[1 − P(n,a)], b2,2 = P(n,a), and b3,3 = ½, all other elements being zero.
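One way to write out these elements and follow them over generations is sketched below. It relies on the marker being neutral, so that, given whether the selected offspring carries the QTL allele, the marker allele travels with it with probability 1 − r; the single-generation value P(n,a) is passed in as a number (0.735 for n = 4, a = 1, as quoted in the text). The function names are hypothetical.

```python
def marker_qtl_matrix(p, r):
    """3 x 3 transition matrix B for a QTL (locus 1) and a neutral marker (locus 2).

    States: 0 = QTL and marker segregating, 1 = QTL only, 2 = marker only.
    p is the single-generation probability P(n,a); r the recombination fraction.
    """
    return [
        [(1 - r) * p, r * p, r * (1 - p)],
        [0.0, p, 0.0],
        [0.0, 0.0, 0.5],
    ]

def mat_mul(x, y):
    size = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def state_probs(p, r, t):
    """State probabilities after t generations, starting from a double heterozygote."""
    b = marker_qtl_matrix(p, r)
    power = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for _ in range(t):
        power = mat_mul(power, b)
    return power[0]

both, qtl_only, marker_only = state_probs(0.735, 0.1, 4)
print("QTL and marker:", round(both, 3))
print("QTL retained (with or without marker):", round(both + qtl_only, 3))
print("marker retained (with or without QTL):", round(both + marker_only, 3))
```

The marker is informative about the QTL only through the first state, whose probability declines as [(1 − r)P(n,a)]^t; tight linkage is therefore needed for the marker to track the QTL over many generations.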
If there are two QTL, the fate of alleles at a marker locus, say A3, depends on whether it is between or outside this pair. If A3 is outside the interval A1–A2, then its probability of segregation is given by expanding the formulas given in Equation 3. If A3 lies outside the interval, but nearer to A2, for example, the probability that all three loci remain segregating is given by (1 − r23)P12(n,a1,a2). If A3 lies between A1 and A2, then, for example, the probability that all three loci remain segregating is [(1 − r13)(1 − r23)/(1 − r12)]P12(n,a1,a2). The matrix has to be formally extended to consider seven possible classes (all three loci, three pairs, and three singles), but the calculations are straightforward.
Table 5. Probabilities of segregation of multiple QTL and markers, computed using Monte Carlo simulation with 500 replicates.
Multiple linked QTL: The preceding analysis is concerned with the outcome of backcrossing when there are only one or two QTL on the chromosome affecting the trait under selection. An alternative model is that the difference in performance between the recurrent and nonrecurrent parent lines is caused by many QTL of (mainly) small effect on each chromosome. Some examples have been considered using Monte Carlo simulation, with a model of a chromosome of map length L cM and typically 21 loci simulated at equal spacing, with the most distant at the ends of the chromosome. Ten of these loci, numbers 1 (i.e., at position 0.05L), 3,…, 19, were assumed to be QTL of equal effect, and the remaining 11 loci, numbers 0, 2,…, 20, were assumed to be markers with no effect on the trait. The probabilities that individual QTL remain segregating do not differ greatly whether or not they are near the middle or the end of the chromosome, unless the chromosome is of length 200 cM or more. Similarly, the probability that individual markers remain segregating is little different from that of the QTL between which they lie. Hence, only summary figures are given for the examples in Table 5. If the chromosome is short (L < 100), the probability that loci on it remain segregating is quite substantial, even if the individual loci have effects of 0.4 SD or less. If it is long (L > 200), the probability of continued segregation is a little higher than for neutral genes (6% in the example of t = 4 generations), even if the individual effects are as large as 0.4 SD and the total difference between the ancestral chromosomes is 4 SD. In general, there will be greater discrimination if selection is more intense and the number of generations is greater than in the examples in Table 5 (n = 4, t = 4). A model with very many QTL of very small effect in coupling would give similar results to those in Table 5, as seen by the similar segregation probabilities for QTL and markers.
Unless the chromosome is very long (L > 200) and individual QTL are all of large effect (say a > 1), an alternative extreme model in which there is complete repulsion of QTL, i.e., alternating positive and negative effects on the trait along the chromosome with no net effect of the QTL together on the chromosome, would behave in a very similar way to the case in which all loci are neutral.
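A simulation of the kind described for multiple linked QTL might be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the program used for Table 5: recombination between adjacent loci follows Haldane's mapping function without interference, the F1 carries one intact nonrecurrent chromosome, and all names (`simulate_line`, `retention_by_locus`) and the seed are hypothetical.

```python
import math
import random

def recombination_fraction(distance_morgans):
    """Haldane mapping function (assumes no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))

def simulate_line(rng, n, t, effects, r_adjacent):
    """Returns, per locus, whether the donor allele is still present after t backcrosses."""
    n_loci = len(effects)
    donor = [True] * n_loci              # F1: intact nonrecurrent chromosome
    for _ in range(t):
        best_value, best_hap = None, None
        for _ in range(n):
            # Gamete from the backcross parent: start on a random strand and
            # switch strands between adjacent loci with probability r_adjacent.
            on_donor_strand = rng.random() < 0.5
            hap = []
            for j in range(n_loci):
                if j > 0 and rng.random() < r_adjacent:
                    on_donor_strand = not on_donor_strand
                hap.append(on_donor_strand and donor[j])
            value = sum(e for e, h in zip(effects, hap) if h) + rng.gauss(0.0, 1.0)
            if best_value is None or value > best_value:
                best_value, best_hap = value, hap
        donor = best_hap                 # only the best offspring is backcrossed
    return donor

def retention_by_locus(n, t, effects, length_cm, replicates, seed=0):
    rng = random.Random(seed)
    spacing = (length_cm / 100.0) / (len(effects) - 1)
    r_adjacent = recombination_fraction(spacing)
    counts = [0] * len(effects)
    for _ in range(replicates):
        for j, present in enumerate(simulate_line(rng, n, t, effects, r_adjacent)):
            counts[j] += present
    return [c / replicates for c in counts]

# 21 loci: odd-numbered loci are QTL of effect 0.4 SD, even-numbered are markers.
effects = [0.4 if j % 2 == 1 else 0.0 for j in range(21)]
result = retention_by_locus(n=4, t=4, effects=effects, length_cm=50, replicates=500)
print("mean retention over loci:", round(sum(result) / len(result), 3))
```

Setting all effects to zero provides a check, since every locus should then be retained with probability close to (½)^t.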
Interspersed inter se matings: QTL of small effect, particularly if they are partially recessive to that in the recurrent parent, have a low probability of retention. It is possible to increase the probability of QTL segregation by increasing the strength of the selection in increasing frequency relative to that of backcrossing in reducing frequency. One possible method, which fits within the independent family (line) structure discussed here, is to intersperse a generation of inter se mating between each generation of backcrossing, i.e., to allow two generations of selection per generation of backcrossing. For simplicity, it is assumed that the same family size is used in each case: in the backcrossing generations, the best male (or female) from n of that sex is selected; in the inter se generation, the best male from n males recorded and the best female from n females recorded are selected and mated. The important difference from the previous analysis is that now matings can be made between two heterozygotes so that homozygotes for the QTL of interest may then be selected for the next backcross generation.
In the previous analysis of backcrosses alone, the quantity a was used for simplicity to define the homozygote–heterozygote difference; a fuller definition is now required, and the notation of Falconer and Mackay (1996) is adopted, adding an asterisk to distinguish values from those above: the genotypic values (in within family SD units) are AA: a*, AA′: d*, and A′A′: −a*. Hence, in the previous analysis, a = a* + d*. If the parents of the inter se generation are a homozygote and a heterozygote, the calculations given in Equation 1 still apply. If the mating is between two heterozygotes, the probabilities that the individual selected for the next backcross generation is AA, AA′, or A′A′ are readily computed by extending the formula, as exemplified by Equation 4. Let these probabilities be, for example, PAA(n,a*,d*). The full two-generation process can be described by a transition matrix C, in which the rows and columns denote the genotype of the backcross parent, AA for the first and AA′ for the second (a third row and column could be added for the absorbing state when the backcross parent is A′A′):
In Table 6, examples are given assuming that n = 4 individuals are recorded per family in both the backcross and inter se mating generations, for different degrees of dominance. The value of t refers to the number of backcross generations completed, and the probabilities given are that the QTL is still segregating, i.e., the individual selected from the inter se generation is either AA or AA′. For reference, to define the rate of loss after a few generations, values of 1 − λ, where λ is the larger eigenvalue of C, are also given and can be compared directly with the probabilities 1 − P(n,a) (which equal 1 − eigenvalue for a 1 × 1 matrix) from Table 1 when only backcrossing with selection is practiced.
The greatest benefits from the inter se mating arise, of course, when the QTL of high value is recessive and therefore neutral during the selection among backcrossed individuals. In the additive case, the interspersed generation does not halve the rate of loss (i.e., 1 − λ) unless the effect of the QTL is quite large. Because the time (i.e., total number of generations) required to reduce the probability of segregation of neutral background genes is doubled, it is moot whether there is benefit in inserting the inter se generation. Maintaining more replicate lines or increasing the number of individuals from each family recorded for the quantitative trait each generation (i.e., family size) may make more efficient use of resources. Where reproductive rate limits selection intensity, for example in mice, it might be increased by keeping two litters from each female or by selecting only among males that have been given multiple matings.
DISCUSSION
To convey some “feel” for the results, examples of simulated data sets are given in Table 7, in each case for a single chromosome of length 1 Morgan and following eight generations of backcrossing with selection of the best individual from 12 recorded. The different models simulated represent different distributions of the QTL effects, ranging from ai = 0.2 to 2, but with the same total difference in effect, Σiai = 2, between these chromosomes (as heterozygotes) from the recurrent and nonrecurrent backcross lines. The variances in the F2 would differ among the models (unless there was no recombination), being largest with only one QTL differentiating the lines. As expected from the previous analyses, there is segregation at one or more of the markers in the majority of the replicate backcross lines. It is also seen that the pattern differs somewhat according to the number of QTL accounting for the line difference on this chromosome, but that many marker configurations can appear for two or more quite different distributions of QTL effects, and that identifying whether one or more QTL are responsible for the marker effects seen is unlikely to be feasible from the marker distribution alone. As in other methods of QTL mapping, it is difficult to distinguish between one QTL and a pair of closely linked QTL (Haley and Knott 1992; Jansen 1993; Zeng 1993).
Putative evidence for a QTL in the region of a marker comes from finding that the marker is present in more replicate lines than expected by chance. If there is no QTL in the region, the probability that the marker remains segregating is (½)^t, and if M lines are maintained, the probability that it is found in m of them is given by the Poisson distribution with parameter M/2^t. For example, with four generations of backcrossing, the probability that it is found in three or more lines is <5% (0.026, extending results of Table 2). Hence, in such an experiment, applying a site-by-site type I error, further attention should be given to regions that are found segregating in three or more lines. In an experiment run for eight generations with 10 lines, any region remaining segregating should be considered further. This does not take into account the multiple testing problem, which is straightforward if all markers are on different chromosomes, or if the markers are essentially unlinked because they are widely separated on fewer chromosomes; the Bonferroni correction can then be applied to give an experiment-wide error of specified value, albeit at the risk of considerable reduction in power. For example, taking 20 unlinked markers, one per mouse chromosome, the overall type I error if only markers found segregating in four or more of 10 congenic backcross lines are examined further is ~8% (0.004 × 20, from Table 2); more precisely, the probability that a locus remains segregating is (½)^8, and the probability that any of the 20 remain segregating is 1 − [1 − (½)^8]^20 = 7.5%. A more sophisticated analysis is required to find critical values when the distribution of chromosome lengths and actual position of markers are taken into account, but it is quite straightforward to obtain genome-wide critical values such as those that are used in QTL mapping from one-generation crosses (Lander and Botstein 1989), for example, by a permutation test.
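The site-by-site thresholds and the experiment-wide error for unlinked markers can be reproduced with a short sketch. It assumes, as in the text, a Poisson distribution with parameter M/2^t for the number of lines in which a neutral marker remains segregating; the function names are hypothetical.

```python
import math

def poisson_tail(m_min, lam):
    """P(m >= m_min) for a Poisson variable with parameter lam."""
    below = sum(math.exp(-lam) * lam ** m / math.factorial(m) for m in range(m_min))
    return 1.0 - below

def smallest_threshold(n_lines, t, alpha=0.05):
    """Smallest number of lines m such that a neutral marker is found
    segregating in m or more lines with probability below alpha."""
    lam = n_lines * 0.5 ** t
    m = 0
    while poisson_tail(m, lam) >= alpha:
        m += 1
    return m

def familywise_error(p_single, n_tests):
    """Probability of at least one false positive among independent tests."""
    return 1.0 - (1.0 - p_single) ** n_tests

print("10 lines, t = 4:", smallest_threshold(10, 4), "or more lines")
print("10 lines, t = 8:", smallest_threshold(10, 8), "or more lines")
print("Bonferroni bound, 20 markers:", round(20 * 0.5 ** 8, 3))
print("exact, 20 unlinked markers:", round(familywise_error(0.5 ** 8, 20), 3))
```

The Bonferroni product is an upper bound on the exact value for independent tests, which is why the two figures differ slightly.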
It can be shown that, for a chromosome with k markers and recombination fractions r1, r2, …, rk−1 between adjacent markers, the probability that at least one remains segregating is ~(½)^t[k − Σi(1 − ri)^t], a result obtained by using transition probability matrices such as B (Equation 6) or the methods of Visscher and Thompson (1995) and by ignoring double recombinants. In any case, if the parental lines differ in mean performance and if there is evidence of segregation variance in the initial backcross or an F2 of the lines, it is moot whether the genome-wide null hypothesis of no effects is relevant. It might be more appropriate to assume an infinitesimal model (Visscher and Haley 1996), with the variance distributed equally among and within the chromosomes: it seems likely, however, since the net difference between the two lines is therefore likely to be small around any marker, that the probabilities of continued segregation of each marker will be little higher than in the neutral case unless the variance of aggregate effects among marker intervals is large.
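The chromosome-wide neutral probability can be evaluated directly. One way to see the form of the expression, offered here as an interpretation rather than the published derivation, is that when double recombinants are ignored the retained markers form a single contiguous block, so the probability that at least one marker is retained equals the expected number of retained markers, k(½)^t, minus the expected number of retained adjacent pairs, Σi[(1 − ri)/2]^t. The function name and the marker spacing below are illustrative.

```python
import math

def prob_any_marker_segregating(t, rec_fractions):
    """Approximate probability that at least one of k linked neutral markers
    remains segregating after t backcross generations.

    rec_fractions holds the k - 1 recombination fractions between adjacent markers.
    """
    k = len(rec_fractions) + 1
    return 0.5 ** t * (k - sum((1.0 - r) ** t for r in rec_fractions))

# Illustration: 11 markers at 10-cM spacing (Haldane mapping assumed for r).
r_10cm = 0.5 * (1.0 - math.exp(-2.0 * 0.10))
for t in (4, 8):
    print(t, round(prob_any_marker_segregating(t, [r_10cm] * 10), 3))
```

Two limiting cases provide a check: a single marker gives (½)^t, and completely linked markers (all ri = 0) behave as one marker and also give (½)^t.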
Table 6. Probabilities (Prob: t) that a QTL is retained with t of each of alternating generations of backcrossing and inter se mating, with one individual of each sex selected from n = 4 recorded in each case.
It is possible, at least in principle, to estimate the effects of a QTL located near a marker from the number of replicate lines in which it is segregating. Considering first individual QTL, the expected proportion of lines in which it is segregating is given by P(n,a)t, which can be equated to the actual proportion m/M (this is the maximum likelihood estimator). If there is no recombination between the marker and QTL, an estimate, â, of the effect can then be obtained by trial and error, evaluating Equations 1 and 3. As an approximation, Equations 2 and 3 can be used together to give [(1 + iâ/2)/2]^t = m/M or â = (2/i)[2(m/M)^(1/t) − 1]. Because the estimate is obtained from a realization of the Poisson distribution with a small parameter value, the standard error of the estimate is likely to be of the same size as the estimate itself, which can be considered as no more than a guide. The estimate also has to be corrected for recombination between the marker and the QTL. In principle, but beyond the scope of this paper, interval mapping (Lander and Botstein 1989) could be used to combine the data on markers, but sampling errors will remain large. Information on the effect of the QTL can also be obtained directly from segregation analysis within the backcross families at the end of the backcrossing phase, those retaining a QTL of large effect having both higher mean and higher within family variance than those in which it is lost. Further precision can be obtained by typing progeny for the marker(s) near the putative QTL.
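The approximate estimator can be written as a small function. The sketch assumes that i is the expected value of the largest of n standard normal deviates, evaluated here by numerical integration, and that there is no recombination between marker and QTL; the function names and the example counts are hypothetical.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def selection_intensity(n, lo=-10.0, hi=10.0, steps=4000):
    """Expected largest of n standard normal deviates (Simpson's rule)."""
    def f(x):
        return n * x * norm_pdf(x) * norm_cdf(x) ** (n - 1)
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * f(lo + j * h)
    return total * h / 3.0

def estimate_effect(m, n_lines, t, n):
    """Approximate estimate of a from m of n_lines still segregating after t generations."""
    i = selection_intensity(n)
    return (2.0 / i) * (2.0 * (m / n_lines) ** (1.0 / t) - 1.0)

# Hypothetical outcome: 3 of 10 lines segregating after t = 4 generations, n = 4.
print(round(estimate_effect(3, 10, 4, 4), 2))
```

Because the approximation is linear in a, it tends to understate large effects relative to inverting the exact single-locus probability, so the result is best treated as a rough guide, in keeping with the caution in the text about its sampling error.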
Simulated examples of patterns of markers for different models of QTL distribution with the same total effect on a chromosome of length 100 cM, with 11 equally spaced markers at 0, 10, …, 100 cM.
A QTL can be mapped solely using the marker information on single individuals in each family at the end of the backcrossing phase. Consider the example in the first column of Table 7, and assume, as was actually the model simulated, that there was only one QTL on the chromosome, and that the lines in which the QTL was segregating could be identified from their mean and variance. In the 20 replicate lines, markers 1, 2, …, 7 were segregating in, respectively, 1, 3, 7, 12, 11, 2, and 1 lines, and none of markers 8–10 remained segregating, so the most likely QTL position is between markers 4 and 5. Among the 18 lines segregating at markers 4 or 5, the QTL (let us assume identified from phenotypes for the trait) was segregating at 17; of these 18 lines, nine were recombinants between marker 4 (map position 30 cM) and the QTL, and five were recombinants between the QTL and marker 5 (map position 40 cM). Hence, the maximum likelihood estimate of QTL position can be shown to be at 36.7 cM, close to its actual position at 35 cM. The precision was achieved by typing solely 20 genotypes, but obviously, this is an extreme example of a QTL of very large effect retained segregating over many (eight) generations. More generally, the precision will be a function of the number of lines in which the QTL remains segregating, those in which it and the markers are lost providing none, and the number of generations for which backcrossing is continued, the effective recombination rate being $1 - (1 - r)^t$. There is, however, a trade-off because increasing the number of generations leads to a reduction in probability of segregation and an increase in the effective recombination rate.
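One way to arrive at a position estimate of this kind is a grid search over a simple likelihood. The sketch below is an illustration under stated assumptions, not the author's procedure: recombination on the two sides of the QTL is treated as two independent binomial samples, map distance in Morgans is equated with recombination fraction over the 10-cM interval, the effective recombination rate is 1 − (1 − r)^t with t = 8, and the counts are read as 9 and 5 recombinants among n = 17 lines carrying the QTL. All function names are hypothetical.

```python
import math


def effective_recombination(r: float, t: int) -> float:
    """Effective recombination rate accumulated over t generations."""
    return 1.0 - (1.0 - r) ** t


def log_likelihood(x: float, d: float, t: int, n: int,
                   rec_left: int, rec_right: int) -> float:
    """Binomial log likelihood for a QTL at distance x from the left marker
    in an interval of length d (both as recombination fractions)."""
    big_r_left = effective_recombination(x, t)
    big_r_right = effective_recombination(d - x, t)
    return (rec_left * math.log(big_r_left)
            + (n - rec_left) * math.log(1.0 - big_r_left)
            + rec_right * math.log(big_r_right)
            + (n - rec_right) * math.log(1.0 - big_r_right))


def ml_position(d: float, t: int, n: int, rec_left: int, rec_right: int,
                steps: int = 10000) -> float:
    """Grid search for the x in (0, d) that maximises the log likelihood."""
    grid = (d * j / steps for j in range(1, steps))
    return max(grid, key=lambda x: log_likelihood(x, d, t, n,
                                                  rec_left, rec_right))


# Counts as read from the example in the text (an interpretation).
x_hat = ml_position(d=0.10, t=8, n=17, rec_left=9, rec_right=5)
position_cM = 30.0 + 100.0 * x_hat  # marker 4 sits at 30 cM
```

Under these assumptions the maximum falls in the upper half of the interval, near the estimate quoted in the text; a different treatment of the 18th line or of the mapping function would shift it slightly.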
It would be possible to obtain more information about QTL positions and effects if the marker screening were conducted during the backcrossing program, and this would also enable decisions to be made as to when to cease backcrossing to optimize the trade-off between recombination and loss of QTL. The records of individual animals for the quantitative trait and for each marker can be combined to provide further information using maximum likelihood methods that are computationally feasible using Markov chain Monte Carlo methods, such as Gibbs sampling, and have been used in an analysis of recurrent backcross lines in which a set of different marker regions were maintained segregating (Rance et al. 1997). The simple scheme analyzed here, in which marker information is collected only at the end, is more appropriate for laboratory species with short generation intervals than for commercial animal or crop species. For commercial species, it is likely to be preferable to collect marker data each generation, and perhaps also to choose individuals for backcrossing on their marker genotype as well as on phenotype for the trait of interest. This leads to more complicated design and interpretation problems than are discussed here.
If only QTL on autosomes are to be identified, it should not matter greatly whether a male or female is selected for the next generation of backcrossing. If, however, QTL on the sex chromosome are to be located, then females should obviously be selected in the backcross line; if map lengths in females are greater than in males, there is an added benefit in doing so. There are other practical issues that have not been considered. For example, there is a risk that the selected individual in a line is infertile or that no individual of the required sex is available for selection. In such cases, it may be appropriate to initiate many more lines than are expected to be maintained, or to sacrifice the simplicity of having completely independent backcross lines by drawing sublines from surviving lines to maintain numbers. The analysis can clearly be developed further.
With recurrent backcrossing and selection to only one line, QTL that are recessive in the nonrecurrent parent will be missed. Although alternating backcrossing and inter se mating alleviates this problem, net selective pressures on recessives remain small. An alternative, if both lines are inbred, is to practice backcrossing and selection in two reciprocal sets of lines, differing in which is used as the recurrent parent. The probability that the QTL is maintained segregating in each type of line and over the whole set can readily be computed from the methods given here.
It is important to note that the method discussed in this paper and in other studies on the use of recurrent backcrossing (e.g., N. J. Schork, A. M. Beebe, B. Thiel, P. St. Jean, and R. L. Coffman, unpublished data) is essentially a prescreening procedure for QTL detection, not a finishing point. When the recurrent backcrossing with selection on phenotype for the quantitative trait is completed, and the marker analysis has indicated regions of the genome in which QTL for the trait are likely to be present, more detailed analysis is needed; the congenic lines, however, provide a useful starting point. For example, further backcrossing can be practiced while maintaining only specific short marker intervals segregating. Segregation within the families and QTL and marker recording can be undertaken to confirm QTL effects and more closely map their position. Progeny testing of recombinants can be used to make this procedure more accurate (P. D. Keightley, personal communication).
For precise mapping of QTL, opportunity is needed for substantial recombination between linked QTL and between them and markers. The advantage of multiple generation schemes is that, in effect, recombination fractions are increased roughly in proportion to the number of generations, and similarly, the average length of chromosome retained around a marker is reduced in inverse proportion to the number of generations. The method discussed here has advantages and disadvantages over others for QTL location. In the most conventional approach, QTL are mapped by recording phenotypes and marker genotypes in an F2 or backcross of a line cross. An additional experiment is subsequently required to isolate these further, which can involve introgression or retention of marked segments by backcrossing; this contrasts with the backcrossing with selection proposed here in that it is the markers that are identified in the backcrossing rather than the phenotypes. Even so, a number of lines have to be retained for each QTL because the precise relation between marker and QTL position is not known. The use of markers, however, has the benefit that QTL of smaller effect can be retained, and as marker and trait data are collected throughout, information on the location accumulates during the backcrossing phase, but at the expense of a lot of recording, compared to backcrossing with selection on the trait. An alternative approach is to proceed to QTL mapping of more advanced generations of the cross, for example, F3, F4, …, before undertaking the QTL analysis; however, this may still require backcrossing to establish congenic lines for gene cloning. (Recurrent backcrossing rather than inter se mating is not feasible without selection because most QTL would be lost.)
A further alternative for multigeneration analysis is the use of selected lines, preferably replicated, and identification of QTL by changes in marker frequencies between high and low lines (Keightley and Bulfield 1993); however, this still requires subsequent backcrossing if congenic lines are needed for precise mapping and cloning.
The objective of this paper is not to show that the use of recurrent backcrossing with selection with several independent single family backcross lines is optimal in any broad way. It is indeed clear that if more effort were expended on recording markers during the backcrossing phase and associating them with performance of the trait, and perhaps using this information to generate sublines on a dynamic basis, further precision could be obtained; the analysis of such a scheme, however, is beyond the scope of this paper. The aim is merely to suggest a method with relatively low input of effort that may lead to fairly clear identification of QTL of large effect and simultaneous production of congenic lines that may be useful for further, more detailed analysis. The idea of backcrossing with selection is old indeed, and its successful use to identify QTL for disease resistance has been reported (Beebe et al. 1997). The only novelty is in the use of independent sublines and the quantification of the probabilities that QTL are retained. N. J. Schork, A. M. Beebe, B. Thiel, P. St. Jean and R. L. Coffman (unpublished data) discuss methods for analysis of nonindependent backcross lines with selection and use of markers.
Acknowledgments
I am grateful to Philippe Baret, Peter Keightley, Sara Knott, Peter Visscher, Zhao-Bang Zeng, and two anonymous referees for helpful comments, and to the Biotechnology and Biological Sciences Research Council for financial support.
Footnotes
- Communicating editor: Z-B. Zeng
- Received July 2, 1997.
- Accepted November 10, 1997.
- Copyright © 1998 by the Genetics Society of America