Haplotype Probabilities for Multiple-Strain Recombinant Inbred Lines

Recombinant inbred lines (RIL) derived from multiple inbred strains can serve as a powerful resource for the genetic dissection of complex traits. The use of such multiple-strain RIL requires a detailed knowledge of the haplotype structure in such lines. Broman (2005) derived the two- and three-point haplotype probabilities for $2^n$-way RIL; the former required hefty computation to infer the symbolic results, and the latter were strictly numerical. We describe a simpler approach for the calculation of these probabilities, which allowed us to derive the symbolic form of the three-point haplotype probabilities. We also extend the two-point results for the case of additional generations of intermating, including the case of $2^n$-way intermated recombinant inbred populations (IRIP).

Recombinant inbred lines (RIL) can serve as powerful tools for genetic mapping. An RIL is formed by crossing two inbred strains followed by repeated matings among relatives (e.g., selfing or sibling mating) to create a new inbred line whose genome is a mosaic of the parental genomes. As each RIL is an inbred strain and so can be propagated eternally, a panel of RIL has a number of advantages for genetic mapping: one need genotype each strain only once; one can phenotype multiple individuals from each strain to reduce individual, environmental, and measurement variability; multiple invasive phenotypes can be obtained on the same set of genomes, including measurements on a single invasive phenotype over time or in different environments; and, as the breakpoints in RIL are more dense than those that occur in any one meiosis, greater mapping resolution can be achieved.
Members of the Complex Trait Consortium have recently begun the development of a large panel of eight-way RIL in the mouse (Threadgill et al. 2002; Complex Trait Consortium 2004). An eight-way RIL is formed by intermating eight parental inbred strains, followed by repeated selfing or sibling mating to produce a new inbred line whose genome is a mosaic of the eight parental strains. (Figure 1, A and B, illustrates the production of eight-way RIL by selfing and sibling mating, respectively.) This panel will serve as a valuable community resource for mapping the loci that contribute to complex phenotypes in the mouse.
In general, one might consider the development of a panel of $2^n$-way RIL, mixing the genomes of $2^n$ different inbred lines. One might also consider an additional generation of interbreeding, preceding the process of inbreeding, to increase the density of breakpoints on the final RIL; we call this the RIL1 design. In $2^n$-way RIL, inbreeding begins with individuals at generation $n$; in $2^n$-way RIL1, two $G_n$ individuals from independent "funnels" (with initial crosses in the same order, but with no shared recombination events) are crossed, and inbreeding begins at generation $n + 1$. The production of eight-way RIL1 by selfing and sibling mating is shown in Figure 1, C and D, respectively. Note that in eight-way RIL1, one may mate cousins at generation $G_2$, as these individuals have no shared recombination events. For higher-order RIL1, a more extensive set of matings will be required to ensure that the individuals at generation $G_{n-1}$ exhibit independent recombination events.
Further, it has been proposed to include some number of generations of random mating prior to inbreeding, a design that has been called an intermated recombinant inbred population (IRIP). Multiple designs for the formation of $2^n$-way IRIP might be considered. First, one might create an unlimited population of individuals at generation $n$, each from a funnel having initial crosses in the same order, but with such crosses completely independent between individuals. Second, the individuals at generation $n$ might each come from an independent, random funnel, with the order of the initial crosses completely randomized, though with all $2^n$ parental strains represented. We focus on the latter design, as it requires the formation of a single large population from which a panel of IRIP may be developed. The former design would require separate populations of intermating individuals for each line to be formed. Note that the use of random funnels makes the IRIP design distinct from the RIL1 design, which uses a fixed funnel.
The use of multiple-strain RIL panels will require a detailed understanding of the haplotype structure in such lines. At any given genomic position, an RIL will be homozygous for one of the $2^n$ possible parental alleles; a haplotype is the set of alleles at linked loci along a chromosome. We seek to understand the pattern of exchanges among the parental alleles along an RIL chromosome. In particular, the decision of whether to include additional generations of intermating should be based upon an understanding of the additional mapping precision that such intermating will provide.
The seminal article of Haldane and Waddington (1931) provided the basic results for the standard two-way RIL by selfing or by sibling mating: they derived both two- and three-point haplotype probabilities (i.e., the probabilities for all possible two- and three-locus haplotypes) for such two-way RIL. Winkler et al. (2003) calculated the two-point haplotype probabilities for the case of two-way IRIP. Broman (2005) derived the two- and three-point haplotype probabilities for four- and eight-way RIL, though with enormous computational effort. Only numerical results were provided for the three-point probabilities.
Here, we improve on the work of Haldane and Waddington (1931) and Broman (2005). We describe a simpler approach for the calculation of two- and three-point probabilities in $2^n$-way RIL, which allowed us to determine exact formulas for the three-point probabilities. We also extend the results on two-point haplotype probabilities for the case of $2^n$-way RIL1 and $2^n$-way IRIP. Our results on the map expansion obtained in each design will provide a useful guide to investigators considering the development of $2^n$-way RIL and considering whether additional generations of intermating should be performed.

TWO POINTS
Here we derive the two-point haplotype probabilities on the fixed chromosome in $2^n$-way RIL, RIL1, and IRIP. We consider both selfing and sibling mating, and we focus on the autosome. (Results for the X chromosome may be derived in a similar manner. However, because the X chromosome recombines in females but not in males, different alleles have different numbers of opportunities for recombination before they arrive at the four-chromosome bottleneck, and so even single-point results are difficult to write down for the general $2^n$-way case.) We also derive the quantity analogous to the recombination fraction, but for the fixed RIL chromosome. Note that in the case of sibling mating, we generally assume $n \ge 2$ (that is, $2^n \ge 4$).
Selfing: Two-way RIL: Haldane and Waddington (1931) derived the two-locus haplotype probabilities for two-way RIL by selfing. Here, we describe a simpler solution to the problem.
Let $W_1W_2 \mid X_1X_2$ denote the haplotypes for a $G_k$ individual (for $k > 0$), with subscripts denoting the alleles at the two loci. Let $p_1$ denote the probability that the $W_1W_2$ haplotype goes on to be fixed, and let $p_2$ denote the probability that the $W_1X_2$ haplotype goes on to be fixed. By symmetry, $\Pr(X_1X_2 \text{ fixed}) = \Pr(W_1W_2 \text{ fixed})$ and $\Pr(X_1W_2 \text{ fixed}) = \Pr(W_1X_2 \text{ fixed})$, and so $2p_1 + 2p_2 = 1$.
Further, if we condition on the first step, we have
$$p_1 = 2\left(\frac{1 - r}{2}\right)p_1 + 2\left(\frac{1}{2}\right)\left(\frac{1}{2}\right)p_2.$$
That is, the probability that the $W_1W_2$ haplotype is fixed is the probability that it is transmitted intact to the next generation (and this can occur in two ways) and then becomes fixed plus the probability that $W_1$ is transmitted to one gamete and $W_2$ is transmitted to the other gamete and then these are brought together at fixation (and this can occur in two ways). Substituting $p_2 = (1 - 2p_1)/2$, we find $p_1 = 1/[2(1 + 2r)]$ and $p_2 = r/(1 + 2r)$.
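The first-step argument above reduces to a two-by-two linear system, which can be solved mechanically. The following is a minimal sketch in Python using exact rational arithmetic. The function name `two_way_selfing` is hypothetical, and the first-step equation is taken in the form $p_1 = 2[(1 - r)/2]p_1 + 2(1/2)(1/2)p_2$, which is an assumption read off from the verbal description; it does reproduce the stated solutions.

```python
from fractions import Fraction


def two_way_selfing(r):
    """Return (p1, p2) for two-way RIL by selfing at recombination fraction r.

    p1 = Pr(W1W2 haplotype is fixed), p2 = Pr(W1X2 haplotype is fixed).
    """
    r = Fraction(r)
    # Linear system  a*p1 + b*p2 = e,  c*p1 + d*p2 = f:
    #   first-step equation (assumed form), rearranged: r*p1 - (1/2)*p2 = 0
    #   symmetry constraint:                            2*p1 + 2*p2     = 1
    a, b, e = r, Fraction(-1, 2), Fraction(0)
    c, d, f = Fraction(2), Fraction(2), Fraction(1)
    det = a * d - b * c
    # Cramer's rule for the 2x2 system.
    p1 = (e * d - b * f) / det
    p2 = (a * f - e * c) / det
    return p1, p2


if __name__ == "__main__":
    for r in (Fraction(1, 100), Fraction(1, 10), Fraction(1, 2)):
        p1, p2 = two_way_selfing(r)
        print(f"r={r}: p1={p1}, p2={p2}, R=2*p2={2 * p2}")
```

The quantity `2 * p2` is the probability that the fixed chromosome carries different alleles at the two loci, which for two-way RIL by selfing is $2r/(1 + 2r)$.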
$2^n$-way RIL: The results for higher-order RIL by selfing may be immediately derived from the results for two-way RIL, due to the two-chromosome bottleneck at the start of inbreeding. We consider the generation of $2^n$-way RIL via a funnel, in which the genomes are brought together as rapidly as possible, followed immediately by inbreeding (see Figure 1A). In the following, we assume $n \ge 2$. Let $L_1, L_2, \ldots, L_{2^n}$ denote the parental lines, and consider the cross $[(L_1 \times L_2) \times (L_3 \times L_4)] \times \cdots$. We also use $L_i$ to denote the allele from that line.
Let $W_1W_2 \mid X_1X_2$ denote the alleles on the two chromosomes in generation $n$, at which inbreeding begins. We must have $W_i \in \{L_1, L_2, \ldots, L_{2^{n-1}}\}$ and $X_i \in \{L_{1+2^{n-1}}, \ldots, L_{2^n}\}$.
To derive the haplotype probabilities for the fixed $2^n$-way RIL chromosome, we first determine the haplotype probabilities at the start of inbreeding and then combine them with the results for two-way RIL. We begin with the calculation of the haplotype probabilities at the start of inbreeding. We consider the case that the $L_1$ allele will be fixed at the first locus; other probabilities follow by symmetry.
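The two-step calculation just described (haplotype probabilities at the start of inbreeding, then the two-way fixation probabilities) can also be carried out by brute-force enumeration, as a cross-check on the closed-form expression $R = 1 - (1 - r)^{n-1}/(1 + 2r)$ obtained for selfing. The sketch below is illustrative only: the helper names (`marginal`, `mix`, `gamete`, `fixed_haplotypes`) are hypothetical, and the code assumes the fixed funnel $[(L_1 \times L_2) \times (L_3 \times L_4)] \times \cdots$ with lines labeled by integers.

```python
from collections import defaultdict
from fractions import Fraction


def marginal(dist, locus):
    """Marginal allele distribution at one locus (0 or 1) of a haplotype distribution."""
    out = defaultdict(Fraction)
    for hap, p in dist.items():
        out[hap[locus]] += p
    return out


def mix(A, B, w_intact, w_cross):
    """Combine two independent chromosomes A and B.

    Each intact haplotype of A or B gets weight w_intact; each haplotype taking
    locus 1 from one chromosome and locus 2 from the other gets weight w_cross.
    """
    out = defaultdict(Fraction)
    for dist in (A, B):
        for hap, p in dist.items():
            out[hap] += w_intact * p
    for first, second in ((A, B), (B, A)):
        for a, pa in marginal(first, 0).items():
            for b, pb in marginal(second, 1).items():
                out[(a, b)] += w_cross * pa * pb
    return dict(out)


def gamete(lines, r):
    """Haplotype distribution of a gamete from a funnel individual descended from `lines`."""
    if len(lines) == 1:  # inbred founder: only one possible haplotype
        return {(lines[0], lines[0]): Fraction(1)}
    half = len(lines) // 2
    # Nonrecombinant gametes have weight (1 - r)/2 each, recombinants r/2 each.
    return mix(gamete(lines[:half], r), gamete(lines[half:], r), (1 - r) / 2, r / 2)


def fixed_haplotypes(n, r):
    """Two-locus haplotype probabilities on the fixed chromosome, 2^n-way RIL by selfing."""
    r = Fraction(r)
    lines = list(range(1, 2 ** n + 1))
    half = len(lines) // 2
    W, X = gamete(lines[:half], r), gamete(lines[half:], r)
    p1 = 1 / (2 * (1 + 2 * r))  # Pr(W1W2 fixed), two-way selfing result
    p2 = r / (1 + 2 * r)        # Pr(W1X2 fixed), two-way selfing result
    return mix(W, X, p1, p2)


def R_brute(n, r):
    """Probability that the fixed chromosome carries different alleles at the two loci."""
    return 1 - sum(p for (a, b), p in fixed_haplotypes(n, r).items() if a == b)


def R_closed(n, r):
    r = Fraction(r)
    return 1 - (1 - r) ** (n - 1) / (1 + 2 * r)


if __name__ == "__main__":
    for n in (2, 3, 4):
        r = Fraction(1, 10)
        print(n, R_brute(n, r), R_closed(n, r))
```

Exact fractions are used so that agreement between the enumeration and the closed form can be tested with equality rather than a numerical tolerance.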
To obtain $\Pr(W_1 = W_2 = L_1)$, note that there must be no recombination at any of the initial mixing generations, and that the $L_1L_1$ haplotype must be transmitted at each generation. Thus we see that $\Pr(W_1 = W_2 = L_1) = [(1 - r)/2]^{n-1}$. Similarly, $\Pr(W_1 = L_1, W_2 = L_2) = (r/2)[(1 - r)/2]^{n-2}$, as the two loci must recombine at the first generation but not at subsequent generations, and the $L_1$ allele at the first locus must always be transmitted. Finally, for $i = 0, 1, \ldots, n - 2$ and $j = 1, \ldots, 2^i$, we have
$$\Pr(W_1 = L_1, W_2 = L_{2^i + j}) = \frac{r(1 - r)^{n-2-i}}{2^{n-1+i}}.$$
We now proceed to calculate the haplotype probabilities for the fixed RIL chromosome. The probability that the fixed haplotype is $L_1L_j$, for $j = 1, \ldots, 2^{n-1}$, is simply the probability $\Pr(W_1 = L_1, W_2 = L_j)$ multiplied by the probability that the $W_1W_2$ haplotype gets fixed. For $k = 2^{n-1} + 1, \ldots, 2^n$, the probability that the fixed haplotype is $L_1L_k$ is $\Pr(W_1 = L_1, X_2 = L_k) = (1/2)^{2(n-1)}$ multiplied by the probability that the $W_1X_2$ haplotype gets fixed. Thus the two-locus haplotype probabilities in a $2^n$-way RIL by selfing are as follows:
$$\Pr(L_1L_1) = \frac{(1 - r)^{n-1}}{2^n(1 + 2r)},$$
$$\Pr(L_1L_{2^i+j}) = \frac{r(1 - r)^{n-2-i}}{2^{n+i}(1 + 2r)} \quad (i = 0, \ldots, n - 2;\ j = 1, \ldots, 2^i),$$
$$\Pr(L_1L_k) = \frac{r}{2^{2n-2}(1 + 2r)} \quad (k = 2^{n-1} + 1, \ldots, 2^n).$$
The probability that the RIL chromosome is fixed at different alleles at the two loci (the quantity analogous to the recombination fraction) is then $R = 1 - 2^n \Pr(L_1L_1) = 1 - (1 - r)^{n-1}/(1 + 2r)$. The map expansion in a $2^n$-way RIL by selfing is then $dR/dr|_{r=0} = n + 1$. (For a short proof of the fact that $dR/dr|_{r=0}$ corresponds to the map expansion, see the appendix.)
$2^n$-way RIL1: In $2^n$-way RIL1 by selfing, one crosses two $G_n$ individuals, generated from independent funnels, and then performs repeated selfing starting at generation $n + 1$ (see Figure 1C). To calculate the two-locus haplotype probabilities for this case, we need to revise the haplotype probabilities for the generation in which inbreeding begins. We now have $W_i, X_i \in \{L_1, L_2, \ldots, L_{2^n}\}$. These haplotype probabilities use those from the formation of $2^n$-way RIL, but with an additional generation of recombination.
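We have $\Pr(W_1 = W_2 = L_1) = [(1 - r)/2]^n$ and $\Pr(W_1 = L_1, X_2 = L_1) = (1/2)^{2n}$. Calculation of the haplotype probabilities on the fixed RIL1 chromosome proceeds as before, but a particular allele may come from either chromosome. Thus, for example, the probability that the RIL1 is fixed at $L_1L_1$ is $\Pr(W_1 = W_2 = L_1)$ times the chance that the $W_1W_2$ haplotype gets fixed, plus $\Pr(X_1 = X_2 = L_1)$ times the chance that the $X_1X_2$ haplotype gets fixed, plus $\Pr(W_1 = X_2 = L_1)$ times the chance that the $W_1X_2$ haplotype gets fixed, plus $\Pr(X_1 = W_2 = L_1)$ times the chance that the $X_1W_2$ haplotype gets fixed. This gives
$$\Pr(L_1L_1) = \frac{(1 - r)^n/2^n + r/2^{2n-1}}{1 + 2r}.$$
The other cases are similar, and so the two-locus haplotype probabilities in a $2^n$-way RIL1 by selfing are as follows:
$$\Pr(L_1L_1) = \frac{(1 - r)^n/2^n + r/2^{2n-1}}{1 + 2r},$$
$$\Pr(L_1L_{2^i+j}) = \frac{r(1 - r)^{n-1-i}/2^{n+i} + r/2^{2n-1}}{1 + 2r} \quad (i = 0, \ldots, n - 2;\ j = 1, \ldots, 2^i),$$
$$\Pr(L_1L_k) = \frac{r}{2^{2n-2}(1 + 2r)} \quad (k = 2^{n-1} + 1, \ldots, 2^n).$$
The probability that the RIL1 chromosome is fixed for different alleles is then $R = 1 - [(1 - r)^n + r2^{1-n}]/(1 + 2r)$. The map expansion for the $2^n$-way RIL1 design by selfing is then $n + 2 - 2^{1-n}$.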
$2^n$-way IRIP($s$): In the formation of $2^n$-way IRIP($s$) by selfing, one generates an unlimited population of $G_n$ individuals from random funnels, intermates them for $s$ generations, and then inbreeds, by selfing, a random individual from generation $n + s$.
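At generation $G_n$, in the case of the funnel $[(L_1 \times L_2) \times (L_3 \times L_4)] \times \cdots$, the haplotype probabilities for the first chromosome are $\Pr(W_1 = W_2 = L_1) = [(1 - r)/2]^{n-1}$ and, for $i = 0, 1, \ldots, n - 2$ and $j = 1, \ldots, 2^i$, $\Pr(W_1 = L_1, W_2 = L_{2^i+j}) = r(1 - r)^{n-2-i}/2^{n-1+i}$. The other chromosome has a similar structure, but for the other alleles.
In the IRIP, we consider individuals from random funnels. That is, each individual comes from a cross of the form $[(L_{k_1} \times L_{k_2}) \times (L_{k_3} \times L_{k_4})] \times \cdots$, where $(k_1, k_2, \ldots, k_{2^n})$ is a random permutation of $(1, 2, \ldots, 2^n)$. The haplotype probabilities for a random individual at generation $G_n$ then become $\Pr(W_1 = W_2 = L_1) = (1 - r)^{n-1}/2^n$ and $\Pr(W_1 = L_1, W_2 = L_j) = [1 - (1 - r)^{n-1}]/[2^n(2^n - 1)]$ for $j \ne 1$. We thus have complete symmetry among the $2^n$ alleles.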
Sibling mating: Two-way RIL: Haldane and Waddington (1931) derived the two-locus haplotype probabilities for two-way RIL by sibling mating. Their derivation involved the solution of a system of 22 linear equations. Here we describe a simpler solution to the problem.
Let $W_1W_2 \mid X_1X_2 \times Y_1Y_2 \mid Z_1Z_2$ denote the haplotypes for the pair of individuals at generation $G_k$ (for $k \ge 0$). Let $q_1$, $q_2$, and $q_3$ denote the probabilities that the $W_1W_2$, $W_1X_2$, and $W_1Y_2$ haplotypes, respectively, go on to be fixed. Others follow by symmetry, and so we have $4q_1 + 4q_2 + 8q_3 = 1$.
Four-way RIL: Our method for calculating the two-locus haplotype probabilities for two-way RIL by sibling mating (above) included the results for four-way RIL. The $q_i$ defined above are exactly the two-locus haplotype probabilities for four-way RIL by sibling mating. If we let $L_1, \ldots, L_4$ denote the four alleles, we have $\Pr(L_iL_i \text{ fixed}) = q_1 = 1/[4(1 + 6r)]$ and $\Pr(L_iL_j \text{ fixed}) = q_2 = q_3 = r/[2(1 + 6r)]$ for $i \ne j$; the equality $q_2 = q_3$ follows from the constraint $4q_1 + 4q_2 + 8q_3 = 1$. These results are the same as those obtained by Broman (2005).
$2^n$-way RIL: Derivation of the two-point haplotype probabilities on the fixed chromosome in a $2^n$-way RIL by sibling mating is similar to the case of selfing, although we must consider the four chromosomes at the start of inbreeding. Let $W_1W_2 \mid X_1X_2 \times Y_1Y_2 \mid Z_1Z_2$ denote the two-locus haplotypes in the two individuals at generation $G_{n-1}$, prior to inbreeding (see Figure 1B), and note that $W_i \in \{L_1, \ldots, L_{2^{n-2}}\}$, $X_i \in \{L_{1+2^{n-2}}, \ldots, L_{2^{n-1}}\}$, $Y_i \in \{L_{1+2^{n-1}}, \ldots, L_{2^{n-2}+2^{n-1}}\}$, and $Z_i \in \{L_{1+2^{n-2}+2^{n-1}}, \ldots, L_{2^n}\}$. To determine the haplotype probabilities on the fixed chromosome, we first determine the probabilities that particular alleles survive to the $G_{n-1}$ generation and then multiply those by the probabilities that such alleles go on to be fixed.
$2^n$-way RIL1: In $2^n$-way RIL1 by sibling mating, $G_n$ individuals from independent funnels are crossed to form the $G_{n+1}$ generation, at which point inbreeding by sibling mating begins (see Figure 1D). At generation $n$, we have that $W_i, Y_i \in \{L_1, \ldots, L_{2^{n-1}}\}$ and $X_i, Z_i \in \{L_{1+2^{n-1}}, \ldots, L_{2^n}\}$. Derivation of the two-point haplotype probabilities proceeds with two changes: there is an additional generation of recombination prior to the start of inbreeding, and the $L_1L_1$ haplotype may now be fixed in four possible ways: $W_1 = W_2 = L_1$ and the $W_1W_2$ haplotype is fixed, $Y_1 = Y_2 = L_1$ and the $Y_1Y_2$ haplotype is fixed, $W_1 = L_1$ and $Y_2 = L_1$ and the $W_1Y_2$ haplotype is fixed, and finally $W_2 = L_1$ and $Y_1 = L_1$ and the $Y_1W_2$ haplotype is fixed.
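We first look at the haplotype probabilities at generation $n$. We have $\Pr(W_1 = W_2 = L_1) = [(1 - r)/2]^{n-1}$ and $\Pr(W_1 = L_1, Y_2 = L_1) = (1/2)^{2(n-1)}$. To obtain the two-point haplotype probabilities on the fixed $2^n$-way RIL1 chromosome, we note that, for example, the probability that the $L_1L_1$ haplotype is fixed is $2[(1 - r)/2]^{n-1}\Pr(W_1W_2 \text{ fixed}) + 2(1/2)^{2(n-1)}\Pr(W_1Y_2 \text{ fixed})$. The probability that the $L_1L_k$ haplotype is fixed, for $k > 2^{n-1}$, is $4(1/2)^{2(n-1)}\Pr(W_1X_2 \text{ fixed})$, with the 4 coming from the fixation of $W_1X_2$, $W_1Z_2$, $Y_1X_2$, or $Y_1Z_2$. Thus, the final results are as follows:
$$\Pr(L_1L_1) = \frac{(1 - r)^{n-1}/2^n + r/2^{2n-2}}{1 + 6r},$$
$$\Pr(L_1L_{2^i+j}) = \frac{r(1 - r)^{n-2-i}/2^{n+i} + r/2^{2n-2}}{1 + 6r} \quad (i = 0, \ldots, n - 2;\ j = 1, \ldots, 2^i),$$
$$\Pr(L_1L_k) = \frac{r}{2^{2n-3}(1 + 6r)} \quad (k = 2^{n-1} + 1, \ldots, 2^n).$$
It follows that the probability that the $2^n$-way RIL1 is fixed at different alleles at the two loci is $R = 1 - [(1 - r)^{n-1} + r2^{2-n}]/(1 + 6r)$, and so the map expansion is $n + 5 - 2^{2-n}$.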
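The link between $R$ and the map expansion can be checked numerically. The sketch below evaluates $R = 1 - [(1 - r)^{n-1} + r2^{2-n}]/(1 + 6r)$ for $2^n$-way RIL1 by sibling mating and estimates $dR/dr$ at $r = 0$ by a forward difference, comparing it with $n + 5 - 2^{2-n}$. The function names and the step size `h` are illustrative choices, not part of the original derivation.

```python
def R_ril1_sib(n, r):
    """Pr(fixed chromosome has different alleles at the two loci),
    2^n-way RIL1 by sibling mating (n >= 2)."""
    return 1 - ((1 - r) ** (n - 1) + r * 2.0 ** (2 - n)) / (1 + 6 * r)


def map_expansion_numeric(n, h=1e-7):
    """Forward-difference estimate of dR/dr at r = 0."""
    return (R_ril1_sib(n, h) - R_ril1_sib(n, 0.0)) / h


def map_expansion_closed(n):
    """Closed-form map expansion n + 5 - 2^(2 - n)."""
    return n + 5 - 2.0 ** (2 - n)


if __name__ == "__main__":
    for n in range(2, 7):
        print(n, round(map_expansion_numeric(n), 4), map_expansion_closed(n))
```

A forward difference is sufficient here because $R(0) = 0$ exactly; the tolerance in any comparison should allow for the truncation error of order `h`.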
$2^n$-way IRIP($s$): In the formation of $2^n$-way IRIP($s$) by sibling mating, one generates an unlimited population of $G_n$ individuals from random funnels, intermates them for $s$ generations, and then inbreeds. The haplotype probabilities at generation $n + s$, at which inbreeding begins, are the same as for the case of $2^n$-way IRIP($s$) by selfing, and so we have, at generation $n + s$, $\Pr(W_1 = W_2 = L_1) = (1 - r)^{n+s-1}/2^n + [1 - (1 - r)^{s-1}]/2^{2n}$. The probabilities $\Pr(W_1 = L_1, W_2 = L_j)$ for $j \ne 1$ may be derived by symmetry.
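The closed form for $\Pr(W_1 = W_2 = L_1)$ at generation $n + s$ can be compared against a generation-by-generation recursion. The sketch below rests on assumptions that are not spelled out in the text: the starting value $(1 - r)^{n-1}/2^n$ for a random chromosome at generation $n$; the observation that the two chromosomes of a $G_n$ individual carry disjoint sets of alleles, so a recombinant gamete from that generation cannot be $L_1L_1$; and, in later generations, random mating in an infinite population, so that the two chromosomes of an individual are independent with each allele having marginal probability $1/2^n$. The function names are hypothetical.

```python
from fractions import Fraction


def irip_same_allele_prob(n, s, r):
    """Pr(W1 = W2 = L1) on a random chromosome at generation n + s (s >= 1),
    computed by iterating one generation of random mating at a time."""
    r = Fraction(r)
    # Generation n, random funnel (assumed starting value).
    pi = (1 - r) ** (n - 1) / Fraction(2 ** n)
    # First intermating generation: only nonrecombinant gametes can be L1L1,
    # since the two chromosomes of a G_n individual share no alleles.
    pi = (1 - r) * pi
    # Later generations: a recombinant gamete joins alleles from two
    # independent chromosomes, each equal to L1 with probability 1/2^n.
    for _ in range(s - 1):
        pi = (1 - r) * pi + r / Fraction(2 ** (2 * n))
    return pi


def irip_closed_form(n, s, r):
    """Closed form quoted in the text for generation n + s."""
    r = Fraction(r)
    return (1 - r) ** (n + s - 1) / 2 ** n + (1 - (1 - r) ** (s - 1)) / 2 ** (2 * n)


if __name__ == "__main__":
    for s in (1, 2, 5, 20):
        print(s, float(irip_same_allele_prob(3, s, Fraction(1, 10))))
```

As $s$ grows, the probability approaches $1/2^{2n}$, the value expected when the alleles at the two loci are independent.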
Summary: Here we have derived the two-point haplotype probabilities for $2^n$-way RIL, RIL1, and IRIP. Perhaps our most important results concern the map expansion in the different designs, as these indicate the increased mapping resolution that may be obtained. The RIL1 and IRIP designs require additional generations of mating, and this additional effort must be weighed against the improved precision provided.
The map expansions for $2^n$-way RIL, RIL1, and IRIP($s$) by selfing are assembled in Table 1. The map expansion in the RIL1 design is somewhat less than 1 unit greater than that for the RIL. In the IRIP, one obtains an increase of slightly less than 1 unit in the map expansion for each additional generation of intermating.
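To make the comparison between RIL and RIL1 concrete, the short sketch below tabulates the two selfing map expansions derived above, $n + 1$ for $2^n$-way RIL and $n + 2 - 2^{1-n}$ for $2^n$-way RIL1, along with their difference $1 - 2^{1-n}$. It is an arithmetic illustration only; the function name is hypothetical, and the IRIP values are not reproduced because their closed form is not given in this section.

```python
from fractions import Fraction


def selfing_map_expansions(n):
    """Map expansion for 2^n-way RIL and RIL1 by selfing, from the closed forms
    n + 1 and n + 2 - 2^(1 - n)."""
    ril = Fraction(n + 1)
    ril1 = n + 2 - Fraction(1, 2 ** (n - 1))
    return ril, ril1


if __name__ == "__main__":
    print("n  RIL  RIL1  gain")
    for n in range(1, 6):
        ril, ril1 = selfing_map_expansions(n)
        print(n, ril, ril1, ril1 - ril)
```

For $n = 1$ the gain is zero, which is consistent with the fact that selfing an intercross of two $F_1$ individuals is the same process as selfing an $F_1$; for larger $n$ the gain approaches, but never reaches, 1 unit.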

THREE POINTS
A technique similar to that used above for the case of two points may be used to derive the three-point haplotype probabilities in RIL. Broman (2005) derived these quantities, but obtained only numerical solutions. By our approach, we may obtain exact formulas for the three-point haplotype probabilities.
We focus exclusively on the autosome in four- and eight-way RIL by sibling mating. Exact formulas for four- and eight-way RIL by selfing were presented in Broman (2005). Results for higher-order RIL, RIL1, and IRIP may be obtained from the results provided below, and a similar technique may be used to derive results for the X chromosome.
We consider three points and assume that the recombination fractions in the two intervals are the same, $r_{12} = r_{23} = r$. Let $c$ denote the three-point coincidence at meiosis, $c = \Pr(\text{double recombinant})/r^2$. Note that $c$ is generally a function of $r$, with, for most organisms, $c = 0$ for small $r$ (indicating strong positive crossover interference) and $c = 1$ for $r = 1/2$. We define $r_{13}$ to be the recombination fraction between the first and third loci, so that $c = (2r - r_{13})/(2r^2)$ and so $r_{13} = 2r(1 - cr)$.
To simplify some of the notation in what follows, define $r_{00} = 1 - 2r + cr^2$, the chance that a nonrecombinant haplotype is transmitted; $r_{01} = r(1 - cr)$, the chance that the second but not the first interval recombines; and $r_{11} = cr^2$, the chance that both intervals recombine.
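The three transmission probabilities just defined can be computed directly from $r$ and $c$. The sketch below (with a hypothetical function name) returns them together with $r_{13}$ and illustrates two consistency conditions: the four recombination patterns across the two intervals must have total probability 1, that is $r_{00} + 2r_{01} + r_{11} = 1$, where the factor 2 reflects the assumption that, by symmetry, the first-only and second-only patterns are equally likely; and $r_{13} = 2r_{01}$, since the outer loci recombine exactly when one interval, but not both, recombines.

```python
from fractions import Fraction


def three_point_transmission(r, c):
    """Return (r00, r01, r11, r13) for equal interval recombination
    fractions r and three-point coincidence c."""
    r, c = Fraction(r), Fraction(c)
    r00 = 1 - 2 * r + c * r ** 2   # neither interval recombines
    r01 = r * (1 - c * r)          # exactly one specified interval recombines
    r11 = c * r ** 2               # both intervals recombine
    r13 = 2 * r * (1 - c * r)      # recombination between the outer loci
    return r00, r01, r11, r13


if __name__ == "__main__":
    # No interference (c = 1) at r = 1/10.
    print(three_point_transmission(Fraction(1, 10), 1))
    # Complete interference (c = 0): double recombinants never occur.
    print(three_point_transmission(Fraction(1, 10), 0))
```

With $c = 1$ the last value reduces to $2r(1 - r)$, the familiar no-interference relation between adjacent and flanking recombination fractions.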
Four-way RIL: We consider the case of four-way RIL by sibling mating. Let $p_{ijk}$ denote the probability that the $ijk$ haplotype is fixed. (For ease of notation, here we denote the four alleles as the integers 1, 2, 3, 4.) Taking account of the various symmetries, there are seven distinct haplotype probabilities, shown in Table 2. Note that we must have $4p_{111} + 8p_{112} + 4p_{121} + 16p_{113} + 8p_{131} + 16p_{123} + 8p_{132} = 1$.
To derive these seven probabilities, we condition on the first step toward inbreeding. For example, we can write $p_{111} = 2(r_{00}/2)p_{111} + 4[(1 - r)/2](1/2)p_{113} + 2[(1 - r_{13})/2](1/2)p_{131}$. This is derived as follows: the 111 haplotype can be fixed if it is transmitted intact in the first generation and then that haplotype goes on to be fixed (and this can happen in two ways), or if the 1 alleles at two adjacent loci are transmitted to one offspring and the 1 allele at the remaining locus is transmitted to the other offspring and these are brought together at fixation (and this can occur in four different ways), or if the 1 alleles at the first and third loci are transmitted to one offspring and the 1 allele at the middle locus is transmitted to the other offspring and these are brought together at fixation (and this can occur in two ways).

TABLE 1
Map expansion in $2^n$-way RIL, RIL1, and IRIP

TABLE 2
Three-point haplotype probabilities on an autosome in four-way RIL by sibling mating
A quantity analogous to the three-point coincidence, but for the fixed RIL chromosome, may be calculated from these results, as $C = (1 - 4p_{111} - 8p_{112} - 16p_{113})/R^2$, which gives the following:
$$C = \frac{(1 + 6r)\,[110 + 404r - 288r^2 + 3c(5 - 20r - 204r^2 + 192r^3) - 16c^2(2 - 13r + 18r^2)r^2]}{18(1 + 12r - 12cr^2)[5 + 10r - 4(2 + c)r^2 + 8cr^3]}.$$
Eight-way RIL: The three-point haplotype probabilities for the autosome in eight-way RIL by sibling mating may be immediately derived from the results on four-way RIL, using the equations in Table 7 of Broman (2005). We neglect to write these out, but do derive the quantity analogous to the three-point coincidence, for the fixed eight-way RIL chromosome:
$$C = \frac{(1 + 6r)\,[280 + 1208r - 848r^2 + 5c(7 - 28r - 368r^2 + 344r^3) - 2c^2(49 - 324r + 452r^2)r^2 - 16c^3(1 - 2r)r^4]}{49(1 + 12r - 12cr^2)[5 + 10r - 4(2 + c)r^2 + 8cr^3]}.$$
In the work of Broman (2005), nearly 3 years of total computer time were used to derive the above quantity, although the results were strictly numerical and were for the case of no interference ($c = 1$) and for a model of strong positive crossover interference. Here, we have shown a simpler method to derive the result, which allowed us to obtain explicit formulas for the three-point probabilities. The formulas in this section match the numerical results of Broman (2005) to within round-off error.

DISCUSSION
We have improved on the work of Haldane and Waddington (1931) and Broman (2005), describing a simpler approach for the calculation of two- and three-point haplotype probabilities in multiple-strain RIL. Our simpler solution (which is an instance of the standard trick for calculations with Markov chains: condition on the first step) allowed us to derive exact formulas for the three-point haplotype probabilities in four- and eight-way RIL by sibling mating. Moreover, we have extended the results on two-point haplotype probabilities for the case of additional generations of intermating in the $2^n$-way RIL1 and IRIP designs.
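The coincidence formulas given above for four- and eight-way RIL by sibling mating admit a simple sanity check: when the loci are unlinked and there is no interference ($r = 1/2$, $c = 1$), the three loci are fixed independently, so the coincidence on the RIL chromosome should equal 1. The sketch below evaluates both expressions with exact fractions. The function names are hypothetical, and the leading factor $(1 + 6r)$ in the four-way numerator is an assumption, inferred by analogy with the eight-way expression; the unlinked-loci check is consistent with it.

```python
from fractions import Fraction


def coincidence_four_way(r, c):
    """Three-point coincidence C on the fixed chromosome, four-way RIL by
    sibling mating (leading factor 1 + 6r is an assumed reconstruction)."""
    r, c = Fraction(r), Fraction(c)
    num = (1 + 6 * r) * (
        110 + 404 * r - 288 * r ** 2
        + 3 * c * (5 - 20 * r - 204 * r ** 2 + 192 * r ** 3)
        - 16 * c ** 2 * (2 - 13 * r + 18 * r ** 2) * r ** 2
    )
    den = (18 * (1 + 12 * r - 12 * c * r ** 2)
           * (5 + 10 * r - 4 * (2 + c) * r ** 2 + 8 * c * r ** 3))
    return num / den


def coincidence_eight_way(r, c):
    """Three-point coincidence C on the fixed chromosome, eight-way RIL by
    sibling mating."""
    r, c = Fraction(r), Fraction(c)
    num = (1 + 6 * r) * (
        280 + 1208 * r - 848 * r ** 2
        + 5 * c * (7 - 28 * r - 368 * r ** 2 + 344 * r ** 3)
        - 2 * c ** 2 * (49 - 324 * r + 452 * r ** 2) * r ** 2
        - 16 * c ** 3 * (1 - 2 * r) * r ** 4
    )
    den = (49 * (1 + 12 * r - 12 * c * r ** 2)
           * (5 + 10 * r - 4 * (2 + c) * r ** 2 + 8 * c * r ** 3))
    return num / den


if __name__ == "__main__":
    half = Fraction(1, 2)
    print(coincidence_four_way(half, 1), coincidence_eight_way(half, 1))
    # Behaviour for tightly linked loci without interference.
    print(float(coincidence_four_way(Fraction(1, 1000), 1)))
    print(float(coincidence_eight_way(Fraction(1, 1000), 1)))
```

Passing this check does not prove the expressions correct for all $r$ and $c$, but it does catch most transcription errors in the polynomial coefficients.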
It is important to emphasize that the results on IRIP are based on the assumption of an infinite population of intermating individuals. With the finite populations that would be used in practice, the progress to inbreeding would be more rapid and the realized map expansion would be somewhat less than our theoretical calculations indicate.
While our results on the two-point haplotype probabilities will play an important role in methods for reconstructing the RIL haplotypes on the basis of incompletely informative markers, such as single-nucleotide polymorphisms (SNPs), perhaps the greatest value of this work concerns the map expansion provided by the different designs. The precision of localization of a quantitative trait locus (QTL) depends critically upon the density of breakpoints in the mapping population, but the increased density of breakpoints in RIL1 and IRIP must be weighed against the additional generations of intermating (and of inbreeding) required. In this regard, it should be emphasized that there is an important trade-off between the power to identify novel QTL and the precision of localization of QTL. The LOD threshold for significance in a genomewide scan for QTL increases as the density of breakpoints increases. This is shown clearly in the results of Lander and Botstein (1989) on the LOD threshold for the dense-map case: the threshold increases with the effective genetic length of the genome. Thus, while the introduction of additional generations of interbreeding in the formation of RIL will lead to greater mapping precision, a larger RIL panel will be required to identify QTL with a given effect size.
Martin and Hospital (2006) recently pointed out that the maximum-likelihood estimate of the recombination fraction between two markers, on the basis of breakpoint frequencies in an RIL panel, is subject to some bias. They presented a method, using a Taylor expansion, for reducing the bias. They further described a method for testing for crossover interference with RIL data. Their methods could also be used with the multiple-strain RIL considered herein. While these results are quite interesting, we wish to point out that, in the use of RIL for QTL mapping, interest is in the breakpoint frequencies themselves and not in the underlying recombination fractions.
Moreover, an understanding of recombination at meiosis, particularly regarding crossover interference, might best be studied in a large backcross or intercross, rather than with RIL, as the process of inbreeding to develop RIL is subject to considerable selection, and so our understanding of recombination on the basis of breakpoint frequencies in RIL would likely be distorted.
Martin and Hospital (2006) viewed the term "map expansion" as misleading, as it really concerns an increased frequency of breakpoints and no real change in the genetic map. We, however, still prefer the phrase, and no useful alternatives have been proposed; it provides a useful shorthand for a more complex phenomenon. They further take issue with the treatment, in software, of RIL as a backcross through equations such as $R = 4r/(1 + 6r)$ and with an assumption of no crossover interference, as even if meiosis exhibits no interference, occurrences of breakpoints in adjacent intervals on an RIL chromosome are not independent. (Note that this lack of independence was identified by Haldane and Waddington in 1931.) To the contrary, however, as was stated in Broman (2005), the breakpoint process on an RIL chromosome, at least for the mouse, will be more closely approximated by a Poisson process than is the crossover process at meiosis, which in the mouse exhibits extremely strong positive crossover interference (see Broman et al. 2002). Thus the current approach for multipoint QTL mapping in RIL, embodied in software such as MapMaker/QTL (Lander et al. 1987), is entirely reasonable.