## Abstract

Recombinant inbred lines (RILs) can serve as powerful tools for genetic mapping. Recently, members of the Complex Trait Consortium proposed the development of a large panel of eight-way RILs in the mouse, derived from eight genetically diverse parental strains. Such a panel would be a valuable community resource. The use of such eight-way RILs will require a detailed understanding of the relationship between alleles at linked loci on an RI chromosome. We extend the work of Haldane and Waddington on two-way RILs and describe the map expansion, clustering of breakpoints, and other features of the genomes of multiple-strain RILs as a function of the level of crossover interference in meiosis.

RECOMBINANT inbred lines (RILs) can serve as powerful tools for genetic mapping. An RIL is formed by crossing two inbred strains followed by repeated selfing or sibling mating to create a new inbred line whose genome is a mosaic of the parental genomes (Figure 1). As each RIL is an inbred strain, and so can be propagated eternally, a panel of RILs has a number of advantages for genetic mapping: one need genotype each strain only once; one can phenotype multiple individuals from each strain to reduce individual, environmental, and measurement variability; multiple invasive phenotypes can be obtained on the same set of genomes; and, as the breakpoints in RILs are more dense than those that occur in any one meiosis, greater mapping resolution can be achieved.

Members of the Complex Trait Consortium recently proposed the development of a large panel of eight-way RILs in the mouse (Threadgill *et al.* 2002; Complex Trait Consortium 2004). An eight-way RIL is formed by intermating eight parental inbred strains, followed by repeated sibling mating to produce a new inbred line whose genome is a mosaic of the eight parental strains (Figure 2). Such a panel would serve as a valuable community resource for mapping the loci that contribute to complex phenotypes in the mouse.

The use of such a panel requires a detailed understanding of the relationship between alleles at linked loci on a recombinant inbred (RI) chromosome, particularly for the reconstruction of the parental origin of DNA (the haplotypes) on the basis of less-than-fully informative genetic markers [such as single-nucleotide polymorphisms (SNPs)]. The haplotype reconstruction will likely make use of the standard hidden Markov model (HMM) technology, and a primary ingredient to the HMM will be the two-point haplotype probabilities on an RIL chromosome, such as the probability that the RIL is fixed at allele *A* at one locus and allele *E* at a second locus, as a function of the recombination fraction (per meiosis) between the two loci. Also of interest are the three-point haplotype probabilities, which inform us regarding the clustering of breakpoints on the RIL chromosome and of the appropriateness of the Markov assumption used in the HMM.

Haldane and Waddington (1931)(which we abbreviate H&W) studied the case of two-way RILs by selfing and sibling mating and, in an impressive feat of algebra, derived the relationship between the recombination fraction between two loci and the probability that the loci are fixed at different parental alleles in the RIL. They further showed that such two-point results may be used to derive three-point probabilities. Related work includes that of Hospital *et al.* (1996), who developed an algorithm for calculating multilocus genotype probabilities in two-way RILs, and Winkler *et al.* (2003), who considered the case of two-way RILs subjected to extra generations of outbreeding.

In this article, we extend the work of H&W to the case of eight-way RILs. We derive the algebraic relationship between the recombination fraction at meiosis and the analogous quantity for the RIL chromosome, for the case of multiple-strain RILs by selfing and sibling mating (including the X chromosome in the case of sibling mating). In the case of multiple-strain RILs by selfing, we also obtain exact results for three-point probabilities. However, with multiple-strain RILs by sibling mating, such symbolic results for the three-point probabilities continue to elude us, and so we must be satisfied with numerical results.

A number of other features of the genomes of multiple-strain RILs are of considerable interest and yet are not amenable to such symbolic or numerical analysis (for example, the number of generations of inbreeding required to obtain complete homozygosity). We investigated such features via computer simulations.

## TWO POINTS

We first consider the case of two loci. Let *r* denote the recombination fraction between the two loci, and let *g _{m}* denote the allele at locus

*m*on a random RI chromosome at fixation. We seek the joint distribution of the

*g*. We are particularly interested in

_{m}*R*= Pr(

*g*

_{1}≠

*g*

_{2}), the quantity analogous to the recombination fraction, but on the fixed RI chromosome. Note that we assume no mutation and no selection. The alleles at each locus are denoted

*A*,

*B*,

*C*,

*D*, … ,

*H*. Two-locus haplotypes are written, for example,

*AA*,

*AB*,

*BA*,

*BB*, where the first allele corresponds to the first locus and the second allele to the second locus. Two-locus diplotypes (

*i.e.*, phase-known genotypes) are written, for example,

*AB*|

*AB*(an individual who is homozygous

*A*at the first locus and homozygous

*B*at the second locus). It might be better to write the diplotype as but such notation is unwieldy.

We assume that two-way RILs are obtained with an initial cross of the form (*A* × *B*) × (*A* × *B*), four-way RILs by the cross (*A* × *B*) × (*C* × *D*), and eight-way RILs by the cross [(*A* × *B*) × (*C* × *D*)] × [(*E* × *F*) × (*G* × *H*)], with, in all cases, females listed first.

### RILs by selfing:

#### Two-way RILs, selfing:

The results for two-way RILs by selfing were presented in H&W. By symmetry, it is clear that Pr(*g _{m}* =

*A*) = Pr(

*g*=

_{m}*B*) =

^{1}/

_{2}. They further showed that the two-point probabilities are Thus

*R*= Pr(

*g*

_{1}≠

*g*

_{2}) = 2

*r*/(1 + 2

*r*).

We describe a general approach to obtain this result, as the technique is used in what follows and is most clear in this, the simplest case. Let *X _{n}* denote the two-locus diplotype for the individual at generation

*n*. The {

*X*} form a Markov chain, as

_{n}*X*

_{n}_{+1}is conditionally independent of X

_{0},

*X*

_{1}, … ,

*X*

_{n}_{−1}, given

*X*. That is, the parental diplotype at a particular generation depends only on the diplotype at the preceding generation and not on the entire history. There are 10 possible diplotypes, as the order of the two haplotypes may be ignored. (If haplotype order were taken into account, there would be 2

_{n}^{4}= 16 states.) This number may be reduced further by accounting for further symmetries: The order of the two loci may be ignored, and the symbols

*A*and

*B*may be switched. Thus we form 5 distinct states, shown in Table 1. Let

*Y*denote the state at generation

_{n}*n*, among these 5 minimal states. The {

*Y*} also form a Markov chain.

_{n}Let *P _{ij}* = Pr(

*Y*

_{n}_{+1}=

*j*|

*Y*=

_{n}*i*), the transition matrix for the chain. Calculation of the

*P*deserves further explanation. Consider the state

_{ij}*AA*|

*BB*, corresponding to heterozygosity at each locus, with the

*A*alleles on the same haplotype. The possible meiotic products are

*AA*,

*AB*,

*BA*, and

*BB*, with probabilities (1 −

*r*)/2,

*r*/2,

*r*/2, and (1 −

*r*)/2, respectively. The probabilities for the states in the next generation may be obtained by calculating the Kronecker product of this vector with itself and then collapsing the 16 probabilities to give the probabilities of the five states in Table 1. Thus the probabilities of transition from state

*AA*|

*BB*to states

*AA*|

*AA*,

*AB*|

*AB*,

*AA*|

*AB*,

*AA*|

*BB*, and

*AB*|

*BA*are (1 −

*r*)

^{2}/2,

*r*

^{2}/2, 2

*r*(1 −

*r*), (1 −

*r*)

^{2}/2, and

*r*

^{2}/2, respectively.

The states *AA*|*AA* and *AB*|*AB* are absorbing; for these states, *P _{ii}* = 1. Our goal is to obtain the absorption probabilities starting at state

*AA*|

*BB*(for example, starting at the state

*AA*|

*BB*, the chance that the chain will eventually hit the state

*AA*|

*AA*). These absorption probabilities may be obtained as the solutions of sets of linear equations (Norris 1997, Sect. 1.3).

Let *h _{i}* denote the probability, starting at state

*i*, that the chain is absorbed into the state

*AA*|

*AA*. Clearly

*h*

_{AA}_{|}

*= 1 and*

_{AA}*h*

_{AB}_{|}

*= 0. For the other three states, we condition on the first step and obtain*

_{AB}*h*= ∑

_{i}*. Thus we obtain a set of three linear equations in three unknowns, which may be solved to obtain*

_{k}P_{ik}h_{k}*h*

_{AA}_{|}

*= 1/(1 + 2*

_{BB}*r*). Thus Pr(

*Y*→

_{n}*AA*|

*AA*|

*Y*

_{0}=

*AA*|

*BB*) = 1/(1 + 2

*r*). The states within an equivalence class are equally likely, and so Pr(

*X*→

_{n}*AA*|

*AA*|

*X*

_{0}=

*AA*|

*BB*) = 1/[2(1 + 2

*r*)].

#### Four-way RILs, selfing:

The results for two-way RILs by selfing may be extended immediately to obtain those for four-way RILs by selfing, by considering one preceding generation of recombination, as the two-chromosome generation (in which inbreeding begins) is a bottleneck: Alleles that do not appear in this generation cannot appear on the final RI chromosome.

For example, the chance that the final haplotype in a four-way RIL by selfing is *AA* is the probability that in the initial cross of *AA* × *BB*, the *AA* haplotype is transmitted, multiplied by the probability that a two-way RIL by selfing is fixed at *AA*. Thus, Pr(*AA*) = (1 − *r*)/2 × 1/[2(1 + 2*r*)]. Similarly, the chance that the final haplotype is *AB* is the chance that the initial cross of *AA* × *BB* delivers *AB*, multiplied by the chance that a two-way RIL by selfing is fixed at *AA*, and so Pr(*AB*) = *r*/2 × 1/[2(1 + 2*r*)]. Finally, the chance that the final haplotype is *AC* is the chance that the initial cross of *AA* × *BB* delivers *A* at the first locus, multiplied by the chance that the cross of *CC* × *DD* delivers *C* at the second locus, multiplied by the chance that a two-way RIL by selfing is fixed at *AB*, and so Pr(*AC*) = ^{1}/_{2} × ^{1}/_{2} × *r*/(1 + 2*r*).

For four-way RILs, the marginal probabilities are of course Pr(*g _{m}* =

*i*) =

^{1}/

_{4}for

*i*=

*A*,

*B*,

*C*,

*D*. The two-locus probabilities are Thus

*R*= Pr(

*g*

_{1}≠

*g*

_{2}) = 3

*r*/(1 + 2

*r*). When

*r*=

^{1}/

_{2},

*R*=

^{3}/

_{4}.

#### Eight-way RILs, selfing:

The results for eight-way RILs can be deduced from the results for four-way RILs, by the same technique that allowed us to obtain the results for four-way RILs by selfing from those for two-way RILs by selfing.

For eight-way RILs by selfing, the marginal probabilities are Pr(*g _{m}* =

*i*) =

^{1}/

_{8}for

*i*=

*A*,

*B*, … ,

*H*. The two-locus probabilities are shown in Table 2. It is especially interesting in Table 2 that the off-diagonal elements are not all the same. The results in Table 2 give

*R*=

*r*(4 −

*r*)/(1 + 2

*r*). When

*r*=

^{1}/

_{2},

*R*=

^{7}/

_{8}.

One can easily go backward, from eight-way RILs to four-way RILs, by taking *A* = *B*, *C* = *D*, *E* = *F*, *G* = *H*, and collapsing the joint probabilities. Similarly, taking *A* = *B* = *C* = *D* and *E* = *F* = *G* = *H*, one can collapse to obtain the results for two-way RILs.

### X chromosome for RILs by sibling mating:

#### Two-way RILs, X chromosome:

H&W derived the connection between *r* and *R* for the X chromosome for two-way RILs by sibling mating. The full two-point distribution may be obtained from their result, using the marginal distribution Pr(*g _{m}* =

*A*) =

^{2}/

_{3}, Pr(

*g*=

_{m}*B*) =

^{1}/

_{3}, and that Pr(

*AB*) = Pr(

*BA*). The two-locus probabilities are shown in Table 3. These results give

*R*= (8/3)

*r*/(1 + 4

*r*). When

*r*=

^{1}/

_{2},

*R*=

^{4}/

_{9}.

#### Four-way RILs, X chromosome:

The case of four-way RILs by sibling mating cannot be deduced from the above, but we were able to calculate the results symbolically using a combination of R (Ihaka and Gentleman 1996) and Mathematica (Wolfram Research 2003), by the approach described above, in *Two-way RILs by selfing*.

Let *X _{n}* denote the parental type at the

*n*th generation, with

*X*

_{0}=

*AA*|

*BB*×

*CC*. There are 405 such parental types, but they may be reduced to 116 distinct states by taking account of two symmetries: The order of the two loci may be reversed, and the

*A*and

*B*alleles may be exchanged. There are four absorbing states:

*AA*|

*AA*×

*AA*,

*AB*|

*AB*×

*AB*,

*AC*|

*AC*×

*AC*, and

*CC*|

*CC*×

*CC*. The determination of the absorption probabilities again requires the solution of systems of linear equations, in this case 112 equations in 112 unknowns, since there are a total of 116 states, of which 4 are absorbing.

The marginal probabilities are Pr(*g _{m}* =

*i*) =

^{1}/

_{3}for

*i*=

*A*,

*B*,

*C*. The two-locus probabilities are the following: Thus

*R*= 4

*r*/(1 + 4

*r*). When

*r*=

^{1}/

_{2},

*R*=

^{2}/

_{3}.

#### Eight-way RILs, X chromosome:

The case of eight-way RILs can be deduced from the case of four-way RILs (due to the bottleneck at the four-chromosome stage), by the technique described above, for the case of RILs by selfing. For example, the chance that an eight-way RIL is fixed at *AB* on the X chromosome is equal to the chance that, in the *AA* × *BB* cross, the *AB* haplotype is transmitted, times the chance that a four-way RIL is fixed at *AA*, giving *r*/2 × 1/[3(1 + 4*r*)].

The marginal probabilities are Pr(*g _{m}* =

*A*) = Pr(

*g*=

_{m}*B*) = Pr(

*g*=

_{m}*E*) = Pr(

*g*=

_{m}*F*) =

^{1}/

_{6}and Pr(

*g*=

_{m}*C*) =

^{1}/

_{3}. The joint two-locus probabilities are shown in Table 4. It follows that

*R*= (14/3)

*r*/(1 + 4

*r*). When

*r*=

^{1}/

_{2},

*R*=

^{7}/

_{9}.

### Autosomes for RILs by sibling mating:

#### Two-way RILs, autosomes by sib mating:

H&W provided the results for the autosome in two-way RILs by sibling mating. The marginal distribution is Pr(*g _{m}* =

*A*) = Pr(

*g*=

_{m}*B*) =

^{1}/

_{2}. The two-locus joint probabilities are Thus

*R*= 4

*r*/(1 + 6

*r*). When

*r*=

^{1}/

_{2},

*R*=

^{1}/

_{2}.

#### Four-way RILs, autosomes by sib mating:

The case of four-way RILs cannot be deduced from the above. Let *X _{n}* denote the parental type at generation

*n*, with

*X*

_{0}=

*AA*|

*BB*×

*CC*|

*DD*. There are 9316 such states, which reduce to 700 distinct states after we take account of several symmetries: reversing the order of the two loci, exchanging the

*A*and

*B*alleles, exchanging the

*C*and

*D*alleles, exchanging

*A*for

*C*and

*B*for

*D*, and any combination of these. There are three distinct absorbing states,

*AA*|

*AA*×

*AA*|

*AA*,

*AB*|

*AB*×

*AB*|

*AB*, and

*AC*|

*AC*×

*AC*|

*AC*. The determination of the absorption probabilities requires the solution of a system of 697 linear equations in 697 unknowns.

This system of equations proved too large to solve symbolically; however, the numerical solution for any particular value of the recombination fraction *r* was relatively simple to obtain, and we could infer the algebraic forms of the equations. Insertion of the inferred solution back into the system of equations demonstrated that our algebraic results are correct.

The marginal distribution is Pr(*g _{m}* =

*i*) =

^{1}/

_{4}for

*i*=

*A*,

*B*,

*C*,

*D*. The two-locus joint probabilities are Thus

*R*= 6

*r*/(1 + 6

*r*). When

*r*=

^{1}/

_{2},

*R*=

^{3}/

_{4}.

#### Eight-way RILs, autosomes by sib mating:

The case of eight-way RILs can be deduced from the results for four-way RILs. The marginal distribution is Pr(*g _{m}* =

*i*) =

^{1}/

_{8}for

*i*=

*A*,

*B*, … ,

*H*. The two-locus joint probabilities are Thus

*R*= 7

*r*/(1 + 6

*r*). (This was the key target of all of our efforts.) When

*r*=

^{1}/

_{2},

*R*=

^{7}/

_{8}. Note that here, all off-diagonal elements are the same.

### Higher-order RILs:

The two-point probabilities for 2* ^{n}*-way RILs for any

*n*may be derived from the results presented above, although here we sought to obtain only the relationship between

*r*and

*R*. These results are assembled in Table 5. The case of the X chromosome proved cumbersome and a closed-form solution elusive. Omitting some rather tedious algebra, we have, for the X chromosome of a 2

*-way RIL by sibling mating with*

^{n}*n*≥ 2, where the coefficients

*x*may be defined inductively, as follows. Begin with

_{nk}*x*

_{21}= 3,

*x*

_{31}= 1, and

*x*

_{32}= 4 and with

*x*

_{2}

*= 0 for*

_{k}*k*≥ 2 and

*x*

_{3}

*= 0 for*

_{k}*k*≥ 3. Then, for

*n*≥ 4, let

*x*

_{n}_{1}= 0 and let

*x*=

_{nk}*x*

_{n}_{−1,}

_{k}_{−1}+

*x*

_{n}_{−2,}

_{k}_{−1}for

*k*≥ 2. The complexity of these formulas is due to the fact that the X chromosome recombines in females but not males, and so different alleles have different numbers of opportunities for recombination before they arrive at the four-chromosome bottleneck, at which inbreeding begins.

Of special interest is the map expansion in RILs. For two-way RILs by selfing, note that *R* ≈ 2*r* for small *r*, indicating a 2× map expansion. In general, one may define the map expansion as *dR/dr*|_{r}_{=0}. The map expansion for 2* ^{n}*-way RILs by selfing is then (

*n*+ 1) for

*n*≥ 1. In 2

*-way RILs by sibling mating (for*

^{n}*n*≥ 2), the map expansion is (

*n*+ 4) for the autosomes and (

^{2}/

_{3}

*n*+ 4) for the X chromosome.

## THREE POINTS

We consider the case of three loci. We assume the recombination fractions in the two intervals are the same, *r*_{12} = *r*_{23} = *r*. Results for the case of separate recombination fractions could also be obtained, but the expressions can be much more complex, and they provide essentially no further insight.

Let *c* denote the three-point coincidence at meiosis, *c* = Pr(double recombinant)/*r*^{2}, which may also be written as Pr(recn in 2–3|recn in 1–2)/Pr(recn in 2–3). Note that *c* is generally a function of *r*, with *c* = 0 for small *r* (indicating strong positive crossover interference) and *c* = 1 for *r* = ^{1}/_{2}. We define *r*_{13} to be the recombination fraction between the first and third loci, so that *c* = (2*r* − *r*_{13})/(2*r*^{2}) and so *r*_{13} = 2*r*(1 − *cr*).

In the case of no crossover interference, we have, of course, *c* = 1 for all *r*. We are particularly interested in the case of positive crossover interference. Broman *et al.* (2002) studied crossover interference in the mouse and showed that the gamma model (McPeek and Speed 1995) provided a good fit to available data. The gamma model involves a single parameter, ν, which indicates the strength of crossover interference; ν = 1 corresponds to no interference, and ν > 1 corresponds to positive crossover interference. Broman *et al.* (2002) obtained the estimate ν̂ = 11.3 for the mouse, indicating especially strong crossover interference.

Zhao and Speed (1996) derived the map function for stationary renewal models of the recombination process at meiosis. Their results may be used to calculate the three-point coincidence for the gamma model, as a function of *r* and the interference parameter, ν. The map function for the gamma model is where *f*(*t*; ν) = *e*^{−2ν}* ^{x}*(2ν)

^{ν}

*x*

^{ν−1}/Γ(ν), the density of the gamma distribution with shape parameter ν and rate parameter 2ν.

We thus have, for the gamma model, , and we can obtain the three-point coincidence by *c* = (2*r* − *r*_{13})/(2*r*^{2}). While *M*_{ν}(*d*) cannot be obtained in closed form, it can be calculated by numerical integration. Further, *M*^{−1}_{ν} cannot be calculated directly, but can be obtained by solving *r* = *M*_{ν}(*d*) for *d* by Newton's method. This was done in R (Ihaka and Gentleman 1996).

The three-point coincidence for the gamma model with ν = 11.3 is displayed as the dashed curve in Figure 3A. For *r* < 0.1, the coincidence is essentially 0, indicating that if the first pair of loci recombine, the second pair will not. As *r* approaches ^{1}/_{2}, the coincidence approaches 1.

### RILs by selfing:

#### Two-way RILs, selfing:

H&W showed that the two-point probabilities for two-way RILs by selfing are sufficient to determine the three-point probabilities. Note that the equation *R* = 2*r*/(1 + 2*r*) applies to each interval between loci, and so, because *r*_{13} = 2*r*(1 − *cr*), we have *R*_{13} = 2*r*_{13}/(1 + 2*r*_{13}) = 4*r*(1 − *cr*)/[1 + 4*r*(1 − *cr*)]. Thus, for example, to calculate the probability of the haplotype *AAA* on the RIL, we note that Pr(*AAA*) + Pr(*AAB*) = Pr(*AA*–), Pr(*ABB*) + Pr(*ABA*) = Pr(*AB*–), and Pr(*AAA*) + Pr(*ABA*) = Pr(*A*–*A*), and thus, as Pr(*ABB*) = Pr(*AAB*), we have

Plugging in *R* = 2*r*/(1 + 2*r*), and using a similar approach for the other two cases, we obtain the distribution for the three-locus haplotype on the RI chromosome, in two-way RILs by selfing:

We are especially interested in the quantity analogous to the coincidence for the RI chromosome, *C* = [Pr(*ABA*) + Pr(*BAB*)]/*R*^{2}, which gives the following: Note that in the case *r* = 0, we have *C* = (2 + *c*)/2; with no interference (*c* = 1), *C* = ^{3}/_{2}, and with strong positive interference (*c* = 0), *C* = 1. In the case that *r* = ^{1}/_{2} and *c* = 1, we have, of course, *C* = 1.

The coincidence is plotted as the black curves in Figure 3B, with the solid and dashed curves corresponding to no interference and strong positive interference (the gamma model with ν = 11.3), respectively. Note that in the case of no interference, the coincidence is entirely >1, indicating clustering of breakpoints: if the first two loci have recombined on the RIL chromosome, the second two loci are *more likely* to have recombined. In the case of strong positive interference, the coincidence is ≤1 for all *r*.

#### Four-way RILs, selfing:

In the case of four-way RILs by selfing, the joint three-locus genotype probabilities may be derived from the above results, by the technique used for the case of two loci. There are 64 possible three-locus genotypes, which collapse into seven distinct cases. These cases and the corresponding probabilities are shown in Table 6.

The coincidence-type quantity for the RI chromosome is then *C* = (4*a*_{3} + 8*a*_{5} + 16*a*_{6} + 8*a*_{7})/*R*^{2}, which gives the following: Thus, for *r* = 0, *C* = (8 + 3*c*)/9, and in the case of no interference, *C* = ^{11}/_{9}; with strong positive interference, *C* = ^{8}/_{9}. With *r* = ^{1}/_{2} and *c* = 1, *C* = 1. The coincidence is displayed as the blue curves in Figure 3B and is generally smaller than that for the case of two-way RILs by selfing.

#### Eight-way RILs, selfing:

The case of eight-way RILs by selfing can be derived directly from the case of four-way RILs by selfing. There are 8^{3} = 512 three-locus haplotypes, but they collapse to 13 distinct classes. The joint three-locus haplotype probabilities are presented in Table 7. Here, we present only one genotype pattern for each of the 13 classes, but also list the number of genotypes that fall into each class.

We thus have *C* = [1 − 8(*b*_{1} + 2*b*_{2} + 4*b*_{4} + 8*b*_{6})]/*R*^{2}, which gives the following:

It is difficult to write this in terms of *R*, as , and so we neglect to do so. Note that when *r* = 0, *C* = (7 + 2*c*)/8, which takes value ^{9}/_{8} under no interference and ^{7}/_{8} under strong positive interference.

The coincidence is displayed as the red curves in Figure 3B and is generally smaller than that for the case of four-way RILs by selfing.

### X chromosome for RILs by sibling mating:

#### Two-way RILs, X chromosome:

In the case of the X chromosome for two-way RILs by sibling mating, the two-point probabilities are not sufficient to determine the full three-locus probabilities, due to the difference in the frequency of the *A* and *B* alleles on the X chromosome, which prevents us from making use of symmetries, such as the relation Pr(*ABB*) = Pr(*AAB*), which held in the case of two-way RILs by selfing.

However, the two-point probabilities are sufficient to determine the coincidence-type quantity, which has the form *C* = [Pr(*ABA*) + Pr(*BAB*)]/*R*^{2}. By the approach used for three points in two-way RILs by selfing, we obtain the following: With *r* = 0, *C* = 3(4 + *c*)/8, so that with no interference, *C* = ^{15}/_{8}, while with strong positive interference, *C* = ^{3}/_{2}. When *r* = ^{1}/_{2} and *c* = 1, *C* = ^{9}/_{8}. The coincidence is displayed as the black curves in Figure 3C and is greater than that for two-way RILs by selfing. For both no interference and strong positive interference, the coincidence is entirely >1.

The full distribution of the three-locus haplotypes may be obtained by the approach we used to calculate the two-locus probabilities. There are 288 parental types, which reduce to 168 distinct states after accounting for symmetries, of which 6 states are absorbing. Thus, the absorption probabilities may be obtained by solving a set of 162 linear equations. Alternatively, one may collapse the results for the more complex case of four-way RILs, derived below, to obtain the results for two-way RILs.

#### Four-way RILs, X chromosome:

In the case of four-way RILs, the two-point probabilities are not sufficient to determine the three-point probabilities, or even the three-point coincidence. Thus we must return to the technique used to calculate the two-point probabilities, calculating the absorption probabilities of a Markov chain, here defined by the parental types at three loci at each generation of inbreeding.

There are 10,206 parental types, of the form *AAA*|*BBB* × *CCC* (the three-locus, X chromosome diplotype of the female parent and the three-locus, X chromosome haplotype of the male parent). These reduce to 2690 distinct states after accounting for symmetries (exchange alleles *A* and *B* and invert the order of the three loci), of which 10 states are absorbing. The transition matrix contains 65,612 nonzero elements (that is, ∼1% of the matrix).

The absorption probabilities could, conceivably, be obtained by solving a system of 2680 linear equations, but the scale and complexity of this system was too unwieldy in practice. We thus took a different approach. We also abandoned the effort to obtain symbolic solutions, seeking instead numerical solutions. (The absorption probabilities are ratios of polynomials, but we hypothesize that, in this case, the expressions are extremely complex.)

Let **π**^{(}^{n}^{)} denote the distribution (as a row vector) of the Markov chain at generation *n*, with **π**^{(0)} denoting the starting distribution, for which the state *AAA*|*BBB* × *CCC* has probability 1 and all other states have probability 0. Let *P* denote the transition matrix for the chain. Then **π**^{(}^{n}^{)} = **π**^{(0)}*P ^{n}*. We seek lim

_{n→∞}

**π**

^{}, which we calculated numerically. For each value of the recombination fraction,

*r*, and the three-point coincidence at meiosis,

*c*, we iterated across generations until the maximum difference between

**π**

^{(}

^{n}^{)}and

**π**

^{(}

^{n}^{+1)}was small (<10

^{−14}). Approximately 150 generations were required.

The most difficult part of the calculation was the construction of the transition matrix, and the most difficult part of that construction was the reduction of the full set of 10,206 parental types to the minimal set of 2690 states, to account for symmetries. This was done by first creating a look-up table. (Because the central task concerned this collapse of states by symmetry, we performed these calculations via a pair of short Perl programs. Such text manipulation is most conveniently accomplished in Perl.) Rather than construct the entire transition matrix in advance, each row of the transition matrix was constructed anew at each generation, and only those rows that were needed were so constructed (rows *i* for which π^{}_{i} > 10^{−16}). This approach, which saves memory but requires considerably more computation, was used in anticipation of the case of autosomes for four-way RILs by sibling mating, in which the transition matrix contains 73 million nonzero elements.

The 10 absorbing states for the X chromosome in four-way RILs by sibling mating are presented in Table 8A. The haplotype probabilities, denoted *c _{i}*, must be calculated numerically. The symbols are presented in Table 8A, as they are used to calculate the haplotype probabilities for eight-way RILs (below and Table 8B). It turns out that

*c*

_{1}= Pr(

*AAA*) ≡ Pr(

*CCC*) =

*c*

_{10}, which could not have been anticipated in advance. (Note that the

*c*here are not at all related to the coincidence,

_{i}*c*.)

The coincidence-type quantity is *C* = [2*c*_{3} + 2*c*_{6} + 2*c*_{7} + 4*c*_{8} + 2*c*_{9}]/*R*^{2}. This is displayed as the blue curves in Figure 3C.

#### Eight-way RILs, X chromosome:

The three-point probabilities for the X chromosome in eight-way RILs by sibling mating may be obtained from those for four-way RILs, by the same approach used in the case of selfing. There are 29 distinct absorbing states (see Table 8B), although three pairs may be collapsed due to the fact that, for the X chromosome four-way RILs by sibling mating, Pr(*AAA*) = Pr(*CCC*).

The coincidence-type quantity is then *C* = [1 − (4*d*_{1} + 8*d*_{2} + *d*_{4} + 4*d*_{5} + 4*d*_{6} + 8*d*_{11} + 8*d*_{12} + 4*d*_{19} + 4*d*_{20})]/*R*^{2}. This is displayed as the red curves in Figure 3C, with the solid curve corresponding to no interference and the dashed curve corresponding to the case of strong positive crossover interference.

### Autosomes for RILs by sibling mating:

#### Two-way RILs, autosomes by sib mating:

The results for autosomes in two-way RILs by sibling mating can be derived immediately from the results of H&W, by the technique used for two-way RILs by selfing. The three-point distribution is the following:

Thus the coincidence for the autosome in two-way RILs by sibling mating is the following: With *r* = 0, *C* = (6 + *c*)/4, so that with no interference, *C* = ^{7}/_{4}, while with strong positive interference, *C* = ^{3}/_{2}. With *r* = ^{1}/_{2} and *c* = 1, *C* = 1. This is displayed as the black curves in Figure 3D, with the solid curve corresponding to no interference and the dashed curve corresponding to the case of strong positive crossover interference. Note that the coincidence is yet smaller than that for the X chromosome in two-way RILs by sibling mating.

#### Four-way RILs, autosome by sib mating:

The three-point probabilities for autosomes in four-way RILs by sibling mating may be calculated by the approach used for the X chromosome, although the scale of the problem is greatly increased. There are 2,164,240 parental types (of the form *AAA*|*BBB* × *CCC*|*DDD*), which reduce to 137,488 distinct states after accounting for symmetries; seven states are absorbing. The transition matrix contains 73,022,406 nonzero elements (that is, ∼0.4% of the matrix).

Calculation of the three-point probabilities for a single (*r*, *c*) pair took ∼1 min for the X chromosome, but required ∼1.5 days for the autosome. Thus, the three-point coincidence curves (one for no interference, one for strong positive interference), displayed in blue in Figure 3D and containing 250 points each, required ∼750 days of computation time. (Spread across 12 computers, that is just 2 months.)

#### Eight-way RILs, autosome by sib mating:

The three-point probabilities for autosomes in eight-way RILs by sibling mating may be obtained from those for four-way RILs, by the same approach used in the case of selfing. The equations in Table 7 apply, with the *a _{i}* now representing the probabilities for the autosome in four-way RILs by sibling mating.

The three-point coincidence for the autosomes in eight-way RILs by sibling mating is displayed as the red curves in Figure 3D, with the solid curve corresponding to no interference and the dashed curve corresponding to the case of strong positive crossover interference. Note that, again, the three-point coincidence is entirely >1 in the case of no interference and is very near 1 in the case of strong positive crossover interference.

The autosomes in eight-way RILs by sibling mating are of particular interest to us, and so it is valuable to study the probabilities more thoroughly. The three-point coincidence is most informative for the two-way RILs; here they give information largely on the clustering of breakpoints, rather than on the dependence between alleles at adjacent loci.

First, we consider the symmetry of the alleles. In the two-point probabilities, complete symmetry was observed: The chance of switching from *A* to *x* across an interval on an eight-way RIL autosome was identical for all *x* ≠ *A*. An inspection of the three-point probabilities, however, indicates that such symmetry does not continue to hold.

For example, consider the case of haplotypes of the form *AxA* for *x* ≠ *A*. In Figure 4, we plot the conditional probabilities Pr(*AxA*|*A*–*A*) for *x* = *B*, *C*, *E*. (The alleles *C* and *D* are equivalent in this context, as are the alleles *E*, *F*, *G*, and *H*.) A stretch of *A* alleles is much more likely to contain a small segment of *E* than of *B*, especially in the case of strong positive interference.

Most interesting is an assessment of deviation from the Markov property for the alleles at three points on an eight-way RIL autosome. If the process along an RIL chromosome were Markov, then the allele at the first of three loci would provide no further information about the allele at the third locus, given knowledge of the allele at the central locus. In other words, if the Markov property held, we would have Pr(*xyA*|*xy*–) = Pr(–*yA*|–*y*–) for all *x* and *y*. Thus we consider log_{2} {Pr(*xyA*|*xy*–)/Pr(–*yA*|–*y*–)}, which would be strictly 0 if the Markov property held. These are displayed in Figure 5 for all distinct cases (*x*, *y*). (We make careful use of the symmetries of the problem to reduce the possible cases to 19.) The solid and dashed curves correspond to no interference and strong positive crossover interference, respectively.

In the case of no crossover interference (the solid curves), the probabilities are largely Markov-like, though with important exceptions: If the first two loci are *AC* or *AE*, the third locus is more likely to be *A* than one would expect in the absence of information about the allele at the first locus (see the black curves in Figure 5, C and D), while if the first two loci are *BC* or *BE*, the third locus is less likely to be *A* (see the blue curves in Figure 5, C and D). In the case of strong positive crossover interference (the dashed curves), we also see that *AB* is considerably less likely to be followed by another *A* (see the black dashed curve in Figure 5B). These observations are closely connected to the lack of symmetry in the three-point probabilities seen in Figure 4: Small segments of *C* or *E* will be inserted within longer stretches of *A*.

As it turns out, the cases *CCA* and *DCA* give identical probability ratios, although this could not be anticipated in advance. (These are the red and orange curves in Figure 5C, although only the red curves may be seen, as they overlap.) The same is true for the cases *EEA* and *FEA* (the green and pink curves in Figure 5D, of which only the green curves may be seen).

## THE WHOLE GENOME

A number of interesting questions about multiple-strain RILs cannot easily be answered by analytic means and so require large-scale computer simulations. For example, we are interested in the number of generations of breeding that will be required to achieve genome-wide fixation and the density of genotypes that will be required to identify all breakpoints. We are most interested in the eight-way RILs formed by sibling mating, but we also considered two-way RILs by selfing and sibling mating, to serve as benchmarks.

The simulated genome was modeled after the mouse, with the genetic lengths of the chromosomes taken from the Mouse Genome Database (see Table 9). The total genetic length was 1665 cM. We considered solely the case of strong positive crossover interference, as has been observed in the mouse (Broman *et al.* 2002). To simulate meiosis, we used the χ^{2}-model (Zhao *et al.* 1995), which is a special case of the gamma model (considered in the previous two sections), for which the interference parameter ν = *m* + 1, for a nonnegative integer *m*. (The χ^{2}-model is more convenient for computer simulation.) We used *m* = 10, corresponding to ν = 11, close to the estimate obtained by Broman *et al.* (2002). We used 10,000 simulation replicates and bred until complete fixation of the entire genome.

A variety of summaries of the results of the whole-genome simulations are displayed in Figure 6. The distribution of the number of generations of breeding required to achieve fixation at 99% of the genome is displayed in Figure 6A. It is important to note that this includes the initial mixing generations of breeding in addition to the many generations of inbreeding. (One additional mixing generation is required for eight-way RILs than for two-way RILs; see Figures 1 and 2.) Two-way RILs by selfing required an average of 8 generations, while two- and eight-way RILs by sibling mating required 23.5 and 26.7 generations, on average, respectively, to achieve 99% fixation. Thus, eight-way RILs require an additional 3 generations of breeding (including the additional generation of mixing). There is considerable variation in the number of generations required to achieve this level of fixation.

The distribution of the number of generations to achieve *complete* fixation is displayed in Figure 6B. For two-way RILs by selfing, 10.5 generations are required, on average. For two- and eight-way RILs by sibling mating, complete fixation requires an average of 35.6 and 38.9 generations, respectively. Eight-way RILs again require only 3 or so additional generations of breeding to achieve the same level of fixation as two-way RILs.

The distribution of the total number of segments (with segments defined as the regions between breakpoints on the RIL chromosomes) is displayed in Figure 6C. Two-way RILs by selfing have an average of 53 segments; two- and eight-way RILs by sibling mating have an average of 85 and 134 segments, respectively.

The marginal distribution of the lengths of the segments is displayed in Figure 6D. It should be no surprise that eight-way RILs have shorter segments. The spikes in the right tails in Figure 6D are whole chromosomes that were inherited intact. The higher spike at 80 cM corresponds to chromosomes 11 and 13, which had identical lengths (see Table 9). Another unusually high spike occurs at 96.5 cM and corresponds to the X chromosome (which, as was seen above, behaves differently than autosomes). Two-way RILs have a good chance of inheriting an intact chromosome, while for eight-way RILs this is relatively rare. The median segment length for two-way RILs by selfing is 25.6 cM; the median segment lengths for two- and eight-way RILs by sibling mating are 12.9 and 8.5 cM, respectively. The chance that a two-way RIL by selfing will have at least one intact chromosome is ∼99%; the average number of intact chromosomes is 3.8. For two-way RILs by sibling mating, the chance of at least 1 intact chromosome is 79%, and the average number of intact chromosomes is 1.5. For eight-way RILs by sibling mating, the chance of at least 1 intact chromosome is 12%, and the chance of 2 intact chromosomes is just 0.5%.

The high frequency of small segments seen in Figure 6D raises the question: How small is the smallest segment? The distribution of the smallest segment in the genome of an RIL is displayed in Figure 6E. For eight-way RILs by sibling mating, the smallest segment is generally quite small, which suggests that an extremely high density of genetic markers will be required to identify all segments in a panel of eight-way RILs. The 95th percentile of the length of the smallest segment in two-way RILs by selfing was 2.2 cM, while for two- and eight-way RILs by sibling mating, it was 0.58 and 0.27 cM, respectively. That is, 95% of the time, an eight-way RIL will have at least one segment that is <0.27 cM long.

Finally, the distribution of the number of small segments (that is, segments <1 cM in length) is displayed in Figure 6F. The average number of such small segments is just 1.4 for two-way RILs by selfing, but is 5.2 and 11.2 for two- and eight-way RILs by sibling mating, respectively.

## DISCUSSION

Our aim in this work was to characterize the genomes of multiple-strain recombinant inbred lines. We were particularly interested in the case of eight-way RILs by sibling mating, as the development of a large panel of such RILs has been proposed for the mouse (Threadgill *et al.* 2002; Complex Trait Consortium 2004). We have extended the work of Haldane and Waddington (1931) on two-way RILs to the case of 2* ^{n}*-way RILs.

Our key result is the equation *R* = 7*r*/(1 + 6*r*), connecting the recombination fraction at meiosis to the analogous quantity for the autosome of an eight-way RIL derived by sibling mating, which indicates a 7× map expansion in the RIL. (Compare this to the 2× map expansion in two-way RILs by selfing and the 4× map expansion in two-way RILs by sibling mating). Perhaps more important is the two-point transition matrix, which is critical for the analysis of eight-way RILs. An essential component of the use of RILs for genetic mapping is the reconstruction of the parental origin of DNA on the RIL chromosomes (the haplotypes), on the basis of less-than-fully informative genotype data. Such RILs, if developed, will likely be genotyped at a dense set of diallelic markers, such as SNPs. Application of the standard HMM technology for the haplotype reconstruction will make critical use of this transition matrix.

In advance, we had hypothesized that the two-point transition matrix would have a complex structure, with the *A* and *B* alleles found together more often than the *A* and *H* alleles. And so we were surprised to see that all off-diagonal elements in the transition matrix are the same. We discovered this initially by computer simulation. Indeed, the fundamental equation, *R* = 7*r*/(1 + 6*r*), was initially identified by simulation: We hypothesized the form *R* = *ar*/(1 + *br*) and used nonlinear regression to identify *a* and *b*. Nonlinear regression was rather dissatisfying, especially in comparison to the work of H&W, and so we pursued symbolic and numeric approaches and eventually the intense computation of three-point probabilities. The key breakthrough was the observation that an understanding of four-way RILs is sufficient for an understanding of eight-way RILs.

The three-point coincidence function is especially interesting. That it generally exceeds 1 indicates a clustering of breakpoints on RIL chromosomes. Such clustering is often seen in illustrations of RIL chromosomes (*e.g.*, see Silver 1995, Figure 9.4), but the cause of such clustering was not immediately obvious. Our explanation is the following: Closely spaced breakpoints had their origin in different generations, but breakpoints in later generations can occur only in regions that have not yet been fixed. Thus, regions that are not fixed early have a tendency to become saturated with breakpoints late.

The three-point coincidence function does not tell the full story regarding multiple-strain RIL chromosomes. That the coincidence is near 1 indicates that the locations of breakpoints follow something like a Poisson process. However, the three-point probabilities clearly indicate that the alleles along an RIL chromosome do not follow a Markov chain. Nevertheless, an RIL chromosome is more like a Markov chain than is the product of a single meiosis, and so one may be confident in the successful use of the HMM technology for haplotype reconstruction in eight-way RILs.

The symmetry in the two-point transition matrix for eight-way RILs implies that the genealogy of an RIL (the order of the eight parental lines in the initial crosses) will not generally need to be considered in the effort to reconstruct haplotypes from genotype data, even though the three-point probabilities indicate clear (and interesting) lack of symmetry. Nevertheless, in the analysis of the X chromosome, such genealogy information will be important, as the transition matrix for the X chromosome has considerable structure.

It should be emphasized that H&W considered sex-specific recombination fractions in their analysis of two-way RILs and showed that the two-point probabilities for the RIL depended only on the sex-averaged recombination fraction. We neglected such niceties in our analysis of higher-order RILs. We did examine the effect of sex differences in recombination on the results of our simulation study (data not shown). As sex-specific genetic maps are not available for the mouse, we simply assumed that the recombination rate in females was 50% greater than that in males, uniformly across the genome. The results (with 10,000 simulation replicates) were identical (to within sampling error) to the results in the case of no sex difference in recombination.

That our results assume no mutation and no selection is even more worthy of emphasis. Selection against particular alleles or combinations of alleles during the process of inbreeding will clearly have important influences on the structure and content of the RILs that survive the process. However, the study of the effects of selection by the analytic means we have pursued here would be extremely difficult, and our results ignoring selection will no doubt be of considerable value, just as the work of H&W, who also ignored selection, has been of great value. Exploration of the effects of selection on the products of inbreeding, likely by computer simulation, may be of practical importance.

While we considered only the case of multiple-strain RILs obtained via a “funnel,” in which no more intercrossing is used than is required to bring all of the alleles together, one might consider the addition of extra generations of outbreeding prior to the start of inbreeding. Winkler *et al.* (2003) tackled this problem for two-way RILs; their approach, combined with our results, could be used to derive haplotype probabilities for multi-way RILs with extra outbreeding.

The software that was written to accomplish this work is available at the author's web site (www.biostat.jhsph.edu/^{∼}kbroman/software) as an add-on package, R/ricalc, for the freely available statistical software, R (Ihaka and Gentleman 1996). The most intensive computations, of the three-point probabilities for four-way RILs derived by sibling mating, were performed via a pair of Perl programs; these programs are included within the R/ricalc package, although they have no connection to R. In addition, the package includes a set of Mathematica notebooks that document much of the algebraic details for the simpler results.

## Acknowledgments

The author is grateful to David Levin and Dan Naiman for valuable advice; to an anonymous reviewer, Śaunak Sen, and Hongyu Zhao for comments to improve the manuscript; and to the Department of Biostatistics, Johns Hopkins University, for putting up with his intense use of the Department's computing resources for this work.

## Footnotes

Communicating editor: J. Wakeley

- Received August 20, 2004.
- Accepted November 5, 2004.

- Genetics Society of America