# Genotype Probabilities at Intermediate Generations in the Construction of Recombinant Inbred Lines

- Karl W. Broman^{1}

- Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, Wisconsin 53706

- ^{1}Address for correspondence: Department of Biostatistics and Medical Informatics, University of Wisconsin, 1300 University Ave., Room 4710 MSC, Madison, WI 53706. E-mail: kbroman{at}biostat.wisc.edu

## Abstract

The mouse Collaborative Cross (CC) is a panel of eight-way recombinant inbred lines: eight diverse parental strains are intermated, followed by repeated sibling mating, many times in parallel, to create a new set of inbred lines whose genomes are random mosaics of the genomes of the original eight strains. Many generations are required to reach inbreeding, and so a number of investigators have sought to make use of phenotype and genotype data on mice from intermediate generations during the formation of the CC lines (so-called pre-CC mice). The development of a hidden Markov model for genotype reconstruction in such pre-CC mice, on the basis of incompletely informative genetic markers (such as single-nucleotide polymorphisms), formally requires the two-locus genotype probabilities at an arbitrary generation along the path to inbreeding. In this article, I describe my efforts to calculate such probabilities. While closed-form solutions for the two-locus genotype probabilities could not be derived, I provide a prescription for calculating such probabilities numerically. In addition, I present a number of useful quantities, including single-locus genotype probabilities, two-locus haplotype probabilities, and the fixation probability and map expansion at each generation along the course to inbreeding.

The mouse Collaborative Cross (CC) is a panel of eight-way recombinant inbred lines (RIL): eight diverse parental strains are intermated, followed by repeated sibling mating, many times in parallel (see Figure 1D), to create a new set of inbred lines whose genomes are random mosaics of the genomes of the original eight strains (Complex Trait Consortium 2004; Collaborative Cross Consortium 2012). There are similar efforts for *Drosophila* (Macdonald and Long 2007) and *Arabidopsis* (Kover *et al.* 2009); the panels will serve as important reference populations for the systemic genetic analysis of complex traits.

Many generations are required for inbreeding of such RIL, and so a number of investigators have sought to make use of phenotype and genotype data on mice from intermediate generations during the formation of the CC lines: the pre-CC mice (*e.g.*, see Aylor *et al.* 2011). The mapping of quantitative trait loci (QTL) with data on pre-CC mice, whether by interval mapping (Lander and Botstein 1989) or Haley–Knott regression (Haley and Knott 1992), requires the calculation of conditional genotype probabilities given incompletely informative marker data (*e.g.*, at single-nucleotide polymorphisms). Such probabilities are generally derived using a hidden Markov model (HMM). The construction of an HMM for pre-CC mice formally requires the calculation of two-locus diplotype probabilities at arbitrary generations along the course to inbreeding. Thus, I sought to calculate single-locus genotype probabilities and two-locus diplotype probabilities at generation G_{2}:F_{k} (see Figure 1D), with the latter being a function of the recombination fraction between the two loci.

Previous work on genotype probabilities in RIL has focused largely on the final lines (Haldane and Waddington 1931; Broman 2005; Teuscher and Broman 2007), although Haldane and Waddington (1931) did calculate a portion of the probabilities for intermediate generations in two-way RIL by selfing. More recently, Johannes and Colomé-Tatché (2011) fully derived the two-locus genotype probabilities for two-way RIL by selfing and described numerical calculations for the autosome in two-way RIL by sibling mating.

Here I extend these results to the case of four- and eight-way RIL by selfing and sibling mating, including consideration of the X chromosome. The basic problem is to calculate the *k*-step probabilities of a Markov chain with many states. While I was not able to obtain closed-form solutions for the two-locus diplotype probabilities at F_{k} in RIL by sibling mating, I do provide recipes for calculating the probabilities numerically. And I was able to obtain closed-form solutions for single-locus genotype probabilities and two-locus haplotype probabilities. Further, I derived the fixation probability and map expansion at F_{k}. These latter results have important applications: the single-locus genotype probabilities could be useful in efforts to identify regions under selection, through the comparison of observed to expected genotype frequencies; the fixation probability can be interpreted as the expected proportion of the genome that is fixed; and the map expansion results indicate the accumulation of recombination events over generations.

## Methods

The generation of two-way RIL by selfing and of two-, four-, and eight-way RIL by sibling mating is shown in Figure 1. The notation for generation numbers for RIL can be confusing. The numbering indicated in Figure 1 is used throughout, with F_{1} being the first generation in which all parental alleles are present in a single individual. In the following, I abbreviate G_{1}:F_{k} in four-way RIL and G_{2}:F_{k} in eight-way RIL as simply F_{k}. In particular, G_{1} in four-way RIL and G_{2} in eight-way RIL is called F_{0}.

Consider a particular crossing strategy, and let *X*_{k} denote the parental type at generation F_{k}. For RIL by selfing, this is the diplotype of the individual; for RIL by sibling mating, this is the pair of diplotypes for the two siblings. For example, in considering two loci in four-way RIL by sibling mating, one possible state is the starting state at F_{0}, *AA* | *BB* × *CC* | *DD*. (In this notation, the pairs of letters on each side of the vertical bar denote the two haplotypes for an individual; the first and second letters in each haplotype correspond to the alleles at the first and second loci, respectively.) The sequence *X*_{0}, *X*_{1}, *X*_{2}, …, forms a Markov chain. That is, *X*_{k+1} is conditionally independent of *X*_{0}, *X*_{1}, …, *X*_{k−1}, given *X*_{k}.

Let *P* denote the transition matrix of the Markov chain, defined by *P*_{ij} = Pr(*X*_{k+1} = *j* | *X*_{k} = *i*). Our goal is to calculate the *k*-step probabilities, π_{k} = π_{0}*P*^{k}, where π_{0} is the starting distribution (at F_{0}), which contains 1 at the fixed starting state and 0 for all other states.
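As a minimal sketch of the *k*-step calculation, consider a toy chain that is not one of the tables in this article: a single locus in two-way RIL by selfing, with states *AA*, *AB*, and *BB*. The state ordering and the function names `step` and `k_step` are assumptions made for illustration only.

```python
from fractions import Fraction

# Toy chain chosen for illustration (not one of the paper's tables):
# one locus, two-way RIL by selfing, with states ordered AA, AB, BB.
STATES = ["AA", "AB", "BB"]
HALF, QUARTER = Fraction(1, 2), Fraction(1, 4)
P = [
    [Fraction(1), Fraction(0), Fraction(0)],  # AA selfed stays AA
    [QUARTER, HALF, QUARTER],                 # AB selfed: 1/4 AA, 1/2 AB, 1/4 BB
    [Fraction(0), Fraction(0), Fraction(1)],  # BB selfed stays BB
]


def step(pi, P):
    """Advance one generation: row vector pi times transition matrix P."""
    n = len(P)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]


def k_step(pi0, P, k):
    """Return pi_k = pi_0 P^k by applying the one-step update k times."""
    pi = list(pi0)
    for _ in range(k):
        pi = step(pi, P)
    return pi


pi0 = [Fraction(0), Fraction(1), Fraction(0)]  # all mass on the heterozygote AB
pi3 = k_step(pi0, P, 3)
print(dict(zip(STATES, pi3)))
```

Exact rational arithmetic is used so that the halving of heterozygosity in each generation is visible without rounding error.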

First note that, for RIL by selfing, it is sufficient to consider two-way RIL, and for RIL by sibling mating, it is sufficient to consider four-way RIL. This is due to the bottleneck with two chromosomes in RIL by selfing at generation F_{1} and with four chromosomes in RIL by sibling mating at generation F_{0}. The results may be extended from two-way RIL by selfing to four-way RIL by selfing or from four-way RIL by sibling mating to eight-way RIL by sibling mating, by considering an additional generation of recombination. One may obtain the results for two-way RIL by sibling mating from the results for four-way RIL by sibling mating by collapsing states: let *A* ≡ *B* and *C* ≡ *D*.

The major technique for deriving the *k*-step probabilities, π_{k}, is to derive the eigen decomposition of the transition matrix: *P* = *V*Λ*V*^{−1}, where Λ is the diagonal matrix of eigenvalues and *V* is a matrix whose columns are the corresponding eigenvectors. Then *P*^{k} = *V*Λ^{k}*V*^{−1}, and Λ^{k} is obtained from Λ by taking the *k*th powers of the eigenvalues.
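The identity can be illustrated on the same kind of toy chain (one locus under selfing, states *AA*, *AB*, *BB*), which is an assumption for illustration and not one of the chains analyzed here. The eigenvalues 1, 1, and 1/2 and the matrices `V` and `V_INV` below were worked out by hand for this small case; the sketch checks them against repeated multiplication.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Plain matrix product for lists of lists."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def mat_pow(M, k):
    """k-th power by repeated multiplication (used as the reference answer)."""
    n = len(M)
    result = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, M)
    return result


H, Q4 = Fraction(1, 2), Fraction(1, 4)
# Toy chain (illustration only): one locus under selfing, states AA, AB, BB.
P = [[1, 0, 0], [Q4, H, Q4], [0, 0, 1]]
EIGENVALUES = [Fraction(1), Fraction(1), H]
V = [[1, 0, 0], [H, H, 1], [0, 1, 0]]        # columns: right eigenvectors of P
V_INV = [[1, 0, 0], [0, 0, 1], [-H, 1, -H]]  # rows: matching left eigenvectors


def power_via_eigen(k):
    """P^k = V Lambda^k V^{-1}; only the eigenvalues are raised to the k-th power."""
    lam_k = [[EIGENVALUES[i] ** k if i == j else Fraction(0) for j in range(3)]
             for i in range(3)]
    return mat_mul(mat_mul(V, lam_k), V_INV)


print(power_via_eigen(6)[1])  # row for a heterozygous start after six generations
```

The closed form in *k* comes entirely from the powers of the eigenvalues, which is why the approach breaks down when the eigenvalues cannot be written explicitly.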

Such an eigen decomposition is straightforward in theory but is unwieldy in practice, due to the extremely large number of possible states. And so the second major technique is to take account of various symmetries to collapse the states into a smaller number. For two-way RIL by selfing with two loci, the simplest formulation would give 2^{4} = 16 possible states (two possible alleles at each locus on each of the two chromosomes). But considering that the order of the two haplotypes is immaterial, these may be reduced to 10 possible diplotypes. As shown in Haldane and Waddington (1931), these may be further reduced to just five states, by taking account of two additional symmetries: the order of the two loci may be ignored, and the symbols *A* and *B* may be switched.
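The counts 16, 10, and 5 for two-way RIL by selfing can be checked by brute force. The sketch below enumerates ordered haplotype pairs, collapses them to unordered diplotypes, and then groups diplotypes into orbits under the two extra symmetries named in the text (exchange of the two loci, exchange of the symbols *A* and *B*). The helper names are hypothetical.

```python
from itertools import product

haplotypes = ["".join(h) for h in product("AB", repeat=2)]  # AA, AB, BA, BB
ordered = list(product(haplotypes, repeat=2))                # ordered haplotype pairs


def diplotype(h1, h2):
    """Unordered pair of haplotypes (the order of the two haplotypes is immaterial)."""
    return tuple(sorted((h1, h2)))


diplotypes = sorted({diplotype(h1, h2) for h1, h2 in ordered})


def swap_loci(d):
    """Exchange the first and second locus on both haplotypes."""
    return diplotype(d[0][::-1], d[1][::-1])


def swap_alleles(d):
    """Exchange the symbols A and B everywhere."""
    flip = str.maketrans("AB", "BA")
    return diplotype(d[0].translate(flip), d[1].translate(flip))


def orbit(d):
    """All diplotypes reachable from d by repeatedly applying the two symmetries."""
    seen, todo = {d}, [d]
    while todo:
        x = todo.pop()
        for symmetry in (swap_loci, swap_alleles):
            y = symmetry(x)
            if y not in seen:
                seen.add(y)
                todo.append(y)
    return frozenset(seen)


classes = {orbit(d) for d in diplotypes}
print(len(ordered), len(diplotypes), len(classes))
```

The orbit sizes (2, 4, 1, 2, 1) are the numbers of states per group that accompany each prototype when results are reported for collapsed states.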

Let us formalize this idea. (For a more rigorous approach, see Burke and Rosenblatt 1958.) Let the possible states of the chain be *S* = {*s*_{1}, …, *s*_{n}}. Partition *S* into *m* subsets of equivalent states, *S*_{i} ⊂ *S*, so that, for any pair *i* and *j*, Pr(*X*_{k+1} ∈ *S*_{j} | *X*_{k} = *s*) = *q*_{ij} for all *s* ∈ *S*_{i}. The *q*_{ij} form an *m* × *m* transition matrix, *Q*, for the collapsed states. Let *Z* denote the *n* × *m* incidence matrix defined by *z*_{ij} = 1 if *s*_{i} ∈ *S*_{j} and 0 otherwise. Then

$$PZ = ZQ \qquad (1)$$

and so *P*^{k}*Z* = *ZQ*^{k}. As a result, π_{k}*Z* = π_{0}*P*^{k}*Z* = π_{0}*ZQ*^{k}. Thus, one may work with the *m* × *m* transition matrix *Q* in place of the *n* × *n* transition matrix *P*.
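A small check of the collapsing condition, written as *PZ* = *ZQ*, on a toy chain that is assumed here for illustration (one locus under selfing, states *AA*, *AB*, *BB*, collapsed to "homozygous" and "heterozygous"). The matrices `Z` and `Q` are constructed by hand for this example.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Plain matrix product for lists of lists."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


H, Q4 = Fraction(1, 2), Fraction(1, 4)
P = [[1, 0, 0], [Q4, H, Q4], [0, 0, 1]]  # full chain on AA, AB, BB
Z = [[1, 0], [0, 1], [1, 0]]             # columns: class {AA, BB}, class {AB}
Q = [[1, 0], [H, H]]                     # collapsed chain on the two classes

lumpable = mat_mul(P, Z) == mat_mul(Z, Q)  # the collapsing condition PZ = ZQ


def collapsed_k_step(pi0, k):
    """Compute pi_k Z as pi_0 Z Q^k, never forming powers of the full matrix P."""
    x = mat_mul([pi0], Z)
    for _ in range(k):
        x = mat_mul(x, Q)
    return x[0]


print(lumpable, collapsed_k_step([0, 1, 0], 3))
```

Here the saving is only from three states to two, but the same bookkeeping applies when hundreds of states collapse to a handful.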

For this collapse of states to be useful, the multiple states within each equivalence class, *S*_{i}, need to have equal probabilities at each generation, so that the probabilities of the individual states may be derived from the probabilities of the collapsed states. This will depend on the starting distribution. For example, consider one locus in two-way RIL by sibling mating. If the starting state is *AA* × *BB*, then at any future generation, the chance of being in state *AA* × *AB* is the same as that of being in state *AB* × *BB*. However, if the starting state is *AA* × *AB*, then there will be a lack of symmetry between *A* and *B*. (For the asymmetric case of two-way RIL initiated from a backcross, see Johannes and Colomé-Tatché 2011.)
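The dependence on the starting state can be checked directly for one autosomal locus in two-way RIL by sibling mating. The sketch below builds the chain from the Mendelian rule (each child takes one allele at random from each parent, and the two children are independent) instead of from a tabulated transition matrix; the sibling pair is treated as unordered, and the function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def canon(g1, g2):
    """Unordered pair of unordered genotypes, e.g. ('AA', 'AB')."""
    a, b = "".join(sorted(g1)), "".join(sorted(g2))
    return tuple(sorted((a, b)))


def next_generation(dist):
    """Propagate a distribution over sibling pairs through one round of sib mating."""
    out = defaultdict(Fraction)
    for (g1, g2), p in dist.items():
        # Alleles drawn from parent 1 and parent 2 for child 1, then for child 2.
        for a1, b1, a2, b2 in product(g1, g2, g1, g2):
            out[canon(a1 + b1, a2 + b2)] += p * Fraction(1, 16)
    return dict(out)


def after(start, k):
    """Distribution over sibling pairs k generations after the starting pair."""
    dist = {canon(*start): Fraction(1)}
    for _ in range(k):
        dist = next_generation(dist)
    return dist


symmetric = after(("AA", "BB"), 4)
asymmetric = after(("AA", "AB"), 4)
for label, dist in (("AA x BB start", symmetric), ("AA x AB start", asymmetric)):
    print(label, dist.get(("AA", "AB"), 0), dist.get(("AB", "BB"), 0))
```

With the *AA* × *BB* start the two printed probabilities agree, so the class probability can be split evenly among its members; with the *AA* × *AB* start they differ, and the collapsed chain alone no longer determines the individual state probabilities.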

Kimura (1963) described a further technique that has been critical in this work. In many instances, we do not need the full distribution π_{k}, but only various linear combinations, say π_{k}*z* = π_{0}*P*^{k}*z*, where *z* is an *n* × 1 vector. Kimura (1963) demonstrated how to expand *z* to an *n* × *m* matrix *Z* in such a way that there exists a matrix *Q* satisfying Equation 1. Then we again have π_{k}*Z* = π_{0}*P*^{k}*Z* = π_{0}*ZQ*^{k} and may work with the *m* × *m* matrix *Q* in place of the *n* × *n* matrix *P*. Here, the matrix *Q* is not a transition matrix but simply defines a recursion. The first element of π_{k}*Z* is the target quantity, π_{k}*z*.

Consider, for example, the probability of a random two-locus haplotype drawn from generation F_{k} in the formation of RIL by sibling mating. Let *C*_{k}(*AA*) denote the chance that *AA* is drawn. This could either be an intact haplotype, transmitted without recombination from generation F_{k−1}, or be the result of recombination between the two haplotypes in a random F_{k−1} individual. Consider drawing a single random allele at the first locus from generation *k* and then taking the allele at the second locus but on the opposite chromosome in that individual. Let *S*_{k}(*AA*) denote the probability that these two alleles are both *A*. Then *C*_{k}(*AA*) = (1 − *r*)*C*_{k−1}(*AA*) + *rS*_{k−1}(*AA*), where *r* is the recombination fraction between the two loci. Further, *S*_{k}(*AA*) = *T*_{k−1}(*AA*), where *T*_{k}(*AA*) is the chance that, if one draws a random allele at the first locus from generation F_{k} and then a random allele from the opposite individual at the second locus, both alleles are *A*. We may further write *T*_{k}(*AA*) as a function of *C*_{k−1}, *S*_{k−1}, and *T*_{k−1}, forming the recursion matrix, *Q*, which is shown in Supporting Information, Table S1. Moreover, this same recursion applies for all of the other haplotypes; one just needs to use different starting distributions, π_{0}*Z*. For the three distinct cases for four-way RIL by sibling mating, these are shown in Table S2.
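The recursion can be sketched numerically. The first two relations below are the ones stated in the text; the third, *T*_{k} = *C*_{k−1}/4 + *S*_{k−1}/4 + *T*_{k−1}/2, is reconstructed here by the same kind of probabilistic argument (the two alleles descend from different parents with probability 1/2, and otherwise from the same or opposite chromosome of one parent with probability 1/4 each). It is an assumption of this sketch and is not copied from Table S1; likewise the starting vectors are worked out from the definitions for the start *AA* | *BB* × *CC* | *DD* and are only assumed to agree with Table S2.

```python
from fractions import Fraction

QUARTER = Fraction(1, 4)
# Starting row vectors (C_0, S_0, T_0) at F_0 = AA|BB x CC|DD, one per haplotype
# class. Worked out from the definitions; assumed (not verified) to match Table S2.
START = {
    "AA": [QUARTER, Fraction(0), Fraction(0)],
    "AB": [Fraction(0), QUARTER, Fraction(0)],
    "AC": [Fraction(0), Fraction(0), Fraction(1, 8)],
}


def recursion_matrix(r):
    """Q such that (C_k, S_k, T_k) = (C_{k-1}, S_{k-1}, T_{k-1}) Q."""
    return [
        [1 - r, Fraction(0), QUARTER],               # weights of C_{k-1}
        [r, Fraction(0), QUARTER],                   # weights of S_{k-1}
        [Fraction(0), Fraction(1), Fraction(1, 2)],  # weights of T_{k-1}
    ]


def cst(kind, r, k):
    """Row vector (C_k, S_k, T_k) for a haplotype class, r given as a Fraction."""
    x = list(START[kind])
    Q = recursion_matrix(r)
    for _ in range(k):
        x = [sum(x[i] * Q[i][j] for i in range(3)) for j in range(3)]
    return x


def haplotype_prob(kind, r, k):
    """Probability of one haplotype of the given class at F_k (first element)."""
    return cst(kind, r, k)[0]


r = Fraction(1, 10)
for k in range(5):
    print(k, [haplotype_prob(kind, r, k) for kind in ("AA", "AB", "AC")])
```

Two sanity checks support the reconstruction: the class probabilities weighted by class size (4, 4, and 8 haplotypes) sum to 1 in every generation, and Pr(*AA*) converges to 1/[4(1 + 6*r*)], the value at fixation implied by the known recombinant fraction 6*r*/(1 + 6*r*) for four-way RIL by sibling mating.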

A particularly useful aspect of Kimura's technique is that the recursion matrix can be constructed by probabilistic arguments, without the need to form the full transition matrix, *P*, or even the *n* × *m* matrix *Z*. For two autosomal loci in four-way RIL by sibling mating, there are 4^{8} = 65,536 diplotype pairs without accounting for any symmetries. This may be reduced to 9316 after accounting for the obvious symmetries (exchange the two haplotypes in each individual and exchange the two individuals) and then to 700 diplotype states after accounting for the less obvious symmetries (exchange the two loci, exchange alleles *A* and *B*, exchange alleles *C* and *D*, and exchange both *A* for *C* and *B* for *D*). By the technique of Kimura (1963), one may work with a 3 × 3 matrix in place of the 700 × 700 transition matrix, if only haplotype probabilities are desired.
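The first two counts follow from elementary counting of unordered pairs, as in this short sketch (the helper `unordered_pairs` is a hypothetical name). The further reduction to 700 requires enumerating orbits under the less obvious symmetries and is not attempted here.

```python
from math import comb


def unordered_pairs(n):
    """Number of unordered pairs, with repetition allowed, from n distinct items."""
    return comb(n + 1, 2)


raw_states = 4 ** 8                              # four alleles at each of 8 positions
two_locus_haplotypes = 4 ** 2                    # 16 haplotypes
diplotypes = unordered_pairs(two_locus_haplotypes)    # order of haplotypes ignored
diplotype_pairs = unordered_pairs(diplotypes)         # order of individuals ignored

# The same counting for a single autosomal locus with four alleles.
single_locus_genotypes = unordered_pairs(4)
single_locus_pairs = unordered_pairs(single_locus_genotypes)

print(raw_states, diplotypes, diplotype_pairs, single_locus_pairs)
```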

Throughout this work, Maxima (http://maxima.sourceforge.net) was used for symbolic algebra, and R (R Development Core Team 2010) and Perl (Wall *et al.* 2000) were used for additional verifications of the results.

## Results

### Two-way RIL by selfing

As noted in the previous section, it is sufficient to consider two-way RIL by selfing, due to the bottleneck with two chromosomes at F_{1}. Let us jump directly to two-locus diplotype probabilities, as the results are fairly simply obtained. As noted in Haldane and Waddington (1931) and in the previous section, if one takes account of the various symmetries, one may collapse the two-locus diplotype states to a Markov chain with five states. The transition matrix of this chain is shown in Table S3, with *r* denoting the recombination fraction between the two loci.

I obtained the eigen decomposition of this transition matrix (not shown) and, noting that the starting state (at generation F_{1}) is *AA* | *BB*, derived the two-locus diplotype probabilities at generation F_{k} (that is, after *k* − 1 steps), π_{k−1} = π_{0}*P*^{k−1} = π_{0}*V*Λ^{k−1}*V*^{−1}, presented in Table 1. For each group of states, a single prototype and the number of states in that group are provided. For example, in the third row, with prototype *AA* | *AB*, the cited probability is for that prototype as well as each of the other three states in that group: *AA* | *BA*, *BB* | *AB*, and *BB* | *BA*.
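For numerical work, the five-state chain can be iterated directly. The transition matrix below is a reconstruction from the gamete frequencies of a selfed individual (nonrecombinant gametes with probability (1 − *r*)/2 each, recombinant gametes with probability *r*/2 each); it is an assumption of this sketch and is not copied from Table S3, and the ordering of the five classes is likewise chosen here for convenience.

```python
from fractions import Fraction

# Collapsed classes, by prototype: AA|AA, AA|AB, AA|BB, AB|AB, AB|BA.
CLASS_SIZES = [2, 4, 1, 2, 1]


def selfing_matrix(r):
    """Reconstructed 5 x 5 transition matrix for the collapsed selfing chain."""
    a = (1 - r) ** 2 / 2   # two nonrecombinant gametes, given type of offspring class
    b = r ** 2 / 2         # two recombinant gametes
    c = 2 * r * (1 - r)    # one recombinant and one nonrecombinant gamete
    q, h = Fraction(1, 4), Fraction(1, 2)
    return [
        [1, 0, 0, 0, 0],   # AA|AA is fixed
        [q, h, 0, q, 0],   # AA|AB: one locus still segregating
        [a, c, a, b, b],   # AA|BB: double heterozygote in coupling
        [0, 0, 0, 1, 0],   # AB|AB is fixed (recombinant type)
        [b, c, b, a, a],   # AB|BA: double heterozygote in repulsion
    ]


def class_probs(r, generation):
    """Class probabilities at F_generation, starting from AA|BB at F_1."""
    pi = [Fraction(0)] * 5
    pi[2] = Fraction(1)
    P = selfing_matrix(r)
    for _ in range(generation - 1):
        pi = [sum(pi[i] * P[i][j] for i in range(5)) for j in range(5)]
    return pi


def state_probs(r, generation):
    """Probability of each individual state: class probability over class size."""
    return [p / n for p, n in zip(class_probs(r, generation), CLASS_SIZES)]


print([float(p) for p in class_probs(Fraction(1, 10), 8)])
```

As a check, the probability of the fixed recombinant class approaches 2*r*/(1 + 2*r*), the classical result of Haldane and Waddington (1931) for RIL by selfing.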

Haldane and Waddington (1931) also derived these results for intermediate generations in two-way RIL by selfing. They displayed just the two cases *AA* | *AA* and *AB* | *AB* (Haldane and Waddington 1931, equation 1.4), but the results match those in Table 1, though note that their results are a factor of 2 larger, as they concern the combined states. Also note that Haldane and Waddington (1931) allowed a sex difference in the recombination fraction, whereas I assume no sex difference in recombination.

### Four-way RIL by sibling mating, one locus

#### Autosome:

For a single autosomal locus in four-way RIL by sibling mating, there are 55 genotype pairs, after accounting for the obvious symmetries. These may be reduced to 13 states, after accounting for the less obvious symmetries. The transition matrix for this reduced Markov chain is shown in Table S4. The starting state at generation F_{0} is *AB* × *CD*. I calculated the eigen decomposition of the transition matrix, and from that π_{k} = π_{0}*P*^{k} = π_{0}*V*Λ^{k}*V*^{−1}. The results are shown in Table S5, which also indicates the number of genotype pairs corresponding to each of the 13 states. (The sum of the second column in Table S5 is 55.)

The probabilities of single-locus genotype pairs at generation F_{k}, shown in Table S5, are complex and not of particular interest in themselves (hence their inclusion in the Supporting Information). However, the single-locus probabilities for a random F_{k} individual follow immediately from these results; they are shown in Table 2. These are of considerably greater interest, as they constitute the “initiation” probabilities for an HMM. The single-locus genotype probabilities for the first several generations are plotted in Figure S1A.

#### X chromosome:

In considering the X chromosome, it is important to consider the order of the initial crosses. In four-way RIL by sibling mating, I assume that a female *A* was crossed to a male *B*, and a female *C* was crossed to a male *D*, and then a female from the *A* × *B* F_{1} was crossed to a male from the *C* × *D* F_{1} (see Figure S2B). In the F_{1}, there are three X chromosomes, *A*, *B*, and *C*; the *D* allele is lost.

After accounting for the obvious symmetries, there are 18 possible single-locus genotype pairs. These may be reduced to 10 states, after accounting for the less obvious symmetries. The transition matrix for this reduced Markov chain is shown in Table S6. The starting state at generation F_{0} is *AB* × *C*. Through the eigen decomposition of the transition matrix, the results in Table S7 are obtained.

The marginal probabilities for the female and male are displayed in Table 2 and are plotted in Figure S1, B and C. The oscillations in the male X chromosome probabilities are interesting, but not particularly surprising.

#### Fixation probabilities:

The detailed single-locus results in Table S5 (for autosomes) and Table S7 (for the X chromosome) immediately provide the fixation probability in four-way RIL by sibling mating. For an arbitrary autosomal locus, the chance that a four-way RIL has been fixed at or before generation F_{k} is 4 · Pr(*AA* × *AA*). For an arbitrary X chromosome locus, the chance of fixation by generation F_{k} is 2 · Pr(*AA* × *A*) + Pr(*CC* × *C*). These are shown in Table 3 and are further plotted in Figure 2A. Note that the probability that an arbitrary locus in four-way RIL has become fixed at exactly generation F_{k} may be derived as the difference between the results for *k* and *k* − 1. These are shown in Figure 2B. Note that the fixation probabilities for eight-way RIL are identical to those for four-way RIL.
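The autosomal fixation probabilities can also be obtained numerically without the collapsed 13-state chain, by propagating the distribution over sibling genotype pairs with the Mendelian rule. This is a brute-force sketch with hypothetical function names, not the eigen decomposition used for Table 3; the sibling pair is treated as unordered.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def next_generation(dist):
    """One round of sibling mating for a single autosomal locus."""
    out = defaultdict(Fraction)
    for (g1, g2), p in dist.items():
        # Each child takes one allele from each parent; the children are independent.
        for a1, b1, a2, b2 in product(g1, g2, g1, g2):
            child1 = "".join(sorted(a1 + b1))
            child2 = "".join(sorted(a2 + b2))
            out[tuple(sorted((child1, child2)))] += p * Fraction(1, 16)
    return dict(out)


def fixation_by_generation(k_max):
    """Probability of fixation at or before F_k, for k = 0, ..., k_max."""
    dist = {("AB", "CD"): Fraction(1)}  # starting state at F_0
    probs = []
    for _ in range(k_max + 1):
        fixed = sum(p for (g1, g2), p in dist.items()
                    if g1 == g2 and g1[0] == g1[1])
        probs.append(fixed)
        dist = next_generation(dist)
    return probs


cumulative = fixation_by_generation(10)
# Probability of becoming fixed at exactly F_k: difference of successive values.
newly_fixed = [b - a for a, b in zip(cumulative, cumulative[1:])]
print([float(p) for p in cumulative])
```

Fixation is impossible at F_{0} and F_{1}, where every individual is heterozygous, and first becomes possible at F_{2}.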

The fixation probabilities for two-way RIL by sibling mating are also simply derived: collapse alleles *A* ≡ *B* and *C* ≡ *D*. Thus, for an autosomal locus, one adds up the rows in Table S5 that contain only *A* or *B*.

The fixation probability for an X chromosome locus is slightly larger than that for an autosomal locus, and that for two-way RIL is slightly larger than that for four-way RIL. The fixation probability for two-way RIL by sibling mating at generation *k* is quite similar to that for four-way RIL at generation *k* + 1. Fixation in RIL by selfing occurs much more rapidly.

For a large genome, the fixation probabilities for an arbitrary autosomal locus, displayed in Figure 2A, may be interpreted as the approximate proportion of the autosomal genome that will be fixed at generation F_{k}. Nevertheless, as shown in Broman (2005) via computer simulation, there will be considerable variation across lines.

### Four-way RIL by sibling mating, two-locus haplotypes

#### Autosome:

The technique of Kimura (1963), described above, may be used to derive probabilities of random two-locus haplotypes drawn from generation F_{k} in the formation of four-way RIL by sibling mating. There are three distinct cases to consider (*AA*, *AB*, and *AC*), which share a common recursion matrix (shown in Table S1) but require consideration of different starting states. For each case, the starting probabilities, π_{0}*Z*, form a 1 × 3 row vector with a single nonzero entry (see Table S2).

I obtained the eigen decomposition of the recursion matrix, *Q*, which is shown in Table S1, and through the equation π_{k}*Z* = π_{0}*ZQ*^{k} = π_{0}*ZV*Λ^{k}*V*^{−1} obtained the two-locus haplotype probabilities in Table 4. The equations for autosomal haplotypes in Table 4 are valid only for *r* < 1/2, but the results for *r* = 1/2 are obvious by symmetry: Pr(*AA*) = 1/4 at F_{0}, Pr(*AA*) = Pr(*AB*) = 1/8 at F_{1}, and Pr(*AA*) = Pr(*AB*) = Pr(*AC*) = 1/16 at F_{k} for *k* ≥ 2.

The autosomal haplotype probabilities as a function of the recombination fraction, *r*, are displayed in the left panels in Figure S3.

#### X chromosome:

To calculate the probability of a random two-locus X chromosome haplotype drawn from the female at F_{k} in four-way RIL by sibling mating and of the single X chromosome haplotype in the corresponding male, there are four cases to consider (*AA*, *AB*, *AC*, and *CC*). Application of the technique of Kimura (1963) for the X chromosome requires consideration of a set of four states, with the recursion matrix shown in Table S8. Each of the four cases uses the same recursion matrix but different starting probabilities (shown in Table S9). Calculation of the probabilities of a random haplotype drawn from the female and of the single haplotype in the male uses the same set of equations, but for a random female haplotype one takes the first element of π_{k}*Z*, while for the male haplotype one takes the third element. Again, following the eigen decomposition of the matrix in Table S8, the haplotype probabilities in Table 4 were obtained.

The female and male X chromosome haplotype probabilities, as a function of the recombination fraction, *r*, are displayed in the center and right panels, respectively, in Figure S3. The probabilities of haplotypes *AA* and *CC* in females, and all of the haplotype probabilities in males, show pronounced oscillations across generations. For example, the *CC* haplotype is common in males for even *k* and common in females for odd *k* and vice versa for haplotype *AA*.

#### Map expansion:

The multiple generations of recombination in the formation of RIL lead to genetic map expansion. The map expansion as a function of generation is easily obtained from the haplotype probabilities. Let *R* denote the probability of a recombinant haplotype. [For the autosome, take 1 − 4 · Pr(*AA*).] The map expansion relative to a single meiosis is *dR*/*dr* evaluated at *r* = 0 (see Teuscher and Broman 2007). For the X chromosome, I calculated the map expansion separately in females and males and then obtained a combined map expansion by averaging the two values, giving the female value weight 2/3.
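A sketch of the autosomal map expansion for four-way RIL by sibling mating, taking the map expansion to be the derivative of *R* = 1 − 4 · Pr(*AA*) with respect to *r* at *r* = 0 (the definition assumed here, following Teuscher and Broman 2007). The 3 × 3 recursion for (*C*, *S*, *T*) is a reconstruction and is not copied from Table S1: the relations for *C* and *S* follow the Methods, while *T*_{k} = *C*_{k−1}/4 + *S*_{k−1}/4 + *T*_{k−1}/2 is an assumption of this sketch. The derivative is propagated exactly with the product rule.

```python
from fractions import Fraction

QUARTER, HALF = Fraction(1, 4), Fraction(1, 2)
# Reconstructed recursion matrix evaluated at r = 0, and its derivative in r.
# Only the entries 1 - r and r depend on r, so the derivative has two nonzero entries.
Q0 = [[1, 0, QUARTER], [0, 0, QUARTER], [0, 1, HALF]]
DQ = [[-1, 0, 0], [1, 0, 0], [0, 0, 0]]


def row_times(x, M):
    """Row vector times 3 x 3 matrix."""
    return [sum(x[i] * M[i][j] for i in range(3)) for j in range(3)]


def map_expansion(k):
    """dR/dr at r = 0 for generation F_k, with R = 1 - 4 Pr(AA)."""
    x = [QUARTER, Fraction(0), Fraction(0)]  # (C_0, S_0, T_0) for haplotype AA
    dx = [Fraction(0)] * 3                   # derivatives with respect to r
    for _ in range(k):
        # Product rule: d(x Q) = dx Q + x dQ, everything evaluated at r = 0.
        dx = [u + v for u, v in zip(row_times(dx, Q0), row_times(x, DQ))]
        x = row_times(x, Q0)
    return -4 * dx[0]


print([float(map_expansion(k)) for k in range(0, 21, 4)])
```

Under these assumptions, the increment from one generation to the next equals the heterozygosity in the earlier generation, and the values approach 6, the derivative at *r* = 0 of the recombinant fraction 6*r*/(1 + 6*r*) at fixation.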

The map expansion for the two-way RIL by selfing case may be obtained from the values in Table 1. To obtain the map expansion for two-way RIL by sibling mating, equate alleles *A* ≡ *B* and *C* ≡ *D*; the recombinant haplotypes correspond to the *AC* case in Table 4. To obtain the map expansion for eight-way RIL by sibling mating, note that the chance of each nonrecombinant haplotype would be (1 − *r*)/2 times the probability for haplotype *AA* shown in Table 4. A similar calculation applies for four-way and eight-way RIL by selfing.

The map expansions for two-, four-, eight-, and 2^{n}-way RIL by selfing and for the autosome by sibling mating are shown in Table 5 and are further illustrated in Figure 3. The results for the X chromosome in RIL by sibling mating are simply two-thirds those for the autosome and so are not shown. (I have verified, via Maxima, that this is true for 2^{n}-way RIL up to *n* = 98. A general proof continues to elude me.)

Teuscher *et al.* (2005) also sought to calculate the map expansion by generation for two-way RIL by sibling mating. My results match those of Teuscher *et al.* (2005), were more easily obtained, and provide a closed-form solution.

### Four-way RIL by sibling mating, two-locus diplotypes

#### Autosome:

I now turn to the calculation of the distribution of the two-locus diplotype on an autosome, for a random individual drawn from generation F_{k} in the formation of four-way RIL by sibling mating. There are 18 cases falling into three groups: diplotypes of the form *AA* | *AA*, with both loci being homozygous; *AA* | *AB*, with one locus being homozygous; and *AA* | *BB*, with both loci being heterozygous.

Let us start with the *AA* | *AA* case. Following the approach of Kimura (1963), I obtained a 13 × 13 recursion matrix, *Q*, whose transpose is shown in Table S10. Because of the size and sparsity of the matrix, only the nonzero elements are indicated. The starting states for the three related diplotypes are shown in Table S11. (For each diplotype pattern, the starting distribution π_{0}*Z* has a single nonzero entry.)

The next step is to derive the eigen decomposition of *Q*. However, while 7 of the 13 eigenvalues can be obtained, the other 6 eigenvalues are the roots of a sixth degree polynomial (whose coefficients are polynomials of *r* of degree up to 5). This prevented the derivation of closed-form diplotype probabilities at F_{k}.

Nevertheless, the reduction of states represented in Table S10 and Table S11 is considerable and is useful for the numeric calculation of these probabilities. Direct calculation would require consideration of the full transition matrix for pairs of diplotypes, which, even after accounting for all possible symmetries, is a 700 × 700 matrix.

Turning to the *AA* | *AB* case, with one locus being homozygous and the other heterozygous, the recursion matrix is 17 × 17; the nonzero elements of its transpose are shown in Table S12. The starting states for the four related diplotype patterns are shown in Table S13.

Finally, for the *AA* | *BB* case, with both loci being heterozygous, the recursion matrix is 14 × 14; its transpose is shown in Table S14. The starting states for the 11 related diplotype patterns are shown in Table S15.

#### X chromosome:

It should not be surprising that closed-form solutions for the two-locus diplotype probabilities for the female on the X chromosome at generation F_{k} in the formation of four-way RIL by sibling mating could not be derived. (But note that the two-locus haplotype probabilities for the male X chromosome could be calculated; see Table 4.)

Nevertheless, the recursion matrices and starting states, derived by the approach of Kimura (1963), may be useful for numerical computations. After consideration of the various symmetries, there are 17 two-locus diplotype patterns falling into the same three groups as for the autosome.

For the *AA* | *AA* case, the recursion matrix is 13 × 13; its transpose is shown in Table S16. The starting states for the four related diplotype patterns are shown in Table S17. Six of the 13 eigenvalues may be derived; the other 7 are roots of a pair of polynomials, one of degree 3 and the other of degree 4.

For the *AA* | *AB* case, the recursion matrix is 18 × 18; its transpose is shown in Table S18. The starting states for the five related diplotype patterns are shown in Table S19. For the *AA* | *BB* case, the recursion matrix is 12 × 12; its transpose is shown in Table S20. The starting states for the eight related diplotype patterns are shown in Table S21.

### Eight-way RIL

The calculation of genotype or diplotype probabilities for eight-way RIL from those for four-way RIL is straightforward, but also tedious and potentially confusing. For clarity, lowercase letters are used for the alleles in eight-way RIL, while uppercase letters denote the alleles in four-way RIL.

First, consider the genotype at an autosomal locus for a random individual drawn from generation F_{k} in the construction of eight-way RIL by sibling mating. Due to the bottleneck at generation G_{2}, the genotypes *ab*, *cd*, *ef*, and *gh* are not possible. (At any one locus, only one allele from each of these pairs will be transmitted from G_{1} to G_{2}.) The probability of genotype *aa* is the chance that the G_{2} individual receives the allele *a* (which is 1/2) times the probability for the genotype *AA* in the construction of four-way RIL by sibling mating. The probabilities of genotypes *ac* and *ae* are 1/4 times the probabilities of genotypes *AB* and *AC*, respectively, in the construction of four-way RIL by sibling mating.

Two-locus haplotype probabilities are obtained in a similar way, noting that the haplotype in the position of the *A* haplotype at G_{2} in the construction of four-way RIL is *aa* or *bb* with probability (1 − *r*)/2 each and is *ab* or *ba* with probability *r*/2 each. Thus, for example, the chance of obtaining *ab* as a two-locus autosomal haplotype, drawn at random from generation F_{k} in the construction of eight-way RIL by sibling mating, is *r*/2 times the probability of drawing *AA* from generation F_{k} in the construction of four-way RIL by sibling mating. The chance of drawing the haplotype *cg* is 1/4 times the probability of drawing *BD* from the corresponding generation in the construction of four-way RIL.
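These conversions are simple rescalings, sketched below with hypothetical function names. The factors 1/2 (a single allele drawn from a pair of founders) and 1/4 (two independent draws, one at each locus) are taken as assumptions that follow from the extra generation of meiosis between G_{1} and G_{2}; the four-way probabilities are supplied by the caller.

```python
from fractions import Fraction


def eight_way_genotype_aa(p4_AA):
    """Pr(genotype aa) from the four-way Pr(genotype AA)."""
    return Fraction(1, 2) * p4_AA


def eight_way_haplotype_aa(r, p4_AA):
    """Pr(haplotype aa): nonrecombinant in the extra meiosis, (1 - r)/2."""
    return (1 - r) / 2 * p4_AA


def eight_way_haplotype_ab(r, p4_AA):
    """Pr(haplotype ab): recombinant in the extra meiosis, r/2."""
    return r / 2 * p4_AA


def eight_way_haplotype_cg(p4_BD):
    """Pr(haplotype cg): independent draws at the two loci, 1/2 each."""
    return Fraction(1, 4) * p4_BD


r = Fraction(1, 10)
p4_AA_fixed = Fraction(1, 4) / (1 + 6 * r)  # four-way Pr(AA) at fixation
print(eight_way_haplotype_aa(r, p4_AA_fixed), eight_way_haplotype_ab(r, p4_AA_fixed))
```

The four eight-way haplotypes *aa*, *bb*, *ab*, and *ba* together recover the four-way probability of *AA*, and at fixation the eight nonrecombinant haplotypes sum to (1 − *r*)/(1 + 6*r*), consistent with a recombinant fraction of 7*r*/(1 + 6*r*) for eight-way RIL by sibling mating.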

Table S22 contains a complete prescription for the calculation of two-locus autosomal diplotype probabilities for intermediate generations in the construction of eight-way RIL, on the basis of the corresponding probabilities for four-way RIL. The first column contains the possible diplotype patterns. The second column contains the numbers of diplotype states corresponding to each pattern. To calculate the probability of the pattern in the first column, for an eight-way cross, take the corresponding probability for the pattern in the third column, for a four-way cross, multiplied by the value in the fourth column.

Table S23 contains a similar prescription for calculating probabilities of the two-locus X chromosome diplotype in the female at intermediate generations in the construction of eight-way RIL. A key feature to note is that, at a single X chromosome locus at generation G_{2}, the female is either *ac* or *bc*, while the male is hemizygous *e* or *f* (see Figure S2C).

## Discussion

I sought to calculate the two-locus diplotype probabilities for a random individual drawn from generation G_{2}:F_{k} in the formation of eight-way RIL by sibling mating, as these could form the basis for an HMM for reconstructing the genotype probabilities in pre-CC individuals given incompletely informative marker data. While I was not able to obtain closed-form solutions for these probabilities, the results in Tables S10–S21 provide a recipe for numerical calculations.

A more careful reading of Haldane and Waddington (1931) would have indicated that closed-form solutions for these probabilities would not be possible. As they state (Haldane and Waddington 1931, p. 367), regarding the calculation of related probabilities for two-way RIL by sibling mating, “These equations can, in part at least, be reduced to quartics, but at least one quartic is irreducible. Hence only numerical calculation is practicable.”

Moreover, Liu *et al.* (2010) described a general HMM for the treatment of complex pedigrees with inbreeding, appropriate for pre-CC individuals, that does not require the explicit derivation of these two-locus probabilities. Further, with the high-density genotype data available on the pre-CC mice (Aylor *et al.* 2011), a relatively simple HMM, such as that in HAPPY (Mott *et al.* 2000), which does not take formal account of the varying recombination patterns as a function of cross direction or generation, is likely sufficient for genotype reconstruction.

Nevertheless, an HMM making use of these calculations, as well as functions for simulating partially inbred lines, will be implemented in a future version of R/qtl (Broman *et al.* 2003). With this implementation, we will be able to assess the relative advantages of such a specially tailored HMM over both the more general approach of Liu *et al.* (2010) and the simpler approach of Mott *et al.* (2000).

In practice, the genetic analysis of RIL generally proceeds prior to full inbreeding, with calculations based on the haplotype frequencies at fixation, and with remaining heterozygous genotypes often omitted and treated as missing. The calculations herein might be used to deal with residual heterozygosity, but they are unlikely to give much improvement in the analysis. After only a few generations, the haplotype probabilities are quite close to the values at fixation. Moreover, the consideration of heterozygotes can lead to problems in the QTL analysis if the frequency of heterozygotes is low, although this can be alleviated by an assumption of additive allele effects at a QTL. Finally, the remaining regions of heterozygosity in RIL may have survived due to selection, whereas these calculations rely on assumptions of no selection or mutation and so are not appropriate to capture such phenomena.

I derived closed-form solutions for a number of quantities that are of considerable interest. The single-locus genotype probabilities (Table 2) could be useful in efforts to identify regions under selection, through the comparison of observed to expected genotype frequencies. The fixation probability (Table 3 and Figure 2) can be interpreted as the expected proportion of the genome that is fixed. The map expansion results (Table 5 and Figure 3) indicate the accumulation of recombination events over generations, which can be valuable for study design: If an investigator intervenes at an intermediate generation to speed up the process toward inbreeding, what proportion of the final recombination breakpoints might be lost, and so to what extent might mapping precision be eroded?

The single-locus genotype probabilities (Table 2) were derived by brute force, using the full transition matrix for the pair of genotypes from generation to generation. The approach of Kimura (1963) could also be used in these cases, to provide considerable simplification. For example, in the single-locus autosome case, one may use a 3 × 3 recursion matrix in place of the 13 × 13 matrix in Table S4.

There is also a simpler way to calculate the fixation probabilities for four-way RIL by sibling mating. One may consider a single locus in a two-way RIL, but starting at a different state (*e.g.*, start at *AB* × *BB*, to calculate the chance that a four-way RIL is fixed at allele *A*). This requires the consideration of a 6 × 6 transition matrix.
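The 6 × 6 calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the results in this article: the function names are hypothetical, the six states are the unordered mating types at a single autosomal locus (sex is ignored, which is an assumption that is reasonable only for an autosome), and `k` simply counts steps of sibling mating from the *AB* × *BB* start rather than following the generation labels used elsewhere in the article.

```python
from fractions import Fraction
from itertools import product

GENOTYPES = ("AA", "AB", "BB")
# Unordered mating types at one autosomal locus with two alleles (6 states).
STATES = [("AA", "AA"), ("AA", "AB"), ("AA", "BB"),
          ("AB", "AB"), ("AB", "BB"), ("BB", "BB")]


def gamete_dist(genotype):
    """Probability that a parent with this genotype transmits allele A or B."""
    p_a = Fraction(genotype.count("A"), 2)
    return {"A": p_a, "B": 1 - p_a}


def offspring_dist(g1, g2):
    """Genotype distribution of one offspring from parents g1 and g2."""
    dist = {g: Fraction(0) for g in GENOTYPES}
    for (a1, p1), (a2, p2) in product(gamete_dist(g1).items(),
                                      gamete_dist(g2).items()):
        dist["".join(sorted(a1 + a2))] += p1 * p2
    return dist


def transition_matrix():
    """6 x 6 matrix: parental mating type -> mating type of two random sibs."""
    index = {s: i for i, s in enumerate(STATES)}
    P = [[Fraction(0)] * len(STATES) for _ in STATES]
    for i, (g1, g2) in enumerate(STATES):
        d = offspring_dist(g1, g2)
        # The two sibs are drawn independently from the same distribution.
        for (o1, p1), (o2, p2) in product(d.items(), d.items()):
            P[i][index[tuple(sorted((o1, o2)))]] += p1 * p2
    return P


def prob_fixed_at_A(k, start=("AB", "BB")):
    """Probability of being in state AA x AA after k steps of sib mating."""
    P = transition_matrix()
    v = [Fraction(int(s == start)) for s in STATES]
    for _ in range(k):
        v = [sum(v[i] * P[i][j] for i in range(len(STATES)))
             for j in range(len(STATES))]
    return v[0]
```

Under the symmetry among the four founder alleles, the overall four-way fixation probability after `k` steps would be `4 * prob_fixed_at_A(k)`; exact fractions are used so that the output can be compared directly against symbolic results.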

There is an interesting connection between this work and the Fibonacci sequence {0, 1, 1, 2, 3, 5, 8, …}, defined by the recursive formula $x_k = x_{k-1} + x_{k-2}$, with starting values $x_0 = 0$ and $x_1 = 1$ (see Graham *et al.* 1994, Section 6.6). To obtain a closed-form solution for $x_k$, one may write the recursion in matrix form and apply the same techniques used herein to obtain $x_k = [\varphi^k - (1 - \varphi)^k]/\sqrt{5}$, where $\varphi$ is the “golden ratio”, $\varphi = (1 + \sqrt{5})/2$. Note that $\sqrt{5}$ appears numerous times in the results. (When I first derived the single-locus probabilities in Table 2, I was confused about why the results involve $\sqrt{5}$, when the numbers are all clearly rational.)

The great effort expended here to derive symbolic results raises the question of the relative merits of computer simulations, numeric calculations, and symbolic calculations. Simulations are most flexible and are generally simpler to obtain, but lack precision. Numeric calculations can be precise, but can be computationally intensive. Symbolic results are more general than numeric calculations, can enable quicker calculations in software, and have the potential to provide more clear insight. Ultimately, the effort toward symbolic results largely serves to satisfy a personal compulsion.
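The Fibonacci example gives a compact illustration of the contrast between numeric and symbolic calculation. The following Python sketch (not part of the original analysis; the function names are illustrative) computes the *k*th Fibonacci number both by iterating the recursion, which is equivalent to repeated multiplication by the matrix with rows (1, 1) and (1, 0), and by the standard closed form involving the golden ratio. The closed form is evaluated in floating point here, so the comparison is only meaningful for modest *k*.

```python
import math


def fib_recursion(k):
    """x_k by iterating (x_{j+1}, x_j) -> (x_{j+1} + x_j, x_{j+1}),
    i.e., repeated multiplication by the matrix [[1, 1], [1, 0]]."""
    a, b = 1, 0  # (x_1, x_0)
    for _ in range(k):
        a, b = a + b, a
    return b


def fib_closed_form(k):
    """x_k from the closed form [phi^k - (1 - phi)^k] / sqrt(5).

    Evaluated in floating point and rounded, so this is an approximation
    that is only reliable for modest k.
    """
    sqrt5 = math.sqrt(5.0)
    phi = (1.0 + sqrt5) / 2.0
    return round((phi ** k - (1.0 - phi) ** k) / sqrt5)


# The two routes agree for small k, even though the closed form passes
# through the irrational sqrt(5) while every x_k is an integer.
agree = all(fib_recursion(k) == fib_closed_form(k) for k in range(40))
```

The same pattern appears in the results above: the irrational terms arise from the eigenvalues of the recursion matrix and cancel in the final probabilities.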

## Acknowledgments

I thank James Crow, Bret Larget, and Timo Seppäläinen for valuable discussions and Maria Colomé-Tatché, Frank Johannes, Lauren McIntyre, Tracey DePellegrin Connelly, and two anonymous reviewers for comments on the manuscript. This work was supported in part by National Institutes of Health grant GM074244.

## Footnotes

*Edited by Lauren M. McIntyre, Dirk-Jan de Koning, and 4 dedicated Associate Editors*

- Received July 11, 2011.
- Accepted September 10, 2011.

- Copyright © 2012 by the Genetics Society of America

Available freely online through the author-supported open access option.