## Abstract

Previous studies have enabled exact prediction of probabilities of identity-by-descent (IBD) in random-mating populations for a few loci (up to four or so), with extension to more using approximate regression methods. Here we present a precise predictor of multiple-locus IBD using simple formulas based on exact results for two loci. In particular, the probability of non-IBD $X_{ABC}$ at each of ordered loci $A$, $B$, and $C$ can be well approximated by $X_{ABC} = X_{AB}X_{BC}/X_B$ and generalizes to $X_{123\ldots k} = X_{12}X_{23}\cdots X_{k-1,k}/X^{k-2}$, where $X$ is the probability of non-IBD at each locus. Predictions from this chain rule are very precise with population bottlenecks and migration, but are rather poorer in the presence of mutation. From these coefficients, the probabilities of multilocus IBD and non-IBD can also be computed for genomic regions as functions of population size, time, and map distances. An approximate but simple recurrence formula is also developed, which generally is less accurate than the chain rule but is more robust with mutation. Used together with the chain rule it leads to explicit equations for non-IBD in a region. The results can be applied to detection of quantitative trait loci (QTL) by computing the probability of IBD at candidate loci in terms of identity-by-state at neighboring markers.

In a recent article, formulas for computing probabilities of identity-by-descent (IBD) at multiple loci in random-mating populations were obtained (Hill and Weir 2007) by extending methods of Weir and Cockerham (1969, 1974) for a haploid model. Recurrence equations were presented for multilocus non-IBD, from which IBD can be computed; but the number of terms involved quickly becomes impracticably large to compute. For example, prediction of nonidentity at three loci requires recurrence equations for a total of 16 non-IBD measures defined for loci sampled on two, three, four, five, and six different haplotypes. For four loci the number of measures rises to 139 (Hill and Weir 2007). Hernández-Sánchez *et al*. (2004) have developed approximations based on multiple regression to compute IBD at multiple loci from that at two loci, but the formulas become increasingly less tractable and accurate as the number of loci increases.

Here we develop a straightforward method (the chain rule) for predicting probabilities of multilocus non-IBD, and thus IBD, which uses exact results only on two-locus non-IBD probabilities. Assuming a known population history, this predictor can be very precise for many loci and can enable IBD for a whole chromosome region to be computed. We also develop simple approximate recurrence equations that are generally less precise, except in the presence of mutation.

An application of multiple-locus extensions of Wright's inbreeding coefficient is in gene or quantitative trait loci (QTL) mapping on the basis of the association between phenotypic similarity of individuals and shared IBD at a particular genomic region (Meuwissen *et al*. 2002; Hernández-Sánchez *et al*. 2006). The magnitude of IBD at a QTL is computed from the identity-by-state (IBS) of neighboring marker loci, but to do so it is necessary to know the extent of joint IBD across the QTL and markers relative to some reference population.

## METHODS

#### Background:

##### Definitions:

Let $A$, $B$, and $C$ be three loci located in that order on a chromosome, and denote by $F_A$, $F_{AB}$, and $F_{ABC}$ probabilities of IBD at locus $A$, loci $A$ and $B$, and loci $A$, $B$, and $C$, respectively. Similarly, let $X_A$, $X_{AB}$, $X_{ABC}$ denote the probabilities of non-IBD at the corresponding loci; *i.e*., $X_{AB}$ is the probability that neither $A$ nor $B$ is IBD. These quantities refer to the case where identity is examined at all loci on a pair of haplotypes. There are other measures when considering more than two haplotypes. For example, two IBD loci can also be sampled in three and four different haplotypes (Weir and Cockerham 1974).

The IBD and non-IBD probabilities are related at any generation by, for example,

$$F_A = 1 - X_A, \tag{1a}$$

$$F_{AB} = 1 - X_A - X_B + X_{AB}, \tag{1b}$$

$$F_{ABC} = 1 - X_A - X_B - X_C + X_{AB} + X_{AC} + X_{BC} - X_{ABC} \tag{1c}$$

(Hill and Weir 2007). In general, $k$-locus IBD and non-IBD measures are related as

$$F_{1\ldots k} = 1 - \sum_{i} X_i + \sum_{i<j} X_{ij} - \sum_{i<j<h} X_{ijh} + \cdots + (-1)^k X_{1\ldots k}, \tag{1d}$$

where $F_{1\ldots k}$ and $X_{1\ldots k}$ denote, respectively, the probabilities of IBD and non-IBD for all ordered loci 1 to $k$ on two haplotypes. Equation 1d is an example of the inclusion–exclusion principle. The multilocus measures used here, which are extensions of the Wright–Malécot definitions of inbreeding such that pairs of genes at each of two loci may be IBD even though identity at each locus traces back to different ancestors, differ from the “chromosome segment homozygosity” defined by Hayes *et al*. (2003), which defines identity of haplotypes back to a common ancestral haplotype *without* intervening recombination.
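The inclusion–exclusion relation can be sketched in a few lines of Python. This is only an illustration: the function name `ibd_from_nonibd` and the dictionary layout (a sorted tuple of locus indices mapped to a joint non-IBD probability) are assumptions made for this sketch, not part of the original analysis. It sums over all $2^k$ subsets, so it is practical only for small $k$.

```python
from itertools import combinations


def ibd_from_nonibd(k, nonibd):
    """Joint IBD at loci 0..k-1 from non-IBD probabilities (inclusion-exclusion).

    `nonibd` maps a sorted tuple of locus indices to the probability that
    every locus in the tuple is non-IBD. The empty set has probability 1.
    """
    total = 0.0
    for size in range(k + 1):
        for subset in combinations(range(k), size):
            x = 1.0 if size == 0 else nonibd[subset]
            total += (-1) ** size * x
    return total


# Two limiting cases with single-locus non-IBD X = 0.6 (so F = 0.4).
x = 0.6
subsets = [s for n in range(1, 4) for s in combinations(range(3), n)]
independent = {s: x ** len(s) for s in subsets}  # unlinked loci
linked = {s: x for s in subsets}                 # completely linked loci
```

For independent loci the result is $F^3$ and for completely linked loci it is $F$, which gives a quick sanity check on the signs.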

The following parameters are also used and assumptions made. All genes in the founder population (generation $t = 0$) are assumed to be non-IBD at all loci; *i.e*., $X_{A(0)} = X_{AB(0)} = \ldots = 1$. The effective population size is $N$ diploids ($2N$ genes) and is constant over generations. There is random mating (with or without selfing, as specified) and there is no selection at or near the identified loci. The recombination fraction between loci $A$ and $B$ is $r_{AB}$ and there is no crossover interference. The map length of a region of chromosome is denoted $l$ (in morgans). The rate of mutation at each locus is $u$, where any mutant gene is assumed to be non-IBD to all existing genes at that locus in the population (*i.e*., infinite-alleles model), and the rate of migration is $m$, where migrant haplotypes come from an infinitely large and unrelated population, such that in the generation following migration, genotypes comprising one or two migrant haplotypes are non-IBD at all loci. Also we define $R_{AB} = 4Nr_{AB}$, $L = 4Nl$, $U = 4Nu$, and $M = 4Nm$.

##### Exact method:

By extending methods of Weir and Cockerham (1974), Hill and Weir (2007) give an exact way to predict probabilities of multilocus non-IBD, and from that IBD, by transition matrix iteration over generations, assuming a haploid model. Although the method is feasible for four loci it rapidly becomes unwieldy with more, so we review and consider alternative methods to predict identity for multiple loci from results for fewer loci, *e.g*., $F_{ABC}$ from $F_{AB}$ and $F_{BC}$.

##### Regression method:

Hernández-Sánchez *et al*. (2004) proposed a regression analysis to predict probabilities of identity at three and four loci from those on two loci given by Weir and Cockerham (1974). For example, $F_{AB}$, $F_{AC}$, and $F_{BC}$ are computed each generation, and from these the regression coefficients of identity at locus $B$ given identity at $A$ are calculated; for example, $\beta_{B.A} = \mathrm{Cov}(F_A, F_B)/\mathrm{Var}(F_A) = (F_{AB} - F_AF_B)/[F_A(1 - F_A)]$. Consequently the conditional probability $F_{B|AC}$ of identity at locus $B$ given identity at $A$ and $C$ is predicted from a partial regression equation including terms in $\beta_{B.A}$ and $\beta_{B.C}$, and thus the three-locus identity $F_{ABC} = F_{B|AC}F_{AC}$ (Hernández-Sánchez *et al*. 2004, Equation 3). On the basis of this three-locus prediction, but still using exact results for only two loci, Hernández-Sánchez *et al*. extended the regression method to predict identity at four loci in a two-step process. The method gave good predictions for three- and four-locus identity obtained by simulation, for example, for three- and four-locus inbreeding coefficients in random-mating diploid populations for values of $R = 4$ between adjacent loci (*e.g*., $N = 10$, $r = 0.1$) and 8 ($N = 20$, $r = 0.1$). Predictions were poorer for four loci or if the conditional identities were predicted for loci outside ($C$ from $A$ and $B$) rather than between the two reference loci ($B$ from $A$ and $C$). Their method could be extended by standard multiple-regression methods to make more precise predictions for five or more loci using the results given by Hill and Weir (2007) for three or four loci, but computation of the partial regression coefficients rapidly becomes unwieldy as the number of loci increases.

#### Conditional (chain-rule) method for multilocus non-IBD:

##### Principle:

The regression method of Hernández-Sánchez *et al*. (2004) does not utilize the ordering of the loci on the chromosome directly, *i.e*., the fact that for loci ordered $A$, $B$, $C$, …, a recombination between $A$ and $B$ usually also implies a recombination between $A$ and $C$. This suggests alternative methods for predicting the multilocus (non)identities by utilizing such information. Therefore a “natural” predictor of the three-locus nonidentity is to approximate the joint probability $X_{ABC} = X_{AB}X_{C|AB}$ by $\hat{X}_{ABC} = X_{AB}X_{C|B}$, where $X_{C|B} = X_{BC}/X_B$ is the conditional probability of nonidentity at locus $C$ given nonidentity at the adjacent locus $B$. This implies that knowledge of IBD probability at the more distant $A$ adds no further information and gives the predictor

$$\hat{X}_{ABC} = \frac{X_{AB}X_{BC}}{X_B}. \tag{2}$$

In the absence of mutation it turns out that (2) is remarkably precise, as shown by examples in Table 1 in which predictions of $X_{ABC}$ are compared to exact values (Hill and Weir 2007), with most predictions deviating <0.1% in absolute terms and 1% in relative terms. These are better than those based on the regression method of Hernández-Sánchez *et al*. (2004), particularly at higher values of $R$. For example, for $N = t = 100$, the predictions from the regression method are 0.6058 (*i.e*., exact), 0.5796, 0.5159, and 0.3802 (an absolute deviation of almost 1% in $X_{ABC}$) for $R_{AB} = R_{BC} = 0$, 1/4, 1, and 4, respectively (*cf*. Table 1). It is important to note that, unlike in the regression method, the ordering of the loci is important for the chain rule; for example, $X_{AB}X_{AC}/X_A$ is a very poor predictor of $X_{ABC}$.

In view of the high predictive value of Equation 2, unsurprisingly the natural extension to four loci, $\hat{X}_{ABCD} = X_{AB}X_{BC}X_{CD}/(X_BX_C)$, is also a good predictor (results not shown). For $k$ loci, this “chain-rule” predictor of multilocus nonidentity, $\hat{X}_{1\ldots k}$, from adjacent two-locus and one-locus nonidentities $X \equiv X_i$, which are assumed to be the same at each locus, is

$$\hat{X}_{1\ldots k} = \frac{X_{12}X_{23}\cdots X_{k-1,k}}{X^{k-2}}, \tag{3}$$

and for equally spaced markers

$$\hat{X}_{1\ldots k} = \frac{X_{12}^{\,k-1}}{X^{k-2}} = X\left(\frac{X_{12}}{X}\right)^{k-1}. \tag{4}$$
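As a minimal sketch of the chain rule, the following Python function multiplies the adjacent two-locus non-IBD probabilities and divides by the single-locus probabilities of the interior loci. The function name and argument layout are assumptions made for illustration, and allowing a different single-locus value at each interior locus is a small generalization in the spirit of the three-locus form $X_{AB}X_{BC}/X_B$.

```python
import math


def chain_rule_nonibd(pair_nonibd, single_nonibd):
    """Chain-rule predictor of joint non-IBD at k ordered loci.

    pair_nonibd[i]   : two-locus non-IBD for adjacent loci i and i+1 (k-1 values)
    single_nonibd[i] : single-locus non-IBD at locus i (k values)
    """
    if len(single_nonibd) != len(pair_nonibd) + 1:
        raise ValueError("need k single-locus and k-1 adjacent-pair values")
    if not pair_nonibd:  # a single locus: nothing to chain
        return single_nonibd[0]
    interior = single_nonibd[1:-1]  # every locus except the two ends
    return math.prod(pair_nonibd) / math.prod(interior)


# Five equally spaced loci with illustrative values X = 0.6 and X_12 = 0.5.
x_five = chain_rule_nonibd([0.5] * 4, [0.6] * 5)
```

With equal spacing the result matches the closed form $X(X_{12}/X)^{k-1}$; the numerical inputs above are illustrative values, not results from the study.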

Examples of predictions of multilocus nonidentity computed from Equation 4 are compared with results obtained by stochastic simulation using Wright–Fisher sampling in Figure 1, where it is seen that there is excellent correspondence for these examples in which there is a population of constant size with no mutation or migration. The method can be used for any mating system, *e.g*., a haploid (Table 1) or a diploid with selfing included (Figure 1), for nonconstant population size, and in the presence of migration or mutation. As we show subsequently, of these only mutation causes significant errors.

##### Regional non-IBD:

Using Equation 4, the probability $X(l)$ that *all* sites in a region of length $l$ morgans are non-IBD can be predicted by dividing it into very many, say $s = k - 1$, small equally sized segments and taking the limit

$$\hat{X}(l) = \lim_{s\to\infty} X\left[\frac{X_{(l/s)}}{X}\right]^{s}, \tag{5}$$

where $X_{(l/s)}$ denotes the probability of joint non-IBD of a pair of markers $l/s$ map units apart. This probability approaches that for loci with recombination fraction $r = l/s$ as $s \to \infty$. Equation 5 can therefore be written as

$$\hat{X}(l) = X\lim_{s\to\infty}\left[1 + \frac{l}{s}\,\frac{X_{(l/s)} - X}{X\,(l/s)}\right]^{s}.$$

The limits are

$$\lim_{r\to 0}\frac{X_{(r)} - X}{r} = \left.\frac{\partial X_{(r)}}{\partial r}\right|_{r=0},$$

the derivative of the two-locus non-IBD probability with respect to the recombination fraction $r$ between the loci evaluated at $r = 0$, and it is convenient to define

$$\gamma = \frac{1}{X}\left.\frac{\partial X_{(r)}}{\partial r}\right|_{r=0}. \tag{6}$$

Using the definition of the exponential function, Equation 5 reduces to

$$\hat{X}(l) = Xe^{\gamma l}. \tag{7}$$

Equation 7 can also be derived from the incremental change in $X(l)$ as $l$ is increased by an infinitesimally small amount and integrating the resultant “growth” equation.

The derivatives in Equation 6 (which are negative) can be evaluated numerically at any generation by iteration of the transition matrix for a small value of *r* and computing γ as [*X*_{(r)}/*X* − 1]/*r*. To ensure there are no errors due to rounding or inclusion of higher-order terms, consistency can be checked using a range of values of *r* (we found consistency for *r* between 10^{−4} and 10^{−7}). Equation 7 can also be expressed in terms of *L* = 4*Nl* if the derivative is similarly rescaled. Examples are given in Figure 2. In these examples mutation is assumed to be absent. Indeed, to include mutation it would be necessary to define a mutation rate per unit map length as a continuous function, and in view of the limited accuracy of the chain rule in the presence of mutation, we do not consider this extension to the analysis.

##### Multilocus IBD:

$F_{ABC}$, $F_{ABCD}$, etc., can be predicted from Equations 1–3 directly. For example, from Equations 1c and 2

$$\hat{F}_{ABC} = 1 - X_A - X_B - X_C + X_{AB} + X_{AC} + X_{BC} - \frac{X_{AB}X_{BC}}{X_B}. \tag{8}$$

A similar simple conditional argument to that used to obtain Equation 2 would lead to a different prediction, $F_{ABC} = F_{AB}F_{BC}/F_B$. This prediction equation for $F_{ABC}$ does not hold because the conditional probability $F_{BC|AB}$ does not equal $F_{BC|B}$, as the regions $AB$ and $BC$ may be IBD for different founder haplotypes. In contrast, replacing non-IBD by IBD coefficients using Equations 1a and 1b and rearranging Equation 8 gives

$$\hat{F}_{ABC} = F_{AC} - \frac{(F_A - F_{AB})(F_C - F_{BC})}{1 - F_B}. \tag{9}$$

Thus for the chain rule in terms of IBD, the term on the right of Equation 9 is the overall probability of identity at $A$ and $C$ less situations in which there is nonidentity at $B$ but identity at $A$ and $C$.

Prediction of $k$-locus IBD from non-IBD using Equation 1d involves $2^k - 1$ terms, and becomes computationally impractical for evaluating IBD over multiple sites (*e.g*., 6 hr of computation for $k = 30$ with an ∼1 Mflop computer). There is, however, a very efficient algorithm for adding successive loci in the chain. Note that

$$F_{123} - F_{12} = -X_3 + X_{13} + X_{23} - X_{123}$$

from Equations 1b and 1c, and from Equation 2

$$\hat{F}_{123} - F_{12} = X_{13} - X - \frac{(X_{12} - X)X_{23}}{X}.$$

Similarly

$$\hat{F}_{1234} - \hat{F}_{123} = X_{14} - X - \frac{(F_{12} - F)X_{24}}{X} - \frac{(\hat{F}_{123} - F_{12})X_{34}}{X}.$$

Let $\Delta_1 \equiv F$, $\Delta_2 \equiv F_{12} - F$, $\Delta_3 \equiv \hat{F}_{123} - F_{12}$, and, in general, $\Delta_i \equiv \hat{F}_{1\ldots i} - \hat{F}_{1\ldots i-1}$. Then

$$\Delta_k = X_{1k} - X - \frac{1}{X}\sum_{i=2}^{k-1}\Delta_iX_{ik} \tag{10}$$

for $k > 2$, and

$$\hat{F}_{1\ldots k} = \sum_{i=1}^{k}\Delta_i. \tag{11}$$

Equation 10, in which one locus is added at each iteration to compute the change in multilocus IBD, involves $k$ terms when the $k$th locus is added and thus a total of order $k^2$ in all. This contrasts with the $2^k - 1$ needed in Equation 1d, such that the computation is feasible up to thousands of loci (*e.g*., 10 sec computation for $k = 2000$ with the same computer). To predict the probability of IBD on a region assuming equal recombination fractions between consecutive loci, it requires the evaluation only of $k - 1$ values of $X_{ik}$, $i = 1, \ldots, k$, and it is also possible to predict regional IBD simply by estimating IBD for a very large number of sites.
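The locus-by-locus algorithm can be sketched as follows, under the assumption that the increment on adding locus $k$ is $\Delta_k = X_{1k} - X - (1/X)\sum_{i=2}^{k-1}\Delta_iX_{ik}$ with $\Delta_1 = F = 1 - X$, which is what the chain rule implies when substituted into the inclusion–exclusion sum. The function names and the two-locus function `toy_pair` are hypothetical; `toy_pair` is an arbitrary smooth curve between $X$ and $X^2$, not a population-genetic result. A brute-force sum over all subsets is included only to check the recursion.

```python
import math
from itertools import combinations

X_SINGLE = 0.6  # illustrative single-locus non-IBD, the same at every locus


def toy_pair(i, j):
    """Hypothetical two-locus non-IBD for loci i < j, decaying from X toward X**2."""
    return X_SINGLE ** 2 + (X_SINGLE - X_SINGLE ** 2) * math.exp(-0.3 * (j - i))


def multilocus_ibd(k, x, pair):
    """Chain-rule IBD at loci 1..k, adding one locus at a time (order k**2 work)."""
    delta = {1: 1.0 - x}
    for n in range(2, k + 1):
        carry = sum(delta[i] * pair(i, n) for i in range(2, n))
        delta[n] = pair(1, n) - x - carry / x
    return sum(delta.values())


def brute_force_ibd(k, x, pair):
    """Same quantity by inclusion-exclusion over all 2**k - 1 non-empty subsets."""
    total = 1.0
    for size in range(1, k + 1):
        for s in combinations(range(1, k + 1), size):
            if size == 1:
                xs = x
            else:  # chain rule applied to the ordered subset s
                xs = math.prod(pair(a, b) for a, b in zip(s, s[1:])) / x ** (size - 2)
            total += (-1) ** size * xs
    return total
```

The two functions should agree to rounding error for small $k$, while only the recursion remains usable for hundreds or thousands of loci.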

A comparison between predictions of multilocus IBD from simulation and use of Equation 11 is given in Figure 3 for a population of constant size in the absence of mutation or migration. In view of the excellent predictions of non-IBD shown in Figure 1, for example, the fit of IBD is to be expected. Results for regional IBD are given for a wider range of parameters in Figure 4. Figures 3 and 4 also show how slowly the multilocus IBD increases with generation if many loci are considered, which implies that there can be small regions of the genome non-IBD even when most nearby sites are IBD.

##### Mutation, migration, and population bottlenecks:

The chain-rule predictions of multilocus non-IBD probabilities, and from those of IBD probabilities, can be undertaken for any random-mating system (*e.g*., in haploid and monoecious or dioecious diploid populations with/without avoidance of selfing) by using an appropriate transition matrix to compute the two-locus non-IBD (Weir and Cockerham 1974).

Changes in population size, for example due to bottlenecks, are easily accounted for in the chain rule by using the appropriate value of $N$. Migration, under the continent-to-island model, increases the probability of non-IBD. This can be accounted for by replacing $\mathbf{x}_t$ by $\mathbf{x}_t + \mathbf{m}_t$ in Equation 7 of Hill and Weir (2007) [assuming for simplicity that the migration rate $m$ is small so terms of $O(m^2)$ can be ignored], where the elements of $\mathbf{m}_t$ are expressed in terms of $x_i$, the $i$th component of the vector $\mathbf{x}_t$ of two-, three-, and four-haplotype coefficients of non-IBD. Results for continent-to-island and also multiple small-island models in Figure 5 show an excellent level of prediction of simulated values of $F_{ABC}$ using the chain rule, which also implies that the chain rule would apply within a nonrandom mating population, for example, incorporating avoidance of mating of relatives.

Mutation is the only evolutionary force considered in this study for which the chain rule gave poor predictions (Figure 5). Although the departure is small with realistic $u$ ($<10^{-5}$) and few loci in small populations, it worsens as mutation rate ($U$) increases and as linkage becomes very tight, as do predicted regional non-IBD and IBD probabilities (results not shown). A simple explanation of why mutation breaks the chain rule is that the adjacent locus does not contain all the information about the non-IBD status at a given locus (with mutation $X_{A|BC} > X_{A|B}$ and without $X_{A|BC} = X_{A|B}$). In the presence of mutation, information about the IBD status at locus $C$ is useful in predicting the status at $A$ because $B$ may be non-IBD due to mutation and, except for this mutation, the chromosome region including $A$ and $C$ would be IBD. The chain rule assumes a first-order Markov chain that is violated in the presence of mutation because mutations occur independently of position (so that an IBD locus can be next to a mutant locus). In contrast, migration affects the whole string of loci, so a subset contains all the information (which will subsequently suffer recombination in the standard fashion). A formal analysis demonstrating the bias due to mutation on the chain rule for the case of completely linked loci is in the next section.

#### Simple recurrence relations:

##### Principle:

The recurrence equations for non-IBD at two loci depend on terms in two-, three-, and four-haplotype probabilities in previous generations (Weir and Cockerham 1974; Hill and Weir 2007), although some may have very small coefficients in the recurrence equations. Numerical examples (not shown), however, indicate that these three- and four-haplotype identities are of similar magnitude to each other over quite a wide range of parameters, as are corresponding terms for three or more loci. Thus, if genes at one of the pair of loci $A$ and $B$ are sampled from different haplotypes, the probability of (non-)IBD depends little on whether the other $A$ and $B$ genes are sampled from one or two more haplotypes. In addition, if the two loci are not very tightly linked, the probability of two-locus (non-)IBD for genes sampled on four different haplotypes is slightly greater than $X_AX_B$, *i.e*., the joint probability for two independent loci. Hence approximate recurrence predictions of non-IBD for two linked loci can be obtained solely by considering the probabilities on a pair of haplotypes and at individual loci. Similar arguments apply for more loci. Thus for two loci, this prediction of the two-locus non-IBD, $X^*_{AB}$, satisfies Equation 12. If $r_{AB}$ is small and $N$ is large, Equation 12 reduces to

$$X^*_{AB,t+1} = \left(1 - \frac{1}{2N} - 2r_{AB}\right)X^*_{AB,t} + 2r_{AB}X_{A,t}X_{B,t}, \tag{13}$$

where $X_{A,t} = X_{B,t} = [1 - 1/(2N)]^t \sim \exp(-t/2N)$. The first term in Equations 12 and 13 denotes sampling two different and nonrecombined haplotypes that are non-IBD at both loci and the second denotes the sampling of recombinant gametes that are non-IBD at both loci. Equation 13 extends naturally to more loci, allowing for recombination between $A$ and $B$ and between $B$ and $C$, and ignoring the chance of double recombinants. For example,

$$X^*_{ABC,t+1} = \left(1 - \frac{1}{2N} - 2r_{AB} - 2r_{BC}\right)X^*_{ABC,t} + 2r_{AB}X_{A,t}X^*_{BC,t} + 2r_{BC}X^*_{AB,t}X_{C,t}. \tag{14}$$

The two-locus terms in Equation 14 can be predicted from Equation 13.
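The simple recurrences can be iterated directly. The sketch below assumes the small-$r$, large-$N$ forms $X^*_{AB,t+1} = (1 - 1/2N - 2r_{AB})X^*_{AB,t} + 2r_{AB}X_{A,t}X_{B,t}$ for two loci and the analogous three-locus form with terms $2r_{AB}X_{A,t}X^*_{BC,t}$ and $2r_{BC}X^*_{AB,t}X_{C,t}$, starting from non-IBD founders. The function name is hypothetical and the parameter values in the usage line are illustrative.

```python
def iterate_recurrences(n, r_ab, r_bc, generations):
    """Iterate the approximate two- and three-locus non-IBD recurrences.

    Returns (X, X*_AB, X*_BC, X*_ABC) after `generations` generations,
    with no mutation or migration and all values equal to 1 at t = 0.
    """
    decay = 1.0 - 1.0 / (2 * n)
    x = x_ab = x_bc = x_abc = 1.0  # single-locus X is the same at A, B and C
    for _ in range(generations):
        # All right-hand sides use generation-t values.
        new_ab = (decay - 2 * r_ab) * x_ab + 2 * r_ab * x * x
        new_bc = (decay - 2 * r_bc) * x_bc + 2 * r_bc * x * x
        x_abc = ((decay - 2 * r_ab - 2 * r_bc) * x_abc
                 + 2 * r_ab * x * x_bc + 2 * r_bc * x_ab * x)
        x_ab, x_bc, x = new_ab, new_bc, decay * x
    return x, x_ab, x_bc, x_abc


# N = t = 100 with R = 4Nr = 2 between adjacent loci.
x, x_ab, x_bc, x_abc = iterate_recurrences(100, 0.005, 0.005, 100)
chain = x_ab * x_bc / x  # three-locus chain-rule value from the two-locus results
```

For these parameter values the iterated three-locus value and the chain-rule combination of the two-locus values differ only in the fourth decimal place, in line with the consistency of the two approaches up to higher-order terms.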

These are simple rather than necessarily precise predictors, but Equation 12 is exact if linkage is complete ($r_{AB} = 0$) or if loci are essentially independent ($R_{AB} \to \infty$). Evaluations using Equations 13 and 14 compared to exact methods (Hill and Weir 2007) are illustrated in Figure 6. The method is seen to give reasonably good predictions for much of the range of $R$ (0, 1/4, 1, 4, 16) and $t/N$ (0, 0.01, …, 4). This is probably because the second term in Equations 12 and 13 makes a small contribution when $r$ is very small, but $X_{A,t}X_{B,t}$ departs most from the actual probability when both loci are segregating; and when $r$ is large, it makes a larger contribution but is a better approximation of $X_{A,t}X_{B,t}$. Other examples (not shown) indicate that the approximation behaves relatively more poorly in small populations (say $N < 10$) than in large ones (say $N > 50$) for the same value of $R = 4Nr$, which is expected since sampling from three rather than four haplotypes is relatively more likely when $N$ is small. Similar results can be obtained using Equation 14 or alternatively by joint use of Equations 12 and 13 for loci pairs $AB$ and $BC$ together with the three-locus chain prediction (Equation 2). It can also be shown that Equations 12, 13, and 14 are consistent: *i.e*., replacing $X^*_{ABC,t}$ by $X^*_{AB,t}X^*_{BC,t}/X_{B,t}$ at $t$ and $t + 1$ satisfies Equation 14 if terms of $O(<1/N)$ are excluded.

##### Regional non-IBD:

Formulas for the two-locus non-IBD after integration with respect to time are derived in Appendix A in the case of no mutation or migration (Equation A1). This equation can then be used with the chain rule to obtain multilocus non-IBD and, as it can be differentiated explicitly (Equation A2), can be used with Equation 7 to obtain a remarkably simple formula for regional non-IBD (Equation A3),

$$\hat{X}(l) = X\exp\left\{4Nl\left[1 - e^{-t/2N} - \frac{t}{2N}\right]\right\}, \tag{15}$$

or, if $\Delta F = 1/2N$, then $\hat{X}(l) = X\exp[L(F - t\Delta F)]$, where $L = 4Nl$. Results in Figure 2 show that Equation 15 gives reasonably satisfactory predictions of regional IBD.

##### Bottlenecks, mutation, and migration:

These recurrence formulas (Equations 12–14) extend straightforwardly to include bottlenecks in population size by changing $N$ accordingly. Assume for simplicity that mutation rates ($u$) are the same at each locus and migration is at rate $m$ haplotypes from a completely unrelated and large population (*i.e*., continent-to-island model). [A more complete migration analysis, for example using a finite-island model, is more complicated (Vitalis and Couvet 2001) and beyond the scope of this article.] From Kimura and Crow (1964), the recurrence relation for a single locus is (ignoring higher-order terms)

$$X_{A,t+1} = \left(1 - \frac{1}{2N} - 2u - 2m\right)X_{A,t} + 2u + 2m.$$

For two loci the two-locus non-IBD arises if there is no mutation on two nonrecombinant haplotypes or a mutation at locus $B$ at haplotypes on which $A$ is non-IBD and vice versa. In the migration model used, an immigrant haplotype is non-IBD at all loci. Hence extending Equation 13 and similarly Equation 14 leads to

$$X^*_{AB,t+1} = \left(1 - \frac{1}{2N} - 2r_{AB} - 4u - 2m\right)X^*_{AB,t} + 2r_{AB}X_{A,t}X_{B,t} + 2u(X_{A,t} + X_{B,t}) + 2m, \tag{16}$$

$$X^*_{ABC,t+1} = \left(1 - \frac{1}{2N} - 2r_{AB} - 2r_{BC} - 6u - 2m\right)X^*_{ABC,t} + 2r_{AB}X_{A,t}X^*_{BC,t} + 2r_{BC}X^*_{AB,t}X_{C,t} + 2u(X^*_{AB,t} + X^*_{AC,t} + X^*_{BC,t}) + 2m. \tag{17}$$
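A short numerical sketch of the recurrences with mutation and migration follows. It assumes first-order forms in which the single-locus recurrence is $X_{t+1} = (1 - 1/2N - 2u - 2m)X_t + 2u + 2m$ and the two-locus recurrence adds terms $2u(X_A + X_B)$ and $2m$ to the no-mutation form while reducing the coefficient of $X^*_{AB,t}$ by $4u + 2m$. These forms are assumptions chosen to match the verbal description and the asymptotic values quoted in the text, and the function name is hypothetical.

```python
def iterate_with_mutation_migration(n, r, u, m, generations):
    """Iterate approximate one- and two-locus non-IBD with mutation and migration.

    Returns (X, X*_AB) after `generations` generations, starting from 1.
    Higher-order terms in 1/N, r, u and m are ignored.
    """
    a = 1.0 / (2 * n)
    x = x_ab = 1.0
    for _ in range(generations):
        # The two-locus update uses the generation-t single-locus value.
        new_ab = ((1 - a - 2 * r - 4 * u - 2 * m) * x_ab
                  + 2 * r * x * x + 4 * u * x + 2 * m)
        x = (1 - a - 2 * u - 2 * m) * x + 2 * u + 2 * m
        x_ab = new_ab
    return x, x_ab


# Complete linkage, U = 4Nu = 0.2, no migration, run close to equilibrium.
x_mut, x_ab_mut = iterate_with_mutation_migration(50, 0.0, 1e-3, 0.0, 5000)
# Complete linkage, M = 4Nm = 0.2, no mutation.
x_mig, x_ab_mig = iterate_with_mutation_migration(50, 0.0, 0.0, 1e-3, 5000)
```

With mutation only, the values converge to $U/(U + 1)$ and $[U/(U + 1)][2U/(2U + 1)]$; with migration only, both converge to $M/(M + 1)$, as expected when an immigrant haplotype is non-IBD at every locus.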

With mutation and migration included, asymptotic expectations as $t \to \infty$ are given in Appendix B. With complete linkage (so results are exact) and no migration the asymptotic value of the $k$-locus non-IBD probability based on iterating (17) reduces to $k!U^k$ for small values of $U$ (from Equation B2). In contrast, it reduces to $2^{k-1}U^{k}$ by using Equation 16 to obtain the two-locus non-IBD and then applying the chain rule. This illustrates the breakdown of the chain rule with mutation, whereas with migration and no mutation or recombination, the $k$-locus non-IBD asymptotes at $M/(M + 1)$ for any number of loci, satisfying the chain rule.
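The size of the chain-rule bias under mutation can be illustrated numerically. The sketch assumes that, with complete linkage and no migration, the asymptotic $k$-locus non-IBD is the product $\prod_{i=1}^{k} iU/(iU + 1)$, which agrees with the two-locus value $[U/(U + 1)][2U/(2U + 1)]$ and the small-$U$ limit $k!U^k$ quoted in the text; the function names are hypothetical.

```python
import math


def exact_asymptote(k, big_u):
    """Assumed asymptotic k-locus non-IBD: complete linkage, no migration."""
    return math.prod(i * big_u / (i * big_u + 1) for i in range(1, k + 1))


def chain_rule_asymptote(k, big_u):
    """Chain rule built from the one- and two-locus asymptotes, equal spacing."""
    x = exact_asymptote(1, big_u)
    x_ab = exact_asymptote(2, big_u)
    return x * (x_ab / x) ** (k - 1)


# Ratio of exact to chain-rule values for four loci and U = 0.01.
ratio = exact_asymptote(4, 0.01) / chain_rule_asymptote(4, 0.01)
```

For two loci the two functions coincide by construction. For four loci the ratio is close to $4!/2^{3} = 3$, so the chain rule underpredicts non-IBD by roughly a factor of three in this setting.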

## RESULTS AND DISCUSSION

The probability of IBD simultaneously at two or more neutral loci is a generalization of Wright's inbreeding coefficient, *F*. Such probabilities are clearly functions of the population size, time, and the breeding structure, as is *F*, but they also depend on the degree of linkage between loci. For example, in a closed random-mating population without mutation, the probability of double IBD is approximately equal to *F*^{2} for unlinked loci, but increases to *F* for a completely linked pair. The multilocus IBD is a useful parameter in predicting the joint ancestry of multiple loci, for example, in mapping studies (Meuwissen *et al*. 2002), in inferences about historic population structure from current data (Hayes *et al*. 2003), and also in computing variances and covariances of quantitative traits in finite populations (Weir and Cockerham 1977; Barton and Turelli 2004). Whereas contributions to variance in the absence of epistasis depend only on two-locus identities or disequilibria, with epistasis, multilocus terms may be involved.

Although in principle a method exists for predicting multilocus IBD (Hill and Weir 2007), it is unwieldy for more than four loci and applies only for a haploid model. In contrast, the chain-rule method proposed here, which utilizes the independence of crossing-over events to compute multilocus non-IBD, is computationally simple for an unlimited number of loci and applies for diploid as well as haploid models assuming random mating. It is not, however, applicable exactly in the presence of mutation. The approximate method proposed previously by Hernández-Sánchez *et al*. (2004) generally gives poorer predictions and becomes unwieldy to apply for more than five or so loci.

The second method proposed in this article, which is based on ignoring some of the descent measures defined by Weir and Cockerham (1974) for two loci and Hill and Weir (2007) for more, gives less precise predictions because of the simplifications made, but is straightforward to apply and leads to closed formulas at intermediate generations and for regional non-IBD. In addition, it can be applied when there is much mutation, for it generally performs better than the chain rule for any degree of recombination when mutation rates are moderate or high (*U* > 0.25) (results not shown). As the chain rule is in any case easier to apply for multiple loci, there seems little benefit in using the simple method other than to cope with mutation.

The relation between multilocus non-IBD and moments of multilocus linkage disequilibria is shown by Weir and Cockerham (1974) and Hill and Weir (2007). These require all the relevant descent measures; for two loci, for example, the expected linkage disequilibrium, $E(D^2)$, is a function of nonidentity of genes sampled from two haplotypes (*i.e*., $X_{AB}$), three haplotypes, and four haplotypes. Thus neither of the linear methods developed here involving only sampling from two haplotypes can be used to predict such moments of disequilibria.

A potential application of this theory is fine mapping of QTL, where the data comprise phenotypes for the trait and genotypes at nearby marker loci, such that probabilities of IBD at the QTL can be computed for any individuals (Meuwissen and Goddard 2001). Using the equations developed here to calculate multilocus (non-)IBD, the probability of IBD at putative QTL can be computed for any pair of individuals in the population, conditional on their genotypes or IBS at marker loci. For example, for marker $A$ and QTL $B$, this conditional probability is $P(\text{IBS } A, \text{IBD } B)/[P(\text{IBS } A, \text{IBD } B) + P(\text{IBS } A, \text{non-IBD } B)]$, in which $P(\text{IBS } A, \text{IBD } B) = F_{AB} + (F_B - F_{AB})(1 - H_A)$, where $H_A$ is the heterozygosity of marker $A$ in the founder population. Assuming a model of random QTL effects, the covariance due to the QTL between individuals $i$ and $j$ is a function of the IBD probabilities between their QTL alleles $k$ and $l$. Therefore, the variance contributed by a putative QTL at any position can be estimated using predicted IBD among all alleles in a sample. Likewise, the regression models proposed by Hernández-Sánchez *et al*. (2006) to predict IBD at the QTL given IBS at linked markers can now be more easily extended to include multiple markers together using this multilocus theory. These calculations require assumptions of population history and marker allele frequencies or heterozygosity at its foundation. In this application, at least in the livestock context, population sizes are not likely to be so large that mutation rates at marker loci, particularly SNPs, will be sufficient to lead to appreciable inaccuracies of prediction because of breakdown of the chain rule. More importantly, the robustness of the rule to migration or population introgression seems a far more important feature.
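The single-marker calculation can be sketched as below. The expression for $P(\text{IBS } A, \text{IBD } B)$ is the one given in the text; the expression used for $P(\text{IBS } A, \text{non-IBD } B)$, namely $(F_A - F_{AB}) + X_{AB}(1 - H_A)$, is an assumption obtained by the same argument (marker alleles are alike in state if IBD, or if non-IBD and alike by chance with probability $1 - H_A$). The function name and the input values are illustrative only.

```python
def prob_ibd_given_ibs(f_a, f_b, f_ab, h_a):
    """P(QTL B is IBD | marker A is IBS) for a pair of haplotypes.

    f_a, f_b : single-locus IBD probabilities at marker A and QTL B
    f_ab     : joint IBD probability at A and B
    h_a      : founder heterozygosity at marker A
    """
    alike_by_chance = 1.0 - h_a
    x_ab = 1.0 - f_a - f_b + f_ab  # joint non-IBD at A and B
    ibs_and_ibd = f_ab + (f_b - f_ab) * alike_by_chance
    ibs_and_nonibd = (f_a - f_ab) + x_ab * alike_by_chance  # assumed form
    return ibs_and_ibd / (ibs_and_ibd + ibs_and_nonibd)


# An informative marker (H_A = 0.5) raises the QTL IBD probability above F_B.
posterior = prob_ibd_given_ibs(0.3, 0.3, 0.2, 0.5)
```

Two limiting cases act as checks: a monomorphic marker ($H_A = 0$) is uninformative and returns $F_B$, and a fully informative, completely linked marker returns 1.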

Regional IBD has also been used in gene mapping. For example, Goldgar (1990) predicted regional IBD among sibling pairs and Guo (1995) extended the method to accommodate any pair of relatives within a simple pedigree. Henceforth, gene mapping consisted of correlating phenotypic similarity with regional IBD. Regional IBD is conceptually linked to Fisher's (1953) junction theory. As junctions were defined as recombination events delimiting different IBD regions, there must be a link between the number of junctions and the regional IBD obtained in this work (*e.g*., MacLeod *et al*. 2005).

Finally, predicting IBD from IBS requires, as do Meuwissen and Goddard (2000), information on population history, and robustness to historical assumptions is an issue needing research.

## APPENDIX A: EXPLICIT APPROXIMATION FOR SEGMENTAL NON-IBD

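From the iterative approximation (Equation 13) assuming no mutation,

$$X^*_{AB,t+1} - X^*_{AB,t} = -\left(\frac{1}{2N} + 2r_{AB}\right)X^*_{AB,t} + 2r_{AB}\left[1 - \frac{1}{2N}\right]^{2t}.$$

Replacing this difference equation by a differential equation, and noting that $[1 - 1/(2N)]^t \sim e^{-t/2N}$,

$$\frac{dX^*_{AB,t}}{dt} = -\left(\frac{1}{2N} + 2r_{AB}\right)X^*_{AB,t} + 2r_{AB}e^{-t/N}.$$

The equation $dy/dt + ay = g(t)$ has solution $y = e^{-at}\left[\int g(t)e^{at}\,dt + c\right]$ (Korn and Korn 1968). Hence, after rearrangement and integration with respect to $t$, and noting that $X_{AB} = 1$ if $t = 0$,

$$X^*_{AB,t} = \frac{R_{AB}e^{-t/N} - e^{-(1 + R_{AB})t/2N}}{R_{AB} - 1} \tag{A1}$$

for $r_{AB} \neq 1/4N$, and $X^*_{AB,t} = e^{-t/N}[1 + t/2N]$ if $r_{AB} = 1/4N$. To utilize the chain rule to compute non-IBD for genomic segments, we require the derivative at generation $t$:

$$\frac{\partial X^*_{AB,t}}{\partial R_{AB}} = \frac{e^{-(1 + R_{AB})t/2N}[1 + (R_{AB} - 1)t/2N] - e^{-t/N}}{(R_{AB} - 1)^2}.$$

Evaluating the derivative at $R_{AB} = 0$ (see Equation 4) and dividing by $X_A = e^{-t/2N}$,

$$\frac{1}{X_A}\left.\frac{\partial X^*_{AB,t}}{\partial R_{AB}}\right|_{R_{AB}=0} = 1 - e^{-t/2N} - \frac{t}{2N}, \tag{A2}$$

and the derivative with respect to $r_{AB}$ is $4N$ times larger. Hence, an approximation for a region of length $L = 4Nl$ is

$$\hat{X}(l) = X\exp\left\{L\left[1 - e^{-t/2N} - \frac{t}{2N}\right]\right\}. \tag{A3}$$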
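The closed forms can be checked numerically. The sketch below assumes the two-locus solution $X^*_{AB,t} = [Re^{-t/N} - e^{-(1+R)t/2N}]/(R - 1)$ with $R = 4Nr$ (and $e^{-t/N}[1 + t/2N]$ at $R = 1$), and the regional formula $X\exp\{L[1 - e^{-t/2N} - t/2N]\}$ with $X = e^{-t/2N}$. It then compares the regional formula with the chain rule applied to a large number of short segments. All function names are hypothetical.

```python
import math


def two_locus_closed_form(n, r, t):
    """Approximate two-locus non-IBD after t generations (no mutation or migration)."""
    big_r = 4 * n * r
    if math.isclose(big_r, 1.0):  # removable singularity at r = 1/(4N)
        return math.exp(-t / n) * (1 + t / (2 * n))
    return ((big_r * math.exp(-t / n) - math.exp(-(1 + big_r) * t / (2 * n)))
            / (big_r - 1))


def regional_nonibd(n, length, t):
    """Closed-form non-IBD for a whole region of `length` morgans."""
    x = math.exp(-t / (2 * n))
    return x * math.exp(4 * n * length * (1 - x - t / (2 * n)))


def regional_by_segments(n, length, t, segments):
    """Chain rule over equally spaced loci that split the region into short segments."""
    x = math.exp(-t / (2 * n))
    ratio = two_locus_closed_form(n, length / segments, t) / x
    return x * ratio ** segments


# N = t = 100 and a region of 0.01 morgans (L = 4Nl = 4).
closed = regional_nonibd(100, 0.01, 100)
limit = regional_by_segments(100, 0.01, 100, 1_000_000)
```

As the number of segments grows, the segment-based value approaches the closed form, which is the limiting argument behind the regional formula.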

## APPENDIX B: ASYMPTOTIC VALUES FROM THE SIMPLE APPROXIMATION

With mutation and migration included, we consider just asymptotic expectations, denoted $\tilde{X}_A$, $\tilde{X}_{AB}$, …, assuming other parameters to be constant and $t \to \infty$. Equating values in successive generations following Kimura and Crow (1964), for a single locus $\tilde{X}_A = (U + M)/(U + M + 1)$, and for two loci from Equation 16,

$$\tilde{X}_{AB} = \frac{R_{AB}\tilde{X}_A\tilde{X}_B + U(\tilde{X}_A + \tilde{X}_B) + M}{1 + R_{AB} + 2U + M}. \tag{B1}$$

With complete linkage and no migration ($M = R_{AB} = 0$), (B1) is exact and reduces to $\tilde{X}_{AB} = [U/(U + 1)][2U/(2U + 1)]$. For $k$ loci with the same assumptions, by using Equation 17 it can be shown that

$$\tilde{X}_{1\ldots k} = \prod_{i=1}^{k}\frac{iU}{iU + 1}, \tag{B2}$$

which reduces to $k!U^k$ for small values of $U$.

## Acknowledgments

We are grateful to Mike Goddard, Bruce Weir, Xu-Sheng Zhang, and two referees for suggestions, comments, and advice. This work was supported in part by grants from the Biotechnology and Biological Sciences Research Council to W.G.H. and to Sara Knott.

## Footnotes

Communicating editor: J. Wakeley

- Received April 10, 2007.
- Accepted May 14, 2007.

- Copyright © 2007 by the Genetics Society of America