## Abstract

Molecular markers can be employed to predict the parental genome contribution to inbred lines. The proportion α of alleles originating from parent P_{1} at markers polymorphic between the parental lines P_{1} and P_{2} is commonly used as a predictor for the genome contribution of parent P_{1} to an offspring line. Our objectives were to develop a new marker-based predictor ξ for the parental genome contribution, which takes into account not only the alleles at marker loci but also their map distance, and to compare the prediction precision of ξ with that of alternative methods. We derived formulas for ξ for inbreds derived from biparental crosses (F_{1} and backcrosses) with the single-seed descent or double-haploid method and presented an extension ξ* possessing statistical optimum properties. In a simulation study, α showed a systematic overestimation of large parental genome contribution that was not observed for ξ. The mean squared prediction error of ξ was at least 50% smaller than that of α for linkage maps with unequal distances between adjacent markers. A data set from a study on plant variety protection in maize was used to illustrate the application of ξ. We conclude that ξ provides substantially greater prediction precision than the commonly used predictor α in a broad range of applications in genetics and breeding.

GENETIC fingerprinting of inbred lines and their crossing parents with molecular markers provides a means to assess the parental origin of the genome of a line. It is carried out routinely in basic genetic research and applied breeding programs. Applications include, for example, the prediction of the donor genome proportion in inbred lines derived from backcross individuals of a gene introgression program or in near-isogenic lines of an introgression library developed either for fine mapping of QTL or for identification of favorable chromosome segments in genetic resources. In plant variety protection, prediction of the parental genome contribution is employed to decide whether or not a line is derived essentially from a progenitor line. In estimation of the breeding value of a line using phenotypic information from its crossing parents, marker-based prediction of the parental genome contribution can replace the assumption that each parent of a biparental cross contributes one-half to the genome of an offspring line.

In these applications, the proportion of marker alleles that are identical with the alleles of a parental line is commonly used to predict the contribution of the parental line to the genome of the derived inbred line (*cf.* Bernardo *et al.* 2000; Heckenberger *et al.* 2005b). The major shortcoming of this unweighted prediction is that neither linkage between markers nor the stochastic dependence between the parental origin of the marker alleles and the parental origin of the adjacent genomic regions is taken into account.

In the context of recurrent backcrossing, Visscher (1996) suggested predicting the contribution of the donor parent to the genome of a backcross individual by assigning different weights to the markers. He treated prediction of the parental genome contribution based on linked markers analogously to prediction of the breeding value of an individual based on different sources of phenotypic information. Extending the previous work of Hill (1993), he applied selection index theory (Hazel 1943) to derive weights depending on the recombination frequency between markers. However, for inbred lines no advanced theory has been elaborated for molecular marker-based prediction of the parental genome contribution.

We focused on inbred lines developed from biparental crosses (F_{1} or backcrosses) with the single-seed descent or double-haploid method. The objectives of our research were to (1) develop a new marker-based predictor ξ for the parental genome contribution, which takes into account not only the alleles at marker loci but also their map distance, (2) present an extension ξ*, which possesses statistical optimum properties, and (3) compare the prediction precision of ξ with that of alternative methods. Furthermore, various examples for applications of the predictor ξ in genetics and breeding are discussed.

## THEORY

#### Outline of the prediction approach:

The parental origin of the genome in a derived inbred line can be traced with molecular markers, which are polymorphic in the parental lines P_{1} and P_{2}. The markers can be regarded as a sample of all loci in the genome and, therefore, the parental genome contribution to marker loci can be used as a predictor for the parental genome contribution to the entire genome. However, typically marker maps are not equally spaced and the different lengths of marker intervals are ignored in such a prediction. We suggest a predictor ξ, which takes into account not only the genotype at the marker loci, but also the map distance between adjacent markers. The principle of ξ is to determine for each locus in the genome the conditional expectation that it carries the allele of parent P_{1} under the condition of the observed genotype at flanking markers. The genome is subdivided into nonoverlapping chromosome intervals, of which the borders are defined by the markers, and the conditional expectations are integrated along the chromosome intervals. This yields a prediction of the parental genome contribution of P_{1} to each chromosome interval. Subsequently, the predictions for the chromosome intervals are weighted with the interval lengths and averaged to obtain a predictor for the genome contribution of parent P_{1} to the entire genome.

#### Notation and assumptions:

Map positions, measuring the distance of a locus from a telomere in morgan units, are denoted by *x*. Indicator variables *G* take the value 1 if the allele at the corresponding locus originates from parent P_{1} and 0 otherwise. Realizations of *G* are denoted by *g*. We subdivide the genome into *n* nonoverlapping chromosome intervals. A chromosome interval *i* is delimited by either (1) two markers with map positions *x*_{ai} < *x*_{bi} or (2) a marker and a telomere. For case 2 we assume without loss of generality that the telomere has map position 0 and the distance between the marker and the telomere is *x*_{ai}. The length of a chromosome interval is *d*_{i} = *x*_{bi} − *x*_{ai} (case 1) and *d*_{i} = *x*_{ai} (case 2). The genome length is the sum of all interval lengths, $\sum_{i=1}^{n} d_i$.

We assume that the offspring are completely homozygous lines, derived without selection from a biparental cross or a backcross of completely homozygous parents P_{1} and P_{2} that are polymorphic in at least one marker per chromosome. We further assume absence of interference (Stam 1979) in crossover formation such that the recombination frequency *r*_{uv} between two loci with map positions *x*_{u} ≤ *x*_{v} is calculated by Haldane's (1919) mapping function:

$$r_{uv} = \tfrac{1}{2}\left(1 - e^{-2(x_v - x_u)}\right) \tag{1}$$
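As a small illustration of the mapping function in Equation 1, the following Python sketch converts a map distance in morgans into a recombination frequency. The function name `haldane_r` and the example distances are assumptions made for this illustration only.

```python
import math

def haldane_r(x_u: float, x_v: float) -> float:
    """Recombination frequency between loci at map positions x_u and x_v
    (in morgans) under Haldane's mapping function (no interference)."""
    distance = abs(x_v - x_u)
    return 0.5 * (1.0 - math.exp(-2.0 * distance))

# Hypothetical map distances, chosen only to show the saturation toward 0.5.
for d in (0.0, 0.1, 0.5, 1.0, 5.0):
    print(f"distance {d:3.1f} M -> r = {haldane_r(0.0, d):.4f}")
```

The recombination frequency grows almost linearly for short distances and approaches 0.5 for loosely linked loci, which is why distant markers carry little information about each other.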

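#### The predictor ξ:

The predictor ξ of the genome contribution of parent P_{1} to a derived line is defined as

$$\xi = \frac{\sum_{i=1}^{n} d_i\,\xi_i}{\sum_{i=1}^{n} d_i} \tag{2}$$

where ξ_{i} is the prediction of the genome contribution of parent P_{1} to the *i*th chromosome interval.

We consider at first a finite number *w* of loci equidistantly distributed at positions *x*_{1}, … , *x*_{w} on a chromosome interval delimited by markers at positions *x*_{ai} and *x*_{bi}. We then have

$$\xi_i = \frac{1}{w}\sum_{s=1}^{w} E(G_s \mid g_{ai}, g_{bi}) \tag{3}$$

where *E*(*G*_{s} | *g*_{ai}, *g*_{bi}) is the conditional expectation that the locus at map position *x*_{s} carries the allele of parent P_{1} under the condition that the genotypes *g*_{ai} and *g*_{bi} were observed at the two flanking markers with map positions *x*_{ai} and *x*_{bi}. Following the principle used by Franklin (1977) and Hill (1993), Equation 3 can be extended to an infinite number of loci at positions *x*_{s}:

$$\xi_i = \frac{1}{d_i}\int_{x_{ai}}^{x_{bi}} E(G_s \mid g_{ai}, g_{bi})\,dx_s \tag{4}$$

For telomere chromosome intervals we have in analogy

$$\xi_i = \frac{1}{d_i}\int_{0}^{x_{ai}} E(G_s \mid g_{ai})\,dx_s \tag{5}$$

The conditional expectation of *G*_{s} is (omitting the subscript *i* for the chromosome interval)

$$E(G_s \mid g_a, g_b) = \frac{P(G_a = g_a,\, G_s = 1,\, G_b = g_b)}{P(G_a = g_a,\, G_b = g_b)} \tag{6}$$

for *x*_{s} in chromosome intervals flanked by two markers and

$$E(G_s \mid g_a) = \frac{P(G_a = g_a,\, G_s = 1)}{P(G_a = g_a)} \tag{7}$$

for *x*_{s} in chromosome intervals flanked by a marker and the telomere.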

#### Mating systems:

We express the one-, two-, and three-locus genotype frequencies required for Equations 6 and 7 in terms of

$$p = P(G_u = 1), \qquad q_{uv} = P(G_v = 1 \mid G_u = 1) \tag{8}$$

where *x*_{u}, *x*_{v} ∈ {*x*_{a}, *x*_{b}, *x*_{s}}. The values of *p* and *q*_{uv} depend on the mating system used for deriving the inbred line and the map distance between the markers at positions *x*_{u} and *x*_{v}. In this study, we consider four mating systems: (1) (F_{2})^{t}-single-seed descent (SSD) lines are developed by *t* (*t* ≥ 0) generations of random mating of an F_{2} population and subsequent application of the single-seed descent method for line development; (2) (F_{1})^{t}-double-haploid (DH) lines are developed by *t* (*t* ≥ 0) generations of random mating of an F_{1} cross and subsequent inbred line development with double haploids; and (3) backcross (BC_{t})-SSD and (4) BC_{t}-DH lines are developed from an F_{1} cross backcrossed *t* (*t* ≥ 1) times to parent P_{2}, with subsequent line development by the single-seed descent or double-haploid method, respectively. Expressions for *p* and *q*_{uv} under these mating systems are given in Table 1, and the corresponding derivations are presented in the appendix.

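#### Genotype frequencies:

For the derivations a shorthand notation is used. We omit the names of random variables in definitions of multilocus genotype frequencies and use only the values of the realizations. For example, *P*(*G*_{a} = 1, *G*_{s} = 1) is abbreviated as *P*(11), and *P*(*G*_{a} = 1, *G*_{s} = 1, *G*_{b} = 1) as *P*(111). For the derivation of three-locus genotype frequencies, two-locus genotype frequencies referring to subsets of the three loci are required. In this case, the realization of the third locus (the one not considered) is denoted by a “–”; *e.g.*, *P*(*G*_{a} = 1, *G*_{b} = 1) is abbreviated as *P*(1–1).

The single-locus genotype frequencies follow directly from the definition of *p* in Equation 8:

$$P(1) = p, \qquad P(0) = 1 - p \tag{9}$$

The two-locus genotype frequencies for two loci at map positions *x*_{u}, *x*_{v} ∈ {*x*_{a}, *x*_{b}, *x*_{s}} can be written as

$$\begin{aligned}
P(11) &= p\,q_{uv}, \\
P(10) &= P(01) = p\,(1 - q_{uv}), \\
P(00) &= 1 - 2p + p\,q_{uv}
\end{aligned} \tag{10}$$

For deriving the three-locus genotype frequencies with respect to three loci at map positions *x*_{a} < *x*_{s} < *x*_{b} for (F_{1})^{t}-DH and (F_{2})^{t}-SSD lines, we follow an approach outlined by Haldane and Waddington (1931) and recently developed by Broman (2005) for F_{2}-SSD lines (named two-way RILs in his article). We have

$$\begin{aligned}
P(11\text{–}) &= P(111) + P(110), \\
P(1\text{–}1) &= P(111) + P(101), \\
P(\text{–}11) &= P(111) + P(011)
\end{aligned} \tag{11}$$

and because of symmetry

$$P(000) = P(111), \quad P(001) = P(110), \quad P(010) = P(101), \quad P(100) = P(011) \tag{12}$$

Solving this system of linear equations and using *p* = 1/2 yields

$$\begin{aligned}
P(111) = P(000) &= (q_{as} + q_{sb} + q_{ab} - 1)/4, \\
P(110) = P(001) &= (q_{as} - q_{sb} - q_{ab} + 1)/4, \\
P(101) = P(010) &= (q_{ab} - q_{as} - q_{sb} + 1)/4, \\
P(011) = P(100) &= (q_{sb} - q_{as} - q_{ab} + 1)/4
\end{aligned} \tag{13}$$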

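For BC_{t}-DH and BC_{t}-SSD lines we employ the system of equations

$$\begin{aligned}
P(111) + P(110) &= p\,q_{as}, \\
P(111) + P(101) &= p\,q_{ab}, \\
P(111) + P(011) &= p\,q_{sb}, \\
P(111) + P(110) + P(101) + P(100) &= p, \\
P(111) + P(110) + P(011) + P(010) &= p, \\
P(111) + P(101) + P(011) + P(001) &= p, \\
\textstyle\sum_{g_a, g_s, g_b} P(g_a g_s g_b) &= 1
\end{aligned} \tag{14}$$

For BC_{t}-DH lines we use

$$P(111) = p\,q_{as}\,q_{sb} = \left[\tfrac{1}{2}(1 - r_{as})(1 - r_{sb})\right]^{t+1} \tag{15}$$

to solve the system of equations in Equation 14 and obtain

$$\begin{aligned}
P(110) &= p\,q_{as}(1 - q_{sb}), \\
P(011) &= p\,q_{sb}(1 - q_{as}), \\
P(101) &= p\,(q_{ab} - q_{as}q_{sb}), \\
P(100) &= p\,(1 - q_{as} - q_{ab} + q_{as}q_{sb}), \\
P(010) &= p\,(1 - q_{as})(1 - q_{sb}), \\
P(001) &= p\,(1 - q_{sb} - q_{ab} + q_{as}q_{sb}), \\
P(000) &= 1 - 3p + p\,(q_{as} + q_{sb} + q_{ab} - q_{as}q_{sb})
\end{aligned} \tag{16}$$

For BC_{t}-SSD lines we have

$$P(111) = P^{b}(111)\,P^{s}(111) \tag{17}$$

where the superscript *b* refers to the parameters of a BC_{t} individual, which can be obtained from Equation 16 by replacing *t* + 1 with *t*, and superscript *s* refers to the parameters of F_{2}-SSD lines. Solving Equation 14 yields

$$\begin{aligned}
P(110) &= p\,q_{as} - P(111), \\
P(101) &= p\,q_{ab} - P(111), \\
P(011) &= p\,q_{sb} - P(111), \\
P(100) &= p\,(1 - q_{as} - q_{ab}) + P(111), \\
P(010) &= p\,(1 - q_{as} - q_{sb}) + P(111), \\
P(001) &= p\,(1 - q_{sb} - q_{ab}) + P(111), \\
P(000) &= 1 - 3p + p\,(q_{as} + q_{sb} + q_{ab}) - P(111)
\end{aligned} \tag{18}$$

Note that (1) the genotype frequencies obtained with Equation 13 after inserting *p* and *q*_{ab} for F_{2}-SSD lines are identical to those of Broman (2005), and (2) the genotype frequencies for BC_{t}-DH lines are identical with those obtained with the formulas of Visscher and Thompson (1995) for BC_{t+1} individuals.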
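The three-locus frequencies for the symmetric case *p* = 1/2 can be checked numerically. The sketch below assumes F_{2}-SSD lines with *q*_{uv} = 1/(1 + 2*r*_{uv}), which follows from the Haldane and Waddington (1931) result quoted in the appendix. It builds the eight frequencies from the three pairwise *q* values and verifies that they sum to 1 and reproduce a two-locus marginal. The function names and the marker positions are hypothetical.

```python
import math

def haldane_r(d: float) -> float:
    """Haldane's mapping function for a map distance d in morgans."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))

def q_f2_ssd(d: float) -> float:
    """Assumed q_uv for F2-SSD lines: 1 / (1 + 2 r_uv)."""
    return 1.0 / (1.0 + 2.0 * haldane_r(d))

def three_locus_freqs(q_as: float, q_sb: float, q_ab: float) -> dict:
    """Three-locus genotype frequencies for p = 1/2 and symmetric mating
    systems. Keys are genotypes in the order (G_a, G_s, G_b)."""
    p111 = (q_as + q_sb + q_ab - 1.0) / 4.0
    p110 = (q_as - q_sb - q_ab + 1.0) / 4.0
    p101 = (q_ab - q_as - q_sb + 1.0) / 4.0
    p011 = (q_sb - q_as - q_ab + 1.0) / 4.0
    return {"111": p111, "110": p110, "101": p101, "011": p011,
            "000": p111, "001": p110, "010": p101, "100": p011}

# Hypothetical map positions (morgans) with x_a < x_s < x_b.
x_a, x_s, x_b = 0.2, 0.5, 0.9
q_as, q_sb, q_ab = q_f2_ssd(x_s - x_a), q_f2_ssd(x_b - x_s), q_f2_ssd(x_b - x_a)
freqs = three_locus_freqs(q_as, q_sb, q_ab)

total = sum(freqs.values())
marginal_11 = freqs["111"] + freqs["110"]  # should equal P(11-) = p * q_as
print(f"sum of frequencies: {total:.12f}")
print(f"P(111) + P(110) = {marginal_11:.6f}, p * q_as = {0.5 * q_as:.6f}")
```

Checking the marginals in this way is a cheap safeguard against sign errors when the formulas are implemented for other mating systems.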

#### Conditional expectation ξ*:

The predictor ξ can be extended by replacing *E*(*G*_{s} | *g*_{a}, *g*_{b}) (Equation 4) and *E*(*G*_{s} | *g*_{a}) (Equation 5) with

$$E(G_s \mid \mathbf{g}_i) = P(G_s = 1 \mid \mathbf{g}_i) = \frac{P(G_s = 1,\, \mathbf{g}_i)}{P(\mathbf{g}_i)} \tag{19}$$

where **g**_{i} is a vector consisting of the marker genotype of the markers on the *i*th chromosome. The resulting predictor ξ* is the conditional expectation of the parental genome contribution to an inbred line under the condition of the observed marker genotype. For calculation of the multilocus genotype frequencies in Equation 19, the recursion equations of Hospital *et al.* (1996) can be employed. Further, the closed-form equations derived by Visscher and Thompson (1995) for BC_{t} individuals can be applied to BC_{t−1}-DH lines.

## DISCUSSION

#### Other predictors for the parental genome contribution:

A commonly used predictor (*cf.* Bernardo *et al.* 2000; Heckenberger *et al.* 2005b) of the genome contribution of parent P_{1} to an inbred line is the proportion of marker alleles from P_{1} in the set of polymorphic markers between P_{1} and P_{2},

$$\alpha = \frac{1}{m}\sum_{j=1}^{m} g_j \tag{20}$$

where *m* is the number of markers and *g*_{j} refers to the genotypes at the marker loci. Major shortcomings of the unweighted predictor α are that (i) the correlation between markers due to linkage and (ii) the stochastic dependence between the markers and the adjacent genomic regions are not taken into account. The advantage of α is that no prior information about the mating system used to develop the line is required.
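For comparison with the map-based predictor, the unweighted predictor α is simply the mean of the marker indicators. A minimal Python sketch follows; the function name `alpha_predictor` and the marker scores are hypothetical.

```python
def alpha_predictor(genotypes: list[int]) -> float:
    """Proportion of polymorphic markers carrying the P1 allele.
    Each entry is 1 (allele from P1) or 0 (allele from P2)."""
    if not genotypes:
        raise ValueError("at least one polymorphic marker is required")
    if any(g not in (0, 1) for g in genotypes):
        raise ValueError("marker genotypes must be coded as 0 or 1")
    return sum(genotypes) / len(genotypes)

# Hypothetical scores at eight polymorphic markers of one inbred line.
scores = [1, 1, 0, 1, 0, 0, 1, 1]
print(f"alpha = {alpha_predictor(scores):.3f}")
```

Note that neither the marker order nor the map positions enter the calculation, which is exactly the limitation discussed above.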

No previous studies exist about more efficient predictors for the parental genome contribution to inbred lines. However, Visscher (1996) developed an approach for predicting the proportion of the genome originating from the donor parent in backcross individuals, borrowing ideas from selection index theory (Hazel 1943),

(21)

where *c* is the number of chromosomes, *l*_{i} is the length of the *i*th chromosome, **g**_{i} is a vector consisting of the marker genotype of the markers on the *i*th chromosome, **V**_{i} is the covariance matrix of **g**_{i}, and **y**_{i} is a vector consisting of the covariances between the donor genome at the markers and the donor genome on the carrier chromosome of the markers.

Visscher's (1996) approach can be extended to inbred lines derived from arbitrary mating systems by defining for each chromosome (we omit the index for the chromosome)

(22)

where *D*_{us} = (*q*_{us} − *p*)*p* is the expected gametic disequilibrium between loci at map positions *x*_{u} and *x*_{s} under the considered mating system, with expressions for *p* and *q*_{us} given in Table 1. For example, for BC_{t}-DH lines we have

(23)

In comparison with the predictor α, the weighted predictor β has the advantage that the markers contribute with different weights to β, depending on their linkage; *i.e.*, β takes into account the correlation between the markers on a chromosome. However, it ignores the stochastic dependence between the markers and the adjacent genomic regions on a chromosome.

In contrast to α and β, the predictor ξ takes into account both the correlation between markers and the stochastic dependence between markers and the adjacent genomic regions. The former is considered by weighting ξ_{i} with the distance *d*_{i} between adjacent markers (Equation 2) and the latter by the integration of *E*(*G*_{s} | *g*_{a}, *g*_{b}) along the chromosome (Equations 4 and 5).

#### Conditional expectation ξ*:

The predictor ξ* is unbiased, and this can be shown using *E*(*G*_{u}) = *p* and *E*(*E*[*Z* | **G**]) = *E*(*Z*) (Shao 1999, p. 33, Proposition 1.12.iv), where the random variable *Z* denotes the parental genome contribution to an inbred line and the random vector **G** its multilocus marker genotype. From ξ* = *E*(*Z* | **g**) it follows that ξ* is also unbiased in the sample space Ω_{**g**}, which comprises the parental genome contribution to all possible inbred lines having a certain marker genotype **g**. From an applied point of view, this means that ξ* is neither systematically overestimating nor underestimating the parental genome contribution for any given marker genotype **g**. It can be further shown that the conditional expectation of a random variable has minimum variance among all unbiased predictors (Shao 1999, p. 33, Equation 1.40). In consequence, the conditional expectation ξ* can be regarded as an optimum predictor of the parental genome contribution.

For the F_{1}-DH mating system (which is, for example, often employed for the development of inbred lines in hybrid maize breeding programs) the predictors ξ and ξ* are identical under the assumption of no interference in crossover formation. For other mating systems, calculation of ξ* requires calculation of multilocus genotype frequencies. In contrast to the relatively simple calculations of two- and three-locus genotype frequencies, which can easily be carried out with standard software such as R (Ihaka and Gentleman 1996), calculation of multilocus genotype frequencies requires extensive programming (*cf.* Servin *et al.* 2002).

We compared ξ* and ξ for several special cases and found only small numerical differences in the results. We therefore conclude that the simple calculations required for ξ may outweigh the theoretical optimum properties of ξ* in many practical applications. It could be the subject of further research to investigate whether the substantially greater programming and computational effort, which is required for calculation of ξ* in the general case, results in a significant improvement of the prediction accuracy compared with ξ.

#### Systematic prediction error of α and β:

Consider the example of a 2-M chromosome of an F_{2}-SSD line, on which two markers located 0.5 and 1.5 M from the telomere carry the allele of parent P_{1} (Figure 1A). Loci in the genome region between the two markers are up to 0.5 M distant from the nearest adjacent marker. Owing to the large recombination frequency between distant loci, the markers predict only poorly the genotype at these loci. The predictors α and β do not take this low correlation into account and the genotype of all loci on the chromosome is predicted to be the same as the genotype of the markers: α = β = 1. We now focus on a large number of chromosomes carrying the alleles of parent P_{1} at the two markers. Only ∼70% of these chromosomes carry the allele of parent P_{1} at a locus in the center between the markers [*E*(*G*_{s} | *g*_{a}, *g*_{b}) ≈ 0.7, see Figure 1]. Hence, with respect to all possible chromosomes having the considered marker genotype, the predictors α and β are systematically overestimating the genome proportion originating from parent P_{1}.

For symmetry reasons, the genome contribution of parent P_{1} to chromosomes carrying at both markers the allele of parent P_{2} is systematically underestimated by the predictors α and β. In contrast, α and β show no systematic prediction error for chromosomes on which recombination occurred (Figure 1B). Individuals having no recombination between two markers with map distance 1 M occur in an F_{2}-SSD population with probability 0.54 (*cf.* Haldane and Waddington 1931). Hence, a considerable systematic prediction error of α and β is observed for more than half of the chromosomes of an F_{2}-SSD population.
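The 2-M chromosome example can be reproduced numerically. The sketch below assumes F_{2}-SSD lines with *p* = 1/2 and *q*_{uv} = 1/(1 + 2*r*_{uv}), expresses the conditional expectations of Equations 6 and 7 through the pairwise *q* values, and integrates them with a simple midpoint rule. All function names and the number of integration steps are assumptions of this illustration, not part of the original analysis.

```python
import math

def q_f2_ssd(d: float) -> float:
    """Assumed q_uv for F2-SSD lines at map distance d (morgans)."""
    r = 0.5 * (1.0 - math.exp(-2.0 * d))  # Haldane's mapping function
    return 1.0 / (1.0 + 2.0 * r)

def e_between(x_s: float, x_a: float, x_b: float) -> float:
    """E(G_s | g_a = 1, g_b = 1) = P(111) / P(1-1) for x_a <= x_s <= x_b."""
    q_as = q_f2_ssd(x_s - x_a)
    q_sb = q_f2_ssd(x_b - x_s)
    q_ab = q_f2_ssd(x_b - x_a)
    return (q_as + q_sb + q_ab - 1.0) / (2.0 * q_ab)

def e_telomere(x_s: float, x_m: float) -> float:
    """E(G_s | g_m = 1) for a locus between a telomere and the marker at x_m."""
    return q_f2_ssd(abs(x_m - x_s))

def mean_over(f, lo: float, hi: float, steps: int = 2000) -> float:
    """Midpoint-rule average of f over the interval [lo, hi]."""
    h = (hi - lo) / steps
    return sum(f(lo + (k + 0.5) * h) for k in range(steps)) / steps

x_a, x_b, chrom_len = 0.5, 1.5, 2.0
intervals = [  # (interval length d_i, interval prediction xi_i)
    (x_a, mean_over(lambda x: e_telomere(x, x_a), 0.0, x_a)),
    (x_b - x_a, mean_over(lambda x: e_between(x, x_a, x_b), x_a, x_b)),
    (chrom_len - x_b, mean_over(lambda x: e_telomere(x, x_b), x_b, chrom_len)),
]
xi = sum(d * xi_i for d, xi_i in intervals) / sum(d for d, _ in intervals)
e_mid = e_between(1.0, x_a, x_b)
alpha = 1.0  # both markers carry the P1 allele

print(f"E(G_s | 1, 1) at the midpoint: {e_mid:.3f}")
print(f"xi = {xi:.3f}, alpha = {alpha:.3f}")
```

Under these assumptions the midpoint expectation is close to the value of about 0.7 quoted above, and ξ for the whole chromosome lies well below α = 1.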

#### Systematic prediction error for the entire genome:

The above theoretical example illustrates that in principle systematic prediction errors can occur when employing predictors α and β. To investigate whether the extent of such systematic prediction errors is of relevance in practical applications, we conducted a simulation study with Plabsoft (Maurer *et al.* 2004). Simulated data were used because they provide the “true” parental genome contributions *z* of parent P_{1} as well as the predictions ϑ (ϑ ∈ {α, ξ}) for each simulated inbred line. This allows us to generate a large number of inbred lines and determine the prediction errors *e* = ϑ − *z*.

For the simulation we employed a model of the maize genome based on the study of Heckenberger *et al.* (2005a). It consists of 10 chromosomes of length 1.70, 1.30, 1.06, 1.48, 1.28, 1.15, 1.14, 1.21, 0.99, and 0.91 M and 100 SSR markers, which were chosen for good coverage of the entire genome. We simulated 1000 F_{2}-SSD lines, for which we determined the prediction errors of α and ξ. The correlation ρ_{α,e} = 0.36 between the predicted genome proportions α and the corresponding prediction errors *e* was highly significant (type I error rate 0.001), whereas no significant correlation was observed for the predictor ξ (Figure 2).

We conclude that the extent of systematic overestimation of large parental genome contributions and systematic underestimation of small ones by the predictor α can cause serious problems with linkage maps commonly used in practical applications.

#### Precision of prediction:

To assess systematically the precision of prediction of α, β, and ξ in the four mating systems under consideration, we conducted a simulation study. We employed a model of the maize genome with 10 chromosomes of length 1.6 M. Twenty to 200 markers were assumed to be (a) randomly distributed and (b) equally spaced in the genome. In practice, the marker distribution ranges between these two extremes, which can be regarded as a “worst-case” scenario (a) and a “best-case” scenario (b). For each combination of marker density and spacing we simulated 500 F_{2}-SSD, F_{1}-DH, BC_{1}-SSD, and BC_{1}-DH populations of size 100. (For random spacing of markers, different maps were used for each of the 500 populations.) The correlations ρ_{α,e}, ρ_{β,e}, and ρ_{ξ,e} between predicted values and prediction errors as well as the mean squared prediction errors

$$M_{\vartheta} = \frac{1}{N}\sum_{k=1}^{N} (\vartheta_k - z_k)^2, \qquad \vartheta \in \{\alpha, \beta, \xi\} \tag{24}$$

over the *N* = 100 lines of a population were determined for each simulated population. The results were then averaged over the 500 populations.
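The two diagnostics used in the simulation, the mean squared prediction error and the correlation between predicted values and prediction errors, can be computed as in the following sketch. The function name and the toy data are hypothetical; the toy predictor exaggerates extreme values, so its errors are perfectly correlated with its predictions.

```python
import math

def prediction_diagnostics(predicted: list[float], true: list[float]):
    """Return (mean squared prediction error, Pearson correlation between
    predicted values and prediction errors e = predicted - true)."""
    if len(predicted) != len(true) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(predicted)
    errors = [p - z for p, z in zip(predicted, true)]
    mse = sum(e * e for e in errors) / n
    mean_p = sum(predicted) / n
    mean_e = sum(errors) / n
    cov = sum((p - mean_p) * (e - mean_e) for p, e in zip(predicted, errors))
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    ss_e = sum((e - mean_e) ** 2 for e in errors)
    # The correlation is undefined if either series has zero variance.
    rho = cov / math.sqrt(ss_p * ss_e) if ss_p > 0 and ss_e > 0 else float("nan")
    return mse, rho

# Toy data: true contributions and a predictor that overshoots at the extremes.
z = [0.4, 0.5, 0.6]
theta = [0.3, 0.5, 0.7]
mse, rho = prediction_diagnostics(theta, z)
print(f"mean squared prediction error = {mse:.5f}, correlation = {rho:.3f}")
```

A correlation near zero indicates that the size of the prediction does not carry information about the sign of the error, which is the property reported for ξ.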

The correlations ρ_{α,e} and ρ_{β,e} were highly significant (type I error rate 0.001) for all combinations of the investigated parameters, while ρ_{ξ,e} was not significantly different from zero for any combination. The largest correlations, amounting to 0.75, were observed for predictor α with sparse maps and random marker positions (Table 2). However, even with 200 equally spaced markers ρ_{α,e} ≥ 0.25 and ρ_{β,e} ≥ 0.15. The mean squared prediction error *M*_{ξ} was at least 50% smaller than *M*_{α} for randomly distributed markers, and *M*_{β} ranged in between and approached the values of *M*_{ξ} for ≥100 markers. With equally spaced maps, the differences between *M*_{α}, *M*_{β}, and *M*_{ξ} were negligible for >80 markers.

We conclude that the superiority of ξ compared to α and β with respect to the mean squared prediction error reduces with increasing numbers of equally spaced markers. However, even for dense maps with equally spaced markers, the correlations ρ_{α,e} and ρ_{β,e} between predicted value and prediction error indicate that systematic prediction errors of α and β are to be expected, with negative effects for practical applications.

#### Application to experimental data:

Prediction of the parental genome contribution is illustrated with experimental data from a study on plant variety protection in maize (Heckenberger *et al.* 2005a). The genotype of 100 SSR markers was assessed at 56 F_{2}-SSD lines and their crossing parents. For each inbred line, markers not polymorphic between its crossing parents were discarded. This resulted in different marker sets used for the calculations in each line, with numbers of polymorphic markers *m* ranging between 38 and 67. From the genotype at the polymorphic markers, the predictors α, β, and ξ were calculated (Table 3 lists results for the 12 lines with the largest and smallest values of ξ).

The differences between the predictors α and ξ were mostly negative for small values of ξ and mostly positive for large values, reaching up to 11% (line 1). Comparing these values with simulation results for the same linkage map (Figure 2) leads to the conclusion that the differences observed for large and small parental genome contributions are partially caused by the systematic prediction error of α.

A method to detect essentially derived varieties is to compare a prediction of the parental genome contribution to an inbred line with a threshold value. Heckenberger *et al.* (2005b) suggested using as thresholds the quantiles of the probability distribution of the parental genome contribution to inbred lines under an accepted breeding method. For the chromosome lengths underlying the study of Heckenberger *et al.* (2005a) we investigated this strategy and determined by simulation the 0.95 quantile of the parental genome contribution to F_{2}-SSD lines as *t* = 0.662. When the predictions of the parental genome contribution to an F_{2}-SSD population are compared with this threshold value, 5% of the F_{2}-SSD lines are expected to be classified incorrectly as essentially derived varieties. In our experimental F_{2}-SSD population, α > *t* for 7 lines (12.5% of the 56 lines) and β > *t* for 6 lines (10.7%), but ξ > *t* for only 3 lines (5.4%) (Table 3).

Consequently, the systematic overestimation of large parental genome contributions by α and β can result in a greater error rate of incorrectly classifying a line as essentially derived than is nominally associated with a chosen threshold value. However, due to the stochastic nature of meiosis, using α does not necessarily result in a greater error rate in every experimental population. This can be seen, for example, when comparing the lower tail of the distribution of the experimental data with 1 − *t*.

Summarizing, ξ allows prediction of the parental genome contribution with a much higher precision than the unweighted predictor α commonly employed in practice, in particular when extreme values of the parental genome contribution are of interest. Thus, using ξ instead of α is clearly advantageous for obtaining reliable conclusions on the true parental genome contribution to an inbred line.

#### Deviations from the assumptions:

As applies to most mathematical models of biological systems, the presented prediction method is not capable of capturing every detail of the underlying biological process, and the results should be interpreted with this in mind. Among the assumptions made in our derivations, the following seem of particular importance:

- We assumed absence of interference in crossover formation, although it is well known that interference occurs (for a discussion on using noninterference models, see Frisch and Melchinger 2001).
- We assumed known map positions of the markers. However, in practice, the linear order and map distances are estimated from mapping experiments with one or several segregating populations. Depending on the size and type of the mapping population(s), the estimated map may deviate from the true map due to sampling error or other causes.
- We assumed absence of selection during backcrossing and inbred line development. If selection is carried out, the probability that a certain locus carries the allele of P_{1} may differ from our derivations.

If, for a certain study, one or several of these assumptions do not hold true, the actual advantage in precision of ξ compared with α and β may be smaller than that under the idealized model, where all assumptions are fulfilled.

#### Applications in genetics and breeding:

Being aware of the above limitations, the presented results demonstrate that the predictor ξ provides a substantial improvement in the precision of predicting the parental genome contribution to inbred lines compared with the commonly used unweighted predictor α. This improved precision can be important in a broad range of practical applications.

In inbred lines developed from backcross individuals of a gene introgression program, exact prediction of the parental contribution of the donor parent can help to assess the risk of negative phenotypic effects caused by the donor genome. The prediction of the parental genome contribution to inbred lines complements the prediction of the donor genome proportion in backcross individuals described by Frisch and Melchinger (2005). The combination of both approaches allows monitoring of the parental genome proportion from the first backcross generation until the converted inbred line is finally developed.

Introgression libraries of near-isogenic lines (Eshed and Zamir 1995) are increasingly developed in various crops, *e.g.*, for fine mapping of QTL or for identification of advantageous chromosome segments in exotic genetic resources or landraces (Tanksley and Nelson 1996). The presented approach can be used to predict the donor genome proportion in chromosome regions where a line of an introgression library carries the marker alleles of the recurrent parent. This can help to assess the risk that the observed phenotypic effect is not caused by the chromosome segment introgressed on purpose but by other donor chromosome segments not detected by the employed marker set.

In plant variety protection, exact prediction of the parental genome contribution to an inbred line is of crucial importance for deciding whether or not the line (1) was developed with a generally accepted breeding method or (2) has a parental genome proportion below a generally accepted threshold. The predictor ξ can be employed, for example, to estimate precisely the parental genome contribution of one parent, assuming a given mating system, which can then be compared with threshold values (Heckenberger *et al.* 2005b). In this context, it is of particular interest that α systematically overestimates large parental genome contributions.

In quantitative genetic studies, the genome contribution of a parent to a biparental crossing progeny is usually assumed to be one-half. If ξ is employed instead, this allows us to consider not only the expected relation between two lines on the basis of the mating system, but also the actual similarity at the level of the entire genome. A possible application is, for example, the best linear unbiased prediction of the breeding value of a line employing phenotypic information from its crossing parents.

Recurrent full-sib mating is used to generate homozygous strains in animals such as mice. Employing results of Haldane and Waddington (1931), parameters *p* and *q*_{uv} for recurrent full-sib mating can be derived analogously to the derivations for recurrent selfing in the appendix. Using this extension, the theory presented here can be used straightforwardly for applications in animal genetics.

Prediction of the parental genome proportion with ξ can be interpreted as a “map-based genetic distance” between an inbred line and its crossing parent. It seems promising for further research to investigate whether the principle used in this study can be extended to provide map-based genetic distances for general pedigrees and/or heterozygous individuals.

## APPENDIX

We derive the probabilities *p* and *q*_{uv} for (F_{1})^{t}-DH, (F_{2})^{t}-SSD, BC_{t}-DH, and BC_{t}-SSD lines. For the derivations we use the relationship

$$q_{uv} = \frac{D_{uv}}{p} + p$$

where

$$D_{uv} = P(G_u = 1,\, G_v = 1) - p^2$$

is the expected gametic disequilibrium between two loci at positions *x*_{u} and *x*_{v} in infinite populations.

#### (F_{1})^{t}-DH lines:

The probability that a locus of an (F_{1})^{t}-DH line carries the allele of parent P_{1} is *p* = 1/2. The expected linkage disequilibrium in an (F_{1})^{1} (*i.e.*, an F_{2}) population is

$$D_{uv} = \tfrac{1}{4}(1 - 2r_{uv})$$

Because (i) the expected gametic disequilibrium in an (F_{1})^{t−1}-derived DH line equals that of an (F_{1})^{t} population and (ii) in random mating populations, the linkage disequilibrium decreases with ratio (1 − *r*_{uv}) per generation (Falconer and Mackay 1996, p. 18), the expected gametic disequilibrium for (F_{1})^{t}-DH lines is

$$D_{uv} = \tfrac{1}{4}(1 - 2r_{uv})(1 - r_{uv})^{t} \tag{A1}$$

In consequence, we have

$$q_{uv} = \tfrac{1}{2} + \tfrac{1}{2}(1 - 2r_{uv})(1 - r_{uv})^{t}$$

#### (F_{2})^{t}-SSD lines:

The probability that a locus in an (F_{2})^{t}-SSD line carries the allele of parent P_{1} is *p* = 1/2. The linkage disequilibrium in SSD lines derived from a population in Hardy–Weinberg equilibrium with linkage disequilibrium of *D*′_{uv} is

$$D_{uv} = \frac{D'_{uv}}{1 + 2r_{uv}}$$

(Cockerham and Weir 1973). Because for an (F_{2})^{t} population (derivation in analogy to Equation A1)

$$D'_{uv} = \tfrac{1}{4}(1 - 2r_{uv})(1 - r_{uv})^{t}$$

we have for (F_{2})^{t}-SSD lines

$$D_{uv} = \frac{(1 - 2r_{uv})(1 - r_{uv})^{t}}{4\,(1 + 2r_{uv})}$$

and, therefore,

$$q_{uv} = \frac{1}{2} + \frac{(1 - 2r_{uv})(1 - r_{uv})^{t}}{2\,(1 + 2r_{uv})}$$

#### BC_{t}-DH lines:

The probability that a locus of a BC_{t}-derived DH line carries the allele of parent P_{1} is *p* = (1/2)^{t+1} and the probability that a locus at position *x*_{v} carries the allele of P_{1} under the condition that the locus at position *x*_{u} carries the allele of parent P_{1} is

$$q_{uv} = (1 - r_{uv})^{t+1}$$

#### BC_{t}-SSD lines:

The probability that a locus in a BC_{t}-derived SSD line carries the allele of parent P_{1} is *p* = (1/2)^{t+1}. The probability that continued selfing of an individual with genotype ABab results in an inbred with one of genotypes AbAb or aBaB is

$$\frac{2r_{uv}}{1 + 2r_{uv}}$$

(Haldane and Waddington 1931). Consequently, for BC_{t}-SSD lines,

$$P(11) = \left(\tfrac{1}{2}\right)^{t+1}\frac{(1 - r_{uv})^{t}}{1 + 2r_{uv}}$$

and, therefore,

$$q_{uv} = \frac{(1 - r_{uv})^{t}}{1 + 2r_{uv}}$$
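The parameters *p* and *q*_{uv} for the four mating systems can be collected in one helper function. The sketch below assumes the closed forms that follow from the derivations in this appendix under Haldane's mapping function, for example *q*_{uv} = 1/(1 + 2*r*_{uv}) for F_{2}-SSD lines and *q*_{uv} = (1 − *r*_{uv})^{t+1} for BC_{t}-DH lines. The labels `"F1-DH"`, `"F2-SSD"`, `"BC-DH"`, and `"BC-SSD"` and the function names are hypothetical.

```python
import math

def haldane_r(d: float) -> float:
    """Haldane's mapping function for a map distance d in morgans."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))

def p_and_q(system: str, d: float, t: int) -> tuple[float, float]:
    """Return (p, q_uv) for two loci at map distance d.
    t counts random-mating generations (F1-DH, F2-SSD) or backcrosses (BC)."""
    r = haldane_r(d)
    if system == "F1-DH":    # (F1)^t-DH lines, t >= 0
        return 0.5, 0.5 + 0.5 * (1 - 2 * r) * (1 - r) ** t
    if system == "F2-SSD":   # (F2)^t-SSD lines, t >= 0
        return 0.5, 0.5 + (1 - 2 * r) * (1 - r) ** t / (2 * (1 + 2 * r))
    if system == "BC-DH":    # BC_t-DH lines, t >= 1
        return 0.5 ** (t + 1), (1 - r) ** (t + 1)
    if system == "BC-SSD":   # BC_t-SSD lines, t >= 1
        return 0.5 ** (t + 1), (1 - r) ** t / (1 + 2 * r)
    raise ValueError(f"unknown mating system: {system}")

# Two markers 1 M apart, as in the F2-SSD example of the Discussion.
for name, t in (("F1-DH", 0), ("F2-SSD", 0), ("BC-DH", 1), ("BC-SSD", 1)):
    p, q = p_and_q(name, 1.0, t)
    print(f"{name:7s} t={t}: p = {p:.3f}, q_uv = {q:.3f}")
```

For F_{2}-SSD lines and a distance of 1 M the function returns *q*_{uv} ≈ 0.54, which matches the probability of no recombination between two markers quoted in the Discussion.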

## Acknowledgments

We thank the anonymous reviewers for their comments and suggestions, which helped to improve the manuscript. In particular, we are greatly indebted to an anonymous reviewer for pointing out a major mistake in an earlier version of the manuscript.

## Footnotes

Communicating editor: R. W. Doerge

- Received February 17, 2006.
- Accepted July 25, 2006.

- Copyright © 2006 by the Genetics Society of America