Originally published as Genetics Published Articles Ahead of Print on August 3, 2006.

Genetics, Vol. 174, 795-803, October 2006, Copyright © 2006
doi:10.1534/genetics.106.057273

Marker-Based Prediction of the Parental Genome Contribution to Inbred Lines Derived From Biparental Crosses

Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, 70593 Stuttgart, Germany

1 Corresponding author: Institute of Plant Breeding, Seed Science, and Population Genetics, University of Hohenheim, 70593 Stuttgart, Germany. 
E-mail: melchinger{at}uni-hohenheim.de

Manuscript received February 17, 2006. Accepted for publication July 25, 2006.

ABSTRACT

Molecular markers can be employed to predict the parental genome contribution to inbred lines. The proportion {alpha} of alleles originating from parent P1 at markers polymorphic between the parental lines P1 and P2 is commonly used as a predictor for the genome contribution of parent P1 to an offspring line. Our objectives were to develop a new marker-based predictor {xi} for the parental genome contribution, which takes into account not only the alleles at marker loci but also their map distance, and to compare the prediction precision of {xi} with that of alternative methods. We derived formulas for {xi} for inbreds derived from biparental crosses (F1 and backcrosses) with the single-seed descent or double-haploid method and presented an extension {xi}* possessing statistical optimum properties. In a simulation study, {alpha} showed a systematic overestimation of large parental genome contribution that was not observed for {xi}. The mean squared prediction error of {xi} was at least 50% smaller than that of {alpha} for linkage maps with unequal distances between adjacent markers. A data set from a study on plant variety protection in maize was used to illustrate the application of {xi}. We conclude that {xi} provides substantially greater prediction precision than the commonly used predictor {alpha} in a broad range of applications in genetics and breeding.


GENETIC fingerprinting of inbred lines and their crossing parents with molecular markers provides a means to assess the parental origin of the genome of a line. It is carried out routinely in basic genetic research and applied breeding programs. Applications include, for example, the prediction of the donor genome proportion in inbred lines derived from backcross individuals of a gene introgression program or in near-isogenic lines of an introgression library developed either for fine mapping of QTL or for identification of favorable chromosome segments in genetic resources. In plant variety protection, prediction of the parental genome contribution is employed to decide whether or not a line is derived essentially from a progenitor line. In estimation of the breeding value of a line using phenotypic information from its crossing parents, marker-based prediction of the parental genome contribution can replace the assumption that each parent of a biparental cross contributes one-half to the genome of an offspring line.

In these applications, the proportion of marker alleles that are identical with the alleles of a parental line is commonly used to predict the contribution of the parental line to the genome of the derived inbred line (cf. BERNARDO et al. 2000; HECKENBERGER et al. 2005b). The major shortcoming of this unweighted prediction is that neither linkage between markers nor the stochastic dependence between the parental origin of the marker alleles and the parental origin of the adjacent genomic regions is taken into account.

In the context of recurrent backcrossing, VISSCHER (1996) suggested predicting the contribution of the donor parent to the genome of a backcross individual by assigning different weights to the markers. He treated prediction of the parental genome contribution based on linked markers analogously to prediction of the breeding value of an individual based on different sources of phenotypic information. Extending the previous work of HILL (1993), he applied selection index theory (HAZEL 1943) to derive weights depending on the recombination frequency between markers. However, for inbred lines no advanced theory has been elaborated for molecular marker-based prediction of the parental genome contribution.

We focused on inbred lines developed from biparental crosses (F1 or backcrosses) with the single-seed descent or double-haploid method. The objectives of our research were to (1) develop a new marker-based predictor {xi} for the parental genome contribution, which takes into account not only the alleles at marker loci but also their map distance, (2) present an extension {xi}*, which possesses statistical optimum properties, and (3) compare the prediction precision of {xi} with that of alternative methods. Furthermore, various examples for applications of the predictor {xi} in genetics and breeding are discussed.


THEORY

Outline of the prediction approach:

The parental origin of the genome in a derived inbred line can be traced with molecular markers, which are polymorphic in the parental lines P1 and P2. The markers can be regarded as a sample of all loci in the genome and, therefore, the parental genome contribution to marker loci can be used as a predictor for the parental genome contribution to the entire genome. However, typically marker maps are not equally spaced and the different lengths of marker intervals are ignored in such a prediction. We suggest a predictor {xi}, which takes into account not only the genotype at the marker loci, but also the map distance between adjacent markers. The principle of {xi} is to determine for each locus in the genome the conditional expectation that it carries the allele of parent P1 under the condition of the observed genotype at flanking markers. The genome is subdivided into nonoverlapping chromosome intervals, of which the borders are defined by the markers, and the conditional expectations are integrated along the chromosome intervals. This yields a prediction of the parental genome contribution of P1 to each chromosome interval. Subsequently, the predictions for the chromosome intervals are weighted with the interval lengths and averaged to obtain a predictor for the genome contribution of parent P1 to the entire genome.

Notation and assumptions:

Map positions, measuring the distance of a locus from a telomere in morgan units, are denoted by x. Indicator variables G take the value 1 if the allele at the corresponding locus originates from parent P1 and 0 otherwise. Realizations of G are denoted with g. We subdivide the genome into n nonoverlapping chromosome intervals. A chromosome interval i is delimited by either (1) two markers with map positions $x_{a_i} < x_{b_i}$ or (2) a marker and a telomere. For case 2 we assume without loss of generality that the telomere has map position 0 and the distance between the marker and the telomere is $x_{a_i}$. The length of a chromosome interval is $d_i = x_{b_i} - x_{a_i}$ (case 1) and $d_i = x_{a_i}$ (case 2). The genome length is $l = \sum_{i=1}^{n} d_i$.

We assume that the offspring are completely homozygous lines, derived without selection from a biparental cross or a backcross of completely homozygous parents P1 and P2 that are polymorphic in at least one marker per chromosome. We further assume absence of interference (STAM 1979) in crossover formation such that the recombination frequency ruv between two loci with map positions xu ≤ xv is calculated by HALDANE's (1919) mapping function:

$$r_{uv} = \tfrac{1}{2}\left(1 - e^{-2(x_v - x_u)}\right) \qquad (1)$$
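As a minimal sketch of HALDANE's mapping function, the following Python function converts a map distance in morgans into a recombination frequency. The function name `haldane_r` is hypothetical and chosen only for this illustration.

```python
import math

def haldane_r(x_u: float, x_v: float) -> float:
    """Recombination frequency between loci at map positions x_u and x_v (morgans),
    assuming absence of interference (Haldane's mapping function)."""
    distance = abs(x_v - x_u)
    return 0.5 * (1.0 - math.exp(-2.0 * distance))

# Recombination frequency approaches 0.5 as the map distance grows.
for d in (0.0, 0.1, 0.5, 1.0):
    print(f"d = {d:.1f} M  ->  r = {haldane_r(0.0, d):.4f}")
```

The absolute value makes the function symmetric in its arguments, so the ordering xu ≤ xv does not have to be enforced by the caller.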

The predictor {xi}:

The predictor {xi} of the genome contribution of parent P1 to a derived line is defined as

$$\xi = \frac{\sum_{i=1}^{n} d_i\,\xi_i}{\sum_{i=1}^{n} d_i} \qquad (2)$$
where {xi}i is the prediction of the genome contribution of parent P1 to the ith chromosome interval.
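The length-weighted averaging of the interval predictions can be sketched as follows. The function name `xi_genome` and the numerical values are hypothetical and serve only to illustrate the weighting.

```python
def xi_genome(interval_lengths, interval_predictions):
    """Length-weighted mean of per-interval predictions of the P1 genome contribution."""
    if len(interval_lengths) != len(interval_predictions):
        raise ValueError("one prediction is required per chromosome interval")
    total_length = sum(interval_lengths)
    weighted_sum = sum(d * x for d, x in zip(interval_lengths, interval_predictions))
    return weighted_sum / total_length

# Illustrative values: three intervals of 0.5, 1.0, and 0.5 M.
lengths = [0.5, 1.0, 0.5]
predictions = [0.9, 0.7, 0.9]
print(xi_genome(lengths, predictions))
```

With these inputs the long central interval contributes half of the total weight, so its prediction of 0.7 pulls the genome-wide value toward it.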

We consider at first a finite number w of loci equidistantly distributed at positions x1, ... , xw on a chromosome interval delimited by markers at positions $x_{a_i}$ and $x_{b_i}$. We then have

$$\xi_i = \frac{1}{w}\sum_{s=1}^{w} E(G_s \mid g_{a_i}, g_{b_i}) \qquad (3)$$
where $E(G_s \mid g_{a_i}, g_{b_i})$ is the conditional expectation that the locus at map position xs carries the allele of parent P1 under the condition that the genotypes $g_{a_i}$ and $g_{b_i}$ were observed at the two flanking markers with map positions $x_{a_i}$ and $x_{b_i}$. Following the principle used by FRANKLIN (1977) and HILL (1993), Equation 3 can be extended to an infinite number of loci at positions xs:

$$\xi_i = \frac{1}{d_i}\int_{x_{a_i}}^{x_{b_i}} E(G_s \mid g_{a_i}, g_{b_i})\,dx_s \qquad (4)$$

For telomere chromosome intervals we have in analogy

$$\xi_i = \frac{1}{d_i}\int_{0}^{x_{a_i}} E(G_s \mid g_{a_i})\,dx_s \qquad (5)$$

The conditional expectation of Gs is (omitting the subscript i for the chromosome interval)

$$E(G_s \mid g_a, g_b) = \frac{P(G_a = g_a, G_s = 1, G_b = g_b)}{P(G_a = g_a, G_b = g_b)} \qquad (6)$$
for xs in chromosome intervals flanked by two markers and

$$E(G_s \mid g_a) = \frac{P(G_a = g_a, G_s = 1)}{P(G_a = g_a)} \qquad (7)$$
for xs in chromosome intervals flanked by a marker and the telomere.

Mating systems:

We express the one-, two-, and three-locus genotype frequencies required for Equations 6 and 7 in terms of

$$p = P(G_u = 1), \qquad q_{uv} = P(G_v = 1 \mid G_u = 1) \qquad (8)$$
where xu, xv ∈ {xa, xb, xs}. The values of p and quv depend on the mating system used for deriving the inbred line and the map distance between the markers at positions xu and xv. In this study, we consider four mating systems: (1) (F2)t-single-seed descent (SSD) lines are developed by t (t ≥ 0) generations of random mating of an F2 population and subsequent application of the single-seed descent method for line development; (2) (F1)t-double-haploid (DH) lines are developed by t (t ≥ 0) generations of random mating of an F1 cross and subsequent inbred line development with double haploids; and (3) backcross (BC)t-SSD and (4) BCt-DH lines are developed from an F1 cross backcrossed t (t ≥ 1) times to parent P2, with subsequent line development by the single-seed descent or double-haploid method, respectively. Expressions for p and quv under these mating systems are given in Table 1, and the corresponding derivations are presented in the APPENDIX.
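The parameters p and quv for the four mating systems can be sketched in Python as below. The closed forms used here are an assumption: they follow the verbal derivations in the APPENDIX (decay of gametic disequilibrium by 1 – ruv per generation of random mating, the factor 1/(1 + 2ruv) for continued selfing, and t + 1 meioses for BCt-DH lines) and should be checked against Table 1 before use. The function name `p_and_q` and the system labels are hypothetical.

```python
def p_and_q(system: str, t: int, r: float):
    """Return (p, q_uv) for a mating system, t generations, and recombination frequency r.

    Assumed closed forms, derived from the verbal description in the APPENDIX:
      "F1-DH"  : (F1)^t-DH lines,  t >= 0
      "F2-SSD" : (F2)^t-SSD lines, t >= 0
      "BC-DH"  : BC_t-DH lines,    t >= 1
      "BC-SSD" : BC_t-SSD lines,   t >= 1
    """
    if system == "F1-DH":
        return 0.5, 0.5 + (1 - 2 * r) * (1 - r) ** t / 2
    if system == "F2-SSD":
        return 0.5, 0.5 + (1 - 2 * r) * (1 - r) ** t / (2 * (1 + 2 * r))
    if system == "BC-DH":
        return 0.5 ** (t + 1), (1 - r) ** (t + 1)
    if system == "BC-SSD":
        return 0.5 ** (t + 1), (1 - r) ** t / (1 + 2 * r)
    raise ValueError(f"unknown mating system: {system}")

for name, t in (("F1-DH", 0), ("F2-SSD", 0), ("BC-DH", 1), ("BC-SSD", 1)):
    p, q = p_and_q(name, t, r=0.1)
    print(f"{name:7s} t={t}: p = {p:.4f}, q = {q:.4f}")
```

Two sanity checks are built into these forms: for completely linked loci (r = 0) every system gives quv = 1, and for unlinked loci (r = 0.5) every system gives quv = p, i.e., independence.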


TABLE 1

Definition of parameters p and quv for four mating systems

 

Genotype frequencies:

For the derivations a shorthand notation is used. We omit the names of random variables in definitions of multilocus genotype frequencies and use only the value of the realizations. For example, P(Ga = 1, Gs = 1) is abbreviated as P(11), and P(Ga = 1, Gs = 1, Gb = 1) as P(111). For the derivation of three-locus genotype frequencies, two-locus genotype frequencies referring to subsets of the three loci are required. In this case, the realization of the third (not considered) locus is denoted with a "–," e.g., P(Ga = 1, Gb = 1) is abbreviated as P(1–1).

The single-locus genotype frequencies follow directly from the definition of p in Equation 8:

$$P(G_u = 1) = p, \qquad P(G_u = 0) = 1 - p \qquad (9)$$

The two-locus genotype frequencies for two loci at map positions xu, xv ∈ {xa, xb, xs} can be written as

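$$\begin{aligned}
P(G_u = 1, G_v = 1) &= p\,q_{uv} \\
P(G_u = 1, G_v = 0) &= p\,(1 - q_{uv}) \\
P(G_u = 0, G_v = 1) &= p\,(1 - q_{uv}) \\
P(G_u = 0, G_v = 0) &= 1 - 2p + p\,q_{uv}
\end{aligned} \qquad (10)$$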
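The two-locus genotype frequencies follow from p and quv alone, because each locus carries the P1 allele with the same marginal probability p. A minimal sketch, with the hypothetical function name `two_locus_frequencies` and illustrative parameter values:

```python
def two_locus_frequencies(p: float, q_uv: float) -> dict:
    """Joint frequencies P(G_u = g_u, G_v = g_v), keyed by (g_u, g_v).

    Uses P(11) = p * q_uv and the marginal constraint P(G_u = 1) = P(G_v = 1) = p.
    """
    p11 = p * q_uv
    return {
        (1, 1): p11,
        (1, 0): p - p11,
        (0, 1): p - p11,
        (0, 0): 1 - 2 * p + p11,
    }

# Illustrative parameters: p = 0.25 and q_uv = 0.81.
freq = two_locus_frequencies(0.25, 0.81)
for genotype, value in freq.items():
    print(genotype, round(value, 4))
print("sum =", round(sum(freq.values()), 12))
```

The four frequencies sum to one and reproduce both marginals, which is a quick consistency check for any pair of p and quv.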

For deriving the three-locus genotype frequencies with respect to three loci at map positions xa < xs < xb for (F1)t-DH and (F2)t-SSD lines, we follow an approach outlined by HALDANE and WADDINGTON (1931) and recently developed by BROMAN (2005) for F2-SSD lines (named two-way RILs in his article). We have

Formula 11(11)
and because of symmetry

Formula 12(12)

Solving this system of linear equations and using p = 1/2 yields

Formula 13(13)
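For mating systems with p = 1/2, the three-locus frequencies can be obtained from the pairwise parameters alone. The sketch below does not quote Equation 13; it solves the marginal constraints P(111) + P(110) = P(11–), P(111) + P(101) = P(1–1), P(111) + P(011) = P(–11) together with the symmetry P(ijk) = P((1–i)(1–j)(1–k)), which gives P(111) = (qas + qsb + qab – 1)/4. F2-SSD lines are assumed, with quv = 1/(1 + 2ruv), and all function names are hypothetical.

```python
import math

def q_f2_ssd(distance: float) -> float:
    """q_uv for F2-SSD lines: 1 / (1 + 2r), with r from Haldane's mapping function."""
    r = 0.5 * (1.0 - math.exp(-2.0 * distance))
    return 1.0 / (1.0 + 2.0 * r)

def expected_p1_between_markers(x_a, x_s, x_b, g_a, g_b):
    """E(G_s | g_a, g_b) for x_a < x_s < x_b, assuming p = 1/2 and symmetry."""
    q_as, q_sb, q_ab = q_f2_ssd(x_s - x_a), q_f2_ssd(x_b - x_s), q_f2_ssd(x_b - x_a)
    p111 = (q_as + q_sb + q_ab - 1.0) / 4.0
    same = q_ab / 2.0            # P(G_a = 1, G_b = 1) = P(G_a = 0, G_b = 0)
    different = (1.0 - q_ab) / 2.0  # P(G_a = 1, G_b = 0) = P(G_a = 0, G_b = 1)
    if (g_a, g_b) == (1, 1):
        return p111 / same
    if (g_a, g_b) == (1, 0):
        return (q_as / 2.0 - p111) / different   # P(110) / P(1-0)
    if (g_a, g_b) == (0, 1):
        return (q_sb / 2.0 - p111) / different   # P(011) / P(0-1)
    return (q_ab / 2.0 - p111) / same            # P(010) = P(101), by symmetry

# Markers 1 M apart, locus in the center between them.
print(round(expected_p1_between_markers(0.5, 1.0, 1.5, 1, 1), 3))
print(round(expected_p1_between_markers(0.5, 1.0, 1.5, 1, 0), 3))
```

For two P1 markers 1 M apart, the value at the central locus is close to 0.7, and for markers of different parental origin it is 0.5, as expected from symmetry.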

For BCt-DH and BCt-SSD lines we employ the system of equations

Formula 14(14)

For BCt-DH lines we use

Formula 15(15)
to solve the system of equations in Equation 14 and obtain

Formula 16(16)

For BCt-SSD lines we have

Formula 17(17)
where the superscript b refers to the parameters of a BCt individual, which can be obtained from Equation 16 by replacing t + 1 with t, and superscript s refers to the parameters of F2-SSD lines. Solving Equation 14 yields

Formula 18(18)

Note that (1) the genotype frequencies obtained with Equation 13 after inserting p and qab for F2-SSD lines are identical to those of BROMAN (2005), and (2) the genotype frequencies for BCt-DH lines are identical with those obtained with the formulas of VISSCHER and THOMPSON (1995) for BCt+1 individuals.

Conditional expectation {xi}*:

The predictor {xi} can be extended by replacing E(Gs | ga, gb) (Equation 4) and E(Gs | ga) (Equation 5) with

Formula 19(19)
where gi is a vector consisting of the marker genotype of the markers on the ith chromosome. The resulting predictor {xi}* is the conditional expectation of the parental genome contribution to an inbred line under the condition of the observed marker genotype. For calculation of the multilocus genotype frequencies in Equation 19, the recursion equations of HOSPITAL et al. (1996) can be employed. Further, the closed-form equations derived by VISSCHER and THOMPSON (1995) for BCt individuals can be applied to BCt–1-DH lines.


DISCUSSION

Other predictors for the parental genome contribution:

A commonly used predictor (cf. BERNARDO et al. 2000; HECKENBERGER et al. 2005b) of the genome contribution of parent P1 to an inbred line is the proportion of marker alleles from P1 in the set of polymorphic markers between P1 and P2,

$$\alpha = \frac{1}{m}\sum_{j=1}^{m} g_j \qquad (20)$$
where m is the number of markers and gj refers to the genotypes at the marker loci. Major shortcomings of the unweighted predictor {alpha} are that (i) the correlation between markers due to linkage and (ii) the stochastic dependence between the markers and the adjacent genomic regions are not taken into account. The advantage of {alpha} is that no prior information about the mating system used to develop the line is required.
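The unweighted predictor is simply the mean of the marker indicator variables. A minimal sketch, with the hypothetical function name `alpha_predictor` and an illustrative marker genotype:

```python
def alpha_predictor(marker_genotypes):
    """Proportion of polymorphic markers carrying the P1 allele (1 = P1, 0 = P2)."""
    if not marker_genotypes:
        raise ValueError("at least one polymorphic marker is required")
    return sum(marker_genotypes) / len(marker_genotypes)

# Illustrative genotype: three of four polymorphic markers carry the P1 allele.
print(alpha_predictor([1, 1, 0, 1]))
```

Neither map positions nor the mating system enter the calculation, which is both the convenience and the shortcoming of this predictor.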

No previous studies exist about more efficient predictors for the parental genome contribution to inbred lines. However, VISSCHER (1996) developed an approach for predicting the proportion of the genome originating from the donor parent in backcross individuals, borrowing ideas from selection index theory (HAZEL 1943),

Formula 21(21)
where c is the number of chromosomes, li is the length of the ith chromosome, gi is a vector consisting of the marker genotype of the markers on the ith chromosome, Vi is the covariance matrix of gi, and yi is a vector consisting of the covariances between the donor genome at the markers and the donor genome on the carrier chromosome of the markers.

VISSCHER's (1996) approach can be extended to inbred lines derived from arbitrary mating systems by defining for each chromosome (we omit the index for the chromosome)

Formula 22(22)
where Dus = (qus – p)p is the expected gametic disequilibrium between loci at map positions xu and xs under the considered mating system, with expressions for p and qus given in Table 1. For example, for BCt-DH lines we have

Formula 23(23)

In comparison with the predictor {alpha}, the weighted predictor ß has the advantage that the markers contribute with different weights to ß, depending on their linkage; i.e., ß takes into account the correlation between the markers on a chromosome. However, it ignores the stochastic dependence between the markers and the adjacent genomic regions on a chromosome.

In contrast to {alpha} and ß, the predictor {xi} takes into account both the correlation between markers and the stochastic dependence between markers and the adjacent genomic regions. The former is considered by weighting {xi}i with the distance di between adjacent markers (Equation 2) and the latter by the integration of E(Gs | ga, gb) along the chromosome (Equations 4 and 5).

Conditional expectation {xi}*:

The predictor {xi}* is unbiased, and this can be shown using E(Gu) = p and E (E[Z | G]) = E(Z) (SHAO 1999, p. 33, Proposition 1.12.iv), where the random variable Z denotes the parental genome contribution to an inbred line and the random vector G its multilocus marker genotype. From {xi}* = E(Z | g) it follows that {xi}* is also unbiased in the sample space {Omega}g, which comprises the parental genome contribution to all possible inbred lines having a certain marker genotype g. From an applied point of view, this means that {xi}* is neither systematically overestimating nor underestimating the parental genome contribution for any given marker genotype g. It can be further shown that the conditional expectation of a random variable has minimum variance among all unbiased predictors (SHAO 1999, p. 33, Equation 1.40). In consequence, the conditional expectation {xi}* can be regarded as an optimum predictor of the parental genome contribution.

For the F1-DH mating system (which is, for example, often employed for the development of inbred lines in hybrid maize breeding programs) the predictors {xi} and {xi}* are identical under the assumption of no interference in crossover formation. For other mating systems, calculation of {xi}* requires calculation of multilocus genotype frequencies. In contrast to the relatively simple calculations of two- and three-locus genotype frequencies, which can easily be carried out with standard software such as R (IHAKA and GENTLEMAN 1996), calculation of multilocus genotype frequencies requires extensive programming (cf. SERVIN et al. 2002).

We compared {xi}* and {xi} for several special cases and found only small numerical differences in the results. We therefore conclude that the simple calculations required for {xi} may outweigh the theoretical optimum properties of {xi}* in many practical applications. It could be the subject of further research to investigate whether the substantially greater programming and computational effort, which is required for calculation of {xi}* in the general case, results in a significant improvement of the prediction accuracy compared with {xi}.

Systematic prediction error of {alpha} and ß:

Consider the example of a 2-M chromosome of an F2-SSD line, on which two markers located 0.5 and 1.5 M from the telomere carry the allele of parent P1 (Figure 1A). Loci in the genome region between the two markers are up to 0.5 M distant from the nearest adjacent marker. Owing to the large recombination frequency between distant loci, the markers predict only poorly the genotype at these loci. The predictors {alpha} and ß do not take this low correlation into account and the genotype of all loci on the chromosome is predicted to be the same as the genotype of the markers: {alpha} = ß = 1. We now focus on a large number of chromosomes carrying the alleles of parent P1 at the two markers. Only ~70% of these chromosomes carry the allele of parent P1 at a locus in the center between the markers [E(Gs | ga, gb) ≈ 0.7, see Figure 1]. Hence, with respect to all possible chromosomes having the considered marker genotype, the predictors {alpha} and ß are systematically overestimating the genome proportion originating from parent P1.


FIGURE 1.—

Predictions {alpha}, ß, and {xi} for a 2-M chromosome of an F2-SSD line on which two markers are located 0.5 and 1.5 M from the telomere. (A) Both markers carry the allele of parent P1. (B) The first marker carries the allele of parent P1 and the second marker that of parent P2. The solid line denotes the conditional expectation E(Gs | ga, gb) that the locus at position xs carries the allele of parent P1.

 
For symmetry reasons, the genome contribution of parent P1 to chromosomes carrying at both markers the allele of parent P2 is systematically underestimated by the predictors {alpha} and ß. In contrast, {alpha} and ß show no systematic prediction error for chromosomes on which recombination occurred (Figure 1B). Individuals having no recombination between two markers with map distance 1 M occur in an F2-SSD population with probability 0.54 (cf. HALDANE and WADDINGTON 1931). Hence, a considerable systematic prediction error of {alpha} and ß is observed for more than half of the chromosomes of an F2-SSD population.
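The example of the 2-M chromosome can be worked through numerically. The sketch below computes {xi} for an F2-SSD chromosome by integrating the conditional expectations with a midpoint rule and compares it with {alpha}. It assumes quv = 1/(1 + 2ruv) for F2-SSD lines and the symmetric three-locus solution P(111) = (qas + qsb + qab – 1)/4. All function names are hypothetical.

```python
import math

def q_f2_ssd(distance):
    """q_uv for F2-SSD lines: 1 / (1 + 2r), with r from Haldane's mapping function."""
    r = 0.5 * (1.0 - math.exp(-2.0 * distance))
    return 1.0 / (1.0 + 2.0 * r)

def e_two_markers(x_a, x_s, x_b, g_a, g_b):
    """E(G_s | g_a, g_b), assuming p = 1/2 and symmetric genotype frequencies."""
    q_as, q_sb, q_ab = q_f2_ssd(x_s - x_a), q_f2_ssd(x_b - x_s), q_f2_ssd(x_b - x_a)
    p111 = (q_as + q_sb + q_ab - 1.0) / 4.0
    if g_a == g_b:
        joint = p111 if g_a == 1 else q_ab / 2.0 - p111
        return joint / (q_ab / 2.0)
    joint = q_as / 2.0 - p111 if g_a == 1 else q_sb / 2.0 - p111
    return joint / ((1.0 - q_ab) / 2.0)

def e_one_marker(distance, g):
    """E(G_s | g) for a locus at the given distance from a single marker (p = 1/2)."""
    q = q_f2_ssd(distance)
    return q if g == 1 else 1.0 - q

def mean_over(f, lo, hi, steps):
    """Midpoint-rule average of f over [lo, hi]."""
    h = (hi - lo) / steps
    return sum(f(lo + (k + 0.5) * h) for k in range(steps)) / steps

def xi_chromosome(length, positions, genotypes, steps=2000):
    """Length-weighted predictor xi for one chromosome with sorted marker positions."""
    weighted = 0.0
    first, last = positions[0], positions[-1]
    if first > 0.0:  # interval between telomere and first marker
        weighted += first * mean_over(
            lambda x: e_one_marker(first - x, genotypes[0]), 0.0, first, steps)
    for k in range(len(positions) - 1):  # intervals flanked by two markers
        x_a, x_b, g_a, g_b = positions[k], positions[k + 1], genotypes[k], genotypes[k + 1]
        weighted += (x_b - x_a) * mean_over(
            lambda x: e_two_markers(x_a, x, x_b, g_a, g_b), x_a, x_b, steps)
    if last < length:  # interval between last marker and telomere
        weighted += (length - last) * mean_over(
            lambda x: e_one_marker(x - last, genotypes[-1]), last, length, steps)
    return weighted / length

for label, genotypes in (("A", [1, 1]), ("B", [1, 0])):
    xi = xi_chromosome(2.0, [0.5, 1.5], genotypes)
    alpha = sum(genotypes) / len(genotypes)
    print(f"case {label}: alpha = {alpha:.2f}, xi = {xi:.3f}")
```

In case A the sketch returns a value of {xi} clearly below {alpha} = 1, whereas in case B both predictors coincide at 0.5, in line with the argument that the systematic error concerns chromosomes without recombination between the markers.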

Systematic prediction error for the entire genome:

The above theoretical example illustrates that in principle systematic prediction errors can occur when employing predictors {alpha} and ß. To investigate whether the extent of such systematic prediction errors is of relevance in practical applications, we conducted a simulation study with Plabsoft (MAURER et al. 2004). Simulated data were used because they provide the "true" parental genome contributions z of parent P1 as well as the predictions {vartheta} ({vartheta} ∈ {{alpha}, {xi}}) for each simulated inbred line. This allows us to generate a large number of inbred lines and determine the prediction errors e = {vartheta} – z.

For the simulation we employed a model of the maize genome based on the study of HECKENBERGER et al. (2005a). It consists of 10 chromosomes of length 1.70, 1.30, 1.06, 1.48, 1.28, 1.15, 1.14, 1.21, 0.99, and 0.91 M and 100 SSR markers, which were chosen for good coverage of the entire genome. We simulated 1000 F2-SSD lines, for which we determined the prediction errors of {alpha} and {xi}. The correlation {rho}{alpha},e = 0.36 between the predicted genome proportions {alpha} and the corresponding prediction errors e was highly significant (type I error rate 0.001), whereas no significant correlation was observed for the predictor {xi} (Figure 2).


FIGURE 2.—

Prediction error e of {alpha} and {xi} in a simulated maize data set. Formula 23, Formula 23, and Formula 23 are mean values, and {rho}{alpha},e and {rho}{xi},e are the correlations between the predicted value and prediction error.

 
We conclude that the extent of systematic overestimation of large parental genome contributions and systematic underestimation of small ones by the predictor {alpha} can cause serious problems with linkage maps commonly used in practical applications.

Precision of prediction:

To assess systematically the precision of prediction of {alpha}, ß, and {xi} in the four mating systems under consideration, we conducted a simulation study. We employed a model of the maize genome with 10 chromosomes of length 1.6 M. Twenty to 200 markers were assumed to be (a) randomly distributed and (b) equally spaced in the genome. In practice, the marker distribution ranges between these two extremes, which can be regarded as a "worst-case" scenario (a) and a "best-case" scenario (b). For each combination of marker density and spacing we simulated 500 F2-SSD, F1-DH, BC1-SSD, and BC1-DH populations of size 100. (For random spacing of markers, different maps were used for each of the 500 populations.) The correlations {rho}{alpha},e, {rho}ß,e, and {rho}{xi},e between predicted values and prediction errors as well as the mean squared prediction errors

Formula 24(24)

were determined for each simulated population. The results were then averaged over the 500 populations.
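The two criteria, the correlation between predicted value and prediction error and the mean squared prediction error, can be illustrated with a small simulation. The sketch below is not the Plabsoft simulation used in this study. It assumes F1-DH lines, crossovers following a Poisson process with rate 1 per morgan (no interference), a fixed number of randomly placed markers per chromosome, and only the predictor {alpha}. All names and parameter values are hypothetical.

```python
import random

def simulate_gamete(length, rng):
    """Parental origin at the telomere (1 = P1) and crossover positions of one F1 gamete."""
    start = rng.randint(0, 1)
    crossovers = []
    x = rng.expovariate(1.0)
    while x < length:
        crossovers.append(x)
        x += rng.expovariate(1.0)
    return start, crossovers

def origin_at(x, start, crossovers):
    """Parental origin (1 = P1) at map position x."""
    switches = sum(1 for c in crossovers if c < x)
    return start if switches % 2 == 0 else 1 - start

def p1_length(length, start, crossovers):
    """Total length (morgans) of segments originating from P1."""
    edges = [0.0] + crossovers + [length]
    origin, total = start, 0.0
    for lo, hi in zip(edges, edges[1:]):
        if origin == 1:
            total += hi - lo
        origin = 1 - origin
    return total

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(1)
n_chrom, chrom_len, markers_per_chrom, n_lines = 10, 1.6, 5, 100
marker_map = [sorted(rng.uniform(0.0, chrom_len) for _ in range(markers_per_chrom))
              for _ in range(n_chrom)]

alphas, errors = [], []
for _ in range(n_lines):
    p1_total, alleles = 0.0, []
    for positions in marker_map:
        start, crossovers = simulate_gamete(chrom_len, rng)
        p1_total += p1_length(chrom_len, start, crossovers)
        alleles += [origin_at(x, start, crossovers) for x in positions]
    z = p1_total / (n_chrom * chrom_len)   # true P1 genome contribution
    alpha = sum(alleles) / len(alleles)    # unweighted marker-based prediction
    alphas.append(alpha)
    errors.append(alpha - z)               # prediction error e

rho_alpha_e = pearson(alphas, errors)
mse_alpha = sum(e * e for e in errors) / n_lines
print(f"rho(alpha, e) = {rho_alpha_e:.3f}, M(alpha) = {mse_alpha:.5f}")
```

Repeating the loop for many independently drawn marker maps and averaging the two statistics would mirror the structure of the study design described above.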

The correlations {rho}{alpha},e and {rho}ß,e were highly significant (type I error rate 0.001) for all combinations of the investigated parameters, while {rho}{xi},e was not significantly different from zero for any combination. The largest correlations, amounting to 0.75, were observed for predictor {alpha} with sparse maps and random marker positions (Table 2). However, even with 200 equally spaced markers {rho}{alpha},e ≥ 0.25 and {rho}ß,e ≥ 0.15. The mean squared prediction error M{xi} was at least 50% smaller than M{alpha} for randomly distributed markers, and Mß ranged in between and approached the values of M{xi} for ≥100 markers. With equally spaced maps, the differences between M{alpha}, Mß, and M{xi} were negligible for >80 markers.


TABLE 2

Correlations {rho}{alpha},e and {rho}{xi},e and mean squared prediction errors M{alpha}, Mß, and M{xi} for simulated maize lines depending on marker density and spacing for four mating systems

 
We conclude that the superiority of {xi} compared to {alpha} and ß with respect to the mean squared prediction error reduces with increasing numbers of equally spaced markers. However, even for dense maps with equally spaced markers, the correlations {rho}{alpha},e and {rho}ß,e between predicted value and prediction error indicate that systematic prediction errors of {alpha} and ß are to be expected, with negative effects for practical applications.

Application to experimental data:

Prediction of the parental genome contribution is illustrated with experimental data from a study on plant variety protection in maize (HECKENBERGER et al. 2005a). The genotype of 100 SSR markers was assessed at 56 F2-SSD lines and their crossing parents. For each inbred line, markers not polymorphic between its crossing parents were discarded. This resulted in different marker sets used for the calculations in each line, with numbers of polymorphic markers m ranging between 38 and 67. From the genotype at the polymorphic markers, the predictors {alpha}, ß, and {xi} were calculated (Table 3 lists results for the 12 lines with the largest and smallest values of {xi}).


TABLE 3

Predictors {alpha}, ß, and {xi} for the experimental data from maize

 
The differences between the predictors {alpha} and {xi} were mostly negative for small values of {xi} and mostly positive for large values, reaching up to 11% (line 1). Comparing these values with simulation results for the same linkage map (Figure 2) leads to the conclusion that the differences observed for large and small parental genome contributions are partially caused by the systematic prediction error of {alpha}.

A method to detect essentially derived varieties is to compare a prediction of the parental genome contribution to an inbred line with a threshold value. HECKENBERGER et al. (2005b) suggested using as thresholds the quantiles of the probability distribution of the parental genome contribution to inbred lines under an accepted breeding method. For the chromosome lengths underlying the study of HECKENBERGER et al. (2005a) we investigated this strategy and determined with a simulation the 0.95 quantile of the parental genome contribution to F2-SSD lines as t = 0.662. When the predictions of the parental genome contribution to an F2-SSD population are compared with this threshold value, 5% of the F2-SSD lines are expected to be classified incorrectly as essentially derived varieties. In our experimental F2-SSD population {alpha} > t for 7 lines (12.5% of the 56 lines) and ß > t for 6 lines (10.7%), but {xi} > t only for 3 lines (5.4%) (Table 3).
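The threshold comparison can be sketched in a few lines. The predictions in the example are illustrative values, not the experimental data of Table 3, and the function name `flag_above_threshold` is hypothetical. The last loop merely recomputes the percentages quoted above from the reported counts.

```python
def flag_above_threshold(predictions, threshold):
    """Indices of lines whose prediction exceeds the threshold, and their share."""
    flagged = [i for i, value in enumerate(predictions) if value > threshold]
    return flagged, len(flagged) / len(predictions)

THRESHOLD = 0.662  # 0.95 quantile reported in the text for F2-SSD lines

# Illustrative predictions for eight lines (not experimental data).
hypothetical_xi = [0.48, 0.71, 0.66, 0.35, 0.55, 0.69, 0.50, 0.61]
flagged, share = flag_above_threshold(hypothetical_xi, THRESHOLD)
print("flagged lines:", flagged, "share:", share)

# Percentages corresponding to the reported counts among the 56 experimental lines.
for count in (7, 6, 3):
    print(count, "lines =", round(100 * count / 56, 1), "%")
```

A share of flagged lines well above 5% in a population known to be derived with the accepted breeding method indicates that the predictor, rather than the lines, is responsible for the excess.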

Consequently, the systematic overestimation of large parental genome contributions by {alpha} and ß can result in a greater error rate of incorrectly classifying a line as essentially derived than is nominally associated with a chosen threshold value. However, due to the stochastic nature of meiosis, using {alpha} does not necessarily result in a greater error rate in every experimental population. This can be seen, for example, when comparing the lower tail of the distribution of the experimental data with 1 – t.

Summarizing, {xi} allows prediction of the parental genome contribution with a much higher precision than the unweighted predictor {alpha} commonly employed in practice, in particular when extreme values of the parental genome contribution are of interest. Thus, using {xi} instead of {alpha} is clearly advantageous for obtaining reliable conclusions on the true parental genome contribution to an inbred line.

Deviations from the assumptions:

As applies to most mathematical models of biological systems, the presented prediction method is not capable of capturing every detail of the underlying biological process, and the results should be interpreted with this in mind. Among the assumptions made in our derivations, the following seem of particular importance:
  1. We assumed absence of interference in crossover formation, although it is well known that interference occurs (for a discussion on using noninterference models, see FRISCH and MELCHINGER 2001).
  2. We assumed known map positions of the markers. However, in practice, the linear order and map distances are estimated from mapping experiments with one or several segregating populations. Depending on the size and type of the mapping population(s), the estimated map may deviate from the true map due to sampling error or other causes.
  3. We assumed absence of selection during backcrossing and inbred line development. If selection is carried out, the probability that a certain locus carries the allele of P1 may differ from our derivations.

If, for a certain study, one or several of these assumptions do not hold true, the actual advantage in precision of {xi} compared with {alpha} and ß may be smaller than that under the idealized model, where all assumptions are fulfilled.

Applications in genetics and breeding:

With the above limitations in mind, the presented results demonstrate that the predictor {xi} provides a substantial improvement in the precision of predicting the parental genome contribution to inbred lines compared with the commonly used unweighted predictor {alpha}. This improved precision can be important in a broad range of practical applications.

In inbred lines developed from backcross individuals of a gene introgression program, exact prediction of the parental contribution of the donor parent can help to assess the risk of negative phenotypic effects caused by the donor genome. The prediction of the parental genome contribution to inbred lines complements the prediction of the donor genome proportion in backcross individuals described by FRISCH and MELCHINGER (2005). The combination of both approaches allows monitoring of the parental genome proportion from the first backcross generation until the converted inbred line is finally developed.

Introgression libraries of near-isogenic lines (ESHED and ZAMIR 1995) are increasingly developed in various crops, e.g., for fine mapping of QTL or for identification of advantageous chromosome segments in exotic genetic resources or landraces (TANKSLEY and NELSON 1996). The presented approach can be used to predict the donor genome proportion in chromosome regions where a line of an introgression library carries the marker alleles of the recurrent parent. This can help to assess the risk that the observed phenotypic effect is not caused by the chromosome segment introgressed on purpose but by other donor chromosome segments not detected by the employed marker set.

In plant variety protection, exact prediction of the parental genome contribution to an inbred line is of crucial importance to draw conclusions whether or not the line (1) was developed with a generally accepted breeding method or (2) has a parental genome proportion below a generally accepted threshold. The predictor {xi} can be employed, for example, to estimate precisely the parental genome contribution of one parent, assuming a given mating system, which can then be compared with threshold values (HECKENBERGER et al. 2005b). In this context, it is of particular interest that {alpha} is overestimating systematically large parental genome contributions.

In quantitative genetic studies, the genome contribution of a parent to a biparental crossing progeny is usually assumed to be one-half. If {xi} is employed instead, this allows us to consider not only the expected relation between two lines on the basis of the mating system, but also the actual similarity at the level of the entire genome. A possible application is, for example, the best linear unbiased prediction of the breeding value of a line employing phenotypic information from its crossing parents.

Recurrent full-sib mating is used to generate homozygous strains in animals such as mice. Employing results of HALDANE and WADDINGTON (1931), parameters p and quv for recurrent full-sib mating can be derived analogously to the derivations for recurrent selfing in the APPENDIX. Using this extension, the theory presented here can be used straightforwardly for applications in animal genetics.

Prediction of the parental genome proportion with {xi} can be interpreted as a "map-based genetic distance" between an inbred line and its crossing parent. It seems promising for further research to investigate whether the principle used in this study can be extended to provide map-based genetic distances for general pedigrees and/or heterozygous individuals.


APPENDIX
We derive the probabilities $p$ and $q_{uv}$ for (F1)t-DH, (F2)t-SSD, BCt-DH, and BCt-SSD lines. For the derivations we use the relationship

$$q_{uv} = p + \frac{D_{uv}}{p},$$

where $D_{uv}$ is the expected gametic disequilibrium between two loci at positions $x_v$ and $x_u$ in infinite populations.
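A relationship of this kind follows directly from the definition of gametic disequilibrium. The short derivation below is a sketch under two assumptions: that quv denotes the conditional probability that the locus at xv carries the P1 allele given that the locus at xu does, and that the disequilibrium is the joint probability minus p squared. The symbol $P_{uv}$ for the joint probability is introduced here for illustration only.

```latex
\begin{align*}
P_{uv} &= p^{2} + D_{uv}, \\
q_{uv} &= \frac{P_{uv}}{p} = p + \frac{D_{uv}}{p}.
\end{align*}
```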

(F1)t-DH lines:

The probability that a locus of an (F1)t-DH line carries the allele of parent P1 is $p = 1/2$. The expected linkage disequilibrium in an (F1)1 (i.e., an F2) population is

$$D_{uv} = \frac{1 - 2r_{uv}}{4}.$$

Because (i) the expected gametic disequilibrium in an (F1)t–1-derived DH line equals that of an (F1)t population and (ii) in random mating populations, the linkage disequilibrium decreases with ratio $(1 - r_{uv})$ per generation (FALCONER and MACKAY 1996, p. 18), the expected gametic disequilibrium for (F1)t-DH lines is

$$D_{uv} = \frac{(1 - 2r_{uv})(1 - r_{uv})^{t}}{4}. \tag{A1}$$

In consequence, we have

$$q_{uv} = \frac{1}{2} + \frac{(1 - 2r_{uv})(1 - r_{uv})^{t}}{2}.$$
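The decay argument for (F1)t-DH lines can be checked numerically. The sketch below iterates two-locus gamete frequencies through t generations of random mating in an infinite population and compares the result with the closed form 1/2 + (1 - 2r)(1 - r)^t / 2. That closed form is our reading of what the argument implies, not an expression quoted from the original, and the function names are hypothetical.

```python
def q_uv_by_recursion(r, t):
    """Conditional probability q_uv for DH lines derived from (F1)_t.

    Iterates two-locus gamete frequencies through t generations of random
    mating in an infinite population, starting from the gametes of an F1
    between P1 (haplotype AB) and P2 (haplotype ab).
    """
    # Gametes produced by the F1: parental types AB, ab; recombinants Ab, aB.
    f_AB, f_Ab, f_aB, f_ab = (1 - r) / 2, r / 2, r / 2, (1 - r) / 2
    for _ in range(t):
        d = f_AB * f_ab - f_Ab * f_aB  # gametic disequilibrium
        f_AB, f_Ab, f_aB, f_ab = (f_AB - r * d, f_Ab + r * d,
                                  f_aB + r * d, f_ab - r * d)
    p = f_AB + f_Ab  # P(locus u carries the P1 allele)
    return f_AB / p


def q_uv_closed_form(r, t):
    """Closed form implied by the (1 - r) decay of disequilibrium (assumed)."""
    return 0.5 + 0.5 * (1 - 2 * r) * (1 - r) ** t


if __name__ == "__main__":
    for t in range(4):
        print(t, q_uv_by_recursion(0.1, t), q_uv_closed_form(0.1, t))
```

With t = 0 both functions reduce to 1 - r, the probability that a DH line drawn directly from the F1 shows no recombination between the two loci.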

(F2)t-SSD lines:

The probability that a locus in an (F2)t-SSD line carries the allele of parent P1 is $p = 1/2$. The linkage disequilibrium in SSD lines derived from a population in Hardy–Weinberg equilibrium with linkage disequilibrium of $D'_{uv}$ is

$$D_{uv} = \frac{D'_{uv}}{1 + 2r_{uv}}$$
(COCKERHAM and WEIR 1973). Because for an (F2)t population (derivation in analogy to Equation A1)

$$D'_{uv} = \frac{(1 - 2r_{uv})(1 - r_{uv})^{t}}{4},$$
we have for (F2)t-SSD lines

$$D_{uv} = \frac{(1 - 2r_{uv})(1 - r_{uv})^{t}}{4\,(1 + 2r_{uv})}$$
and, therefore,

$$q_{uv} = \frac{1}{2} + \frac{(1 - 2r_{uv})(1 - r_{uv})^{t}}{2\,(1 + 2r_{uv})}.$$

BCt-DH lines:

The probability that a locus of a BCt-derived DH line carries the allele of parent P1 is $p = (1/2)^{t+1}$, and the probability that a locus at position $x_v$ carries the allele of P1 under the condition that the locus at position $x_u$ carries the allele of parent P1 is

$$q_{uv} = (1 - r_{uv})^{t+1}.$$

BCt-SSD lines:

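The probability that a locus in a BCt-derived SSD line carries the allele of parent P1 is $p = (1/2)^{t+1}$. The probability that continued selfing of an individual with genotype ABab results in an inbred with one of genotypes AbAb or aBaB is

$$\frac{2r_{uv}}{1 + 2r_{uv}}$$
(HALDANE and WADDINGTON 1931). Consequently, for BCt-SSD lines,

$$D_{uv} = \left(\frac{1}{2}\right)^{t+1}\left[\frac{(1 - r_{uv})^{t}}{1 + 2r_{uv}} - \left(\frac{1}{2}\right)^{t+1}\right]$$
and, therefore,

$$q_{uv} = \frac{(1 - r_{uv})^{t}}{1 + 2r_{uv}}.$$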
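The four cases of this APPENDIX can be collected in a small lookup, which may help when checking limiting cases. The sketch below is not the authors' software: the function names and scheme labels are hypothetical, the closed forms are our reconstruction from the arguments given above, the indexing assumes that t = 0 denotes lines derived directly from the F1, F2, or F1 backcross base, and Haldane's mapping function (no interference) is assumed for converting map distance to recombination frequency.

```python
import math


def haldane_r(d_morgans):
    """Recombination frequency for a map distance in Morgans.

    Assumes Haldane's mapping function, i.e., no crossover interference.
    """
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def p_and_q(scheme, t, r):
    """Return (p, q_uv) for one breeding scheme (reconstructed, assumed forms).

    scheme: "F1-DH", "F2-SSD", "BC-DH", or "BC-SSD" (labels are illustrative)
    t:      number of random-mating or backcross generations (t = 0 assumed
            to mean lines derived directly from the base generation)
    r:      recombination frequency between the loci at x_u and x_v
    """
    decay = (1.0 - r) ** t
    if scheme == "F1-DH":
        return 0.5, 0.5 + 0.5 * (1 - 2 * r) * decay
    if scheme == "F2-SSD":
        return 0.5, 0.5 + 0.5 * (1 - 2 * r) * decay / (1 + 2 * r)
    if scheme == "BC-DH":
        return 0.5 ** (t + 1), decay * (1 - r)
    if scheme == "BC-SSD":
        return 0.5 ** (t + 1), decay / (1 + 2 * r)
    raise ValueError(f"unknown scheme: {scheme}")


if __name__ == "__main__":
    r = haldane_r(0.10)  # two markers 10 cM apart
    for scheme in ("F1-DH", "F2-SSD", "BC-DH", "BC-SSD"):
        print(scheme, p_and_q(scheme, 1, r))
```

Two limiting cases serve as sanity checks under these assumptions: for completely linked loci (r = 0) the conditional probability is 1 in every scheme, and for unlinked loci (r = 1/2) it equals p, so the second locus carries no information about the first.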


ACKNOWLEDGEMENTS
We thank the anonymous reviewers for their comments and suggestions, which helped to improve the manuscript. In particular, we are greatly indebted to an anonymous reviewer for pointing out a major mistake in an earlier version of the manuscript.


LITERATURE CITED

BERNARDO, R., J. ROMERO-SEVERSON, J. ZIEGLE, J. HAUSER, L. JOE et al., 2000 Parental contribution and coefficient of coancestry among maize inbreds: pedigree, RFLP, and SSR data. Theor. Appl. Genet. 100: 552–556.

BROMAN, K., 2005 The genomes of recombinant inbred lines. Genetics 169: 1133–1146.

COCKERHAM, C. C., and B. S. WEIR, 1973 Descent measures for two loci with some applications. Theor. Popul. Biol. 4: 300–330.

ESHED, Y., and D. ZAMIR, 1995 An introgression line population of Lycopersicon pennellii in the cultivated tomato enables the identification and fine mapping of yield associated QTL. Genetics 141: 1147–1162.

FALCONER, D. S., and T. C. MACKAY, 1996 Introduction to Quantitative Genetics, Ed. 4. Longman Group, Harlow, UK.

FRANKLIN, I. R., 1977 The distribution of the proportion of genome which is homozygous by descent in inbred individuals. Theor. Popul. Biol. 11: 60–80.

FRISCH, M., and A. E. MELCHINGER, 2001 The length of the intact chromosome segment around a target gene in marker-assisted backcrossing. Genetics 157: 1343–1356.

FRISCH, M., and A. E. MELCHINGER, 2005 Selection theory for marker-assisted backcrossing. Genetics 170: 909–917.

HALDANE, J. B. S., 1919 The combination of linkage values and the calculation of distance between the loci of linkage factors. J. Genet. 8: 299–309.

HALDANE, J. B. S., and C. H. WADDINGTON, 1931 Inbreeding and linkage. Genetics 16: 357–374.

HAZEL, L. N., 1943 The genetic basis for constructing selection indices. Genetics 28: 476–490.

HECKENBERGER, M., M. BOHN and A. E. MELCHINGER, 2005a Identification of essentially derived varieties obtained from biparental crosses of homozygous lines. I. SSR data from maize inbreds. Crop Sci. 45: 1132–1140.

HECKENBERGER, M., M. BOHN, M. FRISCH, H. P. MAURER and A. E. MELCHINGER, 2005b Identification of essentially derived varieties with molecular markers: an approach based on statistical test theory and computer simulations. Theor. Appl. Genet. 111: 598–608.

HILL, W. G., 1993 Variation in genetic composition in backcrossing programs. J. Hered. 84: 212–213.

HOSPITAL, F., C. DILLMANN and A. E. MELCHINGER, 1996 A general algorithm to compute multilocus genotype frequencies under various mating systems. Comput. Appl. Biosci. 12: 455–462.

IHAKA, R., and R. GENTLEMAN, 1996 A language for data analysis and graphics. J. Comput. Graph. Stat. 5: 299–314.

MAURER, H. P., A. E. MELCHINGER and M. FRISCH, 2004 Plabsoft: software for simulation and data analysis in plant breeding. Proceedings of the 17th Eucarpia General Congress, September 8–11, 2004, Tulln, Austria, pp. 359–362.

SERVIN, B., C. DILLMANN, G. DECOUX and F. HOSPITAL, 2002 MDM: a program to compute fully informative genotype frequencies in complex breeding schemes. J. Hered. 93: 227–228.

SHAO, J., 1999 Mathematical Statistics. Springer-Verlag, New York.

STAM, P., 1979 Interference in genetic crossing over and chromosome mapping. Genetics 92: 573–594.

TANKSLEY, S. D., and J. C. NELSON, 1996 Advanced backcross QTL analysis: a method for the simultaneous discovery and transfer of valuable QTLs from unadapted germplasm into elite breeding lines. Theor. Appl. Genet. 92: 191–203.

VISSCHER, P. M., 1996 Proportion of the variance in genetic composition in backcrossing programs explained by molecular markers. J. Hered. 87: 136–138.

VISSCHER, P. M., and R. THOMPSON, 1995 Haplotype frequencies of linked loci in backcross populations derived from inbred lines. Heredity 75: 644–649.

Communicating editor: R. W. DOERGE



