Predicting Rates of Inbreeding in Populations Undergoing Selection
 ^{*}Roslin Institute (Edinburgh), Roslin, Midlothian EH25 9PS, United Kingdom
 ^{†}Animal Breeding and Genetics Group, Wageningen Institute of Animal Sciences, Wageningen Agricultural University, 6700 AH Wageningen, The Netherlands
 Corresponding author: J. A. Woolliams, Roslin Institute (Edinburgh), Roslin, Midlothian EH25 9PS, United Kingdom. Email: john.woolliams{at}bbsrc.ac.uk
Abstract
Tractable forms for predicting rates of inbreeding (ΔF) in selected populations with general indices, nonrandom mating, and overlapping generations were developed, with the principal results assuming a period of equilibrium in the selection process. An existing theorem concerning the relationship between squared long-term genetic contributions and rates of inbreeding was extended to nonrandom mating and to overlapping generations. ΔF was shown to be ~¼(1 − ω) times the expected sum of squared lifetime contributions, where ω is the deviation from Hardy-Weinberg proportions. This relationship cannot be used for prediction since it is based upon observed quantities. Therefore, the relationship was further developed to express ΔF in terms of expected long-term contributions that are conditional on a set of selective advantages that relate the selection processes in two consecutive generations and are predictable quantities. With random mating, if selected family sizes are assumed to be independent Poisson variables, then the expected long-term contribution could be substituted for the observed one, provided that ¼ (since ω = 0) was increased to ½. Established theory was used to provide a correction term to account for deviations from the Poisson assumptions. The equations were successfully applied, using simple linear models, to the problem of predicting ΔF with sib indices in discrete generations, for which previously published solutions had proved complex.
Wray and Thompson (1990) proved a fundamental relationship between the sum of squared long-term genetic contributions of ancestors and rates of inbreeding for random mating populations in discrete generations. One consequence of this relationship was that rates of inbreeding were tied to the numerator relationship matrix for the first time. This narrowed the conceptual gap between the central parameter for genetic evaluation of individuals using best linear unbiased prediction and one of the key properties of a breeding scheme. Another important consequence was to set out formally a model for the mechanics of inheritance of selective advantage, a concept that Robertson (1961) had introduced but left unclarified. An achievement of the methods of Wray and Thompson (1990) was to obtain, for the first time, accurate predictions of ΔF in mass selection by modeling pathway extensions. However, this was done with a recursive algorithm, so that although the mechanics were clear, the overall structure of the prediction remained obscure.
Woolliams et al. (1993) advanced the understanding of the structure of the prediction by obtaining a closed form for ΔF. It was shown to contain terms involving the variances of family size in one generation, with additional terms for the proliferation or reduction of ancestral lines over many generations that could be predicted from the selective advantage of the ancestor. Furthermore, it was clear that under equilibrium conditions the model would lend itself to geometric summation of terms across generations. This led to simple forms for the expected long-term contribution of an ancestor. Wray et al. (1994) extended the methods to index selection, although their model is a hybrid of the approaches of Woolliams et al. (1993) and Hill (1972), since the conditional arguments of pathway extension that had been carried out for mass selection were found to be too complex for index selection. Nevertheless, worthwhile predictions were made available in a tractable form.
Santiago and Caballero (1995) used an approach that made no direct reference to the theory of contributions to predict ΔF in mass selection. They obtained a neater closed form for ΔF than that derived by Woolliams et al. (1993) through an argument based on total drift, relating the change through selection to loss of genetic variance. Unlike the previous work of Wray and Thompson (1990) and Woolliams et al. (1993), who had considered the population in relation to an unselected base generation, Santiago and Caballero (1995) developed predictions based upon equilibrium genetic variance. Nomura (1996) extended the approach of Santiago and Caballero (1995) to mass selection with overlapping generations but with the important restriction that the males and females selected from a cohort remain the same in both number and identity throughout the breeding life of the cohort.
This article examines the issues raised by the work described above. First, the relationship between ΔF and the realized long-term genetic contributions is extended to include nonrandom mating and overlapping generations. Second, an important result for the prediction of ΔF is developed by demonstrating, for random mating, a relationship between ΔF and the expected squared long-term contribution conditional on the selective advantages. Finally, as an example of application, predictions of ΔF for sib indices, previously considered by Wray et al. (1994), are re-examined using the equilibrium methods for expected long-term contributions developed by Woolliams et al. (1999) and compared with results from simulation.
RELATIONSHIP BETWEEN ΔF AND LONG-TERM GENETIC CONTRIBUTIONS
This section discusses the relationship between ΔF and realized long-term genetic contributions. In doing so, it derives the expected increase in homozygosity at a neutral locus, in contrast to the matrix method of Wray and Thompson (1990). The notation used is shown in Table 1. The population is assumed, for the present, to have discrete generations with X_{m} male parents and X_{f} female parents. For calculation of inbreeding coefficients, every allele in the base population (t = 0) is considered unique. It does not matter whether or not the base generation has the structure of an unselected and unrelated population.
Discrete generations: Consider one of these alleles at a neutral locus in the base population (say allele B). Let the gene frequency at time t in the parents of sex q that have been selected to produce generation t + 1 be denoted by P_{B}(q, t). The gene frequency can be described in terms of genetic contributions, similar to Equation 1 of Woolliams et al. (1999). Let A_{i} be the gene frequency of allele B in individual i, where A_{i} = 1, ½, or 0 if i is BB, B·, or ··, respectively (where · represents any other allele); the individual gene frequencies can then be treated as breeding values for frequency. The average of the gene frequency in the parents of sex q in generation t is given by
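The coding of individual gene frequencies can be illustrated with a minimal sketch. It assumes, hypothetically, that each selected parent's genotype at the neutral locus is stored as a pair of allele labels; the function names are illustrative only and are not taken from the original work. Exact fractions are used so that A_{i} takes the values 1, ½, or 0 exactly, and the parental frequency is simply the mean of A_{i} over the parents of one sex.

```python
from fractions import Fraction


def individual_frequency(genotype, allele):
    """A_i for one diploid individual: 1 if homozygous for `allele`,
    1/2 if it carries one copy, 0 otherwise. `genotype` is a 2-tuple
    of allele labels (a hypothetical representation)."""
    return Fraction(sum(1 for a in genotype if a == allele), 2)


def parental_frequency(genotypes, allele):
    """Mean of A_i over the selected parents of one sex, i.e. P_B(q, t)."""
    if not genotypes:
        raise ValueError("need at least one parent")
    return sum(individual_frequency(g, allele) for g in genotypes) / len(genotypes)


# Hypothetical sires: one BB, one B-other, one carrying no copy of B
sires = [("B", "B"), ("B", "x"), ("x", "y")]
print(parental_frequency(sires, "B"))  # 1/2
```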
Initially assume random mating. For any generation, the probability of homozygotes for B is the product of the gene frequencies in the male and female parents, P_{B}(m, t)P_{B}(f, t). The inbreeding coefficient F_{t} for the neutral locus is then the sum of these products over all distinct alleles at the locus,
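The statement above (homozygosity for B is the product of the male and female parental frequencies, summed over all distinct base alleles) can be sketched as follows. The dictionaries of allele frequencies and the allele labels are hypothetical inputs, not quantities defined in the text.

```python
def prob_homozygous_ibd(freq_m, freq_f):
    """Random mating: sum over distinct base alleles B of
    P_B(m, t) * P_B(f, t). Alleles present in only one sex contribute 0."""
    return sum(p * freq_f.get(allele, 0.0) for allele, p in freq_m.items())


# Hypothetical parental frequencies of two base alleles, B and C
freq_m = {"B": 0.5, "C": 0.5}
freq_f = {"B": 0.25, "C": 0.75}
print(prob_homozygous_ibd(freq_m, freq_f))  # 0.5

# In the base generation every allele is unique, so no allele is shared
print(prob_homozygous_ibd({"b1": 0.5, "b2": 0.5}, {"b3": 0.5, "b4": 0.5}))  # 0.0
```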
More precisely, for each allele and each ancestor, the term
Assume equilibrium values, attained by generation 2, for (i) the deviation from Hardy-Weinberg frequencies arising from nonrandom mating (ω, equivalent to α_{I} of Caballero and Hill 1992) and (ii) ΔF (this assumption is removed later); then Equation 2 can be further simplified using results given in appendix a, namely,
Therefore, the terms in C_{u}(t) can be modified to terms in C_{u−1}(t − 1), and each term of the sum within the square brackets of Equation 6 can be reduced to −ΔF C_{u}(t − 1). After this process is repeated for the C_{2}(t) term [temporarily neglecting the term in ωΔF C_{1}(t − 1)],
Since C_{u}(t) = E[Σ_{i} Σ_{mates (j(m), j*(f))} r_{i,u}(j(m), t) r_{i,u}(j*(f), t)], for large u ≫ t and for any i the terms r_{i,u}(j(m), t) and r_{i,u}(j*(f), t) converge to the same value for all j in generation t, provided the population mixes. This value is the long-term contribution of ancestor i to the population, denoted by r_{i}. This convergence occurs with or without random mating. Thus
This result was obtained for ω = 0 by Wray and Thompson (1990), but the derivation here differs in several respects. First, in the derivation of Wray and Thompson the base was unselected and therefore not in equilibrium at the start of the selection process, which gave the impression that the contributions used for estimating rates of inbreeding must be those of the generation after an unselected base. It is now evident that the choice of the generation on which the estimate is based is arbitrary, except that it must be at the start of some period of local equilibrium during which some “equilibrium ΔF” may exist. Second, the derivation using the probability of homozygosity for an assumed allele is of value because the proof of Wray and Thompson (1990) is heavily based upon the properties of the numerator relationship matrix. Third, it extends the result to nonrandom mating, for which the result had been given without proof by Woolliams and Thompson (1994). Caballero and Hill (1992) noted that the result of Wray and Thompson (1990) was a poor predictor of ΔF with nonrandom mating, and it is now clear why this was so.
Even though the development of the pedigree may be in equilibrium (which implies that the genetic variance being selected upon is in equilibrium), this does not imply that equilibrium values of ω and ΔF for the alleles defined in the arbitrary base are immediately attained. Equation 4, using appendix a, assumes that these parameters were in equilibrium for the Mendelian sampling in generation 2. However, the following argument shows that this does not affect the result. Assume the equilibrium conditions have not been attained by generation 2; then for this generation and a small number of following generations (i.e., up to attainment of equilibrium) there will be terms of the form δC_{u}(t) in Equation 4 and δC_{u}(t − 1) in Equation 5. Provided that t is sufficiently large compared with the period of attainment, these terms will cancel in Equation 6, since C_{u}(t) is a convergent series. Thus Equations 10, 11, 12, and 13 hold for the equilibrium values of ω and ΔF.
Overlapping generations: If ΔF is taken per unit time, then the structure of the preceding proof holds. The reduction in the variance of the Mendelian sampling term over initial cohorts, before an equilibrium ΔF per unit time is established, is not straightforward, since it depends upon the age structure of the population; but the argument used above to overcome deviations from equilibrium can be applied. One distinction with overlapping generations, however, is that the base generation will contain the equivalent of L cohorts, where L is the period of time over which the long-term contributions sum to one, since this is the period required for the population to turn over a generation for those genes destined to remain in the population in the long term. Woolliams et al. (1999) show that this genetic generation interval differs from the average age of the parents when there are selective advantages between groups (see also Bijma and Woolliams 1999). To balance Equation 8, terms of magnitude ½C_{0}(t) (ΔF per generation), or equivalently ½C_{0}(t)L (ΔF per unit time), where L is the generation interval, need to be added and subtracted. Thus the error term in Equation 10 is [1 − ½CL]^{−1}, and consequently ignoring this term results in an underestimate with a fractional error of 2 × (ΔF per generation). Equation 11 is obtained by summing over all individuals born in a single cohort. With overlapping generations, individual ancestors within cohorts will have different life histories, since they will be used at different breeding ages or for different purposes. If X_{q} is the number of individuals with a lifetime breeding profile categorized by q, then the approximation will be
RELATIONSHIP BETWEEN ΔF AND EXPECTED CONTRIBUTIONS
Since ΔF is proportional to
Monoecious population: The proof is simplest for a monoecious diploid population of X parents in discrete generations without selfing. Random mating is assumed (ω = 0). Extension to overlapping generations and to two sexes follows by analogy but is complicated by the need for matrices, so this extension is made in appendix b. The long-term contribution of individual i is given by
The variance
One of the critical assumptions of the proof leading to Equation 24 is that the selected family sizes are Poisson distributed. However, departures from this will occur, for example, (i) when litter sizes are not Poisson; (ii) when negative covariances between full sibs and between half sibs are induced by using sib indices for selection; (iii) when selection intensity becomes large; and (iv) when there are common environmental variances associated with litters. (The occurrence of the last two causes will depend on the model chosen for s, which is addressed in the discussion.)
To account for this deviation, let V_{n,i} = θ_{n,i} + V_{n,dev,i} in Equation 19, where V_{n,dev,i} may be positive or negative according to the circumstances. Then the component in θ_{n,i} can be treated as before, and Equation 21 becomes
One of the benefits of Equation 24 is that the rate of inbreeding can be obtained from predicted means, often using regression techniques. Accounting for deviations from the Poisson distribution introduces the need to estimate variances of family size to obtain Equation 27. Nevertheless, the multigenerational problem of estimating the variance of a long-term genetic contribution has been reduced to estimating the variance of family size after selection in a single generation.
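The practical appeal of predicting ΔF from means can be shown with a short sketch. It assumes the form ΔF ≈ ½ Σ_{q} X_{q} E[μ_{i(q)}^{2}], which is inferred from the statement that the constant ¼ becomes ½ when expected contributions replace observed ones under Poisson family sizes and random mating; the numbered equation itself is not reproduced in this section, and the function name is hypothetical. With two sexes treated as two categories and no selection, each sire is expected to contribute 1/(2X_{m}) and each dam 1/(2X_{f}), and the sketch then recovers the classical 1/(8X_{m}) + 1/(8X_{f}).

```python
def delta_f_poisson(groups):
    """Rate of inbreeding from expected long-term contributions, assuming
    independent Poisson family sizes and random mating:
        dF ~= 1/2 * sum_q X_q * E[mu_{i(q)}^2]
    `groups` is an iterable of (X_q, E[mu_{i(q)}^2]) pairs."""
    return 0.5 * sum(x * mean_sq for x, mean_sq in groups)


# Sanity check without selection: sires jointly contribute 1/2, dams 1/2
x_m, x_f = 20, 60
mu_m, mu_f = 1 / (2 * x_m), 1 / (2 * x_f)
dF = delta_f_poisson([(x_m, mu_m ** 2), (x_f, mu_f ** 2)])
print(round(dF, 6), round(1 / (8 * x_m) + 1 / (8 * x_f), 6))  # 0.008333 0.008333
```

Under selection, the E[μ^{2}] entries would instead come from a model of expected contributions conditional on the selective advantages.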
Extension to overlapping generations: With overlapping generations, individuals within a cohort that are selected to breed at any point in their lifetime can be divided into breeding categories. These categories are defined by the age at breeding, how often the individual breeds, and for what purpose. Categories are particularly important under selection. As an example, consider mass selection where all selected individuals can have progeny born at ages 1, 2, or 3. If the population is making genetic progress, the average merit of individuals born 3 years ago is less than that of individuals born 1 year ago. Therefore an offspring of a 3-year-old parent will have a selective disadvantage compared with an offspring of a 1-year-old parent and so is expected to make a smaller long-term genetic contribution (see Bijma and Woolliams 1999). If an individual is a parent at all ages, then its genetic contribution is expected to be greater than that of an individual chosen for breeding at only a single age. Breeding purpose is also important: if one group of parents is given more mating opportunities, then these parents would be expected to have more offspring and, other factors being equal, ultimately a greater long-term genetic contribution.
For these reasons, partitioning the selected individuals into categories is necessary to obtain the general result. It is assumed that the categories are defined so that each individual belongs to a single category that describes its lifetime genetic contribution. To continue the example of mass selection, where the only distinction among parents is the breeding age, there would potentially be seven categories. If {x} denotes age x at breeding, these categories are {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, and {1, 2, 3}. The number of categories will inevitably depend on the complexity of the breeding scheme, but the essential point is that they can be defined and enumerated. Let n_{c} be the number of categories, indexed by q = 1, …, n_{c}, and let μ_{i(q)} be the expected long-term contribution of individual i in category q conditional on its selective advantage s_{i(q)}, with variance
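The seven categories in the mass-selection example are simply the non-empty sets of breeding ages, so they can be enumerated mechanically. A minimal sketch, assuming breeding age is the only distinction among parents (the function name is illustrative):

```python
from itertools import combinations


def breeding_categories(ages):
    """Every non-empty set of breeding ages; each defines one lifetime category."""
    ages = sorted(ages)
    return [c for k in range(1, len(ages) + 1) for c in combinations(ages, k)]


cats = breeding_categories([1, 2, 3])
print(len(cats))  # 7
print(cats)       # [(1,), (2,), (3,), (1, 2), (1, 3), (2, 3), (1, 2, 3)]
```

With k possible breeding ages this gives n_{c} = 2^{k} − 1 categories; schemes that also distinguish breeding purpose would multiply the count accordingly.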
As before, for a parent from category q, define V_{n(q),dev} as the n_{c} × n_{c} (co)variance matrix for the number of selected offspring in each of the n_{c} categories, expressed as deviations from independent Poisson variances. For each q, neglecting terms in s (for empirical reasons given earlier), there will be a term δ_{q} defined by α^{T}V_{n(q),dev}α, where α is the vector whose qth element is the expected long-term contribution of an individual from category q, i.e., E_{s}[μ_{i(q)}] = α_{q}. Note that δ_{q} may be negative, since it is a deviation from a variance rather than a variance. This term is introduced in Equation B6 of appendix b. From appendix b we arrive at
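The correction term δ_{q} = α^{T}V_{n(q),dev}α is a quadratic form and is straightforward to evaluate once α and the deviation matrix are available. The sketch below uses made-up numbers for a two-category case with under-dispersed family sizes, to show that δ_{q} can be negative; the values are illustrative only.

```python
import numpy as np


def poisson_deviation_term(alpha, v_dev):
    """delta_q = alpha^T V_{n(q),dev} alpha.

    alpha : expected long-term contribution per category (length n_c)
    v_dev : n_c x n_c (co)variances of selected offspring numbers per
            category, expressed as deviations from independent Poisson
            variances. The result may be negative, since v_dev is a
            deviation rather than a variance."""
    alpha = np.asarray(alpha, dtype=float)
    v_dev = np.asarray(v_dev, dtype=float)
    n_c = alpha.size
    if v_dev.shape != (n_c, n_c):
        raise ValueError("v_dev must be an n_c x n_c matrix")
    return float(alpha @ v_dev @ alpha)


# Hypothetical two-category example with less-than-Poisson family variances
delta = poisson_deviation_term([0.5, 0.25], [[-0.2, 0.1], [0.1, -0.4]])
print(round(delta, 6))  # -0.05
```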
Although the proof has been based upon a monoecious diploid organism with no selfing, the extension to a dioecious organism is clear from the proof for overlapping generations. Having discrete generations with two sexes is identical to having two categories, i.e., males and females. Finally note that, other than assuming an equilibrium and random mating, there have been no assumptions on the type of selection index used, the nature of the genetic variation, or the population structure.
APPLICATIONS AND RESULTS
Sib indices in discrete generations: The theory is illustrated by selection on a general sib index of the form
In Wray et al. (1994) the selective advantages were based on the breeding values A_{i(x)}, and this approach is adopted here with a slight modification. A sire i has one selective advantage, namely, its own breeding value plus the average breeding value of its d mates (i.e., its mate group); this aggregate value is denoted by A_{i(hs)}. A dam i has two selective advantages: first, the selective advantage of its mate (A_{i(hs)}), and second, its own breeding value expressed as a deviation from the average breeding value of the mate group to which it belongs (denoted A_{i(fs)}). The average breeding value of the full-sib family from dam i is ½(A_{i(hs)} + A_{i(fs)}). Thus, in this hierarchical scheme, s_{i(m)} = (A_{i(hs)}) and s_{i(f)} = (A_{i(hs)}, A_{i(fs)})^{T}. The two selective advantages for a dam are independent.
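The identity that the full-sib family mean equals ½(A_{i(hs)} + A_{i(fs)}) follows because the dam's breeding value is the mate-group mean plus A_{i(fs)}. A minimal numerical check, with hypothetical breeding values for one sire mated to d = 3 dams:

```python
def mate_group_advantages(a_sire, a_dams):
    """Selective advantages for one sire and its d mates.

    Returns (A_hs, A_fs_list): A_hs is the sire's breeding value plus the
    mean breeding value of its mates; each A_fs is a dam's breeding value
    expressed as a deviation from that mate-group mean."""
    if not a_dams:
        raise ValueError("a sire needs at least one mate")
    mate_mean = sum(a_dams) / len(a_dams)
    a_hs = a_sire + mate_mean
    a_fs = [a - mate_mean for a in a_dams]
    return a_hs, a_fs


a_sire, a_dams = 1.2, [0.4, -0.2, 0.1]  # hypothetical breeding values
a_hs, a_fs = mate_group_advantages(a_sire, a_dams)
for a_dam, dev in zip(a_dams, a_fs):
    # Full-sib family mean (sire + dam) / 2 equals (A_hs + A_fs) / 2
    print(round((a_sire + a_dam) / 2, 6), round((a_hs + dev) / 2, 6))
```

By construction the A_{i(fs)} values within a mate group sum to zero.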
Expected long-term genetic contributions were modeled following Woolliams et al. (1999) as
Step 1. Prediction of expected contributions: The prediction of expected genetic contributions is covered in detail by Woolliams et al. (1999); the current article only summarizes the procedure for a sib index, without derivation. Prediction of μ_{i(q)} requires the prediction of α = (α_{m}, α_{f})^{T} and
Step 2. Rates of inbreeding assuming Poisson variances: From step 1, μ_{i(m)} = 0.0250 + 0.0447A_{i(hs)}. The expected squared mean is a simple sum of squared terms:
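For a linear model μ = α + β^{T}s with selective advantages of mean zero, the cross term vanishes and E[μ^{2}] = α^{2} + β^{T}Cov(s)β, which is why the expected squared mean reduces to a sum of squared terms. The sketch below plugs in the sire coefficients from step 1 (note that 0.0250 = 1/(2 × 20), consistent with 20 sires jointly contributing one half). The variance of A_{i(hs)} among selected sires is not given in this section, so the value 0.5 is a placeholder, not a reported result.

```python
import numpy as np


def expected_squared_contribution(alpha, beta, cov_s):
    """E_s[mu^2] for the linear model mu = alpha + beta^T s.

    Assumes the selective advantages s have mean zero, so the cross term
    2 * alpha * beta^T E[s] vanishes and E[mu^2] = alpha^2 + beta^T Cov(s) beta."""
    beta = np.atleast_1d(np.asarray(beta, dtype=float))
    cov_s = np.atleast_2d(np.asarray(cov_s, dtype=float))
    return float(alpha ** 2 + beta @ cov_s @ beta)


# Sire coefficients from step 1; Var(A_hs) = 0.5 is a placeholder value
m2 = expected_squared_contribution(0.0250, [0.0447], [[0.5]])
print(round(m2, 6))  # 0.001624
```

For dams, beta and cov_s would be length-2 and 2 × 2, with a diagonal covariance because the two dam advantages are independent.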
The terms arising from
The rate of inbreeding ignoring deviations from Poisson variances is predicted from
Step 3. Correction for deviations of V_{n} from Poisson variances: Deviations from Poisson variances can be accounted for by correcting the rate of inbreeding using Equation 28, where δ_{q} = α^{T}V_{n(q),dev}α and V_{n(q),dev} is the 2 × 2 matrix of (co)variances of the number of selected offspring of a parent of sex q (q = m, f), expressed as deviations from independent Poisson variances. The calculation of the deviation from Poisson family variance for fixed numbers of selection candidates per full-sib family is described in appendix d. The approach adopted was derived in detail by Burrows (1984), although extension to two sexes was required, and the method was made more flexible by incorporating results from Mendell and Elston (1974). Applying the method to the example gives
General fit: Extensive simulations were carried out assuming an infinitesimal model with factorial combinations of X_{m} = 20, 40, 80; d = 1, 2, 3 (and 5 for X_{m} = 20, 40); total offspring of 4, 8, and 16 per full-sib family, equally divided between sexes; and h^{2} = 0.1, 0.2, 0.4, and 0.6. The index weights used were (1.0, 0.75, 0.5) for d > 1 [changed to (1.0, 0.75, 0.75) for d = 1] and (1.0, 1.5, 2.0) for d > 1 [changed to (1.0, 1.5, 1.5) for d = 1]. Classical weights were also examined, since these weights were the subject of the study of Wray et al. (1994), although they are suboptimal after the first round of selection from an unselected base population. Results have been tabulated and summarized by Woolliams and Bijma (1999).
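The factorial design above can be enumerated to see how many schemes it implies for each set of index weights. This is a sketch under two stated assumptions: X_{f} = d × X_{m} (inferred from the X_{m} = 20, X_{f} = 60 comparison at d = 3 given below), and the weight triples are read as (b_{1}, b_{2}, b_{3}). The dictionary keys and the "low"/"high" labels are hypothetical.

```python
from itertools import product


def simulated_schemes():
    """Factorial grid of simulated schemes (assumes X_f = d * X_m)."""
    schemes = []
    for x_m in (20, 40, 80):
        ds = (1, 2, 3, 5) if x_m in (20, 40) else (1, 2, 3)
        for d, n_fs, h2 in product(ds, (4, 8, 16), (0.1, 0.2, 0.4, 0.6)):
            schemes.append({"X_m": x_m, "X_f": d * x_m, "d": d,
                            "offspring_per_fs": n_fs, "h2": h2})
    return schemes


def index_weights(d, weight_set):
    """(b1, b2, b3) for the 'low' (1.0, 0.75, 0.5) or 'high' (1.0, 1.5, 2.0)
    set; with d = 1 the third weight is set equal to the second."""
    b1, b2, b3 = {"low": (1.0, 0.75, 0.5), "high": (1.0, 1.5, 2.0)}[weight_set]
    return (b1, b2, b3) if d > 1 else (b1, b2, b2)


print(len(simulated_schemes()))  # 132 schemes per weight set
print(index_weights(1, "high"))  # (1.0, 1.5, 1.5)
```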
With weights (1.0, 0.75, 0.5 or 0.75), the accuracy was excellent for all schemes, with all errors <4%. With weights (1.0, 1.5, 1.5 or 2.0), accuracy was also very good: predictions tracked the trends with changes in the parameters, and a large majority of errors were <2%, the exception being d = 3, h^{2} = 0.4, where underestimates of up to 8% were observed. The trends in rates of inbreeding were also accurately tracked with classical weights, with no increase in the magnitude of the errors, even though schemes had rates of inbreeding >0.03.
The most serious trend in the errors was a pattern of underprediction associated with high mating ratios and large family sizes (both of which increase the selection intensity) and with increased family weights. More surprisingly, the errors also increased with the number of parents at a constant d (i.e., X_{m} = 20, X_{f} = 60 compared with X_{m} = 80, X_{f} = 240); the errors were absent for h^{2} = 0.01 and increased sharply as h^{2} increased. To explore these errors further, the long-term contributions of selected males were plotted against A_{i(hs)} for the following schemes with d = 3 and weights (1.0, 1.5, 2.0): I, X_{m} = 20, h^{2} = 0.4, n_{o} = 16; II, X_{m} = 80, h^{2} = 0.4, n_{o} = 16; III, X_{m} = 80, h^{2} = 0.01, n_{o} = 16; and IV, X_{m} = 80, h^{2} = 0.4, n_{o} = 4. The simulated (S) and predicted (P) values of ΔF were as follows: I, S = 0.0231, P = 0.0220; II, S = 0.0070, P = 0.0058; III, S = 0.0028, P = 0.0029; IV, S = 0.0037, P = 0.0037. Note that scheme II is simply scheme I with four times the number of parents, so the expected long-term contributions in scheme I are four times larger than those in scheme II. The prediction of ΔF for scheme II is close to (but not precisely) ¼ of that for scheme I. However, the ratio of the simulated ΔF for scheme II to that for scheme I was closer to ⅓, i.e., much greater than would be expected from scaling. Serious prediction error occurs only for scheme II.
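The comparison of schemes I to IV can be made explicit by computing the relative prediction error (P − S)/S and the II-to-I ratios directly from the values reported above. This is only arithmetic on the published numbers; no new results are implied.

```python
# Simulated (S) and predicted (P) rates of inbreeding reported in the text
results = {"I": (0.0231, 0.0220), "II": (0.0070, 0.0058),
           "III": (0.0028, 0.0029), "IV": (0.0037, 0.0037)}


def relative_error(sim, pred):
    """Relative prediction error (P - S) / S; negative means underprediction."""
    return (pred - sim) / sim


for name, (s, p) in results.items():
    print(f"{name:>3}: {100 * relative_error(s, p):+.1f}%")
# I -4.8%, II -17.1%, III +3.6%, IV +0.0%

# Ratio of scheme II to scheme I: simulated (~1/3) versus predicted (~1/4)
print(round(results["II"][0] / results["I"][0], 3),
      round(results["II"][1] / results["I"][1], 3))  # 0.303 0.264
```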
Figure 1 shows that the accurate prediction with low h^{2} (scheme III) arises because the linear model is a good fit (i.e., the contributions follow a simple linear regression on the selective advantage), and similarly for low selection intensity (scheme IV). However, for the other two schemes the linear model predicts a substantial proportion of the selected males to have negative contributions, although rates of inbreeding are accurately predicted in one case (scheme I) but not in the other (scheme II).
Closer replicate-by-replicate analysis shows that, although the expected relationship is nonlinear, the substantially greater variance of contributions in scheme I (approximately proportional to ΔF/X_{m}) obscures the nonlinearity in the majority of replicates. When both linear and quadratic terms for the selective advantage were included in a regression model for observed contributions, the quadratic term was not statistically significant (defined here as P < 0.01) in >60% of the replicates for scheme I. In contrast, for scheme II, this percentage was <15%. Thus the accuracy of prediction depends on the goodness-of-fit of the linear model within a replicate, so more parents may promote greater proportional prediction errors, even though these errors will be associated with lower rates of inbreeding.
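The per-replicate test described above can be sketched as an ordinary least-squares regression of observed contributions on the selective advantage and its square, followed by a test of the quadratic coefficient. This is an illustrative sketch, not the authors' code: the P value uses a normal approximation to the t distribution (reasonable when many parents are selected), and the example data are synthetic, with a deterministic odd perturbation standing in for sampling noise.

```python
import math

import numpy as np


def quadratic_term_test(s, r):
    """Regress observed contributions r on (1, s, s^2) by least squares and
    return (quadratic coefficient, t statistic, two-sided P value).

    The P value uses a normal approximation to the t distribution."""
    s = np.asarray(s, dtype=float)
    r = np.asarray(r, dtype=float)
    x = np.column_stack([np.ones_like(s), s, s ** 2])
    coef, _, rank, _ = np.linalg.lstsq(x, r, rcond=None)
    df = r.size - 3
    if rank < 3 or df < 1:
        raise ValueError("need >= 4 observations with >= 3 distinct advantages")
    resid = r - x @ coef
    sigma2 = float(resid @ resid) / df
    se = math.sqrt(sigma2 * np.linalg.inv(x.T @ x)[2, 2])
    t = coef[2] / se
    return float(coef[2]), float(t), math.erfc(abs(t) / math.sqrt(2.0))


# Synthetic example: curved relationship plus a small odd perturbation
s = np.linspace(-2, 2, 40)
noise = 0.0005 * np.sin(7 * s)
c, t, p = quadratic_term_test(s, 0.02 + 0.01 * s + 0.005 * s ** 2 + noise)
print(round(c, 4), p < 0.01)  # 0.005 True
```

Applying such a test to each simulated replicate and counting how often P < 0.01 gives the kind of percentages quoted for schemes I and II.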
The pattern of the correction for deviations from the Poisson distribution of selected family sizes is worth noting. These corrections are negative for b_{2}, b_{3} < 1, decrease in magnitude as the index weights increase, and are generally positive for b_{2}, b_{3} > 1. For mass selection (b_{1} = b_{2} = b_{3} = 1), the correction is of the order of −1/(8T).
DISCUSSION
The theory described in this article provides a powerful tool for predicting rates of inbreeding in selected populations and for providing insights into the forces that contribute to the rate of loss of variation. The relationship of Wray and Thompson (1990) has been derived directly from consideration of identity by descent and has been extended to cover overlapping generations and nonrandom mating. Applicability was then advanced by showing how expected long-term contributions, which are predictable by general methods, can be used in place of observed long-term contributions to predict rates of inbreeding, provided that mating is random. Finally, the methods were applied to sib indices in discrete generations, for which the previous solutions were complex (Wray et al. 1994). In doing so, some insight was gained into the origin of the prediction errors, which appeared to arise from the goodness-of-fit of the models used to implement the theory rather than of those used to derive it.
Theory: The first theorem relating the rate of inbreeding in a population to the squared long-term contributions was previously derived by Wray and Thompson (1990), but the proof here has several useful extensions. In contrast to Wray and Thompson (1990), the proof uses identity by descent directly rather than properties of the numerator relationship matrix, and it also incorporates nonrandom mating and overlapping generations. The simplest relationship (
The importance of the relationship between rates of inbreeding and squared genetic contributions is that it holds for selected populations, with no assumptions on the form of selection, provided that (i) the genes are ultimately mixed and (ii) an equilibrium exists over which a stable ΔF may be defined. A further caveat is that the rate obtained applies to a neutral, unlinked gene. Other relationships do not always extend to predicting ΔF in selected populations. For example, the relationship Var(δq) = q(1 − q)ΔF, where q is the frequency of a neutral gene and δq is the change in frequency per unit time, will not hold if selection is not random, since it assumes mutual independence of δq over consecutive intervals. The increments δq are also correlated with overlapping generations, because the progeny of a single parent may be selected over many intervals. As a consequence, the justification for the proof by Hill (1979) of ΔF with overlapping generations is invalid, even in the absence of genetic selection, although the result is correct and agrees with the earlier proof of Hill (1972). Closer examination of Hill (1979) shows that its justification rests on an intuitive argument for the relationship that was later proved by Wray and Thompson (1990). Consequently, the methods derived here may be seen as a natural development of the results of Hill (1972, 1979) for selected populations.
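For contrast with the selected case, the relationship Var(δq) = q(1 − q)ΔF can be checked in the setting where it does hold: one generation of binomial (Wright-Fisher) sampling of 2N genes in an unselected population, for which ΔF = 1/(2N). The simulation below is a sketch of that idealized case only; it does not demonstrate the failure of the relationship under selection or with overlapping generations, and the parameter values are arbitrary.

```python
import numpy as np


def one_generation_drift_variance(q, n_diploid, reps=200_000, seed=1):
    """Empirical Var(dq) for one generation of binomial sampling of 2N genes
    from an unselected, randomly mating population."""
    rng = np.random.default_rng(seed)
    q_next = rng.binomial(2 * n_diploid, q, size=reps) / (2 * n_diploid)
    return float(np.var(q_next - q))


q, n = 0.3, 50
delta_f = 1 / (2 * n)  # rate of inbreeding of an idealized population
print(one_generation_drift_variance(q, n), q * (1 - q) * delta_f)  # both close to 0.0021
```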
The form of Equation 4 shows that the sum of squared long-term contributions for any given cohort may be usefully interpreted in the absence of an equilibrium. The sum of squared contributions for a cohort is the proportion of the new variation (the Mendelian sampling variance) arising within that cohort that is lost to the population in the long term. This includes all mutational variance arising in prior generations, since the choice of base is arbitrary. Therefore the sum of squared contributions of cohorts (particularly those still to converge!) is important irrespective of equilibrium, provides a meaningful measure of risk, and merits attention in both breeding and conservation schemes. The operational tools described by Grundy et al. (1998) are based upon controlling sums of squared contributions of cohorts and have meaning and validity beyond the infinitesimal model (e.g., Villanueva et al. 1999). However, there are clearly greater problems in providing deterministic predictive tools to analyze population dynamics if the assumption of equilibrium is removed, and those provided by Woolliams et al. (1999) assume this equilibrium.
The second, novel theorem derived in this article shows how the formulas with observed long-term contributions may be translated into formulas with expected long-term contributions. The latter are advantageous since they use predictable entities. The major change is that expected contributions can be substituted for observed ones, provided that the constant of proportionality is increased from ¼ to ½. The critical step in the proof is that the error variance of a long-term contribution given the selective advantage is related to the square of its mean, i.e., the coefficient of variation is relatively constant. Apart from the requirement of random mating, the scope of this proof is very broad, and it is applicable to overlapping generations. The validity of the derivation was checked using general sib indices as an example in discrete generations, and a companion article (Bijma et al. 2000) provides verification in overlapping generations for mass selection with lifetime selection, thereby removing a serious restriction of Nomura (1996). The limitation to random mating arises from Equation 17, although in one special case, partial full-sib mating with no selection, the analysis can be completed (using results of Ghai 1965) and shown to agree with the results of Caballero and Hill (1992). This provides an indirect verification of Equation 13 for nonrandom mating.
Woolliams et al. (1999) show how the expected long-term contribution may be calculated in general for different inheritance models (e.g., imprinted variation, maternal additive variation, or sex-linked variation) with different selection indices (sib indices or best linear unbiased predictors). Using long-term contributions follows the path of Wray and Thompson (1990) and Woolliams et al. (1993) and differs from Santiago and Caballero (1995; mass selection in discrete generations) and Nomura (1996; a special case of mass selection with overlapping generations), who base their predictions on genetic variation transmitted to descendants. This is because the approach using genetic variation cannot be sustained for general selection schemes. Santiago and Caballero (1995) suggest (their Equation 13) that the change in covariance between a general selective advantage and a neutral gene following selection is determined by the reduction in genetic variation. This is true for mass selection, where the selection index is solely a function of the total breeding value and residual error, but it will not be true in general (Woolliams et al. 1999). Bijma et al. (2000) show why the two approaches agree for mass selection in discrete generations and also why the current methods are required to cope with overlapping generations.
Prediction: Usable predictions were obtained by Wray et al. (1994) and an alternative form based upon Wray et al. (1994) was used by Villanueva and Woolliams (1997). However, the method of Wray et al. (1994) was complicated, although it attempted to model the expected proliferation of ancestral lines. The authors believe the proposed method is conceptually simpler than that of Wray et al. (1994) and is open to development.
In any attempt to obtain prediction formulas, a balance has to be struck between accuracy and simplicity. We have used simple linear models to interpret the theory. Thus in application the prediction consists of two elements: (i) the squared expected contribution and (ii) the deviation from independent Poisson families. The first of these elements was applied precisely as described by Woolliams et al. (1999), with corrections for finite numbers used only to obtain the sample variance of selective advantages. No other modifications were needed, because the other terms in the squared expected contribution were estimates of regression coefficients, which were assumed to be relatively robust to finite sampling. This assumption may be justified in part by the excellent agreement obtained by Woolliams et al. (1999) between simulations and deterministic predictions of expected long-term contributions. The second element, calculating the deviation from independent Poisson families, required only an extension of the method of Burrows (1984) to two sexes. The correlation coefficients among full sibs and half sibs used for calculating this element were those obtained assuming infinite numbers, but, to compensate for this, no reduction for finite samples was applied to the squared means.
The objective in choosing selective advantages is to use the minimum number needed to make the selection processes in different time periods conditionally independent. Using sib indices as an example, the authors considered two parameterizations. The first is the method presented, in which only breeding values were included as selective advantages. The second is an alternative in which the selective advantages were the half-sib mean and the deviation of the full-sib mean from the half-sib mean. The potential benefit of the alternative is that the environmental covariances in the index arising from the sib means are accounted for within the expected long-term contribution. Conditioning on the sib means is, however, more than is strictly necessary for conditional independence between generations. In practice, the alternative was as accurate in most cases (results not shown), but the underestimates examined in the results tended to be more severe. One reason is that terms included in the expected long-term contribution are modeled by linear functions, whereas modeling the environmental correlations by the method of Burrows (1984) allows part of the nonlinearity to be accounted for. Therefore, the more terms that are included linearly in the expected long-term contribution, the greater the errors arising from nonlinearity.
Nonlinear relationships between the selective advantage and long-term contributions occurred when high selection intensities were combined with moderate heritabilities, large numbers of parents, and high mating ratios. Results from including quadratic terms in the model for the expected long-term contribution (unpublished results) confirm that the serious prediction errors arise from the assumption of linearity rather than from Equation 29.
There are good reasons to believe that these departures from linearity should not be a major problem where the objective is to design effective breeding schemes.

First, on pragmatic grounds, the curvilinear relationship shown in Figure 1 suggests that 15% of selected males were being used with no expectation of a long-term contribution to the population. This percentage is even higher if the contributions are plotted against the observed half-sib mean. The resources used to keep and breed these animals are clearly wasted.

Second, in an ideal selection scheme, an ancestor's long-term contribution will be zero or, once its Mendelian sampling term is above a critical threshold, linearly related to that sampling term (Woolliams and Thompson 1994; Grundy et al. 1998). Consequently, in an ideal scheme the long-term contribution of a selected ancestor would be expected to be approximately linear in its breeding value.

This argument suggests that if the design objective is for a scheme to generate gain efficiently from the resources available, a linear model for the relationship between the long-term contribution and the selective advantage should prove sufficient. If so, improved deterministic models catering for the schemes with large prediction errors would not be needed. The view that these schemes are inefficient is supported by the results of Villanueva and Woolliams (1997). They showed that, with sib indices, efficient schemes had d ≤ 2, a range in which the methods presented here fitted well.
In conclusion, this article has:

(i) established a broader theorem than that of Wray and Thompson (1990) concerning the relationship between squared long-term genetic contributions and rates of inbreeding, in particular extending it to nonrandom mating and overlapping generations;

(ii) shown that, for random mating, the relationship can be generalized from observed long-term contributions to expected long-term contributions that can be predicted; and

(iii) shown how these equations might be interpreted with simple linear models when predicting rates of inbreeding with sib indices in discrete generations.

Together with those of Woolliams et al. (1999), these findings show how rates of inbreeding may be predicted in general populations with complex structures and genetic models.
Acknowledgments
J.A.W. gratefully acknowledges financial support from the Ministry of Agriculture, Fisheries and Food (United Kingdom), and the support and encouragement of Prof. A. Mäki-Tanila, who gave an opportunity for this work to be initiated. The contribution of P.B. was financially supported by the Netherlands Technology Foundation (STW) and coordinated by the Earth and Life Science Foundation (ALW).
APPENDIX A: THE EXPECTED MENDELIAN SAMPLING VARIANCE
The expected Mendelian sampling variance in generation 1, summed over all alleles in the founders, can be calculated as follows. For the progeny of the founder i* carrying the allele, the gene frequency has mean ¼, i.e., half of the gene frequency in the carrier (½) plus half of that in its mate (0), with
At generation 2 and later, with true random mating, the Mendelian sampling variance will be reduced. For dioecious species this reduction will be delayed by a generation through nonrandom mating. In general, the expected variance in generation u > 1 is ¼(1 − ω)(1 − ΔF)^{u−1}, where ΔF is the rate of inbreeding among the parents.
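Both parts of this appendix can be checked numerically. For generation 1, the mean of ¼ follows from enumerating the two Mendelian outcomes. For u > 1, the stated expression can be evaluated directly.

This is a minimal sketch. It assumes the carrier founder is heterozygous and its mate does not carry the allele. The variance of 1/16 per allele that it returns is our own arithmetic under those assumptions, not a reproduction of the article's generation-1 equation. The function names are hypothetical.

```python
from fractions import Fraction

def progeny_frequency_moments():
    """Generation 1: a progeny of the carrier founder i* receives the allele
    from the carrier (frequency 1/2) with probability 1/2 and never from its
    mate (frequency 0). Returns the mean and variance of the progeny gene
    frequency about that mean."""
    half = Fraction(1, 2)
    # Progeny frequency = (paternal gamete + maternal gamete) / 2.
    outcomes = [(half, Fraction(1 + 0, 2)),   # allele transmitted
                (half, Fraction(0 + 0, 2))]   # allele not transmitted
    mean = sum(p * x for p, x in outcomes)
    var = sum(p * (x - mean) ** 2 for p, x in outcomes)
    return mean, var

def expected_ms_variance(u, omega, delta_f):
    """Generation u > 1: 1/4 (1 - omega) (1 - delta_f)^(u - 1), as given in
    the text; generation 1 is not covered by this expression."""
    if u < 2:
        raise ValueError("expression applies only to generations u > 1")
    return 0.25 * (1 - omega) * (1 - delta_f) ** (u - 1)

print(progeny_frequency_moments())  # (Fraction(1, 4), Fraction(1, 16))
```

The generation-u expression declines geometrically by a factor (1 − ΔF) per generation. Nonrandom mating enters only through the constant factor (1 − ω).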
APPENDIX B: EXTENSION OF THE PROOF RELATING EXPECTED CONTRIBUTIONS TO RATES OF INBREEDING TO INCLUDE OVERLAPPING GENERATIONS
Let X_{q} be the number of parents in category q and, for convenience, define a diagonal matrix N with diagonal elements X_{q}. The prediction for ΔF in overlapping generations is given by Equation 14. Let μ_{i(q)} = E[r_{i(q)}s_{i(q)}] and
Thus Equation 16 becomes
To obtain Equations 18 and 19, we need to define θ_{n,i(q)} = E[n_{i(q)}], with element p given by θ_{n,i(q),p}, and V_{n,i(q)} as the variance-covariance matrix for the elements of n_{i(q)}. To simplify the expressions, define γ as the vector with elements γ_{p} = E[r_{j(p)}s_{i(q)} | j(p) an offspring of i(q)] and η as the vector with elements η_{p} = Var[r_{j(p)}s_{i(q)} | j(p) an offspring of i(q)]. This results in
APPENDIX C: PREDICTION OF EXPECTED GENETIC CONTRIBUTIONS FOR SIB INDICES
Expected genetic contributions were calculated using equilibrium genetic parameters. The genetic parameters were obtained by iterating rounds of selection starting from an unselected base generation with additive genetic variation
Calculation of the expected long-term genetic contributions followed the methods of Woolliams et al. (1999). Briefly, these methods depend upon defining two regression models:

- the first describes the relative fitness of a parent as a linear function of its selective advantages;
- the second describes the relationship between the selective advantages of a selected offspring and those of its parent.

In discrete generations these models depend only upon the sex of the parent and the sex of the selected offspring. In overlapping generations they may also depend on age.
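The equilibrium genetic parameters used above were obtained by iterating rounds of selection from an unselected base generation. The sketch below illustrates that kind of iteration for the simplest case. It should not be read as the article's own calculation. It makes the following assumptions:

- Mass selection rather than the sib indices of the article.
- The infinitesimal model, with inbreeding ignored.
- The same selected proportion p in both sexes.
- The standard Bulmer recurrence for the reduction in additive variance.

The function names are hypothetical.

```python
from statistics import NormalDist

def selection_intensity(p):
    """Intensity i and variance-reduction factor k = i(i - x) for truncation
    selection of the top fraction p of a standard normal variable."""
    nd = NormalDist()
    x = nd.inv_cdf(1 - p)   # truncation point in standard units
    i = nd.pdf(x) / p       # mean of the selected group in standard units
    return i, i * (i - x)

def equilibrium_va(va0, ve, p, tol=1e-12, max_iter=10_000):
    """Iterate V_A(t+1) = 1/2 (1 - k rho_t^2) V_A(t) + 1/2 V_A(0) from the
    unselected base, where rho_t^2 = V_A(t) / (V_A(t) + V_E) is the squared
    accuracy of mass selection (a hypothetical stand-in for a sib index)."""
    _, k = selection_intensity(p)
    va = va0                # unselected base generation
    for _ in range(max_iter):
        rho2 = va / (va + ve)
        new_va = 0.5 * (1 - k * rho2) * va + 0.5 * va0
        if abs(new_va - va) < tol:
            return new_va
        va = new_va
    raise RuntimeError("iteration did not converge")
```

For p = 0.5 and a base heritability of 0.5, the additive variance settles a little below 80% of its base value. For a sib index, the accuracy term ρ² would be replaced by the squared correlation between index and breeding value implied by the current variances.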
For discrete generations, the values of α_{m} and α_{f} are simply (2X_{m})^{−1} and (2X_{f})^{−1}, respectively, so the only term that needs a more detailed description is β. β is a vector of three regression coefficients:

- the first (β_{1}) describes the regression of the long-term contribution of a selected male on its selective advantage A_{i(hs)};
- the remaining two (β_{2}, β_{3}) describe the regression of the long-term contribution of a selected female on its two selective advantages (A_{i(hs)}, A_{i(fs)}).

In the remainder of the appendix the selective advantages are indexed 1–3 in this order.
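The intercepts depend only on the numbers of selected parents. The check below uses the example sizes X_m = 20 and X_f = 60 from this appendix.

It assumes, as in the linear model, that selective advantages are expressed as deviations from the mean of the selected parents. The β terms then average to zero, and the intercepts account for each sex's full contribution of one-half. The helper name is hypothetical.

```python
from fractions import Fraction

def intercepts(x_m, x_f):
    """alpha_m = (2 X_m)^-1 and alpha_f = (2 X_f)^-1 in discrete generations."""
    return Fraction(1, 2 * x_m), Fraction(1, 2 * x_f)

x_m, x_f = 20, 60                  # numbers of selected sires and dams
alpha_m, alpha_f = intercepts(x_m, x_f)
# Each sex transmits half of the genes; with beta terms averaging to zero
# over selected parents, the intercepts sum to one across all parents.
total = x_m * alpha_m + x_f * alpha_f
print(alpha_m, alpha_f, total)     # 1/40 1/120 1
```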
β is derived from the formula of Woolliams et al. (1999), which has been simplified for application to discrete generations,
λ is a (2 × 3) matrix, where λ_{i1} is the regression coefficient for the relative fitness of a male parent on its selective advantage, and where λ_{i2}, λ_{i3} are the corresponding coefficients for the selective advantages of a female parent. When i = 1 the relative fitness is for having male offspring selected and i = 2 for having female offspring selected. These coefficients will depend on the index of selection used and the selection intensity. The coefficients are derived using Appendix A of Woolliams et al. (1999). The elements are
Π is a (3 × 3) matrix, with π_{ij} the regression coefficient of selective advantage i of a selected offspring on selective advantage j of its parent. This matrix describes exactly how the selection process in one generation is related to the same process in the next generation. The elements of Π are derived by standard selection theory (described in detail in Appendix B of Woolliams et al. 1999) and account for the effects of selection. Let z = ρ_{I}σ_{A}/σ_{I}; then the elements of Π are
Example. For X_{m} = 20, X_{f} = 60,
APPENDIX D: THE VARIANCES OF FAMILY SIZE AFTER SELECTION WHEN LITTER SIZES ARE CONSTANT
The variances of family size when litter sizes are constant are derived by combining results of Burrows (1984) and Mendell and Elston (1974), which extend and formalize results used by Woolliams et al. (1993). For simplicity, litters are assumed to have n males and n females, and there are T candidates for selection in each sex.

The basic approach uses factorial moments, i.e., E[n_{ij}(q)(n_{ij}(q) − 1)], where n_{ij}(q) is the number of offspring of sex q (q = m or f) selected from the full-sib family with sire i and dam j. This approach was described in detail by Burrows (1984). Because Burrows (1984) worked in the context of forestry, only a single sex was considered, so an extension to two sexes is necessary here. The approach of Burrows (1984) was preferred because it yields elegant formulas.
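The factorial-moment approach rests on the identity Var(n) = E[n(n − 1)] + E[n] − E[n]². The sketch below checks this identity on a hypothetical family-size distribution. The probabilities are invented for illustration, and the function names are hypothetical.

```python
from fractions import Fraction

def variance_from_factorial_moments(pmf):
    """Var(n) = E[n(n-1)] + E[n] - E[n]^2 for a family-size pmf {n: prob}."""
    mean = sum(p * n for n, p in pmf.items())
    fact2 = sum(p * n * (n - 1) for n, p in pmf.items())  # E[n(n-1)]
    return fact2 + mean - mean ** 2

def direct_variance(pmf):
    """Variance computed directly as E[(n - E[n])^2], for comparison."""
    mean = sum(p * n for n, p in pmf.items())
    return sum(p * (n - mean) ** 2 for n, p in pmf.items())

# Hypothetical distribution of the number selected from one family.
pmf = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 8), 3: Fraction(1, 8)}
print(variance_from_factorial_moments(pmf), direct_variance(pmf))  # 71/64 71/64
```

For Poisson family sizes E[n(n − 1)] = E[n]², so the variance equals the mean. The amount by which E[n(n − 1)] departs from E[n]² is therefore a natural measure of the deviation from the Poisson assumption used in the main text.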
Denote by n_{ij}(q) the number of offspring of sex q selected from the full-sib family of sire i and dam j, and by n_{i*}(q) the number selected from sire i (i.e., summed over all its mates). Note that the variance of family size can be simply expressed in terms of the factorial moments:
Burrows (1984) derived the asymptotic form (Burrows 1984, Equations 4–12),
Burrows (1984) also derived a result for the variance of half-sib family sizes. In this article only paternal half-sib families are considered,
To obtain the variances and covariances conditional upon the selective advantage, the regression model derived for the expected number of offspring selected is used (see Appendix C).
Thus, for a dam family,
Example. For X_{m} = 20, X_{f} = 60,
Applying the results of this appendix gives $V_{n(m),\mathrm{dev}} = \begin{pmatrix} 0.186 & 0.751 \\ 0.751 & -0.079 \end{pmatrix}$ and $V_{n(f),\mathrm{dev}} = \begin{pmatrix} 0.020 & 0.159 \\ 0.159 & -0.154 \end{pmatrix}$.
Footnotes

Communicating editor: R. G. Shaw
 Received March 30, 1999.
 Accepted December 6, 1999.
 Copyright © 2000 by the Genetics Society of America