Genetics, Vol. 153, 1009-1020, October 1999, Copyright © 1999

Expected Genetic Contributions and Their Impact on Gene Flow and Genetic Gain

J. A. Woolliams(a), P. Bijma(b), and B. Villanueva(c)
(a) Roslin Institute (Edinburgh), Roslin, Midlothian EH25 9PS, United Kingdom
(b) Animal Breeding and Genetics Group, Wageningen Institute of Animal Sciences, Wageningen Agricultural University, 6700 A4 Wageningen, The Netherlands
(c) Scottish Agricultural College, Edinburgh EH9 3JG, United Kingdom

Corresponding author: J. A. Woolliams, Roslin Institute (Edinburgh), Roslin, Midlothian EH25 9PS, United Kingdom. E-mail: john.woolliams@bbsrc.ac.uk

Communicating editor: R. G. SHAW


*  ABSTRACT

Long-term genetic contributions ($r_i$) measure lasting gene flow from an individual $i$. By accounting for linkage disequilibrium generated by selection both within and between breeding groups (categories), and assuming the infinitesimal model, a general formula was derived for the expected contribution of ancestor $i$ in category $q$ ($\mu_{i(q)}$), given its selective advantages ($\mathbf{s}_{i(q)}$). Results were applied to overlapping generations and to a variety of modes of inheritance and selection indices. Genetic gain was related to the covariance between $r_i$ and the Mendelian sampling deviation ($a_i$), thereby linking gain to pedigree development. When $\mathbf{s}_{i(q)}$ includes $a_i$, gain was related to $E[\mu_{i(q)}a_i]$, decomposing it into components attributable to within and between families, within each category, for each element of $\mathbf{s}_{i(q)}$. The formula for $\mu_{i(q)}$ was consistent with previous index theory for predicting gain in discrete generations. For overlapping generations, accurate predictions of gene flow were obtained among and within categories, in contrast to previous theory, which gave qualitative errors among categories and no predictions within them. The generation interval was defined as the period for which $\mu_{i(q)}$, summed over all ancestors born in that period, equaled 1. Predictive accuracy was supported by simulation results for gain and contributions with sib-indices, BLUP selection, and selection with imprinted variation.


SELECTION theory has not generally addressed how the number of descendants from an individual grows or declines over time in relation to properties of the population. This is perhaps surprising, because the development of the pedigree over generations provides the framework for the passage of genes through the population, forming the link between our understanding of individual genotypes and the way such genotypes influence the population. Such an understanding would answer questions such as the relative importance of individuals within a generation; where genetic change has arisen; how quickly the change generated has spread through the population; with what precision this change can be predicted; how genetic change is related to the loss of variation; and how genetic change in one generation relates to that in a subsequent generation. There is no general framework within which these questions can be answered, although some special cases have been investigated (e.g., VILLANUEVA et al. 1996; BIJMA and WOOLLIAMS 1999).

The objective of this study is to describe the expectations for the proliferation of genetic lines using the concept of genetic contributions. The linkage disequilibrium generated during selection changes the impact of selective advantages, and this must be accounted for to predict the flow of an individual's genes through a population over time. These changes affect the comparative gene flow of different breeding groups, or categories, and of different individuals within categories. The general development builds upon the pioneering work of WRAY and THOMPSON 1990 and the more recent studies of WOOLLIAMS et al. 1993 (mass selection), WRAY et al. 1994 (sib-indices), and WOOLLIAMS and THOMPSON 1994. First, the concept of genetic contributions is considered in relation to genetic gain, and a general formula for gain is proved. The expected genetic contribution of an individual to subsequent generations is then derived, and the relationship between the long-term genetic contribution and gain is used to show the consistency between the developed theory and classical theory (e.g., BULMER 1980). The concept of the generation interval is reevaluated as a natural extension of the contribution theory. Many of the detailed results are derived assuming an equilibrium. The uses of the developed formulae are shown in examples of selection applied to discrete generations using sib-indices, using best linear unbiased prediction (BLUP), with imprinted variation, and with overlapping generations.


*  MATERIALS AND METHODS

Definitions and basic notation:
Table 1 shows the notation for the principal parameters. The concept of genetic contributions was introduced by JAMES and MCBRIDE 1958 and was developed by WRAY and THOMPSON 1990 for the prediction of rates of inbreeding ($\Delta F$). Because the concept is fundamental to this article, the definition is restated. The genetic contribution of an ancestor $i$ born at time $u$ to an individual $j$ born at time $t$ ($>u$) is the proportion of the genes of $j$ that are expected to derive by descent from ancestor $i$. This differs from the definition used by WRAY and THOMPSON 1990, who multiplied this proportion by $X_m + X_f$ (where $X_m$ and $X_f$ are the numbers of male and female parents in a generation); as shown by WOOLLIAMS et al. 1993, a contribution is more usefully defined without this rescaling. It is also distinct from the numerator genetic relationship, which considers all shared genes, not only those shared by descent: full-sibs therefore make no genetic contribution to each other, although their genetic relationship is $>0$.


 
 
Table 1. The notational conventions for the principal parameters

The notation is defined to allow extensions to overlapping generations. Contributions are therefore defined within and between categories, where categories are defined by age and sex and, potentially, by breeding use (e.g., nucleus females and other females). Over its lifetime an individual moves through various categories. An initial objective is to show the relationship between contributions and rate of gain; for this there is no need to identify the category of an individual or what is happening to the different categories over time, and it is sufficient to consider the observed contribution, by whatever means it is achieved. However, to develop the concept of gene flow, which is important for understanding the dynamics of overlapping generations, the categories must be tracked. Therefore, to keep the notation minimal at each stage, the notation for contributions is developed progressively through the article, balancing consistency against simplicity.

The following notation is used initially: $r_{i,u}(j, t)$ is the contribution of ancestor $i$, born at time $u$, to individual $j$ born at time $t$; $r_{i,u}(t)$ is the mean contribution over the whole newborn cohort at time $t$ (i.e., one-half of the mean for newborn males plus one-half of the mean for newborn females). The long-term contribution of $i$ is $r_{i,u} = \lim_{t \to \infty} r_{i,u}(t)$; because there is less need to specify $u$ for long-term contributions, $r_i$ is used. $T_m$ males and $T_f$ females are scored in each cohort at random, and only scored individuals are candidates for breeding opportunities.

The populations are assumed to mix over time. With mixing, the contribution a particular ancestor makes to later-born individuals tends to a value that is the same for all individuals in later cohorts; i.e., for each $i$, the variance of $r_{i,u}(j, t)$ among $j$ tends to 0 as $t \to \infty$ (WRAY and THOMPSON 1990). This value is the long-term contribution $r_i$; it differs between individual ancestors, depending upon the lifetime breeding use of $i$, its breeding value, other selective advantages (both genetic and nongenetic), and chance. WRAY and THOMPSON 1990 and GRUNDY et al. 1998 describe in more detail the relationship between the long-term contribution and the numerator relationship matrix.
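The mixing property can be illustrated with a small simulation. This is a hypothetical sketch, not part of the original study: it assumes a monoecious population of constant size with random mating between two distinct parents and no selection, tracks the contribution of every base ancestor to every later individual through $r_i(j) = \tfrac12 r_i(\text{sire}) + \tfrac12 r_i(\text{dam})$, and records the variance among individuals $j$ of these contributions, averaged over ancestors. The function name `simulate_mixing` and all parameter values are illustrative.

```python
import random
import statistics


def simulate_mixing(n=20, n_gens=15, seed=1):
    """Return the per-generation mean (over base ancestors i) of the variance
    among individuals j of r_i(j, t), together with the final contributions.

    Hypothetical set-up: monoecious population of constant size n, random
    mating between two distinct parents, no selection.
    """
    rng = random.Random(seed)
    # contrib[j][i]: contribution of base ancestor i to current individual j
    contrib = [[1.0 if i == j else 0.0 for i in range(n)] for j in range(n)]
    history = []
    for _ in range(n_gens):
        offspring = []
        for _ in range(n):
            sire, dam = rng.sample(range(n), 2)
            # half of an offspring's genes come from each parent
            offspring.append([0.5 * a + 0.5 * b
                              for a, b in zip(contrib[sire], contrib[dam])])
        contrib = offspring
        mean_var = statistics.fmean(
            statistics.pvariance([row[i] for row in contrib]) for i in range(n))
        history.append(mean_var)
    return history, contrib


history, contrib = simulate_mixing()
# Estimated long-term contributions: mean over the latest cohort.
r = [statistics.fmean(row[i] for row in contrib) for i in range(len(contrib))]
print(f"variance among j: generation 1 = {history[0]:.2e}, "
      f"generation {len(history)} = {history[-1]:.2e}")
print(f"sum of long-term contributions = {sum(r):.6f}")
```

The variance among descendants shrinks roughly geometrically, so the contribution of each ancestor converges to a single value, its long-term contribution; these values sum to 1 over the base ancestors.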

The full development presented in this article assumes the infinitesimal model with negligible rates of inbreeding, because this satisfies the principal requirement for a period of equilibrium in the population structure. In this study, the Mendelian sampling term is the deviation of an individual's breeding value from the mean of its parents' breeding values, and the Mendelian sampling variance is the variance of these deviations.

Rates of gain:
The breeding value of an individual may be decomposed into a sum of independent terms involving the breeding values of the base generation and the Mendelian sampling terms of all other ancestors. This follows from two observations: (i) the breeding value of an individual $j$ born at time $t$ can be expressed as the average of its parental breeding values plus a deviation (the Mendelian sampling term) that is independent of the parental breeding values, i.e., $A_{j,t} = \tfrac{1}{2}A_{\text{sire}} + \tfrac{1}{2}A_{\text{dam}} + a_{j,t}$; and (ii) going backward through the pedigree, this substitution can be repeated for each generation of ancestors until the base generation is reached. The coefficients of the resulting terms are the genetic contributions of the ancestors to individual $j$ born at time $t$ (with $r_{j,t}(j, t) = 1$ for $j$ itself). Therefore,

$$A_{j,t} = \sum_{u=1}^{t}\sum_{i} r_{i,u}(j, t)\,a_{i,u} + \sum_{i} r_{i,0}(j, t)\,A_{i,0}.$$

The second term allows for the base population, which is not necessarily unselected; the parents of base individuals are assumed unknown, so all genetic information prior to $t = 0$ is contained in the base breeding values. Let $G_t$, the genetic merit of the population at time $t$, be the average of the breeding values of the newborn males and females, i.e., $G_t = \tfrac{1}{2}T_m^{-1}\sum_{j \in \text{males}} A_{j,t} + \tfrac{1}{2}T_f^{-1}\sum_{j \in \text{females}} A_{j,t}$; then $G_t = \sum_{u=1}^{t}\sum_i r_{i,u}(t)\,a_{i,u} + \sum_i r_{i,0}(t)\,A_{i,0}$. Because $E[a_{i,u}] = 0$, the cross-product $r_i a_i$ is related to the covariance between $r_i$ and $a_i$; thus sustained genetic gain is related to the creation of covariance between contributions and Mendelian sampling terms.
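The decomposition can be checked on a small example. In the sketch below the six-animal pedigree, the base breeding values, and the Mendelian sampling terms are all invented for illustration. Contributions are computed recursively (each individual contributes 1 to itself, and each parent passes on half of its own contributions), and summing contribution times Mendelian sampling term (or base breeding value) recovers the breeding value obtained directly from the parent-average recursion.

```python
# Hypothetical pedigree: individual -> (sire, dam); (None, None) marks base animals.
pedigree = {
    "A": (None, None), "B": (None, None), "C": (None, None),
    "D": ("A", "B"), "E": ("A", "C"),
    "F": ("D", "E"),
}
base_bv = {"A": 1.0, "B": -0.4, "C": 0.2}      # hypothetical A_{i,0}
mendelian = {"D": 0.3, "E": -0.1, "F": 0.05}   # hypothetical a_{i,u}


def breeding_value(j):
    """A_j = 1/2 A_sire + 1/2 A_dam + a_j, applied recursively."""
    sire, dam = pedigree[j]
    if sire is None:
        return base_bv[j]
    return 0.5 * breeding_value(sire) + 0.5 * breeding_value(dam) + mendelian[j]


def contributions(j):
    """Genetic contribution of every ancestor of j (including j itself) to j."""
    sire, dam = pedigree[j]
    r = {j: 1.0}
    if sire is None:
        return r
    for parent in (sire, dam):
        for anc, c in contributions(parent).items():
            r[anc] = r.get(anc, 0.0) + 0.5 * c
    return r


def decomposed_bv(j):
    """Sum of contributions times Mendelian sampling terms or base values."""
    total = 0.0
    for anc, c in contributions(j).items():
        total += c * (base_bv[anc] if anc in base_bv else mendelian[anc])
    return total


print(breeding_value("F"), decomposed_bv("F"), contributions("F"))
```

Both routes give 0.6 for F up to floating-point rounding, and the contributions of the base animals to F (0.5, 0.25, 0.25) sum to 1.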

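Let the gain made by selection in cohort $t$ be $\Delta G_t = G_{t+1} - G_t$, and let $\Delta r_{i,u}(t) = r_{i,u}(t + 1) - r_{i,u}(t)$, with $r_{i,t+1}(t) = 0$ because $i$ is not yet born at time $t$; then

$$\Delta G_t = \sum_{u=1}^{t+1}\sum_i \Delta r_{i,u}(t)\,a_{i,u} + \sum_i \Delta r_{i,0}(t)\,A_{i,0}. \tag{1}$$

Because the population is assumed to mix, $\Delta r_{i,u}(t) \to 0$ as $t \to \infty$, so $\Delta r_{i,u}(t)\,a_{i,u} \to 0$ as $t \to \infty$ for a fixed $u$; in particular, the base population terms in Equation 1 tend to 0. Therefore, for large $t$, summing over males ($i(m)$) and females ($i(f)$) separately and taking expectations,

$$E[\Delta G_t] = T_m\sum_{u=1}^{t+1} E[\Delta r_{i(m),u}(t)\,a_{i(m),u}] + T_f\sum_{u=1}^{t+1} E[\Delta r_{i(f),u}(t)\,a_{i(f),u}]. \tag{2}$$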

If an equilibrium is approached (as will be the case with the infinitesimal model when inbreeding is ignored), the expected change in covariance between $r_i$ and $a_i$ will depend only on $t - u$, i.e., on the time elapsed since the ancestor's birth, and not on the actual time of birth. So $E[\Delta r_{i(q),u}(t)\,a_{i(q),u}] = E[\Delta r_{i(q),u+\delta t}(t + \delta t)\,a_{i(q),u+\delta t}]$.

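After making these substitutions, $E[\Delta G_t]$ may be expressed as a sum of changes in the contributions of individual ancestors born at a single time $u$; the sum telescopes because $r_{i,u}(u - 1) = 0$, i.e.,

$$E[\Delta G_t] = \sum_{q \in \{m,f\}} T_q \sum_{\delta=-1}^{t-1} E[\Delta r_{i(q),u}(u + \delta)\,a_{i(q),u}] = \sum_{q \in \{m,f\}} T_q\,E[r_{i(q),u}(u + t)\,a_{i(q),u}].$$

For $u$ and $t$ large enough, the right-hand side approaches its equilibrium value, $\sum_q T_q E[r_{i(q),u}\,a_{i(q),u}]$. Therefore, for a sufficiently large $t$, $E[\Delta G_t] = E[\Delta G_{eq}]$, and substituting these results into Equation 2 gives

$$E[\Delta G_{eq}] = T_m E[r_{i(m)}\,a_{i(m)}] + T_f E[r_{i(f)}\,a_{i(f)}], \tag{3}$$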

or, equivalently, $E[\Delta G_{eq}] = T_m\,\mathrm{cov}(r_{i(m)}, a_{i(m)}) + T_f\,\mathrm{cov}(r_{i(f)}, a_{i(f)})$. An equivalent expression to Equation 3 can be given as a continuous function of time (available from the authors).

Comparison of Equation 3 with other expressions of gain:
The traditional formula for quantitative genetic gain expresses gain as the product of selection intensity ($\iota$), accuracy ($\rho$), and genetic standard deviation ($\sigma_A$), defined in a single generation. Equation 3 makes explicit that (i) genetic gain must arise from "good" ancestors contributing more genes; (ii) this process of contributing genes spans more than a single generation; (iii) sustained gain depends on utilizing the new variation, i.e., the Mendelian sampling variation, entering the population each generation; and (iv) quantitatively, the covariance of $r_i$ with $a_i$ gives a complete description of the process involved in items (i)–(iii).

The traditional expression for gain may be the most tractable form for calculation in most schemes, but this will not necessarily always be the case, e.g., with quadratic indices as described by MEUWISSEN 1997 and GRUNDY et al. 1998. It is shown that the formulae developed in the next sections, when used in Equation 3, give rates of gain precisely equivalent to the traditional expression for important cases. The main outcome of Equation 3 is therefore that the rate of gain is connected to the pedigree, a connection that is not apparent in $\iota\rho\sigma_A$. Equation 3 is useful for decomposing achieved gain, but its usefulness for prediction is limited because $r_i$ can only be observed after the event. It is therefore necessary to develop expectations for $r_i$.

Framework for general solution:
As described above, one reason for deriving expected long-term contributions is to exploit the relationship between long-term contributions and rates of gain by replacing the observed $r_i$. There are other, perhaps more important, reasons: first, expected contributions are needed to predict rates of inbreeding ($\Delta F$) in selected populations through the relationship between $\Delta F$ and the sum of squared contributions (WRAY and THOMPSON 1990; WOOLLIAMS et al. 1993); second, expected long-term contributions represent the expected gene flow in the population, and in complex population structures (with overlapping generations and breeding pyramids) this information is essential for scheme design. To develop expected contributions the notation must be modified slightly. In particular, breeding categories (i.e., ages and sexes) must be explicit, so $i(q)$ denotes an ancestor in category $q$.

The expected long-term contribution of individual $i(q)$ is defined conditional on a vector of $n_s$ selective advantages, $\mathbf{s}_{i(q)}$, i.e., $\mu_{i(q)} = E[r_{i(q)} \mid \mathbf{s}_{i(q)}]$. The selective advantages are measured as deviations, $\mathbf{s}_{i(q)} - \bar{\mathbf{s}}_q$, from the average $\bar{\mathbf{s}}_q$ of the selected contemporaries in category $q$. They influence the success of the offspring and/or the selection of subsequent descendants. For example, the estimated breeding value (EBV) of an ancestor at the time its offspring are selected will influence the number of offspring selected and will play a role in the number of grand-offspring selected; in contrast, the prediction error of that EBV will not influence the selection of offspring but will influence the selection of grand-offspring. The conditional expectation expresses the expected contribution as a function of the selective advantages. If a linear model is assumed for the conditional expectation, then $\mu_{i(q)} = \alpha_q + \boldsymbol{\beta}_q^{T}(\mathbf{s}_{i(q)} - \bar{\mathbf{s}}_q)$. If an equilibrium is assumed, the coefficients $\alpha_q$ and $\boldsymbol{\beta}_q$ will not change over generations, and the same coefficients can be used for both the ancestor and its selected offspring. The expected lifetime long-term contribution of an individual $i$ is the sum of its expected long-term contributions over all categories to which $i$ belonged during its lifetime.

The objective of the following section is to define a set of achievable steps for deriving formulae for $\alpha_q$ and $\boldsymbol{\beta}_q$, so that expected contributions can be obtained even in complex breeding schemes. The starting point is that the long-term contribution of individual $i$ is given by

$$r_{i(q)} = \tfrac{1}{2}\sum_{j \in \text{male offspring}} r_{j} + \tfrac{1}{2}\sum_{j \in \text{female offspring}} r_{j}, \tag{4}$$

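where the sums are taken over its male and female offspring. Because unselected offspring have no long-term contribution, these sums may be restricted to the selected offspring. Taking expectations conditional on $\mathbf{s}_{i(q)}$ and summing over the categories $p$ in which offspring are selected,

$$\mu_{i(q)} = \tfrac{1}{2}\sum_{p=1}^{n_c} E[\text{number of selected offspring in } p \mid \mathbf{s}_{i(q)}]\; E[r_{j(p)} \mid \mathbf{s}_{i(q)}]. \tag{5}$$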

Let the population have $n_c$ categories that describe sex, age, and breeding purpose. Discrete generations are a special case with only two categories, males and females. Initially, $\mathbf{s}_{i(q)}$ is assumed to be a single variable ($n_s = 1$), namely the breeding value $A_{i(q)}$, as assumed for mass and sib-index selection by WOOLLIAMS et al. 1993 and WRAY et al. 1994. In this situation $\beta_q$ is a scalar, and the expected long-term contribution of individual $i$ in category $q$ can be written $\mu_{i(q)} = \alpha_q + \beta_q(A_{i(q)} - \bar{A}_q)$.

The solutions are obtained in four steps: (i) for overlapping generations only, determine the gene flow from the parents (sic) in previous periods to selected individuals in the current period; (ii) regress the expected number of selected offspring of a parent on its selective advantage(s), with regression coefficients $\lambda_{pq}$ forming an $n_c \times n_c$ matrix $\boldsymbol{\Lambda}$; (iii) regress the selective advantage(s) of a selected offspring on those of its parent, with coefficients $\pi_{pq}$ forming an $n_c \times n_c$ matrix $\boldsymbol{\Pi}$; and (iv) from these steps, calculate the vectors $\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \ldots, \alpha_{n_c})^{T}$ and $\boldsymbol{\beta} = (\beta_1, \beta_2, \ldots, \beta_{n_c})^{T}$, both of dimension $n_c \times 1$.

Step 1, defining the gene flow matrix $\mathbf{G}$: The concept of gene flow (HILL 1974) is used, but Hill's development does not account for the inheritance of selective advantage, which is critical under selection. Because of this selective advantage, the probability that the parent of a selected individual in category $p$ comes from category $q$ depends on the selection intensity in category $p$ and on the selective advantage of category $q$ over the other categories contributing candidates for category $p$. If category $q$ has a selective advantage over the other categories, its offspring will have increasing success as selection becomes more intense. Consider an example in which dams of age 1 have a higher genetic merit than those of age 2, and the two ages contribute equally to a group of newborn individuals. If selection within this newborn group is at random, those chosen are expected to come equally from 1- and 2-yr-old females; but if there is selection in this group, offspring of 1-yr-old females would be expected to be favored.
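The effect described in this example can be quantified under simple assumptions. The sketch below assumes, hypothetically, that the offspring of the two dam ages are equally numerous, have normally distributed phenotypes with unit within-group standard deviation, and differ in mean by `mean_diff`. It truncation-selects the top fraction of the pooled candidates and returns the share of selected offspring that descend from the better (age-1) dams; the function name and values are illustrative only.

```python
from statistics import NormalDist


def share_from_group1(mean_diff, prop_selected, tol=1e-12):
    """Fraction of selected candidates that are offspring of the better group.

    Hypothetical set-up: two equally numerous groups of candidates (offspring
    of age-1 and age-2 dams) with normal phenotypes, within-group SD 1, and
    group means differing by mean_diff. The top prop_selected of the pooled
    candidates are selected by truncation.
    """
    better = NormalDist(mu=mean_diff / 2)
    worse = NormalDist(mu=-mean_diff / 2)

    def above(x):
        # proportion of the pooled candidates with phenotype above x
        return 0.5 * (1 - better.cdf(x)) + 0.5 * (1 - worse.cdf(x))

    lo, hi = -10.0, 10.0
    while hi - lo > tol:          # bisection for the truncation point
        mid = 0.5 * (lo + hi)
        if above(mid) > prop_selected:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    return 0.5 * (1 - better.cdf(x)) / above(x)


for p in (1.0, 0.5, 0.1, 0.01):
    print(p, round(share_from_group1(0.5, p), 3))
```

With no selection (proportion 1) each dam age supplies half of the selected offspring, and the share from the better group rises steadily as selection becomes more intense, which is why the rows of $\mathbf{G}$ differ from those of the newborn gene flow matrix.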

In the standard gene flow matrix (HILL 1974), the key elements are $g_{0,pq}$, the proportions of genes in the newborn cohort from which category $p$ will be selected (at some time in their life) that arise from category $q$ parents. To obtain the expected long-term contributions a modified matrix $\mathbf{G}$ (of dimension $n_c \times n_c$) is required, in which each row represents a category of selected (rather than newborn) individuals and the elements $g_{pq}$ of each row are the proportions of genes in the selected individuals transferred through breeding from parents in the different categories $q$. With discrete generations and the standard two pathways, $\mathbf{G} = \begin{pmatrix} \tfrac12 & \tfrac12 \\ \tfrac12 & \tfrac12 \end{pmatrix}$ always. Deterministic procedures to obtain $\mathbf{G}$ are described in detail by BIJMA and WOOLLIAMS 1999 and briefly in the application concerning overlapping generations in this article.

Step 2, defining and deriving $\boldsymbol{\Lambda}$: A regression model is required for the expected number of offspring of a parent in category $q$ that are selected to breed in category $p$ (the expected selection score) on the breeding value of that parent. With random selection the proportion of the $X_p$ selected in category $p$ that are expected to have category $q$ parents is $2g_{0,pq}$, and these are divided equally among the $X_q$ parents in category $q$. In this case the expected selection score for a parent in category $q$ is simply the constant $2X_p g_{0,pq} X_q^{-1}$ and does not depend on $A_{i(q)}$. With selection, Appendix A shows that this expectation has the form $2X_p g_{pq} X_q^{-1}[1 + \lambda_{pq}(A_{i(q)} - \bar{A}_q)]$. The elements $\lambda_{pq}$ form an $n_c \times n_c$ matrix $\boldsymbol{\Lambda}$. For mass selection $\lambda_{pq} = \tfrac{1}{2}\iota_p\sigma_P^{-1}$, where $\iota_p$ is the intensity of selection in category $p$ and $\sigma_P$ is the phenotypic standard deviation; the factor $\tfrac12$ arises because an offspring inherits half of its parent's breeding value.
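The mass-selection value of $\lambda_{pq}$ can be checked by simulation. In this sketch (an illustration, not the Appendix A derivation) candidates have phenotype $P = \tfrac12 A_{\text{sire}} + \tfrac12 A_{\text{dam}} + a + e$ with hypothetical variances chosen so that $\sigma_P = 1$, the top `prop_selected` are selected, and the linear regression of the selection indicator on the sire's breeding value is divided by the proportion selected. Under normality this relative regression should be close to $\iota/(2\sigma_P)$.

```python
import random
from statistics import NormalDist


def lambda_mass_selection(prop_selected=0.2, var_a=0.5, var_e=0.5,
                          n=200_000, seed=7):
    """Monte Carlo estimate of lambda for mass selection: the regression of
    an offspring's selection indicator (1 = selected) on its sire's breeding
    value, divided by the proportion selected."""
    rng = random.Random(seed)
    sd_a, sd_ms, sd_e = var_a ** 0.5, (0.5 * var_a) ** 0.5, var_e ** 0.5
    sires, phenos = [], []
    for _ in range(n):
        a_sire = rng.gauss(0.0, sd_a)
        a_dam = rng.gauss(0.0, sd_a)
        # offspring BV = parent average + Mendelian sampling term
        bv = 0.5 * a_sire + 0.5 * a_dam + rng.gauss(0.0, sd_ms)
        sires.append(a_sire)
        phenos.append(bv + rng.gauss(0.0, sd_e))
    n_sel = int(prop_selected * n)
    cutoff = sorted(phenos, reverse=True)[n_sel - 1]
    selected = [1.0 if y >= cutoff else 0.0 for y in phenos]
    mean_a = sum(sires) / n
    p_hat = sum(selected) / n
    cov = sum((a - mean_a) * (s - p_hat) for a, s in zip(sires, selected)) / n
    var = sum((a - mean_a) ** 2 for a in sires) / n
    return cov / var / p_hat


# With var_a = var_e = 0.5 the phenotypic variance of candidates is 1.
nd = NormalDist()
x = nd.inv_cdf(1 - 0.2)
iota = nd.pdf(x) / 0.2
estimate = lambda_mass_selection()
theory = iota / (2 * 1.0)
print(f"simulated lambda = {estimate:.3f}, iota/(2 sigma_P) = {theory:.3f}")
```

The simulated value agrees with $\iota/(2\sigma_P)$ to within sampling error; the exact derivation rests on the standard result that truncation on $P$ shifts the mean of any correlated normal variable by $\iota$ times its covariance with $P$ divided by $\sigma_P$.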

Step 3, defining and deriving $\boldsymbol{\Pi}$: A second regression model is required, of the breeding value of the selected offspring on the breeding value of the parent. In principle these regressions, too, depend on the categories of both offspring and parent, giving an $n_c \times n_c$ matrix $\boldsymbol{\Pi}$, with $\pi_{pq}$ the coefficient for offspring category $p$ and parent category $q$. Thus $E[A_{j(p)} - \bar{A}_p \mid A_{i(q)}] = \pi_{pq}(A_{i(q)} - \bar{A}_q)$. Appendix B gives a general derivation of $\boldsymbol{\Pi}$ that is used in all the applications. For mass selection with only the breeding value conferring selective advantage, $\pi_{pq} = \tfrac{1}{2}(1 - k_p h^2)$, where $k_p$ is the variance reduction coefficient for selection in category $p$ and $h^2$ is the heritability in the candidates.

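Step 4, solutions: Equation 5 is used with (i) the breeding value replacing $\mathbf{s}_{i(q)}$ as the selective advantage; (ii) $E[\text{number of selected offspring} \mid A_{i(q)}]$ replaced by $2X_p g_{pq} X_q^{-1}[1 + \lambda_{pq}(A_{i(q)} - \bar{A}_q)]$; (iii) the assumption of equilibrium justifying the use of the same $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ for parent and offspring; and (iv) $(A_{j(p)} - \bar{A}_p)$ in $E[r_{j(p)} \mid \mathbf{s}_{i(q)}]$ replaced by $\pi_{pq}(A_{i(q)} - \bar{A}_q)$. Collecting separately the terms independent of $A_{i(q)}$ and those linearly dependent on $A_{i(q)}$ gives

$$\alpha_q = \sum_{p=1}^{n_c} X_p g_{pq} X_q^{-1}\,\alpha_p, \tag{6a}$$

$$\beta_q = \sum_{p=1}^{n_c} X_p g_{pq} X_q^{-1}\,(\pi_{pq}\beta_p + \lambda_{pq}\alpha_p). \tag{6b}$$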

Quadratic terms in $(A_{i(q)} - \bar{A}_q)$ have been neglected; this is addressed in the DISCUSSION. If $\mathbf{N}$ is the diagonal matrix with elements $X_p$, the matrix forms of Equations 6a and 6b are

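$$\mathbf{N}\boldsymbol{\alpha} = \mathbf{G}^{T}\mathbf{N}\boldsymbol{\alpha}, \tag{7a}$$

$$\mathbf{N}\boldsymbol{\beta} = (\mathbf{G}\otimes\boldsymbol{\Pi})^{T}\mathbf{N}\boldsymbol{\beta} + (\mathbf{G}\otimes\boldsymbol{\Lambda})^{T}\mathbf{N}\boldsymbol{\alpha}, \tag{7b}$$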

where $\otimes$ denotes element-by-element multiplication of the matrices.

Therefore $\mathbf{N}\boldsymbol{\alpha}$ is a right eigenvector of $\mathbf{G}^{T}$ with eigenvalue 1 (an eigenvalue of 1 exists because all rows of $\mathbf{G}$ sum to 1). This defines $\boldsymbol{\alpha}$ only up to a scalar. Let $L$ be the generation interval, defined as the period of time for the population to renew itself. Then (i) over its lifetime, a single cohort has a total long-term contribution of $\sum_p X_p\alpha_p$, and so $L\sum_p X_p\alpha_p = 1$; and (ii) the average age at which the long-term contributions are made is $L = (\sum_p X_p\alpha_p)^{-1}\sum_p X_p\alpha_p\,\mathrm{age}(p)$, where $\mathrm{age}(p)$ is the age of individuals in category $p$. Combining these two formulae gives the constraint $\sum_p X_p\alpha_p\,\mathrm{age}(p) = 1$, which is sufficient to define $\boldsymbol{\alpha}$ uniquely, and $L = (\sum_p X_p\alpha_p)^{-1}$. For discrete generations with the standard two pathways, $\boldsymbol{\alpha} = \tfrac{1}{2}(X_m^{-1}, X_f^{-1})^{T}$ and $L = 1$ always.
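A small sketch of how $\boldsymbol{\alpha}$ and $L$ can be computed from $\mathbf{G}$, the numbers selected $X_p$, and the ages: power iteration finds the eigenvector of $\mathbf{G}^{T}$ with eigenvalue 1, and the age constraint fixes its scale. Power iteration is a choice made here for a dependency-free example and assumes $\mathbf{G}$ is primitive (some power has all entries positive); the four-category $\mathbf{G}$ and the numbers selected in the usage lines are hypothetical.

```python
def solve_alpha(G, X, age, max_iter=10_000, tol=1e-14):
    """Solve N alpha = G^T N alpha subject to sum_p X_p alpha_p age(p) = 1.

    G[p][q]: proportion of genes in selected category p from category q parents
    X[p]:    number selected in category p
    age[p]:  age of individuals in category p
    Returns (alpha, L), where L = 1 / sum_p X_p alpha_p is the generation interval.
    """
    n = len(X)
    v = [1.0 / n] * n  # v plays the role of N alpha, up to a scale factor
    for _ in range(max_iter):
        # (G^T v)_q = sum_p G[p][q] * v[p]
        new_v = [sum(G[p][q] * v[p] for p in range(n)) for q in range(n)]
        converged = max(abs(a - b) for a, b in zip(new_v, v)) < tol
        v = new_v
        if converged:
            break
    # Fix the scale with the age constraint sum_p X_p alpha_p age(p) = 1.
    scale = sum(v[p] * age[p] for p in range(n))
    v = [x / scale for x in v]
    alpha = [v[p] / X[p] for p in range(n)]
    return alpha, 1.0 / sum(v)


# Discrete generations, standard two pathways (males, females).
alpha_d, L_d = solve_alpha([[0.5, 0.5], [0.5, 0.5]], X=[10, 100], age=[1, 1])

# Hypothetical overlapping generations: males age 1, males age 2,
# females age 1, females age 2 (illustrative rows, each summing to 1).
G_ov = [[0.30, 0.20, 0.30, 0.20],
        [0.25, 0.25, 0.25, 0.25],
        [0.30, 0.20, 0.30, 0.20],
        [0.25, 0.25, 0.25, 0.25]]
alpha_o, L_o = solve_alpha(G_ov, X=[5, 5, 40, 40], age=[1, 2, 1, 2])
print(alpha_d, L_d)
print(alpha_o, L_o)
```

For the discrete case this returns $\boldsymbol\alpha = (0.05, 0.005)^T$, i.e., $\tfrac12(X_m^{-1}, X_f^{-1})^T$, with $L = 1$; for the hypothetical overlapping scheme the generation interval works out to $13/9 \approx 1.44$ periods.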

The vector $\mathbf{N}\boldsymbol{\beta}$ is completely determined once $\mathbf{G}$, $\boldsymbol{\Pi}$, $\boldsymbol{\Lambda}$, and $\boldsymbol{\alpha}$ are defined. In the simple case of a single category, as may occur in a monoecious population with $X$ parents, all the terms become scalars, $\alpha = X^{-1}$, and $\beta = (1 - \pi)^{-1}\lambda\alpha$. With more than one category, the $g_{pq}$ act as weighting factors across the categories for the different values of $\pi_{pq}$ and $\lambda_{pq}$.
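Equation 7b is a linear system in $\mathbf{N}\boldsymbol\beta$, so once $\boldsymbol\alpha$ is known it can be solved directly. The sketch below uses fixed-point iteration, a choice made here for simplicity, which assumes that the spectral radius of $\mathbf{G}\otimes\boldsymbol\Pi$ is below 1. The function name `solve_beta` and the monoecious values in the usage lines are hypothetical; the usage checks that the general solver reproduces the single-category result $\beta = (1 - \pi)^{-1}\lambda\alpha$.

```python
def solve_beta(G, Pi, Lam, X, alpha, max_iter=10_000, tol=1e-14):
    """Solve Equation 7b,
        N beta = (G o Pi)^T N beta + (G o Lam)^T N alpha,
    by fixed-point iteration, where o is element-by-element multiplication.
    Assumes the spectral radius of (G o Pi) is below 1 so the iteration converges.
    """
    n = len(X)
    # constant part: (G o Lam)^T N alpha
    const = [sum(G[p][q] * Lam[p][q] * X[p] * alpha[p] for p in range(n))
             for q in range(n)]
    y = [0.0] * n  # y = N beta
    for _ in range(max_iter):
        new_y = [const[q] + sum(G[p][q] * Pi[p][q] * y[p] for p in range(n))
                 for q in range(n)]
        converged = max(abs(a - b) for a, b in zip(new_y, y)) < tol
        y = new_y
        if converged:
            break
    return [y[q] / X[q] for q in range(n)]


# Single category (monoecious), hypothetical values.
X_mono, pi, lam = 20, 0.4, 0.7
beta = solve_beta([[1.0]], [[pi]], [[lam]], [X_mono], [1.0 / X_mono])
print(beta[0], lam * (1.0 / X_mono) / (1 - pi))
```

The same function accepts any number of categories, so it can be combined with an $\boldsymbol\alpha$ obtained from Equation 7a for overlapping generations.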

Extension to multiple variables (s):
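With multiple variables conferring selective advantage ($n_s > 1$), $\mu_{i(q)} = \alpha_q + \boldsymbol{\beta}_q^{T}(\mathbf{s}_{i(q)} - \bar{\mathbf{s}}_q)$. $\boldsymbol{\alpha}$ remains a vector of length $n_c$, but $\boldsymbol{\beta}$ is a vector of length $n_c n_s$ of the form $(\boldsymbol{\beta}_1^{T}, \boldsymbol{\beta}_2^{T}, \ldots, \boldsymbol{\beta}_{n_c}^{T})^{T}$. Each element $\lambda_{pq}$ becomes a $1 \times n_s$ submatrix $\boldsymbol{\lambda}_{pq}$, and each element $\pi_{pq}$ becomes an $n_s \times n_s$ submatrix $\boldsymbol{\pi}_{pq}$. The matrix $\boldsymbol{\Lambda}$ is of order $n_c \times n_c n_s$, and $\boldsymbol{\Pi}$ is of order $n_c n_s \times n_c n_s$. The solution for $\boldsymbol{\alpha}$ is unchanged (Equations 6a and 7a). To obtain the equation analogous to (6b), let $s_{j(p(v))}$ and $s_{i(q(w))}$ denote variables $v$ and $w$ in $\mathbf{s}_{j(p)}$ and $\mathbf{s}_{i(q)}$, respectively, with $1 \le v, w \le n_s$, and let $\lambda_{pq(w)}$ and $\pi_{pq(vw)}$ be the corresponding elements of $\boldsymbol{\lambda}_{pq}$ and $\boldsymbol{\pi}_{pq}$ (the latter being the partial regression of $s_{j(p(v))}$ on $s_{i(q(w))}$). Then

$$\beta_{q(w)} = \sum_{p=1}^{n_c} X_p g_{pq} X_q^{-1}\Big(\sum_{v=1}^{n_s}\pi_{pq(vw)}\,\beta_{p(v)} + \lambda_{pq(w)}\,\alpha_p\Big). \tag{8}$$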

The matrix forms of the equations for multiple variables in $\mathbf{s}_{i(q)}$ (not shown) are the same as in (7), but with (i) the definition of $\otimes$ extended to mean multiplication of the submatrices $\boldsymbol{\pi}_{pq}$ and $\boldsymbol{\lambda}_{pq}$ by the element $g_{pq}$; and (ii) $\mathbf{N}\boldsymbol{\beta}$ in (7b) replaced by $\mathbf{N}\otimes\boldsymbol{\beta}$, i.e., each subvector $\boldsymbol{\beta}_q$ multiplied by $X_q$.

A further refinement of {alpha}:
This section is not essential to the overall development, but it can prove important for good approximation in complex structures and it is used in RESULTS. The section describes an improvement in the estimation of {alpha}, which corresponds to a second-order approximation.

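The $g_{pq}$ account for the different selective advantages among the categories of parents at the time of selection, but these advantages or disadvantages are partly inherited by the selected offspring. Allowing for this, Equation 6a becomes $\alpha_q = \sum_{p=1}^{n_c} X_p g_{pq} X_q^{-1}(\alpha_p + \boldsymbol{\beta}_p^{T}\mathbf{d}_{pq})$, where $\mathbf{d}_{pq} = E[\mathbf{s}_{j(p)} \mid \text{category } q \text{ parent}] - \bar{\mathbf{s}}_p$. After rearranging terms in Equations 6a and 6b,

$$\mathbf{N}\boldsymbol{\alpha} = \Big\{\mathbf{G}^{T} + (\mathbf{G}\otimes\mathbf{D})^{T}\big[\mathbf{I} - (\mathbf{G}\otimes\boldsymbol{\Pi})^{T}\big]^{-1}(\mathbf{G}\otimes\boldsymbol{\Lambda})^{T}\Big\}\mathbf{N}\boldsymbol{\alpha}, \tag{9}$$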

where $\mathbf{D}$ has dimension $n_c n_s \times n_c$, with submatrix $pq$ equal to $\mathbf{d}_{pq}$. Although $\mathbf{N}\boldsymbol{\alpha}$ is still defined as a right eigenvector of a matrix with eigenvalue 1, the matrix is now more complex. The constraint that defines $\boldsymbol{\alpha}$ uniquely is unchanged. When generations are discrete, with the standard two-pathway model, $\mathbf{D} = \mathbf{0}$.

Expected long-term contributions and rates of gain:
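For any individual $i$, the total long-term contribution is the sum of its long-term contributions as it moves through the different categories over its lifetime, i.e., $r_i = \sum_{q=1}^{n_c} r_{i(q)}$. Define $S_{i(q)} = 1$ if $i$ is selected in category $q$, and 0 otherwise; then

$$r_i = \sum_{q=1}^{n_c} S_{i(q)}\,r_{i(q)} = \sum_{q=1}^{n_c} S_{i(q)}\,\mu_{i(q)} + \sum_{q=1}^{n_c} S_{i(q)}\,(r_{i(q)} - \mu_{i(q)}).$$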

When the expected long-term contribution is expressed in terms of the components of the breeding value, in particular the Mendelian sampling term, it is sufficient for predicting genetic gain, because the remaining part, $r_{i(q)} - \mu_{i(q)}$, has no covariance with the Mendelian sampling term. Within a category $q$, the sum of $S_{i(q)}$ over all candidates is $X_q$, and so applying Equation 3 gives

$$E[\Delta G_{eq}] = \sum_{q=1}^{n_c} X_q\,E[\mu_{i(q)}\,a_{i(q)}], \tag{10}$$

where now the expectations are conditional on being selected as a parent rather than unconditional as was the case in Equation 3. Equation 10 is expressed solely in terms of the selected individuals and in terms that are predictable rather than simply observed.

If $\mu_{i(q)} = \alpha_q + \boldsymbol{\beta}_q^{T}(\mathbf{s}_{i(q)} - \bar{\mathbf{s}}_q)$, Equation 10 immediately decomposes the gain into two components. The first, $\sum_{q=1}^{n_c} X_q\alpha_q E[a_{i(q)}]$, is the expected gain from selection within families, which occurs at the time the ancestor is selected. The second, $\sum_{q=1}^{n_c} X_q\boldsymbol{\beta}_q^{T}E[(\mathbf{s}_{i(q)} - \bar{\mathbf{s}}_q)\,a_{i(q)}]$, is the expected between-family gain, and describes the changes in contribution of selected ancestors from the time of their selection until convergence in the long term. Because the between-family gain is defined explicitly in terms of the selective advantages, the gain can be decomposed into components arising from each category and from each selective advantage within categories.
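As a check on this decomposition, the sketch below treats mass selection with discrete generations under the infinitesimal model. It uses the mass-selection regressions $\lambda_{pq} = \iota_p/(2\sigma_P)$ and $\pi_{pq} = \tfrac12(1 - k_p h^2)$, $\alpha_q = (2X_q)^{-1}$, and standard index-theory results for the selected parents, $E[a_{i(q)}] = \iota_q\sigma_a^2/\sigma_P$ and $E[(A_{i(q)} - \bar{A}_q)a_{i(q)}] \approx \sigma_a^2(1 - k_q h^2)$ (without the $1 - X_q^{-1}$ adjustment), where $\sigma_a^2 = \tfrac12\sigma_{A_0}^2$ is the Mendelian sampling variance. At the Bulmer equilibrium the within- and between-family components should add up to the classical gain $\tfrac12(\iota_m + \iota_f)\sigma_A^2/\sigma_P$. The proportions selected, numbers selected, and base variances are hypothetical.

```python
from statistics import NormalDist


def truncation(p):
    """Intensity i and variance-reduction coefficient k for selecting
    the top proportion p of a normal distribution."""
    nd = NormalDist()
    x = nd.inv_cdf(1.0 - p)       # truncation point
    i = nd.pdf(x) / p
    return i, i * (i - x)


def gain_components(p_m, p_f, X_m, X_f, var_a0=0.3, var_e=0.7):
    """Within- and between-family gain from Equation 10 for mass selection,
    and the classical equilibrium gain for comparison."""
    sexes = ("m", "f")
    i_k = {"m": truncation(p_m), "f": truncation(p_f)}
    i = {q: i_k[q][0] for q in sexes}
    k = {q: i_k[q][1] for q in sexes}
    X = {"m": X_m, "f": X_f}
    # Bulmer equilibrium additive variance (infinitesimal model, no inbreeding)
    var_a = var_a0
    for _ in range(500):
        h2 = var_a / (var_a + var_e)
        var_a = 0.5 * var_a0 + 0.25 * var_a * ((1 - k["m"] * h2) + (1 - k["f"] * h2))
    sd_p = (var_a + var_e) ** 0.5
    h2 = var_a / sd_p ** 2
    var_ms = 0.5 * var_a0                               # Mendelian sampling variance
    alpha = {q: 0.5 / X[q] for q in sexes}
    lam = {p: i[p] / (2 * sd_p) for p in sexes}         # lambda_pq, same for all q
    pi = {p: 0.5 * (1 - k[p] * h2) for p in sexes}      # pi_pq, same for all q
    # Equation 7b with g_pq = 1/2, iterated on y_q = X_q beta_q
    y = {q: 0.0 for q in sexes}
    for _ in range(500):
        y = {q: sum(0.5 * (pi[p] * y[p] + lam[p] * X[p] * alpha[p]) for p in sexes)
             for q in sexes}
    within = sum(X[q] * alpha[q] * i[q] * var_ms / sd_p for q in sexes)
    between = sum(y[q] * var_ms * (1 - k[q] * h2) for q in sexes)
    classical = 0.5 * (i["m"] + i["f"]) * var_a / sd_p
    return within, between, classical


w, b, c = gain_components(p_m=0.05, p_f=0.5, X_m=10, X_f=100)
print(f"within = {w:.4f}, between = {b:.4f}, sum = {w + b:.4f}, classical = {c:.4f}")
```

The two components sum to the classical prediction, illustrating the consistency with index theory claimed for discrete generations, while also showing how much of the gain is realized within and between families.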

The covariance between the Mendelian sampling term $a_{i(q)}$ and $(\mathbf{s}_{i(q)} - \bar{\mathbf{s}}_q)$ following the selection of the ancestor can be calculated using standard index theory. Because this is a covariance with the deviation from a sample mean, an adjustment by the factor $(1 - X_q^{-1})$ should increase precision. For simplicity this adjustment has not been applied in the results presented, but the predicted increase in precision can be confirmed from the results shown.

Development of contributions over time:
This section is not essential to the overall development but describes the solution to an important application of gene flow. In complex population structures it is often useful to predict how quickly improvement in one part of the population diffuses through to other parts of the population or what proportion of the gene flow arises from particular pathways (e.g., by male descent alone). This requires methods to predict the rate of convergence of genetic contributions over time.

To simplify the notation, the development of contributions over time is given for the single selective advantage, the breeding value $A$. It is assumed that the population is already in equilibrium at $t = 0$. For category $q$, a selected individual at time 0 has a vector (of dimension $n_c \times 1$) of contributions to selected individuals in category $p$ at time $t$ given by $c_q(p, t) + b_q(p, t)(A_{i(q)} - \bar{A}_q)$. This has a form similar to that of the long-term contribution, but before convergence it differs between categories $p$ and so must be defined for each $p$. Let $\mathbf{c}_q(t) = (c_q(1, t), c_q(2, t), \ldots, c_q(n_c, t))^{T}$ and $\mathbf{b}_q(t) = (b_q(1, t), b_q(2, t), \ldots, b_q(n_c, t))^{T}$. Then $\mathbf{c}_q(0) = \mathbf{0}$ except for $X_q^{-1}$ in the $q$th position, and $\mathbf{b}_q(0) = \mathbf{0}$. A further vector of regressions, $\mathbf{f}_q(t)$, is required, whose $p$th element is the regression of the breeding value of the selected individual in category $p$ at time $t$ on the breeding value of an ancestor in category $q$. By definition $\mathbf{f}_q(0) = \mathbf{0}$ except for the $q$th position, where it is 1.

It is critical to note that the contributions at time $t$ to the selected individuals in category $p$, of age $\mathrm{age}(p)$, depend on the consequences of selection upon the parental gene pool at time $t - \mathrm{age}(p)$: the more intense the selection, the more the parent categories with greater selective advantages will dominate. In a selection scheme, a group of newborn individuals will typically be subject to different selection intensities as they become older. Therefore, the complete spectrum of contributions among the selected individuals in the different categories at time $t$ depends on states back to $t - \mathrm{maxage}$, where maxage is the maximum age of parents in the breeding scheme. Define $\mathbf{G}_p$ to be the $n_c \times n_c$ matrix of zeros except for the single row corresponding to category $p$, which is identical to the $p$th row of $\mathbf{G}$. Then

(11a)


(11b)


(11c)

Equation 11a describes the contribution of category q to each category at each time t, with element p of the sum describing the contributions of category q ancestors (at time t = 0) to category p parents at time t, accounting for the selection in category p through the matrix Gp. Equation 11b describes the relationship of contributions from ancestors within category q (at time t = 0) to each category at each time t to the selective advantage; this arises from two processes, the first, analogous to (11a), from the transfer of differential contributions among ancestors of category q that were accumulated up to and including time t - 1, and the second from further differential contributions from selective advantages among the candidates at time t due to ancestors in category q at time t = 0. Equation 11c describes the changes in the selective advantages among the candidates at time t due to ancestors of category q at time t = 0.

When $t$ becomes large, the mixing assumption ensures that $\mathbf{c}_q(t)$ and $\mathbf{b}_q(t)$ converge to vectors with all elements equal, namely $\alpha_q\mathbf{1}$ and $\beta_q\mathbf{1}$, respectively, where $\mathbf{1} = (1, \ldots, 1)^{T}$. Furthermore, $\mathbf{f}_q(t) \to \mathbf{0}$ because the eigenvalues of $\mathbf{G}\otimes\boldsymbol{\Pi}$ lie strictly between $-1$ and 1, reflecting the diminishing effect of ancestors over time on the selective advantage of their descendants.

By redefining the state vector at time $t$ to include not only $\mathbf{c}_q(t)$ but also $\mathbf{c}_q(t - 1), \ldots, \mathbf{c}_q(t - \mathrm{maxage} + 1)$, Equation 11a can be reformulated (results not shown) so that the state vector at time $t$ is the product of a square stochastic matrix of order $n_c \times \mathrm{maxage}$ and the state vector at time $t - 1$. Using this reformulation and the properties of stochastic matrices (described in Appendix 1 of HILL 1974), it can be demonstrated that Equations 11a, 11b, and 11c are consistent with Equations 7a and 7b and the constraint $\sum_p X_p\alpha_p\,\mathrm{age}(p) = 1$ (results not shown).

The discrete-time contributions with the refinement in estimating $\boldsymbol{\alpha}$ are given in Appendix C. An example of application is given in RESULTS.


*  APPLICATION OF MODELS AND RESULTS

Expected long-term contributions and genetic gain for general sib-indices in discrete generations:
WRAY et al. (1994) studied a general sib-index of the form $I = b_1(P - F) + b_2(F - H) + b_3H$. Here I is the index, P is the phenotype of the candidate, F is the mean of the full-sib family (size $n_F$) including the candidate, and H is the mean of the half-sib family (size $n_H$) including the candidate and its full-sibs. Mass selection is the special case $b_1 = b_2 = b_3 = 1$. For simplicity, the only selective advantage considered in this example, $s_{i(q)}$, is the breeding value $A_{i(q)}$. Other environmental influences that are often considered (e.g., litter effects) are omitted, and random mating is assumed. For discrete generations there are just two categories, males (q = m) and females (q = f). In an unselected base generation the phenotypic variance ($\sigma_P^2$) is 1 and the additive genetic variance is $h_0^2$. The notation is summarized in Table 1.

The required regression models are derived from Appendices A and B: $\lambda_{pq} = \iota_p\tau_q(2\sigma_I)^{-1}$ and $\pi_{pq} = \tfrac{1}{2}(1 - k_p\tau_q\rho\sigma_A\sigma_I^{-1})$. Here $\tau_m = b_3$, $\tau_f = b_2(1 - X_mX_f^{-1}) + b_3X_mX_f^{-1}$, and $\tau = \tfrac{1}{2}(\tau_m + \tau_f)$. The $\tau_q$ values were used by WRAY et al. (1994) and are twice the regression of the index of the candidate on the breeding value of its parent of sex q. $\sigma_I^2$ is the variance of the index, and $\rho$ is the accuracy of the index.
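A minimal sketch of the two regressions for a sib-index follows, taking $\pi_{pq} = \tfrac{1}{2}(1 - k_p\tau_q\rho\sigma_A\sigma_I^{-1})$. This is the form that reproduces the mass-selection values quoted in step 6 of the overlapping-generation example. The helper names `tau_values` and `regressions` are hypothetical. The check uses mass selection ($I = P$, $\rho = \sigma_A/\sigma_P$) with $\sigma_A^2 = 0.370$, $\sigma_P^2 = 0.970$, $\iota_p = 1.647$, and $k_p = 0.817$, all taken from that example.

```python
import math


def tau_values(b2, b3, n_males, n_females):
    """tau_m and tau_f for I = b1(P - F) + b2(F - H) + b3*H, as defined in the text."""
    ratio = n_males / n_females
    tau_m = b3
    tau_f = b2 * (1.0 - ratio) + b3 * ratio
    return tau_m, tau_f


def regressions(iota_p, k_p, tau_q, rho, sigma_A, sigma_I):
    """lambda_pq (number of selected offspring on parental advantage) and
    pi_pq (advantage of a selected offspring on that of its parent)."""
    lam = iota_p * tau_q / (2.0 * sigma_I)
    pi = 0.5 * (1.0 - k_p * tau_q * rho * sigma_A / sigma_I)
    return lam, pi


# Mass selection (b1 = b2 = b3 = 1): tau_m = tau_f = 1, sigma_I = sigma_P.
var_A, var_P = 0.370, 0.970
sigma_A, sigma_P = math.sqrt(var_A), math.sqrt(var_P)
tau_m, tau_f = tau_values(1.0, 1.0, n_males=20, n_females=40)
lam, pi = regressions(1.647, 0.817, tau_m, sigma_A / sigma_P, sigma_A, sigma_P)
print(round(lam, 3), round(pi, 3))  # 0.836 0.344
```

For mass selection $\rho\sigma_A\sigma_I^{-1} = h^2$, so the general expressions reduce to $\pi_{pq} = 0.5(1 - k_ph^2)$ and $\lambda_{pq} = 0.5\iota_p\sigma_P^{-1}$.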

After simplification of Equations 7a and 7b (see WOOLLIAMS et al. 1999 for further details),

(12)

where $\kappa = k\tau + \tfrac{1}{8}(\tau_m - \tau_f)(k_m - k_f)$ and $z = \rho\sigma_A$. This form is nearly equivalent to that given by WRAY et al. (1994), although their derivation proceeded along different (and more complex) lines. Three points of difference should be noted. First, WRAY et al. (1994) do not include the small term $\tfrac{1}{8}(\tau_m - \tau_f)(k_m - k_f)$ in $\kappa$, which arises when both the selection intensity and the regression on the parental breeding value differ between the sexes. Second, the indices of WRAY et al. (1994) are explicitly scaled so that the regression of the breeding value of the candidate on its index is 1 (i.e., $\rho\sigma_A\sigma_I^{-1} = 1$). Scaling does not change $\tau_q\sigma_I^{-1}$, however, so $\alpha$ and $\beta$ are unaffected by it. Finally, in this article predictions at equilibrium are obtained using equilibrium parameters.

Rate of gain from sib-indices: The decomposition of the rate of gain is achieved using Equation 10 and standard index theory. Within-family gain is given by

because $\alpha_q = \tfrac{1}{2}X_q^{-1}$ and $E[a_{i(q)} \mid i \text{ selected}] = \tfrac{1}{2}h_0^2\iota_q\tau_w\sigma_I^{-1}$. Here $\tau_w$ is the regression of the index I on $a_{i(q)}$, with $\tau_w = b_1(1 - n_F^{-1}) + b_2(n_F^{-1} - n_H^{-1}) + b_3n_H^{-1}$. The total between-family gain is given by

because $\mathrm{cov}(a_{i(q)}, A_{i(q)}) = \tfrac{1}{2}h_0^2(1 - k_q\tau_wz\sigma_I^{-1})$ for the selected individuals in category q.
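The within- and between-family terms rest on two moments of the Mendelian sampling term among selected candidates. The sketch below assumes a Mendelian sampling variance of $\tfrac{1}{2}h_0^2$ (inbreeding ignored) and truncation selection on I with intensity $\iota_q$. Under standard index theory this gives $E[a \mid \text{selected}] = \iota_q\tau_w\mathrm{var}(a)\sigma_I^{-1}$ and $\mathrm{cov}(a, A) = \mathrm{var}(a)(1 - k_q\tau_wz\sigma_I^{-1})$. The function names are hypothetical, and the numbers are the mass-selection values from the overlapping-generation example.

```python
import math


def tau_w(b1, b2, b3, n_F, n_H):
    """Regression of the index I on the candidate's Mendelian sampling term."""
    return b1 * (1 - 1 / n_F) + b2 * (1 / n_F - 1 / n_H) + b3 / n_H


def mendelian_moments(h0_sq, iota_q, k_q, tw, z, sigma_I):
    """Mean of a and cov(a, A) among selected candidates.
    Assumes Mendelian sampling variance 0.5 * h0_sq (inbreeding ignored)."""
    var_a = 0.5 * h0_sq
    mean_a = iota_q * tw * var_a / sigma_I
    cov_aA = var_a * (1 - k_q * tw * z / sigma_I)
    return mean_a, cov_aA


# Mass selection: every b equals 1, so tau_w = 1 whatever the family sizes.
tw = tau_w(1.0, 1.0, 1.0, n_F=8, n_H=32)
var_A, var_P = 0.370, 0.970
z = var_A / math.sqrt(var_P)          # rho * sigma_A with rho = sigma_A / sigma_P
mean_a, cov_aA = mendelian_moments(0.4, 1.647, 0.817, tw, z, math.sqrt(var_P))
print(tw, round(mean_a, 4), round(cov_aA, 4))
```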

The total gain, summed over both sexes, including both between- and within-family gain is, after simplification,

(13)

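This uses the result $k_m\tau_m + k_f\tau_f = \tfrac{1}{2}(k_m + k_f)(\tau_m + \tau_f) + \tfrac{1}{2}(k_m - k_f)(\tau_m - \tau_f) = 2k\tau + \tfrac{1}{2}(k_m - k_f)(\tau_m - \tau_f)$, where k and $\tau$ are the averages of the male and female values.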
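A quick numerical check of the identity follows, taking k and $\tau$ as the averages over the sexes (an assumption about their definitions). The three expressions agree for arbitrary $k_m, k_f, \tau_m, \tau_f$. The helper name `sides` is hypothetical.

```python
import random


def sides(k_m, k_f, t_m, t_f):
    """Left side, middle expansion and right side of the identity."""
    k = 0.5 * (k_m + k_f)   # average variance-reduction coefficient (assumed)
    t = 0.5 * (t_m + t_f)   # average tau (assumed)
    lhs = k_m * t_m + k_f * t_f
    middle = 0.5 * (k_m + k_f) * (t_m + t_f) + 0.5 * (k_m - k_f) * (t_m - t_f)
    rhs = 2 * k * t + 0.5 * (k_m - k_f) * (t_m - t_f)
    return lhs, middle, rhs


rng = random.Random(1)
for _ in range(3):
    print(sides(*(rng.uniform(0, 2) for _ in range(4))))
```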

Consistency with other approaches: Equation 13 for the equilibrium rate of gain, $\Delta G_{eq}$, can be compared with the standard formula $\Delta G = \iota\rho\sigma_A = \iota z$. Equation 13 comes from considering the gain achieved from a single cohort over all subsequent generations. The standard formula instead comes from considering the gain achieved by all previous generations over a single cohort. At equilibrium the two forms must be equal, and equating them results in a quadratic equation for z:

(14)

Equation 14 can also be obtained as an equilibrium condition when using standard index theory with $\sigma_A^2 = \tfrac{1}{2}h_0^2 + \tfrac{1}{4}\sigma_A^2(1 - k_m\rho^2) + \tfrac{1}{4}\sigma_A^2(1 - k_f\rho^2)$ and $\mathrm{cov}(A, I) = \rho\sigma_A\sigma_I$.

This demonstrates consistency between the methods presented in this article (in particular those detailed in Appendices A and B) and the results of classical index theory for discrete generations. Appendix B neglects the second-order correction for the Bulmer effect when deriving $\pi_{pq}$, i.e., the correction of the genetic variance of the selected parents for selection among their offspring. That same neglect is therefore also implicit in standard index theory.

Equation 14 can give reasonable estimates of equilibrium gain for indices even when unselected base parameters are used, because many of its terms are constant over time. To use Equation 14, only the base generation value of $\sigma_I$ is required to solve the quadratic equation for z, and gain is then estimated by $\iota z$. Using (14) to obtain z gives underestimates, whereas using base parameters and ignoring linkage disequilibrium gives overestimates. The errors from (14) are, however, much smaller in magnitude (WOOLLIAMS et al. 1999). Estimates from (14) are not precise because they assume that $\sigma_I$ is constant. Further improvements to Equation 14 would require an iterative scheme in combination with $\sigma_I^2 = \sigma_{I,0}^2 - \tfrac{1}{4}z^2\{[b_2^2(1 - X_mX_f^{-1}) + b_3^2X_mX_f^{-1}](k + k_f) + b_3^2(k + k_m)\}$, where $\sigma_{I,0}^2$ is the index variance in the unselected base. The demonstrated consistency with standard index theory shows that this leads to the same result as the usual procedures for deriving equilibrium gain, which iterate on the index accuracy and the genetic variance among the parents.
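For mass selection, the simplest sib-index ($b_1 = b_2 = b_3 = 1$), the classical iterative route can be sketched directly. The sketch makes the following assumptions:

- the base phenotypic variance is 1 and inbreeding is ignored;
- $\iota$ and $k$ are fixed;
- the variance follows the Bulmer recursion $\sigma_A^2 = \tfrac{1}{2}h_0^2 + \tfrac{1}{4}\sigma_A^2(1 - k_m\rho^2) + \tfrac{1}{4}\sigma_A^2(1 - k_f\rho^2)$ with $\rho^2 = h^2$.

It illustrates the usual procedure and is not an implementation of Equation 14. The function name is hypothetical.

```python
import math


def equilibrium_mass_selection(h0_sq, iota_m, iota_f, k_m, k_f, tol=1e-12, max_iter=10_000):
    """Iterate the genetic variance to its Bulmer equilibrium for mass selection
    in discrete generations (base phenotypic variance 1, inbreeding ignored)."""
    var_E = 1.0 - h0_sq
    var_A = h0_sq
    for _ in range(max_iter):
        rho_sq = var_A / (var_A + var_E)      # accuracy^2 of mass selection = h^2
        new = (0.5 * h0_sq
               + 0.25 * var_A * (1 - k_m * rho_sq)
               + 0.25 * var_A * (1 - k_f * rho_sq))
        converged = abs(new - var_A) < tol
        var_A = new
        if converged:
            break
    sigma_P = math.sqrt(var_A + var_E)
    z = var_A / sigma_P                       # z = rho * sigma_A
    delta_G = 0.5 * (iota_m + iota_f) * z     # Delta G = iota * z, iota averaged over sexes
    return var_A, z, delta_G


var_A, z, dG = equilibrium_mass_selection(0.4, 1.647, 1.647, 0.817, 0.817)
print(round(var_A, 4), round(dG, 4))
```

The equilibrium gain is below the base-parameter value $\iota h_0^2$ because the Bulmer effect reduces the genetic variance among the selected parents.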

Expected long-term contributions for best linear unbiased predictors:
The analysis of individual long-term contributions can be extended to BLUP evaluation and to indices derived from it. With sib-indices, $s_{i(q)}$ was simply the breeding value $A_{i(q)}$. This is the only means by which a parent may influence its offspring over multiple generations (in the absence of common environmental effects, etc.). With BLUP, different approaches to the form of $s_{i(q)}$ can be taken. WOOLLIAMS et al. (1999) used three terms for individual i in category q:

- $\hat{A}_{i(q)}$, the "initial EBV" at the point of selection of i;
- $\delta\hat{A}_{i(q)}$, the "increment" in the EBV at the point of selection of its offspring;
- $\hat{e}_{i(q)}$, the remaining "prediction error" of the parent at the selection of its offspring.

Selection of i itself is determined by $\hat{A}_{i(q)}$. Selection of its offspring is influenced by $\hat{A}_{i(q)}$ and $\delta\hat{A}_{i(q)}$, and selection of grand-offspring and subsequent generations is influenced by all three. Using the methods described here, WOOLLIAMS et al. (1999) obtained excellent predictions of expected contributions and genetic gain. They showed that under BLUP the primary source of between-family selection among ancestors is the increment in the EBV between the ancestor's own selection and that of its offspring. The initial EBV played the least important role.

Extensions to other inheritance modes in the absence of allelic interactions:
Extensions of the model to other inheritance modes, such as additive maternal effects or X-linked variation, are made by defining the variables in $s_{i(q)}$ and their impact on $\lambda_{pq}$ and $\pi_{pq}$. As an example, results are given for maternally imprinted variation. Here the passage of genes from parent to offspring follows normal Mendelian inheritance, but only the alleles passed to the offspring by the dam are expressed and affect the phenotype. For maternal imprinting, the breeding value can be split into two parts: the "expressed" breeding value ($A^+$) inherited from the dam, and the "latent" breeding value ($A^-$) inherited from the sire, which is not expressed.

Define $s_{i(q)} = (A^-_{i(q)}, A^+_{i(q)})$, with discrete generations giving two categories, m for males and f for females. In this case, $\lambda_{pm}$ will be zero because the genes passed on by the sire do not influence selection of its offspring. However, $\lambda_{pf}$ will depend on both breeding values, because although $A^-$ is not expressed in the dam, it is expressed in her offspring. $\pi_{pq}$ also depends on both breeding values: genes passed on by the sire affect only $A^-$, and genes passed on by the dam affect only $A^+$. Because genes passed on by the sire are not expressed, the regression of offspring on parent is unaffected by selection. Therefore, applying Appendices A and B,

where $h^2 = \sigma_{A^+}^2/\sigma_P^2$, and the phenotypic variance, $\sigma_P^2$, is the sum of the variance of $A^+$ and the environmental variance. Equations 7a and 7b were used to obtain $\beta$.

Predictions were made using variance parameters obtained after iteration to equilibrium. To calculate {Delta}G, the expected values of the Mendelian sampling terms for selected individuals and the covariance with si(q) for selected individuals were calculated using standard index theory:

Because this is imprinted variation, half the genes from an ancestor will be expressed in females and half will be latent in males in the long term. Therefore gains predicted from Equation 3 should be halved.

Excellent predictions of expected genetic contributions and genetic gain were obtained (WOOLLIAMS et al. 1999). From these results, WOOLLIAMS et al. (1999) were able to show the relative importance of $A^-$ and $A^+$ in male and female parents in contributing to within- and between-family gain.

Overlapping generations:
An example of application with overlapping generations is presented for mass selection in a two-path scheme, with a fixed number of parents selected at each age. In a two-path scheme, breeding individuals are not subdivided into males to breed males, males to breed females, etc. The general approach is explained in more detail by BIJMA and WOOLLIAMS (1999). The steps are illustrated using a scheme with three categories of 20 parents each: males breeding at 1 yr of age (category 1), females breeding at 1 yr of age (category 2), and females breeding at 3 yr of age (category 3). Age groups not used as parents are omitted. The number of offspring per litter was eight, and the trait was assumed to have a heritability of 0.4.

  1. The genetic make-up of the newborns is described by $g_{0,p1}$, $g_{0,p2}$, and $g_{0,p3}$, which are 0.5, 0.25, and 0.25, respectively, for all categories p. They are the same for all offspring categories p because this is a two-path model. From the $g_{0,pq}$, the numbers of parents, and the family sizes, the selection intensities ($\iota_p$) and variance reduction coefficients ($k_p$) were calculated for each category. They were the same for all three categories: $\iota_p = 1.647$ and $k_p = 0.817$.

  2. An initial $\Delta G$ was assumed as a starting point for iteration. Here the starting point was $\Delta G$ calculated from standard gene flow (HILL 1974). After iterating to an equilibrium, this was calculated to be $\Delta G = 0.412$.

  3. The genetic value of the selected parents in category p was $\iota_ph^2\sigma_P - (\mathrm{age}(p) - 1)\Delta G$. Deviations from the overall means of the selected males and females were $\delta = (0, +0.412, -0.412)$. That is, the female parents of age 1 had breeding values 0.412 units above average, and the female parents of age 3 had breeding values 0.412 units below average.

  4. Before selection, genetic variance in category p was calculated using the pooled variance within categories plus between categories plus the Mendelian sampling variance:

    This was 0.370 for all p, and the phenotypic variance was $\sigma_P^2 = 0.970$ for all p.

  5. G was calculated using a truncation algorithm that finds the truncation point for a given upper-tail probability of a mixture of Normal distributions. The algorithm was used twice for the selection of candidates in each category: first to obtain the genetic make-up from sire categories, and then to obtain the genetic make-up from dam categories. For category p candidates, the Normal distributions in the mixture were specified as follows. The mixing proportions were $2g_{0,pq}$ (q = 1, 2, 3), i.e., the frequencies of the candidates with parent category q. The means were the deviations of the candidates with parent category q from the mean of all like-sexed candidates, i.e., $\tfrac{1}{2}\delta_q$. The variances were assumed independent of parent category q. They were taken as the phenotypic variance reduced by the component of genetic variance between parent categories of the same sex as q, i.e., $\sigma_P^2 - \sum_{q^*}2g_{0,pq^*}(\tfrac{1}{2}\delta_{q^*})^2$, where the sum runs over categories $q^*$ of the same sex as q. In the first iteration, each row of G was (0.5, 0.336, 0.164). Thus, although the dams of ages 1 and 3 provided equal numbers of candidates, the candidates with dams of age 1 were expected to be about twice as likely to be selected. Dams of age 1 were therefore expected to have twice as many selected offspring.

  6. The $\Lambda$ and $\Pi$ matrices were constructed according to Appendices A and B, respectively. For mass selection, $\pi_{pq} = 0.5(1 - k_ph^2)$ and $\lambda_{pq} = 0.5\iota_p\sigma_P^{-1}$. In the first iteration, $\Pi = 0.344\,\mathbf{1}\mathbf{1}^T$ and $\Lambda = 0.836\,\mathbf{1}\mathbf{1}^T$, where $\mathbf{1}^T = (1, 1, 1)$, and $D = \mathbf{1}(0, 0.092, -0.188)$. The result for D indicates that a selected individual (of any category p) with a dam of age 1 is expected to have a breeding value 0.28 greater than that of a selected individual of the same category with a dam of age 3.

  7. $\alpha$ and $\beta$ were calculated according to Equations 7b and 9. In the first iteration, $(N\alpha)^T = (0.395, 0.289, 0.106)$ and $(N\beta)^T = (0.503, 0.338, 0.165)$.

  8. The covariance of the Mendelian sampling term with the breeding values was calculated, and $\Delta G$ was updated using Equations 11a, 11b, and 11c. This uses the results that $E[a_{i(q)}] = \tfrac{1}{2}h_0^2\iota_q\sigma_P^{-1}$ and that, after selection, $\mathrm{cov}(a_{i(q)}, A_{i(q)}) = \tfrac{1}{2}h_0^2(1 - k_qh^2)$.

  9. Steps 3 through 8 were repeated to convergence.
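Steps 1, 4, and 6 can be checked numerically for the first iteration. The sketch assumes 40 dams each producing a litter of eight, giving 160 candidates per sex and a selected fraction of 20/160. It uses large-sample truncation formulas (no finite-population correction). It also assumes a form for the step 4 variance: one-quarter of the variance among selected sires, plus one-quarter of the variance among selected dams (within plus between categories), plus the Mendelian sampling variance $\tfrac{1}{2}h_0^2$. The helper name `truncation_parameters` is hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def truncation_parameters(p):
    """Large-sample selection intensity (iota) and variance reduction (k)
    for truncation selection of the best fraction p of a Normal population."""
    x = STD_NORMAL.inv_cdf(1.0 - p)     # truncation point
    iota = STD_NORMAL.pdf(x) / p
    k = iota * (iota - x)
    return iota, k


# Step 1 (assumed: 160 candidates per sex, 20 selected per category).
iota, k = truncation_parameters(20 / 160)

# Step 4 (assumed form of the variance of the newborns).
h0_sq, var_E = 0.4, 0.6
var_A = 0.370                          # value reported in step 4
h_sq = var_A / (var_A + var_E)
within = var_A * (1.0 - k * h_sq)      # within a selected parent category
delta_dams = [0.412, -0.412]           # step 3 deviations of the dam categories
# Equal dam-category shares (0.5 each) and a mean deviation of zero.
between_dams = sum(0.5 * d * d for d in delta_dams)
var_A_offspring = 0.25 * within + 0.25 * (within + between_dams) + 0.5 * h0_sq

# Step 6 (mass selection).
pi = 0.5 * (1.0 - k * h_sq)
lam = 0.5 * iota / math.sqrt(var_A + var_E)

print(round(iota, 3), round(k, 3), round(var_A_offspring, 3), round(pi, 3), round(lam, 3))
```

Under these assumptions the results agree with the values quoted in steps 1, 4, and 6 to about the third decimal place, although k comes out near 0.818 rather than 0.817.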

Results after convergence of the iterations were $\alpha = (0.0200, 0.0149, 0.0050)^T$ and $\beta = (0.0255, 0.0171, 0.0084)^T$. Predicted gain within families was (0.134, 0.100, 0.034), and predicted gain between families was (0.067, 0.045, 0.022), giving a total gain of 0.402. At equilibrium, $G = \mathbf{1}(0.500, 0.335, 0.165)$. These predictions were compared with simulation results from 1000 replicates:

- $\alpha = (0.0197, 0.0145, 0.0052)^T$, with a maximum SE of 0.0009;
- $\beta = (0.0249, 0.0175, 0.0071)^T$, with a maximum SE of 0.0004;
- a total gain of 0.398 (SE 0.001).

Agreement between simulations and predictions was therefore very close. As in discrete generations, the gain from mass selection was evenly divided between males and females. The gene flow predicted using HILL (1974) is $\alpha = (0.0167, 0.0083, 0.0083)$. HILL (1974) makes no prediction of $\beta$.
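The truncation step for G (step 5) can be sketched for the first iteration. Only the dam side needs a mixture here, because there is a single sire category. The sketch takes a common within-component standard deviation, obtained by subtracting the between-category variance of the candidates, $\sum_{q^*}2g_{0,pq^*}(\tfrac{1}{2}\delta_{q^*})^2$, from $\sigma_P^2 = 0.970$. The selected fraction is taken to be 20/160. Bisection is used as the root finder, which is an assumption: the paper does not name its algorithm. The function names are hypothetical. The resulting row should be close to the first-iteration row (0.5, 0.336, 0.164) reported in step 5.

```python
from statistics import NormalDist


def mixture_upper_tail(t, weights, means, sd):
    """P(X > t) for a mixture of Normals with a common standard deviation."""
    return sum(w * (1.0 - NormalDist(mu, sd).cdf(t)) for w, mu in zip(weights, means))


def truncate_mixture(weights, means, sd, p, lo=-10.0, hi=10.0, tol=1e-10):
    """Bisection for the truncation point t that selects the upper fraction p;
    returns t and the share of the selected candidates from each component."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mixture_upper_tail(mid, weights, means, sd) > p:
            lo = mid            # too many selected: raise the threshold
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    shares = [w * (1.0 - NormalDist(mu, sd).cdf(t)) / p for w, mu in zip(weights, means)]
    return t, shares


# First-iteration values (categories: males 1 yr, females 1 yr, females 3 yr).
var_P = 0.970
delta = [0.0, 0.412, -0.412]
g0 = [0.5, 0.25, 0.25]
dam_categories = [1, 2]

weights = [2 * g0[q] for q in dam_categories]      # frequency of candidates by dam category
means = [0.5 * delta[q] for q in dam_categories]   # candidates inherit half the deviation
var_between = sum(w * m * m for w, m in zip(weights, means))  # weighted mean of means is 0
sd = (var_P - var_between) ** 0.5

_, shares = truncate_mixture(weights, means, sd, p=20 / 160)
# The single sire category supplies the whole sire half of the genes.
G_row = [0.5] + [0.5 * s for s in shares]
print([round(g, 3) for g in G_row])
```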

The generation interval, defined as the time taken to turn over the genes once, was predicted from $(\sum_q X_q\alpha_q)^{-1}$ to be 1.25 (cf. 1.26 with SE 0.01 in the simulations). This was notably shorter than the average age of the parents. The reason is the cumulative effect of the selective advantage of the younger age group of females: although the two female age groups produced equal numbers of offspring, the younger females produced more than twice as many parents. However, the generation interval was not predictable from the equilibrium G alone (i.e., accounting for a single generation of selective advantage). That approach would have predicted an interval of 1.33 (i.e., 0.5 × 1 + 0.335 × 1 + 0.165 × 3).
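The three generation intervals can be compared directly. The sketch uses the reported $\alpha$ and equilibrium G. The random-selection values assume that contributions are proportional to $g_{0,pq}$ and spread over an interval $\sum_q g_{0,pq}\,\mathrm{age}(q)$. This is a simplified reading of HILL (1974) for this particular scheme, not his general method, and it reproduces the quoted $\alpha = (0.0167, 0.0083, 0.0083)$. The function names are hypothetical.

```python
X = [20, 20, 20]                    # parents per category: males 1 yr, females 1 yr, females 3 yr
age = [1, 1, 3]
g0 = [0.5, 0.25, 0.25]              # genetic make-up of newborns (step 1)
G_eq = [0.500, 0.335, 0.165]        # equilibrium row of G (reported)
alpha = [0.0200, 0.0149, 0.0050]    # predicted alpha (reported)


def interval_from_alpha(X, alpha):
    """Generation interval as (sum_q X_q alpha_q)^-1."""
    return 1.0 / sum(x * a for x, a in zip(X, alpha))


def interval_from_one_generation(shares, age):
    """Interval from a single generation of gene flow: sum_q share_q * age_q."""
    return sum(s * a for s, a in zip(shares, age))


L_alpha = interval_from_alpha(X, alpha)            # cumulative selective advantage
L_G = interval_from_one_generation(G_eq, age)      # one generation of advantage
L_random = interval_from_one_generation(g0, age)   # no selective advantage

# Random selection: alpha_q = g0_q / (X_q * L_random) (simplified reading, see above).
alpha_random = [g / (x * L_random) for g, x in zip(g0, X)]
print(round(L_alpha, 2), round(L_G, 2), round(L_random, 2))  # 1.25 1.33 1.5
print([round(a, 4) for a in alpha_random])                   # [0.0167, 0.0083, 0.0083]
```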

To obtain the time course of the contributions, Appendix C was used. Appendix C needs the following matrices based on G:

The results are shown in Table 2 for the time course of contributions from category 2. The contributions converged in cohort 10.


 
Table 2. The time course of expected contributions from an individual female parent of age one at t = 0


*  DISCUSSION
*LITERATURE CITED

This study developed a framework for predicting the expected genetic contributions of individuals, and of categories of individuals, under a wide range of selection and inheritance models. The framework accounts for selection more fully than existing gene-flow methods for overlapping generations and multiple breeding groups (such as that presented by HILL 1974). Furthermore, it advances understanding by considering differential gene flow among individuals within categories, an extension not hitherto achieved except in some special cases. The framework was constructed by first modeling the selection process and the transfer of selective advantages within a single generation of selection, and then extending this to multiple generations. Two regression models are required, both derived using standard index theory. The first describes the expected number of selected offspring a parent may have ($\Lambda$). The second describes the relationship between the selective advantages of a selected offspring and those of its parent ($\Pi$). Predictions of genetic gain follow directly from the expected long-term contributions. The relationship between gain and contributions (Equations 3 and 10), unlike $\iota\rho\sigma_A$, shows where gain comes from. In each cohort, gain arises from generating a covariance between the long-term contributions and the new variance arising in the population (i.e., the Mendelian sampling variation). This changes the description of gain from a statistical one to a genetical one.

The framework has been developed to describe the expected genetic contribution over all time horizons, from the short term to the long term. The novel closed formulae (Equations 7a, 7b, and 9) for the expected long-term contribution of an ancestor rely on the assumption that the selection process is at equilibrium. If it is not, the error will depend on the degree of departure relative to the timescale of convergence of the contributions (approximately five generations). This assumption is not necessary, however, for Equations 11a, 11b, and 11c, which predict contributions over finite time periods. Without equilibrium, though, more effort may be required to define the changes in the necessary parameters.

In the development of the framework, the effects of inbreeding on parameters and progress have been neglected, but this is not a serious problem. First, the timescale for the convergence of contributions is small in comparison to the timescale for the effects of inbreeding on parameters in breeding schemes, especially where inbreeding is controlled to be at reasonable levels. The impact of individuals within a cohort is very largely decided within five generations, and even within this period, the scope for controlling an individual's contribution declines exponentially (the scope can be measured by the variance of an individual's contribution within the population). A second reason is that schemes will most usefully be compared at the same rates of inbreeding, and so the neglect of inbreeding is less likely to bias the comparisons made.

The expected long-term contribution has been described in the general linear form $\alpha_q + \beta_q^T(s_{i(q)} - \bar{s}_q)$, where $s_{i(q)}$ is the vector of selective advantages of ancestor i. Judged by the accuracy of the results in this study, omitting quadratic terms from the model has not led to serious errors in predicting rates of gain. Nor has it distorted the linear component of the relationship between the long-term contribution and the selective advantages. Quadratic terms in s do not affect the prediction of rates of gain unless terms of the order $E[s^2a]$ are significant, which would involve the skewness of a after selection. They will not influence the predicted rate of inbreeding unless moments higher than the variance of s are considered (WOOLLIAMS and THOMPSON 1994). The linear approximations used in the applications, and presented in Appendix A, were robust.
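To make the linear form concrete, the sketch below evaluates $\alpha_q + \beta_q(s - \bar{s}_q)$ with a single selective advantage, the breeding value. It uses the converged $\alpha$ and $\beta$ from the overlapping-generation example and also forms the within-family gain $X_q\alpha_qE[a]$. It assumes $E[a \mid \text{selected}] = \tfrac{1}{2}h_0^2\iota\sigma_P^{-1}$ with $h_0^2 = 0.4$, $\iota = 1.647$, and $\sigma_P^2 = 0.970$. The function name is hypothetical. The resulting within-family gains are close to the reported (0.134, 0.100, 0.034).

```python
import math

X = [20, 20, 20]                    # parents per category
alpha = [0.0200, 0.0149, 0.0050]    # converged alpha (mass-selection example)
beta = [0.0255, 0.0171, 0.0084]     # converged beta


def expected_contribution(q, s, s_mean=0.0):
    """Linear form alpha_q + beta_q * (s - s_mean) for a single selective advantage."""
    return alpha[q] + beta[q] * (s - s_mean)


# Within-family gain per category: X_q * alpha_q * E[a | selected] (assumed E[a]).
h0_sq, iota, sigma_P = 0.4, 1.647, math.sqrt(0.970)
mean_a = 0.5 * h0_sq * iota / sigma_P
within_gain = [x * a * mean_a for x, a in zip(X, alpha)]

# A female of age 1 whose breeding value is 0.5 above her category mean.
print(expected_contribution(1, 0.5))
print([round(g, 3) for g in within_gain])
```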

The values of $\alpha$ represent the proportions of genes that derive from the various categories as a whole. These can differ qualitatively from predictions using HILL (1974), because the earlier study does not account for the inheritance of selective advantages. The impact may be particularly great where breeding structures subject to selection are subdivided, with migration, either planned or random, taking place between the subdivisions. In these circumstances, ignoring the selective advantage between groups will overestimate the impact of groups of lesser merit and underestimate the impact of groups of greater merit. These errors may lead to the maintenance and use of subdivisions that have little potential to contribute in the long term. They may also lead to a greater rate of inbreeding in the population than had been anticipated (BIJMA et al. 1999). The framework presented here and that of HILL (1974) give the same prediction of $\alpha$ when selection is random, because (i) the elements of G are identical to $g_{0,pq}$, (ii) $\Pi = 0$, and (iii) $\Lambda = 0$.

The genetic contribution of an individual represents the expected impact of its Mendelian sampling term on the population. Within a cohort, the magnitude of the contribution made by an individual will depend upon the breeding categories in which it is included over its lifetime. In any newborn cohort, even when generations overlap, the males are expected to have a total long-term contribution equal to that of the females, i.e., $\sum_{q \in \text{male}} X_q\alpha_q = \sum_{q \in \text{female}} X_q\alpha_q$. When generations are discrete these sums are equal to one-half, but when generations overlap the sums will be less than one-ha