Abstract
Using the concept of conditional coancestry, given observed markers, an explicit expression of the accuracy of marker-based selection is derived in situations of linkage equilibrium between markers and quantitative trait loci (QTL), for the general case of full-sib families nested within half-sib families. Such a selection scheme is rather inaccurate for moderate values of family sizes and QTL variance, and the accuracies predicted for linkage disequilibrium can never be reached. The result is used to predict the accuracy of marker-assisted combined selection (MACS) and is shown to agree with previous MACS results obtained by simulation of a best linear unbiased prediction animal model. Low gains in accuracy are generally to be expected compared to standard combined selection. The maximum gain, assuming infinite family size and all QTLs marked, is about 50%.
The linkage maps developed in recent years for several farm animal species have given a new impetus to the theoretical investigations initiated by Neimann-Sorensen and Robertson (1961) on the role that individual gene or marker identification could play in breeding schemes, compared to classical quantitative genetics methods relying only on measurements of the traits of interest. By integrating marker information into artificial selection for polygenic traits, a system known as marker-assisted selection (MAS), the efficiency of selection can be increased substantially, as shown by Lande and Thompson (1990). They considered situations of linkage disequilibrium between markers and quantitative trait loci (QTL), which create statistical associations between traits and markers. In such a situation knowledge of the individual marker genotype provides information on the individual's breeding value. This is not the case in situations of linkage equilibrium, where no association between traits and markers exists at the population level. The information that can then be used in selection is not the individual marker genotype, but rather the marker allele transmission in a given family. In the absence of any knowledge on this transmission, the covariances between relatives needed for establishing selection indices, or best linear unbiased prediction (BLUP) of breeding values, can be based only on evaluations established over the whole genome. If markers close to QTL can be followed in their transmission, more accurate coancestries for segments of the genome including QTL can be obtained and used for breeding value prediction. This was first suggested by Soller and Beckman (1982) in a dairy bull selection context. More generally, Chevalet et al. (1984) introduced the concept of conditional coancestry (or conditional identity by descent, given the observed markers in two individuals) for a marked region of the genome, from which conditional covariances between relatives might be obtained. The additional information coming from marker transmission may be included in the prediction of breeding values, as later shown by Fernando and Grossman (1989) with BLUP methodology using marker genotypes and trait phenotypes.
Contrary to the situation considered by Lande and Thompson (1990), there seems to have been no attempt to predict the accuracy of MAS in situations of linkage equilibrium, the work of Stam (1986) being a notable exception. Monte-Carlo simulations (e.g., Ruane and Colleau 1995) or prediction algorithms (e.g., van der Beek and van Arendonk 1996; J. J. Colleau and J. Ruane, unpublished results) have been used instead. The purpose of this article is to provide an analytical expression of selection accuracy based on marker transmission within families. The result will then be used to evaluate the gain in accuracy in marker-assisted combined selection (MACS) compared to a strictly combined selection on phenotypes, for populations in linkage equilibrium.
MARKER-BASED SELECTION WITHIN FAMILIES
The situation considered in each family is such that the alleles present at one QTL in the two parents are unequivocally identified by marker genes, and these are located at such a short distance from the QTL that no recombination occurs. Then, if the four genes present in the parents are called (ab) and (cd), four types of offspring are obtained: (ac), (ad), (bc), (bd). The coancestries among those four classes of offspring, conditional on the markers, are easily obtained knowing the number of marker genes in common between any two marker classes. From the standard definition of the coancestry coefficient, when two offspring have received the same two markers (a and c for example), the probability that a random gene from one is identical with a random gene from the other, i.e., their coancestry, is 0.5. Similarly, when two offspring have received only one marker in common (e.g., a), the corresponding coancestry is 0.25. The coancestries obtained can be arranged as shown in Table 1.
Table 1. Array of coancestries conditional on the markers observed in a full-sib family
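The coancestry rule described above (0.5 for two markers in common, 0.25 for one) can be checked with a short script. This is only an illustrative sketch: it assumes the four parental genes a, b, c and d are all distinguishable and not identical by descent, so that two offspring sharing no marker have a coancestry of zero. The function and variable names are hypothetical.

```python
from itertools import product


def conditional_coancestry(class_a: str, class_b: str) -> float:
    """Coancestry at the marked locus between two offspring marker classes.

    Each class is a two-letter string (one paternal and one maternal gene).
    Assuming the four parental genes are distinct, a random gene drawn from
    each offspring is identical with probability 1/4 per shared gene.
    """
    shared = len(set(class_a) & set(class_b))
    return 0.25 * shared


# Offspring classes from the mating (ab) x (cd): ac, ad, bc, bd
classes = [s + d for s, d in product("ab", "cd")]
coancestry = [[conditional_coancestry(x, y) for y in classes] for x in classes]

for label, row in zip(classes, coancestry):
    print(label, row)
```

Each row of the printed array corresponds to one marker class, in the order ac, ad, bc, bd.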
Assuming a phenotypic variance of 1 for the trait considered, and otherwise using the notation of Lande and Thompson (1990), we define $h^2$, $p$, $r_n h^2$ and $t_n$ to be the heritability of the trait, the proportion of the additive genetic variance due to the QTL (or marked segment), and the genetic and phenotypic variances of family means, respectively. The within-family variance due to the QTL (or marked genetic variance) is then $M = (1 - r_n)ph^2$. The variance-covariance matrices of within-family phenotypic and genetic values, P and G, can be written as functions of $M$ and $t_n$, assuming one individual in each class, i.e., a family size of n = 4, by application of the general expression of the covariance between relatives (e.g., Ollivier 1981, p. 28)
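As a numerical illustration of the marked within-family variance, the following sketch evaluates M = (1 − r_n)ph² for a full-sib family. The finite-family expression r_n = r + (1 − r)/n, with r = 0.5 for full sibs, is an assumption made here to obtain a value of r_n; it is not stated in the text. The function names and the parameter values are hypothetical.

```python
def family_mean_genetic_coefficient(r: float, n: int) -> float:
    """Assumed form r_n = r + (1 - r) / n, so that r_n * h2 is the genetic
    variance of the mean of a family of size n (r = 0.5 for full sibs)."""
    return r + (1.0 - r) / n


def marked_within_family_variance(r_n: float, p: float, h2: float) -> float:
    """M = (1 - r_n) * p * h2, with a total phenotypic variance of 1."""
    return (1.0 - r_n) * p * h2


# Illustrative values only: full sibs, n = 4, p = 0.5, h2 = 0.25
r_n = family_mean_genetic_coefficient(r=0.5, n=4)
M = marked_within_family_variance(r_n, p=0.5, h2=0.25)
print(r_n, M)
```

Under the assumed form of r_n, M tends to 0.5ph² for full-sib families as n becomes very large.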
In the more general case of n > 4, given that $M$ is the phenotypic correlation among members of the same marker class and assuming n/4 individuals per class, the phenotypic variance of a class mean is $(1 - t_{mn})[M + 4(1 - M)n^{-1}]$. The correlation between class means is then either zero or $t = 0.5M[M + 4(1 - M)n^{-1}]^{-1}$, and general expressions for P and G can be established as given in the appendix (ignoring the negligible predictive value of the deviation from class mean).
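The two expressions above can be evaluated directly. The sketch below takes M, n and t_mn as given inputs (their numerical values here are purely illustrative) and returns the phenotypic variance of a class mean together with the nonzero correlation t between class means. The function name is hypothetical.

```python
def class_mean_statistics(M: float, n: int, t_mn: float) -> tuple[float, float]:
    """Return (variance of a marker-class mean, correlation t between the
    means of two classes sharing one marker), with n/4 individuals per class."""
    scale = M + 4.0 * (1.0 - M) / n      # M + 4(1 - M)/n
    variance = (1.0 - t_mn) * scale      # (1 - t_mn)[M + 4(1 - M)/n]
    t = 0.5 * M / scale                  # 0.5 M [M + 4(1 - M)/n]^(-1)
    return variance, t


# With n = 4 (one individual per class) the bracket equals 1, so t = 0.5 M
var_4, t_4 = class_mean_statistics(M=0.2, n=4, t_mn=0.1)
# With larger classes the class means become more strongly correlated
var_40, t_40 = class_mean_statistics(M=0.2, n=40, t_mn=0.1)
print(var_4, t_4)
print(var_40, t_40)
```

For n = 4 the expressions reduce to the one-individual-per-class case, and t approaches 0.5 as n increases.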
The variance-covariance matrix of the predictors of breeding values is obtained, using classical selection index theory, as the product $GP^{-1}G$, whose diagonal terms yield the variance of the index (I). Applying the rules of inversion of partitioned matrices for obtaining $P^{-1}$ (Searle 1982), an explicit expression of $GP^{-1}G$ can be obtained as shown in the appendix, and
The particular case of full-sib families implies m = 1, $r_{mn} = r_n$ and $t_{mn} = t_n$. The previous equations then simplify considerably, and (1a) becomes
The previous predictions apply when the 4m categories of offspring are equally represented, which requires n ≥ 4. With lower values of n, various structures of the P and G matrices occur, with given probabilities. Only the case of n = 1, which corresponds to a particularly simple situation, will be considered here. When n = 1, each half-sib family is made up of two groups of individuals, receiving either gene present in the sire. The coancestry conditional on the markers is then either 0.25 between offspring belonging to the same group, or zero between offspring belonging to different groups. The predictor is therefore equivalent to a combined selection index, with group size m/2, intragroup phenotypic correlation and genetic covariance both equal to $t = 0.5M$, and phenotypic variance $1 - t_m$. The variance of such an index (e.g., from Ollivier 1981, p. 79) is
Table 2. Accuracy of within-family marker selection, relative to individual phenotypic selection, for various full- and half-sib family sizes (n and m), heritabilities ($h^2$), and proportions (p) of additive genetic variance due to marked QTL, in situations of linkage equilibrium
Table 2 gives values of selection accuracy relative to individual phenotypic selection for various half-sib family sizes. For moderate values of family sizes and p, this type of selection is rather inaccurate. In a typical litter-bearing species, with m = n = 8, it can be seen that p must exceed 0.5 for marker accuracy to exceed individual phenotype accuracy (h). As m increases, less and less is gained from increasing n. For low family sizes, relative accuracy is not much influenced by heritability, whereas for large values of m or n it is approximately halved when $h^2$ increases from 0.10 to 0.50 (see the columns “large” in Table 2). The particular case of full-sib families is presented in Table 3, which shows that only low values of relative accuracy are reached, unless extreme and rather unrealistic values of n and p are assumed.
Table 3. Accuracy of within-family marker selection, relative to individual phenotypic selection, for various full-sib family sizes (n), heritabilities ($h^2$), and proportions (p) of additive genetic variance due to marked QTL, in situations of linkage equilibrium
It should be noted that in situations of linkage disequilibrium selection accuracy does not depend on the family structure and, as Table 2 shows, attains values which can never be reached under linkage equilibrium, even with extremely large family sizes. It can be seen that when both m and n become very large in Equation 1a, $\mathrm{Var}(I) \to 0.75\,ph^2(1 - t_2)^{-1}$, and when n becomes very large in Equation 2, $\mathrm{Var}(I) \to 0.5\,ph^2(1 - t_1)^{-1}$. The corresponding maximum accuracies,
APPLICATION TO MACS
Once the marker index is defined, a selection index combining phenotypic measurements on the individual and its relatives with this marker index can be established. When only one type of family is considered, the MACS situation of Lande and Thompson (1990, Appendix I) can easily be adapted to the present situation, by replacing the quantity
Table 4. Accuracy of marker-assisted combined selection relative to standard combined selection, in percent
For evaluating MACS accuracy in the general case of full-sib families nested within half-sib families, i.e., m and n both different from 1, it will be convenient to replace the overall marker index defined by Equation 1a by two independent marker indices, a within- and a between-full-sib family marker index, whose variances are defined by Equation 2 and the difference between 1a and 2, respectively, i.e.,
DISCUSSION
Several simplifying assumptions have been made to obtain explicit values of selection accuracy. The most important assumption is that marker identification is complete and fully informative regarding each QTL. This will require a highly variable set of markers to reach a high proportion of fully informative matings (see Smith and Simpson 1986 or Götz and Ollivier 1992, for a derivation of this proportion in full-sib families). Quite obviously, this assumption will become harder to meet with increasing number of dams per sire. We have also assumed tight linkages between markers and QTLs. If their relative positions are known, conditional coancestries taking this information into account can be derived, at the expense of slightly more complex P and G matrices, using the general method of Chevalet et al. (1984), or the method detailed by Haseman and Elston (1972) for full-sib families. On the other hand, flanking markers recombining at a rate $r$ with the QTL may be considered as roughly equivalent to a single marker with a recombination rate of $r^2$. Ruane and Colleau (1995) have shown that such an approximation is acceptable for distances of the order of 10 cM.
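To give an order of magnitude for the flanking-marker approximation, the sketch below converts a map distance into a recombination rate and squares it. Haldane's mapping function is used here as an assumption for the conversion (the text does not specify one), and the function names are hypothetical.

```python
import math


def haldane_recombination_rate(distance_cm: float) -> float:
    """Recombination rate for a map distance in centimorgans, under
    Haldane's mapping function (an assumption, no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))


def equivalent_single_marker_rate(r: float) -> float:
    """Rough single-marker equivalent of two flanking markers, each
    recombining at rate r with the QTL: r squared."""
    return r ** 2


r_flank = haldane_recombination_rate(10.0)       # about 0.09 at 10 cM
r_equiv = equivalent_single_marker_rate(r_flank)  # a little under 0.01
print(round(r_flank, 4), round(r_equiv, 4))
```

Under these assumptions, markers flanking the QTL at about 10 cM behave roughly like a single marker recombining with the QTL in less than 1% of meioses.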
The accuracies derived from Equations 1 are very close to those obtained in generation 1 by Ruane and Colleau (RC, 1995, Table 4) in their simulations using a BLUP animal model with one QTL and p = 0.125. The relative accuracies for m = n = 8 and p = 0.125 in Table 4, below 1%, are also quite close to the corresponding values simulated in the first generation of RC, relative to conventional BLUP. In later generations, RC noted a slight increase in relative accuracy, reaching a maximum of 106% in generation 3 for $h^2$ = 0.10. This might be the result of the inclusion of additional pedigree information. On the other hand, the changes of gene frequencies at the QTL while selection proceeds reduce the validity of the initially predicted accuracy. It should also be noted that in their simulations RC assumed a high number of alleles at the marker locus, as each founder was assumed to carry two unique alleles. They also assumed a rather tight linkage between markers and QTL, and observed a slight increase in response with an almost complete linkage (see their Table 6).
More general assumptions, such as low individual QTL effects, as in the treatment of linkage disequilibrium situations, are also to be made. This means that p should be allowed to increase with the inclusion of additional QTLs, rather than with an increased single QTL effect. RC's simulations indeed show that when the QTL effect is increased, the superiority of MACS over conventional BLUP may be reduced.
The approach applied here to combined selection may easily be extended to other types of MAS, such as the marker-assisted sib selection addressed by Stam (1986). It can be shown that in this case the approach outlined previously leads to the same expression of accuracy as given by Stam (1986, p. 175). It should be noted, however, that the latter derived a within-family accuracy, neglecting the $r_m$ and $t_m$ quantities involved in the overall accuracy considered here.
In conclusion, this study shows the difficulty of achieving substantial gains in accuracy by MACS in situations of linkage equilibrium, contrary to the optimistic concluding remark of Chevalet et al. (1984). This brings additional support to the thesis developed by Smith and Smith (1993), on the need to detect close linkages, hopefully inducing linkage disequilibria and so allowing MAS to be more efficient and simpler to implement in practice. However, for traits not measured on the candidates for selection, marker information remains valuable under linkage equilibrium, particularly when large half-sib families are available.
APPENDIX: PREDICTION OF BREEDING VALUES BASED ON MARKER TRANSMISSION IN HALF-SIB FAMILIES
The phenotypic variance-covariance matrix of marker class means can be written as a function of the quantities $M$, $t_{mn}$ and $t$, defined in the text.
From the expression of B12 given above, C11 and C12 may be expressed as functions of B11, such as
It can be shown that when full-sib families are considered, m = 1, $t_{mn} = t_n$, $r_{mn} = r_n$, $a - b + c = 2(1 + t)^{-1}$ and $a + 2b = (1 + 2t)(1 + t)^{-1}$. Equation 1a then reduces to Equation 2.
Footnotes
- Communicating editor: B. S. Weir
- Received July 14, 1997.
- Accepted November 11, 1997.
- Copyright © 1998 by the Genetics Society of America