Abstract
The efficiency of marker-assisted selection (MAS) based on an index incorporating both phenotypic and molecular information is evaluated with an analytical approach that takes into account the size of the experiment. We consider the case of a population derived from a cross between two homozygous lines, which is commonly used in plant breeding, and we study the relative efficiency of MAS compared with selection based only on phenotype in the first cycle of selection. It is shown that the selection of the markers included in the index leads to an overestimation of the effects associated with these markers. Taking this bias into account, we study the influence of several parameters, including experiment size and heritability, on MAS efficiency. Even if MAS appears to be most interesting for low heritabilities, we point out the existence of an optimal heritability (~0.2) below which the low power of quantitative trait loci detection and the bias caused by the selection of markers reduce the efficiency. In this situation, increasing the power of detection by using a higher probability of type I error can improve MAS efficiency. This approach, validated by simulations, gives results that are generally consistent with those previously obtained by simulations using a more sophisticated biological model than ours. Thus, though developed from a simple genetic model, our approach may be a useful tool to optimize the experimental means for more complex genetic situations.
The development of highly polymorphic molecular markers has opened a new era for genetics and selection. Most traits of economic importance are quantitative. The use of molecular markers enables one to identify and map quantitative trait loci (QTLs) that are involved in the variation of such traits. For the last six years, the opportunity to use markers in breeding programs to improve the efficiency of the selection of quantitative traits has received extensive attention. Lande and Thompson (1990) developed a method based on a multiple linear regression of phenotype on marker types. In this method, markers are used as cofactors to increase the accuracy in the prediction of genotypic values. Phenotype and estimated effects associated with markers are combined in an index of selection. Experimental results of selection using markers have been published (Stuber et al. 1982; Frei et al. 1986; Stuber and Edwards 1986; Stuber and Sisco 1992; Stromberg et al. 1994), but to our knowledge, none of them used the method of Lande and Thompson (1990).
Lande and Thompson (1990) evaluated analytically the expected efficiency of their method compared with conventional selection based solely on phenotype under some restrictive hypotheses (e.g., the size of the population is assumed to be infinite in the major part of the paper; the number of markers is unlimited). They concluded that marker-assisted selection (MAS) should be more efficient when the heritability of the trait is low. They mentioned, however, that to have such an advantage at low heritability, it is necessary to study a large number of individuals. Lande and Thompson (1990) discussed briefly the influence of the sampling size, but only through its effect on the fraction of the additive genetic variance that was estimated to be associated with markers. Gallais and Charcosset (1994) studied analytically the effect of experiment size, but they assumed a strict linkage between markers and QTLs, and they neglected the possibility of false QTL detection and the overestimation of the effects associated with markers.
Other published results on the efficiency of this method or related methods are based on simulations (Whittaker et al. 1995; Gimelfarb and Lande 1994a, b, 1995; Edwards and Page 1994; Zhang and Smith 1992, 1993). In all these works, population size and heritability appear to be the key parameters of MAS efficiency. Compared with an analytical approach, simulations are a powerful tool because they can be designed to be closer to real conditions of selection, and they allow one to evaluate the efficiency of the method for many successive generations of selection. Even if simulations are quicker than field studies, they are time consuming, especially when the objective is to evaluate the effect of a large range of parameters or possible interactions between parameters. Moreover, simulations give mostly descriptive information on the effect of a parameter and may not provide explanations about the way this parameter influences the efficiency of MAS.
Extending the preliminary analysis of Gallais and Charcosset (1994), with fewer restrictive assumptions, this paper presents an analytical approach to evaluate the relative efficiency of MAS, and it addresses the effect of population size.
THEORY
Lande and Thompson's analytical approach when population size is infinite: Lande and Thompson (1990) proposed to select individuals in a population assumed to be of infinite size on an index combining phenotype and effects associated with markers. Because our aim is to predict the mean genetic value of the population after one cycle of selection, the approach of Lande and Thompson (1990) can be extended to the prediction of the genetic value of the offspring of the individuals of the population. The selection index is then
Denoting Gi, the genetic value of the offspring of the individual i, we then have: Gi = ½Ai, where Ai is the additive genetic value of i.
Assuming that Ĥ is normally distributed and that the selection is conducted on both sexes, the genetic advance (ΔGMAS) obtained with MAS after one generation is given by
Assuming that P is distributed normally, the genetic advance in the next generation under conventional phenotypic selection is
The relative efficiency (RE) of MAS compared with phenotypic selection can be defined as the ratio of the genetic advance under MAS to the genetic advance under phenotypic selection with the same intensity of selection
When the size of the population is infinite, there is no error in the estimation of the weight coefficients nor of the effects associated with markers. In that case, the relative efficiency of MAS depends only on the heritability of the trait and on the proportion of phenotypic variation associated with markers. When the size of the population is finite, the weight coefficients and the effects associated with markers are estimated with a possible error. This experimental error leads to a smaller efficiency than expected under the assumption that parameters are known. Moreover, as mentioned by Lande and Thompson (1990), there is a bias in the estimation of the parameters because only markers with significant effects are included in the index. The authors considered that this bias could be neglected because markers can be chosen a priori from the results of a QTL detection experiment conducted in a previous generation.
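The infinite-population case can be illustrated numerically. The sketch below assumes the closed form usually attributed to Lande and Thompson (1990) for index selection on individuals, RE = sqrt(p/h2 + (1 − p)^2/(1 − h2·p)), where p is the proportion of additive genetic variance associated with markers. The formula is quoted from that paper as an assumption, not taken from the equations of this section, and the function name and example values are hypothetical.

```python
import math


def relative_efficiency_infinite(h2: float, p: float) -> float:
    """RE of index selection vs. phenotypic selection, infinite population.

    Assumed closed form (Lande and Thompson 1990, individual selection):
        RE = sqrt(p / h2 + (1 - p)**2 / (1 - h2 * p))
    h2: heritability of the trait (0 < h2 <= 1)
    p:  proportion of additive genetic variance associated with markers
    """
    if not 0.0 < h2 <= 1.0:
        raise ValueError("h2 must be in (0, 1]")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if h2 * p == 1.0:
        # Phenotype and markers both give the genetic value exactly.
        return 1.0
    return math.sqrt(p / h2 + (1.0 - p) ** 2 / (1.0 - h2 * p))


if __name__ == "__main__":
    for h2 in (0.1, 0.2, 0.5, 0.9):
        for p in (0.3, 0.7, 1.0):
            re = relative_efficiency_infinite(h2, p)
            print(f"h2={h2:.1f}  p={p:.1f}  RE={re:.2f}")
```

Under this assumed formula, RE equals 1 when markers carry no information (p = 0) or when the phenotype is a perfect predictor (h2 = 1), and it grows as h2 decreases, which is the infinite-population behavior that the finite-size analysis then qualifies.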
Genetic model used to study the case of a population of finite size: Consider a reference population of infinite size derived from a cross between two inbred lines. Such populations are currently used to search for QTLs in plant species, and the most commonly used are F2, backcross progenies (BC), recombinant inbred lines (RIL), or doubled haploids (DH). Consider a normally distributed quantitative trait that is influenced by numerous (l) unlinked QTLs with no epistasis. Each QTL is supposed to be linked to a single marker. The observed rate of recombination between a QTL and its linked marker, r, is assumed to be the same for all the marker-QTL pairs. It is assumed that this situation provides a relevant approximation for several markers in the vicinity of each QTL, r being the smallest recombination rate between the QTL and the markers. With these assumptions, the additive genetic variance associated with markers is $(1 - 2r)^2 \sigma_A^2$. The parameter m2 = $(1 - 2r)^2$ is the fraction of the total additive genetic variance truly associated with markers. It is the maximum percentage of genetic variance that can be detected with this set of markers. In addition to the l markers that are linked with a QTL, Nm − l markers unlinked to any QTL are considered (Nm is the total number of markers). We also assume that all the markers are unlinked.
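As a small worked illustration of the relation between the recombination rate r and the fraction m2 of additive variance associated with markers, the sketch below converts in both directions. The function names are hypothetical, and the inverse assumes 0 ≤ r ≤ 0.5.

```python
import math


def m2_from_r(r: float) -> float:
    """Fraction of additive variance associated with markers: (1 - 2r)^2."""
    if not 0.0 <= r <= 0.5:
        raise ValueError("r must be in [0, 0.5]")
    return (1.0 - 2.0 * r) ** 2


def r_from_m2(m2: float) -> float:
    """Inverse relation, r = (1 - sqrt(m2)) / 2, valid for 0 <= r <= 0.5."""
    if not 0.0 <= m2 <= 1.0:
        raise ValueError("m2 must be in [0, 1]")
    return (1.0 - math.sqrt(m2)) / 2.0


if __name__ == "__main__":
    for r in (0.0, 0.05, 0.15, 0.30, 0.50):
        print(f"r={r:.2f}  m2={m2_from_r(r):.2f}")
    for m2 in (0.3, 0.5, 0.7, 0.9):
        print(f"m2={m2:.1f}  r={r_from_m2(m2):.3f}")
```

For example, a recombination rate of 15% gives m2 of about 0.49, so roughly half of the additive variance is then out of reach of the markers whatever the experiment size.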
To simplify this approach, we will consider, in a first step, that all QTL effects are equal. Experimental results concerning QTL detection, however, show that the distribution of QTL effects is generally not uniform. Many authors (e.g., Paterson et al. 1991; Edwards et al. 1992) found few QTLs with relatively large effects and many others with smaller ones. Thus, to be more realistic, different types of QTL effects will be considered in a second step.
Relative efficiency of MAS in a given experiment: In a given MAS experiment, we consider N individuals randomly sampled from the reference population. Marker-QTL associations are detected in this sample by a simple linear regression of phenotypes on marker types, with a given probability α of type I error. In Lande and Thompson (1990), the effects associated with markers were estimated by multiple regression of phenotypes on marker types. Since markers are supposed to be independent in this study, simple regression leads to nearly the same estimated marker effects as multiple regression (with, however, a slightly reduced power of QTL detection and a reduced precision in estimations). Yet, simple regression presents the advantage of being simpler from an analytical point of view because it tests each marker effect independently. The estimated effects associated with markers are then used to predict the additive genetic values of the individuals of the reference population of infinite size. This allows us to consider that the candidates for selection are independent from the individuals used to estimate the parameters of the model. The weight coefficients of the selection index that maximize the genetic advance are obtained using Equation 2 where parameters are replaced by their estimations ($\hat{b}_p$ and $\hat{b}_m$). Because we only consider additive effects and populations derived from a cross between two inbred lines, the molecular score M is defined as
Like Lande and Thompson (1990), we suppose that the heritability is known (for a discussion of a confidence interval around the estimated heritability, see Knapp et al. 1985). With the assumption that the estimated effects at markers significantly associated with a QTL are independent, we have
Then,
Following appendix A, Equation 4 becomes
Expected RE over all possible experiments of same size: For a given experiment, the association between a given marker and a QTL is tested with an F test. As developed in appendix B, all the parameters estimated at a given marker are functions of the F statistic. Since only markers significantly associated with a QTL are taken into account in the selection index, the expectations of the estimated parameters are obtained by using a truncated F distribution: only F values that are greater than or equal to a critical F value need to be considered. As a result, the expectations of the estimated parameters are not equal to their true values, but are overestimated. Even if a marker is not linked to a QTL, the expectation of the estimated variance accounted for by this marker is not zero. As shown in appendix B, the use of truncated F distributions allows us to obtain the expected RE over all the results that can be obtained after sampling N individuals from a given reference population.
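The overestimation caused by keeping only significant markers can be illustrated with a small Monte Carlo sketch. This is not the truncated-F derivation of appendix B. It simulates one two-class (DH-like) marker, regresses the phenotype on it, and averages the estimated fraction of variance over the replicates whose F statistic reaches the critical value. The function names, the seed, and the critical value 3.94 (an approximate 5% point of F with 1 and 98 degrees of freedom, supplied by hand) are assumptions made for the illustration.

```python
import math
import random


def estimated_r2(n, true_r2, rng):
    """Simulate n individuals; return the squared marker-phenotype correlation."""
    effect = math.sqrt(true_r2)          # marker coded -1/+1, total variance 1
    resid_sd = math.sqrt(1.0 - true_r2)
    x = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    y = [effect * xi + rng.gauss(0.0, resid_sd) for xi in x]
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def mean_r2_among_significant(n, true_r2, f_crit, n_rep=2000, seed=1):
    """Return (detection rate, mean estimated r2 over significant replicates)."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_rep):
        r2 = estimated_r2(n, true_r2, rng)
        # F statistic of a simple regression with 1 and n - 2 df.
        f_stat = r2 * (n - 2) / (1.0 - r2) if r2 < 1.0 else math.inf
        if f_stat >= f_crit:
            kept.append(r2)
    power = len(kept) / n_rep
    mean_kept = sum(kept) / len(kept) if kept else float("nan")
    return power, mean_kept


if __name__ == "__main__":
    F_CRIT = 3.94  # assumed approximate 5% critical value for F(1, 98)
    for true_r2 in (0.0, 0.02, 0.10):
        power, mean_kept = mean_r2_among_significant(100, true_r2, F_CRIT)
        print(f"true r2={true_r2:.2f}  detection rate={power:.2f}  "
              f"mean r2 when significant={mean_kept:.3f}")
```

Because a replicate is kept only if its estimated r2 is at least f_crit / (f_crit + n − 2), the conditional mean cannot fall below that threshold, so a marker with a small or null true effect is necessarily overestimated whenever it is retained.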
Numerical applications: The formulas described above show that only m2 (related to r), h2, l, N, Nm, α, and the QTL effect distribution affect the RE of MAS for a given population type. In the numerical applications, we suppose that the population is composed of doubled haploids. Three sizes of experiment (100, 300, or 500), five type I error risks α (1, 5, 10, 20, or 30%), different numbers of QTLs (five or 10), different QTL effects distributions (QTLs with equal effect or QTLs effects following an approximate geometric distribution), and 30 markers are considered. For these parameters, the relative efficiency of MAS is determined when m2 varies from 0 to 1 and when h2 varies from 0.05 to 1.
Validation of this approach by simulations: In previous formulations, some assumptions were made concerning (1) the independence between the parameters estimated at different markers and (2) the expression of the expected efficiency. To validate these assumptions, simulations were performed with conditions as close as possible to those of the theoretical approach. We simulated a population of N = 300 DH (or any population where there are only two classes of genotypes at each locus). The narrow sense heritability of the trait of interest was h2, and the additive variance associated with markers was m2σ2A. We considered a total of 30 unlinked markers. This was chosen to roughly correspond to 10 chromosomes with three nearly independent markers on each. Marker selection was made by a simple regression of phenotypes on marker types with a probability α of type I error. The marker effects and the weight coefficients of the selection index were estimated subsequently as described above. A second population with the same genetic parameters was simulated, and individuals were selected using the estimations made from the first population. One-tenth of the individuals were selected based on either (1) their index value (MAS) or (2) their phenotypic value (PHE). The relative efficiency is computed from Equation 4. For each set of parameters (N, m2, h2, and α), simulations were replicated 100 times. Over 100 simulations, the averages of the percentage of phenotypic variance accounted for by the markers (in the first population) and of the RE were compared with the values predicted by the analytical approach. Using the standard error, a confidence interval at the 5% level was determined for the average values obtained by simulations. When the analytical result was included in this interval, we concluded that the two approaches were not significantly different. In the analytical approach, the bias caused by the selection of the markers was taken into account. 
To investigate the importance of this bias, the analytical results obtained by considering that this bias could be neglected in the formulae (i.e., with no false detection and no overestimation of the effects of QTLs) were also given and compared with simulation results.
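The comparison rule described above (an analytical value is accepted when it falls inside the confidence interval of the simulation average) can be sketched as follows. The normal quantile 1.96 is an assumption standing in for whatever quantile was actually used, and the function name and sample values are hypothetical.

```python
import math
import statistics


def analytic_within_ci(sim_values, analytic_value, z=1.96):
    """Check whether analytic_value lies in mean +/- z * standard error.

    sim_values: one result (e.g., RE) per simulation replicate
    z: assumed normal quantile for a two-sided 5% level
    Returns (is_inside, (lower, upper)).
    """
    n = len(sim_values)
    if n < 2:
        raise ValueError("need at least two replicates")
    mean = statistics.mean(sim_values)
    std_err = statistics.stdev(sim_values) / math.sqrt(n)
    lower, upper = mean - z * std_err, mean + z * std_err
    return lower <= analytic_value <= upper, (lower, upper)


if __name__ == "__main__":
    replicate_re = [1.0, 1.2, 1.1, 0.9, 1.3]  # made-up replicate values
    inside, (lo, hi) = analytic_within_ci(replicate_re, 1.15)
    print(f"interval=({lo:.3f}, {hi:.3f})  analytic inside: {inside}")
```

With 100 replicates, as in the simulations described here, the difference between the normal quantile and the corresponding Student quantile is small.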
[Table 1: Estimated percentage of phenotypic variance associated with markers]
Comparisons between simulations and analytical results were made for 24 conditions [two α probabilities of type I error (5% and 20%), three heritabilities (0.15, 0.3, and 0.45), and four m2 parameters (0.30, 0.5, 0.7, and 0.9)].
To simplify the formulation of the expected RE of MAS, it was assumed in the analytical approach that selection is performed in the reference population, and not in the sample used to estimate marker-QTL associations, as would be the case in true experiments. To validate this assumption, simulations conducted with only one population instead of two were performed, and the results were compared with those obtained with two populations. Only minor differences were observed between results of simulations conducted with one or two populations (results not presented). The maximum difference in RE was ~0.1, but the differences were generally ~0.01. Hence, our approach seems to be a good approximation of the realistic situation.
RESULTS
Validation of the analytical approach by simulations: The analytical results obtained with bias of selection taken into account or neglected are compared to the simulation results (Tables 1 and 2). The estimation of the percentage of variance associated with markers obtained by the analytical approach is included in the confidence interval at the 5% level of significance of the average value found on 100 simulations (Table 1). Taking into account selection bias through truncated F distribution is, therefore, a valid way to predict the estimated percentage of variance associated with markers. For low m2 and heritabilities, the bias in the estimation of the variance associated with markers is important, the estimated value being at least twice the true value. Detections of false QTLs and overestimations of the effects accounted for by markers explain these differences. For the RE of MAS (Table 2), results of simulations are not significantly different from the analytical ones when selection bias is taken into account. Not considering selection bias leads to overestimating the actual relative efficiency of MAS. This overestimation is especially important for low heritabilities, low m2 and for high α. Simulation results therefore show that the bias caused by the selection of the markers can be important. Since this bias cannot be easily corrected in true experiments, it is important to consider it in the evaluation of the RE of MAS.
[Table 2: Relative efficiency of MAS]
Influence of biological parameters (h2, l, and QTL effects distribution) on the RE of MAS: We can distinguish two groups of parameters: those that depend on the biology of the trait (number and effects of the QTLs) and those that depend only on experimental conditions (the size N of the population, the distance between markers and QTLs, and the probability α of type I error). The heritability of the trait results from both the biology of the trait and experimental conditions because it can be increased by using replications of genotypes (in the case of plants) or performances of relatives to reduce the experimental error.
Figure 1 shows the domains of RE of MAS compared with selection based on phenotype as a function of r and h2 for N = 100, 300, or 500, as well as for two numbers of QTLs: 5 or 10 of equal effects. The different domains are separated by lines corresponding to REs varying from 1 to 2.75 by 0.25. It is seen that for a given N and with α = 5%, domains with the highest RE correspond to low heritabilities. When the heritability is high, genotypic values are well estimated by the phenotype, the weight given to the markers in the selection index is low, and MAS tends to be equivalent to phenotypic selection. Nevertheless, with an α risk level of 5%, the RE decreases for very low heritabilities (<0.15). At such heritabilities, the power of detection is small and the effects are estimated poorly. Since MAS can only be efficient if the number of detected QTLs is high enough, there is an optimal heritability that varies slightly with m2 but is around 0.15–0.2 for N = 300 and five QTLs of equal effects in the model.
Figure 1. Domains of RE of MAS. Efficiency of MAS is compared with selection based solely on phenotype for different heritabilities of the trait (ordinate) and marker-QTL recombination rates (abscissa). The number of markers is 30. Three sizes of population (N) are considered: N = 100 (graphs a and b), N = 300 (graphs c and d), and N = 500 (graphs e and f). Two genetic models are compared: five QTLs of equal r2q (graphs a, c, and e) and 10 QTLs of equal r2q (graphs b, d, and f).
Figure 2. Effect of heritability and genetic model on RE of MAS. The RE of MAS (ordinate) compared with selection based solely on phenotype is given for different heritabilities (abscissa) of the trait and four genetic models. For the four models, the number of QTLs in each class of effect and the percentage of variance accounted for by each QTL of the class are given. The size of the population was fixed at 300, the percentage of genetic variance accounted for by markers is 0.7, and the type I level risk is 5%. See text for model descriptions.
The RE of MAS for given N and α values decreases as the number of QTLs increases (see the comparison between five QTLs vs. 10 QTLs of equal effects in Figure 1). When there are many QTLs, the individual effect (r2q) of a given QTL is small, so the power of detection becomes low. The RE also depends on the distribution of the QTL effects.
The REs obtained with the same number of QTLs but three different QTL effect distributions are compared in Figure 2: (1) 10 QTLs, all with the same r2q of 10%, (2) 10 QTLs with three having an individual r2q of 15.37% and seven with an r2q of 7.7%, and (3) 10 QTLs with one QTL having an r2q of 33.3%, three QTLs with an r2q of 16.8%, and six QTLs with an r2q of 2.7%. Model 3 has been chosen to be close to a geometric distribution, as used by Lande and Thompson (1990), corresponding to an effective number nE of five QTLs. The RE with five QTLs of equal effects is also presented in Figure 2 to be compared with the RE obtained with model 3. It appears (cf. Figure 2 with N = 300, α = 5%, and m2 = 0.7) that distribution 3 leads to a better efficiency of MAS than distribution 2, which is slightly better than distribution 1 when the heritability is <0.3. Thus, for a given number of QTLs, equal effects lead to a lower RE of MAS compared with an “L” distribution or geometric distribution, as assumed by Lande and Thompson (1990) or Gimelfarb and Lande (1994a,b). In distribution 3, one QTL explains a large part of the genetic variance. The power of detection of such a QTL is high enough for it to be detected even if the heritability is low. Hence, compared with a distribution where all the QTLs have the same effect, the RE of MAS is higher for small heritabilities. Conversely, when the heritability is high, the third model leads to a slightly smaller RE of MAS than the other models. This is because the minor QTLs in model 3 have a lower r2q than the QTLs of the other models and, therefore, a lower power of detection. It appears that the RE obtained with five QTLs of equal effects is close to that obtained with distribution 3. Also, the RE obtained with 10 QTLs of equal effects is comparable to the RE obtained with 15 QTLs and a distribution of effects close to a geometric series with nE = 10 (result not presented).
This suggests that the RE obtained with a geometric distribution can be approximated by using a number of QTLs equal to the effective number and by considering that all these QTLs have an equal effect. Nevertheless, by doing so, the RE of MAS is slightly underestimated when Nh2m2 is low and slightly overestimated in the other cases (see Table 3).
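The effective number nE quoted for the three 10-QTL models can be checked with a short calculation. The text does not give the formula used, so the sketch below assumes one common definition, nE = (Σ v_i)^2 / Σ v_i^2, where v_i is the share of variance explained by QTL i. With the percentages listed above, this definition gives a value close to five for model 3, which is consistent with the text but does not prove that it is the definition the authors used. The names are hypothetical.

```python
def effective_number(variance_shares):
    """Assumed definition: (sum of shares)^2 / (sum of squared shares)."""
    total = sum(variance_shares)
    sum_sq = sum(v * v for v in variance_shares)
    if sum_sq == 0.0:
        raise ValueError("at least one share must be non-zero")
    return total * total / sum_sq


# Percentages of variance per QTL (r2q), as listed for the three models.
MODEL_1 = [10.0] * 10
MODEL_2 = [15.37] * 3 + [7.7] * 7
MODEL_3 = [33.3] + [16.8] * 3 + [2.7] * 6

if __name__ == "__main__":
    for name, shares in (("model 1", MODEL_1),
                         ("model 2", MODEL_2),
                         ("model 3", MODEL_3)):
        print(f"{name}: {len(shares)} QTLs, nE = {effective_number(shares):.2f}")
```

Under this definition, equal effects give nE equal to the number of QTLs, and the more unequal the effects, the smaller nE becomes.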
[Table 3: Effect of different parameters on the relative efficiency of MAS]
Influence of experimental parameters (N, m2, number of markers, and α) on RE: Figure 1 shows that N is an important parameter. As expected, the RE of MAS increases with N. When N is large, the power of detection and the accuracy of the estimation of the marker-associated effect are increased. Therefore, MAS seems to be worthwhile only for populations of >100 or 200 individuals. Increasing N is more important when the trait is controlled by a high number of QTLs; the efficiency domains with N = 300 and l = 5 are very close to those obtained with N = 600 and l = 10 (results are not presented) if we consider in both cases an equal percentage of markers linked with a real QTL (i.e., Nm = 30 for l = 5 and Nm = 60 for l = 10). This can be related to the fact that if each marker accounts for a small part of the phenotypic variance (i.e., if m2h2/l << 1), the noncentrality parameter of the F distribution is close to Nh2m2/l. Thus, with a given percentage of noninformative markers and a given probability of type I error, the noncentrality parameter appears to be the key parameter that explains the relative efficiency of MAS. This is no longer true for the same total number of markers because there are more noninformative markers when l = 5 than when l = 10. This leads to a lower relative efficiency when l = 5 and N = 300 than when l = 10 and N = 600 because of a higher risk of false QTL detections. Nevertheless, the efficiency domains obtained with a given population size, N, and a given number of QTLs, l, are approximately equal to the efficiency domains that would be obtained with 2l QTLs of equal effect and a population size of 2N.
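The equivalence between (N, l) and (2N, 2l) follows directly from the approximate noncentrality parameter Nh2m2/l. A minimal sketch, with hypothetical function names and h2 = 0.2, m2 = 0.7 chosen only as example values:

```python
import math


def per_marker_variance_share(h2, m2, n_qtl):
    """Share of phenotypic variance carried by one linked marker: m2 * h2 / l."""
    return m2 * h2 / n_qtl


def approx_noncentrality(n, h2, m2, n_qtl):
    """Approximation N * h2 * m2 / l, valid when m2 * h2 / l is much less than 1."""
    return n * per_marker_variance_share(h2, m2, n_qtl)


if __name__ == "__main__":
    for n, n_qtl in ((300, 5), (600, 10), (300, 10), (100, 5)):
        share = per_marker_variance_share(0.2, 0.7, n_qtl)
        ncp = approx_noncentrality(n, 0.2, 0.7, n_qtl)
        print(f"N={n:3d}  l={n_qtl:2d}  share per marker={share:.3f}  "
              f"noncentrality~{ncp:.1f}")
```

Doubling both N and l leaves the approximate noncentrality parameter, and therefore the power to detect each linked marker, unchanged, whereas doubling l alone halves it.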
Figure 1 shows that the RE increases as the distance between markers and QTLs decreases. Obviously, if the markers are the QTLs themselves (m2 = 1 or r = 0), selecting on markers is equivalent to selecting on QTLs. The marker-QTL distance in our model cannot be directly interpreted as a density of markers in a real genetic map because, in the latter case, markers that are linked on the same chromosome are not independent. If the marker density is not too high, then it can be assumed that the correlations between linked markers are not strong enough to greatly modify the results, and r can be roughly related to the density of the markers. If we assume the absence of interference, for the DH population, r can be connected with the distance d between markers and QTLs by the mapping function of Haldane (1919). If we suppose a uniform distribution of QTL positions on chromosomes, the expected distance between a QTL and the nearest marker is one-fourth of the distance between adjacent markers. On our graphs r = 30% could therefore be roughly interpreted as one marker every 180 cM. The realistic domains of the graphs then correspond to a recombination rate between marker-QTL, r, <15% (which corresponds to 4d < 72 cM and m2 > 0.49).
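The conversion between recombination rate and map distance used above can be reproduced with Haldane's mapping function, r = (1 − exp(−2d))/2 with d in Morgans. The sketch below works in centimorgans and applies the factor of four between the expected QTL-to-nearest-marker distance and the marker spacing stated in the text. The function names are hypothetical.

```python
import math


def haldane_r_from_cm(d_cm):
    """Recombination rate for a map distance in cM (no interference)."""
    if d_cm < 0.0:
        raise ValueError("distance must be non-negative")
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


def haldane_cm_from_r(r):
    """Map distance in cM for a recombination rate 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def marker_spacing_cm(r):
    """Marker spacing whose expected QTL-to-nearest-marker distance gives r."""
    return 4.0 * haldane_cm_from_r(r)


if __name__ == "__main__":
    for r in (0.05, 0.15, 0.30):
        print(f"r={r:.2f}  d={haldane_cm_from_r(r):.1f} cM  "
              f"spacing~{marker_spacing_cm(r):.0f} cM")
```

With these functions, r = 30% corresponds to a marker spacing of a little over 180 cM and r = 15% to a spacing of about 71 cM, in line with the rounded figures given in the text.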
Figure 3. Effect of heritability and type I risk level on RE of MAS. The relative efficiency of MAS (ordinate) compared with selection based solely on phenotype is given for different heritabilities (abscissa) of the trait and different type I error risks used in the detection of marker-QTL associations. The size of the population is 300, the percentage of genetic variance explained by markers (m2) is 0.7, and five QTLs of equal effects are included in the model.
Increasing α is also a way to increase the power of detection, but it results in an increased risk of detecting false QTLs. This parameter therefore has contradictory effects on RE. Figure 3 (with m2 = 0.7 and N = 300) shows that the effect of α depends on the heritability. When the heritability is low, increasing α leads to a better RE. It is the opposite when the heritability is high. Thus, for low heritabilities, the gain in power of detection largely compensates for the risk of false detections. As a consequence, the heritability optimum observed with α = 5% disappears for higher α values. Table 3, however, shows that the effect of α also depends on the population size and m2. If m2 is low (e.g., 0.5) and if the population size is small (N = 100), RE decreases as α increases. In this situation, even when the heritability is low, the markers that are truly associated with QTLs account for only a minor part of the genetic variation, and the gain in the power of detection does not compensate for the risk of false detections. When the size of the population is large (>500) and m2 is high, the relative increase of RE with α tends to be smaller because the power of detection is high enough, even with low α. In true experiments, m2 should generally be >0.5 because of the marker density now available for many species, and the size of the population will often be <500 because of resource limitations; thus, the choice of α may be important to maximize the gain of MAS.
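The cost side of this trade-off is easy to quantify in the model used here: each of the Nm − l markers unlinked to any QTL is declared significant with probability α, so the expected number of false detections is α(Nm − l). The sketch below tabulates it for the α values of the numerical applications, with Nm = 30 and l = 5; the function name is hypothetical, and the gain in power for true QTLs is not computed.

```python
import math


def expected_false_detections(alpha, n_markers, n_qtl):
    """Expected number of unlinked markers passing a test at level alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if n_qtl > n_markers:
        raise ValueError("n_qtl cannot exceed n_markers")
    return alpha * (n_markers - n_qtl)


if __name__ == "__main__":
    for alpha in (0.01, 0.05, 0.10, 0.20, 0.30):
        n_false = expected_false_detections(alpha, 30, 5)
        print(f"alpha={alpha:.2f}  expected false detections={n_false:.2f}")
```

At α = 20%, the expected number of false detections equals the number of true QTLs in this example, which helps explain why a high α is only favorable when the gain in power for the true QTLs is large.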
DISCUSSION
One of the aspects of this study is to take into account the bias caused by the selection of the markers included in the index. This bias affects the estimation of the weight coefficients of the selection index and also the estimation of the additive effect of each marker. In true experiments, it is difficult to avoid this bias. Gimelfarb and Lande (1994a) proposed to evaluate it by simulating a population of individuals with phenotypes controlled exclusively by the environment, i.e., with no QTL. This method, however, only considers the bias on the estimated percentage of variance that is accounted for by false positives and the bias that could be corrected by using the adjusted R-square. The overestimation of the true QTL parameters is not taken into account. Hence, the authors found that the correction using their method is negligible when the population size is >200, but their method does not correct all the bias. With our approach, we considered the bias caused by the overestimation of the QTL parameters, and we showed that it has an important effect when the power of QTL detection is small (Nm2h2 low) even if the population size is >200. By means of simulations, Beavis (1994) also found that the effects associated with detected QTLs are more overestimated when the population size is small, and that this bias cannot be neglected even for a population of 500 individuals. Of course, it would be most useful to have an estimation of this bias to correct the estimated values, but this requires further investigation.
It was shown that the RE of MAS depends on the genetics of the trait of interest, experimental characteristics, and options concerning data analyses. Lande and Thompson (1990) did not discuss the effect of the number of QTLs but, as in our study, their RE of MAS depends on the QTL effect distribution: MAS is more efficient for small nE, i.e., when a few QTLs explain a major part of the variation of the trait. Our results suggest that considering l QTLs of equal effects may give a good approximation of the RE that would be obtained with a more realistic geometric distribution of effects corresponding to an nE of l. The problem is to determine how many QTLs are involved in the variability of the quantitative traits of interest and how their effects are distributed. Beavis (1994) showed that the number of QTLs detected in a given experiment provides only an underestimation of the true number of QTLs, which may be >20 for most traits. In this study, we consider only a small number of QTLs (<20). The effect of a higher number of QTLs could have been investigated, but with many QTLs, it is no longer realistic to consider them as independent. Without this assumption of independence, the calculations are much more complex. It is known that linkage between two QTLs could result in the detection of a “ghost” QTL between them, which would decrease the efficiency of MAS when compared with that obtained with the same number of QTLs assumed to be independent. The number of QTLs considered in this study can roughly be seen as the number of independent chromosomal areas that are involved in the variability of the trait.
We demonstrated that even if the RE is generally higher for low heritabilities, when the population size is finite, there is an optimal heritability below which the RE decreases. This was mentioned by Lande and Thompson (1990) in their discussion and was confirmed by Hospital et al. (1997) by means of simulations. When the population is finite but large (>500), however, the optimal heritability is close to zero, and this is of little practical impact.
We showed that RE increases with population size. Lande and Thompson (1990) studied the effect of population size on the estimated percentage of genetic variance accounted for by the markers significantly associated with a QTL. Their results, however, indicated that the population size below which MAS is not efficient is much higher than the one we found. Lande and Thompson (1990) considered that a marker is always included in the index if the expectation of the estimated variance accounted for by this marker is greater than or equal to a threshold value corresponding to a type I error risk of 1%; conversely, if the expected value of the estimator is under the threshold value, the marker is considered to be never included in the index. Yet, in actual experiments, such a marginal marker may have a very low but not zero power of detection. Hence, when the population size is small, Lande and Thompson (1990) underestimate the expected percentage of variance associated with markers and, thus, the RE of MAS because in this situation no marker is considered “possibly detected.” Our analysis shows that even if MAS is more efficient for large population sizes, it is still efficient for populations of ~200 individuals, which are more realistic than the large population sizes indicated by Lande and Thompson (1990).
We pointed out that MAS is more efficient when the distance between markers and QTLs is small. Such a result was also observed by Edwards and Page (1994) when only one marker was linked to each QTL, as in our model. This effect was reduced when they used a model with two markers linked to each QTL. Moreover, Gimelfarb and Lande (1994a,b, 1995) showed that with many markers on each chromosome there is an optimal density of markers, above which the RE decreases. According to their results, it seems that the effect of the marker-QTL distance depends mostly on the number of markers on each chromosome. If there are numerous markers on each chromosome, the flanking markers of each QTL give complementary information and the distance between them is not the deciding factor—until a certain limit. The results we obtained by considering that markers are close to the QTLs may be a good approximation of the RE obtained with two flanking markers further apart.
In our approach, marker-QTL associations are supposed to be detected by a simple regression. It is now acknowledged that the methods based on the simultaneous use of numerous markers (nearby markers or markers linked to other QTLs) can improve the power of detection of a given marker-QTL association and the precision of the estimated QTL effects (see Lander and Botstein 1989; Zeng 1994; Jansen and Stam 1994). Lande and Thompson (1990) proposed a stepwise regression that was used by Gimelfarb and Lande (1994a,b, 1995) and Hospital et al. (1997). Obviously, the methods leading to the best power of detection should be used and could lead to better results than the simple method investigated here. For all the possible methods, a stopping rule for the introduction of markers into the index must be defined. In our approach, we studied the effect of the type I error risk, and we found that it must be adapted according to the heritability. If the heritability is low (<0.3), it is better to use a larger type I error risk. It is the reverse at high heritabilities, but the effect of type I error risk is then expected to be rather small. Thus, it is generally best to avoid using a low type I error risk, except when the heritability is high and estimated accurately. In our approach, we did not study the effect of including too many parameters in the model. A related problem was investigated by Gimelfarb and Lande (1994a, 1995), who concluded that there is an optimal number of markers to be included in the index.
In spite of the assumptions made, our analytical results are consistent with results of simulations conducted in this study, justifying the statistical approach. Moreover, our results are also generally consistent with the results obtained by means of simulations by other authors using more sophisticated biological models.
The problem of MAS profitability: Our results show that MAS can be more efficient than selection based only on phenotype in a large range of situations, as long as the size of the population is at least 200, the heritability of the trait is between 0.05 and 0.5, and the markers are relatively close to the QTLs. Although the expected RE of MAS is higher for low heritabilities (0.1–0.2), our simulations show that the frequency of experiments that lead to a lower genetic gain with MAS than with phenotypic selection (RE < 1) is also higher in that range (e.g., with five QTLs of equal effects, N = 300, m2 = 0.5, α = 5%, and h2 taking the values 0.15, 0.3, and 0.45, the numbers of simulations in which RE < 1 are, respectively, 13, 9, and 5). This was studied in more detail by Hospital et al. (1997) by using a larger number of simulations and a more complex genetic model. This element reinforces the interest of MAS for medium heritabilities (0.3–0.4).
Nevertheless, even if MAS is more efficient than phenotypic selection, this method is expensive because of the genotyping. Doing several replications of each genotype or using complementary information coming from relatives can be a way to increase the heritability and improve the efficiency of phenotypic selection. Situations can be found where additional replications may be less expensive than obtaining marker data and yet lead to the same genetic advance. For instance, with N = 300, m2 = 0.8, and 10 QTLs of equal effects, the maximum RE of MAS is obtained for an h2 of 0.15 and is ~1.54. In this case, phenotypic selection can provide the same genetic gain as MAS by using three replications of each genotype instead of one. If the genotyping is more expensive than adding two extra replications (which is presently the case for most traits and most markers), MAS is not profitable. New marker techniques based on PCR can reduce the cost of MAS. Our analytical approach can be used to predict the expected genetic gain for a large range of parameters and then to define the best strategy of allocating resources for a given cost of genotyping. This point is currently being investigated in our laboratory.
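The replication argument above can be checked with a short calculation. The sketch below is not taken from the article: it assumes that the entire residual variance is averaged out by replication (so that the heritability of a mean of r replications is h2 / (h2 + (1 − h2)/r)) and that the gain of phenotypic selection is proportional to the square root of heritability at fixed selection intensity and genetic variance. The function names are hypothetical.

```python
import math


def heritability_with_replications(h2: float, r: int) -> float:
    """Heritability of the mean of r replications.

    Assumption (not from the article): the whole residual variance is
    divided by r, i.e. no residual component is shared between replications.
    """
    return h2 / (h2 + (1.0 - h2) / r)


def phenotypic_gain_ratio(h2: float, r: int) -> float:
    """Gain with r replications relative to a single replication.

    Assumes equal selection intensity and genetic variance, so that the
    gain is proportional to the square root of heritability.
    """
    return math.sqrt(heritability_with_replications(h2, r) / h2)


if __name__ == "__main__":
    # h2 = 0.15 is the heritability quoted in the text for the maximum RE.
    for r in (1, 2, 3, 4):
        print(r,
              round(heritability_with_replications(0.15, r), 3),
              round(phenotypic_gain_ratio(0.15, r), 2))
```

Under these assumptions, three replications raise the heritability from 0.15 to about 0.35 and multiply the phenotypic gain by about 1.52, which is close to the RE of ~1.54 quoted for MAS. If part of the residual is genetic (nonadditive) and therefore not reduced by replication, more replications would be needed.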
This problem of cost is complex because even in situations where MAS is not more profitable than other methods of selection in the first generation of selection, it allows one to select the individuals based on their marker types in the second generation. There is no need to evaluate their phenotype and no need to determine all the marker types; it is only necessary to evaluate the genotype at markers linked to QTLs. If the selection can be performed on immature individuals, the second selection cycle can be achieved in a single generation. This possibility of adding one selection cycle based only on markers was not considered in our analytical approach, but simulations (Hospital et al. 1997) indicate that this may be a very efficient strategy in terms of genetic gain per time unit, even for high heritabilities, provided that there is a close linkage between markers and QTLs. If the heritability is high, the power of detection of QTL effects is high, and markers bring nearly the same information as the phenotype, but in the second cycle, the selection on markers only is faster.
Acknowledgments
We thank the anonymous reviewers for their helpful comments on an earlier version of this article.
APPENDIX A
The expected genetic gain of MAS is defined by Equation 3 in the text. The phenotype Pi of the individual i can be written as the sum of its additive genetic value, Ai, and a residual term, Ei, which includes an environmental error term and a genetic term corresponding to nonadditive effects: Pi = Ai + Ei.
The genetic value of the offspring of i is equal to ½Ai. In the case of a population of finite size, if the selection is made on the two sexes, it follows that
The parameter λ depends only on the population type: λ = 1 for HD or RIL; λ = 0.5 for F2 and λ = 1/4 for BC. The variance of the molecular score is computed as
APPENDIX B
For a given experiment, the association between a given marker and a QTL is tested with a Fisher test. If there is no QTL near the marker, then the statistic follows a central F distribution. The probability of declaring the effect significant (false positive) is then equal to the type I error risk α. If there is a QTL near the marker, the statistic follows a noncentral F distribution. The probability of detecting it is the power of detection. Following the approach described by Soller and Brody (1976) and Knapp and Bridges (1990), the power is defined as the probability Pr[FC−1,N−1,ϕ > Fcritα,C−1,N−1,0] where C − 1 is the number of independent estimated parameters (C − 1 = 1 for the simple regression), Fcritα,C−1,N−1,0 is the critical value from a central F distribution used to test QTL effect with a type I error of α, and FC−1,N−1,ϕ is a random variable from a noncentral F distribution (ϕ ≠ 0). Following Charcosset and Gallais (1996), the noncentrality parameter is ϕ = (N − 1)
Because all the markers are independent, among the lj markers truly linked to QTLs having the same effect (aj), we consider that the number of markers detected to be associated with a QTL follows a binomial law B(pj, lj), where pj is the power of detection. There are as many different binomial laws as the number of different QTL effects (z) included in the model. In the same way, among the markers unlinked with QTLs, the number of markers erroneously detected to be associated with a QTL follows a B(α, Nm-l) binomial law, where
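The power and the two binomial laws described in this appendix can be illustrated numerically. The sketch below is an approximation, not the exact calculation used in the article: for the simple regression (one numerator degree of freedom) and a large N, the F test is treated as a two-sided normal test. The argument `delta` is assumed to be the square root of the noncentrality parameter under the usual convention, which may differ from the scaling of ϕ used here. All function names and the numerical values in the example are hypothetical.

```python
import math
from statistics import NormalDist


def approx_power(delta: float, alpha: float) -> float:
    """Large-sample approximation of the power of a 1-d.f. F test.

    delta: standardized effect (assumed to be the square root of the
    noncentrality parameter); delta = 0 corresponds to an unlinked marker.
    """
    z = NormalDist()
    crit = z.inv_cdf(1.0 - alpha / 2.0)
    # Probability that |Z + delta| exceeds the two-sided critical value.
    return (1.0 - z.cdf(crit - delta)) + z.cdf(-crit - delta)


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k detections among n independent markers."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def expected_detections(l_j: int, p_j: float, n_unlinked: int, alpha: float):
    """Expected numbers of true and false positives (binomial means)."""
    return l_j * p_j, n_unlinked * alpha


if __name__ == "__main__":
    alpha = 0.05
    p_j = approx_power(delta=2.0, alpha=alpha)  # illustrative effect size
    print("approximate power:", round(p_j, 3))
    # Illustrative design: 5 linked markers and 45 unlinked markers.
    print("expected (true, false) positives:",
          expected_detections(5, p_j, 45, alpha))
    print("P(all 5 linked markers detected):",
          round(binomial_pmf(5, 5, p_j), 4))
```

With `delta = 0` the function returns α, which matches the statement that an unlinked marker is declared significant with probability equal to the type I error risk.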
In Equation 7, three terms depend on estimations coming from the experiment
To test the presence of a QTL near a marker, Fisher's test is performed. The value of the statistic, F, is connected with the estimated effects associated with markers
The value of the estimated effect at a given marker varies from one experiment to the other but its expected value over all the possible experiments equals the true effect because the estimator is unbiased. Nevertheless, only markers with a significant effect are introduced in the selection index. As mentioned by Lande and Thompson (1990), this selection leads to a bias because the F statistic at each marker is always bigger than the critical value. To avoid this bias, Lande and Thompson (1990) proposed to detect the marker-QTL associations from one sample and to evaluate the effects accounted for by the selected markers in another sample that is independent from the first one and where the selection is conducted. This solution is not realistic in most situations because it implies genotyping a lot of individuals but using only half of them for QTL detection, or using a previous generation to choose the markers. Here we only consider the RE of MAS in the first generation of selection. Because it is difficult to avoid this bias, we take it into account when evaluating the expected relative efficiency of MAS by considering that estimated effects associated with markers depend on a “truncated” F distribution.
Because the expected effect associated with a marker, considering the truncated F distribution, is not null, we can associate an expected effect with false-positive markers. This expected effect becomes all the more important as α becomes smaller. For the same reason, the effect of a marker that is truly linked to a QTL is overestimated. The overestimation becomes all the bigger as the power of detection (and α) becomes smaller.
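The bias caused by retaining only significant markers can be illustrated with a small Monte Carlo sketch. It is a simplification, not the truncated F calculation of this appendix: each experiment is reduced to a single standardized estimate drawn from a normal distribution with mean `delta` and unit variance (a large-sample stand-in for the regression estimate divided by its standard error), and only estimates exceeding the two-sided critical value are kept. The function name, seed, and parameter values are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def mean_significant_effect(delta: float, alpha: float,
                            n_experiments: int = 20000, seed: int = 1):
    """Mean absolute standardized estimate among significant experiments.

    Returns (conditional mean, fraction of experiments declared significant).
    delta = 0 mimics a marker unlinked to any QTL (false positives only).
    """
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    kept = []
    for _ in range(n_experiments):
        z_hat = rng.gauss(delta, 1.0)   # estimate from one experiment
        if abs(z_hat) > crit:           # marker enters the index
            kept.append(abs(z_hat))
    if not kept:
        return float("nan"), 0.0
    return fmean(kept), len(kept) / n_experiments


if __name__ == "__main__":
    for alpha in (0.05, 0.01):
        for delta in (0.0, 1.0, 3.0):
            mean_kept, frac = mean_significant_effect(delta, alpha)
            print(f"alpha={alpha} delta={delta} "
                  f"detected={frac:.3f} mean|estimate|={mean_kept:.2f}")
```

In this simplified setting the conditional mean for `delta = 0` is well above zero and increases when α decreases, and the excess of the conditional mean over `delta` is larger for a weakly detected effect than for a strongly detected one, in line with the two statements above.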
Footnotes
- Communicating editor: B. S. Weir
- Received January 21, 1997.
- Accepted November 12, 1997.
- Copyright © 1998 by the Genetics Society of America