Genetics, Vol. 155, 391-405, May 2000, Copyright © 2000

Quantitative Trait Loci Mapping in F2 Crosses Between Outbred Lines

Miguel Pérez-Enciso (a) and Luis Varona (a)
(a) Centre UdL-IRTA, Area de Producció Animal, 25198 Lleida, Spain

Corresponding author: Miguel Pérez-Enciso, Station d'Amélioration Génétique des Animaux, INRA, BP 27, 31326 Castanet-Tolosan Cedex, France. E-mail: mperez{at}toulouse.inra.fr

Communicating editor: C. HALEY


*  ABSTRACT

We develop a mixed-model approach for QTL analysis in crosses between outbred lines that allows for QTL segregation within lines as well as for differences in mean QTL effects between lines. We also propose a method called "segment mapping" that is based on partitioning the genome into a series of segments. The expected change in mean according to percentage of breed origin, together with the genetic variance associated with each segment, is estimated using maximum likelihood. The method also allows the estimation of differences in additive variances between the parental lines. Completely fixed, random, and mixed models together with segment mapping are compared via simulation. The behavior of segment mapping and of the mixed model is similar to that of classical methods, either the fixed or random models, under simple genetic models (a single QTL with alternative alleles fixed in each line), whereas they provide less biased estimates and have higher power than fixed or random models in more complex situations, i.e., when the QTL are segregating within the parental lines. The segment mapping approach is particularly useful for determining which chromosome regions are likely to contain QTL when these are linked.


QUANTITATIVE traits arise from the joint action of the environment and multiple genes, usually called quantitative trait loci (QTL). The wide availability of DNA markers scattered along the genome, together with recently developed statistical methods, has spurred a massive search for QTL in any species of interest. Crosses between highly divergent lines are a powerful experimental design for this purpose (LYNCH and WALSH 1998). The optimum situation in an F2 design occurs when all genes affecting the trait of interest are diallelic with the alternative alleles fixed in each parental line. Although highly inbred lines that may fulfill this condition have been developed in annual plant species and some lab animals, outbred parental populations are normally the only genetic material available in domestic animals (e.g., ANDERSSON et al. 1994) or trees (e.g., GRATTAPAGLIA et al. 1995), as well as in allogamous wild species (e.g., HUNT et al. 1998). The QTL analysis of crosses between outbred populations poses two main statistical problems (reviews in BOVENHUIS et al. 1997; HOESCHELE et al. 1997; ELSEN et al. 1999). The first one concerns the validity of the genetic model assumed in the analysis. The second one is related to accounting for the variation in the rest of the genome when fitting a QTL model at a particular position.

The usual model for analyzing F2 crosses (LANDER and BOTSTEIN 1989; HALEY and KNOTT 1992) is based on estimating the QTL effect from the phenotypic differences between individuals according to the estimated percentage of breed origin at a given position, assuming that alternative alleles are fixed in each parental line. We call this model the fixed model. Yet, the fact that heritability for a given trait is nonzero, as in most outbred lines, implies that there exists additive variation within lines and thus not all alleles affecting the trait can be fixed. There are also methods that allow for QTL segregation where the QTL effect is modeled as a normally distributed random variable with mean zero and variance to be estimated. This is the random model. The random model strategy has been put forward by several authors in the context of the analysis of outbred populations (FERNANDO and GROSSMAN 1989; GOLDGAR 1990; XU and ATCHLEY 1995; GRIGNOLA et al. 1996). The QTL variance is estimated by assessing the degree of phenotypic similarity between relatives according to the probability of sharing identical by descent alleles at specified positions. But the random model does not seem appropriate for the analysis of F2 crosses because current implementations make no distinction between alleles according to breed origin. A strategy similar to the random model is the within-family analysis, where each family (e.g., descendants of each sire) is analyzed separately and the results pooled (e.g., KNOTT et al. 1996). However, this approach will tend to have small power when the family size and the QTL effect decrease.

A mixed-model approach that accounts for variation both between and within lines is thus the most appropriate strategy for analyzing F2 crosses between outbred lines. GODDARD (1992) proposed a QTL mixed-model strategy for genetic evaluation that can potentially be applied to crosses between outbred lines, but marker information is used only to model covariances between QTL effects, not means, and the method does not account for differences in means and heritabilities between breeds in the genetic covariance matrix of crossed individuals; it is also assumed that marker phases are known in constructing the relationship matrix. LO et al. (1993) developed the covariance between relatives in crosses between outbred populations for a number of unlinked loci and without marker information, whereas WANG et al. (1998) studied the case of a single marker and a QTL in a genetic evaluation context.

The problem of accounting for the genetic variation in the rest of the genome has been addressed by proposing the use of cofactors ("composite interval mapping"; JANSEN 1993; ZENG 1993), but it would be desirable to have a methodology that addresses the issue more generally. Other authors have included a polygenic effect in addition to the fixed QTL effect (e.g., FERNANDO and GROSSMAN 1989), but this does not allow for the fact that not all the genome contributes equally to the genetic variation and implies that this polygenic component is unlinked to the QTL of interest.

In this work we derive the genetic covariance matrix in crosses between outbred lines allowing for any number of linked markers and QTL, thus permitting a general QTL analysis of F2 crosses. This mixed model allows for more flexible genetic models than current strategies. We also propose a method, "segment mapping," aimed at accounting for the variation in the whole genome simultaneously. The method also allows us to test genetic variance differences between breeds. A simulation study is carried out to compare the performance of segment mapping and mixed model mapping with classical methods, i.e., a genome scan using fixed or random models.


*  THEORY

The breeding value of an individual is, by definition, twice the average performance of an infinite number of its offspring when mated to a random sample of spouses from the same population. The starting point is the assumption that the breeding values (g) of two outbred populations A and B are normally distributed, $g_A \sim N(\mu + \Delta/2, \sigma^2_A)$ and $g_B \sim N(\mu - \Delta/2, \sigma^2_B)$, respectively. The phenotypic difference between breeds for the trait of interest is thus $\Delta$. Genetic variation within breeds is assumed to be caused by an indeterminate number of loci in genetic equilibrium with additive action. Further, consider that the whole genome is divided into $n_{seg}$ segments and that a vector containing the additive genetic values from the population of breed A can be expressed as $g_A = \sum_{s=1}^{n_{seg}} g_{A,s}$, where $g_{A,s}$ is the contribution of segment s to total breeding value, and $\mathrm{Var}(g_A) = \sum_{s=1}^{n_{seg}} \mathrm{Var}(g_{A,s}) = \sum_{s=1}^{n_{seg}} G_{A,s}$ because of linkage equilibrium. In the absence of molecular information, $\mathrm{Var}(g_A)$ is the well-known additive relationship matrix and $G_{A,s}$ is the same for all segments (weighted by the segment's length). However, the availability of marker information makes it possible to compute the probabilities of identity by descent at particular positions of interest (e.g., FERNANDO and GROSSMAN 1989).

The goal of the approach presented here is to estimate, conditional on marker information, the contribution of each segment to total genetic variance/covariance between the F2 individuals and to ascertain the expected phenotypic mean of individuals according to the percentage of breed origin in each particular segment. A reasonable strategy would be to include loci of similar effect in the same segment but the theory developed is valid for any partition strategy.

Assume that trait performance has been recorded in an F2 cross population derived from breeds A and B and that parental, F1, and F2 individuals have been genotyped for a series of markers. A general explanatory model of the F2 records is

$$y = Xb + Zg + e \qquad (1)$$

where y is a N x 1 vector containing the F2 phenotypes, X and Z are incidence matrices relating observations to the vector of fixed effects (b) and additive genetic values (g), respectively, and e contains the residuals. In the following we refer only to breeding values in the F2 population and thus the subscript is omitted for brevity. The distribution of the random variables in (1) is

$$g \sim N(Q\Delta,\, G), \qquad e \sim N(0,\, R), \qquad y \sim N(Xb + ZQ\Delta,\, V) \qquad (2)$$

where $V = ZGZ' + R$, G is the genetic covariance matrix conditional on marker information as specified below, $R = I\sigma^2_e$, I being an identity matrix and $\sigma^2_e$ the residual variance, Q is an $N \times n_{seg}$ matrix with elements $q_{i,s} = \frac{1}{2}\sum_{h=1}^{2}(\phi_{hi,s} - \frac{1}{2})$, $\phi_{hi,s}$ is the average probability of segment s from individual i and haplotype h being of breed origin A, and $\Delta = \{\Delta_s,\ s = 1, \ldots, n_{seg}\}$, i.e., a vector containing the average differences between individuals carrying an A breed origin segment s minus those carrying a B origin segment. Further, $\mathrm{Var}(g) = G = \sum_{s=1}^{n_{seg}} G_s$, assuming linkage equilibrium in the parental populations and that markers are informative (see the Appendix). Otherwise, the $g_s$ from different within-chromosome segments will be correlated. The matrix $G_s$ contains elements $\mathrm{Var}(g_{i,s})$ in the diagonal and $\mathrm{Cov}(g_{i,s}, g_{i',s})$ in the off-diagonal. It is shown in the Appendix that the variance of breeding values of F2 individuals, conditional on marker information, is approximately

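$$\mathrm{Var}(g_{i,s}) \approx \frac{1}{2}\sum_{h=1}^{2}\left[\phi_{hi,s}\,\sigma^2_{A,s} + (1 - \phi_{hi,s})\,\sigma^2_{B,s}\right] \qquad (3)$$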

where $\sigma^2_{A,s}$ and $\sigma^2_{B,s}$ are the genetic variances contributed by segment s within parental populations A and B, respectively. Thus, the genetic variance of F2 individuals, conditional on marker information, is a weighted average of the genetic variances in the pure breeds. It is important to realize that the segregation variance (WRIGHT 1968) can be neglected in (3) because the expression above is the genetic variance conditional on marker information. Equation 3 would be exact if the breed origin along the whole genome could be identified without error. Suppose that a subset of individuals with its whole genome of origin A could be identified in an infinitely large F2 population; the genetic variance of these individuals would be $\sigma^2_A$, exactly that of the founder breed A. The additive genetic covariance between F2 individuals is

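$$\mathrm{Cov}(g_{i,s}, g_{i',s}) = \frac{1}{2}\sum_{h=1}^{2}\left[\rho^{h}_{A(i,i'),s}\,\sigma^2_{A,s} + \rho^{h}_{B(i,i'),s}\,\sigma^2_{B,s}\right] \qquad (4)$$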

where $\rho^{h}_{A(i,i'),s}$ ($\rho^{h}_{B(i,i'),s}$) is the probability of individuals i and i' having identical by descent alleles of breed origin A (B) at segment s and haplotype h. Equation 4 shows that two individuals can share alleles identical by descent of breed origin A or B and that the total genetic covariance is a weighted average of both probabilities.
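As a rough illustration of how the elements of $G_s$ could be filled in, the sketch below computes a diagonal and an off-diagonal term for one segment. The display formulas are not legible in this copy, so the weighted-average forms used here (one half of the sum over the two haplotypes) are an assumption inferred from the surrounding text; the function names are hypothetical.

```python
def segment_variance(phi, var_a, var_b):
    """Diagonal element of G_s for one individual.

    phi   -- pair of probabilities that each haplotype is of breed-A origin
    var_a -- genetic variance contributed by the segment within breed A
    var_b -- genetic variance contributed by the segment within breed B
    Assumed form: average over haplotypes of phi*var_a + (1 - phi)*var_b.
    """
    return 0.5 * sum(p * var_a + (1.0 - p) * var_b for p in phi)


def segment_covariance(rho_a, rho_b, var_a, var_b):
    """Off-diagonal element of G_s for a pair of individuals.

    rho_a, rho_b -- per-haplotype probabilities of sharing identical by
                    descent alleles of breed origin A and B, respectively.
    Assumed form: average over haplotypes of rho_a*var_a + rho_b*var_b.
    """
    return 0.5 * sum(ra * var_a + rb * var_b for ra, rb in zip(rho_a, rho_b))
```

Under these assumptions an individual whose segment is entirely of origin A gets variance `var_a`, which matches the limiting case described in the text, and two individuals with no common ancestor get covariance zero.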

The model in (1) and (2) together with (3) and (4) provides the general framework to analyze F2 populations using standard mixed-model theory and molecular markers. These equations account for the fact that the average effect of alleles can be different between breeds, but also that there can simultaneously exist a QTL segregation within breeds. The average difference in allelic effects between both breeds is included as a fixed effect through $Q\Delta$, whereas the additional variation within breeds is allowed through G. The usual genome scan/regression strategy means that model (1) is fitted with an infinitesimally small segment (= 1 QTL) in successive positions assuming $\sigma^2_A = \sigma^2_B = 0$. If only one QTL is fitted at a time, the matrix Q is a vector with coefficients as in, e.g., HALEY and KNOTT (1992). In contrast, $\sigma^2_A$ and $\sigma^2_B$ are larger than zero for those QTL with alleles not fixed in the parental populations. The simple fixed model is then not appropriate because not all differences between individuals due to that segment are fully accounted for by $\Delta_s$. Note that it is straightforward to accommodate the case in which alleles are fixed in only one of the two breeds.
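The fixed part $Q\Delta$ can be sketched as follows. The formula for the elements of Q is not legible in this copy; the coefficient used here, one half of the summed deviations of the breed-origin probabilities from 1/2, is an assumption chosen so that a pure A individual has expected value $+\Delta_s/2$ and a pure B individual $-\Delta_s/2$, in line with the breed means stated earlier. Function names are hypothetical.

```python
def breed_coefficient(phi):
    """Assumed element q_{i,s}: 0.5 * sum over haplotypes of (phi - 0.5)."""
    return 0.5 * sum(p - 0.5 for p in phi)


def expected_genetic_mean(phi_by_segment, delta):
    """Row of Q times Delta for one individual.

    phi_by_segment -- list of (phi_1, phi_2) pairs, one per segment
    delta          -- list of breed differences Delta_s, one per segment
    """
    return sum(breed_coefficient(phi) * d
               for phi, d in zip(phi_by_segment, delta))
```

With a single segment this reduces to one regression covariate per individual, which is the situation of the usual genome scan.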

Molecular information is used to calculate $\phi_{hi,s}$, $\rho^{h}_{A(i,i'),s}$, and $\rho^{h}_{B(i,i'),s}$. Note that only the breed origin probabilities are involved in obtaining $\phi_{hi,s}$, whereas the identity by descent probabilities between marker alleles are required to compute $\rho^{h}_{A(i,i'),s}$ and $\rho^{h}_{B(i,i'),s}$. If two F2 individuals do not have any common ancestor, $\rho^{h}_{A(i,i'),s} = \rho^{h}_{B(i,i'),s} = 0$ necessarily for all segments. But if both are homozygous for marker alleles that can be traced back unambiguously to breed A, $\phi_{hi,s} = \phi_{hi',s} = 1$ for that particular position, and could differ for other segments. In an ideal situation of infinite number of informative markers, these quantities are easy to compute. For instance the fraction of the genome of origin A is

where $\delta_{hi}(x)$ is a Dirac function taking value 1 if haplotype h at point x is of origin A and zero otherwise, and $L_s$ and L are the segment length and the total length of the genome in morgans, respectively. If markers are not completely informative or the map is not infinitely dense, several options can be employed. Note that only the breed origin probabilities are needed to compute $\phi_{hi,s}$ and, e.g., the method in HALEY et al. (1994) can be employed. In contrast, the identity by descent probabilities need to be obtained to compute $\rho^{h}_{A(i,i'),s}$ and $\rho^{h}_{B(i,i'),s}$. These are given by FERNANDO and GROSSMAN (1989) for a QTL linked to a single marker. GRIGNOLA et al. (1996) provide a more general algorithm. Monte Carlo Markov chain methods like that in HEATH (1997) can also be employed. We have developed a Monte Carlo Markov chain algorithm because of its flexibility and because it takes into account all available information, considering simultaneously the molecular information from all individuals. The procedure is based on a Gibbs sampler that successively samples and updates the phase of markers for every individual conditional on the phase of its spouse, parents, and offspring. For each Gibbs iteration, crossover locations for an individual's genome are simulated conditional on its current phase and the phase of its parents. Haldane's mapping function, which assumes no interference, is used. Once all crossover locations are simulated, the parentage between all individuals is obtained by tracing back the genome origins at the specified segments. The total relationship is obtained by averaging the relationship over Gibbs iterates.
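The no-interference assumption behind Haldane's mapping function can be illustrated with a small sketch. It is not the Gibbs sampler described above, which also conditions on marker phases; it only shows the unconditional case, where crossovers along a chromosome follow a Poisson process with one expected crossover per morgan. Function names are hypothetical.

```python
import math
import random


def haldane_recombination(d_morgans):
    """Recombination fraction for a map distance d (morgans), no interference."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def simulate_crossovers(length_morgans, rng):
    """Draw crossover positions on a chromosome of the given length.

    Without interference, gaps between successive crossovers are
    exponential with mean one morgan.
    """
    positions = []
    x = rng.expovariate(1.0)
    while x < length_morgans:
        positions.append(x)
        x += rng.expovariate(1.0)
    return positions
```

An odd number of crossovers between two loci produces a recombinant, so the simulated proportion of odd counts over many draws should approach `haldane_recombination(d)`.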

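Parameter estimates of b, $\Delta_s$, $\sigma^2_{A,s}$, and $\sigma^2_{B,s}$ can be obtained by maximum likelihood using the Simplex algorithm. This algorithm is a derivative-free method and requires only the logarithm of the likelihood, i.e.,

$$\log L = -\frac{1}{2}\left[N\log(2\pi) + \log|V| + (y - Xb - ZQ\Delta)'\,V^{-1}\,(y - Xb - ZQ\Delta)\right]$$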

It should be noted that the average G over Gibbs iterates is used here and that the method could potentially be improved by marginalizing with respect to G, b, $\Delta$, and $\sigma^2$, as in a Bayesian framework.
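A minimal sketch of the quantity a derivative-free optimizer would evaluate is given below. It assumes one record per individual (Z equal to the identity), takes the mean vector as already computed, and uses a plain Cholesky factorization written out in pure Python for small examples; it is not the authors' implementation, and the function names are hypothetical.

```python
import math


def cholesky(a):
    """Lower-triangular factor L of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def log_likelihood(y, mean, g, var_e):
    """Multivariate normal log-likelihood with V = G + I * var_e."""
    n = len(y)
    v = [[g[i][j] + (var_e if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    low = cholesky(v)
    resid = [yi - mi for yi, mi in zip(y, mean)]
    # Forward substitution solves L w = resid, so w'w = resid' V^{-1} resid.
    w = []
    for i in range(n):
        w.append((resid[i] - sum(low[i][k] * w[k] for k in range(i)))
                 / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    quad = sum(wi * wi for wi in w)
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)
```

In practice the negative of this function would be handed to a simplex routine (for instance a Nelder-Mead implementation), with G rebuilt from the current variance values at each evaluation.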


*  SIMULATION

We carried out a simulation study to test the performance of segment mapping and mixed-model scan vs. standard strategies. The F2 pedigree consisted of 5 parental sires from breed A, each mated to 2 dams of breed B that produced 5 F1 sires (1 per parental sire) and 40 F1 dams (4 per parental dam). The number of F2 offspring was 400. A 60-cM chromosome was simulated, and completely informative markers (i.e., each line had different marker alleles and as many alleles as founder individuals were generated) were located at positions 0, 20, 40, and 60 cM. Three genetic scenarios as depicted in Fig 1 were considered. A single telomeric locus explained all genetic differences between lines in scenario 1, and there were two telomeric loci at positions 0 and 60 cM in scenario 2. In scenario 3 there were two spaced clusters of 20 genes each, and the loci were of equal effect located every centimorgan in positions 1–20 and 41–60 cM. Three distinct cases were studied in scenario 1. First (case a), $\sigma^2_A = \sigma^2_B = \sigma^2_e = 1$ and $\Delta = 0$; i.e., this is equivalent to an outbred population, as there are no expected phenotypic differences according to allele origin. In case b the alleles were fixed within breed ($\sigma^2_A = \sigma^2_B = 0$), $\sigma^2_e = 2$, and $\Delta = 2$. This is the current genetic model assumed in analyzing F2 crosses. And finally (case c), $\sigma^2_A = \sigma^2_B = \sigma^2_e = 1$ and $\Delta = 2$; i.e., there are phenotypic differences between breeds but still there exists additive variance within the parental populations. This is the situation occurring in F2 crosses between divergent outbred populations. It was the only case considered for genetic scenarios 2 and 3. Thirty replicates per model and case were run. The allele effects of the founder individuals were simulated according to their expected distribution, e.g., for breed A, $N[\Delta/(4 n_{loci}),\ \sigma^2_A/(2 n_{loci})]$, where $n_{loci}$ is the number of loci, i.e., 1, 2, and 40 for scenarios 1, 2, and 3, respectively. Phenotypes were generated by summing the allele effects of the F2 individual and adding a residual normal variate of mean zero and variance 1 (cases a and c) or 2 (case b). There was no sexual dimorphism and the general mean was the only fixed effect considered.



Figure 1. (A) Scheme of the genetic scenarios and cases considered. The open bar represents the chromosome with numbers at the marker positions. The solid arrows/bars indicate the positions of the QTL for each scenario; the thickness is proportional to the effect of the QTL. The cases considered within each scenario are shown within the frame, where $\Delta$ is the phenotypic difference between breeds A and B, and $\sigma^2_A$ and $\sigma^2_B$ are the genetic variances in breeds A and B, respectively. Only case c was considered in scenarios 2 and 3. (B) Scheme of chromosome partitions used with segment mapping in scenarios 1 and 2. The two segments considered are hatched and open, respectively.

Four methods of analysis were compared:

Segment mapping:
The chromosome was divided into two segments, a 10-cM segment (genetic scenarios 1 and 2) or 20 cM (scenario 3) and a segment comprising the rest of the chromosome. The model was

$$y = \mu + p_s\Delta_s + p_{\bar{s}}\Delta_{\bar{s}} + u_s + u_{\bar{s}} + e \qquad (5)$$

where the subscript $\bar{s}$ is used to indicate the complement of segment s (here the rest of the chromosome). It was assumed that genetic variances were equal in both breeds and a single variance component was fitted per segment ($\sigma^2_{A,s} = \sigma^2_{B,s} = \sigma^2_s$ and $\sigma^2_{A,\bar{s}} = \sigma^2_{B,\bar{s}} = \sigma^2_{\bar{s}}$). Above, $g_s$ is split for convenience into its mean ($p_s\Delta_s$), where $p_s$ is a vector with elements $q_{i,s}$, and a random genetic variable ($u_s$) with mean zero. Thus, $u_s \sim N(0, G_s)$ and $u_{\bar{s}} \sim N(0, G_{\bar{s}})$.

Several segment partitions were considered. In genetic scenarios 1 and 2, the 10-cM segment was shifted along the chromosome and a total of six analyses were considered, i.e., the first partition consisted of segments at positions 1–10 cM and 11–60 cM; second partition, segments 11–20 cM and the rest (1–10, 21–60 cM); and so on. A scheme of the partitions is in Fig 1B. A similar strategy was followed for genetic scenario 3, except that three partitions of 20 and 40 cM were considered; i.e., the first partition comprised segments 1–20 and 21–60 cM. Note that it is not necessary to establish these successive partitions but it facilitates the comparison with genome scan strategies.
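The successive partitions described above can be enumerated mechanically. The sketch below is only an illustration of that bookkeeping, with intervals given as inclusive centimorgan ranges as in the text; the function name is hypothetical.

```python
def sliding_partitions(chrom_cm, seg_cm):
    """List (segment, complement) pairs for a segment shifted along a chromosome.

    Each segment is an inclusive (start, end) range in cM; the complement
    is the list of ranges covering the rest of the chromosome.
    """
    partitions = []
    for start in range(1, chrom_cm + 1, seg_cm):
        end = start + seg_cm - 1
        complement = []
        if start > 1:
            complement.append((1, start - 1))
        if end < chrom_cm:
            complement.append((end + 1, chrom_cm))
        partitions.append(((start, end), complement))
    return partitions
```

For a 60-cM chromosome, `sliding_partitions(60, 10)` yields the six partitions used in scenarios 1 and 2 and `sliding_partitions(60, 20)` the three used in scenario 3.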

Mixed model:
The point model was

$$y = \mu + p_s\Delta_s + u_s + e \qquad (6)$$

where $u_s \sim N(0, G_s)$ as above. This model was fitted in 10-cM intervals for genetic scenarios 1 and 2 and in intervals of 20 cM for scenario 3. The relationship matrix $G_s$ contains the average relationships in that particular interval. The probabilities $p_s$ used were the average probabilities in the intervals considered. Note that the common strategy is to compute point probabilities, e.g., every centimorgan, but this has a negligible effect on the results given the small size of the interval and allows us to compare segment mapping with the mixed model and the two other strategies below.

Random model:
The point model was

$$y = \mu + u_s + e \qquad (7)$$

Fixed model:
The point model was

$$y = \mu + p_s\Delta_s + e \qquad (8)$$

Random and fixed models were fitted in identical intervals as in the mixed-model strategy.

The relationship matrices and $p_s$ were obtained after 1000 iterates of the Gibbs sampling scheme. The parameters were estimated in all cases by maximum likelihood using a Simplex algorithm. At each genome partition (segment mapping) or interval position (mixed, random, and fixed models), the likelihood ratio ($LR_0$) comparing models (5), (6), (7), or (8) vs. $y = \mu + e$ was computed. In addition, the segment mapping model (5) was compared vs. the model

$$y = \mu + p_{\bar{s}}\Delta_{\bar{s}} + u_{\bar{s}} + e$$

for each segment partition ($LR_s$); i.e., the null hypothesis ($H_0$) tested is that there is no genetic effect in the 10-cM segment (hatched segments in Fig 1B). The likelihood ratios are asymptotically distributed as a chi square with degrees of freedom equal to the difference in number of parameters between the models tested. Degrees of freedom are then 1 for $LR_{0,RM}$ (the $H_0$ in the random model is that $\sigma^2_s$ is 0) and $LR_{0,FM}$ (the $H_0$ in the fixed model is that $\Delta_s$ is 0), 2 for $LR_{0,MM}$ (the $H_0$ in the mixed model is that both $\Delta_s$ and $\sigma^2_s$ are 0), 4 for $LR_{0,SM}$ (the $H_0$ for segment mapping is that all of $\Delta_s$, $\Delta_{\bar{s}}$, $\sigma^2_s$, and $\sigma^2_{\bar{s}}$ are 0), and 2 for $LR_s$ in segment mapping (the $H_0$ is that $\Delta_s$ and $\sigma^2_s$ are 0). To study the empirical null distribution of the different LR, we simulated 200 replicates under the null hypothesis ($\Delta = \sigma^2_A = \sigma^2_B = 0$).
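The asymptotic tests listed above can be sketched with closed-form chi-square tail probabilities for 1, 2, and 4 degrees of freedom. The sketch takes the test statistic as twice the difference in log-likelihoods, the usual convention for a chi-square reference distribution; the dictionary keys and function names are hypothetical labels, not the authors' code.

```python
import math

# Degrees of freedom as stated in the text for each test.
DEGREES_OF_FREEDOM = {"RM": 1, "FM": 1, "MM": 2, "SM": 4, "RS": 2}


def lr_statistic(loglik_full, loglik_null):
    """Twice the log-likelihood difference between nested models."""
    return 2.0 * (loglik_full - loglik_null)


def chi2_tail(x, df):
    """P(chi-square with df degrees of freedom > x) for df in {1, 2, 4}."""
    if x <= 0.0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    if df == 4:
        return math.exp(-x / 2.0) * (1.0 + x / 2.0)
    raise ValueError("only df = 1, 2, or 4 are handled in this sketch")
```

For example, `chi2_tail(lr_statistic(ll_full, ll_null), DEGREES_OF_FREEDOM["MM"])` would give the nominal P value of a mixed-model test at one interval, to be checked against the empirical null distribution.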


*  RESULTS

Table 1 shows the statistics corresponding to the empirical (simulated) distributions of the different likelihood ratios. The distributions analyzed were those corresponding to the maximum LR at each scan or at each chromosome partition. They are not far from the theoretical asymptotic values. As expected, the mean and variance tend to increase with the degrees of freedom and, in fact, the empirical threshold is sometimes less conservative than the theoretical chi-square figure, $P(\chi^2 > x_{0.05}) > 0.05$. Fig 2 shows the empirical cumulative distribution functions (CDFs) together with their chi-square counterparts. We can conclude, as did KNOTT and HALEY (1992), that for all practical purposes the chi-square distribution is a valid approximation in this instance.



Figure 2. Empirical and theoretical (Chi-2) cumulative distributions of the several likelihood ratios used in this work. RS corresponds to LRS in the segment mapping approach; the remaining figures correspond to LR0 (see text): SM, segment mapping; MM, mixed model; RM, random model; FM, fixed model. The Chi-2 are the solid thick lines, with degrees of freedom in parentheses in the inset.


 
Table 1. Empirical likelihood-ratio distributions

Scenario 1:
Here there is only one QTL in the linkage group studied. The average LRs over segments are in Fig 3, Fig 4, and Fig 5 for cases a, b, and c, respectively. These figures are equivalent to a LOD score or F-graphics in a chromosome scan, but we prefer a bar representation to underline that they are tests at discrete positions. Note again that $LR_{0,SM}$ corresponds to a test where the whole chromosome is considered; only the partition employed changes (Fig 1B). In the presence of a single QTL, the segment mapping test behaves differently from the point scan strategies (mixed, random, and fixed models). As expected, the scan strategies produce $LR_0$ maxima at the QTL position, and $LR_0$ decreases as the test position moves away. In contrast, $LR_{0,SM}$ also shows a clear maximum with partition 1, whereas the rest of the partitions show a rather flat profile with no clear decrease. The differences between partitions should be due to random fluctuations because no clear pattern emerges. Now consider $LR_s$. This statistic should be larger than zero whenever there is a QTL in the position considered and close to zero elsewhere. This is what we observe, and $LR_s$ shows clear maxima at the QTL positions irrespective of the genetic case, a, b, or c. The drop in $LR_s$ when we move away from the QTL position is much larger than in scan methods; e.g., compare the change in $LR_{0,MM}$ and in $LR_s$ between positions 1 and 2 (Fig 3, Fig 4, and Fig 5).



Figure 3. Bar profiles of the different likelihood ratios at the positions (partitions) considered. RS corresponds to $LR_s$ in the segment mapping approach; the remaining figures correspond to $LR_0$ (see text): SM, segment mapping; MM, mixed model; RM, random model; FM, fixed model. Scenario 1a (1 QTL, $\sigma^2_A = \sigma^2_B = \sigma^2_e = 1$, $\Delta = 0$).



Figure 4. Bar profiles of the different likelihood ratios at the positions (partitions) considered. RS corresponds to $LR_s$ in the segment mapping approach; the remaining figures correspond to $LR_0$ (see text): SM, segment mapping; MM, mixed model; RM, random model; FM, fixed model. Scenario 1b (1 QTL, $\sigma^2_A = \sigma^2_B = 0$; $\sigma^2_e = 2$, $\Delta/2 = 1$).



Figure 5. Bar profiles of the different likelihood ratios at the positions (partitions) considered. RS corresponds to $LR_s$ in the segment mapping approach; the remaining figures correspond to $LR_0$ (see text): SM, segment mapping; MM, mixed model; RM, random model; FM, fixed model. Scenario 1c (1 QTL, $\sigma^2_A = \sigma^2_B = \sigma^2_e = \Delta/2 = 1$).

Although there are some similarities between $LR_{0,MM}$, $LR_{0,RM}$, and $LR_{0,FM}$, their performance depends critically on the underlying genetic model. Consider first case a (Fig 3), where the random model is the most appropriate strategy. It is not surprising that $LR_{0,RM}$ is very close to $LR_{0,MM}$ and $LR_{0,SM}$ in position 1, despite the larger number of parameters involved in the latter two methods. Moreover, Table 2 shows that segment mapping as well as the mixed and random models lead to the same $\sigma^2_s$ estimate. Segment mapping and the mixed model clearly show that the mean of allelic effects ($\Delta/2$) is zero and that there is no additional variation outside segment 1. All three methods had 100% power to detect the QTL. In contrast, the fixed model was the worst strategy considered: the maximum $LR_0$ coincided with the QTL position in only 61% of the replicates, and $LR_{0,FM}$ was significant in only 68% of those replicates. The $\sigma^2_e$ estimate was clearly biased (Table 2).


 
Table 2. Results with genetic scenario 1 at segment 1 (1–10 cM)

In contrast, the fixed model (8) is the best choice in scenario 1b because the premise that the QTL affecting the trait are diallelic with alternative alleles fixed in each parental line is fulfilled. Here $LR_{0,RM}$ was much lower than $LR_{0,FM}$, which in turn was very similar to $LR_{0,MM}$ and $LR_{0,SM}$, because no additional parameters are needed. The fixed model yielded unbiased estimates of $\sigma^2_e$, $\mu$, and $\Delta_s/2$, as did the segment mapping and the mixed-model analysis. A random-model analysis also identified, with power 100%, the first position as the most likely one to contain a QTL. But note that total variance ($\sigma^2_e + \sigma^2_s$) was overestimated and the mean estimate was biased downward because the assumed genetic model was not adequate.

The most complex, and realistic, scenario is when alleles are not fixed and their average effect differs from line to line (case c). All four analysis strategies identified the correct QTL location (except for one replicate in the fixed-model analysis) with power 100%, and in this sense all methods would lead to the detection of a QTL. But classical methods, either fixed or random models, are not capable of extracting all available information from the data. According to previous results, it is not surprising that the fixed-model analysis resulted in a biased estimate of $\sigma^2_e$, whereas the estimates of $\Delta$ were much more accurate. Alternatively, the random-model analysis provided aberrant estimates of the general mean, but $\sigma^2_e$ and $\sigma^2_s$ estimates were more realistic. Finally, the mixed model is the most parsimonious and correct model and results in the best estimates. The segment mapping indicates that there is a single segment contributing to the F2 genetic differences, as can be inferred from the dramatic drop in $LR_s$ for s > 1. The estimates of $\sigma^2_s$ and $\Delta_s$ show that the QTL affects both the variance and the mean.

Scenario 2c:
Consider first the behavior of the likelihood ratio under the different models of analysis (Fig 6). The scan approaches (mixed, random, and fixed models) peaked at both QTL positions with probability close to 50% in all methods (Table 3) because the two QTL were of about the same effect. Again, $LR_{0,RM}$ was higher than $LR_{0,FM}$, and the power was slightly larger with the random model than with the fixed-model approach. The $LR_{0,SM}$ peaks were more scattered, but almost 50% of the maxima were located at intermediate positions (partitions 3 and 5). These partitions correspond to those where segments containing QTL are grouped vs. segments without QTL. We can think of these partitions as the most "reasonable" ones. Occasionally the $LR_{0,SM}$ peaked at partitions 1 or 6 because in that particular replicate a given QTL effect was much larger than the other QTL effect. In no replicate did the maximum $LR_{0,SM}$ coincide with partition 2 or 5. The plot of $LR_s$ clearly indicates that only segments 1 and 6 contain QTL (Fig 6). Moreover, Table 3 shows that segment mapping resulted in unbiased estimates of $\sigma^2_e$ irrespective of the partition because the variation along the whole chromosome is always considered (at the expense of increasing the number of parameters). The other strategies, the mixed and random models but especially the fixed model, overestimated $\sigma^2_e$. The mixed-model point estimates of $\sigma^2_s$ and of $\Delta/2$ collected the variation along the whole chromosome and not only at that position (a phenomenon already described by JANSEN 1993 and ZENG 1993 for the fixed model, but we can see that it applies equally to the random or mixed models). The random and fixed models provided much poorer estimates than the mixed model.



 
Figure 6. Bar profiles of the different likelihood ratios at the positions (partitions) considered. RS corresponds to LRS in the segment mapping approach; the remaining figures correspond to LR0 (see text): SM, segment mapping; MM, mixed model; RM, random model; FM, fixed model. Scenario 2c (2 QTL, {sigma}2A = {sigma}2B = {sigma}2e = = 1).


 

 
Table 3. Results with genetic scenario 2c

Scenario 3c:
Here the marker positions coincided with segment bounds. The presence of a close but distinct cluster of genes results in a different LR0 pattern as compared to scenario 2c. The LR0,MM and LR0,RM tend now to peak in between both clusters, whereas LR0,FM results in a completely flat profile, with maxima randomly located along the chromosome (Fig 7, Table 4). The LRs allows us to identify convincingly that the intermediate segment contains no QTL. Note that LRs for s = 1 and 3 are significant despite the much lower value compared to the other LR. Again LR0,SM peaked at partition 2. The phenomena already described in scenario 2c are noted again but to a larger extent because several linked loci are involved now: there is a bias in {sigma}2e estimates and the point genetic variance collects the variance from the whole linkage group. Note, e.g., that {sigma}2s estimates are the same for all s = 1, 3 with the mixed- and random-model analyses, although there are no QTL on positions 20–40 cM. Again it is not surprising that the fixed model provided unrealistic estimates of {sigma}2e, whereas the QTL effect estimates ({Delta}) are confounded, as in scenario 2c. Segment mapping is the most appropriate analysis tool here and it is the only method providing accurate results.



 
Figure 7. Bar profiles of the different likelihood ratios at the positions (partitions) considered. RS corresponds to LRS in the segment mapping approach; the remaining figures correspond to LR0 (see main text): SM, segment mapping; MM, mixed model; RM, random model; FM, fixed model. Scenario 3c (40 QTL, {sigma}2A = {sigma}2B = {sigma}2e = = 1).


 

 
Table 4. Results with genetic scenario 3c


*  DISCUSSION

The QTL mixed model developed here generalizes the WANG et al. 1998 approach by allowing loci to be linked and by making joint use of the information provided by any number of molecular markers; thus the method can be applied to the analysis of QTL studies of F2 crosses. The methodology presented here shows as well that the covariance between F2 individuals should be split into the probabilities of identity by descent contributed by each breed. Further, the segment-mapping approach allows a global analysis by partitioning the genome, or the chromosome, into segments. RODOLPHE and LEFORT 1993 proposed considering the whole genome simultaneously, but their approach is a fixed model with multiple regression on all markers genotyped, which results in a loss of power as the number of markers increases. This does not occur with segment mapping because the number of parameters depends on the number of segments defined, not on the number of markers used.
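The idea of splitting the covariance between two F2 individuals by breed of origin and by segment can be illustrated with a minimal sketch. It is not the expression derived in the Appendix: the function name, the argument names, and the simple weighted-sum form are assumptions made for illustration, and the numbers in the usage note are made up.

```python
def segment_covariance(ibd_a, ibd_b, var_a, var_b):
    """Illustrative covariance between two individuals summed over segments.

    ibd_a[s], ibd_b[s]: assumed probabilities that the pair shares alleles
        identical by descent of breed-A and breed-B origin in segment s.
    var_a[s], var_b[s]: additive variances attributed to segment s in
        breeds A and B (hypothetical inputs, e.g. estimates).
    """
    n = len(ibd_a)
    if not (len(ibd_b) == len(var_a) == len(var_b) == n):
        raise ValueError("all inputs must have one entry per segment")
    # Each segment contributes its breed-specific sharing times its variance.
    return sum(
        qa * va + qb * vb
        for qa, qb, va, vb in zip(ibd_a, ibd_b, var_a, var_b)
    )
```

For example, `segment_covariance([0.25, 0.0], [0.25, 0.5], [1.0, 0.2], [0.5, 0.2])` adds 0.25 and 0.125 from the first segment and 0.1 from the second. The point of the sketch is that the number of variance parameters grows with the number of segments, not with the number of markers.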

The simulation results presented show that, under a variety of genetic architectures, the mixed-model and segment-mapping procedures are more robust and flexible strategies than the classical methods based on pure fixed or random models. Segment-mapping, mixed model, and pure fixed or random models are hierarchical levels of analysis complexity, as can be seen from comparing (5), (6), (7), and (8). A likelihood-ratio test can be used to decide whether there is evidence to consider a genetic model more complex than the one assumed in classical methods. Overall, the point mixed model showed optimum performance with a single QTL. The segment-mapping approach will be most useful in the case of linked QTL (Table 3 and Table 4). The LRs will help to determine which chromosome regions are likely to contain QTL. It is interesting that the segment-mapping partition corresponding to the maximum likelihood (at equal number of parameters) occurs when the genome is partitioned according to its effect on the trait. For instance, when the QTL are in both extremes, the likelihood is maximized when a model-partitioning segment equidistant between the two QTL or the two clusters vs. the rest of the genome is chosen (Table 3 and Table 4). But it is also a nice property of segment mapping that, irrespective of the partition actually chosen, it results in general in accurate estimates of {sigma}2e and of the total contribution of the chromosome, {sigma}2s + {sigma}2 and {Delta}s + {Delta}. This contrasts with fixed-, random-, or mixed-model approaches, where accurate estimates are obtained only at the exact position of the QTL.
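The likelihood-ratio comparison between nested models can be sketched as follows. This is a minimal illustration, not the procedure used in the simulations: the function names are hypothetical, and the asymptotic chi-square reference, including the 50:50 mixture often used when a single variance component is tested on the boundary of the parameter space, is an assumption. Empirical null distributions obtained by simulation may be preferable in practice.

```python
import math


def likelihood_ratio(loglik_full, loglik_reduced):
    """LR = 2 * (log L_full - log L_reduced), floored at zero."""
    return max(0.0, 2.0 * (loglik_full - loglik_reduced))


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (df = 1 or 2 only)."""
    if x <= 0.0:
        return 1.0
    if df == 1:
        # P(Z^2 > x) for a standard normal Z.
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        # Chi-square with 2 df is exponential with mean 2.
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or 2 are implemented in this sketch")


def boundary_p_value(lr):
    """P-value under an assumed 50:50 mixture of chi2(0) and chi2(1)."""
    if lr <= 0.0:
        return 1.0
    return 0.5 * chi2_sf(lr, 1)
```

With made-up log-likelihoods of -100.0 for the richer model and -103.0 for the simpler one, `likelihood_ratio(-100.0, -103.0)` is 6.0, which can then be passed to `chi2_sf` or `boundary_p_value` depending on the parameters being tested.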

The classical fixed-model approach is simple to compute and easy to interpret in F2 crosses, although it makes very strong assumptions about allele distributions in the parental lines. We have shown that fixed-model estimates can be dramatically affected if alleles are not fixed within lines, even in one-locus scenarios (Table 2 and Table 4). A systematic upward bias of the {sigma}2e estimate was observed in particular. Allele segregation also results in a loss of power with the fixed model (ALFONSO and HALEY 1998), and it can be seen that the LR0,FM is lower in cases a and c than in b, when alleles are fixed (Table 2). In contrast, segment mapping gave reasonable estimates of the QTL mean effects and variance. All in all, it cannot be overlooked that the standard regression approach (LANDER and BOTSTEIN 1989; HALEY and KNOTT 1992) has been successful in identifying QTL in crosses between outbred lines. Some of these QTL have been confirmed in independent experiments (e.g., ANDERSSON et al. 1994; WALLING et al. 1998; M. PÉREZ-ENCISO, A. CLOP, J. L. NOGUERA, C. ÓVILO, A. COLL, J. FULCH, D. BABOT, J. ESTANY, M. A. OLIVER, I. DIAZ and A. SÁNCHEZ, unpublished results, for a QTL on chromosome 4 affecting fatness in pigs), strongly suggesting that they are not false positives and that allele effects are distinct between breeds. Note (Table 2) that the fixed model will tend to identify the correct QTL position even if all genetic assumptions are not fulfilled, at the price of biased estimates and misleading significance levels. The fixed model can be generalized to deal with more than one QTL using cofactors or an n-QTL model, but the presence of gene clusters inevitably prevents individual QTL from being resolved, and estimates obtained with a genome scan approach will probably be unreliable. In addition, more than one QTL worsens the performance of the fixed model if the alleles are not fixed within breeds.

It is interesting to compare the performance of random and fixed models under the genetic models considered. The random model was more robust than the fixed-model approach in terms of locating a QTL: the LR0,RM was higher in case b (Fig 4) than LR0,FM in case a (Fig 3), as well as in case c (Fig 5, Fig 6, and Fig 7). That is, the random model behaved better when the random-model assumptions were violated than the fixed model did when fixed-model assumptions did not hold. This is an interesting result; the random model does not seem a priori a reasonable strategy for analyzing F2 crosses as no differences in allelic effects between breeds are assumed. XU 1998 studied by computer simulation the performance of random models in analyzing crosses but in a context where several crosses between different inbred lines were analyzed together. We are not aware of actual F2 QTL experiments analyzed using a completely random model. Nonetheless DE KONING et al. 1999 have analyzed an F2 cross in pigs using a within-sire regression approach (KNOTT et al. 1996) and a classical fixed model. The former method does not make specific assumptions about number of alleles and frequencies in the parental lines, at the expense of increasing the number of parameters and disregarding genotypic information of dam origin. Interestingly, the two statistical approaches led to distinct results, both in QTL effect and in location (with the exception of a QTL for backfat thickness on chromosome 7). The within-sire approach exhibited, overall, smaller power than the fixed model. This analysis seems to contradict our simulation results concerning the robustness of the random model, but there are important differences between the random model and the within-sire regression. First, the within-sire regression as used by DE KONING et al. 1999 disregards dam information. This can have a negligible effect in very large and outbred populations, but not necessarily so in modest family sizes (22–51 half-sibs in DE KONING et al. 1999) and in an F2 between divergent breeds where the variation contributed by the meiotic segregation in the dam can be large compared to the environmental variance. Second, we have assumed in the simulations a maximum informativity in terms of marker alleles, and it is plausible that the relative performance of the methods differs at lower levels of heterozygosity.

The approximation of (3) depends on the informativity and density of molecular markers. We have not explored in detail the impact of noninformativeness on the segment mapping approach, but it can be seen that the partitions used in genetic scenarios 1 and 2 (Table 2 and Table 3) have segments with one bound not coinciding with markers, i.e., the least informative possible situation. Despite this, the estimates were quite reasonable. Take, e.g., genetic scenario 1 (Table 2): in partition 1 the variance associated with segment 1–10 cM collects almost all genetic variance and {sigma}2 is zero, as it should be. In scenario 2c the only partitions where the 10-cM segment collects a significant variance are the first and last, where QTL are actually located (Table 3). In addition, the LRs statistic has a very distinct behavior depending on whether or not there is a QTL in the particular segment under consideration (Fig 3, Fig 4, Fig 5, Fig 6, and Fig 7).

The simulations carried out here have assumed that loci behave additively, both between and within breeds. This may seem a quite strong assumption in view of the ample empirical evidence for heterosis in line crosses (LYNCH and WALSH 1998). The general theory to deal with dominance in crosses between outbred lines has been developed by LO et al. 1995, and it can be extended to deal with molecular markers. Unfortunately the number of parameters that need to be estimated is very large, so that in practice one may be confined to providing only approximate estimates of the dominance variance or making strong assumptions about allele distributions. The fixed-model approach and regression-type methods take dominance into account by adding an additional covariable, the probability of the QTL being heterozygous at the position of interest. The same course of action can be followed here, but it should be noted that this strategy presupposes that a diallelic locus is fixed in each line. Otherwise, the dominance deviation estimate will be biased and inaccurate.
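The regression covariables used by fixed-model methods can be sketched as follows, assuming a diallelic locus fixed for alternative alleles in each line. The function and argument names are hypothetical; the inputs are the marker-derived probabilities that an F2 individual carries two, one, or zero alleles of line-A origin at the position of interest.

```python
def additive_dominance_covariates(p_aa, p_ab, p_bb, tol=1e-8):
    """Return (additive, dominance) regression coefficients for one position.

    p_aa: probability that both alleles are of line-A origin.
    p_ab: probability that the two alleles are of different line origin.
    p_bb: probability that both alleles are of line-B origin.
    """
    if min(p_aa, p_ab, p_bb) < 0.0:
        raise ValueError("probabilities must be non-negative")
    if abs(p_aa + p_ab + p_bb - 1.0) > tol:
        raise ValueError("probabilities must sum to one")
    additive = p_aa - p_bb   # coefficient of the additive effect
    dominance = p_ab         # probability of being heterozygous in origin
    return additive, dominance
```

With no marker information an F2 individual has probabilities 0.25, 0.5, and 0.25, so the covariables are 0.0 and 0.5 for every individual and carry no information; the contrast between individuals comes entirely from the markers.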

We have assumed a model {sigma}2A = {sigma}2B, i.e., equal genetic variances across the parental lines, in the analyses reported here. Note, however, that the theory developed allows us to distinguish between genetic variances in each breed. To test this, we ran 30 additional replicates in scenario 1 with parameters {sigma}2e = {sigma}2A = 1 and {Delta} = {sigma}2B = 0. We analyzed the data using a random model with {sigma}2A and {sigma}2B as distinct parameters. The average actual simulated value for {sigma}2A was 0.901, and the estimates were 0.98 ± 0.02 ({sigma}2e), 0.90 ± 0.07 ({sigma}2A), and 0.01 ± 0.00 ({sigma}2B). The estimate of {sigma}2B was exactly 0 in 14 replicates. A likelihood-ratio test showed that a model including {sigma}2B did not improve over a model without {sigma}2B. The approach developed here thus provides insight into the genetic architecture of the trait in the parental lines, as it should allow us to estimate {sigma}2A,s and {sigma}2B,s for each segment considered. These are the most relevant parameters in the study of an outbred population, and being able to estimate them adds to the usefulness of F2 crosses. With current statistical approaches, the only loci detected with maximum power are those with alleles fixed within line, which limits the inferences with respect to loci segregating in the parental lines. Moreover, the mixed model and segment mapping encourage the use of performance records from F1 and parental individuals, which are not usually analyzed jointly with F2 records or even recorded. F1 and parental records can be analyzed jointly with the F2 data without any significant modification of (1)–(4). An advantage of including these records is that they will provide insight into the presence and extent of dominance action.

Note that in segment mapping we do not make the distinction between a QTL and a polygenic background, and it is not necessarily assumed in segment mapping that a single locus is segregating within the segment or segments considered. It follows that it is more relevant in the segment-mapping context to test whether a given segment, however small, contributes significantly to genetic variation than to locate a QTL accurately, as is emphasized in interval mapping (e.g., VISSCHER et al. 1996). The importance of accuracy of QTL location or of correctly ascertaining the number of QTL should not be overestimated. First, if a very dense genotyping is carried out, segment mapping will be able to separate intervals contributing to variation more effectively than a genome scan because external "genetic noise" is properly accounted for in segment mapping. Compare, e.g., the drops in LR0,MM and LRs between positions 1 and 2, which have very similar distributions under the null hypothesis (Fig 2). The change in LRs is larger than in LR0,MM for all genetic cases. We may thus conjecture that a combination of LR0,SM and LRs tests may lead to a more accurate location of the QTL than a simple scan with LR0,MM, although more extensive simulation is needed to prove this. Second, the candidate genes will be readily located once a promising region is identified, as genetic maps are becoming densely populated with known genes. The current strategy in QTL analysis is to look for candidate genes within the chromosome regions that have shown association with the trait. It is likely, in fact, that the reverse strategy will be predominant in the future: once the number of cloned candidate genes becomes very large and their physiological effects are ascertained or inferred, it will be routine to estimate the fraction of genetic variance associated with these genes, including possible epistatic effects, in a particular population.

A feature of the segment-mapping strategy is that there is not an obvious course of action to conduct a genome partitioning. We propose to run a preliminary analysis with a segment partitioning scan as depicted in Fig 1B complemented with LRs tests every, say, 5 or 10 cM. This should allow us to identify which segments are more promising. In a second analysis the noninteresting regions should be discarded from further consideration, and a detailed partitioning of the most relevant genome regions can be studied, together with elucidating whether fixed, random, or mixed models are more suitable for each segment. Interactions between segments can be analyzed as well. The ultimate goal of segment mapping would be to have a function establishing the appropriate weights given to each region of the genome when computing the additive relationship between animals and, additionally, the expected changes in mean as well. Given estimation errors, the most parsimonious model explaining the maximum variance should be chosen. A reasonable compromise is to classify genome regions according to their effect on the trait of interest, e.g., strong, weak, and nonsignificant. Regions of similar effect can be analyzed together in the same segment. Note that different "segmentation" may be used to model variance components or means; i.e., the whole genome may be partitioned in just three segments grouped according to their contribution to total genetic variance, whereas differences in means ({Delta}s) can be fitted in more segments, or at specific genome locations if there is clear evidence of a QTL. In that manner, it can be considered that QTL that contribute to differences between lines do not contribute necessarily to differences within lines.
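The classification step suggested above can be sketched as follows. This is only an illustration of the bookkeeping: the function names, the segment labels, and the two thresholds are hypothetical, and suitable thresholds would have to come from the null distribution of the LRs statistic rather than from this sketch.

```python
def classify_segments(lr_by_segment, weak_threshold, strong_threshold):
    """Label each segment as 'strong', 'weak', or 'nonsignificant'.

    lr_by_segment: mapping from a segment label to its LRs value.
    weak_threshold, strong_threshold: assumed cut-offs, weak <= strong.
    """
    if weak_threshold > strong_threshold:
        raise ValueError("weak_threshold must not exceed strong_threshold")
    labels = {}
    for segment, lr in lr_by_segment.items():
        if lr >= strong_threshold:
            labels[segment] = "strong"
        elif lr >= weak_threshold:
            labels[segment] = "weak"
        else:
            labels[segment] = "nonsignificant"
    return labels


def group_by_class(labels):
    """Collect segments of similar effect so they can be fitted together."""
    groups = {}
    for segment, label in labels.items():
        groups.setdefault(label, []).append(segment)
    return groups
```

Segments that end up in the same group would then share one variance component in the second-stage analysis, while differences in means could still be fitted on a finer partition.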

In conclusion, we have put forward a methodology based on mixed-model theory that allows for complex genetic models and, at least theoretically, a simultaneous analysis of the whole genome. It has been shown that genome scans using regression or completely random model approaches are but particular cases of the theory presented in this work. The random model shows a more robust behavior than the most commonly used regression approach. Finally, segment-mapping principles can be accommodated to a variety of experimental designs, not only F2 crosses.


*  ACKNOWLEDGMENTS

We are grateful to Miguel Toro, Luis Silió, Rohan Fernando, and the referees for useful comments. Some of this work was accomplished during a sabbatical visit of M.P.E. to Iowa State University. M.P.E. expresses his appreciation for the financial support received from Cotswold USA and Max Rothschild during his stay at Iowa State University. Work was funded by projects Comisión Asesora de Ciencia y Tecnología AGF96-2510 (Spain) and BIO4-CT97-962243 (E.U.).

Manuscript received March 16, 1999; Accepted for publication January 10, 2000.


*  APPENDIX

The variance/covariance matrix of additive genetic values in the F2 generation, G, is derived. First a finite number of loci (nloci) is considered and then extended to an infinitesimal model. Genetic equilibrium and additive genic action, within and between breeds, are assumed. The genetic value of individual i from breed A is

where gSAi,k is the sire-origin allele and gDAi,k is the dam-origin allele at the kth locus. Assume for simplicity, but without loss of generality, that all alleles from all loci have, a priori, equal effects on the trait. Then

where

and

h is the haplotype (S or D origin). Breeding values in the F1 are distributed as N[µ, ]. The variance of F2's additive values is given by

and provided the individual is not inbred,

Define as in LO et al. 1993 a variable wk,k' that takes values AA, AB, BA, and BB according to the breed origin of each allele at loci k and k':

(A1)

The first term in (A1), Cov(ghi,k, ghi,k'|wk,k'), is zero if k != k' because linkage equilibrium is assumed within pure breeds or if wk,k' = AB or wk,k' = BA. For k = k' it is {sigma}2Ak or {sigma}2Bk depending on the origin of k (A or B). Thus,

(A2)

where pi is the fraction of the genome of origin A. The second term in (A1) is

(A3)

where rk,k' is the recombination fraction between loci k and k'. Combining (A2) and (A3) into (A1) and rearranging,

(A4)

Setting rk,k' = 0.5 for all k != k', we retrieve the equation by LO et al. 1993 for an arbitrary number of unlinked loci. The last two terms in (A4) are the segregation variance when loci are linked. Equation A4 can be generalized to an infinite number of loci by integrating rk,k' over the whole genome comprising nchr chromosomes of length Lc using results in HILL 1993 for Haldane's mapping function,

(A5)

where = - L2c, when Haldane's mapping function is assumed, L is in morgans, phi,c is the fraction of chromosome c, haplotype h of individual i of breed origin A, µc is the mean effect of loci located in chromosome c, and {Delta}c is the average difference between loci from each breed origin for chromosome c.
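Haldane's mapping function, which links the recombination fraction r to map distance in morgans, can be sketched as follows. The function names are hypothetical; the formula itself, r = (1 - exp(-2d))/2, is the standard one and assumes no crossover interference.

```python
import math


def haldane_recombination(distance_morgans):
    """Recombination fraction for a map distance under Haldane's function."""
    if distance_morgans < 0.0:
        raise ValueError("map distance must be non-negative")
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))


def haldane_distance(recombination_fraction):
    """Map distance in morgans for a recombination fraction below 0.5."""
    if not 0.0 <= recombination_fraction < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * recombination_fraction)
```

As the distance grows, the recombination fraction approaches 0.5, which is the value used for unlinked loci.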

In the absence of marker information, phi,c is 0.5 along the whole genome and for all F2 individuals, and (A5) is consequently of little relevance. Now consider that molecular information is available such that the probability of breed origin phi(x) can be obtained at any point x of the genome, and that the genome is partitioned in a series of segments. The genetic variance conditional on marker information is

where µs and {Delta}s are the mean of loci in segment s and the average deviation of that particular segment. The null hypothesis is that the contribution to total variation and differences between lines is proportional to genome length, i.e., µs = µLs/2L, {Delta}s = {Delta}Ls/2L, with L = {Sigma}nsegs=1Ls. Thus,

(A6)
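The null hypothesis stated before (A6), in which each segment contributes in proportion to its length, amounts to the following arithmetic. The function name and the example values in the usage note are hypothetical.

```python
def null_segment_parameters(mu, delta, segment_lengths):
    """Return a list of (mu_s, delta_s) under the proportional null.

    mu_s = mu * L_s / (2 L) and delta_s = delta * L_s / (2 L),
    where L is the sum of the segment lengths L_s.
    """
    if not segment_lengths or min(segment_lengths) <= 0.0:
        raise ValueError("segment lengths must be positive")
    total_length = sum(segment_lengths)
    return [
        (mu * ls / (2.0 * total_length), delta * ls / (2.0 * total_length))
        for ls in segment_lengths
    ]
```

For instance, with segments of 0.1, 0.3, and 0.6 morgans and a made-up {Delta} of 4, the segment values of {Delta}s are about 0.2, 0.6, and 1.2; summed over segments they give {Delta}/2, and likewise the µs sum to µ/2.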

The last three terms in (A6) can be neglected: (1) if molecular markers are relatively close, s and phi,s (1 - phi,s) tend to zero; (2) the segment's mean breeding value, µs, will be negligible in most cases if a general mean is included in model (1); and (3) the sum {Sigma}nsegs=1 also becomes zero for a large number of small segments. Consequently the diagonal elements of G can be simplified as

In practice one is interested in assessing the particular contribution of a given genome segment, as genetic covariance between individuals is not strictly proportional to the percentage of genome shared; rather, this percentage needs to be weighted by the relevance of each genome location, {sigma}2A,s and {sigma}2B,s in breeds A and B, respectively. Then,

with


*  LITERATURE CITED

ALFONSO, L. and C. S. HALEY, 1998  Power of different F2 schemes for QTL detection in livestock. Anim. Prod. 66:1-8.

ANDERSSON, L., C. S. HALEY, H. ELLEGREN, S. A. KNOTT, and M. JOHANSSON et al., 1994  Genetic mapping of quantitative trait loci for growth and fatness in pigs. Science 263:1771-1774.

BOVENHUIS, H., J. A. M. VAN ARENDONK, G. DAVIS, J. M. ELSEN, and C. S. HALEY et al., 1997  Detection and mapping of quantitative trait loci in farm animals. Livest. Prod. Sci. 52:135-144.

DE KONING, D. J., L. L. G. JANSS, A. P. RATTINK, P. A. M. VAN OERS, and B. J. DE VRIES et al., 1999  Detection of quantitative trait loci for backfat thickness and intramuscular fat content in pigs (Sus scrofa). Genetics 152:1679-1690.

ELSEN, J. M., B. MANGIN, B. GOFFINET, D. BOICHARD, and P. LE ROY, 1999  Alternative models for QTL detection in livestock. I. General introduction. Genet. Sel. Evol. 31:213-224.

FERNANDO, R. L. and M. GROSSMAN, 1989  Marker-assisted selection using best linear unbiased prediction. Genet. Sel. Evol. 21:467-477.

GODDARD, M. E., 1992  A mixed model for analysis of data on multiple genetic markers. Theor. Appl. Genet. 83:878-886.

GOLDGAR, D. E., 1990  Multipoint analysis of human quantitative genetic variation. Am. J. Hum. Genet. 47:957-967.

GRATTAPAGLIA, D., F. L. G. BERTOLUCCI, and R. R. SEDEROFF, 1995  Genetic mapping of QTLs controlling vegetative propagation in Eucalyptus grandis and E. urophylla using a pseudo-testcross mapping strategy and RAPD markers. Theor. Appl. Genet. 90:933-947.

GRIGNOLA, F. E., I. HOESCHELE, and B. TIER, 1996  Mapping quantitative trait loci in outcross populations via residual maximum likelihood. Genet. Sel. Evol. 28:479-490.

HALEY, C. S. and S. A. KNOTT, 1992  A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69:315-324.

HALEY, C. S., S. A. KNOTT, and J. M. ELSEN, 1994  Mapping quantitative trait loci in crosses between outbred lines using least squares. Genetics 136:1195-1207.

HEATH, S., 1997  Markov chain Monte Carlo segregation and linkage analysis for oligogenic models. Am. J. Hum. Genet. 61:748-760.

HILL, W. G., 1993  Variation in genetic composition in backcrossing programs. J. Hered. 84:212-213.

HOESCHELE, I., P. UIMARI, F. E. GRIGNOLA, Q. ZANG, and K. M. GAGE, 1997  Advances in statistical methods to map quantitative trait loci in outbred populations. Genetics 147:1445-1457.

HUNT, G. J., E. GUZMAN-NOVOA, M. K. FONDRIK, and R. E. PAGE, JR., 1998  Quantitative trait loci for honey bee stinging behavior and body size. Genetics 148:1203-1213.

JANSEN, R. J., 1993  Interval mapping of multiple quantitative trait loci. Genetics