Corresponding author: Miguel Pérez-Enciso, Station d'Amélioration Génétique des Animaux, INRA, BP 27, 31326 Castanet-Tolosan Cedex, France. E-mail: mperez@toulouse.inra.fr
Communicating editor: C. HALEY
| ABSTRACT |
|---|
We develop a mixed-model approach for QTL analysis in crosses between outbred lines that allows for QTL segregation within lines as well as for differences in mean QTL effects between lines. We also propose a method called "segment mapping" that is based on partitioning the genome into a series of segments. The expected change in mean according to percentage of breed origin, together with the genetic variance associated with each segment, is estimated using maximum likelihood. The method also allows the estimation of differences in additive variances between the parental lines. Completely fixed, random, and mixed models together with segment mapping are compared via simulation. The segment mapping and mixed-model behaviors are similar to those of classical methods, either the fixed or random models, under simple genetic models (a single QTL with alternative alleles fixed in each line), whereas they provide less biased estimates and have higher power than fixed or random models in more complex situations, i.e., when the QTL are segregating within the parental lines. The segment mapping approach is particularly useful for determining which chromosome regions are likely to contain QTL when these are linked.
QUANTITATIVE traits arise from the joint action of the environment and multiple genes, usually called quantitative trait loci (QTL). The wide availability of DNA markers scattered along the genome, together with recently developed statistical methods, has spurred the massive search for QTL in any species of interest. Crosses between highly divergent lines are a powerful experimental design for this purpose.
The usual model for analyzing F2 crosses (![]()
A mixed-model approach that accounts for variation both between and within lines is thus the most appropriate strategy for analyzing F2 crosses between outbred lines.
The problem of accounting for the genetic variation in the rest of the genome has been addressed by proposing the use of cofactors ("composite interval mapping"; ![]()
In this work we derive the genetic covariance matrix in crosses between outbred lines allowing for any number of linked markers and QTL, thus permitting a general QTL analysis of F2 crosses. This mixed model allows for more flexible genetic models than current strategies. We also propose a method, "segment mapping," aimed at accounting for the variation in the whole genome simultaneously. The method also allows us to test genetic variance differences between breeds. A simulation study is carried out to compare the performance of segment mapping and mixed model mapping with classical methods, i.e., a genome scan using fixed or random models.
| THEORY |
|---|
The breeding value of an individual is, by definition, twice the average performance of an infinite number of its offspring when mated to a random sample of spouses from the same population. The starting point is the assumption that the breeding values ($g$) of two outbred populations A and B are normally distributed, $g_A \sim N(\mu + \delta, \sigma^2_A)$ and $g_B \sim N(\mu - \delta, \sigma^2_B)$, respectively. The phenotypic difference between breeds for the trait of interest is thus $2\delta$. Genetic variation within breeds is assumed to be caused by an indeterminate number of loci in genetic equilibrium with additive action. Further, consider that the whole genome is divided into $n_{seg}$ segments and that a vector containing the additive genetic values from the population of breed A can be expressed as $g_A = \sum_{s=1}^{n_{seg}} g_{A,s}$, where $g_{A,s}$ is the contribution of segment $s$ to total breeding value, and $\mathrm{Var}(g_A) = \sum_{s=1}^{n_{seg}} \mathrm{Var}(g_{A,s}) = \sum_{s=1}^{n_{seg}} G_{A,s}$ because of linkage equilibrium. In the absence of molecular information, $\mathrm{Var}(g_A)$ is the well-known additive relationship matrix and $G_{A,s}$ is the same for all segments (weighted by the segment's length). However, the availability of marker information makes it possible to compute the probabilities of identity by descent at particular positions of interest.
The goal of the approach presented here is to estimate, conditional on marker information, the contribution of each segment to total genetic variance/covariance between the F2 individuals and to ascertain the expected phenotypic mean of individuals according to the percentage of breed origin in each particular segment. A reasonable strategy would be to include loci of similar effect in the same segment but the theory developed is valid for any partition strategy.
Assume that trait performance has been recorded in an F2 cross population derived from breeds A and B and that parental, F1, and F2 individuals have been genotyped for a series of markers. A general explanatory model of the F2 records is
$$y = Xb + Zg + e \tag{1}$$
where y is a N x 1 vector containing the F2 phenotypes, X and Z are incidence matrices relating observations to the vector of fixed effects (b) and additive genetic values (g), respectively, and e contains the residuals. In the following we refer only to breeding values in the F2 population and thus the subscript is omitted for brevity. The distribution of the random variables in (1) is
$$y \sim N(Xb + ZQ\delta,\; V) \tag{2}$$
where $V = ZGZ' + R$, $G$ is the genetic covariance matrix conditional on marker information as specified below, $R = I\sigma^2_e$, $I$ being an identity matrix and $\sigma^2_e$ the residual variance, $Q$ is an $N \times n_{seg}$ matrix with elements $q_{i,s} = \sum_h p^h_{i,s} - 1$, $p^h_{i,s}$ is the average probability of segment $s$ from individual $i$ and haplotype $h$ being of breed origin A, and $\delta = \{\delta_s,\ s = 1, \ldots, n_{seg}\}$, i.e., a vector containing the average differences between individuals carrying an A breed origin segment $s$ minus those carrying a B origin segment. Further, $\mathrm{Var}(g) = G = \sum_{s=1}^{n_{seg}} G_s$, assuming linkage equilibrium in the parental populations and that markers are informative (see the Appendix). Otherwise, the $g_s$ from different within-chromosome segments will be correlated. The matrix $G_s$ contains elements $\mathrm{Var}(g_{i,s})$ in the diagonal and $\mathrm{Cov}(g_{i,s}, g_{i',s})$ in the off-diagonal. It is shown in the Appendix that the variance of breeding values of F2 individuals, conditional on marker information, is approximately
![]() |
(3) |
where $\sigma^2_{A,s}$ and $\sigma^2_{B,s}$ are the genetic variances contributed by segment $s$ within parental populations A and B, respectively. Thus, the genetic variance of F2 individuals, conditional on marker information, is a weighted average of the genetic variances in the pure breeds. It is important to realize that the segregation variance (![]()
$\sigma^2_A$, exactly that of the founder breed A. The additive genetic covariance between F2 individuals is
![]() |
(4) |
where $\phi^{hA}_{(i,i'),s}$ ($\phi^{hB}_{(i,i'),s}$) is the probability of individuals $i$ and $i'$ having identical-by-descent alleles of breed origin A (B) at segment $s$ and haplotype $h$. Equation 4 shows that two individuals can share alleles identical by descent of breed origin A or B and that the total genetic covariance is a weighted average of both probabilities.
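The weighted-average reading of the conditional variance can be illustrated with a minimal sketch. The exact form of (3) is not reproduced here; the function below assumes that each of the two haplotypes contributes half of the segment variance of the breed it is traced to, which is one form consistent with the statement that an individual with both haplotypes of origin A recovers the founder-breed variance. The function name and arguments are hypothetical.

```python
def segment_variance(p_haplotypes, var_a, var_b):
    """Conditional variance of a segment breeding value (assumed form).

    p_haplotypes: probabilities that each haplotype of the individual is of
                  breed origin A at this segment (one value per haplotype).
    var_a, var_b: segment genetic variances within breeds A and B.
    Assumption: each haplotype carries half of the variance of its breed.
    """
    return sum(0.5 * (p * var_a + (1.0 - p) * var_b) for p in p_haplotypes)


# Both haplotypes traced to breed A: the variance is that of breed A.
print(segment_variance([1.0, 1.0], var_a=1.0, var_b=0.2))
# One haplotype from each breed: the average of both breed variances.
print(segment_variance([1.0, 0.0], var_a=1.0, var_b=0.2))
```

Under this assumption an uninformative segment (both probabilities equal to 0.5) also yields the plain average of the two breed variances.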
The model in (1) and (2) together with (3) and (4) provides the general framework to analyze F2 populations using standard mixed-model theory and molecular markers. These equations account for the fact that the average effect of alleles can differ between breeds, but also that QTL can simultaneously segregate within breeds. The average difference in allelic effects between both breeds is included as a fixed effect through $Q\delta$, whereas the additional variation within breeds is allowed through $G$. The usual genome scan/regression strategy means that model (1) is fitted with an infinitesimally small segment (= 1 QTL) in successive positions assuming $\sigma^2_A = \sigma^2_B = 0$. If only one QTL is fitted at a time, the matrix $Q$ reduces to a vector of coefficients. $\sigma^2_A$ and $\sigma^2_B$ are larger than zero for those QTL with alleles not fixed in the parental populations. The simple fixed model is then not appropriate because not all differences between individuals due to that segment are fully accounted for by $\delta_s$. Note that it is straightforward to accommodate alleles that are fixed in only one of the two breeds.
Molecular information is used to calculate $p^h_{i,s}$, $\phi^{hA}_{(i,i'),s}$, and $\phi^{hB}_{(i,i'),s}$. Note that only the breed origin probabilities are involved in obtaining $p^h_{i,s}$, whereas the identity by descent probabilities between marker alleles are required to compute $\phi^{hA}_{(i,i'),s}$ and $\phi^{hB}_{(i,i'),s}$. If two F2 individuals do not have any common ancestor, $\phi^{hA}_{(i,i'),s} = \phi^{hB}_{(i,i'),s} = 0$ necessarily for all segments. But if both are homozygous for marker alleles that can be traced back unambiguously to breed A, $p^h_{i,s} = p^h_{i',s} = 1$ for that particular position, and could differ for other segments. In an ideal situation of an infinite number of informative markers, these quantities are easy to compute. For instance, the fraction of the genome of origin A is

where $\delta^h_i(x)$ is a Dirac function taking value 1 if haplotype $h$ at point $x$ is of origin A and zero otherwise, and $L_s$ and $L$ are the segment length and the total length of the genome in morgans, respectively. If markers are not completely informative or the map is not infinitely dense, several options can be employed. Note that only the breed origin probabilities are needed to compute $p^h_{i,s}$ and, e.g., the method in ![]()
$\phi^{hA}_{(i,i'),s}$ and $\phi^{hB}_{(i,i'),s}$. These are given by ![]()
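For the ideal case of fully informative, infinitely dense markers, the breed-origin fraction of a segment is simply the share of its length covered by stretches of origin A. The sketch below shows that computation for one haplotype; the interval representation, the function name, and the assumption that the A-origin stretches do not overlap are illustrative choices, not part of the original method.

```python
def origin_fraction(a_intervals, seg_start, seg_end):
    """Fraction of the segment [seg_start, seg_end] that is of breed origin A.

    a_intervals: non-overlapping (start, end) stretches, in cM, where the
                 haplotype is known to be of origin A (assumed representation).
    """
    if seg_end <= seg_start:
        raise ValueError("segment must have positive length")
    covered = 0.0
    for start, end in a_intervals:
        # Length of the overlap between this A stretch and the segment.
        overlap = min(end, seg_end) - max(start, seg_start)
        if overlap > 0:
            covered += overlap
    return covered / (seg_end - seg_start)


# Hypothetical haplotype on a 60-cM chromosome: origin A on 0-25 and 50-60 cM.
haplotype = [(0.0, 25.0), (50.0, 60.0)]
for seg in [(0.0, 10.0), (20.0, 40.0), (40.0, 60.0)]:
    print(seg, origin_fraction(haplotype, *seg))
```

With incomplete marker information the indicator would be replaced by a probability of origin A at each point, averaged over the segment in the same way.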
Parameter estimates of $b$, $\delta_s$, $\sigma^2_{A,s}$, and $\sigma^2_{B,s}$ can be obtained by maximum likelihood using the Simplex algorithm. This algorithm is a derivative-free method and requires only the logarithm of the likelihood, i.e.,

It should be noted that the average $G$ over Gibbs iterates is used here and that the method can, potentially, be improved by marginalizing with respect to $G$, $b$, $\delta$, and $\sigma^2$, as in a Bayesian framework.
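To illustrate how a derivative-free simplex search needs only log-likelihood evaluations, the following sketch implements a basic Nelder-Mead variant and applies it to a toy likelihood (independent normal records with unknown mean and variance). This is not the likelihood of model (1), and the source does not state which Simplex variant or settings were used; the function names, the log-variance parameterization, and the data are assumptions made for illustration.

```python
import math


def neg_log_lik(params, y):
    """Minus log-likelihood of y_i ~ N(mu, exp(log_var)), independent records."""
    mu, log_var = params
    var = math.exp(log_var)  # optimizing log(var) keeps the variance positive
    ss = sum((v - mu) ** 2 for v in y)
    return 0.5 * (len(y) * math.log(2.0 * math.pi * var) + ss / var)


def nelder_mead(f, start, step=0.5, tol=1e-10, max_iter=2000):
    """Minimize f with a basic simplex search (reflect, expand, contract, shrink)."""
    n = len(start)
    simplex = [list(start)]
    for i in range(n):
        point = list(start)
        point[i] += step
        simplex.append(point)
    for _ in range(max_iter):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        if abs(f(worst) - f(best)) < tol:
            break
        # Centroid of all vertices except the worst one.
        centroid = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]
        reflect = [c + (c - w) for c, w in zip(centroid, worst)]
        if f(reflect) < f(best):
            expand = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            simplex[-1] = expand if f(expand) < f(reflect) else reflect
        elif f(reflect) < f(simplex[-2]):
            simplex[-1] = reflect
        else:
            contract = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            if f(contract) < f(worst):
                simplex[-1] = contract
            else:
                # Shrink every vertex halfway toward the best one.
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, p)]
                    for p in simplex[1:]
                ]
    simplex.sort(key=f)
    return simplex[0]


records = [1.2, 0.8, 1.9, 2.4, 1.7]  # hypothetical phenotypes
mu_hat, log_var_hat = nelder_mead(lambda p: neg_log_lik(p, records), [0.0, 0.0])
print(mu_hat, math.exp(log_var_hat))
```

For this toy problem the search should approach the closed-form maximum-likelihood estimates (the sample mean and the mean squared deviation), which gives a simple check that the optimizer behaves as intended.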
| SIMULATION |
|---|
We carried out a simulation study to test the performance of segment mapping and mixed-model scan vs. standard strategies. The F2 pedigree consisted of 5 parental sires from breed A, each mated to 2 dams of breed B, which produced 5 F1 sires (1 per parental sire) and 40 F1 dams (4 per parental dam). The number of F2 offspring was 400. A 60-cM chromosome was simulated, and completely informative markers (i.e., each line had different marker alleles and as many alleles as founder individuals were generated) were located at positions 0, 20, 40, and 60 cM. Three genetic scenarios as depicted in Fig 1 were considered. A single telomeric locus explained all genetic differences between lines in scenario 1, and there were two telomeric loci at positions 0 and 60 cM in scenario 2. In scenario 3 there were two spaced clusters of 20 genes each, and the loci were of equal effect, located every centimorgan in positions 1–20 and 41–60 cM. Three distinct cases were studied in scenario 1. First (case a), $\sigma^2_A = \sigma^2_B = \sigma^2_e = 1$ and $\delta = 0$; i.e., this is equivalent to an outbred population, as there are no expected phenotypic differences according to allele origin. In case b the alleles were fixed within breed ($\sigma^2_A = \sigma^2_B = 0$), $\sigma^2_e = 2$, and $\delta = 2$. This is the current genetic model assumed in analyzing F2 crosses. And finally (case c), $\sigma^2_A = \sigma^2_B = \sigma^2_e = 1$ and $\delta = 2$; i.e., there are phenotypic differences between breeds but still there exists additive variance within the parental populations. This is the situation occurring in F2 crosses between divergent outbred populations. It was the only case considered for genetic scenarios 2 and 3. Thirty replicates per model and case were run. The allele effects of the founder individuals were simulated according to their expected distribution, e.g., for breed A, $N[\delta/(2n_{loci}),\ \sigma^2_A/(2n_{loci})]$, where $n_{loci}$ is the number of loci, i.e., 1, 2, and 40 for scenarios 1, 2, and 3, respectively. Phenotypes were generated by summing the allele effects of the F2 individual and adding a residual normal variate of mean zero and variance 1 (cases a and c) or 2 (case b). There was no sexual dimorphism and the general mean was the only fixed effect considered.
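The pedigree structure described above can be written down as a small data structure. The sketch below reproduces the stated counts (5 parental sires, 10 parental dams, 5 F1 sires, 40 F1 dams, 400 F2). How F1 animals were paired to produce the F2 is not stated in the text, so the mating scheme used here (equal family sizes, no mating between F1 animals sharing a parental sire) and all identifiers are assumptions.

```python
def build_f2_pedigree(n_sires=5, dams_per_sire=2, f1_dams_per_dam=4, n_f2=400):
    """Return {animal_id: (sire, dam, generation)} for a three-generation cross."""
    ped = {}
    f1_sires = []  # one F1 sire per parental-sire family
    f1_dams = []   # (animal_id, family) pairs
    for fam in range(n_sires):
        sire = f"A{fam}"                      # parental sire, breed A
        ped[sire] = (None, None, "P")
        for d in range(dams_per_sire):
            dam = f"B{fam}_{d}"               # parental dam, breed B
            ped[dam] = (None, None, "P")
            if d == 0:
                # Assumption: the F1 sire is taken from the first dam's litter.
                f1_sire = f"F1s{fam}"
                ped[f1_sire] = (sire, dam, "F1")
                f1_sires.append(f1_sire)
            for k in range(f1_dams_per_dam):
                f1_dam = f"F1d{fam}_{d}_{k}"
                ped[f1_dam] = (sire, dam, "F1")
                f1_dams.append((f1_dam, fam))
    for j in range(n_f2):
        dam, fam = f1_dams[j % len(f1_dams)]
        # Assumption: rotate over F1 sires from the other families only.
        offset = 1 + (j // len(f1_dams)) % (n_sires - 1)
        sire = f1_sires[(fam + offset) % n_sires]
        ped[f"F2_{j}"] = (sire, dam, "F2")
    return ped


pedigree = build_f2_pedigree()
counts = {}
for _, _, generation in pedigree.values():
    counts[generation] = counts.get(generation, 0) + 1
print(counts)
```

A structure of this kind is what a gene-dropping simulation would traverse, sampling founder allele effects first and then transmitting alleles down the pedigree.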
Four methods of analysis were compared:
Segment mapping:
The chromosome was divided into two segments, a 10-cM segment (genetic scenarios 1 and 2) or a 20-cM segment (scenario 3) and a segment comprising the rest of the chromosome. The model was
$$y = \mu + p_s\delta_s + p_{\bar{s}}\delta_{\bar{s}} + u_s + u_{\bar{s}} + e \tag{5}$$
where the subscript $\bar{s}$ is used to indicate the complement of segment $s$ (here the rest of the chromosome). It was assumed that genetic variances were equal in both breeds and a single variance component was fitted per segment ($\sigma^2_{A,s} = \sigma^2_{B,s} = \sigma^2_s$ and $\sigma^2_{A,\bar{s}} = \sigma^2_{B,\bar{s}} = \sigma^2_{\bar{s}}$). Above, $g_s$ is split for convenience into its mean ($p_s\delta_s$), where $p_s$ is a vector with elements $\sum_h p^h_{i,s} - 1$, and a random genetic variable ($u_s$) with mean zero. Thus,
$$u_s \sim N(0, G_s), \qquad u_{\bar{s}} \sim N(0, G_{\bar{s}}).$$
Several segment partitions were considered. In genetic scenarios 1 and 2, the 10-cM segment was shifted along the chromosome and a total of six analyses were considered, i.e., the first partition consisted of segments at positions 1–10 cM and 11–60 cM; the second partition, of segments 11–20 cM and the rest (1–10, 21–60 cM); and so on. A scheme of the partitions is in Fig 1B. A similar strategy was followed for genetic scenario 3, except that three partitions of 20 and 40 cM were considered; i.e., the first partition comprised segments 1–20 and 21–60 cM. Note that it is not necessary to establish these successive partitions but it facilitates the comparison with genome scan strategies.
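The successive partitions can be enumerated mechanically. The sketch below lists, for a chromosome of a given length, each window of fixed size together with its complement; intervals are written as (start, end) in cM, so that (0, 10) corresponds to the segment labeled 1–10 cM in the text. The function name is hypothetical.

```python
def sliding_partitions(chrom_len, seg_len):
    """List (segment, complement) pairs for non-overlapping windows of seg_len cM."""
    partitions = []
    start = 0
    while start + seg_len <= chrom_len:
        end = start + seg_len
        rest = []
        if start > 0:
            rest.append((0, start))        # part of the chromosome before the window
        if end < chrom_len:
            rest.append((end, chrom_len))  # part of the chromosome after the window
        partitions.append(((start, end), rest))
        start = end
    return partitions


# Six partitions for scenarios 1 and 2, three for scenario 3.
for segment, complement in sliding_partitions(60, 10):
    print(segment, complement)
print(len(sliding_partitions(60, 20)))
```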
Mixed model:
The point model was
$$y = \mu + p_s\delta_s + u_s + e \tag{6}$$
where us ~ N (0, Gs) as above. This model was fitted in 10-cM intervals for genetic scenarios 1 and 2 and in intervals of 20 cM for scenario 3. The relationship matrix Gs contains the average relationships in that particular interval. The probabilities ps used were the average probabilities in the intervals considered. Note that the common strategy is to compute point probabilities, e.g., every centimorgan, but this has a negligible effect on the results given the small size of the interval and allows us to compare segment mapping with the mixed model and the two other strategies below.
Random model:
The point model was
$$y = \mu + u_s + e \tag{7}$$
Fixed model:
The point model was
$$y = \mu + p_s\delta_s + e \tag{8}$$
Random and fixed models were fitted in identical intervals as in the mixed-model strategy.
The relationship matrices and $p_s$ were obtained after 1000 iterates of the Gibbs sampling scheme. The parameters were estimated in all cases by maximum likelihood using a Simplex algorithm. At each genome partition (segment mapping) or interval position (mixed, random, and fixed models), the likelihood ratio (LR0) comparing models (5), (6), (7), or (8) vs. $y = \mu + e$ was computed. In addition, the segment mapping model (5) was compared vs. model
$$y = \mu + p_{\bar{s}}\delta_{\bar{s}} + u_{\bar{s}} + e$$
for each segment partition (LRs); i.e., the null hypothesis (H0) tested is that there is no genetic effect in the 10-cM segment (hatched segments in Fig 1B). The likelihood ratios are asymptotically distributed as a chi-square with degrees of freedom equal to the difference in number of parameters between the models tested. Degrees of freedom are then 1 for LR0,RM (the H0 in the random model is that $\sigma^2_s$ is 0) and LR0,FM (the H0 in the fixed model is that $\delta_s$ is 0), 2 in LR0,MM (the H0 in the mixed model is that both $\delta_s$ and $\sigma^2_s$ are 0), 4 in LR0,SM (the H0 for segment mapping is that all $\delta_s$, $\delta_{\bar{s}}$, $\sigma^2_s$, and $\sigma^2_{\bar{s}}$ are 0), and 2 for LRs in segment mapping (the H0 is that $\delta_s$ and $\sigma^2_s$ are 0). To study the empirical null distribution of the different LR, we simulated 200 replicates under the null hypothesis ($\delta = \sigma^2_A = \sigma^2_B = 0$).
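The asymptotic tests above can be reproduced with a few lines of code. The sketch below computes a likelihood ratio from two maximized log-likelihoods and the corresponding chi-square upper-tail probability for the degrees of freedom used here (1, 2, and 4), using closed forms that hold for 1 degree of freedom and for even degrees of freedom. The function names and the example log-likelihood values are hypothetical.

```python
import math


def likelihood_ratio(loglik_full, loglik_null):
    """Twice the difference in maximized log-likelihoods."""
    return 2.0 * (loglik_full - loglik_null)


def chi2_sf(x, df):
    """P(chi-square with df degrees of freedom > x), for df = 1 or even df."""
    if x <= 0:
        return 1.0
    if df == 1:
        # A chi-square(1) variate is the square of a standard normal variate.
        return math.erfc(math.sqrt(x / 2.0))
    if df % 2 == 0:
        # Closed form for even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        half = x / 2.0
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    raise ValueError("only df = 1 or even df are handled in this sketch")


# Hypothetical maximized log-likelihoods for a mixed model (6) and y = mu + e.
lr = likelihood_ratio(-512.3, -519.8)
print(lr, chi2_sf(lr, df=2))
```

These are the theoretical tail probabilities only; the empirical thresholds reported in the paper come from the 200 null replicates, not from this formula.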
| RESULTS |
|---|
Table 1 shows the statistics corresponding to the empirical (simulated) distributions of the different likelihood ratios. The distributions analyzed were those corresponding to the maximum LR at each scan or at each chromosome partition. They are not far from the theoretical asymptotic values. As expected, the mean and variance tend to increase with the degrees of freedom and, in fact, the empirical threshold is sometimes less conservative than the theoretical chi-square figure, $P(\chi^2 > x_{0.05}) > 0.05$. Fig 2 shows the empirical cumulative distribution functions (CDFs) together with their chi-square counterparts. We can conclude as ![]()
Scenario 1:
Here there is only one QTL in the linkage group studied. The average LRs over segments are in Figs 3, 4, and 5 for cases a, b, and c, respectively. These figures are equivalent to a LOD score or F-statistic profile in a chromosome scan, but we prefer a bar representation to underline that they are tests at discrete positions. Note again that LR0,SM corresponds to a test where the whole chromosome is considered; only the partition employed changes (Fig 1B). In the presence of a single QTL, the segment mapping test shows a distinct behavior from that of the point scan strategies (mixed, random, and fixed models). As expected, the scan strategies produce LR0 maxima at the QTL position, and LR0 decreases as the test position moves away. LR0,SM also shows a clear maximum with partition 1, but the rest of the partitions show a rather flat profile with no clear decrease. The differences between partitions should be due to random fluctuations because no clear pattern emerges. Now consider LRs. This statistic should be larger than zero whenever there is a QTL in the position considered and close to zero elsewhere. This is what we observe, and LRs shows clear maxima at the QTL positions irrespective of the genetic case, a, b, or c. The drop in LRs when we move away from the QTL position is much larger than in scan methods; e.g., compare the change in LR0,MM and in LRs between positions 1 and 2 (Figs 3, 4, and 5).
Although there are some similarities between LR0,MM, LR0,RM, and LR0,FM, their performance depends critically on the underlying genetic model. Consider first case a (Fig 3), where the random model is the most appropriate strategy. It is not surprising that LR0,RM is very close to LR0,MM and LR0,SM in position 1, despite the larger number of parameters involved in the latter two methods. Moreover, Table 2 shows that segment mapping as well as the mixed and random models lead to the same $\sigma^2_s$ estimate. Segment mapping and the mixed model clearly show that the mean of allelic effects ($\delta/2$) is zero and that there is no additional variation outside segment 1. All three methods had 100% power in detecting the QTL. In contrast, the fixed model was the worst strategy considered: the maximum LR0 coincided with the QTL position in only 61% of the replicates, and LR0,FM was significant in only 68% of those replicates. The $\sigma^2_e$ estimate was clearly biased (Table 2).
In contrast, the fixed model (8) is the best choice in scenario 1b because the premise that the QTL affecting the trait are diallelic with alternative alleles fixed in each parental line is fulfilled. Here LR0,RM was much lower than LR0,FM, and the latter was very similar to LR0,MM and LR0,SM, because no additional parameters are needed. The fixed model yielded unbiased estimates of $\sigma^2_e$, $\mu$, and $\delta_s/2$, as did the segment mapping and the mixed-model analysis. A random-model analysis also identified, with 100% power, the first position as the most likely one to contain a QTL. But note that total variance ($\sigma^2_e + \sigma^2_s$) was overestimated and the mean estimate was biased downward because the assumed genetic model was not adequate.
The most complex, and realistic, scenario is when alleles are not fixed and their average effect differs from line to line (case c). All four analysis strategies identified the correct QTL location (except for one replicate in the fixed-model analysis) with 100% power, and in this sense all methods would lead to the detection of a QTL. But classical methods, either fixed or random models, are not capable of extracting all available information from the data. According to previous results, it is not surprising that the fixed-model analysis resulted in a biased estimate of $\sigma^2_e$, whereas the estimates of $\delta$ were much more accurate. Alternatively, the random-model analysis provided aberrant estimates of the general mean, but the $\sigma^2_e$ and $\sigma^2_s$ estimates were more realistic. Finally, the mixed model is the most parsimonious and correct model and results in the best estimates. The segment mapping indicates that there is a single segment contributing to the F2 genetic differences, as can be inferred from the dramatic drop in LRs for s > 1. The estimates of $\sigma^2_s$ and $\delta_s$ show that the QTL affects both the variance and the mean.
Scenario 2c:
Consider first the behavior of the likelihood ratio under the different models of analysis (Fig 6). The scan approaches (mixed, random, and fixed models) peaked at both QTL positions with probability close to 50% in all methods (Table 3) because the two QTL were of about the same effect. Again, LR0,RM was higher than LR0,FM, and the power was slightly larger with the random model than with the fixed-model approach. The LR0,SM peaks were more scattered, but almost 50% of the maxima were located at intermediate positions (partitions 3 and 4). These partitions correspond to those where segments containing QTL are grouped vs. segments without QTL. We can think of these partitions as the most "reasonable" ones. Occasionally LR0,SM peaked at partitions 1 or 6 because in that particular replicate a given QTL effect was much larger than the other QTL effect. In no replicate did the maximum LR0,SM coincide with partition 2 or 5. The plot of LRs clearly indicates that only segments 1 and 6 contain QTL (Fig 6). Moreover, Table 3 shows that segment mapping resulted in unbiased estimates of $\sigma^2_e$ irrespective of the partition because the variation along the whole chromosome is always considered (at the expense of logically increasing the number of parameters). The other strategies, the mixed and random models, but especially the fixed model, overestimated $\sigma^2_e$. The mixed-model point estimates of $\sigma^2_s$ and of $\delta/2$ collected the variation along the whole chromosome and not only at that position (a phenomenon already described by ![]()
Scenario 3c:
Here the marker positions coincided with segment bounds. The presence of a close but distinct cluster of genes results in a different LR0 pattern as compared to scenario 2c. LR0,MM and LR0,RM now tend to peak in between both clusters, whereas LR0,FM results in a completely flat profile, with maxima randomly located along the chromosome (Fig 7, Table 4). The LRs allows us to identify convincingly that the intermediate segment contains no QTL. Note that LRs for s = 1 and 3 are significant despite the much lower value compared to the other LR. Again LR0,SM peaked at partition 2. The phenomena already described in scenario 2c are noted again but to a larger extent because more than one linked locus is involved now: there is a bias in the $\sigma^2_e$ estimates and the point genetic variance collects the variance from the whole linkage group. Note, e.g., that the $\sigma^2_s$ estimates are the same for all s = 1, ..., 3 with the mixed- and random-model analyses, although there are no QTL at positions 20–40 cM. Again it is not surprising that the fixed model provided unrealistic estimates of $\sigma^2_e$, whereas the QTL effect estimates ($\delta$) are confounded, as in scenario 2c. Segment mapping is the most appropriate analysis tool here and it is the only method providing accurate results.
| DISCUSSION |
|---|
The QTL mixed model developed here is a generalization over the ![]()
The simulation results presented show that, under a variety of genetic architectures, the mixed-model and segment-mapping procedures are more robust and flexible strategies than the classical methods based on pure fixed or random models. Segment-mapping, mixed model, and pure fixed or random models are hierarchical levels of analysis complexity, as can be seen from comparing (5), (6), (7), and (8). A likelihood-ratio test can be used to decide whether there is evidence to consider a genetic model more complex than the one assumed in classical methods. Overall, the point mixed model showed optimum performance with a single QTL. The segment-mapping approach will be most useful in the case of linked QTL (Table 3 and Table 4). The LRs will help to determine which chromosome regions are likely to contain QTL. It is interesting that the segment-mapping partition corresponding to the maximum likelihood (at equal number of parameters) occurs when the genome is partitioned according to its effect on the trait. For instance, when the QTL are in both extremes, the likelihood is maximized when a model-partitioning segment equidistant between the two QTL or the two clusters vs. the rest of the genome is chosen (Table 3 and Table 4). But it is also a nice property of segment mapping that, irrespective of the partition actually chosen, it results in general in accurate estimates of $\sigma^2_e$ and of the total contribution of the chromosome, $\sigma^2_s + \sigma^2_{\bar{s}}$ and $\delta_s + \delta_{\bar{s}}$. This contrasts with fixed-, random-, or mixed-model approaches, where accurate estimates are obtained only at the exact position of the QTL.
The classical fixed-model approach is simple to compute and easy to interpret in F2 crosses, although it makes very strong assumptions about allele distributions in the parental lines. We have shown that fixed-model estimates can be dramatically affected if alleles are not fixed within lines, even in one-locus scenarios (Table 2 and Table 4). A systematic upward bias of the $\sigma^2_e$ estimate was observed in particular. Allele segregation also results in a loss of power with the fixed model (![]()
It is interesting to compare the performance of random and fixed models under the genetic models considered. The random model was more robust than the fixed-model approach in terms of locating a QTL: the LR0,RM was higher in case b (Fig 4) than LR0,FM in case a (Fig 3), as well as in case c (Fig 5, Fig 6, and Fig 7). That is, the random model behaved better when the random-model assumptions were violated than the fixed model did when fixed-model assumptions did not hold. This is an interesting result; the random model does not seem a priori a reasonable strategy for analyzing F2 crosses as no differences in allelic effects between breeds are assumed.
The approximation of (3) depends on the informativity and density of molecular markers. We have not explored in detail the impact of noninformativeness on the segment mapping approach, but it can be seen that the partitions used in genetic scenarios 1 and 2 (Table 2 and Table 3) have segments with one bound not coinciding with markers, i.e., the least informative possible situation. Despite this, the estimates were quite reasonable. Take, e.g., genetic scenario 1 (Table 2): in partition 1 the variance associated with segment 1–10 cM collects almost all genetic variance and $\sigma^2_{\bar{s}}$ is zero, as it should be. In scenario 2c the only partitions where the 10-cM segment collects a significant variance are the first and last, where QTL are actually located (Table 3). In addition, the LRs statistic has a very distinct behavior depending on whether or not there is a QTL in the particular segment under consideration (Figs 3 to 7).
The simulations carried out here have assumed that loci behave additively, both between and within breeds. This may seem a quite strong assumption in view of the ample empirical evidence for heterosis in line crosses (![]()
We have assumed a model with $\sigma^2_A = \sigma^2_B$, i.e., equal genetic variances across the parental lines, in the analyses reported here. Note, however, that the theory developed allows us to distinguish between genetic variances in each breed. To test this, we ran 30 additional replicates in scenario 1 with parameters $\sigma^2_e = \sigma^2_A = 1$ and $\delta = \sigma^2_B = 0$. We analyzed the data using a random model with $\sigma^2_A$ and $\sigma^2_B$ as distinct parameters. The average actual simulated value for $\sigma^2_A$ was 0.901, and the estimates were 0.98 ± 0.02 ($\sigma^2_e$), 0.90 ± 0.07 ($\sigma^2_A$), and 0.01 ± 0.00 ($\sigma^2_B$). The estimate of $\sigma^2_B$ was exactly 0 in 14 replicates. A likelihood ratio showed that a model including $\sigma^2_B$ did not improve over a model without $\sigma^2_B$. The approach developed here thus provides insight into the genetic architecture of the trait in the parental lines, as it should allow us to estimate $\sigma^2_{A,s}$ and $\sigma^2_{B,s}$ for each segment considered. These are the most relevant parameters in the study of an outbred population and their estimation is an added benefit of F2 crosses. With current statistical approaches, the only loci detected with maximum power are those with alleles fixed within line, which limits the inferences with respect to loci segregating in the parental lines. Moreover, the mixed model and segment mapping encourage the use of performance records from F1 and parental individuals, which are not usually analyzed jointly with F2 records or even recorded. F1 and parental records can be analyzed jointly with the F2 data without any significant modification of (1)–(4). An advantage of including these records is that they will provide insight into the presence and extent of dominance action.
Note that in segment mapping we do not make the distinction between a QTL and a polygenic background, and it is not necessarily assumed in segment mapping that a single locus is segregating within the segment or segments considered. It follows that it is more relevant in the segment-mapping context to test whether a given segment, however small, contributes significantly to genetic variation than in an accurate QTL location, as is emphasized in interval mapping (e.g., ![]()
A feature of the segment-mapping strategy is that there is not an obvious course of action to conduct a genome partitioning. We propose to run a preliminary analysis with a segment partitioning scan as depicted in Fig 1B complemented with LRs tests every, say, 10 or 5 cM. This should allow us to identify which segments are more promising. In a second analysis the noninteresting regions should be discarded from further consideration, and a detailed partitioning of the most relevant genome regions can be studied, together with elucidating whether fixed, random, or mixed models are more suitable for each segment. Interactions between segments can be analyzed as well. The ultimate goal of segment mapping would be to have a function establishing the appropriate weights given to each region of the genome when computing the additive relationship between animals and, additionally, the expected changes in mean as well. Given estimation errors, the most parsimonious model explaining the maximum variance should be chosen. A reasonable compromise is to classify genome regions according to their effect on the trait of interest, e.g., strong, weak, and nonsignificant. Regions of similar effect can be analyzed together in the same segment. Note that different "segmentation" may be used to model variance components or means; i.e., the whole genome may be partitioned into just three segments grouped according to their contribution to total genetic variance, whereas differences in means ($\delta_s$) can be fitted in more segments, or at specific genome locations if there is clear evidence of a QTL. In that manner, it can be considered that QTL that contribute to differences between lines do not necessarily contribute to differences within lines.
In conclusion, we have put forward a methodology based on mixed-model theory that allows for complex genetic models and, at least theoretically, a simultaneous analysis of the whole genome. It has been shown that genome scans using regression or completely random model approaches are but particular cases of the theory presented in this work. The random model shows a more robust behavior than the most commonly used regression approach. Finally, segment-mapping principles can be accommodated to a variety of experimental designs, not only F2 crosses.
| ACKNOWLEDGMENTS |
|---|
We are grateful to Miguel Toro, Luis Silió, Rohan Fernando, and the referees for useful comments. Some of this work was accomplished during a sabbatical visit of M.P.E. to Iowa State University. M.P.E. expresses his appreciation for the financial support received by Cotswold USA and Max Rothschild during his stay at Iowa State University. Work was funded by projects Comisión Asesora de Ciencia y Technología AGF96-2510 (Spain) and BIO4-CT97-962243 (E.U.).
Manuscript received March 16, 1999; Accepted for publication January 10, 2000.
| APPENDIX |
|---|
The variance/covariance matrix of additive genetic values in the F2 generation, G, is derived. First a finite number of loci ($n_{loci}$) is considered and then extended to an infinitesimal model. Genetic equilibrium and additive genic action, within and between breeds, are assumed. The genetic value of individual $i$ from breed A is

$$g_{Ai} = \sum_{k=1}^{n_{loci}} \left( g^{S}_{Ai,k} + g^{D}_{Ai,k} \right),$$

where $g^{S}_{Ai,k}$ is the sire's origin allele and $g^{D}_{Ai,k}$ is the dam's origin allele at the $k$th locus. Assume, for simplicity but without loss of generality, that all alleles at all loci have, a priori, equal effects on the trait. Then

where

and

h is the haplotype (S or D origin). Breeding values in the F1 are distributed as N[µ,
]. The variance of F2's additive values is given by

and provided the individual is not inbred,

Define as in ![]()
![]() |
(A1) |
The first term in (A1), $\mathrm{Cov}(g^{h}_{i,k}, g^{h}_{i,k'} \mid w_{k,k'})$, is zero if $k \neq k'$ because linkage equilibrium is assumed within pure breeds or if $w_{k,k'} = AB$ or $w_{k,k'} = BA$. For $k = k'$ it is $\sigma^2_{Ak}$ or $\sigma^2_{Bk}$ depending on the origin of $k$ (A or B). Thus,
![]() |
(A2) |
where $p_i$ is the fraction of the genome of origin A. The second term in (A1) is
![]() |
(A3) |
where $r_{k,k'}$ is the recombination fraction between loci $k$ and $k'$. Combining (A2) and (A3) into (A1) and rearranging,
![]() |
(A4) |
Setting $r_{k,k'} = 0.5$ for all $k \neq k'$, we retrieve the equation by ![]()
![]()
![]() |
(A5) |
where
=
-
L2c, when Haldane's mapping function is assumed, $L$ is in morgans, $p^{h}_{i,c}$ is the fraction of chromosome $c$, haplotype $h$ of individual $i$, of breed origin A, $\mu_c$ is the mean effect of loci located in chromosome $c$, and $\delta_c$ is the average difference between loci from each breed origin for chromosome $c$.
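For reference, Haldane's mapping function relates a map distance $d$ (in morgans) to a recombination fraction as $r = \frac{1}{2}(1 - e^{-2d})$. The short Python sketch below, in which the function name is an assumption made for illustration, shows how $r$ approaches 0.5 as loci become distant, which corresponds to the unlinked case obtained by setting the recombination fraction to 0.5.

```python
import math

def haldane_r(d_morgans):
    """Recombination fraction under Haldane's mapping function.

    d_morgans is the map distance between two loci, in morgans.
    """
    return 0.5 * (1.0 - math.exp(-2.0 * abs(d_morgans)))

for d in (0.0, 0.1, 0.5, 1.0, 3.0):
    print(f"d = {d:.1f} M  ->  r = {haldane_r(d):.4f}")
```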
In the absence of marker information, $p^{h}_{i,c}$ is 0.5 along the whole genome and for all F2 individuals, and (A5) is consequently of little relevance. Now consider that molecular information is available such that the probability of breed origin, $p^{h}_{i}(x)$, can be obtained at any point $x$ of the genome, and that the genome is partitioned into a series of segments. The genetic variance conditional on marker information is

where $\mu_s$ and $\delta_s$ are, respectively, the mean of loci in segment $s$ and the average deviation of that particular segment. The null hypothesis is that the contribution to total variation and differences between lines is proportional to genome length, i.e., $\mu_s = \mu L_s/2L$ and $\delta_s = \delta L_s/2L$, with $L = \sum_{s=1}^{n_{seg}} L_s$. Thus,
![]() |
(A6) |
The last three terms in (A6) can be neglected: (1) if molecular markers are relatively close, $\delta_s$ and $p^{h}_{i,s}(1 - p^{h}_{i,s})$ tend to zero; (2) the segment's mean breeding value, $\mu_s$, will be negligible in most cases if a general mean is included in model (1); and (3) the sum over segments, $\sum_{s=1}^{n_{seg}}$, also becomes zero for a large number of small segments. Consequently the diagonal elements of G can be simplified as

In practice one is interested in assessing the particular contribution of a given genome segment, as genetic covariance between individuals is not strictly proportional to the percentage of genome shared; rather, this percentage needs to be weighted by the relevance of each genome location, $\sigma^2_{A,s}$ and $\sigma^2_{B,s}$ in breeds A and B, respectively. Then,

with

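The weighting idea in this appendix can be illustrated numerically. The sketch below is not the expression derived here. It assumes that the variance carried by one haplotype is the sum over segments of $p\,\sigma^2_{A,s} + (1 - p)\,\sigma^2_{B,s}$, with $p$ the fraction of the segment of breed origin A, and that the diagonal element for a non-inbred individual is the sum over its two haplotypes. It also shows the proportional-to-length allocation $L_s/2L$ of the null hypothesis. All function names and numbers are hypothetical.

```python
def null_allocation(total, lengths_morgans):
    """Null hypothesis: per-haplotype share of a segment is total * L_s / (2 L)."""
    genome_length = sum(lengths_morgans)
    return [total * ls / (2.0 * genome_length) for ls in lengths_morgans]

def haplotype_variance(p_origin_a, var_a, var_b):
    """Assumed form: sum over segments of p * var_A,s + (1 - p) * var_B,s."""
    return sum(p * va + (1.0 - p) * vb
               for p, va, vb in zip(p_origin_a, var_a, var_b))

def diagonal_g(p_sire, p_dam, var_a, var_b):
    """Assumed diagonal element for a non-inbred individual (two haplotypes)."""
    return (haplotype_variance(p_sire, var_a, var_b)
            + haplotype_variance(p_dam, var_a, var_b))

lengths = [1.0, 1.0, 2.0]      # hypothetical segment lengths, in morgans
mu_s = null_allocation(2.0, lengths)

var_a = [0.5, 0.1, 0.4]        # hypothetical segment variances, breed A
var_b = [0.2, 0.1, 0.1]        # hypothetical segment variances, breed B
p_sire = [1.0, 0.5, 0.0]       # fraction of each segment of origin A, sire haplotype
p_dam = [0.0, 0.0, 1.0]        # same for the dam haplotype
g_ii = diagonal_g(p_sire, p_dam, var_a, var_b)
print(mu_s, g_ii)
```

Two individuals with the same overall breed composition can therefore have different diagonal elements when the segments they inherited from each breed differ in variance.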
| LITERATURE CITED |
|---|
ALFONSO, L. and C. S. HALEY, 1998 Power of different F2 schemes for QTL detection in livestock. Anim. Prod. 66:1-8.
ANDERSSON, L., C. S. HALEY, H. ELLEGREN, S. A. KNOTT, and M. JOHANSSON et al., 1994 Genetic mapping of quantitative trait loci for growth and fatness in pigs. Science 263:1771-1774.
BOVENHUIS, H., J. A. M. VAN ARENDONK, G. DAVIS, J. M. ELSEN, and C. S. HALEY et al., 1997 Detection and mapping of quantitative trait loci in farm animals. Livest. Prod. Sci. 52:135-144.
DE KONING, D. J., L. L. G. JANSS, A. P. RATTINK, P. A. M. VAN OERS, and B. J. DE VRIES et al., 1999 Detection of quantitative trait loci for backfat thickness and intramuscular fat content in pigs (Sus scrofa). Genetics 152:1679-1690.
ELSEN, J. M., B. MANGIN, B. GOFFINET, D. BOICHARD, and P. LE ROY, 1999 Alternative models for QTL detection in livestock. I. General introduction. Genet. Sel. Evol. 31:213-224.
FERNANDO, R. L. and M. GROSSMAN, 1989 Marker-assisted selection using best linear unbiased prediction. Genet. Sel. Evol. 21:467-477.
GODDARD, M. E., 1992 A mixed model for analysis of data on multiple genetic markers. Theor. Appl. Genet. 83:878-886.
GOLDGAR, D. E., 1990 Multipoint analysis of human quantitative genetic variation. Am. J. Hum. Genet. 47:957-967.
GRATTAPAGLIA, D., F. L. G. BERTOLUCCI, and R. R. SEDEROFF, 1995 Genetic mapping of QTLs controlling vegetative propagation in Eucalyptus grandis and E. urophylla using a pseudo-testcross mapping strategy and RAPD markers. Theor. Appl. Genet. 90:933-947.
GRIGNOLA, F. E., I. HOESCHELE, and B. TIER, 1996 Mapping quantitative trait loci in outcross populations via residual maximum likelihood. Genet. Sel. Evol. 28:479-490.
HALEY, C. S. and S. A. KNOTT, 1992 A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69:315-324.
HALEY, C. S., S. A. KNOTT, and J. M. ELSEN, 1994 Mapping quantitative trait loci in crosses between outbred lines using least squares. Genetics 136:1195-1207.
HEATH, S., 1997 Markov chain Monte Carlo segregation and linkage analysis for oligogenic models. Am. J. Hum. Genet. 61:748-760.
HILL, W. G., 1993 Variation in genetic composition in backcrossing programs. J. Hered. 84:212-213.
HOESCHELE, I., P. UIMARI, F. E. GRIGNOLA, Q. ZANG, and K. M. GAGE, 1997 Advances in statistical methods to map quantitative trait loci in outbred populations. Genetics 147:1445-1457.
HUNT, G. J., E. GUZMAN-NOVOA, M. K. FONDRIK, and R. E. PAGE, JR., 1998 Quantitative trait loci for honey bee stinging behavior and body size. Genetics 148:1203-1213.
JANSEN, R. J., 1993 Interval mapping of multiple quantitative trait loci. Genetics