Corresponding author: Miguel Pérez-Enciso, INRA, Station d'Amélioration Génétique des Animaux, BP 27, 31326 Castanet-Tolosan, France. E-mail: mperez{at}toulouse.inra.fr
Communicating editor: C. HALEY
| ABSTRACT |
|---|
We provide a theoretical framework for quantitative trait locus (QTL) analysis of a crossed population where parental lines may be outbred and dominance as well as inbreeding are allowed for. It can be applied to any pedigree. A biallelic QTL is assumed, and the QTL allele frequencies can be different in each breed. The genetic covariance between any two individuals is expressed as a nonlinear function of the probability of up to 15 possible identity modes and of the additive and dominance effects, together with the allelic frequencies in each of the two parental breeds. The probabilities of each identity mode are obtained at the desired genome positions using a Monte Carlo Markov chain method. Unbiased estimates of the actual genetic parameters are recovered in a simulated F2 cross and in a six-generation complex pedigree under a variety of genetic models (allele fixed or segregating in the parental populations and additive or dominance action). Results from analyzing an F2 cross between Meishan and Large White pigs are also presented.
THERE is currently much interest in the use of molecular markers to analyze the genetic basis of quantitative or "complex" traits, and an increasing number of experimental designs and statistical methods are being developed for this purpose (e.g.,
Modeling of dominance in outbred populations with inbreeding has proved to be a difficult task, given the large number of parameters that are required.
The objective of this work is to present theory to analyze data from crosses of outbred lines using marker information. This theory allows dominance and inbreeding and the use of all available pedigree information. The article is organized as follows. First, we present the theory. Second, we illustrate the approach with simulated data and real data from a pig F2 cross. The main emphasis is on F2 crosses, given the wide popularity of this experimental design, but we also show results concerning more complex pedigrees.
| THEORY |
|---|
A general explanatory model for performance records is
$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{g} + \mathbf{e} \qquad (1)$$
where y is a vector containing the phenotypes, X and Z are incidence matrices, b is the vector of fixed effects, g is the vector of genetic values, and e is the vector of residuals. We make no assumption in (1) about the pedigree structure; y may contain records from purebred and/or crossed individuals. In principle, any number of breeds could be accommodated, but we restrict the theory developed below to a two-breed pedigree. We assume multivariate normality, a very robust assumption; thus,
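$$\mathbf{y} \sim N\left(\mathbf{X}\mathbf{b} + \mathbf{Z}\boldsymbol{\mu}_g,\ \mathbf{V}\right) \qquad (2)$$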
where V = ZGZ' + R, R is a diagonal matrix with diagonal elements equal to the residual variance $\sigma_e^2$, µg = {E(gi)} is a vector containing the expected genetic value of each individual (gi), and G = {Cov(gi, gi')} is the matrix of covariances between elements of g. The normality assumption is required only to obtain estimates with optimal statistical properties via Equation 5 below; the theory developed is valid in any case. Now we need to obtain E(gi) and Cov(gi, gi'). First, we briefly recall the theory for analyzing crossed populations developed by
The theory developed by

$$E(g_i) = \sum_{j=1}^{L} \sum_{k} P(N_j = k)\, E(g_{ij} \mid N_j = k) \qquad (3)$$
where E(gij | Nj = k) is the mean genetic effect of individual i at locus j (j = 1, …, L), given that the two-breed identity mode N of the individual at locus j is k (Table 1). Assuming that loci are unlinked, the genetic covariance between any two crossed individuals is
$$\mathrm{Cov}(g_i, g_{i'}) = \sum_{j=1}^{L} \sum_{m} P(M_j = m)\, \mathrm{Cov}(g_{ij}, g_{i'j} \mid M_j = m) \qquad (4)$$
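Equation (3) is a probability-weighted average of mode-specific means, summed over loci. The following minimal Python sketch assumes that the identity-mode probabilities P(N_j = k) have already been computed (e.g., by MCMC) and stored as a loci-by-modes array; the function name and array layout are hypothetical, not the authors' code.

```python
import numpy as np

def expected_genetic_value(mode_probs, cond_means):
    """Equation (3): E(g_i) = sum_j sum_k P(N_j = k) * E(g_ij | N_j = k).

    mode_probs[j, k]: probability that individual i is in identity mode k at locus j
    cond_means[j, k]: mean genetic effect at locus j given mode k
    """
    mode_probs = np.asarray(mode_probs, dtype=float)
    cond_means = np.asarray(cond_means, dtype=float)
    if not np.allclose(mode_probs.sum(axis=1), 1.0):
        raise ValueError("mode probabilities must sum to 1 at every locus")
    # Weight each conditional mean by its mode probability, then sum over modes and loci.
    return float(np.sum(mode_probs * cond_means))
```

The same weighted-sum structure applies to the covariance in (4), with joint two-individual modes and conditional covariances in place of individual modes and conditional means.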
The principles of this theory can be applied to QTL detection, thus permitting the QTL analysis of populations of any pedigree structure derived from crosses between outbred populations, irrespective of whether gene action shows dominance or not. But two main obstacles persist. First, 20 genetic parameters (the means plus the covariance parameters) must be estimated for every locus. Even if we reduced the number of parameters required in
The number of parameters to be estimated can be dramatically reduced if we assume a biallelic QTL with different frequencies in each breed. The model can then be reparameterized solely in terms of the additive (a) and dominance (d) QTL effects, plus p1 and p2, the frequencies in breeds A and B, respectively, of the allele that has genotypic value a in homozygous form. The genotypic values of the two homozygotes are thus a and -a, and heterozygous individuals have genotypic value d. The conditional covariances in (4) can be obtained easily if we assume Hardy-Weinberg equilibrium in the purebred founder individuals. Consider, for instance, M = 14 (Table 2), i.e., the case where both individuals are inbred and the alleles at the locus are of breed A origin. The covariance is, dropping the subscripts in M,

$$\mathrm{Cov}(g_i, g_{i'} \mid M = 14) = E(g_i\, g_{i'} \mid M = 14) - E(g_i \mid M = 14)\, E(g_{i'} \mid M = 14).$$

Given that the individuals are inbred, their genotypic value will be a with probability p1 (because the allele originates from breed A), and they share the same allele; thus

$$E(g_i\, g_{i'} \mid M = 14) = p_1 a^2 + (1 - p_1)(-a)^2 = a^2, \qquad E(g_i \mid M = 14) = E(g_{i'} \mid M = 14) = p_1 a - (1 - p_1) a = (2p_1 - 1)\,a,$$

and

$$\mathrm{Cov}(g_i, g_{i'} \mid M = 14) = a^2 - (2p_1 - 1)^2 a^2 = 4 p_1 (1 - p_1)\, a^2.$$
Other conditional genetic covariances can be obtained similarly. All terms required are listed in Table 1 and Table 2.
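The M = 14 case discussed above can be checked by brute-force enumeration. This Python sketch assumes, as in the text, that both individuals are inbred and carry the same breed-A allele in double dose, with genotypic values a and -a for the two homozygotes; the function names are hypothetical.

```python
def cov_mode14(a, p1):
    """Cov(g_i, g_i' | M = 14) by enumerating the shared breed-A allele.

    Both individuals are homozygous for the same allele: value +a with
    probability p1 and -a with probability 1 - p1, so g_i = g_i' in every outcome.
    """
    outcomes = [(p1, a), (1.0 - p1, -a)]             # (probability, genotypic value)
    e_product = sum(w * v * v for w, v in outcomes)  # E(g_i g_i' | M = 14)
    e_single = sum(w * v for w, v in outcomes)       # E(g_i | M = 14) = E(g_i' | M = 14)
    return e_product - e_single ** 2

def cov_mode14_closed_form(a, p1):
    # a^2 - (2 p1 - 1)^2 a^2 simplifies to 4 p1 (1 - p1) a^2
    return 4.0 * p1 * (1.0 - p1) * a ** 2
```

The covariance vanishes when p1 is 0 or 1, i.e., when the allele is fixed in breed A.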
The TIM probabilities conditional on marker information can be computed via a modification of the Monte Carlo Markov chain (MCMC) approach described in
Finally, it should be noted that Equation 4 assumes that loci are unlinked, which would preclude the analysis of linked QTL. However, we have shown that the covariances between loci are zero conditional on marker information, provided that markers are informative and distances between successive markers are small.
We used a two-step strategy for the QTL analysis. First, the TIM coefficients were calculated at the desired genome positions. Subsequently, maximum-likelihood estimates for a, d, p1, and p2, plus the fixed effects, were obtained at each genome position to determine the most likely QTL location, its effect, and its frequencies. The log-likelihood is
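$$\ell = -\frac{1}{2}\left[\ln|\mathbf{V}| + (\mathbf{y} - \mathbf{X}\mathbf{b} - \mathbf{Z}\boldsymbol{\mu}_g)'\,\mathbf{V}^{-1}\,(\mathbf{y} - \mathbf{X}\mathbf{b} - \mathbf{Z}\boldsymbol{\mu}_g)\right] + \text{constant} \qquad (5)$$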
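The log-likelihood (5) can be evaluated stably through a Cholesky factor of V. This is a minimal numpy sketch, assuming R = σe² I and that b, µg, and G have already been computed for the candidate values of a, d, p1, and p2; the function signature is hypothetical.

```python
import numpy as np

def log_likelihood(y, X, b, Z, mu_g, G, sigma2_e):
    """Log density of y ~ N(Xb + Z mu_g, V) with V = Z G Z' + sigma2_e * I (Equation 5).

    Includes the constant -n/2 * ln(2*pi), which does not affect maximization.
    """
    y, X, b, Z, mu_g, G = (np.asarray(m, dtype=float) for m in (y, X, b, Z, mu_g, G))
    n = y.shape[0]
    V = Z @ G @ Z.T + sigma2_e * np.eye(n)
    resid = y - X @ b - Z @ mu_g
    chol = np.linalg.cholesky(V)                     # V = chol @ chol.T
    z = np.linalg.solve(chol, resid)                 # z'z = resid' V^{-1} resid
    log_det_v = 2.0 * np.sum(np.log(np.diag(chol)))  # ln|V| from the Cholesky diagonal
    return float(-0.5 * (n * np.log(2.0 * np.pi) + log_det_v + z @ z))
```

The factorization avoids forming V⁻¹ explicitly, which matters because V must be rebuilt at every trial value of the nonlinear parameters.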
Note that here both G and µg depend nonlinearly on the four parameters a, d, p1, and p2. In contrast, the approach in

where
i are the parameters to be estimated. The nonlinearity is the price to pay for allowing dominance gene action within outbred lines. We maximized the likelihood (5) using a simplex algorithm. This algorithm is not efficient in CPU use but it is convenient because it does not require any derivatives to be calculated.
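The text does not name the simplex variant; a common derivative-free choice is the Nelder-Mead downhill simplex. The pure-Python sketch below is our assumption, not the authors' code. To maximize (5), one would minimize its negative, and bounded parameters such as p1 and p2 would typically be reparameterized (e.g., with a logistic transform) so that the search is unconstrained.

```python
def nelder_mead(f, x0, step=0.1, tol=1e-10, max_iter=2000):
    """Minimize f (a function of a list of floats) with the Nelder-Mead simplex."""
    n = len(x0)
    # Initial simplex: x0 plus one vertex displaced along each coordinate axis.
    simplex = [list(x0)]
    for i in range(n):
        vertex = list(x0)
        vertex[i] += step
        simplex.append(vertex)
    values = [f(v) for v in simplex]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=lambda k: values[k])
        simplex = [simplex[k] for k in order]
        values = [values[k] for k in order]
        if abs(values[-1] - values[0]) < tol:
            break
        # Centroid of all vertices except the worst one.
        centroid = [sum(v[i] for v in simplex[:-1]) / n for i in range(n)]
        worst = simplex[-1]

        def along(t):
            # Point c + t (worst - c): t = -1 reflects, -2 expands, 0.5 contracts.
            return [c + t * (w - c) for c, w in zip(centroid, worst)]

        xr = along(-1.0)
        fr = f(xr)
        if fr < values[0]:
            xe = along(-2.0)
            fe = f(xe)
            simplex[-1], values[-1] = (xe, fe) if fe < fr else (xr, fr)
        elif fr < values[-2]:
            simplex[-1], values[-1] = xr, fr
        else:
            xc = along(0.5)
            fc = f(xc)
            if fc < values[-1]:
                simplex[-1], values[-1] = xc, fc
            else:
                # Shrink every vertex halfway toward the best one.
                best = simplex[0]
                simplex = [best] + [[b + 0.5 * (x - b) for b, x in zip(best, v)]
                                    for v in simplex[1:]]
                values = [values[0]] + [f(v) for v in simplex[1:]]
    best = min(range(n + 1), key=lambda k: values[k])
    return simplex[best], values[best]
```

Only function values are used, which is why no derivatives of the nonlinear likelihood are needed; the cost is many more likelihood evaluations than a gradient-based method would require.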
It is interesting to compare the approach followed here with other classical methods. Take p1 = 1 and p2 = 0, the model used in analyzing crosses between inbred lines. By substituting p1 = 1 and p2 = 0 into (3) and (4), it is straightforward to show that G = 0 (a null matrix, since all conditional covariances vanish when alleles are fixed within breeds)
and that the only terms remaining are those involved in µg, which are the usual regression coefficients employed in QTL analysis. If, in contrast, we set p1 = p2, we retrieve a model for analyzing outbred populations, i.e., where breed origins are not taken into account. Similarly, a strict additive model can be studied by constraining d = 0. In summary, we should be able to test specific gene actions in the population under study by choosing an appropriate restriction on the parameters.
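To see why the mean reduces to the usual regression when p1 = 1 and p2 = 0, consider one locus and only the three non-inbred breed-origin configurations (both alleles from breed A, one from each breed, both from breed B), as in an F2 between lines. This Python sketch is illustrative: it ignores inbred modes, and the function names are hypothetical. With p1 = 1 and p2 = 0, the three conditional means become a, d, and -a, so E(gi) = a(P_AA - P_BB) + d P_AB.

```python
def mean_given_origins(q1, q2, a, d):
    """Expected genotypic value of a non-inbred individual whose two alleles are
    drawn independently from breeds with '+a' allele frequencies q1 and q2."""
    p_hom_plus = q1 * q2                  # both alleles '+a': value a
    p_hom_minus = (1.0 - q1) * (1.0 - q2) # both alleles '-a': value -a
    p_het = 1.0 - p_hom_plus - p_hom_minus  # heterozygote: value d
    return a * (p_hom_plus - p_hom_minus) + d * p_het

def expected_value(prob_AA, prob_AB, prob_BB, a, d, p1, p2):
    """Equation (3) at one locus, restricted to the three non-inbred origin modes."""
    return (prob_AA * mean_given_origins(p1, p1, a, d)
            + prob_AB * mean_given_origins(p1, p2, a, d)
            + prob_BB * mean_given_origins(p2, p2, a, d))
```

Setting p1 = p2 instead makes all three conditional means equal, so breed origin drops out of µg, matching the outbred-population special case described in the text.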
| SIMULATION |
|---|
Two populations, an F2 cross and a six-generation pedigree, were simulated. The F2 population consisted of 10 and 20 founders from each of the two breeds, 20 male and 40 female F1 individuals, and 320 F2 individuals. All families contributed an equal number of descendants. Two analysis options were considered: either only records from F2 individuals were used (n = 320), or records from all F0 and F1 individuals were also available and analyzed jointly with the F2 data (n = 410).
In addition, we tested the method on a general pedigree. More specifically, we simulated a six-discrete-generation pedigree (n = 410) with 10 and 20 founders from each of the two breeds. Each subsequent generation was produced by mating 5 sires to 2 dams each, sires and dams being chosen at random with replacement (i.e., an individual, male or female, could participate in more than one mating per generation), and five full-sibs were generated per mating. The exceptions were the F1 generation, where 10 sires were chosen to produce the F2, and the F2, where 13 offspring per mating were generated. All individuals were assumed to be genotyped and phenotyped, and all data were included in the analysis.
The trait was assumed to be controlled by a single biallelic QTL at position 10 cM, bracketed by two markers located at 0 and 25 cM. The markers had 12 alleles, 6 of them specific to each breed. Hardy-Weinberg equilibrium frequencies were forced for the QTL in the founder individuals. Founder marker genotypes were sampled at random from a uniform distribution for allele frequencies. The additive effect was set to a = 1 and the dominance effect to d = 0 or 1. Three cases for allele frequencies were considered: p1 = p2 = 0.5; p1 = 1, p2 = 0; and p1 = 1, p2 = 0.5. All six cases considered are listed in Table 3 and Table 4. The phenotype was obtained by adding a normal deviate N(0, 1) to the genetic value. We report the estimates obtained by maximizing the likelihood at the true QTL position, to assess the ability of the method to distinguish between alternative genetic models. The performance of the method in a chromosome scan is shown below in the real data example. Thirty replicates were run per case.
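A minimal Python sketch of the founder step of this simulation follows. It draws QTL alleles independently with the breed frequency (so Hardy-Weinberg proportions hold only in expectation, whereas the text states they were forced) and adds an N(0, 1) residual; marker simulation and the mating scheme are omitted, and all names are hypothetical.

```python
import random

def sample_founder_genotype(p, rng):
    """Draw two QTL alleles independently; 1 = '+a' allele (probability p), 0 = '-a'."""
    return (int(rng.random() < p), int(rng.random() < p))

def genotypic_value(genotype, a, d):
    # a for two '+a' alleles, d for the heterozygote, -a for two '-a' alleles
    return {2: a, 1: d, 0: -a}[sum(genotype)]

def simulate_founders(n, p, a, d, rng):
    """Return (genotype, phenotype) pairs; phenotype = genetic value + N(0, 1) deviate."""
    founders = []
    for _ in range(n):
        g = sample_founder_genotype(p, rng)
        founders.append((g, genotypic_value(g, a, d) + rng.gauss(0.0, 1.0)))
    return founders
```

For example, `simulate_founders(10, 1.0, 1.0, 0.0, random.Random(1))` yields breed-A founders that are all homozygous for the '+a' allele, the p1 = 1 case above.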
Four models were used to analyze each of the data sets generated under the six genetic situations. These were, first, an additive model with a single allele frequency, i.e., with a and p as parameters (p1 = p2 forced); second, a model containing a, d, and p; third, a model with a, p1, and p2 as parameters; and finally, a full model containing a, d, p1, and p2.
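Each of the four models is a restriction on the full parameter vector (a, d, p1, p2). One way to share a single likelihood routine across them is sketched below in Python; the model labels and function name are hypothetical.

```python
# Free parameters of the four fitted models (hypothetical labels).
MODELS = {
    "a_p": ("a", "p"),                    # additive, p1 = p2 = p
    "a_d_p": ("a", "d", "p"),             # dominance, p1 = p2 = p
    "a_p1_p2": ("a", "p1", "p2"),         # additive, breed-specific frequencies
    "a_d_p1_p2": ("a", "d", "p1", "p2"),  # full model
}

def full_parameters(model, theta):
    """Expand a model's free parameters into the full vector (a, d, p1, p2)."""
    names = MODELS[model]
    if len(theta) != len(names):
        raise ValueError(f"{model} expects {len(names)} parameters, got {len(theta)}")
    free = dict(zip(names, theta))
    d = free.get("d", 0.0)   # restriction d = 0 when dominance is excluded
    if "p" in free:          # restriction p1 = p2
        p1 = p2 = free["p"]
    else:
        p1, p2 = free["p1"], free["p2"]
    return free["a"], d, p1, p2
```

The difference in the number of free parameters between two nested models gives the degrees of freedom for comparing them.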
| REAL DATA |
|---|
The data were from an F2 cross with Meishan and Large White pigs as parental populations. A comprehensive report of the experimental design and results can be found in
| RESULTS |
|---|
Simulation:
A first step in the analysis is to decide which genetic model is most appropriate, i.e., whether the alleles are fixed within the parental populations and whether genic action is purely additive or shows evidence of dominance. Consequently, we computed the likelihood ratio (LR) of models including dominance and/or unequal breed allele frequencies vs. the simplest model, i.e., no dominance and equal allele frequencies in both breeds. Fig 2 shows the results for the F2 population under all six parameter combinations used to generate the data. Statistics are presented for two cases, depending on whether only F2 phenotypic records or all F0, F1, and F2 records are included in the analysis. An LR test allowed us to retrieve the correct model in all instances studied. In Fig 2A, for example, where the null model is the true one, no LR exceeded the significance threshold. Whenever data were generated according to a purely additive model (Fig 2, A, C, and E), the model including dominance did not improve upon the additive one. Otherwise (d = 1), the LR clearly showed that a dominance parameter should be included in the model (Fig 2, B, D, and F). Likewise, the LR discriminated whether allele frequencies are equal (Fig 2, A and B) or not (Fig 2, C-F). In cases C and D (true p1 = 1, p2 = 0), we also tested whether a model including parameters p1 and p2 improved over a model that set p1 = 1 and p2 = 0; the former model was not significantly better than the latter (results not shown). Similar results, not presented to avoid repetition, were found for the six-generation pedigree.
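The model comparisons above rest on the likelihood ratio statistic LR = 2(l_alt - l_null). This Python sketch assumes a chi-square reference distribution with degrees of freedom equal to the difference in free parameters (itself an asymptotic approximation; the thresholds actually used for Fig 2 are not given in this extract). It implements only the closed forms for 1 and 2 d.f., which cover the nested comparisons among the four models; function names are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-square probability P(X > x), closed forms for df = 1 or 2."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))  # P(|Z| > sqrt(x)) for a standard normal Z
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("closed form implemented only for df = 1 or 2")

def likelihood_ratio_test(loglik_null, loglik_alt, n_params_null, n_params_alt):
    """Return (LR, degrees of freedom, P value) for nested models."""
    lr = 2.0 * (loglik_alt - loglik_null)
    df = n_params_alt - n_params_null
    return lr, df, chi2_sf(max(lr, 0.0), df)
```

For a general number of degrees of freedom, a library survival function for the chi-square distribution would replace the hand-written closed forms.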
Note, in addition, that the inclusion of parental purebred and F1 records improves the probability of detecting the correct model as the LR of the most parsimonious correct model increases. In the particular case represented in Fig 2A (d = 0, p1 = p2 = 0.5), the LRs of less parsimonious models decrease when analyzing all data, giving further support to the null hypothesis model.
Average estimates of the parameters for the F2 cross and the six-generation pedigree are given in Table 3 and Table 4, respectively. The estimates reported are those obtained under the correct model, the rationale being that a test has been carried out to determine which model is appropriate, as in Fig 2. Overall, agreement between the true parameters and their estimates was excellent. Allele frequency estimates were very accurate when alternative alleles were fixed and less so when the alleles were segregating within breeds, but they remained unbiased. In most cases, standard errors were smaller when F0 and F1 records were included in the F2 pedigree analysis.
Comparing experimental designs, the estimates from the six-generation pedigree had on average a larger standard error than those from the F2 design when alleles were not fixed in each parental breed. This is likely due to genetic drift, which accumulates over generations; we also found a strong interdependence between allele frequency and QTL effect estimates. In contrast, the error of the QTL position estimate was smaller in the six-generation pedigree than in the F2 design (results not presented), as expected from the larger number of meioses in the former population.
Pig data:
The results of the comparison between alternative models on the F2 cross pig data from
In contrast to backfat thickness, all statistical evidence suggests that Meishan and Large White pigs have fixed alternative alleles affecting live weight on chromosome 4. Models 4 and 5 converge to p1 = 1 and p2 = 0, with no increase in likelihood in model 5 with respect to model 1. It is more difficult to ascertain the effect of dominance, as the difference in LR is close to significance. The regression approach provided estimates similar to those obtained under the additive model 3. Note that the QTL position estimates vary widely depending on whether dominance was included or not; the estimated position shifted by more than 10 cM according to the model chosen. Fig 3 shows a plot of LR for models with and without the dominance effect, both with p1 = 1 and p2 = 0. There are two local maxima in that region, and the confidence interval for the QTL position probably comprises both. In any case, this change in QTL position is particularly worrying here given that the dominance effect borders significance. Note, in addition, that the within-family analysis agrees with the position estimated under the dominance model, whereas the between-breed regression estimate is close to that obtained with the additive model. The LRs of models that assume equal frequencies in both breeds (models 2 and 3) were nonsignificant, in contrast with the results obtained for backfat thickness. This occurs because these models assume that there is allelic variation within breeds, which seems to be the case for backfat thickness but not for growth.
| DISCUSSION |
|---|
The theory developed allows us to obtain a very useful insight into QTL genic action. It allows us to diagnose whether the alleles are fixed within the parental populations or segregating at similar frequencies and whether the genic action is dominant or not. Unlike other indirect approaches such as within-sire regression, testing can be done irrespective of the population structure, i.e., number of generations, and using all available pedigree and marker information. Certainly the method can be improved; for instance, it would be desirable to use a single MCMC strategy to sample jointly the identity coefficients and the rest of the parameters. Such a strategy would provide exact estimates of the standard errors of the parameters, whereas in this likelihood framework with a simplex algorithm we need to resort to asymptotic approximations. We are currently working on a general Bayesian strategy to address this issue. Nonetheless, we have shown that the approach followed here performed quite well under a variety of genetic and pedigree scenarios (Table 3 and Table 4, Fig 2).
The ascertainment of whether the QTL alleles are segregating within lines is an important issue in QTL identification. If a QTL is found, say, in an F2 cross, the subsequent experimental procedure to map it finely can be very different depending on whether all F1 are heterozygous at the QTL (alternative alleles fixed) or only a percentage are (i.e., alleles segregating in the parental lines). Moreover, the presence of dominance may also alter the statistical results obtained via classical regression type methods. The power of such an approach will diminish if a recessive allele is segregating in any of the crossed populations. Certainly, the most convincing proof would be to analyze the purebreds directly, but usually the number of purebred individuals typed in experimental crosses is small and would require setting up an additional experiment. Our approach is able to extract more information than classical approaches from the already available data.
There are currently a large number of crosses between divergent lines in many different animal and plant species. Certainly, not all the parental lines utilized are completely inbred. It is thus interesting to compare the results using a regression method and the method developed here. The results presented in Table 5 represent two of the possible situations that may be encountered. For the first trait, backfat thickness, there is reasonable evidence that alleles may not be fixed in both lines. Further, the statistical analysis suggests that the QTL is segregating in Large White but not in Meishan (Table 5), which may be explained by the very small number of founder animals of the French Meishan population.
In contrast to backfat, the classical model seems appropriate for live weight, and there is not much to be gained by adding extra parameters to the regression model. Here some uncertainty remains about the relevance of the dominance effect, as the significance level of contrasting model 5 vs. 4 is P ≈ 0.09, the probability of a chi-square variable (1 d.f.) exceeding 2.8. The evidence in favor of dominance is thus weak, in agreement with the results of the regression analysis. The QTL position estimates obtained via the within-family analyses agree with those obtained via the TIM coefficients, although the average estimate is lower. In principle, one should also expect within-family heterogeneity of substitution effects if alleles are not fixed. Nonetheless, we found that the variance of sire effect estimates using the within-family approach was similar for both traits, 0.10 and 0.08 in SD units for live weight and backfat, respectively. This is probably because each half-sib family is analyzed separately, so that only a small number of observations is used to estimate each substitution effect, in contrast to the more parsimonious approach presented here, where all pedigree and marker information is considered jointly.
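The P value quoted above can be checked by hand: with 1 d.f., a chi-square variable is the square of a standard normal deviate Z, so

$$P(\chi^2_1 > 2.8) = P\big(|Z| > \sqrt{2.8}\big) = 2\left[1 - \Phi(1.673)\right] \approx 2\,(1 - 0.953) \approx 0.094$$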
Although this is beyond the scope of this article, researchers should be aware that the estimated QTL position may shift according to the model chosen (Fig 3). This is especially pertinent because it is customary in crossed-population analyses to include a dominance effect in the model without testing for it. We did not carry out a joint multivariate analysis of live weight and backfat, but the fact that the allele frequencies differ between the two traits would suggest that there are two linked loci. This hypothesis would be in agreement with results from
The identity modes in

αs1-casein in goats
Finally, it should be recalled that most plant and animal individuals exploited commercially are hybrids but that their genetic evaluation is largely based on purebred performance. Thus a further application of the theory developed here, beyond the detection of QTL, will be to include molecular and performance data from hybrids in the genetic evaluation scheme. This approach can also be used to support marker-assisted introgression, where typically data from several generations are available and where dominance and inbreeding may be present.
| ACKNOWLEDGMENTS |
|---|
This work began during a sabbatical visit of the senior author to Iowa State University. M.P-E. expresses his appreciation for the financial support received from Max Rothschild and Cotswold USA during this visit. The pigmap QTL project was funded by the European Union (Bridge and Biotech+ programs), Institut National de la Recherche Agronomique (Department of Animal Genetics, AIP "structure des génomes animaux" and the "Groupement de Recherches et d'Études sur les Génomes").
Manuscript received September 13, 2000; Accepted for publication June 13, 2001.
| LITERATURE CITED |
|---|
ALFONSO, L., and C. S. HALEY, 1998 Power of different F2 schemes for QTL detection in livestock. Anim. Prod. 66:1-8.
BIDANEL, J. P., J. C. CARITEZ, and C. LEGAULT, 1989 Estimation of crossbreeding parameters between Large White and Meishan porcine breeds. I. Reproductive performance. Genet. Sel. Evol. 21:507-526.
DARVASI, A., and M. SOLLER, 1995 Advanced intercross lines, an experimental population for fine genetic mapping. Genetics 141:1199-1207.
GILLOIS, M., 1964 La relation d'identité en génétique. Ann. Inst. Henri Poincaré B2:194.
HALEY, C. S., S. A. KNOTT, and J. M. ELSEN, 1994 Mapping quantitative trait loci in crosses between outbred lines using least squares. Genetics 136:1195-1207.
HARRIS, D. L., 1964 Genotype covariances between inbred relatives. Genetics 50:1319-1348.
LE ROY, P., J. M. ELSEN, D. BOICHARD, B. MANGIN, J. P. BIDANEL et al., 1998 An algorithm for QTL detection in mixture of full and half sib families. World Cong. Genet. Appl. Livest. Prod. 26:257-260.
LIU, B. H., 1998 Statistical Genomics. CRC Press, Boca Raton, FL.
LO, L. L., R. L. FERNANDO, R. J. C. CANTET, and M. GROSSMAN, 1995 Theory for modelling means and covariances in a two-breed population with dominance inheritance. Theor. Appl. Genet. 90:49-62.
MALÉCOT, G., 1948 Les Mathématiques de l'Hérédité. Masson et Cie, Paris.
MARKLUND, L., P. E. NYSTRÖM, S. STERN, L. ANDERSSON-EKLUND, and L. ANDERSSON, 1999 Confirmed quantitative trait loci for fatness and growth on pig chromosome 4. Heredity 82:134-141.
MARTIN, P., C. LEROUX, Y. AMIGUES, M. JANSÀ PÉREZ, F. REMEUF et al., 1995 Molecular diversity of the goat alpha-S1-casein gene: impact on casein content and cheesemaking properties. Bull. Int. Dairy Fed. 304:12-13.
MILAN, D., J. P. BIDANEL, P. LE ROY, C. CHEVALET, N. WOLOSZYN et al., 1998 Current status of QTL detection in Large White x Meishan crosses in France. World Cong. Genet. Appl. Livest. Prod. 26:414-417.
PÉREZ-ENCISO, M., and L. VARONA, 2000 Quantitative trait loci mapping in F2 crosses between outbred lines. Genetics 155:391-405.
PÉREZ-ENCISO, M., A. CLOP, J. L. NOGUERA, C. ÓVILO, A. COLL et al., 2000a A QTL on pig chromosome 4 affects fatty acid metabolism: evidence from an Iberian by Landrace intercross. J. Anim. Sci. 78:2525-2531.
PÉREZ-ENCISO, M., L. VARONA, and M. F. ROTHSCHILD, 2000b Computation of identity by descent probabilities conditional on DNA markers via a Monte Carlo Markov chain method. Genet. Sel. Evol. 32:467-482.
SMITH, S. P., and A. MÄKI-TANILA, 1990 Genotypic covariance matrices and their inverses for models allowing dominance and inbreeding. Genet. Sel. Evol. 22:65-91.
WALLING, G. A., P. M. VISSCHER, L. ANDERSSON, M. F. ROTHSCHILD, L. WANG et al., 2000 Combined analyses of data from quantitative trait loci mapping studies: chromosome 4 effects on porcine growth and fatness. Genetics 155:1369-1378.
WANG, T., R. L. FERNANDO, and M. GROSSMAN, 1998 Genetic evaluation by best linear unbiased prediction using marker and trait. Genetics 148:507-516.