Genetics, Vol. 159, 1819-1832, December 2001, Copyright © 2001

Interval Mapping of Quantitative Trait Loci in Autotetraploid Species

C. A. Hackett^a, J. E. Bradshaw^b, and J. W. McNicol^a
^a Biomathematics and Statistics Scotland, Scottish Crop Research Institute, Invergowrie, Dundee DD2 5DA, Scotland
^b Scottish Crop Research Institute, Invergowrie, Dundee DD2 5DA, Scotland

Corresponding author: C. A. Hackett, Biomathematics and Statistics Scotland, Scottish Crop Research Institute, Invergowrie, Dundee DD2 5DA, Scotland. E-mail: christine@bioss.ac.uk

Communicating editor: C. HALEY


*  ABSTRACT

This article presents a method for QTL interval mapping in autotetraploid species for a full-sib family derived by crossing two parents. For each offspring, the marker information on each chromosome is used to identify possible configurations of chromosomes inherited from the two parents and the locations of crossovers on these chromosomes. A branch and bound algorithm is used to identify configurations with the minimum number of crossovers. From these configurations, the conditional probability of each possible QTL genotype for a series of positions along the chromosome can be estimated. An iterative weighted regression is then used to relate the trait values to the QTL genotype probabilities. A simulation study is performed to assess this approach and to investigate the effects of the proportion of codominant to dominant markers, the heritability, and the population size. We conclude that the method successfully locates QTL and estimates their parameters accurately, and we discuss different modes of action of the QTL that may be modeled.


LINKAGE analysis and quantitative trait loci (QTL) mapping methods are now well established and widely used for diploid plant species, and there is an increasing interest in extending these methods to autopolyploid species, despite the complications of polysomic inheritance. Linkage maps have been calculated for autotetraploid potato (MEYER et al. 1998), autotetraploid alfalfa (BROUWER and OSBORN 1999; DIWAN et al. 2000), autohexaploid sweet potato (UKOSIT and THOMPSON 1997), and autooctaploid sugarcane (AL-JANABI et al. 1993; DA SILVA et al. 1993, 1995; RIPOL et al. 1999). Most of these studies used dominant markers or individual alleles from codominant markers that are present in one parent and segregate in a 1:1 ratio in the mapping population (simplex markers). Methods for calculating recombination frequencies between simplex markers were discussed by WU et al. (1992).

Unless a mapping population is very large, it is difficult to detect repulsion linkages between simplex markers in polyploids. YU and PAULS (1993) and HACKETT et al. (1998) examined the use of double-dose (duplex) markers as well as simplex markers to identify homologous chromosomes in autotetraploid populations, and MEYER et al. (1998) used this approach in their map of potato. DA SILVA (1993) and RIPOL et al. (1999) gave similar theoretical results for octaploid species and used these to obtain a map of sugarcane. Recently, LUO et al. (2001) showed how to calculate recombination frequencies and LOD scores for all possible configurations of codominant multiallelic markers in autotetraploid species and how this information may be used to construct a linkage map. They showed that codominant markers are, in general, more powerful for detecting linkage and give more precise estimates of the recombination fraction.

An important use of linkage maps is to locate major genes and QTL for important traits. Early studies of diploid species compared trait means for different phenotypes at a single marker using regression models, and some authors used the same approach in polyploid species. SILLS et al. (1995) used regression models to establish associations between four stalk traits in sugarcane and a set of simplex markers. A similar approach was used by MEYER et al. (1998) and BRADSHAW et al. (1998) to locate QTL in potato for quantitative resistance to late blight [Phytophthora infestans (Mont.) De Bary] and the white potato cyst nematode [Globodera pallida (Stone)], respectively. These authors used both simplex and duplex markers and used permutation tests (CHURCHILL and DOERGE 1994) to establish an appropriate significance level for testing multiple markers. In an autotetraploid cross, however, there could be up to eight different QTL alleles and the regression approach will give little insight into their individual effects and interactions. We also want to test for the presence of a QTL at locations between markers.

In this article we present an approach for QTL interval mapping in autotetraploid species in a full-sib family derived by crossing two parents. As for similar populations derived from outbreeding diploid parents, we use all the markers on a chromosome to estimate conditional probabilities for QTL genotypes. The trait values are then related to the QTL genotypes, using a mixture model. We present a simulation study to look at effects of the proportion of codominant to dominant markers, the heritability, and the population size on the ability of the model to locate QTL and to estimate their effects.


*  METHODS

The mapping population:
The QTL mapping approach is developed for an F1 population derived by crossing two parents, P1 and P2. The phenotypes of m molecular markers are assumed to be known for the parents and n offspring, and trait data are available for the parents and offspring. The parents can have up to eight distinct alleles at each marker or quantitative trait locus; these are represented by A–H, or by O for a "null" allele (CALLEN et al. 1993). In practice, most loci will have fewer than eight alleles, and the parental phenotypes will generally be compatible with more than one genotype. LUO et al. (2000) have shown how the probabilities of possible parental marker genotypes may be inferred from the parental and offspring phenotypes using Bayes' theorem.

Tetrasomic inheritance:
The model for QTL mapping is developed by assuming random chromosomal segregation. The four homologous chromosomes are assumed to pair at random to give two bivalents, and crossing over is assumed to be restricted to within each bivalent. All bivalent pairings are assumed to be equally likely. The possibilities of nonrandom chromosomal pairing, or multivalent formation, are not considered here. We assume that there is no chromatid interference and no crossover interference. LUO et al. (2001) have shown how recombination frequencies and LOD scores between pairs of markers can be estimated for all possible phases, assuming random chromosomal segregation, and how the most likely phase may be identified. Using this information, markers can be partitioned into linkage groups, ordered, the distances between them calculated, and the phases of the markers in the parents can be deduced. We assume that this analysis has been carried out, so that the map of molecular markers is known.
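As an illustration of random chromosomal segregation at a single locus, the following minimal sketch enumerates the three equally likely bivalent pairings in each parent and the gametes each pairing can produce. It ignores recombination along the chromosome, and the function names and chromosome labels 1–8 are illustrative choices rather than part of the published method.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def gamete_distribution(chromosomes):
    """Distribution of two-chromosome gametes under random bivalent pairing."""
    a, b, c, d = chromosomes
    # The three ways of pairing four homologues into two bivalents.
    pairings = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    dist = Counter()
    for biv1, biv2 in pairings:
        # A gamete receives one chromosome from each bivalent.
        for x, y in product(biv1, biv2):
            dist[frozenset((x, y))] += Fraction(1, 3) * Fraction(1, 4)
    return dist


def offspring_distribution(parent1, parent2):
    """Joint distribution of offspring genotypes from independent gametes."""
    d1, d2 = gamete_distribution(parent1), gamete_distribution(parent2)
    return {(tuple(sorted(g1)), tuple(sorted(g2))): p1 * p2
            for g1, p1 in d1.items() for g2, p2 in d2.items()}


gametes = gamete_distribution((1, 2, 3, 4))
offspring = offspring_distribution((1, 2, 3, 4), (5, 6, 7, 8))
```

Under these assumptions each parent produces six equally likely gametes, so a full-sib family has 36 equally likely genotypes at a locus with eight distinct alleles.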

Model for a quantitative trait:
KEMPTHORNE (1957) discussed the partitioning of the genetical variance of polyploid individuals in a random mating population at equilibrium. He expressed the genotypic value YG of a tetraploid individual with genotype AiAjAkAl as

$$Y_G = \mu + \alpha_i + \alpha_j + \alpha_k + \alpha_l + \beta_{ij} + \beta_{ik} + \beta_{il} + \beta_{jk} + \beta_{jl} + \beta_{kl} + \gamma_{ijk} + \gamma_{ijl} + \gamma_{ikl} + \gamma_{jkl} + \delta_{ijkl} \tag{1}$$

where µ is the population mean, the α's are the main effects of the alleles (analogous to additive effects in a diploid population), and the β's, γ's, and δ's are the diallelic, triallelic, and tetraallelic interactions, respectively. The β's are analogous to dominance deviations in a diploid population, but there is no diploid analogue of the triallelic and tetraallelic interactions. Appendix A compares the notation of model (1) with that used by various authors when considering two alleles at a tetraploid locus. For our full-sib family model, an individual will inherit alleles Ai and Aj, 1 ≤ i ≤ 3, i < j ≤ 4 from parent 1 and Ak and Al, 5 ≤ k ≤ 7, k < l ≤ 8 from parent 2. There are therefore eight main effects, 28 diallelic interactions, 24 triallelic interactions, and 36 tetraallelic interactions in model (1), totaling more than the 36 genotypes available for model fitting. We rewrite model (1) for a full-sib family with indicator variables Xi = 1/0 corresponding to allele Ai present/absent for that individual. Model (1) becomes

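$$Y_G = \mu + \sum_{i} \alpha_i X_i + \sum_{i<j} \beta_{ij} X_i X_j + \sum_{i<j<k} \gamma_{ijk} X_i X_j X_k + \sum_{i<j<k<l} \delta_{ijkl} X_i X_j X_k X_l \tag{2}$$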

There is intrinsic aliasing of {Xi} and higher products, so that some of the parameters are nonestimable. As each individual inherits precisely two alleles from each parent, we have the constraints

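$$X_1 + X_2 + X_3 + X_4 = 2, \qquad X_5 + X_6 + X_7 + X_8 = 2 \tag{3}$$

Substituting these into a model with main effects only gives

$$Y_G = (\mu + 2\alpha_1 + 2\alpha_5) + (\alpha_2 - \alpha_1)X_2 + (\alpha_3 - \alpha_1)X_3 + (\alpha_4 - \alpha_1)X_4 + (\alpha_6 - \alpha_5)X_6 + (\alpha_7 - \alpha_5)X_7 + (\alpha_8 - \alpha_5)X_8 \tag{4}$$

The estimable parameters are (µ + 2α1 + 2α5), α2 - α1, etc. To estimate the individual parameters, we impose the constraints α1 = 0, α5 = 0, sometimes referred to as cornerpoint constraints.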
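The aliasing can be checked numerically. The sketch below assumes the main-effects model Y = µ + Σ αi Xi and the constraint that each offspring carries exactly two alleles from each parent; the parameter values are arbitrary illustrative numbers. It confirms that the reparametrized form, with constant µ + 2α1 + 2α5 and contrasts such as α2 − α1, gives the same genotypic values, and that shifting all parent-1 effects by a constant is absorbed by the intercept.

```python
from itertools import combinations

# The 36 genotypes: two of alleles 1-4 (parent 1) and two of alleles 5-8 (parent 2).
GENOTYPES = [g1 + g2
             for g1 in combinations(range(1, 5), 2)
             for g2 in combinations(range(5, 9), 2)]


def full_model(g, mu, alpha):
    """Main-effects model with all eight allele effects."""
    return mu + sum(alpha[a] for a in g)


def reparametrized(g, mu, alpha):
    """Same model written in terms of the estimable contrasts only."""
    const = mu + 2 * alpha[1] + 2 * alpha[5]
    p1 = sum(alpha[a] - alpha[1] for a in g if a in (2, 3, 4))
    p2 = sum(alpha[a] - alpha[5] for a in g if a in (6, 7, 8))
    return const + p1 + p2


mu = 10.0
alpha = {1: 0.5, 2: 1.5, 3: 1.5, 4: 2.5, 5: -0.25, 6: -1.25, 7: -1.25, 8: -2.25}

# Shift every parent-1 effect by 0.5 and compensate in the intercept:
# this is the cornerpoint solution alpha[1] = 0 for parent 1.
shift = 0.5
alpha_shifted = {a: (v - shift if a <= 4 else v) for a, v in alpha.items()}
mu_shifted = mu + 2 * shift

same_values = all(
    abs(full_model(g, mu, alpha) - reparametrized(g, mu, alpha)) < 1e-12
    and abs(full_model(g, mu, alpha) - full_model(g, mu_shifted, alpha_shifted)) < 1e-12
    for g in GENOTYPES)
```

Because the two parameter sets give identical genotypic values for all 36 genotypes, the data cannot distinguish them, which is why a constraint such as α1 = 0, α5 = 0 is needed.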

There are further constraints on the higher-order terms of Equation 2, for example,

(5)

and similarly for the other parent. In total, it is possible to fit six main effects, 13 diallelic interactions (2 for interactions between alleles from parent 1, 2 for parent 2, and 9 for interactions between alleles from different parents), 12 triallelic interactions, and 4 tetraallelic interactions. These total 35 effects, equal to the degrees of freedom among the 36 genotype means.
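The counts of estimable effects can be verified by computing the rank of the design matrix as terms of increasing order are added. This is a minimal sketch that assumes the indicator-variable formulation with all products of up to four allele indicators as columns; the exact-arithmetic `rank` helper and the names used are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# The 36 genotypes, each stored as the set of allele labels it carries.
GENOTYPES = [set(g1 + g2)
             for g1 in combinations(range(1, 5), 2)
             for g2 in combinations(range(5, 9), 2)]


def rank(matrix):
    """Rank of a matrix by Gaussian elimination in exact rational arithmetic."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(len(rows)):
            if r != rk and rows[r][col] != 0:
                f = rows[r][col] / rows[rk][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[rk])]
        rk += 1
        if rk == len(rows):
            break
    return rk


def design(max_order):
    """Columns are products of allele indicators for every subset up to max_order."""
    subsets = [s for k in range(max_order + 1)
               for s in combinations(range(1, 9), k)]
    return [[int(set(s) <= g) for s in subsets] for g in GENOTYPES]


# Rank with intercept only, then adding main effects, diallelic,
# triallelic, and tetraallelic terms in turn.
ranks = [rank(design(k)) for k in range(5)]
increments = [b - a for a, b in zip(ranks, ranks[1:])]
```

The increments in rank at each order should reproduce the numbers of estimable main effects and diallelic, triallelic, and tetraallelic interactions, with the final rank equal to the number of genotypes.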

When fitting models in the simulation study, we concentrate on the main effects model (4), with cornerpoint constraints on the parameter estimates. However, there is no theoretical difficulty in including higher-order terms.

In practice, the offspring QTL genotypes are unknown, and the conditional probabilities of QTL genotypes must be estimated from the marker information.

A mixture model for QTL mapping:
Here we develop a maximum-likelihood approach for fitting a single QTL model, considering one chromosome at a time. The analysis is an extension of that used by JANSEN (1992). We assume that we have a population of n offspring and that for offspring i we observe trait value yi and marker phenotype data oi for the chromosome. Let qi ∈ Qi denote a possible QTL genotype, where Qi is the set of possible QTL genotypes, and let gi ∈ Gi denote a chromosome configuration, where Gi is the set of chromosome configurations that are compatible with phenotype oi. By "chromosome configurations" we mean the marker genotypes and the parental chromosomes from which the marker alleles come, so that it is clear how the bivalents occurred to form that offspring and where recombinations occurred. We adopt an interval mapping approach, fitting a QTL at a set of locations along the chromosome and maximizing the likelihood for each location as a function of the QTL parameters θ = (µ, α2, α3, α4, α6, α7, α8, σ²), where µ and αi are as in Equation 4 and σ² is the residual variance.

The likelihood of the trait and marker data is

(6)

Now,

(7)

assuming conditional independence of (i) the trait value and the marker data given the QTL genotype and (ii) the QTL genotype and the marker phenotype given the chromosome configuration.

We can maximize the log-likelihood by

(8)

The first term on the right-hand side of (8) does not depend on the QTL parameters θ, and the second term may be written as

(9)

Only the third term of the final sum depends on θ and so contributes to the likelihood equation. The term represents a regression of the trait values on the QTL genotypes, weighted by the conditional probability of each QTL genotype. JANSEN (1992, 1996) discussed how this may be solved by the expectation-maximization (EM) algorithm. The QTL genotype probabilities can be expressed as

(10)

We can therefore use P(qi|gi)P(gi|oi) as initial weights, maximize the likelihood conditional on these, update the QTL genotype probabilities using Equation 10, and repeat until the log-likelihood converges to a maximum. An alternative approach is to calculate the QTL genotype probabilities from the marker data only, regardless of the trait values. In this case we would use a single weighted regression with weights P(qi|gi)P(gi|oi) rather than an iterative process to update the QTL genotype probabilities. We compare both approaches here.
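The iterative scheme can be sketched as a standard EM loop for a normal mixture with known prior weights. The code below is a minimal illustration under stated assumptions, not the authors' Fortran implementation: the marker-based probabilities `prior` are made up (0.7 on the true genotype and 0.3 on a random alternative), the residual variance is set to 4, and the posterior update is the usual Bayes step, prior times normal density, renormalized over genotypes.

```python
import itertools
import math
import random

# The 36 QTL genotypes and their design rows under cornerpoint constraints:
# intercept plus indicators for alleles 2, 3, 4, 6, 7, 8.
GENOTYPES = [g1 + g2
             for g1 in itertools.combinations(range(1, 5), 2)
             for g2 in itertools.combinations(range(5, 9), 2)]
X = [[1.0] + [float(a in g) for a in (2, 3, 4, 6, 7, 8)] for g in GENOTYPES]


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def em_fit(y, prior, n_iter=30):
    """Iterative weighted regression; the first pass uses the prior as weights."""
    n, p = len(y), len(X[0])
    weights = [row[:] for row in prior]
    loglik_trace = []
    for _ in range(n_iter):
        # M-step: weighted least squares over all (individual, genotype) pairs.
        A = [[0.0] * p for _ in range(p)]
        b = [0.0] * p
        for yi, wi in zip(y, weights):
            for xq, w in zip(X, wi):
                if w > 0.0:
                    for r in range(p):
                        b[r] += w * xq[r] * yi
                        for c in range(p):
                            A[r][c] += w * xq[r] * xq[c]
        beta = solve(A, b)
        means = [sum(x * bt for x, bt in zip(xq, beta)) for xq in X]
        var = sum(w * (yi - m) ** 2
                  for yi, wi in zip(y, weights) for m, w in zip(means, wi)) / n
        # E-step: update genotype probabilities using the trait values.
        loglik = 0.0
        for i, yi in enumerate(y):
            joint = [pq * math.exp(-(yi - m) ** 2 / (2 * var))
                     / math.sqrt(2 * math.pi * var)
                     for pq, m in zip(prior[i], means)]
            total = sum(joint)
            loglik += math.log(total)
            weights[i] = [j / total for j in joint]
        loglik_trace.append(loglik)
    return beta, var, loglik_trace


# Hypothetical data: 200 offspring with made-up marker-based probabilities.
random.seed(1)
true_beta = [10.0, 1.0, 1.0, 2.0, -1.0, -1.0, -2.0]
y, prior = [], []
for _ in range(200):
    true_q, alt_q = random.randrange(36), random.randrange(36)
    probs = [0.0] * 36
    probs[true_q] += 0.7
    probs[alt_q] += 0.3
    prior.append(probs)
    y.append(sum(x * b for x, b in zip(X[true_q], true_beta)) + random.gauss(0.0, 2.0))

beta, var, trace = em_fit(y, prior)
```

Stopping after the first pass of the loop corresponds to the noniterative alternative, in which the weights come from the marker data only. The log-likelihood trace should be nondecreasing, which is a convenient check on any implementation of this kind.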

Estimation of QTL genotype probabilities:
In populations such as doubled haploids from inbred diploid lines, every marker is fully informative, and the conditional probability of a QTL genotype depends only on the genotypes of the markers flanking the possible QTL location. When markers are not fully informative, we need to consider the information from all markers on a chromosome to calculate the conditional probabilities of QTL genotypes. The QTL genotype probabilities P(qi|gi)P(gi|oi) factor into two terms, and we consider them separately.

The conditional probability of the chromosome configuration, given the marker phenotypes, P(gi|oi): One way to calculate these probabilities would be to use a hidden Markov model (HMM). These have been widely used for multipoint mapping in diploid species (e.g., LANDER and GREEN 1987; JIANG and ZENG 1997), and XIE and XU (2000) tried to apply them to mapping in tetraploid species under random chromosomal segregation. There are theoretical problems with the latter's approach, however, due to the form used for the matrix of transition probabilities between neighboring markers, and as a consequence their HMM represents multivalent formations (HACKETT 2001).

We preferred to use an alternative method via a "branch and bound" algorithm, where we search for chromosome configurations for each offspring that are compatible with the marker phenotype information, arise from a possible bivalent pairing in each parent, and have the minimum number of crossovers. Appendix B describes the process of reconstruction for one individual from a cross between the two parents shown in Table 1, and Fig 1 illustrates the eight possible chromosome configurations for this individual that are compatible with the phenotypic information and have the minimum number of crossovers (six).
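The idea of the search can be sketched with a simplified depth-first branch and bound. This is an illustration only, not the routine described in Appendix B: it assumes that the configurations compatible with the marker phenotype at each locus have already been enumerated as 4-tuples of chromosome labels (two strands from parent 1, then two from parent 2, in a consistent order), and it enforces bivalent pairing by requiring each strand to use at most two chromosomes, disjoint from those of the other strand from the same parent. The candidate sets in the example are hypothetical.

```python
def min_crossover_paths(candidates):
    """Return the minimum crossover count and all paths that achieve it.

    candidates[k] is the list of 4-tuples allowed at locus k.
    """
    best = {"count": float("inf"), "paths": []}

    def valid(visited):
        # Each strand stays within one bivalent, and the two strands from
        # the same parent come from different bivalents.
        return (all(len(v) <= 2 for v in visited)
                and not (visited[0] & visited[1])
                and not (visited[2] & visited[3]))

    def extend(path, count, visited):
        if count > best["count"]:
            return  # bound: this branch cannot match the best count found
        if len(path) == len(candidates):
            if count < best["count"]:
                best["count"], best["paths"] = count, []
            best["paths"].append(tuple(path))
            return
        for cfg in candidates[len(path)]:
            new_visited = [v | {c} for v, c in zip(visited, cfg)]
            if not valid(new_visited):
                continue
            # Crossovers in this interval = strands whose chromosome changes.
            step = sum(a != b for a, b in zip(path[-1], cfg)) if path else 0
            extend(path + [cfg], count + step, new_visited)

    extend([], 0, [frozenset()] * 4)
    return best["count"], best["paths"]


# Hypothetical candidates at four loci; the second and fourth are ambiguous.
candidates = [
    [(1, 2, 6, 8)],
    [(1, 2, 6, 8), (1, 2, 6, 5)],
    [(1, 2, 6, 5)],
    [(3, 2, 7, 5), (1, 4, 7, 5)],
]
count, paths = min_crossover_paths(candidates)
```

In this toy example the ambiguity at the second locus leaves the position of one crossover uncertain, and the ambiguity at the fourth locus leaves the strand involved uncertain, so several configurations share the minimum crossover count.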



Figure 1. The graphical genotype of an individual from the parents shown in Table 1. Chromosomes 1–4 are from parent P1, and chromosomes 5–8 are from parent P2. The dark and light shadings distinguish the two chromosomes of each bivalent. Hatched areas show where the location of a recombination is uncertain. There are three regions of uncertainty, each with two possible alleles, giving 2^3 = 8 configurations for this individual.


 
Table 1. Typical parental genotypes using codominant markers (simulations A, B, and E)

If there are m marker loci, the recombination frequency between loci i and i + 1 is ri, and there are xi recombinations between them (0 <= xi <= 4), then the probability of a configuration can be calculated as

(11)

This assumes that there is no interference and that recombinations occur independently between different pairs of loci. As all the configurations have the same number of crossovers, the probabilities of the different configurations will be similar unless the marker spacing is very uneven. There is, of course, an approximation here as we are ignoring configurations that have more than the minimum number of crossovers compatible with the phenotypes. However, configurations with more crossovers will have smaller probabilities by Equation 11.
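A minimal sketch of this calculation follows. It assumes that each of the four inherited chromosomes recombines independently in each interval, so that an interval with recombination frequency ri and xi recombinations contributes ri^xi (1 − ri)^(4 − xi); this form is an assumption made for the illustration. The weights are then normalized over the retained configurations. The recombination frequencies and crossover counts are made-up values.

```python
def configuration_weight(r, x):
    """Unnormalized probability of one configuration.

    r[i] is the recombination frequency in interval i and x[i] the number
    of recombinations (0 to 4) that the configuration places there.
    """
    weight = 1.0
    for ri, xi in zip(r, x):
        weight *= ri ** xi * (1.0 - ri) ** (4 - xi)
    return weight


def normalized_probabilities(r, configs):
    """Probabilities of the retained configurations, summing to one."""
    weights = [configuration_weight(r, x) for x in configs]
    total = sum(weights)
    return [w / total for w in weights]


# Evenly spaced markers: moving a crossover to a neighboring interval
# leaves the probability unchanged.
even = normalized_probabilities([0.1, 0.1, 0.1], [[0, 1, 2], [1, 0, 2]])

# Uneven spacing: the crossover is more plausible in the wider interval.
uneven = normalized_probabilities([0.05, 0.2], [[1, 0], [0, 1]])
```

The two cases illustrate the remark in the text: configurations with the same number of crossovers have equal probability when the intervals are equal and differ only when the marker spacing is uneven.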

The probability of the QTL genotype, given the chromosome configuration P(qi|gi): Once we have calculated a set of possible chromosome configurations for each individual, we can identify possible QTL genotypes and calculate their probability for putative QTL locations at a set of positions along the chromosome. We assume that there are no double crossovers between markers. The individual illustrated in Fig 1, for example, has inherited chromosomes 1, 2, 6, and 8 at marker loci L5 and L6 for all configurations. Therefore, for QTL locations between L5 and L6, we assume that the QTL genotype is Q1Q2Q6Q8, with probability 1. This individual has also inherited chromosomes 1, 2, 6, and 5 at locus L7, with a crossover between chromosomes 5 and 8. For QTL locations between L6 and L7, there are two possible QTL genotypes, Q1Q2Q6Q8 and Q1Q2Q6Q5. The probability of the former genotype will decrease, and the probability of the latter will increase, as we consider locations at an increasing distance from L6 and closer to L7. To calculate these probabilities, we assume that crossovers follow a Poisson process with the probability of no crossovers in an interval of M morgans equal to e^(-M) and the probability of one crossover as M e^(-M). If the positions of L6 and L7 are m6 and m7 and we want to calculate the probability associated with a QTL at position mQ between them,

(12)

For many positions the set of possible QTL genotypes will depend upon the configuration; e.g., for positions between loci L8 and L9 (configuration 3275), there will be two QTL genotypes (Q1Q2Q7Q5 and Q3Q2Q7Q5) for configurations where the configuration at L8 is 1275 and four QTL genotypes (Q1Q2Q6Q5, Q1Q2Q7Q5, Q3Q2Q6Q5, and Q3Q2Q7Q5) for configurations where the configuration at L8 is 1265. Crossovers between chromosomes 1 and 3 occur independently of crossovers between chromosomes 6 and 7, and so we have probabilities

(13)

Similarly,

(14)

and

(15)

From these formulas, we can calculate QTL genotype probabilities from Equation 10 and use them for a weighted regression to test for the significance of a QTL at position mQ. A plot of the adjusted coefficient of determination R2a against the position mQ should show a maximum at the true QTL location.
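The behavior of these probabilities within a marker interval can be sketched as follows. Under the Poisson assumption, conditioning on exactly one crossover in the interval between two markers gives P(no crossover before mQ and one after) / P(one crossover in the interval) = (m_right − mQ) / (m_right − m_left), so the crossover position is uniform over the interval. The code applies this to each strand that changes chromosome and multiplies across strands, since crossovers in different bivalents are independent. This is a derivation from the stated assumptions and may differ in form from Equations 12–15; the marker positions in the example are illustrative.

```python
def prob_left_chromosome(m_q, m_left, m_right):
    """P(QTL at m_q carries the chromosome seen at the left marker),
    given exactly one crossover on that strand within the interval."""
    return (m_right - m_q) / (m_right - m_left)


def qtl_genotype_probs(m_q, m_left, m_right, left_cfg, right_cfg):
    """Probabilities of QTL genotypes between two markers.

    left_cfg and right_cfg are 4-tuples of chromosome labels at the
    flanking markers; strands that differ have one crossover between them.
    """
    keep = prob_left_chromosome(m_q, m_left, m_right)
    probs = {(): 1.0}
    for a, b in zip(left_cfg, right_cfg):
        options = [(a, 1.0)] if a == b else [(a, keep), (b, 1.0 - keep)]
        probs = {g + (c,): p * w
                 for g, p in probs.items() for c, w in options}
    return probs


# One crossover (chromosome 8 to 5), QTL a quarter of the way along.
one = qtl_genotype_probs(52.5, 50.0, 60.0, (1, 2, 6, 8), (1, 2, 6, 5))

# Two independent crossovers (1 to 3 and 6 to 7), QTL at the midpoint.
two = qtl_genotype_probs(75.0, 70.0, 80.0, (1, 2, 6, 5), (3, 2, 7, 5))
```

With one crossover there are two possible QTL genotypes, and with two independent crossovers there are four, matching the cases described in the text.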

Simulation study:
A simulation study was carried out to investigate this approach for QTL mapping and to quantify the effects of the marker type, the trait heritability, and the population size. Two parents were simulated initially and these were crossed to give a population of 200 offspring by random chromosomal segregation. The first simulation consisted of one chromosome with 10 codominant markers, spaced at 10-cM intervals. There were five possible alleles (A–E) and a null allele (O) at each marker locus, with equal probability. The parental marker genotypes were simulated by sampling four alleles for each parent from the set of possible alleles, with replacement. One such parental configuration is shown in Table 1. A QTL was assumed to be situated halfway between markers L2 and L3 and to have eight different alleles. The QTL alleles were assumed to have additive effects, of sizes 0, 1, 1, 2, 0, -1, -1, and -2 for alleles Q1–Q8, respectively. The trait values for each offspring were calculated as an overall mean of 10.0, plus the sum of the effects of their alleles, plus an environmental effect distributed as N(0, σ²). The value of σ² was chosen to give the desired trait heritability h² = σ²G/(σ²G + σ²), where σ²G is the genetic variance: σ² = 4.0 and 12.0 correspond to heritabilities of 25 and 10%, respectively. The true QTL genotype of each individual was known, so that it was possible to compare parameter estimates from an unweighted regression on the true QTL genotype with the interval mapping approach of weighted regression on the possible QTL genotypes.
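The stated heritabilities can be checked by enumerating the 36 equally likely QTL genotypes and computing the variance of their genotypic values. The sketch below assumes random chromosomal segregation, so that every pair of parental alleles is equally likely, and uses the allele effects given above; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Additive effects of QTL alleles Q1-Q8 used in the simulation.
EFFECTS = {1: 0, 2: 1, 3: 1, 4: 2, 5: 0, 6: -1, 7: -1, 8: -2}

# Genotypic value (excluding the overall mean) of each of the 36 genotypes.
genotypic_values = [sum(EFFECTS[a] for a in g1 + g2)
                    for g1 in combinations(range(1, 5), 2)
                    for g2 in combinations(range(5, 9), 2)]

n = len(genotypic_values)
mean = Fraction(sum(genotypic_values), n)
var_g = sum((v - mean) ** 2 for v in genotypic_values) / n


def heritability(residual_variance):
    """h^2 = genetic variance / (genetic variance + residual variance)."""
    return var_g / (var_g + residual_variance)


h2_a = heritability(4)    # residual variance used for 25% heritability
h2_b = heritability(12)   # residual variance used for 10% heritability
```

The genetic variance is 4/3, so residual variances of 4.0 and 12.0 give heritabilities of exactly 1/4 and 1/10.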

The above simulation has three random stages: simulation of the parents, simulation of the offspring given the parents, and simulation of the environmental error to add to the genotype values; and the study can be replicated at each stage. To see the effect of each level of replication, 10 pairs of parents were simulated, 20 sets of 200 offspring were simulated for each set of parents, and 20 sets of environmental error were simulated for each set of offspring, giving a total of 4000 sets of marker and trait data for analysis. Simulations A and B had heritabilities of 25 and 10%, respectively.

Most experimental data sets will be a mixture of codominant and dominant marker types, and in general the dominant markers are less informative. A further set of simulations (C) was generated to investigate this. The dominant markers were simulated as a mixture of simplex markers (AOOO x OOOO), duplex markers (AAOO x OOOO), and double-simplex markers (AOOO x AOOO), in the proportion found in potato by MEYER et al. (1998). Other configurations of dominant markers were excluded. Dominant and codominant markers were ordered so that there were two loci with dominant alleles present in each parent between the codominant loci. A typical parental configuration is shown in Table 2. For this configuration, loci L1, L5, L9, and L14 are codominant. Between L1 and L5 there are three dominant markers (one double simplex, one duplex originating from parent 1, and one simplex originating from parent 2). Between L5 and L9 there are another three dominant markers (one double simplex and two duplex, one from each parent) and between L9 and L14 there are four simplex markers, two from each parent. A QTL was positioned halfway between the first and second codominant markers for each configuration. A heritability of 10% was used for set C. A further set D was run to investigate the effect of reducing the population size from 200 to 100 offspring. Set D was otherwise the same as C.
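The expected segregation of the three dominant marker types can be computed by enumerating gametes. This sketch assumes random chromosomal segregation, under which each parent's six two-chromosome gametes are equally likely, and scores an offspring as showing the band if it carries at least one copy of allele A; the function name is illustrative.

```python
from fractions import Fraction
from itertools import combinations


def presence_probability(parent1, parent2, allele="A"):
    """Probability that an offspring carries at least one copy of `allele`."""
    def p_absent(parent):
        # Six equally likely gametes, each a pair of the four homologues.
        pairs = list(combinations(parent, 2))
        return Fraction(sum(allele not in pair for pair in pairs), len(pairs))
    return 1 - p_absent(parent1) * p_absent(parent2)


simplex = presence_probability("AOOO", "OOOO")
duplex = presence_probability("AAOO", "OOOO")
double_simplex = presence_probability("AOOO", "AOOO")
```

The results correspond to presence:absence ratios of 1:1 for simplex, 5:1 for duplex, and 3:1 for double-simplex markers.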


 
Table 2. Typical parental genotypes using codominant and dominant markers (simulations C, D, and F)

We need a threshold for R2a, above which we declare a QTL present. For a single regression on a known QTL genotype in a population of size n, the threshold for declaring significance at a 5% level is the 95% point of an F distribution with 6 and (n - 7) d.f. (Six QTL effects are fitted, rather than eight, due to the constraints discussed in Equation 4.) The F-statistic is related to R2a by

$$R_a^2 = \frac{\nu_1 (F - 1)}{\nu_1 F + \nu_2} \tag{16}$$

where ν1 and ν2 are the numerator and denominator degrees of freedom. For n = 200, the 95% point for the F-statistic is 2.146, corresponding to R2a = 3.3%, and for n = 100, the 95% point for the F-statistic is 2.198, corresponding to R2a = 6.8%. In QTL interval mapping, however, we consider the location with the maximum R2a for a large number of linked positions and this is difficult to establish theoretically. Instead simulation sets E and F were run, with a similar pattern to simulations A and D, but the QTL effects were all set to zero. In this way, we can see how large a value of R2a can be achieved by random variation and establish the distribution of R2a under the null hypothesis of no QTL on the chromosome. If the observed peak of R2a along a profile exceeds the 95% point of the distribution under the null hypothesis, we declare a QTL present with significance p < 0.05.
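The single-regression thresholds quoted above can be reproduced from the standard definitions. With R² = ν1 F / (ν1 F + ν2) and adjusted R² = 1 − (1 − R²)(ν1 + ν2)/ν2, the adjusted coefficient of determination simplifies to ν1 (F − 1) / (ν1 F + ν2). The sketch below applies this with ν1 = 6 and ν2 = n − 7, taking the 95% points of the F distribution from the text rather than computing them.

```python
def adjusted_r2_from_f(f_stat, nu1, nu2):
    """Adjusted R^2 (as a proportion) implied by an F-statistic."""
    return nu1 * (f_stat - 1.0) / (nu1 * f_stat + nu2)


# 95% points of F with 6 and (n - 7) d.f., as quoted in the text.
r2a_n200 = 100.0 * adjusted_r2_from_f(2.146, 6, 200 - 7)
r2a_n100 = 100.0 * adjusted_r2_from_f(2.198, 6, 100 - 7)
```

These give about 3.3% for n = 200 and 6.8% for n = 100, in agreement with the values stated for a regression at a single known position.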

Computing:
All routines were written in Fortran 90 (DIGITAL 1997). The branch and bound algorithm was adapted from one for best subset selection by ROBERTS (1984).


*  RESULTS

Reconstruction of the chromosome configurations:
The reconstruction of the chromosome configurations depends only on the marker data and is not affected by the heritability of the trait. The computer program to calculate the possible configurations had an upper limit of 5000 for each individual so that individuals with >5000 possible configurations were excluded from further analysis. No individuals were excluded, using the codominant markers for simulations A and B. The mean numbers of individuals excluded from simulations C and D were 8.8 (SE 0.93) out of 200 individuals and 4.8 (SE 0.96) out of 100 individuals, respectively.

For the individual illustrated in Fig 1, there are eight possible configurations. These agree, and are the same as the true genotypic configuration, for 37 of the 40 alleles. For the other 3 alleles one-half of the configurations have the simulated allele. The mean proportions of alleles that are correct for every chromosome configuration are summarized in Table 3 for simulations A, B, C, and D. The proportions of correct reconstructions were significantly higher for simulations A and B, using codominant markers only, but the proportions for simulations C and D were still high (0.81).


 
Table 3. The proportion of alleles reconstructed correctly in all chromosome configurations

Interval mapping:
For each trait, we obtained a profile of R2a at a series of positions along the chromosome. Fig 2 shows such a profile for a trait from simulation A, with codominant markers and a heritability of 25%. We took the maximum of the profile to indicate the most likely location of the QTL. Table 4 summarizes the estimates of the QTL effects, R2a, and the residual mean square for the different simulation studies. For each study, the mean and standard errors over 4000 data sets (10 parental combinations x 20 offspring combinations x 20 traits) are presented. The row labeled wt4 is the sum of the probabilities of the true QTL genotype for each individual in the weighted regression step of the EM algorithm. If the true QTL genotype has been inferred for each individual with probability 1, then wt4 is equal to the number of individuals n. Similarly, wt3 is the sum of the probabilities of QTL genotypes that are correct for three-quarters of the alleles. The proportions of simulations indicating a QTL in the interval (5–25 cM) are shown.



Figure 2. Profile of R2a along the chromosome. The triangles show the positions of the marker loci.


 
Table 4. Results of simulation studies

Model fitting by iterative or noniterative weighted regression: As discussed above, the QTL model can be fitted at each position by a single weighted regression or by an iterative process, updating the QTL genotype probabilities using trait information as in Equation 10. The third and fourth columns of Table 4 compare the model parameters from these two methods of model fitting for simulation set A. The estimated value for the constant term was close to the true value in both cases, and the mean position, the proportion of QTL located to the region between 5 and 25 cM, and the total weight of correct and three-quarters correct QTL genotypes were very similar. For all the QTL allele effects, however, the iterative approach gave an estimate close to the true value, while the noniterative approach gave estimates whose absolute values were biased downward. The noniterative approach also underestimated the percentage variance accounted for and overestimated the residual variance. The same pattern was observed for all of the other simulation sets (results not shown). We conclude that the noniterative approach to model fitting is inadequate, and it is not considered further.

The threshold for declaring a QTL present: Simulation sets E and F (codominant markers and n = 200, and dominant and codominant markers and n = 100, respectively, and with all QTL effects equal to zero) were used to investigate the distribution of R2a when no QTL is segregating. For set E, the mean R2a was 2.6 (SE 0.04), and the upper 95% point of the distribution was 6.5. We therefore took R2a = 6.5 as the threshold above which a QTL was declared present for simulations A and B. Further simulations (results not shown) indicated that the same threshold was appropriate for simulation C, with a mixture of codominant and dominant markers. For set F, with 100 individuals, the mean R2a was 5.1 (SE 0.18), and the upper 95% point of the distribution was 13.7. This is the appropriate threshold for simulation set D.

The effect of heritability: The fourth and fifth columns of Table 4 compare simulations with heritabilities of 25 and 10%, respectively. Means and standard errors were calculated over all 4000 data sets in each case. For a heritability of 25%, 3998/4000 data sets had R2a greater than the threshold of 6.5. For a heritability of 10%, 3098/4000 data sets had R2a > 6.5, indicating a significant QTL. The sixth column of Table 4 shows the means and standard deviations for these 3098 significant data sets. The standard error of the QTL location increased as the heritability decreased, although ~80% of the simulations were still found in the interval between 5 and 25 cM. The weight placed on the true QTL genotypes in the weighted regression decreased slightly for the lower heritability, and the weight on the three-quarters correct QTL genotype increased. The estimates of the QTL effects, R2a, and the residual mean square were still close to the true values. However, the mean QTL effects and R2a for all 4000 data sets were consistently slightly lower than the true value, while they were consistently slightly higher for the 3098 significant data sets. The opposite trend was seen for the residual mean square. This is to be expected, as we are excluding the data sets with the lowest R2a from the mean in the case of the significant sets.

The effect of marker type: The effect of the change from all codominant markers (simulation B) to a mixture of codominant and dominant markers (simulation C) can be seen by comparing columns five and seven (all 4000 data sets) and columns six and eight (significant data sets) of Table 4. In each case the heritability is 10%. For simulation C, 2848/4000 data sets had R2a > 6.5, indicating a significant QTL. The proportion of QTL located in the interval (5–25 cM) fell to 0.66 (0.70 for the significant data sets) and the weight given to the correct QTL genotypes decreased. The absolute values of the QTL effects were biased downward for the mean over all 4000 data sets but less so for the significant data sets.

The effect of population size: The last four columns of Table 4 compare the effect of decreasing the population size from 200 to 100 (simulation D), for the situation of 10% heritability and a mixture of codominant and dominant markers. As discussed above, the threshold for declaring a significant QTL with a population of 100 is R2a > 13.7, and 1233/4000 data sets had R2a in this range. The absolute values of the QTL effects for a population of 100 (estimated over all 4000 data sets) were biased downward slightly more than for 200 individuals, and the standard errors were generally slightly larger. The weight given to the correct QTL genotypes decreased to below half of the population size. The mean of the QTL location changed markedly from 18.5 to 25.8 cM. However, an examination of the distribution of locations showed that the distribution was skewed. The median QTL position was 18.0 for all data sets and 16.0 for the significant data sets, closer to the true location. The estimates of the QTL effects for the significant data sets, and especially the mean R2a, were biased upward. This is to be expected, but is a source of potential bias in the analysis of experimental data sets.

Comparison with regression on true QTL genotype: In a simulation study such as this, the true QTL genotype is known and we can compare the parameter estimates from a regression of each trait on the true QTL genotype to that obtained by our interval mapping procedure. An examination of the estimates produced by regression on the true QTL genotypes shows that these estimates varied substantially. To illustrate this, Fig 3 shows the relationship between the estimate of R2a from the interval mapping and that from regression on the true genotype for each set of simulations. The line x = y is shown on each plot. There was a high correlation in each case, decreasing as the heritability, the informativeness of the markers, and the population size decreased.



Figure 3. Plots of the adjusted coefficient of determination R2a from QTL mapping against R2a from regression on the true QTL genotype, for each simulation.


*  DISCUSSION

In this article we proposed and tested a method for interval mapping of QTL in a full-sib population of an autotetraploid species. This method could also be extended for QTL mapping in plant species of higher ploidy. As with diploid species, the precision with which a QTL may be located is affected by the heritability of the trait, the size of the mapping population, and the informativeness of the markers. It is useful to have as high a proportion of codominant markers as possible, both for precision of QTL mapping and for linkage map construction (LUO et al. 2001).

The threshold at which a QTL was declared present was calculated by simulating data sets with QTL effects set to zero and examining the distribution of the adjusted coefficient of determination R2a. Using the 95% point of the distribution of R2a gives a test of significance at a 5% level for the presence of a QTL. Using this threshold, the power to detect QTL varied considerably. For simulation set A, with codominant markers, a heritability of 25% and a population of 200 individuals, the power was >99%. This fell to 77% when the heritability was reduced to 10% (set B), to 71% when a mixture of dominant and codominant markers was used (set C), and to 31% when the population size was reduced to 100 (set D). For set D, the true value of R2a is 10%. The mean value of R2a from all the simulations in set D was 11.1% (SE 0.31), which is lower than the threshold for declaring significance (13.7%). The simulations for which a QTL is declared significant are those with values of R2a in the upper tail of the distribution, which have mean 18.9% (SE 0.14), overestimating the true value. The effects of the QTL alleles are similarly overestimated. This problem is not confined to tetraploid analysis. Simulation studies by BEAVIS (1994) demonstrated the same problem in an F2 population and showed how dramatically the variance accounted for by a QTL could be overestimated as the power to detect a QTL decreased. UTZ et al. (2000) proposed the use of cross-validation to obtain unbiased estimates of the proportion of genotypic variance explained by a QTL. Power calculations and cross-validation could be usefully applied to analysis of experimental tetraploid populations to ensure that QTL detection and estimation are reliable.
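The threshold and power calculations described in this paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the statistic is assumed to be already computed for each simulated data set, and the percentile rule used here (the smallest value with at least 95% of the null values at or below it) is one common convention rather than necessarily the one used in the study. The counts 2848/4000 and 1233/4000 are those reported in the RESULTS.

```python
import math


def empirical_threshold(null_stats, alpha=0.05):
    """Return the (1 - alpha) point of a statistic simulated with no QTL effect."""
    ordered = sorted(null_stats)
    # Rounding guards against floating-point noise in (1 - alpha) * n.
    index = math.ceil(round((1 - alpha) * len(ordered), 9)) - 1
    return ordered[index]


def power(stats, threshold):
    """Proportion of simulated data sets whose statistic exceeds the threshold."""
    return sum(s > threshold for s in stats) / len(stats)


def mean_of_significant(stats, threshold):
    """Mean of the statistic over the data sets declared significant only."""
    significant = [s for s in stats if s > threshold]
    return sum(significant) / len(significant)


# Power for sets C and D, from the counts of significant data sets in the text.
power_set_c = 2848 / 4000  # about 0.71
power_set_d = 1233 / 4000  # about 0.31
```

Because `mean_of_significant` averages only values above the threshold, it can never fall below the threshold itself, which is the selection effect behind the overestimation described for set D.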

We used a novel approach of reconstructing possible chromosome configurations from the observed marker phenotypes for each offspring. The branch and bound algorithm was used to identify configurations with the minimum number of crossovers consistent with the observed data. Configurations that did not come from bivalent pairings were rejected. This analysis was motivated by the need for QTL mapping studies in tetraploid potato. A recent ultrahigh-density diploid genetic linkage map of potato chromosome 1 found that chromatids had experienced 0, 1, or 2 recombination events during meiosis (E. ISIDORE, personal communication); that is, one or two chiasmata per chromosome pair had occurred during the meiosis. The same is likely to be true for chromosomes 2–12, given the lengths of the linkage groups (68–108 cM) found using molecular markers (VAN ECK et al. 1995). Hence it is not surprising that the limited cytological evidence available suggests that bivalents predominate in potato, although low frequencies of quadrivalents, trivalents, and univalents also occur (SWAMINATHAN and HOWARD 1953). As ring and chain quadrivalents can give an equal 2:2 distribution of homologues and hence balanced gametes, we must assume that departures from chromosomal segregation (multivalents and double reduction) will occur at a low frequency, which may or may not be higher for the two long chromosomes of potato (PIJNACKER and FERWERDA 1984). Hence when analyzing real data under the assumption of crossing over restricted to bivalents, some anomalous progeny may occur and need to be eliminated from the analysis and this may affect the power of QTL detection and bias the estimation of QTL effects. However, our computer simulation, based on bivalent pairing, has shown that for some offspring multivalent configurations with fewer than the true number of recombinations could be constructed. It would be incorrect in these simulations to infer multivalent formation for such offspring, and the same problem may arise with real data. Clearly, more theoretical work and much more experimental data will be required to resolve these issues.

The accuracy with which the chromosome configurations were reconstructed depended on the type of markers used. In this study we considered codominant markers (for example, microsatellites) and dominant markers (for example, amplified fragment length polymorphisms). For the codominant markers, the dosages and configurations of alleles were obtained by random sampling with replacement from a maximum of five alleles and a null allele. The proportion of alleles reconstructed correctly was lower for a mixture of dominant and codominant markers than for codominant markers alone. The codominant markers with most alleles, and in particular those with most alleles in simplex configurations, gave offspring phenotypes that could occur in the fewest ways. Such markers are therefore the most useful for chromosome reconstruction. Useful marker information could also be obtained by pyrosequencing single nucleotide polymorphisms to measure dosages of alleles for each offspring, which should be more informative in chromosome reconstruction than presence/absence data.

This simulation study, and in particular the chromosome reconstructions, assumed that the marker order was known without error. This may not be the situation for experimental data, and the reconstruction method could also be used to check and improve the locus ordering. The current strategy (LUO et al. 2001) is to calculate the recombination frequency and LOD score for all possible phases for each pair of markers on a linkage group and to use the recombination frequency and LOD score of the most likely phase for assembling the map of that group. The final step uses the module JMMAP of JoinMap (STAM and VAN OOIJEN 1995). This strategy was tested for a map of simplex and duplex dominant markers by HACKETT et al. (1998) and was found to give locus orders with high rank correlations with the true order for populations of 150 or more offspring. Once the offspring genotypes have been reconstructed on the basis of the order from JoinMap, the order could be investigated in two ways:

  1. Drop each marker in the linkage group in turn and calculate the total number of crossovers for all the offspring. If the omission of any marker reduces the number of crossovers markedly compared to the order for the full group, then try other positions for this marker and reposition it where the total number of crossovers is lowest. Repeat for other markers if necessary.

  2. Examine the distribution of the crossovers for all the offspring and identify individuals with large numbers of crossovers. The marker data corresponding to the crossovers should be checked and corrected where necessary.
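The first of these checks can be sketched as follows. This is a simplified illustration, not the authors' routine: it holds each offspring's reconstructed configurations fixed instead of re-running the reconstruction after a marker is dropped or moved, it counts a recombination whenever a chromosome present at one locus is absent at the next, and the marker names, function names, and four toy offspring are all hypothetical.

```python
def recombinations(a, b):
    """Number of chromosomes in configuration a that are replaced in b."""
    return len(set(a) - set(b))


def total_crossovers(offspring, order):
    """Sum of crossovers over all offspring for a given marker order."""
    return sum(
        recombinations(configs[m1], configs[m2])
        for configs in offspring
        for m1, m2 in zip(order, order[1:])
    )


def crossovers_when_dropped(offspring, order):
    """Total crossovers after omitting each marker in turn."""
    return {
        m: total_crossovers(offspring, [x for x in order if x != m])
        for m in order
    }


def best_reposition(offspring, order, marker):
    """Try the marker at every position and keep the order with fewest crossovers."""
    rest = [x for x in order if x != marker]
    trials = [rest[:i] + [marker] + rest[i:] for i in range(len(rest) + 1)]
    return min(trials, key=lambda o: total_crossovers(offspring, o))


# Hypothetical reconstructed configurations (chromosomes 1-4 from P1, 5-8 from P2).
OFFSPRING = [
    {"M1": (1, 2, 5, 6), "M2": (1, 2, 5, 6), "M3": (3, 2, 5, 6),
     "M4": (3, 2, 5, 6), "M5": (3, 2, 5, 6)},
    {"M1": (1, 2, 5, 6), "M2": (1, 2, 5, 6), "M3": (1, 2, 5, 6),
     "M4": (1, 2, 7, 6), "M5": (1, 2, 7, 6)},
    {"M1": (1, 4, 5, 6), "M2": (1, 4, 5, 8), "M3": (1, 4, 5, 8),
     "M4": (1, 4, 5, 8), "M5": (1, 4, 5, 8)},
    {"M1": (1, 2, 5, 6), "M2": (1, 2, 5, 6), "M3": (1, 2, 5, 6),
     "M4": (1, 2, 5, 6), "M5": (1, 4, 5, 6)},
]

GIVEN_ORDER = ["M1", "M4", "M2", "M3", "M5"]  # M4 deliberately misplaced
dropped = crossovers_when_dropped(OFFSPRING, GIVEN_ORDER)
suspect = min(dropped, key=dropped.get)
improved = best_reposition(OFFSPRING, GIVEN_ORDER, suspect)
```

In this toy case omitting M4 halves the total number of crossovers, and moving it between M3 and M5 gives the order with the fewest crossovers.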

Theoretically, we could also use a computer-intensive search method to search directly for the locus order and parental phases that minimize the total number of crossovers in the offspring. However, there are a very large number of possible orders for a tetraploid cross [m!/2 orders for m loci, and up to (4!)² possible phases at each locus], so it is preferable to use pairwise information to reduce the search space as far as possible. There is a need for further research here.
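To give a sense of scale, the two counts quoted above can be evaluated directly. The short sketch below uses only the formulas stated in the text; the function name is hypothetical, and 10 loci is chosen to match the worked example in APPENDIX B.

```python
import math


def n_orders(m):
    """Number of distinct locus orders for m >= 2 loci (an order and its reverse are the same)."""
    return math.factorial(m) // 2


phases_per_locus = math.factorial(4) ** 2  # up to (4!)^2 phases at each locus
orders_for_10_loci = n_orders(10)
```

For 10 loci this gives 1,814,400 orders, each combined with up to 576 phases per locus, which is why pairwise information is used to narrow the search.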

This analysis was restricted to the case of additive effects of the QTL alleles. However, this model may be too simple. For example, many traits in potato display specific as well as general combining ability (BRADSHAW and MACKAY 1994). Furthermore, BARNES and HANSON (1967) postulated that downy mildew resistance in alfalfa is controlled by a gene that confers resistance only when in a triplex or quadruplex state. In theory, there is no problem in extending the weighted regression model to fit separate means for each of the 36 QTL genotypes but a large population would be required to obtain good estimates of the means. One possible strategy is to use our additive model to locate regions of the genome associated with the trait. The mean trait value for each QTL genotype could then be calculated from the conditional probabilities of the genotypes for putative QTL locations in this region, to examine how they change. A range of models for the gene action (additive effects, simple models for dominance, etc.) could be tested to identify which are compatible with the means. If the additive model is rejected in favor of an alternative model, the interval mapping process should be repeated to see if the most likely QTL region changes. This strategy will be used to explore experimental data from autotetraploid potatoes at the Scottish Crop Research Institute. Computer routines may be obtained by contacting the first author.
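One way to read "the mean trait value for each QTL genotype could then be calculated from the conditional probabilities" is as a probability-weighted mean, in which each offspring contributes to a genotype's mean in proportion to its conditional probability of carrying that genotype. The sketch below follows that reading; it is an assumption, not the authors' routine, and the function name and data layout are hypothetical.

```python
def genotype_means(trait, probs):
    """Probability-weighted mean trait value for each QTL genotype.

    trait: list of n trait values, one per offspring.
    probs: list of n rows; probs[i][g] is the conditional probability that
           offspring i has QTL genotype g at the putative QTL position.
    Returns one mean per genotype, or None where no offspring has any
    weight on that genotype.
    """
    n_genotypes = len(probs[0])
    means = []
    for g in range(n_genotypes):
        weight = sum(row[g] for row in probs)
        total = sum(row[g] * y for row, y in zip(probs, trait))
        means.append(total / weight if weight > 0 else None)
    return means


# Toy example with two genotypes instead of the 36 possible in a tetraploid cross.
example_means = genotype_means([10.0, 20.0, 30.0], [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
```

With 36 genotypes and a few hundred offspring, many genotypes would carry little total weight, which matches the caution in the text that a large population is needed for good estimates.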


*  ACKNOWLEDGMENTS

The computer programs for simulating tetraploid data were written by Dr. Z. W. Luo. This research was supported by a research grant from the United Kingdom Biotechnology and Biological Sciences Research Council and by the Scottish Executive Rural Affairs Department.

Manuscript received May 25, 2001; Accepted for publication September 17, 2001.


*  APPENDIX A

RELATIONSHIP BETWEEN BIOMETRICAL MODELS
BRADSHAW (1994) compares three notations that have been used for the genetic values of tetraploid individuals. All these notations assume that two alleles, and hence five genotypes, are at a locus affecting the trait. Table A1 shows the notation used for each genotype by EASTON (1976) and WRIGHT (1979) and compares this to the expression derived from Equation 1 of this article. From the last column, the average of the two quadruplex genotypes is

We can equate the expression from EASTON (1976) to the difference between the genetic values in the last column and m to give the following correspondences:

We see that d is a function of the additive effects α and higher-order terms, while h, v, and w depend on the diallelic, triallelic, and tetraallelic interactions and higher-order terms, respectively.


 
Table A1. Comparison of biometrical models for the case of two alleles at a QTL


*  APPENDIX B

RECONSTRUCTION OF CHROMOSOME CONFIGURATIONS
Here we demonstrate the reconstruction of the possible chromosomes inherited by an offspring from the cross between parents P1 and P2, with genotypes given in Table 1. The phenotype of this offspring is shown in Table A2.


 
Table A2. Possible chromosome configurations

Consider locus L1, with phenotype ACD. The A allele must have come from chromosome 6 (parent P2). The individual does not have a B or E allele and therefore has not inherited chromosome 2 or 5 at this locus. The possibilities are chromosomes 6 and 7 from P2, together with either (i) chromosomes 1 and 3 from P1 (giving genotype ACDO) or (ii) chromosomes 1 and 4 from P1 (giving genotype ACDD); or chromosomes 6 and 8 from P2, together with either (iii) chromosomes 1 and 4 from P1 (giving genotype ACCD) or (iv) chromosomes 3 and 4 from P1 (giving genotype ACDO). Table A2 shows the chromosome configurations giving rise to the phenotypes of this individual at each locus.
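The enumeration for locus L1 can be reproduced with a short Python sketch. Table 1 is not reproduced in this section, so the allele carried by each parental chromosome at L1 is inferred here from the four genotypes listed above (O denotes the null allele); that assignment, like the function names, is an assumption made for illustration. Each parent is taken to contribute two of its four chromosomes, giving 36 candidate configurations.

```python
from itertools import combinations

# Alleles at L1 on chromosomes 1-4 (P1) and 5-8 (P2), inferred from the
# worked example in the text rather than copied from Table 1.
L1_ALLELES = {1: "C", 2: "B", 3: "O", 4: "D", 5: "E", 6: "A", 7: "D", 8: "C"}


def phenotype(chromosomes, alleles, null="O"):
    """Observed phenotype: the distinct non-null alleles, in sorted order."""
    return "".join(sorted({alleles[c] for c in chromosomes} - {null}))


def compatible_configurations(observed, alleles):
    """All choices of two P1 and two P2 chromosomes that give the observed phenotype."""
    configs = []
    for p1 in combinations((1, 2, 3, 4), 2):
        for p2 in combinations((5, 6, 7, 8), 2):
            if phenotype(p1 + p2, alleles) == observed:
                configs.append(p1 + p2)
    return configs


l1_configs = compatible_configurations("ACD", L1_ALLELES)
```

Under the inferred alleles this returns the same four configurations as the text: chromosomes 1, 3, 6, 7; 1, 4, 6, 7; 1, 4, 6, 8; and 3, 4, 6, 8.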

A branch and bound algorithm is then used to identify the chromosome configurations that give the minimum number of recombinations for the complete linkage group. An initial configuration is found by ordering the configurations according to the number of loci for which they are possible and selecting for each locus the most frequent compatible configuration. For this individual, the configurations 1257 and 2357 are jointly most frequent, each being possible for 5 of the 10 loci. Neither of these is possible for locus L1, however. The initial order is

L1  1 4 6 7

L2  1 2 5 7

L3  1 2 5 7

L4  1 2 5 7

L5  1 4 6 7

L6  3 2 6 8

L7  1 2 5 6

L8  1 2 5 7

L9  3 2 5 7

L10  1 2 5 7

with 12 recombinations. The algorithm searches for configurations with the minimum number of crossovers. It is not necessary to test every combination in Table A2: If a combination for, say, L1–L5 has more recombinations than the current minimum, then this is rejected without considering L6–L10.
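The search described above can be sketched as a depth-first branch and bound. This is a minimal illustration, not the authors' implementation (which the reference list suggests was written in Fortran): a recombination is counted whenever a chromosome present at one locus is absent at the next, a partial path is abandoned as soon as its count exceeds the best complete path found so far, and all paths tied at the minimum are kept. The function names and the small candidate table are hypothetical; the initial order is the one listed above.

```python
def recombinations(a, b):
    """Number of chromosomes in configuration a that are replaced in b."""
    return len(set(a) - set(b))


def path_cost(path):
    """Total recombinations along a sequence of per-locus configurations."""
    return sum(recombinations(a, b) for a, b in zip(path, path[1:]))


def branch_and_bound(candidates):
    """candidates[i] lists the configurations possible at locus i.

    Returns the minimum number of recombinations and every path attaining it.
    """
    best = {"cost": float("inf"), "paths": []}

    def extend(path, cost):
        if cost > best["cost"]:
            return  # bound: this branch cannot match the current minimum
        if len(path) == len(candidates):
            if cost < best["cost"]:
                best["cost"], best["paths"] = cost, []
            best["paths"].append(tuple(path))
            return
        for config in candidates[len(path)]:
            step = recombinations(path[-1], config) if path else 0
            extend(path + [config], cost + step)

    extend([], 0)
    return best["cost"], best["paths"]


# Initial order from the text (loci L1 to L10).
INITIAL_ORDER = [
    (1, 4, 6, 7), (1, 2, 5, 7), (1, 2, 5, 7), (1, 2, 5, 7), (1, 4, 6, 7),
    (3, 2, 6, 8), (1, 2, 5, 6), (1, 2, 5, 7), (3, 2, 5, 7), (1, 2, 5, 7),
]

# Hypothetical candidate table for four loci, for illustration only.
TOY_CANDIDATES = [
    [(1, 4, 6, 7), (1, 4, 6, 8)],
    [(1, 4, 6, 7), (1, 2, 5, 7)],
    [(1, 4, 6, 5)],
    [(1, 2, 6, 5), (3, 2, 6, 8)],
]
toy_cost, toy_paths = branch_and_bound(TOY_CANDIDATES)
```

Counting in this way reproduces the 12 recombinations quoted for the initial order. In practice that count would seed the bound, so that most branches are cut early.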

The minimum number of recombinations for this individual is six, and there are 20 configurations with this minimum. Up to now, the question of whether the configuration may be produced by bivalent pairing has been ignored, but now this is checked for each configuration.

One possible configuration with six recombinations is

L1  1 4 6 7

L2  1 4 6 7

L3  1 4 6 5

L4  1 2 6 5

L5  1 2 6 8

L6  1 2 6 8

L7  1 2 6 5

L8  1 2 7 5

L9  3 2 7 5

L10  3 2 7 5.

If we consider the first four loci, these suggest that the chromosomes are paired as 1 + 3 and 2 + 4 from P1 and 5 + 7 and 6 + 8 from P2. However, L5 and L6 have chromosomes 6 and 8 from P2, and L8–L10 have chromosomes 5 and 7 together. This configuration is rejected as incompatible with bivalent pairing. However, the configuration

L1  1 4 6 8

L2  1 4 6 8

L3  1 4 6 5

L4  1 2 6 5

L5  1 2 6 8

L6  1 2 6 8

L7  1 2 6 5

L8  1 2 7 5

L9  3 2 7 5

L10  3 2 7 5

is compatible throughout with the chromosomes pairing as 1 + 3, 2 + 4, 6 + 7, and 5 + 8. Of the 20 configurations with six recombinations for this individual, 8 were compatible with bivalent pairings. They can be summarized as

L1  1 4 6 8

L2  1 4 6 5/8

L3  1 4 6 5

L4  1 2 6 5/8

L5  1 2 6 8

L6  1 2 6 8

L7  1 2 6 5

L8  1 2 6/7 5

L9  3 2 7 5

L10  3 2 7 5.

This is represented as a graphical genotype in Fig 1. The 8 configurations coincide for 37 of the 40 alleles, but there is uncertainty about the other 3 alleles. For example, there is definitely a recombination between chromosomes 6 and 7 between L7 and L9, but it is uncertain on which side of L8 it occurred. One of the 8 configurations is the same as the simulated genotype for this individual.
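The bivalent check applied to these configurations can be sketched as follows. The criterion coded here is one reading of the text and should be treated as an assumption: a configuration is accepted if each parent's four chromosomes can be split into two pairs such that no locus carries both members of the same pair. Function and variable names are hypothetical; the configurations are those listed above.

```python
from itertools import product

PAIRINGS = {
    "P1": [((1, 2), (3, 4)), ((1, 3), (2, 4)), ((1, 4), (2, 3))],
    "P2": [((5, 6), (7, 8)), ((5, 7), (6, 8)), ((5, 8), (6, 7))],
}


def path_cost(path):
    """Recombinations along a path, counting chromosomes replaced between loci."""
    return sum(len(set(a) - set(b)) for a, b in zip(path, path[1:]))


def compatible_pairings(path, pairings):
    """Pairings for which no locus carries both chromosomes of one bivalent."""
    return [
        pairing for pairing in pairings
        if all(not set(pair) <= set(config) for config in path for pair in pairing)
    ]


def is_bivalent_compatible(path):
    """True if both parents admit at least one consistent pairing."""
    return all(compatible_pairings(path, PAIRINGS[p]) for p in PAIRINGS)


REJECTED = [
    (1, 4, 6, 7), (1, 4, 6, 7), (1, 4, 6, 5), (1, 2, 6, 5), (1, 2, 6, 8),
    (1, 2, 6, 8), (1, 2, 6, 5), (1, 2, 7, 5), (3, 2, 7, 5), (3, 2, 7, 5),
]
ACCEPTED = [(1, 4, 6, 8), (1, 4, 6, 8)] + REJECTED[2:]

# The summary table: loci L2, L4, and L8 each allow two alternatives.
SUMMARY = [
    [(1, 4, 6, 8)], [(1, 4, 6, 5), (1, 4, 6, 8)], [(1, 4, 6, 5)],
    [(1, 2, 6, 5), (1, 2, 6, 8)], [(1, 2, 6, 8)], [(1, 2, 6, 8)],
    [(1, 2, 6, 5)], [(1, 2, 6, 5), (1, 2, 7, 5)], [(3, 2, 7, 5)], [(3, 2, 7, 5)],
]
summary_paths = list(product(*SUMMARY))
```

Under this criterion the first configuration has no admissible pairing for P2, the second is compatible only with 1 + 3, 2 + 4, 6 + 7, and 5 + 8, and expanding the summary gives 8 paths, each with six recombinations.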

If locus L3 was excluded from the analysis, there would be no evidence to establish the first two crossovers between chromosomes 5 and 8. Loci L1, L3, and L5 have unique alleles on chromosome 5, and this individual carries the unique allele from L3 but not from L1 or L5. The true recombinations were between L1 and L2 and between L4 and L5, but the phenotypes observed for L2 and L4 are both compatible with inheriting chromosomes 6 and 8 from parent P2. Without L3, the minimum recombination configuration would have four recombinations, fewer than that simulated.

Occasionally the minimum recombination configurations have fewer than the simulated number of recombinations, but are all incompatible with bivalent pairing. In this case configurations with minimum + 1, minimum + 2, etc., recombinations are considered until compatible configurations are found.


*  LITERATURE CITED

AL-JANABI, S. M., R. J. HONEYCUTT, M. MCCLELLAND, and B. W. S. SOBRAL, 1993 A genetic linkage map of Saccharum spontaneum L. ‘SES 208’. Genetics 134:1249-1260.

BARNES, D. K., and C. H. HANSON, 1967 An illustrated summary of genetic traits in tetraploid and diploid alfalfa. U.S. Department of Agriculture Technical Bulletin 1370. U.S. Department of Agriculture, Washington, DC.

BEAVIS, W. D., 1994 The power and deceit of QTL experiments: lessons from comparative QTL studies, pp. 250–266 in 49th Annual Corn and Sorghum Industry Research Conference. ASTA, Washington, DC.

BRADSHAW, J. E., 1994 Quantitative genetics theory for tetrasomic inheritance, pp. 71–99 in Potato Genetics, edited by J. E. BRADSHAW and G. R. MACKAY. CAB International, Wallingford, Oxon, UK.

BRADSHAW, J. E., and G. R. MACKAY, 1994 Breeding strategies for clonally propagated potatoes, pp. 467–497 in Potato Genetics, edited by J. E. BRADSHAW and G. R. MACKAY. CAB International, Wallingford, Oxon, UK.

BRADSHAW, J. E., C. A. HACKETT, R. C. MEYER, D. MILBOURNE, and J. W. MCNICOL et al., 1998  Identification of AFLP and SSR markers associated with quantitative resistance to Globodera pallida (Stone) in tetraploid potato (Solanum tuberosum subsp. tuberosum) with a view to marker-assisted selection. Theor. Appl. Genet. 97:202-210.

BROUWER, D. J. and T. C. OSBORN, 1999  A molecular marker linkage map of tetraploid alfalfa (Medicago sativa L.). Theor. Appl. Genet. 99:1194-1200.

CALLEN, D. F., A. D. THOMPSON, H. A. Y. SHEN, H. A. PHILLIPS, and R. I. RICHARDS et al., 1993 Incidence and origin of "null" alleles in the (AC)n microsatellite markers. Am. J. Hum. Genet. 52:922-927.

CHURCHILL, G. A. and R. W. DOERGE, 1994 Empirical threshold values for quantitative trait mapping. Genetics 138:963-971.

DA SILVA, J., 1993 A methodology for genome mapping of autopolyploids and its application to sugarcane (Saccharum spp.). Ph.D. Dissertation, Cornell University, Ithaca, NY.

DA SILVA, J. A. G., M. E. SORRELLS, W. L. BURNQUIST, and S. D. TANKSLEY, 1993 RFLP linkage map and genome analysis of Saccharum spontaneum. Genome 36:782-791.

DA SILVA, J., R. J. HONEYCUTT, W. BURNQUIST, S. M. AL-JANABI, and M. E. SORRELLS et al., 1995  Saccharum spontaneum L. ‘SES 208’ genetic linkage map combining RFLP- and PCR-based markers. Mol. Breed. 1:165-179.

DIGITAL, 1997 DIGITAL Fortran Language Reference Manual. Digital Equipment Corporation, Maynard, MA.

DIWAN, N., J. H. BOUTON, G. KOCHERT, and P. B. CREGAN, 2000  Mapping of simple sequence repeat (SSR) DNA markers in diploid and tetraploid alfalfa. Theor. Appl. Genet. 101:165-172.

EASTON, H. S., 1976 Etude comparative d'effects génétique chez des plantes diploïdes et tetraploïdes isogéniques de Festuca pratensis Huds. Thèse de Doctorat d'Etat des Sciences Naturelles, Université de Paris-Sud, Paris, France.

HACKETT, C. A., 2001 A comment on Xie and Xu: ‘mapping quantitative trait loci in tetraploid species.’ Genet. Res. 78:187-189.

HACKETT, C. A., J. E. BRADSHAW, R. C. MEYER, J. W. MCNICOL, and D. MILBOURNE et al., 1998  Linkage analysis in tetraploid species: a simulation study. Genet. Res. 71:143-154.

JANSEN, R. C., 1992  A general mixture model for mapping quantitative trait loci by using molecular markers. Theor. Appl. Genet. 85:252-260.

JANSEN, R. C., 1996 A general Monte Carlo method for mapping multiple quantitative trait loci. Genetics 142:305-311.

JIANG, C. J. and Z-B. ZENG, 1997 Mapping quantitative trait loci with dominant and missing markers in various crosses from two inbred lines. Genetica 101:47-58.

KEMPTHORNE, O., 1957 An Introduction to Genetic Statistics. John Wiley & Sons, New York.

LANDER, E. S. and P. GREEN, 1987 Construction of multilocus genetic-linkage maps in humans. Proc. Natl. Acad. Sci. USA 84:2363-2367.

LUO, Z. W., C. A. HACKETT, J. E. BRADSHAW, J. W. MCNICOL, and D. MILBOURNE, 2000  Predicting parental genotypes and gene segregation for tetrasomic inheritance. Theor. Appl. Genet. 100:1067-1073.

LUO, Z. W., C. A. HACKETT, J. E. BRADSHAW, J. W. MCNICOL, and D. MILBOURNE, 2001 Construction of a genetic linkage map in tetraploid species using molecular markers. Genetics 157:1369-1385.

MEYER, R. C., D. MILBOURNE, C. A. HACKETT, J. E. BRADSHAW, and J. W. MCNICOL et al., 1998 Linkage analysis in tetraploid potato and associations of markers with quantitative resistance to late blight (Phytophthora infestans). Mol. Gen. Genet. 259:150-160.

PIJNACKER, L. P. and M. A. FERWERDA, 1984  Giemsa C-banding of potato chromosomes. Can. J. Genet. Cytol. 26:415-419.

RIPOL, M. I., G. A. CHURCHILL, J. A. G. DA SILVA, and M. SORRELLS, 1999 Statistical aspects of genetic mapping in autopolyploids. Gene 235:31-41.

ROBERTS, S. J., 1984  A branch and bound algorithm for determining the optimal feature subset of given size. Appl. Stat. 33:236-241.

SILLS, G. R., W. BRIDGES, S. M. AL-JANABI, and B. W. S. SOBRAL, 1995  Genetic analysis of agronomic traits in a cross between sugarcane (Saccharum officinarum L.) and its presumed progenitor (S. robustum Brandes and Jesw. ex Grassl). Mol. Breed. 1:355-363.

STAM, P., and J. W. VAN OOIJEN, 1995 JoinMap Version 2.0: Software for the Calculation of Genetic Linkage Maps. CPRO-DLO, Wageningen, The Netherlands.

SWAMINATHAN, M. S. and H. W. HOWARD, 1953  The cytology and genetics of the potato (Solanum tuberosum) and related species. Bibliogr. Genet. 16:1-192.

UKOSIT, K. and P. G. THOMPSON, 1997 Autopolyploidy versus allopolyploidy and low-density randomly amplified polymorphic DNA linkage maps of sweetpotato. J. Am. Soc. Hort. Sci. 122:822-828.

UTZ, H. F., A. E. MELCHINGER, and C. C. SCHON, 2000 Bias and sampling error of the estimated proportion of genotypic variance explained by quantitative trait loci determined from experimental data in maize using cross validation and validation with independent samples. Genetics 154:1839-1849.

VAN ECK, H. J., J. R. VAN DER VOORT, J. DRAAISTRA, P. VAN ZANDVOORT, and E. VAN ENCKEVORT et al., 1995  The inheritance and chromosomal localization of AFLP markers in a noninbred potato offspring. Mol. Breed. 1:397-410.

WRIGHT, A. J., 1979  The use of differential coefficients in the development and interpretation of quantitative genetics models. Heredity 43:1-8.

WU, K. K., W. BURNQUIST, M. E. SORRELLS, T. L. TEW, and P. H. MOORE et al., 1992  The detection and estimation of linkage in polyploids using single-dose restriction fragments. Theor. Appl. Genet. 83:294-300.

XIE, C. and S. XU, 2000 Mapping quantitative trait loci in tetraploid populations. Genet. Res. 76:105-115.

YU, K. F. and K. P. PAULS, 1993  Segregation of random amplified polymorphic DNA markers and strategies for molecular mapping in tetraploid alfalfa. Genome 36:844-851.



