A Model Selection-Based Interval-Mapping Method for Autopolyploids

Cao, Dachuang; Craig, Bruce A; Doerge, R W

doi:10.1534/genetics.104.035410

Abstract

While extensive progress has been made in quantitative trait locus (QTL) mapping for diploid species, similar progress in QTL mapping for polyploids has been limited due to the complex genetic architecture of polyploids. To date, QTL mapping in polyploids has focused mainly on tetraploids with dominant and/or codominant markers. Here, we extend this view to include any even ploidy level under a dominant marker system. Our approach first selects the most likely chromosomal marker configurations using a Bayesian selection criterion and then fits an interval-mapping model to each candidate. Profiles of the likelihood-ratio test statistic and the maximum-likelihood estimates (MLEs) of parameters including QTL effects are obtained via the EM algorithm. Putative QTL are then detected using a resampling-based significance threshold, and the corresponding parental configuration is identified to be the underlying parental configuration from which the data are observed. Although presented via pseudo-doubled backcross experiments, this approach can be readily extended to other breeding systems. Our method is applied to single-dose restriction fragment autotetraploid alfalfa data, and the performance is investigated through simulation studies.

QUANTITATIVE trait loci (QTL) mapping detects and identifies the regions of a genome associated with the variation of a quantitative trait of interest. Molecular markers have been used extensively to construct genetic maps for diploid species (Koornneef et al. 1983; Dietrich et al. 1996) and act as the foundation for QTL analyses such as interval mapping (Lander and Botstein 1989; Knapp et al. 1990; Haley and Knott 1992), composite interval mapping (Zeng 1993, 1994), and multiple-QTL mapping (Jansen 1993). The statistical issues involved in diploid QTL mapping are reviewed in Doerge et al. (1997), Broman (2001), and Doerge (2002).

Polyploids are organisms having more than two complete sets of chromosomes in a cell. Polyploidy is most common in plants, but also found in some insects, amphibians, and reptiles. Some 30–70% of today's angiosperms and at least half of the flowering plants are thought to be polyploid (Soltis and Soltis 1999). Not only is polyploidy important in agriculture, but it is also an important evolutionary force (Soltis and Soltis 2000; Wolfe 2001; Osborn et al. 2003; Soltis et al. 2003).

To describe the current state of QTL mapping in polyploids, they must first be classified according to their homology. A polyploid with genomes all derived from the same species is called an autopolyploid. If the multiple sets of chromosomes are derived from different species, the polyploid is called an allopolyploid. A segmental allopolyploid represents species with homology in between allopolyploids and autopolyploids (Suzuki et al. 1989). For allopolyploids (or disomic polyploids), such as bread wheat and potato, meiosis is usually restricted to the pairing of ancestral parental homologs. Thus, the genetics of allopolyploids are similar to those of diploids, and diploid QTL-mapping methods can be carried out by treating each of the two sets of homologous chromosomes from an ancestral genome as a diploid.

For autopolyploids (or polysomic polyploids), however, the high ploidy level and homology create complications in interpreting the meiotic process. Unlike diploids whose two homologs always pair during meiosis, autopolyploids may undergo either bivalent pairing (two chromosomes pair) or multivalent pairing (more than two chromosomes pair) (Muller 1916; Newton and Darlington 1929; Darlington 1931; Sybenga 1995). Furthermore, the manner in which the paired chromosomes segregate during meiosis, especially for multivalent pairing, varies in different species (Mather 1938; Rieseberg and Doyle 1989; Jackson and Jackson 1996). Finally, the marker genotypes in autopolyploids are not always identifiable (i.e., the banding patterns are not unique since not all the alleles can be differentiated by any one marker system). Even if we are able to identify the different alleles of one locus, the number of copies (dosage) of each allele and the linkage phases (i.e., the arrangement of alleles on the informative chromosomes) between loci remain unknown.

There are two different approaches for dealing with the multiple copies of one locus for autopolyploids. With a dominant marker system, each locus is treated as biallelic. For this case, Wu et al. (1992) estimated a genetic map for autopolyploids with simplex markers, or single-dose restriction fragment (SDRF) markers, which represent only one homolog and segregate 1:1 in the progeny. Ripol et al. (1999) extended the method of Wu et al. (1992) to any dominant marker with an unobservable dosage level. With codominant markers each locus is treated as multiallelic (Luo et al. 2000, 2001; Hackett et al. 2001). Each band observed on a gel is viewed as a different allele for that locus. This approach is valid for some marker systems, such as simple sequence repeats (SSRs) that do not require the use of DNA digestion by restriction enzymes and whose motifs have high variation in the number of repeat units. However, since the use of SSRs may be responsible for inaccurate allele identification, Rodzen and May (2002) suggested scoring multiallelic SSR markers as individual dominant markers, unless the markers' underlying modes of inheritance are known, simply because different loci may have different inheritance patterns. Because the availability of codominant markers for autopolyploids is relatively rare, and dominant markers (e.g., SDRF markers) have the advantages of being rich in plants and easily scored, the remainder of this article is based on the dominant marker system.

An additional advantage of using SDRF markers is that the only unknown factor is the linkage phase (coupling or repulsion) between the SDRFs and this can be estimated by a goodness-of-fit test. In turn, the maximum-likelihood estimate (MLE) of the recombination between SDRF markers can be calculated on the basis of the estimated linkage phase. Using SDRFs, Brouwer and Osborn (1999) estimated the first composite genetic map for tetraploid alfalfa. When considering multiple-dose markers, Ripol et al. (1999) first estimated marker dosages and linkage phase and then constructed the genetic map by computing the maximum-likelihood estimate of recombination fraction on the basis of estimated parental marker configurations. Doerge and Craig (2000) extended polyploid QTL methodology by developing an algorithmic model selection process for a single-marker QTL analysis with dominant markers for autopolyploids with any even ploidy level. Recently, Luo et al. (2004) addressed both bivalent and quadrivalent chromosomal pairing during meiosis by employing a statistical framework for the analysis of tetrasomic linkage.

As a continuation of the work by Doerge and Craig (2000), we propose an interval-mapping method by employing an available genetic map to increase the power of detecting and estimating QTL locations within an autopolyploid bivalent pairing framework. For simplicity, and in keeping with this previous work, we base our work on a pseudo-doubled backcross experiment (Grattapaglia and Sederoff 1994) and employ a model selection-based interval mapping to simultaneously estimate model parameters, which include QTL location, parental marker dosages, parental QTL configuration, and QTL effect. Initially, we estimate the parental marker dosages using progeny marker data to reduce the number of potential parental configurations. On the basis of each putative parental configuration, interval mapping is performed to estimate QTL location and to calculate maximum-likelihood estimates of QTL effect and population variation that are obtained using the expectation maximization (EM) algorithm (Dempster et al. 1977). We limit our approach to any even ploidy level with multiple-dose dominant markers since odd ploidy levels are often associated with high infertility. After describing the methodology, our method is applied to single-dose restriction fragment alfalfa data with an estimated linkage map from Brouwer and Osborn (1999). Several putative QTL have been identified as associated with winter hardiness traits in bivalent tetraploid alfalfa and act as the basis of simulation studies that investigate the performance of our approach.

METHODS

Our polyploid QTL-mapping methodology is developed for organisms undergoing bivalent pairing during meiosis. Under the bivalent pairing assumption, the pairing of the chromosomes can range from preferential pairing (always pairing with the same homologous chromosome) to random pairing (equally likely to pair with any other homologous chromosome). The data analysis and simulation are based on random pairing.

Let n denote the number of progeny and k denote the ploidy level. In what follows, for each locus, uppercase is used to denote both the locus name and its dominant allele, and lowercase represents the recessive allele (e.g., M₁ and m₁). The dosage of a locus refers to the number of copies of the dominant allele at that locus. For example, a locus dosage of two refers to two copies of the dominant allele at that locus. When a marker is observed, at least one copy of the dominant allele for that marker is present.

Marker data (presence/absence) and the quantitative trait data are measured on the F₁ progeny. These progeny are the result of a pseudo-doubled backcross experiment (Figure 1)

Figure 1.—

A pseudo-doubled backcross experiment for a tetraploid with two loci. Each locus is assumed to be biallelic: M1/m1 and M2/m2, where M1 and m1 denote two alleles of the first locus and M1 is the dominant allele, and similarly for M2 and m2. For each locus, there are two copies of the dominant allele in the informative parent P1. P2 is the noninformative doubled haploid produced by doubling a haploid of P1, which contains only recessive alleles.

Open in new tab Download slide

A pseudo-doubled backcross experiment for a tetraploid with two loci. Each locus is assumed to be biallelic: M₁/m₁ and M₂/m₂, where M₁ and m₁ denote two alleles of the first locus and M₁ is the dominant allele, and similarly for M₂ and m₂. For each locus, there are two copies of the dominant allele in the informative parent P₁. P₂ is the noninformative doubled haploid produced by doubling a haploid of P₁, which contains only recessive alleles.

, where an informative parent P₁ is crossed with a noninformative parent P₂. The dosage at each locus in the parents is at least one for the informative parent and zero for the noninformative parent. Furthermore, at most half of the chromosomes of the informative parent contain dominant alleles. Chromosomes that contain dominant alleles in the informative parent are called informative chromosomes. The pseudo-doubled backcross design is equivalent to doubling the half of the noninformative chromosomes of the informative parent to produce the noninformative parent and then crossing them to get F₁. An example of a pseudo-doubled backcross experiment for a tetraploid (k = 4) with two marker loci M₁ and M₂ is shown in Figure 1. In this case, the dosage for both markers is two in the informative parent. A pseudo-doubled backcross population can also be developed from diploids by choosing two homozygous parents carrying different alleles, doubling each of them to produce tetraploids, crossing them to produce an F₁, and then crossing the F₁ to the parent containing recessive alleles.

Interval mapping:

Consider two flanking markers, M₁ and M₂, and one QTL, Q, with marker genetic distance or recombination fraction provided by an available genetic map. Once the parental configuration c is known, interval mapping can be applied to locate QTL. A parental configuration includes loci dosages and linkage phase. If the dominant alleles of two loci are on the same chromosome, they are in coupling phase; otherwise, they are in repulsion phase. On the basis of this, the five respective configurations for a tetraploid with two markers under a pseudo-doubled backcross design are called double coupling (Figure 2A)

Figure 2.—

Possible parental marker configurations for a tetraploid with two marker loci (M1/m1 and M2/m2) under a pseudo-doubled backcross design. (A) Double coupling. (B) Asymmetric coupling 12. (C) Asymmetric coupling 21. (D) Coupling. (E) Repulsion.

Open in new tab Download slide

Possible parental marker configurations for a tetraploid with two marker loci (M₁/m₁ and M₂/m₂) under a pseudo-doubled backcross design. (A) Double coupling. (B) Asymmetric coupling 12. (C) Asymmetric coupling 21. (D) Coupling. (E) Repulsion.

, asymmetric coupling 12 (Figure 2B), asymmetric coupling 21 (Figure 2C), coupling (Figure 2D), and repulsion (Figure 2E). Similarly, there are three possible QTL configurations: (1) one dose of the QTL located on the first informative chromosome, (2) one dose of the QTL on the second informative chromosome, and (3) two doses of the QTL on both informative chromosomes. In what follows, configurations A–E are used to denote a marker configuration for a tetraploid as defined above. A marker configuration letter followed by a QTL configuration number represents a complete parental configuration, such as “A1,” which represents a parental configuration with markers in double coupling and one QTL on the first informative chromosome.

For the ith individual, the observed data are x_i = (y_i, I_i), where y_i is the trait value, and I_i is the indicator vector for marker presence (i.e., I_ih = 1 if marker M_h is present and 0 otherwise). Assume for each QTL dosage level j (j = 0, 1, … , k/2) that the trait y has a normal distribution with mean μ_j and common variance σ² for all the dosages of QTL. The vector θ = (μ = (μ₀, μ₁, … , μ_k_/2), σ) is used to denote model parameters. At any fixed putative position of the QTL the likelihood function of the trait is a mixture of normal distributions (Equation 1) under the assumption of one QTL,

\begin{eqnarray*}&&L\left(\mathrm{\mathbf{{\theta}}}|\mathbf{x},\ c\right)\ =\ {{\prod}_{i=1}^{n}}\ {{\sum}_{j=0}^{k/2}}P\left(d_{Q}\ =\ j|\mathbf{I}_{i},\ c\right)\mathrm{{\phi}}\left(y_{i}{;}\ \mathrm{{\mu}}_{j},\ \mathrm{{\sigma}}^{2}\right)\\&&{\triangleq}{{\prod}_{i=1}^{n}}\ {{\sum}_{j=0}^{k/2}}\mathrm{{\omega}}_{j|i}\mathrm{{\phi}}\left(y_{i}{;}\ \mathrm{{\mu}}_{j},\ \mathrm{{\sigma}}^{2}\right),\end{eqnarray*}

1

where

\(P\left(d_{Q}\ =\ j|I_{i},\ c\right)\ {\triangleq}\mathrm{{\omega}}_{j|i}\)

is the probability of the ith progeny having dosage j of the QTL given marker presence status I_i, and φ(y_i; μ_j, σ²) denotes the normal density function valued at y_i with mean μ_j and variance σ². The ω_j_|_i's are a function of parental configuration c and the recombination fractions between the QTL and markers.

In the interval-mapping framework (Lander and Botstein 1989), the putative QTL is considered at incremental positions in an interval defined by adjacent markers, and a test statistic (commonly the log-likelihood ratio) is evaluated. Let R_h (h = 1, 2) denote the true recombination fractions between marker M_h and QTL. At each evaluation position, let r_h (h = 1, 2) denote the recombination fractions between markers defining the interval and the current QTL position. The log-likelihood-ratio test statistic is calculated to test the hypothesis

\begin{eqnarray*}&&\mathrm{H}_{0}:{\ }{\ }R_{h}\ =\ 0.5,\ h\ =\ 1,\ 2\\&&\mathrm{H}_{\mathrm{a}}:{\ }{\ }R_{h}\ =\ r_{h},\ h\ =\ 1,\ 2.\end{eqnarray*}

The null hypothesis assumes that the QTL is present, but unlinked to both markers, while the alternative hypothesis assumes that the QTL is present and linked to M₁ and M₂ with recombination fractions r_h. The log-likelihood-ratio test statistic LRT is

\[\mathrm{LRT}\ =\ {-}2\ \mathrm{ln}\frac{L\left(\mathrm{\mathbf{{\hat{{\theta}}}}}_{0},\ R_{1}\ =\ R_{2}\ =\ 0.5\right)}{L\left(\mathrm{\mathbf{{\hat{{\theta}}}}}_{\mathrm{a}},\ R_{1}\ =\ r_{1},\ R_{2}\ =\ r_{2}\right)},\]

where θ̂₀ and θ̂_a are MLEs under the null and alternative hypotheses, respectively. If the test statistic is significant then the corresponding interval position with the largest LRT statistic is considered as the estimated QTL position. A permutation test can be performed to estimate the significance threshold for the test statistic (Churchill and Doerge 1994).

Because the summation format of the likelihood function (1) causes difficulty in finding the MLE of θ, the complete likelihood function, containing both observable and unobservable data, is considered,

\[L\left(\mathrm{\mathbf{{\theta}}}|\mathbf{x}\right)\ =\ {{\prod}_{i=1}^{n}}\ {{\prod}_{j=0}^{k/2}}\left[\mathrm{{\omega}}_{j|i}\mathrm{{\phi}}\left(y_{i}{;}\ \mathrm{{\mu}}_{j},\ \mathrm{{\sigma}}^{2}\right)\right]^{z_{ij}},\]

2

and the EM algorithm (Dempster et al. 1977) is employed to estimate θ. Here, the unobservable data

\(\left\{z_{i}\ =\ \left(z_{i0},\ {\ldots}\ ,\ z_{i,k/2}\right)\right\}^{n}_{i=1}\)

⁠, where z_ij = 1 if the QTL dosage for the ith progeny is j and 0 otherwise, are indicator vectors representing progeny QTL dosages.

To implement the EM algorithm, the E step and M step are iterated until a convergence criterion of the log-likelihood (e.g., log L^t⁺¹ − log L^t < 10⁻⁶) is satisfied. In the E step, the missing data z_ij are estimated on the basis of the observable data, and, in the M step, the MLE of θ are calculated using the estimated z_ij. After t iterations, estimated values of z_ij^t, μ^t, and σ^t are used at step t + 1 such that the z_ij^t+1 are estimated in the E step by

\begin{eqnarray*}&&z^{t+1}_{ij}\ =\ P_{\mathrm{{\theta}}}\left(d_{Q}\ =\ j|\mathbf{x}_{i},\ c,\ \mathrm{\mathbf{{\mu}}}^{t},\ \mathrm{{\sigma}}^{t}\right)\\&&=\ \frac{\mathrm{{\omega}}_{j|i}\ \mathrm{exp}\left\{{-}{1}/{2}\left(\left(y_{i}\ {-}\ \mathrm{{\mu}}^{t}_{j}\right)/\mathrm{{\sigma}}^{t}\right)^{2}\right\}}{{\sum}^{k/2}_{l=0}\mathrm{{\omega}}_{l|i}\ \mathrm{exp}\left\{{-}{1}/{2}\left(\left(y_{i}\ {-}\ \mathrm{{\mu}}^{t}_{l}\right)/\mathrm{{\sigma}}^{t}\right)^{2}\right\}}.\end{eqnarray*}

Given z_ij^t+1, the MLE of θ = (μ, σ) is obtained (M step) by maximizing (2),

\begin{eqnarray*}&&\mathrm{{\mu}}^{t+1}_{j}\ =\ \frac{{\sum}^{n}_{i=1}z^{t+1}_{ij}y_{i}}{{\sum}^{n}_{i=1}z^{t+1}_{ij}},j\ =\ 0,\ 1,\ {\ldots}\ ,\ k/2{;}\\&&\mathrm{{\sigma}}^{t+1}\ =\ \sqrt{\ \frac{1}{n}\ {{\sum}_{i=1}^{n}}\ {{\sum}_{j=0}^{k/2}}z^{t+1}_{ij}\left(y_{i}\ {-}\ \mathrm{{\mu}}^{t+1}_{j}\right)^{2}}.\end{eqnarray*}

Since the estimation procedure is based on a specific parental configuration (model), the model space expands quickly as the marker number and/or ploidy level increase. Enumerating each possible model for the purpose of exploring which is most likely to produce the observed data and then estimating parameters on the basis of the most likely model(s) is computationally challenging. For example, under a pseudo-doubled backcross design with two markers and one QTL, there are 14 possible parental configurations for a tetraploid, 91 for a hexaploid (k = 6), and 390 for an octaploid (k = 8). To reduce these computations, we employ a model reduction step prior to parameter estimation.

Model reduction:

To reduce the model space, one can consider one of two approaches: either a marginal or a joint statistical method. Under the marginal method, we calculate the posterior probability of dosages for each marker individually. If one particular dosage level has a posterior probability higher than a specified cutoff, only that marker dosage is considered in candidate parental configurations. Otherwise, we select the most likely dosage levels until the sum of posterior probabilities exceeds the cutoff and these dosages are used to form the candidate parental configuration set. The joint method, on the other hand, takes a similar approach, but information on all markers is considered jointly. In this case, parental marker configuration posterior probabilities are calculated directly, and candidate parental configurations are chosen using the same posterior probability rule as described for the marginal method.

Given the marginal marker presence/absence distribution (Ripol et al. 1999), the posterior probabilities for parental marker dosages of each marker are calculated using Bayes' rule (Bayes 1783). For one marker M, the number of progeny with M absent, n_⊘, has a binomial distribution Bin(n, p_{d_M}), where d_M denotes the marker dosage of M in the informative parent. Under the pseudo-double backcross design,

\[p_{d_{M}}\ =\ \left(\begin{array}{l}k\ {-}\ d_{M}\\k/2\end{array}\right)/\left(\begin{array}{l}k\\k/2\end{array}\right)\]

for random pairing. Thus, given the informative parent marker dosage d_M,

\[P\left(N_{{\oslash}}\ =\ n_{{\oslash}}|n,\ d_{M}\right)\ {\triangleq}\mathrm{Bin}\left(n_{{\oslash}}{;}\ n,\ d_{M}\right)\ =\ \left(\begin{array}{l}n\\n_{{\oslash}}\end{array}\right)p^{n_{{\oslash}}}_{d_{M}}\left(1\ {-}\ p_{d_{M}}\right)^{n{-}n_{{\oslash}}}.\]

Let π denote the prior distribution of marker dosage d_M. The posterior probability of each dosage level is

\begin{eqnarray*}&&P\left(d_{M}|n,\ n_{{\oslash}}\right)\ =\ \frac{\mathrm{Bin}\left(n_{{\oslash}}{;}\ n,\ p_{d_{M}}\right)\mathrm{{\pi}}\left(d_{M}\right)}{{\sum}^{k/2}_{d=1}\mathrm{Bin}\left(n_{{\oslash}}{;}\ n,\ p_{d}\right)\mathrm{{\pi}}\left(d\right)}\\&&=\ \frac{p^{n_{{\oslash}}}_{d_{M}}\left(1\ {-}\ p_{d_{M}}\right)^{n{-}n_{{\oslash}}}\mathrm{{\pi}}\left(d_{M}\right)}{{\sum}^{k/2}_{d=1}p^{n_{{\oslash}}}_{d}\left(1\ {-}\ p_{d}\right)^{n{-}n_{{\oslash}}}\mathrm{{\pi}}\left(d\right)}.\end{eqnarray*}

As an alternative method to the described marginal approach, we use the joint marker information and genetic map to calculate the posterior probability of a parental marker configuration. Given markers M₁ and M₂, all progeny can be classified into four sets according to the presence status of the two markers. Let n_obs = (n₀₀, n₀₁, n₁₀, n₁₁) denote the observed frequency vector in the four sets, where the hth (h = 1, 2) subscript is 1 if M_h is present and 0 otherwise. Assuming no segregation distortion and a parental configuration c, the distribution of n_obs follows a multinomial distribution with probability parameter vector

\(\mathbf{p}^{c}\ =\ \left(p^{c}_{00},\ p^{c}_{01},\ p^{c}_{10},\ p^{c}_{11}\right)\)

⁠. Table 1

TABLE 1

Multinomial distribution probability parameters

\(\mathbf{p}^{\mathbf{c}}\ \mathbf{=}\ \left(\mathbf{p}^{\mathbf{c}}_{\mathbf{00}}\mathbf{,\ p}^{\mathbf{c}}_{\mathbf{01}}\mathbf{,\ p}^{\mathbf{c}}_{\mathbf{10}}\mathbf{,\ p}^{\mathbf{c}}_{\mathbf{11}}\right)\)

in a tetraploid with two markers under random pairing

	(M₁M₂\|M₁M₂)^a	(M₁M₂\|M₁)	(M₁M₂\|M₂)	(M₁\|M₂)	(M₂\|M₁)
p₀₀^c	\(\frac{\mathit{r}^{2}\ {-}\ 2\mathit{r}\ +\ 1}{6}\)	\(\frac{1\ {-}\ \mathit{r}}{6}\)	\(\frac{1\ {-}\ \mathit{r}}{6}\)	\(\frac{1\ {-}\ \mathit{r}}{2}\)	\(\frac{1\ +\ \mathit{r}}{6}\)
p₀₁^c	\(\frac{{-}\mathit{r}^{2}\ +\ 2\mathit{r}}{6}\)	\(\frac{2\ +\ \mathit{r}}{6}\)	\(\frac{\mathit{r}}{6}\)	\(\frac{\mathit{r}}{2}\)	\(\frac{2\ {-}\ \mathit{r}}{6}\)
p₁₀^c	\(\frac{{-}\mathit{r}^{2}\ +\ 2\mathit{r}}{6}\)	\(\frac{\mathit{r}}{6}\)	\(\frac{2\ +\ \mathit{r}}{6}\)	\(\frac{\mathit{r}}{2}\)	\(\frac{2\ {-}\ \mathit{r}}{6}\)
p₁₁^c	\(\frac{\mathit{r}^{2}\ {-}\ 2\mathit{r}\ +\ 5}{6}\)	\(\frac{3\ {-}\ \mathit{r}}{6}\)	\(\frac{3\ {-}\ \mathit{r}}{6}\)	\(\frac{1\ {-}\ \mathit{r}}{2}\)	\(\frac{1\ +\ \mathit{r}}{6}\)

	(M₁M₂\|M₁M₂)^a	(M₁M₂\|M₁)	(M₁M₂\|M₂)	(M₁\|M₂)	(M₂\|M₁)
p₀₀^c	\(\frac{\mathit{r}^{2}\ {-}\ 2\mathit{r}\ +\ 1}{6}\)	\(\frac{1\ {-}\ \mathit{r}}{6}\)	\(\frac{1\ {-}\ \mathit{r}}{6}\)	\(\frac{1\ {-}\ \mathit{r}}{2}\)	\(\frac{1\ +\ \mathit{r}}{6}\)
p₀₁^c	\(\frac{{-}\mathit{r}^{2}\ +\ 2\mathit{r}}{6}\)	\(\frac{2\ +\ \mathit{r}}{6}\)	\(\frac{\mathit{r}}{6}\)	\(\frac{\mathit{r}}{2}\)	\(\frac{2\ {-}\ \mathit{r}}{6}\)
p₁₀^c	\(\frac{{-}\mathit{r}^{2}\ +\ 2\mathit{r}}{6}\)	\(\frac{\mathit{r}}{6}\)	\(\frac{2\ +\ \mathit{r}}{6}\)	\(\frac{\mathit{r}}{2}\)	\(\frac{2\ {-}\ \mathit{r}}{6}\)
p₁₁^c	\(\frac{\mathit{r}^{2}\ {-}\ 2\mathit{r}\ +\ 5}{6}\)	\(\frac{3\ {-}\ \mathit{r}}{6}\)	\(\frac{3\ {-}\ \mathit{r}}{6}\)	\(\frac{1\ {-}\ \mathit{r}}{2}\)	\(\frac{1\ +\ \mathit{r}}{6}\)

^a

Only dominant alleles are shown for informative chromosomes, separated by |.

r, estimated recombination fraction between the two markers.

Open in new tab

TABLE 1

Multinomial distribution probability parameters

\(\mathbf{p}^{\mathbf{c}}\ \mathbf{=}\ \left(\mathbf{p}^{\mathbf{c}}_{\mathbf{00}}\mathbf{,\ p}^{\mathbf{c}}_{\mathbf{01}}\mathbf{,\ p}^{\mathbf{c}}_{\mathbf{10}}\mathbf{,\ p}^{\mathbf{c}}_{\mathbf{11}}\right)\)

in a tetraploid with two markers under random pairing

	(M₁M₂\|M₁M₂)^a	(M₁M₂\|M₁)	(M₁M₂\|M₂)	(M₁\|M₂)	(M₂\|M₁)
p₀₀^c	\(\frac{\mathit{r}^{2}\ {-}\ 2\mathit{r}\ +\ 1}{6}\)	\(\frac{1\ {-}\ \mathit{r}}{6}\)	\(\frac{1\ {-}\ \mathit{r}}{6}\)	\(\frac{1\ {-}\ \mathit{r}}{2}\)	\(\frac{1\ +\ \mathit{r}}{6}\)
p₀₁^c	\(\frac{{-}\mathit{r}^{2}\ +\ 2\mathit{r}}{6}\)	\(\frac{2\ +\ \mathit{r}}{6}\)	\(\frac{\mathit{r}}{6}\)	\(\frac{\mathit{r}}{2}\)	\(\frac{2\ {-}\ \mathit{r}}{6}\)
p₁₀^c	\(\frac{{-}\mathit{r}^{2}\ +\ 2\mathit{r}}{6}\)	\(\frac{\mathit{r}}{6}\)	\(\frac{2\ +\ \mathit{r}}{6}\)	\(\frac{\mathit{r}}{2}\)	\(\frac{2\ {-}\ \mathit{r}}{6}\)
p₁₁^c	\(\frac{\mathit{r}^{2}\ {-}\ 2\mathit{r}\ +\ 5}{6}\)	\(\frac{3\ {-}\ \mathit{r}}{6}\)	\(\frac{3\ {-}\ \mathit{r}}{6}\)	\(\frac{1\ {-}\ \mathit{r}}{2}\)	\(\frac{1\ +\ \mathit{r}}{6}\)

	(M₁M₂\|M₁M₂)^a	(M₁M₂\|M₁)	(M₁M₂\|M₂)	(M₁\|M₂)	(M₂\|M₁)
p₀₀^c	\(\frac{\mathit{r}^{2}\ {-}\ 2\mathit{r}\ +\ 1}{6}\)	\(\frac{1\ {-}\ \mathit{r}}{6}\)	\(\frac{1\ {-}\ \mathit{r}}{6}\)	\(\frac{1\ {-}\ \mathit{r}}{2}\)	\(\frac{1\ +\ \mathit{r}}{6}\)
p₀₁^c	\(\frac{{-}\mathit{r}^{2}\ +\ 2\mathit{r}}{6}\)	\(\frac{2\ +\ \mathit{r}}{6}\)	\(\frac{\mathit{r}}{6}\)	\(\frac{\mathit{r}}{2}\)	\(\frac{2\ {-}\ \mathit{r}}{6}\)
p₁₀^c	\(\frac{{-}\mathit{r}^{2}\ +\ 2\mathit{r}}{6}\)	\(\frac{\mathit{r}}{6}\)	\(\frac{2\ +\ \mathit{r}}{6}\)	\(\frac{\mathit{r}}{2}\)	\(\frac{2\ {-}\ \mathit{r}}{6}\)
p₁₁^c	\(\frac{\mathit{r}^{2}\ {-}\ 2\mathit{r}\ +\ 5}{6}\)	\(\frac{3\ {-}\ \mathit{r}}{6}\)	\(\frac{3\ {-}\ \mathit{r}}{6}\)	\(\frac{1\ {-}\ \mathit{r}}{2}\)	\(\frac{1\ +\ \mathit{r}}{6}\)

^a

Only dominant alleles are shown for informative chromosomes, separated by |.

r, estimated recombination fraction between the two markers.

Open in new tab

lists p^c for a tetraploid. The probability of observing n_obs = (n₀₀, n₀₁, n₁₀, n₁₁) is

\[P\left(\mathbf{n}_{\mathrm{obs}}|n,\ c\right)\ {\triangleq}\mathrm{Multinomial}\left(\mathbf{n}_{\mathrm{obs}}{;}\ n,\ c\right)\ =\ \frac{n!}{{\prod}^{1}_{i,j=0}n_{ij}!}{{\prod}_{i,j=0}^{1}}\left\{p^{c}_{ij}\right\}^{n_{ij}}.\]

Let π denote the prior distribution of marker configuration c ∈ C. The posterior probability for a specific parental marker configuration c₀ ∈ C is

\begin{eqnarray*}&&P\left(c_{0}|n,\ \mathbf{n}_{\mathrm{obs}}\right)\ =\ \frac{\mathrm{Multinomial}\left(\mathbf{n}_{\mathrm{obs}}{;}\ n,\ c_{0}\right)\mathrm{{\pi}}\left(c_{0}\right)}{{\sum}_{c{\in}C}\ \mathrm{Multinomial}\left(\mathbf{n}_{\mathrm{obs}}{;}\ n,\ c\right)\mathrm{{\pi}}\left(c\right)}\\&&=\ \frac{{\prod}^{1}_{i,j=0}\left\{p^{c_{0}}_{ij}\right\}^{n_{ij}}\mathrm{{\pi}}\left(c_{0}\right)}{{\sum}_{c{\in}C}{\prod}^{1}_{i,j=0}\left\{p^{c}_{ij}\right\}^{n_{ij}}\mathrm{{\pi}}\left(c\right)}.\end{eqnarray*}

Among the many criteria for assessing the worth of a model reduction method are ease of implementation, a high probability of selecting the correct model, and efficiency in reducing the size of the candidate model space. Even though the marginal method is easy to implement and less computationally expensive, the joint method directly estimates the posterior probability of parental marker configurations. One might expect that because there may be multiple marker configurations that have the same marker dosages, especially if the marker dosages are low and ploidy high, the joint method will be more efficient in reducing the model space.

A simulation study was performed to compare the performance of the marginal and joint methods for a tetraploid under random pairing. We chose two markers and marker genetic distance varied from 0.1 to 0.5 M with 0.1-M increments. Sample sizes ranged from 50 to 500 with increments of 50. For each simulation setting, 10,000 data sets were generated. The cutoff probability was set to be 0.90 for both methods. Without any prior information concerning the parental marker configuration, we assumed all the possible parental marker configurations to be equally likely (i.e., the prior for the joint method is discrete uniform). Under this assumption, the prior distribution of a marker dosage is not uniform since a lower dosage is more frequent than a higher dosage. Among the five possible parental marker configurations (Figure 2, A–E) for a tetraploid, dosage 1 has a frequency of 3/5 and dosage 2 has a frequency of 2/5 for each marker locus. In turn, this frequency distribution was used as the prior of marker dosage for the marginal method.

The proportion of data sets for which the correct configuration was included in the candidate configuration space, p_inc, was recorded (Table 2)

TABLE 2

Estimated p_inc, probability of selecting the correct configuration in the candidate configuration (model) space, for a tetraploid with two markers, 10,000 simulated data sets, and a cutoff of 0.90 for both the marginal and joint model reduction methods under random pairing

	n
	50			100			150
	0.1^a	0.3	0.5	0.1	0.3	0.5	0.1	0.3	0.5
Marginal
(M₁M₂\|M₁M₂)	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000
(M₁M₂\|m₁M₂)	0.992	0.992	0.991	0.999	0.999	0.999	1.000	1.000	0.999
(M₁M₂\|M₁m₂)	0.992	0.992	0.993	0.999	0.999	0.999	1.000	0.999	0.999
(M₁M₂\|m₁m₂)	0.988	0.987	0.985	0.998	0.999	0.999	1.000	0.999	1.000
(m₁M₂\|M₁m₂)	0.983	0.987	0.987	0.998	0.998	0.997	0.999	0.999	0.999
Joint
(M₁M₂\|M₁M₂)	0.999	0.999	0.999	1.000	1.000	1.000	1.000	1.000	1.000
(M₁M₂\|m₁M₂)	0.999	0.999	0.998	1.000	1.000	1.000	1.000	1.000	1.000
(M₁M₂\|M₁m₂)	1.000	0.999	0.998	1.000	1.000	0.999	1.000	1.000	1.000
(M₁M₂\|m₁m₂)	0.999	0.998	0.988	1.000	1.000	0.999	1.000	1.000	0.999
(m₁M₂\|M₁m₂)	0.999	0.999	0.992	1.000	0.999	0.999	1.000	1.000	0.999

	n
	50			100			150
	0.1^a	0.3	0.5	0.1	0.3	0.5	0.1	0.3	0.5
Marginal
(M₁M₂\|M₁M₂)	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000
(M₁M₂\|m₁M₂)	0.992	0.992	0.991	0.999	0.999	0.999	1.000	1.000	0.999
(M₁M₂\|M₁m₂)	0.992	0.992	0.993	0.999	0.999	0.999	1.000	0.999	0.999
(M₁M₂\|m₁m₂)	0.988	0.987	0.985	0.998	0.999	0.999	1.000	0.999	1.000
(m₁M₂\|M₁m₂)	0.983	0.987	0.987	0.998	0.998	0.997	0.999	0.999	0.999
Joint
(M₁M₂\|M₁M₂)	0.999	0.999	0.999	1.000	1.000	1.000	1.000	1.000	1.000
(M₁M₂\|m₁M₂)	0.999	0.999	0.998	1.000	1.000	1.000	1.000	1.000	1.000
(M₁M₂\|M₁m₂)	1.000	0.999	0.998	1.000	1.000	0.999	1.000	1.000	1.000
(M₁M₂\|m₁m₂)	0.999	0.998	0.988	1.000	1.000	0.999	1.000	1.000	0.999
(m₁M₂\|M₁m₂)	0.999	0.999	0.992	1.000	0.999	0.999	1.000	1.000	0.999

^a

Marker genetic distance expressed in morgans (M).

n, sample size.

Open in new tab

TABLE 2

Estimated p_inc, probability of selecting the correct configuration in the candidate configuration (model) space, for a tetraploid with two markers, 10,000 simulated data sets, and a cutoff of 0.90 for both the marginal and joint model reduction methods under random pairing

	n
	50			100			150
	0.1^a	0.3	0.5	0.1	0.3	0.5	0.1	0.3	0.5
Marginal
(M₁M₂\|M₁M₂)	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000
(M₁M₂\|m₁M₂)	0.992	0.992	0.991	0.999	0.999	0.999	1.000	1.000	0.999
(M₁M₂\|M₁m₂)	0.992	0.992	0.993	0.999	0.999	0.999	1.000	0.999	0.999
(M₁M₂\|m₁m₂)	0.988	0.987	0.985	0.998	0.999	0.999	1.000	0.999	1.000
(m₁M₂\|M₁m₂)	0.983	0.987	0.987	0.998	0.998	0.997	0.999	0.999	0.999
Joint
(M₁M₂\|M₁M₂)	0.999	0.999	0.999	1.000	1.000	1.000	1.000	1.000	1.000
(M₁M₂\|m₁M₂)	0.999	0.999	0.998	1.000	1.000	1.000	1.000	1.000	1.000
(M₁M₂\|M₁m₂)	1.000	0.999	0.998	1.000	1.000	0.999	1.000	1.000	1.000
(M₁M₂\|m₁m₂)	0.999	0.998	0.988	1.000	1.000	0.999	1.000	1.000	0.999
(m₁M₂\|M₁m₂)	0.999	0.999	0.992	1.000	0.999	0.999	1.000	1.000	0.999

	n
	50			100			150
	0.1^a	0.3	0.5	0.1	0.3	0.5	0.1	0.3	0.5
Marginal
(M₁M₂\|M₁M₂)	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000
(M₁M₂\|m₁M₂)	0.992	0.992	0.991	0.999	0.999	0.999	1.000	1.000	0.999
(M₁M₂\|M₁m₂)	0.992	0.992	0.993	0.999	0.999	0.999	1.000	0.999	0.999
(M₁M₂\|m₁m₂)	0.988	0.987	0.985	0.998	0.999	0.999	1.000	0.999	1.000
(m₁M₂\|M₁m₂)	0.983	0.987	0.987	0.998	0.998	0.997	0.999	0.999	0.999
Joint
(M₁M₂\|M₁M₂)	0.999	0.999	0.999	1.000	1.000	1.000	1.000	1.000	1.000
(M₁M₂\|m₁M₂)	0.999	0.999	0.998	1.000	1.000	1.000	1.000	1.000	1.000
(M₁M₂\|M₁m₂)	1.000	0.999	0.998	1.000	1.000	0.999	1.000	1.000	1.000
(M₁M₂\|m₁m₂)	0.999	0.998	0.988	1.000	1.000	0.999	1.000	1.000	0.999
(m₁M₂\|M₁m₂)	0.999	0.999	0.992	1.000	0.999	0.999	1.000	1.000	0.999

^a

Marker genetic distance expressed in morgans (M).

n, sample size.

Open in new tab

for both methods. For the marginal method, this refers to selecting the correct marker dosages for both marker loci, while for the joint method it refers to selecting the correct marker configuration. The proportion of data sets for which a unique configuration was selected, p_uni, was also recorded as a measurement of the efficiency for reducing the model space (Table 3)

TABLE 3

Estimated p_uni, probability of selecting a unique configuration in the candidate configuration (model) space, for a tetraploid with two markers, 10,000 simulated data sets, and a cutoff of 0.90 for both the marginal and joint model reduction methods under random pairing

	n
	50			100			150
	0.1^a	0.3	0.5	0.1	0.3	0.5	0.1	0.3	0.5
Marginal
(M₁M₂\|M₁M₂)	0.997	0.996	0.995	1.000	1.000	1.000	1.000	1.000	1.000
(M₁M₂\|m₁M₂)	0.901	0.904	0.910	0.991	0.992	0.991	0.999	0.999	0.999
(M₁M₂\|M₁m₂)	0.903	0.908	0.900	0.991	0.991	0.989	0.999	0.999	0.999
(M₁M₂\|m₁m₂)	0.860	0.842	0.841	0.983	0.982	0.979	0.997	0.998	0.998
(m₁M₂\|M₁m₂)	0.813	0.821	0.820	0.982	0.978	0.982	0.998	0.998	0.998
Joint
(M₁M₂\|M₁M₂)	0.992	0.989	0.983	0.999	1.000	0.999	1.000	1.000	0.999
(M₁M₂\|m₁M₂)	0.999	0.991	0.979	1.000	0.999	0.999	1.000	1.000	1.000
(M₁M₂\|M₁m₂)	0.999	0.989	0.978	1.000	1.000	0.999	1.000	1.000	1.000
(M₁M₂\|m₁m₂)	0.991	0.981	0.867	0.999	0.999	0.981	1.000	1.000	0.997
(m₁M₂\|M₁m₂)	0.999	0.987	0.900	1.000	0.999	0.985	1.000	1.000	0.997

	n
	50			100			150
	0.1^a	0.3	0.5	0.1	0.3	0.5	0.1	0.3	0.5
Marginal
(M₁M₂\|M₁M₂)	0.997	0.996	0.995	1.000	1.000	1.000	1.000	1.000	1.000
(M₁M₂\|m₁M₂)	0.901	0.904	0.910	0.991	0.992	0.991	0.999	0.999	0.999
(M₁M₂\|M₁m₂)	0.903	0.908	0.900	0.991	0.991	0.989	0.999	0.999	0.999
(M₁M₂\|m₁m₂)	0.860	0.842	0.841	0.983	0.982	0.979	0.997	0.998	0.998
(m₁M₂\|M₁m₂)	0.813	0.821	0.820	0.982	0.978	0.982	0.998	0.998	0.998
Joint
(M₁M₂\|M₁M₂)	0.992	0.989	0.983	0.999	1.000	0.999	1.000	1.000	0.999
(M₁M₂\|m₁M₂)	0.999	0.991	0.979	1.000	0.999	0.999	1.000	1.000	1.000
(M₁M₂\|M₁m₂)	0.999	0.989	0.978	1.000	1.000	0.999	1.000	1.000	1.000
(M₁M₂\|m₁m₂)	0.991	0.981	0.867	0.999	0.999	0.981	1.000	1.000	0.997
(m₁M₂\|M₁m₂)	0.999	0.987	0.900	1.000	0.999	0.985	1.000	1.000	0.997

^a

Marker genetic distance expressed in morgans (M).

n, sample size.

Open in new tab

TABLE 3

Estimated p_uni, probability of selecting a unique configuration in the candidate configuration (model) space, for a tetraploid with two markers, 10,000 simulated data sets, and a cutoff of 0.90 for both the marginal and joint model reduction methods under random pairing

	n
	50			100			150
	0.1^a	0.3	0.5	0.1	0.3	0.5	0.1	0.3	0.5
Marginal
(M₁M₂\|M₁M₂)	0.997	0.996	0.995	1.000	1.000	1.000	1.000	1.000	1.000
(M₁M₂\|m₁M₂)	0.901	0.904	0.910	0.991	0.992	0.991	0.999	0.999	0.999
(M₁M₂\|M₁m₂)	0.903	0.908	0.900	0.991	0.991	0.989	0.999	0.999	0.999
(M₁M₂\|m₁m₂)	0.860	0.842	0.841	0.983	0.982	0.979	0.997	0.998	0.998
(m₁M₂\|M₁m₂)	0.813	0.821	0.820	0.982	0.978	0.982	0.998	0.998	0.998
Joint
(M₁M₂\|M₁M₂)	0.992	0.989	0.983	0.999	1.000	0.999	1.000	1.000	0.999
(M₁M₂\|m₁M₂)	0.999	0.991	0.979	1.000	0.999	0.999	1.000	1.000	1.000
(M₁M₂\|M₁m₂)	0.999	0.989	0.978	1.000	1.000	0.999	1.000	1.000	1.000
(M₁M₂\|m₁m₂)	0.991	0.981	0.867	0.999	0.999	0.981	1.000	1.000	0.997
(m₁M₂\|M₁m₂)	0.999	0.987	0.900	1.000	0.999	0.985	1.000	1.000	0.997

	n
	50			100			150
	0.1^a	0.3	0.5	0.1	0.3	0.5	0.1	0.3	0.5
Marginal
(M₁M₂\|M₁M₂)	0.997	0.996	0.995	1.000	1.000	1.000	1.000	1.000	1.000
(M₁M₂\|m₁M₂)	0.901	0.904	0.910	0.991	0.992	0.991	0.999	0.999	0.999
(M₁M₂\|M₁m₂)	0.903	0.908	0.900	0.991	0.991	0.989	0.999	0.999	0.999
(M₁M₂\|m₁m₂)	0.860	0.842	0.841	0.983	0.982	0.979	0.997	0.998	0.998
(m₁M₂\|M₁m₂)	0.813	0.821	0.820	0.982	0.978	0.982	0.998	0.998	0.998
Joint
(M₁M₂\|M₁M₂)	0.992	0.989	0.983	0.999	1.000	0.999	1.000	1.000	0.999
(M₁M₂\|m₁M₂)	0.999	0.991	0.979	1.000	0.999	0.999	1.000	1.000	1.000
(M₁M₂\|M₁m₂)	0.999	0.989	0.978	1.000	1.000	0.999	1.000	1.000	1.000
(M₁M₂\|m₁m₂)	0.991	0.981	0.867	0.999	0.999	0.981	1.000	1.000	0.997
(m₁M₂\|M₁m₂)	0.999	0.987	0.900	1.000	0.999	0.985	1.000	1.000	0.997

^a

Marker genetic distance expressed in morgans (M).

n, sample size.

Open in new tab

. In general, both methods performed better with a larger sample size and shorter marker distances. With sample size ≥100, both p_inc and p_uni were ∼1.0 for all the simulation settings. With a small sample size, the extra information from the genetic map added strength to the joint method both in selecting the correct configuration and in reducing the model space. On the basis of the simulation results, the joint model reduction method is recommended for data analysis.

Once the model reduction step is completed, the selected submodels are used to form the pool of candidate parental marker configurations by combining these submodels with possible parental QTL configurations. After this, the likelihood function is constructed for each candidate parental configuration, and the EM algorithm is applied to maximize the likelihood function and estimate parameters. The parental configuration that has the maximal likelihood is then identified as the parental configuration from which the observed data have arisen.

DATA ANALYSIS

The proposed approach was employed to analyze a published alfalfa SDRF data set with traits related to winter hardiness (Brouwer and Osborn 1999). Alfalfa is an autotetraploid that undergoes bivalent random pairing (Bingham and McCoy 1988; Cao et al. 2004). Winter hardiness is a complex trait and one of the most important adaptations for alfalfa. Because there is no direct measurement of winter hardiness, related traits including winter injury (WI), fall growth (FG), freezing injury (FI), and unifoliate internode length (UIL) were measured in addition to survival percentage of plants (SURV). SURV was measured by the percentage of each plant that survived in the winter (0–100%). FG was measured by height of vertical regrowth in centimeters in early October in two successive years, 1995 and 1996; FI was measured by absorbance at 265 nm (ABS) and electrical conductivity per gram fresh weight (COND) in 1995 and 1996; WI scaled 1–5 (1, no injury; 5, dead) was measured in 1996 and 1997; and UIL was measured for all seedlings. The histogram plots of these traits can be referred to Figure 1 in Brouwer et al. (2000).

Two genotypes, B17 and P13, representing the extremes for each trait were crossed, and a single F₁ plant was crossed to each parent to create two populations of 101 individuals each. Both populations were scored for 82 SDRF loci and measured for each trait in 2 years of replicated field trials. The composite map and the original analysis of the trait data and marker data can be found in Brouwer and Osborn (1999) and Brouwer et al. (2000). On the basis of the genetic map, seven linkage groups were identified and corresponded to seven out of the eight chromosomes. As such in what follows, the terms for chromosome and linkage group are used interchangeably.

Because only B17 markers were considered for the P13 × F₁ backcross population and similarly only P13 markers for the B17 × F₁ backcross, each backcross population is equivalent to a pseudo-doubled backcross population. Our approach was applied to identify regions in the genome related to the previously described quantitative traits for both backcross populations based on the composite map. The average over the replicates in each year for each measurement was treated as a quantitative trait, and the two averages were analyzed separately. A permutation test (Churchill and Doerge 1994) with 1000 permutations for a 0.05 significance level was used to detect a significant QTL.

Seven QTL were detected for FG, two for FI, one for SURV, and one for WI in the backcross to B17; and one QTL was detected for UIL in the backcross to P13 (Table 4)

TABLE 4

Estimated locations and effects of QTL associated with winter hardiness and its related traits for autotetraploid alfalfa

Trait	Year	Chr	Marker interval	Pos	d̂₁	d_Q	μ̂^a	σ̂
FG^b	1995	1	uwg284–vg2b9	0.31	0.31	2	12.24, 21.45, 24.61	2.61
	1996	1	uwg284–vg2b9	0.34	0.34	2	12.51, 19.18, 19.29	3.17
	1995	5	vg2a2–vg2e5	0.498	0.001	2	12.30, 21.65, 23.75	2.82
	1996	5	hg2b12–vg2a2	0.456	0.05	2	12.70, 18.48, 22.08	2.96
	1996	5	mtsc35–ugac473	0.29	0.02	2	15.23, 18.46, 20.90	3.63
	1996	8	ugac291–ugac540	0.187	0.10	2	13.02, 18.60, 21.53	3.08
	1996	8	ugac109–uwg169	0.704	0.06	2	15.57, 18.95, 19.95	3.62
COND^c	1996	6	ugac281–vg2c2	0.356	0.09	2	52.42, 59.05, 94.60	13.30
	1996	8	ugc291–ugac540	0.247	0.16	2	50.91, 58.36, 92.89	12.64
SURV^d	1995	8	ugac291–ugac540	0.187	0.10	1	0.54, 0.14	0.229
WI^e	1996	8	ugc291–ugac540	0.217	0.13	2	2.66, 4.40, 4.82	0.513
UIL	—	8	ugac235–ugac291	0.001	0.001	2	9.34, 3.35, 12.74	1.847

Trait	Year	Chr	Marker interval	Pos	d̂₁	d_Q	μ̂^a	σ̂
FG^b	1995	1	uwg284–vg2b9	0.31	0.31	2	12.24, 21.45, 24.61	2.61
	1996	1	uwg284–vg2b9	0.34	0.34	2	12.51, 19.18, 19.29	3.17
	1995	5	vg2a2–vg2e5	0.498	0.001	2	12.30, 21.65, 23.75	2.82
	1996	5	hg2b12–vg2a2	0.456	0.05	2	12.70, 18.48, 22.08	2.96
	1996	5	mtsc35–ugac473	0.29	0.02	2	15.23, 18.46, 20.90	3.63
	1996	8	ugac291–ugac540	0.187	0.10	2	13.02, 18.60, 21.53	3.08
	1996	8	ugac109–uwg169	0.704	0.06	2	15.57, 18.95, 19.95	3.62
COND^c	1996	6	ugac281–vg2c2	0.356	0.09	2	52.42, 59.05, 94.60	13.30
	1996	8	ugc291–ugac540	0.247	0.16	2	50.91, 58.36, 92.89	12.64
SURV^d	1995	8	ugac291–ugac540	0.187	0.10	1	0.54, 0.14	0.229
WI^e	1996	8	ugc291–ugac540	0.217	0.13	2	2.66, 4.40, 4.82	0.513
UIL	—	8	ugac235–ugac291	0.001	0.001	2	9.34, 3.35, 12.74	1.847

^a

μ = (μ₀, μ₁, … , μ_{d_Q}) is the estimated mean vector.

^b

The trait fall growth (FG) was measured by height of vertical regrowth in centimeters in early October in 2 successive years, 1995 and 1996.

^c

Electrical conductivity per gram fresh weight (COND) was measured in 1995 and 1996 to represent freezing injury (FI).

^d

Winter survival percentage of plants (SURV) was measured in 1995 and 1996 to represent winter hardiness.

^e

Winter injury (WI) was scaled 1–5 (1, no injury; 5, dead) and was measured in 1996 and 1997.

Data are from Brouwer and Osborn (1999). Significant QTL were detected by experiment-wise thresholds calculated from 1000 permutation tests at the 0.05 significance level. Chr, chromosome number; Pos, estimated composite map position of the QTL; d̂₁, genetic distance between the detected QTL and left end marker of the interval; d_Q, estimated QTL dosage; σ, estimated common standard deviation of the trait. UIL, unifoliate internode length.

Open in new tab

TABLE 4

Estimated locations and effects of QTL associated with winter hardiness and its related traits for autotetraploid alfalfa

Trait	Year	Chr	Marker interval	Pos	d̂₁	d_Q	μ̂^a	σ̂
FG^b	1995	1	uwg284–vg2b9	0.31	0.31	2	12.24, 21.45, 24.61	2.61
	1996	1	uwg284–vg2b9	0.34	0.34	2	12.51, 19.18, 19.29	3.17
	1995	5	vg2a2–vg2e5	0.498	0.001	2	12.30, 21.65, 23.75	2.82
	1996	5	hg2b12–vg2a2	0.456	0.05	2	12.70, 18.48, 22.08	2.96
	1996	5	mtsc35–ugac473	0.29	0.02	2	15.23, 18.46, 20.90	3.63
	1996	8	ugac291–ugac540	0.187	0.10	2	13.02, 18.60, 21.53	3.08
	1996	8	ugac109–uwg169	0.704	0.06	2	15.57, 18.95, 19.95	3.62
COND^c	1996	6	ugac281–vg2c2	0.356	0.09	2	52.42, 59.05, 94.60	13.30
	1996	8	ugc291–ugac540	0.247	0.16	2	50.91, 58.36, 92.89	12.64
SURV^d	1995	8	ugac291–ugac540	0.187	0.10	1	0.54, 0.14	0.229
WI^e	1996	8	ugc291–ugac540	0.217	0.13	2	2.66, 4.40, 4.82	0.513
UIL	—	8	ugac235–ugac291	0.001	0.001	2	9.34, 3.35, 12.74	1.847

Trait	Year	Chr	Marker interval	Pos	d̂₁	d_Q	μ̂^a	σ̂
FG^b	1995	1	uwg284–vg2b9	0.31	0.31	2	12.24, 21.45, 24.61	2.61
	1996	1	uwg284–vg2b9	0.34	0.34	2	12.51, 19.18, 19.29	3.17
	1995	5	vg2a2–vg2e5	0.498	0.001	2	12.30, 21.65, 23.75	2.82
	1996	5	hg2b12–vg2a2	0.456	0.05	2	12.70, 18.48, 22.08	2.96
	1996	5	mtsc35–ugac473	0.29	0.02	2	15.23, 18.46, 20.90	3.63
	1996	8	ugac291–ugac540	0.187	0.10	2	13.02, 18.60, 21.53	3.08
	1996	8	ugac109–uwg169	0.704	0.06	2	15.57, 18.95, 19.95	3.62
COND^c	1996	6	ugac281–vg2c2	0.356	0.09	2	52.42, 59.05, 94.60	13.30
	1996	8	ugc291–ugac540	0.247	0.16	2	50.91, 58.36, 92.89	12.64
SURV^d	1995	8	ugac291–ugac540	0.187	0.10	1	0.54, 0.14	0.229
WI^e	1996	8	ugc291–ugac540	0.217	0.13	2	2.66, 4.40, 4.82	0.513
UIL	—	8	ugac235–ugac291	0.001	0.001	2	9.34, 3.35, 12.74	1.847

^a

μ = (μ₀, μ₁, … , μ_{d_Q}) is the estimated mean vector.

^b

The trait fall growth (FG) was measured by height of vertical regrowth in centimeters in early October in 2 successive years, 1995 and 1996.

^c

Electrical conductivity per gram fresh weight (COND) was measured in 1995 and 1996 to represent freezing injury (FI).

^d

Winter survival percentage of plants (SURV) was measured in 1995 and 1996 to represent winter hardiness.

^e

Winter injury (WI) was scaled 1–5 (1, no injury; 5, dead) and was measured in 1996 and 1997.

Data are from Brouwer and Osborn (1999). Significant QTL were detected by experiment-wise thresholds calculated from 1000 permutation tests at the 0.05 significance level. Chr, chromosome number; Pos, estimated composite map position of the QTL; d̂₁, genetic distance between the detected QTL and left end marker of the interval; d_Q, estimated QTL dosage; σ, estimated common standard deviation of the trait. UIL, unifoliate internode length.

Open in new tab

. Among the seven QTL detected for FG, two were mapped on chromosome 1, three on chromosome 5, and two on chromosome 8. Two pairs of QTL were detected at close map positions across 2 years: on chromosome 1, one QTL was detected at 0.31 M for the 1995 average and one at 0.34 M for the 1996 average; on chromosome 5, one QTL was detected at 0.498 M for the 1995 average and one at 0.456 M for the 1996 average. Two QTL were detected on chromosomes 6 and 8 for freezing injury FI. One QTL was detected on chromosome 8 for SURV and WI. On chromosome 8, in the region from 0.187 to 0.247 M, four QTL were detected for multiple traits, fall growth (FG 1996), freezing injury (COND 1996), winter hardiness (SURV 1995), and winter injury (1996); and two of them were at the same location, 0.187 M. The one QTL for UIL in the P13 backcross was at the top of chromosome 8. All of the detected QTL were identified to have dosage two, except for the QTL for winter hardiness (SURV). In general, an additional copy of a QTL was found to be associated with a higher quantitative trait mean, except for the QTL for SURV and UIL.

Brouwer et al. (2000) used a regression-based single-marker analysis to identify markers associated with the previously described quantitative traits. Multiple regression analyses were performed using the same data for all 82 SDRFs to identify the best polygenic models for each trait in each backcross population. With the backcross to B17, the SDRF multiple regression models include six SDRFs for FG (1995), six for FG (1996), five for FI (COND 1996), and four for WI (1996) [Table 5

TABLE 5

Estimated locations of QTL affecting traits related to winter hardiness in Brouwer et al. (2000): (A) fall growth (FG) 1995 average; (B) fall growth (FG) 1996 average; (C) freezing injury (FI: COND) 1996 average; (D) winter injury (WI) 1996 average

A			B			C			D
Chr	Marker	Pos	Chr	Marker	Pos	Chr	Marker	Pos	Chr	Marker	Pos
1	vg2b9	35.8^a	1	vg2b9	35.8^a	1	vg2b9	35.8^b	2	vg0c5	23.0^c
1	mtsc9	51.4	3	mtsc14	46.1^b	1	mtsc9	51.4^d	4	ugac671	15.3
3	uwg96	24.3^b	5	mtsc35b	27.0^a	3	ugac224	23.1^d	8	ugac540	30.9^d
3	mtsc14	46.1^c	5	ugac473	29.3^a	8	ugac540	30.9^c	8	ugac191	63.7
5	vg2a2	49.8^a	8	ugac540	30.9^c	8	ugac191	63.7
8	ugac191	63.7	8	ugac191	63.7^c

A			B			C			D
Chr	Marker	Pos	Chr	Marker	Pos	Chr	Marker	Pos	Chr	Marker	Pos
1	vg2b9	35.8^a	1	vg2b9	35.8^a	1	vg2b9	35.8^b	2	vg0c5	23.0^c
1	mtsc9	51.4	3	mtsc14	46.1^b	1	mtsc9	51.4^d	4	ugac671	15.3
3	uwg96	24.3^b	5	mtsc35b	27.0^a	3	ugac224	23.1^d	8	ugac540	30.9^d
3	mtsc14	46.1^c	5	ugac473	29.3^a	8	ugac540	30.9^c	8	ugac191	63.7
5	vg2a2	49.8^a	8	ugac540	30.9^c	8	ugac191	63.7
8	ugac191	63.7	8	ugac191	63.7^c

^a

Locus close to the QTL identified by the interval-mapping method.

^b

LRT at this locus is a peak close to the 0.05 threshold.

^c

LRT at this locus is higher than the 0.05 threshold, but not a peak.

^d

LRT at this locus is a peak, not close to the 0.05 threshold.

Chr, chromosome; Pos, estimated composite map position of the QTL.

Open in new tab

TABLE 5

Estimated locations of QTL affecting traits related to winter hardiness in Brouwer et al. (2000): (A) fall growth (FG) 1995 average; (B) fall growth (FG) 1996 average; (C) freezing injury (FI: COND) 1996 average; (D) winter injury (WI) 1996 average

A			B			C			D
Chr	Marker	Pos	Chr	Marker	Pos	Chr	Marker	Pos	Chr	Marker	Pos
1	vg2b9	35.8^a	1	vg2b9	35.8^a	1	vg2b9	35.8^b	2	vg0c5	23.0^c
1	mtsc9	51.4	3	mtsc14	46.1^b	1	mtsc9	51.4^d	4	ugac671	15.3
3	uwg96	24.3^b	5	mtsc35b	27.0^a	3	ugac224	23.1^d	8	ugac540	30.9^d
3	mtsc14	46.1^c	5	ugac473	29.3^a	8	ugac540	30.9^c	8	ugac191	63.7
5	vg2a2	49.8^a	8	ugac540	30.9^c	8	ugac191	63.7
8	ugac191	63.7	8	ugac191	63.7^c

A			B			C			D
Chr	Marker	Pos	Chr	Marker	Pos	Chr	Marker	Pos	Chr	Marker	Pos
1	vg2b9	35.8^a	1	vg2b9	35.8^a	1	vg2b9	35.8^b	2	vg0c5	23.0^c
1	mtsc9	51.4	3	mtsc14	46.1^b	1	mtsc9	51.4^d	4	ugac671	15.3
3	uwg96	24.3^b	5	mtsc35b	27.0^a	3	ugac224	23.1^d	8	ugac540	30.9^d
3	mtsc14	46.1^c	5	ugac473	29.3^a	8	ugac540	30.9^c	8	ugac191	63.7
5	vg2a2	49.8^a	8	ugac540	30.9^c	8	ugac191	63.7
8	ugac191	63.7	8	ugac191	63.7^c

^a

Locus close to the QTL identified by the interval-mapping method.

^b

LRT at this locus is a peak close to the 0.05 threshold.

^c

LRT at this locus is higher than the 0.05 threshold, but not a peak.

^d

LRT at this locus is a peak, not close to the 0.05 threshold.

Chr, chromosome; Pos, estimated composite map position of the QTL.

Open in new tab

, summarized from Brouwer et al.'s (2000) Table 2]. Among all the detected SDRFs, all but five are located on chromosomes 1, 5, and 8, which also contain most of QTL we detected with the proposed method. Two loci for FG (1995) and three for FG (1996) are also shared by our interval-mapping method. Among the other significant marker loci, four had likelihood-ratio test statistics larger than the 0.05 experiment-wise threshold used in our interval-mapping method, but were not local peaks. Five loci had likelihood-ratio test statistics being local peaks, with two of them close to the 0.05 threshold.

Using an experiment-wise threshold based on a significance level 0.05, rather than an asymptotic-based strict cutoff based on a significance level 0.05 or 0.005 for each marker (Brouwer et al. 2000), we detected fewer QTL by controlling the type I error. The region on chromosome 8 (from 0.187 to 0.247 M) that was identified by our method as being shared by multiple traits was not detected by the regression method. This is most likely the direct result of more experimental information being incorporated into the analysis (e.g., model selection, dosages, intermarker position). Since single-marker analysis explores only the relationship between the quantitative trait and marker loci, if the QTL is very close to a marker locus, the estimated effect for that marker locus could be interpreted as the effect of the QTL, under the assumption that the QTL is also of a single dose and in coupling with the marker. However, most of the QTL that were detected using our method were of dosage two; therefore, it is not appropriate to compare the QTL effects estimated by Brouwer et al. (2000) with our method.

Ma et al. (2002) analyzed the same alfalfa data for two traits, WI and fall injury (FI: COND). They identified five QTL for FI: COND on chromosomes 4, 5, 6, and 8, and four QTL for winter injury on chromosomes 1, 5, and 8. The specifics of the Ma et al. (2002) method and analysis make a direct comparison between our results and their results difficult for the following reasons. First, it is assumed in Ma et al. (2002) that the markers and QTL have the same dosage level and are in coupling, while most of our detected QTL were of dosage two. Second, there is some evidence that the preferential pairing factor estimation process provided by Ma et al. (2002) is flawed since in reality the genotypic information collected for across years should not change, yet their estimates of preferential factor do (Cao et al. 2004). Finally, in Ma et al. (2002) the critical value for the significance level 5% is calculated on the basis of 200 permutation tests. For a 5% significance level (Churchill and Doerge 1994; Doerge and Churchill 1996), 200 permutations limit full exploration of the null space and may lead to the declaration of some QTL that are actually not significant at the 5% level.

SIMULATION

We used the framework provided by both the alfalfa experiment and analysis as a means to investigate the reliability of the detected QTL via a simulation study. Data were simulated by treating the estimated parental configuration and related parameters as the true values. For this purpose, 5 of the 12 QTL that we detected were chosen to provide a range for the flanking marker genetic distance and configuration, QTL dosage and configuration, QTL effect, and QTL position (Table 6)

TABLE 6

Estimated QTL location and effect for data simulated from five intervals containing detected QTL as indicated by alfalfa data and analysis

QTL	Trait	Pos	Mconfig	Freq	d(M₁, M₂)	d₁	d̅̂₁ \(\left(\mathit{s}\left(\mathit{{\hat{d}}}_{1}\right)\right)\)	σ	σ̅̂	μ^a	μ̅̂
1	FG	0.31	E	999	0.358	0.31	0.30(0.066)	2.61	2.55	12.24, 21.45, 24.61	12.32, 21.49, 24.54
2	FG	0.34	E	992	0.358	0.34	0.30(0.092)	3.17	3.04	12.51, 19.18, 19.29	12.53, 19.21, 19.29
3	FG	0.456	D	862	0.098	0.05	0.059(0.036)	2.96	2.82	12.70, 18.48, 22.08	12.68, 18.58, 21.89
4	SURV	0.187	D	955	0.222	0.10	0.097(0.041)	0.23	0.23	0.54, 0.14	0.54, 0.14
5	UIL	0.001	D	954	0.087	0.001	0.015(0.027)	1.85	1.86	9.34, 3.35, 12.74	9.14, 3.41, 12.50

QTL	Trait	Pos	Mconfig	Freq	d(M₁, M₂)	d₁	d̅̂₁ \(\left(\mathit{s}\left(\mathit{{\hat{d}}}_{1}\right)\right)\)	σ	σ̅̂	μ^a	μ̅̂
1	FG	0.31	E	999	0.358	0.31	0.30(0.066)	2.61	2.55	12.24, 21.45, 24.61	12.32, 21.49, 24.54
2	FG	0.34	E	992	0.358	0.34	0.30(0.092)	3.17	3.04	12.51, 19.18, 19.29	12.53, 19.21, 19.29
3	FG	0.456	D	862	0.098	0.05	0.059(0.036)	2.96	2.82	12.70, 18.48, 22.08	12.68, 18.58, 21.89
4	SURV	0.187	D	955	0.222	0.10	0.097(0.041)	0.23	0.23	0.54, 0.14	0.54, 0.14
5	UIL	0.001	D	954	0.087	0.001	0.015(0.027)	1.85	1.86	9.34, 3.35, 12.74	9.14, 3.41, 12.50

^a

μ = (μ₀, μ₁, … , μ_{d_Q}) is the mean vector of the trait corresponding to QTL dosages.

For each interval, 1000 data sets of sample size 100 were simulated. Pos, position of the detected QTL on the composite genetic map; Mconfig, marker configuration (D is coupling and E is repulsion); d₁, distance between the QTL and the left end marker; σ, common standard deviation of the trait for all QTL dosage levels; Freq, frequency of selecting the correct parental configuration out of 1000 data sets; d̅̂₁, average of the 1000 estimated QTL locations from the 1000 simulated data sets; s

\(\left(\overline{\mathit{{\hat{d}}}}_{1}\right)\)

⁠, sample standard deviation of the 1000 estimated QTL locations from the 1000 simulated data sets; σ̅̂, average of population standard deviation estimates; μ̅̂, average of population mean vector estimates.

Open in new tab

TABLE 6

Estimated QTL location and effect for data simulated from five intervals containing detected QTL as indicated by alfalfa data and analysis

QTL	Trait	Pos	Mconfig	Freq	d(M₁, M₂)	d₁	d̅̂₁ \(\left(\mathit{s}\left(\mathit{{\hat{d}}}_{1}\right)\right)\)	σ	σ̅̂	μ^a	μ̅̂
1	FG	0.31	E	999	0.358	0.31	0.30(0.066)	2.61	2.55	12.24, 21.45, 24.61	12.32, 21.49, 24.54
2	FG	0.34	E	992	0.358	0.34	0.30(0.092)	3.17	3.04	12.51, 19.18, 19.29	12.53, 19.21, 19.29
3	FG	0.456	D	862	0.098	0.05	0.059(0.036)	2.96	2.82	12.70, 18.48, 22.08	12.68, 18.58, 21.89
4	SURV	0.187	D	955	0.222	0.10	0.097(0.041)	0.23	0.23	0.54, 0.14	0.54, 0.14
5	UIL	0.001	D	954	0.087	0.001	0.015(0.027)	1.85	1.86	9.34, 3.35, 12.74	9.14, 3.41, 12.50

QTL	Trait	Pos	Mconfig	Freq	d(M₁, M₂)	d₁	d̅̂₁ \(\left(\mathit{s}\left(\mathit{{\hat{d}}}_{1}\right)\right)\)	σ	σ̅̂	μ^a	μ̅̂
1	FG	0.31	E	999	0.358	0.31	0.30(0.066)	2.61	2.55	12.24, 21.45, 24.61	12.32, 21.49, 24.54
2	FG	0.34	E	992	0.358	0.34	0.30(0.092)	3.17	3.04	12.51, 19.18, 19.29	12.53, 19.21, 19.29
3	FG	0.456	D	862	0.098	0.05	0.059(0.036)	2.96	2.82	12.70, 18.48, 22.08	12.68, 18.58, 21.89
4	SURV	0.187	D	955	0.222	0.10	0.097(0.041)	0.23	0.23	0.54, 0.14	0.54, 0.14
5	UIL	0.001	D	954	0.087	0.001	0.015(0.027)	1.85	1.86	9.34, 3.35, 12.74	9.14, 3.41, 12.50

^a

μ = (μ₀, μ₁, … , μ_{d_Q}) is the mean vector of the trait corresponding to QTL dosages.

For each interval, 1000 data sets of sample size 100 were simulated. Pos, position of the detected QTL on the composite genetic map; Mconfig, marker configuration (D is coupling and E is repulsion); d₁, distance between the QTL and the left end marker; σ, common standard deviation of the trait for all QTL dosage levels; Freq, frequency of selecting the correct parental configuration out of 1000 data sets; d̅̂₁, average of the 1000 estimated QTL locations from the 1000 simulated data sets; s

\(\left(\overline{\mathit{{\hat{d}}}}_{1}\right)\)

⁠, sample standard deviation of the 1000 estimated QTL locations from the 1000 simulated data sets; σ̅̂, average of population standard deviation estimates; μ̅̂, average of population mean vector estimates.

Open in new tab

. The flanking marker distances were 0.358, 0.358, 0.098, 0.222, and 0.087 M for the five intervals (Table 6). The flanking markers in the first two intervals were in repulsion, and the others were in coupling. The estimated QTL dosage was one for the fourth interval and two for the other intervals. The simulation setting of the second interval was similar to that of the first interval. We included the second interval in our simulation because we were interested in investigating whether our method is able to identify the small increment from μ₁ to μ₂, where μ = (μ₀, μ₁, μ₂) = (12.51, 19.18, 19.29) is the estimated mean vector for the corresponding QTL in that interval.

The five interval settings were studied separately. For each interval, 1000 data sets with a sample size of 100 (i.e., similar to the alfalfa data) were simulated and analyzed, with results summarized in Table 6. Using the joint model selection method, the correct parental marker configuration was included in the candidate parental marker configuration pool 100% of the time for all five cases. The overall parental configuration pool is then formed by the combination of candidate parental marker configurations and possible QTL configurations. After fitting our interval-mapping model to each possible parental configuration, the one having the maximal test statistic provides the estimated QTL location and marker and QTL configuration, as well as the parameter estimates. The correct configuration was identified at least 95% of the time with markers in coupling and at least 99% of the time with markers in repulsion, except for the third interval where the correct configuration was selected 86% of the time. Given the correct configuration, the estimated parameters were close to the true values used for simulation.

The fact that our method performed better with repulsion than with coupling is due to the difference between these two configurations. This can be demonstrated by considering an example where two copies of the QTL are on both informative chromosomes. When the markers are in coupling, one copy of the QTL is located on a chromosome containing no dominant marker alleles. Thus, not much extra information with respect to the recombination between markers and the QTL can be provided by this extra copy of the QTL. This, in turn, makes it difficult to choose between the configurations with one copy and two copies of the QTL. On the other hand, with markers in repulsion, both copies of the QTL provide information about recombination between markers and QTL. The effect of this difference is easily illustrated by comparing the second interval with the third interval used for this simulation. Although the increment in trait means from QTL dose 1 to dose 2 is only 0.035σ for case 2 and 1.22σ for case 3, case 2 still outperformed case 3. On the basis of this, repulsion is not necessarily the worst-case scenario for QTL mapping in tetraploids with simplex markers, as was once thought.

Since the above simulation was mainly focused on parental marker configurations formed by simplex markers, another simulation study was performed to serve as the basis of investigation for how, in general, QTL or marker dosages and their respective linkage phase affect the performance of our algorithm. Under a pseudo-doubled backcross experiment, two markers (M₁/m₁ and M₂/m₂) and one QTL (Q/q) were used with marker distance 0.10 M and a sample size 100. The location of the QTL was 0.01, 0.03, or 0.05 M. The trait mean μ was (10.0, 12.0) if QTL dosage was 1 or (10.0, 12.0, 14.0) if QTL dosage was 2, and σ was 1.0. One thousand data sets were simulated for each combination of the parameter settings and each of the following configurations: double coupling (A1, A3), asymmetric coupling 12 (B1, B2, B3), coupling (D1, D2, D3), and repulsion (E1, E2, E3).

Our simulation results illustrate that when the correct parental configuration was identified, the parameter estimate averages were close to the true value. The ability to identify the correct configuration, however, depended on the underlying parental configuration. The worst-case scenario was when markers were in coupling on the first informative chromosome with one copy of the QTL on the second informative chromosome (D2). For this case the identified correct configuration proportion ranged from 63 to 66% (Figure 3)

Figure 3.—

Open in new tab Download slide

The proportion identifying the correct parental (marker and QTL) configuration for a tetraploid under a pseudo-doubled backcross experiment. The trait mean μ is (10.0, 12.0) if QTL dosage is 1 or (10.0, 12.0, 14.0) if QTL dosage is 2. Population standard deviation σ is 1.0. With sample size 100, 1000 data sets are simulated under 11 parental configurations corresponding to numbers 1–11 on the horizontal axis: A1, A3, B1, B2, B3, D1, D2, D3, E1, E2, and E3. The letters denote the marker configurations: (A) double coupling, (B) asymmetric coupling 12, (D) coupling, and (E) repulsion. The numbers denote the QTL configurations: (1) one dose of QTL present on the first informative chromosome, (2) one dose of QTL present on the second informative chromosome, and (3) two doses of QTL present on both informative chromosomes. d(M₁, Q) is the genetic distance between the QTL and the left end-flanking marker.

. This is not surprising since with D2 under random pairing, only one-third of the pairings provide information concerning the recombination between marker and QTL. On the other hand, the best situation was when two markers were in repulsion, where the correct parental configuration was almost 100% regardless of the QTL configuration or location. Most of the other configurations obtained a proportion of at least 95% (Figure 3).

It is worth noting that in our earlier work (Cao et al. 2003), repulsion did not outperform configurations with higher dosages of loci, such as double coupling and asymmetric coupling. The difference in results comes from the difference in the assumed chromosome-pairing mechanism. Our earlier work (Cao et al. 2003) investigated the performance of our interval-mapping method under preferential pairing, which specifically deals with bivalent pairing (i.e., an informative chromosome pairs only with a noninformative chromosome). The preferential pairing mechanism is favorable to configurations with higher dosages of loci because it maximizes the amount of recombination information through maximizing the disequilibrium in pairing. However, with random pairing, one-third of the time one informative chromosome pairs with another informative chromosome to provide less, or even no, information, if two chromosomes are homozygous. Using similar reasoning it can be demonstrated that random pairing is more informative than preferential pairing for markers in repulsion.

DISCUSSION

One of the largest challenges when mapping QTL in polyploids is the incomplete parental configuration (model) information. This is compounded by the dramatic increase in the size of the model space as the number of markers and/or ploidy level increase. To combat these challenges in autopolyploid QTL mapping, our interval-mapping algorithm takes one of two approaches. The size of the model space is addressed using Bayesian model selection by first identifying potential parental marker dosages using a marginal approach or marker configurations based on a joint approach. We have demonstrated that both methods are effective in reducing the model space, but that the joint method is more efficient due to the fact that it incorporates the genetic map information. From the reduced model space, the candidate parental configuration pool is formed by combining each potential parental marker configuration with a possible QTL configuration. For each potential parental configuration, profiles of MLEs of QTL effects, as well as population variation and likelihood-ratio test statistics, are produced at each evaluation point of a grid search across the genome. Putative QTL are detected using a resampling-based significance threshold. The corresponding parental configuration (and related parameter estimates) is then taken as the underlying model from which the data are produced.

Our simulation studies have demonstrated that if the correct parental configuration is identified, our interval-mapping method provides parameter estimates close to true values. However, the chance of identifying the correct configuration depends on the true parental configuration, chromosome pairing system, and genetic distance between markers and the QTL. This helps us address a question raised in Doerge and Craig (2000)(p. 7956), “for which situation is the linkage mostly affected?” Comparatively, among these factors, the true parental configuration is the most influential one. Since a configuration consists of both loci dosages and linkage phase, the influence of one cannot be considered without considering the effect of the other. For example, a configuration with higher marker/QTL dosages does not always imply a greater chance of identifying the correct configuration as shown in the case of repulsion. However, a configuration with higher dosages is often more informative because it often has a higher disequilibrium, which as we all know is the basis for linkage analysis. Furthermore, such a configuration is less sensitive to the chromosome-pairing system and thus consistently provides high power for detecting QTL.

Our simulation results also help to answer some of the remaining questions presented by Doerge and Craig (2000)(p. 7956). First, “If a molecular marker is found to be tightly linked to a QTL, should the dosage of the marker agree with the dosage of the QTL?” Not necessarily. There is no causal relationship between the dosages of the QTL and the flanking markers. Although as discussed previously a configuration with the marker and QTL having the same dosage often has a higher disequilibrium and thus a higher power of detecting QTL. Second, “Should the models which are controlled by dosage levels be weighted for the purpose of representing more realistic results?” Weighting the candidate models may be a better way than depending solely on the one “best” model, when the uncertainty about parental configuration is high. However, even configurations with the same marker dosage levels could differ to a greater extent biologically; therefore, further investigation is needed to find a both statistically and biologically sound weighting scheme. Finally, “Would models with dosage levels more similar to each other be more likely, especially with close linkage (short genetic distance)?” This answer depends on the linkage phase since it is possible for two models with exactly the same dosage levels and genetic association to be biologically different due to different linkage phases, and therefore they are different in likelihood.

We applied our approach to a published alfalfa data set for quantitative traits that are related to winter hardiness. Our analysis confirms several putative QTL identified by a regression-based single-marker analysis (Brouwer et al. 2000), and it also identified a new region on chromosome 8 that was shared by several traits. Our analysis results demonstrate that for a majority of the detected QTL, there is a positive or negative correlation between the quantitative trait mean and QTL dosage level; i.e., μ₀ ≤ μ₁ ≤ … ≤ μ_{d_Q} or μ₀ ≥ μ₁ ≥ … ≥ μ_{d_Q}. If such an ordering of means is available before analysis, an order-restricted version of interval mapping can be performed to improve power of detection. With an order constraint, the problem falls into the category of isotonic regression and it can be computed using the pool-adjacent-violator algorithm (Robertson et al. 1988). We have performed simulations (not shown) to compare the power between an order restriction and without one. In fact, even when the only information is that the means are ordered, and the direction remains unknown, there is a slight improvement in the performance. The improvement is most evident when the order of the means is increasing or decreasing with respect to the QTL dosage and is known a priori.

Thus far it has been assumed that the QTL dosage is at most half the ploidy level, which is a reasonable assumption given the pseudo-doubled backcross experimental design that we considered. In practice, if this assumption does not hold, we can simply change the upper bound for the QTL dosage to the ploidy level. This change will result in more candidate QTL configurations to choose from, but the method and the procedure remain the same. Similarly, if the breeding design is altered from a pseudo-doubled backcross design, our approach can be adapted. Also, if codominant markers are considered, they can be easily incorporated in the model by including parameters such as allelic effect and interaction.

The effect of the assumptions we place on the genetic map that is used for QTL mapping in polyploids is one area of research that remains unaddressed. When the genetic map is estimated, the most likely parental marker configuration in turn has to be estimated since recombination requires a parental marker configuration. Because of this, the parental marker configuration used for estimating the genetic map should be consistent with the one used to locate QTL. However, if the most likely parental marker configuration is not the correct configuration, the reliability of the map and the mapped QTL are affected. By selecting the pool of potential parental configurations on the basis of observed progeny data, our method can serve as a further verification for the estimated genetic map. A framework is currently under development to allow variability or uncertainty in estimating the map to be considered when mapping QTL.

Footnotes

Communicating editor: C. Haley

Acknowledgement

We thank Tom Osborn for providing the data set and continued work in this area, and we thank Gary Churchill for his comments on this work.

References

Bayes, T.,

1783

An essay towards solving a problem in the doctrine of chances.

Philos. Trans. R. Soc. Lond.

53

:

370

–418.

Bingham, E. T., and T. J. McCoy, 1988 Cytology and cytogenetics of alfalfa, pp. 737–776 in Alfalfa and Alfalfa Improvement, edited by A. A. Hanson, D. K. Barnes and R. R. Hill, Jr. American Society of Agronomy, Madison, WI.

Broman, K. W.,

2001

Review of statistical methods for qtl mapping in experimental crosses.

Lab Anim.

30

(7):

44

–52.

Brouwer, D., and T. Osborn,

1999

A molecular marker linkage map of tetraploid alfalfa.

Theor. Appl. Genet.

99

:

1194

–1200.

Month:	Total Views:
January 2021	1
February 2021	1
March 2021	3
April 2021	1
May 2021	2
June 2021	3
July 2021	2
August 2021	9
September 2021	1
October 2021	4
November 2021	2
January 2022	5
February 2022	7
April 2022	6
May 2022	3
June 2022	2
July 2022	11
August 2022	6
September 2022	11
October 2022	3
November 2022	3
December 2022	2
January 2023	3
February 2023	1
March 2023	2
April 2023	1
May 2023	2
June 2023	1
July 2023	4
August 2023	13
September 2023	1
November 2023	2
December 2023	6
January 2024	16
February 2024	5
March 2024	8
April 2024	5

Article Contents

A Model Selection-Based Interval-Mapping Method for Autopolyploids

Abstract

METHODS

Interval mapping:

Model reduction:

DATA ANALYSIS

SIMULATION

DISCUSSION

Footnotes

Acknowledgement

References

Citations

Views

Altmetric

Email alerts

Citing articles via

Latest

Most Read

Most Cited

This Feature Is Available To Subscribers Only