# Mapping Quantitative Trait Loci Using the Experimental Designs of Recombinant Inbred Populations

- Chen-Hung Kao^{1}

- ^{1}*Author e-mail:* chkao{at}stat.sinica.edu.tw

## Abstract

In QTL experiments using recombinant inbred (RI) populations, when individuals in a population are genotyped for markers, the trait values (phenotypes) can be obtained either from the genotyped individuals themselves (the same population) or from some progeny of the genotyped individuals (a different population). Let F_{u} be the genotyped population and F_{v} (*v* ≥ *u*) be the phenotyped population. Experimental designs in which marker genotypes and phenotypes are recorded on the same population are denoted as (F_{u}/F_{v}, *u* = *v*) designs, and those in which genotypes and phenotypes are obtained from different populations are denoted as (F_{u}/F_{v}, *v* > *u*) designs. Although most QTL mapping experiments have been conducted on the backcross and F_{2} (F_{2}/F_{2}) designs, the other (F_{u}/F_{v}, *v* ≥ *u*) designs are also very popular. The benefits of using these other designs in QTL mapping include reducing cost and environmental variance by phenotyping several progeny of each genotyped individual and taking advantage of the changes in population structure in other RI populations. Current QTL mapping methods, including those for the (F_{u}/F_{v}, *u* = *v*) designs (mostly the backcross or F_{2}/F_{2} design) and that for the F_{2}/F_{3} design based on a one-QTL model, are inadequate for investigating the mapping properties of the (F_{u}/F_{v}, *u* ≤ *v*) designs, and they can be problematic because they ignore the differences in population structures. In this article, a statistical method that considers the differences in population structures between different RI populations is proposed on the basis of a multiple-QTL model to map QTL in different (F_{u}/F_{v}, *v* ≥ *u*) designs. In addition, the QTL mapping properties of the proposed and approximate methods in different designs are discussed. Simulations were performed to evaluate the performance of the proposed and approximate methods. The proposed method is shown to correct the problems of the approximate and current methods, improving the resolution of the genetic architecture of quantitative traits, and it can serve as an effective tool for QTL mapping studies in the system of RI populations.

MOST biologically important traits show continuous variation and have poor heritability. Traditional study of quantitative genetics, which relies on phenotype evaluation to investigate the quantitative trait loci (QTL) controlling these traits, is difficult and limited. Recently, the advent of fine-scale molecular markers has provided researchers with an efficient tool for detecting the underlying QTL. Most QTL detection experiments that produce marker genotypes and phenotypic traits have been conducted with populations derived from crosses between inbred lines, *e.g.*, backcross, advanced backcross, F_{2}, recombinant inbred (RI), intermated recombinant inbred (IRI), advanced intercross (AI), and double haploid (DH) populations and NC Design III (Comstock and Robinson 1952; Stuber *et al.* 1992; Beavis *et al.* 1994; Veldboom *et al.* 1994; Darvasi and Soller 1995; Austin and Lee 1996; Liu *et al.* 1996; Chapman *et al.* 2003; Winkler *et al.* 2003; Complex Trait Consortium 2004; Broman 2005). These different populations may show different properties in QTL mapping as they have different population structures, such as homozygosity, genotypic frequencies, and linkage disequilibrium (Weir 1996, Chap. 5). In principle, the use of information about the genotypes and phenotypes of individuals in these populations has become a key approach to detecting the underlying QTL, both for understanding the genetic basis of important traits and for improving them.

In the data collection of these QTL experiments, when individuals in a population are genotyped for markers, the trait values (phenotypes) can be recorded on the genotyped individuals (the same population) or on some progeny of the genotyped individuals (the progeny population). Fisch *et al.* (1996) illustrated the situations of data collection by F_{u}/F_{v}, where F_{u} is the genotyped population and F_{v} (*v* ≥ *u*) is the phenotyped population, in the system of RI populations (see recombinant inbred populations for the population structure of RI populations). For example, F_{2}/F_{2} denotes the typical F_{2} design, where genotypes and phenotypes are obtained from the same individuals in the F_{2} population, and F_{2}/F_{4} denotes the design in which F_{2} individuals are genotyped and their progeny in the F_{4} population are phenotyped. Although the (F_{u}/F_{v}, *u* = *v*) designs are typical (Doerge *et al.* 1997; Lynch and Walsh 1998, Chap. 15), the (F_{u}/F_{v}, *u* < *v*) designs are also very popular and important for QTL detection in the genetic analysis of complex traits. For example, the F_{3}/F_{3}, F_{2}/F_{3}, F_{2}/F_{4}, F_{4}/F_{4}, F_{5}/F_{5}, and F_{6}/F_{7} designs have been used to detect QTL in maize by Stuber *et al.* (1992), Beavis *et al.* (1994), Veldboom *et al.* (1994), Austin and Lee (1996), Mihaljevic *et al.* (2004, 2005), Zhang and Xu (2004), and Sala *et al.* (2006), and the F_{4}/F_{6} design was used to study QTL in soybean by Chapman *et al.* (2003). There are some benefits of using the (F_{u}/F_{v}, *u* < *v*) designs with multiple phenotyped individuals and the (F_{u}/F_{u}, *u* > 2) designs. For example, cost is reduced because the progeny need not be genotyped for markers, the environmental variance can be reduced by phenotyping multiple progeny for trait measurement, and homozygotes accumulate so that QTL mapping may be improved (Cowen 1988; Lander and Botstein 1989; Knapp and Bridges 1990; Edwards *et al.* 1992; Austin and Lee 1996).

Traditional QTL mapping methods developed to date mostly assume that both marker genotypes and phenotypic traits are obtained from the same population [the (F_{u}/F_{v}, *u* = *v*) designs], and they especially focus on the F_{2}/F_{2} and backcross designs (Lander and Botstein 1989; Jensen 1993; Zeng 1994; Satagopan *et al.* 1996; Kao *et al.* 1999; Nakamichi *et al.* 2001; Sen and Churchill 2001; Kao and Zeng 2002; Yi *et al.* 2003; Carlborg and Haley 2004; Zou *et al.* 2004). Some researchers have applied these traditional (approximate) methods to QTL mapping by regarding the traits (trait means) of progeny as the traits of the genotyped individuals, *i.e.*, by treating (F_{u}/F_{v}, *u* < *v*) designs as (F_{u}/F_{v}, *u* = *v*) designs in the analysis (Stuber *et al.* 1992; Beavis *et al.* 1994; Veldboom *et al.* 1994; Austin and Lee 1996; Chapman *et al.* 2003; Zhang and Xu 2004). Such application implicitly ignores the fact that the traits are controlled by the progeny (F_{v}) genomes, not by their ancestral (F_{u}) genomes, and that the segregation of heterozygotes changes the population structure. Consequently, the power of QTL detection may be affected and the estimates of QTL effects can be biased by the approximate methods, as shown in Zhang and Xu (2004) and in this article. Statistical methods are generally lacking or inadequate for the (F_{u}/F_{v}, *u* < *v*) designs. Fisch *et al.* (1996) suggested that an adequate model be proposed for the (F_{u}/F_{v}, *u* ≤ *v*) designs, and Zhang and Xu (2004) considered the nature of segregation to propose a one-QTL model for the F_{2}/F_{3} design in QTL mapping. As shown in this article, the one-QTL model of Zhang and Xu (2004) and the approximate method may have confounding problems in the estimation of QTL parameters and may lose power in QTL detection. Ideally, the one-QTL model should be extended to a multiple-QTL model, and the F_{2}/F_{3} design to the more general (F_{u}/F_{v}, *u* ≤ *v*) designs, for more practical and broader use, so that multiple QTL and their possible epistasis can be considered in the model to correct these problems and so that the benefits of other RI populations mentioned above can be used to further improve and study QTL mapping. In this article, a statistical method considering the differences in population structures between different RI populations is developed on the basis of a more complete multiple-QTL model for the more general (F_{u}/F_{v}, *u* ≤ *v*) designs. In addition, the QTL mapping properties of the proposed and approximate methods in the different (F_{u}/F_{v}, *u* ≤ *v*) designs are derived and discussed. A simulation study was performed to evaluate the relative efficiencies of different (F_{u}/F_{v}, *u* ≤ *v*) designs and to compare the performance of the proposed and current methods in these designs. The proposed method is capable of improving the resolution of the genetic architecture of quantitative traits and can serve as a tool to study QTL mapping in the (F_{u}/F_{v}, *u* ≤ *v*) designs.

## RECOMBINANT INBRED POPULATIONS

Assume that two parental inbred lines, P_{1} and P_{2}, differ substantially in the quantitative trait of interest and are fixed for alternative alleles at QTL and markers. A cross between the parental lines produces an F_{1} population in which all individuals are identical heterozygotes. Selfing or intermating the F_{1} individuals produces an F_{2} population. In the F_{2} population, the genotypic frequencies of the P_{1} homozygote, heterozygote, and P_{2} homozygote at a single locus are 1/4, 1/2, and 1/4, respectively (the heterozygosity is 0.5). The frequency of recombinants (*r*) between any two loci in the F_{2} population is equivalent to the recombination fraction (*c*). If the F_{2} individuals are further selfed for *t* − 2 generations, a so-called RI F_{t} population is produced. For *t* → ∞, the derived population is called recombinant inbred lines (RILs). In an F_{t} population, the frequencies of the P_{1} homozygote, heterozygote, and P_{2} homozygote at a locus are expected to be [1 − (1/2)^{*t*−1}]/2, (1/2)^{*t*−1}, and [1 − (1/2)^{*t*−1}]/2, respectively [the heterozygosity is (1/2)^{*t*−1}], and the frequency of recombinants between two loci, denoted as *r*_{t}, increases as *t* increases and can be obtained according to Haldane and Waddington (1931). Haldane and Waddington showed that *r*_{∞} = 2*c*/(1 + 2*c*).
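The population-structure quantities in this section can be tabulated with a few lines of code. The following is a minimal sketch, assuming pure selfing from the F_{2} onward so that the heterozygosity halves in every generation, and taking the limiting recombinant frequency under selfing to be 2*c*/(1 + 2*c*) (Haldane and Waddington 1931). The function names `genotype_freqs` and `ril_recombinant_limit` are illustrative and do not come from the article.

```python
from fractions import Fraction

def genotype_freqs(t):
    """Expected (P1 homozygote, heterozygote, P2 homozygote) frequencies
    at one locus in an F_t population obtained by selfing (t >= 2)."""
    if t < 2:
        raise ValueError("t must be at least 2")
    het = Fraction(1, 2) ** (t - 1)   # heterozygosity halves every generation
    hom = (1 - het) / 2               # the rest is split between the two homozygotes
    return hom, het, hom

def ril_recombinant_limit(c):
    """Limiting recombinant frequency under selfing, 2c / (1 + 2c),
    for a recombination fraction c."""
    return 2 * c / (1 + 2 * c)

# Tabulate the genotype frequencies for a few generations.
for t in (2, 3, 4, 10):
    print(t, genotype_freqs(t))
```

Exact fractions are used so that the frequencies can be compared directly with hand calculations.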

#### Genetic model:

In an RI population, any individual can have one of three possible QTL genotypes, *QQ*, *Qq*, and *qq*, if only one QTL, say *Q*, is considered. Let the genotypic value, *G*_{i}, of an individual *i* have the following relation with the genetic parameters:

$$
G_i =
\begin{cases}
\mu + a - d/2 & \text{if individual } i \text{ is } QQ, \\
\mu + d/2 & \text{if individual } i \text{ is } Qq, \\
\mu - a - d/2 & \text{if individual } i \text{ is } qq,
\end{cases}
\tag{1}
$$

where μ is the intercept, and *a* and *d* are the additive and dominance effects according to Cockerham's model (Kao and Zeng 2002). If multiple, say *m*, QTL are considered, the extension of the one-QTL genetic model in Equation 1 to a multiple-QTL model with epistasis is straightforward (Kao and Zeng 2002). If an individual *i* produces *k* progeny, the mean genotypic value of the *k* progeny, $\bar{G}_i$, is

$$
\bar{G}_i = \frac{1}{k}\left[K_2\left(\mu + a - \frac{d}{2}\right) + K_1\left(\mu + \frac{d}{2}\right) + K_0\left(\mu - a - \frac{d}{2}\right)\right]
= \mu + \frac{K_2 - K_0}{k}\,a + \frac{K_1 - K_2 - K_0}{2k}\,d,
\tag{2}
$$

where *K*_{2}, *K*_{1}, and *K*_{0} denote the numbers of progeny with genotypes *QQ*, *Qq*, and *qq* among the *k* progeny, respectively. If the genotype of the individual *i* is *QQ* (*qq*), all the *k* progeny have the same genotype *QQ* (*qq*), *i.e.*, *K*_{2} = *k* (*K*_{0} = *k*), and the mean genotypic value is μ + *a* − *d*/2 (μ − *a* − *d*/2). If the genotype of the individual *i* is *Qq*, the *k* progeny can segregate into *QQ*, *Qq*, and *qq* with numbers *K*_{2}, *K*_{1}, and *K*_{0}. The possible allocations of (*K*_{2}, *K*_{1}, *K*_{0}) have (*k* + 1)(*k* + 2)/2 combinations and follow a trinomial distribution with *k* trials and cell probabilities given by the frequencies of *QQ*, *Qq*, and *qq* among the progeny of a heterozygote. The number of possible mean genotypic values corresponds to the number of possible allocations of (*K*_{2}, *K*_{1}, *K*_{0}). Let $\bar{G}_1, \bar{G}_2, \ldots, \bar{G}_{(k+1)(k+2)/2}$ denote the (*k* + 1)(*k* + 2)/2 genotypic means. For simplicity, the mean genotypic value in Equation 2 is expressed as

$$
\bar{G}_i = \mu + a\,x_i + d\,z_i,
\tag{3}
$$

where $x_i = (K_2 - K_0)/k$ and $z_i = (K_1 - K_2 - K_0)/(2k)$ characterize the status of the additive and dominance effects in the genotypic means. Under the expression of Equation 3, the extension of the model for mean genotypic value from one QTL to multiple QTL is straightforward. For *m* QTL without epistasis, the genetic model can be written as

$$
\bar{G}_i = \mu + \sum_{j=1}^{m}\left(a_j\,x_{ij} + d_j\,z_{ij}\right),
\tag{4}
$$

where *x*_{ij}'s and *z*_{ij}'s are the coded variables for Q_{j}'s, *j* = 1, 2, … , *m*, and are defined similarly as *x*_{i} and *z*_{i} in Equation 3. The extension of this model to consider epistasis is straightforward by introducing the cross-product terms as the terms of epistasis.
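To make the bookkeeping of progeny allocations concrete, the sketch below enumerates every (*K*_{2}, *K*_{1}, *K*_{0}) for a given *k*, computes the coefficients of *a* and *d* obtained by averaging the genotypic values μ + *a* − *d*/2, μ + *d*/2 (the heterozygote value under Cockerham's model, stated here as an assumption), and μ − *a* − *d*/2 over the progeny, and evaluates the trinomial probability of each allocation. The cell probabilities are passed in as an argument rather than fixed, and all function names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def allocations(k):
    """All (K2, K1, K0) with K2 + K1 + K0 = k, ordered from K2 = k downward."""
    return [(k2, k1, k - k2 - k1)
            for k2 in range(k, -1, -1)
            for k1 in range(k - k2, -1, -1)]

def coded_variables(k2, k1, k0):
    """Coefficients of a and d in the mean genotypic value of the progeny."""
    k = k2 + k1 + k0
    x = Fraction(k2 - k0, k)            # additive coefficient
    z = Fraction(k1 - k2 - k0, 2 * k)   # dominance coefficient
    return x, z

def trinomial_prob(k2, k1, k0, probs):
    """Probability of one allocation given cell probabilities (pQQ, pQq, pqq)."""
    p2, p1, p0 = probs
    coef = factorial(k2 + k1 + k0) // (factorial(k2) * factorial(k1) * factorial(k0))
    return coef * p2 ** k2 * p1 ** k1 * p0 ** k0

# List the allocations and coded variables for k = 3 progeny.
for alloc in allocations(3):
    print(alloc, coded_variables(*alloc))
```

For *k* = 3 the enumeration yields (3 + 1)(3 + 2)/2 = 10 allocations, and the probabilities returned by `trinomial_prob` sum to one over them for any valid set of cell probabilities.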

#### Variance components:

When *m* QTL with complete marginal and epistatic effects are considered together, the genetic variance of a quantitative trait can be decomposed into 2*m*^{2} variances and 2*m*^{4} − *m*^{2} covariances in an RI population. Taking *m* = 2 as an example, there are 8 genetic variances and 28 genetic covariances. If the two QTL are unlinked, the genetic variance in an F_{t} population can be found as

$$
\begin{aligned}
\sigma_G^2 ={}& (1 - h_t)\left(a_1^2 + a_2^2\right) + h_t(1 - h_t)\left(d_1^2 + d_2^2\right) + (1 - h_t)^2\, i_{aa}^2 + \frac{1 - h_t}{4}\left(i_{ad}^2 + i_{da}^2\right) \\
&+ \left[\frac{1}{16} - \left(\frac{1}{2} - h_t\right)^4\right] i_{dd}^2
- 2(1 - h_t)\left(\frac{1}{2} - h_t\right)\left(a_1 i_{ad} + a_2 i_{da}\right) \\
&- 2 h_t (1 - h_t)\left(\frac{1}{2} - h_t\right)\left(d_1 + d_2\right) i_{dd},
\qquad h_t = \left(\tfrac{1}{2}\right)^{t-1},
\end{aligned}
\tag{5}
$$

where *i*_{aa}, *i*_{ad}, *i*_{da}, and *i*_{dd} are the additive-by-additive, additive-by-dominance, dominance-by-additive, and dominance-by-dominance epistatic effects, respectively, under the setting of the digenic model of Equation 1. Some of the covariances are zeros. For *t* = 2, there is no genetic covariance and the genetic variance reduces to eight independent components, $a_1^2/2 + a_2^2/2 + d_1^2/4 + d_2^2/4 + i_{aa}^2/4 + i_{ad}^2/8 + i_{da}^2/8 + i_{dd}^2/16$. As *t* increases, the additive variances increase due to the accumulation of homozygotes, and the dominance variances decrease owing to the loss of heterozygotes. For example, the additive and dominance variance components are $3a_1^2/4$ ($3a_2^2/4$) and $3d_1^2/16$ ($3d_2^2/16$) for the F_{3} population, and these components are $7a_1^2/8$ ($7a_2^2/8$) and $7d_1^2/64$ ($7d_2^2/64$) for the F_{4} population. These two components approach $a_1^2$ ($a_2^2$) and zero for *t* → ∞. This shows that the RI F_{t}, *t* > 2, populations can benefit the estimation of additive effects by accumulating homozygotes, but may hurt the estimation of dominance effects due to the loss of heterozygotes. Also, the epistatic variances involving additive effects (*i*_{aa}, *i*_{ad}, and *i*_{da}) are increasing, and the dominance-by-dominance variance is decreasing. For example, the epistatic variances involving the additive effects are $9i_{aa}^2/16$ ($49i_{aa}^2/64$), $3i_{ad}^2/16$ ($7i_{ad}^2/32$), and $3i_{da}^2/16$ ($7i_{da}^2/32$), and the dominance-by-dominance variance is $15i_{dd}^2/256$ ($175i_{dd}^2/4096$) in the F_{3} (F_{4}) population. The four variance components approach $i_{aa}^2$, $i_{ad}^2/4$, $i_{da}^2/4$, and zero, respectively, as *t* → ∞. Also, the covariances between genetic effects become present in the F_{t}, *t* > 2, populations, and they will cause confounding problems in estimation for the one-QTL approach or if epistasis is present and ignored in QTL mapping.
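The behavior of the variance components across generations can be checked numerically by brute force. The sketch below is one way to do so under stated assumptions: Cockerham coding (*x* = 1, 0, −1 and *z* = −1/2, 1/2, −1/2 for *QQ*, *Qq*, *qq*), pure selfing so that the heterozygosity in F_{t} is (1/2)^{t−1}, and two unlinked (independent) loci. The function names are illustrative, and the enumeration is a check on the decomposition, not the derivation used in the article.

```python
from fractions import Fraction
from itertools import product

def locus_dist(t):
    """(x, z, probability) for QQ, Qq, qq at one locus in F_t under selfing."""
    het = Fraction(1, 2) ** (t - 1)
    hom = (1 - het) / 2
    half = Fraction(1, 2)
    return [(1, -half, hom), (0, half, het), (-1, -half, hom)]

def genetic_variance(t, a1=0, a2=0, d1=0, d2=0, iaa=0, iad=0, ida=0, idd=0):
    """Exact genetic variance for two unlinked QTL in F_t, obtained by
    enumerating the nine two-locus genotypes."""
    mean = Fraction(0)
    mean_sq = Fraction(0)
    for (x1, z1, p1), (x2, z2, p2) in product(locus_dist(t), repeat=2):
        g = (a1 * x1 + d1 * z1 + a2 * x2 + d2 * z2
             + iaa * x1 * x2 + iad * x1 * z2 + ida * z1 * x2 + idd * z1 * z2)
        mean += p1 * p2 * g
        mean_sq += p1 * p2 * g * g
    return mean_sq - mean ** 2

# Additive and dominance components per unit effect in F2, F3, and F4.
for t in (2, 3, 4):
    print(t, genetic_variance(t, a1=1), genetic_variance(t, d1=1))
```

Setting a single effect to one isolates its variance component, while setting two effects at once (for example `a1` and `iad`) also picks up their covariance, which is how the confounding between marginal and epistatic effects shows up for *t* > 2.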

## THE STATISTICAL METHODS

#### Data structure:

Consider a sample of size *n* from a (F_{u}/F_{v}, *u* < *v*) design or a (F_{u}/F_{v}, *u* = *v*) design. The *n* individuals from the F_{u} population are genotyped for markers (*X*_{i}, *i* = 1, 2, … , *n*). If the sample is from the (F_{u}/F_{v}, *u* = *v*) design, the *n* genotyped individuals are phenotyped to obtain the *n* trait values (*y*_{i}'s, *i* = 1, 2, … , *n*). If the sample is from the (F_{u}/F_{v}, *u* < *v*) design, each of the *n* genotyped individuals produces *k* progeny in the F_{v} generation for phenotyping, and their traits (*y*_{ij}'s, *j* = 1, 2, … , *k*) or trait means ($\bar{y}_i$'s) are recorded. For QTL mapping using the data from (F_{u}/F_{v}, *u* < *v*) designs, both the traditional (approximate) and the proposed QTL mapping methods have been used and are discussed here. When applying the traditional methods to (F_{u}/F_{v}, *u* < *v*) designs, one assumes that the mean trait is controlled by the QTL in the F_{u} individuals, referred to as *Q*^{[u]}'s hereafter, rather than by the QTL in the F_{v} progeny, referred to as *Q*^{[v]}'s hereafter (*Q*^{[u]} and *Q*^{[v]} have the same dimension). As a result, problems such as bias in estimation and loss of power in QTL detection will occur in QTL mapping with the traditional methods. The proposed method connects the trait of the F_{v} progeny with *Q*^{[v]}'s using the marker information in the F_{u} individuals; hence it can correct these problems and improve QTL mapping in the (F_{u}/F_{v}, *u* < *v*) designs, as shown below.

#### The proposed method:

Without loss of generality in inferring QTL mapping in the (F_{u}/F_{v}, *u* ≤ *v*) designs, consider that the sample is obtained from a (F_{u}/F_{v}, *u* < *v*) design and the trait means ($\bar{y}_i$'s) measured on the F_{v} progeny are used in the analysis. The proposed method attempts to relate the mean traits with the mean genotypic values at *Q*^{[v]} using the marker information of the F_{u} individuals so that the genetic structure of the F_{v} population can be taken into account in modeling. If a quantitative trait is controlled by *m* nonepistatic QTL, $\bar{y}_i$ can be related to the *m* QTL by the model

$$
\bar{y}_i = \mu + \sum_{j=1}^{m}\left(a_j\,x^{*}_{ij} + d_j\,z^{*}_{ij}\right) + \epsilon_i,
\tag{6}
$$

where $x^{*}_{ij}$'s and $z^{*}_{ij}$'s, *j* = 1, 2, … , *m*, are the coded variables associated with the additive and dominance effects at $Q_j^{[v]}$'s, *j* = 1, 2, … , *m*, in the genotypic means, and they have the same definitions as *x*_{ij} and *z*_{ij} in Equation 4. The residual error ϵ_{i} is assumed to follow a normal distribution with mean zero and variance σ^{2}. As multiple (*m*) intervals are used to infer the multiple QTL, this model is a multiple-interval mapping-based (MIM-based) method (Kao *et al.* 1999) for the (F_{u}/F_{v}, *u* < *v*) designs. A single-QTL model for the F_{2}/F_{3} design was first proposed by Zhang and Xu (2004).

As QTL could be located in the marker intervals, the coded variables of the genotypic means ($x^{*}_{ij}$'s and $z^{*}_{ij}$'s) for the *m* QTL are unobservable and need to be inferred from the flanking marker genotypes of the F_{u} individual. For *k* progeny, there are [(*k* + 1)(*k* + 2)/2]^{m} genotypic means (possible values for $x^{*}_{ij}$'s and $z^{*}_{ij}$'s) for *m* QTL. Given a sample of size *n*, the likelihood function of the model in Equation 6 for θ = (*a*_{1}, *d*_{1}, *a*_{2}, *d*_{2}, … , *a*_{m}, *d*_{m}, σ^{2}) is

$$
L(\theta \mid \bar{\mathbf{Y}}, \mathbf{X}) = \prod_{i=1}^{n}\left[\sum_{j=1}^{[(k+1)(k+2)/2]^{m}} p_{ij}\, N\!\left(\bar{y}_i;\ \bar{G}_j, \sigma^2\right)\right],
\tag{7}
$$

where $N(\cdot\,; \bar{G}_j, \sigma^2)$ denotes the normal density with mean $\bar{G}_j$ and variance σ^{2}, $\bar{G}_j$'s are the [(*k* + 1)(*k* + 2)/2]^{m} genotypic means, and the mixing proportions, *p*_{ij}'s, are the conditional probabilities of the corresponding genotypic means given the marker genotype. The density of each individual is a mixture of [(*k* + 1)(*k* + 2)/2]^{m} possible normals with different means, $\bar{G}_j$'s, and mixing proportions, *p*_{ij}'s. Note that the mixing proportions in the likelihood can be obtained by Equation 9 and need not be estimated at the tested positions. The EM algorithm (Dempster *et al.* 1977) is used for the estimation of the parameters in Equation 7 by treating the trait means and markers, $\bar{y}_i$'s and *X*_{i}'s, as *observed data* and the coded variables of mean genotypic values, $x^{*}_{ij}$'s and $z^{*}_{ij}$'s, as *missing data*.

#### The EM algorithm and maximum-likelihood estimate:

The coded variables, $x^{*}_{ij}$ and $z^{*}_{ij}$, associated with the additive and dominance effects in the mean genotypic values of $Q_j^{[v]}$ are determined by *K*_{j2}, *K*_{j1}, and *K*_{j0}. Therefore, inferring the distribution of $x^{*}_{ij}$ and $z^{*}_{ij}$ is equivalent to inferring the distribution of *K*_{j2}, *K*_{j1}, and *K*_{j0}. To infer the distribution of *K*_{j2}, *K*_{j1}, and *K*_{j0} using the marker information from the F_{u} individuals, one may first infer the distribution of the genotype $Q_j^{[u]}$ given the marker information and then infer the distribution of *K*_{j2}, *K*_{j1}, and *K*_{j0} given the QTL genotype of $Q_j^{[u]}$. That is,

$$
P(K_{j2}, K_{j1}, K_{j0} \mid I_j) = \sum_{g \in \{Q_jQ_j,\ Q_jq_j,\ q_jq_j\}} P\!\left(Q_j^{[u]} = g \mid I_j\right) P\!\left(K_{j2}, K_{j1}, K_{j0} \mid Q_j^{[u]} = g\right).
\tag{8}
$$

For the flanking marker interval, *I*_{j}, in the RI populations, there are nine possible flanking marker genotypes. Given each of the nine marker genotypes, the conditional probabilities of the QTL genotypes *Q*_{j}*Q*_{j}, *Q*_{j}*q*_{j}, and *q*_{j}*q*_{j} for the $Q_j^{[u]}$ within *I*_{j} are different in different RI populations, and they depend on the population structure. If an F_{2} population (*u* = 2) is genotyped for markers, these conditional probabilities of genotypes given the nine flanking marker genotypes have been provided by several researchers (see, *e.g.*, Table 2 of Kao and Zeng 1997). If an F_{u}, *u* > 2, population is genotyped, the conditional probabilities are similar to those for the F_{2} population with *r*_{t} substituted for *r*_{2} (see Haldane and Waddington 1931, for the derivation of *r*_{t}). Due to segregation, $Q_j^{[u]}$ and $Q_j^{[v]}$ may have the same or different genotypes. If $Q_j^{[u]}$ is *Q*_{j}*Q*_{j} (*q*_{j}*q*_{j}), $Q_j^{[v]}$ of each progeny is sure to be *Q*_{j}*Q*_{j} (*q*_{j}*q*_{j}). That is, $P(K_{j2} = k \mid Q_j^{[u]} = Q_jQ_j) = P(K_{j0} = k \mid Q_j^{[u]} = q_jq_j) = 1$. If $Q_j^{[u]}$ is *Q*_{j}*q*_{j}, *Q*^{[v]} among the *k* progeny can be *Q*_{j}*Q*_{j}, *Q*_{j}*q*_{j}, or *q*_{j}*q*_{j}, and it will follow a multinomial distribution (see *Genetic model*); *i.e.*,

$$
(K_{j2}, K_{j1}, K_{j0}) \mid Q_j^{[u]} = Q_jq_j \ \sim\ \text{Multinomial}\!\left(k;\ \frac{1 - w}{2},\ w,\ \frac{1 - w}{2}\right), \qquad w = \left(\tfrac{1}{2}\right)^{v-u}.
$$

Taking all three genotypes of $Q_j^{[u]}$ into consideration, it is straightforward to obtain $P(K_{j2}, K_{j1}, K_{j0} \mid I_j)$ from Equation 8. The possible number of allocations for each set of *K*_{j2}, *K*_{j1}, and *K*_{j0} is (*k* + 1)(*k* + 2)/2. If all the *m* QTL are considered at a time, there are 9^{m} possible flanking marker genotypes, and, for each marker genotype, there are in total [(*k* + 1)(*k* + 2)/2]^{m} possible allocations for *K*_{j2}'s, *K*_{j1}'s, and *K*_{j0}'s, *j* = 1, 2, … , *m* (genotypic means). The joint distribution of *K*_{j2}'s, *K*_{j1}'s, and *K*_{j0}'s, *j* = 1, 2, … , *m*, is simply the product of the *m* individual multinomial distributions:

$$
P(K_{12}, K_{11}, K_{10}, \ldots, K_{m2}, K_{m1}, K_{m0} \mid I_1, \ldots, I_m) = \prod_{j=1}^{m} P(K_{j2}, K_{j1}, K_{j0} \mid I_j).
\tag{9}
$$

Under such a setting, the proposed model can be statistically formulated as a two-stage hierarchical model for the use of the EM algorithm. First, the random variables $x^{*}_{ij}$'s and $z^{*}_{ij}$'s, *j* = 1, 2, … , *m*, are sampled from a multinomial experiment with the probabilities in Equation 9 to determine the genotypic mean $\bar{G}_i$, and then a normal variable for that genotypic mean is generated from $N(\bar{G}_i, \sigma^2)$, where $\bar{G}_i$ belongs to one of $\bar{G}_j$'s, *j* = 1, 2, … , [(*k* + 1)(*k* + 2)/2]^{m}, to produce the mean trait value, $\bar{y}_i$. Following the definition of the EM algorithm, the complete-data likelihood function can be written as

$$
L_c(\theta \mid \mathbf{Y}_{com}) = \prod_{i=1}^{n}\ \prod_{j=1}^{[(k+1)(k+2)/2]^{m}} \left[p_{ij}\, N\!\left(\bar{y}_i;\ \bar{G}_j, \sigma^2\right)\right]^{\delta_{ij}},
\tag{10}
$$

where δ_{ij} equals 1 if individual *i* has genotypic mean $\bar{G}_j$ and 0 otherwise, and **Y**_{com} contains the missing and observed data to denote the complete data. Note that the mixing proportions, *p*_{ij}'s, are not for estimation and can be determined by Equation 9. In the E-step, the conditional expected complete-data log-likelihood with respect to the conditional distribution of missing data given observed data and the current estimated parameter is computed. The M-step is to find θ to maximize the conditional expected complete-data log-likelihood. The maximization will become complicated as *k* or *m* increases. Although the derivations of the solutions in the M-step are complicated (not shown), these solutions can be regularized together in the form of the general formulas by Kao and Zeng (1997). The general formulas were originally devised to obtain the maximum-likelihood estimate (MLE) for the backcross and F_{2}/F_{2} designs by constructing a genetic design matrix **D** to systematize the solutions into tidy formulations, and the elements in the **D** matrix are the coded variables associated with the genetic effects in all the possible genotypic values. For the (F_{u}/F_{v}, *u* < *v*) designs, the complicated solutions in the maximization step after regularization have the same forms of general formulas by assigning the coded variables associated with the genetic effects in the genotypic means to the elements of **D**. For example, when considering *m* = 1 and *k* = 3, there are two coded variables, one for the additive effect and another for the dominance effect, and 10 possible genotypic means. The solutions are equivalent to the general formulas by constructing **D** with dimension 10 × 2 as

$$
\mathbf{D} =
\begin{bmatrix}
1 & -\tfrac{1}{2} \\
\tfrac{2}{3} & -\tfrac{1}{6} \\
\vdots & \vdots
\end{bmatrix}_{10 \times 2},
$$

where the first row with elements 1 and −1/2 corresponds to the coded variables, $x^{*}$ and $z^{*}$, in the first genotypic mean, $\bar{G}_1$, for all the progeny with *Q*^{[v]} = *QQ* (*K*_{2} = 3, *K*_{1} = 0, *K*_{0} = 0), and the second row with elements 2/3 and −1/6 is for the coded variables in the second genotypic mean, $\bar{G}_2$, for the progeny, two with *Q*^{[v]} = *QQ* and one with *Q*^{[v]} = *Qq* (*K*_{2} = 2, *K*_{1} = 1, *K*_{0} = 0). The remaining eight rows are for the other possible genotypic means, $\bar{G}_3, \bar{G}_4, \ldots, \bar{G}_{10}$, corresponding with the allocations of different genotypes among progeny. For *m* QTL in the model, there are [(*k* + 1)(*k* + 2)/2]^{m} possible genotypic means and 2*m* parameters (ignoring epistasis), and the genetic design matrix has a dimension of [(*k* + 1)(*k* + 2)/2]^{m} × 2*m*. Each row of **D** is assigned the values of the coded variables for the *m* QTL in each genotypic mean. The construction of the genetic design matrix for different *m* and *k* as well as for considering epistasis is straightforward, although the dimension expands dramatically as *m* or *k* becomes large. The E- and M-steps are iterated until convergence, and the converged values of the parameters are the MLE.
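The E- and M-steps can be illustrated for the one-QTL case with a small, dependency-free sketch. It assumes that the mixing proportions are already known for each individual (they would come from the marker information), that the rows of the design matrix hold the additive and dominance coded variables, and that the intercept μ is estimated together with *a* and *d*. The M-step here is a plain weighted least-squares solve, which maximizes the expected complete-data log-likelihood for a normal mixture with fixed mixing proportions; it is not the general formulas of Kao and Zeng (1997), and the names `em_fit`, `solve`, and `normal_pdf` are hypothetical.

```python
import math

def solve(A, b):
    """Solve the small linear system A x = b by Gaussian elimination."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def normal_pdf(y, mean, var):
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_fit(ybar, P, D, n_iter=100):
    """ybar: n trait means; P: n x J known mixing proportions;
    D: J rows (x, z) of coded variables.
    Returns [mu, a, d], sigma^2, and the log-likelihood at each iteration."""
    n = len(ybar)
    Z = [[1.0, x, z] for x, z in D]           # prepend the intercept column
    beta = [sum(ybar) / n, 0.0, 0.0]          # start from the overall mean
    var = sum((y - beta[0]) ** 2 for y in ybar) / n
    trace = []
    for _ in range(n_iter):
        means = [sum(b * v for b, v in zip(beta, row)) for row in Z]
        # E-step: posterior probability of each genotypic mean per individual
        W, loglik = [], 0.0
        for y, p in zip(ybar, P):
            w = [pj * normal_pdf(y, mj, var) for pj, mj in zip(p, means)]
            total = sum(w)
            loglik += math.log(total)
            W.append([wj / total for wj in w])
        trace.append(loglik)
        # M-step: weighted least squares for (mu, a, d), then update sigma^2
        A = [[0.0] * 3 for _ in range(3)]
        rhs = [0.0] * 3
        for y, w in zip(ybar, W):
            for wj, row in zip(w, Z):
                for r in range(3):
                    rhs[r] += wj * row[r] * y
                    for c in range(3):
                        A[r][c] += wj * row[r] * row[c]
        beta = solve(A, rhs)
        means = [sum(b * v for b, v in zip(beta, row)) for row in Z]
        var = sum(wj * (y - mj) ** 2
                  for y, w in zip(ybar, W) for wj, mj in zip(w, means)) / n
    return beta, var, trace
```

Because each M-step is an exact maximization, the recorded log-likelihood should never decrease from one iteration to the next, which gives a simple sanity check when experimenting with different designs and numbers of progeny.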

#### The problems if epistasis is present and ignored:

Many current methods ignore epistasis in the analysis of QTL for simplicity. It is important to examine the problems that arise if epistasis is present but ignored and, in addition, to solve them in QTL mapping. Without loss of generality, consider that the quantitative trait is controlled by two unlinked epistatic QTL, Q_{A} and Q_{B}. If the trait value is regressed on Q_{A} (Q_{B}), the estimates of the additive and dominance effects can be found to be

$$
a_A = a_1 - \left[\frac{1}{2} - \left(\frac{1}{2}\right)^{u-1}\right] i_{ad}, \qquad
d_A = d_1 - \left[\frac{1}{2} - \left(\frac{1}{2}\right)^{u-1}\right] i_{dd}
\tag{11}
$$

in the F_{u}/F_{u} designs, where *a*_{1} (*d*_{1}) is the additive (dominance) effect of Q_{A}, and *i*_{ad} and *i*_{dd} are their epistatic effects (Equation 11 can be obtained from Equations A2 and A3 in the appendix by setting *u* = *v*). It shows that the estimate *a*_{A} confounds *a*_{1} with *i*_{ad}, and *d*_{A} confounds *d*_{1} with *i*_{dd}. Also, it is important to note that the epistatic effect *i*_{aa} is not confounding in the estimation of the marginal effects. In the F_{2}/F_{2} design, *a*_{A} = *a*_{1} and *d*_{A} = *d*_{1}, and there is no confounding problem as the model has the orthogonal property in this design. In the (F_{u}/F_{u}, *u* > 2) designs, the problem of confounding occurs. For example, *a*_{A} = *a*_{1} − *i*_{ad}/4 and *d*_{A} = *d*_{1} − *i*_{dd}/4 in the F_{3}/F_{3} design, *a*_{A} = *a*_{1} − 3*i*_{ad}/8 and *d*_{A} = *d*_{1} − 3*i*_{dd}/8 in the F_{4}/F_{4} design, and *a*_{A} = *a*_{1} − 7*i*_{ad}/16 and *d*_{A} = *d*_{1} − 7*i*_{dd}/16 in the F_{5}/F_{5} design. The fractions associated with the confounding epistatic effects are 1/4, 3/8, and 7/16 in the F_{3}/F_{3}, F_{4}/F_{4}, and F_{5}/F_{5} designs, respectively, and the confounding problem becomes more serious in the later RI populations. This fraction approaches 1/2 for the designs with large *u*. For large *u*, the dominance component may become diminished and hard to estimate, and the additive component plays the major role in estimation, due to the loss of heterozygotes and increase of homozygotes in the later F_{u} populations. But, for the early F_{u} populations, say the F_{2}, F_{3}, and F_{4} populations, the dominance components may not be negligible and should be considered in the model. In addition, ignoring epistasis can inflate the sampling variances of QTL effects and will reduce the power of QTL detection. Among the epistatic variance components, the component contributed by *i*_{aa} is relatively larger than the components contributed by the other epistatic effects. According to Equation 5, the components contributed by *i*_{aa} are $i_{aa}^2/4$, $9i_{aa}^2/16$, and $49i_{aa}^2/64$ in the F_{2}/F_{2}, F_{3}/F_{3}, and F_{4}/F_{4} populations, respectively. This component becomes greater for the later RI populations and approaches $i_{aa}^2$ for *u* → ∞. The above implies that QTL mapping could be problematic, with biased estimates of QTL parameters and reduced power of QTL detection, if epistasis is ignored in QTL analysis. By taking epistasis into account, the variance components contributed by the epistatic effects can be controlled to enhance the power of QTL detection, and the confounding problem can be avoided to improve QTL detection.
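The size of the confounding in the F_{u}/F_{u} designs can be tabulated with a short sketch. It rests on an assumption that is spelled out here rather than taken from the appendix: under Cockerham coding with unlinked loci and pure selfing, the dominance coded variable at the other locus has expectation (1/2)^{u−1} − 1/2, so ignoring epistasis shifts the marginal estimate by that multiple of the epistatic effect. The function names are illustrative.

```python
from fractions import Fraction

def confounding_fraction(u):
    """Fraction of the epistatic effect (i_ad or i_dd) subtracted from the
    marginal effect when epistasis is ignored in an F_u/F_u design:
    1/2 minus the heterozygosity (1/2)**(u - 1) of the F_u population."""
    if u < 2:
        raise ValueError("u must be at least 2")
    return Fraction(1, 2) - Fraction(1, 2) ** (u - 1)

def marginal_estimates(u, a1, d1, iad, idd):
    """Expected one-QTL estimates (a_A, d_A) under the assumed pattern."""
    f = confounding_fraction(u)
    return a1 - f * iad, d1 - f * idd

# The fraction is zero in F2 and grows toward 1/2 in later generations.
for u in (2, 3, 4, 5, 10):
    print(u, confounding_fraction(u))
```

With the parameter values *a*_{1} = 2 and *i*_{ad} = 2, for instance, `marginal_estimates(3, 2, 0, 2, 0)` gives an expected additive estimate of 3/2 in the F_{3}/F_{3} design, compared with 2 in the F_{2}/F_{2} design.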

#### The traditional (approximate) method and its problems:

The approximate method models the relation between the mean trait of the F_{v} progeny and the QTL in their ancestral F_{u} individuals, *Q*^{[u]}'s. It implicitly assumes that the traits measured on the F_{v} progeny are controlled by the genomes of the F_{u} individuals, rather than by those of the progeny. This assumption overlooks the differences between population structures, as the genotypic frequencies, heterozygosity, and linkage disequilibrium of the genotyped (ancestral) and phenotyped (progeny) populations are different. Consequently, problems such as reduced power and bias in estimation will occur (see the appendix). By the approximate method, the estimate of the additive effect, *b*_{a}, confounds the additive effect *a*_{1} with the epistatic effect *i*_{ad}, and the estimate of the dominance effect, *b*_{d}, confounds *d*_{1} with *i*_{dd}. The confounding depends on *u* and *v*. For example, *b*_{a} = *a*_{1} − *i*_{ad}/4 and *b*_{d} = *d*_{1}/2 − *i*_{dd}/8 in the F_{2}/F_{3} design, *b*_{a} = *a*_{1} − 3*i*_{ad}/8 and *b*_{d} = *d*_{1}/4 − 3*i*_{dd}/32 in the F_{2}/F_{4} design, *b*_{a} = *a*_{1} − 3*i*_{ad}/8 and *b*_{d} = *d*_{1}/2 − 3*i*_{dd}/16 in the F_{3}/F_{4} design, and *b*_{a} = *a*_{1} − 7*i*_{ad}/16 and *b*_{d} = *d*_{1}/4 − 7*i*_{dd}/64 in the F_{3}/F_{5} design (see the appendix). In the absence of epistasis, the estimated additive effect is an unbiased estimate of *a*_{1}, and the estimated dominance effect is only a fraction of *d*_{1}. Using the approximate method, the confounding of *i*_{ad} and *i*_{dd} in the estimation of additive and dominance effects becomes more severe for the designs with a larger difference between *u* and *v*; moreover, the confounding problems remain unsolved if epistasis is taken into account (see the appendix). In addition to the confounding problem in estimation, the uncontrolled genetic variance will become a part of the genetic residual, causing loss of power in QTL detection. In general, the application of the approximate method to QTL mapping in the (F_{u}/F_{v}, *u* < *v*) designs has the problems of confounding, estimating dominance effects, and controlling the genetic variances. To avoid these problems and to increase the power, it is desirable to consider the genome structure in the F_{v} population by using the proposed method for QTL mapping in the systems of the (F_{u}/F_{v}, *u* < *v*) designs.
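The worked examples for the approximate method (F_{2}/F_{3}, F_{3}/F_{4}, and F_{3}/F_{5}) are consistent with a simple pattern, sketched below. The pattern is an inference from those examples under Cockerham coding and pure selfing, not a formula quoted from the appendix: the epistatic shift is taken to depend on the phenotyped generation *v* through 1/2 − (1/2)^{v−1}, and the dominance effect is taken to be retained only in the fraction (1/2)^{v−u} of progeny that stay heterozygous. The function name is hypothetical.

```python
from fractions import Fraction

def approximate_estimates(u, v, a1, d1, iad, idd):
    """Expected (b_a, b_d) when F_v progeny means are regressed on F_u genotypes.
    Assumed pattern: shift = 1/2 - (1/2)**(v - 1), retained = (1/2)**(v - u)."""
    if u < 2 or v < u:
        raise ValueError("require 2 <= u <= v")
    shift = Fraction(1, 2) - Fraction(1, 2) ** (v - 1)
    retained = Fraction(1, 2) ** (v - u)
    b_a = a1 - shift * iad
    b_d = retained * (d1 - shift * idd)
    return b_a, b_d

# Coefficients for a few designs, with all effects set to one for readability.
for u, v in ((2, 3), (3, 4), (3, 5)):
    print(f"F{u}/F{v}", approximate_estimates(u, v, 1, 1, 1, 1))
```

Setting *u* = *v* in this sketch removes the shrinkage of the dominance effect and leaves only the epistatic shift, which matches the behavior described for the F_{u}/F_{u} designs.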

## SIMULATION STUDIES

Simulations were performed to achieve three purposes: (1) to verify the derived mapping properties of the proposed and current methods, (2) to compare the performance of the proposed and current methods in different (F_{u}/F_{v}, *u* ≤ *v*) designs, and (3) to evaluate the relative mapping efficiency of different experimental designs. Two 100-cM chromosomes, each with 11 equally spaced markers and one QTL, were simulated. The two unlinked epistatic QTL, Q_{A} and Q_{B}, are assumed to be located at 25 cM on their chromosomes. The additive effects of Q_{A} and Q_{B} are assumed to be *a*_{1} = 2 and *a*_{2} = 2, respectively, and there is no dominance effect. Their additive-by-dominance effect is assumed to be *i*_{ad} = 2, and the other three epistatic effects are assumed to be zero. With these parameter settings, the marginal effects of the two QTL contribute 44.44% and 44.44% to the total genetic variance, respectively, and epistasis contributes 11.11% to the total genetic variance in the F_{2}/F_{2} design. The environmental variance is assumed to be 85.5 (the heritability, *h*^{2}, is 0.05 in the F_{2}/F_{2} design). Also, according to Equation 11, when ignoring epistasis in the estimation of *a*_{1} and *a*_{2}, the estimate of *a*_{1} will be confounded by *i*_{ad}, and the estimate of *a*_{2} will not be confounded. The QTL Q_{A} will be referred to as the confounded QTL, and Q_{B} will be referred to as the unconfounded QTL. The sample size is 200, and the number of replicates is 100. The simulations include three parts. The first part is for the (F_{u}/F_{v}, *u* = *v*) designs. Six such designs, F_{2}/F_{2}, F_{3}/F_{3}, F_{4}/F_{4}, F_{5}/F_{5}, F_{6}/F_{6}, and F_{10}/F_{10}, are simulated. The second part is for the F_{2}/F_{3} designs, and four different numbers of phenotyping progeny, *k* = 1, *k* = 3, *k* = 5, and *k* = 10, are assumed. The third part considers designs with other genotyping and phenotyping populations, including the F_{2}/F_{4}, F_{3}/F_{4}, F_{3}/F_{5}, and F_{4}/F_{5} designs. The number of progeny for phenotyping is assumed to be *k* = 5. In each part, a stepwise selection procedure (Kao *et al.* 1999; Zeng *et al.* 1999) is adopted to detect QTL. Both the proposed and approximate interval mapping (IM)-based (one-QTL) and MIM-based (multiple-QTL) methods are used in the analysis. The critical value for claiming significance was chosen as , where *k* is the number of parameters in testing (see discussion). The simulation results are shown in Tables 1–3.
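The variance shares and heritability quoted for the F_{2}/F_{2} design can be checked with a few lines of arithmetic. The sketch below assumes the usual F_{2} variance components of an orthogonal (Cockerham-type) model, *a*^{2}/2 for each additive effect and *i*_{ad}^{2}/8 for the additive-by-dominance effect; the function name and this decomposition are illustrative assumptions rather than formulas quoted from the article.

```python
# Hypothetical helper: F2 genetic variance components under an assumed
# orthogonal (Cockerham-type) model with two unlinked QTL.
def f2_genetic_variances(a1, a2, i_ad):
    """Assumed F2 components: a^2/2 per additive effect, i_ad^2/8 for a-by-d epistasis."""
    return {"Q_A": a1 ** 2 / 2, "Q_B": a2 ** 2 / 2, "epistasis": i_ad ** 2 / 8}


# Parameter values taken from the simulation setting described in the text.
components = f2_genetic_variances(a1=2.0, a2=2.0, i_ad=2.0)
genetic_variance = sum(components.values())
shares = {name: value / genetic_variance for name, value in components.items()}

env_variance = 85.5
heritability = genetic_variance / (genetic_variance + env_variance)

for name, share in shares.items():
    print(f"{name}: {100 * share:.2f}% of the genetic variance")
print(f"h^2 = {heritability:.2f}")
```

Under these assumptions the two marginal effects each account for 4/9 of the genetic variance, epistasis accounts for 1/9, and the heritability is 4.5/90 = 0.05, in agreement with the values stated above.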

Table 1 shows the results of the first part of the simulation. When the IM-based method is used to detect QTL, one can consider one (additive or dominance) effect or two effects in the search. The model considering the dominance effect only did not detect any QTL, and the performance of the two-effect model is inferior to that of the additive-effect model. Table 1 presents the QTL mapping results of the model considering the additive effect only. The powers for detecting the confounded Q_{A} are 30, 14, 17, 18, 11, and 17% in the F_{2}/F_{2}, F_{3}/F_{3}, F_{4}/F_{4}, F_{5}/F_{5}, F_{6}/F_{6}, and F_{10}/F_{10} designs, respectively, and the powers for detecting the unconfounded Q_{B} are 33, 39, 44, 50, 53, and 54% in the six different designs, respectively (critical value ). As compared to the QTL detection in the F_{2}/F_{2} design, the confounded Q_{A} was detected with decreasing power, and the unconfounded Q_{B} was detected with increasing power by using the later RI populations. The reasons are that the estimation of the additive effect of Q_{A} is confounded by *i*_{ad} due to ignoring epistasis, with such confounding becoming more severe in the F_{3}, F_{4}, F_{5}, F_{6}, and F_{10} populations, and that the estimation of the additive effect of Q_{B} is not confounded, so that the power can be increased due to the accumulation of homozygotes in the later RI populations (Equation 11). The estimated additive effects of the confounded Q_{A} by the IM method are 2.13 (SD 1.40), 1.43 (SD 1.25), 1.40 (SD 1.11), 1.10 (SD 1.24), 1.08 (SD 1.17), and 1.02 (SD 1.25), respectively, in the six designs (the predicted confounded estimates by the IM method are 2.0, 1.5, 1.25, 1.125, 1.0625, and 1.00394, respectively, according to Equation 11). Except for the F_{2}/F_{2} design, the estimates of *a*_{1} by the IM method are poor and far from the true value *a*_{1} = 2 due to the confounding of *i*_{ad} = 2. The confounding problem is more severe for the confounded Q_{A} in the designs using the later RI populations if epistasis is ignored. The estimated effects of the unconfounded Q_{B} are 2.20 (SD 1.46), 2.09 (SD 1.06), 2.06 (SD 0.88), 2.07 (SD 0.97), 2.23 (SD 0.65), and 2.14 (SD 0.75), respectively. These estimates by the IM method are very close to the true value *a*_{2} = 2 as expected, because they are not confounded. The advantages of the MIM method are that the detected QTL can be fitted into the model for further QTL search and that the epistasis between QTL can be considered. When the MIM method considers only one QTL in the model (*m* = 1), the mapping results are identical to those of the IM method. If the detected Q_{A} (Q_{B}) is fitted into the MIM model (*m* = 2 without epistasis) in the search for other QTL, the powers of detecting both Q_{A} and Q_{B} increase by 1–6%. For example, the powers increase by 6% (1%) to 45% (15%) by using the MIM method without epistasis in the F_{3}/F_{3} design. If epistasis is taken into account in the search, four different types of epistatic effects can be considered in the model. Among the four possible epistatic effects, only the model taking *i*_{ad} into account improves QTL detection, and the models fitting other epistasis become inferior as a higher critical value is used for claiming significance (critical value ). By considering epistasis, the values of the average partial LRT statistic increase by 1–2 and the powers of detecting Q_{A} and Q_{B} also increase as compared to the MIM method without epistasis. For example, the powers of detecting Q_{A} and Q_{B} increase by 4% (5%) and 7% (7%) to 35% (20%) and 41% (52%) in the F_{2}/F_{2} (F_{3}/F_{3}) design after taking epistasis into account. The increase in powers of detecting Q_{A} for the other designs is less notable. Also, the confounding problem in the estimation of *a*_{1} seems to be relieved by considering *i*_{ad} in the model. The means of the estimated *a*_{1} are 2.06 (SD 1.33), 1.74 (SD 1.26), 1.63 (SD 1.74), 1.51 (SD 2.11), 1.52 (SD 2.22), and 1.04 (SD 2.42) in the six designs, respectively, and the means of the estimated *i*_{ad} are 1.33 (SD 2.63), 1.50 (SD 3.01), 0.78 (SD 3.42), 1.15 (SD 4.39), 0.97 (SD 4.67), and 0.10 (SD 4.76), respectively. These estimates of *a*_{1} and *i*_{ad} in the later RI populations seem to be unsatisfactory, especially for the F_{10}/F_{10} design, but they can be improved by increasing the sample size, by a higher heritability (not shown), or by using the (F_{u}/F_{v}, *u* < *v*) designs with multiple phenotyping progeny (Table 3). The poor estimation of *i*_{ad} in the F_{10}/F_{10} design may be attributed to the lack of heterozygotes. In the estimation of the QTL position, the means of the estimated positions of Q_{A} are 33.88 (SD 25.32), 39.62 (SD 27.43), 39.06 (SD 26.99), 42.58 (SD 29.25), 38.64 (SD 27.58), and 44.24 (SD 28.43), respectively, in the six different designs, and the means for Q_{B} are 37.81 (SD 26.86), 33.78 (SD 21.60), 30.34 (SD 17.84), 32.37 (SD 22.16), 30.63 (SD 16.20), and 32.76 (SD 19.42), respectively. The estimated QTL positions are found to be biased toward the center of chromosomes. The position of the unconfounded Q_{B} (confounded Q_{A}) seems to be estimated with greater (reduced) accuracy as the later RI populations are used in the design. The above results show that the use of the RI populations after F_{2} can improve the estimation of parameters and the power of detection of the unconfounded Q_{B}, but it is difficult to improve the resolution of the confounded Q_{A} as compared to the use of the F_{2}/F_{2} design.
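The pattern in the predicted confounded estimates can be reproduced numerically. The sketch below assumes that, for the (F_{v}/F_{v}) designs and the parameter setting used here, the prediction takes the form *a*_{1} − (*i*_{ad}/2)[1 − (1/2)^{v−2}]; this form is inferred from the listed predictions and is not quoted from Equation 11. It also prints the expected heterozygote frequency (1/2)^{v−1} under repeated selfing, which relates to the remark about the lack of heterozygotes in F_{10}. The function names are hypothetical.

```python
def heterozygote_frequency(v):
    """Expected heterozygote frequency at one locus in F_v under repeated selfing."""
    return 0.5 ** (v - 1)


def predicted_confounded_a1(a1, i_ad, v):
    """Assumed form (inferred, not quoted): a1 - (i_ad / 2) * (1 - (1/2)^(v - 2))."""
    return a1 - (i_ad / 2) * (1 - 0.5 ** (v - 2))


generations = [2, 3, 4, 5, 6, 10]
predictions = [predicted_confounded_a1(2.0, 2.0, v) for v in generations]

for v, pred in zip(generations, predictions):
    het = heterozygote_frequency(v)
    print(f"F{v}: heterozygote frequency {het:.4f}, predicted a1 estimate {pred:.5f}")
```

Under this assumed form the first five values are 2.0, 1.5, 1.25, 1.125, and 1.0625, as listed in the text; the F_{10} value evaluates to about 1.00391, slightly different in the fifth decimal from the 1.00394 quoted above.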

Table 2 shows the QTL mapping results using the F_{2}/F_{3} designs with different numbers of phenotyping progeny. If there is only 1 progeny (*k* = 1) for trait measurement, the IM and MIM (with or without epistasis) methods have less power to detect Q_{A} and Q_{B} as compared to the powers in the F_{2}/F_{2} or F_{3}/F_{3} designs. For example, the powers of detecting Q_{B} (Q_{A}) by the proposed MIM method are 32% (12%) in the F_{2}/F_{3} design with *k* = 1, and they are 41% (35%) and 52% (20%) in the F_{2}/F_{2} and F_{3}/F_{3} designs, respectively. By increasing the number of phenotyping progeny, the powers of detecting Q_{A} and Q_{B} are enhanced. By using 3 (*k* = 3), 5 (*k* = 5), and 10 (*k* = 10) progeny for phenotyping, the powers of detecting the unconfounded Q_{B} increase to 83, 93, and 100%, respectively, and the powers of detecting the confounded Q_{A} become 46, 73, and 99%, respectively. As compared to the results of the IM method in the F_{2}/F_{2} and F_{3}/F_{3} designs, the use of three progeny for phenotyping can greatly enhance the power of detecting the unconfounded Q_{B} from 33% (or 39%) to 79%, but it does not greatly increase the power of detecting the confounded Q_{A} [the powers of detecting Q_{A} in the F_{2}/F_{2}, F_{3}/F_{3}, and (F_{2}/F_{3}, *k* = 3) designs are 30, 14, and 37%, respectively]. The use of more progeny for phenotyping also improves the estimation of QTL effects and positions. For example, by using the proposed MIM method with epistasis, the means of the estimated Q_{B} (Q_{A}) positions are 37.73 cM with SD 27.41 cM (43.06 cM with SD 29.23 cM) and 25.22 cM with SD 8.25 cM (27.54 cM with SD 15.82 cM) for *k* = 1 and *k* = 5, respectively. The estimated *a*_{1}, *a*_{2}, and *i*_{ad} are 1.62 (SD 2.01), 2.03 (SD 1.44), and 1.25 (SD 5.48) for *k* = 1, and they are 1.96 (SD 0.68), 2.01 (SD 0.45), and 1.91 (SD 2.18) for *k* = 5. In addition, if epistasis is not taken into account or the approximate method is used, there will always be confounding problems in estimation, as shown in Table 2. For example, the means of the estimated *a*_{1} and *i*_{ad} by the approximate MIM method are 1.48 (predicted value *a*_{1} − *i*_{ad}/4 = 1.5) with SD 0.49 and 0.98 (predicted value *i*_{ad}/2 = 1.0) with SD 1.10 for *k* = 5. By using 10 progeny for phenotyping, both Q_{A} and Q_{B} can be detected with power close to 1 and with good precision and accuracy. In general, the estimation improves as more progeny are used for phenotyping. The performance of the MIM method is also better than that of the IM method, as expected.
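One reason the power rises with *k* is that averaging over progeny shrinks the residual (environmental) part of the variance of a family mean. The sketch below tabulates σ^{2}/*k* for the simulated environmental variance; it covers only the residual component and ignores the genetic segregation among progeny within a family, and the function name is a hypothetical one chosen for illustration.

```python
def residual_variance_of_mean(sigma2, k):
    """Residual variance of the mean of k progeny with independent residuals."""
    return sigma2 / k


sigma2 = 85.5  # environmental variance used in the simulations
residual_by_k = {k: residual_variance_of_mean(sigma2, k) for k in (1, 3, 5, 10)}

for k, value in residual_by_k.items():
    print(f"k = {k:2d}: residual variance of the progeny mean = {value:.2f}")
```

With *k* = 10 the residual variance of the progeny mean is one-tenth of its single-individual value, which is in line with the near-complete power reported for that setting.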

Table 3 shows the QTL mapping results of using the (F_{u}/F_{v}, *v* > *u* > 2) designs with *k* = 5. The means of the estimated additive effects by the proposed IM and MIM without epistasis are 1.33 (SD 0.41) and 1.28 (SD 0.39), respectively, in the F_{3}/F_{4} design, and they are 1.20 (SD 0.33) and 1.17 (SD 0.36), respectively, in the F_{4}/F_{5} design (the predicted estimated *a*_{1}'s by Equation 11 are 1.25 and 1.125 in the two designs). By taking epistasis into account, the confounding problem can be solved by the proposed method. The means of the estimated *a*_{1} by considering epistasis are 1.92 (SD 0.87) and 1.86 (SD 1.12) in the two designs, respectively. The means of the estimated *i*_{ad}'s are 1.81 (SD 2.41) and 1.65 (SD 2.58), respectively. On the contrary, the approximate methods always have the confounding problem whether or not epistasis is taken into account. For example, the means of the estimated additive effects by the approximate MIM with epistasis are 1.48 (SD 0.43) and 1.47 (SD 0.56) for the two designs, respectively (the predicted *a*_{1} by the approximate method is 1.5). The powers of QTL detection also increase and the estimation of QTL positions is also improved by using the MIM approach and by taking epistasis into account. For example, the power increases from 68% (59%) by the IM method to 84% (75%) by the MIM method in the F_{3}/F_{4} (F_{4}/F_{5}) design, and the mean of the estimated position of the confounded Q_{A} is improved from 27.05 with SD 13.44 (27.27 with SD 12.16) to 25.60 with SD 14.04 (26.86 with SD 10.93). In addition, the use of the F_{2}/F_{4} and F_{3}/F_{5} designs does not provide a better resolution of the confounded Q_{A} when compared to the use of the F_{2}/F_{3} and F_{3}/F_{4} designs, as expected. For example, the powers of detecting Q_{A} by the proposed MIM method are 52 and 71%, respectively, in the two designs, and the means of the estimated positions are 29.02 cM (SD 19.96 cM) and 28.88 cM (SD 18.90 cM), respectively. Across all different designs with *k* = 5, the unconfounded Q_{B} can be detected with high power and great accuracy and precision as compared to the confounded Q_{A}.

## DISCUSSION

The data required in QTL mapping analysis are usually composed of two parts, phenotypic trait values and marker genotypes. In data collection using the designs of RI populations, the trait values can be obtained from the same genotyped population by using the (F_{u}/F_{v}, *u* = *v*) designs or from the progeny of the genotyped population by using the (F_{u}/F_{v}, *u* < *v*) designs. The great benefit of using the (F_{u}/F_{v}, *u* < *v*) designs in QTL mapping comes not only from reducing the cost and environmental variance by phenotyping several progeny for each genotyped individual (Lander and Botstein 1989; Knapp and Bridges 1990), but also, likely, from taking advantage of the changes in population structures between different RI populations. Different RI populations have different homozygosities, genotypic frequencies, and proportions of recombinant genotypes. The increase of homozygosity may help the estimation of additive effects due to the accumulation of homozygotes, but it will hinder the estimation of dominance effects due to the loss of heterozygotes. Also, in modeling, the orthogonal property of the genetic model, which holds in the F_{2} population, will be lost in the later RI populations as the genotypic frequencies change. Then, the confounding problem in the QTL estimation may occur if epistasis is present and ignored. Such a confounding problem cannot be relieved by enlarging the sample size, increasing heritability, or using the approximate methods, and it becomes more severe for the later populations (see Equation 11 and Table 1). Therefore, the use of the later RI populations can greatly benefit the detection of unconfounded QTL with additive effects, but it may deter the detection of confounded QTL and of QTL with large dominance effects as compared to QTL mapping using the F_{2} population. By taking epistasis into account, the confounding problem can be alleviated by the proposed method. The approximate method, however, always has the confounding problem. In addition, the (F_{u}/F_{v}, *u* < *v*) designs also allow for phenotyping more progeny for each genotyped individual to reduce environmental variance so that the resolution of QTL can be further enhanced. The resolution of the unconfounded QTL can be easily improved, but more progeny are needed to improve the resolution of the confounded QTL as compared to the F_{u}/F_{u} designs (comparing the results of Table 1 with those of Tables 2 and 3).

In statistical modeling, the relation between the phenotype and the underlying QTL genotype is relatively simple and can be modeled by a 3^{*m*} normal mixture model for the (F_{u}/F_{v}, *u* = *v*) designs. However, for the (F_{u}/F_{v}, *u* < *v*) designs, when the phenotypic means of the *k* progeny from the genotyped individuals are used in QTL mapping, the relationship between the phenotypic means and the involved QTL genotypes becomes increasingly complicated and should be modeled by a [(*k* + 1)(*k* + 2)/2]^{*m*} normal mixture model as discussed here. Such complication in statistical modeling arises mainly from the segregation of a heterozygote into homozygotes and heterozygotes and from the numerous possible combinations of different genotypes among the *k* progeny. Genetically, segregation will vary the homozygosity and linkage disequilibrium in different RI populations. It is possible to utilize different experimental designs of these RI populations to benefit QTL mapping by taking advantage of their specific population structures. To achieve this purpose, for QTL mapping in the (F_{u}/F_{v}, *u* < *v*) design, the proposed method is designed to take the population structures of phenotyping populations into account by modeling the relationship between the phenotypic means and the underlying QTL in the same populations. Then, the likelihood of the proposed method is a mixture of [(*k* + 1)(*k* + 2)/2]^{*m*} normals with the number of mixture components and mixing proportions adjusted for the phenotyping population. The approximate method, however, ignores the fact of segregation and differences in population structure between different RI populations, and it relates the phenotypic traits of the progeny with the QTL in their ancestral populations. Therefore, the likelihood of the approximate method is always a mixture of 3^{*m*} normals with constant mixing proportions derived from the genotyped population. Consequently, the approximate method may have the problems of confounding and estimating the dominance effect, and the proposed method can avoid the problems to improve the QTL mapping in (F_{u}/F_{v}, *u* < *v*) designs as shown in this article. In addition, it is straightforward to modify the proposed method for the (F_{u}/F_{v}, *u* < *v*) designs with each individual progeny trait (*y*_{ij}'s, *i* = 1, 2, … , *n*, *j* = 1, 2, … , *k*) recorded. The mapping results by the approaches of using traits and trait means are similar, but the approach of using individual traits can be more computationally economical for its relatively simple likelihood (with a mixture of 3^{*m*} normals).
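The growth in the number of mixture components can be made concrete by counting. The sketch below reads (*k* + 1)(*k* + 2)/2 as the number of ways *k* progeny can be split among the three genotypes at one QTL; that reading, and the function names, are assumptions made for illustration rather than statements taken from the derivation.

```python
def progeny_compositions(k):
    """All (n_QQ, n_Qq, n_qq) counts with n_QQ + n_Qq + n_qq = k."""
    return [
        (n_QQ, n_Qq, k - n_QQ - n_Qq)
        for n_QQ in range(k + 1)
        for n_Qq in range(k + 1 - n_QQ)
    ]


def components_progeny_means(k, m):
    """Component count for the progeny-mean model: [(k + 1)(k + 2) / 2]^m."""
    return len(progeny_compositions(k)) ** m


def components_fixed_genotypes(m):
    """Component count when each QTL has three possible genotypes: 3^m."""
    return 3 ** m


for k in (1, 3, 5, 10):
    per_qtl = len(progeny_compositions(k))
    assert per_qtl == (k + 1) * (k + 2) // 2  # enumeration agrees with the closed form
    print(f"k = {k:2d}: {per_qtl} per QTL, {components_progeny_means(k, 2)} for m = 2 "
          f"(versus {components_fixed_genotypes(2)} for the 3^m mixture)")
```

For *k* = 5 and *m* = 2, for example, the count is 21^{2} = 441 components against 3^{2} = 9, which illustrates why working with individual progeny traits can be computationally lighter.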

The proposed method has a much more complicated mixture likelihood, and the mixture likelihood will have different numbers of components with different weights (mixing proportions) for different designs. Therefore, the determination of the critical value for the proposed method is challenging in the (F_{u}/F_{v}, *u* ≤ *v*) designs. It is well known that the critical value cannot be simply chosen from a χ^{2}-distribution because of violation of the standard conditions of asymptotic theory for mixture models (Self and Liang 1987; Feng and McCulloch 1994) and that the determination of the critical values for claiming QTL detection may depend on factors such as heritability, marker density, size of the genomes, number of (linked or unlinked) QTL, and the direction of QTL effects (Jensen 1993; Zeng *et al.* 1999; Zou *et al.* 2004). Several methods, such as the method by Piepho (2001), the permutation tests (Churchill and Doerge 1994), residual bootstrapping (Zeng *et al.* 1999), and the resampling method by Zou *et al.* (2004), have been proposed to determine the values, but they generally require additional assumptions, such as a dense map with equally spaced markers, or are applicable only to some standard designs, such as a backcross or F_{2} design (the F_{u}/F_{v}, *u* = *v* design), or are restricted to the model with a 2- (3-) normal mixture (see Zou *et al.* 2004 for the discussion). In addition, the concept of the false discovery rate (Benjamini and Hochberg 1995) has been introduced to deal with the problem of statistical significance by controlling the expected proportion of false positives among the declared discoveries rather than the type I error rate in QTL mapping. As the proposed method considers a more complicated mixture of [(*k* + 1)(*k* + 2)/2]^{*m*} normals with different numbers of components and mixing proportions varying with *m*, *k*, *u*, and *v*, and the different population structures may also affect the critical values, the issue of determining the critical values in the (F_{u}/F_{v}, *u* ≤ *v*) designs will become even more complicated and still needs to be unraveled. Here, a Bonferroni argument based on the χ^{2}-distribution (Lander and Botstein 1989) is used to choose the critical values until this issue is resolved. Further research on the theoretical basis of determining the critical value is of great value to QTL mapping in the (F_{u}/F_{v}, *u* ≤ *v*) designs.
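As an illustration of a Bonferroni argument based on the χ^{2}-distribution, the sketch below computes an upper-tail χ^{2} quantile at level α divided by the number of tests. The genome-wide level α = 0.05 and the use of the 20 marker intervals (two chromosomes with 11 markers each) as the number of tests are hypothetical choices made for this example; the exact expression used in the simulations is not reproduced here. To stay within the standard library, the function handles only 1 or 2 degrees of freedom.

```python
import math
from statistics import NormalDist


def bonferroni_chi2_critical(df, alpha, n_tests):
    """Upper-tail chi-square quantile at level alpha / n_tests (df = 1 or 2 only)."""
    per_test = alpha / n_tests
    if df == 1:
        # A chi-square variable with 1 d.f. is the square of a standard normal.
        z = NormalDist().inv_cdf(1 - per_test / 2)
        return z * z
    if df == 2:
        # With 2 d.f. the survival function is exp(-x / 2).
        return -2.0 * math.log(per_test)
    raise ValueError("this sketch only handles df = 1 or df = 2")


n_intervals = 2 * (11 - 1)  # assumed number of tests: marker intervals on two chromosomes
for df in (1, 2):
    crit = bonferroni_chi2_critical(df, alpha=0.05, n_tests=n_intervals)
    print(f"df = {df}: critical value about {crit:.2f}")
```

The critical value grows with the number of parameters tested, which is consistent with the remark that models fitting additional epistatic effects face a higher threshold for claiming significance.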

The RI populations have been very important and popular in the study of QTL for a long time (Haldane and Waddington 1931; Stuber *et al.* 1992; Beavis *et al.* 1994; Veldboom *et al.* 1994; Darvasi and Soller 1995; Austin and Lee 1996; Liu *et al.* 1996; Belknap 1998; Chapman *et al.* 2003; Complex Trait Consortium 2004; Broman 2005). As compared to the F_{2} population, the population structures in the later RI populations have some valuable properties, such as larger additive genetic variance, higher homozygosity, and more recombinants. These properties may benefit the QTL resolution and should be well utilized in the study of QTL mapping. With the ability to consider the changes in population structures of different populations, the proposed method can serve as an effective tool to map for QTL in specific designs and evaluate the efficiency of QTL mapping among different experimental designs under the system of RI populations. Other important issues of QTL mapping by using the (F_{u}/F_{v}, *u* ≤ *v*) designs include the consideration of endosperm traits (Wu *et al.* 2002; Xu *et al.* 2003; Kao 2004) and the extension of the methods from the system of RI populations to the system of IRI populations. The IRI populations, which are derived by randomly mating for some generations after F_{2} and then followed by cycles of selfing, have the advantage of producing more recombinants as compared to the RI populations, and they can benefit the analysis of quantitative traits (Liu *et al.* 1996; Winkler *et al.* 2003). It is critical to provide adequate statistical methods for these designs by considering their specific population structures to explore their properties in the QTL mapping study.

## APPENDIX

If *m* QTL without epistasis are considered, the model of the traditional method for the mean trait of the *k* F_{v} progeny from each of the *n* genotyped F_{u} individuals and the QTL can be written as (A1), where the additive and dominance coded variables for the genotypes of the *m* QTL in the F_{u} population, *j* = 1, 2, … , *m*, are coded as (1, −1/2), (0, 1/2), and (−1, −1/2) for *Q*_{j}*Q*_{j}, *Q*_{j}*q*_{j}, and *q*_{j}*q*_{j}, respectively. The mean residual error has a mean of zero and variance σ^{2}/*k*, where σ^{2} is the residual variance of the trait on the basis of a single individual. As *Q*^{[u]} may not be coincident with a marker and could be *Q*_{j}*Q*_{j}, *Q*_{j}*q*_{j}, or *q*_{j}*q*_{j}, the likelihood of the model is a mixture of 3^{*m*} normals. In parameter estimation, the general formulas by Kao and Zeng (1997) derived on the basis of the EM algorithm (Dempster *et al.* 1977) can be used for obtaining their MLE.
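The moments of the coded variables in an F_{u} population follow directly from the genotype frequencies under selfing. The sketch below assumes the additive and dominance coding (1, −1/2), (0, 1/2), (−1, −1/2) for the three genotypes and a heterozygote frequency of (1/2)^{u−1}; both are assumptions stated for this illustration, and the function names are hypothetical. Exact fractions are used so that the results can be compared with closed forms without rounding.

```python
from fractions import Fraction

# Assumed coding of (additive, dominance) variables for the three genotypes.
CODING = {
    "QQ": (1, Fraction(-1, 2)),
    "Qq": (0, Fraction(1, 2)),
    "qq": (-1, Fraction(-1, 2)),
}


def genotype_frequencies(u):
    """Genotype frequencies at one locus in F_u under repeated selfing."""
    het = Fraction(1, 2) ** (u - 1)
    hom = (1 - het) / 2
    return {"QQ": hom, "Qq": het, "qq": hom}


def variance(values_and_probs):
    """Variance of a discrete variable given (value, probability) pairs."""
    mean = sum(p * x for x, p in values_and_probs)
    return sum(p * (x - mean) ** 2 for x, p in values_and_probs)


def coded_variances(u):
    """Variances of the additive and dominance coded variables in F_u."""
    freqs = genotype_frequencies(u)
    var_x = variance([(CODING[g][0], p) for g, p in freqs.items()])
    var_z = variance([(CODING[g][1], p) for g, p in freqs.items()])
    return var_x, var_z


for u in (2, 3, 4, 10):
    var_x, var_z = coded_variances(u)
    print(f"F{u}: Var(additive) = {var_x}, Var(dominance) = {var_z}")
```

Under these assumptions the additive variance equals 1 − (1/2)^{u−1} and the dominance variance equals (1/2)^{u−1}[1 − (1/2)^{u−1}], so the former grows and the latter shrinks as selfing proceeds.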

To show the problems of less power and bias in estimation for the approximate method, without loss of generality, again assume that the quantitative trait value *y*_{i} is affected by two unlinked epistatic QTL, Q_{A} and Q_{B}. In the F_{u} population, the variances of the additive and dominance coded variables of Q_{A}, the additive and dominance coded variables of Q_{B}, and their additive × additive, additive × dominance, dominance × additive, and dominance × dominance products can be found to be 1 − (1/2)^{u−1}, (1/2)^{u−1}[1 − (1/2)^{u−1}], 1 − (1/2)^{u−1}, (1/2)^{u−1}[1 − (1/2)^{u−1}], [1 − (1/2)^{u−1}]^{2}, 1/4 − (1/2)^{u+1}, 1/4 − (1/2)^{u+1}, and 1/16 − [1/2 − (1/2)^{u−1}]^{4}, respectively. The covariances between these coded variables and the coded variables in Equation 6 are needed in the derivation. The covariances between the additive coded variable of Q_{A} and *x*_{1} and between it and *w*_{ad} are 1 − (1/2)^{u−1} and [1/2 − (1/2)^{u}][(1/2)^{v−2} − 1], respectively. The covariances between the dominance coded variable of Q_{A} and *z*_{1} and between it and *w*_{dd} are (1/2)^{v−1}[1 − (1/2)^{u−1}] and (1/2)^{v}[1 − (1/2)^{v−2}][(1/2)^{u−1} − 1], respectively. Therefore, when *y*_{i} is regressed on Q_{A} with additive and dominance effects, the estimates of the additive and dominance effects by the approximate method can be found to be (A2) and (A3) in the (F_{u}/F_{v}, *u* < *v*) design. It shows that *b*_{a} is confounded by the additive effect *a*_{1} and the epistatic effect *i*_{ad}, and *b*_{d} is confounded by *d*_{1} and *i*_{dd}. When multiple QTL and their epistasis are considered in the model, the estimates of their effects can also be derived. It can be found that the approximate methods also have the confounding problems.
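The way the variances and covariances above combine into the confounded estimates can be sketched as follows. This is a worked illustration under stated assumptions, not a reproduction of Equations A2 and A3: the fractions in the variance and covariance expressions are taken to be 1/2, and *b*_{a} and *b*_{d} are treated as simple regression coefficients (covariance divided by variance), which holds if the additive and dominance coded variables of Q_{A} in F_{u} are uncorrelated with each other and with the remaining model terms.

```latex
% Assumptions (not quoted from the source): elided fractions are 1/2;
% b_a and b_d are ratios of a covariance to a variance.
\begin{align*}
b_a &= a_1\,\frac{1-(1/2)^{u-1}}{1-(1/2)^{u-1}}
     + i_{ad}\,\frac{\bigl[\tfrac{1}{2}-(1/2)^{u}\bigr]\bigl[(1/2)^{v-2}-1\bigr]}{1-(1/2)^{u-1}}
     = a_1 - \frac{i_{ad}}{2}\Bigl[1-(1/2)^{v-2}\Bigr],\\[4pt]
b_d &= d_1\,\frac{(1/2)^{v-1}\bigl[1-(1/2)^{u-1}\bigr]}{(1/2)^{u-1}\bigl[1-(1/2)^{u-1}\bigr]}
     + i_{dd}\,\frac{(1/2)^{v}\bigl[1-(1/2)^{v-2}\bigr]\bigl[(1/2)^{u-1}-1\bigr]}{(1/2)^{u-1}\bigl[1-(1/2)^{u-1}\bigr]}\\
    &= (1/2)^{v-u}\Bigl\{d_1 - \frac{i_{dd}}{2}\Bigl[1-(1/2)^{v-2}\Bigr]\Bigr\}.
\end{align*}
```

As a check against the simulation setting, *a*_{1} = 2, *i*_{ad} = 2, and *v* = 3 give *b*_{a} = 2 − 1/2 = 1.5, which agrees with the predicted value *a*_{1} − *i*_{ad}/4 = 1.5 quoted for the approximate method in the F_{2}/F_{3} design.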

## Acknowledgments

The author thanks two anonymous reviewers for helpful comments. This study was supported by grant NSC94-2118-M-001-021 from the National Science Council, Taiwan, Republic of China.

## Footnotes

Communicating editor: A. D. Long

- Received January 27, 2006.
- Accepted July 6, 2006.

- Copyright © 2006 by the Genetics Society of America