# Quantitative Trait Loci Mapping and The Genetic Basis of Heterosis in Maize and Rice

Antonio Augusto Franco Garcia^{*}, Shengchu Wang^{†}, Albrecht E. Melchinger^{‡} and Zhao-Bang Zeng^{†,§,1}

^{*}Departamento de Genética, Escola Superior de Agricultura Luiz de Queiroz, Universidade de São Paulo CP 83, 13400-970, Piracicaba, SP, Brazil, ^{†}Department of Statistics and Bioinformatics Research Center and ^{§}Department of Genetics, North Carolina State University, Raleigh, North Carolina 27695-7566 and ^{‡}Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, 70599 Stuttgart, Germany

^{1}*Corresponding author:* Bioinformatics Research Center, Department of Statistics, North Carolina State University, Raleigh, NC 27695-7566. E-mail: zeng{at}stat.ncsu.edu

## Abstract

Despite its importance to agriculture, the genetic basis of heterosis is still not well understood. The main competing hypotheses include dominance, overdominance, and epistasis. NC design III is an experimental design that has been used for estimating the average degree of dominance of quantitative trait loci (QTL) and also for studying heterosis. In this study, we first develop a multiple-interval mapping (MIM) model for design III that provides a platform to estimate the number, genomic positions, augmented additive and dominance effects, and epistatic interactions of QTL. The model can be used for parents with any generation of selfing. We apply the method to two data sets, one for maize and one for rice. Our results show that heterosis in maize is mainly due to dominant gene action, although overdominance of individual QTL could not completely be ruled out due to the mapping resolution and limitations of NC design III. For rice, the estimated QTL dominant effects could not explain the observed heterosis. There is evidence that additive × additive epistatic effects of QTL could be the main cause for the heterosis in rice. The difference in the genetic basis of heterosis seems to be related to open or self pollination of the two species. The MIM model for NC design III is implemented in Windows QTL Cartographer, a freely distributed software.

HETEROSIS (or hybrid vigor) is a phenomenon in which an F_{1} hybrid has superior performance over its parents. It has been observed in many plant and animal species. The utilization of heterosis is responsible for the commercial success of plant breeding in many species and has led to the widespread use of hybrids in several crops and horticultural species. In maize, the most notable example, heterosis is the primary reason for the success of the commercial industry (Stuber *et al*. 1992). In China, hybrid rice varieties showed a ∼20% yield advantage over inbred varieties (Yuan 1992) and made a tremendous impact on rice production around the world.

Despite its importance, the genetic basis of heterosis has been debated for almost one century and is still not explained satisfactorily. The *dominance* hypothesis (Davenport 1908; Bruce 1910; Keeble and Pellew 1910; Jones 1917) suggests that the alleles from one parent are dominant over the alleles from the other parent, and due to the cancelation of deleterious effects at multiple loci, the F_{1} hybrid is superior to the parents. The *overdominance* hypothesis (East 1908; Shull 1908) assumes that the loci with heterozygous genotypes are superior to both homozygous parents. Epistasis is also frequently mentioned as a possible cause of heterosis.

NC design III, or design III (Comstock and Robinson 1948, 1952), is an experimental design for estimating genetic variances and the average degree of dominance for quantitative trait loci (QTL) and has been used to study heterosis. Random F_{2} individuals are taken from a population that originated by crossing two inbred lines. These individuals are backcrossed to both parental lines and a quantitative trait is measured in the progeny. An analysis of variance of the progenies gives estimates of the average degree of dominance, which can be used to infer the genetic basis of quantitative traits and study heterosis. Cockerham and Zeng (1996) extended the analysis of design III to include linkage, two-locus epistasis, and also the use of F_{3} parents. Considering that the F_{2} (or F_{3}) parents could be genotyped with molecular markers, they presented a statistical methodology based on four orthogonal contrasts for single-marker analysis of design III, allowing the study of the effects of QTL on both backcrosses simultaneously. Melchinger *et al*. (2007) studied the role of epistasis on the manifestation of heterosis in design III populations. They defined new types of heterotic genetic effects, the augmented additive and dominance effects of QTL, since the main effects also contain epistasis that could not be removed or estimated separately.

Stuber *et al*. (1992) used design III with marker loci to study the genetic basis of heterosis in maize. They conducted separate interval mapping analyses (Lander and Botstein 1989) in each backcross and concluded that overdominance (or pseudo-overdominance) is the major cause of heterosis. However, a combined analysis of both backcrosses showed that dominance is probably more likely to be a major cause of heterosis (Cockerham and Zeng 1996), although overdominance and epistasis were also present. In rice, design III using F_{7} parents was used by Xiao *et al*. (1995) and the data were analyzed in the same way as that of Stuber *et al*. (1992). They concluded that dominance is the major genetic cause of heterosis in this species. Later, Z.-B. Zeng (unpublished results) analyzed this data set using the method of Cockerham and Zeng and concluded that epistasis is more likely to be a major cause of heterosis in rice.

The statistical analysis proposed by Cockerham and Zeng has several advantages. It allows estimates of both additive and dominance effects and has two contrasts for testing the presence of epistasis. However, it is based on single-marker analysis and was not developed for QTL mapping. The method has several limitations: the contrasts are biased due to the recombination fraction between marker and QTL, it is not possible to separate the additive and dominance effects of several QTL linked to the same marker, the contrasts for epistasis detect only a small portion of the interactions between QTL that are linked to the same marker, and it has low statistical power.

In this article, we first extend the method of Cockerham and Zeng in the framework of multiple-interval mapping (MIM) (Kao and Zeng 1997; Kao *et al*. 1999), which provides a sound basis for QTL mapping. Our MIM model for design III combines information from multiple markers and takes epistatic effects into account. By analyzing both backcrosses simultaneously, it provides estimates of augmented additive and dominance effects. The model can be used for parents with any number of generations of selfing. Then, we apply the model to the data of Stuber *et al*. (1992) and Xiao *et al*. (1995) to study the genetic basis of yield heterosis in maize and rice.

## DESIGN III WITH MARKER LOCI

Before presenting the new model for design III, we first outline some important results for design III from Comstock and Robinson (1952) and Cockerham and Zeng (1996), adapting the notation when necessary. The genetic effects of QTL *Q*_{r} with genotypes *Q*_{r}*Q*_{r}, *Q*_{r}*q*_{r}, and *q*_{r}*q*_{r} are defined as *a*_{r} − *d*_{r}/2, *d*_{r}/2, and −*a*_{r} − *d*_{r}/2, respectively (using the F_{2} model, see Zeng *et al*. 2005), where *a*_{r} and *d*_{r} are additive and dominance effects. The two-way epistatic interactions between QTL *Q*_{r} and *Q*_{s} are denoted as *aa*_{rs} for additive × additive (*a*_{r} × *a*_{s}), *ad*_{rs} for additive × dominance (*a*_{r} × *d*_{s}), *da*_{rs} for dominance × additive (*d*_{r} × *a*_{s}), and *dd*_{rs} for dominance × dominance (*d*_{r} × *d*_{s}) interaction.

On the basis of an analysis of variance for progenies of F_{2} parents in the backcrosses in design III, Comstock and Robinson developed a theory for estimating genetic variances among F_{2} parents () and due to interactions of F_{2} and inbred parents (). They showed that, under the assumption of no epistasis for *m* independent loci, the genetic constitutions of these variances are and . Cockerham and Zeng expanded these ideas to include F_{3} parents, showing that in this case and . For F_{2} (and F_{3}) parents, the average degree of dominance for a quantitative trait can be inferred through the ratio . When two-locus epistasis is considered, the additive effects include *ad* and *da*, and the dominance effects include *aa*, regardless of linkage. The variances are also affected: contains *a* and *aa* + *dd*; contains *d* and *ad* + *da*. However, the coefficients of epistatic effects on the variances are usually small.
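The single-locus genotypic values defined at the start of this section can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function name `f2_genotypic_values` and the example values of *a* and *d* are hypothetical. It illustrates that, under the F_{2} model, the values are deviations from the F_{2} population mean and that *a* and *d* are recovered as half the difference between homozygotes and the heterozygote deviation from their midpoint.

```python
from fractions import Fraction

# F2 genotype frequencies at one locus after crossing two inbred lines.
F2_FREQ = {"QQ": Fraction(1, 4), "Qq": Fraction(1, 2), "qq": Fraction(1, 4)}


def f2_genotypic_values(a, d):
    """Genotypic values of QQ, Qq, qq under the F2 model: a - d/2, d/2, -a - d/2."""
    half = Fraction(1, 2)
    return {"QQ": a - half * d, "Qq": half * d, "qq": -a - half * d}


# Hypothetical effects chosen only for illustration.
values = f2_genotypic_values(a=Fraction(3), d=Fraction(2))

# Mean over the F2 population: the values are centered on it.
f2_mean = sum(F2_FREQ[g] * values[g] for g in values)

# Recover the effects from the genotypic values.
a_back = (values["QQ"] - values["qq"]) / 2
d_back = values["Qq"] - (values["QQ"] + values["qq"]) / 2

print(values, f2_mean, a_back, d_back)
```

Exact fractions are used so that the zero mean is not blurred by floating-point rounding.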

Considering that information from molecular markers could be available, Cockerham and Zeng presented a statistical method to analyze design III in the framework of single-marker analysis. For a single-marker locus *M* with genotypes *MM*, *Mm*, and *mm* for each parent (F_{2} or F_{3}), four orthogonal contrasts *C*_{k} (*k* = 1, … , 4) can be used for testing linear functions of effects of QTL. The four contrasts explore the 2 d.f. for differences among the means of marker genotypes (*C*_{1} and *C*_{3}) and the 2 d.f. for interaction of the marker genotypes with the inbred lines (*C*_{2} and *C*_{4}).

To obtain a MIM model for design III, we first extend the contrasts of Cockerham and Zeng still in the framework of marker analysis (not interval mapping), but considering simultaneously two marker loci (*M*_{1} and *M*_{2}) observed for F_{2} parents and two QTL (*Q*_{1} and *Q*_{2}). Then, we generalize the results for any number of QTL in any genomic position and develop a MIM model for design III.

Assume that the loci are linked with the order *Q*_{1}*M*_{1}*M*_{2}*Q*_{2}. We denote ρ_{1}, ρ, ρ_{2}, and ρ_{12} as recombination fractions for the intervals between *Q*_{1} and *M*_{1}, *M*_{1} and *M*_{2}, *M*_{2} and *Q*_{2}, and *Q*_{1} and *Q*_{2}, respectively. We calculated the relative frequencies of QTL genotypes given the marker genotype in the F_{2} parent for two loci (Table 1) and then derived the genotypic means of the progenies in both backcrosses (appendix a). These means were denoted as , where *j* is the inbred line (*j* = 2, 1) and *g* is the genotype of the two markers in the F_{2} parent.
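The conditional frequencies of QTL genotypes given marker genotypes drive everything that follows. As a simplified illustration only, the sketch below covers one marker and one QTL in an F_{2} parent rather than the two-marker, two-QTL case of Table 1. The function name `qtl_given_marker_f2` is hypothetical, and the probabilities are the standard F_{2} results for a recombination fraction ρ between the two loci.

```python
def qtl_given_marker_f2(rho):
    """P(QTL genotype | marker genotype) for one marker-QTL pair in an F2.

    rho is the recombination fraction between marker and QTL (0 <= rho <= 0.5).
    """
    nr = 1.0 - rho  # probability that a gamete is non-recombinant
    return {
        "MM": {"QQ": nr ** 2, "Qq": 2 * rho * nr, "qq": rho ** 2},
        "Mm": {"QQ": rho * nr, "Qq": nr ** 2 + rho ** 2, "qq": rho * nr},
        "mm": {"QQ": rho ** 2, "Qq": 2 * rho * nr, "qq": nr ** 2},
    }


# Tight linkage: the marker is a good proxy for the QTL.
linked = qtl_given_marker_f2(0.05)
# No linkage: every marker class shows the plain F2 frequencies 1/4, 1/2, 1/4.
unlinked = qtl_given_marker_f2(0.5)

for marker_genotype, row in linked.items():
    print(marker_genotype, {g: round(p, 4) for g, p in row.items()})
```

With ρ = 0 the marker genotype determines the QTL genotype exactly, which is the situation MIM approximates by placing putative QTL at estimated positions rather than at markers.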

It is possible to define 17 orthogonal contrasts for testing differences among means (appendix b). These contrasts correspond to an orthogonal decomposition of the degrees of freedom available when two loci and two backcrosses are considered. There are 2 d.f. for differences for marker genotypes of *M*_{1}, 2 for marker genotypes of *M*_{2}, 4 for the interaction *M*_{1} × *M*_{2}, 2 for the interaction of marker *M*_{1} with the inbred lines, 2 for the interaction of *M*_{2} with the inbred lines, 4 for the interaction *M*_{1} × *M*_{2} with inbred lines, and 1 for the difference between inbred lines. Using the genotypic means of the progenies and following the definitions of genetic effects based on the F_{2} genetic model according to Cockerham and Zeng (1996; Zeng *et al*. 2005), we derived the genetic expectation of these 17 contrasts (appendix b).
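The count of 17 contrasts follows from ordinary factorial bookkeeping: two markers with three genotypes each and two inbred lines give 18 means, hence 17 d.f. The short sketch below reproduces the partition listed above; the variable names are illustrative and not taken from the source.

```python
levels_m1 = 3     # marker genotypes M1M1, M1m1, m1m1
levels_m2 = 3     # marker genotypes M2M2, M2m2, m2m2
levels_lines = 2  # backcross to each of the two inbred lines

n_means = levels_m1 * levels_m2 * levels_lines

# Degrees of freedom for each main effect and interaction of the factorial layout.
df_parts = {
    "M1": levels_m1 - 1,
    "M2": levels_m2 - 1,
    "M1 x M2": (levels_m1 - 1) * (levels_m2 - 1),
    "M1 x lines": (levels_m1 - 1) * (levels_lines - 1),
    "M2 x lines": (levels_m2 - 1) * (levels_lines - 1),
    "M1 x M2 x lines": (levels_m1 - 1) * (levels_m2 - 1) * (levels_lines - 1),
    "lines": levels_lines - 1,
}
df_total = sum(df_parts.values())

print(n_means, df_total, df_parts)
```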

There are seven QTL genotypes present in a population that originated from design III when two QTL are considered. It is important to note that some QTL genotypes do not occur in the backcross populations. For example, marker genotypes in the F_{2} parents include *M*_{1}*m*_{2}/*M*_{1}*m*_{2}, but there is no QTL genotype *Q*_{1}*q*_{2}/*Q*_{1}*q*_{2} in the backcross populations. Also not present is *q*_{1}*Q*_{2}/*q*_{1}*Q*_{2}. Hence, for a pair of QTL, it is possible to define only six contrasts for the differences between genotypes, even though there are eight parameters to be estimated (*a*_{1}, *a*_{2}, *d*_{1}, *d*_{2}, *aa*, *ad*, *da*, and *dd*). As a consequence, it is not possible to estimate all genetic parameters separately. Also, some of the 17 contrasts do not provide useful information for the genetic effects, because the genetic expectations are based on the segregating QTL in the backcross populations, not on the F_{2} marker genotypes. For example, contrasts *c*_{6}, *c*_{7}, *c*_{15}, and *c*_{16} have genetic expectations equal to zero. Contrasts *c*_{2} and *c*_{4} have the same expectation, which is − of *c*_{8}. The same happens to *c*_{11}, *c*_{13}, and *c*_{17}.
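The statement that only seven of the nine two-locus QTL genotypes occur in the backcross progenies can be verified by enumeration. The sketch below assumes, for illustration, that one inbred line is *Q*_{1}*Q*_{2}/*Q*_{1}*Q*_{2} and the other *q*_{1}*q*_{2}/*q*_{1}*q*_{2}; the labels `P1` and `P2` and the helper names are hypothetical. Each progeny receives one gamete from the segregating parent (any of the four possible gametes) and one from the inbred line.

```python
from itertools import product

# Gametes a segregating (F2 or later) parent can transmit: (allele at locus 1, allele at locus 2).
PARENT_GAMETES = list(product("Qq", repeat=2))
# Assumed inbred lines: P1 carries Q at both loci, P2 carries q at both loci.
LINE_GAMETES = {"P1": ("Q", "Q"), "P2": ("q", "q")}


def locus_genotype(allele_from_parent, allele_from_line):
    """Unordered single-locus genotype, written with the upper-case allele first."""
    return "".join(sorted(allele_from_parent + allele_from_line))


def backcross_genotypes():
    """All two-locus genotypes that can appear in either backcross."""
    found = set()
    for line_gamete in LINE_GAMETES.values():
        for parent_gamete in PARENT_GAMETES:
            found.add(tuple(locus_genotype(p, l)
                            for p, l in zip(parent_gamete, line_gamete)))
    return found


all_two_locus = set(product(["QQ", "Qq", "qq"], repeat=2))
present = backcross_genotypes()
absent = all_two_locus - present

print(len(present), sorted(absent))
```

The two absent classes are the ones homozygous for opposite parental alleles at the two loci, which is why some combinations of genetic effects cannot be separated in design III.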

Taking these into account, a new set of six orthogonal contrasts that provide useful information about the genetic parameters was defined (Table 2). Let , , , , , and . The genetic expectations of these new contrasts are

Contrasts – are for additive and dominance effects and came directly from contrasts *c*_{1}, *c*_{10}, *c*_{3}, and *c*_{12}, respectively. They can be viewed as contrasts between marginal means of genotypic classes. Because we do not have all QTL genotypes, it is not possible in this case to define contrasts to test only the main effects (additive and dominance) without some bias due to epistatic effects. However, by considering contrasts for two QTL simultaneously, it is possible to test additive and dominance effects (plus epistatic effects) even if the two QTL are linked.

For epistasis, it is also not possible to separate *aa* from *dd* and *ad* from *da*. To test *aa* + *dd*, the contrast *c*_{5}/2 could be used. It is important to note that *c*_{5} does not use the means from genotypes that are heterozygous for at least one marker locus. Thus, by using *c*_{5}/2, means and will not be used in the analysis. Also, contrasts *c*_{2}, *c*_{4}, and *c*_{8}, which could be used for estimating *aa* + *dd*, have the expectation zero if the markers are unlinked (ρ = 1/2), which is an obvious disadvantage. Therefore, we suggest using a linear combination of contrasts (defined as ) that uses all means. Note that if ρ = , . The same argument applies to , designed to test *ad* + *da*. Using to denote the coefficients of contrasts in Table 2, the *k*th contrast is . The six new contrasts are orthogonal because for any pair and (*k* ≠ *k*′).

The bias in the expectations of contrasts due to ρ_{1} and ρ_{2} can be removed by using multiple-interval mapping (next section). In MIM, we search for and estimate the positions of QTL. Thus it is possible to test contrasts between putative QTL, not markers. This means that potentially ρ_{1} = 0 and ρ_{2} = 0; thus , , , and . For epistasis, and . For unlinked QTL with ρ = 1/2, and . This shows that, given a correct identification of the QTL model, the statistical analysis in the framework of MIM can minimize the bias in estimation and increase statistical power. Also, it is possible to test epistasis between any two QTL, not just QTL that are linked to a marker as in the approach of Cockerham and Zeng (1996).

In a study of the role of epistasis in the manifestation of heterosis, Melchinger *et al*. (2007) defined as an augmented additive effect of QTL *r* and as an augmented dominance effect. These augmented effects are exactly the ones contained in contrasts –, if we generalize the expressions to multiple QTL. Therefore, in a statistical analysis by MIM, we estimate and test and as well as epistasis effects.

## MIM MODEL FOR DESIGN III

The six new contrasts for two markers (Table 2) were used for the development of a MIM model for design III. Multiple-interval mapping (Kao and Zeng 1997; Kao *et al*. 1999; Zeng *et al*. 1999) is a procedure for mapping multiple QTL simultaneously with a model fitted with main and epistatic effects of multiple QTL. Combined with a search procedure, it tests and estimates the positions, effects, and interactions of multiple QTL.

#### Statistical model:

The MIM model for design III is defined by generalizing the six contrasts for any number of putative QTL and level of inbreeding of the parents,

(1)

where *y*_{ij} is the phenotypic mean of the progenies of parent *i* (*i* = 1, … , *n*) on the backcross with inbred line *j* (*j* = 1, 2). The parameters are the mean of backcross *j* (μ_{j}), the regression coefficients for augmented additive effect (*a**) and dominance (*d**) effect of QTL *r* (α_{r} and β_{r}, respectively), and the regression coefficients for epistatic interactions *aa* + *dd* and *ad* + *da* between QTL *r* and *s* (γ_{rs} and δ_{rs}, respectively). The residuals ε_{ij} are assumed to be *N*(0, ). The variables , , , and denote QTL genotypes corresponding to the main and epistatic effects specified by the six contrasts. They were coded as

The first two summations are over the *m* QTL currently fitted in the model, and the last ones are for significant *t*_{1} and *t*_{2} two-way epistatic interactions. The coefficients for the coded variables can be seen as a generalization of the orthogonal contrasts developed for two markers with some adaptations.

For design III from recombinant inbred lines (after continued selfing from F_{2} for a number of generations), the model can be further simplified. As a consequence of selfing, we note in Table 3 that the proportion of genotypes that are heterozygous for at least one locus becomes smaller in relation to the others. So, if the parents used in design III have several generations of selfing, the contrasts and the MIM model should be adapted to this situation. Details are presented in appendix c.

#### Likelihood and parameter estimation:

As pointed out by Kao *et al*. (1999), MIM models contain missing data, since the QTL genotypes are not observed. Therefore, the likelihood function for the model, assuming that the *y*_{ij}'s are independent across observations and backcrosses, is

where **Y**_{j} is a vector of phenotypic data for backcross *j*, **X** is a matrix with molecular data, *g* indicates the 3^{m} multiple-QTL genotypes, *p*_{ig} is the probability of each multilocus genotype conditional on marker data, ϕ(.) is a standard normal probability density function, **E** is a column vector with QTL parameters (α's, β's, γ's, and δ's), and **D**_{jg} is a row vector that specifies the configuration of *x**'s, *z**'s, *w**'s, and *o**'s associated with the parameters in **E** in each backcross (following the notation of Kao and Zeng 1997).

To obtain the maximum-likelihood estimates (MLEs), we adapted the general formulas of Kao and Zeng (1997) to the MIM model for design III, on the basis of the expectation-maximization (EM) algorithm (Dempster *et al*. 1977). The E and M steps are iterated until some convergence criteria are met and the converged values are the MLEs. Details are presented in appendix d.
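The general shape of such an EM iteration can be sketched for a generic normal mixture with known mixing probabilities. This is not the set of formulas in appendix d or in Kao and Zeng (1997): the function `em_mixture_regression`, its arguments, the single-backcross setting, the weighted least-squares M step, and the simulated data are all assumptions made for illustration. The E step computes posterior genotype probabilities from the marker-based priors and the normal density; the M step refits the mean, effects, and residual variance using those posteriors as weights.

```python
import numpy as np


def em_mixture_regression(y, prior, design, n_iter=200, tol=1e-8):
    """EM for y_i ~ sum_g prior[i, g] * N(mu + design[g] @ effects, sigma2).

    y      : (n,) phenotypes
    prior  : (n, G) genotype probabilities given markers (rows sum to 1)
    design : (G, k) coded variables for each of the G unobserved genotypes
    Returns (coefficients with intercept first, sigma2, log-likelihood at the last E step).
    """
    n, n_geno = prior.shape
    X = np.column_stack([np.ones(n_geno), design])  # intercept + coded variables
    beta = np.zeros(X.shape[1])
    beta[0] = y.mean()
    sigma2 = y.var()
    loglik_old = -np.inf
    for _ in range(n_iter):
        # E step: posterior probability of each genotype for each individual.
        resid = y[:, None] - (X @ beta)[None, :]
        dens = np.exp(-0.5 * resid ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)
        joint = prior * dens
        marginal = joint.sum(axis=1)
        loglik = np.log(marginal).sum()
        post = joint / marginal[:, None]
        # M step: weighted least squares with the posteriors as weights.
        weight_per_genotype = post.sum(axis=0)
        xtwx = X.T @ (weight_per_genotype[:, None] * X)
        xtwy = X.T @ (post.T @ y)
        beta = np.linalg.solve(xtwx, xtwy)
        resid = y[:, None] - (X @ beta)[None, :]
        sigma2 = (post * resid ** 2).sum() / n
        if loglik - loglik_old < tol:
            break
        loglik_old = loglik
    return beta, sigma2, loglik


# Simulated example: two genotype classes coded +1/2 and -1/2, effect 2.0, mean 10.0.
rng = np.random.default_rng(0)
n = 1000
marker = rng.integers(0, 2, size=n)            # observed marker class
recombinant = rng.random(n) < 0.1              # marker and QTL disagree 10% of the time
genotype = np.where(recombinant, 1 - marker, marker)
prior = np.where(np.arange(2)[None, :] == marker[:, None], 0.9, 0.1)
design = np.array([[0.5], [-0.5]])
y = 10.0 + 2.0 * design[genotype, 0] + rng.normal(0.0, 1.0, size=n)

beta_hat, sigma2_hat, loglik_hat = em_mixture_regression(y, prior, design)
print(beta_hat, sigma2_hat, loglik_hat)
```

In the design III setting the same idea would apply with 3^{m} genotype classes, two backcross means, and the coded variables of model (1) as the columns of `design`.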

After the final model is selected, it is necessary to convert the estimates of the regression coefficients to the contrasts, which contain the desired genetic effects. This can be easily done on the basis of the genotypic expectations of the coefficients. For any type of selfing parents (F_{2} to F_{∞}), for estimating augmented additive and dominance effects we simply multiply and by 2, since and . For epistasis between unlinked QTL, for F_{2} (or F_{3}, etc.) parents and . For homozygous parents (F_{∞}), the expectations are and .

Melchinger *et al*. (2007) pointed out that and are the net contributions of QTL *r* to parental difference and midparent heterosis, respectively, considering simultaneously main effects and epistatic interactions with the genetic background. Therefore, by providing estimates of , , and epistasis, the MIM model for design III can be very useful for studying the genetic basis of heterosis.

#### Strategy for QTL mapping:

The usual procedures for model selection in MIM can be used here and were discussed in detail by Kao *et al*. (1999) and Zeng *et al*. (1999). Briefly, forward, backward, and stepwise procedures can be applied, combined with selection criteria such as the Akaike information criterion (AIC) (Akaike 1974), the Bayesian information criterion (BIC) (Schwarz 1978), or the likelihood-ratio test. In stepwise selection, for a model with *m* QTL, the genome is scanned to find the best position of an (*m* + 1)th QTL. Then, all the QTL in the model are tested, one by one, to check if one of them should be removed. The process is repeated until no QTL is added or removed, and then the positions are refined. After finding the final model for main effects, the procedure can be repeated to identify significant epistatic effects.
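The selection criteria named above can be written down compactly. The sketch below uses the textbook definitions of AIC and BIC and the usual base-10 LOD score; the penalty functions actually used in a given MIM implementation may differ, and the log-likelihood values in the example are hypothetical. It shows why BIC is the more conservative criterion for moderately large samples: each extra parameter costs ln(*n*) rather than 2.

```python
import math


def aic(loglik, n_params):
    """Akaike information criterion (smaller is better)."""
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik, n_params, n_obs):
    """Bayesian (Schwarz) information criterion (smaller is better)."""
    return -2.0 * loglik + n_params * math.log(n_obs)


def lod(loglik_full, loglik_reduced):
    """LOD score: base-10 logarithm of the likelihood ratio."""
    return (loglik_full - loglik_reduced) / math.log(10.0)


# Hypothetical comparison: adding one QTL (two parameters, alpha and beta)
# raises the natural log-likelihood by 4 units with 264 parents.
n_obs = 264
loglik_reduced, k_reduced = -500.0, 10
loglik_full, k_full = -496.0, 12

delta_aic = aic(loglik_full, k_full) - aic(loglik_reduced, k_reduced)
delta_bic = bic(loglik_full, k_full, n_obs) - bic(loglik_reduced, k_reduced, n_obs)

print(round(delta_aic, 2), round(delta_bic, 2), round(lod(loglik_full, loglik_reduced), 2))
```

In this made-up case AIC favors the larger model while BIC does not, which is the kind of disagreement that motivates choosing the criterion according to how conservative the search should be.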

## ANALYSIS OF A MAIZE DATA SET

#### Experiment description:

We applied our model to the maize data of Stuber *et al*. (1992), where detailed information about the experiment can be found. Briefly, starting from two inbred lines, *Mo17* (*L*_{1}) and *B73* (*L*_{2}), 264 F_{3} lines were created and backcrossed to the two inbred lines. The backcross progenies of each of the F_{3} parents were allocated to 22 sets of 12 parents and then evaluated in six locations or environments without further replication. Seven traits were measured on the backcross progenies, but we used just the adjusted means across locations for grain yield, calculated using the type III analysis of variance in the SAS general linear models procedure. Only 11 observations were missing. The F_{3} parents were genotyped with RFLP and isozyme markers and a genetic map was built using the Kosambi map function to express distances in centimorgans. We used the same 73 markers analyzed by Cockerham and Zeng, obtaining multipoint estimates with MAPMAKER/EXP (Lander *et al*. 1987) for the distances not presented in their article.

#### Statistical analysis:

##### Interval mapping for design III:

First, we applied interval mapping (IM) for design III to the maize data. This corresponds to model (1) with only one QTL fitted in the model. This was done to (1) allow comparisons with the results of Stuber *et al*. (using IM for each backcross separately) and Cockerham and Zeng (using four contrasts for single-marker analysis of both backcrosses simultaneously) and (2) help in the selection of the final MIM model.

##### MIM for design III:

To select the number and map positions of putative QTL to be included in an initial model, a forward procedure was used on the basis of the ideas of Kao *et al*. (1999). Starting with a model with no QTL, a model with one QTL that resulted in the greatest increase in the likelihood was selected. The procedure was repeated for adding a second QTL and so on until no further QTL could be added, giving a model of, say, *m* QTL. The models with *m* − 1 and *m* QTL were compared on the basis of BIC (Schwarz 1978). We also tried to add QTL at positions suggested by IM for design III, keeping them in the model if the effects were significant. Whenever the number of QTL in a model was changed, the estimates of QTL positions were optimized. After a model with main effects and refined positions was established, a forward/backward procedure was applied to identify two-way epistasis between QTL. Every possible epistatic effect was tested and the one with the highest likelihood was selected. The procedure was repeated until no more effects could be added. We note that in using BIC few epistatic effects remain in the model. Since we are interested in estimating epistatic effects on heterosis, a less conservative criterion, AIC (Akaike 1974), was adopted. After epistatic effects were selected, all main and epistatic effects were tested for significance and the nonsignificant effects were removed. If the main effects of a QTL were not significant but it had some significant epistasis with at least one other QTL, it was kept in the model.

#### Results:

##### IM for design III:

The results for QTL mapping for grain yield are presented in Figure 1, A and B. In general, they are in close agreement with the previous analysis of Stuber *et al*. and Cockerham and Zeng, but provide more information and statistical power. Stuber *et al*. did the analysis on each backcross separately. A QTL was mapped if it had a significant effect in at least one backcross. We note that using IM for design III there are LOD peaks approximately in the same genomic regions previously identified, but the shape of the new curves is similar to the sum of the previous ones, with higher LOD scores. This is an indication of higher statistical power and results in more identifiable peaks in some regions, such as chromosomes 1 and 10. On the backcrosses to *B73* and *Mo17*, Stuber *et al*. found six and eight QTL, respectively, with LOD scores varying from 2.73 to 9.73. We also found evidence for QTL in the same regions, but with LOD scores between ∼10 and 35. On chromosomes 8 and 10, the QTL that were barely detectable by the analysis on each backcross separately now have LOD scores ∼10.

The separate analysis of each backcross can make the number of QTL difficult to interpret. This can be alleviated by the new analysis. For example, on chromosome 10, IM for design III (DIII) shows a profile indicating that there is evidence for only one QTL in the middle of the chromosome, instead of the two indicated before. However, IM for DIII still has some problems. For example, using an arbitrary LOD threshold of 3, it is difficult to precisely indicate how many QTL are on chromosomes 1, 2, 4, 5, 8, 9, and 10.

As pointed out by Cockerham and Zeng, by analyzing the backcrosses separately and estimating the genetic effects in terms of differences between heterozygous and homozygous, Stuber *et al*. actually estimated *d** + *a** for the backcross to *Mo17* and *d** − *a** for the backcross to *B73* (*d* + *a* and *d* − *a* in their notation). As a consequence, if *a** and *d** have the same magnitude, the QTL will not be identified in one backcross and its effect will be aggregated in the other. This seems to be the case for the QTL on chromosomes 3 and 4, where only one LOD curve is above the threshold. With IM for DIII, *a** and *d** can be estimated separately.
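The relation between the per-backcross effects and the augmented effects is a two-equation linear system, which the following sketch makes explicit. The function names and the numerical values are hypothetical; only the relations *d** + *a** (backcross to *Mo17*) and *d** − *a** (backcross to *B73*) come from the text.

```python
def backcross_effects(a_star, d_star):
    """Heterozygote-minus-homozygote effect seen in each separate backcross."""
    return {"Mo17": d_star + a_star, "B73": d_star - a_star}


def recover_effects(effect_mo17, effect_b73):
    """Solve d* + a* = effect_mo17 and d* - a* = effect_b73 for (a*, d*)."""
    d_star = (effect_mo17 + effect_b73) / 2.0
    a_star = (effect_mo17 - effect_b73) / 2.0
    return a_star, d_star


# A QTL with a* = d* is invisible in the backcross to B73 ...
same_sign = backcross_effects(a_star=3.0, d_star=3.0)
# ... and one with a* = -d* is invisible in the backcross to Mo17.
opposite_sign = backcross_effects(a_star=-3.0, d_star=3.0)

print(same_sign, opposite_sign)
print(recover_effects(same_sign["Mo17"], same_sign["B73"]))
```

Analyzing both backcrosses jointly amounts to using both equations at once, so neither effect is absorbed into the other.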

The Cockerham and Zeng approach does not provide LOD curves or an indication about QTL number, but their *P-*values can be used to identify genomic regions with evidence of QTL. Their method is based on the analysis of both backcrosses simultaneously and also allows the estimation of *a** and *d** associated with markers. Marker analysis for all chromosomes has significant effects for at least one of the four contrasts. In general, there is correspondence between small *P-*values and LOD peaks for IM for design III, especially for *d** effects, which are the most significant ones. It is noted that *d** is positive in almost every position (with exceptions at the beginning of chromosomes 3 and 9) and is consistently larger in magnitude than *a**, whose sign varies from region to region. Few *a** effects were significant, mostly on chromosomes 3 and 4.

##### MIM for design III:

We use this analysis to provide detailed estimates and to base some interpretation on them (Figure 1, A and B; Tables 4–6). Compared to other methods, this analysis tends to provide better estimates of QTL number, positions, effects, and epistasis. Thirteen putative QTL were mapped on nine chromosomes with LOD score >5 (except for the closely linked QTL X and XI). All QTL together explain 74.90 and 78.23% of the phenotypic variation in backcrosses to *Mo17* and *B73*, respectively. These values are higher than the ones found by Stuber *et al*. (59.1 and 60.9%). The main effects of each QTL individually explained from 0.61 to 12.34% of the phenotypic variation.

The estimates of *a** are both positive and negative. However, the values of *d** are consistently positive and are generally higher than those of *a**. When *a** is positive, the favorable allele comes from *B73*, and when negative, it comes from *Mo17*. The magnitude of the effects varies from −5.48 to 6.28 for *a** and from 0.36 to 9.18 for *d**. These are generally consistent with Stuber *et al*.'s results. For example, they had estimates of *d** + *a** for QTL IV and VI with values 11.57 and 10.55, respectively. In our results, these estimates are 10.81 and 8.67. For *d** − *a** for QTL II, they found 8.72; the MIM value is 9.02.

The QTL found on chromosomes 1, 2, 3, 7, and 9 are the same ones suggested by Stuber *et al*. The two QTL previously indicated on chromosome 10 are now estimated as a single one. We tried to fit a model with another QTL on this chromosome. There is not enough statistical evidence to support this model. For chromosomes 4, 5, and 8, there is evidence for three additional QTL: one near the beginning of chromosome 4, one at the end of chromosome 5, and one near the beginning of chromosome 8. The presence of QTL at the beginning of chromosome 4 was suggested by IM for design III and with more support from MIM. QTL VII on chromosome 5 has the largest LOD score (23.36) and explains 8.76 and 12.34% of the phenotypic variances in two backcrosses. This indicates the importance of this region and is in agreement with Stuber *et al*.'s results.

On chromosome 8 the two mapped QTL have *a** with opposite signs (repulsion linkage), making their identification difficult by using single-QTL models. QTL X and XI were barely detectable as a single one by Stuber *et al*. with LOD score 2.73. Cockerham and Zeng found *P*-values of 0.01 in this region only for the contrast for *d**. The two QTL also have smaller LOD scores using MIM for design III (2.48 and 0.89, respectively). However, they were retained in the model, since they were detected to have significant epistatic interaction with other QTL (Table 4).

For epistasis, the final selected model has 14 effects of *aa* + *dd* and 8 effects of *ad* + *da*. Their LOD scores vary from 0.51 to 2.66, generally smaller than the ones for the main effects. Also, they explained individually only a small fraction of the phenotypic variance (the highest was only 3.47% for *ad* + *da* between QTL IX and XI in the backcross to *B73*). Because in design III it is impossible to estimate individual epistatic effects separately, the magnitude of the effects is generally higher than that for *a** and *d** separately, varying from −16.49 to 12.91.

A summary of the final results for the selected model is presented in Table 6. The means of the progenies for the backcross to *Mo17* and *B73* are 86.25 and 90.78 from Cockerham and Zeng, close to the model means 85.52 and 90.59 in Table 6. On the basis of the orthogonal principle for the genetic model used for this study, the difference between the means is an estimate of the sum of additive effects of all potential QTL (Wang and Zeng 2006). For the 13 QTL, , which is somewhat close to the observed mean difference (4.53). From the estimates of genetic variance partition in the model, 21.02% is due to α, 59.71% to β, and 19.27% to epistasis (γ and δ).
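The figures quoted in this summary can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the paragraph above; the dictionary names are illustrative.

```python
# Backcross progeny means quoted in the text.
observed_means = {"Mo17": 86.25, "B73": 90.78}  # from Cockerham and Zeng
model_means = {"Mo17": 85.52, "B73": 90.59}     # from the fitted MIM model (Table 6)

# Partition of the genetic variance in the fitted model, in percent.
variance_share = {"alpha": 21.02, "beta": 59.71, "epistasis": 19.27}

observed_diff = round(observed_means["B73"] - observed_means["Mo17"], 2)
model_diff = round(model_means["B73"] - model_means["Mo17"], 2)
total_share = round(sum(variance_share.values()), 2)

print(observed_diff, model_diff, total_share)
```

The observed difference reproduces the 4.53 quoted above, and the three variance components account for the whole of the genetic variance in the model.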

#### Discussion:

Since MIM for design III tends to provide more appropriate results as compared to other methods, the following discussion is based on this analysis. The signs of *a** effects vary from QTL to QTL, with seven positive (the plus allele from *B73*) and six negative (the plus allele from *Mo17*). The lines *B73* and *Mo17* are elite inbred lines for grain yield and produce a superior hybrid when crossed. These lines, or lines and cultivars derived from them, are widely used for commercial purposes (Stuber *et al*. 1992). We found favorable alleles evenly distributed between the inbred lines. Since the difference is positive, one would also expect *B73* to have some advantage in terms of *a** effects, and our results corroborate this hypothesis, since .

All mapped QTL have *d** with positive sign, meaning that the heterozygous genotype is always superior in the direction of the favorable allele, wherever it is. This is in line with the hypothesis of dominance of favorable alleles as the cause of heterosis in maize. The magnitude of *d** is more than 2.5 times that of *a** for six QTL (III, VII, IX, X, XII, and XIII). Normally this would be interpreted as evidence of overdominance for these QTL (or some of them). For QTL VII on chromosome 5, further studies based on near isogenic lines dissected this QTL into at least two smaller ones, linked in repulsion to each other and with dominant gene action (Graham *et al*. 1997). Pseudo-overdominance, described first by Jones (1917) as a possible cause of heterosis, is usually difficult to identify. Graham *et al*.'s result indicates that QTL VII, which has the highest ratio *d**/|*a**|, might be due to pseudo-overdominance rather than overdominance. Without further study it is difficult to know whether this might also be the case for QTL III, IX, X, XII, and XIII, although there is some weak indication for it, as the estimates associated with *a** change in sign around those QTL regions in the analysis of Cockerham and Zeng and in IM for design III. On the basis of a further study on F_{7} parents from the same initial cross, LeDeaux *et al*. (2006) concluded that the genes act predominantly in a dominant manner (not overdominant). Further experiments with larger sample sizes may be required to check if some of those QTL have real overdominance.

Comstock and Robinson (1952) showed that, without epistasis, the average degree of dominance is a weighted average for *d* effects over *r* loci with weights . From MIM, the estimate of the augmented average degree of dominance is . This value could be interpreted as evidence for overdominance. However, Melchinger *et al*. (2007) discussed in detail that is not suitable to provide an accurate estimate of , because it is based on a ratio of quadratic forms due to *d** () and *a** () effects, being strongly affected by epistasis and the linkage disequilibrium between QTL. In our results, QTL pairs I–II, VII–VIII, and X–XI have *a** effects linked in repulsion, while for pair V–VI they are in coupling. In this situation, the contributions of linked QTL are likely to cancel in . In contrast, is clearly overestimated since all *d** effects are positive. As a consequence, is possibly overestimated.

It can be shown that the midparent heterosis *h* (considered only up to digenic epistasis) is . Therefore, only negative *aa* epistasis increases *h* in addition to dominance effects. Unfortunately, in design III it is impossible to estimate *aa* effects separately from *dd*. Because we are estimating sums of *aa* + *dd*, if they have the same magnitude and opposite signs, the effects will cancel out and epistasis will not be detectable. With opposite signs, the effect can be detected only if one of them is much larger than the other. On the other hand, if they have the same sign, the effects will add up and the interaction can be more easily detected. So, if *aa* is important for heterosis and most of its effects are negative, one would expect the signs of *aa* + *dd* estimates to be predominantly negative, because when *dd* is positive the effects tend to cancel out and would be more difficult to be detected. From the results, this does not seem to be the case, because there are seven positive and seven negative estimates of *aa* + *dd*. By these arguments, *aa* epistasis could be present, but is unlikely to contribute to the observed heterosis significantly in maize. Stuber *et al*. did not find evidence for epistasis, although they used an analysis with low statistical power. Cockerham and Zeng found some evidence for the presence of epistasis in their analysis. Their second and fourth contrasts estimate only a small fraction of linked *aa* + *dd* and *ad* + *da* epistasis. We found linked QTL on chromosomes 1, 4, 5, and 8, and for them the signs of the contrast for *aa* + *dd* were both positive and negative. Therefore, unless most of the negative *aa* effects were canceled out by positive *dd* and not detected (which seems to be unlikely), epistasis is unlikely to be an important explanation for the heterosis in maize.
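The informal sign argument above can be made concrete with an exact two-sided sign test. The sketch below is not part of the original analysis. It assumes, purely for illustration, that the epistatic estimates are independent and that positive and negative signs are equally likely under the null hypothesis; the function name `sign_test_p` is hypothetical.

```python
from math import comb


def sign_test_p(n_negative: int, n_positive: int) -> float:
    """Exact two-sided sign test p-value under H0: P(negative) = 0.5."""
    n = n_negative + n_positive
    k = min(n_negative, n_positive)
    # One-tail probability of a split at least as lopsided as the observed one.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the tail for a two-sided test, capped at 1.
    return min(1.0, 2.0 * tail)


# Counts reported in the text for maize: seven negative and seven
# positive estimates of aa + dd.
p_maize = sign_test_p(7, 7)
print(f"maize aa + dd sign test: p = {p_maize:.3f}")
```

An even split gives no indication of an excess of negative signs, which agrees with the interpretation in the text. With so few estimates the test has little power in either direction, so it should be read as a formal restatement of the argument rather than as additional evidence.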

From the expression of midparent heterosis, the importance of having reliable estimates of *d** becomes evident. The augmented dominance effect *d** measures the net contribution of heterotic QTL to the midparent heterosis. On the basis of the results of QTL mapping, we have bushels/acre [3.92 tons/hectare (t/ha)]. Unfortunately, the inbred lines were not evaluated in the experiments used for the current analysis and so direct heterosis estimates for this data set are not available. James Holland (personal communication) provided some information about heterosis magnitude on the cross *Mo17* × *B73*. On the basis of means over evaluations in two locations near Lafayette, Indiana, in 2003, t/ha. The plant density used was 50,000 plants/ha, while Stuber *et al*. used from 36,000 to 50,000 plants/ha. Moreover, the growing conditions in Indiana are not necessarily similar to the ones used in Stuber *et al*.'s study, and some genotype × environment interaction might be expected. In any case, the estimate of heterosis based on MIM results seems to be comparable to the data provided by James Holland.

## ANALYSIS OF A RICE DATA SET

#### Experiment description and statistical analysis:

The rice data set was presented in detail in Xiao *et al*. (1995). Briefly, 194 F_{7} parents were backcrossed to two elite homozygous lines, 9024 (*L*_{1}, *indica* parent) and LH422 (*L*_{2}, *japonica* parent). The backcross progenies were evaluated in a randomized complete block design with two replications. Twelve quantitative traits were measured, but we used just means over replications for grain yield (in tons/hectare). A genetic map for the recombinant inbred population was constructed with 141 RFLP markers and the genetic distances were expressed in centimorgans using the Kosambi map function.
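For readers who want to relate the map distances quoted later (for example, QTL more than 90 cM apart) to recombination fractions, the Kosambi map function can be sketched as follows. The function names are illustrative and are not taken from the authors' software.

```python
import math


def kosambi_to_r(d_cm: float) -> float:
    """Recombination fraction for a map distance in centimorgans (Kosambi)."""
    d_morgans = d_cm / 100.0
    return 0.5 * math.tanh(2.0 * d_morgans)


def r_to_kosambi(r: float) -> float:
    """Map distance in centimorgans for a recombination fraction 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must satisfy 0 <= r < 0.5")
    # d (Morgans) = 1/4 * ln[(1 + 2r) / (1 - 2r)], converted to centimorgans.
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


for d in (5.0, 20.0, 90.0):
    print(f"{d:5.1f} cM -> r = {kosambi_to_r(d):.4f}")
```

At large distances the recombination fraction approaches 0.5, which is why QTL that are far apart on the same chromosome behave almost as if unlinked.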

To help in the selection of the final MIM model, the same procedures used for the maize data were applied. Initially, IM for design III was applied. Then, a MIM model for design III was selected. First a forward procedure was used until no more QTL could be added. Second, a forward/backward procedure was applied to find two-way epistasis between QTL. Models were compared using the BIC for the main effects and the AIC for epistatic effects. The positions were refined in every step of model updating. Finally, we also estimated the four contrasts proposed by Cockerham and Zeng for all markers. For epistasis, some markers did not have heterozygous genotypes and therefore the contrasts could not be estimated.
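The difference between the two criteria used here can be illustrated with their textbook definitions. This is a minimal sketch: the exact form of the criteria implemented in Windows QTL Cartographer may differ, and the log-likelihoods, parameter counts, and sample size below are hypothetical values chosen only to show the behavior.

```python
import math


def aic(log_lik: float, n_params: int) -> float:
    """Akaike information criterion: -2 ln L + 2k."""
    return -2.0 * log_lik + 2.0 * n_params


def bic(log_lik: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: -2 ln L + k ln n."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


def accept_term(crit_current: float, crit_candidate: float) -> bool:
    """Keep the candidate model only if it lowers the criterion."""
    return crit_candidate < crit_current


# Hypothetical fit before and after adding one effect to the model.
n_obs = 200
ll_current, k_current = -310.0, 6
ll_candidate, k_candidate = -307.5, 7

keep_by_aic = accept_term(aic(ll_current, k_current),
                          aic(ll_candidate, k_candidate))
keep_by_bic = accept_term(bic(ll_current, k_current, n_obs),
                          bic(ll_candidate, k_candidate, n_obs))
print(keep_by_aic, keep_by_bic)  # prints: True False
```

With the same gain in log-likelihood, AIC admits the extra term while BIC rejects it. This is the reason a more liberal criterion is convenient for exploring epistatic terms, at the cost of a higher risk of overfitting.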

#### Results:

##### IM for design III:

The results for QTL mapping for grain yield are presented in Figure 2, A and B. In the same way as for the maize data, they are in agreement with the analysis of Xiao *et al*., but provide more information and statistical power. Xiao *et al*. did their analysis in a way similar to Stuber *et al*., considering the backcrosses separately. They found only two QTL, one in the backcross to *japonica* on chromosome 8 (with LOD score 2.49), and another one in the backcross to *indica* on chromosome 11 (with LOD score 2.64). Using IM for design III there are LOD peaks in the same regions, but with higher LOD scores (∼4.5). Moreover, there is an indication of additional QTL in many other chromosomes.
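LOD scores such as those quoted above are base-10 log-likelihood ratios, so they can be converted to the likelihood-ratio test statistic by multiplying by 2 ln 10. The sketch below uses illustrative function names and makes no claim about significance thresholds, which depend on the genome scan and are not reproduced here.

```python
import math

LN10 = math.log(10.0)


def lod_from_loglik(loglik_full: float, loglik_reduced: float) -> float:
    """LOD score from natural-log likelihoods of the full and reduced models."""
    return (loglik_full - loglik_reduced) / LN10


def lod_to_lrt(lod: float) -> float:
    """Likelihood-ratio test statistic, 2 ln(L_full / L_reduced), from a LOD score."""
    return 2.0 * LN10 * lod


# LOD values mentioned in the text.
for lod in (2.49, 2.64, 4.5):
    print(f"LOD {lod:4.2f} -> LRT {lod_to_lrt(lod):.2f}")
```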

In general, the LOD curves from Xiao *et al*. are flat, with small values. When the analysis is done for both backcrosses simultaneously, some peaks become more evident, such as on chromosomes 2, 3, 5, and 11. The QTL on chromosome 4, which previously had a LOD score <2 and thus was not selected, now has a more identifiable peak with LOD score ∼4. At the beginning of chromosome 11 there is strong evidence for the presence of a QTL, showing that the new analysis can substantially increase the ability to identify QTL. In fact, this QTL is the most important one in the MIM model (next section).

For the same reasons as discussed above for the maize data, Xiao *et al*. also estimated *d** + *a** and *d** − *a**, leading to the identification of QTL in only one backcross if the effects are similar in magnitude. With the combined analysis, *a** and *d** could be estimated separately. The *P-*values for the contrasts of Cockerham and Zeng were nonsignificant for nearly all markers, with only a few exceptions that are possibly false positives. None of the *P-*values is <0.01. The signs of the contrasts are in agreement with the estimates from IM for design III. In contrast to the results for the maize data, *d** effects are now positive and negative for approximately the same number of regions.
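The relation between the per-backcross contrasts and the separate effects can be written out explicitly. As a sketch, let $C_{+}$ and $C_{-}$ (symbols introduced here only for illustration) denote the contrasts between heterozygous and homozygous genotypes that estimate $d^{*} + a^{*}$ and $d^{*} - a^{*}$ in the two backcrosses:

```latex
\begin{aligned}
C_{+} &= d^{*} + a^{*}, & C_{-} &= d^{*} - a^{*},\\
d^{*} &= \tfrac{1}{2}\,(C_{+} + C_{-}), & a^{*} &= \tfrac{1}{2}\,(C_{+} - C_{-}).
\end{aligned}
```

When $a^{*} \approx d^{*}$, $C_{-} \approx 0$ and the QTL is visible in only one backcross; when $a^{*} \approx -d^{*}$, the same happens for $C_{+}$. Analyzing the two backcrosses jointly avoids this loss of information.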

##### MIM for design III:

Six QTL were mapped on chromosomes 2, 4, 7, 8, and 11, with LOD scores varying from 0.40 to 9.43 (Figure 2, A and B, Tables 7–9). QTL II and III were retained in the model because they had significant epistasis with another QTL. Not all putative QTL suggested by IM were kept in the final MIM model, since they were not significant. This is the case for putative QTL on chromosomes 1, 5, and 6 and also for the one near the end of chromosome 2. Only chromosome 11 has more than one QTL, but they are very far apart (>90 cM).

Surprisingly, QTL V at the beginning of chromosome 11 was not detected by Xiao *et al*., having just a slight tendency for its presence in the backcross to *japonica*. However, it has the highest LOD and *R*^{2} in our analysis. Its presence is also suggested by IM for design III. This is an indication that the analysis of the combined backcross has more statistical power and can lead to different results.

Together, all QTL explain 60.94 and 64.67% of the phenotypic variation in the backcrosses to *indica* and *japonica*, respectively. In their analysis, Xiao *et al*. found only two QTL (named IV and VI in our results), explaining 6.80 and 6.30% of the phenotypic variation. In our analysis, the main effects of QTL have *R*^{2}'s varying from 0.34 to 31.13%. Four *aa* + *dd* and five *ad* + *da* epistasis effects were selected, with small LOD scores. For the estimated genetic variance, 74.29% is due to additive effects of QTL, 9.52% is due to dominance effects, and 16.19% is due to epistatic effects. In contrast to the maize results, *a** effects seem to be more important for rice.

The signs of *a** are negative for all QTL (except QTL I), showing that the favorable alleles are concentrated in *indica*. Their values vary from −0.723 to 0.442 (t/ha). In clear contrast to maize, *d** effects are both positive (for four QTL) and negative (for two QTL) and are in general smaller than *a** in magnitude. No evidence for overdominance of any QTL is observed.

#### Discussion:

Again, the following discussion is based on the results of MIM for design III. The *a** effect is positive for one QTL and negative for the other five, showing that the favorable alleles are distributed between the parents but with concentration in the *indica* parent. In contrast to maize, *d** estimates are now positive and negative, indicating that the heterozygote is not always superior in the direction of the favorable allele. This is not in line with the hypothesis that dominance is a major cause of heterosis in rice.

For rice, *d** effects are not significantly greater than *a** effects for any QTL. This can be interpreted as lack of overdominance (or pseudo-overdominance). Actually, from our results, , corroborating the importance of *a** effects for grain yield in rice. Even knowing that can be strongly biased, one would expect this to occur in a smaller magnitude in this case, since there is no evidence for closely linked QTL (the only two QTL on the same chromosome are very far apart). Therefore, the bias due to *aa* and *da* effects contained in *a** and *d** and the overestimation that happened for in maize is not expected here.

Xiao *et al*. concluded that dominance is the major genetic basis of heterosis in rice. In the same way as Stuber *et al*., they used the difference between the phenotypic means of heterozygous and homozygous genotypes in each backcross as an estimate of the phenotypic effect of QTL. They found one positive and one negative result for these differences for the two QTL for grain yield. Since positive and negative signs indicate superior heterozygous and homozygous genotypes, respectively, they assumed lack of overdominance and concluded that dominance (or partial dominance) is the major contributor to F_{1} heterosis. Probably, their conclusions were reinforced by the fact that they did not find significant epistasis. However, using differences on each backcross they were actually estimating *d** + *a** and *d** − *a** in the backcross to *indica* and *japonica*, respectively. Our estimates for *d** + *a** and *d** − *a** for QTL IV and V are, respectively, −0.171 and 0.834, with the same signs as the Xiao *et al*. estimates, showing that positive and negative estimates can appear, but are not necessarily evidence of dominance (or partial dominance) as a major cause for heterosis.

Since rice is a self-pollinated species, it is common to express heterosis also in terms of the difference between F_{1} and the better parent (also called heterobeltiosis, *H*). Xiao *et al*. estimated heterobeltiosis t/ha. Melchinger *et al*. showed that . From the MIM results, t/ha, close to the observed heterosis. However, when considering the midparent heterosis *h*, we get from the MIM results t/ha, while Xiao *et al*.'s value is 1.605 t/ha, >15 times greater. One possible explanation for this difference is the presence of epistasis. As pointed out above, if *aa* is a cause for the midparent heterosis, its signs will be predominantly negative. But if *d* signs vary from locus to locus, *d** signs will tend to be positive and negative and therefore will tend to cancel each other out when added in *h*. Our estimates of *aa* + *dd* showed three negative signs and one positive sign. This could be an indication of a tendency of *aa* to be predominantly negative and therefore potentially important as a cause for the midparent heterosis in rice. In addition to the facts that normally epistasis is difficult to detect and design III is also not suitable to estimate epistatic effects separately, the progeny data used in this research were evaluated in only one location and year, with few replications. So, it may be expected that the means used in the analysis were not estimated with good precision. Therefore, this tendency for the presence of negative *aa* epistasis as a cause for heterosis needs to be confirmed in further studies.

## CONCLUSIONS

The objective of this research is to study the genetic basis of heterosis in maize and rice. Since maize and rice are economically important and are good examples of outcrossing and self-pollinating crops, we believe that the conclusions from this study may be useful for plant breeders and geneticists. To achieve this goal, we first extended the single-marker contrasts proposed by Cockerham and Zeng for the analysis of design III to two markers. On the basis of the genetic expectations of contrasts for the analysis of two markers simultaneously, we were able to propose a new model for a statistical analysis of design III, taking into account positions between markers. This leads to the MIM model for design III that provides a basis to estimate QTL number, positions, effects (*a** and *d**), and epistatic interactions (*aa* + *dd* and *ad* + *da*) simultaneously. Our model can be used for parents with any number of generations of selfing.

After Stuber *et al*. and Cockerham and Zeng, a few authors also proposed methods for QTL mapping and analysis of design III, most of them based on the derivations of Cockerham and Zeng showing that the contrasts of heterozygous and homozygous genotypes on each backcross actually test *d** + *a** and *d** − *a**. For example, Lu *et al*. (2003) and LeDeaux *et al*. (2006) proposed the utilization of composite-interval mapping (CIM) (Zeng 1994) on each backcross separately and, after QTL were mapped (in one or both backcrosses), *a** and *d** effects were estimated by a linear combination of the contrasts for each backcross. Although *a** and *d** effects can be estimated individually in this way, the results of QTL mapping are still based on the analysis of each backcross separately in a similar way to that of Stuber *et al*. Lu *et al*. proposed to test epistasis by fitting a two-locus linear regression model for the main effects and interaction between loci. If performed in this way, it is likely that epistasis will be rarely identified because the test tends to have relatively low statistical power and, even if identified, it is not clear how to interpret the results in a way to understand its influence on heterosis. In a different approach, Melchinger *et al*. (2007) suggested the use of CIM for the identification of genomic regions affecting heterosis. They defined two orthogonal single-marker contrasts based on progeny mean values for pair means and pair differences. These contrasts, which correspond to contrasts *C*_{1} and *C*_{3} of Cockerham and Zeng, and and in our MIM model, are used individually for CIM analysis of the combined backcrosses and the estimation of *a** and *d**. Although using information from both crosses simultaneously, their method is still based on CIM and does not capitalize on all the advantages of MIM models.

To our knowledge, the proposed MIM model for design III is probably the most powerful statistical method currently available for QTL mapping in this type of population. We developed a module of MIM for design III for Windows QTL Cartographer (Wang *et al*. 2007) specifically for its public use. The software can be freely downloaded from http://statgen.ncsu.edu/qtlcart/WQTLCart.htm.

We realize that by using AIC as a criterion for including epistasis in the MIM model, there is a risk that the final model may be overfitted. However, this was done mostly to study the sign of estimates for epistasis. Normally, epistasis is difficult to detect with statistical significance, and both Stuber *et al*. and Xiao *et al*. did not find evidence for it using statistical tests with relatively low statistical power. Since our model allows the inclusion of epistasis, it is possible to study its effects more clearly on maize and rice. The results showed that dominance is possibly a major cause of heterosis in maize, although overdominance (or pseudo-overdominance) of individual loci could not be ruled out. On the other hand, for rice there is evidence that additive × additive epistasis could be important for explaining heterosis. Maize and rice evolved from a common ancestor (Ahn and Tanksley 1993) but have different reproductive biology. As a consequence, maize is supposed to have more deleterious recessive alleles than rice, masked by their corresponding dominant counterparts. When inbreeding occurs, these unfavorable alleles are expressed in the homozygous loci, causing the inbreeding depression. In self-pollinating species, deleterious alleles are possibly eliminated by natural (and artificial) selection since the individuals are homozygous. Therefore, outcrossing species could be selected for true dominant loci to avoid the expression of these deleterious loci (causing the outbreeding advantage), whereas in self-pollinating species the selection for dominance is less important and, when an F_{1} cross shows midparent heterosis, it is more likely due to epistatic interactions (*aa*) among loci.

Two important conferences about heterosis should be mentioned. In 1950, in Iowa, there was a 5-week conference (Gowen 1952). On that occasion, Comstock and Robinson (1952) proposed design III as a means to estimate the average degree of dominance and also presented some estimates, suggesting overdominance. Some authors proposed breeding schemes to exploit it. Since then, design III has been widely used in breeding programs for understanding the genetic basis of many economically important traits and for developing breeding schemes. Crow (1999, p. 521) said that “1950 and the next few years was the zenith of overdominance,” but in later years the importance of the dominance hypothesis increased. When comparing this conference with another one that took place in 1997 in Mexico City, Crow (1999) noted a change in emphasis, since in the second one many authors included epistasis in their presentations. We hope that the results presented here can make a contribution to this important discussion.

## APPENDIX A: GENOTYPIC CONSTITUTION OF THE PROGENIES FROM F_{2} PARENTS

Here we expand the idea of Cockerham and Zeng (1996) and consider F_{2} parents for two linked markers (*M*_{1} and *M*_{2}) with recombination fraction ρ. The markers are linked to two QTL with the linkage order *Q*_{1}*M*_{1}*M*_{2}*Q*_{2}. The recombination fraction between *Q*_{1} and *M*_{1} is ρ_{1}, between *M*_{2} and *Q*_{2} is ρ_{2}, and between *Q*_{1} and *Q*_{2} is ρ_{12}. We assume no crossover interference, so ρ_{12} = ρ_{1}(1 − ρ)(1 − ρ_{2}) + (1 − ρ_{1})ρ(1 − ρ_{2}) + (1 − ρ_{1})(1 − ρ)ρ_{2} + ρ_{1}ρρ_{2}. Assume that the inbred lines' genotypes are *L*_{2} = *Q*_{1}*Q*_{1}*M*_{1}*M*_{1}*M*_{2}*M*_{2}*Q*_{2}*Q*_{2} and *L*_{1} = *q*_{1}*q*_{1}*m*_{1}*m*_{1}*m*_{2}*m*_{2}*q*_{2}*q*_{2}.
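The expression for ρ_{12} sums the probabilities that an odd number of the three intervals recombine. As a quick numerical check, the sketch below compares it with the equivalent product form 1 − 2ρ_{12} = (1 − 2ρ_{1})(1 − 2ρ)(1 − 2ρ_{2}), which holds under no interference. The function names and the numerical values are illustrative only.

```python
def rho_12(rho_1: float, rho: float, rho_2: float) -> float:
    """Recombination fraction between Q1 and Q2 with no crossover interference.

    Q1 and Q2 are recombinant when an odd number of the three intervals
    (Q1-M1, M1-M2, M2-Q2) recombine.
    """
    return (
        rho_1 * (1 - rho) * (1 - rho_2)
        + (1 - rho_1) * rho * (1 - rho_2)
        + (1 - rho_1) * (1 - rho) * rho_2
        + rho_1 * rho * rho_2
    )


def rho_12_product_form(rho_1: float, rho: float, rho_2: float) -> float:
    """Same quantity from 1 - 2*rho_12 = product of (1 - 2*rho_i)."""
    return 0.5 * (1 - (1 - 2 * rho_1) * (1 - 2 * rho) * (1 - 2 * rho_2))


# Hypothetical recombination fractions for the three intervals.
example = (0.05, 0.10, 0.08)
print(rho_12(*example), rho_12_product_form(*example))
```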

Denote F_{1} gametes as , with . The gametic frequencies are one-half of

From these frequencies, it is easy to show the conditional frequencies of QTL gametes from F_{2} with different marker genotypes (Table 1). These gametes are combined with the gametes *Q*_{1}*Q*_{2} and *q*_{1}*q*_{2} from inbred lines *L*_{2} and *L*_{1}, respectively, to form two backcross populations.

Let denote the genotypic means of backcross progenies with marker genotype *g* in the F_{2} parent backcrossed to parental line *j*. There are 18 values. They are weighted genotypic values of seven QTL genotypes (the nine possible genotypes at two loci minus genotypes *Q*_{1}*q*_{2}/*Q*_{1}*q*_{2} and *q*_{1}*Q*_{2}/*q*_{1}*Q*_{2}, which are not produced in the backcrosses) with weights given in Table 1.

## APPENDIX B: ORTHOGONAL CONTRASTS WITH TWO MARKERS

When two markers are considered simultaneously in the two backcrosses of design III, it is possible to define a set of 17 orthogonal contrasts denoted as *c*_{*k*} (*k* = 1, … , 17) (Table 3). Denoting the coefficients in Table 3 as *u*_{*kgj*}, the *k*th contrast is . All contrasts are orthogonal because for any pair of contrasts *c*_{*k*} and *c*_{*k*′} (*k* ≠ *k*′).
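One way to see why 17 mutually orthogonal contrasts exist for the 18 means (nine two-marker genotypes in each of two backcrosses) is to build them as Kronecker products of single-factor contrasts. The coefficients below are an illustrative construction and are not claimed to reproduce Table 3, whose scaling and signs may differ. Orthogonality is taken here to mean that the coefficient vectors of any two contrasts have a zero inner product.

```python
from itertools import product

# Single-factor coefficient vectors (assumed, for illustration only).
# Marker genotypes are ordered MM, Mm, mm; lines are ordered L1, L2.
MARKER = {"mean": (1, 1, 1), "hom": (1, 0, -1), "het": (1, -2, 1)}
LINE = {"mean": (1, 1), "diff": (1, -1)}


def kron(*vectors):
    """Kronecker product of several coefficient vectors, as a flat list."""
    out = [1]
    for v in vectors:
        out = [x * y for x in out for y in v]
    return out


def dot(u, v):
    """Inner product of two coefficient vectors."""
    return sum(x * y for x, y in zip(u, v))


# Every combination except the overall mean defines one contrast.
contrasts = {
    key: kron(MARKER[key[0]], MARKER[key[1]], LINE[key[2]])
    for key in product(MARKER, MARKER, LINE)
    if key != ("mean", "mean", "mean")
}

names = list(contrasts)
orthogonal = all(
    dot(contrasts[a], contrasts[b]) == 0
    for i, a in enumerate(names)
    for b in names[i + 1:]
)
print(len(contrasts), orthogonal)  # prints: 17 True
```

The grouping mirrors the description in the text: marginal contrasts for each marker, their pairwise interactions, the difference between inbred lines, and the interactions of all of these with the lines.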

Contrasts *c*_{1}–*c*_{4} are for marginal differences among means for marker genotypes of *M*_{1} (*c*_{1} and *c*_{2}) and *M*_{2} (*c*_{3} and *c*_{4}) and can be viewed as a direct expansion of the first and third contrasts of Cockerham and Zeng. Contrasts *c*_{1} and *c*_{3} are for differences between homozygous marker genotypes for *M*_{1} and *M*_{2}, respectively, and *c*_{2} and *c*_{4} are for contrasts between heterozygous and homozygous marker genotypes. The contrasts *c*_{5}–*c*_{8} are for interactions between *c*_{1} and *c*_{3}, *c*_{1} and *c*_{4}, *c*_{2} and *c*_{3}, and *c*_{2} and *c*_{4}, respectively. Contrast *c*_{9} is for testing the difference between the inbred lines (not considered by Cockerham and Zeng) and *c*_{10}–*c*_{17} are for interactions of contrasts *c*_{1}–*c*_{8} with the inbred lines (analogous to contrasts 2 and 4 of Cockerham and Zeng).

On the basis of the genotypic constitution of the progenies of F_{2} parents (Table 1 and Appendix A) and substituting the genotypic values by the genetic effects based on the F_{2} genetic model (Cockerham and Zeng 1996; Zeng *et al*. 2005), we derived the genetic expectation of the 17 contrasts:

## APPENDIX C: DESIGN III WITH RECOMBINANT INBRED LINES

If we continue selfing F_{2} for a number of generations, it will lead to the development of recombinant inbred lines (F_{∞}) where heterozygote genotypes are eliminated. There are four homozygote genotypes for two loci in the recombinant inbred lines and eight genotypic means in the two backcrosses. The six contrasts can be further simplified from Table 2 and are presented in Table A1.

The genotypic expectations of the contrasts in the framework of MIM can be expressed for two QTL as . The MIM model is then , where *y*_{*ij*}, μ_{*j*}, α_{*r*}, β_{*r*}, γ_{*rs*}, δ_{*rs*}, and ε_{*ij*} have the same interpretation as in the MIM model in the main text.

The indicator variables for the main and interaction effects are

## APPENDIX D: EM ALGORITHM

Adapting the general formulas of Kao and Zeng (1997) for the likelihood of our model, we present here the EM algorithm using matrix notation. (However, when coding the software, we took into consideration the problems for convergence presented by Zeng *et al*. 1999 and used a different notation; see Kao and Zeng 1997 for details). For the [τ + 1]*th* iteration,

E step: M step: where **1** is a column vector of ones, , , , and . **D**_{*jk*} (**D**_{*jl*}) is the *k*th (*l*th) column of the genetic design matrix **D**_{*j*}, δ(*k* ≠ *l*) is an indicator variable that assumes the value 1 if *k* ≠ *l* and 0 otherwise, and # denotes the Hadamard product. For details about genetic design matrices see Kao and Zeng (1997) and Kao *et al*. (1999).

To test the MLEs of the **E** vector, the likelihood-ratio test or the LOD score can be used. For example, for testing the effect *E*_{*r*},

## Acknowledgments

The authors thank Charles Stuber (North Carolina State University) and Steven Tanksley (Cornell University) for providing the maize and rice data, respectively. This research was done while A. A. F. Garcia was working with Z.-B.Z. as a visiting scientist (postdoc) at the Bioinformatics Research Center, North Carolina State University, with a fellowship from Conselho Nacional de Desenvolvimento Científico e Tecnológico, Brazil (grant no. 200345/2004-4). Z.-B.Z. was also partially supported by National Institutes of Health grant GM45344 and by the National Research Initiative of the U.S. Department of Agriculture Cooperative State Research, Education and Extension Service, grant no. 2005-00754. A.E.M. was supported by grants from the German Research Foundation (ME931/4-1 and ME937/4-2).

## Footnotes

**“Design III with marker loci” was the last article published by C. Clark Cockerham. This article is dedicated to his memory.**

Communicating editor: E. S. Buckler

- Received October 4, 2007.
- Accepted September 8, 2008.

- Copyright © 2008 by the Genetics Society of America