## Abstract

The expression of most developmental or behavioral traits involves complex interactions between quantitative trait loci (QTL) from the maternal and offspring genomes. The maternal-offspring interactions play a pivotal role in shaping the direction and rate of evolution in terms of their substantial contribution to quantitative genetic (co)variation. To study the genetics and evolution of maternal-offspring interactions, a unifying statistical framework that embraces both the direct and indirect genetic effects of maternal and offspring QTL on any complex trait is developed. This model is derived for a simple backcross design within the maximum-likelihood context, implemented with the EM algorithm. Results from extensive simulations suggest that this model can provide reasonable estimation of additive and dominant effects of the QTL at different generations and their interaction effects derived from the maternal and offspring genomes. Although our model is framed to characterize the actions and interactions of maternal and offspring QTL affecting offspring traits, the idea can be readily extended to decipher the genetic machinery of maternal traits, such as maternal care. Our model provides a powerful means for studying the evolutionary significance of indirect genetic effects in any sexually reproductive organisms.

FOR many characteristics, particularly those expressed early in life, an individual's phenotype may be influenced by two kinds of genetic effects: *direct genetic effects* of the genes carried by that individual on its own traits and *indirect genetic effects* of genes carried by others (Wolf* et al.* 1998; Li* et al.* 1999; Wolf 2000, 2003; Agrawal* et al.* 2001; Hager and Johnstone 2003). Direct genetic effects occur when genes possessed by an individual directly influence that individual's phenotype, whereas indirect genetic effects occur when genes expressed in one individual have phenotypic effects in another individual. Notable indirect genetic effects were observed in many taxa, such as mice (Falconer 1965; Hager and Johnstone 2003), fish (Reznick 1991), plants (Roach and Wulff 1987), and insects (Mousseau and Dingle 1991; Agrawal* et al.* 2001). Their simultaneous occurrence with direct genetic effects can create new sources of genetic (co)variation (Mousseau and Fox 1998; Wolf 2003) and therefore can change the rate and direction of evolution (Cheverud 2003).

A number of quantitative genetic models have been proposed to quantify the contribution of direct genetic effects to the phenotypic variance (Lynch and Walsh 1998). These models have now been refined to characterize and map individual genetic loci [or quantitative trait loci (QTL)] affecting complex phenotypes, using molecular linkage maps (Lander and Botstein 1989; Wu* et al.* 2002; reviewed in Jansen 2000). While these refined models are instrumental in the genetic mapping of QTL triggering direct effects on a variety of traits, none of them can be effectively used to map QTL of indirect genetic loci because of their failure to take into account gene actions and interaction from different genomes. Many of the current results from the mapping of direct-effect QTL may be insufficient or misleading in the characterization of genetic architecture due to the omission of the potentially important sources of genetic variance contributed by indirect genetic effects (see Wolf* et al.* 2002).

The presence of indirect genetic effects means that the phenotypic effect of a gene in an individual relies on the genes possessed by its social partners. Perhaps the most important indirect genetic effects occur between parents and their offspring (Wolf 2003). An enormous body of work on parent-offspring conflict suggests that life history traits and other development-related traits are affected by genes expressed in mothers and by maternally and paternally inherited genes expressed in offspring (Agrawal* et al.* 2001; Hager and Johnstone 2003). Using simple mapping approaches, some researchers have begun to map QTL for indirect effects from maternal interactions (Peripato and Cheverud 2002; Peripato* et al.* 2002; Wolf* et al.* 2002). For example, Wolf* et al.* (2002) identified several direct and maternal-effect QTL for offspring preweaning growth in a cross of LG/J and SM/J inbred mouse strains. They further found that maternal-effect loci contribute much more strongly to the genetic variance in offspring growth than do direct-effect loci. However, results from simple mapping approaches are premature because they were not designed to understand the genetics underlying parent-offspring interactions.

Here we present a novel statistical model for characterizing the genetic effects of the maternal genome on offspring traits. In doing this, we incorporate into our mapping model epistatic interactions between one QTL from the maternal genome and other QTL from the offspring genome. In the QTL mapping study, the direct- and maternal-effect QTL mapped to distinct portions of the genome (Wolf* et al.* 2002). Epistasis, expressed as the effect of one gene contingent upon the expression of other genes, is thought to play a central role in trait evolution and speciation (Doebley* et al.* 1995; Lark* et al.* 1995; Whitlock* et al.* 1995; Phillips 1998; Wade 1998). Despite their considerable evolutionary significance (Wolf 2003), theoretical models of epistatic genetic effects resulting from maternal and offspring genomes are virtually lacking in the literature. To illustrate the idea of our mapping approach, we base our analysis on a simple backcross genetic design for an autogamous plant system. It is possible for our model to be extended to more complex designs, such as the F_{2}, full-sib family, natural populations, and allogamous systems.

## EXPERIMENTAL DESIGN

Suppose a backcross population is of size *n*, initiated with two contrasting inbred lines in autogamous plants. We use *t* to denote the generation of the backcross and *t* + 1 to denote the generation of offspring derived from the autogamous pollination of the backcross plants. Thus, leaves, stems, or roots of the backcross represent generation *t*, whereas the embryos and endosperms of its seeds represent generation *t* + 1. We use this backcross population to construct a genetic linkage map with molecular markers. Each of the members in the backcross is measured for a quantitative trait. For the traits with the same generation as the backcross plant (mother), such as plant height, traditional mapping approaches (Lander and Botstein 1989) can be used directly to map the underlying QTL affecting plant height on the map. But for offspring traits (resulting from the embryo) with different generations from the mother, the direct use of traditional approaches may lead to biased estimation of genetic architecture because these traits, particularly when they are expressed in the mother plant, are confounded by genes from both the maternal genome and the maternally and paternally contributed offspring genome. We propose to use marker genotypes derived from the backcross population to map epistatically interacting maternal and offspring QTL for the embryo trait.

Consider two flanking markers, ℳ_{η}(*t*) and ℳ_{η+1}(*t*), derived from sporophytic plants of generation *t* in the backcross population. The recombination fraction between the two markers is denoted by *r*. A putative maternal QTL, denoted by 𝒬(*t*) and located between these two markers (measured by the recombination fraction *r*_{1} with ℳ_{η} or *r*_{2} with ℳ_{η+1}), may affect an embryo trait of interest. Table 1
gives the conditional probabilities (π* _{ij}*) of backcross individual

*i*(

*i*= 1, … ,

*n*) carrying a maternal QTL genotype

*j*[

*j*= 1 for

*t*) and 0 for

*t*)], given the four marker genotypes of the backcross population,

*M*

_{η}

*m*

_{η}

*M*

_{η+1}

*m*

_{η+1}(

*t*),

*M*

_{η}

*m*

_{η}

*m*

_{η+1}

*m*

_{η+1}(

*t*),

*m*

_{η}

*m*

_{η}

*M*

_{η+1}

*m*

_{η+1}(

*t*), and

*m*

_{η}

*m*

_{η}

*m*

_{η+1}

*m*

_{η+1}(

*t*). This embryo trait is also affected by a QTL on the offspring genome of generation (

*t*+ 1), denoted by 𝒬(

*t*+ 1). The offspring QTL may be located on either the same marker interval or a different marker interval, ℳ

_{η′}(

*t*) and ℳ

_{η′+1}(

*t*). Unlike the maternal QTL, the offspring QTL will segregate into three genotypes,

*t*+ 1),

*t*+ 1), and

*t*+ 1), through the autogamous pollination of the backcross plants. We derive the conditional probabilities (π

_{ij}_{′},

*j*′ = 2, 1, 0) of these three offspring QTL genotypes given four different marker genotypes of the backcross,

*M*

_{η′}

*m*

_{η′}

*M*

_{η′+1}

*m*

_{η′+1}(

*t*),

*M*

_{η′}

*m*

_{η′}

*m*

_{η′+1}

*m*

_{η′+1}(

*t*),

*m*

_{η′}

*m*

_{η′}

*M*

_{η′+1}

*m*

_{η′+1}(

*t*), and

*m*

_{η′}

*m*

_{η′}

*m*

_{η′+1}

*m*

_{η′+1}(

*t*) (Table 2) . The conditional probabilities of joint maternal-offspring QTL genotypes given the marker genotypes of two different marker intervals can be expressed as the product of the corresponding conditional probabilities,

*i.e.*,

If the maternal and offspring QTL are located within the same marker interval, we need to derive the conditional probabilities of six joint maternal-offspring QTL genotypes, *Qq*(*t*)*QQ*(*t* + 1), *Qq*(*t*)*Qq*(*t* + 1), *Qq*(*t*)*qq*(*t* + 1), *qq*(*t*)*QQ*(*t* + 1), *qq*(*t*)*Qq*(*t* + 1), and *qq*(*t*)*qq*(*t* + 1), given four marker genotypes of the backcross (Table 3)
.

Traditional quantitative genetic theory (Cheverud 2000) is used to model the genetic values of six joint maternal-offspring QTL genotypes. Because a backcross is considered in which there are two genotypes at a locus, we can model only the additive effect of the maternal QTL. But a heterozygous backcross is selfed to generate three possible QTL genotypes in the embryo, which makes it possible to model the additive and dominant effects of the offspring QTL. Therefore, the maternal-offspring QTL interactions that we can model will include the additive × additive and additive × dominant effects. Let *a*(*t*) be the additive effect of the maternal QTL, 𝒬(*t*), and *a*(*t* + 1) and *d*(*t* + 1) be the additive and dominant effects of the offspring QTL, 𝒬(*t* + 1). These two QTL interact to form the additive × additive (*i _{aa}*) and the additive × dominant epistatic effects (

*i*), respectively, between the two different genomes. The genotypic means of the six joint QTL are expressed as 1

_{ad}The additive, dominant, and epistatic effects of the two QTL can be estimated from the estimated genotypic means by solving the above regular equations, as can be seen in Wu* et al.* (2002).

## STATISTICAL METHOD

### The mixture model:

Statistical methods for mapping QTL based on a mixture model have been previously developed (Lander and Botstein 1989). In the mixture model, each observation *y* is assumed to have arisen from one of *k* (*k* possibly unknown but finite) components, each component being modeled by a density from the parametric family *f*,
2where π = (π_{1}, … , π* _{k}*) are the mixture proportions that are constrained to be nonnegative and sum to unity; φ = (φ

_{1}, … , φ

*) are the component-specific parameters, with φ*

_{k}*being specific to component*

_{i}*i*; and η is a parameter (

*i.e.*, residual variance) that is common to all components.

Modeling maternal-offspring interactions contains two major tasks: (1) Derive the mixture proportions (π) expressed as the conditional probabilities of joint maternal-offspring QTL genotypes given marker genotypes (Tables 1–3) and the density functions expressed in terms of six QTL genotypic means and residual variance; (2) provide estimates of unknown parameters **Θ** = (π, φ, η) included in the mixture model using statistical and computational algorithms, such as the maximum-likelihood method (Lander and Botstein 1989; Wu* et al.* 2002) and the Bayesian approach (Satagopan* et al.* 1996; Robert and Casella 1999). Here, we implement the expectation-maximization (EM) algorithm to obtain the maximum-likelihood estimates (MLEs) of the unknown parameters.

### EM algorithm:

On the basis of the mixture model (2), we formulate the likelihood of the marker data and embryo trait phenotypes controlled by the putative QTL as
3where the unknown vector **Θ** contains QTL effects (Equation 1), QTL positions (Tables 1–3), and residual variance (σ^{2}). These parameters are solved using the EM algorithm (Dempster* et al.* 1977) described below.

In the E step, the conditional probabilities (priors) of the QTL genotypes given the marker genotypes and the normal distribution function are used to calculate
4which could be thought of as a posterior probability that the *j*th seed of the *i*th backcross plant has the *k*th joint maternal-offspring QTL genotype.

In the M step, the calculated posterior probabilities were used to solve the unknown parameters 5 6

Iterations are repeated between Equations 4 and 6 until convergence. The values at convergence are the MLEs. With the MLEs of μ* _{j}*'s, the MLEs of the overall mean, the additive, dominant, and epistatic effects of the QTL, as indicated in Equation 1, can be obtained by solving a system of regular equations. The estimation of the QTL positions can be obtained using a grid approach. This approach views the QTL positions as a known parameter in the likelihood function (3) by scanning the QTL over all marker intervals. The positions corresponding to the maximum of the log-likelihood ratio across a linkage group are the MLEs of the QTL positions.

### Hypothesis tests:

Our model provides hypothesis tests for the presence of QTL affecting the expression of an offspring trait and their additive, dominant, and maternal-offspring epistatic interactions. The presence of QTL can be tested by using the following hypotheses: 7

The test statistic for testing the above hypotheses is calculated as the log-likelihood ratio (LR) of the full model (H_{1}) over the reduced model (H_{0}),
where **Ω̃** and **Ω̂** denote the MLEs of the unknown parameters under H_{0} and H_{1}, respectively. The LR is asymptotically χ^{2} distributed with 5 d.f. However, for a practical data set with a limited sample size, the critical threshold value for declaring the presence of QTL can be empirically calculated on the basis of permutation tests (Churchill and Doerge 1994).

We can also test whether the maternal QTL exerts a significant additive (main) effect on the offspring trait by simply formulating
8whose log-likelihood-ratio test statistic is asymptotically χ^{2} distributed with 1 d.f. The hypothesis about the effect of offspring's own QTL on the offspring trait is tested using
9

Increasing evidence has been obtained for the role of maternal-offspring interactions in changing the rate and direction of evolution (Wolf 2003). The significance of maternal-offspring interactions (*i.e.*, *i _{aa}* and

*i*in our model) can be tested by formulating the hypotheses 10

_{ad}The likelihood value under H_{1} of Equation 10 is the same as that under H_{1} of Equation 7. The likelihood value under H_{0} of Equation 10 is calculated by plugging the MLEs of μ, *a*(*t*), *a*(*t* + 1), *d*(*t* + 1), and σ^{2} under this null hypothesis into the likelihood function (3). The characterization of critical thresholds for hypotheses 8–10 can be empirically obtained through simulation studies.

## RESULTS

We performed simulation studies to investigate the theoretical properties of our model. A linkage group of 80 cM, composed of five equidistant markers, is simulated for a backcross population. For a given marker, there are two (maternal) genotypes, one heterozygous and the other homozygous, in the backcross. Through selfed pollination, the heterozygote produces three genotypes and the homozygote produces one homozygote in the progeny population, which leads to four joint maternal-offspring genotypes. We use a two-stage hierarchical bifurcation strategy to simulate genotypes at the five markers for a given sample size of *n*. The upper-hierarchical bifurcation of this strategy concerns the polynomial distribution of 2^{5} marker genotypes in the backcross population, whereas the lower-hierarchical bifurcation concerns the polynomial distribution of 4^{5} joint maternal-offspring marker genotypes in the autogamous progeny of the backcross plants. These two hierarchies are connected through Mendelian inheritance laws. This simulation strategy is illustrated by Figure 1
in which three genes (𝒜, ℬ, 𝒞) are cosegregating in a backcross and cotransmitted into the offspring generation of the backcross.

Assume that there are two QTL for an offspring trait, one from the maternal genome and the other from the offspring genome, which together account for a proportion (*R*^{2}) of the total observed variance in this trait. We consider the two QTL to be located in two different marker intervals (ℒ_{1}) or the same marker interval (ℒ_{2}). In this simulation, we consider 12 different schemes each representing a combination of three different sample sizes (*n* = 200, 400, and 800), two different QTL-effect sizes (*R*^{2} = 0.1 and 0.4), and two different QTL locations (ℒ_{1} and ℒ_{2}). For ℒ_{1}, the maternal QTL is located at 16 cM from the left marker of the first interval and the offspring QTL is located at 8 cM from the left marker of the third interval (Figure 2)
. For ℒ_{2}, the maternal and offspring markers are located at 8 and 16 cM from the left marker of the first interval, respectively (Figure 3)
. Knowing the locations of these assumed QTL, joint QTL-marker genotypes can be simulated by viewing the QTL as “markers” using the two-stage hierarchical bifurcation strategy described above (Figure 1). The phenotypic values of an assumed offspring trait are simulated as the sum of the QTL genotypic means (determined by both the maternal and offspring QTL) and normally distributed random errors.

Tables 4 and 5
give the hypothesized values of the QTL-effect parameters as shown in Equation 1, as well as their MLEs under different simulation schemes. First, both the maternal and offspring QTL can be well identified. As an example, we draw the landscapes of the log-likelihood-ratio test statistics, calculated using Equation 3, as a function of the location of these two QTL only for two extreme situations, one having small *R*^{2} and *n* (Figures 2A and 3A) and the other having large *R*^{2} and *n* (Figures 2B and 3B). The peaks of the landscapes from each of 100 simulation replicates are all beyond the critical thresholds at α = 0.001 obtained from permutation tests, suggesting that our model has a power of 100% to detect the QTL hidden in our simulated dataset. The locations of both the maternal and offspring QTL can be reasonably estimated (Figures 2 and 3). Second, in general, all parameters can be quite reliably estimated using our model constructed with marker information purely from the maternal population. As expected, the additive genetic effects of the two QTL can be more reliably estimated than the dominant effects. Of different epistatic components, the additive × additive genetic effect can be estimated better than the additive × dominant effect. The overall mean and residual variance can be estimated precisely.

Increased *R*^{2} and sample sizes can always improve the precision of the parameter estimation (Tables 4 and 5). But relative to the effect of sample sizes used, the increase of *R*^{2} from 0.1 to 0.4 can lead to more significant improvement for the estimation precision, especially for the dominant and epistatic effects, than the increase of *n* from 200 to 400 to 800. For example, the squared root of the mean square error of the MLE of the additive × dominant genetic effect (*i _{ad}*) between two QTL located at different marker intervals would decrease by 217% when

*R*

^{2}is increased from 0.1 to 0.4 but only by 66% when

*n*is increased from 200 to 800. The corresponding numbers are 29% and 77% for the two QTL located at the same interval. This suggests that in practice well-managed experiments, through which residual errors are reduced and therefore

*R*

^{2}is increased, are more important than simply increasing sample sizes.

When both *R*^{2} and *n* are small, the estimates of the QTL parameters display similar precision for the two cases in which QTL are located at different (ℒ_{1}; Table 4) or the same marker intervals (ℒ_{2}; Table 5). The relative advantage of ℒ_{1} over ℒ_{2} for the precision of parameter estimation becomes more remarkable with the increased *R*^{2} value and sample size. It seems that the maternal QTL location can be estimated more precisely than the offspring QTL location when they are located at either different (Table 4) or the same marker intervals (Table 5). When *R*^{2} is small, the estimation precision of the QTL locations is not very responsive to sample size. But the QTL locations can be more precisely estimated with increased sample size at a large *R*^{2} value.

It is interesting to compare the results for genetic mapping of offspring traits from previous models and our model. Lander and Botstein's (1989) model associates the marker genotypes with the phenotypes measured at the same generation, whereas Wu* et al.*'s (2002) model associates the marker genotypes at the maternal generation with the phenotypes at the offspring generation. Unlike our model, these two models do not consider possible interactions between the QTL from the maternal and offspring genomes. The simulated data including the maternal-offspring interactions on each of the above 12 schemes were analyzed by the three models, aimed at mapping the QTL for the embryo. As expected, the two previous models cannot provide reasonable estimation of the embryo QTL parameters (results not shown). Also, their power to detect significant QTL was much reduced compared to our model.

We performed the hypothesis test for the significance of the additive × additive (*i _{aa}*) and additive × dominant epistatic effects (

*i*) between different genomes. Using the hypothesized values in Tables 4 and 5,

_{ad}*i*and

_{aa}*i*are detected to be significant for 80 of 100 simulation replicates at

_{ad}*R*

^{2}= 0.1 and

*n*= 200. This proportion is increased markedly when the QTL explain a larger proportion of the observed variance and/or when there is a larger sample size (results not shown).

Different gene action modes are suggested to affect the estimates of QTL parameters. An additional simulation study was conducted to investigate the impact of weak main (additive and dominant) effects *vs.* strong maternal-offspring interaction effects on the precision of parameter estimation. We consider an intermediate sample size (400), two different *R*^{2} values, and two different QTL distribution patterns (Table 6)
. The simulation results are summarized below. First, both maternal-offspring additive × additive (*i _{aa}*) and additive × dominant effects (

*i*) can be be better estimated when these effects are large. Second, large maternal-offspring interaction effects have no marked effects on the estimation of the additive or dominant effects. Third, large maternal-offspring effects would reduce the estimation precision of the two QTL located at the same interval.

_{ad}## DISCUSSION

We have proposed a new theoretical model for estimating the epistatic effects between maternal and offspring genomes on a complex trait expressed in the offspring. A considerable body of literature in animals supports the view that the maternal and offspring genomes interact to determine the developmental processes of offspring (Li* et al.* 1999; Agrawal* et al.* 2001; Hager and Johnstone 2003; Wolf 2003). Maternal-offspring interactions have long been thought to be more ignorable in plants than in animals, but increasing evidence suggests that they are more common and of greater evolutionary importance in plants than originally appreciated (Vielle-Calzada* et al.* 2000; Weijers* et al.* 2001). Our model will have great implications for estimating maternal-offspring interactions in any sexually reproductive organisms and make it possible to decipher the genetic basis of this important phenomenon (Agrawal* et al.* 2001; Hager and Johnstone 2003; Wolf 2003) at the individual QTL level.

Unlike traditional mapping approaches that were all developed to estimate direct genetic effects, our model incorporates both direct and indirect genetic effects into a QTL mapping framework and displays two significant advantages: (1) The results from our model should be closer to biological reality, given the fact that indirect genetic effects account for a great proportion (50% or more) of the genetic variance (Wolf* et al.* 2002; Wolf 2003), and (2) the model is statistically more powerful because more genetic information (indirect genetic effects) is used. As shown by extensive simulation studies, our model is quite robust in that the QTL parameters can be reliably estimated for a modest sample size (200) and low *R*^{2} value (0.1). Increased sample sizes and *R*^{2} values can improve the power to detect QTL and the estimation precision of QTL parameters. It is interesting to note that increasing *R*^{2} may be more important for precise parameter estimation than increasing sample sizes to the same extent. The increased *R*^{2} values may result from more precise phenotyping of a quantitative trait or weaker interaction between the underlying QTL and the environment.

Our strategy for mapping epistatic QTL from the maternal and offspring genomes is based on the marker genotypes derived purely from the maternal genome. Although it is no problem for the maternal markers to infer the maternal QTL, as demonstrated in traditional mapping approaches (Lander and Botstein 1989), such a one-stage sampling strategy is limited to predict the offspring QTL. This is because the marker information and offspring QTL are at different generations and undergo Mendelian transmission. In a study of mapping the endosperm of cereals, Wu* et al.* (2002) proposed a two-stage hierarchical genotyping strategy in which both the maternal and offspring genomes are genotyped at the same set of molecular markers. It is likely that the incorporation of Wu *et al.*'s two-stage hierarchical genotyping strategy can provide more unique information to capture gene transmission.

Our model proposed here is based on a simple backcross design for an autogamous plant system. It is not difficult to extend the model to other reproductive systems, such as allogamous and mixed-pollinated and other mapping populations. More interesting, our model can also be modified to characterize the genes affecting maternal care (Peripato and Cheverud 2002). The amount of care provided by parents is determined through a complex interaction of offspring signals and responses by parents to those signals (Agrawal* et al.* 2001). Variation in maternal care results from two distinct genetic sources: variation among offspring in their ability to elicit care and variation among parents in their response to offspring signals. By estimating the genetic loci for maternal care derived from these two sources, we can test for one of the most fundamental genetic assumptions in the family conflict theory: *i.e.*, offspring signaling is negatively genetically correlated with maternal provisioning (Agrawal* et al.* 2001; Wolf 2003).

Understanding the maternal and paternal genetic regulation of offspring development and/or maternal care helps to answer many fundamental evolutionary questions in both animals and plants. Our model can be applied immediately to evolutionary genetic studies. But it needs to consider the patterns of gene segregation and transmission in natural plant populations. Parameters characterizing population structure and organization, such as allele frequencies, linkage disequilibrium, and haplotype frequencies, should be incorporated into our mapping model. Lou* et al.* (2003) have recently derived a closed-form solution for precise and efficient estimates of haplotype frequencies. By incorporating Lou *et al.*'s model, we will be closer to unraveling the genetic basis of embryogenesis and organ development in natural animal and plant populations.

## Acknowledgments

This work is supported by an Outstanding Young Investigator Award of the National Natural Science Foundation of China (30128017), a University of Florida Research Opportunity Fund (02050259), and a University of South Florida Biodefense grant (7222061-12) to R.W. The publication of this manuscript was approved as Journal Series no. R-10064 by the Florida Agricultural Experiment Station.

## Footnotes

Communicating editor: J. B. Walsh

- Received November 8, 2003.
- Accepted March 15, 2004.

- Genetics Society of America