## Abstract

The use of high-throughput genomic techniques to map gene expression quantitative trait loci has spurred the development of path analysis approaches for predicting functional networks linking genes and natural trait variation. The goal of this study was to test whether potentially confounding factors, including effects of common environment and genes not included in path models, affect predictions of cause–effect relationships among traits generated by QTL path analyses. Structural equation modeling (SEM) was used to test simple QTL-trait networks under different regulatory scenarios involving direct and indirect effects. SEM identified the correct models under simple scenarios, but when common-environment effects were simulated in conjunction with direct QTL effects on traits, they were poorly distinguished from indirect effects, leading to false support for indirect models. Application of SEM to loblolly pine QTL data provided support for biologically plausible *a priori* hypotheses of QTL mechanisms affecting height and diameter growth. However, some biologically implausible models were also well supported. The results emphasize the need to include any available functional information, including predictions for genetic and environmental correlations, to develop plausible models if biologically useful trait network predictions are to be made.

THE emergence of genome technologies provides unprecedented opportunities to dissect the regulatory and functional networks by which genes affect phenotypes. The field of systems biology arising from these opportunities has been devoted primarily to understanding constitutive processes by investigating and modeling phenotypic effects of knockout mutants, but recent interests have expanded into understanding how natural genetic variation affects phenotypic variability (Feder and Mitchell-Olds 2003; Benfey and Mitchell-Olds 2008; Keurentjes *et al*. 2008; Koonin and Wolf 2008). Genetic mapping of quantitative trait loci (QTL) in segregating families has become a key tool for uncovering the genetic architecture of natural trait variation and ultimately identifying the underlying genes over the last two decades (Mackay 2001, 2004). However, the use of high-throughput techniques has expanded the definition of “quantitative trait” to encompass transcript levels assayed on a genomewide scale, leading to expression QTL (eQTL) mapping (Brem *et al*. 2002; Schadt *et al*. 2005; Keurentjes *et al*. 2007), and similar approaches are being used to map QTL associated with metabolomic variation (Keurentjes *et al*. 2006; Lisec *et al*. 2008).

Interest in incorporating eQTL data into a systems biology framework has spawned the development of new analytical approaches for predicting functional networks linking genes and traits using multiple-trait QTL data. Basic methods for combined analysis of traditional multiple-trait QTL, such as mapping QTL for eigenvectors defined by principal components analysis (Langlade *et al*. 2005; Brewer *et al*. 2007) and multiple-trait QTL analysis (Jiang and Zeng 1995; Rauh *et al*. 2002; Hall *et al*. 2006), may be useful for detecting pleiotropic (or closely linked) QTL and for providing insights into the nature of multiple-trait QTL effects, but yield little information for causal inference. One proposed method for identifying key regulators of gene expression variation uses average expression levels of gene regulatory networks identified *a priori* as composite traits for eQTL analysis (Kliebenstein *et al*. 2006). In another approach, expression patterns of genes located within eQTL regions shared among genes in known pathways are compared to those of the genes in the pathway to identify likely pathway regulators (Keurentjes *et al*. 2007).

In contrast with the above approaches, Doss *et al*. (2005) suggested using patterns of overall *vs*. residual correlations of gene expression in eQTL studies, which differ among tightly linked gene pairs with shared *vs*. independent regulatory mechanisms, in combination with path analysis to make causative predictions about gene regulation. Inference methods based on this reasoning have been proposed for *de novo* prediction of causal relationships in gene regulatory pathways (Zhu *et al*. 2004; Liu *et al*. 2008), between gene regulation processes and disease (Schadt *et al*. 2005), and among sets of functionally related complex morphological and metabolic phenotypes (Li *et al*. 2006). Various models of network regulatory structure, in which sets of QTL control variation in multiple traits directly or indirectly through cause–effect relationships, are tested with genetic marker and trait data to identify the models best supported by the data. Statistical approaches include Bayesian networks (Zhu *et al*. 2004), likelihood methods to evaluate conditional trait correlations for their support under alternative regulatory model configurations (Schadt *et al*. 2005; Neto *et al*. 2008), and structural equation modeling to compare the fit of different regulatory path models to the multiple-trait covariance structure (Li *et al*. 2006; Liu *et al*. 2008). The information from residual trait correlations used in these techniques is normally neglected in conventional QTL analysis, except for some multiple-trait QTL analysis techniques (Jiang and Zeng 1995) where it is used to increase the power to detect joint QTL but not for functional prediction.

Structural equation modeling (SEM) is well suited to causal network prediction because it can accommodate a large variety of model structures, including feedback relationships (Liu *et al*. 2008), and is adaptable to various modes of exploratory and confirmatory analyses of biological models (Stein *et al*. 2003; Li *et al*. 2006; Tonsor and Scheiner 2007; Liu *et al*. 2008). Moreover, SEM is supported by software packages such as SAS/STAT (SAS Institute 2002) and AMOS (Amos Development 2006). Path analysis has long been recognized as a potentially valuable but probably underutilized tool for testing causative rather than merely correlative relationships in quantitative genetic data (Wright 1921; Lynch and Walsh 1998). Nevertheless, some important issues must be considered in evaluating the reliability of SEM and other path analysis methodologies for inferring trait networks from QTL data. For one thing, cause–effect relationships among traits, environmental effects, and effects of other QTL can all generate within-genotype trait correlations (Stein *et al*. 2003; Doss *et al*. 2005; Schadt *et al*. 2005). Can path analysis correctly distinguish these potentially confounding effects? Second, how valid are the results of path analysis for exploring the likelihood of different model structures (*i.e*., different directions of cause–effect relationships among traits) rather than merely testing for the presence of particular connections within an established structure or estimating path coefficients for a particular model? Third, the true regulatory network shaping patterns of trait variation may be much more complex than the tested models, including feedback loops or unobserved traits and QTL (Schadt *et al*. 2005; Liu *et al*. 2008). Under these circumstances, will poor support for overly simple models provide reliable clues to the existence of a more complex network structure?

The goal of this study was to test the causal inferences from structural equation models under a set of relatively simple QTL regulatory scenarios. Specifically, I wanted to test whether models could correctly distinguish direct *vs*. indirect mechanisms of QTL effects on a pair of traits and test the effects of incomplete identification of QTL, common-environment (CE) effects, and direct regulation of unobserved traits on model predictions. To do this, I simulated multiple QTL data sets using a backcross design involving three QTL jointly affecting two traits in various direct and indirect modes of action. SEMs representing different models were tested for their ability to distinguish the correct functional scenario from incorrect scenarios in each case. For the sake of consistent comparisons, similar locations for each QTL and similar sets of effect sizes on the observed traits were simulated under all of the initial scenarios, a constraint that was later relaxed. Finally, I used SEM to evaluate data from a loblolly pine (*Pinus taeda* L.) QTL study of diameter and height growth for which biologically reasonable predictions could be made about models of joint genetic effects on the set of traits.

## MATERIALS AND METHODS

#### QTL simulations and detection:

All data sets were initially simulated using QTL Cartographer version 1.17 (Basten *et al*. 1994, 2004) and modified as needed to incorporate indirect QTL effects. The genome simulated using Rmap consisted of five chromosomes, each with eight markers spaced 15 cM apart starting at position 0 cM, generated using the Kosambi map function. Three QTL were simulated using Rqtl at the following locations: chromosome 1, 50 cM (Q1); chromosome 2, 40 cM (Q2); and chromosome 3, 60 cM (Q3). Thus, the first two simulated QTL were in intervals between markers and the third was at a marker position. Environmental variance was set to keep the total phenotypic variance close to 1.0 so all effect sizes would be approximately in phenotypic standard deviation units. Effects of all QTL were strictly additive, with simulated allele substitution effects (*a*) after all modifications on traits T1 and T2, respectively, of 0.7 and 0.5 for Q1 and 0.5 and 0.35 for Q2 and Q3. Thus, *a* for each QTL was uniformly greater on T1 than on T2 by a factor of , and the simulated variance explained by each QTL for T1 was very close to double that for T2. Backcross families of 500 progeny segregating for the genetic markers and three QTL in the two traits were simulated in QTL Cartographer using Rcross.

#### Direct-effects scenario:

Under a scenario of direct effects, the simulated QTL directly and independently affected both traits (Figure 1a). All marker and trait data were directly simulated in QTL Cartographer as described above.

#### Indirect-effects scenario:

Under the indirect-effects (or cause–effect) scenario, the simulated QTL directly affected only trait 1, which in turn affected trait 2 (Figure 1b). Only the data for *t*_{1} were simulated in QTL Cartographer with the effect sizes as described above for the basic model. Data for *t*_{2} were simulated using the equation(1)where *t*_{1i} and *t*_{2i} are the values for T1 and T2, respectively, in progeny *i*, and *z _{i}* is a standard normal random variate generated using the RANNOR command in SAS version 9 (SAS Institute 2002). Thus, half the variance in T2 is explained by the value of T1, yielding expected substitution effects for Q1, Q2, and Q3 of ∼0.5, 0.35, and 0.35, respectively, in T2, similar to the effect sizes under the direct-effects model. A new marker-trait input file was then created for Rcross that included the simulated values for both traits.

#### Latent-direct scenario:

In this scenario, the three QTL were assumed to have direct effects on the unobserved trait T0, which in turn had independent direct effects on T1 and T2 (Figure 1c). Progeny values for T0 were simulated in Rcross using substitution effects of 1.0, 0.7, and 0.7 for the three QTL. Progeny values for T1 and T2 were then simulated using the equations(2a)and(2b)with T0 explaining half of the variance in T1 and one-fourth of the variance in T2, where *z*_{1i} and *z*_{2i} are standard normal variates. This resulted in substitution effects for the three QTL on the two traits that were similar to those described above for the base model. The generated values for *t*_{1} and *t*_{2} were then used as input into Rcross.

#### Direct-effects, common-environment scenarios:

In these scenarios, the simulated QTL directly and independently affected the two traits in similar fashion to the direct-effects model, but a common-environment effect on the two traits was included (Figure 1d). To generate simulated data sets under this scenario, the two traits T1* and T2* were first simulated using Rqtl and Rcross, with Q1, Q2, and Q3 having substitution effects of 1.0, 0.7, and 0.7, respectively, on T1* and 0.7, 0.5, and 0.5 on T2*. Three vectors of the standard normal variates, **z _{C}**,

**z**, and

_{1}**z**, were then generated to represent common, T1-specific, and T2-specific environmental effects, respectively. Progeny values for T1 and T2 were then generated using the equations:(3a)and(3b)Values of

_{2}*b*and

*c*were chosen so that

*b*

^{2}+

*c*

^{2}= 0.5 and were varied in different simulated data sets to adjust the strength of the common-environment effect. Values of and

*c*= 0 were used to simulate strong environmental correlation, and and were used to generate a weaker environmental correlation. The resulting substitution effects of QTL on the two traits were thus similar to those of the base model. The values for T1 and T2 were then substituted for the original T1* and T2* values as input into QTL Cartographer.

#### Heterogeneous-effects, common-environment scenarios:

These scenarios were similar to the direct-effects, common-environment scenarios and data were generated using the same approach, except that the assumption of a homogeneous ratio of relative QTL effects on the two traits was relaxed (Figure 1e). Values for T1* and T2* were changed to result in substitution effects for Q1, Q2, and Q3 of 0.7, 0.5, and 0.3 for T1, and 0.5, 0.5, and 0.5 for T2. Both strong and weak environmental correlations were simulated in different data sets.

#### F_{2} direct-effects scenario:

A model with direct and independent effects on the two traits was also used to simulate an F_{2} family of 500 progeny. QTL locations and additive substitution effects on the two traits were identical to the basic model described above, and a dominance effect was also included such that the simulated *d*/*a* ratio for both traits was for Q1, 1.0 for Q2, and 0 for Q3.

#### QTL detection and structural equation modeling:

QTL effects and locations were estimated by maximum-likelihood composite interval mapping in QTL Cartographer, using model 6 in Zmapqtl. Prior to running Zmapqtl, background markers were fit using forward plus backward stepwise regression using a *P*-value threshold of 0.05 in SRmapqtl to control for background effects, including those of other QTL when conducting the QTL scans. In Zmapqtl, a 2-cM walking speed was used to scan the genome for QTL effects, a window size of 10 cM around each tested position was used to control for QTL effects outside the current marker interval, and up to five of the markers selected in SRmapqtl were used to control for background effects. For backcross data sets, an empirical genomewide α = 0.05 likelihood-ratio test statistic threshold of 11.28 was estimated from 1000 permutations of simulated data set 1 for the direct-effects scenario. For F_{2} data sets, an empirical threshold of 14.37 was similarly estimated from permutation tests of one of the two F_{2} simulations. Finally, multiple-trait composite interval mapping was done using model 6 in JZmapqtl to estimate joint QTL effects on the two traits. The positions with the peak multiple-trait likelihood ratios when comparing the full and reduced (*a* = 0) models were used as the QTL locations in the subsequent SEM analysis in place of the “true” simulated locations. If the detected multiple-trait likelihood peaks were >3 cM outside the range defined by the single-trait likelihood peaks identified using Zmapqtl, a location closer to the range of the detected single-trait peaks was chosen to represent the QTL.

Virtual markers were generated for the likelihood-ratio peak locations of each detected QTL. For backcross progeny sets, the virtual marker score was the probability of inheriting a donor parent allele from the heterozygous parent, given the flanking marker genotypes (coded as 1 for a heterozygote and 0 for a recurrent parent homozygote) and recombination frequencies with the respective markers. For F_{2} progeny, additive genotypic coefficients at flanking markers were scored as 1, 0, and −1 for parent 1 homozygotes, heterozygotes, and parent 2 homozygotes, respectively. Dominance coefficients were scored as 0.5 and −0.5 for heterozygotes and homozygotes, respectively. The additive and dominance scores for each virtual marker were calculated as the expected values of the additive and dominance genotypic coefficients at the estimated QTL position, given the flanking marker scores and recombination frequencies with each marker. The MultiRegress module in QTL Cartographer was used to calculate the additive and dominance scores for the F_{2} data sets.

SEM was conducted using PROC CALIS in SAS version 9 (SAS Institute 2002). SEMs were implemented by solving a set of linear equations for each tested model using maximum-likelihood estimation, with each equation describing a trait value (an endogenous variable) as a linear function of variables directly “upstream,” which could be virtual marker scores at QTL positions and/or another trait. In F_{2} designs, additive and dominance scores at each virtual marker were included as separate variables. Virtual markers for all QTL significant at the empirical threshold for at least one of the two traits were included as variables in the SEM. All simulated QTL were detected with genomewide significance for at least one, and generally both, traits. In all cases in which a simulated QTL lacked genomewide significance for a trait, a likelihood-ratio peak with at least comparisonwise significance was present. In several simulated data sets, significant effects were also detected at loci that did not correspond to simulated QTL locations, and these were also included in the models. The coefficients of the virtual marker scores in the linear equations represent the maximum-likelihood estimates of direct QTL effects on the traits. This approach differs from the mixture model maximum-likelihood estimation used for interval mapping in QTL Cartographer. However, in models in which the only explanatory variables were direct QTL effects on the observed traits, the estimated QTL effects from SEM were very similar to those obtained with QTL Cartographer. When common-environment effects were included in models, they were incorporated either as unobserved variables (known as latent variables) affecting multiple traits or as a covariance between the error terms for the traits, with both approaches producing the same results. The SEM methodology described above was implemented in PROC CALIS by using Lineqs statements.

Analyses of all data sets included: (1) a full model with direct effects of all three QTL on both T1 and T2 and indirect effects of T1 on T2 (Figure 1f); (2) a direct-effects model corresponding to Figure 1a; (3) an indirect-effects model corresponding to Figure 1b; and (4) a reversed indirect-effects model, corresponding to Figure 1b but with T2 downstream of T1. The full model under both backcross and F_{2} scenarios could not be tested because it was fully specified (*i.e*., had zero degrees of freedom), and thus the solutions resulted in a perfect fit to the covariance structure of the marker-trait data (see also Neto *et al*. 2008). Alternative fully specified models with the same sets of observed (manifest) variables but different path structures also have a statistically equivalent perfect fit to the data. Including the full model in the analyses, however, provided unique estimates of the relative contributions of the direct and indirect paths to the values of T2, given the model structure. A model corresponding to the direct-effects, common-environment scenario (Figure 1d) could not be tested under the simulated conditions. This model has no degrees of freedom if all QTL are included as effects for both traits, and thus the maximum-likelihood solutions perfectly fit the data as in the full model. Models corresponding to the latent-direct scenario (Figure 1c, with the unobserved T0 included in the model as a latent variable) also could not be validly tested. Due to the path structure in which T0 is directly connected to all explanatory QTL and observed traits, the latent variable can be fit optimally to be highly correlated to the upstream trait in an indirect scenario or to characterize the correlated QTL effects on the two traits in a direct scenario. Consequently, a latent-direct model fit the data better than both the direct and indirect models under all of the simulated scenarios and was uninformative about the true model.

Structural equation models were evaluated and compared using three criteria.

The likelihood-ratio (LR) test statistic (LRT) is −2 times the log-likelihood ratio for the data under the tested model

*vs*. a fully specified model. All tested models are nested within a fully specified model (either the full model described above or a statistically equivalent model with a different path structure), so the LR test statistic has an asymptotic χ^{2}distribution based on the number of degrees of freedom in the tested model. A significant*P*-value (*P*< 0.05) represents a failure of the tested model to explain the covariance structure of the data fully (*i.e*., insufficiency of the model), and insignificant*P*-values identify models that adequately fit the data (Mitchell 1992).Standard and adjusted goodness-of-fit indices (GFI and AGFI, respectively) estimate model fit on the basis of the residual errors, with the latter adjusted for degrees of freedom. Thus, GFI may be considered analogous to the

*R*^{2}in a linear model and AGFI analogous to an adjusted*R*^{2}. While both measures should produce values between 0 and 1, poorly fitting models with small degrees of freedom may result in negative AGFI (PROC CALIS documentation in SAS Institute 2002).Akaike's information criterion (AIC; Akaike 1974, 1987) reduces the LRT statistic by twice the number of degrees of freedom in the model, allowing comparison of non-nested models. The model with the lowest AIC is favored.

#### Loblolly pine QTL data collection and analysis:

Year 8 height and diameter data were collected from 146 surviving selfed progeny of a single loblolly pine parental clone from a previously reported genetic mapping study of inbreeding depression; the progeny were growing in a South Carolina Forestry Commission field site near Tillman, South Carolina (Remington and O'Malley 2000a,b). Malformed trees for which diameters and heights could not be measured conventionally were assigned heights of 1.0 m and diameters of 1.0 cm when severely dwarfed, and heights of 2.0 m and diameters of 3.0 cm for larger bent-over or multiple-stemmed trees, which the field measurement crew determined to be reasonable estimates.

QTL mapping was done using the 8-year field data, previously scored AFLP marker data (Remington and O'Malley 2000a,b), and a genetic map developed from outcross progeny of the same parental clone (Remington *et al*. 1999). Composite interval mapping was performed using QTL Cartographer as described above for simulated data. Trait data for year 3 height from the study of Remington and O'Malley (2000a) were also included, and height growth between year 3 and year 8 (years 3–8 growth) was calculated from the difference between the year 8 and the year 3 height measurements. Permutation tests on year 8 height data established an extremely high empirical genomewide α = 0.05 LR test statistic threshold of 55.76 for year 8 height, probably because the trait data were bimodally distributed due to very large phenotypic effects of the QTL associated with the *cad* locus in year 8. Therefore, for all traits I used the empirical genomewide α = 0.05 LR test statistic threshold of 16.42 estimated for earlier height growth measurements in Remington and O'Malley (2000a). I also included two suggestive QTL peaks for year 3 height with lower test statistic values because they corresponded to significant QTL regions for annual growth in either year 2 or year 3. For the purposes of this study, using this lower threshold is actually conservative, since not including all true QTL effects in the model introduces residual trait correlations that lead to bias in the SEM results. Virtual markers were developed for single-trait or multiple-trait QTL likelihood peak locations estimated from Zmapqtl or JZMapqtl and evaluated by SEM in SAS PROC CALIS as described above for simulated data.

In evaluating SEMs for year 3 height and years 3–8 growth and for year 8 height and diameter, QTL regions were included as direct effects for each trait unless there was no evidence at all of contribution to one of the traits (*i.e*., no likelihood ratio peak in the region exceeding the nominal comparison-wise significance level of 5.99 and with homozygous effects in the same direction as those of the other traits). Excluding noncontributing QTL effects provided additional degrees of freedom that allowed testing of models that included various combinations of direct, indirect, and correlated-environment effects on traits. This opportunity did not exist with the simulated data, since in almost every case all QTL were either significant at a genomewide level or suggestive at a comparison-wise level for both traits. In models with indirect effects, QTL regions contributing only to the downstream trait were included as effects on that trait.

## RESULTS

#### Backcross simulations and SEM testing:

Two simulated backcross data sets were tested against SEMs for each causal scenario to test the hypotheses (a) that the correct model would have the best fit to the simulated data sets using AIC as a criterion and (b) that the correct model would adequately explain the simulated data (likelihood ratio *P*-value ≥0.05). All simulations consisted of three QTL (Q1, Q2, and Q3) affecting two observed traits (T1 and T2). The three QTL were simulated to have homogeneous effects ratios on the two traits except as noted. Simulation results are summarized in Table 1.

#### Direct-effects scenario:

The first pair of simulations was generated under the direct scenario (Figure 1a), in which the three QTL each have direct effects on both T1 and T2. In both simulated data sets, the direct model correctly outperformed the indirect model in which T2 was downstream of T1 (Figure 1b), with the two simulations resulting in AIC values of 2.7627 *vs*. 18.6995 and 3.6130 *vs*. 16.7255, respectively, for the direct *vs*. indirect models. The direct model outperformed the reversed indirect model (T1 downstream of T2) by even larger margins. In both simulations, however, the direct model was weakly rejected as a sufficient explanation for the data (LRT *P*-values of 0.029 and 0.018, respectively, and modestly positive AIC values relative to a fully specified model) while the indirect model was strongly rejected (*P* < 0.0001). In the full model in which T2 was affected by the QTL via both direct and indirect paths, a substantial proportion of the effect on T2 (0.165 and 0.196, respectively) was inferred to be indirect and exerted through T1.

#### Single QTL in SEM:

I also tested the first of the two direct model simulations with three SEMs that each included only one of the predicted QTL. Not including all true QTL effects in the model can be expected to result in residual correlations between the traits, confounding the “signal” used by SEM to distinguish direct and indirect models. Consistent with this prediction, the direct model was strongly rejected and the indirect model had higher AIC values in each case when only one of the three QTL was included (Table 2). In one case, when only Q3 was included in the model, the indirect model was found to fit the data adequately (*P* = 0.1253).

#### Indirect-effects scenario:

When simulations were done under the indirect-effects scenario (Figure 1b), the indirect-effects model adequately fit the data, with insignificant *P*-values and negative AIC values, and correctly performed far better than either the direct model or the indirect model with the trait effects reversed from the simulated scenario (T2 upstream of T1). In the full model, the proportion of the effect on T2 inferred to be via the indirect path was >0.98 in both simulations.

#### Latent-direct scenario:

A more complex but potentially realistic scenario was represented by a pair of simulations in which the QTL affected an unobserved trait, which in turn had independent direct effects on T1 and T2 (Figure 1c). The latent-direct scenario could correspond to a situation in which genetic variation in the amount or activity of a transcriptional regulator affects expression of downstream genes or other observed traits or any other situation in which the direct QTL effects are on a trait that is not readily observed. In these simulations, both the direct and indirect models were rejected, with the direct model showing a much poorer fit than the indirect model with T2 downstream but better than the reversed indirect model with T1 downstream. In the full model, proportions of the effects on T2 inferred to come through the indirect pathway were 0.70 and 0.84, respectively, in the two simulations.

#### Common-environment direct scenario:

The next scenario was one in which T1 and T2 were directly affected by the QTL as in the direct scenario, but in which common-environment effects on T1 and T2 were included (CE direct scenario; Figure 1d). This corresponds to a highly realistic situation in which an environmental variable such as nutrient availability may jointly affect multiple aspects of growth and development. In the first of these simulations, common-environment effects were simulated to be strong, explaining 50% of the variance in T1 and T2. Under this scenario, the indirect model not only was incorrectly favored but also adequately fit the data, while the direct model and the reversed indirect model were strongly rejected. In the full model, essentially all of the effect on T2 was inferred to come through the indirect pathway. In a second simulation, in which the common-environment effect explained only 15% of the phenotypic variance in both traits, all three of these models were strongly rejected but the indirect model showed by far the best overall fit. In the full model, the proportion of the effect on T2 attributed to the indirect path was 0.73.

#### CE direct scenario with relaxed homogeneity assumptions:

There is no particular reason to expect multiple QTL to have a homogeneous ratio of effects on multiple traits under a direct-effects scenario. Thus, I generated additional data sets under the common-environment effects scenario in which the relative effects of the three QTL on the two traits were different to test whether the direct model would perform better than the indirect model under these relaxed assumptions. Under the CE heterogeneous-direct scenarios, Q1, Q2, and Q3 had larger, equal, and smaller effects, respectively, on T1 relative to T2. Two data sets were simulated under this scenario—one with strong correlated-environment effects and the other with weaker correlated effects.

Under both CE heterogeneous-direct scenarios, the fit of the direct model and of both indirect models were all strongly rejected (Table 1). When the correlated-environment effects were strong, the indirect model performed best by AIC and adjusted goodness-of-fit criteria, followed in order by the indirect-reversed and direct models. Under the full model, ∼0.92 of the effects on T2 were attributed to the indirect pathway. When the correlated-environment effects were weaker, the direct model outperformed the other models using AIC as the criterion, but the indirect model had the highest adjusted goodness of fit. Under the full model, more than two-thirds of the effect on T2 was attributed to the direct pathway.

#### F_{2} simulations under direct scenario:

I also tested SEM performance on F_{2} progeny sets simulated under the direct scenario. The additive effects of the three QTL were simulated to be the same as those in the backcross simulations. Dominance effects were also simulated, with *d/a* ratios differing among QTL but homogeneous between the two traits. The F_{2} SEMs included independent additive and dominance effects for each detected QTL. The SEM results were similar to those obtained for the direct scenario in the backcross simulations, with the direct model strongly outperforming the indirect models but still failing as a sufficient explanation of the data (Table 3). Under the full model, the proportions of the effects on T2 attributed to the indirect pathway were 0.11 and 0.13, respectively, in the two simulations, slightly less than in the direct-scenario backcross simulations.

#### SEM predictions with loblolly pine growth QTL data:

The F_{2} models were tested on data from loblolly pine (*P. taeda* L.) progeny of a self-pollinated parent, originally planted as part of a QTL study of inbreeding depression (Remington and O'Malley 2000a,b). Surviving 8-year-old trees were scored for height and breast-height diameter. The F_{2} progeny had previously been scored for a set of mapped AFLP markers covering the loblolly pine genome (Remington *et al*. 1999; Remington and O'Malley 2000b) and for annual height growth through the first 3 years after germination (Remington and O'Malley 2000a). Applying SEM to year 3 and year 8 QTL data provided an opportunity to address questions about the genetic basis for juvenile-mature correlations and plant allometry, topics of interest to plant biologists and tree breeders. One hypothesis was that QTL affecting year 3 height would continue to affect subsequent height growth between years 3 and 8 (years 3–8 growth). A prediction of this hypothesis is that year 3 height QTL would also have direct effects on years 3–8 growth. However, year 3 height QTL might also be expected to continue to affect subsequent growth indirectly through a carryover effect of the growth trajectory established in individual plants through year 3. A second hypothesis was that QTL effects on year 8 diameter would largely be indirect effects of genetic variation for year 8 height rather than vice versa, under the assumption that diameter growth in trees is controlled largely as an allometric response to height growth. A height-diameter allometric exponent close to 1 (*i.e*., a linear relationship between height and diameter) has been reported for relatively young trees such as those in this study, so indirect effects of tree-height QTL on diameter would be expected to be more or less linear (Niklas 1995). Common-environment effects would also be expected to generate positive correlations between traits in both the years 3–8 and height-diameter analyses.

Four QTL regions that exceeded the likelihood ratio test threshold of 16.42 were identified for eight-year height or diameter. Two QTL regions, LG1-14 and LG8-88, were significant for both year 8 height and diameter. The LG8-88 region corresponds to the location of the segregating *cad-n1* mutation, which results in production of chemically altered lignin and reduced wood stiffness, apparently interfering with the trees' ability to maintain upright form and sustained height growth (Mackay *et al*. 1997; Ralph *et al*. 1997; Remington and O'Malley 2000a). A third QTL region (LG9-08) was significant for year 8 height and strongly suggestive for diameter (LRT = 16.24). Each of these regions had large additive effects with substantial overdominance for both traits. A fourth QTL region (LG1-80) was significant for year 8 height, with large symmetrically overdominant effects, but had no apparent effect on diameter (Table 4). The height effect associated with LG1-80 may be an artifact of extreme segregation distortion, as it is flanked by near-lethal recessive alleles linked in repulsion and has substantial deficits of both homozygous genotypes, but this locus was included in the SEMs anyway to account for all prospective QTL effects. The LG8-88 and LG1-14 QTL regions had similar effects on height at earlier ages (Remington and O'Malley 2000a). The LG9-08 region was also suggestive of effects on year 3 growth, with a peak LRT value of 9.18 and additive and dominance effects in the same direction as in year 8. Two QTL regions on linkage groups 12 (LG12-116) and 3 (LG3-03), which were significant for year 2 and year 3 growth, respectively, had nearly significant effects on total year 3 height but showed no evidence of effects in year 8. QTL effects for years 3–8 growth were very similar to those for year 8 height (Table 5).

In fitting SEMs for year 3 height and years 3–8 growth, only QTL with significant or suggestive effects were included as direct effects; thus, LG12-116 and LG3-03 were included only in direct effects for year 3 height, and LG1-80 was included only for years 3–8 growth. Similarly, in comparing year 8 height and diameter, LG1-80 was included only in direct effects for year 8 height. In both cases, QTL regions with direct effects only on the downstream trait were included in models with indirect effects. The additional degrees of freedom provided by excluding noncontributing QTL allowed testing of models that included various combinations of direct, indirect, and correlated-environment effects on traits.

Effects on year 3 height and years 3–8 growth were poorly explained both by models with only direct QTL effects and with only indirect effects of year 3 height on years 3–8 growth (Table 6). A model with both direct and indirect effects on years 3–8 growth (direct + indirect model) adequately fit the data and had the lowest AIC value. Adding environmental correlation to the direct + indirect model only marginally improved model fit but this direct + indirect + CE model had a higher AIC value than the simpler direct + indirect model. A model with direct effects and environmental correlation (direct + CE model) was marginally significant for lack of fit (*P* = 0.0344) and had slightly poorer fit and a slightly higher AIC value than the full model, suggesting that adding indirect effects may explain a small amount of effect not explained by direct and common-environment effects alone but not vice versa. Even with the additional degrees of freedom available with this data set, however, there is very little information to distinguish whether indirect effects or common-environment effects (or both) contribute to the relationship between the two traits.

To evaluate the utility of SEM for distinguishing between different causal structures, I also tested a set of biologically implausible models in which year 3 height was a downstream effect of years 3–8 growth. One implausible model with year 3 height QTL explained only as an indirect effect of years 3–8 growth, and another model with both direct and indirect effects on year 3 height, had only marginally significant lack of fit to the data (*P* = 0.0169 and *P* = 0.0256, respectively); the former model had an AIC value within 1.0 of the best-fitting plausible direct + indirect model.

By contrast, effects on year 8 height and diameter were adequately explained by a model with only indirect effects of year 8 height on year 8 diameter (Table 7). Models that included direct QTL effects on both traits in combination with indirect effects on either height or diameter or correlated-environment effects also adequately fit the data, but had AIC values that were >4.0 larger than the indirect-only model due to the additional degrees of freedom used in fitting the model. Models with only direct QTL effects on both traits and with height explained only by diameter failed to fit the data and had much higher AIC values. Thus, the data were consistent with the prediction that diameter growth was largely an allometric response to height growth.

## DISCUSSION

For this study, I used simulations to examine the ability of structural equation models to distinguish between simple but realistic genetic regulatory scenarios for multiple-trait QTL mapping data, including those with common-environment effects and incomplete QTL information, and to examine the sensitivity of SEM to incorrect or incomplete model structures. The consistent results from the replicate simulations in each case suggest that the patterns that they reveal are valid and that their implications should be taken into account when interpreting the results of QTL-based causal inference studies.

The correct models were strongly favored using *P*-value and AIC criteria when data sets were simulated under direct and indirect scenarios without confounding factors and all QTL were included in the model. However, the direct model provided only marginal fit to the simulated data when it was the correct model, which is discussed in more detail below. The simulated QTL effects were relatively small, with Q2 and Q3 explaining only ∼3% of the phenotypic variance for T2, the trait with smaller QTL effects, in the backcross design. Success in identifying the correct model in these circumstances is consistent with the results of a larger set of simulations by Schadt *et al*. (2005), who found that the power to detect the correct model under similar circumstances was ∼80% when QTL explained as little as 1–2% of the variation in the less-affected trait.

#### Confounding effects of incomplete QTL and trait detection:

These simulations clearly show that not including all relevant QTL in the models can lead to incorrect functional predictions with multiple-trait QTL data. Much of the “signal” for detecting indirect QTL effects on traits in a network, whether the traits are gene expression or functional phenotypes, comes from the degree and pattern of residual correlations between trait values within QTL genotypic classes (Stein *et al*. 2003; Doss *et al*. 2005; Schadt *et al*. 2005). In the absence of common-environment effects, only the genetic component of trait variation in a pair of traits should be correlated if a set of genes affects both traits independently. When QTL effects on one trait are indirect effects of variation in an upstream trait, however, both genetic and environmental effects on the upstream trait will affect the downstream trait in the same manner, leading to similar correlations within and among QTL classes. If all QTL that jointly affect a set of traits are not included in the model, effects of the unobserved or excluded QTL will also result in residual correlations between traits. These residual correlations may generate false support for indirect models, with the trait having larger standardized QTL effects appearing to be upstream. Thus, analyses that test only a single QTL region for effects on a suite of polygenic traits are virtually guaranteed to place the traits in a common regulatory pathway, whether such a pathway exists or not. Even when all experimentally detected QTL regions are included in a model, some actual QTL are likely to be missed due to limited experimental power (Beavis 1994) and lead to correlated residuals. Testing of direct *vs*. indirect QTL path models has been suggested as a test of close linkage *vs*. pleiotropy with shared QTL regions, since the former are by definition independently regulated (Doss *et al*. 2005; Schadt *et al*. 2005). For this test to identify instances of close linkage effectively, however, these results show that all other QTL regions that jointly affect the two traits must be included in the model.

Unobserved traits can also affect model predictions if they occur at branching points leading to observed traits, as in the latent-direct scenario. In the two latent-direct simulations, neither the direct nor the indirect model was well supported, providing some indication that the true model was more complex, but these results are not distinguishable from those expected if the true scenario were a combination of direct and indirect effects.

#### Distinguishing indirect *vs*. common-environment effects:

These results show that shared environmental effects on a pair of traits may be difficult to distinguish from indirect effects. Models combining direct effects with either indirect effects or with common-environment effects on a pair of traits cannot be distinguished if all QTL affect both traits; both will lack residual degrees of freedom and have perfect fit. It should be noted that similar indistinguishable submodels could potentially be nested within more complex trait networks even when the overall model has residual degrees of freedom. Even if both model structures can be tested, however, as with the loblolly pine years 3–8 analyses, there may be little information to distinguish common-environment effects from indirect effects because both mechanisms will generate trait correlations within QTL genotype classes. When common-environment effects were included in simulations of direct-effects scenarios, indirect models were more strongly supported in most cases. Genes with smaller standardized genetic effects tended to be placed downstream of those with larger genetic effects even when their genetic regulation by shared QTL was independent.

In biologically realistic situations, shared environmental effects are probably common because each individual organism in a genetic mapping population is shaped by the unique environment that it has experienced and by the peculiarities of developmental “noise.” These environmental and developmental effects are likely to have correlated effects on multiple traits (Lynch and Walsh 1998). The influence of common environment on residual trait correlations is well recognized (Stein *et al*. 2003; Doss *et al*. 2005; Schadt *et al*. 2005), but the question of whether or not path models can distinguish environmental correlations from causative indirect effects has received little scrutiny.

The simulations presented here probably represent some of the more challenging scenarios for distinguishing direct, indirect, and common-environment effects. In contrast with these simulations, there is no expectation that multiple QTL acting independently on a set of traits would have homogeneous effects ratios, or even the same direction of effect, on the traits. For example, five QTL regions were found to affect both corolla length and width in a cross between perennial and annual populations of the monkeyflower *Mimulus guttatus*, with ratios of standardized effects on the two traits ranging from 0.40 to 2.25 (Hall *et al*. 2006). In the same study, one QTL region with joint effects on corolla tube length and reproductive allocation—traits that lack an obvious mechanism for a cause–effect relationship—affected both traits in the same direction while a second affected the two traits in opposite directions (Hall *et al*. 2006). In the loblolly pine data set examined in this study, only three of six QTL regions for year 3 height and years 3–8 growth had detectable effects on both traits, and for these three regions the ratio of standardized homozygous effects on the two traits ranged from 0.44 to 1.09. By contrast, three QTL regions affecting year 8 height and diameter (ignoring LG1-80, which had negligible homozygous effects) had standardized effect ratios in a much narrower range of 1.16–1.37, consistent with the SEM inference of indirect effects on diameter. When heterogeneous QTL effects were simulated, the false support for the indirect model relative to the direct model due to common-environment effects was weakened or even reversed. Moreover, genetic and environmental correlations may be in opposite directions in actual trait networks, unlike those in our simulations. For example, traits involved in life-history trade-offs are likely to have negative genetic correlations, but shared environmental effects (*e.g*., variation in nutritional status) may generate positive correlations. In such situations, confounding of common-environment effects with indirect effects would be unlikely, and in some cases SEM results might even be biased against indirect models.

#### Sensitivity of the direct models:

In contrast with the effects of confounding factors that favor indirect models, these data show that direct models are highly sensitive to any residual correlation between trait values. In all four simulations under simple direct-effects scenarios (two backcross and two F_{2} simulations), the correct direct model was rejected as an adequate explanation of the data (LR *P*-values <0.05) although it outperformed the indirect model. Concomitantly, the full models that included both direct and indirect effects estimated as much as 20% of the effect on T2 to occur via the indirect pathway under the direct-effects scenarios, even though none of the actual effect was indirect. The basis for this bias toward indirect effects is uncertain. Imprecise estimates of the true QTL locations is one possibility, although replacing the estimated QTL locations with the actual simulated locations in one of the simulations did not improve the fit (data not shown). Imprecise estimates of the true QTL effects will result in under- or overfitting of the model to the true sources of effects and could also result in residual correlations. It is also possible that the residual values generated in the QTL Cartographer simulations were not entirely random.

Regardless of the explanation for the lack of complete model fit in the simulated data, actual data from directly regulated trait sets can generally be expected to show even less independence of residuals due to the pervasiveness of common-environment effects and undetected QTL discussed above. Given these realities, standard tests of model fit are inappropriate for evaluating the sufficiency of direct-effects models. Empirical tests will be necessary but not easy to implement, as permutation tests would have to randomize trait values within QTL classes and handling recombinants will present special problems. Parametric bootstrap simulations may be a better option, but the algorithms required to generate hundreds of simulated data sets, estimate QTL locations and virtual marker scores, and run multiple SEMs on each simulation will be complex. Effects of unobserved QTL could be incorporated as polygenic effects in a parametric bootstrap. Crude but potentially useful estimates of polygenic variance could be obtained from the percentage of differences between parental lines not explained by the identified QTL, especially if the mapping population were derived from divergent parental lines.

In spite of the factors causing bias against the direct model, direct effects were identified as the largest contributor to QTL effects in the experimental data on loblolly pine years 3–8 growth. These results were consistent with *a priori* predictions of the carryover effects of earlier QTL action and possibly common-environment effect as additional contributors to years 3–8 growth. In contrast, QTL effects on year 8 diameter were fully explained as an indirect effect of year 8 height, which is also consistent with *a priori* predictions based on allometry theory.

#### Limitations of SEM for exploratory analysis:

Results of this study underscore the need for caution in using SEM in a purely exploratory mode in the absence of functional hypotheses to guide the choice of model structure. While SEM analyzes causal models of the covariance structure between traits, the adequate fit of a model does not by itself imply causation (Mitchell 1992; Grace and Pugesek 1998). In data sets simulated under an indirect scenario, reversing the cause–effect relationship between the two traits resulted in a substantially poorer model fit, as expected. With the loblolly pine years 3–8 height data, however, a biologically implausible but simple model in which years 3–8 growth alone explained year 3 height QTL fit almost as well as the best-fitting plausible model, which required both direct and indirect QTL effects to explain years 3–8 growth, and fit better than any of the other plausible models tested. The most likely explanation is that the standardized effect of QTL on years 3–8 growth is much larger than on year 3 height due to the very large effect of the LG8-88 QTL on years 3–8 growth. In this example, the temporal relationship between the traits made the distinction between plausible *vs*. implausible model structures obvious, but this will not be the case in other situations. For example, in explaining genetic correlations between morphological and physiological traits, reasonable biological models might involve direct or indirect genetic effects on either set of traits. It is therefore important to include as much prior biological information as possible to identify plausible networks, such as prior information on coregulated sets of genes or structure of known regulatory pathways in eQTL analyses (Kliebenstein *et al*. 2006; Keurentjes *et al*. 2007; Zhu *et al*. 2008), developmental models describing temporal and functional relationships among morphological and physiological traits (Mündermann *et al*. 2005), and the expected direction of correlations of common-environment effects. In this context, SEM provides a powerful tool for testing the adequacy of causal hypotheses (Mitchell 1992). Purely exploratory analyses may be useful in systems-biology applications for reducing the set of model structures for further experimentation (or “sparsification,” per Liu *et al*. 2008), but the results of such analyses should not be interpreted as support for particular causal structures.

#### Summary:

Results of this study confirm that the residual correlation structure in multiple-trait QTL data, which is typically unused in QTL analyses, provides a powerful but limited source of information on the functional basis of trait variation. Factors likely to be pervasive in real biological data, including common-environment effects and incomplete identification of QTL, can masquerade as indirect causal effects and lead to incorrect conclusions if not carefully considered. Applications of SEM or other approaches to path analysis for functional predictions from QTL data need to incorporate as much *a priori* information as possible on the traits being studied, including the expected patterns of common-environment effects on traits, if the predictions are to be biologically useful.

Further studies should be conducted to test the predictive power of SEM under scenarios such as the ones used here. Other studies have used large numbers of simulations to examine the power of network modeling approaches (Schadt *et al*. 2005; Liu *et al*. 2008; Neto *et al*. 2008), but have not explored the questions addressed in this study. Issues with statistically indistinguishable models incorporating common-environment *vs*. indirect effects need to be examined in more complex trait networks, where untestable, fully specified modules may be nested within larger models. Future research needs also include evaluating different patterns of direct genetic and common-environment effects, developing methods for empirical statistical tests, and incorporating epistatic QTL effects.

## Acknowledgments

I thank Ross Whetten and Elizabeth Lacey for assistance with year 8 data collection from the Tillman, South Carolina, loblolly pine genetic mapping population; Rongling Wu for first suggesting to me the use of path analysis approaches for functional prediction from QTL data; Elizabeth Lacey, Sergey Nuzhdin, and David Aylor for valuable discussions; and Malcolm Schug, Rebecca Doerge, and two anonymous reviewers for helpful comments that improved the manuscript.

## Footnotes

Communicating editor: R. W. Doerge

- Received June 15, 2008.
- Accepted December 31, 2008.

- Copyright © 2009 by the Genetics Society of America