## Abstract

Developmental instability or noise, defined as the phenotypic imprecision of an organism in the face of internal or external stochastic disturbances, has been thought to play an important role in shaping evolutionary processes and patterns. The genetic studies of developmental instability have been based on fluctuating asymmetry (FA) that measures random differences between the left and the right sides of bilateral traits. In this article, we frame an experimental design characterized by a spatial autocorrelation structure for determining the genetic control of developmental instability for those traits that cannot be bilaterally measured. This design allows the residual environmental variance of a quantitative trait to be dissolved into two components due to permanent and random environmental factors. The degree of developmental instability is quantified by the relative proportion of the random residual variance to the total residual variance. We formulate a mixture model to estimate and test the genetic effects of quantitative trait loci (QTL) on the developmental instability of the trait. The genetic parameters including the QTL position, the QTL effects, and spatial autocorrelations are estimated by implementing the EM algorithm within the mixture model framework. Simulation studies were performed to investigate the statistical behavior of the model. A live example for poplar trees was used to map the QTL that control root length growth and its developmental instability from cuttings in water culture.

EVERY live organism in the course of evolution is intricately equipped with developmental stability or canalization (Waddington 1940) through collective mechanisms that buffer against the stochastic perturbations arising spontaneously from the cellular processes involved in the development of morphological structures (Polak 2003). However, when stochastic perturbations of either environmental or genetic origin are beyond the capacity of the organism to produce a consistent phenotype, the organism will be forced to display some degree of developmental instability, manifested as the imprecision of developmental pathways and final morphological phenotypes (Waddington 1957; Zakharov 1992; Palmer 1994). Indeed, developmental instability embodies variation around the expected (target) phenotype that should be produced by a specific genotype in a given environment, and the occurrence of developmental instability is due to small random errors accruing in development even when genetic and environmental conditions are kept constant (Klingenberg 2004).

In general, developmental instability produces a subtle difference in each step of development. But increasingly more evidence has been observed that the accumulation of these minor differences may have played an important role in the ultimate formation and evolution of a complex trait (reviewed in Leamy and Klingenberg 2005). In nature, developmental instability may negatively affect the fitness of a biological organism (Badyaev *et al*. 2000) and the yield of an economic trait and its components, such as seed size, seed number and photosynthetic rate (Souza *et al*. 2005), through the investment of extra energy to buffer against various environmental fluctuations that are internal and external to an organism. A widely accepted view is that developmental instability will be higher in the more stressed populations compared to the control or unstressed populations (Pankakoski *et al*. 1992; Graham *et al*. 2000; Pertoldi *et al*. 2006). Given the fundamental importance of developmental instability, it is essential for understanding its genetic causes and consequences (Polak 2003; Leamy and Klingenberg 2005) and further exploring how it responds to natural or artificial selection within an evolutionary and ecological context (Clarke 1998).

Developmental instability is quantified by the amount of variation among phenotypes that would be produced by the same developmental blueprint under identical genetic and environmental conditions (Klingenberg 2004). In organisms like animals that display a bilateral symmetry, developmental instability is measured as fluctuating asymmetry (FA) that is due to random differences between left and right sides. Although FA is considered to be purely environmental in origin, it may also be under genetic control (Leamy 1997; Markow and Clarke 1997; Palmer 2000; Fuller and Houle 2003). Empirical studies suggest that the heritability of FA is low (Pelabon *et al*. 2004), but in many cases it is significant, as observed in Scheiner *et al*. (1991) and demonstrated by a meta-analysis of Møller and Thornhill (1997a,b), although there is a controversy on this issue (Whitlock and Fowler 1997). Recent quantitative trait locus (QTL) mapping approaches (Lander and Botstein 1989; Lynch and Walsh 1998) have been performed to identify specific loci responsible for the variation of FA in mice (Leamy *et al*. 1998, 2002). These mapping studies allowed Leamy and Klingenberg (2005) to conjecture the nonadditive genetic architecture of FA composed of intralocus (dominance) and interlocus interactions (epistasis).

Plants, as organisms with modular construction, are very suitable subjects for detecting developmental instability caused by environmental disturbance. The analysis of the asymmetry of plant structural traits can be used to determine deviations from the basic structural pattern, which is a measure of plant developmental instability. However, for important traits such as stemwood growth in forest trees and grain yield in crops, it is not possible to measure such asymmetry. Different from conventional FA measures, developmental instability for these traits can be measured by growing individual plants of the same genotype in a microsite with clonal replicates or recombinant inbred lines. Variation among phenotypes of different individuals within a clonal genotype under a similar condition is thought to stem from developmental instability or noise.

In this article, we propose an experimental design based on genotypic replicates in space to map and estimate the genetic effects of QTL on the developmental instability of a quantitative trait. A mixture model is constructed to separate different QTL genotypes in terms of observed marker information (Lander and Botstein 1989). The autoregressive model interpreted on a spatial scale is used to model the structure of the residual variance matrix (Cressie 1991). It assumes that the residual correlation between any two different copies of the same progeny genotype decays exponentially with the physical distance between these two replicates in the field. Also, the residual variance is postulated to be composed of two components due to permanent and random environmental factors. The random residual variance due to stochastic independent errors reflects the degree of developmental instability. The genetic control of developmental instability can be determined by testing the difference in the random residual variance among QTL genotypes inferred from a molecular linkage map. The statistical model is constructed within the context of maximum likelihood and implemented with the EM algorithm (Dempster *et al*. 1977). Simulation studies have been performed to investigate the statistical behavior of the model. We used a real example in poplars to validate the usefulness of the model.

## THE MODEL

#### Experimental design:

Consider a simple backcross design in which *n* progeny are segregating in a 1:1 ratio at each locus. A genetic linkage map, aimed to identify segregating quantitative trait loci (QTL), is constructed with polymorphic markers genotyped through the genome. Each backcross progeny is replicated with clones, recombinant inbred lines, or isogenic lines and planted in a randomized complete design. There are multiple replicates for each progeny planted in a plot. The number of replicates (*R*) can be small or large, depending on the availability of materials. The shape of a plot can be a triangle, a rectangle, a square, and so on. Without loss of generality, let each progeny have four copies laid out in a square plot with a loop at a spacing of *d* × *d* m. Thus, the physical distances between any two plants can be expressed as(1)For other layouts, between-plant distances can always be calculated as long as the geometric shape of the plot and the number of copies are known.

#### Linear model:

The phenotypic value of a quantitative trait, *y _{ij}*, for progeny

*i*at its

*j*th replicate in a plot is described by a linear model(2)where

*c*is the genotypic value of progeny

_{i}*i*and

*e*is the residual (or environmental) effect,

_{j}*e*∼

_{ij}*N*(0, σ

^{2}).

Suppose there is a putative QTL in the backcross, with two genotypes, *Qq* (coded by 1) and *qq* (coded by 0), involved in the control of the trait. The genotypic value of progeny *i* can be partitioned into two components, *i.e*., the genotypic value (μ_{h}) of QTL genotype *h* (*h* = 1, 0) and the genetic effect (*c _{i}*

_{|h}) due to other loci rather than the QTL under consideration. Because of the replicates of a progeny, the residual effect is partitioned into permanent (

*p*) and random environmental effect (ϵ

_{i}_{ij}). Thus, Equation 2 is written as(3)where ξ

_{h}is the indicator variable for QTL genotypes defined as 1 for a considered QTL genotype and 0 otherwise, μ

_{h}is assumed to be a fixed effect, and

*c*

_{i}_{|h},

*p*, and ϵ

_{i}_{ij}are assumed to be the random effects, with

*c*

_{i}_{|h}∼

*N*(0, ),

*p*∼

_{i}*N*(0, ), and

*e*∼

_{ij}*N*(0, ).

#### Autocorrelation structure:

Let be the vector of the observed value for the trait measured for progeny *i* planted with *R* replicates in a plot. Equations 2 and 3 are then written in matrix notation as(4)where Φ_{i} is an *R*-dimensional vector of all elements equal to 1, Ψ_{i} is an *R*-dimensional vector whose elements describe the spatial positions of different replicates for progeny *i* within a plot, , and .

The *R*-dimensional residual covariance matrix of the phenotype vector (**y**_{i}) among different replicates across different progenies of the same QTL genotype is expressed in terms of Equation 4 as(5)where matrix **R**_{i} specifies the autocorrelation structure of different replicates for progeny *i* within a plot, defined by the positions of replicates Ψ_{i} (Cressie 1991), and **I**_{i} is an identity matrix because the random environmental effects are assumed to be independent among replicates.

Equation 5 partitions the variance within QTL genotypes into two parts, one being the genetic variance and the other being the residual environmental variance. The environmental variance is further partitioned into spatial and nonspatial components. The spatial component of the environmental variance is due to some permanent factors within a plot, such as moisture or nutritional gradients in a microsite. The nonspatial component of the residual environmental variance that does not depend on microenvironmental gradients is due to local unpredictable variability arising from random independent errors. The nonspatial component represents the local variance of the residual error that is often called the “nugget variance” (Isaaks and Srivastava 1989). Thus, the relative magnitude of the spatial and nonspatial components, described by parameter *g*, reflects the extent of the local variability due to developmental instability. On the basis of this definition, we have the permanent environmental variance and the random environmental variance .

The spatial covariance matrix can be structured by various statistical models, such as the first-order autoregressive [AR(1)] model, in which the variance is assumed to be constant over different plant positions within a plot and the spatial correlation drops off exponentially with the distance between plant positions, so that a distance of *d* between plant positions leads to a correlation of ρ* ^{d}*. Considering a square plot, we model the spatial correlation matrix for progeny

*i*by(6)A similar modeling structure can also be used for other different layouts of the plot.

#### Likelihood function and computational algorithm:

The likelihood function of the observed values (**y**) for the trait and marker information (**M**) can be expressed, by a mixture model (Lander and Botstein 1989), as(7)where **Ω** is the unknown vector including the QTL position, the QTL genotypic values (μ_{h}), the genetic variance within QTL genotypes (), the permanent environmental correlation (ρ), the residual variance (σ^{2}), and the proportion of the random residual variance to the total residual variance (*g*).

The mixture proportions, ϖ_{1|i} and ϖ_{0|i}, are the QTL genotype frequencies in the backcross. Because each backcross progeny has known marker genotypes, the likelihood function will be expressed in terms of known groups of marker genotypes. Let the putative QTL be predicted by a pair of flanking markers that bracket the QTL. Thus, the QTL genotype frequencies should be expressed for each of the four possible marker genotypes. These so-called conditional probabilities of QTL genotype given marker genotypes are derived in terms of the recombination fractions between the QTL and two flanking markers. For a dense map, these conditional probabilities can be approximated by the ratio (θ) between the QTL–marker over marker–marker recombination fractions. We thereby use θ to denote the chromosomal location of the QTL within the marker interval.

The multivariate normal distribution probability of the trait for QTL genotype *h* (*h* = 1, 0), *f _{h}*(

**y**

_{i}), is expressed aswhere

**u**

_{h|i}= Φ

_{i}μ

_{h}is the vector of the genotypic values of the trait at

*R*different replicates of QTL genotype

*h*within a plot. We use Equation 5 to model the structure of the covariance matrix in the above probability density function specifically for QTL genotypes. Assume that the QTL does not affect the spatial (permanent) residual variance, but it is responsible for the local (random) residual variance. Thus, by defining a QTL genotype-specific proportion of the local to total residual variance,

*g*, and comparing its differences among different QTL genotypes, we can test whether the developmental instability of the trait studied is controlled by the hypothesized QTL.

_{h}The EM algorithm (Dempster *et al*. 1977) is implemented to estimate the genotypic values and the parameters that model the structure of the covariance matrix, all contained in vector **Ω** = (θ, **Θ**) with **Θ** = (μ_{h}, , *g _{h}*, ρ, σ

^{2}) (

*h*= 1, 0). These unknown parameters can be estimated by differentiating the log-likelihood function of Equation 7 with respect to each parameter, letting the derivatives be equal to zero, and solving the log-likelihood functions.

The log-likelihood function of the phenotypic values for a trait affected by a QTL is given bywith the derivative with respect to any element Ω_{ℓ} in the unknown vectorwhere we define(8)which could be thought of as a posterior probability that progeny *i* has QTL genotype *h*. We then implement the EM algorithm with the expanded parameter set {**Ω**, **Π**}, where **Π** = {Π_{h|i}}. Conditional on **Π** (the E step; Equation 8), we solve the log-likelihood equations(9)to get the estimates of **Ω** (the M step). The E and M steps between Equations 8 and 9 are repeated until the estimates converge to stable values that are regarded as the maximum-likelihood estimates (MLEs) of the parameters. A detailed procedure for the derivations of the MLEs is available upon request.

In practical computations, the QTL position parameter can be viewed as a fixed parameter because a putative QTL can be searched at every 1 or 2 cM on a map interval bracketed by two markers throughout the entire linkage map. The log-likelihood-ratio test statistic for a QTL at a particular map position is displayed graphically to generate a likelihood map or profile. The genomic position that corresponds to a peak of the profile is the MLE of the QTL location.

## HYPOTHESIS TESTS

#### The existence of QTL:

On the basis of the MLEs of two QTL genotypic values, we estimate the overall mean by and the additive genetic effect of the QTL on the trait by . This design allows us to test the existence of a QTL, regardless of whether it affects only the trait or its developmental instability or both. This can be tested by formulating the following hypotheses:(10)The log-likelihood values *L*_{0} and *L*_{1} under H_{0} and H_{1} are calculated. The test is performed with a log-likelihood-ratio statistic(11)where the tildes and hat stand for the MLEs under the null and the alternative hypothesis, respectively. The LR statistic is plotted against test locations and a high LR corresponds to the position of QTL. Because the QTL position under H_{0} of hypothesis (10) is not identifiable, the distribution of the LR calculated is unclear. An empirical approach for determining the critical threshold that does not depend on the distribution is based on permutation tests, as advocated by Churchill and Doerge (1994). By repeatedly shuffling the relationships between marker genotypes and phenotypes, a series of the maximum-log-likelihood ratios are calculated, from the distribution of which the critical threshold is determined.

The effect of the QTL on the trait can be tested using the hypotheses as follows:(12)The likelihood calculated under hypothesis (12) can be thought to follow a χ^{2}-distribution with 1 d.f. because the QTL position under the null hypothesis is identifiable, a case different from hypothesis (10). The genetic variance of the trait contributed by the detected QTL is the variance between the two QTL genotypes, calculated byThe total environmental variance is the summation of the residual environmental variance within each QTL genotype weighted by the frequencies of QTL genotypes, calculated asThus, the heritability of the trait explained by the QTL is calculated as

#### Genetic control of developmental instability:

After the existence of a QTL for the trait is confirmed, it is essential to test whether this detected QTL triggers an effect on the developmental instability of the trait. We can first test whether the nonspatial local variation is significant by formulating the hypotheses(13)If H_{0} of hypothesis (13) is accepted, this means that all the residual variance is contributed by the spatial variance and that there is no variance due to developmental instability. By contrast, if the null hypothesis of the test(14)is accepted, this indicates that the nonspatial component, *i.e*., the nugget effect, is only a source for the residual variance.

The genetic control of developmental instability is tested by(15)If H_{0} of hypothesis (15) is rejected, this suggests that developmental instability is under significant genetic control. The LR values for hypotheses (13)–(15) can be thought to follow a χ^{2}-distribution with 2 or 1 d.f., respectively. For each QTL genotype, the proportion of the nonspatial variance due to developmental noise relative to the total residual variance is calculated by(16)Thus, by comparing between the two genotypes, we determine how the QTL detected for the trait affects its developmental instability.

## APPLICATION

#### Material:

We used a real example for QTL mapping in poplar trees to demonstrate the usefulness of our model. The study material, as described in Wu *et al*. (2002), was derived from the triple hybridization of Populus (poplar), *Populus deltoides*, and *P. euramericana*, which is an interspecific hybrid between *P. deltoides* and *P. nigra*. Of >400 triple hybrids, 90 were randomly chosen to establish a mapping population for marker analysis and QTL identification. Given the heterozygous characteristic of forest trees, analysis of this mapping population is based on a pseudotest backcross design (Grattapaglia and Sederoff 1994), in which markers and QTL are heterozygous in one parent but not in the second parent. According to this design, two genetic linkage maps each based on the gene segregation of a different parent were constructed with different types of molecular markers (Yin *et al*. 2002). Our analysis here is based on a *P. deltoides* (D)-specific linkage map.

Ramets from each of the progeny used to construct linkage maps were made to study their rooting capacity. The ramets were water cultured in a randomized complete block design with three different blocks and a four-tree square plot. Root numbers were counted and the length of each root was measured at five weekly intervals starting at day 7. The total root length of each cutting was then estimated. In this study, the total root lengths at the last measurement point were used. A total of 75 trees containing complete marker and trait data are used.

#### Results:

The mapping model with an algorithm derived in the appendix was used to scan genomewide for the existence of all possible QTL that affect root length growth and/or its developmental instability in hybrid poplars. For this particular example, we derive a joint likelihood function of the data by combining the three different blocks although all the parameters are assumed to be identical among the blocks. To assure that the data are homogeneous, they were log-transformed before statistical analyses. On the basis of hypothesis (10), we calculated the LR values across the entire linkage map (Figure 1). A QTL on linkage group D4 was detected to be significant genomewide, with the maximum LR value 152.9 greater than the 0.05 significance threshold (150.4) determined from 1000 permutation tests. This QTL is bracketed by two highly linked AFLP markers, AG/CGA-395d and CA/CCG-680d. The MLEs of the parameters (**Θ**) related to this QTL are given in Table 1.

The two subsequent hypotheses (12 and 15) were made to test whether this QTL triggers an effect on root length growth and the developmental instability of the trait, respectively. The testing results suggest that the QTL has a significant additive effect on the root length (Table 1), explaining 19.9% of the total phenotypic variance for the trait. In addition, this QTL also displays a significant effect on the developmental instability of root length, with the nugget parameters being different between the two QTL genotypes (0.802 and 0.094; Table 1). Developmental instability explains different proportions of the residual variance for two different QTL genotypes, respectively, as calculated by Equation 16, with such a proportion being larger for the larger genotype (29.2%) than for the smaller genotype (5.9%).

#### Monte Carlo simulation:

We performed simulation studies to investigate the statistical properties of the model for mapping developmental instability QTL. We mimicked the poplar example by simulating the backcross of a low sample size (*n* = 75). A linkage group of length 200 cM is constructed with 10 evenly distanced markers for this backcross. To investigate the influence of sample size on parameter estimation, we performed further simulation studies by increasing sample size to 200 and 500. Each progeny from this backcross is planted with a square plot of four copies. Assume that a QTL located at 48 cM from the first marker at the left of the linkage group affects a normally distributed trait. With the given additive genetic effect of the QTL, along with the parameters that specify the spatial and nonspatial structure of the covariance matrix (Equation 5; Table 2), the phenotypic values of the backcross are simulated under different heritability levels (*h*^{2} = 0.1 and 0.4). The spatial correlation is modeled by the AR model (Equation 6).

The simulated data are subjected to statistical analyses by the mapping model proposed. For a small sample size like the one used for the poplar example, the model provides reasonable but not excellent estimates for all the parameters (including the QTL position, QTL effects on the simulated trait and its developmental instability, and covariance-structuring parameters) when the heritability of the simulated trait is low (Table 2). However, as the sample size increases, the precision of parameter estimation increases remarkably. As expected, these estimates are more accurate and more precise with a high heritability. The model appears to have high power for the detection of significant effects on developmental instability, even for a low sample size and a low heritability level (Table 2).

## DISCUSSION

Because developmental systems are inherently “noisy” and frequently subject to random fluctuations, considerable variation can be observed in the rates of development and morphology even among genetically identical individuals under the most uniform experimental conditions (McAdams and Arkin 1997). This so-called developmental noise or instability has been thought to be related to the fitness and evolution of the organism (Møller and Swaddle 1997). The study of the genetic control of developmental instability has been the subject of great interest in evolutionary and developmental biology (Klingenberg and Nijhout 1999; Leamy and Klingenberg 2005) and agricultural genetics (Wu 1997). Traditional quantitative genetic approaches have been used to estimate the genetic variability and heritability of developmental instability measured as FA for bilateral traits. Specific QTL have been identified for FA in mice with molecular genetic linkage maps (Leamy *et al*. 2002). In this article, we have framed a general design to estimate QTL that affect developmental instability for those traits whose developmental instability cannot be measured in terms of FA. We have derived a statistical model and a computational procedure for testing and estimating the effect of QTL on developmental instability with the data collected from a field trial.

The idea behind our model is the incorporation of the spatial autocorrelation model into the QTL mapping framework. We have implemented a common form of spatial autocorrelation (Cressie 1991) that involves both spatial and nonspatial, *i.e*., local or “nugget,” components (Isaaks and Srivastava 1989). Thus, rather than estimating all elements contained within the covariance matrix, our model estimates a few key parameters that model the structure of the covariance matrix. Spatial components result from some predictable microenvironments within a plot, whereas nonspatial components result from local unpredictable perturbations. Thus, the proportion of nonspatial relative to spatial variances can be regarded as the degree of developmental instability. We derived the EM algorithm to estimate the parameters that structure the covariance matrix. The derived algorithm is robust, as seen from results through simulation studies, in that the model provides reasonable estimates of genotypic effects and spatial parameters over a range of space including different sample sizes and heritability levels.

The new model has been employed to identify a QTL that affects rooting capacity and its developmental instability from cuttings of poplar trees in a controlled water culture laid in a square plot with three blocks. We have detected a significant QTL on a linkage group constructed by AFLP markers in an interspecific hybrid population of two poplar species (Yin *et al*. 2002). On the basis of previous quantitative and molecular genetic studies, rooting capacity is found to be under strong genetic control in woody plants (Wang *et al*. 1988; Wullschleger *et al*. 2005). Several QTL for fine or coarse root biomass traits have been mapped on specific chromosomal locations for hybrid poplars between *P. trichocarpa* × *P. deltoides* (Wullschleger *et al*. 2005). A smaller number of the root QTL detected in this study may be due to a small sample size and/or possible C-effects, a common phenomenon in vegetative propagation (Stelzer *et al*. 1998), that describe variation in trait performance among different ramets from different tree positions. A structuring model for approximating the dependence of ramets can be proposed to remove C-effects that may have confounded the estimates of the genetic parameters in the poplar example.

As a first conceptual exploration of its kind, our mapping model was formulated in terms of interval mapping and it should be powerful to detect unlinked QTL for a quantitative trait. Its capacity to separate multiple linked QTL can be improved through the combination of interval mapping and partial regression analysis on all the markers except for the two that flank the QTL. This so-called composite interval mapping (Zeng 1994) that has proven to increase the resolution of QTL mapping can be readily integrated with our mapping model to more precisely identify positions and effects of QTL that control the developmental instability of a quantitative trait. Another immediate issue is to incorporate the effects of QTL × environment interactions on developmental instability when the trial is conducted under multiple different sites. Such a multisite trial is needed to avoid the inflated estimate of the genetic control of developmental instability. Our developmental model and its extensions will prove their value to draw a detailed picture of the developmental instability of complex phenotypes for a variety of organisms in plants and animals.

## APPENDIX

In what follows, we derive the formulas for obtaining the MLEs of all the unknown parameters (**Ω**), except for the QTL position, when the experimental design used has a square plot. The formulas for other designs of a plot can be derived in a similar way. The log-likelihood equations for estimating the MLEs of **Ω** for a square plot whose spatial correlation matrix is modeled by the AR(1) are derived aswhere ′ denotes a subsequent iteration in the M step, and

Considering a square plot, the residual variance matrix, denoted by **Σ**_{h}, for QTL genotype *h* is modeled bywhere and ϕ = ρ* ^{d}*. The determinant and inverse of the matrix are derived;

*i.e*.,We havewhere

## Acknowledgments

We thank two anonymous reviewers for their constructive comments on this manuscript. This work is partially supported by grants (09-95671 and 30671704) from the National Natural Science Foundation of China and from the National Science Foundation (0540745).

## Footnotes

Communicating editor: M. W. Feldman

- Received March 2, 2007.
- Accepted March 27, 2007.

- Copyright © 2007 by the Genetics Society of America