## Abstract

The biological and statistical advantages of functional mapping result from joint modeling of the mean-covariance structures for developmental trajectories of a complex trait measured at a series of time points. While an increased number of time points can better describe the dynamic pattern of trait development, significant difficulties in performing functional mapping arise from prohibitive computational times required as well as from modeling the structure of a high-dimensional covariance matrix. In this article, we develop a statistical model for functional mapping of quantitative trait loci (QTL) that govern the developmental process of a quantitative trait on the basis of wavelet dimension reduction. By breaking an original signal down into a spectrum by taking its averages (smooth coefficients) and differences (detail coefficients), we used the discrete Haar wavelet shrinkage technique to transform an inherently high-dimensional biological problem into its tractable low-dimensional representation within the framework of functional mapping constructed by a Gaussian mixture model. Unlike conventional nonparametric modeling of wavelet shrinkage, we incorporate mathematical aspects of developmental trajectories into the smooth coefficients used for QTL mapping, thus preserving the biological relevance of functional mapping in formulating a number of hypothesis tests at the interplay between gene actions/interactions and developmental patterns for complex phenotypes. This wavelet-based parametric functional mapping has been statistically examined and compared with full-dimensional functional mapping through simulation studies. It holds great promise as a powerful statistical tool to unravel the genetic machinery of developmental trajectories with large-scale high-dimensional data.

FUNCTIONAL mapping of dynamic or longitudinal traits, such as growth curves, human immunodeficiency virus dynamics, programmed cell death, circadian rhythms, and pharmacodynamics/pharmacokinetics, advocated by R. Wu and colleagues (Ma *et al*. 2002; Wu *et al*. 2003, 2004a,b,c; Zhao *et al*. 2004a,b, 2005; reviewed in Wu and Lin 2006), has emerged as an important statistical method to map and identify specific quantitative trait loci (QTL) that underlie the genetic architecture of complex traits. Genetic mapping of longitudinal traits has also received considerable attention by other workers who have developed different statistical designs (Macgregor *et al*. 2005) or methods (Yang *et al*. 2006) for estimating the temporal pattern of genetic variance and covariance explained by a QTL in a time course. The rationale of functional mapping is to express the genotypic effects of QTL at a series of time points in terms of a continuous growth function with respect to time or other independent variables. Under this formulation, the parameters describing longitudinal trajectories, rather than time-dependent genotypic values, are estimated within a maximum-likelihood framework. Also, unlike traditional genetic mapping strategies, functional mapping estimates the parameters that model the structure of the covariance matrix among multiple different time points, which, therefore, largely reduces the number of parameters being estimated for variances and covariances, especially when the dimension of data is high.

The significant advantage of functional mapping is that it provides a quantitative framework for testing the interplay between genetic (inter)actions and the pattern of development in a time course. Functional mapping constructs a setting for precisely estimating and predicting a number of fundamental events in the genetic control of development (Wu *et al*. 2004a), which include (1) the timing of a QTL to turn on and off to affect growth in a time course, (2) the duration of the dynamic genetic effect of a QTL, (3) the magnitude of the genetic effect of a QTL on maximal growth rate and the lasting period of maximal growth rate, and (4) the pleiotropic effect of the growth QTL on other developmental traits related to growth processes. Despite these elegant features, functional mapping based on simple parametric modeling of mean-covariance structures may be limited in the following two aspects when the dimension of longitudinal data is large. First, one of the most promising directions in genetic studies is to view the formation of a quantitative trait as a dynamic system, dissect the system into a series of fundamental components, and evaluate the developmental complexity of the system on the basis of the significance of each component. Functional mapping has power to detect the genes involved in various stages of development of each component and draw a detailed picture of interactive networks of these genes. However, functional mapping of multivariate high-dimensional data required to describe a system is encumbered with a tremendous computational burden. Second, with increased dimensionality of repeated measurements, it is difficult to model the structure of a time-dependent covariance matrix on the basis of parametric approaches, such as autoregressive (Diggle *et al*. 2002) and antedependence models (Núñez-Antón and Zimmerman 2000; Zimmerman and Núñez-Antón 2001) or the Brownian process (Sy *et al*. 1997). Also, high dimension may make the computation of the inverse of such a structured covariance matrix unstable, with the determinant approaching infinity or zero.

An efficient approach for applying functional mapping to high-dimensional data is through dimensionality reduction, *i.e*., the transformation that brings data from a high- to a low-order dimension. Wavelet dimensionality reduction preserves the signal pattern and yields better or comparative classification accuracy, as shown by Donoho and colleagues (Donoho and Johnstone 1994; Donoho 1995; Donoho *et al*. 1995). More recently, wavelet-based approaches have been integrated with functional mixed models to extract meaningful information from high-dimensional functional and longitudinal data (Morris *et al*. 2003, 2006; Morris and Carroll 2006). In these applications, wavelet-based approaches, serving as a nonparametric tool to fit curves that cannot be mathematically described, play a similar role to other nonparametric approaches like B-spline smoothing. For many biological processes, such as growth curves, in which explicit mathematical equations exist (West *et al*. 2001), the nonparametric application of wavelet-based approaches will reduce the power of functional mapping to quantitatively test biologically meaningful hypotheses through mathematical models, as itemized above. In this article, we develop a statistical model for integrating parametric functional mapping of developmental trajectories on the basis of wavelet transform methods. By reducing the dimensionality of data through wavelet transform, conventional parametric functional mapping can be performed, as has usually been practiced, and thus its biological relevance and statistical advantage are preserved. It has been shown that the low-dimensionality models are not only computationally efficient, but also more flexible and robust than high-dimensionality models. The statistical properties of this new model are investigated through simulation studies and are compared with those of full-dimensional functional mapping methods.

## WAVELET-BASED FUNCTIONAL MAPPING

#### Wavelet transform:

Through wavelet transform (also called *analysis*), a series of original functional data are divided into two sequences of wavelet coefficients with exactly the same length, half of the original signal length (Mallat 1989, 1998; Vidakovic 1999; Jensen and la Cour-Harbo 2001). The first sequence of coefficients is called *smooth coefficients*, usually denoted by **c**^{−j}, which correspond to an approximation process of the original signal. The second sequence of coefficients is called *detail coefficients*, usually denoted by **d**^{−j}, which correspond to the detail information (subtleties) left by the approximation process. The symbol −*j* denotes the resolution level at which the initial sequence is split into smooth and detail coefficients. Since detail coefficients are contaminated severely by random errors, shrinking them to zero can help to reduce the overall noise level of the signal (Donoho and Johnstone 1994; Donoho 1995; Donoho *et al*. 1995).

Here we incorporate the principle of wavelet transform into the framework of functional mapping for developmental trajectories. Suppose there is an experimental mapping population, such as the backcross, with two different genotypes at a putative QTL. Each individual of this population is measured for a growth trait at *T* different time points. Molecular markers have been genotyped for the population to identify and map QTL that control growth trajectories. For a particular QTL genotype *l* (*l* = 1, 2), the phenotypic observation (*y _{i}*(

*t*

_{τ})) for individual

*i*at time

*t*

_{τ}can be written aswhere ξ

_{i}is an indicator variable for individual

*i*denoted as 1 for QTL genotype 1 and 0 for QTL genotype 2,

*u*(

_{l}*t*

_{τ}) is the genotypic value for QTL genotype

*l*at time

*t*

_{τ}, and

*e*(

_{i}*t*

_{τ}) is the residual error at time

*t*following a normal distribution

*N*(0, σ

^{2}). The residual covariance between two different time points and is expressed as .

Letting **y**_{i} = {*y _{i}*(

*t*

_{1}), … ,

*y*(

_{i}*t*)}′,

_{T}**u**

_{l}= {

*u*(

_{l}*t*

_{1}), … ,

*u*(

_{l}*t*)}′, and

_{T}**e**

_{i}= {

*e*(

_{i}*t*

_{1}), … ,

*e*(

_{i}*t*)}′, we have(1)where

_{T}**u**

_{l}is the mean vector for QTL genotype

*l*and

**e**

_{i}is the normally distributed residual error vector with mean zero and covariance matrix

**Σ**. For the wavelet transformation, a matrix

**H**exists such that(2)The simplest wavelet transform is the discrete Haar transform. In the Haar transform, detail coefficients are calculated simply by subtracting successive values in the sequence (Walker 1999). Specifically, we are interested in the discrete Haar transform for two sequences of full-dimensional signals contained in the regression model (Equation 1), one for the phenotypic vector of individual

*i*,and the second for the mean vector of QTL genotype

*l*,

After the first-resolution Haar wavelet transform, the smooth and detail coefficients of these two signals are expressed, respectively, as(3)and(4)Note that variation in detail coefficients at resolution 1 reflects only local, nearest-neighbor fluctuations in the sequence. Similarly, the resolution 2 smooth coefficients are obtained by summing pairs of resolution 1 smooth coefficients. At each resolution, the number of smooth and detail coefficients obtained drops by . The process can be iterated until only one smooth and one detail coefficient are produced (Walker 1999).

The pattern of the smooth coefficients in the wavelet space resembles the signal pattern in the time space. In Figure 1, *c*^{0} represents a sample of repeated measurements of a logistic growth curve at 12 time points. The smooth coefficients of the first and second Haar wavelet transformation are plotted as *c*^{−1} and *c*^{−2}, respectively. The pattern of *c*^{−1} and *c*^{−2} coefficients conforms to the signal pattern although they are in two different resolution levels. Because of the similarity, it is reasonable to model the original growth pattern on the basis of low-dimensional smooth coefficients.

#### Likelihood of smooth coefficients:

Let **c**^{−j} be the new variable with a reduced dimension *K* (*K* < *T*) transformed from the *j*th resolution Haar wavelet. The likelihood function based on a mixture model containing two QTL genotypes in the backcross can be rewritten, in terms of **c**^{−j} and marker information **M**, as(5)where (ω_{1|i}, ω_{2|i}) are the mixture proportions corresponding to the frequencies of different QTL genotypes for individual *i*. Because the marker genotype of each individual is known, these mixture proportions actually represent the conditional probabilities expressed in terms of the recombination fractions between the markers and QTL. **Θ** = {ω_{l|i}, **w**, **Σ**_{−j}} contains unknown parameters about the QTL position and the mean vector and covariance matrix after the wavelet transform, and *f _{l}*(

**c**;

**w**,

**Σ**

_{−j}) is the multivariate normal distribution of individual

*i*that carries QTL genotype

*l*expressed as(6)in which

**c**= {

*c*(

*k*

_{1}), … ,

*c*(

*k*)} is a vector of smooth coefficients for individual

_{K}*i*,

**w**= {

*w*

_{l}^{−j}(

*k*

_{1}), … ,

*w*

_{l}^{−j}(

*k*)} is a vector of expected smooth coefficients for QTL genotype

_{K}*l*, and

**Σ**

_{−j}is the (

*K*×

*K*) residual covariance matrix for the smooth coefficients.

#### Modeling the mean-covariance structures:

The tenet of functional mapping is used to model the mean vector **w** by a simple parametric function of biological meaning denoted as ) with being a set of mathematical parameters that describe the shape of the growth curve specifically for QTL genotype *l*. For the first resolution transformation, this modeling process can be mathematically expressed as(7)Thus, by estimating with wavelet-transformed data at an appropriate transformation resolution *j*, trajectory curves in a time course can be drawn individually for different QTL genotypes. Genotypic differences for the curve can be compared and tested for the statistical significance of genetic control over developmental traits.

As shown in functional mapping, the covariance matrix **Σ** among different time points can be modeled using parametric approaches. The structure of the covariance matrix can be modeled by the first-order autoregressive [AR(1)] model (Diggle *et al*. 2002), expressed as(8)where 0 < ρ < 1 is the proportion parameter with which the correlation decays with time lag. The parameters that model the structure of the covariance matrix are arrayed in **Θ**_{v} = (ρ, σ^{2}).

Alternatively, the age-specific change of variance and correlation in the analysis of longitudinal traits can be modeled by Zimmerman and Núñez-Antón's (2001) SAD model. Using matrix notation, the error term in regression model (1), ignoring the subscript *i*, can be expressed aswhere **e** = {*e*(*t*_{1}), … , *e*(*t _{T}*)}′, ϵ = {ϵ(

*t*

_{1}), … , ϵ(

*t*)}′, and for the first-order SAD [SAD(1)] model

_{T}The residual covariance matrix of the longitudinal trait is then expressed (Jaffrézic *et al*. 2003) as(9)where **Σ**_{ϵ} is the innovation covariance matrix, expressed asThe time-dependent innovative variance σ(*t*_{τ}) can be approached by a polynomial (Pourahmadi 1999) or, for simplicity, is assumed as a constant σ. Thus, the residual matrix **Σ** contains the parameters, **Θ**_{v} = (ϕ, σ), that model it.

The residual variances and covariances of the smooth coefficients for the developmental trait measured at *t _{T}* time points after Wavelet transforms can be derived, with expressions depending on the resolution level of transform. For example, the variance of first-resolution smooth coefficient

*c*

_{i}^{−1}(

*k*

_{1}) is expressed asand the covariance between

*c*

_{i}^{−1}(

*k*

_{1}) and

*c*

_{i}^{−1}(

*k*

_{2}) is expressed asVariances and covariances between different time points can be modeled by matrix (Equation 8) for the AR(1) model or by matrix (Equation 9) for the SAD(1) model.

#### Computational algorithms:

We adopt the previous EM algorithm developed on the basis of log-likelihood function (5) (Ma *et al*. 2002; Wu *et al*. 2004a,b,c) to estimate the QTL position in terms of QTL-marker recombination fractions, the curve parameters () that model the mean vector, and the parameters (**Θ**_{v}) modeling the covariance matrix. In the appendix, a detailed EM algorithm is given to estimate these parameters.

In practice, the QTL position parameter can be viewed as a fixed parameter because a putative QTL can be searched at every 1 or 2 cM on a map interval bracketed by two markers throughout the entire linkage map. The amount of support for a QTL at a particular map position is often displayed graphically through the use of likelihood maps or profiles, which plot the likelihood-ratio test statistic as a function of map position of the putative QTL.

#### Thresholding rule:

As the noise-free signal is unknown, an approximation of the signal should be sought that is smooth and fits the input well. There are two predominant thresholding schemes for this approximation based on wavelet dimensionality reduction or wavelet shrinkage. One is the *hard threshold filter* (*H*_{h}), or referred to as the “keep or kill” method (Aboufadel and Schlicker 1999), that removes coefficients below a threshold value, determined by the noise variance. The second is the soft threshold filter (*H*_{s}) that shrinks the wavelet coefficients above and below the threshold. Soft thresholding reduces coefficients toward zero (Ghael *et al*. 1997). The process of denoising may lose some information in that the denoised signal is irreversibly different from the noisy signal. Thus, thresholding is the cause of this loss of information.

If we desire the resulting signal to be smooth, the soft threshold filter should be used, although the hard threshold filter performs better. In practice, it is difficult to choose a threshold value. A small threshold value creates a noisy result near the input, while a large threshold value introduces bias. The optimal threshold is somewhere in between.

Many approaches have been available to determine the threshold level. A universal method assigns a threshold level equal to the variance times, expressed as(10)where *n* is the sample size (Donoho and Johnstone 1994). The hard thresholding rule is defined as(11)As pointed out by Donoho and Johnstone (1994) and Johnstone and Silverman (1997) that, for a sequence of normal distributed random variables *z _{i}* ∼

*N*(0, σ),

*i*= 1, … ,

*n*,(12)so that if a detail coefficient is truly zero then with a high probability it is estimated as zero by the hard thresholding rule. The expected number of greater than the threshold tends to 0. For most applications, the hard thresholding rule keeps only those detail coefficients that are significantly greater than zero. We use the hard thresholding rule as a guidance to either keep or kill the whole level of detail coefficients.

The following procedure is proposed to perform data dimensionality reduction through wavelet transforms:

Select proper orthogonal wavelet filters.

Calculate the empirical variances for the detail coefficients.

Apply the hard thresholding rule to the detail coefficients.

Truncate the whole level of the detail coefficients if they are all set to zero by step 3, and keep the whole level of the detail coefficients otherwise.

Repeat procedures 1–4 for user prescribed

*j*times. Functional QTL mapping will be conducted on the basis of the smooth coefficients**c**^{−j}if the rest of detail coefficients are truncated or on the basis of the smooth coefficients**c**^{−j+1}otherwise.

## HYPOTHESIS TESTING

The main advantages of functional mapping include the formulation of a number of biologically meaningful hypothesis tests regarding the interplay between gene action/interactions and development patterns (Wu *et al*. 2004a). These advantages can be preserved for wavelet-based functional mapping. The first hypothesis test is about the existence of a QTL that affects a longitudinal trait, expressed asThe test is performed using the log-likelihood ratio (LR) statistics calculated at all given test locations,(13)where *L*_{0} and *L*_{1} are the corresponding values of the plug-in likelihood values under the null (H_{0}) and the alternative hypothesis (H_{1}). An LR profile is then plotted against test locations throughout the genome. Because the QTL position under the null hypothesis is a nuisance parameter, the distribution of the LR values under H_{0} is unclear. The critical threshold value can be determined empirically by permutation tests as advocated by Churchill and Doerge (1994). Those locations with the LR higher than the threshold value are declared to harbor significant QTL.

Other hypothesis tests after a significant QTL is detected include the significance of the additive genetic effect on a certain point or period of growth trajectory during a time course. The wavelet-based parametric model can be used to test the genetic control over particular developmental timing and specific mathematical parameters that determine the shape of a curve. Furthermore, wavelet-based functional mapping facilitates the resolution of fundamental biological issues as follows: (1) If multiple environments are used, as shown in Zhao *et al*. (2004b), we can test how the detected QTL interact with the environment in a coordinated manner to determine growth performance at different developmental stages; (2) the model can be readily extended to model multiple QTL for growth trajectories throughout the genome and estimate and test the effects of different components of epistasis on growth patterns and forms (Wu *et al*. 2004a); and (3) wavelet-based parametric functional mapping builds up a bridge between the genetic mechanisms underlying different traits, quantitatively testing for the role of pleiotropic QTL on the phenotypic integration of a biological system.

Unlike the test statistic for the QTL significance by Equation 13, the critical threshold values for all the tests after a significant QTL was detected should be determined from simulation studies. However, the null hypotheses for these tests are well nested with their alternative hypotheses and, for this reason, the corresponding LR values can be thought to asymptotically follow a χ^{2}-distribution. Therefore, in theory, the critical values for these tests can be obtained from the χ^{2}-table, although the asymptotic properties of these test statistics need to be investigated analytically or empirically.

## MONTE CARLO SIMULATION

Extensive real data analyses and simulation studies of functional mapping based on parametric modeling have been performed in previous studies (Ma *et al*. 2002; Wu *et al*. 2004a,b,c; Zhao *et al*. 2004a,b, 2005) to examine its statistical behavior. Here we focus our simulation studies on the comparison between wavelet-based and conventional full-dimensional functional mapping for growth trajectories. Three different dimensions of longitudinal data, low (*T* = 8), intermediate (*T* = 16), and high (*T* = 64), were simulated to detect the advantage of wavelet-based over conventional approaches when longitudinal data are high dimensional.

We assume a backcross population of a moderately large sample size (200), in which there are two genotypes at each gene. A total of six markers are evenly distributed along a linkage group of length 100 cM. A QTL assumed to affect growth trajectories is located at the center of the linkage group (*i.e*., 50 cM from either end of the linkage group). By assuming two different QTL genotypic curves, defined by , where *a _{l}* is the genotype-specific asymptotic growth when time is infinite,

*b*is the genotype-specific parameter related to the initial growth, and

_{l}*r*is the genotype-specific relative growth rate, phenotypic data were simulated using the linear regression model (2), with the time-dependent covariance of residual errors following the AR(1) model. The variance σ

_{l}^{2}was selected according to the heritability of

*H*

^{2}= 0.1 and 0.4 at the time point where the genetic variance is in the middle of the largest and the smallest.

It is important to note that the genetic control of a QTL over growth may have different patterns. We consider four such different patterns (Figure 2), as also indicated in Wu and Lin (2006), to investigate how functional mapping affects the estimation of genotypic curves for different patterns. Pattern 1 with two parallel genotypic curves shows that a QTL exerts its effect on the entire process of growth, whereas the other three patterns show the changes of QTL effects over the time course because the two genotypic curves are crossed at a certain point of development. Pattern 2 suggests that a QTL has no effect at an earlier stage of development, but displays an increased influence with time after a particular timing. Pattern 3 is the inverse of pattern B and, thus, reflects the impact of an “early QTL.” The two genotypic curves in pattern 4 make a crossover, suggesting that the underlying QTL alters the direction of its effect. Curve parameters () that defined these two patterns are given in Table 1.

Both full-dimensional and wavelet-based functional mapping were used to analyze the simulated data including marker information and longitudinal measures, respectively. According to the thresholding rule, the optimal resolution levels of wavelet transform are 1, 2, and 3 and, therefore, the optimal dimensions after reduction are 4, 4, and 8, for the 8-, 16-, and 64-dimensional longitudinal data, respectively. For each simulation scheme, 200 runs were repeated to estimate the square roots of the mean square errors of the maximum-likelihood estimates (MLEs) of unknown parameters as well as averaged computing times in seconds under the two models. Figure 3 illustrates the profiles of LR values across the simulated linkage group for two different levels of heritability of growth curves (0.1 and 0.4) and of dimension of longitudinal data (16 and 64) calculated from full-dimensional and wavelet-based functional mapping. The peaks of the LR profiles near the center of the linkage group suggest that the two models can provide a good estimate of the QTL location for each simulation scheme.

The two functional mapping models provide comparable results; *i.e*., both of them can be well used to estimate the location of a QTL and QTL genotype-specific parameters that define the shape of growth trajectories (Tables 2–4⇓⇓). As expected, the accuracy and precision of parameter estimation increase when the heritability of growth curves increases from 0.1 to 0.4. The estimation precision of parameters is markedly affected by the dynamic pattern of genetic effects. The QTL location can be estimated with better precision for pattern 3 than for patterns 1, 2, and 4. The estimation precision of all three logistic parameters is better when the dynamic pattern of genetic effects obeys patterns 2 and 3 than patterns 1 and 4. It seems that the estimation precision of the parameters also depends on their types and QTL genotypes. For example, when the dimension of repeated measure is 16, initial growth (*b*) can be better estimated for QTL genotype 1 under pattern 4 than 1, whereas the converse was true for relative growth (*r*) rate, although the two patterns have similar estimation precision for asymptotic growth (*a*) (Table 3). The dimension of longitudinal data affects the precision of parameter estimates. Increasing dimensions from an intermediate (16; Table 3) to high level (64; Table 4) can always improve the estimation precision for all the patterns, but the influence of dimension was pattern specific when the dimensions increase from a low (8; Table 2) to an intermediate level. For patterns 2 and 3, an intermediate dimension provided more precise estimates for growth curve parameters than a low dimension, but, for patterns 1 and 4, the converse was detected.

The computing times used for full-dimensional and wavelet-based functional mapping differ notably. The advantage of wavelet-based over full-dimensional functional mapping in computing efficiency is consistent for growth curves of different heritabilities and different patterns of dynamic genetic effects. Wavelet-based functional mapping is computationally more efficient by approximately one to two and two to four times relative to full-dimensional functional mapping when the dimension of longitudinal data is 8 or 16. This computational efficiency increases to approximately four to eight times when the dimension increases to 64.

## DISCUSSION

Genetic mapping has been used as a routine approach to studying the genetic control of complex traits (Lander and Botstein 1989; Zeng 1994; Lynch and Walsh 1998). There is now a pressing need for statistical methods that can unravel the genetic machinery of developmental characteristics because many traits of agricultural, biological, and biomedical interest exhibit a complex developmental feature. Functional mapping, aimed at characterizing dynamic QTL underlying time-dependent phenotypic differentiation, has emerged as a powerful tool in genetic research (Ma *et al*. 2002; Wu *et al*. 2003, 2004a,b,c; Zhao *et al*. 2004b, 2005; Macgregor *et al*. 2005; Yang *et al*. 2006; reviewed in Wu and Lin 2006). For a high-dimensional response data set contaminated by noise, the detection of QTL by functional mapping suffers from the so-called “curse of dimensionality.” Also, the complexity and high dimension of longitudinal data make functional mapping computationally prohibitive. In this article, we have incorporated the principle of wavelet shrinkage to map QTL that affect developmental trajectories. As an increasingly popular means for data compression and denoising in the context of signal and image processing (Donoho and Johnstone 1994; Donoho 1995; Donoho *et al*. 1995), wavelet shrinkage can also display favorable properties to be used for genetic mapping by projecting higher-dimensional data to a manageable lower-dimensional subspace.

The effective use of the wavelet thresholding model for functional mapping can be attributed to three factors. First, the dimensionality of smooth coefficients is only a fraction of the data dimensionality. The curse of dimensionality is removed by considering models for low-dimensional smooth coefficients. Second, the smooth coefficients capture the pattern of the original signal (see Figure 1). As shown in Fan and Lin (1998), the Fourier transformation preserves information content mostly in the low-frequency Fourier coefficients. Likewise, smooth coefficients are the low-frequency counterpart of the signal (Vidakovic 1999). Third, removing detail coefficients from the transformation corresponds to removing high-frequency contents of the data, which is equivalent to reducing the noise level and the redundancies of the data. Truncating detail coefficients corresponds to a denoising process (Donoho and Johnstone 1994; Donoho 1995; Donoho *et al*. 1995). By throwing out the noise-rich detail coefficients, the remaining smooth coefficients have a higher signal-to-noise ratio under certain conditions.

We performed simulation studies to investigate the statistical behavior of wavelet-based functional mapping. The data were simulated for multiple time points using QTL genotype-specific logistic curves, but analyzed by both wavelet-based and full-dimensional functional mapping models. The results suggest that the wavelet-based functional mapping outperforms the full-dimensional model in computational efficiency, while these two models provide consistent results about parameter estimation. With an increased dimension of longitudinal data, the advantage of wavelet-based over full-dimensional functional mapping in computation becomes more pronounced.

The advantage of functional mapping in biology comes from the parametric modeling of the dynamic change of genotypic values for a putative QTL over different time points and the testing of QTL effects on the pattern and form of growth curves on the basis of mathematical parameters that define the curves. Wavelet-based shrinkage has been generally used as a nonparametric approach to handle functional data (Morris *et al*. 2003, 2006; Morris and Carroll 2006), thus exhibiting a broad use in situations in which no explicit mathematical function is available to describe longitudinal curves. For many longitudinal trajectories that can be mathematically formulated, such as the sigmoid shape of organismic growth (von Bertalanffy 1957; West *et al*. 2001), biexponential decay of viral load after medical treatment (Ho *et al*. 1995; Wu and Ding 1999), and periodic functions of circadian rhythm (Scheper *et al*. 1999), a nonparametric treatment will unavoidably reduce their biological relevance. One of the most significant relevances of our wavelet-based functional mapping proposed is to preserve all the favorable properties of functional mapping in biologically meaningful hypothesis tests through the integration of wavelet shrinkage within the framework of parametric functional mapping.

For a practical data set, the true mean-covariance structures are not known. As shown by Zhao *et al*. (2005), different covariance structures may give different results of QTL location estimation. In our simulation studies, the AR(1) model was used to approximate the structure of the covariance matrix. It would be important to model the mean-covariance structures for the time–space data set using different approaches (see Pourahmadi 1999, 2000; Pan and Mackenzie 2003; Wu and Pourahmadi 2003) and further derive dimensionally reduced models at different resolutions.

The wavelet dimensionality reduction method proposed in this article has power to map and identify QTL responsible for high-dimensional longitudinal traits. It opens a new avenue to draw a detailed picture of the genetic architecture of a complex biological system by dissecting it into different components and developmental stages and studying the genetic regulation and interactions of different QTL involved in various sequential steps of development. As a starting point, we hope to develop more sophisticated wavelet-based functional mapping to take into account the complexity and high dimension of longitudinal data that are needed to describe a biological system in a comprehensive way.

## APPENDIX

In what follows, we provide the EM algorithm for estimating genotype-specific curve parameters (*l* = 1, 2) for the logistic equation of growth trajectories and the covariance parameters **Θ**_{v} = (ρ, σ^{2}) based on the AR(1) model from wavelet-based functional mapping. The observed log-likelihood function of Equation 5 is rewritten aswhereuntil **w** and **w** are determined according to the thresholding rule.

Assuming that QTL genotypes (**Q**) for each individual are known, *i.e*., there are no missing data, we construct the complete log-likelihood function aswhere ξ_{1} is the indicator variable for a QTL genotype of individual *i* as defined before.

In the E-step of the EM algorithm, we need to calculate the conditional expectation of the complete log-likelihood given the observed data and the current estimation of parameters,where, **E**(·) and **E**_{i}(·) denote the conditional expectations given the observed data and current estimation of parameters.

So, we define the E-step by expressing the posterior probabilities of individual *i* to be QTL genotype 1 or 2 as

The M-step is derived by solving the log-likelihood equations of the expected complete log-likelihood function given the observed data and current estimations,and .

Note that since the smooth coefficients are weighted averages of the neighboring points, the correlation structure between these smooth coefficients resembles the correlation structure of the full-dimensional data. Here we take advantage of this resemblance and directly model the correlation structure of the smooth coefficients. If the likelihood equations cannot be solved directly, the Newton–Raphson algorithm is implemented within the M-step to solve these equations numerically. For an illustration, we show only the method for computing *b*_{1}. The ζth step in the Newton–Raphson algorithm is expressed as

The E- and M-steps are iteratively repeated until the estimates of parameters are stable. These stable estimates are regarded as the MLEs of parameters.

## Acknowledgments

We thank Bruce Walsh for his encouragement to develop this wavelet-based functional mapping model and several anonymous referees for their constructive comments on the earlier versions of this manuscript. The preparation of this work was partially supported by a National Science Foundation grant (0540745), a National Institutes of Health grant (R01 NS041670), and grants from the National Natural Science Foundation of China (09-95671 and 30230300).

## Footnotes

Communicating editor: J. B. Walsh

- Received January 13, 2007.
- Accepted April 10, 2007.

- Copyright © 2007 by the Genetics Society of America