help button home button Genetics Blood
HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS

Originally published as Genetics Published Articles Ahead of Print on April 15, 2007.

Genetics, Vol. 176, 1879-1892, July 2007, Copyright © 2007
doi:10.1534/genetics.107.070920

This Article
Right arrow Abstract Freely available
Right arrow Full Text (PDF)
Right arrow All Versions of this Article:
genetics.107.070920v1
176/3/1879    most recent
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Right arrow reprints & permissions
Citing Articles
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Zhao, W.
Right arrow Articles by Wu, R.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Zhao, W.
Right arrow Articles by Wu, R.

Wavelet-Based Parametric Functional Mapping of Developmental Trajectories With High-Dimensional Data

Wei Zhao, Hongying Li, Wei Hou and Rongling Wu1

Department of Statistics, University of Florida, Gainesville, Florida 32611

1 Corresponding author: Department of Statistics, University of Florida, Gainesville, FL 32611.
E-mail: rwu{at}stat.ufl.edu

Manuscript received January 13, 2007. Accepted for publication April 10, 2007.


    ABSTRACT
 TOP
 ABSTRACT
 WAVELET-BASED FUNCTIONAL MAPPING
 HYPOTHESIS TESTING
 MONTE CARLO SIMULATION
 DISCUSSION
 APPENDIX
 ACKNOWLEDGEMENTS
 LITERATURE CITED
 
The biological and statistical advantages of functional mapping result from joint modeling of the mean-covariance structures for developmental trajectories of a complex trait measured at a series of time points. While an increased number of time points can better describe the dynamic pattern of trait development, significant difficulties in performing functional mapping arise from prohibitive computational times required as well as from modeling the structure of a high-dimensional covariance matrix. In this article, we develop a statistical model for functional mapping of quantitative trait loci (QTL) that govern the developmental process of a quantitative trait on the basis of wavelet dimension reduction. By breaking an original signal down into a spectrum by taking its averages (smooth coefficients) and differences (detail coefficients), we used the discrete Haar wavelet shrinkage technique to transform an inherently high-dimensional biological problem into its tractable low-dimensional representation within the framework of functional mapping constructed by a Gaussian mixture model. Unlike conventional nonparametric modeling of wavelet shrinkage, we incorporate mathematical aspects of developmental trajectories into the smooth coefficients used for QTL mapping, thus preserving the biological relevance of functional mapping in formulating a number of hypothesis tests at the interplay between gene actions/interactions and developmental patterns for complex phenotypes. This wavelet-based parametric functional mapping has been statistically examined and compared with full-dimensional functional mapping through simulation studies. It holds great promise as a powerful statistical tool to unravel the genetic machinery of developmental trajectories with large-scale high-dimensional data.


FUNCTIONAL mapping of dynamic or longitudinal traits, such as growth curves, human immunodeficiency virus dynamics, programmed cell death, circadian rhythms, and pharmacodynamics/pharmacokinetics, advocated by R. Wu and colleagues (MA et al. 2002; WU et al. 2003, 2004a,b,c; ZHAO et al. 2004a,b, 2005; reviewed in WU and LIN 2006), has emerged as an important statistical method to map and identify specific quantitative trait loci (QTL) that underlie the genetic architecture of complex traits. Genetic mapping of longitudinal traits has also received considerable attention by other workers who have developed different statistical designs (MACGREGOR et al. 2005) or methods (YANG et al. 2006) for estimating the temporal pattern of genetic variance and covariance explained by a QTL in a time course. The rationale of functional mapping is to express the genotypic effects of QTL at a series of time points in terms of a continuous growth function with respect to time or other independent variables. Under this formulation, the parameters describing longitudinal trajectories, rather than time-dependent genotypic values, are estimated within a maximum-likelihood framework. Also, unlike traditional genetic mapping strategies, functional mapping estimates the parameters that model the structure of the covariance matrix among multiple different time points, which, therefore, largely reduces the number of parameters being estimated for variances and covariances, especially when the dimension of data is high.

The significant advantage of functional mapping is that it provides a quantitative framework for testing the interplay between genetic (inter)actions and the pattern of development in a time course. Functional mapping constructs a setting for precisely estimating and predicting a number of fundamental events in the genetic control of development (WU et al. 2004a), which include (1) the timing of a QTL to turn on and off to affect growth in a time course, (2) the duration of the dynamic genetic effect of a QTL, (3) the magnitude of the genetic effect of a QTL on maximal growth rate and the lasting period of maximal growth rate, and (4) the pleiotropic effect of the growth QTL on other developmental traits related to growth processes. Despite these elegant features, functional mapping based on simple parametric modeling of mean-covariance structures may be limited in the following two aspects when the dimension of longitudinal data is large. First, one of the most promising directions in genetic studies is to view the formation of a quantitative trait as a dynamic system, dissect the system into a series of fundamental components, and evaluate the developmental complexity of the system on the basis of the significance of each component. Functional mapping has power to detect the genes involved in various stages of development of each component and draw a detailed picture of interactive networks of these genes. However, functional mapping of multivariate high-dimensional data required to describe a system is encumbered with a tremendous computational burden. Second, with increased dimensionality of repeated measurements, it is difficult to model the structure of a time-dependent covariance matrix on the basis of parametric approaches, such as autoregressive (DIGGLE et al. 2002) and antedependence models (NÚÑEZ-ANTÓN and ZIMMERMAN 2000; ZIMMERMAN and NÚÑEZ-ANTÓN 2001) or the Brownian process (SY et al. 1997). Also, high dimension may make the computation of the inverse of such a structured covariance matrix unstable, with the determinant approaching infinity or zero.

An efficient approach for applying functional mapping to high-dimensional data is through dimensionality reduction, i.e., the transformation that brings data from a high- to a low-order dimension. Wavelet dimensionality reduction preserves the signal pattern and yields better or comparative classification accuracy, as shown by Donoho and colleagues (DONOHO and JOHNSTONE 1994; DONOHO 1995; DONOHO et al. 1995). More recently, wavelet-based approaches have been integrated with functional mixed models to extract meaningful information from high-dimensional functional and longitudinal data (MORRIS et al. 2003, 2006; MORRIS and CARROLL 2006). In these applications, wavelet-based approaches, serving as a nonparametric tool to fit curves that cannot be mathematically described, play a similar role to other nonparametric approaches like B-spline smoothing. For many biological processes, such as growth curves, in which explicit mathematical equations exist (WEST et al. 2001), the nonparametric application of wavelet-based approaches will reduce the power of functional mapping to quantitatively test biologically meaningful hypotheses through mathematical models, as itemized above. In this article, we develop a statistical model for integrating parametric functional mapping of developmental trajectories on the basis of wavelet transform methods. By reducing the dimensionality of data through wavelet transform, conventional parametric functional mapping can be performed, as has usually been practiced, and thus its biological relevance and statistical advantage are preserved. It has been shown that the low-dimensionality models are not only computationally efficient, but also more flexible and robust than high-dimensionality models. The statistical properties of this new model are investigated through simulation studies and are compared with those of full-dimensional functional mapping methods.


    WAVELET-BASED FUNCTIONAL MAPPING
 TOP
 ABSTRACT
 WAVELET-BASED FUNCTIONAL MAPPING
 HYPOTHESIS TESTING
 MONTE CARLO SIMULATION
 DISCUSSION
 APPENDIX
 ACKNOWLEDGEMENTS
 LITERATURE CITED
 
Wavelet transform:
Through wavelet transform (also called analysis), a series of original functional data are divided into two sequences of wavelet coefficients with exactly the same length, half of the original signal length (MALLAT 1989, 1998; VIDAKOVIC 1999; JENSEN and LA COUR-HARBO 2001). The first sequence of coefficients is called smooth coefficients, usually denoted by cj, which correspond to an approximation process of the original signal. The second sequence of coefficients is called detail coefficients, usually denoted by dj, which correspond to the detail information (subtleties) left by the approximation process. The symbol –j denotes the resolution level at which the initial sequence is split into smooth and detail coefficients. Since detail coefficients are contaminated severely by random errors, shrinking them to zero can help to reduce the overall noise level of the signal (DONOHO and JOHNSTONE 1994; DONOHO 1995; DONOHO et al. 1995).

Here we incorporate the principle of wavelet transform into the framework of functional mapping for developmental trajectories. Suppose there is an experimental mapping population, such as the backcross, with two different genotypes at a putative QTL. Each individual of this population is measured for a growth trait at T different time points. Molecular markers have been genotyped for the population to identify and map QTL that control growth trajectories. For a particular QTL genotype l (l = 1, 2), the phenotypic observation (yi(t{tau})) for individual i at time t{tau} can be written as

Formula
where {xi}i is an indicator variable for individual i denoted as 1 for QTL genotype 1 and 0 for QTL genotype 2, ul(t{tau}) is the genotypic value for QTL genotype l at time t{tau}, and ei(t{tau}) is the residual error at time t following a normal distribution N(0, {sigma}2). The residual covariance between two different time points Formula and Formula is expressed as Formula.

Letting yi = {yi(t1), ... , yi(tT)}', ul = {ul(t1), ... , ul(tT)}', and ei = {ei(t1), ... , ei(tT)}', we have

Formula 1(1)
where ul is the mean vector for QTL genotype l and ei is the normally distributed residual error vector with mean zero and covariance matrix {Sigma}. For the wavelet transformation, a matrix H exists such that

Formula 2(2)
The simplest wavelet transform is the discrete Haar transform. In the Haar transform, detail coefficients are calculated simply by subtracting successive values in the sequence (WALKER 1999). Specifically, we are interested in the discrete Haar transform for two sequences of full-dimensional signals contained in the regression model (Equation 1), one for the phenotypic vector of individual i,

Formula 2
and the second for the mean vector of QTL genotype l,

Formula 2

After the first-resolution Haar wavelet transform, the smooth and detail coefficients of these two signals are expressed, respectively, as

Formula 3(3)
and

Formula 4(4)
Note that variation in detail coefficients at resolution 1 reflects only local, nearest-neighbor fluctuations in the sequence. Similarly, the resolution 2 smooth coefficients are obtained by summing pairs of resolution 1 smooth coefficients. At each resolution, the number of smooth and detail coefficients obtained drops by Formula 4. The process can be iterated until only one smooth and one detail coefficient are produced (WALKER 1999).

The pattern of the smooth coefficients in the wavelet space resembles the signal pattern in the time space. In Figure 1, c0 represents a sample of repeated measurements of a logistic growth curve at 12 time points. The smooth coefficients of the first and second Haar wavelet transformation are plotted as c–1 and c–2, respectively. The pattern of c–1 and c–2 coefficients conforms to the signal pattern although they are in two different resolution levels. Because of the similarity, it is reasonable to model the original growth pattern on the basis of low-dimensional smooth coefficients.


Figure 1
View larger version (8K):
[in this window]
[in a new window]
[Download PPT slide]
 
FIGURE 1.— Smooth coefficients by the Haar wavelet transformation at the first (c–1) and second resolutions (c–2) follow a similar pattern of 12-dimensional original signals (c0).

 
Likelihood of smooth coefficients:
Let cj be the new variable with a reduced dimension K (K < T) transformed from the jth resolution Haar wavelet. The likelihood function based on a mixture model containing two QTL genotypes in the backcross can be rewritten, in terms of cj and marker information M, as

Formula 5(5)
where ({omega}1|i, {omega}2|i) are the mixture proportions corresponding to the frequencies of different QTL genotypes for individual i. Because the marker genotype of each individual is known, these mixture proportions actually represent the conditional probabilities expressed in terms of the recombination fractions between the markers and QTL. {Theta} = {{omega}l|i, wFormula 5, {Sigma}j}Formula 5 contains unknown parameters about the QTL position and the mean vector and covariance matrix after the wavelet transform, and fl(cFormula 5; wFormula 5, {Sigma}j) is the multivariate normal distribution of individual i that carries QTL genotype l expressed as

Formula 6(6)
in which cFormula 6 = {cFormula 6(k1), ... , cFormula 6(kK)} is a vector of smooth coefficients for individual i, wFormula 6 = {wlj(k1), ... , wlj(kK)} is a vector of expected smooth coefficients for QTL genotype l, and {Sigma}j is the (K x K) residual covariance matrix for the smooth coefficients.

Modeling the mean-covariance structures:
The tenet of functional mapping is used to model the mean vector wFormula 6 by a simple parametric function of biological meaning denoted as Formula 6) with Formula 6 being a set of mathematical parameters that describe the shape of the growth curve specifically for QTL genotype l. For the first resolution transformation, this modeling process can be mathematically expressed as

Formula 7(7)
Thus, by estimating Formula 7 with wavelet-transformed data at an appropriate transformation resolution j, trajectory curves in a time course can be drawn individually for different QTL genotypes. Genotypic differences for the curve can be compared and tested for the statistical significance of genetic control over developmental traits.

As shown in functional mapping, the covariance matrix {Sigma} among different time points can be modeled using parametric approaches. The structure of the covariance matrix can be modeled by the first-order autoregressive [AR(1)] model (DIGGLE et al. 2002), expressed as

Formula 8(8)
where 0 < {rho} < 1 is the proportion parameter with which the correlation decays with time lag. The parameters that model the structure of the covariance matrix are arrayed in {Theta}v = ({rho}, {sigma}2).

Alternatively, the age-specific change of variance and correlation in the analysis of longitudinal traits can be modeled by ZIMMERMAN and NÚÑEZ-ANTÓN's (2001) SAD model. Using matrix notation, the error term in regression model (1), ignoring the subscript i, can be expressed as

Formula 8
where e = {e(t1), ... , e(tT)}', {epsilon} = {{epsilon}(t1), ... , {epsilon}(tT)}', and for the first-order SAD [SAD(1)] model

Formula 8

The residual covariance matrix of the longitudinal trait is then expressed (JAFFRÉZIC et al. 2003) as

Formula 9(9)
where {Sigma}{epsilon} is the innovation covariance matrix, expressed as

Formula 9
The time-dependent innovative variance {sigma}Formula 9(t{tau}) can be approached by a polynomial (POURAHMADI 1999) or, for simplicity, is assumed as a constant {sigma}Formula 9. Thus, the residual matrix {Sigma} contains the parameters, {Theta}v = ({phi}, {sigma}Formula 9), that model it.

The residual variances and covariances of the smooth coefficients for the developmental trait measured at tT time points after Wavelet transforms can be derived, with expressions depending on the resolution level of transform. For example, the variance of first-resolution smooth coefficient ci–1(k1) is expressed as

Formula 9
and the covariance between ci–1(k1) and ci–1(k2) is expressed as

Formula 9
Variances and covariances between different time points can be modeled by matrix (Equation 8) for the AR(1) model or by matrix (Equation 9) for the SAD(1) model.

Computational algorithms:
We adopt the previous EM algorithm developed on the basis of log-likelihood function (5) (MA et al. 2002; WU et al. 2004a,b,c) to estimate the QTL position in terms of QTL-marker recombination fractions, the curve parameters (Formula 9) that model the mean vector, and the parameters ({Theta}v) modeling the covariance matrix. In the APPENDIX, a detailed EM algorithm is given to estimate these parameters.

In practice, the QTL position parameter can be viewed as a fixed parameter because a putative QTL can be searched at every 1 or 2 cM on a map interval bracketed by two markers throughout the entire linkage map. The amount of support for a QTL at a particular map position is often displayed graphically through the use of likelihood maps or profiles, which plot the likelihood-ratio test statistic as a function of map position of the putative QTL.

Thresholding rule:
As the noise-free signal is unknown, an approximation of the signal should be sought that is smooth and fits the input well. There are two predominant thresholding schemes for this approximation based on wavelet dimensionality reduction or wavelet shrinkage. One is the hard threshold filter (Hh), or referred to as the "keep or kill" method (ABOUFADEL and SCHLICKER 1999), that removes coefficients below a threshold value, determined by the noise variance. The second is the soft threshold filter (Hs) that shrinks the wavelet coefficients above and below the threshold. Soft thresholding reduces coefficients toward zero (GHAEL et al. 1997). The process of denoising may lose some information in that the denoised signal is irreversibly different from the noisy signal. Thus, thresholding is the cause of this loss of information.

If we desire the resulting signal to be smooth, the soft threshold filter should be used, although the hard threshold filter performs better. In practice, it is difficult to choose a threshold value. A small threshold value creates a noisy result near the input, while a large threshold value introduces bias. The optimal threshold is somewhere in between.

Many approaches have been available to determine the threshold level. A universal method assigns a threshold level equal to the variance times, expressed as

Formula 10(10)
where n is the sample size (DONOHO and JOHNSTONE 1994). The hard thresholding rule is defined as

Formula 11(11)
As pointed out by DONOHO and JOHNSTONE (1994) and JOHNSTONE and SILVERMAN (1997) that, for a sequence of normal distributed random variables zi ~ N(0, {sigma}Formula 11), i = 1, ... , n,

Formula 12(12)
so that if a detail coefficient is truly zero then with a high probability it is estimated as zero by the hard thresholding rule. The expected number of Formula 12 greater than the threshold tends to 0. For most applications, the hard thresholding rule keeps only those detail coefficients that are significantly greater than zero. We use the hard thresholding rule as a guidance to either keep or kill the whole level of detail coefficients.

The following procedure is proposed to perform data dimensionality reduction through wavelet transforms:

  1. Select proper orthogonal wavelet filters.
  2. Calculate the empirical variances for the detail coefficients.
  3. Apply the hard thresholding rule to the detail coefficients.
  4. Truncate the whole level of the detail coefficients if they are all set to zero by step 3, and keep the whole level of the detail coefficients otherwise.
  5. Repeat procedures 1–4 for user prescribed j times. Functional QTL mapping will be conducted on the basis of the smooth coefficients cj if the rest of detail coefficients are truncated or on the basis of the smooth coefficients cj+1 otherwise.


    HYPOTHESIS TESTING
 TOP
 ABSTRACT
 WAVELET-BASED FUNCTIONAL MAPPING
 HYPOTHESIS TESTING
 MONTE CARLO SIMULATION
 DISCUSSION
 APPENDIX
 ACKNOWLEDGEMENTS
 LITERATURE CITED
 
The main advantages of functional mapping include the formulation of a number of biologically meaningful hypothesis tests regarding the interplay between gene action/interactions and development patterns (WU et al. 2004a). These advantages can be preserved for wavelet-based functional mapping. The first hypothesis test is about the existence of a QTL that affects a longitudinal trait, expressed as

Formula 12

Formula 12
The test is performed using the log-likelihood ratio (LR) statistics calculated at all given test locations,

Formula 13(13)
where L0 and L1 are the corresponding values of the plug-in likelihood values under the null (H0) and the alternative hypothesis (H1). An LR profile is then plotted against test locations throughout the genome. Because the QTL position under the null hypothesis is a nuisance parameter, the distribution of the LR values under H0 is unclear. The critical threshold value can be determined empirically by permutation tests as advocated by CHURCHILL and DOERGE (1994). Those locations with the LR higher than the threshold value are declared to harbor significant QTL.

Other hypothesis tests after a significant QTL is detected include the significance of the additive genetic effect on a certain point or period of growth trajectory during a time course. The wavelet-based parametric model can be used to test the genetic control over particular developmental timing and specific mathematical parameters that determine the shape of a curve. Furthermore, wavelet-based functional mapping facilitates the resolution of fundamental biological issues as follows: (1) If multiple environments are used, as shown in ZHAO et al. (2004b), we can test how the detected QTL interact with the environment in a coordinated manner to determine growth performance at different developmental stages; (2) the model can be readily extended to model multiple QTL for growth trajectories throughout the genome and estimate and test the effects of different components of epistasis on growth patterns and forms (WU et al. 2004a); and (3) wavelet-based parametric functional mapping builds up a bridge between the genetic mechanisms underlying different traits, quantitatively testing for the role of pleiotropic QTL on the phenotypic integration of a biological system.

Unlike the test statistic for the QTL significance by Equation 13, the critical threshold values for all the tests after a significant QTL was detected should be determined from simulation studies. However, the null hypotheses for these tests are well nested with their alternative hypotheses and, for this reason, the corresponding LR values can be thought to asymptotically follow a {chi}2-distribution. Therefore, in theory, the critical values for these tests can be obtained from the {chi}2-table, although the asymptotic properties of these test statistics need to be investigated analytically or empirically.


    MONTE CARLO SIMULATION
 TOP
 ABSTRACT
 WAVELET-BASED FUNCTIONAL MAPPING
 HYPOTHESIS TESTING
 MONTE CARLO SIMULATION
 DISCUSSION
 APPENDIX
 ACKNOWLEDGEMENTS
 LITERATURE CITED
 
Extensive real data analyses and simulation studies of functional mapping based on parametric modeling have been performed in previous studies (MA et al. 2002; WU et al. 2004a,b,c; ZHAO et al. 2004a,b, 2005) to examine its statistical behavior. Here we focus our simulation studies on the comparison between wavelet-based and conventional full-dimensional functional mapping for growth trajectories. Three different dimensions of longitudinal data, low (T = 8), intermediate (T = 16), and high (T = 64), were simulated to detect the advantage of wavelet-based over conventional approaches when longitudinal data are high dimensional.

We assume a backcross population of a moderately large sample size (200), in which there are two genotypes at each gene. A total of six markers are evenly distributed along a linkage group of length 100 cM. A QTL assumed to affect growth trajectories is located at the center of the linkage group (i.e., 50 cM from either end of the linkage group). By assuming two different QTL genotypic curves, defined by Formula 13, where al is the genotype-specific asymptotic growth when time is infinite, bl is the genotype-specific parameter related to the initial growth, and rl is the genotype-specific relative growth rate, phenotypic data were simulated using the linear regression model (2), with the time-dependent covariance of residual errors following the AR(1) model. The variance {sigma}2 was selected according to the heritability of H2 = 0.1 and 0.4 at the time point where the genetic variance is in the middle of the largest and the smallest.

It is important to note that the genetic control of a QTL over growth may have different patterns. We consider four such different patterns (Figure 2), as also indicated in WU and LIN (2006), to investigate how functional mapping affects the estimation of genotypic curves for different patterns. Pattern 1 with two parallel genotypic curves shows that a QTL exerts its effect on the entire process of growth, whereas the other three patterns show the changes of QTL effects over the time course because the two genotypic curves are crossed at a certain point of development. Pattern 2 suggests that a QTL has no effect at an earlier stage of development, but displays an increased influence with time after a particular timing. Pattern 3 is the inverse of pattern B and, thus, reflects the impact of an "early QTL." The two genotypic curves in pattern 4 make a crossover, suggesting that the underlying QTL alters the direction of its effect. Curve parameters (Formula 13) that defined these two patterns are given in Table 1.


Figure 2
View larger version (14K):
[in this window]
[in a new window]
[Download PPT slide]
 
FIGURE 2.— Four different patterns of dynamic genetic effects exerted by a QTL in a backcross design with logistic parameters given in Table 1.

 

View this table:
[in this window]
[in a new window]

 
TABLE 1 True logistic parameters for each of the four dynamic patterns of genetic effects as plotted in Figure 2 and the AR(1) parameters modeling the structure of the covariance matrix

 
Both full-dimensional and wavelet-based functional mapping were used to analyze the simulated data including marker information and longitudinal measures, respectively. According to the thresholding rule, the optimal resolution levels of wavelet transform are 1, 2, and 3 and, therefore, the optimal dimensions after reduction are 4, 4, and 8, for the 8-, 16-, and 64-dimensional longitudinal data, respectively. For each simulation scheme, 200 runs were repeated to estimate the square roots of the mean square errors of the maximum-likelihood estimates (MLEs) of unknown parameters as well as averaged computing times in seconds under the two models. Figure 3 illustrates the profiles of LR values across the simulated linkage group for two different levels of heritability of growth curves (0.1 and 0.4) and of dimension of longitudinal data (16 and 64) calculated from full-dimensional and wavelet-based functional mapping. The peaks of the LR profiles near the center of the linkage group suggest that the two models can provide a good estimate of the QTL location for each simulation scheme.


Figure 3
View larger version (21K):
[in this window]
[in a new window]
[Download PPT slide]
 
FIGURE 3.— Log-likelihood ratio (LR) profiles for the detection of a QTL across a simulated linkage group from a random simulation replicate under pattern 1 using full-dimensional (solid curves) and wavelet-based (dashed-dotted curves) functional mapping under different levels of heritability (H2 = 0.1 and 0.4) and different longitudinal dimensions (T = 16 and 64). The estimated QTL locations are indicated by the arrows.

 
The two functional mapping models provide comparable results; i.e., both of them can be well used to estimate the location of a QTL and QTL genotype-specific parameters that define the shape of growth trajectories (Tables 2–4GoGo). As expected, the accuracy and precision of parameter estimation increase when the heritability of growth curves increases from 0.1 to 0.4. The estimation precision of parameters is markedly affected by the dynamic pattern of genetic effects. The QTL location can be estimated with better precision for pattern 3 than for patterns 1, 2, and 4. The estimation precision of all three logistic parameters is better when the dynamic pattern of genetic effects obeys patterns 2 and 3 than patterns 1 and 4. It seems that the estimation precision of the parameters also depends on their types and QTL genotypes. For example, when the dimension of repeated measure is 16, initial growth (b) can be better estimated for QTL genotype 1 under pattern 4 than 1, whereas the converse was true for relative growth (r) rate, although the two patterns have similar estimation precision for asymptotic growth (a) (Table 3). The dimension of longitudinal data affects the precision of parameter estimates. Increasing dimensions from an intermediate (16; Table 3) to high level (64; Table 4) can always improve the estimation precision for all the patterns, but the influence of dimension was pattern specific when the dimensions increase from a low (8; Table 2) to an intermediate level. For patterns 2 and 3, an intermediate dimension provided more precise estimates for growth curve parameters than a low dimension, but, for patterns 1 and 4, the converse was detected.


View this table:
[in this window]
[in a new window]

 
TABLE 2 Means of the MLEs of the QTL position and genotype-specific logistic parameters and the square roots of the mean square errors of the MLEs (in parentheses) for eight-dimensional longitudinal data under different heritability levels (H2 = 0.1 and 0.4) in a backcross from full-dimension parametric and wavelet-based functional mapping obtained from 200 Monte Carlo simulations

 

View this table:
[in this window]
[in a new window]

 
TABLE 3 Means of the MLEs of the QTL position and genotype-specific logistic parameters and the square roots of the mean square errors of the MLEs (in parentheses) for 16-dimensional longitudinal data under different heritability levels (H2 = 0.1 and 0.4) in a backcross from full-dimension parametric and wavelet-based functional mapping obtained from 200 Monte Carlo simulations

 

View this table:
[in this window]
[in a new window]

 
TABLE 4 Means of the MLEs of the QTL position and genotype-specific logistic parameters and the square roots of the mean square errors of the MLEs (in parentheses) for 64-dimensional longitudinal data under different heritability levels (H2 = 0.1 and 0.4) in a backcross from full-dimension parametric and wavelet-based functional mapping obtained from 200 Monte Carlo simulations

 
The computing times used for full-dimensional and wavelet-based functional mapping differ notably. The advantage of wavelet-based over full-dimensional functional mapping in computing efficiency is consistent for growth curves of different heritabilities and different patterns of dynamic genetic effects. Wavelet-based functional mapping is computationally more efficient by approximately one to two and two to four times relative to full-dimensional functional mapping when the dimension of longitudinal data is 8 or 16. This computational efficiency increases to approximately four to eight times when the dimension increases to 64.


    DISCUSSION
 TOP
 ABSTRACT
 WAVELET-BASED FUNCTIONAL MAPPING
 HYPOTHESIS TESTING
 MONTE CARLO SIMULATION
 DISCUSSION
 APPENDIX
 ACKNOWLEDGEMENTS
 LITERATURE CITED
 
Genetic mapping has been used as a routine approach to studying the genetic control of complex traits (LANDER and BOTSTEIN 1989; ZENG 1994; LYNCH and WALSH 1998). There is now a pressing need for statistical methods that can unravel the genetic machinery of developmental characteristics because many traits of agricultural, biological, and biomedical interest exhibit a complex developmental feature. Functional mapping, aimed at characterizing dynamic QTL underlying time-dependent phenotypic differentiation, has emerged as a powerful tool in genetic research (MA et al. 2002; WU et al. 2003, 2004a,b,c; ZHAO et al. 2004b, 2005; MACGREGOR et al. 2005; YANG et al. 2006; reviewed in WU and LIN 2006). For a high-dimensional response data set contaminated by noise, the detection of QTL by functional mapping suffers from the so-called "curse of dimensionality." Also, the complexity and high dimension of longitudinal data make functional mapping computationally prohibitive. In this article, we have incorporated the principle of wavelet shrinkage to map QTL that affect developmental trajectories. As an increasingly popular means for data compression and denoising in the context of signal and image processing (DONOHO and JOHNSTONE 1994; DONOHO 1995; DONOHO et al. 1995), wavelet shrinkage can also display favorable properties to be used for genetic mapping by projecting higher-dimensional data to a manageable lower-dimensional subspace.

The effective use of the wavelet thresholding model for functional mapping can be attributed to three factors. First, the dimensionality of smooth coefficients is only a fraction of the data dimensionality. The curse of dimensionality is removed by considering models for low-dimensional smooth coefficients. Second, the smooth coefficients capture the pattern of the original signal (see Figure 1). As shown in FAN and LIN (1998), the Fourier transformation preserves information content mostly in the low-frequency Fourier coefficients. Likewise, smooth coefficients are the low-frequency counterpart of the signal (VIDAKOVIC 1999). Third, removing detail coefficients from the transformation corresponds to removing high-frequency contents of the data, which is equivalent to reducing the noise level and the redundancies of the data. Truncating detail coefficients corresponds to a denoising process (DONOHO and JOHNSTONE 1994; DONOHO 1995; DONOHO et al. 1995). By throwing out the noise-rich detail coefficients, the remaining smooth coefficients have a higher signal-to-noise ratio under certain conditions.

We performed simulation studies to investigate the statistical behavior of wavelet-based functional mapping. The data were simulated for multiple time points using QTL genotype-specific logistic curves, but analyzed by both wavelet-based and full-dimensional functional mapping models. The results suggest that the wavelet-based functional mapping outperforms the full-dimensional model in computational efficiency, while these two models provide consistent results about parameter estimation. With an increased dimension of longitudinal data, the advantage of wavelet-based over full-dimensional functional mapping in computation becomes more pronounced.

The advantage of functional mapping in biology comes from the parametric modeling of the dynamic change of genotypic values for a putative QTL over different time points and the testing of QTL effects on the pattern and form of growth curves on the basis of mathematical parameters that define the curves. Wavelet-based shrinkage has been generally used as a nonparametric approach to handle functional data (MORRIS et al. 2003, 2006; MORRIS and CARROLL 2006), thus exhibiting a broad use in situations in which no explicit mathematical function is available to describe longitudinal curves. For many longitudinal trajectories that can be mathematically formulated, such as the sigmoid shape of organismic growth (VON BERTALANFFY 1957; WEST et al. 2001), biexponential decay of viral load after medical treatment (HO et al. 1995; WU and DING 1999), and periodic functions of circadian rhythm (SCHEPER et al. 1999), a nonparametric treatment will unavoidably reduce their biological relevance. One of the most significant relevances of our wavelet-based functional mapping proposed is to preserve all the favorable properties of functional mapping in biologically meaningful hypothesis tests through the integration of wavelet shrinkage within the framework of parametric functional mapping.

For a practical data set, the true mean-covariance structures are not known. As shown by ZHAO et al. (2005), different covariance structures may give different results of QTL location estimation. In our simulation studies, the AR(1) model was used to approximate the structure of the covariance matrix. It would be important to model the mean-covariance structures for the time–space data set using different approaches (see POURAHMADI 1999, 2000; PAN and MACKENZIE 2003; WU and POURAHMADI 2003) and further derive dimensionally reduced models at different resolutions.

The wavelet dimensionality reduction method proposed in this article has power to map and identify QTL responsible for high-dimensional longitudinal traits. It opens a new avenue to draw a detailed picture of the genetic architecture of a complex biological system by dissecting it into different components and developmental stages and studying the genetic regulation and interactions of different QTL involved in various sequential steps of development. As a starting point, we hope to develop more sophisticated wavelet-based functional mapping to take into account the complexity and high dimension of longitudinal data that are needed to describe a biological system in a comprehensive way.


    APPENDIX
 TOP
 ABSTRACT
 WAVELET-BASED FUNCTIONAL MAPPING
 HYPOTHESIS TESTING
 MONTE CARLO SIMULATION
 DISCUSSION
 APPENDIX
 ACKNOWLEDGEMENTS
 LITERATURE CITED
 
In what follows, we provide the EM algorithm for estimating genotype-specific curve parameters Formula 13 (l = 1, 2) for the logistic equation of growth trajectories and the covariance parameters {Theta}v = ({rho}, {sigma}2) based on the AR(1) model from wavelet-based functional mapping. The observed log-likelihood function of Equation 5 is rewritten as

Formula 13
where

Formula 13

Formula 13

Formula 13
until wFormula 13 and wFormula 13 are determined according to the thresholding rule.

Assuming that QTL genotypes (Q) for each individual are known, i.e., there are no missing data, we construct the complete log-likelihood function as

Formula 13
where {xi}1 is the indicator variable for a QTL genotype of individual i as defined before.

In the E-step of the EM algorithm, we need to calculate the conditional expectation of the complete log-likelihood given the observed data and the current estimation of parameters,

Formula 13
where, E(·) and Ei(·) denote the conditional expectations given the observed data and current estimation of parameters.

So, we define the E-step by expressing the posterior probabilities of individual i to be QTL genotype 1 or 2 as

Formula 13

Formula 13

The M-step is derived by solving the log-likelihood equations of the expected complete log-likelihood function given the observed data and current estimations,

Formula 13

Formula 13

Formula 13

Formula 13

Formula 13

Formula 13

Formula 13

Formula 13
and Formula 13.

Note that since the smooth coefficients are weighted averages of the neighboring points, the correlation structure between these smooth coefficients resembles the correlation structure of the full-dimensional data. Here we take advantage of this resemblance and directly model the correlation structure of the smooth coefficients. If the likelihood equations cannot be solved directly, the Newton–Raphson algorithm is implemented within the M-step to solve these equations numerically. For an illustration, we show only the method for computing b1. The {zeta}th step in the Newton–Raphson algorithm is expressed as

Formula 13

The E- and M-steps are iteratively repeated until the estimates of parameters are stable. These stable estimates are regarded as the MLEs of parameters.


    ACKNOWLEDGEMENTS
 TOP
 ABSTRACT
 WAVELET-BASED FUNCTIONAL MAPPING
 HYPOTHESIS TESTING
 MONTE CARLO SIMULATION
 DISCUSSION
 APPENDIX
 ACKNOWLEDGEMENTS
 LITERATURE CITED
 
We thank Bruce Walsh for his encouragement to develop this wavelet-based functional mapping model and several anonymous referees for their constructive comments on the earlier versions of this manuscript. The preparation of this work was partially supported by a National Science Foundation grant (0540745), a National Institutes of Health grant (R01 NS041670), and grants from the National Natural Science Foundation of China (09-95671 and 30230300).


    LITERATURE CITED
 TOP
 ABSTRACT
 WAVELET-BASED FUNCTIONAL MAPPING
 HYPOTHESIS TESTING
 MONTE CARLO SIMULATION
 DISCUSSION
 APPENDIX
 ACKNOWLEDGEMENTS
 LITERATURE CITED
 

ABOUFADEL, E., and S. SCHLICKER, 1999 Discovering Wavelets. John Wiley & Sons, New York.

CHURCHILL, G. A., and D. W. DOERGE, 1994 Empirical threshold values for quantitative trait mapping. Genetics 138: 963–971.[Abstract]

DIGGLE, P. J., P. HEAGERTY, K. Y. LIANG and S. L. ZEGER, 2002 Analysis of Longitudinal Data. Oxford University Press, Oxford.

DONOHO, D. L., 1995 De-noising by soft-thresholding. IEEE Trans. Inform. Theory 41: 613–627.[CrossRef]

DONOHO, D. L., and I. M. JOHNSTONE, 1994 Ideal spatial adaptation by wavelet shrinkage. Biometrika 81: 425–455.[Abstract/Free Full Text]

DONOHO, D. L., I. M. JOHNSTONE, G. KERKYACHARIAN and D. PICARD, 1995 Wavelet shrinkage: Asymptopia? J. R. Stat. Soc. B 57: 301–369.

FAN, J. Q., and S. K. LIN, 1998 Test of significance when data are curves. J. Am. Stat. Assoc. 93: 1007–1021.[CrossRef]

GHAEL, S. P., A. M. SAYEED and R. G. BARANIUK, 1997 Improved wavelet denoising via empirical Wiener filtering. Proceedings of SPIE: Wavelet Applications in Signal and Imaging, San Diego, Vol. 3169, pp. 389–399.

HO, D. D., A. U. NEUMANN, A. S. PERELSON, W. CHEN, J. M LEONARD et al., 1995 Rapid turnover of plasma virions and CD4 lymphocytes in HIV-1 infection. Nature 373: 123–126.[CrossRef][Medline]

JAFFRÉZIC, F., R. THOMPSON and W. G. HILL, 2003 Structured antedependence models for genetic analysis of repeated measures on multiple quantitative traits. Genet. Res. 82: 55–65.[CrossRef][Medline]

JENSEN, A., and A. LA COUR-HARBO, 2001 Ripples in Mathematics. Springer, New York.

JOHNSTONE, I. M., and B. W. SILVERMAN, 1997 Wavelet threshold estimators for data with correlated noise. J. Roy. Stat. Soc. Ser. B 59: 319–351.[CrossRef]

LANDER, E. S., and D. BOTSTEIN, 1989 Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121: 185–199.[Abstract/Free Full Text]

LYNCH, M., and B. WALSH, 1998 Genetics and Analysis of Quantitative Traits. Sinauer, Sunderland, MA.

MA, C. X., G. CASELLA and R. L. WU, 2002 Functional mapping of quantitative trait loci underlying the character process: a theoretical framework. Genetics 161: 1751–1762.[Abstract/Free Full Text]

MACGREGOR, S., S. A. KNOTT, I. WHITE and P. M. VISSCHER, 2005 Quantitative trait locus analysis of longitudinal quantitative trait data in complex pedigrees. Genetics 171: 1365–1376.[Abstract/Free Full Text]

MALLAT, S. G., 1989 A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans. Patt. Anal. Machine Intell. 11: 674–693.[CrossRef]

MALLAT, S., 1998 A Wavelet Tour of Signal Processing. Academic Press, New York/London/San Diego.

MORRIS, J. S., and R. J. CARROLL, 2006 Wavelet-based functional mixed models. J. R. Stat. Soc. Ser. B 68: 179–199.[CrossRef]

MORRIS, J. S., M. VANNUCCI, P. J. BROWN and R. J. CARROLL, 2003 Wavelet-based nonparametric modeling of hierarchical functions in colon carcinogenesis. J. Am. Stat. Assoc. 98: 573–597.[CrossRef]

MORRIS, J. S., S. ARROYO, B. A. COULL, L. M. RYAN, R. HERRICK et al., 2006 Using wavelet-based functional mixed models to characterize population heterogeneity in accelerometer profiles: a case study. J. Am. Stat. Assoc. 101: 1352–1364.[CrossRef]

NÚÑEZ-ANTÓN, V., and D. L. ZIMMERMAN, 2000 Modeling nonstationary longitudinal data. Biometrics 56: 699–705.[CrossRef][Medline]

PAN, J. X., and G. MACKENZIE, 2003 On modelling mean-covariance structures in longitudinal studies. Biometrika 90: 239–244.[Abstract/Free Full Text]

POURAHMADI, M., 1999 Joint mean-covariance models with applications to longitudinal data: unconstrained parameterisation. Biometrika 86: 677–690.[Abstract/Free Full Text]

POURAHMADI, M., 2000 Maximum likelihood estimation of generalised linear models for multivariate normal covariance matrix. Biometrika 87: 425–435.[Abstract/Free Full Text]

SCHEPER, T. O., D. KLINKENBERG, C. PENNARTZ and J. VAN PELT, 1999 A mathematical model for the intracellular circadian rhythm generator. J. Neurosci. 19: 40–47.[Abstract/Free Full Text]

SY, J. P., J. M. TAYLOR and W. G. CUMBERLAND, 1997 A stochastic model for the analysis of bivariate longitudinal AIDS data. Biometrics 53: 542–555.[CrossRef][Medline]

VIDAKOVIC, B., 1999 Statistical Modeling by Wavelets. Wiley, New York.

VON BERTALANFFY, L., 1957 Quantitative laws in metabolism and growth. Q. Rev. Biol. 32: 217–231.[CrossRef][Medline]

WALKER, J. S., 1999 A Primer on Wavelets for Their Scientific Applications. Chapman & Hall/CRC, London/New York/Cleveland/Boca Raton, FL.

WEST, G. B., J. H. BROWN and B. J. ENQUIST, 2001 A general model for ontogenetic growth. Nature 413: 628–631.[CrossRef]

WU, H., and A. DING, 1999 Population HIV-1 dynamics in vivo: applicable models and inferential tools for virological data from AIDS clinical trials. Biometrics 55: 410–418.[CrossRef][Medline]

WU, R. L., and M. LIN, 2006 Functional mapping—how to map and study the genetic architecture of dynamic complex traits. Nat. Rev. Genet. 7: 229–237.[CrossRef][Medline]

WU, R. L., C.-X. MA, W. ZHAO and G. CASELLA, 2003 Functional mapping of quantitative trait loci underlying growth rates: a parametric model. Physiol. Genomics 14: 241–249.[Abstract/Free Full Text]

WU, R. L., C.-X. MA, M. LIN and G. CASELLA, 2004a A general framework for analyzing the genetic architecture of developmental characteristics. Genetics 166: 1541–1551.[Abstract/Free Full Text]

WU, R. L., C.-X. MA, M. LIN, Z. H. WANG and G. CASELLA, 2004b Functional mapping of quantitative trait loci underlying growth trajectories using a transform-both-sides logistic model. Biometrics 60: 729–738.[CrossRef][Medline]

WU, R. L., Z. H. WANG, W. ZHAO and J. M. CHEVERUD, 2004c A mechanistic model for genetic machinery of ontogenetic growth. Genetics 168: 2383–2394.[Abstract/Free Full Text]

WU, W. B., and M. POURAHMADI, 2003 Nonparametric estimation of large covariance matrices of longitudinal data. Biometrika 90: 831–844.[Abstract/Free Full Text]

YANG, R. Q., Q. TIAN and S. Z. XU, 2006 Mapping quantitative trait loci for longitudinal traits in line crosses. Genetics 173: 2339–2356.[Abstract/Free Full Text]

ZENG, Z.-B., 1994 Precision mapping of quantitative trait loci. Genetics 136: 1457–1468.[Abstract]

ZHAO, W., R. L. WU, C.-X. MA and G. CASELLA, 2004a A fast algorithm for functional mapping of complex traits. Genetics 167: 2133–2137.[Abstract/Free Full Text]

ZHAO, W., C.-X. MA, M. CHEVERUD and R. L. WU, 2004b A unifying statistical model for QTL mapping of genotype x sex interaction for developmental trajectories. Physiol. Genomics 19: 218–227.[Abstract/Free Full Text]

ZHAO, W., Y. Q. CHEN, G. CASELLA, J. M. CHEVERUD and R. L WU, 2005 A nonstationary model for functional mapping of complex traits. Bioinformatics 21: 2469–2477.[Abstract/Free Full Text]

ZIMMERMAN, D. L., and V. NÚÑEZ-ANTÓN, 2001 Parametric modeling of growth curve data: an overview. Test 10: 1–73.[CrossRef]

Communicating editor: J. B. WALSH





This Article
Right arrow Abstract Freely available
Right arrow Full Text (PDF)
Right arrow All Versions of this Article:
genetics.107.070920v1
176/3/1879    most recent
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Right arrow reprints & permissions
Citing Articles
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Zhao, W.
Right arrow Articles by Wu, R.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Zhao, W.
Right arrow Articles by Wu, R.


HOME HELP FEEDBACK