Genetics, Vol. 167, 2133-2137, August 2004, Copyright © 2004
doi:10.1534/genetics.103.024844

A Fast Algorithm for Functional Mapping of Complex Traits

Department of Statistics, University of Florida, Gainesville, Florida 32611

1 Corresponding author: Department of Statistics, 533 McCarty Hall C, University of Florida, Gainesville, FL 32611.
E-mail: rwu{at}stat.ufl.edu

Manuscript received November 21, 2003. Accepted for publication May 7, 2004.

ABSTRACT

By integrating the underlying developmental mechanisms for the phenotypic formation of traits into a mapping framework, functional mapping has emerged as an important statistical approach for mapping complex traits. In this note, we explore the feasibility of using the simplex algorithm as an alternative to solve the mixture-based likelihood for functional mapping of complex traits. The results from the simplex algorithm are consistent with those from the traditional EM algorithm, but the simplex algorithm has considerably reduced computational times. Moreover, because of its nonderivative nature and easy implementation with current software, the simplex algorithm enjoys an advantage over the EM algorithm in the dynamic modeling and analysis of complex traits.


THE statistical foundation of quantitative trait locus (QTL) detection and mapping is the mixture model. In this mixture model, each observation y is assumed to have arisen from one of L (L possibly unknown but finite) components, each component being modeled by a density from the parametric family f,

(1)
where {pi} = ({pi}1, ... , {pi}L)T are the mixture proportions that are constrained to be nonnegative and sum to unity; {phi} = ({phi}1, ... , {phi}L)T are the component-specific parameters, with {phi}l being specific to component l; and {eta} is a parameter that is common to all components.

The genetic mapping of QTL based on the mixture model contains three major steps: first, derive the mixture proportions ({pi}) denoted as the frequencies of QTL genotypes for a particular genetic design; second, determine the distribution density function for each genotype in terms of QTL effects ({phi}) and residual variance ({eta}); third, provide estimates of unknown parameters contained in the mixture model. The first step uses Mendelian and population genetics relevant to experimental design, marker type, and population structure. The second step needs basic principles from quantitative genetics, general biology, and biomedicine. Most QTL mapping methods are based on the normal distribution because the traits of interest are generally continuously distributed. The last step, aimed at solving the mixture model, relies upon statistical theories and computational algorithms. Three computational algorithms have been developed to solve the QTL parameters contained in Equation 1: least-squares regression (LSR) analysis (KNOTT and HALEY 2000), maximum-likelihood-based expectation-maximization (EM) algorithm (DEMPSTER et al. 1977; JANSEN and STAM 1994), and Bayesian-based Markov chain Monte Carlo (MCMC) algorithm (SILLANPAA and ARJAS 1999). These three algorithms have been used as standard methods for QTL mapping because they are founded on solid statistical backgrounds and because their statistical properties have been investigated extensively (see the references cited above).

For most of the current statistical mapping methods devised under the assumption that there is a direct relationship between the genotype and phenotype (LANDER and BOTSTEIN 1989; reviewed in JANSEN 2000), LSR, EM, and MCMC algorithms are basically adequate to provide a reasonable solution of the mixture model. However, they would encounter significant difficulties in model solution when statistical methods are upgraded to incorporate inherited biological complexities. For example, the formation of quantitative traits is under developmental control, involving an intricate array of factors, genetic or nongenetic, and interactions among factors. For any organism, the output of a complex trait, i.e., phenotype, and its underpinning blueprint, i.e., genotype, are related through a particular developmental process or network (WOLF 2002). More recently, a general genetic theory has been formulated to integrate developmental mechanisms of trait formation into the mapping framework constructed by a mixture-based likelihood (MA et al. 2002; WU et al. 2002). This integrated model, called functional mapping by MA et al. (2002), has proven powerful in mapping the QTL that govern growth trajectories in long-lived forest trees.

Although the QTL detected from functional mapping are expected to be biologically more relevant than those from traditional methods (which do not consider any developmental mechanisms), the incorporation of a developmental mechanism that is described by a mathematical function often needs extensive mathematical manipulations, including differentiation when the EM algorithm is derived (MA et al. 2002), and computer programs. However, descriptive mathematical functions are often complicated nonlinear equations in many situations and, therefore, it is extremely time consuming to derive, program, and estimate the solutions of the likelihood. To make functional mapping more applicable in the genetic dissection of complex traits, it is essential to develop a fast and efficient computational algorithm for solving the likelihood equations.

The Nelder-Mead simplex algorithm, originally proposed by NELDER and MEAD (1965), is a direct-search method for nonlinear unconstrained optimization. It attempts to minimize a scalar-valued nonlinear function using only function values, without any derivative information (explicit or implicit). The algorithm uses linear adjustment of the parameters until some convergence criterion is met. The term "simplex" arises because the feasible solutions for the parameters may be represented by a polytope figure called a simplex. The simplex is a line in one dimension, triangle in two dimensions, and tetrahedron in three dimensions, respectively.

We illustrate how the simplex algorithm works using the following general optimization problem,

where f(·) is a nonlinear function with n parameters. The simplex algorithm uses three basic moves: reflection, expansion, and contraction (Figure 1 shows the three moves for n = 2). It first takes n + 1 points, x1, x2, ... , xn+1, to construct a simplex and calculates the function values f(xi), for i = 1, 2, ... , n + 1. The point of maximum value is then reflected. Depending on whether the value of the reflected point is a new minimum, expansion or contraction may follow the reflection to form a new simplex. If a false contraction is encountered, the algorithm will start an additional shrinkage process.



View larger version (14K):
In this window
In a new window
Download PPT slide
 
FIGURE 1.—

The three basic movements in the simplex algorithm.

 
The Nelder-Mead simplex algorithm (NELDER and MEAD 1965) and its variants have been some of the most widely used methods for nonlinear unconstrained optimization. They have been applied widely in a variety of fields including chemistry, engineering, medicine, and food science (WALTERS et al. 1991; CASTRO et al. 2003). The simplex algorithm has three attractive properties: first, it is free of any explicit or implicit derivative information when minimizing a scalar-valued nonlinear function, which makes it much less prone to finding false minima (see LAGARIUS et al. 1998; PRICE et al. 2002 for discussion on the convergence issues). Another important feature of the simplex algorithm is that no divisions are required in the calculation; thus the "divided by zero" runtime error can be avoided. In functional mapping, various complicated nonlinear functions are embedded in the mixture-based likelihood. Newton-Raphson (PRESS et al. 1992) and EM-like (DEMPSTER et al. 1977) algorithms involve first and second derivatives of the likelihood, which can become intractable quickly when the complexity of the function increases. Second, in practice it typically requires only one or two function evaluations to construct a new simplex in one iteration; thus it runs very fast. Third, it is easy to implement with existing computer software, such as Matlab, which provides a user-friendly function fminsearch for a direct usage of the method. The computer source code is also downloadable from the Internet.

Because of these favorable properties of the simplex algorithm, this algorithm has been used by some authors to map complex traits (MARTINEZ et al. 1998; PEREZ-ENCISO and VARONA 2000; LIU et al. 2001; PEREZ-ENCISO et al. 2002). However, all of these studies presented only a simple utilization of the simplex algorithm to QTL mapping with no detailed discussion on its advantages in computational efficiency and mathematical manipulation and its disadvantages in some asymptotic properties. Some of these studies did not cite even the seminal article by NELDER and MEAD (1965). Clearly, with no such discussion, the broad implications of the simplex algorithm remain unjustified. In this note, we for the first time compare the results and computational efficiency for QTL mapping between the simplex algorithm and the standard algorithm. We demonstrate that the simplex algorithm is an advantageous alternative when the purpose of a genome project is to map dynamic QTL that govern developmental trajectories.

As a special case of the mixture model (Equation 1), suppose that there is a progeny population of size N in which there are two different genotype groups at each locus with respective frequencies p and 1 – p. Let Y = (Y1, ... , YN) be N independent observations each having k-dimensional phenotypic measurements. On the basis of a multivariate mixture model, the likelihood of Y, affected by a putative QTL of genotypes Qq (denoted by 1) and qq (denoted by 0), is formulated as

(2)
where the vector {Omega} = (m1, m0, {sum})T contains k expected genotypic values, m1 and m0, for each QTL genotype and a (k x k) within-genotype residual (co)variance matrix, {sum}. We implement the simplex algorithm to obtain the maximum-likelihood estimates (MLE) of {Omega}.


Simulations:
Two simulation strategies are designed to compare the results from the simplex algorithm and the EM algorithm. The first strategy considers nonfunctional mapping, in which k-dimensional phenotypic traits measured have identical means for each QTL genotype group, i.e., m1 = 1kx1µ1 and m0 = 1kx1µ0, and a constant residual variance (there are no correlations between the m traits), i.e., {sum} = Ikxk{sigma}2. The data are simulated under a multivariate normal distribution, with different sample sizes (100, 200, and 400). Other simulation conditions are the heritability of 0.15 for each trait, a total of k = 15 phenotypic traits, and the QTL genotype frequency of p = 0.3. Means and standard deviations of the estimated parameters are calculated from 1000 simulations. All calculations are carried out on Dell Workstation PWS530.

Table 1 summarizes the results from the simplest model in which the hypothesized trait is measured only at a single time point. The MLEs of mean values and residual variances of the trait are compared from the simplex and EM algorithms. To prevent possible occurrences of local maxima, multiple tries for initial values were made for the simplex algorithm. The two algorithms give identical estimates of all the parameters and identical likelihood values for each sample size. For this simplest situation, the simplex algorithm needs a slightly longer computational time than the EM algorithm, suggesting that the former is not advantageous for multivariate trait mapping.


View this table:
In this window
In a new window

 
TABLE 1

Comparisons in the MLEs of the model parameters and the computational times (t) between the simplex and EM algorithms for a simple univariate mapping simulation strategy

 
The second simulation strategy incorporates logistic growth models into the mixture-based likelihood framework in Equation 2 (MA et al. 2002; WU et al. 2002). The expected genotypic values for each QTL genotype at different times are modeled by and . The results of the MLEs of growth parameters from functional mapping are almost identical between the simplex algorithm and the EM algorithm for all different sample sizes (Table 2). Yet, the computation times are substantially reduced when the simplex algorithm is used. On average, the time used for the simplex algorithm is only one-fifth of that for the EM algorithm. Thus, the simplex algorithm has a significant advantage in increasing computational efficiency over the traditional EM algorithm. This advantage is particularly striking if one is using a permutation argument to establish significance levels, as the calculation using the simplex algorithm would be substantially faster than the EM algorithm.


View this table:
In this window
In a new window

 
TABLE 2

Comparisons in the MLEs of the model parameters and the computational times (t) between the simplex and EM algorithms for the functional mapping simulation strategy

 


A worked example:
The example for functional mapping used in an outcrossing poplar by MA et al. (2002) is reanalyzed in this study. A full-sib family of 90 progeny was genotyped at a number of pseudo-test backcross molecular markers (YIN et al. 2002). Stem diameter growth was measured at the end of each growing season. MA et al.'s (2002) functional mapping model based on the EM algorithm has successfully identified a major QTL (P < 0.01) for 11-year diameter growth trajectories on linkage group D10. This major QTL is also identified by the simplex algorithm in this study (Figure 2). As expected, the simplex and EM algorithms produce an identical profile of the log-likelihood ratios for the full model (there is a QTL) vs. reduced model (there is no QTL) across the length of linkage group 10, thus providing the same estimates of the two different growth curves each for a QTL genotype. But the simplex algorithm uses only 85 seconds to finish a chromosome-wide QTL search, whereas the EM algorithm needs 122 seconds. This difference will be amplified by the number of permutation tests used to characterize the critical threshold for declaring the presence of QTL.



View larger version (13K):
In this window
In a new window
Download PPT slide
 
FIGURE 2.—

The profile of the log-likelihood ratios between the full and reduced (no QTL) model for diameter growth trajectories across linkage group 10 in the Populus deltoides parent map (YIN et al. 2002). The genomic positions corresponding to the peak of the curve are the MLEs of the QTL localization. The threshold values at P = 0.05 and 0.01 are given as horizonal lines.

 


Concluding remarks:
Optimization techniques, e.g., the simplex algorithm, have been used to map QTL affecting a complex trait (MARTINEZ et al. 1998; PEREZ-ENCISO and VARONA 2000; LIU et al. 2001; PEREZ-ENCISO et al. 2002). However, none of these studies has explored statistical properties of this algorithm and made any comparison between it and the standard algorithms, such as the EM algorithm. In practice, the simplex algorithm may converge to local maxima and multiple tries for initial values should be made to guarantee the occurrence of global maxima. More recently, the implication of advanced optimization techniques like genetic algorithms has been recognized for mapping epistatic or linked QTL (CARLBORG et al. 2000; NAKAMICHI et al. 2001), but no attempt was made to identify their roles in functional mapping aimed at the identification of more biologically relevant dynamic QTL (MA et al. 2002).

In this study, we incorporated the Nelder-Mead simplex algorithm into the framework of QTL mapping. Our simulation study suggests that the simplex algorithm obtains the same results as the traditional EM algorithm, but it uses different amounts of computational time from the EM algorithm. If QTL mapping is performed for a trait measured at a single point, the simplex algorithm, although simple to implement, uses a longer time and, hence, displays no advantage, as compared to the EM algorithm. However, for functional mapping of a complex trait undergoing a particular developmental trajectory, the simplex algorithm is much more computationally efficient, as well demonstrated by both simulation and real-world examples. In conjunction with its derivative-free nature and easy implementation with software, therefore, the simplex algorithm provides an excellent alternative to estimate the mixture-based model for functional mapping.


ACKNOWLEDGEMENTS
We thank Max Shen for his contribution of expertise to this work and Chris Haley for bringing to our attention the literature of the simplex algorithm for QTL mapping. This work is partially supported by grants from the National Science Foundation to G.C. (DMS9971586) and by the Outstanding Young Investigator Award of the National Natural Science Foundation of China (30128017) and the University of Florida Research Opportunity Fund (02050259) to R.W. The publication of this article is approved as journal series R-10067 by the Florida Agricultural Experiment Station.


LITERATURE CITED

CARLBORG, O., L. ANDERSSON and B. KINGHORN, 2000 The use of a genetic algorithm for simultaneous mapping of multiple interacting quantitative trait loci. Genetics 155: 2003–2010.[Abstract/Free Full Text]

CASTRO, I. A., R. S. F. SILVA, J. TIRAPEGUI, D. BORSATO and E. BONA, 2003 Simultaneous optimization of response variables in protein mixture formulation: constrained simplex method approach. Int. J. Food Sci. Tech. 28: 103–110.[CrossRef]

DEMPSTER, A. P., N. M. LAIRD and D. B. RUBIN, 1977 Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. 39: 1–38.

JANSEN, R. C., 2000 Quantitative trait loci in inbred lines, pp. 567–597 in Handbook of Statistical Genetics, edited by D. J. BALDING, M. BISHOP and C. CANNINGS. Wiley, New York.

JANSEN, R. C., and P. STAM, 1994 High resolution of quantitative traits into multiple loci via interval mapping. Genetics 136: 1447–1455.[Abstract]

KNOTT, S. A., and C. S. HALEY, 2000 Multitrait least squares for quantitative trait loci detection. Genetics 156: 899–911.[Abstract/Free Full Text]

LAGARIUS, J. C., J. A. REEDS, M. H. WRIGHT and P. E. WRIGHT, 1998 Convergence properties of the Nelder-Mead simplex method in low dimensions. Soc. Ind. Appl. Math. J. Optim. 9: 112–147.

LANDER, E. S., and D. BOTSTEIN, 1989 Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121: 185–199.[Abstract/Free Full Text]

LIU, X.-J., F. OLOVER, S. D. M. BROWN, A. DENNY and P. D. KEIGHTLEY, 2001 High-resolution quantitative trait locus mapping for body weight in mice by recombinant progeny testing. Genet. Res. 77: 191–197.[CrossRef][Medline]

MA, C. X., G. CASELLA and R. L. WU, 2002 Functional mapping of quantitative trait loci underlying the character process: a theoretical framework. Genetics 161: 1751–1762.[Abstract/Free Full Text]

MARTINEZ, M., N. VUKASINOVIC, A. E. FREEMAN and R. L. FERNANDO, 1998 Mapping QTL in outbred populations using selected samples. Genet. Sel. Evol. 30: 453–468.

NAKAMICHI, R., Y. UKAI and H. KISHINO, 2001 Detection of closely linked multiple quantitative trait loci using a genetic algorithm. Genetics 158: 463–475.[Abstract/Free Full Text]

NELDER, J. A., and R. MEAD, 1965 A simplex method for function minimization. Comput. J. 7: 308–313.

PEREZ-ENCISO, M., and L. VARONA, 2000 Quantitative trait loci mapping in F2 crosses between outbred lines. Genetics 155: 391–405.[Abstract/Free Full Text]

PEREZ-ENCISO, M., A. CLOP, J. M. FOLCH, A. SANCHEZ, M. A. OLIVER et al., 2002 Exploring alternative models for sex-linked quantitative trait loci in outbred populations: application to an Iberian x Landrace pig intercross. Genetics 161: 1625–1632.[Abstract/Free Full Text]

PRESS, W. H., S. A. TEUKOLSKY, W. T. VETTERLING and B. P. FLANNERY, 1992 Numerical Recipes in C, Ed. 2. Cambridge University Press, Cambridge/London/New York.

PRICE, C. J., I. D. COOPE and D. BYATT, 2002 A convergent variant of the Nelder-Mead algorithm. J. Optim. Theor. Appl. 113: 5–19.[CrossRef]

SILLANPAA, M. J., and E. ARJAS, 1999 Bayesian mapping of multiple quantitative trait loci from incomplete outbred offspring data. Genetics 151: 1605–1619.[Abstract/Free Full Text]

WALTERS, F. H., L. R. PARKER, S. L. MORGAN and S. N. DEMING, 1991 Sequential Simplex Optimization. CRC Press, Boca Raton, FL.

WOLF, J. B., 2002 The geometry of phenotypic evolution in developmental hyperspace. Proc. Natl. Acad. Sci. USA 99: 15849–15851.[Free Full Text]

WU, R. L., C.-X. MA, M. R. CHANG, R. C. LITTELL, S. S. WU et al., 2002 A logistic mixture model for characterizing genetic determinants causing differentiation in growth trajectories. Genet. Res. 79: 235–245.[CrossRef][Medline]

YIN, T. M., X. Y. ZHANG, M. R. HUANG, M. X. WANG, Q. ZHUGE et al., 2002 Molecular linkage maps of the Populus genome. Genome 45: 541–555.[Medline]




This article has been cited by other articles:


Home page
GeneticsHome page
R. Yang, H. Gao, X. Wang, J. Zhang, Z.-B. Zeng, and R. Wu
A Semiparametric Approach for Composite Functional Mapping of Dynamic Quantitative Traits
Genetics, November 1, 2007; 177(3): 1859 - 1870.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
S. Wu, J. Yang, and R. Wu
Semiparametric functional mapping of quantitative trait loci governing long-term HIV dynamics
Bioinformatics, July 1, 2007; 23(13): i569 - i576.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
W. Zhao, H. Li, W. Hou, and R. Wu
Wavelet-Based Parametric Functional Mapping of Developmental Trajectories With High-Dimensional Data
Genetics, July 1, 2007; 176(3): 1879 - 1892.
[Abstract] [Full Text] [PDF]


Home page
Physiol. GenomicsHome page
Y. Cui, J. Zhu, and R. Wu
Functional mapping for genetic control of programmed cell death
Physiol Genomics, May 16, 2006; 25(3): 458 - 469.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
R. Wu and W. Hou
A Hyperspace Model to Decipher the Genetic Architecture of Developmental Processes: Allometry Meets Ontogeny
Genetics, January 1, 2006; 172(1): 627 - 637.
[Abstract] [Full Text] [PDF]


Home page
BiostatisticsHome page
W. Hou, C. W. Garvan, W. Zhao, M. Behnke, F. D. Eyler, and R. Wu
A general model for detecting genetic determinants underlying longitudinal traits with unequally spaced measurements and nonstationary covariance structure
Biostat., July 1, 2005; 6(3): 420 - 433.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
M. Lin and R. Wu
Theoretical Basis for the Identification of Allelic Variants That Encode Drug Efficacy and Toxicity
Genetics, June 1, 2005; 170(2): 919 - 928.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
W. Zhao, Y. Q. Chen, G. Casella, J. M. Cheverud, and R. Wu
A non-stationary model for functional mapping of complex traits
Bioinformatics, May 15, 2005; 21(10): 2469 - 2477.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
R. Wu, Z. Wang, W. Zhao, and J. M. Cheverud
A Mechanistic Model for Genetic Machinery of Ontogenetic Growth
Genetics, December 1, 2004; 168(4): 2383 - 2394.
[Abstract] [Full Text] [PDF]