## Abstract

By integrating the underlying developmental mechanisms for the phenotypic formation of traits into a mapping framework, functional mapping has emerged as an important statistical approach for mapping complex traits. In this note, we explore the feasibility of using the simplex algorithm as an alternative to solve the mixture-based likelihood for functional mapping of complex traits. The results from the simplex algorithm are consistent with those from the traditional EM algorithm, but the simplex algorithm has considerably reduced computational times. Moreover, because of its nonderivative nature and easy implementation with current software, the simplex algorithm enjoys an advantage over the EM algorithm in the dynamic modeling and analysis of complex traits.

THE statistical foundation of quantitative trait locus (QTL) detection and mapping is the mixture model. In this mixture model, each observation *y* is assumed to have arisen from one of *L* (*L* possibly unknown but finite) components, each component being modeled by a density from the parametric family *f*, 1where π = (π_{1}, … , π* _{L}*)

^{T}are the mixture proportions that are constrained to be nonnegative and sum to unity; φ = (φ

_{1}, … , φ

*)*

_{L}^{T}are the component-specific parameters, with φ

*being specific to component*

_{l}*l*; and η is a parameter that is common to all components.

The genetic mapping of QTL based on the mixture model contains three major steps: first, derive the mixture proportions (π) denoted as the frequencies of QTL genotypes for a particular genetic design; second, determine the distribution density function for each genotype in terms of QTL effects (φ) and residual variance (η); third, provide estimates of unknown parameters contained in the mixture model. The first step uses Mendelian and population genetics relevant to experimental design, marker type, and population structure. The second step needs basic principles from quantitative genetics, general biology, and biomedicine. Most QTL mapping methods are based on the normal distribution because the traits of interest are generally continuously distributed. The last step, aimed at solving the mixture model, relies upon statistical theories and computational algorithms. Three computational algorithms have been developed to solve the QTL parameters contained in Equation 1: least-squares regression (LSR) analysis (Knott and Haley 2000), maximum-likelihood-based expectation-maximization (EM) algorithm (Dempster* et al*. 1977; Jansen and Stam 1994), and Bayesian-based Markov chain Monte Carlo (MCMC) algorithm (Sillanpaa and Arjas 1999). These three algorithms have been used as standard methods for QTL mapping because they are founded on solid statistical backgrounds and because their statistical properties have been investigated extensively (see the references cited above).

For most of the current statistical mapping methods devised under the assumption that there is a direct relationship between the genotype and phenotype (Lander and Botstein 1989; reviewed in Jansen 2000), LSR, EM, and MCMC algorithms are basically adequate to provide a reasonable solution of the mixture model. However, they would encounter significant difficulties in model solution when statistical methods are upgraded to incorporate inherited biological complexities. For example, the formation of quantitative traits is under developmental control, involving an intricate array of factors, genetic or nongenetic, and interactions among factors. For any organism, the output of a complex trait, *i.e*., *phenotype*, and its underpinning blueprint, *i.e*., *genotype*, are related through a particular developmental process or network (Wolf 2002). More recently, a general genetic theory has been formulated to integrate developmental mechanisms of trait formation into the mapping framework constructed by a mixture-based likelihood (Ma* et al*. 2002; Wu* et al.* 2002). This integrated model, called *functional mapping* by Ma* et al.* (2002), has proven powerful in mapping the QTL that govern growth trajectories in long-lived forest trees.

Although the QTL detected from functional mapping are expected to be biologically more relevant than those from traditional methods (which do not consider any developmental mechanisms), the incorporation of a developmental mechanism that is described by a mathematical function often needs extensive mathematical manipulations, including differentiation when the EM algorithm is derived (Ma* et al*. 2002), and computer programs. However, descriptive mathematical functions are often complicated nonlinear equations in many situations and, therefore, it is extremely time consuming to derive, program, and estimate the solutions of the likelihood. To make functional mapping more applicable in the genetic dissection of complex traits, it is essential to develop a fast and efficient computational algorithm for solving the likelihood equations.

The Nelder-Mead simplex algorithm, originally proposed by Nelder and Mead (1965), is a direct-search method for nonlinear unconstrained optimization. It attempts to minimize a scalar-valued nonlinear function using only function values, without any derivative information (explicit or implicit). The algorithm uses linear adjustment of the parameters until some convergence criterion is met. The term “simplex” arises because the feasible solutions for the parameters may be represented by a polytope figure called a simplex. The simplex is a line in one dimension, triangle in two dimensions, and tetrahedron in three dimensions, respectively.

We illustrate how the simplex algorithm works using the following general optimization problem, where *f*(·) is a nonlinear function with *n* parameters. The simplex algorithm uses three basic moves: reflection, expansion, and contraction (Figure 1 shows the three moves for *n* = 2). It first takes *n* + 1 points, *x*_{1}, *x*_{2}, … , *x _{n}*

_{+1}, to construct a simplex and calculates the function values

*f*(

*x*), for

_{i}*i*= 1, 2, … ,

*n*+ 1. The point of maximum value is then reflected. Depending on whether the value of the reflected point is a new minimum, expansion or contraction may follow the reflection to form a new simplex. If a false contraction is encountered, the algorithm will start an additional shrinkage process.

The Nelder-Mead simplex algorithm (Nelder and Mead 1965) and its variants have been some of the most widely used methods for nonlinear unconstrained optimization. They have been applied widely in a variety of fields including chemistry, engineering, medicine, and food science (Walters* et al*. 1991; Castro* et al*. 2003). The simplex algorithm has three attractive properties: first, it is free of any explicit or implicit derivative information when minimizing a scalar-valued nonlinear function, which makes it much less prone to finding false minima (see Lagarius* et al*. 1998; Price* et al*. 2002 for discussion on the convergence issues). Another important feature of the simplex algorithm is that no divisions are required in the calculation; thus the “divided by zero” runtime error can be avoided. In functional mapping, various complicated nonlinear functions are embedded in the mixture-based likelihood. Newton-Raphson (Press* et al*. 1992) and EM-like (Dempster* et al.* 1977) algorithms involve first and second derivatives of the likelihood, which can become intractable quickly when the complexity of the function increases. Second, in practice it typically requires only one or two function evaluations to construct a new simplex in one iteration; thus it runs very fast. Third, it is easy to implement with existing computer software, such as Matlab, which provides a user-friendly function *fminsearch* for a direct usage of the method. The computer source code is also downloadable from the Internet.

Because of these favorable properties of the simplex algorithm, this algorithm has been used by some authors to map complex traits (Martinez* et al*. 1998; Perez-Enciso and Varona 2000; Liu* et al*. 2001; Perez-Enciso* et al*. 2002). However, all of these studies presented only a simple utilization of the simplex algorithm to QTL mapping with no detailed discussion on its advantages in computational efficiency and mathematical manipulation and its disadvantages in some asymptotic properties. Some of these studies did not cite even the seminal article by Nelder and Mead (1965). Clearly, with no such discussion, the broad implications of the simplex algorithm remain unjustified. In this note, we for the first time compare the results and computational efficiency for QTL mapping between the simplex algorithm and the standard algorithm. We demonstrate that the simplex algorithm is an advantageous alternative when the purpose of a genome project is to map dynamic QTL that govern developmental trajectories.

As a special case of the mixture model (Equation 1), suppose that there is a progeny population of size *N* in which there are two different genotype groups at each locus with respective frequencies *p* and 1 − *p*. Let **Y** = (**Y**_{1}, … , **Y*** _{N}*) be

*N*independent observations each having

*k*-dimensional phenotypic measurements. On the basis of a multivariate mixture model, the likelihood of

**Y**, affected by a putative QTL of genotypes

**Ω**= (

**m**

_{1},

**m**

_{0}, ∑)

^{T}contains

*k*expected genotypic values,

**m**

_{1}and

**m**

_{0}, for each QTL genotype and a (

*k*×

*k*) within-genotype residual (co)variance matrix, ∑. We implement the simplex algorithm to obtain the maximum-likelihood estimates (MLE) of

**Ω**.

## Simulations:

Two simulation strategies are designed to compare the results from the simplex algorithm and the EM algorithm. The first strategy considers nonfunctional mapping, in which *k*-dimensional phenotypic traits measured have identical means for each QTL genotype group, *i.e*., **m**_{1} = **1**_{k}_{×1}μ_{1} and **m**_{0} = **1**_{k}_{×1}μ_{0}, and a constant residual variance (there are no correlations between the *m* traits), *i.e*., **∑** = **I**_{k}_{×}* _{k}*σ

^{2}. The data are simulated under a multivariate normal distribution, with different sample sizes (100, 200, and 400). Other simulation conditions are the heritability of 0.15 for each trait, a total of

*k*= 15 phenotypic traits, and the QTL genotype frequency of

*p*= 0.3. Means and standard deviations of the estimated parameters are calculated from 1000 simulations. All calculations are carried out on Dell Workstation PWS530.

Table 1 summarizes the results from the simplest model in which the hypothesized trait is measured only at a single time point. The MLEs of mean values and residual variances of the trait are compared from the simplex and EM algorithms. To prevent possible occurrences of local maxima, multiple tries for initial values were made for the simplex algorithm. The two algorithms give identical estimates of all the parameters and identical likelihood values for each sample size. For this simplest situation, the simplex algorithm needs a slightly longer computational time than the EM algorithm, suggesting that the former is not advantageous for multivariate trait mapping.

The second simulation strategy incorporates logistic growth models into the mixture-based likelihood framework in Equation 2 (Ma* et al*. 2002; Wu* et al*. 2002). The expected genotypic values for each QTL genotype at different times are modeled by and . The results of the MLEs of growth parameters from functional mapping are almost identical between the simplex algorithm and the EM algorithm for all different sample sizes (Table 2). Yet, the computation times are substantially reduced when the simplex algorithm is used. On average, the time used for the simplex algorithm is only one-fifth of that for the EM algorithm. Thus, the simplex algorithm has a significant advantage in increasing computational efficiency over the traditional EM algorithm. This advantage is particularly striking if one is using a permutation argument to establish significance levels, as the calculation using the simplex algorithm would be substantially faster than the EM algorithm.

## A worked example:

The example for functional mapping used in an outcrossing poplar by Ma* et al.* (2002) is reanalyzed in this study. A full-sib family of 90 progeny was genotyped at a number of pseudo-test backcross molecular markers (Yin* et al*. 2002). Stem diameter growth was measured at the end of each growing season. Ma *et al.*'s (2002) functional mapping model based on the EM algorithm has successfully identified a major QTL (*P* < 0.01) for 11-year diameter growth trajectories on linkage group D10. This major QTL is also identified by the simplex algorithm in this study (Figure 2). As expected, the simplex and EM algorithms produce an identical profile of the log-likelihood ratios for the full model (there is a QTL) *vs*. reduced model (there is no QTL) across the length of linkage group 10, thus providing the same estimates of the two different growth curves each for a QTL genotype. But the simplex algorithm uses only 85 seconds to finish a chromosome-wide QTL search, whereas the EM algorithm needs 122 seconds. This difference will be amplified by the number of permutation tests used to characterize the critical threshold for declaring the presence of QTL.

## Concluding remarks:

Optimization techniques, *e.g*., the simplex algorithm, have been used to map QTL affecting a complex trait (Martinez* et al*. 1998; Perez-Enciso and Varona 2000; Liu* et al*. 2001; Perez-Enciso* et al*. 2002). However, none of these studies has explored statistical properties of this algorithm and made any comparison between it and the standard algorithms, such as the EM algorithm. In practice, the simplex algorithm may converge to local maxima and multiple tries for initial values should be made to guarantee the occurrence of global maxima. More recently, the implication of advanced optimization techniques like genetic algorithms has been recognized for mapping epistatic or linked QTL (Carlborg* et al.* 2000; Nakamichi* et al.* 2001), but no attempt was made to identify their roles in functional mapping aimed at the identification of more biologically relevant dynamic QTL (Ma* et al*. 2002).

In this study, we incorporated the Nelder-Mead simplex algorithm into the framework of QTL mapping. Our simulation study suggests that the simplex algorithm obtains the same results as the traditional EM algorithm, but it uses different amounts of computational time from the EM algorithm. If QTL mapping is performed for a trait measured at a single point, the simplex algorithm, although simple to implement, uses a longer time and, hence, displays no advantage, as compared to the EM algorithm. However, for functional mapping of a complex trait undergoing a particular developmental trajectory, the simplex algorithm is much more computationally efficient, as well demonstrated by both simulation and real-world examples. In conjunction with its derivative-free nature and easy implementation with software, therefore, the simplex algorithm provides an excellent alternative to estimate the mixture-based model for functional mapping.

## Acknowledgments

We thank Max Shen for his contribution of expertise to this work and Chris Haley for bringing to our attention the literature of the simplex algorithm for QTL mapping. This work is partially supported by grants from the National Science Foundation to G.C. (DMS9971586) and by the Outstanding Young Investigator Award of the National Natural Science Foundation of China (30128017) and the University of Florida Research Opportunity Fund (02050259) to R.W. The publication of this article is approved as journal series R-10067 by the Florida Agricultural Experiment Station.

## Footnotes

Communicating editor: C. Haley

- Received November 21, 2003.
- Accepted May 7, 2004.

- Genetics Society of America