| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |
Department of Statistics, University of Florida, Gainesville, Florida 32611
1 Corresponding author: Department of Statistics, 533 McCarty Hall C, University of Florida, Gainesville, FL 32611.
E-mail: rwu{at}stat.ufl.edu
| ABSTRACT |
|---|
|
|
|---|
![]() | (1) |
= (
1, ... ,
L)T are the mixture proportions that are constrained to be nonnegative and sum to unity;
= (
1, ... ,
L)T are the component-specific parameters, with
l being specific to component l; and
is a parameter that is common to all components.
The genetic mapping of QTL based on the mixture model contains three major steps: first, derive the mixture proportions (
) denoted as the frequencies of QTL genotypes for a particular genetic design; second, determine the distribution density function for each genotype in terms of QTL effects (
) and residual variance (
); third, provide estimates of unknown parameters contained in the mixture model. The first step uses Mendelian and population genetics relevant to experimental design, marker type, and population structure. The second step needs basic principles from quantitative genetics, general biology, and biomedicine. Most QTL mapping methods are based on the normal distribution because the traits of interest are generally continuously distributed. The last step, aimed at solving the mixture model, relies upon statistical theories and computational algorithms. Three computational algorithms have been developed to solve the QTL parameters contained in Equation 1: least-squares regression (LSR) analysis (KNOTT and HALEY 2000), maximum-likelihood-based expectation-maximization (EM) algorithm (DEMPSTER et al. 1977; JANSEN and STAM 1994), and Bayesian-based Markov chain Monte Carlo (MCMC) algorithm (SILLANPAA and ARJAS 1999). These three algorithms have been used as standard methods for QTL mapping because they are founded on solid statistical backgrounds and because their statistical properties have been investigated extensively (see the references cited above).
For most of the current statistical mapping methods devised under the assumption that there is a direct relationship between the genotype and phenotype (LANDER and BOTSTEIN 1989; reviewed in JANSEN 2000), LSR, EM, and MCMC algorithms are basically adequate to provide a reasonable solution of the mixture model. However, they would encounter significant difficulties in model solution when statistical methods are upgraded to incorporate inherited biological complexities. For example, the formation of quantitative traits is under developmental control, involving an intricate array of factors, genetic or nongenetic, and interactions among factors. For any organism, the output of a complex trait, i.e., phenotype, and its underpinning blueprint, i.e., genotype, are related through a particular developmental process or network (WOLF 2002). More recently, a general genetic theory has been formulated to integrate developmental mechanisms of trait formation into the mapping framework constructed by a mixture-based likelihood (MA et al. 2002; WU et al. 2002). This integrated model, called functional mapping by MA et al. (2002), has proven powerful in mapping the QTL that govern growth trajectories in long-lived forest trees.
Although the QTL detected from functional mapping are expected to be biologically more relevant than those from traditional methods (which do not consider any developmental mechanisms), the incorporation of a developmental mechanism that is described by a mathematical function often needs extensive mathematical manipulations, including differentiation when the EM algorithm is derived (MA et al. 2002), and computer programs. However, descriptive mathematical functions are often complicated nonlinear equations in many situations and, therefore, it is extremely time consuming to derive, program, and estimate the solutions of the likelihood. To make functional mapping more applicable in the genetic dissection of complex traits, it is essential to develop a fast and efficient computational algorithm for solving the likelihood equations.
The Nelder-Mead simplex algorithm, originally proposed by NELDER and MEAD (1965), is a direct-search method for nonlinear unconstrained optimization. It attempts to minimize a scalar-valued nonlinear function using only function values, without any derivative information (explicit or implicit). The algorithm uses linear adjustment of the parameters until some convergence criterion is met. The term "simplex" arises because the feasible solutions for the parameters may be represented by a polytope figure called a simplex. The simplex is a line in one dimension, triangle in two dimensions, and tetrahedron in three dimensions, respectively.
We illustrate how the simplex algorithm works using the following general optimization problem,
![]() |
|
Because of these favorable properties of the simplex algorithm, this algorithm has been used by some authors to map complex traits (MARTINEZ et al. 1998; PEREZ-ENCISO and VARONA 2000; LIU et al. 2001; PEREZ-ENCISO et al. 2002). However, all of these studies presented only a simple utilization of the simplex algorithm to QTL mapping with no detailed discussion on its advantages in computational efficiency and mathematical manipulation and its disadvantages in some asymptotic properties. Some of these studies did not cite even the seminal article by NELDER and MEAD (1965). Clearly, with no such discussion, the broad implications of the simplex algorithm remain unjustified. In this note, we for the first time compare the results and computational efficiency for QTL mapping between the simplex algorithm and the standard algorithm. We demonstrate that the simplex algorithm is an advantageous alternative when the purpose of a genome project is to map dynamic QTL that govern developmental trajectories.
As a special case of the mixture model (Equation 1), suppose that there is a progeny population of size N in which there are two different genotype groups at each locus with respective frequencies p and 1 p. Let Y = (Y1, ... , YN) be N independent observations each having k-dimensional phenotypic measurements. On the basis of a multivariate mixture model, the likelihood of Y, affected by a putative QTL of genotypes Qq (denoted by 1) and qq (denoted by 0), is formulated as
![]() | (2) |
= (m1, m0,
)T contains k expected genotypic values, m1 and m0, for each QTL genotype and a (k x k) within-genotype residual (co)variance matrix,
. We implement the simplex algorithm to obtain the maximum-likelihood estimates (MLE) of
. | Simulations: |
|---|
|
|
|---|
= Ikxk
2. The data are simulated under a multivariate normal distribution, with different sample sizes (100, 200, and 400). Other simulation conditions are the heritability of 0.15 for each trait, a total of k = 15 phenotypic traits, and the QTL genotype frequency of p = 0.3. Means and standard deviations of the estimated parameters are calculated from 1000 simulations. All calculations are carried out on Dell Workstation PWS530. Table 1 summarizes the results from the simplest model in which the hypothesized trait is measured only at a single time point. The MLEs of mean values and residual variances of the trait are compared from the simplex and EM algorithms. To prevent possible occurrences of local maxima, multiple tries for initial values were made for the simplex algorithm. The two algorithms give identical estimates of all the parameters and identical likelihood values for each sample size. For this simplest situation, the simplex algorithm needs a slightly longer computational time than the EM algorithm, suggesting that the former is not advantageous for multivariate trait mapping.
|
and
. The results of the MLEs of growth parameters from functional mapping are almost identical between the simplex algorithm and the EM algorithm for all different sample sizes (Table 2). Yet, the computation times are substantially reduced when the simplex algorithm is used. On average, the time used for the simplex algorithm is only one-fifth of that for the EM algorithm. Thus, the simplex algorithm has a significant advantage in increasing computational efficiency over the traditional EM algorithm. This advantage is particularly striking if one is using a permutation argument to establish significance levels, as the calculation using the simplex algorithm would be substantially faster than the EM algorithm.
|
| A worked example: |
|---|
|
|
|---|
|
| Concluding remarks: |
|---|
|
|
|---|
In this study, we incorporated the Nelder-Mead simplex algorithm into the framework of QTL mapping. Our simulation study suggests that the simplex algorithm obtains the same results as the traditional EM algorithm, but it uses different amounts of computational time from the EM algorithm. If QTL mapping is performed for a trait measured at a single point, the simplex algorithm, although simple to implement, uses a longer time and, hence, displays no advantage, as compared to the EM algorithm. However, for functional mapping of a complex trait undergoing a particular developmental trajectory, the simplex algorithm is much more computationally efficient, as well demonstrated by both simulation and real-world examples. In conjunction with its derivative-free nature and easy implementation with software, therefore, the simplex algorithm provides an excellent alternative to estimate the mixture-based model for functional mapping.
| ACKNOWLEDGEMENTS |
|---|
|
|
|---|
| LITERATURE CITED |
|---|
|
|
|---|
CARLBORG, O., L. ANDERSSON and B. KINGHORN, 2000 The use of a genetic algorithm for simultaneous mapping of multiple interacting quantitative trait loci. Genetics 155: 20032010.
CASTRO, I. A., R. S. F. SILVA, J. TIRAPEGUI, D. BORSATO and E. BONA, 2003 Simultaneous optimization of response variables in protein mixture formulation: constrained simplex method approach. Int. J. Food Sci. Tech. 28: 103110.[CrossRef]
DEMPSTER, A. P., N. M. LAIRD and D. B. RUBIN, 1977 Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. 39: 138.
JANSEN, R. C., 2000 Quantitative trait loci in inbred lines, pp. 567597 in Handbook of Statistical Genetics, edited by D. J. BALDING, M. BISHOP and C. CANNINGS. Wiley, New York.
JANSEN, R. C., and P. STAM, 1994 High resolution of quantitative traits into multiple loci via interval mapping. Genetics 136: 14471455.[Abstract]
KNOTT, S. A., and C. S. HALEY, 2000 Multitrait least squares for quantitative trait loci detection. Genetics 156: 899911.
LAGARIUS, J. C., J. A. REEDS, M. H. WRIGHT and P. E. WRIGHT, 1998 Convergence properties of the Nelder-Mead simplex method in low dimensions. Soc. Ind. Appl. Math. J. Optim. 9: 112147.
LANDER, E. S., and D. BOTSTEIN, 1989 Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121: 185199.
LIU, X.-J., F. OLOVER, S. D. M. BROWN, A. DENNY and P. D. KEIGHTLEY, 2001 High-resolution quantitative trait locus mapping for body weight in mice by recombinant progeny testing. Genet. Res. 77: 191197.[CrossRef][Medline]
MA, C. X., G. CASELLA and R. L. WU, 2002 Functional mapping of quantitative trait loci underlying the character process: a theoretical framework. Genetics 161: 17511762.
MARTINEZ, M., N. VUKASINOVIC, A. E. FREEMAN and R. L. FERNANDO, 1998 Mapping QTL in outbred populations using selected samples. Genet. Sel. Evol. 30: 453468.
NAKAMICHI, R., Y. UKAI and H. KISHINO, 2001 Detection of closely linked multiple quantitative trait loci using a genetic algorithm. Genetics 158: 463475.
NELDER, J. A., and R. MEAD, 1965 A simplex method for function minimization. Comput. J. 7: 308313.
PEREZ-ENCISO, M., and L. VARONA, 2000 Quantitative trait loci mapping in F2 crosses between outbred lines. Genetics 155: 391405.
PEREZ-ENCISO, M., A. CLOP, J. M. FOLCH, A. SANCHEZ, M. A. OLIVER et al., 2002 Exploring alternative models for sex-linked quantitative trait loci in outbred populations: application to an Iberian x Landrace pig intercross. Genetics 161: 16251632.
PRESS, W. H., S. A. TEUKOLSKY, W. T. VETTERLING and B. P. FLANNERY, 1992 Numerical Recipes in C, Ed. 2. Cambridge University Press, Cambridge/London/New York.
PRICE, C. J., I. D. COOPE and D. BYATT, 2002 A convergent variant of the Nelder-Mead algorithm. J. Optim. Theor. Appl. 113: 519.[CrossRef]
SILLANPAA, M. J., and E. ARJAS, 1999 Bayesian mapping of multiple quantitative trait loci from incomplete outbred offspring data. Genetics 151: 16051619.
WALTERS, F. H., L. R. PARKER, S. L. MORGAN and S. N. DEMING, 1991 Sequential Simplex Optimization. CRC Press, Boca Raton, FL.
WOLF, J. B., 2002 The geometry of phenotypic evolution in developmental hyperspace. Proc. Natl. Acad. Sci. USA 99: 1584915851.
WU, R. L., C.-X. MA, M. R. CHANG, R. C. LITTELL, S. S. WU et al., 2002 A logistic mixture model for characterizing genetic determinants causing differentiation in growth trajectories. Genet. Res. 79: 235245.[CrossRef][Medline]
YIN, T. M., X. Y. ZHANG, M. R. HUANG, M. X. WANG, Q. ZHUGE et al., 2002 Molecular linkage maps of the Populus genome. Genome 45: 541555.[Medline]
This article has been cited by other articles:
![]() |
R. Yang, H. Gao, X. Wang, J. Zhang, Z.-B. Zeng, and R. Wu A Semiparametric Approach for Composite Functional Mapping of Dynamic Quantitative Traits Genetics, November 1, 2007; 177(3): 1859 - 1870. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. Wu, J. Yang, and R. Wu Semiparametric functional mapping of quantitative trait loci governing long-term HIV dynamics Bioinformatics, July 1, 2007; 23(13): i569 - i576. [Abstract] [Full Text] [PDF] |
||||
![]() |
W. Zhao, H. Li, W. Hou, and R. Wu Wavelet-Based Parametric Functional Mapping of Developmental Trajectories With High-Dimensional Data Genetics, July 1, 2007; 176(3): 1879 - 1892. [Abstract] [Full Text] [PDF] |
||||
![]() |
Y. Cui, J. Zhu, and R. Wu Functional mapping for genetic control of programmed cell death Physiol Genomics, May 16, 2006; 25(3): 458 - 469. [Abstract] [Full Text] [PDF] |
||||
![]() |
R. Wu and W. Hou A Hyperspace Model to Decipher the Genetic Architecture of Developmental Processes: Allometry Meets Ontogeny Genetics, January 1, 2006; 172(1): 627 - 637. [Abstract] [Full Text] [PDF] |
||||
![]() |
W. Hou, C. W. Garvan, W. Zhao, M. Behnke, F. D. Eyler, and R. Wu A general model for detecting genetic determinants underlying longitudinal traits with unequally spaced measurements and nonstationary covariance structure Biostat., July 1, 2005; 6(3): 420 - 433. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. Lin and R. Wu Theoretical Basis for the Identification of Allelic Variants That Encode Drug Efficacy and Toxicity Genetics, June 1, 2005; 170(2): 919 - 928. [Abstract] [Full Text] [PDF] |
||||
![]() |
W. Zhao, Y. Q. Chen, G. Casella, J. M. Cheverud, and R. Wu A non-stationary model for functional mapping of complex traits Bioinformatics, May 15, 2005; 21(10): 2469 - 2477. [Abstract] [Full Text] [PDF] |
||||
![]() |
R. Wu, Z. Wang, W. Zhao, and J. M. Cheverud A Mechanistic Model for Genetic Machinery of Ontogenetic Growth Genetics, December 1, 2004; 168(4): 2383 - 2394. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |