Abstract
The genetic analysis of characters that are best considered as functions of some independent and continuous variable, such as age, can be a complicated matter, and a simple and efficient procedure is desirable. Three methods are common in the literature: random regression, orthogonal polynomial approximation, and character process models. The goals of this article are (i) to clarify the relationships between these methods; (ii) to develop a general extension of the character process model that relaxes correlation stationarity, its most stringent assumption; and (iii) to compare and contrast the techniques and evaluate their performance across a range of actual and simulated data. We find that the character process model, as described in 1999 by Pletcher and Geyer, is the most successful method of analysis for the range of data examined in this study. It provides a reasonable description of a wide range of different covariance structures, and it results in the best models for actual data. Our analysis suggests genetic variance for Drosophila mortality declines with age, while genetic variance is constant at all ages for reproductive output. For growth in beef cattle, however, genetic variance increases linearly from birth, and genetic correlations are high across all observed ages.
A simple and efficient procedure for the genetic analysis of characters that change as a function of age (or some other independent and continuous variable) is desirable for researchers in several fields of biology and genetics. Plant and animal breeders are often faced with the genetic analysis of “repeated measures” data, such as lactation in dairy cows or growth rates in important agricultural species. Biologists interested in the evolution of life histories study the genetic basis of agespecific fitness components, such as survival or reproductive output, while evolutionary ecologists often examine the genetic relationship between values of a single character expressed over a continuous range of environmental variables.
Recent conceptual and computational advancements have made the genetic analysis of such functionvalued traits readily accessible. Three methods have been advanced in the literature. First, random regression models have been widely used for the analysis of longitudinal data in the traditional statistical literature (Diggleet al. 1994) and recently have been applied in the animal breeding context (Jamrozik et al. 1997b). Second, the use of orthogonal polynomials to approximate covariance matrices was initially suggested by Kirkpatrick and Heckman (1989) and is closely related to the random regression models (Meyer and Hill 1997; Meyer 1998). Third, the character process model was recently proposed by Pletcher and Geyer (1999) and is based on theories of stochastic processes. We develop and consider a general extension of the process model to take advantage of new methods for estimating complicated correlation structures. Each of these methods has been implemented in relatively easy to use computer software packages, and they are freely available.
The aim of this article is to compare and contrast random regression, orthogonal polynomials, and character process models and evaluate their performance. We focus first on examining the underlying assumptions of the three methods, while emphasizing fundamental similarities and differences when appropriate. Next, we explore a variety of simulated data sets and describe the types of covariance structures (genetic, environmental, and otherwise) accommodated by each method. Last, using empirical data on agespecific mortality and reproductive output in the fruit fly Drosophila melanogaster and on growth in beef cattle, we evaluate the ability of each model to adequately fit empirical data.
THE GENETIC ANALYSIS OF FUNCTIONVALUED TRAITS
Detailed descriptions of the extension of classical quantitative genetics to the analysis of functionvalued traits is given in Kirkpatrick and Heckman (1989) and Pletcher and Geyer (1999). In short, the method assumes the observed character is best described by a function (or stochastic process) of some independent and continuous variable. Although any continuous variable is acceptable (e.g., the level of some environmental factor), the most common is age, and all of the examples in this article focus on characters that change with age. Further, it is assumed that the character values at each age constitute a multivariate normal distribution on some scale.
As with traditional quantitative genetics, it is assumed that the observed phenotypic trajectory of the character is random and influenced by one or more unobservable factors. In the simplest case one might consider the additive contribution of many genes along with unpredictable environmental effects. More complicated models involving interactions among different genes or specific environmental effects (e.g., maternal effects) are straightforward, although computational difficulties will likely arise. For the additive model, we assume the observed phenotype can be decomposed as
The goal of the analysis is to decompose the observed variation in X(t) into its genetic and environmental contributions by estimating covariance functions for g(t) and e(t). A covariance function, r(s, t), is a bivariate continuous function that describes the covariance between any two ages, r(s, t) = Cov{X (s), X (t)}. By the independence of g(t) and e(t), the phenotypic covariance function of X(t) is given by P(s, t) as
There have been at least three different methods suggested for estimating the desired covariance functions: orthogonal polynomials (Kirkpatrick and Heckman 1989), random regression (Meyer 1998), and the character process model (Pletcher and Geyer 1999). All three methods are based on likelihood estimation—although the orthogonal polynomial approach was originally published as a least squares estimation (Kirkpatricket al. 1990).
Random regression: Random regression (RR) models employ parametric forms for the unobserved functions in (1). Although traditionally a parametric mean curve is often used to estimate μ(t), this is not essential. However, the individual deviations from this curve [i.e., the g(t) and e(t)] are assumed to be parametric functions of time, and polynomials are often used. For example, the agedependent deviations from the population mean due to an individual's genotype might be linear in time, such that
Genetic and environmental covariances as a function of age are determined by the variances and covariances among the regression coefficients. Following the example presented above, the genetic covariance between ages s and t is
Character process model: In contrast to the RR models, the character process model does not attempt to model the forms of the g(t) or e(t) functions. Instead, parametric models for the covariance functions themselves [i.e., G(s, t) and E(s, t) in Equation 2] are the target of analysis (Pletcher and Geyer 1999).
Again taking the genetic covariance function as an example, the covariance function can be decomposed into
We suggest an extension of the character process model for nonstationary correlations using a method proposed by NunezAnton (1998) and NunezAnton and Zimmerman (2000) in what they term structured antedependence models. The idea is to implement a nonlinear transformation upon the time axis, f(t), such that correlation stationarity holds on the transformed scale—on the original scale the correlation is nonstationary. The correlation function is then defined as ρ(s, t) = ρ(f (s) – f (t)), and the functions suggested by Pletcher and Geyer (1999) remain valid. Ideally the transformation function should contain a small number of parameters with interpretable effects.
NunezAnton and Zimmerman (2000) suggest a BoxCox power transformation such that
Orthogonal polynomials: Kirkpatrick and Heckman (1989) originally presented the use of orthogonal polynomials (OPs) as a nonparametric way of “smoothing” previously estimated covariance matrices. This was the first attempt to formalize the estimation of covariance functions in a genetic context. As with the CP model, the shapes of the individual agedependent deviations were not considered, and models for the structure of the variancecovariance matrix itself were the focus of attention. Kirkpatrick and Heckman (1989) suggest that the genetic covariance function be represented as
As originally presented, the orthogonal polynomial approach is similar in spirit to the CP model, and both differ in principle from the RR approach. In the RR methods, the primary model development occurs at the level of individual deviations (Equation 1). The analyst begins by considering the behavior of individual agespecific deviations. The resulting covariance structure is a consequence of these deviations. For the CP and OP models, the situation is reversed. The analyst begins by considering the structure of the covariance matrix (Equation 2), and the shapes of the individual deviations are a consequence of this structure. In some cases it may be possible to expose a duality between the two, as Meyer (1998) has done for certain RR and OP models. When the data are collected at equally spaced intervals, CP models with a constant variance and an absolute exponential correlation
EXAMPLES AND ANALYSES
Estimation procedures: All models were estimated using REML. In all cases a nonparametric mean function was used (i.e., a separate mean was fitted for each distinct age in the data), which ensures a consistent estimate of the covariance structure (Diggleet al. 1994). Comparison among models was based on the Bayesian information criterion (BIC; Schwarz 1978), which provides for likelihoodbased comparison among nonnested models. BIC is
To determine the bestfitting model under each technique, a large number of models were fit to each data set. For the character process method, >100 different models (i.e., different combinations of polynomial variance functions and stationary and nonstationary correlation functions) were investigated, and the best model was chosen according to the BIC criterion. We chose to examine a large number of CP models for reasons of thoroughness. The CP models are relatively new, and the behavior of these models is not well known. In practice, such an exhaustive search is not required, as standard model selection procedures (e.g., sequential addition of polynomial terms to the variance function) result in identical conclusions (results not presented). For both random regression and orthogonal polynomial methods, the appropriate polynomials of increasing degree were fit until an increase in degree no longer resulted in a significant increase in the loglikelihood at the α = 0.05 level (Meyer and Hill 1997). We find that a reasonable approach to model selection requires on the order of 5–10 model fits for each method.
Estimates of the covariance structure based on random regression and orthogonal polynomial methods were obtained using the software package ASREML (Gilmouret al. 1997), while estimates of the character process model (and certain orthogonal polynomial models) were obtained using computer software developed by one of the authors (S. Pletcher; C code and executable files freely available). A series of exploratory analyses were conducted to ensure the two software packages produced comparable loglikelihoods. A small number of covariance structures could be fitted by both packages (models of constant variance and correlation across ages, and small orthogonal polynomial models) and these structures were fitted to several data sets. In all cases, identical loglikelihoods were reported by each package.
Simulated data: Many data sets were simulated according to various covariance structures from CP, RR, and OP models. All were built assuming a standard sire design (i.e., groups of halfsibs) in which 12 offspring from each of 70 sires were measured at five different ages (Lynch and Walsh 1998). Under such a design, the estimated betweensire covariance function is directly proportional to the genetic covariance function. The environmental covariance function and residual error are estimated based on the withinsire and the withinanimal variation. We present the results of four representative data sets. Because the magnitudes of the variance and covariances were different among the simulations, we set the residual variance for all simulations to ∼10% of the total variance at age 0.
The first data set was simulated according to a stationary CP covariance structure, the purpose of which was to assess the behavior of RR and OP models when the genetic correlation decreases to zero within the range of the data. The genetic covariance function was composed of a quadratic variance [i.e., a quadratic v^{2}(·) from Equation 4] and “normal” correlation (ρ(t_{i}, t_{j}) = exp(–0.8(t_{i} – t_{j})^{2})) (Figure 1A). The environmental covariance function was composed of a linear variance and “Cauchy” correlation function (ρ(t_{i}, t_{j}) = 1/(1 + 0.05(t_{i} – t_{j})^{2})) (Pletcher and Geyer 1999). We refer to this data set as the stationary CP data.
To examine a wellbehaved covariance function with a somewhat nonstationary correlation, we simulated data with genetic variance function identical to that in the stationary CP data, but with an arbitrary nonstationary correlation structure (Figure 1B). The environmental covariance was assumed identical to that in the stationary CP data. This data set is the nonstationary CP data.
The third data set was simulated according to a random regression model with linear deviations for both the genetic and environmental parts. The chosen parameter values resulted in genetic and environmental correlations that remained quite high over all ages in the data (Figure 1C).
The last data set that we present was simulated according to an OP model, with quadratic Legendre polynomials for the genetic and environmental parts (i.e., m = 2 in Equation 6). The shapes of the covariance functions were rather undulating, as is expected from functions based on orthogonal polynomials. Parameter values were chosen such that the environmental correlation remained quite high over time while the genetic correlation was highly nonstationary (Figure 1D).
To compare the fit of the models we calculated goodnessoffit statistics for the estimated variance and correlation functions under each model with respect to the simulated structure. Goodness of fit was quantified by the concordance correlation coefficient, r_{c}, described by Vonesh et al. (1996; see appendix). The possible values of r_{c} are in the range –1 ≤ r_{c} ≤ 1, with a perfect fit corresponding to a value of 1 and a lack of fit to values ≤0.
Empirical data: Drosophila reproduction and mortality: Agespecific measurements of reproduction and mortality rates were obtained from 56 different recombinant inbred (RI) lines of D. melanogaster, which are expected to exhibit genetically based variation in longevity and reproduction (J. W. Curtsinger and A. A. Khazaeli, unpublished results). Agespecific measures of mortality and average female reproductive output were collected simultaneously from two replicate cohorts for each of 56 RI lines. Deaths were observed every day, while egg counts were made every other day. For both mortality and reproduction the data were pooled into 11 5day intervals for analysis. Mortality rates were log transformed and reproductive measures were squareroot transformed to insure the agespecific measures were normally distributed.
Growth in beef cattle: These data come from the Wokalup selection experiment in Western Australia and correspond to January weights of 436 beef cows from 77 sires. Weights were recorded between 19 and 82 months of age, with up to six records per cow. Analyses were carried out within 83 contemporary groups (yearpaddockage of weighing subclasses), fitted as fixed effects. Additional information, along with access to the data, can be obtained from Dr. Karin Meyer's web page at the Animal Genetics unit of the University of New England, Australia (http://agbu.une.edu.au/~meyer).
RESULTS
Simulations: For the stationary CP data, the best random regression model according to the BIC criterion was characterized by quadratic and linear deviations for the genetic and environmental parts, respectively. Higherorder polynomials did not converge to a maximum and could not be considered. The best OP model contained a cubic polynomial for the genetic covariance and a quadratic for the environmental part. As expected, the simulated structure was accurately recovered by the stationary character process model. Concordance coefficients r_{c} describing the goodness of fit for the variance and correlation functions are given in Table 1. For the RR and OP models, the environmental covariance structure (including both the variance and correlation) was very well fitted (r_{c} ≈ 1). The genetic variance was also well modeled, but both models had trouble dealing with the rapidly decreasing genetic correlation function. Although the OP model could better estimate the genetic correlation (r_{c} = 0.61 for OP compared to 0.36 for RR), it contains significantly more parameters than the regression model (17 vs. 10), and both models exhibit similar behavior. The polynomial structures are unable to handle correlation patterns that decrease asymptotically to zero within the range of the data, and the correlation obtained by both models goes negative (Figure 2).
The aim of the second simulated data set was to investigate the behavior of these models in the case of a rather simple nonstationary genetic correlation structure. The best RR and OP models were the same as for the stationary CP data detailed in the previous paragraph. The RR model dealt very poorly with the nonstationary pattern of the genetic correlation (r_{c} = 0.10); the correlation was estimated to be very high over all ages. Again, the greater number of parameters in the bestfitting OP model over the regression model provided a better fit to the correlation structure (r_{c} = 0.70). Surprisingly, the CP model failed to accurately estimate the nonstationary correlation pattern (Table 1). Our nonstationary extension did not significantly improve the goodness of fit (BIC = –4454 and –4456 for stationary and nonstationary models, respectively; P = 0.052 for a likelihoodratio test of λ = 1.0). However, the goodness of fit of the fitted nonstationary correlation (r_{c} = 0.55) is substantially better than that of the stationary model (r_{c} = 0.03), which provides an interesting commentary on model selection criteria. In retrospect, the nonstationarity in this data set was predominantly between extreme ages (ages 1 and 5). It is possible that more observations per individual are needed to detect small to moderate levels of nonstationarity (see fly reproduction data). The genetic variance function and environmental covariance structure were identical to that for the stationary CP data and were well fit by all the methods (Table 1).
All methods did a reasonable job of estimating the genetic and environmental covariance structures generated according to a random regression model with linear deviations. Under this model the correlations (both genetic and environmental) remained quite high over time. Our nonstationary extension of the CP model was successful in providing a good fit to the data. The genetic covariance structure was described by a quadratic variance and nonstationary correlation given by the characteristic function of the Uniform distribution (Pletcher and Geyer 1999), and the environmental variance function was linear with a Cauchy correlation. The goodness of fit for the genetic correlation structure was improved substantially over a stationary model (r_{c} = 0.74, BIC = –3819 and r_{c} = 0.93, BIC = –3817 for the stationary and nonstationary CP models, respectively).
Although we have essentially no idea what a typical agedependent genetic covariance function might look like, the data set simulated with an OP structure might be considered pathological in that the genetic covariance structure is highly irregular. In fact, the genetic correlation is negative between early ages but highly positive between late ages (Figure 1D). Such a structure is, however, typical for OP models (Kirkpatricket al. 1994). Convergence problems hindered our ability to obtain estimates of high dimensional random regression models, and the best RR model was not able to accommodate either the simulated genetic variance or correlation (r_{c} = 0.30 and r_{c} = 0.15, respectively). Both the genetic and environmental covariance structures were described by a quadratic variance and nonstationary correlation given by the characteristic function of the Uniform distribution. When compared to random regression, the CP model is much better at estimating the genetic variance function but is slightly worse at approximating the correlation structure (Table 1). The environmental covariance is better behaved and much less of a problem. As seen with the random regression simulations, the strong positive correlations across all ages are well fit by all the methods.
Empirical: Drosophila reproduction and mortality: For agespecific mortality and reproduction in Drosophila, the character process model provided a significantly better fit, according to the BIC criterion, than either the orthogonal polynomial or random regression methods (Table 2). In fact, the CP models achieved higher likelihoods despite containing significantly less parameters than the OP or RR models. For agespecific mortality, the bestfitting model for the genetic covariance was a quadratic variance with a Cauchy correlation function (ρ_{G}(t_{i}, t_{j}) = 1/(1 + θ(t_{i} – t_{j})^{2})). For fly reproduction the best character process model was a constant variance at all ages coupled with a nonstationary correlation function described by the absolute exponential,
The simplicity of the character process model allows quantitative statements about the predominant attributes of the genetic covariance function. Genetic variance for Drosophila mortality declines significantly with age, while genetic variance is constant at all ages for reproductive output. For mortality, the parameter in the genetic correlation function was significantly different from zero (P < 0.0001), suggesting that mortality rates become less genetically correlated as ages become further separated in time. This is true for reproductive output as well, and the significant nonstationary parameter in the genetic correlation provides evidence for an increase in the correlation between two equidistant ages with increasing age.
Beef cattle: Although differences in fit among the methods are less dramatic for beef cattle than for Drosophila, the character process model again provides a significantly better fit (as determined by the BIC criterion) than either random regression or orthogonal polynomial methods (Table 2). The bestfitting model for the genetic part was a linear variance (increasing with age) and an absolute exponential correlation
DISCUSSION
The quantitative genetic analysis of repeated measures and other functionvalued traits requires the estimation of continuous covariance functions for each source of variation in a particular statistical model. Traditionally, statistical geneticists interested in characters that change gradually along some continuous scale have had to settle for models that are either overparameterized (i.e., standard multivariate methods) or oversimplified (e.g., composite character analysis; Meyer 1998; Pletcher and Geyer 1999). In recent years, however, the introduction and development of random regression models, orthogonal polynomial models, and models based on stochastic process theory (i.e., the character process model) have provided important alternatives. Other types of random regression models (e.g., nonlinear models as suggested by Lindstrom and Bates 1990 and Davidian and Giltinan 1995) may prove useful, but they are currently difficult to implement.
Through extensive investigation of a variety of simulated covariance structures and empirical data, we find that under most conditions the CP models provide the best description of the underlying covariance structure. It is clear from the simulation results that the CP model is the only method that adequately captures a correlation that declines rapidly to zero as character values become further separated in time. Both random regression models and orthogonal polynomials have noticeable problems approximating such a structure (Table 1, stationary CP data; Figure 2). Polynomials do not have asymptotes, and the rapid decline in correlation tends to force both methods to estimate correlations that are strongly negative within the range of the data. Although the characteristics of covariance functions for natural organisms remain generally unknown, this is a serious limitation as asymptotic behavior in covariances/correlations are to be expected (Pletcher and Geyer 1999). Other parameterizations of the RR models (e.g., using orthogonal polynomials in the regression) may prove more useful in this regard. On the other hand, RR and OP models work quite well when the correlation structure remains high over time (see Table 1, environmental correlation in CP simulated data).
A further advantage of the CP models appears to be the ability to model the variance and correlation separately. As mentioned previously, for random regression models the entire covariance structure is implicitly determined by the shapes of the regression polynomials, and covariance surfaces described by orthogonal polynomials have a fixed relationship between variance and correlation. This limitation is exemplified in the analysis of growth in beef cattle. For the genetic deviation, the bestfitting RR model included only a random intercept. This implies not only that the variance is considered constant over time but also that the correlation is constant and equal to 1 across all ages, which is probably not appropriate (Figure 3C). Applying the same argument to the fertility data in Drosophila, the bestfitting CP model for the genetic part was a constant variance with a rather rapid decline in correlation between increasingly separated ages (Table 3). Such a combination is simply not possible under the RR or OP methods. It is also likely that the separation of variance and correlation was a major factor contributing to the ability of the CP model to reasonably estimate the genetic variation with a much smaller number of parameters (4 parameters) than random regression (10 parameters) or orthogonal polynomial (17 parameters) models (Table 2).
The data sets we examined were small in comparison to those commonly analyzed in agricultural and breeding contexts. Using extremely large data sets, complicated covariance and correlation models may be of greater use, and the random regression and orthogonal polynomial methods may begin to show an advantage. Large data sets would also relieve the convergence problems we experienced with highorder random regression and orthogonal polynomial models. Unfortunately, most quantitative genetic studies of natural and experimental populations are extremely labor intensive, and sample sizes will often be similar to those reported here. For these situations, the properties of the character process models (e.g., easy hypothesis testing, few and interpretable parameters) make it a useful option.
Despite their apparent success in this study, there are several important limitations of the process models that suggest avenues for further development. First, additional ways of relaxing the stationarity assumption (Pletcher and Geyer 1999) without greatly increasing the number of parameters are needed. Although not appropriate in all situations, a promising direction proposed by NunezAnton and Zimmerman (2000) has been studied here and seems to offer reasonable flexibility in practice. Second, CP models require the manipulation (inversion, factorization, etc.) of matrices whose dimensions are proportional to the number of ages in the data set, regardless of the size of the model itself (Meyer 1998). A method of reparameterization, similar to that used for RR and OP models (Meyer 1998), would be useful. Third, a method for estimating the eigenfunctions of covariance functions used by the process models would provide insight into patterns of genetic constraints across ages (Kirkpatricket al. 1990; Kirkpatrick and Lofsvold 1992).
Last, the genetic analysis of two or more functionvalued traits is an important goal. Generalization of regression models to multitrait analyses is straightforward and has already been used, for instance, to analyze agedependent milk production, fat, and protein content in dairy cattle (Jamroziket al. 1997a). Bivariate character process models might be implemented by defining a parametric crosscovariance function between the two traits, but appropriate forms for this function are yet to be discovered.
Acknowledgments
W. Hill, N. Barton, and two anonymous reviewers provided valuable comments on the manuscript. Thanks to J. Curtsinger and A. Khazaeli for generously providing published and unpublished data. F.J. thanks the INRA for support during this project.
APPENDIX: GOODNESS OF FIT OF THE COVARIANCE STRUCTURE
The concordance correlation coefficient r_{c} described by Vonesh et al. (1996) was used in the simulation study to evaluate the goodness of fit for both the variance and correlation functions estimated by the models when compared to the simulated structure. For the correlation structure, for instance, we consider
The coefficient r_{c} is directly interpretable as a concordance coefficient between observed and predicted values. It directly measures the level of agreement (concordance) between y_{ij} and ŷ_{ij}, and its value is reflected in how well a scatter plot y_{ij} vs. ŷ_{ij} falls about the line identity. The possible values of r_{c} are in the range –1 ≤ r_{c} ≤ 1, with a perfect fit corresponding to a value of 1 and a lack of fit to values ≤0.
Footnotes

Communicating editor: C. Haley
 Received February 18, 2000.
 Accepted June 26, 2000.
 Copyright © 2000 by the Genetics Society of America