Multivariate models are of great importance in theoretical and applied quantitative genetics. We extend quantitative genetic theory to accommodate situations in which there is linear feedback or recursiveness between the phenotypes involved in a multivariate system, assuming an infinitesimal, additive, model of inheritance. It is shown that structural parameters defining a simultaneous or recursive system have a bearing on the interpretation of quantitative genetic parameter estimates (e.g., heritability, offspring-parent regression, genetic correlation) when such features are ignored. Matrix representations are given for treating a plethora of feedback-recursive situations. The likelihood function is derived, assuming multivariate normality, and results from econometric theory for parameter identification are adapted to a quantitative genetic setting. A Bayesian treatment with a Markov chain Monte Carlo implementation is suggested for inference and developed. When the system is fully recursive, all conditional posterior distributions are in closed form, so Gibbs sampling is straightforward. If there is feedback, a Metropolis step may be embedded for sampling the structural parameters, since their conditional distributions are unknown. Extensions of the model to discrete random variables and to nonlinear relationships between phenotypes are discussed.
MULTIVARIATE models are of great importance in applied, evolutionary, and theoretical quantitative genetics. For example, in animal and plant breeding, the value of a candidate for selection as a prospective parent of the next generation often is a function of several traits, e.g., protein yield, milk somatic cell count, fertility, and life span in dairy cattle or yield and resistance to disease in wheat. In evolutionary genetics, the effects of natural selection on mean fitness depend on the values of elements of the genetic variance-covariance matrix between quantitative characters (e.g., Cheverud 1984). Walsh (2003) and B. Walsh and M. Lynch (unpublished results) give a discussion of the dynamics of quantitative traits under multivariate selection.
A schematic of the standard multivariate model used in quantitative genetics is displayed in Figure 1, where a two-trait system is represented; for simplicity, all other explanatory variables are omitted. The diagram depicts a 2 × 1 vector of phenotypic values (Y1, Y2) expressed as a function (typically linear) of genetic effects (U1, U2), usually taken to be of an additive type, and of environmental or residual effects (E1, E2). The genetic and environmental effects are assumed to be independently distributed random vectors, following the bivariate normal distributions N(0, G0) and N(0, R0), respectively. Here, 1and 2are genetic and residual variance-covariance matrices, respectively. For example, σu12 is the variance between additive genetic effects affecting trait 1, and σe12 is the residual covariance between traits 1 and 2.
The standard model depicted in Figure 1 does not allow for feedback or recursive relationships between phenotypes, which may be present in many biological systems. A classical example of feedback (that is, when changes of a quantity indirectly influence the quantity itself) is given by Haldane and Priestley (1905) and retaken by Turner and Stevens (1959) and by Wright (1960). These authors modeled feedback relationships between the concentration of CO2 in the air (A) and in the alveoli of the lungs (C) and the depth of respiration (D). As shown in Figure 2, Turner and Stevens (1959) posit that A affects C; in turn, C and D have a feedback relationship. Wright (1960) introduces residuals V and W for the C and D variables, respectively, and makes a further extension of the model. The extension consists of including a variable X for the actual concentration of CO2 in the lungs; in Figure 2, U represents “random” errors of technique; this is a “measurement error” model (Warren et al. 1974; Joreskög and Sorböm 2001). In the Turner-Stevens model, the effect of C on D is through a coefficient λDC, whereas λCD gives the rate of change of C with respect to D. Suppose that these two coefficients are not null, so that feedback takes place. Ignoring the actual biology of the problem, a model such as that of Turner and Stevens (1959) or of Wright (1960) implies the following: if A is increased and the relationship between C and A is such that C increases as well, then D will increase provided λDC is positive. Further, if λCD is positive, then C will increase further. If all the loops go in the same direction, there is positive feedback, which might lead to some equilibrium or to an eventual breakdown of the system (Turner and Stevens 1959). A second example of reciprocal interaction is the classical supply-demand problem of econometrics (Wright 1925; Johnston 1972; Judge et al. 1985). Also, the existence of feedback inhibition is well known in genetic regulation. For instance, the product of a metabolic pathway may bind to a gene product (enzyme) catalyzing a previous step, to prevent the channeling of additional molecules through the pathway (Lewin 1985). A discussion of the implications of interactive enzyme systems in genetics is in Kacser and Burns (1981). They write:
In vivo enzymes do not act in isolation, but are kinetically linked to other enzymes via their substrates and products. These interactions modify the effect of enzyme variation on the phenotype, depending on the nature and quantity of the other enzymes present. An output of such a system, say a flux, is therefore a systemic property, and its response to variation at one locus must be measured in the whole system.
It has been long recognized in economics (e.g., Haavelmo 1943) that the existence of lagged or of instantaneous feedback (often referred to as “simultaneity”) and of recursiveness between variables has implications on the understanding of multivariate systems, and that special statistical techniques are required for inference. Curiously, Sewall Wright's work on feedback mechanisms has received scant attention in population/quantitative genetics, in spite of his influence on the aforementioned fields and the pervasiveness of such mechanisms in regulation, as noted. An explanation may reside in the fact that even though path analysis was “extremely powerful in the hands of Wright” (Kempthorne 1969), the lack of matrix representations in his writings hampered a general understanding of the method. This is especially true of Wright's treatment of reciprocal interaction with lags (Wright 1960), which is difficult to follow. Also, Goldberger (1972)(p. 988) noted: “when there are more estimating equations than unknown parameters, path analysis gives no systematic guide to efficient estimation,” a situation known as overidentification. At any rate, social scientists eventually embedded path analysis into the general framework of simultaneous systems and gave it a formal statistical structure (Goldberger 1972; Goldberger and Duncan 1973; Duncan 1975; Joreskög and Sorböm 2001).
Our concern here is with the consequences of the existence of simultaneous (“feedback”) and recursive relationships between phenotypic values on quantitative genetic parameters, as well as with statistical methods for appropriate inference. The outline of the article is as follows. genetic consequences of simultaneity in a two-trait system and genetic consequences of recursiveness illustrate effects of simultaneity or recursiveness on simple two-trait systems. In particular, formulas are presented for heritability, for the offspring-parent regression, and for the genetic and environmental correlations between traits. Next, matrix representations shows that if four phenotypes are involved in a multivariate system, there can be as many as 128 mutually exclusive and exhaustive models for describing the relationships between phenotypic values. Formulas pertinent to multivariate selection (e.g., best prediction of genetic values) are given as well. Then, the likelihood function and identification of parameters sections are presented. bayesian model addresses statistical inference in a structural equations model from a Bayesian perspective. It is shown in the fully conditional posterior distributions that, under normality assumptions, most conditional posterior distributions arising in the multivariate system are in recognizable form. The implication is that software for standard multiple-trait analysis of quantitative traits via Gibbs sampling (a Markov chain Monte Carlo method) can be adapted to handle simultaneity and recursiveness in a fairly direct manner. The article concludes with suggestions on how the approach can be extended to a wider class of models.
GENETIC CONSEQUENCES OF SIMULTANEITY IN A TWO-TRAIT SYSTEM
Let yi1 and yi2 be measurements on traits 1 and 2 observed in individual i. For example, in an animal breeding setting, yi1 may represent the milk yield of dairy cow i and yi2 may be a proxy for the level of some disease (e.g., milk somatic cell count as an indicator of mastitis, a bacterial-related inflammation of the mammary gland). Suppose that biological knowledge admits that production affects disease and, in turn, that disease affects production. As noted earlier, we refer to this as a simultaneous or instantaneous feedback system, following econometric terminology (Zellner 1979; Judge et al. 1985). Assume that the relationship between production and disease can be represented by the two-equation linear system, 3and 4Here, λ12 is the rate of change of level of production with respect to a disease index and λ21 is the gradient of disease with respect to production. A priori, one might expect λ12 to be negative and λ21 to be positive, since high output may be associated with “stress” in the dairy cow. It is assumed that λ12 and λ21 are homogeneous across individuals, but this can be relaxed. The vectors β1 and β2, often called fixed effects in the statistical literature (Searle 1971), are some location parameters such as age of the cow, parity, or breed affecting production (net of disease) and disease (net of production), respectively. Further, x′i1 and x′i2 are known incidence row vectors; whenever the same factors affect the two traits, the covariates are such that , say. Finally, ui1 and ui2 are additive genetic effects (Fisher 1918; Bulmer 1980) intervening in the system and ei1 and ei2 are model residuals.
It is important to note that ui1 (ui2) above can be construed as an additive genetic effect affecting only production (disease) if and only if λ12(λ21) = 0; it is shown subsequently that ui1 and ui2 may affect each of the two traits. On the other hand, it is legitimate to view ui1, say, as the additive genetic component of the linear combination yi1 − λ12yi2. This is suggested by a rewriting of (3) and (4) as 56More generally, ui1 and ui2 are genetic effects “controlling” system (5)–(6).
When a specification such as system (5)–(6) holds, generalized least-squares estimates (or maximum-likelihood estimates under standard normality assumptions) of the structural parameters λ12 (or λ21) are biased and inconsistent if obtained from a subset of equations (Johnston 1972). For instance, if a univariate analysis of trait 1 is conducted including yi2 as a covariate (but ignoring the submodel for trait 2), λ12 is estimated with an upward bias (provided λ12 < 1). The effects on model parameters are more cryptic as the dimensionality of the system increases, for example, when a system of five mutually interacting response variables is analyzed with a three-trait model.
The implications of system (5)–(6) on the interpretation of some parameters of interest in quantitative genetics analysis are considered next.
Use (4) in (3) and solve for yi1, to obtain the “reduced” model (a term from econometrics), 7where and with the random effects u*i1 ∼ N and e*i1 ∼ N being independently distributed. Note that both β1 and β2 intervene in μ1*, and that ui1* is a linear combination of the system genetic effects ui1 and ui2. An implication of this is that estimates of location parameters and of predictions of random effects from standard univariate or multivariate analyses can be interpreted differently if simultaneity holds. Suppose data are missing at random (i.e., that selection is ignorable). In this case the fraction of the variance of trait 1 that can be attributed to additive genetic effects, or coefficient of heritability, is 8Observe that the apportionment of variance into genetic and environmental components depends nontrivially on the structural parameter λ12, but not on λ21 (the opposite being true for trait 2). If σu12 = σe12 = 0, the linear combinations or “composite” traits yi1 − λ12yi2 and yi2 − λ21yi1 are statistically independent (by virtue of normality); however, h12 would still depend on λ12, as The corresponding expression for the heritability of trait 2 is
Regression of offspring on parent:
Let yi′1 be a record for trait 1 measured on the offspring of an individual with phenotype yi1. The offspring-parent covariance is Assuming zero covariances between environmental effects affecting records taken on different individuals, under additive inheritance one has that which is half of the numerator of the expression leading to (8). In the absence of feedback (λ12 = 0), the offspring-parent covariance is always positive and equal to σ2u1/2. Under simultaneity, however, this covariance could be negative, provided that The implication is that a variance component analysis based on the reduced model would, necessarily (because of the parameterization), return a positive fitted value of the offspring-parent covariance. On the other hand, this may not be so under a simultaneous equations model. If the observed covariance is negative, this should be construed as evidence against a specification failing to accommodate simultaneity, although there may be other reasons (e.g., maternal effects) for a negative offspring-parent covariance.
The regression of offspring on parent is 9yielding the usual σ2u1/ in the absence of feedback (λ12 = λ21 = 0).
Regression of one variable on another:
If , the reduced models as in (7) become 10and 11
Under normality, the regression function of trait 2 on trait 1 is In the absence of simultaneity, this reduces to the usual formula,
Genetic and environmental correlations:
The reduced models (10) and (11) lead to where Similarly The genetic and environmental covariances depend on the λ coefficients and on the appropriate variances and covariances of each of the two composite traits.
The genetic correlation is 12and the expression for the residual correlation is similar. If σu12 = 0 (i.e., when the composite traits are genetically uncorrelated), (12) becomes
GENETIC CONSEQUENCES OF RECURSIVENESS IN A TWO-TRAIT SYSTEM
A recursive specification postulates, for instance, that yi1 affects yi2 but that the latter variable has no effect on yi1. An example is the maternal-effects model proposed by Falconer (1965) and examined by Koerhuis and Thompson (1997). This model postulates that the phenotype of an offspring is affected by the phenotype of its dam. In pigs, for instance, it is known that females born in larger litters tend to produce smaller litters (leading to negative values of the λ coefficients); conversely, females in smaller litters are expected to produce larger litters, etc. A recursive specification can be obtained from the “full model” by setting λ12 = 0 in (3) or (5), so that the system is now 13and 14Assuming that the incidence vectors are such that , use of (13) in (14) gives a reduced model for yi2, where and so that
The heritability of trait 1 is the usual . Here, σ2u1 and σ2e1 are the additive genetic and residual variances of trait 1, respectively, contrary to the simultaneity situation, where these dispersion parameters pertain to the variation of genetic and residual effects affecting the distribution of yi1 − λ12yi2. The coefficient of heritability of trait 2 has the form of (8), but with λ21 instead of λ12: 15
Regression of offspring on parent:
The regression of offspring on parent depends on the trait or pairs of traits involved. Using the same notation as in the section for simultaneity, the offspring-parent covariance for trait 1 is Hence the regression of offspring on parent for trait 1 is simply h12/2, the standard result from assuming additive inheritance.
Let yi′2 be a record measured for trait 2 on the offspring of an individual with phenotype yi2. The offspring-parent regression, assuming that between-generation environmental effects are uncorrelated, has a similar form to (9). Consider now the covariance between yi′2, a record for trait 2 measured on the offspring of an individual with phenotype yi1. The offspring-parent covariance between such records is now and the regression coefficient is 16Conversely, with yi′1 being now a record for trait 1 measured on the offspring of an individual with phenotypic value yi2 the offspring-parent regression is 17
Regression of one variable on another:
Recall that and that Then 18Conversely, the regression function of yi2 on yi1 is 19The two preceding expressions reduce to the usual formulas under bivariate normality by letting λ21 = 0.
Genetic and environmental correlations:
The genetic covariance between the two traits is so that the genetic correlation is 20
If σu12 = 0 in which case the sign of the genetic correlation depends on the sign of λ21. Further, if the traits are scaled such that , the correlation is λ21/. It is interesting to observe that when genotypes are expressed in units of standard deviation, the genetic correlation is driven entirely by λ21, which is a gradient operating at the phenotypic level.
Likewise, the environmental covariance and correlation are and 21respectively.
Many possible models:
A multivariate system may involve many response variables, as well as different levels of simultaneity and recursiveness. When the models comprise more than two traits, the issues and principles are as discussed above but the algebra is awkward. For example, consider the simultaneous-equation model for three traits given in Figure 3. Here, the three response variables Y1, Y2, and Y3 have mutually reciprocal effects, so that there are six λ coefficients or structural parameters in the model.
Several different models can be derived as special cases of the specification given in Figure 3. There are 64 models that can be viewed as “nested” within the diagram depicted. In general, for K response variables there are K(K − 1) structural coefficients (λ's) in a fully simultaneous model. Since, in a given model, each coefficient can take the value λij or be constrained to be 0 (when there is no “effect” of variable j on variable i in the latter case), there can be as many as 2K(K−1) possible models for explaining relationships between the phenotypic variables; in practice, however, many of the models can be discarded on mechanistic grounds. For example, if all λ's are set equal to 0 in Figure 3, this yields the standard trivariate model used for quantitative genetic analysis of three traits. Some other models that can arise are illustrated in Figures 4–6. A “cyclically recursive” model is depicted in Figure 4. Here, the causal relationship modeled is Y1 → Y2 → Y3 → Y1 →… For instance, consider a hypothetical situation where Y1, Y2, and Y3 are phenotypic values in sibships of size 3, with the subscript indicating birth order. It may be that the chain of influences is such that the older sib (with phenotype Y1) affects the second sib (Y2) and so on, with the loop closing via an influence of the youngest on the oldest sib.
In Figure 5, it is hypothesized that Y1 has an effect of the recursive type on both Y2 and Y3, but that there is simultaneity between Y2 and Y3; this is referred to as a recursive-simultaneous model. Here, Y1 might be the concentration of a hormone regulating the production of two metabolites that are involved in a feedback relationship. In Figure 6, there is simultaneity between Y2 and Y3, with these two variables affecting Y1; this is a simultaneous-recursive model: two biochemical products (Y2 and Y3) involved in the regulation of Y1 may interact reciprocally toward the establishment of some equilibrium. Clearly, there is a constellation of modeling alternatives.
The preceding discussion illustrates that a general representation is needed for describing the full range of possibilities. For example, the two-variate simultaneous-equations system of Equations 3 and 4 can be put in matrix form as 22This representation embeds four models [K = 2, so 2K(K−1) = 4], including the simultaneous one. The other three models are the standard bivariate specification (λ12 = λ21 = 0) and two recursive models (λ12 = 0 when y1 “affects” y2, but without a reciprocal effect; λ21 = 0 when y2 affects y1).
Let there be K traits observed on individual or “cluster” (e.g., a family) i (i = 1, 2, … , N), and write the system as 23where yi is a K × 1 vector of phenotypic measurements on the K traits of individual i; Λ is a K × K matrix containing at most K(K − 1) unknown λ coefficients (all diagonal elements are equal to 1); and Xi is a known incidence matrix with the form where x′ij is a row vector with pj elements. Without loss of generality, it is assumed that Xi has full-column rank. Further, the location vector β is such that where βj(j = 1, 2, … , K) is pj × 1. The vector ui contains additive genetic effects of individual i for the K traits and, similarly, ei is a vector of residual effects, distributed independently of ui.
If Λ has full rank, the reduced form of the model is 24where , and . For example, in a two-trait simultaneous model, the elements of μ*i, u*i, and e*i take the form given in (7). Assume now that 25and 26are mutually independent. Hence, the vectors of genetic and residual effects involved in the correlations shown in Figures 3–6 follow multivariate normal distributions with covariance matrices G0 and R0, respectively, each having order 3 × 3. This implies that 27and 28are also independently distributed. Further, the marginal distribution of the phenotypic values for individual i is 29
Genetic parameters and functions thereof:
The “multivariate heritability and coheritability” matrix can be defined as 30In the absence of simultaneity or recursiveness, H = G0(R0 + G0)−1, since Λ would be an identity matrix of order K in this case. Note that the trace of (30), is free of the λ coefficients. Now, using the measurements taken on individual i, the best predictor of u*i, in the sense of minimizing the mean square error of prediction among all possible functions of the data (Henderson 1973), is given by the conditional expectation function 31
In multiple-trait selection, animal and plant breeders are often interested in improving a linear combination of genetic values; e.g., , where v is a known K × 1 vector of relative economic values (Smith 1936; Hazel 1943), and u*i contains the “true” genetic values affecting the traits. Recall that the genetic values are ui only in the absence of feedbacks or recursiveness. Suppose the N candidates are independently distributed, so that the density of the joint distribution of all genetic and phenotypic values is given by This is what Henderson (1963)(1973) termed an “equal information” situation. The best predictor of the “merit function” Ti is 32where 33is the classical “selection index” solution to the Smith-Hazel equations
Suppose that selection of a truncation type is based on T̂i in (32), such that a proportion α of the candidates is kept as parents of the following generation. From the forms of (32) and (33), it follows that the mean of the distribution of T̂i in the unselected individuals is 0, since E(yi) = Λ−1Xiβ. Under normality assumptions, standard theory (e.g., Bulmer 1980; Falconer and Mackay 1996) gives a mean of the selected individuals, 34where i = z/α is called “selection intensity” and z is the ordinate of the standard normal distribution at a point at the right of which there is a probability mass equal to α; S stands for selection. Under additive genetic action, the expected genetic value of the progeny of selected parents is equal to the expected value of the selected parents. Hence, the expected response to selection is given directly by (34). For example, consider single-trait selection and the merit function (the additive genetic value of individual i), and suppose that the only source of information is yi1, the phenotypic value for trait 1. In this case, and from the form of (32), it follows that For a two-trait simultaneous system, it was seen earlier that and Hence When λ12 = 0, this reduces to the usual , provided selection is based on .
The covariance matrix between additive genetic values of related individuals i and i′ is where aii′ is twice the coefficient of coancestry between i and i′.
Consider system (23) in conjunction with the normality assumptions (25) and (26), and regard the vector Λyi as “data.” The model for the entire data vector can be written as 35where u comprises additive genetic effects for all individuals and all traits (u may include additive genetic effects of individuals without records), and Z is an incidence matrix of appropriate order. If all individuals have records for all traits, Z is an identity matrix of order NK × NK; otherwise, columns of 0's for effects of individuals without phenotypic measurements would be included in Z. In view of the normality assumptions (25) and (26), one can write and where A is a matrix of additive genetic relationships (or of twice the coefficients of coancestry) between individuals in a genealogy, and ⊗ indicates Kronecker product. Note that I ⊗ R0 reflects the assumption that all individuals with records possess phenotypic values for each of the K traits. This is not a requirement, but it simplifies somewhat the treatment that follows.
Given u, the vectors Λyi are mutually independent (since all ei vectors are independent of each other), so the joint density of all Λyi is 36where Zi is an incidence matrix that “picks up” the K breeding values of individual i (ui) and relates these to its phenotypic records yi. Making a change of variables from Λyi to yi (i = 1, 2, … , N), the determinant of the Jacobian of the transformation is |Λ|. Hence, the density of is 37This is the density of the product of the N normal distributions highlighting that the data generation process can be represented in terms of the reduced model (24), with the only novelty here being the presence of the incidence matrix Zi, with the latter being a K × K identity matrix in (24). Hence, the entire data vector can be modeled as 38where XΛ is an matrix (again, assuming that each of the N individuals has measurements for the K traits), and ZΛ has order NK × (N + P)K, where P is the number of individuals in the genealogy lacking phenotypic records (the corresponding columns of ZΛ being null). Observe that (38) is in the form of a standard multiple-trait mixed-effects linear model, save for the fact that the incidence matrices depend on the unknown structural coefficients contained in Λ. Hence 39where is a block-diagonal matrix consisting of N blocks of order K × K, and all such blocks are equal to Λ−1R0Λ′−1. It follows that y|Λ, β, u, R0 ∼ N(XΛβ + ZΛu, RΛ). Hence, if simultaneity or recursiveness holds, the estimator of the residual variance-covariance matrix from a reduced model analysis is actually estimating Λ−1R0Λ′−1; this has a bearing on the interpretation of the parameter estimates.
Since it is assumed that u|G0 ∼ N(0, A ⊗ G0), the likelihood function is given by 40This likelihood has the same form as that for a standard multivariate mixed-effects model, except that, here, additional parameters (the nonnull elements of Λ) appear in both the location and dispersion structures of the reduced model (38). A pertinent issue, then, is whether or not all parameters in the model, that is, Λ, β, R0, and G0, can be identified (i.e., estimated uniquely) from the likelihood. This is discussed in the following section.
IDENTIFICATION OF PARAMETERS
Consider the system of K response variables (23), and reorganize it as 41where and εi = ui + ei is a residual. It is convenient to lump the sum of the two random effects into a single residual for the treatment that follows. Rewrite where xi now is a column vector of order , and is . In practice, it suffices to keep the distinct explanatory variables in xi; e.g., if herd effects affect all traits in the system, only a single set of incidence variables needs to be considered. With this notation, (41) can be put as 42where λ′j and b′j (j = 1, 2, … , K) are the jth rows of Λ and B, respectively. The specification constitutes the jth equation of the system. Compactly, the system is 43The reduced model is expressible as 44where Π = −Λ−1B is a matrix of reduced model parameters, and . The system in (43) contains, at least potentially, the following number of parameters: K2 (all elements of Λ, including the 1's in the diagonal), (all elements of B, including the null ones), plus K(K + 1) (the distinct elements of R0 and G0). It is assumed that these two variance-covariance matrices can be separated in the estimation procedure, which depends on the genetic structure of the data set. Letting , the total number of parameters in the system is . In the reduced model, on the other hand, the number of potential parameters is K2p̅ (the order of Π), plus K(K + 1) (the elements of and those of ), yielding as the total number of parameters. To obtain unique estimates of the parameters in Λ, B, G0, and R0, S − R = K2 restrictions are needed for uniqueness. These can be of four types (Judge et al. 1988), as follows.
“Normalization” restrictions: set the diagonal elements of Λ to 1, so that the parameters in equation j are expressed relative to this constant of proportionality. This yields K restrictions, so an additional K2 − K = K(K − 1) restrictions are still needed.
Exclusion restrictions: some of the λ coefficients may be 0, as in a recursive model, or the elements of β may not appear in each of the equations.
Restrictions in the form of a linear combination of parameters in the same equation or across equations.
Restrictions on the variance-covariance matrices G0 and R0 (typically, such restrictions are not employed in quantitative genetic analysis).
Formal procedures for evaluation of identification of equations are described by Johnston (1972) and Judge et al. (1988). Suppose that Π is given and that one wishes to estimate uniquely (identify) the parameters in Λ and in B. Briefly, note that the parameters of the reduced model, Π = −Λ−1B, satisfy ΛΠ + B = 0 or, equivalently, 45Consider now row j of (45) and write it as Transposing, this yields 46This defines a system of equations on unknowns in which the rank of the known coefficient matrix is . Hence, K restrictions are needed to identify the unknown parameters λj and bj of equation j of the system. The restrictions can be denoted (Judge et al. 1988) as 47where is Rj is a matrix of rank . For example, an exclusion restriction can be indicated by filling the appropriate row of Rj with 0's, save for a 1 in the position corresponding to the element of ′ to be excluded from equation j. Since the normalization restrictions set all diagonal elements of Λ as equal to 1, and then λjj = 1 (λjj is the jth element of λj), this implies that (47) must provide K − 1 linearly independent relationships, so that one can arrive at the K restrictions needed. Now, combine (46) and (47), to arrive at the system 48where Rj = [Rjλ Rjb] is given in partitioned form, and the coefficient matrix must have rank to obtain unique estimates of λj and bj. Johnston (1972) and Judge et al. (1988) state that the rank of the coefficient matrix is if and only if the rank of 49is K − 1. Note that the preceding matrix has order J × K and that column j is null by virtue of (47). Hence, for (49) to possess rank K − 1, it must be that J ≥ K − 1; i.e., a condition for identification of equation j is that the number of restrictions J must be at least K − 1 (recall that K is the number of traits in the system). However, this is not sufficient: as stated, the rank of (49) must be K − 1.
In short, if the rank of (49) is K − 1, equation j is just identified, meaning that the relationship between the reduced model parameters and the λ's and β's in the equation is unique. If the rank is larger than K − 1, the equation is overidentified, meaning that there are many ways in which the structural model parameters can be expressed as a function of the elements of Π. In these two cases, the λ's and β's may be inferred efficiently, using methods that employ all information available in the data, e.g., maximum-likelihood or Bayesian procedures. Finally, if the rank of (49) is smaller than K − 1, equation j is underidentified, and the structural parameters cannot be solved as a function of the reduced model parameters (Dreze and Richard 1983).
The preceding developments are illustrated with a two-trait simultaneous model. Suppose that yi1 and yi2 are measurements of systolic and diastolic blood pressure, respectively, taken on individual i; assume that physiological knowledge postulates a feedback between the two variables. Let the models be and where Age is the age of i in years; Smoking is a binary variable (0 represents no smoking during the year prior to measurement and 1 represents smoking); Drinking is an estimate of the amount of alcohol i consumed in the year previous to the blood pressure test, ignoring a possible error of measurement, and Exer measures the extent to which i exercises. The u and e variables are additive genetic and residual effects, as before, and the λ's and β's are the structural model parameters. Here, K = 2 and the number of x variables is 5, since the two intercepts β11 and β21 are related to the measurements via the same incidence variate, which takes the value 1 for all i. The first equation has three “beta coefficients” (β11, β12, and β13) and the second has four (β21, β22, β24, and β25). Before normalization and
Equation 1 of the system uses the two exclusions β14 = β15 = 0. Hence, (49) is
The rank of this matrix is 1 (which is K − 1), so that the equation is identified. Equation 2 of the system employs the exclusion β23 = 0 so that (49) is Since the rank of this matrix is 1, the second equation is identified as well. Hence, λ12, λ21, and the elements of β1 = [β11 β12 β13]′ and of β2 = [β21 β22 β24 β25] can be estimated uniquely.
The form of the likelihood function given in (40) suggests that obtaining maximum-likelihood estimates of the structural model parameters Λ, β, R0, and G0 is not an easy matter, with a main difficulty being the fact that Λ is unknown. On the other hand, if the elements of this matrix were given, the setting would be as in a multivariate mixed-effects linear model, so standard procedures, such as the expectation-maximization (EM) algorithm, could be employed for computing the likelihood-based estimates. Another complication is that, typically, highly nonlinear functions of the parameters must be inferred. For example, see the forms of the mean μ*1 in model (7) and of the coefficient of heritability in (8). Intuitively, asymptotic approximations to the sampling distribution of the maximum-likelihood estimates may be relatively less accurate at a given sample size when the parametric function of interest is nonlinear than when it is linear. Note, however, that μ*1 and (8) may be inferred from the reduced model, via the standard multivariate parameterization. In special circumstances, one can form estimators of the structural parameters from statistics derived from the reduced model. These are called “indirect” procedures in econometrics (Johnston 1972).
Also, inferring random effects is of great importance in applied quantitative genetics (e.g., animal, plant, or tree breeding), and their best predictor would take a form such as in (32). In practice, however, calculations require replacing the unknown structural parameters by their maximum-likelihood estimates, that is, computing 50If interest focuses on the “system” genetic effects, the statistic would be The finite sample properties of the resulting empirical predictors are unknown. A common Bayesian criticism (Box and Tiao 1973; Gianola and Fernando 1986; Sorensen and Gianola 2002) is that (50) does not take the uncertainty (error of estimation) of the estimates of the parameters into account.
An alternative is to adopt a Bayesian approach, where inferences about structural parameters, random effects, or functions thereof are made from their marginal posterior distributions (Zellner 1971, 1979; Box and Tiao 1973; Gelman et al. 1995; Carlin and Louis 2000; Sorensen and Gianola 2002). A review of some of the issues in simultaneous models from an econometric perspective is in Zellner (1979), Dreze and Richard (1983), Judge et al. (1985), and Koop (2003). A salient feature of the Bayesian analysis is its ability to produce exact finite sample inference, as well as to override potential underidentification of parameters. If proper priors are adopted for all parameters in a model, all posterior distributions are proper as well (Bernardo and Smith 1994; O'Hagan 1994). However, unless the parameters are identifiable in the likelihood, the influence of the prior does not dissipate asymptotically. An example of this is in Carlin and Louis (2000) and in Gianola and Sorensen (2002). These authors discuss a situation where two random variables have the distributions Xi ∼ N(μ, σ2) and Yi ∼ N(η, σ2), say, but Zi = Xi + Yi (i = 1, 2, … , N) is observed. Here, maximum likelihood can estimate μ + η, but not μ or η separately. The individual means cannot be inferred separately because an infinite number of combinations of the elementary parameters confer the same likelihood to μ + η. On the other hand, a Bayesian analysis with proper priors assigned to both μ and η gives distinct, proper, marginal posterior distributions of μ and η. However, the usual asymptotic domination of the prior by the likelihood does not occur. Even when sample sizes are of infinite size, the prior matters; see Dreze and Richard (1983) and O'Hagan (1994). For this reason, it is always important to investigate parameter identification, as discussed in the preceding section. A Bayesian analysis of the structural model (35) is presented subsequently.
Prior and posterior distributions:
Represent all unknown parameters of the model by θ = [λ, β, u, R0, G0], where λ is a vector containing the unknown elements of Λ, after normalization; for example, in a fully simultaneous model with K = 3, λ would contain six λ coefficients. No distinction is made here between fixed and random elements in the parameter vector since all unknowns are treated as unobservable random variables in the Bayesian setting. Bayes' theorem gives as density of the posterior distribution where p(θ|H) is the prior density of θ, H represents the collection of all known hyperparameters, and p(y|θ) is the density of the data. Note that p(y|θ) = p(y|Λ, β, u, R0), since, given u, the data-generation model does not depend on the genetic covariance matrix G0. Take as joint prior density of all parameters 51where, for example, p(λ|Hλ) is the density of the prior distribution of λ and Hλ denotes a set of known hyperparameters. We take as prior distribution of λ the Gaussian process 52where 1 is a vector of ones, λ0 is a known prior mean, common to all λ's, and τ2 is a tuning parameter, adjusting the degree of sharpness of the prior. Further, we assign the priors 535455and 56where IW(νR, VR) and IW(νG, VG) denote K-dimensional inverse Wishart distributions with “degrees of freedom” parameters νR and νG, respectively, and scale matrices VR and VG. Collecting (52–56), the joint prior density of all parameters in (51) is 57 Combination of (57) with the data density (37) gives as joint posterior density of all estimands 58
FULLY CONDITIONAL POSTERIOR DISTRIBUTIONS
Often, the fully conditional posterior distributions of the parameters of a model can be ascertained from the joint posterior density, (58) in this case, by retaining the parts varying with the parameter (or group of parameters) of interest and treating the remaining parts as known (e.g., Sorensen and Gianola 2002). This procedure is employed in what follows.
Distributions of β and u:
Using the preceding concept in (58), one has that Now, since the value of λ is given in this conditional distribution, one can treat this vector as known and form the pseudo-data vectors Λyi (i = 1, 2, … , N). Thus where p(Λy1, Λy2, … , Λyn|Λ, β, u, R0) is as in (36). Retaining only the terms that depend on β and u, the explicit form of the conditional posterior density above is 59Note in (59) that the influence of the prior distribution of β on conditional (or marginal) inferences can be tempered by taking a large value of the spread parameter b2. The preceding expression can be recognized as the posterior density of the location parameters in a Gaussian linear model with proper priors and known variance components (Lindley and Smith 1972; Gianola and Fernando 1986; Sorensen and Gianola 2002). It is well known that that the corresponding distribution is 60where 6162and Samples from (60) can be obtained directly, or by the method of Garcia-Cortés and Sorensen (1996), or via Gibbs sampling, element-wise or block-wise (e.g., Wang et al. 1993, 1994; Sorensen and Gianola 2002).
Distributions of G0 and R0:
Retaining in (58) the parts that vary with G0 gives taking the explicit form 63where aij is the element in row i and column j of A−1. This reveals that the conditional posterior distribution of G0 is the inverse Wishart process 64where q is the order of A. Similarly, the density of the conditional posterior distribution of R0 is Since the elements of Λ are given in this distribution, one can also write 65where ri = Λyi − Xiβ − Ziu. This is the density of the inverse Wishart distribution 66
Distribution of the elements of λ:
All conditional posterior distributions considered so far are recognizable. However, this is not so for the elements of λ. Retaining in (58) the parts that vary with the structural coefficients λ yields This has the form 67Hence, contrary to (60), (64), and (66), the conditional posterior distribution of λ is not standard, except in a fully recursive system (Zellner 1971), as noted below. Subsequently, a Metropolis scheme (Tanner 1993; Gelman et al. 1995) is developed for drawing samples from the distribution having (67) as density.
Consider Λyi, and rewrite it (put K = 3, to illustrate) as where Using this representation in (67) above gives where wi = yi − Xiβ − Ziu. The two quadratic forms on λ appearing in the exponents can be combined using standard results (e.g., Box and Tiao 1973; Sorensen and Gianola 2002), yielding 68where and An important simplification occurs in a fully recursive system, e.g., when y2 is a linear function of y1, and y3 = f(y1, y2); here, Λ is a triangular matrix, so that |Λ| = 1. In this case, the conditional posterior distribution of λ is exactly the normal process N, so sampling is straightforward.
For the general case, our Metropolis algorithm for drawing samples from (68) uses the multivariate normal distribution N for generating candidates. The algorithm proceeds as outlined below:
Draw a candidate λ* from N, with λ̂ and Vλ computed from the given β, u, R0, G0, y, and H values, which we refer to as state of the system at time t.
Calculate the acceptance probability
MARKOV CHAIN MONTE CARLO AND INFERENCES FROM THE SAMPLES
Drawing marginal inferences about unknown aspects of the joint posterior distribution (58) is impossible by analytical means, since this requires high-dimensional integration of unwieldy expressions. Instead, we propose to infer features of interest by iterative sampling via Markov chain Monte Carlo methods. The requisite theory is presented in Gilks et al. (1996), Robert and Casella (1999), and Sorensen and Gianola (2002). Briefly, the idea is to create a Markov chain with (58) as equilibrium distribution. In our context, the sampling starts from some initial point θ(0) inside the parameter space, with updates obtained by successive looping through the fully conditional posterior distributions (60), (64), (66), and (68). This defines what has been termed a “Metropolis within Gibbs” algorithm: the draws from the known fully conditionals constitute the Gibbs proposals (accepted with probability equal to 1), with the Metropolis step for λ completing a loop of the algorithm (if the system is recursive, this becomes a Gibbs step as well). The early samples are discarded as burn-in, i.e., prior to declaration of convergence, and successive samples (m, say) are collected subsequently until a sufficiently small Monte Carlo error of estimation of features of the posterior has been attained. Typically, the estimators of the features are ergodic averages. A discussion of convergence analysis is in Cowles and Carlin (1996).
To illustrate, consider inferring the offspring-parent regression in a simultaneous two-trait model, as discussed in genetic consequences of simultaneity in a two-trait system. The Monte Carlo estimator of the regression is where, for example, σu12 and λ12 are samples from the marginal posterior distributions of σu12 and λ12, respectively. A second example is that of inferring the additive genetic value of an individual for trait 1 in connection with model (7). The mean of the posterior distribution is estimated as Further, in the context of multivariate selection, the estimate of the posterior mean of the merit of candidate i would be, using (32) and (33),
This article proposes an extension of the standard linear models for analysis of quantitative traits in genetics to situations in which there is feedback or recursiveness between phenotypic values. In addition, techniques for parameter inference are described, with a focus on Bayesian analysis via Markov chain Monte Carlo methods. The developments represent a merger of standard quantitative genetics theory and modern tools from Bayesian inference with existing knowledge from econometrics and sociology, i.e., simultaneous systems and structural equations models (Joreskög and Sorböm 2001).
Wright (1925) pioneered on this type of treatment of multivariate systems, beginning with his “corn and hog correlations,” as noted by Goldberger (1972), and reaching a climax in Wright (1960), where a difficult path analysis of “reciprocal interaction with or without lag” is presented. Curiously, although Wright's ideas were foundational in animal breeding and quantitative genetics, his work on feedback and on joint determination of systems of variables did not receive much attention (if at all) in these fields, even though biological systems typically display reciprocity between variables, with instantaneous or delayed feedbacks. Thus, it is perplexing that this type of work has been ignored in quantitative genetics. Path analysis (Wright 1921) was not favored by influential statistical geneticists such as Charles Henderson (C. Henderson, personal communication) and Oscar Kempthorne (Kempthorne 1969), who preferred variance component models and matrix treatments instead. For instance, calculation of additive relationships and of inbreeding coefficients is straightforward with tabular (matrix) methods (Henderson 1976; Quaas 1976), and the inverse of large relationship matrices is crucial in genetic evaluation via the mixed-model algorithm (see Gianola 2001, for a vignette). Similar calculations are awkward, at best, if done with the method of path coefficients, and it is not obvious how the needed inverse can be generated directly from the paths. Also, animal breeders often need to account for nuisance fixed effects (the explanatory variables are often discrete) and for interaction variance components in the models. This also contributed to the fading away of path analysis in animal breeding, since it was not clear how these problems could be treated in the standard framework of path analysis. Arguably, a heavy emphasis on large-scale computations may have distracted animal and plant breeders away from the modeling process, where a path diagram has an unparalleled eloquence.
On the other hand, path analytic methods have had an important impact in sociometrics. For example, see Duncan (1975) for an account of structural equation models. Further, the LISREL software (now in version 8; Joreskög and Sorböm 2001) for estimation of structural coefficients has been used in sociology since the mid 1970s. A discussion of the use of structural equation models in biology is in Shipley (2002), with emphasis on causal inference; an abundant literature on estimation and testing in structural models can be found in this text. In this context, our article can be viewed as an attempt to reclaim, for quantitative genetics, the heritage of Wright modeling ideas, although in the light of modern machinery developed mainly by econometricians.
As pointed out in matrix representations, there can be many competing models for explaining relationships in a multivariate system. Thus, model selection issues are paramount. The standard Bayesian tool kit for model selection and criticism, including the Bayes factor (Bernardo and Smith 1994; O'Hagan 1994) and posterior predictive checks (Gelman et al. 1995; Sorensen and Waagepetersen 2003), can be employed in connection with simultaneity and recursiveness without extra difficulties, other than those related to the computations.
The methods presented here can be extended in several directions. For example, some of the response variables involved may be discrete. Here, one could introduce latent variables, such as the typical liability variate of quantitative genetics (Thompson 1979; Gianola 1982), and then model feedback or recursiveness at that level. Also, nonlinearity (in the parameters) may be dictated by mechanistic considerations. For example, a gene product may accumulate following the trajectory of some nonlinear growth curve, and one may wish to capture such behavior in the model. Lack of linearity makes identification and computation more difficult, but the Bayesian Markov chain Monte Carlo machinery works satisfactorily with nonlinear systems (Gilks et al. 1996). Another extension consists of a hierarchical modelling of the λ coefficients controlling feedback or recursion. Here, we have assumed that the model depends on a single set of λ parameters. However, these may be clustered, e.g., family aggregation of parameters, or be affected by a locus with major effects. This sort of multitier specification can be handled via the usual Bayesian modeling of hierarchies, in the sense of Lindley and Smith (1972).
We have focused on feedback or recursiveness at the phenotypic level. However, these effects might also take place at the level of the genes, in which case the model could be written as with a “purely genetic” (phenotypic) feedback obtained by setting Λy (Λu) equal to an identity matrix. It would be of interest to study parameter identification issues in this more general model.
There are situations specific to animal breeding where introducing simultaneity in the model may alleviate some inadequacies of standard treatments. Consider, for example, dairy cows with first and second lactation milk yield records. It is known that if a cow has a good first record of performance, then there is a chance that she will receive preferential nutrition and management, relative to her herd mates. One can think of an effect of the first record on the second even in the absence of a mechanistic basis for this when the breeding value and permanent environmental effect of the cow are included in the statistical models for genetic evaluation. Similarly, one may consider including an effect of second on first records on similar grounds, as if the future had an influence on the present.
In short, there is a plethora of potential scenarios in biology where recursiveness or simultaneity enters into the realm of the possible. In spite of the towering influence of Sewall Wright, these phenomena have been essentially ignored in quantitative genetics. It is hoped that the techniques described in this article will contribute toward a deeper understanding of complex traits.
The authors thank James Crow, Arthur Goldberger, Guilherme Rosa, and Arnold Zellner for useful comments. Research was supported by the Wisconsin Agriculture Experiment Station and the Danish Institute of Agricultural Sciences and by grants from the National Research Initiatives Competitive Grants Program/U.S. Department of Agriculture (99-35205-8162 and 2003-35205-12833) and the National Science Foundation (DEB-0089742).
This article is dedicated to Arthur B. Chapman, teacher and mentor of numerous animal breeding students and disciple and friend of Sewall Wright.
Communicating editor: J. B. Walsh
- Received December 11, 2003.
- Accepted March 5, 2004.
- Genetics Society of America