## Abstract

Structural equation models (SEMs) of a recursive type with heterogeneous structural coefficients were used to explore biological relationships between gestation length (GL), calving difficulty (CD), and perinatal mortality, also known as stillbirth (SB), in cattle, with the last two traits having categorical expression. An acyclic model was assumed, where recursive effects existed from the GL phenotype to the liabilities (latent variables) to CD and SB and from the liability to CD to that of SB considering four periods regarding GL. The data contained GL, CD, and SB records from 90,393 primiparous cows, sired by 1122 bulls, distributed over 935 herd-calving year classes. Low genetic correlations between GL and the other calving traits were found, whereas the liabilities to CD and SB were high and positively correlated, genetically. The model indicated that gestations of ∼274 days of length (3 days shorter than the average) would lead to the lowest CD and SB and confirmed the existence of an intermediate optimum of GL with respect to these traits.

MULTIVARIATE mixed models have been extensively used in quantitative genetics to study genetic and environmental correlations between traits. Although in some cases standard mixed models (SMMs) can be seen as a reparameterization of linear recursive models, the former do not pose feedback or recursive relationships between phenotypes, which are generally present in biological systems. On the other hand, structural equation models (SEMs) can be used to study cause-and-effect relationships (Wright 1934b). These models were first introduced in genetics by Wright (1921), but have been essentially ignored in quantitative genetics until recently. Gianola and Sorensen (2004) presented statistical methods for inferring such relationships at the phenotypic level in a quantitative genetics scenario. The reappearance of these methods has stimulated application of SEMs to animal genetics and breeding (de los Campos *et al.* 2006a,b; López de Maturana *et al.* 2007a; Varona *et al.* 2007; Wu *et al.* 2007).

There has been increasing interest in birth-related traits (*e.g.*, gestation length, GL; calving difficulty, CD; and stillbirth, SB) in dairy cattle breeding in the past decade (Groen *et al.* 1997). Research has been primarily motivated by their economic importance and by animal well-being considerations. Calving difficulty, caused by an incompatibility between the calf's size and the dam's pelvic area (Meijering 1984), increases veterinary and labor costs, culling risk, and mortality in cows and calves, decreases milk production in the next lactation, and leads to lower female fertility in the next reproductive cycle (Dematawewa and Berger 1997; López de Maturana *et al.* 2007a,b). It is a complex trait controlled by genes affecting the ability of the calf to be born easily (direct genetic effects) and by genes affecting the ability of the cow to give birth without problems (maternal genetic effects) (Meijering 1984).

Stillbirths, defined as calves dying prior to 24 or 48 hr after calving, also cause significant costs to the dairy industry (Meyer *et al.* 2001). A multifactorial noninfectious etiology of SB has been reported in several studies (Meyer *et al.* 2000; Berglund *et al.* 2003; Steinbock *et al.* 2003). Calving difficulty is a relevant predictor of SB (Meyer *et al.* 2000), because prolonged parturitions cause extended hypoxia and significant acidosis, which can lead to the calf's death (Breazile *et al.* 1988; House 2002). Nevertheless, many other factors affect SB, since more than half of all stillborn calves are born from normal or easy calvings (Philipsson and Steinbock 2003; Steinbock *et al.* 2003). Furthermore, Steinbock *et al.* (2003), in a genetic analysis of SB, reported that only about half of its genetic variation is accounted for by problems at calving, suggesting that CD and SB should be treated as different traits in statistical analysis. Meyer *et al.* (2000) found that GL was the third most important predictive factor in the occurrence of SB in primiparous and multiparous cows. These authors found nonlinear relationships between GL and SB, which is in agreement with other studies (Meijering 1986; McGuirk *et al.* 1999; Hansen *et al.* 2004). Factors such as changes in endocrine profiles in late stages of pregnancy and calving supervision by farmers have been related to SB as well (Berglund *et al.* 2003; Swalve *et al.* 2006).

The objective of this study was to explore relationships between calving traits, using a recursive Gaussian-threshold model within a Bayesian framework. The model is a sire–maternal grandsire mixed-effects process, accounting for (1) the categorical nature of CD and SB, (2) nonlinear relationships between the GL phenotype and the liabilities to CD and SB, as well as a linear relationship between the liabilities to CD and SB, and (3) direct and maternal effects for all three traits.

## MATERIALS AND METHODS

#### Data:

The data consisted of a sample of primiparous Holstein cows calving from 2000 to 2005 that were recorded as a part of the National Association of Animal Breeders (Columbia, MO) Calving Ease Program. All cows used in the analysis had records for GL, CD, and SB.

Gestation length was calculated as the interval in days between breeding and calving. Calving difficulty was scored subjectively using a 1–5 scale in which 1 corresponded to unassisted calvings and 5 corresponded to extremely difficult calvings. Stillbirth was recorded as follows: 1, alive; 2, dead prior to 24 hr after calving; and 3, dead between 24 and 48 hr after calving. Categories 4 (considerable force) and 5 (extreme difficulty) for CD and scores 2 and 3 for SB were combined in the present analysis, to alleviate the “extreme category problem” arising in threshold models (Moreno *et al.* 1997). After editing, the final data set comprised GL, CD, and SB records from 90,393 primiparous cows, sired by 1122 bulls, mated to 567 service sires and distributed over 935 herd–calving year combinations. A summary of the data is given in Table 1. As expected, a nonlinear relationship was observed between GL and the incidences of CD and SB (Figure 1). There was an intermediate range of GL values at which CD and SB were lowest. Calves with gestation ∼274 days in length (3 days shorter than the average, 277–278 days) had the lowest rates of CD and SB. Calves with the shortest gestations had the highest rates of stillbirths. The incidence of SB decreased as the GL increased to 273 days and then increased thereafter. Regarding CD, calves with the longest and shortest gestations had higher incidence than those with intermediate gestation lengths.

On the basis of the nonlinear relationships depicted in Figure 1, and after analyzing the fit of B-splines with degree 1 considering different groupings, the 90,393 cows were classified into four groups, such that a linear approximation could be used to describe the relationship between GL and the incidences of CD and SB within each group. The incidences of CD and SB, the mean of GL for these groups, and the corresponding number of calving records are presented in Table 2.

#### Threshold models:

A threshold model (Wright 1934a) postulates that an ordered categorical variable *d* observed in individual *i*, , is the expression of an underlying unobservable continuous variable , referred to as liability (Falconer 1965). The variable takes values in one of mutually exclusive and ordered categories, which are bounded by thresholds (**t**). Thus, the probability that corresponds to category *c*, given the liability and the threshold(s), is expressed as(1)Here 1(.) is an indicator function, taking the value 1 when expression (.) is true and 0 otherwise. The statistical model adopted for the liabilities to CD and SB is discussed later in the article.

#### Biological relationships:

It is reasonable to hypothesize that recursive effects exist from the GL phenotype to the liabilities to CD and SB. Further, we assume that the liability to CD has a recursive effect on the liability to SB. A graphical description of the model is illustrated in Figure 2, where structural coefficients , , and describe the rates of change of the liabilities to CD and SB with respect to GL and that of the liability to SB with respect to the liability to CD, respectively. Thus, the acyclic graph assumes that GL has both direct and indirect recursive effects on the liability to SB; the indirect effect would measure how much the liability to SB would be affected by a change in GL through the mediating effect of the liability to CD. This indirect effect can be calculated as the product of the structural coefficients × . The overall recursive effect of GL on liability to SB is the sum of the direct and indirect recursive effects, + × (Shipley 2002).

#### Recursive Gaussian-threshold model:

A recursive model with heterogeneous structural coefficients (Wu *et al.* 2007) was used to analyze the relationships between the calving traits considered. A different structural coefficient matrix was assumed for each of the four GL groups, leading to heterogeneous covariance matrices for genetic effects, common environmental effects, and residuals in the four subpopulations. Here, the method was extended further to joint analysis of Gaussian and categorical traits.

Assuming a joint multivariate normal distribution for GL and the liabilities to CD and SB, the data for individual *i* belonging to subpopulation *k* were modeled as(2)where(3)Here, (*k* = 1, … , 4) is the matrix of structural coefficients corresponding to subpopulation *k* for GL, and it takes the form(4)In (2) above, is a 3 × 1 vector containing the observed GL and the unobserved liabilities to CD and SB for the *i*th individual (calf); is a vector including effects on the three traits, sex of calf (2 levels), age at first calving (4 levels), and the combination of calving year and calving season (12 levels); is a vector of contemporary group effects (herd–calving year, with 935 levels); **s** and **mgs** are vectors of the calf's sire and maternal grandsire genetic effects (with 567 and 1122 levels, respectively); and **e** is the vector of residuals. , , , and are incidence matrices of appropriate order.

Sires and maternal grandsire effects for the three traits were assumed to be distributed, *a priori*, as multivariate normal, with a null mean vector and a (co)variance matrix , where is the numerator relationship matrix among sires and maternal grandsires, involving 1619 bulls, andwithand(5)In (5), for example, is the between-sire variance for GL, is the variance between maternal grandsires for CD, and is the (co)variance between sire effects of GL and CD.

Effects of herd–year combinations (**h**) on the three traits were assumed to follow a multivariate normal distribution with null mean vector and (co)variance matrix , whereAbove, is the covariance between herd–year effects on GL and CD, and so on; **I** is an identity matrix of order 935. Finally, the residual vector was assumed to follow a multivariate normal distribution with mean vector 0 and (co)variance matrix , where(6)and **I** is and identity matrix of order 90,393.

#### Markov chain Monte Carlo implementation and posterior analysis:

Denote and, for simplicity, ignore hyperparameters in the notation, and let and be the vectors denoting the observed categories of CD and SB, respectively, for the 90,393 cows. The joint posterior density of θ and of the liabilities is given by(7)Note that and are independent, given the liabilities. The third term is the density of the sampling model for GL and the liabilities to CD and SB. The last term is the joint prior density of the unknown parameters, and it can be factorized as follows, under the assumption of prior independence between groups of parameters:(8)Multivariate normal prior distributions were assigned to the structural coefficient vectors ; systematic effects , where ; herd–year effects, ; and sire and maternal grandsire genetic effects, so that their fully conditional posterior distributions were also normal. The prior distributions of the genetic and herd–year (co)variance and were assumed to be inverted Wishart distributions with 8 and 5 d.f., respectively, so that their fully conditional posterior distributions were inverted Wishart as well (Sorensen and Gianola 2002).

To ensure identifiability in the threshold models, the first threshold delimiting categories of SB and CD was set to 0 and the second threshold for CD was set to 1. The third threshold for CD was sampled through a Metropolis algorithm, using a method described by Albert and Chib (1997) and applied by González-Recio *et al.* (2006). In addition, the residual variance of the binary trait SB was set to 1.

At the level of the marginal likelihood, the proposed recursive model is underidentified, since there are three extra parameters. To achieve identification, the residuals corresponding to different traits were assumed uncorrelated, so that the matrix was assumed to be diagonal. The prior and, consequently, the fully conditional distributions of residual variances of GL and CD were assumed to be scaled inverted chi-square distributions.

Bayesian inference about parameters was via a Markov chain Monte Carlo implementation using the SirBayes package (Wu *et al.* 2007). The length of the chain and the burn-in period were assessed by visual examination of trace plots of posterior samples of selected parameters plus additional diagnostic checks. After a preliminary run, it was decided to run five independent chains, each consisting of 10,000 iterations, and starting from random points generated from the prior distributions. In each chain, the first 1000 iterations were discarded as burn-in; subsequently one of each 10 successive samples was retained, to store draws that were more loosely correlated. Thus, 4500 samples were used to infer posterior distributions of unknown parameters.

Features of marginal posterior distributions of interest, the convergence analysis, and estimates of Monte Carlo error were obtained using the BOA package (http://www.public-health.uiowa.edu/boa).

#### Genetic and structural parameters:

##### Direct and maternal genetic parameters:

Additive genetic (co)variances of direct and maternal effects were derived using the formulas(9)(Willham 1972; Kriese *et al.* 1991), where, for example, , , , and are the variances of additive direct genetic effects, additive maternal genetic effects, and sire and maternal grandsire effects, respectively, when *i* = *j*; and and are the covariances between additive direct genetic and maternal genetic effects and between sire and maternal grandsire effects, respectively, when *i* = *j*.

##### System parameters:

Location and dispersion parameters in the recursive model differ from those in a SMM and are interpretable as “system parameters.” For example, the covariance matrices (*i.e.*, ) need to be “adjusted” by the matrix of the structural coefficients (), to be comparable with their counterparts obtained from SMMs (Sorensen and Gianola 2002):(10)Note that, although was assumed to be diagonal, is a full matrix with nonzero off-diagonal elements.

#### Posterior analysis of structural coefficients:

Because the recursive relationships were assumed to exist between the phenotype of GL and the liabilities to CD and SB, the rates of change [*i.e.*, , , and ] are expressed in liability units. The interpretation of these recursive effects, however, is not transparent, and it helps if estimates are transformed to the observable scale. A method, inspired by Dempster and Lerner (1950), is presented below. Here, only the rate of change of liability to CD with respect to GL [] is considered, but the method extends to the other structural coefficients.

The issue is how the probability of having a difficult calving (CD = 4 + 5) changes as GL increases by 1 day and how to express the rate of change accordingly. If is positive, an increase of GL by 1 day will increase liability to CD by units per day. This means that the probability of having a difficult calving increases, and the threshold separating categories 3 and 4 “moves to the left.” If is negative, the liability to CD decreases, as well as the probability of having difficult calvings, and the threshold “moves to the right.”

Let the change in the probability scale be(11)where is the difference between the probability of having CD after () and before () a change of 1 day of GL and can be calculated as(12)where and are the mean and residual standard deviation of the liability to CD, respectively.

## RESULTS AND DISCUSSION

#### Posterior analysis:

The five chains gave similar estimates of posterior densities for all unknown parameters monitored. Small Monte Carlo errors (∼10^{−3}–10^{−4}) were obtained, reflecting a good mixing of chains, and the shrinkage factors in the procedure of Gelman and Rubin (1992) were near unity. Thus, convergence was assumed and samples from the five chains were combined to estimate features of marginal posterior distributions of parameters of interest.

#### Genetic parameters:

Table 3 shows the posterior means and standard deviations of the direct and maternal genetic, herd–year, and residual (co)variances of the system (*i.e.*, ).

For the sake of comparison, results from the analyses of SMMs are also discussed in this section.

Posterior distributions of direct and maternal heritabilities for the three calving traits were very similar in the four subpopulations and also similar to their counterparts from the analyses of the SMMs (see Table 4). The posterior mean of direct heritability of GL was higher than that for maternal heritability (0.39 *vs.* 0.08); corresponding estimates for CD (0.08 *vs.* 0.07) and SB (0.07 *vs.* 0.10) were similar. Heritability estimates were within the range of values reported in previous studies (Steinbock *et al.* 2003; Hansen *et al.* 2004; Heringstad *et al.* 2007); estimates for CD and SB were higher than those used in the U.S. Department of Agriculture genetic evaluation of CD and SB, except for that of direct heritability of CD (Wiggans *et al.* 2003; Cole *et al.* 2007). The estimate of direct heritability of GL, higher than the one of maternal heritability, may reflect the fact that mainly fetal genes determine the onset of parturition (Meijering 1984; Hansen *et al.* 2004). Regarding CD and SB, the present results suggest similar contributions to the variance by genes controlling direct and maternal effects.

Features of posterior distributions of genetic correlations between direct effects and between maternal effects by group of GL are shown in Table 5. Posterior means of genetic correlations between direct effects for GL and CD were moderate, positive, and slightly higher than those between maternal effects for GL and CD. Those estimates were similar in sign to but different in magnitude from their counterparts obtained from the analysis of SMMs (0.29 and 0.18 for genetic correlation between direct and maternal effects, respectively; see Table 5). Genetic correlations of different magnitude and sign regarding the different GL subgroups were obtained between direct effects for GL and SB: the posterior mean was negative (−0.30) and with 0 outside of the highest posterior density interval of size 99% (HPD_{99%}) for gestations of 261–267 days, nil for gestations of 268–273 and 274–279 days, and positive (0.37, HPD_{99%}) for gestations of 280–291 days. In this case, both magnitude and sign of the estimates differed from those obtained from the SMM analyses (0.16). Posterior means of genetic correlations between maternal effects for GL and SB were negative in the first three subgroups (in agreement with the sign of the estimate obtained from SMMs, −0.15), whereas that for the longest gestation subgroup was nil. The heterogeneous genetic correlation between direct genetic effects for GL and SB might explain why other studies, such as that of Hansen *et al.* (2004), found very weak genetic associations using a SMM. Note that the estimates obtained from the analyses of SMMs are within the range of the estimates obtained for the four subgroups regarding GL. Relatively high and positive estimates were obtained for the genetic correlation between direct effects for CD and SB in each of the four subpopulations (0.62–0.73). The posterior mean of the maternal genetic correlation between CD and SB was also positive, although slightly lower (0.56–0.59). All these estimates were slightly lower than those obtained from the SMM (0.79 and 0.70, unpublished results) and also than those reported by Steinbock *et al.* (2003), Hansen *et al.* (2004), and Heringstad *et al.* (2007).

As shown in Table 6, the genetic correlations between direct and maternal effects were negative and low for GL and SB and similar to the estimates obtained from the SMM (−0.19 and −0.14, respectively). These “antagonisms” suggest that genetic selection programs should address both types of genetic effects. The posterior mean of the genetic correlation between direct and maternal effects for CD was close to 0, which is in agreement with the estimate from the SMM (−0.04).

All HPD_{90%} for genetic correlations between direct and maternal effects for different traits included 0 and were similar to their counterparts from the SMM (Table 7). This indicates that effects of genes controlling direct effects for one calving trait are not associated with those controlling maternal effects for another calving trait, and vice versa.

Heterogeneous correlations were found between residuals of GL and those of CD and SB (Table 8). Posterior means of the residual correlation between GL and CD were positive and low, although of different magnitude, for all subgroups, except the first, which was close to 0. Its counterpart from the SMM was also low and positive (0.10). In regard to the residual correlation between GL and SB, posterior means were low and negative in the first two subgroups of GL (−0.17 and −0.08), nil in the third, and low and positive (0.10) for gestations of 280–291 days. In this case, its counterpart obtained from SMMs was nil. For herd–year effects, a low and positive correlation was found between GL and SB (similar to that obtained from SMMs, 0.13). Similarly to the results from SMMs, correlations between the herd–year effects of other calving traits were close to 0.

#### Posterior analysis of the structural coefficients:

Figure 3 shows posterior densities of the structural coefficients for the four subpopulations of GL [, , and ]. The differences in shapes between posterior and prior densities (not shown) indicated that the data substantially modified beliefs about the λ's. The HPD_{90%} of all structural coefficients, except for , did not contain 0, indicating that recursive relationships exist between calving traits. Table 9 shows rates of changes of CD and SB expressed on the observable scale.

##### Direct recursive effect of GL on CD:

Nonlinearity between GL and liability to CD was confirmed, because the rate of change for the four GL subpopulations was different, except for gestations within 274–279 and 280–291 days (considering HPD_{95%}).

For gestations pertaining to the first subgroup, an extra day of GL would not increase CD. However, the rates of change were positive for the remaining populations, indicating that calving problems would increase by 0.24, 0.37, and 0.47% for each additional day of gestation, respectively.

##### Direct recursive effect of GL on SB:

In the three subpopulations with the shortest gestations (261–267, 268–273, and 274–279 days), an extra day of gestation would decrease incidence of SB by 0.96, 0.57, and 0.40%, respectively. However, for the subgroup of longest gestations (280–291 days), the incidence of SB would increase by 0.23% per extra day of GL.

##### Direct recursive effect of CD on SB:

The overlap of the posterior densities of the structural coefficients for the four subgroups (Figure 3c) suggested a linear relationship between liabilities to CD and SB (HPD_{99%}). The positive estimates obtained in all subgroups indicated that an increase in liability to CD would also increase liability to SB. The rate of SB would increase by 0.60% if the incidence of difficult parturitions increased by 1%.

##### Indirect recursive effect of GL on SB (through CD):

The trend of indirect recursive effects of GL through the mediating effect of CD [] across subgroups was similar to that of the direct recursive effect of GL on CD []. The magnitudes of the coefficients were much lower than those corresponding to the direct recursive effect (Table 9), indicating that the overall recursive effect of GL on SB is mainly a direct one. Unlike the direct effect, an extension of gestation by 1 day, *e.g.*, in the subpopulation with shortest gestations (261–267 days), would not decrease the incidence of stillbirths. If gestations were lengthened by 1 day in the remaining subpopulations, the incidence of stillbirths would be affected (HPD_{99%}) indirectly (through CD), increasing by 0.14, 0.23, and 0.29%, respectively.

##### Overall recursive effect of GL on SB:

Different rates of change of SB as a consequence of direct and indirect recursive effects of GL corroborated nonlinearity between GL and liability to SB. The incidence of SB would be expected to decrease by 0.93 and 0.43% in the two subpopulations with the shortest gestations, if gestations were lengthened by 1 day. In the third subpopulation, the HPD_{90%} of the total effect contained 0. However, for the 280- to 291-day group, if gestations were prolonged by 1 day, the incidence of stillbirths would increase by 0.52% (HPD_{99%}).

Results indicated that the highest incidences of stillbirths occurred after extremely short or long gestations. On the other hand, the shortest gestations were found to be related to the lowest incidence of problems at calving. For longer gestations, the incidence of CD increases nonlinearly, so that the highest incidence of dystocias was for the extremely long gestations. The fact that not all SBs are associated with CD might be explained by a poorer vitality of calves with extremely short or long gestations.

#### Future research:

As noted earlier, segmented regression was useful to detect “abrupt changes” of the liabilities to CD and SB at an increase of GL. However, this approach has two drawbacks: the lack of continuity at the breakpoints delimiting GL subgroups and the subjective predetermination of the subgroups before the analyses. Statistical approaches that might have been considered to address the problem of discontinuity, *e.g.*, piecewise linear regression, second-order function, or change-point models, are discussed next. The main advantage of using a piecewise linear regression over the segmented regression approach is that it would satisfy the continuity at the points separating the GL subgroups. However, the use of restrictions to allow for continuity at those points could lead to estimates with higher MSE. Implementation of a second-order function could deal with the lack of linearity and continuity, and it would not be necessary to distinguish subgroups regarding GL. However, such a statistical approach would be more seriously challenged by the identification problem, leading to imposing restrictions and making computation more difficult (Gianola and Sorensen 2004). Moreover, the adjustment of the system parameters to get parameter estimates comparable to those from the SMM or from recursive mixed models (RMM) with heterogeneous structural coefficients is not straightforward. An alternative approach to be considered in future research would be a Bayesian approach with multiple change points (Carlin *et al.* 1992; Chib 1998). The change-point model can be formulated in terms of an unobserved discrete state variable indicating the distribution a particular observation belongs to. The state variable can either stay at the current value or jump to the next higher value (Chib 1998). The Bayesian framework allows comparison of change-point models with different numbers of change points (*e.g.*, by computing the Bayes factor). One of the critical points of the implementation of this method might be the incorporation of the prior information into the analyses.

#### Implications:

Application of a recursive model with heterogeneous structural coefficients to a trivariate analysis of calving traits allowed consideration of the nonlinear relationship between GL phenotype and the liabilities to CD and SB. In addition, given the scenario assumed, it was possible to disentangle direct and indirect recursive effects and to quantify them.

A heterogeneous correlation was detected between direct genetic effects of GL and SB. Low genetic correlations between GL and liabilities to CD and SB suggest that genetic selection for GL would not greatly affect the genetic level of either CD or SB. However, GL is an important trait that should be taken into account in artificial breeding, because of its association with other calving traits. On one hand, the shortest gestations are related to high incidences of SB and to the lowest incidence of CD. On the other hand, the longest gestations are associated with high incidences of both CD and SB. To minimize the incidence of both CD and SB, the existence of an intermediate optimum for gestation length regarding CD and SB was suggested, such that gestations of ∼274 days in length (3 days shorter than the phenotypic mean) led to the lowest rates of CD and SB and therefore to an optimal situation from an animal well-being point of view.

## Acknowledgments

Gustavo de los Campos and two anonymous referees are thanked for useful comments. The authors acknowledge the National Association of Animal Breeders (Columbia, MO) for providing data for this study, as well as for providing partial financial support for K.W. Research was also supported by the Wisconsin Agriculture Experiment Station and by National Science Foundation grant NSF-DMS-044371.

## Footnotes

Communicating editor: C. Haley

- Received August 11, 2008.
- Accepted November 1, 2008.

- Copyright © 2009 by the Genetics Society of America