# Determining the Effective Dimensionality of the Genetic Variance–Covariance Matrix

- Emma Hine
^{1}and - Mark W. Blows

- 1
*Corresponding author:*Department of Zoology and Entomology, University of Queensland, St. Lucia, Queensland 4072, Australia. E-mail: ehine{at}zen.uq.edu.au

## Abstract

Determining the dimensionality of **G** provides an important perspective on the genetic basis of a multivariate suite of traits. Since the introduction of Fisher's geometric model, the number of genetically independent traits underlying a set of functionally related phenotypic traits has been recognized as an important factor influencing the response to selection. Here, we show how the effective dimensionality of **G** can be established, using a method for the determination of the dimensionality of the effect space from a multivariate general linear model introduced by Amemiya (1985). We compare this approach with two other available methods, factor-analytic modeling and bootstrapping, using a half-sib experiment that estimated **G** for eight cuticular hydrocarbons of *Drosophila serrata*. In our example, eight pheromone traits were shown to be adequately represented by only two underlying genetic dimensions by Amemiya's approach and factor-analytic modeling of the covariance structure at the sire level. In contrast, bootstrapping identified four dimensions with significant genetic variance. A simulation study indicated that while the performance of Amemiya's method was more sensitive to power constraints, it performed as well or better than factor-analytic modeling in correctly identifying the original genetic dimensions at moderate to high levels of heritability. The bootstrap approach consistently overestimated the number of dimensions in all cases and performed less well than Amemiya's method at subspace recovery.

GENETIC variance–covariance (**G**) matrices conveniently summarize the genetic relationships among a suite of traits and are a central parameter in the determination of the multivariate response to selection (Lande 1979). Although individual elements of **G**, single-trait genetic variances along the diagonal and bivariate genetic covariances off the diagonal, are frequently the subject of hypothesis testing and biological interpretation, such a compartmentalized approach to the analysis of a set of functionally related traits is of limited value (Pease and Bull 1988; Charlesworth 1990; Blows and Hoffmann 2005). The multivariate pattern of genetic covariance represented by **G** can be analyzed by determining the eigenstructure of this symmetrical matrix and, subsequently, how the genetic variance is partitioned among genetically independent traits. Of particular importance is the rank of **G**, that is, how many genetically independent traits are represented by a set of functionally related phenotypic traits, a question that can be traced back to Fisher's (1930) geometric model (Orr 2000).

The distribution of the eigenvalues of **G** (the genetic variances of the genetically independent traits defined by the eigenvectors) allows an unambiguous definition of a genetic constraint. A singular **G** matrix (one or more of the eigenvalues are zero) suggests that an absolute genetic constraint exists (Pease and Bull 1988; Blows and Hoffmann 2005; Mezey and Houle 2005) and the effective dimensionality of **G** may be less than the number of traits measured. In turn, the eigenvectors of **G** have become important components of a number of approaches to the study of multivariate genetic constraints (Schluter 1996; Blows *et al.* 2004; McGuigan *et al.* 2005) and investigations of how selection may change the genetic variance (Blows *et al.* 2004; Hine *et al.* 2004).

Importantly, the singular nature (or otherwise) of **G** cannot be determined from a cursory viewing of the elements of **G**; **G** may be still be singular in the presence of genetic variance in all individual traits, for instance (Dickerson 1955; Charlesworth 1990). Determining the dimensionality of a covariance matrix, and the subsequent estimation of nonnegative definite covariance matrices, has received considerable attention in the statistical literature (Amemiya 1985; Anderson and Amemiya 1991; Calvin and Dykstra 1991; Mathew *et al.* 1994; Sun *et al.* 2003). However, the effective dimensionality of **G** has been addressed only on a few occasions, primarily using resampling approaches to generate confidence intervals for the eigenvalues of **G** (Kirkpatrick *et al.* 1990; Mezey and Houle 2005) and to find whether one or more eigenvalues of **G** were significantly different from zero (Kirkpatrick *et al.* 1990). In a recent departure from this approach, factor-analytic modeling has been adopted as a way of fitting genetic principal components in a framework that will also allow direct hypothesis tests of the number of genetic dimensions required to explain genetic covariation among environment and traits (Thompson *et al.* 2003; Kirkpatrick and Meyer 2004).

Here, we assess the utility of three methods for determining the dimensionality of **G**. First, we introduce how the effective dimensionality of **G** can be established, using a method for the determination of the dimensionality of the effect space from a multivariate general linear model (Amemiya 1985; Amemiya *et al.* 1990; Anderson and Amemiya 1991). We outline how the effective number of dimensions of a genetic covariance matrix obtained from either one or two random factor experimental designs commonly used in evolutionary studies can be determined. We then show how the original genetic covariance matrix can be partitioned so that a new, nonnegative definite covariance matrix can be constructed that contains only those genetic dimensions that have strong statistical support. Conveniently, this method requires calculating only the eigenvectors and eigenvalues of a symmetrical matrix without iteration, an approach that can be implemented in many commonly used software packages. Second, factor-analytic modeling in a restricted maximum-likelihood framework is utilized to demonstrate how dimensionality may be determined in many genetic experimental designs using a mixed-model approach, again resulting in a guaranteed positive semidefinite estimate of **G**. Finally, the bootstrap approach, recently used to establish full rank in *Drosophila melanogaster* wing shape by Mezey and Houle (2005), is implemented.

We compare the performance of all three methods in a simulation study conducted on 100 simulated data sets of known dimensionality, as well as applying them to a multivariate set of contact pheromones (cuticular hydrocarbons) from a standard half-sib breeding design using *D. serrata*. We have been investigating the effect of directional selection on the genetic variance in male sexually selected display traits of *D. serrata* (Blows *et al.* 2004; Hine *et al.* 2004). In this system, multiple male cuticular hydrocarbons act as contact pheromones and are under linear sexual selection. The vast majority of genetic variance in these male traits was found to be orientated in such a way that it was almost orthogonal to the direction of sexual selection in both a laboratory-adapted (Blows *et al.* 2004) and a field (Hine *et al.* 2004) population. These results suggested that sexual selection might have depleted genetic variance, resulting in an ill-conditioned **G** matrix that was effectively not of full rank.

## MATERIALS AND METHODS

#### Method I: the dimensionality of the effect space from a linear model:

##### The dimensionality of a covariance matrix:

To begin, consider the simple case of an experimental design that could be analyzed appropriately by a one-way multivariate analysis of variance; for example, multiple traits are measured on a number of individuals from a set of families. In this case, extraction of the between- and within-mean square matrices ( and , respectively) is the first step in the estimation of the between-source (family) variance–covariance matrix (), using(1)where *r* is the coefficient of the variance components at the between-source level. The characteristic roots (λ_{i}) of in the metric of can be found by solving(2)(Amemiya 1985). If λ_{i} ≥ 1 for all *i* = 1, 2,…, *p*, then the covariance matrix is nonnegative definite. If any λ_{i} < 1, then is indefinite as will often be the case with estimates of **G** (Hill and Thompson 1978). Solving the polynomial in λ represented by (2) requires specialized software. Instead, one can use linear algebra to find the character roots of in the metric of as defined by the eigenvalues (λ_{i}) ofwhere **L** is a lower triangular matrix and defined as the transpose inverse of **U** (upper triangular), which in turn is the Cholesky root of (*i.e.*, ). Note that the notation here differs from that of Amemiya (1985) to avoid confusion with lower (**L**) and upper (**U**) matrix notation. The classical Cholesky decomposition is a lower triangular matrix (), but many common mathematical and statistical programs (such as Matlab and SAS IML) use the upper triangular matrix convention (). Be aware of which convention your program of choice uses and ensure that the inverse of the Cholesky root is arranged in lower triangular form to define **L**.

When the existence of genetic constraints is of interest, the initial null hypothesis will be full rank (*p*) as a **G** matrix of full rank indicates there are no genetic constraints on the evolution of the set of traits. Let *k* be the number of characteristic roots such that λ_{i} ≥ 1. Amemiya *et al.* (1990; Anderson and Amemiya 1991) provide a way of determining how many of the *k* dimensions are supported statistically to determine the effective number of dimensions (*m*). As a first step, the null hypothesis that *m* ≤ *k* can be accepted immediately, because, at most, there is evidence for the same number of dimensions as there are λ_{i} ≥ 1. Next, we test the new null hypothesis that *m* ≤ *k* − 1. If this null hypothesis is accepted, as outlined below, we continue to test *m* ≤ *k* − 2 and so on until one in the series of null hypotheses is rejected or the null hypothesis that *m* ≤ 0 is accepted, indicating a lack of support for any effect.

To test the null hypothesis that *m* ≤ *b* (0 < *b* < *k* − 1), order the λ_{i} ≥ 1 from greatest to smallest (λ_{1} ≥ λ_{2} ≥ ⋯ ≥ λ_{k} ≥ 1) and to each apply the equationwhere *M* and *N* are the degrees of freedom at the effect (between) and error (within) levels, respectively. The test statistic for the current null hypothesis is then(3)To determine whether to reject the null hypothesis, the test statistic can be compared to the distribution of *Y* for in Table 1 of Amemiya *et al.* (1990). Rejecting the null hypothesis *m* ≤ *b* indicates that there is evidence for *b* + 1 dimensions.

##### Constructing a nonnegative estimate of a covariance matrix:

It is possible to construct a nonnegative matrix of rank *k* from the eigenvectors corresponding to those λ_{i} ≥ 1 (Amemiya 1985). Let the characteristic vectors of in the metric of from (2) form the columns of a matrix **T**. Define the matrix **P** as(Amemiya 1985). Using **P** we can now express the original difference aswhere and **I**_{pp} is the identity matrix. The characteristic roots (λ_{i}) of in the metric of can be used to partition into a nonnegative definite matrix and a negative definite matrix,(4)where **P**_{k} is a matrix containing the first *k* columns of **P**, **P**_{l} contains the remaining *p* − *k* columns of **P**, , and . The first term on the right-hand side of (4) then represents the nonnegative definite portion of and the second term represents the negative definite portion. By dropping the second term representing the negative definite portion, the negative eigenvalues are set to zero. The remaining term is then the projection of in the metric of onto the set of all nonnegative definite matrices (Amemiya 1985).

Again, in practice it is easier not to deal with estimating the characteristic vectors of in the metric of using (2). Alternatively, let **Q** be the matrix containing the eigenvectors of as columns (Amemiya 1985). Define the matrix **P** (Amemiya 1985):

Having established the effective number of dimensions, we can construct a new covariance matrix of rank *m* that is nonnegative definite and contains only the dimensions supported by the hypothesis-testing procedure, rather than of rank *k* from the eigenvectors corresponding to those λ_{i} ≥ 1. The first *m* columns of **P** (forming the matrix **P**_{m}) are associated with the characteristic roots of in the metric of that are statistically supported. **P**_{m} is used to construct a nonnegative definite covariance matrix of rank *m* by(5)where and **I**_{mm} is the identity matrix. The first *m* eigenvectors of represent the dimensions of the original covariance matrix that have strong statistical support as defined by the hypothesis-testing procedure outlined above. The remaining eigenvectors will have eigenvalues equal to zero. Amemiya (1985, appendix) goes on to show that (5) is equivalent to the restricted maximum-likelihood estimator of for balanced experimental designs.

#### Method II: factor-analytic modeling:

##### The dimensionality of a covariance matrix:

Although Amemiya's approach allows the determination of the number of dimensions of a **G** matrix, it is restricted in the experimental designs that can be accommodated by the approach. Factor-analytic modeling is commonly used to determine the minimum number of factors (dimensions) required to explain the pattern of covariation among a set of variables (Krzanowski and Lai 1988). Although not explicitly the goal of their development of factor-analytic modeling for fitting genetic principal components (Kirkpatrick and Meyer 2004; Meyer and Kirkpatrick 2005), factor-analytic modeling provides an alternative way to determine the dimensionality of **G** matrices because likelihood-ratio tests can be made of the improvement in fit as each principal component is added (or subtracted) from the model (Kirkpatrick and Meyer 2004; Meyer and Kirkpatrick 2005). This approach displays great promise as it allows for almost any experimental design to be used, and tests of dimensionality can be conducted within a restricted maximum-likelihood framework.

Kirkpatrick and Meyer (2004) introduced a new algorithm to fit genetic principal components that we do not attempt to implement here. Rather, as suggested by these authors, alternative algorithms are available, and here we outline a readily available approach using standard statistical software (Proc Mixed in SAS). In particular, the reduced-rank model in which the specific variances are assumed to be zero can be specified in Proc Mixed for the level of interest that contains the additive genetic variance components (the sire level in a half-sib design, for example), using the factor-analytic model [FA0(*m*)] covariance structure. At the genetic level of interest then, a reduced-rank covariance matrix is given by(6)where () is a lower triangular matrix of constants that represent the factor loadings of the *m* latent variables. A series of nested hypothesis tests may be conducted to determine how many genetic dimensions are required to explain the observed patterns of genetic covariance. A full model is first fit (*m* = *p*), and factors are sequentially dropped until the model achieves a significantly worse fit compared to the previous model (in which it is nested), indicating that the amount of variation accounted for by the tested factor is sufficient for the factor to be retained. Since is a lower triangular matrix, one less parameter is estimated in each subsequent column. Therefore, the degrees of freedom associated with each log-likelihood-ratio test are determined by the change in the number of parameters required in given by *p* − *m* + 1.

##### Constructing a nonnegative estimate of a covariance matrix:

Conveniently, fitting the reduced model will guarantee a nonnegative definite variance–covariance matrix with the desired number of dimensions. For example, fitting *m* ≤ *p* factors will result in a **G** matrix that is positive semidefinite with *m* dimensions, even if a number of the eigenvalues of the original **G** matrix were negative. Once the number of factors to be retained is established, the reduced-rank covariance matrix, , is obtained by multiplying the matrix of factor loadings by its transpose as in (6).

#### Method III: bootstrapping of the eigenvalues of G:

Mezey and Houle (2005) attempted to determine the dimensionality of **G** by generating a bootstrapped sample of the eigenvalues of the original **G** matrix. Here, we implement this bootstrap procedure to enable a comparison with the other two methods we present. Bootstrapping was conducted by sampling sires (and all their progeny as blocks) with replacement, generating 1000 estimates of **G**. The eigenvalues of each **G** were then calculated and mean eigenvalues and their 95% confidence intervals are presented. Constructing a nonnegative definite matrix from this approach is not possible as it does not allow a determination of the exact dimensions that have received statistical support, a limitation that we discuss in detail below.

#### Simulation:

To assess the relative effectiveness of the three methods, we conducted a simulation using 100 data sets. For simplicity and to reduce computational times we did not simulate a hierarchical data structure as in the half-sib experiment, but rather each data set consisted of 100 “sires,” each with 6 “offspring,” resulting in 600 offspring for which eight traits were generated. To generate data with known dimensionality at the sire level for each data set, we first generated an 8 × 2 factor matrix, , to represent the factor loadings of two latent variables, with randomly drawn constants in all elements except , which was set to zero as the pivot. We then multiplied by its transpose (as in factor-analytic modeling, Equation 6) to construct the 8 × 8 “genetic” covariance matrix. One hundred random linear combinations of the first 2 eigenvectors of **G**, where the random multipliers were drawn from distributions of mean zero and variance equal to the corresponding eigenvalue, were produced to represent 100 sire “breeding values.” One hundred simulated sets of breeding values for eight traits that could be completely described by two different latent variables were generated using this process.

We conducted the simulation on four different approximate levels of heritability: 0.75, 0.50, 0.25, and 0.10. To generate offspring phenotypes with an average heritability of ∼0.75, we generated for each simulated data set 600 (100 sires by six offspring) rows and eight columns of random numbers independently distributed with mean 0 and variance 1.5. The breeding value of each sire was added to six of these random numbers to create six offspring values for that sire. To obtain heritabilities of ∼0.50, 0.25, and 0.10, the diagonals of the “error” covariance matrix were set at 10, 30, and 100, respectively.

We applied Amemiya's method and factor analytic modeling to the same 100 data sets for each heritability level. Bootstrapping was initially applied to the 10 and 75% heritability treatments, and because the results differed little between the two extreme treatments, and to save on computational time, we did not proceed to apply this method to the intermediate heritability treatments. For Amemiya's method, variance components were estimated using least-squares, using Proc GLM in SAS as the data sets were balanced. Factor-analytic modeling was conducted using Proc Mixed in SAS, with an unconstrained covariance structure modeled at the “within-sire” level and the factor-analytic structure at the sire level. Bootstrapping of each data set was conducted by sampling sires with replacement, and 1000 bootstrap replicates of each of the 100 data sets were generated.

## RESULTS

Here, we have reanalyzed the data from the half-sib experiment described in Blows *et al.* (2004). In this experiment 66 sires were each mated to up to three dams, resulting in 367 offspring for which eight contact pheromones were analyzed. To this data set we have applied each of the three methods for determining the dimensionality of the additive genetic variance–covariance matrix. Although this experiment has not been designed to allow an exhaustive search for the presence of very small levels of genetic variance (Mezey and Houle 2005), it serves to illustrate the implementation and relative performance of the three methods in an experiment of a size commonly found in evolutionary studies.

#### Method I: the dimensionality of the effect space from a linear model:

The model for this experimental design was the standard nested MANOVA model, with dams nested within sires. The mean square matrices (*r* = 5.4218), (*r* = 1.8326), and can be found in Table 1. The approach outlined by Amemiya (1985) for such a model differed in one aspect from that presented above for the one-way model. Here, the approach was applied in a two-step process, by first finding that part of the dam-level covariance matrix () that was nonnegative definite and had strong statistical support. By using and in the above procedure, the ordered eigenvalues of were 3.260, 2.464, 2.115, 1.913, 1.726, 1.277, 1.164, and 1.003. Since all λ_{i} > 1, **Σ**_{dam} is nonnegative definite, although a number of the eigenvalues are quite small. We began the hypothesis-testing procedure by accepting the null hypothesis that *m* ≤ 8, and each subsequent null hypothesis was accepted until *m* ≤ 2, which was rejected (*Y* = 22.032, *P* < 0.05), suggesting that has three dimensions that have statistical support. A new dam-level covariance matrix was constructed with a rank of 3, which allowed the construction of by rearranging (1) to , where .

The dimensionality of (Table 2) was determined by following the same procedure, now using and the Cholesky decomposition of . The ordered eigenvalues of were 3.542, 3.001, 2.580, 1.506, 1.436, 1.046, 0.941, and 0.638. Since two roots were <1, the null hypothesis that *m* ≤ 6 was accepted. Each subsequent null hypothesis was accepted until *m* ≤ 1, which was marginally nonsignificant (*Y* = 24.410, *P* = 0.059), where the exact probability level was derived from Equation 4 in Amemiya *et al.* (1990), suggesting that there was evidence for two dimensions of genetic variance in the set of eight pheromones.

The two genetic dimensions supported by the analysis are the first two principal components of the constructed nonnegative definite matrix (Table 2) and are displayed in Table 3. The original **G** matrix presented in Blows *et al.* (2004) was indefinite with three negative eigenvalues. Here, it can be seen that the two dimensions supported by the current analysis are very similar to the first two principal components of the original **G** matrix (Table 3). A more formal subspace comparison (Krzanowski 1979; Blows *et al.* 2004) indicated the two two-dimensional subspaces in Table 3 are indeed very similar. The comparison yields a value of 1.76 for the sum of the eigenvalues of the **S** matrix (hereon represented as ) defined in Equation 3 in Blows *et al.* (2004), which in this case ranges from 0 for orthogonal subspaces to 2 for coincident subspaces.

#### Method II: factor-analytic modeling:

Factor-analytic models were run constraining the covariance matrix at the sire level to be from a single dimension to full rank (eight dimensions), and model fitting statistics are displayed in Table 4. Of the eight models, the two-dimensional model, FA0(2), displayed the lowest value of the Akaike information criterion (AIC), suggesting that overall, this was the model of best fit. The series of nested hypothesis tests indicated that moving from two dimensions to one resulted in the first significant increase in the log likelihood (χ^{2} = 17.4, d.f. = 7, *P* = 0.015), again suggesting that the two-dimensional model was the favored model. Eigenanalysis of the resulting covariance matrix constrained to be two-dimensional revealed that the two significant genetic dimensions were almost identical to the two dimensions found to have statistical support by Amemiya's approach (Table 3). A formal subspace comparison indicated that the two dimensions from each of these covariance matrices described almost coincident subspaces ( = 1.93 of 2).

#### Method III: bootstrapping of the eigenvalues of G:

Mean eigenvalues from the bootstrapped sample of each of the eigenvectors are presented in Table 5, along with the fifth percentile value that represents the lower 95% confidence interval in this circumstance as the significance of a variance component from zero is a one-tailed hypothesis. The lower 95% confidence interval does not overlap zero for the first four eigenvectors, indicating that significant genetic variance is present in these four dimensions. The next eigenvector, **g _{5}**, has a positive mean eigenvalue, but the lower 95% confidence interval overlaps zero and equates to a significance level of

*P*= 0.187, indicating that there is no evidence for genetic variance in this dimension.

#### Simulation:

Amemiya's method successfully identified two dimensions at the sire level in 99 of the 100 data sets in the 75% heritability treatment. The sole remaining case was identified as having only a single dimension and was the data set that displayed the most extreme condition index (here defined as the ratio of the first to the second eigenvalue). This method was less effective at lower heritability, with only 22 cases correctly identified as two-dimensional in the 10% heritability treatment. All other cases were identified as one-dimensional, indicating Amemiya's approach consistently underestimated the number of dimensions when heritability was low.

In contrast, factor-analytic modeling was relatively consistent in its ability to identify the correct number of dimensions across heritability treatments (Figure 1), although rejection of the two-dimensional model occurred in at least 30% of cases in each heritability treatment. The behavior of factor-analytic modeling with a change in heritability was more complex than Amemiya's approach, with overestimation of the number of dimensions at high heritability, with increasing underestimation of dimensionality at lower heritabilities (Table 6). One possible explanation for the failure of factor-analytic modeling to identify the correct number of dimensions was if the search algorithm had been captured by a local peak on the likelihood surface. To test this, we reran a subset of cases using the correct factor-analytic structure as the starting values. REML iterated to the same solution as when the default MIVQUE starting values were used, suggesting that the global maximum likelihood had been found in these cases.

To determine the effectiveness of Amemiya's method and factor-analytic modeling in identifying the correct genetic dimensions, we investigated the ability of the two approaches to identify the correct two-dimensional subspace in the 75% heritability treatment in which Amemiya's approach had a 99% success rate. We compared the two-dimensional subspaces of the estimated positive semidefinite covariance matrices from the two methods with the subspace defined by the first two eigenvectors of the rescaled simulated genetic covariance matrix (to account for orientation changes due to the standardization of the traits) for each data set. Both Amemiya's method and factor-analytic modeling were highly successful in uncovering the correct two-dimensional genetic subspace, with the mean values of (±SD) being 1.982 ± 0.102 and 1.944 ± 0.106, respectively. Figure 2 shows the relationship between the two methods in their ability to estimate the correct two-dimensional subspace. When factor-analytic modeling incorrectly indicated three dimensions, the two-dimensional subspace recovered by this method was consistently a poorer match to the correct subspace (Figure 2A). Including the third dimension in the subspace comparison (Figure 2B) improved the recovery of the correct subspace to a level comparable to that in Amemiya's method. Therefore, in the data sets in which factor-analytic modeling incorrectly identified three dimensions, the third dimension does in fact contain some proportion of the originally defined two-dimensional genetic subspace. Figure 2C shows that for those cases where factor-analytic modeling correctly indicated that two dimensions existed, there is an almost perfect relationship between the two methods.

Bootstrapping did not recover the correct number of dimensions in any of the 100 data sets, consistently overestimating the number of dimensions with 21, 78, and 1 cases having three, four, and five dimensions, respectively, at the 75% heritability level (Table 6). At the 10% heritability level bootstrapping again consistently overestimated the rank of **G**, although the number of dimensions identified decreased overall, with 42 and 58 cases of three and four dimensions, respectively. To assess the performance of bootstrapping in recovering the correct subspace within the dimensions identified as being significant, the mean bootstrap estimate of **G** was compared to simulated (rescaled, as above) **G**. Bootstrapping performed better for the higher heritability level, with mean values of (±SD) being 1.994 ± .017 and 1.781 ± 0.245 at the 75 and 10% heritability levels, respectively.

## DISCUSSION

#### Determining the dimensionality of G matrices:

The effective dimensionality of a set of traits plays a central role in the genetics of adaptation (Fisher 1930; Orr 2000) and the definition of absolute multivariate genetic constraints (Pease and Bull 1988; Blows and Hoffmann 2005). Here, we have investigated three methods for the determination of the effective dimensionality of a **G** matrix. Of the three methods, the simulation study indicated that Amemiya's approach performed best at higher heritabilities, but consistently underestimated dimensionality at lower heritabilities and was particularly biased in the 10% heritability treatment. Factor-analytic modeling tended to overestimate the number of genetic dimensions by 1 at higher heritabilities and underestimate at lower heritabilities by the same magnitude in a substantial number of cases. Although the performance of factor analysis in recovering the original genetic dimensions dropped off at the lowest heritability level, the two methods recovered the same two-dimensional subspace when the correct number of dimensions was indicated by factor-analytic modeling for the 25, 50, and 75% heritability treatments, suggesting a close mathematical association between the two approaches. The precise mathematical relationship between the two methods deserves further attention.

The simulation study indicated that bootstrapping will consistently overestimate the dimensionality of the **G** matrix, which is likely to be a consequence of two related limitations. First, each bootstrapped estimate of the covariance matrix will have a unique set of eigenvectors. Although the first eigenvalue will always be the largest in each replicate, the eigenvectors associated with each of the eigenvalues will be different; how different will depend mainly on the sample size, but even with very large samples the same trait combination (eigenvector) is unlikely to be replicated exactly in any of the bootstrapped samples. Since the eigenvalue is simply an estimate of the genetic variance, this is analogous to estimating the genetic variance of a different trait in each of the bootstrapped replicates, but treating the eigenvalues for hypothesis testing as if they estimated the genetic variance in the same trait each time. Second, if the eigenvalues of the original covariance matrix are distributed in such a way that some are very close in size, the order of the eigenvectors may change in different bootstrapped replicates (Cohn 1999; Peres-Neto *et al.* 2005), adding to the variation in the trait combinations that will be associated with each ranked eigenvalue. This is likely to be a particular problem with eigenvectors associated with small eigenvalues.

Either of these problems will result in like-sized eigenvalues of bootstrapped replicates being binned together for the calculation of the confidence interval for an estimate of an original eigenvalue, even though each estimate reflects a different trait combination. In this way, confidence intervals will be estimated as being small as the variance in each bin (*i.e.*, first eigenvalue, second eigenvalue, etc.) is determined only by the rank of the eigenvalue for each bootstrapped replicate, rather than by how much genetic variance is present for a particular trait combination in each bootstrapped replicate. Such an approach will tend to uncover as many statistically significant dimensions as there were positive eigenvalues in the original estimate of the **G** matrix; in our case, five positive eigenvalues were present in the original **G** matrix, and four dimensions had nonzero eigenvalues determined by bootstrapping. This pattern was also present in the results from the simulation study; in 63 (75% heritability treatment) and 70 (10% heritability treatment) cases, one less dimension than the number of positive eigenvalues was recovered.

Although the factor-analytic approach provides the most flexibility for determining the dimensionality of **G** from virtually any experimental design, Amemiya's approach may provide an important alternative under some conditions. One such situation is that in practice REML often does not converge with the inclusion of a large number of traits. For laboratory-based quantitative genetic or genomic experiments based on simple breeding designs with relatively balanced data sets, Amemiya's approach supplies a noniterative alternative to determining dimensionality. For example, microarray analysis tends to involve a very large number of gene expression traits. One approach to future microarray experiments will be to conduct these experiments within a breeding design. Such an experimental design will allow the determination of the number of independent genetic factors underlying the typically large number of gene expression changes among treatment groups. With so many traits to be included in a factor-analytic model, convergence problems are likely to be encountered and therefore these experiments may benefit from applying Amemiya's approach.

It is important to note that the approaches outlined above are subject to the usual power constraints due to small sample sizes. Genetic experiments conducted with small sample sizes are likely to be capable of detecting only one or two dimensions that explain a substantial amount of genetic variance. The poor performance of Amemiya's approach at low heritability suggests that this method will be particularly adversely affected by small sample size. Factor-analytic modeling also underestimated the number of dimensions as heritability dropped, with a less dramatic loss of performance than Amemiya's approach as the information content of the data was reduced. Consequently, we recommend these approaches as objective aids in defining a genetic subspace, but they should not be applied as the sole determining factor in subspace selection. In particular, in cases where fewer genetic dimensions included may increase the likelihood of rejecting a null hypothesis of interest (Blows *et al.* 2004 would be a good example), these approaches should be used with caution.

#### Are G matrices of full rank?

In our example, eight pheromone traits were shown to be adequately represented by only two underlying genetic dimensions by two of these methods: Amemiya's approach and factor-analytic modeling of the covariance structure at the sire level. The bootstrap method identified four dimensions that had eigenvalues significantly different from zero, but as indicated by the simulation, this is likely to be an overestimate. The sample size in this data set is not sufficient to allow an exhaustive search for dimensions describing small levels of genetic variance, so a definitive conclusion that only two dimensions have genetic variance would be premature. In the only other statistical determination of the rank of **G**, Mezey and Houle (2005) reported **G** matrices of full rank for wing shape in *D. melanogaster*, using the bootstrap approach in an experiment with a very large sample. Since the bootstrap approach is likely to overestimate dimensionality, it is uncertain if **G** matrices of full rank may be a feature of multivariate descriptions of morphology.

A major advantage of Amemiya's approach and factor-analytic modeling is that both can result in the identification of the reduced subspace for which there is strong statistical support for the presence of genetic variance. It is important to note that in our example, the two dimensions supported by these analyses were not exactly the same as the first two eigenvectors of estimated **G**, suggesting that directions in the space of **G** that describe substantial amounts of the additive genetic variance may also be associated with large nonadditive or environmental sources of variance.

In conclusion, determining the dimensionality of **G** provides an important perspective on the genetic basis of a multivariate suite of traits. Understanding the genetic basis of adaptation will require a multivariate approach, as single traits are rarely under selection in isolation (Lande and Arnold 1983; Schluter and Nychka 1994; Blows and Brooks 2003). In combination with emerging genomic technologies, statistical approaches to determining the number of genetically independent traits may potentially allow an investigation of how many independent genetic changes underlie a response to selection involving complex adaptations.

## Acknowledgments

We thank Y. Amemiya, D. Houle, H. Rundle, and four anonymous reviewers for comments on previous drafts of this manuscript. M.W.B. was supported by a grant from the Australian Research Council.

## Footnotes

Communicating editor: D. Houle

- Received December 12, 2005.
- Accepted March 14, 2006.

- Copyright © 2006 by the Genetics Society of America