## Abstract

The distribution of genetic variance in multivariate phenotypes is characterized by the empirical spectral distribution of the eigenvalues of the genetic covariance matrix. Empirical estimates of genetic eigenvalues from random effects linear models are known to be overdispersed by sampling error, where large eigenvalues are biased upward, and small eigenvalues are biased downward. The overdispersion of the leading eigenvalues of sample covariance matrices have been demonstrated to conform to the Tracy–Widom (TW) distribution. Here we show that genetic eigenvalues estimated using restricted maximum likelihood (REML) in a multivariate random effects model with an unconstrained genetic covariance structure will also conform to the TW distribution after empirical scaling and centering. However, where estimation procedures using either REML or MCMC impose boundary constraints, the resulting genetic eigenvalues tend not be TW distributed. We show how using confidence intervals from sampling distributions of genetic eigenvalues without reference to the TW distribution is insufficient protection against mistaking sampling error as genetic variance, particularly when eigenvalues are small. By scaling such sampling distributions to the appropriate TW distribution, the critical value of the TW statistic can be used to determine if the magnitude of a genetic eigenvalue exceeds the sampling error for each eigenvalue in the spectral distribution of a given genetic covariance matrix.

AS a consequence of pleiotropy, the biological conclusions drawn from the distribution of genetic variance across phenotypic space are often in sharp contrast to the magnitude of genetic variance typically observed for individual traits (Dickerson 1955; Blows and Hoffmann 2005). Most individual traits tend to display substantial levels of genetic variance; however, most of this genetic variance is often confined to fewer multivariate trait combinations than the number of individual traits that are measured (Hine and Blows 2006; Kirkpatrick 2009; Walsh and Blows 2009). As applications of quantitative genetics to human health, agriculture, and evolutionary biology move toward adopting multivariate analyses of genetic variance, it will therefore be important that analytical approaches to accommodate the complexity of higher dimensional genetic information are developed.

The fundamental tool for understanding how pleiotropic covariance restricts the genetic variance in individual traits to multivariate combinations of these traits is the genetic covariance (**G**) matrix, a symmetrical variance-component matrix whose diagonal elements represent the genetic variance in individual traits and the off-diagonal elements the covariances between them. The multivariate distribution of genetic variance is then determined by the empirical spectral distribution of the eigenvalues of **G**, in which each eigenvalue explains a portion of the total genetic variance underlying the phenotypic space (Dickerson 1955; Hill and Thompson 1978; Lande 1979; Pease and Bull 1988; Blows 2007). Although analyses of the spectral distribution of **G** are relatively uncommon, an exponential decline in eigenvalues is typically observed, with some eigenvalues approaching zero (Kirkpatrick 2009; Pitchers *et al.* 2014). In evolutionary genetics the relative sizes of the eigenvalues of **G** are of particular interest, as they may determine how populations respond to selection in directions of phenotypic space. For example, trait combinations with small genetic eigenvalues that have low levels of genetic variance form a nearly null genetic subspace (Gomulkiewicz and Houle 2009; Houle and Fierst 2013; Hine *et al.* 2014), which may or may not include a true null subspace where genetic variance is zero (Mezey and Houle 2005). The response to selection in these directions of phenotypic space may be severely slowed (Kirkpatrick *et al.* 1990) and biased toward trait combinations with higher levels of genetic variance (Chenoweth *et al.* 2010). If population sizes are small, failure to respond to selection in these regions of phenotypic space is also a real possibility (Gomulkiewicz and Houle 2009; Hine *et al.* 2014). Consequently, determining whether the eigenvalues from variance-component matrices that represent levels of genetic variance are significantly different from zero is important for understanding the evolution of multivariate phenotypes.

Although the pattern of decay in genetic eigenvalues from multivariate genetic analyses may reflect the biological covariance among traits, it is known from random matrix theory (RMT) that a similar pattern is also the expected outcome from random sampling alone (Johnstone 2006). RMT provides a framework for understanding the behavior of eigenvalues of symmetrical matrices with elements drawn randomly from a wide array of statistical distributions. For sample covariance matrices in particular, the behavior of such eigenvalues has been the subject of intense interest (Wigner 1955; Bai and Silverstein 2010). In a genetics context, sample covariance matrices describe among-individual covariance structure, for example SNPs that are represented as genomic relatedness matrices (Patterson *et al.* 2006), and phenotypic covariance matrices.

To illustrate two key results from RMT for sample covariance matrices, consider the phenotypic covariance (**P**) matrix of a set of *p* = 5 quantitative traits sampled from a multivariate normal distribution with a mean of 0 and covariance matrix equal to the identity matrix on *n* = 250 individuals. While the naive expectation would be that all eigenvalues of this matrix are equal to one, sampling error causes the eigenvalues to be overdispersed, where some eigenvalues are much larger or smaller than one. This spectral distribution of eigenvalues can be characterized in two ways: First, the bulk of the eigenvalues are known to conform to the Marchenko–Pastur (MP) distribution (Johnstone 2006), which defines the interval in which the bulk of the eigenvalues will fall and the shape of their distribution (Figure 1A). Second, each individual eigenvalue in the spectrum is known to conform to a variation of the Tracy–Widom (TW) distribution for which the moments of the distribution differ for each eigenvalue (Tracy and Widom 1996; Johnstone 2001) (Figure 1B). Using this property of individual eigenvalues, one application of RMT in genetics has been to test for cryptic population structure in genome-wide association studies. When *p* SNPs are used to estimate the relatedness between *n* individuals, the eigenvalues of the *n* × *n* sample covariance matrix (the genomic relatedness matrix) can be tested against the TW distribution to determine if cryptic population structure is present in the sample (Patterson *et al.* 2006; Bryc *et al.* 2013).

In contrast, a **G** matrix is partitioned from **P** using a multivariate random effects model, and therefore is not an example of a sample covariance matrix, but a variance component matrix. The theoretical limiting distributions for the eigenvalues of variance component matrices, like **G**, have been until recently unexplored. Hill and Thompson (1978) first noted that genetic eigenvalues tended to be overdispersed as a consequence of sampling; large eigenvalues tended to overestimated, and small eigenvalues tended to be underestimated. Blows and McGuigan (2015) showed that the bulk of the genetic eigenvalues follow a MP-like distribution that describes this overdispersion in the context of RMT, and it has subsequently been confirmed that the MP distribution can be generalized to describe the bulk distribution of eigenvalues from variance component matrices (Fan and Johnstone 2016). In addition to the bulk distribution of eigenvalues, the leading eigenvalue of **G** has been shown to conform to the TW distribution using an empirical scaling approach (Saccenti *et al.* 2011; Blows and McGuigan 2015), highlighting that hypothesis testing of genetic eigenvalues can take advantage of the TW distribution to form the appropriate null distribution that observed genetic eigenvalues can be tested against.

Here, we show how RMT approaches can be applied in quantitative genetic analyses to account for the sampling error that is concentrated in the leading eigenvalues of **G** matrices. We demonstrate how the use of confidence intervals on the eigenvalues of **G** estimated from restricted maximum likelihood (REML) and Bayesian genetic analyses can result in erroneous conclusions concerning the presence of genetic variance when examined in the absence of the appropriate null distribution. We integrate the use of the TW distribution into the analyses of genetic eigenvalues from these approaches to enable a test of whether a genetic eigenvalue significantly differs from the null distribution.

## Methods

### Simulation of random data

To illustrate the behavior of the sampling variance in genetic eigenvalues, we generated 10,000 simulated data sets with a simple experimental design that represented 50 lines, with 5 individuals sampled from each line (representing, for example, 5 individuals sampled from each of 50 inbred lines). The identity matrix was used to sample the five traits for each data set from a multivariate normal distribution with a mean of 0, giving no rise to within- or among-line covariance. Therefore, any covariance among the five traits within each data set was solely a consequence of sampling. Estimation of (co)variance components at the among-line (genetic) level then represents the behavior of the sampling error in the genetic variance of individual traits and the multivariate genetic eigenvectors, in the absence of genetic information in the data.

This simulation closely matches that used in Blows and McGuigan (2015), where the relevance of the MP and TW distributions for genetic eigenvalues was first explored using 10 simulated traits. In the current work we use only five traits to reduce computational times. This also represents the modal number of traits that are currently used in empirical studies that have estimated **G** matrices (Pitchers *et al.* 2014). While RMT was developed for very high-dimensional problems, the TW distribution has been shown to empirically fit well with *p* = 5 traits in a number of contexts (Ma 2012), including variance component matrices (Blows and McGuigan 2015).

### Statistical analysis

The statistical analysis of multivariate genetic data typically employs either REML- or Bayesian-based approaches to estimate the genetic covariance matrix. In these analyses, an important choice to be made in the multivariate modeling is the type of covariance structure to be imposed on the data at the genetic level (in this case the among-line level of our simulation). The simplest covariance structure is an unconstrained covariance matrix that permits genetic correlations to exceed the theoretical boundary (−1 and 1), often resulting in negative eigenvalues of **G**. The probability that an estimated **G** matrix will be nonpositive definite is very high with the commonly employed sample sizes in evolutionary studies (Hill and Thompson 1978). While negative eigenvalues seem counterintuitive, they are an important component of the behavior of the sampling error in variance component matrices and the unconstrained covariance structure is therefore most compatible with RMT (Blows and McGuigan 2015).

An alternative choice that is used extensively in quantitative genetics is a constrained covariance structure that requires the estimate of **G** to stay within the parameter space. Within REML-based analyses, this can be achieved by specifying a factor-analytic covariance structure that fits a number of orthogonal trait combinations, up to the number of traits. In this case, **G** is constrained to be positive semidefinite, so that all eigenvalues are greater than zero. Similarly, Bayesian analyses that use Markov chain Monte Carlo (MCMC) typically sample from a sums-of-squares-and-cross-product matrix, and consequently estimates of **G** from these analyses are also constrained to be positive semidefinite (Hazelton and Gurrin 2003; Sorensen and Gianola 2010). Although a constrained covariance structure results in biologically sensible estimates of **G**, the leading genetic eigenvalues remain subject to the sampling error predicted from RMT as we show below.

Here, we use both REML and MCMC approaches to analyze random data, to determine whether the sampling variance that is concentrated in the leading eigenvalues that result from the respective analyses conform to the TW distribution. We then demonstrate how the TW distribution can be used to determine whether an observed genetic eigenvalue significantly differs from zero, and explore the utility of this method for three experimental designs that represent different levels of power. First, using REML, we analyze the 10,000 simulated data sets employing both unconstrained and factor-analytic covariance structures to determine whether the sampling variance of random data generated by each of these models is TW distributed. Next, using an MCMC approach, we analyzed single random data sets using two commonly employed priors. We sampled 10,000 times from each of the posterior distributions to determine whether the posterior distributions of eigenvalues from MCMC models are TW distributed. Third, we use a recently suggested REML-multivariate normal distribution (REML-MVN) sampling approach, which approximates the sampling error of genetic (co)variances (Meyer and Houle 2013; Houle and Meyer 2015) to generate 10,000 samples from the REML analysis of single random data sets, to determine whether the eigenvalues generated by REML-MVN sampling of random data are TW distributed. Finally, we simulated data with genetic variance for three experimental designs that represent different levels of power, and we demonstrate how REML-MVN sampling can be used to test whether a genetic eigenvalue represents a significant level of genetic variance using the TW distribution.

### REML estimation of G using unstructured and factor-analytic models and the TW distribution

We performed two separate analyses of the 10,000 simulated data sets, first employing an unconstrained covariance structure and second using a factor-analytic covariance structure. For both analyses, **G** was estimated using the following multivariate linear model:(1)where **Y** denotes a stacked vector of multivariate observations, was the among-line design matrix relating observations to the vector of unknown random effects, was the vector of unknown random genetic effects, and *ϵ* was a vector of residual errors. The random effect (and residuals) were assumed to be normally distributed and elements of were further assumed to be drawn from where **G** was the genetic covariance matrix and was the genetic relationship matrix. In the case of the unconstrained analysis, **G** was modeled as an unstructured covariance matrix; and in the case of the factor-analytic analysis, **G** was modeled using a full-rank, factor-analytic structure:(2)where was a lower triangular matrix of factor loadings. Models were run using REML implemented in the MIXED procedure in SAS (version 9.4; SAS Institute). Both analyses returned 10,000 **G** matrices, from which the spectral decomposition provided 10,000 samples for each of the five eigenvalues of **G**. As any covariance in **G** estimated from these models is simply a consequence of sampling, these eigenvalues form the null distribution of sampling variance that can be compared against the theoretical TW distribution.

To determine whether the eigenvalues of **G** obtained from (1) conform to the TW distribution specific to each eigenvalue, they first needed to be scaled and centered. The scaling and centering parameters for the theoretical limiting distributions of variance component matrices such as **G** are currently unknown (I. Johnstone, personal communication), however an empirical scaling approach can be used (Saccenti *et al.* 2011).

Rescaling of each observed genetic eigenvalue was accomplished using the approach of Saccenti *et al.* (2011):(3)where and are the mean and SD of the theoretical TW distribution for the *i*th eigenvalue, and and are the mean and SD of the corresponding observed genetic eigenvalues of random data respectively. The rescaling procedure followed here differs from that given in Box 2 of Blows and McGuigan (2015), with the removal of a redundant step represented by an initial rescaling using the equation before applying (3).

To obtain the theoretical distributions of each eigenvalue to compare the statistic to, we generated 10,000 observations from each theoretical TW distribution using its gamma approximation. For each theoretical TW distribution with mean variance and skewness the shape scale and constant of the matching gamma distribution is given by (Chiani 2014):(4)The distributions of the statistics for the observed genetic eigenvalues were then compared to the appropriate theoretical TW distribution for that particular eigenvalue using quantile–quantile (QQ) plots.

### MCMC estimation of G and the TW distribution

MCMC is an analytical approach that is often used to place confidence on genetic parameters by sampling from the MCMC chain and using this posterior distribution of the (co)variance components to generate confidence intervals (O’Hara *et al.* 2008; Sorensen 2008; Hadfield *et al.* 2010). The covariance components in such models are constrained to fall within the parameter space and hence the estimated **G** matrices will be positive-definite (Hazelton and Gurrin 2003; O’Hara *et al.* 2008; Hadfield 2010). Therefore, the posterior distribution cannot be used to test whether genetic variances themselves differ from zero. Consequently, a randomization of the data across the pedigree or with respect to the genetic groups, such that any estimated covariance is a consequence of sampling and not genetic variance, has been used to provide a posterior null distribution that the observed distribution is then compared to (*e.g.*, Hadfield 2010; Aguirre *et al.* 2014; McGuigan *et al.* 2015). To determine whether the eigenvalues from the posterior null distribution generated by MCMC analyses conformed to the TW distribution, we analyzed a single random data set (the first of our 10,000 simulated data sets from above) using the MCMCglmm package (Hadfield 2010) in R. We sampled from the posterior distribution of the MCMC chain 10,000 times, to obtain 10,000 **G** matrices (and consequently 10,000 samples of each eigenvalue). The analysis was repeated 10 times (for 10 different random data sets) to determine the consistency of results across the 10 data sets.

The data were analyzed in accordance with the multivariate linear model described in (1). We performed two separate analyses where the posterior chain was sampled 10,000 times in each, and where the two analyses differed by the prior distribution specified. In practice, the goal of many quantitative genetic studies is to choose a prior that is noninformative with respect to the (co)variance components, having little influence on the posterior distribution. However, choosing uninformative priors is a nontrivial task; presumably uninformative flat prior distributions have been shown to strongly influence (some) model parameters, and specifying uninformative priors at all scales of the analysis may be difficult (Van Dongen 2006; O’Hara *et al.* 2008). Here we used two priors that are often employed in quantitative genetic studies: an inverse-Wishart distribution where the phenotypic variance is partitioned equally among random effects and the degree of belief is weak (*e.g.*, Wilson *et al.* 2010), and a half-Cauchy parameter-expanded prior that may perform better when variances are close to zero (*e.g.*, Hadfield 2016). Preliminary analyses in which we used an improper prior deviated substantially from the TW distribution and often suffered from numerical issues, and therefore we do not consider this option any further.

In both cases, the priors for the location parameters were normally distributed and diffuse about a mean of 0 and a variance of For the variance components we first analyzed the data using an inverse-Wishart prior where the scale parameter was defined by a diagonal matrix containing values of one-half of the phenotypic variance, and the degree of belief was set at 5.002, slightly more than the dimensions of the matrix. Next, we used a half-Cauchy parameter-expanded prior with a scale parameter equal to For both priors the joint posterior distribution was estimated from 3,000,000 MCMC iterations sampled at 300 iteration intervals after an initial burn-in period of 300,000 iterations. Overall, model convergence diagnostics indicated that the MCMC chain sampled the parameter space adequately, and the autocorrelation between successive samples was typically much below 0.1 for analyses using both priors.

For each of 10,000 samples of the posterior distribution, we performed an eigenanalysis of **G**, obtaining 10,000 samples for each eigenvalue. We subsequently scaled the eigenvalues as above, according to Equation 3, and the statistics for the genetic eigenvalues were then compared to the appropriate gamma approximation of the theoretical TW distribution for that particular eigenvalue.

### REML estimation of G with MVN sampling and the TW distribution

Conceptually similar to the confidence intervals placed on variance component estimates from MCMC using the posterior distribution, Houle and Meyer (Meyer and Houle 2013; Houle and Meyer 2015) have recently shown how the inverse of the Fisher information matrix of covariance parameters ** from multivariate REML models can be used to generate confidence intervals on REML estimates of variance components. This approach is appealing compared to MCMC methods from both a computational perspective and because no prior specification is required. To determine whether the sampling distribution of the eigenvalues generated by REML-MVN sampling conformed to the TW distribution, we analyzed a single random data set (the first of the 10,000 simulated data sets from above), fitting an unconstrained REML analysis in accordance with (1), and then obtained 10,000 REML-MVN samples from the inverse of the Fisher information matrix **** of this analysis. Again, we repeated the analysis 10 times (for 10 different random data sets), to determine the consistency of results across data sets.**

REML estimates of the *p*(*p* + 1)/2 (co)variance components were obtained by fitting (1) with an unconstrained covariance structure that allowed negative eigenvalues. Following estimation of the (co)variance components, we directly sampled the elements of **G** 10,000 times, from the distribution ** where was the vector of parameter estimates of ****G** and ** was the inverse of the Fisher information matrix from the unconstrained analysis (Meyer and Houle 2013; Houle and Meyer 2015). This is considered sampling on the ****G**-scale of an unconstrained analysis. Alternatively, one can sample on the **L**-scale, where the elements of **L** (with vector of estimates are the Cholesky factors from a factor-analytic model. However, REML-MVN sampling requires that the Fisher information matrix be safely positive-definite. The Fisher information matrix from the analysis of these data that employed a factor-analytic covariance structure was highly ill-conditioned, and consequently did not meet this requirement. Therefore, we only sample on the **G**-scale of unconstrained analyses here.

For each of the 10,000 samples from the distribution ** we performed an eigenanalysis of ****G**, obtaining 10,000 estimates of each eigenvalue. We subsequently scaled the eigenvalues as above, according to Equation 3, and the statistics for the genetic eigenvalues were then compared to the appropriate gamma approximation of the theoretical TW distribution for that particular eigenvalue.

### Using REML-MVN to test observed genetic eigenvalues against the appropriate null

To illustrate how the REML-MVN approach can be used to test whether genetic eigenvalues significantly differ from the null distribution, we simulated two data structures that incorporated known genetic variance for each of three experimental designs that represented different levels of power. Incorporating known genetic variance into the simulation also allowed us to highlight the potential problems that arise when REML-MVN sampling is used to generate confidence intervals on eigenvalues without reference to the TW distribution.

The three experimental designs were: 50 lines with 5 individuals sampled per line (the same design presented above for random data); 500 lines with 10 individuals sampled per line; and a half-sibling design with 200 sires, 5 dams per sire, and 5 individuals per full-sibling family. These experimental designs represent, for example, 50 (500) inbred lines with 5 (10) individuals sampled per line, and a standard half-sibling breeding design used in many quantitative genetic studies.

We modified the data simulation from above so that a single genetic dimension was present in the data, or five genetic dimensions were present in the data (*i.e.*, the genetic covariance matrix was full rank). The among-line (or among-sire) covariance matrix for the data simulated to have a single genetic dimension was modeled using the factor-analytic structure:together with an unstructured dam (in the case of the half-sib design) and residual (for all cases) covariance matrix that resulted in an expected eigenvalue of of 0.56. The among-line covariance matrix for the data simulated to have five genetic dimensions was modeled using the factor-analytic structure:together with an unstructured dam (in the case of the half-sib design) and residual (for all cases) covariance matrix. The expected eigenvalues of the five genetic factors were 0.69, 0.57, 0.32, 0.08, and 0.01, respectively.

For each of the three experimental designs, we also simulated random data using the identity matrix to sample from a multivariate normal distribution with a mean of 0, giving rise to no within- or among-line covariance. This provided a baseline with which the performance of REML-MVN in conjunction with the TW could be compared for experimental designs with varying levels of power. In total there were three experimental designs, two simulated data sets with genetic variance, and one simulated random data set (six data sets with genetic variance plus three data sets without genetic variance).

To obtain the parameter estimates of the “observed” **G** for each experimental design/data set combination that had genetic variance, each of the six data sets were analyzed using model (1) with an unconstrained covariance structure. To generate their respective null distributions, the data were randomized across the lines (or sires in the case of the half-sib design), and again were analyzed using model (1) with an unconstrained covariance structure. We then sampled the (co)variance elements of the randomized **G** 10,000 times, from the distribution ** (Meyer and Houle 2013; Houle and Meyer 2015).**

The genetic eigenvalues that formed the null distributions were scaled using (3). To compare the eigenvalues resulting from the observed **G** to the null distribution, they also needed to be placed on the appropriate scale. For each observed eigenvalue, the statistic was calculated according to (3) where and were the mean and SD of the eigenvalues from the null distribution of randomized data, substituting each of the five observed genetic eigenvalues for the observed distribution with genetic variance. The sampling distributions of the simulated random data for each of the three experimental designs were generated as described above.

### Data availability

The authors state that all data necessary for confirming the conclusions presented in the article are represented fully within the article.

## Results

### REML estimation of G using unstructured and factor-analytic models and the TW distribution

For the 10,000 random data sets, the magnitudes of the leading eigenvalues of **G** were biased upward by the concentration of sampling error under both the unconstrained and constrained models. Since these data were simulated randomly with respect to genetic line, the genetic variance in each trait and consequently in any eigenvector is expected to be zero. The average eigenvalues for of 0.13 and 0.11, respectively, therefore represented spurious genetic variance that was purely a consequence of sampling error. With sample covariance matrices, it is well known that as the ratio of the number of parameters to the number of observations increases, the inflation of the leading eigenvalues will also increase (Johnstone 2001). Therefore, we expect that the magnitude of the spurious eigenvalue will increase as the ratio of *p* to the genetic degrees of freedom increases, although this remains to be explored in detail for different experimental designs.

Each of the first four eigenvalues that were estimated from the 10,000 random data sets using an unconstrained covariance structure that permitted negative eigenvalues fit the TW distribution well (Table 1). However, the last eigenvalue did show some deviation, falling below the 1:1 line on the QQ plot at the tails of the distribution (Figure 2 and Table 1). Consequently, the proportion of eigenvalues that exceeded the critical value of the theoretical distribution at was only 0.0389 (Table 1).

In contrast to the unstructured REML analyses, only the first two genetic eigenvalues from the model fitting the factor-analytic structure appear to conform to the TW distribution (Figure 2 and Table 1). The factor-analytic covariance structure constrains the genetic estimates to the parameter space, and therefore the lower bound on the eigenvalues from these models is zero. By the second eigenvalue the effect of the boundary constraint was becoming evident by the deviation of the lower tail from the 1:1 line on the QQ plot (Figure 2B). As the boundary was approached, there was a significant deviation from the TW distribution for the last three eigenvalues, evidenced by both the QQ plot (Figure 2, C–E) and the proportion of observations that exceeded the critical value (Table 1).

### MCMC estimation of G and the TW distribution

The MCMC analysis differed from the previous REML models in that the sampling variance of each eigenvalue was characterized by 10,000 samples of the posterior distributions for single data sets. The centering of the eigenvalue distribution along the *x*-axis is therefore determined by the individual data set that is used. For example, in the first of our simulated data sets, the observed lead eigenvalue estimated from an unconstrained REML analysis of that data set was 0.1084. Therefore it would be expected that the 10,000 samples of the posterior distribution from the MCMC analysis of this data set would center on 0.1084. However, the MCMC analyses returned posterior distributions of eigenvalues that substantially differed in mode depending on the prior that was used. For the inverse-Wishart prior, the posterior distribution of the leading eigenvalue was centered on 0.2862, well above the parameter estimate from REML (Table 1). In contrast, the parameter-expanded prior returned a posterior distribution that was centered well below the REML parameter estimate at 0.0604 (Table 1).

The posterior distributions of the eigenvalues from the first data set that was analyzed deviated substantially from the TW distribution for both priors that were used, with a larger proportion of eigenvalues exceeding the critical value than expected by chance in most cases (Table 1). This deviation from the theoretical distribution was consistent across all 10 of the random data sets that were analyzed (Figure 3, A and B).

The posterior distributions of variances from MCMC models are constrained to fall within the parameter space, as they are for factor-analytic REML models, and therefore both the boundary constraint and the concentration of sampling variance in leading eigenvalues may result in their overestimation and consequent deviation from the theoretical distribution. However, considering that the first eigenvalue from MCMC models already failed to conform well to the TW distribution, despite being well away from the boundary with *p* = 5 traits, the boundary constraint may not be the only contributing factor to the deviation from the TW distribution in this case (Figure 3, A and B and Table 1).

### REML estimation of G with MVN sampling and the TW distribution

The REML-MVN sampling was carried out using the same 10 data sets analyzed in the MCMC models above. Again, for the first of the simulated data sets, the mean magnitude of the lead eigenvalue observed from the REML-MVN sampling approach was higher than the parameter estimate of 0.1084, with an average eigenvalue of 0.15 (Table 1). Despite the difference in mean for the lead eigenvalue the sampling distribution generated by the REML-MVN approach conformed well to the TW distribution and in a consistent manner across the 10 data sets (Figure 4A), with the proportion of observations for the first data set that exceeded the critical value at 0.0469. Similarly, the following two eigenvalues and also conformed well to the theoretical distribution (Table 1), at least as well as the eigenvalues estimated from 10,000 random data sets using an unstructured covariance structure. However, the last eigenvalue deviated from the distribution, falling below the 1:1 line on the QQ plot at the tails of the distribution (Table 1) in a consistent manner across the 10 data sets (Figure 4B). This bias of the last eigenvalue was similar in magnitude to the bias observed from the analysis of 10,000 random data sets using an unstructured model (Table 1).

### Using REML-MVN to test observed genetic eigenvalues against the appropriate null

It has recently been suggested that the confidence intervals on genetic variances that are generated using REML-MVN can be used to determine whether eigenvalues contain nonzero levels of genetic variance by comparing the parameter estimate to the number of SDs above zero (Houle and Meyer 2015). For the REML-MVN analysis of the first random data set, the observed mean of the leading eigenvalue was 3.5-times larger than its SD and above zero (Table 1), raising the possibility that sampling error could be misinterpreted as genetic variance in this dimension. Relying solely on the distribution of sampling error around an eigenvalue generated by REML-MVN to infer genetic variance in real data will therefore likely be insufficient to demonstrate the presence of genetic variance. However, the centering and scaling parameters of the TW distributions specific to each eigenvalue can be used to form the appropriate null distribution against which the observed genetic eigenvalues can be tested.

We simulated data to have either one genetic dimension or five genetic dimensions for each of three different experimental designs (50 lines, 500 lines, 200 sires) representing different levels of experimental power, to demonstrate how significance testing using REML-MVN sampling combined with the TW distribution performs in these situations. As expected, as the ratio of *p* to the genetic degrees of freedom decreased, the inflation of the leading eigenvalues of random data simulated with no genetic variance decreased (Table 2). However, for all three experimental designs, the lower confidence interval for the first two eigenvalues of random data were above zero, despite the lack of simulated genetic variance in these eigenvectors. This potentially indicates the presence of significant genetic variance when the confidence intervals are interpreted in isolation. When empirically rescaled to the TW distribution, however, none of the five eigenvalues were deemed significant by comparison of their statistics to the critical values of the TW distributions specific to each eigenvalue (Table 2), highlighting the value of the TW approach to test for the significance of observed genetic eigenvalues.

For the data simulated to have a single genetic eigenvalue, the lower confidence interval of the sampling distribution generated by REML-MVN for was far above zero at 1.2080, 1.6788, and 1.4224 for the 50 line, 200 sire, and 500 line cases, respectively (Table 2). Therefore, the presence of this genetic dimension was correctly identified by examining the confidence interval for all three experimental designs in this case. However, as observed for the random data, the lower confidence intervals of were also above zero, incorrectly identifying the presence of a second genetic dimension (Table 2). To illustrate this behavior, consider the example of the 500 line experimental design. For both and the mean unscaled eigenvalues were both above zero, and the 95% confidence intervals did not overlap zero (Figure 5, A and C), suggesting the presence of significant genetic variance. However, by empirically rescaling to the TW distribution, was demonstrated to be the only significant genetic eigenvalue (Figure 5, B and D). Here the appropriate comparison was between the observed genetic eigenvalue in real data and the null distribution generated by REML-MVN sampling of randomized data, after both the observed eigenvalue and null distribution were centered and scaled according to the theoretical TW distribution specific to that eigenvalue. The observed genetic eigenvalue fell well outside the null distribution (Figure 5B), and above the critical value of 0.987 (Table 1), indicating significant genetic variance in this genetic dimension. In contrast, the observed eigenvalue fell well within the null distribution (Figure 5D) and below the critical value of −1.5411, indicating a lack of significant genetic variance in this dimension.

Currently, the best method to test whether eigenvalues are significantly different from zero is by comparing the likelihoods of a series of nested reduced-rank, factor-analytic models. Here, if reducing the rank of the model from *k* to *k*−1 dimensions significantly reduces the fit, then the *k*th eigenvalue is said to be significant (Hine and Blows 2006). For the 50 line, 200 sire, and 500 line simulated data sets with a single genetic eigenvalue, factor-analytic modeling identified a single genetic dimension in each (Table 2), consistent with the results of the TW approach presented above. Therefore, the TW approach performed as well as factor-analytic modeling in this case. For the data simulated to be full rank, factor-analytic modeling identified the presence of the first four genetic factors (Table 2). The lack of statistical support for was not surprising considering the low level of genetic variance accounted for by this factor and that factor-analytic models tend to be conservative. In contrast, the TW testing approach correctly identified the presence of all five genetic eigenvalues (Table 2). However, the statistical significance of should be interpreted with some caution. We previously demonstrated using REML-MVN sampling of random data (50 line experimental design) that the sampling distribution of the last eigenvalue did not conform well to the TW distribution (Figure 4B), with fewer samples exceeding the critical value than expected by chance (Table 1). This may result in anticonservative significance tests for the last eigenvalue of variance component matrices.

## Discussion

Sampling error generates patterns in the empirical spectral distribution of variance component matrices that can look strikingly like the biological patterns that are interpreted to represent genetic covariance, which concentrates genetic variance into fewer multivariate dimensions than the number of traits measured (Blows and McGuigan 2015). As a consequence of the magnitude of sampling error in the leading eigenvalues of **G**, particular care needs to be taken to determine if a genetic eigenvalue is greater in magnitude than expected by sampling error alone. RMT provides a generally applicable framework for testing whether eigenvalues of sample covariance matrices represent significant levels of variance (Tracy and Widom 1996, 2009; Johnstone 2001). Here we demonstrate that RMT can similarly be applied to the outcomes of multivariate genetic analyses that take the form of variance component matrices, to test whether leading genetic eigenvalues represent significant levels of genetic variance.

### Sampling error in genetic eigenvalues of random data

In general, the genetic eigenvalues from unconstrained REML-based analyses of random data conformed well to the TW distribution (Figure 2), with the small number of lines (*n* = 50) and traits (*p* = 5). While the proportion of leading eigenvalues that exceeded the critical value at was >0.05 at 0.0568; as the number of genetic degrees of freedom and *p* increase, the fit to the theoretical distribution is predicted to further improve. This is illustrated, in part, by the better fit of the sampling error of to the theoretical distribution from the analysis of 200 random data sets of *p* = 10 traits, where the proportion of eigenvalues exceeding the critical value was 0.05 (Blows and McGuigan 2015). While sampling error is known to concentrate genetic variance in the leading eigenvalues of **G**, particularly with the small sample sizes that are used here, the corollary is that the trailing eigenvalues must be underestimated. In contrast to the leading eigenvalues, the last eigenvalue did show some deviation from the theoretical TW distribution (Table 1), with a smaller proportion of eigenvalues exceeding the critical value than expected by chance. The behavior of the last eigenvalue of sample covariance matrices may be better described by a reflected TW distribution (Ma 2012). Whether this may also be the case for variance component matrices such as **G** remains to be determined, and will be an important component of future studies.

In contrast to the good fit to the TW distribution of the sampling error in eigenvalues from the unconstrained models, the sampling distribution of eigenvalues generated by factor-analytic models failed to conform to the TW distribution in general (Table 1). Currently, factor-analytic modeling is the best approach for testing whether eigenvalues significantly differ from zero, and is a useful tool in many cases that provides biologically sensible estimates of **G**. In this approach, a series of nested reduced-rank models are fit, and a likelihood ratio test is then used to determine whether reducing the rank significantly decreases the fit of the model (Hine and Blows 2006). Our application of factor-analytic modeling to generate a null sampling distribution differed in that we fit full-rank factor-analytic models to random data that had no genetic signal, and consequently these models were severely overparameterized. Because factor-analytic modeling imposes a boundary constraint such that eigenvalues must be greater than or equal to zero, to be compatible with RMT, we fit full-rank models to avoid the boundary constraint as much as possible. But despite this, the sampling variance of only the first eigenvalue was unaffected by the boundary constraint, and the models also suffered from convergence problems in some cases, due to the overparameterization (9618 of 10,000 models converged). Consequently, the sampling distribution of the eigenvalues of **G** resulting from factor-analytic models depart from the theoretical distributions characterized by RMT, and therefore, will not be compatible with the approach developed here in many cases.

Similar to factor-analytic models, the sampling distribution of eigenvalues generated by the posterior distributions of MCMC analyses are also subject to a boundary condition where **G** is positive semidefinite (Hazelton and Gurrin 2003; O’Hara *et al.* 2008; Hadfield 2010). Consequently, determining whether eigenvalues account for significant levels of genetic variance in the MCMC framework requires more complicated tests based on the overlap between a null and observed posterior distribution for a given eigenvalue (Schmid and Schmidt 2006). If the sampling distribution of genetic eigenvalues conforms to the TW distribution, then this would provide a simple empirical framework for testing the significance of these eigenvalues, circumventing the need for more complicated tests. However, we were unable to demonstrate that the posterior distributions of eigenvalues generated MCMC adequately conform to the TW distributions, surprisingly, even for the leading eigenvalue of random data. While the boundary conditions set under MCMC are likely to contribute to this lack of fit, there may also be other factors involved. For example, although the autocorrelation between successive samples of the posterior distributions of our MCMC models was less than the accepted rule of thumb of 0.01 in all cases, the autocorrelation will only approach 0 as the thinning interval approaches infinity. Preliminary analyses where the sampling distribution of eigenvalues is generated by a distribution of posterior modes from each of the posterior distributions of the MCMC analyses of 1000 random data sets (where the autocorrelation between modes is 0) conform much better to the TW distribution. Therefore, any level of autocorrelation may be contributing to the lack of fit. It will be important to determine in future work if this could be a contributing factor, and if relaxing the requirement for positive variance components under MCMC (Hazelton and Gurrin 2003) might be employed to allow the TW distribution to be used.

The confidence intervals generated by the posterior distributions of MCMC analyses share a conceptual similarity with the sampling error that can be produced with REML-MVN sampling, but one advantage of REML-MVN is that it does not require positive semidefinite covariance matrices. REML-MVN sampling shows particular promise as an easily implementable approach to a difficult empirical problem (Meyer and Houle 2013; Houle and Meyer 2015). It circumvents the issue of having to run thousands of models to generate sampling distributions by sampling from the inverse of the Fisher Information matrix that is obtained from the analysis of a single random (or randomized) data set. Here we were able to demonstrate that the sampling distributions of the leading eigenvalues generated by REML-MVN sampling conformed well to the theoretical TW distributions (Table 1) and consistently across different random data sets (Figure 4A). While the last eigenvalue did show a consistent bias from the TW distribution (Figure 4B), this deviation was not substantially worse than that observed for from the unconstrained REML analysis of 10,000 random data sets (Figure 2E). Therefore, REML-MVN sampling appears to be as compatible with the TW approach as the sampling variance of eigenvalues generated by the analysis of thousands of data sets, and accordingly REML-MVN can therefore be used to generate appropriate null distributions against which observed genetic eigenvalues can be tested.

### Sampling error in genetic eigenvalues compared to the genetic variance in trait combinations

It is important to recognize a key distinction in the behavior of sampling error in genetic eigenvalues compared to any particular trait combination. Consider the leading eigenvalue of a given observed **G** matrix, which represents the genetic variance in the trait combination As a consequence of the concentration of sampling error in the leading eigenvalue, the magnitude of genetic variance in is a combination of that sampling error and the true level of genetic variance. The appropriate null distribution for the eigenvalue of is therefore a distribution of leading eigenvalues (as shown above) that all experience the same level of sampling error as the observed eigenvalue. Counterintuitively, this means that the genetic variance in the trait combination is tested against the sampling error in random directions in trait space, as in each randomization of a data set the direction of greatest variance will be essentially unique.

From a biological perspective, it may tempting to project the observed trait vector back through a set of **G** matrices that result from a randomization process, so that the variance in the same trait combination forms the basis of the null distribution. Although this seems to make sense biologically, it will underestimate the amount of sampling error in the observed genetic eigenvalue (Figure 6). This is because the trait combination in each realization of a randomization will not capture the concentration of sampling error that resides in the leading eigenvalue.

### Testing observed genetic eigenvalues using the TW distribution

The importance of appropriately accounting for sampling error in genetic eigenvalues using the TW distribution is best illustrated by the application of REML-MVN sampling in experimental designs with simulated genetic variance. In our simulated data sets with one genetic dimension, although only a single genetic eigenvalue was simulated in the data, using the 95% confidence intervals from REML-MVN sampling would have resulted in the conclusion that a second genetic dimension existed for all three of the experimental designs that were studied. Interpreting the confidence intervals in isolation may therefore result in erroneous conclusions regarding the presence of genetic variance (Houle and Meyer 2015; Kingsolver *et al.* 2015; Wolak and Reid 2016). The overlap of the lower 95% confidence interval with zero will be solely dependent on the power of the experiment: higher (lower) power will result in more (less) spurious genetic eigenvalues detected. Without rescaling these sampling distributions to the TW distribution and using the appropriate critical value of the TW statistic, they do not adequately control for the accumulation of sampling error in the genetic eigenvalues.

In the data simulated to be full rank, factor-analytic modeling identified four genetic dimensions that accounted for nonzero levels of variance (Table 2). This is unsurprising given the low expected value of the last eigenvalue of 0.01, and that factor-analytic models are known to be conservative. The TW testing approach, on the other hand, identified all five genetic factors as accounting for significant levels of variance (Table 2). However, the last eigenvalue may not conform to the TW distribution particularly well. The consequence of the downward bias at the tails of the distribution that was observed (Figure 4B) results in an anticonservative hypothesis test when relying on the critical value of the TW distribution. Therefore, the significance of this last genetic dimension identified by TW testing should be interpreted with some caution. At the moment, the TW testing approach will therefore be most appropriate for determining whether leading eigenvalues account for a significant portion of genetic variance.

A particularly promising avenue for combining REML-MVN with a TW testing approach may be for very high-dimensional genetic analyses where the convergence of REML would be very unlikely, and where, for example, MCMC is the only viable current approach (Runcie and Mukherjee 2013). If a large number of bivariate genetic analyses are combined to estimate a **G** matrix (Blows *et al.* 2015), and if the eigenvalues of such **G** conform to the TW distribution, it would provide a way to determine the presence of genetic variance in high-dimensional situations without the need for the convergence of high-dimensional models that is currently required.

## Acknowledgments

We would like to thank Iain Johnstone, David Houle and Bruce Walsh for helpful comments on the manuscript.

## Footnotes

*Communicating editor: E. A. Stone*

- Received December 4, 2016.
- Accepted April 17, 2017.

- Copyright © 2017 by the Genetics Society of America