Genetics, Vol. 149, 1605-1614, July 1998, Copyright © 1998

On the Sampling Variance of Intraclass Correlations and Genetic Correlations

Peter M. Visscher
University of Edinburgh, Institute of Ecology and Resource Management, Edinburgh EH9 3JG, Scotland

Corresponding author: Peter M. Visscher, University of Edinburgh, Institute of Ecology and Resource Management, West Mains Rd., Edinburgh EH9 3JG, Scotland. E-mail: peter.visscher@ed.ac.uk

Communicating editor: R. G. SHAW


ABSTRACT

Widely used standard expressions for the sampling variance of intraclass correlations and genetic correlation coefficients were reviewed for small and large sample sizes. For the sampling variance of the intraclass correlation, it was shown by simulation that the commonly used expression, derived using a first-order Taylor series, performs better than alternative expressions found in the literature when the between-sire degrees of freedom are small. The expressions for the sampling variance of the genetic correlation are significantly biased for small sample sizes, in particular when the population values, or their estimates, are close to zero. It was shown, both analytically and by simulation, that this is because the estimate of the sampling variance becomes very large in these cases due to very small values of the denominator of the expressions. It was concluded, therefore, that for small samples, estimates of the heritabilities and genetic correlations should not be used in the expressions for the sampling variance of the genetic correlation. It was shown analytically that in cases where the population values of the heritabilities are known, using the estimated heritabilities rather than their true values to estimate the genetic correlation results in a lower sampling variance for the genetic correlation. Therefore, for large samples, estimates of heritabilities, and not their true values, should be used.


THERE are three classic papers from the 1950s on the sampling variances of estimates of genetic correlations: REEVE (1955), who derived expressions of the sampling variance for parent-offspring designs; ROBERTSON (1959a), who derived general expressions for balanced one-way ANOVA designs with equal population values for heritabilities; and TALLIS (1959), who derived general expressions for balanced and unbalanced one-way designs. Although the latter article is, in a sense, the most general, it is not referred to frequently (it does not help that the reference added in proof to the ROBERTSON article points to the wrong journal).

It could be argued that the expressions derived in those articles are no longer relevant, since estimation techniques have moved on from least-squares methods to likelihood-based methods [mainly residual maximum likelihood (REML); PATTERSON and THOMPSON 1971]. Using likelihood methods, sampling variances can be approximated from likelihood profiles (MEYER and HILL 1991). However, numerous publications, particularly in the evolutionary genetics literature, still use the expressions derived by REEVE, ROBERTSON, and TALLIS. There appears to be some confusion about the use of expressions for the sampling variance of genetic correlations. For example, KOOTS and GIBSON (1996), who performed a meta-analysis of an impressive number (1500) of estimates of heritabilities and genetic correlations for beef cattle traits, argued that using the estimates of heritabilities and genetic correlations in the expressions of REEVE (1955) and ROBERTSON (1959a), rather than their true (population) values, gave predictions that appeared to be closer to the observed empirical sampling variance of the genetic correlation coefficient. This was corroborated by a small simulation study. Furthermore, these authors implied that the equations derived in REEVE (1955) and ROBERTSON (1959a) were inaccurate, because the empirical sampling variance of estimated genetic correlation coefficients was much larger than the expected sampling variance, using both their data and simulations.

Three areas of confusion can be identified:

  1. Should the parameters in the expressions for the sampling variance of the genetic correlation coefficient in REEVE (1955), ROBERTSON (1959a), and TALLIS (1959) be the population parameters or the estimates of those parameters?

  2. How good are the expressions derived in the trio of articles?

  3. What is the impact of estimates that are outside the parameter space, i.e., negative heritability estimates and/or estimates of genetic correlations <-1 or >+1?

In this article I review the main expressions of REEVE, ROBERTSON, and TALLIS and clarify under what circumstances the expressions should be used. I also review and evaluate the equations for the sampling variance of intraclass correlations as a paradigm, since these are central to the understanding of the assumptions and methods.


MATERIALS AND METHODS

To answer the above questions, we first look at the derivation of the sampling variance of intraclass correlation coefficients because this serves as an appropriate paradigm for the sampling variance of the genetic correlation coefficient. Subsequently, expressions for the sampling variance of the genetic correlation are reviewed and evaluated.

Sampling variance of intraclass correlation coefficients
Population parameters known: Consider a simple, balanced one-way design with s sires and n progeny per sire, and assume that parameters are estimated using least-squares methods, e.g., ANOVA. Observations are assumed to be normally distributed. The expectations and variances of the between- and within-sire mean squares (MS) are

$$E(B) = \sigma_p^2[1 + (n - 1)t], \qquad \mathrm{var}(B) = \frac{2[E(B)]^2}{s - 1},$$

$$E(W) = \sigma_p^2(1 - t), \qquad \mathrm{var}(W) = \frac{2[E(W)]^2}{s(n - 1)},$$

with B and W the between- and within-sire MS, t the population intraclass correlation, and $\sigma_p$ the phenotypic standard deviation. The variance of the least-squares estimate of the intraclass correlation,

$$\hat{t} = \frac{B - W}{B + (n - 1)W},$$

can be approximated using a first-order Taylor-series expansion about the expected mean squares,

$$\mathrm{var}(\hat{t}) \approx \frac{2(sn - 1)(1 - t)^2[1 + (n - 1)t]^2}{n^2 s(n - 1)(s - 1)} \approx \frac{2(1 - t)^2[1 + (n - 1)t]^2}{n(n - 1)(s - 1)}. \tag{1}$$

Equation 1 is well known (ROBERTSON 1959b; FALCONER and MACKAY 1996). Its derivation, using means and variances of mean squares, appears to have been first given by OSBORNE and PATERSON (1952).

For large n, Equation 1 reduces to

$$\mathrm{var}(\hat{t}) \approx \frac{2t^2(1 - t)^2}{s - 1}. \tag{2}$$
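As a rough numerical companion to Equations 1 and 2, the sketch below evaluates the first-order approximation in its familiar textbook form, var(t) = 2(1 - t)^2 [1 + (n - 1)t]^2 / [n(n - 1)(s - 1)], and its large-n limit. The function names are hypothetical, and the conversion h2 = 4t assumes a paternal half-sib design.

```python
import math


def var_icc(t, s, n):
    """First-order sampling variance of the ANOVA intraclass correlation
    (textbook form, assumed here to correspond to Equation 1)."""
    return 2.0 * (1.0 - t) ** 2 * (1.0 + (n - 1) * t) ** 2 / (n * (n - 1) * (s - 1))


def var_icc_large_n(t, s):
    """Limit of var_icc as n grows (assumed to correspond to Equation 2)."""
    return 2.0 * t ** 2 * (1.0 - t) ** 2 / (s - 1)


def se_h2_halfsib(h2, s, n):
    """Standard error of the heritability estimate, taking h2 = 4t."""
    return 4.0 * math.sqrt(var_icc(h2 / 4.0, s, n))


if __name__ == "__main__":
    # Design quoted later in the text: h2 = 0.96, s = 4 sires, n = 1000 progeny.
    print(f"SE(h2) = {se_h2_halfsib(0.96, s=4, n=1000):.4f}")
```

For h2 = 0.96, s = 4, and n = 1000 this lands within rounding of the predicted standard error of 0.5978 quoted in the Discussion.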

A number of expressions similar to Equation 1, which all differ in the terms relating to the number of sires and progeny per sire, can be found in the literature. They differ in particular in the degrees of freedom relating to the between-sire component of variance. The general expression is

(3)
with

where ci is a function of s and n, from reference i. The original expression was derived in a classic paper by FISHER (1921),

(4)

ROBERTSON and LERNER (1949) quoted Fisher, but used

(5)

Finally, following ZERBE and GOLDGAR (1980), an expression can be derived using

Hence,

(6)
with

(6)

Although this expression is trivial to derive, to my knowledge this is the first time that it has been documented following a first-order approximation using the F ratio.

All the above expressions for ci reduce to the one most commonly used (i.e., c0 = ) for large s and n. However, even for large n, there are discrepancies between the formulas depending on the use of s, (s - 1), or (s - 2) in the denominator. It is likely that the form used by ROBERTSON and LERNER is incorrect and probably stems from the use of FISHER's z-transformation. FISHER showed that

with

Hence,

However, as pointed out by OSBORNE and PATERSON (1952), this is a roundabout way of deriving the sampling variance of the estimated intraclass correlation, by first transforming to z and then back again.

Population values unknown: Except when doing power calculations and/or investigating the design of experiments (ROBERTSON 1959b) or simulation studies, we do not know the population values and hence do not know the exact or approximate sampling variance of the intraclass correlation. The standard practice is to use the formulae derived in the previous section and to substitute $\hat{t}$ for t. This is essentially based upon the assumption that E($\hat{t}$) = t. However, it should be obvious that if, by chance, the estimate of the intraclass correlation is too high (or too low), the resulting estimate of the sampling variance will be biased. Using

(7)
gives a maximum estimate of the sampling variance at $\hat{t}$ = (n - 2)/[2(n - 1)] (see also TAYLOR 1976). The minimum estimate of the sampling variance is found for $\hat{t}$ = 1 or $\hat{t}$ = -1/(n - 1). Only when there is a scale on which the sampling variance is (nearly) independent of the population values, for example, FISHER's z-scale, will the estimate of the sampling variance be correct.
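The dependence of the estimated sampling variance on the estimate itself can be checked numerically. The sketch below assumes the textbook form of the first-order approximation, with the estimate substituted for t, and searches a grid over the admissible range of the estimate. The design (s = 20, n = 10) is an arbitrary illustration, not a case from the paper.

```python
import numpy as np


def estimated_var_icc(t_hat, s, n):
    """Estimated sampling variance with the estimate substituted for t
    (textbook first-order form; an assumption of this sketch)."""
    return (2.0 * (1.0 - t_hat) ** 2 * (1.0 + (n - 1) * t_hat) ** 2
            / (n * (n - 1) * (s - 1)))


s, n = 20, 10                                  # illustrative design only
lower, upper = -1.0 / (n - 1), 1.0             # range of the ANOVA estimate
grid = np.linspace(lower, upper, 200_001)
values = estimated_var_icc(grid, s, n)

t_at_max = grid[np.argmax(values)]             # numerical location of the maximum
t_stationary = (n - 2) / (2.0 * (n - 1))       # root of the derivative in t_hat

print(f"grid maximum at t_hat = {t_at_max:.4f}; analytical value = {t_stationary:.4f}")
print(f"variance at the two bounds: {values[0]:.2e}, {values[-1]:.2e}")
```

The estimated variance vanishes at both bounds and peaks in between, so an estimate that is too high or too low by chance changes the reported standard error.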

KEMPTHORNE (1957) argued that an adjustment to the degrees of freedom should be made when estimating the sampling variance of an estimate of the intraclass correlation, since

where d.f. is degrees of freedom. This indicates an unbiased estimate of the sampling variance as

(8)

Equation 8 reduces to the standard equation of OSBORNE and PATERSON (1952) and ROBERTSON (1959b) for large sn.

Simulation study: Simulations were performed to compare the empirical standard deviation of heritability estimates, the predicted standard deviation using the true population values (Equation 1), and the average estimated standard deviation (using Equation 1, Equation 4, and Equation 8, with $\hat{t}$ substituted for t). Independent between-sire and within-sire sums of squares were sampled from central $\chi^2$ distributions with (s - 1) and [s(n - 1)] degrees of freedom, respectively, and then scaled to the appropriate mean squares using the population values of t and $\sigma_p$ and the values of s and n. Without loss of generality, a phenotypic standard deviation of unity was used throughout.

Since the various prediction equations for the sampling variance differ only in functions of s and n, only results for the Taylor series (OSBORNE and PATERSON 1952) are presented. The other equations for the sampling variance differ approximately by factors of (s - 1)/s (using FISHER's formula) and (s - 1)/(s + 1) (using KEMPTHORNE 1957).
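The simulation scheme just described can be sketched in a few lines. This is a minimal illustration, not the program used for Table 1: the seed, the design (s = 100, n = 10, h2 = 0.25), the number of replicates, and the function names are all assumptions, and the prediction uses the textbook form of the first-order approximation.

```python
import numpy as np


def predicted_se_h2(t, s, n):
    """First-order prediction of SE(h2) for paternal half sibs (h2 = 4t)."""
    var_t = 2.0 * (1.0 - t) ** 2 * (1.0 + (n - 1) * t) ** 2 / (n * (n - 1) * (s - 1))
    return 4.0 * np.sqrt(var_t)


def simulate_icc(t, s, n, reps, rng):
    """Return `reps` ANOVA estimates of t for a balanced design with sigma_p = 1."""
    df_b, df_w = s - 1, s * (n - 1)
    # Mean squares are scaled central chi-square variates.
    B = (1.0 + (n - 1) * t) * rng.chisquare(df_b, reps) / df_b
    W = (1.0 - t) * rng.chisquare(df_w, reps) / df_w
    return (B - W) / (B + (n - 1) * W)


rng = np.random.default_rng(1998)          # arbitrary seed
t, s, n = 0.0625, 100, 10                  # h2 = 0.25; illustrative values only
t_hat = simulate_icc(t, s, n, reps=100_000, rng=rng)

observed = 4.0 * t_hat.std(ddof=1)                 # O: empirical SE of h2 estimates
estimated = predicted_se_h2(t_hat, s, n).mean()    # E: average estimated SE
predicted = predicted_se_h2(t, s, n)               # P: prediction from the true t
print(f"O = {observed:.4f}, E = {estimated:.4f}, P = {predicted:.4f}")
```

Sampling the mean squares directly, rather than individual records, keeps each replicate cheap even when n is large.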

Sampling variance of genetic correlation coefficient
Expressions from literature: TALLIS (1959) derived a general expression for the approximate sampling variance of the estimated genetic correlation coefficient for a balanced half-sib design. Population values for the intraclass correlations of the two traits are t1 and t2, and the genetic and within-sire correlation coefficients are rg and rw, respectively. The general form of TALLIS' expression is

for

(9)
and

(10)
with

(11)
Ri = nti/[1 + (n - 1)ti], which is the general expression of the reliability of a progeny test based upon n progeny and a heritability of 4ti. TALLIS (1959) used a different (but equivalent) expression for P and Q, but we find it more convenient to write the terms as presented here.

Special cases:

This is the scenario of ROBERTSON (1959a), in which t1 = t2 = t. The terms P and Q simplify to

(12)

(13)
with

These correspond to the equations given by ROBERTSON (1959a, p. 473), although his P and Q were scaled by a factor of (nt)^2.


A further simplification is if the genetic and within-sire correlation are the same. The expression for the sampling variance may be written as

(14)
which corresponds to ROBERTSON's formula on page 474. In general, when t1 ≠ t2, there is no simple form for the sampling variance of the genetic correlation coefficient.

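Finally, ROBERTSON (1959a) suggested a very simple and general expression for the sampling variance of the genetic correlation coefficient by observing the similarity between expressions derived for special cases,

$$\mathrm{var}(\hat{r}_g) \approx \frac{(1 - r_g^2)^2}{2}\,\frac{\sigma(\hat{h}_1^2)\,\sigma(\hat{h}_2^2)}{h_1^2 h_2^2}, \tag{15}$$

with $\sigma(\hat{h}_i^2)$ the standard error of the heritability estimate for trait i.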

This was the equation used by KOOTS and GIBSON (1996).
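ROBERTSON's simple approximation is usually quoted in textbooks as var(rg) = [(1 - rg^2)^2 / 2] x SE(h1^2) SE(h2^2) / (h1^2 h2^2). The sketch below assumes that this is the form intended by Equation 15. The function name and the numbers are hypothetical and serve only to show how the result grows as the heritabilities in the denominator shrink.

```python
import math


def se_rg_robertson(rg, h2_1, h2_2, se_h2_1, se_h2_2):
    """Approximate SE of a genetic correlation from the two heritabilities and
    their standard errors (textbook form of ROBERTSON's approximation)."""
    if h2_1 <= 0.0 or h2_2 <= 0.0:
        # The approximation is undefined for non-positive heritabilities.
        raise ValueError("heritabilities must be positive")
    var_rg = (1.0 - rg ** 2) ** 2 / 2.0 * (se_h2_1 * se_h2_2) / (h2_1 * h2_2)
    return math.sqrt(var_rg)


if __name__ == "__main__":
    # Same standard errors throughout, decreasing heritabilities.
    for h2 in (0.50, 0.10, 0.02):
        print(h2, round(se_rg_robertson(0.0, h2, h2, 0.10, 0.10), 3))
```

With the standard errors held fixed, dividing both heritabilities by five multiplies the approximate standard error of the genetic correlation by five.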


For a large number of progeny per sire, TALLIS' equation reduces to a very simple form,

$$\mathrm{var}(\hat{r}_g) \approx \frac{(1 - r_g^2)^2}{s - 1}. \tag{16}$$

This equation is equivalent to the approximation of the sampling variance of a correlation coefficient in the bivariate normal case with (s - 1) degrees of freedom.

There are difficulties in using the approaches for the sampling variance of the intraclass correlation to determine the sampling variance of estimates of the genetic correlation coefficient: (1) The estimate of rg is unbounded in principle, so that large positive and negative values (outside the range -1 to +1) are possible, and (2) the true heritability, or its estimate, appears in both the numerator and the denominator of the equation for the sampling variance of rg (see Equations 9-11). This means that in the vicinity of true or estimated h2 being zero, the estimate of the sampling variance can become very large because of a division by a small number. Also, if one or both of the estimated heritabilities is <0, the estimate of the genetic correlation coefficient is an imaginary number (REEVE 1955). Simulation results are less meaningful in these cases, because the estimate of the sampling variance may not have converged (or never will). For example, in the simulation study of parent-offspring regression in KOOTS and GIBSON (1996), with true population parameters of $h_1^2 = h_2^2 = 0.10$ and rg = 0, the estimates of rg varied from -2.5 to +4.18, and the empirical variance of estimated genetic correlation coefficients was very large when both estimates of the heritability were close to zero.

Heritabilities known: KOOTS and GIBSON (1996) argued that in some (rare) cases, the population heritabilities may be known when the genetic correlation is estimated, for example, through prior information or a meta-analysis of relevant literature values. In that case, one could proceed with only estimating the between- and within-sire covariances, along with the phenotypic variances, from the data. If the phenotypic variances are assumed to be estimated accurately, i.e., for large sn, then

with B12 and W12 the between-sire mean crossproduct and within-sire mean crossproduct, respectively, and ti and $\sigma_{p_i}^2$ the known intraclass correlations and phenotypic variances for trait i. The means and variances of the crossproducts, using Wii and Bii to denote the within and between-sire MS for trait i, are

Using

(17)
the sampling variance of the genetic correlation coefficient can be calculated by substituting the expressions for E(B12), E(W12), var(B12), and var(W12) into Equation 17. In particular, for t1 = t2 = t,

(18)
and

(19)

The values of P and Q are larger when the heritabilities are assumed known; i.e., the sampling variance of the genetic correlation is larger when heritabilities are assumed known (cf. Equation 12 and Equation 13). This is most clearly seen when the number of progeny is very large, because then

(20)
which is always larger than the expression derived for unknown heritabilities (Equation 16). These findings are in agreement with KOOTS and GIBSON (1996), who argued that the estimated heritabilities, and not the population values (if known), should be used to estimate the sampling variance of the genetic correlation coefficient. The results arise because of the positive covariance between the estimated between-sire covariance (from B12) and between-sire variances (from B11 and B22). Hence, the absolute value of the genetic correlation coefficient and the estimates of the heritabilities are positively correlated (unless rg = 0), which is taken into account by using the estimated rather than true values of the heritabilities in the expressions to estimate the sampling variance of the genetic correlation.

Simulations: A simulation study was performed in which the empirical variance of the estimated rg and the average estimated sampling variance using the estimates of heritabilities and correlations were compared to the predicted sampling variance (using equations from TALLIS 1959; see above). Independent 2 x 2 matrices of within-sire mean squares and mean crossproducts (W11, W12, and W22) and between-sire mean squares and crossproducts (B11, B12, and B22) were sampled from a central Wishart distribution with [s(n - 1)] and (s - 1) degrees of freedom, respectively (see VISSCHER 1995 for more details). The resulting matrices of mean squares and crossproducts were used to calculate least-squares estimates of the two heritabilities and the genetic and within-sire correlation coefficients in the standard way (see, for example, TALLIS 1959). For estimated sire variances that were positive, estimates of genetic correlations could be <-1 or >1. A least-squares estimate of the genetic correlation coefficient was not possible when one or both of the heritability estimates were negative.
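The Wishart sampling step can be sketched as follows. This is an illustration under stated assumptions rather than the original program: the 2 x 2 Wishart matrices are drawn with the Bartlett decomposition, phenotypic variances are set to one, rw is taken to be the correlation of within-sire deviations, and the function names, seed, and replicate count are hypothetical. Replicates with a non-positive estimated sire variance are simply dropped.

```python
import numpy as np


def sample_wishart_2x2(sigma, df, reps, rng):
    """Draw `reps` central Wishart(sigma, df) matrices (Bartlett decomposition)."""
    L = np.linalg.cholesky(sigma)
    A = np.zeros((reps, 2, 2))
    A[:, 0, 0] = np.sqrt(rng.chisquare(df, reps))
    A[:, 1, 1] = np.sqrt(rng.chisquare(df - 1, reps))
    A[:, 1, 0] = rng.standard_normal(reps)
    M = L @ A                                  # broadcasts over replicates
    return M @ np.swapaxes(M, 1, 2)


def simulate_rg(t1, t2, rg, rw, s, n, reps, rng):
    """Least-squares estimates of rg for a balanced half-sib design."""
    cov_s = rg * np.sqrt(t1 * t2)
    cov_w = rw * np.sqrt((1.0 - t1) * (1.0 - t2))
    sire = np.array([[t1, cov_s], [cov_s, t2]])
    within = np.array([[1.0 - t1, cov_w], [cov_w, 1.0 - t2]])
    df_b, df_w = s - 1, s * (n - 1)
    B = sample_wishart_2x2(within + n * sire, df_b, reps, rng) / df_b
    W = sample_wishart_2x2(within, df_w, reps, rng) / df_w
    S = (B - W) / n                            # estimated sire (co)variances
    ok = (S[:, 0, 0] > 0) & (S[:, 1, 1] > 0)   # rg undefined otherwise
    return S[ok, 0, 1] / np.sqrt(S[ok, 0, 0] * S[ok, 1, 1])


rng = np.random.default_rng(7)                 # arbitrary seed
# h2 = 0.50 for both traits (t = 0.125), rg = rw = 0.75, s = 100, n = 1000.
est = simulate_rg(0.125, 0.125, 0.75, 0.75, s=100, n=1000, reps=20_000, rng=rng)
large_n_prediction = (1.0 - 0.75 ** 2) / np.sqrt(100 - 1)
print(f"empirical SE = {est.std(ddof=1):.4f}, large-n prediction = {large_n_prediction:.4f}")
```

The comparison value is the large-n approximation (1 - rg^2)/(s - 1)^(1/2), the standard result for a correlation with (s - 1) degrees of freedom. Estimates outside -1 to +1 are kept as they are, as in the unconstrained least-squares analysis.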

Since most authors (e.g., FISHER 1921; ROBERTSON 1959a; TALLIS 1959) have explicitly warned against the use of the "standard" equations when the true heritability, or its estimate, is close to zero, it seems more meaningful to force the estimate of rg to be >-1 and <1. Therefore, if the estimate of the genetic covariance matrix, i.e., (B - W)/n, with B and W the 2 x 2 matrices of between- and within-sire mean squares, respectively, was not positive-definite, REML estimates were calculated using the sampled between- and within-sire mean squares and crossproducts. To force the parameters into the parameter space, a form of "bending" the (least-squares) covariance matrix was applied (e.g., HAYES and HILL 1981; CALVIN 1993; VISSCHER 1995). The form of bending applied was described as attenuating the covariance matrix by VISSCHER (1995), and estimates of (co)variances are the same as if a REML analysis had been carried out on the original data. This was done by calculating heritabilities on a canonical scale and setting negative heritabilities to a small positive value ($10^{-6}$) and heritabilities larger than one to a value less than one ($1 - 10^{-6}$). Following KOOTS and GIBSON (1996), simulated data sets were summarized only if the geometric mean of the estimated heritabilities was >0.01. For each set of parameters, simulation was stopped when $10^5$ replicated samples of the estimated genetic correlation coefficient were obtained.
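The idea of constraining heritabilities on a canonical scale can be illustrated with a simplified sketch. It is not claimed to reproduce the attenuation procedure of VISSCHER (1995) or the REML estimates: the canonical phenotypic variances are held fixed, which is an assumption of this example, and the function name and the matrices are hypothetical.

```python
import numpy as np


def bend_sire_covariance(B, W, n, lo=1e-6, hi=1.0 - 1e-6):
    """Return a sire covariance matrix with canonical heritabilities in [lo, hi].

    B, W: 2 x 2 between- and within-sire mean-square matrices (W positive-definite).
    """
    L = np.linalg.cholesky(W)
    L_inv = np.linalg.inv(L)
    # Canonical scale: within-sire MS is the identity, between-sire MS is diagonal.
    lam, U = np.linalg.eigh(L_inv @ B @ L_inv.T)
    sire = (lam - 1.0) / n                 # canonical sire variances
    phen = sire + 1.0                      # canonical phenotypic variances (kept fixed)
    h2 = np.clip(4.0 * sire / phen, lo, hi)
    back = L @ U                           # maps canonical traits to the original scale
    return back @ np.diag(h2 / 4.0 * phen) @ back.T


if __name__ == "__main__":
    # Hypothetical mean squares for which (B - W)/n is not positive-definite.
    W = np.array([[1.0, 0.2], [0.2, 1.0]])
    B = np.array([[1.5, 1.2], [1.2, 1.6]])
    raw = (B - W) / 10
    bent = bend_sire_covariance(B, W, n=10)
    print("least-squares rg:", raw[0, 1] / np.sqrt(raw[0, 0] * raw[1, 1]))
    print("constrained rg:  ", bent[0, 1] / np.sqrt(bent[0, 0] * bent[1, 1]))
```

When no canonical heritability falls outside the bounds, the function returns (B - W)/n unchanged, so the constraint only acts on samples whose least-squares estimates are outside the parameter space.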


RESULTS

Validation of expressions for sampling variances of intraclass correlations:
Predicted sampling variances of heritability estimates, observed sampling variances from simulations, and the average estimated sampling variance from simulation are presented in Table 1. For the predicted and estimated sampling variances, only the equation of OSBORNE and PATERSON (1952) was used. To obtain other predictions, results for the standard error of the heritability need to be multiplied by, approximately, factors of $[(s - 1)/s]^{1/2}$ (using Fisher's formula) and $[(s - 1)/(s + 1)]^{1/2}$ (using KEMPTHORNE 1957). Results (Table 1) indicate that the approximation of OSBORNE and PATERSON (1952) works very well, in that the predicted variation in estimated heritabilities is close to the observed variation from simulation, in particular for a small value of the heritability. For a large heritability and a small number of sires, it appears that the approximation of Fisher is better. For example, for s = 2, n = 1000, and h2 = 0.96, the predicted standard error from OSBORNE and PATERSON (1952) is 1.0353 (Table 1) and from Fisher's equation is 0.7323 (not shown in tables), while the observed standard error was 0.7192. However, further simulations (not shown) with different values of the heritability indicated that both predictions deviate substantially from the observed standard errors for large heritabilities and a small number of sires. For example, for t = 0.95 (and hence a "heritability" of 3.80), the observed standard error in estimated heritability was 1.2619, whereas the predictions from OSBORNE and PATERSON (1952) and FISHER (1921) were 0.2687 and 0.5183, respectively.


 

 
Table 1. Observed (O), estimated (E), and predicted (P) standard errors of intraclass correlations

The average estimated standard errors of the heritability estimates are close to the observed standard error from simulation. Clearly, the approximation of OSBORNE and PATERSON (1952), i.e., the substitution of the estimated heritabilities into Equation 1, works very well, and, therefore, using FISHER's or KEMPTHORNE's formula will underestimate the true standard error by factors of $[(s - 1)/s]^{1/2}$ and $[(s - 1)/(s + 1)]^{1/2}$. It appears that the average estimated standard error is closer to the observed standard error than the prediction using the population values of the heritability.

Sampling variances for genetic correlations:
Results are presented in Table 2. Clearly, for small n, the empirical standard error of rg is usually larger than that predicted under the unconstrained (least-squares) model. For example, for s = 100 and n = 10, the empirical standard error is 0.833 for h2 = 0.10 and rg = 0.0, whereas the predicted value is 0.509. When the parameters are forced into the parameter space, the maximum empirical standard error is 1.0, which occurs when half of the time an estimate of +1 is obtained and half of the time an estimate of -1. For small s and n, the empirical standard error can then be smaller than that predicted. For example, for s = 100 and n = 2, the empirical standard error from REML was 0.897 for h2 = 0.10 and rg = 0, whereas the predicted value was 2.84. For large n (>10), the equations perform well. Substituting the estimated parameters into expressions 8 and 9 is almost always worse, because of the real possibility of obtaining very small estimates of the heritabilities. Only for large designs are the average estimated and predicted standard errors similar. For powerful designs, i.e., for those designs with a small probability of obtaining least-squares estimates that are out of bounds, the average estimated sampling variance appears to be closer to the observed sampling variance than the predicted values (Table 2).


 

 
Table 2. Empirical (O), estimated (E), and predicted (P) standard errors of estimated genetic correlation coefficients

Relationship between heritability estimates and sampling variance of rg:
A more detailed investigation into the relationship between estimated heritabilities, estimated genetic correlation coefficients, and the sampling variance of the genetic correlation estimate was performed for s = 100, n = 1000, h2 = 0.50 (both traits), and rw = rg = 0.75. One million replicated populations were simulated, and both the observed standard error and the estimated standard error were summarized as a function of the geometric mean of the heritability estimates (i.e., $\hat{h}_1 \times \hat{h}_2$). Simulation results are displayed in Figure 1. The graph also includes a plot of the predicted standard error, assuming that the values of the heritabilities on the x-axis are the population values. Since for a powerful design with many progeny per sire the predicted sampling variance of the genetic correlation coefficient does not depend on the heritabilities (Equation 14 and Equation 16), the corresponding line in Figure 1 appears to be horizontal. The prediction of the standard error using population values, i.e., h2 = 0.50 and rg = rw = 0.75, is 0.0443 for this design. The observed standard error of genetic correlation coefficients over all samples, hence also over all possible values of estimated heritabilities, was 0.0443, and the correlation between the estimated genetic correlation and the geometric mean of the heritability estimates was 0.59 (results not shown).



 
Figure 1. Standard error of the genetic correlation coefficient against the geometric mean of the estimated heritabilities, for a design of s = 100 and n = 1000. Population values were 0.50 for the two heritabilities, and 0.75 for the phenotypic and genetic correlation coefficients. Simulated results based upon $10^6$ samples. Solid line, predicted values using Equation 14, assuming that the values on the x-axis are the population values; long-dashed line, observed standard error; short-dashed line, estimated standard error, using the estimated values of the heritabilities and correlation coefficients.


DISCUSSION AND CONCLUSIONS

Population or estimated values:
It is clear that all the expressions for the sampling variance of intraclass correlations or genetic correlation coefficients were essentially derived using a first-order Taylor series about the true population values. Hence, these are the values that should be used to study, for example, the power of various experimental designs, because they give the best prediction of the sampling variance.

Sampling variance of intraclass correlation:
The prediction of the sampling variance of the heritability estimate based upon population values is accurate for small heritabilities and/or a large number of sires. However, for a small number of sires and a large heritability, the prediction is very poor. The reason for this is that the distribution of the heritability estimate becomes very skewed in these cases, and the Taylor series in Equation 1 ignores higher-order terms by implicitly assuming that both numerator and denominator in the series are normally distributed. In fact, they are distributed proportionally to $\chi^2$ distributions, which are known to be highly skewed for a small number of degrees of freedom [the coefficient of skewness of a central $\chi^2$ distribution with k degrees of freedom is $2^{3/2}/k^{1/2}$ (LANCASTER 1969, p. 20)]. Higher-order Taylor series converged only slowly (results not shown) and did not improve the prediction of the sampling variance substantially. For the extreme case of sires with many progeny it was shown (Equation 2) that the predicted standard error of the estimated heritability is proportional to t(1 - t). However, the observed standard error in this case is a function of t, since () is the only term that varies in the Taylor series. Hence, the OSBORNE and PATERSON (1952) approximation is biased downward for large heritabilities and a small number of sires. Although results were not shown, the estimate of the heritability is also biased downward in these cases. However, heritabilities of quantitative traits are not often >0.5, and estimates are usually based on a reasonable number of families, so that this problem of an underprediction of the sampling variance is unlikely to occur.

Estimating the sampling variance of intraclass correlations by substituting the estimates of the heritabilities into the standard expressions (OSBORNE and PATERSON 1952) works remarkably well, in that the average estimated standard error is very close to that predicted. For a large value of the heritability (h2 = 0.96) and a small number of sires (s = 2 or s = 4), the average estimated standard error is smaller than the predicted standard error and appears to be close to the observed standard error (Table 1). However, further simulations using more extreme values of t showed that this is not a general observation. For example, for t > 0.5 (corresponding to "heritability" >2.0) the prediction using population values is closer to the observed sampling variance than the average estimated sampling variance (results not shown). Part of the reason why the average estimated standard error of the heritability is closer to the observed sampling variance than the prediction based upon t for heritability values in the normal range is that it takes account of the bias in the heritability estimate. Heritability estimates are biased downward, in particular for large values of t and small values of s. For example, for h2 = 0.96, s = 4, and n = 1000, the average estimate for the heritability from simulation is 0.86 (results not in tables). When this value is used for the standard prediction equation (Equation 1), the predicted standard error of the heritability is 0.5532, which is closer to the observed standard error (0.5169, Table 1) than that predicted from the population value of the heritability (0.5978, Table 1).

Sampling variances of genetic correlations:
The expressions for var($\hat{r}_g$) from REEVE, TALLIS, and ROBERTSON perform poorly using small population sizes and small heritabilities when the known population values are used to predict the sampling variance of the genetic correlation coefficient. This is because the estimates of the genetic correlation coefficient can become very large (positive or negative) when using least-squares methods. Using REML, the equations perform much better, although the empirical standard errors are generally larger than those predicted in Table 2. Substituting the estimates of the heritabilities and genetic correlations into Equation 15 can result in a very large estimate of the sampling variance of the genetic correlation when there is a real chance that the estimates of the heritabilities approach zero (the numerator of Equation 15 approaches 2/[n(s - 1)(n - 1)] for both intraclass correlations approaching zero, whereas the denominator goes to zero).

KOOTS and GIBSON (1996) showed in one of their simulation studies that the empirical sampling variance of the genetic correlation coefficient depended on the estimates of the two heritabilities and concluded that, therefore, the estimates of the heritabilities should be used in, for example, Equation 15, even if the population values were known. Further simulation results using the same population values (K. KOOTS and J. GIBSON, personal communication) showed very clearly that the sampling variance of the genetic correlation coefficient, conditional on the values of the estimated heritabilities, is accurately estimated by substituting the estimated (and not the true) heritabilities into Equation 15, in the case of a parent-offspring design, heritabilities of 0.10, and environmental and genetic correlation coefficients of zero. This appears to contradict the simulation results in Table 2, which show that the average sampling variance is poorly estimated using estimated parameters. However, the results in Table 2 indicate that the estimation is poor only when the least-squares estimates are likely to be outside the parameter space. In the cases where the least-squares and REML estimates are identical, both the prediction and average estimated sampling variance are close to the observed one.

The relationship between estimated heritabilities and the empirical sampling variance of the genetic correlation was explored in Figure 1 for a powerful design. From these results we may conclude that (1) the predicted sampling variance accurately predicts the average observed sampling variance (0.0443), but not the observed sampling variance for given values of the achieved estimates of the heritabilities; (2) for a given value of the estimates of the heritabilities, the estimated sampling variance follows a very similar pattern to that of the observed sampling variance (as in KOOTS and GIBSON 1996); (3) for a given value of the estimates of the heritabilities, the estimated sampling variance is larger than the observed sampling variance; and (4) when the estimated heritabilities are close to the true values, the predicted and estimated sampling variances coincide. For other designs, a similar pattern was observed. When the experiment was large and the population values of the correlation coefficients were zero, the observed and estimated sampling variances, as a function of the geometric mean of the estimated heritabilities, were very similar. These additional results confirm the results of KOOTS and GIBSON (1996) and reinforce their recommendation that the estimated heritabilities should be used in calculating the sampling variance of the genetic correlation coefficient, even in the rare cases when the population values of the heritabilities are known.

From Table 2 it appears that the average estimated sampling variance of the genetic correlation coefficient is closer to the observed sampling variance than the prediction using population parameters. For one design, s = 500, n = 10, $h^2 = 0.10$, $r_g = r_w = 0$, this was further explored using the results from $10^6$ replicated samples. Figure 2 shows that for $h_1 h_2 > 0.07$ (the geometric mean of the estimated heritabilities), the observed, estimated, and predicted sampling variances are virtually identical. For each of the geometric mean classes, the mean estimate of the genetic correlation was zero (results not shown), so that for a particular value of the estimated geometric mean of the heritabilities, the sampling variance of the genetic correlation coefficients reflects sampling from a population with the true values of the heritabilities equal to those estimated. This is in contrast with the previous example, in which the correlation between the estimated genetic correlation and the geometric mean of the heritabilities was positive (+0.59). For smaller values of the estimated heritabilities, Figure 2 shows that the estimated sampling variance is much larger than either the predicted or the observed sampling variance. The estimated sampling variance of the genetic correlation coefficient for $h_1 h_2 = 0.01$ was 9.9. It is not clear from these results why the average estimated sampling variance is closer to the observed sampling variance than the prediction using population values.
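The inflation of the estimated standard error at small heritability estimates can be illustrated with a minimal sketch. It does not use Equation 14 or 15 of this paper. Instead it assumes the simpler textbook approximation (as given, for example, in FALCONER and MACKAY 1996), SE(rg) ≈ (1 − rg²)/√2 × √[SE(h1²)SE(h2²)/(h1²h2²)], combined with the usual first-order expression for the sampling variance of the intraclass correlation in a balanced half-sib design. The function names are hypothetical and chosen for illustration only.

```python
import math


def se_heritability_half_sib(h2, s, n):
    """Approximate SE of a half-sib heritability estimate (h2 = 4t).

    Uses the first-order Taylor expression for var(t) in a balanced
    design with s sires and n progeny per sire.
    """
    t = h2 / 4.0
    var_t = (2.0 * (1.0 - t) ** 2 * (1.0 + (n - 1) * t) ** 2
             / (n * (n - 1) * (s - 1)))
    return 4.0 * math.sqrt(var_t)


def se_genetic_correlation(rg, h2_1, h2_2, s, n):
    """Textbook approximation to SE(rg); an assumption, not Equation 14/15."""
    se1 = se_heritability_half_sib(h2_1, s, n)
    se2 = se_heritability_half_sib(h2_2, s, n)
    return (1.0 - rg ** 2) / math.sqrt(2.0) * math.sqrt(se1 * se2 / (h2_1 * h2_2))


# Plug progressively smaller heritability values into the expression,
# as happens when small estimates are substituted for the true values.
for h2 in (0.10, 0.07, 0.03, 0.01):
    se = se_genetic_correlation(0.0, h2, h2, s=500, n=10)
    print(f"h2 = {h2:.2f}  approximate SE(rg) = {se:.3f}")
```

Because the heritabilities enter the denominator, the approximate standard error grows rapidly as the substituted values approach zero, which is the qualitative pattern described for Figure 2. The numerical values from this simplified expression are not expected to match those in the figure.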



Figure 2. Standard error of the genetic correlation coefficient against the geometric mean of the estimated heritabilities, for a design of s = 500 and n = 10. Population values were 0.10 for the two heritabilities, and 0.0 for the phenotypic and genetic correlation coefficients. —, predicted values using Equation 14, assuming that the values on the x-axis are the population values; – – –, observed standard error; ---, estimated standard error, using the estimated values of the heritabilities and correlation coefficients.

VAN VLECK and HENDERSON (1961) investigated by simulation the behavior of the expression derived by REEVE (1955) for the parent-offspring regression scenario. They came to the same conclusion as KOOTS and GIBSON (1996); i.e., in the case of a single progeny and one parent, more than 1000 pairs were needed before REEVE's expression was reasonably accurate.

Bias in estimate of sampling variance:
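It is usually assumed that by substituting the estimates of population parameters in the prediction equations for the sampling variance of the heritability or genetic correlation, unbiased estimates of those sampling variances are obtained. However, this is not generally the case for small sample sizes. Furthermore, when comparing the average sampling variation from simulation, it matters whether results are expressed in the average standard error (as in this study) or in the average sampling variance. This is best illustrated using the example of a half-sib design with s sires, large n, and a small population value of t. Then

$$\mathrm{var}(\hat{t}) \approx \frac{2t^2}{s-1} \quad \text{and} \quad \mathrm{SE}(\hat{t}) \approx t\sqrt{\frac{2}{s-1}},$$

with corresponding estimates,

$$\widehat{\mathrm{var}}(\hat{t}) = \frac{2\hat{t}^2}{s-1} \quad \text{and} \quad \widehat{\mathrm{SE}}(\hat{t}) = \hat{t}\sqrt{\frac{2}{s-1}}.$$

If expectations are taken over these estimates, then

$$E[\widehat{\mathrm{var}}(\hat{t})] = \frac{2[t^2 + \mathrm{var}(\hat{t})]}{s-1} \approx \mathrm{var}(\hat{t})\,\frac{s+1}{s-1} \quad \text{and} \quad E[\widehat{\mathrm{SE}}(\hat{t})] \approx \mathrm{SE}(\hat{t}).$$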
Hence, for small t and large n, the estimate of the sampling variance can be severely biased upward for a small number of sires, whereas the estimate of the standard error is unbiased. It follows that the adjustment of the degrees of freedom suggested by KEMPTHORNE (1957) (Equation 8) gives an unbiased estimate of the sampling variance for small t, but a severely biased estimate of the standard error.

In practice one should therefore be cautious when comparing or using estimated sampling variances from different experiments. In particular, combining heritability estimates from different-sized experiments by weighting the estimates proportionally to the inverse of the estimated sampling variance should be avoided because of an induced positive correlation between the estimate of the heritability and the estimate of its sampling variance, following Equation 1 and Equation 7 (W. G. HILL, personal communication). This induced correlation causes the combined heritability estimate to be biased downward, because of the negative correlation between the estimate of the heritability and its weight (the inverse of the sampling variance). Furthermore, a downward-biased overall heritability estimate would be obtained because the heritability estimate from smaller experiments tends to be biased downward, and the corresponding estimated sampling variance would be too small, giving too much weight to the smaller experiments. A joint analysis of data, or an iterative procedure in which the estimated sampling variance for each experiment is recalculated from the pooled heritability estimate (W. G. HILL, personal communication), is to be preferred. However, a further complication arises because of a bias in the estimated sampling variance if the standard prediction equations are used. This bias may be upward (as shown above) or downward depending on the population parameters. To avoid strong biases in the pooled heritability estimates, a single data analysis should be carried out.
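The contrast between the bias of the estimated sampling variance and that of the estimated standard error can be checked numerically. The sketch below rests on a simplifying assumption that is not taken from the paper's equations: for small t and large n, the estimate of t is treated as t multiplied by a chi-square variable with s − 1 degrees of freedom divided by s − 1. The "adjusted" estimate assumes that the degrees-of-freedom adjustment amounts to replacing s − 1 by s + 1 in the divisor. The function name and defaults are hypothetical.

```python
import math
import random


def simulate_bias(t=0.05, s=5, reps=200_000, seed=1):
    """Compare average estimated var and SE of t-hat with their true values.

    Assumes t-hat = t * chi2(s - 1) / (s - 1), a large-n, small-t
    approximation used here only for illustration.
    """
    rng = random.Random(seed)
    df = s - 1
    true_var = 2.0 * t ** 2 / df
    true_se = math.sqrt(true_var)
    sum_var = sum_var_adj = sum_se = 0.0
    for _ in range(reps):
        # A chi-square with df degrees of freedom is Gamma(shape=df/2, scale=2).
        chi2 = rng.gammavariate(df / 2.0, 2.0)
        t_hat = t * chi2 / df
        sum_var += 2.0 * t_hat ** 2 / df          # plug-in variance estimate
        sum_var_adj += 2.0 * t_hat ** 2 / (s + 1)  # assumed df adjustment
        sum_se += t_hat * math.sqrt(2.0 / df)      # plug-in standard error
    return {
        "var_ratio": sum_var / reps / true_var,
        "adjusted_var_ratio": sum_var_adj / reps / true_var,
        "se_ratio": sum_se / reps / true_se,
    }


result = simulate_bias()
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions and with s = 5, the plug-in variance estimate should average about (s + 1)/(s − 1) = 1.5 times the true variance, while the plug-in standard error and the adjusted variance estimate should both average close to their true values.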

Conclusion:
For the design of experimental populations to estimate genetic parameters, the prediction of the sampling variance of heritabilities using OSBORNE and PATERSON (1952) is accurate, unless the population heritability is large and the number of family groups is very small. For analysis of data, the estimate of the standard error of the heritability obtained by substituting the estimated heritability for the true value in the standard prediction formulas is almost unbiased for the range of heritabilities and sample sizes likely to be encountered in practice.

For small experiments, estimates of heritability are biased downward, and estimates of sampling variances are generally biased. Combining results from different experiments by weighting the heritability estimates by the inverse of their estimated sampling variances may result in a severely biased heritability estimate, because the smaller experiments tend to have estimates that are too low, and too much weight is given to these estimates if their sampling variances are also biased downward. A joint analysis of all data is to be preferred.
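The effect of inverse-variance weighting on a pooled estimate can be illustrated with a small simulation. This is a sketch under stated assumptions, not the analysis used in this study: each experiment's estimate of t is taken to be t multiplied by a chi-square variable with s − 1 degrees of freedom divided by s − 1, and its estimated sampling variance is 2 × (estimate)² / (s − 1). Under this simplified model, recalculating each sampling variance from a common pooled value makes the weights proportional to s − 1, so the iterative procedure reduces to a single degrees-of-freedom-weighted mean. All names and default values are hypothetical.

```python
import random


def pooled_estimates(t=0.05, sires=(5, 10, 20), reps=20_000, seed=2):
    """Average pooled estimate of t under two weighting schemes.

    Returns (mean of naive inverse-variance pooled estimates,
             mean of degrees-of-freedom-weighted pooled estimates).
    """
    rng = random.Random(seed)
    naive_total = df_total = 0.0
    for _ in range(reps):
        t_hats, dfs = [], []
        for s in sires:
            df = s - 1
            chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square(df)
            t_hats.append(t * chi2 / df)
            dfs.append(df)
        # Naive weights: inverse of each experiment's own estimated variance.
        weights = [df / (2.0 * th ** 2) for th, df in zip(t_hats, dfs)]
        naive_total += sum(w * th for w, th in zip(weights, t_hats)) / sum(weights)
        # Weights from a common value of t: proportional to df only.
        df_total += sum(df * th for df, th in zip(dfs, t_hats)) / sum(dfs)
    return naive_total / reps, df_total / reps


naive_mean, df_weighted_mean = pooled_estimates()
print(f"true t = 0.05, naive pooled mean = {naive_mean:.4f}, "
      f"df-weighted pooled mean = {df_weighted_mean:.4f}")
```

In this sketch the naive pooled estimate averages well below the true value, because low estimates receive large weights, whereas the weights that do not depend on the individual estimates give an average close to the true value.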

The predictions of the sampling variance of the genetic correlation using REEVE (1955) and TALLIS (1959) are accurate only if the population heritabilities are not close to zero and if the number of families is large. Even if the population heritabilities are known, the estimated heritabilities should be used in the estimation of the sampling variance of the genetic correlation coefficient.


*  ACKNOWLEDGMENTS

I thank NAOMI WRAY and BILL HILL for helpful comments and KEN KOOTS and JOHN GIBSON for many constructive discussions and the sharing of additional simulation results.

Manuscript received February 3, 1997; Accepted for publication March 23, 1998.


*  LITERATURE CITED

CALVIN, J. A., 1993  REML estimation in unbalanced multivariate variance components models using an EM algorithm. Biometrics 49:691-701.

FALCONER, D. S., and T. F. C. MACKAY, 1996 Introduction to Quantitative Genetics. Longman, Harlow, UK.

FISHER, R. A., 1921  On the probable error of a coefficient of correlation deduced from a small sample. Metron 1:3-32.

HAYES, J. F. and W. G. HILL, 1981  Modification of estimates of parameters in the construction of genetic selection indices ("bending"). Biometrics 34:429-439.

KEMPTHORNE, O., 1957 An Introduction to Genetic Statistics. John Wiley & Sons, New York.

KOOTS, K. R. and J. P. GIBSON, 1996  Realized sampling variances of estimates of genetic parameters and the difference between genetic and phenotypic correlations. Genetics 143:1409-1416.

LANCASTER, H. O., 1969 The Chi-Squared Distribution. John Wiley & Sons, New York.

MEYER, K. and W. G. HILL, 1991  Approximation of sampling variances and confidence intervals for maximum likelihood estimates of variance components. J. Anim. Breed. Genet. 109:264-280.

OSBORNE, R. and W. S. B. PATERSON, 1952  On the sampling variance of heritability estimates derived from variance analyses. Proc. R. Soc. Edinb. Sect. B 64:456-461.

PATTERSON, H. D. and R. THOMPSON, 1971  Recovery of inter-block information when block sizes are unequal. Biometrika 58:545-554.

REEVE, E. C. R., 1955  The variance of the genetic correlation coefficient. Biometrics 11:357-374.

ROBERTSON, A., 1959a  The sampling variance of the genetic correlation coefficient. Biometrics 15:469-485.

ROBERTSON, A., 1959b  Experimental design in the evaluation of genetic parameters. Biometrics 15:219-226.

ROBERTSON, A. and I. M. LERNER, 1949  The heritability of all-or-none traits: viability of poultry. Genetics 34:395-411.

TALLIS, G. M., 1959  Sampling errors of genetic correlation coefficients calculated from analyses of variance and covariance. Aust. J. Stat. 1:35-43.

TAYLOR, S. C., 1976  Multibreed designs. 1. Variation between breeds. Anim. Prod. 23:133-144.

VAN VLECK, L. D. and C. R. HENDERSON, 1961  Empirical sampling estimates of genetic correlations. Biometrics 17:359-371.

VISSCHER, P. M., 1995  Bias in genetic R2 from halfsib designs. Genet. Sel. Evol. 27:335-345.

ZERBE, G. O. and D. E. GOLDGAR, 1980  Comparison of intraclass correlation coefficients with the ratio of two independent F-statistics. Commun. Stat. Theor. Meth. A 9:1641-1655.