On the Sampling Variance of Intraclass Correlations and Genetic Correlations
Peter M. Visscher, University of Edinburgh, Institute of Ecology and Resource Management, Edinburgh EH9 3JG, Scotland
Corresponding author: Peter M. Visscher, University of Edinburgh, Institute of Ecology and Resource Management, West Mains Rd., Edinburgh EH9 3JG, Scotland, peter.visscher{at}ed.ac.uk (E-mail).
Communicating editor: R. G. SHAW
| ABSTRACT |
|---|
Widely used standard expressions for the sampling variance of intraclass correlations and genetic correlation coefficients were reviewed for small and large sample sizes. For the sampling variance of the intraclass correlation, it was shown by simulation that the commonly used expression, derived using a first-order Taylor series, performs better than alternative expressions found in the literature when the between-sire degrees of freedom were small. The expressions for the sampling variance of the genetic correlation are significantly biased for small sample sizes, in particular when the population values, or their estimates, are close to zero. It was shown, both analytically and by simulation, that this is because the estimate of the sampling variance becomes very large in these cases due to very small values of the denominator of the expressions. It was concluded, therefore, that for small samples, estimates of the heritabilities and genetic correlations should not be used in the expressions for the sampling variance of the genetic correlation. It was shown analytically that in cases where the population values of the heritabilities are known, using the estimated heritabilities rather than their true values to estimate the genetic correlation results in a lower sampling variance for the genetic correlation. Therefore, for large samples, estimates of heritabilities, and not their true values, should be used.
THERE are three classic papers from the 1950s on the topic of sampling variances of estimates of genetic correlations: REEVE (1955), ROBERTSON (1959a), and TALLIS (1959).
It could be argued that the expressions derived in those articles are no longer relevant, since estimation techniques have moved on from least-squares methods to likelihood-based methods [mainly residual maximum likelihood (REML); PATTERSON and THOMPSON 1971].
Three areas of confusion can be identified:
- Should the parameters in the expressions for the sampling variance of the genetic correlation coefficient in REEVE (1955), ROBERTSON (1959a), and TALLIS (1959) be the population parameters or the estimates of those parameters?
- How good are the expressions derived in the trio of articles?
- What is the impact of estimates that are outside the parameter space, i.e., negative heritability estimates and/or estimates of genetic correlations <-1 or >+1?
In this article I review the main expressions of REEVE, ROBERTSON, and TALLIS and clarify under what circumstances the expressions should be used. I also review and evaluate the equations for the sampling variance of intraclass correlations as a paradigm, since these are central to the understanding of the assumptions and methods.
| MATERIALS AND METHODS |
|---|
To answer the above questions, we first look at the derivation of the sampling variance of intraclass correlation coefficients because this serves as an appropriate paradigm for the sampling variance of the genetic correlation coefficient. Subsequently, expressions for the sampling variance of the genetic correlation are reviewed and evaluated.
Sampling variance of intraclass correlation coefficients
Population parameters known:
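Consider a simple, balanced one-way design with s sires and n progeny per sire, and assume that parameters are estimated using least-squares methods, e.g., ANOVA. Observations are assumed to be normally distributed. The expectations and variances of the between-sire (B) and within-sire (W) mean squares (MS) are

$E(B) = \sigma_p^2[1 + (n - 1)t]$, $\mathrm{var}(B) = 2\sigma_p^4[1 + (n - 1)t]^2/(s - 1)$,

$E(W) = \sigma_p^2(1 - t)$, $\mathrm{var}(W) = 2\sigma_p^4(1 - t)^2/[s(n - 1)]$,

with t the intraclass correlation and $\sigma_p$ the phenotypic standard deviation. The variance of the least-squares estimate of the intraclass correlation, $\hat{t} = (B - W)/[B + (n - 1)W]$, is approximately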
![]() |
(1) |
Equation 1 is well known.
For large n, Equation 1 reduces to
![]() |
(2) |
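The large-n behavior can be checked numerically. The sketch below is only an illustration and not the article's code: it assumes the textbook first-order form var(t̂) ≈ 2(1 - t)²[1 + (n - 1)t]²/[n(n - 1)(s - 1)] and its limit 2t²(1 - t)²/(s - 1) for fixed t > 0. The function names are hypothetical, and the article's own displays may be written differently.

```python
def var_t_first_order(t: float, s: int, n: int) -> float:
    """Textbook first-order variance of the ANOVA intraclass correlation (assumed form)."""
    return 2.0 * (1.0 - t) ** 2 * (1.0 + (n - 1) * t) ** 2 / (n * (n - 1) * (s - 1))


def var_t_large_n(t: float, s: int) -> float:
    """Limit of the expression above as n grows with t > 0 held fixed."""
    return 2.0 * t ** 2 * (1.0 - t) ** 2 / (s - 1)


if __name__ == "__main__":
    t, s = 0.25, 50  # illustrative values, not taken from the article
    for n in (5, 50, 500, 5000):
        ratio = var_t_first_order(t, s, n) / var_t_large_n(t, s)
        print(f"n={n:5d}  finite-n / large-n variance ratio = {ratio:.4f}")
```

The ratio approaches 1 as n increases, so under these assumptions the number of sires, not the number of progeny, limits precision in large families.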
A number of expressions similar to Equation 1, which all differ in the terms relating to the number of sires and progeny per sire, can be found in the literature. They differ in particular in the degrees of freedom relating to the between-sire component of variance. The general expression is
![]() |
(3) |
Where ci is a function of s and n, from reference i. The original expression was derived in a classical paper by ![]()
![]() |
(4) |
![]()
![]() |
(5) |
Finally, following ![]()
Hence,
![]() |
(6) |
![]() |
(6) |
Although this expression is trivial to derive, to my knowledge this is the first time that it has been documented following a first-order approximation using the F ratio.
All the above expressions for ci reduce to the one most commonly used (i.e., c0 =
) for large s and n. However, even for large n, there are discrepancies between the formulas depending on the use of s, (s - 1), or (s - 2) in the denominator. It is likely that the form used by ROBERTSON and LERNER is incorrect and probably stems from the use of FISHER's z-transformation. FISHER showed that
Hence,
However, as pointed out by ![]()
Population values unknown:
Except when doing power calculations and/or investigating the design of experiments, the population value of t is unknown, and the sampling variance is estimated by substituting the estimate $\hat{t}$ for t. This is essentially based upon the assumption that E($\hat{t}$) = t. However, it should be obvious that if, by chance, the estimate of the intraclass correlation is too high (or too low), the resulting estimate of the sampling variance will be biased. Using
![]() |
(7) |
=
[
] (see also
= 1 or
= -
. Only when there is a scale on which the sampling variance is (nearly) independent of the population values, for example, FISHER's z-scale, will the estimate of the sampling variance be correct.
![]()
![]() |
(8) |
Equation 8 reduces to the standard equation of ![]()
![]()
Simulation study:
Simulations were performed to compare the empirical standard deviation of heritability estimates, the predicted standard deviation using the true population values (Equation 1), and the average estimated standard deviation (using Equation 1, Equation 4, and Equation 8, with $\hat{t}$ substituted for t). Independent between-sire and within-sire sums of squares were sampled from central $\chi^2$ distributions with (s - 1) and [s(n - 1)] degrees of freedom, respectively, and then scaled to the appropriate mean squares using the population values of t and $\sigma_p$ and the values of s and n. Without loss of generality, a phenotypic standard deviation of unity was used throughout.
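The sampling scheme described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code used in the study: mean squares are drawn as scaled chi-square variables (via the gamma distribution), the phenotypic standard deviation is 1, and the predicted variance uses the textbook first-order form, which may differ in detail from the article's Equation 1. All function names and parameter values are hypothetical.

```python
import math
import random


def predicted_var_t(t: float, s: int, n: int) -> float:
    """Textbook first-order variance of t-hat (assumed form)."""
    return 2.0 * (1.0 - t) ** 2 * (1.0 + (n - 1) * t) ** 2 / (n * (n - 1) * (s - 1))


def simulate_t_hat(t: float, s: int, n: int, reps: int, seed: int = 1) -> list:
    """Draw between- and within-sire mean squares and return ANOVA estimates of t."""
    rng = random.Random(seed)
    df_b, df_w = s - 1, s * (n - 1)
    exp_b = 1.0 + (n - 1) * t  # E(B) with sigma_p = 1
    exp_w = 1.0 - t            # E(W) with sigma_p = 1
    estimates = []
    for _ in range(reps):
        # A chi-square with k df is a gamma variable with shape k/2 and scale 2.
        b = exp_b * rng.gammavariate(df_b / 2.0, 2.0) / df_b
        w = exp_w * rng.gammavariate(df_w / 2.0, 2.0) / df_w
        estimates.append((b - w) / (b + (n - 1) * w))
    return estimates


def std_dev(values: list) -> float:
    """Sample standard deviation."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))


if __name__ == "__main__":
    t, s, n = 0.05, 100, 20
    est = simulate_t_hat(t, s, n, reps=20000)
    print("empirical SD of t-hat:", round(std_dev(est), 5))
    print("predicted SD of t-hat:", round(math.sqrt(predicted_var_t(t, s, n)), 5))
```

Sampling the mean squares directly, rather than simulating individual records, keeps each replicate cheap, which matters when many designs are compared.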
Since the only difference between the various prediction equations for the sampling variances are functions of s and n, only results for the Taylor series (![]()
![]()
Sampling variance of genetic correlation coefficient
Expressions from literature:
![]()
![]() |
(9) |
![]() |
(10) |
![]() |
(11) |
, which is the general expression of the reliability of a progeny test based upon n progeny and a heritability of 4ti.
Special cases:
This is the scenario of ![]()
![]() |
(12) |
![]() |
(13) |
These correspond to the equations given by ![]()
A further simplification is if the genetic and within-sire correlation are the same. The expression for the sampling variance may be written as
![]() |
(14) |
t2, there is no simple form for the sampling variance of the genetic correlation coefficient.
Finally, ![]()
![]() |
(15) |
This was the equation used by ![]()
For a large number of progeny per sire, TALLIS' equation reduces to a very simple form,
![]() |
(16) |
This equation is equivalent to the approximation of the sampling variance of a correlation coefficient in the bivariate normal case with (s - 1) degrees of freedom.
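As a rough numerical illustration of this large-n form, the sketch below assumes that the bivariate normal approximation referred to is var(r̂) ≈ (1 - r²)²/(s - 1). The displayed equation is not reproduced in this text, so that form and the function name are assumptions.

```python
import math


def var_rg_large_n(rg: float, s: int) -> float:
    """Approximate variance of a correlation estimate with (s - 1) degrees of freedom."""
    return (1.0 - rg ** 2) ** 2 / (s - 1)


if __name__ == "__main__":
    se = math.sqrt(var_rg_large_n(0.75, 100))
    print(f"approximate SE of the genetic correlation: {se:.4f}")
```

For s = 100 and rg = 0.75 this gives a standard error of about 0.044, close to the value of 0.0443 reported in the results for the design with n = 1000.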
There are difficulties in using the approaches for the sampling variance of the intraclass correlation to determine the sampling variance of estimates of the genetic correlation coefficient: (1) The estimate of $r_g$ is unbounded in principle, so that large positive and negative values (outside the range -1 to +1) are possible, and (2) the true heritability, or its estimate, appears in both the numerator and the denominator of the equation for the sampling variance of $r_g$ (see Equations 9-11). This means that in the vicinity of true or estimated $h^2$ being zero, the estimate of the sampling variance can become very large because of a division by a small number. Also, if one or both of the estimated heritabilities is <0, the estimate of the genetic correlation coefficient is an imaginary number.
Heritabilities known:
![]()
2pi the known intraclass correlations and phenotypic variances for trait i. The means and variances of the crossproducts, using Wii and Bii to denote the within and between-sire MS for trait i, are
Using
![]() |
(17) |
![]() |
(18) |
![]() |
(19) |
The values of P and Q are larger when the heritabilities are assumed known; i.e., the sampling variance of the genetic correlation is larger when heritabilities are assumed known (cf. Equation 12 and Equation 13). This is most clearly seen when the number of progeny is very large, because then
![]() |
(20) |
Simulations:
A simulation study was performed in which the empirical variance of the estimated rg and the average estimated sampling variance using the estimates of heritabilities and correlations was compared to the predicted sampling variance [using equations from ![]()
![]()
![]()
Since most authors (e.g., ![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
| RESULTS |
|---|
Validation of expressions for sampling variances of intraclass correlations:
Predicted sampling variances of heritability estimates, observed sampling variances from simulations, and the average estimated sampling variance from simulation are presented in Table 1. For the predicted and estimated sampling variances, only the equation of ![]()
![]()
![]()
![]()
![]()
![]()
The average estimated standard errors of the heritability estimates are close to the observed standard error from simulation. Clearly, the approximation of ![]()
Sampling variances for genetic correlations:
Results are presented in Table 2. Clearly, for small n, the empirical standard error of $r_g$ is usually larger than that predicted under the unconstrained (least-squares) model. For example, for s = 100 and n = 10, the empirical standard error is 0.833 for $h^2$ = 0.10 and $r_g$ = 0.0, whereas the predicted value is 0.509. When the parameters are forced into the parameter space, the maximum empirical standard error is 1.0, which occurs when an estimate of +1 is obtained half of the time and an estimate of -1 the other half. For small s and n, the empirical standard error can then be smaller than that predicted. For example, for s = 100 and n = 2, the empirical standard error from REML was 0.897 for $h^2$ = 0.10 and $r_g$ = 0, whereas the predicted value was 2.84. For large n (>10), the equations perform well. Substituting the estimated parameters into expressions 8 and 9 is almost always worse, because of the real possibility of obtaining very small estimates of the heritabilities. Only for large designs are the average estimated and predicted standard errors similar. For powerful designs, i.e., for those designs with a small probability of obtaining least-squares estimates that are out of bounds, the average estimated sampling variance appears to be closer to the observed sampling variance than the predicted values (Table 2).
Relationship between heritability estimates and sampling variance of rg:
A more detailed investigation into the relationship between estimated heritabilities, estimated genetic correlation coefficients, and the sampling variance of the genetic correlation estimate was performed for s = 100, n = 1000, $h^2$ = 0.50 (both traits), and $r_w = r_g$ = 0.75. One million replicated populations were simulated, and both the observed standard error and the estimated standard error were summarized as a function of the geometric mean of the heritability estimates (i.e., $\sqrt{\hat{h}^2_1 \times \hat{h}^2_2}$). Simulation results are displayed in Figure 1. The graph also includes a plot of the predicted standard error, assuming that the values of the heritabilities on the x-axis are the population values. Since for a powerful design with many progeny per sire the predicted sampling variance of the genetic correlation coefficient does not depend on the heritabilities (Equation 14 and Equation 16), the corresponding line in Figure 1 appears to be horizontal. The prediction of the standard error using population values, i.e., $h^2$ = 0.50 and $r_g = r_w$ = 0.75, is 0.0443 for this design. The observed standard error of genetic correlation coefficients over all samples, hence also over all possible values of estimated heritabilities, was 0.0443, and the correlation between the estimated genetic correlation and the geometric mean of the heritability estimates was 0.59 (results not shown).
| DISCUSSION AND CONCLUSIONS |
|---|
Population or estimated values:
It is clear that all the expressions for the sampling variance of intraclass correlations or genetic correlation coefficients were essentially derived using a first-order Taylor series about the true population values. Hence, these are the values that should be used to study, for example, the power of various experimental designs, because they give the best prediction of the sampling variance.
Sampling variance of intraclass correlation:
The prediction of the sampling variance of the heritability estimate based upon population values is accurate for small heritabilities and/or a large number of sires. However, for a small number of sires and a large heritability, the prediction is very poor. The reason for this is that the distribution of the heritability estimate becomes very skewed in these cases, and the Taylor series in Equation 1 ignores higher-order terms by implicitly assuming that both numerator and denominator in the series are normally distributed. In fact, they are distributed proportionally to $\chi^2$ distributions, which are known to be highly skewed for a small number of degrees of freedom [the coefficient of skewness of a central $\chi^2$ distribution with k degrees of freedom is $2^{3/2}/k^{1/2}$].
) is the only term that varies in the Taylor series. Hence, the ![]()
Estimating the sampling variance of intraclass correlations by substituting the estimates of the heritabilities into the standard expressions (![]()
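The skewness formula quoted in this section, $2^{3/2}/k^{1/2}$ for a central chi-square distribution with k degrees of freedom, can be checked by simulation. The sketch below is illustrative only; the function names, the seed, and the choice of k = 9 (as for 10 sires) are assumptions made for the example.

```python
import random


def chi2_skewness(k: float) -> float:
    """Theoretical skewness of a central chi-square with k degrees of freedom."""
    return 2.0 ** 1.5 / k ** 0.5


def sample_skewness(values: list) -> float:
    """Moment-based sample skewness, m3 / m2^(3/2)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def simulate_chi2(k: int, reps: int, seed: int = 1) -> list:
    """Chi-square draws via the gamma distribution (shape k/2, scale 2)."""
    rng = random.Random(seed)
    return [rng.gammavariate(k / 2.0, 2.0) for _ in range(reps)]


if __name__ == "__main__":
    k = 9  # between-sire degrees of freedom for s = 10 sires
    draws = simulate_chi2(k, reps=50000)
    print("theoretical skewness:", round(chi2_skewness(k), 3))
    print("simulated skewness:  ", round(sample_skewness(draws), 3))
```

With few sires the between-sire mean square is far from symmetric, which is the reason given in the text for the poor performance of the normal-theory Taylor approximation in small designs.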
Sampling variances of genetic correlations:
The expressions for var($\hat{r}_g$) from REEVE, TALLIS, and ROBERTSON perform poorly for small population sizes and small heritabilities when the known population values are used to predict the sampling variance of the genetic correlation coefficient. This is because the estimates of the genetic correlation coefficient can become very large (positive or negative) when using least-squares methods. Using REML, the equations perform much better, although the empirical standard errors are generally larger than those predicted in Table 2. Substituting the estimates of the heritabilities and genetic correlations into Equation 15 can result in a very large estimate of the sampling variance of the genetic correlation when there is a real chance that the estimates of the heritabilities approach zero (the numerator of Equation 15 approaches 2/[n(s - 1)(n - 1)] for both intraclass correlations approaching zero, whereas the denominator goes to zero).
The relationship between estimated heritabilities and the empirical sampling variance of the genetic correlation was explored in Figure 1 for a powerful design. From these results we may conclude that (1) the predicted sampling variance accurately predicts the average observed sampling variance (0.0443), but not the observed sampling variance for given values of the achieved estimates of the heritabilities; (2) for a given value of the estimates of the heritabilities, the estimated sampling variance follows a very similar pattern to that of the observed sampling variance (as in ![]()
![]()
For geometric means of the heritability estimates > 0.07, the observed, estimated, and predicted sampling variance are virtually identical. For each of the geometric mean classes, the mean estimate of the genetic correlation was zero (results not shown), so that for a particular value of the estimated geometric mean of the heritabilities, the sampling variance of the genetic correlation coefficients reflects sampling from a population with the true values of the heritabilities equal to those estimated. This is in contrast with the previous example, in which the correlation between the estimated genetic correlation and geometric mean of the heritabilities was positive (+0.59). For smaller values of the estimated heritabilities, Figure 2 shows that the estimated sampling variance is much larger than either the predicted or observed sampling variance. The estimated sampling variance of the genetic correlation coefficient for a geometric mean of the heritability estimates of 0.01 was 9.9. It is not clear from these results why the average estimated sampling variance is closer to the observed sampling variance than the prediction using population values.
Bias in estimate of sampling variance:
It is usually assumed that by substituting the estimates of population parameters in the prediction equations for the sampling variance of the heritability or genetic correlation, unbiased estimates of those sampling variances are obtained. However, this is not generally the case for small sample sizes. Furthermore, when comparing the average sampling variation from simulation, it matters whether results are expressed in the average standard error (as in this study) or in the average sampling variance. This is best illustrated using the example of a half-sib design with large n and a small population value of t. Then
If expectations are taken over these estimates, then
Hence, for small t and large n, the estimate of the sampling variance can be severely biased upward for a small number of sires, whereas the estimate of the standard error is unbiased. It follows that the adjustment of the degrees of freedom suggested by ![]()
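The contrast between the two biases can be illustrated with a small simulation. The sketch below is not the article's derivation: it assumes that for large n and small t (with nt still large) the estimate behaves approximately as t̂ ≈ tX, where X is a chi-square variable with (s - 1) degrees of freedom divided by (s - 1), and that the sampling variance is approximately 2t²/(s - 1). Under those assumptions the estimated standard error is proportional to X and the estimated variance to X², so their relative biases are E(X) = 1 and E(X²) = (s + 1)/(s - 1). All names are hypothetical.

```python
import random


def expected_variance_ratio(s: int) -> float:
    """E(X^2) for X = chi-square(s - 1) / (s - 1): ratio of mean estimated to true variance."""
    return (s + 1) / (s - 1)


def simulated_ratios(s: int, reps: int = 200000, seed: int = 1) -> tuple:
    """Monte Carlo estimates of E(X) (standard error ratio) and E(X^2) (variance ratio)."""
    rng = random.Random(seed)
    k = s - 1
    sum_x = 0.0
    sum_x2 = 0.0
    for _ in range(reps):
        x = rng.gammavariate(k / 2.0, 2.0) / k  # chi-square(k) / k
        sum_x += x
        sum_x2 += x * x
    return sum_x / reps, sum_x2 / reps


if __name__ == "__main__":
    for s in (5, 10, 50):
        se_ratio, var_ratio = simulated_ratios(s)
        print(f"s={s:3d}  SE ratio={se_ratio:.3f}  variance ratio={var_ratio:.3f}"
              f"  (expected {expected_variance_ratio(s):.3f})")
```

Under these assumptions the upward bias in the estimated variance is 50% with 5 sires and shrinks toward zero as s grows, while the average estimated standard error stays on target.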
Conclusion:
For the design of experimental populations to estimate genetic parameters, the prediction of the sampling variance of heritabilities using ![]()
For small experiments, estimates of heritability are biased downward, and estimates of sampling variances are generally biased. Combining results from different experiments by weighting the heritability estimates by the inverse of their estimated sampling variances may result in a severely biased heritability estimate, because the smaller experiments tend to have estimates that are too low, and too much weight is given to these estimates if their sampling variances are biased downward too. A joint analysis of all data is to be preferred.
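The effect of inverse-variance weighting can be illustrated with a simplified simulation. The sketch below is an assumption-laden illustration and not the analysis in the article: all experiments have the same small size, the estimated sampling variance is obtained by substituting t̂ into the textbook first-order formula, and the function names and parameter values are hypothetical. It shows one mechanism behind the bias: low estimates receive small estimated variances and therefore large weights.

```python
import random


def simulate_experiment(rng: random.Random, t: float, s: int, n: int) -> tuple:
    """Return (t_hat, estimated variance) for one balanced half-sib experiment."""
    df_b, df_w = s - 1, s * (n - 1)
    # Mean squares as scaled chi-square draws, with phenotypic variance 1.
    b = (1.0 + (n - 1) * t) * rng.gammavariate(df_b / 2.0, 2.0) / df_b
    w = (1.0 - t) * rng.gammavariate(df_w / 2.0, 2.0) / df_w
    t_hat = (b - w) / (b + (n - 1) * w)
    # Estimated variance: t_hat substituted into the assumed first-order formula.
    var_hat = (2.0 * (1.0 - t_hat) ** 2 * (1.0 + (n - 1) * t_hat) ** 2
               / (n * (n - 1) * (s - 1)))
    return t_hat, var_hat


def pooled_estimates(t: float, s: int, n: int, n_experiments: int, seed: int = 1) -> tuple:
    """Return (simple mean, inverse-variance weighted mean) of t_hat over experiments."""
    rng = random.Random(seed)
    results = [simulate_experiment(rng, t, s, n) for _ in range(n_experiments)]
    simple = sum(t_hat for t_hat, _ in results) / n_experiments
    weights = [1.0 / var_hat for _, var_hat in results]
    weighted = sum(wt * t_hat for wt, (t_hat, _) in zip(weights, results)) / sum(weights)
    return simple, weighted


if __name__ == "__main__":
    simple, weighted = pooled_estimates(t=0.1, s=10, n=10, n_experiments=2000)
    print("true t:                        0.1")
    print("simple mean of estimates:     ", round(simple, 4))
    print("inverse-variance weighted mean:", round(weighted, 4))
```

In this setting the weighted mean falls well below both the simple mean and the true value, which is consistent with the recommendation to analyze all data jointly.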
The predicted sampling variance of the genetic correlation using ![]()
![]()
| ACKNOWLEDGMENTS |
|---|
I thank NAOMI WRAY and BILL HILL for helpful comments and KEN KOOTS and JOHN GIBSON for many constructive discussions and the sharing of additional simulation results.
Manuscript received February 3, 1997; Accepted for publication March 23, 1998.
| LITERATURE CITED |
|---|
CALVIN, J. A., 1993 REML estimation in unbalanced multivariate variance components models using an EM algorithm. Biometrics 49:691-701.
FALCONER, D. S., and T. F. C. MACKAY, 1996 Introduction to Quantitative Genetics. Longman, Harlow, UK.
FISHER, R. A., 1921 On the probable error of a coefficient of correlation deduced from a small sample. Metron 1:3-32.
HAYES, J. F., and W. G. HILL, 1981 Modification of estimates of parameters in the construction of genetic selection indices ("bending"). Biometrics 34:429-439.
KEMPTHORNE, O., 1957 An Introduction to Genetic Statistics. John Wiley & Sons, New York.
KOOTS, K. R., and J. P. GIBSON, 1996 Realized sampling variances of estimates of genetic parameters and the difference between genetic and phenotypic correlations. Genetics 143:1409-1416.
LANCASTER, H. O., 1969 The Chi-Squared Distribution. John Wiley & Sons, New York.
MEYER, K., and W. G. HILL, 1991 Approximation of sampling variances and confidence intervals for maximum likelihood estimates of variance components. J. Anim. Breed. Genet. 109:264-280.
OSBORNE, R., and W. S. B. PATERSON, 1952 On the sampling variance of heritability estimates derived from variance analyses. Proc. R. Soc. Edinb. Sect. B 64:456-461.
PATTERSON, H. D., and R. THOMPSON, 1971 Recovery of inter-block information when block sizes are unequal. Biometrika 58:545-554.
REEVE, E. C. R., 1955 The variance of the genetic correlation coefficient. Biometrics 11:357-374.
ROBERTSON, A., 1959a The sampling variance of the genetic correlation coefficient. Biometrics 15:469-485.
ROBERTSON, A., 1959b Experimental design in the evaluation of genetic parameters. Biometrics 15:219-226.
ROBERTSON, A., and I. M. LERNER, 1949 The heritability of all-or-none traits: viability of poultry. Genetics 34:395-411.
TALLIS, G. M., 1959 Sampling errors of genetic correlation coefficients calculated from analyses of variance and covariance. Aust. J. Stat. 1:35-43.
TAYLOR, S. C., 1976 Multibreed designs. 1. Variation between breeds. Anim. Prod. 23:133-144.
VAN VLECK, L. D., and C. R. HENDERSON, 1961 Empirical sampling estimates of genetic correlations. Biometrics 17:359-371.
VISSCHER, P. M., 1995 Bias in genetic R2 from halfsib designs. Genet. Sel. Evol. 27:335-345.
ZERBE, G. O., and D. E. GOLDGAR, 1980 Comparison of intraclass correlation coefficients with the ratio of two independent F-statistics. Commun. Stat. Theor. Meth. A 9:1641-1655.