## Abstract

The genetics and evolution of complex traits, including quantitative traits and disease, have been hotly debated ever since Darwin. A century ago, a paper from R.A. Fisher reconciled Mendelian and biometrical genetics in a landmark contribution that is now accepted as the main foundation stone of the field of quantitative genetics. Here, we give our perspective on Fisher’s 1918 paper in the context of how and why it is relevant in today’s genome era. We mostly focus on human trait variation, in part because Fisher did so too, but the conclusions are general and extend to other natural populations, and to populations undergoing artificial selection.

It has been a century since the landmark paper “The Correlation between Relatives on the Supposition of Mendelian Inheritance” by R.A. Fisher was published in the *Transactions of the Royal Society of Edinburgh* (Fisher 1918). Much has been written about the paper in the last 50 years in the context of the historical background (Provine 1971), relevant statistical theory, and the genetic models [*e.g.*, (Lynch and Walsh 1998)]. The paper is also frequently cited and discussed in the recent outstanding synthesis of our knowledge of the selection and evolution of quantitative traits (Walsh and Lynch 2018). Fisher also summarized it in a much more accessible article a year later (Fisher 1919).

After Mendel’s laws were rediscovered in 1900, there was a vigorous debate between the “Biometricians,” led by Pearson, and the “Mendelians,” led by Bateson. The Biometricians argued that the inheritance of continuous traits, such as human height, could not be explained by Mendelian principles. For instance, they argued that the “law of ancestral heredity,” which states that the mean trait value of offspring is better predicted the more knowledge of ancestral trait values one has, could not be reconciled with single genes with discrete, large effects, such as Mendel inferred for several traits in peas (Yule 1902; Provine 1971). The purpose of this Perspective is not to give a detailed historical account of that era (in addition to Provine’s book, there are a number of very good Wikipedia pages on the topic). However, a quotation from 1902, in which G. Udny Yule defends (Raphael) Weldon against criticism from William Bateson, illustrates the strength of feelings at that time: “Mr. Bateson devotes many words to these questions, but one cannot help feeling that his speculations would have had more value had he kept his emotions under better control; the style and method of the religious revivalist are ill-suited to scientific controversy. It is difficult to speak with patience either of the turgid and bombastic preface to ‘Mendel’s Principles,’ with its reference to Scribes and Pharisees, and its Carlylean inversions of sentence, or of the grossly and gratuitously offensive reply to Professor Weldon and the almost equally offensive adulation of Mr. Galton and Professor Pearson” (Yule 1902). So much for Victorian gentlemanly and scholarly behavior. Nevertheless, attempts to reconcile the observed results with Mendelian inheritance had been made before 1918 by Pearson, Yule, and Weinberg, but it was Fisher’s 1918 paper that formed the basis for quantitative genetics in the future.

## The 1918 Paper

In his paper, Fisher showed that this “pea *vs.* height debate” could be reconciled by postulating that multiple genes contribute to variation in the population, each of them obeying Mendelian rules and segregation ratios. Fisher’s 1918 paper is notoriously hard to read, not least because he introduced new concepts, new genetic models, and new statistical methods. It also contains a number of typos (Moran and Smith 1966). However, its purpose was clearly set out in the introductory paragraph, namely, to interpret empirical results from biometry, in particular the correlations between relatives, in accordance with Mendelian inheritance, to provide a more precise analysis of the causes of variation in human complex traits. Fisher built upon previous work by Yule and Pearson. The main findings can be readily summarized.

If there are many genes contributing to trait variation, then total (phenotypic) variance in a trait (such as height) can be partitioned into variance components due to genetic factors and environmental factors, and genetic variance itself can be partitioned into variance due to the average effects of alleles and variance due to dominance deviations. The average effect of alleles is the regression coefficient of the genotype mean on allelic dosage (0, 1, and 2), and dominance deviations are the residuals from this regression (Box 1). The now ubiquitous term “variance” was first introduced in this paper, as was the method of analysis of variance (ANOVA) (Charlesworth and Edwards 2018).

Knowledge of the genetic variance components for average effects across loci, dominance deviations, and epistatic interaction deviations is sufficient to predict the resemblances between relatives, without knowing anything about what we now call “genetic architecture” of a trait (the number of genes affecting the trait, and the joint distribution of allelic effects and frequencies). Conversely, the observed phenotypic correlation between relatives can be used to estimate genetic variance components.

Assortative mating changes the population genetic variance relative to a randomly mating population, and affects the correlation between relatives. Interestingly, much of Fisher’s 1918 paper was about how to model the effects of assortative mating on genetic variation and the resemblance between relatives, and Fisher commented later that for him this was the most difficult aspect of the theory (Fisher 1919). As shown by Fisher, and by Sewall Wright a few years later (Wright 1921), assortative mating creates a correlation between the effect of trait alleles at different loci. This changes the genetic variance in the population because, in addition to the variation contributed by individual trait loci (the genic variance), a covariance is induced. Specifically, for positive assortative mating (like-with-like), individuals carrying alleles with a positive effect on the trait at one locus tend to carry positive alleles at other loci as well. Thus for positive assortative mating, this covariance is positive and it can be nontrivial. The within-gamete correlation of the effect of trait loci is a form of linkage disequilibrium (strictly speaking gametic phase disequilibrium, since loci do not have to be genetically linked). When the number of loci affecting a trait is small, positive assortative mating also increases the variance at individual loci. However, for polygenic traits, the contribution of this increased variance is negligible compared to the contribution from covariances between loci (Lynch and Walsh 1998).
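The build-up of between-locus covariance under positive assortative mating can be illustrated numerically. The following is a minimal sketch, not Fisher's own derivation. It assumes an infinitesimal model (so the genic variance stays constant), a constant environmental variance, unlinked loci, and assortment acting directly on the phenotype with a fixed phenotypic correlation between mates. Under those assumptions, the additive variance in the next generation is half the current additive variance, inflated by the correlation between the breeding values of mates (mate correlation times heritability), plus half the genic variance from Mendelian segregation. The function name and parameter values are hypothetical.

```python
def additive_variance_under_assortment(genic_var, env_var, mate_corr, generations):
    """Iterate V_A(t+1) = 0.5 * V_A(t) * (1 + rho * h2(t)) + 0.5 * genic_var.

    Assumptions (not from the source text): infinitesimal model, constant
    environmental variance, assortment on the phenotype with correlation rho.
    """
    v_a = genic_var  # random mating start: no covariance between loci yet
    trajectory = [v_a]
    for _ in range(generations):
        h2 = v_a / (v_a + env_var)  # narrow-sense heritability this generation
        v_a = 0.5 * v_a * (1.0 + mate_corr * h2) + 0.5 * genic_var
        trajectory.append(v_a)
    return trajectory


trajectory = additive_variance_under_assortment(
    genic_var=0.5, env_var=0.5, mate_corr=0.25, generations=50
)
covariance_part = trajectory[-1] - trajectory[0]  # contribution from between-locus covariance
print(f"generation 0: {trajectory[0]:.3f}")
print(f"generation 50: {trajectory[-1]:.3f} (covariance part {covariance_part:.3f})")
```

With these illustrative values, the additive variance rises from the genic variance of 0.5 toward an equilibrium of about 0.577, and the whole increase comes from the covariance between loci rather than from any change at individual loci.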

### Box 1: Main Concepts with Relevance to GWAS

For a lucid and thorough description and derivation of single- and multi-locus models of quantitative traits, we refer to textbooks (Falconer and Mackay 1996; Lynch and Walsh 1998). Here, we provide the definitions and derivations from a single-locus model. Let AA, AT, and TT be the three genotypes at a single locus with two alleles segregating in the population, the frequency of T be *p*, and the mean phenotypes of those genotypes be μ−*a*, μ + *d*, and μ + *a*, respectively. Assuming that the genotypes are in Hardy–Weinberg equilibrium, their frequencies are (1−*p*)^{2}, 2*p*(1−*p*), and *p*^{2}, respectively. A linear regression of the genotype means, weighted by their frequency, on the number of T alleles (0, 1, or 2) has an intercept of μ−*a* + 2*p*^{2}*d* and a slope of α = *a* + (1−2*p*)*d*.

Fisher modeled and partitioned the variance generated by this model. The variance of the genotypes can be written as σ_{G}^{2} = 2*p*(1−*p*)α^{2} + (2*p*(1−*p*)*d*)^{2}. The first term is the variance in the genotypes that is explained by the regression on allele dosage (0, 1, or 2) and is what we now call the additive genetic variance. The second term is the residual variance from that regression and is termed dominance variance. Hence, the genetic variance can be partitioned as σ_{G}^{2} = σ_{A}^{2} + σ_{D}^{2}.
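The single-locus results above can be checked numerically. The sketch below carries out the frequency-weighted regression of genotype means on T-allele dosage directly, then compares the slope, intercept, and variance components with the closed-form expressions in this box. The function name and the values chosen for *p*, *a*, and *d* are illustrative only.

```python
def single_locus_partition(p, a, d, mu=0.0):
    """Weighted regression of genotype means on T-allele count under Hardy-Weinberg."""
    q = 1.0 - p
    freqs = [q * q, 2.0 * p * q, p * p]   # genotypes AA, AT, TT
    dosage = [0, 1, 2]                    # number of T alleles
    means = [mu - a, mu + d, mu + a]      # genotype means

    mean_x = sum(f * x for f, x in zip(freqs, dosage))
    mean_g = sum(f * g for f, g in zip(freqs, means))
    var_x = sum(f * (x - mean_x) ** 2 for f, x in zip(freqs, dosage))
    cov_xg = sum(f * (x - mean_x) * (g - mean_g)
                 for f, x, g in zip(freqs, dosage, means))

    slope = cov_xg / var_x                # average effect (alpha)
    intercept = mean_g - slope * mean_x
    var_g = sum(f * (g - mean_g) ** 2 for f, g in zip(freqs, means))
    var_a = slope ** 2 * var_x            # variance explained by the regression
    var_d = var_g - var_a                 # residual (dominance) variance
    return {"slope": slope, "intercept": intercept,
            "var_g": var_g, "var_a": var_a, "var_d": var_d}


p, a, d = 0.3, 1.0, 0.5
fit = single_locus_partition(p, a, d)
print(fit["slope"], a + (1 - 2 * p) * d)              # regression slope vs alpha
print(fit["intercept"], -a + 2 * p ** 2 * d)          # intercept vs closed form (mu = 0)
print(fit["var_a"], 2 * p * (1 - p) * fit["slope"] ** 2)
print(fit["var_d"], (2 * p * (1 - p) * d) ** 2)
```

Each printed pair should agree up to floating-point rounding, which confirms that the additive and dominance variances are simply the explained and residual parts of one regression.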

In GWAS, the individual phenotype *y*_{i} is regressed on *x*_{i}, the number of minor (or major) alleles at a locus. Including an error term, *y*_{i} = μ + β*x*_{i} + *e*_{i}. The regression slope β is the average effect, hence β = α, and the residual variance includes additive genetic variation due to other loci in the genome, dominance variance at the locus under consideration and elsewhere in the genome, and environmental variance. In GWAS, the statistical power to detect a variant depends on the variance it generates, which is β^{2}σ_{x}^{2} = 2*p*(1−*p*)[*a* + (1 − 2*p*)*d*]^{2}. Therefore, statistical power depends on how common an allele is and its average effect, which includes a dominance term.
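A small simulation can illustrate that the GWAS regression slope estimates the average effect even when dominance is present. This is a sketch under simple assumptions that are not part of the original text: a single locus in Hardy–Weinberg equilibrium, normally distributed residuals, and ordinary least squares computed by hand. The function name, sample size, residual SD, and seed are hypothetical choices.

```python
import random


def simulate_gwas_slope(p, a, d, n=100_000, resid_sd=2.0, seed=1):
    """Regress simulated phenotypes on T-allele dosage and return the OLS slope."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        # Two independent allele draws give Hardy-Weinberg genotype frequencies.
        x = (rng.random() < p) + (rng.random() < p)
        g = (-a, d, a)[x]                  # genotype means for AA, AT, TT (mu = 0)
        xs.append(x)
        ys.append(g + rng.gauss(0.0, resid_sd))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


p, a, d = 0.3, 1.0, 0.5
alpha = a + (1 - 2 * p) * d                       # average effect from Box 1
beta_hat = simulate_gwas_slope(p, a, d)
variance_explained = 2 * p * (1 - p) * alpha ** 2  # beta^2 * var(x)
print(f"alpha = {alpha:.3f}, estimated beta = {beta_hat:.3f}")
print(f"variance generated by the locus = {variance_explained:.4f}")
```

The estimated slope should land close to α rather than to *a*, because the regression absorbs part of the dominance effect in proportion to (1 − 2*p*).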

Fisher showed that, ignoring epistasis, the genetic covariance between relatives depends on only the additive genetic and dominance variance components, the expected proportion of the genome that they share identical-by-descent (IBD), and the expected proportion of the genome where relatives share both alleles IBD.
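In the standard textbook form, and ignoring epistasis, this result is written as a genetic covariance of *r*σ_{A}^{2} + *u*σ_{D}^{2}, where *r* is the expected proportion of the genome shared IBD and *u* is the expected proportion where both alleles are shared IBD. The symbols *r* and *u*, the function name, and the variance values below are labels chosen for this sketch. The sharing coefficients are the usual values for non-inbred relatives.

```python
# (r, u): expected proportion of the genome shared IBD, and expected proportion
# where both alleles are shared IBD, for non-inbred relatives.
IBD_SHARING = {
    "parent-offspring": (0.5, 0.0),
    "full sibs": (0.5, 0.25),
    "half sibs": (0.25, 0.0),
    "monozygotic twins": (1.0, 1.0),
}


def genetic_covariance(relationship, var_a, var_d):
    """Genetic covariance between relatives, ignoring epistasis."""
    r, u = IBD_SHARING[relationship]
    return r * var_a + u * var_d


var_a, var_d = 0.6, 0.2  # illustrative variance components
covariances = {rel: genetic_covariance(rel, var_a, var_d) for rel in IBD_SHARING}
for rel, cov in covariances.items():
    print(f"{rel}: {cov:.3f}")
```

Full sibs exceed parent and offspring only through the dominance term, which is why a higher sib correlation is compatible with dominance variance but can also reflect a shared environment.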

The theory built in Fisher’s 1918 paper has proved to be extremely useful. It correctly predicts the consequences, in the short-term at least, of artificial or natural selection. A strength of the theory is that these predictions rely on estimable parameters, such as the “heritability” (the ratio of genetic variance to total variance), but not on the details of the individual loci affecting the trait. Although the term heritability was not used by Fisher, he defined variance components due to “essential genotypes” and “genotypes,” and the ratios of these variances to total phenotypic variance are what we now call narrow- and broad-sense heritability, respectively. If the number of loci influencing a quantitative trait in the model is increased toward infinity, each locus having an infinitesimally small effect, then the distribution of genetic values approaches a normal distribution, as is observed for many traits. However, only a modest number of loci is needed for the distribution to become close to normal, so that the resulting theoretical distributions are, in practice, indistinguishable from those observed. Consequently, it is impossible to determine the genetic architecture underlying quantitative traits by studying the resemblance between relatives. This limitation ended with the availability of genomic tools such as high-throughput single-nucleotide polymorphism (SNP) genotyping or sequencing.
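The claim that a modest number of loci already yields a near-normal distribution of genetic values can be checked with an exact calculation. The sketch below makes deliberately simple assumptions that go beyond the text: loci are independent, purely additive, and share the same effect size and allele frequency, so that the genetic value is proportional to a binomial count of trait-increasing alleles. It reports skewness and excess kurtosis, both of which are zero for a normal distribution. The function name is hypothetical.

```python
from math import comb


def genetic_value_shape(n_loci, p):
    """Skewness and excess kurtosis of the count of increasing alleles over n_loci diploid loci."""
    m = 2 * n_loci  # number of allele copies per individual
    pmf = [comb(m, k) * p ** k * (1 - p) ** (m - k) for k in range(m + 1)]
    mean = sum(k * w for k, w in enumerate(pmf))
    var = sum((k - mean) ** 2 * w for k, w in enumerate(pmf))
    skew = sum((k - mean) ** 3 * w for k, w in enumerate(pmf)) / var ** 1.5
    excess_kurtosis = sum((k - mean) ** 4 * w for k, w in enumerate(pmf)) / var ** 2 - 3.0
    return skew, excess_kurtosis


shapes = {n: genetic_value_shape(n, p=0.5) for n in (1, 10, 50)}
for n, (skew, kurt) in shapes.items():
    print(f"{n:3d} loci: skewness = {skew:+.4f}, excess kurtosis = {kurt:+.4f}")
```

Under these assumptions the excess kurtosis shrinks in proportion to the number of loci, so a few dozen loci are already hard to tell apart from the infinitesimal limit on the basis of the trait distribution alone.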

## Fisher 1918 and the Genome Era

Since 2007, there has been a remarkable pace of discovery in complex trait genetics in human populations and also in other species. Discoveries have been facilitated by advances in genomic technologies, in particular the ability to cheaply genotype hundreds of thousands of common SNP markers. The availability of SNP arrays allowed the experimental design of genome-wide association studies (GWAS), in which individual SNPs are tested for statistical association with one or more complex traits. A decade later, tens of thousands of robust SNP–trait associations have been reported for human traits and diseases [see, for example, Visscher *et al.* (2017)]. Fisher’s 1918 theory has survived empirical observations from GWAS with flying colors. The assumptions underlying Fisher’s models can now be tested empirically and the parameters in his models can now be estimated directly from genomic data. Despite the continual misunderstanding of Fisher’s definition of (what we now call) the average effect of allele substitution and of additive genetic variation (see below), the standard way to analyze data from GWAS is by performing a linear regression of the trait on SNP dosage (a count of 0, 1, or 2 of one of the two alleles at a locus). This is exactly the regression model that underlies Fisher’s derivation of additive genetic variation in 1918 (Box 1). Hence, the effect sizes that are reported from GWAS should be interpreted as average effects.

GWAS have shown that Fisher’s assumptions about multiple loci affecting a trait (*i.e.*, polygenicity) and the resulting additive genetic variation were well justified. First, it has been shown that there is genetic variation for nearly any trait that varies in a population, and that polygenicity is the norm for such traits (Visscher *et al.* 2012). Indeed, it is remarkable just how polygenic traits are. For example, the latest publication on human height reports > 3000 loci that are statistically significantly associated with the trait, although these loci together still only explain about one-third of additive genetic variation (Yengo *et al.* 2018b). Polygenicity is not restricted to traits like human height. Interestingly, many common diseases are also polygenic. For example, a randomly chosen 1-Mb region of the human genome contains at least one locus contributing to variation in liability to schizophrenia (Loh *et al.* 2015). It is also clear that traits with a major causative gene are also affected by many genes of small effect that contribute to individual differences in the population. Even Mendelian traits can show such polygenicity. For example, the age of onset of the single-gene disorder Huntington’s Disease shows polygenic effects (Lee *et al.* 2015).

Fisher modeled variation due to average effect and dominance deviations, and also covered what he called epistacy. He reasoned that higher-order epistasis would contribute little nonadditive genetic variance, though this was shown theoretically only very recently (Mäki-Tanila and Hill 2014), but considered dominance variance possible. Indeed, his conclusion from the analysis of human height was that the ratio of dominance to additive variance was about one-third, which we would now consider to be a rather large estimate of dominance variance. It is interesting to speculate about why Fisher did not question this inference on dominance variance, although he noted the large SE of the estimates of the variance components. Perhaps the rediscovery of Mendel’s segregation ratios, with all factors showing dominance, led to an assumption that polygenic traits would show substantial dominance variance. In addition, earlier results on human traits (mainly stature) showed a somewhat larger correlation between full siblings than between parent and offspring, consistent with dominance variance (Pearson and Lee 1903), but also consistent with sibs sharing a common environment. Empirical evidence from many traits across multiple species has since shown that, although dominance and epistasis occur, most of the genetic variation is additive (Hill *et al.* 2008). This empirical evidence spans selection experiments, estimation of variance components from pedigrees, and GWAS across a range of species [*e.g.*, (Bloom *et al.* 2015)].

Complex disease phenotypes are often recorded as “affected” or “unaffected.” Data of this form are commonly analyzed by an extension of Fisher’s model that assumes an underlying scale of liability to the disease showing continuous variation, similar to that of quantitative traits (Wright 1920; Falconer 1965; Edwards 1969). The liability model is flexible and can include dominance and epistatic interactions. At one extreme is a model with purely additive effects on the scale of liability, and at the other extreme a model where each case of disease is caused by a single recessive mutation. Although the latter type of inheritance can occur, most of the genetic variance for complex diseases is explained by a model closer to the additive model with a large number of loci.
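The purely additive end of the liability model can be sketched in a few lines. The code below assumes, for illustration only, that total liability is standard normal, that genetic liability has variance equal to the heritability of liability, and that an individual is affected when liability exceeds a threshold set by the population prevalence. The function names and the prevalence and heritability values are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # total liability assumed to have mean 0, variance 1


def liability_threshold(prevalence):
    """Threshold on the liability scale above which individuals are affected."""
    return STANDARD_NORMAL.inv_cdf(1.0 - prevalence)


def disease_risk(genetic_liability, prevalence, h2):
    """Probability of disease given an individual's genetic liability."""
    threshold = liability_threshold(prevalence)
    residual_sd = (1.0 - h2) ** 0.5  # non-genetic part of liability
    return 1.0 - STANDARD_NORMAL.cdf((threshold - genetic_liability) / residual_sd)


prevalence, h2 = 0.01, 0.5
threshold = liability_threshold(prevalence)
risk_average = disease_risk(0.0, prevalence, h2)
risk_high = disease_risk(2.0 * h2 ** 0.5, prevalence, h2)  # two genetic SDs above the mean
print(f"threshold = {threshold:.3f}")
print(f"risk at average genetic liability = {risk_average:.5f}")
print(f"risk at +2 genetic SD = {risk_high:.5f}")
```

Risk rises continuously with genetic liability, so a binary diagnosis is compatible with many loci of small effect acting additively on the underlying scale.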

In human populations, there is strong evidence for assortative mating, which cannot be explained by shared ancestry between mates (Tenesa *et al.* 2016; Robinson *et al.* 2017). Remarkably, the phenotypic correlation between spouse pairs for human height has changed little over the last century (and is ∼0.2−0.3). If phenotypes are indeed important in mate choice, assortative mating should generate a correlation between the genetic values of mates. Recently, with data from GWAS, this expectation has been confirmed in that the (estimated) breeding values of mates have been found to be correlated (Tenesa *et al.* 2016; Robinson *et al.* 2017). The traits in human populations that show the strongest effects of assortative mating are height and traits that are genetically correlated with intelligence (*e.g.*, educational attainment). Positive assortative mating on liability to disease, as suggested by observational data for liability to psychiatric disorders (Nordsletten *et al.* 2016), increases the disease’s prevalence in the population relative to a randomly mating population, assuming a liability threshold model [*e.g.*, (Peyrot *et al.* 2016)]. The linkage disequilibrium at trait loci generated by assortative mating can now be estimated directly from a sample of genomes in the population by exploiting information on the effect sizes at trait-associated loci, which can be estimated from large GWAS (Yengo *et al.* 2018a).

## Persistent Misunderstandings

Despite the fact that the theory has stood the test of time, a surprising number of fundamental misunderstandings persist about assumptions, models, and implications. For example, Fisher’s model implies that both genetic and environmental factors contribute to variation in each trait, yet one is still asked if a certain trait is “genetic or environmental.” A more subtle error is to assume that individual cases of a complex disease are due either to genetic or to environmental causes; in human disease epidemiology, attempts are made to summarize the proportion of cases into mutually distinct categories, using, for example, a pie-chart that includes the category “genetic.” In his 1919 paper, Fisher commented that even a high heritability of 0.95 (his estimate for height) is consistent with large effects of environmental factors (Fisher 1919). Subsequent research has shown that the heritability of human height is ∼0.8. Even with this large value, if the phenotypic SD of human height is 7 cm, then the SD of the environmental component is 7√(1−0.8) = 3.1 cm. Thus, the high heritability is not incompatible with the large increase over time in human height due to improvements in the environment, such as better nutrition. Nor does a high heritability of human traits and diseases imply that nongenetic interventions will not work.
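The arithmetic behind the environmental SD is short enough to write out. The snippet assumes, as the calculation in the text does, that all variance not counted in the heritability is environmental. The function name is hypothetical.

```python
from math import sqrt


def environmental_sd(phenotypic_sd, heritability):
    """SD of the environmental component when genetic variance is heritability * total."""
    return phenotypic_sd * sqrt(1.0 - heritability)


sd_h2_080 = environmental_sd(7.0, 0.80)  # height, heritability of about 0.8
sd_h2_095 = environmental_sd(7.0, 0.95)  # Fisher's estimate of 0.95
print(f"h2 = 0.80: environmental SD = {sd_h2_080:.1f} cm")
print(f"h2 = 0.95: environmental SD = {sd_h2_095:.1f} cm")
```

Because the SD scales with the square root of the environmental share of variance, even a heritability of 0.95 leaves an environmental SD of roughly 1.6 cm.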

The definition of the average effect at a locus is α = *a* + *d*(1 − 2*p*), where 2*a* is the difference between the genotypic values of the homozygotes, *d* is the deviation of the heterozygote’s genotypic value from the homozygote midpoint, and *p* and (1 − *p*) are the frequencies of the two alleles at the locus (Box 1). As noted above, this is the quantity being estimated in GWAS. The average effect depends on the interaction (*d*) between the two alleles and on the allele frequencies (*p*), and so does its variance, the additive genetic variance (or variance of breeding values). The definition of the average effect shows why it is incorrect to say that it assumes the absence of dominance (a frequent misunderstanding) and why, if dominance deviations exist, the effect sizes of loci associated with a trait are not expected to be the same across populations that differ appreciably in allele frequency.
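The dependence of the average effect on allele frequency is easy to see by evaluating the definition in populations that share the same *a* and *d* but differ in *p*. The values below are illustrative, and the function names are hypothetical.

```python
def average_effect(a, d, p):
    """Average effect of allele substitution: alpha = a + d * (1 - 2p)."""
    return a + d * (1.0 - 2.0 * p)


def additive_variance(a, d, p):
    """Additive genetic variance contributed by the locus: 2p(1-p) * alpha^2."""
    return 2.0 * p * (1.0 - p) * average_effect(a, d, p) ** 2


a, d = 1.0, 0.5
alphas = {p: average_effect(a, d, p) for p in (0.1, 0.5, 0.9)}
for p, alpha in alphas.items():
    print(f"p = {p}: alpha = {alpha:.2f}, additive variance = {additive_variance(a, d, p):.4f}")

# Without dominance the average effect no longer depends on allele frequency.
alphas_no_dominance = {p: average_effect(a, 0.0, p) for p in (0.1, 0.5, 0.9)}
```

With *d* = 0.5 the same locus has an average effect of 1.4 at *p* = 0.1 but 0.6 at *p* = 0.9, whereas with *d* = 0 it is 1.0 at every frequency.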

Mendel’s rules imply that a parent does not pass on the same combination of alleles to all its offspring. Similarly, by using regression theory, Fisher showed that the regression of offspring phenotype on parental phenotype contains a residual genetic term, which is the deviation of the offspring’s breeding value from the parental average. The resulting segregation variance is essential for natural or artificial selection to work, and understanding it solved the problem of “blending inheritance,” which implies a depletion of genetic variance over generations (Jenkin 1867). However, it is still widely unappreciated (at least among human geneticists) that 50% of additive genetic variation in the population is segregation variance within families. For instance, even if the heritability of liability to a rare disease is high, the majority of cases have no family members suffering from the same disease (Smith 1970; Yang *et al.* 2010).
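The statement that half of the additive genetic variance is segregation variance within families can be illustrated by simulating Mendelian sampling directly. The sketch below assumes random mating, unlinked and purely additive loci with equal effects, and two full sibs per family. None of these specifics come from the text, and the function name, numbers of families and loci, and seed are hypothetical.

```python
import random


def simulate_full_sib_variances(n_families=4000, n_loci=20, p=0.5, a=1.0, seed=7):
    """Return (within-family variance, total variance) of offspring breeding values."""
    rng = random.Random(seed)

    def random_parent():
        # Each locus holds two alleles; True marks the trait-increasing allele.
        return [(rng.random() < p, rng.random() < p) for _ in range(n_loci)]

    def offspring_value(mother, father):
        # Mendelian segregation: one randomly chosen allele per parent per locus.
        return a * sum(rng.choice(m) + rng.choice(f) for m, f in zip(mother, father))

    within_sum = 0.0
    values = []
    for _ in range(n_families):
        mother, father = random_parent(), random_parent()
        sib1 = offspring_value(mother, father)
        sib2 = offspring_value(mother, father)
        within_sum += (sib1 - sib2) ** 2 / 2.0  # unbiased within-pair variance
        values.extend((sib1, sib2))
    within_var = within_sum / n_families
    mean = sum(values) / len(values)
    total_var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return within_var, total_var


within_var, total_var = simulate_full_sib_variances()
share_within = within_var / total_var
print(f"within-family variance = {within_var:.2f}, total = {total_var:.2f}")
print(f"share within families = {share_within:.2f}")
```

The within-family share should come out near one-half, so siblings differ substantially in breeding value even though they share the same parents.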

A century after Fisher 1918, it is still common to read in the popular press that “Dr. X has discovered the gene for trait Y,” when we know or expect that there are thousands of genes contributing to the variation in each complex trait. An unappreciated corollary of this is that each case of type 2 diabetes or schizophrenia is due to a different combination of alleles, and that it is the cumulative effect of risk variants in an individual that determines their liability to disease.

Fisher’s models assume a large number of variants segregating in the population that, together, account for the standing genetic variation in the population. Research shows that there are polymorphisms affecting almost any trait one cares to study and that the frequency of rare alleles is only a little more than expected under neutrality. It is inconceivable that polymorphisms affecting height in humans or milk yield in cows are neutral in the sense that they have no fitness consequences. These polymorphisms continue to segregate despite natural selection, presumably because their effects are so small and/or because neither allele is consistently more fit than the other (Simons *et al.* 2018; Walsh and Lynch 2018). If polymorphisms affecting complex traits were neutral, genetic variance would build up until a mutation-drift equilibrium was reached. In large populations, this would create unrealistically large heritability. Therefore, observed heritability values suggest that selection operates mainly to eliminate variation, not to maintain it, consistent with many studies of DNA sequence evolution. Recent results from GWAS of human traits also provide evidence for widespread selection acting to reduce diversity (Gazal *et al.* 2017; Simons *et al.* 2018).

Intense artificial selection, for instance in maize and poultry, has led to enormous changes in phenotypes, which continue approximately linearly over time (Walsh and Lynch 2018). If there were only a few important loci, then these should have reached fixation. On the other hand, if response to selection were due to new mutations of large effect, we should observe bursts of selection response. The observed linear response can most readily be explained by small allele frequency changes at many segregating loci. Therefore, empirical data from selection experiments are consistent with the idea that adaptation in natural populations is largely due to small allele frequency changes at many loci (Pritchard *et al.* 2010). By contrast, some methods to detect signatures of selection assume that selection operates by increasing the frequency of a new mutation (Walsh and Lynch 2018). This does occur but, more commonly, selection experiments show a response typical of small changes in allele frequency.

## Conclusion

For 100 years, Fisher (1918) has been the basis for our understanding of the inheritance of quantitative or complex traits, and the resemblance between relatives. In the last 10 years, the tools of genomics have allowed us to see the details of genes that underlie this variation and the emerging empirical data are fully consistent with the assumptions underlying quantitative genetics theory. The common misconceptions we outline here should disappear when genetic variation for complex traits is fully dissected using modern genomic data and analysis tools.

## Acknowledgments

We thank Naomi Wray and Jian Yang for comments, and all three referees plus two editors for helpful suggestions. P.M.V. acknowledges that his wife’s sister’s mother-in-law was Margaret E. Wallace, who held the Assistant in Research post at the Department of Genetics at the University of Cambridge and obtained her Ph.D. there with R.A. Fisher as advisor. P.M.V. is supported by the Australian National Health and Medical Research Council, and the Australian Research Council.

## Footnotes

*Communicating editor: A. S. Wilkins*

- Received September 11, 2018.
- Accepted December 26, 2018.

- Copyright © 2019 by the Genetics Society of America