Tilting at Quixotic Trait Loci (QTL): An Evolutionary Perspective on Genetic Causation
Kenneth M. Weiss


Recent years have seen great advances in generating and analyzing data to identify the genetic architecture of biological traits. Human disease has understandably received intense research focus, and the genes responsible for most Mendelian diseases have successfully been identified. However, the same advances have shown a consistent if less satisfying pattern, in which complex traits are affected by variation in large numbers of genes, most of which have individually minor or statistically elusive effects, leaving the bulk of genetic etiology unaccounted for. This pattern applies to diverse and unrelated traits, not just disease, in basically all species, and is consistent with evolutionary expectations, raising challenging questions about the best way to approach and understand biological complexity.

The past 25 years have seen an outpouring of new knowledge in genetics on a scale unprecedented in the history of any science. For important societal reasons the heaviest research investment has been in the genetics of human disease, but there has been comparable progress in understanding normal and abnormal traits in humans and many other species. Numerous approaches, which I generically refer to as “mapping,” have been developed to find statistical association between phenotypes and genotypes. They include searching variation in known candidate genes, genomewide linkage studies in samples of relatives, and genomewide association studies in population samples such as comparing cases and controls (e.g., Terwilliger and Goring 2000; Mackay 2001; Rao and Province 2001; Georges 2007; Rao 2008).

The objective of mapping is reductionistic: to dissect biological traits into enumerable genotypes with estimable effects. Complexity is not a precise concept, but generally means that many genes as well as environmental factors produce a trait, with different combinations of these factors accounting for its variation. Causation is often expressed as probabilistic risk or penetrance, the probability that someone with a given genotype will manifest a particular trait. Whether risk is probabilistic because of the nature of sampling, unmeasured heterogeneity, or because of inherently probabilistic processes is usually not known. Causation takes two faces: to describe the basis of variation of the trait in populations and to identify the origin of the trait's value in a specific individual. These are philosophically related, but different in practical terms.

Complex phenotypes can usually be viewed in quantitative terms. A trait may be defined quantitatively, like blood pressure, or may be viewed as the qualitative outcome of underlying quantitative risk factors crossing some threshold, as hypertension relative to blood pressure. The quantitative effect may pertain to onset age, severity, or the probability of a stochastic event, such as stroke as a function of blood pressure.

For decades in the history of modern genetics there were few systematic ways to go beyond segregation analysis, a statistical method for testing whether trait variation that clusters in families is consistent with the inherently probabilistic process of Mendelian inheritance. Only in fortuitous exceptions could a specific protein or chromosome anomaly be associated with a disease. Laborious mapping based on recombination among Mendelian traits was possible in experimental plants or animals, but genes and even their number remained largely unidentified until surprisingly recently.

A chromosomal region or gene identified by statistical association can for purposes here be generically referred to as a quantitative trait locus (QTL). The major breakthrough was the advent, just a generation ago, of systematic genomewide mapping techniques that, quite remarkably, could identify QTL without our having to know the biological nature of a trait so long as it could be defined and measured, an advance properly characterized as “a new horizon in human genetics” (Botstein et al. 1980). This is often described as hypothesis-free science, which seems oxymoronic because the scientific method is about testing hypotheses; in fact, mapping does essentially hypothesize genetic causation somewhere in the genome and the objective is to find it. Modern mapping was initially based on RFLP markers, soon supplemented by short tandem repeat (e.g., microsatellite) markers and recently by high-density SNP genotyping.

A long parade of successes quickly followed the availability of genomewide markers. The classic Mendelian pediatric diseases were mapped, led by bellwethers including phenylketonuria (PKU), Duchenne muscular dystrophy, Huntington's disease, retinoblastoma, and cystic fibrosis. The responsible genes were then studied in detail, stuffing Mendelian Inheritance in Man beyond printability, forcing it online as OMIM (http://www.ncbi.nlm.nih.gov/sites/entrez?db=omim).

Meanwhile, it has long been recognized that the common chronic diseases that predominate in industrial populations are causally complex. Overall, they do not segregate in families but phenotypes are correlated among relatives, suggesting genetic involvement. Intriguingly, there is usually a subset of families in which cases do seem to segregate as if due to a single gene. So, it was natural to ask if such genes could be found by the same mapping methods—and the answer was “Yes.”

The first dramatic successes included the identification of BRCA1, which conferred high risk of breast and ovarian cancer in large multiply affected families (Hall et al. 1990; Miki et al. 1994). The subsequent hit parade included genes with major effect on colorectal cancer, Alzheimer's disease, hypercholesterolemia, hemochromatosis, and adult lactase production—just to name a few of the earlier findings. Some pharmacogenetic success has also been achieved (Goldstein 2008; Mallal et al. 2008).


These findings affirmed the extension of Mendelian concepts to complex traits, and there has been no looking back. However, poster-child genes do not tell the whole story. Just a few keywords (linkage, mapping, SNP, genomewide association) identified 6866 articles in the PubMed database published in 2007 alone. There has been a comparable burgeoning of online databases; summary, overview, and perspective articles (including this one); new journals; and a sense of urgent competitiveness, with accompanying promotion by journals (with new Errata sections), investigators, companies, and the public media.

A consistent picture has emerged, shown schematically in Figure 1 (Altmuller et al. 2001; Bowcock 2007; Khoury et al. 2007a,b; Bodmer and Bonilla 2008; Goldstein 2008; Janssens et al. 2008). The poster-child genes explain only a fraction of variation in a trait or a disease. For almost every tested trait, mapping has identified numerous additional QTL, with lesser or more problematic effects and scattered on many chromosomes (Figure 2).

Figure 1.—

Schematic of the general relationship between number of contributing alleles or loci, their individual effect size, and the consequent degree of complexity of the resulting biological trait. Modified after Sing et al. (1996).

Figure 2.—

Heuristic examples of genomic mapping hits. The specifics are not important here. (A) A composite of studies of autism (Abrahams and Geschwind 2008). (B) A few human chromosomes showing obesity-associated hits mapped in mouse or human; the rest of the genome is similarly littered (Rankinen et al. 2006; http://obesitygene.pbrc.edu). (The figure is part of Figure 1 of Rankinen et al. 2006 and is adapted by permission from Macmillan Publishers, Ltd: Obesity. Rankinen et al. Copyright 2006.)

A tiny sampler from this smorgasbord with selected representative references is given in Table 1. We can add to this feast a chutney of recent reports on various complex diseases (Sjoblom et al. 2006; Benjamin et al. 2007; Wellcome Trust Case–Control Consortium 2007; Chen et al. 2008; Emilsson et al. 2008; Manolio et al. 2008).

A feast of findings of genes affecting complex traits in one way or another

One could easily flesh this list out from A to Z, but it is perhaps enough to say that every gene must in some sense be a “disease” gene, because if it has no ill effect when mutated, it will eventually become a pseudogene. More data than you could ever want can easily be found on almost any trait by searching Wikipedia, OMIM, GeneCards (http://www.genecards.org/), or the Human Gene Mutation Database (HGMD) (http://www.hgmd.cf.ac.uk/ac), among others. Estimates of the number of genes affecting complex traits in populations—or even in simple mouse crosses—range from the tens to hundreds or even thousands (Sjoblom et al. 2006; Wang et al. 2006; Chen et al. 2008; Reed et al. 2008).

This consistently observed pattern applies to traits involving almost every conceivable body part or process, a fact with theoretical, empirical, and epistemological implications. Most information gleaned from mapping consists of numerous small to very small individual effects. The most convincing genes have been found in more than one study, but usually confer relative risks on the order of 1.1–1.3 (Figure 3), which for most complex diseases means small average absolute risk effects (Altmuller et al. 2001; Bodmer and Bonilla 2008; Hunter et al. 2008b). The alleles at these genes usually have both low detectance and low penetrance; that is, the underlying genotype cannot be accurately predicted from the phenotype, and the genotype confers little power to predict the phenotype. Alleles with higher relative risks are more likely to be replicated but are as a rule too rare in the population to have cost effectiveness for variant-targeted therapies.
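
The arithmetic behind these statements can be made concrete with a minimal sketch that converts a relative risk into penetrance, detectance, and overall population risk for a single binary risk genotype. The function name and the illustrative inputs (a 5% baseline risk, a relative risk of 1.2, and a 30% carrier frequency) are assumptions chosen for illustration, not values taken from any of the cited studies.

```python
def genotype_risks(baseline_risk, relative_risk, carrier_freq):
    """Return (penetrance, detectance, population_risk) for a binary risk genotype.

    baseline_risk: P(disease | non-carrier)
    relative_risk: P(disease | carrier) / P(disease | non-carrier)
    carrier_freq:  P(carrier) in the population
    """
    # Penetrance: probability of disease given the risk genotype.
    penetrance = baseline_risk * relative_risk
    # Overall risk is the frequency-weighted average over both genotype classes.
    population_risk = (carrier_freq * penetrance
                       + (1 - carrier_freq) * baseline_risk)
    # Detectance: probability of the risk genotype given disease (Bayes' rule).
    detectance = carrier_freq * penetrance / population_risk
    return penetrance, detectance, population_risk


# Hypothetical inputs, for illustration only.
penetrance, detectance, population_risk = genotype_risks(
    baseline_risk=0.05, relative_risk=1.2, carrier_freq=0.30
)
```

Under these assumed inputs the risk genotype raises absolute risk only from 0.05 to 0.06, and the probability that an affected person carries it is about 0.34, barely above the 0.30 carrier frequency in the population as a whole. Genotype says little about phenotype, and phenotype says little about genotype.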

Figure 3.—

Typical odds ratios for rare (0.1–3% minor allele frequency, MAF) and common (>5% MAF) variants in genomewide association studies (from Bodmer and Bonilla 2008) (The figure is Figure 2 from Bodmer and Bonilla 2008 and is reprinted by permission from Macmillan Publishers, Ltd: Nature Genetics. Bodmer and Bonilla. Copyright 2008.)

Statistically, most hits by far have been marginally or only suggestively significant or have not been replicated. Replication is challenging (Chanock et al. 2007; McPherson et al. 2007). This is why Figure 2 is only heuristic: the examples are as of their publication date, subsequent studies always differ, and by no means are all the hits statistically reliable. There is no such thing as “the” true genome map for a complex trait. Even replicated hits are not detected in all studies. In most cases, in both humans and experimental species, the QTL are chromosome regions, often well over 1 Mb long, and may contain tens or hundreds of genes, with no specific gene in the interval statistically implicated (as yet). Replicable QTL usually account for only a fraction of the genetic risk, as estimated by heritability or familial correlation, and a correspondingly smaller fraction of the overall risk that includes environmental effects.

Population isolates such as Finland, Iceland, Sardinia, the Quebecois, or American Hutterites or Amish have been popular sampling frames, on the grounds that due to isolation or founder effect, they will have less variation to sort through. This has helped identify some genes, by reducing background etiological litter, although isolates usually turn out to be less homogeneous than had been thought—the founding bottlenecks simply were not that severe. Indeed, causation seems comparably complicated even in digenomic crosses between just two strains of inbred mice!

Upon closer inspection it often turns out that different implicated genotypes need not produce exactly the same phenotype. More precise phenotype definition, such as clinical subcategories, can refine mapping and increase the significance or narrow the implicated chromosomal range of some QTL. The price paid is that each subcategory is rarer and the population impact of its QTL less. On the frustrating other hand, sometimes individual traits generate only weak, broad QTL mapping peaks, but multivariate trait analysis of the same data sharpens the peaks. That could indicate either that one pleiotropic gene in the chromosomal region affects multiple traits or that multiple genes in the region each affect a different trait.

Mapping is essentially made possible by linkage disequilibrium (LD) between marker and chromosomally nearby causal sites. Because markers “tag” causal sites only by indirect, and usually incomplete, statistical association, more of the genetic risk may be accounted for once we find the functional site(s) (Goldstein et al. 2003). However, the high LD that enables mapping also often makes it impossible to dissociate many linked variable sites to identify which are causally relevant, meaning also that the tagged association is closer to capturing the total association at that region. Independent new data, other study designs, and especially the discovery of different alleles at the same locus also associated with the trait strongly reinforce the candidacy of the QTL. When followed up in detail, the gene often turns out to make functional sense relative to the trait. However, follow-up studies of known “causal” SNPs, for example by meta-analysis, show that the statistically indirect nature of marker-based mapping does not typically account for the relatively weak estimated effects or for the unmapped fraction of heritability.

Meta-analysis that jointly analyzes multiple or pooled studies often achieves sample sizes adequate to support the candidacy of replicated SNPs and/or to see how geographically widespread similar associations are, although the relative risks typically converge toward a small overall effect, and many or even most candidates fail to survive the test (e.g., McPherson et al. 2007; Allen et al. 2008). Meta-analysis presents a number of analytic challenges (e.g., Ioannidis et al. 2004; Ioannidis 2007; Kavvoura and Ioannidis 2008), not the least of which are upward biases in risk estimates, especially the “winner's curse” of first reports (Göring et al. 2001; Begg 2002; Zollner and Pritchard 2007), which often are based on studies intentionally biased to optimize detection (Terwilliger and Weiss 2003). There is another subtle bias: meta-analysis is a candidate-gene design, testing the effects of a known allele rather than searching for unknown effects, and it usually includes the often-biased first report but not mapping studies that did not find a “hit” in the candidate's chromosomal region (including these would admittedly be hard for various reasons of comparability between mapping scores and candidate gene tests, made worse perhaps by the reluctance of journals to publish negative results). Some meta-analyses find statistical evidence of risk heterogeneity among studies, a warning that QTL that are not consistently replicated may not all be false positives. By the same token, there must also be many false negatives.
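
The “winner's curse” can be illustrated with a minimal simulation sketch. It assumes, purely for illustration, that every study estimates the same small true effect with normally distributed sampling error and that only estimates passing a one-sided significance threshold are reported as first “hits.” The function name, the true effect, the standard error, and the threshold are all hypothetical choices, not values from the cited literature.

```python
import random
import statistics


def winners_curse(true_effect=0.1, se=0.1, z_threshold=1.96,
                  n_studies=20000, seed=1):
    """Compare the mean of all estimates with the mean of 'significant' ones."""
    rng = random.Random(seed)
    # Each study's estimate is the true effect plus normal sampling error.
    estimates = [rng.gauss(true_effect, se) for _ in range(n_studies)]
    # A study reports a hit only if its z-score exceeds the threshold.
    significant = [b for b in estimates if b / se > z_threshold]
    mean_all = statistics.mean(estimates)
    mean_significant = statistics.mean(significant)
    power = len(significant) / n_studies
    return mean_all, mean_significant, power


mean_all, mean_significant, power = winners_curse()
```

Under these assumptions the average over all studies recovers the true effect of 0.1, but the average among the studies that reach significance is roughly 0.25, because only estimates inflated by sampling error can clear the threshold when power is low.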

Even after a gene has been identified there is still more mapping to be done. And it turns out that there is no truly free lunch, as “simple” traits are not so simple after all (Scriver and Waters 1999). Many different alleles are found in the normal population (see dbSNP at http://www.ncbi.nlm.nih.gov/projects/SNP/ or HapMap at http://www.hapmap.org/), and tens to hundreds of alleles are found among patients: >560 for PAH, 1400 for CFTR, and 1300 for Dystrophin (HGMD), usually served one to a haplotype. A few miner's canaries that enabled the gene to be mapped are of high penetrance and relatively common (among patients, though usually rare in the general population), but subsequent resequencing of the gene in patients reveals a long tail of increasingly rare alleles, most of which have been observed only once. Even the relatively common alleles are often restricted to a single geographic region, and the allelic spectrum may have no overlap across continental regions. Therefore, unless the mutation clearly knocks out function in the gene, for example by causing a frameshift, singleton or near-singleton alleles can legitimately be considered disease related only on the assumption that the gene is responsible for the disease in the person in whom it is found.

This variation shows that Mendelian notions such as recessiveness have clung on beyond their sell-by date, because many or even most cases of some classical “recessive” diseases are actually heterozygotes at the sequence level, which close inspection shows have quantitative rather than dichotomous genotype–phenotype associations. Indeed, Mendel himself probably could not have succeeded had he had to work by mapping in samples of wild peas rather than carefully choosing traits segregating dichotomously in inbred lines that he could study experimentally. However, we seem to hunger to make things categorical, and these findings have led to new clinical entities like mild PKU or nonphenylalanine hyperphenylalaninemia, to supplement classical PKU (http://www.PAHdb.mcgill.ca).

Some traits appear to be a mix of genetic complexity and simplicity: many different genes have been implicated, but most families seem to be segregating a highly penetrant allele at only one of them. Examples include nonsyndromic deafness (125 genes; http://webh01.ua.ac.be/hhh) and retinitis pigmentosa (>100 genes; Hartong et al. 2006) or epilepsies (Meisler et al. 2001; Crino 2007). This is multiple unilocus etiology, quite different from classical polygenic traits in which variants at many different genes are thought to contribute to the trait in each affected individual. Yet these genes together account for only a fraction of cases, and as with all other genes many different mutations are found among cases, with a spectrum of frequency and phenotypic effect within and among families.

Along with this cornucopia of data, the menu of the Mapper's Café has also greatly expanded: it is becoming clear that our search for candidates must go beyond the few percent of the genome that comprises the exons of our paltry 20,000 genes. Genes often have multiple context-specific splicing variants (Hiller and Platzer 2008), their expression controlled by multiple alternative or sometimes countervailing regulatory elements (Davuluri et al. 2008), phenotypes affected by multiple cis-alleles (perhaps on the same haplotype) that may have compensating phenotypic effects relative to each other (Kondrashov et al. 2002; Kwiatkowski 2005; Hughes et al. 2006; Li et al. 2006), and trans-haplotypic effects as well as other kinds of epistasis that are important (e.g., Moore and Williams 2005; Tsai et al. 2007). For the bulk of common complex traits such as chronic diseases that allow successful embryogenesis and develop gradually or strike decades later in life, gene regulation rather than coding differences may be the more important source of phenogenetic variation (Manolio et al. 2008), yet regulatory regions are currently largely unknown in number and location and can even be trans to the affected gene (e.g., Chen et al. 2008).

It also now appears that a large fraction of genomes are transcribed into noncoding RNA, whose conserved pattern, along with subtle genome structural or copy number variation (CNV) (projects.tcag.ca/variations and eichlerlab.gs.washington.edu/database.html), suggests phenotypic relevance (Birney et al. 2007; Pheasant and Mattick 2007; Stranger et al. 2007; Amaral et al. 2008; Hurles et al. 2008; Weiss et al. 2008). Best understood at present are miRNA “genes” that regulate expression via effects on chromosomal packaging or mRNA translation via RNA interference, which can have disease consequences (Van Rooij et al. 2008). Epigenetic chromosome modification such as by sequence-specific nucleotide methylation or histone acetylation that affects gene expression, along with CNV, can be polymorphic even between MZ twins (Wong et al. 2005; Brena et al. 2006; Petronis 2006; Bruder et al. 2008). Parent-specific or monoallelic expression turns out to be widespread in autosomal as well as X-linked genes and which allele is activated is at least partly stochastic (Krueger and Morison 2008). This choice is made cell by cell early in embryogenesis and is mitotically remembered in cell lineages thereafter, creating somatic mosaics within and thus phenotypic variation among nominally identical heterozygotes at the locus. Even the humble mitochondria have surprising proliferating effects that can be somatic as well as inherited (Wallace 2008). Somatic changes, including mutations, contribute phenogenetic variation that is not transmitted across generations and hence is cryptic to mapping strategies (Weiss 2005).

The ability to test a given cell type for expression of a high fraction of genes in the genome has led to a new type of mapping search, for expression QTL (eQTL). The idea is to find sequence variation whose effect is to alter the timing or expression of other genes (Morley et al. 2004; Cheung et al. 2005; Stranger et al. 2007). Cluster analysis identifies sets or networks of genes whose correlated expression is altered by variation in a mapped region elsewhere in the genome (Wang et al. 2006; Chen et al. 2008) and animal studies can be tested for consistency with human disease etiology (Emilsson et al. 2008). Similarly, correlated expression-level changes involving large numbers of genes characterize cell-specific expression profiles, or before-and-after effects of experimental treatment, or, in the case of diseases like cancer, effects related to prognosis (Nevins and Potti 2007). Because most genes are pleiotropic, expressed in and affecting multiple traits (Buchanan et al. 2009), this approach, while highly promising, requires access to the right types of cells at the right time. This is not a trivial issue especially when the trait is complex or developmental: What cells do you look at if you want to understand craniofacial development, diabetes, or schizophrenia, and how do you get your hands on the cells?

The sobering fact about these many genomic functions is that they each add to, and never subtract from, the sequence elements that may affect disease. How much each of these new functional genomic features actually contributes to phenotypic variation is anybody's guess at present. But together they comprise a large DNA target for mutation and variation. The point is not that noncoding variation is unmappable, but rather that to find functional candidates, we need to search whole megabase-long QTL regions rather than just the protein-coding genes they contain. At ∼1000 SNPs/Mb of candidate region (even just in a pairwise comparison), this raises many problems for statistical and causal inference.
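
The scale of the inference problem can be sketched with a few lines of arithmetic. The sketch below assumes the ∼1000 SNPs/Mb density given above; the use of a Bonferroni correction, the count of pairwise combinations (relevant if interactions among sites were also tested), and the 2-Mb example region are illustrative assumptions rather than a recommended analysis.

```python
def candidate_burden(region_mb, snps_per_mb=1000, alpha=0.05):
    """Return (n_sites, per_test_alpha, n_pairs) for a candidate QTL region."""
    # Number of variable sites expected in the region at the assumed density.
    n_sites = round(region_mb * snps_per_mb)
    # Bonferroni-style per-site threshold (an assumed, conservative correction).
    per_test_alpha = alpha / n_sites
    # Number of distinct site pairs, if pairwise interactions were also tested.
    n_pairs = n_sites * (n_sites - 1) // 2
    return n_sites, per_test_alpha, n_pairs


# Hypothetical 2-Mb QTL interval.
n_sites, per_test_alpha, n_pairs = candidate_burden(region_mb=2)
```

For this hypothetical 2-Mb interval there are 2000 candidate sites, a per-site significance threshold of 0.000025, and nearly 2 million possible pairs of sites, all within a single mapped region.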

The patterns I have described tell an empirical tale without any over-arching theoretical framework, so one is free to believe that to resolve the incompleteness all we need are larger-scale, longer-term studies. This argument rests on a subtle assumption that the signal-to-noise (S/N) ratio will improve with sample size, number of markers or complete genome sequence, phenotype details, and number of environmental variables. Yet it is not obvious that S/N will behave as expected. Will genetic or other sources of heterogeneity rise as fast as sample size? Scaling up mapping studies will not detect alleles too rare to generate significance in the sample, and common or large-effect alleles can already be replicated and, once identified, their effects estimated directly from small samples. So the huge longitudinal biobanks being launched can be expected mostly to refine estimates of modestly common alleles with modest effects and discover a scattering of minor effects (Figure 3).

The demand for increased sequence and sample size may itself be evidence that we are approaching diminishing returns. Scaled-up studies move us ever more toward tilting at quixotic trait loci, chasing the effects most difficult to replicate, hardest to discriminate between true and false positives, or from which to make accurate risk estimates. Like the man of La Mancha (Figure 4), we have perhaps misperceived the sails of QTL windmills: they are not standing giants waiting to be lanced, but elusively whirling, sometimes ephemeral targets. Despite the unquestioned exceptions, quixotic trait loci are the rule, and we have known to expect them for nearly a century.

Figure 4.—

Don Quixote, undaunted in assaulting elusive evil. Drawings are by Gustave Doré (1869), inspired by Don Quixote (Miguel de Cervantes, 1605).

Indeed, it is more than a little remarkable that the same phenogenetic pattern pertains to traits that are genomically, functionally, histologically, and adaptively unrelated, in plants, animals, and microbes. Surely there must be meaning there! It is a central lesson, for which an evolutionary perspective provides a kind of theoretical support that has otherwise been missing.


The broad outlines of what we see today were predicted by early geneticists with little direct understanding of the nature of genes, based almost exclusively on the phenotypic similarity among related individuals and Mendelian principles backed by basic biology and experimental breeding and population genetics (Wright 1931, 1978; Waddington 1957; Provine 1971, 1986). A benchmark article was R. A. Fisher's turgid 1918 demonstration that the combined effects of many “Mendelian” (discretely segregating) loci could account for both quantitative inheritance (of continuously varying traits) and the observed correlation among relatives (Fisher 1918). A central fact about this model is genotypic equivalence: different genotypes (different combinations of alleles) can confer effectively the same phenotype. Each gene is a “will-o-the wisp” (Wright 1934), but together they constitute the classical polygenes that contribute effects to, rather than cause, traits; their effects are individually small, and in the theoretical limit infinitely many identical loci each contribute infinitesimal effects. This is why complex traits can aggregate but not segregate in families, and can at the same time be “genetic” and yet not Mendelian.
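
A minimal simulation sketch of this additive polygenic model may help. It is not Fisher's derivation; it simply assumes equal allele frequencies, equal per-allele effects, Hardy-Weinberg proportions, and no environmental variance, with the per-allele effect rescaled so that the total genetic variance stays fixed as the number of loci grows. All names and parameter values are hypothetical.

```python
import random
import statistics


def simulate_polygenic_trait(n_loci, n_individuals=2000, allele_freq=0.5,
                             genetic_sd=1.0, seed=0):
    """Simulate a purely additive trait; return (per_allele_effect, traits)."""
    rng = random.Random(seed)
    # Additive variance is n_loci * 2p(1-p) * effect**2; solve for effect so
    # that the expected genetic SD equals genetic_sd for any number of loci.
    effect = genetic_sd / (2 * n_loci * allele_freq * (1 - allele_freq)) ** 0.5
    traits = []
    for _ in range(n_individuals):
        # Count trait-increasing alleles over the two gametes at every locus.
        allele_count = sum(
            (rng.random() < allele_freq) + (rng.random() < allele_freq)
            for _ in range(n_loci)
        )
        # Different genotypes with the same count give the same trait value.
        traits.append(effect * allele_count)
    return effect, traits


effect_1, traits_1 = simulate_polygenic_trait(n_loci=1)
effect_100, traits_100 = simulate_polygenic_trait(n_loci=100)
spread_100 = statistics.stdev(traits_100)
```

With one locus the trait falls into at most three discrete classes that would segregate visibly in families. With 100 loci the same total variance is spread over many closely spaced values, each allele has one-tenth the effect, and many different genotypes share the same allele count and hence the same phenotype, which is genotypic equivalence in its simplest form.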

The recent response to the quixotic trait locus landscape is the catch phrase “systems biology.” But long before their molecular nature was known, leading biologists said in strikingly modern ways that complex traits are the result of networks of multiple contributing, interacting genes (Morgan 1917; Waddington 1957; Wright 1931, 1978) (Figure 5). Volume 1 of Wright's great retrospective is especially instructive of the early recognition of the fundamental facts (Wright 1968). What we have been discovering in disease genetics was also predictable before the outpouring of modern data (Weiss 1993; Weiss and Terwilliger 2000), which are now providing at least preliminary documentation (Figure 5, C and D).

Figure 5.—

Complex trait architecture, then and now: early conceptual diagrams of interactive complexity of genes. (A) Wright's schematic indicator of combinations of genotypes that can affect phenotypes affected by just five pairs of alleles with potential differences among the interactions depending on which alleles are present (Wright 1931). (B) Waddington's metaphor of multiple genes tugging on various parts of a developmental landscape (Waddington 1957). (C) Elements of a genetic network of ∼200 diabetes-related genes correlating gene expression and human variation; red and green indicate, respectively, genes with positively or negatively correlated expression (courtesy of Joanne Curran, Southwest Foundation for Biomedical Research, unpublished research). (D) Hypothetical diabetes-related metabolic syndrome network suggested by mapping and experimental data in a cross between B6 and C3H laboratory mice, in which changes in one mapped location (left) affect a whole network of genes (center), ultimately modifying the final disease-related phenotype (right) (Chen et al. 2008). (D is Figure 4C from Chen et al. 2008 and is reprinted by permission from Macmillan Publishers, Ltd: Nature. Chen et al. Copyright 2008.)

Once the modular nature of the genome and its evolution became known, we could understand where polygenes and networks—the plethora of contributing functional elements described above—come from. Biological traits are built up over eons by episodic mutation and duplication events. Their genetic architecture is not as internally homogenized as textbook polygenic models suggest. Basic functions such as core molecules in signaling or metabolic networks, rate-limiting genes in physiologic systems, or core protein domains are phylogenetically deep and widespread. Subsequent components arise that modify but must be compatible with these earlier ones. Interactions build up in this way so that, viewed retrospectively, we name the result a “network,” often after the “hub” genes (e.g., “Hedgehog signaling”), with more numerous less centrally connected downstream “spoke” genes. The pervasiveness of such systems shows that, while the predominant image of life is the Darwinian one of winner-take-all competition, the predominant nature of life itself is of cooperative interactions among multiple components: signals and receptors, proteins and each other or with DNA, and so on (Weiss and Buchanan 2008a,b). So it is no surprise that multiple genes inherently affect interestingly complex traits.

There are always exceptions, because life is a contingent, highly stochastic evolutionary phenomenon, but generally there is reason to think that the coding regions of early developmental or hub genes evolve relatively slowly (Kim et al. 2007; H. A. Lawson, unpublished results). Partly this is because such genes are typically pleiotropic, rather than being evolved for some specific function (Buchanan et al. 2009). Mutations with large effect (almost always negative) may be quickly removed by the cold hand of selection, having little chance to become geographically widespread. However, most functional genetic elements do tolerate variation and most new mutations have little effect either on phenotype or on Darwinian fitness (Ohta 2002; Eyre-Walker and Keightley 2007; Keightley and Eyre-Walker 2007; Lynch 2007b; Bodmer and Bonilla 2008; Boyko et al. 2008). These alleles are evolutionarily neutral or nearly neutral, and their frequencies change predominantly by genetic drift. Drift generates distributions of allele frequencies that are skewed toward rare, local alleles with only a few common, widely distributed ones at a given locus.

Surprisingly, natural selection, even balancing selection, leaves qualitatively similar results. The numerous genetic elements that affect complex traits present a large DNA target for mutation, and many alleles arise that have a comparable adaptive effect, and their frequencies evolve by drift relative to each other (Hartl and Campbell 1982). For every common allele maintained even by heterosis, such as sickle cell hemoglobin in malarial environments, there are many rare, geographically local alleles (at the same or other genes) also maintained by the same selective pressure, and even the major alleles are often found only within a geographic region (Kwiatkowski 2005).

Evolution generally molds complex traits to have modal distributions, in which most individuals are within the trait's historically accepted fitness range. This generates a subtle confounding of frequency and effect size that is reflected implicitly in Figure 1 (Sing et al. 1996). Like fitness, allelic effects are essentially measured relative to a population mean, so that a “large” effect must almost by definition be far from the mean and hence rare: were it too common, it would be the mean, with zero effect.
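
This confounding of frequency and effect size can be sketched for a single additive locus. The sketch assumes Hardy-Weinberg proportions and a per-allele effect measured from the non-carrier homozygote; the function name and the example frequencies are illustrative assumptions only.

```python
def additive_locus_summary(effect, freq):
    """Return (population_mean, homozygote_deviation, variance) for one locus.

    effect: per-allele effect on the trait, measured from the non-carrier
    freq:   population frequency of the trait-increasing allele
    """
    # Mean contribution of the locus: expected allele count times effect.
    population_mean = 2 * freq * effect
    # How far a carrier homozygote sits from that mean.
    homozygote_deviation = 2 * effect - population_mean
    # Additive variance contributed by the locus.
    variance = 2 * freq * (1 - freq) * effect ** 2
    return population_mean, homozygote_deviation, variance


# The same allele, rare versus nearly fixed (hypothetical frequencies).
rare = additive_locus_summary(effect=1.0, freq=0.01)
common = additive_locus_summary(effect=1.0, freq=0.99)
```

When the allele is rare (frequency 0.01), a carrier homozygote lies 1.98 units from the population mean. When the very same allele is nearly fixed (frequency 0.99), the mean has moved to 1.98 and the same genotype lies only 0.02 units away: the allele has in effect become the mean.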

These points together comprise one of the underappreciated implications of gradualism, a cornerstone of evolutionary theory. Evolution, working through phenotypes, but only indirectly on underlying genotypes (Weiss and Buchanan 2003, 2004), has led to the complexity that buffers organisms against devastating mutation. Interestingly, while major mutations and hub genes with easily studied effects understandably receive the preponderance of experimental investigation, their importance and constrained evolution may mean that adaptive evolution usually occurs through small-effect mutation in the less vital, but more numerous and more nearly neutral downstream or peripheral genes with causation less driven by strong selective adaptation and more internally heterogeneous (e.g., Lynch 2007a,b; Weiss and Buchanan 2008a,b).

How does this inform what we should expect in genetic association studies? Geographically dispersed or common risk alleles are older and more likely to be repeatedly detected (Chakravarti 1999). But their widespread dispersion indicates that those alleles are benign (at least in regard to fitness history), so if they are associated with disease the causal finger actually points to recent environmental change rather than primarily to genetic etiology. Rapid environmental change and secular changes in incidence of complex traits are characteristic of our age, and most common chronic disorders in the developed world are possible because of reduced risks of infectious or early onset disease, plus widespread exposure to sedentary lifestyles and old age (Neel 1962; Trowell and Burkitt 1981; Pollard 2008). Yet ironically, these largely environmentally induced diseases have become the most intensely studied by geneticists.

A few years ago the idea was promulgated that common variants would commonly be found to make a major contribution to our common diseases and hence to public health: the common variants/common diseases (“CV/CD”) notion (e.g., Reich and Lander 2001; Goldstein et al. 2003). If alleles with large effect are rare, yet a disease is common, then it might seem that the few contributing small-effect alleles must have high frequency (Chakravarti 1999); otherwise there would be too few risk genotypes for the disease to be common. Common variants are economically attractive pharmaceutical targets because large numbers of people would be affected by them. Of course, “common” is a moveable target, and CV/CD had an element of hopeful thinking from the beginning (Weiss and Clark 2002). It has not generally been borne out by experience (Figure 3, Bodmer and Bonilla 2008), even though the exceptions properly receive special attention (e.g., Goldstein 2008; Mallal et al. 2008).

Nonetheless, association mapping can help identify unsuspected pathways even when segregating variants per se do not have major public health impact. CV/CD probably has the most promise in specific molecular-recognition interactions, such as autoimmune or infectious disease, chemical exposures, or pharmaceutical agents (Goldstein et al. 2003), but again these are environmental interactions. Indeed, a potential future surprise could be that more chronic diseases have an immune or infectious component than has been suspected.

We still have to ask how a trait with substantial heritability that is produced by alleles at several genes could be common if those alleles are rare. One answer may lie in the size of the aggregate of alleles that affect complex trait values, mostly rare as the empirical data and theory (e.g., Pritchard et al. 2000) suggest. The biomedical ascertainment system of registries and specialty clinics collects cases from populations numbering hundreds of millions, thus ascertaining very rare alleles. They are geographically local and difficult to replicate, but they may provide a sufficient pool of risk polygenotypes to make the trait common. These are the quixotic trait loci that populate current mapping data.
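The aggregate argument can be illustrated with simple arithmetic. The sketch below assumes `n_alleles` distinct risk alleles, each at the same frequency `q`, distributed independently across a person's two chromosome sets. These are simplifying assumptions introduced here for illustration, and the function name is hypothetical. It computes only the chance of carrying at least one such allele, which is not the same thing as having a risk polygenotype.

```python
def prob_carry_at_least_one(n_alleles, q):
    """Probability that a diploid individual carries >= 1 rare risk allele.

    Assumptions (illustrative only): `n_alleles` distinct alleles, each
    with population frequency `q`, independent of one another, so an
    individual has 2 * n_alleles independent chances to carry one.
    """
    return 1.0 - (1.0 - q) ** (2 * n_alleles)


if __name__ == "__main__":
    # Individually rare alleles become collectively common as their
    # number grows.
    for n in (10, 100, 1000):
        print(n, round(prob_carry_at_least_one(n, q=0.0005), 4))
```

With a thousand alleles each at a frequency of 1 in 2000, a clear majority of individuals carry at least one of them under these assumptions, even though any single allele would rarely be seen twice in a modest sample.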


Genetic association studies reconstruct the history of today's phenotypes. This must perforce be done retrospectively, in terms of the sampled persons' exposures to their inherited genotypes at conception and to subsequent environmental factors. Yet risk estimation is a prospective enterprise, to predict future genotype-specific phenotypes, and here the devil is in the nongenetic component. The lesson should be chilling. Among the most undisputed disease risk alleles are those at BRCA1 and BRCA2 associated with breast and ovarian cancer. Yet, their estimated risk varies by roughly twofold depending on many factors including birth cohort (Fodor et al. 1998; King et al. 2003; Chen et al. 2006) and lifestyle differences.

For alleles as dangerous as those in BRCA1/2, causation seems to be real and screening can lead to prevention without worrying about the fine points of the risk estimates. But risks associated with lesser genotypes will more likely be enhanced, reduced, or even disappear under future environmental conditions, and new ones will appear. Yet we have no way to know how those exposures will change in the future, especially if we must ascertain them in exceedingly subtle ways from conception onward (Doblhammer and Vaupel 2001; Gluckman et al. 2008), although we can be confident that they will change. Things always look simpler after the space of unseen possibilities is narrowed by the quixotic turns of history down to one—what actually happened. This is a major conceptual weakness of prospective biobank studies because, once done, their estimated risks will again be backward looking.

A profound epistemological challenge is that environmental risks are, if anything, even more problematic to identify and estimate, much less to predict, to the point that environmental epidemiologists have been rushing to the genetics bandwagon expecting to be bailed out by causal factors that are more tangible, not realizing that we face very similar problems (Buchanan et al. 2006).

The instability or unpredictability of genotype-specific risks raises obvious ethical issues. Reviews of association studies reflect understandable enthusiasm; caveats are usually offered, but often seem unconvincing or stated largely in passing (e.g., Blangero 2004; Daiger 2005; Cardon 2006; Evans and Cardon 2006; Jaquish 2007; Khoury et al. 2007a,b; Chen et al. 2008; Emilsson et al. 2008; Hunter et al. 2008b; Janssens et al. 2008; Manolio et al. 2008; Pearson and Manolio 2008; Yesupriya et al. 2008). I would not be the first to note that the literature often reflects at least potential corporate, professional, or institutional conflicts of interest. There are also bioethical implications of the reluctance of leading journals, despite the known issues reviewed here, to publish negative studies: they are just not exciting enough. Yet from a biological point of view, good negative results may be some of the best, most positively instructive evidence about genetic architecture, and tentative positive studies, though headline grabbing, could be the most misleading.

What risk estimates (if any) should be given to customers of DNA testing companies or posted on public websites (e.g., dbGaP at ncbi.nlm.nih.gov, or HuGE at http://www.cdc.gov/genomics/hugenet/, and see Yu et al. 2008)? Should censoring or self-censoring be imposed? These are active but difficult questions, especially in an impatient, market-driven age (McGuire et al. 2007). One need think only of the cost to health care systems of using problematic associations as the basis of clinical or lifestyle intervention or of the potential social consequences of intervening in regard to genes widely treated as if they were “for” various kinds of behavior but that in most if not all cases involve complex interactions and only vaguely specifiable environmental conditions. As of this writing, services that use genotypes to give individualized risk are under legal scrutiny concerning what constitutes medical practice vs. “informational service” (Wadman 2008).


We have been served a feast of proverbial low-hanging phenogenetic fruit (Blangero 2004), genes with clear effects on normal or pathological traits in humans or countless other organisms. Findings to date have not had a major impact on public health; that may be sobering (think of sickle cell, known for >60 years), but should not be discouraging, since there is no reason to expect genetic engineering to be easy or quick just because a gene is known. The facts in hand also support our understanding of the evolutionary origin of genetic diversity. The knowledge we have gained constitutes substantial, positive, and reassuring scientific success. But in the face of quixotic trait loci, it is too early to declare “mission accomplished.”

The pace of biotechnological growth may have exceeded even Moore's law that computing power doubles every 2 years. It may soon reach its human threshold of 6+ billion nucleotides affordably identified in each diploid individual. With larger samples and cheaper, better DNA sequencing, many statistical power issues will fade as limiting factors. But another law seems to apply: Murphy's law, that whatever can go wrong will go wrong. Complex biological traits have many redundant, interlaced, stochastic, interacting, variable, emergent properties. Consistent with this, disease-related genes show every conceivable type of mutation. To the extent that each instance of a phenotype is etiologically unique, it can be resistant to science that depends on replication. Yet the strategies currently proposed are for even more technologically intense enumerative reductionism.

By contrast, quantitative genetics has been the basis of agricultural breeding, formally or empirically, for thousands of years. Artificial selection basically works by aggregate empiricism, without needing to identify specific genes (Falconer and Mackay 1996; Lynch and Walsh 1998; Griffiths et al. 2004). The object is to change genetic architecture across generations, a speedup of the natural evolutionary process that produces organisms. But the need in biomedical genetics is the more specific challenge to address individual risk, within the individual's lifetime.

Proponents of systems biology suggest that “targeting of whole networks” (Chen et al. 2008; Hunter et al. 2008a) will be the answer. In principle this is analogous to quantitative genetics applied to individuals within their lifetimes rather than to populations across generations. Computational and experimental approaches in simple model systems have nibbled at causal networks (e.g., Moore and Williams 2005; Moore et al. 2005; Keller et al. 2008; Zhu et al. 2008) or related individuals' relative position in genotype space to complex phenotypes (Nievergelt et al. 2008). But whether such approaches can tractably yield substantial, stable individual risk estimates or account for the bulk of genetic risk, or will again run up against a few modestly predictive network genotypes, probably mostly rare, and a long tail of ephemeral ones, remains to be seen.

Unfortunately, I think the latter is the predictable outcome, even if the number of risk loci is small (Pharoah et al. 2008). An important network should be identifiable without needing huge biobank studies. But its individual multilocus risk genotypes will almost automatically be rare to exceedingly rare, even if the number of loci is on the order of 10, much less hundreds, and the component alleles are individually common. So while mapping naturally occurring variation may be able to identify pathways that can then be followed up in other ways for biological understanding, the persistent hope for major individual risk prediction from that approach remains problematic at best.
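The rarity of multilocus genotypes follows from multiplication of single-locus frequencies. As a minimal sketch, assume independent biallelic loci in Hardy-Weinberg proportions, all with the same allele frequency `p`. These assumptions and the function names are introduced here only for illustration. Even the most common multilocus genotype is then the product of the most common single-locus genotype frequency across loci.

```python
def genotype_freqs(p):
    """Hardy-Weinberg frequencies for 0, 1, and 2 copies of an allele."""
    q = 1.0 - p
    return (q * q, 2.0 * p * q, p * p)


def most_common_multilocus_freq(p, n_loci):
    """Frequency of the single most common multilocus genotype.

    Assumptions (illustrative only): `n_loci` independent loci, each
    biallelic with allele frequency `p`, in Hardy-Weinberg proportions.
    """
    return max(genotype_freqs(p)) ** n_loci


def n_multilocus_genotypes(n_loci):
    """Number of distinct multilocus genotypes (3 per biallelic locus)."""
    return 3 ** n_loci


if __name__ == "__main__":
    for loci in (1, 10, 100):
        print(loci,
              n_multilocus_genotypes(loci),
              most_common_multilocus_freq(0.5, loci))
```

With 10 loci and alleles at a frequency of 0.5, there are 59,049 possible genotypes, and under these assumptions the most common one is carried by fewer than 1 person in 1000.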

Note that this discussion is about inferring biological mechanism and causation from naturally occurring variation. The story may be quite different in experimental biology where these things can be studied in replicable model systems at the cell and developmental level. Model systems provide mechanistic stereotypes, which may be quite useful in, for example, designing therapeutic agents to target stable biological pathways. But the degree to which that really reduces complexity or addresses the problem of connecting variation to outcome—individual genotypes to individual phenotypes—such as in the medical setting, is doubtful.

Given the present situation, and that the genetic architecture of no biological trait is as yet fully known, I think it is important if not urgent to try to get a much better understanding of the phenogenetic lay of the land that we are trying to infer. Computing power now makes possible very flexible, forward evolutionary simulation approaches that may be a key tool in such investigation. Useful simulation can be based on natural evolution-by-phenotype approaches (Lambert et al. 2008) or ones more centered on genomic questions (e.g., Hoggart et al. 2007; Peng et al. 2007; Edwards et al. 2008).
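To give a sense of what a forward simulation involves, here is a deliberately minimal sketch. It is not one of the cited simulators. It is a neutral Wright-Fisher model at a single locus in which every mutation creates a new allele. The function name, the parameters, and the absence of selection, recombination, and phenotypes are all simplifying assumptions made for illustration.

```python
import random
from collections import Counter


def simulate_neutral_alleles(n_copies, mu, generations, seed=0):
    """Forward-simulate allele counts at one locus under drift and mutation.

    Assumptions (illustrative only): constant population of `n_copies`
    gene copies, non-overlapping generations, no selection, and each
    mutation (probability `mu` per copy per generation) creating a
    brand-new allele label.
    """
    rng = random.Random(seed)
    population = [0] * n_copies      # start with a single ancestral allele
    next_label = 1                   # label for the next new mutation
    for _ in range(generations):
        offspring = []
        for _ in range(n_copies):
            allele = rng.choice(population)   # drift: sample a parent copy
            if rng.random() < mu:             # mutation: new allele arises
                allele = next_label
                next_label += 1
            offspring.append(allele)
        population = offspring
    return Counter(population)


if __name__ == "__main__":
    counts = simulate_neutral_alleles(n_copies=200, mu=0.01, generations=200)
    rare = sum(1 for c in counts.values() if c / 200 < 0.05)
    print("alleles present:", len(counts), "of which below 5%:", rare)
```

Extending such a sketch toward the questions raised here would mean adding many loci, a mapping from genotype to phenotype, and selection acting on the phenotype rather than on individual alleles.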

But perhaps we also need a clearer goal. What would it mean to say that we understand the genetic basis of a biological trait? Must we know all the genes that contribute? All their variation? In all populations—or all species—and their frequencies? Specifically for each instance? These are literally hopeless ideals, since new variation is always arising by mutation and recombination and being lost to selection and drift, and environments are mostly unmeasured but surely always changing.

Let us look at this with an analogy, as in Figure 6. Suppose we are in New Orleans and wish to predict each instance of a flood of the Mississippi River (not counting hurricanes coming from the other direction). Figure 6A shows the major rivers that contribute to the Mississippi flow past New Orleans. Is this enough to monitor, or do we need to enumerate and measure all the streamlets in the entire drainage system, shown on the right, to make our prediction? Their courses continually vary and are subject to the vagaries of local weather and human activity, and their contributions differ greatly from flood to flood. They are the fluvial equivalent of quixotic trait loci. They are real, but they are elusive and ephemeral. How far upriver do we have to go before we know enough?

Figure 6. Mississippi River drainage to New Orleans. (A) The major contributing rivers. (B) The entire drainage system (courtesy of R. K. Weiss, ESRI Geographic Information Systems).

Don Quixote is a satire whose hero is usually treated as an object of ridicule, but in fact he was a sympathetic figure who knew he was subject to illusions yet remained undaunted in his quest against evil. In our quest to understand genetic architecture, it is possible to imagine that its complex appearance is an illusion and that the low-hanging fruit really does tell the biological tale. But if the exceptions really are exceptions, we risk being lured to struggle vainly for the rest of the bunch, which may remain out of our grasp until we have a more biologically grounded approach than to enumerate the quixotic fraction of nature. Meanwhile we should at least do our best to discriminate carefully before we tilt at windmills, thinking that they are giants.


I appreciate the editors giving me the freedom to express this perspective on a complex but important problem. The references cited are my quixotic attempt to exemplify issues fairly, by recent examples and overviews where back references can be found, but with no attempt to be comprehensive. I apologize to the countless authors whose equally worthy work I do not know or could not explicitly acknowledge. I thank Anne Buchanan, Allan Spradling, Barak Cohen, and an additional reviewer for helpful criticism of the manuscript. My work in this area is supported by grants from the National Institutes of Health (MH063749) and the National Science Foundation (BCS 0343442 and BCS 0725227) and by my Penn State Evan Pugh Professor's research fund.


Communicating editor: A. Spradling

