Abstract
The past decades have witnessed extensive efforts to correlate fitness traits with genomic heterozygosity. While positive correlations are revealed in most of the organisms studied, results of no or negative correlations are not uncommon. There has been little effort to reveal the genetic causes of these negative correlations. The positive correlations are regarded either as evidence for functional overdominance in large, randomly mating populations at equilibrium, or as the result of populations at disequilibrium under dominance. More often, the positive correlations are viewed as a phenomenon of heterosis, so that they cannot possibly occur under within-locus additive allelic effects. Here we give exact genetic conditions that give rise to positive and negative correlations in populations at Hardy-Weinberg and linkage equilibria, thus offering a genetic explanation for the observed negative correlations. Our results demonstrate that the above interpretations concerning the positive correlations are not complete or even necessary. Such a positive correlation can result under dominance and potentially under additivity, even in populations where associated overdominance due to linked alleles at different loci is not significant. Additionally, negative correlations and heterosis can co-occur in a single population. Although our emphasis is on equilibrium populations and biallelic genetic systems, the basic conclusions are generalized to nonequilibrium populations and multiallelic situations.
DURING the past three decades, numerous efforts have attempted to correlate fitness (or related characters) with genomic heterozygosity as reflected by molecular marker heterozygosity in natural populations (Mitton and Grant 1984; Allendorf and Leary 1986; Zouros and Foltz 1987; Lynch and Walsh 1997). While a number of studies have found no or negative correlations (e.g., Gains et al. 1978; Pierce and Mitton 1982), positive correlations have been revealed in most organisms studied. There has been little effort to search for satisfactory genetic explanations for the negative correlations, and it is generally held that, with a parametric slope of nearly zero, negative correlations can result by chance. On the other hand, the more commonly observed positive correlations have stimulated a great deal of research interest, and several distinct genetic explanations have been proposed for them.
One explanation (Mitton and Grant 1984; Smouse 1986; Zouros and Foltz 1987) is that overdominance is the cause, and the positive correlation is a piece of evidence for overdominance underlying the fitness loci. This is especially true in large randomly mating populations in which linkage disequilibrium is not significant. The basis for this view is that with overdominance, one would expect individual fitness to increase with the fraction of the genome that is heterozygous. This fraction should be correlated with the number of heterozygous molecular marker loci if they are tightly linked to polymorphic loci underlying fitness or themselves influence fitness. Under dominance, a correlation between multilocus heterozygosity and fitness cannot arise except when there is a correlation between multilocus homozygosity and the level of inbreeding. Such an association is unlikely to be pronounced in large, randomly mating populations. However, the overdominance argument encounters some difficulties. For example, there are several findings of heterozygote deficiency in populations showing positive correlations between heterozygosity and fitness-related traits (Gaffney et al. 1990; Lynch and Walsh 1997), which should not be expected in large randomly mating populations under the overdominance hypothesis.
Another explanation is that the populations studied may not be strictly panmictic but instead have local inbreeding. Nonrandom mating may cause correlations between homozygosity at different loci in the genome, even with unlinked loci (Haldane 1949; Ohta and Cockerham 1974; Houle 1994). The allozyme heterozygosity may thus be correlated with an individual's level of inbreeding, and hence the associations between heterozygosity and fitness are largely the consequence of variation in the level of inbreeding among individuals (Ledig et al. 1983; Strauss 1986). However, this explanation was not supported by a few studies (e.g., Leary et al. 1987), and large studies in marine bivalves normally exclude inbreeding as an explanation for the observed positive correlation (Gaffney et al. 1990). In randomly mating populations, if linkage disequilibrium is present, regardless of its causes, linked deleterious alleles with dominance at different loci will likely cause associated overdominance (Houle 1989). This also may explain the positive correlation. However, despite a large number of studies, evidence for linkage disequilibrium in large randomly mating populations remains controversial (e.g., Barker 1979; Lewontin 1985; Smit-MacBride et al. 1988; Houle 1989; Zapata and Alvarez 1992, 1993; Lynch and Deng 1994; Deng and Lynch 1996a).
The correlation approach for fitness and genomic heterozygosity and the different explanations for the observed data are highly relevant to one fundamental and longstanding issue in population genetics: How is genetic variation maintained in natural populations? Overdominance essentially encompasses all forms of balancing selection at the allelic level, and dominance is compatible with mutation-selection balance. The following two inferences are common to the above two genetic explanations. First, the positive correlation reflects the phenomenon of heterosis. Hence in large randomly mating populations, it cannot possibly exist under within-locus additive allelic effects. In addition, heterosis should be incompatible with the negative correlations, and they cannot co-occur within individual populations. Second, in populations at genetic equilibria, the positive correlations cannot exist with dominance. These concepts have been widely held among the researchers in this field. However, are they always true? Throughout, unless otherwise specified, (genetic) equilibria refer to Hardy-Weinberg and/or linkage equilibria.
Employing a multilocus biallelic model, and by theoretical analyses supplemented by computer simulations, we show that these two concepts are not always true. Moreover, we demonstrate that negative correlations between fitness and genomic heterozygosity are not unexpected, and we give explicit genetic conditions for both positive and negative correlations to occur. Our focus is on equilibrium populations. However, the conclusions derived for multiple loci under equilibrium are generalized to nonequilibrium populations and multiallelic systems through a one-locus model.
Extensive data exist on the observed correlations. There are also some potential limitations of the correlation approach for fitness and genomic heterozygosity (see discussion). Therefore, our focus here is to show when correlations do indeed exist in equilibrium populations, and how to interpret them when they are detected. Hence, some practical issues are not dealt with here, such as what sample sizes are needed and how many loci need to be assayed in order to detect a correlation when it indeed exists. Some of these, or related practical problems, have been addressed before (e.g., Mitton and Pierce 1980; Chakraborty 1981).
THEORY
Consider a simplified situation in which there are N polymorphic loci underlying fitness, each having two alleles, A and a. The allelic effects across loci may vary, so that the equilibrium frequencies of A and a at the ith locus are p_{i} and q_{i}, respectively. Let the three genotypic values be AA: 1, Aa: 1 − hs, and aa: 1 − s, where s is the selection coefficient against aa and h is the dominance coefficient.
The multiplicative fitness function is biologically plausible by direct and indirect evidence (Morton et al. 1956; Crow 1986; Fu and Ritland 1996) and will be assumed throughout. With constant effects h and s across loci, the fitness W(n_{1},n_{2}) of an individual is totally determined by the number of loci in the genome that are heterozygous (n_{1}) and the number that are homozygous for the a allele (n_{2}), regardless of the specific genotypes at particular loci: W(n_{1},n_{2}) = (1 − hs)^{n_{1}}(1 − s)^{n_{2}}.
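The multiplicative fitness function can be sketched in a few lines of Python. This is a minimal illustration that assumes the per-locus genotypic values AA = 1, Aa = 1 − hs, and aa = 1 − s with the same h and s at every locus; the function name and the numerical values are hypothetical.

```python
def fitness(n1: int, n2: int, h: float, s: float) -> float:
    """Multiplicative fitness of a genome with n1 heterozygous (Aa) loci
    and n2 loci homozygous for the a allele.

    Assumption: per-locus genotypic values are AA = 1, Aa = 1 - h*s and
    aa = 1 - s, identical at every locus.
    """
    return (1.0 - h * s) ** n1 * (1.0 - s) ** n2


# Hypothetical parameter values, chosen only for illustration.
h, s = 0.2, 0.01
print(fitness(0, 0, h, s))   # every locus AA
print(fitness(10, 0, h, s))  # ten Aa loci
print(fitness(0, 10, h, s))  # ten aa loci
```

Because fitness depends only on the counts n_{1} and n_{2}, two individuals with different genotypes at particular loci but the same counts have the same fitness under this sketch.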
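Under random mating and linkage equilibrium, with the same allele frequencies p and q at every locus, genomic genotypes for individuals, as determined by n_{1} and n_{2}, follow a trinomial distribution: P(n_{1},n_{2}) = {N!/[n_{1}! n_{2}! (N − n_{1} − n_{2})!]} (2pq)^{n_{1}} (q^{2})^{n_{2}} (p^{2})^{N − n_{1} − n_{2}}.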
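As a numerical check on this step, the sketch below computes the mean fitness of individuals with a given number of heterozygous loci by summing over the trinomial distribution, and compares it with a closed form obtained by treating each non-heterozygous locus as aa with probability q^{2}/(p^{2} + q^{2}). It assumes the multiplicative fitness (1 − hs)^{n_{1}}(1 − s)^{n_{2}} and identical p, q, h, and s at all loci; the function names and parameter values are hypothetical, and the closed form is our own restatement rather than a quotation of the numbered equations.

```python
from math import comb


def trinomial_pmf(n1, n2, N, q):
    """P(n1 loci are Aa and n2 loci are aa) among N independent loci in
    Hardy-Weinberg proportions, with frequency q for allele a."""
    p = 1.0 - q
    n0 = N - n1 - n2  # number of AA loci
    return (comb(N, n1) * comb(N - n1, n2)
            * (2 * p * q) ** n1 * (q * q) ** n2 * (p * p) ** n0)


def mean_fitness_given_n1(n1, N, q, h, s):
    """E[W | n1], obtained by summing over n2 with trinomial weights."""
    num = den = 0.0
    for n2 in range(N - n1 + 1):
        pr = trinomial_pmf(n1, n2, N, q)
        num += pr * (1 - h * s) ** n1 * (1 - s) ** n2
        den += pr
    return num / den


def mean_fitness_closed_form(n1, N, q, h, s):
    """(1 - hs)^n1 times the mean fitness of a homozygous locus, raised to
    the number of homozygous loci."""
    p = 1.0 - q
    hom = 1 - s * q * q / (p * p + q * q)
    return (1 - h * s) ** n1 * hom ** (N - n1)


# Hypothetical values: 20 loci, q = 0.3, h = 0.1, s = 0.05.
for n1 in (0, 5, 10, 20):
    print(n1,
          mean_fitness_given_n1(n1, 20, 0.3, 0.1, 0.05),
          mean_fitness_closed_form(n1, 20, 0.3, 0.1, 0.05))
```

In the closed form, replacing one homozygous locus by a heterozygous one multiplies the expected fitness by (1 − hs)/[1 − sq^{2}/(p^{2} + q^{2})], so the expected fitness rises with n_{1} exactly when h < q^{2}/(p^{2} + q^{2}).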
Equation 4 can be rewritten as:
Therefore, both positive and negative relationships between fitness and genomic heterozygosity can exist. Which one exists critically depends on the above condition (Equation 6 or 8) under our assumptions of random mating, no significant linkage disequilibrium, and multiplicative fitness function. Figure 1 depicts the parameter space of h and q that gives rise to positive and negative correlations between fitness and genomic heterozygosity. Note that Figure 1 does not imply any true relationship of h and q. It just graphically depicts the outcome regions of the relationship of fitness and heterozygosity given the true relationship of h and q. It can be seen that negative correlations are possible under a large range of parameter space of h and q. In particular, when q is small (<0.1), negative correlations are almost always expected under dominance and additivity. Additionally, positive correlations could potentially exist under additivity, if q is larger than 0.5 (see discussion).
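The regions described for Figure 1 can be explored with a short sketch. It assumes that the critical dominance coefficient takes the form h_{c} = q^{2}/(p^{2} + q^{2}) under Hardy-Weinberg equilibrium, which is consistent with the statement that h_{c} > 0.5 requires q > 0.5; the function names and the grid of values are hypothetical.

```python
def critical_h(q: float) -> float:
    """Assumed critical dominance coefficient h_c = q^2 / (p^2 + q^2)."""
    p = 1.0 - q
    return q * q / (p * p + q * q)


def expected_sign(h: float, q: float) -> str:
    """Sign of the expected fitness-heterozygosity relationship."""
    hc = critical_h(q)
    if h < hc:
        return "positive"
    if h > hc:
        return "negative"
    return "none"


# A small grid over allele frequency q and dominance coefficient h.
for q in (0.05, 0.1, 0.3, 0.5, 0.7):
    row = [expected_sign(h, q) for h in (0.0, 0.1, 0.5)]
    print(f"q = {q:4.2f}  h_c = {critical_h(q):.3f}  h = 0, 0.1, 0.5 -> {row}")
```

With small q, h_{c} is close to zero, so almost any partially dominant or additive allele falls in the negative region, as stated above.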
To corroborate our analytical derivations, computer simulations were performed. The population was at Hardy-Weinberg and linkage equilibria. A total of 1000 polymorphic genomic loci of constant effects h and s with the same equilibrium allele frequencies p and q were assumed to be underlying fitness. At each locus, the genotype of an individual was determined by a uniform random variable ξ (0 < ξ < 1.0). Genotype AA was chosen if ξ < p^{2}, else Aa if ξ < p^{2} + 2pq; otherwise aa was chosen. An individual's fitness was determined by Equation 1. One hundred genotypes were sampled from the population in our simulations. As stated in the introduction, our objective here was to find the genetic conditions for the existence of a true correlation. Additionally, the purpose of our simulation was to corroborate our analytical results. Therefore, the genotypic values of the 100 sampled genotypes were assumed to be measured without error, and are plotted in Figure 2. Regression analysis was performed; the regression slope was positive and highly significant (P = 0.0006). The simulation agrees very well with our analytical derivations, i.e., the regression line based on the simulated genotypic data coincides with the expected theoretical line (Figure 2).
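The sampling scheme described above can be sketched as follows. The parameter values (q = 0.3, h = 0, s = 0.01), the random seed, and all function names are hypothetical, since the values used for Figure 2 are not restated here; fitness is assumed to be (1 − hs)^{n_{1}}(1 − s)^{n_{2}}.

```python
import random


def genotype_from_uniform(xi: float, p: float) -> str:
    """Map a uniform draw xi to a genotype under Hardy-Weinberg proportions."""
    q = 1.0 - p
    if xi < p * p:
        return "AA"
    if xi < p * p + 2 * p * q:
        return "Aa"
    return "aa"


def simulate(n_ind=100, n_loci=1000, q=0.3, h=0.0, s=0.01, seed=1):
    """Return one (n1, fitness) pair per sampled individual."""
    rng = random.Random(seed)
    p = 1.0 - q
    sample = []
    for _ in range(n_ind):
        n1 = n2 = 0
        for _ in range(n_loci):
            g = genotype_from_uniform(rng.random(), p)
            if g == "Aa":
                n1 += 1
            elif g == "aa":
                n2 += 1
        sample.append((n1, (1 - h * s) ** n1 * (1 - s) ** n2))
    return sample


def ols_slope(pairs):
    """Ordinary least-squares slope of the second coordinate on the first."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    return sxy / sxx


sample = simulate()
print("slope of fitness on number of heterozygous loci:", ols_slope(sample))
```

With these hypothetical values h lies below q^{2}/(p^{2} + q^{2}), so a positive slope is expected on average, although any single sample of 100 individuals is subject to sampling noise.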
The above conclusion, which was derived from a multiple-locus system where linkage equilibrium and mutation-selection balance were assumed, may be made more intuitive and general. In the rest of this section, we are going to progressively relax the assumptions concerning the genetic systems. First, we consider the simplest case of one locus with two alleles A and a as defined before.
The essential question is then: Which has a higher fitness, a heterozygote or a homozygote? With the heterozygote, the fitness is 1 − hs. A homozygote can be either AA or aa, with respective frequencies in the population under Hardy-Weinberg equilibrium being p^{2} and q^{2}. Thus the expected fitness of a homozygote in the population is [p^{2} + q^{2}(1 − s)]/(p^{2} + q^{2}) = 1 − sq^{2}/(p^{2} + q^{2}).
Now, let us relax our assumptions even further and assume a general population where even Hardy-Weinberg equilibrium may not hold. A population resulting from the mixing of different populations may represent such a scenario. The essential question still is: At each polymorphic locus, which has a higher fitness, a heterozygote or a homozygote? For a biallelic locus as above, let us denote the genotype frequencies as P_{AA}, P_{Aa}, and P_{aa}, respectively. The expected fitness of a genotype conditional on its being homozygous is [P_{AA} + P_{aa}(1 − s)]/(P_{AA} + P_{aa}) = 1 − sP_{aa}/(P_{AA} + P_{aa}).
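The nonequilibrium comparison can be written directly in terms of genotype frequencies. The sketch below assumes genotypic values 1, 1 − hs, and 1 − s, so that the heterozygote is fitter than the average homozygote when h < P_{aa}/(P_{AA} + P_{aa}); the function names and the admixed genotype frequencies are hypothetical.

```python
def homozygote_mean_fitness(P_AA: float, P_aa: float, s: float) -> float:
    """Expected fitness of a genotype given that it is homozygous."""
    return (P_AA * 1.0 + P_aa * (1.0 - s)) / (P_AA + P_aa)


def critical_h_from_genotypes(P_AA: float, P_aa: float) -> float:
    """Value of h at which 1 - hs equals the homozygote mean fitness."""
    return P_aa / (P_AA + P_aa)


# Hypothetical admixed population with an excess of aa homozygotes.
P_AA, P_Aa, P_aa = 0.2, 0.3, 0.5
h, s = 0.5, 0.1  # additive allelic effects
print("h_c =", critical_h_from_genotypes(P_AA, P_aa))
print("heterozygote fitness:", 1 - h * s)
print("mean homozygote fitness:", homozygote_mean_fitness(P_AA, P_aa, s))

# Under Hardy-Weinberg proportions the same function gives q^2 / (p^2 + q^2).
q = 0.3
p = 1 - q
print(critical_h_from_genotypes(p * p, q * q), q * q / (p * p + q * q))
```

In this hypothetical example P_{AA} < P_{aa}, so h_{c} exceeds 0.5 and the heterozygote is fitter than the average homozygote even though the alleles act additively.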
All the above analyses are for biallelic genetic systems, which are applicable to many allozyme loci and restriction fragment length polymorphisms (RFLPs). However, some allozyme loci have more than two alleles, and the increasingly employed microsatellite marker loci are even more polymorphic. In the following, we are going to give the general genetic conditions of the correlation relationships for multiallelic systems. We will use the triallelic genetic system as an example; extensions to genetic systems with more alleles are straightforward and can be obtained similarly.
Let the genotypic values and frequencies of a triallelic locus be:
Therefore, if
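For a multiallelic locus, the same comparison of heterozygotes with homozygotes can be sketched numerically. At a single locus, the covariance between fitness and a 0/1 heterozygosity indicator has the sign of the difference between the frequency-weighted mean fitness of heterozygotes and that of homozygotes, and the sketch computes those two means. The genotype table, its fitness values, and the function name are hypothetical and are not taken from the triallelic notation of this section.

```python
def het_vs_hom_mean_fitness(genotypes):
    """genotypes maps (allele_i, allele_j) -> (frequency, fitness).

    Returns the frequency-weighted mean fitness of heterozygotes and of
    homozygotes, in that order.
    """
    het_w = het_f = hom_w = hom_f = 0.0
    for (a, b), (freq, w) in genotypes.items():
        if a == b:
            hom_f += freq
            hom_w += freq * w
        else:
            het_f += freq
            het_w += freq * w
    return het_w / het_f, hom_w / hom_f


# Hypothetical triallelic locus with allele frequencies 0.5, 0.3 and 0.2
# in Hardy-Weinberg proportions; fitness values are made up for illustration.
triallelic = {
    (1, 1): (0.25, 1.000), (2, 2): (0.09, 0.980), (3, 3): (0.04, 0.960),
    (1, 2): (0.30, 0.995), (1, 3): (0.20, 0.990), (2, 3): (0.12, 0.975),
}
het, hom = het_vs_hom_mean_fitness(triallelic)
print("mean heterozygote fitness:", het)
print("mean homozygote fitness:  ", hom)
print("expected sign:", "positive" if het > hom else "negative")
```

In this made-up example the common homozygote is the fittest genotype, so the homozygote mean exceeds the heterozygote mean and a negative relationship is expected.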
DISCUSSION
Under the simplified model of two alleles at each locus, in randomly mating populations with genetic equilibria, a positive correlation between genomic heterozygosity and fitness with dominance seems to be counterintuitive and has not been revealed before (Figures 1 and 2). A more counterintuitive conclusion is that a positive correlation could potentially exist even under within-locus additive allelic effects, without any dominance or overdominance in equilibrium populations (Figure 1, Equations 6, 7, 8 and 9). However, our analyses clearly demonstrate that these are entirely possible for a range of parameter space under Hardy-Weinberg and linkage equilibria. Importantly, negative correlations between fitness and genomic heterozygosity are actually not unexpected under a wide range of plausible parameter space of h and q in equilibrium populations (Figure 1). Additionally, in nonequilibrium populations, a positive correlation can also exist with dominance, overdominance, and additive allelic effects if Equation 10 holds at polymorphic loci; otherwise a negative correlation may exist. Furthermore, with no assumption about the genetic equilibrium, the exact genetic conditions for the positive and negative correlations for general multiallelic systems are also given. All of these results are new in that they give genetic conditions for the correlations, which directly link the genetic effect h (and s) with the population property of gene (or genotype) frequencies.
The conditions for these counterintuitive phenomena to exist do not seem to be prohibitive given some level of biological knowledge. For a positive correlation to exist, the a allele does not have to be very common in the dominance case (Figure 1). In the additive case, in order for h_{c} > 0.5 so that a positive correlation could possibly exist (Equations 6, 8 and 9), q has to be greater than 0.5 in populations at genetic equilibrium or P_{AA} < P_{aa} in populations at disequilibrium (Equation 10). Are these entirely impossible? Our knowledge is very limited on fitness effects at polymorphic loci such as those revealed by molecular markers (Kimura 1983; Nei 1987; Li 1997). For example, how is the extensive polymorphism in natural populations maintained? What is the difference in the fitness effects of different polymorphisms? How much genetic variation of fitness can polymorphism at a particular single locus explain? What is the mutation rate at a locus? Our knowledge about the population history, such as population dynamics and population admixture, is also very limited. Therefore, the possibility of q > 0.5 cannot be entirely ruled out in at least three conceivable situations. The first is in populations that have been large for enough generations that a mutation-selection balance has been approximately reached. For instance, for slightly deleterious mutations (Ohta 1973, 1974), it is not unreasonable to assume s = 0.0001. Even in the most prohibitive case of additive effects, with mutation-selection balance, the locus mutation rate u is inferred to be on the order of 1.0 × 10^{−5} (Crow and Kimura 1970) in order for q > 0.5. This is roughly on the order of the few mutation rates inferred for allozyme loci (Hartl and Clark 1989; Maynard Smith 1989). For instance, Schleger and Dickie (1971) estimated that, for five loci tested in the mouse, the average mutation rate is 1.1 × 10^{−5}. Since this is only for visible mutants, it is likely to be a lower bound.
Note that in inferring the order of u, reverse mutations are ignored. This may be partially justified by the fact that mutations at many of the nucleotide sites within a locus may be slightly deleterious, whereas in a mutant allele only reverse mutations at the few mutated nucleotide sites can restore the original wild-type allele.
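The order of magnitude quoted for u can be checked with a rough calculation. The sketch below uses the standard first-order mutation-selection balance approximation for a partially dominant deleterious allele, q ≈ u/(hs). That approximation is derived for small q and ignores reverse mutation, so applying it near q = 0.5 is our assumption and gives only an order-of-magnitude figure; it is not necessarily the calculation behind the value cited above, and the function names are hypothetical.

```python
def equilibrium_q(u: float, h: float, s: float) -> float:
    """First-order mutation-selection balance frequency, q ~ u / (h * s).

    Assumption: partially dominant deleterious allele, no reverse mutation;
    the approximation is accurate only for small q.
    """
    return u / (h * s)


def min_u_for_q(q_target: float, h: float, s: float) -> float:
    """Mutation rate at which the approximation above reaches q_target."""
    return q_target * h * s


s = 1e-4  # slightly deleterious mutation
h = 0.5   # additive allelic effects
print("u needed for q = 0.5:", min_u_for_q(0.5, h, s))
print("q at u = 1.1e-5:", equilibrium_q(1.1e-5, h, s))
```

Under these assumptions the required mutation rate is a few times 10^{−5}, which is of the same order as the rate quoted above.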
The second situation where q may exceed 0.5 may be in populations experiencing recent expansion, where mutation-selection balance is not yet established but Hardy-Weinberg and linkage disequilibria are not significant. Our focus is on the conditions for different correlations under Hardy-Weinberg and linkage equilibria. In order to assume constant q under constant h and s across loci, mutation-selection balance is assumed when we derive Equation 8 for the case of multiple loci. However, mutation-selection balance may not be an essential assumption for our conclusions. This can be easily seen, since the same basic conclusion (Equation 8) is derived for the single-locus case (Equation 9) without assuming mutation-selection balance. It is known that Hardy-Weinberg equilibrium can be established by just one generation of random mating, and that linkage disequilibrium decays at a rate of r per generation, where r is the recombination rate between the two loci at disequilibrium. However, to reach a mutation-selection balance, roughly, for those mutants with s greater than 10/N_{e}, where N_{e} is the effective population size, the population has to have an annual population size in excess of N_{e} for a time span (in generations) of at least a few N_{e} (Kimura et al. 1963; Lynch et al. 1995; Deng and Lynch 1996b). Therefore, it takes a much longer time for a population to reach mutation-selection balance than to reach approximate genetic equilibria. So, it is possible for a population to be approximately in genetic equilibria without reaching mutation-selection balance. The third scenario is that of nonequilibrium populations due to recent population admixture: we can have P_{AA} < P_{aa} fairly easily if two populations mix and the larger proportion is from the population homozygous for a.
The observed negative correlations between fitness traits and heterozygosity in a number of studies (Gains et al. 1978; Pierce and Mitton 1982; Mitton and Grant 1984; Allendorf and Leary 1986; Zouros and Foltz 1987; Lynch and Walsh 1997) have not attracted much attention. Hence, there has been hardly any satisfactory genetic explanation for them. Researchers have generally attributed these negative correlations to statistical artifacts and commented that with a parametric slope of nearly zero, negative correlations can result by chance. However, we show here that the negative correlations may indeed result if, at each locus, heterozygote fitness is smaller than the frequency-weighted mean fitness of the homozygotes, or h > h_{c}. As noted in Figure 1 and Equations 6, 7, 8, 9 and 10, for a diallelic system, the negative correlations can actually occur under a wide range of parameter space of h and q in equilibrium and nonequilibrium populations. This conclusion contradicts that of Turelli and Ginzburg (1983) for multiallelic systems. By numerical analyses via computer simulations, they concluded that in populations with little linkage disequilibrium, average fitness always increases with genomic heterozygosity. The difference between our conclusions and those of Turelli and Ginzburg (1983) is not due to the genetic systems under study (biallelic vs. multiallelic), since we clearly show that the negative correlations can result under multiallelic systems (Equation 11). Turelli and Ginzburg (1983) did not study the influence of the detailed genetic effect h (and s) in association with gene (or genotype) frequencies on the correlations.
The potentially common negative correlations are not inconsistent with the widely observed inbreeding depression. For a single locus with two alleles, the necessary and sufficient condition for population inbreeding depression to occur is that the heterozygote has fitness greater than the arithmetic mean fitness of the two corresponding homozygotes, not weighted by their population frequencies as in Equations 6 and 10 (Crow and Kimura 1970; Falconer and Mackay 1996). Therefore, the widely observed inbreeding depression does not necessarily imply that the negative correlation should be rare. In fact, when q is small, the negative correlation should be more likely unless h is also very small (Figure 1). Note that underdominance (where h > 1) may be an alternative and sufficient explanation for the negative correlations. However, it is not a necessary one, since h does not have to be greater than 1.0 if h_{c} < 1. As long as h > h_{c}, a negative correlation will exist. Figure 1 depicts the parameter space where inbreeding depression and negative correlations can co-occur in a single population. Because of this potential co-occurrence of inbreeding depression (heterosis) and negative correlations, the positive correlations may not be interpreted as equivalent to heterosis.
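The co-occurrence of inbreeding depression and a negative correlation can be illustrated with a small check. With genotypic values 1, 1 − hs, and 1 − s and s > 0, the heterozygote exceeds the unweighted mean of the two homozygotes, 1 − s/2, exactly when h < 0.5. The negative-correlation condition is taken here as h > q^{2}/(p^{2} + q^{2}), which is our assumed form of h_{c} at Hardy-Weinberg equilibrium; the function names and parameter values are hypothetical.

```python
def inbreeding_depression(h: float) -> bool:
    """Heterozygote above the arithmetic mean of the two homozygotes
    (1 - hs > 1 - s/2 for s > 0), i.e. h < 0.5."""
    return h < 0.5


def negative_correlation(h: float, q: float) -> bool:
    """Heterozygote below the frequency-weighted homozygote mean,
    i.e. h > q^2 / (p^2 + q^2) (assumed form of h_c)."""
    p = 1.0 - q
    return h > q * q / (p * p + q * q)


# Hypothetical parameter combinations.
for h, q in [(0.2, 0.1), (0.2, 0.4), (0.6, 0.1)]:
    print(f"h = {h}, q = {q}: inbreeding depression = {inbreeding_depression(h)}, "
          f"negative correlation = {negative_correlation(h, q)}")
```

Both conditions hold whenever h lies between h_{c} and 0.5, which is possible only when q < 0.5.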
It should be pointed out that although the derivation for multiple loci in the theory section is based on the assumptions of constant effects (h and s) and a multiplicative fitness function, these assumptions are not essential for our main conclusions. In the theory section, for the one-locus model, we also showed the same basic results as for the multilocus model. If h_{i} and s_{i} are variable across loci, each locus will have its own q_{i}, and thus its own h_{ic}, at mutation-selection balance. At the ith locus, h_{ic} is determined by the equilibrium q_{i} (Equation 7), which in turn depends on the specific allelic effects h_{i} and s_{i} for equilibrium populations at mutation-selection balance (Crow and Kimura 1970). Under either multiplicative or additive fitness functions across loci, as long as h_{i} < h_{ic} at each locus, being a heterozygote has higher expected fitness than being a homozygote; thus, a positive correlation between fitness and genomic heterozygosity will result. Similar conclusions hold for the situations of negative correlations. In cases where h_{i} < h_{ic} at some loci and h_{i} > h_{ic} at other loci, either correlation or no correlation could exist, depending on the fitness effects and the number of loci with h_{i} < h_{ic} relative to those with h_{i} > h_{ic}.
The explanation for the commonly observed positive correlations between fitness (or its related traits such as developmental stability) and molecular marker loci may be complex. In populations that are not strictly panmictic with local inbreeding present, the genomic heterozygosity may be correlated with individual levels of inbreeding. Therefore, the associations between heterozygosity and fitness are largely the consequence of variation in the level of inbreeding among individuals (Ledig et al. 1983; Strauss 1986). In randomly mating populations, if linkage disequilibrium is present (whether due to random genetic drift, selection or other causes), linked deleterious alleles under dominance at different loci will likely cause associated overdominance (Houle 1989, 1994). This may also explain the positive correlations. However, we have particularly shown, via analytical approaches supplemented by simulations, two novel results for populations at genetic equilibria. The first is that the positive correlation can result with dominance. The second is that the positive correlation may not necessarily always reflect the phenomenon of heterosis, for the two reasons argued earlier: (1) the positive correlation may potentially exist even under within-locus additive allelic effects, and (2) the possible co-occurrence of inbreeding depression (heterosis) and negative correlations implies that the positive correlations may not be interpreted as equivalent to heterosis. In populations at genetic equilibria, functional overdominance may be an alternative and sufficient explanation (Smouse 1986); however, it is not a necessary one. Therefore, even for populations at equilibria, the positive correlation observed cannot by itself be evidence for overdominance; it may not even be evidence for heterosis expected with dominance/overdominance either, since it could potentially exist under pure additive allelic effects.
An implication of our results is that there is probably a limitation of the correlation approach for distinguishing the genetic mechanisms responsible for the maintenance of genetic variability, for the following reasons. First, in equilibrium populations, a positive correlation can be explained by both dominance (0 < h < 1.0) and overdominance (h < 0) as long as h < h_{c} (Equations 6, 7, 8, 9 and 10). Second, the negative correlations and heterosis can co-occur under the same genetic conditions (Figure 1). Third, even for the same genetic effect (i.e., the same h) in one species, both correlations could be revealed in different populations if these populations have different genotype frequencies due to different population origins and histories (Equation 10). The limitation of the correlation approach in inferring the mechanisms responsible for the maintenance of genetic variability was also pointed out before on different grounds (e.g., Houle 1994; Fu and Ritland 1996).
A potential application of the theoretical result here is that, for diallelic markers (such as those from RFLP), inference of the upper/lower bounds of h may be made when significant positive/negative correlations are found (Equations 6, 7, 8, 9 and 10). h is an important genetic parameter in population and evolutionary genetics and has been difficult to estimate (even for its bounds; Deng 1997; Deng et al. 1997), especially for those organisms for which controlled breeding is difficult. Using Equations 6, 7, 8, 9 and 10, however, the bounds of h may be estimated without controlled breeding by applying the traditional correlation approach for fitness and genomic heterozygosity in natural populations.
Acknowledgments
We thank Drs. D. Charlesworth, M. Lynch, R. Chakraborty, and M. Johnson for discussions and comments. We are also grateful to Dr. Z-B. Zeng and three anonymous reviewers for their helpful comments that improved the paper. H.W. Deng would like to thank Dr. M. Lynch for years of advice, and Dr. D. Hedgecock for providing support to attend the conference “Genetic and Physiological Basis of Heterosis,” which stimulated the development of this work. The work was in part supported by a FIRST AWARD from the National Institutes of Health to Dr. Y.X. Fu. H.W. Deng was supported by a grant from the Health Future Foundation to Drs. R. Recker and D. Kimmel while finalizing this paper.
Footnotes

Communicating editor: Z-B. Zeng
Received August 12, 1997.
Accepted November 7, 1997.
 Copyright © 1998 by the Genetics Society of America