Abstract
Variation among loci in the distribution of allele frequencies among subpopulations is well known; how to tell when the variation exceeds that expected when all loci are subject to uniform evolutionary processes is not well known. If locus-specific effects are important, the ability to detect those effects should vary with the level of gene flow. Populations with low gene flow should exhibit greater variation among loci in F_{st} than populations with high gene flow, because gene flow acts to homogenize allele frequencies among subpopulations. Here I use Lewontin and Krakauer’s k statistic to describe the variance among allozyme loci in 102 published data sets from fishes. As originally proposed, k ⪢ 2 was considered evidence that the variation in F_{st} among loci is greater than expected from neutral evolution. Although that interpretation is invalid, large differences in k in different populations suggest that locus-specific forces may be important in shaping genetic diversity. In these data, k is not greater for populations with expected low levels of gene flow than for populations with expected high levels of gene flow. There is thus no evidence that locus-specific forces are of general importance in shaping the distribution of allele frequencies at enzyme loci among populations of fishes.
BIOLOGISTS using molecular or biochemical markers to infer patterns of gene flow from population genetic structure commonly observe substantial variation among loci in the distribution of allele frequencies within and among populations, even among loci with similar levels of overall variation (e.g., Trexler 1988, Baer 1998a,b). For example, in Trexler’s (1988) data on the Sailfin molly (Poecilia latipinna) in Florida, F_{st} at loci with mean heterozygosity >0.2 ranges from 0.03 to 0.55. Two questions emerge from such data: (1) how much variation among loci is expected if all loci are subject to the same evolutionary forces; and (2) which loci, if any, reveal the truth about patterns of gene flow? Many biologists would suspect that different subsets of loci must be subject to different balances of evolutionary forces; the problem is how to transform suspicion into a hypothesis test.
The key fact is that different evolutionary forces act in characteristic ways with respect to the genome. Genetic drift, migration, and inbreeding are statistical sampling processes that, on average, affect all loci equally, whereas mutation, natural selection, meiotic drive, and assortative mating differ among loci. This fact provides a useful null hypothesis against which various evolutionary hypotheses may in principle be tested. If a set of loci is evolving neutrally and is subject to the same mutational processes, then the distribution of allele frequencies or trait values among subpopulations should have the same average value across loci or across traits. In particular, a set of loci subject only to drift and migration is expected to have the same average inbreeding coefficient, F, and by extension, the same average partitioning of the inbreeding coefficient into within- and among-population components, F_{is} and F_{st} (Cavalli-Sforza 1966; Lewontin and Krakauer 1973; Nei and Chakravarti 1977).
Cavalli-Sforza (1966) first suggested that a hypothesis test for natural selection could be based on the fact that a set of neutrally evolving loci will have the same expected inbreeding coefficient. Lewontin and Krakauer (1973) proposed a formal statistical test of this idea. They argued that the variance in F is proportional to the square of its mean value averaged across loci. They derived the equation for the theoretically expected variance in F,
σ^{2}_{F} = kF¯^{2}/(n − 1),   (1)
where n is the number of subpopulations sampled and k is a constant that depends on the underlying distribution of allele frequencies.
To determine the value of k, Lewontin and Krakauer simulated several distributions of allele frequencies among subpopulations and reached the conclusion that for neutral loci governed only by drift, k is ≤2 and is a decreasing function of F. Using the value k = 2 to establish the expected variance in F among loci, they demonstrated that the expected variance is distributed as a mean chi-square (chi-square/degrees of freedom), where the degrees of freedom are equal to the number of loci. They proposed that the ratio of the observed variance to the expected variance be compared to the critical chi-square value with the appropriate degrees of freedom; if the ratio of observed to expected variance is significantly large, the hypothesis of neutral evolution at all loci in the sample must be rejected. That is, by their argument, at least one locus in the sample must be under natural selection.
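The test just described can be sketched in a few lines. This is a minimal illustration, not the original procedure: it assumes Equation 1 has the standard Lewontin-Krakauer form, expected variance = kF¯^{2}/(n − 1), that the observed variance is the ordinary sample variance among loci, and that the caller supplies the chi-square critical value. The function names and the `df` default (number of loci, as stated in the text) are hypothetical choices made for the example.

```python
from statistics import mean, variance


def lk_variance_ratio(fst_per_locus, n_subpops, k=2.0):
    """Return (expected variance, observed variance, observed/expected)."""
    if n_subpops < 2 or len(fst_per_locus) < 2:
        raise ValueError("need at least two subpopulations and two loci")
    f_bar = mean(fst_per_locus)                  # unweighted mean F across loci
    if f_bar <= 0:
        raise ValueError("mean F must be positive")
    expected = k * f_bar ** 2 / (n_subpops - 1)  # assumed form of Equation 1
    observed = variance(fst_per_locus)           # sample variance among loci
    return expected, observed, observed / expected


def lk_test(fst_per_locus, n_subpops, chi2_critical, k=2.0, df=None):
    """True if observed/expected exceeds the mean chi-square criterion.

    chi2_critical is the upper critical chi-square value for `df`
    degrees of freedom, supplied by the caller (e.g., from a table).
    """
    if df is None:
        df = len(fst_per_locus)  # assumption: df = number of loci, as in the text
    _, _, ratio = lk_variance_ratio(fst_per_locus, n_subpops, k)
    return ratio > chi2_critical / df
```

The critical value is left as an argument so that the sketch needs only the standard library; it could be taken from a printed table or from a statistics package.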
Unfortunately, the Lewontin-Krakauer test is not generally valid as a test of natural selection for several reasons. Robertson (1975a,b) pointed out that anything that introduces a correlation among allele frequencies among loci within subpopulations will inflate the variance in F relative to the expected Lewontin-Krakauer value. Two processes that introduce such a correlation are stepping-stone migration and phylogenetic history, such that some subpopulations share a more recent common ancestor than others. Nei and Maruyama (1975) demonstrated that the transient effects of neutral mutations can also inflate the variance in F even if the rest of Lewontin and Krakauer’s assumptions are met. Furthermore, several authors (Ewens and Feldman 1976; Ewens 1977; Nei and Chakravarti 1977; Nei et al. 1977) showed that k is generally dependent on initial allele frequency and that highly skewed initial allele frequencies will typically result in a value of k > 2, even if all of Lewontin and Krakauer’s assumptions are met.
However, the problem attacked by Lewontin and Krakauer remains important, because it has become standard practice to use the variation in allele frequencies among populations (i.e., F_{st}) as an indirect estimator of gene flow (Slatkin 1985; Slatkin and Barton 1989), with the variation among loci taken as valid replicates of the same evolutionary process (Weir and Cockerham 1984; Weir 1990; Slatkin 1991, 1993). Yet, as such data continue to accumulate and be used for this purpose, large interlocus variation continues to emerge. Indeed, the issue is not merely an academic one among theoretical population geneticists; conservation and fisheries and wildlife management decisions are often made on the basis of such data (Moritz 1994).
Even in the absence of natural selection, it is possible that locus-specific processes are important in cases of large variation among loci. Specifically, it may be that some loci mutate more or less according to an infinite alleles model (i.e., all alleles of a given type are identical by descent), whereas recurrent mutation to identical allelic states is the rule at other loci (i.e., some alleles are identical in state but not identical by descent). If such variation in mutational properties among loci is present, it should be more apparent in populations with low levels of migration (large F¯_{st}) than in populations with high levels of migration (low F¯_{st}). Locus-specific effects can artificially hide real population subdivision, but they cannot artificially create the appearance of subdivision in a panmictic population except in the case of strong directional selection. That is, in a population with low gene flow, one set of loci (the “infinite alleles” loci) appears highly differentiated among subpopulations whereas another set of loci (the “recurrent mutation” loci) may not. In a population with high gene flow, however, all loci appear relatively homogeneous among subpopulations. Another way in which mutation can artificially hide the presence of real population structure is if there are undetected null alleles segregating; again, populations that are actually differentiated can appear homogeneous (although a high frequency of nulls may be indicated by a deviation from Hardy-Weinberg equilibrium). Finally, in populations with low levels of gene flow, the assumption of μ ⪡ m may be violated at some but not all loci, which leads to large variance among loci.
Herein, I use the Lewontin-Krakauer test to describe the variance among allozyme loci in the literature on fishes and then draw conclusions from the pattern of results. My approach is to initially assign an expected level of gene flow to a population (data set) from a priori biological and geographical considerations; if locus-specific effects are of general importance, populations with expected low levels of gene flow should generally exhibit greater variation in F_{st} among loci for the reasons noted above. Note that it is important to assign expected levels of gene flow a priori, because in any given case a small value of F_{st} may be due to locus-specific effects rather than high gene flow. It is also important to realize what the Lewontin-Krakauer test can and cannot do. Although it is not a valid test of selection per se for the reasons noted above, it can provide a one-sided test for locus-specific effects that is of some value to the biologist interested in inferring gene flow from F_{st}. Specifically, if the variance among loci is greater than expected by the Lewontin-Krakauer criterion, it cannot confidently be attributed to selection; any of the other explanations may hold. However, if a Lewontin-Krakauer test is not significant, then there is some theoretical justification for making the assumptions implicit in relating F_{st} to N_{e}m, the effective number of migrants, i.e., effective neutrality, weak mutation, and approximate migration-drift equilibrium (e.g., Nei 1973; Wright 1978; Crow and Aoki 1984; Slatkin 1985), and for treating each locus as a replicate of the same evolutionary process.
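The relation between F_{st} and N_{e}m referred to here is commonly taken from Wright’s island model, F_{st} ≈ 1/(4N_{e}m + 1). That approximation is not written out in the text, so the following is only an illustrative sketch under that assumption (and under the assumptions of neutrality, weak mutation, and migration-drift equilibrium listed above); the function names are hypothetical.

```python
def nem_from_fst(fst):
    """Effective number of migrants implied by F_st under the island model.

    Assumes F_st ~= 1 / (4 * Ne * m + 1); valid only for 0 < fst <= 1.
    """
    if not 0.0 < fst <= 1.0:
        raise ValueError("fst must be in (0, 1]")
    return (1.0 / fst - 1.0) / 4.0


def fst_from_nem(nem):
    """Equilibrium F_st implied by a given Ne*m under the same approximation."""
    if nem < 0:
        raise ValueError("Ne*m must be non-negative")
    return 1.0 / (4.0 * nem + 1.0)
```

Because the mapping is strongly nonlinear at small F_{st}, modest differences among loci in F_{st} can translate into large differences in the implied N_{e}m, which is one reason the variance among loci matters for this kind of inference.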
MATERIALS AND METHODS
The data analyzed here are taken from a subset of the literature on studies of gene flow and population structure in fishes (102 data sets from 77 publications; raw data are presented in an appendix available online at http://www.colostate.edu/Depts/Biology/Research/baer1999genetics.htm/). Fishes are particularly suitable for this study because they encompass almost the entire range of possible degrees of population structuring, from the essentially panmictic (large, pelagic marine species) to essentially isolated populations (lacustrine species or species endemic to springs). To be included, a data set had to have at least three natural subpopulations (e.g., hatchery populations or those known to be stocked were omitted) and at least three loci. If a study reported values of F_{st} and average heterozygosity for each locus, those values were taken directly as published. If a study did not report either of those two quantities, the raw allele frequency data were entered into BIOSYS I (Swofford and Selander 1989) and F_{st} for each locus was calculated using the WRIGHT78 step with the NOHRCHY option. This analysis assumes Hardy-Weinberg equilibrium and does not partition the total inbreeding coefficient into F_{is} and F_{st}. Because most authors do not report genotype frequencies, this was the only analysis possible. The WRIGHT78 protocol calculates F_{st} for each allele at a locus and an average value at a locus weighted by p¯_{i}(1 − p¯_{i}), where p¯_{i} is the mean frequency of the ith allele; this is equivalent to Nei’s (1973) G_{st}. In my analysis I use the weighted average F_{st} (of alleles within a locus) because that is the value usually reported in published studies. An alternative possibility would have been to use the original procedure of Lewontin and Krakauer, which is to calculate F_{st} for each allele at a locus and subtract a degree of freedom for each multiallelic locus. The Lewontin-Krakauer procedure provides greater statistical power, but because most authors report weighted average values of F_{st}, I chose to do so as well; the result is a conservative test.
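The weighted average described above can be illustrated as follows. This is a sketch of the calculation, not of BIOSYS itself: it assumes that the per-allele F_{st} is the variance in allele frequency among subpopulations divided by p¯_{i}(1 − p¯_{i}), that the variance uses the number of subpopulations as its denominator, and that subpopulations are not weighted by sample size. The function names are hypothetical. The second function computes G_{st} from heterozygosities so that the stated equivalence can be checked numerically.

```python
def locus_fst(freqs):
    """Weighted average F_st over the alleles at one locus.

    freqs[s][i] is the frequency of allele i in subpopulation s.
    Each per-allele F_st is weighted by p_bar_i * (1 - p_bar_i), so the
    weighted average reduces to sum(var_i) / sum(p_bar_i * (1 - p_bar_i)).
    """
    n = len(freqs)
    num = 0.0
    den = 0.0
    for i in range(len(freqs[0])):
        p = [row[i] for row in freqs]
        p_bar = sum(p) / n
        num += sum((x - p_bar) ** 2 for x in p) / n  # variance among subpops
        den += p_bar * (1.0 - p_bar)                 # weight for allele i
    if den == 0.0:
        raise ValueError("locus is monomorphic; F_st is undefined")
    return num / den


def nei_gst(freqs):
    """G_st = (H_T - H_S) / H_T from expected heterozygosities."""
    n = len(freqs)
    p_bar = [sum(row[i] for row in freqs) / n for i in range(len(freqs[0]))]
    h_t = 1.0 - sum(p * p for p in p_bar)
    h_s = sum(1.0 - sum(p * p for p in row) for row in freqs) / n
    if h_t == 0.0:
        raise ValueError("locus is monomorphic; G_st is undefined")
    return (h_t - h_s) / h_t
```

Under the stated assumptions the two functions return the same value for any locus, which is the sense in which the weighted average is equivalent to G_{st}.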
The first step of the analysis was to assign to each population an expected level of gene flow, from low (i.e., high expected F¯_{st}) to high (low expected F¯_{st}), from a priori biological considerations such as habitat, behavior, geographic distribution, etc. For example, my expectation is that yellowfin tuna will exhibit high levels of gene flow and the Leon Springs pupfish will exhibit low levels of gene flow. It is very important to realize that the expected level of gene flow of a population is often determined more by the geographical milieu in which a data set was collected than by the biological properties (e.g., swimming ability) of the species. For example, consider the large, vagile largemouth bass and the small, sedentary madtom. Within a river drainage, I expect bass to exhibit greater gene flow than madtoms. However, I expect there to be less gene flow between populations of bass in different drainages than between subpopulations of madtoms in the same drainage. Some species are included more than once, and of those, some are assigned different levels of expected gene flow from geographic considerations (see appendix at website). For example, the Atlantic salmon is included four times and appears in all three categories of expected gene flow due to the geographic properties of the samples. The unweighted mean value of F_{st} (among loci) was then used to calculate from Equation 1 the expected variance in a given study; the observed variance was calculated as usual.
These data were then used in two ways. First, I assumed that the theoretically expected variance in F_{st} in a study was in fact equal to the observed variance and calculated a value of k, substituting the observed variance for the expected in Equation 1 (see Nei and Chakravarti 1977; Nei et al. 1977). This procedure yields continuous data and allows the examination of k as a function of other variables (e.g., F¯_{st}, number of subpopulations, number of loci, taxon, ecological niche, etc.).
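The first use of the data amounts to solving Equation 1 for k. A minimal sketch, assuming Equation 1 has the form expected variance = kF¯^{2}/(n − 1) and that the observed variance is the sample variance among loci (the function name is hypothetical):

```python
from statistics import mean, variance


def estimate_k(fst_per_locus, n_subpops):
    """Solve the assumed Equation 1 for k using the observed variance.

    k = s^2 * (n - 1) / F_bar^2, where s^2 is the sample variance of
    F_st among loci and F_bar is the unweighted mean F_st among loci.
    """
    if n_subpops < 2 or len(fst_per_locus) < 2:
        raise ValueError("need at least two subpopulations and two loci")
    f_bar = mean(fst_per_locus)
    if f_bar == 0:
        raise ValueError("mean F_st is zero; k is undefined")
    return variance(fst_per_locus) * (n_subpops - 1) / f_bar ** 2
```

Note that k scales with the inverse square of F¯_{st}, so data sets with very small mean F_{st} can produce very large values of k from modest absolute variance.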
Second, I did two Lewontin-Krakauer tests on each data set, using the original criterion of k = 2 and Ewens’ (1977, p. 120) skewed (p¯ = 0.9) β-distribution criterion of k = 7.6. These tests yield categorical data; either the hypothesis of “neutral evolution” was not rejected (i.e., the variance was not larger than expected given the particular criterion of a test) or it was. Because the focus of this study concerns the strength of inferences about gene flow drawn from allele frequency data and not the inference of natural selection from those data, a conservative test in this case necessitates minimizing the possibility of type II error; accordingly, the level of significance was not corrected for multiple tests.
These analyses were done first for all loci at which the frequency of the common allele in at least one subpopulation was <0.95. I then repeated the analyses and included only loci with an expected heterozygosity [H_{T} in Nei’s (1973) terminology, or the “total limiting variance” of Wright (1978)] of 0.2 or greater for two reasons, one theoretical and one empirical. First, as noted, loci with highly skewed allele frequencies cause k to be >2 in certain cases (Ewens and Feldman 1976; Ewens 1977; Nei and Chakravarti 1977; Nei et al. 1977). Second, the oddly behaved loci in Heterandria formosa (least killifish) that initially brought the issue to my attention had expected heterozygosities >0.2 (Baer 1998b). For this second set of analyses, I used Lewontin and Krakauer’s value of k = 2 and Ewens’ β-distribution value of k = 2.57; the latter value was calculated from Ewens’ (1977, p. 120) equation 110, which assumes that the common allele at a locus had a frequency of 0.73, the median value of allele frequencies at highly polymorphic loci averaged across data sets. Calculated statistics that include only highly polymorphic loci are designated with the subscript “20” (e.g., k_{20}, F_{st,20}).
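The two inclusion criteria can be expressed as simple filters on the allele-frequency table of each locus. The sketch below assumes that H_{T} is computed as 1 − Σp¯_{i}^{2} from unweighted mean allele frequencies and that the “common allele” is the allele with the highest mean frequency; the data layout and function names are hypothetical.

```python
def mean_allele_freqs(freqs):
    """Unweighted mean frequency of each allele across subpopulations."""
    n = len(freqs)
    return [sum(row[i] for row in freqs) / n for i in range(len(freqs[0]))]


def total_heterozygosity(freqs):
    """H_T = 1 - sum of squared mean allele frequencies (assumed definition)."""
    return 1.0 - sum(p * p for p in mean_allele_freqs(freqs))


def passes_common_allele_filter(freqs, threshold=0.95):
    """True if the common allele is below `threshold` in at least one subpopulation."""
    p_bar = mean_allele_freqs(freqs)
    common = p_bar.index(max(p_bar))  # assumption: highest mean frequency
    return any(row[common] < threshold for row in freqs)


def select_loci(loci, min_ht=None):
    """Names of loci passing the 0.95 filter and, optionally, H_T >= min_ht.

    loci maps a locus name to freqs[s][i], the frequency of allele i
    in subpopulation s.
    """
    kept = []
    for name, freqs in loci.items():
        if not passes_common_allele_filter(freqs):
            continue
        if min_ht is not None and total_heterozygosity(freqs) < min_ht:
            continue
        kept.append(name)
    return kept
```

Calling `select_loci(loci)` would give the set used for the “all loci” analyses, and `select_loci(loci, min_ht=0.2)` the subset used for statistics carrying the subscript “20”.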
The distributions of k and k_{20} were approximately lognormal; means and 95% confidence limits (CL) were calculated from back-transformation of natural-log-transformed data. The distributions of F_{st} and F_{st,20} could not be satisfactorily normalized by transformation; median values and ranges are thus presented.
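Back-transformation of this kind can be sketched as follows. The example assumes that the confidence limits are computed on the log scale as mean ± critical value × standard error and then exponentiated; the default critical value of 1.96 is a normal approximation chosen for the example, and a t value for the appropriate degrees of freedom could be passed instead. The function name is hypothetical.

```python
import math
from statistics import mean, stdev


def backtransformed_mean_cl(values, crit=1.96):
    """Mean and confidence limits from natural-log-transformed data.

    Returns (mean, lower, upper) on the original scale. The mean is the
    geometric mean, and the limits are asymmetric around it because
    they are symmetric on the log scale.
    """
    if len(values) < 2 or any(v <= 0 for v in values):
        raise ValueError("need at least two strictly positive values")
    logs = [math.log(v) for v in values]
    m = mean(logs)
    se = stdev(logs) / math.sqrt(len(logs))  # standard error on the log scale
    return math.exp(m), math.exp(m - crit * se), math.exp(m + crit * se)
```

A consequence of this procedure is that the ratio of the upper limit to the mean equals the ratio of the mean to the lower limit.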
In any comparative study, the potential effects of phylogenetic nonindependence need to be considered. A formal comparative treatment of the data in this study would be problematic for two reasons. First, because of the sampling-dependent nature of the character “expected level of gene flow,” mapping character-state changes onto a tree would be meaningless for most clades. Second, although one could in principle use independent contrasts of the relationship of k with F_{st} (Felsenstein 1985), one would have to make some very speculative assumptions about the relative branch lengths. To account for the potential effects of phylogeny in a heuristic way, I calculated an average value of k for each of the 39 families represented in the data set and compared them to the results from the complete data set.
RESULTS
The prediction that locus-specific effects should lead to greater variance in F_{st} among loci in populations with low expected levels of migration (e.g., high F¯_{st}) compared to those with high expected levels of migration (low F¯_{st}) was not borne out. Lewontin and Krakauer’s k statistic did not differ among the three classes of expected levels of gene flow (one-way ANOVA, F_{2,99} = 0.822, P = 0.442; Table 1). The criteria by which populations were assigned an expected level of gene flow proved reliable; the average F¯_{st} was highest for “low” gene flow populations and lowest for “high” gene flow populations, with “medium” populations intermediate (Table 1). Regression of log(k) against F¯_{st} revealed no relationship between k and F¯_{st} (F_{1,100} = 1.443, P = 0.232, R^{2} = 0.014; Figure 1a). When only highly polymorphic loci were considered, there was again no difference in variance in F_{st} among loci among the different expected levels of gene flow (one-way ANOVA, F_{2,87} = 2.111, P = 0.127; Table 1). Regression of log(k_{20}) against F¯_{st,20} revealed a significant negative relationship between the variance among loci and F¯_{st}; populations with low expected levels of gene flow (high F¯_{st}) had smaller values of k_{20} than did those with high expected levels of gene flow (low F¯_{st}; log[k_{20}] = −2.101[F¯_{st,20}] + 1.500; F_{1,88} = 12.027, P = 0.001, R^{2} = 0.094; Figure 1b). This result is consistent with the expectation that k is a decreasing function of F¯_{st} under the Lewontin-Krakauer model (Lewontin and Krakauer 1973; Robertson 1975b).
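The regressions reported here are ordinary least-squares fits of log(k) on mean F_{st}, one point per data set. A minimal sketch of such a fit, assuming natural logarithms (the text does not state the base used for the regressions) and using a hypothetical function name:

```python
import math


def regress_log_k(fst_means, k_values):
    """Least-squares regression of ln(k) on mean F_st.

    Returns (slope, intercept, r_squared). Each pair (fst_means[j],
    k_values[j]) is one data set.
    """
    if len(fst_means) != len(k_values) or len(fst_means) < 3:
        raise ValueError("need at least three paired observations")
    x = list(fst_means)
    y = [math.log(k) for k in k_values]
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("no variation in x or in ln(k)")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept, sxy ** 2 / (sxx * syy)
```

A negative slope from such a fit corresponds to smaller k in data sets with larger mean F_{st}, which is the direction reported for the highly polymorphic loci.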
The mean value of k averaged over all 102 data sets is 5.92 (95% CL = 4.81, 7.29), which is >2 but less than the value of 7.6 predicted under the β-distribution of allele frequencies with a median allele frequency of p¯ = 0.9 (Table 1). When only highly variable loci are considered, the mean value of k_{20} averaged over all data sets is 2.82 (95% CL = 2.08, 3.81), which is close to the value of 2.57 predicted from the β-distribution with the median allele frequency of p¯ = 0.73. In both cases, k is smaller in populations with low expected levels of gene flow than in populations with medium or high expected levels of gene flow (Table 1).
When the Lewontin-Krakauer test is used to assess the pattern of variation, the general pattern of a decline in variation among loci with decreasing level of gene flow remains. When the original value of k = 2 is used and all loci are included, the test is significant in a substantial majority of cases and the pattern is consistent across classes of expected levels of gene flow (Table 2). When Ewens’ (p¯ = 0.9) criterion of k = 7.6 is used, the pattern is reversed; the observed value of k is no greater than expected in a large majority of cases, again consistent across classes. When only highly polymorphic loci are considered, the pattern is more complicated. For populations in which high gene flow is expected, the Lewontin-Krakauer test with k = 2 is significant a majority of the time (Table 2), but it is not significant in populations with low expected levels of gene flow; populations with intermediate levels of expected gene flow are intermediate. When Ewens’ criterion of k = 2.57 is used, the Lewontin-Krakauer test is significant in slightly fewer cases in all three gene flow categories, as expected (Table 2).
There are weak but highly significant relationships between k and both the number of subpopulations (log[k] = 0.032[n pops] + 1.421; F_{1,100} = 14.419, P < 0.001) and the number of loci included in a data set (log[k] = 0.074[n loci] + 1.172; F_{1,100} = 10.491, P = 0.002). However, these relationships disappear when only highly polymorphic loci are considered (n pops, F_{1,88} = 0.799, P = 0.374; n loci, F_{1,88} = 1.167, P = 0.283). There is no relationship between F_{st} and either number of subpopulations (F_{1,100} = 1.607, P = 0.208) or number of loci (F_{1,100} = 1.571, P = 0.213) included in a data set; the results for highly polymorphic loci are essentially identical.
When k was averaged within families without regard to expected level of gene flow, the mean k was 5.45 (95% CL = 4.12, 7.16; n = 39), which is very close to the uncorrected mean of 5.92. When averaged over families within individual categories of expected levels of gene flow, the results were again very similar to the uncorrected results (mean k, high E[gf] = 6.68, medium E[gf] = 4.68, low E[gf] = 4.71; see Table 1 for comparison). For highly variable loci, k_{20} averaged over families without regard to expected level of gene flow was 2.82 (95% CL = 1.45, 4.27; n = 38), exactly the same as the uncorrected value. The family averages of k_{20} within categories of expected gene flow were again very similar to the uncorrected results (mean k_{20}, high E[gf] = 3.35, medium E[gf] = 2.45, low E[gf] = 1.71; see Table 1 for comparison). The family means were distributed approximately lognormally, the same as the full data set. These results suggest that there is no important confounding effect of phylogenetic nonindependence on the initial results.
DISCUSSION
Most importantly, the results of this study lead to the conclusion that there is no general tendency for locus-specific effects to artificially mask real population structure. This is illustrated by the random (when all loci are considered) or negative (when only highly polymorphic loci are considered) relationship between k and F¯_{st}. This is good news for biologists interested in inferring patterns of gene flow from allozyme allele frequency data; it means that the assumptions necessary for that inference (effective neutrality, weak mutation, and approximate migration-drift equilibrium) seem in general to be valid, especially when only highly polymorphic loci are considered. Obviously, there are individual cases when those assumptions apparently are violated, as evidenced by the large values of k seen in some data sets (e.g., approximately an order of magnitude greater than even a liberally calculated expected value).
The fact that k calculated over all loci is greater than k calculated over only highly polymorphic loci is almost certainly due to the effect of differences in allele frequency per se, with skewed values of p tending to inflate the value of k (Ewens and Feldman 1976; Ewens 1977; Nei and Chakravarti 1977). It is possible, however, that in certain data sets “all loci” encompass two (or more) classes of loci that are under different degrees of evolutionary constraint (i.e., different neutral substitution rates; Cavalli-Sforza 1966; Lewontin and Krakauer 1973). That the larger value of k is observed when all loci are included is consistent with one class of loci, which includes highly polymorphic loci, being relatively unconstrained and the remainder of the loci being relatively more constrained, with variation at those loci consisting of a few rare alleles. This observation is not consistent with pervasive balancing selection, unless almost all highly polymorphic loci are under balancing selection, a possibility not usually entertained in the literature (but see Karl and Avise 1992). That F¯_{st} averaged over highly polymorphic loci is ∼40% greater than F¯_{st} averaged over all loci (Table 1) also argues strongly against the possibility of pervasive balancing selection at highly polymorphic loci, at least within subpopulations.
The random/negative relationship of k with F¯_{st} is perhaps surprising for another reason. As first pointed out by Robertson (1975a,b), population structure that results either from phylogenetic history or stepping-stone migration will increase the variance among loci over the Lewontin-Krakauer expectation. It was my initial expectation that such structure would be more likely in populations with low gene flow, which is yet another reason to expect the value of k to be greater in low gene flow populations. One possible explanation for the decrease in k with increasing F¯_{st} is that the distribution of allele frequencies among subpopulations in populations with low gene flow is governed almost solely by drift and that there is little historical information left in the data, a possibility that seems unlikely given what is known about the general utility of allozyme frequencies for phylogenetic reconstruction. Any correlations that are present would then occur primarily at short geographic distances due to stepping-stone migration (e.g., Slatkin 1993) and be maintained more readily in populations with high levels of gene flow. Another possibility (pointed out by an anonymous reviewer) is that because N_{e} will often decrease with increasing population subdivision, the effectively neutral mutation rate will increase with increasing population subdivision. Therefore, some loci that are subject to purifying directional selection in large populations will begin to accumulate effectively neutral variation in small ones. Such a situation could decrease the probability of observing classes of loci with different levels of selective constraint in small populations, which would lead to the observed negative relationship between k and F¯_{st}. There is at least anecdotal evidence for just such an effect of N_{e}. In a study of three species of Cyprinids (Tibbets and Dowling 1996), two of which are endangered and the third of which is not, the value of k for both endangered species is ≅1; for the more abundant species k ≅ 7. Likewise, in a study of three endangered Cyprinodontids (Echelle et al. 1987), k is ≅1 in all cases, whereas for two more abundant Cyprinodontids (Duggins et al. 1983), k is >5 in both species. The empirical relationship between N_{e} and the variance in F_{st} among loci warrants further investigation.
A potential criticism of these conclusions is the fact that in some cases a high proportion of Lewontin-Krakauer tests are statistically significant, particularly when all loci are considered in populations with high expected gene flow. However, given what is known about the behavior of k under a variety of drift-only (Nei and Chakravarti 1977), mutation-drift (Nei and Maruyama 1975; Ewens and Feldman 1976), and migration-drift (Robertson 1975a,b; Ewens and Feldman 1976; Nei et al. 1977) models, the observed average values of k are consistent with allozyme evolution in fishes being predominantly free of locus-specific effects (see the values of k reported in the cited articles). Obviously, a conservative philosophy would be to refrain from inferring gene flow from F_{st} in cases when a Lewontin-Krakauer test is statistically significant. In the near future there should be enough data available from studies of other classes of markers (microsatellites, anonymous single-copy nuclear DNA) to construct an independent test of the hypotheses presented here, although such analyses need to take absolute levels of variation into account (e.g., Pogson et al. 1995).
Finally, there is a pattern that emerges from the data that is worthy of comment, which is that F¯_{st} averaged only over highly polymorphic loci is sometimes substantially greater than when all loci are included in the analysis (averaged over all 102 data sets, the median F¯_{st,20} is ≅40% greater than F¯_{st} calculated over all loci). This is not a novel observation (e.g., Bossart and Prowell 1998), but the size of the database considered in this study emphasizes the point. The usual procedure of weighting the average F_{st} by the quantity p¯(1 − p¯) (Nei 1973) mitigates the situation, but the results of this study suggest that the common procedure of resampling across loci and excluding from the analysis those loci with values of F_{st} that fall outside the 95% CL (Weir and Cockerham 1984; Weir 1990) may in some cases result in omitting informative loci and retaining uninformative (or misinformative) ones. In light of these results, managers faced with decisions based on population genetic structure should consider using the largest estimate of F_{st} as their criterion for assessing the degree of structure present.
Acknowledgments
I thank Mike Antolin, Bill Black IV, Mike Hellberg, Tom Turner, Mike Whitlock, and two anonymous reviewers for discussions and/or comments on the manuscript. I am especially indebted to Steve Karl for a conversation in which my thinking on the subject crystallized and for sharing an unpublished manuscript and to Joe Travis for particularly insightful comments. Support was provided by a Florida State University Dissertation Fellowship and Colorado Agricultural Experiment Station Hatch Project no. 697 to M. Antolin.
Footnotes

Communicating editor: M. Slatkin
 Received November 16, 1998.
 Accepted February 22, 1999.
 Copyright © 1999 by the Genetics Society of America