Abstract
A method is proposed for estimating the intensity of overdominant selection scaled by the effective population size, S = 2Ns, from allele frequencies. The method is based on the assumption that, with strong overdominant selection, allele frequencies are nearly at their deterministic equilibrium values and that, to a first approximation, deviations depend only on S. Simulations verify that reasonably accurate estimates of S can be obtained for realistic sample sizes. The method is applied to data from several loci in the major histocompatibility complex (Mhc) in numerous human populations. For alleles distinguished by both serological typing and the sequence of the peptidebinding region, our estimates of S are comparable to those obtained by analysis of DNA sequences in showing that selection is strongest on HLAB and weaker on HLAA, HLADRB1, and HLADQA1. The intensity of selection on HLAB varied considerably among populations. Two populations, Native American and Inuit, showed an excess rather than a deficiency in homozygosity. Comparable estimates of S were obtained for alleles at Mhc class II loci distinguished by serological reactions (serotyping) and by differences in the amino acid sequences of the peptidebinding region (molecular typing). A comparison of two types of data for DQA1 and DRB1 showed that serotyping led to generally lower estimates of S.
THE extensive polymorphism at the major histocompatibility complex (Mhc) loci in humans and other vertebrates is probably attributable to balancing selection caused by the superior performance of the immune system of individuals more heterozygous at these loci. Hughes and Yeager (1998) summarize evidence in favor of this theory. In this article we describe a simple method for using allele frequencies at a locus to estimate the intensity of selection in favor of heterozygotes. Our method is based on the assumption that allele frequencies are held at almost the frequency expected under selection alone and that deviations from those frequencies are determined by the balance achieved between genetic drift and selection. In a previous article, we showed that making such an assumption leads to an excellent approximation for the numbers of alleles maintained at a locus under the combined effects of genetic drift, mutation, and overdominant selection (Slatkin and Muirhead 1999). We first develop the basic theory, then test the performance of our method on simulated data, and finally apply it to several published sets of allele frequencies at Mhc loci in humans. Our results are similar to those of Satta et al. (1994), who estimated selection intensities from the ratio of silent to replacement changes in the peptidebinding region of Mhc proteins. Edwards and Hedrick (1998) review other methods for estimating selection intensities in the Mhc region.
METHOD
Our method assumes a single locus with k alleles in a population containing N diploid individuals and symmetric overdominant selection in which every heterozygous individual has a fitness 1 + s relative to every homozygous individual. The frequency of the ith allele is x_{i} and, at the deterministic equilibrium under selection alone, x_{i} = 1/k. We assume that genetic drift results in small deviations of allele frequencies from 1/k and write the deviation at the ith allele as ▪_{i} = x_{i}  1/k. One generation of selection and genetic drift changes the ▪_{i} to
The general solution to a multidimensional diffusion equation based on (1) has been obtained by Watterson (1977) but it is difficult to use that result to derive an estimate of the intensity of selection. Here, we obtain approximate results for both the population and a sample from it by assuming that F = 1/k, the value at the equilibrium under selection, and that the ▪_{t} are much smaller than 1/k. In other words, we assume that genetic drift results in only small deviations from the deterministic equilibrium. With these approximations
Under this model and under the assumption of strong selection, the joint distribution of allele frequencies in the population depends on only one combination of parameters, 4Ns, which we denote by 2S to be consistent with previous notation. The problem is to estimate S from a sample from a population. A data set consists of a set of counts of each allele, {i_{j}} (j = 1,..., k), where
To simplify the analysis, we further approximate the multivariate normal distribution of allele frequencies by a symmetric Dirichlet distribution,
Given the data, Equation 3 provides the likelihood of a and hence of S. We have written a short Mathematica (Wolfram 1996) program to find the maximumlikelihood estimate (MLE) of S, Ŝ, and will distribute copies of that program upon request. This program either provides Ŝ or it indicates that the likelihood function is a monotonically increasing function of S, implying that there is no finite MLE value. In that case, we report the result as Ŝ = ∞. The value of Ŝ is always positive because of the functional dependence of the likelihood on a.
Two extreme hypotheses, S = 0 and S = ∞, can be considered separately. Neutrality (S = 0) can be tested using either Watterson’s (1977) test based on the observed homozygosity, F (also called the EwensWatterson test), or an exact test (Slatkin 1994). We used the program described by Slatkin (1996) to carry out the EwensWatterson test and provide the tail probability P_{H}. The value of P_{H} is the probability under neutrality (S = 0) of obtaining a value of F no larger than the observed value in a random sample of the same size containing the same number of alleles. A small value of P_{H} indicates too little homozygosity and can be the result of overdominant selection. A value of P_{H} close to 1 indicates excess homozygosity and can be the result of purifying selection. The exact test is similar but was not designed to be powerful against the specific alternative of overdominant selection.
The other extreme hypothesis is S = ∞, in which case the allele frequencies are at their deterministic equilibrium value, 1/k, and the deviations from those frequencies are attributable to sampling only. The test for S = ∞ is then a test of the hypothesis that the sample is drawn from a multinomial distribution with equal probabilities and sample size n. For reasonably large sample sizes, a χ^{2} test with k  1 d.f. can be used and a probability of the data under the hypothesis that S = ∞, P_{M} (M for multinomial), can be obtained. A onesided test is appropriate in this case because we assume random sampling is always present and the only question is whether there is significant deviation from equal frequencies. A value of P_{M} > 0.05 indicates that the data do not reject the hypothesis that S = ∞. Our Mathematica program also computes P_{M}.
For any data set, our program and the program to test neutrality provide three numbers, Ŝ, P_{M}, and P_{H}. We found in some cases that P_{M} < 0.05 but Ŝ = ∞, but we have found no cases in which P_{M} > 0.05 but Ŝ is finite. In the simulated data sets described in the next section we found several for which P_{M} > 0.05, but no such cases in any of the real data sets we analyzed.
Our method estimates S from the extent of variation in allele frequencies at a locus. The dependence on k is very weak. Our previous results (Slatkin and Muirhead 1999) and those of Takahata (1990) provide a formula relating the expected number of alleles to S and the scaled mutation rate, M = Nu:
Equation 4 can be solved for M as a function of k and S:
SIMULATION TEST
We applied our method to simulated data generated by the program described in our earlier article (Slatkin and Muirhead 1999). For each replicate, the parameters are S = 2Ns, M = Nu, and n, the sample size. In all cases, we assumed N = 10,000. Samples were taken after the numbers of alleles reached stationarity. For each set of parameter values we generated 1000 replicate data sets.
Typical distributions of Ŝ are shown in Figure 1 and the summary of results is shown in Table 1. For the parameter values we used in our simulations, estimates of S are relatively unbiased although they have broad 95% confidence intervals. Averages and confidence intervals reported in Table 1 are based on all replicates for which Ŝ < ∞. For a particular data set, it appears that if an estimate can be obtained at all, it is of the correct order of magnitude at least.
In almost all cases, neutrality could be rejected using the EwensWatterson test even with a sample size of 100. For the examples summarized in Table 1 with n = 100, with u = 10^{6} and s = 0.02 (S = 400 and M = 0.1) P_{H} < 0.05 in 997 out of 1000 replicates and with u = 10^{7}, P_{H} < 0.05 in all 1000 replicates. For the larger sample size (n = 1000) and both selection intensities (S = 400 and S = 4000), and for the smaller sample size (n = 100) and the stronger selection intensity (S = 4000), P_{H} < 0.05 for all 1000 replicates in each case.
We also simulated some cases with weaker selection, S = 40. For this selection coefficient, we had to use higher mutation rates to obtain numbers of alleles comparable to what is found in many data sets we analyzed. In all cases, N = 10,000. With u = 10^{5}, the average number of alleles is ∼9 and for u = 5 × 10^{5}, the average is ∼15. For the smaller mutation rate, P_{H} < 0.05 in 603 of 1000 replicates with n = 100, and P_{H} < 0.05 in 785 of 1000 replicates with n = 1000. With the higher mutation rate, P_{H} < 0.05 in 318 of 1000 replicates with n = 100 and 453 of 1000 replicates with n = 1000. In all cases, averages of Ŝ were biased upward somewhat.
Even when the entire population is sampled, Ŝ is not the true value. The reason is that Ŝ reflects two sources of variation, variation attributable to sampling and variation attributable to stochastic variation in the allele frequencies. Sampling the entire population removes only one source of variation. The relatively weak dependence of the variance of Ŝ on sample size shows that stochastic variation is the more important.
Estimates of M obtained in the same set of replicates are not as good and are probably not very useful. In most of the cases we examined, average estimates of M are not of the correct order of magnitude and the 95% confidence interval covers several orders of magnitude. For example, with n = 1000, S = 400, and M = 0.01, the average estimate of M was 0.11 and the 95% confidence interval was (9.8 × 10^{8}, 0.44). Our conclusion is that the method estimates S reasonably well, to within an order of magnitude at least, even for samples as small as 100, but that the estimate of M is not accurate enough to be useful.
APPLICATIONS
We applied our method for estimating S to two published data sets. The first data set is based on alleles distinguished by serological typing (or serotyping) and was presented at the histocompatibility workshop held in 1991 (Tsujiet al. 1992). These data were analyzed previously by Satta et al. (1994) and copies of these data were kindly provided to us by Dr. Y. Satta. We analyzed data from four loci, two class I loci, human histocompatibility system HLAA and HLAB, and two class II loci, HLADQA1 and HLADRB1. The second data set was for class II loci for which alleles could be distinguished by differences in DNA sequence in the known or presumed peptidebinding regions (molecular typing), which are thought to be the targets of selection (Hughes and Yeager 1998). We analyzed the data for four loci, HLADQA1, HLADQB1, HLADPB1, and HLADRB1. These data were published in the proceedings of a later workshop on HLA (Charron 1997) and were analyzed previously by Valdes et al. (1999). These data were kindly provided by Dr. S. McWeeney. Only alleles at class II loci can be distinguished by molecular typing.
Alleles distinguished by serotyping: We restricted our analysis to populations in which the sample size was >100 chromosomes and the proportion of null or unrecognizable alleles was usually <0.05. To obtain wider geographic coverage, we included some populations with slightly higher frequencies of null alleles. Table 2 shows the populations, loci, and sample sizes, and the values of Ŝ and P_{H} for each case. In no case was Ŝ = ∞ obtained and in all cases P_{M} was 0 or nearly so, indicating that the hypothesis S = ∞ could always be rejected.
For HLAA, values of P_{H} indicate neutrality cannot be rejected in most populations, yet values of Ŝ are generally larger than those from HLADRB1 and HLADQA1, for which neutrality could usually be rejected. Neutrality could not be rejected for any of the samples from Native Americans and Inuits for any of the four loci. In those two populations, the value of P_{H} for HLAA is close enough to one that there appears to be significantly more homozygosity than expected under neutrality. It is difficult to know whether the excess homozygosity results from a problem with serotyping, from evidence of purifying selection, or from admixture in those populations. We considered the possibility that a population bottleneck could lead to excess homozygosity, but simulation results from a model of a bottlenecked population could not reproduce the patterns found in the Native American and Inuit samples (results not shown). A bottleneck generally results in the loss of alleles rather than excess homozygosity.
Our results are similar to those obtained by Satta et al. (1994). Our estimates of S are consistently lower than those in Table 2 of Satta et al. (1994). Their estimates of S were based on the comparison of pairwise differences in silent and replacement sites in the peptidebinding region. They used five different methods that gave somewhat different results. Their method I gave the lowest estimates: HLAA, S = 690; HLAB, S = 1200; HLADRB, S = 990; and HLADQB1, S = 360. For HLAA, the largest values of Ŝ we found are slightly >200 (Figure 2). For HLAB, most of the Ŝ values were between 500 and 800 with only one >1000. The ratio of Ŝ for HLAB to HLAA is in the range 2.54, which is slightly higher than the ratio found by Satta et al. for their method I but comparable to values they obtained using the other four methods. Given the wide confidence intervals associated with both methods for estimating selection intensities, our results are compatible with those of Satta et al. (1994).
Alleles distinguished by molecular typing: Table 3 shows the results for four class II loci. Values of Ŝ are consistently larger for DQB1 than DQA1. Values for DRB1 are the largest. The orders of magnitude of the values of Ŝ for the loci are the same as found by Satta et al. (1994) for the same four loci, but, as with the previous data set, our estimates of S are consistently smaller than theirs. For DQA1 and DRB1 we have estimates based on both serotyping and molecular typing. The data sets are mostly for different populations and, even when the same population is represented, different individuals were sampled. The results from analyzing the two data sets are comparable. Most values of Ŝ for European populations are between 50 and 100 with smaller values for Native American populations. The values of Ŝ based on molecular typing are somewhat smaller than based on serotyping for Native Americans. Values of Ŝ are larger for DRB1, and again Native American populations have the smallest values.
DISCUSSION AND CONCLUSIONS
Our method provides a simple way to estimate the strength of overdominant selection from the observed distribution of allele frequencies. The estimate of S is essentially the inverse of the variance in frequencies. The multinomial Dirichlet distribution (3) is used to account for sampling. Our estimate of S is always an underestimate of the true intensity of selection because other factors, particularly differences in fitness among different alleles or anything else that causes deviations from the stationary distribution of allele frequencies, increase the variance and hence reduce the estimate of S. Our method assumes all variation in observed allele frequencies is the result of sampling and genetic drift.
The estimate of S obtained is an estimate of the net force tending to equalize allele frequencies. That force could be attributable only to overdominance in fitness, as we have assumed, or to frequencydependent selection resulting from mating preferences or some other factor. The estimate itself does not indicate the kind of selection acting and does not validate a particular model of selection.
Another possible cause of differences in allele frequencies is bias in the mutation process, with some alleles or types of alleles arising more frequently than others. Intragenic recombination between different alleles could contribute to bias in mutation because recombination between alleles that are more different in DNA sequence in the peptidebinding region would be more likely to create functionally different alleles. To explore the effects of intragenic recombination, a sequencebased simulation model of the kind analyzed by Satta (1997) is necessary but that is beyond the scope of our study. Satta assumed that heterozygote fitness increased with the difference in DNA sequence of the peptidebinding region and allowed for intragenic recombination to create functionally different alleles. Both of these assumptions lead to increased variance in allele frequencies under selection alone and hence would reduce estimates of S obtained using our method.
Our method is similar to methods developed by M. N. Grote (unpublished results) and by P. Donnelly, M. Nordberg and P. Joyce (unpublished results), both of which rely on the theory of Joyce (1995). These methods are mathematically much more sophisticated than the method presented here and require extensive simulation to obtain an estimate of S for a single data set. Donnelly et al. apply their method to data for a locus with four alleles and observed counts (46, 166, 112, and 120) and estimate S to be 36. Our method gives 47, which is consistent with their results.
There is no simple relationship between our estimates of S and the tail probability from the EwensWatterson homozygosity test, P_{H}. The variance in allele frequency is closely related to the homozygosity,
Our estimates of S for HLAA are of the same order of magnitude as those obtained by Satta et al. (1994) from the analysis of silent and replacement differences in the peptidebinding region of HLAA. There appears to be some geographic variation in the intensities of overdominant selection in human populations. The broad confidence intervals on Ŝ we found in simulations mean that differences among European, Asian, and the one African population analyzed are not significant at any of the loci. It does appear that in the two North American populations, Native Americans and Inuit, either selection is much weaker than in the rest of the world or that admixture has resulted in the appearance of weaker selection.
Acknowledgments
We thank Dr. Y. Satta for providing copies of the data analyzed in the article by Satta et al. (1994) and Dr. S. McWeeney for providing copies of the data analyzed by Valdes et al. (1999), and for helpful discussions of this project. This research was supported in part by National Institutes of Health (NIH) grant GM40282 to M. Slatkin and by an NIH traineeship to C. A. Muirhead from grant GM07127.
Footnotes

Communicating editor: N. Takahata
 Received May 22, 2000.
 Accepted August 24, 2000.
 Copyright © 2000 by the Genetics Society of America