A Method for Estimating the Intensity of Overdominant Selection From the Distribution of Allele Frequencies
Montgomery Slatkin, Christina A. Muirhead


A method is proposed for estimating the intensity of overdominant selection scaled by the effective population size, S = 2Ns, from allele frequencies. The method is based on the assumption that, with strong overdominant selection, allele frequencies are nearly at their deterministic equilibrium values and that, to a first approximation, deviations depend only on S. Simulations verify that reasonably accurate estimates of S can be obtained for realistic sample sizes. The method is applied to data from several loci in the major histocompatibility complex (Mhc) in numerous human populations. For alleles distinguished by both serological typing and the sequence of the peptide-binding region, our estimates of S are comparable to those obtained by analysis of DNA sequences in showing that selection is strongest on HLA-B and weaker on HLA-A, HLA-DRB1, and HLA-DQA1. The intensity of selection on HLA-B varied considerably among populations. Two populations, Native American and Inuit, showed an excess rather than a deficiency in homozygosity. Comparable estimates of S were obtained for alleles at Mhc class II loci distinguished by serological reactions (serotyping) and by differences in the amino acid sequences of the peptide-binding region (molecular typing). A comparison of two types of data for DQA1 and DRB1 showed that serotyping led to generally lower estimates of S.

THE extensive polymorphism at the major histocompatibility complex (Mhc) loci in humans and other vertebrates is probably attributable to balancing selection caused by the superior performance of the immune system of individuals more heterozygous at these loci. Hughes and Yeager (1998) summarize evidence in favor of this theory. In this article we describe a simple method for using allele frequencies at a locus to estimate the intensity of selection in favor of heterozygotes. Our method is based on the assumption that allele frequencies are held at almost the frequency expected under selection alone and that deviations from those frequencies are determined by the balance achieved between genetic drift and selection. In a previous article, we showed that making such an assumption leads to an excellent approximation for the numbers of alleles maintained at a locus under the combined effects of genetic drift, mutation, and overdominant selection (Slatkin and Muirhead 1999). We first develop the basic theory, then test the performance of our method on simulated data, and finally apply it to several published sets of allele frequencies at Mhc loci in humans. Our results are similar to those of Satta et al. (1994), who estimated selection intensities from the ratio of silent to replacement changes in the peptide-binding region of Mhc proteins. Edwards and Hedrick (1998) review other methods for estimating selection intensities in the Mhc region.


Our method assumes a single locus with k alleles in a population containing N diploid individuals and symmetric overdominant selection in which every heterozygous individual has a fitness 1 + s relative to every homozygous individual. The frequency of the ith allele is xi and, at the deterministic equilibrium under selection alone, xi = 1/k. We assume that genetic drift results in small deviations of allele frequencies from 1/k and write the deviation at the ith allele as ▪i = xi - 1/k. One generation of selection and genetic drift changes the ▪i to ξi . We can use the standard theory of population genetics to compute the approximate mean and variance of Δξi=ξiξi and the covariance between Δξi and Δξj for the ith and jth alleles (ij) under the assumption that s is small and N is large, E(Δξi)s(1k+ξi)(F1kξi) (1a) Var(Δξi)(1k+ξi)(11kξi)2N (1b) Cov(Δξi,Δξj)=(1k+ξi)(1k+ξj)2N, (1c) where F=Σixi2 is the homozygosity.

The general solution to a multidimensional diffusion equation based on (1) has been obtained by Watterson (1977) but it is difficult to use that result to derive an estimate of the intensity of selection. Here, we obtain approximate results for both the population and a sample from it by assuming that F = 1/k, the value at the equilibrium under selection, and that the ▪t are much smaller than 1/k. In other words, we assume that genetic drift results in only small deviations from the deterministic equilibrium. With these approximations E(Δξi)=sξik, (2a) Var(Δξi)=1(2Nk), (2b) Cov(Δξi,Δξj)=1(2Nk2), (2c) where (2b) assumes that k ⊢ 1, which is typically the case at Mhc loci. If k is small, then N in (2b) is divided by 1 - 1/k. A diffusion equation for the joint probability distribution of ▪1,..., ▪k can be derived from (2). The solution to that equation tells us that the joint probability distribution of ▪1,..., ▪k is a symmetric multivariate normal with mean values of 0, variances 1/(4Ns), and covariances -1/(4Nsk) (Felsenstein 1977). Because xi ξi + 1/k, the joint distribution of (x1,..., xk) is multivariate normal with means 1/k, with equal variances given by (2b) and with equal covariances given by (2c).

Under this model and under the assumption of strong selection, the joint distribution of allele frequencies in the population depends on only one combination of parameters, 4Ns, which we denote by 2S to be consistent with previous notation. The problem is to estimate S from a sample from a population. A data set consists of a set of counts of each allele, {ij} (j = 1,..., k), where Σj=1kij=n , the sample size (n/2 being the number of individuals sampled). In general, the true number of alleles at the locus is unknown so, in our analysis, k means the number of alleles found in the sample and the ij are all nonzero. The value of k can and does differ among populations. Our method assumes in effect that differences among populations in the number of alleles found in a sample reflect differences among populations in the true number of alleles present. It would be easy to modify the method presented here so that the true number of alleles is assumed known and that some of the ij are zero. Such a modification would increase the apparent variance in allele frequencies and reduce the estimated values of S.

To simplify the analysis, we further approximate the multivariate normal distribution of allele frequencies by a symmetric Dirichlet distribution, Pr(x1,,xka)=Γ(ka)(Γ(a))kj=1kxja1, where the parameter a is chosen so that S = k2(1 + ka)/(k - 1), which ensures that both distributions have the same means, variances, and covariances. The reason for using a Dirichlet rather than a multivariate normal distribution is that the sampling distribution has a particularly simple form, the multinomial Dirichlet distribution, Pr(i1,,ika)=j=1k(a)(ij)(ka)(n), (3) where (a)(i) = a(a + 1)... (a + i - 1) (Johnsonet al. 1997). For large values of a, there is little difference between Dirichlet and multinomial distributions, and the use of a Dirichlet distribution ensures that all allele frequencies are positive.

Given the data, Equation 3 provides the likelihood of a and hence of S. We have written a short Mathematica (Wolfram 1996) program to find the maximum-likelihood estimate (MLE) of S, Ŝ, and will distribute copies of that program upon request. This program either provides Ŝ or it indicates that the likelihood function is a monotonically increasing function of S, implying that there is no finite MLE value. In that case, we report the result as Ŝ = ∞. The value of Ŝ is always positive because of the functional dependence of the likelihood on a.

Two extreme hypotheses, S = 0 and S = ∞, can be considered separately. Neutrality (S = 0) can be tested using either Watterson’s (1977) test based on the observed homozygosity, F (also called the Ewens-Watterson test), or an exact test (Slatkin 1994). We used the program described by Slatkin (1996) to carry out the Ewens-Watterson test and provide the tail probability PH. The value of PH is the probability under neutrality (S = 0) of obtaining a value of F no larger than the observed value in a random sample of the same size containing the same number of alleles. A small value of PH indicates too little homozygosity and can be the result of overdominant selection. A value of PH close to 1 indicates excess homozygosity and can be the result of purifying selection. The exact test is similar but was not designed to be powerful against the specific alternative of overdominant selection.

The other extreme hypothesis is S = ∞, in which case the allele frequencies are at their deterministic equilibrium value, 1/k, and the deviations from those frequencies are attributable to sampling only. The test for S = ∞ is then a test of the hypothesis that the sample is drawn from a multinomial distribution with equal probabilities and sample size n. For reasonably large sample sizes, a χ2 test with k - 1 d.f. can be used and a probability of the data under the hypothesis that S = ∞, PM (M for multinomial), can be obtained. A one-sided test is appropriate in this case because we assume random sampling is always present and the only question is whether there is significant deviation from equal frequencies. A value of PM > 0.05 indicates that the data do not reject the hypothesis that S = ∞. Our Mathematica program also computes PM.

Figure 1.

—Frequency distribution of Ŝ in two sets of 1000 replicates of a model with N = 10,000, u = 10-6 (M = 0.1), and s = 0.01 (S = 400). For n = 100, Ŝ was finite in 902 replicates and for n = 1000, Ŝ was finite in 997 replicates (cf. Table 1).

For any data set, our program and the program to test neutrality provide three numbers, Ŝ, PM, and PH. We found in some cases that PM < 0.05 but Ŝ = ∞, but we have found no cases in which PM > 0.05 but Ŝ is finite. In the simulated data sets described in the next section we found several for which PM > 0.05, but no such cases in any of the real data sets we analyzed.

Our method estimates S from the extent of variation in allele frequencies at a locus. The dependence on k is very weak. Our previous results (Slatkin and Muirhead 1999) and those of Takahata (1990) provide a formula relating the expected number of alleles to S and the scaled mutation rate, M = Nu: k=2Sln(S32πM2). (4)

Equation 4 can be solved for M as a function of k and S: M=S32πeSk2. (5) When Ŝ is finite, our program calculates M from Equation 5.

View this table:

Performance of method for finding Ŝ, the MLE of S = 2Ns in sets of 1000 replicate simulations for each set of parameter values


We applied our method to simulated data generated by the program described in our earlier article (Slatkin and Muirhead 1999). For each replicate, the parameters are S = 2Ns, M = Nu, and n, the sample size. In all cases, we assumed N = 10,000. Samples were taken after the numbers of alleles reached stationarity. For each set of parameter values we generated 1000 replicate data sets.

Typical distributions of Ŝ are shown in Figure 1 and the summary of results is shown in Table 1. For the parameter values we used in our simulations, estimates of S are relatively unbiased although they have broad 95% confidence intervals. Averages and confidence intervals reported in Table 1 are based on all replicates for which Ŝ < ∞. For a particular data set, it appears that if an estimate can be obtained at all, it is of the correct order of magnitude at least.

View this table:

Analysis of allele frequencies at several human MHC loci

In almost all cases, neutrality could be rejected using the Ewens-Watterson test even with a sample size of 100. For the examples summarized in Table 1 with n = 100, with u = 10-6 and s = 0.02 (S = 400 and M = 0.1) PH < 0.05 in 997 out of 1000 replicates and with u = 10-7, PH < 0.05 in all 1000 replicates. For the larger sample size (n = 1000) and both selection intensities (S = 400 and S = 4000), and for the smaller sample size (n = 100) and the stronger selection intensity (S = 4000), PH < 0.05 for all 1000 replicates in each case.

We also simulated some cases with weaker selection, S = 40. For this selection coefficient, we had to use higher mutation rates to obtain numbers of alleles comparable to what is found in many data sets we analyzed. In all cases, N = 10,000. With u = 10-5, the average number of alleles is ∼9 and for u = 5 × 10-5, the average is ∼15. For the smaller mutation rate, PH < 0.05 in 603 of 1000 replicates with n = 100, and PH < 0.05 in 785 of 1000 replicates with n = 1000. With the higher mutation rate, PH < 0.05 in 318 of 1000 replicates with n = 100 and 453 of 1000 replicates with n = 1000. In all cases, averages of Ŝ were biased upward somewhat.

Even when the entire population is sampled, Ŝ is not the true value. The reason is that Ŝ reflects two sources of variation, variation attributable to sampling and variation attributable to stochastic variation in the allele frequencies. Sampling the entire population removes only one source of variation. The relatively weak dependence of the variance of Ŝ on sample size shows that stochastic variation is the more important.

Estimates of M obtained in the same set of replicates are not as good and are probably not very useful. In most of the cases we examined, average estimates of M are not of the correct order of magnitude and the 95% confidence interval covers several orders of magnitude. For example, with n = 1000, S = 400, and M = 0.01, the average estimate of M was 0.11 and the 95% confidence interval was (9.8 × 10-8, 0.44). Our conclusion is that the method estimates S reasonably well, to within an order of magnitude at least, even for samples as small as 100, but that the estimate of M is not accurate enough to be useful.

Figure 2.

—Values of Ŝ for alleles distinguished by serotyping for four human Mhc loci (A, HLA-A; B, HLA-B; DQ, HLA-DQA1; DR, HLA-DRB1) in 11 populations (FRA, French; GER, German; ITA, Italian; NAI, Native American; INU, Inuit; JAP, Japanese; THA, Thai; VIE, Vietnamese; USA, United States; SEN, Senegalese).


We applied our method for estimating S to two published data sets. The first data set is based on alleles distinguished by serological typing (or serotyping) and was presented at the histocompatibility workshop held in 1991 (Tsujiet al. 1992). These data were analyzed previously by Satta et al. (1994) and copies of these data were kindly provided to us by Dr. Y. Satta. We analyzed data from four loci, two class I loci, human histocompatibility system HLA-A and HLA-B, and two class II loci, HLA-DQA1 and HLA-DRB1. The second data set was for class II loci for which alleles could be distinguished by differences in DNA sequence in the known or presumed peptide-binding regions (molecular typing), which are thought to be the targets of selection (Hughes and Yeager 1998). We analyzed the data for four loci, HLA-DQA1, HLA-DQB1, HLA-DPB1, and HLA-DRB1. These data were published in the proceedings of a later workshop on HLA (Charron 1997) and were analyzed previously by Valdes et al. (1999). These data were kindly provided by Dr. S. McWeeney. Only alleles at class II loci can be distinguished by molecular typing.

View this table:

Analysis of allele frequencies at four human MHC loci based on sequence typing of alleles

Alleles distinguished by serotyping: We restricted our analysis to populations in which the sample size was >100 chromosomes and the proportion of null or unrecognizable alleles was usually <0.05. To obtain wider geographic coverage, we included some populations with slightly higher frequencies of null alleles. Table 2 shows the populations, loci, and sample sizes, and the values of Ŝ and PH for each case. In no case was Ŝ = ∞ obtained and in all cases PM was 0 or nearly so, indicating that the hypothesis S = ∞ could always be rejected.

For HLA-A, values of PH indicate neutrality cannot be rejected in most populations, yet values of Ŝ are generally larger than those from HLA-DRB1 and HLA-DQA1, for which neutrality could usually be rejected. Neutrality could not be rejected for any of the samples from Native Americans and Inuits for any of the four loci. In those two populations, the value of PH for HLA-A is close enough to one that there appears to be significantly more homozygosity than expected under neutrality. It is difficult to know whether the excess homozygosity results from a problem with serotyping, from evidence of purifying selection, or from admixture in those populations. We considered the possibility that a population bottleneck could lead to excess homozygosity, but simulation results from a model of a bottlenecked population could not reproduce the patterns found in the Native American and Inuit samples (results not shown). A bottleneck generally results in the loss of alleles rather than excess homozygosity.

Our results are similar to those obtained by Satta et al. (1994). Our estimates of S are consistently lower than those in Table 2 of Satta et al. (1994). Their estimates of S were based on the comparison of pairwise differences in silent and replacement sites in the peptide-binding region. They used five different methods that gave somewhat different results. Their method I gave the lowest estimates: HLA-A, S = 690; HLA-B, S = 1200; HLA-DRB, S = 990; and HLA-DQB1, S = 360. For HLA-A, the largest values of Ŝ we found are slightly >200 (Figure 2). For HLA-B, most of the Ŝ values were between 500 and 800 with only one >1000. The ratio of Ŝ for HLA-B to HLA-A is in the range 2.5-4, which is slightly higher than the ratio found by Satta et al. for their method I but comparable to values they obtained using the other four methods. Given the wide confidence intervals associated with both methods for estimating selection intensities, our results are compatible with those of Satta et al. (1994).

Alleles distinguished by molecular typing: Table 3 shows the results for four class II loci. Values of Ŝ are consistently larger for DQB1 than DQA1. Values for DRB1 are the largest. The orders of magnitude of the values of Ŝ for the loci are the same as found by Satta et al. (1994) for the same four loci, but, as with the previous data set, our estimates of S are consistently smaller than theirs. For DQA1 and DRB1 we have estimates based on both serotyping and molecular typing. The data sets are mostly for different populations and, even when the same population is represented, different individuals were sampled. The results from analyzing the two data sets are comparable. Most values of Ŝ for European populations are between 50 and 100 with smaller values for Native American populations. The values of Ŝ based on molecular typing are somewhat smaller than based on serotyping for Native Americans. Values of Ŝ are larger for DRB1, and again Native American populations have the smallest values.

Figure 3.

—Values of Ŝ and PH from Table 3 for alleles distinguished by molecular typing for four Mhc class II loci in several human populations.


Our method provides a simple way to estimate the strength of overdominant selection from the observed distribution of allele frequencies. The estimate of S is essentially the inverse of the variance in frequencies. The multinomial Dirichlet distribution (3) is used to account for sampling. Our estimate of S is always an underestimate of the true intensity of selection because other factors, particularly differences in fitness among different alleles or anything else that causes deviations from the stationary distribution of allele frequencies, increase the variance and hence reduce the estimate of S. Our method assumes all variation in observed allele frequencies is the result of sampling and genetic drift.

The estimate of S obtained is an estimate of the net force tending to equalize allele frequencies. That force could be attributable only to overdominance in fitness, as we have assumed, or to frequency-dependent selection resulting from mating preferences or some other factor. The estimate itself does not indicate the kind of selection acting and does not validate a particular model of selection.

Another possible cause of differences in allele frequencies is bias in the mutation process, with some alleles or types of alleles arising more frequently than others. Intragenic recombination between different alleles could contribute to bias in mutation because recombination between alleles that are more different in DNA sequence in the peptide-binding region would be more likely to create functionally different alleles. To explore the effects of intragenic recombination, a sequence-based simulation model of the kind analyzed by Satta (1997) is necessary but that is beyond the scope of our study. Satta assumed that heterozygote fitness increased with the difference in DNA sequence of the peptide-binding region and allowed for intragenic recombination to create functionally different alleles. Both of these assumptions lead to increased variance in allele frequencies under selection alone and hence would reduce estimates of S obtained using our method.

Our method is similar to methods developed by M. N. Grote (unpublished results) and by P. Donnelly, M. Nordberg and P. Joyce (unpublished results), both of which rely on the theory of Joyce (1995). These methods are mathematically much more sophisticated than the method presented here and require extensive simulation to obtain an estimate of S for a single data set. Donnelly et al. apply their method to data for a locus with four alleles and observed counts (46, 166, 112, and 120) and estimate S to be 36. Our method gives 47, which is consistent with their results.

There is no simple relationship between our estimates of S and the tail probability from the Ewens-Watterson homozygosity test, PH. The variance in allele frequency is closely related to the homozygosity, Var(x)=1ki=1k(xix)2=1k(F1k), (6) where k is the number of alleles, xi is the frequency of the ith allele, x = 1/k is the average allele frequency, and F=Σi=1kxi2 is the homozygosity. But because of sampling, Ŝ and PH depend in complex ways on k and the counts of each allele. Figure 3 shows plots of Ŝ vs. PH for each of the four loci in Table 3. Although it would be convenient if PH indicated roughly the intensity of selection, it does not appear to do so very well.

Our estimates of S for HLA-A are of the same order of magnitude as those obtained by Satta et al. (1994) from the analysis of silent and replacement differences in the peptide-binding region of HLA-A. There appears to be some geographic variation in the intensities of overdominant selection in human populations. The broad confidence intervals on Ŝ we found in simulations mean that differences among European, Asian, and the one African population analyzed are not significant at any of the loci. It does appear that in the two North American populations, Native Americans and Inuit, either selection is much weaker than in the rest of the world or that admixture has resulted in the appearance of weaker selection.


We thank Dr. Y. Satta for providing copies of the data analyzed in the article by Satta et al. (1994) and Dr. S. McWeeney for providing copies of the data analyzed by Valdes et al. (1999), and for helpful discussions of this project. This research was supported in part by National Institutes of Health (NIH) grant GM40282 to M. Slatkin and by an NIH traineeship to C. A. Muirhead from grant GM07127.


  • Communicating editor: N. Takahata

  • Received May 22, 2000.
  • Accepted August 24, 2000.


View Abstract