Abstract
The sojourn times until extinction of an overdominant allele were investigated on the basis of the diffusion equation. In addition, the rate of accumulation of mutations, or the substitution rate, was predicted from the mean extinction time of a common overdominant allele. The theoretically calculated substitution rate agreed well with that obtained by computer simulation. Overdominant selection is known to enhance polymorphism at linked loci, but its effect on the sojourn times and the substitution rate at linked loci has not yet been studied. To address this question, a model of two linked loci, each with infinitely many alleles, was examined by computer simulation. Reducing the recombination rate between the two loci markedly changed the distribution of sojourn times of a neutral allele. Although overdominant selection clearly increased the sojourn times and the polymorphism at a linked locus, the rate of nucleotide substitution at the neutral locus was not significantly affected, even under complete linkage. These results suggest that, in regions containing overdominant genes, linked neutral loci will exhibit elevated levels of polymorphism, but their rate of molecular evolution remains that predicted by the neutral theory.
SEVERAL outstanding characteristics of overdominant alleles have been investigated on the basis of the infinite-allele model (e.g., Ewens 1964; Kimura and Crow 1964; Watterson 1977; Li 1978; Yokoyama and Nei 1979; Maruyama and Nei 1981; Takahata 1990; Takahata and Nei 1990). These include the allele frequency distribution, the homozygosity, the maintenance of polymorphisms, the rate of gene substitution, and the allelic genealogy. Sojourn times for a selected allele as well as for a neutral allele have also been studied using the diffusion equation (Maruyama 1974; Nagylaki 1974; Maruyama and Kimura 1975). However, the sojourn times of alleles with an arbitrary starting frequency at loci undergoing overdominant selection have yet to be determined. Under strong overdominant selection, the alleles maintained in a finite population can be divided by frequency into two distinct classes, common and rare (Takahata 1990; Takahata et al. 1992). An allelic turnover occurs at the overdominant locus when a common allele is lost from the population and a rare allele becomes common. Each such turnover results in the substitution of nucleotides at the sites that differ between the formerly common allele and the formerly rare one. The time interval between allelic turnovers is determined by the sojourn times of a common allele, and the substitution rate at the overdominant locus can be calculated from this interval (Takahata 1990). The sojourn times of a common overdominant allele are therefore of particular interest. However, the dynamics by which a common allele is lost from a finite population are not adequately understood. The primary aim of this article is to investigate, using a diffusion equation, the sojourn times of a common overdominant allele.
It is known that overdominant selection at a particular locus enhances the polymorphism at a linked nonoverdominant locus or region (Hudson and Kaplan 1988; Kaplan et al. 1988; Satta 1997; Takahata and Satta 1998). Neutral alleles at such a locus behave as if they were nonneutral and may evolve at a rate different from that expected under the neutral theory of Kimura (1968). There are several studies of associative overdominance (e.g., Ohta and Kimura 1970, 1971; Thomson 1977). However, no study has yet examined how the frequency of such a linked neutral allele changes until its extinction in a model that assumes infinitely many alleles at both loci. Here, the term “extinction” refers to the loss of an allele from the population. When determining what kind of natural selection operates at a given locus, we need to consider not only the sequence variation but also the distribution of allele frequencies (the frequency spectrum), which is obtained from the distribution of sojourn times. If the effect of overdominant selection on the frequency spectrum at a linked neutral locus is unclear, it is difficult to determine which locus is actually subject to natural selection. It is therefore desirable to obtain both the sojourn times and the frequency spectrum of alleles at a neutral locus linked to a locus undergoing overdominant selection. A second aim of this study is to explore, using computer simulation, the effect of overdominant selection on the sojourn times and the substitution rate at the linked neutral locus.
MODELS
Throughout this article, the infinite-allele model of Kimura and Crow (1964) is considered. In this model, mutations occur at rate u per gene per generation, and each mutation is assumed to produce a new, unique allele. Under symmetric overdominant selection, the relative fitnesses of homozygotes and heterozygotes are assumed to be 1 − s and 1, respectively.
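Under random mating, this fitness scheme gives a simple expression for the expected change in the frequency x of any one allele. The following standard weak-selection derivation is added for orientation and is not a transcription of the article's numbered equations. In it, w̄_x is the marginal fitness of the allele (its carrier is a homozygote with probability x), w̄ is the mean fitness, and J is the population homozygosity.

```latex
\begin{aligned}
\bar{w}_x &= x(1 - s) + (1 - x) = 1 - s x, \\
\bar{w} &= \sum_i p_i (1 - s p_i) = 1 - s J, \qquad J = \sum_i p_i^2, \\
\Delta x &= \frac{x(\bar{w}_x - \bar{w})}{\bar{w}} = \frac{s x (J - x)}{1 - s J} \approx s x (J - x) \quad (s \ll 1).
\end{aligned}
```

Mutation removes each allele at rate u, so the corresponding diffusion drift is approximately M(x) = s x(J − x) − u x. An allele is favored while x < J and opposed once x > J. This is why J must be known before sojourn times can be computed.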
Analysis of the overdominant allele: In this study, the sojourn times are calculated from the process that ends with the extinction of the allele in question. Because the sojourn times of neutral alleles under the infinite-allele model have already been determined by Maruyama (1974), the same approach can be applied to overdominant alleles. Let τ_{x}[y] be the expected total number of generations during which the frequency of an allele is y, given that its initial frequency is x. Then τ_{x}[y] is represented as (Equation 7 in Maruyama 1974)
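The sketch below evaluates τ_{x}[y] numerically once a drift term is chosen. It is illustrative and is not the article's Equation (1). It assumes the standard diffusion expression for an allele that can be lost only at frequency 0:

τ_{x}[y] = 2∫_0^{min(x,y)} ψ(z)dz / [V(y)ψ(y)]

Here V(y) = y(1 − y)/(2N), ψ(y) = exp(−∫_0^y 2M/V), and the assumed drift is M(y) = s y(J − y) − u y. The function names log_psi and sojourn_time and the midpoint-rule grid are hypothetical choices. For s = 0 the expression reduces to the neutral infinite-allele result, so 2Nu τ_{1/(2N)}[y] should approximate θ y^{−1}(1 − y)^{θ−1} with θ = 4Nu.

```python
import math

# Illustrative assumptions (not the article's Equation 1):
#   drift     M(y) = s*y*(J - y) - u*y
#   variance  V(y) = y*(1 - y) / (2N)
# so psi(y) = exp(-int_0^y 2M/V) = exp(-4Nsy) * (1 - y)**(-4N[s(1 - J) + u]).


def log_psi(y, N, s, u, J):
    """Natural log of psi(y) for 0 <= y < 1."""
    a = s * (1.0 - J) + u
    return -4.0 * N * (s * y + a * math.log1p(-y))


def sojourn_time(x, y, N, s, u, J, n=4000):
    """tau_x[y]: expected generations (per unit frequency) spent at frequency y
    by an allele that starts at frequency x and is lost only at frequency 0."""
    m = min(x, y)
    h = m / n
    # Midpoint rule for int_0^m psi(z) dz, done in log space to avoid overflow.
    logs = [log_psi((i + 0.5) * h, N, s, u, J) for i in range(n)]
    top = max(logs)
    log_int = top + math.log(h * sum(math.exp(v - top) for v in logs))
    log_v = math.log(y * (1.0 - y) / (2.0 * N))
    return 2.0 * math.exp(log_int - log_v - log_psi(y, N, s, u, J))


# Neutral check (s = 0): 2Nu * tau_{1/(2N)}[y] versus theta * (1 - y)**(theta - 1) / y.
N, u = 1000, 1e-5
theta = 4 * N * u
y = 0.3
print(2 * N * u * sojourn_time(1 / (2 * N), y, N, 0.0, u, 0.0),
      theta * (1 - y) ** (theta - 1) / y)  # the two values agree closely
```

Multiplying τ_{1/(2N)}[y] by 2Nu turns sojourn times into an expected number of alleles per unit frequency. The neutral check therefore tests the sketch against a known spectrum.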
To calculate the sojourn times, the equilibrium homozygosity J must first be evaluated. For the moment, therefore, let us focus on the computation of J, together with the equilibrium distribution of the number of alleles. Let Φ(p)dp be the expected number of alleles whose frequency lies in the range p to p + dp. It is given by Wright (1938), Ewens (1964), and Yokoyama and Nei (1979):
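J can be fixed by requiring the frequency spectrum to account for all genes, that is, ∫_0^1 pΦ(p)dp = 1. The sketch below assumes the Wright-type form:

Φ(p) = θ p^{−1}(1 − p)^{4N[s(1−J)+u]−1} e^{4Nsp}, with θ = 4Nu

This is what 2Nu τ_{1/(2N)}[p] gives under the assumed drift M(p) = s p(J − p) − u p. It may differ in detail from the article's (6) and (9). For s > 0 the integral increases with J, so a bisection on J suffices. The names log_mass and solve_homozygosity are hypothetical.

```python
import math

# Assumed Wright-type spectrum (an illustrative stand-in for the article's (6)/(9)):
#   Phi(p) = theta * p**-1 * (1 - p)**(4N[s(1 - J) + u] - 1) * exp(4Nsp),  theta = 4Nu.


def log_mass(J, N, s, u, n=10000):
    """log of int_0^1 p * Phi(p) dp, which must equal 0 at the equilibrium J."""
    expo = 4.0 * N * (s * (1.0 - J) + u) - 1.0
    h = 1.0 / n
    # p * Phi(p) = theta * exp(4Nsp) * (1 - p)**expo; midpoint rule in log space.
    logs = [4.0 * N * s * p + expo * math.log1p(-p)
            for p in ((i + 0.5) * h for i in range(n))]
    top = max(logs)
    return math.log(4.0 * N * u) + top + math.log(h * sum(math.exp(v - top) for v in logs))


def solve_homozygosity(N, s, u, tol=1e-8):
    """Bisection for J; the mass increases with J, so the root is unique."""
    if s <= 0.0:
        raise ValueError("with s = 0 this condition does not determine J")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if log_mass(mid, N, s, u) > 0.0:
            hi = mid  # too much mass, so the equilibrium J is smaller
        else:
            lo = mid
    return 0.5 * (lo + hi)


J = solve_homozygosity(1000, 0.1, 1e-5)
print(J, 1.0 / J)  # homozygosity and the effective number of alleles 1/J
```

Working in log space keeps the factor e^{4Nsp} from overflowing when Ns is large.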
Figure 1 shows the distribution of the sojourn times of a new mutant allele, τ_{1/(2N)}[y], based on (6) and (9), with N = 1000 and u = 10^{−5}. The mean extinction time of this allele is given by
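Summing the sojourn times over all frequencies gives the mean extinction time, t̄_x = ∫_0^1 τ_{x}[y]dy. The sketch below evaluates this integral numerically under illustrative assumptions:

- The drift is M(y) = s y(J − y) − u y.
- The variance is V(y) = y(1 − y)/(2N).
- The allele is lost only at frequency 0.

These are not the article's equations. The part y ≤ x uses the trapezoid rule, and the part y > x uses the midpoint rule in ln y. The sketch requires 4N[s(1 − J) + u] ≥ 1 so that the integrand stays bounded near y = 1. The function name mean_extinction_time is hypothetical.

For a neutral allele with 4Nu = 2 and x = 1/(2N), the integral has the closed form 2 ln(2N)/(1 − 1/(2N)), which the check uses. That case is chosen only because it is easy to verify; it is not a parameter set from the article.

```python
import math

# Illustrative assumptions (not the article's equations): drift
# M(y) = s*y*(J - y) - u*y, variance V(y) = y(1 - y)/(2N), loss only at y = 0.


def log_psi(y, N, s, u, J):
    a = s * (1.0 - J) + u
    return -4.0 * N * (s * y + a * math.log1p(-y))


def mean_extinction_time(x, N, s, u, J, n=20000):
    """Mean time to loss, int_0^1 tau_x[y] dy, for an allele starting at 0 < x < 1."""
    if 4.0 * N * (s * (1.0 - J) + u) < 1.0:
        raise ValueError("integrand unbounded near y = 1; a finer scheme is needed")
    # Part 1, 0 < y <= x: tau(y) = 2 G(y) / (V(y) psi(y)), with G(y) = int_0^y psi.
    h = x / n
    G, part1 = 0.0, 0.0
    prev_psi, prev_tau = 1.0, 4.0 * N  # psi(0) = 1 and tau(y) -> 4N as y -> 0
    for i in range(1, n + 1):
        y = i * h
        psi = math.exp(log_psi(y, N, s, u, J))
        G += 0.5 * (prev_psi + psi) * h  # trapezoid rule for G(y)
        tau = 2.0 * G / (y * (1.0 - y) / (2.0 * N) * psi)
        part1 += 0.5 * (prev_tau + tau) * h  # trapezoid rule for int tau dy
        prev_psi, prev_tau = psi, tau
    # Part 2, x < y < 1: tau(y) = 2 G(x) / (V(y) psi(y)); substitute t = ln y,
    # so dy = y dt and tau(y) * y = 4N G(x) / ((1 - y) psi(y)).
    t0 = math.log(x)
    dt = -t0 / n
    logs = []
    for i in range(n):
        y = math.exp(t0 + (i + 0.5) * dt)
        logs.append(math.log(4.0 * N * G) - math.log1p(-y) - log_psi(y, N, s, u, J))
    top = max(logs)
    part2 = math.exp(top) * dt * sum(math.exp(v - top) for v in logs)
    return part1 + part2


N = 1000
x = 1 / (2 * N)
print(mean_extinction_time(x, N, 0.0, 5e-4, 0.0), 2 * math.log(2 * N) / (1 - x))
```

The substitution t = ln y resolves the steep 1/y behavior just above x without needing a very fine uniform grid.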
To check the accuracy of these computations, simulation experiments were carried out. In each simulation, two genes are chosen at random from the population without replacement, and the genotype of the resulting individual is determined. If the individual is a homozygote, the homozygote disadvantage s is compared with a random number drawn uniformly from the range 0–1. If s is smaller than this number (which happens with probability 1 − s), the individual transmits one gene to the next generation. If the individual is a heterozygote, one of its two genes is chosen at random and transmitted to the next generation. This procedure is repeated until 2N genes have been transmitted. Mutation events are then applied to the transmitted genes, and for each new allele the parental allele and the time of mutation are recorded. After the population reached equilibrium, data for 2000 different alleles were recorded, and the mean sojourn times were calculated from this record (Figure 2). The theoretical expectations and the simulation results were in reasonable agreement. When s = 0.1, (6) gives a better approximation than (9). In contrast, when s = 0.01, the theoretical curve from (9) agrees fairly well with the simulation results.
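The transmission step described above can be sketched as follows. This is an illustrative reimplementation written for this text, not the authors' program. It makes three assumptions:

- Alleles are labelled by integers.
- "Without replacement" means that the two genes forming an individual are distinct, while genes are reused across individuals.
- Each transmitted gene mutates independently with probability u to a never-before-seen allele.

The names next_generation and allele_frequency_histogram are hypothetical. The histogram tallies allele frequencies in every generation rather than following individual alleles from birth to loss, as the article does.

```python
import random
from collections import Counter

# Hypothetical reimplementation of the single-locus procedure described in the text.


def next_generation(pop, s, u, rng, next_label):
    """One generation; pop is a list of 2N allele labels. Returns (offspring, next_label)."""
    if not 0.0 <= s < 1.0:
        raise ValueError("s must lie in [0, 1), otherwise homozygotes never reproduce")
    size = len(pop)
    offspring = []
    while len(offspring) < size:
        # Two distinct genes (drawn without replacement) form one individual.
        i = rng.randrange(size)
        j = rng.randrange(size - 1)
        if j >= i:
            j += 1
        a, b = pop[i], pop[j]
        if a == b:
            # Homozygote: transmits one gene only if s < a uniform random number.
            if s < rng.random():
                offspring.append(a)
        else:
            # Heterozygote: transmits one of its two genes at random.
            offspring.append(a if rng.random() < 0.5 else b)
    # Mutation after transmission: each gene becomes a brand-new allele with probability u.
    for k in range(size):
        if rng.random() < u:
            offspring[k] = next_label
            next_label += 1
    return offspring, next_label


def allele_frequency_histogram(N, s, u, generations, burn_in, bins=20, seed=1):
    """Allele-generations spent in each frequency bin after a burn-in period;
    roughly `generations` times the integral of Phi(p) over each bin."""
    rng = random.Random(seed)
    pop, next_label = [0] * (2 * N), 1
    hist = [0] * bins
    for t in range(burn_in + generations):
        pop, next_label = next_generation(pop, s, u, rng, next_label)
        if t >= burn_in:
            for count in Counter(pop).values():
                freq = count / (2 * N)
                hist[min(int(freq * bins), bins - 1)] += 1
    return hist
```

For example, allele_frequency_histogram(500, 0.5, 4e-5, 10000, 5000) runs a population of the size used later in the article. Longer runs are needed for smooth curves.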
In this study, an allele whose frequency is >p_{min} is defined as a common allele, and p_{max} is regarded as the mean frequency of a common allele at equilibrium. Figure 3 shows the sojourn times of a common allele with an initial frequency of p_{max} under overdominant selection (i.e., τ_{p_{max}}[y]). Unlike the sojourn times of a newly arisen mutant allele, the mean extinction time of a common allele,
Under overdominant selection, nucleotide substitution results from the allelic turnover of common alleles (Maruyama and Nei 1981; Takahata 1990; Takahata and Nei 1990). Thus, the rate of substitution can be calculated from the mean extinction time of a common allele, t_{c} (Takahata 1990). In brief, the number of common alleles, n_{c}, is stable at equilibrium. When one of the common alleles disappears from the population, a new allele carrying a new nucleotide change is expected to become common. Because each of the n_{c} common alleles persists for t_{c} generations on average, such turnovers occur at intervals of t_{c}/n_{c} generations. Each new nucleotide change is maintained at a frequency of 1/n_{c}; i.e., it takes t_{c}/n_{c} generations to accumulate 1/n_{c} mutations. We therefore express the mean rate of accumulation of mutations in the entire population, α, as
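Reading the verbal argument above literally gives the following arithmetic. It is offered as a sketch of that reasoning only and is not a reconstruction of the article's own expression for α, which may contain further factors.

```latex
\text{turnover interval} = \frac{t_c}{n_c}, \qquad
\text{frequency of each new change} = \frac{1}{n_c}, \qquad
\alpha \approx \frac{1/n_c}{t_c/n_c} = \frac{1}{t_c}.
```

On this reading n_{c} cancels, so α is governed by the mean extinction time of a common allele, t_{c}.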
Analysis of the neutral allele at the linked locus: Unlike the single-locus case, two linked loci, each with infinitely many alleles, are difficult to treat analytically. To examine the effect of overdominant selection on a linked neutral gene, we therefore used computer simulation. The recombination rate between the overdominant locus and the neutral locus is denoted by r.
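A two-locus version of the single-locus simulation described above can be sketched by treating each gene as a haplotype (overdominant allele, neutral allele). This is an assumed implementation, not the authors' code. It makes three assumptions:

- Selection acts only through homozygosity at the overdominant locus, with the same survival rule as before.
- A transmitted haplotype is recombinant with probability r, taking its overdominant allele from one parental haplotype and its neutral allele from the other.
- Each locus mutates independently to unique labels at its own rate.

The function name and the `labels` bookkeeping list are hypothetical.

```python
import random

# Hypothetical two-locus extension; not the authors' simulation program.


def next_generation_two_locus(pop, s, r, u_sel, u_neu, rng, labels):
    """One generation for an overdominant locus linked to a neutral locus.

    pop: list of 2N haplotypes (selected_allele, neutral_allele).
    labels: [next unused selected label, next unused neutral label]; updated in place.
    """
    if not 0.0 <= s < 1.0:
        raise ValueError("s must lie in [0, 1)")
    size = len(pop)
    offspring = []
    while len(offspring) < size:
        i = rng.randrange(size)
        j = rng.randrange(size - 1)
        if j >= i:
            j += 1  # two distinct haplotypes form one individual
        h1, h2 = pop[i], pop[j]
        # Selection acts only on homozygosity at the overdominant locus.
        if h1[0] == h2[0] and rng.random() <= s:
            continue  # this homozygote fails to transmit (probability s)
        if rng.random() < 0.5:
            h1, h2 = h2, h1  # h1 supplies the selected allele
        # With probability r, the neutral allele comes from the other haplotype.
        child = (h1[0], h2[1]) if rng.random() < r else h1
        offspring.append(child)
    # Independent mutation at each locus to never-before-seen labels.
    for k, (a, b) in enumerate(offspring):
        if rng.random() < u_sel:
            a = labels[0]
            labels[0] += 1
        if rng.random() < u_neu:
            b = labels[1]
            labels[1] += 1
        offspring[k] = (a, b)
    return offspring
```

Iterating this function and recording the frequency of each neutral allele in every generation yields sojourn-time data of the kind summarized in Figures 4 and 5.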
Among the overdominant loci, the major histocompatibility complex (MHC) loci have been investigated intensively, especially the human MHC, or human leukocyte antigen (HLA), loci located at 6p21.3. On the basis of HLA data, the effective size of the human population is considered to be ~10^{4}–10^{5} (see Takahata and Satta 1998). The selection coefficients at HLA loci are estimated to be in the range 0.0007–0.042 (Satta et al. 1994). The mutation rates in the antigen recognition site (ARS) are ~2.7 × 10^{−6} per gene per generation for the HLA class I loci and 0.8 × 10^{−6} for the HLA class II loci (Takahata and Satta 1998). To represent the realistic case of N = 10^{4}, u = 2 × 10^{−6}, and s = 0.025, the simulation parameters were set to N = 500, u = 4 × 10^{−5}, and s = 0.5, which preserves Nu = 0.02 and Ns = 250. As described above, it is the products of N and the other parameters that matter. A small N combined with correspondingly larger rates should therefore give approximately equivalent results. Figures 4 and 5 show the mean sojourn times of a newly arisen mutant allele at a linked neutral locus, with u = 4 × 10^{−5} (Figure 4) and u = 4 × 10^{−4} (Figure 5) assumed for the neutral locus. When r ≤ 10^{−3} (Nr ≤ 0.5), the sojourn times of the neutral allele peaked in the intermediate frequency class, where overdominant alleles also show a strong peak. The peak for overdominant alleles, however, was markedly higher than that for neutral alleles, even under complete linkage (r = 0). Just as the expected number of alleles at a linked locus is smaller than that at an overdominant locus even when r = 0, the sojourn times at a linked locus are shorter than those at an overdominant locus. Note that multiplying the sojourn times in Figures 4 and 5 by 2Nu gives the distribution of the expected number of alleles.
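The rescaling used above keeps every product of N and a per-generation rate fixed while N is reduced. The short sketch below reproduces the numbers; the function name rescale is hypothetical. Applied in the opposite direction, it shows what r = 10^{−3} in the simulated population (Nr = 0.5) corresponds to in a population of 10^{4}.

```python
# Hypothetical helper reproducing the parameter rescaling described in the text.
def rescale(N_from, N_to, **rates):
    """Rescale per-generation rates so that N * rate is unchanged when N_from -> N_to."""
    factor = N_from / N_to
    return {name: value * factor for name, value in rates.items()}


sim = rescale(1e4, 500, u=2e-6, s=0.025)  # u ~ 4e-5, s ~ 0.5 (Nu = 0.02, Ns = 250)
real = rescale(500, 1e4, r=1e-3)          # r ~ 5e-5 keeps Nr = 0.5 at N = 10^4
print(sim, real)
```

The same rule explains why a recombination threshold read directly from a small simulated population overstates the corresponding genetic distance in a larger one.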
For the same parameter sets, the following quantities were examined for a linked neutral locus (Table 3): the mean rate of accumulation of neutral mutations, the number of alleles, and the oldest divergence time among the alleles present at the end of the simulation. The number of alleles and the oldest allelic divergence time increased as r decreased, mainly because the mean coalescence time of two linked neutral genes increases (Takahata and Satta 1998). Unexpectedly, however, the rate of accumulation of mutations did not change with r. That is, the rate of molecular evolution at the linked neutral locus remained equal to the mutation rate.
DISCUSSION
The theoretical calculations in this study depend on Φ(p), because the equilibrium homozygosity, J, must be known before they can be carried out. Several one-dimensional approximations of the frequency spectrum have been obtained (Ewens 1964; Kimura and Crow 1964; Wright 1966; Watterson 1977; Yokoyama and Nei 1979; Takahata 1990). We used (6) in this study for consistency with (1) and (2). It is not clear which approximation is best. However, if N > 10^{4}, (6) does not differ appreciably from the other approximations.
The sojourn times, or the frequency spectrum, at a neutral locus located near an HLA locus are affected by the natural selection operating at HLA. Our results suggest that genes within 0.1 cM (r = 10^{−3}) of HLA are influenced by it. This genetic distance is, however, an overestimate, because it was obtained in the small simulated population (N = 500) without taking the actual population size into account. Recently, a high degree of polymorphism was reported at CYP21 (Cargill et al. 1999). This gene is located in the HLA region and encodes an enzyme involved in the biosynthesis of adrenal steroid hormones. Cargill et al. (1999) discussed the possibility that diversity at CYP21 is maintained by natural selection at the HLA loci. However, CYP21 lies >0.5 Mb (which corresponds to ~0.5 cM) away from HLA-B and HLA-DRB1. Therefore, the diversity of CYP21 does not seem to have been maintained by selection on HLA genes. If its mutation rate is not extremely high compared with that of other genes, CYP21 may instead be subject to some form of balancing selection, as pointed out by Cargill et al. (1999). To test this hypothesis, the frequency spectrum of CYP21 needs to be obtained from a random population sample.
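The CYP21 argument can be made roughly quantitative by comparing Nr for CYP21 with the simulated value Nr = 0.5, at or below which the linked neutral locus showed an intermediate-frequency peak. The sketch relies on two assumptions:

- The text's conversion of ~1 cM per Mb in this region.
- r ≈ 0.01 per cM over such short distances.

These are approximations rather than measurements from the article, and the function name is hypothetical.

```python
# Hypothetical helper; assumes ~1 cM per Mb and r ~ 0.01 per cM at short distances.
def scaled_recombination(distance_cM, N):
    """N * r for a locus distance_cM away from the selected locus."""
    return N * 0.01 * distance_cM


THRESHOLD = 0.5  # N r at or below which the simulated neutral locus peaked at mid frequencies

# CYP21 lies ~0.5 cM (> 0.5 Mb) from HLA-B and HLA-DRB1.
for N in (1e4, 1e5):
    value = scaled_recombination(0.5, N)
    print(N, value, value > THRESHOLD)
```

Under these assumptions, Nr for CYP21 is about 50 for N = 10^{4} and about 500 for N = 10^{5}, far above the simulated threshold. The 0.1 cM figure quoted for N = 500 gives exactly Nr = 0.5.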
Our results offer a key to understanding the synonymous substitutions in the ARS of HLA genes. The ARS of the HLA class II gene is composed of 16 amino acid residues, at codons 9, 11, 13, 28, 30, 37, 38, 57, 61, 67, 70, 71, 74, 78, 82, and 86 of the second exon. One-fifth of the nucleotide changes are thought to be synonymous, and the mutation rate is 2 × 10^{−8} per site per generation for humans. The synonymous mutation rate in the ARS is therefore ~0.2 × 10^{−6} per gene per generation for the HLA class II loci (0.2 × 10^{−6} ≈ 1/5 × 48 × 2 × 10^{−8}, where 48 = 16 codons × 3 nucleotides). Suppose that the long-term breeding size in the human lineage is 10^{4} (case I) or 10^{5} (case II). In our simulation, in which N = 500 is assumed, this synonymous mutation rate then corresponds to 0.4 × 10^{−5} or 0.4 × 10^{−4}, respectively. Likewise, the nonsynonymous mutation rate is taken to be 1.6 × 10^{−5} (case I) or 1.6 × 10^{−4} (case II). The resulting synonymous and nonsynonymous substitution rates are shown in Table 4. Here, we use the term substitution rate for the rate of accumulation of mutations. The number of synonymous changes in the ARS between any pair of alleles was larger for r = 0 than for r = 0.5 (data not shown), whereas the synonymous substitution rate was approximately equal to the synonymous mutation rate. Whether overdominant selection increases the synonymous substitution rate at HLA loci can be evaluated by interlocus comparisons of HLA genes. Hughes and Nei (1988, 1989) calculated the mean number of nucleotide substitutions per synonymous site, d_{S}, in the ARS and in the non-ARS between different loci (e.g., HLA-DQB1 vs. HLA-DRB1 and HLA-A vs. HLA-B). Suppose the synonymous substitution rate equals the synonymous mutation rate regardless of the degree of linkage. Then d_{S} in the ARS is expected to be no larger than that in the non-ARS, because d_{S} in interlocus comparisons is determined mainly by the synonymous substitution rate.
This prediction agrees with the results of Hughes and Nei (1988, 1989). Our simulation model is too simple to describe molecular evolution in the ARS of MHC genes in detail. Nevertheless, we can conclude that, unlike nucleotide diversity, the rate of synonymous substitution, even in the ARS, should not be enhanced by overdominant selection.
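The mutation rates used for cases I and II follow from simple arithmetic, which the sketch below reproduces as a check. It takes 16 ARS codons, or 48 nucleotide sites. One-fifth of changes are counted as synonymous and four-fifths as nonsynonymous. Rates are multiplied by N/500 to match the simulated population. The helper name ars_rates is hypothetical.

Without intermediate rounding, the case I values are 3.84 × 10^{−6} (synonymous) and 1.536 × 10^{−5} (nonsynonymous). The text rounds these, via 0.2 × 10^{−6} and 0.8 × 10^{−6} per gene, to 0.4 × 10^{−5} and 1.6 × 10^{−5}.

```python
# Hypothetical helper reproducing the mutation-rate arithmetic in the text.
def ars_rates(N_real, N_sim=500, codons=16, per_site=2e-8, syn_fraction=0.2):
    """Synonymous and nonsynonymous ARS mutation rates per gene per generation,
    rescaled so that N * rate is the same in the simulated population."""
    sites = 3 * codons                                # 48 nucleotide sites
    syn = syn_fraction * sites * per_site             # 1.92e-7, quoted as ~0.2e-6
    nonsyn = (1.0 - syn_fraction) * sites * per_site  # 7.68e-7, i.e. ~0.8e-6
    factor = N_real / N_sim
    return syn * factor, nonsyn * factor


for case, N_real in (("I", 1e4), ("II", 1e5)):
    syn, nonsyn = ars_rates(N_real)
    print(case, f"{syn:.3g}", f"{nonsyn:.3g}")
```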
Acknowledgments
We express our great thanks to Dr. Naoyuki Takahata for his interest and helpful comments. We are grateful to Dr. Andrew G. Clark for reviewing the manuscript and providing helpful suggestions and to the anonymous reviewers for their insightful comments. Thanks to Dr. Minato Nakazawa and Mr. Jun Hagihara for help in carrying out simulation experiments. This study was supported by a Grant-in-Aid for Scientific Research from the Japanese Ministry of Education, Science, Sports, and Culture.
Footnotes

Communicating editor: A. G. Clark
 Received June 30, 1999.
 Accepted February 14, 2000.
 Copyright © 2000 by the Genetics Society of America