IDT. Quality oligos. Every time.

Genetics, Vol. 155, 921-927, June 2000, Copyright © 2000

Sojourn Times and Substitution Rate at Overdominant and Linked Neutral Loci

Jun Ohashia and Katsushi Tokunagaa
a Department of Human Genetics, University of Tokyo, Graduate School of Medicine, Tokyo 113-0033, Japan

Corresponding author: Jun Ohashi, Department of Human Genetics, Graduate School of Medicine, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan., juno{at}m.u-tokyo.ac.jp (E-mail)

Communicating editor: A. G. CLARK


*  ABSTRACT
*TOP
*ABSTRACT
*MODELS
*DISCUSSION
*LITERATURE CITED

The sojourn times until fixation of an overdominant allele were investigated based on the diffusion equation. Furthermore, the rate of accumulation of mutations, or the substitution rate, was predicted from the mean extinction time of a common overdominant allele. The substitution rate calculated theoretically agreed well with that determined by computer simulation. Overdominant selection enhances the polymorphism at linked loci, while its effect on the sojourn times and the substitution rate at linked loci has not been studied yet. To solve these problems, a model that assumed two linked loci, each with infinite alleles, was examined by computer simulation. A decrease in the recombination rate between two loci markedly changed the distribution of sojourn times of a neutral allele. Although overdominant selection obviously increased the sojourn times and the polymorphism at a linked locus, the rate of nucleotide substitution at the neutral locus was not influenced significantly even if complete linkage was assumed. These results suggest that, in regions containing overdominant genes, linked neutral loci will exhibit elevated levels of polymorphism, but their rate of molecular evolution remains that predicted by neutral theory.


SEVERAL outstanding characteristics of overdominant alleles, such as the allele frequency distribution, the homozygosity, the maintenance of polymorphisms, the rate of gene substitution, and the allelic genealogy have been investigated based on the infinite-allele model (e.g., EWENS 1964 Down; KIMURA and CROW 1964 Down; WATTERSON 1977 Down; LI 1978 Down; YOKOYAMA and NEI 1979 Down; MARUYAMA and NEI 1981 Down; TAKAHATA 1990 Down; TAKAHATA and NEI 1990 Down). Also, sojourn times for a selected allele as well as a neutral allele have been studied using the diffusion equation (MARUYAMA 1974 Down; NAGYLAKI 1974 Down; MARUYAMA and KIMURA 1975 Down). However, the sojourn times of alleles with arbitrary starting frequency at loci undergoing overdominant selection have yet to be determined. Under strong overdominant selection, all alleles that are maintained in a finite population can be divided into two distinct classes, common and rare, based on the frequency (TAKAHATA 1990 Down; TAKAHATA et al. 1992 Down). An allelic turnover occurs when a common allele is lost from the population and a rare allele becomes a common one at the overdominant locus. When such an allelic turnover occurs, it results in the substitution of nucleotides at sites that differ between the formerly common and rare alleles. The time interval between allelic turnovers is given by the sojourn times of a common allele, and the substitution rate at the overdominant locus can be calculated based on this interval (TAKAHATA 1990 Down). Thus, of particular interest are the sojourn times of a common overdominant allele. However, the dynamics of how a common allele is lost from a finite population are not adequately understood. The primary aim of this article is, using a diffusion equation, to investigate the sojourn times of a common overdominant allele.

It is known that overdominant selection at a particular locus enhances the polymorphism at a linked nonoverdominant locus or region (HUDSON and KAPLAN 1988 Down; KAPLAN et al. 1988 Down; SATTA 1997 Down; TAKAHATA and SATTA 1998 Down). Neutral alleles at such a locus behave as if nonneutral and may show a different evolutionary rate from that expected from the neutral theory by KIMURA 1968 Down. Although there are several studies relating to associative overdominance (e.g., OHTA and KIMURA 1970 Down, OHTA and KIMURA 1971 Down; THOMSON 1977 Down), it has not been examined how the gene frequency of such a linked neutral allele would change until extinction on the basis of the model assuming infinite alleles for both loci. Here, the term "extinction" can be defined as the loss of an allele from the population. When we determine what kind of natural selection operates at a given locus, we need to consider not only the sequence variations but also the distribution of allele frequencies (frequency spectrum), which is obtained from the distribution of sojourn times. If the effect of overdominant selection on the frequency spectrum at a linked neutral locus is unclear, it is difficult to examine which locus is subjected to natural selection. Therefore it would be very desirable to obtain both the sojourn time and frequency spectrum of alleles at a neutral locus linked to a locus undergoing overdominant selection. Another purpose of this study is, using computer simulation, to explore the effect of overdominant selection on the sojourn times and the substitution rate at the linked neutral locus.


*  MODELS
*TOP
*ABSTRACT
*MODELS
*DISCUSSION
*LITERATURE CITED

Throughout this article, the infinite-allele model of KIMURA and CROW 1964 Down is considered. In this model, mutations occur at the rate of u per gene per generation and each new allele is assumed to be unique. We assume that the relative fitnesses of homozygotes and heterozygotes are 1 - s and 1, respectively, under symmetric overdominant selection.

Analysis of the overdominant allele:
In this study, the sojourn times are calculated based on the process until extinction of the allele in question. Since the sojourn times of infinite neutral alleles have already been determined by MARUYAMA 1974 Down, we can use the same approach for overdominant alleles. Let {tau}x[y] be the total number of generations in which the gene frequency is y, in the case that the initial allele frequency is x. Then {tau}x[y] is represented as (Equation 7 in MARUYAMA 1974 Down)

(1)


(2)

where G(p) = exp {-2{int}p()dp}. M{delta}p and V{delta}p are, respectively, the mean and the variance of the change in the gene frequency p per generation. Equation 1 and Equation 2 consider a possibility for temporary fixation of the allele. Under the infinite-allele model and symmetric overdominant selection, M{delta}p and V{delta}p are given by -up - and respectively, where J (= {Sigma}p2k) is the sum of the squares of the frequencies of all alleles and N is the size of a diploid population (KIMURA and CROW 1964 Down). As in previous studies (e.g., EWENS 1964 Down; KIMURA and CROW 1964 Down; TAKAHATA 1990 Down), J is assumed to be a constant and is given theoretically as a function of N, u, and s. Putting A = , we obtain

(3)

If we substitute V{delta}p and (3) into (1) and (2), then we obtain

(4)


(5)

Numerical calculations are required to compute the integral equation in (4) and (5) unless -4N(u + A - AJ) and -4NA are special values.

To calculate the sojourn times, J needs to be evaluated before the calculation. For the moment, let us focus on the computation of J as well as the equilibrium distribution of the number of alleles. Let {Phi}(p)dp be the expected number of alleles whose frequency is in the range of p to p + dp. It is given by WRIGHT 1938 Down, EWENS 1964 Down, and YOKOYAMA and NEI 1979 Down:

(6)

Equation 6 has a local maximum at pmax = and a local minimum at pmin = , if 0 < 1 - a + b < 2b, where we put a = 4N(u + A - AJ) - 1 and b = 4NA. Since the sum of allele frequencies is equal to unity, the relation

(7)

is satisfied. J, which satisfied (7), was calculated for several parameter sets of N, u, and s (Table 1). Once J is obtained, it can be evaluated by

(8)

because (8) gives the mean homozygosity. Although values of J from (8) are not quite identical to those of J from (7) due to the approximation, values of J from (7) are numerically close to those from (8) (data not shown). KIMURA and CROW 1964 Down and TAKAHATA 1990 Down obtained a different equation with reference to {Phi}(p):

(9)


 
View this table:
In this window
In a new window

 
Table 1. Numerical solutions for steady-state homozygosity (J)

They used -up - sp(p - J) and p/2N as M{delta}p and V{delta}p, respectively. As N increases, the difference between J from (6) and (9) becomes small (Table 1). In this study, J from (6) are used in the following calculations. When A {cong} s, {tau}[y] and {Phi}(p) are functions of Nu and Ns. As a consequence, the results of theoretical calculation and simulation for the particular parameters N, u, and s, can be regarded as the results for other parameter values chosen appropriately.

Fig 1 shows the distribution of the sojourn times of a new mutant allele, or {tau}1/2N[y], based on (6) and (9). Here we assumed that N = 1000 and u = 10-5. The mean extinction time of this allele is given by {int}1{tau}[y]dy. We can see that the extinction time of a particular mutant allele (s > 0) is mostly determined by the duration of its stay at the middle frequency class.



View larger version (20K):
In this window
In a new window
Download PPT slide
 
Figure 1. Distributions of sojourn times of a mutant overdominant allele with an initial frequency of 1/2N. The curves were calculated based on Equation 6 and Equation 9. N = 1000, u = 10-5.

To check the accuracy of our computation, simulation experiments were carried out. In the simulation, two genes are randomly chosen from a population without replacement, and the genotype is determined. If the individual is a homozygote, then we compare the homozygote disadvantage, s, with the random number that is generated in the range of 0–1. When the former is smaller than the latter, this individual can transmit one gene to the next generation. If the genotype is heterozygous, one of two genes is transmitted randomly to the next generation. This procedure is repeated until 2N genes are transmitted. Mutation events are considered after the transmission of the genes, and the parental allele and the time of mutation are recorded at the same time. After reaching equilibrium, the data for 2000 different alleles were recorded, and the mean sojourn times were calculated on the basis of this record (Fig 2). The results of theoretical expectation and simulation were in reasonable agreement with each other. When s = 0.1, (6) gives a better approximation than (9). In contrast, when s = 0.01, the theoretical curve from (9) is in fair agreement with the simulation results.



View larger version (13K):
In this window
In a new window
Download PPT slide
 
Figure 2. Distributions of sojourn times of a mutant overdominant allele with an initial frequency of 1/2N. The curves were obtained from computer simulations. N = 1000, u = 10-5.

In this study, an allele whose frequency is >pmin is defined as a common allele and pmax is regarded as the mean frequency of a common allele at equilibrium. Fig 3 shows the sojourn times of a common allele with an initial frequency of pmax under overdominant selection (i.e., {tau}pmax[y]). Unlike the sojourn times of a newly arisen mutant allele, the mean extinction time of a common allele, tc = {int}1 {tau}pmax [y]dy, decreased as s increased, as expected from (6) in TAKAHATA 1990 Down. tc for s = 0.01, s = 0.05, and s = 0.1 were ~18,784, 14,004, and 12,635 generations, respectively, in Fig 3. Likewise, the simulation showed the same tendency (tc for s = 0.01, s = 0.05, and s = 0.1 were 17,433, 15,526, and 10,875 generations, respectively). The main reason is that the number of common alleles is large and pmax is small under strong overdominant selection; accordingly one of the common alleles is apt to disappear from a finite population due to random genetic drift.



View larger version (13K):
In this window
In a new window
Download PPT slide
 
Figure 3. Distributions of sojourn times of a common overdominant allele with an initial frequency of pmax. N = 1000, u = 10-5.

The nucleotide substitution is caused by allelic turnover of common alleles under overdominant selection (MARUYAMA and NEI 1981 Down; TAKAHATA 1990 Down; TAKAHATA and NEI 1990 Down). Thus, the rate of substitution can be calculated on the basis of the mean extinction time of a common allele (TAKAHATA 1990 Down). In brief, the number of common alleles, nc, is stable at equilibrium. When one of the common alleles disappears from a population, a new allele with a new nucleotide change is expected to become the common one. This time interval is given by tc/nc, and a new nucleotide change is maintained at the frequency of 1/nc; i.e., it takes tc/nc generations until 1/nc mutations are accumulated. We, therefore, express the mean rate of accumulation of mutations in the entire population, {alpha}, as

(10)

where we measure {alpha} in units of 1/u generations. Equation 10 is not useful for estimating {alpha}, compared to TAKAHATA's (1990) formula,

(11)

However, (10) agreed well with the results of the simulation (Table 2).


 
View this table:
In this window
In a new window

 
Table 2. The rate of accumulation of mutations at the overdominant locus ({alpha})

Analysis of the neutral allele at the linked locus:
Unlike the single locus, it is difficult to treat the two linked loci with infinite alleles. To examine the effect of overdominant selection on the linked neutral gene, a simulation approach was used. The recombination rate between the overdominant locus and a neutral locus was denoted by r.

Among the overdominant loci, the major histocompatibility complex (MHC) loci, especially the human MHC, or human leukocyte antigen (HLA) loci that are located at 6p21.3, have been investigated intensively. On the basis of the HLA data, the effective size of the human population is considered to be ~104–105 (see TAKAHATA and SATTA 1998 Down). Also, the selection coefficients of HLA loci are estimated to be in the range of 0.0007–0.042 (SATTA et al. 1994 Down), and the mutation rates in the antigen recognition site (ARS) are ~2.7 x 10-6 per gene per generation for the HLA class I loci and 0.8 x 10-6 for the HLA class II loci (TAKAHATA and SATTA 1998 Down). To consider the realistic case that N = 104, u = 2 x 10-6, and s = 0.025, parameter values in the simulation were set as follows: N = 500, u = 4 x 10-5, and s = 0.5 (i.e., Nu = 0.02 and Ns = 250). As described above, the products of N and other parameter values were important, although N was small and other parameter values were large in our simulation. Fig 4 and Fig 5 represent the mean sojourn times of a newly arisen mutant allele at a linked neutral locus, where u = 4 x 10-5 (Fig 4) and u = 4 x 10-4 (Fig 5) were assumed for the neutral locus. If r is not >10-3 (Nr = 0.5), the peak of sojourn times appeared in the middle frequency class, where a strong peak was observed for overdominant alleles; the peak of the overdominant alleles was markedly stronger than that of neutral alleles even when complete linkage was assumed (r = 0). Just as the expected number of alleles at a linked locus is smaller than that at an overdominant locus even if r = 0, the sojourn times at a linked locus are smaller than those at an overdominant locus. It should be noted that we can obtain the distribution of the expected number of alleles if we multiply 2Nu by the sojourn times, which are given in Fig 4 and Fig 5.



View larger version (24K):
In this window
In a new window
Download PPT slide
 
Figure 4. Distributions of sojourn times of a linked neutral allele with an initial frequency of 1/2N. The curves were obtained from computer simulations. N = 500, u = 4.0 x 10-5 at both loci.



View larger version (22K):
In this window
In a new window
Download PPT slide
 
Figure 5. Distributions of sojourn times of a linked neutral allele with an initial frequency of 1/2N. The curves were obtained from computer simulations. N = 500, u = 4.0 x 10-5 at the overdominant locus, and u = 4.0 x 10-4 at the linked neutral locus.

For the same parameter sets, the mean rate of accumulation of neutral mutations, the number of alleles, and the oldest allelic divergence time among alleles that existed at the end of the simulation were examined for a linked neutral locus (Table 3). The number of alleles and the oldest allelic divergence time increased as r decreased, mainly because the mean coalescence time of two linked neutral genes increased (TAKAHATA and SATTA 1998 Down). Unexpectedly, however, the rate of accumulation of mutations did not vary irrespective of the value of r. That is, the rate of molecular evolution is still equal to the mutation rate.


 
View this table:
In this window
In a new window

 
Table 3. The rate of accumulation of mutations ({alpha}), the number of alleles (n), and the oldest allelic divergence time (T) at a linked neutral locus


*  DISCUSSION
*TOP
*ABSTRACT
*MODELS
*DISCUSSION
*LITERATURE CITED

Theoretical calculations in this study are dependent on {Phi}(p) because the equilibrium homozygosity, J, is required to be known prior to the calculations. Although several one-dimensional frequency spectrum approximations have been obtained (EWENS 1964 Down; KIMURA and CROW 1964 Down; WRIGHT 1966 Down; WATTERSON 1977 Down; YOKOYAMA and NEI 1979 Down; TAKAHATA 1990 Down), we used (6) in this study to adjust to (1) and (2). It is not clear which approximation is the best. However, if N > 104, (6) does not differ from the other approximations.

The sojourn times or the frequency spectrum at the neutral locus that locates around the HLA locus are affected by natural selection operating at the HLA. Our results suggest that genes are influenced by HLA if the loci are located within 0.1 cM (r = 10-3) from HLA, although this genetic distance is overestimated because the population size is not taken into consideration here. Recently, a high degree of polymorphism of CYP21 was reported (CARGILL et al. 1999 Down). This gene is located in the HLA region and encodes an enzyme that is associated with the biosynthesis of adrenal steroid hormones. Although CARGILL et al. 1999 Down discussed the possibility that diversity at CYP21 is maintained by natural selection at the HLA loci, CYP21 is situated >0.5 Mb (which corresponds to ~0.5 cM) away from the HLA-B and HLA-DRB1. Therefore, diversity of CYP21 does not seem to have been maintained by selection for HLA genes. CYP21 may be subjected to a kind of balancing selection as pointed out by CARGILL et al. 1999 Down if its mutation rate is not extremely high compared to the other genes. To test this hypothesis, the frequency spectrum of CYP21 needs to be obtained from a random population sample.

Our results offer the key to understanding the synonymous substitutions in the ARS of HLA genes. The ARS of the HLA class II gene is composed of 16 amino acid residues, whose sites are codons 9, 11, 13, 28, 30, 37, 38, 57, 61, 67, 70, 71, 74, 78, 82, and 86, in the second exon. Since one-fifth of the nucleotide changes are thought to be synonymous, and the mutation rate is 2 x 10-8 per site per generation for humans, the synonymous mutation rate in the ARS is ~0.2 x 10-6 per gene per generation for the HLA class II loci (0.2 x 10-6{fallingdotseq} 1/5 x 48 x 2 x 10-8). If we suppose that the long-term breeding size in the human lineage is 104 (case I) or 105 (case II), the synonymous mutation rate is equivalent to 0.4 x 10-5 or 0.4 x 10-4, respectively, in our simulation in which N = 500 is assumed. Likewise, the nonsynonymous mutation rate is assumed to be 1.6 x 10-5 (case I) or 1.6 x 10-4 (case II). The obtained synonymous substitution and nonsynonymous substitution rates are shown in Table 4. Here, we use substitution rate as a rate of accumulation of mutations. The number of synonymous changes in the ARS between any pair of alleles was larger for r = 0 than for r = 0.5 (data not shown), while the synonymous substitution rate was approximately equal to the synonymous mutation rate. Whether the synonymous substitution rate at the HLA locus is increased by the overdominant selection can be evaluated by the interlocus comparisons of HLA genes. HUGHES and NEI 1988 Down, HUGHES and NEI 1989 Down calculated the mean number of nucleotide substitutions per synonymous site, dS, in the ARS and in the non-ARS between different loci (e.g., HLA-DQB1 vs. HLA-DRB1 and HLA-A vs. HLA-B). If the synonymous substitution rate is equal to the synonymous mutation rate regardless of the degree of the linkage, dS in the ARS is expected not to be larger than that in the non-ARS, because the synonymous substitution rate mainly determines the dS in interlocus comparisons. This prediction agrees with the results of HUGHES and NEI 1988 Down, HUGHES and NEI 1989 Down. Although our simulation model is too simple to examine the molecular evolution in the ARS of the MHC gene, we can conclude that, unlike the nucleotide diversity, the rate of synonymous substitution even in the ARS should not be enhanced by overdominant selection.


 
View this table:
In this window
In a new window

 
Table 4. The rate of accumulation of mutations at the nonsynonymous and synonymous sites at the ARS


*  ACKNOWLEDGMENTS

We express our great thanks to Dr. Naoyuki Takahata for his interest and helpful comments. We are grateful to Dr. Andrew G. Clark for reviewing the manuscript and providing helpful suggestions and to the anonymous reviewers for their insightful comments. Thanks to Dr. Minato Nakazawa and Mr. Jun Hagihara for help in carrying out simulation experiments. This study was supported by a Grant-in-Aid for Scientific Research from the Japanese Ministry of Education, Science, Sports, and Culture.

Manuscript received June 30, 1999; Accepted for publication February 14, 2000.


*  LITERATURE CITED
*TOP
*ABSTRACT
*MODELS
*DISCUSSION
*LITERATURE CITED

CARGILL, M., D. ALTSHULER, J. IRELAND, P. SKLAR, and K. ARDLIE et al., 1999  Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nat. Genet. 22:231-238[Medline].

EWENS, W. J., 1964  The maintenance of alleles by mutation. Genetics 50:891-898[Free Full Text].

HUDSON, R. R. and N. L. KAPLAN, 1988  The coalescent process in models with selection and recombination. Genetics 120:831-840[Abstract/Free Full Text].

HUGHES, A. L. and M. NEI, 1988  Pattern of nucleotide substitution at major histocompatibility complex class I loci reveals overdominant selection. Nature 335:167-170[Medline].

HUGHES, A. L. and M. NEI, 1989  Nucleotide substitution at major histocompatibility complex class II loci: evidence for overdominant selection. Proc. Natl. Acad. Sci. USA 86:958-962[Abstract/Free Full Text].

KAPLAN, N. L., T. DARDEN, and R. R. HUDSON, 1988  The coalescent process in models with selection. Genetics 120:819-829[Abstract/Free Full Text].

KIMURA, M., 1968  Evolutionary rate at the molecular level. Nature 217:624-626[Medline].

KIMURA, M. and J. F. CROW, 1964  The number of alleles that can be maintained in a finite population. Genetics 49:725-738[Free Full Text].

LI, W.-H., 1978  Maintenance of genetic variability under the joint effect of mutation, selection and random drift. Genetics 90:349-382[Abstract/Free Full Text].

MARUYAMA, T., 1974  The age of an allele in a finite population. Genet. Res. 23:137-143[Medline].

MARUYAMA, T. and M. KIMURA, 1975  Moments for sum of an arbitrary function of gene frequency along a stochastic path of gene frequency change. Proc. Natl. Acad. Sci. USA 72:1602-1604[Abstract/Free Full Text].

MARUYAMA, T. and M. NEI, 1981  Genetic variability maintained by mutation and overdominant selection in finite populations. Genetics 98:441-459[Abstract/Free Full Text].

NAGYLAKI, T., 1974  The moments of stochastic integrals and the distribution of sojourn times. Proc. Natl. Acad. Sci. USA 71:746-749[Abstract/Free Full Text].

OHTA, T. and M. KIMURA, 1970  Development of associative overdominance through linkage disequilibrium in finite populations. Genet. Res. 16:165-177[Medline].

OHTA, T. and M. KIMURA, 1971  Behavior of neutral mutants influenced by associated overdominant loci in finite populations. Genetics 69:247-260[Free Full Text].

SATTA, Y., 1997  Effects of intra-locus recombination on HLA polymorphism. Hereditas 127:105-112[Medline].

SATTA, Y., C. O'HUIGIN, N. TAKAHATA, and J. KLEIN, 1994  Intensity of natural selection at the major histocompatibility complex loci. Proc. Natl. Acad. Sci. USA 91:7184-7188[Abstract/Free Full Text].

TAKAHATA, N., 1990  A simple genealogical structure of strongly balanced allelic lines and trans-species evolution of polymorphism. Proc. Natl. Acad. Sci. USA 87:2419-2423[Abstract/Free Full Text].

TAKAHATA, N. and M. NEI, 1990  Allelic genealogy under overdominant and frequency-dependent selection and polymorphism of major histocompatibility complex loci. Genetics 124:967-978[Abstract].

TAKAHATA, N. and Y. SATTA, 1998  Footprints of intragenic recombination at HLA loci. Immunogenetics 47:430-441[Medline].

TAKAHATA, N., Y. SATTA, and J. KLEIN, 1992  Polymorphism and balancing selection at major histocompatibility complex loci. Genetics 130:925-938[Abstract].

THOMSON, G., 1977  The effect of a selected locus on linked neutral loci. Genetics 85:753-788[Abstract/Free Full Text].

WATTERSON, G. A., 1977  Heterosis or neutrality? Genetics 85:789-814[Abstract/Free Full Text].

WRIGHT, S., 1938  The distribution of gene frequencies under irreversible mutation. Proc. Natl. Acad. Sci. USA 24:253-259[Free Full Text].

WRIGHT, S., 1966  Polyallelic random drift in relation to evolution. Proc. Natl. Acad. Sci. USA 55:1074-1081[Free Full Text].

YOKOYAMA, S. and M. NEI, 1979  Population dynamics of sex-determining alleles in honey bees and self-incompatibility alleles in plants. Genetics 91:609-626[Abstract/Free Full Text].