Abstract
Kimura (1955b) proposed a solution for the time-dependent probability of fixation, or loss, of a gene under selection. Example calculations suggest the formulas are prone to numerical inaccuracies. An alternative solution is proposed; the correctness of the solution is confirmed by numerically solving the Kolmogorov backward equation and by simulation.
THE process of change of the frequency of a gene over time in a large random-mating population can be treated as a stochastic process and approximated by a diffusion process. Kimura modeled the dynamic process of gene frequency change over time under different models for mutation and selection by making use of diffusion theory (Kimura 1954, 1955a,b, 1957, 1964, 1978). Empirical studies have suggested that many mutant disease-related alleles are under natural selection, for example, the MHC-HLA gene family (Dean et al. 2002). Studying the dynamics of a disease-related allele under natural selection is important for understanding the evolutionary history of a disease (Slatkin and Rannala 2000). In this article, we review the existing theory of gene frequency evolution and suggest that for a selected allele in the case of no dominance it is not possible to accurately calculate the time-dependent fixation and loss probabilities using Kimura's solution; there are no numerical examples of this problem in Kimura's article (Kimura 1955b, 1964). By further investigating properties of the boundary conditions of the diffusion process and the solution of the oblate spheroidal equation, we propose an alternative solution for the time-dependent probability of fixation, or loss, of an allele that can be accurately calculated by using other existing solutions (e.g., the ultimate probability of fixation, or loss, of an allele). The correctness of our result was confirmed by simulation studies and by numerically solving the Kolmogorov backward equation.
Consider a large random-mating population; two alleles exist at a locus with a selectively advantageous allele with frequency p0 at generation 0. The probability density that the frequency of the favored allele is x (x ∈ (0, 1)) at generation t, denoted by φ(x, t|p0), satisfies the Kolmogorov forward equation (or Fokker-Planck equation). The average change of allele frequency per generation under selection (with selection coefficient s > 0) is approximately 1according to a diffusion approximation if there is no dominance. The variance of the change of gene frequency due to random drift is x(1 − x)/(2N) and φ(x, t|p0) can be obtained by solving the following partial differential equation (PDE),
2with boundaries x = 0 and x = 1, where N is the Wright-Fisher population size. Kimura solved this PDE by using the separation-of-variables method (Kimura 1955b, 1957, 1964). Applying this method, φ(x, t|p0) can be expressed as a product of a function of x and t alone. It has been found that the function of x alone has a similar form to an intermediate solution of the oblate spheroidal equation (Stratton et al. 1941, 1956). Kimura proposed the solution for Equation 2 on the basis of the theory of the oblate spheroidal equation,
3where
4and
is the Gegenbauer polynomial, λk is the eigenvalue of the oblate spheroidal angular function, and fkn is the intermediate coefficient of the spheroidal harmonics. Feller (1952) has classified the boundaries associated with the one-dimensional diffusion process into four general classes. According to Feller's classification criteria, the boundaries both belong to the exit boundaries. From the nature of the process, the boundaries x = 0 and x = 1 are absorbing barriers. It was shown that the probability mass functions of loss and fixation before time t, denoted by f0(t|p0) and f1(t|p0), respectively, satisfy
5
6for an absorbing barrier process (Bharucha-Reid 1960). After simplification, Equations 5 and 6 become
7which were used by Wright (1931), Wright and Kerr (1954), and Kimura (1955b)(1964). The explicit solution of the probability of fixation, or loss, of an allele by generation t, given initial frequency p0, can be derived by means of the above relations and Equation 3. The procedure has been used by Kimura (1955b)(1964). For example, the loss probability of an allele within t generations was derived as
8
If we instead write the above equation in the form , we can see that if t/N is not too small (e.g., t/N ≥ 0.5),
rapidly converges as the eigenvalue of the oblate spheroidal angular function, λk, increases with increasing k and
is then close to 0, even for small k. However, the term
converges very slowly, and finite approximations of this term turn out to be inaccurate if a finite k is used for numerical calculations. An alternative approach is to obtain the probability of fixation, or loss, of an allele after time t by evaluating the integral over time from t to ∞. The probability of fixation, or loss, of a gene before t can then be obtained by using the stationary fixation (loss) probability of an allele under selection and subtracting the fixation (loss) probability of the allele after time t. As was pointed out, one or the other boundary is eventually reached, and the probability of reaching either boundary was derived early on (Ewens 1979). Using this approach, the time-dependent loss, and fixation, probabilities can be written as
9and
10respectively. Note that in recent years more efficient algorithms have been proposed for calculating the eigenvalues λk and coefficient fkn of the spheroidal harmonics (Li 1998). Equations 9 and 10 above provide accurate numerical results for these probabilities. The examples in Figure 1, calculated using the above equations, illustrate the effect of natural selection on the loss and fixation probability of an allele at time t for different initial allele frequencies.
The probability of loss, or fixation, of an allele at time t = 103 generations under different selection intensities, s, given various initial frequencies of the allele, p0, calculated using Equations 9 and 10. The population size N = 103.
Using the solution for the fixation and loss probabilities, one can examine related questions about the effects of natural selection on allele frequency distributions that have not been studied previously. For example, the probability density function of the length of the time until fixation of a selected allele, when only cases in which an allele is ultimately fixed are taken into account, can be obtained by differentiating the cumulative distribution of the time until fixation with respect to t. This turns out to be 11
Numerical examples calculated using the above equation are demonstrated in Figure 2. We can see from this figure that the distribution of the time required for fixation is broad and becomes narrower as selection intensities increase. For a selectively neutral allele, the above equation can be used by simply letting limc→0 {y(t|p0)}, and numerical results are compared with results calculated previously for this case by using the solution of the probability distribution of the length of time until fixation of a selectively neutral allele derived by Kimura (1970). The results calculated using the two equations agree closely.
Probability distributions of allele age T, given the initial frequency of the allele p0 = 1/(2N) and the allele is being fixed, under different selection intensity s, calculated using Equation 11. The population size N = 103.
Previous studies have shown that the loss and fixation probabilities satisfy the Kolmogorov backward equation. The correctness of the novel solutions for the absorption probabilities by generation t presented above was examined by comparing numerical results obtained using these solutions with the results obtained by numerically solving the corresponding backward Kolmogorov equations and with results obtained by computer simulations.
Numerical solution of the PDE:
To obtain the probabilities of fixation and loss, one approach is to make use of the backward Kolmogorov equation. The basic difference between the forward and backward equations is the variable that is being fixed. This method was used by Kimura to study the neutral case (Kimura 1964) and the solutions have been verified by simulation studies. For the selection model we discuss, the probability that a favored allele with initial frequency p0 is fixed by generation t, denoted by u(p0, t), satisfies 12
Note that u(p0, t) is equivalent to f1(t|p0) but we use different notation to distinguish the backward approach. The boundary conditions are 13for fixation probability. The loss probability also satisfies Equation 12 but with boundary conditions opposite to those of Equation 13. The initial condition for both equations is
14given that the initial frequency of the allele is x0 at initial time t0. We solved Equation 12 with boundary conditions (13) and initial condition (14) numerically by the implicit finite difference method (Crank-Nicolson scheme).
Simulation studies:
To confirm the accuracy of the numerical solutions of the absorption probabilities and compare these with the solutions calculated using the above analytic equations, the sample path (over time) of the population frequency of a selected allele was simulated by a modified “pseudo-sampling variable (PSV)” method. The PSV method was suggested by Kimura and was used for the simplest neutral models (Kimura 1980; Kimura and Takahata 1983). The method can be easily extended to models taking account of selection and mutation. The gist of the method is to simulate the diffusion process itself rather than simulating the entire population of alleles at each generation under a binomial distribution for the case of two alleles or a multinomial distribution for three or more alleles. For the selection model discussed above, E[δx] = sx(1 − x), Var[δx] = x(1 − x)/(2N), and the frequency of the allele at the next generation x′, given the current frequency x, is simulated by 15where ξPSV is a uniform random variable with mean sx(1 − x) and variance x(1 − x)/(2N). We fixed the age of the allele and simulated the sample path of allele frequency for a large number of iterations, tabulating the proportion of simulated populations in which the allele was fixed, or lost, by time t (SIM), and comparing this with the expected probabilities obtained by numerically solving the PDE (NUM) and with the results obtained using Equations 9 and 10 (ANA). The results calculated using Kimura's solutions (KIM) were also included. For both ANA and KIM, the sums of the equations were evaluated to k = 8. Results are listed in Table 1. The examples in Figure 2 were also verified by the simulations.
Comparison of numerical results of ANA, NUM, SIM, and KIM
The probability distribution of an allele, taking account of random drift and natural selection, was studied by Kimura under diffusion theory. The complexity of Kimura's analytical solutions make them difficult to use, except in the case that only random drift is considered. By calculating the probabilities of fixation and loss of an allele using the equations given by Kimura (e.g., Equation 8), we found the results were invariably quite inaccurate even though the sum of Equation 8 was evaluated to a relatively large k, and t/N ≥ 1 was used. An alternative solution is proposed, which provides much more accurate results even when a small k is chosen. For example, in Table 1, when t/N = 1 and k was set to be 8, the numerical results agree very closely with those obtained numerically and by simulation.
Acknowledgments
We thank Warren Ewens for helpful comments. Support was provided for this research by Canadian Institutes of Health Research grant MOP 44064 and National Institutes of Health grant HG01988 to B.R.
Footnotes
Communicating editor: M. K. Uyenoyama
- Received February 17, 2004.
- Accepted July 7, 2004.
- Genetics Society of America