Random Genetic Drift and Gamete Frequency

Mano, Shuhei

doi:10.1534/genetics.104.036897

Abstract

An analytic expression of conditional expectation of transient gamete frequency, given that one of the two loci remains polymorphic, is obtained in terms of the diffusion process by calculating the moments of the distribution. Using this expression, a model where linkage disequilibrium is introduced by a single mutation is considered. The conditional expectation of the gamete frequency given that the locus with the mutant allele remains polymorphic is presented. The behavior is significantly different from the monotonic decrease observed in the deterministic model without random genetic drift.

With respect to random genetic drift for the one-locus problem, the state of steady decay was first obtained correctly by Wright (1931). However, in this study it was assumed that the state of steady decay had already been attained. By calculating the moments of the distribution, Kimura (1955a) obtained the complete expression of the transient probability density for the unfixed class, which shows how the process leads to the state of steady decay. It was found that after 2N generations the distribution becomes almost flat, where N is the effective population size.

Since each mutant ultimately becomes either fixed or lost, the steady state will be attained only if linear evolutionary pressures, such as mutation, operate. For the two-locus problem, the steady state has been discussed in terms of the diffusion process (Ohta and Kimura 1969b, 1970; Ethier and Nagylaki 1989) and the genealogical process (Griffiths 1981; Hudson 1983, 1985; Golding 1984; Ethier and Griffiths 1990). In contrast, in situations without linear evolutionary pressures, how the process eventually leads to the state of steady decay has not been studied, with the exception of several functions that vanish at the absorbing boundaries (Hill and Robertson 1968; Ohta and Kimura 1969a; Littler 1973). Although the two-locus problem is uniquely characterized by the gamete frequencies, their transient behavior has not been examined.

In this article, I derive an analytic expression of conditional expectation of the transient gamete frequency, given that one of the two loci remains polymorphic, in terms of the diffusion process. This expression shows how it ultimately leads to the asymptotic value. Using this expression, a model where linkage disequilibrium is introduced by a single mutation is discussed.

THE DIFFUSION PROCESS OF THE TWO-LOCUS PROBLEM

Consider a random-mating population with an effective population size of N. We measure time T in units of 2N generations. Let A₁ and A₂ be a pair of alleles with initial frequencies p and 1 − p, respectively, and let their frequencies at time T be X and 1 − X, respectively. Kimura (1955a) obtained an analytic expression of the transient probability density for the unfixed class. In what follows, lowercase letters denote the values taken by the corresponding random variables. Let ϕ(p, x; T) be the probability density. The probability that the locus remains polymorphic was also given,

\begin{eqnarray*}&&P[X{\in}(0,{\,}1)]{=}{{\int}_{0}^{1}}\mathrm{{\phi}}dx\\&&{=}1{-}{\mathrm{lim}_{n{\rightarrow}{\infty}}}{\,}E[X^{n}]{-}{\mathrm{lim}_{n{\rightarrow}{\infty}}}{\,}E[(1{-}X)^{n}]\\&&{=}{{\sum}_{m{=}0}^{{\infty}}}[P_{2m}(1{-}2p){-}P_{2m{+}2}(1{-}2p)]e^{{-}((2m{+}1)(2m{+}2)/2)T},\end{eqnarray*}

(1)

where P_m(z) represents the Legendre polynomial.
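As a numerical illustration of expression (1), the following minimal Python sketch evaluates a truncated version of the series. The function names, the truncation length `terms`, and the choice of the three-term recurrence for the Legendre polynomials are assumptions made here for illustration; they are not part of the original derivation. The truncated series is only reliable for T > 0, since convergence at T = 0 is slow.

```python
import math

def legendre(n, z):
    """Legendre polynomial P_n(z) via the standard three-term recurrence."""
    if n == 0:
        return 1.0
    prev, curr = 1.0, z
    for k in range(1, n):
        prev, curr = curr, ((2 * k + 1) * z * curr - k * prev) / (k + 1)
    return curr

def prob_polymorphic(p, T, terms=60):
    """Truncated series (1) for P[X in (0, 1)]; T is in units of 2N generations."""
    z = 1.0 - 2.0 * p
    total = 0.0
    for m in range(terms):
        rate = (2 * m + 1) * (2 * m + 2) / 2.0
        total += (legendre(2 * m, z) - legendre(2 * m + 2, z)) * math.exp(-rate * T)
    return total

if __name__ == "__main__":
    for T in (0.5, 1.0, 2.0):
        print(T, prob_polymorphic(0.05, T))
```

For large T only the m = 0 term survives, which reduces to 6p(1 − p)e^{−T}; this gives a quick check on the implementation.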

Next, we discuss the expectation of the allele frequency. In general, since we cannot observe a polymorphism that has been lost, a polymorphism can be observed only for an unfixed class. Thus, the obvious relation E[X] = p is uninformative from the perspective of observation. We are interested in the conditional expectation of the frequencies given that a polymorphism is retained. By using the expression of the transient fixation probability, which was given by Kimura (1955a), we obtain the conditional expectation of the allele frequency for the unfixed class,

\[E[X{\vert}X{\in}(0,{\,}1)]{=}\frac{E[XI_{(0,1)}(X)]}{P[X{\in}(0,{\,}1)]},\]

(2)

where

\begin{eqnarray*}&&E[XI_{(0,1)}(X)]{=}{{\int}_{0}^{1}}x\mathrm{{\phi}}dx{=}E[X]{-}f(1;{\,}T)\\&&{=}{{\sum}_{m{=}1}^{{\infty}}}\frac{({-}1)^{m}}{2}[P_{m{+}1}(1{-}2p){-}P_{m{-}1}(1{-}2p)]e^{{-}(m(m{+}1)/2)T},\end{eqnarray*}

where I_(0,1)(X) represents the indicator function of the open interval (0, 1) and f(1; T) represents the transient fixation probability of the allele A₁. The asymptotic value of the conditional expectation of the allele frequency is

\[E[X{\vert}X{\in}(0,{\,}1)]{\rightarrow}\frac{1}{2}{\ }(T{\rightarrow}{\infty}),\]

which agrees with the fact that the conditional distribution becomes uniform asymptotically.

Let us assume two loci A and B in which pairs of alleles A₁, A₂ and B₁, B₂ are segregating, and let the initial frequencies of gametes A₁B₁, A₁B₂, A₂B₁, and A₂B₂ be, respectively, g₁, g₂, g₃, and 1 − (g₁ + g₂ + g₃), and let their frequencies at time T be, respectively, X₁, X₂, X₃, and 1 − (X₁ + X₂ + X₃). Let the initial frequencies of alleles B₁ and B₂ be, respectively, q and 1 − q, and let their frequencies at time T be Y and 1 − Y, respectively. Let D = g₁(1 − g₁ − g₂ − g₃) − g₂g₃ be the initial value of the linkage disequilibrium coefficient and Z = X₁(1 − X₁ − X₂ − X₃) − X₂X₃ be the value of the linkage disequilibrium coefficient at time T. We have

\[X_{1}{=}XY{+}Z,{\ }X_{2}{=}X(1{-}Y){-}Z,{\ }X_{3}{=}(1{-}X)Y{-}Z.\]

Let c be the recombination fraction between the two loci, and we set ρ = 4Nc. We do not discuss the case c = 0, since the problem then reduces to the multiallelic one-locus problem that has previously been discussed by Kimura (1955b). For the deterministic model without random genetic drift, we have x = p, y = q, and z = De^−ct, where t denotes time in generations.
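The change of variables between the gamete frequencies and (x, y, z) can be sketched as follows. The function names are hypothetical; the formulas are the relations X₁ = XY + Z, X₂ = X(1 − Y) − Z, X₃ = (1 − X)Y − Z and the definition of the linkage disequilibrium coefficient given above, together with the deterministic solution.

```python
import math

def to_allele_coords(g1, g2, g3):
    """Map gamete frequencies (A1B1, A1B2, A2B1) to (x, y, z)."""
    g4 = 1.0 - (g1 + g2 + g3)      # frequency of A2B2
    x = g1 + g2                    # frequency of allele A1
    y = g1 + g3                    # frequency of allele B1
    z = g1 * g4 - g2 * g3          # linkage disequilibrium coefficient
    return x, y, z

def to_gamete_freqs(x, y, z):
    """Inverse map: (x, y, z) back to the first three gamete frequencies."""
    return x * y + z, x * (1.0 - y) - z, (1.0 - x) * y - z

def deterministic_x1(p, q, D, c, t):
    """Deterministic frequency of A1B1 after t generations: pq + D exp(-ct)."""
    return p * q + D * math.exp(-c * t)

if __name__ == "__main__":
    x, y, z = to_allele_coords(0.05, 0.0, 0.15)
    print(x, y, z)
    print(to_gamete_freqs(x, y, z))
```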

The probability density for the gamete frequencies

\[\mathrm{{\phi}}(g_{1},{\,}g_{2},{\,}g_{3};{\,}x_{1},{\,}x_{2},{\,}x_{3};{\,}T)\]

satisfies the following Kolmogorov backward equation,

\[\frac{{\partial}\mathrm{{\phi}}}{{\partial}T}{=}{{\sum}_{i,j{=}1}^{3}}\frac{g_{i}(\mathrm{{\delta}}_{ij}{-}g_{j})}{2}\frac{{\partial}^{2}\mathrm{{\phi}}}{{\partial}g_{i}{\partial}g_{j}}{-}\frac{\mathrm{{\rho}}D}{2}\left(\frac{{\partial}\mathrm{{\phi}}}{{\partial}g_{1}}{-}\frac{{\partial}\mathrm{{\phi}}}{{\partial}g_{2}}{-}\frac{{\partial}\mathrm{{\phi}}}{{\partial}g_{3}}\right)\]

(3)

(Ohta and Kimura 1969a), where δ_ij represents Kronecker's delta. Although the probability density itself is unknown, Ohta and Kimura (1969a) obtained expectations of functions

\[X(1{-}X)Y(1{-}Y),{\ }(1{-}2X)(1{-}2Y)Z,{\ }Z^{2},\]

(4)

which were discussed by Hill and Robertson (1968). The process is defined in a tetrahedron 0 ≤ x₁ ≤ x₁ + x₂ ≤ x₁ + x₂ + x₃ ≤ 1. By changing variables from the gamete frequencies to the variables x, y, and z, the region is transformed into a three-dimensional region, the upper surface of the boundary of which is depicted in Figure 1. On the peripheral edges, which form the periphery of the square 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, one of the two loci is monomorphic. At the points (1, 1, 0), (1, 0, 0), (0, 1, 0), and (0, 0, 0), one of the gametes A₁B₁, A₁B₂, A₂B₁, and A₂B₂ fixes, respectively. We represent the inside of the region as \(\mathcal{D}\). The expectation of the linkage disequilibrium coefficient is

\[E[Z]{=}De^{{-}(1{+}(\mathrm{{\rho}}/2))T}\]

(Hill and Robertson 1968), and the squared standard linkage deviation tends to

\[\frac{E[Z^{2}]}{E[X(1{-}X)Y(1{-}Y)]}{\sim}\frac{1}{\mathrm{{\rho}}}{\ }(T{\rightarrow}{\infty}),\]

when ρ is large (Ohta and Kimura 1969a).

Figure 1. The upper surface of the boundary of the region in which the diffusion process is defined.

Next, we discuss the expectation of the gamete frequencies. In the same manner as for the functions (4) and the linkage disequilibrium measure, we obtain the expectation of the gamete frequency X₁:

\[E[X_{1}]{=}g_{1}{+}\frac{\mathrm{{\rho}}D}{2{+}\mathrm{{\rho}}}[e^{{-}(1{+}(\mathrm{{\rho}}/2))T}{-}1].\]
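The unconditional expectations of Z and X₁ are simple enough to evaluate directly. The following sketch (hypothetical function names) encodes the two formulas above and the T → ∞ limit of E[X₁], which is the fixation probability of the gamete A₁B₁.

```python
import math

def expected_ld(D, rho, T):
    """E[Z] = D exp(-(1 + rho/2) T), with T in units of 2N generations."""
    return D * math.exp(-(1.0 + rho / 2.0) * T)

def expected_x1(g1, D, rho, T):
    """E[X1] = g1 + rho D / (2 + rho) * (exp(-(1 + rho/2) T) - 1)."""
    return g1 + rho * D / (2.0 + rho) * (math.exp(-(1.0 + rho / 2.0) * T) - 1.0)

def fixation_prob_a1b1(g1, D, rho):
    """Limit of E[X1] as T grows: g1 - rho D / (2 + rho)."""
    return g1 - rho * D / (2.0 + rho)

if __name__ == "__main__":
    g1, D = 0.05, 0.04
    for rho in (0.5, 5.0, 50.0):
        print(rho, expected_x1(g1, D, rho, 1.0), fixation_prob_a1b1(g1, D, rho))
```

As ρ grows, the fixation probability tends to g₁ − D = pq, the product of the initial allele frequencies.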

However, in contrast to the functions (4) and the linkage disequilibrium coefficient, the gamete frequencies do not vanish at the peripheral edges. The expectation is taken over not only the inside of the region \(\mathcal{D}\) but also the peripheral edges. As obtained by Ohta (1968), \(\mathrm{lim}_{T{\rightarrow}{\infty}}E[X_{1}]\) gives the fixation probability of the gamete A₁B₁. Thus, the expectation of the gamete frequency X₁ can be rewritten as

\begin{eqnarray*}&&E[X_{1}]{=}{{\int}}{{\int}}{{\int}_{\mathcal{D}}}x_{1}\mathrm{{\phi}}dxdydz{+}{{\int}_{0}^{1}}x_{1}\mathrm{{\phi}}_{x{=}1}dy{+}{{\int}_{0}^{1}}x_{1}\mathrm{{\phi}}_{y{=}1}dx\\&&{+}f(1,{\,}0,{\,}0;{\,}T),\end{eqnarray*}

(5)

where ϕ_x=1 and ϕ_y=1 represent the probability density for the open intervals x = 1, y ∈ (0, 1), z = 0 and x ∈ (0, 1), y = 1, z = 0, respectively, and f(1, 0, 0; T) represents the transient fixation probability of the gamete A₁B₁ at time T. As in the one-locus problem discussed above, we are interested in the conditional expectation of the gamete frequencies given that polymorphism is retained.

CONDITIONAL EXPECTATION OF GAMETE FREQUENCY

Let us suppose a model whereby linkage disequilibrium is introduced by a single mutation, as considered by Nei and Li (1980) regarding the association between electromorphs and inversion chromosomes in Drosophila. We assume that locus A has remained monomorphic with the wild-type allele A₂ and that locus B, in which a pair of alleles B₁ and B₂ (electromorphs) are segregating, has allele frequencies q and 1 − q, respectively. Then, the mutation introduces the mutant allele (inversion chromosome) A₁ to locus A of one of the allele B₁ bearing chromosomes. This model specifies the initial allele frequency of A₁ as p = 1/2N, the initial gamete frequencies as g₁ = p, g₂ = 0, and g₃ = q − p, and the initial value of the linkage disequilibrium measure as D = p(1 − q); however, the following expressions hold regardless of these relations. In this model, a polymorphism at locus A is important since allele A₁ is prone to be lost by random genetic drift. In addition, locus B may be regarded as a marker polymorphism to detect the mutant. In this article, we consider the conditional expectation given that locus A remains polymorphic. It might seem that this condition is similar to that described by Kaplan and Weir (1992). They discussed conditional expectation of the linkage disequilibrium measure, which was defined by Nei and Li (1980), given that a polymorphism is observed at locus B. They assumed that the allele frequency of A₁ is constant and that locus B follows the infinite-allele model assumption. Moreover, they considered the steady state. Thus, their model differs from that described here, and the condition that locus A remains polymorphic is meaningful for our diffusion process, which ultimately leads to monomorphism. 
Note that, for large populations, this condition is nearly equivalent to the condition that both loci remain polymorphic, since the probability that a polymorphism at locus A is lost earlier than that at locus B is given by Karlin and McGregor (1968),

\[\frac{q(1{-}q)}{q(1{-}q){+}p(1{-}p)},\]

which is almost unity unless the allele frequency q is very small.
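A short numerical illustration of the Karlin and McGregor probability, assuming a single new mutant so that p = 1/2N (the function name and the population sizes are chosen here only for illustration):

```python
def prob_a_lost_first(p, q):
    """Probability that polymorphism at locus A is lost before that at locus B."""
    return q * (1.0 - q) / (q * (1.0 - q) + p * (1.0 - p))

if __name__ == "__main__":
    q = 0.2
    for N in (10, 100, 1000, 10000):
        p = 1.0 / (2 * N)          # single new mutant in a population of size N
        print(N, prob_a_lost_first(p, q))
```

The probability approaches unity as N increases for fixed q, and it equals 1/2 when p = q.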

By expression (5), we have

\begin{eqnarray*}&&E[X_{1}I_{(0,1)}(X)]{=}{{\int}}{{\int}}{{\int}_{\mathcal{D}}}x_{1}\mathrm{{\phi}}dxdydz{+}{{\int}_{0}^{1}}x_{1}\mathrm{{\phi}}_{y{=}1}dx\\&&{=}E[X_{1}]{-}f(1,{\,}0,{\,}0;{\,}T){-}{{\int}_{0}^{1}}x_{1}\mathrm{{\phi}}_{x{=}1}dy\\&&{=}E[X_{1}]{-}{\mathrm{lim}_{n{\rightarrow}{\infty}}}E[X_{1}X^{n}]\\&&{=}E[X_{1}]{-}{\mathrm{lim}_{n{\rightarrow}{\infty}}}E[X^{n}Y].\end{eqnarray*}

(6)

The expressions for the other gamete frequencies X₂ and X₃ can be obtained in the same manner. To calculate the limit of the expectation \(\mathrm{lim}_{n{\rightarrow}{\infty}}E[X^{n}Y]\), we consider some moments. For convenience, we denote

\[\mathrm{{\mu}}_{{\ell},m,n}{=}E[X^{{\ell}}Y^{m}Z^{n}].\]

Making use of the Kolmogorov backward equation (3), the moments μ_ℓ,m,n satisfy a differential equation:

\begin{eqnarray*}&&\frac{d\mathrm{{\mu}}_{{\ell},m,n}}{dT}{=}{-}\frac{{\ell}({\ell}{-}1){+}m(m{-}1){+}n(n{-}1){+}4n({\ell}{+}m){+}n(2{+}\mathrm{{\rho}})}{2}\mathrm{{\mu}}_{{\ell},m,n}\\&&{+}{\ell}m\mathrm{{\mu}}_{{\ell}{-}1,m{-}1,n{+}1}{+}\frac{{\ell}({\ell}{+}2n{-}1)}{2}\mathrm{{\mu}}_{{\ell}{-}1,m,n}{+}\frac{m(m{+}2n{-}1)}{2}\mathrm{{\mu}}_{{\ell},m{-}1,n}\\&&{+}\frac{n(n{-}1)}{2}[\mathrm{{\mu}}_{{\ell},m,n{-}1}{+}\mathrm{{\mu}}_{{\ell}{+}1,m{+}1,n{-}2}{-}\mathrm{{\mu}}_{{\ell}{+}1,m{+}2,n{-}2}{-}\mathrm{{\mu}}_{{\ell}{+}2,m{+}1,n{-}2}\\&&{+}\mathrm{{\mu}}_{{\ell}{+}2,m{+}2,n{-}2}{-}2(\mathrm{{\mu}}_{{\ell}{+}1,m,n{-}1}{+}\mathrm{{\mu}}_{{\ell},m{+}1,n{-}1})\\&&{+}4\mathrm{{\mu}}_{{\ell}{+}1,m{+}1,n{-}1}].\end{eqnarray*}

It is worthwhile to note that E[X^ℓY^mX₁ⁿ] satisfies a recurrence relation on the two-locus sampling distribution by Golding (1984; Ethier and Griffiths 1990).

The moments μ_n−1,0,1 satisfy a differential equation,

\[\frac{d\mathrm{{\mu}}_{n{-}1,0,1}}{dT}{=}{-}\left[\frac{n(n{+}1)}{2}{+}\frac{\mathrm{{\rho}}}{2}\right]\mathrm{{\mu}}_{n{-}1,0,1}{+}\frac{n(n{-}1)}{2}\mathrm{{\mu}}_{n{-}2,0,1},\]

and the differential equation has the solution of the form

\[\mathrm{{\mu}}_{n{-}1,0,1}{=}{{\sum}_{m{=}1}^{n}}C_{n{-}1}^{(m)}e^{{-}((\mathrm{{\rho}}{+}m(m{+}1))/2)T},\]

where

\[C_{n{-}1}^{(m)}{=}\frac{\left(\begin{array}{l}2n{+}1\\n{-}m\end{array}\right)\left(\begin{array}{l}2m{-}1\\m\end{array}\right)}{\left(\begin{array}{l}2n{-}1\\n\end{array}\right)}\frac{m(2m{+}1)}{n(2n{+}1)}C_{m{-}1}^{(m)},\]

with the initial condition

\[p^{n{-}1}D{=}{{\sum}_{m{=}1}^{n}}C_{n{-}1}^{(m)}.\]

(7)

In Appendix A it is shown that

\[C_{n{-}1}^{(m)}{=}\frac{\left(\begin{array}{l}2n{+}1\\n{-}m\end{array}\right)}{\left(\begin{array}{l}2n{-}1\\n\end{array}\right)}\frac{2m{+}1}{n(2n{+}1)}D({-}1)^{m{+}1}T_{m{-}1}^{1}(1{-}2p),\]

where \(T_{m}^{1}(z)\) represents the Gegenbauer polynomial, which is also represented as \(C_{m}^{3/2}(z)\).

The moments μ_n,1,0 satisfy a differential equation,

\[\frac{d\mathrm{{\mu}}_{n,1,0}}{dT}{=}{-}\frac{n(n{-}1)}{2}(\mathrm{{\mu}}_{n,1,0}{-}\mathrm{{\mu}}_{n{-}1,1,0}){+}n\mathrm{{\mu}}_{n{-}1,0,1},\]

and the differential equation has the solution of the form

\begin{eqnarray*}&&\mathrm{{\mu}}_{n,1,0}{=}pq{+}\frac{D}{1{+}(\mathrm{{\rho}}/2)}{+}{{\sum}_{m{=}1}^{n{-}1}}E_{n}^{(m)}e^{{-}(m(m{+}1)/2)T}\\&&{+}{{\sum}_{m{=}1}^{n}}F_{n}^{(m)}e^{{-}((\mathrm{{\rho}}{+}m(m{+}1))/2)T},\end{eqnarray*}

where

\[E_{n}^{(m)}{=}\frac{\left(\begin{array}{l}2n{-}1\\n{+}m\end{array}\right)\left(\begin{array}{l}2m{+}1\\m\end{array}\right)}{\left(\begin{array}{l}2n{-}1\\n\end{array}\right)}E_{m{+}1}^{(m)},\]

(8)

\[[(n{+}m)(n{-}m{-}1){-}\mathrm{{\rho}}]F_{n}^{(m)}{=}n(n{-}1)F_{n{-}1}^{(m)}{+}2nC_{n{-}1}^{(m)},\]

(9)

with the initial condition

\[p^{n}q{=}pq{+}\frac{D}{1{+}(\mathrm{{\rho}}/2)}{+}{{\sum}_{m{=}1}^{n{-}1}}E_{n}^{(m)}{+}{{\sum}_{m{=}1}^{n}}F_{n}^{(m)}.\]

(10)

The recurrence relation (9) can be expressed by a matrix equation, \(\mathbf{\mathrm{Af}}{=}\mathbf{\mathrm{c}}\), with vectors \(\mathbf{\mathrm{f}}_{k}{=}F_{k}^{(m)},{\,}\mathbf{\mathrm{c}}_{k}{=}2kC_{k{-}1}^{(m)},(k{=}m,{\,}m{+}1,{\ldots},{\,}n)\). The determinant of the matrix A is

\[\mathrm{det}\mathbf{\mathrm{A}}{=}{{\prod}_{k{=}m}^{n}}[k(k{-}1){-}m(m{+}1){-}\mathrm{{\rho}}],\]

which has zeros at ρ = 2 + 2ℓ, (ℓ = 1, 2, 3, …). These zeros are due to degeneracy of the eigenvalues. Since we are not interested in these specific values of ρ, in what follows we discuss the case in which the inverse matrix exists, although the calculation is straightforward for each point ρ = 2 + 2ℓ, (ℓ = 1, 2, 3, …). By applying the inverse matrix, we obtain

\begin{eqnarray*}&&F_{n}^{(m)}{=}{{\sum}_{k{=}1}^{n{-}m{+}1}}\frac{2n!(n{-}1)!}{[(n{-}k)!]^{2}}\\&&{\times}\frac{{\Gamma}(n{-}k{+}(1/2){+}\mathrm{{\sigma}}){\Gamma}(n{-}k{+}(1/2){-}\mathrm{{\sigma}})}{{\Gamma}(n{+}(1/2){+}\mathrm{{\sigma}}){\Gamma}(n{+}(1/2){-}\mathrm{{\sigma}})}C_{n{-}k}^{(m)}\\&&{=}\left[{{\sum}_{k{=}1}^{n{-}m{+}1}}\frac{n!(n{-}1)!(k{+}m{-}1)}{(k{-}1)!(k{+}2m)!}\right.\ \\&&\left.\ {\times}\frac{{\Gamma}(k{+}m{-}(3/2){+}\mathrm{{\sigma}}){\Gamma}(k{+}m{-}(3/2){-}\mathrm{{\sigma}})}{{\Gamma}(n{+}(1/2){+}\mathrm{{\sigma}}){\Gamma}(n{+}(1/2){-}\mathrm{{\sigma}})}\right]\\&&{\times}4D(2m{+}1)({-}1)^{m{+}1}T_{m{-}1}^{1}(1{-}2p),\\&&\mathrm{{\sigma}}{=}\sqrt{\frac{1}{4}{+}m(m{+}1){+}\mathrm{{\rho}}}.\end{eqnarray*}

It is shown in Appendix B that the finite series inside the brackets satisfies the identity

\begin{eqnarray*}&&{{\sum}_{k{=}1}^{n{-}m{+}1}}\frac{n!(n{-}1)!(k{+}m{-}1)}{(k{-}1)!(k{+}2m)!}\frac{{\Gamma}(k{+}m{-}(3/2){+}\mathrm{{\sigma}}){\Gamma}(k{+}m{-}(3/2){-}\mathrm{{\sigma}})}{{\Gamma}(n{+}(1/2){+}\mathrm{{\sigma}}){\Gamma}(n{+}(1/2){-}\mathrm{{\sigma}})}\\&&{=}\frac{n!(n{-}1)!}{(n{+}m{-}1)!(n{-}m)!}\frac{{-}1}{2(2m{+}1)}\\&&{\times}\left[\frac{1}{2m{+}\mathrm{{\rho}}}{+}\frac{1}{2(m{+}1){-}\mathrm{{\rho}}}\frac{(n{-}m)(n{-}m{-}1)}{(n{+}m)(n{+}m{+}1)}\right].\end{eqnarray*}

(11)

Thus, we obtain

\begin{eqnarray*}&&F_{n}^{(m)}{=}\frac{n!(n{-}1)!}{(n{+}m{-}1)!(n{-}m)!}\\&&{\times}\left[\frac{1}{2m{+}\mathrm{{\rho}}}{+}\frac{1}{2(m{+}1){-}\mathrm{{\rho}}}\frac{(n{-}m)(n{-}m{-}1)}{(n{+}m)(n{+}m{+}1)}\right]\\&&{\times}2D({-}1)^{m}T_{m{-}1}^{1}(1{-}2p).\end{eqnarray*}

(12)
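Expression (12) can be checked numerically against the recurrence relation (9). The sketch below is an illustration only: the function names are hypothetical, the Gegenbauer polynomial is evaluated with the standard three-term recurrence for C_k^{3/2}, F is taken to be zero for n < m, and ρ is assumed to avoid the degenerate values ρ = 2 + 2ℓ.

```python
from math import comb, factorial

def gegenbauer_3_2(k, z):
    """Gegenbauer polynomial C_k^{3/2}(z), written T_k^1(z) in the text."""
    if k == 0:
        return 1.0
    prev, curr = 1.0, 3.0 * z
    for j in range(2, k + 1):
        prev, curr = curr, ((2 * j + 1) * z * curr - (j + 1) * prev) / j
    return curr

def coeff_C(n, m, p, D):
    """Closed form of C_{n-1}^{(m)} for 1 <= m <= n."""
    ratio = comb(2 * n + 1, n - m) / comb(2 * n - 1, n)
    return (ratio * (2 * m + 1) / (n * (2 * n + 1))
            * D * (-1) ** (m + 1) * gegenbauer_3_2(m - 1, 1.0 - 2.0 * p))

def coeff_F(n, m, p, D, rho):
    """Expression (12) for F_n^{(m)}; set to zero when n < m (assumption)."""
    if n < m:
        return 0.0
    pref = factorial(n) * factorial(n - 1) / (factorial(n + m - 1) * factorial(n - m))
    bracket = (1.0 / (2 * m + rho)
               + (n - m) * (n - m - 1)
               / ((2 * (m + 1) - rho) * (n + m) * (n + m + 1)))
    return pref * bracket * 2.0 * D * (-1) ** m * gegenbauer_3_2(m - 1, 1.0 - 2.0 * p)

def recurrence_residual(n, m, p, D, rho):
    """Left side minus right side of recurrence (9); should vanish."""
    lhs = ((n + m) * (n - m - 1) - rho) * coeff_F(n, m, p, D, rho)
    rhs = n * (n - 1) * coeff_F(n - 1, m, p, D, rho) + 2 * n * coeff_C(n, m, p, D)
    return lhs - rhs

if __name__ == "__main__":
    worst = max(abs(recurrence_residual(n, m, 0.3, 0.02, 3.7))
                for m in range(1, 5) for n in range(m, 9))
    print("largest residual:", worst)
```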

By using (12) and the orthogonal property of the Gegenbauer polynomial,

\[{{\int}_{{-}1}^{1}}(1{-}z^{2})T_{k{-}1}^{1}(z)T_{{\ell}{-}1}^{1}(z)dz{=}\mathrm{{\delta}}_{k,{\ell}}\frac{2{\ell}({\ell}{+}1)}{2{\ell}{+}1},\]

(13)

we obtain the general expression for \(E^{(m)}_{n}\) from (10),

\begin{eqnarray*}&&E_{n}^{(m)}{=}({-}1)^{m}\frac{\left(\begin{array}{l}2n{-}1\\n{+}m\end{array}\right)}{\left(\begin{array}{l}2n{-}1\\n\end{array}\right)}\\&&{\times}\left\{\frac{2m{+}1}{m(m{+}1)}2pq(1{-}p)T_{m{-}1}^{1}(1{-}2p)\right.\ \\&&\left.\ {+}2D\left[\frac{T_{m}^{1}(1{-}2p)}{2(m{+}1){+}\mathrm{{\rho}}}{+}\frac{T_{m{-}2}^{1}(1{-}2p)}{2m{-}\mathrm{{\rho}}}\right]\right\}{\ }(m{\geq}2),\\&&E_{n}^{(1)}{=}{-}\frac{n{-}1}{n{+}1}3\left[pq(1{-}p){+}\frac{2D(1{-}2p)}{4{+}\mathrm{{\rho}}}\right].\end{eqnarray*}

We observe the limit,

\begin{eqnarray*}&&\mathrm{{\mu}}_{n,1,0}{\rightarrow}q\left[p{+}{{\sum}_{m{=}1}^{{\infty}}}({-}1)^{m}\frac{\left(\begin{array}{l}2n{-}1\\n{+}m\end{array}\right)}{\left(\begin{array}{l}2n{-}1\\n\end{array}\right)}\frac{2m{+}1}{m(m{+}1)}\right.\ \\&&\left.\ {\times}2p(1{-}p)T_{m{-}1}^{1}(1{-}2p)e^{{-}(m(m{+}1)/2)T}\right]\\&&{=}q\mathrm{{\mu}}_{n,0,0}{\ }(\mathrm{{\rho}}{\rightarrow}{\infty}),\end{eqnarray*}

which agrees with the limit theorem given by Ethier (1979).

We observe the limits of \(F^{(m)}_{n}\) and \(E^{(m)}_{n}\),

\[{\mathrm{lim}_{n{\rightarrow}{\infty}}}F_{n}^{(m)}{=}\frac{4D(2m{+}1)({-}1)^{m}}{(2m{+}\mathrm{{\rho}})[2(m{+}1){-}\mathrm{{\rho}}]}T_{m{-}1}^{1}(1{-}2p)\]

and

\[{\mathrm{lim}_{n{\rightarrow}{\infty}}}E_{n}^{(m)}{=}\left(\begin{array}{l}2m{+}1\\m\end{array}\right)E_{m{+}1}^{(m)},\]

respectively.

By using these results for the moments, we arrive at the analytic expression of (6):

\begin{eqnarray*}&&E[X_{1}I_{(0,1)}(X)]{=}\frac{\mathrm{{\rho}}D}{2{+}\mathrm{{\rho}}}e^{{-}(1{+}(\mathrm{{\rho}}/2))T}{+}3\left[pq(1{-}p){+}\frac{2D(1{-}2p)}{4{+}\mathrm{{\rho}}}\right]e^{{-}T}\\&&{-}{{\sum}_{m{=}2}^{{\infty}}}({-}1)^{m}\left\{\frac{2m{+}1}{m(m{+}1)}2pq(1{-}p)T_{m{-}1}^{1}(1{-}2p)\right.\ \\&&\left.\ {+}2D\left[\frac{T_{m}^{1}(1{-}2p)}{2(m{+}1){+}\mathrm{{\rho}}}{+}\frac{T_{m{-}2}^{1}(1{-}2p)}{2m{-}\mathrm{{\rho}}}\right]\right\}e^{{-}(m(m{+}1)/2)T}\\&&{-}{{\sum}_{m{=}1}^{{\infty}}}\frac{4D(2m{+}1)({-}1)^{m}}{(2m{+}\mathrm{{\rho}})[2(m{+}1){-}\mathrm{{\rho}}]}T_{m{-}1}^{1}(1{-}2p)e^{{-}((\mathrm{{\rho}}{+}m(m{+}1))/2)T}.\end{eqnarray*}

We observe the limit,

\begin{eqnarray*}&&E[X_{1}I_{(0,1)}(X)]{\rightarrow}De^{{-}ct}{+}{{\sum}_{m{=}1}^{{\infty}}}\frac{q({-}1)^{m}}{2}[P_{m{+}1}(1{-}2p){-}P_{m{-}1}(1{-}2p)]\\&&{=}pq{+}De^{{-}ct}{\ }(N{\rightarrow}{\infty}),\end{eqnarray*}

which shows the deterministic behavior of the gamete frequency X₁ without random genetic drift, as expected.

We observe the asymptotic form,

\[E[X_{1}I_{(0,1)}(X)]{\sim}3\left[pq(1{-}p){+}\frac{2D(1{-}2p)}{4{+}\mathrm{{\rho}}}\right]e^{{-}T}{\ }(T{\rightarrow}{\infty}).\]

The conditional expectation of the gamete frequency X₁ given that locus A remains polymorphic is

\[E[X_{1}{\vert}X{\in}(0,{\,}1)]{=}\frac{E[X_{1}I_{(0,1)}(X)]}{P[X{\in}(0,{\,}1)]},\]

where the denominator is given by (1). The asymptotic value of the conditional expectation of the gamete frequency X₁ is

\[E[X_{1}{\vert}X{\in}(0,{\,}1)]{\rightarrow}\frac{q}{2}{+}\frac{D(1{-}2p)}{p(1{-}p)(4{+}\mathrm{{\rho}})}{\ }(T{\rightarrow}{\infty}).\]

(14)
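The asymptotic value (14) is easy to tabulate against ρ. The sketch below uses a hypothetical function name and, as an example, the single-mutation model with p = 0.05, q = 0.2, and D = p(1 − q).

```python
def asymptotic_conditional_x1(p, q, D, rho):
    """Asymptotic value (14) of E[X1 | X in (0, 1)] as T grows."""
    return q / 2.0 + D * (1.0 - 2.0 * p) / (p * (1.0 - p) * (4.0 + rho))

if __name__ == "__main__":
    p, q = 0.05, 0.2
    D = p * (1 - q)                # single-mutation model
    for rho in (0.1, 1.0, 10.0, 100.0):
        print(rho, asymptotic_conditional_x1(p, q, D, rho), "vs pq =", p * q)
```

The value decreases toward q/2 as ρ increases and stays above pq for these parameters.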

In contrast to the deterministic model without random genetic drift, the value is higher than pq, to which the deterministic model tends, and depends on ρ. Note that the second term of the asymptotic value in (14) represents the conditional covariance between the frequencies of the alleles A₁ and B₁. The change in the conditional expectation of the gamete frequency X₁ when linkage disequilibrium is introduced into a population with p = 1/2N = 0.05 and q = 0.2 is illustrated in Figure 2. After 4N generations (T = 2.0) the conditional expectation of the gamete frequency X₁ almost reaches the asymptotic value for large ρ, although 4N generations is still not enough for small ρ. For small ρ the conditional expectation of the gamete frequency X₁ is also not monotonic: it increases rapidly and then decreases to the asymptotic value. For comparison, the counterpart in the deterministic model is illustrated in Figure 3.

Figure 2. The conditional expectation of the gamete frequency X₁ given that locus A keeps polymorphism. p = 0.05 and q = 0.2.

Figure 3. The gamete frequency X₁ in the deterministic model without random genetic drift. p = 0.05 and q = 0.2.

To observe the frequency of the allele B₁ within the allele A₁ bearing chromosomes, let us consider a ratio of the conditional expectation of the gamete frequency X₁ to that of the allele frequency X,

\[\frac{E[X_{1}{\vert}X{\in}(0,{\,}1)]}{E[X{\vert}X{\in}(0,{\,}1)]},\]

where the denominator is given by (2). The asymptotic value is

\[\frac{E[X_{1}{\vert}X{\in}(0,{\,}1)]}{E[X{\vert}X{\in}(0,{\,}1)]}{\rightarrow}q{+}\frac{2D(1{-}2p)}{p(1{-}p)(4{+}\mathrm{{\rho}})}{\ }(T{\rightarrow}{\infty}).\]

(15)
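To compare (15) with the deterministic model, the sketch below evaluates the asymptotic ratio and the deterministic ratio (pq + De^{−ct})/p. The function names are hypothetical, and the conversion ct = ρT/2 is an assumption that follows from ρ = 4Nc and from measuring T in units of 2N generations.

```python
import math

def asymptotic_ratio(p, q, D, rho):
    """Asymptotic value (15) of E[X1 | unfixed] / E[X | unfixed]."""
    return q + 2.0 * D * (1.0 - 2.0 * p) / (p * (1.0 - p) * (4.0 + rho))

def deterministic_ratio(p, q, D, rho, T):
    """Deterministic x1 / x = q + (D / p) exp(-rho T / 2), using ct = rho T / 2."""
    return q + (D / p) * math.exp(-rho * T / 2.0)

if __name__ == "__main__":
    p, q = 0.05, 0.2
    D = p * (1 - q)                # single-mutation model
    for rho in (0.5, 5.0, 50.0):
        print(rho, asymptotic_ratio(p, q, D, rho), deterministic_ratio(p, q, D, rho, 2.0))
```

In the single-mutation model the deterministic ratio starts at 1 and decays to q, whereas the asymptotic ratio under drift remains above q.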

In contrast to the deterministic model without random genetic drift, the value is higher than q, to which the deterministic model tends, and depends on ρ. The change in the ratio of the conditional expectation of the gamete frequency X₁ to that of the allele frequency X when linkage disequilibrium is introduced into a population with p = 1/2N = 0.05 and q = 0.2 is illustrated in Figure 4. After 4N generations (T = 2.0) the ratio almost reaches the asymptotic value for large ρ, although 4N generations is still not enough for small ρ. For comparison, the counterpart in the deterministic model is illustrated in Figure 5. The discrepancy between our model and the deterministic model is significant for small ρ.

Figure 4. The ratio of the conditional expectation of the gamete frequency X₁ given that locus A keeps polymorphism to that of the allele frequency X. p = 0.05 and q = 0.2.

Figure 5. The ratio of the gamete frequency X₁ to the allele frequency X in the deterministic model without random genetic drift. p = 0.05 and q = 0.2.

DISCUSSION

The analytic expression of conditional expectation of transient gamete frequency given that one of the two loci remains polymorphic was obtained in terms of the diffusion process by calculating the moments of the distribution. This expression is general and independent from models that introduce linkage disequilibrium into a population.

We considered a model in which linkage disequilibrium is introduced by a single mutation, creating an association between the mutant allele A₁ and the allele B₁ that occupied the other locus of the chromosome on which the mutation occurred. Because the allele A₁ is prone to be lost by random genetic drift, the conditional expectation of the frequency of the gamete A₁B₁ given that locus A remains polymorphic is meaningful. The behavior is significantly different from the monotonic decrease in the deterministic model without random genetic drift. After 4N generations, the conditional expectation of the gamete frequency almost reaches the asymptotic value for large ρ, although 4N generations is still not enough for small ρ. The asymptotic value is larger than the product of the initial allele frequencies, to which the deterministic model tends, and depends on the recombination fraction between the two loci. Note that the conditional expectation of the linkage disequilibrium coefficient vanishes asymptotically in a similar manner to that in the deterministic model. This observation demonstrates the obvious fact that the linkage disequilibrium measure is not enough to characterize the two-locus problem uniquely.

APPENDIX A

Since the Gegenbauer polynomial is orthogonal on the interval [−1, 1], the right-hand terms of (7) can be represented in terms of the Gegenbauer polynomials of degree up to n − 1 as

\[p^{n{-}1}D{=}{{\sum}_{m{=}1}^{n}}\frac{\left(\begin{array}{l}2n{+}1\\n{-}m\end{array}\right)\left(\begin{array}{l}2m{-}1\\m\end{array}\right)}{\left(\begin{array}{l}2n{-}1\\n\end{array}\right)}\frac{m(2m{+}1)}{n(2n{+}1)}DC_{m}T_{m{-}1}^{1}(z),\]

where we set z = 1 − 2p. Multiplying both sides of the equation by \((1{-}z^{2})T_{m{-}1}^{1}(z)\) and using the orthogonal property (13), we have

\[C_{m}{=}\frac{({-}1)^{m{+}1}2^{{-}n}\left(\begin{array}{l}2n{-}1\\n\end{array}\right)}{\left(\begin{array}{l}2m{-}1\\m\end{array}\right)\left(\begin{array}{l}2n{+}1\\n{-}m\end{array}\right)}\frac{n(2n{+}1)}{m^{2}(m{+}1)}{{\int}_{{-}1}^{1}}(1{-}z)(1{+}z)^{n}T_{m{-}1}^{1}(z)dz.\]

An integral transform of (1 − z)(1 + z)ⁿ by the Gegenbauer polynomial is

\[{{\int}_{{-}1}^{1}}(1{-}z)(1{+}z)^{n}T_{m{-}1}^{1}(z)dz{=}\frac{\left(\begin{array}{l}2n{+}1\\n{-}m\end{array}\right)}{\left(\begin{array}{l}2n{-}1\\n\end{array}\right)}\frac{2^{n}m(m{+}1)}{n(2n{+}1)}\]

(Erdélyi 1954). Thus, we have

\[C_{m}{=}\frac{({-}1)^{m{+}1}}{m\left(\begin{array}{l}2m{-}1\\m\end{array}\right)}.\]
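The two ingredients of this appendix can be verified in exact rational arithmetic. The sketch below is an illustration with hypothetical helper names: it represents polynomials as coefficient lists, checks the integral transform quoted from Erdélyi (1954) for small n and m, and confirms that the resulting closed form of C_{n−1}^{(m)} satisfies the initial condition (7). The particular values of p and D are arbitrary.

```python
from fractions import Fraction
from math import comb

def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists (low to high)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def integrate_minus1_to_1(a):
    """Exact integral over [-1, 1]: odd powers vanish, z^k gives 2/(k+1)."""
    return sum(c * Fraction(2, k + 1) for k, c in enumerate(a) if k % 2 == 0)

def poly_eval(a, z):
    """Evaluate a coefficient list at z."""
    return sum(c * z ** k for k, c in enumerate(a))

def gegenbauer_polys(max_degree):
    """Coefficient lists of C_k^{3/2}(z), k = 0..max_degree."""
    polys = [[Fraction(1)], [Fraction(0), Fraction(3)]]
    for k in range(2, max_degree + 1):
        shifted = [Fraction(0)] + polys[k - 1]            # z * C_{k-1}
        older = polys[k - 2] + [Fraction(0)] * 2          # pad C_{k-2}
        polys.append([((2 * k + 1) * s - (k + 1) * t) / k
                      for s, t in zip(shifted, older)])
    return polys[:max_degree + 1]

def erdelyi_integral(n, m, polys):
    """Integral of (1 - z)(1 + z)^n T_{m-1}^1(z) over [-1, 1], computed exactly."""
    f = [Fraction(1), Fraction(-1)]
    for _ in range(n):
        f = poly_mul(f, [Fraction(1), Fraction(1)])
    return integrate_minus1_to_1(poly_mul(f, polys[m - 1]))

def erdelyi_closed_form(n, m):
    """Closed form quoted in the text for the same integral."""
    return (Fraction(comb(2 * n + 1, n - m), comb(2 * n - 1, n))
            * Fraction(2 ** n * m * (m + 1), n * (2 * n + 1)))

def coeff_C(n, m, p, D, polys):
    """Closed form of C_{n-1}^{(m)} obtained in this appendix."""
    return (Fraction(comb(2 * n + 1, n - m), comb(2 * n - 1, n))
            * Fraction(2 * m + 1, n * (2 * n + 1))
            * D * (-1) ** (m + 1) * poly_eval(polys[m - 1], 1 - 2 * p))

if __name__ == "__main__":
    polys = gegenbauer_polys(5)
    p, D = Fraction(3, 10), Fraction(1, 50)
    for n in range(1, 7):
        total = sum(coeff_C(n, m, p, D, polys) for m in range(1, n + 1))
        print(n, total == p ** (n - 1) * D)
```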

APPENDIX B

It is straightforward to check the identity (11) for m = n. For 1 ≤ m ≤ n − 1, the finite series can be expressed by the truncated hypergeometric series,

\begin{eqnarray*}&&{{\sum}_{k{=}1}^{n{-}m{+}1}}\frac{n!(n{-}1)!(k{+}m{-}1)}{(k{-}1)!(k{+}2m)!}\frac{{\Gamma}(k{+}m{-}(3/2){+}\mathrm{{\sigma}}){\Gamma}(k{+}m{-}(3/2){-}\mathrm{{\sigma}})}{{\Gamma}(n{+}(1/2){+}\mathrm{{\sigma}}){\Gamma}(n{+}(1/2){-}\mathrm{{\sigma}})}{=}\frac{n!(n{-}1)!}{{\Gamma}(n{+}(1/2){+}\mathrm{{\sigma}}){\Gamma}(n{+}(1/2){-}\mathrm{{\sigma}})}\\&&{\times}\left[\frac{m{\Gamma}(m{-}(1/2){+}\mathrm{{\sigma}}){\Gamma}(m{-}(1/2){-}\mathrm{{\sigma}})}{(2m{+}1)!}\right.\ \\&&\left.\ {\times}y_{n{-}m}\left(m{-}\frac{1}{2}{+}\mathrm{{\sigma}},{\,}m{-}\frac{1}{2}{-}\mathrm{{\sigma}},{\,}2m{+}2,{\,}1\right)\right.\ \\&&\left.\ {+}\frac{{\Gamma}(m{+}(1/2){+}\mathrm{{\sigma}}){\Gamma}(m{+}(1/2){-}\mathrm{{\sigma}})}{(2m{+}2)!}\right.\ \\&&\left.\ {\times}y_{n{-}m{-}1}\left(m{+}\frac{1}{2}{+}\mathrm{{\sigma}},{\,}m{+}\frac{1}{2}{-}\mathrm{{\sigma}},{\,}2m{+}3,{\,}1\right)\right],\end{eqnarray*}

where y_n(a, b, c, z) is the truncated hypergeometric series. The truncated hypergeometric series can be expressed in terms of the generalized hypergeometric series,

\[y_{i}(a,{\,}b,{\,}c,{\,}1){=}\frac{{\Gamma}(a{+}i{+}1){\Gamma}(b{+}i{+}1)}{i!{\Gamma}(a{+}b{+}i{+}1)}_{3}F_{2}\left(\begin{array}{l}a,{\,}b,{\,}c{+}i;{\,}1\\c,{\,}a{+}b{+}i{+}1\end{array}\right)\]

(Erdélyi 1953), where

\[_{3}F_{2}\left(\begin{array}{l}a,{\,}b,{\,}c;{\,}z\\d,{\,}e\end{array}\right)\]

is the generalized hypergeometric series. Thus, we have an identity for the truncated hypergeometric series:

\begin{eqnarray*}&&y_{i}(a,{\,}b,{\,}a{+}b{+}j,{\,}1){=}\frac{{\Gamma}(a{+}i{+}1){\Gamma}(b{+}i{+}1)}{i!{\Gamma}(a{+}b{+}i{+}1)}_{3}F_{2}\left(\begin{array}{l}a,{\,}b,{\,}a{+}b{+}i{+}j;{\,}1\\a{+}b{+}j,{\,}a{+}b{+}i{+}1\end{array}\right)\\&&{=}\frac{{\Gamma}(a{+}i{+}1){\Gamma}(b{+}i{+}1)}{i!{\Gamma}(a{+}b{+}i{+}1)}_{3}F_{2}\left(\begin{array}{l}a,{\,}b,{\,}a{+}b{+}i{+}j;{\,}1\\a{+}b{+}i{+}1,{\,}a{+}b{+}j\end{array}\right)\\&&{=}\frac{{\Gamma}(a{+}i{+}1){\Gamma}(b{+}i{+}1)}{i!{\Gamma}(a{+}b{+}i{+}1)}\frac{(j{-}1)!{\Gamma}(a{+}b{+}j)}{{\Gamma}(a{+}j){\Gamma}(b{+}j)}\\&&{\times}y_{j{-}1}(a,{\,}b,{\,}a{+}b{+}i{+}1,{\,}1).\end{eqnarray*}

By using the identity for the truncated hypergeometric series, we obtain

\begin{eqnarray*}&&{{\sum}_{k{=}1}^{n{-}m{+}1}}\frac{n!(n{-}1)!(k{+}m{-}1)}{(k{-}1)!(k{+}2m)!}\frac{{\Gamma}(k{+}m{-}(3/2){+}\mathrm{{\sigma}}){\Gamma}(k{+}m{-}(3/2){-}\mathrm{{\sigma}})}{{\Gamma}(n{+}(1/2){+}\mathrm{{\sigma}}){\Gamma}(n{+}(1/2){-}\mathrm{{\sigma}})}\\&&{=}\frac{n!(n{-}1)!{\Gamma}(m{-}(1/2){+}\mathrm{{\sigma}}){\Gamma}(m{-}(1/2){-}\mathrm{{\sigma}})}{(n{+}m{-}1)!(n{-}m)!{\Gamma}(m{+}(5/2){+}\mathrm{{\sigma}}){\Gamma}(m{+}(5/2){-}\mathrm{{\sigma}})}\\&&{\times}\left[2my_{2}\left(m{-}\frac{1}{2}{+}\mathrm{{\sigma}},{\,}m{-}\frac{1}{2}{-}\mathrm{{\sigma}},{\,}n{+}m,{\,}1\right)\right.\ \\&&\left.\ {+}\frac{(n{-}m)(m{-}(1/2){+}\mathrm{{\sigma}})(m{-}(1/2){-}\mathrm{{\sigma}})}{n{+}m}y_{1}\left(m{+}\frac{1}{2}{+}\mathrm{{\sigma}},{\,}m{+}\frac{1}{2}{-}\mathrm{{\sigma}},{\,}n{+}m{+}1,{\,}1\right)\right]\\&&{=}\frac{n!(n{-}1)!}{(n{+}m{-}1)!(n{-}m)!}\frac{{-}1}{2(2m{+}1)}\left[\frac{1}{2m{+}\mathrm{{\rho}}}{+}\frac{1}{2(m{+}1){-}\mathrm{{\rho}}}\frac{(n{-}m)(n{-}m{-}1)}{(n{+}m)(n{+}m{+}1)}\right].\end{eqnarray*}
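The identity (11) can also be confirmed numerically for small n and m. The following sketch uses hypothetical function names and relies on `math.gamma`, which accepts negative non-integer arguments; ρ is assumed to be chosen so that no gamma argument is a non-positive integer and ρ ≠ 2 + 2ℓ.

```python
import math

def lhs_sum(n, m, rho):
    """Finite series on the left-hand side of identity (11)."""
    sigma = math.sqrt(0.25 + m * (m + 1) + rho)
    denom = math.gamma(n + 0.5 + sigma) * math.gamma(n + 0.5 - sigma)
    total = 0.0
    for k in range(1, n - m + 2):
        weight = (math.factorial(n) * math.factorial(n - 1) * (k + m - 1)
                  / (math.factorial(k - 1) * math.factorial(k + 2 * m)))
        total += (weight * math.gamma(k + m - 1.5 + sigma)
                  * math.gamma(k + m - 1.5 - sigma) / denom)
    return total

def rhs_closed(n, m, rho):
    """Closed form on the right-hand side of identity (11)."""
    pref = (math.factorial(n) * math.factorial(n - 1)
            / (math.factorial(n + m - 1) * math.factorial(n - m)))
    bracket = (1.0 / (2 * m + rho)
               + (n - m) * (n - m - 1)
               / ((2 * (m + 1) - rho) * (n + m) * (n + m + 1)))
    return pref * (-1.0) / (2 * (2 * m + 1)) * bracket

if __name__ == "__main__":
    rho = 3.7
    for m in range(1, 4):
        for n in range(m, 7):
            print(m, n, lhs_sum(n, m, rho), rhs_closed(n, m, rho))
```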

Footnotes

Communicating editor: M. Feldman

Acknowledgement

I acknowledge the continuous encouragement offered by T. Gojobori. Also, I thank M. Notohara, A. Simizu, and two anonymous reviewers for comments on an earlier version of this manuscript.

References

Erdélyi, A. (Editor), 1953 Higher Transcendental Functions, Vol. I. McGraw-Hill, New York.

Erdélyi, A. (Editor), 1954 Tables of Integral Transforms, Vol. II. McGraw-Hill, New York.

Ethier, S. N., 1979 A limit theorem for two-locus diffusion models in population genetics. J. Appl. Probab. 16: 402–408.
