Abstract

An analytic expression for the conditional expectation of the transient gamete frequency, given that one of the two loci remains polymorphic, is obtained for the diffusion process by calculating the moments of the distribution. Using this expression, a model in which linkage disequilibrium is introduced by a single mutation is considered, and the conditional expectation of the gamete frequency given that the locus carrying the mutant allele remains polymorphic is presented. Its behavior differs markedly from the monotonic decrease observed in the deterministic model without random genetic drift.

WITH respect to random genetic drift in the one-locus problem, the state of steady decay was first obtained correctly by Wright (1931), who, however, assumed that this state had already been attained. By calculating the moments of the distribution, Kimura (1955a) obtained the complete expression for the transient probability density of the unfixed class, which shows how the process approaches the state of steady decay. He found that the distribution becomes almost flat after 2N generations, where N is the effective population size.

Since each mutant ultimately becomes either fixed or lost, a steady state is attained only if linear evolutionary pressures, such as mutation, operate. For the two-locus problem, the steady state has been discussed in terms of the diffusion process (Ohta and Kimura 1969b, 1970; Ethier and Nagylaki 1989) and the genealogical process (Griffiths 1981; Hudson 1983, 1985; Golding 1984; Ethier and Griffiths 1990). In contrast, without linear evolutionary pressures, how the process eventually approaches the state of steady decay has not been studied, except for several functions that vanish at the absorbing boundaries (Hill and Robertson 1968; Ohta and Kimura 1969a; Littler 1973). Although the two-locus problem is uniquely characterized by the gamete frequencies, their transient behavior has not been examined.

In this article, I derive an analytic expression for the conditional expectation of the transient gamete frequency, given that one of the two loci remains polymorphic, for the diffusion process. This expression shows how the conditional expectation ultimately approaches its asymptotic value. Using this expression, I discuss a model in which linkage disequilibrium is introduced by a single mutation.

THE DIFFUSION PROCESS OF THE TWO-LOCUS PROBLEM

Consider a random-mating population with effective population size N, and measure time T in units of 2N generations. Let A1 and A2 be a pair of alleles with initial frequencies p and 1 − p, respectively, and with frequencies X and 1 − X, respectively, at time T. In what follows, lowercase letters denote the values of the corresponding random variables. Kimura (1955a) obtained an analytic expression for the transient probability density of the unfixed class, which we denote by ϕ(p, x; T). He also gave the probability that the locus remains polymorphic,
\begin{eqnarray*}&&P[X{\in}(0,{\,}1)]{=}{{\int}_{0}^{1}}\mathrm{{\phi}}dx\\&&{=}1{-}{\mathrm{lim}_{n{\rightarrow}{\infty}}}{\,}E[X^{n}]{-}{\mathrm{lim}_{n{\rightarrow}{\infty}}}{\,}E[(1{-}X)^{n}]\\&&{=}{{\sum}_{m{=}0}^{{\infty}}}[P_{2m}(1{-}2p){-}P_{2m{+}2}(1{-}2p)]e^{{-}((2m{+}1)(2m{+}2)/2)T},\end{eqnarray*}
(1)
where \(P_{m}(z)\) denotes the Legendre polynomial of degree m.
Next, we discuss the expectation of the allele frequency. Since a polymorphism that has been lost cannot be observed, polymorphisms are observed only in the unfixed class. Thus, the relation E[X] = p, although true, is uninformative from the perspective of observation. We are instead interested in the conditional expectation of the frequency given that the polymorphism is retained. Using the expression for the transient fixation probability given by Kimura (1955a), we obtain the conditional expectation of the allele frequency for the unfixed class,
\[E[X{\vert}X{\in}(0,{\,}1)]{=}\frac{E[XI_{(0,1)}(X)]}{P[X{\in}(0,{\,}1)]},\]
(2)
where
\begin{eqnarray*}&&E[XI_{(0,1)}(X)]{=}{{\int}_{0}^{1}}x\mathrm{{\phi}}dx{=}E[X]{-}f(1;{\,}T)\\&&{=}{{\sum}_{m{=}1}^{{\infty}}}\frac{({-}1)^{m}}{2}[P_{m{+}1}(1{-}2p){-}P_{m{-}1}(1{-}2p)]e^{{-}(m(m{+}1)/2)T},\end{eqnarray*}
where \(I_{(0,1)}(X)\) denotes the indicator function of the open interval (0, 1) and f(1; T) denotes the transient fixation probability of the allele A1. The asymptotic value of the conditional expectation of the allele frequency is
\[E[X{\vert}X{\in}(0,{\,}1)]{\rightarrow}\frac{1}{2}{\ }(T{\rightarrow}{\infty}),\]
which agrees with the fact that the conditional distribution becomes uniform asymptotically.
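As a minimal numerical sketch (not part of the original analysis), series (1) and the numerator of (2) can be evaluated by truncating them after a fixed number of terms. The function names and the truncation length are illustrative assumptions; the Legendre polynomials are generated with the standard three-term (Bonnet) recurrence, so no external library is needed.

```python
import math

# Illustrative sketch; all function names are hypothetical.

def legendre_values(n_max, z):
    """Return [P_0(z), ..., P_{n_max}(z)] via Bonnet's recurrence."""
    vals = [1.0, z]
    for n in range(1, n_max):
        # (n + 1) P_{n+1} = (2n + 1) z P_n - n P_{n-1}
        vals.append(((2 * n + 1) * z * vals[n] - n * vals[n - 1]) / (n + 1))
    return vals[: n_max + 1]

def prob_polymorphic(p, T, n_terms=200):
    """Truncated series (1): probability that the locus still segregates at time T."""
    P = legendre_values(2 * n_terms + 2, 1.0 - 2.0 * p)
    return sum(
        (P[2 * m] - P[2 * m + 2]) * math.exp(-(2 * m + 1) * (2 * m + 2) / 2 * T)
        for m in range(n_terms)
    )

def mean_times_indicator(p, T, n_terms=200):
    """Truncated series for E[X I_(0,1)(X)] = E[X] - f(1; T)."""
    P = legendre_values(n_terms + 2, 1.0 - 2.0 * p)
    return sum(
        (-1) ** m / 2 * (P[m + 1] - P[m - 1]) * math.exp(-m * (m + 1) / 2 * T)
        for m in range(1, n_terms + 1)
    )

def conditional_mean(p, T):
    """Equation (2): E[X | X in (0, 1)]."""
    return mean_times_indicator(p, T) / prob_polymorphic(p, T)

for T in (0.5, 1.0, 2.0, 4.0):
    print(T, prob_polymorphic(0.05, T), conditional_mean(0.05, T))
```

Because the exponential factors decay slowly when T is close to 0, a fixed truncation is adequate only for T bounded away from 0. For p = 1/2 the conditional mean equals 1/2 at every T by symmetry, which gives a convenient check.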
Let us consider two loci A and B at which the pairs of alleles A1, A2 and B1, B2 are segregating. Let the initial frequencies of the gametes A1B1, A1B2, A2B1, and A2B2 be g1, g2, g3, and 1 − (g1 + g2 + g3), respectively, and let their frequencies at time T be X1, X2, X3, and 1 − (X1 + X2 + X3), respectively. Let the initial frequencies of the alleles B1 and B2 be q and 1 − q, respectively, and let their frequencies at time T be Y and 1 − Y, respectively. Let D = g1(1 − g1 − g2 − g3) − g2g3 be the initial value of the linkage disequilibrium coefficient and Z = X1(1 − X1 − X2 − X3) − X2X3 its value at time T. We have
\[X_{1}{=}XY{+}Z,{\ }X_{2}{=}X(1{-}Y){-}Z,{\ }X_{3}{=}(1{-}X)Y{-}Z.\]
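The change of variables can be written out explicitly. The sketch below (hypothetical helper names) maps the gamete frequencies to (x, y, z), with z computed as g1g4 − g2g3 where g4 = 1 − (g1 + g2 + g3), and back again. It relies on the identity g1g4 − g2g3 = g1 − (g1 + g2)(g1 + g3), which is why X1 = XY + Z.

```python
# Illustrative sketch; function names are hypothetical.

def gametes_to_xyz(g1, g2, g3):
    """Map gamete frequencies (A1B1, A1B2, A2B1) to (x, y, z)."""
    g4 = 1.0 - (g1 + g2 + g3)   # frequency of A2B2
    x = g1 + g2                 # frequency of allele A1
    y = g1 + g3                 # frequency of allele B1
    z = g1 * g4 - g2 * g3       # linkage disequilibrium coefficient
    return x, y, z

def xyz_to_gametes(x, y, z):
    """Inverse map: X1 = XY + Z, X2 = X(1 - Y) - Z, X3 = (1 - X)Y - Z."""
    return x * y + z, x * (1.0 - y) - z, (1.0 - x) * y - z

# Single-mutation initial state with p = 0.05, q = 0.2 gives z = p(1 - q) = 0.04.
print(gametes_to_xyz(0.05, 0.0, 0.15))
```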

Let c be the recombination fraction between the two loci, and set ρ = 4Nc. We do not discuss the case c = 0, since the problem then reduces to the multiallelic one-locus problem, which has been discussed by Kimura (1955b). For the deterministic model without random genetic drift, we have x = p, y = q, and \(z = De^{-ct}\), where t = 2NT is time measured in generations.

The probability density for the gamete frequencies
\[\mathrm{{\phi}}(g_{1},{\,}g_{2},{\,}g_{3};{\,}x_{1},{\,}x_{2},{\,}x_{3};{\,}T)\]
satisfies the following Kolmogorov backward equation,
\[\frac{{\partial}\mathrm{{\phi}}}{{\partial}T}{=}{{\sum}_{i,j{=}1}^{3}}\frac{g_{i}(\mathrm{{\delta}}_{ij}{-}g_{j})}{2}\frac{{\partial}^{2}\mathrm{{\phi}}}{{\partial}g_{i}{\partial}g_{j}}{-}\frac{\mathrm{{\rho}}D}{2}\left(\frac{{\partial}\mathrm{{\phi}}}{{\partial}g_{1}}{-}\frac{{\partial}\mathrm{{\phi}}}{{\partial}g_{2}}{-}\frac{{\partial}\mathrm{{\phi}}}{{\partial}g_{3}}\right)\]
(3)
(Ohta and Kimura 1969a), where δij denotes Kronecker's delta. Although the probability density itself is unknown, Ohta and Kimura (1969a) obtained the expectations of the functions
\[X(1{-}X)Y(1{-}Y),{\ }(1{-}2X)(1{-}2Y)Z,{\ }Z^{2},\]
(4)
which were discussed by Hill and Robertson (1968). The process is defined on the tetrahedron 0 ≤ x1 ≤ x1 + x2 ≤ x1 + x2 + x3 ≤ 1. Changing variables from the gamete frequencies to x, y, and z transforms this region into a three-dimensional region, the upper surface of whose boundary is depicted in Figure 1. On the peripheral edges, which form the boundary of the square 0 ≤ x ≤ 1, 0 ≤ y ≤ 1 (with z = 0), one of the two loci is monomorphic. At the points (1, 1, 0), (1, 0, 0), (0, 1, 0), and (0, 0, 0), the gamete A1B1, A1B2, A2B1, or A2B2, respectively, is fixed. We denote the interior of the region by \(\mathcal{D}\). The expectation of the linkage disequilibrium coefficient is
\[E[Z]{=}De^{{-}(1{+}(\mathrm{{\rho}}/2))T}\]
(Hill and Robertson 1968), and the squared standard linkage deviation tends to
\[\frac{E[Z^{2}]}{E[X(1{-}X)Y(1{-}Y)]}{\sim}\frac{1}{\mathrm{{\rho}}}{\ }(T{\rightarrow}{\infty}),\]
when ρ is large (Ohta and Kimura 1969a).
Figure 1.— The upper surface of the boundary of the region in which the diffusion process is defined.

Next, we discuss the expectation of the gamete frequencies. In the same manner as for the functions (4) and the linkage disequilibrium measure, we obtain the expectation of the gamete frequency X1:
\[E[X_{1}]{=}g_{1}{+}\frac{\mathrm{{\rho}}D}{2{+}\mathrm{{\rho}}}[e^{{-}(1{+}(\mathrm{{\rho}}/2))T}{-}1].\]
However, in contrast to the functions (4) and the linkage disequilibrium coefficient, the gamete frequencies do not vanish on the peripheral edges, so the expectation is taken not only over the interior \(\mathcal{D}\) but also over the peripheral edges, including the point at which the gamete A1B1 is fixed. As obtained by Ohta (1968), \(\mathrm{lim}_{T{\rightarrow}{\infty}}E[X_{1}]\) gives the fixation probability of the gamete A1B1. Thus, the expectation of the gamete frequency X1 can be rewritten as
\begin{eqnarray*}&&E[X_{1}]{=}{{\int}}{{\int}}{{\int}_{\mathcal{D}}}x_{1}\mathrm{{\phi}}dxdydz{+}{{\int}_{0}^{1}}x_{1}\mathrm{{\phi}}_{x{=}1}dy{+}{{\int}_{0}^{1}}x_{1}\mathrm{{\phi}}_{y{=}1}dx\\&&{+}f(1,{\,}0,{\,}0;{\,}T),\end{eqnarray*}
(5)
where \(\phi_{x=1}\) and \(\phi_{y=1}\) denote the probability densities on the open edges {x = 1, y ∈ (0, 1), z = 0} and {x ∈ (0, 1), y = 1, z = 0}, respectively, and f(1, 0, 0; T) denotes the transient fixation probability of the gamete A1B1 at time T. The edges x = 0 and y = 0 do not contribute, because x1 = xy + z = 0 there. As in the one-locus problem, we are interested in the conditional expectation of the gamete frequencies given that the polymorphism is retained.
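A short sketch, assuming the closed forms given above for E[Z] and E[X1]. As a cross-check it also integrates dE[XY]/dT = E[Z] (the moment equation given in the next section with ℓ = m = 1, n = 0) with the trapezoidal rule and adds E[Z], since X1 = XY + Z. Function names and the step count are illustrative assumptions.

```python
import math

# Illustrative sketch; function names are hypothetical.

def expected_ld(D, rho, T):
    """E[Z] = D exp(-(1 + rho/2) T)  (Hill and Robertson 1968)."""
    return D * math.exp(-(1.0 + rho / 2.0) * T)

def expected_x1(g1, D, rho, T):
    """E[X1] = g1 + rho D / (2 + rho) * (exp(-(1 + rho/2) T) - 1)."""
    return g1 + rho * D / (2.0 + rho) * (math.exp(-(1.0 + rho / 2.0) * T) - 1.0)

def fixation_prob_a1b1(p, q, D, rho):
    """T -> infinity limit of E[X1]: pq + 2D / (2 + rho)."""
    return p * q + 2.0 * D / (2.0 + rho)

def expected_x1_by_quadrature(p, q, D, rho, T, steps=10000):
    """E[XY](T) = pq + integral_0^T E[Z](s) ds (trapezoidal rule), plus E[Z](T)."""
    h = T / steps
    integral = sum(
        0.5 * h * (expected_ld(D, rho, i * h) + expected_ld(D, rho, (i + 1) * h))
        for i in range(steps)
    )
    return p * q + integral + expected_ld(D, rho, T)

p, q, D, rho = 0.05, 0.2, 0.04, 3.0
g1 = p * q + D
print(expected_x1(g1, D, rho, 1.5), expected_x1_by_quadrature(p, q, D, rho, 1.5))
```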

CONDITIONAL EXPECTATION OF GAMETE FREQUENCY

Let us consider a model in which linkage disequilibrium is introduced by a single mutation, as considered by Nei and Li (1980) for the association between electromorphs and inversion chromosomes in Drosophila. We assume that locus A has been monomorphic for the wild-type allele A2 and that at locus B a pair of alleles B1 and B2 (electromorphs) are segregating with frequencies q and 1 − q, respectively. A mutation then introduces the mutant allele A1 (inversion chromosome) at locus A on one of the chromosomes bearing allele B1. This model specifies the initial frequency of A1 as p = 1/2N, the initial gamete frequencies as g1 = p, g2 = 0, and g3 = q − p, and the initial value of the linkage disequilibrium coefficient as D = p(1 − q); however, the following expressions hold regardless of these relations. In this model, the retention of polymorphism at locus A is important, because allele A1 is prone to be lost by random genetic drift; locus B may be regarded as a marker polymorphism for detecting the mutant. We therefore consider the conditional expectation given that locus A remains polymorphic. This condition might seem similar to that of Kaplan and Weir (1992), who discussed the conditional expectation of the linkage disequilibrium measure defined by Nei and Li (1980), given that a polymorphism is observed at locus B. However, they assumed that the frequency of A1 is constant and that locus B follows the infinite-allele model, and they considered the steady state. Their model thus differs from the present one, in which the diffusion process ultimately leads to monomorphism, so that conditioning on the polymorphism at locus A is meaningful. Note that in large populations this condition is nearly equivalent to the condition that both loci remain polymorphic, since the probability that the polymorphism at locus A is lost before that at locus B is given by Karlin and McGregor (1968) as
\[\frac{q(1{-}q)}{q(1{-}q){+}p(1{-}p)},\]
which is close to unity unless q(1 − q) is comparable to p(1 − p) ≈ 1/2N, that is, unless q is very close to 0 or 1.
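For concreteness, the initial state of the single-mutation model and the Karlin–McGregor probability can be computed as follows. This is a minimal sketch with hypothetical function names; it simply encodes p = 1/2N, g1 = p, g2 = 0, g3 = q − p, D = p(1 − q), and the formula above.

```python
# Illustrative sketch; function names are hypothetical.

def single_mutation_initial_state(N, q):
    """Initial values when one A1 mutant arises on a B1-bearing chromosome."""
    p = 1.0 / (2 * N)
    g1, g2, g3 = p, 0.0, q - p   # A1B1, A1B2, A2B1
    D = p * (1.0 - q)            # equals g1*g4 - g2*g3 with g4 = 1 - q
    return p, (g1, g2, g3), D

def prob_A_lost_first(p, q):
    """Karlin and McGregor (1968): locus A becomes monomorphic before locus B."""
    return q * (1 - q) / (q * (1 - q) + p * (1 - p))

for N in (10, 100, 1000):
    p, _, _ = single_mutation_initial_state(N, 0.2)
    print(N, prob_A_lost_first(p, 0.2))
```

With N = 10 and q = 0.2, the values used in the figures, this probability is only about 0.77, so the equivalence of the two conditions is a large-population statement; for N = 1000 it exceeds 0.99.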
By expression (5), we have
\begin{eqnarray*}&&E[X_{1}I_{(0,1)}(X)]{=}{{\int}}{{\int}}{{\int}_{\mathcal{D}}}x_{1}\mathrm{{\phi}}dxdydz{+}{{\int}_{0}^{1}}x_{1}\mathrm{{\phi}}_{y{=}1}dx\\&&{=}E[X_{1}]{-}f(1,{\,}0,{\,}0;{\,}T){-}{{\int}_{0}^{1}}x_{1}\mathrm{{\phi}}_{x{=}1}dy\\&&{=}E[X_{1}]{-}{\mathrm{lim}_{n{\rightarrow}{\infty}}}E[X_{1}X^{n}]\\&&{=}E[X_{1}]{-}{\mathrm{lim}_{n{\rightarrow}{\infty}}}E[X^{n}Y].\end{eqnarray*}
(6)
The expressions for the other gamete frequencies X2 and X3 can be obtained in the same manner. To calculate the limit \(\mathrm{lim}_{n{\rightarrow}{\infty}}E[X^{n}Y]\), we consider several moments. For convenience, we denote
\[\mathrm{{\mu}}_{{\ell},m,n}{=}E[X^{{\ell}}Y^{m}Z^{n}].\]
Using the Kolmogorov backward equation (3), we find that the moments \(\mu_{\ell,m,n}\) satisfy the differential equation
\begin{eqnarray*}&&\frac{d\mathrm{{\mu}}_{{\ell},m,n}}{dT}{=}{-}\frac{{\ell}({\ell}{-}1){+}m(m{-}1){+}n(n{-}1){+}4n({\ell}{+}m){+}n(2{+}\mathrm{{\rho}})}{2}\mathrm{{\mu}}_{{\ell},m,n}\\&&{+}{\ell}m\mathrm{{\mu}}_{{\ell}{-}1,m{-}1,n{+}1}{+}\frac{{\ell}({\ell}{+}2n{-}1)}{2}\mathrm{{\mu}}_{{\ell}{-}1,m,n}{+}\frac{m(m{+}2n{-}1)}{2}\mathrm{{\mu}}_{{\ell},m{-}1,n}\\&&{+}\frac{n(n{-}1)}{2}[\mathrm{{\mu}}_{{\ell},m,n{-}1}{+}\mathrm{{\mu}}_{{\ell}{+}1,m{+}1,n{-}2}{-}\mathrm{{\mu}}_{{\ell}{+}1,m{+}2,n{-}2}{-}\mathrm{{\mu}}_{{\ell}{+}2,m{+}1,n{-}2}\\&&{+}\mathrm{{\mu}}_{{\ell}{+}2,m{+}2,n{-}2}{-}2(\mathrm{{\mu}}_{{\ell}{+}1,m,n{-}1}{+}\mathrm{{\mu}}_{{\ell},m{+}1,n{-}1})\\&&{+}4\mathrm{{\mu}}_{{\ell}{+}1,m{+}1,n{-}1}].\end{eqnarray*}
Note that \(E[X^{\ell}Y^{m}X_{1}^{n}]\) satisfies a recurrence relation on the two-locus sampling distribution given by Golding (1984; see also Ethier and Griffiths 1990).
The moments \(\mu_{n-1,0,1}\) satisfy the differential equation
\[\frac{d\mathrm{{\mu}}_{n{-}1,0,1}}{dT}{=}{-}\left[\frac{n(n{+}1)}{2}{+}\frac{\mathrm{{\rho}}}{2}\right]\mathrm{{\mu}}_{n{-}1,0,1}{+}\frac{n(n{-}1)}{2}\mathrm{{\mu}}_{n{-}2,0,1},\]
which has a solution of the form
\[\mathrm{{\mu}}_{n{-}1,0,1}{=}{{\sum}_{m{=}1}^{n}}C_{n{-}1}^{(m)}e^{{-}((\mathrm{{\rho}}{+}m(m{+}1))/2)T},\]
where
\[C_{n{-}1}^{(m)}{=}\frac{\left(\begin{array}{l}2n{+}1\\n{-}m\end{array}\right)\left(\begin{array}{l}2m{-}1\\m\end{array}\right)}{\left(\begin{array}{l}2n{-}1\\n\end{array}\right)}\frac{m(2m{+}1)}{n(2n{+}1)}C_{m{-}1}^{(m)},\]
with the initial condition
\[p^{n{-}1}D{=}{{\sum}_{m{=}1}^{n}}C_{n{-}1}^{(m)}.\]
(7)
In Appendix A it is shown that
\[C_{n{-}1}^{(m)}{=}\frac{\left(\begin{array}{l}2n{+}1\\n{-}m\end{array}\right)}{\left(\begin{array}{l}2n{-}1\\n\end{array}\right)}\frac{2m{+}1}{n(2n{+}1)}D({-}1)^{m{+}1}T_{m{-}1}^{1}(1{-}2p),\]
where \(T_{m}^{1}(z)\) denotes the Gegenbauer polynomial, also written \(C_{m}^{3/2}(z)\).
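The closed form of \(C_{n-1}^{(m)}\) can be checked in exact rational arithmetic. The sketch below (hypothetical helper names) generates \(T^{1}_{m}=C^{3/2}_{m}\) from the standard three-term recurrence for Gegenbauer polynomials and verifies, for small n, both the initial condition (7) and the relation \((n-m)(n+m+1)C_{n-1}^{(m)}=n(n-1)C_{n-2}^{(m)}\), which is what substituting the exponential solution into the differential equation for \(\mu_{n-1,0,1}\) requires.

```python
from fractions import Fraction
from math import comb

# Illustrative sketch; function names are hypothetical.

def gegenbauer_t1(m, z):
    """T^1_m(z) = C^{3/2}_m(z) via m C_m = (2m + 1) z C_{m-1} - (m + 1) C_{m-2}."""
    prev, cur = Fraction(1), 3 * z
    if m == 0:
        return prev
    for k in range(2, m + 1):
        prev, cur = cur, ((2 * k + 1) * z * cur - (k + 1) * prev) / k
    return cur

def c_coeff(n, m, p, D):
    """Closed form of C_{n-1}^{(m)} from Appendix A (1 <= m <= n)."""
    z = 1 - 2 * p
    return (Fraction(comb(2 * n + 1, n - m), comb(2 * n - 1, n))
            * Fraction(2 * m + 1, n * (2 * n + 1))
            * D * (-1) ** (m + 1) * gegenbauer_t1(m - 1, z))

p, D = Fraction(1, 20), Fraction(1, 25)
print([c_coeff(3, m, p, D) for m in (1, 2, 3)])
```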
The moments \(\mu_{n,1,0}\) satisfy the differential equation
\[\frac{d\mathrm{{\mu}}_{n,1,0}}{dT}{=}{-}\frac{n(n{-}1)}{2}(\mathrm{{\mu}}_{n,1,0}{-}\mathrm{{\mu}}_{n{-}1,1,0}){+}n\mathrm{{\mu}}_{n{-}1,0,1},\]
which has a solution of the form
\begin{eqnarray*}&&\mathrm{{\mu}}_{n,1,0}{=}pq{+}\frac{D}{1{+}(\mathrm{{\rho}}/2)}{+}{{\sum}_{m{=}1}^{n{-}1}}E_{n}^{(m)}e^{{-}(m(m{+}1)/2)T}\\&&{+}{{\sum}_{m{=}1}^{n}}F_{n}^{(m)}e^{{-}((\mathrm{{\rho}}{+}m(m{+}1))/2)T},\end{eqnarray*}
where
\[E_{n}^{(m)}{=}\frac{\left(\begin{array}{l}2n{-}1\\n{+}m\end{array}\right)\left(\begin{array}{l}2m{+}1\\m\end{array}\right)}{\left(\begin{array}{l}2n{-}1\\n\end{array}\right)}E_{m{+}1}^{(m)},\]
(8)
\[[(n{+}m)(n{-}m{-}1){-}\mathrm{{\rho}}]F_{n}^{(m)}{=}n(n{-}1)F_{n{-}1}^{(m)}{+}2nC_{n{-}1}^{(m)},\]
(9)
with the initial condition
\[p^{n}q{=}pq{+}\frac{D}{1{+}(\mathrm{{\rho}}/2)}{+}{{\sum}_{m{=}1}^{n{-}1}}E_{n}^{(m)}{+}{{\sum}_{m{=}1}^{n}}F_{n}^{(m)}.\]
(10)
The recurrence relation (9) can be written as a matrix equation \(\mathbf{\mathrm{Af}}{=}\mathbf{\mathrm{c}}\) with vectors \(\mathbf{\mathrm{f}}_{k}{=}F_{k}^{(m)},{\,}\mathbf{\mathrm{c}}_{k}{=}2kC_{k{-}1}^{(m)},(k{=}m,{\,}m{+}1,{\ldots},{\,}n)\), where A is lower bidiagonal. The determinant of the matrix A is therefore the product of its diagonal entries,
\[\mathrm{det}\mathbf{\mathrm{A}}{=}{{\prod}_{k{=}m}^{n}}[k(k{-}1){-}m(m{+}1){-}\mathrm{{\rho}}],\]
which, for ρ > 0, vanishes only at values of the form ρ = 2 + 2ℓ (ℓ = 1, 2, 3, …). These zeros arise from degeneracy of the eigenvalues, that is, from coincidences between the decay rates (ρ + m(m + 1))/2 and k(k − 1)/2. Since these isolated values of ρ are not of particular interest, we assume in what follows that the inverse matrix exists, although the calculation is straightforward at each point ρ = 2 + 2ℓ (ℓ = 1, 2, 3, …). Applying the inverse matrix, we obtain
\begin{eqnarray*}&&F_{n}^{(m)}{=}{{\sum}_{k{=}1}^{n{-}m{+}1}}\frac{2n!(n{-}1)!}{[(n{-}k)!]^{2}}\\&&{\times}\frac{{\Gamma}(n{-}k{+}(1/2){+}\mathrm{{\sigma}}){\Gamma}(n{-}k{+}(1/2){-}\mathrm{{\sigma}})}{{\Gamma}(n{+}(1/2){+}\mathrm{{\sigma}}){\Gamma}(n{+}(1/2){-}\mathrm{{\sigma}})}C_{n{-}k}^{(m)}\\&&{=}\left[{{\sum}_{k{=}1}^{n{-}m{+}1}}\frac{n!(n{-}1)!(k{+}m{-}1)}{(k{-}1)!(k{+}2m)!}\right.\ \\&&\left.\ {\times}\frac{{\Gamma}(k{+}m{-}(3/2){+}\mathrm{{\sigma}}){\Gamma}(k{+}m{-}(3/2){-}\mathrm{{\sigma}})}{{\Gamma}(n{+}(1/2){+}\mathrm{{\sigma}}){\Gamma}(n{+}(1/2){-}\mathrm{{\sigma}})}\right]\\&&{\times}4D(2m{+}1)({-}1)^{m{+}1}T_{m{-}1}^{1}(1{-}2p),\\&&\mathrm{{\sigma}}{=}\sqrt{\frac{1}{4}{+}m(m{+}1){+}\mathrm{{\rho}}}.\end{eqnarray*}
It is shown in Appendix B that the finite series inside the brackets satisfies the identity:
\begin{eqnarray*}&&{{\sum}_{k{=}1}^{n{-}m{+}1}}\frac{n!(n{-}1)!(k{+}m{-}1)}{(k{-}1)!(k{+}2m)!}\frac{{\Gamma}(k{+}m{-}(3/2){+}\mathrm{{\sigma}}){\Gamma}(k{+}m{-}(3/2){-}\mathrm{{\sigma}})}{{\Gamma}(n{+}(1/2){+}\mathrm{{\sigma}}){\Gamma}(n{+}(1/2){-}\mathrm{{\sigma}})}\\&&{=}\frac{n!(n{-}1)!}{(n{+}m{-}1)!(n{-}m)!}\frac{{-}1}{2(2m{+}1)}\\&&{\times}\left[\frac{1}{2m{+}\mathrm{{\rho}}}{+}\frac{1}{2(m{+}1){-}\mathrm{{\rho}}}\frac{(n{-}m)(n{-}m{-}1)}{(n{+}m)(n{+}m{+}1)}\right].\end{eqnarray*}
(11)
Thus, we obtain
\begin{eqnarray*}&&F_{n}^{(m)}{=}\frac{n!(n{-}1)!}{(n{+}m{-}1)!(n{-}m)!}\\&&{\times}\left[\frac{1}{2m{+}\mathrm{{\rho}}}{+}\frac{1}{2(m{+}1){-}\mathrm{{\rho}}}\frac{(n{-}m)(n{-}m{-}1)}{(n{+}m)(n{+}m{+}1)}\right]\\&&{\times}2D({-}1)^{m}T_{m{-}1}^{1}(1{-}2p).\end{eqnarray*}
(12)
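Because C and F are both proportional to \(K = D(-1)^{m}T^{1}_{m-1}(1-2p)\), the claim that (12) solves the recurrence (9) can be checked exactly after dividing out K. The sketch below does this with Python fractions for a non-degenerate rational ρ (ρ = 7/2 is an arbitrary illustrative choice) and also lists the values of ρ at which a diagonal entry of A vanishes, confirming that they are even integers of at least 4. Helper names are hypothetical.

```python
from fractions import Fraction
from math import factorial

# Illustrative sketch; function names are hypothetical.

def c_scaled(n, m):
    """C_{n-1}^{(m)} / K, using the Appendix A closed form."""
    return Fraction(-2 * (2 * m + 1) * factorial(n) * factorial(n - 1),
                    factorial(n - m) * factorial(n + m + 1))

def f_scaled(n, m, rho):
    """F_n^{(m)} / K from (12); F_{m-1}^{(m)} = 0."""
    if n < m:
        return Fraction(0)
    pref = Fraction(factorial(n) * factorial(n - 1),
                    factorial(n + m - 1) * factorial(n - m))
    bracket = (1 / (2 * m + rho)
               + Fraction((n - m) * (n - m - 1), (n + m) * (n + m + 1)) / (2 * (m + 1) - rho))
    return 2 * pref * bracket

def recurrence_residual(n, m, rho):
    """Left side minus right side of recurrence (9)."""
    lhs = ((n + m) * (n - m - 1) - rho) * f_scaled(n, m, rho)
    rhs = n * (n - 1) * f_scaled(n - 1, m, rho) + 2 * n * c_scaled(n, m)
    return lhs - rhs

def degenerate_rhos(n_max):
    """Positive rho at which some diagonal entry (k + m)(k - m - 1) - rho of A vanishes."""
    return sorted({(k + m) * (k - m - 1) for m in range(1, n_max + 1)
                   for k in range(m, n_max + 1) if (k + m) * (k - m - 1) > 0})

print(recurrence_residual(5, 2, Fraction(7, 2)), degenerate_rhos(6))
```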
Using (12) and the orthogonality relation of the Gegenbauer polynomials,
\[{{\int}_{{-}1}^{1}}(1{-}z^{2})T_{k{-}1}^{1}(z)T_{{\ell}{-}1}^{1}(z)dz{=}\mathrm{{\delta}}_{k,{\ell}}\frac{2{\ell}({\ell}{+}1)}{2{\ell}{+}1},\]
(13)
we obtain from (10) the general expression for \(E^{(m)}_{n}\),
\begin{eqnarray*}&&E_{n}^{(m)}{=}({-}1)^{m}\frac{\left(\begin{array}{l}2n{-}1\\n{+}m\end{array}\right)}{\left(\begin{array}{l}2n{-}1\\n\end{array}\right)}\\&&{\times}\left\{\frac{2m{+}1}{m(m{+}1)}2pq(1{-}p)T_{m{-}1}^{1}(1{-}2p)\right.\ \\&&\left.\ {+}2D\left[\frac{T_{m}^{1}(1{-}2p)}{2(m{+}1){+}\mathrm{{\rho}}}{+}\frac{T_{m{-}2}^{1}(1{-}2p)}{2m{-}\mathrm{{\rho}}}\right]\right\}{\ }(m{\geq}2),\\&&E_{n}^{(1)}{=}{-}\frac{n{-}1}{n{+}1}3\left[pq(1{-}p){+}\frac{2D(1{-}2p)}{4{+}\mathrm{{\rho}}}\right].\end{eqnarray*}
We observe the limit,
\begin{eqnarray*}&&\mathrm{{\mu}}_{n,1,0}{\rightarrow}q\left[p{+}{{\sum}_{m{=}1}^{{\infty}}}({-}1)^{m}\frac{\left(\begin{array}{l}2n{-}1\\n{+}m\end{array}\right)}{\left(\begin{array}{l}2n{-}1\\n\end{array}\right)}\frac{2m{+}1}{m(m{+}1)}\right.\ \\&&\left.\ {\times}2p(1{-}p)T_{m{-}1}^{1}(1{-}2p)e^{{-}(m(m{+}1)/2)T}\right]\\&&{=}q\mathrm{{\mu}}_{n,0,0}{\ }(\mathrm{{\rho}}{\rightarrow}{\infty}),\end{eqnarray*}
which agrees with the limit theorem given by Ethier (1979).
We observe the limits of \(F^{(m)}_{n}\) and \(E^{(m)}_{n}\) as n → ∞,
\[{\mathrm{lim}_{n{\rightarrow}{\infty}}}F_{n}^{(m)}{=}\frac{4D(2m{+}1)({-}1)^{m}}{(2m{+}\mathrm{{\rho}})[2(m{+}1){-}\mathrm{{\rho}}]}T_{m{-}1}^{1}(1{-}2p)\]
and
\[{\mathrm{lim}_{n{\rightarrow}{\infty}}}E_{n}^{(m)}{=}\left(\begin{array}{l}2m{+}1\\m\end{array}\right)E_{m{+}1}^{(m)},\]
respectively.
By using these results for the moments, we arrive at the analytic expression of (6):
\begin{eqnarray*}&&E[X_{1}I_{(0,1)}(X)]{=}\frac{\mathrm{{\rho}}D}{2{+}\mathrm{{\rho}}}e^{{-}(1{+}(\mathrm{{\rho}}/2))T}{+}3\left[pq(1{-}p){+}\frac{2D(1{-}2p)}{4{+}\mathrm{{\rho}}}\right]e^{{-}T}\\&&{-}{{\sum}_{m{=}2}^{{\infty}}}({-}1)^{m}\left\{\frac{2m{+}1}{m(m{+}1)}2pq(1{-}p)T_{m{-}1}^{1}(1{-}2p)\right.\ \\&&\left.\ {+}2D\left[\frac{T_{m}^{1}(1{-}2p)}{2(m{+}1){+}\mathrm{{\rho}}}{+}\frac{T_{m{-}2}^{1}(1{-}2p)}{2m{-}\mathrm{{\rho}}}\right]\right\}e^{{-}(m(m{+}1)/2)T}\\&&{-}{{\sum}_{m{=}1}^{{\infty}}}\frac{4D(2m{+}1)({-}1)^{m}}{(2m{+}\mathrm{{\rho}})[2(m{+}1){-}\mathrm{{\rho}}]}T_{m{-}1}^{1}(1{-}2p)e^{{-}((\mathrm{{\rho}}{+}m(m{+}1))/2)T}.\end{eqnarray*}
We observe the limit,
\begin{eqnarray*}&&E[X_{1}I_{(0,1)}(X)]{\rightarrow}De^{{-}ct}{+}{{\sum}_{m{=}1}^{{\infty}}}\frac{q({-}1)^{m}}{2}[P_{m{+}1}(1{-}2p){-}P_{m{-}1}(1{-}2p)]\\&&{=}pq{+}De^{{-}ct}{\ }(N{\rightarrow}{\infty}),\end{eqnarray*}
which shows the deterministic behavior of the gamete frequency X1 without random genetic drift, as expected.
We observe the asymptotic form,
\[E[X_{1}I_{(0,1)}(X)]{\sim}3\left[pq(1{-}p){+}\frac{2D(1{-}2p)}{4{+}\mathrm{{\rho}}}\right]e^{{-}T}{\ }(T{\rightarrow}{\infty}).\]
The conditional expectation of the gamete frequency X1 given that locus A remains polymorphic is
\[E[X_{1}{\vert}X{\in}(0,{\,}1)]{=}\frac{E[X_{1}I_{(0,1)}(X)]}{P[X{\in}(0,{\,}1)]},\]
where the denominator is given by (1). The asymptotic value of the conditional expectation of the gamete frequency X1 is
\[E[X_{1}{\vert}X{\in}(0,{\,}1)]{\rightarrow}\frac{q}{2}{+}\frac{D(1{-}2p)}{p(1{-}p)(4{+}\mathrm{{\rho}})}{\ }(T{\rightarrow}{\infty}).\]
(14)
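To reproduce curves of the kind shown in Figure 2, the series for \(E[X_{1}I_{(0,1)}(X)]\) and series (1) can simply be truncated and divided. The sketch below is illustrative rather than the author's code: the truncation lengths and function names are assumptions, the truncation is adequate only for T bounded away from 0, and ρ must avoid the degenerate values ρ = 2 + 2ℓ. The Gegenbauer polynomials \(T^{1}_{m}=C^{3/2}_{m}\) are generated with their standard three-term recurrence.

```python
import math

# Illustrative sketch; function names and truncation lengths are hypothetical.

def legendre_values(n_max, z):
    """[P_0(z), ..., P_{n_max}(z)] from Bonnet's recurrence."""
    vals = [1.0, z]
    for n in range(1, n_max):
        vals.append(((2 * n + 1) * z * vals[n] - n * vals[n - 1]) / (n + 1))
    return vals

def gegenbauer_values(n_max, z):
    """[T^1_0(z), ..., T^1_{n_max}(z)], where T^1_m = C^{3/2}_m."""
    vals = [1.0, 3.0 * z]
    for m in range(2, n_max + 1):
        vals.append(((2 * m + 1) * z * vals[m - 1] - (m + 1) * vals[m - 2]) / m)
    return vals

def prob_polymorphic(p, T, n_terms=200):
    """Series (1): P[X in (0, 1)]."""
    P = legendre_values(2 * n_terms + 2, 1.0 - 2.0 * p)
    return sum((P[2 * m] - P[2 * m + 2]) * math.exp(-(2 * m + 1) * (m + 1) * T)
               for m in range(n_terms))

def x_times_indicator(p, T, n_terms=200):
    """One-locus series for E[X I_(0,1)(X)]."""
    P = legendre_values(n_terms + 2, 1.0 - 2.0 * p)
    return sum((-1) ** m / 2 * (P[m + 1] - P[m - 1]) * math.exp(-m * (m + 1) / 2 * T)
               for m in range(1, n_terms + 1))

def x1_times_indicator(p, q, D, rho, T, n_terms=60):
    """Series for E[X1 I_(0,1)(X)]; rho must not be an even integer >= 4."""
    z = 1.0 - 2.0 * p
    G = gegenbauer_values(n_terms + 1, z)
    a = p * q * (1.0 - p)
    total = rho * D / (2.0 + rho) * math.exp(-(1.0 + rho / 2.0) * T)
    total += 3.0 * (a + 2.0 * D * (1.0 - 2.0 * p) / (4.0 + rho)) * math.exp(-T)
    for m in range(2, n_terms + 1):  # terms decaying as exp(-m(m+1)T/2)
        brace = (2 * m + 1) / (m * (m + 1)) * 2.0 * a * G[m - 1]
        brace += 2.0 * D * (G[m] / (2 * (m + 1) + rho) + G[m - 2] / (2 * m - rho))
        total -= (-1) ** m * brace * math.exp(-m * (m + 1) / 2.0 * T)
    for m in range(1, n_terms + 1):  # terms decaying as exp(-(rho + m(m+1))T/2)
        coef = 4.0 * D * (2 * m + 1) * (-1) ** m / ((2 * m + rho) * (2 * (m + 1) - rho))
        total -= coef * G[m - 1] * math.exp(-(rho + m * (m + 1)) / 2.0 * T)
    return total

def conditional_x1(p, q, D, rho, T):
    """E[X1 | X in (0, 1)]: the series above divided by (1)."""
    return x1_times_indicator(p, q, D, rho, T) / prob_polymorphic(p, T)

def asymptote(p, q, D, rho):
    """Limit (14) as T -> infinity."""
    return q / 2.0 + D * (1.0 - 2.0 * p) / (p * (1.0 - p) * (4.0 + rho))

p, q = 0.05, 0.2
for T in (0.5, 1.0, 2.0, 4.0):
    print(T, conditional_x1(p, q, p * (1 - q), 1.5, T))
```

Two checks are built into this example: for large T the ratio approaches (14), and for very large ρ the numerator approaches q times the one-locus series for \(E[XI_{(0,1)}(X)]\), in line with the ρ → ∞ limit noted above. In this sketch, with p = 0.05, q = 0.2, and ρ = 1.5, the value at T = 1 lies above (14).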

In contrast to the deterministic model without random genetic drift, the asymptotic value exceeds pq, the limit of the deterministic model (for p < 1/2 and D > 0, as in the present model), and depends on ρ. Note that the second term of the asymptotic value in (14) represents the conditional covariance between the frequencies of the alleles A1 and B1. Figure 2 illustrates how the conditional expectation of the gamete frequency X1 changes when linkage disequilibrium is introduced into a population with p = 1/2N = 0.05 (i.e., N = 10) and q = 0.2. After 4N generations (T = 2.0), the conditional expectation almost reaches its asymptotic value for large ρ, whereas 4N generations are still not enough for small ρ. Moreover, for small ρ the conditional expectation is not monotonic: it increases rapidly and then decreases to the asymptotic value. For comparison, the counterpart in the deterministic model is illustrated in Figure 3.

Figure 2.— The conditional expectation of the gamete frequency X1 given that locus A remains polymorphic. p = 0.05 and q = 0.2.

Figure 3.— The gamete frequency X1 in the deterministic model without random genetic drift. p = 0.05 and q = 0.2.

To examine the frequency of allele B1 among the chromosomes bearing allele A1, let us consider the ratio of the conditional expectation of the gamete frequency X1 to that of the allele frequency X,
\[\frac{E[X_{1}{\vert}X{\in}(0,{\,}1)]}{E[X{\vert}X{\in}(0,{\,}1)]},\]
where the denominator is given by (2). The asymptotic value is
\[\frac{E[X_{1}{\vert}X{\in}(0,{\,}1)]}{E[X{\vert}X{\in}(0,{\,}1)]}{\rightarrow}q{+}\frac{2D(1{-}2p)}{p(1{-}p)(4{+}\mathrm{{\rho}})}{\ }(T{\rightarrow}{\infty}).\]
(15)
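The asymptotic values (14) and (15) are simple enough to tabulate directly. In the single-mutation model D = p(1 − q), so D/[p(1 − p)] = (1 − q)/(1 − p), and the excess of (15) over q becomes 2(1 − q)(1 − 2p)/[(1 − p)(4 + ρ)]. A minimal sketch with hypothetical function names:

```python
# Illustrative sketch; function names are hypothetical.

def asymptotic_conditional_x1(p, q, D, rho):
    """Equation (14): limit of E[X1 | X in (0, 1)] as T -> infinity."""
    return q / 2 + D * (1 - 2 * p) / (p * (1 - p) * (4 + rho))

def asymptotic_ratio(p, q, D, rho):
    """Equation (15): limit of E[X1 | X in (0, 1)] / E[X | X in (0, 1)]."""
    return q + 2 * D * (1 - 2 * p) / (p * (1 - p) * (4 + rho))

def asymptotic_ratio_single_mutation(p, q, rho):
    """(15) with D = p(1 - q) substituted."""
    return q + 2 * (1 - q) * (1 - 2 * p) / ((1 - p) * (4 + rho))

p, q = 0.05, 0.2
for rho in (0.5, 1.0, 5.0, 50.0):
    print(rho, asymptotic_ratio(p, q, p * (1 - q), rho))
```

For p = 0.05 and q = 0.2, for instance, the limit of the ratio is about 0.50 at ρ = 1, compared with the deterministic limit q = 0.2.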

In contrast to the deterministic model without random genetic drift, the asymptotic value exceeds q, the limit of the deterministic model (again for p < 1/2 and D > 0), and depends on ρ. Figure 4 illustrates how this ratio changes when linkage disequilibrium is introduced into a population with p = 1/2N = 0.05 and q = 0.2. After 4N generations (T = 2.0), the ratio almost reaches its asymptotic value for large ρ, whereas 4N generations are still not enough for small ρ. For comparison, the counterpart in the deterministic model is illustrated in Figure 5. The discrepancy between our model and the deterministic model is large for small ρ.

Figure 4.— The ratio of the conditional expectation of the gamete frequency X1, given that locus A remains polymorphic, to that of the allele frequency X. p = 0.05 and q = 0.2.

Figure 5.— The ratio of the gamete frequency X1 to the allele frequency X in the deterministic model without random genetic drift. p = 0.05 and q = 0.2.

DISCUSSION

The analytic expression for the conditional expectation of the transient gamete frequency, given that one of the two loci remains polymorphic, was obtained for the diffusion process by calculating the moments of the distribution. This expression is general: it does not depend on the model by which linkage disequilibrium is introduced into the population.

We considered a model in which linkage disequilibrium is introduced by a single mutation, creating an association between the mutant allele A1 and the allele B1 carried at the other locus of the chromosome on which the mutation occurred. Because allele A1 is prone to be lost by random genetic drift, the conditional expectation of the frequency of the gamete A1B1 given that locus A remains polymorphic is meaningful. Its behavior differs markedly from the monotonic decrease in the deterministic model without random genetic drift. After 4N generations, the conditional expectation of the gamete frequency almost reaches its asymptotic value for large ρ, whereas 4N generations are still not enough for small ρ. The asymptotic value is larger than the product of the initial allele frequencies, to which the deterministic model tends, and it depends on the recombination fraction between the two loci. Note that the conditional expectation of the linkage disequilibrium coefficient vanishes asymptotically, as in the deterministic model. This observation illustrates the simple fact that the linkage disequilibrium coefficient alone is not sufficient to characterize the two-locus problem uniquely.

APPENDIX A

Since the Gegenbauer polynomials are orthogonal on the interval [−1, 1], the terms on the right-hand side of (7) can be represented in terms of Gegenbauer polynomials of degree up to n − 1 as
\[p^{n{-}1}D{=}{{\sum}_{m{=}1}^{n}}\frac{\left(\begin{array}{l}2n{+}1\\n{-}m\end{array}\right)\left(\begin{array}{l}2m{-}1\\m\end{array}\right)}{\left(\begin{array}{l}2n{-}1\\n\end{array}\right)}\frac{m(2m{+}1)}{n(2n{+}1)}DC_{m}T_{m{-}1}^{1}(z),\]
where we set z = 1 − 2p. Multiplying both sides of the equation by \((1{-}z^{2})T_{m{-}1}^{1}(z)\), integrating over [−1, 1], and using the orthogonality relation (13), we have
\[C_{m}{=}\frac{({-}1)^{m{+}1}2^{{-}n}\left(\begin{array}{l}2n{-}1\\n\end{array}\right)}{\left(\begin{array}{l}2m{-}1\\m\end{array}\right)\left(\begin{array}{l}2n{+}1\\n{-}m\end{array}\right)}\frac{n(2n{+}1)}{m^{2}(m{+}1)}{{\int}_{{-}1}^{1}}(1{-}z)(1{+}z)^{n}T_{m{-}1}^{1}(z)dz.\]
The integral transform of \((1{-}z)(1{+}z)^{n}\) by the Gegenbauer polynomial is
\[{{\int}_{{-}1}^{1}}(1{-}z)(1{+}z)^{n}T_{m{-}1}^{1}(z)dz{=}\frac{\left(\begin{array}{l}2n{+}1\\n{-}m\end{array}\right)}{\left(\begin{array}{l}2n{-}1\\n\end{array}\right)}\frac{2^{n}m(m{+}1)}{n(2n{+}1)}\]
(Erdélyi 1954). Thus, we have
\[C_{m}{=}\frac{({-}1)^{m{+}1}}{m\left(\begin{array}{l}2m{-}1\\m\end{array}\right)}.\]
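The two integrals used in this appendix can be verified exactly with polynomial arithmetic over the rationals. The sketch below (hypothetical helper names) builds \(T^{1}_{m}\) as a coefficient list, checks the orthogonality relation (13) and the integral transform quoted from Erdélyi (1954) for small n and m, and confirms that substituting the transform into the projection formula gives the stated value of \(C_{m}\).

```python
from fractions import Fraction
from math import comb

# Illustrative sketch; function names are hypothetical.

def poly_mul(a, b):
    """Product of two polynomials stored as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def integrate_pm1(a):
    """Exact integral over [-1, 1]: odd powers vanish, z^k gives 2/(k + 1) for even k."""
    return sum((c * Fraction(2, k + 1) for k, c in enumerate(a) if k % 2 == 0), Fraction(0))

def t1_poly(m):
    """Coefficients of T^1_m(z) = C^{3/2}_m(z): m T_m = (2m + 1) z T_{m-1} - (m + 1) T_{m-2}."""
    prev, cur = [Fraction(1)], [Fraction(0), Fraction(3)]
    if m == 0:
        return prev
    for k in range(2, m + 1):
        nxt = [Fraction(0)] * (k + 1)
        for i, c in enumerate(cur):
            nxt[i + 1] += (2 * k + 1) * c
        for i, c in enumerate(prev):
            nxt[i] -= (k + 1) * c
        prev, cur = cur, [c / k for c in nxt]
    return cur

WEIGHT = [Fraction(1), Fraction(0), Fraction(-1)]  # 1 - z^2

def orthogonality(k, l):
    """Left-hand side of (13)."""
    return integrate_pm1(poly_mul(poly_mul(WEIGHT, t1_poly(k - 1)), t1_poly(l - 1)))

def transform(n, m):
    """Integral over [-1, 1] of (1 - z)(1 + z)^n T^1_{m-1}(z)."""
    f = [Fraction(1), Fraction(-1)]
    for _ in range(n):
        f = poly_mul(f, [Fraction(1), Fraction(1)])
    return integrate_pm1(poly_mul(f, t1_poly(m - 1)))

def c_m_from_projection(n, m):
    """The projection formula for C_m, evaluated with the exact transform."""
    pref = Fraction((-1) ** (m + 1) * comb(2 * n - 1, n),
                    2 ** n * comb(2 * m - 1, m) * comb(2 * n + 1, n - m))
    return pref * Fraction(n * (2 * n + 1), m * m * (m + 1)) * transform(n, m)

print(transform(3, 3), c_m_from_projection(5, 2))
```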

APPENDIX B

It is straightforward to check the identity (11) for m = n. For 1 ≤ m ≤ n − 1, the finite series can be expressed in terms of truncated hypergeometric series,
\begin{eqnarray*}&&{{\sum}_{k{=}1}^{n{-}m{+}1}}\frac{n!(n{-}1)!(k{+}m{-}1)}{(k{-}1)!(k{+}2m)!}\frac{{\Gamma}(k{+}m{-}(3/2){+}\mathrm{{\sigma}}){\Gamma}(k{+}m{-}(3/2){-}\mathrm{{\sigma}})}{{\Gamma}(n{+}(1/2){+}\mathrm{{\sigma}}){\Gamma}(n{+}(1/2){-}\mathrm{{\sigma}})}{=}\frac{n!(n{-}1)!}{{\Gamma}(n{+}(1/2){+}\mathrm{{\sigma}}){\Gamma}(n{+}(1/2){-}\mathrm{{\sigma}})}\\&&{\times}\left[\frac{m{\Gamma}(m{-}(1/2){+}\mathrm{{\sigma}}){\Gamma}(m{-}(1/2){-}\mathrm{{\sigma}})}{(2m{+}1)!}\right.\ \\&&\left.\ {\times}y_{n{-}m}\left(m{-}\frac{1}{2}{+}\mathrm{{\sigma}},{\,}m{-}\frac{1}{2}{-}\mathrm{{\sigma}},{\,}2m{+}2,{\,}1\right)\right.\ \\&&\left.\ {+}\frac{{\Gamma}(m{+}(1/2){+}\mathrm{{\sigma}}){\Gamma}(m{+}(1/2){-}\mathrm{{\sigma}})}{(2m{+}2)!}\right.\ \\&&\left.\ {\times}y_{n{-}m{-}1}\left(m{+}\frac{1}{2}{+}\mathrm{{\sigma}},{\,}m{+}\frac{1}{2}{-}\mathrm{{\sigma}},{\,}2m{+}3,{\,}1\right)\right],\end{eqnarray*}
where \(y_{n}(a,b,c,z)\) denotes the truncated hypergeometric series. The truncated hypergeometric series can be expressed in terms of the generalized hypergeometric series,
\[y_{i}(a,{\,}b,{\,}c,{\,}1){=}\frac{{\Gamma}(a{+}i{+}1){\Gamma}(b{+}i{+}1)}{i!{\Gamma}(a{+}b{+}i{+}1)}_{3}F_{2}\left(\begin{array}{l}a,{\,}b,{\,}c{+}i;{\,}1\\c,{\,}a{+}b{+}i{+}1\end{array}\right)\]
(Erdélyi 1953), where
\[_{3}F_{2}\left(\begin{array}{l}a,{\,}b,{\,}c;{\,}z\\d,{\,}e\end{array}\right)\]
is the generalized hypergeometric series. Thus, we have an identity for the truncated hypergeometric series:
\begin{eqnarray*}&&y_{i}(a,{\,}b,{\,}a{+}b{+}j,{\,}1){=}\frac{{\Gamma}(a{+}i{+}1){\Gamma}(b{+}i{+}1)}{i!{\Gamma}(a{+}b{+}i{+}1)}_{3}F_{2}\left(\begin{array}{l}a,{\,}b,{\,}a{+}b{+}i{+}j;{\,}1\\a{+}b{+}j,{\,}a{+}b{+}i{+}1\end{array}\right)\\&&{=}\frac{{\Gamma}(a{+}i{+}1){\Gamma}(b{+}i{+}1)}{i!{\Gamma}(a{+}b{+}i{+}1)}_{3}F_{2}\left(\begin{array}{l}a,{\,}b,{\,}a{+}b{+}i{+}j;{\,}1\\a{+}b{+}i{+}1,{\,}a{+}b{+}j\end{array}\right)\\&&{=}\frac{{\Gamma}(a{+}i{+}1){\Gamma}(b{+}i{+}1)}{i!{\Gamma}(a{+}b{+}i{+}1)}\frac{(j{-}1)!{\Gamma}(a{+}b{+}j)}{{\Gamma}(a{+}j){\Gamma}(b{+}j)}\\&&{\times}y_{j{-}1}(a,{\,}b,{\,}a{+}b{+}i{+}1,{\,}1).\end{eqnarray*}
By using the identity for the truncated hypergeometric series, we obtain
\begin{eqnarray*}&&{{\sum}_{k{=}1}^{n{-}m{+}1}}\frac{n!(n{-}1)!(k{+}m{-}1)}{(k{-}1)!(k{+}2m)!}\frac{{\Gamma}(k{+}m{-}(3/2){+}\mathrm{{\sigma}}){\Gamma}(k{+}m{-}(3/2){-}\mathrm{{\sigma}})}{{\Gamma}(n{+}(1/2){+}\mathrm{{\sigma}}){\Gamma}(n{+}(1/2){-}\mathrm{{\sigma}})}\\&&{=}\frac{n!(n{-}1)!{\Gamma}(m{-}(1/2){+}\mathrm{{\sigma}}){\Gamma}(m{-}(1/2){-}\mathrm{{\sigma}})}{(n{+}m{-}1)!(n{-}m)!{\Gamma}(m{+}(5/2){+}\mathrm{{\sigma}}){\Gamma}(m{+}(5/2){-}\mathrm{{\sigma}})}\\&&{\times}\left[2my_{2}\left(m{-}\frac{1}{2}{+}\mathrm{{\sigma}},{\,}m{-}\frac{1}{2}{-}\mathrm{{\sigma}},{\,}n{+}m,{\,}1\right)\right.\ \\&&\left.\ {+}\frac{(n{-}m)(m{-}(1/2){+}\mathrm{{\sigma}})(m{-}(1/2){-}\mathrm{{\sigma}})}{n{+}m}y_{1}\left(m{+}\frac{1}{2}{+}\mathrm{{\sigma}},{\,}m{+}\frac{1}{2}{-}\mathrm{{\sigma}},{\,}n{+}m{+}1,{\,}1\right)\right]\\&&{=}\frac{n!(n{-}1)!}{(n{+}m{-}1)!(n{-}m)!}\frac{{-}1}{2(2m{+}1)}\left[\frac{1}{2m{+}\mathrm{{\rho}}}{+}\frac{1}{2(m{+}1){-}\mathrm{{\rho}}}\frac{(n{-}m)(n{-}m{-}1)}{(n{+}m)(n{+}m{+}1)}\right].\end{eqnarray*}
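Identity (11) can also be spot-checked numerically. The sketch below evaluates both sides in double precision with `math.gamma` for a few (n, m) pairs and an arbitrary non-degenerate ρ (ρ = 3.3 is an illustrative choice for which no gamma argument is a non-positive integer). It is a sanity check under these assumptions, not a substitute for the hypergeometric argument above; the function names are hypothetical.

```python
import math

# Illustrative sketch; function names are hypothetical.

def lhs_identity(n, m, rho):
    """Left side of (11), summed term by term with gamma functions."""
    s = math.sqrt(0.25 + m * (m + 1) + rho)   # sigma
    denom = math.gamma(n + 0.5 + s) * math.gamma(n + 0.5 - s)
    total = 0.0
    for k in range(1, n - m + 2):
        num = math.factorial(n) * math.factorial(n - 1) * (k + m - 1)
        num /= math.factorial(k - 1) * math.factorial(k + 2 * m)
        total += num * math.gamma(k + m - 1.5 + s) * math.gamma(k + m - 1.5 - s) / denom
    return total

def rhs_identity(n, m, rho):
    """Right side of (11)."""
    pref = math.factorial(n) * math.factorial(n - 1) / (
        math.factorial(n + m - 1) * math.factorial(n - m))
    bracket = 1 / (2 * m + rho) + (n - m) * (n - m - 1) / (
        (n + m) * (n + m + 1) * (2 * (m + 1) - rho))
    return -pref * bracket / (2 * (2 * m + 1))

print(lhs_identity(6, 2, 3.3), rhs_identity(6, 2, 3.3))
```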

Footnotes

Communicating editor: M. Feldman

Acknowledgement

I acknowledge the continuous encouragement offered by T. Gojobori. Also, I thank M. Notohara, A. Simizu, and two anonymous reviewers for comments on an earlier version of this manuscript.

References

Erdélyi, A. (Editor), 1953 Higher Transcendental Functions, Vol. I. McGraw-Hill, New York.

Erdélyi, A. (Editor), 1954 Tables of Integral Transforms, Vol. II. McGraw-Hill, New York.

Ethier, S. N., 1979 A limit theorem for two-locus diffusion models in population genetics. J. Appl. Probab. 16: 402–408.

Ethier, S. N., and R. C. Griffiths, 1990 On the two-locus sampling distribution. J. Math. Biol. 29: 131–159.

Ethier, S. N., and T. Nagylaki, 1989 Diffusion approximation of the two-locus Wright-Fisher model. J. Math. Biol. 27: 17–28.

Golding, G. B., 1984 The sampling distribution of linkage disequilibrium. Genetics 108: 257–274.

Griffiths, R. C., 1981 Neutral two-locus multiple allele model with recombination. Theor. Popul. Biol. 19: 169–186.

Hill, W. G., and A. Robertson, 1968 Linkage disequilibrium in finite populations. Theor. Appl. Genet. 38: 226–231.

Hudson, R. R., 1983 Property of a neutral allele model with intragenic recombination. Theor. Popul. Biol. 23: 183–201.

Hudson, R. R., 1985 The sampling distribution of linkage disequilibrium under an infinite allele model without selection. Genetics 109: 611–631.

Kaplan, N. L., and B. S. Weir, 1992 Expected behavior of conditional linkage disequilibrium. Am. J. Hum. Genet. 51: 333–343.

Karlin, S., and J. McGregor, 1968 Rates and probabilities of fixation for two locus random mating finite population without selection. Genetics 58: 141–159.

Kimura, M., 1955a Solution of a process of random genetic drift with a continuous model. Proc. Natl. Acad. Sci. USA 41: 144–150.

Kimura, M., 1955b Random genetic drift in multi-allelic locus. Evolution 9: 419–435.

Littler, R. A., 1973 Linkage disequilibrium in two-locus, finite, random mating models without selection or mutation. Theor. Popul. Biol. 4: 259–275.

Nei, M., and W-H. Li, 1980 Non-random association between electromorphs and inversion chromosomes in finite populations. Genet. Res. 35: 65–83.

Ohta, T., 1968 Effect of initial linkage disequilibrium and epistasis on fixation probability in a small population, with two segregating loci. Theor. Appl. Genet. 38: 243–248.

Ohta, T., and M. Kimura, 1969a Linkage disequilibrium due to random genetic drift. Genet. Res. 13: 47–55.

Ohta, T., and M. Kimura, 1969b Linkage disequilibrium at steady state determined by random genetic drift and recurrent mutation. Genetics 63: 229–238.

Ohta, T., and M. Kimura, 1970 Linkage disequilibrium between two segregating nucleotide sites under the steady flux of mutations in a finite population. Genetics 68: 571–580.

Wright, S., 1931 Evolution in Mendelian populations. Genetics 16: 97–159.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)