## Abstract

An analytic expression for the conditional expectation of the transient gamete frequency, given that one of the two loci remains polymorphic, is obtained for the diffusion process by calculating the moments of the distribution. Using this expression, a model in which linkage disequilibrium is introduced by a single mutation is considered. The conditional expectation of the gamete frequency given that the locus with the mutant allele remains polymorphic is presented. Its behavior differs significantly from the monotonic decrease observed in the deterministic model without random genetic drift.

With respect to random genetic drift in the one-locus problem, the state of steady decay was first obtained correctly by Wright (1931). However, that study assumed that the state of steady decay had already been attained. By calculating the moments of the distribution, Kimura (1955a) obtained the complete expression for the transient probability density of the unfixed class, which shows how the process approaches the state of steady decay. It was found that after 2*N* generations the distribution becomes almost flat, where *N* is the effective population size.

Since each mutant ultimately becomes either fixed or lost, a steady state is attained only if linear evolutionary pressures, such as mutation, operate. For the two-locus problem, the steady state has been discussed in terms of the diffusion process (Ohta and Kimura 1969b, 1970; Ethier and Nagylaki 1989) and the genealogical process (Griffiths 1981; Hudson 1983, 1985; Golding 1984; Ethier and Griffiths 1990). In contrast, for situations without linear evolutionary pressures, how the process eventually approaches the state of steady decay has not been studied, except for several functions that vanish at the absorbing boundaries (Hill and Robertson 1968; Ohta and Kimura 1969a; Littler 1973). Although the two-locus problem is uniquely characterized by the gamete frequencies, their transient behavior has not been examined.

In this article, I derive an analytic expression for the conditional expectation of the transient gamete frequency, given that one of the two loci remains polymorphic, for the diffusion process. This expression shows how the conditional expectation approaches its asymptotic value. Using this expression, a model in which linkage disequilibrium is introduced by a single mutation is discussed.

## THE DIFFUSION PROCESS OF THE TWO-LOCUS PROBLEM

Consider a random-mating population with an effective population size of *N*. We measure time *T* in units of 2*N* generations. Let *A*_{1} and *A*_{2} be a pair of alleles whose initial frequencies are *p* and 1 − *p*, respectively, and whose frequencies at time *T* are *X* and 1 − *X*, respectively. In what follows, lowercase letters denote the values taken by the corresponding random variables. Let ϕ(*p*, *x*; *T*) be the probability density of the unfixed class, for which Kimura (1955a) obtained an analytic expression. Kimura also gave the probability that the locus remains polymorphic,

$$\Pr\{0 < X < 1\} = \int_{0}^{1}\phi(p, x; T)\,dx = \sum_{m=1}^{\infty}\left[P_{2m-2}(z) - P_{2m}(z)\right]e^{-m(2m-1)T}, \qquad z = 1 - 2p, \tag{1}$$

where *P*_{m}(*z*) represents the Legendre polynomial of degree *m*.
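As a numerical sanity check, the series $\Pr\{0<X<1\} = \sum_{m\ge 1}[P_{2m-2}(z) - P_{2m}(z)]e^{-m(2m-1)T}$ with $z = 1 - 2p$ (obtained by integrating Kimura's eigenfunction expansion of ϕ term by term) can be evaluated with Legendre polynomials generated by Bonnet's recurrence. The sketch below is illustrative only; the function names are hypothetical. It checks that the probability is essentially 1 at very small *T* and that for large *T* it approaches Kimura's leading term $6p(1-p)e^{-T}$.

```python
import math


def legendre_values(z, n_max):
    """Return [P_0(z), ..., P_{n_max}(z)] using Bonnet's recurrence."""
    values = [1.0, z]
    for n in range(1, n_max):
        # (n + 1) P_{n+1}(z) = (2n + 1) z P_n(z) - n P_{n-1}(z)
        values.append(((2 * n + 1) * z * values[n] - n * values[n - 1]) / (n + 1))
    return values[: n_max + 1]


def prob_polymorphic(p, T, terms=200):
    """Pr{0 < X < 1} at time T (units of 2N generations), starting from X = p.

    Sum over m >= 1 of [P_{2m-2}(z) - P_{2m}(z)] * exp(-m (2m - 1) T), z = 1 - 2p.
    Converges quickly for T > 0; at T = 0 the partial sums converge only slowly.
    """
    z = 1.0 - 2.0 * p
    P = legendre_values(z, 2 * terms)
    return sum(
        (P[2 * m - 2] - P[2 * m]) * math.exp(-m * (2 * m - 1) * T)
        for m in range(1, terms + 1)
    )


p = 0.3
print(prob_polymorphic(p, 0.01))  # close to 1: almost nothing is fixed yet
print(prob_polymorphic(p, 10.0) / (6 * p * (1 - p) * math.exp(-10.0)))  # close to 1
```

The truncation at 200 terms is a convenience choice; for moderate *T* the exponential factors make the tail negligible.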

Next, we discuss the expectation of the allele frequency. In general, since a polymorphism that has been lost cannot be observed, a polymorphism can be observed only in the unfixed class. Thus, the obvious relation *E*[*X*] = *p* is not informative from the standpoint of observation. We are interested in the conditional expectation of the frequency given that the polymorphism is retained. By using the expression for the transient fixation probability given by Kimura (1955a), we obtain the conditional expectation of the allele frequency for the unfixed class,

$$E[X \mid 0 < X < 1] = \frac{E[X\,I_{(0,1)}(X)]}{E[I_{(0,1)}(X)]}, \tag{2}$$

where

$$E[X\,I_{(0,1)}(X)] = p - f(1;T),$$

*I*_{(0,1)}(*X*) represents the indicator function of the open interval (0, 1), and *f*(1; *T*) represents the transient fixation probability of the allele *A*_{1}. The denominator is the probability (1). The asymptotic value of the conditional expectation of the allele frequency is

$$\lim_{T\to\infty} E[X \mid 0 < X < 1] = \frac{1}{2},$$

which agrees with the fact that the conditional distribution becomes uniform asymptotically.
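The conditional mean (2) can be evaluated from the same eigenfunction expansion. Integrating Kimura's density term by term (our own short derivation, stated here as an assumption to be checked) gives $p - f(1;T) = \tfrac{1}{2}\sum_{i\ge 1}(-1)^{i+1}[P_{i-1}(z) - P_{i+1}(z)]e^{-i(i+1)T/2}$ and $\Pr\{0<X<1\} = \sum_{i\,\text{odd}}[P_{i-1}(z) - P_{i+1}(z)]e^{-i(i+1)T/2}$, with $z = 1 - 2p$. The sketch below, with hypothetical function names, evaluates their ratio and shows it moving from *p* toward 1/2.

```python
import math


def legendre_values(z, n_max):
    """Return [P_0(z), ..., P_{n_max}(z)] using Bonnet's recurrence."""
    values = [1.0, z]
    for n in range(1, n_max):
        values.append(((2 * n + 1) * z * values[n] - n * values[n - 1]) / (n + 1))
    return values[: n_max + 1]


def conditional_mean_allele(p, T, terms=400):
    """E[X | 0 < X < 1] = (p - f(1; T)) / Pr{0 < X < 1}, T in units of 2N generations."""
    z = 1.0 - 2.0 * p
    P = legendre_values(z, terms + 1)
    unfixed = 0.0    # Pr{0 < X < 1}: only odd i contribute
    mean_part = 0.0  # E[X I_(0,1)(X)] = p - f(1; T)
    for i in range(1, terms + 1):
        decay = math.exp(-i * (i + 1) * T / 2.0)
        diff = P[i - 1] - P[i + 1]
        if i % 2 == 1:
            unfixed += diff * decay
        mean_part += 0.5 * (-1) ** (i + 1) * diff * decay
    return mean_part / unfixed


for T in (0.01, 0.5, 1.0, 2.0, 5.0):
    print(T, round(conditional_mean_allele(0.05, T), 4))
```

For *p* = 1/2 the even-indexed terms vanish and the ratio is exactly 1/2 at every *T*, as symmetry requires.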

Let us consider two loci, *A* and *B*, at which the pairs of alleles *A*_{1}, *A*_{2} and *B*_{1}, *B*_{2} are segregating. Let the initial frequencies of the gametes *A*_{1}*B*_{1}, *A*_{1}*B*_{2}, *A*_{2}*B*_{1}, and *A*_{2}*B*_{2} be *g*_{1}, *g*_{2}, *g*_{3}, and 1 − (*g*_{1} + *g*_{2} + *g*_{3}), respectively, and let their frequencies at time *T* be *X*_{1}, *X*_{2}, *X*_{3}, and 1 − (*X*_{1} + *X*_{2} + *X*_{3}), respectively. Let the initial frequencies of alleles *B*_{1} and *B*_{2} be *q* and 1 − *q*, and let their frequencies at time *T* be *Y* and 1 − *Y*, respectively. Let *D* = *g*_{1}(1 − *g*_{1} − *g*_{2} − *g*_{3}) − *g*_{2}*g*_{3} be the initial value of the linkage disequilibrium coefficient and *Z* = *X*_{1}(1 − *X*_{1} − *X*_{2} − *X*_{3}) − *X*_{2}*X*_{3} be its value at time *T*. We have

$$p = g_1 + g_2, \qquad q = g_1 + g_3, \qquad X = X_1 + X_2, \qquad Y = X_1 + X_3,$$

so that *D* = *g*_{1} − *pq* and *Z* = *X*_{1} − *XY*.

Let *c* be the recombination fraction between the two loci, and set ρ = 4*Nc*. We do not discuss the case *c* = 0, since the problem then reduces to the multiallelic one-locus problem discussed previously by Kimura (1955b). For the deterministic model without random genetic drift, we have *x* = *p*, *y* = *q*, and *z* = *De*^{−ct}, where *t* = 2*NT* is time in generations, so that *ct* = ρ*T*/2.
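The time scaling in the deterministic formula can be checked directly. Without drift, the linkage disequilibrium obeys the exact recursion $z_{t+1} = (1-c)z_t$, so $z_t = D(1-c)^t \approx De^{-ct}$; with $t = 2NT$ and $\rho = 4Nc$, this is $De^{-\rho T/2}$. The sketch below (hypothetical helper names, illustrative parameter values) compares the two forms.

```python
import math


def ld_discrete(D, c, generations):
    """Deterministic recursion z_{t+1} = (1 - c) z_t, iterated `generations` times."""
    z = D
    for _ in range(generations):
        z *= 1.0 - c
    return z


def ld_diffusion_scale(D, rho, T):
    """Continuous approximation z = D exp(-c t), written as D exp(-rho T / 2)."""
    return D * math.exp(-rho * T / 2.0)


N, c, D = 100, 0.005, 0.04   # rho = 4 N c = 2
rho, T = 4 * N * c, 1.0      # T = 1 corresponds to t = 2N = 200 generations
print(ld_discrete(D, c, 2 * N), ld_diffusion_scale(D, rho, T))
```

The two values differ by well under 1% here; the agreement improves as *c* becomes small.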

The probability density for the gamete frequencies, ϕ(*g*_{1}, *g*_{2}, *g*_{3}; *x*_{1}, *x*_{2}, *x*_{3}; *T*), satisfies the Kolmogorov backward equation

$$\frac{\partial \phi}{\partial T} = \frac{1}{2}\sum_{i=1}^{3}\sum_{j=1}^{3} g_i(\delta_{ij} - g_j)\,\frac{\partial^2 \phi}{\partial g_i\,\partial g_j} - \frac{\rho}{2}D\left(\frac{\partial \phi}{\partial g_1} - \frac{\partial \phi}{\partial g_2} - \frac{\partial \phi}{\partial g_3}\right) \tag{3}$$

(Ohta and Kimura 1969a), where δ_{ij} represents Kronecker's delta. Although the probability density itself is not known explicitly, Ohta and Kimura (1969a) obtained the expectations of several functions (4) that vanish at the absorbing boundaries, which were also discussed by Hill and Robertson (1968). The process is defined in the tetrahedron 0 ≤ *x*_{1} ≤ *x*_{1} + *x*_{2} ≤ *x*_{1} + *x*_{2} + *x*_{3} ≤ 1. By changing variables from the gamete frequencies to *x*, *y*, and *z*, the region is transformed into a three-dimensional region, the upper surface of whose boundary is depicted in Figure 1. On the peripheral edges, which form the periphery of the square 0 ≤ *x* ≤ 1, 0 ≤ *y* ≤ 1, one of the two loci is monomorphic. At the points (1, 1, 0), (1, 0, 0), (0, 1, 0), and (0, 0, 0), the gamete *A*_{1}*B*_{1}, *A*_{1}*B*_{2}, *A*_{2}*B*_{1}, or *A*_{2}*B*_{2}, respectively, is fixed. We refer to the interior of the region, where all four gametes are present, as the inside of the region. The expectation of the linkage disequilibrium coefficient is

$$E[Z] = D\,e^{-(1+\rho/2)T}$$

(Hill and Robertson 1968), and Ohta and Kimura (1969a) gave the value to which the squared standard linkage deviation tends when ρ is large.
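The decay $E[Z] = De^{-(1+\rho/2)T}$ can also be seen in a small simulation. This is our own illustration, not the article's method: it assumes a discrete two-locus Wright–Fisher model in which, each generation, recombination shifts the gamete frequencies by ∓*cZ* and 2*N* gametes are then drawn multinomially (implemented as sequential binomial draws with NumPy). Under this assumed model the exact per-generation factor for *E*[*Z*] is (1 − *c*)(1 − 1/(2*N*)), which approaches the diffusion value when *t* = 2*NT*. Function and constant names are hypothetical.

```python
import numpy as np

# Direction in which recombination moves (A1B1, A1B2, A2B1, A2B2) per unit c*Z.
RECOMB_SHIFT = np.array([-1.0, 1.0, 1.0, -1.0])


def mean_ld_wright_fisher(g, N, c, generations, replicates, seed=1):
    """Monte Carlo estimate of E[Z] in a two-locus Wright-Fisher model (a sketch)."""
    rng = np.random.default_rng(seed)
    n = 2 * N
    x = np.tile(np.asarray(g, dtype=float), (replicates, 1))
    for _ in range(generations):
        z = x[:, 0] * x[:, 3] - x[:, 1] * x[:, 2]
        expected = x + c * z[:, None] * RECOMB_SHIFT  # gamete pool after recombination
        counts = np.zeros_like(expected)
        remaining = np.full(replicates, n)
        mass_left = np.ones(replicates)
        for k in range(3):  # sequential binomials give one multinomial draw per row
            safe = np.maximum(mass_left, 1e-300)
            prob = np.clip(np.where(mass_left > 0, expected[:, k] / safe, 0.0), 0.0, 1.0)
            draw = rng.binomial(remaining, prob)
            counts[:, k] = draw
            remaining -= draw
            mass_left -= expected[:, k]
        counts[:, 3] = remaining
        x = counts / n
    z = x[:, 0] * x[:, 3] - x[:, 1] * x[:, 2]
    return z.mean()


N, c, generations = 50, 0.01, 50   # rho = 4Nc = 2, T = generations / 2N = 0.5
g = (0.5, 0.0, 0.0, 0.5)           # D = 0.25
D, rho, T = 0.25, 4 * N * c, generations / (2 * N)
simulated = mean_ld_wright_fisher(g, N, c, generations, replicates=20000)
print(simulated, D * np.exp(-(1 + rho / 2) * T))
```

The simulated mean should agree with the diffusion value up to Monte Carlo error of roughly 10^{−3} with these settings.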

Next, we discuss the expectation of the gamete frequencies. In the same manner as for the functions (4) and the linkage disequilibrium coefficient, we obtain the expectation of the gamete frequency *X*_{1}:

$$E[X_1] = pq + \frac{2D}{2+\rho} + \frac{\rho D}{2+\rho}\,e^{-(1+\rho/2)T}.$$

However, in contrast to the functions (4) and the linkage disequilibrium coefficient, the gamete frequencies do not vanish at the peripheral edges, so the expectation is taken not only over the inside of the region but also over the peripheral edges. As obtained by Ohta (1968), the limit

$$\lim_{T\to\infty} E[X_1] = pq + \frac{2D}{2+\rho}$$

gives the fixation probability of the gamete *A*_{1}*B*_{1}. Thus, the expectation of the gamete frequency *X*_{1} can be rewritten as

$$E[X_1] = \int_{\text{inside}} x_1\,\phi\,dx_1\,dx_2\,dx_3 + \int_0^1 y\,\phi_{x=1}\,dy + \int_0^1 x\,\phi_{y=1}\,dx + f(1,0,0;T), \tag{5}$$

where ϕ_{x=1} and ϕ_{y=1} represent the probability densities on the open intervals *x* = 1, *y* ∈ (0, 1), *z* = 0 and *x* ∈ (0, 1), *y* = 1, *z* = 0, respectively, and *f*(1, 0, 0; *T*) represents the transient fixation probability of the gamete *A*_{1}*B*_{1} at time *T*. As in the one-locus problem discussed above, we are interested in the conditional expectation of the gamete frequencies given that polymorphism is retained.
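A closed form for *E*[*X*_{1}] follows from (3) in two steps (our own short derivation): the allele frequencies have no drift term, so applying the diffusion part of the generator to *xy* gives d*E*[*XY*]/d*T* = *E*[*Z*]; integrating $E[Z] = De^{-(1+\rho/2)T}$ and using *X*_{1} = *XY* + *Z* yields $E[X_1] = pq + 2D/(2+\rho) + \rho D/(2+\rho)\,e^{-(1+\rho/2)T}$. The sketch below, with hypothetical names, compares this with the exact expectation under an assumed discrete Wright–Fisher model, where *E*[*X*_{1}] drops by *cE*[*Z*] per generation and *E*[*Z*] is multiplied by (1 − *c*)(1 − 1/(2*N*)).

```python
import math


def expected_x1_diffusion(p, q, D, rho, T):
    """E[X1] = pq + 2D/(2 + rho) + rho D/(2 + rho) * exp(-(1 + rho/2) T)."""
    return (p * q + 2.0 * D / (2.0 + rho)
            + rho * D / (2.0 + rho) * math.exp(-(1.0 + rho / 2.0) * T))


def expected_x1_wright_fisher(g1, D, N, c, generations):
    """Exact E[X1] in a Wright-Fisher model: E[X1] drops by c E[Z] each generation."""
    mean_x1, mean_z = g1, D
    for _ in range(generations):
        mean_x1 -= c * mean_z                        # recombination; sampling is unbiased
        mean_z *= (1.0 - c) * (1.0 - 1.0 / (2 * N))  # recombination and drift shrink Z
    return mean_x1


p, q, g1 = 0.3, 0.4, 0.2
D = g1 - p * q                   # = 0.08
N, rho, T = 1000, 2.0, 1.0
c = rho / (4 * N)
print(expected_x1_wright_fisher(g1, D, N, c, round(2 * N * T)))
print(expected_x1_diffusion(p, q, D, rho, T))
print(p * q + 2.0 * D / (2.0 + rho))  # T -> infinity: fixation probability of A1B1
```

The first two printed values agree to about four significant figures for *N* = 1000.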

## CONDITIONAL EXPECTATION OF GAMETE FREQUENCY

Let us suppose a model in which linkage disequilibrium is introduced by a single mutation, as considered by Nei and Li (1980) regarding the association between electromorphs and inversion chromosomes in *Drosophila*. We assume that locus *A* has remained monomorphic for the wild-type allele *A*_{2} and that locus *B*, at which a pair of alleles *B*_{1} and *B*_{2} (electromorphs) are segregating, has allele frequencies *q* and 1 − *q*, respectively. The mutation then introduces the mutant allele (inversion chromosome) *A*_{1} at locus *A* on one of the chromosomes bearing allele *B*_{1}. This model specifies the initial allele frequency of *A*_{1} as *p* = 1/(2*N*), the initial gamete frequencies as *g*_{1} = *p*, *g*_{2} = 0, and *g*_{3} = *q* − *p*, and the initial value of the linkage disequilibrium measure as *D* = *p*(1 − *q*); however, the following expressions hold regardless of these relations. In this model, the polymorphism at locus *A* is important because allele *A*_{1} is prone to be lost by random genetic drift. In addition, locus *B* may be regarded as a marker polymorphism for detecting the mutant. In this article, we consider the conditional expectation given that locus *A* remains polymorphic. This condition might seem similar to that considered by Kaplan and Weir (1992), who discussed the conditional expectation of the linkage disequilibrium measure defined by Nei and Li (1980), given that a polymorphism is observed at locus *B*. However, they assumed that the allele frequency of *A*_{1} is constant and that locus *B* follows the infinite-alleles model, and they considered the steady state. Thus, their model differs from the one described here, whereas the condition that locus *A* remains polymorphic is meaningful for our diffusion process, which ultimately leads to monomorphism.
Note that, for large populations, this condition is nearly equivalent to the condition that both loci remain polymorphic, since the probability that the polymorphism at locus *A* is lost before that at locus *B*, given by Karlin and McGregor (1968), is almost unity unless the allele frequency *q* is very small.
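The initial state of this model is simple arithmetic. The sketch below (a hypothetical helper, not from the article) builds it and confirms that the general definition of *D* reduces to *p*(1 − *q*), and also to *g*_{1} − *pq*. With *N* = 10 and *q* = 0.2 it gives *p* = 0.05, the parameter values used for the figures below.

```python
def single_mutation_start(N, q):
    """Initial state after one A1 mutant arises on a B1-bearing chromosome.

    Returns p, the gamete frequencies (A1B1, A1B2, A2B1, A2B2), and D.
    Assumes q >= 1/(2N) so that at least one B1 chromosome exists.
    """
    p = 1.0 / (2 * N)
    g1, g2, g3 = p, 0.0, q - p
    g4 = 1.0 - (g1 + g2 + g3)
    D = g1 * g4 - g2 * g3  # same as g1*(1 - g1 - g2 - g3) - g2*g3
    return p, (g1, g2, g3, g4), D


p, gametes, D = single_mutation_start(N=10, q=0.2)
print(p, gametes, D)
```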

By expression (5), we have (6). The expressions for the other gamete frequencies *X*_{2} and *X*_{3} can be obtained in the same manner. To calculate the limit of this expectation, we consider some moments, which for convenience we denote by μ_{ℓ,m,n}. Making use of the Kolmogorov backward equation (3), we find that the moments μ_{ℓ,m,n} satisfy a differential equation. It is worth noting that *E*[*X*^{ℓ}*Y*^{m}*X*_{1}^{n}] satisfies a recurrence relation on the two-locus sampling distribution given by Golding (1984; Ethier and Griffiths 1990).

The moments μ_{n,0,1} satisfy a differential equation whose solution can be written in closed form, subject to the initial condition (7).

In Appendix A it is shown that the coefficients of this solution can be expressed in terms of the Gegenbauer polynomial.

The moments μ_{n,1,0} satisfy a differential equation whose solution takes the form (8), where the coefficients obey the recurrence relation (9) with the initial condition (10).

The recurrence relation (9) can be expressed as a matrix equation involving a matrix **A** and suitable vectors. The determinant of **A** has zeros at ρ = 2 + 2ℓ (ℓ = 1, 2, 3, …), which arise from degeneracy of the eigenvalues. Since these particular values of ρ are not of special interest, in what follows we consider the case in which the inverse matrix exists; the calculation at each point ρ = 2 + 2ℓ is nevertheless straightforward. By applying the inverse matrix, we obtain the coefficients in a form that contains a finite series.

It is shown in Appendix B that this finite series satisfies the identity (11). Thus, we obtain (12).

By using (12) and the orthogonality property (13) of the Gegenbauer polynomials, we obtain the general expression for the moments μ_{n,1,0} from (10). Its limit as ρ → ∞ agrees with the limit theorem given by Ethier (1979).
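The orthogonality property of the Gegenbauer polynomials can be checked numerically. This sketch is illustrative: the index λ used in the article is not recoverable from this text, so λ = 3/2 (for which $C_n^{(3/2)} = P'_{n+1}$, the derivative of a Legendre polynomial) is assumed as the default. It uses the standard three-term recurrence, the weight $(1 - z^2)^{\lambda - 1/2}$ on [−1, 1], Gauss–Legendre nodes from `numpy.polynomial.legendre.leggauss` (exact here because the integrand is a polynomial when λ − 1/2 is a non-negative integer), and the standard squared norm $\pi 2^{1-2\lambda}\Gamma(n+2\lambda)/[n!\,(n+\lambda)\,\Gamma(\lambda)^2]$. Function names are hypothetical.

```python
import math

import numpy as np


def gegenbauer(n_max, lam, x):
    """Rows C_0^(lam)(x), ..., C_{n_max}^(lam)(x) from the three-term recurrence."""
    x = np.asarray(x, dtype=float)
    C = np.empty((n_max + 1,) + x.shape)
    C[0] = 1.0
    if n_max >= 1:
        C[1] = 2.0 * lam * x
    for n in range(1, n_max):
        # (n + 1) C_{n+1} = 2 (n + lam) x C_n - (n + 2 lam - 1) C_{n-1}
        C[n + 1] = (2.0 * (n + lam) * x * C[n] - (n + 2.0 * lam - 1.0) * C[n - 1]) / (n + 1)
    return C


def gegenbauer_gram(n_max, lam=1.5, nodes=64):
    """Matrix of integrals of (1 - x^2)^(lam - 1/2) C_m C_n over [-1, 1]."""
    x, w = np.polynomial.legendre.leggauss(nodes)
    C = gegenbauer(n_max, lam, x)
    weight = (1.0 - x**2) ** (lam - 0.5)
    return (C * weight * w) @ C.T


def gegenbauer_norm(n, lam):
    """Closed-form squared norm h_n of C_n^(lam)."""
    return (math.pi * 2.0 ** (1.0 - 2.0 * lam) * math.gamma(n + 2.0 * lam)
            / (math.factorial(n) * (n + lam) * math.gamma(lam) ** 2))


G = gegenbauer_gram(6)
print(np.round(G, 10))
print([gegenbauer_norm(n, 1.5) for n in range(7)])
```

Setting `lam=0.5` reduces the check to the Legendre case, with squared norms 2/(2*n* + 1).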

We also observe the corresponding limits of the two remaining moments.

By using these results for the moments, we arrive at the analytic expression for (6). Taking the limit in which random genetic drift vanishes, we recover the deterministic behavior of the gamete frequency *X*_{1}, as expected.

We also observe the asymptotic form of this expression for large *T*. The conditional expectation of the gamete frequency *X*_{1} given that locus *A* remains polymorphic is

$$E[X_1 \mid 0 < X < 1] = \frac{E[X_1\,I_{(0,1)}(X)]}{E[I_{(0,1)}(X)]},$$

where the denominator is given by (1). The asymptotic value of the conditional expectation of the gamete frequency *X*_{1} is given by (14).

In contrast to the deterministic model without random genetic drift, this value is higher than *pq*, the value toward which the deterministic model tends, and it depends on ρ. Note that the second term of the asymptotic value in (14) represents the conditional covariance between the frequencies of the alleles *A*_{1} and *B*_{1}. Figure 2 illustrates the change in the conditional expectation of the gamete frequency *X*_{1} when linkage disequilibrium is introduced into a population with *p* = 1/(2*N*) = 0.05 and *q* = 0.2. After 4*N* generations (*T* = 2.0), the conditional expectation almost reaches its asymptotic value for large ρ, although 4*N* generations are still not enough for small ρ. The conditional expectation also behaves nonmonotonically for small ρ: it increases rapidly and then decreases to the asymptotic value. For comparison, the counterpart in the deterministic model is illustrated in Figure 3.
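The qualitative contrast between the conditional expectation and its deterministic counterpart can be reproduced by brute force. This is our own Monte Carlo illustration, not the article's analytic method, and its numbers are not the values plotted in the figures: it assumes a discrete two-locus Wright–Fisher model with 2*N* = 20 gametes (so *p* = 1/(2*N*) = 0.05), *q* = 0.2, and *c* = ρ/(4*N*), starts from the single-mutation state, and averages *X*_{1} over the replicates in which locus *A* is still polymorphic. The deterministic counterpart is $x_1 = pq + p(1-q)e^{-\rho T/2}$. All names are hypothetical.

```python
import math

import numpy as np

RECOMB_SHIFT = np.array([-1.0, 1.0, 1.0, -1.0])  # A1B1, A1B2, A2B1, A2B2


def wright_fisher_step(x, N, c, rng):
    """Recombination (shift of -/+ c*Z) followed by multinomial sampling of 2N gametes."""
    n = 2 * N
    z = x[:, 0] * x[:, 3] - x[:, 1] * x[:, 2]
    expected = x + c * z[:, None] * RECOMB_SHIFT
    counts = np.zeros_like(expected)
    remaining = np.full(x.shape[0], n)
    mass_left = np.ones(x.shape[0])
    for k in range(3):  # sequential binomials = one multinomial draw per replicate
        safe = np.maximum(mass_left, 1e-300)
        prob = np.clip(np.where(mass_left > 0, expected[:, k] / safe, 0.0), 0.0, 1.0)
        draw = rng.binomial(remaining, prob)
        counts[:, k] = draw
        remaining -= draw
        mass_left -= expected[:, k]
    counts[:, 3] = remaining
    return counts / n


def conditional_x1(N, q, rho, T_values, replicates=50000, seed=2005):
    """Monte Carlo E[X1 | locus A polymorphic] at each T in T_values (units of 2N gens)."""
    rng = np.random.default_rng(seed)
    p, c = 1.0 / (2 * N), rho / (4 * N)
    x = np.tile([p, 0.0, q - p, 1.0 - q], (replicates, 1))
    checkpoints = {round(2 * N * T): T for T in T_values}
    results = {}
    for t in range(1, max(checkpoints) + 1):
        x = wright_fisher_step(x, N, c, rng)
        if t in checkpoints:
            a1_copies = np.rint((x[:, 0] + x[:, 1]) * 2 * N)
            polymorphic = (a1_copies > 0) & (a1_copies < 2 * N)
            results[checkpoints[t]] = x[polymorphic, 0].mean()
    return results


def deterministic_x1(p, q, rho, T):
    """No-drift counterpart: x1 = pq + p(1 - q) exp(-rho T / 2)."""
    return p * q + p * (1.0 - q) * math.exp(-rho * T / 2.0)


N, q, rho = 10, 0.2, 1.0
estimates = conditional_x1(N, q, rho, T_values=(0.5, 1.0, 2.0))
for T, value in estimates.items():
    print(T, round(value, 3), round(deterministic_x1(1 / (2 * N), q, rho, T), 3))
```

With a population this small the diffusion approximation is rough, so only the direction of the discrepancy (conditional values well above *pq*) should be read from the output.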

To examine the frequency of the allele *B*_{1} among chromosomes bearing the allele *A*_{1}, let us consider the ratio of the conditional expectation of the gamete frequency *X*_{1} to that of the allele frequency *X*,

$$\frac{E[X_1 \mid 0 < X < 1]}{E[X \mid 0 < X < 1]},$$

where the denominator is given by (2). The asymptotic value of this ratio is given by (15).

In contrast to the deterministic model without random genetic drift, this value is higher than *q*, the value toward which the deterministic model tends, and it depends on ρ. Figure 4 illustrates the change in this ratio when linkage disequilibrium is introduced into a population with *p* = 1/(2*N*) = 0.05 and *q* = 0.2. After 4*N* generations (*T* = 2.0), the ratio almost reaches its asymptotic value for large ρ, although 4*N* generations are still not enough for small ρ. For comparison, the counterpart in the deterministic model is illustrated in Figure 5. The discrepancy between our model and the deterministic model is significant for small ρ.
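In the deterministic model the ratio has a simple closed form under the single-mutation start (*g*_{1} = *p*, *D* = *p*(1 − *q*), *x* = *p*): $x_1/x = q + (1-q)e^{-\rho T/2}$, which begins at 1 and decays monotonically to *q*. The short sketch below (hypothetical function name) tabulates it for a few values of ρ as a reference curve.

```python
import math


def deterministic_ratio(q, rho, T):
    """No-drift ratio x1 / x for the single-mutation start.

    x1 = pq + p(1 - q) exp(-rho T / 2) and x = p, so the ratio is
    q + (1 - q) exp(-rho T / 2): it starts at 1 and decays monotonically to q.
    """
    return q + (1.0 - q) * math.exp(-rho * T / 2.0)


for rho in (0.5, 2.0, 10.0):
    values = [deterministic_ratio(0.2, rho, T) for T in (0.0, 0.5, 1.0, 2.0)]
    print(rho, [round(v, 3) for v in values])
```

Note that *p* cancels, so the deterministic ratio does not depend on the population size.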

## DISCUSSION

An analytic expression for the conditional expectation of the transient gamete frequency, given that one of the two loci remains polymorphic, was obtained for the diffusion process by calculating the moments of the distribution. This expression is general and does not depend on the model by which linkage disequilibrium is introduced into the population.

We considered a model in which linkage disequilibrium is introduced by a single mutation, creating an association between the mutant allele *A*_{1} and the allele *B*_{1} carried at the other locus of the chromosome on which the mutation occurred. Because the allele *A*_{1} is prone to be lost by random genetic drift, the conditional expectation of the frequency of the gamete *A*_{1}*B*_{1} given that locus *A* remains polymorphic is the meaningful quantity. Its behavior differs significantly from the monotonic decrease in the deterministic model without random genetic drift. After 4*N* generations, the conditional expectation of the gamete frequency almost reaches its asymptotic value for large ρ, although 4*N* generations are still not enough for small ρ. The asymptotic value is larger than the product of the initial allele frequencies, to which the deterministic model tends, and depends on the recombination fraction between the two loci. Note that the conditional expectation of the linkage disequilibrium coefficient vanishes asymptotically, as in the deterministic model. This observation demonstrates that the linkage disequilibrium measure alone is not sufficient to characterize the two-locus problem uniquely.

## APPENDIX A

Since the Gegenbauer polynomials are orthogonal on the interval [−1, 1], the right-hand terms of (7) can be expanded in Gegenbauer polynomials of degree up to *n* − 1, where we set *z* = 1 − 2*p*. Multiplying both sides of this expansion by a Gegenbauer polynomial and the corresponding weight function, integrating, and using the orthogonality property (13), we obtain the expansion coefficients. The required integral transform of (1 − *z*)(1 + *z*)^{*n*} with respect to the Gegenbauer polynomial is given by Erdélyi (1954), from which the coefficients follow.

## APPENDIX B

It is straightforward to check the identity (11) for *m* = *n*. For 1 ≤ *m* ≤ *n* − 1, the finite series can be expressed in terms of the truncated hypergeometric series *y*_{n}(*a*, *b*, *c*, *z*). The truncated hypergeometric series can in turn be expressed in terms of the generalized hypergeometric series (Erdélyi 1953). Thus, we have an identity for the truncated hypergeometric series.

By using this identity for the truncated hypergeometric series, we obtain (11) for 1 ≤ *m* ≤ *n* − 1.

## Acknowledgments

I acknowledge the continuous encouragement offered by T. Gojobori. Also, I thank M. Notohara, A. Simizu, and two anonymous reviewers for comments on an earlier version of this manuscript.

## Footnotes

Communicating editor: M. Feldman

- Received September 25, 2004.
- Accepted July 20, 2005.

- Copyright © 2005 by the Genetics Society of America