Corresponding author: Carsten Wiuf, Department of Statistics, University of Oxford, 1 S. Parks Rd., Oxford, OX1 3TG, England. E-mail: wiuf@stats.ox.ac.uk
Communicating editor: A. G. CLARK
| ABSTRACT |
|---|
In this article we develop a coalescent model with intralocus gene conversion. The distribution of the tract length is geometric in concordance with results published in the literature. We derive a simulation scheme and deduce a number of analytical results for this coalescent with gene conversion. We compare patterns of variability in samples simulated according to the coalescent with recombination with similar patterns simulated according to the coalescent with gene conversion alone. Further, an expression for the expected number of topology shifts in a sample of present-day sequences caused by gene conversion events is derived.
UNDERSTANDING patterns of variation in DNA sequences requires insight into at least three processes. The first is the genealogical process describing the ancestral history of a sample of alleles from a single locus. Kingman's coalescent process (![]()
Central to many models describing the mechanisms at the sequence level is the Holliday junction (![]()
It is the aim of this article to develop a coalescent model with intralocus gene conversion alone and investigate the effects of gene conversion on various statistics of interest in the analysis of DNA sequences. We compare patterns of variability in samples simulated according to the coalescent with recombination with similar patterns simulated according to the coalescent with gene conversion alone. In ![]()
In a subsequent article (![]()
Two nucleotides separated by a large distance produce recombinants due to recombination events only. The (short) finite length of a gene conversion tract makes the contribution from gene conversion events insignificant. Thus, the patterns observed in a sample of sequences are the result of recombination events in the history of the sample and of the substitution process imposed. However, over small distances recombinants can be produced by recombination events as well as by gene conversion events. The observed patterns in samples of sequence data are thus the results of three different processes: the gene conversion process, the recombination process, and the substitution process.
A recombination process was incorporated into Kingman's coalescent model (![]()
We expect that the patterns of intralocus variability in a model with gene conversion are different from those in a model with recombinants produced by recombination (as defined above) only. The effect of recombination in the coalescent model is to break up the material ancestral to a sequence in two parts and distribute the parts on two different ancestors, one carrying the ancestral material to the left of the recombination break point, S, the other carrying the material to the right of S (Fig 1). In contrast, gene conversion, as defined here, breaks the material ancestral to a sequence in two points, S and T, and distributes the material to the left of S and that to the right of T on one ancestor, and the part in between S and T is on another ancestor (Fig 1).
The effect of a recombination event can easily be obtained in a model of intralocus gene conversion. If one end point falls outside the observed sequence, the effect is similar to that of a recombination event, though the probabilities of the two events may differ. These probabilities depend on the rates of gene conversion and of recombination within the observed sequence. Further, they depend on the number of nucleotides observed: the larger the number, the smaller the chance of a gene conversion with only one end point within the observed sequence.
Similarly, the effect of a gene conversion event can be obtained by two recombination events and one coalescent event (Fig 2). Again, the probabilities of obtaining the events may be very different. In particular, the latter series of events (given that the first recombination event occurs) depends strongly on the current sample size: the larger the sample, the lower the chance that the two recombined sequences coalesce with each other before coalescing with any other sequence in the sample.
Finally, the length distribution of the transferred chunk in the gene conversion events determines the distributions of the end points. Thus, the end points will only in rare cases be uniformly distributed along the observed sequence. Again, this is in striking discrepancy with the coalescent model with recombination where the breakpoint is uniformly distributed along the entire sequence (standard assumption; see, e.g., ![]()
Fig 3 shows an example of a genealogy of a sample of size 2 with intralocus gene conversion.
This article is organized into sections. The first two sections describe the coalescent model with gene conversion. In the third section a simulation scheme is developed similar to that of ![]()
| THE MODEL |
|---|
Consider a diploid population of constant effective size N, so that each generation consists of 2N sequences. A new generation is obtained from the present generation by sampling 2N sequences with replacement, forming random pairs of sequences, and letting a short tract of nucleotides be transferred between the sequences. The mode of transfer is described below.
Sequences are of length L + 1 nucleotides so that there are L gaps between nucleotides. Consider one such sequence. Assume that in any generation the probability of a gene conversion initiating in a given gap between two adjacent positions in the sequence is g, independently of whether gene conversions initiate elsewhere along or outside the observed sequence. The gap in which a gene conversion initiates is uniformly distributed among the L gaps. If gL is small the chance of more than one gene conversion event in one generation is negligible.
The transferred chunk of nucleotides originates from a randomly chosen sequence in the population. In ![]()
$$P(Z = z) = q(1 - q)^{z-1}, \qquad z = 1, 2, \ldots \tag{1}$$
That is, at least one nucleotide is transferred. It is assumed that the insertion happens to the right of the gap where the conversion initiates. This model is essentially equivalent to a model where the insertion happens to the right of the gap with probability p and to the left with probability 1 - p (see ![]()
Let Q = qL. If q is small and L is large the distribution of z = Z/L can be approximated with an exponential distribution with parameter Q, i.e., z ~ Exp(Q) (where ~ means "is distributed like"). Further, let G = 4gNL be the expected number of gene conversion events per sequence per 4N generations and assume g is small and N large. The definition of G is analogous to the parametrization adopted for the coalescent with recombination (![]()
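The quality of the exponential approximation can be checked numerically. The sketch below assumes the geometric tract length takes values 1, 2, ... with per-nucleotide termination probability q, so that P(Z > z) = (1 - q)^z, and compares this with the Exp(Q) tail evaluated at z/L. The values of q and L are hypothetical and chosen only for illustration.

```python
import math

def geometric_tail(z: int, q: float) -> float:
    """P(Z > z) when P(Z = z) = q * (1 - q)**(z - 1) for z = 1, 2, ..."""
    return (1.0 - q) ** z

def exponential_tail(x: float, Q: float) -> float:
    """P(Z/L > x) under the Exp(Q) approximation."""
    return math.exp(-Q * x)

# Hypothetical values chosen for illustration, not taken from the article.
q, L = 0.005, 1000
Q = q * L
for z in (100, 200, 500, 1000):
    # Columns: tract length, exact geometric tail, exponential approximation.
    print(z, geometric_tail(z, q), exponential_tail(z / L, Q))
```

With small q and large L the two columns should agree closely, which is the regime assumed in the text.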
The genealogical process of a sample of n present-day sequences is studied: time starts at the present and increases, going backward in time. Under the above assumptions we find that the waiting time, WC (in units of 2N generations), until a sequence has been created by a gene conversion event that initiates within the sequence is approximately exponentially distributed with parameter G/2, i.e., WC ~ Exp(G/2), if N is large and gL is small. The rate of gene conversions initiating outside the sequence but ending within the observed sequence must also be taken into account. This rate depends on the distribution of Z and is discussed in the subsequent section. The rate of coalescence is n(n - 1)/2 if there are n sequences in a sample (![]()
We note that Q as well as G scale linearly in L, the observed number of gaps between nucleotides. That is, both Q and G are doubled if the observed number of gaps, L, is doubled. The expected length of the transferred chunk in a gene conversion is 1/q (measured in number of nucleotides). Therefore, the parameter Q is interpreted as the sequence length measured in units of expected length of the transferred chunk. Similarly, the parameter G is sequence length measured in expected number of gene conversion events per sequence per 4N generations. We note that Q/G = q/(4Ng) is independent of L.
| EFFECTS OF GENE CONVERSION |
|---|
In this section we discuss the effects of a single gene conversion event on a sequence and find the waiting time until the sequence has been created by a gene conversion event.
Denote by $\tilde{Z}$, $\tilde{S}$, and $\tilde{T}$ the variables Z/L, S/L, and T/L, respectively, where Z, S, and T are as defined in the previous section. The variables $\tilde{S}$ and $\tilde{T}$ take values in {1/L, 2/L, ... , 1 - 1/L} ⊂ (0, 1) and as L becomes large the distributions of $\tilde{S}$ and $\tilde{T}$ converge to continuous distributions on (0, 1). The representation of a sequence as the continuous interval (0, 1) is commonly used (see, e.g., ![]()
Let C denote the event that a gene conversion happens in a given sequence in a given generation. Further, let C1 denote the event that the gene conversion falls partly outside the L + 1 observed nucleotides, that is, S + Z > L. Similarly, let C2 denote that both end points are within the sequence length (lower index i indicates that i of the two end points of the converted chunk are within the L nucleotides). We find
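$$P(C_1 \mid C) \approx \frac{1 - e^{-Q}}{Q} = 1 - K(Q) \tag{2}$$
and
$$P(C_2 \mid C) \approx 1 - \frac{1 - e^{-Q}}{Q} = K(Q). \tag{3}$$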
Denote by WC1 the time until an event of type C1 and by WC2 the time until an event of type C2. Then we have WC1 ~ Exp(G(1 - K(Q))/2) and WC2 ~ Exp(GK(Q)/2), where K(Q) denotes the probability in (3), such that the waiting time, WC, until a gene conversion event of either type is WC = min(WC1, WC2) ~ Exp(G/2). This is similar to the recombination model; the waiting time to a recombination event within the observed sequence is Exp(R/2), where R = 4rNL is the rate of recombination and r is the probability of a recombination break between any two nucleotides in the sequence.
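The split of events into types C1 and C2 can be illustrated by simulation. The sketch below works in the continuous limit, drawing the initiation point uniformly on (0, 1) and the scaled tract length from Exp(Q), and it compares the observed fraction of C2 events with the closed form K(Q) = 1 - (1 - exp(-Q))/Q. That closed form is an assumption of this sketch, derived from the uniform and exponential distributions just mentioned; the function names are hypothetical.

```python
import math
import random

def K(Q: float) -> float:
    """Assumed continuous-limit probability that both end points lie in (0, 1)."""
    return 1.0 - (1.0 - math.exp(-Q)) / Q

def estimate_p_c2(Q: float, draws: int, seed: int = 1) -> float:
    """Monte Carlo estimate of P(C2 | C): the tract starts and ends inside (0, 1)."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(draws):
        s = rng.random()        # initiation point, uniform on (0, 1)
        z = rng.expovariate(Q)  # scaled tract length, Exp(Q)
        if s + z < 1.0:         # the tract terminates within the observed sequence
            inside += 1
    return inside / draws

print(estimate_p_c2(5.0, 200_000), K(5.0))
```

The two printed numbers should agree up to Monte Carlo error; the complement, 1 - K(Q), is the corresponding fraction of C1 events.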
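The end points of the tract are affected by the type, C1 or C2, of the event. The density, $f_{\tilde{S},\tilde{T}}$(s, t|C2), of $\tilde{S}$ and $\tilde{T}$ conditional on C2 is given by
$$f_{\tilde{S},\tilde{T}}(s, t \mid C_2) = \frac{Q e^{-Q(t-s)}}{K(Q)}, \qquad 0 < s < t < 1, \tag{4}$$
from which the marginal densities of $\tilde{S}$ and $\tilde{T}$ easily can be derived. The distribution of $\tilde{S}$ (and $\tilde{T}$) is not uniform but skewed toward 0 (and 1). The density (4) is symmetric so that (1 - $\tilde{T}$, 1 - $\tilde{S}$) is distributed like ($\tilde{S}$, $\tilde{T}$).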
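The density of ($\tilde{S}$, $\tilde{T}$) conditional on C2 relates to that of $\tilde{Z}$ conditional on C2 because $\tilde{Z}$ = $\tilde{T}$ - $\tilde{S}$. We find
$$f_{\tilde{Z}}(z \mid C_2) = \int_0^{1-z} f_{\tilde{S},\tilde{T}}(s, s + z \mid C_2)\, ds = \frac{(1 - z)\, Q e^{-Qz}}{K(Q)}, \qquad 0 < z < 1, \tag{5}$$
where $f_{\tilde{Z}}$(z|C2) denotes the density of $\tilde{Z}$ conditional on C2.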
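Finally, the density of $\tilde{S}$ conditional on C1 (that is, $\tilde{S}$ + $\tilde{Z}$ falls outside the observed nucleotides, $\tilde{S}$ + $\tilde{Z}$ > 1) is
$$f_{\tilde{S}}(s \mid C_1) = \frac{Q e^{-Q(1-s)}}{1 - e^{-Q}}, \qquad 0 < s < 1. \tag{6}$$
If the tract falls outside the observed sequence the chance that $\tilde{S}$ is close to 1 is higher than the chance that $\tilde{S}$ is close to 0. This is in concordance with (6); the density function is increasing in s.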
In Fig 4 we illustrate the above different distributions with q obtained from D. melanogaster data.
Gene conversions initiating outside the L + 1 nucleotides will have a chance of terminating within the L + 1 observed nucleotides. Assume that the entire chromosome potentially consists of an infinite array of sequences of length L plus the observed one of length L + 1 and that the gene conversion model described above is valid for the entire chromosome.
For most organisms, there is an upper bound to the length of a gene conversion tract. This means that only a minor (finite) extension of the observed sequence should be taken into consideration and not the entire chromosome. In the model discussed here, tract lengths can have an arbitrary size, but it can be shown that the probability of a tract initiating at least nL nucleotides away from the observed sequence and ending within the sequence, given a gene conversion event happens that ends in the observed sequence, is of order exp(-nQ). Basically only tracts initiating close to the observed sequence affect the sequence history in concordance with what is expected.
For each gene conversion tract initiating within the observed sequence and ending outside the observed sequence there is a similar tract initiating outside (to the left of the observed sequence), ending within the observed sequence. Because the chance of gene conversion events is the same along the entire chromosome, the two events have the same probability of happening.
Let Co (o for outside) denote the event that a gene conversion initiates outside the observed sequence and terminates within it. The above informal argument shows that
$$P(C_o) = P(C_1 \mid C) \tag{7}$$
and that the waiting time, WCo, until a gene conversion initiating outside the observed sequence ends within the observed sequence, is distributed like the waiting time, WC1, until an event of type C1; that is, WCo ~ WC1. This is a general result that applies to the models described in ![]()
Further, conditional on Co, the termination point t = $\tilde{T}$ has density
$$f_{\tilde{T}}(t \mid C_o) = \frac{Q e^{-Qt}}{1 - e^{-Q}}, \qquad 0 < t < 1, \tag{8}$$
because the length is exponentially distributed, and the chance that an exponential variable Exp(Q) is less than 1 is 1 - exp(-Q).
The events Co and C1 are independent. Thus, the waiting time WC1∪Co = min(WC1, WCo) until an event of either type C1 or type Co is approximately exponentially distributed with parameter G(1 - K(Q)), and the position of the initiation/termination point is distributed with density
$$f(x \mid C_1 \cup C_o) = \frac{Q\,\left(e^{-Qx} + e^{-Q(1-x)}\right)}{2\,\left(1 - e^{-Q}\right)}, \qquad 0 < x < 1, \tag{9}$$
according to (6) and (8).
Summing up, we find that the waiting time, W, until a sequence is created by a gene conversion event is exponentially distributed
$$W \sim \mathrm{Exp}\left(\frac{G\,(2 - K(Q))}{2}\right) \tag{10}$$
because P(Co) + P(C1|C) + P(C2|C) = 2 - K(Q).
Given a gene conversion occurs, the probability that both end points of the inserted chunk are within the observed sequence is
$$p_2 = \frac{K(Q)}{2 - K(Q)} \tag{11}$$
and the probability that only one end point is within the observed sequence is
$$p_1 = \frac{2\,(1 - K(Q))}{2 - K(Q)}. \tag{12}$$
In the latter case, the event is of type Co or of type C1 with probability 1/2 each. The density of ($\tilde{S}$, $\tilde{T}$) is given by (4) if there are two breakpoints, and the density of the single breakpoint is given by (9) otherwise. In the example in Fig 4, Q = 5, or five times the expected tract length, and we find p2 = 0.67 and p1 = 0.33.
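The values quoted for Q = 5 can be reproduced with a few lines of code. The sketch below assumes K(Q) = 1 - (1 - exp(-Q))/Q, p2 = K(Q)/(2 - K(Q)), and p1 = 2(1 - K(Q))/(2 - K(Q)); these closed forms are stated here as assumptions, chosen because they agree with the total rate G(2 - K(Q))/2 and with the numbers in the text.

```python
import math

def K(Q: float) -> float:
    """Assumed closed form for the probability that a tract lies entirely inside."""
    return 1.0 - (1.0 - math.exp(-Q)) / Q

def endpoint_probabilities(Q: float) -> tuple[float, float]:
    """Return (p1, p2): one end point versus two end points in the sequence."""
    k = K(Q)
    p2 = k / (2.0 - k)
    p1 = 2.0 * (1.0 - k) / (2.0 - k)
    return p1, p2

p1, p2 = endpoint_probabilities(5.0)
print(round(p1, 2), round(p2, 2))  # 0.33 0.67
```

By construction p1 + p2 = 1, since every event affecting the sequence leaves either one or two end points within it.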
The distribution in (10) is interesting for several reasons. First, the distribution of the tract length, $\tilde{Z}$, depending on Q, affects the number of events and the times between events. The parameter G(2 - K(Q))/2 varies from G/2 when Q = ∞ to G when Q = 0. Second, for all Q > 0 the number of events is higher than the number of events in a recombination model with a similar parameter (i.e., r = g).
The two extreme values, Q = 0 and Q = ∞, are of interest. If Q = 0 we find that W ~ Exp(G), and a gene conversion will leave just one breakpoint within the sequence. Where the break occurs will be uniformly distributed along the sequence. Thus, the coalescent with gene conversion resembles the coalescent with recombination, but the rate of gene conversions is twice the rate of recombinations (assuming the probability of a recombination between any two nucleotides is g).
If Q = ∞ we find that W ~ Exp(G/2) and that both end points of a gene conversion will be within the sequence. As Q goes toward infinity, $\tilde{S}$ will be uniformly distributed along the sequence, and $\tilde{T} \to \tilde{S}$, that is, $\tilde{Z} \to 0$. In the limit Q = ∞, a gene conversion will just leave a spot on the sequence, leaving the sequence the way it is. Thus, the coalescent with gene conversion will be indistinguishable from the pure coalescent process. Each time a conversion happens, it cannot be seen, and only coalescent events affect the history of a sample.
| MODEL SIMULATIONS |
|---|
Assume n sequences are sampled from a present-day population. The genealogy of the sample subject to gene conversion as described in the previous section can be simulated according to the scheme in Fig 5.
The algorithm is formulated as a birth and death process in which each lineage splits at the constant rate G(2 - K(Q))/2 (going backward in time, a gene conversion replaces one lineage by two), and it terminates when there is only one ancestral sequence to the sample. ![]()
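A heavily simplified version of such a scheme can be sketched as follows. The code tracks only the number of lineages and the type of each event, not the ancestral material carried by each lineage, so it counts gene conversions that a full implementation would discard as falling in non-ancestral material. The closed form for K(Q), the function name, and the event labels are assumptions of this sketch and are not taken from Fig 5.

```python
import math
import random

def K(Q: float) -> float:
    """Assumed closed form for the probability that a tract lies entirely inside."""
    return 1.0 - (1.0 - math.exp(-Q)) / Q

def simulate_lineage_counts(n, G, Q, seed=0, max_events=100_000):
    """Simulate (time, event type, lineages after event) backward in time.

    Coalescence occurs at rate k(k - 1)/2 and gene conversion at rate
    G(2 - K(Q))/2 per lineage, with time in units of 2N generations.
    """
    rng = random.Random(seed)
    per_lineage_gc = G * (2.0 - K(Q)) / 2.0
    p2 = K(Q) / (2.0 - K(Q))  # chance that a conversion has both end points inside
    k, t, events = n, 0.0, []
    while k > 1 and len(events) < max_events:
        coal_rate = k * (k - 1) / 2.0
        total = coal_rate + k * per_lineage_gc
        t += rng.expovariate(total)  # waiting time to the next event
        if rng.random() < coal_rate / total:
            k -= 1  # two lineages merge
            events.append((t, "coalescence", k))
        else:
            k += 1  # one lineage splits into two ancestors
            kind = "C2" if rng.random() < p2 else "C1_or_Co"
            events.append((t, kind, k))
    return events

history = simulate_lineage_counts(n=5, G=1.0, Q=5.0, seed=42)
print(len(history), history[-1])
```

Because the coalescence rate grows quadratically in the number of lineages while the conversion rate grows only linearly, the process returns to a single lineage with probability 1; the `max_events` cap is only a safeguard.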
Consider a fixed point, x, along the sequences. The tree, T(x), that describes the sequences at x can be found by starting at the present sequences and going back in time. When a gene conversion node is encountered, the branch that describes the segment containing x is followed. Assume that the gene conversion leaves only one end point, s, in the sequence. If x < s, then the branch describing the fate of the left part of the sequences is to be followed, and vice versa; the case of two end points is handled similarly. The tree T(x) is distributed like the coalescent process since one point cannot be subject to gene conversion.
Mutations can be superimposed on the genealogy if all alleles are selectively neutral. Assuming mutations arrive according to a Poisson process, the number of mutations on the total genealogy is Poisson with parameter θB/2, where B is the total branch length of the entire genealogy, θ = 4Nu is the scaled mutation rate, and u is the probability of a mutation in a sequence per generation.
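As an illustration, the number of mutations on a genealogy with total branch length B can be drawn as follows. The sketch uses only the standard library and obtains a Poisson variate by counting unit-rate exponential arrivals before time θB/2; the function name and the example values are hypothetical.

```python
import random

def sample_mutation_count(theta: float, total_branch_length: float,
                          rng: random.Random) -> int:
    """Draw a Poisson(theta * B / 2) number of mutations.

    The number of arrivals of a unit-rate Poisson process before time
    theta * B / 2 has the required Poisson distribution.
    """
    mean = theta * total_branch_length / 2.0
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed < mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count

rng = random.Random(7)
draws = [sample_mutation_count(theta=2.0, total_branch_length=3.0, rng=rng)
         for _ in range(20_000)]
print(sum(draws) / len(draws))  # should be close to theta * B / 2 = 3
```

In a full simulation each mutation would then be placed on a branch with probability proportional to the branch length; that step is omitted here.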
The coalescent with gene conversion can be combined with the coalescent with recombination. The genealogy of a sample of sequences is constructed similarly to the scheme above, waiting for three different events to occur: coalescence, recombinations, and gene conversions.
| RESULTS |
|---|
As noted previously, Q0 = Q/G = q/(4Ng) is independent of L. The parameters (Q0, G) are more natural parameters than (Q, G); G represents the length of the sequence in units of gene conversion events per sequence per 4N generations and Q0 is a parameter dependent on the effective size of the population and parameters intrinsic to the biological system. In contrast (Q, G) are both parameters representing sequence length, but in different units.
In the following, results are given in terms of (Q0, G) to facilitate comparisons between samples of different lengths but with the same Q0. If we consider Q0 fixed and G a variable, results (and plots) can be converted between models with different g's but equal Q0 through a linear scaling of sequence length.
If sequence length is measured in units of expected number of events per sequence per 4N generations, we find that the scaled length $\tilde{Z}' = \tilde{Z}G$ of a tract is
$$\tilde{Z}' \sim \mathrm{Exp}(Q/G) \tag{13}$$
[according to (1)]; that is, exponential with parameter Q0 = Q/G. The waiting time, W, until a sequence is created by a gene conversion event is in this formulation given by
$$W \sim \mathrm{Exp}\left(\frac{G\,(2 - K(Q_0 G))}{2}\right) \tag{14}$$
[from (10)]. The probabilities of the different types of gene conversion events are given by (11) and (12) with Q replaced by Q0G. A similar remark applies to the densities of the end points $\tilde{S}$ and $\tilde{T}$.
In the following, we consider a sample of size n taken from the population at the present time. First, we focus on correlations between trees. Consider two positions, x1 and x2, in a sample of n sequences at distance G. We are interested in the trees, Tn(x1) and Tn(x2), that describe the sequences at x1 and x2. Only events of type C1 and Co in between positions x1 and x2 affect the relation between the two trees. Events of type C2 in between the positions cannot be traced. The rate, rG/2, by which events of type C1 and Co happen is, per sequence,
$$\frac{r_G}{2} = G\,(1 - K(Q_0 G)) = \frac{1 - e^{-Q_0 G}}{Q_0},$$
cf. (10) and (12).
For small values of G we find rG ≈ 2G. This is of particular interest because the rate at which recombinants are produced by recombination is only G (assuming r = g), and this is half the rate at which recombinants are produced by gene conversion events. Thus, gene conversion events might contribute significantly to linkage disequilibrium over small distances (see also WIUF 1999). For example, according to ![]()
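The behavior of rG at small and large distances can be checked numerically. The sketch below assumes the closed form rG = 2(1 - exp(-Q0 G))/Q0, which follows from the rate G(1 - K(Q0G)) of C1 and Co events per sequence when K(Q) = 1 - (1 - exp(-Q))/Q; both expressions are assumptions of this sketch. The value Q0 = 0.05 is the one used as an example in the text, and the function name is hypothetical.

```python
import math

def r_G(G: float, Q0: float) -> float:
    """Assumed closed form r_G = 2 * (1 - exp(-Q0 * G)) / Q0."""
    # expm1 keeps the small-G case numerically accurate.
    return -2.0 * math.expm1(-Q0 * G) / Q0

Q0 = 0.05
for G in (0.01, 0.1, 1.0, 5.0, 100.0, 1e4):
    # Columns: distance G, r_G, small-G approximation 2G, large-G limit 2/Q0.
    print(G, r_G(G, Q0), 2.0 * G, 2.0 / Q0)
```

For small G the second and third columns nearly coincide, while for large G the second column approaches the constant 2/Q0.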
Applying results in ![]()
![]() |
(15) |
where Cov denotes the covariance between variables. We note that the expression in (15) obtains its minimum when G =
, in which case rG =
. Thus the covariance is strictly positive for all values of G and converges toward a nonzero constant as G
. We have
![]() |
(16) |
The probability P(T2(x1) = T2(x2)) for n = 2 is plotted in Fig 6. The tree height, Tn = max{Tn(x) | x}, until all positions in the sample have found a most recent common ancestor is plotted in Fig 7 for n = 2. For Q0 = 0.05 and G = 5 we find Q = 0.25, which corresponds to ~100 nucleotides in D. melanogaster (see Fig 4 and Fig 6). In this case the chance that two positions separated at distance 5 share a most recent common ancestor (MRCA) is ~0.10 if sample size is 2.
Also of interest are the numbers of different types of gene conversion events. We discuss two such numbers. The first is the number of gene conversion events affecting the history of the sample. Denote this number by Gn. The second is the number of gene conversion end points that make the topology change. Call this number Sn. This number relates to the possibilities of detecting gene conversion events in the sample history. Under an infinite-sites mutation model (![]()
It is not possible to find the expectation of Gn analytically. The reason for this is the following. The expectation of Gn is not linear in G because a gene conversion with both end points in ancestral material is only counted once. Knowing that a gene conversion has one end point within ancestral material in a small sequence interval does not reveal if the other end point also is within ancestral material. Thus, information about gene conversion end points in a small sequence interval does not provide information whether the points count in Gn (Fig 8). In the coalescent with recombination, the expected number of recombination events can easily be found because the number is linear in R (![]()
The expected number of events causing a topology shift is given by
![]() |
(17) |
The proof is similar to that of ![]()
2G for small G and not rG = G as with recombination. Again, this can result in a considerable difference in the number of topology shifts seen in models with recombination vs. gene conversion.
Consider position 0 in the sequences and let Bn be the total length of the coalescent tree in position 0. ![]()
n, conditional on Bn, from position 0 until the first recombination point affecting the history of the sample. They find that
![]() |
(18) |
where sequence length is measured in units of expected number of recombination events per sequence per 4N generations. Sequences are here potentially infinitely long.
Here, we give the analogous result for the coalescent model with gene conversion. The situation differs in the sense that we wait for either a gene conversion initiating to the right of position 0 [analogous to (18)] or a gene conversion initiating to the left of position 0 but ending to the right of 0. The proof is given in the Appendix. The distribution of the length until the first point affecting the history of the sample, conditional on Bn, is
![]() |
(19) |
Similarly, the unconditional distribution of this length can be found (see Appendix) and is given by
![]() |
(20) |
If n = 2 we find that the expectation of this length is infinite; that is, in general the sequence length until the first gene conversion event is large. For n > 2 the expectation is finite.
| DISCUSSION |
|---|
We developed a coalescent model with gene conversion assuming a diploid population of constant size, N. It takes two parameters as input (along with the sample size). The first is the product G = 4NLg, where g is the probability that a gene conversion tract initiates in a fixed position, and the second parameter is Q = qL, where q is the probability that the tract ends at the next nucleotide given that the current nucleotide is within the tract. An easy simulation scheme was developed. This scheme could be modified, in accordance with ![]()
We derived a number of results related to the correlation of trees in different positions and to the probability that two given positions share a common ancestor. These tend to differ markedly from similar quantities obtained in the coalescent with recombination. The covariance between trees relating two distinct positions does not tend to zero with increasing distance between the positions, but is bounded below by a positive constant. Thus, it is expected that the trees in the two distinct positions are more likely to share the same topology in a model with gene conversion only than in a model of recombination with similar rate. Fig 10 confirms this. As Q0 = Q/G increases, the probability of common topology in two distinct positions decreases and, for a fixed value of Q0, this probability converges (as a function of distance) to a level distinct from 0 and 1. Low values of Q0 have a similar effect to that of recombination.
We should note in passing that given a gene conversion end point is in position
, the probability, p, of detecting that the gene conversion has occurred from incompatibilities in the sequence alignment is very low and slowly increases with sample size. ![]()

for large sample sizes, n; e.g., if n = 1000, p ≈ 0.69. The proof is similar to that of ![]()
It is of high interest to be able to distinguish mechanisms operating on the molecular level in patterns of variability obtained in a sample of sequences. An indicator of gene conversion could be the following. Define an informative site to be a site where at least two different nucleotides are present in at least two sequences each. Further, say a pair of sites is compatible if there exists a topology explaining the sites with the minimum possible number of substitution events. Now, assume three informative sites are given. If the first and last sites are compatible, and the two remaining pairs involving the middle site are both incompatible, we would take this as evidence of a gene conversion event. In a pure recombination model, given the middle site is incompatible with both other sites, we would expect that the first and the last sites are also incompatible. We have simulated how often under ideal circumstances this is likely to happen. It is assumed that the topology can be accurately constructed for each column in the sequence alignment. This is, for example, the case in an infinite-sites model with very high mutation rate. Table 1 shows simulation results for various values of Q0 and G.
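The indicator described above can be written down directly for an alignment. The sketch below assumes that each site carries exactly two nucleotides and that pairwise compatibility is decided by the four-gamete test (two biallelic sites are incompatible exactly when all four allele combinations occur); both are assumptions of this sketch, and the function names are hypothetical. Each site is passed as a string with one character per sequence.

```python
def is_informative(column: str) -> bool:
    """At least two different nucleotides, each present in at least two sequences."""
    counts = {}
    for base in column:
        counts[base] = counts.get(base, 0) + 1
    return sum(1 for c in counts.values() if c >= 2) >= 2

def compatible(col_a: str, col_b: str) -> bool:
    """Four-gamete test for two biallelic sites."""
    return len(set(zip(col_a, col_b))) < 4

def conversion_signature(first: str, middle: str, last: str) -> bool:
    """Outer sites compatible, middle site incompatible with both outer sites."""
    return (compatible(first, last)
            and not compatible(first, middle)
            and not compatible(middle, last))

# Hypothetical alignment columns for eight sequences.
first, middle, last = "AAAAGGGG", "AATTAATT", "CCCCTTTT"
print(all(is_informative(c) for c in (first, middle, last)))  # True
print(conversion_signature(first, middle, last))              # True
```

In this example the outer sites split the sample in the same way, while the middle site splits it differently, which is the pattern taken as evidence of a gene conversion event.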
Denote the positions 1, 2, and 3. Positions 1 and 3 are fixed whereas 2 is uniformly distributed between 1 and 3. The findings in the table can be explained as follows. Assume Q0 = 0. If 1 and 2 do not share topology and similarly for 2 and 3, this requires at least two recombination events. As the distance between 1 and 3 increases, the chance of more recombination events before the MRCAs to positions 1 and 3 increases; thus the chance that 1 and 3 share the same topology decreases.
If Q0 > 0 the pattern changes. When the distance is small compared to 1/Q0 (expected tract length), most events will be of type C1 and Co. A similar pattern to that for Q0 = 0 is thus expected. As the distance increases, the rate by which 1 and 3 are separated becomes almost constant (rG ≈ 2/Q0) and if (1, 2) and (2, 3) do not share topologies it is most likely caused by a gene conversion event with both end points between 1 and 3. Thus we will accept more cases where 1 and 3 share topology.
Now fix the distance between the positions. As Q0 increases, the number of C2 events increases relative to C1 and Co events (Equation 11 and Equation 12) and again if (1, 2) and (2, 3) do not share topologies it is most likely caused by a gene conversion event that removes a region around position 2. Again, we expect more cases where 1 and 3 share topology.
The pattern observed in Table 1 is obtained under ideal circumstances. In practice, the topology relating a set of sequences in a given position can be reconstructed from the nucleotide pattern in that position in rare cases only. Under an infinite-sites model with low mutation rate, the chance that 1 and 3 can be explained by the same topology is higher than the value given in Table 1. Similarly, recurrent substitutions have the same effect, but in both cases the overall pattern in Table 1 is retained. These results suggest that in a pure recombination model we will see many more pairs of incompatible sites than in a model with pure gene conversion (assuming the rate of recombination is similar to the rate of gene conversion). On the other hand we expect many more topology shifts under a gene conversion model, so that we must return, moving along the sequences, to previously visited topologies more often than under a recombination model. Also, the results in Table 1 imply that recombination can be distinguished from gene conversion by the spatial arrangement of incompatible pairs, simply by estimating the probability in Table 1 from data. If recombination is not present we will expect this probability to be significantly different from 0.
It would be advantageous to build a model of gene conversions and recombination more directly on a molecular model of these phenomena. This will be pursued in greater depth elsewhere, but the basic issues involved are briefly sketched here.
Central to most models of gene conversion/recombination is the Holliday junction (![]()
Later models, especially ![]()
Since it is not well known how different recombination mechanisms are phylogenetically distributed, it would be interesting to know if it was ever possible to distinguish the underlying mechanism from pure population data. It would here be natural to formulate what was expected in terms of different sets of incompatibilities. For instance, if three informative sites were given, and the first and last sites were compatible, but the two remaining pairs involving the middle site were incompatible, this would be taken as an indicator of a gene conversion. If more informative sites were found close to the middle position, these would be expected to have different kinds of compatibilities dependent on which kind of repair mechanism was operating on the heteroduplex.
Whether or not different recombination/gene conversion models are distinguishable by population sequence data depends on the frequency of configurations of segregating nucleotides in the population that would result in different products when recombined under different models. A functional geneticist would design the necessary configurations of nucleotides and set up the adequate crosses. In population genetics this would have to occur by chance. It is unlikely that population sequence data can compete with designed experiments in this respect, but this does not imply that different mechanisms of recombination will not have consequences for the expected patterns of variation in population sequence data; such data are becoming available from many organisms, where molecular genetical experiments have not yet been performed.
| ACKNOWLEDGMENTS |
|---|
J. P. Hjorth is thanked for helpful discussions on what gene conversion is, M. Nordborg for commenting on an early version of the manuscript, and M. Schierup for various suggestions that improved the manuscript considerably. We thank A. Mikkelsen for implementing the model and performing the simulations used in the article. C.W. was supported by grant Biotechnology and Biological Sciences Research Council 43/MMI09788 and by the Carlsberg Foundation, Denmark. Part of this work was carried out while the authors visited the Isaac Newton Institute, University of Cambridge.
Manuscript received August 4, 1999; Accepted for publication January 18, 2000.
| APPENDIX |
|---|
PROOF OF Equation 19 AND Equation 20
![]()
n, conditional on the total branch length, Bn, from position 0 until the first recombination point affecting the history of the sample. They find that
![]() |
(A1) |
where the sequence length is measured in units of expected number of recombinations per sequence per 4N generations. Sequences are here potentially infinitely long.
Remember [see (10), (11), and (12)]

and

Whenever an event of type C = C1 ∪ C2 happens the initiation point is uniform along the sequence (per definition). Therefore, the sequence length from position 0 until the first breakpoint of type C affecting the history of the sample is distributed like (A1)
![]() |
(A2) |
for sequences potentially infinitely long.
The rate, G(1 - K(Q0G))/2, of type Co events converges to 1/(2Q0) for G → ∞. Conditional on Bn = b, the number, Go, of Co events in the sequences ancestral to position 0 is thus Poisson distributed with parameter b/(2Q0),
![]() |
(A3) |
The density, f(x|Co, G), of the termination point (in units of sequence length) converges to an exponential distribution for G → ∞,
![]() |
(A4) |
[from (8)]. Combining (A3) and (A4) we find the distribution of the length until the first point of type Co affecting the history of the sample,
![]() |
(A5) |
Outside {Go > 0}, this length is infinite.
Finally, combining (A2), (A3), and (A5) we deduce that the minimum of the two lengths has distribution
![]() |
(A6) |
as required. The unconditional distribution of the length until the first point affecting the history of the sample can be found in the following way. The event that this length exceeds x is equivalent to the event that no gene conversion events occur in a sample of sequences of length x before a MRCA is found. This has probability
![]() |
(A7) |
| LITERATURE CITED |
|---|
ANDOLFATTO, P., and M. NORDBORG, 1998 The effect of gene conversion on intralocus associations. Genetics 148:1397-1399.
CARPENTER, A. T. C., 1984 Meiotic roles of crossing-over and of gene conversion. Cold Spring Harbor Symp. Quant. Biol. 49:23-29.
GRIFFITHS, R. C., 1991 The two-locus ancestral graph, pp. 100-117 in Selected Proceedings of the Symposium of Applied Probability, Sheffield 1989, IMS Lecture Notes-Monograph Series, 18, edited by I. V. BASAWA and R. L. TAYLOR. Hayward, CA.
GRIFFITHS, R. C., and P. MARJORAM, 1997 An ancestral recombination graph, pp. 257-270 in Progress in Population Genetics and Human Evolution, IMA Volumes in Mathematics and Its Applications, 87, edited by P. DONNELLY and S. TAVARÉ. Springer-Verlag, Berlin.
GRIFFITHS, R. C., and S. TAVARÉ, 1994 Sampling theory for neutral alleles in a varying environment. Philos. Trans. R. Soc. Lond. Ser. B 344:403-410.
HILLIKER, A. J., G. HARAUZ, A. G. REAUME, M. GRAY, S. H. CLARK et al., 1994 Meiotic gene conversion tract length distribution within the rosy locus of Drosophila melanogaster. Genetics 137:1019-1026.
HOLLIDAY, R., 1964 A mechanism for gene conversion in fungi. Genet. Res. 5:282-287.
HUDSON, R. R., 1983 Properties of the neutral allele model with intragenic recombination. Theor. Popul. Biol. 23:183-201.
HUDSON, R. R., and N. KAPLAN, 1985 Statistical properties of the number of recombination events in the history of DNA sequences. Genetics 111:147-164.
KINGMAN, J. F. C., 1982 The coalescent. Stoch. Process. Appl. 13:235-248.
MESELSON, M. S., and C. M. RADDING, 1975 A general model for genetic recombination. Proc. Natl. Acad. Sci. USA 72:358-361.
RESNICK, M. A., 1976 The repair of double-strand breaks in DNAs: a model involving recombination. J. Theor. Biol. 59:97-106.
STAHL, F. W., 1994 The Holliday junction on its thirtieth anniversary. Genetics 138:241-246.
TAVARÉ, S., 1984 Line-of-descent and genealogical processes, and their applications in population genetics models. Theor. Popul. Biol. 26:119-164.
WATTERSON, G. A., 1975 On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7:256-276.
WIUF, C., 2000 A coalescent approach to gene conversion. Theor. Popul. Biol. in press.
WIUF, C., and J. HEIN, 1997 On the number of ancestors to a DNA sequence. Genetics 147:1459-1468.
WIUF, C., and J. HEIN, 1999 Recombination as a point process along sequences. Theor. Popul. Biol. 55:248-259.