Skip to main content
  • Facebook
  • Twitter
  • YouTube
  • LinkedIn
  • Google Plus
  • Other GSA Resources
    • Genetics Society of America
    • G3: Genes | Genomes | Genetics
    • Genes to Genomes: The GSA Blog
    • GSA Conferences
    • GeneticsCareers.org

Institution: Massachusetts Inst of Technol MIT Libs

  • Log in
Genetics

Main menu

  • HOME
  • ISSUES
    • Current Issue
    • Early Online
    • Archive
  • ABOUT
    • About the journal
    • Why publish with us?
    • Editorial board
    • Early Career Reviewers
    • Contact us
  • SERIES
    • Centennial
    • Genetics of Immunity
    • Genetics of Sex
    • Genomic Prediction
    • Multiparental Populations
    • FlyBook
    • WormBook
    • YeastBook
  • ARTICLE TYPES
    • About Article Types
    • Commentaries
    • Editorials
    • GSA Honors and Awards
    • Methods, Technology & Resources
    • Perspectives
    • Primers
    • Reviews
    • Toolbox Reviews
  • PUBLISH & REVIEW
    • Scope & publication policies
    • Submission & review process
    • Article types
    • Prepare your manuscript
    • Submit your manuscript
    • After acceptance
    • Guidelines for reviewers
  • SUBSCRIBE
    • Why subscribe?
    • For institutions
    • For individuals
    • Email alerts
    • RSS feeds
  • Other GSA Resources
    • Genetics Society of America
    • G3: Genes | Genomes | Genetics
    • Genes to Genomes: The GSA Blog
    • GSA Conferences
    • GeneticsCareers.org

User menu

  • Log out

Search

  • Advanced search
Genetics

Advanced Search

  • HOME
  • ISSUES
    • Current Issue
    • Early Online
    • Archive
  • ABOUT
    • About the journal
    • Why publish with us?
    • Editorial board
    • Early Career Reviewers
    • Contact us
  • SERIES
    • Centennial
    • Genetics of Immunity
    • Genetics of Sex
    • Genomic Prediction
    • Multiparental Populations
    • FlyBook
    • WormBook
    • YeastBook
  • ARTICLE TYPES
    • About Article Types
    • Commentaries
    • Editorials
    • GSA Honors and Awards
    • Methods, Technology & Resources
    • Perspectives
    • Primers
    • Reviews
    • Toolbox Reviews
  • PUBLISH & REVIEW
    • Scope & publication policies
    • Submission & review process
    • Article types
    • Prepare your manuscript
    • Submit your manuscript
    • After acceptance
    • Guidelines for reviewers
  • SUBSCRIBE
    • Why subscribe?
    • For institutions
    • For individuals
    • Email alerts
    • RSS feeds
Previous ArticleNext Article

The Coalescent With Gene Conversion

Carsten Wiuf and Jotun Hein
Genetics May 1, 2000 vol. 155 no. 1 451-462
Carsten Wiuf
*Department of Statistics, University of Oxford, Oxford OX1 3TG, England
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Jotun Hein
†Institute of Biological Sciences, University of Aarhus, 8000 Aarhus C, Denmark
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • Article
  • Figures & Data
  • Info & Metrics
Loading

Abstract

In this article we develop a coalescent model with intralocus gene conversion. The distribution of the tract length is geometric in concordance with results published in the literature. We derive a simulation scheme and deduce a number of analytical results for this coalescent with gene conversion. We compare patterns of variability in samples simulated according to the coalescent with recombination with similar patterns simulated according to the coalescent with gene conversion alone. Further, an expression for the expected number of topology shifts in a sample of present-day sequences caused by gene conversion events is derived.

UNDERSTANDING patterns of variation in DNA sequences requires insight into at least three processes. The first is the genealogical process describing the ancestral history of a sample of alleles from a single locus. Kingman's coalescent process (Kingman 1982) is one such process. The second is the substitution process that describes how alleles mutate over time. For neutral loci the genealogical process and the substitution process can be separated such that the substitution process is superimposed on the genealogical process (Tavaré 1984), whereas for loci subject to selection the two processes cannot be separated. Finally, the last process describes the linkage relations between adjacent loci. It governs how and when adjacent loci are linked or split on different ancestral chromosomes and requires essentially an understanding of mechanisms that operate at the sequence level.

Central to many models describing the mechanisms at the sequence level is the Holliday junction (Holliday 1964). When the Holliday junction is resolved the result can be of two types: (1) a gene conversion with accompanying exchange of flanking regions (gene conversion with recombination); or (2) a gene conversion alone (Carpenter 1984; Stahl 1994). In this case a short tract of nucleotides is exchanged between two strands. We refer to the latter case as a gene conversion and the former as a recombination.

It is the aim of this article to develop a coalescent model with intralocus gene conversion alone and investigate the effects of gene conversion on various statistics of interest in the analysis of DNA sequences. We compare patterns of variability in samples simulated according to the coalescent with recombination with similar patterns simulated according to the coalescent with gene conversion alone. In Hilliker et al. (1994) the distribution of the gene conversion tract in Drosophila melanogaster is estimated. They find that a geometric distribution is a good approximation of the empirical distribution and accordingly the model is built on this observation.

In a subsequent article (Wiuf 2000) it is shown that to some extent the results given in this article apply generally without assuming any specific form of the distribution of the transferred chunk. The benefit of the geometric distribution is that more analytical results can be proven and that a specific choice of distribution (here, geometric) allows for investigation by simulation.

Two nucleotides separated by a large distance produce recombinants due to recombination events only. The (short) finite length of a gene conversion tract makes the contribution from gene conversion events insignificant. Thus, the patterns observed in a sample of sequences are the result of recombination events in the history of the sample and of the substitution process imposed. However, over small distances recombinants can be produced by recombination events as well as by gene conversion events. The observed patterns in samples of sequence data are thus the results of three different processes: the gene conversion process and the recombination process, and as well as the substitution process.

A recombination process was incorporated into Kingman's coalescent model (Kingman 1982; describing the genealogical process of a sample of sequences taken from a population) by Hudson (1983) and has subsequently been investigated by a number of authors (Hudson and Kaplan 1985; Griffiths and Marjoram 1997; and Wiuf and Hein 1997 among others).

We expect that the patterns of intralocus variability in a model with gene conversion are different from those in a model with recombinants produced by recombination (as defined above) only. The effect of recombination in the coalescent model is to break up the material ancestral to a sequence in two parts and distribute the parts on two different ancestors, one carrying the ancestral material to the left of the recombination break point, S, the other carrying the material to the right of S (Figure 1). In contrast, gene conversion, as defined here, breaks the material ancestral to a sequence in two points, S and T, and distributes the material to the left of S and that to the right of T on one ancestor, and the part in between S and T is on another ancestor (Figure 1).

Figure 1.
  • Download figure
  • Open in new tab
  • Download powerpoint
Figure 1.

(A) Recombination and (B) gene conversion. If the resolution of the Holliday junction results in an exchange of flanking regions we have (what here is called) a recombination event. In the top, two of the four strands involved in the Holliday junction are shown. In the bottom, time starts at present (the second generation) and goes backward. Starting with either of the two strands/sequences in the second generation, the effect of the recombination event is to create two ancestors to the sequence; the positions to the left of position S share one ancestor and the positions to the right of S share another ancestor. If the Holliday junction is resolved without an exchange of flanking regions we have a gene conversion event (without recombination). In the top, two of the four strands involved in the Holliday junction are shown. In the bottom, time starts at present (the second generation) and goes backward. Starting with either of the two strands/sequences in the second generation, the effect of the gene conversion event is to create two ancestors to the sequence; the positions to the right of T and to the left of S share one ancestor and the position between S and T shares another ancestor.

The effect of a recombination event can easily be obtained in a model of intralocus gene conversion. If one end point falls outside the observed sequence the effect will be similar to that of a recombination event, though the probabilities of the two events might be different from each other. These probabilities depend on the rates of gene conversion and of recombination within the observed sequence. Further, they depend on the number of nucleotides observed; the higher the number the less is the chance of a gene conversion with only one end point within the observed sequence.

Similarly, the effect of a gene conversion event can be obtained by two recombination events and one coalescent event (Figure 2). Again, the probabilities of obtaining the events might be very different. Especially, the latter series of events (given that the first recombination event occurs) will depend strongly on the current sample size; the higher the sample size, the lower the chance that the two recombined sequences will coalesce before coalescing with any other sequence in the sample.

Finally, the length distribution of the transferred chunk in the gene conversion events determines the distributions of the end points. Thus, the end points will only in rare cases be uniformly distributed along the observed sequence. Again, this is in striking discrepancy with the coalescent model with recombination where the breakpoint is uniformly distributed along the entire sequence (standard assumption; see, e.g., Griffiths and Marjoram 1997).

Figure 3 shows an example of a genealogy of a sample of size 2 with intralocus gene conversion.

This article is organized into sections. The first two sections describe the coalescent model with gene conversion. In the third section a simulation scheme is developed similar to that of Griffiths and Marjoram (1997) to simulate from the coalescent with recombination; in the fourth section a number of analytical and simulated results are collected. Results obtained in the coalescent with recombination apply on several occasions to the gene conversion model. Finally, the last section is a discussion of extensions to more realistic models of gene conversion and recombination. When proofs are omitted they can be derived from the general results in Wiuf (2000).

Figure 2.
  • Download figure
  • Open in new tab
  • Download powerpoint
Figure 2.

Gene conversion vs. recombination. The topological effect of one gene conversion event (left) can be obtained in a model with recombination only by two recombination events accompanied by a coalescence event. Assuming the rate of gene conversion events is the same as the rate of recombination events, we find that the chance of two recombination events followed by a coalescence event is far smaller than that of a gene conversion event. The former also strongly depends on the number ancestral sequences present. Here, for example, given that a coalescence event occurs it will result only in the desired distribution of ancestral material in one out of three possible mergings.

Figure 3.
  • Download figure
  • Open in new tab
  • Download powerpoint
Figure 3.

Example of a sample history with intralocus gene conversion. We trace the history of a sample of size 2 back in time until all positions on the two sequences have found a most recent common ancestor (MRCA). We adopt the following graphical notation: thin lines represent material nonancestral to the present-day sample; thick lines, material ancestral to the sample; and crosses, material that has found a MRCA. The first three events, 1, 2, and 3, affecting the history of the sample are all gene conversion events spreading the material ancestral to the sample on five sequences. From the point of the topological structure of the tree, we cannot distinguish events 2 and 3 from recombination events. In the first event both end points of the transferred chunk fall within the observed sequence, whereas we cannot say whether the other end points in events 2 and 3 fall within the observed sequence length or fall outside. All that can be said is that the other end points do not fall within material ancestral to the present-day sample. The fourth event is a coalsecence event whereby positions (1/2, 3/4) find a MRCA. At event 5, positions (3/4, 1) find a MRCA, and at event 6, positions (0, 1/2) find a MRCA. Consider the ancestral sequence A. A gene conversion event with end points in the material not ancestral to the sample [that is outside (1/2, 3/4)] cannot be traced; it does not affect the history. Similarly, a gene conversion event in the ancestral sequence B with one or two end points in (1/2, 3/4) and none in ancestral material does not affect the history. Only the region spanned by ancestral material need be taken into account; e.g., the span of ancestral material at sequence B is 1 − 0 = 1, and the span at sequence A is 0.75 − 0.5 = 0.25. Note that the region might include nonancestral material as in sequence B.

THE MODEL

Consider a diploid population model with effective constant (diploid) size 2N. A new generation is obtained from the present generation by sampling 2N sequences with replacement, forming random pairs of sequences, and letting a short tract of nucleotides be transferred between the sequences. The mode of transfer is described below.

Sequences are of length L + 1 nucleotides so that there are L gaps between nucleotides. Consider one such sequence. Assume that in any generation the probability of a gene conversion initiating between any two positions in the sequence is g, independently of whether gene conversions initiate elsewhere along or outside the observed sequence. Where along the sequence a gene conversion initiates is uniformly distributed among all sites. If gL is small the chance of more than one gene conversion event in one generation is negligible.

The transferred chunk of nucleotides originates from a randomly chosen sequence in the population. In Wiuf (2000), the length, Z, of the converted chunk can have an arbitrary distribution. Here we consider the specific case where Z follows a geometric distribution with parameter q, i.e., P(Z∣conversion)=q(1−q)Z−1,Z≥1. (1) That is, at least one nucleotide is transferred. It is assumed that the insertion happens to the right of the gap where the conversion initiates. This model is essentially equivalent to a model where the insertion happens to the right of the gap with probability p and to the left with probability 1 − p (see Wiuf 2000). The setting chosen is of mathematical convenience. If the conversion initiates at gap S, S = 1, 2 …, L (S for start), then the end point of the conversion is T = S + Z (T for terminate). If S + Z is >L, the conversion is partly outside the L + 1 observed nucleotides. The geometric distribution is in concordance with results in Hilliker et al. (1994) and emerges as a natural choice if the probability of adding an extra nucleotide to the transferred chunk does not depend on the current length. Later, we comment on other more realistic length distributions.

Let Q = qL. If q is small and L is large the distribution of z = Z/L can be approximated with an exponential distribution with parameter Q, i.e., z ~ Exp(Q) (where ~ means “is distributed like”). Further, let G = 4gNL be the expected number of gene conversion events per sequence per 4N generations and assume g is small and N large. The definition of G is analogous to the parametrization adopted for the coalescence with recombination (Hudson 1983).

The genealogical process of a sample of n presentday sequences is studied: time starts at the present and increases, going backward in time. Under the above assumptions we find that the waiting time, WC (in units of 2N generations), until a sequence has been created by a gene conversion event that initiates within the sequence is approximately exponentially distributed with parameter G/2, i.e., WC ~ Exp(G/2), if N is large and gL is small. The rate of gene conversions initiating outside the sequence but ending within the observed sequence must also be taken into account. This rate depends on the distribution of Z and is discussed in the subsequent section. The rate of coalescence is n(n − 1)/2 if there are n sequences in a sample (Kingman 1982) assuming a constant population size.

We note that Q as well as G scale linearly in L, the observed number of gaps between nucleotides. That is, both Q and G are doubled if the observed number of gaps, L, is doubled. The expected length of the transferred chunk in a gene conversion is 1/q (measured in number of nucleotides). Therefore, the parameter Q is interpreted as the sequence length measured in units of expected length of the transferred chunk. Similarly, the parameter G is sequence length measured in expected number of gene conversion events per sequence per 4N generations. We note that Q/G = q/(4gN) is independent of L.

EFFECTS OF GENE CONVERSION

In this section we discuss the effects of a single gene conversion event on a sequence and find the waiting time until the sequence has been created by a gene conversion event.

Denote by ζ, σ, and τ the variables Z/L, S/L, and T/L, respectively, where Z, S, and T are as defined in the previous section. The variables σ and τ take values in {1/L, 2/L, …, 1 − 1/L} ⊆ (0, 1) and as L becomes large the distributions of σ and τ converge to continuous distributions on (0, 1). The representation of a sequence as the continuous interval (0, 1) is commonly used (see, e.g., Griffiths and Marjoram 1997).

Let C denote the event that a gene conversion happens in a given sequence in a given generation. Further, let C1 denote the event that the gene conversion falls partly outside the L + 1 observed nucleotides, that is, S + Z > L. Similarly, let C2 denote that both end points are within the sequence length (lower index i indicates that i of the two end points of the converted chunk are within the L nucleotides). We find P(C2∣C)=1−1Q{1−exp(−Q)}≡K(Q) (2) and P(C1∣C)=1−P(C2∣C)=1Q{1−exp(−Q)}=1−K(Q). (3) Denote by WC1, the time until an event of type C1 and by WC2 the time until an event of type C2. Then we have WC1 ~ Exp(G(1 − K(Q))/2) and WC2 ~ Exp(GK(Q)/2), such that the waiting time, WC, until a gene conversion event of either type is WC = min(WC1, WC2) ~ Exp(G/2). This is similar to the recombination model; the waiting time to a recombination event within the observed sequence is Exp(R/2), where R = 4rNL is the rate of recombination and r is the probability of a recombination break between any two nucleotides in the sequence.

The end points of the tract are affected by the type, C1 or C2, of the event. The density, fσ,τ(s, t|C2), of σ and τ conditional on C2 is given by fσ,τ(s,t∣C2)=Qexp(−Q(t−s))K−1(Q),0<s<t<1, (4) from which the marginal densities of σ and τ easily can be derived. The distribution of σ (and τ) is not uniform but skewed toward 0 (and 1). The density (4) is symmetric so that (1 − τ, 1 − σ) is distributed like (σ, τ).

The density of (σ, τ) conditional on C2 relates to that of ~ conditional on C2 because ζ = τ − σ. We find fζ(z∣C2)=Q(1−z)exp(−Qz)K−1(Q),0<z<1, (5) where fζ(z|C2) denotes the density of ~ conditional on C2.

Finally, the density of σ conditional on C1 (that is, σ + ζ falls outside the observed nucleotides, σ + ζ > 1) is fσ(s∣C1)=exp(−Q(1−s))(1−K(Q))−1,0<s<1. (6) If the tract falls outside the observed sequence the chance that σ is close to 1 is higher than the chance that σ is close to 0. This is in concordance with (6); the density function is increasing in s.

In Figure 4 we illustrate the above different distributions with q obtained from D. melanogaster data.

Gene conversions initiating outside the L + 1 nucleotides will have a chance of terminating within the L + 1 observed nucleotides. Assume that the entire chromosome potentially consists of an infinite array of sequences of length L plus the observed one of length L + 1 and that the gene conversion model described above is valid for the entire chromosome.

For most organisms, there is an upper bound to the length of a gene conversion tract. This means that only a minor (finite) extension of the observed sequence should be taken into consideration and not the entire chromosome. In the model discussed here, tract lengths can have an arbitrary size, but it can be shown that the probability of a tract initiating at least nL nucleotides away from the observed sequence and ending within the sequence, given a gene conversion event happens that ends in the observed sequence, is of order exp(−nQ). Basically only tracts initiating close to the observed sequence affect the sequence history in concordance with what is expected.

Figure 4.
  • Download figure
  • Open in new tab
  • Download powerpoint
Figure 4.

Densities of the initiation point and the length of the gene conversion. Shown are the densities of σ conditional on C1 and C2, respectively, as well as the density of ~ conditional on C2. The density of σ given C2 is found from (4); fσ(s|C2) = {1 − exp(−Q(1 − s))}K−1(Q), 0 < s < 1. The parameter Q is in all graphs 5. Hilliker et al. (1994) estimates q in D. melanogaster to be ~1/350 and this corresponds in this case to an observed sequence length five times the expected length of a gene conversion tract or 1750 nucleotides, L = Q/q.

For each gene conversion tract initiating within the observed sequence and ending outside the observed sequence there is a similar tract initiating outside (to the left of the observed sequence), ending within the observed sequence. Because the chance of gene conversion events is the same along the entire chromosome, the two events have the same probability of happening.

Let Co(o for outside) denote the event that a gene conversion initiates outside the observed sequence, and terminates within. The above informal argument shows that P(Co)=P(C1∣C) (7) and that the waiting time, WCo, until a gene conversion initiating outside the observed sequence ends within the observed sequence, is distributed like the waiting time, WC1, until an event of type C1; that is, WCo ~ WC1. This is a general result that applies to the models described in Wiuf (2000).

Further, conditional on Co, the termination point t = T/L has density fτ(t∣Co)≈Qexp(−Qt)1−exp(−Q)=exp(−Qt)(1−K(Q))−1,0<t<1, (8) because the length is exponentially distributed, and the chance that an exponential variable Exp(Q) is <1 is 1 − exp(−Q).

The events and Co and C1 are independent. Thus, the waiting time WC1∪Co = min(WC1, WCo) until an event of either type C1 or type Co is approximately exponentially distributed with parameter G(1 − K(Q)), and where the initiation/termination point, δ, happens, is distributed with density fσ(s∣Co∪C1)=12{exp(−Q(1−s))+exp(−Qs)}⋅(1−K(Q))−1,0<s<1, (9) according to (6) and (8).

Summing up, we find that the waiting time, W, until a sequence is created by a gene conversion event is exponentially distributed W~Exp(G2{1+1Q(1−exp(−Q))})~Exp(G2{2−K(Q)}), (10) because P(Co) + P(C1|C) + P(C2|C) = 2 − K(Q).

Given a gene conversion occurs, the probability that both end points of the inserted chunk are within the observed sequence is p2=1−(1∕Q)(1−exp(−Q))1+(1∕Q)(1−exp(−Q))=K(Q)2−K(Q), (11) and the probability that only one end point is within the observed sequence is p1=(2∕Q)(1−exp(−Q))1+(1∕Q)(1−exp(−Q))=2(1−K(Q))2−K(Q). (12) Whether it is a type Co or type C1 event happens with probability 1/2. The density of (σ, τ) is given by (4) (if there are two breakpoints) and the density of the single breakpoint σ is in the last case given by (9). In the example in Figure 4, Q = 5, or five times the expected tract length, and we find p2 = 0.67 and p1 = 0.33.

The distribution in (10) is interesting for several reasons. First, the distribution of the tract length, ~, depending on Q, affects the number of events and the times between events. The parameter G(2 − K(Q))/2 varies from G/2 when Q = ∞ to G when Q = 0. Second, for all Q > 0 the number of events is higher than the number of events in a recombination model with a similar parameter (i.e., r = g).

The two extreme values, Q = 0 and Q = ∞, are of interest. If Q = 0 we find that W ~ Exp(G), and a gene conversion will leave just one breakpoint within the sequence. Where the break occurs will be uniformly distributed along the sequence. Thus, the coalescent with gene conversion resembles the coalescent with recombination, but the rate of gene conversions is twice the rate of recombinations (assuming the probability of a recombination between any two nucleotides is g).

Figure 5.
  • Download figure
  • Open in new tab
  • Download powerpoint
Figure 5.

The simulation scheme. Define Embedded Image = G(1 + (1 − exp(−Q))/Q) = G(2 − K(Q)), and let k denote the number of sequences ancestral to the sample of size n at a given time in the past. Further, let Ui, i ≥ 1, and Vj, j ≥ 1, be series of uniform variables on (0, 1). The exponential variable Wi is the time between event i − 1 and i in the samples history. Events are either coalescence or gene conversions, and if a gene conversion occurs we distinguish between two types according to the number of end points within the observed sequence. With probability p1 there is one end point only, otherwise two. The index i counts the number of events and j the number of gene conversions. That is, i − j is the number of coalescence events. The algorithm stops the first time there is one ancestor to the sample.

If Q = ∞ we find that W ~ Exp(G/2) and that both end points of a gene conversion will be within the sequence. As Q goes toward infinity, σ will be uniformly distributed along the sequence, and τ ≈ σ, that is, ζ ≈ 0. In the limit Q = ∞, a gene conversion will just leave a spot on the sequence, leaving the sequence the way it is. Thus, the coalescent with gene conversion will be indistinguishable from the pure coalescent process. Each time a conversion happens, it cannot be seen, and only coalescent events affect the history of a sample.

MODEL SIMULATIONS

Assume n sequences are sampled from a present-day population. The genealogy of the sample subject to gene conversion as described in the previous section can be simulated according to the scheme in Figure 5.

The algorithm is formulated as a birth and death process with constant death rate, G(2 − K((Q))/2, and terminates when there is only one ancestral sequence to the sample. Griffiths and Marjoram (1997) developed a similar birth and death process to handle to coalescent with recombination. Both algorithms return a genealogy in which the “true” genealogy of the sample is embedded; we keep track only of the number of sequences and the end points of gene conversions, not which parts of the sequences that are ancestral to the sample. Gene conversions can happen outside the ancestral material to a sequence, thereby creating an “empty” sequence, i.e., a sequence with no material ancestral to the sample (cf. Figure 2).

Consider a fixed point, χ, along the sequences. The tree, T(χ), that describes the sequences at χ can be found by starting at the present sequences and going back in time. When a gene conversion node is encountered, the branch that describes the segment containing χ is followed. Assume that the gene conversion leaves only one end point, s, in the sequence. If s < ω, then the branch describing the fate of the left part of the sequences is to be followed and vice versa and similarly, if the gene conversion leaves two end points. The tree T(χ) is distributed like the coalescent process since one point cannot be subject to gene conversion.

Mutations can be superimposed on the genealogy if all alleles are selectively neutral. Assuming mutations arrive according to a Poisson process, the number of mutations on the total genealogy is Poisson with parameter θB/2, where B is the total branch length of the entire genealogy, θ = 4Nu is the scaled mutation rate, and u is the probability of a mutation in a sequence per generation.

The coalescent with gene conversion can be combined with the coalescent with recombination. The genealogy of a sample of sequences is constructed similarly to the scheme above, waiting for three different events to occur: coalescence, recombinations, and gene conversions.

RESULTS

As noted previously, Q0 = Q/G = q/(4gN) is independent of L. The parameters (Q0, G) are more natural parameters than (Q, G); G represents the length of the sequence in units of gene conversion events per sequence per 4N generations and Q0 is a parameter dependent on the effective size of the population and parameters intrinsic to the biological system. In contrast (Q, G) are both parameters representing sequence length, but in different units.

In the following, results are given in terms of (Q0, G) to facilitate comparisons between samples of different lengths but with the same Q0. If we consider Q0 fixed and G a variable, results (and plots) can be converted between models with different g's but equal Q0 through a linear scaling of sequence length.

If sequence length is measured in units of expected number of events per sequence per 4N generations, we find that the scaled length ζ′ = ζG of a tract is fζ′(z)=Q0exp(−zQ0),z>0 (13) [according to (1)]; that is, exponential with parameter Q0 = q/(4Ng). The waiting time, W, until a sequence is created by a gene conversion event is in this formulation given by W~Exp(G2+12Q0(1−exp(−Q0G))) (14) [from (10)]. The probabilities of the different types of gene conversion events are given by (11) and (12) with Q replaced by Q0G. A similar remark applies to the densities of the end points σ and τ.

Figure 6.
  • Download figure
  • Open in new tab
  • Download powerpoint
Figure 6.

Probability that two positions share an ancestor, n = 2. Shown is equation 15 for different values of Q0. Sequence length is in expected number of gene conversions per sequence per 4N generations. Using estimates in Hilliker et al. (1994), we find Q0 = 0.05 in D. melanogaster (g = 3 × 10−8, q = 3 × 10−3, and 2N = 106). The curve for Q0 = 0.05 flattens out approximately at 0.027 for G > 40 in which case Q > 2, or twice the expected length of a gene conversion tract. The case Q0 = 0 corresponds to a model with recombination only at rate 2G.

In the following, we consider a sample of size n taken from the population at the present time. First, we focus on correlations between trees. Consider two positions, χ1 and χ2, in a sample of n sequences at distance G. We are interested in the trees, Tn(χ1) and Tn(χ2), that describe the sequences at χ1 and χ2. Only events of type C1 and Co in between positions χ1 and χ2 affect the relation between the two trees. Events of type C2 in between the positions cannot be traced. The rate, rG/2, by which events of type C1 and Co happen is, per sequence, rG=21Q0(1−exp(−Q0)G); cf. (10) and (12).

For small values of G we find rG ≈ 2G. This is of particular interest because the rate at which recombinants are produced by recombination is only G (assuming r = g), and this is half the rate at which recombinants are produced by gene conversion events. Thus, gene conversion events might contribute significantly to linkage disequilibrium over small distances (see also Wiuf 1999). For example, according to Andolfatto and Nordborg (1997), g = 3r in D. melanogaster and the rate of gene conversions is thus sixfold that of recombination over small distances.

Figure 7.
  • Download figure
  • Open in new tab
  • Download powerpoint
Figure 7.

Simulated values of the time until all positions have found a MRCA in a sample of size n = 2. The expected value 2(1 − 1/n) of the time to a MRCA in a single position is subtracted. The curves increase with increasing values of Q0. This is in agreement with the fact that the expected number of gene conversions also increases with increasing Q0 (see Figure 9). The D. melanogaster curve, Q0 = 0.05, follows Q0 = 0 very closely (not shown). A total of 10,000 simulations were performed for each value of Q0.

Applying results in Griffiths (1991), we find for n = 2, P(T2(χ1))=T2(χ2)=Cov(T2(χ1),T2(χ2))=18+rG18+13rG+rG2, (15) where Cov denotes the covariance between variables. We note that the expression in (15) obtains its minimum when G = ≈, in which case rG = 2/Q0. Thus the covariance is strictly positive for all values of G and converges toward a nonzero constant as G → ∞. We have P(T2(χ1)=T2(χ2))≥9Q0+19Q02+13Q0+2. (16)

The probability P(T2(χ1) = T2(χ2)) for n = 2 is plotted in Figure 6. The tree height, Tn = max{Tn(χ)|χ}, until all positions in the sample have found a most recent common ancestor is plotted in Figure 7 for n = 2. For Q0 = 0.05 and G = 5 we find Q = 0.25, which corresponds to ~100 nucleotides in D. melanogaster (see Figures 4 and 6). In this case the chance that two positions separated at distance 5 share a most recent common ancestor (MRCA) is ~0.10 if sample size is 2.

Figure 8.
  • Download figure
  • Open in new tab
  • Download powerpoint
Figure 8.

Number of gene conversion events. The number of gene conversion events affecting the history cannot be counted by chopping up the sequence into small sections, di, i = 1, 2, …, counting the number, ci, in each section and then summing over i, ∑ici. Knowing that one end point of a gene conversion is in section di does not provide information whether or not the other end point is within ancestral material. In the example, c1 = 2, c2 = 1, and c1 + c2 = 3, but only two events affect the history. Each breakpoint occurs twice in the drawing.

Also of interest are the numbers of different types of gene conversion events. We discuss two such numbers. The first is the number of gene conversion events affecting the history of the sample. Denote this number by Gn. The second is the number of gene conversion end points that make the topology change. Call this number Sn. This number relates to the possibilities of detecting gene conversion events in the sample history. Under an infinite-sites mutation model (Watterson 1975) with a mutation rate very high compared to that of gene conversions, G, all shifts in topologies can be detected from sequence polymorphisms (Hudson and Kaplan 1985). Also, these are the only events that can be detected (under the infinite-sites model).

It is not possible to find the expectation of Gn analytically. The reason for this is the following. The expectation of Gn is not linear in G because a gene conversion with both end points in ancestral material is only counted once. Knowing that a gene conversion has one end point within ancestral material in a small sequence interval does not reveal if the other end point also is within ancestral material. Thus, information about gene conversion end points in a small sequence interval does not provide information whether the points count in Gn (Figure 8). In the coalescent with recombination, the expected number of recombination events can easily be found because the number is linear in R (Hudson and Kaplan 1985). The expected value of Gn is simulated for different values of G and Q0 and plotted in Figure 9. Asymptotically, the expectation of Gn is linear.

The expected number of events causing a topology shift is given by E[Sn]≈2{∑i=1n−11i−2.14}G. (17) The proof is similar to that of Hudson and Kaplan (1985; see Wiuf et al. 1999 for the approximation). The difference is the factor 2 in front, which comes from the fact that rG ≈ 2G for small G and not rG = G as with recombination. Again, this can result in a considerable difference in the number of topology shifts seen in models with recombination vs. gene conversion.

Figure 9.
  • Download figure
  • Open in new tab
  • Download powerpoint
Figure 9.

Expected value of Gn. Sequence length is in expected number of gene conversions per sequence per 4N generations. As sequence length increases the expected number of gene conversions becomes effectively linear. The sample size is 10 and 10,000 simulations were performed for each value of Q0.

Consider position 0 in the sequences and let Bn be the total length of the coalescent tree in position 0. Wiuf and Hein (1999) consider the coalescent with recombination and discuss the sequence length, Λn, conditional on Bn, from position 0 until the first recombination point affecting the history of the sample. They find that P(Λn>x∣Bn=b)=exp(−bx∕2),x>0, (18) where sequence length is measured in units of expected number of recombination events per sequence per 4N generations. Sequences are here potentially infinitely long.

Here, we give the analogous result for the coalescent model with gene conversion. The situation differs in the sense that we wait for either a gene conversion initiating to the right of position 0 [analogous to (18)] or a gene conversion initiating to the left of position 0 but ending to the right of 0. The proof is given in the appendix. The distribution of the length, Λn, until the first point affecting the history of the sample, conditional on Bn, is P(Λn>x∣Bn=b)=exp{(e−Q0x−1)b∕(2Q0)−bx∕2},x>0. (19) Similarly, the distribution of Λn can be found (see appendix) and is given by P(Λn>x)=Πi=1n−1ii+x+(1−exp(−xQ0))∕Q0. (20) If n = 2 we find that the expectation of Λn is infinite; that is, in general the sequence length until the first gene conversion event is large. For n > 2 the expectation is finite.

DISCUSSION

We developed a coalescent model with gene conversion assuming a diploid population of constant size, N. It takes two parameters as input (along with the sample size). The first is the product G = 4NLg, where g is the probability that a gene conversion tract initiates in a fixed position, and the second parameter is Q = qL, where q is the probability that the next nucleotide is within the tract given that the former is within the tract. An easy simulation scheme was developed. This scheme could be modified, in accordance with Griffiths and Tavaré (1994), to allow for fluctuation in the population size with time.

We derived a number of results related to the correlation of trees in different positions and to the probability that two given positions share a common ancestor. These tend to be highly different from similar quantities obtained in the coalescent with recombination. The covariance between trees relating two distinct positions does not tend to zero with increasing distance between the positions, but is bounded by a positive constant. Thus, it is expected that the trees in the two distinct positions are more likely to share the same topology only in a model of gene conversion than in a model of recombination with similar rate. Figure 10 confirms this. As Q0 = Q/G increases, the probability, π, of common topology in two distinct positions decreases and for fixed value of Q0, π converges (as function of distance) to a level distinct from 0 and 1. Low values of Q0 have a similar effect to that of recombination.

We should note in passing that given a gene conversion end point is in position χ, the probability, p, of detecting that the gene conversion has occurred from incompatibilities in the sequence alignment is very low and slowly increases with sample size. Hudson and Kaplan (1985) and C. Wiuf, T. Christensen and J. Hein (unpublished results) found p≤1−2.14log(n) for large sample sizes, n; e.g., if n = 1000, p ≤ 0.69. The proof is similar to that of Hudson and Kaplan (1985) and relates to the expected number of topology shifts [see Hudson and Kaplan 1985 and formula (17)].

It is of high interest to be able to distinguish mechanisms operating on the molecular level in patterns of variability obtained in a sample of sequences. An indicator of gene conversion could be the following. Define an informative site to be a site where at least two different nucleotides are present in at least two sequences each. Further, say a pair of sites is compatible if there exists a topology explaining the sites with the minimum possible number of substitution events. Now, assume three informative sites are given. If the first and last sites are compatible, and the two remaining pairs involving the middle site are both incompatible, we would take this as evidence of a gene conversion event. In a pure recombination model, given the middle site is incompatible with both other sites, we would expect that the first and the last sites are also incompatible. We have simulated how often under ideal circumstances this is likely to happen. It is assumed that the topology can be accurately constructed for each column in the sequence alignment. This is, for example, the case in an infinite-sites model with very high mutation rate. Table 1 shows simulation results for various values of Q0 and G.

Figure 10.
  • Download figure
  • Open in new tab
  • Download powerpoint
Figure 10.

Simulated probabilities that two positions do not share the same topology for various values of Q0. Sequence length is in expected number of gene conversions per sequence per 4N generations. Consider again D. melanogaster. The curve Q0 = 0.95 follows Q0 = 0 very closely and the chance that two points at distance 1 have different topologies is >70%. This corresponds to a distance of 5% the length of the expected tract length or 0.05 × 350 ≈ 20 nucleotides. Sample size is 10 and 20,000 simulations were performed for each value of Q0.

Denote the positions 1, 2, and 3. Positions 1 and 3 are fixed whereas 3 is uniformly distributed between 1 and 3. The findings in the table can be explained as follows. Assume Q0 = 0. If 1 and 2 do not share topology and similarly for 2 and 3, this requires at least two recombination events. As the distance between 1 and 3 increases, the chance of more recombination events before the MRCAs to positions 1 and 3 increases; thus the chance that 1 and 3 share the same topology decreases.

If Q0 > 0 the pattern changes. When the distance is small compared to 1/Q0 (expected tract length), most events will be of type C1 and Co. A similar pattern to that for Q0 = 0 is thus expected. As the distance increases, the rate by which 1 and 3 are separated becomes almost constant (rG = 2{1 − exp(−Q0G)}/Q0) and if (1, 2) and (2, 3) do not share topologies it is most likely caused by a gene conversion event with both end points within 1 and 3. Thus we will accept more cases where 1 and 3 share topology.

View this table:
  • View inline
  • View popup
TABLE 1

Distance between 1 and 3

Now fix the distance between the positions. As Q0 increases, the number of C2 events increases relative to C1 and Co events (Equations 11 and 12) and again if (1, 2) and (2, 3) do not share topologies it is most likely caused by a gene conversion event that removes a region around position 2. Again, we expect more cases where 1 and 3 share topology.

The pattern observed in Table 1 is obtained under ideal circumstances. In practice, the topology relating a set of sequences in a given position can be reconstructed from the nucleotide pattern in that position in rare cases only. Under an infinite-sites model with low mutation rate, the chance that 1 and 3 can be explained by the same topology is higher than the value given in Table 1. Similarly, recurrent substitutions have the same effect, but in both cases the overall pattern in Table 1 is retained. These results suggest that in a pure recombination model we will see many more pairs of incompatible sites than in a model with pure gene conversion (assuming the rate of recombination is similar to the rate of gene conversion). On the other hand we expect many more topology shifts under a gene conversion model, so that we must return, moving along the sequences, to previously visited topologies more often than under a recombination model. Also, the results in Table 1 imply that recombination can be distinguished from gene conversion by the spatial arrangement of incompatible pairs, simply by estimating the probability in Table 1 from data. If recombination is not present we will expect this probability to be significantly different from 0.

It would be advantageous to build a model of gene conversions and recombination more directly on a molecular model of these phenomena. This will be pursued in greater depth elsewhere, but the basic issues involved are shortly sketched here.

Central to most models of gene conversion/recombination is the Holliday junction (Holliday 1964). Although the original Holliday model has since been superceded by more complicated models, such as the single strand invasion model (Meselson and Radding 1975) and the double-chain-break repair model (Resnick 1976), the Holliday model would be still be the natural starting point for modeling gene conversion from a population genetical viewpoint. Issues in the Holliday model would also be central in the more complicated models. In the Holliday model the following three problems would be of central importance.

  1. The movement of the Holliday junction and its consequences. If the movement is controlled by a random walk that is stopped at a fixed time, the length of the gene conversion would be close to a symmetric binominal distribution moved so its mean is zero. Since the movement of the Holliday junction along nonidentical sequences will create heteroduplexes that are energetically unfavorable to homoduplexes, it could be expected that there would be a sequence-dependent centralizing drift in this random walk. Since the energetic costs of mismatches are approximately known, this distribution could be calculated. The length of gene conversions would then be sequence dependent. This would be a complication when incorporating it into the coalescent that takes a view from the present toward the past.

  2. How is the heteroduplex corrected? Two extremes can be imagined. The heteroduplex is resolved when the DNA is duplicated or one single strand is made the master copy and the other single strand is corrected so it matches to this master strand. From the point of view of redistributing ancestral material, this is simple as the region within the gene conversion will choose one ancestor. The other possibility, that mismatches are corrected independently one by one, has more drastic consequences for the distribution of ancestral material as each correction will be independent choices of parents. Within the region of the gene conversion, different positions can have different ancestors. This would also have drastic consequences for the phylogenetic relationships among the sequences as one moves along the sequences, as even arbitrarily close positions could be related by different phylogenies.

  3. The Holliday structure can be resolved in two ways, one leading to a gene conversion, but no recombination, and the other leading to a gene conversion and a recombination. What are the probabilities of gene conversion and recombination? Few, if any, biological systems have gene conversion but no recombination. In modeling the consequences, it would be natural to make this probability a single parameter in the model.

Later models, especially Meselson and Radding (1975) and the double exchange model (Resnick 1976), are more complicated and will slightly change the picture. From a population genetical viewpoint, all that matters in these different models is how the ancestral material is redistributed when viewed from the present toward the past. The Meselson-Radding model would have a very short region with a one-directional gene conversion added relative to the pure Holliday model, due to the strand invasion. Since this region is expected to be short relative to the length of the migration of the Holliday structure, the Meselson-Radding model is expected to be very similar to the pure Holliday model. The double-chain-break repair model would have two strand invasions and also two Holliday junctions that could undergo a random walk relative to each other. The double-chain-break repair model probably has greater consequences for the distribution of ancestral material than the Meselson-Radding model, but that remains to be explored.

Since it is not well known how different recombination mechanisms are phylogenetically distributed, it would be interesting to know if it was ever possible to distinguish the underlying mechanism from pure population data. It would here be natural to formulate what was expected in terms of different sets of incompatibilities. For instance, if three informative sites were given, and the first and last sites were compatible, but the two remaining pairs involving the middle site were incompatible, this would be taken as an indicator of a gene conversion. If more informative sites were found close to the middle position, these would be expected to have different kinds of compatibilities dependent on which kind of repair mechanism was operating on the heteroduplex.

Whether or not different recombination/gene conversion models are distinguishable by population sequence data depends on the frequency of configurations of segregating nucleotides in the population that would result in different products when recombined under different models. A functional geneticist would design the necessary configurations of nucleotides and set up the adequate crosses. In population genetics this would have to occur by chance. It is unlikely that population sequence data can compete with designed experiments in this respect, but this does not imply that different mechanisms of recombination will not have consequences for the expected patterns of variation in population sequence data; such data are becoming available from many organisms, where molecular genetical experiments have not yet been performed.

Acknowledgments

J. P. Hjorth is thanked for helpful discussions on what gene conversion is, M. Nordborg for commenting on an early version of the manuscript, and M. Schierup for various suggestions that improved the manuscript considerably. We thank A. Mikkelsen for implementing the model and performing the simulations used in the article. C.W. was supported by grant Biotechnology and Biological Sciences Research Council 43/MMI09788 and by the Carlsberg Foundation, Denmark. Part of this work was carried out while the authors visited the Isaac Newton Institute, University of Cambridge.

APPENDIX: PROOF OF FORMULAS (19) AND (20)

Wiuf and Hein (1999) consider the coalescent with recombination and discuss the sequence length, Λn, conditional on the total branch length, Bn, from position 0 until the first recombination point affecting the history of the sample. They find that P(Λn>x∣Bn=b)=exp(−bx),x>0, (A1) where the sequence length is measured in units of expected number of recombinations per sequence per 4N generations. Sequences are here potentially infinitely long.

Remember [see (10), (11), and (12)] WCo~Exp(G(1−K(Q0G))∕2) and WC1∪C2~Exp(Q0G∕2). Whenever an event of type C = C1 ∪ C2 happens the initiation point is uniform along the sequence (per definition). Therefore, the sequence length, Λ1,2 from position 0 until the first breakpoint affecting the history of the sample is distributed like (A1) P(Λ1,2>x∣Bn=b)=exp(−bx∕2),x>0, (A2) for sequences potentially infinitely long.

The rate, G(1 − K(Q0G))/2, of type Co events converges to 1/(2Q0) for G → ∞. Conditional on Bn = b, the number, Go, of Co events in the sequences ancestral to position 0 is thus Poisson distributed with parameter b/(2Q0), P(Go=k∣Bn=b)=1k!(b2Qo)exp(−b∕2Q0),k=0,1,…. (A3) The density, fτ(x|Co, G), of the termination point τG = Gτ (in units of sequence length) converges to an exponential distribution for G → ∞ fτ(x∣Co,G)=Q0exp(−Q0x)(1−exp(−Q0G))−1→Q0exp(−Q0x),x>0 (A4) [from (8)]. Combining (A3) and (A4) we find the distribution of the length, Λo, until the first point of type Co affecting the history of the sample, P(Λo>x∣Go>0,Bn=b)=∑k=1∞exp(−kQ0x)1k!(b2Q0)k⋅exp(−b∕(2Q0))1−exp(−b∕(2Q0))=exp(−b∕(2Q0))1−exp(−b∕(2Q0))⋅{exp(e−Q0xb∕(2Q0))−1},x>0. (A5) Outside {G0 > 0}, Λ0 = ∞.

Finally, combining (A2), (A3), and (A5) we deduce that Λn = min (Λo, Λ1,2) has distribution P(Λn>x∣B=bn)=exp{(e−Q0x−1)b∕(2Q0)−bx∕2},x>0, (A6) as required. The distribution of Λn can be found in the following way. The event {Λn > x} is equivalent to the event that no gene conversion events occur in a sample of sequences of length x before a MRCA is found. This has probability P(Λn>x)=Πi=1n−1ii+x+(1−exp(−xQ0))∕Q0. (A7)

Footnotes

  • Communicating editor: A. G. Clark

  • Received August 4, 1999.
  • Accepted January 18, 2000.
  • Copyright © 2000 by the Genetics Society of America

LITERATURE CITED

  1. ↵
    1. Andolfatto P.,
    2. Nordborg M.
    , 1997 The effect of gene conversion on intralocus associations. Genetics 148: 1397–1399.
    OpenUrlFREE Full Text
  2. ↵
    1. Carpenter A. T. C.
    , 1984 Meiotic roles of crossing-over and of gene conversion. Cold Spring Harbor Symp. Quant. Biol. 49: 23–29.
    OpenUrlAbstract/FREE Full Text
  3. ↵
    1. Basawa I. V.,
    2. Taylor R. L.
    1. Griffiths R. C.
    , 1991 The two-locus ancestral graph, pp. 100–117 in Selected Proceedings of the Symposium of Applied Probability, Sheffield 1989, IMS Lecture Notes-Monograph Series, 18, edited by Basawa I. V., Taylor R. L.. Hayward, CA.
  4. ↵
    1. Donnelly P.,
    2. Tavaré S.
    1. Griffiths R. C.,
    2. Marjoram P.
    , 1997 An ancestral recombination graph, pp. 257–270 in Progress in Population Genetics and Human Evolution, IMA Volumes in Mathematics and Its Applications, 87, edited Donnelly P., Tavaré S.. Springer-Verlag, Berlin.
  5. ↵
    1. Griffiths R. C.,
    2. Tavaré S.
    , 1994 Sampling theory for neutral alleles in a varying environment. Philos. Trans. R. Soc. Lond. Ser. B 344: 403–410.
    OpenUrlPubMedWeb of Science
  6. ↵
    1. Hilliker A. J.,
    2. Harauz G.,
    3. Reuame A. G.,
    4. Gray M.,
    5. Clark S. H.,
    6. et al.
    , 1994 Meiotic gene conversion tract length distribution within the rosy locus of Drosophila melanogaster. Genetics 137: 1019–1026.
    OpenUrlAbstract/FREE Full Text
  7. ↵
    1. Holliday R.
    , 1964 A mechanism for gene conversion in fungi. Genet. Res. 5: 282–287.
    OpenUrlCrossRefPubMedWeb of Science
  8. ↵
    1. Hudson R. R.
    , 1983 Properties of the neutral allele model with intragenic recombination. Theor. Popul. Biol. 23: 183–201.
    OpenUrlCrossRefPubMedWeb of Science
  9. ↵
    1. Hudson R. R.,
    2. Kaplan N.
    , 1985 Statistical properties of the number of recombination events in the history of DNA sequences. Genetics 111: 147–164.
    OpenUrlAbstract/FREE Full Text
  10. ↵
    1. Kingman J. F. C.
    , 1982 The coalescent. Stoch. Process. Appl. 13: 235–248.
    OpenUrlCrossRef
  11. ↵
    1. Meselson M. S.,
    2. Radding C. M.
    , 1975 A general model for genetic recombination. Proc. Natl. Acad. Sci. USA 41: 215–220.
    OpenUrl
  12. ↵
    1. Resnick M. A.
    , 1976 The repair of double-strand breaks in DNAs: a model involving recombination. J. Theor. Biol. 59: 97–106.
    OpenUrlCrossRefPubMedWeb of Science
  13. ↵
    1. Stahl F. W.
    , 1994 The Holliday junction on its thirtieth anniversary. Genetics 138: 241–246.
    OpenUrlFREE Full Text
  14. ↵
    1. Tavaré S.
    , 1984 Line-of-descent and genealogical processes, and their applications in population genetics models. Theor. Popul. Biol. 26: 119–164.
    OpenUrlCrossRefPubMedWeb of Science
  15. ↵
    1. Watterson G. A.
    , 1975 On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7: 256–276.
    OpenUrlCrossRefPubMedWeb of Science
  16. ↵
    1. Wiuf C.
    , 2000 A coalescent approach to gene conversion. Theor. Popul. Biol. (in press).
  17. ↵
    1. Wiuf C.,
    2. Hein J.
    , 1997 On the number of ancestors to a DNA sequence. Genetics 147: 1459–1468.
    OpenUrlAbstract/FREE Full Text
  18. ↵
    1. Wiuf C.,
    2. Hein J.
    , 1999 Recombination as a point process along sequences. Theor. Popul. Biol. 55: 248–259.
    OpenUrlCrossRefPubMedWeb of Science
View Abstract
Previous ArticleNext Article
Back to top

PUBLICATION INFORMATION

Volume 155 Issue 1, May 2000

Genetics: 155 (1)

ARTICLE CLASSIFICATION

INVESTIGATIONS
View this article with LENS
Email

Thank you for sharing this Genetics article.

NOTE: We request your email address only to inform the recipient that it was you who recommended this article, and that it is not junk mail. We do not retain these email addresses.

Enter multiple addresses on separate lines or separate them with commas.
The Coalescent With Gene Conversion
(Your Name) has forwarded a page to you from Genetics
(Your Name) thought you would be interested in this article in Genetics.
Print
Alerts
Enter your email below to set up alert notifications for new article, or to manage your existing alerts.
SIGN UP OR SIGN IN WITH YOUR EMAIL
View PDF
Share

The Coalescent With Gene Conversion

Carsten Wiuf and Jotun Hein
Genetics May 1, 2000 vol. 155 no. 1 451-462
Carsten Wiuf
*Department of Statistics, University of Oxford, Oxford OX1 3TG, England
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Jotun Hein
†Institute of Biological Sciences, University of Aarhus, 8000 Aarhus C, Denmark
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
del.icio.us logo Digg logo Reddit logo Twitter logo CiteULike logo Facebook logo Google logo Mendeley logo
Citation

The Coalescent With Gene Conversion

Carsten Wiuf and Jotun Hein
Genetics May 1, 2000 vol. 155 no. 1 451-462
Carsten Wiuf
*Department of Statistics, University of Oxford, Oxford OX1 3TG, England
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Jotun Hein
†Institute of Biological Sciences, University of Aarhus, 8000 Aarhus C, Denmark
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site

Citation Manager Formats

  • BibTeX
  • Bookends
  • EasyBib
  • EndNote (tagged)
  • EndNote 8 (xml)
  • Medlars
  • Mendeley
  • Papers
  • RefWorks Tagged
  • Ref Manager
  • RIS
  • Zotero

Related Articles

Cited By

More in this TOC Section

  • Evidence for Weak Selective Constraint on Human Gene Expression
  • Robust CRISPR/Cas9-Mediated Tissue-Specific Mutagenesis Reveals Gene Redundancy and Perdurance in Drosophila
  • The Genetic Basis of Mutation Rate Variation in Yeast
Show more Investigations
  • Top
  • Article
    • Abstract
    • THE MODEL
    • EFFECTS OF GENE CONVERSION
    • MODEL SIMULATIONS
    • RESULTS
    • DISCUSSION
    • Acknowledgments
    • APPENDIX: PROOF OF FORMULAS (19) AND (20)
    • Footnotes
    • LITERATURE CITED
  • Figures & Data
  • Info & Metrics

GSA

The Genetics Society of America (GSA), founded in 1931, is the professional membership organization for scientific researchers and educators in the field of genetics. Our members work to advance knowledge in the basic mechanisms of inheritance, from the molecular to the population level.

Online ISSN: 1943-2631

  • For Authors
  • For Reviewers
  • For Subscribers
  • Submit a Manuscript
  • Editorial Board
  • Press Releases

SPPA Logo

GET CONNECTED

RSS  Subscribe with RSS.

email  Subscribe via email. Sign up to receive alert notifications of new articles.

  • Facebook
  • Twitter
  • YouTube
  • LinkedIn
  • Google Plus

Copyright © 2019 by the Genetics Society of America

  • About GENETICS
  • Terms of use
  • Advertising
  • Permissions
  • Contact us
  • International access