Previous studies have shown that genetic exchange in bacteria is too rare to prevent neutral sequence divergence between ecological populations. That is, despite genetic exchange, each population should diverge into its own DNA sequence-similarity cluster. In those studies, each selective sweep was limited to acting within a single ecological population. Here we postulate the existence of globally adaptive mutations, which may confer a selective advantage to all ecological populations constituting a metapopulation. Such adaptations cause global selective sweeps, which purge the divergence both within and between populations. We found that the effect of recurrent global selective sweeps on neutral sequence divergence is highly dependent on the mechanism of genetic exchange. Global selective sweeps can prevent populations from reaching high levels of neutral sequence divergence, but they cannot cause two populations to become identical in neutral sequence characters. The model supports the earlier conclusion that each ecological population of bacteria should form its own distinct DNA sequence-similarity cluster.
It is becoming increasingly clear that a full accounting of ecological diversity in the bacterial world requires a molecular approach. Molecular techniques have demonstrated that only a small fraction of bacterial species are culturable (Amman et al. 1995; Huber et al. 1995; Ohkuma and Kudo 1996), so our best hope of identifying the full scope of bacterial biodiversity is to characterize the sequence diversity of genes that can be amplified directly from natural habitats (Knight et al. 1992; Pace 1997). Such surveys typically yield clusters of organisms with similar sequences, and each sequence-similarity cluster is typically interpreted as a distinct ecological population (Britschgi and Giovannoni 1991; Murray and Stackebrandt 1995; Boivin-Jahns et al. 1996).
This interpretation is justified because in studies of more familiar and culturable taxa, bacterial systematists have found an empirical correspondence between ecologically distinct populations and sequence-similarity clusters. That is, groups of bacteria known to be ecologically different generally fall into separate sequence-similarity clusters (Vandamme et al. 1996; Palys et al. 1997); conversely, ecologically uncharacterized strains that fall into separate sequence clusters have subsequently been found to have different ecological properties (Balmelli and Piffaretti 1996; Normand et al. 1996). Sequence surveys appear to be an efficient method for discovering the ecological diversity of culturable as well as non-culturable bacteria (Vandamme et al. 1996; Palys et al. 1997).
While ecologically distinct groups of bacteria are frequently distinguishable as separate sequence-similarity clusters, it is important to find a strong theoretical basis for this observation. If there are times when multiple ecological populations of bacteria fall together into the same sequence cluster, molecular approaches may severely underestimate bacterial biodiversity (Cohan 1994a,b, 1995, 1996, 1999).
Recent theory has shown why ecological populations should correspond to sequence clusters (Cohan 1994a,b; Palys et al. 1997). In this theory, ecological populations are defined so that (1) each adaptive mutation confers a benefit only in the genetic background of its original population, and (2) mutant cells bearing an adaptive mutation can outcompete only members of their own population. Natural selection favoring adaptive mutants within a particular population purges that population of genetic diversity at all loci, owing to the low rate of recombination in bacteria. [Each such purging event is called a “selective sweep” (Guttman and Dykhuizen 1994b); we refer to recurrent selective sweeps as “periodic selection” (Atwood et al. 1951; Koch 1974; Levin 1981).] Because an adaptive mutant does not outcompete cells from other populations, periodic selection purges only the diversity within populations and not the divergence between populations. Each round of periodic selection thereby enhances the distinctness of ecological populations at all loci and fosters the divergence of different ecological populations into separate sequence-similarity clusters.
The tendency for bacterial populations to form separate sequence clusters is opposed by recombination between populations (Cohan 1994a). Depending on the rates of interpopulation recombination and the intensity of periodic selection, the model has shown three possible classes of outcomes of neutral sequence divergence between populations: (1) under extremely low rates of recombination, populations will diverge without bound, so that every nucleotide site that can be substituted harmlessly will eventually become substituted; (2) under higher rates of recombination, populations will reach an equilibrium level of divergence, so that populations fall into distinct sequence clusters, but divergence between them never becomes saturated; and (3) under yet higher rates of recombination, ecologically distinct populations will not be distinguishable by neutral sequence data, as the levels of divergence within and between populations will be nearly equal. Given the low rates of recombination estimated thus far for bacteria (Selander and Musser 1990; Maynard-Smith et al. 1993; Whittam and Ake 1993; Guttman and Dykhuizen 1994a; Roberts and Cohan 1995), the model predicts that each population should be distinct as a separate sequence-similarity cluster, as described in cases 1 and 2 above (Cohan 1994a, 1995; Palys et al. 1997).
Nevertheless, it is not clear that the existing model adequately predicts the degree of sequence divergence between ecological populations. Here we present an alternative and more general model for periodic selection, in which some mutations may be adaptive outside of the context of their original populations. In this model, the domain of competitive superiority of an adaptive mutant (i.e., the cell) is still its own ecological population, but the adaptive mutation (i.e., the allele) can be recombined into other populations, where it can confer higher fitness and cause a local selective sweep within each recipient population (Figure 1). This process may homogenize the populations for any segment that is cotransferred between populations along with the adaptive mutation. We have hypothesized that globally adaptive mutations could homogenize populations for neutral sequence diversity at all gene loci, provided that the size of fragments recombined is large enough and that universally adaptive mutations recur throughout the genome.
In this article, we present a coalescence model to explore the conditions under which universally adaptive mutations can homogenize neutral sequence diversity across ecological populations. We tested whether different ecological populations might fail to diverge into separate sequence-similarity clusters under the rates of recombination observed in bacteria. We also tested whether universally adaptive mutations may prevent populations with low recombination rates from diverging without bound.
Ecological populations and adaptive mutations: A metapopulation consists of n closely related ecological populations, each containing N cells (Table 1). Each population is adapted to a different ecological niche. Recombination occurs rarely within and between these populations, and the metapopulation is closed to recombination with other such metapopulations.
Following Cohan (1994a), we define an ecological population as the domain of competitive superiority of an adaptive mutant. Thus, an adaptive mutant would outcompete to extinction all other strains from the same population (because they are adapted to the same niche) but would not drive to extinction strains from other populations. While an adaptive mutant (i.e., the cell) has a competitive advantage only within its own ecological population, an adaptive mutation (i.e., the mutant allele) may be either locally or globally adaptive. A locally adaptive mutation confers a benefit only in the genetic background of its original population, whereas a globally adaptive mutation can confer a benefit in any genetic background within the metapopulation. A global selective sweep occurs when a globally adaptive mutation recombines from its original population into other populations: any cell receiving the adaptive mutation from the original population is then able to outcompete other members of its own population (Figure 1). Whereas a globally adaptive mutation confers fitness globally to all cells in the metapopulation, natural selection acts only locally to favor the adaptive genotype within each population.
We assume that selective sweeps are rare events and that the duration of the sweep is short relative to the time between sweeps.
Rate of fixation of adaptive mutations: Following Cohan (1994a), adaptive mutation is modeled as a one-step process that occurs randomly over time at a rate μg (for global adaptations) or μl (for local adaptations) per capita per generation. Each adaptive mutation confers a selective advantage z. Taking into account that only the fraction 2z of adaptive mutations is expected to become fixed (assuming that the population size N ⪢ 1/z; Wright 1931), locally adaptive mutations are fixed within a population by directional selection at a rate σl = 2zμlN. It is assumed that once a globally adaptive mutation becomes fixed in its original population (with probability 2z), recurrent recombination and subsequent selection will cause the mutation to eventually become fixed in all populations. Therefore, globally adaptive mutations are fixed at a rate σg = 2zμgnN.
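The two fixation rates can be written out directly. The following is a minimal sketch in Python, not code from the original study; the function name is an assumption, and the values of `mu_l` and `mu_g` are hypothetical placeholders, because the text does not report them. The values of z, N, and n are those stated for the numerical calculations.

```python
def fixation_rates(z, mu_l, mu_g, N, n):
    """Return (sigma_l, sigma_g), the fixation rates per generation.

    sigma_l = 2*z*mu_l*N    local sweeps, within one population
    sigma_g = 2*z*mu_g*n*N  global sweeps, across the metapopulation
    """
    p_fix = 2.0 * z  # fraction of adaptive mutations expected to fix (N >> 1/z)
    sigma_l = p_fix * mu_l * N
    sigma_g = p_fix * mu_g * n * N
    return sigma_l, sigma_g


# mu_l and mu_g are hypothetical values chosen only for illustration.
sigma_l, sigma_g = fixation_rates(z=1e-2, mu_l=1e-20, mu_g=1e-20, N=5e14, n=2)
```

With equal per capita rates, the global rate exceeds the local rate by the factor n, because a globally adaptive mutation may arise in any of the n populations.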
Recombination within and between populations: Recombination in bacteria is unidirectional and the segment recombined is usually a small fraction of the genome (Smith 1988). We therefore model recombination as a gene conversion process in which a segment of the recipient DNA is replaced with the homolog of the donor. The model is concerned with recombination at two loci: a “gene of interest,” whose sequence divergence we wish to predict, and a selected gene, whose adaptive mutation is favored by selection. Each gene is assumed to be short enough so that it is not split by recombination. A single recombination event may involve one or both of the genes, depending on the size of the recombining fragment (h) and the distance between the loci (y).
Recombination follows a modified island model, where cs is the rate (per gene segment per genome per generation) at which individuals integrate (as recipients) DNA at a gene segment of interest from other individuals of the same ecological population; cd is the rate at which individuals integrate DNA from any other ecological population; c is the total rate of recombination at which an individual integrates DNA from any other individual in the metapopulation; thus c = cs + cd. The value cδ is the rate at which individuals integrate DNA from a particular ecological population (other than their own). In a metapopulation consisting of n ecological populations, cδ = cd/(n - 1).
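The relationships among the recombination rates can be sketched as follows. This is an illustrative Python helper under the definitions above; the function name and the example rate of 1e-8 are assumptions, not values taken from the text.

```python
def recombination_rates(c_s, c_delta, n):
    """Return (c_d, c) for the modified island model.

    c_s     : rate of integrating DNA from one's own population
    c_delta : rate of integrating DNA from one particular other population
    n       : number of ecological populations in the metapopulation
    """
    c_d = c_delta * (n - 1)  # rearranged from c_delta = c_d / (n - 1)
    c = c_s + c_d            # total recombination rate
    return c_d, c


# Hypothetical example: no sexual isolation (c_s = c_delta) and n = 2.
c_d, c = recombination_rates(c_s=1e-8, c_delta=1e-8, n=2)
```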
Probability that a selective sweep leads to coalescence: Our model determines the expected time (going backward from the present) to coalescence into a common ancestor for two homologous gene segments occurring today in two different individuals. These individuals may be cells from the same or different ecological populations of the metapopulation.
We define p as the probability that a selective sweep leads to coalescence at a gene segment of interest. This is the probability that two cells chosen from the metapopulation immediately following a selective sweep are identical by descent for the gene segment of interest. Whether a selective sweep results in coalescence at a gene of interest depends on the relative magnitudes of the selective advantage of the adaptive mutation, the rate at which recombination separates the gene of interest from the selected gene, and the population size. If the rate of recombination is high and the selective advantage low, the event is unlikely to lead to coalescence.
We consider several instances of the variable p, corresponding to the probabilities of coalescence within and between populations, for globally and locally adaptive mutations: pl is defined as the probability that a local selective sweep within a population leads to coalescence of segments from that population; pgs is the probability that a global selective sweep leads to coalescence of segments from the same population; and pgd is the probability that a global selective sweep leads to coalescence of segments from different populations of the metapopulation.
In Appendix A, we derive a method (adapted from Kaplan et al. 1989) for calculating pl, pgs, and pgd for the special case of two ecological populations, i.e., n = 2. The variables pl, pgs, and pgd are functions of cs, cd, N, and q, where q is the probability that a recombination event results in corecombination of the adaptive allele with the segment of interest. This probability is a function of the length h of the DNA taken up by a recipient cell during a recombination event and of the distance y between the adaptive mutation and the segment of interest (both h and y are measured as fractions of the genome): (1)
Because the coalescence of homologous segments from different populations requires that the transfer of the adaptive mutation from population 1 to population 2 includes the segment of interest (Figure 1), pgd will be highly dependent on the probability of cotransfer.
We assume that the size (h) of the recombining DNA fragment is constant, while adaptive mutations occur randomly throughout the genome. Because we are interested in modeling the consequences of many selective sweeps, we need to calculate the mean probability (P) that a selective sweep leads to coalescence, averaged over all possible distances (y) between the neutral marker and the adaptive mutations (i.e., between 0 and 1/2 because the bacterial chromosome is circular): (2)
Both the probabilities p and the above integral were evaluated numerically.
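The averaging over distances can be illustrated numerically. The sketch below is not the procedure used in the study. It assumes a simple linear form for the cotransfer probability, q = 1 - y/h for y < h and 0 otherwise, which is not confirmed by the text but is consistent with the later result that the averaged between-population probability approaches h. The function names are hypothetical.

```python
def cotransfer_probability(y, h):
    """Assumed form of q: chance that a fragment of length h carrying the
    adaptive mutation also covers a locus at distance y (genome fractions)."""
    return max(0.0, 1.0 - y / h)


def mean_over_distance(p, steps=100_000):
    """Average p(y) over y uniform on [0, 1/2] using the midpoint rule."""
    width = 0.5 / steps
    total = sum(p((i + 0.5) * width) for i in range(steps))
    return total * width / 0.5


h = 0.05  # recombining fragment of 5% of the genome
mean_q = mean_over_distance(lambda y: cotransfer_probability(y, h))
```

Under this assumed form, the mean of q over all distances equals h, so the cotransfer probability averaged over many sweeps is simply the fragment size.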
The coalescence model: Our coalescence model calculates the expected time that two homologous gene segments (occurring in different organisms) have diverged since their last common ancestor. These gene segments are postulated to be short enough so that they are not split by recombination. The following are the expected times to coalescence for two strains from the same and different ecological populations, E(ts) and E(td) (derived in Appendix B): (3) (4) These equations were solved using Mathematica (Wolfram 1991). The solutions are too long to present here. Numerical representations of the probability density functions, Pts(t) and Ptd(t), of the respective times to coalescence were also calculated using a method outlined in Appendix B.
Calculation of the expected nucleotide divergence: The expected nucleotide sequence divergence is predicted using the probability density functions for ts and td following Cohan (1994a). Nucleotide substitutions are postulated to consist only of synonymous mutations, and every third base substitution is taken to be synonymous, with no synonymous substitutions allowed at the first or second bases of codons. The number of neutral substitutions per third base site (Δ) is then obtained by multiplying the time to coalescence by twice the per third base site rate of mutation (μ0), Δ = 2μ0t (5), where t = ts or td is the time until coalescence of segments in the same population or in different populations, respectively.
The nucleotide sequence divergence over all sites, π, may be calculated by correcting for multiple substitutions per site (Jukes and Cantor 1969) and correcting for substitutions occurring only at third base sites: (6) We used the probability density functions Pts(t) and Ptd(t) to calculate the expected divergence (7) The integration was carried out numerically. The expected nucleotide divergence between segments from the same population, E(πs), and different populations, E(πd), was calculated using Pts(t) and Ptd(t), respectively. The neutral nucleotide divergence after infinite time reaches a level of ¼, which we refer to as “unbounded divergence.”
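The conversion from coalescence time to expected divergence can be sketched as follows. This is an illustration, not the calculation performed in the study. It assumes the saturating form π = ¼[1 - exp(-4Δ/3)], a Jukes-Cantor correction scaled to third base sites, chosen because it reproduces the stated limit of ¼. The exponential density and its mean `T` are hypothetical stand-ins for Pts(t) and Ptd(t).

```python
import math

MU0 = 3e-10  # neutral mutation rate per third base site (value from the text)


def substitutions_per_site(t, mu0=MU0):
    """Neutral substitutions per third base site after t generations."""
    return 2.0 * mu0 * t


def divergence(t, mu0=MU0):
    """Assumed saturating divergence over all sites; tends to 1/4."""
    delta = substitutions_per_site(t, mu0)
    return 0.25 * (1.0 - math.exp(-4.0 * delta / 3.0))


def expected_divergence(density, t_max, steps=200_000):
    """Midpoint-rule integral of divergence(t) * density(t) on [0, t_max]."""
    dt = t_max / steps
    mids = ((i + 0.5) * dt for i in range(steps))
    return sum(divergence(t) * density(t) for t in mids) * dt


# Hypothetical density: exponential coalescence times with mean T generations.
T = 1e8
e_pi = expected_divergence(lambda t: math.exp(-t / T) / T, t_max=40 * T)
```

For an exponential density the integral has the closed form ¼[1 - 1/(1 + 8μ0T/3)], which provides a check on the numerical integration.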
The following parameter values were used in all numerical calculations: the neutral mutation rate per third base site, μ0 = 3 × 10^-10; the selective advantage, z = 10^-2; the population size, N = 5 × 10^14; and the number of populations in the metapopulation, n = 2. Recombination rates within and between populations were set as equal (cs = cδ; i.e., no sexual isolation between populations) to maximize the homogenizing effect of recombination.
The diversity-purging effect of an adaptive mutation: The probability that a particular global selective sweep causes coalescence, within or between populations, is shown in Figure 2. The probability of coalescence within a population, pgs, is always near 1 because recombination is so rare in bacteria (see also Cohan 1994b). The probability of coalescence of segments from different populations, pgd, is approximately equal to q (Figure 2). This is because a globally adaptive mutation causes coalescence between populations at a gene of interest only when it causes coalescence within each population (occurring with probability pgs) and the gene of interest is cotransferred between populations along with the adaptive mutation (occurring with probability q). Thus, the probabilities of coalescence within and between populations are most similar for genes most closely linked to the adaptive mutation (i.e., q = 1).
The ratio of globally to locally adaptive mutation rates: We next explore the effect of recurrent adaptive mutations on population structure. We focus on the significance of the ratio of globally to locally adaptive mutations. We maintain the total frequency of adaptive mutations constant, while allowing the ratio of global:local adaptations to vary. We consider three relative frequencies of global:local events, 1:0, 1:1, and 0:1 (Figure 3).
Figure 3 shows that globally adaptive mutations reduce neutral sequence divergence between populations compared to the case with only local adaptations. This effect is most pronounced at low recombination rates. When only local selective sweeps are possible, the model shows that a recombination rate of 10^-10 leads to unbounded neutral divergence between populations (i.e., πd ≈ 1/4). Increasing the global:local ratio decreases the divergence between populations by up to 50-fold. The divergence within populations also decreases, but to a much lesser extent. Thus, increasing the proportion of globally adaptive mutations makes the populations less distinct in neutral characters.
Consider next whether globally adaptive mutations can prevent different ecological populations from diverging into separate sequence clusters. We define populations as falling into separate sequence-similarity clusters when E(πd) > 2E(πs) (Palys et al. 1997). Using this criterion, Figure 4 shows that the critical recombination rates necessary for populations to diverge into separate clusters are nearly the same whether or not globally adaptive mutations occur.
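The clustering criterion amounts to a single comparison, sketched below in Python. The function name and the `factor` parameter are assumptions introduced for illustration. The example divergences of 0.01 and 0.09 are values quoted elsewhere in this article, and 0.015 is hypothetical.

```python
def forms_separate_clusters(e_pi_d, e_pi_s, factor=2.0):
    """True when between-population divergence exceeds `factor` times the
    within-population divergence, i.e., E(pi_d) > 2 E(pi_s) by default."""
    return e_pi_d > factor * e_pi_s


distinct = forms_separate_clusters(e_pi_d=0.09, e_pi_s=0.01)   # well separated
merged = forms_separate_clusters(e_pi_d=0.015, e_pi_s=0.01)    # below threshold
```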
Analysis of a simplified model with no locally adaptive mutations: We concentrated on the effect of globally adaptive mutations by considering the special case of a two-component metapopulation in which all the adaptive mutations are global (Figure 5). For this special case we treated the coalescence equations analytically to gain further insight into the behavior of the sequence divergence functions presented in Figure 3. Noting that bacterial populations are always large enough so that the probability of coalescence by drift is negligible relative to coalescence by periodic selection (i.e., 1/N ⪡ σP), Equations 3 and 4 reduce to (8) (9)
We may consider σgPgs and σgPgd as pseudoparameters, representing the diversity-purging effect of periodic selection (i.e., the rate of selective sweeps times the probability of coalescence within each sweep). The times to coalescence are then determined by only three factors: cδ, the rate of recombination between populations; σgPgs, the within-population diversity-purging effect of global periodic selection; and σgPgd, the between-population diversity-purging effect of global periodic selection.
Consider the relative magnitudes of σgPgs and σgPgd. We used Equation A4 of Appendix A to calculate the values of Pgs and Pgd across the range of frequency of recombination (c) and selective advantage (z) considered in this article, and we found that Pgs ≥ Pgd/h. We assume that the size of the recombination fragment h is usually <10% of the genome (see discussion). Therefore σgPgs ⪢ σgPgd. This leaves four regions of magnitude for cδ: cδ ⪢ σgPgs, cδ ∼ σgPgs, cδ ∼ σgPgd, and cδ ⪡ σgPgd. These regions correspond to regions I through IV, respectively, of Figure 5.
Region I of Figure 5, cδ ⪢ σgPgs ⪢ σgPgd, corresponds to very high recombination rates, yielding the following approximation of Equations 8 and 9: (10) Region I of the graph corresponds to the case where high rates of recombination within populations (cs) diminish the diversity-purging effect of periodic selection (Pgs), while high values of interpopulation recombination (cδ) further homogenize the populations, making them indistinguishable (such that expected divergence levels within and between populations are equal).
The conditions in region II, cδ ∼ σgPgs ⪢ σgPgd, yield (11) (12)
In region II of Figure 5, recombination is no longer sufficient to prevent populations from diverging. The divergence between populations is greater than that within populations and is determined by the equilibrium between recombination (which acts to homogenize the populations) and local diversity-purging events (which tend to keep the populations distinct).
The conditions of region III, σgPgs ⪢ σgPgd ∼ cδ, yield (13) (14)
Region III reflects the increasing significance of global periodic selection. Divergence between populations is determined by the combined homogenizing effects of recombination (cδ) and global periodic selection (σgPgd).
Region IV corresponds to the case of extremely rare recombination, σgPgs ⪢ σgPgd ⪢ cδ, yielding (15) (16)
This is the limiting case, where recombination between populations becomes so infrequent that its effects are entirely overwhelmed by periodic selection. In this limit, the divergence within populations is determined solely by the intensity of local purging of diversity, while the divergence between populations is only limited by the intensity of global purging of diversity.
Under the conditions of rare recombination (region IV), the ratio of the times to coalescence (i.e., E[td]:E[ts]) approaches 1/h. This follows from two consequences of rare recombination. First, because the locus of interest and the adaptive mutation are rarely separated by recombination, a selective sweep almost certainly leads to coalescence of gene segments from the same population (i.e., Pgs ≈ 1). Second, the transmission of the adaptive mutation from population 1 to population 2 is likely to be the result of a single transfer event. Hence, the probability of coalescence of two gene segments from different populations (Pgd) approaches the probability that the transfer event was a cotransfer of the adaptive mutation and the segment of interest (averaged over all distances between the two loci). That is, Pgd ≈ h, and E[td]/E[ts] ≈ 1/h.
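The two limits of the ratio E[td]:E[ts] can be illustrated with a deliberately simplified two-state calculation. This sketch is not a transcription of Equations 8 and 9. It assumes that a pair of lineages is either in the same population or in different populations, that the pair coalesces at rate a = σgPgs or b = σgPgd in those two states, and that the pair switches between states at a hypothetical rate m = 2cδ. The values of `sigma_g`, `h`, and the recombination rates below are likewise hypothetical.

```python
def coalescence_times(a, b, m):
    """First-step analysis of the assumed two-state process.

    Solves  E_s = (1 + m*E_d) / (a + m)
            E_d = (1 + m*E_s) / (b + m)
    and returns (E_s, E_d).
    """
    denom = a * b + m * (a + b)
    return (b + 2.0 * m) / denom, (a + 2.0 * m) / denom


sigma_g, h = 1e-7, 0.05                # hypothetical sweep rate and fragment size
a, b = sigma_g * 1.0, sigma_g * h      # Pgs ~ 1 and Pgd ~ h under rare recombination

e_s_rare, e_d_rare = coalescence_times(a, b, m=0.0)           # no exchange
e_s_freq, e_d_freq = coalescence_times(a, b, m=2.0 * 1e-3)    # frequent exchange
ratio_rare = e_d_rare / e_s_rare
ratio_freq = e_d_freq / e_s_freq
```

In this sketch the ratio equals (a + 2m)/(b + 2m), which tends to Pgs/Pgd ≈ 1/h as recombination between populations vanishes and to 1 as it becomes frequent, matching the behavior described for regions IV and I.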
Effect of recombination fragment size on population divergence: We consider next the effect of the recombination fragment size (h) on the distinctness of ecological populations. In general, larger recombination fragments increase the probability (q) that a gene of interest will cotransfer across populations with an adaptive mutation (Equation 1), thus fostering coalescence of segments between populations (Figure 2). Hence, larger sizes of recombination fragments tend to make ecological populations appear less distinct in neutral characters (A, Figure 5). The effect of h on the distinctness of populations is most important at low between-population recombination rates (Figure 5).
The effect of h on population distinctness (quantified as E[πd]/E[πs]) is shown explicitly in Figure 6. Under very low rates of between-population recombination, the distinctness ratio approaches 1/h for large fragment sizes (i.e., h > 10%; Figure 6). Thus, global periodic selection alone (i.e., with little recombination between populations) cannot reduce the distinctness ratio of populations to 1 (so that E[πd] ≈ E[πs]) unless the recombination fragment size reaches 100% of the genome.
We explored in more detail the effect of h on population distinctness for the case when within-population divergence levels are 1% (i.e., E[πs] = 0.01), because this is the divergence level frequently observed within bacterial sequence-similarity clusters (Palys et al. 1997). With this level of divergence, global periodic selection is quite ineffective in reducing divergence between populations when recombination fragments are small (Figure 7). For example, global periodic selection with a recombination fragment of 1% of the genome cannot reduce the between-population divergence by more than 4%; however, with larger recombination fragments (e.g., h = 5%), global periodic selection may significantly reduce the between-population divergence from unbounded neutral divergence (πd = 0.25) to a much more limited level of divergence (πd = 0.09; Figure 7).
This study presents a coalescence model for investigating the effect of globally adaptive mutations on neutral sequence divergence in bacteria. We used this model to test whether interpopulation transfer of globally adaptive mutations might prevent neutral sequence divergence between ecologically distinct populations of bacteria.
Assumptions of the model: If globally adaptive mutations are to reduce divergence between ecological populations at every locus in the genome, we must assume that every gene locus has the opportunity to hitchhike from population to population along with globally adaptive mutations (Figure 1). We therefore assume that globally adaptive mutations that confer benefits in more than one population exist, that they are numerous, and that they appear throughout the genome. The latter two assumptions are required because only a limited fraction of the genome can be cotransferred (and subsequently homogenized) across populations with any given adaptive mutation: the segments transferred in bacterial recombination are generally small (Smith 1988), and the transfer of large segments across populations is probably disfavored by natural selection (Cohan 1994b; Zawadzki and Cohan 1995).
Consider next the central premise of the model, that globally adaptive mutations exist and are numerous. The likelihood of globally adaptive mutations must depend on the degree of ecological divergence between populations. In the early stages of population divergence, a mutation that is adaptive in one population is likely to be adaptive in others. As the populations become progressively more finely tuned to their respective niches, accumulating many niche-specific adaptations, we should see fewer adaptive mutations that can benefit more than one population. We therefore expect globally adaptive mutations to prevent neutral sequence divergence genome-wide only between the most closely related populations.
Does a typical adaptive mutation confer a benefit in more than one population? Recently, Guttman and Dykhuizen (1994b) provided evidence that one adaptive mutation precipitated selective sweeps in all the ecological populations included within Escherichia coli. A selective sweep apparently purged sequence diversity within a small chromosomal region from all the various sequence clusters of E. coli, while these clusters retained their distinctness for all other chromosomal regions studied. This is exactly the pattern expected soon after a global selective sweep. As shown in Figure 2, for genes that are closely linked to the adaptive mutation (q ≈ 1), there is nearly total purging of diversity both within and between populations; for genes that are not linked to the adaptive mutation (q ≈ 0), there is purging of diversity within populations but none between populations. Provided that each of the E. coli sequence clusters is actually a separate ecological population (Cohan 1994a,b, 1999; Palys et al. 1997), the selective sweep demonstrated by Guttman and Dykhuizen (1994b) appears to have been driven by a globally adaptive mutation.
Globally adaptive mutations as a homogenizing force in neutral sequence evolution: Analysis of our model has shown that, in general, globally adaptive mutations tend to make populations less distinct. Especially under extremely low recombination rates, globally adaptive mutations severely depress neutral sequence divergence between populations while having only a minor effect on within-population diversity (Figure 3). Populations that would diverge without bound in the absence of global periodic selection may be prevented from diverging without bound in the presence of global periodic selection.
Nevertheless, global periodic selection does not homogenize neutral sequence divergence to the extent that populations become indistinguishable. Consider, for example, ecological populations whose average within-population sequence divergence is ∼1%, a value typical for sequence-similarity clusters in bacteria (Palys et al. 1997). In the absence of global periodic selection, such populations diverge into separate sequence-similarity clusters whenever the between-population recombination rate is <10^-7.6; in the presence of global periodic selection, even for a large recombination fragment (h = 10%), the critical recombination rate decreases only slightly to 10^-7.8 (Figure 4). Recombination rates between most bacterial populations are unlikely to exceed either critical value (Whittam and Ake 1993; Roberts and Cohan 1995; Palys et al. 1997; Cohan 1999). We therefore conclude that in spite of the homogenizing effect of global periodic selection, ecological populations should diverge into separate sequence-similarity clusters.
Analysis of the model has shown that the effect of global adaptations on between-population divergence is highly dependent on the size of the fragment recombined (Figure 7). If the recombination fragment is small (<1% of the genome), global periodic selection is virtually ineffective in reducing between-population divergence; however, if the recombining fragment is large (e.g., 5% of the genome), global periodic selection may significantly reduce the between-population divergence (Figure 7).
The effect of global periodic selection on sequence divergence may therefore depend on the mode of genetic transfer between populations, because the various modes of transfer differ greatly in the length of DNA recombined. In naturally competent taxa, such as Streptococcus and Bacillus, transformation may be the predominant mode of DNA exchange. The average fragment of DNA incorporated in both Streptococcus and Bacillus transformation is <1% of the genome (Humbert et al. 1995; Zawadzki and Cohan 1995). To the extent that transformation is the primary mode of transferring adaptive mutations across populations in these taxa, global periodic selection should have virtually no effect on sequence divergence (Figure 7).
Other modes of recombination, such as transduction and conjugation, can transfer much larger segments of DNA. A generalized transducing phage can, in principle, transfer segments as large as the phage’s own genome, which could be ∼10% of the bacterium’s genome (Fraenkel-Conrat 1985; Arber 1994). Conjugating plasmids can transfer even larger segments: in the case of the Hfr plasmid of E. coli, most of the genome can be transferred (but there may be additional fitness constraints on the size of large transferred fragments; see above). Therefore, when transduction and conjugation are the principal means of transfer of adaptive mutations, global periodic selection can have an important role in reducing divergence between populations.
In summary, global periodic selection can limit the sequence divergence between ecological populations. The effect of global periodic selection is most pronounced for groups of populations with low between-population recombination, such that global periodic selection is the only constraint on divergence between populations. Global periodic selection is unlikely to prevent the divergence of ecological populations into separate sequence clusters. A quantitative prediction of the homogenizing effect of global periodic selection would require more information about the rate of mutations that confer adaptations in multiple populations, information about how evenly globally adaptive mutations are distributed throughout the genome, and information about the size of fragments that can be transferred between populations and then successfully accommodated by the receiving population.
APPENDIX A: Probability That a Periodic Selection Event Leads to Coalescence
We consider the special case of a metapopulation consisting of two ecological populations. The adaptive mutation driving the periodic selection event begins in population 1 and is subsequently passed into population 2 by recombination. We use a two-locus, four-allele model. A is the locus under selection, while B is the segment of interest whose neutral sequence divergence we are investigating. Alleles in population 1 are designated by subscript 1; those in population 2 are designated by subscript 2. Within population 1, the advantageous allele is designated as A1, and all other alleles at the selected locus are designated a1. A2 is the advantageous allele in population 2; a2 designates all the other alleles at the selected locus in this population. An allele at locus B can be attached to any of the four A alleles, i.e., A1, a1, A2, a2. The frequencies of the alleles A1 and A2 in their respective populations are x1 and x2.
Let gX(Y,t) be the conditional probability that if a randomly selected gene B from generation t of the metapopulation is attached to the allelic type Y (at locus A), its ancestor in generation (t - 1) was attached to allelic type X. [gX(Y,t) is equivalent to the quantity fX(Y,t)/f(Y,t) of Hudson and Kaplan (1988).] Following Hudson and Kaplan (1988), and keeping only the highest order terms in N (where N is the population size), 8 of the 16 g probabilities are where R11 = R22 = 2Ncs(1 - q), the per-population rate at which the two loci are separated by recombination within a population; R12 = R21 = 2Ncδ(1 - q), the per-population rate at which the two loci are separated by recombination between populations; and R1 = R2 = Ncδq, the per-population rate at which a DNA segment containing both loci is transferred between populations. The remaining 8 g values may be obtained by substituting 2 for 1 and 1 for 2 in all the indices of the above equations, e.g.,
We now define the Q process. Suppose that m B genes are selected at random at the end of the selective sweep (time t = 0). Let Q(0) = (i, j, k, l), where i, j, k, l represent the number of B genes attached to A1, a1, A2, a2, respectively. Going back in time, Q(t) describes the number of ancestral B genes attached to each A allele at time t (i.e., t generations before time 0). The total number of ancestral B genes in generation t is denoted by |Q(t)|. Note that |Q(t)| never increases, because the number of ancestral alleles can only stay constant or decrease (if two or more of the sampled alleles had a common ancestor in the previous generation). We are interested in the cases where Q(t) changes states, i.e., Q(t - 1) ≠ Q(t). There are two possible cases.
Case 1. |Q(t - 1)| = |Q(t)|: The only possible state changes allowed by this condition result from recombination between parental genes. Given Q(t) = (i, j, k, l), there are 12 possible states of Q(t - 1):
Note that all other jumps would require more than a single recombination event and their probabilities are therefore of the order 1/N2 and are negligible. We are interested in the probabilities that the process jumps from (i, j, k, l), to any of the above states, e.g.,
The probability that a selected gene B from generation t is attached to a1 while its ancestor was attached to A1 is given by gA1(a1,t). Because we are sampling j a1 alleles,
Equations of the same form can be obtained for the remaining 11 jumps.
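The count of 12 follows from the state representation: each single-recombination jump moves one ancestral B gene from one of the four allelic backgrounds to one of the other three. The sketch below enumerates those jumps for a given state. It assumes a state is a plain tuple (i, j, k, l) and that a jump is possible only when the source background holds at least one sampled gene; the function and constant names are ours and do not come from the original analysis.

```python
from itertools import permutations
from typing import List, Tuple

State = Tuple[int, int, int, int]        # (i, j, k, l)
BACKGROUNDS = ("A1", "a1", "A2", "a2")   # order matches (i, j, k, l)


def single_recombination_jumps(state: State) -> List[Tuple[str, str, State]]:
    """List the states of Q(t - 1) reachable from Q(t) = state by one jump.

    Each entry is (current background, ancestral background, new state).
    The total number of lineages |Q| is unchanged by every jump.
    """
    jumps = []
    for src, dst in permutations(range(4), 2):  # 4 * 3 = 12 ordered pairs
        if state[src] == 0:
            continue  # no sampled gene on this background, so nothing can move
        new_state = list(state)
        new_state[src] -= 1
        new_state[dst] += 1
        jumps.append((BACKGROUNDS[src], BACKGROUNDS[dst], tuple(new_state)))
    return jumps


if __name__ == "__main__":
    for current, ancestral, new_state in single_recombination_jumps((1, 1, 1, 1)):
        print(f"{current} -> {ancestral}: {new_state}")
```

A state with genes on all four backgrounds yields all 12 jumps; a state such as (1, 0, 1, 0) yields only the 6 whose source background is occupied.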
Case 2. |Q(t - 1)| ≠ |Q(t)|: We have already noted that the number of ancestral alleles can only decrease going backward in time. This case implies that some of the genes sampled at time t must have a common ancestor at time (t - 1). Kaplan et al. (1988) have shown that the probability that two genes of a particular allelic type at time t have a common ancestor in generation (t - 1) is, to the first order in N, given by the probability of coalescence by drift, i.e., where xa(t - 1) is the frequency of the allelic type a in the parental generation. Thus, the probability that two B alleles attached to A1 at t have a common ancestor at (t - 1) is Because i B genes attached to A1 are being sampled, where if is interpreted as 0. Similarly, and because the chance of more than one coalescence event per generation is of the order of 1/N2, jumps of >1 state are ignored. We have now defined the probabilities of every possible state change of Q(t). The probability that the Q process does not change state can thus be written as where hijkl is the total probability of the Q process changing states:
Calculation of pgd and pgs: At the end of the selective sweep we sample two B alleles from the metapopulation. We want to know the probability that the two alleles had a common ancestor during the selective sweep. We consider two cases:
Case 1: the two genes are sampled from different ecological populations. The probability of coalescence during the sweep is pgd.
Case 2: the two genes are sampled from the same ecological population. The probability of coalescence during the sweep is pgs.
We begin by considering case 1, calculation of pgd. We follow the Q process back in time, going from t = τf (end of the selective sweep) to t = τb (beginning of selective sweep; Figure 8). We can write (1 - pgd) as the probability of escaping coalescence, i.e., leaving the selective sweep (t = τf) with one B gene in population 1 (attached to A1) and one B gene in population 2 (attached to A2) and entering it (t = τb) with two ancestral B genes: (A1)
We also need to define Pijkl(t), the probability of finding the Q process in the state (i, j, k, l) at time t, given that at the end of the selective sweep Q(τf) = (1,0,1,0). Hence, we can rewrite Equation A1 as (A2)
To calculate all the relevant Pijkl(τb) values, we use the differential equations governing their behavior: (A3) We also need the equations describing changes in the frequencies of the adaptive alleles A1 and A2, x1(t) and x2(t):
To be able to treat the model deterministically, we follow Kaplan et al. (1989) and limit t to where τ(ε) corresponds to the time in the early phase of the selective sweep, where x1 = ε, and τ2(1 - ε) corresponds to a time near the end of the selective sweep, where x2 = 1 - ε. We take ε = 5/Nz.
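As a rough illustration of the deterministic window between ε and 1 - ε, the sketch below assumes that the adaptive allele follows simple logistic growth, dx/dt = s x (1 - x), within a single population. This is an assumption made for illustration only: the selection coefficient `s` is a hypothetical parameter, the transfer of the allele to population 2 is ignored, and the frequency equations used in the analysis above may contain additional terms.

```python
import math


def logistic_frequency(t: float, s: float, eps: float) -> float:
    """Frequency at time t of an allele that starts at eps and grows logistically.

    Assumes dx/dt = s * x * (1 - x); s is a hypothetical selection coefficient.
    """
    odds = (eps / (1.0 - eps)) * math.exp(s * t)  # x / (1 - x) grows as exp(s t)
    return odds / (1.0 + odds)


def deterministic_window(s: float, eps: float) -> float:
    """Time needed for the frequency to rise from eps to 1 - eps."""
    if not 0.0 < eps < 0.5:
        raise ValueError("eps must lie strictly between 0 and 0.5")
    return (2.0 / s) * math.log((1.0 - eps) / eps)


if __name__ == "__main__":
    s, eps = 0.1, 1e-4  # illustrative values only
    window = deterministic_window(s, eps)
    print(f"deterministic phase lasts about {window:.1f} generations")
    print(f"frequency at the end of the window: {logistic_frequency(window, s, eps):.6f}")
```

Under this assumption the window length depends on ε only logarithmically, so the exact cutoff changes the duration only modestly.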
It remains for us to establish the boundary conditions while all other Pijkl(τ2(1 - ε)) are zero. Also,
Because the adaptive mutation enters population 2 later than it enters population 1, we must determine the value of x1 at the time when x2 = 1 - ε. For that purpose we need to first run the frequency equations forward in time (starting at x1 = ε, x2 = 0). We then allow for the transfer of a single adaptive mutation to population 2. This takes place at time E(τc), which is the expectation of the transfer of the first adaptive allele that will become fixed in population 2. (This is in fact equal to the time at which 1/2z alleles have crossed over.) We run the equations forward in time until τ2(1 - ε) to establish the end boundary conditions for x1; then we can run both x1 and x2 backward, along with the equations for Pijkl, following Equation A3 (see Figure 8).
The transfer of the first adaptive allele from population 1 to population 2 is a stochastic event and hence introduces a discontinuity in the solutions to the differential equations for Pijkl. At the time immediately preceding the initial transfer of the adaptive allele into population 2 (time τc+), no A2 alleles existed. Therefore, at time τc+ the probabilities of a B allele being attached to an A2 allele [Pijkl(τc+) for k ≠ 0] must be zero. However, these P terms might have nonzero values at τ2(ε). Following Kaplan et al. (1989), we may neglect the contribution of P0020(τ2[ε]) to pgd. However, a nonzero value of Pijkl(τ2[ε]) implies that immediately after the transfer event, one of the sampled B genes is attached to an A2 allele. In this case, there exist two possible states of the Q process immediately preceding the transfer: Q(τc+) = (i + 1, j, 0, l) if the transfer was a corecombination of A and B (with a probability q); or Q(τc+) = (i, j, 0, l + 1) if only the A allele was transferred (with a probability 1 - q). Hence, the additional boundary conditions at τc needed to account for the transfer are Using the above boundary conditions, we can obtain the values Pijkl(τ1[ε]) at the beginning of the selective sweep. Then, using the approximation Pijkl(τ1[ε]) = Pijkl(τb), Equation A2 becomes (A4) The above analysis also applies to case 2, the calculation of pgs. We only need to alter the initial conditions, i.e., allowing Q(τf) = (2,0,0,0) (if we are sampling two alleles from the original population where the adaptive mutation first occurred) or Q(τf) = (0,0,2,0) (if we are sampling two alleles from the population to which the adaptive mutation has been transferred). Because the probabilities of choosing either population are equal, we calculate pgs as the average of the two values.
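The redistribution of probability at the transfer event, as described in words above, can be sketched as follows. The sketch assumes the state probabilities are held in a dictionary keyed by (i, j, k, l). States with k = 1 are split between corecombination (probability q) and transfer of the A allele alone (probability 1 - q), and states with k of 2 or more are dropped, mirroring the neglect of P0020. The function names and data layout are ours and are not taken from the original analysis.

```python
from collections import defaultdict
from typing import Dict, Tuple

State = Tuple[int, int, int, int]  # (i, j, k, l): B genes attached to A1, a1, A2, a2


def apply_transfer_boundary(p_after: Dict[State, float], q: float) -> Dict[State, float]:
    """Map state probabilities just after the transfer to just before it.

    p_after holds probabilities at the time when A2 first exists; the result
    holds probabilities at the preceding instant, when no A2 allele exists.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be a probability")
    p_before: Dict[State, float] = defaultdict(float)
    for (i, j, k, l), prob in p_after.items():
        if k == 0:
            p_before[(i, j, 0, l)] += prob                   # unaffected by the transfer
        elif k == 1:
            p_before[(i + 1, j, 0, l)] += q * prob           # A and B transferred together
            p_before[(i, j, 0, l + 1)] += (1.0 - q) * prob   # only A transferred
        # k >= 2 is neglected here, so its probability mass is discarded
    return dict(p_before)


def average_pgs(pgs_origin: float, pgs_recipient: float) -> float:
    """Average the two same-population values, each population being equally likely."""
    return 0.5 * (pgs_origin + pgs_recipient)


if __name__ == "__main__":
    after = {(1, 0, 1, 0): 0.7, (2, 0, 0, 0): 0.3}  # illustrative numbers only
    print(apply_transfer_boundary(after, q=0.25))
```

After the mapping, every remaining state has k = 0, consistent with the requirement that no B gene can be attached to A2 before the transfer.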
APPENDIX B: The Expected Time to Coalescence
Coalescence within populations
Following Cohan (1994a), we first consider the time to coalescence ts for two segments currently residing in cells of the same ecological population. The statistical properties of ts are investigated by dividing ts into two constituent quantities: the time ks necessary to go back into the past of the two lineages to reach a “key event” (defined below) and the additional time necessary to go beyond the key event to reach a coalescence of the two lineages into a common ancestor (if the key event did not result in coalescence). A key event is any event that changes the expected time to coalescence. In the case of segments from the same ecological population, a key event may be any of the following: coalescence by drift acting on N cells of the population, a local selective sweep, a global selective sweep, or a genetic exchange event in which one of the segments is transferred into its present population from another ecological population.
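To illustrate how several kinds of key event compete, the sketch below treats each kind as occurring independently at a constant per-generation rate, so that waiting times are exponential. Under that assumption the expected time back to the most recent key event is the reciprocal of the summed rates, and each kind is the key event with probability proportional to its rate. The rate values and names are hypothetical placeholders, not estimates from this study.

```python
from typing import Dict, Tuple


def key_event_summary(rates: Dict[str, float]) -> Tuple[float, Dict[str, float]]:
    """Return (expected wait to the most recent key event, P(event type)).

    Assumes independent, constant-rate (exponential) waiting times for each
    kind of key event; rates are per generation.
    """
    total = sum(rates.values())
    if total <= 0.0:
        raise ValueError("at least one rate must be positive")
    expected_wait = 1.0 / total
    probabilities = {name: rate / total for name, rate in rates.items()}
    return expected_wait, probabilities


if __name__ == "__main__":
    # Hypothetical per-generation rates for the four within-population key events.
    example_rates = {
        "drift_coalescence": 1e-6,
        "local_sweep": 2e-6,
        "global_sweep": 1e-6,
        "between_population_transfer": 1e-6,
    }
    wait, probs = key_event_summary(example_rates)
    print(f"expected generations to key event: {wait:.0f}")
    for name, prob in probs.items():
        print(f"  P({name}) = {prob:.2f}")
```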
The variable ts is thus defined below, where the first term ks represents the time to reach the most recent key event, and the other three terms represent the additional time necessary to go beyond the key event to reach a coalescence (Table 2):
Local selective sweep as the key event: The random variable χσl indicates whether the key event was a local selective sweep (χσl = 1 if the key event was a local selective sweep; χσl = 0 else). Two lineages that are in the same population at the end of the selective sweep may begin the sweep in one of three states: a single lineage (i.e., the lineages coalesce); two lineages in the same population; or they may begin as two lineages in different populations. The random variable χpl indicates whether the lineages escape coalescence and begin in the same population (1 if yes, 0 else, as above), and is the additional time to coalescence in this case. The random variable χγl indicates whether the lineages begin the selective sweep in different populations (1 if yes, 0 else), and is the additional time to coalescence in this case. Accordingly, the sum represents the additional time to coalescence when the key event is a local sweep.
Global periodic selection as the key event: The random variable χσg indicates whether the key event was a global selective sweep (1 if yes, 0 else). The random variable χpgs indicates whether the lineages escape coalescence and begin the sweep in the same population (1 if yes, 0 else), and the random variable represents the additional time to coalescence in this case. Similarly, χγg indicates whether the lineages begin the selective sweep in different populations (1 if yes, 0 else), and represents the additional time to coalescence in this case. The sum represents the additional time to coalescence if the event is a global selective sweep.
Recombination between populations as the key event: The random variable χcd indicates whether the key event is a recombination between populations, in which one of the lineages enters the current population (1 if yes, 0 else). The random variable indicates the additional time to coalescence beyond this key event.
Coalescence between populations
Next consider the time to coalescence, td, for segments that are now in two cells belonging to different ecological populations. As above, we divide td into the time kd to go back to the most recent key event (a global selective sweep or a genetic exchange event in which one of the lineages enters its current population from the population of the other lineage) and the additional time required to go beyond the key event to reach coalescence:
Global selective sweep as the key event: The random variable χpgd indicates whether the two lineages presently in different populations escape coalescence and begin the sweep in different populations, and is a random variable for the additional time to coalescence in this case. The random variable χγd indicates whether the two lineages escape coalescence and begin the sweep in the same population, and represents the additional time to coalescence in this case. The sum represents the additional time to coalescence when the key event is a global selective sweep.
Between-population recombination as the key event: The random variable χcδ indicates whether the key event was a recombination event between populations, such that before the event the lineages were in the same population and afterward were in different populations (1 if yes, 0 else). The random variable represents the additional time to coalescence in this case.
Expected values of ts and td: The expected values of all indicator variables were calculated following Cohan (1994a), except when the key event is a selective sweep. The expectation of χpl is the probability that two segments from the same population escape coalescence and begin the sweep in the same population (1 - Pl) and likewise for the expected values of other indicator variables: E(χpgs) = 1 - Pgs; E(χpgd) = 1 - Pgd; E(χγl) = Plc. We also define E(χγg) = Pgsc and E(χγd) = Pgdc. All the relevant expected values are calculated in Appendix A.
The expected values of ts and td are as follows: It can be shown that if the selective sweeps are rapid and infrequent (i.e., the duration of a sweep is much shorter than the interval between sweeps) the above equations reduce to
Probability density functions of ts and td: We define Pts(t) as the probability that the time ts to coalescence for two segments currently in the same population is equal to t. Similarly, Ptd(t) is the probability that the time td to coalescence for two segments currently in different populations is equal to t. The values Pts(t) and Ptd(t) can then be expressed as the probability that the most recent key event occurred at exactly time t and led to coalescence or, if the key event occurred at any earlier time τ < t and did not lead to coalescence, that the additional time necessary to reach coalescence was (t - τ). We can consider the waiting times to key events to be exponentially distributed (Hudson and Kaplan 1988). Hence, and
The above recursions were solved numerically to yield numerical representations of the two probability distribution functions.
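The structure of such a recursion can be illustrated with a deliberately simplified, single-state version. The sketch below assumes discrete generations, a geometric waiting time to each key event (the discrete analog of the exponential assumption), and a constant probability that a key event produces coalescence. It therefore collapses the distinction between lineages in the same and in different populations, which the recursions described above keep separate. All names and parameter values are ours.

```python
from typing import List


def coalescence_time_pmf(key_rate: float, p_coalesce: float, t_max: int) -> List[float]:
    """Return pmf[t] = P(time to coalescence == t) for t = 1..t_max (pmf[0] = 0).

    Recursion: either the most recent key event lies exactly t generations back
    and causes coalescence, or it lies tau < t generations back, fails to cause
    coalescence, and the remaining time to coalescence is t - tau.
    """
    # f[tau]: probability that the most recent key event is exactly tau generations back
    f = [0.0] + [key_rate * (1.0 - key_rate) ** (tau - 1) for tau in range(1, t_max + 1)]
    pmf = [0.0] * (t_max + 1)
    for t in range(1, t_max + 1):
        total = f[t] * p_coalesce
        for tau in range(1, t):
            total += f[tau] * (1.0 - p_coalesce) * pmf[t - tau]
        pmf[t] = total
    return pmf


if __name__ == "__main__":
    pmf = coalescence_time_pmf(key_rate=0.1, p_coalesce=0.3, t_max=200)
    mean = sum(t * p for t, p in enumerate(pmf))
    print(f"probability mass captured: {sum(pmf):.4f}")
    print(f"mean time to coalescence (truncated): {mean:.2f}")
```

In this simplified setting the result is itself geometric with parameter key_rate × p_coalesce, which gives a convenient check on the numerical recursion.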
We thank Michael Feldgarden for suggesting that we explore a model of global periodic selection and Richard Hudson for suggesting important improvements to the model. This work was supported by Environmental Protection Agency grants R82-1388-010 and R82-5348-010 and by research funds from Wesleyan University.
Communicating editor: R. R. Hudson
- Received September 20, 1996.
- Accepted April 23, 1999.
- Copyright © 1999 by the Genetics Society of America