## Abstract

Purifying selection at many linked sites alters patterns of molecular evolution, reducing overall diversity and distorting the shapes of genealogies. Recombination attenuates these effects; however, purifying selection can significantly distort genealogies even for substantial recombination rates. Here, we show that when selection and/or recombination are sufficiently strong, the genealogy at any single site can be described by a time-dependent effective population size, *N*_{e}(*t*), which has a simple analytic form. Our results illustrate how recombination reduces distortions in genealogies and allow us to quantitatively describe the shapes of genealogies in the presence of strong purifying selection and recombination. We also analyze the effects of a distribution of selection coefficients across the genome.

PURIFYING selection acts to remove deleterious mutations, and variation linked to these mutations, as they continually arise in a population. This leads to reduced genetic diversity at both selected sites and linked neutral sites. This effect, known as background selection, can significantly affect the patterns of diversity evident in sequence data. In recent years, substantial empirical evidence has arisen, suggesting that these effects may be pervasive in humans and other organisms (McVicker *et al.* 2009; Haddrill *et al.* 2010; Lohmueller *et al.* 2011; see Charlesworth 2012 for review).

When selection is very strong, these effects are well understood. Deleterious mutations are purged almost immediately, leading to an overall reduction in diversity to the level expected in a neutrally evolving population with a reduced population size *N*_{e} (Charlesworth 1994; Hudson and Kaplan 1994, 1995a; Nordborg *et al.* 1996). The size of this reduction depends upon the recombination rate, which determines the extent to which each site is linked to potentially deleterious mutations. This effect is captured by a simple analytic formula showing how *N*_{e} depends upon mutation rates, selection strengths, and recombination rates and has been widely used to interpret patterns of molecular evolution (Hudson and Kaplan 1995b; Charlesworth 2013).

However, it has long been recognized that in addition to overall reductions in diversity, purifying selection also distorts the shapes of genealogies (Charlesworth *et al.* 1993, 1995; Zeng and Charlesworth 2011). These distortions arise because purifying selection does not act instantaneously, and hence deleterious mutations can persist transiently in the population, lengthening coalescence times in the recent past relative to those in the distant past (Williamson and Orive 2002; Barton and Etheridge 2004). A number of recent studies have addressed these distortions in the completely nonrecombining (asexual) case (O’Fallon *et al.* 2010; Seger *et al.* 2010; Nicolaisen and Desai 2012; Walczak *et al.* 2012; see Charlesworth 2013 for review). However, very little is quantitatively known about the shape and magnitude of these distortions in the presence of recombination.

To address this question, Zeng and Charlesworth (2011) recently developed a structured coalescent algorithm to simulate genealogies in the presence of purifying selection with recombination, analogous to the asexual structured coalescent (Hudson and Kaplan 1994). They used this method to analyze distortions in statistics such as the mean coalescence time and the ratio of external branch length to total branch length. This approach makes it possible to rapidly simulate any statistic describing the shape of genealogies, including those involving multiple sites (*e.g.*, the correlation in coalescence times at two sites). Although this approach is valid only for sufficiently strong selection, it has led to many novel conclusions and offers great promise as a useful practical tool.

However, despite these advantages, one of the main difficulties of simulation-based methods is that they cannot provide simple analytical insight into how distortions in genealogies depend upon the relevant parameters, and it is thus difficult to incorporate results into practical inference or estimation methods. In this article, we show that the distortions in genealogies may be described by a neutral population with a time-dependent effective population size *N*_{e}(*t*), and we compute a simple analytical formula describing how this *N*_{e}(*t*) depends upon mutation rates, selection strengths, and recombination rates. Our approach is closely related to our earlier analysis of the effects of purifying selection in completely nonrecombining regions (Nicolaisen and Desai 2012), and many of our results are closely analogous, demonstrating that many of the effects of selection in asexual populations remain qualitatively similar in the presence of substantial recombination.

Our analysis is limited to describing genealogies at a single site and is valid only provided purifying selection and recombination are sufficiently strong. Thus, it is unable to describe the topological distortions in genealogies that begin to appear as selection and recombination become weaker. However, despite these limitations, our results show that the effects of strong purifying selection and recombination may be described by a time-varying population size and explicitly describe the analytical dependence of this time-varying population size on the underlying parameters. This result can therefore be directly incorporated into preexisting neutral methods of inference and estimation in a time-varying population to describe or infer the effects of selection and recombination. As we will see, it is also straightforward to incorporate the effects of variation in selection strengths across sites into our analysis.

We begin in the next section by describing our model, which is closely related to earlier studies of background selection with recombination (Charlesworth *et al.* 1993; Hudson and Kaplan 1994, 1995a,b; Nordborg *et al.* 1996). We focus on a single focal site in a randomly sampled individual, and we trace that individual’s ancestral history backward in time. We calculate the probability that a single linked site carries a deleterious mutation as a function of time in the past. In general, selection acts against individuals carrying deleterious mutations, such that the probability an ancestor carried such a mutation decreases as we look farther into the past. We refer to this probability as the “ancestral fitness distribution”, and we use this to calculate the probability that two individuals contain the same set of deleterious mutations across all linked sites. This calculation relies upon the key assumption that we may treat the ancestral fitness distribution at each site as independent (Supporting Information, File S1). Finally, we use this to calculate the probability of coalescence over time and thus *N*_{e}(*t*).

## Analysis

We consider a haploid population of *N* individuals. For simplicity, throughout most of our analysis we assume that each site experiences deleterious mutations at rate *μ* to an allele carrying selective cost *s*. In a section below, we show how our results can be straightforwardly generalized to the case where each site has an arbitrary mutation rate and selection coefficient. Throughout, we assume that there is no epistasis and that fitnesses combine multiplicatively, so that an individual with *k* deleterious mutations has fitness (1 − *s*)* ^{k}*. We note that this haploid model is analogous to that of Zeng and Charlesworth (2011) and is closely related to earlier diploid models of purifying selection and recombination (Charlesworth

*et al.*1993; Hudson and Kaplan 1994, 1995a,b; Nordborg

*et al.*1996)—resulting equations can be compared with simple modifications.

### The ancestral fitness distribution

We consider a single individual, randomly sampled from the population. We wish to trace the ancestral history of that individual at a single focal site backward in time, to calculate the probability that the ancestor carries a mutation at a particular linked site (referred to as the “index site”), as a function of time in the past.

In the present, we know that the probability an individual carries a deleterious mutation at a particular site is given by the classical mutation–selection balance result, *p*_{mut} = *μ*/*s* (Kimura and Maruyama 1966; Haigh 1978). However, there are two types of events that may occur in the ancestral history. First, a deleterious mutation may occur, which will move the ancestor from the mutant state into the nonmutant state. This will occur at rate , where we have neglected terms of higher order in *μ*/*s*. Second, a recombination event may occur between the index and focal sites at rate *r*(*x*_{i}, *x*_{f}), where *x*_{i} denotes the index site and *x*_{f} denotes the focal site. When a recombination event separates the focal site from the index site, the ancestor at the index site is randomly chosen from the population (Figure 1). Thus, the ancestor will carry a mutation with probability *p*_{mut} = *μ*/*s*. Writing this out, we have thatwhere *P*_{mut}(*t*) is the probability the ancestral lineage carries a deleterious mutation at the index site at time *t*. We note that we have neglected the effects of back mutations, which introduce terms of higher order in *μ*/*s*. However, it is straightforward to include these terms (see File S2). Solving the above differential equation, we have that the ancestral fitness distribution is simply(1)

We note that Equation 1 allows for recombination rates to vary in any arbitrary way across the genome. However, to illustrate our main results, it is often helpful to make the simplifying assumption that recombination occurs at constant per-site rate *r*. In this case, denoting *x* ≡ |*x*_{f} − *x*_{i}|, Equation 1 reduces to(2)For clarity we use this simpler expression throughout most of our subsequent analysis, but we note that our results can all be generalized to account for the effects of variation in recombination rates by replacing Equation 2 with Equation 1 throughout.

Intuitively, Equation 2 reflects the fact that sites far from the focal site will typically recombine away more frequently and hence have an ancestral fitness distribution that is closer to that given by the steady-state mutation–selection balance, *μ*/*s*. By contrast, sites very close to the focal site are unlikely to recombine away, and the ancestral lineage at these sites is biased to be more fit than average. The distance is the boundary between these two regimes and represents the natural length scale on which the effects of selection are diminished. On distances small compared to *L*^{∗}, the ancestral fitness distribution is comparable to the nonrecombining case, since recombination occurs very infrequently compared to selection. By contrast, on distances large compared to *L*^{∗}, recombination occurs so rapidly that selection does not have time to remove mutations from the ancestral lineage before it is reset by recombination.

We note that, in analogy with earlier work on background selection with recombination, as well as the structured coalescent method of Zeng and Charlesworth (2011), this derivation assumes that all sites may be treated deterministically. This approximation will be reasonable provided no individual lineage becomes a significant fraction of the population (or, analogously, if the typical timescale on which deleterious mutations are removed form the population is short relative to the population size), which holds when . When this approximation breaks down, mutant lineages may grow to a significant fraction of the total population, and our results will no longer accurately capture the ancestral fitness distribution.

We compare our theoretical result in Equation 2 with forward-time simulations in Figure 2. We compare the ancestral fitness distribution at five different ancestral time points, in two different parameter regimes. We see that for strong selection/recombination, Equation 2 accurately describes the ancestral fitness distribution at each time point, as predicted. By contrast, when selection and/or recombination become weaker, we see that our Equation 2 systematically overestimates the ancestral fitness. Thus, as expected, our results become less accurate as *N*_{e}*s* becomes small and the deterministic approximation breaks down. This is a consequence of fluctuations: as the strong-selection/recombination condition breaks down, mutant lineages may occasionally grow to a substantial fraction of the population. This will systematically bias mean mutation probabilities to higher values. As we will later see, despite these events, we will still be able to accurately predict the shape of genealogies as the strong-selection/recombination condition begins to be violated. However, strong violations of this condition will cause our results to break down by introducing additional distortions that we are not able to address.

We note that in deriving Equation 2, we have assumed a continuous approximation that requires the per-generation rate of recombination to be small (*rx* ≪ 1). This will be strictly valid only if the total genome-wide recombination rate is small, *rL* ≪ 1. However, in practice, our result will still be valid even when *rL* > 1, since only sites close to the focal site contribute to the effects of selection on genealogies. To be specific, only sites within *x* ≲ *L*^{∗} of the focal site are significantly affected by selection, so Equation 2 will in fact be roughly valid provided only that *rL*^{∗} ≈ *s* ≪ 1, which we generally expect to hold. In a similar vein, instead of approximating *r*(*x*) = *rx*, it would be more appropriate to use a mapping function such as the Haldane formula, (Haldane 1919). However, this is also roughly equivalent to our approximation within the relevant range, provided only that *s* ≪ 1. This was earlier noted by Nordborg *et al.* (1996), who observed that the choice of mapping function was not significant since only closely linked sites contribute to the effects of selection.

We have derived an ancestral fitness distribution, which captures the probability that the ancestor of an individual will have a mutation at a particular site in the past. We now use this ancestral fitness distribution to calculate the probability that two or more individuals share the same set of mutations, which will enable us to calculate the probability of coalescence as a function of time.

### The effective population size

When two ancestral lineages have the same set of mutations across all sites (*i.e.*, they are in the same “configuration”), they coalesce with per-generation probability 1/*N*_{config}, where *N*_{config} is the total number of individuals in that configuration. Thus the probability that two arbitrary ancestral lineages coalesce a time *t* in the past is the probability they exist in the same configuration divided by the number of individuals in that configuration,As in earlier work on background selection with recombination, provided the deterministic approximation holds and *μ*/*s* ≪ 1, we may treat each site as independent (see File S1 for further details about this approximation). We then havewhere we have neglected terms of order or higher. As before, it is possible to keep this entirely general, allowing the focal site to be at any position along a genome of arbitrary length. However, to illustrate our results, we assume that the focal site is located at the center of a genome of length *L*. Approximating the sum as an integral, this becomesCarrying out the integral, we find (3)where we have defined *rL* ≡ *R* and *μL* ≡ *U*_{d}. Although we derived this result by considering a sample of size two, we arrive at the same *N*_{e}(*t*) for arbitrary sample sizes provided we may treat lineages as independent and exchangeable. This assumption holds provided the typical timescale for backward-in-time mutation events is short relative to the typical coalescence time [*e.g.*, when , where *n* is the sample size]. We note that this condition becomes more restrictive for larger samples, but provided it holds, the coalescence probabilities are described by the *N*_{e}(*t*) given above.

We see from Equation 3 that in the recent past, the effective population size is simply *N*_{e}(0) = *N*. However, as time recedes into the past, the ancestral lineages become biased toward more fit configurations and are correspondingly more likely to coexist in the same configuration concurrently. This implies that the rate of coalescence increases with time. As *t*→∞, our results reduce to , the haploid version of the original background selection result.

We compare our result for *N*_{e}(*t*) in Equation 3 with forward-time simulations in Figure 3, as a function of genome size and recombination rate. Figure 3 illustrates the significant period of transition from the larger *N*_{e} in the recent past, prior to reaching the long-term result, which results in distortions in the shapes of genealogies. The agreement is generally good, but we note that for smaller recombination rates our analysis systematically underestimates *N*_{e}(*t*). This is a consequence of the deterministic approximation breaking down as the recombination rate decreases, leading to strong fluctuations in the population.

Our results differ from the classical background selection results by incorporating the transient period during which deleterious alleles may segregate in the population prior to being removed. The timescale of this transition period is, roughly, of order 1/*s* generations. In the deterministic regime, we have assumed that *N*_{e}*s* ≫ 1, such that this transition period is, by definition, short relative to the typical coalescence times. However, despite this, as seen in Figure 3, by accounting for this transition period we are able to capture a significant deviation from the classical result. This deviation is ultimately a primary source of the distortions we expect to see in genealogies, and thus our results are able to show how selection can lead to distortions in genealogies and in genealogical statistics and how these effects depend upon the parameters involved. However, we note that our analysis is restricted to addressing the distortions that arise due to this transient period—when the deterministic approximation breaks down, fluctuations in the population become significant and lead to further distortions, including topological distortions, which our analysis is not able to capture.

### Coalescence times and other single-site statistics

Our result for *N*_{e}(*t*) leads immediately to an expression for the distribution of times to the next coalescence event in a sample of size *n*,(4)We compare this prediction to forward-time simulations for a sample of two individuals in Figure 4; we see that there are significant distortions introduced by the time dependence of *N*_{e}(*t*), which lead to a nonzero peak in the distribution of coalescence times.

Using the distributions of coalescence times, we can calculate various statistics describing genetic diversity. For example, the distribution of pairwise heterozygosity at a neutral focal site is given by(5)In contrast, if the site is a selected site, then we have from Equation 2 that the probability the ancestor carries a deleterious mutation is . Backward in time, an individual carrying such a mutation will undergo a deleterious mutation from the nonmutant state at rate . Thus, the backward-in-time mutation rate is simply *μe*^{−}* ^{st}* (Nicolaisen and Desai 2012). Therefore, we can estimate that(6)Using similar logic, we can explicitly calculate more complex statistics, using neutral methods incorporating a time-varying population size. For example, the mean time to the most recent common ancestor may be calculated using Equation 4 of Austerlitz

*et al.*(1997), and arbitrary moments of the total branch length may be calculated using Equation 20 of Eriksson

*et al.*(2010). Similarly, if the focal site is a neutral site, the site frequency spectrum may be calculated using Equation 2 of Polanski and Kimmel (2003).

This approach is illustrated in Figure 5. Here, we consider the total lengths of branches ancestral to *i* individuals in a sample of size 10 (normalized by the length of branches ancestral to 1 individual). We compare forward-time simulations (represented by circles) with our theoretical result. For comparison, we also show the result expected for any fixed effective population size. We see that there is a noticeable deviation from the neutral expectation, characterized by an excess of rare branch lengths relative to more common branches.

If the focal site is a neutral site, mutations occur uniformly along the branch lengths. Thus, our result in Figure 5 would be directly analogous to the site frequency spectrum and implies an excess of rare alleles relative to common ones. In contrast, if the focal site is a selected site, deleterious mutations occur at a backward-in-time rate of *μe*^{−}* ^{st}*. In this case, deleterious mutations will be further biased toward recent branches, leading to an even more pronounced excess of rare alleles relative to common ones. To understand the expected frequency spectrum in this case, or to calculate any other complicated genealogical statistic, we can implement purely neutral and nonrecombining coalescent simulations that account for the effects of selection and recombination simply by using the appropriate

*N*

_{e}(

*t*).

### Incorporating a distribution of fitness effects

Our analysis can be easily extended to account for variation in recombination rates, mutation rates, and selection coefficients across the genome. Using the same logic described above, we find that in this more general case the time-dependent effective population size is given by(7)where *μ*(*x*_{i}) and *s*(*x*_{i}) are the mutation rate and the selection coefficient at site *x*_{i}, respectively, and *r*(*x*_{i}, *x*_{f}) is the total recombination rate between *x*_{i} and the focal site *x*_{f}. When *t* → ∞, this reduces to the classical background selection result (Hudson and Kaplan 1995b; Charlesworth 1996; Nordborg *et al.* 1996; Loewe and Charlesworth 2007).

A particularly interesting case is the situation where mutation and recombination rates are constant across the genome, but each selected site has a fitness effect drawn from some distribution *ρ*(*s*). In this case, we find(8)We note that this result assumes that lineages can be treated deterministically, which requires that no lineage become a significant fraction of the total population, and thus *N*_{e}*s* ≫ 1. Thus, we expect Equation 8 to hold only when the bulk of the mutations are either in this regime, *N*_{e}*s _{i}* ≫ 1, or nearly neutral,

*N*

_{e}

*s*≪ 1. Although our analysis is still reasonable when a small number of mutations exist in the intermediate regime, it requires that the bulk of the deviation from neutrality satisfies the strong-selection/recombination condition, such that these mutations of intermediate effect can be neglected. This issue was also addressed in earlier work considering a distribution of fitness effects (see,

_{i}*e.g.*, Nordborg

*et al.*1996).

Several recent studies have suggested that the distribution of deleterious fitness effects in humans and *Drosophila* may be characterized by a gamma distribution with shape parameter *β* < 1. For example, Keightley and Eyre-Walker (2007) estimated a shape parameter of *β* ∼ 0.2 for human populations and *β* ∼ 0.35 for *Drosophila*. Motivated by these findings, we compare our theoretical results in Equation 8 with forward-time simulations for two populations with gamma distributions of fitness effects, using shape parameters of 0.5 and 0.25, in Figure 6. For reference, we also show the theoretical result expected under a single-*s* case, where *s* is the mean fitness effect. We see that our results accurately characterize the time dependence of the effective population size under a distribution of fitness effects.

### Incorporating temporal variation in the population size

A key feature of our analysis is that, within this deterministic regime, the dynamics of ancestral lineages are independent of the population size. The implication of this is that the ancestral fitness distribution in Equation 2 is independent of the population size, and thus Equation 3 depends only upon an overall multiplication by *N*. This phenomenon was recently discussed in Zeng (2012) and was used as the basis for incorporating a changing population size into structured coalescent simulations. Similarly, we can incorporate a changing population size into our analysis, simply by replacing *N* with *N*(*t*) in our Equation 3.

We caution, however, that this framework implicitly assumes that the population remains in mutation–selection balance throughout its history and is characterized by the same mutation rates, recombination rates, and selection coefficients throughout. If these other parameters are also changing with time, this will not hold. Similarly, the strong-selection/recombination condition must hold throughout the timescale of coalescence.

To illustrate this, in Figure 7 we compare forward-time simulations for a population experiencing exponential growth *N*(*t*) = *Ne*^{−ℓ}* ^{t}* (with

*t*measured backward in time from the present) with our theoretical results from Equation 3. For reference, we include the predicted theoretical result in the absence of selection.

### Forward-time simulations

Our forward-time simulations are closely modeled off those of Zeng and Charlesworth (2011). Specifically, we simulate a haploid population of constant size *N*, with a genome of length *L*. Each generation, we introduce a Poisson-distributed number of new deleterious mutations uniformly throughout the population, with mean *NU*_{d}. We then simulate reproduction, introducing a Poisson-distributed number of recombination events uniformly throughout the population. For any nonrecombinant offspring, one ancestor is chosen, weighted according to its fitness. For recombinant offspring, two ancestors are chosen, again weighted according to their fitnesses. The appropriate number of breakpoints is randomly chosen along the genome, and one of the two resulting genotypes is randomly selected as the offspring. These steps are repeated each generation.

We note that one key difference between our simulations and those of Zeng and Charlesworth (2011) is that we allow multiple recombination events to occur within a single individual. This makes it possible to consider the *L* → ∞ limit, where multiple recombination events become common. We ran all simulations for a minimum of at least 10*N* generations to achieve equilibrium. When incorporating a distribution of fitness effects, we chose the fitness effect at each site according to the fitness distribution *ρ*(*s*). At least 10,000 trials were completed for each parameter regime.

## Discussion

Our analysis demonstrates that the effects of strong purifying selection and recombination can be summarized in terms of a time-dependent effective population size, *N*_{e}(*t*). This *N*_{e}(*t*) characterizes the distortions we expect to see in genealogies and can be used as the basis for quick and efficient methods to analyze single-site statistics. It illustrates how these statistics depend upon parameters such as the selection coefficient, position in the genome, and variation in the recombination rate.

Our results extend earlier work that summarized the effects of purifying selection and recombination by using a reduced but *constant* effective population size (Charlesworth *et al.* 1993; Hudson and Kaplan 1995b). The simplicity of this earlier result has made it broadly useful in interpreting patterns in sequence data—for example, in determining whether background selection can be responsible for observed relationships between diversity and local recombination rates in humans and *Drosophila* (Hudson and Kaplan 1995b). Our analysis represents the simplest analytical extension of this earlier work that can describe distortions from neutrality in addition to overall reductions in diversity.

By summarizing these effects in a time-dependent effective population size, our approach makes it possible to continue to use neutral methods for inference and estimation even in the presence of background selection, simply by allowing the population size to vary appropriately with time. For example, our results could be used in combination with a recent method developed by O’Fallon (2011) to incorporate a time-varying coalescent rate into genealogy samplers as a means to improve genealogical inference. This method considered a coalescence rate that declines linearly as time recedes into the past, but by instead incorporating the *N*_{e}(*t*) we have calculated here, we can incorporate background selection and recombination into this and other existing neutral methods in a principled way.

We note that the *N*_{e}(*t*) derived here is very similar to the purely nonrecombining case we analyzed in earlier work (Nicolaisen and Desai 2012). In both cases, *N*_{e}(*t*) begins at the actual population size in the present and then declines into the past before eventually reaching a long-term reduced size, *N*_{e}(∞). Although the explicit form of the *N*_{e}(*t*) depends on the recombination rate, its qualitative features are similar to and lead to similar distortions to the asexual case. This similarity suggests that the general conclusions of earlier analyses of background selection in the nonrecombining case may often be qualitatively robust to the presence of recombination.

The reasons for this qualitative similarity (and the extent to which it holds) remain an open question. One appealing possibility is that local, closely linked genomic regions can be treated as effectively nonrecombining blocks, while loci separated by larger distances can be treated as freely recombining. Thus, a recombining population would be analogous to an asexual population with a particular “block length”, which would depend upon the strength of selection and the recombination rate. However, there are several problems with this intuition. In particular, it is not clear that there is a sufficiently sharp transition between regions that are effectively nonrecombining and those that are effectively freely recombining. Furthermore, the size of the effective block length may typically depend upon other parameters (such as the sample size).

We can naively attempt to quantify this similarity between a recombining population and an asexual population: if we define an effective genome size *L*^{∗} (the “effectively nonrecombining block length”, such that ) and an effective selection strength *s*^{∗}, then by demanding that the long-term effective population size *N*_{e}(∞) and the typical transition time to this long-term result are equivalent between the two populations, we arrive atUsing these effective parameters, we find a close (though not identical) match between our results and those of the corresponding asexual model. However, it is not clear whether this rough equivalence or the effective parameters *L*^{∗} and *s*^{∗} have any predictive power beyond the statistics we have used here to define them. This remains an important topic for future work.

We note that our analysis rests on the assumption that mutation frequencies can be treated deterministically and that ancestral lineages can be treated independently. The implication of this latter assumption is that all pairs of lineages are considered independent and exchangeable, independent of the history of the sample. A consequence of this assumption is that, since all pairs of lineages are equally likely to coalesce, genealogies will be topologically neutral. As selection becomes weaker and these assumptions are violated, correlations between the lineages become important, topological distortions will arise, and our analysis breaks down. Furthermore, when this begins to occur, fluctuations in the sizes of lineages become very significant, and the deterministic assumption is also violated. Thus, methods capable of describing these fluctuations are required to fully understand the effects of purifying selection and recombination in the weak selection regime. Very little is currently known about this regime, which is an important direction for future work.

We note that recent work has begun to address these fluctuations and the effects of weak purifying selection, but only in the purely asexual case. For example, (O’Fallon *et al.* 2010) developed a semianalytical approach to understand these effects in the *Ns* ≈ 1 regime. Their approach is to divide the population into a continuous distribution of fitness classes and calculate a corresponding ancestral fitness distribution and, in turn, coalescent rate. An alternative approach by Good *et al.* (2013) has suggested that the effects of many weakly selected mutations on sequence diversity are identical to the effects of fewer strongly selected mutations, making it possible to “map” weakly selected populations onto their equivalent strongly selected counterparts. An important question for future work is whether these weak selection methods may be extended to include recombination. A detailed understanding of the “effective” similarity between asexual and recombining populations, hinted at by our results, may potentially provide a way forward, by suggesting that these new asexual methods might also apply in the presence of recombination, with a suitable reinterpretation of the parameters.

## Acknowledgments

We thank Benjamin Good for many useful discussions. This work was supported by the James S. McDonnell Foundation, the Harvard Milton Fund, and the Alfred P. Sloan Foundation. L.E.N. is supported by the Department of Defense through the National Defense Science and Engineering Graduate Fellowship Program. Simulations were run on the Odyssey cluster supported by the FAS Sciences Division Research Computing Group at Harvard University.

## Footnotes

*Communicating editor: Y. S. Song*

- Received May 12, 2013.
- Accepted June 20, 2013.

- Copyright © 2013 by the Genetics Society of America