Abstract
Self-fertilizing species often present lower levels of neutral polymorphism than their outcrossing relatives. Indeed, selfing automatically increases the rate of coalescence per generation, but also enhances the effects of background selection and genetic hitchhiking by reducing the efficiency of recombination. Approximations for the effect of background selection in partially selfing populations have been derived previously, assuming tight linkage between deleterious alleles and neutral loci. However, loosely linked deleterious mutations may have important effects on neutral diversity in highly selfing populations. In this article, I use a general method based on multilocus population genetics theory to express the effect of a deleterious allele on diversity at a linked neutral locus in terms of moments of genetic associations between loci. Expressions for these genetic moments at equilibrium are then computed for arbitrary rates of selfing and recombination. An extrapolation of the results to the case where deleterious alleles segregate at multiple loci is checked using individual-based simulations. At high selfing rates, the tight linkage approximation underestimates the effect of background selection in genomes with moderate to high map length; however, another simple approximation can be obtained for this situation and provides accurate predictions as long as the deleterious mutation rate is not too high.
- deleterious mutation
- genetic drift
- effective population size
- multilocus population genetics
- self-fertilization
UNDERSTANDING the evolutionary consequences of transitions between reproductive systems has been the focus of an important number of theoretical and empirical studies. In particular, the shift from biparental sexual reproduction to self-fertilization has occurred frequently in plants and animals (Goodwillie et al. 2005; Jarne and Auld 2006), but the phylogenetic distribution of selfing lineages suggests that these are often relatively short-lived and may thus correspond to an “evolutionary dead end” or “blind alley” (e.g., Stebbins 1957; Williams 1992; Takebayashi and Morrell 2001; Goldberg et al. 2010; Igic and Busch 2013). A possible reason for the lack of macroevolutionary success of selfing species may be their reduced capacity to produce novel genotypes (in particular, genotypes adapted to new environmental conditions), due to a reduced efficiency of recombination. Furthermore, self-fertilization lowers the effective size of populations and should thereby decrease the efficiency of natural selection against deleterious alleles, which may lead to mutation accumulation and population extinction (Lynch et al. 1995; Schultz and Lynch 1997). Analyses based on molecular data show little evidence for increased ratios of nonsynonymous to synonymous substitutions () in selfing lineages that would indicate a reduced efficiency of purifying selection (Glémin and Muyle 2014; Hartfield 2015 and references therein): this may be due to the recent origin of those lineages or to the low rates of outcrossing maintained by most predominantly selfing species (Wright et al. 2013). However, several recent studies showed elevated ratios of nonsynonymous to synonymous polymorphism (
) in various selfing species (compared with their outcrossing relatives), suggesting that deleterious alleles may reach higher frequencies in selfers (e.g., Brandvain et al. 2013; Burgarella et al. 2015; and other references listed in table 1 of Hartfield 2015).
The lower effective size of selfing populations has been demonstrated empirically using neutral diversity data from a variety of species (e.g., Charlesworth 2003; Glémin et al. 2006) and is thought to result from two types of effects. The first type is an automatic increase in the rate of coalescence per generation, since the ancestral lineages of the two homologous copies of a gene in an individual may coalesce in a single generation (with probability ) if this individual has been produced by selfing. Due to this effect, the effective population size is given by
(Pollak 1987; Nordborg 2000), where N is the census size and where the inbreeding coefficient F equals
in a population in which a proportion α of individuals are produced by selfing. Therefore,
is expected to decline linearly from N to
as α increases from 0 to 1. However,
may be further decreased (second effect) by selective sweeps or by selection against deleterious alleles (background selection) (Charlesworth et al. 1993; Charlesworth 2012), whose effects are amplified by the lower effective recombination rates of selfing populations.
Several models have computed the effect of background selection on neutral diversity in randomly mating populations, using different approaches (Hudson and Kaplan 1995; Nordborg et al. 1996; Santiago and Caballero 1998; Charlesworth 2012). These showed in particular that a deleterious allele at mutation–selection balance reduces the expected diversity at a linked neutral locus by a factor where u is the mutation rate toward the deleterious allele,
the heterozygous fitness effect of this allele (assumed different from zero), and r the recombination rate between the two loci. The case of partially selfing populations has been addressed by Nordborg (1997), using a structured coalescent model and a separation of timescales argument. Indeed, assuming that recombination and coalescence of lineages present in different individuals occur at a much lower rate than the coalescence of lineages present in the same individual due to selfing, the population can be described in terms of haplotypes instead of diploid genotypes, which considerably simplifies the analysis. Under this assumption, the effect of a deleterious allele on linked neutral diversity is given by a similar expression as in the panmictic case, replacing
by
(measuring the strength of selection against the deleterious allele in a partially selfing population) and r by the effective recombination rate
(see also Nordborg 2000). Extrapolating this result to the case of deleterious alleles segregating at many loci, Glémin (2007) and Glémin and Ronfort (2012) showed that the effective size of highly selfing populations may be strongly reduced by background selection effects.
Strictly, Nordborg’s (1997) result holds for tightly linked loci, since the separation of timescales argument supposes a low recombination rate. While the effective population size at a given locus should be little affected by loosely linked loci as long as the selfing rate remains moderate, this may be less so when the selfing rate is high, so that linkage disequilibria may extend over relatively large genetic distances (e.g., Nordborg et al. 2002). Using multilocus individual-based simulations of partially selfing populations, Kamran-Disfani and Agrawal (2014) observed discrepancies between the estimated and predictions obtained by extrapolating Nordborg’s (1997) result over a whole genetic map. These may be caused by the fact that the effects of loosely linked loci are not sufficiently well predicted by the separation of timescales approximation and become important at high selfing rates.
In this article, I construct a model of background selection in partially selfing populations by extending the multilocus population genetics framework previously developed by Barton and Turelli (1991) and Kirkpatrick et al. (2002). As we will see, a strength of this approach is that it allows one to decompose evolutionary processes (here the background selection effect) into different terms involving linkage disequilibria and other forms of genetic associations, for which intuitive interpretation can be given. Expressions valid for any value of the recombination rate are derived and shown to converge to Nordborg’s (1997) result when linkage is tight. However, this tight linkage approximation may significantly underestimate the strength of background selection when the selfing rate is high (but <1); we will see that another approximation yielding better predictions at high selfing rates can be obtained from the general model. A good match between the analytical predictions and multilocus simulation results is observed as long as the genomic deleterious mutation rate U is not too high (0.1 per haploid genome), while discrepancies appear at higher values of U: those are likely due to genetic associations between deleterious alleles at different loci, which are neglected in the analysis.
Model
General method
As in previous models of background selection (e.g., Hudson and Kaplan 1995; Nordborg et al. 1996), I first consider the effect of a single deleterious allele maintained at mutation–selection balance at a given locus on the dynamics of genetic diversity at a linked neutral locus. This effect can be quantified by computing the expected change in neutral diversity over one generation, which is affected by various moments of genetic associations between the two loci (e.g., the variance in linkage disequilibrium and other moments of associations between genes present either on the same haplotype or on different haplotypes of a diploid individual). Assuming recurrent mutation at the neutral locus, the effective population size at the neutral locus can be deduced from the expected neutral diversity at equilibrium (Nordborg et al. 1996). Alternatively, we may ignore mutation at the neutral locus and calculate
by equating the expected rate of loss in diversity per generation to
since diversity is eroded at a rate
per generation in a Wright–Fisher population. This is the approach that is used here. Strictly, it relies on a quasi-equilibrium approximation, since moments of genetic associations (for example, the variance in linkage disequilibrium) are expressed in terms of diversity at the neutral locus and of the frequency of the deleterious allele, implying that these moments of genetic associations equilibrate fast relative to changes in allele frequencies. However, this approximation is justified when population size is sufficiently large (so that changes in allele frequencies due to drift remain small) and yields the same expression for
as what would be obtained by calculating the equilibrium neutral diversity under recurrent mutation.
I consider the following life cycle: N individuals are present at the start of each generation and produce a very large (effectively infinite) number of juveniles in proportion to their fitness. A proportion α of offspring is produced by selfing, while the remaining is produced by random fusion of gametes. Finally, N individuals are sampled randomly among all juveniles produced, to form the next generation (drift). Fitness depends on genotype at the selected locus, where two alleles (denoted 0 and 1) are segregating: allele 1 is deleterious, reducing fitness by a factor
in the heterozygous state and
in the homozygous state. The deleterious mutation rate (from allele 0 to allele 1) is denoted u. As in previous treatments (Hudson and Kaplan 1995; Nordborg et al. 1996) I assume that
and
so that the frequency of the deleterious allele remains small and can be approximated by the deterministic mutation–selection balance frequency (strictly, this also assumes
where
is the frequency of the deleterious allele). Finally, r measures the recombination rate between the two loci.
In the following I use a general method to compute the effects of selection, reproduction, and drift on moments of genetic associations. This method is based on a previous formalism for the analysis of multilocus models (Barton and Turelli 1991; Kirkpatrick et al. 2002), extended to include genetic drift. It was used previously to study selection for sex in finite diploid populations undergoing both sexual and asexual reproduction (Roze and Michod 2010) and is described in Appendix A for the case of a partially selfing population. To simplify the notation, the examples shown in Appendix A concern the case of a biallelic neutral locus; however, the method extends to multiple alleles, yielding the same expression for the decay of neutral diversity per generation. The analysis of the two-locus model then proceeds in three steps. First, I express the expected decay of genetic diversity per generation in terms of various genetic moments involving both loci. Then, recurrence equations describing the dynamics of these two-locus moments are derived (to the first order in and assuming that the deleterious allele stays at low frequency). Finally, these equations are solved to obtain expressions for two-locus moments at (quasi-)equilibrium, in terms of diversity at the neutral locus and of the different parameters of the model. Injecting these solutions into the equation describing the decay of neutral diversity yields an expression for the effect of the deleterious allele on
at the neutral locus.
The results of this two-locus model are then extrapolated to a situation where deleterious alleles segregate at a large number of loci, located at various genetic distances from the neutral locus. For this, I assume that the effects of the different selected loci on diversity at the neutral locus are multiplicative, thereby neglecting genetic associations between selected loci. In the absence of epistasis between deleterious alleles, this approximation is expected to yield correct results under random mating in the regime considered here (where selection against deleterious alleles is stronger than drift, e.g., Hudson and Kaplan 1995) but may be less accurate under partial selfing, as inbreeding generates different forms of associations between loci (correlations in homozygosity in particular, e.g., Roze 2015). Nonetheless, we will see that this assumption of multiplicative effects often generates accurate predictions, as long as the genomic deleterious mutation rate is not too high.
Defining genetic associations
The parameters and variables of the model are summarized in Table 1. Throughout the following, the neutral locus is denoted A, while the selected locus is denoted B. Two alleles denoted 0 and 1 segregate at each locus (we will see below how the notation can be extended to deal with multiple neutral alleles), allele 1 at locus B being the deleterious allele. Indicator variables and
describe the genotype of an individual at locus i: these variables equal 1 if allele 1 is present on the maternally or paternally (respectively) inherited chromosome of this individual at locus i and 0 otherwise. The frequency of allele 1 at locus i (denoted
) is thus
(1)where
stands for the average over all individuals in the population. Neglecting drift, the frequency of the deleterious allele at mutation–selection balance (denoted
) is given by
(2)with
(e.g., Glémin 2003).
For each locus i, centered variables and
are defined as
(3)Following Kirkpatrick et al. (2002), the association between the sets
and
of loci present on the two haplotypes of the same individual is given by
(4)where
(5)(note that
), and where sets
and
may be the empty set
A, B, or
Associations between genes present on the same haplotype of an individual (
) are simply denoted
For example,
is a measure of the departure from Hardy–Weinberg equilibrium at locus A, while
represents the linkage disequilibrium between loci A and B (genetic association between alleles present on the same haplotype, maternal or paternal). Similarly,
measures the association between alleles at loci A and B present on different haplotypes of the same individual.
Because population size is finite, allele frequencies and genetic associations are random variables. Throughout this article, I use the notation for the expected value of the genetic moment
(a product of allele frequencies, genetic associations, or both) at a given generation: for example,
is the expected squared linkage disequilibrium between the two loci. In the following, the moment
will be of particular importance: indeed, using the fact that
and
(since these variables equal 0 or 1), we obtain that
(where
), thus representing the expected genetic diversity at locus A. Therefore, the effective population size at the neutral locus can be quantified by computing the rate of decay of
per generation.
As mentioned above, the model can be extended to an arbitrary number n of alleles segregating at the neutral locus. In this case, we can define indicator variables and
that equal 1 if the maternally (respectively, paternally) inherited chromosome of a given individual carries allele k at locus A (and 0 otherwise), with
Vectors
and
are defined as
and
in each of these vectors (and for a given individual) a single element equals 1 while all other elements equal zero. Finally, the vector
holds the frequencies of the different alleles at locus A in the population. Defining
and
genetic associations may be defined in the same way as above, associations with two “A” subscripts involving a dot product between the corresponding
vectors. In particular,
(6)where
is the frequency of allele k at locus A in the population. Similarly,
while
. Equation 6 shows that, as in the biallelic case,
represents the expected genetic diversity at the neutral locus. As mentioned earlier, the method for computing multilocus moments is explained in Appendix A for the case of a biallelic neutral locus, but the results shown below are valid for any number of alleles segregating at this locus.
Multilocus simulations
Analytical predictions are tested using individual-based, multilocus simulations. The simulation program (written in C++ and available from Dryad) is very similar to the program used in Roze (2015), representing a population of N diploid individuals whose genome consists of a linear chromosome with total map length R. Relatively large values of R (usually 10 M) are used in most simulations to mimic a whole genome with multiple chromosomes. Each generation, deleterious alleles occur at rate U per haploid genome; all deleterious alleles have the same selection and dominance coefficient and have multiplicative effects on fitness (no epistasis). As we will see, variable selection coefficients have been implemented in a different version of the program. Offspring are formed by selfing with probability α and by random fusion of gametes with probability A neutral locus with an infinite number of possible alleles is located at the midpoint of the chromosome, mutating at a rate
per generation. The program runs for
generations, genetic diversity at the neutral locus being recorded every 50 generations and measured as
(where
is the frequency of neutral allele i). The effective population size at the neutral locus is then estimated from
where
is the average neutral diversity at equilibrium, yielding
(7)(the exact expression including terms in
yields undistinguishable results for the parameter values used here). In the simulations,
is obtained by averaging after a burn-in period of 15,000 generations, which was sufficient for diversity to reach equilibrium with
As we will see, a different version of the program including a neutral sequence with an infinite number of sites was also used, in which case diversity takes longer to equilibrate.
Data availability
The author states that all data necessary for confirming the conclusions presented in the article are represented fully within the article. The Dryad DOI is doi:10.5061/dryad.p3r01.
Results
All results are obtained using the method presented in Appendix A for computing recursions for multilocus moments, implemented in a Mathematica notebook (Supplemental Material, File S1). All terms are derived to the first order in and to the first order in
(the frequency of the deleterious allele at mutation–selection balance, given by Equation 2).
General results
The expected change in genetic diversity at locus A per generation can be written as(8)where
is the change in diversity due to selection and
the change in diversity due to drift, given by
(9)to leading order. In the absence of selection,
= 0 while
at quasi-equilibrium is given by (from Equation A10)
(10)with
and the expected change in neutral diversity per generation thus becomes
(11)This corresponds to the classical result that
=
under partial selfing (Pollak 1987): the increased homozygosity caused by selfing amplifies the effect of drift, since the same allele is sampled twice every time a homozygote is sampled. When selection acts at locus B, we obtain the following expression for
to the first order in s:
(12)An expression to the second order in s is provided in Appendix B; however, both expressions generally yield very similar quantitative results, although adding the terms in
may slightly improve the predictions under loose linkage. The different terms that appear in Equation 12 may be interpreted as follows. From the definitions given in the previous section, the term
may also be written as
(where again
stands for the average over all individuals), thus measuring a covariance between
and
(note that
). The quantity
is higher in individuals carrying rarer alleles at the neutral locus: for example, in the case of a biallelic neutral locus, it equals
and
in 00, 01, and 11 individuals, where
is the frequency of allele 1. Furthermore, the quantity
is higher in individuals carrying more deleterious alleles at the selected locus (it is nearly 0, 1, and 2 in individuals carrying 0, 1, and 2 deleterious alleles at locus B, assuming
is small). Therefore, a positive value of
indicates that rarer alleles at the neutral locus tend to be found in individuals carrying higher numbers of deleterious alleles at the selected locus, while a negative value of
indicates the opposite. Recursions for the moments
and
over one generation are given in Appendix B, to the first order in s,
and
In the absence of selfing (
), we obtain that
at quasi-equilibrium, while
is generated by the variance in linkage disequilibrium
and by the effect of selection and is positive when
Indeed, when a given neutral allele becomes associated (by chance) to the deleterious allele at locus B, this neutral allele tends to decrease in frequency, generating a positive
(the deleterious allele at locus B tends to become associated with rarer alleles at locus A). This in turns reduces genetic diversity at the neutral locus (as shown by Equation 12), since these rarer alleles will further decrease in frequency due to their association with the deleterious allele. Because partial selfing generates cross-haplotype associations (between genes present on different haplotypes of a diploid),
and
are given by more complicated expressions when
involving moments such as
or
(see Appendix B).
Similarly, the quantity that appears on the second line of Equation 12 measures a covariance between
and
the quantity
being higher in homozygotes at locus B than in heterozygotes. A positive value of
thus indicates that rarer alleles at locus A tend to be found more often in homozygotes at locus B than in heterozygotes. As shown by Equation 12, this would reduce neutral diversity when the deleterious allele is partially recessive (
), since in this case homozygotes at the selected locus have a lower fitness than heterozygotes. A recursion for
is given in Appendix B (Equation B14); remarkably, it shows that identity disequilibrium (covariance in homozygosity) between the two loci generates negative
(that is, rarer alleles at locus A tend to be found more often in heterozygotes at locus B) even in the absence of selection. Indeed, setting
in Equation B14 yields at quasi-equilibrium
(13)where
is the identity disequilibrium between loci A and B (Weir and Cockerham 1969). It is given by
where
(14)is the probability of joint identity-by-descent at the two loci. This result may be interpreted as follows. Due to identity disequilibrium, the frequency of heterozygotes at locus A is higher among heterozygotes at locus B than among homozygotes. Furthermore, the frequency of rarer neutral alleles is higher among heterozygotes than among homozygotes at locus A (this is easily seen in the case of a biallelic locus): therefore, the frequency of rarer alleles at locus A should be higher among heterozygotes at locus B. As shown by Equation 12, this effect tends to increase neutral diversity, as long as the deleterious allele is partially recessive (
), so that heterozygotes at locus B have a higher fitness than homozygotes. As shown in the next subsection (high effective recombination), the effect of identity disequilibrium dominates over all other effects when the effective recombination rate
is sufficiently high [more precisely,
], in which case partially recessive deleterious alleles tend to increase neutral diversity (this effect usually stays rather small). However, weaker effective recombination increases the relative importance of the terms on the first line of Equation 12 that tend to decrease neutral diversity. Furthermore, weak effective recombination changes the sign of
(through the terms in s in Equation B14), due to the fact that the association between rarer alleles at locus A and the deleterious allele at locus B (represented by moments
,
) generates an association between those rarer alleles and homozygosity for the deleterious allele at locus B.
As shown in Appendix B, calculating the different terms of Equation 12 at quasi-equilibrium requires computing six two-locus moments that are generated by finite population size:
and
Recursions for these moments are also given in Appendix B. Although the solutions obtained are rather complicated, they are readily computed numerically using Mathematica (see File S2); furthermore, we will see that they can be approximated by simpler expressions in several cases. Importantly, all expressions obtained are in
and thus vanish when N tends to infinity.
Besides its effect on selection at locus B also affects the average excess homozygosity at locus A
and thus the term
in Equation 8. Because
is multiplied by
in Equation 9, it is sufficient to compute this moment in the limit as N tends to infinity, to obtain an expression for
to the first order in
Using the results in Roze (2015) (also derived in File S1), we have at quasi-equilibrium, to the first order in s and
(15)where again
is the identity disequilibrium between loci A and B. Indeed, homozygosity at locus A is reduced when the deleterious allele at locus B is partially recessive (
), due to the fact that homozygotes at locus A tend to be also homozygous at locus B, while homozygotes at locus B have a lower fitness than heterozygotes.
Putting everything together, we obtain an expression for the change in diversity at locus A over one generation of the form(16)where T is a function of s, h, r, and α. To the first order in
we thus have
(17)where
represents the effect of background selection. Although the expression obtained for T from the equations given in Appendix B is complicated, we will now see that simple approximations can be obtained in several regimes (in particular, high effective recombination, tight linkage, and high selfing).
High effective recombination (with partial selfing)
Under partial selfing and when the effective recombination rate is high (and assuming that the dominance coefficient h of the deleterious allele is significantly different from 0.5), Equation 12 is dominated by the term on the second line, since
is generated by drift even in the absence of selection and is thus proportional to
(see Equation 13), while the terms
and
on the first line are generated by selection and drift and are thus proportional to
Neglecting the term on the first line of Equation 12 and using Equations 9 and 15 yields the following expression for the change in neutral diversity, to the first order in s,
and
(18)Using Equation 2, we obtain for the effective population size at the neutral locus
(19)independent of s. Equation 19 shows that identity disequilibrium between the neutral and the selected locus (
) increases the effective population size at the neutral locus when the deleterious allele is partially recessive. This is caused by two effects: (i) identity disequilibrium reduces the excess homozygosity at locus A caused by selfing (Equation 15), since homozygotes at locus A tend to be also homozygous at locus B, while homozygotes at locus B have a lower fitness than heterozygotes when the deleterious allele is partially recessive; and (ii) the higher fitness of heterozygotes at locus B increases diversity at locus A since heterozygotes at locus B tend to be also heterozygous at locus A (Equation 13).
This increase in effective population size caused by identity disequilibrium usually stays modest, however (since it is expected to occur only for high effective recombination), and is thus difficult to observe in simulations. Background selection has stronger effects when the effective recombination rate becomes low, in which case the term in the first line of Equation 12 becomes of the same order of magnitude as the term on the second line [indeed, we can show that the denominator of
and
is proportional to ε when both
and s are of order ε], while the sign of
changes due to the effect of selection. Two approximations for this regime are given below (tight linkage, high selfing).
Random mating
In the absence of selfing (), Equations B4–B9 yield the following expressions for
and
at quasi-equilibrium,
(20)
(21)while
and
= 0. Furthermore, Equations B10–B14 yield
(22)while
and
= 0. Finally, the rate of decay of neutral diversity is given by
(23)Under tight linkage (i.e., both r and s are of order ε, where ε is a small term), we obtain from Equations 20–22
(24)
(25)giving
(26)in agreement with the results obtained by Hudson and Kaplan (1995) and Nordborg et al. (1996). Under loose linkage (
), Equations 20–23 yield
(27)Setting
in Equation 27 and replacing
by
yields
which is equivalent to Robertson’s (1961) heuristic result that selection at unlinked loci decreases the effective population size by four times the additive variance in fitness—indeed, the variance in fitness caused by selection against the deleterious allele is
(see also Charlesworth 2012).
Tight linkage
When r is small, Nordborg’s (1997) separation of timescales argument can be used to express associations between genes present on different haplotypes of a diploid in terms of associations between genes present on the same haplotype (see also Nordborg 2000; Padhukasahasram et al. 2008). Consider for example the association between two genes sampled at loci A and B from different haplotypes of an individual. Going backward in time, two different events may happen to the ancestral lineages of these genes: they may find themselves on the same haplotype (which may take only a few generations if both lineages stay in the same individual due to selfing) or move to different individuals due to an outcrossing event, in which case it will take a long time before they find themselves again in the same individual (assuming N is large). To leading order, the probability that these lineages join on the same haplotype before moving to different individuals is F. Considering now all possible pairs of genes at loci A and B on different haplotypes of the same individual (in all individuals of the population) and going backward in time, we may assume that a proportion F of such pairs find themselves on the same haplotype after a small number of generations, while the remaining
have moved to different individual lineages (and thus become independent): therefore,
—note that this approximation assumes tight linkage, as it neglects recombination events separating genes that have joined on the same haplotype, over the small number of generations considered. Using this approximation, we have
(28)Similarly,
(from Equation A9), which yields (assuming that
is small)
(29)Finally,
from which we obtain (using the fact that the moment
is negligible, as shown in File S1)
(30)Equations 28–30 can also be obtained from Equations B4–B14, under the assumption that r is small (see File S1). Plugging Equations 28–30 into Equation 12 yields
(31)Furthermore, plugging Equations 29 and 30 into Equations B4 and B10 and assuming that r is small yields the same expressions as Equations 24 and 25 (obtained for random mating) for
and
replacing h by
r by
and N by
Therefore, the present model converges to Nordborg’s (1997) result when loci are tightly linked.
Figure 1A shows the decrease in effective population size at the neutral locus caused by the deleterious allele ( see Equation 17), as a function of the recombination rate between the two loci (on a log scale). When r tends to zero,
tends to
and selfing increases
(as it decreases the frequency of the deleterious allele). When r is sufficiently high, however (∼ >0.01 for the parameter values used in Figure 1), increased selfing causes stronger background selection. Under complete selfing (
),
becomes independent of r and is
As can be seen in Figure 1A, the tight linkage approximation accurately predicts the solution obtained from Equations B4–B14, except when r and α are high (in which case it underestimates the strength of background selection). As we will see, this discrepancy for high values of r may lead to substantial differences when integrating over a large genetic map, as the majority of deleterious alleles are only loosely linked to the neutral locus, yet significantly affect
at this locus when selfing is high (this is more visible in Figure 1C, showing the strength of background selection as a function of the position of the deleterious allele along the genetic map). Note, however, that the tight linkage approximation yields the same prediction as the more general model when
(
).
Background selection effect generated by a single deleterious allele ( see Equation 17) on
at the neutral locus, as a function of the recombination rate r between the two loci (A and B) and of the position of the deleterious allele along a linear chromosome of total map length 10 M (C and D) (the neutral locus is located at position 0). Different colors correspond to different values of the selfing rate as indicated in A. Solid curves correspond to the results obtained from the equations in Appendix B, dashed curves in A and C correspond to the tight linkage approximation [Equation 26, replacing h by
and r by
], and dotted curves in B and D correspond to the high selfing approximation (Equation 34). Parameter values:
High selfing
Simple approximations can also be obtained when the selfing rate is high, for any value of the recombination rate r. From Equations B4–B9 and assuming that α is close to 1, we obtain(32)Furthermore, Equations B10–B14 yield
(33)which, using
generates the following approximation for the rate of decay of neutral diversity:
(34)Note that Equation 34 again yields
when
As shown in Figure 1D, this approximation better matches the general model than the tight linkage approximation when the selfing rate is high (but <1) and when linkage is loose. However, Figure 1B shows that it differs substantially from the general model under tight linkage and is thus expected to perform poorly when the selfing rate is not high (since in this case background selection is mainly caused by deleterious alleles that are tightly linked to the neutral locus).
Multilocus extrapolation and simulations
Following previous work (e.g., Hudson and Kaplan 1995; Nordborg 1997; Glémin 2007; Kamran-Disfani and Agrawal 2014), results from the two-locus model may be extrapolated to the case of deleterious alleles segregating at multiple loci by assuming multiplicative effects of the different deleterious alleles on neutral diversity. When the neutral locus is located at the midpoint of a linear genome with total map length R, and when all deleterious alleles have the same selection and dominance coefficients (as in the simulation program), this yields (using Equation 2)(35)where U is the deleterious mutation rate per haploid genome and
corresponds to the term T in Equation 16, in which the recombination rate r is expressed in terms of the genetic distance x (in morgans) between the neutral locus and the deleterious allele, using Haldane’s mapping function
(Haldane 1919). This integral can be computed numerically as shown in File S2.
Figure 2 shows the effective size of a population of census size 20,000 as a function of the selfing rate, for a haploid genomic deleterious mutation rate and genome map length
M. As can be seen in Figure 2, the analytical model (solid curves) fits well with the simulation results for all values of α. Note that in Figure 2, Figure 3, Figure 4, and Figure 5, solid curves have been obtained by calculating numerically the integral in Equation 35, using a system of recursions expressed to the second order in s for the different moments shown in Appendix B (see File S1 and File S2); however, the results obtained using the recursions given in Appendix B (expressed to the first order in s) are undistinguishable in most cases (results not shown). Explicit forms can be obtained for the integral in Equation 35 (as a function of R and the different parameters of the model), when using either the tight linkage approximation or the high selfing approximation, and are given in File S2. The results shown in Figure 2 confirm that the tight linkage approximation underestimates the effect of background selection when the selfing rate is high, as loosely linked deleterious alleles significantly affect neutral diversity. In this case, the high selfing approximation is more accurate, closely matching the simulation results when
Effective population size as a function of the selfing rate α, for different values of the dominance coefficient of deleterious alleles h. Solid circles show simulation results (in this and subsequent figures, error bars are smaller than the size of the solid circles), solid curves show analytical predictions from the complete model (see File S2), dashed curves show the tight linkage approximation [Equation 26, replacing h by
and r by
], and dotted curves show the high selfing approximation (Equation 34). Parameter values:
Effective population size as a function of the selfing rate α, for different values of genome map length R: 0.1 (red), 1 (orange), 10 (green), and 100 (blue). Solid circles show simulation results, solid curves show analytical predictions from the complete model, dashed curves (left) show the tight linkage approximation, and dotted curves (right) show the high selfing approximation. Parameter values:
Effective population size as a function of the selfing rate α, for different values of the strength of selection against deleterious alleles s: 0.005 (blue), 0.05 (green), and 0.5 (red). Solid circles show simulation results, solid curves show analytical predictions from the complete model, dashed curves (left) show the tight linkage approximation, and dotted curves (right) show the high selfing approximation. Parameter values:
Effective population size as a function of the selfing rate α, for different values of the dominance coefficient of deleterious alleles h. Circles and curves are the same as in Figure 2 with
Figure 3 and Figure 4 show that the full model provides accurate predictions for for different values of map length R (from 0.1 to 100) and strength of selection against deleterious alleles s (from 0.005 to 0.5). As expected, the tight linkage approximation works better at lower values of R. When the selfing rate is low, increasing s magnifies the strength of background selection, while it has the opposite effect under high selfing (due to the fact that mutation-free individuals are more abundant when selection against deleterious alleles is stronger). Finally, Figure 5 shows that increasing the deleterious mutation rate U (to 0.5 per haploid genome) increases the strength of background selection, leading to very low values of
at high selfing rates (
estimated from the simulations is close to 45 when
). When
discrepancies between the full model and the simulations appear at low values of h (
in particular) and intermediate values of α: these are probably caused by genetic associations between selected loci (identity disequilibrium in particular), which are neglected in the analysis. Using an expression for the frequency of deleterious allele
that takes into account the effect of identity disequilibria (Roze 2015) does not significantly improve the results (not shown). Discrepancies also appear at higher values of h and high selfing rate (
0.8–0.9), the analytical model underestimating the strength of background selection. Finally, the model overestimates the effect of background selection when the selfing rate is very high (
), for all values of h: although this is not visible in Figure 5 (as
is very small when α is high), it becomes apparent when
is plotted on a log scale, as shown in Figure S1. For these parameter values (
), selection against deleterious alleles is no longer efficient and these accumulate over time in the population (results not shown).
Equation 35 can be extended to the more realistic case where deleterious alleles at different loci have different fitness effects (assuming that selection remains sufficiently strong at most loci so that deleterious alleles stay near mutation–selection balance), by replacing s and h by functions of map position x (e.g., Charlesworth 2012). If the number of deleterious loci is large and if the distribution of fitness effects of mutations does not depend on genomic location, the integral in Equation 35 can be replaced by a double integral, over map position x and over the joint distribution of s and h. To test this, the simulation program was modified to include a log-normal distribution of selection coefficients s across loci, assuming a constant heterozygous effect of mutations (), generating a negative covariance between s and h (see figure 5 in Roze 2015). Figure 6 shows results for
and for different values of the variance of deleterious effects of mutations across loci, setting the average of the log-normal distribution and the heterozygous effect of mutations so that
and
in all cases. As can be seen in Figure 6, increasing the variance of s slightly increases
and this effect is captured by integrating Equation 34 (high selfing approximation) over the distribution of s.
Left: effective population size as a function of the standard deviation σ of
across loci, assuming a log-normal distribution of s and setting the average of
so that
for all values of σ. Solid curve shows prediction obtained by integrating Equation 34 over the distribution of s and over the genetic map; solid circles show simulation results. Parameter values:
In the simulations the heterozygous fitness effect
is the same for all mutations and adjusted so that
for all values of σ. Right: the distribution of s for
(solid curve), 0.5 (dashed curve), and 1 (dotted curve).
Temporal change in population size
It is worth emphasizing that the moment-based method presented in this article yields an expression for the “inbreeding effective size” (as it is based on the rate of decay of neutral diversity or the instantaneous rate of coalescence), while coalescent-based methods (e.g., Hudson and Kaplan 1995; Nordborg 1997) yield an expression for the “coalescence effective size,” corresponding to half the average coalescence time between two sequences randomly sampled from the population. After a single change in population size, and if selection and recombination rate are sufficiently large relative to the different moments computed in Appendix B should equilibrate quickly relative to the change in neutral diversity, and the inbreeding effective population size should thus rapidly converge to its equilibrium value for the new census population size. By contrast, expected coalescence times will take longer to equilibrate. Assuming that the inbreeding effective size instantaneously reaches its new equilibrium value, the expected coalescence time t generations after the change is given by
(36)where
is the new inbreeding effective size and
the average coalescence time at the time of the change in population size. To test this prediction, the simulation program was modified to include a neutral sequence with an infinite number of sites at the midpoint of the chromosome. Indeed, the expected coalescence time between two sequences is simply given by
where μ is the mutation rate within the sequence and D the average number of differences between two sequences, given by
(where the sum is over all segregating neutral sites within the sequence). Figure 7 shows an example in which
during the first
generations (starting from a monomorphic population, which is equivalent to the coalescence of all lineages at generation zero), while
during the last
generations. As can be seen in Figure 7, the quasi-equilibrium argument leading to Equation 36 correctly predicts the dynamics of
both during the initial phase and after the change in population size. The quasi-equilibrium argument could also possibly be used in situations where population size changes continuously over time (i.e., exponential population growth), although this was not explored here.
Average coalescence time between two randomly sampled sequences at a neutral locus located at the midpoint of the chromosome, as a function of time. The population is monomorphic at time zero (all lineages coalesce), during the first
generations, and
during the last
generations. Parameter values:
Dashed curve shows prediction from Equation 36, where the inbreeding effective size is obtained from Equation 35 and assumed to reach instantaneously its equilibrium value for a given N. Solid curve shows simulation results (average over 4000 replicate simulations), where
is estimated from the diversity of a neutral sequence with an infinite number of sites (see text) and mutation rate
per generation.
Discussion
We have seen how multilocus population genetics theory can be used to express the effect of a deleterious allele at mutation–selection balance on the dynamics of diversity at a linked neutral locus in terms of moments of linkage disequilibrium and other genetic associations between these two loci. This provides an alternative to methods based on computing expected coalescence times of pairs of genes present on different types of genetic backgrounds (e.g., Nordborg 1997; Agrawal and Hartfield 2016) and allows one to decompose the background selection effect into different terms for which intuitive interpretation may be given. In a panmictic population, background selection is driven by the variance in linkage disequilibrium between the two loci: neutral alleles that become associated to the deleterious allele tend to decrease in frequency, eventually causing their loss from the population. Under partial selfing, background selection is reinforced by random associations between a given neutral allele and selected alleles present on the other haplotype of the same individual (
) and associations between neutral alleles and homozygosity at the selected locus (
). Furthermore, we have seen that identity disequilibrium (correlation in heterozygosity between loci) has the opposite effect and enhances neutral diversity when the deleterious allele is partially recessive, but this effect usually stays modest.
Under tight linkage, the results converge to the approximation derived by Nordborg (1997), using a separation of timescales argument. In that case, the population behaves approximately as a panmictic population of size in which the dominance coefficient of the deleterious allele and the recombination rate are replaced by the effective parameters
and
Similarly, Agrawal and Hartfield (2016) showed that in a population reproducing sexually at rate σ and asexually (by mitosis) at rate
the effect of a deleterious allele located at a small recombination distance r from the neutral locus takes the same form as in a panmictic (fully sexual) population, replacing r by the effective recombination rate
(as shown in Appendix C, the effect of partial asexuality can also be derived using the methods of the present article). However, Agrawal and Hartfield (2016) also showed that this tight linkage approximation underestimates the strength of background selection caused by loosely linked loci, which becomes important when sex is rare. We have seen that a similar result holds under partial selfing: at high selfing rates, neutral diversity may be significantly affected by loosely linked deleterious alleles, whose effect is underestimated by the tight linkage approximation. In that case, a more precise approximation is provided by Equation 34 above.
As mentioned previously (Nordborg 1997), assuming multiplicative effects of deleterious alleles at different loci on neutral diversity (Equation 35) should be less accurate under partial selfing than under random mating, since inbreeding generates different forms of genetic associations between those loci (e.g., Kamran-Disfani and Agrawal 2014; Roze 2015). Nevertheless, we have seen that Equation 35 yields accurate predictions as long as the genomic deleterious mutation rate stays moderate ( Figure 2, Figure 3, and Figure 4), so that the average number of deleterious alleles per genome is not too large. Another important assumption of the model is that selection against deleterious alleles is sufficiently strong relative to drift, so that these alleles are maintained near their deterministic mutation–selection equilibrium frequency. In the regime where
or lower, sometimes called “interference selection” or a “weak Hill–Robertson interference” regime (e.g., McVean and Charlesworth 2000; Comeron and Kreitman 2002; Good et al. 2014), background selection models assuming
may overestimate the effect of deleterious alleles on neutral diversity by several orders of magnitude (Kaiser and Charlesworth 2008; Good et al. 2014). This may possibly be due to negative linkage disequilibria between deleterious alleles generated by the Hill–Robertson effect (Hill and Robertson 1966), reducing the variance in fitness in the population and thus increasing coalescence times. As illustrated in Figure S1, this regime may be particularly important in highly selfing populations and when the deleterious mutation rate U is high, since
may be sufficiently reduced to affect the efficiency of selection at an important proportion of selected sites. This is confirmed by empirical observations of higher ratios of nonsynonymous to synonymous polymorphism (
) in selfing lineages than in their outcrossing relatives (Glémin and Muyle 2014; Hartfield 2015 and references therein), indicating that a significant fraction of deleterious alleles may increase in frequency due to drift (in which case the assumption that
at most loci is not valid). It is thus important to keep in mind that background selection models such as the one presented here overestimate the reduction in
in such situations, and it would be interesting (although probably challenging) to obtain analytical predictions for neutral diversity in populations undergoing low rates of sex or high selfing rates and in which selection against a high proportion of deleterious mutations is rendered ineffective.
More generally, the strong reduction in effective population size of highly selfing (or asexual) populations caused by background selection may have an important influence on stochastic processes that would play a more marginal role in populations with larger For example, identity disequilibrium between selected loci in a partially selfing population generates positive linkage disequilibrium between these loci, but as shown by Kamran-Disfani and Agrawal (2014), stochastic forces generating negative linkage disequilibrium become stronger than this deterministic effect when the selfing rate is high. In a similar way, deterministic forces acting on the evolution of recombination rates (e.g., Roze and Lenormand 2005) or mutation rates (e.g., Lynch 2010) may be overwhelmed by stochastic forces under strong inbreeding. Developing analytical models that could scale these different types of effects may thus help us to better understand how mating systems affect the evolution of genetic architecture. The methods presented in this article could possibly be used to explore such questions.
Acknowledgments
I thank the bioinformatics and computing service of the Roscoff Biological Station for computing time and two anonymous reviewers for helpful comments. This work was supported by the French Agence Nationale de la Recherche (project TRANS, ANR-11-BSV7-013 and project Clonix, ANR-11-BSV7-007).
Appendices
Appendix A: Deriving Recursions for Multilocus Moments
In the following, I use the notation for genetic associations measured at the next generation, while
and
represent associations measured among juveniles (after reproduction, before drift) and among parents after selection (that is, weighting each parent by its relative fitness). Therefore, selection changes associations
to
recombination and fertilization (with partial selfing) change associations
to
and drift changes associations
to
Similarly,
denotes allele frequencies at the next generation, while
represents allele frequencies among juveniles (which are the same as among selected parents, as recombination and fertilization do not change allele frequencies). In the following I show how to derive equations representing these different steps. For this, I focus on examples rather than presenting general (and necessarily cumbersome) equations; however, general expressions are implemented in a Mathematica notebook (File S1). Throughout this article, I assume that selection is weak (s small) and that population size is sufficiently large so that
however, no assumption is done on the relative orders of magnitude of r and s. Finally, I assume that
so that the frequency of the deleterious allele at mutation–selection balance (
given by Equation 2) is small.
Drift
To illustrate the effect of drift on genetic moments, consider the expected diversity at the neutral locus, By definition,
where
is the average over all individuals of the next generation, while
is the frequency of allele 1 at locus A among these individuals. Writing
the change in allele frequency due to drift, we have
(A1)Expanding and grouping terms in
this is
(A2)In the following, I use the notation
for moments measured among individuals of the next generation (after drift), but using the values of allele frequencies (called “reference values” in Kirkpatrick et al. 2002) before drift (among juveniles): in particular,
while we have
Noting that
Equation A2 can be written as
(A3)Next, we can note that products of genetic associations may also be viewed as associations between genes present in two individuals, sampled with replacement from the whole population: for example,
is the association between one gene at locus A from a first individual and one gene at locus A from a second individual sampled with replacement from the population at the next generation (and using allele frequencies among juveniles as reference values). Following previous works (Roze and Rousset 2008; Roze 2009; Roze and Michod 2010), I thus write such products as single associations, using the symbol
to separate sets of genes present in different individuals sampled with replacement from the population. Using this notation,
is written as
and Equation A3 becomes
(A4)Equation A4 corresponds to the first step of the computation of the effect of drift on genetic moments (i.e., taking into account changes in allele frequencies due to drift). The second (and last) step consists of expressing associations between genes present in different individuals sampled with replacement from the population (at the next generation) in terms of associations between genes in individuals sampled without replacement. For example, we have
(A5)where
is the association between two genes at locus A from two individuals sampled without replacement, at the next generation (and using allele frequencies before drift as reference values). Indeed, two genes sampled with replacement from the population may be the same gene with probability
be the two homologous copies of the same individual with probability
or come from two different individuals with probability
Finally, because the different individuals of the next generation have been sampled independently from an infinite population of juveniles, associations between genes present in different individuals (and using as reference values allele frequencies among juveniles) are the same as associations measured among juveniles: in particular,
and
Noting that
[indeed, because the population of juveniles is infinite, we have
while
from the definition of genetic associations], we finally obtain
(A6)representing the effect of drift on the moment
Because all results are computed to the first order in
it is sufficient to express the moments that are multiplied by
in Equation A6 in the limit as N tends to infinity.
The same method can be used to compute the effect of drift on other genetic moments. For example, using the same reasoning as for deriving Equation A4 above, we obtain for the expected squared linkage disequilibrium(A7)while computing these moments in terms of associations among juveniles yields
(A8)Again, because all results are computed to the first order in
it is sufficient to compute the four moments within parentheses in Equation A8 in the limit as population size tends to infinity [we will see below that the moments
and
become negligible in this limit, while the moment
is generated by selfing].
Equation A8 shows that genetic associations with repeated “B” indices (such as ) may appear within recursions (see also Equation A12). Because locus B is biallelic, these repeated indices can be eliminated using the relation (e.g., equation 5 in Kirkpatrick et al. 2002)
(A9)where
is any set of loci, and i is a biallelic locus. However, repeated “A” indices will not be eliminated when computing recursions, so that the equations obtained still hold for the case where more than two alleles segregate at the neutral locus.
Recombination and fertilization
Computing moments measured among juveniles in terms of moments measured among selected parents is somewhat simpler, as recombination and fertilization do not change allele frequencies: therefore, we only have to consider the different possible modes of transmission of genes between generations. In particular, while
(A10)Indeed, the two homologous genes of a juvenile at locus A come from the same parent if this juvenile has been produced by selfing (probability α), in which case they are copies of the same parental gene with probability 1/2, while they come from the two homologous genes of the parent with probability
With probability
the juvenile has been produced by outcrossing, in which case its two homologous genes are sampled with replacement from the parental population: the association thus becomes
which equals zero (since
). Similarly, we have
(A11)while
(A12)where
is given by Equation A11.
Selection
As for the effect of drift, computing the effect of selection on genetic moments can be decomposed into two steps (see also Barton and Turelli 1991; Kirkpatrick et al. 2002). The first step consists in taking into account the change in reference values (i.e., allele frequencies that appear within associations) due to selection: denoting moments measured among selected parents, but using allele frequencies before selection (
) as reference values (instead of allele frequencies after selection
), we have (following the same reasoning as for the derivation of Equations A4 and A7)
(A13)while
(A14)Finally, computing associations measured after selection (using as reference values allele frequencies before selection)
in terms of associations measured before selection
is done by weighting each individual by its relative fitness
(A15)where E is the average over all individuals before selection, W is the fitness of the individual, and
is the mean fitness of the population. To express the right-hand side of Equation A15 in terms of genetic associations, it is useful to write
in terms of
variables (e.g., Barton and Turelli 1991; Kirkpatrick et al. 2002). The fitness of an individual can be written as
(A16)which, after rearranging and averaging over all individuals, yields
(A17)with
Equation A17 thus takes the form of a polynomial of
variables. However, because terms in
and
appear in the denominator, obtaining expressions in terms of moments of genetic associations and allele frequencies requires developing
as a Taylor series in s. To the first order in s, we have
(A18)From this, we obtain, to the first order in s,
(A19)Given that moments
and
equal zero at quasi-equilibrium to the first order in
and
(see File S1), Equation A19 yields Equation 12 in the main text. Associations between genes present in different individuals are obtained similarly. For example,
equals zero to the first order in s (because
), while Equation A18 yields the following expression to the second order in s:
(A20)Equations A19 and A20 illustrate the fact that computing the effect of selection on a given genetic moment introduces more complicated moments, with a higher number of B indices. This may lead to an infinite system of recursions, as the effect of selection on these moments will introduce yet other moments with even more B indices. However, assuming
and
(so that the frequency of the deleterious allele at mutation–selection balance,
remains small), we may neglect moments that are proportional to
—more precisely, moments that are
—to obtain expressions to the first order in
As we will see, using this “rare allele approximation” yields closed systems of recursions for genetic moments.
Rare allele approximation
Deriving expressions to the first order in can be done using the following general rule. Because all genetic associations involving at least one B index are proportional to
moments involving two elements with a B index (where an element is either an allele frequency
or an association
among genes present in the same individual) are of order
in the limit as population size tends to infinity. For example,
and
are all of order
as N tends to infinity, while
and
are of order
in the same limit. Now, taking the effect of drift into account when computing recursions for genetic moments generally introduces moments with one less element carrying a B index, multiplied by
This is illustrated by Equation A8 showing the effect of drift on the moment
(two elements with a B index): drift introduces two terms involving moments carrying a single element with a B index (
and
). Because these moments are multiplied by
in Equation A8, it is sufficient to express them in the limit as N tends to infinity: in this limit, these terms are thus proportional to
(by contrast, the terms in
and
in Equation A8 are proportional to
and may thus be neglected). More generally, moments carrying x elements with a B index are thus of order
in the limit as N tends to infinity and of order
to the first order in
(for
). To compute expressions to the first order in
we may thus neglect all moments that must be expressed to the first order in
and that carry more than two elements with a B index (such as
) and all moments expressed in the limit as N tends to infinity carrying more than one element with a B index (such as
). Using this approximation, Equation A20 simplifies to
(A21)Furthermore, we obtain for the moments
and
(that are needed to compute a recursion for
, as shown by Equation A11), to the first order in s



Appendix B: Recursions for Two-Locus Moments
Using the method shown in Appendix A, we obtain for the change in neutral diversity during selection, to the second order in s,(B1)while the change in neutral diversity during drift is given by Equation 9 in the main text. The same method can be used to compute recursions for the different moments that appear in Equation B1, from which solutions at quasi-equilibrium can be obtained. For example, Equations A8, A11, and A22–A24 yield the following recursion for
(B2)To obtain an expression to the first order in
it is sufficient to express
and
in Equation B2 in the limit when N tends to infinity. Using Equation A9, we have
Furthermore,
equals zero at quasi-equilibrium in an infinite population (indeed, the solution obtained for
from the equations below is in
): therefore,
when population size tends to infinity. From Equation A12, a recursion for
when N tends to infinity and
is given by
(B3)Using
and
yields
where
is given by Equation 14, and corresponds to the probability of joint coalescence of two pairs of genes sampled at loci A and B due to selfing, in an infinite population. It is possible to compute
to the first order in s, but the term in s is always negligible when selection is weak and is thus ignored here. Using the expressions just derived for
and
Equation B2 becomes
(B4)Similarly, we obtain the following recursions for moments
and
(which are also generated by finite population size), to the first order in s,
and
(recursions to the second order in s are derived in File S1, but yield very similar quantitative results in most cases):
(B5)
(B6)
(B7)
(B8)
(B9)Equations B4–B9 can be solved at quasi-equilibrium (setting
for each moment) to obtain expressions for the six moments of the form
where Z is a function (that differs for each moment) of s, h, r, and α. Although these expressions are complicated, they are easily computed numerically using Mathematica (see File S2).
Recursions for the moments and
that appear on the first line of Equation 12 are given by (to the first order in s,
and
)
(B10)
(B11)It is sufficient to express the moment
that appears in Equations B10 and B11 in the limit as N tends to infinity. To the first order in s, we have
(B12)giving at quasi-equilibrium
(B13)(see also Roze 2015), where
is the identity disequilibrium between loci A and B. Finally, a recursion for
(second line of Equation 12) to the first order in s,
and
is given by
(B14)Equations B10, B11, and B14 can be solved to obtain expressions for
and
at quasi-equilibrium (using the expressions for
and
obtained from Equations B4–B9); again, Mathematica commands to obtain numerical solutions can be found in File S2. Approximations for high selfing are obtained by assuming that s and
are of order ε and expressing Equations B4–B14 to the first order in ε. Similarly, tight linkage approximations are obtained by assuming that s and r are of order ε (see File S1).
Appendix C: Partial Asexuality
Background selection under partial asexuality in diploids has been explored recently by Agrawal and Hartfield (2016), using coalescence models. The method presented in Appendix A can also be used for the case of a population in which a proportion σ of offspring are produced by sexual reproduction each generation (with random gamete fusion) and a proportion by asexual reproduction (mitosis), as shown in Roze and Michod (2010). In particular, we obtain for the change in neutral diversity over one generation (to the second order in s and assuming that h is significantly different from zero)
(C1)Recursions for the different moments that appear in Equation C1 are given by [to the first order in
and
]
(C2)
(C3)
(C4)
(C5)
(C6)Expressions for
and
at quasi-equilibrium can be obtained from Equations C2–C6 and plugged into Equation C1. When the rate of sex σ is small, this yields equations 5 and 6 in Agrawal and Hartfield (2016).
Footnotes
Communicating editor: N. A. Rosenberg
Supplemental material is available online at www.genetics.org/lookup/suppl/doi:10.1534/genetics.116.187955/-/DC1.
- Received February 4, 2016.
- Accepted April 7, 2016.
- Copyright © 2016 by the Genetics Society of America