## Abstract

In small populations, genetic linkage between a polymorphic neutral locus and loci subject to selection, either against partially recessive mutations or in favor of heterozygotes, may result in an apparent selective advantage to heterozygotes at the neutral locus (associative overdominance) and a retardation of the rate of loss of variability by genetic drift at this locus. In large populations, selection against deleterious mutations has previously been shown to reduce variability at linked neutral loci (background selection). We describe analytical, numerical, and simulation studies that shed light on the conditions under which retardation *vs.* acceleration of loss of variability occurs at a neutral locus linked to a locus under selection. We consider a finite, randomly mating population initiated from an infinite population in equilibrium at a locus under selection. With mutation and selection, retardation occurs only when *S*, the product of twice the effective population size and the selection coefficient, is of order 1. With *S* >> 1, background selection always causes an acceleration of loss of variability. Apparent heterozygote advantage at the neutral locus is, however, always observed when mutations are partially recessive, even if there is an accelerated rate of loss of variability. With heterozygote advantage at the selected locus, loss of variability is nearly always retarded. The results shed light on experiments on the loss of variability at marker loci in laboratory populations and on the results of computer simulations of the effects of multiple selected loci on neutral variability.

- associative overdominance
- background selection
- deleterious mutations
- heterozygote advantage
- neutral variability

THERE has recently been much interest in the effects of selection at one locus on patterns of evolution and variation at linked neutral or nearly neutral loci, with mounting evidence for such effects from surveys of genome-wide patterns of molecular variability and evolution (Cutter and Payseur 2013; Neher 2013; Charlesworth and Campos 2014). Attention has especially been directed at the possibility of enhanced neutral variability at nucleotide sites that are closely linked to sites under long-term balancing selection (Charlesworth 2006; Gao *et al.* 2015) and at the reduction in variability caused by the hitchhiking effects of directional selection, involving either positive selection (selective sweeps) (Maynard Smith and Haigh 1974) or negative selection (background selection) (Charlesworth *et al.* 1993). There seems little doubt that such linkage effects play an important role in shaping patterns of variability across the genome (Cutter and Payseur 2013; Charlesworth and Campos 2014).

Recent studies have, however, made little or no reference to classical work on associative overdominance (AOD), dating back to >40 years ago. Following a proposal by Frydenberg (1963), who coined the term, it was shown by Sved (1968, 1971, 1972) and by Ohta and Kimura (1970; Ohta 1971, 1973) that linkage disequilibrium (LD) between a polymorphic neutral locus and a locus subject either to selection in favor of heterozygotes or to selection against recessive or partially recessive mutant deleterious alleles could result in apparent heterozygote advantage at the neutral locus. This is because such LD, generated by genetic drift in a randomly mating finite population, leads to an association between the homozygosities of the two loci (Haldane 1949). If homozygosity at the selected locus results in reduced fitness, homozygotes at the neutral locus will appear to be at a selective disadvantage. A heuristic treatment of this effect is given by Charlesworth and Charlesworth (2010, pp. 396–397 and 403–404).

This effect of LD in a randomly mating population should be distinguished from the effect of identity disequilibrium (ID) in a population with a mixture of random mating and matings between close relatives (Haldane 1949; Cockerham and Weir 1968), as exemplified by species that reproduce by a mixture of outcrossing and self-fertilization. Here, AOD can arise because of variation among individuals in their inbreeding coefficients, which causes correlations among loci in their levels of homozygosity even if they are unlinked (Haldane 1949; Cockerham and Weir 1968). Theoretical models show that this type of AOD does not require either finite population size or LD (Ohta and Cockerham 1974; Charlesworth 1991).

Associations between heterozygosities at putatively neutral molecular marker loci and higher values of fitness components have frequently been found in natural populations, leading to a debate as to whether AOD due to LD with random mating or to ID caused by variation in inbreeding levels is the main cause of these associations (David 1998; Hansson and Westerberg 2002). Current evidence appears to favor the latter model (Szulkin *et al.* 2010; Hoffman *et al.* 2014), especially because the LD model seems to require a relatively small effective population size to produce substantial effects, unless linkage is very tight (Ohta 1971, 1973).

It has also been reported that the level of variability in both quantitative traits and marker loci in laboratory populations can sometimes decline less rapidly over time than is expected under the standard neutral model (*e.g.*, Rumball *et al.* 1994; Latter 1998; Gilligan *et al.* 2005). In addition, laboratory populations and populations of domesticated animals and plants often have levels of quantitative trait variability that are surprisingly high, given their low effective population sizes (Johnson and Barton 2005; Hill 2010). Sved (1968) and Ohta (1971) proposed that AOD caused by randomly generated LD leads to a retardation in the loss of variability at neutral loci linked to loci under selection. Ohta suggested that this effect could be substantial when the effects of partially recessive mutations distributed over a whole chromosome are considered, provided that the population size is sufficiently small (Ohta 1971, 1973). Computer simulations of multilocus systems have indeed shown that a retardation of the rate of loss of neutral variability can occur in small populations (Latter 1998; Pamilo and Palsson 1998; Palsson and Pamilo 1999; Wang and Hill 1999; Wang *et al.* 1999). However, an accelerated rate of loss is observed when the level of dominance of deleterious mutations is sufficiently high and/or selection is sufficiently strong. This effect reflects background selection (BGS), whereby the effective population size experienced by a neutral locus is reduced by the presence of linked, deleterious mutations (Charlesworth *et al.* 1993).

AOD due to LD is an attractive potential explanation for the maintenance of unexpectedly high levels of variability in small populations. However, we lack a quantitative theory concerning the conditions under which AOD produces noticeable effects on levels of variability, other than for the cases of selfing and sib-mating lines studied by Wang and Hill (1999). It is also unclear when retardation *vs.* acceleration of the rate of loss of variability is likely to prevail in randomly mating populations, especially with multiple loci subject to mutation and selection. As a first step toward such a theory, we present some analytical, numerical, and simulation results on the simplest possible model: a neutral locus linked to another locus subject to selection. We also present simulation results for a neutral locus surrounded by a pair of selected loci. Selection can involve either partially recessive deleterious mutations or heterozygote advantage.

Our main focus is on a small, randomly mating population, founded from a large initial equilibrium population, mimicking experiments on the rates of loss of neutral variability in laboratory populations. We derive approximate expressions for the apparent selection coefficients against homozygotes at the neutral locus, as well for the rate of loss of neutral variability. We show that, contrary to what seems to have been widely assumed, these apparent selection coefficients have nothing to do with the rate of loss of variability, assuming that selection is sufficiently weak that second-order terms in the selection coefficients can be neglected. In particular, with selection against partially recessive deleterious mutations there is a wide range of parameter space in which BGS increases the rate of loss of variability, but there is always an apparent selective advantage to heterozygotes at the neutral locus. When selection is very weak, however, AOD always causes a retardation of the loss of neutral variability, unless mutations are close to being semidominant in their fitness effects.

## Theory

### Apparent selection coefficients against homozygotes at a neutral locus caused by a linked locus subject to mutation to deleterious alleles

We assume that the neutral locus is segregating for two alleles, A_{1} and A_{2}, with frequencies *x* and *y* = 1 – *x*, respectively, in a given generation. The selected locus has wild-type and deleterious mutant alleles, B_{1} and B_{2}, respectively, with frequencies *p* and *q* = 1 – *p*. The selection and dominance coefficients at this locus are *h* and *s*, such that the relative fitnesses of B_{1}B_{1}, B_{1}B_{2}, and B_{2}B_{2} are 1, 1 – *hs*, and 1 – *s*. The mutation rates for B_{1} to B_{2} and B_{2} to B_{1} are *u* and *v*, respectively. Let the haplotype frequencies of A_{1}B_{1}, A_{1}B_{2}, A_{2}B_{1}, and A_{2}B_{2} be *y*_{1}, *y*_{2}, *y*_{3}, and *y*_{4}. We have *x* = *y*_{1} + *y*_{2}, *q* = *y*_{2} + *y*_{4}. The frequency of recombination between the two loci is *c*.

With random mating, the apparent fitnesses (denoted by tildes) of the three genotypes A_{1}A_{1}, A_{1}A_{2}, and A_{2}A_{2} at the neutral locus, conditional on these haplotype frequencies, are as follows (Sved 1968, 1972; Ohta 1971):(1a)(1b)(1c)The extent of apparent selection at a neutral locus, induced by LD in a randomly mating finite population, can be assessed by determining the expectations of these apparent fitnesses over the distribution of haplotype frequencies induced by drift, conditioning on segregation at the A locus (Ohta and Kimura 1970; Ohta 1971). In the present case, the starting point is assumed to be a population of infinite size, at equilibrium under mutation and selection at the B locus and with no LD between the two loci. It is thereafter maintained at a population size of *N* breeding individuals each generation. For simplicity, a Wright–Fisher population model is assumed, so that *N* is also the effective population size.

The expectations for a given generation *t* after the foundation of the population are most simply found by expressing the haplotype frequencies in terms of the products of the relevant allele frequencies and the coefficient of linkage disequilibrium, *D* = *y*_{1} *y*_{4} – *y*_{2} *y*_{3}. The expectation of *D*, E{*D*}, remains at zero, since selection, mutation, and drift do not affect the direction of association between the two loci; this applies generally to E{*Dq ^{i}*}, where

*i*is an arbitrary nonnegative integer (a proof is given in section S4 of Supplemental Material, File S1). This does not, however, imply that quantities involving the expectation of products of

*D*and other functions of the allele frequencies at the two loci can be ignored, which complicates the analyses.

Simple algebra (Ohta 1971) yields expressions for the expected apparent selection coefficients against A_{1}A_{1} and A_{2}A_{2} homozygotes (neglecting second-order terms in *s*) over a set of replicate populations that are segregating at the A locus. For simplicity, we simply refer to these as the apparent selection coefficients and use tildes to distinguish them from the selection coefficients at the selected locus. We have(2a)(2b)It has previously been assumed that the terms in *D* in Equations 2 can be neglected, so that the extent of apparent overdominance at the neutral locus is given by the terms in *D*^{2} alone. This allows approximations for the apparent selection coefficients to be derived; two different approaches have been used, as described by Sved (1968, 1972), Ohta and Kimura (1970), and Ohta (1971) and by Bierne *et al.* (2000), respectively. We also neglect the terms in *D*, which are numerically small. We use an approach that takes into account the initial allele frequency at the neutral locus in the founding generation, as well as the subsequent effects of drift in creating a probability distribution around this frequency.

The measures of apparent overdominance in Equations 2 can be conveniently be written as(3a)(3b)where *r*^{2} = *D*^{2}/[*x*(1 − *x*)*pq*] is the squared correlation coefficient in allelic state between the two loci (Hill and Robertson 1968). The difference between Equations 2 and 3 in the notation for the apparent selection coefficients is intended to emphasize the absence of terms in *D* from Equations 3.

To obtain useful approximate expressions for these apparent selection coefficients at an arbitrary time *t*, we assume that the probability distributions of *x*, *q*, and *r* are independent of each other, which is of course not exact (the effects of departures from independence are considered in the *Results* section on the apparent selection coefficients). We can then write(4a)(4b)To first order in *s*, the quantity *s*(1 – 2*h*)E{*pq*} is equal to the expectation of the inbred load, *B*, *i.e.*, the expected difference in fitness between the logarithms of the mean fitness of a randomly mating population and of a completely homozygous population with the same allele frequencies (Greenberg and Crow 1960). Because of the loss of variability due to drift, E{*B*} will in general be smaller than the inbred load for an infinite population, *B** = *p*q*s*(1 – 2*h*), where *q** is the equilibrium frequency of B_{2} under mutation and selection in an infinite population (Glémin *et al.* 2003). If *v* << *u*, as assumed here, *q** is given by the following expression (Crow and Kimura 1970, p. 260):(5a)With *hs* >> *u*, this reduces to the familiar result of Haldane (1927):(5b)This yields the widely used formula for the inbred load in an infinite population:(6)Weakly selected loci may contribute substantially to the inbred load in the initial population, so that *q** could be substantially greater than zero. The more general expressions for *q** and *B** described above are, therefore, used here.

An exact analytic treatment of how E{*B*} changes over time in a finite population is difficult. Since the expected change in allele frequency due to selection is a third-degree function of *q* when *h* ≠ 0.5, the third and higher moments of *q* enter into the expected change per generation in *B* under drift, mutation, and selection, so that closed expressions cannot be obtained without using a full solution to the relevant diffusion equation under selection (*e.g.*, Balick *et al.* 2015). The simplest way to obtain tractable analytical results is to make the assumption that departures caused by drift from *q** are sufficiently small that the change in allele frequency due to selection per generation, (Δ_{s}*q*), can be linearized around *q** and hence equated to (Δ_{s}*q*)* _{q}** + (

*q*–

*q**)(dΔ

_{s}

*q/*d

*q*)

** (*

_{q}*e.g.*, Charlesworth and Charlesworth 2010, p. 355). If the mutational contributions to the change in allele frequency are included, the net change in

*q*is equal to (

*q*–

*q**)(dΔ

_{s}

*q/*d

*q*)

***

_{q}**–**(

*u*+

*v*). The expected change in

*q*in a finite population is then zero, and a recursion equation for the variance of

*q*,

*V*, can easily be obtained, yielding a simple expression for the approximate value of E{

_{q}*pq*} in a given generation (see

*Appendix*).

An approximate linear recursion equation for the expectation of *r*^{2} can be obtained in a similar way, following the approach of Sved (1971) (see *Appendix*). The quantity *σ*^{2}* _{d}* = E{

*D*

^{2}}/E{

*xypq*} is also frequently used as a descriptor of LD in finite populations, since it can be calculated using diffusion equations (Ohta 1971; Ohta and Kimura 1971), as discussed below. While a simple recursion for

*σ*

^{2}

*does not exist, a heuristic approach is to assume that it has a similar form to that for*

_{d}*r*

^{2}, replacing the equilibrium value of

*r*

^{2}given by Sved’s approach by the (smaller) equilibrium value of

*σ*

^{2}

*(see*

_{d}*Appendix*). We can then substitute the expected value of

*σ*

^{2}

*for the corresponding expectation of*

_{d}*r*

^{2}in Equations 3. Both of these approaches ignore any effects of selection on LD.

We also need to obtain the expectations of *x*^{–1} and (1 – *x*)^{–1} for a given generation, conditioned on segregation at the neutral locus. This can be done by means of the diffusion equation solution for the probability distribution at a biallelic locus under pure drift (Kimura 1955), using the terms involving the first few eigenfunctions of the power series representation of the probability distribution, conditioning on an initial allele frequency *x*_{0} (see *Appendix*). The approximate expectations of *x*^{–1} and (1 – *x*)^{–1} can then easily be determined (see *Appendix*). As was shown by Fisher (1930), this distribution is asymptotically close to a uniform distribution, with some slight deviations in the terminal classes and with a mean allele frequency of 0.5 regardless of the value of *x*_{0}. The asymptotic values of *x*^{–1} and (1 – *x*)^{–1} can then be calculated, which both approach the same value asymptotically (Equation A11). This implies that the apparent selection coefficients against each homozygote should approach the same asymptotic value, provided that any effects of mutation at the neutral locus can be ignored over the timescale under consideration.

### Apparent selection coefficients induced by linkage to a locus with heterozygote advantage

The same machinery can be used when the selected locus is segregating for a pair of alleles where the B_{1}B_{1} and B_{2}B_{2} homozygotes have fitnesses 1 – *s* and 1 – *t* relative to a fitness of 1 for B_{1}B_{2} . We have(7a)(7b)(7c)(Sved 1968; Ohta and Kimura 1970).

The approximate apparent selection coefficients are then given by(8a)(8b)Using the same argument that led to Equations 4, these apparent selection coefficients can be further approximated by(9a)(9b)The expectation of the inbred load *B* in a finite population is now equal to (*s* + *t*) E{*pq*}, so that Equations 4 and 9 both involve the same function of E{*B*}. An approximation for E{*pq*} can again be found by linearizing the recursion relation for *q* around the deterministic equilibrium, *q** = *s*/(*s* + *t*). The coefficient *κ* that measures the rate of approach to equilibrium is now equal to –*st*/(*s* + *t*) (Charlesworth and Charlesworth 2010, p. 355). In this case, no mutation is allowed at the locus under selection; this is because we are considering a single nucleotide site instead of a whole gene sequence, and mutations to new variants are unlikely to occur over the timescale with which we are concerned.

### The effect of selection at a linked locus on the amount of neutral variability

#### Lack of relevance of the apparent selection coefficients to the rate of loss of variability:

It was assumed in previous theoretical studies that the apparent selection coefficients obtained by ignoring terms in *D* in Equations 2 and 8 provide a measure of the extent to which the loss of variability at a neutral locus, caused by drift, is retarded by linkage to a selected locus (Sved 1968; Ohta and Kimura 1970; Ohta 1971). But this assumption is mistaken, as can be shown as follows.

This can be seen for the case of deleterious mutations at the B locus; a similar argument holds for the case of heterozygote advantage. The expected change in *x* can be obtained from Equations 2, yieldingThis gives(10)In other words, the terms in *D*^{2} contribute nothing to the change in the frequency of allele B_{1} at the neutral locus, which is determined entirely by the expected product of *D* and the additive effect of B_{1} on fitness, *a*(*q*) = *s*[*h* + *q*(1 – 2*h*)], provided that terms in *s*^{2} are neglected. An equivalent result is used in models of the effects of selective sweeps on neutral allele frequencies at linked sites (Charlesworth and Charlesworth 2010, p. 410). It is an example of the Price equation (Price 1970), since *Da* is the covariance between fitness and the allelic state with respect to A_{1} at the neutral locus (Santiago and Caballero 1995).

Since the change in the heterozygosity, 2*x*(1 – *x*), at the neutral locus caused by the change in allele frequency due to selection at the B locus is ∼2(1 – 2*x*)Δ*x*, the accompanying change in the expected heterozygosity, *H* = 2E{*xy*}, is given by(11)Similar results apply to the case of heterozygote advantage. Here, *a*(*q*) = *qt* – *ps* = *q*(*s*+*t*) – *s*, so that(12)(13)We cannot, therefore, use the apparent selection coefficients to assess the extent to which loss of variability at the neutral locus is affected by selection at a linked locus.

#### How to calculate the rate of loss of variability:

To solve this problem, we employ the linear diffusion operator method to obtain a closed system of recursion equations for the expectations of the quantities of interest (Ohta and Kimura 1971; Ohta 1971); this is closely related to the generation matrix approach of Hill and Robertson (1968) and Hill (1977). These methods provide recursion equations for the expectations of functions of the allele frequencies at the two loci and the linkage disequilibrium coefficient.

We have extended this method to include the effects of selection, mutation, and drift on the allele frequency *q* at the selected locus, reducing the dimensionality of the set of recursion equations by ignoring third- and higher-order moments of *q* around *q**. The complexity of the relations between the different variables means that a closed system can be obtained only by use of a nine-dimensional vector, **Y**, of the expectations of functions of allele frequencies and *D*, with a corresponding 9 × 9 recursion matrix, **R** (see section S1 of File S1 for details).

The elements of **Y** are as follows: *Y*_{1} = E{*xy*}, *Y*_{2} = E{*xyq*}, *Y*_{3} = E{*xyq*^{2}}, *Y*_{4} = E{*D*(*x – y*)}, *Y*_{5} =E{*D*(*x – y*)*q*}, *Y*_{6} = E{*D*(*x – y*)*q*^{2}}, *Y*_{7} = E{*D*^{2}}, *Y*_{8} = E{*D*^{2}*q*}, and *Y*_{9} = E{*D*^{2}*q*^{2}}. Iteration of the **R** matrix provides an approximation for *σ*^{2}* _{d}* in an arbitrary generation (which can be used instead of E{

*r*

^{2}} in Equations 3) as well as for

*H*, since

*σ*

^{2}

*=*

_{d}*Y*

_{7}/(

*Y*

_{2}–

*Y*

_{3}) and

*H*= 2

*Y*

_{1}.

### Computer simulations

#### One neutral and one selected locus:

The theoretical predictions described above were compared with Monte Carlo simulation results for a population started in mutation–selection balance at the selected locus, which was in linkage equilibrium with a neutral locus. This was then run for a chosen number of generations at a fixed population size. In each generation, the new haplotype frequencies were calculated using the standard deterministic equations, and 2*N* uniform random numbers were generated to sample each new haplotype from the cumulative distribution of haplotype frequencies.

Since the effects of a single selected locus on a linked neutral locus are very small, it was necessary to run a large number of replicate simulations (usually 10^{7}) to obtain tight confidence intervals on the mean heterozygosity at the neutral locus.

#### Multiple loci with selection and mutation:

To simulate multiple loci subject to mutation and selection, we assumed additive fitness interactions across the relevant loci. For weak selection, this should yield very similar results to those of a multiplicative fitness model. The state of the population in a given generation is represented by the haplotype frequencies with respect to *n* loci subject to mutation and selection, following the same rules as in the single-locus case, with a neutral locus in the center of the chromosome. The recombination frequency between the loci under selection is *c*; the frequency of recombination between the neutral locus and each of the adjacent selected loci is *c*/2. Complete interference is assumed, so that only single crossovers are allowed, generating a frequency of recombination between the two terminal loci of (*n* – 1)*c*.

The frequencies of diploid genotypes at the start of a given generation are assumed to be given by random combinations of the frequencies of the haplotypes determined in the previous generation, thus tacitly assuming an infinite number of new zygotes. The frequencies of haplotypes after recombination were calculated deterministically, using an extension of the three-locus algorithm described by Crow and Kimura (1970, p. 51). The haplotype frequencies after mutation and selection were then calculated deterministically. The haplotype frequencies after drift were obtained by choosing 2*N* haplotypes from a pool with the new haplotype frequencies, in the same way as above.

FORTRAN programs for all the cases mentioned here are available on request from B. Charlesworth.

### Data availability

The authors state that all data necessary for confirming the conclusions presented in the article are represented fully within the article.

## Results

### A single selected locus with mutation and selection

To generalize the results described below as far as possible, we have used the principle from diffusion equation theory that the outcome of the evolutionary process is determined by the products of the effective population size, *N*_{e}, and the deterministic parameters, measuring time in units of 2*N*_{e} generations (Ewens 2004, p. 157). In the present case, where *N*_{e} = *N*, the simulation results for a given *N* and a set of mutation, selection, and recombination parameters can be applied to a system with a population size of *KN* and deterministic parameters that are a factor of *K*^{–1} times those used here. For this reason, the selection and mutation parameters in most of the material described below are scaled by a factor of 2*N*, thereby avoiding reference to a specific population size. We use *S* = 2*Ns*, *C* = 2*Nc*, *U* = 2*Nu*, and *V* = 2*Nv* to denote these scaled parameters.

An additional point to note is that the assumption that the initial state of the population has *D* = 0 implies that only the first three elements of the initial **Y** vector are nonzero. Without loss of generality, the elements of **Y** can be divided by the product of the initial frequencies of A_{1} and A_{2}, *x*_{0}*y*_{0}, and the resulting vector, **Y′**, can then be iterated using the **R** matrix. The ratio of the first element of **Y′** in generation *t* to (1 – 1/2*N*)* ^{t}* measures the ratio of the mean heterozygosity at the neutral locus to its value in the absence of selection at the B locus,

*H*

_{rel}. This means that the matrix predictions for

*H*

_{rel}and

*σ*

^{2}

*are independent of the initial allele frequencies at the neutral locus, since the initial value of*

_{d}**Y′**is independent of these frequencies. If, however,

*D*were nonzero in the initial generation, only the asymptotic behavior of the system would be independent of the initial allele frequencies.

#### Accuracy of the matrix approximation:

We first consider the accuracy of the 9 × 9 matrix approximation. Figure 1 shows the values of *H*_{rel} and 2ΔE{*xy*} = Δ*H*_{sel} from Equation 11 over 2*N* generations, for two different dominance coefficients and *C* = 0.1. We are assuming that selection and mutation at the B locus relate to deleterious mutations affecting the gene as a whole rather than individual nucleotide sites, whereas the alleles at the A locus represent a pair of variants chosen by an experimenter as neutral markers and are not affected by mutation over the time course of the experiment. The selection and mutation parameters were chosen such that the equilibrium frequency of the deleterious mutation at the B locus was ∼0.3. This may seem to be very high, but is consistent with the typical frequency per gene of putatively deleterious nonsynonymous mutations in *Drosophila* populations (Haddrill *et al.* 2010), although the selection coefficient and mutation rates used in Figure 1 are probably much higher than is realistic.

Agreement between the simulation and matrix results is remarkably good for this parameter set. A value of *h* = 0.1 is associated with *H*_{rel} > 1 in later generations (retardation of loss of neutral variability), whereas *h* = 0.45 is associated with *H*_{rel} < 1 (acceleration of loss of neutral variability). This illustrates the conflict between AOD and BGS, noted previously in simulation results (Pamilo and Palsson 1998; Wang and Hill 1999); for sufficiently low *h*, AOD is the dominant force as far as the level of neutral variability is concerned, whereas BGS dominates when *h* is sufficiently close to 1/2. We provide some insights into the reasons for this behavior in the next but one section.

#### Approximation for the rate of loss of variability with weak selection:

This conflict implies the existence of a critical value of *h*, *h*_{c}, where the two effects exactly cancel each other; the properties of *h*_{c} are examined in more detail below. Unfortunately, there is no exact fixed value of *h*_{c}, since the extent to which *H*_{rel} is greater or less than one depends on the generation in question, as shown in Figure 2, which uses the matrix results. As *h* increases from 0 to 0.5, *H*_{rel} declines, at both generations *N* and 2*N*. The inset in Figure 2 shows that *h*_{c} is larger for the earlier generations. However, *h*_{c} is close to 0.37 for all generations for the selection and mutation parameters used in Figure 2.

When *q** is small and *S* is of order 1, agreement with the simulations is good only in earlier generations of the process, reflecting a high level of dispersion of *q* around *q**, so that our neglect of higher-order terms in *q – q** is inaccurate. In general, however, the matrix gives reasonably good approximations up to 2*N* generations, although the extent of retardation of the rate of loss of variability tends to be underestimated (or the rate of acceleration overestimated) by the matrix approximation with small *q** (see Figure B in File S1, section S10). We will often use results for this generation as a standard, since it is similar to the duration of the *Drosophila* experiments on AOD described in the *Discussion*.

### Apparent selection coefficients and their relation to the rate of loss of variability:

These conclusions are confirmed by the results displayed in Table 1 for an initial frequency of A_{1} of 0.1, over a range of recombination rates. Table 1 also shows the simulation results for an additional measure of variability, the proportion of segregating neutral loci, *P*_{s}, at 0.5*N* and 2*N* generations. In addition, the exact and approximate apparent selection coefficients are displayed, together with the matrix and simulation values for *H*_{rel}. More detailed results are shown in Table A of File S1, section S9.

It can be seen that the approximate apparent selection coefficients are similar in magnitude to the exact ones, but tend to exceed them, especially with very close linkage and for the neutral allele with the lower initial frequency. This is mainly because of the inaccuracy of the assumption of independence between the probability distributions of allele frequencies at the neutral and selected loci used in deriving the equations for the apparent selection coefficients. The correlation coefficient *r* is a property of the genealogies at the two loci and so should be relatively independent of the allele frequencies. The simulations show, however, that the levels of variability at the two loci are positively correlated, so that the expectation of *pqx*^{–1} in Equation 3a is smaller than E{*pq*}E{*x*^{–1}} in Equation 4a. If the exact simulation results for the apparent selection coefficients are compared with those using the simulated values for the expectations of *r*^{2}, *pq*, and *x*^{–1} or (1 – *x*)^{–1} in Equations 4, as opposed to the approximations for these quantities used to obtain the values in Table 1, it is found that there is still a substantial discrepancy when linkage is tight, which can arise only from the inaccuracy of the independence assumption used to obtain Equations 4.

With an initial frequency of A_{1} of 0.5, the apparent selection coefficients for A_{1}A_{1} and A_{2}A_{2} are always equal, as expected from Equations 3 and 4. In contrast, for an initial frequency of 0.1 there is initially a stronger apparent selection coefficient against A_{1}A_{1}, again as expected, but the selection coefficients converge on equality as the mean neutral allele frequency at segregating sites approaches 0.5; these are also similar to the selection coefficients for the case of an initial frequency of 0.5. The apparent selection coefficients at generation 2*N* are quite close to the asymptotic values given by Equations 4 when the B locus has reached mutation–selection–drift equilibrium, when E{*pq*} is calculated using the linear approximation described above. For example, with *h* = 0.1, *C* = 0.1, and the selection and mutation parameters in Table 1, the asymptotic apparent selection coefficient for populations segregating at the A locus is 0.0014 when the equilibrium neutral value of σ_{d}^{2} is used as the estimate of E{*r*^{2}} in Equations 4, whereas the observed value at 2*N* generations for the case with *x*_{0} = 0.5 is 0.0013. In contrast, the asymptotic value predicted by Equation 7 of Ohta (1971) is 0.0024, a somewhat worse fit. This probably reflects the fact that Ohta assumed a fixed allele frequency at the neutral locus, as well as assuming *q* <<* 1.

As expected, the magnitudes of the apparent selection coefficients decline sharply as the recombination rate increases, but always indicate heterozygote advantage, even with *C* = 10 and *h* = 0.45. But with *h* = 0.45, there is an acceleration of the loss of variability at the neutral locus when linkage is tight, despite the apparent selective advantage to heterozygotes. This confirms the above conclusion that the apparent selection coefficients are not relevant to the extent to which mutation and selection at one locus affect the rate of loss of variability at a linked neutral locus; they show apparent heterozygote advantage even when BGS causes an accelerated loss of variability. With *C* = 10, little effect of selection on the rate of loss of variability can be detected (Table 1).

### Conflict between AOD and BGS with respect to the rate of loss of variability:

The existence of a conflict between AOD and BGS, and its evident dependence on the dominance coefficient, raises the question of how to make generalizations about the regions of parameter space in which one or the other force is dominant. One way of approaching this problem is to study the properties of the leading eigenvalue of the **R** matrix, *λ*_{0}, as a function of the relevant variables. If we equate *λ*_{0} to 1 – 1/(2*N*_{e}), where *N*_{e} is the effective population size in the presence of selection at the B locus, 1/(1 – *λ*_{0}) provides a measure of 2*N*_{e}. Consider first the limiting case when *S* = 2*Ns* >> 1 and *u* << *hs*, so that the expected frequency of A_{2} is close to *u*/(*hs*) (Equation 5b). The approximation for *λ*_{0} – 1 for this case (Equations S17 and S18 of section S2 of File S1) shows that *N*_{e} is given by the standard equation for BGS at a single locus (Charlesworth and Charlesworth 2010, p. 402), which assumes these conditions.

As argued by Whitlock and Barton (1997), the value of 2*N*_{e} obtained from 1/(1 – *λ*_{0}) gives the asymptotic value of the mean coalescent time, *T*_{2}, for a pair of alleles at the neutral locus. Under the infinite-sites model, the nucleotide diversity at mutation–drift equilibrium is equal to the product of the neutral mutation rate and 2*N*_{e} (Charlesworth and Charlesworth 2010, p. 211), so that we can use these results to predict the effect of selection at the B locus on the equilibrium level of variability at the neutral locus. (See section S2 of File S1, Equation S16, for a more rigorous derivation of this result.)

However, since the matrix approximation is inaccurate for later generations with *S* < 1 and small *q**, this use of the leading eigenvalue to estimate *N*_{e} is limited in scope and is not necessarily accurate for general *q** and small *S*. The approximate approach described in the next section deals with the case when *S* is of order ≤1.

### Approximation for the rate of loss of variability with weak selection:

The starting point for this analysis is Equation 11, which shows that the change in expected heterozygosity due to selection, Δ*H*_{sel}, is the sum of two terms, Δ_{1} = – 2*shY*_{4} and Δ_{2} = – 2*s*(1 – 2*h*)*Y*_{5}, where *Y*_{4} = E{*D*(*x* – *y*)} and *Y*_{5} = E{*D*(*x* – *y*)*q*}. We know from previous work on BGS that an acceleration of loss of neutral variability under drift always occurs in haploid models with linked deleterious mutations. These are equivalent to diploid models with *h* = 0.5, for which Δ_{2} = 0, so that the process is entirely driven by Δ_{1}; a necessary and sufficient condition for acceleration in this case is thus Δ_{1} < 0. This implies that, with *h* < 0.5, there is a conflict between the two terms when Δ_{1} < 0 and Δ_{2} > 0. Larger *h* gives more weight to Δ_{1} relative to Δ_{2}, so that selection is more likely to reduce variability when *h* is large, in agreement with the numerical results just described. It is, however, unclear from this argument why small *S* should favor AOD over BGS, which we now proceed to investigate.

The complexity of the recursion relations means that it is impossible to obtain simple, exact expressions for *Y*_{4} and *Y*_{5}, but useful approximations for weak selection (*S* < 1) can be obtained as follows. Using expressions (S4d) and (S4e) from the matrix approximation derived in section S1 of File S1, the changes per generation in these moments in the absence of selection are(14a)(14b)In the absence of selection, *Y*_{4} thus tends to zero, and *Y*_{5} tends to – 2*Y*_{7}/[ (*C* + *U* + *V*) + 5], which is <0 and <*Y*_{7} in magnitude. This suggests that the magnitude of Δ_{1} will be much smaller than that of Δ_{2} when selection is weak, which is confirmed by the following argument.

Approximate recursions for *Y*_{4} and *Y*_{5} with selection are given by Equations S4 in File S1, section S1. We make the simplification of dropping terms involving *u*, *v*, and products of *s* or *D*^{2} with *q* and *q*^{2}, on the assumption that these are small relative to similar terms involving *s* or *D*^{2} alone. [These assumptions will be violated when selection is strong in relation to drift (*S* > 1) or when *q** is >> 0.] In addition, for the neutral case, it is possible to show that *Y*_{6} and *Y*_{5} approach equality and *Y*_{8} approaches *Y*_{7}/2 if *q** is set to 0.5 (we neglect *Y*_{9}, since it is considerably smaller than *Y*_{8}): see Equations S27 of File S1, section S6. Our numerical results show that these relations are approximately true under much wider conditions, so that we can replace *Y*_{6} with *Y*_{5} and *Y*_{8} with *Y*_{7}/2 in Equations S4 in File S1, section S1, allowing us to obtain a pair of coupled difference equations for *Y*_{4} and *Y*_{5}, given the value of *Y*_{7}.

It is also convenient to rescale all terms by multiplying by 2*N*. We can then write(15a)(15b)where(16a)(16b)(16c)If the changes in *Y*_{4}, etc., are sufficiently slow, we can set Δ*Y*_{4} and Δ*Y*_{5} to zero and solve the resulting linear equations to obtain approximations for *Y*_{4} and *Y*_{5} as functions of *Y*_{7}:(17a)(17b)(17c)This shows that *Y*_{4} is *O*(*S*), whereas *Y*_{5} is of order 1, so that Δ*H*_{sel} is dominated by Δ_{2} when *S* is small, provided that *h* < 1/2. The determinant (det) is always positive for *S* ≤ 60; this condition is not very restrictive given our assumption of relatively small *S*, and we assume positivity from now on. Given that *X*_{7} > 0, it can easily be shown that *Y*_{4} > 0 when *h <* 1/2; *S <* 2 is sufficient for *Y*_{5} < 0 when *h =* 0.5. For small values of *h* these conditions are less stringent; with *h* = 0, *Y*_{4} > 0 independently of *S*, and *Y*_{5} < 0 when *S* < 6. Furthermore, neither *Y*_{4} nor *Y*_{5} depends on *q**, except through their common factor of *Y*_{7}. Our numerical results show that these conclusions are correct, provided that *S* is of order 1.

From Equation 11, the sign of the change in heterozygosity at the neutral locus due to selection, Δ*H*_{sel}, is the same as the sign of –[*h Y*_{4} + (1 – 2*h*) *Y*_{5}]. Variability will be increased by selection if Δ*H*_{sel} > 0 and reduced if Δ*H*_{sel} < 0, so that this implies that the critical upper value of *h*, *h*_{c}, required for an increase in neutral variability due to AOD, is independent of *q** under the assumed conditions. The value of *h*_{c} can be found by setting Δ*H*_{sel} = 0. Using Equations 11 together with the numerators of Equations 17a and 17b, we find that *h*_{c} is the value of *h* that satisfies the following cubic equation:(18)With *S* close to 0, Equation 18 reduces to 12*h* – 6 = 0, *i.e.*, *h*_{c} = 1/2, suggesting that variability will always be increased for realistic *h* values when selection is very weak. For other cases, some insights into the properties of *h*_{c} can be obtained by assuming *C* = 0, which approximates cases with low recombination rates. With *S* = 1, the solution to the cubic is ∼*h*_{c} = 0.36, slightly smaller than the value of 0.37 obtained from the numerical results described earlier. Figure 3 shows a comparison of the values obtained by solving Equation 18 using Matlab symbolic calculations with the values from the matrix approximation for generation 2*N* with *C* = 0.1 (very similar patterns are obtained for *C* values of up to 1). It will be seen that they agree quite well when *S* < 1, regardless of the value of *q**, but Equation 18 mostly underestimates *h*_{c} for *S* > 1, except for values of *q** much smaller than or >>0.1. For *S* << 1, dominance coefficients close to 1/2 are compatible with an enhanced level of neutral variability, but BGS always prevails for values of *S* >> 3, even when *h* is very close to 0.

The quantity 2*N* Δ*H*_{sel}/*H* is equivalent to *ε* = *N*_{e}*/N* – 1, where *N*_{e} is the effective population size experienced by the neutral locus for the generation in question. For small *S*, we can approximate *ε* by a Taylor series around *S* = 0. We have(19)The first term in this equation is derived entirely from the contribution of *Y*_{5} to Δ*H*_{sel}. It implies that selection will cause increased variability if *h* is slightly <1/2 when *O*(*S*^{2}) terms are negligible, consistent with the conclusions reached above. With very weak selection, BGS is not effective unless mutations are close to being semidominant. The second-order term is always <0 and increases in magnitude as *S* increases; under the most favorable situation (*h* = 0), *S <* 4(5 + *C*)/10 is necessary for *ε* > 0, which is slightly more restrictive than is indicated by the results in Figure 3. Figure 4 shows some examples of the extent to which this approximation matches the simulation results; in general, it tends to underestimate the extent to which loss of variability is retarded by selection when *S* is of order 1.

These approximations also imply that the magnitude of Δ*H*_{sel} is proportional to *Y*_{7} = E{*D*^{2}}. If we approximate E{*D*^{2}} by E{*xy*}E{*pq*}*σ*^{2}* _{d}* =

*H*E{

*pq*}

*σ*

^{2}

*/2, as was done when obtaining Equations 4 for the apparent selection coefficients, Equations 17 and 18 give*

_{d}(20)

Because *σ*^{2}* _{d}*E{

*pq*} ≤ 0.25, the quadratic dependence of the denominator on

*C*, and the inverse relation between

*σ*

^{2}

*and*

_{d}*C*, implies that there will be only a very small effect of selection on neutral variability when

*C*>> 1, unless

*S*>>

*C*. In addition, the size of the effect is strongly determined by

*S*and by E{

*pq*}, since

*σ*

^{2}

*takes a value between 0 and 1.*

_{d}Overall, these results suggest that retardation of loss of neutral variability caused by a single selected locus is unlikely to be important except for *S* values of the order of ≤1. In such cases, if linkage is sufficiently tight, there can be a significant retardation of loss of neutral variability and an enhancement of the equilibrium level of variability even when *h* approaches 1/2.

## Two selected loci with mutation and selection

From simulation studies of the effect of multiple, linked selected loci on the behavior of neutral loci located among them, we would expect to see larger values of both the apparent selection coefficients and the level of neutral heterozygosity than with a single selected locus (Ohta 1971; Latter 1998; Pamilo and Palsson 1998; Palsson and Pamilo 1999; Wang and Hill 1999; Bierne *et al.* 2000). However, it is unclear whether the effects of multiple selected loci are approximately additive or whether there is some degree of synergism.

With small population size, very close linkage, strong selection, and a sufficiently high level of recessivity of deleterious mutations, simulations have shown that “crystallization” of the population into two predominant and complementary haplotypes (+ – + – … and – + – +…, where + denotes wild type and – mutant) with respect to the selected loci can occur, such that the behavior of the system is very similar to that of a single locus with strong heterozygote advantage (Charlesworth and Charlesworth 1997; Palsson and Pamilo 1999; Palsson 2001). This can result in long-term maintenance of variability at the selected loci. This is strongly suggestive of a synergistic effect of multiple selected loci.

No analytical treatment of more than one selected locus has yet been done, so we have investigated this problem, using simulations of a small number of selected loci. The results of simulations with two selected loci surrounding a neutral locus are shown in Table 2, using the same set of recombination frequencies between the neutral and nearest selected locus as in Table 1. As far as the apparent selection coefficients are concerned, with *h* = 0.1 and the lower two recombination rates, there is clear evidence that the apparent selection coefficients in generation 2*N* (but not necessarily in generation 0.5*N*) are somewhat larger than twice the apparent selection coefficients in the single-locus case, indicating a degree of synergism. A similar pattern is found with four selected loci (results not shown).

It is more difficult to quantify the effect on *H*_{rel}, since this is strongly dependent on the generation in question. With close linkage and *h* = 0.1, the values of *H*_{rel} – 1 in generation 2*N* in Table 2 are slightly larger than twice the corresponding values in Table 2. Conversely, with *h* = 0.45 and close linkage, for which *H*_{rel} *<* 1, the absolute values of *H*_{rel} – 1 in generation 2*N* are less than twice the corresponding values in Table 1, although the apparent selection coefficients are slightly larger than twice the values with a single selected locus. A more rigorous test for additivity is described in section S5 of File S1; it suggests that the effects of two loci on *N*_{e}*/N* – 1, as estimated from Δ*H*_{sel}, are somewhat greater than additive when there is a retardation of loss of variability and less than additive when there is an acceleration.

## A single selected locus with heterozygote advantage

The methods used for the case of mutation and selection can be used to model the case of linkage to a single selected locus with heterozygote advantage, by making appropriate modifications to the terms in the **R** matrix and to the expressions for the changes in haplotype frequencies in the computer simulations (see Equations 9 and section S3 of File S1). Figure 5 and Table 3 compare some examples of matrix and simulation results; these again suggest that the matrix predictions are accurate for this parameter range (further details are shown in Table B, section S9 of File S1). Once again, the ratio of the rate of recombination to the strength of selection has to be sufficiently small for a substantial effect.

Approximate results can be obtained for the case when *S* = 2*Ns* and *T* = 2*Nt* are both <1, on the same lines as for Equations 15–20 for the mutation–selection balance case. The coefficients for the approximate recursion relations for *Y*_{4} and *Y*_{5} are now as follows:(21a)(21b)(21c)The equivalents of Equations 17 and 18 are given in section S3 of File S1 and can be used to calculate Δ*H*_{sel} in the same way as for the mutation–selection case. The first-order term in *S* and *T* in Δ*H*_{sel} gives(22)The inbred load in this case is (*s* + *t*)E{*pq*} (see Equations 9), so that this is identical in form to the first-order approximation for the mutation–selection balance case (see Equation 19). For sufficiently weak selection, therefore, heterozygote advantage should always lead to an enhancement of neutral variability at a linked locus.

It is known, however, that the asymptotic rate of loss of variability at a locus with heterozygote advantage has a complex dependence on the deterministic equilibrium allele frequency and the population size, with extreme allele frequencies (outside the range 0.2–0.8) leading to an acceleration of loss of variability when *S* + *T* is sufficiently large (Robertson 1962). This suggests that a similarly complex pattern should be found for the dependence of the measure of retardation/acceleration based on the leading eigenvalue of the recursion matrix. Figure C of File S1 (section S10) shows that this is indeed the case, although there is only a small region of parameter space where acceleration rather than retardation of loss of variability occurs.

Under most circumstances, some retardation of the loss of variability at a neutral locus is, therefore, likely to be observed as a result of linkage to a selected locus, although its magnitude is small unless linkage is tight and *S* + *T* >> 1. For *C* >> *S* + *T* it is unlikely that any experimentally detectable effects could be observed. The same applies to the apparent selection coefficients against homozygotes at the neutral locus.

## Discussion

### Some general considerations

Our results shed some new light on the old question of the properties of AOD at a neutral locus in a randomly mating population of finite size, resulting from randomly generated LD with respect to loci subject to selection. The pioneering work of Sved (1968, 1972), Ohta and Kimura (1970), and Ohta (1971) gave expressions for the apparent selection coefficients against homozygotes at a neutral locus, using the equivalents of our Equations 3 and 8. Our approximate estimates of the apparent selection coefficients for the case of a finite population founded from an infinite population at equilibrium with no LD between the selected and neutral loci (Equations 4 and 9) followed their approach in using only the terms involving *D*^{2} in these equations, which are the dominant terms.

However, as was noted by Sved (1968, p. 552) for the case of heterozygote advantage and by Latter (1998) for the case of selection against deleterious mutations, these selection coefficients do not result in any change in allele frequency at the neutral locus. As mentioned above (Equation 10), this is a completely general conclusion: application of the Price equation (Price 1970) shows that the change Δ*x* per generation in the frequency of allele A_{1} at the neutral locus is equal to *aD*, where *a* is the average effect of A_{1} on fitness (see also Santiago and Caballero 1995, p. 1016), to the order of the approximations used here (neglect of second-order terms in *s* and 1/*N*). Neither the change in allele frequency nor the change in heterozygosity at the neutral locus is influenced by the *D*^{2} terms under these conditions.

There is thus no connection between apparent overdominance caused by linkage disequilibrium and any retardation of loss of variability by drift at a neutral locus. Such a connection seems to have been widely assumed (*e.g.*, Latter 1998), although it was pointed out by Charlesworth (1991) that AOD caused by identity disequilibrium has no effect on allele frequencies at the neutral loci concerned. As was noted by Bierne *et al.* (2000), for this case it is necessary to treat apparent overdominance at the neutral locus as a phenomenon that is distinct from any retardation of loss of variability. This point is reinforced by the results in Table 1 and Table 2 for the case of selection against recurrent deleterious mutations, which show that apparent overdominance can accompany an acceleration of loss of neutral variability when the dominance coefficient *h* is sufficiently large (see the entries with *h* = 0.45). Of course, *h* < 1/2 is a necessary condition for both retardation of loss of variability and apparent overdominance. This is not restrictive, given the evidence that the dominance coefficients for slightly deleterious mutations are generally <1/2 (Crow 1993; Manna *et al.* 2011).

### Selection against deleterious mutations

#### General conclusions:

Computer simulations of systems with many loci subject to mutation to deleterious alleles have shown that a noticeable retardation of loss of neutral variation at linked loci can occur (Latter 1998; Pamilo and Palsson 1998; Palsson and Pamilo 1999; Wang and Hill 1999; Wang *et al.* 1999), for certain ranges of values of selection and dominance coefficients. As pointed out by Pamilo and Palsson (1998), Palsson and Pamilo (1999), Wang and Hill (1999), and Wang *et al.* (1999), there is a conflict between the effect of AOD on neutral variability and the effect of BGS; the latter involves a reduction in the effective population size experienced by a neutral locus caused by its linkage to deleterious mutations (Charlesworth *et al.* 1993). Their multilocus simulations, as well as the matrix-based investigation by Wang and Hill (1999) of a model of a single selected locus linked to a neutral locus in a sib-mating and selfing population, showed that retardation *vs.* acceleration of the rate of loss of neutral variability is favored by relatively weak selection (2*Ns* values of order ≤4) and low dominance coefficients; see, for example, figure 1 of Wang and Hill (1999).

Our analytical and numerical methods, employing a 9 × 9 matrix of “moments” of functions of the allele frequencies at the two loci and *D*, as well as weak selection approximations for the change in heterozygosity, have allowed us to investigate in detail the regimes in which retardation *vs.* acceleration of loss of neutral variability occurs.

The main conclusions are as follows:

Retardation rather than acceleration of loss of variability is favored by sufficiently low values of the dominance coefficient,

*h*, when*S*= 2*Ns*is in the range 0.5–4.5 (Figure 3). The commonly used estimate of*h*= 0.25 for slightly deleterious mutations (Manna*et al.*2011; Charlesworth 2015) is consistent with retardation when*S*≤ 2.5, except when*q** is close to 0.5.*S*< 0.5 allows retardation of loss even for*h*values approaching 1/2; this domain of*S*values can be thought of as the*AOD limit*.In contrast, there is always an acceleration of loss of variability when

*S*> 4.5, even with very low*h*values; this constitutes the*BGS limit*. Despite this acceleration, apparent heterozygote advantage at the neutral locus is always observed, provided that*h*< 1/2.For

*q**<< 1 and for*S*>> 1, the asymptotic behavior of the expected heterozygosity at the neutral locus is well predicted by the*N*_{e}value given by the standard equation for BGS.The magnitude of the extent of retardation or acceleration of loss of variability is an increasing function of

*C/S*, for a given*h*value. Equation 19 also shows that, when*S <<*1, the effect is proportional to the inbred load scaled by 2*N*, for a given value of*C*.

#### Implications for empirical results:

Population genomic analyses of levels of nonsynonymous site variability in a variety of organisms have consistently shown that there is a wide distribution of selection coefficients against new deleterious mutations and suggest a mean *S* for natural populations that is >>1 (reviewed by Charlesworth 2015). The large effective population sizes of most populations that have been studied in this way imply, however, that the mean selection coefficient against a new deleterious mutation is likely to be very small. For example, the current estimate of mean *hs* for new nonsynonymous mutations in *Drosophila melanogaster* is ∼0.001; with *h* = 0.25, the corresponding mean *s* is 0.004 (Charlesworth 2015). With *N* = 50, as used in the experimental and simulation studies discussed below, the mean *S* is 0.4. If there were no variation in *s* for new mutations, this would suggest that they fall well within the range where Equations 18 and 19 predict retardation of loss of variability. However, we also have to take into account the fact that there is a wide distribution of *s*, as we discuss below. A coefficient of variation of *s* for new mutations of ∼2 is suggested by the *Drosophila* polymorphism data (Kousathanas and Keightley 2013).

### Effects of multiple loci subject to mutation and selection

We now consider how multiple loci subject to deleterious mutation affect a linked neutral locus; this question can be asked about both the apparent selection coefficients and the extent of retardation or acceleration of loss of variability.

#### Apparent selection coefficients:

With respect to the apparent selection coefficients, previous workers (Ohta 1971, 1973; Ohta and Cockerham 1974; Bierne *et al.* 2000) assumed that multiple loci combine approximately additively. Table 2, which describes results for a pair of selected loci, suggests that this assumption may be somewhat conservative; in most of the examples shown there, the selection coefficients are somewhat larger than twice the corresponding single-locus values, especially with a low initial frequency of the A_{1} allele.

The procedure of summing contributions over all sites when there are many selected loci, as used by Ohta (1971, 1973), may thus underestimate the apparent selection coefficients, although the departure from additivity is likely to be small for the very weak selection that is probably most common. An alternative to Equation 13 of Ohta (1973), based on Equations 3, is derived in section S6 of File S1 (Equations S31). We assume that the population is being studied at a sufficiently long time after establishment that its LD is close to equilibrium. We also assume a single chromosome with a uniform rate of recombination along the chromosome, as well as a linear genetic mapping function (this somewhat overestimates the amount of recombination, compared with the true situation with partial crossover interference). For a randomly placed marker, we have the asymptotic result(23)where *B*_{tot} is the inbred load associated with homozygosity for the whole chromosome, and *C** = 2*NM*, with *M* denoting the map length of the chromosome (taking into account any sex differences in recombination rates, by averaging over the values for each sex).

It is useful to compare this formula with the multilocus simulation results of Latter (1998), who modeled a *D. melanogaster* autosome evolving in a population maintained at a size of 50 for 200 generations after its foundation from an equilibrium population. The chromosome was 100 cM in length; the absence of crossing over in males means that the effective value of *M* is 0.5 M, so that *C** = 50 with *N* = 50. For a randomly placed marker, Equation 23 gives an approximate apparent selection coefficient of 0.312*B*_{tot.}

*B*_{tot} can be estimated as follows. The initial value of the net fitness (relative to a balancer chromosome) for homozygous *D. melanogaster* second chromosomes was 0.4, after purging them of major-effect mutations by extraction from a population maintained for many generations at an approximate size of 50 (Latter *et al.* 1995; Latter 1998). This can be equated to the fitness of a homozygous chromosome relative to that of an outbred genotype in the initial population; the value of *B*_{tot} for the initial population is then –ln(0.4), ∼0.90, assuming multiplicative fitness effects. This assumption is consistent with the log-linear decline in fitness with inbreeding coefficient *F* in the small laboratory populations of *D. melanogaster* described by Latter *et al.* (1995).

The simulations of Latter (1998) adjusted the number of mutable loci to produce an initial *B*_{tot} of 0.90. However, we need to take into account the reduction in *B*_{tot} as variability is lost at the selected loci. A minimum estimate of *B*_{tot} at generation *t* is provided by assuming that the ratio of E{*pq*} to its initial value follows the standard neutral result, 1 – *F _{t}* = exp(–

*t*/2

*N)*. For

*N*= 50 and

*t*= 200, this procedure yields

*B*

_{tot}= 0.90 × (1 –

*F*

_{200}) = 0.122 and a value of 0.038 for the apparent selection coefficient on a random marker, somewhat lower than the value of 0.05 given in figure 6 of Latter (1998). The discrepancy probably reflects an overestimate of the loss of variability by use of the standard neutral value of

*F*. Latter’s figure 4 shows

_{t}*F*

_{200}= 0.7 for the selected loci in his simulations, giving

*B*

_{tot}= 0.270 and an apparent selection coefficient of 0.084. The single-locus approximations for the apparent selection coefficients often overestimate the true values (see Table 1), so that it is not surprising that Equation 23 overestimates the apparent selection coefficient.

Since the numerator of Equation 23 involves a product of C* and a logarithmic function of *C**, the apparent selection coefficients are roughly inversely proportional to population size. Species with larger numbers of chromosomes than *Drosophila*, and with crossing over in both sexes, will have much smaller apparent selection coefficients for the same value of *N*. Detectable apparent selection coefficients arising from AOD in randomly mating populations are likely to be found only in very small populations, especially in organisms with large numbers of chromosomes such as most vertebrates. This suggests that AOD arising from identity disequilibrium (Szulkin *et al.* 2010) is a more likely explanation of heterozygosity/fitness relations in natural populations than LD due to drift in a randomly mating population (Hansson and Westerberg 2002), unless the population has been reduced to a very small number of breeding individuals.

#### Rate of loss of variability:

This approach can also be applied to the rate of loss of neutral variability, using the weak selection approximation of Equation 19, which includes only the first- and second-order terms in *S* in the expression for *ε* = *N*_{e}/*N* – 1 = 2*N* Δ*H*_{sel}, again assuming that LD is close to neutral equilibrium. The derivation of the approximate expected value of *ε* for a neutral marker placed randomly on a chromosome, using the same assumptions as before, is given in section S7 of File S1 (Equations S32–S35). The final result is(24)where CV* _{s}* is the coefficient of variation of the distribution of

*s*for new mutations. With an exponential distribution, CV

*= 1.*

_{s}We can compare the predictions of this equation with the simulation results for a mean *S* of 0.5 in Latter’s (1998) table 6, which is similar to the value suggested by the population genomic analyses for a population size of 50 and meets the weak selection requirement. Latter used a measure (Δ*F*) derived from the net change in *F* for neutral loci over *t* generations, relative to the corresponding value without selection; this is approximately equivalent to the ratio of *N* to the harmonic mean of *N*_{e}, *i.e.*, to 1/(1 + *ε*). For the case of an exponential distribution of *s*, he estimated Δ*F* as ∼0.6 for *t* = 200, for three different dominance coefficients, 0, 0.1, and 0.2. With *h* = 0.2, the estimate of *ε* for a random marker with a population size of 50 is 1.35*B*_{tot}; with the above estimate of *B*_{tot} = 0.270 for generation 200, this gives *ε* = 0.365 and Δ*F* = 0.73, which is somewhat higher than Latter’s (1998) value. But the comparisons of the single-locus simulation results with Equation 19 suggested that the approximation underestimates *ε* by ∼30% (Figure 4); increasing *ε* by this amount to 0.474 gives Δ*F* = 0.68. If the effects of different loci on *N*_{e} combine multiplicatively rather than additively, as is known to be the case for BGS (Charlesworth and Charlesworth 2010, p. 402), *ε* should be replaced by exp(*ε*) – 1 = 0.606, and Δ*F* = 0.62, which is very close to the simulation value. The value of *h* has only a small effect on the results; for example, with *h* = 0, the uncorrected *ε* = 0.454 and Δ*F* = 0.69.

We can also use Equation 24 to determine the expectations with the population genomics estimates of a mean *s* of 0.004 and CV* _{s}* = 2, with

*h*= 0.25. Substituting these values into Equation 24 with the same

*N*and recombination parameters as before, Equation 24 gives a negative value of

*ε*= –0.0663, and Δ

*F*= 1.07. This implies that BGS would cause a reduction in

*N*

_{e}, to ∼93% of the neutral value. This conclusion should be treated with caution, since the very wide distribution of selection coefficients in this case means that there will be a substantial range of values that violate the assumption of

*S*< 1 required for the validity of Equation 19.

### Relation to the data on loss of variability in small populations

Most of the data that are relevant to the question of whether AOD generated by linkage to deleterious mutations can retard the loss of neutral variability in small randomly mating populations come from a few studies of the behavior of putatively neutral markers and quantitative traits in small laboratory populations of *D. melanogaster*. To avoid the complications of interpretation associated with high levels of inbreeding, we do not discuss results from studies of sib-mating or first-cousin mating, although these suggest a significant retardation of loss of variability (Rumball *et al.* 1994), consistent with the conclusion that retardation is favored when *S* is small. Latter (1998) analyzed the results of Latter *et al.* (1995) for two allozyme markers and found a Δ*F* value of 0.52 over 200 generations in populations maintained at a size of ∼50 (see Latter’s table 6), consistent with his multilocus simulation results based on an exponential distribution of *s*.

Experiments using seven allozyme markers and two bristle traits in pedigreed populations ranging in size from sib-mating to 500, and maintained for 50 generations, showed that the regression coefficients of heterozygosity and genetic variance on pedigree inbreeding coefficient were significantly lower than the value of 1 expected with neutrality, suggesting a retardation of loss of variability (Gilligan *et al.* 2005). The facts that the results from populations with different sizes were pooled, and that there is a nonlinear relation between heterozygosity and inbreeding coefficient when AOD is acting, mean that a quantitative analysis of these results in terms of our model is impossible. In addition, a later analysis of eight microsatellite loci over 48 generations for the same set of populations showed an acceleration of loss of variability by this criterion (Montgomery *et al.* 2010), which the authors interpreted as evidence for selective sweeps related to adaptation to the laboratory environment.

Different experiments have, therefore, given very different patterns, and it is certainly possible that any retardation of loss of variability due to AOD may be obscured by selective sweeps, even if *N* is of a size that would otherwise lead to a retardation of loss of variability. However, a basic problem with experiments of this type is that the heterozygosity for a single neutral locus has a very high stochastic variance, even if allele frequencies are estimated with complete accuracy (Avery and Hill 1977); see Equation S39 of File S1, section S8. This yields the asymptotic expression for large *t*:(25)This equation implies that the relative error increases as mean heterozygosity decreases. For example, at generation 100, if the initial heterozygosity was at the maximal value of 0.5 for a biallelic locus, the mean neutral *H* for *N* = 50 is 0.184, yielding an expected coefficient of variation for a single locus in a single population of 1.47. To obtain a coefficient of variation of 0.1, (1.47/0.1)^{2} = 188 independent heterozygosities would need to be measured, from either independent loci or populations or from a combination of the two. Given that figure 6 of Latter (1998) shows a difference of ∼0.2 between the expected neutral *F* value and the value from simulations with AOD due to deleterious mutations, which corresponds to the difference in scaled *H* values, it is clear that many more replications than this would be necessary to obtain statistically significant results, especially as the distribution of *H* is far from normal. The differences between the different experiments could thus be purely stochastic, especially as the autocorrelations between *H* values in different generations mean that the regression tests used by Gilligan *et al.* (2005) and Montgomery *et al.* (2010) are problematic.

It is possible that tight linkage of neutral markers to sets of very strongly and highly recessive deleterious variants that are in repulsion LD with each other could maintain variation as a result of the pseudo-overdominance that can arise in this circumstance: see the section on effects of multiple loci subject to mutation and selection, and Charlesworth and Charlesworth (1997), Palsson and Pamilo (1999), and Palsson (2001). This is most likely to occur in very small populations; major-effect mutations are relatively sparsely distributed, so that their effects on neutral variability are likely to be restricted to specific genomic regions, rather than evenly spread across the genome. Recent genomic investigations of the well-known Chillingham population of cattle, which have been maintained at a very small population size for 350 years, are consistent with this interpretation. Genome-wide diversity is very low, and residual variability is localized to a number of specific genomic locations (Williams *et al.* 2016).

A full assessment of the possibility that a realistic distribution of mutational effects on fitness (DFE), which are currently becoming available from genome-wide polymorphism data (Charlesworth 2015), can cause significant retardation of the loss of neutral variability will require simulations that use realistic DFEs, coupled with experiments on the effects of reduced population size that exploit the high levels of replication that can be achieved using modern genomic technology for generating large numbers of markers. It would also be desirable to use populations that have been maintained for a long period in the laboratory at a large size as the initial population, to avoid possible confounding effects of selective sweeps. Current theory and data cannot convincingly answer the question of whether AOD due to deleterious mutations is a credible explanation for the presence of more than expected levels of variability in small populations.

### Selection in favor of heterozygotes

Our main conclusion is that retardation of loss of variability is nearly always observed when there is heterozygote advantage, except when the allele frequency at the selected locus is close to the boundaries 0 or 1, and there is relatively strong selection, when it is known that heterozygote advantage tends to accelerate the loss of variability (Robertson 1962). As shown in Figure 4 and Table 3, a substantial degree of retardation can occur when *c* < < 10(*s* + *t*), especially with intermediate equilibrium allele frequencies. As with mutation and selection, with *S* + *T* < 1 and a fixed value of *C* the magnitude of retardation is proportional to the inbred load scaled by 2*N* (see Equation 22).

Genome scans for signatures of balancing selection suggest that this is a relatively rare phenomenon compared with the number of genes in the genome (Charlesworth 2006; Gao *et al.* 2015), so that it is unlikely that a given neutral site will be closely linked to a locus maintained by heterozygote advantage. The exception is inversion polymorphisms in organisms such as *Drosophila*; these can cover relatively large proportions of the genome and also suppress crossing over to a considerable extent outside their breakpoints (Krimbas and Powell 1992). They could thus have a substantial impact on the behavior of neutral variants in small populations. Our conclusions concerning heterozygote advantage are, therefore, relevant to experimental results on small populations where inversions are segregating, as is often the case with *D. melanogaster*.

## Acknowledgments

We thank Nick Barton and Bill Hill for useful discussions. Nick Barton, Reinhard Bürger, Bill Hill, Jinliang Wang, and an anonymous reviewer made helpful suggestions for improving this article. L.Z. was supported by a China Scholarship Council scholarship (no. 201506100068).

## Appendix

### Approximations for the Expected Frequency of B_{2} and Expected Heterozygosity at the Selected Locus

With weak selection, the deterministic change in the frequency *q* of allele B_{2} under selection and mutation (neglecting terms of order *s*^{2}) is given by(A1)Because Δ*q* is third degree in *q*, it is impossible to obtain exact, closed equations for the changes in the expectations of *q* and *q*^{2}, E{*q*} and E{*q*^{2}}, under drift, selection, and mutation. However, if it is assumed that second-order terms in the deviation of *q* from its deterministic equilibrium value *q**, given by Equation 5a, can be ignored, approximate recursion relations can be obtained as follows. [The matrix calculations described in section S1 of File S1, which neglect third-order terms in (*q* – *q**), yield better approximations.]

Taking the first derivative of Equation A1 with respect to *q* yields a linear deterministic recursion relation for the departure of *q* in generation *t* from the equilibrium frequency *q** under mutation and selection, which is given by Equations 5,(A2a)where(A2b)Since this is linear in *q*, the mean of *q* under drift, selection, and mutation remains at *q**.

The equilibrium expected heterozygosity at the selected locus is given by *H*_{2}* = 2*q**(1 – *q**)*A/*(1 + *A*), where *A* = –4*N*_{e}*κ* (*e.g.*, Charlesworth and Charlesworth 2010, p. 355). The change in the expected heterozygosity at the selected locus at time *t*, *H*_{2}* _{t}*, is approximately equal to the expectation of 2(1 – 2

*q*)Δ

_{t}*q*(

*q*); combining this with Equations A2 yields the following recursion relation:(A3)Because

_{t}*H*

_{2}is twice the expectation of

*pq*, Equation A3 was used in Equations 4 for determining the approximate apparent selection coefficients.

### Approximate Neutral Recursion Relations for the Expectation of *r*^{2}

For the purely neutral case, Sved (1971) proposed the following approximation for the recursion relation for the expected correlation coefficient in generation *t*, E{*r _{t}*

^{2}},(A4a)where

*c*is the frequency of recombination between the two loci, assumed to be <<0.5. Since mutation at the B locus reduces the coefficient of linkage disequilibrium

*D*with the A locus by

*u*+

*v*each generation, this equation should be modified as follows:(A4b)This yields the equilibrium solution(A4c)where

*C*′ = 2

*N*(

*c*+

*u*+

*v*) and E*{} indicates the expectation at statistical equilibrium.

If the initial population is in linkage equilibrium, Equations A4 yield the following homogeneous recursion relation:(A5)This suggests the following expression for σ_{d}^{2} = E{*D*^{2}}/E{*xypq*}:(A6a)From Ohta and Kimura (1971), the equilibrium value of σ_{d}^{2} is given by(A6b)The simulation results show that Equation A6a tends to overestimate E{*r*^{2}} for *t* ≤ 0.5*N*; it is quite accurate for later generations up to *t* = *N* and tends to underestimate E{*r*^{2}} thereafter. Equation A5 gives a better approximation for the earlier and later generations, but a worse approximation for the intermediate generations.

### Expectations of the Reciprocals of the Allele Frequencies at the Neutral Locus

The expectations of *x*^{–1} and *y*^{–1} = (1 – *x*) ^{–1}, conditioned on segregation at locus A, for use in Equations 3 and 9, can be derived using the diffusion equation solution for the case of pure drift (Kimura 1955), under the assumption that departures from strict neutrality can be neglected as a first-order approximation. For this purpose, it is convenient to rescale time in units of 2*N* generations, such that *T* = *t*/(2*N*). Using the first three terms in the series expansion for the unconditional probability density of the frequency *x* of A_{1} at time *T*, given an initial frequency *x*_{0}, we have(A7)The expected frequency of heterozygotes at locus A at time *T* is(A8)To determine the conditional probability of *x* for a segregating population, we need the probability that the population is still segregating for A_{1} and A_{2} at time *T*, given by equation 8.4.9 of Crow and Kimura (1970):(A9)The conditional probability density of *x* at time *T* is then given by *φ*/*P _{s}*, which can be used to determine the mean of

*x*

^{–1}at time

*T*for segregating populations,(A10)where

*γ*is Euler’s constant (∼0.5772), which is equal to the difference between the sum and integral of 1/

*x*.

An equivalent expression for the expectation of *y*^{–1} can be obtained by interchanging *x* and *y* and *x*_{0} and *y*_{0} in these equations.

For sufficiently large *T*, it is well known that Equation A7 implies that the conditional probability distribution becomes uniform (Fisher 1930), regardless of the initial allele frequency, so that its mean should be well approximated by(A11)As was pointed out by Fisher (1930), there is a slight departure of the asymptotic distribution from a uniform distribution, due to the inaccuracy of the diffusion approximation near the frequencies 0 and 1, with lower frequencies in the subterminal classes close to either boundary, although the distribution is still symmetrical about 0.5. Using the table of exact values computed by Fisher for the first 10 classes from 1/(2*N*) upward and from 1 – 1/(2*N*) downward, the asymptotic conditional means are slightly greater than the above value, by 2.299 – [2.6556 × 2*N*/(2*N* – 1)]. Addition of this term to Equation A11 provides a slightly more accurate approximation for the long-term expectation of the reciprocals of the allele frequencies, which was used to calculate the asymptotic expected selection coefficients.

## Footnotes

*Communicating editor: R. Nielsen*Supplemental material is available online at www.genetics.org/lookup/suppl/doi:10.1534/genetics.116.188912/-/DC1.

- Received March 4, 2016.
- Accepted April 25, 2016.

- Copyright © 2016 by the Genetics Society of America