The effects of recessive, deleterious mutations on genetic variation at linked neutral loci can be heterozygosity-decreasing because of reduced effective population sizes or heterozygosity-increasing because of associative overdominance. Here we examine the balance between these effects by simulating individual diploid genotypes in small panmictic populations. The haploid genome consists of one linkage group with 1000 loci that can have deleterious mutations and a neutral marker. Combinations of the following parameters are studied: gametic mutation rate to harmful alleles (U), population size (N), recombination rate (r), selection coefficient (s), and dominance (h). Tight linkage (r ≤ 10–4) gives significant associative effects, leading either to strong reduction of heterozygosity when the product Nhs is large or to a clear increase when the product Nhs is small, the boundary between these effects being 1 < Nhs < 4 in our simulations. Associative overdominance can lead to heterozygosities that are larger than predicted by the background selection models and even larger than the neutral expectation.
THE selective effects on neutral loci due to linkage have been described for different modes of selection, for hitchhiking (Maynard Smith and Haigh 1974) or “selective sweep” background selection (Charlesworthet al. 1993; Hudson and Kaplan 1995a; Nordborg et al. 1996), and temporal fluctuations in the direction of selection (Gillespie 1994; Barton 1995). Directional selection reduces variation of neutral or nearly neutral variants, either because neutral alleles hitchhike with linked beneficial mutations that sweep through the population or because background selection against deleterious mutations reduces the effective population size. The theory of background selection is developed for large populations where Muller's ratchet is not effective and selection reduces the effective population size approximately to a fraction f0, which is the frequency of mutation-free chromosomes, and consequently the amount of variation and the number of segregating sites at linked loci are reduced to the same extent. Correlation of the recombination rate and nucleotide diversity in Drosophila melanogaster agrees with the predictions of background selection (Begun and Aquadro 1992; Aquadroet al. 1994; Charlesworth 1996) although the data also agree with hitchhiking (Stephan 1995).
Reality becomes more complicated in small populations. Selection is not simply directional against chromosomes carrying harmful mutations. Chromosomes that are most divergent from each other carry different harmful mutations, and selection depends not only on the number of such mutations but also on how often they are expressed as homozygotes. This can result in frequency-dependent selection favoring rare variants at the level of chromosomes and in balancing selection when recombination is restricted. Such selection promotes neutral variation at linked loci, opposite to the effect of background selection. Frydenberg (1963) called this effect of recessive deleterious mutations on linked loci associative overdominance, and further studies by Sved (1972) and Ohta (1973) quantified some of its effects. Associative overdominance has been invoked as the cause for the observed relationship between individual heterozygosity and fitness (Ohta 1973; Zouros and Mallet 1989; Pamilo and Pálsson 1998).
Associative overdominance can result from both gametic and genetic correlations, i.e., it should be stronger under inbreeding, in small populations, with tight linkage and weak selection (Ohta 1971). Whether associative overdominance based on recessive harmful mutations can override the reduction of the effective population size by background selection is the subject of this study, and this is examined by exploring the parameter space defined by recombination rate and the product Nhs. We look particularly at the situations when the product Nhs is close to one, as the limiting value between neutrality and background selection is given by hs < 1/N [Charlesworthet al. 1993; compare with the limiting value s < 1/(2N) or s < 1/N for additive selection at a single locus (Kimura 1983, p. 44)].
We use a diploid model with one pair of chromosomes carrying 1000 loci, spread uniformly along the chromosome. The number of new deleterious mutations per gamete is selected in each generation randomly from a Poisson distribution (with a mean of U/2), the sites at which deleterious mutations occur being sampled from a uniform distribution. Two different mutation rates are used in our simulations (U = 0.1 and 0.01 per diploid genome per generation). These values are consistent with mutation rates used by Charlesworth et al. (1993) and are somewhat conservative (Keightley 1994).
The basic fitness model calculates the fitness of an individual multiplicatively over the selected loci as dominance (h) coefficients and the population size (N) are fixed in each simulation with the product Nhs varying from 0 to 20 (N ranges from 50 to 600, s from 0.02 to 0.1, and h from 0.0 to 0.5).
The associative effects depend on the distribution of deleterious alleles in the genomes, and this distribution can be influenced by the fitness scheme. We therefore expand the basic model by allowing epistatic interactions and variation of selection coefficients among loci. Synergistic epistasis, based on an approximation (assuming small s) by Charlesworth et al. (1991) of Crow's (1970) quadratic fitness model employing an “effective number of mutations” n, weights heterozygous loci by the dominance coefficient as n = hz + y, and the fitness is then given by (Charlesworthet al. 1991). Values used in the simulations are h = 0.1 and the selection parameters a = 0.01, 0.1 and b = 0.02, 0.1. When b = 0 the synergistic model reduces to the multiplicative model, with s = a ⪡ 1.
Varying selection is modeled by drawing the selection coefficient (s) for each mutation from a γ-distribution with parameters α = 31.4 and β = 0.691, based on the maximum-likelihood estimates from the data of Mukai et al. (1972; see Charlesworth 1996) and assuming a constant dominance coefficient h = 0.1.
The fitness values are used to give a probability distribution from which the parents of the next generation are drawn. Mating is random and the offspring consist of two gametes randomly selected from the two parents. The number of crossovers is sampled from a Poisson distribution with mean L (L = 0, 0.01, 0.1, or 1.0). The recombination frequency r between adjacent loci is given by Haldane's equation (Haldane 1919) r = 1/2(1 – exp(–2L/(n – 1))), where n is the number of loci. The value of r thus ranges from 0 to 10–3 in our simulations. The crossover sites along the chromosomes are sampled from a uniform distribution without any interference. The choic of recombination rates is made to allow a comparison with the earlier work by Charlesworth et al. (1993), who predicted that the recombination rate r > 10–3 will remove any significant linkage effects.
Infinite-allele model: Our basic model concerning the behavior of neutral markers assumes a locus where mutations follow the infinite-allele model. Repeated mutations occur at the neutral locus at a rate of μ = 2 × 10–4 mutations/gamete/generation, each resulting in a new allele. The marker heterozygosity is recorded and averaged over generations (50,000 or 100,000). The expected neutral heterozygosity is 4Nμ/(1 + 4Nμ) (Kimura 1983). We examined the effects of background by introducing neutral mutations at loci both at the tip and in the center of the chromosome.
To examine allele frequency dynamics, the simulations are used for constructing empirical transition matrices of allele frequencies at the neutral marker. The observed mean change Δp = p1 – p0 between two subsequent generations shows whether the selected loci cause any frequency-dependent associative selection at the linked neutral locus. The empirical transition probabilities are used to construct Markov chains that are used to give cumulative heterozygosities (Hc) for newly introduced mutations (see the two-allele model below). This approach connects the two models used here, the two-allele model and the infinite-allele model.
Two-allele model: To record the times to loss of respective fixation of new mutations, we use a biallelic model (Charlesworthet al. 1993). The model is first run 2000 generations with mutations occurring at the background loci, after which a neutral mutation is introduced at a locus in the middle of the chromosome and the simulation is run until the mutation is either fixed or lost. The number of deleterious mutations per chromosome stabilizes within the prerun of 2000 generations, except in a few cases with extremely low recombination rates, when deleterious mutations keep accumulating in the chromosomes (see below). We introduce 5000 mutations in the same background genotypes sequentially and new backgrounds are generated normally ten times, giving a total of 50,000 introductions for each set of parameter values. In some cases with a low recombination rate, deleterious mutations started to accumulate in the chromosomes, and fewer simulations are run.
The number and time of fixations and losses of neutral alleles are recorded, as are the cumulative heterozygosities , where xi is the frequency of the neutral mutation in generation i. The expected cumulative heterozygosity of a neutral allele until either fixation or loss is He = 2Ne/N and equal to 2.0 in our simulations.
A source code for the simulation program exists independently in Basic and C. Pseudo-random numbers are generated in the C-code using procedures from Numerical Recipes in C (Presset al. 1988).
Theoretical expectations: We use the theory developed by Nordborg et al. (1996) to calculate the expected effects of background selection. The approximate reduction in genetic diversity (π) relative to the neutral expectation (πn) due to background selection is where qi = 3u/(2hisi + si) is the equilibrium frequency mutation rate per locus), and ρi = ri(1 – sihi)/(sihi) measures the effects of recombination frequency (ri) between the neutral locus and the selected locus i. This applies mainly for large populations, when the product Nhs > 1 and the recombination rate between adjacent loci is <10–3. The cumulative heterozygosity is affected to the same degree as genetic diversity within generations (Charlesworthet al. 1993).
Ohta (1971) derived an analytical formula for associative overdominance, where the selection coefficient between a marker homozygote and heterozygote, caused by a single background locus, is when u ⪢ hs. The joint effect of all the background loci is calculated as an accumulating product of 1 – s′ over all background loci, although such a multilocus s′ is not always a good approximation of a real system (Pamilo and Pálsson 1998). The theory of associative overdominance does not give any prediction for the marker heterozygosity.
Coalescence: Deleterious alleles are expected to affect the coalescent times at linked neutral loci. As the application of the coalescence theory is gaining popularity, we also use this approach in the case of no recombination. Tajima (1989) proposed a test of neutrality that compares the estimates of θ = 4Nμ, Π (the average number of nucleotide differences), and Sn (number of segregating sites). Different types of selection affect the estimates in a different way. Let , where n is the number of haplotypes. The difference Π – Sn/an, which is tested by Tajima's test, is positive in the case of overdominance, zero for a neutral locus, and negative under directional selection. Mean time to coalescence (T2/2) (or the expected time to common ancestor of two randomly sampled genes) is related to Π as E(Π) = T2μ, and the overall size of a tree (Tn) based on observed coalescence times is related to Sn as E(Sn) = Tnμ. Tajima's test can therefore be based on Tn and T2 (Hudson and Kaplan 1995b). Ten replicates without recombination are simulated for 5000 generations each, both without selection and with selection (s = 0.1, h = 0.2), and the mutation rate U = 0.1. The coalescence tree is recorded for the chromosomes remaining at the end of each simulation and used for the test.
No background selection: The mean cumulative heterozygosity (Hc = 2.054, SE = 0.07) in the two-allele model without background selection agrees well with the theoretical expectation of 2.0. The observed mean heterozygosity for the infinite-allel model (0.0744) also agrees with the neutral expectation (0.0741) for N = 100 and μ = 2 × 10–4. Tajima's test of the coalescence tree also confirms the neutral expectation, as the test statistic, –0.07 (SE = 0.04), is not significantly different from zero.
Effect of linkage: The results under selection are based on the mutation rate U = 0.1 for the deleterious alleles unless otherwise stated. When the recombination rate between adjacent loci is r = 10–3, the heterozygosities of the neutral marker at the center of the chromosome agree well with the neutral expectations, or are slightly less, for all combinations of N, h, and s used (Figure 1, a and b and Table 1). The times to fixation or loss of the introduced alleles in the two-allele model are also close to the neutral expectations of E(tfix) = 4Ne = 400 generations and E(tloss) = (2Ne/N)ln(2N) = 10.6 generations when N = 100 (Table 1).
With tighter linkage (r < 10–3), the heterozygosities depart from the neutral expectations in a way that depends on the other parameters (Table 1). The recombination rate of r = 10–4 results in heterozygosities lower than the neutral expectations when Nhs > 1. The larger the product Nhs the lower are the heterozygosities (Table 1 and Figure 1, c and d). Very tight linkage (r = 10–5), however, increases heterozygosities above the neutral expectations for values of Nhs ≤ 4 (Figure 1, e and f).
The mean time to loss of the introduced neutral allele always decreases with reducing recombination rate, and the above differences in the cumulative heterozygosity between the sets of parameter values depend largely on the mean time to fixation. When the mean time to fixation increases, so does its variance (Table 1). This means that there are occasional runs where the introduced mutation segregates for a very long time, increasing the mean heterozygosity. Heterozygosity at a neutral marker at the tip of the chromosome is about the same as at a locus in the middle when r = 10–3, lower when r = 10–4, and higher when r = 10–5 (Figure 2).
When recombination is rare or absent (r = 10–5, r = 0) the number of harmful mutations occasionally keeps accumulating, and the chromosomes start to evolve to a system of balanced lethals. The marker heterozygosities are in these cases underestimated in our simulations and the mean time to fixation cannot be estimated (Table 1). To further study the process in the case of no recombination, we look at the frequency changes and the coalescence events (see below).
Effect of mutation rate: Reducing the genomic rate at the background alters the results in such a way that the observed heterozygosities for U = 0.01 tend to be smaller than for U = 0.1, but can still be higher than the neutral expectation (Table 1).
Effect of dominance: The heterozygosity at the neutral marker is largely predicted by the product Nhs (for a given value of r), but there is residual variation depending on the individual parameters. The associative effect is larger with strong dominance (small h), and small h values can lead to high heterozygosities when linkage is tight. The observed heterozygosities exceed the neutral expectation up to Nhs = 5 when h = 0.1 and up to Nhs ≈ 2 when h = 0.3 (Figure 1f).
Effect of selection: When the product Nhs is kept fixed, there is a tendency that weak selection leads to lower heterozygosities than strong selection (s = 0.2; Figure 1c). However, this relationship is reversed when linkage is tight (r = 10–5) and Nhs > 3 (Figure 1e).
Simulations with synergistic selection and with varying selection coefficients give qualitatively similar results to those with fixed values of s and multiplicative selection. Although, synergistic selection is comparable to a stronger selection, the mean number of mutations per individual when a = 0.01 is similar to when the s value is 00.5–0.08, and the heterozygosity is less than the neutral expectation when a and b are equal to 0.1 for r = 10–5.
Comparison with the background selection model: The mean heterozygosity from our simulations tends to exceed that predicted by the background selection model (Figure 3). The departure between the observed and predicted values is large in cases where the predicted heterozygosities are small and particularly when the populations are small, and the deviations are largest for r = 10–5.
Comparison with Ohta's model: When linkage is tight (r = 10–5), the coefficient of associative overdominance s′ calculated according to Equation 4 correlates well with heterozygosity, and the observed heterozygosity exceeds the neutral expectation when s′ > 0.07 (Figure 4). For r = 10–4, the correlation between s′ and heterozygosity is weak.
Frequency-dependent selection: The Markov chains, based on the observed allele frequency changes in the infinite-allele model, result in cumulative heterozygosities very close to those observed in the two-allele model (Table 2). Plotting the observed allele frequency change Δp against the frequency p0 in the parental generation shows some clear patterns. The Δp values are large for tight linkage, reflecting small effective population size and strong drift. No frequency dependence is observed when r = 10–3. There is frequency-dependent selection favoring rare alleles (Figure 5a) in the case where the observed heterozygosity clearly exceeds the neutral expectation (by a factor of 1.26; r = 10–4, s = 0.02, h = 0.1, N = 100, U = 0.1). As there are many rare alleles (new mutations), their frequency increases must be compensated by negative Δp's of common alleles. A low recombination rate (r = 10–5) can occasionally lead to formation of complementary chromosomes that accumulate harmful mutations and evolve to a system equivalent to balanced lethals (Figure 5b). Such a case shows, of course, clear frequency-dependent selection.
Coalescence: Background selection in the absence of crossing over and with parameters s = 0.1, h = 0.1, and N = 100 gives a mean coalescence time T2/2 = 2156 (SE = 400), which greatly exceeds the neutral expection of 2N = 200 generations. The same holds for the mean overall size of the tree Tn = 9470 (SE = 1426), while the neutral expectation is 2349.
The distribution of branch lengths in the simulations of a completely neutral model fits well an exponential distribution, whereas the tree with harmful mutations has few and very long internal branches, and many short external branches, and the total tree length increases. Mean branch length increases with selection from 5.6 to 23.8 generations, and the maximum time to coalescence increases from 441 to 4959 generations.
The Tajima's tests gave significantly positive values whether based on the coalescence tree (D = 4.817, SE = 0.316) or on the number of mutations (D = 1.22, SE = 0.06). The results under selection depend on the number of generations simulated, as the size of the tree and the test statistic D increase with time when the chromosomes evolve into a system of balanced lethals.
The effects of associative overdominance were also observed with the Fu and Li test (Fu and Li 1993), with the internal branches becoming excessively long compared to the external branches.
The results show joint effects of reduced effective population size and associative overdominance, the balance depending on the interaction of the recombination rate, population size, dominance, and selection. Any significant linkage effects require that the recombination rate between adjacent loci is r < 10–3. Associative overdominance is clearest close to the origin of the parameter space defined by the recombination rate r and the product Nhs. In other words, the requirements are tight linkage r ≤ 10–4, small population size, strong dominance, and relatively weak selection. The boundary between the two opposite linkage effects, one increasing and the other decreasing heterozygosity at linked loci, lies generally within the range 1 < Nhs < 4 for the parameter values used in our simulations.
The heterozygosity-decreasing effects of background selection, as derived by Nordborg et al. (1996) for large populations, explain part of our results. Heterozygosity at the neutral marker is close to the neutral expectation when r = 10–3 and tends to drop below that level when r = 10–4. However, the reduction is not as large as predicted by the model, indicating that associative overdominance plays a role in determining the variation pattern at the marker locus. Reducing the recombination further to r = 10–5 finally shifts the balance toward associative overdominance and the heterozygosity exceeds the neutral level for small values of Nhs. The simultaneous action of both factors is seen in the mean times to fixation and loss. Reduced recombination always decreases the mean time to loss of a new neutral mutant, showing the effect of reduced effective population size. This is true even in the cases where the average heterozygosity is high, and this high level is caused by a long time to fixation keeping the population polymorphic.
The effect of background selection depends on the increased fitness variance among individuals, as that variance reduces the effective population size (Charlesworthet al. 1993). In small populations and with tight linkage, the harmful mutations are at linkage disequilibria with the alleles at a linked marker locus and there can develop chromosomal segments with several deleterious mutations linked with a specific marker allele (Sved 1972). Selection acting on such linkage groups can take the form of overdominance, which is seen at the individual level as an association between heterozygosity and fitness (Pamilo and Pálsson 1998). Low genomic mutation rate (U = 0.01) can only lead to such linkage disequilibria if selection is weak (s = 0.02); stronger selection eliminates harmful mutations more efficiently.
Whether heterozygosity-increasing or -decreasing effects dominate depends also on the product Nhs. With low recombination, slightly deleterious mutations accumulate in small populations and selection becomes stronger as it acts on a block of several mutations, resulting in associative overdominance. As the product Nhs gets larger, purifying selection generally becomes more efficient, reducing Ne and variation at neutral loci as predicted by background selection, and the results depend on the interaction of h and s values as can be seen in Table 1.
Our results indicate that associative overdominance is partly caused by frequency-dependent selection favoring rare types, but the main effect is due to a prolonged time to fixation. This means that the neutral mutations are linked to specific genetic backgrounds and that the resulting heterozygosity varies among markers. Some of them, with a suitable background, show associative overdominance while others are affected more by purifying background selection even when the values of the genetic parameters are the same. In our results, we focused on the mean effects, but one must remember that the variances can be large.
The theoretical models developed to date for the effect of deleterious mutations have clear limitations when applied to small populations. Ohta's (1971, 1973) theory on associative overdominance requires tight linkage (or large s′ values) for heterozygosity to be elevated above the neutral expectation. When s′ values are small, drift and purifying background selection override associative overdominance, and heterozygosity is reduced even though it still shows a positive relationship with s′ at the individual level (Pamilo and Pálsson 1998). The background selection model fails to predict the behavior of a linked marker when the parameter values are within the region leading to associative overdominance. Large heterozygosity values and a positive relationship between heterozygosity and s′ indicate that both types of selection are operating at the same time. The analytical solution of Nordborg et al. (1996) for background selection is derived for large populations, assuming, e.g., that the frequency of a deleterious mutant allele at locus, i, qi ⪡ (hsi + ri(1 – hsi))Ne. This assumption is most likely to be violated for low recombination rate, weak selection, and small population size, and as pointed out by Nordborg et al. (1996) and confirmed in this study, it will lead to underestimation of heterozygosity at the neutral marker.
The theoretical studies of background selection have largely relied on the assumption that there exists a class of chromosomes free of harmful mutations with a frequency of f0. This basic assumption underlies the model of Hudson and Kaplan (1995b) based on the coalescence theory. About 10% of the values of the Tajima's D statistic in the study of Hudson and Kaplan (1995b) with N = 100 and sh = 0.02 were highly negative, <–1.61. In our study with sh = 0.01 all Tajima's D statistics are positive, showing the overdominance effect. The difference in results shows that the assumptions used in models for large populations do not in fact hold for small populations.
Mean time to fixation increases in the case of associative overdominance. In an extreme case without recombination, the gamete pool splits in two major divisions, with a short time to coalescence within but a very long time between divisions. Neither major division of haplotypes can get fixed as the homozygotes have much lower fitness than heterozygotes. The population enters a balanced equilibrium in a way similar to a case of multilocus heterosis described by Franklin and Lewontin (1970). Most mutations are rapidly lost, however, as time to coalescence within the divisions becomes shorter, indicating a small effective size of the divisions augmented by background selection. Crossing over breaks down such chromosomal blocks, preventing the accumulation of harmful mutations. But a related phenomenon is still seen when the neutral markers are at linkage disequilibrium with closely linked deleterious mutations, and the effect is seen as a long time to fixation of a neutral mutation.
Even though the populations studied here are of small size, from 100 to 600 individuals, the result may also be of concern for larger populations either when they are subdivided or if they undergo a reduction in population size, which further causes a temporary linkage effect. The effective population sizes are often only 10% of the actual size (Frankham 1996). Associative overdominance may contribute to deviation of observed patterns from the predictions of background selection theory (Charlesworth 1996; Kirby and Stephan 1996), but most studies on the relationship between recombination rate and sequence variation have been conducted on large populations of Drosophila where the associative overdominance effect is predicted to be weak. There are, however, many species of both animals and plants in which conditions necessary for the associative overdominance are met.
The background selection models predict that heterozygosity is reduced in areas of low recombination (Charlesworthet al. 1993; Nordborg et al. 1996) whereas our results show that the opposite can be true in some situations. Comparisons of genomic areas with different rates of recombination can test these predictions. Deleterious mutations can at the same time both reduce the effective population size and cause associative overdominance, and the latter effect can explain why the data do not fit the predictions of the background selection, particularly in species where the population sizes are small or have fluctuated.
We thank the reviewers for many valuable comments and Martin Lascoux for discussions and comments. This work was supported by grants from the Carl Trygger Foundation and was done partly by using the computer facilities at the Plant Breeding Department, Swedish Agricultural University.
Communicating editor: W. Stephan
- Received January 16, 1999.
- Accepted May 13, 1999.
- Copyright © 1999 by the Genetics Society of America