Joint Effects of Genetic Hitchhiking and Background Selection on Neutral Variation
- Yuseob Kim and
- Wolfgang Stephan
- Corresponding author: Yuseob Kim, Department of Biology, University of Rochester, Rochester, NY 14627. E-mail: yuse{at}troi.cc.rochester.edu
Abstract
Due to relatively high rates of strongly selected deleterious mutations, directional selection on favorable alleles (causing hitchhiking effects on linked neutral polymorphisms) is expected to occur while a deleterious mutation-selection balance is present in a population. We analyze this interaction of directional selection and background selection and study their combined effects on neutral variation, using a three-locus model in which each locus is subjected to either deleterious, favorable, or neutral mutations. Average heterozygosity is measured by simulations (1) at the stationary state under the assumption of recurrent hitchhiking events and (2) as a transient level after a single hitchhiking event. The simulation results are compared to theoretical predictions. It is shown that known analytical solutions describing the hitchhiking effect without background selection can be modified such that they accurately predict the joint effects of hitchhiking and background selection on linked, neutral variation. Generalization of these results to a more appropriate multilocus model (such that background selection can occur at multiple sites) suggests that, in regions of very low recombination rates, stationary levels of nucleotide diversity are primarily determined by hitchhiking, whereas in regions of high recombination, background selection is the dominant force. The implications of these results for the identification and estimation of the relevant parameters of the model are discussed.
It has been suggested that the “hitchhiking effect” of a strongly selected allele on the frequencies of neutral DNA polymorphisms at linked loci may play an important role in determining the patterns of genetic variation across eukaryotic genomes (Maynard Smith and Haigh 1974; Ohta and Kimura 1975; Kaplan et al. 1989; Stephan et al. 1992; Gillespie 1994; Barton 1998). Furthermore, it has been shown that, on the basis of the observed patterns of variation, the relevant parameters of the underlying selective process can be estimated (Wiehe and Stephan 1993; Stephan 1995). As predicted by the theory of hitchhiking (Birky and Walsh 1988, and references above), the level of genetic variation is usually positively correlated with the rate of recombination, but divergence between closely related species is nearly unaffected by recombination (Begun and Aquadro 1992). On the other hand, the theory of “background selection” (leading to a reduction of effective population size by recurrent deleterious mutations) proposed by Charlesworth et al. (1993) makes qualitatively similar predictions (Hudson and Kaplan 1995; Nordborg et al. 1996). Hudson and Kaplan (1995) and Charlesworth (1996) showed that background selection can explain genome-wide patterns in Drosophila melanogaster polymorphism data. However, a similarly good fit has been obtained when the hitchhiking model alone is applied to the same data set (Stephan 1995). Other studies supported hitchhiking over background selection for explaining patterns of genetic variation at (mostly) individual loci (Schlötterer et al. 1997; Nurminsky et al. 1998; Stephan et al. 1998; Benassi et al. 1999), whereas in some cases background selection was thought to be sufficient in explaining the results. Thus, it appears that the relative importance of hitchhiking and background selection in determining the level of genetic variation remains essentially unknown.
Previous studies of the hitchhiking effect used a model in which recurrent deleterious mutations at linked loci are not included. However, since the rate of deleterious mutations is believed to be high (Keightley and Eyre-Walker 1999), hitchhiking events are likely to occur in chromosomal regions where the standing level of variation is already reduced by background selection (Charlesworth 1996). Therefore, the effect of background selection on the process of hitchhiking should be investigated to assess the relative importance of these two forces.
Peck (1994) and Barton (1995) studied the reduction of the fixation probability of strongly selected alleles due to background selection. Stephan et al. (1999) derived results for the effect of background selection on the nucleotide diversity and fixation probability at a partially linked, weakly selected locus. But genetic variation at a neutral locus that is partially linked to both a locus under strong positive selection and a locus under background selection has not been investigated. In the latter case, the dynamics of favorable and deleterious alleles may interfere with each other, making it difficult to predict the outcome of this interaction by analyzing these processes individually. In this article, we investigate this problem using simulations based on three-locus, two-allele models. We measure heterozygosity at a neutral locus using two different models of genetic hitchhiking. In the first model, we analyze the stationary level of heterozygosity caused by background selection and recurrent hitchhiking events; in the second one, we study the effect of background selection and a single hitchhiking event on heterozygosity. The results of this analysis have significant implications for our understanding of selective processes in natural populations and for the identification and estimation of relevant parameters of these processes.
STATIONARY LEVEL OF HETEROZYGOSITY CAUSED BY BACKGROUND SELECTION AND RECURRENT HITCHHIKING EVENTS (MODEL 1)
In this section, we investigate the stationary level of heterozygosity determined by recurrent substitutions of favorable alleles and continuous removal of deleterious alleles by background selection. We use a simple discrete-generation model of a diploid population of size N. (Note that a list of parameters is provided in Table 1.) Since we assume that the fitness effects of alleles within a locus combine multiplicatively, this model is equivalent to that of a haploid population of size 2N that undergoes random conjugation and recombination. We consider a three-locus model such that the three loci are located on a chromosome in the following order: The first locus (Del) experiences recurrent deleterious mutations. The mutation of the wild-type allele (A) to a deleterious allele (a) with selective disadvantage t occurs at a rate u (per gene per generation). That is, as in Stephan et al. (1999), background selection acts at a single locus. The second locus (Fav) is under positive selection such that a favorable allele (B) with selection coefficient s is introduced as described below. Mutation from an ancestral (m) to a derived neutral allele (M) occurs at the third locus (Neu). The genetic background (haplotype) on which the mutations to B and to M occur is chosen randomly in proportion to its frequency. The recombination fractions between Del and Fav and between Fav and Neu are r1 and r2, respectively. The two other possibilities of gene order, i.e., Del-Neu-Fav and Fav-Del-Neu, were also studied. For all three gene orders, the simulation and theoretical methods are similar. So we focus on the case of Del-Fav-Neu.
Simulation methods: For this model, there are eight possible haplotypes (ABM, ABm, AbM, Abm, aBM, aBm, abM, and abm) in the population. Therefore, the dynamics of the system can be completely described by the changes of eight haplotype frequencies. It is straightforward to derive a set of equations describing the deterministic change of haplotype frequencies by selection, recombination, and deleterious mutations (Appendix A). When one copy of B or M is introduced in the population, haplotype frequencies are changed accordingly before they are subjected to selection. To incorporate the effects of finite population size, multinomial sampling of different haplotypes was simulated after their frequencies were changed according to the deterministic equations. We used the random binomial number generator of Press et al. (1992) with some modification.
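The sampling step described above can be illustrated with a minimal Python sketch. It is not the authors' code: the paper used a modified binomial generator from Press et al. (1992), whereas this sketch draws gametes with the standard library's `random.choices`. The function name `sample_haplotypes`, the toy frequencies, and the small population size are assumptions chosen for illustration, and the deterministic recursions of Appendix A are not reproduced.

```python
import random
from collections import Counter

HAPLOTYPES = ["ABM", "ABm", "AbM", "Abm", "aBM", "aBm", "abM", "abm"]

def sample_haplotypes(expected_freqs, two_n, rng):
    """Draw 2N gametes from the expected haplotype frequencies (multinomial sampling).

    expected_freqs maps each haplotype to its frequency after the deterministic
    step; the return value maps each haplotype to its realized frequency.
    """
    names = list(expected_freqs)
    weights = [expected_freqs[h] for h in names]
    draws = rng.choices(names, weights=weights, k=two_n)
    counts = Counter(draws)
    return {h: counts[h] / two_n for h in names}

# Toy example (hypothetical values): a population of Abm and abm haplotypes
# with the deleterious allele a at frequency u/t = 0.2.
rng = random.Random(1)
expected = {h: 0.0 for h in HAPLOTYPES}
expected["Abm"] = 0.8
expected["abm"] = 0.2
next_gen = sample_haplotypes(expected, two_n=1000, rng=rng)
```

Haplotypes with an expected frequency of zero can never be drawn, so they stay absent until mutation or recombination reintroduces them in the deterministic step.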
Table 1. Definitions of parameters
Each simulation starts with a population of Abm and abm haplotypes. The initial frequency of a is given as u/t. Then, one copy of the favorable allele (B) is introduced in the population at rate uf if the population is fixed for b (note that uf is a mutation rate per population per generation, whereas u is per gene per generation). If B is fixed, all B's are converted to b. Therefore, with u, uf > 0, a mutation-selection balance at the Del locus and occasional directional selection at the Fav locus occur simultaneously. To measure the standing level of genetic variation at Neu, we used the method suggested by Charlesworth et al. (1993). An allele M is introduced at the beginning of each simulation run and introduced again whenever it is lost in the population by drift. If M is fixed in the population, at the next generation all M's are converted to m and then another M is introduced. The frequency of M, y, is monitored until M is lost or fixed. During this period, heterozygosity, 2y(1 − y), is summed over generations. The expected value of this sum, H, in the neutral model is 2.0 (Kimura 1969, 1971). Heterozygosity per site per generation, π, for a given hypothetical process of mutation, with rate μ, is then obtained by “spreading out” trajectories of M over time according to the mutation process; i.e., π = 2NμH. To observe the change of the level of genetic variation, we only need to measure H without modeling a specific mutation process (Charlesworth et al. 1993).
This procedure is based on the principle of ergodicity, which says that averaging a random variable of a stationary stochastic process over time leads to the same result as averaging this quantity at any given time point over different realizations of the process. Assuming that the selective phase during hitchhiking is very short, we may consider each selective sweep instantaneous. Since these hitchhiking events are modeled as a time-homogeneous Poisson process, their effect is a shortening of the trajectories of the neutral allele M, independent of time. As a consequence, the expectation of H can be found by evaluating this random variable at arbitrary time points during a particular realization of the process and by averaging over these values, or by evaluating an ensemble of realizations of the process at a particular time. A total of 10^8 introductions of M were made consecutively in each simulation run and the mean value of H was obtained. The mean numbers of generations until loss and until fixation were also recorded.
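The measurement of H can be sketched for the purely neutral case, without the Del and Fav loci. The Python code below is a simplified illustration and not the authors' program: the names `summed_heterozygosity` and `mean_h`, the tiny population size 2N = 20, and the replicate count are assumptions made so that the example runs quickly.

```python
import random

def summed_heterozygosity(two_n, rng):
    """Follow one new neutral allele M until loss or fixation.

    Returns the sum over generations of 2y(1 - y), where y is the frequency of M.
    """
    count = 1  # one copy of M is introduced
    total = 0.0
    while 0 < count < two_n:
        y = count / two_n
        total += 2.0 * y * (1.0 - y)
        # Binomial sampling of the next generation (neutral drift only).
        count = sum(1 for _ in range(two_n) if rng.random() < y)
    return total

def mean_h(two_n, replicates, seed=0):
    """Average the summed heterozygosity over independent introductions of M."""
    rng = random.Random(seed)
    total = sum(summed_heterozygosity(two_n, rng) for _ in range(replicates))
    return total / replicates

h_neutral = mean_h(two_n=20, replicates=20000)
```

In this discrete model expected heterozygosity decays by a factor 1 − 1/2N per generation, so the mean should lie near 2(1 − 1/2N), which approaches the value 2.0 quoted above as 2N grows. Nucleotide diversity would then follow from π = 2NμH for a chosen mutation rate μ.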
Theoretical predictions: We combined the previous theoretical results of hitchhiking and background selection using the following assumptions. Deleterious mutations occur very frequently, leading to the establishment of a mutation-selection balance. We assume that this mutation-selection balance at Del affects the fixation probability and the effective population size at the Fav locus. On the other hand, we assume that directional selection at Fav has no influence on the mutation-selection balance at Del. Therefore, the equilibrium frequency of the deleterious allele (a) is maintained during the selective phase. The combined effects of background selection and hitchhiking on heterozygosity may therefore be approximated by well-known formulas (Kaplan et al. 1989; Stephan et al. 1992; Wiehe and Stephan 1993) that describe the effects of hitchhiking on neutral variation as a function of linkage, effective population size, and strength of selection, except that these latter quantities have to be corrected to allow for the occurrence of background selection.
The effect of background selection at a single locus on a linked locus is given by
Simulation results: Table 2 shows the results of simulations for the gene arrangement Del-Fav-Neu in model 1. In most cases, we used t = 0.02, which is close to the mean heterozygote effect of deleterious mutations estimated from D. melanogaster (Crow and Simmons 1983). The choices of s, r1, r2, u, and uf are rather arbitrary. Since simulation time increases with population size, we used 2N = 10^5, which is smaller than typical Drosophila population sizes. We obtained accurate results for H, the fixation probability of B, and the mean time to loss (T0) and fixation (T1) of M expected under standard theories (simulations 1-1, 1-4, and 1-8). The fixation probability of M was close to 1/2N in all the simulation runs for model 1 (data not shown), which agrees with the fact that selection at linked loci does not affect the fixation probability of neutral mutants (Birky and Walsh 1988). The H values obtained for all parameter values agree well with our simple theoretical predictions (Equation 5). This result indicates that the effects of background selection and hitchhiking can be combined in a predictable way. However, the combined effects are not simply multiplicative. For example, comparing simulations 1-3, 1-4, and 1-6, the H values were reduced by the factors of 0.67 and 0.84 by background selection and hitchhiking, respectively, when each process took place without the other. However, when both processes occurred at the same time, the reduction factor of H was 0.61, larger than the product 0.67 × 0.84 = 0.57. This discrepancy can be fully explained by the expected change of fH from 0.836 (for 1-4) to 0.894 (for 1-6). This suggests that the effect of hitchhiking diminishes with background selection.
This nonmultiplicative combination of hitchhiking and background selection produces even more striking results when the effect of hitchhiking is very strong: In simulations 1-9 and 1-10, it is shown that the reduction of heterozygosity is smaller when hitchhiking is combined with background selection than when hitchhiking occurs in the absence of background selection. Simulations for the other arrangements of genes, i.e., Del-Neu-Fav and Fav-Del-Neu, gave qualitatively the same results (data not shown).
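The comparison of simulations 1-3, 1-4, and 1-6 can be retraced with a few lines of arithmetic. The Python sketch below uses only the reduction factors quoted in the text. It assumes that the joint reduction is the product of the background selection factor and the hitchhiking factor fH evaluated in the presence of background selection, which is one reading of the reference to Equation 5 (not reproduced here). The variable names are hypothetical.

```python
# Reduction factors of H quoted in the text for simulations 1-3, 1-4 and 1-6.
f_background = 0.67        # background selection alone
f_hitch_alone = 0.84       # hitchhiking alone (expected f_H = 0.836)
f_hitch_with_bgs = 0.894   # expected f_H when background selection is present
observed_joint = 0.61      # both processes acting together

# Simple multiplicative guess, using the hitchhiking factor measured in isolation.
naive_product = f_background * f_hitch_alone
# Product with the hitchhiking factor recomputed under background selection.
corrected_product = f_background * f_hitch_with_bgs

print(round(naive_product, 3), round(corrected_product, 3), observed_joint)
```

Under this reading the corrected product is about 0.60, much closer to the observed 0.61 than the naive product of about 0.56.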
In simulations 1-13, 1-14, and 1-15, where s > t, the equilibrium frequency of the deleterious allele at Del is likely to be perturbed by the substitution of B, which violates the assumptions of our theoretical predictions. However, for these parameter values theoretical predictions given by (5) still agree well with simulation results (Table 2). We further address this problem for model 2 below.
Generalizations and implications: The results obtained from our three-locus analysis suggest the following generalizations. Consider a chromosome of a finite physical length throughout which the recombination rate per nucleotide per generation, ρ, is constant. A position on the chromosome is described by the number of nucleotides, l, away from the reference locus Neu, where a positive (negative) value of l defines a locus to the right (left) of Neu. Neu is located lL and lR nucleotides away from the left and right ends of the chromosome, respectively. Deleterious mutations can occur at any position along the chromosome at a rate u (per nucleotide per generation). Single hitchhiking events may occur according to a time-homogeneous Poisson process caused by advantageous substitutions at randomly chosen loci. The question we ask is, What is the joint effect of these forces on neutral polymorphism at Neu? Equation 5 suggests that nucleotide diversity, π, can be approximated as
Equation 6 can be derived in a similar way as the corresponding equation without background selection (Wiehe and Stephan 1993). Let ν(l, ρ) and α(l, ρ) be the expected number of advantageous substitutions and the strength of selection at a position l, respectively. The rate at which a neutral polymorphism at Neu undergoes hitchhiking caused by selected substitutions between l and l + dl nucleotides away is then given by ν(l, ρ) [1 − h(l, ρ)]dl, where
Table 2. Results for model 1 (gene order: Del-Fav-Neu)
Assuming deleterious mutations have a uniform selective disadvantage, t, and effects of background selection at many loci combine multiplicatively, it can be shown that
Figure 1. Relative nucleotide diversity, π/π0, against per-nucleotide recombination rate, ρ. The model described in the text is used with t = s = 0.02, u = 1.5 × 10^-9, lL = lR = 10^7, kαν0 = 10^-9. The continuous line, produced by Equations 6, 10, and 11, represents the joint effect of hitchhiking and background selection. The graphs of relative diversity determined by background selection alone (dash-dot line) and hitchhiking alone (dashed line) are produced by π = π0fB(ρ) (Equation 10) and π = π0ρ/(ρ + kαν0) (Wiehe and Stephan 1993), respectively.
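Of the curves in this figure, the one for hitchhiking alone is given in closed form in the caption and can be evaluated directly. The Python sketch below does so with the caption's value kαν0 = 10^-9. The function name and the grid of recombination rates are assumptions for illustration, and the curves that depend on Equations 6, 10, and 11 are omitted because those equations are not reproduced here.

```python
def relative_diversity_hitchhiking(rho, k_alpha_nu0=1e-9):
    """pi/pi0 under recurrent hitchhiking alone: rho / (rho + k*alpha*nu0)."""
    return rho / (rho + k_alpha_nu0)

# Hypothetical grid of per-nucleotide recombination rates.
recombination_rates = [1e-10, 1e-9, 1e-8, 1e-7]
curve = [(rho, relative_diversity_hitchhiking(rho)) for rho in recombination_rates]
for rho, ratio in curve:
    print(f"rho = {rho:.0e}  pi/pi0 = {ratio:.3f}")
```

The ratio is one half when ρ equals kαν0 and rises toward 1 as recombination increases, consistent with hitchhiking dominating in regions of low recombination.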
TRANSIENT PATTERNS OF HETEROZYGOSITY AFTER A SINGLE HITCHHIKING EVENT UNDER THE INFLUENCE OF BACKGROUND SELECTION (MODEL 2)
We use the same three-locus model as above, but assume different mutational processes for the Fav and Neu loci. We are interested in transient patterns of heterozygosity at Neu at given time points after a hitchhiking event, rather than the stationary level of heterozygosity measured in model 1. We define time, T, as the number of generations before present (T = 0). One selected substitution, from b to B, at Fav occurs at a fixed time in the past (T = τ). It is assumed that the previous substitution at Fav took place very long ago such that the level of genetic variation at Neu has recovered to its equilibrium level before the selected substitution occurs at T = τ; i.e., we are analyzing the effect of a single hitchhiking event. At the Neu locus, mutant alleles, M, are introduced in the past at T = λ (0 < λ < ∞), such that they may occur at any generation with an equal rate, μ (per gene). Each mutant has a certain probability of still segregating at T = 0, thus contributing to heterozygosity at T = 0. Therefore, heterozygosity can be determined by adding up all the contributions made by the mutations in the past (Kimura 1969, Equation 5). This suggests a simulation procedure in which heterozygosity at T = 0 is measured by summing over all trajectories of M that occurred in the past. That is, in each simulation run we follow the trajectory of allele M and record its frequency at T = 0. By repeating this procedure, we obtain the distribution of the frequency of M and thus the average heterozygosity at T = 0. As in model 1, recurrent deleterious mutations occur at the Del locus.
Simulation methods: The recurrence equations and multinomial sampling method that were described in model 1 are used to simulate the dynamics of allele frequency changes. In the simplest way, model 2 can be simulated by introducing a neutral mutant, M, at T = λ, where λ is uniformly distributed between 0 and L (⪢N), and by recording its frequency at T = 0. If M is lost or fixed before T = 0, its frequency is recorded as zero. Thus, the frequency distribution of M at T = 0 can be obtained by repeating the above procedure many times. However, this straightforward simulation scheme is too time-consuming because the length of the time window, L, has to be set to a large value. To circumvent this problem, we use the following procedures, which allow us to keep L reasonably small.
For simplicity, consider first the model without background selection. We assume that, at the beginning of the time window (T = L), the population is in mutation-drift equilibrium such that the number and the frequency distribution of a segregating allele at Neu can be described by the standard neutral theory (Kimura 1983). Then, we introduce mutants at T = L with initial frequency i/2N (i = 1, … , 2N − 1), where the probability of frequency i/2N is proportional to 1/i. The rest of the mutants are introduced at T = λ with initial frequency 1/2N, where λ is uniformly distributed between 0 and L − 1. The ratio of mutants appearing at T = L and T = λ is adjusted such that the expected number of segregating sites is at equilibrium. For a given mutation rate (per generation per locus), μ, the probability that allele M is segregating at the Neu locus at T = L is given by θa2N, where θ = 4Nμ and
To incorporate background selection on Del, the same procedure is used but with the following changes. We assume that, when neutral mutants are introduced, the population is in a deleterious mutation-selection equilibrium. Therefore, the frequency of the deleterious allele (a), q, is set to u/t when each replicate of the simulation starts with a new introduction of M. One copy of M randomly associates with A or a at T = λ. The expected number of segregating sites at T = L is now ~4N2μa2K, where K is the closest integer to N2 = NfB(r1 + r2) (see Equation 1); the latter is the effective population size at Neu when background selection has been taken into account. δ is now given by 2N2a2K/(2N2a2K + NL). The initial frequency of M at T = L is i/2K (i = 1, … , 2K − 1), where the probability of frequency i/2K is proportional to 1/i. We assume no linkage disequilibrium between Del and Neu at T = L; thus the frequency of the AM haplotype, for example, is given by (1 − u/t)(i/2K). However, we used L > τ + 10^3, so that haplotype frequencies immediately before the hitchhiking event depend little on the initial frequencies.
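The choice of starting frequencies at T = L can be sketched as follows. This Python example is an illustration under stated assumptions rather than the authors' procedure: it draws the initial copy number i with probability proportional to 1/i and computes the AM haplotype frequency (1 − u/t)(i/2K). The function names, the value 2K = 1000, and the values of u and t are hypothetical, and the mixing proportion δ between mutants introduced at T = L and at T = λ is left out.

```python
import random

def sample_initial_count(two_k, rng):
    """Pick i in 1..2K-1 with probability proportional to 1/i."""
    counts = list(range(1, two_k))
    weights = [1.0 / i for i in counts]
    return rng.choices(counts, weights=weights, k=1)[0]

def initial_am_frequency(i, two_k, u, t):
    """AM haplotype frequency with no linkage disequilibrium: (1 - u/t)(i/2K)."""
    return (1.0 - u / t) * (i / two_k)

rng = random.Random(42)
two_k = 1000           # hypothetical value of 2K
u, t = 0.004, 0.02     # chosen so that u/t = 0.2
i = sample_initial_count(two_k, rng)
am = initial_am_frequency(i, two_k, u, t)

# Normalizing constant of the 1/i weights and the implied chance of a singleton.
normalizer = sum(1.0 / j for j in range(1, two_k))
p_singleton = 1.0 / normalizer
```

Because the weights fall off as 1/i, low-frequency starting states are favored, as expected for neutral alleles at mutation-drift equilibrium.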
Theoretical predictions: To predict average heterozygosity (π) at T = 0 for model 2, we used the same approach as for model 1 by modifying the effective population sizes at the linked loci due to background selection. A simple solution can be obtained as
Simulation results: We introduced 2 × 10^7 M alleles independently in a diploid population of size 2N = 10^5, as described above, and recorded the frequency of each allele at T = 0 (Figure 2 and Table 3). The frequency distributions of allele M at T = 0, before and after the hitchhiking event, are shown in Figure 2. Comparing the observed and expected distributions before hitchhiking, it is clearly seen that background selection does not change the shape of the frequency distribution of a linked neutral locus. No significant excess of low-frequency alleles was observed. Immediately after the hitchhiking event, however, the number of intermediate-frequency alleles was greatly reduced, and that of high-frequency alleles increased significantly. This effect was previously observed by J. Fay and C.-I Wu (personal communication). We further discuss this observation below.
Allele frequency distribution of M before and after a hitchhiking event. Shaded bars represent the frequency data obtained at T = 0 in simulation 3-2 of Table 3 (before hitchhiking). Small squares connected by lines show the expected number of M's segregating in each frequency interval. The expected number of M's segregating in the frequency interval (y, y + dy) is assumed to be θ/ydy, where θ = 4N2μ. μ is determined by simulation, as explained in the text. Then, the expected number of M's in a frequency interval (i/20, (i + 1)/20], i = 0, … , 19, is given by . Solid bars represent the data obtained in simulation 3-3 (immediately after hitchhiking).
Table 3 summarizes the simulation results for heterozygosity (π), observed immediately after the hitchhiking event (except 2-2). Predicted values (using Equation 13) agree well with the simulation results. Figure 3A also shows that (13) accurately predicts heterozygosities at various time points after the hitchhiking event. For τ = 1, the average reduction of heterozygosity by a hitchhiking event, predicted by h (Equation 3), simply corresponds to π/θ. π/θ does not change significantly from 2-1 (0.48) to 2-3 (0.48) or from 2-4 (0.065) to 2-5 (0.067), which implies that the reduction of the effective population size at Fav by background selection does not weaken the effect of a single hitchhiking event, at least when s = t. This result differs from model 1, where the effect of hitchhiking decreased as the effect of background selection increased. This discrepancy is caused by the fact that N2 and Φ are significantly reduced by background selection but h is relatively insensitive to changes of α1 = 2N1s (discussed above).
Table 3. Results for model 2 (gene order: Del-Fav-Neu)
Figure 3. Changes of heterozygosity, homozygosity, and fixation rates at Neu over time after a hitchhiking event. The parameters are 2N = 10^5, L = 15,000, τ = 10,000, s = t = 0.02, u/t = 0.2, r1 = 10^-3, r2 = 10^-4, and number of M's introduced = 10^8. (A) In addition to T = 0, the frequency of M was recorded every 500 generations. Mean heterozygosity and homozygosity were calculated at each generation. Solid squares represent observed heterozygosities. Lines were drawn for the expected values of heterozygosity using Equation 13. Shaded squares represent homozygosity. (B) Whenever the M allele is fixed, the time of this event was recorded. The histogram shows the number of fixation events at each time interval. The interval between 0 and 0.2N generations after the hitchhiking includes the fixation events occurring during the substitution of B.
We also investigated the perturbation of the mutation-selection balance at Del during the substitution of B at Fav and its effect on heterozygosity after the fixation of B. It was previously observed that the frequency of the deleterious allele, pa, deviates most from its equilibrium value, u/t, when the increase of the frequency of B, pB, is greatest. pa returns toward u/t after pB exceeds 0.5 (data not shown). We thus recorded pa in the first generation after pB became >0.5 during the substitution process of B. When s ≤ t, pa remained very close to u/t. When s = 0.02 and t = 0.005 (2-7 and 2-8), the mean and standard deviation of pa increased significantly, as expected. However, the observed and expected values of π still agreed very well. To further investigate the change of pa, we conducted additional simulations in which B was introduced in initial linkage either with A or with a (2-7a and 2-7b and 2-8a and 2-8b). The initial linkage with A did not change pa significantly. However, the linkage with a greatly elevated pa. Surprisingly, mean heterozygosities after the hitchhiking event were relatively close to each other despite a large difference in pa during the selective phase, although small increases in heterozygosity for the case of initial linkage with a were observed (2-7b and 2-8b). If the increase of the deleterious allele frequency caused the reduction of effective population size at linked loci, it would have resulted in a lower heterozygosity after the fixation of B. However, the result is in the opposite direction. One might argue that the reduction of effective population size will weaken the strength of directional selection and thus result in higher values of π. However, it was shown above that a decrease of α1 does not change h significantly (2-4 and 2-5).
Therefore, the agreement of the observed value of heterozygosity with its prediction needs to be further explored, as the underlying assumption—constant pa during the selective phase—is violated (see discussion).
Finally, we investigated the increase of homozygosity of derived neutral alleles after a hitchhiking event. We observed the level of homozygosity of M at various time points before and after the hitchhiking event (Figure 3A). Homozygosity increased sharply immediately after hitchhiking. However, it dropped quickly and, after a short time (<0.5N generations), the homozygosity/heterozygosity ratio decreased below its standing level before the hitchhiking event. When we decreased the strength of the hitchhiking effect by increasing r2 from 10^-4 to 10^-3, the immediate increase of homozygosity was smaller than shown in Figure 3A, but the decrease of homozygosity over time was slower than in Figure 3A (data not shown). The rapid change of homozygosity over time implies that the high-frequency alleles produced by hitchhiking are quickly fixed in the population. We confirmed this by recording fixation events of allele M over time (Figure 3B). There was a great increase of fixation events during and shortly after the substitution of B. As hitchhiking events cannot change the average substitution rate of neutral alleles (Birky and Walsh 1988), the transient increase of the fixation rate should be followed by a period of a low fixation rate. Indeed, we observed a reduction of fixation events following the period of a high fixation rate (Figure 3B). The same pattern was observed when we replaced the hitchhiking event with a population bottleneck in the simulation (data not shown).
DISCUSSION
We demonstrated by simulations that formulas for background selection and hitchhiking can be combined to predict genetic variation at a linked neutral locus, despite the fact that these processes may interfere with one another. Analytic solutions previously known for the hitchhiking effect agreed well with our simulation results when effective population size and the fixation probability of the selected allele were modified by background selection. Two different simulation procedures were used in models 1 and 2. In these models, background selection occurs at one locus, as if deleterious mutations distributed over an entire chromosome were collapsed into a single locus. Therefore, the results obtained in this study might not be directly applicable to the realistic situation where background selection results from deleterious mutations at many loci. However, since it was shown that the effects of deleterious mutations at two loci combine multiplicatively to reduce genetic variation (Hudson and Kaplan 1995; Nordborg et al. 1996) and the fixation probability of a favorable allele (Barton 1995), background selection at many loci is not likely to change the overall results of this study.
Our simulation results indicated that (5) and (13) are approximately correct even if the frequency, pa, of the deleterious allele deviates from its equilibrium value due to strong directional selection at linked loci. In simulations 2-7b and 2-8b, we expected further reduction of heterozygosity at T = 0 because pa increased significantly during the selective phase, which might mean a further reduction of effective population size. However, the following argument shows that an increase of pa does not necessarily imply a decrease of effective population size. Hudson and Kaplan (1995) explained how background selection can reduce effective population size and thus the size of gene genealogies. In a population in mutation-selection balance, two ancestral genes can have a common ancestor in the previous generation only if two genes have the same number of deleterious alleles at linked loci. As time runs backward, ancestral genes are preferentially found in chromosomes with no deleterious alleles at linked loci because chromosomes carrying deleterious alleles have a small probability of having descendants. Therefore, the rate at which gene lineages coalesce increases as the number of chromosomes with no deleterious alleles decreases. However, if a favorable allele B that was initially linked with deleterious alleles goes to fixation (simulations 2-7b and 2-8b), some ancestral genes must be found on chromosomes with deleterious allele a during the selective phase. This is because all the descendants at Neu after fixation should be in linkage with allele B, which was initially in linkage with a. The association between B and a decays by recombination as time goes backward to the early stage of the selective phase. Therefore, the increase of pa in the middle of the selective phases in 2-7b and 2-8b may not affect effective population size as the ancestral genes are found on chromosomes with a as well as those without a. 
Slight increases of heterozygosities in 2-7b and 2-8b indicate that this effect slightly increases, rather than decreases, effective population size.
The generalization of model 1 leading to (6) describes the overall relationship between recombination rate and genetic variation. Equation 6 can be used to estimate the parameters of background selection and/or hitchhiking in natural populations. The intensity of a selective sweep, αν (where α = 2Ns and ν is the number of strongly selected substitutions per nucleotide per generation), in D. melanogaster populations has previously been estimated without incorporating background selection (Wiehe and Stephan 1993; Stephan 1995). Therefore, this method was thought to have overestimated the effect of hitchhiking. However, (6) suggests that, in regions of very low recombination, the reduction of heterozygosity is mainly determined by hitchhiking unless the effect of background selection is extremely strong. This condition is likely to be met in D. melanogaster populations for the following reason. Charlesworth (1996) predicted a pattern of genetic variation across the D. melanogaster genome using a per-haploid-genome mutation rate of 0.48, which was obtained from the mutation accumulation studies by Mukai et al. (1974) and Ohnishi (1977). As a result, the expected level of heterozygosity was very close to the observed level of nucleotide diversity. However, recent surveys of the rates and effects of deleterious mutations in D. melanogaster suggest about fivefold lower values of the mutation rate (Keightley and Eyre-Walker 1999). If true, the expected level of heterozygosity explained by background selection should be significantly higher than that obtained by Charlesworth (1996) (see Equation 10), and the remaining reduction of heterozygosity should be explained by hitchhiking.
As the value of αν was mainly determined by loci in regions of low recombination in which the relationship between nucleotide diversity and recombination is approximately linear, the estimation of Wiehe and Stephan (1993) and Stephan (1995) appears to be valid, but the interpretation of their results has to take into account that ν now depends on background selection (see Equation 6).
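The sensitivity of the background-selection prediction to the deleterious mutation rate can be illustrated with a minimal sketch. It does not use Equation 10 of this article; instead it assumes the classical no-recombination approximation π/π0 ≈ exp(−u/t) (Charlesworth et al. 1993), where u is the haploid deleterious mutation rate of the nonrecombining region and t is the selection coefficient against heterozygotes. The value t = 0.02 and the 5% share of the genome assigned to the region are hypothetical choices made only for illustration.

```python
import math

def relative_diversity_no_recombination(u, t):
    """Classical background-selection approximation for a nonrecombining
    region: pi/pi0 = exp(-u/t).

    u: haploid deleterious mutation rate of the region (per generation)
    t: selection coefficient against heterozygous carriers
    """
    if t <= 0:
        raise ValueError("t must be positive")
    return math.exp(-u / t)

# Assumed, illustrative values (not taken from the article's fits):
GENOME_U_OLD = 0.48              # per-haploid-genome rate used by Charlesworth (1996)
GENOME_U_NEW = GENOME_U_OLD / 5  # about fivefold lower, as discussed in the text
REGION_SHARE = 0.05              # hypothetical fraction of the genome in the region
T_HET = 0.02                     # hypothetical heterozygous selection coefficient

ratio_old = relative_diversity_no_recombination(GENOME_U_OLD * REGION_SHARE, T_HET)
ratio_new = relative_diversity_no_recombination(GENOME_U_NEW * REGION_SHARE, T_HET)

print(f"pi/pi0 with the higher mutation rate: {ratio_old:.3f}")
print(f"pi/pi0 with the fivefold lower rate:  {ratio_new:.3f}")
```

Under these assumed values the expected relative diversity rises from roughly 0.30 to roughly 0.79, which is the direction of the argument above: a lower deleterious mutation rate leaves more of the observed reduction to be explained by hitchhiking.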
Figure 4. Heterozygosities over a physical distance. Graphs were produced from Equation 13. Distance is defined to be zero at the location of Fav. Recombination rates are assumed to follow Haldane's map function with ρ = 10⁻⁹. The effect of background selection is uniform over this region, with N2 = 10⁶ and μ = 10⁻⁹.
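The conversion from physical distance to recombination fraction used for such graphs can be sketched as follows. Haldane's map function gives r = (1 − exp(−2d))/2 for a map distance d in Morgans; the sketch assumes that d is simply ρ times the distance in nucleotides, with ρ = 10⁻⁹ as in the caption. The function name and the example distances are illustrative and not part of the original analysis.

```python
import math

def haldane_recombination_fraction(distance_bp, rho=1e-9):
    """Recombination fraction between two sites under Haldane's map function.

    distance_bp: physical distance from the selected site (nucleotides)
    rho: recombination rate per nucleotide per generation (assumed uniform)
    """
    map_distance = rho * abs(distance_bp)  # expected number of crossovers (Morgans)
    # r = (1 - exp(-2d)) / 2; expm1 keeps precision when d is very small
    return -0.5 * math.expm1(-2.0 * map_distance)

if __name__ == "__main__":
    for x in (1e3, 1e5, 1e7, 1e9):
        r = haldane_recombination_fraction(x)
        print(f"distance = {x:.0e} bp, r = {r:.6g}")
```

For short distances r is nearly equal to ρ times the distance, and it approaches 1/2 as the distance grows, so the function can be passed directly to any expression that takes a recombination fraction as input.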
Our results can also be used to interpret recent observations of genetic variation on the Y chromosomes of D. melanogaster and D. simulans. Zurovcova and Eanes (1999) reported strongly reduced nucleotide diversity in the dynein gene Dhc-Yh3. Background selection is unlikely to explain this result since the Y chromosome encodes only six known genes and, therefore, background selection is probably not very strong. However, according to Equation 6, hitchhiking, even if it is very rare, is consistent with the observed extreme reduction of diversity on the nonrecombining Y chromosome.
Separation of the product αν into its components, i.e., the rate and the strength of directional selection, requires the measurement of the time between consecutive hitchhiking events or the selection coefficients of selected alleles. Equation 13 suggests that τ and s cannot be estimated separately from levels of nucleotide diversity, even if θ for the region is known. A possible solution is to find a local reduction of heterozygosity in a chromosomal region as a result of a single hitchhiking event. Figure 4 shows that combinations of τ and s produce unique patterns of expected heterozygosity over a physical distance. Therefore, a joint estimation of τ and s could be made by fitting (13) to multilocus polymorphism data in a chromosomal region. This approach will be useful in regions of high recombination, where the local reduction spans a relatively short distance and the estimate of θ can be obtained from adjacent regions that are assumed to be close to the equilibrium level of heterozygosity.
Acknowledgments
We thank two reviewers and Bruce Walsh for valuable comments on the manuscript. This work was supported in part by National Science Foundation grant DEB-9896179 and by funds from the University of Rochester to W.S., and by an Ernst Caspari fellowship to Y.K.
APPENDIX A
Eight haplotype frequencies (x1, x2, … , x8) representing ABM, ABm, AbM, Abm, aBM, aBm, abM, and abm, respectively, change deterministically by selection, recombination, and deleterious mutations to (x1′, x2′, … , x8′). Equations for recombination are easily derived from a table of random matings where double recombination events are ignored.
1. Selection:
2. Recombination:
3. Mutation:
APPENDIX B
In the simulation of model 2, one mutant, M, is introduced at T = L with probability δ or at T = λ with probability 1 − δ, as described above. Its contribution to heterozygosity at T = 0 is 2y(1 − y), where y is the frequency of M at T = 0. The simulation measures the average value of this contribution by introducing many M's independently. Mean heterozygosity contributed by one M, π*, is predicted to be
Footnotes
- Communicating editor: J. B. Walsh
- Received December 7, 1999.
- Accepted March 20, 2000.
- Copyright © 2000 by the Genetics Society of America