## Abstract

A central problem in population genetics is to detect and analyze positive natural selection by which beneficial mutations are driven to fixation. The hitchhiking effect of a rapidly spreading beneficial mutation, which results in local removal of standing genetic variation, allows such an analysis using DNA sequence polymorphism. However, the current mathematical theory that predicts the pattern of genetic hitchhiking relies on the assumption that a beneficial mutation increases to a high frequency in a single random-mating population, which is certainly violated in reality. Individuals in natural populations are distributed over a geographic space. The spread of a beneficial allele can be delayed by limited migration of individuals over the space and its hitchhiking effect can also be affected. To study this effect of geographic structure on genetic hitchhiking, we analyze a simple model of directional selection in a subdivided population. In contrast to previous studies on hitchhiking in subdivided populations, we mainly investigate the range of sufficiently high migration rates that would homogenize genetic variation at neutral loci. We provide a heuristic mathematical analysis that describes how the genealogical structure at a neutral locus linked to the locus under selection is expected to change in a population divided into two demes. Our results indicate that the overall strength of genetic hitchhiking—the degree to which expected heterozygosity decreases—is diminished by population subdivision, mainly because opportunity for the breakdown of hitchhiking by recombination increases as the spread of the beneficial mutation across demes is delayed when migration rate is much smaller than the strength of selection. Furthermore, the amount of genetic variation after a selective sweep is expected to be unequal over demes: a greater reduction in expected heterozygosity occurs in the subpopulation from which the beneficial mutation originates than in its neighboring subpopulations. This raises a possibility of detecting a “hidden” geographic structure of population by carefully analyzing the pattern of a selective sweep.

WHEN a beneficial mutation arises in a population and rapidly increases to high frequency by directional selection, it also increases the frequency of neutral alleles on the same chromosome at linked polymorphic loci, resulting in a sudden reduction in genetic variation. This effect, termed genetic hitchhiking (Maynard Smith and Haigh 1974) or selective sweep, provides a powerful means to identify and study recent episodes of adaptive evolution (reviewed in Nielsen 2005; Sabeti *et al.* 2006; Thornton *et al.* 2007; Akey 2009; Stephan 2010). Numerous studies advanced the mathematical model of this evolutionary process and now provide accurate theoretical predictions and tools for genomic data analyses (Maynard Smith and Haigh 1974; Kaplan *et al.* 1989; Stephan *et al.* 1992; Fay and Wu 2000; Kim and Stephan 2002; Hermisson and Pennings 2005; Etheridge *et al.* 2006). However, major theoretical results were obtained from models that consider the spread of a beneficial mutation in a single random-mating population.

Natural populations are composed of individuals that are distributed over geographical space. Therefore, with limited migration, mating occurs more frequently among individuals that are close to each other than among those that are far apart on the space. This geographic structure is often reduced to a simple demographic model in which a number of small populations (or demes), each of which is panmictic, are connected to each other by limited migration (Wright 1940). In this model, a proportion *m* of individuals in a deme is replaced by migrants from other demes each generation. A fundamental result of spatial population genetics is that, if *m* is sufficiently large so that *Nm* > 1 where *N* is the population size of a deme, neutral genetic variation is homogenized over the entire system of connected demes at equilibrium under mutation–drift–migration balance (Slatkin 1987). Therefore, even when a population has a clear geographic structure (*m* << 0.5), its pattern of polymorphism at many neutral markers may not deviate from that under panmixia. In this case, modeling a natural population to be panmictic would not cause much problem.

However, if an evolutionary process occurs at a timescale that is much shorter than that of the neutral coalescent, limited migration may lead to a heterogeneous footprint of the process over geographic space. The spread of a beneficial mutation is a case of such fast evolutionary change. Let the frequency of a new beneficial mutation initially increase by a factor of 1 + *s* per generation in a local deme. Then, its spread to the entire population will be limited by migration if *m* < *s*, regardless of the value of *Nm*. Namely, the geographical structure of natural populations is expected to cause the allele frequency trajectory of a beneficial allele to deviate from that in a panmictic population. Then, the hitchhiking effect of the beneficial mutation spreading over a subdivided population may be different from that in a panmictic population. For example, Barton (2000) predicted that a beneficial mutation spreading over a geographic distance will produce a weaker hitchhiking effect because the time taken for the mutation to reach fixation is longer compared to the case of panmixia.

The theory of genetic hitchhiking in a subdivided population was developed in previous studies, notably by Slatkin and Wiehe (1998) and Santiago and Caballero (2005). They mainly analyzed the model in which subpopulations are isolated with weak migration (*Nm* << 1), under which a beneficial mutation goes to fixation in one population before it starts to increase in the next population (*i.e.*, the locus under selection is polymorphic in one deme only at a given time). Their results indicated that, if neutral variation near the target of selection is initially homogeneous among subpopulations, the sequential fixations of the beneficial allele may create genetic differentiation, because occasional recombination may cause different neutral alleles at a locus to hitchhike in different subpopulations. Therefore, Wright’s *F*_{ST} would increase from a small to an intermediate value by hitchhiking (Slatkin and Wiehe 1998; Bierne 2010). On the other hand, if subpopulations are highly differentiated initially, the fixation of a beneficial allele in the entire population reduces the level of differentiation at linked neutral loci, because common neutral alleles are likely to hitchhike to the beneficial mutation with limited recombination. Therefore, *F*_{ST} decreases from a high to an intermediate value (Santiago and Caballero 2005). These theoretical studies helped interpret the geographic pattern of neutral polymorphism shaped by natural selection in a highly structured species such as *Drosophila ananassae* (Stephan *et al.* 1998; Baines *et al.* 2004; Das *et al.* 2004) and others (Faure *et al.* 2008).

In this study, we are interested in a subdivided population with more frequent migration (*Nm* > 1), under which the pattern of long-term neutral variation would be similar to that under a panmictic population and, more importantly, the frequency of a beneficial allele may be rising concurrently, if not in complete synchrony, in multiple demes. This biological condition is important because, as explained above, natural populations have a geographic structure that can modify the frequency trajectories of beneficial mutations over space while this structure is often undetected at the neutral loci (thus the genomic average pattern of variation) as *Nm* > 1. We are investigating the effect of this hidden geographic structure on the pattern of selective sweeps. Our analysis begins with a simple model in which two equal-sized panmictic populations are connected by limited migration and a beneficial mutation occurs in one deme and then spreads to the entire population. Even with this simple model, we could provide only heuristic mathematical analysis based on numerous simplifying assumptions and then compare the solution with computer simulations. However, our limited results still clearly demonstrate the important effect of population subdivision on the pattern of genetic hitchhiking.

## Model

A brief overview of the model of selective sweep in a subdivided population, in comparison with that in a panmictic population, is presented in Figure 1. A positively selected allele, *B*, rapidly spreads in the population in association with a particular neutral “hitchhiker” allele at a linked locus. Such an association is broken down by recombination events, which allow the residual levels of polymorphism after the fixation of the *B* allele. Key differences in the process of genetic hitchhiking between panmictic and subdivided populations are identified in Figure 1. First, it takes longer for *B* to be fixed in a subdivided population because the spread of *B* in the second deme starts only after *B* has increased to a sufficiently high frequency in the first deme. Second, in the panmictic population, opportunity for the recombination breakdown of allelic association monotonically decreases while the frequency of *B* increases. On the other hand, such recombination occurs as a two-step process in the subdivided population, first in deme 1 and later in deme 2. This may effectively increase the overall rate of recombination breakdown and thus weaker hitchhiking. Importantly, depending on which *B*-bearing chromosomes migrate and increase under positive selection in the second deme, the same or different alleles at the neutral locus may be hitchhiked to high frequency in the two demes.

In our model of a subdivided population, 2*NK* haploid individuals are subdivided equally into *K* demes. Unless stated otherwise, demes are structured according to the circular stepping-stone model if *K* > 2. Demes are indexed by 1 to *K*, indicating their spatial order. Demes 1 and *K* are neighboring each other, thus making a circular population structure. Generations are not overlapping and, in each generation, haploids reproduce in the order of selection, recombination, and migration. During migration, the proportion *m* of haploids in deme *j* moves to neighboring demes (*m*/2 to deme *j* − 1 and *m*/2 to deme *j* + 1). Note that for *K* = 2, two demes exchange 2*Nm* migrants per generation. For modeling the hitchhiking effect, we consider two biallelic loci—selected and neutral loci that are partially linked with the probability of recombination *r* per generation. At the selected locus, mutation from the wild-type allele *b* to a beneficial allele *B*, with selective advantage *s*, arises on a particular chromosome in deme 1. At the time of this mutation, each of *K* demes is polymorphic at the neutral locus, with *A* and *a* alleles in frequencies *p*_{0} and 1 − *p*_{0}, respectively. After genetic hitchhiking, the frequency of *A* in deme *j* changes to *p _{j}* and

*p*= ∑

*/*

_{j}p_{j}*K*. We are mainly concerned about the change in heterozygosity in the entire population (from

A forward-in-time simulation is built directly on this discrete-time genetic model. At each generation, haplotype frequencies at *K* demes are changed deterministically by selection, recombination, and migration, followed by the step of random sampling that uses a random binominal number generator (Kim and Wiehe 2009). The initial allele frequency at the neutral locus is given as a fixed value (0.2) for all demes. We also specified initial frequencies for *K* demes sampled from equilibrium distribution at the balance of mutation, migration, and drift (obtained by separate forward-in-time simulations) but this did not yield different outcomes when the hitchhiking effect was measured by

## Migration-Limited Trajectory of Beneficial Mutation

The frequency of beneficial mutation rapidly increases within a deme by positive selection. However, its spread into the entire population might be limited by the rate of migration between demes. Namely, there might be a “delay” in the fixation of beneficial mutation in a subdivided population compared to the panmictic population of equal size. To analyze the hitchhiking effect in this model, we first need to understand how much delay in the spread of beneficial mutation is caused by geographical structure of the population. We first consider the case of *K* = 2. It is assumed that the beneficial mutation arising in deme 1 is eventually fixed in both demes 1 and 2. Let *X _{j}*(

*T*) be the frequency of allele

*B*in deme

*j*at time

*T*, when time is counted forward in generations and

*T*= 0 when the mutation to allele

*B*happened in deme 1. Then, we define

*B*that survives loss by genetic drift is established in deme

*j*. We define the delay in the spread of allele

*B*by

*m*<<

*s*,

*B*allele by genetic drift but later receive the allele from deme 2. In this case the roles of deme 1 and 2 are reversed and the delay becomes

The expected value of δ can be solved by approximating *X*_{1}(*T*) by the deterministic trajectory of a beneficial mutation, starting from frequency ε (<<1), in a single panmictic population of size 2*N*. It is therefore assumed that migration between two populations does not affect the trajectory, which might be justified for *m* < *s*. Therefore, we define*et al.* 1992). In each generation, on average 2*NmX*_{1}(*T*) copies of allele *B* enter deme 2 from deme 1. Each copy of *B* is lost by genetic drift with probability 1 − 2*s*, approximately, assuming *s* << 1. At least one copy of *B* successfully establishes in deme 2 at time *T* with probability *n*, and making a Taylor series approximation)

An essentially identical result was obtained by Slatkin (1976). With 4*Nm* ≫1, *e ^{sT}*)

^{4}

*≈ 1 + 4*

^{Nm}*Nm*ε

*e*. Then, a simpler approximation (however, an overestimate of

^{sT}Considering that the allele frequency of a beneficial mutation starting from one copy is elevated by the inverse of its fixation probability (*i.e.*, probability of surviving extinction by genetic drift ≈ 2*s*) relative to its deterministic trajectory, we may use ε = 1/(2*N*)/(2*s*) = 1/(4*Ns*) to maximize the fit of Equation 1 to the stochastic trajectory of allele *B* (Maynard Smith 1971; Kim and Nielsen 2004). Then, we obtain the approximation of the mean delay time*s*) as the duration of the fixation process (*s* and *m*. Figure 2 shows the comparison of these approximations to the delay time observed in frequency-based stochastic simulations.

## Hitchhiking Effect of the Beneficial Mutation—“Marked” Coalescent

Next, we analyze the hitchhiking effect of a beneficial mutation that spreads across demes in the manner described above. Our goal is to obtain an approximate solution for the change of heterozygosity due to hitchhiking. The analysis is based on modeling genetic hitchhiking by the “marked” coalescent, which is derived from the structural coalescent model of genetic hitchhiking (Kaplan *et al.* 1989). In the structural coalescent model, gene lineages at a neutral locus traced from a present-day sample into the past are described to jump between two genetic backgrounds (corresponding to beneficial, *B*, and ancestral, *b*, alleles) by recombination, while coalescence among lineages is allowed only when they are on the same background. It can be shown that, during the selective phase (the period between the birth and fixation of the *B* allele), lineages that arrive in the *b* background rarely move back to the *B* background and also rarely coalesce to each other (Kaplan *et al.* 1989; Durrett and Schweinsberg 2004; Etheridge *et al.* 2006; Pfaffelhuber *et al.* 2006). If we thus model that lineages moving from the *B* to the *b* background by recombination remain distinct until they exit the selective phase, the genealogy at the neutral locus is fully described by events of such recombination mapped along the genealogy at the selected locus (referred to below as the *B* genealogy). Therefore, one may first obtain a *B* genealogy and then calculate how often it is marked by recombination events to predict the pattern of genetic variation at a linked neutral locus (Pfaffelhuber *et al.* 2006). Here we first show how a marked coalescent allows a simple derivation of the standard result of genetic hitchhiking [hard selective sweep in a single random mating population (Maynard Smith and Haigh 1974)].

As we are interested in obtaining the heterozygosity after the fixation of the beneficial mutation, we obtain the *B* genealogy starting from two distinct beneficial mutations on sampled chromosomes. We consider a single panmictic population of size 2*N* in which the beneficial mutation is quasi-fixed with frequency 1 − ε. Time, *t*, is now counted backward in generations, *t* = 0 being the present (time of sampling). Then, the frequency of beneficial mutation, *B*, in the population is modeled to decrease from 1 − ε to ε by*B* allele is introduced in the population. The lineages of the two *B* alleles will coalesce at *t* = *t*_{C} (≤τ). While tracing each lineage of *B*, the associated allele at the neutral locus may recombine onto a chromosome carrying allele *b* at some time between *t* = 0 and *t*_{C}. The probability that this event happens at time *t*, given that it did not happen in the previous *t* − 1 generations, is (1 − *x*(*t*))*r*, which is an increasing function with *t*. If no recombination event occurs on either lineage, the two linked neutral lineages will coalesce at *t* = *t*_{C}, which happens with probability *P*_{coal},*f _{c}*(

*t*) is the probability distribution of the coalescent time at the selected locus. It is given approximately by

*f*(

_{c}*t*) to be 1 for

*t*= τ and 0 for

*t*< τ − 1: the coalescent occurs at the time of beneficial mutation. This is equivalent to ignoring genetic drift within the allelic class of

*B*(Maynard Smith and Haigh 1974; Barton 1998; Nielsen

*et al.*2005). Then,

Ignoring the probability of mutation in the period between *t* = 0 and τ, the two neutral alleles in the sample will be observed different only if their ancestors at *t* = τ are distinct, which happens with probability 1 − *P*_{coal}, and different. Let *t* = τ. Then, the mean heterozygosity at the neutral locus at present is given approximately by

Using ε = 1/(2*N*), we obtain the solution of Maynard Smith and Haigh (1974). As explained above, a better solution that corrects for the stochastic effect of conditioning on the fixation of beneficial mutation is obtained by using ε = 1/(4*Ns*), which agrees reasonably well with stochastic simulation results (Kim and Nielsen 2004). More accurate solutions available so far effectively take the full distribution of the coalescent time (Equation 8) (Stephan *et al.* 1992; Barton 1998) and furthermore the full stochastic trajectory of beneficial mutation (Barton 1998; Etheridge *et al.* 2006; Pfaffelhuber *et al.* 2006) into account. However, in this study, we aim to derive solutions for the accuracy of simple approximation *i.e.*, ignoring genetic drift within the subpopulation of beneficial alleles, as our primary aim is to examine the relative strengths of hitchhiking in panmictic *vs.* subdivided populations.

## Hitchhiking Effect in a Subdivided Population, *K* = 2

Next, we come back to the model of a subdivided population with *K* = 2 in which a beneficial mutation, *B*, arising first in deme 1, propagates into deme 2 with a delay of δ generations. We assume that *m* < *s* << 1 but 4*Nm* >> 1. With this migration rate, the pattern of variation at a neutral locus without selection is close to that of neutral equilibrium in a panmictic population. We thus assume that the heterozygosity at the neutral locus immediately before the time of beneficial mutation (*t* = δ + τ as defined below) is given by *B* is fixed in the entire population. We denote them as *H*^{(11)}, *H*^{(22)}, and *H*^{(12)}, respectively.

Again, time is counted backward from the present (*t* = 0), at which the allele *B* is fixed in the entire population and two haploids are sampled randomly. Assuming migration is weaker than selection (once a copy of *B* enters a deme, its frequency increases mostly due to selection without being affected by continuous in- and outflow of *B*), we may model the trajectory of allele *B* frequency in deme 1 and deme 2 by*B*, its hitchhiking effect on the neutral locus is determined by first obtaining the coalescent tree at the selected locus and then marking the recombination events on this tree. Below, approximations are obtained for strong selection relative to migration. The lineage of a copy of a *B* allele (a “*B* lineage”) in deme 1 will jump to deme 2 with probability ∼2*Nmx*_{2}(*t* + 1)/(2*Nx*_{1}(*t*)) ≈ *mx*_{2}(*t*)/*x*_{1}(*t*) and that in deme 2 will jump to deme 1 with probability ∼*mx*_{1}(*t*)/*x*_{2}(*t*). Then, if selection is strong such that *m* << 1/τ (but still *Nm* > 1), migrations of *B* lineages in either direction will be rare until *x*_{2}(*t*) becomes close to zero, which drastically increases the migration rate from deme 2 to deme 1. On the other hand, if the strength of selection is moderate such that *m* > 1/τ ≈ *s*/log[4*Ns*], there might be multiple jumps of *B* lineages in both directions between times 0 and τ. To derive simple approximations, we mainly consider the range of large *s*/*m* (or *m* ≪ 1/τ). The applicability of our approximations to a wider parameter range is examined by computer simulations. We also provide an alternative derivation for *m* > 1/τ in the *Appendix*.

### i. Two gene copies from deme 2

As will be demonstrated shortly, the effect of population subdivision on the hitchhiking effect, examined by relative reduction in expected heterozygosity, may be most pronounced in deme 2 (*i.e.*, effect on *B* genealogy” is determined as we trace the lineages of two *B* alleles on these chromosomes backward in time until they coalesce. Two mutually exclusive genealogical events occur. One event is the separate migrations of two lineages, from deme 2 and deme 1, followed by the coalescence in deme 1. The other event is the coalescence of two lineages in deme 2, followed by the migration of the common ancestor into deme 1. As *m* ≪ 1/τ, we ignore the possibility that a *B* lineage migrates to deme 1 and then later migrates back to deme 2 and coalesces. Therefore, once any of these two lineages migrates to deme 1, the coalescence of the two lineages should happen in deme 1. The coalescence in deme 1 must occur before *t* = τ + δ and that in deme 2 before *t* = τ . The probabilities of these two events are given by 1 − *Q* and *Q*, respectively.

Again, the probability that a given lineage in deme 2 migrates to deme 1 is ∼*mx*_{1}(*t*)/*x*_{2}(*t*). The probability that two lineages at deme 2, which remained distinct until time *t* − 1, coalesce at time *t* is 1/(2*Nx*_{2}(*t*)). Then, the probability that the coalescent event occurs at deme 2 at time *t* is given by

The total probability of the coalescence in deme 2 is therefore

With probability 1 − *Q*, the coalescence of *B* lineages occurs at deme 1. In this case, it is modeled that one *B* lineage first migrates to deme 1 at time *t _{m}*

_{1}and the other later at

*t*

_{m}_{2}(

*t*

_{m}_{1}<

*t*

_{m}_{2}). The probability distribution of

*t*

_{m}_{1}is obtained using a similar approximation as

Figure 4 shows that the peaks of *f*_{2}* _{c}*(

*t*) and

*f*

_{m}_{1}(

*t*) both occur between τ/2 and τ. It is because both probabilities for the coalescent and migration increase substantially when

*x*

_{2}(

*t*) becomes small. Given the first migration occurs at

*t*

_{m}_{1},

*t*

_{m}_{2}should be distributed between

*t*

_{m}_{1}and τ. However, as

*m*≪ 1/τ we may assume that

*t*

_{m}_{2}is simply equal to τ (the remaining

*B*lineage traces back to the first

*B*allele that, forward in time, entered deme 2 and survived stochastic loss). As demonstrated below, this simplification leads to an overestimation of heterozygosity at the linked neutral locus. Furthermore, we make another simplifying assumption that two

*B*lineages now in deme 1 coalesce at time

*t*= τ + δ, which also leads to an overestimation of heterozygosity at the neutral locus by the same degree that Equation 10 overestimates the heterozygosity in the standard (panmixia) model of genetic hitchhiking.

Heterozygosity at the neutral locus is determined by the probability that the *B* genealogy described above is marked by a recombination event by which a linked neutral lineage moves to a chromosome carrying allele *b*. The probability of this recombination event at time *t* is (1 − *x*_{1}(*t*))*r* if the lineage is in deme 1 and (1 − *x*_{2}(*t*))*r* if the lineage is in deme 2. Therefore, no recombination event is marked on a lineage that remains in deme *i* from time *t*_{1} to *t*_{2} with probability *B* lineages coalesce at deme 1 (with probability *i.e.*, neutral lineages coalesce at the same time *B* lineages coalesce) is approximated by

If δ is smaller than τ/2 (*i.e.*,

If *t _{m}*

_{1}≈ τ (for very small

*m*),

*B*genealogy (= δ) due to migration. {Note that more accurate comparison between panmictic

*vs.*subdivided populations should take the difference in ε [

*e.g.*, 1/(8

*Ns*)

*vs.*1/(4

*Ns*), respectively, for a total population size of 4

*N*] into account. Therefore, the comparison in this case should be made between 1 − (ε/2)

^{2}

^{r}^{/}

*and*

^{s}If δ >> τ [*x*_{1}(*r*) ≈ 1], Equation 16 is similarly approximated to ε^{4}^{r}^{/}* ^{s}*, which basically means a twofold increase in the neutral lineages’ opportunity to dissociate from

*B*lineages. However, this radical effect on genetic hitchhiking may not happen frequently because δ ≫ τ requires a very low migration rate, which causes

*B*lineages to coalesce in deme 2 rather than in deme 1.

If *B* lineages coalesce in deme 2, the probability that this genealogy is marked by zero recombination events is approximated by^{2}^{r}^{/}* ^{s}* as the time of the coalescent is smaller than τ. Finally, the hitchhiking effect on the heterozygosity for two gene copies sampled in deme 2 is given by

Figures 5 and 6 show the comparison between this approximation and results from frequency-based simulation. As we took steps in simplifying formulas that consistently lead to the overestimation of relative heterozygosity after a selective sweep, our approximation predicts greater elevation of heterozygosity (*i.e.*, greater decline of the hitchhiking effect) due to population subdivision than suggested by simulation results.

### ii. One gene copy from deme 1 and the other from deme 2

Next, the approximate solution for the expected heterozygosity when one chromosome is sampled from deme 1 and the other from deme 2, *H*^{(12)}, is similarly obtained. Ignoring the possibility that the *B* lineage starting in deme 1 enters deme 2 and coalesces to the other *B* lineage there, we may consider a model in which the coalescence of two *B* lineages occurs in deme 1 only: one lineage always remains in deme 1 and the other lineage migrates from deme 2 to deme 1 at time *t _{m}*. This will lead to an overestimation of

*H*

^{(12)}because the coalescence in deme 2, which would produce smaller

*B*genealogy with less opportunity for recombination, is precluded. Furthermore, we again make the simplifying assumption that the coalescent event in deme 1 happens at

*t*= τ + δ. Then, the probability that this genealogy is marked by zero recombination events is approximately

*f*(

_{m}*t*) is the probability distribution of

*t*. Therefore,

_{m}We may use

However, when *m*/*s* is very small, which causes the distribution of *t _{m}* to be sharply concentrated very closely to τ, the above equation fails to accurately describe the probability density (data not shown). We thus take another step of simplification by using

*f*(

_{m}*t*) = 1 if

*t*= τ and 0 otherwise: the migration of the lineage from deme 2 to 1 happens at time τ. This again leads to the overestimation of

*H*

^{(12)}because the length of the lineage staying in deme 2 and being subject to recombination marking is maximized.

*H*

^{(12)}is then finally reduced to

*B*lineage, starting from deme 2, is subject to the additional marking of a recombination event due to its extension by δ. Figure 7 shows the analytic approximation to the relative increase of

*H*

^{(12)}due to population subdivision [Equation 23, with

*H*

^{(12)}caused by population subdivision.

### iii. Two gene copies from deme 1

We finally investigate the expected heterozygosity when both chromosomes are sampled from deme 1. When it is assumed that lineages starting in deme 1 do not migrate to deme 2, the hitchhiking effect is determined solely due to the trajectory of *B* in deme 1 only. In that case*lower* than that in the panmictic population that would be created if two demes are merged via free migration. The latter is given by *B* is halved as the population size doubles. It is well known that, if all else is equal, reduction in polymorphism by a single selective sweep is greater in a smaller population because the frequency trajectory of the beneficial allele is shorter (Barton 2000). Therefore, the effect of population subdivision with low migration is to *increase* the hitchhiking effect when two chromosomes are sampled from the deme where the beneficial allele originated. However, as the migration rate increases, *H*^{(11)} should approach that in the merged panmictic population. Therefore, it is expected that the relative effect of population subdivision on hitchhiking, *H*^{(11)}_{subdiv}/*H*^{(11)}_{panmic}, is ∼(1−ε^{2}^{r}^{/}* ^{s}*)/(1−(ε/2)

^{2}

^{r}^{/}

*) with*

^{s}*m*≪ 1/τ but gradually increases to 1 as

*m*increases. Simulation results confirm this prediction (Figure 8). In the following, we use Equation 24 for

*H*

^{(11)}as we are mainly concerned in the case of

*m*≪

*s*.

### iv. Random sample from the entire population

The expected effect of hitchhiking on the entire population, given that two chromosomes are sampled randomly over two demes, is therefore

Figures 9 and 10 show that the relative heterozygosity after genetic hitchhiking is larger in the subdivided population with small *m* than in the panmictic population. Namely, population subdivision weakens the strength of genetic hitchhiking. Compared to simulation results, our approximation overestimates the elevation of relative heterozygosity for small *m*, as expected from our consistent use of assumptions that lead to the overestimation of heterozygosity. However, we also note that this effect of population subdivision is limited to very small *m*/*s* values. With intermediate *m* (∼0.1*s*), both the approximation (although expected to be inaccurate) and simulation results suggest that only minor elevation of heterozygosity occurs. An alternative derivation for *m* > 1/τ, provided in the *Appendix*, also suggests that no significant increase of heterozygosity occurs due to population subdivision.

We also plotted in Figure 10 the expected increase in time taken for a beneficial mutation to become fixed in the entire population (“fixation time”), given by the ratio of the expectation under population subdivision, *Ns*]/*s*. As the relative length of the fixation time increases in the subdivided population with decreasing *m*, *H*^{(22)} and *H*^{(12)} respond differently to decreasing *m*, the overall effect being less drastic increase of *H*^{(}^{T}^{)} than that of fixation time.

## Subdivided Population With *K* = 10

Next, we expand our analysis to the stepping-stone model of a subdivided population with more than two demes. The beneficial mutation arising in deme 1 is expected to propagate symmetrically in both directions and the last deme where the mutation is fixed is ⌊*K*/2⌋ steps away from deme 1. We therefore predict that, at least for *m* ≪ *s*, the mean total time taken for the beneficial mutation to become fixed in the entire population (fixation time” is approximately

Compared to simulation results, this approximation works well asymptotically for very small *m*/*s* while the panmictic approximation works for larger *m* (Figure 11). As the migration rate increases, the fixation time approaches that in the corresponding panmictic population.

Simulation results were obtained only for the hitchhiking effect with *K* = 10. The pattern of heterozygosity immediately after the fixation of the beneficial allele in the entire population is similar to that with *K* = 2 (Figure 12A): a smaller migration rate between neighboring demes results in higher heterozygosity (for two chromosomes randomly selected from the entire population; shaded curve in Figure 12A). However, similar to the case of *K* = 2, the elevation of heterozogosity is less pronounced than the increase of fixation time. For example, comparing Figures 11 and 12A, while fixation time increases by 68%, heterozygosity increases by 34%. When two chromosomes are sampled from each subpopulation, heterozygosity greatly depends on the location of sampling if *m* ≪ *s*: heterozygosity is lowest in the deme of the beneficial mutation’s origin (deme 1) and progressively increases as distance from deme 1 increases (Figure 12A). This generalizes the pattern obtained above with *K* = 2 that the strength of genetic hitchhiking diminishes as the beneficial mutation spreads across neighboring demes. As explained above, genetic hitchhiking reduces heterozygosity in the first deme of a subdivided population with small *m*/*s* more than it reduces heterozygosity in the merged (panmictic) population because deme 1 behaves similarly to a small isolated population. However, in other demes (*e.g.*, demes 3–6 in Figure 12A), heterozygosity greatly increases as migration rate decreases. This produces an overall negative correlation between heterozygosity (over the entire population) and migration rate. We also note that, as migration rate decreases, heterozygosity over the entire population increases much faster than heterozygosity for a single deme. This reflects the fact that genetic hitchhiking spreading over a subdivided population creates genetic differentiation among subpopulations that were initially homogeneous (see below), as predicted by Slatkin and Wiehe (1998).

For a comparison, we also simulated selective sweeps spreading over *K* = 10 demes that are arranged according to Wright’s island model, in which a given haploid chromosome enters a common migrant pool with probability *m* per generation and then migrates to a randomly chosen deme. Except for this change in spatial arrangement, the same sets of parameters used in the simulations of the stepping-stone model were used. Figure 12B shows that the mean heterozygosity over the entire population after a selective sweep (shaded curve) increases with decreasing *m*. However, this increase is much smaller than that for the stepping-stone model. Similar to the stepping-stone model, the mean heterozygosity in deme 1, the birthplace of the beneficial allele, is reduced with low migration. Mean heterozygosities in other demes (results for only three of nine demes are shown in Figure 12B) slightly increase with decreasing migration. Much smaller heterozygosity for an individual deme than that for the entire population again reflects genetic differentiation among demes. Comparing Figure 12A and 12B, we conclude that the pattern of a selective sweep is more severely affected in the stepping-stone model than in the island model, which demonstrates the complexity of genetic hitchhiking caused by a beneficial mutation that spreads in one-dimensional space.

Further results, regarding the variance of heterozygosity and population differentiation, from the simulation of the stepping-stone model are given in Table 1. The coefficient of variation, *H*^{(}^{ij}^{)} is the expected heterozygosity for one chromosome sampled in deme *i* and another in deme *j*, was obtained over 10,000 replicates for a given parameter set. We also calculated *F*_{ST} = (*H*^{(}^{T}^{)} − *H*^{(}^{S}^{)})/*H*^{(}^{T}^{)}, where *H*^{(}^{T}^{)} is the expected heterozygosity in the entire population and *r*/*s*, population subdivision (reduction in *m*) causes a moderate increase in the coefficient of variation while it causes a drastic increase in *F*_{ST}. Note that simulation started with uniform frequencies of a neutral allele in all demes (*F*_{ST} = 0). As predicted by Slatkin and Wiehe (1998) and Bierne (2010), the effect of population subdivision on genetic differentiation after a selective sweep critically depends on the recombination rate: Table 1 shows that, for a given *m*, intermediate values of *r*/*s* (0.03 ∼ 0.1) produce the largest *F*_{ST}.

## Discussion

Complex geographic and demographic structures of natural populations have potentially great impact on evolutionary genetic processes and thus on the interpretation of DNA sequence polymorphism (Jensen *et al.* 2005; Nielsen *et al.* 2005; Li and Stephan 2006; Kim and Gulisija 2010; Stephan 2010). To examine the effect of spatial demographic structure on the pattern of selective sweeps, we used a simplified model of population subdivision. While a relatively slow evolutionary process such as the coalescence of neutral gene lineages may be less sensitive to spatial structure, a fast nonequilibrium process that sweeps across the entire population can be greatly affected by population subdivision, as demonstrated in this study. On the basis of results obtained above, we find that population subdivision causes several important modifications in the strength and pattern of a selective sweep.

First, the strength of genetic hitchhiking, measured by the reduction of expected heterozygosity, is diminished in a subdivided population if the migration rate is much smaller than the selective advantage of the beneficial mutation, while the population structure shaped by the same migration rate might be undetected in the examination of neutral polymorphism. As briefly argued by Barton (2000), this effect is attributed to increased time taken for the beneficial mutation to reach high frequency, which provides more opportunity for the breakdown of hitchhiking by recombination. This result implies that the strength of directional selection estimated from the chromosomal span of reduced polymorphism (Kim and Stephan 2002; Thornton *et al.* 2007) assuming panmixia might be an underestimate. However, we also find that the relative increase in the fixation time of a beneficial mutation under population subdivision is much greater than the relative increase in expected heterozygosity. This implies that, if the strength of selection is estimated separately from the chromosomal span of reduced polymorphism and from the age of the sweeping haplotype estimated by rare mutations (for example, Sáez *et al.* 2003; Meiklejohn *et al.* 2004; Xue *et al.* 2006), the former estimate might be greater than the latter (and closer to the true value). It is not, however, clear whether such a discrepancy can be detected with a reasonable statistical power.

The second important result is that a beneficial mutation leaves weaker signature of selection as it spreads across distant demes after its origin in the first population. With *K* = 2, while population subdivision results in a much *stronger* hitchhiking effect in the first population relative to that expected under panmixia (Equation 24 with *m* ≪ *s*), it causes a much *weaker* hitchhiking effect in the second population: the relative level of expected polymorphism is much higher than the panmixia. It is because the extension of neutral lineages exposed to hitchhike-breaking recombination applies only to those descending to the second population. Comparing Figures 6 and 8 we find that the expected heterozygosity can be up to 30% higher in the second population than in the first. Figure 12A also indicates that a gradient of heterozygosity will be observed along the path of the beneficial mutation’s propagation in a geographically structured population. Slatkin and Wiehe (1998) and Bierne (2010) showed that genetic hitchhiking can introduce population differentiation (large *F*_{ST}) in an initially homogeneous population (small *F*_{ST}) and also implicitly suggested that this population differentiation occurs as a selective sweep leaves asymmetric levels of variation in accordance with our conclusion. Their and our results make it clear that an evolutionary process can lead to heterogeneous outcomes across different demes if it occurs faster than migration. We may use this prediction of the gradient of heterozygosity along the pathway of a sweep to infer the geographical origin of a beneficial mutation and reveal the pattern of migration in a hidden geographic structure by analyzing DNA sequence polymorphism that harbors a signature of selective sweep distributed over multiple demes.

This study used simple models of a structured population, in which demes are of equal size and a given structure of division remains unchanged through time. Actual populations in nature experience complex demographic changes. An important case of demographic complexity is the split of an ancestral population into several demes that are dispersed over geographic structure, as in the evolutionary history of humans. We argue that our analysis of selective sweeps in subdivided populations is still applicable to this demographic scenario. If a beneficial mutation spreads very rapidly across subpopulations, which should be true for strong selection even with a certain delay by limited migration, the selective sweep would effectively occur under constant geographic structure as the timescale of the sweep is much shorter than that of major demographic processes. Our analysis can be applied as long as demes are genetically homogeneous at the time of mutation to the beneficial allele. This condition will be met if different demes were recently established from the split of the ancestral population and thus contain similar profiles of genetic variation.

Although it was not explicitly addressed in this study, our analysis suggests that population subdivision may also significantly modify the other (“higher-order”) pattern of polymorphism as the signature of a selective sweep. The site frequency spectrum and linkage disequilibrium are critically dependent on the shape (branching pattern) of the coalescent trees at the neutral loci, which in turn depends on the shape of genealogy at the locus under selection (*B* genealogy) (Fay and Wu 2000; Kim and Nielsen 2004; McVean 2007; Pfaffelhuber *et al.* 2008). For example, Pfaffelhuber *et al.* (2006) showed that approximating the *B* genealogy by a Yule process corrects the error introduced by the assumption of star-like *B* genealogy. Population subdivision is likely to introduce a much greater deviation from the star-like genealogy that cannot be corrected by a Yule tree. Consider a sample of chromosomes that are distributed randomly over multiple demes. The probability that two *B* lineages coalesce in a deme outside its mutational origin (Equation 14), thus preventing a star-like tree, becomes nonnegligible if *s* is sufficiently larger than *m*. Furthermore, while the markings of recombination events are concentrated near the root of a Yule tree in a single panmictic population, it is expected to occur in many other points along a *B* genealogy, corresponding to times when a *B* lineage is in a deme with low frequency of *B*, in a subdivided population. How these factors modify the patterns of frequency spectrum and linkage disequilibrium, particularly along the demes in the path of beneficial mutation’s propagation, remains to be investigated. This requires the development of computer simulation methods that can generate multisite (realistic DNA sequence) polymorphism data, which is planned for a follow-up article.

## Acknowledgments

We thank Wolfgang Stephan and two anonymous reviewers whose comments greatly improved the manuscript. This work was supported by National Institutes Health grant R01GM084320 and an Ewha Womans University Research Grant of 2010.

## Appendix: Deriving an Approximate Solution to Relative Heterozygosity for τ > 1/*m*

When *s* is not large enough to satisfy τ ≪ 1*/m*, while it may be still greater than *m* such that δ is nonnegligible, we cannot make the assumption that a *B* lineage stays in one deme until *x*_{2}(*t*) becomes small. *B* lineages will jump frequently across demes if *m* is greater than 1/τ. (For example, with 2*N* = 10^{5}, *s* = 0.01, and *m* = 0.001, it is calculated that *m*.) In such a case, we may ignore the probability that two *B* lineages will coalesce at deme 2 [Equations 13 and 14 show that *Q* is a decreasing function of *m* for a given τ, as the dominant factor of *f _{c}*(

*t*) between time 0 and τ/2 is

*B*lineages are then modeled to coalesce in deme 1 only, approximately at

*t*= τ + δ. Let λ(

*t*) (= 1 or 2) be the location (deme) of a randomly chosen

*B*lineage at time

*t*and

*I*

_{1}[λ] be 1 if λ = 1 and 0 otherwise. Then, considering two randomly chosen chromosomes from the entire population and a very small recombination rate, the probability of the coalescence at the neutral locus is given by

where *t*). If the jumps of *B* lineages are frequent enough, a given lineage would be in deme 1 or deme 2 with probabilities *x*_{1}(*t*)/(*x*_{1}(*t*) + *x*_{2}(*t*)) and *x*_{2}(*t*)/(*x*_{1}(*t*) + *x*_{2}(*t*)), respectively, at time *t*. Therefore, using an approximation *x*_{1}(*t*)/(*x*_{1}(*t*) + *x*_{2}(*t*)),

The last approximation results from the earlier assumption that δ is nonnegligible compared to τ *P*_{coal} > (ε/2)^{2}^{r}^{/}* ^{s}*, which means a stronger hitchhiking effect under a subdivided population contrary to the expectation that population subdivision would cause an overall increase in heterozygosity (and thus a decrease in homozygosity). Similar to the underestimation of

*H*

^{(11)}(Equation 24) relative to a panmictic expectation with intermediate

*m*, this error results from specifying the initial frequency of the beneficial mutation in deme 1 to be ε = 1/(4

*Ns*): with frequent migration that satisfies τ > 1/

*m*, the stochastic dynamics of the beneficial mutation would conform to that in a panmictic population of 4

*N*chromosomes. In that case, specifying the initial frequency to be ε would underestimate the length of selective phase, which leads to the overestimation of the hitchhiking effect. In conclusion, this derivation, despite errors, suggests that a clear decline in hitchhiking effect due to population subdivision is not expected under

*m*> 1/τ.

- Received May 1, 2011.
- Accepted June 2, 2011.

- Copyright © 2011 by the Genetics Society of America