## Abstract

Tumors initiate when a population of proliferating cells accumulates a certain number and type of genetic and/or epigenetic alterations. The population dynamics of such sequential acquisition of (epi)genetic alterations has been the topic of much investigation. The phenomenon of stochastic tunneling, where an intermediate mutant in a sequence does not reach fixation in a population before generating a double mutant, has been studied using a variety of computational and mathematical methods. However, the field still lacks a comprehensive analytical description since theoretical predictions of fixation times are available only for cases in which the second mutant is advantageous. Here, we study stochastic tunneling in a Moran model. Analyzing the deterministic dynamics of large populations we systematically identify the parameter regimes captured by existing approaches. Our analysis also reveals fitness landscapes and mutation rates for which finite populations are found in long-lived metastable states. These are landscapes in which the final mutant is not the most advantageous in the sequence, and resulting metastable states are a consequence of a mutation–selection balance. The escape from these states is driven by intrinsic noise, and their location affects the probability of tunneling. Existing methods no longer apply. In these regimes it is the escape from the metastable states that is the key bottleneck; fixation is no longer limited by the emergence of a successful mutant lineage. We used the so-called Wentzel–Kramers–Brillouin method to compute fixation times in these parameter regimes, successfully validated by stochastic simulations. Our work fills a gap left by previous approaches and provides a more comprehensive description of the acquisition of multiple mutations in populations of somatic cells.

UNDERSTANDING the dynamics of an evolving population structure has long been the goal of population genetics. Several authors have constructed probabilistic models to study allele frequency distributions in populations subject to mutation, selection, and genetic drift (Fisher 1930; Wright 1931; Moran 1962). The mathematical analysis of these models leads to an improved understanding of the underlying system and has been crucial for the interpretation of the laws of evolution. This is most evident in the quantitative analysis of cancer, which has seen numerous studies throughout the 20th century that addressed the kinetics of cancer initiation and progression (Nordling 1953; Armitage and Doll 1954; Fisher 1958; Knudson 1971; Moolgavkar 1978). Due to these and other studies (see Weinberg 2013 for a review), we now know that human cancer initiates when cells within a proliferating tissue accumulate a certain number and type of genetic and/or epigenetic alterations. These alterations can be point mutations, amplification and deletion of genomic material, structural changes such as translocations, loss or gain of DNA methylation and histone modifications, and others (Weinberg 2013).

The dynamics of mutation acquisition is governed by evolutionary parameters such as the rate at which alterations arise, the selection effect that these alterations confer to cells, and the size of the population of cells that proliferate within a tissue. Much effort has been devoted to modeling these processes mathematically and computationally and to analyzing the rates at which mutations arise within precancerous tissues (Moolgavkar and Luebeck 1992; Nunney 1999; Gatenby and Vincent 2003; Michor *et al.* 2004; Haeno *et al.* 2009). In particular, several investigators have studied the dynamics of two mutations arising sequentially in a population of a fixed finite number of cells. This scenario describes the inactivation of a tumor-suppressor gene (TSG), which directly regulates the growth and differentiation pathways of the cells (Weinberg 2013). This may or may not lead directly to cancer. Cells in which the TSG is inactivated can take a variety of fitness values. For instance embryonic retina cells with an inactivated RB1 gene can proliferate uncontrollably and create retinoblastomas (Knudson 1971). By definition these cells have a higher fitness than the wild-type cells. Alternatively, if chromosomal instability (CIN) is taken into account, cells with deactivated TSG can have a lower fitness than the wild type (Michor *et al.* 2005). Empirical evidence for the exact fitness (dis)advantage conferred to cells as a result of accumulating mutations is in general difficult to obtain, since *in vitro* growth assays of nontransformed cells are challenging. For this reason and to provide general methods, the modeling literature has addressed a range of fitness values for single- and double-mutant cells (*e.g.*, Michor *et al.* 2004).

Subsequent modeling work on mutation acquisition (Komarova *et al.* 2003; Iwasa *et al.* 2004; Proulx 2011; Haeno *et al.* 2013) has revealed a more detailed picture; a homogeneous population harboring no mutations can move to a homogeneous state in which all cells carry two mutations without ever visiting a homogeneous state in which all cells harbor just one mutation. This phenomenon is referred to as “stochastic tunneling” and represents an additional route to the homogeneous state with two mutations; the sequential route is still available to the system, but it becomes less likely in certain parameter regimes. In this context the term tunneling refers only to overlapping transitions between the homogeneous states; it does not imply a statement about the structure of the underlying fitness landscape. The process we refer to as tunneling is not limited to valley-crossing scenarios. Figure 1A provides a schematic illustration of the tunneling process.

As with much of the existing literature on the stochastic tunneling, our work is not just limited to the case of cancer initiation. Instead our results are related and applicable to more general scenarios in population genetics, including situations in which a heterogeneous population is maintained through mutation–selection balance or the case of Muller’s ratchet when increasingly deleterious mutations become fixed (Muller 1964).

So far, most analytical investigations of stochastic tunneling (Komarova *et al.* 2003; Iwasa *et al.* 2004; Nowak *et al.* 2004; Proulx 2011) have been limited to considering transitions between homogeneous (or monomorphic) states of the population, as indicated in Figure 1A. These investigations were performed assuming that cells proliferate according to the Moran process—a stochastic model of overlapping generations in which one cell division and one death event occur during each time step (Moran 1962). Nowak *et al.* (2004) analyzed the effect of the population size and mutation rates on the rate of appearance of a single cell with two mutations. These authors noted that for small, intermediate, and large populations, it takes two hits, one hit, and zero hits, respectively, for a cell to accumulate two mutations. Here a hit is defined as a rate-limiting step, such as the appearance of an alteration when mutation rates are small. Considering fixation of cells with two mutations, Komarova *et al.* (2003), Iwasa *et al.* (2004), and Haeno *et al.* (2013) obtained explicit expressions for the tunneling rate. Subsequently, Iwasa *et al.* (2005) used the assumption of independent lineages (*i.e.*, individual lineages of cells harboring one mutation were considered to behave independently from each other) to compute the probability distribution for the time of emergence of a single second mutant in intermediate or large populations. Proulx (2011) used a similar branching-process approach to derive a further tunneling rate.

Other types of dynamics such as the Wright–Fisher process have been studied as well; see, *e.g.*, Proulx (2011). In the Wright–Fisher process, cell generations are assumed to be nonoverlapping, so that many birth and death events occur during each time step (Ewens 2004). Using this process, Weinreich and Chao (2005) determined the critical population sizes for sequential fixation or stochastic tunneling, and Weissman *et al.* (2009) calculated the rate of tunneling as a function of the mutation rates, population size, and relative fitness of cells harboring one mutation. Finally, these results were extended to investigate the effects of recombination, or sexual reproduction, on the rate of stochastic tunneling (Weissman *et al.* 2010; Lynch 2010; Altland *et al.* 2011). These authors found that the time to establishment of the double-mutant cells can be reduced by several orders of magnitude when sexual reproduction is considered (Weissman *et al.* 2010).

Recently, Proulx (2012) studied stochastic tunneling in a model that was not built upon the homogeneous-state approach. The author constructed a mutational network to study gene duplication. Although the setting of the model is very different from the setting we consider here, the underlying principles are similar.

The existing approaches for the Moran model in an asexual population provide accurate analytical approximations for a subset of the parameter space. We present a systematic overview of the scope of existing quantitative work. Extensive regions of parameter space, up to date, have been left unexplored by analytical approaches. These are predominantly situations in which the double mutant is not the most advantageous in the sequence. Before the double mutant reaches fixation, the population must travel across a fitness hill or move constantly downhill in fitness space, as illustrated in Figure 1B. The dynamics can then become trapped in quasi-equilibria—a consequence of the mutation–selection balance (Crow and Kimura 1970). In these long-lived equilibria, the population is heterogeneous, and so previous approaches are not justified. Throughout our article, these states are referred to as “metastable.” When these states exist, fixation is driven purely by demographic fluctuations. We address this regime based on ideas from mathematical statistical physics. Specifically we employ the Wentzel–Kramers–Brillouin (WKB) method to derive quantitative predictions for fixation times. Examples of using the WKB method to describe the escape from metastable states include the computation of mixing times in evolutionary games (Black *et al.* 2012), investigating extinction processes in coexisting bacteria (Lohmar and Meerson 2011) or predator–prey systems (Gottesman and Meerson 2012), and investigating epidemic models (van Herwaarden and Grasman 1995; Kamenev and Meerson 2008; Dykman *et al.* 2008; Black and McKane 2011; Billings *et al.* 2013). In the presence of recombination, Altland *et al.* (2011) have shown that metastable states can appear when recombination rates are large, even if the double mutant is advantageous. The authors note that the time to escape across this “recombination barrier” increases exponentially with system size, as explained based on a WKB approach.

In this article, we first classified the generic types of behavior that can occur in a population of cells that acquire two subsequent mutations in a Moran process: we determined when metastable states occur, when fixation is driven by intrinsic noise as opposed to deterministic flow, and where in parameter space fixation occurs in several subsequent hits. This classification was achieved by systematically studying the underlying deterministic dynamics of the process. We then obtained expressions for fixation times in parameter regimes that could not be captured by previous methods, *i.e.*, regimes in which metastable states are found. We thus employed the WKB method to provide a more complete analytical description of the fixation dynamics in these parameter regimes. Our work fills the gap left by the existing literature and leads to a more comprehensive understanding of mutation acquisition and stochastic tunneling in evolving populations.

## The Model

We considered a well-mixed, finite population of *N* cells. Each cell can be of one of three possible types, labeled type 0, a wild-type cell harboring no mutations, type 1, a cell harboring one mutation, and type 2, a cell harboring two mutations. Initially, all cells are of type 0. The evolution of the population is determined by a Moran process (Moran 1962). During each elementary time step of this process, a cell is randomly chosen to reproduce proportional to its fitness. In the same time step another cell is randomly removed, such that the total population size remains constant. The daughter cell can either inherit its type from the parent or acquire a mutation during division. The relative fitness values of type 0, type 1, and type 2 cells are denoted by , , and . Without loss of generality, we use throughout. The mutation rates and denote the probability that the daughter of a type 0 cell is of type 1 and the probability that the daughter of a type 1 cell is of type 2, respectively. We neglect all other combinations of mutations. The assumption of no backmutation is commonly used in the population genetics literature (Ma *et al.* 2008). It is justifiable since the human genome is very large, bp, and the probability of mutating a specific base per cell division very small, – (Kunkel and Bebenek 2000). Therefore the chance of undoing a specific point mutation is vanishingly small. The probability that a second critical alteration occurs at a different locus is much higher.

In our model, finite populations will eventually reach a state in which all cells have acquired two mutations. This state is “absorbing”; *i.e.*, once this state has been reached, no further dynamics can occur. There are of course physical processes beyond the second mutation. In precancerous tissues, for example, there will be a finite probability that cells progress from this state to accumulate further changes (see *Discussion*). These processes are not the focus of our work though, and so are not included in the model.

Let us denote the number of type 0, type 1, and type 2 cells by , , and , respectively; we have . The transition rates for the Moran process are given by(1)The quantity is the average fitness in the population. The transition labeled represents a process in which a cell of type *i* is replaced by a cell of type *j*. In a process labeled , for example, the state of the population changes from to state . As an example, the first reaction rate, , in Equation 1 can be broken down as follows: a type 0 cell is chosen to reproduce at rate . The offspring does not mutate with probability . Finally, a type 1 cell is chosen to be removed at rate . The rates for the other processes can be interpreted analogously. We choose a continuous-time setup, and correspondingly all rates in Equation 1 scale linearly in the population size *N*. Simulations are carried out using a standard Gillespie algorithm (Gillespie 1977), and times are measured in cellular generations.

This process is described exactly by a master equation (van Kampen 2007), which governs the behavior of the probability, , such that the population is in state at time *t* and is given by (2)The vector **v** indicates a change in the composition of the population due to the corresponding reaction, and represents the partial derivative of with respect to time. This equation states that the probability that the population is in state **n** at time *t* increases due to transitions into the state **n** and decreases due to transitions out of the state **n**.

The master equation contains full information about the stochastic population dynamics. In particular, the detailed statistics of the population at any time can be derived from it, and it captures effects driven by intrinsic noise, such as extinction and fixation. Obtaining a full solution of the master equation is difficult or impossible though in all but the simplest of cases. As a starting point, it is often useful to first consider the deterministic limit of infinite populations. In this limit, the distribution is sharply peaked around its average, and so the dynamics reduces to a set of equations for this mean. This approach does not capture any of the stochastic effects. However, the types of stochastic trajectories that can be observed for different parameter sets are, to some extent, set by the underlying deterministic dynamics. We thus first analyze the deterministic limit of the model.

## Deterministic Analysis and Types of Stochastic Behavior

In the limit , the population evolves according to a deterministic set of equations. Writing , we have the relation , and the average fitness is given by . The equations governing the dynamics of the population are then (3)These equations can be derived systematically using a system-size expansion of the master equation (2); see, *e.g.*, van Kampen (2007). Note that refers to the relative concentration of cells of type *i* in the population (not to be confused with the probability of being found in a homogeneous state of type *i* cells as studied in Iwasa *et al.* 2004 and Haeno *et al.* 2013). For example, would indicate that the population is in a state in which all three types are present with equal numbers.

Given the relation , the dynamics has only two independent degrees of freedom. Time courses of the system can hence be thought of as a trajectory in a “concentration simplex,” as depicted in the satellite diagrams of Figure 2. Each point in the simplex represents one particular state of the population. At points in the interior of the simplex all three types of cells are present in the population ( for ). Points on the edges of the simplex represent states in which one of the three types is not present, for example, for points along the edge connecting the bottom-right corner of the simplex with the top corner. We refer to this as the 1–2 edge in the following, and similarly for the other edges. The three corners of the simplex represent the homogeneous states, *i.e.*, (bottom left corner), (bottom right) and (top corner).

The deterministic equations (3) have a trivial fixed point (a point at which for all ) at , corresponding to the absorbing state. The equations can have a further zero, one, or two nontrivial fixed points, depending on the values of the fitness parameters and the mutation rates. These fixed points correspond to points at which mutation and selection balance (Crow and Kimura 1970). Each fixed point can be either stable (*i.e.*, attracting from all directions) or a saddle (attracting from some directions, repelling in others). The system is found not to contain any fully repelling fixed points (*Appendix A*). Figure 2 shows the deterministic dynamics in different parameter regimes, indicated as regions I–V. Below we discuss the stochastic behavior in each of these parameter regimes.

### Region I (mutation–selection balance between type 1 and type 2 cells)

In region I, the deterministic dynamics flows toward a fixed point on the 1–2 edge of the concentration simplex (). The type 0 cells have the lowest fitness and are deterministically lost by selection. The fixed point is a consequence of mutation–selection balance between type 1 and type 2 cells (Crow and Kimura 1970). Writing , the existence condition for the equilibrium () reduces to . It is a well-known result from population genetics that this condition prevents the deterministic loss of the type 1 cells (Crow and Kimura 1970). The deterministic system gets stuck at this fixed point, but a finite population will eventually reach fixation in the all-2 state. At large but finite population sizes, the stochastic dynamics are expected to approximately follow the deterministic path shown in Figure 2 such that type 0 cells quickly become extinct. The lack of backward mutations means the population cannot depart from this edge and the problem reduces to one degree of freedom. The mutation–selection balance maintains the heterogeneous population state of type 1 and type 2 cells. The intrinsic noise then has to drive the system from this metastable state into the absorbing all-2 state against the direction of selection. Fixation times are expected to grow exponentially with the population size (Antal and Scheuring 2006; Mobilia 2010; Altland *et al.* 2011).

### Region II (mutation–selection balance between all three types and, separately, between types 1 and 2)

In region II, the deterministic flow from the all-wild-type state is toward a stable fixed point in the interior of the simplex. This point corresponds to the mutation–selection balance point of all three species. There is a second fixed point located on the 1–2 edge, which corresponds to mutation–selection balance between types 1 and 2 in the absence of type 0 cells (analogous to region I). As type 0 cells have the highest fitness in this regime, selection is directed away from the 1–2 edge. Thus the fixed point on this edge is a saddle. As before the stochastic dynamics in finite populations will reach the all-2 state eventually. The population will closely follow the deterministic trajectory (see Figure 2) before reaching the metastable state about the stable interior fixed point. Here the mutation–selection balance maintains the heterogeneous state with all three species present. The population will fluctuate about this fixed point until it eventually overcomes the adverse selection and escapes. There are two possibilities for the subsequent behavior: (i) Type 0 cells become extinct and the population settles into the metastable state on the 1–2 edge. Intrinsic fluctuations enable the population to overcome the adverse selection along the edge and reach the absorbing all-2 state. This corresponds to sequential extinction, first of type 0 cells and then of type 1 cells. This process is equivalent to a minimal model of Muller’s ratchet (Muller 1964), in which the most advantageous phenotypes are sequentially lost. A trajectory of this type is illustrated in Figures 3, A and C. (ii) Cells of type 0 and type 1 can in principle go extinct (almost) simultaneously. The trajectory of the system then hits the 1–2 edge infinitesimally close to the all-2 corner of the simplex (here infinitesimally close means a distance of order away from the upper corner). It does not become trapped in the metastable state located on the 1–2 edge. In numerical simulations (data not shown) we find that this second path is realized only very rarely, and so our mathematical analysis of region II below focuses on sequential extinction.

### Region III (mutation–selection balance between all three types)

In region III the deterministic dynamics has a single stable fixed point in the interior of the concentration simplex. This point again corresponds to the mutation–selection balance point of all three species. Large, but finite, populations will behave as discussed in case (ii) for region II. They will initially become trapped in the metastable state about the mutation–selection balance point, before intrinsic fluctuations eventually drive the system to the absorbing all-2 state. In region III, type 0 cells and type 1 cells go extinct at essentially the same time. If the type 0 cells become extinct first, then type 1 cells quickly become extinct as selection along the 1–2 edge is directed toward the absorbing state (). This is illustrated in Figures 3, B and D.

### Regions IV and V (beneficial type 2 mutation)

In a subset of the parameter space, shown as regions IV and V in Figure 2, the deterministic flow from the all-wild-type state is directly to the absorbing all-2 state. For such model parameters we expect that fixation in finite populations will be quick as type 2 cells are favored by selection (and mutation). These scenarios agree with the theory of natural selection, in which the populations fitness increases over time (Fisher 1930). In region IV this is achieved by crossing a fitness valley, and in region V it is achieved by sequentially selecting the most advantageous phenotype.

Figure 4 illustrates which parameter regimes have previously been studied in the stochastic tunneling literature. These existing studies almost exclusively focus on regions IV and V, *i.e.*, cases in which fixation is driven not primarily by demographic noise, but by the underlying deterministic flow. As mentioned above fixation is typically fast in regions IV and V. Based on similar studies in evolutionary game theory one would expect the fixation time to grow logarithmically with the population size, (Antal and Scheuring 2006), and this is indeed what we find in simulations (data not shown). The regions containing nontrivial fixed points are largely unexplored by previous investigations. Fixation is controlled by stochastic effects so that fixation times are large and broadly distributed. As we discuss below, fixation times grow exponentially with the population size in such cases. This is perfectly in line with the findings of Haeno *et al.* (2013), who point out that fixation in these regions takes a very long time. Efficient measurements of fixation time in simulations are hence difficult. Methods that require the numerical solutions of, for example, the backward Fokker–Planck equation or a backward master equation reach their limits here as well Haeno *et al.* (2013). The contribution of our work is to analyze precisely these previously inaccessible cases. We compute the fixation properties of systems in which the underlying deterministic flow has one or more attracting fixed points away from the absorbing all-2 state.

## WKB Analysis

Let us now analyze the dynamics in regions I, II, and III, *i.e.*, situations in which the deterministic dynamics has one or two nontrivial fixed points. In large but finite populations these fixed points correspond to metastable states in which the effect of mutation and selection balance. The aim of the following analysis is to calculate the rate at which the population will escape this state and enter the absorbing state in which all cells harbor two mutations. To proceed with the analysis we make the following simplifying assumptions, which are justified by the previous deterministic analysis:

We assume that the population first settles into a distribution about the mutation–selection balance point. This distribution is calculated below.

We assume that the population will leak into the absorbing state on a very long time scale from this distribution. With this assumption we can also say that the time taken for the population to reach the metastable state is negligible when compared to the escape time.

With these assumptions we can compute [from the master equation (2)] the distribution about the mutation–selection balance point and the escape rate. These assumptions (and hence the subsequent analysis) require the selective pressure to be greater than the effect of noise, such that the metastable states are long lived. For this reason, the approach described here is valid only for large *N*, which satisfy this condition (the minimum value of *N* for which our analysis is valid is dependent on the remaining model parameters, but comparisons with simulation results show it is accurate for ).

Mathematically we formulate the problem as follows: the shape of the distribution about the mutation–selection balance point is given by and is henceforth referred to as the quasi-stationary distribution (QSD) in line with existing literature (Assaf and Meerson 2010). The mean time taken to escape from the metastable state, *τ*, is much greater than the time taken to initially reach the metastable state , *i.e.*, . Provided this condition holds, we can assume that after a short time the probability to find the population in state **n** is given by (4)The exponential decay factor, , describes the leaking process from the metastable state into the absorbing state, . The second equation follows from normalization.

To find the mean fixation time of the type 2 cells, we substitute Equation 4 into the master equation (2) to obtain the quasi-stationary master equation (QSME) (5)For (the absorbing state) we have (6)where we have used . Hence if we find the QSD by solving the QSME (5), we can determine the mean fixation time, *τ*, and the probability to have reached fixation by time *t*, .

By separating variables in Equation 4, we have reduced the complexity of the master equation (2) [time does not feature in the QSME (5)]. If we now replace the discrete variables **n** with continuous variables , we further reduce the complexity. This continuous approximation is valid as we have already stated that we require *N* to be large. We now employ the WKB ansatz to represent the QSD as (7)where is known as the *action*, and () is the so-called *amplitude* (Assaf and Meerson 2010). We have introduced *C* as a normalization constant. To find the QSD, we follow *e.g.*, Assaf and Meerson (2010) and expand the QSME (5) in powers of . Further analysis can be carried out if the QSME has only one variable. This is relevant in regions I and II of our model, where the population must escape from a metastable state on the 1–2 edge of the concentration simplex. An example of the QSD obtained from this procedure is shown in Figure 5.

Escape from an interior metastable state can also be studied using the WKB approach. The QSME then retains 2 degrees of freedom, and explicit expressions for the action and escape time cannot be obtained. In the following we briefly describe the main steps for each of the two situations.

### Analysis of mutation–selection balance between types 1 and 2

We first consider the case in which the population must escape from a metastable state on the 1–2 edge (regions I and II). As the population cannot depart from the 1–2 boundary, the system then reduces to 1 degree of freedom. We parameterize the system in terms of , such that is the absorbing state (all cells of type 2) and . The analysis now closely follows the work of Assaf and Meerson (2010), specifically their scenario A. The outcome of the analysis are expressions for the action , (determined up to an additive constant), and the normalization constant *C*. With this we find an explicit expression for the mean escape time, *τ*, from the metastable state. In region I in which there exists just a single deterministic fixed point, this time corresponds to the mean fixation time, as documented above. The fixation time in region II is discussed in the next section. This analysis is covered in detail in *Appendix B*; we here list only the main steps and the result of the solution procedure:

Substitute the WKB ansatz (7) into the QSME (5), and then expand in powers of . The equation at leading order in

*N*can be solved to find the action . The equation at next-leading order can be solved to find the function . The QSD (7) is normalized by considering a Gaussian approximation of . Note that with expressions for*S*, , and*C*, we have derived the distribution about the mutation–selection balance point. An example of this is plotted in Figure 5.Find a so-called boundary-layer solution by recursively solving the QSME (5) for , using Equation 6 to express the solution as a function of the expected escape time

*τ*.Matching the recursive boundary-layer solution and the WKB solution in the domain gives the expected escape time from the metastable state to the absorbing state, (8)

### Analysis of mutation–selection balance between all three types

When a fixed point exists away from the state-space boundaries, as found in regions II and III in Figure 2, the solution procedure described above is no longer viable; the equations recovered from step 1 above can no longer be integrated directly subject to the appropriate boundary conditions (see *Appendix C*). Instead, minimum-action Hamiltonian trajectories must be computed. Problems of this type are usually tackled in one of two ways: first, the equations of motion could be integrated using a shooting method to find the optimal (most likely) trajectory with a given final point (Kamenev and Meerson 2008; Black and Mckane 2011; Gottesman and Meerson 2012); second, the equations can be integrated using an iterative scheme, which converges to the optimal trajectory (Lohmar and Meerson 2011) connecting given start and end points. We found that the second method quickly converges for our problem, so results presented in the following use this method. The details of the procedure are described in *Appendix C*.

Using this method we calculated the action, *S*, accumulated over the trajectory from the stable interior fixed point to the boundary saddle fixed point in region II or directly to the absorbing state in region III. We then used a known result presented by Van Herwaarden and Grasman (1995) for the escape time from a metastable state, (9)where the constant *C* can be found by fitting to simulation data. This expression has the same functional dependence on *N* as the one given in Equation 8.

In principle, one could numerically find the action of a trajectory starting at the stable fixed point and finishing at a general position **x** in concentration space. This would yield the values , and the distribution about the metastable state is then given by . This very technical analysis is tedious and beyond the scope of this article.

## Results

We analyze the results separately for each region (I, II, and III) of parameter space. In particular we discuss the implications the model parameters have for the probability with which tunneling occurs, and for the fixation time of type 2 cells.

### Region I

In this region, type 1 cells have a fitness advantage over both type 0 and type 2 cells, such that the fitness landscape has an intermediate maximum (a fitness “hill;” see Figure 2). As a result, type 0 cells are deterministically lost and the population relaxes to the QSD (*i.e.*, the mutation–selection equilibrium) on the 1–2 edge, as described above, and probability slowly leaks into the absorbing state.

To test the accuracy of this approach, in Figure 5 we compare the QSD measured in simulations with the theoretical approximation. The data in the figure reveal good agreement between theory and model simulations for , and for . In the region just above the agreement between the theoretical result for and simulation data breaks down; here the theoretical value from Equation 7 diverges. This is a result of taking the limit in the QSME (5), in particular for the case , which corresponds to . The value of *τ* is crucial to determining the value of . When calculating the mean fixation time, *τ*, we circumvented this known problem by considering a so-called boundary-layer approach (Assaf and Meerson 2010). The boundary-layer solutions (dashed lines in Figure 5; for details of the calculation see *Appendix B*) show better agreement with simulation results close to than the QSD obtained from the WKB ansatz (solid lines).

Results for the mean fixation time in region I are shown in Figure 6A. In Figure 6B, we plot the probability that type 2 cells have reached fixation by time (including fixation earlier than that). We refer to this quantity as the fixation probability. Fixation times are shown to increase exponentially with . This is a consequence of the increasing height of the selection “barrier,” which must be overcome for type 2 cells to reach fixation. Also, increasing pushes the mutation–selection balance point toward the all-1 state. This results in a further increase in fixation time (or decrease in fixation probability). As the metastable state approaches the all-1 state, the probability of the population reaching the all-1 state due to demographic fluctuations increases. Thus increasing decreases the probability of tunneling.

Increasing the fitness of type 2 cells, on the other hand, pushes the metastable state closer to the absorbing state. This leads to a significant reduction in the fixation time (increase in fixation probability) as also shown in Figure 6A (Figure 6B). Increasing the mutation rate has a similar effect to increasing ; the mutation–selection balance point approaches the absorbing state, and the net effect of selection away from the absorbing state is reduced, leading to a decrease in the fixation time. In line with the previous literature (Iwasa *et al.* 2004; Haeno *et al.* 2013), increasing the mutation rate increases the probability of tunneling.

In Figure 6, A and B, the theoretical predictions from the WKB method are in excellent agreement with simulation results. This is the case even at the moderate population size of . Small deviations occur when mutation rates are low (dashed lines and open symbols in Figure 6). The theory then slightly underestimates the fixation time (overestimates the fixation probability). This is a consequence of assuming that the population approaches the metastable state in a negligible amount of time. For very small mutation rates, it takes an increasing period of time for successful (*i.e.*, nonvanishing) mutant lineages to appear. Deviations between theory and simulation results occur when . At this point the theory breaks down as the fixed point on the 1–2 edge approaches the absorbing state. The barrier associated with adverse selection is then negligible and the assumptions underlying the WKB approximation are no longer justified.

### Region II

To reach fixation in this region the population must accumulate successive mutations of lower fitness (). The population first approaches a metastable state corresponding to the mutation–selection balance point of all three species. From here there are two possible routes to fixation, sequential or (almost) simultaneous extinction of types 0 and 1, as described previously and shown in Figure 3. By computing the action accumulated along both routes, we have shown that the path of least action—the most probable path to fixation—corresponds to the path of sequential extinction. We treat this two-hit process as two separate problems: (i) escape from the interior metastable state to the boundary (loss of the advantageous type 0 phenotype) and (ii) escape from the boundary metastable state to the absorbing all-2 state (analogous to region I). A typical realization of this sequence of events is shown in Figure 3C.

As in region I, the probability of tunneling decreases as the fitness advantage of type 1 cells over type 2 cells increases. This is because the fixed point on the 1–2 edge approaches the state. For the same reason, the tunneling rate decreases as the mutation rates decrease.

Following the convention used by Gottesman and Meerson (2012), we labeled the time to reach the 1–2 boundary as , indicating that the three-species system turns into a two-species system when the wild-type cell goes extinct. Similarly, the time to travel from the boundary fixed point (two species present in the population) to the absorbing state (one species) is denoted by . With this notation we also labeled the action accumulated along each segment as and . As is given by Equation 8, we express the mean fixation time of type 2 cells as (10)The coefficient is found by fitting to simulation data for the time taken to reach the 1–2 boundary as a function of the population size.

Small changes to the parameters now have significant effects on the fixation time, as shown in Figure 7 (solid symbols/solid lines). Increasing the fitness of the type 2 cells moves both the interior fixed point and the boundary fixed point toward the absorbing all-2 state. It also reduces the strength of selection away from the absorbing state. These combined effects dramatically reduce the mean fixation time and its rate of increase with the population size.

### Region III

In this region fixation is controlled by the escape from the interior metastable state; it is a one-hit process. The stable interior fixed point, corresponding to the mutation–selection balance point of the three species, is located close to the all-wild-type state as type 0 cells are the most advantageous. The direct path from the metastable state to the all-2 state is the dominant (least action) path, as shown in Figure 3B. As a result, the probability of tunneling is higher than in the previous cases. It increases as the fitness of type 2 cells and the mutation rates increase as the stable interior fixed point moves to lower numbers of type 1 cells (away from the all-1 state).

The fixation time is computed from Equation 9, where is the action accumulated along the direct path from the stable interior fixed point to the all-2 state. The coefficient is again found by fitting to simulation data. We see in Figure 7 (open symbols/dashed lines) that varying the model parameters has a lesser effect on fixation times than in region II. In region III, fixation is a one-hit process—the population must only escape the stable fixed point—and not a two-hit process as in region II where the effects of the two steps are compounded. Contrary to the results for region I, the mean fixation time is a decreasing function of in region III. This can be explained as follows: by increasing , the selection strength away from the 1–2 boundary decreases and the stable state moves to higher type 1 numbers, such that the population has an improved chance of reaching the 1–2 boundary. From there selection is directed toward the absorbing state, and the time spent on the 1–2 boundary is negligible compared to the time to reach this edge. Hence, the fixation time reduces as type 1 cells become more fit. The rate of increase of the fixation time with the population size reduces as well (see Figure 7).

Note that there are systematic deviations between theory and simulation results in the data set shown as open triangles in Figure 7, and to a lesser extent also for the data shown as open diamonds. This is attributed to the fact that the fitness parameters and are very similar to each other or equal for these instances, and they are also close to the fitness of the wild type. Selection is then close to neutral and the metastable state is only weakly attracting. The WKB approach then reaches its limits as the assumption of a long-lived metastable state begins to break down.

## Discussion

In this article we investigated the fixation of two successive mutations in a finite population of individuals proliferating according to the Moran process. We discussed this in the context of the somatic evolution of a compartment of cells. The accumulation of two mutations can correspond to the inactivation of a tumor-suppressor gene or alteration of genes causing CIN (Weinberg 2013). If the cell carrying two mutations is deleterious, as can be the case with recessive CIN genes (Michor *et al.* 2005), it will generally have low concentrations within the tissue. Then the chance of a cancerous phenotype emerging (further mutation) is very low. Demographic fluctuations can drive the double mutant to higher numbers, but these states are short lived. If the double mutant reaches fixation, the state is maintained until a further mutation occurs and hence the chance of a cancerous phenotype emerging is much greater.

We first analyzed the deterministic limit of the evolutionary dynamics. We identified parameter regimes in which mutation and selection balance. These are regimes in which the double mutant is not the most advantageous in the sequence. In finite populations, this mutation–selection balance gives rise to long-lived metastable states. Our analysis identified the escape from these metastable states as the key bottleneck to fixation of cells with two mutations. For parameter values for which there is no mutation–selection balance (*i.e.*, type 2 cells have the highest fitness), the fixation dynamics is largely governed by the deterministic flow. The rate-limiting steps are then the appearance of successful mutant lineages (Nowak *et al.* 2004), and the subsequent fixation of cells with two mutations is a zero-hit process. As such the progression from healthy tissue (all wild type) to susceptible tissue (all type 2; inactivated TSG) will be fast relative to the cases in which a mutation–selection balance exists. If there is one stable fixed point in the deterministic dynamics, the process becomes a one-hit phenomenon limited by the escape from the corresponding metastable state. In regions with two fixed points one observes a two-hit process. The population becomes trapped in a first metastable state, escapes to a second metastable state, and then reaches full fixation.

In addition to this qualitative classification, we calculated fixation times in parameter regimes previously inaccessible to existing analytical approaches. These are precisely the regions of parameter space in which mutation–selection balance exists. We used the WKB method to calculate the mean escape time from the corresponding metastable states to the absorbing all-type 2 state. This escape time is identified as the fixation time of type 2 cells. We tested the analytical expressions and numerical results obtained from the WKB approach against individual-based simulations of the population dynamics. Our theoretical predictions in principle rely on a limit of large but finite populations, and so they can be expected to be valid only for large enough populations. The comparison against simulations demonstrates the accuracy of our theory even at moderate population sizes of cells. For populations much smaller than this the assumptions of the WKB method break down. The rate-limiting step is then the occurrence of a successful lineage of mutants and not the escape from metastable states. The expressions obtained from the WKB approach become more accurate as the population size increases.

This analysis allowed us to classify how changes to the fitness landscape, mutation rates, and population size affect the probability of tunneling and the time-to-fixation of cells harboring two mutations. In terms of the development of tumors, our analysis shows that the path to accumulating mutations is not simply limited by the mutation rates, but also by escape from metastable states. Populations can exist in a heterogeneous state for very long periods of time before fluctuations eventually drive the second mutation to fixation. The probability with which stochastic tunneling occurs is, in part, determined by the location of these metastable states. If they are located close to the all-type 1 state, then the probability of tunneling is low. This occurs when cells with one mutation have a higher fitness than those with two mutations (regions I and II). The probability of tunneling decreases as the fitness gap between these two types of cell increases or as mutation rates decrease. The mean fixation time increases exponentially as the fitness gap increases. Cells with one and two mutations are present in the tissue compartment for long periods of time; their numbers are maintained by the mutation–selection balance. Cell types are lost sequentially. Wild-type cells can be driven to extinction by selection (region I) or by demographic fluctuations (region II). The extinction of type 1 cells is driven exclusively by fluctuations. When type 2 cells have a higher fitness than type 1 cells, and when both are less fit than the wild type (region III), selection is always against cells of type 1. Mutation–selection balance maintains a low concentration of type 1 cells in the tissue, and hence the probability of tunneling is high. In this regime the mean fixation time is a decreasing function of the fitness of type 1 cells. As for all escape problems from metastable states, the fixation time scales exponentially with the size of the population. Fixation is noise driven, and as the population size is increased the noise strength decreases, and hence fixation takes longer.

Although our theory is aimed at large population sizes and exponentially growing fixation times, we have shown that it can also make accurate predictions on biologically relevant timescales. Assuming a cell generation lasts for 1 day, our theory can capture fixation times of around 3 years or more (>10^{3} generations). Related studies on the progression of cancer suggest a typical time scale on the order of 10 years to accumulate a sufficient number of mutations (Beerenwinkel *et al.* 2007; Bozic *et al.* 2010), which is well within the scope of our theory. However, the times predicted by our theory are extremely sensitive to parameter variation. This limits the parameter ranges for which biologically relevant time scales can be generated. Specifically selective (dis)advantages need to be small (). This is in agreement with selection coefficients in related studies (Bozic *et al.* 2010). Of course the length of a cellular generation can vary by an order of magnitude or so, depending on the specific cell type (Weinberg 2013).

Our results do, however, allow an extrapolation to situations when fixation times become very long, for instance, for very large populations and/or when selection is strongly against the invading mutants. In these scenarios, stochastic simulations can become too expensive computationally to provide meaningful measurements. Analytical methods based on backward master equations or backward Fokker–Planck equations suffer from computational limitations as well in such cases. Our mathematical work complements existing analytical approaches to the Moran model of cells acquiring two successive mutations. Previous work has provided an appropriate machinery with which to compute the time-to-fixation of the second mutation in situations without metastable states. The present article specifically addresses cases in which fixation is limited by the escape from long-lived states. This contribution closes a gap in the analytical characterization of fixation in this model and a more complete picture is now available. We have added a new method to the toolbox used to study stochastic tunneling. Our deterministic analysis provides a systematic procedure to determine which tool to use for a given set of parameters. This accomplishment removes the need for stochastic simulations altogether, or at the very least it limits the circumstances under which they are needed.

This work has clear limitations in that it focuses on the Moran model with only two successive mutations. We have not considered any processes beyond the second mutation; however, such cases can exist in physical systems. If the type 2 cells are not cancerous, one would be interested in, for example, calculating when a metastatic cell (three mutations) first arises. In general this does not require the fixation of type 2 cells and is related to the total number of type 2 cells over time. If metastable states are present, the cumulative number of type 2 cells prior to fixation is small, as described above. While we do not analyze this further, the typical number of type 2 cells at any time can in principle be computed from the quasi-stationary distribution, Equation 4.

Our systematic approach, along with the combined theoretical apparatus of previous work and the WKB method, is readily transferable to more complex models of cancer initiation and progression. One possible extension to this study is the generalization to more than two mutations. If a cell can accumulate *d* possible mutations, metastable states are found provided . A similar analysis can then be carried out. If the fitness landscape is arranged such that , the problem is analogous to Muller’s ratchet (Muller 1964), which describes the accumulation of successive maladaptive mutations. Metzger and Eule (2013) recently studied a special case of this problem using a WKB approach. Finally Gokhale *et al.* (2009) studied valley-crossing dynamics with *d* possible mutations. They have shown that allowing multiple paths to accumulate, the mutations reduce the fixation time. As such, allowing multiple paths in our model could reduce the fixation times we have measured. Work along both of these lines is in progress.

## Acknowledgments

P.A. acknowledges support from the Engineering and Physical Sciences Council (United Kingdom), EPSRC. F.M. acknowledges support from the Dana-Farber Cancer Institute Physical Sciences-Oncology Center (National Institutes of Health U54CA143798). T.G. acknowledges support from the EPSRC, grant reference EP/K037145/1.

## Appendix A

### Fixed Points and Stability

From the deterministic equations (3) it can be seen that the state is a fixed point; *i.e.*, at this point for . This is the absorbing state, so this result is rather obvious. Nontrivial fixed points exist away from the absorbing state in some parameter regions.

The stability of a fixed point is determined by the eigenvalues of the Jacobian of the deterministic equations (3). Due to the overall constraint , the system is effectively two dimensional. We can write the Jacobian in terms of two variables, and , as (A1)Along the 1–2 boundary of the concentration simplex, Equations 3 can be expressed in terms of a single variable, . A fixed point, , on this boundary satisfies the equations (A2)where . These equalities are satisfied by the value (along with ). The parameter range in which this fixed point exists is determined by the condition , which we can write as . The fixed point on the 1–2 edge therefore exists when type 1 cells have a fitness advantage over type 2 cells; the factor accounts for effects of mutation. Increasing this fitness advantage moves the fixed point toward , or equivalently away from the absorbing state at . For vanishing mutation rate , the fixed point approaches the state.

Evaluating the eigenvalues of the Jacobian in Equation A1 at this fixed point, we find that the point is stable if and that it is a saddle if . These two cases correspond to regions I and II in Figure 2.

A fixed point of Equations 3 with is found as (A3)provided the model parameters satisfy and . Further analysis of the Jacobian (A1) at this point shows that the fixed point is stable whenever it exists. This is the region of parameter space in which cells with one and two mutations, respectively, are both less fit than the wild type. This is the case in regions II and III in Figure 2. The fixed point moves closer to when the fitness advantage of the wild type cells is increased (*e.g.*, by lowering the fitness of type 1 cells, ). Decreasing the mutation rates also moves the fixed point closer to .

## Appendix B

### Solving the Quasi-stationary Master Equation

In terms of the variable , we can write the QSME (5) for as (B1)where and . Substituting the WKB ansatz, Equation 7, into Equation B1 and expanding in powers of we arrive at (B2)where we have ignored the term as this term is smaller than (*τ* scales as ). The leading-order terms of this equation are equivalent to the Hamilton–Jacobi equation, (B3)where is the so-called “position” variable, and is the so-called “momentum” variable. This equation is best solved using the method of characteristics; *i.e.*, we look for parametric solutions, . These trajectories fulfill Hamilton’s equations, (B4)They satisfy the principle of least action and correspond to the most likely path taken in the so-called phase-space, the space spanned by . The Hamilton–Jacobi equation has the trivial solution , which corresponds to the deterministic “relaxation” trajectory, for which the equation of motion is simply (B5)As we are interested in escape from a stable fixed point, we seek the nontrivial “activation” trajectory, for which in general. The relevant boundary condition is , where indicates the fixed point of the deterministic dynamics from which the trajectory emanates.

In one dimension, *i.e.*, in the case of a single fixed point on the 1–2 boundary (region I of Figure 2), the Hamilton–Jacobi equation (B3) can be written as (B6)where is the concentration of cells of type 1, , , and (reaction rates as described in Equation 1). This equation can be solved to obtain the activation trajectory , and hence . We can now substitute into the equation consisting of next-leading-order terms [] of Equation B2 to find . Following this procedure we find (B7)The QSD is now determined up to a normalization factor. The QSD is peaked about the fixed point located at ; see *Appendix A*. Hence we can expand the QSD (7) about this fixed point such that (B8)where we have used . Normalizing to unity then determines the normalization coefficient, (B9)The QSD determined above breaks down when , or equivalently when , *i.e.*, close to the absorbing state. In this region we consider a recursive solution of Equation 5 that does not rely on a specific form for the QSD; *i.e.*, we do not use the WKB ansatz (7). We expand Equation 5 about to obtain (B10)This is to be solved for (). Using , we can write this as (B11)where . This recursive system can be solved to arrive at (B12)where the second step follows from . Using Equation 6 and expanding the relevant transition rate, , about we can write . By matching the recursively obtained boundary-layer solution of Equation B12 with the WKB solution in Equation B8 at , we obtain an expression for the fixation time *τ*, as shown in Equation 8.

## Appendix C

### Numerical Solutions for Fixation Time

We now address the case in which there is an internal stable fixed point of the deterministic dynamics. The problem then retains 2 degrees of freedom. We follow the initial steps of *Appendix B* to arrive at the Hamilton–Jacobi Equation B3. Given that the original system is two dimensional, we now find four variables for the Hamilton–Jacobi problem, two position variables and (equivalent to and ), and two corresponding momenta and . These are defined by . As the “energy” is fixed () we have three effective degrees of freedom and no obvious solution to . We consider again Hamilton’s equations (B4). These equations describe the trajectory that minimizes the action, and hence by solving these we can then determine the fixation time. To determine the boundary conditions we need to find the fixed points of Equations (B4). We first note that there are three zero-momentum fixed points, which correspond to the fixed points of the deterministic equations (3). Following Gottesman and Meerson (2012), we label these as for the absorbing state [], for the 1–2 boundary fixed point, and for the stable interior fixed point defined by Equation A3. As we seek to determine the activation trajectory, we need to find fixed points of Equations B4 with nonzero momenta, but with positions corresponding to and (the possible end points of the trajectories). These so-called fluctuational fixed points are labeled as and .

The relevant trajectory is then found using an iterative method to solve the two-boundary problem. Consider the scenario in which the stable interior fixed point is the only fixed point of the deterministic system for and , other than the absorbing state, *i.e.* for parameters in region III of Figure 2. Here the activation trajectory that leads to fixation starts at and finishes at . To start the iteration, we fix the momenta for all times to the values at , and then numerically integrate the equations of motion (B4) for the position vector **q** forward in time, starting at and keeping the momenta constant. This integration is carried out for a sufficient range of time to reach the vicinity of the fixed point , but not too long to avoid numerical errors building up. In the next step the relations for the momenta in Equation B4 are then integrated backward in time using the trajectory found in the previous iteration. The momenta at the start of this backward integration are chosen as those corresponding to . This procedure is then iterated, with alternating forward and backward integration of Hamilton’s equation. At each step of the procedure the action of the path is found as (C1)The iteration of alternating forward and backward integration is then repeated until has reached convergence. The action can then be used in Equation 9 to determine the fixation time .

In region II of parameter space a similar procedure is applied. In this case the minimizing trajectory in -space which connects and .

## Footnotes

*Communicating editor: L. M. Wahl*

- Received August 25, 2014.
- Accepted January 19, 2015.

- Copyright © 2015 by the Genetics Society of America