Abstract
With rare exceptions, human tumors arise from single cells that have accumulated the necessary number and types of heritable alterations. Each such cell leads to dysregulated growth and eventually the formation of a tumor. Despite their monoclonal origin, at the time of diagnosis most tumors show a striking amount of intratumor heterogeneity in all measurable phenotypes; such heterogeneity has implications for diagnosis, treatment efficacy, and the identification of drug targets. An understanding of the extent and evolution of intratumor heterogeneity is therefore of direct clinical importance. In this article, we investigate the evolutionary dynamics of heterogeneity arising during exponential expansion of a tumor cell population, in which heritable alterations confer random fitness changes to cells. We obtain analytical estimates for the extent of heterogeneity and quantify the effects of system parameters on this tumor trait. Our work contributes to a mathematical understanding of intratumor heterogeneity and is also applicable to organisms like bacteria, agricultural pests, and other microbes.
HUMAN cancers frequently display substantial intratumor heterogeneity in genotype, gene expression, cellular morphology, metabolic activity, motility, and behaviors such as proliferation rate, antigen expression, drug response, and metastatic potential (Fidler and Hart 1982; Heppner 1984; Nicolson 1984; Campbell and Polyak 2007; Dick 2008). For example, a molecular and phenotypic analysis of breast cancer cells has revealed defined subpopulations with distinct gene expression and (epi)genetic profiles (Shipitsin et al. 2007). Heterogeneity and the existence of subpopulations within single tumors have also been demonstrated via flow cytometry in cervical cancers and lymph node metastases (Nguyen et al. 1993) as well as in leukemias (Wolman 1986). Virtually every major type of human cancer has been shown to contain distinct cell subpopulations with differing heritable alterations (Heppner 1984; Merlo et al. 2006; Campbell and Polyak 2007). Heterogeneity is also present in premalignant lesions; for instance, genetic clonal diversity has been observed in Barrett's esophagus, a condition associated with increased risk of developing esophageal adenocarcinoma (Maley et al. 2006; Lai et al. 2007).
Tumor heterogeneity has direct clinical implications for disease classification and prognosis as well as for treatment efficacy and the identification of drug targets (Merlo et al. 2006; Campbell and Polyak 2007). The degree of clonal diversity in Barrett's esophagus, for instance, is correlated with clinical progression to esophageal adenocarcinoma (Maley et al. 2006). In prostate carcinomas, tumor heterogeneity plays a key role in pretreatment underestimation of tumor aggressiveness and incorrect assessment of DNA ploidy status of tumors (Wolman 1986; Haggarth et al. 2005). Heterogeneity has long been implicated in the development of resistance to cancer therapies after an initial response (Geisler 2002; Merlo et al. 2006) as well as in the development of metastases (Fidler 1978). In addition, tumor heterogeneity hampers the precision of microarray-based analyses of gene expression patterns, which are widely used for the identification of genes associated with specific tumor types (O'Sullivan et al. 2005). These issues underscore the importance of obtaining a more detailed understanding of the origin and temporal evolution of intratumor heterogeneity.
To study the dynamics of intratumor heterogeneity, we construct and analyze a stochastic evolutionary model of an expanding population with random mutational fitness advances. Evolutionary models of populations with random mutational advances have been studied in the context of fixed-size Wright–Fisher processes for both finite and infinite populations (Gerrish and Lenski 1998; Park and Krug 2007; Park et al. 2010); Gerrish and Lenski (1998) studied the speed of evolution in a Wright–Fisher model with random mutational advances in the context of finite but large populations while Park et al. (2010) obtained accurate asymptotic approximations for the evolutionary dynamics of the population, following ideas presented in Park and Krug (2007). The latter work and references therein constitute a substantial exploration of the effects of random mutational advances in fixed-size populations. Our present work complements this research by exploring the effects of random mutational advances in the context of exponentially expanding populations and in particular the implications of these mutational advances on population heterogeneity. Models of exponentially expanding populations are appropriate for the study of situations arising during tumorigenesis, but are also applicable to other organisms undergoing binary replication such as bacteria, agricultural pests, and other microbes and pathogens. Bacterial populations, for instance, are diverging quickly in both genotype and phenotype, as studied by Lenski and colleagues. These investigators examined the dynamics of phenotypic evolution in populations of Escherichia coli that were propagated by daily serial transfer for 1500 days, yielding 10,000 generations of binary fission (Lenski et al. 1991; Lenski and Travisano 1994). The fitness of the bacteria improved on average by 50% relative to the ancestor, and other phenotypic properties, such as cell size, also underwent large changes. Similarly, single malaria isolates have been found to consist of heterogeneous populations of parasites that can have varying characteristics of drug response, from highly resistant to completely sensitive (Foley and Tilley 1997). These findings have implications for treatment strategies, as not all pathogen populations are sensitive to therapeutic interventions, and necessitate the study of diversity dynamics in growing populations of cells.
In this article, we consider an exponentially expanding population of tumor cells in which (epi)genetic alterations confer random fitness changes to cells. This model is used to investigate the extent of genetic diversity in tumor subpopulations as well as its evolution over time. The mathematical framework is based on the clonal evolution model of carcinogenesis, which postulates that tumors are monoclonal (i.e., originating from a single abnormal cell) and that over time the descendants of this ancestral cell acquire various combinations of mutations (Merlo et al. 2006; Campbell and Polyak 2007). According to this model, genetic drift and natural selection drive the progression and diversity of tumors. Our work complements studies of the effects of random mutational fitness distributions on the growth kinetics of tumors (Durrett et al. 2010) and contributes to the mathematical investigation of intratumor heterogeneity (Coldman and Goldie 1985, 1986; Michelson et al. 1989; Kansal et al. 2000; Komarova 2006; Haeno et al. 2007; Schweinsberg 2008; Bozic et al. 2010; Durrett and Moseley 2010).
MATERIALS AND METHODS
We consider a multitype branching process model of tumorigenesis in which (epi)genetic alterations confer an additive change to the birth rate of the cell. This additive change is drawn according to a probability distribution v, which is referred to as the mutational fitness distribution. Cells that have accumulated i ≥ 0 mutations are denoted as type-i cells. Initially, the population consists entirely of type-0 cells, which divide at rate a0, die at rate b0, and produce type-1 cells at rate u. The initial population, whose size is given by V0, is considered to be sufficiently large so that its growth can be approximated by , where λ0 = a0 − b0 and time t is measured in units of cell division. Type-1 cells divide at rate a0 + X, where X ≥ 0 is drawn according to the distribution v, and give rise to type-2 cells at rate u. All cell types die at rate b0 < a0. In general, a type-(k − 1) cell with birth rate ak−1 produces a new type-k cell at rate u, and the new type-k cell type divides at rate ak−1 +X, where is drawn according to v. Each type-k cell produced by a type-(k − 1) cell initiates a genetically distinct lineage of cells, and the set of all of its type-k descendants is referred to as its family. The total number of type-k cells in the population at time t is given by and the set of all type-k cells is called the kth wave or generation k. The total population size at time t is given by .
Note that in our model, mutations are not coupled to cell divisions but instead occur at a fixed rate per unit of time. This model can be modified to exclusively consider (epi)genetic alterations that occur during cell division by assuming that type-0 individuals divide at rate α0 and during each division, there is a chance μ that a mutation occurs, producing a type-1 individual. These two model versions are equivalent for wave-1 cells if a0 = α0(1 − μ) and u = α0μ. For later waves, the model must be altered so that the mutation rate is dependent upon the genetic constitution of the cells, since accumulation of alterations modifies the fitness of the cell and hence the rate at which further changes are accumulated. Analysis of this model would then require the mutation rate term to be inside the integral over the support of the fitness distribution; however, this modification does not alter the limiting results significantly. Our model and results can also be modified to account for generation-dependent mutation rates (Coldman and Goldie 1985; Park et al. 2010).
The mutational fitness distribution v determines the effects of each (epi)genetic change that is accumulated in the population of cells. We consider fitness distributions concentrated on [0, b] for some b > 0. We discuss two distinct classes of distributions: (i) v is discrete and assigns mass gi to a finite number of values ; (ii) v is continuous with a bounded density that is continuous and positive at b. Figure 1 shows a snapshot of the population decomposition in a sample simulation for case ii and illustrates the complex genotypic composition of the population of cells generated by our model.
Figure 1.—
A sample cross-section of tumor heterogeneity. (A) The composition of a tumor cell population at time units after tumor initiation. Each “wave” of cells, defined as the set of cells harboring the same number of (epi)genetic alterations, is represented by a different color. Individual clones of cells with identical genotype are shown as circles, positioned on the horizontal axis according to fitness (i.e., birth rate) and on the vertical axis according to clone size. Note that later waves tend to have larger fitness values. (B) The time at which individual clones of cells were created during tumor progression. The color scale depicts the time of emergence of each clone. In A and B, parameters are , , , and
The determination of the distribution v in the context of bacteria and viruses has been the subject of several experimental studies (Imhof and Schlotterer 2001; Sanjuan et al. 2004), which have generally produced results leading to the conclusion that v has an exponential distribution. However, more recently Rokyta and colleagues (Rokyta et al. 2008) presented studies of bacteriophages, in which the distribution of beneficial mutational effects appears to have a truncated right tail. One possible explanation for this result is that the experiments were done at an elevated temperature that might have led to a limited number of available beneficial mutations. A similar scenario might arise during tumorigenesis when only a limited number of (epi)genetic alterations allow a cell to progress to a more aggressive phenotype.
RESULTS
There are two sources of heterogeneity present in the population: variability in the number of mutations per cell (heterogeneity between generations) and genotypic variation between members of the same generation (heterogeneity within a generation). We investigate these two sources of heterogeneity and derive analytic results that quantify the relationship between model parameters—e.g., mutation rate and mutational fitness distribution—and the amount of genotypic variation present in the population over time.
Between-generation heterogeneity:
Asymptotic results for the size of generation
k were obtained in (
Durrett et al. 2010;
Durrett and Moseley 2010); see
Equations A1 and
A2 in the
appendix for restatements of the relevant results from these articles.
Equation A1 implies that in case i, for example, we have the approximation
when
t is large, where
is the maximum growth rate that can be attained by generation
k mutants,
and
is a positive random variable with known Laplace transform. In case ii there is an additional term in (2) of the form
. Dividing both sides of (2) by
and speeding up time by a factor of
L, we note that the log size of generation
k approaches a deterministic, linear limit as the mutation rate becomes very small. In particular, as
, we have
in probability, where
The limiting process depends on
λ0, the growth rate of type-0 cells, and
b, the maximum attainable fitness increase, but is otherwise independent of the particular choice of fitness distribution. An example of the limiting process is displayed in
Figure 2, a and b.
Figure 2.—
The process for the small mutation limit. The limiting process is shown for the case in which the mutation rate goes to zero. (A) The time, t, on the horizontal axis vs. the log number of cells of wave k, , as given by Equation 3. Both time and space are given in units of L = log(1/u). This plot shows the first 20 waves started at i.e., the time that type-1 cells begin to be born. (B) A closer look at the first 7 waves from a, showing the changes in the dominant type. (C) The birth times of the first 20 generations as a function of the generation number. (D) The dominant type in the population as a function of time. The index of the largest generation at time t is defined as . In A–D, parameters are and .
As a consequence of Equation 3, we obtain the following insight regarding the birth time of type-
k cells: if
is the first time a type-
k individual is born, then, as
,
in probability for all
k ≥ 0. From the definition of
, we have that
is decreasing so that the increments between the birth times for successive generations decrease as
k increases. This effect leads to an acceleration in the rate at which new mutations are accumulated (
Figure 2c). This acceleration occurs regardless of the choice of fitness distribution, assuming that i or ii holds. Since
is inversely proportional to
b, distributions that allow for larger fitness increases tend to exhibit shorter increments.
Bozic et al. (2010) observed a similar acceleration of waves in their model of mutation accumulation. On the basis of approximations by Beerenwinkel et al. (2007), they concluded that this acceleration was an artifact of the presence of both passenger and driver mutations and that it does not occur when only driver mutations conferring a fixed selective advantage are considered—i.e., when the fitness increments are deterministic. In contrast to these conclusions, we find that the acceleration of waves occurs regardless of the choice of mutational fitness distribution and is due to the difference in growth rates between successive generations: type-k cells arise when generation k − 1 reaches size and since the asymptotic growth rate of generation k is larger than the asymptotic growth rate of generation k − 1, generation k + 1 reaches size faster than generation k. We observed another example of this phenomenon earlier during our study of a related Moran model for tumor growth in which the total population of cells grows at a fixed exponential rate (Durrett and Mayberry 2010). In this model, the cause of acceleration was similarly related to growth rates—later generations take longer to achieve dominance in the expanding population of cells, and hence new types are born with a higher fitness advantage compared to the population bulk, allowing them to grow more rapidly.
As a second application of Equation 3, we derive an analytic expression for the time at which type-
k cells become dominant in the population. Let
for all
be the first time that type-
k cells become the most frequent cell type in the population. Then, as
, we have
in probability for all
The limit
is the solution to
,
i.e., the time when
first overtakes
.
Figure 2d demonstrates how the index
of the largest generation at time
t, defined as
changes over time. The transitions between periods of dominance are sharp only in the small mutation limit. At any given time, the population consists primarily of members of the current dominant generation;
i.e., (1/
L)log
Z(
Lt)→
zm(t)(
t) as
. Therefore, for small mutation rates, the amount of genetic heterogeneity present in the population is determined by the amount of heterogeneity present in the dominant generation.
To determine the accuracy of our results when the mutation rate is small, we compare our limiting approximation in Equation 3 with the results of numerical simulations, using a mutational fitness distribution corresponding to a point mass at
b [
i.e.,
]. Given that there are ∼3 billion base pairs in the human genome and the mutation rate per base pair is
(
Seshadri et al. 1987;
Oller et al. 1989;
Kunkel and Bebenek 2000), point mutations occur at a rate of 0.3–30 per cell division. However, since advantageous mutations constitute only a fraction of all possible mutations, the mutation rate per cell division for advantageous mutations is smaller than the overall mutation rate per cell division. In the following example, we use a mutation rate per cell per time unit of
and a cell division rate of
a = 0.2. In
Figure 3, we compare the average size of the
kth generation in simulations (
Figure 3a) with the approximation given by Equation 3 (
Figure 3b). Although the behavior is qualitatively similar to the small mutation limit, the approximation consistently underestimates the times at which new waves appear. To explain the source of this bias, we use the alternative approximation given by the right-hand side of
Equation 2, which can be rewritten as
where
. Using the expression for the Laplace transform of
and the numerical algorithm presented in
Ridout (2008), we sample 1000 variates from the distribution of
.
Table 1 shows the sample mean and standard deviation of log
, for
. The distribution of log
has a positive mean and is skewed to the right (
Figure 4), implying that the limit in Equation 3 in general underestimates the size of generation
k for positive mutation rates. The approximation obtained by replacing log
with the sample mean of log
is displayed in
Figure 3c. After an initial period in which the number of type-
k cells is small, the behavior of the process closely resembles the one shown in
Figure 3a. The variance of log
increases with
k and hence, we expect an increasing amount of variability in the simulations in the time when type-
k cells arise.
Figure 3d displays the right-hand side of
Equation 6, replacing log
with the value two standard deviations above its mean to illustrate an extreme scenario.
Figure 3.—
The size of the first four generations of cells. The log size of generations 1–4 is shown as a function of time t. Both time and space are plotted in units of . (A) The average values of the log generation sizes over sample simulations. (B) The limiting approximation from Equation 3 for the log size of the generations. (C) The approximation from Equation 6 using the mean of . (D) The approximation from Equation 6 using a value two standard deviations above the mean of to demonstrate an extreme scenario. Parameters are , , , , and
Figure 4.—
The distribution of log Vd,1. The relative frequency histogram of 1000 random samples from the distribution of is shown, as specified by Equation 1.
TABLE 1Means and standard deviations for log Vd,k for k = 1, 2, 3, as specified by Equation 1
Generation | Mean | Standard deviation |
1 | 4.7638 | 1.3738 |
2 | 10.6010 | 2.2434 |
3 | 17.0519 | 3.0282 |
Generation | Mean | Standard deviation |
1 | 4.7638 | 1.3738 |
2 | 10.6010 | 2.2434 |
3 | 17.0519 | 3.0282 |
TABLE 1Means and standard deviations for log Vd,k for k = 1, 2, 3, as specified by Equation 1
Generation | Mean | Standard deviation |
1 | 4.7638 | 1.3738 |
2 | 10.6010 | 2.2434 |
3 | 17.0519 | 3.0282 |
Generation | Mean | Standard deviation |
1 | 4.7638 | 1.3738 |
2 | 10.6010 | 2.2434 |
3 | 17.0519 | 3.0282 |
While the limit in (3) depends on the fitness distribution only through the maximum attainable fitness increase b, the distribution of also depends on the fitness distribution through the probability of attaining a fitness advance of b if v is discrete and the value of the probability density function at b if v is continuous (see Equation 4.9 in Durrett et al. 2010). As a consequence, our finite time approximation (6) takes into consideration the shape of the fitness distribution near b and the corrector term log accounts for variations in the likelihood of attaining the maximum possible fitness advance.
Within-generation heterogeneity:
We begin our investigation of within-generation heterogeneity by examining the extent of diversity present in the first generation of cells. We use two statistical measures to assess heterogeneity: (i) Simpson's index, which is given by the probability that two randomly chosen cells from the first generation stem from the same clone, and (ii) the fraction of individuals in the first generation that stem from the largest family of cells. To obtain these results, we derive an alternate formulation of the limit in Equation 1 that shows the limit is the sum of points in a nonhomogeneous Poisson process (see the appendix for more details). Each point in the limiting process represents the contribution of a different mutant lineage to so that it suffices to calculate i and ii for the points in the limiting process.
Simpson's index:
Let us introduce some terminology by defining
to be the
largest point in the limiting point process and by setting
. Then Simpson's index for the point process is defined by
We may also consider Simpson's index for a random walk
, where the
are independent random variables with a tail probability
, and
. Then the limit as
n goes to infinity of
is
(
Fuchs et al. 2001), where
denotes the ratio of the growth rate of type-0 cells to the maximal growth rate of type-1 cells. Furthermore, the expected value of
converges to the expected value of
R so we have
See the
appendix for details of this calculation.
Equation 7 shows that the average amount of heterogeneity present in the first generation depends only on
α.
Figure 5 displays the sample mean of Simpson's index of
Z as a function of time for different values of
α. Initially, the sample mean tends toward
the expected value of Simpson's index for the limiting point process. Although the sample mean is greater than the limiting value for larger values of
b, our theory guarantees that eventually, the values of the sample mean converge. However, it is impossible to simulate the process for such long times since the population size and number of cell types become too large.
Figure 5.—
The expected value of Simpson's index for the first wave of cells. The sample mean of Simpson's index (dots) over time t and the expected value of Simpson's index for the limiting point process (line) are shown, for two different values of . Parameters are , , and , where in a and in b.
Expressions for the density and higher moments of
R can be obtained in a similar manner by the comparison techniques we used in the proof of
Equation 7 (see
appendix and
Logan et al. 1973;
Shao 1997). In addition, near the origin, the density
g(
x) of
R has the form
as
. Figure 6 displays simulations of Simpson's index for the first wave mutants in the branching process
Z. Note the convergence of the empirical distribution of Simpson's index to the distribution for the limiting point process.
Figure 6.—
The empirical distribution of Simpson's index for the first wave of cells. Individual plots of Simpson's index are shown for the branching process at times , 90, 110, and 130 along with Simpson's index for the limiting point process . The histograms show the average over 1000 sample simulations. For the limiting point process, we approximate Simpson's index by examining the largest points in the process. Parameters are , , and .
Largest clones:
To further investigate heterogeneity properties of the point process, we examine the fraction of cells descended from the largest family of first-generation mutants defined as . This quantity reveals the degree of dominance of the largest clone in the first wave of mutants. For large n, values of Vn near one indicate that the population is largely dominated by a single clone, while values near zero indicate a highly heterogeneous population where no single clone contributes significantly to the total. Using a similar approach to that in the previous section, we again consider this calculation in the context of a random walk. Consider a sequence of independent, identically distributed random variables Yi with partial sums Wn. Define the maximum value of this sequence from 1 to n to be Y(1). Then classical results regarding one-dimensional random walks characterize the limiting characteristic function of Wn/Y(1) (Darling 1952). In the appendix, we demonstrate that these results can be applied to the study the largest clone contributions in the limiting point process.
We show that 1/
Vn converges in distribution to a nontrivial limit
W and obtain an explicit formula for the characteristic function of the limit: as
,
, where
W has characteristic function
ψ satisfying
ψ(0) = 1 and
with
Interestingly, the characteristic function of
W is nonintegrable since its density has a singularity at 1. This finding implies that there is a disproportionately large chance that a single clone dominates the population. Further details are shown in the
appendix.
Differentiating
ψ then leads to simple expressions for the mean and variance of the limit,
Figure 7a suggests that the rate of convergence is slow for
α close to 1. These formulas provide us with the first two moments of the diversity measure
W and reveal its dependence on
α.
Figure 7.—
The largest clones in the population of cells. (A) A comparison between Monte Carlo estimates for and the limit (B) A comparison of the Monte Carlo estimates for and the curve . The Monte Carlo estimates are averaged over 100 sample simulations.
Equation 9 implies that Vn converges to a nontrivial limit V = W−1 and Jensen's inequality applied to the strictly convex function 1/x implies that This result provides a lower bound on the expected value of the limit of Vn and indicates that for values of α close to zero, the population is eventually dominated by a single clone. Even though this result indicates only a lower bound, simulations suggest that deviations of the mean from 1−α are small, as illustrated in Figure 7b.
Extensions to generation k:
The results obtained in the previous two sections for the first generation of cells can easily be extended to later generations by noting that each mutation to a type-(
k − 1) individual starts a new family regardless of the mutated cell's family tree. Therefore, the amount of heterogeneity within generation
k depends on the relative growth rates of type-
k and type-(
k − 1) individuals in the same way that the amount of heterogeneity within generation 1 depends on the relative growth rates of type-1 and type-0 individuals. In particular, letting
Rk denote the limiting value of Simpson's index for generation
k, we have
where
is the relative growth rate of type-(
k −1) individuals compared to type-
k individuals (see also
Equation 7 above). Note that the relative growth rates
αk are increasing functions of
k. Therefore,
Equation 11 shows that the mean of Simpson's index is a decreasing function of the generation number.
Total population heterogeneity:
We conclude this section by demonstrating how heterogeneity measures for the population as a whole can be calculated using information about both intra- and interwave heterogeneity. First, define the total collection of all genotypes present at time
t as
N(
t). For a particular genotype,
x, the number of cells at time
t that have exactly this genotype is given by
Z(x)(
t). Two cells have the same genotype if they contain exactly the same collection of mutations. Then Simpson's index for the entire population is given by
To show how this expression depends on the contributions of different waves, define the total collection of genotypes present in the wave-
k population at time
t by
Nk(
t), and let
Zk(x) be the population of cells in wave
k that have genotype
x. By defining
K(
t) to be the number of waves present at time
t, we obtain the alternate expression
This decomposition expresses Simpson's index for the whole population in terms of Simpson's index for each wave and the contribution of each wave to the total population. Combining the result in (5), which gives the dominant wave as a function of time, with (11), which describes wave-
k heterogeneity, we obtain a description of how the extent of heterogeneity changes in time. However, more refined results are needed to describe the transitions between the dominance of successive waves.
DISCUSSION
In this article, we investigated the evolution of intratumor heterogeneity in a stochastic model of tumor cell expansion. Our model incorporates random mutational advances conferred by (epi)genetic alterations and our analysis focused on the extent of heterogeneity present in the tumor. We first considered heterogeneity between tumor subpopulations with varying numbers of alterations and obtained limiting results, as the mutation rate approaches zero, for the contribution of each wave of mutants to the total tumor cell population. We showed that in the limit, this intergeneration heterogeneity depends on the maximum attainable fitness advance conferred by (epi)genetic alterations, but not on the specific form of the mutational fitness distribution. Our analysis also led to analytical expressions for the arrival time of the first cell with k mutations and showed that the rate of accumulation of new genetic alterations accelerates over time due to the increasing growth rates of successive generations. We demonstrated with stochastic simulations that for small but positive mutation rates, our limiting approximations provide good predictions of the model behavior (see Figure 3). These simulations also suggest that as time increases, multiple waves of mutants coexist without a single, largely dominant wave. For large t, the mean growth rate of the kth wave is given by , showing that variation in fitness within a particular wave is a transient property. The extent to which this variation affects tumor dynamics at small times is the subject of ongoing work.
We also investigated the genotypic diversity present within the kth generation of mutants by considering two measures of diversity: Simpson's index, which is given by the probability that two randomly selected cells stem from the same family, and the fraction of individuals in generation k that stem from the largest family of individuals. We obtained limiting expressions for the mean of Simpson's index as well as the form of its density near the origin. Interestingly, the limiting mean of Simpson's index is given by the quantity 1−α, where α is the ratio between the maximum attainable fitness values of type-(k − 1) and type-k individuals. We then observed that, as time increases, the mass of the distribution of Simpson's index moves closer to 0, indicating higher levels of diversity in the tumor at later times (see Figure 6). This behavior was also observed via direct numerical simulation of the branching process—the distribution and mean of Simpson's index converged to the predicted limiting values.
Finally, we investigated the ratio between the total population size of the kth wave of mutants and the size of the largest family. We showed that this ratio can be approximated by a random variable with mean (1−α)−1. An explicit formula for the characteristic function of this random variable was also obtained (see Equation 9). Note that as α approaches 1, the mean of the ratio grows to infinity—i.e., the largest family of cells constitutes a vanishing proportion of the total population of wave-k cells as the maximum possible fitness advance goes to zero.
In the context of tumorigenesis, where a tumor originally starting from a single cell reaches cell numbers of ≥1012, the limit as t goes to infinity is indeed the appropriate regime of study. We have compared our results regarding Simpson's index with numerical simulations at finite times (see, e.g., Figures 5 and 6) and have found good qualitative agreement with the asymptotic limits. Estimates of overall mutation rates in evolving cancer cell populations range from 10−9 to 10−2. In Figure 3 we demonstrate good agreement between our results regarding interwave heterogeneity in the small mutation rate limit and numerical simulations when the mutation rate is ; thus, analysis in this limiting regime captures the dynamics for mutation rates < per cell division.
In conclusion, our analysis indicates that tumor diversity is strongly dependent upon the age of the tumor and the maximum attainable fitness advance of mutant cells. If only small fitness advances are possible, then the tumor population is expected to have a larger extent of diversity compared to situations in which fitness advances are large. The acceleration of waves observed in our studies of intergeneration heterogeneity provides evidence that an older tumor has a higher level of diversity than a young tumor. In addition, we have shown that the mean of Simpson's index for generation k is a decreasing function of the generation number (see Equation 11), indicating a larger extent of diversity in later generations and suggesting a further increase in the total extent of heterogeneity present in the tumor at later times.
Possible extensions of our model include spatial considerations and the effects of tissue organization on the generation of intratumor heterogeneity as well as the inclusion of other cell types, such as immune system cells and the microenvironment. Furthermore, alternative growth dynamics should be considered to test the extent of heterogeneity arising in populations that follow logistic, Gompertzian, or other growth models. We have neglected these aspects in the current version of the model to focus on the dynamics of tumor diversity in an exponentially growing population of cells. Our model provides a rational understanding of the extent and dynamics of intratumor heterogeneity and is useful for obtaining an accurate picture of its generation during tumorigenesis.
Acknowledgements
The authors thank Theresa Edmonds for exceptional research assistance. This work is supported by National Cancer Institute grant U54CA143798 (to J.F., K.L., and F.M.), National Science Foundation (NSF) grant DMS 0704996 (to R.D.), and NSF RTG grant DMS 0739164 (to J.M.).
LITERATURE CITED
Beerenwinkel
N
, Antal
T
, Dingli
D
, Traulsen
A
, Kinzler
K W
et al. ,
2007
Genetic progression and the waiting time to cancer
.
PLoS Comput. Biol.
3
:
e225
.
Bozic
I
, Antal
T
, Ohtsuki
H
, Carter
H
, Kim
D
et al. ,
2010
Accumulation of driver and passenger mutations during tumor progression
.
Proc. Natl. Acad. Sci. USA
107
:
18545
–
18550
.
Campbell
L L
, Polyak
K
,
2007
Breast tumor heterogeneity: Cancer stem cells or clonal evolution?
Cell Cycle
6
:
2332
–
2338
.
Coldman
A J
, Goldie
J H
,
1985
Role of mathematical modeling in protocol formulation in cancer chemotherapy
.
Cancer Treat. Rep.
69
:
1041
–
1048
.
Coldman
A J
, Goldie
J H
,
1986
A stochastic model for the origin and treatment of tumors containing drug-resistant cells
.
Bull. Math. Biol.
48
:
279
–
292
.
Darling
D A
,
1952
The role of the maximum term in the sum of independent random variables
.
Trans. Am. Math. Soc.
72
:
85
–
107
.
Dick
J E
,
2008
Stem cell concepts renew cancer research
.
Blood
112
:
4793
–
4807
.
Durrett
R
,
2005
Probability: Theory and Examples
.
Brooks Cole–Thomson Learning
,
Belmont, CA
.
Durrett
R
, Moseley
S
,
2010
Evolution of resistance and progression to disease during clonal expansion of cancer
.
Theor. Popul. Biol.
77
:
42
–
48
.
Durrett
R
, Foo
J
, Leder
K
, Mayberry
J
, Michor
F
,
2010
Evolutionary dynamics of tumor progression with random fitness values
.
Theor. Popul. Biol.
78
:
54
–
66
.
Durrett
R
, Mayberry
J
,
2011
Traveling waves of selective sweeps
.
Ann. Appl. Probab.
21
:
699
–
744
.
Durrett
R
, Moseley
S
,
2010
Evolution of resistance and progression to disease during clonal expansion of cancer
.
Theor. Popul. Biol.
77
:
42
–
48
.
Durrett
R
, Foo
J
, Leder
K
, Mayberry
J
, Michor
F
,
2010
Evolutionary dynamics of tumor progression with random fitness values
.
Theor. Popul. Biol.
78
:
54
–
66
.
Fidler
I J
,
1978
Tumor heterogeneity and the biology of cancer invasion and metastasis
.
Cancer Res.
38
:
2651
–
2660
.
Fidler
I J
, Hart
I R
,
1982
Biological diversity in metastatic neoplasms: origins and implications
.
Science
217
:
998
–
1003
.
Foley
M
, Tilley
L
,
1997
Quinoline antimalarials: mechanisms of action and resistance
.
Int. J. Parasitol.
27
:
231
–
240
.
Fuchs
A
, Joffe
A
, Teugels
J
,
2001
Expectation of the ratio of the sum of squares to the square of the sum: exact and asymptotic results
.
Theory Probab. Appl.
46
:
243
–
255
.
Geisler
J P
, Rose
S L
, Geisler
H E
, Miller
G A
, Wiemann
M C
,
2002
Drug resistance and tumor heterogeneity
.
CME J. Gynecol. Oncol.
7
:
25
–
28
.
Gerrish
P J
, Lenski
R E
,
1998
The fate of competing beneficial mutations in an asexual population
.
Genetica
102–103
:
127
–
144
.
Haeno
H
, Iwasa
Y
, Michor
F
,
2007
The evolution of two mutations during clonal expansion
.
Genetics
177
:
2209
–
2221
.
Haggarth
L
, Auer
G
, Busch
C
, Norberg
M
, Haggman
M
et al. ,
2005
The significance of tumor heterogeneity for prediction of DNA ploidy of prostate cancer
.
Scand. J. Urol. Nephrol.
39
:
387
–
392
.
Heppner
G H
,
1984
Tumor heterogeneity
.
Cancer Res.
44
:
2259
–
2265
.
Imhof
M
, Schlotterer
C
,
2001
Fitness effects of advantageous mutations in evolving Escherichia coli populations
.
Proc. Natl. Acad. Sci. USA
98
:
1113
–
1117
.
Kansal
A R
, Torquato
S
, Chiocca
E A
, Deisboeck
T S
,
2000
Emergence of a subpopulation in a computational model of tumor growth
.
J. Theor. Biol.
207
:
431
–
441
.
Komarova
N
,
2006
Stochastic modeling of drug resistance in cancer
.
J. Theor. Biol.
239
:
351
–
366
.
Kunkel
T A
, Bebenek
K
,
2000
DNA replication fidelity
.
Annu. Rev. Biochem.
69
:
497
–
529
.
Lai
L A
, Paulson
T G
, Li
X
, Sanchez
C A
, Maley
C
et al. ,
2007
Increasing genomic instability during premalignant neoplastic progression revealed through high resolution array-CGH
.
Genes Chromosomes Cancer
46
:
532
–
542
.
Lenski
R E
, Travisano
M
,
1994
Dynamics of adaptation and diversification: a 10,000-generation experiment with bacterial populations
.
Proc. Natl. Acad. Sci. USA
91
:
6808
–
6814
.
Lenski
R E
, Rose
M R
, Simpson
S C
, Tadler
S C
,
1991
Long-term experimental evolution in Escherichia coli. I. Adaptation and divergence during 2,000 generations
.
Am. Nat.
138
:
1315
–
1341
.
Logan
B F
, Mallows
C L
, Rice
S O
, Shepp
L A
,
1973
Limit distributions of self-normalized sums
.
Ann. Probab.
1
:
788
–
809
.
Loya
P
,
2005
Dirichlet and fresnel integrals via integrated integration
.
Math. Mag.
78
(
1
):
63
–
67
.
Maley
C C
, Galipeau
P C
, Finley
J C
, Wongsurawat
V J
, Li
X
et al. ,
2006
Genetic clonal diversity predicts progression to esophageal adenocarcinoma
.
Nat. Genet.
38
:
468
–
473
.
Merlo
L M
, Pepper
J W
, Reid
B J
, Maley
C C
,
2006
Cancer as an evolutionary and ecological process
.
Nat. Rev. Cancer
6
:
924
–
935
.
Michelson
S
, Ito
K
, Tran
H T
, Leith
J T
,
1989
Stochastic models for subpopulation emergence in heterogeneous tumors
.
Bull. Math. Biol.
51
:
731
–
747
.
Nguyen
H N
, Sevin
B U
, Averette
H E
, Ramos
R
, Ganjei
P
et al. ,
1993
Evidence of tumor heterogeneity in cervical cancers and lymph node metastases as determined by flow cytometry
.
Cancer
71
:
2543
–
2550
.
Nicolson
G L
,
1984
Generation of phenotypic diversity and progression in metastatic tumor cells
.
Cancer Metastasis Rev.
3
:
25
–
42
.
Oller
A R
, Rastogi
P
, Morgenthaler
S
, Thilly
W G
,
1989
A statistical model to estimate variance in long term-low dose mutation assays: testing of the model in a human lymphoblastoid mutation assay
.
Mutat. Res.
216
:
149
–
161
.
O'Sullivan
M
, Budhraja
V
, Sadovsky
Y
, Pfeifer
J D
,
2005
Tumor heterogeneity affects the precision of microarray analysis
.
Diagn. Mol. Pathol.
14
:
65
–
71
.
Park
S C
, Krug
J
,
2007
Clonal interference in large populations
.
Proc. Natl. Acad. Sci. USA
104
:
18135
–
18140
.
Park
S C
, Simon
A
, Krug
J
,
2010
The speed of evolution in large asexual populations
.
J. Stat. Phys.
138
:
381
–
410
.
Resnick
S
,
1987
Extreme Values, Regular Variation, and Point Processes
.
Springer-Verlag
,
Berlin/Heidelberg, Germany/New York
Ridout
M S
,
2008
Generating random numbers from a distribution specified by its laplace transform
.
Stat. Comput.
19
:
439
–
450
.
Rokyta
D R
, Beisel
C J
, Joyce
P
, Ferris
M T
, Burch
C L
et al. ,
2008
Beneficial fitness effects are not exponential for two viruses
.
J. Mol. Evol.
67
:
368
–
376
.
Sanjuan
R
, Moya
A
, Elena
S F
,
2004
The distribution of fitness effects caused by single-nucleotide substitutions in an RNA virus
.
Proc. Natl. Acad. Sci. USA
101
:
8396
–
8401
.
Schweinsberg
J
,
2008
The waiting time for m mutations. Electron
.
J. Probab.
13
:
1442
–
1478
.
Seshadri
R
, Kutlaca
R J
, Trainor
K
, Matthews
C
, Morley
A A
,
1987
Mutation rate of normal and malignant human lymphocytes
.
Cancer Res.
47
:
407
–
409
.
Shao
Q M
,
1997
Self-normalized large deviations
.
Ann. Probab.
25
:
285
–
328
.
Shipitsin
M
, Campbell
L L
, Argani
P
, Weremowicz
S
, Bloushtain-Qimron
N
et al. ,
2007
Molecular definition of breast tumor heterogeneity
.
Cancer Cell
11
:
259
–
273
.
Wolman
S R
,
1986
Cytogenetic heterogeneity: its role in tumor evolution
.
Cancer Genet. Cytogenet.
19
:
129
–
140
.
APPENDIX
Convergence in distribution of Zk(t):
Define
λk =
λ0 +
kb to be the maximum growth rate that can be attained by a generation
k mutant, and let
If
ν is discrete and assigns mass
gi to a finite number of values
b1 <
b2 < … <
bN =
b, then
where “⇒” denotes convergences in distribution, and
Vd,k has Laplace transform
for all
θ ≥ 0.
If instead
ν is a continuous distribution on [0,
b] with a bounded density
g that is continuous and positive at
b, then
where
Vc,k has Laplace transform
for all
θ ≥ 0. Here
dk(
λ0,
b) and
ck(
λ0,
b) are constants that depend on the model parameters (
Durrett et al. 2010), and “d” and “c” in the constants and subscripts stand for discrete and continuous. See
Durrett and Moseley (2010) for a proof of
Equation A1 and
Durrett et al. (2010) for a proof of (A2). To show the dependence on the density
g, we write the value of
c1(
λ0,
b) as
The convergence in (A2) was also numerically investigated in Figures 1 and 2 of
Durrett et al. (2010).
Point process limit:
In this section, we discuss the point process representation for the limit in Equations A1 and A2 in the case where k = 1. The limit is the sum of points in a nonhomogeneous Poisson process. Before stating this result, we introduce some terminology. Here and in what follows, we use to denote the number of points in the set A. We say that Λ is a Poisson process on (0, ∞) with mean measure μ if Λ is a random set of points in (0, ∞) with the following properties:
For any A⊂(0, ∞), N(A) = is a Poisson random variable with mean μ(A).
For any k ≥ 1, if A1, … , Ak are disjoint subsets of (0, ∞), then N(Ai), 1 ≤ i ≤ k are independent.
We also let α = λ0/λ1 ∈ (0, 1) denote the ratio of the growth rate of type-0 cells to the maximal growth rate of type-1 cells and note that 1 + p1 = 1/α.
Theorem 1.
Let Λ
be a Poisson process on (0, ∞)
with mean measureand let S denote the sum of the points in Λ
. Then positive constants Ad,
Ac = Ad(
λ0,
b),
Ac(
λ0,
b)
exist that depend on the indicated parameters so that in case i as t →∞
and in case ii as t →∞
For more details, see Durrett et al. (2010), Theorem 3, and Durrett and Moseley (2010), Corollary to Theorem 3. Note that the mean measure for Λ has tail m(x, ∞) = x−α.
Let
Xn denote the
nth largest point in Λ, and let
denote the sum of the
n largest points. To determine the dependence of
Xn on
n we first note that if we define Λ′ =
f (Λ) where
f (
x) =
x−α, then Λ′ is a Poisson process and after making the change of variables
y =
x−α, we can see that the mean measure is
In other words, Λ′ is a homogeneous Poisson process with constant intensity and hence the spacings between points are independent exponentials with mean 1. If we let
Tn denote the time of the
nth arrival in Λ′, then the law of large numbers implies that
Tn ∼
n as
n → ∞. Since
, we obtain
Xn ∼
n−1/α as
n → ∞. In addition, we have the following Lemma:
Lemma 1.
Furthermore,
if we define ,
then Proof. Since Tn has a Gamma(n, 1) distribution, we have Stirling's approximation implies that Γ(n − 1/α)/ Γ(n) ∼ n−1/α and the second conclusion follows. ■
Simpson's index:
To prove
Equation 7 in the text, we use a result in
Fuchs et al. (2001) that shows that
Here
where
Yi are iid random variables in the domain of attraction of a stable law with index
α and
Sn =
Y1 + … +
Yn. To explain the connection between the two results, note that if we have
P(
Yi >
x) =
x−α, for
x ≥ 1 and letting
Yn,i =
Yi/
n1/α, then
This implies that if we let Δ
n = {
Yn,i :
i ≤
n} be the point process associated with the
Yn,i and define the measures
and
then we have
so that we should expect
ER to agree with lim
ERn.
To make this argument rigorous, let
denote the truncated value of Simpson's index for Λ
n and
denote the truncated value for Λ. Then for any
ε > 0, we have
We complete the proof by deriving appropriate bounds for each of the three terms on the right-hand side of (A4). For the first term, we have the following:
Lemma 2.
where hε → 0
as ε → 0.
Proof. Let
ε > 0 and write
and
for
k = 1, 2. Since
for
k = 1, 2, we have the bound
After noting that
and
we have for any
δ > 0,
where we have used (A5) in the last line. To control the third term on the right, let
φ denote the Laplace transform of
Y1. Then
as
t → 0 since 1 −
e−x ∼
x as
x → 0 implies that
We can conclude that
as
n → ∞. In particular,
An,1 ⇒
A1, where
A1 has the above Laplace transform. Since
as
t → ∞, we have
P(
A1 = 0) = 0 so that taking
δ =
ε(1−α)/2 in (A6) yields the result. ■
To bound the second term on the right-hand side of (A4), we need some notation. Let
Mp denote the class of all point measures on (0, ∞). In a slight abuse of notation, we write
v ∈
ν when
ν ∈
Mp and
v ∈ supp(
ν). We equip
Mp with the topology of vague convergence (see, for example, Section 3.4 in
Resnick 1987) and take as our
σ-algebra the one generated by open sets in this topology. Associated with any random set of points, we can associate a measure
ξ that is a random variable with values in
Mp. We write Λ
n ⇒ Λ to mean that the associated random measures
ξn ⇒
ξ.
Lemma 3. Λ
n ⇒ Λ
and if we define the maps Fk,ε:
Mp → [0, ∞)
byfor k = 1, 2,
thenProof.
Since
for all Borel sets
A, the first claim follows from Proposition 3.21 in
Resnick (1987). The second claim follows from the continuous mapping theorem (see, for example,
Resnick 1987, p. 152) the fact that
Fk,ε is continuous away from measures
ν with
ε ∉
ν and the fact that the random measure associated with Λ has no point masses with probability 1. ■
As a consequence of this lemma, the fact that Rn(ε) ≤ 1, and the bounded convergence theorem, we have the following:
Corollary 1.
as n → ∞
for any ε > 0.
It thus remains to establish the following:
Proof. We can establish this result using the same results as in the proof of Lemma 2, in particular if we define
and
for
k = 1, 2. Then, following the display in
Equation A6, we have for any
δ > 0
It is obvious that
P(
A1 = 0) = 0 and
for
ε < 1, so it remains only to establish that
as
ε → 0. This result follows immediately from Lemma 1. Therefore, taking
completes the proof. ■
We can now complete the proof of Equation 7 by letting n → ∞ and then ε → 0 in (A4) and applying Lemmas 2 and 4 and Corollary 1.
Largest clones:
Equation 9 is a consequence of the following theorem:
Theorem 2.
As n → ∞,
where W has characteristic function ψ satisfying ψ (0) = 1
andfor all t ≠ 0
withThe form of the characteristic function is the same as the characteristic function for limn→∞Tn/Y(1), where the Yi are iid random variables with power law tails, Y(1) = maxi≤nYi, and (see, for example, Darling 1952). Again, this agreement is a consequence of the previously discussed connection between Δn and the limiting Poisson point process.
To prove Theorem 2, we need the following notation. For a real number
t, we define the function
For a complex number
z we denote the real part of
z by Re[
z] and its imaginary part by Im[
z].
Proof of Theorem 2. Theorem 5.1 in
Darling (1952) implies that we have
as
n → ∞ whereas in the
Simpson's index section,
Y(1) = max
i≤nYi and
To conclude that
Tn/
Y(1) ⇒
V, we need to show that
y is continuous at 0. To establish this fact, we make the change of variables
v =
tu to conclude that
Since 1 − exp(
iv) ∼ −
iv as
v → 0, the integral on the right-hand side of (A7) is finite and hence
as
t → 0. Since
Tn/
Y(1) ⇒
V, the fact that
Sn/
X1 ⇒
V follows from the arguments in the previous section. ■
It is interesting to note that the characteristic function in Theorem 2 is not integrable. The problem is that the density of
blows up near 1. As an explanation for this, we note that with probability
there is a point in the process bigger than x and no points in [1, x). When this happens,
and so
If we had
Fn(
y) ? (
y − 1)
α, then the density would blow up like (
y−1)
α−1 as
y → 1. We confirm that this gives the right asymptotic by providing an explicit formula for the density of W.
Corollary 2.
W has a density on (1, ∞)
given by Note that integral expression above does not converge absolutely so part of the proof consists of showing that the limit exists. If we apply the change of variable
s =
t(
y − 1) in the definition of
f, we see that
thus confirming the intuition that the density blows up like (
y − 1)
α−1 as
y approaches 1.
Proof of Corollary 2. We first establish that there are no point masses in the distribution of
V. By the inversion formula we have for any
a ∈ ℝ,
If we focus on the positive axis and use the change of variable
s =
t/
T,
From display (A7) it follows that for every
s ∈ (0, 1) we have
eisT(1−a)/
fα(
sT) → 0 as
T→∞. Note that
which implies |
fα(
t)| ≥ 1 for all
t. Therefore |
eisT(1−a)/
fα(
t)| ≤ 1 for all
t and it follows via the dominated convergence theorem that
A similar result holds for the integral on the negative axis and we conclude that
We can therefore conclude for
x > 1 and
h > 0 via the inversion formula (see
Durrett 2005, Equation 3.2) and Fubini's theorem that
Therefore, to establish the result we need to show that
This follows if we show that
is a convergent integral and that a bounded function
h exists defined on (
x,
x +
h) such that
We first use integration by parts to see
Recalling that |
fα(
T)| → ∞ as
T → ±∞, it follows that if we establish that
is integrable on (−∞, ∞), then the convergence of the integral and the existence of a bounded dominating function will be established. Since
fα is bounded away from 0, it suffices to check that the function decays fast enough. Recalling the definition of
fα,
which follows by passing the derivative inside the integral in the definition of
fα. We can establish that
by observing
which can be found in many places,
e.g.,
Loya (2005). Thus,
We can similarly establish that for
t sufficiently large
for a positive finite constant
C1. Thus for
t sufficiently large
establishing the result. ■
We conclude this section with the proof of
Equation 10. Using the Taylor series expansion of exp(
iu) ∼ 0 in (A7) above implies that
and therefore
so that in particular,
for all
k ≥ 1. Let
S(
t) = log
ψ(
t) =
it − log
fα(
t). Then dropping the
α subscript on
fα, we have
which yields the desired result for the mean:
Now
so
completing the proof. ■
© Genetics 2011