Abstract

With rare exceptions, human tumors arise from single cells that have accumulated the necessary number and types of heritable alterations. Each such cell leads to dysregulated growth and eventually the formation of a tumor. Despite their monoclonal origin, at the time of diagnosis most tumors show a striking amount of intratumor heterogeneity in all measurable phenotypes; such heterogeneity has implications for diagnosis, treatment efficacy, and the identification of drug targets. An understanding of the extent and evolution of intratumor heterogeneity is therefore of direct clinical importance. In this article, we investigate the evolutionary dynamics of heterogeneity arising during exponential expansion of a tumor cell population, in which heritable alterations confer random fitness changes to cells. We obtain analytical estimates for the extent of heterogeneity and quantify the effects of system parameters on this tumor trait. Our work contributes to a mathematical understanding of intratumor heterogeneity and is also applicable to organisms like bacteria, agricultural pests, and other microbes.

HUMAN cancers frequently display substantial intratumor heterogeneity in genotype, gene expression, cellular morphology, metabolic activity, motility, and behaviors such as proliferation rate, antigen expression, drug response, and metastatic potential (Fidler and Hart 1982; Heppner 1984; Nicolson 1984; Campbell and Polyak 2007; Dick 2008). For example, a molecular and phenotypic analysis of breast cancer cells has revealed defined subpopulations with distinct gene expression and (epi)genetic profiles (Shipitsin  et al. 2007). Heterogeneity and the existence of subpopulations within single tumors have also been demonstrated via flow cytometry in cervical cancers and lymph node metastases (Nguyen  et al. 1993) as well as in leukemias (Wolman 1986). Virtually every major type of human cancer has been shown to contain distinct cell subpopulations with differing heritable alterations (Heppner 1984; Merlo  et al. 2006; Campbell and Polyak 2007). Heterogeneity is also present in premalignant lesions; for instance, genetic clonal diversity has been observed in Barrett's esophagus, a condition associated with increased risk of developing esophageal adenocarcinoma (Maley  et al. 2006; Lai  et al. 2007).

Tumor heterogeneity has direct clinical implications for disease classification and prognosis as well as for treatment efficacy and the identification of drug targets (Merlo  et al. 2006; Campbell and Polyak 2007). The degree of clonal diversity in Barrett's esophagus, for instance, is correlated with clinical progression to esophageal adenocarcinoma (Maley  et al. 2006). In prostate carcinomas, tumor heterogeneity plays a key role in pretreatment underestimation of tumor aggressiveness and incorrect assessment of DNA ploidy status of tumors (Wolman 1986; Haggarth  et al. 2005). Heterogeneity has long been implicated in the development of resistance to cancer therapies after an initial response (Geisler 2002; Merlo  et al. 2006) as well as in the development of metastases (Fidler 1978). In addition, tumor heterogeneity hampers the precision of microarray-based analyses of gene expression patterns, which are widely used for the identification of genes associated with specific tumor types (O'Sullivan  et al. 2005). These issues underscore the importance of obtaining a more detailed understanding of the origin and temporal evolution of intratumor heterogeneity.

To study the dynamics of intratumor heterogeneity, we construct and analyze a stochastic evolutionary model of an expanding population with random mutational fitness advances. Evolutionary models of populations with random mutational advances have been studied in the context of fixed-size Wright–Fisher processes for both finite and infinite populations (Gerrish and Lenski 1998; Park and Krug 2007; Park  et al. 2010); Gerrish and Lenski (1998) studied the speed of evolution in a Wright–Fisher model with random mutational advances in the context of finite but large populations while Park  et al. (2010) obtained accurate asymptotic approximations for the evolutionary dynamics of the population, following ideas presented in Park and Krug (2007). The latter work and references therein constitute a substantial exploration of the effects of random mutational advances in fixed-size populations. Our present work complements this research by exploring the effects of random mutational advances in the context of exponentially expanding populations and in particular the implications of these mutational advances on population heterogeneity. Models of exponentially expanding populations are appropriate for the study of situations arising during tumorigenesis, but are also applicable to other organisms undergoing binary replication such as bacteria, agricultural pests, and other microbes and pathogens. Bacterial populations, for instance, are diverging quickly in both genotype and phenotype, as studied by Lenski and colleagues. These investigators examined the dynamics of phenotypic evolution in populations of Escherichia coli that were propagated by daily serial transfer for 1500 days, yielding 10,000 generations of binary fission (Lenski  et al. 1991; Lenski and Travisano 1994). The fitness of the bacteria improved on average by 50% relative to the ancestor, and other phenotypic properties, such as cell size, also underwent large changes. Similarly, single malaria isolates have been found to consist of heterogeneous populations of parasites that can have varying characteristics of drug response, from highly resistant to completely sensitive (Foley and Tilley 1997). These findings have implications for treatment strategies, as not all pathogen populations are sensitive to therapeutic interventions, and necessitate the study of diversity dynamics in growing populations of cells.

In this article, we consider an exponentially expanding population of tumor cells in which (epi)genetic alterations confer random fitness changes to cells. This model is used to investigate the extent of genetic diversity in tumor subpopulations as well as its evolution over time. The mathematical framework is based on the clonal evolution model of carcinogenesis, which postulates that tumors are monoclonal (i.e., originating from a single abnormal cell) and that over time the descendants of this ancestral cell acquire various combinations of mutations (Merlo  et al. 2006; Campbell and Polyak 2007). According to this model, genetic drift and natural selection drive the progression and diversity of tumors. Our work complements studies of the effects of random mutational fitness distributions on the growth kinetics of tumors (Durrett  et al. 2010) and contributes to the mathematical investigation of intratumor heterogeneity (Coldman and Goldie 1985, 1986; Michelson  et al. 1989; Kansal  et al. 2000; Komarova 2006; Haeno  et al. 2007; Schweinsberg 2008; Bozic  et al. 2010; Durrett and Moseley 2010).

MATERIALS AND METHODS

We consider a multitype branching process model of tumorigenesis in which (epi)genetic alterations confer an additive change to the birth rate of the cell. This additive change is drawn according to a probability distribution v, which is referred to as the mutational fitness distribution. Cells that have accumulated i ≥ 0 mutations are denoted as type-i cells. Initially, the population consists entirely of type-0 cells, which divide at rate a0, die at rate b0, and produce type-1 cells at rate u. The initial population, whose size is given by V0, is considered to be sufficiently large so that its growth can be approximated by Z0(t)=V0exp[λ0t], where λ0  = a0  − b0 and time t is measured in units of cell division. Type-1 cells divide at rate a0 + X, where X ≥ 0 is drawn according to the distribution v, and give rise to type-2 cells at rate u. All cell types die at rate b0 < a0. In general, a type-(k − 1) cell with birth rate ak−1 produces a new type-k cell at rate u, and the new type-k cell type divides at rate ak−1 +X, where X0 is drawn according to v. Each type-k cell produced by a type-(k − 1) cell initiates a genetically distinct lineage of cells, and the set of all of its type-k descendants is referred to as its family. The total number of type-k cells in the population at time t is given by Zk(t) and the set of all type-k cells is called the kth wave or generation k. The total population size at time t is given by Z(t).

Note that in our model, mutations are not coupled to cell divisions but instead occur at a fixed rate per unit of time. This model can be modified to exclusively consider (epi)genetic alterations that occur during cell division by assuming that type-0 individuals divide at rate α0 and during each division, there is a chance μ that a mutation occurs, producing a type-1 individual. These two model versions are equivalent for wave-1 cells if a0 = α0(1 − μ) and u = α0μ. For later waves, the model must be altered so that the mutation rate is dependent upon the genetic constitution of the cells, since accumulation of alterations modifies the fitness of the cell and hence the rate at which further changes are accumulated. Analysis of this model would then require the mutation rate term to be inside the integral over the support of the fitness distribution; however, this modification does not alter the limiting results significantly. Our model and results can also be modified to account for generation-dependent mutation rates (Coldman and Goldie 1985; Park  et al. 2010).

The mutational fitness distribution v determines the effects of each (epi)genetic change that is accumulated in the population of cells. We consider fitness distributions concentrated on [0, b] for some b > 0. We discuss two distinct classes of distributions: (i) v is discrete and assigns mass gi to a finite number of values b1<b2<...<bN=b; (ii) v is continuous with a bounded density g(x) that is continuous and positive at b. Figure 1 shows a snapshot of the population decomposition in a sample simulation for case ii and illustrates the complex genotypic composition of the population of cells generated by our model.

Figure 1.—

A sample cross-section of tumor heterogeneity. (A) The composition of a tumor cell population at t=150 time units after tumor initiation. Each “wave” of cells, defined as the set of cells harboring the same number of (epi)genetic alterations, is represented by a different color. Individual clones of cells with identical genotype are shown as circles, positioned on the horizontal axis according to fitness (i.e., birth rate) and on the vertical axis according to clone size. Note that later waves tend to have larger fitness values. (B) The time at which individual clones of cells were created during tumor progression. The color scale depicts the time of emergence of each clone. In A and B, parameters are a0=0.2, b0=0.1, vU([0,0.05]), and u=0.001.

The determination of the distribution v in the context of bacteria and viruses has been the subject of several experimental studies (Imhof and Schlotterer 2001; Sanjuan  et al. 2004), which have generally produced results leading to the conclusion that v has an exponential distribution. However, more recently Rokyta and colleagues (Rokyta  et al. 2008) presented studies of bacteriophages, in which the distribution of beneficial mutational effects appears to have a truncated right tail. One possible explanation for this result is that the experiments were done at an elevated temperature that might have led to a limited number of available beneficial mutations. A similar scenario might arise during tumorigenesis when only a limited number of (epi)genetic alterations allow a cell to progress to a more aggressive phenotype.

RESULTS

There are two sources of heterogeneity present in the population: variability in the number of mutations per cell (heterogeneity between generations) and genotypic variation between members of the same generation (heterogeneity within a generation). We investigate these two sources of heterogeneity and derive analytic results that quantify the relationship between model parameters—e.g., mutation rate and mutational fitness distribution—and the amount of genotypic variation present in the population over time.

Between-generation heterogeneity:

Asymptotic results for the size of generation k were obtained in (Durrett  et al. 2010; Durrett and Moseley 2010); see Equations A1 and A2 in the appendix for restatements of the relevant results from these articles. Equation A1 implies that in case i, for example, we have the approximation
logZk(t)λkt(k+pk)log(1/u)+logVd,k,
(2)
when t is large, where λk=λ0+kb is the maximum growth rate that can be attained by generation k mutants,
pk=k+j=0k1λkλj,
and Vd,k is a positive random variable with known Laplace transform. In case ii there is an additional term in (2) of the form (k+pk)logt. Dividing both sides of (2) by L=log(1/u) and speeding up time by a factor of L, we note that the log size of generation k approaches a deterministic, linear limit as the mutation rate becomes very small. In particular, as u0, we have
(1/L)log+Zk(Lt)zk(t)=[λkt(k+pk)]+=λk(tβk)+
(3a)
in probability, where
βk=k+pkλk=j=0k11λj.
(3b)
The limiting process depends on λ0, the growth rate of type-0 cells, and b, the maximum attainable fitness increase, but is otherwise independent of the particular choice of fitness distribution. An example of the limiting process is displayed in Figure 2, a and b.
Figure 2.—

The process for the small mutation limit. The limiting process is shown for the case in which the mutation rate goes to zero. (A) The time, t, on the horizontal axis vs. the log number of cells of wave k, zk(t), as given by Equation 3. Both time and space are given in units of L = log(1/u). This plot shows the first 20 waves started at t=10=1/λ0,  i.e., the time that type-1 cells begin to be born. (B) A closer look at the first 7 waves from a, showing the changes in the dominant type. (C) The birth times of the first 20 generations as a function of the generation number. (D) The dominant type in the population as a function of time. The index m(t) of the largest generation at time t is defined as zm(t)(t)=max{zk(t):k0}. In A–D, parameters are λ0=0.1 and b=0.05.

As a consequence of Equation 3, we obtain the following insight regarding the birth time of type-k cells: if Tk=inf{t0:Zk(t)>0} is the first time a type-k individual is born, then, as u0,
TkLβk
(4)
in probability for all k ≥ 0. From the definition of βk, we have that βkβk1=1/λk is decreasing so that the increments between the birth times for successive generations decrease as k increases. This effect leads to an acceleration in the rate at which new mutations are accumulated (Figure 2c). This acceleration occurs regardless of the choice of fitness distribution, assuming that i or ii holds. Since 1/λk is inversely proportional to b, distributions that allow for larger fitness increases tend to exhibit shorter increments.

Bozic  et al. (2010) observed a similar acceleration of waves in their model of mutation accumulation. On the basis of approximations by Beerenwinkel  et al. (2007), they concluded that this acceleration was an artifact of the presence of both passenger and driver mutations and that it does not occur when only driver mutations conferring a fixed selective advantage are considered—i.e., when the fitness increments are deterministic. In contrast to these conclusions, we find that the acceleration of waves occurs regardless of the choice of mutational fitness distribution and is due to the difference in growth rates between successive generations: type-k cells arise when generation k − 1 reaches size O(1/u) and since the asymptotic growth rate of generation k is larger than the asymptotic growth rate of generation k − 1, generation k + 1 reaches size O(1/u) faster than generation k. We observed another example of this phenomenon earlier during our study of a related Moran model for tumor growth in which the total population of cells grows at a fixed exponential rate (Durrett and Mayberry 2010). In this model, the cause of acceleration was similarly related to growth rates—later generations take longer to achieve dominance in the expanding population of cells, and hence new types are born with a higher fitness advantage compared to the population bulk, allowing them to grow more rapidly.

As a second application of Equation 3, we derive an analytic expression for the time at which type-k cells become dominant in the population. Let Sk=inf{t0:Zk(t)>Zj(t), for all jk} be the first time that type-k cells become the most frequent cell type in the population. Then, as u0, we have
SkLtk=b1+βk
(5)
in probability for all k1. The limit tk is the solution to λk(tkβk)=λk1(tkβk1), i.e., the time when zk(t) first overtakes zk1(t). Figure 2d demonstrates how the index m(t) of the largest generation at time t, defined as zm(t)(t)=max{zk(t):k0}, changes over time. The transitions between periods of dominance are sharp only in the small mutation limit. At any given time, the population consists primarily of members of the current dominant generation; i.e., (1/L)log Z(Lt)→ zm(t)(t) as u0. Therefore, for small mutation rates, the amount of genetic heterogeneity present in the population is determined by the amount of heterogeneity present in the dominant generation.
To determine the accuracy of our results when the mutation rate is small, we compare our limiting approximation in Equation 3 with the results of numerical simulations, using a mutational fitness distribution corresponding to a point mass at b [i.e., vδ(b)]. Given that there are ∼3 billion base pairs in the human genome and the mutation rate per base pair is O(108)O(1010) (Seshadri  et al. 1987; Oller  et al. 1989; Kunkel and Bebenek 2000), point mutations occur at a rate of 0.3–30 per cell division. However, since advantageous mutations constitute only a fraction of all possible mutations, the mutation rate per cell division for advantageous mutations is smaller than the overall mutation rate per cell division. In the following example, we use a mutation rate per cell per time unit of u=105 and a cell division rate of a = 0.2. In Figure 3, we compare the average size of the kth generation in simulations (Figure 3a) with the approximation given by Equation 3 (Figure 3b). Although the behavior is qualitatively similar to the small mutation limit, the approximation consistently underestimates the times at which new waves appear. To explain the source of this bias, we use the alternative approximation given by the right-hand side of Equation 2, which can be rewritten as
zˆkL(t)=Lzk(t/L)+logVd,k,
(6)
where L=log(1/u). Using the expression for the Laplace transform of Vd,k and the numerical algorithm presented in Ridout (2008), we sample 1000 variates from the distribution of Vd,k. Table 1 shows the sample mean and standard deviation of log Vd,k, for k=1,2,3. The distribution of log Vd,k has a positive mean and is skewed to the right (Figure 4), implying that the limit in Equation 3 in general underestimates the size of generation k for positive mutation rates. The approximation obtained by replacing log Vd,k with the sample mean of log Vd,k is displayed in Figure 3c. After an initial period in which the number of type-k cells is small, the behavior of the process closely resembles the one shown in Figure 3a. The variance of log Vd,k increases with k and hence, we expect an increasing amount of variability in the simulations in the time when type-k cells arise. Figure 3d displays the right-hand side of Equation 6, replacing log Vd,k with the value two standard deviations above its mean to illustrate an extreme scenario.
Figure 3.—

The size of the first four generations of cells. The log size of generations 1–4 is shown as a function of time t. Both time and space are plotted in units of L=log(1/u). (A) The average values of the log generation sizes over 106 sample simulations. (B) The limiting approximation from Equation 3 for the log size of the generations. (C) The approximation from Equation 6 using the mean of logVd,k. (D) The approximation from Equation 6 using a value two standard deviations above the mean of logVd,k to demonstrate an extreme scenario. Parameters are u=105, vδ(b), a0=0.2, b0=0.1, and b=0.05.

Figure 4.—

The distribution of log Vd,1. The relative frequency histogram of 1000 random samples from the distribution of logVd,1 is shown, as specified by Equation 1.

TABLE 1

Means and standard deviations for log Vd,k for k = 1, 2, 3, as specified by Equation 1

GenerationMeanStandard deviation
14.76381.3738
210.60102.2434
317.05193.0282
GenerationMeanStandard deviation
14.76381.3738
210.60102.2434
317.05193.0282
TABLE 1

Means and standard deviations for log Vd,k for k = 1, 2, 3, as specified by Equation 1

GenerationMeanStandard deviation
14.76381.3738
210.60102.2434
317.05193.0282
GenerationMeanStandard deviation
14.76381.3738
210.60102.2434
317.05193.0282

While the limit in (3) depends on the fitness distribution only through the maximum attainable fitness increase b, the distribution of Vd,k also depends on the fitness distribution through the probability of attaining a fitness advance of b if v is discrete and the value of the probability density function at b if v is continuous (see Equation 4.9 in Durrett  et al. 2010). As a consequence, our finite time approximation (6) takes into consideration the shape of the fitness distribution near b and the corrector term log Vd,k accounts for variations in the likelihood of attaining the maximum possible fitness advance.

Within-generation heterogeneity:

We begin our investigation of within-generation heterogeneity by examining the extent of diversity present in the first generation of cells. We use two statistical measures to assess heterogeneity: (i) Simpson's index, which is given by the probability that two randomly chosen cells from the first generation stem from the same clone, and (ii) the fraction of individuals in the first generation that stem from the largest family of cells. To obtain these results, we derive an alternate formulation of the limit in Equation 1 that shows the limit is the sum of points in a nonhomogeneous Poisson process (see the  appendix for more details). Each point in the limiting process represents the contribution of a different mutant lineage to Z1(t) so that it suffices to calculate i and ii for the points in the limiting process.

Simpson's index:

Let us introduce some terminology by defining Xn to be the nth largest point in the limiting point process and by setting Sn=i=1nXi. Then Simpson's index for the point process is defined by
R=i=1Xi2(S)2=i=1(XiS)2.
We may also consider Simpson's index for a random walk Rn=i=1n(Yi/Wn)2, where the Yi are independent random variables with a tail probability P(Yi>x)=xα, and Wn=i=1nYi. Then the limit as n goes to infinity of Rn is 1α (Fuchs  et al. 2001), where α=λ0/λ1(0,1) denotes the ratio of the growth rate of type-0 cells to the maximal growth rate of type-1 cells. Furthermore, the expected value of Rn converges to the expected value of R so we have
ER=1α.
(7)
See the  appendix for details of this calculation. Equation 7 shows that the average amount of heterogeneity present in the first generation depends only on α. Figure 5 displays the sample mean of Simpson's index of Z as a function of time for different values of α. Initially, the sample mean tends toward 1α, the expected value of Simpson's index for the limiting point process. Although the sample mean is greater than the limiting value for larger values of b, our theory guarantees that eventually, the values of the sample mean converge. However, it is impossible to simulate the process for such long times since the population size and number of cell types become too large.
Figure 5.—

The expected value of Simpson's index for the first wave of cells. The sample mean of Simpson's index (dots) over time t and the expected value of Simpson's index for the limiting point process (line) are shown, for two different values of α. Parameters are λ0=0.1, a0=0.2, and vU([0,b]), where b=0.01 in a and b=0.05 in b.

Expressions for the density and higher moments of R can be obtained in a similar manner by the comparison techniques we used in the proof of Equation 7 (see  appendix and Logan  et al. 1973; Shao 1997). In addition, near the origin, the density g(x) of R has the form
g(x)ax3/2exp[bx1]
(8)
as x0.  Figure 6 displays simulations of Simpson's index for the first wave mutants in the branching process Z. Note the convergence of the empirical distribution of Simpson's index to the distribution for the limiting point process.
Figure 6.—

The empirical distribution of Simpson's index for the first wave of cells. Individual plots of Simpson's index are shown for the branching process at times t=70, 90, 110, and 130 along with Simpson's index for the limiting point process (t=). The histograms show the average over 1000 sample simulations. For the limiting point process, we approximate Simpson's index by examining the largest 104 points in the process. Parameters are λ0=0.1, a0=0.2, and vU([0,0.01]).

Largest clones:

To further investigate heterogeneity properties of the point process, we examine the fraction of cells descended from the largest family of first-generation mutants defined as Vn=X1/Sn. This quantity reveals the degree of dominance of the largest clone in the first wave of mutants. For large n, values of Vn near one indicate that the population is largely dominated by a single clone, while values near zero indicate a highly heterogeneous population where no single clone contributes significantly to the total. Using a similar approach to that in the previous section, we again consider this calculation in the context of a random walk. Consider a sequence of independent, identically distributed random variables Yi with partial sums Wn. Define the maximum value of this sequence from 1 to n to be Y(1). Then classical results regarding one-dimensional random walks characterize the limiting characteristic function of Wn/Y(1) (Darling 1952). In the appendix, we demonstrate that these results can be applied to the study the largest clone contributions in the limiting point process.

We show that 1/Vn converges in distribution to a nontrivial limit W and obtain an explicit formula for the characteristic function of the limit: as n, Vn1W, where W has characteristic function ψ satisfying ψ(0) = 1 and
ψ(t)=exp[it]fα(t) for all t0,
(9a)
with
fa(t)=1+α01(1eitu)u(α+1)du.
(9b)
Interestingly, the characteristic function of W is nonintegrable since its density has a singularity at 1. This finding implies that there is a disproportionately large chance that a single clone dominates the population. Further details are shown in the  appendix.
Differentiating ψ then leads to simple expressions for the mean and variance of the limit,
EW=11α and var(W)=2(1α)2(2α).
(10)
Figure 7a suggests that the rate of convergence is slow for α close to 1. These formulas provide us with the first two moments of the diversity measure W and reveal its dependence on α.
Figure 7.—

The largest clones in the population of cells. (A) A comparison between Monte Carlo estimates for EVn1 and the limit (1α)1. (B) A comparison of the Monte Carlo estimates for EVn and the curve (1α)+. The Monte Carlo estimates are averaged over 100 sample simulations.

Equation 9 implies that Vn converges to a nontrivial limit V = W−1 and Jensen's inequality applied to the strictly convex function 1/x implies that E(limX1/Sn)>1α. This result provides a lower bound on the expected value of the limit of Vn and indicates that for values of α close to zero, the population is eventually dominated by a single clone. Even though this result indicates only a lower bound, simulations suggest that deviations of the mean from 1−α are small, as illustrated in Figure 7b.

Extensions to generation k:

The results obtained in the previous two sections for the first generation of cells can easily be extended to later generations by noting that each mutation to a type-(k − 1) individual starts a new family regardless of the mutated cell's family tree. Therefore, the amount of heterogeneity within generation k depends on the relative growth rates of type-k and type-(k − 1) individuals in the same way that the amount of heterogeneity within generation 1 depends on the relative growth rates of type-1 and type-0 individuals. In particular, letting Rk denote the limiting value of Simpson's index for generation k, we have
ERk=1αk,
(11)
where αk=λk1/λk is the relative growth rate of type-(k −1) individuals compared to type-k individuals (see also Equation 7 above). Note that the relative growth rates αk are increasing functions of k. Therefore, Equation 11 shows that the mean of Simpson's index is a decreasing function of the generation number.

Total population heterogeneity:

We conclude this section by demonstrating how heterogeneity measures for the population as a whole can be calculated using information about both intra- and interwave heterogeneity. First, define the total collection of all genotypes present at time t as N(t). For a particular genotype, x, the number of cells at time t that have exactly this genotype is given by Z(x)(t). Two cells have the same genotype if they contain exactly the same collection of mutations. Then Simpson's index for the entire population is given by
SI(Z(t))=xN(t)(Z(x)(t)Z(t))2.
To show how this expression depends on the contributions of different waves, define the total collection of genotypes present in the wave-k population at time t by Nk(t), and let Zk(x) be the population of cells in wave k that have genotype x. By defining K(t) to be the number of waves present at time t, we obtain the alternate expression
SI(Z(t))=k=0K(t)xkNk(t)(Zk(xk)(t)Zk(t))2(Zk(t)Z(t))2.
This decomposition expresses Simpson's index for the whole population in terms of Simpson's index for each wave and the contribution of each wave to the total population. Combining the result in (5), which gives the dominant wave as a function of time, with (11), which describes wave-k heterogeneity, we obtain a description of how the extent of heterogeneity changes in time. However, more refined results are needed to describe the transitions between the dominance of successive waves.

DISCUSSION

In this article, we investigated the evolution of intratumor heterogeneity in a stochastic model of tumor cell expansion. Our model incorporates random mutational advances conferred by (epi)genetic alterations and our analysis focused on the extent of heterogeneity present in the tumor. We first considered heterogeneity between tumor subpopulations with varying numbers of alterations and obtained limiting results, as the mutation rate approaches zero, for the contribution of each wave of mutants to the total tumor cell population. We showed that in the limit, this intergeneration heterogeneity depends on the maximum attainable fitness advance conferred by (epi)genetic alterations, but not on the specific form of the mutational fitness distribution. Our analysis also led to analytical expressions for the arrival time of the first cell with k mutations and showed that the rate of accumulation of new genetic alterations accelerates over time due to the increasing growth rates of successive generations. We demonstrated with stochastic simulations that for small but positive mutation rates, our limiting approximations provide good predictions of the model behavior (see Figure 3). These simulations also suggest that as time increases, multiple waves of mutants coexist without a single, largely dominant wave. For large t, the mean growth rate of the kth wave is given by λ0+kb, showing that variation in fitness within a particular wave is a transient property. The extent to which this variation affects tumor dynamics at small times is the subject of ongoing work.

We also investigated the genotypic diversity present within the kth generation of mutants by considering two measures of diversity: Simpson's index, which is given by the probability that two randomly selected cells stem from the same family, and the fraction of individuals in generation k that stem from the largest family of individuals. We obtained limiting expressions for the mean of Simpson's index as well as the form of its density near the origin. Interestingly, the limiting mean of Simpson's index is given by the quantity 1−α, where α is the ratio between the maximum attainable fitness values of type-(k − 1) and type-k individuals. We then observed that, as time increases, the mass of the distribution of Simpson's index moves closer to 0, indicating higher levels of diversity in the tumor at later times (see Figure 6). This behavior was also observed via direct numerical simulation of the branching process—the distribution and mean of Simpson's index converged to the predicted limiting values.

Finally, we investigated the ratio between the total population size of the kth wave of mutants and the size of the largest family. We showed that this ratio can be approximated by a random variable with mean (1−α)−1. An explicit formula for the characteristic function of this random variable was also obtained (see Equation 9). Note that as α approaches 1, the mean of the ratio grows to infinity—i.e., the largest family of cells constitutes a vanishing proportion of the total population of wave-k cells as the maximum possible fitness advance goes to zero.

In the context of tumorigenesis, where a tumor originally starting from a single cell reaches cell numbers of ≥1012, the limit as t goes to infinity is indeed the appropriate regime of study. We have compared our results regarding Simpson's index with numerical simulations at finite times (see, e.g., Figures 5 and 6) and have found good qualitative agreement with the asymptotic limits. Estimates of overall mutation rates in evolving cancer cell populations range from 10−9 to 10−2. In Figure 3 we demonstrate good agreement between our results regarding interwave heterogeneity in the small mutation rate limit (u0) and numerical simulations when the mutation rate is 105; thus, analysis in this limiting regime captures the dynamics for mutation rates <105 per cell division.

In conclusion, our analysis indicates that tumor diversity is strongly dependent upon the age of the tumor and the maximum attainable fitness advance of mutant cells. If only small fitness advances are possible, then the tumor population is expected to have a larger extent of diversity compared to situations in which fitness advances are large. The acceleration of waves observed in our studies of intergeneration heterogeneity provides evidence that an older tumor has a higher level of diversity than a young tumor. In addition, we have shown that the mean of Simpson's index for generation k is a decreasing function of the generation number (see Equation 11), indicating a larger extent of diversity in later generations and suggesting a further increase in the total extent of heterogeneity present in the tumor at later times.

Possible extensions of our model include spatial considerations and the effects of tissue organization on the generation of intratumor heterogeneity as well as the inclusion of other cell types, such as immune system cells and the microenvironment. Furthermore, alternative growth dynamics should be considered to test the extent of heterogeneity arising in populations that follow logistic, Gompertzian, or other growth models. We have neglected these aspects in the current version of the model to focus on the dynamics of tumor diversity in an exponentially growing population of cells. Our model provides a rational understanding of the extent and dynamics of intratumor heterogeneity and is useful for obtaining an accurate picture of its generation during tumorigenesis.

Acknowledgements

The authors thank Theresa Edmonds for exceptional research assistance. This work is supported by National Cancer Institute grant U54CA143798 (to J.F., K.L., and F.M.), National Science Foundation (NSF) grant DMS 0704996 (to R.D.), and NSF RTG grant DMS 0739164 (to J.M.).

Footnotes

Communicating editor: M. W. Feldman

LITERATURE CITED

Beerenwinkel
N
,
Antal
T
,
Dingli
D
,
Traulsen
A
,
Kinzler
K W
 et al. ,
2007
 
Genetic progression and the waiting time to cancer
.
PLoS Comput. Biol.
 
3
:
e225
.

Bozic
I
,
Antal
T
,
Ohtsuki
H
,
Carter
H
,
Kim
D
 et al. ,
2010
 
Accumulation of driver and passenger mutations during tumor progression
.
Proc. Natl. Acad. Sci. USA
 
107
:
18545
18550
.

Campbell
L L
,
Polyak
K
,
2007
 
Breast tumor heterogeneity: Cancer stem cells or clonal evolution?
 
Cell Cycle
 
6
:
2332
2338
.

Coldman
A J
,
Goldie
J H
,
1985
 
Role of mathematical modeling in protocol formulation in cancer chemotherapy
.
Cancer Treat. Rep.
 
69
:
1041
1048
.

Coldman
A J
,
Goldie
J H
,
1986
 
A stochastic model for the origin and treatment of tumors containing drug-resistant cells
.
Bull. Math. Biol.
 
48
:
279
292
.

Darling
D A
,
1952
 
The role of the maximum term in the sum of independent random variables
.
Trans. Am. Math. Soc.
 
72
:
85
107
.

Dick
J E
,
2008
 
Stem cell concepts renew cancer research
.
Blood
 
112
:
4793
4807
.

Durrett
R
,
2005
 
Probability: Theory and Examples
.
Brooks Cole–Thomson Learning
,
Belmont, CA
.

Durrett
R
,
Moseley
S
,
2010
 
Evolution of resistance and progression to disease during clonal expansion of cancer
.
Theor. Popul. Biol.
 
77
:
42
48
.

Durrett
R
,
Foo
J
,
Leder
K
,
Mayberry
J
,
Michor
F
,
2010
 
Evolutionary dynamics of tumor progression with random fitness values
.
Theor. Popul. Biol.
 
78
:
54
66
.

Durrett
R
,
Mayberry
J
,
2011
 
Traveling waves of selective sweeps
.
Ann. Appl. Probab.
 
21
:
699
744
.

Durrett
R
,
Moseley
S
,
2010
 
Evolution of resistance and progression to disease during clonal expansion of cancer
.
Theor. Popul. Biol.
 
77
:
42
48
.

Durrett
R
,
Foo
J
,
Leder
K
,
Mayberry
J
,
Michor
F
,
2010
 
Evolutionary dynamics of tumor progression with random fitness values
.
Theor. Popul. Biol.
 
78
:
54
66
.

Fidler
I J
,
1978
 
Tumor heterogeneity and the biology of cancer invasion and metastasis
.
Cancer Res.
 
38
:
2651
2660
.

Fidler
I J
,
Hart
I R
,
1982
 
Biological diversity in metastatic neoplasms: origins and implications
.
Science
 
217
:
998
1003
.

Foley
M
,
Tilley
L
,
1997
 
Quinoline antimalarials: mechanisms of action and resistance
.
Int. J. Parasitol.
 
27
:
231
240
.

Fuchs
A
,
Joffe
A
,
Teugels
J
,
2001
 
Expectation of the ratio of the sum of squares to the square of the sum: exact and asymptotic results
.
Theory Probab. Appl.
 
46
:
243
255
.

Geisler
J P
,
Rose
S L
,
Geisler
H E
,
Miller
G A
,
Wiemann
M C
,
2002
 
Drug resistance and tumor heterogeneity
.
CME J. Gynecol. Oncol.
 
7
:
25
28
.

Gerrish
P J
,
Lenski
R E
,
1998
 
The fate of competing beneficial mutations in an asexual population
.
Genetica
 
102–103
:
127
144
.

Haeno
H
,
Iwasa
Y
,
Michor
F
,
2007
 
The evolution of two mutations during clonal expansion
.
Genetics
 
177
:
2209
2221
.

Haggarth
L
,
Auer
G
,
Busch
C
,
Norberg
M
,
Haggman
M
 et al. ,
2005
 
The significance of tumor heterogeneity for prediction of DNA ploidy of prostate cancer
.
Scand. J. Urol. Nephrol.
 
39
:
387
392
.

Heppner
G H
,
1984
 
Tumor heterogeneity
.
Cancer Res.
 
44
:
2259
2265
.

Imhof
M
,
Schlotterer
C
,
2001
 
Fitness effects of advantageous mutations in evolving Escherichia coli populations
.
Proc. Natl. Acad. Sci. USA
 
98
:
1113
1117
.

Kansal
A R
,
Torquato
S
,
Chiocca
E A
,
Deisboeck
T S
,
2000
 
Emergence of a subpopulation in a computational model of tumor growth
.
J. Theor. Biol.
 
207
:
431
441
.

Komarova
N
,
2006
 
Stochastic modeling of drug resistance in cancer
.
J. Theor. Biol.
 
239
:
351
366
.

Kunkel
T A
,
Bebenek
K
,
2000
 
DNA replication fidelity
.
Annu. Rev. Biochem.
 
69
:
497
529
.

Lai
L A
,
Paulson
T G
,
Li
X
,
Sanchez
C A
,
Maley
C
 et al. ,
2007
 
Increasing genomic instability during premalignant neoplastic progression revealed through high resolution array-CGH
.
Genes Chromosomes Cancer
 
46
:
532
542
.

Lenski
R E
,
Travisano
M
,
1994
 
Dynamics of adaptation and diversification: a 10,000-generation experiment with bacterial populations
.
Proc. Natl. Acad. Sci. USA
 
91
:
6808
6814
.

Lenski
R E
,
Rose
M R
,
Simpson
S C
,
Tadler
S C
,
1991
 
Long-term experimental evolution in Escherichia coli. I. Adaptation and divergence during 2,000 generations
.
Am. Nat.
 
138
:
1315
1341
.

Logan
B F
,
Mallows
C L
,
Rice
S O
,
Shepp
L A
,
1973
 
Limit distributions of self-normalized sums
.
Ann. Probab.
 
1
:
788
809
.

Loya
P
,
2005
 
Dirichlet and fresnel integrals via integrated integration
.
Math. Mag.
 
78
(
1
):
63
67
.

Maley
C C
,
Galipeau
P C
,
Finley
J C
,
Wongsurawat
V J
,
Li
X
 et al. ,
2006
 
Genetic clonal diversity predicts progression to esophageal adenocarcinoma
.
Nat. Genet.
 
38
:
468
473
.

Merlo
L M
,
Pepper
J W
,
Reid
B J
,
Maley
C C
,
2006
 
Cancer as an evolutionary and ecological process
.
Nat. Rev. Cancer
 
6
:
924
935
.

Michelson
S
,
Ito
K
,
Tran
H T
,
Leith
J T
,
1989
 
Stochastic models for subpopulation emergence in heterogeneous tumors
.
Bull. Math. Biol.
 
51
:
731
747
.

Nguyen
H N
,
Sevin
B U
,
Averette
H E
,
Ramos
R
,
Ganjei
P
 et al. ,
1993
 
Evidence of tumor heterogeneity in cervical cancers and lymph node metastases as determined by flow cytometry
.
Cancer
 
71
:
2543
2550
.

Nicolson
G L
,
1984
 
Generation of phenotypic diversity and progression in metastatic tumor cells
.
Cancer Metastasis Rev.
 
3
:
25
42
.

Oller
A R
,
Rastogi
P
,
Morgenthaler
S
,
Thilly
W G
,
1989
 
A statistical model to estimate variance in long term-low dose mutation assays: testing of the model in a human lymphoblastoid mutation assay
.
Mutat. Res.
 
216
:
149
161
.

O'Sullivan
M
,
Budhraja
V
,
Sadovsky
Y
,
Pfeifer
J D
,
2005
 
Tumor heterogeneity affects the precision of microarray analysis
.
Diagn. Mol. Pathol.
 
14
:
65
71
.

Park
S C
,
Krug
J
,
2007
 
Clonal interference in large populations
.
Proc. Natl. Acad. Sci. USA
 
104
:
18135
18140
.

Park
S C
,
Simon
A
,
Krug
J
,
2010
 
The speed of evolution in large asexual populations
.
J. Stat. Phys.
 
138
:
381
410
.

Resnick
S
,
1987
 
Extreme Values, Regular Variation, and Point Processes
.
Springer-Verlag
,
Berlin/Heidelberg, Germany/New York

Ridout
M S
,
2008
 
Generating random numbers from a distribution specified by its laplace transform
.
Stat. Comput.
 
19
:
439
450
.

Rokyta
D R
,
Beisel
C J
,
Joyce
P
,
Ferris
M T
,
Burch
C L
 et al. ,
2008
 
Beneficial fitness effects are not exponential for two viruses
.
J. Mol. Evol.
 
67
:
368
376
.

Sanjuan
R
,
Moya
A
,
Elena
S F
,
2004
 
The distribution of fitness effects caused by single-nucleotide substitutions in an RNA virus
.
Proc. Natl. Acad. Sci. USA
 
101
:
8396
8401
.

Schweinsberg
J
,
2008
 
The waiting time for m mutations. Electron
.
J. Probab.
 
13
:
1442
1478
.

Seshadri
R
,
Kutlaca
R J
,
Trainor
K
,
Matthews
C
,
Morley
A A
,
1987
 
Mutation rate of normal and malignant human lymphocytes
.
Cancer Res.
 
47
:
407
409
.

Shao
Q M
,
1997
 
Self-normalized large deviations
.
Ann. Probab.
 
25
:
285
328
.

Shipitsin
M
,
Campbell
L L
,
Argani
P
,
Weremowicz
S
,
Bloushtain-Qimron
N
 et al. ,
2007
 
Molecular definition of breast tumor heterogeneity
.
Cancer Cell
 
11
:
259
273
.

Wolman
S R
,
1986
 
Cytogenetic heterogeneity: its role in tumor evolution
.
Cancer Genet. Cytogenet.
 
19
:
129
140
.

APPENDIX

Convergence in distribution of Zk(t):
Define λk = λ0 + kb to be the maximum growth rate that can be attained by a generation k mutant, and let
pk=k+j=0k1λkλj.
If ν is discrete and assigns mass gi to a finite number of values b1 < b2 < … <bN = b, then
(1/u)k+pkeλktZk(t)Vd,k,
(A1)
where “⇒” denotes convergences in distribution, and Vd,k has Laplace transform
exp[dk(λ0,b)V0θλ0/λk]
for all θ ≥ 0.
If instead ν is a continuous distribution on [0, b] with a bounded density g that is continuous and positive at b, then
(t/u)k+pkeλktZk(t)Vc,k,
(A2)
where Vc,k has Laplace transform exp[ck(λ0,  b)V0θλ0/λk] for all θ ≥ 0. Here dk(λ0, b) and ck(λ0, b) are constants that depend on the model parameters (Durrett  et al. 2010), and “d” and “c” in the constants and subscripts stand for discrete and continuous. See Durrett and Moseley (2010) for a proof of Equation A1 and Durrett  et al. (2010) for a proof of (A2). To show the dependence on the density g, we write the value of c1(λ0, b) as
c1(λ0,b)=g(b)λ0+bλ01λ0+b(a0+bλ0+b)b/(λ0+b)Γ(λ0λ0+b)Γ(1λ0λ0+b).
The convergence in (A2) was also numerically investigated in Figures 1 and 2 of Durrett  et al. (2010).
Point process limit:

In this section, we discuss the point process representation for the limit in Equations A1 and A2 in the case where k = 1. The limit is the sum of points in a nonhomogeneous Poisson process. Before stating this result, we introduce some terminology. Here and in what follows, we use |A| to denote the number of points in the set A. We say that Λ is a Poisson process on (0, ∞) with mean measure μ if Λ is a random set of points in (0, ∞) with the following properties:

  • For any A⊂(0, ∞), N(A) = |ΛA| is a Poisson random variable with mean μ(A).

  • For any k ≥ 1, if A1, … , Ak are disjoint subsets of (0, ∞), then N(Ai), 1 ≤ ik are independent.

We also let α = λ0/λ1 ∈ (0, 1) denote the ratio of the growth rate of type-0 cells to the maximal growth rate of type-1 cells and note that 1 + p1 = 1/α.

Theorem 1. Let Λ be a Poisson process on (0, ∞) with mean measure
μ(A)=A  αz(α+1)dz
and let S denote the sum of the points in Λ. Then positive constants Ad, Ac  = Ad(λ0, b), Ac(λ0, b) exist that depend on the indicated parameters so that in case i as t →∞
(AduV0)(1+p1)eλ1tZ1(t)S,
and in case ii as t →∞
(AcuV0)(1+p1)t1+p1eλ1tZ1(t)S.

For more details, see Durrett  et al. (2010), Theorem 3, and Durrett and Moseley (2010), Corollary to Theorem 3. Note that the mean measure for Λ has tail m(x, ∞) = xα.

Let Xn denote the nth largest point in Λ, and let Sn=i=1nXi denote the sum of the n largest points. To determine the dependence of Xn on n we first note that if we define Λ′ = f (Λ) where f (x) = xα, then Λ′ is a Poisson process and after making the change of variables y = xα, we can see that the mean measure is
μ(A)=f1(A)αx(α+1)dx=Ady=|A|.
In other words, Λ′ is a homogeneous Poisson process with constant intensity and hence the spacings between points are independent exponentials with mean 1. If we let Tn denote the time of the nth arrival in Λ′, then the law of large numbers implies that Tnn as n → ∞. Since Xn=Tn1/α, we obtain Xnn−1/α as n → ∞. In addition, we have the following Lemma:

Lemma 1.

EXn=Γ(n1/α)Γ(n).
Furthermore, if we define  S=i=1Xi, then
ES<.

Proof. Since Tn has a Gamma(n, 1) distribution, we have EXn=ETn1/α=Γ(n1/α)/Γ(n). Stirling's approximation implies that Γ(n − 1/α)/ Γ(n) ∼ n−1/α and the second conclusion follows. ■

Simpson's index:
To prove Equation 7 in the text, we use a result in Fuchs  et al. (2001) that shows that
limnERn=1α.
(A3)
Here
Rn=i=1n(YiSn)2,
where Yi are iid random variables in the domain of attraction of a stable law with index α and Sn = Y1 + … + Yn. To explain the connection between the two results, note that if we have P(Yi > x) = xα, for x ≥ 1 and letting Yn,i = Yi/n1/α, then
nP(Yn,iA)μ(A).
This implies that if we let Δn = {Yn,i : in} be the point process associated with the Yn,i and define the measures ξnxΛnδx and ξ=xΛδx, then we have
ξnξ,
so that we should expect ER to agree with lim ERn.
To make this argument rigorous, let
Rn (ε)=i=1n Yn,i21Yn,i>ε(i=1nYn,i21Yn,i>ε)2
denote the truncated value of Simpson's index for Λn and
R(ε)=i Xi21Xi>ε(i=1  Xi1Xi>ε)2
denote the truncated value for Λ. Then for any ε > 0, we have
|ERnER|E|RnRn(ε)|+E|Rn(ε)R(ε)|+E|R(ε)R|.
(A4)
We complete the proof by deriving appropriate bounds for each of the three terms on the right-hand side of (A4). For the first term, we have the following:Lemma 2.
limnsupE|RnRn(ε)|hε,
where hε → 0 as ε → 0.
Proof. Let ε > 0 and write An,k=i=1nYn,ik,  An,k(ε)=i=1nYn,ik1Yn,i>ε, and A¯n,k(ε)=An,kAn,k(ε) for k = 1, 2. Since
EY1k1Y1εn1/α=1εn1/αkyk1yαdyCεkαnk/α1
for k = 1, 2, we have the bound
EA¯n,k(ε)Cεkα.
(A5)
After noting that An,2An,12,  An,2(ε)An,12(ε)An,12, and A¯n,1(ε)+An,1(ε)=An,1, we have for any δ > 0,
E|RnRn(ε)|=E|A¯n,2(ε)An,12+Rn(ε)(An,12(ε)An,12An,12)|δ2EA¯n,2(ε)+P(An,1δ)+δ1E(|2An,1(ε)A¯n,1(ε)A¯n,12(ε)|An,1)+P(An,1δ)=δ2E|A¯n,2(ε)|+δ1E(|A¯n,1(ε)(2+An,1(ε)An,1)|)+2P(An,1δ)δ2EA¯n,2(ε)+3δ1EA¯n,1(ε)+2P(An,1δ)ε2αδ2+3ε1αδ+2P(An,1δ),
(A6)
where we have used (A5) in the last line. To control the third term on the right, let φ denote the Laplace transform of Y1. Then
1φ(t)=α1(1ety)y(α+1)dy=αtαt(1ex)x(α+1)dxCtα
as t → 0 since 1 − exx as x → 0 implies that 0(1ex)x(α+1)dx<. We can conclude that
Eexp(tAn,1)=(1(1φtn1/α)n)exp(Ctα)
as n → ∞. In particular, An,1A1, where A1 has the above Laplace transform. Since
1exp(Ctα)1
as t → ∞, we have P(A1 = 0) = 0 so that taking δ = ε(1−α)/2 in (A6) yields the result. ■
To bound the second term on the right-hand side of (A4), we need some notation. Let Mp denote the class of all point measures on (0, ∞). In a slight abuse of notation, we write vν when νMp and v ∈ supp(ν). We equip Mp with the topology of vague convergence (see, for example, Section 3.4 in Resnick 1987) and take as our σ-algebra the one generated by open sets in this topology. Associated with any random set of points, we can associate a measure ξ that is a random variable with values in Mp. We write Λn ⇒ Λ to mean that the associated random measures ξnξ.Lemma 3. Λn ⇒ Λ and if we define the maps Fk,ε: Mp → [0, ∞) by
Fk,ε(μn)=xμnxk1x>ε
for k = 1, 2, then
(F1,ε(Λn),F2,ε(Λn))(F1,ε(Λ),F2,ε(Λ)).
Proof.Since
nP(Yn,iA)=nn1/αAαy(α+1)dy=Aαx(α+1)dx=μ(A)
for all Borel sets A, the first claim follows from Proposition 3.21 in Resnick (1987). The second claim follows from the continuous mapping theorem (see, for example, Resnick 1987, p. 152) the fact that Fk,ε is continuous away from measures ν with εν and the fact that the random measure associated with Λ has no point masses with probability 1. ■

As a consequence of this lemma, the fact that Rn(ε) ≤ 1, and the bounded convergence theorem, we have the following:

Corollary 1.
E|Rn(ε)R(ε)|0
as n → ∞ for any ε > 0.

It thus remains to establish the following:

Lemma 4.
limε0supE|R(ε)R|=0.
Proof. We can establish this result using the same results as in the proof of Lemma 2, in particular if we define Ak=n=1Xnk and A¯k(ε)=n=1Xnk1Xn<εE for k = 1, 2. Then, following the display in Equation A6, we have for any δ > 0
E|RR(ε)|δ2EA¯2(ε)+3δ1EA¯1(ε)+2P(A1δ).
It is obvious that P(A1 = 0) = 0 and EA¯2(ε)EA¯1(ε) for ε < 1, so it remains only to establish that
EA¯1(ε)0,
as ε → 0. This result follows immediately from Lemma 1. Therefore, taking δ=(EA¯1(ε))1/4 completes the proof. ■

We can now complete the proof of Equation 7 by letting n → ∞ and then ε → 0 in (A4) and applying Lemmas 2 and 4 and Corollary 1.

To prove Equation 8, we use a result in Logan  et al. (1973) that establishes that as n → ∞,
Sn(2)=R1/2=i=1n Yi(i=1n  Yi2)1/2
has a limiting distribution with a density f that satisfies
f(y)aeby2, asy,
for some constants a, b > 0 (see Logan  et al. 1973, Equation 5.7, and Shao 1997, Theorem 6.1). Making the change of variables x = y−1/2 yields Equation 8. ■
Largest clones:
Equation 9 is a consequence of the following theorem:Theorem 2. As n → ∞, Vn1W,  where W has characteristic function ψ satisfying ψ (0) = 1 and
ψ(t)=eitfα(t)
for all t ≠ 0 with
fα(t)=1+α01(1eitu)u(α+1)du.

The form of the characteristic function is the same as the characteristic function for limn→∞Tn/Y(1), where the Yi are iid random variables with power law tails, Y(1) = maxinYi, and Tn=i=1nYi (see, for example, Darling 1952). Again, this agreement is a consequence of the previously discussed connection between Δn and the limiting Poisson point process.

To prove Theorem 2, we need the following notation. For a real number t, we define the function
sgn(t)={1, t<0   0, t=0   1, t>0.
For a complex number z we denote the real part of z by Re[z] and its imaginary part by Im[z].
Proof of Theorem 2. Theorem 5.1 in Darling (1952) implies that we have
Eexp(itTn/Y(1))ψ(t)
as n → ∞ whereas in the Simpson's index section, Y(1) = maxinYi and Tn=i=1nYi. To conclude that Tn/Y(1)V, we need to show that y is continuous at 0. To establish this fact, we make the change of variables v = tu to conclude that
fα(t)=1+α01(1eitu)u(α+1)du=1+α|t|α0|t|(1eiv sgn(t))v(α+1)dv.
(A7)
Since 1 − exp(iv) ∼ −iv as v → 0, the integral on the right-hand side of (A7) is finite and hence
ψ(t)=eitfα1(t)1
as t → 0. Since Tn/Y(1)V, the fact that Sn/X1V follows from the arguments in the previous section. ■
It is interesting to note that the characteristic function in Theorem 2 is not integrable. The problem is that the density of Vn1blows up near 1. As an explanation for this, we note that with probability
exp((1xα))exp(xα)xα=e1xα
there is a point in the process bigger than x and no points in [1, x). When this happens,
Vn1=SnX11+nx
and so
Fn(y)=P(Vn1y)e1nα(y1)α.
If we had Fn(y) ? (y − 1)α, then the density would blow up like (y−1)α−1 as y → 1. We confirm that this gives the right asymptotic by providing an explicit formula for the density of W.
Corollary 2. W has a density on (1, ∞) given by
f(y)=limMMMeit(1y)fα(t)dt.
Note that integral expression above does not converge absolutely so part of the proof consists of showing that the limit exists. If we apply the change of variable s = t(y − 1) in the definition of f, we see that
f(y)=(y1)α1eit(y1)α+0(y1)1((1eiut)/uα+1))du,
thus confirming the intuition that the density blows up like (y − 1)α−1 as y approaches 1.
Proof of Corollary 2. We first establish that there are no point masses in the distribution of V. By the inversion formula we have for any a ∈ ℝ,
P(V=a)=limT12TTTeiatψ(t)dt=limT12TTTeit(1a)fα(t)dt.
If we focus on the positive axis and use the change of variable s = t/T,
12T0Teit(1a)fα(t) dt=1201eisT(1a)fα(sT)ds.
From display (A7) it follows that for every s ∈ (0, 1) we have eisT(1−a)/fα(sT) → 0 as T→∞. Note that
Re[fα(t)]=1+α011cosutuα+1du>1,
which implies |fα(t)| ≥ 1 for all t. Therefore |eisT(1−a)/fα(t)| ≤ 1 for all t and it follows via the dominated convergence theorem that
limT1201eisT(1a)fα(sT)ds=0.
A similar result holds for the integral on the negative axis and we conclude that
P(V=a)=0.
We can therefore conclude for x > 1 and h > 0 via the inversion formula (see Durrett 2005, Equation 3.2) and Fubini's theorem that
P(V(x,x+h))=limT12πTTxx+heityψ(t)dydt=limT12πxx+hTTeityψ(t)dtdy.
Therefore, to establish the result we need to show that
limT12πxx+hTTeityψ(t)dtdy=12πxx+heityψ(t)dtdy.
This follows if we show that limTTTeityψ(t)dt is a convergent integral and that a bounded function h exists defined on (x, x + h) such that
|hT(y)|=|TTeityψ(t)dt|h(y).
We first use integration by parts to see
hT(y)=TTeit(1y)fα(t)dt=i1y(eiT(1y)fα(T)eiT(1y)fα(T)+−TTeit(1y)fα(t)fα(t)2dt).
Recalling that |fα(T)| → ∞ as T → ±∞, it follows that if we establish that fα(t)/fα(t)2 is integrable on (−∞, ∞), then the convergence of the integral and the existence of a bounded dominating function will be established. Since fα is bounded away from 0, it suffices to check that the function decays fast enough. Recalling the definition of fα,
fα(t)=iαtα10teivvαdv,
which follows by passing the derivative inside the integral in the definition of fα. We can establish that
supT<|0Teivvαdv|<
by observing
0eivvαdv=eiπ(1α)/2Γ(1α),
which can be found in many places, e.g., Loya (2005). Thus,
|fα(t)|αtα1supT<|0Teivvαdv|C0tα1.
We can similarly establish that for t sufficiently large
|fα(t)|2C1t2α,
for a positive finite constant C1. Thus for t sufficiently large
|eit(1y)fα(t)fα(t)2|Ctα+1,
establishing the result. ■
We conclude this section with the proof of Equation 10. Using the Taylor series expansion of exp(iu) ∼ 0 in (A7) above implies that
1+fα(t)=1n=1α(it)n(nα)n!
and therefore
fα(k)(t)=n=kαintnk(nα)(nk)!
so that in particular,
fα(k)(0)=ikαkα
for all k ≥ 1. Let S(t) = logψ(t) = it − log fα(t). Then dropping the α subscript on fα, we have
S(t)=(if(t)f(t))=(i(logf(t))),
which yields the desired result for the mean:
EY=iS(0)=i(if(0))=11α.
Now
S(t)=(logf(t))′′=f(t)f(t)(f(t))2f2(t),
so
var(Y)=S(0)=f′′(0)+(f(0))2=α2α+α2(1α)2=α(1α)2(2α),
completing the proof. ■
This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)