Abstract
Fluctuations in rates of gene expression can produce highly erratic time patterns of protein production in individual cells and wide diversity in instantaneous protein concentrations across cell populations. When two independently produced regulatory proteins acting at low cellular concentrations competitively control a switch point in a pathway, stochastic variations in their concentrations can produce probabilistic pathway selection, so that an initially homogeneous cell population partitions into distinct phenotypic subpopulations. Many pathogenic organisms, for example, use this mechanism to randomly switch surface features to evade host responses. This coupling between molecularlevel fluctuations and macroscopic phenotype selection is analyzed using the phage λ lysislysogeny decision circuit as a model system. The fraction of infected cells selecting the lysogenic pathway at different phage:cell ratios, predicted using a molecularlevel stochastic kinetic model of the genetic regulatory circuit, is consistent with experimental observations. The kinetic model of the decision circuit uses the stochastic formulation of chemical kinetics, stochastic mechanisms of gene expression, and a statisticalthermodynamic model of promoter regulation. Conventional deterministic kinetics cannot be used to predict statistics of regulatory systems that produce probabilistic outcomes. Rather, a stochastic kinetic analysis must be used to predict statistics of regulatory outcomes for such stochastically regulated systems.
IN McAdams and Arkin (1997), we analyzed properties of a representative single bacterial genetically coupled link, that is, a configuration where one promoter controls a gene whose protein product regulates another promoter. In that analysis, an integrated molecularlevel model of the mechanisms controlling gene transcription and translation was developed, and the expected time pattern of protein production from the controlled gene was investigated using the stochastic formulation of chemical kinetics (Gillespie 1977, 1992b). The results suggested that the stochastic fluctuations of the reaction rates of gene expression reactions can produce a highly erratic time pattern of protein production in each individual cell and a wide diversity of protein concentrations across a cell population at any instant of time (McAdams and Arkin 1997). (“Stochastic” is used here in the technical sense of “arising from a random process.”)
When the protein involved is a regulatory protein, these fluctuations in concentration from cell to cell cause dispersion in the time to complete regulated events in different cells, for example, different times to complete regulatory cascades. A particularly interesting case occurs when two independently produced regulatory proteins competitively control a developmental switch. The independent, stochastic temporal patterns of production of each regulatory protein can vary widely from cell to cell. In this case, the path choice from the competitively regulated switch would not be deterministic. Rather, the choice would be random with the probabilities of alternative choices dependent on the stochastic properties of the gene expression mechanisms and the design of the switch circuit. As a result an initially homogeneous cell population would partition into subpopulations following different pathways. The phenotypes on each path could be radically different. In many pathogenic organisms random variation of surface features assists in evasion of host defenses or otherwise enhances virulence (Putte and Goosen 1992; Robertson 1992; Finlay and Falkow 1997; Strauss and Falkow 1997). We suggest in this article that one source of the randomness expressed in the phenotype variations can be the random thermal fluctuations in the reaction rates of the chemical reactions comprising the regulatory circuit.
To examine this phenomenon, we analyze herein the effect of fluctuations in gene expression rates and other molecularlevel fluctuations on lysis or lysogeny pathway selection statistics by phage λinfected Escherichia coli cells. This path selection is made by the λ lysislysogeny decision circuit wherein a wellcharacterized competitive regulatory mechanism is central to the regulatory circuit that partitions the population between lytic and lysogenic outcomes.
APPROACH
Numerous studies have shown that the fraction of λinfected E. coli cells that become lysogenic is influenced by environmental parameters, especially the nutritional state of the cell and the ratio of phage particles to cells at the time of infection. [We follow Kourilsky (1973) and call this ratio the average phage input (API). The API is distinguished from the number of postinfection phage particles in a specific cell which we call the multiplicity of infection (MOI)]. We investigate here (i) the molecular mechanisms that cause the lysislysogeny decision circuit (described below) to select randomly different pathways in different cells, and (ii) use of a stochastic kinetic model of the circuit to predict the fraction selecting each pathway. The predicted fraction of lysogens over a range of APIs is compared with the fraction assayed by Kourilsky (1973). The Kourilsky experiments were selected for comparison because his experimental method was designed to minimize error in the percent lysogenization measured, and the measurements were made over a twodecade API range.
Our approach to kinetic analysis of the lysislysogeny decision outcome is as follows: The celllevel kinetic model of the phage λ lysislysogeny decision circuit (details below) uses the stochastic formulation of chemical kinetics (Gillespie 1976), includes stochastic mechanisms of gene expression (McAdams and Arkin 1997), and models promoter regulation using the statisticalthermodynamic approach described by Shea and Ackers (1985). The set of coupled stochastic equations comprising the stochastic kinetic model is solved using a standard Monte Carlo algorithm (Gillespie 1976) for systems of coupled chemical equations to predict the fraction of cells that will commit to lysogeny for various MOI. Over a wide API range, infecting phage particles distribute randomly among the target cells to produce a Poisson distribution of phage particles per cell as predicted theoretically (Ellis and Delbruck 1939). Thus, given the predicted probability of lysogeny in individual cells at different MOIs, the predicted percentage of lysogens in an experimental cell population can be computed as the Poissonweighted average of cell level predictions. This estimate is shown to compare favorably with experimental results.
STOCHASTIC KINETICS
When concentrations of the reacting species are low and reaction rates are slow, conventional deterministic chemical kinetics may not describe the development of systems of coupled reactions correctly (McQuarrieet al. 1964; Zheng and Ross 1991). Rather, for such chemical systems, one has to recognize that the individual chemical reaction steps occur discretely and are separated by time intervals of random length. Bacterial genetic regulatory mechanisms typically involve low intracellular concentrations of the reacting species and relatively slow reaction rates; the concentrations are low because the majority of regulatory molecules are produced in low quantities per cell (Guptasarma 1995) and most individual genes (and hence their regulatory regions where the regulatory molecules bind) are present in only one or two copies per cell. The rates of genetic reactions are frequently so slow that many minutes may separate individual transcripts successfully initiated (i.e., neglecting abortive initiations) from an activated promoter. The expected erratic pattern of protein production cited earlier results from production of random numbers of proteins in quick succession from these intermittently produced transcripts (McAdams and Arkin 1997). [Note that the erratic time pattern, which we postulate for protein production from the operons comprising the λ decision circuit, will result so long as the mechanisms of transcript initiation and translation control have the following broad statistical characteristics: (i) the statistical distributions of intertranscript intervals and proteins per transcript are skewed and have long tails, and (ii) the mean intertranscript time interval is relatively long (McAdams and Arkin 1997).]
Kinetics of conventional macroscopic coupled chemical reaction systems is modeled using systems of ordinary differential equations, and there is an implicit assumption of continuously varying chemical concentration and deterministic dynamics. Two critical characteristics of chemical systems compatible with these assumptions are: (i) that the number of molecules of each type in the reaction mix is large compared to thermal fluctuations in concentration, and (ii) for each type of reaction in the system, the number of reactions is large within each observation interval. For genetic circuits both of these presumptions are frequently invalid so that the deterministic approach to chemical kinetics breaks down.
In small, lowrate chemical systems it is necessary to pay attention to the fact that changes in chemical population levels really occur in integral numbers of molecules, and are occasioned by essentially random distinct reaction events. It has been shown that the time evolution of such a chemical system is a stochastic process of the Markov type (Gillespie 1992a,b; van Kampen 1992) that is described by the chemical master equation (Gillespie 1992b; van Kampen 1992). The solution to the chemical master equation can be calculated using a stochastic simulation algorithm devised by Gillespie (1976, 1977, 1992b). The Gillespie algorithm correctly accounts for the fluctuations that occur in a wellstirred chemically reacting system.
PHAGE LAMBDA DECISION CIRCUIT
The regulatory mechanisms controlling the λ phage lysis or lysogeny decision are generally known (Herskowitz and Hagen 1980; Friedman and Gottesman 1983; Echols 1986; Friedman 1992; Ptashne 1992; McAdams and Shapiro 1995). The mechanisms of the “switch” that locks the phage into one or the other of the alternate pathways are described by Meyer et al. (1980), Meyer and Ptashne (1980), Ptashne (1992), and Shea and Ackers (1985). It is established that the stability of the CII protein is an important determinant of the number of lysogens produced in an infected cell population; that the HflA and HflB proteins affect CII degradation; and that CII is stabilized somehow in the presence of the CIII protein (Hoytet al.1982; Rattrayet al. 1984). The integrated functions of the switch together with other coupled regulatory mechanisms that determine commitment to execution of the lysogenic and lytic pathways are described by McAdams and Shapiro (1995). Immediately after lambda phage infection of a target E. coli population, the cell population partitions into lytic and lysogenic phenotypes following mutually exclusive regulatory pathways.
The core of the lysislysogeny decision circuit is the fourpromoter, fivegene regulatory network shown in Figure 1a. The organization of the genetic elements of the decision circuit in the phage DNA is shown in Figure 1b. Reinforcement of the path commitment and initiation of the pathwayspecific actions associated with the selected pathway are accomplished by other coupled genes not shown (Herskowitz and Hagen 1980; McAdams and Shapiro 1995). The circuit shown in Figure 1a has two key subsystems: (i) the P_{RM}based switch that creates the circuit's bistability, and (ii) the Hfl proteolytic system, which integrates environmental signals into the circuit's behavior.
The core of the bistable switch is the complex biochemistry of the P_{R} and P_{RM} promoters' operator regions, which share three overlapping operator sites (Figure 1a, O_{R1}, O_{R2}, O_{R3}), where Cro and CI dimers bind competitively and in sequence, but in opposite order (Maureret al. 1980; Meyeret al. 1980; Meyer and Ptashne 1980; Ptashne 1992). Figure 2, a and b, shows contour maps of the activation level of P_{R} and P_{RM}, respectively, as a function of CI and Cro dimer concentration. The activation levels are calculated using the model and parameters in Shea and Ackers (1985).
The lysis or lysogeny outcome in each cell is determined by the specific temporal pattern of CI and Cro accumulation in that cell after infection. Immediately after infection, there are no CI or Cro molecules in the cell so the regulatory circuit is in the state labeled “S” in the lower left corners of Figure 2, a and b. At that point, P_{R} is fully activated; P_{RM} has only a low basal activation, and promoter P_{L} is also activated. Transcription and translation of the phage DNA is accomplished by the host cell's machinery.
Cro and N proteins are produced from transcripts initiated at promoters P_{R} and P_{L} and both proteins begin to accumulate immediately after infection. Initially terminators T_{R1} and T_{L1} partially block RNA polymerase (RNAP) transcription: about 50% at T_{R1} (Friedman and Gottesman 1983) and 80% at T_{L1} (Drahos and Szybalski 1981). However, as the concentration of N increases, the N protein (with other molecules from the host cell) acts to antiterminate RNAP at NUT sites upstream from T_{R1} and T_{L1} (Das 1992), and the average rate of transcription of downstream genes, including cII and cIII, increases.
The CI production rate is determined by the combined transcript initiation rates from P_{RE} and P_{RM} (Figure 1a). Since P_{RM} is essentially OFF until there is some CI in the cell (Figure 2b), there will be no accumulation of CI (and thus potential for lysogeny) unless P_{RE}initiated transcripts produce enough CI to get the cell into the concentration state where P_{RM} is activated and P_{R} is repressed, that is, into the regime where Log[CI_{2}] > ~−6.8 and Log[Cro_{2}] < ~−7.2 (or roughly more than 145 CI dimers and less than 55 Cro dimers, respectively). This can occur only in those cells where, by chance, there is early, strong CII production that persists long enough to activate P_{RE}. If, however, more than about 55 Cro dimers accumulate first, lysogeny will be precluded. (A nominal cell volume of 1.4 × 10^{−15} liters is used here to relate molecular concentration to molecule count.) In cells where, by chance, enough early CI production from P_{RE} transcripts occurs to repress P_{R} and activate P_{RM}, then CI concentration will tend to continue to increase automatically due to positive autoregulation leading to ever increasing repression of P_{R} and P_{L} by CI_{2}. Eventually CI_{2} concentration will rise into the negative feedback region of the P_{RM} repression curve and stabilize by autoregulation at a concentration in a range of 140–200 dimers per cell (Reichardt and Kaiser 1971; Levineet al. 1979), and Cro will eventually disappear due to dilution and degradation.
Though CII is produced from the same P_{R}initiated transcripts that encode Cro, CII accumulation is initially attenuated by termination of about 50% of the transcripts at T_{R1} and by CII's relatively short halflife (~2 min). Thus, initially, only Cro accumulates in the infected cells. In the presence of CIII, degradation of CII is reduced (Hoytet al. 1982). In the λ lysislysogeny decision, the Hfl system integrates two environmentally dependent signals into the circuit function: (i) nutritional state of the cell, and (ii) the level of the phage population in the cell's surrounding vicinity. This article examines how the latter sensing mechanism works. Both sensing functions depend on active control of CII proteolysis (Figure 1a). At higher levels of nutrition, Hflrelated proteolytic activity is higher so that CII and CIII have shorter lifetimes (Grodzikeret al. 1972; Belfort and Wulff 1974). This tends to reduce the mean and peak CII concentration levels, and thus the probability of P_{RE} activation. So the probability of lysogeny is reduced in wellfed cells. Dependence of lysogeny on MOI arises because the concentration of the hostencoded Hfl system is independent of MOI, while the number of cII and cIII genes is proportional to MOI. Thus, cells with higher MOI have a higher probability of achieving CII concentration necessary to activate P_{RE} and kickstart CI production.
If a cell reaches a state where (i) the Cro feedback loop is established, (ii) P_{RM} and P_{L} are repressed, and (iii) CII concentration is low, there is a high probability that the cell will continue on the lytic path. On the other hand, if a cell reaches a state where (i) the CI feedback loop is established, and (ii) P_{R} and P_{L} are repressed, there is a high probability that the cell will continue to lysogeny (McAdams and Shapiro 1995).
SOURCE DATA AND GENETIC CIRCUIT MODEL
Kourilsky's measurements of lysogeny versus API: We use the experimental assays of percent lysogeny versus API in Kourilsky (1973) to compare with predictions of the stochastic kinetic model. In Kourilsky's experiments, lambda phages were added at various API to exponentially growing E. coli cultures for an incubation time that ensured near 100% phage absorption. Kourilsky's measurements included O^{−} and P^{−} strains incapable of phage chromosome replication. Since the phage chromosome count does not increase, the postinfection distribution of the O^{−} or P^{−} phage particles among the target cells can be computed using the Poisson infection statistics model described below. Selection of the O^{−} and P^{−} mutants for modeling eliminates the need to model chromosome replication.
In Kourilsky's plots of log API versus the log of the percent cells lysogenized, the shape of the rate of lysogenization versus API curves was similar for starved and unstarved cells, but the starved curves were systematically shifted to a 50–100 times higher lysogenization rate with little effect on the qualitative dependence of lysogenization rate on the infection ratio [Figure 2 in Kourilsky (1973)]. The starved cell results with 50–100 times higher rates of lysogeny are used for comparison since the number of simulation runs necessary to estimate the fraction, f, of lysogeny varies as 1/f.
Stochastic kinetic model: The stochastic kinetic model used here to analyze operation of the λ lysislysogeny decision circuit includes the genetic mechanisms and the coupled protein dimerization and degradation reactions shown in Figure 1a. Genetic mechanisms are modeled using explicit, though approximate, reaction models of each submechanism and explicitly including features such as termination sites. Thus, promoter operator sites are modeled using the statisticalthermodynamic approach described by Shea and Ackers (1985). (The stochastic version of the Shea and Ackers model of promoter kinetics is produced by calculating the instantaneous probability of each distinct transcriptionally active state of a promoter using the partition function, and then using this probability in calculating the reaction probabilities for the transcript initiation reactions.) Transcript elongation is modeled as a sequence of individual nucleotide steps. Translation control is modeled as described by McAdams and Arkin (1997). Assumptions and reaction models used for elements of the system are described below and listed in Tables 1 and 2. The reaction models are represented as a set of coupled stochastic kinetic equations.
Analytical solution to such systems of stochastic reaction equations is only practical for simple reaction systems. However, numerical solutions can be computed for complex systems of coupled stochastic reactions using the Monte Carlo
algorithm described by Gillespie (1977). The Gillespie algorithm produces a stochastic realization of the temporal behavior of the system by calculating the probabilistic outcome of each discrete chemical event and the resulting changes in the number of each molecular species. In the application of the Gillespie algorithm to simulation of bacterial regulation, each simulation run provides a representative case of the sequence and timing of events and the regulatory outcome in an individual cell starting from specified conditions. Multiple runs with the same initial conditions (e.g., the same MOI) are used to estimate the probability that cells will enter lysogeny for these conditions. [About
To compare the percent lysogenization predicted by the simulation at various infection levels to experimental observations obtained by Kourilsky (1973), the Poissonweighted average of the probability of lysogeny at each MOI is computed to estimate the expected percent lysogeny as a function of API. The theoretical Poisson probability that a given cell will be infected with exactly MOI = M phage when API = A is
Modeling Assumptions
Dispersion in cell generation times can be neglected. Thus all runs used a cell cycle time of 35 min, consistent with the cell cycle time reported by Kourilsky (1973). E. coli cell generation times are observed to be approximately normal distributed with standard deviation of about 22% of the mean (Plank and Harvey 1979). The neglect of dispersion in cell generation times is equivalent to assuming that any growth raterelated effects in faster growing cells roughly offset the effects in slower growing cells.
The volume of the cell grows approximately linearly from 1 × 10^{−15} to 2 × 10^{−15} liters. (The maximum difference in volume between linear and exponential cell growth models is <6% with negligible effect on simulation results.)
Host housekeeping molecules relevant to phage gene expression and phage protein degradation are constitutively expressed and regulated at constant concentration, which is the same in all cells. This implies, for example, that all enzymes required for metabolic pathways, etc., are expressed at levels consistent with a healthy bacterium and that cytoplasmic concentrations of proteases, RNAP, ribosomes, and metabolic substrates are maintained during the early postinfection period when the lysislysogeny decision is being resolved. Both RNA polymerase and ribosomes are present in the cell in relatively large numbers, however, the free polymerase and ribosome concentrations are thought to be a fraction of the total and to be buffered by exchange with units that are engaged in other reactions. Under these conditions, fluctuations in polymerase and ribosome concentrations would be relatively small.
Regulatory effects of host proteins such as integration host factor and RNase III on phage λ gene expression are assumed to be equivalent for all cells and constant over time. For example, the effect of integration host factor on P_{L} activity (Giladiet al. 1990) is assumed to be included in the kinetic parameters of the promoter and to be independent of phage MOI.
Effects such as macromolecular crowding or twostep binding to DNA or RNA that might affect reaction kinetics are assumed to be subsumed into the kinetic parameter characterizing the reaction.
Intermediate reactions (as with sigmafactors or other subelements) in assembly of the RNAP and ribosome complexes are not rate limiting. Instead, we assume either (i) that the host cell maintains an effective concentration of transcriptionally and translationally available concentrations of these molecular complexes that are in rapid exchange with their binding sites on the DNA or RNA, or (ii) that the component subunits are in rapid equilibrium with functionally active assemblies (Shea and Ackers 1985; Ptashne and Gann 1997). The ratelimiting step in transcript initiation is assumed to be the closed to opencomplex isomerization reaction (McClure 1980).
Phage gene expression is stochastic, consistent with the mechanisms described by McAdams and Arkin (1997).
An average of 10 proteins are produced per transcript for all genes (Kepes 1963; Shea and Ackers 1985). Transcript degradation rates and ribosome binding rates are chosen to produce that average yield.
In Kourilsky's experiment the E. coli cells were unsynchronized, hence they were presumably infected at random times in the cell cycle (Kourilsky 1973). We assume all infections occur early enough in the cell cycle so that cell growth only affects operation of the decision logic by dilution effects on concentrations of phageencoded molecules. The initial rates of phage protein production from P_{R} and P_{L}initiated transcripts in each host cell are independent of cell volume. However, for the same rate of protein production, the consequent rate of change in phage protein concentration is cell size dependent so that timing of subsequent events could be slowed somewhat for larger cell size at infection time. Most cells that are fated to become lysogens are committed by 10 to 15 min (causing, for example, the cessation of Cro_{2} production shown in Figure 3c). Most infections early in the cell cycle are thus resolved before the next cell division. For infections occurring late in the cycle, if commitment has not occurred before division, the phage chromosomes and proteins at division are randomly shared between the daughter cells when the cell divides. Then the phage infection continues, but with lower MOI. For the O^{−} or P^{−} phage mutants, the average postdivision MOI is halved, since phage chromosome replication is not possible. Halving the MOI reduces the probability of lysogeny in the daughter cells. This suggests that neglecting cell division leads to some degree of overestimation of the probability of lysogens in the simulation.
The target cells are infected effectively simultaneously so that no temporal infection effects or phage infectiondependent immunity occurs.
The cell is assumed to be a homogeneous, wellstirred medium so the concept of “protein concentration” is valid and spatial effects are averaged out. [E. coli signaling proteins have been shown to diffuse distances comparable to the cell dimensions in much less than a second (Ishiharaet al. 1983).]
A cell becomes committed to lysogeny if there is (i) a sufficient timeintegrated concentration of CII to activate P_{RE}, and (ii) [CI_{2}] > [Cro_{2}] at the end of the 35min cell cycle. Activation of P_{RE} was defined as an average activation level of one opencomplex per 2 min over a contiguous 4min period. This level of CII production would also activate the other CIIdependent λ promoters, P_{antiQ} and P_{I}, that function in execution of the lysogenic pathway (McAdams and Shapiro 1995). CI_{2} concentration greater than that of Cro_{2} at the end of the cell cycle is an additional indication that activation of P_{RE} occurred early enough and was productive enough to lock on the CI feedback loop.
Reaction Models
The following paragraphs describe the models used for the mechanisms of the lysislysogeny decision circuit. Reactions and parameters are listed in Tables 1, 2 and 3. Parameters in the kinetic model are derived from the sources cited in Tables 1, 2 and 3. Considerations underlying selection of the CII and CIII proteolysis reaction models and rate parameters are described below.
Phage gene expression: The genetic mechanisms associated with transcript initiation and translation control produce the largest component of the stochastic effects that lead to divergent phenotypes in the λ infection system. The following genetic reactions are modeled: operator/promoter binding, transcript initiation, transcription, initiation of translation, translation, and initiation of mRNA degradation. The transcription model includes mechanisms for the two RNAP termination sites, T_{R1} and T_{L1}, and antitermination at the NUT_{R} and NUT_{L} sites (Figure 1a).
Operator/promoter binding and control of transcript initiation: Shea and Ackers (1985) describe a statistical mechanical/thermodynamic approach to modeling the P_{R}/P_{RM} promoter complex. We use the same approach to model P_{L} repression by CI_{2} and Cro_{2} and P_{RE} activation by CII using the parameters in Table 1. The instantaneous probability of each distinct occupancy state of a promoter is assumed to be determined by the partition function defined in accord with the Shea/Ackers formulation, and we use the probability in the stochastic formulation of kinetics as defined by Gillespie (1977, 1992b). A key assumption is that effector molecule binding reactions at a promoter occur much faster than the rate of transcript initiation at the promoter. With this rapid equilibrium assumption, the binding state of the promoter is modeled by randomly choosing the promoter state at each instant using the probabilities given by the partition function (Shea and Ackers 1985). If the promoter state selected is one from which RNAP can initiate transcription, then that transcript initiation reaction is included in the list of possible reactions for the next Monte Carlo calculation in accord with the Gillespie algorithm. Whenever the Monte Carlo calculation determines that a transcript is initiated from one of the promoters, a new transcribing RNAP is “initiated” on the corresponding DNA transcription object. The promoter activation functions in Figure 2 show the resulting average rates of transcript initiation as a function of CI_{2} and Cro_{2} concentrations. Occlusion of the promoter site by the footprint of a recently launched RNAP is included in the model.
Transcription: A transcript elongation model estimates the time delays between transcript initiation and arrival at the end of each coding region on the operon. This delay, plus the delay until an effective level of signaling molecules is accumulated, determines the timing of regulatory molecule concentrations that control regulatory networks. The movement of transcribing RNAP along the DNA is modeled as a sequence of independent onenucleotide reaction steps. Each such reaction is assumed to be unidirectional, that is, RNAP movement is assumed to be strongly forwardbiased. It is assumed that there is a single ratedetermining reaction for each RNAP step and that each forward step has constant probability of occurring per unit time, leading to an exponential distribution of interstep times. The exponential is characterized with an average steptime parameter. The same average step time was used at each nucleotide position and for coding and noncoding regions, neglecting the differences between transcription rates for different nucleotides. (Transcription through termination and antitermination sites is described below.)
Termination: At termination sites, transcribing RNAPs slow down (i.e., the average interstep time parameter is larger) and there is a probability of transcript termination at the site. When the RNAP is antiterminated upstream of the terminator site, the termination site is then modeled as the “normal” DNA described above.
Antitermination: The reactions to assemble the antiterminated form of RNAP at NUT_{L} and NUT_{R} sites (Figure 1a) depend on λ N protein concentration. The antitermination reaction complex also involves Rho and at least four additional host factors: NusA, NusB, NusG, and S10 (Whalenet al. 1988; Mason and Greenblatt 1991; Liet al. 1992; DeVito and Das 1994). These factors are assumed to be constitutively expressed and present in the necessary concentrations so that the concentration of N is limiting. In vitro studies suggest that in the presence of the proper host factors, the fractional readthrough of a downstream terminator increases in proportion to the concentration of N until there is full antitermination at N concentrations between 50 and 100 nm (Whalenet al. 1988; Liet al. 1992; DeVito and Das 1994). Under these conditions the antitermination process at the NUT site is modeled as a singlestep reaction assumed to be a pseudofirstorder reaction with the rate chosen such that antitermination of RNAP is near 100% for N concentrations above about 75 nm. This parameter choice fits the experimental dependence of fractional readthrough of a downstream terminator as a function of N concentration (Whalenet al. 1988; Liet al. 1992; DeVito and Das 1994).
Translation: Translation control is modeled as described by McAdams and Arkin (1997), based on the mechanism described by Yarchuk et al. (1992). In that model, a competition between ribosome and RNase E binding determines the average number of proteins produced per transcript. In the stochastic kinetic model of the λ circuit, this ribosomeRNase E competitive binding reaction is treated as a stochastic chemical reaction. The temporary occlusion of the ribosome binding site after a successful ribosome binding event is modeled. Motion of a translating ribosome on a transcript is modeled similarly to the model of motion of a transcribing RNAP on DNA described above. If one ribosome by chance overtakes another in the model, the progression of the former is halted until the latter moves ahead. The average ribosome step time is selected to be shorter than the RNAP steptime parameter, producing ribosome queuing as is observed (Kennell and Riezman 1977; Yarchuket al. 1992). As with the transcription model, the statistics of the interstep times are assumed to be described by the exponential probability function.
Phage protein dimerization and degradation: The principal reactions involving phageencoded proteins in the decision circuit are identified in the boxes labeled R1 to R5 in Figure 1a. Reactions in R1 include degradation and dimerization of CI; R2 includes dimerization and degradation of Cro; R3 and R4 include competitive degradation of CII and CIII by the two host cell proteases (see below); and R5 includes degradation of N.
CI and Cro dimerization and degradation: Degradation of CI and Cro is modeled as occurring predominantly by proteolysis of the monomeric form, a common degradation mode for multimeric proteins (Shea and Ackers 1985; Gottesman and Maurizi 1992). The dimerization reactions are assumed to be characterized by fast forward and reverse reaction rates whose ratio, the dissociation constant, is 20 nm (Shea and Ackers 1985). The dynamics of decay of multimeric protein populations is not exponential in general. Rather, the dynamics and specifically, the measured lifetimes, depend strongly on the forward and reverse rates of the multimerization reactions and on the initial concentration of each subspecies. Reaction parameters for the stochastic kinetic model here were selected to match experimental measurements of lifetimes.
Degradation of CII and CIII: Two membranebound protein complexes, HflA (Chenget al. 1988) and HflB (Banuettet al.1986; Hermanet al.1995; Kiharaet al. 1997), act todegrade CII. HflB is thought to degrade CIII (Hermanet al. 1995). Initial studies identified HflA as a nonessential membranebound GTPutilizing protease (Chenget al. 1988; Zorick and Echols 1991; Nobleet al. 1993). However, later experiments report that HflA is probably a regulator of HflB activity and that HflB is the major protease responsible for CII degradation. Additional evidence for proteolytic activity of HflB is provided by in vitro experiments with purified protein (Hermanet al. 1995; Kiharaet al. 1997; Shotlandet al. 1997). The Hfl proteolytic system has other host protein targets in addition to CII (Cheng and Echols 1987; Hermanet al. 1993; Hermanet al. 1995). Both HflA and HflB respond to host environmental signals. There is some evidence that HflA activity is directly or indirectly affected by the cataboliteactivating protein/cAMP system, which has been shown to reduce proteolytic activity in response to carbonsource starvation (Hoytet al. 1982; Banuett and Herskowitz 1987). CIII protects CII from proteolysis (Hoytet al. 1982; Rattrayet al. 1984; Banuettet al. 1986) even in the absence of HflA and HflB activity, which implies the existence of yet another proteolytic pathway for CII degradation (Kiharaet al. 1997). In summary, although phenomenology of the proteolysis of CII and CIII is relatively well characterized, the exact mechanisms whereby HflA, HflB, and perhaps another unidentified protein control degradation of CII and protection of CII by CIII are not known.
The halflife of unprotected CII has been observed to be anywhere from 5 min (Hoytet al. 1982) to less than 30 sec depending on conditions (Rattrayet al. 1984). The short halflife of CII and the relatively low concentrations of CII protein and HflA/B suggest that binding of CII to the Hfl proteins is tight and fast. Two alternative mechanisms have been hypothesized for the protection of CII by CIII (Chenget al. 1988): (i) competitive binding of CII and CIII to the Hfl (and perhaps another) proteolytic complex, or (ii) direct CIII binding to CII to form a proteolysisresistant complex. Available experimental data does not differentiate between the two alternatives. In order to select among candidate models, we investigated alternative reaction mechanisms seeking a reaction system that meets four constraints: (a) yields a 2min halflife for CII in the range of initial concentrations spanning 50–100 nm (Gottesman and Gottesman 1981; Chenget al. 1988), (b) produces an approximately 6min halflife for CIII in the absence of CII (Kornitzeret al. 1991a), (c) yields CIII protection of CII consistent with Hoyt et al. (1982) and Rattray et al. (1984), and (d) functions in the overall simulation model to produce the simulated percent lysogeny as a function of API consistent with Kourilsky (1973). Constraint (d) proved the most restrictive. The best satisfaction of the constraints (a) through (d) was obtained using a proteolytic system in which CII and CIII are competitive substrates for two independent proteases. The resulting reaction mechanism is shown in Figure 1a and rate parameters used are in Table 3. One protease (called P1) is more specific than the other but saturates at very low concentrations of CII; the other (called P2) is only slightly less specific, does not saturate, and has a much higher maximal activity. These proteases correspond to HflB and the putative second protease identified by Kihara et al. (1997).
Degradation of N: The halflife of the antiterminationcontrolling protein N is approximately 5 min. Degradation is by the Lon protease, and Lon is also thought to be responsible for degrading the Hfl proteins (Gottesman and Gottesman 1981). Reported data on N degradation fit a firstorder decay curve (Gottesman and Gottesman 1981); thus N probably does not saturate the protease. Accordingly, N degradation is modeled as a firstorder reaction.
Cell growth: The linear cell growth assumption was implemented as a constant probability of adding a small fixed volume increment each instant of time. Each run was started at an initial cell volume of 1 × 10^{−15} liter and continued until the volume doubled to 2 × 10^{−15} liter over 35 min of simulated cell time.
RESULTS
Time course of pathway selection: Figures 4 and 5 show the temporal trajectory of the concentration of key protein molecules in one lytic and one lysogenic case selected from runs at MOI 6. (Figure 2c is based on the same two cases.) The two cases show the randomness in the intracellular regulatory protein concentration trajectories and the differences in the trajectories for the divergent developmental paths possible in two initially identical cells. Of the phageencoded proteins shown in Figures 4 and 5, Cro_{2} and CII are expressed earliest in both the lytic and lysogenic cases. Cro_{2} appeared within 1 min of infection (Figure 5b) and CII appeared within 2 min (Figure 4a). Protein expression in the two cases began to diverge after about 5 min. Both the lytic and lysogenyfated cases experienced a nearly equal burst of CII production at this time (Figure 4a), however, in the lysogenyfated case, there was a simultaneous burst of CIII production (Figure 4b). So lysogeny resulted in this case because, by chance, the bursts of CII and CIII were both large and simultaneous so that CII degradation was slowed and it survived long enough to activate P_{RE} and kickstart CI production. Figure 4c shows that the CII/CIII proteases were strongly inhibited by the bursts of CIII production in the lysogenic case. CI_{2} concentration (Figure 5a) in the lysogenic case began to grow at about 12 min just after CII concentration peaked. The growing CI_{2} concentration repressed P_{R} and stopped Cro production. As a result, the Cro_{2} concentration declined in the lysogenyfated case after 12 min (Figure 5b). In contrast, in the lyticfated case no CIII production occurred so the unprotected CII rapidly degraded and did not activate P_{RE} enough to start the CI expression feedback loop. Without expression of CI, Cro_{2} production continued (Figure 5b) and lysis ensued.
Figure 2c shows the intracellular CI and Cro dimer concentration trajectory for the lysogenicfated case at MOI 6 (identical data as Figure 5) superimposed on the P_{R} and P_{RM} promoter activation contours. The CI_{2} repressor concentration began to autoregulate its own concentration 20 min after infection; thereafter, the CI_{2} concentration remained constant and P_{RM} activation slowly increased as Cro_{2} concentration was diluted in the growing cell (Figure 2c, arrow). The concentration trajectory for the lytic case in Figure 5 is not shown in Figure 2c to avoid confusing the figure. However, the oval on Figure 2c indicates the region where the concentrations stabilized at 12 min after infection.
Estimated statistics of concentration trajectories: The Monte Carlo solution to the stochastic kinetic equations produces a database of representative timedependent samples of the concentration trajectories as the infection progresses for each molecular species in the reaction system. Analysis of this database provides estimates of the statistical parameters of the infection progress in cells with corresponding initial conditions, e.g., a particular MOI. Figure 3a shows the estimated statistical distribution of the CI_{2} and Cro_{2} concentration trajectories for the subset of cells at MOI 6. (For all plots in Figure 3, the bold lines are the average concentration of the indicated species and the lighter lines are the ±1σ range.) The lysis and lysogenyfated subsets shown in Figure 3, b and c, each experience a different pattern of Cro_{2} and CI_{2} concentration growth statistics, distinct from each other and from the combined statistics. Figure 2d shows the same average Cro_{2} and CI_{2} concentration trajectories for the lysogenicfated and lyticfated cases at MOI 6 superimposed on the P_{R} and P_{RM} promoter activation contours.
Lysogenic fraction: Kinetic model estimates compared to experiment: The experimental lysogeny fraction data shown in Figure 6b for starved O^{−} (□) and P^{−} (○) mutants are from Figure 2 of Kourilsky (1973). For the higher API values in Figure 6b, Poisson statistics of infection (Equation 1) predicts that some cells would have quite high infection levels. At some infection level, the phage processes must become disruptive to the host cell processes so that the assumptions underlying the kinetic model become invalid. To address this possibility we solved for two sets of CII/CIII proteolysis parameters: first, for the best match to the experimental O^{−} and P^{−} data for all available API (Figure 6a, labeled “Full”, symbol: ▪), and, second, for the best match considering only API values ≤6 (Figure 6a, labeled “Lower”, symbol: ▾). Points in Figure 6a (▪ and ▾) reflect the estimated probability of lysogeny in individual infected cells vs. MOI from solution of the stochastic kinetic model equations for these two different choices of Hfl parameters. The vertically hatched area in Figure 6b indicates the range of difference between the resulting estimates of the fraction of lysogens. The hatched area in Figure 6a indicates the corresponding range of differences in the probability of lysogeny vs. MOI resulting from the different proteolytic model parameters. Both curves in Figure 6a show negligible lysogenization at MOI < 3 and a rapid increase in lysogeny for MOI > 3. Corresponding points in Figure 6b yield the solid lines bounding the hatched region. These estimates of the fraction of lysogens in an infected cell population versus API are calculated as the Poissonweighted sum of points for different MOIs in Figure 6a for corresponding cases using Equation 2. The match with experiment is good at the critical low MOI values, but falls above observed values at high MOI. We attribute the overestimation of the percentage lysogeny at high API in Figure 6b predominantly to disruption of host cell processes at high infection levels.
The three curves labeled Poissonn show the hypothetical fraction of lysogens vs. API expected for a “threshold model,” where all cells with MOI ≥ n are assumed to become lysogens. The solution of the stochastic kinetic model exhibits rapid onset of lysogeny for MOI > 2 (Figure 6a), representing an approximation to a threshold process in the decision circuit produced by the reinforcing effects of production from multiple promoters and earlier antitermination as MOI increases. At low APIs, both the experimental points and the stochastic model predictions for the O^{−} and P^{−} mutants lie between the idealized threshold model predictions for thresholds at MOI 3 and MOI 4.
Digital mutants: Additional tests of the kinetic model by predictions of other experimental observations are needed; however, we are unaware of additional, independent measurements for similar strains and conditions. Accordingly, we include in Figure 6c testable predictions of rates of lysogeny for several “digital mutants” based on changes in the stochastic kinetic model reflecting several mutant cases. The curve labeled O^{−}N^{−} in Figure 6c reflects our prediction of the percent lysogeny for a digital mutant with the function of the N protein disabled in the kinetic model and all other parameters as for the curve labeled “Full”. (We use the “O^{−}” notation to indicate replication deficient, i.e., either O^{−} or P^{−}, mutants.) This is the prediction of percent lysogeny from the kinetic model for a starved O^{−}N^{−} mutant; the curve labeled N^{−}/50 is the corresponding prediction for an unstarved O^{−}N^{−} mutant. [The “unstarved” estimate is derived by dividing the “starved” estimate by 50, consistent with the observation by Kourilsky (1973) that starved cells systematically exhibited increased lysogeny by 50–100×.] The predicted level of lysogeny is reduced for the O^{−}N^{−} case because the transcribing polymerases are never antiterminated and production of CII and CIII is always lower than when N production leading to RNAP antitermination is possible.
The curve in Figure 6a labeled “O^{−}Coop^{−}” shows the predicted probability of lysogeny for a digital mutant where CI binding to the P_{R}–P_{RM} operator sites is made noncooperative in the kinetic model, reducing the effectiveness of positive autoregulation of P_{RM}. The predicted experimental fraction of lysogens (not shown) is close to the curves for the O^{−}N^{−} cases in Figure 6c. The dashed line labeled O^{−}T^{−} in Figure 6c is the estimate for another starved digital mutant with the T_{R} and T_{L} termination sites disabled (the line labeled O^{−}T^{−}/50 is for unstarved mutants). The reduced slope of lines for the O^{−}T^{−} mutant in Figure 6c is due to the predicted increase in the estimated probability of lysogeny for cells with MOI 1 and 2 for this mutant as shown in Figure 6a.
DISCUSSION
Stochastic gene expression and competitive genetic regulatory mechanisms: In preceding sections a stochastic kinetic model of the λ lysislysogeny genetic regulatory circuit is used to estimate the dynamical behavior of the circuit, including effects of random patterns of gene expression. The random developmental path choice between the lysogenic or lytic path in individual cells was shown to result from the inevitable fluctuations in the temporal pattern of protein concentration growth caused by the molecularlevel thermal fluctuations in rates of ratedetermining reactions within gene expression mechanisms. The resulting differences in concentration between the regulatory proteins controlling the bistable switching elements of the decision circuit led to different path selections in different cells. The estimated variation with API of the fraction of a phage λinfected cell population that become lysogenic was shown to be consistent with experimental observations.
This analysis indicates how molecular level thermal fluctuations can be exploited by the regulatory circuit designs of developmental switches to produce different phenotypic outcomes. Such regulatory mechanisms will produce diverse phenotypes even in clonal cell populations maintained in the most homogeneous laboratory environments. In such systems environmental signals can act on the parameters of the regulatory circuit to bias the probabilities of path choice under different conditions.
Even in nondifferentiated cell populations, the concentration of regulatory proteins within individual cells will vary widely from the average concentration measured by laboratory procedures. In situations where there are divergent phenotype subpopulations as analyzed herein, the concentration trajectories of each subpopulation can differ radically from each other and from the average measured for the full population (Figure 3). For these situations, experimental methods (such as various cell sorting techniques) that profile the distribution of individual cell parameters are necessary. In any case, the function of regulatory circuits that determine cell fates is determined by protein concentrations within each cell, and the temporal pattern of these intracellular concentration trajectories is completely different from any averaged measurement (compare Figure 3 with Figures 4 and 5).
Role of termination sites in the λ circuit: The simulation results with hypothetical “digitalmutations” affecting termination effectiveness of the T_{R1} and T_{L1} termination sites demonstrate that both removal of these sites and removal of the antitermination effects of the N protein have major effects on the level of lysogeny, but in opposing directions (Figure 6a). We conjecture that the function of these phageencoded features in the phage regulatory circuit design may be to adjust the level of the phage lysogenic response to an “optimum” range for phage survival.
Other stochastic switching mechanisms: Although the specific analysis reported here deals with regulation of the phage λ infection, regulatory circuits based on bistable genetic regulatory mechanisms are used in many organisms to produce subpopulations of distinct phenotypes by random phenotype switching. Table 4 shows a small sample of wellknown cases of bistable regulatory mechanisms in regulatory circuits that produce stochastic phenotype outcomes. Many examples are found in pathogenic organisms. Conventional deterministic kinetics does not model statistics of regulatory systems that produce probabilistic outcomes. A stochastic kinetic analysis as used in this paper for the λ decision circuit can be used to predict statistics of regulatory outcomes for these stochastically regulated systems. Such stochastic kinetic analyses may also permit improved exploitation of information in the statistics of phenotypic outcomes.
For bistable switching phenomena that are integral to the operation of the cell, some complications encountered in analysis of the λ decision circuit would not be present. One example is the uncertainty in phage gene dosage and of timing of infection in the cell cycle due to the random nature of the phage infection process. Also, phage infection is disruptive of “normal” cell processes, whereas the cell's integral processes would presumably be less so.
Switching dynamics: We expect that many aspects of the λ switch dynamics will be found in other multipotent regulatory switches. Examples include: (i) transient, lowlevel expression of key regulatory proteins (as with CII, CIII, and N), (ii) stochastic progress toward commitment as concentrations of the controlling proteins in each cell change from moment to moment and the probability of each potential outcome also changes until eventually the cell's fate is determined, and (iii) design features that adjust timing of reactions as with the delay of CII and CIII concentration growth produced by termination sites T_{L1} and T_{R1}.
Stochastic gene expression and regulatory determinism: If random processes of gene expression tend to make the pattern of protein production inherently erratic, how do cells achieve the regulatory determinism necessary for most functions? One possibility is that the overall regulatory architecture can suppress deleterious effects of molecularlevel reaction fluctuations. For example, consider a case where expression of a protein is controlled by a ratelimiting mechanism acting at the level of translation. For both the conventional and the stochastic formulation of chemical kinetics, the kinetics of the ratelimiting step(s)will dominate the overall kinetics of a series of cascaded chemical reactions. So, stochastic processes affecting transcription control, posttranscriptional editing, or message transport may be screened by the final translation control mechanism. Another possibility is that the stochastic pattern of signal protein production may only cause uncertainty in timing of regulatory events, not uncertainty in outcome. Within broad limits the duration of many cellular functions may be less important to proper cellular function than the proper sequencing of events. For example, cells halt at various checkpoints until conditions (e.g., restoration of essential nutrients, completion of precursor cellular events) for further progress are satisfied (Hartwell and Weinert 1989; Kaufmann and Paules 1996; Wells 1996). In this case, the indeterminism relates to whether the cell will progress or not progress along a developmental path at any instant, rather than to a choice among alternate stable pathways. So, the regulatory decision logic is: “HALT until CONDITIONS are met then PROCEED,” where “CONDITIONS” are sensed environmental or cellular signals. The result is dispersion across the cell population in the rate of progression along prescribed pathways rather than dispersion in outcome.
Acknowledgments
This work was supported by Office of Naval Research Grant N000149610564. A.A. was partially supported by National Science Foundation grant CHE9109301. The work was also supported in part by a grant of computer time from the Department of Defense High Performance Computing Centers at Eglin Air Force Base and Maui. We thank Lucy Shapiro for valuable comments on the manuscript.
Footnotes

Communicating editor: R. S. Hawley
 Received March 5, 1998.
 Accepted April 30, 1998.
 Copyright © 1998 by the Genetics Society of America