## Abstract

Evolutionary explanations for the origin of modularity in genetic and developmental pathways generally assume that modularity confers a selective advantage. However, our results suggest that even in the absence of any direct selective advantage, genotypic modularity may increase through the formation of new subfunctions under near-neutral processes. Two subfunctions may be formed from a single ancestral subfunction by the process of fission. Subfunction fission occurs when multiple functions under unified genetic control become subdivided into more restricted functions under independent genetic control. Provided that population size is sufficiently small, random genetic drift and mutation can conspire to produce changes in the number of subfunctions in the genome of a species without necessarily altering the phenotype. Extensive genotypic modularity may then accrue in a near-neutral fashion in permissive population-genetic environments, potentially opening novel pathways to morphological evolution. Many aspects of gene complexity in multicellular eukaryotes may have arisen passively as population size reductions accompanied increases in organism size, with the adaptive exploitation of such complexity occurring secondarily.

EUKARYOTIC gene regulation is a remarkably complex process, with each gene displaying multiple functions in discrete tissues and times during development. To accomplish such tasks, the noncoding DNA of individual genes often harbors numerous small *cis-*acting elements that cooperatively interact with multiple *trans-*acting factors to tune levels of transcription (Davidson 2001). Mutations in these regulatory regions may influence many aspects of phenotypic evolution by imposing the loss or gain of gene expression (Raff 1996; Gerhart and Kirschner 1997; Force* et al*. 1999; Carroll 2001; Carroll* et al*. 2001). Over evolutionary time, an increase in the particulate nature of gene regulation seems to correlate with the subdivision and specialization of body plans of multicellular organisms, leading to organisms in which traits are capable of following independent evolutionary trajectories (Wagner 1996; Wagner and Altenberg 1996; Raff and Sly 2000). Increases in the particulate nature of gene regulation that affect the structure of developmental networks may be thought of as increases in genotypic modularity, while the subdivision and specialization of body regions at the phenotypic level may be thought of as increases in phenotypic modularity. Although it is tempting to view regulatory-region complexity as a prerequisite for the adaptive origin of morphological complexity, the causal link between genotypic and phenotypic modularity remains unclear, and a formal theoretical framework for the evolutionary origin of regulatory gene structure remains to be developed.

Renewed interest in the evolutionary fates of duplicate genes (Piatgorsky and Wistow 1991; Clark 1994; Hughes 1994; Walsh 1995; Sidow 1996; Nowak* et al*. 1997; Wagner 1994, 1998; Force* et al*. 1999; Stoltzfus 1999; Lynch and Force 2000; Lynch* et al*. 2001; Wagner 2001; Rodin and Riggs 2003) has resulted in the development of models that explicitly incorporate the complex, multifunctional organization of eukaryotic genes. For example, under the duplication-degeneration-complementation (DDC) model (Force* et al*. 1999; Lynch and Force 2000; Lynch* et al*. 2001), genes are posited to contain independently mutable subfunctions that can be partitioned among descendant copies following a gene-duplication event. A *gene subfunction* has been defined as an independently mutable function of a gene that falls into a distinct complementation class (Force* et al*. 1999). The defining characteristic of a subfunction is not the number or types of its DNA components, but their integrated operation in performing a task that is mutationally independent of other suites of functionally integrated elements acting at the same locus. A subfunction component may correspond to regulatory elements (*e.g.*, transcription-factor binding sites), splice junctions, mRNA stability elements, and/or coding regions (*e.g.*, functional motifs), among other possibilities (Force* et al.* 1999, 2004).Under the general DDC model, a variety of population-level mechanisms may lead to duplicate-gene preservation and the partitioning of gene subfunctions (Piatgorsky and Wistow 1991; Hughes 1994; Force* et al*. 1999; Lynch and Force 2000; Lynch* et al*. 2001; Adams* et al*. 2003; Rodin and Riggs 2003). However, although these mechanisms might explain the evolutionary fates of a large fraction of duplicate genes in metazoans and vascular plants, they also beg the question—How do new gene subfunctions arise in the first place? In this article we present a model for the origin of new regulatory subfunctions by a near-neutral process via cycles of information accretion and loss at individual loci. Our intention is to show how mutation, duplication, and genetic drift can drive the evolution of genotypic modularity.

In this article, we restrict our attention to the evolution of new regulatory subfunctions. Several studies on the structure of genetic networks and regulatory regions have provided clues as to the proper structure of models for the evolution of modularity at the level of gene regulation. Small regulatory elements can arise by *de novo* mutation or by transpositional insertion, providing many potential degrees of freedom for altering the number and type of transcription-factor binding sites (Arnosti* et al*. 1996; Wray 1998; Yuh* et al*. 1998; Brosius 1999; von Dassow and Monro 1999; Edelman* et al*. 2000; Stone and Wray 2001; MacArthur and Brookfield 2004). If various permutations of such elements provide for functionally equivalent outputs of a gene (see Edelman* et al*. 2000), then it follows that the stochastic turnover of control elements by nearly neutral processes may play a critical role in the evolution of regulatory regions (Bonneton* et al*. 1997; Ludwig* et al*. 1998; Hancock* et al*. 1999; Ludwig* et al*. 2000; McGregor *et al*. 2001; Shaw* et al*. 2001). Although some regulatory-region structures may endow their associated alleles with higher fitness than others, the range of effectively equivalent states will necessarily increase in populations with smaller size where the efficiency of selection is diminished and the intrusion of new transcription-factor binding sites into the system will proceed passively.

## A CONCEPTUAL MODEL FOR SUBFUNCTION FISSION

There are two broad views of how new regulatory elements may be incorporated into genes to form new regulatory subfunctions: *subfunction cooption* and *subfunction fission* (see Raff 1996; Carroll* et al*. 2001; Davidson 2001; Force* et al*. 2004). Subfunction cooption involves the evolution of a new function not carried out by the ancestral gene, whereas subfunction fission involves subdivision of a function already present in the ancestral gene. The effects of subfunction cooption may range from the altered expression of a single structural gene to the dramatic activation of a whole developmental pathway in a new location. Although cooption of *new* gene functions and expression domains is widely discussed in the literature (see Raff 1996; Carroll* et al*. 2001; Davidson 2001) and has undoubtedly played an important role in the evolution of new morphological structures, cooption may not be the most common pathway for the origin of new regulatory subfunctions.

In contrast to subfunction cooption, subfunction fission occurs when *multiple* functions under shared genetic control evolve to be under *independent* genetic control. The pattern of tissue-specific gene expression remains conserved during subfunction fission while the underlying molecular mechanisms for achieving that pattern undergo evolutionary modification. Two general phases may contribute to subfunction fission: (1) the replacement of completely shared regulatory binding sites with independent binding sites to form a semi-independent enhancer and (2) duplication of the semi-independent enhancers, followed by the formation of two entirely independent regions, each critical to a single regulatory subfunction, by complementary degenerative mutations (Figure 1). Phase 1 involves accretion, degeneration, and the replacement (ADR) of ancestral transcription-factor binding sites and phase 2 involves the DDC of binding sites within enhancers. The series of events involved in phase 2 are essentially the same as those underlying the subfunctionalization of gene duplicates, except in this case the duplicated region comprises just the enhancers within a gene.

Under subfunction fission, the new gene architecture diverges beneath a constant phenotype. Despite this initial invariance of expression patterns, subfunction fission may open up previously inaccessible evolutionary pathways by eliminating some pleiotropic constraints associated with shared regulatory regions while creating others. Therefore, the evolutionary potential of such alterations may not be realized until a new selective environment and/or appropriate mix of mutations is encountered, at which point modularity at the level of gene architecture may promote the evolution of phenotypic modularity. However, the model that we present highlights the logical distinction between the causal nonadaptive forces that may lead to the restructuring of genomic architecture and the secondary consequences of such change for phenotypic evolution. We now formalize the theory for the two phases contributing to subfunction fission.

### Phase 1—accretion, degeneration, and replacement:

We start by considering the process by which an allele with two overlapping regulatory subfunctions arises from an allele with one regulatory subfunction (Figure 2). We assume a starting point where several transcription factors are present, some of which are general and some tissue specific. For an initial shared regulatory state in which the same positive transcription factor (TF), TFA, drives the gene's expression in two tissues via the same binding site (A), the ancestral allele has only one subfunction because degenerative mutations in binding site A reduce expression in both domains similarly. We further assume the presence of tissue-specific transcription factors (TFB and TFC), expressed in a complementary manner (one in each tissue) with respect to the ancestral expression of TFA. The existence of these tissue-specific transcription factors provides the essential setting in which an allele using TFA as an activator of expression can give rise to a semi-independently regulated derivative requiring both TFB and TFC (Figure 2). Such a transition requires the addition of binding sites for TFB and TFC, followed by the loss of binding sites for TFA. Although positive Darwinian selection may directly promote subfunction formation, we restrict our attention in this article to a near-neutral process.

If we assume that binding sites for all three types of transcription factor are subject to mutational accretion and degeneration, then there is no permanent allelic state under this model, as the alternative classes of shared and semi-independently regulated alleles are free to drift in frequency. However, for heuristic purposes, we first focus on the case in which the population is initially fixed for the shared regulated allele and evaluate the expected time to a transient state of fixation by the semi-independently regulated allele. Even for the simple model outlined above, there are nine alternative allelic states (Figure 3), eight representing all possible permutations of the three types of transcription-factor binding sites (A, B, and C) and an additional coding-region null allele. We assume that each transcription-factor binding site is added to an allele at rate μ_{a} and deleted at rate μ_{d}. We denote the presence of a site with an uppercase letter and the absence of a site with a lowercase letter. Thus, for example, the *Abc* allele containing only binding site A becomes either an *ABc* allele at rate μ_{a} or an allele without any sites *abc* (but still having an intact coding region) at rate μ_{d}. In addition, each expressed allele mutates to the coding null class, denoted by *xxx*, at rate μ_{c}. The *abc* and *xxx* alleles are functionally equivalent but differ in their ability to mutate back into a viable state.

To simplify the following small-population-size approximation, we exclude all nonfixable classes of alleles. Only alleles containing minimally an A site or both B and C sites may go to fixation, so this reduces our consideration to just the *Abc*, *ABc*, *AbC*, *ABC*, and *aBC* alleles. It is convenient to further group these alleles into four classes: the W class containing the *Abc* allele, the X class containing the *ABc* and *AbC* alleles, the Y class containing the *ABC* allele, and the Z class containing the *aBC* allele. The shortest path from an ancestral state fixed for the shared regulated *Abc* allele to a fixed state involving the semi-independently regulated *aBC* allele involves just three sequential transitions: W to X, X to Y, and finally Y to Z. However, many other longer paths can lead to the same end result. Assuming a sufficiently small population size, all transitions between alternative allelic states will proceed independently in an effectively neutral fashion, and a transition matrix can be used to obtain the entire distribution of transition times between these (or any other) two states.

To clarify the definition of the various transition probabilities, we first consider the elements necessary for the shortest (W → X → Y → Z) route. First, because the *Abc* allele mutates to either the *ABc* or the *AbC* allele at rate μ_{a}, and because the expected time to neutral fixation is 4*N* generations (where *N* is the effective population size), the expected transition time from state W to the adjacent state X is [1/(2μ_{a}) + 4*N*], and the approximate per-generation probability of transition between these two states is the reciprocal of this quantity. Second, although the transition to the X class may involve either an *ABc* or an *AbC* allele, in both cases, the formation of the Y class results from the addition of a single binding site, so the approximate per-generation probability of this transition is [(1/μ_{a}) + 4*N*]^{−1}. Finally, the transition from the Y to the Z class involves the loss of the A site, so the approximate transition probability is [(1/μ_{d}) + 4*N*]^{−1}. Conditional on taking the direct route W → X → Y → Z, the mean time for the transition from the *Abc* to the *aBC* state is the sum of the three stepwise transition times. However, because the W ↔ X and X ↔ Y transitions are reversible, the average time to conversion to the semiregulated state is necessarily larger than that obtained by the shortest path.

To account for all potential paths, we use the transition-matrix **P**, with the element in the *i*th row and *j*th column, *P _{ij}*, denoting the probability of transition from state

*j*to

*i*. For our particular application, the four rows and columns are the classes W, X, Y, and Z. Following from the logic developed in the preceding paragraph, the nonzero off-diagonal elements are

*P*

_{WX}= [(1/μ

_{d}) + 4

*N*]

^{−1},

*P*

_{XW}= [(1/(2μ

_{a})) + 4

*N*]

^{−1},

*P*

_{XY}= [(1/(2μ

_{d})) + 4

*N*]

^{−1},

*P*

_{YX}= [(1/μ

_{a}) + 4

*N*]

^{−1}, and

*P*

_{ZY}= [(1/μ

_{d}) + 4

*N*]

^{−1}. All remaining off-diagonal entries are equal to zero, and columns of elements must sum to one, so

*P*

_{WW}= 1 −

*P*

_{XW},

*P*

_{XX}= 1 −

*P*

_{WX}−

*P*

_{YX}, and

*P*

_{YY}= 1 −

*P*

_{XY}−

*P*

_{ZY}. Note that because our interest here is simply in the time to first arrival at state Z, we treat this final state as an absorbing boundary, even though class Z can mutate back to Y, so

*P*

_{ZZ}= 1. The mean time to move from the fixed

*Abc*state to the fixed

*aBC*state is obtained by recursively multiplying the transition matrix by the column vector [

*p*

_{W},

*p*

_{X},

*p*

_{Y},

*p*

_{Z}]

*, where the elements denote the probability that the population is in each state, starting with the vector [1, 0, 0, 0]*

^{T}*. The time to first arrival at the fixed*

^{T}*aBC*state is then 1This is a first-order approximation, as it assumes instantaneous transitions between monomorphic states, ignoring the complexities of the polymorphic transition between fixed states, which can become important at large population sizes (see below).

To examine the validity of this small-population-size approximation (Figures 4 and 5), we compared the analytical results with those obtained by stochastic simulations. In this article we assume an ideal random-mating population where *N* is equal to the effective population size. The simulations kept track of genotype frequencies after random sampling of gametes in diploid, sexual populations. We assumed that homozygous recessive genotypes lacking expression in either tissue had a fitness of zero, whereas all other genotypes were assigned a fitness of one. The simulations were started with the *Abc* allele at a frequency of one and stopped when the sum of frequencies of all A-site-bearing alleles (*Abc*, *ABc*, *AbC*, and *ABC*) equaled zero. Provided that 4*N*μ_{a} ≪ 1 and 4*N*μ_{d} ≪ 1 (*i.e.,* the power of random genetic drift is well in excess of the mutation rates), the purely neutral theory provides a very good approximation to the time to first arrival at the fixed state of the semi-independently regulated *aBC* allele (Figure 4). At any given mutation rates in sufficiently small populations, the transition time is essentially independent of population size, because the rates of movement between alternative states are primarily determined by the waiting times for mutations, rather than by the time for such mutations to drift to fixation. In general, the time to subfunction fission declines as the ratio of μ_{a} to μ_{d} increases, eventually reaching an asymptote at 1/μ_{d} generations at high μ_{a}/μ_{d}, as the final degenerative step involving the Y → Z transition becomes the limiting factor (Figure 5).

The conditions 4*N*μ_{a} ≪ 1 and 4*N*μ_{d} ≪ 1 may be frequently met in eukaryotes. We know that the average value of 4*N*μ, where μ is the substitutional mutation rate per nucleotide, is 0̃.002 for vertebrates, 0̃.010 for invertebrates and land plants, and 0̃.1 for eukaryotic microbes (Lynch and Conery 2003). Thus, given that transcription-factor binding sites typically contain 4–20 nucleotides, we can expect 4*N*μ_{d} to be on the order of 4–20 times 4*N*μ and hence ≪1 for most multicellular species. Although new sites may arise in regions surrounding the existing enhancer, μ_{a} is generally unlikely to greatly exceed μ_{d}; therefore 4*N*μ_{a} is also expected to be ≪1. These rough approximations suggest that the population-genetic environments of most multicellular organisms enable alternative alleles like those in Figure 3 to drift freely back and forth to transient states of fixation in an effectively neutral fashion.

For larger *N*, the neutral theory progressively underestimates the transition time to fixed states (Figure 4). There appear to be two reasons for this behavior. First, because a semi-independently regulated allele has an additional transcription-factor binding site relative to the ancestral shared regulated allele and mutates at twice the rate to a nonfixable allele, the former is at a weak selective disadvantage of order μ_{d}. Second, because of the reversibility of mutations and the extended time to fixation with increasing *N*, arrival at the stopping criterion for our simulations of a completely monomorphic state becomes increasingly unlikely, and focusing on such an extreme state as an indicator of the availability of *aBC* alleles becomes increasingly misleading.

Because large populations will typically harbor polymorphisms involving the full spectrum of alleles in Figure 3, an alternative way to consider the potential for a locus to undergo an ADR transition is to consider the equilibrium distribution of average allele frequencies. To accomplish this, we ran simulations of single replicate populations and averaged the frequencies of the alleles over a large number of generations (Figure 6, top, bottom left). The clear result is that the average frequency of the semi-independently regulated *aBC* allele decreases with increasing population size, irrespective of the specific mutation rates μ_{a} and μ_{d}. Figure 6, bottom right, shows the infinite population size equilibrium frequencies of the four allele classes (see the appendix for the analytical solution). Such behavior results from the increased efficiency of mutationally induced selection against *aBC* alleles at large *N* relative to the less mutationally sensitive alleles *ABC*, *ABc*, *AbC*, and *Abc*. When the rate of accretionary mutation equals or exceeds the rate of degenerative mutation, the redundantly regulated *ABC* allele dominates, as a consequence of the mutational pressure toward gain of binding sites. When μ_{a}/μ_{d} < 1, the shared regulated *Abc* allele dominates for the opposite reason. For all conditions examined, the semi-independently regulated *aBC* allele maintains equilibrium frequencies of at least 0.04, so these results indicate that most large populations are potentially poised to make the transition to a semiregulated state.

One criticism that may be raised against our simple model is that it involves only three binding sites (A, B, and C), whereas most enhancer regions contain multiple copies of different transcription-factor binding sites. To address this concern we ran simulations for a six-site model, in which an allele may contain up to two each of the A, B, and C sites, making a model comprising 65 alleles. Genotypes carrying at least one functional A site or genotypes carrying at least one each of functional B and C sites with zero A sites were assumed to have a fitness of 1. Simulations were initiated with the locus fixed for the *AAbbcc* allele and ended when the frequency of all *A*-bearing alleles had gone to zero. Results for the six-site model show that the behavior is qualitatively the same as that of the three-site model, except that at small population sizes the time for ADR of *A*-bearing alleles is ∼60% shorter (Figure 7, top). While there are two ancestral A sites to lose, four (two B and two C) sites may be gained. The additional viable combinations decrease the time to fission by doubling the rate of addition of B and C sites. In addition, the simulation results show that ∼95% of the alleles at equilibrium exhibit at least partial redundancy with respect to the ancestral A function and ∼12% are of the semi-independently regulated class similar to the results of the three-site model (see Figure 6, top right, and data not shown). This analysis may suggest that enhancers exhibiting at least partial redundancy should be common and that semi-independently regulated alleles may have appreciable frequencies when the rates of addition and deletion are nearly equal.

### Phase 2—the formation of independent modular regulatory regions via enhancer subfunctionalization:

The ADR phase presented above can lead to the formation of two subfunctions with semi-independent regulation. Subsequent ADR events of both positive and negative regulatory elements may act to reinforce the initial fission event. Another type of reinforcing event may involve enhancer duplication. The rate of duplication of entire genes is known to be on the order of 1% per gene per million years (Lynch and Conery 2003), and small duplications of <1000 bp are far more frequent than whole-gene duplication (Katju and Lynch 2003), so internal duplication has the potential to contribute significantly to regulatory-region evolution. Local duplication of a regulatory region may facilitate the formation and preservation of two fully independent regulatory elements by a process that we call *enhancer subfunctionalization*. The end product of this internal duplication event is a state in which each of the duplicate regulatory regions becomes restricted to driving expression in a single tissue.

For simplicity, we again consider a single continuous stretch of DNA, with tissue-specific transcription factors binding to unique B and C sites (see Figures 1 and 8) with a fraction of the DNA-binding sites in the enhancer being required by both overlapping subfunctions. If such an enhancer duplicates to a local site that is completely linked to the ancestral site, subfunctionalization may eventually preserve the duplicate enhancers through complementary degenerative mutations under genetic drift. This process has previously been investigated through theory and simulations for gene duplicates (Force* et al*. 1999; Lynch and Force 2000; Lynch* et al*. 2001), and we follow the previous terminology for enhancer subfunctionalization.

Degenerative mutations in the shared component lead to loss of both subfunctions at rate μ_{c}, while degenerative mutations in the unique component B and C sites lead to the loss of each subfunction at rate μ_{r}. Therefore, the total rate of mutation for the overlapping enhancer corresponding to a subfunction pair is 2μ_{r} + μ_{c}, and the total rate for duplicate enhancer pairs is 4μ_{r} + 2μ_{c}. Assuming that the duplicated regions are entirely functionally redundant, such that alleles carrying the duplicates have identical fitness to those with the ancestral state, the probability of initial fixation of a duplicate enhancer allele is simply its initial frequency, *i.e.*, the neutral expectation, 1/(2*N*), where *N* is the size of the diploid population. Subfunctionalization requires that the first degenerative mutation to become fixed falls in a B or C site, the probability of which is 2μ_{r}/(2μ_{r} + μ_{c}). Finally, the last mutation to fix must fall in the nonshared unique site remaining in the still intact enhancer, and this will occur with probability μ_{r}/(2μ_{r} + μ_{c}) in populations of sufficiently small *N*. The product of the three terms is the probability that a newly arisen allele with a duplicate enhancer will be converted to a state of a nonoverlapping pair of enhancers by degenerative mutation, 2where α = μ_{r}/(2μ_{r} + μ_{c}).

As noted previously, this simple approach fails when μ_{c}*N* begins to exceed ∼0.1 (Lynch and Force 2000; Lynch* et al*. 2001). Subfunctional alleles mutate to null alleles at a higher rate, μ_{c}, than nonfunctional alleles carrying an intact overlapping enhancer and one dead overlapping enhancer. The mutation rate difference is small and begins to significantly affect the dynamics when the effective population size is large enough. In the appendix, we derive two estimates for the probability of enhancer subfunctionalization that account for this change in behavior in large populations. The behavior of the approximations was verified by individual-based computer simulations incorporating the sequential processes of mutation, selection (against lethal null homozygotes), and random gamete sampling using the same procedures outlined in our previous work on gene duplication (Figure 9; Lynch* et al*. 2001). For any given set of parameters (*N*, μ_{r}, and μ_{c}), 10^{5} independent simulations were evaluated to obtain a precise estimate of Pr(Sub).

Because of the inverse scaling of Pr(Sub) with *N*, it is convenient to focus on the scaled probability of subfunctionalization, θ_{Sub}, which is the ratio of the actual probability of subfunctionalization and the neutral probability of fixation 1/(2*N*). Provided *N*μ_{c} < 0.1, θ_{Sub} is very close to the prediction of the small-population theory, α^{2}/*N* (Figure 9). θ_{Sub} then slowly increases with *N* with a maximum slightly greater than the small-population prediction occurring in the vicinity of *Nμ*_{c} ≃ 1. However, with further increases in *N*, θ_{Sub} drops very rapidly, with θ_{Sub} ≃ 0 for *N*μ_{c} > 10.

## SUBFUNCTION FORMATION AND RESOLUTION LEAD TO THE MODULAR RESTRUCTURING OF GENE NETWORKS

The results reported in this communication and in our previous work may have more global significance when considered in their totality (Force* et al*. 1999, 2004; Lynch and Force 2000; Lynch* et al*. 2001). Over evolutionary time there has been a general increase in the functional and structural specialization of body regions in multicellular eukaryotes, such as the long-term increase in the numbers and types of arthropod limb morphologies (Carroll* et al*. 2001). The trend in morphological specialization and developmental regionalization may be due to changes in the underlying circuitry of the developmental gene networks. The predominant form of morphological evolution at the phenotypic level among metazoans may involve parcellation of existing developmental modules at the genotypic level (Wagner 1996; Wagner and Altenberg 1996; Force* et al*. 2004). The formation of new subfunctions and the association of subfunctions with different genes following gene duplication events will change the circuitry of the underlying developmental genetic networks.

We refer to the subfunction formation and resolution processes, as they impact both genes and networks, as *modular restructuring* (Figure 10). First, new subfunctions are formed within a gene by subfunction fission or cooption processes (formation). Second, subfunctions are partitioned among different gene copies following gene duplication by DDC mechanisms (resolution). The modular restructuring process might have immediate effects on the phenotype. More importantly, however, under relatively constant environmental conditions the phenotype may not be affected in any discernible way. Third, the new underlying genetic circuitry created by modular restructuring may open up new pathways for rapid morphological change. For instance, mutations may have unique phenotypic effects on the new genetic architecture that are now beneficial due to the removal of ancestral pleiotropic effects. Modular restructured genetic architectures may provide the evolutionary potential for rapid responses to novel environmental conditions. Therefore, modular restructuring at the genomic level may in part provide a population-level mechanism for the frequent observation of relatively rapid bursts of evolution and long periods of stasis observed in the fossil record (Gould and Eldredge 1977, 1993, for example).

## DISCUSSION

We have investigated the origin of new regulatory subfunctions via fission where multiple expression domains under unified genetic control become subdivided into more restricted expression domains under independent genetic control. Subfunction fission may proceed by the replacement of ancestral transcription-factor binding sites with new sites that drive more restricted patterns of gene expression. Duplication of overlapping enhancers may then lead to the formation of two entirely independent and modular regulatory regions through enhancer subfunctionalization.

Population size plays a key role in this process. Provided the effective population sizes are sufficiently small where *N*(μ_{a} + μ_{d}) ≤ 0.1, the time for ADR closely reflects the behavior of the small population theory derived here. The time for ADR is extended in large populations because of a small mutationally induced selective advantage of the most redundant *ABC* allele. In the case of the *ABC* allele, if any of the three sites is deleted, the resulting allele is viable when homozygous and may be fixed in the population. In contrast, deletion of any site in the *aBC* allele results in a nonviable allele when homozygous, which is unlikely to go to fixation in very large populations. Therefore, while the fixation of the *aBC* allele is inhibited in very large populations, its immediate precursor, the *ABC* allele, is increased by differential mutation pressure. However, mutation pressure begins to strongly affect the allele frequencies only when *N*(μ_{a} + μ_{d}) > 0.1. Given that the rates for μ_{a} and μ_{d} are not likely to be >10^{−6} and could be orders of magnitude less, the minimum effective population size where the distribution of allele frequencies begins to be affected by mutation-induced selection pressure is ∼100,000. Therefore, mutationally induced selection drives enhancers and genes toward overlapping function and nonmodularity in large populations. However, in small populations, ADR and DDC processes allow systems to diffuse toward modularity, meaning populations are likely to harbor a distribution of nonmodular, quasi-modular, and modular enhancer structures.

Previously, Stone and Wray simulated the evolution of new transcription-factor binding sites by point mutation to estimate the time to formation of new sites and then estimated their time to fixation using neutral theory (Stone and Wray 2001). While their calculations for the time to fixation were incorrect (MacArthur and Brookfield 2004), their treatment of the time for origin of new sites by mutation is consistent with our results. If we assume μ_{a} ≅ μ_{d} ≅ 10^{−7}, the expected number of generations for ADR is ∼50,000,000 at an effective population size of 100,000 (see Figure 4). This is a slow rate for the evolution of new subfunctions by fission, but given that the number of subfunctions in the genome is greater than the number of genes, it may be a significant process over long-term evolution. Our model provides a baseline for the time to formation of a new regulatory subfunction, which may be reduced significantly by selection. For instance, the time for accretion to form the *ABC* allele under the above conditions is reduced ∼1000-fold to about ∼50,000 generations when the partially redundant *AbC* and *ABc* alleles each have a selective advantage of *s* = 0.01 and their effects are additive (data not shown and also see MacArthur and Brookfield 2004).

In contrast to our near-neutral fission process, models of compensatory evolution suggest enhancers may evolve through pairs of individually deleterious mutations that are beneficial in combination (Carter and Wagner 2002). Under this model, evolution of functionally conserved enhancers may occur by a two-step process with the first step involving a deleterious intermediate and the second step involving a beneficial compensatory mutation. This model makes the prediction that enhancers will evolve faster in very large populations where the double-mutant allele arises from segregating single-mutant deleterious alleles in the population. For conserved enhancers to evolve faster than under neutrality in large populations requires the assumption that the new double-mutant enhancer allele has a higher fitness than the ancestral enhancer allele. It is not clear how frequently this would be the case, unless the process were cyclical where slightly deleterious mutations would become fixed by drift in small populations and then a new deleterious allele could act as an intermediate in the formation of a compensatory beneficial allele. Interestingly, long-term cyclical population size would also aid the near-neutral fission process. If populations go through prolonged periods of large effective size followed by prolonged periods of small effective size, a buildup of the redundant precursor *ABC* allele during the former period and the fission *aBC* allele during the latter period is possible. It is unlikely the slightly deleterious *aBc* and *abC* alleles would contribute significantly to the evolution of the *aBC* alleles under our assumption of near neutrality. However, if the *aBC* allele was strictly beneficial, then these indirect pathways would be expected to contribute significantly.

Our results suggest why a population-genetic perspective, incorporating random genetic drift, is central to understanding the evolution of genotypic and potentially phenotypic modularity. While many (Wagner 1995; Cheverud 1996; Wagner and Altenberg 1996; Gerhart and Kirschner 1997; Raff and Raff 2000; Raff and Sly 2000) have argued that modularity may be directly selected for at the individual level, or indirectly selected for at the population level as an enhancer of evolvability, our work suggests that new subfunctions and genetic modularity can evolve under certain population-genetic scenarios via a nearly neutral process, without any selection promoting modularity itself. In addition, an increase in the number of regulatory subfunctions corresponds to an increase in the complexity of developmental genetic networks, which may be the foundation for phenotypic complexity. Finally, the results of this article support the hypothesis that many of the complex features of eukaryotic genomes may arise as simple by-products of random genetic drift in populations with small effective sizes (small being potentially as large as 10^{7}) (Force* et al*. 1999, 2004; Lynch and Force 2000; Lynch* et al*. 2001; Lynch and Conery 2003).

## APPENDIX

### Equilibrium frequencies of fission alleles in an infinite population:

Here we compute the distribution of allele frequencies in an infinite population using a matrix-modeling approach. The probability that a binding site is added is μ_{a} and the probability that a binding site is lost through mutation is μ_{d}. All individuals that have a genotype where both expression domains are covered produce the same expected number of offspring, while individuals that do not produce zero offspring.

We can define a matrix with elements **A*** _{ij}* that describe the number of genotype

*i*offspring (row

*i*) produced by a single adult of genotype

*j*(column

*j*). Because the genotypes

*AbC*and

*ABc*have the same mutational properties (each goes to

*Abc*,

*ABC*, and either

*aBc*or

*abC*with the same probability) we can lump them into a single class. We index the genotypes as We can compute the matrix entries by considering the probability an individual in class

*j*produces an individual in class

*i*. This depends on the probability of each binding site being mutated. For example,

**A**

_{11}= (1 − μ

_{d})

^{3}because an individual with all three functional binding sites will produce an individual with three functional binding sites only if none of those sites are lost to mutation. Thus, the matrix

**A**is given by A1The long-term frequency of genotype class

*i*can be found by computing the dominant right eigenvector of

**A**. This represents the stable distribution of genotype densities, excluding the genotype classes that have no reproductive success. This vector can be normalized to produce frequencies by dividing by the sum of the vector elements. Note that because class 2 contains two genotypes, the frequency of genotypes

*AbC*and

*ABc*is half of the frequency of class 2.

### The probability of enhancer subfunctionalization:

Here we derive an approximation to the probability of enhancer subfunctionalization. We begin by considering a pair of linked overlapping duplicate enhancers within a gene. Each of two independent regulatory elements, *B* and *C*, within a single overlapping enhancer is knocked out at rate μ_{r} and the entire overlapping enhancer with shared regulatory elements is knocked out at rate μ_{c} (see Figure 8). We use the following shorthand: A single intact enhancer is denoted by *BC* for the two independently mutable components of the overlapping enhancer structure. If either site is functionally deleted it is replaced with an *****, and if both independent components or the shared components are functionally deleted the dead enhancer is denoted by ******. Four allele classes are formed during the process of enhancer subfunctionalization that are viable when fixed in populations. These include the duplicate enhancer alleles *BC*|*BC* with frequency *p*, the partial subfunctional alleles **C*|*BC*, *B**|*BC*, *BC*|**C*, and *BC*|*B** with frequency *x*, the subfunctional alleles **C*|*B** and *B**|**C* with frequency *y*, and the single nonfunctional enhancer class alleles **|*BC* and *BC*|** with frequency *q*. The initial nonduplicate enhancer *BC* alleles, with frequency *q*′, are identical in state to the single enhancer class alleles but are kept track of separately because they are not descendants of the original duplicate *BC*|*BC* allele. Furthermore, we refer to an allele class by its respective uppercase letter, *P*, *X*, *Y*, *Q*, and *Q*′.

We divide the problem into a series of fixation events of interval length 4*N* generations, because on average this is the coalescence time of a single neutral allele in a diploid population. We can then estimate the probability of enhancer subfunctionalization Pr(Sub) as a summation of the individual probabilities of fixation of subfunctional alleles during each interval. Thus, A2where Pr is the probability of fixation of the subfunctional *Y* alleles in interval *i* and *y̅*_{i} is the mean frequency of subfunctional *Y* alleles in interval *i*.

The original duplicate enhancer *P* allele begins as a single copy in the population. We make the assumption that during each interval one of the viable alleles goes to fixation. This assumption is clearly valid when the population size is small, as the probability of homozygosity is close to one. In large population sizes the full spectrum of alleles will be present at the time of fixation and will deviate slightly from this assumption. Following the first fixation event where either a duplicate *P* allele or partial subfunctional *X* allele is fixed, subfunctional *Y* alleles can be derived from either ancestor. Therefore, we can rewrite Pr(Sub) as A3where Pr and Pr are the probabilities of fixation of the subfunctional alleles at the *i*th fixation event derived from a fixed duplicate *P* allele or a fixed partial subfunctional *X* allele at the end of interval *i* − 1.

For the frequencies of alleles at the end of each interval, we use a subscript referencing the interval number and/or allele from which it was derived. Thus, *p̅*_{1} is the frequency of *P* after the first interval and *p̅*_{2} is the frequency of *P* after the second and subsequent intervals derived from a *P* ancestor. Similarly, *x̅*_{1} is the frequency of *X* after the first interval, *x̅*_{2,P} is the frequency of *X* after the second and subsequent intervals derived from a *P* ancestor, and *x̅*_{2,X} is the frequency of *X* after the second and subsequent intervals derived from a *X* ancestor. The probability of fixation of *Y* for the first interval is Pr. The probabilities of fixation of *Y* for the second interval and subsequent intervals are Pr for *Y* derived from *P* and Pr for *Y* derived from *X*. The probabilities of fixation of *Y* within the second and subsequent intervals remain the same but are weighted by the decaying cumulative frequencies of *P* and *X*. Using the geometric series, it can be shown the probability of enhancer subfunctionalization is A4and the scaled probability of subfunctionalization is A5

Next, we determine expressions for the frequencies of the alleles after each interval. For the first interval and subsequent intervals the mean frequencies of the alleles derived from duplicate *P* alleles can be approximated in the following manner. The duplicate enhancer *P* alleles with arrangement *BC*|*BC* mutate at a total rate of (4μ_{r} + 2μ_{c}) into *X*, *Y*, and *Q* alleles. The frequency of the duplicate P alleles after *t* generations is described by the decay equation, A6The partial subfunctional *X* alleles, with four arrangements **C*|*BC*, *B**|*BC*, *BC*|**C*, and *BC*|*B**, originate from the *P* alleles. The formation of *X* alleles requires a single independent regulatory element knockout at rate μ_{r} and requires that no other mutations occur in the remaining three independent regulatory sites (at rate 3μ_{r}) and the two shared regulatory regions (at rate 2μ_{c}). Thus, the frequency of *X* derived from a duplicate *P* ancestor is A7

In a similar fashion, equations for the frequencies of *Y*, *Q*, and *Q*′ alleles can be obtained A8A9A10After normalization the expected frequencies of the alleles across all possible populations at the time of fixation are , and , where .

To obtain the frequencies for the first interval we set *p*_{0} = 1/2*N*, and *q*_{0} = 1 − 1/2*N*. If following the first fixation event a duplicate *P* allele is fixed, the above equations (A6)–(A10) are used to determine the allele frequencies after 4*N* generations for the second and subsequent intervals where *p*_{0} = 1. If following the first fixation event a partial subfunctional *X* allele is fixed, we can obtain the allele frequencies , in a similar manner to those above with *x*_{0} set equal to 1: A11A12A13The expected frequencies of the alleles across all possible populations at the time of fixation are then and where *F* = *x _{t}* +

*y*+

_{t}*q*.

_{t}The three probabilities of fixation, Pr, Pr, and Pr, are functions of the subfunctional *Y* allele frequency during each interval. In Figure 9, we plot two approximations, the first where we treat the fixation of *Y* alleles as a neutral process and the second where we treat the fixation of *Y* alleles as the fixation of a slightly deleterious allele. In the first case, the fixation probabilities are equal to the frequencies of *Y* at each interval *i*. In the second case, we use the diffusion approximation for the probability of fixation of a beneficial allele in a diploid population (Kimura 1962), where *y̅*_{i} is the frequency of the slightly deleterious subfunctional alleles, *s* is the selection coefficient, and *N* is the population size. Consider the following two alleles, the *Y* allele *B**|**C* and the *Q* allele *BC*|**. The rate of mutation of the *Y* alleles to a nonfixable allele state is 2μ_{c} + 2μ_{r} and the rate of mutation of the *Q* alleles to a nonfixable allele state is μ_{c} + 2μ_{r}. Therefore, the subfunctional alleles die at a rate μ_{c} relative to the single nonfunctional enhancer *Q* alleles, suggesting the selection coefficient *s* for the *Y* alleles is −μ_{c}.

## Acknowledgments

The authors thank Thomas Hansen and Ashley Carter for their comments on an early draft of the manuscript. Our work has been funded, in part, by grants from the National Institutes of Health (NIH) (RR14085, HG02526-01, 5F32GM020892), and the National Science Foundation (IBN-0321461, IBN-023639). Work in the Pickett laboratory is supported by grants NIH 2R15GM061620-02 and U.S. Department of Agriculture 2003-35304-13252.

## Footnotes

Communicating editor: D. M. Rand

- Received February 12, 2004.
- Accepted February 8, 2005.

- Genetics Society of America