Genetics, Vol. 154, 1389-1401, March 2000, Copyright © 2000

The Role of Population Size, Pleiotropy and Fitness Effects of Mutations in the Evolution of Overlapping Gene Functions

Andreas Wagner
Department of Biology, University of New Mexico, Albuquerque, New Mexico 87131-1091 and The Santa Fe Institute, Santa Fe, New Mexico 87501

Corresponding author: Andreas Wagner, University of New Mexico, 167A Castetter Hall, Albuquerque, NM 87131-1091. E-mail: wagnera@unm.edu

Communicating editor: A. G. CLARK


ABSTRACT

Sheltered from deleterious mutations, genes with overlapping or partially redundant functions may be important sources of novel gene functions. While most partially redundant genes originated in gene duplications, it is much less clear why genes with overlapping functions have been retained, in some cases for hundreds of millions of years. A case in point is the many partially redundant genes in vertebrates, the result of ancient gene duplications in primitive chordates. Their persistence and ubiquity become surprising when it is considered that duplicate and original genes often diversify very rapidly, especially if the action of natural selection is involved. Are overlapping gene functions perhaps maintained because of their protective role against otherwise deleterious mutations? There are two principal objections against this hypothesis, which are the main subject of this article. First, because overlapping gene functions are maintained in populations by a slow process of "second order" selection, population sizes need to be very high for this process to be effective. It is shown that even in small populations, pleiotropic mutations that affect more than one of a gene's functions simultaneously can slow the mutational decay of functional overlap after a gene duplication by orders of magnitude. Furthermore, brief and transient increases in population size may be sufficient to maintain functional overlap. The second objection regards the fact that most naturally occurring mutations may have much weaker fitness effects than the rather drastic "knock-out" mutations that lead to detection of partially redundant functions. Given weak fitness effects of most mutations, is selection for the buffering effect of functional overlap strong enough to compensate for the diversifying force exerted by mutations? 
It is shown that the extent of functional overlap maintained in a population is not only independent of the mutation rate, but also independent of the average fitness effects of mutation. These results are discussed with respect to experimental evidence on redundant genes in organismal development.


OVERLAPPING or partially redundant gene functions are ubiquitous in eukaryotes. Their presence spans the entire taxonomic range from unicellular organisms to mammals. They are observed in proteins of any function, be it enzymatic, structural, or regulatory (BASSON et al. 1986; HOFFMAN 1991; LUNDGREN et al. 1991; HIGASHIJIMA et al. 1992; GOLDSTEIN 1993; CADIGAN et al. 1994; GONZALES-GAITAN et al. 1994; LI and NOLL 1994a,b; CONDIE and CAPECCHI 1995; GOODSON and SPUDICH 1995; HANKS et al. 1995). Weak or no phenotypic effects of a knock-out mutation in a gene thought to play an important biological role are often the first indicators of such redundancy. Further analysis then typically reveals one or more related genes with functions similar to that of the mutated gene. With some potential exceptions, partially overlapping gene functions can be traced to (often old) gene duplications.

While the evolutionary origins of most partially redundant gene functions are obvious, it is much less obvious why genes with overlapping functions have been retained, in some cases for hundreds of millions of years. This holds especially for many partially redundant genes in vertebrates, the result of ancient gene duplications in primitive chordates (SHARMAN and HOLLAND 1996; BAILEY et al. 1997). This is even more puzzling given that duplicate and original genes can diversify very rapidly after duplication (CIRERA and AGUADE 1998; TSAUR et al. 1998; ZHANG et al. 1998), especially if the action of natural selection is involved. Thus, are overlapping gene functions merely the transient remnants of past gene duplication events, a snapshot of a diversification process that will eventually be complete? Or are they perhaps maintained because of their protective role against otherwise deleterious mutations? These questions are attracting increasing interest (TAUTZ 1992; THOMAS 1993; COOKE et al. 1997), and for good reasons. Sheltered from some deleterious mutations, partially redundant genes may be important sources of evolutionary novelties on the biochemical level.

This article deals with the two perhaps most serious objections to the persistence of overlapping gene functions as a buffer against mutation. The first objection concerns the population size required to stably maintain overlapping gene functions in a population. Even neutral mutations in genes with overlapping functions are likely to lead to diversification of the genes' functions. Counteracting this evolutionary force is a much slower process of "second order" selection for maintaining overlap (WAGNER 1999), a process that acts on differences in functional overlap among genes. These differences are selectively neutral, but genes with greatly overlapping functions are less likely to undergo deleterious mutations than genes with distinct functions. Over time, genes with greater functional overlap will be preferentially retained in a population. Because of the indirect nature of this process, populations have to be large to sustain the amount of variation in overlap on which selection can act. A past contribution has shown that the required population sizes are not impossibly large (WAGNER 1999). Here, it is shown that even in small populations, pleiotropic mutations that affect more than one of a gene's functions simultaneously can delay functional divergence by orders of magnitude. Moreover, in contrast to the maintenance of genetic variation in fluctuating populations, the evolution of functional overlap may be predominantly influenced not by occasional population bottlenecks but by occasional population size bursts.

The second potential objection regards the fact that most naturally occurring mutations are likely to have much weaker fitness effects than the rather drastic "knock-out" mutations that lead to detection of partially redundant functions in the laboratory. Given weak fitness effects of many mutations, is the selection process outlined above strong enough to compensate for the diversifying forces exerted by mutations? It is shown that the evolution of mean fitness and functional overlap are effectively decoupled. This implies that the average effect of deleterious mutations on fitness does not greatly influence the evolution of functional overlap and vice versa. This also has implications on the genetic load associated with overlapping gene functions, as discussed below.

To address the above questions, this contribution utilizes a conceptually simple mathematical model that relies on three main assumptions (WAGNER 1999). First, the greater the functional overlap among genes, the greater the fraction of mutations in either gene that are phenotypically neutral. Second, both neutral and nonlethal deleterious mutations cause gene functions to diverge. A third assumption, already implicit in the above discussion, is that overlap among gene functions is a quantifiable variable, at least in principle.

While the first two assumptions are unproblematic, the third requires further comment. Two examples illustrate its validity. Overlapping gene functions are especially prominent among regulatory developmental genes, because these genes often have multiple functions. (Most housekeeping genes, many of which catalyze specific enzymatic reactions, may be restricted in their ability to evolve functional overlap. The specificity of their function may be part of the reason why housekeeping genes are rarer among the documented cases of functional overlap.) Consider the case of a transcription factor (Figure 1a), regulating the expression of multiple genes involved in a developmental process. Transcription factors occupy a prominent role in the many examples of genes with overlapping functions (JOYNER et al. 1991; WEINTRAUB 1993; MACONOCHIE et al. 1996). After duplication (Figure 1b) and diversification (Figure 1c), the functional overlap among duplicate and original can be viewed as the subset of genes regulated by both factors. Microarray technology (SCHENA 1996) has made the quantification of such overlaps a realistic goal: perturb the activity of each factor in separate experiments, measure the resulting change in gene expression patterns, and assess which subset of genes is affected similarly. A second example concerns genes with identical biochemical functions but spatiotemporal expression patterns that are not completely congruent (LI and NOLL 1994a,b; HANKS et al. 1995). Here, the extent of spatiotemporal overlap in two expression domains can be taken as a measure of functional overlap. Recently developed automated image processing technology to quantify the concentration of multiple gene products in a developing embryo at single cell resolution enables such quantification (KOSMAN et al. 1998).



Figure 1. (a) A group of genes regulated by a transcription factor (TF). (b) After duplication of the transcription factor gene, the same genes are now regulated by both original and duplicate. (c) Over time, each factor may lose the ability to regulate some of the genes, such that only a subset of genes (hatched) are regulated by both factors.


MODEL AND RESULTS

The model is concerned with a population of haploid, randomly mating organisms with nonoverlapping generations. The assumption of haploidy is chosen merely because it exposes most clearly the key evolutionary principles at work. The genic mutation rate is denoted as µ (µ << 1). Neutral mutations that do not affect any aspect of the function of a gene product are not considered.

The central concept of the model is the notion of partially redundant or overlapping gene functions, and that the overlap in two genes' functions can be quantified. Specifically, let the variable $r$, $0 \le r \le 1$, denote a measure of the functional overlap of two genes. If $r = 1$, then the two genes have completely identical functions; if $r = 0$, there is no overlap in the genes' functions. If two genes have overlapping functions, then some mutations that eliminate one gene's function will be neutral because the function is covered by the other gene. This is conceptualized in the following way: if $r = 1$ (identical functions), all mutations are neutral; if $r = 0$, no mutation is neutral; and if $0 < r < 1$, the rate of neutral mutations is some function $f(r)$. The simplest case is that of a linear relation, in which the rate of neutral mutations per two genes is given by $2\mu r$ (neglecting terms of order $\mu^2$). While this linear relation is used throughout most of this section, the consequences of relaxing this assumption are explored.

Not only can mutations affect fitness, but they may also change the functional overlap among two genes, which will in general lead to a divergence in gene functions. This is modeled by a conditional probability density $m_r(r^*|r)$, where $m_r(r^*|r)\,dr^*$ denotes the probability that the functional overlap after mutation of two genes with overlap $r$ before mutation lies in the interval $(r^*, r^* + dr^*)$. To leave the formalism as general as possible, only the minimal assumption that mutation reduces $r$ on average by a factor $\lambda_r$ is made, i.e.,

$$\int_0^1 r^*\, m_r(r^*|r)\, dr^* = \lambda_r r \tag{1}$$

where $0 < \lambda_r < 1$. Because a value of $\lambda_r = 0.5$ means that each mutation reduces $r$ by 1/2 on average (a very rapid divergence rate), $\lambda_r > 0.5$ is used here. Complete loss-of-function mutations could be modeled as an important special case, but are not considered here, because their evolutionary dynamics have been extensively modeled by others (OHNO 1970; NEI and ROYCHOUDHURY 1973; KIMURA and KING 1979; MARUYAMA and TAKAHATA 1981; WATTERSON 1983; OHTA 1987; MARSHALL et al. 1994; WALSH 1995). Also, empirical data suggest that the complete elimination of one gene's function may be much less frequent than commonly assumed (ALLENDORF et al. 1975; FERRIS and WHITT 1976, 1979; NADEAU and SANKOFF 1997).

Summary of previous results:
Because results from a previous contribution (WAGNER 1999) are used below, they are briefly reviewed here. In this earlier article, it was assumed that all nonneutral mutations, occurring at a rate of $2\mu(1 - r)$ per two genes with overlap $r$, are effectively lethal. The evolution of the distribution of functional overlap, $p_t(r)$, in a population was studied under this assumption. The recurrence equation

$$p_{t+1}(r) = \frac{(1 - 2\mu)\,p_t(r) + 2\mu \int_0^1 z\,p_t(z)\,m_r(r|z)\,dz}{1 - 2\mu + 2\mu \int_0^1 z\,p_t(z)\,dz} \tag{2}$$

describes the distribution $p_t(r)$ of $r$ in generation $t$ of a large population ($N\mu > 50$; WAGNER 1999). It can be shown that the mean functional overlap among two genes, $\bar{r} = \int_0^1 r\,p(r)\,dr$, reaches a nonzero mutation-selection equilibrium independent of the initial condition and approximated by

$$\bar{r} \approx \frac{\sigma_{m_r}}{1 - \lambda_r}\sqrt{\frac{\lambda_r}{1 + 2\lambda_r}} \tag{3}$$

Here, $\sigma_{m_r}^2$ is the variance of the effect of mutations on $r$, defined as $\sigma_{m_r}^2 = \int_0^1 (r^* - \lambda_r r)^2\, m_r(r^*|r)\,dr^*$. This approximation holds for values of $\bar{r}$ in the interior of (0,1), requires no further assumptions about the distribution of mutational effects, and is independent of the mutation rate. It is in good agreement with numerical results (WAGNER 1999). When this model was extended to $n$ genes with overlapping functions, it was found that (i) the extent of overlap maintained in mutation-selection balance is statistically indistinguishable from that for the two-gene case, and (ii) linkage relations are not likely to favor or disfavor the evolution of overlap. These results motivate the restriction to the two-gene case with tight linkage analyzed below.
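The approach to mutation-selection balance in the lethal-mutation case can be illustrated numerically. The following is only a minimal sketch under stated assumptions, not the procedure used in the original work: it assumes the recurrence has the form described in words above (unmutated individuals survive, neutral mutants arise at rate 2µr, all other mutants die), discretizes r on a grid, and uses a truncated Gaussian mutation kernel with mean λ_r z and standard deviation σ_mr. The kernel shape, the grid size, and the function name `equilibrium_overlap` are all illustrative choices.

```python
import math


def equilibrium_overlap(lam_r=0.9, sigma=0.05, mu=0.25, n=60, generations=500):
    """Iterate the lethal-mutation recurrence on a grid; return mean overlap."""
    grid = [(i + 0.5) / n for i in range(n)]

    # Assumed mutation kernel: Gaussian centred on lam_r * z, truncated to
    # (0, 1) and renormalised so that each row sums to one.
    kernel = []
    for z in grid:
        row = [math.exp(-((r - lam_r * z) ** 2) / (2.0 * sigma ** 2)) for r in grid]
        total = sum(row)
        kernel.append([x / total for x in row])

    p = [1.0 / n] * n  # start from a uniform distribution of overlap
    for _ in range(generations):
        # Individuals that do not mutate in this generation.
        new_p = [(1.0 - 2.0 * mu) * pi for pi in p]
        # Neutral mutants: parents with overlap z mutate neutrally at rate 2*mu*z.
        for z, pz, row in zip(grid, p, kernel):
            weight = 2.0 * mu * z * pz
            new_p = [a + weight * b for a, b in zip(new_p, row)]
        # All remaining mutants are lethal; renormalise over the survivors.
        total = sum(new_p)
        p = [x / total for x in new_p]

    return sum(r * pi for r, pi in zip(grid, p))


if __name__ == "__main__":
    for mu in (0.25, 0.5):
        print(mu, round(equilibrium_overlap(mu=mu), 3))
```

Running the sketch with two different mutation rates gives a quick check of the statement that the equilibrium overlap does not depend on µ. The large values of µ are used only to speed up convergence of the iteration.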

For small populations ($N\mu \ll 1$), it was found that functional overlap diminishes (functions diverge) at a rate approximated by

$$\frac{d\langle r_t\rangle}{dt} \approx -2\mu(1 - \lambda_r)\,\langle r_t\rangle^2 \tag{4}$$

Here, $\langle r_t\rangle$ denotes the mean functional overlap in an ensemble of small populations. In the absence of selection, mutations (1) would lead to an exponential decline in $\langle r_t\rangle$ from an initial value of one immediately after gene duplication. The most important feature of (4) is that the decay in $r$ is much slower than that: the lower the functional overlap between two genes, the lower the rate at which functions continue to diverge. The reason is that most mutations that affect $r$ are neutral for large $r$ but deleterious for small $r$. As a result, $r$ declines at a polynomial rather than an exponential rate.
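The contrast between exponential and polynomial decay can be made concrete with a short calculation. This is a sketch under an explicit assumption: the small-population dynamics are taken to follow d<r>/dt = -2µ(1 - λ_r)<r>², which is the form suggested by the verbal description (neutral mutations arise at rate 2µr and each reduces r by a factor λ_r on average). The function names are illustrative.

```python
import math


def overlap_without_selection(t, mu, lam_r):
    """Mean overlap if every mutation were retained: exponential decay."""
    return math.exp(-2.0 * mu * (1.0 - lam_r) * t)


def overlap_small_population(t, mu, lam_r):
    """Closed-form solution of d<r>/dt = -2*mu*(1 - lam_r)*<r>**2, <r_0> = 1."""
    return 1.0 / (1.0 + 2.0 * mu * (1.0 - lam_r) * t)


if __name__ == "__main__":
    mu, lam_r = 2.5e-6, 0.9
    for t in (1e6, 1e7, 1e8):
        print(f"t = {t:.0e}: "
              f"no selection {overlap_without_selection(t, mu, lam_r):.3e}, "
              f"small population {overlap_small_population(t, mu, lam_r):.3e}")
```

Under this assumed form, the overlap after t generations is 1/(1 + x) with x = 2µ(1 - λ_r)t, compared with exp(-x) if no mutations were removed by selection, so the gap between the two widens without bound as t grows.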

In sum, if all nonneutral mutations have severely deleterious effects, then (i) functional overlap and thus an increased rate of nondeleterious mutations can be maintained among genes in large populations, and (ii) functions diverge, albeit very slowly, in small populations.

Evolution of functional overlap and of mean fitness are decoupled:
In this article the assumption of lethality of nonneutral mutations is relaxed. Specifically, the fitness $w$ of an organism is interpreted as a probability of survival, and nonneutral mutations in an organism with fitness $w$ reduce fitness on average by a factor $\lambda_w$:

$$\int_0^1 w^*\, m_w(w^*|w)\, dw^* = \lambda_w w \tag{5}$$

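This formula, completely analogous to (1), allows for an increase in fitness provided that the variance of mutational effects on fitness, $\sigma_{m_w}^2 = \int_0^1 (w^* - \lambda_w w)^2\, m_w(w^*|w)\,dw^*$, is large, but also ensures that mutations reduce mean fitness on average by a factor $\lambda_w$ if $\lambda_w < 1$. Of interest is the evolution of the joint distribution of functional overlap and fitness, $p_t(r,w)$, as well as the moments $\overline{r^i w^j}_t = \int_0^1 \int_0^1 r^i w^j\, p_t(r,w)\,dr\,dw$. In a large population, $p_t(r,w)$ will evolve according to

$$p_{t+1}(r,w) = \frac{(1-2\mu)\,w\,p_t(r,w) + 2\mu w \int_0^1 z_r\,p_t(z_r,w)\,m_r(r|z_r)\,dz_r + 2\mu w \int_0^1\!\!\int_0^1 (1-z_r)\,p_t(z_r,z_w)\,m_w(w|z_w)\,m_r(r|z_r)\,dz_r\,dz_w}{\overline{W}_t} \tag{6}$$

The denominator

$$\overline{W}_t = (1-2\mu)\,\overline{w}_t + 2\mu\,\overline{rw}_t + 2\mu\lambda_w\left(\overline{w}_t - \overline{rw}_t\right) \tag{7}$$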

is a normalization factor representing the fraction of individuals surviving from one generation to the next. In the numerator, $(1 - 2\mu)\,w\,p_t(r,w)$ is the fraction of individuals that do not undergo mutation in generation $t$ and that survive into the next generation. $2\mu w \int_0^1 z_r\,p_t(z_r,w)\,m_r(r|z_r)\,dz_r$ is the fraction of individuals with fitness $w$ that undergo a neutral mutation changing $r$ from some value $z_r$ to $r$ and that survive into the next generation. Finally, $2\mu w \int_0^1 \int_0^1 (1 - z_r)\,p_t(z_r,z_w)\,m_w(w|z_w)\,m_r(r|z_r)\,dz_r\,dz_w$ is the fraction of individuals that undergo a mutation affecting both fitness ($z_w \to w$) and functional overlap ($z_r \to r$) and that survive into the next generation. In all three terms, the factor $w$ represents the effect of selection, i.e., the probability that an individual of fitness $w$ survives into the next generation. Implicit in these equations are the assumptions that functional overlap influences the probability of a deleterious mutation, but not its severity, and that mutations occur before selection.

This rather formidable equation, while perhaps too complicated to solve analytically, can be used to obtain a rough estimate for the correlation $\rho(r,w)$ between fitness and overlap in mutation-selection balance. The estimate, derived in the Appendix, suggests that this correlation is small; i.e.,

(8)

This crude approximation (see Appendix) is fully supported by numerical results, exemplified in Figures 2 and 3. Figure 2 shows that both in cases where mutations have mildly (Figure 2, a and b; $\lambda_w = 0.9$) and severely deleterious effects (Figure 2, c and d; $\lambda_w = 0.3$), the correlation between $r$ and $w$ is of the order estimated in (8). This holds even in spite of the high mutation rates of $10^{-2}$ that were used for reasons of computational feasibility. Thus, in mutation-selection balance, there is only a weak correlation between fitness and functional overlap among genes. Moreover, the mean overlap observed both for strongly (Figure 2c) and for weakly (Figure 2a) deleterious mutations is statistically indistinguishable from the theoretical prediction (3) for the case where all mutations are lethal. Also, mean functional overlap at equilibrium is independent of the mutation rate when this rate is varied over two orders of magnitude (Figure 3b). Further, the difference between mean fitness $\bar{w}$ and one is of the order µ regardless of the values of $\lambda_r$ (Figure 2, c and d) or $\lambda_w$ (Figure 3a). In sum, functional overlap at equilibrium is independent of the mutation rate and of the severity of mutational effects on fitness ($\lambda_w$). Conversely, mean equilibrium fitness is not sensitive to changes in the rate at which mutations lead to a decay of redundancy ($\lambda_r$). In this sense, evolution of redundancy and fitness are decoupled. The reasons for this are discussed below.



Figure 2. Mean fitness evolution and mean functional overlap in mutation-selection balance. (a) Mean functional overlap at mutation-selection equilibrium for $\lambda_w = 0.9$, i.e., nonneutral mutations reduce fitness on average by 10%. Dots and bars represent mean fitness and one standard deviation as obtained from Monte Carlo simulations. Mean equilibrium overlap scales linearly with $\sigma_{m_r}/(1 - \lambda_r)$, where $\lambda_r$ is the mean reduction in functional overlap caused by mutation, and $\sigma_{m_r}$ is the standard deviation of this reduction. The solid line represents the analytical prediction (3) for $\lambda_w = 0$, i.e., where all nonneutral mutations are lethal. Note that equilibrium overlap maintained is statistically indistinguishable from that in the lethal case (solid line). (b) One minus mean fitness (dots and bars) and correlation between fitness and functional overlap (solid line) in mutation-selection equilibrium. The dashed line represents . Note that despite the high mutation rate $\mu = 10^{-2}$ used here for reasons of computational feasibility, both correlation and deviation of mean fitness from one are still of the order estimated analytically. (c and d) Identical to a and b, respectively, but for more severe effects of mutations on fitness ($\lambda_w = 0.3$). $N = 5 \times 10^4$, $\sigma_{m_w} = 0.05$, $10^4$ generations.



Figure 3. (a) Dependency of mean fitness at mutation-selection equilibrium, plotted as $1 - \bar{w}$, on $\lambda_w$, the average factor by which mutations reduce fitness. $N = 5 \times 10^4$, $\sigma_{m_w} = 0.05$, $10^4$ generations simulated. Dots represent population means, bars one standard deviation. (b) Dependency of mean functional overlap at mutation-selection equilibrium on the mutation rate. Note the logarithmic scale for the mutation rate. Computational feasibility alone motivated the choice of range for µ. Note that even for high mutation rates, equilibrium overlap is statistically indistinguishable from the analytical prediction (3), represented by the solid horizontal line. $\lambda_w = \lambda_r = 0.9$, $\sigma_{m_r} = 0.05$, $\sigma_{m_w} = 0.01$, N = 500/µ generations simulated.

Transient population size bursts may be sufficient to maintain redundancy:
While the maintenance of functional overlap by means of natural selection requires large populations ($N\mu \gg 1$), overlap may decay relatively slowly in small populations (WAGNER 1999). This prompts a question as to what fraction of time a population of fluctuating size has to spend in a large $N\mu$ regime for overlap to be maintained by selection. It might appear that the concept of effective population size (HARTL and CLARK 1997) from population genetic theory holds the answer to this question. For populations that spend a fraction $f$ of their time in a regime of small $N\mu$ by having a small population size $N_S$, and a fraction $1 - f$ in a regime of large $N\mu$ by having a large population size $N_L$, one calculates the (inbreeding) effective population size as

$$N_e = \left(\frac{f}{N_S} + \frac{1 - f}{N_L}\right)^{-1} \tag{9}$$

A population of fluctuating size with a given $N_e$ should then behave in exactly the same manner as a population of constant size $N = N_e$. Numerical results from Figure 4 indicate that this is not the case for the evolutionary process studied here, because fluctuating and constant populations with the same $N_e$ show different mean overlap.
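The effective population size used here is a time-weighted harmonic mean of the two population sizes, and it can be inverted to find the small population size that yields a target effective size. The sketch below implements both directions. The function names are illustrative, and the inversion follows the expression N_S = f N_e N_L / (N_L - N_e(1 - f)) given in the legend of Figure 4.

```python
def effective_size(n_small, n_large, f):
    """Harmonic-mean effective size for a fraction f of time spent at n_small."""
    return 1.0 / (f / n_small + (1.0 - f) / n_large)


def small_size_for_target(n_e, n_large, f):
    """Small population size needed so that the effective size equals n_e."""
    denominator = n_large - n_e * (1.0 - f)
    if denominator <= 0:
        # The harmonic mean cannot reach n_e even if n_small grows without bound.
        raise ValueError("target effective size is not reachable")
    return f * n_e * n_large / denominator


if __name__ == "__main__":
    n_large, f = 10_000, 0.5  # equal time in both regimes (t_S = t_L)
    for n_e in (20, 200, 2000):
        n_small = small_size_for_target(n_e, n_large, f)
        print(n_e, round(n_small, 2), round(effective_size(n_small, n_large, f), 2))
```

Because the harmonic mean is dominated by the smaller term, the effective size stays close to N_S/f whenever N_L is much larger than N_S, which is why brief bursts of large population size barely change N_e.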



Figure 4. Population size fluctuations and evolution of overlap. Shown are mean and standard error of overlap in an ensemble of 50 populations after 5000 generations of evolution. The simulation was started at a mean overlap of 0.28, which is the mean overlap at mutation-selection balance in a very large population, as predicted by (3) for the parameters used ($\lambda_r = 0.09$, $\lambda_w = 0$, $\sigma_{m_r} = 0.05$, µ = 0.25). Solid bars correspond to a population of constant size ($N_e = N$) such that $N_e\mu$ has the value indicated on the abscissa. Open bars correspond to populations of fluctuating size with the same value of $N_e\mu$ as for the adjacent solid bar, where each population spent $t_L = 250$ generations in a regime of large population size ($N_L = 10{,}000$) and $t_S = 250$ generations in a regime of small population size, $N_S$. $N_S$ was calculated from (9) as $N_S = f N_e N_L/(N_L - N_e(1 - f))$, with $f = t_S/(t_L + t_S)$. Mean overlap for $N_e\mu = 5$ and constant population size was equal to $1.5 \times 10^{-5}$, a value too small to appear on the plot.

Figure 5 shows more extensive numerical results for the evolution of overlap under fluctuating $N\mu$, where a population alternately spends $t_S$ generations in a regime of small $N\mu$, in which redundancy will decay under the influence of drift, and $t_L$ generations in a regime of large $N\mu$, where selection will maintain functional overlap. The figure shows mean functional overlap in a population ensemble after $10^5$ generations, as a function of the ratio $t_S/t_L$ of time spent in the small $N\mu$ regime. For comparison, a dashed line shows $N_e\mu$ as calculated via (9), where $f = t_S/(t_L + t_S)$. Note that $N_e\mu$ is smaller than one for all the values of $t_S$ and $t_L$, such that drift should dominate the dynamics of overlap (Figure 7 in WAGNER 1999). However, for many of the values of $t_S$ and $t_L$ in Figure 5, populations maintain levels of overlap indistinguishable from or close to those in mutation-selection balance for infinite populations, despite the fact that they spend only a small fraction of time in the large $N\mu$ regime. Whereas the absolute amount of time $t_L$ spent per cycle in the large $N\mu$ regime is clearly important in determining how much overlap is maintained, it is equally clear from Figure 5 that substantial overlap is maintained even for transient bursts in $N\mu$. For instance, Figure 5a shows that functional overlap close to that observed in mutation-selection equilibrium (infinite population size) is maintained if the population spends only one-hundredth of its time, or 100 out of 10,000 generations, in the large $N\mu$ regime. A caveat to these results is that computational limitations in simulating large populations necessitated modulating µ rather than $N$ to simulate changes in $N\mu$. Note also that the specific values of $t_S$ and $t_L$ necessary to maintain overlap will depend on multiple factors, such as the location of the mutation-selection equilibrium, the variance in overlap introduced by mutation, and probably even the selection regime (soft selection vs. hard selection; HARTL and CLARK 1997). Unfortunately, an analytical approach to this problem, which could both circumvent computational limitations and account for these factors, is currently not at hand.



Figure 5. Influence of fluctuations in $N\mu$ on functional overlap. Dots and bars correspond to mean and standard deviation of functional overlap $\langle r\rangle$ among two genes in an ensemble of 50 populations after $10^5$ generations of simulated evolution with fluctuating population size. The simulation was started at a mean overlap corresponding to the upper horizontal line, which is the mean overlap at mutation-selection balance for a very large population from (3). The lower horizontal line is the value to which functional overlap would decline from this initial value after $10^5$ generations if the population ensemble was under the sole influence of drift for the parameters used here. In the simulation, the population ensemble alternately spent $t_S$ generations in a regime of low $N\mu$ (dominated by drift) and $t_L$ generations in a regime of high $N\mu$ (dominated by selection). This cycling was repeated until $10^5$ generations had elapsed and was continued after that until the end of the next period of small $N\mu$ was reached. At that time, ensemble overlap statistics were evaluated. The dashed curve represents the value of $N_e\mu$ as a function of $t_S$ and $t_L$, where $N_e$ is the inbreeding effective population size calculated from $t_S$ and $t_L$. For the entire range of this plot, the values of $N_e\mu$ suggest that the evolutionary dynamics should be dominated by drift, such that the mean ensemble overlap should be close to the lower horizontal line (WAGNER 1999, Figure 7). However, as long as at least 50 generations are spent in each cycle of large population size ($t_L \ge 50$), the influence of selection continues to be appreciable, even if only a small fraction of time is spent in the large population regime. Only for smaller $t_L$ does drift dominate if $t_S/t_L > 50$. For reasons of computational feasibility, it was the mutation rate and not the population size that was varied. $N = 1000$, $\mu_S = 0.0005$, $\mu_L = 0.5$, $\lambda_w = 0$, $\lambda_r = 0.9$, $\sigma_{m_r} = 0.05$.

Why is $N_e$ not a reliable indicator for the evolution of overlap (Figure 4) in fluctuating populations? Even in the absence of analytical predictions for the change of $r$ in fluctuating populations, it is safe to say that this must have to do with the different evolutionary dynamics of genetic variation and functional overlap. Genetic variation gets lost much faster in small populations than in large populations. I surmise that this is not true in the case of functional overlap. Whereas the amount of genetic variation maintained in a fluctuating population may be most heavily influenced by the population bottlenecks, the amount of functional overlap may be most heavily influenced by the largest population sizes.

Pleiotropic effects slow down divergence of gene functions:
Among the many possible relations of functional overlap $r$ and the probability $f(r)$ that a mutation leading to a loss of one or more functions in one gene is phenotypically neutral, only $f(r) = r$ has been explored so far. The following scenario illustrates the role that pleiotropic mutations may have in this relation and motivates a more general mathematical form for $f$. Consider duplicate and original of a gene shortly after a gene duplication, where both copies carry out each of $k$ functions. These could be $k$ spatiotemporal expression domains, in which each of the two genes is involved in a developmental process, or $k$ downstream genes regulated by each of two genes encoding a transcription factor (see Figure 1). Assume that, over time, mutations randomly eliminate the capability to carry out some of these functions in each of the genes, such that only a fraction $r = k_r/k$ of the original $k$ functions is still performed by both genes. In this context, mutations with pleiotropic effect can be viewed as those that eliminate more than one, say $l$, functions. If $k$ is sufficiently large, then the probability that such a pleiotropic mutation is neutral is approximated by $r^l$. As a consequence, for a given overlap $r$, the probability $f(r)$ that a mutation eliminating $l$ functions is neutral is given by $f(r) = r^l$. In other words, the more extensive pleiotropic effects are, the smaller the protective effect against mutations afforded by a given overlap $r$. Only in the absence of pleiotropic effects does the simple relation $f(r) = r$ hold. In practice, mutations span a range of pleiotropic effects, suggesting that $f(r) < r$, under the constraints that $f(1) = 1$ and $f(0) = 0$. Below, the effect on the evolution of redundancy of a moderate degree of pleiotropy ($l = 2, 3$) via the relation $f(r) = r^l$ is explored (Figure 6a).
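The large-k approximation behind the power-law form of f(r) can be checked with a small combinatorial calculation. The sketch below assumes, purely for illustration, that a pleiotropic mutation eliminates l distinct functions drawn uniformly at random from the k functions of a gene, and that it is neutral only if all l of them are among the k_r functions shared with the other gene. The function names are hypothetical.

```python
from math import comb


def neutral_probability_exact(k, k_r, l):
    """P(all l eliminated functions fall among the k_r shared ones).

    Assumes the l functions are drawn uniformly without replacement from k.
    """
    return comb(k_r, l) / comb(k, l)


def neutral_probability_approx(r, l):
    """Large-k approximation: overlap r raised to the power l."""
    return r ** l


if __name__ == "__main__":
    for k in (10, 100, 1000):
        k_r = k // 2  # overlap r = 0.5
        for l in (1, 2, 3):
            exact = neutral_probability_exact(k, k_r, l)
            approx = neutral_probability_approx(k_r / k, l)
            print(f"k={k:5d} l={l}: exact={exact:.4f} approx={approx:.4f}")
```

For l = 1 the two expressions coincide for any k, and for l > 1 the exact probability approaches the power-law value from below as k grows, in line with the constraint f(r) < r for pleiotropic mutations.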



Figure 6. Pleiotropy slows the decay of functional overlap. (a) The function $f(r) = r^l$ illustrates how the probability of neutral mutations for two genes of a given functional overlap $r$ is influenced by the number of functions $l$ affected per mutation. (b) Evolution of functional overlap $\langle r\rangle$ in a population ensemble (20 populations) under the influence of drift as a function of the degree of pleiotropy $l$. Dots and bars represent means and standard deviations as obtained from a Monte Carlo simulation. The solid lines are numerical solutions of (10); $\lambda_r = 0.5$, $\sigma_{m_r} = 0.05$, $\lambda_w = 0$, $\mu = 2.5 \times 10^{-6}$. As $l$ is increased from 1 to 3, the decay of $r$ slows. (c) Times $\tau_x$ until $\langle r\rangle$ decreases from 1 to $x$ as a function of the degree of pleiotropy $l$ as calculated from (10). A linear increase in $l$ leads to an exponential increase in $\tau_x$; $\lambda_r = 0.9$, $\sigma_{m_r} = 0.05$, $\lambda_w = 0$, $\mu = 2.5 \times 10^{-6}$, $N = 100$.

The evolution of <rt>, the mean functional overlap in a population ensemble evolving under the influence of genetic drift, can be approximated by an ordinary differential equation

(10)

which is a simple extension of (4) to l >= 1. Here, one unit of time corresponds to one (discrete) generation, and l >= 1 is the exponent in the function f(r) = rl. A solution to this equation can be obtained as an implicit function by a separation of variables, yielding

for the initial condition <r_0> = 1. From this form of the solution, one can directly determine the time τ_x necessary for <r> to decrease from one shortly after a gene duplication to a value of x. Let us consider the special case of the times t_k = τ_{(1/2)^k} necessary to reduce <r_t> to (1/2)^k. One can show that

(11)

The times t_{k+1} - t_k necessary for successive halving of <r_t> not only scale as 2^k, i.e., each consecutive halving of functional overlap takes twice as long, they also scale as 2^l. Thus, how fast functional overlap decays depends critically on the extent of pleiotropy of mutations. Fig 6B shows the decrease of functional overlap from <r_0> = 1 obtained from a numerical solution of (10) (solid lines) and from Monte Carlo simulations (dots and bars) for realistic mutation rates (µ = 2.5 × 10^-6) and small populations (N = 100). The decrease in mean ensemble overlap clearly slows down as the extent of pleiotropy increases. However, over longer evolutionary timescales than those depicted here, differences among the different degrees of pleiotropy are more dramatic than Fig 6B suggests. Fig 6C shows, on a linear-log scale, the times τ_{1/2}, τ_{1/4}, and τ_{1/8} obtained from (11) as a function of l. It shows that (i) for an increase of l from 1 to only 3, each of these times increases by a factor of 10 to 100 and (ii) for l = 3, the time to decrease <r> by each additional factor of 1/2 increases by approximately a factor of 10.
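The scaling of the halving times can be explored numerically. The sketch below assumes, purely for illustration, that the decay equation has the form d<r>/dt = -c <r>^(l+1), where c is a hypothetical lumped rate constant standing in for the mutational parameters; this form is an assumption chosen to be consistent with f(r) = r^l and is not taken from the equation itself. Under that assumption, separation of variables with <r_0> = 1 gives <r>^(-l) - 1 = l c t.

```python
def time_to_reach(x: float, l: int, c: float) -> float:
    """Closed-form time for <r> to fall from 1 to x, assuming
    d<r>/dt = -c * <r>**(l + 1) (an assumed form, see lead-in)."""
    return (x ** (-l) - 1.0) / (l * c)


def integrate_overlap(l: int, c: float, t_end: float, dt: float = 1.0) -> float:
    """Forward-Euler integration of the assumed ODE starting from <r> = 1."""
    r = 1.0
    for _ in range(int(round(t_end / dt))):
        r -= dt * c * r ** (l + 1)
    return r


C = 1e-4  # hypothetical rate constant, chosen only to keep the loop short

for l in (1, 2, 3):
    t1 = time_to_reach(0.5, l, C)    # time to first halving
    t2 = time_to_reach(0.25, l, C)   # time to second halving
    ratio = (t2 - t1) / t1           # how much longer the second halving takes
    check = integrate_overlap(l, C, t1)
    print(f"l={l}: t_1/2={t1:.0f}, second/first halving ratio={ratio:.1f}, "
          f"Euler <r> at t_1/2={check:.4f}")
```

In this assumed form each successive halving takes 2^l times as long as the previous one, which for l = 3 is a factor of 8, in line with the "approximately a factor of 10" read off Fig 6C.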

In sum, even moderate degrees of pleiotropy lead to a severe constraint on how much genetic drift can reduce functional overlap among genes. The intuitive explanation is simple: mutations with pleiotropic effects are more and more likely to eliminate nonredundant functions as the functional overlap between two genes decreases. Because such mutations are deleterious and are eliminated from the population, the decay of overlap via neutral mutations and drift is slowed down. If most mutations have effects on many of a gene's functions, one may observe significant functional overlap (congruent expression patterns, etc.) maintained over long evolutionary timescales. This is because most mutations will also affect the few unique functions of a gene and thus be deleterious. In this case, extensive functional overlap may not indicate effective buffering against mutations.


*  DISCUSSION

The model explored here rests on three simple assumptions: (i) genes have functions that overlap in a quantifiable way; (ii) overlapping functions affect the probability of mutations being neutral; and (iii) mutations, on average, reduce functional overlap. As in any mathematical model, simplifications and abstractions were made. A haploid system has been modeled to illustrate the issues discussed below most clearly.

Purifying and directional selection:
The model does not incorporate advantageous mutations leading to novel functions. Instead, its restriction to purifying selection and neutral mutations reflects an implicit assumption that genes diversify mostly by (i) loss of some of their functions and, thus, functional specialization, or (ii) acquisition of new functions by neutral mutations. Whether this is the case may well depend on the type of genes considered. For instance, genes involved in self-recognition or reproductive functions are more likely to be subject to positive selection for diversification (CIRERA and AGUADE 1998; TING et al. 1998; TSAUR et al. 1998). On the other hand, many of the most striking examples of partial redundancies come from genes embedded in highly conserved developmental pathways, such as that of muscle determination (WEINTRAUB 1993). In these cases, there may be a limited potential for the evolution of novel functions, such that divergence by specialization may be prevalent (FORCE et al. 1999). Corroborating evidence comes from experiments where a gene from an invertebrate (before duplication) can substitute for the function of one of its duplication products in very distantly related vertebrates, or vice versa. If the vertebrate gene had evolved radically new functions, such substitution might not be possible. Examples include nautilus and decapentaplegic from Drosophila melanogaster and their respective counterparts MyoD and BMP in mammals (VENUTI et al. 1991; PADGETT et al. 1993). However, the restriction to divergence by neutral mutations should not be read as an exclusive endorsement of one evolutionary scenario. It is rather an acknowledgment that this scenario is relevant for many developmental genes.

Mutational load and deleterious mutations:
For the model studied here, mean fitness and mean functional overlap among genes are only weakly correlated at mutation-selection balance (Fig 2). The reason is the following. If organisms in a population have different fitnesses, selection acts immediately on these differences, e.g., via differential survival probabilities. In contrast, if some organisms in a population have genes with lower functional overlap than other organisms, these differences are selectively neutral until mutations occur at these genes, which takes of the order of 1/µ generations per individual. In this case, the genes with the higher functional overlap r are less likely to undergo deleterious mutations (which are then rapidly eliminated by selection), and they therefore accumulate slowly in the population. Thus, one might think of the evolution of mean fitness and of mean overlap as occurring on different timescales, and this is what causes them to be effectively decoupled.

As a consequence, mean functional overlap is independent of the fitness effects of deleterious mutations (Fig 2). Whether deleterious mutations have only moderate fitness effects, or whether they are invariably lethal, they are eliminated at a rate much faster than that at which the evolution of overlap occurs. Conversely, mean fitness at equilibrium differs from 1 only by an amount that is of the order of the mutation rate, similar to what would be expected from a haploid two-locus model with recurrent mutations. This is because overlap influences only the rate of deleterious mutations, (1 - r)µ. Whether r is large or small, the reduction in mean fitness from 1 is thus still of the order of µ. An important corollary is that overlapping gene functions are not maintained because they confer a higher mean fitness on a population with greater functional overlap. In this sense maintenance of functional overlap is not an adaptive phenomenon (FUTUYMA 1998).

Evolution of diploidy and redundant gene functions:
A phenomenon superficially related to the evolution of redundancy is the evolution of diploidy. Diploids are able to "mask" the fitness effects of deleterious alleles, which may have been an important reason why diploidy evolved (CROW and KIMURA 1965; KONDRASHOV and CROW 1991; MAYNARD SMITH 1978; PERROT et al. 1991; OTTO and GOLDSTEIN 1992). On the other hand, diploids carry a higher mutational load (mean fitness reduction due to deleterious mutations) than haploids, because they carry twice as many alleles (e.g., CROW and KIMURA 1965). The higher mutational load counteracts the effects of masking, raising the question of which of these two forces wins out in the evolution of ploidy levels. The question has been studied by various authors, mostly in the framework of modifier theory (PERROT et al. 1991; BENGTSSON 1992; OTTO and GOLDSTEIN 1992). The answer depends on details of parameters such as recombination rates and degree of dominance.

There are key differences between the questions studied here and those at issue in existing models of the evolution of ploidy levels. First, the increase in frequency of original gene and duplicate (corresponding to the rise of diploidy from haploid organisms) is not at issue here. Fixation of gene duplications must be a ubiquitous phenomenon, because a vast number of known proteins fall into a small number of gene families. Theoretical work suggests that such fixation is easily accomplished by neutral evolution (even without any selective advantage through "masking") provided that duplications occur at a finite rate (CLARK 1994). At issue is rather the question of whether and how functional overlap is maintained, once established. A second distinction stems from the notion of deleterious mutations. It is best illustrated with the example of a set of genes regulated by one transcription factor whose gene has undergone gene duplication and subsequent loss of some functions between original and duplicate (Fig 1). As long as all genes are properly regulated by at least one of the two factors (Fig 1C), no fitness disadvantage will result. However, from the perspective of a classical population genetic model, all transcription factor genes that have lost the ability to regulate at least one gene in the set would harbor "deleterious" mutations, because if the allele complementing their missing functions (at a different locus in the case of duplicate genes, at the same locus on a sister chromosome in the case of diploidy) did not occur in the same organism, the organism would have reduced fitness. This creates the somewhat paradoxical situation where an entire gene pool may consist of deleterious alleles that nevertheless complement each other functionally, so that no organism with reduced fitness exists in the population.
Thus, where multifunctional proteins may undergo specialization after gene duplications via loss of some of their functions, the notion of intrinsically deleterious mutations may not be a natural one. It may be more expedient to call a mutation deleterious only if it actually affects fitness. However, even if the notion of "intrinsically" deleterious mutations is abandoned, the question remains whether overlapping gene functions influence mean fitness (mutational load) in mutation-selection balance. The answer is yes, but to no significant extent. In the simple case where all deleterious mutations are lethal, mean equilibrium fitness is 1 - 2µ if two genes have completely diverged in function; it is 1 - 2µ(1 - r̄) if mean equilibrium overlap is equal to r̄. Thus, differences in mutational load are small. This result will not be greatly affected if deleterious mutations have less severe effects, because mean equilibrium fitness is still of order 1 - µ.
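To make the size of this effect concrete, the following sketch evaluates the lethal-mutation expression for mean equilibrium fitness, 1 - 2µ(1 - r̄), where r̄ denotes mean equilibrium overlap. The mutation rate µ = 2.5 × 10^-6 is the value used in the simulations of Fig 6; the overlap values and the function name are chosen here only for illustration.

```python
def mean_equilibrium_fitness(mu: float, mean_overlap: float) -> float:
    """Mean fitness at mutation-selection balance when every deleterious
    mutation is lethal: 1 - 2*mu*(1 - mean_overlap)."""
    return 1.0 - 2.0 * mu * (1.0 - mean_overlap)


MU = 2.5e-6  # per-gene mutation rate used in the Fig 6 simulations

for overlap in (0.0, 0.5, 0.9):
    load = 1.0 - mean_equilibrium_fitness(MU, overlap)
    print(f"mean overlap {overlap:.1f}: mutational load = {load:.2e}")
```

Even the extreme comparison between complete overlap and complete divergence changes mean fitness by only 2µ, which is why the maintenance of overlap cannot be attributed to a mean-fitness advantage.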

Decoupling of the evolution of overlap and fitness shows that λ_r and its associated standard deviation of mutational effects σ_mr are the only parameters determining the evolution of functional overlap. Although functional overlap can in principle be measured, as discussed above, such measurements are not available, and thus it is difficult to estimate these parameters from experimental data. As far as σ_mr is concerned, it is perhaps most important that some mutations must increase functional overlap among genes for selection to sustain overlap (WAGNER 1999). Ample indirect evidence that this is the case comes from the many examples of functionally convergent proteins in various organisms (DOOLITTLE 1994). Measurement of λ_r is hampered by two additional factors, the significant stochastic component in the evolution of overlap (see below) and its dependency on the nature of the genes considered.

Population size fluctuations and redundancy:
Natural populations differ from the constructs of theoretical population genetics in many ways, e.g., mating does not occur at random, individuals show a clumped spatial distribution, and populations fluctuate in size. Such deviations can be dealt with effectively by calculating an effective population size N_e according to standard formulas (HARTL and CLARK 1997). A recent survey on the relation of N_e and N in a large number of vertebrate and invertebrate populations suggests that fluctuations in population size are by far the most important factor influencing N_e (FRANKHAM 1995).

Population size is a key factor in determining whether selection can maintain functional overlap among genes. Only if the influence of drift is weak, i.e., in populations with large Nµ, can overlap be maintained indefinitely, or even be built up from disjoint functions (WAGNER 1999). Perhaps surprisingly, N_e does not seem to be sufficient to predict the dynamics of overlap in fluctuating populations (Fig 4). With some caveats due to computational limitations and the absence of analytical predictions, the results shown in Fig 5 suggest that short spikes (e.g., 50–100 generations every 10,000 generations) in Nµ may sometimes be sufficient to sustain high levels of functional overlap. The reason for the inadequacy of N_e to predict the dynamics of overlap has to do with the different evolutionary dynamics of genetic variation (for which N_e was designed) and functional overlap. Whereas genetic variation gets lost much faster in small populations than in large populations, this may not be true for functional overlap, which may be lost slowly in small populations. It would clearly be desirable to have a measure of effective population size suitable to predict the decay of redundancy in fluctuating populations.
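A small calculation illustrates why short spikes in population size barely register in the effective population size. The sketch uses the standard textbook formula for fluctuating populations, the harmonic mean of the per-generation sizes. The baseline size of 100 matches the small populations simulated above, while the spike size of 10^6 and the helper names are hypothetical choices made only for this example.

```python
def harmonic_mean_ne(sizes: list[float]) -> float:
    """Effective population size under fluctuating census size:
    the harmonic mean of the per-generation population sizes."""
    return len(sizes) / sum(1.0 / n for n in sizes)


def spiky_population(baseline: float, spike: float,
                     spike_length: int, period: int) -> list[float]:
    """One period of a population that sits at `baseline` except for a
    spike of `spike_length` generations at size `spike`."""
    return [spike] * spike_length + [baseline] * (period - spike_length)


# 100 generations at N = 10**6 in every 10,000 generations, otherwise N = 100.
sizes = spiky_population(baseline=100, spike=10**6, spike_length=100, period=10_000)
ne = harmonic_mean_ne(sizes)
arithmetic_mean = sum(sizes) / len(sizes)
print(f"harmonic-mean N_e = {ne:.2f}, arithmetic mean N = {arithmetic_mean:.0f}")
```

The harmonic mean stays close to the baseline size because it is dominated by the small-population generations, so two histories with and without spikes can have nearly identical N_e. This only illustrates the arithmetic of N_e; it does not model the dynamics of overlap.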

Population sizes with Nµ >> 1 are realistic for most microorganisms and some invertebrates. However, even in higher vertebrates, census population sizes are sometimes well within this realm, although population sizes may undergo frequent bottlenecks. For example, a study by AVISE et al. (1988) reports a discrepancy between current population sizes and historical population sizes in three vertebrate species including catfish (Arius felis) and redwing blackbirds (Agelaius phoeniceus). Census population sizes suggest a number of 10^7 and 2 × 10^7 breeding females, respectively. However, data on mitochondrial DNA variation suggest that past population sizes must have been substantially smaller (AVISE et al. 1988). Nevertheless, even if large population sizes are sustained over only short periods of time, functional overlap among genes might be maintained in such populations. More data on population size history would be required to determine whether population bursts are frequent enough even in organisms normally characterized by small N.

An observation unrelated to the significance of population fluctuations can be made from the results shown here. "Error" bars shown in Fig 5 represent the standard deviation of functional overlap among populations within a population ensemble. Because these populations spend a significant amount of time in the small Nµ regime, they are monomorphic most of the time. This means that the standard deviation shown is a good measure of differences in functional overlap among populations evolving in parallel. These differences are obviously large (Fig 5), which is due to variation in the number of mutations reducing overlap that go to fixation. In some populations, several such mutations may have gone to fixation, in others only a few. This holds not only for populations evolving in parallel, but would also apply to multiple gene pairs evolving in parallel in one lineage. The importance of stochastic effects may help explain why some developmental genes duplicated early during chordate evolution have preserved greater functional overlap than others. Examples might include members of the MyoD gene family (WEINTRAUB 1993), where significant overlap in functions has been maintained, vs. members of the hedgehog gene family, which have almost completely diverged in expression pattern (BITGOOD and MCMAHON 1995).

Pleiotropy slows the decay of overlap in small populations:
A large fraction of an organism's genomic gene content may be expressed during the development of each organ system (THAKER and KANKEL 1992; DATTA et al. 1993). This suggests that each developmental gene participates in many developmental processes, in line with evidence from many well-studied genes, which are expressed during more than one developmental stage. Such developmental multifunctionality is complemented by biochemical multifunctionality of proteins. Transcription factors exemplify this well (Fig 1). If each gene regulated by a transcription factor is viewed as one function of that factor, then most transcription factors are multifunctional. The abundance of multifunctionality in regulatory genes, both developmentally and biochemically, suggests that pleiotropic effects of mutations should be pervasive, as WRIGHT (1968) already foresaw. In other words, most mutations may affect more than one function of a developmental gene. Now what does this have to do with the evolution of functional overlap?

Envision a scenario such as that shown in Fig 1, where a transcription factor regulating the expression of multiple genes undergoes gene duplication, and subsequently the original and the copy lose some of their functions. In small populations, functional overlap will decay according to (4). The key feature of this equation is that the reduction of overlap proceeds more and more slowly as more and more functions get lost. This is because as the number of functions not covered by the other gene increases, the likelihood that a loss-of-function mutation affects a unique function and is deleterious increases as well. This slowing down in evolution is quite drastic. Pleiotropic mutational effects contribute further to this phenomenon. A simple argument presented above suggests that if a mutation affects l functions at a time, then the probability of a mutation in one of two genes with overlap r being neutral is of order r^l. Thus, if pleiotropic mutations affect a sufficiently large number l of functions, then mutations will often have deleterious effects even if r is large. This should lead to a further slowing down of the decay of redundancy, an intuition confirmed by analytical and simulation results (Fig 6). As the number of functions affected by a mutation increases linearly, the time until overlap decays from one to a given value increases exponentially (Fig 6C).
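The intuition can be illustrated with a deliberately simplified Monte Carlo toy model. It is not the simulation used for Fig 6 and makes several assumptions of its own: populations are treated as monomorphic, a neutral overlap-reducing mutation reaches fixation in any time step with probability rate × r^l, each such fixation multiplies overlap by a fixed factor, and deleterious mutations are simply discarded. All parameter values and names are hypothetical.

```python
import random


def simulate_overlap(l: int, rate: float, shrink: float, steps: int,
                     n_populations: int, seed: int = 0) -> float:
    """Mean overlap across independent monomorphic populations after
    `steps` time steps of the toy model described in the lead-in."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_populations):
        r = 1.0  # complete overlap immediately after duplication
        for _ in range(steps):
            # A neutral, overlap-reducing mutation fixes with probability
            # rate * r**l; otherwise overlap is unchanged in this step.
            if rng.random() < rate * r ** l:
                r *= shrink
        total += r
    return total / n_populations


means = {
    l: simulate_overlap(l, rate=0.01, shrink=0.5, steps=2000, n_populations=200)
    for l in (1, 2, 3)
}
for l, mean_r in means.items():
    print(f"l={l}: mean overlap after 2000 steps = {mean_r:.3f}")
```

In this toy setting, a larger l leaves more overlap after the same number of steps, because each loss of overlap reduces the chance that the next mutation is neutral by a factor that grows with l.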

Knock-out mutations (a severe kind of genetic perturbation) of many developmental genes in vertebrates have weak phenotypic effects. This suggests that significant functional overlap has been maintained among the genes in question, many of which are the product of ancient gene duplications having occurred sometime before the origin of tetrapods ~400 million years ago (WRAY et al. 1996). Such preservation of functional overlap is not the "default" fate of duplicated genes, as several recent studies on the rapid functional diversification of duplicated genes show (CIRERA and AGUADE 1998; TSAUR et al. 1998; ZHANG et al. 1998). Thus, it is likely that some evolutionary force is responsible for its preservation. Natural selection is likely to play an important role in this process, either actively maintaining overlap in large populations or passively delaying its decay in small ones. Even if selection only fulfills this second role, its importance must not be underestimated given the vast amount of time that has elapsed since these duplications occurred.

Manuscript received February 24, 1999; Accepted for publication November 23, 1999.
*  APPENDIX

AN ORDER-OF-MAGNITUDE ESTIMATION FOR THE CORRELATION OF FITNESS AND FUNCTIONAL OVERLAP
One can derive from (6) a recurrence equation for t and solve for at equilibrium by setting t = t+1. This yields

(A1)

which can be rewritten as

(A2)

Each of the terms in the square brackets on the right-hand side is of order unity or less. (Note that 1 > ∫_0^1 ∫_0^1 r^i w^j p(r,w) → 0 as i, j → ∞.) It follows that the covariance of r and w in mutation-selection balance is of order of the mutation rate, i.e., very small. By calculating the equilibrium value of in the same way, it can be shown that at mutation selection equilibrium Cov(r^k, w) = - is of order µ for all k > 1. In other words, ≅ . Substituting this approximation into (A1), and µ cancel out, and one easily obtains

(A3)

This can be rewritten to yield Var(r) > λ_r - 2 = .

Using (6) to establish an equation for the mean fitness at equilibrium, one can show that the variance of w is given by

(A4)

Summarizing, both Cov(r,w) and Var(w) are of order µ, whereas the above form for Var(r) suggests that Var(r) is of order unity. Taken together, this indicates that

(A5)


*  LITERATURE CITED

ALLENDORF, F. W., F. M. UTTER and B. P. MAY, 1975 Gene duplication within the family Salmonidae. II. Detection and determination of the genetic control of duplicate loci through inheritance studies and the examination of populations, pp. 415–432 in Isozymes IV: Genetics and Evolution, edited by C. L. MARKERT. Academic Press, New York.

AVISE, J. C., R. M. BALL, and J. ARNOLD, 1988 Current versus historical population sizes in vertebrate species with high gene flow: a comparison based on mitochondrial DNA lineages and inbreeding theory for neutral mutations. Mol. Biol. Evol. 5:331-344.

BAILEY, W. J., J. KIM, G. P. WAGNER, and F. H. RUDDLE, 1997 Phylogenetic reconstruction of vertebrate Hox cluster duplications. Mol. Biol. Evol. 14:843-853.

BASSON, M. E., M. THORSNESS, and J. RINE, 1986 Saccharomyces cerevisiae contains two functional genes encoding 3-hydroxy-3-methylglutaryl-coenzyme A reductase. Proc. Natl. Acad. Sci. USA 83:5563-5567.

BENGTSSON, B. O., 1992 Deleterious mutations and the origin of the meiotic ploidy cycle. Genetics 131:741-744.

BITGOOD, M. J. and A. P. MCMAHON, 1995 Hedgehog and Bmp genes are coexpressed at many diverse sites of cell-cell interaction in the mouse embryo. Dev. Biol. 172:126-138.

CADIGAN, K. M., U. GROSSNIKLAUS, and W. J. GEHRING, 1994 Functional redundancy: the respective roles of the two sloppy paired genes in Drosophila segmentation. Proc. Natl. Acad. Sci. USA 91:6324-6328.

CIRERA, S. and M. AGUADE, 1998 Molecular evolution of a duplication: the sex-peptide (Acp70A) gene region of Drosophila subobscura and Drosophila madeirensis. Mol. Biol. Evol. 15:988-996.

CLARK, A. G., 1994 Invasion and maintenance of a gene duplication. Proc. Natl. Acad. Sci. USA 91:2950-2954.

CONDIE, B. G. and M. R. CAPECCHI, 1995 Mice with targeted disruptions in the paralogous Hox genes Hoxa-3 and Hoxd-3 reveal genetic interactions. J. Cell. Biochem. 0(Suppl.):A7-A109.

COOKE, J., M. A. NOWAK, M. BOERLIJST, and J. MAYNARD-SMITH, 1997 Evolutionary origins and maintenance of redundant gene expression during metazoan development. Trends Genet. 13:360-364.

CROW, J. F. and M. KIMURA, 1965 Evolution in sexual and asexual populations. Am. Nat. 99:439-450.

DATTA, S., K. STARK, and D. R. KANKEL, 1993 Enhancer detector analysis of the extent of genomic involvement in nervous system development in Drosophila melanogaster. J. Neurobiol. 24:824-841.

DOOLITTLE, R. F., 1994 Convergent evolution: the need to be explicit. Trends Biochem. Sci. 19:15-18.

FERRIS, S. D. and G. S. WHITT, 1976 Loss of duplicate gene expression after polyploidization. Nature 265:258-260.

FERRIS, S. D. and G. S. WHITT, 1979 Evolution of the differential regulation of duplicate genes after polyploidization. J. Mol. Evol. 12:267-317.

FORCE, A., M. LYNCH, F. B. PICKETT, A. AMORES, and Y. L. YAN et al., 1999 Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151:1531-1545.

FRANKHAM, R., 1995 Effective population size/adult population size ratios in wildlife: a review. Genet. Res. 66:95-107.

FUTUYMA, D., 1998 Evolutionary Biology, Ed. 3. Sinauer, Sunderland, MA.

GOLDSTEIN, L. S. B., 1993 Functional redundancy in mitotic force generation. J. Cell Biol. 120:1-3.

GONZÁLES-GAITÁN, E. M., M. ROTHE, E. A. WIMMER, H. TAUBERT, and H. JÄCKLE, 1994 Redundant functions of the genes knirps and knirps-related for the establishment of anterior Drosophila head structures. Proc. Natl. Acad. Sci. USA 91:8567-8571.

GOODSON, H. V. and J. A. SPUDICH, 1995 Identification and molecular characterization of a yeast myosin I. Cell Motil. Cytoskel. 30:73-84.

HANKS, M., W. WURST, L. ANSON-CARTWRIGHT, A. AUERBACH, and A. L. JOYNER, 1995 Rescue of the En-1 mutant phenotype by replacement of En-1 with En-2. Science 269:679-682.

HARTL, D. L., and A. G. CLARK, 1997 Principles of Population Genetics, Ed. 3. Sinauer, Sunderland, MA.

HIGASHIJIMA, S., T. MICHIUE, Y. EMORI, and K. SAIGO, 1992 Subtype determination of Drosophila embryonic external sensory organs by redundant homeo box genes BarH1 and BarH2. Genes Dev. 6:1005-1018.

HOFFMANN, F. M., 1991 Drosophila abl and genetic redundancy in signal transduction. Trends Genet. 7:351-355.

JOYNER, A. L., K. HERRUP, B. A. AUERBACH, C. A. DAVIS, and J. ROSSANT, 1991 Subtle cerebellar phenotype in mice homozygous for a targeted deletion of the en-2 homeobox. Science 251:1239-1243.

KIMURA, M. and J. L. KING, 1979 Fixation of a deleterious allele at one of two "duplicate" loci by mutation pressure and random drift. Proc. Natl. Acad. Sci. USA 76:2858-2861.

KONDRASHOV, A. S. and J. F. CROW, 1991 Haploidy or diploidy: Which is better? Nature 351:314-315.

KOSMAN, D., J. REINITZ and D. H. SHARP, 1998 Automated assay of gene expression at cellular resolution, in Pacific Symposium on Biocomputing 1998, edited by R. B. ALTMAN, K. DUNKER, L. HUNTER and T. E. KLEIN. World Scientific, Singapore.

LI, X. and M. NOLL, 1994a Compatibility between enhancers and promoters determines the transcriptional specificity of gooseberry and gooseberry neuro in the Drosophila embryo. EMBO J. 13:400-406.

LI, X. and M. NOLL, 1994b  Evolution of