Genetics, Vol. 159, 1789-1804, December 2001, Copyright © 2001

The Probability of Preservation of a Newly Arisen Gene Duplicate

Michael Lynch (a), Martin O'Hely (b), Bruce Walsh (c), and Allan Force (d)
a Department of Biology, Indiana University, Bloomington, Indiana 47405,
b Department of Integrative Biology, University of California, Berkeley, California 94720,
c Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona 85721
d Virginia Mason Research Center, Benaroya Research Institute, Seattle, Washington 98101

Corresponding author: Michael Lynch, Department of Biology, Indiana University, Bloomington, IN 47405. E-mail: mlynch@bio.indiana.edu

Communicating editor: M. A. ASMUSSEN


*  ABSTRACT

Newly emerging data from genome sequencing projects suggest that gene duplication, often accompanied by genetic map changes, is a common and ongoing feature of all genomes. This raises the possibility that differential expansion/contraction of various genomic sequences may be just as important a mechanism of phenotypic evolution as changes at the nucleotide level. However, the population-genetic mechanisms responsible for the success vs. failure of newly arisen gene duplicates are poorly understood. We examine the influence of various aspects of gene structure, mutation rates, degree of linkage, and population size (N) on the joint fate of a newly arisen duplicate gene and its ancestral locus. Unless there is active selection against duplicate genes, the probability of permanent establishment of such genes is usually no less than 1/(4N) (half of the neutral expectation), and it can be orders of magnitude greater if neofunctionalizing mutations are common. The probability of a map change (reassignment of a key function of an ancestral locus to a new chromosomal location) induced by a newly arisen duplicate is also generally >1/(4N) for unlinked duplicates, suggesting that recurrent gene duplication and alternative silencing may be a common mechanism for generating microchromosomal rearrangements responsible for postreproductive isolating barriers among species. Relative to subfunctionalization, neofunctionalization is expected to become a progressively more important mechanism of duplicate-gene preservation in populations with increasing size. However, even in large populations, the probability of neofunctionalization scales only with the square of the selective advantage. Tight linkage also influences the probability of duplicate-gene preservation, increasing the probability of subfunctionalization but decreasing the probability of neofunctionalization.


FOSTERED in part by the belief that gene duplication is a major contributor to the origin of evolutionary novelties, substantial theoretical and empirical attention has been given to the evolutionary fates of gene duplicates. The traditional view has been that a gene duplicate will ultimately suffer one of two fates: either one copy will be silenced by degenerative mutations (nonfunctionalization) or one copy will evolve a new beneficial function (neofunctionalization) that permanently preserves it in the population (HALDANE 1933; FISHER 1935; OHNO 1970; NEI and ROYCHOUDHURY 1973; CHRISTIANSEN and FRYDENBERG 1977; BAILEY et al. 1978; TAKAHATA and MARUYAMA 1979; LI 1980; WATTERSON 1983; WALSH 1995). Under this model, the alternative copy always retains the original function. However, a third possible fate has recently been recognized: both copies may be reciprocally preserved through the fixation of complementary loss-of-subfunction mutations (subfunctionalization), which results in a partitioning of the tasks of the ancestral gene (FORCE et al. 1999; LYNCH and FORCE 2000a; STOLTZFUS 2000; WAGNER 2000). Such a partitioning of ancestral-gene tasks may also be driven by a form of positive Darwinian selection, the acquisition of copy-specific mutational refinements to alternative gene subfunctions previously kept at suboptimal levels by pleiotropic constraints (PIATIGORSKY and WISTOW 1991; HUGHES 1994). Finally, it has been suggested that redundancy may be directly advantageous as a mechanism for minimizing the phenotypic effects of null alleles and/or developmental accidents (CLARK 1994; NOWAK et al. 1997; KRAKAUER and NOWAK 1999; WAGNER 1999).

As pointed out by SPOFFORD (1969), a significant gap in our understanding of gene duplication concerns the critical initial phase during which a single copy of a duplicated gene must rise to a high enough frequency in the population to become subject to the mutational processes noted above. Almost all of the existing theory for the evolution of duplicate genes starts with the assumption that all members of the base population carry two fully functional genes at both loci. This is perhaps a reasonable scenario for a newly established polyploid species, but an alternative approach is required to explain the establishment of single-gene duplicates originating by more common processes such as replicative translocation or tandem duplication.

Our focus is on the ultimate fate of a pair of duplicate loci, one of which (the ancestral copy) carries active alleles in all members of the population and the other of which (the descendant copy) is initially represented by a single gene in a single (heterozygous) individual, all other individuals at this latter locus being effectively null homozygotes. We restrict our attention to whole-gene duplication, so that processed pseudogenes or partial duplications are not considered, and we assume that there is no intrinsic disadvantage to duplicates as might arise if gene-dosage issues were important. Given these starting conditions, several potential outcomes can be envisioned:

First, as with any newly arisen mutation, there is a high probability that the new copy will be rapidly lost by random genetic drift. If there is no selective advantage for the new copy, this probability will be equal to $\lambda = 1 - [1/(2N)]$, where N denotes the population size. Upon such an outcome, all evidence of the duplication event will be eliminated from the population.

Second, in the rare event that the new duplicate rises to high frequency, it may randomly accumulate a higher load of degenerative mutations than the ancestral copy and in the absence of any selective advantage may eventually become nonfunctionalized. In this case, the ancestral gene copy is permanently retained, while a semipermanent record of the duplication event may transiently remain in the form of a pseudogene.

Third, if functional alleles rise by chance to high frequency at the new duplicate locus, it is possible that the ancestral copy will become a nonfunctional pseudogene. In this case, the population is again returned to the single-gene state of the ancestral population, but the genomic location of the functional gene will have changed (HALDANE 1933; WALSH 1995).

Finally, both copies of the locus may become permanently preserved either by subfunctionalization, with each copy carrying out a unique set of subfunctions (or both being mutationally reduced to the level of expression of the single-copy ancestral gene), or by neofunctionalization, with one copy evolving a new beneficial function at the expense of the original function (which is retained by the other copy). A change in map position will result if the two loci become subfunctionalized or if the original locus becomes neofunctionalized.
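As a rough check on the drift-loss probability described in the first outcome above, the following minimal sketch follows a single new neutral copy under Wright-Fisher sampling of 2N gene copies. The function name, the small population size (N = 10), the replicate count, and the seed are illustrative assumptions and are not the simulation settings used in this study.

```python
import random

def neutral_fixation_fraction(N, replicates, seed=0):
    """Fraction of replicates in which one new neutral copy reaches fixation.

    Each generation, the 2N gene copies are resampled binomially from the
    current frequency (Wright-Fisher sampling, no selection, no mutation).
    """
    rng = random.Random(seed)
    copies = 2 * N
    fixed = 0
    for _ in range(replicates):
        count = 1  # the duplicate starts as a single copy
        while 0 < count < copies:
            p = count / copies
            count = sum(rng.random() < p for _ in range(copies))
        if count == copies:
            fixed += 1
    return fixed / replicates

N = 10
estimate = neutral_fixation_fraction(N, replicates=20000)
print(f"simulated fixation fraction: {estimate:.4f}  (1/(2N) = {1 / (2 * N):.4f})")
print(f"simulated loss fraction:     {1 - estimate:.4f}  (lambda = {1 - 1 / (2 * N):.4f})")
```

With larger N the fixation fraction becomes very small, which is why the main text scales its summary probabilities by 2N.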

The evolutionary outcome of a gene-duplication event relates to three issues of potentially broad evolutionary significance. First, the mechanisms by which gene duplicates become permanently preserved have a bearing on the evolutionary potential of a species. For example, a neofunctionalizing mutation is equivalent to the origin of an evolutionary novelty, while subfunctionalizing mutations can provide new evolutionary flexibility by releasing an ancestral gene from pleiotropic constraints. We refer to the probability that a newly arisen gene duplicate becomes permanently preserved as $\Theta$. Second, complete or partial silencing of an ancestral gene results in chromosomal repatterning, equivalent to a change in the genetic map, assuming the loci are not completely linked. Such changes are of relevance to the speciation process, as they passively induce postzygotic genomic incompatibilities in hybrid progeny (WERTH and WINDHAM 1991; LYNCH and FORCE 2000b). We refer to the probability that a newly arisen gene duplicate induces a map change as $\Delta$. Third, if duplicate genes become fixed in a population more frequently than their parental loci are lost, an expansion of the genome must occur. We refer to the probability that a newly arisen gene duplicate results in a permanent expansion of the genome size as $\Gamma$. This is equivalent to the probability of joint preservation of a pair of duplicates.

The development of a comprehensive theory for the evolution of duplicate genes raises formidable technical difficulties because the process involves two multiallelic loci with epistatic interactions. We have been successful in deriving some analytical approximations that help provide insight into the mechanisms governing the dynamics of duplicate-gene evolution, but to establish the validity of the theory it has also been necessary to rely extensively on computer simulations.


*  PRESERVATION BY DEGENERATIVE MUTATIONS

The situation in which mutations to novel beneficial functions are sufficiently rare to be ignored provides a useful null model for interpreting the fates of duplicate genes because the evolutionary dynamics are governed entirely by random genetic drift and degenerative mutation. Under this model, a newly arisen gene duplicate has three possible fates: (1) the new copy may simply be lost by random genetic drift and/or silenced by the accumulation of degenerative mutations; (2) the new copy may become permanently fixed in the population, with the original locus subsequently being silenced by degenerative mutations; or (3) both loci may become mutually preserved by subfunctionalization (Fig 1). The probability of preservation of the duplicate gene and, in the case of unlinked duplicates, the probability of a map change are equal to the sum of probabilities of fates 2 and 3, while the rate of genome expansion is equal to the probability of fate 3. To accommodate the fact that all of these probabilities decline rapidly with increasing N [because the probability of initial establishment is on the order of 1/(2N)], we scale the three summary statistics ($\Theta$, $\Delta$, and $\Gamma$) by multiplying by 2N. Letting $P_{\mathrm{non},o}$ denote the probability of silencing of the original locus and $P_{\mathrm{sub}}$ denote the probability of subfunctionalization,

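$$\Theta = \Delta = 2N\,(P_{\mathrm{non},o} + P_{\mathrm{sub}}) \tag{1a}$$

and

$$\Gamma = 2N\,P_{\mathrm{sub}} \tag{1b}$$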



Figure 1. Schematic for the alternative stable outcomes of the gene-duplication process for the subfunctionalization and neofunctionalization models. For both cases, the ancestral gene is on the left and the newly arisen duplicate is on the right. For the subfunctionalization model, the gene is divided into two sections, each one denoting an independently mutable subfunction. Diagonal lines denote loss of function or subfunction; diamonds denote neofunctionalization (with an accompanying loss of the original function). The probabilities of the alternative fates are listed on the left: non, nonfunctionalization; sub, subfunctionalization; neo, neofunctionalization; and o and m, the original and newly arisen locus, respectively. The genomic consequences of the various fates are marked on the right.

With this scaling, $\Theta = 1$ implies that the probability of preservation of a newly arisen gene duplicate is equivalent to the rate of fixation of a neutral mutation, 1/(2N). Definitions of these and all additional terms associated with this model are summarized in Table 1.
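The 2N scaling can be illustrated with a short sketch that converts raw fate probabilities into the three scaled summary statistics for unlinked duplicates, as described in the text (preservation and map change sum fates 2 and 3; genome expansion is fate 3). The function and argument names are hypothetical, and the input probabilities are made-up values chosen only to show the arithmetic.

```python
def scaled_summary(N, p_non_o, p_sub):
    """Return (Theta, Delta, Gamma) for the degenerative-mutation model.

    p_non_o: unscaled probability that the original locus is silenced (fate 2).
    p_sub:   unscaled probability of subfunctionalization (fate 3).
    """
    theta = 2 * N * (p_non_o + p_sub)  # scaled probability of preservation
    delta = theta                      # scaled probability of a map change (unlinked loci)
    gamma = 2 * N * p_sub              # scaled probability of genome expansion
    return theta, delta, gamma

# Illustrative values only: a duplicate preserved with probability 1/(4N),
# entirely through silencing of the original locus.
N = 1000
theta, delta, gamma = scaled_summary(N, p_non_o=1 / (4 * N), p_sub=0.0)
print(theta, delta, gamma)
```

A value of Theta near 1 would correspond to preservation at the neutral fixation rate, 1/(2N).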


 
Table 1. Terms associated with the model incorporating only degenerative mutations

As in most other theoretical investigations of the evolution of duplicate genes, we initially consider the double-null recessive model, whereby all two-locus genotypes have equal fitness except for the inviable double-null homozygotes that completely lack a particular function (or subfunction). Nonfunctionalizing mutations, which eliminate all gene function, arise at each locus at rate $\mu_c$ per gene copy per generation, and, when a gene has independently mutable subfunctions, each subfunction is subject to silencing at rate $\mu_r$. We restrict our attention to the situation in which genes have either a single function (in which case $\mu_r = 0$) or two independently mutable subfunctions (each with the same $\mu_r$). Such subfunctions may be physically defined in a number of ways, including tissue-specific regulatory elements, alternative functional domains of a protein, and/or alternative splice variants. We consider the two extreme situations in which the duplicate loci are either completely linked (i.e., a tandem pair) or freely recombining.

As there is no reason to expect the mutation process to be altered upon gene duplication, we assume that the initial locus has allele frequencies expected under selection-mutation-drift equilibrium prior to duplication. The new locus is then randomly initiated with a single copy of either a fully functional allele or a subfunctional allele, with the probabilities of initial status being defined by the relative equilibrium frequencies of the classes of active alleles at the original locus. We also assume that the founding allele for the new locus is carried initially in a gamete containing its ancestral type at the original locus. In the case of complete linkage, because a duplicate is permanently associated with its parental source, a newly arisen subfunctional gene cannot proceed to fixation, as this would result in the loss of the alternative subfunction. In the case of free recombination, the ancestral locus is guaranteed to be preserved in the event the new locus is founded by a subfunctional allele.

It is well known that the equilibrium frequency of a recessive lethal (nonfunctional) allele for a gene with a single function is $\sqrt{\mu_c}$ in large populations ($N\mu_c > 1$), and this frequency declines in smaller populations (Fig 2). The equilibrium frequency of nonfunctional alleles is reduced when genes have independently mutable subfunctions, but this is more than offset by the frequency of subfunctional alleles (Fig 2). For example, at large N with $\mu_c = \mu_r = 10^{-5}$, each of the two types of subfunctional alleles has an equilibrium frequency of 0.0025, while the null allele has frequency 0.0015. Thus, provided $N > 10^3$, some subfunctional alleles are expected to be segregating at the initial locus unless $\mu_r \ll \mu_c$.
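The square-root result for a single-function gene can be recovered by iterating the standard deterministic recursion for a recessive lethal allele. This is a minimal sketch that ignores drift (so it applies only to the large-N limit) and assumes selection acts before mutation within each generation; the function name and the number of generations are illustrative choices.

```python
import math

def recessive_lethal_equilibrium(mu, generations=20000, q0=0.0):
    """Iterate the null-allele frequency q to mutation-selection balance.

    Selection removes null homozygotes: q -> q(1 - q)/(1 - q^2) = q/(1 + q).
    Mutation then converts functional alleles to nulls at rate mu.
    """
    q = q0
    for _ in range(generations):
        q_after_selection = q / (1.0 + q)
        q = q_after_selection + mu * (1.0 - q_after_selection)
    return q

mu_c = 1e-5
q_hat = recessive_lethal_equilibrium(mu_c)
print(f"iterated equilibrium: {q_hat:.6f}")
print(f"sqrt(mu_c):           {math.sqrt(mu_c):.6f}")
```

The frequencies quoted for genes with two subfunctions come from the simulations in Fig 2 and are not reproduced by this single-function sketch.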



Figure 2. Expected equilibrium frequencies of null and subfunctional alleles at the initial locus at various population sizes, under drift-mutation-selection balance. Results were obtained by computer simulation with the mutation rate to nulls being $\mu_c = 10^{-5}$ and the gene either having a single function ($\mu_r = 0$) or two independently mutable subfunctions with $\mu_r = 10^{-5}$. In the latter case, each of the two possible types of subfunctional alleles has expected frequencies equal to the plotted values.

To evaluate the probabilities of the three alternative fates ($P_{\mathrm{non},o}$, $P_{\mathrm{non},m}$, and $P_{\mathrm{sub}}$) under this model over a range of population sizes, we performed stochastic simulations of a gamete-based model, which we have previously shown to yield equivalent results to individual-based simulations (LYNCH and FORCE 2000a). An effectively infinite gamete pool is assumed so that recombination and mutation can be treated as deterministic processes. Given the expected frequencies of gamete types in any generation, the expected frequencies of zygote genotypes after random mating and selection are determined, and then the actual zygote frequencies are obtained by random sampling of N genotypes. This cycle of events is continued until the final fate of the pair of duplicates has been determined, i.e., when either one locus completely lacks functional alleles (nonfunctionalization) or when each locus has completely lost a unique subfunction (subfunctionalization). For any set of mutational parameters, we typically performed enough simulations so that at least 2500 runs would lead to the gene duplicate becoming well-established in the population by random genetic drift. This required as many as $10^9$ replicate runs at large N, and we employed no fewer than $5 \times 10^6$ runs at small N.
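The generation cycle described above (deterministic mutation in the gamete pool, expected zygote frequencies after random mating and selection, then random sampling of N zygotes) can be sketched for a much simpler case. The code below is a single-locus analogue with one functional and one recessive-lethal null allele; it is not the two-locus model used in this study, and the function name, seed, and parameter values are illustrative assumptions.

```python
import random

def next_generation(q, N, mu, rng):
    """Advance the null-allele gamete frequency q by one generation.

    1. Expected zygote frequencies after random mating, with null
       homozygotes removed by selection (weights need not sum to one).
    2. Random sampling of N surviving zygotes (the only stochastic step).
    3. Deterministic mutation to nulls in an effectively infinite gamete pool.
    """
    p = 1.0 - q
    weights = [p * p, 2.0 * p * q]  # functional homozygote, heterozygote
    sample = rng.choices(["AA", "Aa"], weights=weights, k=N)
    n_het = sum(1 for genotype in sample if genotype == "Aa")
    q_after_sampling = n_het / (2.0 * N)
    return q_after_sampling + mu * (1.0 - q_after_sampling)

rng = random.Random(1)
q = 0.0
for _ in range(500):
    q = next_generation(q, N=1000, mu=1e-5, rng=rng)
print(f"null-allele frequency after 500 generations: {q:.5f}")
```

In the full model the same cycle would be applied to two-locus gamete types, with recombination handled deterministically alongside mutation.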

Linked loci:
Cases of absolute linkage can be treated formally as a single-locus model, and in this case we refer to a linked pair of duplicates as a two-copy allele. Functional two-copy alleles have a slight selective advantage over their single-copy counterparts during the initial phase of establishment because single-copy alleles that experience either subfunctionalizing or nonfunctionalizing mutations can never go to fixation, whereas a mutated two-copy allele can fix as long as the two component genes cover all subfunctions. In small populations, this advantage is negligible because the two-copy allele is either lost or fixed by random genetic drift before a significant probability of mutation has accrued, and the probability that the new duplicate initially drifts to fixation is very close to its initial frequency, 1/(2N). Letting $P'_{\mathrm{non},o}$ and $P'_{\mathrm{sub}}$ denote the subsequent fate probabilities conditional on the two-copy allele having become established, then because nonfunctionalization will occur randomly at one locus or the other, $P'_{\mathrm{non},o} = (1 - P'_{\mathrm{sub}})/2$, and

$$\Theta \simeq 0.5\,(1 + P'_{\mathrm{sub}}) \tag{2a}$$

$$\Gamma \simeq P'_{\mathrm{sub}} \tag{2b}$$

To obtain an expression for $P'_{\mathrm{sub}}$, we note that the probability that the first mutation to be fixed in a two-copy lineage is of a subfunctionalizing type is $2\mu_r/(\mu_c + 2\mu_r)$. Conditional on this occurring, joint preservation of the two genes by subfunctionalization is expected to occur with probability $\alpha = \mu_r/(\mu_c + 2\mu_r)$, because following the loss of one subfunction from one locus, the subfunctional locus is still free to fix subsequent mutations at rate $\mu_r + \mu_c$ (resulting in nonfunctionalization), while the intact locus may only fix a mutation for the alternative subfunction (at rate $\mu_r$, resulting in subfunctionalization; FORCE et al. 1999). Thus, for small N, we expect $P'_{\mathrm{sub}} \simeq 2\alpha^2$, and hence $\Theta \simeq 0.5 + \alpha^2$ and $\Gamma \simeq 2\alpha^2$.

With increasing population size, there is an increasing probability that single-copy alleles will mutate during the long sojourn of a two-copy allele through the population, putting the former at a slight selective disadvantage. Consider, for example, the case of genes with a single function. At the limit as $N \to \infty$, the expected frequency of descendants of the initial two-copy gene among the total pool of functional genes increases from the initial level of 1/(2N) to a stable level of 1/N (Appendix). This transient behavior occurs because the initial mutations experienced by two-copy alleles are completely neutral, which causes their descendants to increase at the expense of one-copy alleles. The increase continues until all two-copy alleles have acquired a mutation in at least one copy, at which point they are selectively equivalent to functional single-copy alleles. These results suggest that at large N a completely linked pair of duplicate genes (in this case, assumed to be incapable of subfunctionalization or neofunctionalization) will fix with probability 1/N, with a random member of the pair becoming silenced, which further implies $\Theta \to 2N \cdot (1/N) \cdot 0.5 = 1.0$ as $N \to \infty$. The temporal dynamics outlined in the Appendix suggest that this large-population approximation should apply provided $N\mu_c > 2$. Using the approach outlined in the Appendix, after considerable analysis, we also obtained results that suggest that $\Theta \to 1.0$ as $N \to \infty$ when there are two independently mutable subfunctions.

The preceding analytical approximations are in close agreement with observations from computer simulations (Fig 3 and Fig 4). At small N, $\alpha = 0.0$ when there is only a single-gene function, yielding $\Theta \simeq 0.5$ and $\Gamma = 0$, whereas $\alpha = 0.333$ when $\mu_r = \mu_c$, yielding $\Theta \simeq 0.611$ and $\Gamma \simeq 0.222$. As $N \to \infty$, $\Theta \to 1.0$ under the conditions of one or two subfunctions, and $\Gamma \to 0$.
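The small-N values quoted above follow directly from the expressions given in the text for linked duplicates: alpha is the ratio of the subfunction-loss rate to the total fixable mutation rate, the conditional probability of subfunctionalization is twice alpha squared, and the scaled statistics are 0.5 plus alpha squared and twice alpha squared. The sketch below simply evaluates these expressions; the function name is an illustrative choice.

```python
def small_n_linked(mu_c, mu_r):
    """Small-N approximations for completely linked duplicates.

    alpha  = mu_r / (mu_c + 2*mu_r)
    P'sub  = 2 * alpha**2        (conditional on establishment)
    Theta  = 0.5 * (1 + P'sub)   (scaled probability of preservation)
    Gamma  = P'sub               (scaled probability of genome expansion)
    """
    alpha = mu_r / (mu_c + 2.0 * mu_r)
    p_sub = 2.0 * alpha ** 2
    theta = 0.5 * (1.0 + p_sub)
    gamma = p_sub
    return alpha, theta, gamma

for mu_r in (0.0, 1e-5):
    alpha, theta, gamma = small_n_linked(mu_c=1e-5, mu_r=mu_r)
    print(f"mu_r = {mu_r:g}: alpha = {alpha:.3f}, Theta = {theta:.3f}, Gamma = {gamma:.3f}")
```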



Figure 3. The scaled probability of preservation of a duplicate gene (also equal to the scaled probability of a map change) for the situation in which the rate of mutation to novel functions is negligible. Open and solid symbols denote results for freely recombining and completely linked loci, respectively. Squares denote the results for the situation in which there are two independently mutable subfunctions, each with mutation rate $\mu_r = 10^{-5}$, and the circles denote the case in which there is a single function ($\mu_r = 0$). In both cases, the rate of origin of mutations that eliminate all function is $\mu_c = 10^{-5}$. The dotted lines denote the analytical approximations for the case of unlinked genes obtained by use of Equations 2a, 3, 4, 6, and 8.



Figure 4. The scaled probability of duplicate-gene preservation by subfunctionalization for the situation in which there are two independently mutable subfunctions and the rate of mutation to novel functions is negligible. Open and solid symbols denote results for freely recombining and completely linked loci, respectively. The mutation rates are $\mu_r = \mu_c = 10^{-5}$. The dotted line denotes the analytical approximation for the case of freely recombining loci, obtained by use of Equations 2b, 3, 4, 6, and 8.

Unlinked duplicates:
For freely recombining loci, the selective advantage of a newly arisen duplicate is negligible because it does not remain associated with a functional partner. The key issue then becomes whether the newly arisen gene is capable of drifting to fixation in an intact state. As pointed out in LYNCH and FORCE (2000a), the probability of subfunctionalization of unlinked duplicates declines with increasing population size because the accumulation of secondary mutations can eventually silence a subfunctional allele during the long (~4N generation; KIMURA and OHTA 1969) sojourn to fixation. To account for this behavior, we present the following approximations, first for a fully functional newborn gene duplicate and then for a subfunctional newborn.

Under the assumption of negligible selection, an initially fully functional allele retains full functionality after 4N generations with probability

$$P_0 = e^{-4N(\mu_c + 2\mu_r)} \tag{3}$$

(again, assuming two independently mutable subfunctions) and will have lost a single subfunction with probability

$$P_1 = 2\,e^{-4N(\mu_c + \mu_r)}\left(1 - e^{-4N\mu_r}\right) \tag{4}$$

Having reached the latter state (with the original locus still intact), joint preservation of the two loci by subfunctionalization will occur with probability $\alpha$, following the logic outlined above. Noting that subsequent fixation events are expected to occur approximately every 4N generations on average and that $P_1 P_0^{t-1}$ is the probability that an initially intact gene has lost a single subfunction $4Nt$ generations following fixation, then the probability of subfunctionalization, conditional on the initial establishment of a duplicate, is

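$$P'_{\mathrm{sub},f} = \alpha \sum_{t=1}^{\infty} P_1 P_0^{\,t-1} = \frac{\alpha P_1}{1 - P_0} \tag{5}$$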

If, on the other hand, the newly arisen duplicate is a copy of a subfunctional allele, then the probability that it is intact after the expected 4N generations required for establishment is

$$P_s = e^{-4N(\mu_c + \mu_r)} \tag{6}$$

and

$$P'_{\mathrm{sub},s} = \alpha P_s \tag{7}$$

is the conditional probability of subfunctionalization. Letting $p_f$ denote the expected initial frequency of the fully functional allele at the original locus, then the weighted conditional probability of subfunctionalization is

$$P'_{\mathrm{sub}} = p_f\,P'_{\mathrm{sub},f} + (1 - p_f)\,P'_{\mathrm{sub},s} \tag{8}$$

For small N, $p_f \simeq 1$ and $P_1/(1 - P_0) \to 2\alpha$, yielding $P'_{\mathrm{sub}} \simeq 2\alpha^2$, and from Equation 2a and Equation 2b, $\Theta = \Delta \simeq 0.5 + \alpha^2$ and $\Gamma \simeq 2\alpha^2$. These results are identical to the expectations for linked duplicates. As $N \to \infty$, $P'_{\mathrm{sub}} \to 0$, implying $\Theta = \Delta \to 0.5$ and $\Gamma \to 0$. This suggests that the probability of duplicate-gene preservation at large N is twofold lower in unlinked than in linked duplicates.
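The two limits just described can be examined numerically with a minimal sketch. It assumes the founding duplicate is fully functional (so the frequency of the fully functional allele is taken as 1), that full functionality is retained over 4N generations with probability exp[-4N(mu_c + 2 mu_r)], and that exactly one subfunction is lost with probability 2 exp[-4N(mu_c + mu_r)](1 - exp[-4N mu_r]). These functional forms are assumptions inferred from the verbal description and the stated limits, and the function name is hypothetical.

```python
import math

def unlinked_approximation(N, mu_c, mu_r):
    """Return (Theta, Gamma) for unlinked duplicates with a fully functional founder.

    Assumed forms (see lead-in): P0 = exp(-4N(mu_c + 2 mu_r)),
    P1 = 2 exp(-4N(mu_c + mu_r)) (1 - exp(-4N mu_r)),
    P'sub = alpha * P1 / (1 - P0), Theta = 0.5 (1 + P'sub), Gamma = P'sub.
    """
    alpha = mu_r / (mu_c + 2.0 * mu_r)
    one_minus_p0 = -math.expm1(-4.0 * N * (mu_c + 2.0 * mu_r))
    lost_one = -math.expm1(-4.0 * N * mu_r)  # 1 - exp(-4 N mu_r)
    p1 = 2.0 * math.exp(-4.0 * N * (mu_c + mu_r)) * lost_one
    p_sub = alpha * p1 / one_minus_p0 if one_minus_p0 > 0.0 else 0.0
    return 0.5 * (1.0 + p_sub), p_sub

for exponent in range(2, 8):
    N = 10 ** exponent
    theta, gamma = unlinked_approximation(N, mu_c=1e-5, mu_r=1e-5)
    print(f"N = 1e{exponent}: Theta = {theta:.3f}, Gamma = {gamma:.3f}")
```

Under these assumptions the output moves from the small-N values toward Theta = 0.5 and Gamma = 0 as N grows; it does not capture the further decline at very large N that the simulations show.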

Provided $N\mu_c < 10$, these analytical approximations for unlinked duplicates yield results that are quite compatible with those obtained by computer simulation (Fig 3 and Fig 4). There are three fairly distinct regions of response to increasing N. First, for $N\mu_c \ll 1$, $\Theta = \Delta \simeq 0.5 + \alpha^2$ and $\Gamma \simeq 2\alpha^2$ as predicted by the theory for small N. Second, for $1 < N\mu_c < 10$, $\Theta = \Delta \simeq 0.5$ and $\Gamma \simeq 0$ as predicted by the theory for large N. Third, as $N\mu_c$ increases beyond 10, $\Theta = \Delta$ gradually approaches zero. Although this latter phase is unaccounted for by the theory, it presumably occurs because when $N\mu_c > 1$ there is a significant probability that all of the descendants of a newly arisen duplicate become silenced by mutations prior to the initial establishment of the lineage. In any event, contrary to the situation for linked duplicates, the probability of preservation of unlinked duplicates declines with increasing population size, although, provided $N\mu_c < 10$, this probability still equals or exceeds 1/(4N).


*  PRESERVATION BY NEOFUNCTIONALIZATION

We now consider the situation in which mutations with phenotypic effects either silence a gene or introduce a new beneficial function at the expense of the original function (Fig 1). The fitness landscape is assumed to be one in which individuals that carry no alleles with the original function have zero fitness, with the remaining genotypes having fitnesses equal to $1 + ns$, where n = 0, 1, 2, or 3 is the number of neofunctional alleles carried. Silencing mutations are assumed to arise at rate $\mu_c$ per gene copy for both types of active alleles, whereas alleles of the "ancestral" type (hereafter referred to as wild type) can also mutate to the neofunctionalized state at rate $\mu_b$.

To evaluate the probabilities of the alternative fates of a pair of duplicate loci subject to beneficial mutations, we employed a simulation approach identical in structure to that described in the previous section, starting with a single-copy locus with allele frequencies equal to the simulated expectations under selection-mutation-drift equilibrium. The newly arisen duplicate was initiated as a single copy randomly recruited from the pool of wild-type and neofunctional alleles at the original locus, and the generation-to-generation cycle of events was continued until the final fate of the pair of duplicates had been established. It is straightforward to identify nonfunctionalization as a final stable state, as this simply requires that one locus becomes fixed for null alleles. Identification of neofunctionalization as a fate is slightly more subjective because, in a finite population, there is always a very small possibility that a neofunctionalized locus may become lost in the future (because it carries a beneficial but nonessential function and is subject to nonfunctionalizing mutations). We considered neofunctionalization to have occurred when one locus had completely lost the wild-type allele and acquired a high enough frequency of the neofunctionalized allele to ensure a probability of fixation of the latter of at least 0.99. Using the diffusion approximation for the fixation probability of a beneficial allele with additive effects (KIMURA 1962), this critical frequency is equal to

$$p^* = -\frac{\ln\left[1 - 0.99\left(1 - e^{-4Ns}\right)\right]}{4Ns} \tag{9}$$

which for large $Ns$ reduces to $p^* \simeq 1.15/(Ns)$. (For the case of completely linked duplicates, this critical frequency must be applied to pairs of two-copy alleles with one neofunctional and one wild-type member, because neofunctional single-copy genes cannot become fixed in the population.) In the simulations that we performed, we assumed that the rate of mutation to neofunctional alleles ($\mu_b = 10^{-9}$ per gene per generation) is much smaller than the mutation rate to nulls ($\mu_c = 10^{-5}$ per gene per generation, as in the previous section), and s was 0.001, 0.01, or 0.1.
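The critical frequency can be evaluated with a short sketch based on the standard diffusion result for an additive beneficial allele, in which the fixation probability from frequency p is (1 - exp[-4Nsp])/(1 - exp[-4Ns]). Solving for the frequency that gives a fixation probability of 0.99 recovers the large-Ns value of about 1.15/(Ns) quoted above. The function names and the population size used here are illustrative assumptions.

```python
import math

def fixation_prob(p, N, s):
    """Diffusion approximation for fixation of an additive beneficial allele at frequency p."""
    if s == 0:
        return p  # neutral limit
    return math.expm1(-4.0 * N * s * p) / math.expm1(-4.0 * N * s)

def critical_frequency(N, s, target=0.99):
    """Frequency at which the fixation probability equals `target`."""
    # 1 - target * (1 - exp(-4Ns)) rewritten with expm1/log1p for numerical stability
    return -math.log1p(target * math.expm1(-4.0 * N * s)) / (4.0 * N * s)

N = 10 ** 5  # illustrative population size
for s in (0.001, 0.01, 0.1):
    p_star = critical_frequency(N, s)
    print(f"s = {s}: p* = {p_star:.3e}, 1.15/(Ns) = {1.15 / (N * s):.3e}, "
          f"u(p*) = {fixation_prob(p_star, N, s):.4f}")
```

Evaluating `fixation_prob` at p = 1/(2N) gives the fixation probability of a single newly arisen copy under the same approximation.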

Under this model, a newly arisen gene duplicate can be regarded as preserved in the population if neofunctionalization occurs at either locus or if the original locus becomes nonfunctionalized. Thus, the scaled probability of preservation is

$$\Theta = 2N\,(P_{\mathrm{non},o} + P_{\mathrm{neo},o} + P_{\mathrm{neo},m}) \tag{10}$$

with the component terms being defined in Table 1 and Table 2. For genes that are not completely linked, a map change occurs if the original locus becomes silenced or neofunctionalized, so the scaled probability of a map change is

$$\Delta = 2N\,(P_{\mathrm{non},o} + P_{\mathrm{neo},o}) \tag{11}$$


 
Table 2. Additional terms associated with the neofunctionalization model

Finally, a new gene is added to the genome whenever one member of the pair is neofunctionalized, as this results in joint preservation of both copies. Hence,

$$\Gamma = 2N\,(P_{\mathrm{neo},o} + P_{\mathrm{neo},m}) \tag{12}$$

A key feature of this model of gene duplication is that the original locus (prior to duplication) can exhibit a balanced polymorphism due to the recurrent input of mutations and to heterozygote superiority. Although neofunctional alleles have zero fitness when in the homozygous state, they have a heterozygote advantage of s when associated with wild-type alleles. For large N, a set of standard recursion equations for allele frequencies (ignoring drift) yields the approximate equilibrium frequencies of the neofunctional (n) and null (0) alleles. For $\mu_c < [s/(1 + s)]^2$,

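$$\hat{p}_n \simeq \frac{s^2 - \mu_c (1 + s)^2}{s\,(1 + 2s)} \tag{13a}$$

$$\hat{p}_0 \simeq \frac{\mu_c (1 + s)}{s} \tag{13b}$$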

whereas for $\mu_c > [s/(1 + s)]^2$,

(14a)


(14b)

These results, combined with observations from computer simulations (Fig 5), illustrate two key points. First, for sufficiently weak positive selection ($\mu_c > [s/(1 + s)]^2$), the mutation pressure against a neofunctional allele overwhelms the selective advantage, maintaining the frequency of neofunctional alleles at the original locus at negligible levels. For example, with $s = 0.001$ and $\mu_c = 10^{-5}$, the neofunctional-allele frequency $\hat{p}_n$ asymptotically approaches ${\sim}\mu_b/(2s) \simeq 5 \times 10^{-7}$ at large N. In this case, a new duplicate locus will almost always be initiated with a wild-type allele, and neofunctionalization will require mutation to new neofunctional alleles subsequent to the duplication process. Second, when selection is stronger ($\mu_c < [s/(1 + s)]^2$), the expected frequency of neofunctional alleles residing at the original locus is nearly a threshold function of population size, being closely approximated by Equation 13a, provided $Ns^2 > 4$, and rapidly dropping to negligible values ($< 1/(2N)$) for N below the threshold. For example, as $N \to \infty$, with $\mu_c = 10^{-5}$, $\hat{p}_n \to 0.0088$ when $s = 0.01$, and $\hat{p}_n \to 0.083$ when $s = 0.1$. This means that at large population sizes with unlinked loci, neofunctionalization need not rely on the rare occurrence of beneficial mutations but can be poised to move forward if (1) the new locus is founded with a neofunctional allele or (2) the new locus is founded with a wild-type allele that subsequently acquires a sufficiently high frequency that the neofunctional alleles at the original locus become subject to directional, rather than balancing, selection.
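The large-N equilibrium frequencies quoted above for stronger selection can be reproduced with a minimal deterministic sketch of the single-locus model (wild-type, neofunctional, and null alleles, with genotypes lacking a wild-type allele having zero fitness and wild-type/neofunctional heterozygotes having fitness 1 + s). The recursion below assumes selection followed by mutation within each generation, and the closed-form expression it is compared against is the equilibrium of that assumed recursion with the beneficial mutation rate neglected; both are assumptions of this sketch, as are the function names and starting frequencies.

```python
def iterate_single_locus(s, mu_c, mu_b, generations=50000, n0=1e-3, z0=1e-3):
    """Iterate neofunctional (n) and null (z) allele frequencies, ignoring drift."""
    n, z = n0, z0
    for _ in range(generations):
        x = 1.0 - n - z          # wild-type allele frequency
        w_wild = 1.0 + s * n     # marginal fitness of a wild-type allele
        w_neo = x * (1.0 + s)    # neofunctional alleles survive only with a wild-type partner
        w_null = x               # null alleles survive only with a wild-type partner
        mean_w = x * w_wild + n * w_neo + z * w_null
        x_sel, n_sel, z_sel = x * w_wild / mean_w, n * w_neo / mean_w, z * w_null / mean_w
        n = n_sel * (1.0 - mu_c) + x_sel * mu_b   # loss to nulls, gain from wild type
        z = z_sel + mu_c * (x_sel + n_sel)        # both active allele types mutate to null
    return n, z

def closed_form(s, mu_c):
    """Equilibrium of the recursion above with mu_b neglected (valid when n_hat > 0)."""
    n_hat = (s ** 2 - mu_c * (1.0 + s) ** 2) / (s * (1.0 + 2.0 * s))
    z_hat = mu_c * (1.0 + s) / s
    return n_hat, z_hat

for s in (0.01, 0.1):
    n_iter, z_iter = iterate_single_locus(s, mu_c=1e-5, mu_b=1e-9)
    n_hat, z_hat = closed_form(s, mu_c=1e-5)
    print(f"s = {s}: iterated n = {n_iter:.4f}, closed form n = {n_hat:.4f}, null = {z_hat:.2e}")
```

The closed form is positive only when the null mutation rate is below [s/(1 + s)]^2, which matches the threshold condition stated in the text.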



Figure 5. Expected equilibrium frequencies of neofunctional (n) and nonfunctional (null, 0) alleles at the initial locus at various population sizes, under drift-mutation-selection balance, obtained by computer simulation.

Linked loci:
In the case of complete linkage, a newly arisen gene duplicate must be of wild type to have any chance of permanent preservation, because under the assumptions of the model a linked pair of neofunctional genes is lethal in the homozygous state. So for linked duplicates, we considered only the case in which the initial duplicate carried the essential ancestral function. In this case, permanent preservation of both loci occurs when the founding two-copy allele goes to fixation and one member evolves a new function. This outcome yields a state of fixed heterozygosity, in the sense that each gamete carries one allele with the ancestral function and another with the new function (SPOFFORD 1969).

As noted above, the case of completely linked duplicates can be treated as a single-locus model with two classes of alleles, single copy and two copy. Ignoring the weak directional forces of selection, a newly arisen linked pair of gene duplicates (i.e., a two-copy allele carrying only wild-type genes) will initially be destined to go to fixation with probability 1/(2N) and otherwise to become lost with probability λ. Should the two-copy allele proceed down the path toward fixation, one member of the pair will ultimately become either silenced or neofunctionalized. For fully redundant genes, silencing mutations go to fixation at the rate of µc per locus, since the number of newly arising mutations is 2Nµc per locus and the probability of fixation of a neutral allele is 1/(2N), whereas beneficial mutations to a novel function go to fixation at the rate of 2NuFµb, as there are again 2N gene copies per locus, each mutating at rate µb and in this case fixing with probability uF. We rely on the diffusion approximation for the probability of fixation of a newly arisen beneficial mutation with additive effects,

(15)

(KIMURA 1962). Letting β = 2NuFµb/(µc + 2NuFµb) denote the relative probability of neofunctionalization, the conditional probabilities of the four possible fates of linked duplicates destined to fixation are

(16a)


(16b)

Were these the only paths to the preservation of a new duplicate, one would expect the upper limit for Θ and Γ to equal 1, because β ≤ 1.0. However, we must also consider the possibility of the appearance of a neofunctionalizing mutation in a two-copy allele that is otherwise destined to be lost by random genetic drift, as this can alter the course of events.
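The fixation rates described above can be illustrated with a short numerical sketch. Two assumptions are made here and are not taken from the article: (1) the fixation probability uF is written in the standard Kimura form for a new additive mutation, (1 − e^(−2s))/(1 − e^(−4Ns)), with s ≥ 0, and (2) β is taken as the neofunctionalizing share of the total fixation rate, 2NuFµb/(µc + 2NuFµb), which reproduces the small-N limit µb/(µc + µb) and the large-N limit of 1 quoted in the text. The value µb = 10⁻⁹ is purely illustrative.

```python
import math


def fixation_probability(s: float, n: float) -> float:
    """Assumed Kimura diffusion form for a new additive mutation, s >= 0."""
    if s == 0.0:
        return 1.0 / (2.0 * n)  # neutral limit
    # expm1 keeps precision when 2s or 4Ns is very small
    return math.expm1(-2.0 * s) / math.expm1(-4.0 * n * s)


def relative_neofunctionalization(s: float, n: float,
                                  mu_b: float, mu_c: float) -> float:
    """Assumed form of beta: share of fixations that are neofunctionalizing."""
    rate_neo = 2.0 * n * fixation_probability(s, n) * mu_b  # beneficial fixations
    rate_silencing = mu_c                                   # neutral silencing fixations
    return rate_neo / (rate_neo + rate_silencing)


if __name__ == "__main__":
    mu_b, mu_c, s = 1e-9, 1e-5, 0.01  # mu_b is an illustrative assumption
    for n in (1e2, 1e4, 1e6, 1e8):
        u_f = fixation_probability(s, n)
        b = relative_neofunctionalization(s, n, mu_b, mu_c)
        print(f"N = {n:.0e}: 2N*uF = {2 * n * u_f:.4g}, beta = {b:.4g}")
```

At small N the product 2N·uF stays near 1, so β sits near µb/(µc + µb); as N grows, 2N·uF grows roughly as 4Ns and β moves toward 1.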

To quantify the probability of such a rescue effect, we need to know the number of alleles that are available targets for neofunctionalizing mutations. The expected number of two-copy alleles in the population in generation t, conditional on not having yet been lost or having been rescued, can be shown to be

(17)

where uL(t) is the probability that the locus has been lost by drift by generation t. Because we are focusing on a large-population phenomenon, uL(t) can be approximated with FISHER's (1922) recursion for a mutant allele initially present in a single copy,

(18)

starting with uL(0) = 0. The probability that a two-copy allele otherwise destined to be lost acquires a neofunctionalizing mutation in generation t that will carry it to fixation is then

(19)

the 2 accounting for the two copies of the ancestral gene per two-copy allele, and the term e^(−µc t) being the probability that a gene within the pair has not acquired a silencing mutation by time t. Letting

(20)

be the probability that an effectively neutral allele destined to eventual loss is lost in generation t and ℓ(t) be the probability that the fate of two-copy alleles has not been determined by generation t, then the partition of the contributions to alternative fates for the λ cases in which a two-copy allele is initially destined to become lost is

(21a)


(21b)

with

(22)

The final probabilities of the four alternative fates are given by

(23a)


(23b)


(23c)


(23d)

(For the reader's convenience, we summarized the definitions of all terms associated with the neofunctionalization model in Table 2.)
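As an aid to intuition for the loss probability uL(t) used above, the following sketch iterates a recursion from uL(0) = 0. It assumes that Equation 18 takes the familiar Poisson branching-process form, uL(t) = exp[uL(t − 1) − 1]; that form is not reproduced in the text here, so it should be read as an assumption rather than as the authors' exact expression. The function name is hypothetical.

```python
import math
from typing import List


def loss_probabilities(generations: int) -> List[float]:
    """Cumulative probability that a single-copy neutral allele is lost by generation t.

    Assumes the recursion u(t) = exp(u(t - 1) - 1) with u(0) = 0.
    """
    u = [0.0]
    for _ in range(generations):
        u.append(math.exp(u[-1] - 1.0))
    return u


if __name__ == "__main__":
    u = loss_probabilities(100)
    for t in (1, 2, 5, 10, 50, 100):
        # 1 - u[t] is the probability that the allele is still segregating
        print(f"t = {t:3d}: uL = {u[t]:.4f}, still present = {1.0 - u[t]:.4f}")
```

Under this assumed form, more than one-third of new single-copy alleles are lost in the first generation, and the probability of persistence declines slowly thereafter, which is why the window for rescue by a neofunctionalizing mutation matters mainly in large populations.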

For the most part, these expressions are in good agreement with the simulated data (Fig 6 and Fig 7). At small population sizes, there is a negligible likelihood of a beneficial mutation resurrecting a two-copy locus destined to be lost by drift, so from Equation 16a and Equation 16b alone, Θ ≈ (1 + β)/2 and Γ ≈ β. At the very smallest population sizes (N < 10³), β asymptotically approaches µb/(µc + µb), which for µb ≪ µc results in Θ → 0.5 + µb/(2µc) and Γ → µb/µc. On the other hand, in the limit as N → ∞, the chance of the original locus becoming silenced is negligible, which results in Γ ≈ Θ scaling nearly linearly with population size.



Figure 6. The scaled probability of preservation of a duplicate gene for the situation in which mutations either completely silence a gene or endow it with a new function at the expense of the old function. Solid lines are the predictions derived from the theory outlined in the text.



Figure 7. The scaled probability of genome expansion per newly arisen gene duplicate for the situation in which mutations either completely silence a gene or endow it with a new function at the expense of the old function. Solid lines are the predictions derived from the theory outlined in the text.

Unlinked loci:
The probability of neofunctionalization can be greatly enhanced in the case of freely recombining loci because a new duplicate locus that is founded by a neofunctionalized allele is free to move toward fixation and because the fates of subsequent mutations at one locus are less influenced by those at the other. Given that the equilibrium allele frequencies at the original locus are related to N and s in a threshold manner (Equation 13a, Equation 13b, Equation 14a, Equation 14b, and Fig 5), two alternative sets of analytical approximations appear to be necessary.

We first consider the situation in which neofunctionalized alleles are likely to be segregating at nonnegligible frequencies, µc < [s/(1 + s)]², which for the parameters that we examined holds for s = 0.1 and 0.01. To have any chance of establishing itself permanently, a newly arisen duplicate locus must be founded by either a neofunctionalized (n) or wild-type (f) allele, the probabilities of which are

(24a)


(24b)

where n and 0 are defined by the values in Fig 5. If the founder allele is of the neofunctional type, the probability of fixation is given by Equation 15 with selection coefficient

(25a)

and, conditional upon such fixation, the original locus must maintain the original function. If the founder allele is wild type, the probability of fixation is a function of the relative fitnesses of the ff, f0, and 00 genotypes at the new locus induced by the presence of 00, n0, and nn genotypes at the original locus, where 0 denotes a nonfunctional allele. The latter genotypes have zero fitness if the genotype at the new locus is 00 but respective fitnesses of 1, 1 + s, and 1 + 2s if the genotype at the new locus is ff or f0. Scaling the fitness of the 00 genotype at the new locus to be equal to one, the initial expected selective advantage of both the ff and f0 genotypes is equal to

(25b)

which for large N and µc < [s/(1 + s)]² simplifies to sf ≈ s²/(1 + 2s). WRIGHT (1969, p. 382) provides a series approximation for the probability of fixation of a dominant beneficial mutation, but for the values of s that we employed this yields results that are very close to the values obtained with Equation 15 after substituting sf for s. Conditional upon fixation of the f allele at the new locus, the neofunctional alleles residing at the original locus may proceed to fixation with probability

(26)

and in the event that this does not occur, one of the two loci is expected to become neofunctionalized via new mutations with probability β. Summing up the various paths, the probabilities of the four alternative fates of the gene pair are then given by

(27a)


(27b)


(27c)


(27d)

where uF(sf) and uF(sn) are obtained from Equation 15 after substituting for s. In the limit for large N, β → 1, pn → s/(1 + 2s), uF(sf) → 2s²/(1 + 2s), and uF(sn) → 2s(1 + s)²/(1 + 2s)², leading to Θ = Γ ≈ 4Ns²(2 + 3s)(1 + s)/(1 + 2s)² and Δ ≈ Θ/(2 + 3s). Provided s < 0.1, these large-N/large-s approximations reduce further to Θ = Γ ≈ 8Ns² and Δ ≈ 4Ns², showing that all three statistics increase linearly with N (implying that the probabilities of these fates are independent of N) and with the square of s.
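The large-N expressions quoted above are easy to evaluate directly. The sketch below is illustrative only: it codes the asymptotic forms exactly as stated in the text, compares them with the further reductions 8Ns² and 4Ns², and converts Θ to an unscaled probability on the assumption (consistent with the Discussion, where Θ > 0.5 corresponds to a probability ≥ 1/(4N)) that Θ is the probability multiplied by 2N. Function names are hypothetical.

```python
def theta_large_n(n: float, s: float) -> float:
    """Large-N form quoted in the text: 4Ns^2 (2 + 3s)(1 + s) / (1 + 2s)^2."""
    return 4.0 * n * s ** 2 * (2.0 + 3.0 * s) * (1.0 + s) / (1.0 + 2.0 * s) ** 2


def delta_large_n(n: float, s: float) -> float:
    """Large-N form quoted in the text: Theta / (2 + 3s)."""
    return theta_large_n(n, s) / (2.0 + 3.0 * s)


def theta_reduced(n: float, s: float) -> float:
    """Further reduction quoted in the text: 8Ns^2."""
    return 8.0 * n * s ** 2


def delta_reduced(n: float, s: float) -> float:
    """Further reduction quoted in the text: 4Ns^2."""
    return 4.0 * n * s ** 2


def unscaled_probability(theta: float, n: float) -> float:
    """Assumes the scaled statistic equals 2N times the probability."""
    return theta / (2.0 * n)


if __name__ == "__main__":
    n = 1e6
    for s in (0.01, 0.05, 0.1):
        full = theta_large_n(n, s)
        print(f"s = {s}: Theta = {full:.4g}, 8Ns^2 = {theta_reduced(n, s):.4g}, "
              f"ratio = {full / theta_reduced(n, s):.3f}, "
              f"probability = {unscaled_probability(full, n):.3g}")
```

Doubling N doubles Θ but leaves the unscaled probability unchanged, which is the sense in which the probability of preservation becomes independent of N in this regime.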

We now turn to the situation in which µc > [s/(1 + s)]², which for the parameters that we examined holds for s = 0.001, and in which case there is a negligible chance of the new locus being initially founded with a neofunctional allele. We again take a cohort approach, similar to that used in the case of linked loci, noting that the founder allele at the new locus is initially destined to fix with probability 1/(2N) and otherwise to be lost with probability λ. In the former case, one of the loci is expected to eventually become neofunctionalized with probability β or to become nonfunctionalized with probability 1 − β. In the latter case, we must account for the possibility that the new locus, otherwise destined to be lost, will be rescued with a neofunctionalizing mutation. The probability of rescue in generation t is given by

(28)

with uF defined by Equation 15 and nm(t) by Equation 17, and the generation-specific contributions to alternative fates for the cases in which the founder allele is initially destined to loss are

(29a)


(29b)

where pL(t) is defined by Equation 20, and

(30)

We then have

(31a)


(31b)


(31c)


(31d)

As can be seen in Fig 6, Fig 7, and Fig 8, the theory for freely recombining duplicates is in fairly close agreement with the values of Θ, Γ, and Δ observed over the full range of N and s, the main exception being the overestimation of Δ at large N when selection is weak. When N is small, Θ = Δ ≈ 0.5 independent of s. This is again a consequence of the fact that the probability of fixation of a newly arisen locus is equal to 1/(2N) and that one of the loci will then almost always become silenced, because of the negligible probability of neofunctionalization. On the other hand, once N exceeds a threshold value (depending on s and µc), Θ scales linearly with N and approximately linearly with s², in agreement with the asymptotic expressions given above. A similar scaling with N and s² is seen for Γ at large N. The abrupt change in the behavior of Θ, Γ, and Δ at intermediate N and strong selection (s = 0.01 and 0.1) corresponds precisely with the abrupt change in frequency of neofunctional alleles at the original locus (Fig 5).



Figure 8. The scaled probability of a map change (for unlinked duplicates) per newly arisen gene duplicate for the situation in which mutations either completely silence a gene or endow it with a new function at the expense of the old function. Solid lines are the predictions derived from the theory outlined in the text.


*  DISCUSSION

These results demonstrate that the evolutionary trajectories of duplicate genes are not just functions of intrinsic organismal properties such as gene structure, regulatory-region complexity, distribution of mutational effects, etc., but are also highly dependent on the effective size of a population. This view suggests that the mechanisms influencing the fates of duplicate genes may vary dramatically among species (and even within the history of individual species lineages) depending on the population size prevailing during the initial appearance of a duplicate gene. Population size influences the evolution of duplicate genes in two ways. First, larger populations are more likely to harbor segregating subfunctional or neofunctional alleles at the ancestral locus prior to duplication, raising the possibility that the newly arisen locus may be founded by an allele other than the wild type and also the possibility that the ancestral locus can rapidly become neofunctionalized (without the reliance on new beneficial mutations) if the new locus becomes established with wild-type alleles. Second, because the time to fixation (and loss) increases with increasing population size, the potential fates of duplicate genes can be altered during the long period in which they drift through large populations and acquire secondary mutations. For example, subfunctional alleles at a new locus may become completely silenced by degenerative mutations prior to fixation, whereas functional alleles that are otherwise destined to be lost by drift can on occasion be rescued by a beneficial mutation. Thus, attempts to understand the evolution of the duplicate genes (and by extrapolation, other aspects of genome expansion/contraction) are not likely to be successful unless they are considered in the context of the genetic properties of finite populations.

Preservation of the new copy:
Two rather different models, one incorporating only degenerative mutations and the other also including beneficial mutations, suggest that the probability of preservation of a newly arisen duplicate gene is generally no less than half of its initial frequency (i.e., Θ > 0.5) regardless of the degree of linkage (Fig 3 and Fig 6). Thus, unless there is active selection against a duplicate gene, its probability of permanent establishment is at least one-half the expected fixation probability of a neutral allele, i.e., ≥1/(4N). Moreover, in the absence of an appreciable likelihood of fixation of beneficial mutations (either because the rate of mutation to such alleles is too low, the beneficial effects are too small, or the population size is insufficiently large), the probability of preservation is unlikely to exceed 1/(2N). On the other hand, in sufficiently large populations, neofunctionalization can lead to probabilities of preservation (per duplication event) that are independent of N and orders of magnitude greater than possible under a scenario dominated by degenerative mutations. Provided the null mutation rate is sufficiently small relative to the strength of selection (µc < [s/(1 + s)]²) and the effective population size is sufficiently large (Ns² > 4; Fig 5), most cases of neofunctionalization following gene duplication are expected to be driven by neofunctional alleles preexisting at the ancestral locus rather than by mutations arising subsequent to the duplication event. If the new locus is founded by a wild-type allele that reaches sufficiently high frequency, natural selection will promote the neofunctional alleles segregating at the original locus. Alternatively, the new locus may be founded by a neofunctional allele that goes to fixation, in which case the original gene function will be maintained at the ancestral locus.

Although our results suggest that subfunctionalization will be a more common mechanism of duplicate-gene preservation in small populations, with neofunctionalization becoming progressively more common as N increases, the exact population size at which neofunctionalization begins to exceed subfunctionalization as a preservational mechanism will depend on the relative rates of origin of the two types of preservational mutations (µr and µb) and on the selective advantage of neofunctional alleles. For the case of neofunctionalization, it is noteworthy that Θ (= Δ) scales not with s, as would normally be expected for an unconditionally advantageous allele at a single locus, but with the square of s. This scaling can be understood most easily by considering the case of unlinked duplicates at large N. If the founding allele at the new locus is wild type, its main initial advantage (relative to "absentee" alleles at the new locus) arises in backgrounds where the genotype at the ancestral locus is of type nn, n0, or 00, and from Equation 13a and Equation 13b it can be seen that the most abundant of these genotypes, nn, has an expected frequency ≈ [s/(1 + 2s)]². On the other hand, if the founding allele is of the neofunctional type, it will go to fixation with probability ≈ 2sf, and from Equation 25b it can be seen that sf ≈ s²/(1 + 2s). Thus, regardless of the nature of the founder allele, its probability of fixation scales approximately with s² at large N. If subfunctionalizing mutations greatly outnumber neofunctionalizing mutations and s is typically small, neither of which seems unlikely, then the majority of successful gene duplicates may owe their preservation to subfunctionalization. Not included in our analyses is the possibility that many duplicates may be subfunctionalized at birth via the duplication process itself, due, for example, to the failure of the duplicated region to cover the full ancestral gene sequence (AVEROF et al. 1996). Such conditions would further increase the relative incidence of subfunctionalization as a preservational process.

In large populations, the degree of linkage between duplicate genes can substantially influence the probability of preservation of a new gene copy (Fig 3 and Fig 6). When degenerative mutations dominate the process, a linked pair of functional duplicates has a weak transient selective advantage over a single-copy allele, because the former requires at least two mutations to be silenced. This results in an increase in the probability of preservation from 1/(4N) at small N to an asymptotic level of 1/(2N) at large N. Thus, in the absence of beneficial mutations, a linked pair of duplicates fixes at the neutral rate at large N despite the fact that the underlying process is non-neutral. This behavior contrasts with that of an unlinked duplicate, which, in the absence of beneficial mutations, is prevented from becoming permanently established in very large populations by saturation with silencing mutations by the time the lineage fixes in the population. In contrast, when neofunctionalizing mutations become a prominent influence, linkage reduces the probability of preservation of gene duplicates. Free recombination facilitates the neofunctionalization process because a pair of completely linked neofunctional genes (or a pair containing one neofunctional and one nonfunctional copy) is prevented from going to fixation by the lack of the critical ancestral gene function.

These results suggest the hypothesis that duplicate genes that are preserved by neofunctionalization will tend to be unlinked, whereas those preserved by subfunctionalization (or silencing of the ancestral gene) will tend to be more closely linked (at least during the period of preservation). It should be noted, however, that although duplicate genes often arise in tandem association with the parental locus, they are frequently recruited to new locations at an early stage of their history (LYNCH and CONERY 2000). The influence of linkage on the fate of a duplicate pair will clearly depend on the timing of such translocation events.

Evolution of genome size:
Although the preservation of duplicate genes often leads to an expansion in genome size, this is not necessarily the case because the preservation of a new gene copy may be balanced by the loss of the ancestral copy. For example, in sufficiently small populations, where the likelihood of neofunctionalization is reduced to negligible levels, a new duplicate may still become preserved if it drifts to fixation and the original locus becomes nonfunctionalized, but in this case there is no net change in genome size. Any pressure toward genome-size expansion is expected to come from subfunctionalization until a critical population size has been reached and neofunctionalization becomes more dominant, the exact threshold population size again depending on µr, µb, s2, and the degree of linkage between ancestral and descendant loci.

Like nucleotide substitutions, insertions, and deletions, gene duplication appears to be a common attribute of all genomes. For example, analysis of the complete genomic sequences of Drosophila melanogaster, Caenorhabditis elegans, and Saccharomyces cerevisiae suggests that new duplications may typically become established in populations at rates on the order of 10⁻³ to 10⁻² per gene per million years (LYNCH and CONERY 2000). These are probably conservative estimates as they do not include duplicates arising in large multigene families. Thus, on a per-locus basis, the rate of gene duplication appears to be of the same order of magnitude as nucleotide substitution. With the typical eukaryotic genome containing on the order of 10⁴ to 10⁵ genes, it appears (very roughly) that 10–1000 new gene duplicates may become established at high frequency per genome on a timescale of 1 million years, with their subsequent long-term fates then depending on the mutational mechanisms outlined above.
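The rough range of 10–1000 follows from multiplying the per-gene establishment rate by the number of genes. A minimal sketch of that arithmetic, using only the order-of-magnitude figures quoted above (the function name is hypothetical):

```python
def duplicates_established_per_my(gene_count: float,
                                  rate_per_gene_per_my: float) -> float:
    """Expected number of newly established duplicates per genome per million years."""
    return gene_count * rate_per_gene_per_my


if __name__ == "__main__":
    # Lower bound: about 10^4 genes at about 10^-3 per gene per million years
    low = duplicates_established_per_my(1e4, 1e-3)
    # Upper bound: about 10^5 genes at about 10^-2 per gene per million years
    high = duplicates_established_per_my(1e5, 1e-2)
    print(f"roughly {low:.0f} to {high:.0f} new duplicates per genome per million years")
```

Because both inputs are order-of-magnitude estimates, the resulting range should be read in the same very rough sense as in the text.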

Because subfunctionalizing and neofunctionalizing mechanisms will generally ensure an innate tendency toward a net accumulation of new genes, stability in genome size requires selection against too many gene duplicates and/or molecular mechanisms that stochastically delete additional copies. In the absence of such opposing forces, one might expect the expansion of genome size to be a self-accelerating process, as the accumulation of more genes provides more substrate for future duplications. However, the opportunities for preservation by subfunctionalization are expected to be reduced as members of a gene family partition up the tasks of the ancestral gene, and, under the neofunctionalization model, the likelihood of establishing a new beneficial function may decline with an increase in organismal complexity; i.e., both µr and µb may decline with increasing genome size. These design limitations alone may constrain the indefinite expansion of genome size, but mutational mechanisms almost certainly play an additional role. For example, nonessential DNA appears to have a half-life of ~14 million years in Drosophila and ~880 million years in mammals (PETROV and HARTL 1998), and comparative analyses have consistently indicated a tendency for the rate of deletion of DNA to exceed that of insertion (DE JONG and RYDEN 1981; GU and LI 1995; LYNCH 1996). Although numerous mechanisms may counteract the innate tendency toward genome expansion generated by gene duplication, it is unlikely that these opposing forces will ever be perfectly balanced. Rather, the genome sizes of individual species may typically undergo stochastic phases of expansion and contraction depending on the prevailing aspects of population size and selection regime.

The mechanisms that we have suggested for the expansion of genome size via duplicate genes need not be all inclusive. For example, it has been suggested that genomic redundancies may be selectively maintained to mask the consequences of null homozygotes or errors in transcription and translation (CLARK 1994; NOWAK et al. 1997; KRAKAUER and NOWAK 1999; WAGNER 1999). Although these types of buffering models are diverse in terms of assumptions, they are most closely related to our analyses in which both the neofunctionalizing and subfunctionalizing mutation rates are equal to zero. In this case, any selective advantage of a newly arisen duplicate gene is entirely derived from masking the effects of the null homozygote at the original locus, whose frequency approaches µc when N is large. However, under this simple model, we find that one member of a duplicate pair is always eventually lost by random genetic drift, even at very large population sizes. This seems to result from the fact that the selective advantage of a duplicate gene under this model (the equilibrium frequency of null homozygotes at the original locus) is less than or equal to the silencing mutation rate. Thus, the permanent preservation of duplicate genes by a buffering mechanism appears to require both very large N and a frequency of null phenotypes elevated above the genetic expectation by errors in intracellular processing.

Alterations of the genetic map:
Gene duplication may be of as much relevance to the origin of new species as it is to the origin of evolutionary novelty within species (WERTH and WINDHAM 1991; LYNCH and FORCE 2000B). As noted above, for unlinked duplicates, the probability of a map change for gene function (or subfunction) is generally no less than 1/(4N) per gene-duplication event, and, in large populations, neofunctionalization can magnify this probability by several orders of magnitude (Fig 8). One consequence of a map change is that double-null homozygotes segregate out with frequency 1/16 in the progeny of F1 hybrids, and additional problems can arise when nulls are not completely recessive, when genomic imprinting occurs, when one member of a pair resides on a sex chromosome, and when the haploid phase of the genome is transcriptionally active (LYNCH and FORCE 2000B). If we accept that the incremental rate of origin of new gene duplicates in a population is somewhere in the range of 10–1000 per million years, then on the order of a dozen to a few hundred potential map changes can be expected to arise in two lineages separated for this time period, the actual number depending on the fraction of newly arisen duplicates that are either unlinked at the time of origin or soon become unlinked by subsequent chromosomal events. Consistent with this view, recent work in comparative genomics indicates that even when gross chromosomal gene order remains roughly stable between species, microchromosomal rearrangements (including reassignments of individual genes to new chromosomal locations associated with duplication events) are quite common among closely related species (KENT and ZAHLER 2000; BANCROFT 2001; DEHAL et al. 2001). An indirect consequence of gene duplication for the origin of map changes that we have not considered here is homologous recombination between duplicated loci, which can produce reciprocal translocations (RYU et al. 1998). Thus, there is little question that duplication-induced map changes are a common genomic property, and the key remaining questions concern the degree to which these, as opposed to other mechanisms (e.g., changes within genes), dominate the process of reproductive isolation.
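The 1/16 frequency of double-null homozygotes mentioned above can be verified by enumeration. The sketch below is illustrative and rests on stated assumptions: the two parental lineages have retained the function at different, unlinked loci, so every F1 hybrid carries one functional and one null allele at each locus, and the progeny come from crosses among F1 individuals with free recombination. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def double_null_frequency() -> Fraction:
    """Fraction of F1 x F1 progeny that are null/null at both unlinked loci."""
    alleles = ("+", "-")  # "+" functional copy, "-" null or absent copy
    # Each F1 gamete carries one allele per locus; with free recombination
    # all four two-locus combinations are equally likely.
    gametes = list(product(alleles, repeat=2))
    total = 0
    double_null = 0
    for egg, sperm in product(gametes, repeat=2):
        total += 1
        # A zygote lacks the function entirely only if all four alleles are null.
        if all(allele == "-" for allele in egg + sperm):
            double_null += 1
    return Fraction(double_null, total)


if __name__ == "__main__":
    print("double-null homozygote frequency:", double_null_frequency())
```

The same enumeration makes clear why each additional divergently resolved, unlinked duplicate pair independently removes a further fraction of the hybrid progeny.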

Although the origin of new species is often viewed as a small-population phenomenon, our results demonstrate how reproductive incompatibilities can passively arise between very large isolated populations. Because Δ increases with increasing s, reproductive incompatibilities induced by gene duplication may be accompanied by the origin of new adaptive functions. However, such an association is a simple consequence of the change in map position that frequently accompanies the origin of genes with new functions, not a result of the adaptive changes themselves. It is noteworthy as well that map displacements of divergently resolved gene duplicates will cause the superficial appearance of negative epistatic interactions in the genetic analysis of hybrid progeny, even in the absence of any interactions between the gene products contributing to novelties in the sister taxa. In this sense, studies of reproductive isolating barriers that do not identify mechanisms to the gene level may be quite deceiving. As emphasized elsewhere (LYNCH and CONERY 2000; LYNCH and FORCE 2000B), the gene-duplication model for the origin of genomic incompatibility is consistent with both the leading genetic models for the origin of reproductive isolation (the epistasis model of DOBZHANSKY 1936 and MULLER 1940 and the chromosomal rearrangement model of WHITE 1978 and others), while invoking fewer assumptions than either. Our results also raise the hypothesis that divergent resolution of gene duplicates following a genome-wide or chromosomal duplication event may promote the origin of many nested reproductive isolation events in descendant lineages, with adaptive radiations following as a secondary consequence.

Future work:
The theory developed in this article is meant to provide some heuristic guidance to our understanding of the mechanisms that lead to the preservation vs. silencing of duplicate genes, and by necessity a number of assumptions have been made. For example, we have focused on nonfunctionalizing and subfunctionalizing mutations of large effects (as have most previous theoretical investigations in this area). However, our earlier work (LYNCH and FORCE 2000A) suggests that additional subfunctions or mutations of minor effect will simply increase the probability of duplicate-gene preservation to a level of 1/(2N) when N is small, and limited simulations at large N suggest the same. In addition, we have ignored issues of dosage, which may play a significant role with genes whose products must be in the correct stoichiometric ratios with those of their interacting partners (FORCE et al. 1999; SHIMELD 1999). Except in the case of duplications involving entire genomes, such effects would impose negative selection against newly arisen duplicates. Finally, in our models involving neofunctionalization, we assumed that a mutant allele with a gain of function fails to perform its original function. One can envision a range of additional models involving neofunctionalizing mutations, the opposite extreme being the case in which neofunctionalization has no impact on the ancestral gene function. In the latter case, however, one would imagine that such unconditionally beneficial mutations would have ample opportunity to arise at the original locus (where virtually all of the mutational substrate resides). We have, therefore, chosen to focus on mutant alleles that depend on the duplication process to provide the freedom necessary to move toward fixation.

These issues aside, it is clear that a definitive understanding of the forces that dictate the fates of duplicate genes will require careful work at the empirical level. Such studies will need to focus on pairs of loci that are relatively early in their phase of establishment because the mutations responsible for the initial preservation of such genes may be substantially different from those that are incurred during subsequent evolutionary history. Unfortunately, almost all existing studies of the biology of duplicate genes have focused on pairs that have been established for so long that it is impossible to identify the mutations that were responsible for their initial preservation. A fundamental issue that remains to be resolved is the extent to which newborn duplicate genes share the full spectrum of functions and efficiencies of their ancestral copy. Although the preceding theory assumes complete functional redundancy, there is no reason why duplicated gene regions should always provide full coverage of upstream and downstream regulatory regions. Less than full coverage will almost certainly modify the potential evolutionary trajectories of newly arisen duplicates, most likely increasing the probability of subfunctionalization, but perhaps providing new opportunities for neofunctionalization as well.

For newly arising pairs of loci, it will be most instructive to know the incidence of active vs. partially or completely silenced alleles at both the original and the descendant locus, as well as the incidence of absenteeism at the new locus. Silent nucleotide sites should help reveal the relative ages of pairs of duplicates (assuming problems with gene conversion are minor), and careful studies of the rate of substitution at silent vs. replacement sites may clarify whether different gene regions are evolving in a neutral fashion, are being maintained by purifying selection, or are in the process of being transformed to new beneficial functions. A series of such studies with loci of different ages could then provide at least a qualitative glimpse into the factors that determine the fates of a typical pair of gene duplicates and the timescale over which these are established. DERMITZAKIS and CLARK (2001) recently proposed a phylogenetic method for testing whether the two members of a duplicate pair evolve in a similar manner over all of their protein-coding domains, showing how significant differences between paralogues can be used to identify the potential footprints of subfunctionalization. In principle, their approach can be extended to regulatory-region DNA, and the conceptual power of the method may be greatly enhanced by the inclusion of an outgroup species containing a single-copy gene. The primary caveat here is that the statistical power of phylogenetic comparison is relatively weak unless the phylogeny is deep enough to contain substantial numbers of nucleotide substitutions, so the method of DERMITZAKIS and CLARK (2001) may be of limited utility in studies of the earliest stages of gene duplication.

As whole genome sequences have emerged for a diversity of species, the identification of newly arisen pairs of duplicates has become quite feasible (LYNCH and CONERY 2000), and it is also clear that duplications still in the process of spreading through a population can be located. An example of such a study is the recent investigation of the α-esterase gene cluster in the D. melanogaster complex (ROBIN et al. 2000). Phylogenetic analysis suggests that one member of this cluster is fixed as a pseudogene in D. melanogaster (a victim of nonfunctionalization), whereas its orthologues remain active and apparently under purifying selection in the closely related species D. simulans and D. yakuba. It seems very likely that this locus contained at least some active alleles in the common ancestor of these three species but had not yet arrived at a stable state. Under this interpretation, the alternative states that have arisen in the descendant lineages may simply be stochastic outcomes of the mutation process and allelic sorting by random genetic drift (as in our simulations). It remains to be seen whether the new locus has been preserved by subfunctionalization or neofunctionalization in the D. simulans and D. yakuba lineages or whether it is still in a phase of resolution (in fact, only a single allele was examined in these two taxa). Several other examples of presence/absence polymorphisms of duplicate genes are known in Drosophila, including metallothionein in D. melanogaster (LANGE et al. 1990), urate oxidase in D. virilis (LOOTENS et al. 1993), and alcohol dehydrogenase in D. funebris (AMADOR and JUAN 1999).

Finally, we note that our results have not entirely clarified the conditions influencing the likelihood of successful gene-duplication events in extremely large populations. On the one hand, neofunctionalizing mutations are most likely to become permanently established in large populations (Fig 6). On the other hand, if the preservational process is largely driven by degenerative mutations or if the selective advantage of a neofunctional allele is sufficiently small, when Nµc >> 10 and the loci are unlinked, it is almost certain that all of the descendants of a newly arisen duplicate will be silenced by the time its lineage is fixed (Fig 3). It is, therefore, at least plausible that the increased genome size of vertebrates (mouse and human) relative to invertebrates (flies and worms), of C. elegans relative to D. melanogaster, and perhaps even eukaryotes relative to prokaryotes is largely an indirect consequence of differences in effective population size. This view does not deny the possibility that increases in genome size may ultimately facilitate the evolution of organismal complexity by natural selection, but it does raise the possibility that nonselective forces, most notably random genetic drift and degenerative mutation, set the initial stage upon which such evolutionary changes can subsequently take place.


*  ACKNOWLEDGMENTS

We thank Kevin Higgins for help with computational procedures. This research was supported by National Institutes of Health (NIH) grant R01-GM36827 to M.L.; by graduate fellowships to A.F. funded by a National Science Foundation (NSF) training grant in genetic mechanisms of evolution and in evolution and by an NIH training grant in developmental biology; and by a postdoctoral fellowship to A.F. funded by an NSF IGERT training grant in evolution, development, and genomics.

Manuscript received April 2, 2001; Accepted for publication August 27, 2001.


*  APPENDIX

Assuming linked duplicates with a single function, we designate the null and functional single-copy alleles as 0 and f, respectively, whereas the four possible two-copy alleles are designated as 00, 0f, f0, and ff. Under the double-null-homozygote model, alleles 0 and 00 are equally viable, and we define their joint frequency to be $p_0$, which implies an absolute fitness for these alleles of $W_0 = 1 - p_0$. All other alleles have absolute fitnesses equal to 1, so that mean population fitness is $\bar{W} = 1 - p_0^2$. The set of recursion equations for allele frequencies under the assumption of an infinite population size is

To transform these difference equations into a solvable set of differential equations, we (1) assume $p_0$ remains at its initial equilibrium value for a one-locus system, $\sqrt{\mu_c}$ (in reality, there is a very slight initial decline in $p_0$ when a functional two-copy allele appears, as this slightly reduces the input into the 0 class); (2) use $1/\bar{W} \simeq 1 + p_0^2 = 1 + \mu_c$; and (3) ignore terms of order $\mu_c^2$. The frequencies of the four classes of active alleles then change according to

Noting that the initial frequencies are $p_f = 1 - 1/(2N) - \sqrt{\mu_c}$, $p_{0f} = p_{f0} = 0$, and $p_{ff} = 1/(2N)$, the solutions of the above equations are

which shows that as $t \to \infty$, the descendants of the founding duplicate rise in frequency from $p_{ff}(0) = 1/(2N)$ to $p_{0f}(\infty) + p_{f0}(\infty) = 1/N$.


*  LITERATURE CITED

AMADOR, A. and E. JUAN, 1999 Nonfixed duplication containing the Adh gene and a truncated form of the Adhr gene in the Drosophila funebris species group: different modes of evolution of Adh relative to Adhr in Drosophila. Mol. Biol. Evol. 16:1439-1456.

AVEROF, M., R. DAWES, and D. FERRIER, 1996 Diversification of arthropod Hox genes as a paradigm for the evolution of gene functions. Semin. Cell Dev. Biol. 7:539-551.

BAILEY, G. S., R. T. M. POULTER, and P. A. STOCKWELL, 1978 Gene duplication in tetraploid fish: model for gene silencing at unlinked duplicated loci. Proc. Natl. Acad. Sci. USA 75:5575-5579.

BANCROFT, I., 2001 Duplicate and diverge: the evolution of plant genome microstructure. Trends Genet. 17:89-93.

CHRISTIANSEN, F. B. and O. FRYDENBERG, 1977 Selection-mutation balance for two nonallelic recessives producing an inferior double homozygote. Am. J. Hum. Genet. 29:195-207.

CLARK, A. G., 1994 Invasion and maintenance of a gene duplication. Proc. Natl. Acad. Sci. USA 91:2950-2954.

DEHAL, P., P. PREDKI, A. S. OLSEN, A. KOBAYASHI, and P. FOLTA et al., 2001 Human chromosome 19 and related regions in mouse: conservative and lineage-specific evolution. Science 293:104-111.

DE JONG, W. W. and L. RYDEN, 1981 Causes of more frequent deletions than insertions in mutations and protein evolution. Nature 290:157-159.

DERMITZAKIS, E. T. and A. G. CLARK, 2001 Differential selection after duplication in mammalian developmental genes. Mol. Biol. Evol. 18:557-562.

DOBZHANSKY, T., 1936 Studies on hybrid sterility. II. Localization of sterility factors in Drosophila pseudoobscura hybrids. Genetics 21:113-135.

FISHER, R. A., 1922 On the dominance ratio. Proc. R. Soc. Edinb. 52:399-433.

FISHER, R. A., 1935 The sheltering of lethals. Am. Nat. 69:446-455.

FORCE, A., M. LYNCH, B. PICKETT, A. AMORES, and Y.-L. YAN et al., 1999 Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151:1531-1545.

GU, X. and W.-H. LI, 1995 The size distribution of insertions and deletions in human and rodent pseudogenes suggests the logarithmic gap penalty for sequence alignment. J. Mol. Evol. 40:464-473.

HALDANE, J. B. S., 1933 The part played by recurrent mutation in evolution. Am. Nat. 67:5-9.

HUGHES, A. L., 1994 The evolution of functionally novel proteins after gene duplication. Proc. R. Soc. Lond. Ser. B 256:119-124.

KENT, W. J. and A. M. ZAHLER, 2000 Conservation, regulation, synteny, and introns in a large-scale C. briggsae-C. elegans genomic alignment. Genome Res. 10:1115-1125.

KIMURA, M., 1962 On the probability of fixation of mutant genes in a population. Genetics 47:713-719.

KIMURA, M. and T. OHTA, 1969 The average number of generations until fixation of a mutant gene in a finite population. Genetics 61:763-771.

KRAKAUER, D. C. and M. A. NOWAK, 1999 Evolutionary preservation of redundant duplicated genes. Semin. Cell Dev. Biol. 10:555-559.

LANGE, B. W., C. H. LANGLEY, and W. STEPHAN, 1990 Molecular evolution of Drosophila metallothionein genes. Genetics 126:921-932.

LI, W.-H., 1980 Rate of gene silencing at duplicate loci: a theoretical study and interpretation of data from tetraploid fishes. Genetics 95:237-258.

LOOTENS, S., J. BURNETT, and T. B. FRIEDMAN, 1993 An intraspecific gene duplication polymorphism of the urate oxidase gene of Drosophila virilis: a genetic and molecular analysis. Mol. Biol. Evol. 10:635-646.

LYNCH, M., 1996 Mutation accumulation in transfer RNAs: molecular evidence for Muller's ratchet in mitochondrial genomes. Mol. Biol. Evol. 13:209-220.

LYNCH, M. and J. C. CONERY, 2000 The evolutionary fate and consequences of duplicate genes. Science 290:1151-1154.

LYNCH, M. and A. FORCE, 2000a The probability of duplicate gene preservation by subfunctionalization. Genetics 154:459-473.

LYNCH, M. and A. FORCE, 2000b The origin of interspecific genomic incompatibility via gene duplication. Am. Nat. 156:590-605.

MULLER, H. J., 1940 Bearing of the Drosophila work on systematics, pp. 185-268 in The New Systematics, edited by J. S. HUXLEY. Clarendon Press, Oxford.

NEI, M. and A. K. ROYCHOUDHURY, 1973 Probability of fixation of nonfunctional genes at duplicate loci. Am. Nat. 107:362-372.

NOWAK, M. A., M. C. BOERLIJST, J. COOKE, and J. MAYNARD SMITH, 1997 Evolution of genetic redundancy. Nature 388:167-170.

OHNO, S., 1970 Evolution by Gene Duplication. Springer-Verlag, Berlin.

PETROV, D. A. and D. L. HARTL, 1998 High rate of DNA loss in the Drosophila melanogaster and Drosophila virilis groups. Mol. Biol. Evol. 15:293-302.

PIATIGORSKY, J. and G. WISTOW, 1991 The recruitment of crystallins: new functions precede gene duplication. Science 252:1078-1079.

ROBIN, G. C. DE Q., R. J. RUSSELL, D. J. CUTLER, and J. G. OAKESHOTT, 2000 The evolution of an α-esterase pseudogene inactivated in the Drosophila melanogaster lineage. Mol. Biol. Evol. 17:563-575.

RYU, S.-L., Y. MUROOKA, and Y. KANEKO, 1998 Reciprocal translocation at duplicated RPL2 loci might cause speciation of Saccharomyces bayanus and Saccharomyces cerevisiae. Curr. Genet. 33:345-351.

SHIMELD, S. M., 1999 Gene function, gene networks and the fate of duplicated genes. Semin. Cell Dev. Biol. 10:549-553.

SPOFFORD, J. B., 1969 Heterosis and the evolution of duplications. Am. Nat. 103:407-432.

STOLTZFUS, A., 2000 On the possibility of constructive neutral evolution. J. Mol. Evol. 49:169-181.

TAKAHATA, N. and T. MARUYAMA, 1979 Polymorphism and loss of duplicate gene expression: a theoretical study with application to tetraploid fish. Proc. Natl. Acad. Sci. USA 76:4521-4525.

WAGNER, A., 1999 Redundant gene functions and natural selection. J. Evol. Biol. 12:1-16.

WAGNER, A., 2000 The role of population size, pleiotropy and fitness effects of mutations in the evolution of overlapping gene functions. Genetics 154:1389-1401.

WALSH, J. B., 1995 How often do duplicated genes evolve new functions? Genetics 139:421-428.

WATTERSON, G. A., 1983 On the time for gene silencing at duplicate loci. Genetics 105:745-766.

WERTH, C. R. and M. D. WINDHAM, 1991 A model for divergent, allopatric speciation of polyploid pteridophytes resulting from silencing of duplicate-gene expression. Am. Nat. 137:515-526.

WHITE, M. J. D., 1978 Modes of Speciation. Freeman, San Francisco.

WRIGHT, S., 1969 Evolution and the Genetics of Populations, Vol. 2, The Theory of Gene Frequencies. University of Chicago Press, Chicago.



