The Probability of Preservation of a Newly Arisen Gene Duplicate
Michael Lynch^a, Martin O'Hely^b, Bruce Walsh^c, and Allan Force^d
a Department of Biology, Indiana University, Bloomington, Indiana 47405,
b Department of Integrative Biology, University of California, Berkeley, California 94720,
c Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona 85721
d Virginia Mason Research Center, Benaroya Research Institute, Seattle, Washington 98101
Corresponding author: Michael Lynch, Department of Biology, Indiana University, Bloomington, IN 47405. E-mail: mlynch{at}bio.indiana.edu
Communicating editor: M. A. ASMUSSEN
| ABSTRACT |
|---|
Newly emerging data from genome sequencing projects suggest that gene duplication, often accompanied by genetic map changes, is a common and ongoing feature of all genomes. This raises the possibility that differential expansion/contraction of various genomic sequences may be just as important a mechanism of phenotypic evolution as changes at the nucleotide level. However, the population-genetic mechanisms responsible for the success vs. failure of newly arisen gene duplicates are poorly understood. We examine the influence of various aspects of gene structure, mutation rates, degree of linkage, and population size (N) on the joint fate of a newly arisen duplicate gene and its ancestral locus. Unless there is active selection against duplicate genes, the probability of permanent establishment of such genes is usually no less than 1/(4N) (half of the neutral expectation), and it can be orders of magnitude greater if neofunctionalizing mutations are common. The probability of a map change (reassignment of a key function of an ancestral locus to a new chromosomal location) induced by a newly arisen duplicate is also generally >1/(4N) for unlinked duplicates, suggesting that recurrent gene duplication and alternative silencing may be a common mechanism for generating microchromosomal rearrangements responsible for postreproductive isolating barriers among species. Relative to subfunctionalization, neofunctionalization is expected to become a progressively more important mechanism of duplicate-gene preservation in populations with increasing size. However, even in large populations, the probability of neofunctionalization scales only with the square of the selective advantage. Tight linkage also influences the probability of duplicate-gene preservation, increasing the probability of subfunctionalization but decreasing the probability of neofunctionalization.
FOSTERED in part by the belief that gene duplication is a major contributor to the origin of evolutionary novelties, substantial theoretical and empirical attention has been given to the evolutionary fates of gene duplicates. The traditional view has been that a gene duplicate will ultimately suffer one of two fates: either one copy will be silenced by degenerative mutations (nonfunctionalization) or one copy will evolve a new beneficial function (neofunctionalization) that permanently preserves it in the population.
As pointed out by ![]()
Our focus is on the ultimate fate of a pair of duplicate loci, one of which (the ancestral copy) carries active alleles in all members of the population and the other of which (the descendant copy) is initially represented by a single gene in a single (heterozygous) individual, all other individuals at this latter locus being effectively null homozygotes. We restrict our attention to whole-gene duplication, so that processed pseudogenes or partial duplications are not considered, and we assume that there is no intrinsic disadvantage to duplicates as might arise if gene-dosage issues were important. Given these starting conditions, several potential outcomes can be envisioned:
First, as with any newly arisen mutation, there is a high probability that the new copy will be rapidly lost by random genetic drift. If there is no selective advantage for the new copy, this probability will be equal to 1 - [1/(2N)], where N denotes the population size. Upon such an outcome, all evidence of the duplication event will be eliminated from the population.
Second, in the rare event that the new duplicate rises to high frequency, it may randomly accumulate a higher load of degenerative mutations than the ancestral copy and in the absence of any selective advantage may eventually become nonfunctionalized. In this case, the ancestral gene copy is permanently retained, while a semipermanent record of the duplication event may transiently remain in the form of a pseudogene.
Third, if functional alleles rise by chance to high frequency at the new duplicate locus, it is possible that the ancestral copy will become a nonfunctional pseudogene. In this case, the population is again returned to the single-gene state of the ancestral population, but the genomic location of the functional gene will have changed.
Finally, both copies of the locus may become permanently preserved either by subfunctionalization, with each copy carrying out a unique set of subfunctions (or both being mutationally reduced to the level of expression of the single-copy ancestral gene), or by neofunctionalization, with one copy evolving a new beneficial function at the expense of the original function (which is retained by the other copy). A change in map position will result if the two loci become subfunctionalized or if the original locus becomes neofunctionalized.
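The first outcome listed above, loss of a neutral duplicate with probability 1 - [1/(2N)], can be checked numerically. The following is a minimal sketch, assuming a standard Wright-Fisher model with 2N gene copies resampled binomially each generation; it is not the gamete-based simulation used in this article, and the function name and parameter values are hypothetical choices for illustration.

```python
import random

def neutral_fixation_fraction(n_diploid, replicates, seed=1):
    """Estimate the fixation probability of one neutral copy under Wright-Fisher drift."""
    rng = random.Random(seed)
    copies = 2 * n_diploid              # total gene copies at the new locus
    fixed = 0
    for _ in range(replicates):
        count = 1                       # the duplicate starts as a single copy
        while 0 < count < copies:
            freq = count / copies
            # binomial resampling of all gene copies for the next generation
            count = sum(rng.random() < freq for _ in range(copies))
        fixed += count == copies
    return fixed / replicates

# Assumed illustration values: N = 10 diploid individuals, 20000 replicates.
estimate = neutral_fixation_fraction(n_diploid=10, replicates=20000)
print(estimate, 1 / (2 * 10))
```

The estimated fixation fraction should lie close to 1/(2N), so the complementary fraction of replicates, close to 1 - [1/(2N)], ends in loss of the duplicate.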
The evolutionary outcome of a gene-duplication event relates to three issues of potentially broad evolutionary significance. First, the mechanisms by which gene duplicates become permanently preserved have a bearing on the evolutionary potential of a species. For example, a neofunctionalizing mutation is equivalent to the origin of an evolutionary novelty, while subfunctionalizing mutations can provide new evolutionary flexibility by releasing an ancestral gene from pleiotropic constraints. We refer to the probability that a newly arisen gene duplicate becomes permanently preserved as the probability of preservation. Second, complete or partial silencing of an ancestral gene results in chromosomal repatterning, equivalent to a change in the genetic map, assuming the loci are not completely linked. Such changes are of relevance to the speciation process, as they passively induce postzygotic genomic incompatibilities in hybrid progeny. We refer to the probability of such an outcome as the probability of a map change. Third, if duplicate genes become fixed in a population more frequently than their parental loci are lost, an expansion of the genome must occur. We refer to the probability that a newly arisen gene duplicate results in a permanent expansion of the genome size as the probability of genome expansion. This is equivalent to the probability of joint preservation of a pair of duplicates.
The development of a comprehensive theory for the evolution of duplicate genes raises formidable technical difficulties because the process involves two multiallelic loci with epistatic interactions. We have been successful in deriving some analytical approximations that help provide insight into the mechanisms governing the dynamics of duplicate-gene evolution, but to establish the validity of the theory it has also been necessary to rely extensively on computer simulations.
| PRESERVATION BY DEGENERATIVE MUTATIONS |
|---|
The situation in which mutations to novel beneficial functions are sufficiently rare to be ignored provides a useful null model for interpreting the fates of duplicate genes because the evolutionary dynamics are governed entirely by random genetic drift and degenerative mutation. Under this model, a newly arisen gene duplicate has three possible fates: (1) The new copy may simply be lost by random genetic drift and/or silenced by the accumulation of degenerative mutations; (2) the new copy may become permanently fixed in the population, with the original locus subsequently being silenced by degenerative mutations; or (3) both loci may become mutually preserved by subfunctionalization (Fig 1). The probability of preservation of the duplicate gene and, in the case of unlinked duplicates, the probability of a map change are equal to the sum of probabilities of fates 2 and 3, while the rate of genome expansion is equal to the probability of fate 3. To accommodate the fact that all of these probabilities decline rapidly with increasing N [because the probability of initial establishment is on the order of 1/(2N)], we scale the three summary statistics (
the probabilities of preservation, of a map change, and of genome expansion) by multiplying by 2N. Letting Pnon,o denote the probability of silencing of the original locus and Psub denote the probability of subfunctionalization,
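$$2N \cdot \Pr(\text{preservation}) = 2N \cdot \Pr(\text{map change}) = 2N\,(P_{non,o} + P_{sub}) \tag{1a}$$
and
$$2N \cdot \Pr(\text{genome expansion}) = 2N\,P_{sub} \tag{1b}$$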
With this scaling, a scaled probability of preservation equal to 1 implies that the probability of preservation of a newly arisen gene duplicate is equivalent to the rate of fixation of a neutral mutation, 1/(2N). Definitions of these and all additional terms associated with this model are summarized in Table 1.
As in most other theoretical investigations of the evolution of duplicate genes, we initially consider the double-null recessive model, whereby all two-locus genotypes have equal fitness except for the inviable double-null homozygotes that completely lack a particular function (or subfunction). Nonfunctionalizing mutations, which eliminate all gene function, arise at each locus at rate µc per gene copy per generation, and, when a gene has independently mutable subfunctions, each subfunction is subject to silencing at rate µr. We restrict our attention to the situation in which genes have either a single function (in which case µr = 0) or two independently mutable subfunctions (each with the same µr). Such subfunctions may be physically defined in a number of ways, including tissue-specific regulatory elements, alternative functional domains of a protein, and/or alternative splice variants. We consider the two extreme situations in which the duplicate loci are either completely linked (i.e., a tandem pair) or freely recombining.
As there is no reason to expect the mutation process to be altered upon gene duplication, we assume that the initial locus has allele frequencies expected under selection-mutation-drift equilibrium prior to duplication. The new locus is then randomly initiated with a single copy of either a fully functional allele or a subfunctional allele, with the probabilities of initial status being defined by the relative equilibrium frequencies of the classes of active alleles at the original locus. We also assume that the founding allele for the new locus is carried initially in a gamete containing its ancestral type at the original locus. In the case of complete linkage, because a duplicate is permanently associated with its parental source, a newly arisen subfunctional gene cannot proceed to fixation, as this would result in the loss of the alternative subfunction. In the case of free recombination, the ancestral locus is guaranteed to be preserved in the event the new locus is founded by a subfunctional allele.
It is well known that the equilibrium frequency of a recessive lethal (nonfunctional) allele for a gene with a single function is √µc in large populations (Nµc > 1), and this frequency declines in smaller populations (Fig 2). The equilibrium frequency of nonfunctional alleles is reduced when genes have independently mutable subfunctions, but this is more than offset by the frequency of subfunctional alleles (Fig 2). For example, at large N with µc = µr = 10⁻⁵, each of the two types of subfunctional alleles has an equilibrium frequency of 0.0025, while the null allele has a frequency of 0.0015. Thus, provided N > 10³, some subfunctional alleles are expected to be segregating at the initial locus unless µr << µc.
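The large-population equilibrium for a recessive lethal, the square root of the mutation rate, can be recovered by iterating a deterministic selection-mutation recursion. This is a minimal sketch for the single-function case only, assuming an infinite population, random mating, and one-way mutation from functional to null at rate µc; the function name is hypothetical and drift is ignored, so it does not reproduce the small-N decline shown in Fig 2.

```python
import math

def lethal_equilibrium(mu, generations=20000):
    """Iterate the null-allele frequency q to its selection-mutation balance."""
    q = 0.0
    for _ in range(generations):
        # selection: null homozygotes (frequency q^2) are removed
        q_sel = q / (1.0 + q)
        # mutation: functional alleles become null at rate mu
        q = q_sel + mu * (1.0 - q_sel)
    return q

mu_c = 1e-5
print(lethal_equilibrium(mu_c), math.sqrt(mu_c))
```

Setting the recursion to equilibrium gives q² = µc exactly, which is why the iteration settles on the square-root value.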
To evaluate the probabilities of the three alternative fates (Pnon,o, Pnon,m, and Psub) under this model over a range of population sizes, we performed stochastic simulations of a gamete-based model, which we have previously shown to yield results equivalent to those of individual-based simulations.
Linked loci:
Cases of absolute linkage can be treated formally as a single-locus model, and in this case we refer to a linked pair of duplicates as a two-copy allele. Functional two-copy alleles have a slight selective advantage over their single-copy counterparts during the initial phase of establishment because single-copy alleles that experience either subfunctionalizing or nonfunctionalizing mutations can never go to fixation, whereas a mutated two-copy allele can fix as long as the two component genes cover all subfunctions. In small populations, this advantage is negligible because the two-copy allele is either lost or fixed by random genetic drift before a significant probability of mutation has accrued, and the probability that the new duplicate initially drifts to fixation is very close to its initial frequency, 1/(2N). Letting P'non,o and P'sub denote the subsequent fate probabilities conditional on the two-copy allele having become established, then because nonfunctionalization will occur randomly at one locus or the other, P'non,o =
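(1 - P'sub)/2, and
$$2N \cdot \Pr(\text{preservation}) \approx P'_{non,o} + P'_{sub} = \frac{1 + P'_{sub}}{2} \tag{2a}$$
$$2N \cdot \Pr(\text{genome expansion}) \approx P'_{sub} \tag{2b}$$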
To obtain an expression for P'sub, we note that the probability that the first mutation to be fixed in a two-copy lineage is of a subfunctionalizing type is 2µr/(µc + 2µr). Conditional on this occurring, joint preservation of the two genes by subfunctionalization is expected to occur with probability µr/(µc + 2µr), because following the loss of one subfunction from one locus, the subfunctional locus is still free to fix subsequent mutations at rate µr + µc (resulting in nonfunctionalization), while the intact locus may only fix a mutation for the alternative subfunction (at rate µr, resulting in subfunctionalization). Thus, P'sub ≈ 2[µr/(µc + 2µr)]², and hence the scaled probability of preservation ≈ 0.5 + [µr/(µc + 2µr)]² and the scaled probability of genome expansion ≈ 2[µr/(µc + 2µr)]².
With increasing population size, there is an increasing probability that single-copy alleles will mutate during the long sojourn of a two-copy allele through the population, putting the former at a slight selective disadvantage. Consider, for example, the case of genes with a single function. In the limit as N → ∞, the expected frequency of descendants of the initial two-copy gene among the total pool of functional genes increases from the initial level of 1/(2N) to a stable level of 1/N (Appendix). This transient behavior occurs because the initial mutations experienced by two-copy alleles are completely neutral, which causes their descendants to increase at the expense of one-copy alleles. The increase continues until all two-copy alleles have acquired a mutation in at least one copy, at which point they are selectively equivalent to functional single-copy alleles. These results suggest that at large N a completely linked pair of duplicate genes (in this case, assumed to be incapable of subfunctionalization or neofunctionalization) will fix with probability 1/N, with a random member of the pair becoming silenced, which further implies that the scaled probability of preservation ≈ 2N · (1/N) · 0.5 = 1.0 as N → ∞. The temporal dynamics outlined in the Appendix suggest that this large-population approximation should apply provided Nµc > 2. Using the approach outlined in the Appendix, after considerable analysis, we also obtained results that suggest that the scaled probability of preservation → 1.0 as N → ∞ when there are two independently mutable subfunctions.
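The preceding analytical approximations are in close agreement with observations from computer simulations (Fig 3 and Fig 4). At small N, µr/(µc + 2µr) = 0.0 when there is only a single-gene function, yielding a scaled probability of preservation ≈ 0.5 and a scaled probability of genome expansion = 0, whereas µr/(µc + 2µr) = 0.333 when µr = µc, yielding a scaled probability of preservation ≈ 0.611 and a scaled probability of genome expansion ≈ 0.222. As N → ∞, the scaled probability of preservation → 1.0 under the conditions of one or two subfunctions, and the scaled probability of genome expansion → 0.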
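The small-N values quoted above for linked duplicates follow from a few lines of arithmetic. The sketch below assumes the small-population relations implied by the text: a conditional subfunctionalization probability of 2[µr/(µc + 2µr)]², silencing of the original locus in half of the remaining cases, and initial fixation with probability 1/(2N), so that the 2N scaling cancels. The function and variable names are hypothetical.

```python
def small_n_linked(mu_c, mu_r):
    """Scaled small-N fate probabilities for completely linked duplicates."""
    ratio = mu_r / (mu_c + 2.0 * mu_r)   # chance the intact locus subfunctionalizes next
    p_sub = 2.0 * ratio ** 2             # conditional probability of subfunctionalization
    p_non_o = (1.0 - p_sub) / 2.0        # original locus silenced in half of other cases
    preservation = p_non_o + p_sub       # scaled probability of preservation
    expansion = p_sub                    # scaled probability of genome expansion
    return ratio, preservation, expansion

print(small_n_linked(1e-5, 0.0))    # single function
print(small_n_linked(1e-5, 1e-5))   # two subfunctions, mu_r = mu_c
```

With µr = µc the three returned values round to 0.333, 0.611, and 0.222, matching the figures given in the text.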
Unlinked duplicates:
For freely recombining loci, the selective advantage of a newly arisen duplicate is negligible due to the fact that it does not remain associated with a functional partner. The key issue then becomes whether the newly arisen gene is capable of drifting to fixation in an intact state. As pointed out in ![]()
4N generation; ![]()
Under the assumption of negligible selection, an initially fully functional allele retains full functionality after 4N generations with probability
![]() |
(3) |
(again, assuming two independently mutable subfunctions) and will have lost a single subfunction with probability
![]() |
(4) |
Having reached the latter state (with the original locus still intact), joint preservation of the two loci by subfunctionalization will occur with probability
µr/(µc + 2µr), following the logic outlined above. Noting that subsequent fixation events are expected to occur approximately every 4N generations on average and that P1·P0^(t-1) is the probability that an initially intact gene has lost a single subfunction 4Nt generations following fixation, then the probability of subfunctionalization, conditional on the initial establishment of a duplicate, is
![]() |
(5) |
If, on the other hand, the newly arisen duplicate is a copy of a subfunctional allele, then the probability that it is intact after the expected 4N generations required for establishment is
![]() |
(6) |
and
![]() |
(7) |
is the conditional probability of subfunctionalization. Letting pf denote the expected initial frequency of the fully functional allele at the original locus, then the weighted conditional probability of subfunctionalization is
![]() |
(8) |
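For small N, pf ≈ 1 and P1/(1 - P0) ≈ 2µr/(µc + 2µr), yielding P'sub ≈ 2[µr/(µc + 2µr)]², and from Equation 2a and Equation 2b, the scaled probabilities of preservation and of a map change are both ≈ 0.5 + [µr/(µc + 2µr)]² and the scaled probability of genome expansion is ≈ 2[µr/(µc + 2µr)]². These results are identical to the expectations for linked duplicates. As N → ∞, P'sub → 0, implying that the scaled probabilities of preservation and of a map change both → 0.5 and that the scaled probability of genome expansion → 0. This suggests that the probability of duplicate-gene preservation at large N is twofold lower in unlinked than in linked duplicates.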
Provided Nµc < 10, these analytical approximations for unlinked duplicates yield results that are quite compatible with those obtained by computer simulation (Fig 3 and Fig 4). There are three fairly distinct regions of response to increasing N. First, for Nµc << 1, the scaled probabilities of preservation and of a map change are both ≈ 0.5 + [µr/(µc + 2µr)]² and the scaled probability of genome expansion is ≈ 2[µr/(µc + 2µr)]², as predicted by the theory for small N. Second, for 1 < Nµc < 10, the scaled probabilities of preservation and of a map change are both ≈ 0.5 and the scaled probability of genome expansion is ≈ 0, as predicted by the theory for large N. Third, as Nµc increases beyond 10, the scaled probabilities of preservation and of a map change gradually approach zero. Although this latter phase is unaccounted for by the theory, it presumably occurs because when Nµc > 1 there is a significant probability that all of the descendants of a newly arisen duplicate become silenced by mutations prior to the initial establishment of the lineage. In any event, contrary to the situation for linked duplicates, the probability of preservation of unlinked duplicates declines with increasing population size, although, provided Nµc < 10, this probability still equals or exceeds 1/(4N).
| PRESERVATION BY NEOFUNCTIONALIZATION |
|---|
We now consider the situation in which mutations with phenotypic effects either silence a gene or introduce a new beneficial function at the expense of the original function (Fig 1). The fitness landscape is assumed to be one in which individuals that carry no alleles with the original function have zero fitness, with the remaining genotypes having fitnesses equal to 1 + ns, where n = 0, 1, 2, or 3 is the number of neofunctional alleles carried. Silencing mutations are assumed to arise at rate µc per gene copy for both types of active alleles, whereas alleles of the "ancestral" type (hereafter referred to as wild type) can also mutate to the neofunctionalized state at rate µb.
To evaluate the probabilities of the alternative fates of a pair of duplicate loci subject to beneficial mutations, we employed a simulation approach identical in structure to that described in the previous section, starting with a single-copy locus with allele frequencies equal to the simulated expectations under selection-mutation-drift equilibrium. The newly arisen duplicate was initiated as a single copy randomly recruited from the pool of wild-type and neofunctional alleles at the original locus, and the generation-to-generation cycle of events was continued until the final fate of the pair of duplicates had been established. It is straightforward to identify nonfunctionalization as a final stable state, as this simply requires that one locus becomes fixed for null alleles. Identification of neofunctionalization as a fate is slightly more subjective because, in a finite population, there is always a very small possibility that a neofunctionalized locus may become lost in the future (because it carries a beneficial but nonessential function and is subject to nonfunctionalizing mutations). We considered neofunctionalization to have occurred when one locus had completely lost the wild-type allele and acquired a high enough frequency of the neofunctionalized allele to ensure a probability of fixation of the latter of at least 0.99. Using the diffusion approximation for the fixation probability of a beneficial allele with additive effects, the critical frequency p* satisfying this criterion is
$$p^* = -\frac{\ln\left[1 - 0.99\left(1 - e^{-4Ns}\right)\right]}{4Ns} \tag{9}$$
which for large Ns reduces to p* ≈ 1.15/(Ns). (For the case of completely linked duplicates, this critical frequency must be applied to pairs of two-copy alleles with one neofunctional and one wild-type member, because neofunctional single-copy genes cannot become fixed in the population.) In the simulations that we performed, we assumed that the rate of mutation to neofunctional alleles (10⁻⁹ per gene per generation) is much smaller than the mutation rate to nulls (10⁻⁵ per gene per generation, as in the previous section), and s was 0.001, 0.01, or 0.1.
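The critical frequency for a 0.99 fixation probability can be computed directly. This is a minimal sketch, assuming the standard diffusion result for an additive beneficial allele at initial frequency p, u(p) = (1 - e^(-4Nsp))/(1 - e^(-4Ns)), solved for the p at which u(p) equals the target; the function names are hypothetical, and the closed form is an inference that reproduces the large-Ns limit 1.15/(Ns) stated in the text.

```python
import math

def fixation_probability(p, n, s):
    """Diffusion approximation for an additive beneficial allele at frequency p."""
    x = 4.0 * n * s
    return (1.0 - math.exp(-x * p)) / (1.0 - math.exp(-x))

def critical_frequency(n, s, target=0.99):
    """Frequency p* at which the fixation probability first reaches `target`."""
    x = 4.0 * n * s
    return -math.log(1.0 - target * (1.0 - math.exp(-x))) / x

n, s = 100000, 0.01                      # assumed illustration values, Ns = 1000
p_star = critical_frequency(n, s)
print(p_star, 1.15 / (n * s), fixation_probability(p_star, n, s))
```

For large Ns the exponential term vanishes and p* approaches -ln(0.01)/(4Ns), which is where the constant 1.15 comes from.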
Under this model, a newly arisen gene duplicate can be regarded as preserved in the population if neofunctionalization occurs at either locus or if the original locus becomes nonfunctionalized. Thus, the scaled probability of preservation is
![]() |
(10) |
with the component terms being defined in Table 1 and Table 2. For genes that are not completely linked, a map change occurs if the original locus becomes silenced or neofunctionalized, so the scaled probability of a map change is
![]() |
(11) |
Finally, a new gene is added to the genome whenever one member of the pair is neofunctionalized, as this results in joint preservation of both copies. Hence,
![]() |
(12) |
A key feature of this model of gene duplication is that the original locus (prior to duplication) can exhibit a balanced polymorphism due to the recurrent input of mutations and to heterozygote superiority. Although neofunctional alleles have zero fitness when in the homozygous state, they have a heterozygote advantage of s when associated with wild-type alleles. For large N, a set of standard recursion equations for allele frequencies (ignoring drift) yields the approximate equilibrium frequencies of the neofunctional (n) and null (0) alleles. For µc < [s/(1 + s)]²,
![]() |
(13a) |
![]() |
(13b) |
whereas for µc > [s/(1 + s)]²,
![]() |
(14a) |
![]() |
(14b) |
These results, combined with observations from computer simulations (Fig 5), illustrate two key points. First, for sufficiently weak positive selection (µc > [s/(1 + s)]²), the mutation pressure against a neofunctional allele overwhelms the selective advantage, maintaining the frequency of neofunctional alleles at the original locus at negligible levels. For example, with s = 0.001 and µc = 10⁻⁵, the equilibrium frequency of neofunctional alleles asymptotically approaches µb/(2s) ≈ 5 × 10⁻⁷ at large N. In this case, a new duplicate locus will almost always be initiated with a wild-type allele, and neofunctionalization will require mutation to new neofunctional alleles subsequent to the duplication process. Second, when selection is stronger (µc < [s/(1 + s)]²), the expected frequency of neofunctional alleles residing at the original locus is nearly a threshold function of population size, being closely approximated by Equation 13a, provided Ns² > 4, and rapidly dropping to negligible values (<1/(2N)) for N below the threshold. For example, as N → ∞, with µc = 10⁻⁵, the equilibrium frequency of neofunctional alleles is ≈ 0.0088 when s = 0.01 and ≈ 0.083 when s = 0.1. This means that at large population sizes with unlinked loci, neofunctionalization need not rely on the rare occurrence of beneficial mutations but can be poised to move forward if (1) the new locus is founded with a neofunctional allele or (2) the new locus is founded with a wild-type allele that subsequently acquires a sufficiently high frequency that the neofunctional alleles at the original locus become subject to directional, rather than balancing, selection.
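The two selection regimes can be separated with a one-line comparison of µc against [s/(1 + s)]². The sketch below only classifies the parameter sets used in the simulations (µc = 10⁻⁵, µb = 10⁻⁹) and evaluates the weak-selection limit µb/(2s) quoted in the text; it does not evaluate Equations 13 and 14, and the function names are hypothetical.

```python
import math

def selection_regime(s, mu_c):
    """Classify a parameter set by comparing mu_c with [s/(1 + s)]^2."""
    threshold = (s / (1.0 + s)) ** 2
    return "strong" if mu_c < threshold else "weak"

def weak_selection_frequency(mu_b, s):
    """Large-N neofunctional allele frequency quoted for the weak regime."""
    return mu_b / (2.0 * s)

mu_c, mu_b = 1e-5, 1e-9
for s in (0.001, 0.01, 0.1):
    print(s, selection_regime(s, mu_c))
print(weak_selection_frequency(mu_b, 0.001))
```

Only s = 0.001 falls in the weak regime for these mutation rates, which is why the two larger selection coefficients are treated separately in the analysis of unlinked loci.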
Linked loci:
In the case of complete linkage, a newly arisen gene duplicate must be of wild type to have any chance of permanent preservation, because under the assumptions of the model a linked pair of neofunctional genes is lethal in the homozygous state. So for linked duplicates, we considered only the case in which the initial duplicate carried the essential ancestral function. In this case, permanent preservation of both loci occurs when the founding two-copy allele goes to fixation and one member evolves a new function. This outcome yields a state of fixed heterozygosity, in the sense that each gamete carries one allele with the ancestral function and another with the new function.
As noted above, the case of completely linked duplicates can be treated as a single-locus model with two classes of alleles, single copy and two copy. Ignoring the weak directional forces of selection, a newly arisen linked pair of gene duplicates (i.e., a two-copy allele carrying only wild-type genes) will initially be destined to go to fixation with probability 1/(2N) and otherwise to become lost with probability 1 - [1/(2N)]. Should the two-copy allele proceed down the path toward fixation, one member of the pair will ultimately become either silenced or neofunctionalized. For fully redundant genes, silencing mutations go to fixation at the rate of µc per locus, since the number of newly arising mutations is 2Nµc per locus and the probability of fixation of a neutral allele is 1/(2N), whereas beneficial mutations to a novel function go to fixation at the rate of 2N·uF·µb, as there are again 2N gene copies per locus, each mutating at rate µb and in this case fixing with probability uF. We rely on the diffusion approximation for the probability of fixation of a newly arisen beneficial mutation with additive effects,
$$u_F = \frac{1 - e^{-2s}}{1 - e^{-4Ns}} \tag{15}$$
Letting ß = 2N·uF·µb/(µc + 2N·uF·µb) denote the relative probability of neofunctionalization, the conditional probabilities of the four possible fates of linked duplicates destined to fixation are
![]() |
(16a) |
![]() |
(16b) |
Were these the only paths to the preservation of a new duplicate, one would expect the upper limit for the scaled probabilities of preservation and of genome expansion to equal 1, because ß ≤ 1.0. However, we must also consider the possibility of the appearance of a neofunctionalizing mutation in a two-copy allele that is otherwise destined to be lost by random genetic drift, as this can alter the course of events.
To quantify the probability of such a rescue effect, we need to know the number of alleles that are available targets for neofunctionalizing mutations. The expected number of two-copy alleles in the population in generation t, conditional on not having yet been lost or having been rescued, can be shown to be
![]() |
(17) |
where uL(t) is the probability that the locus has been lost by drift by generation t. Because we are focusing on a large-population phenomenon, uL(t) can be approximated with FISHER's (1922) recursion for a mutant allele initially present in a single copy,
![]() |
(18) |
starting with uL(0) = 0. The probability that a two-copy allele otherwise destined to be lost acquires a neofunctionalizing mutation in generation t that will carry it to fixation is then
![]() |
(19) |
the 2 accounting for the two copies of the ancestral gene per two-copy allele, and the term e^(-µc·t) being the probability that a gene within the pair has not acquired a silencing mutation by time t. Letting
![]() |
(20) |
be the probability that an effectively neutral allele destined to eventual loss is lost in generation t and
(t) be the probability that the fate of two-copy alleles has not been determined by generation t, then the partition of the contributions to alternative fates for the
cases in which a two-copy allele is initially destined to become lost is
![]() |
(21a) |
![]() |
(21b) |
with
![]() |
(22) |
The final probabilities of the four alternative fates are given by
![]() |
(23a) |
![]() |
(23b) |
![]() |
(23c) |
![]() |
(23d) |
(For the reader's convenience, we summarized the definitions of all terms associated with the neofunctionalization model in Table 2.)
For the most part, these expressions are in good agreement with the simulated data (Fig 6 and Fig 7). At small population sizes, there is a negligible likelihood of a beneficial mutation resurrecting a two-copy locus destined to be lost by drift, so from Equation 16a and Equation 16b alone, the scaled probability of preservation ≈ (1 + ß)/2 and the scaled probability of genome expansion ≈ ß. At the very smallest population sizes (N < 10³), ß asymptotically approaches µb/(µc + µb), which for µb << µc results in a scaled probability of preservation ≈ 0.5 + µb/(2µc) and a scaled probability of genome expansion ≈ µb/µc. On the other hand, in the limit as N → ∞, the chance of the original locus becoming silenced is negligible, which results in the scaled probabilities of preservation and of genome expansion scaling nearly linearly with population size.
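The small-N behavior of ß for linked duplicates can be illustrated numerically. The sketch below rests on two assumptions that are inferred rather than quoted: that Equation 15 is the standard diffusion result uF = (1 - e^(-2s))/(1 - e^(-4Ns)) for a new additive mutation, and that ß is the ratio of the neofunctionalizing fixation rate 2N·uF·µb to the total fixation rate µc + 2N·uF·µb described in the text. Function names and parameter values are hypothetical, and the rescue terms of Equations 17 through 23 are not included.

```python
import math

def new_mutation_fixation(n, s):
    """Assumed form of uF: diffusion fixation probability of one new additive mutation."""
    return (1.0 - math.exp(-2.0 * s)) / (1.0 - math.exp(-4.0 * n * s))

def beta(n, s, mu_b, mu_c):
    """Assumed form of the relative probability of neofunctionalization."""
    rate_neo = 2.0 * n * new_mutation_fixation(n, s) * mu_b
    return rate_neo / (mu_c + rate_neo)

mu_b, mu_c, s = 1e-9, 1e-5, 0.001
b_small = beta(10, s, mu_b, mu_c)
print(b_small, mu_b / (mu_c + mu_b))        # small-N limit of beta
print((1.0 + b_small) / 2.0, b_small)       # scaled preservation and expansion
print(beta(10**7, 0.01, mu_b, mu_c))        # beta approaches 1 at large N
```

At small N the product 2N·uF is close to 1, so ß collapses to µb/(µc + µb), while at large N the selective term dominates and ß approaches 1.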
Unlinked loci:
The probability of neofunctionalization can be greatly enhanced in the case of freely recombining loci because a new duplicate locus that is founded by a neofunctionalized allele is free to move toward fixation and because the fates of subsequent mutations at one locus are less influenced by those at the other. Given that the equilibrium allele frequencies at the original locus are related to N and s in a threshold manner (Equations 13a, 13b, 14a, and 14b and Fig 5), two alternative sets of analytical approximations appear to be necessary.
We first consider the situation in which neofunctionalized alleles are likely to be segregating at nonnegligible frequencies, µc < [s/(1 + s)]², which for the parameters that we examined holds for s = 0.1 and 0.01. To have any chance of establishing itself permanently, a newly arisen duplicate locus must be founded by either a neofunctionalized (n) or wild-type (f) allele, the probabilities of which are
![]() |
(24a) |
![]() |
(24b) |
where the equilibrium frequencies of the neofunctional (n) and null (0) alleles at the original locus are defined by the values in Fig 5. If the founder allele is of the neofunctional type, the probability of fixation is given by Equation 15 with selection coefficient
![]() |
(25a) |
and, conditional upon such fixation, the original locus must maintain the original function. If the founder allele is wild type, the probability of fixation is a function of the relative fitnesses of the ff, f0, and 00 genotypes at the new locus induced by the presence of 00, n0, and nn genotypes at the original locus, where 0 denotes a nonfunctional allele. The latter genotypes have zero fitness if the genotype at the new locus is 00 but respective fitnesses of 1, 1 + s, and 1 + 2s if the genotype at the new locus is ff or f0. Scaling the fitness of the 00 genotype at the new locus to be equal to one, the initial expected selective advantage of both the ff and f0 genotypes is equal to
![]() |
(25b) |
which for large N and µc < [s/(1 + s)]² simplifies to sf ≈ s²/(1 + 2s).
![]() |
(26) |
and in the event that this does not occur, one of the two loci is expected to become neofunctionalized via new mutations with probability ß. Summing up the various paths, the probabilities of the four alternative fates of the gene pair are then given by
![]() |
(27a) |
![]() |
(27b) |
![]() |
(27c) |
![]() |
(27d) |
where uF(sf) and uF(sn) are obtained from Equation 15 after substituting for s. In the limit for large N, β ≈ 1, pn ≈ s/(1 + 2s), uF(sf) ≈ 2s²/(1 + 2s), and uF(sn) ≈ 2s(1 + s)²/(1 + 2s)². With each probability scaled by 2N, this leads to a value of ≈ 4Ns²(2 + 3s)(1 + s)/(1 + 2s)² for both preservation and neofunctionalization, and to a value smaller by the factor 1/(2 + 3s) for a map change. Provided s < 0.1, these large-N/large-s approximations reduce further to ≈ 8Ns² for preservation and neofunctionalization and ≈ 4Ns² for a map change, showing that all three statistics increase linearly with N (implying that the probabilities of these fates are independent of N) and with the square of s.
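The large-N expressions above can be checked numerically. The following is a minimal sketch, not the authors' code. The function names and the value `MU_C` are hypothetical; `MU_C` was chosen only so that the regime condition µc < [s/(1 + s)]² holds for s = 0.1 and 0.01 but not for s = 0.001, as the text reports for the parameters examined. The labels "preservation" and "map change" reflect one reading of the statistics discussed here and should be treated as an assumption.

```python
def neofunctional_alleles_segregate(mu_c: float, s: float) -> bool:
    """Regime test quoted in the text: mu_c < [s / (1 + s)]^2."""
    return mu_c < (s / (1.0 + s)) ** 2


def scaled_preservation_large_n(n: float, s: float) -> float:
    """Large-N expression 4 N s^2 (2 + 3s)(1 + s) / (1 + 2s)^2 (probability times 2N)."""
    return 4.0 * n * s**2 * (2.0 + 3.0 * s) * (1.0 + s) / (1.0 + 2.0 * s) ** 2


def scaled_map_change_large_n(n: float, s: float) -> float:
    """Second large-N expression: the first one divided by (2 + 3s)."""
    return scaled_preservation_large_n(n, s) / (2.0 + 3.0 * s)


MU_C = 1e-5  # hypothetical null-mutation rate, for illustration only

if __name__ == "__main__":
    n = 1e6  # hypothetical population size
    for s in (0.1, 0.01, 0.001):
        regime = neofunctional_alleles_segregate(MU_C, s)
        full = scaled_preservation_large_n(n, s)
        reduced = 8.0 * n * s**2
        print(f"s={s}: segregating={regime}, full={full:.4g}, 8Ns^2={reduced:.4g}")
```

Under these expressions the reduced form 8Ns² exceeds the full expression by about 14% at s = 0.1, and the two converge as s decreases.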
We now turn to the situation in which µc > [s/(1 + s)]², which for the parameters that we examined holds for s = 0.001, and in which case there is a negligible chance of the new locus being initially founded with a neofunctional allele. We again take a cohort approach, similar to that used in the case of linked loci, noting that the founder allele at the new locus is initially destined to fix with probability 1/(2N) and otherwise to be lost with probability 1 - 1/(2N). In the former case, one of the loci is expected to eventually become neofunctionalized with probability β or to become nonfunctionalized with probability 1 - β. In the latter case, we must account for the possibility that the new locus, otherwise destined to be lost, will be rescued by a neofunctionalizing mutation. The probability of rescue in generation t is given by
![]() |
(28) |
with uF defined by Equation 15 and nm(t) by Equation 17, and the generation-specific contributions to alternative fates for the cases in which the founder allele is initially destined to loss are
![]() |
(29a) |
![]() |
(29b) |
where pL(t) is defined by Equation 20, and
![]() |
(30) |
We then have
![]() |
(31a) |
![]() |
(31b) |
![]() |
(31c) |
![]() |
(31d) |
As can be seen in Fig 6, Fig 7, and Fig 8, the theory for freely recombining duplicates is in fairly close agreement with the observed probabilities of preservation, neofunctionalization, and map change (each scaled by 2N) over the full range of N and s, the main exception being an overestimation at large N when selection is weak. When N is small, the scaled probabilities of preservation and map change are both ≈ 0.5, independent of s. This is again a consequence of the fact that the probability of fixation of a newly arisen locus is equal to 1/(2N) and that one of the loci will then almost always become silenced, because of the negligible probability of neofunctionalization. On the other hand, once N exceeds a threshold value (depending on s and µc), the scaled probability of preservation increases linearly with N and approximately linearly with s², in agreement with the asymptotic expressions given above. A similar scaling with N and s² is seen for the scaled probability of a map change at large N. The abrupt change in the behavior of all three statistics at intermediate N and strong selection (s = 0.01 and 0.1) corresponds precisely with the abrupt change in frequency of neofunctional alleles at the original locus (Fig 5).
| DISCUSSION |
|---|
These results demonstrate that the evolutionary trajectories of duplicate genes are not just functions of intrinsic organismal properties such as gene structure, regulatory-region complexity, distribution of mutational effects, etc., but are also highly dependent on the effective size of a population. This view suggests that the mechanisms influencing the fates of duplicate genes may vary dramatically among species (and even within the history of individual species lineages) depending on the population size prevailing during the initial appearance of a duplicate gene. Population size influences the evolution of duplicate genes in two ways. First, larger populations are more likely to harbor segregating subfunctional or neofunctional alleles at the ancestral locus prior to duplication, raising the possibility that the newly arisen locus may be founded by an allele other than the wild type and also the possibility that the ancestral locus can rapidly become neofunctionalized (without the reliance on new beneficial mutations) if the new locus becomes established with wild-type alleles. Second, because the time to fixation (and loss) increases with increasing population size, the potential fates of duplicate genes can be altered during the long period in which they drift through large populations and acquire secondary mutations. For example, subfunctional alleles at a new locus may become completely silenced by degenerative mutations prior to fixation, whereas functional alleles that are otherwise destined to be lost by drift can on occasion be rescued by a beneficial mutation. Thus, attempts to understand the evolution of the duplicate genes (and by extrapolation, other aspects of genome expansion/contraction) are not likely to be successful unless they are considered in the context of the genetic properties of finite populations.
Preservation of the new copy:
Two rather different models, one incorporating only degenerative mutations and the other also including beneficial mutations, suggest that the probability of preservation of a newly arisen duplicate gene is generally no less than half of its initial frequency (i.e., the probability of preservation scaled by 2N is > 0.5) regardless of the degree of linkage (Fig 3 and Fig 6). Thus, unless there is active selection against a duplicate gene, its probability of permanent establishment is at least one-half the expected fixation probability of a neutral allele, i.e., ≥ 1/(4N). Moreover, in the absence of an appreciable likelihood of fixation of beneficial mutations (either because the rate of mutation to such alleles is too low, the beneficial effects are too small, or the population size is insufficiently large), the probability of preservation is unlikely to exceed 1/(2N). On the other hand, in sufficiently large populations, neofunctionalization can lead to probabilities of preservation (per duplication event) that are independent of N and orders of magnitude greater than possible under a scenario dominated by degenerative mutations. Provided the null mutation rate is sufficiently small relative to the strength of selection (µc < [s/(1 + s)]²) and the effective population size is sufficiently large (Ns² > 4; Fig 5), most cases of neofunctionalization following gene duplication are expected to be driven by neofunctional alleles preexisting at the ancestral locus rather than by mutations arising subsequent to the duplication event. If the new locus is founded by a wild-type allele that reaches sufficiently high frequency, natural selection will promote the neofunctional alleles segregating at the original locus. Alternatively, the new locus may be founded by a neofunctional allele that goes to fixation, in which case the original gene function will be maintained at the ancestral locus.
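The bounds and conditions in this paragraph translate directly into per-duplication quantities. A minimal sketch follows; the function names are hypothetical, and the two conditions are simply the ones quoted above (µc < [s/(1 + s)]² and Ns² > 4), not an independent derivation.

```python
def degenerative_preservation_bounds(n: float):
    """Return (lower, upper) = (1/(4N), 1/(2N)): the range quoted in the text
    when fixation of beneficial mutations is not appreciable."""
    return 1.0 / (4.0 * n), 1.0 / (2.0 * n)


def preexisting_alleles_expected_to_dominate(n: float, s: float, mu_c: float) -> bool:
    """Both conditions quoted in the text: mu_c < [s/(1+s)]^2 and N s^2 > 4."""
    return mu_c < (s / (1.0 + s)) ** 2 and n * s**2 > 4.0


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    print(degenerative_preservation_bounds(1000))
    print(preexisting_alleles_expected_to_dominate(1e5, 0.01, 1e-5))
```

For example, with the hypothetical values N = 10⁵ and s = 0.01, Ns² = 10, so the population-size condition is met; with N = 10⁴ it is not.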
Although our results suggest that subfunctionalization will be a more common mechanism of duplicate-gene preservation in small populations, with neofunctionalization becoming progressively more common as N increases, the exact population size at which neofunctionalization begins to exceed subfunctionalization as a preservational mechanism will depend on the relative rates of origin of the two types of preservational mutations (µr and µb) and on the selective advantage of neofunctional alleles. For the case of neofunctionalization, it is noteworthy that the probability of neofunctionalization (equal to the probability of preservation at large N) scales not with s, as would normally be expected for an unconditionally advantageous allele at a single locus, but with the square of s. This scaling can be understood most easily by considering the case of unlinked duplicates at large N. If the founding allele at the new locus is wild type, its main initial advantage (relative to "absentee" alleles at the new locus) arises in backgrounds where the genotype at the ancestral locus is of type nn, n0, or 00, and from Equation 13a and Equation 13b it can be seen that the most abundant of these genotypes, nn, has an expected frequency ≈ [s/(1 + 2s)]². On the other hand, if the founding allele is of the neofunctional type, it will go to fixation with probability ≈ 2sf, and from Equation 25b it can be seen that sf ≈ s²/(1 + 2s). Thus, regardless of the nature of the founder allele, its probability of fixation scales approximately with s² at large N. If subfunctionalizing mutations greatly outnumber neofunctionalizing mutations and s is typically small, neither of which seems unlikely, then the majority of successful gene duplicates may owe their preservation to subfunctionalization. Not included in our analyses is the possibility that many duplicates may be subfunctionalized at birth via the duplication process itself, due, for example, to the failure of the duplicated region to cover the full ancestral gene sequence.
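The s² scaling can be made explicit using the large-N limits quoted earlier for freely recombining loci. The worked step below is a sketch under two stated assumptions that are not spelled out in the text: the wild-type founder probability is taken as p_f ≈ 1 for small s, and only leading-order terms in s are kept.

```latex
\begin{align*}
\text{wild-type founder:}\quad
  & p_f\,u_F(s_f) \approx 1 \cdot \frac{2s^2}{1+2s} \approx 2s^2,\\
\text{neofunctional founder:}\quad
  & p_n\,u_F(s_n) \approx \frac{s}{1+2s}\cdot\frac{2s(1+s)^2}{(1+2s)^2} \approx 2s^2,\\
\text{sum, scaled by } 2N:\quad
  & 2N\left(2s^2 + 2s^2\right) = 8Ns^2 .
\end{align*}
```

Both founder paths contribute at order s², and the wild-type path alone gives 2N · 2s² = 4Ns², consistent with the reduced expressions 8Ns² and 4Ns² quoted for small s.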
In large populations, the degree of linkage between duplicate genes can substantially influence the probability of preservation of a new gene copy (Fig 3 and Fig 6). When degenerative mutations dominate the process, a linked pair of functional duplicates has a weak transient selective advantage over a single-copy allele, because the former requires at least two mutations to be silenced. This results in an increase in the probability of preservation from 1/(4N) at small N to an asymptotic level of 1/(2N) at large N. Thus, in the absence of beneficial mutations, a linked pair of duplicates fixes at the neutral rate at large N despite the fact that the underlying process is non-neutral. This behavior contrasts with that of an unlinked duplicate, which, in the absence of beneficial mutations, is prevented from becoming permanently established in very large populations by saturation with silencing mutations by the time the lineage fixes in the population. In contrast, when neofunctionalizing mutations become a prominent influence, linkage reduces the probability of preservation of gene duplicates. Free recombination facilitates the neofunctionalization process because a pair of completely linked neofunctional genes (or a pair containing one neofunctional and one nonfunctional copy) is prevented from going to fixation by the lack of the critical ancestral gene function.
These results suggest the hypothesis that duplicate genes that are preserved by neofunctionalization will tend to be unlinked, whereas those preserved by subfunctionalization (or silencing of the ancestral gene) will tend to be more closely linked (at least during the period of preservation). It should be noted, however, that although duplicate genes often arise in tandem association with the parental locus, they are frequently recruited to new locations at an early stage of their history.
Evolution of genome size:
Although the preservation of duplicate genes often leads to an expansion in genome size, this is not necessarily the case because the preservation of a new gene copy may be balanced by the loss of the ancestral copy. For example, in sufficiently small populations, where the likelihood of neofunctionalization is reduced to negligible levels, a new duplicate may still become preserved if it drifts to fixation and the original locus becomes nonfunctionalized, but in this case there is no net change in genome size. Any pressure toward genome-size expansion is expected to come from subfunctionalization until a critical population size has been reached and neofunctionalization becomes more dominant, the exact threshold population size again depending on µr, µb, s², and the degree of linkage between ancestral and descendant loci.
Like nucleotide substitutions, insertions, and deletions, gene duplication appears to be a common attribute of all genomes. For example, analysis of the complete genomic sequences of Drosophila melanogaster, Caenorhabditis elegans, and Saccharomyces cerevisiae suggests that new duplications may typically become established in populations at rates on the order of 10⁻³ to 10⁻² per gene per million years.
Because subfunctionalizing and neofunctionalizing mechanisms will generally ensure an innate tendency toward a net accumulation of new genes, stability in genome size requires selection against too many gene duplicates and/or molecular mechanisms that stochastically delete additional copies. In the absence of such opposing forces, one might expect the expansion of genome size to be a self-accelerating process, as the accumulation of more genes provides more substrate for future duplications. However, the opportunities for preservation by subfunctionalization are expected to be reduced as members of a gene family partition up the tasks of the ancestral gene, and, under the neofunctionalization model, the likelihood of establishing a new beneficial function may decline with an increase in organismal complexity; i.e., both µr and µb may decline with increasing genome size. These design limitations alone may constrain the indefinite expansion of genome size, but mutational mechanisms almost certainly play an additional role. For example, nonessential DNA appears to have a half-life of approximately 14 million years in Drosophila and approximately 880 million years in mammals.
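To put these half-lives in concrete terms, exponential decay with half-life h leaves a fraction 0.5^(t/h) of the original material after time t. The following is a minimal sketch that assumes simple exponential loss, which is an assumption of this example rather than a claim about the underlying deletion process; the function and variable names are hypothetical.

```python
import math

# Half-lives of nonessential DNA quoted in the text, in millions of years.
HALF_LIFE_MYR = {"Drosophila": 14.0, "mammals": 880.0}


def decay_rate_per_myr(half_life_myr: float) -> float:
    """Exponential loss rate (per million years) implied by a half-life."""
    return math.log(2.0) / half_life_myr


def fraction_remaining(t_myr: float, half_life_myr: float) -> float:
    """Fraction of the original DNA still present after t_myr million years."""
    return 0.5 ** (t_myr / half_life_myr)


if __name__ == "__main__":
    for taxon, h in HALF_LIFE_MYR.items():
        rate = decay_rate_per_myr(h)
        left = fraction_remaining(14.0, h)
        print(f"{taxon}: rate={rate:.4f} per Myr, remaining after 14 Myr={left:.3f}")
```

Under this assumption, the per-million-year loss rate is about 63 times higher in Drosophila than in mammals (880/14), and after 14 million years roughly 99% of nonessential DNA would remain in mammals versus 50% in Drosophila.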
The mechanisms that we have suggested for the expansion of genome size via duplicate genes need not be all inclusive. For example, it has been suggested that genomic redundancies may be selectively maintained to mask the consequences of null homozygotes or errors in transcription and translation.
Alterations of the genetic map:
Gene duplication may be of as much relevance to the origin of new species as it is to the origin of evolutionary novelty within species.























































