Corresponding author: Hideki Innan, School of Public Health, University of Texas Health Science Center, 1200 Hermann Pressler, Houston, TX 77030. E-mail: hideki.innan@uth.tmc.edu
Communicating editor: J. B. WALSH
| ABSTRACT |
|---|
Nonindependent evolution of duplicated genes is called concerted evolution. In this article, we study the evolutionary process of duplicated regions that involves concerted evolution. The model incorporates mutation and gene conversion: the former increases d, the divergence between two duplicated regions, while the latter decreases d. It is demonstrated that the process consists of three phases. Phase I is the time until d reaches its equilibrium value, d0. In phase II d fluctuates around d0, and d increases again in phase III. Our simulation results demonstrate that the length of concerted evolution (i.e., phase II) is highly variable, while the lengths of the other two phases are relatively constant. It is also demonstrated that the length of phase II approximately follows an exponential distribution with mean τ, which is a function of many parameters including gene conversion rate and the length of gene conversion tract. On the basis of these findings, we obtain the probability distribution of the level of divergence between a pair of duplicated regions as a function of time, mutation rate, and τ. Finally, we discuss potential problems in genomic data analysis of duplicated genes when it is based on the molecular clock but concerted evolution is common.
To understand the evolutionary importance of gene duplication, it is critical to know when duplication events occurred. This is not very difficult as long as two duplicated genes have accumulated mutations independently, because we can estimate the time to the duplication event from the level of nucleotide divergence between the two duplicates. The idea that nucleotide divergence has a linear correlation with time is known as the "molecular clock." However, the molecular clock hypothesis does not always hold for duplicated genes because of the phenomenon called "concerted evolution."
Gene conversion has been considered the most important mechanism for this homogenization in duplicated genes (i.e., a small multigene family with a copy number of 2), although unequal crossing over could also be important for large or medium-sized multigene families. Clear evidence for gene conversion is seen when DNA polymorphism data are available for both of the duplicated genes, because gene conversion creates "shared polymorphic sites."
The action of gene conversion can also be inferred from phylogenetic studies. For example, suppose that duplicated genes I and II exist in both species A and B, which means that the gene duplication event predates the speciation (Fig 1B). Without gene conversion, we expect the observed tree to be consistent with the real tree. However, if these genes have undergone concerted evolution, the observed tree might be inconsistent with the real tree; that is, the two duplicated genes within each species appear more closely related to each other (Fig 1B, right tree).
Thus, there are many lines of evidence for gene conversion between duplicated genes. However, the effect of gene conversion on the divergence between duplicated genes is not well understood theoretically. The purpose of this article is to investigate the behavior of the divergence between duplicated genes after gene duplication, using modified versions of previously published gene conversion models.
| MODEL AND SIMULATION |
|---|
The evolutionary process of a pair of duplicated regions is considered. We study the behavior of d, the number of nucleotide differences between duplicated regions after their birth (i.e., duplication). The process involves mutation and gene conversion; the former increases the level of divergence while the latter decreases it. Parameters used in this section are summarized in Table 1.
Suppose a duplication event creates two identical sequences in a genome at time T = 0. In this article, we consider the evolutionary process of a pair of subregions that are within the duplicated regions as illustrated in Fig 2. Each considered region is represented by a large box and assigned to an interval of (0, 1). After the duplication event, the regions start accumulating mutations and gene conversion works to homogenize variation. Mutation occurs independently in the two regions at rate µ per region per generation, so that the number of mutations in each region follows a Poisson distribution with mean µ. For each mutation, its position is determined as a random variable between 0 and 1 (i.e., infinite-site model).
Gene conversion transfers a DNA fragment from one to the other. Fig 2 shows two examples of gene conversion events. Gene conversion I transfers a DNA segment between positions 0.2 and 0.5 from the original region to the duplicated region, so that the corresponding part of the duplicated region is replaced by the original sequence. Note that shaded boxes represent the sequence of the original region before gene conversion and open boxes show the duplicated region. Gene conversion could involve regions outside of the interval (0, 1). Gene conversion II in Fig 2 consists of a region from position 0.85 to 1 and a fragment outside of position 1.
Gene conversion is simulated assuming that the length of the conversion tract follows a geometric distribution in a DNA region of finite length.
![]() |
(1) |
which corresponds to the gene conversion rate per site per generation.
In this study, we consider two further probabilities. One is the probability that the two regions pair in meiosis, which is considered to be required for gene conversion to occur. This probability, S1(x), should be a function of x, the divergence between the two regions. For example, S1(x) is defined as

$$S_1(x) = \begin{cases} 1 & \text{if } x < s_1 \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

when the two regions pair only if the divergence is less than s1.
The other is the probability that a gene conversion event is successfully completed. Let S2(y) be this probability, which is given by a function of y, the divergence in a gene conversion tract. One example is that gene conversion occurs only when y is smaller than a certain threshold, s2. That is,

$$S_2(y) = \begin{cases} 1 & \text{if } y < s_2 \\ 0 & \text{otherwise} \end{cases} \tag{3}$$
Another example might be that S2(y) linearly decreases as y increases:

$$S_2(y) = \begin{cases} 1 - y/s_3 & \text{if } y < s_3 \\ 0 & \text{otherwise} \end{cases} \tag{4}$$
Note that S1(x) depends on the divergence in the whole region, x, while S2(y) is given for each gene conversion tract and y represents the level of divergence in a tract. Similar models have been used previously.
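In addition to nucleotide mutations, we consider "terminator" mutations, which occur at rate m and suppress subsequent gene conversion (see RESULTS).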
Note that the numbers of mutations and gene conversion events per generation follow Poisson distributions. When the expected numbers of these events are very small, we can obtain essentially the same simulation results by an approximate method, in which the Poisson processes are simulated once every k generations with the parameters multiplied by k.
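The simulation scheme described in this section can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the program used for the results: `conv_rate` is a hypothetical per-region rate of gene conversion events per generation (it is not the per-site rate c defined through Equation 1), the tract length is drawn from an exponential distribution with mean 1/Q as a continuous stand-in for the geometric distribution, S1 and S2 are taken to be the step functions of (2) and (3), and terminator mutations are omitted. All function and argument names are invented for the example.

```python
import math
import random

def poisson(rng, mean):
    """Draw a Poisson variate by Knuth's method (adequate for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def simulate_d(steps, mu=1e-6, conv_rate=1e-5, mean_tract=0.1,
               s1=100, s2_per_unit=100, k=10**5, seed=1):
    """Return the trace of d, recorded once every k generations.

    conv_rate is a hypothetical per-region rate of conversion events per
    generation; it is NOT the per-site rate c of the article.
    """
    rng = random.Random(seed)
    diffs = []   # positions in (0, 1) at which the two copies differ
    trace = []
    for _ in range(steps):
        # Mutation: both copies mutate, so differences arise at rate 2*mu.
        for _ in range(poisson(rng, 2 * mu * k)):
            diffs.append(rng.random())
        # Gene conversion events within this block of k generations.
        for _ in range(poisson(rng, conv_rate * k)):
            if len(diffs) >= s1:        # S1(x) = 0: the regions do not pair
                break
            length = rng.expovariate(1 / mean_tract)
            start = rng.uniform(-length, 1.0)   # tract may overhang (0, 1)
            lo, hi = max(start, 0.0), min(start + length, 1.0)
            inside = [p for p in diffs if lo <= p <= hi]
            # S2(y) = 1 only if divergence in the tract is below 100 * z'.
            if len(inside) < s2_per_unit * (hi - lo):
                diffs = [p for p in diffs if not (lo <= p <= hi)]
        trace.append(len(diffs))
    return trace
```

For instance, `simulate_d(200, conv_rate=0.0)` gives a clock-like, nondecreasing trace, whereas a large `conv_rate` keeps d near zero for as long as the run lasts. Batching events in blocks of k generations follows the approximate method described above.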
| RESULTS |
|---|
The behavior of the divergence between duplicated regions, d, is investigated by computer simulation. Throughout this study µ = 10⁻⁶/region is assumed: the simulated region corresponds to a 1-kb region if we assume a standard mutation rate of 10⁻⁹/site/generation. Also, k = 10⁵ is assumed. We assume 1/Q = 0.1, s1 = 100 in (2), and s2 = 100z' in (3), where z' is the gene conversion tract length in the interval (0, 1). The results are in Fig 3, which shows four independent realizations of the trace of d for c = {0, 1, 3, 5, 10} × 10⁻⁸ from top to bottom. Without gene conversion (top), d increases linearly with time (i.e., molecular clock). When c = 10⁻⁸, a slight delay in the increase is observed, and the delay grows as c increases. When c = 5 × 10⁻⁸, d fluctuates around an equilibrium value for quite a long time and then starts increasing linearly. The bottom panel indicates that c = 10⁻⁷ is so high that the fluctuation time of d is extremely long (but the fluctuation does not continue forever).
Thus, the evolutionary process of duplicated regions involves a long fluctuation time of d unless the gene conversion rate is small. Fig 4 illustrates the typical behavior of d, which consists of three phases: phase I is the time until d reaches its equilibrium value, d0, which depends mainly on the mutation rate and gene conversion rate; in phase II, d fluctuates around d0; and in phase III, d increases again.
To study the length of concerted evolution, we consider Td1, the waiting time until d first reaches d1, where d1 >> d0. Td1 is divided into three parts, t1, t2, and t3, according to the phases (see Fig 4). We expect that Td1 is directly related to the length of concerted evolution, t2, because t2 ≈ Td1 - d1/(2µ). The effect of gene conversion rate on Td1 is investigated by simulation. We assume d1 = 200. Table 2 summarizes the results of simulations for c = {1, 2, 3, ... , 10} × 10⁻⁸ when 1/Q = ∞, s1 = 100, and S2 is given by (3) with s2 = 100. It is demonstrated that as c increases, the average of Td1 increases exponentially. The variance of Td1 is huge, indicating that Td1 is highly variable. Similar results are obtained for the case of 1/Q = 0.1 (also see Fig 5A).
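As a worked illustration of the relation between t2 and Td1, using the parameter values stated in this section (µ = 10⁻⁶ per region per generation and d1 = 200), and assuming that differences accumulate at rate 2µ outside phase II:

```latex
\frac{d_1}{2\mu} = \frac{200}{2 \times 10^{-6}} = 10^{8}\ \text{generations},
\qquad
t_2 \approx T_{d_1} - 10^{8}.
```

Under these values, any excess of the average Td1 over roughly 10⁸ generations can be read as time spent in phase II.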
We also investigate the relationship between c and Td1 under various gene conversion models (Fig 5). First we assume 1/Q = 0.1, s1 = 100 in (2), and S2 given by (3) with s2 = 100z'. The result is shown by solid stars in Fig 5A. Td1 is larger than that for 1/Q = ∞ (solid squares), indicating that the gene conversion tract length has a significant effect on the length of concerted evolution (see below). Next we consider Td1 for 1/Q = ∞ and 1/Q = 0.1 in a model where s1 = 100 in (2) and S2 is given by (4) with s3 = 100z'. The increase of Td1 with c is slower than in the previous model. This is expected because the effective gene conversion rate given by (4) is smaller than that given by (3) if c is the same and s2 = s3.
In Fig 5B, Td1 in a model with terminator mutation is shown. We assume m = 1.25 × 10⁻¹¹, s1 = 100 in (2), and S2 given by (3) with s2 = 100z'. When 1/Q = ∞, as c increases, Td1 saturates around 1/(2m) = 4 × 10¹⁰ because phase II is terminated with probability 2m per generation. The situation is more complicated when 1/Q = 0.1. Td1 roughly saturates around 1/(2m) when c ≈ 8-9 × 10⁻⁸, but starts increasing again for c ≥ 10 × 10⁻⁸. This is because, when 1/Q is small, a terminator mutation suppresses gene conversion only in a short region around the mutation. That is, as c increases, phase II continues for quite a long time in most of the region even after a terminator mutation.
Fig 6A shows the effect of the length of gene conversion tract on Td1 when c = 3 × 10⁻⁸, m = 0, and s1 = 100. As 1/Q decreases, the average of Td1 increases dramatically. This observation can be understood as follows. As shown in Fig 6B, 1/Q has a positive correlation with the variance of d in phase II. The variance of d is a very important factor in determining the length of phase II because phase II is terminated when d happens to exceed a threshold value, dt, defined as the minimum value of d that is too large for gene conversion to occur (see Fig 4). That is, once d hits dt, there is no chance that the system returns to phase II. The larger the variance of d, the greater the chance that d hits dt, which creates the negative correlation between Td1 and 1/Q. Note that 1/Q has no effect on the expectation of d in phase II.
Thus, the length of phase II, t2, might be considered as the waiting time until d first hits dt. Let τ be the expectation of this waiting time. It is expected that t2 approximately follows an exponential distribution with mean τ when τ is large, and the variance of Td1 is approximately given by
![]() |
(5) |
because t1, t2, and t3 are almost independent. Table 2 demonstrates that Equation 5 holds quite well when c ≥ 3 × 10⁻⁸. This supports the hypothesis that t2 approximately follows an exponential distribution.
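The reasoning behind this variance argument can be sketched as follows. This is a derivation from the assumptions stated in the text (independence of the three phase lengths, nearly constant t1 and t3, and an exponential t2 whose mean is written here as τ); it is offered as an illustration and may not match the exact form of Equation 5.

```latex
\begin{aligned}
\mathrm{Var}(T_{d_1}) &= \mathrm{Var}(t_1) + \mathrm{Var}(t_2) + \mathrm{Var}(t_3)
  && \text{(independence of } t_1, t_2, t_3\text{)} \\
&\approx \mathrm{Var}(t_2)
  && \text{(} t_1 \text{ and } t_3 \text{ nearly constant)} \\
&= \tau^{2}
  && \text{(exponential distribution with mean } \tau\text{)}
\end{aligned}
```

Under this reading, the standard deviation of Td1 should be close to the mean of t2, which is roughly the average of Td1 minus d1/(2µ).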
Our simulation results have indicated that the relationship between d and time is complicated because many parameters (c, 1/Q, s1, s2, and m) affect t2. However, the probability distribution function (pdf) of d might involve only three parameters, µ, τ, and T, because τ summarizes all parameters that affect t2. Here, we attempt to obtain the pdf of d as a function of µ, τ, and T, assuming the pdf of t2 is given by

$$f(t_2) = \frac{1}{\tau} e^{-t_2/\tau} \tag{6}$$
We define te as the effective time that directly contributes to the linear accumulation of mutations, which is given by

$$t_e = \begin{cases} T - t_2 & \text{if } t_1 + t_2 < T \\ t_1 & \text{otherwise} \end{cases} \tag{7}$$
We assume T >> t1. If concerted evolution is still going on at T (i.e., phase II), te = t1 and d is somewhere around d0. The probability that the system is in phase II at T is given by

$$P(\text{phase II at } T) = e^{-(T - t_1)/\tau} \tag{8}$$
and the pdf of te is given by
![]() |
(9) |
Note that t1 is unknown here, but the numerical calculation of (9) can be done assuming t1 ≈ 0 because T >> t1. Then, the pdf of d is given by convolution:
![]() |
(10) |
This equation works best for d >> dt. The probability that d < dt is almost identical with that in Equation 8.
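One way to evaluate such a distribution numerically is sketched below. It is a reading of the description above under explicit assumptions rather than a transcription of Equations 8-10: t2 is exponential with mean `tau`, t1 is set to 0, and, once phase II has ended, d is treated as Poisson with mean 2µte (which ignores the contribution of d0 and is therefore only reasonable for large d). The function names are invented for the example.

```python
import math

def prob_phase2(T, tau, t1=0.0):
    """P(still in phase II at time T) for exponential t2 with mean tau."""
    return math.exp(-(T - t1) / tau)

def poisson_pmf(n, mean):
    """Poisson probability, computed in log space to avoid overflow."""
    if mean == 0:
        return 1.0 if n == 0 else 0.0
    return math.exp(n * math.log(mean) - mean - math.lgamma(n + 1))

def prob_d_phase3(d, T, mu, tau, grid=2000):
    """Approximate P(divergence = d and phase II ended before T), t1 ~ 0.

    The effective time te = T - t2 has density exp(-(T - te)/tau)/tau on
    (0, T); given te, d is assumed Poisson with mean 2*mu*te. The integral
    over te is evaluated with the midpoint rule.
    """
    h = T / grid
    total = 0.0
    for i in range(grid):
        te = (i + 0.5) * h
        density = math.exp(-(T - te) / tau) / tau
        total += poisson_pmf(d, 2 * mu * te) * density * h
    return total
```

As a consistency check, `prob_phase2(T, tau)` plus the sum of `prob_d_phase3(d, T, mu, tau)` over all d should be close to 1, since the system is either still in phase II at T or has left it with some divergence d.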
We can also obtain the pdf of Td1. That is,
![]() |
(11) |
when µ is small.
| DISCUSSION |
|---|
The evolutionary process of a pair of duplicated regions is studied by simulations. The model incorporates mutation and gene conversion: the former increases d, the divergence between two duplicated regions, while the latter decreases d. It is demonstrated that the process consists of three phases. Phase I is the time until d reaches its equilibrium value, d0. In phase II d fluctuates around d0, and d increases again in phase III. These three phases are defined such that d has a positive linear correlation with time if phase II is deleted. Phase II approximately corresponds to the time under concerted evolution. The lengths of the three phases, t1, t2, and t3, could be almost independent of each other. t1 and t3 are relatively constant, while t2 is highly variable. Our simulations demonstrated that t2 approximately follows an exponential distribution because t2 is a waiting time for a random event that initiates phase III. The rate at which such events occur determines τ, the expectation of t2, which depends on the mutation rate (µ and m), gene conversion rate (c together with S1 and S2), and the average length of gene conversion tract (1/Q). It seems extremely difficult to obtain an equation for τ as a function of these parameters, but we were able to obtain the pdf of d given τ, µ, and T.
In this study, we considered mutation as the only mechanism that terminates phase II. However, strong selection could also stop concerted evolution.
Recent genomic sequence data provide great opportunities to study the evolution of gene duplication.
Fig 7A shows the expected frequency distribution of d when b = 0 and c = 3 × 10⁻⁸ (open bars), indicating that gene conversion creates ~10 times more duplicated genes with low divergence (d < 50) than the case without gene conversion (solid bars), because d spends a long time around d0. The two distributions are identical for d > 100 because gene conversion does not occur there (i.e., dt = 100). Thus gene conversion alone, like a constant death process, can create an exponential-like distribution of d. The open bars in Fig 7B show that the frequency distribution under the joint effect of the death process and gene conversion is also similar to an exponential distribution. The peak of genes with low divergence is approximately three times higher than that under the death process only, and the distribution decreases very quickly as d increases.
Thus, if concerted evolution of duplicated genes via gene conversion is common, the effect of gene conversion on the frequency distribution of d cannot be ignored. In such a case, the results of duplicated gene analyses based on the molecular clock are expected to be biased. For example, suppose the duplication rate a and the death rate b are estimated by fitting an exponential distribution. An estimate of a depends on the number of duplicated genes with very low d: the more genes with low d, the higher the estimated rate of gene duplication. If the number of duplicated genes of low divergence is increased by gene conversion, a will be overestimated. This excess of genes of low divergence might also contribute to an overestimation of b because an estimate of b depends on how quickly the distribution decreases as d increases.
Fig 7C shows the observed frequency distributions of the level of divergence measured by Ks, the expected number of synonymous substitutions per site, in the Drosophila melanogaster genome. The data are from the analysis of J. S. Conery and M. Lynch (see ACKNOWLEDGMENTS).
Our results suggest that analyzing duplicated genes on the basis of the molecular clock might be misleading if concerted evolution is common. Therefore, before analyzing data, it is very important to test the molecular clock hypothesis, which might require genomic information from other related species. Unfortunately, such data are few at this moment, but will be available in time.
| ACKNOWLEDGMENTS |
|---|
J. S. Conery and M. Lynch kindly provided us information on their data.
Manuscript received September 16, 2003; Accepted for publication December 2, 2003.
| LITERATURE CITED |
|---|
ARNHEIM, N., 1983 Concerted evolution of multigene families, pp. 38-61 in Evolution of Genes and Proteins, edited by M. NEI and R. K. KOEHN. Sinauer Associates, Sunderland, MA.
BAILEY, J. A., Z. GU, R. A. CLARK, K. REINERT, and R. V. SAMONTE et al., 2002 Recent segmental duplications in the human genome. Science 297:1003-1007.
BETTENCOURT, B. R. and M. E. FEDER, 2002 Rapid concerted evolution via gene conversion at the Drosophila hsp70 genes. J. Mol. Evol. 54:569-586.
CHARLESWORTH, D., B. K. MABLE, M. H. SCHIERUP, C. BARTOLOMÉ, and P. AWADALLA, 2003 Diversity and linkage of genes in the self-incompatibility gene family in Arabidopsis lyrata. Genetics 164:1519-1535.
FRIEDMAN, R. and A. L. HUGHES, 2001 Pattern and timing of gene duplication in animal genomes. Genome Res. 11:1842-1847.
GU, X., Y. WANG, and J. GU, 2002 Age distribution of human gene families shows significant roles of both large- and small-scale duplications in vertebrate evolution. Nat. Genet. 31:205-209.
INNAN, H., 2002 A method for estimating the mutation, gene conversion and recombination parameters in small multigene families. Genetics 161:865-872.
INNAN, H., 2003a The coalescent and infinite-site model of a small multigene family. Genetics 163:803-810.
INNAN, H., 2003b A two-locus gene conversion model with selection and its application to the human RHCE and RHD genes. Proc. Natl. Acad. Sci. USA 100:8793-8798.
INOMATA, N., H. SHIBATA, E. OKUYAMA, and T. YAMAZAKI, 1995 Evolutionary relationships and sequence variation of α-amylase variants encoded by duplicated genes in the Amy locus of Drosophila melanogaster. Genetics 141:237-244.
KING, L. M., 1998 The role of gene conversion in determining sequence variation and divergence in the Est 5 gene family in Drosophila pseudoobscura. Genetics 148:305-315.
LAZZARO, B. P. and A. G. CLARK, 2001 Evidence for recent paralogous gene conversion and exceptional allelic divergence in the Attacin genes of Drosophila melanogaster. Genetics 159:659-671.
LI, W.-H., 1997 Molecular Evolution. Sinauer Associates, Sunderland, MA.
LYNCH, M. and J. S. CONERY, 2000 The evolutionary fate and consequences of duplicate genes. Science 290:1151-1155.
MCLYSAGHT, A., K. HOKAMP, and K. H. WOLFE, 2002 Extensive genomic duplication during early chordate evolution. Nat. Genet. 31:200-204.
NIELSEN, K. M., J. KASPER, M. CHOI, T. BEDFORD, and K. KRISTIANSEN et al., 2003 Gene conversion as a source of nucleotide diversity in Plasmodium falciparum. Mol. Biol. Evol. 20:726-734.
OHTA, T., 1980 Evolution and Variation of Multigene Families. Springer-Verlag, Berlin/New York.
OHTA, T., 1983 On the evolution of multigene families. Theor. Popul. Biol. 23:216-240.
ROZEN, S., H. SKALETSKY, J. D. MARSZALEK, P. J. MINX, and H. S. CORDUM et al., 2003 Abundant gene conversion between arms of palindromes in human and ape chromosomes. Nature 423:873-876.
SATO, K., T. NISHIO, R. KIMURA, M. KUSABA, and T. SUZUKI et al., 2002 Coevolution of the S locus genes SRK, SLG and SP11/SCR in Brassica oleracea and B. rapa. Genetics 162:931-940.
WALSH, J. B., 1987 Sequence-dependent gene conversion: Can duplicated genes diverge fast enough to escape conversion? Genetics 117:543-557.
WIUF, C. and J. HEIN, 2000 The coalescent with gene conversion. Genetics 155:451-462.