Corresponding author: Hidenori Tachida, Department of Biology, Faculty of Science, Kyushu University, 33, Fukuoka, 812-8581, Japan., htachscb{at}mbox.nc.kyushu-u.ac.jp (E-mail).
Communicating editor: A. G. CLARK
| ABSTRACT |
|---|
Evolution of multigene families by gene duplication and subsequent diversification is analyzed assuming a haploid model without interchromosomal crossing over. Chromosomes with more different genes are assumed to have higher fitness. Advantageous and deleterious mutations and duplication/deletion also affect the evolution, as in previous studies. In addition, negative selection on the total number of genes (copy number selection) is incorporated in the model. First, a Markov chain approximation is used to obtain formulas for the average numbers of different alleles, genes without pseudogene mutations, and pseudogenes assuming that mutation rates and duplication/deletion rates are all very small. Computer simulation shows that the approximation works well if the products of population size with mutation and duplication/deletion rates are all small compared to 1. However, as they become large, the approximation underestimates gene numbers, especially the number of pseudogenes. Based on the approximation, the following was found: (1) Gene redundancy measured by the average number of redundant genes decreases as advantageous selection becomes stronger. (2) The number of different genes can be approximately described by a linear pure-birth process and thus has a coefficient of variation around 1. (3) The birth rate is an increasing function of population size without copy number selection, but not necessarily so otherwise. (4) Copy number selection drastically decreases the number of pseudogenes. Available data of mutation rates and duplication/deletion rates suggest much faster increases of gene numbers than those observed in the evolution of currently existing multigene families. Various explanations for this discrepancy are discussed based on our approximate analysis.
GENE duplication with subsequent diversification is considered to have played very important roles in organismal evolution.
![]() |
(1) |
In the present article, as an extension of OHTA's work, a haploid model of gene duplication is analyzed, paying attention to the change of gene number. Copy number selection is explicitly incorporated into the model, and the duplication rate from a single gene is assumed to be zero. First, the model is described. Then, a Markov chain approximation of the model is derived, and its behavior is analyzed assuming low mutation and duplication/deletion rates. Results of extensive computer simulations checking the accuracy of the approximate analysis are then described. Approximate formulas for the rate of increase of the number of different genes and for gene redundancy are obtained as functions of population size, mutation rates, duplication/deletion rates, and selection coefficients. Copy number selection is shown to be very effective in reducing the number of pseudogenes. Relationships among various haploid models are also discussed.
| MODEL |
|---|
Definitions of symbols used are summarized in Table 1. A random-mating haploid population consisting of N chromosomes is assumed. Generations are discrete, and all chromosomes have two identical genes at generation zero. Each chromosome undergoes mutation, gene duplication/deletion, and random sampling with selection, in this order, in one generation.
We call a gene that has never had a pseudogene mutation in its line of descent a live gene. Each live gene mutates to a new, different allele or to a pseudogene at rates ua or up, respectively. Here, following the usage of ![]()
![]()
Next, if a chromosome has more than one gene, duplication and deletion each occur at a rate γ/2 per gene copy. If a chromosome has just one gene, no change occurs. Here, we are considering sister-chromatid unequal crossing over as the genetic mechanism for copy number change. If the mechanism were duplicative transposition, copy number change could occur even in chromosomes with a single copy.
Finally, N chromosomes of the next generation are sampled from the N chromosomes of the present generation with probabilities proportional to fitness of each chromosome. Let k and nt be the numbers of different alleles and all genes in a chromosome, respectively. The fitness of the chromosome is zero if k = 0. Otherwise, we consider the following three fitness functions, w(k,nt). 1. Model I (![]()
![]() |
(2) |
where k̄ is the average number of different alleles per chromosome in the population, and sa and sd (sa, sd > 0) are the positive and negative selection coefficients, respectively. 2. Model II (![]() |
(3) |
3. Model III,
![]() |
(4) |
Models I and II are modifications of those by ![]()
![]()
Random sampling with selection completes one generation. This cycle is repeated many times. At generation zero, the population starts with all chromosomes having two live genes (k = 1, nl = 2, np = 0), and all genes are identical as mentioned before.
Homologous equal crossing over does not occur in the present haploid model. Thus, this is a single-locus infinite-allele model. For large populations, the diffusion approximation can be applied, and, in this approximation, the relevant parameters are Nua, Nup, Nγ, Nsa, and Nsd, with time measured in units of N generations.
Because the evolution of the model seems very complex, we first analyze a Markov chain whose behavior approximates that of our haploid model under a certain condition. The validity of the approximation will be examined by computer simulation later.
| MARKOV CHAIN APPROXIMATION |
|---|
We assume that Nua, Nup, and Nγ are all very small compared to 1. Also, we assume that positive selection is fairly strong, so that the number of different alleles never decreases. Under these conditions, the population is expected to be mostly monomorphic and thus represented by one chromosome type. A mutant chromosome type that is to replace the currently dominant chromosome type in the population is expected to be fixed quickly. For characterization of a chromosome, let k be the number of different alleles in the chromosome, as in the previous section. The chromosome is characterized by k numbers, m1,···,mk, each representing the number of live genes with the same allelic state, and by the number of pseudogenes, np. Thus, in this approximation, the population state is characterized by a vector (m1,···,mk,np). The number of live genes, nl, is expressed as $n_l = \sum_{i=1}^{k} m_i$, and the total number, nt, is expressed as nt = nl + np. Occasionally, transitions among states occur. Because we assume Nua, Nup, Nγ ≪ 1, one of the following types of changes occurs in a chromosome, and the chromosome is fixed in the population with the probabilities specified below:
![]() |
(5) |
where $F(p) = \int_0^p \exp[-N s_a x(2 - x)]\,dx$.
2. A live gene in the ith class with mi > 1 becomes a pseudogene. Because such change is selectively neutral with mi > 1, the rate for the ith class genes to become a pseudogene is miup.
3. The number of live genes of the ith class increases by gene duplication. The rate for this event is miγNg+/2, where g+ is the fixation probability of a deleterious mutant with a selection coefficient -sd and is expressed as (KIMURA 1962)
![]() |
(6) |
4. The number of live genes in the ith allelic class decreases by deletion. The rate for this event is miγNg-/2, where g- is expressed as
![]() |
(7) |
5. The number of pseudogenes increases by gene duplication. The rate of increase is npγNg+/2.
6. The number of pseudogenes decreases by deletion. The rate of decrease is npγNg-/2.
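The fixation probabilities g+ and g- that enter the duplication and deletion rates above can be evaluated numerically. The sketch below assumes the standard diffusion result for a single new mutant in a haploid Wright-Fisher population of N chromosomes, (1 - e^(-2s))/(1 - e^(-2Ns)). This functional form, the function names, and the parameter values are assumptions made for illustration and are not taken from Equations 6 and 7.

```python
import math

def fixation_probability(N, s):
    """Diffusion approximation for one new mutant among N haploid chromosomes.

    Assumed form: (1 - exp(-2s)) / (1 - exp(-2Ns)); it tends to 1/N as s -> 0.
    """
    if abs(s) < 1e-12:          # neutral limit, avoids 0/0
        return 1.0 / N
    return math.expm1(-2.0 * s) / math.expm1(-2.0 * N * s)

def copy_number_rates(N, gamma, s_d):
    """Per-copy fixation rates of a duplication and of a deletion.

    Assumption: a duplication is deleterious (-s_d) and a deletion is
    advantageous (+s_d); each arises at rate gamma/2 per copy per chromosome.
    """
    g_plus = fixation_probability(N, -s_d)
    g_minus = fixation_probability(N, s_d)
    return gamma * N * g_plus / 2.0, gamma * N * g_minus / 2.0

if __name__ == "__main__":
    dup_rate, del_rate = copy_number_rates(N=100, gamma=1e-3, s_d=1e-3)
    print("duplication fixation rate per copy:", dup_rate)
    print("deletion fixation rate per copy:   ", del_rate)
```

With sd = 0 both rates reduce to γ/2, so copy number then changes as a neutral process; any sd > 0 makes deletions fix faster than duplications.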
The above transition probabilities determine a Markov chain. Because we assume strong selection (SS) and weak mutation and duplication/deletion (WMD), we call the result of this approximation the SSWMD limit. This type of approximation has been widely used to analyze complex genetic systems (e.g., GILLESPIE 1983). If time is measured in units of γ⁻¹ generations, the ratios ua/γ and up/γ, in addition to Nsa and Nsd, determine the behavior of the process. The process is still a multidimensional Markov chain and is difficult to analyze directly, so we take two approaches: computer simulation of the Markov chain, and further approximations for studying some aspects of the process. For each parameter set, 10,000 replications were made in the simulation.
Because k is an increasing function of time with strong positive selection, k is always > 1 once k becomes more than 1. We call the phases with k = 1 and k > 1 the first and second phases, respectively. In the first phase, the total number of genes, nt, can become 1, and then no change of the gene number occurs thereafter. For this reason, the number of pseudogenes, np, affects the evolution of k and nt. On the other hand, in the second phase, nl is always larger than 1, and change of the number of genes never stops. The evolution of k and nl does not depend on np. Therefore, these two phases are analyzed separately.
In the first phase, the population starts with k = 1, nt = 2, and we are interested in when the population enters the second phase (k > 1). Let p1(t) be the probability that k is 1 at time t. Because evolution in this phase is influenced by np, p1(t) is difficult to compute, so we examined it by simulation of the Markov chain. Figure 1 shows the time-dependent behavior of p1(t). Solid and broken lines represent results for cases without (sd = 0) and with (sd = 0.01sa) copy number selection, respectively, and time is measured in units of 1/γ generations. The mutation rates in this example are 10ua = up = γ. The probability p1(t) decreases quickly in the first few 1/γ generations, and then the change becomes small. In the later generations, nt is 1 in most populations with k = 1, and no changes occur in these populations. That is why p1(t) does not change in the later generations. Positive selection decreases the probability, but even very weak copy number selection can increase it. The effect of copy number selection is stronger for the cases with stronger positive selection.
Ultimately, the population moves to either the state with nt = 1 or that with k > 1. The ultimate probability, p1(∞), was estimated by simulation. Because p1(t) is a monotone decreasing function of t and is bounded from below by the probability that nt = 1 at time t, simulations were continued until p1(t) exceeded this lower bound by less than 0.01, so that p1(∞) is estimated with a maximum error of 0.01. The results are shown in Table 2. The ratio ua/γ and the effects of selection are important in determining p1(∞). For small ua/γ, most populations ultimately become fixed for chromosomes with a single copy. The ratio up/ua has little effect, probably because the duplication/deletion force is strong. As ua/γ becomes larger, p1(∞) decreases and the effects of pseudogene mutation become larger. The effect of copy number selection is always significant even though sd/sa = 0.01 in the examples.
In the second phase, we are interested in the evolution of k, nl, and np. We first investigate the ratio Q = nl/k, the average number of genes with the same allelic state and a measure of strict gene redundancy. Because the duplication/deletion process does not stop with nt > 1, the evolution of the number of genes with the ith allelic state, mi (we call this the ith allelic class) can be decoupled from np or other mj's in this approximation. Here, we first concentrate on the evolution of one allelic class and denote the number of live genes in the class by m.
Consider the following process: At the start, m is one. Transition from m = 1 to m = 2 occurs at the rate a = γNg+/2 by gene duplication. For m > 1, the process moves to m - 1, m + 1, or 1. The transition to 1 is added because we want to know the average Q over all allelic classes, and for this purpose, the process moves to a new allelic class with probability one-half if a different allele is created from this allelic class. The transition to m - 1 occurs by deletion, advantageous mutation, or pseudogene mutation, and its rate is denoted by mb. Because we stay in the same allelic class in one-half of the cases when an advantageous mutation occurs, b = up + N(uaf+/2 + γg-/2). The transition to m + 1 occurs by duplication, and its rate is ma. Finally, the transition rate to 1 is mc with c = Nuaf+/2. Let qm(t) be the probability of the state being m at time t. Then, the qm(t)'s satisfy the following system of differential equations:
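$$\frac{dq_1}{dt} = -a q_1 + 2b q_2 + c \sum_{m=2}^{\infty} m q_m, \tag{8}$$
$$\frac{dq_m}{dt} = (m - 1) a q_{m-1} - m(a + b + c) q_m + (m + 1) b q_{m+1} \quad (m \ge 2). \tag{9}$$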
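With the generating function $h(z,t) = \sum_{m=1}^{\infty} q_m(t) z^m$, the system of differential equations is expressed by
$$\frac{\partial h}{\partial t} = \left[a z^2 - (a + b + c) z + b\right]\frac{\partial h}{\partial z} - b(1 - z) q_1 + c z M_q, \tag{10}$$
where $M_q = \sum_{m=1}^{\infty} m q_m$ is the mean number. Although the time-dependent solution of this equation is difficult to obtain because of q1 and Mq in the right-hand side, we can compute the equilibrium solution h(z) by noting $\sum_{m=1}^{\infty} q_m = 1$ and qm ≤ 1 for all m. The resulting solution is ![]() |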
(11) |
![]() |
(12) |
![]() |
(13) |
![]() |
(14) |
It is intuitively clear that increasing b or c decreases Mq and that increasing a increases it, but the quantitative dependence is fairly complex, as expressed by (14).
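As a numerical cross-check on the equilibrium mean Mq, the stationary distribution of the single-class process can also be computed directly from its transition rates a, b, and c. The sketch below truncates the chain at a hypothetical maximum size `max_m` and uses the balance of probability flow across each cut between {1,...,m} and {m+1,...}; the function names and the truncation level are assumptions made here, and the routine presumes c > 0 or a < b so that the tail of the distribution is negligible at `max_m`.

```python
def equilibrium_class_size(a, b, c, max_m=400):
    """Stationary distribution q_1..q_max_m of the single-class process.

    Rates: 1 -> 2 at a; for m > 1, m -> m+1 at m*a, m -> m-1 at m*b,
    and m -> 1 at m*c. The chain is truncated at max_m (no birth from max_m).
    Cut balance: m*a*q_m = (m+1)*b*q_{m+1} + c * sum_{j>m} j*q_j.
    """
    q = [0.0] * (max_m + 2)     # q[m] for m = 1..max_m; q[max_m + 1] stays 0
    q[max_m] = 1.0              # arbitrary scale, normalized at the end
    tail = max_m * q[max_m]     # running value of sum_{j>m} j*q_j
    for m in range(max_m - 1, 0, -1):
        q[m] = ((m + 1) * b * q[m + 1] + c * tail) / (m * a)
        tail += m * q[m]
        if q[m] > 1e200:        # rescale to avoid floating-point overflow
            q = [x / 1e200 for x in q]
            tail /= 1e200
    total = sum(q[1:max_m + 1])
    return [x / total for x in q[1:max_m + 1]]   # index 0 holds q_1

def mean_and_redundancy(q):
    """Return (M_q, M_q - q_1): mean class size and mean redundant copies."""
    mean = sum((i + 1) * p for i, p in enumerate(q))
    return mean, mean - q[0]

if __name__ == "__main__":
    # Illustrative rates only, in units of the duplication/deletion rate.
    q = equilibrium_class_size(a=0.5, b=0.6, c=0.05)
    print(mean_and_redundancy(q))
```

The downward recursion adds only non-negative terms, which keeps it numerically stable. Solving the second-order balance equations upward from q1 tends to amplify rounding errors.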
Because Mq is the mean number of live genes in one allelic class, its value might be close to the ratio Q = nl/k, although Mq computed above is the equilibrium value. We can check how well Mq approximates Q by simulation. Because Q is 1 in populations with nt = 1, we computed Q conditioned on nt > 1 from the simulation at t = 10/γ (Table 3). As shown in Table 3, Q estimated from the simulation and Mq computed from (14) agree well. Thus, it seems that the distribution of m is close to equilibrium at t = 10/γ. As Nsa or ua becomes larger, the redundancy measured by Mq becomes smaller because both b and c increase. Increasing up increases b and thus decreases Mq. Introduction of copy number selection also decreases Mq. Only when ua/γ is very small is high genetic redundancy observed; otherwise, the genetic redundancy is very small.
Next, we consider the number (k) of different alleles. Because a new allele is created from a redundant copy, its production rate is proportional to the number of redundant genes and uaNf+. Thus, in the second phase, k(t) is expected to be approximated by a pure-birth process with a rate of (Mq - q1)uaNf+. Define k*(t) = E[k(t),k(∞) > 1] as the expectation of k over the event that k eventually enters the second phase. The relationship between k*(t) and E[k(t)] is represented by
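$$E[k(t)] = k^*(t) + p_1(\infty). \tag{15}$$
As noted above, because k(t) is a pure-birth process in the second phase, k*(t) takes the form
$$k^*(t) = C e^{\lambda t}, \tag{16}$$
where C is a constant and λ = (Mq - q1)uaNf+. Thus, we can write the expectation of k as
$$E[k(t)] = C e^{\lambda t} + p_1(\infty). \tag{17}$$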
To check the accuracy of this expression based on the pure-birth process approximation, the values computed from it and E[k(t)] obtained from the Markov chain simulation were compared (Figure 2). To obtain C, the simulation data were fitted by the curve represented by (17). The simulation data are represented by open symbols, and the fitted curve is shown by solid lines in Figure 2. The agreement seems very good except for the very initial phase. Thus, in the second phase, k(t) evolves approximately as a pure-birth process defined as above. Note that the fitted curves are >1 at t = 0. This is probably because we started with a redundant copy (m1 = 2). In this curve fit, we regard a certain number of new alleles as having been created at t = 0.
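The pure-birth description can be illustrated with the textbook linear pure-birth (Yule) process, in which each of n lineages splits at a constant rate. The sketch below is a minimal illustration under that assumption: it gives the standard mean and variance for a process started from n0 lineages, the doubling time of the mean, and a direct simulation. The function names and the rate symbol `lam` are chosen here for illustration, and the constants C and p1(∞) of the curve fit are not modeled.

```python
import math
import random

def yule_moments(n0, lam, t):
    """Mean, variance, and coefficient of variation of a linear pure-birth process."""
    growth = math.exp(lam * t)
    mean = n0 * growth
    variance = n0 * growth * (growth - 1.0)
    cv = math.sqrt(variance) / mean
    return mean, variance, cv

def doubling_time(lam):
    """Time for the mean of the process to double: log 2 / lam."""
    return math.log(2.0) / lam

def simulate_yule(n0, lam, t_end, rng):
    """Simulate one trajectory; the waiting time with n lineages is Exp(n * lam)."""
    n, t = n0, 0.0
    while True:
        t += rng.expovariate(n * lam)
        if t > t_end:
            return n
        n += 1

if __name__ == "__main__":
    rng = random.Random(0)
    runs = [simulate_yule(1, 1.0, 2.0, rng) for _ in range(5000)]
    print("simulated mean:", sum(runs) / len(runs))
    print("theory (mean, variance, CV):", yule_moments(1, 1.0, 2.0))
```

For n0 = 1 the coefficient of variation is sqrt(1 - e^(-λt)), which approaches 1 as λt grows. This is the sense in which the number of different genes is expected to vary on the order of its mean.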
Finally, we consider the number of pseudogenes, np. The average rate of production of new pseudogenes per allelic class is again proportional to the number of redundant genes and is represented by β1 = (Mq - q1)up. Once a new pseudogene is created, its number can change by duplication/deletion. The rate of decrease is β2 = (g- - g+)Nγ/2. Thus, the expected number of pseudogenes may take the following form,
![]() |
(18) |
However, recall that E[k(t)] estimated by (17) is not 1 at t = 0. Because pseudogenes, like new alleles, are produced from redundant genes but at a different rate β1 per allelic class, we need to add the following term due to pseudogenes created at t = 0:
![]() |
(19) |
The term in parentheses is the number of new alleles created at t = 0 in the curve fit mentioned above, and the ratio of the number of pseudogenes to that of new alleles when they are created is β1/λ.
![]() |
(20) |
Although the computation of this formula requires p1(∞), if we want an expectation conditioned on the event of k > 1, it can be computed from the conditional expectation of k(t). Thus, this formula can be used to examine the relationship between conditional expectations of k and np. We again checked the accuracy of this equation against the values obtained by the Markov chain simulation, and the results are shown in Table 4. The agreement is generally very good. Without copy number selection (sd = 0), β2 = 0, and if we note k*(t) - (1 - p1(∞)) = E[k(t)] - 1 [see (15)], Equation 20 is simplified to
![]() |
(21) |
1, ![]() |
(22) |
Thus, if γ, which increases β2 linearly, is large, E[np]/k* becomes small. This can be seen from Table 4, where even with weak copy number selection (sd = sa/100), the number of pseudogenes is drastically reduced.
| SIMULATION |
|---|
To examine the validity and limitations of the SSWMD limit obtained by the Markov chain approximation, and also to investigate parameter ranges outside those assumed in the approximation, we carried out computer simulation closely following the model. We call this type of simulation the Wright-Fisher simulation. In the simulation, N chromosomes are prepared. For each chromosome, k, m1, ..., mk, np, and its fitness are stored. Numbers of mutations and duplication/deletion events are determined by generating Poisson random numbers with means computed by multiplying the number of genes by the respective rates. Genes undergoing mutation or duplication/deletion are chosen randomly. After these events, the fitness of each chromosome is determined according to (2), (3), or (4) and is divided by the maximum fitness in the current population. The next generation is constructed in the following way: First, choose a chromosome randomly. Next, draw a uniform random number in [0,1) and, if the random number is less than the relative fitness of the chosen chromosome, add the chromosome to the population of the next generation. This is repeated until the chromosome number becomes N. For each parameter set, 1600 replications are made. We present the simulation results in comparison with the SSWMD limit.
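The event-count and resampling steps of this simulation can be sketched as follows. This is a minimal illustration rather than the program used for the results: the Poisson generator uses Knuth's multiplication method, the chromosome records and the `fitness` callable are hypothetical, and the fitness functions of Models I to III are not reproduced.

```python
import math
import random

def poisson(mean, rng):
    """Poisson random number by Knuth's multiplication method (small means)."""
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def sample_next_generation(population, fitness, rng):
    """Fitness-proportional resampling by rejection.

    population: list of chromosome records; fitness: callable returning w >= 0.
    A randomly chosen chromosome is accepted when a uniform number in [0, 1)
    falls below its fitness relative to the current maximum.
    """
    weights = [fitness(ch) for ch in population]
    w_max = max(weights)
    if w_max <= 0.0:
        raise ValueError("no viable chromosome in the population")
    n = len(population)
    next_generation = []
    while len(next_generation) < n:
        i = rng.randrange(n)
        if rng.random() < weights[i] / w_max:
            # Stores a reference; copy the record before mutating it in place.
            next_generation.append(population[i])
    return next_generation

if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical records: k different alleles, n_t genes in total.
    population = [{"k": 1, "n_t": 2} for _ in range(100)]
    toy_fitness = lambda ch: 0.0 if ch["k"] == 0 else 1.0 + 0.1 * ch["k"]
    n_events = poisson(1e-3 * sum(ch["n_t"] for ch in population), rng)
    population = sample_next_generation(population, toy_fitness, rng)
    print(n_events, len(population))
```

Accepting a draw with probability w/w_max makes the chance of being sampled proportional to fitness. Chromosomes with zero fitness (k = 0) are therefore never transmitted.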
First, we examine time courses of the average k, nl, np, and the proportion of populations with nt = 1 when Nγ, Nup, and Nua are small and agreement between the Markov chain simulation and the Wright-Fisher simulation is expected. Some examples are shown in Figure 3. The values computed by the Markov chain simulation are represented by solid lines, and those by the Wright-Fisher simulation are represented by dotted lines. Parameters used are γ = up = 0.001, ua = 0.0001, sa = 0.1, sd = 0, and N = 100 (Figure 3A) or N = 400 (Figure 3B). With N = 100, the Markov chain approximation is very close to the Wright-Fisher simulation. For this parameter set, Nγ = Nup = 0.01 and Nua = 0.001. However, with N = 400, the Markov chain approximation underestimates the expected values of k, np, and nl by 10 to 20%, especially in the later generations, although they seem to increase exponentially. The largest discrepancy is found in np. As the number of genes per chromosome increases, the population becomes polymorphic because product parameters such as nlNγ become large. This may be the reason for the discrepancies. Estimates of the proportion of populations with nt = 1 from the two simulations agree well.
To examine the numbers of genes in a wider parameter range, more simulations were carried out, and some of the results are shown in Table 5 (without copy number selection) and Table 6 (with copy number selection). To clarify the effect of increasing γ, ua, and up, cases with γ = 0.0001 and 0.001 are shown along with the SSWMD limit, keeping ua/γ and up/γ constant. Means and variances of gene numbers at t = 10/γ are shown. With ua/γ = 0.02 and N = 100, values for γ = 0.001 and 0.0001 are quite close to the SSWMD limit. As population size increases (N = 400) or ua/γ approaches 0.1, the values with a large γ (0.001) deviate from the SSWMD limit. In particular, the number of pseudogenes becomes large when γ is large, decreasing the ratio of the number of different alleles to that of pseudogenes. With larger N or ua, populations become polymorphic with regard to gene numbers, thus breaking the assumption of the Markov chain approximation. The increase of pseudogenes is probably due to competition among advantageous alleles and the resulting reduction of the probability of fixation of new, different alleles. The trend becomes more pronounced when both N and ua are larger (ua/γ = 0.1, N = 400). In this parameter set, the numbers of different alleles and live genes are also larger than the SSWMD limit. This is especially true when there is copy number selection. In the Markov chain approximation, fixations of new alleles are assumed to occur after fixations of redundant copies. However, in polymorphic populations, new, different alleles may appear before fixation of redundant copies. This may increase the fixation rate of new genes, especially when there is copy number selection and fixations of redundant copies are unlikely to occur.
The number of different alleles increases as population size increases without copy number selection. However, this is not necessarily true with copy number selection. With ua/γ = 0.02, the average numbers of different alleles are almost the same for N = 100 and 400. The variance of k is generally very large and is of the order of the square of the mean when the mean is large. As ua increases, keeping ua/up constant, E[k] always increases.
Thus far, we have investigated only cases with fairly strong selection (Nsa = 10, 40), where one of the assumptions of the SSWMD limit is satisfied. We also investigated cases in which selection is weak, and the result is shown in Table 7. The result of reducing γ to 0.0001 was indistinguishable from that of γ = 0.001 and is not shown. The average number of different alleles is very close to 1, although pseudogenes are created. Thus, for smaller populations, new alleles are difficult to evolve. Copy number selection has very little effect on E[k] and slight effects on E[np]. One notable difference from the strong selection case is that E[k] decreases as ua increases, keeping ua/up constant. The effect of increasing up overrides that of increasing ua if Nsa is of the order of 1. Also note that the variance of np is very large. Some populations are considered to have a very large number of pseudogenes, whereas nt = 1 and np = 0 in other populations.
| DISCUSSION |
|---|
In the present article, evolution of multigene families by gene duplication was analyzed assuming a haploid population and no interchromosomal recombination. In such a population, redundant copies are created first, and then one of them evolves to become a new gene by adaptive mutation. This system differs from those in which selection directly operates on the number of gene copies.
First, note that in the SSWMD limit, if we measure time in units of γ⁻¹, the behavior of the process is characterized by the scaled parameters u*a = ua/γ and u*p = up/γ, in addition to Nsa and Nsd. Thus, if we keep these parameters constant, the birth rate λ is proportional to γ. Let λ* = λ/γ and consider a simple case of no copy number selection (sd = 0). In this case, λ* is expressed as
![]() |
(23) |
Here, d is proportional to both u*a and Nf+. Two limiting cases are instructive. If d ≪ 1, Equation 23 reduces to
![]() |
(24) |
This shows that λ* increases linearly as d increases and that the rate is a decreasing function of u*p. Thus, if u*p is small, λ* quickly increases as d increases through an increase of either N or u*a. However, the increase saturates, and for d ≫ 1,
![]() |
(25) |
The increase rate with d = ∞ is γ/2 per generation as expected, but the approach to it is affected by the pseudogene mutation rate. Between these two extreme cases, λ* was numerically shown to be an increasing function of d in the range 0.01 ≤ up/γ ≤ 10 (data not shown). Thus, for example, the rate of increase of gene number is larger in larger populations with sd = 0. However, this is not always true when there is copy number selection (sd > 0), as shown in Figure 4. In these cases, as N becomes larger, λ* first increases and then decreases. This is because the effect of copy number selection also becomes larger in larger populations. However, as noted in the previous section, the SSWMD limit underestimates the number of different alleles with copy number selection as the change rates (ua, up, and γ) become larger. Thus, in more realistic situations, the decrease of the rate as N increases is not as pronounced as Figure 4 shows. Indeed, when ua = 10⁻⁵, up = 10⁻⁴, γ = 10⁻⁴, sa = 0.1, and sd = 0.002, corresponding to set 1 in Figure 4, the average numbers of different alleles at the 10/γth generation were 2.048 ± 0.0103 for N = 200 and 1.896 ± 0.081 for N = 800. In summary, the number of different alleles increases as population size becomes large without copy number selection, but this is not necessarily so with copy number selection.
Next, we examine the speed of gene duplication. We measure it by the doubling time of E[k], which is expressed by log 2/[(Mq - q1)uaNf+] and depends on ua, up, γ, Nsa, and Nsd. We do not have reliable estimates for most of the parameters, but some rough calculation under this model is possible. For up, we may use the null mutation rate measured in Drosophila by ![]()
![]()
γ = 10⁻⁴, taking an intermediate value. Currently, we do not have any estimates for the advantageous mutation rate ua or the selection parameters sa and sd, but to get a rough idea, let us assume ua = 10⁻⁶, employing the bandmorph mutation rate 1.03 × 10⁻⁶, estimated by ![]()
1, the doubling time is shown to be roughly proportional to (uaNf+
)-1 from (24), and reducing ua to one-tenth makes the time about 10 times greater. At any rate, gene duplications can occur very quickly under the present model from the viewpoint of the evolutionary time scale. However, according to ![]()
![]()
![]()
![]()
![]()
![]()
We usually observe large variation in gene numbers in one multigene family among related species. For example, in Drosophila amylase genes, the numbers of genes are two (![]()
![]()
![]()
![]()
![]()
![]()
, where n0 is the initial number (see p. 159 of ![]()
In the present article, as a fitness function, OHTA's model (Model I; ![]()
of different alleles is proportional to
, in Model I and Nsa in Model II (WALSH's model; ![]()
1, λ* is proportional to d as shown in (24). Indeed, for Nua = 0.04, Nup = Nγ = 0.4, Nsa = 40, and sd = 0, the Wright-Fisher-type simulation shows that the average number of different alleles at t = 25N was 150 in Model II, whereas it was 6 in Model I. Thus, the selection in Model I is the least efficient in increasing gene number.
We assumed that no duplication/deletion occurs if the total number, nt, of genes becomes 1 because the duplication rate is much smaller if there is only a single copy. We divided the process into the first and second phases, depending on whether k is 1 or more than 1. In the second phase, the evolution is the same as that in OHTA's model (![]()
is not small (Table 2). Thus, we expect to find lineages with only a single gene even if gene number is large in other lineages.
From our approximate analysis, the gene duplication processes in a haploid model without interchromosomal crossing over were characterized, and dependencies of the speed of gene duplication, genetic redundancy, and the number of pseudogenes on various parameters are clarified for strong selection cases. With the accumulation of data on gene families, various discussions on the mechanisms of the evolution of multigene families have been made (e.g., ![]()
![]()
![]()
| ACKNOWLEDGMENTS |
|---|
We thank T. OHTA, M. IIZUKA, A. CLARK, and an anonymous reviewer for their helpful comments on earlier drafts of the manuscript. This research was partially supported by grants-in-aid to H.T. from the Ministry of Education, Science and Culture of Japan.
Manuscript received October 12, 1997; Accepted for publication May 13, 1998.
| LITERATURE CITED |
|---|
BROWN, C. J., C. F. AQUADRO, and W. W. ANDERSON, 1990 DNA sequence evolution of the amylase multigene family in Drosophila pseudoobscura. Genetics 126:131-138.
BUCKLER, E. S., IV, A. IPPOLITO, and T. P. HOLTSFORD, 1997 The evolution of ribosomal DNA: divergent paralogues and phylogenetic implications. Genetics 145:821-832.
CARROLL, S. B., 1995 Homeotic genes and the evolution of arthropods and chordates. Nature 376:479-485.
CLARK, A. G., 1994 Invasion and maintenance of a gene duplication. Proc. Natl. Acad. Sci. USA 91:2950-2954.
CLEGG, M. T., M. P. CUMMINGS, and M. L. DURBIN, 1997 The evolution of plant nuclear genes. Proc. Natl. Acad. Sci. USA 94:7791-7798.
COX, D. R., and H. D. MILLER, 1965 The Theory of Stochastic Processes. Chapman and Hall, London.
CROW, J. F., and M. KIMURA, 1970 An Introduction to Population Genetic Theory. Harper & Row, New York.
DA LAGE, J. L., M. L. LEMEUNIER, M. L. CARIOU, and J. R. DAVID, 1992 Multiple amylase genes in Drosophila ananassae and related species. Genet. Res. 59:85-92.
EGGLESTON, W. B., M. ALLEMAN, and J. L. KERMICLE, 1995 Molecular organization and germinal instability of R-stippled maize. Genetics 141:347-360.
FRYXELL, K. J., 1996 The coevolution of gene family trees. Trends Genet. 12:364-369.
GILLESPIE, J. H., 1983 Some properties of finite populations experiencing strong selection and weak mutation. Am. Nat. 121:691-708.
HOLLICK, J. B., J. E. DORWEILER, and V. L. CHANDLER, 1997 Paramutation and related allelic interactions. Trends Genet. 13:302-308.
IWABE, N., K. KUMA, and T. MIYATA, 1996 Evolution of gene families and relationship with organismal evolution: rapid divergence of tissue-specific genes in the early evolution of chordates. Mol. Biol. Evol. 13:483-493.
KIMURA, M., 1962 On the probability of fixation of mutant genes in a population. Genetics 47:713-719.
KIMURA, M., and J. F. CROW, 1964 The number of alleles that can be maintained in a finite population. Genetics 49:725-738.
LANGE, B. W., C. H. LANGLEY, and W. STEPHAN, 1990 Molecular evolution of Drosophila metallothionein genes. Genetics 126:921-932.
LI, W. H., 1987 Models of nearly neutral mutations with particular implications for nonrandom usage of synonymous codons. J. Mol. Evol. 24:337-345.
LYCKEGAARD, E. M. S., and A. G. CLARK, 1989 Ribosomal DNA and stellate gene copy number variation on the Y chromosome of Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 86:1944-1948.
MUKAI, T., and C. C. COCKERHAM, 1977 Spontaneous mutation rates at enzyme loci in Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 74:2514-2517.
NEITZ, M., and J. NEITZ, 1995 Numbers and ratios of visual pigment genes for normal red-green color vision. Science 267:1013-1016.
OHNO, S., 1970 Evolution by Gene Duplication. Springer-Verlag, New York.
OHTA, T., 1983 Theoretical study on the accumulation of selfish DNA. Genet. Res. 41:1-15.
OHTA, T., 1987a A model of evolution for accumulating genetic information. J. Theor. Biol. 124:199-211.
OHTA, T., 1987b Simulating evolution by gene duplication. Genetics 115:207-213.
OHTA, T., 1988a Further simulation studies on evolution by gene duplication. Evolution 42:375-386.
OHTA, T., 1988b Multigene and supergene families. Oxf. Surv. Evol. Biol. 5:41-65.
OHTA, T., 1991 Multigene families and the evolution of complexity. J. Mol. Evol. 33:34-41.
PAYANT, V., S. ABUKASHAWA, M. SASSEVILLE, B. F. BENKEL, D. A. HICKEY et al., 1988 Evolutionary conservation of the chromosomal configuration and regulation of amylase genes among eight species of the Drosophila melanogaster species subgroup. Mol. Biol. Evol. 5:560-567.
PETROV, D. A., E. R. LOZOVSKAYA, and D. L. HARTL, 1996 High intrinsic rate of DNA loss in Drosophila. Nature 384:346-349.
SHAPIRA, S. K., and V. G. FINNERTY, 1986 The use of genetic complementation in the study of eukaryotic macromolecular evolution: rate of spontaneous gene duplication at two loci of Drosophila melanogaster. J. Mol. Evol. 23:159-167.
SHIBATA, H., and T. YAMAZAKI, 1995 Molecular evolution of the duplicated Amy locus in Drosophila melanogaster species subgroup: concerted evolution only in coding region and excess of nonsynonymous substitutions in speciation. Genetics 141:223-236.
STEPHAN, W., 1986 Recombination and the evolution of satellite DNA. Genet. Res. 47:167-174.
SUDUPAK, M. A., J. L. BENNETZEN, and S. H. HULBERT, 1993 Unequal exchange and meiotic instability of disease-resistance genes in the Rp1 region of maize. Genetics 133:119-125.
TAKANO, T. S., S. KUSAKABE, A. KOGA, and T. MUKAI, 1989 Polymorphism for the number of tandemly multiplicated glycerol-3-phosphate dehydrogenase genes in Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 86:5000-5004.
THEISSEN, G., J. T. KIM, and H. SAEDLER, 1996 Classification and phylogeny of the MADS-box multigene family suggest defined roles of MADS-box gene subfamilies in the morphological evolution of eukaryotes. J. Mol. Evol. 43:484-516.
WALSH, J. B., 1985 Interaction of selection and biased gene conversion in a multigene family. Proc. Natl. Acad. Sci. USA 82:153-157.
WALSH, J. B., 1995 How often do duplicated genes evolve new functions? Genetics 139:421-428.