Evolution of Multigene Families by Gene Duplication: A Haploid Model

Tachida, Hidenori; Kuboyama, Tohru

doi:10.1093/genetics/149.4.2147

Abstract

Evolution of multigene families by gene duplication and subsequent diversification is analyzed assuming a haploid model without interchromosomal crossing over. Chromosomes with more different genes are assumed to have higher fitness. Advantageous and deleterious mutations and duplication/deletion also affect the evolution, as in previous studies. In addition, negative selection on the total number of genes (copy number selection) is incorporated in the model. First, a Markov chain approximation is used to obtain formulas for the average numbers of different alleles, genes without pseudogene mutations, and pseudogenes assuming that mutation rates and duplication/deletion rates are all very small. Computer simulation shows that the approximation works well if the products of population size with mutation and duplication/deletion rates are all small compared to 1. However, as they become large, the approximation underestimates gene numbers, especially the number of pseudogenes. Based on the approximation, the following was found: (1) Gene redundancy measured by the average number of redundant genes decreases as advantageous selection becomes stronger. (2) The number of different genes can be approximately described by a linear pure-birth process and thus has a coefficient of variation around 1. (3) The birth rate is an increasing function of population size without copy number selection, but not necessarily so otherwise. (4) Copy number selection drastically decreases the number of pseudogenes. Available data of mutation rates and duplication/deletion rates suggest much faster increases of gene numbers than those observed in the evolution of currently existing multigene families. Various explanations for this discrepancy are discussed based on our approximate analysis.

GENE duplication with subsequent diversification is considered to have played very important roles in the organismal evolution (Ohno 1970, 1988b, 1991). If a gene is duplicated, the selective constraint becomes less for the extra copy, and it can evolve to have a (slightly) different function, while the original function of the gene is kept in the other copy. Thus, gene duplication with subsequent diversification is one of the simplest ways to acquire a new function and is thought to have been employed many times during evolution. For example, Iwabe et al. (1996) suggested that gene duplications contributed to the compartmentalization of a cell and formation of multicellular organisms by creating genes expressed tissue specifically. Also, homeobox genes (Carroll 1995) and MCM1, AGAMOUS, DEFICIENS, and SRF (MADS)-box genes (Thiessen et al. 1996) were apparently created by ancient gene duplications. Gene duplications are not restricted to ancient times. Changes of gene number are observed in closely related species in many taxa, and even examples of polymorphisms with regard to copy number within species are known (e.g., Lyckegaard and Clark 1989; Takano et al. 1989; Lange et al. 1990; Neitz and Neitz 1995). Thus, gene duplications are still ongoing evolutionary processes. How such gene duplications and formation of multigene families occur at the population level is an interesting evolutionary problem.

Ohta (1987a,b, 1988a,b) theoretically studied evolution by gene duplication in haploid and diploid models using Monte Carlo simulation. She showed among others that positive selection is necessary to acquire genes with new functions and that the variance of the gene number is generally large. Also the ratio (R) of the number of genes with different functions to the number of pseudogenes was proposed to measure effects of positive selection, and an approximate formula to compute it was derived (Ohta 1987b),

R = \frac{u_{+} v_{+}}{v_{-}},

(1)

where v₊ and v₋ are rates of mutation to a different gene with a new function and of mutation to a pseudogene, respectively, and u₊ is the fixation probability of an advantageous mutation introduced into a population. Later, Clark (1994) and Walsh (1995) studied different aspects of two gene systems assuming slightly different models. However, there are questions still to be answered. First, because Ohta's results (Ohta 1987a,b, 1988a,b) were mostly from simulation except for Equation 1 dependency of gene multiplication on parameters like population size, intensity of selection, mutation rate, and duplication/deletion rate was not so clear. As we do not have good estimates for most of these parameters, deriving analytical expressions describing gene multiplication processes is desirable to quantify speed of gene duplication and gene redundancy. Second, negative selection on the total number of genes per chromosome (copy number selection) was not considered in the previous works. A tendency for DNA loss is observed in Drosophila (Petrov et al. 1996), and gene silencing is observed when multiple copies are introduced into plants (Hollick et al. 1997). Thus, there is a possibility of the action of copy number selection. Third, the rate of gene duplication from a single gene and from multiple genes are considered to be different. However, this difference was ignored to simplify the model in Ohta's simulation.

In the present article, as an extension of Ohta's work, a haploid model of gene duplication is analyzed paying attention to the change of gene number. Copy number selection is explicitly incorporated into the model, and the duplication rate from a single gene was assumed to be 0. First, the model is described. Then, a Markov chain approximation of the model is derived, and its behavior is analyzed assuming low mutation and duplication/deletion rates. Results of extensive computer simulation to check the accuracy of the approximate analysis will be described. Approximate formulas for the rate of increase of the number of different genes and gene redundancy were obtained as a function of population size, mutation rates, duplication/deletion rates, and selection coefficients. Copy number selection was shown to be very effective in reducing the number of pseudogenes. Relationships of various haploid models also will be discussed.

MODEL

Definitions of symbols used are summarized in Table 1. A random-mating haploid population consisting of N chromosomes is assumed. Generations are discrete, and all chromosomes have two identical genes at generation zero. Each chromosome undergoes mutation, gene duplication/deletion, and random sampling with selection, in this order, in one generation.

We call a gene that never has had pseudogene mutation in its descent a live gene. Each live gene mutates to a new different allele or to a pseudogene with rates u_a or u_p, respectively. Here, following the usage of Ohta (1988b), we use the word alleles to denote gene types, although alleles may not be at the same locus. Each new allele is assumed to be different from any other alleles preexisting in the population (Kimura and Crow 1964). Because fitness of a chromosome is a monotone increasing function of the number of different alleles, as described later, u_a is the advantageous mutation rate. Once a gene becomes a pseudogene, it stays a pseudogene.

Next, if a chromosome has more than one gene, duplication

TABLE 1

Definitions of symbols frequently used in the text

Symbol	Meaning
k	Number of different alleles in a chromosome
m_i	Number of copies with the ith allelic state
n₁	Number of live genes in a chromosome
n_p	Number of pseudogenes in a chromosome
n_t	Number of total genes in a chromosome
N	Population size
u_a	Mutation rate to a different allele
u_p	Mutation rate to a pseudogene
γ	Duplication/deletion rate
s_a	Coefficient for advantageous selection
s_d	Selection coefficient for copy number selection
f₊	Fixation probability of a chromosome with one more different allele
g₊	Fixation probability of a chromosome with one more gene
g₋	Fixation probability of a chromosome with one less gene
α	Rate of increase of the number of different alleles
p₁(t)	Probability of the population fixed by chromosomes with k = 1
${\tilde{p}}_{1}$	Probability of the population fixed by chromosomes with n_t = 1
R	Ratio of the number of different alleles to the number of pseudogenes

Symbol	Meaning
k	Number of different alleles in a chromosome
m_i	Number of copies with the ith allelic state
n₁	Number of live genes in a chromosome
n_p	Number of pseudogenes in a chromosome
n_t	Number of total genes in a chromosome
N	Population size
u_a	Mutation rate to a different allele
u_p	Mutation rate to a pseudogene
γ	Duplication/deletion rate
s_a	Coefficient for advantageous selection
s_d	Selection coefficient for copy number selection
f₊	Fixation probability of a chromosome with one more different allele
g₊	Fixation probability of a chromosome with one more gene
g₋	Fixation probability of a chromosome with one less gene
α	Rate of increase of the number of different alleles
p₁(t)	Probability of the population fixed by chromosomes with k = 1
${\tilde{p}}_{1}$	Probability of the population fixed by chromosomes with n_t = 1
R	Ratio of the number of different alleles to the number of pseudogenes

Open in new tab

TABLE 1

Definitions of symbols frequently used in the text

Symbol	Meaning
k	Number of different alleles in a chromosome
m_i	Number of copies with the ith allelic state
n₁	Number of live genes in a chromosome
n_p	Number of pseudogenes in a chromosome
n_t	Number of total genes in a chromosome
N	Population size
u_a	Mutation rate to a different allele
u_p	Mutation rate to a pseudogene
γ	Duplication/deletion rate
s_a	Coefficient for advantageous selection
s_d	Selection coefficient for copy number selection
f₊	Fixation probability of a chromosome with one more different allele
g₊	Fixation probability of a chromosome with one more gene
g₋	Fixation probability of a chromosome with one less gene
α	Rate of increase of the number of different alleles
p₁(t)	Probability of the population fixed by chromosomes with k = 1
${\tilde{p}}_{1}$	Probability of the population fixed by chromosomes with n_t = 1
R	Ratio of the number of different alleles to the number of pseudogenes

Symbol	Meaning
k	Number of different alleles in a chromosome
m_i	Number of copies with the ith allelic state
n₁	Number of live genes in a chromosome
n_p	Number of pseudogenes in a chromosome
n_t	Number of total genes in a chromosome
N	Population size
u_a	Mutation rate to a different allele
u_p	Mutation rate to a pseudogene
γ	Duplication/deletion rate
s_a	Coefficient for advantageous selection
s_d	Selection coefficient for copy number selection
f₊	Fixation probability of a chromosome with one more different allele
g₊	Fixation probability of a chromosome with one more gene
g₋	Fixation probability of a chromosome with one less gene
α	Rate of increase of the number of different alleles
p₁(t)	Probability of the population fixed by chromosomes with k = 1
${\tilde{p}}_{1}$	Probability of the population fixed by chromosomes with n_t = 1
R	Ratio of the number of different alleles to the number of pseudogenes

Open in new tab

or deletion occurs with a rate γ/2 per gene copy, respectively. If a chromosome has just one gene, no change occurs. Here, we are considering sister-chromatid unequal crossing over as a genetic mechanism for copy number change. If the mechanism for copy number change is duplicative transposition, copy number change can occur even in chromosomes with a single copy.

Finally, N chromosomes of the next generation are sampled from the N chromosomes of the present generation with probabilities proportional to fitness of each chromosome. Let k and n_t be the numbers of different alleles and all genes in a chromosome, respectively. The fitness of the chromosome is zero if k = 0. Otherwise, we consider the following three fitness functions, w(k,n_t).

Model I (Ohta 1987b),
$w (k, n_{t}) = {\begin{matrix} \exp (- s_{d} n_{t}) & (k \geq \bar{k}) \\ \exp [- s_{a} (\bar{k} - k) - s_{d} n_{t}] & (k < \bar{k}) \end{matrix},$
(2)
where $\bar{k}$ is the average number of different alleles per chromosome in the population, and s_a and s_d(s_a, s_d > 0) are positive and negative selection coefficients, respectively.
Model II (Walsh 1995),
$w (k, n_{t}) = \exp [s_{a} k - s_{d} n_{t}] .$
(3)
Model III,
$w (k, n_{t}) = {\begin{matrix} \exp [s_{a} (k - \bar{k}) - s_{d} n_{t}] & (k > \bar{k}) \\ \exp (- s_{d} n_{t}) & (k \leq \bar{k}) \end{matrix} .$
(4)
Models I and II are modifications of those by Ohta (1987b) and Walsh (1995), respectively, with addition of deleterious effects determined by the total number of genes (copy number selection). As will be noted in the next section, models I, II, and III correspond to the recessive, semidominant, and dominant advantageous mutation models, respectively, in regard to the increase of the number of different alleles. We mainly show results for Model I. Differences among the models will be discussed where relevant.

Random sampling with selection completes one generation. This cycle is repeated many times. At generation zero, the population starts with all chromosomes having two live genes (k = 1, n_l = 2, n_p = 0), and all genes are identical as mentioned before.

Homologous equal crossing over does not occur in the present haploid model. Thus, this is a single locus infinite-allele model. For large populations, the diffusion approximation can be applied, and, in this approximation, relevant parameters are Nu_a, Nu_p, Nγ, Ns_a, and Ns_d measuring time in units of N (Crow and Kimura 1970).

Because the evolution of the model seems very complex, we first analyze a Markov chain whose behavior approximates that of our haploid model under a certain condition. The validity of the approximation will be examined by computer simulation later.

MARKOV CHAIN APPROXIMATION

We assume that Nu_a, Nu_p, and Nγ are all very small compared to 1. Also, we assume that positive selection is fairly strong, so that the number of different alleles never decreases. Under these conditions, the population is expected to be mostly monomorphic and thus represented by one chromosome type. A mutant chromosome type that is to replace the currently dominant chromosome type in the population is expected to be fixed quickly. For characterization of a chromosome, let k be the number of different alleles in the chromosome as in the previous section. The chromosome is characterized by k numbers, m₁,⋯,m_k, each representing the number of live genes with the same allelic state and the number of pseudogenes, n_p. Thus, in this approximation, the population state is characterized by a vector (m₁,⋯,m_k,n_p). The number of live genes, n_l, is expressed as $n_{1} = Σ_{i = 1}^{k} m_{i}$ ⁠, and the total number, n_t, is expressed as n_t = n_l + n_p. Occasionally, transitions among states occur. Because we assume Nu_a, Nu_p, Nγ ≪ 1, one of the following types of changes occurs in a chromosome, and the chromosome is fixed in the population with the probabilities specified below:

A live gene in the ith class with m_i > 1 has an advantageous mutation. The rate of this event is u_am_iNf₊, where f₊ is the fixation probability of a mutant chromosome with one more different allele than other chromosomes in the population. Depending on the models of selection, f₊ has a different form and is approximately expressed as,
$f_{+} = {\begin{matrix} \sqrt{(4 s_{a}) ∕ (π N)} & (Model I, Ohta 1987 b) \\ 2 s_{a} ∕ (1 - \exp (- 2 N s_{a})) & (Model II, Walsh 1995) \\ F (1 ∕ N) ∕ F (1) & (Model III) \end{matrix}$
(5)
where $F (p) = \int_{0}^{p} \exp [- N s_{a} x (2 - x)] d x$ (see Crow and Kimura 1970). Note that the fixation probabilities in models I, II, and III correspond to those of recessive, semidominant, and dominant advantageous mutations, respectively. The derivation for Model III can be done in the same way as that in Model I by Ohta (1987b).
A live gene in the ith class with m_i > 1 becomes a pseudogene. Because such change is selectively neutral with m_i > 1, the rate for the ith class genes to become a pseudogene is m_iu_p.
The number of live genes of the ith class increases by gene duplication. The rate for this event is m_iγNg₊/2, where g₊ is the fixation probability of a deleterious mutant with a selection coefficient −s_d and is expressed as (Kimura 1962),
$g_{+} = \frac{\exp (2 s_{d}) - 1}{\exp (2 N s_{d}) - 1} .$
(6)
The number of live genes in the ith allelic class decreases by deletion. The rate for this event in m_iγNg₋/2, where g₋ is expressed as,
$g_{-} = \frac{1 - \exp (- 2 s_{d})}{1 - \exp (- 2 N s_{d})} .$
(7)
The number of pseudogenes increases by gene duplication. The rate of increase is n_pγNg₊/2.
The number of pseudogenes decreases by deletion. The rate of decrease is n_pγNg₋/2.

The above transition probabilities determine a Markov chain. Because we assumed strong selection (SS) and weak mutation, duplication/deletion (WMD), we call the result of this approximation the SSWMD limit. This type of approximation has been widely used to analyze complex genetic systems (Gillespie 1983; Walsh 1985; Li 1987). In this approximation, further reduction of the number of relevant parameters occurs. If we measure time in units of γ⁻¹, u_a/γ and u_p/γ, in addition to Ns_a and Ns_d, determine the behavior of the process. The process is still a multidimensional Markov chain and is difficult to analyze directly. So we take two approaches. One is computer simulation of the Markov chain and another is further approximations for studying some

Figure 1.

The probability, p1(t), that the number of different alleles is 1 at t. All combinations of Nsa = 5, 10, 40, and sd = 0, 0.01 sa are used for selection coefficients. Other parameters are ua = 0.1γ and up = γ.

Open in new tab Download slide

The probability, p₁(t), that the number of different alleles is 1 at t. All combinations of Ns_a = 5, 10, 40, and s_d = 0, 0.01 s_a are used for selection coefficients. Other parameters are u_a = 0.1γ and u_p = γ.

aspects of the process. For each parameter set, 10,000 replications were made in the simulation.

Because k is an increasing function of time with strong positive selection, k is always > 1 once k becomes more than 1. We call the phases with k = 1 and k > 1 the first and second phases, respectively. In the first phase, the total number of genes, n_t, can become 1, and then no change of the gene number occurs thereafter. For this reason, the number of pseudogenes, n_p, affects the evolution of k and n_t. On the other hand, in the second phase, n_l is always larger than 1, and change of the number of genes never stops. The evolution of k and n_l does not depend on n_p. Therefore, these two phases are analyzed separately.

In the first phase, the population starts with k = 1, n_t = 2, and we are interested in when the population enters the second phase (k > 1). Let p₁(t) be the probability that k is 1 at time t. Because evolution in this phase is influenced by n_p, p₁(t) is difficult to compute. We examined p₁(t) by simulation of the Markov chain. Figure 1 shows the time-dependent behavior of p₁(t). Solid and broken lines represent results for cases without (s_d = 0) and with (s_d = 0.01 s_a) copy number selection, respectively, and time is measured in units of 1/γ generations. The mutation rates in this example are 10u_a = u_p = γ. The probability p₁(t) decreases quickly in the first few 1/γ generations, and then the change becomes small. In the later generations, n_t is 1 in most populations with k = 1, and no changes occur in these populations. That is why p₁(t) does not change in the later generations. Positive selection decreases the probability, but very weak copy number selection can increase the probability. The effect of copy number selection is stronger for the cases with stronger positive selection.

Ultimately, the population moves to either the state with n_t = 1 or that with k > 1. The ultimate probability, p₁(∞), was estimated by simulation. Because p₁(t) is a monotone decreasing function of t and bounded from the lower side by the probability ${\tilde{p}}_{1} (t)$ of n_t being 1, simulations are continued until p₁(t) − ${\tilde{p}}_{1} (t)$ < 0.01 so that we can estimate p₁(∞) with a maximum error of 0.01. The results are shown in Table 2. The ratio u_a/γ and effects of selection are important in determining p₁(∞). For small u_a/γ, most populations become ultimately fixed by chromosomes with a single copy. The ratio u_p/u_a has little effect probably because the duplication/deletion force is strong. As u_a/γ becomes larger, p₁(∞) decreases and the effects of pseudogene mutation become larger. The effect of copy number selection is always significant even though s_d/s_a = 0.01 in the examples.

In the second phase, we are interested in the evolution of k, n_l, and n_p. We first investigate the ratio Q = n_l/k, the average number of genes with the same allelic state and a measure of strict gene redundancy. Because the duplication/deletion process does not stop with n_t > 1, the evolution of the number of genes with the ith allelic state, m_i (we call this the ith allelic class) can be decoupled from n_p or other m_j's in this approximation. Here, we first concentrate on the evolution of one allelic class and denote the number of live genes in the class by m.

Consider the following process: At the start, m is one. Transition from m = 1 to m = 2 occurs at the rate a = γNg₊/2 by gene duplication. For m > 1, the process moves to m − 1, m + 1, or 1. The transition to 1 is added because we want to know the average Q over all allelic classes, and for this purpose, the process moves to a new allelic class with probability one-half if a different allele is created from this allelic class. The transition to m − 1 occurs by deletion, advantageous mutation, or pseudogene mutation, and its rate is denoted by mb. Because we stay in the same allelic class in one-half of the cases when an advantageous mutation occurs, b = u_p + N(u_af₊/2 + γg₋/2). The transition to m + 1 occurs by duplication and its rate is ma. Finally, the transition rate to 1 is mc with c = Nu_af₊/2. Let q_m(t) be the probability of the state being m at time t. Then, q_m(t)'s satisfy the following system of differential equations:

\frac{d q_{1}}{d t} = - a q_{1} + 2 b q_{2} + c Σ_{i = 1}^{\infty} i q_{i}

(8)

\begin{matrix} \frac{d q_{m}}{d t} & = - m (a + b + c) q_{m} + (m - 1) a q_{m - 1} \\ + (m + 1) b q_{m + 1} (m \geq 2) . \end{matrix}

(9)

With the generating function

h (z, t) = Σ_{m = 1}^{\infty} q_{m} z^{m}

⁠, the system of differential equations is expressed by

\begin{matrix} \frac{\partial h (z, t)}{\partial t} = & b (1 - \frac{a + b + c}{b} z + \frac{b}{a}) \frac{\partial h}{\partial z} \\ + (b q_{1} + c M_{q}) z - b q_{1}, \end{matrix}

(10)

where

M_{q} = Σ_{m = 1}^{\infty} m q_{m}

is the mean number. Although the time-dependent solution of this equation is difficult

TABLE 2

The ultimate probability, p₁(∞), of the population being fixed by chromosomes with one copy

		Ns_a = 5		Ns_a = 10		Ns_a = 40
u_a/γ	u_p/u_a	s_d = 0	s_d = s_a/100	s_d = 0	s_d = s_a/100	s_d = 0	s_d = s_a/100
0.02	5	0.73	0.77	0.69	0.76	0.59	0.79
	10	0.73	0.78	0.70	0.76	0.60	0.80
0.1	5	0.54	0.58	0.49	0.55	0.38	0.54
	10	0.58	0.62	0.53	0.60	0.42	0.61
0.5	5	0.40	0.44	0.36	0.40	0.25	0.37
	10	0.49	0.53	0.43	0.50	0.33	0.50

		Ns_a = 5		Ns_a = 10		Ns_a = 40
u_a/γ	u_p/u_a	s_d = 0	s_d = s_a/100	s_d = 0	s_d = s_a/100	s_d = 0	s_d = s_a/100
0.02	5	0.73	0.77	0.69	0.76	0.59	0.79
	10	0.73	0.78	0.70	0.76	0.60	0.80
0.1	5	0.54	0.58	0.49	0.55	0.38	0.54
	10	0.58	0.62	0.53	0.60	0.42	0.61
0.5	5	0.40	0.44	0.36	0.40	0.25	0.37
	10	0.49	0.53	0.43	0.50	0.33	0.50

Open in new tab

TABLE 2

The ultimate probability, p₁(∞), of the population being fixed by chromosomes with one copy

		Ns_a = 5		Ns_a = 10		Ns_a = 40
u_a/γ	u_p/u_a	s_d = 0	s_d = s_a/100	s_d = 0	s_d = s_a/100	s_d = 0	s_d = s_a/100
0.02	5	0.73	0.77	0.69	0.76	0.59	0.79
	10	0.73	0.78	0.70	0.76	0.60	0.80
0.1	5	0.54	0.58	0.49	0.55	0.38	0.54
	10	0.58	0.62	0.53	0.60	0.42	0.61
0.5	5	0.40	0.44	0.36	0.40	0.25	0.37
	10	0.49	0.53	0.43	0.50	0.33	0.50

		Ns_a = 5		Ns_a = 10		Ns_a = 40
u_a/γ	u_p/u_a	s_d = 0	s_d = s_a/100	s_d = 0	s_d = s_a/100	s_d = 0	s_d = s_a/100
0.02	5	0.73	0.77	0.69	0.76	0.59	0.79
	10	0.73	0.78	0.70	0.76	0.60	0.80
0.1	5	0.54	0.58	0.49	0.55	0.38	0.54
	10	0.58	0.62	0.53	0.60	0.42	0.61
0.5	5	0.40	0.44	0.36	0.40	0.25	0.37
	10	0.49	0.53	0.43	0.50	0.33	0.50

Open in new tab

to obtain because of q₁ and M_q in the right-hand side, we can compute the equilibrium solution h(z) by noting ∞ and q_m ≤ 1 for all n. The resulting solution is

h (z) = - \frac{q_{1}}{r} \ln (1 - r z),

(11)

where

q_{1} = - \frac{r}{\ln (1 - r)}

(12)

r = \frac{2 a}{a + b + c + \sqrt{{(a + b + c)}^{2} - 4 a b}} .

(13)

The equilibrium distribution of m is easily obtained by expanding the generating function. The mean number M_q is also computed from h(z) and represented as

M_{q} = \frac{b r - a}{c \ln (1 - r)} .

(14)

It is intuitively clear that increasing b or c decreases and increasing a increases M_q, but quantitatively how it changes is fairly complex, as expressed by (14).

Because M_q is the mean number of live genes in one allelic class, its value might be close to the ratio Q = n_l/k, although M_q computed above is the equilibrium value. We can check how well M_q can approximate Q by simulation. Because Q is 1 in populations with n_t = 1, we computed Q conditioned on n_t > 1 from the simulation at t = 10/γ (Table 3). As shown in Table 3, Q estimated from the simulation and M_q computed from (14) agree well. Thus, it seems that the distribution of m is close to equilibrium at t = 10/γ. As Ns_a or u_a becomes larger, the redundancy measured by M_q becomes less because both b and c increase. Increasing u_p increases b and thus decreases M_q. Introduction of the copy number selection also decreases M_q. Only when u_a/γ is very small, is high genetic redundancy observed, but otherwise the genetic redundancy is very small.

Next, we consider the number (k) of different alleles. Because a new allele is created from a redundant copy, its production rate is proportional to the number of redundant genes and u_aNf₊. Thus, in the second phase, k(t) is expected to be approximated by a pure-birth process with a rate of (M_q − q₁)u_aNf₊. Define k*(t) = E[k(t),k(∞) > 1] as the expectation of k over the event that k eventually enters the second phase. The relationship between k*(t) and E[k(t)] is represented by

E [k (t)] = p_{1} (\infty) + k^{*} (t) .

(15)

As noted above, because k(t) is a pure-birth process in the second phase, k*(t) takes the form

k^{*} (t) \propto \exp [α t],

(16)

where α = (M_q − q₁)u_aNf₊. Thus, we can write the expectation of k as

E [k (t)] = p_{1} (\infty) + C \times (1 - p_{1} (\infty)) \exp [α t] .

(17)

To check the accuracy of this expression based on the pure-birth process approximation, the values computed from it and E[k(t)] obtained from the Markov chain simulation were compared (Figure 2). To obtain C, the simulation data were fitted by the curve represented by (17). The simulation data are represented by open symbols, and the fitted curve is shown by solid lines in Figure 2. The agreement seems very good except for the very intial phase. Thus, in the second phase, k(t) evolves approximately as a pure-birth process defined as above. Note that the fitted curves are >1 at t = 0. This is probably because we started with a redundant copy (m₁ = 2). In this curve fit, we regard that a certain amount of new alleles are created at t = 0.

Finally, we consider the number of pseudogenes, n_p. The average rate of production of new pseudogenes per allelic class is again proportional to the number of redundant genes and represented by β₁ = (M_q − q₁)u_p. Once a new pseudogene is created, its number can change by duplication/deletion. The rate of decrease is β₂ = (g₋ − g₊)Nγ/2. Thus, the expected number of pseudogenes may take the following form,

E [n_{p} (t)] ~ \int_{0}^{t} β_{1} k^{*} (s) \exp [- β_{2} (t - s)] d s .

(18)

However, recall that the estimated E[k(t)] by (17) is not 1 at t = 0. Because pseudogenes are produced from redundant genes as new alleles but with a different rate

TABLE 3

Observed and expected redundancy, Q = n_l/k

			Ns_a = 5		Ns_a = 10		Ns_a = 40
S_d/s_a	u_a/γ	u_p/u_a	Obs.^a	Exp.^b	Obs.^a	Exp.^b	Obs.^a	Exp.^b
0.0	0.02	5	2.038	1.953	1.912	1.837	1.639	1.616
		10	1.772	1.713	1.672	1.650	1.531	1.511
	0.1	5	1.282	1.278	1.245	1.245	1.179	1.176
		10	1.185	1.182	1.163	1.167	1.134	1.132
0.01	0.02	5	1.862	1.792	1.634	1.603	1.227	1.226
		10	1.626	1.607	1.500	1.484	1.206	1.200
	0.1	5	1.257	1.254	1.206	1.205	1.097	1.094
		10	1.172	1.168	1.145	1.143	1.075	1.074

			Ns_a = 5		Ns_a = 10		Ns_a = 40
S_d/s_a	u_a/γ	u_p/u_a	Obs.^a	Exp.^b	Obs.^a	Exp.^b	Obs.^a	Exp.^b
0.0	0.02	5	2.038	1.953	1.912	1.837	1.639	1.616
		10	1.772	1.713	1.672	1.650	1.531	1.511
	0.1	5	1.282	1.278	1.245	1.245	1.179	1.176
		10	1.185	1.182	1.163	1.167	1.134	1.132
0.01	0.02	5	1.862	1.792	1.634	1.603	1.227	1.226
		10	1.626	1.607	1.500	1.484	1.206	1.200
	0.1	5	1.257	1.254	1.206	1.205	1.097	1.094
		10	1.172	1.168	1.145	1.143	1.075	1.074

^a

conditioned on n_t > 1 at t = 10/7 are computed from the Markov chain simulation.

^b

M_q's computed from Equation 14 are shown.

Open in new tab

TABLE 3

Observed and expected redundancy, Q = n_l/k

			Ns_a = 5		Ns_a = 10		Ns_a = 40
S_d/s_a	u_a/γ	u_p/u_a	Obs.^a	Exp.^b	Obs.^a	Exp.^b	Obs.^a	Exp.^b
0.0	0.02	5	2.038	1.953	1.912	1.837	1.639	1.616
		10	1.772	1.713	1.672	1.650	1.531	1.511
	0.1	5	1.282	1.278	1.245	1.245	1.179	1.176
		10	1.185	1.182	1.163	1.167	1.134	1.132
0.01	0.02	5	1.862	1.792	1.634	1.603	1.227	1.226
		10	1.626	1.607	1.500	1.484	1.206	1.200
	0.1	5	1.257	1.254	1.206	1.205	1.097	1.094
		10	1.172	1.168	1.145	1.143	1.075	1.074

			Ns_a = 5		Ns_a = 10		Ns_a = 40
S_d/s_a	u_a/γ	u_p/u_a	Obs.^a	Exp.^b	Obs.^a	Exp.^b	Obs.^a	Exp.^b
0.0	0.02	5	2.038	1.953	1.912	1.837	1.639	1.616
		10	1.772	1.713	1.672	1.650	1.531	1.511
	0.1	5	1.282	1.278	1.245	1.245	1.179	1.176
		10	1.185	1.182	1.163	1.167	1.134	1.132
0.01	0.02	5	1.862	1.792	1.634	1.603	1.227	1.226
		10	1.626	1.607	1.500	1.484	1.206	1.200
	0.1	5	1.257	1.254	1.206	1.205	1.097	1.094
		10	1.172	1.168	1.145	1.143	1.075	1.074

^a

conditioned on n_t > 1 at t = 10/7 are computed from the Markov chain simulation.

^b

M_q's computed from Equation 14 are shown.

Open in new tab

β₁ per allelic class, we need to add the following term due to pseudogenes created at t = 0:

\frac{β_{1}}{α} (k^{*} (0) - (1 - p_{1} (\infty))) \exp [- β_{2} t] .

(19)

The term in parentheses is the number of new alleles created at t = 0 in the curve fit mentioned above, and the ratio of the number of pseudogenes to that of new alleles when they are created is β₁/α (Ohta 1987b). Combining (18) and (19), we obtain

\begin{matrix} E [n_{p} (t)] & = β_{1} [\frac{1}{α + β_{2}} + \frac{β_{2}}{α (α + β_{2})} \exp [- (α + β_{2}) t]] k^{*} (t) \\ - \frac{β_{1} (1 - p_{1} (\infty))}{α} \exp [- β_{2} t] . \end{matrix}

(20)

Although the computation of this formula requires p₁(∞), if we want an expectation conditioned on the

Figure 2.

Open in new tab Download slide

Comparison of the number of different alleles obtained by Markov chain simulation and that expected under a linear pure-birth process. Simulation results are represented by ○, Ns_a = 5; □, Ns_a = 10; and ♦, Ns_a = 40. The curve fits are obtained assuming Equation 17. See text for more detail. Other parameters are u_a = 0.1γ and u_p = γ.

event of k > 1, it can be computed from the conditional expectation of k(t). Thus, this formula can be used to examine the relationship between conditional expectations of k and n_p. We again checked the accuracy of this equation with the values obtained by the Markov chain simulation, and the results are shown in Table 4. The agreement is generally very good. Without copy number selection (s_d = 0), β₂ = 0, and if we note k*(t) − (1 − p₁(∞)) = E[k(t)] − 1 [see (15)], Equation 20 is simplified to

E [n_{p} (t)] = \frac{β_{1}}{α} (E [k (t)] - 1),

(21)

which is equivalent to (1) obtained by Ohta (1987b). With the copy number selection, the following expression is obtained for β₂t ≫ 1,

E [n_{p}] \approx \frac{β_{1}}{α + β_{2}} k^{*} .

(22)

Thus, if γ that increases β₂ linearly is large, E[n_p]/k* becomes small. This can be seen from Table 4, where even with weak copy number selection (s_d = s_a/100), the number of pseudogenes is drastically reduced.

SIMULATION

To examine the validity and limitation of the SSWMD limit obtained by the Markov chain approximation and also to investigate parameter ranges outside that assumed in the approximation, we carried out computer simulation closely following the model. We call this type of simulation the Wright-Fisher simulation. In the simulation, N chromosomes are prepared. In each chromosome, k, m₁, …, m_k, n_p and its fitness are stored. Numbers of mutations and duplication/deletion events are determined by generating Poisson random numbers with means computed by multiplying the number of genes and respective rates. Genes undergoing mutations or duplication/deletion are chosen randomly. After these,

TABLE 4

Observed and expected number of pseudogenes

			Ns_a = 5		Ns_a = 10		Ns_a = 40
S_d/s_a	u_a/γ	u_p/u_a	Obs.^a	Exp.^b	Obs.^a	Exp.^b	Obs.^a	Exp.^b
0.0	0.02	5	1.119	1.102	1.237	1.228	1.469	1.490
		10	1.650	1.667	1.840	1.822	2.312	2.308
	0.1	5	3.866	3.885	4.665	4.678	6.775	6.779
		10	4.093	4.127	4.679	4.734	6.797	6.869
0.01	0.02	5	0.707	0.668	0.451	0.449	0.064	0.064
		10	1.011	0.980	0.747	0.712	0.101	0.104
	0.1	5	2.679	2.636	2.176	2.202	0.500	0.519
		10	2.786	2.753	2.228	2.206	0.583	0.578

			Ns_a = 5		Ns_a = 10		Ns_a = 40
S_d/s_a	u_a/γ	u_p/u_a	Obs.^a	Exp.^b	Obs.^a	Exp.^b	Obs.^a	Exp.^b
0.0	0.02	5	1.119	1.102	1.237	1.228	1.469	1.490
		10	1.650	1.667	1.840	1.822	2.312	2.308
	0.1	5	3.866	3.885	4.665	4.678	6.775	6.779
		10	4.093	4.127	4.679	4.734	6.797	6.869
0.01	0.02	5	0.707	0.668	0.451	0.449	0.064	0.064
		10	1.011	0.980	0.747	0.712	0.101	0.104
	0.1	5	2.679	2.636	2.176	2.202	0.500	0.519
		10	2.786	2.753	2.228	2.206	0.583	0.578

Values at t = 10/7 are shown.

^a

E[n_p] computed from the Markov chain simulation.

^b

Values computed from Equation 20 are shown.

Open in new tab

TABLE 4

Observed and expected number of pseudogenes

			Ns_a = 5		Ns_a = 10		Ns_a = 40
S_d/s_a	u_a/γ	u_p/u_a	Obs.^a	Exp.^b	Obs.^a	Exp.^b	Obs.^a	Exp.^b
0.0	0.02	5	1.119	1.102	1.237	1.228	1.469	1.490
		10	1.650	1.667	1.840	1.822	2.312	2.308
	0.1	5	3.866	3.885	4.665	4.678	6.775	6.779
		10	4.093	4.127	4.679	4.734	6.797	6.869
0.01	0.02	5	0.707	0.668	0.451	0.449	0.064	0.064
		10	1.011	0.980	0.747	0.712	0.101	0.104
	0.1	5	2.679	2.636	2.176	2.202	0.500	0.519
		10	2.786	2.753	2.228	2.206	0.583	0.578

			Ns_a = 5		Ns_a = 10		Ns_a = 40
S_d/s_a	u_a/γ	u_p/u_a	Obs.^a	Exp.^b	Obs.^a	Exp.^b	Obs.^a	Exp.^b
0.0	0.02	5	1.119	1.102	1.237	1.228	1.469	1.490
		10	1.650	1.667	1.840	1.822	2.312	2.308
	0.1	5	3.866	3.885	4.665	4.678	6.775	6.779
		10	4.093	4.127	4.679	4.734	6.797	6.869
0.01	0.02	5	0.707	0.668	0.451	0.449	0.064	0.064
		10	1.011	0.980	0.747	0.712	0.101	0.104
	0.1	5	2.679	2.636	2.176	2.202	0.500	0.519
		10	2.786	2.753	2.228	2.206	0.583	0.578

Values at t = 10/7 are shown.

^a

E[n_p] computed from the Markov chain simulation.

^b

Values computed from Equation 20 are shown.

Open in new tab

the fitness of each chromosome is determined according to (2), (3), or (4) and is divided by the maximum fitness of the current population. The next generation is constructed in the following way: First, choose a chromosome randomly. Next, draw a uniform random number in [0,1) and, if the fitness of the chosen chromosome is less than the random number, add it to the population of the next generation. This is repeated until the chromosome number becomes N. For each parameter set, 1600 replications are made. We present the simulation results in comparison with the SSWMD limit.

First, we examine time courses of the average k, n_l, n_p and the proportion ${\tilde{p}}_{1} (t)$ of populations with n_t = 1 when Nγ, Nu_p, and Nu_a are small and agreement between the Markov chain simulation and Wright-Fisher simulation is expected. Some examples are shown in Figure 3. The values computed by the Markov chain simulation are represented by solid lines and those by the Wright-Fisher simulation are represented by dotted lines. Parameters used are γ = u_p = 0.001, u_a = 0.0001, s_a = 0.1, s_d = 0, and N = 100 (Figure 3A) and N = 400 (Figure 3B). With N = 100, the Markov chain approximation is very close to the Wright-Fisher simulation. For this parameter set, Nγ = Nu_p = 0.01 and Nu_a = 0.001. However, with N = 400, the Markov chain approximation underestimates expected values of k, n_p, n_l by 10 to 20%, especially in the later generations, although they seem to increase exponenetially. The largest discrepancy is found in n_p. As the number of genes per chromosome increases, the population becomes polymorphic because product parameters such as n_lNγ become large. This may be the reason for the discrepancies. Estimates of ${\tilde{p}}_{1} (t)$ from the two simulations agree well.

To examine the numbers of genes in a wider parameter range, more simulations were carried out, and some of the results are shown in Table 5 (without copy number selection) and Table 6 (with copy number selection). To clarify the effect of increasing γ, u_a, and u_p, cases with γ = 0.0001 and 0.001 are shown along with the SSWMD limit keeping the u_a/γ and u_p/γ constant. Means and variances of gene numbers at t = 10/γ are shown. With u_a/γ = 0.02 and N = 100, values for γ = 0.001 and 0.0001 are quite close to the SSWMD limit. As population size increases (N = 400) or u_a/γ approaches 0.1, the values with a large γ (0.001) deviate from the SSWMD limit. Especially, the number of pseudogenes becomes large when γ is large, decreasing the ratio of the number of different alleles to those of pseudogenes. With larger N or u_a, populations become polymorphic with regard to gene numbers, thus breaking the assumption of the Markov chain approximation. The increase of pseudogenes is probably due to competition among advantageous alleles and resulting reduction of the probability of fixation of new different alleles. The trend becomes more pronounced when both N and u_a are larger (u_a/γ = 0.1, N = 400). In this parameter set, the numbers of different alleles and live genes are also larger than the SSWMD limit. This is especially true when there is copy number selection. In the Markov chain approximation, fixations of new alleles are assumed to occur after fixations of redundant copies. However, in polymorphic populations, new, different alleles may appear before fixation of redundant copies. This may increase the fixation rate of new genes, especially when there is copy number selection and fixations of redundant copies are difficult to make occur.

The number of different alleles increases as population size increases without copy number selection. However, this is not necessarily true with copy number selection. With u_a/γ = 0.02, the average numbers of different alleles are almost the same for N = 100 and 400. The variance of k is generally very large and in the order of the square of the mean when the mean is large. As u_a increases, keeping u_a/u_p constant, E[k] always increases.

Thus far, we investigated only cases with fairly strong selection (Ns_a = 10, 40), where one of the assumptions of the SSWMD limit is satisfied. We also investigated when selection is weak, and the result is shown in Table 7.

Figure 3.

Open in new tab Download slide

Comparison of the expected numbers of genes obtained by the Markov chain simulation and Wright-Fisher simulation. Two population sizes are used. A, N = 100; B, N = 400. The average numbers of different alleles (k), live genes (n_l), pseudogenes (n_p), and the proportion of populations with n_t = 1 (p₁) are plotted against generation. Values obtained by the Markov chain simulation (M) are represented by closed symbols with solid lines and those by the Wright-Fisher simulation (W) are represented by open symbols with broken lines. Other parameters are γ = u_p = 0.0001, u_a = 0.00001, s_a = 0.1, s_d = 0.

The result of reducing γ to 0.0001 was indistinguishable from that of γ = 0.001 and is not shown. The average number of different alleles is very close to 1, although pseudogenes are created. Thus, for smaller populations, new alleles are difficult to evolve. Copy number selection has very little effect on E[k] and slight effects on E[n_p]. One notable difference from the strong selection case is that E[k] decreases as u_a increases, keeping u_a/u_p constant. The effect of increasing u_p overrides that of increasing u_a if Ns_a is of the order of 1. Also note that the variance of n_p is very large. Some populations are considered to have a very large number of pseudogenes, whereas n_t = 1 and n_p = 0 in other populations.

DISCUSSION

In the present article, evolution of multigene families by gene duplication was analyzed assuming a haploid population and no interchromosomal recombination. In such a population, redundant copies are created first, and then one of them evolves to become a new gene by adaptive mutation. This system differs from those in which selection directly operates on the number of gene copies (Ohta 1983; Stephan 1986) in that acquiring adaptive genes requires the presence of redundant genes (Ohta 1987a). To analyze this system, we introduced the Markov chain approximation for small mutation and duplication/deletion rates under which it is assumed that each population is fixed by one chromosome type and that evolution occurs by occasional jumps among chromosome types (the SSWMD limit). By this approximation, the extent of gene redundancy was calculated, and the evolution of the average number of different alleles was characterized. Effects of polymorphism expected under higher mutation and duplication/deletion rates were analyzed using computer simulation. The most significant departure from the SSWMD limit with the same ratios of the rates is the increase of the number of pseudogenes. Also, polymorphism elevates the number of different alleles compared to that computed by the SSWMD limit, especially when there is negative selection on the copy number. With these caveats in mind, we consider some of the questions concerning the evolution of multigene families based on the SSWMD limit.

First, note that in the SSWMD limit, if we measure time in units of γ⁻¹, the behavior of the process is characterized by the scaled parameters

u_{a}^{*} = u_{a} ∕ γ

and

u_{p}^{*} = u_{p} ∕ γ

⁠, in addition to Ns_a and Ns_d. Thus, if we keep these parameters constant, the birth rate α is proportional to γ. Let α* = α/γ and consider a simple case of no copy number selection (s_d = 0). In this case, α* is expressed as

α^{*} = \frac{- (1 - (2 u_{p}^{*} + 2 d + 1) r)}{\log (1 - r)},

(23)

where

d = N f_{+} u_{a}^{*}, r = \frac{1}{1 + u_{p}^{*} + d + \sqrt{{(d + u_{p}^{*})}^{2} + d}} .

Here, d is proportional to both

u_{a}^{*}

and Nf₊. Two limiting cases are instructive. If d ≪ 1, Equation 23 reduces to

α^{*} \approx \frac{d}{2 u_{p}^{*} (1 + 2 u_{p}^{*}) \log (1 + \frac{1}{2 u_{p}^{*}})} .

(24)

This shows that α* increases linearly as d increases and that the rate is a decreasing function of

u_{p}^{*}

⁠. Thus, if

u_{p}^{*}

is small, α* quickly increases as d increases either by an increase of N or

u_{a}^{*}

⁠. However, the increase is saturated, and for d ≫ 1,

α^{*} \approx \frac{d}{2 (d + u_{p}^{*})} .

(25)

TABLE 5

Numbers of different alleles (k), live genes (n₁), and pseudogenes (n_p)

			k		n₁		n_p
u_a/γ	N	γ^a	Mean	Var.	Mean	Var.	Mean	Var.	${\tilde{p}}_{1}$
0.02	100	0.001	1.611	1.699	2.301	8.034	1.799	17.737	0.6738
		0.0001	1.681	2.057	2.367	8.383	1.827	18.459	0.6681
		M	1.650	2.001	2.306	8.155	1.840	18.312	0.6748
	400	0.001	2.780	8.020	4.509	32.288	3.121	34.573	0.5825
		0.0001	2.736	9.101	3.896	26.051	2.274	20.182	0.6025
		M	2.647	8.608	3.735	23.955	2.312	21.745	0.5960
0.1	100	0.001	2.491	4.833	2.939	8.082	5.753	73.999	0.4894
		0.0001	2.729	7.932	3.095	11.501	4.906	62.177	0.5181
		M	2.689	7.801	3.043	11.339	4.679	54.512	0.5159
	400	0.001	6.236	24.883	7.741	41.118	12.974	195.675	0.3294
		0.0001	6.624	45.713	7.595	62.824	8.187	105.184	0.3894
		M	5.902	43.539	6.639	58.045	6.797	86.013	0.4155

			k		n₁		n_p
u_a/γ	N	γ^a	Mean	Var.	Mean	Var.	Mean	Var.	${\tilde{p}}_{1}$
0.02	100	0.001	1.611	1.699	2.301	8.034	1.799	17.737	0.6738
		0.0001	1.681	2.057	2.367	8.383	1.827	18.459	0.6681
		M	1.650	2.001	2.306	8.155	1.840	18.312	0.6748
	400	0.001	2.780	8.020	4.509	32.288	3.121	34.573	0.5825
		0.0001	2.736	9.101	3.896	26.051	2.274	20.182	0.6025
		M	2.647	8.608	3.735	23.955	2.312	21.745	0.5960
0.1	100	0.001	2.491	4.833	2.939	8.082	5.753	73.999	0.4894
		0.0001	2.729	7.932	3.095	11.501	4.906	62.177	0.5181
		M	2.689	7.801	3.043	11.339	4.679	54.512	0.5159
	400	0.001	6.236	24.883	7.741	41.118	12.974	195.675	0.3294
		0.0001	6.624	45.713	7.595	62.824	8.187	105.184	0.3894
		M	5.902	43.539	6.639	58.045	6.797	86.013	0.4155

Values at t = 10/7 are shown. Other parameters are s_a = 0.1, s_d = 0, and u_p/u_a = 10. Number of replications for each parameter set is 1600.

^a

M denotes the values obtained by the Markov chain simulation.

Open in new tab

TABLE 5

Numbers of different alleles (k), live genes (n₁), and pseudogenes (n_p)

			k		n₁		n_p
u_a/γ	N	γ^a	Mean	Var.	Mean	Var.	Mean	Var.	${\tilde{p}}_{1}$
0.02	100	0.001	1.611	1.699	2.301	8.034	1.799	17.737	0.6738
		0.0001	1.681	2.057	2.367	8.383	1.827	18.459	0.6681
		M	1.650	2.001	2.306	8.155	1.840	18.312	0.6748
	400	0.001	2.780	8.020	4.509	32.288	3.121	34.573	0.5825
		0.0001	2.736	9.101	3.896	26.051	2.274	20.182	0.6025
		M	2.647	8.608	3.735	23.955	2.312	21.745	0.5960
0.1	100	0.001	2.491	4.833	2.939	8.082	5.753	73.999	0.4894
		0.0001	2.729	7.932	3.095	11.501	4.906	62.177	0.5181
		M	2.689	7.801	3.043	11.339	4.679	54.512	0.5159
	400	0.001	6.236	24.883	7.741	41.118	12.974	195.675	0.3294
		0.0001	6.624	45.713	7.595	62.824	8.187	105.184	0.3894
		M	5.902	43.539	6.639	58.045	6.797	86.013	0.4155

			k		n₁		n_p
u_a/γ	N	γ^a	Mean	Var.	Mean	Var.	Mean	Var.	${\tilde{p}}_{1}$
0.02	100	0.001	1.611	1.699	2.301	8.034	1.799	17.737	0.6738
		0.0001	1.681	2.057	2.367	8.383	1.827	18.459	0.6681
		M	1.650	2.001	2.306	8.155	1.840	18.312	0.6748
	400	0.001	2.780	8.020	4.509	32.288	3.121	34.573	0.5825
		0.0001	2.736	9.101	3.896	26.051	2.274	20.182	0.6025
		M	2.647	8.608	3.735	23.955	2.312	21.745	0.5960
0.1	100	0.001	2.491	4.833	2.939	8.082	5.753	73.999	0.4894
		0.0001	2.729	7.932	3.095	11.501	4.906	62.177	0.5181
		M	2.689	7.801	3.043	11.339	4.679	54.512	0.5159
	400	0.001	6.236	24.883	7.741	41.118	12.974	195.675	0.3294
		0.0001	6.624	45.713	7.595	62.824	8.187	105.184	0.3894
		M	5.902	43.539	6.639	58.045	6.797	86.013	0.4155

Values at t = 10/7 are shown. Other parameters are s_a = 0.1, s_d = 0, and u_p/u_a = 10. Number of replications for each parameter set is 1600.

^a

M denotes the values obtained by the Markov chain simulation.

Open in new tab

The increase rate with d = ∞ is γ/2 per generation as expected, but the approach to it is affected by the pseudogene mutation rate. Between these two extreme cases, α* was numerically shown to be an increasing function of d in the range 0.01 ≤ u_p/γ ≤ 10 (data not shown). Thus, for example, the rate of increase of gene number is larger in larger populations with s_d = 0. However, this is not always true when there is copy number selection (s_d ≠ 0), as shown in Figure 4. In these cases, as N becomes larger, α* first increases and then decreases. This is because the effect of copy number selection also becomes larger in larger populations. However, as noted in the previous section, the SSWMD limit underestimates the number of different alleles with copy number selection as the change rates (u_a, u_p, and u_γ) become larger. Thus, the decrease of the rate as N increases is not as pronounced, as Figure 4 shows in more realistic situations. Indeed, when u_a = 10⁻⁵, u_p = 10⁻⁴, γ = 10⁻⁴, s_a = 0.1, and s_d = 0.002, corresponding to set 1 in Figure 4, the average numbers of different alleles at the 10/γth generation were 2.048 ± 0.0103 for N = 200 and 1.896 ± 0.081 for N = 800. In summary, the number of different alleles increases as population size becomes large without copy number selection, but this is not necessarily so with copy number selection.

Next, we examine the speed of gene duplication. We measure it by the doubling time of E[k]. It is expressed by log2/[(M_q − q₁)u_aNf₊] and depends on u_a, u_p, γ, Ns_a, and Ns_d. We do not have reliable estimates for most of the parameters, but some rough calculation under

TABLE 6

Numbers of different alleles (k), live genes (n_l), and pseudogenes (n_p) in copy number selection

			k		n₁		n_p
u_a/γ	N	γ^a	Mean	Var.	Mean	Var.	Mean	Var.	${\tilde{p}}_{1}$
0.02	100	0.001	1.419	1.147	1.792	4.161	0.774	4.689	0.7625
		0.0001	1.407	0.978	1.740	3.332	0.734	4.600	0.7612
		M	1.452	1.180	1.805	3.834	0.747	4.529	0.7464
	400	0.001	1.693	2.306	2.081	5.706	0.294	0.827	0.7556
		0.0001	1.484	1.297	1.648	2.388	0.116	0.233	0.7856
		M	1.436	1.165	1.567	2.055	0.101	0.232	0.8021
0.1	100	0.001	2.278	4.300	2.631	7.033	3.011	25.729	0.5694
		0.0001	2.283	4.808	2.541	6.764	2.172	16.809	0.5894
		M	2.266	5.111	2.508	7.251	2.228	16.977	0.5914
	400	0.001	3.982	13.521	4.675	20.695	2.716	15.939	0.4869
		0.0001	3.114	11.480	3.376	14.438	0.827	2.729	0.5869
		M	2.643	7.618	2.795	9.115	0.583	1.765	0.6091

			k		n₁		n_p
u_a/γ	N	γ^a	Mean	Var.	Mean	Var.	Mean	Var.	${\tilde{p}}_{1}$
0.02	100	0.001	1.419	1.147	1.792	4.161	0.774	4.689	0.7625
		0.0001	1.407	0.978	1.740	3.332	0.734	4.600	0.7612
		M	1.452	1.180	1.805	3.834	0.747	4.529	0.7464
	400	0.001	1.693	2.306	2.081	5.706	0.294	0.827	0.7556
		0.0001	1.484	1.297	1.648	2.388	0.116	0.233	0.7856
		M	1.436	1.165	1.567	2.055	0.101	0.232	0.8021
0.1	100	0.001	2.278	4.300	2.631	7.033	3.011	25.729	0.5694
		0.0001	2.283	4.808	2.541	6.764	2.172	16.809	0.5894
		M	2.266	5.111	2.508	7.251	2.228	16.977	0.5914
	400	0.001	3.982	13.521	4.675	20.695	2.716	15.939	0.4869
		0.0001	3.114	11.480	3.376	14.438	0.827	2.729	0.5869
		M	2.643	7.618	2.795	9.115	0.583	1.765	0.6091

Values at t = 10/γ are shown. Other parameters are s_a = 0.1, s_d = 0.01, and u_p/u_a = 10. Number or replications for each parameter set is 1600.

^a

M denotes the values obtained by the Markov chain simulation.

Open in new tab

TABLE 6

Numbers of different alleles (k), live genes (n_l), and pseudogenes (n_p) in copy number selection

			k		n₁		n_p
u_a/γ	N	γ^a	Mean	Var.	Mean	Var.	Mean	Var.	${\tilde{p}}_{1}$
0.02	100	0.001	1.419	1.147	1.792	4.161	0.774	4.689	0.7625
		0.0001	1.407	0.978	1.740	3.332	0.734	4.600	0.7612
		M	1.452	1.180	1.805	3.834	0.747	4.529	0.7464
	400	0.001	1.693	2.306	2.081	5.706	0.294	0.827	0.7556
		0.0001	1.484	1.297	1.648	2.388	0.116	0.233	0.7856
		M	1.436	1.165	1.567	2.055	0.101	0.232	0.8021
0.1	100	0.001	2.278	4.300	2.631	7.033	3.011	25.729	0.5694
		0.0001	2.283	4.808	2.541	6.764	2.172	16.809	0.5894
		M	2.266	5.111	2.508	7.251	2.228	16.977	0.5914
	400	0.001	3.982	13.521	4.675	20.695	2.716	15.939	0.4869
		0.0001	3.114	11.480	3.376	14.438	0.827	2.729	0.5869
		M	2.643	7.618	2.795	9.115	0.583	1.765	0.6091

			k		n₁		n_p
u_a/γ	N	γ^a	Mean	Var.	Mean	Var.	Mean	Var.	${\tilde{p}}_{1}$
0.02	100	0.001	1.419	1.147	1.792	4.161	0.774	4.689	0.7625
		0.0001	1.407	0.978	1.740	3.332	0.734	4.600	0.7612
		M	1.452	1.180	1.805	3.834	0.747	4.529	0.7464
	400	0.001	1.693	2.306	2.081	5.706	0.294	0.827	0.7556
		0.0001	1.484	1.297	1.648	2.388	0.116	0.233	0.7856
		M	1.436	1.165	1.567	2.055	0.101	0.232	0.8021
0.1	100	0.001	2.278	4.300	2.631	7.033	3.011	25.729	0.5694
		0.0001	2.283	4.808	2.541	6.764	2.172	16.809	0.5894
		M	2.266	5.111	2.508	7.251	2.228	16.977	0.5914
	400	0.001	3.982	13.521	4.675	20.695	2.716	15.939	0.4869
		0.0001	3.114	11.480	3.376	14.438	0.827	2.729	0.5869
		M	2.643	7.618	2.795	9.115	0.583	1.765	0.6091

Values at t = 10/γ are shown. Other parameters are s_a = 0.1, s_d = 0.01, and u_p/u_a = 10. Number or replications for each parameter set is 1600.

^a

M denotes the values obtained by the Markov chain simulation.

Open in new tab

TABLE 7

Numbers of different alleles (k), live genes (n_l), and pseudogenes (n_p) with weak selection

		k		n_l		n_p
s_d/s_a	u_a/γ	Mean	Var.	Mean	Var.	Mean	Var.	${\tilde{p}}_{1}$
0.0	0.02	1.130	0.261	1.477	2.526	1.411	13.711	0.7719
	0.1	1.068	0.087	1.158	0.313	2.967	33.561	0.6656
0.01	0.02	1.145	0.301	1.439	2.105	1.289	14.155	0.7863
	0.1	1.056	0.076	1.126	0.244	2.245	23.901	0.7087

		k		n_l		n_p
s_d/s_a	u_a/γ	Mean	Var.	Mean	Var.	Mean	Var.	${\tilde{p}}_{1}$
0.0	0.02	1.130	0.261	1.477	2.526	1.411	13.711	0.7719
	0.1	1.068	0.087	1.158	0.313	2.967	33.561	0.6656
0.01	0.02	1.145	0.301	1.439	2.105	1.289	14.155	0.7863
	0.1	1.056	0.076	1.126	0.244	2.245	23.901	0.7087

Values at t = 10/γ are shown. Other parameters are N = 25, s_a = 0.1 and u_p/u_a = 10. Number of replications for each parameter set is 1600.

Open in new tab

TABLE 7

Numbers of different alleles (k), live genes (n_l), and pseudogenes (n_p) with weak selection

		k		n_l		n_p
s_d/s_a	u_a/γ	Mean	Var.	Mean	Var.	Mean	Var.	${\tilde{p}}_{1}$
0.0	0.02	1.130	0.261	1.477	2.526	1.411	13.711	0.7719
	0.1	1.068	0.087	1.158	0.313	2.967	33.561	0.6656
0.01	0.02	1.145	0.301	1.439	2.105	1.289	14.155	0.7863
	0.1	1.056	0.076	1.126	0.244	2.245	23.901	0.7087

		k		n_l		n_p
s_d/s_a	u_a/γ	Mean	Var.	Mean	Var.	Mean	Var.	${\tilde{p}}_{1}$
0.0	0.02	1.130	0.261	1.477	2.526	1.411	13.711	0.7719
	0.1	1.068	0.087	1.158	0.313	2.967	33.561	0.6656
0.01	0.02	1.145	0.301	1.439	2.105	1.289	14.155	0.7863
	0.1	1.056	0.076	1.126	0.244	2.245	23.901	0.7087

Values at t = 10/γ are shown. Other parameters are N = 25, s_a = 0.1 and u_p/u_a = 10. Number of replications for each parameter set is 1600.

Open in new tab

this model is possible. For u_p, we may use the null mutation rate measured in Drosophila by Mukai and Cockerham (1977). Averaging over four loci encoding various enzymes, they obtained 1.03 × 10⁻⁵ for the null mutation rate per loci. Estimates of unequal crossing-over rates vary widely. For example, in Drosophila, the duplication rates from a single gene at the ma and ry loci are estimated to be 2.7 × 10⁻⁶ and 1.7 × 10⁻⁴, respectively (Shapira and Finnerty 1986). In maize, the rate per copy is estimated to be in the range 10⁻⁴ − 10⁻³ (Sudpak et al. 1993; Eggelston et al. 1995). Here, we tentatively assume γ = 10⁻⁴, taking an intermediate value. Currently, we do not have any estimates for the advantageous mutation rate u_a or selection parameters s_a, s_d, but to get a rough idea, let us assume u_a = 10⁻⁶, employing the bandmorph mutation rate 1.03 × 10⁻⁶, estimated by Mukai and Cockerham (1977) in Drosophila. With Ns_a = 40 and no copy number selection, the doubling time is 8 × 10⁴ generations. Because d* ≪ 1, the doubling time is shown to be roughly proportional to (u_aNf₊γ)⁻¹ from (24), and reducing u_a to one-tenth makes the time about 10 times greater. At any rate, gene

Figure 4.

Open in new tab Download slide

Examples of the rate, α, of increase of the number of different alleles with copy number selection as a function of population size. α/γ is computed from the SSWMD limit as a function of population size. Two examples are shown. Set 1, u_a = 0.1γ, u_p = γ; Set 2, u_a = 0.02γ, u_p = 0.2γ. Other parameters are s_a = 0.1, s_d = 0.002.

duplications can occur very quickly under the present model from the view point of the evolutionary time scale. However, according to Fryxell (1996), the doubling time is of the order of 10 to 100 million years in a particular lineage of a gene family. Clegg et al. (1997) surveyed three multigene families in plants and estimated new gene recruitment rates to be 2.9 × 10⁻⁸ for Adh, 2.8 × 10⁻⁷ for Chs, and 2.7 × 10⁻⁷ for rbcS. Because explosive gene multiplications are not observed in the evolution of multigene families, Fryxell (1996) proposed that simultaneous coevolution of two or more gene families might be necessary for gene multiplication in respective families, which will diminish duplication rates significantly. However, there are other explanations. For example, Ns_a might not be constant. If Ns_a becomes close to 1 or less, the number of different alleles stays near 1 as shown by Ohta (1987a,b) and by the present study. Another explanation is copy number selection. If we introduce weak copy number selection (s_d = 0.01s_a), the doubling times in the above examples become 2 × 10⁶ (u_a = 10⁻⁷) and 2 × 10⁵ (u_a = 10⁻⁶) generations, making the times much longer. Thus, copy number selection might have contributed to retarding the duplication rates. Copy number selection might also explain the general observation of larger multiplicity of genes in plants and mammals compared to those in, for example, Drosophila and lower eukaryotes. The numbers of rDNA genes in plants are very large compared to those in Drosophila and seem to contain many pseudogenes (Buckler et al. 1997). The number of pseudogenes is expected to be very small under strong copy number selection (see Tables 4 and 6).

We usually observe large variation in gene numbers in one multigene family among related species. For example, in Drosophila amylase genes, the numbers of genes are two (Payant et al. 1988; Shibata and Yamazaki 1995), one to three (Brown et al. 1990), and at least four (Da Lage et al. 1992) in D. melanogaster, D. pseudoobscura, and D. ananassae, respectively. Such variation can be explained by difference in parameters such as s_a and N. However, even with the same parameter set, Clark (1994) showed that multiple equilibrium points concerning gene configuration are possible in two gene systems. Moreover, the variance of gene number is large, an emphasized by Ohta (1987b). As shown here, this is because the increase of gene number behaves like a pure linear-birth process. The coefficient of variation of such a process is asymptotically expressed as $1 ∕ \sqrt{n_{0}}$ ⁠, where n₀ is the initial number (see p. 159 of Cox and Miller 1965). In our case, a further increase of the variance occurs because the gene number remains 1 in a significant proportion of lineages. Thus, we should expect a very large variance of gene numbers in a multigene family.

In the present article, as a fitness function, Ohta's model (Model I; Ohta 1987b) was mostly used. In the Markov chain approximation, the difference in fitness function contributes only through difference in the fixation probability of a chromosome with one more different allele than other chromosomes in the population. As noted, models I, II, and III correspond to recessive, semidominant, and dominant advantageous mutation models in this regard. Thus, the increase in the number of different alleles is more difficult in this order because a mutant chromosome appears singly at first. However, if we consider a decrease of the number of different alleles by chance, models I, II, and III correspond to dominant, semidominant, and recessive deleterious mutant models. Therefore, although it is most difficult to increase the number of different alleles in Model I, it is also most difficult to decrease it. Dependency on parameters also differes among the models. The fixation rate d = u_aNf₊/γ of different alleles is proportional to $\sqrt{N s_{a}}$ ⁠, in Model I and Ns_a in Model II (Walsh's model; Walsh 1995). Thus, with the same Ns_a > 1, the number of different alleles becomes larger under Model II. For d ≪ 1, α* is proportional to d as shown in (24). Indeed, for Nu_a = 0.04, Nu_p = Nγ = 0.4, Ns_a = 40, and s_d = 0, the Wright-Fisher-type simulation shows that the average number of different alleles at t = 25N was 150 in Model II, whereas it was 6 in Model I. Thus, the selection in Model I is the least efficient in increasing gene number.

We assumed that no duplication/deletion occurs if the total number, n_t, of genes becomes 1 because the duplication rate is much smaller if there is only a single copy. We divided the process into the first and second phases, depending on whether k is 1 or more than 1. In the second phase, the evolution is the same as that in Ohta's model (Ohta 1987b). The analysis of the first phase shows that even if Ns_a is fairly large, nearly half or more of populations starting from chromosomes with two copies ends up in single gene chromosomes unless u_a/γ is not small (Table 2). Thus, we expect to find lineages with only a single gene even if gene number is large in other lineages.

From our approximate analysis, the gene duplication processes in a haploid model without interchromosomal crossing over were characterized, and dependencies of the speed of gene duplication, genetic redundancy, and the number of pseudogenes on various parameters are clarified for strong selection cases. With the accumulation of data on gene families, various discussions on the mechanisms of the evolution of multigene families have been made (e.g., Fryxell 1996; Clegg et al. 1997). Our results will provide a quantitative framework for such discussions by incorporating gene duplication and subsequent divergence into the model. However, higher organisms are mostly diploid, and interchromosomal crossing over does occur. The diploidy and interchromosomal crossing over not included here are known to accelerate the increase of gene number (Ohta 1988a). Incorporating them into the model and analyzing the model in more detail would be necessary for a better understanding of the evolution of multigene families.

Acknowledgement

We thank T. Ohta and M. Iizuka, A. Clark, and an anonymous reviewer for their helpful comments on earlier drafts of the manuscript. This research was partially supported by grants-in-aid to H.T. from the Ministry of Education, Science and Culture of Japan.

Footnotes

Communicating editor: A. G. Clark

LITERATURE CITED

Brown

C J

,

Aquadro

C F

,

Anderson

W W

,

1990

DNA sequence evolution of the amylase multigene family in Drosophila pseudoobscura

.

Genetics

126

:

131

–

138

.

Buckler

E S

IV,

Ippolito

A

,

Holtsford

T P

,

1997

The evolution ribosomal DNA: divergent paralogues and phylogenetic implications

.

Genetics

145

:

821

–

832

.

Carroll

S B

,

1995

Homeotic genes and the evolution of arthropods and chordates

.

Nature

376

:

479

–

485

.

Clark

A G

,

1994

Invasion and maintenance of a gene duplication

.

Proc. Natl. Acad. Sci. USA

91

:

2950

–

02954

.

Google Scholar

Crossref

WorldCat

Clegg

M T

,

Cummings

M P

,

Durmbin

M L

,

1997

The evolution of plant nuclear genes

.

Proc. Natl. Acad. Sci. USA

94

:

7791

–

7798

.

Google Scholar

Crossref

WorldCat

Cox

D R

,

Miller

H D

,

1965

The Theory of Stochastic Processes

.

Chapman and Hall

,

London

.

Google Scholar

Google Preview

OpenURL Placeholder Text

WorldCat

Crow

J F

,

Kimura

M

,

1970

An Introduction to Population Genetic Theory

.

Harper & Row

,

New York

.

Google Scholar

Google Preview

OpenURL Placeholder Text

WorldCat

Da Lage

J L

,

Lemeunier

M L

,

Cariou

M L

,

David

J R

,

1992

Multiple amylase genes in Drosophila ananassae and related species

.

Genet. Res.

59

:

85

–

92

.

Eggleston

W B

,

Alleman

M

,

Kermicle

J L

,

1995

Molecular organization and germinal instability of R-stippled maize

.

Genetics

141

:

347

–

360

.

Fryxell

K J

,

1996

The coevolution of gene family trees

.

Trends Genet.

12

:

364

–

369

.

Gillespie

J H

,

1983

Some properties of finite populations experiencing strong selection and weak mutation

.

Am. Nat.

121

:

691

–

708

.

Google Scholar

Crossref

WorldCat

Hollick

J B

,

Dorweiler

J E

,

Chandler

V L

,

1997

Paramutation and related allelic interactions

.

Trends Genet.

13

:

302

–

308

.

Iwabe

N

,

Kuma

K

,

Miyata

T

,

1996

Evolution of gene families and relationship with organismal evolution: rapid divergence of tissue-specific genes in the early evolution of chordates

.

Mol. Biol. Evol.

13

:

483

–

493

.

Kimura

K

,

1962

On the probability of fixation of mutant genes in a population

.

Genetics

47

:

713

–

719

.

Kimura

M

,

Crow

J F

,

1964

The number of alleles that can be maintained in a finite population

.

Genetics

49

:

725

–

738

.

Lange

B W

,

Langley

C H

,

Stephan

W

,

1990

Molecular evolution of Drosophila metallothionein genes

.

Genetics

126

:

921

–

932

.

Li

W H

,

1987

Models of nearly neutral mutations with particular implications for nonrandom usage of synonymous codons

.

J. Mol. Evol.

24

:

337

–

345

.

Lyckegaard

E M S

,

Clark

A G

,

1989

Ribosomal DNA and stellate gene copy number variation on the Y chromosome of Drosophila melanogaster

.

Proc. Natl. Acad. Sci. USA

86

:

1944

–

1984

.

Google Scholar

Crossref

WorldCat

Mukai

T

,

Cockerham

C C

,

1977

Spontaneous mutation rates at enzyme loci in Drosophila melanogaster

.

Proc. Natl. Acad. Sci. USA

74

:

2514

–

2517

.

Google Scholar

Crossref

WorldCat

Neitz

M

,

Neitz

J

,

1995

Numbers and ratios of visual pigment genes for normal red-green color vision

.

Science

267

:

1013

–

1016

.

Ohno

S

,

1970

Evolution by Gene Duplication

.

Springer-Verlag

,

New York

.

Ohta

T

,

1983

Theoretical study on the accumulation of selfish DNA

.

Genet. Res.

41

:

1

–

15

.

Ohta

T

,

1987a

A model of evolution for accumulating genetic information

.

J. Theor. Biol.

124

:

199

–

211

.

Google Scholar

Crossref

WorldCat

Ohta

T

,

1987b

Simulating evolution by gene duplication

.

Genetics

115

:

207

–

213

.

Google Scholar

Crossref

WorldCat

Ohta

T

,

1988a

Further simulation studies on evolution by gene duplication

.

Evolution

42

:

375

–

386

.

Google Scholar

Crossref

WorldCat

Ohta

T

,

1988b

Multigene and supergene families

.

Oxf. Surv. Evol. Biol.

5

:

41

–

65

.

Google Scholar

OpenURL Placeholder Text

WorldCat

Ohta

T

,

1991

Multigene families and the evolution of complexity

.

J. Mol. Evol.

33

:

34

–

41

.

Payant

V

,

Abukashawa

S

,

Sasseville

M

,

Benkel

B F

,

Hickey

D A

et al. ,

1988

Evolutionary conservation of the chromosomal configuration and regulation of amylase genes among eight species of the Drosophila melanogaster species subgroup

.

Mol. Biol. Evol.

5

:

560

–

567

.

Google Scholar

PubMed

OpenURL Placeholder Text

WorldCat

Petrov

D A

,

Lozovskaya

E R

,

Hartl

D L

,

1996

High intrinsic rate of DNA loss in Drosophila

.

Nature

384

:

346

–

349

.

Shapira

S K

,

Finnerty

V G

,

1986

The use of genetic complementation in the study of eukaryotic macromolecular evolution: rate of spontaneous gene duplication at two loci of Drosophila melanogaster

.

J. Mol. Evol.

23

:

159

–

167

.

Shibata

H

,

Yamazaki

T

,

1995

Molecular evolution of the duplicated Amy locus in Drosophila melanogaster species subgroup: concerted evolution only in coding region and excess of non-synonymous substitutions in speciation

.

Genetics

141

:

223

–

236

.

Stephan

W

,

1986

Recombination and the evolution of satellite DNA

.

Genet. Res.

47

:

167

–

174

.

Sudupak

M A

,

Bennetzen

J L

,

Hullbert

S H

,

1993

Unequal exchange and meiotic instability of disease-resistance genes in the Rp 1 region of maize

.

Genetics

133

:

119

–

125

.

Takano

T S

,

Kusakabe

S

,

Koga

A

,

Mukai

T

,

1989

Polymorphism for the number of tandemly multiplicated glycerol-3-phosphate dehydrogenase genes in Drosophila melanogaster

.

Proc. Natl. Acad. Sci. USA

86

:

5000

–

5004

.

Google Scholar

Crossref

WorldCat

Theissen

G

,

Kim

J T

,

Saedler

H

,

1996

Classification and phylogeny of the MADS-box multigene family suggest defined roles of MADS-box gene subfamilies in the morphological evolution of eukaryotes

.

J. Mol. Evol.

43

:

484

–

516

.

Walsh

J B

,

1985

Interaction of selection and biased gene conversion in a multigene family

.

Proc. Natl. Acad. Sci. USA

82

:

153

–

157

.

Google Scholar

Crossref

WorldCat

Walsh

J B

,

1995

How often do duplicated genes evolve new functions?

Genetics

139

:

421

–

428

.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)

Download all slides

Month:	Total Views:
February 2021	1
March 2021	1
April 2021	13
May 2021	10
June 2021	3
July 2021	1
August 2021	1
September 2021	3
October 2021	7
November 2021	4
January 2022	9
February 2022	4
March 2022	1
April 2022	8
May 2022	8
June 2022	5
July 2022	5
August 2022	11
September 2022	9
October 2022	3
November 2022	5
December 2022	12
January 2023	4
February 2023	8
March 2023	16
April 2023	21
May 2023	12
June 2023	9
July 2023	4
August 2023	8
September 2023	3
October 2023	4
November 2023	5
December 2023	19
January 2024	9
February 2024	9
March 2024	11
April 2024	19

Article Contents

Evolution of Multigene Families by Gene Duplication: A Haploid Model

Abstract

MODEL

MARKOV CHAIN APPROXIMATION

SIMULATION

DISCUSSION

Acknowledgement

Footnotes

LITERATURE CITED

Citations

Views

Altmetric

Email alerts

Citing articles via

Latest

Most Read

Most Cited

Article Contents

Evolution of Multigene Families by Gene Duplication: A Haploid Model

Abstract

MODEL

MARKOV CHAIN APPROXIMATION

SIMULATION

DISCUSSION

Acknowledgement

Footnotes

LITERATURE CITED

Citations

Views

Altmetric

Email alerts

Citing articles via

Latest

Most Read

Most Cited

This Feature Is Available To Subscribers Only