A Method for Estimating the Mutation, Gene Conversion and Recombination Parameters in Small Multigene Families
Hideki Innan, Department of Biological Science, University of Southern California, Los Angeles, California 90089-1340
Corresponding author: Hideki Innan, University of Southern California, 835 W. 37th St., SHS 172, Los Angeles, CA 90089-1340. E-mail: hi_innan{at}hotmail.com
Communicating editor: F. TAJIMA
| ABSTRACT |
|---|
A simple two-locus gene conversion model is considered to investigate the amounts of DNA variation and linkage disequilibrium in small multigene families. The exact solutions for the expectations and variances of the amounts of variation within and between two loci are obtained. It is shown that gene conversion increases the amount of variation within each locus and decreases the amount of variation between two loci. The expectation and variance of the amount of linkage disequilibrium are also obtained. Gene conversion generates positive linkage disequilibrium, and the degree of linkage disequilibrium decreases as the recombination rate increases. Using the theoretical results, a method for estimating the mutation, gene conversion, and recombination parameters is developed and applied to the data of the Amy multigene family in Drosophila melanogaster. The gene conversion rate is estimated to be approximately 60-165 times higher than the mutation rate for synonymous sites.
As mechanisms to homogenize DNA sequence variation in multigene families, gene conversion and unequal crossing over have been considered. By computer simulations, …
Multigene families whose copy number is two are called small multigene families (…).
Under this simple model, the exact solutions for the equilibrium expectations and variances of the amounts of variation within and between two loci were obtained by a diffusion method. The amount of linkage disequilibrium was also investigated analytically. Using the theoretical results, a method for estimating the mutation, gene conversion, and recombination parameters was developed. The method was applied to estimate these three parameters in the Amy multigene family of D. melanogaster. The gene conversion rate was estimated to be approximately 60-165 times higher than the mutation rate for synonymous sites.
| THEORY |
|---|
Consider two linked loci, I and II, in a random-mating population of N diploids. We consider two neutral alleles, A and a, so that there are four haplotypes, A-A, A-a, a-A, and a-a (the first letter represents the allele at locus I and the second the allele at locus II). The mutation rate between the two alleles is assumed to be µ per locus per generation, and the recombination rate between the two loci is assumed to be r per generation. Intrachromosomal gene conversion occurs at rate c per locus per generation; e.g., A-a changes into A-A with probability c and into a-a with the same probability. Interchromosomal gene conversion is not considered. Let the frequencies of A-A, A-a, a-A, and a-a be $x_1$, $x_2$, $x_3$, and $x_4$, respectively. Given $x_1$, $x_2$, $x_3$, and $x_4$, their expectations in the next generation are given by
$$E(x_1') = x_1 + \mu(x_2 + x_3 - 2x_1) + c(x_2 + x_3) - rD, \tag{1a}$$
$$E(x_2') = x_2 + \mu(x_1 + x_4 - 2x_2) - 2cx_2 + rD, \tag{1b}$$
$$E(x_3') = x_3 + \mu(x_1 + x_4 - 2x_3) - 2cx_3 + rD, \tag{1c}$$
and
$$E(x_4') = x_4 + \mu(x_2 + x_3 - 2x_4) + c(x_2 + x_3) - rD, \tag{1d}$$
where $D = x_1x_4 - x_2x_3$.
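The expected one-generation change in haplotype frequencies can be sketched in code. This is a minimal sketch, assuming that mutation, gene conversion, and recombination contribute additively (first-order terms only), as described in the model; the names `Haplotypes`, `expected_next`, and `to_pqd` are hypothetical helpers, not part of the original article. Starting from linkage equilibrium, gene conversion alone leaves p and q unchanged but creates positive D.

```python
from dataclasses import dataclass


@dataclass
class Haplotypes:
    """Frequencies x1..x4 of haplotypes A-A, A-a, a-A, a-a."""
    x1: float
    x2: float
    x3: float
    x4: float


def expected_next(h: Haplotypes, mu: float, c: float, r: float) -> Haplotypes:
    """Expected haplotype frequencies after one generation (first-order sketch)."""
    d = h.x1 * h.x4 - h.x2 * h.x3      # linkage disequilibrium D
    mixed = h.x2 + h.x3                  # haplotypes carrying both alleles
    # Mutation: each haplotype loses 2*mu (either locus mutates) and gains mu
    # from each one-step neighbour. Gene conversion: A-a and a-A become A-A
    # or a-a, each with probability c. Recombination: r*D moves from the
    # coupling (A-A, a-a) to the repulsion (A-a, a-A) haplotypes.
    x1 = h.x1 + mu * (mixed - 2 * h.x1) + c * mixed - r * d
    x2 = h.x2 + mu * (h.x1 + h.x4 - 2 * h.x2) - 2 * c * h.x2 + r * d
    x3 = h.x3 + mu * (h.x1 + h.x4 - 2 * h.x3) - 2 * c * h.x3 + r * d
    x4 = h.x4 + mu * (mixed - 2 * h.x4) + c * mixed - r * d
    return Haplotypes(x1, x2, x3, x4)


def to_pqd(h: Haplotypes) -> tuple[float, float, float]:
    """Allele-A frequencies at loci I and II, and D = x1*x4 - x2*x3."""
    return h.x1 + h.x2, h.x1 + h.x3, h.x1 * h.x4 - h.x2 * h.x3


if __name__ == "__main__":
    # Start at linkage equilibrium; conversion alone creates positive D.
    h = expected_next(Haplotypes(0.25, 0.25, 0.25, 0.25), mu=0.0, c=0.01, r=0.0)
    print(to_pqd(h))  # p and q stay at about 0.5, D is about 0.005
```

With mutation and conversion switched off, the same function reduces to the familiar decay D' = (1 - r)D.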
Under this model, we calculate the expectations of moments of allele frequencies using a diffusion method, which was introduced to population genetics by …. For a function g of the haplotype frequencies,
$$\frac{dE(g)}{dt} = E(Lg), \tag{2}$$
where L is the differential operator of the Kolmogorov backward equation (…):
(3)
We can transform the three variables $x_1$, $x_2$, and $x_3$ in Equation 3 into p, q, and D:
$$p = x_1 + x_2, \qquad q = x_1 + x_3, \qquad D = x_1x_4 - x_2x_3, \tag{4}$$
where p and q are the frequencies of A at loci I and II, respectively. Then, (3) becomes
(5)
and
(6)
where θ = 4Nµ, C = 4Nc, and R = 4Nr. Without gene conversion (C = 0), this equation is the same as Equation 12 in ….
First, letting g = p and g = q in (2) and (6), we have at equilibrium
$$E(p) = E(q) = \frac{1}{2} \tag{7}$$
when θ > 0 and C ≥ 0.
Next, letting g = $p^2$, $q^2$, $pq$, and $D$, we obtain the following four equations:
(8)
(9)
(10)
and
(11)
From (8)-(11), we have
(12)
(13)
(14)
where
(15a)
(15b)
(15c)
and
(15d)
Therefore, the expectations of the amounts of variation (heterozygosity) within loci I and II are given by
$$E(h_{wI}) = 2[E(p) - E(p^2)] = 1 - 2E(p^2) \tag{16a}$$
and
$$E(h_{wII}) = 2[E(q) - E(q^2)] = 1 - 2E(p^2), \tag{16b}$$
respectively, where $E(p^2)$ is given by (12). The expected amount of variation within a locus is the average of hwI and hwII:
$$E(h_w) = \frac{E(h_{wI}) + E(h_{wII})}{2} = 1 - 2E(p^2). \tag{16c}$$
Define the amount of variation between the two loci, hb, as the probability that two alleles, one randomly chosen from each locus, are different. The expectation of hb becomes
$$E(h_b) = E(p + q - 2pq) = 1 - 2E(pq), \tag{17}$$
where $E(pq)$ is given by (13).
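In a similar way, the variances of hwI, hwII, and hb are written as
$$\mathrm{Var}(h_{wI}) = \mathrm{Var}(h_{wII}) = 4[E(p^2) - 2E(p^3) + E(p^4)] - [E(h_{wI})]^2 \tag{18}$$
and
$$\mathrm{Var}(h_b) = 2E(p^2) + 2E(pq) - 8E(p^2q) + 4E(p^2q^2) - [E(h_b)]^2. \tag{19}$$
The derivations for $E(p^4)$, $E(p^3)$, $E(p^2q^2)$, and $E(p^2q)$ are shown in the APPENDIX. We can also obtain the covariance between hwI and hwII and the variance of D. That is,
$$\mathrm{Cov}(h_{wI}, h_{wII}) = 4[E(pq) - 2E(p^2q) + E(p^2q^2)] - [E(h_{wI})]^2 \tag{20}$$
and
$$\mathrm{Var}(D) = E(D^2) - [E(D)]^2, \tag{21}$$
where the derivation for $E(D^2)$ is also given in the APPENDIX. From (18) and (20), the variance of hw is given by
$$\mathrm{Var}(h_w) = \frac{\mathrm{Var}(h_{wI}) + \mathrm{Cov}(h_{wI}, h_{wII})}{2}. \tag{22}$$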
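The variance and covariance expressions follow from the definitions hwI = 2p(1 - p), hwII = 2q(1 - q), and hb = p + q - 2pq, together with the symmetry between the two loci, E(p^k) = E(q^k) and E(p^2 q) = E(p q^2). As a sketch (the function names and dictionary keys are hypothetical), the variances can be assembled from raw moments and checked against a direct computation on any small distribution of (p, q) that is symmetric under swapping the loci.

```python
def variances_from_moments(m: dict[str, float]) -> dict[str, float]:
    """Var(hwI), Cov(hwI, hwII), Var(hw), Var(hb) from raw moments.

    Keys (hypothetical): 'p', 'p2', 'p3', 'p4', 'pq', 'p2q', 'p2q2'.
    Assumes symmetry between loci: E(q^k) = E(p^k), E(pq^2) = E(p^2q).
    """
    e_hwi = 2 * (m["p"] - m["p2"])                        # E[2p(1 - p)]
    var_hwi = 4 * (m["p2"] - 2 * m["p3"] + m["p4"]) - e_hwi ** 2
    cov_hw = 4 * (m["pq"] - 2 * m["p2q"] + m["p2q2"]) - e_hwi ** 2
    var_hw = (var_hwi + cov_hw) / 2                       # hw = (hwI + hwII) / 2
    e_hb = 2 * m["p"] - 2 * m["pq"]                       # E[p + q - 2pq]
    e_hb2 = 2 * m["p2"] + 2 * m["pq"] - 8 * m["p2q"] + 4 * m["p2q2"]
    return {"var_hwI": var_hwi, "cov_hwI_hwII": cov_hw,
            "var_hw": var_hw, "var_hb": e_hb2 - e_hb ** 2}


def moments(dist: list[tuple[float, float, float]]) -> dict[str, float]:
    """Raw moments of a discrete distribution given as (p, q, probability)."""
    def e(f):
        return sum(w * f(p, q) for p, q, w in dist)
    return {"p": e(lambda p, q: p), "p2": e(lambda p, q: p * p),
            "p3": e(lambda p, q: p ** 3), "p4": e(lambda p, q: p ** 4),
            "pq": e(lambda p, q: p * q), "p2q": e(lambda p, q: p * p * q),
            "p2q2": e(lambda p, q: (p * q) ** 2)}


if __name__ == "__main__":
    # A toy distribution of (p, q) that is symmetric under swapping loci.
    dist = [(0.1, 0.1, 0.25), (0.9, 0.9, 0.25), (0.3, 0.6, 0.25), (0.6, 0.3, 0.25)]
    print(variances_from_moments(moments(dist)))
```

In practice the moments would come from (12), (13), and the APPENDIX rather than from a toy distribution.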
Numerical examples for E(hw), E(hb), and E(D) are shown in Figure 1. Figure 1A shows the results for E(hw) given θ = 0.01. Gene conversion increases the amount of variation within a locus. Note that E(hw) = 0.0098 without gene conversion. When the gene conversion rate is relatively small (C = 0.1), E(hw) is about 1.75-fold larger than that without gene conversion, while there is almost no effect of gene conversion on E(hw) when C = 100. Recombination also increases E(hw), but the effect is relatively small. Figure 1B shows the results for E(hb). Gene conversion decreases the amount of variation between the two loci. The amount of variation between the two loci is much larger than that within each locus unless C is very large; when C = 100, E(hw) and E(hb) are almost the same. Figure 1C shows that gene conversion generates positive linkage disequilibrium: without gene conversion, E(D) = 0, and E(D) increases with C and decreases as R increases. These results are consistent with other studies (e.g., …).
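The value E(hw) = 0.0098 without gene conversion can be checked independently. When C = 0, each locus behaves as an ordinary two-allele locus with symmetric mutation, and, assuming θ = 4Nµ (the scaling consistent with this number), the stationary distribution of p under the diffusion approximation is Beta(θ, θ), which gives E(hw) = θ/(1 + 2θ). A minimal sketch; the function names are hypothetical.

```python
def beta_raw_moment(a: float, b: float, k: int) -> float:
    """E[p^k] for p ~ Beta(a, b), via the product formula."""
    m = 1.0
    for i in range(k):
        m *= (a + i) / (a + b + i)
    return m


def expected_hw_without_conversion(theta: float) -> float:
    """E[2p(1 - p)] when p ~ Beta(theta, theta); assumes theta = 4N*mu and C = 0."""
    return 2 * (beta_raw_moment(theta, theta, 1) - beta_raw_moment(theta, theta, 2))


if __name__ == "__main__":
    theta = 0.01
    print(round(expected_hw_without_conversion(theta), 4))  # 0.0098
    print(round(theta / (1 + 2 * theta), 4))                 # 0.0098
```

With gene conversion (C > 0) the two loci are no longer independent, so this shortcut no longer applies and the moment equations above are needed.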
| DATA ANALYSIS AND ESTIMATION OF PARAMETERS |
|---|
Since the expectations and variances of the amounts of variation within and between the two loci and of linkage disequilibrium are functions of θ, C, and R, it may be possible to estimate these parameters from DNA polymorphism data. An estimation method is explained using the data of the Amy region in D. melanogaster as an example (see Figure 1 in …).
First, we estimate the amounts of variation and linkage disequilibrium for a particular site. Consider the 567th site of the Amy genes where two nucleotides, T and C, are segregating, so that there are four possible haplotypes, T-T, T-C, C-T, and C-C (the first letter represents the nucleotide in the proximal gene and the second one represents that in the distal gene). Denote the number of these haplotypes by n1, n2, n3, and n4. Estimates of heterozygosity within the proximal and distal genes are given by
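$$h_{wp} = \frac{2(n_1 + n_2)(n_3 + n_4)}{n(n-1)}, \qquad h_{wd} = \frac{2(n_1 + n_3)(n_2 + n_4)}{n(n-1)}, \tag{23a}$$
where $n = n_1 + n_2 + n_3 + n_4$. Then, the average of hwp and hwd is
$$h_w = \frac{h_{wp} + h_{wd}}{2}. \tag{23b}$$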
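The amount of variation between the two genes is estimated by
$$h_b = \frac{(n_1 + n_2)(n_2 + n_4) + (n_3 + n_4)(n_1 + n_3) - (n_2 + n_3)}{n(n-1)}, \tag{24}$$
which counts, over pairs of distinct chromosomes, how often the proximal nucleotide of one differs from the distal nucleotide of the other. Since n1 = 1, n2 = 5, n3 = 0, and n4 = 3 at the 567th site, hwp = 0.5, hwd = 0.222, hw = 0.361, and hb = 0.639.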
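Next, we consider the amount of linkage disequilibrium. Usually linkage disequilibrium in the sample is calculated as $(n_1n_4 - n_2n_3)/n^2$. Therefore, from …, D is estimated by
$$\hat{D} = \frac{n}{n-1} \cdot \frac{n_1n_4 - n_2n_3}{n^2} = \frac{n_1n_4 - n_2n_3}{n(n-1)}, \tag{25}$$
from which we have $\hat{D}$ = 0.0417 at the 567th site.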
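The per-site calculations for the 567th site can be reproduced with a short sketch. The expressions used here are standard unbiased forms that count ordered pairs of distinct sampled chromosomes; they are chosen because they reproduce the reported values hwp = 0.5, hwd = 0.222, hw = 0.361, hb = 0.639, and a linkage disequilibrium estimate of 0.0417, not because they are known to be the author's printed formulas. The function name `site_summaries` is hypothetical.

```python
def site_summaries(n1: int, n2: int, n3: int, n4: int) -> dict[str, float]:
    """Within/between-gene heterozygosity and LD at one site.

    n1..n4 are counts of haplotypes T-T, T-C, C-T, C-C (proximal-distal).
    """
    n = n1 + n2 + n3 + n4
    pairs = n * (n - 1)                       # ordered pairs of distinct chromosomes
    hwp = 2 * (n1 + n2) * (n3 + n4) / pairs   # proximal gene
    hwd = 2 * (n1 + n3) * (n2 + n4) / pairs   # distal gene
    # Proximal nucleotide of chromosome i vs distal nucleotide of chromosome j
    # (i != j): all cross-gene mismatches minus those within one chromosome.
    hb = ((n1 + n2) * (n2 + n4) + (n3 + n4) * (n1 + n3) - (n2 + n3)) / pairs
    d_hat = (n1 * n4 - n2 * n3) / pairs       # = n/(n-1) * (n1*n4 - n2*n3)/n**2
    return {"hwp": hwp, "hwd": hwd, "hw": (hwp + hwd) / 2, "hb": hb, "D": d_hat}


if __name__ == "__main__":
    s = site_summaries(1, 5, 0, 3)            # the 567th site
    print({k: round(v, 4) for k, v in s.items()})
    # {'hwp': 0.5, 'hwd': 0.2222, 'hw': 0.3611, 'hb': 0.6389, 'D': 0.0417}
```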
From (23)-(25), we can calculate hw, hb, and $\hat{D}$ for all sites of the genes and take their averages. The averages of hwp and hwd correspond to $\hat{\pi}_{wp}$ and $\hat{\pi}_{wd}$, the average numbers of pairwise differences per site within the proximal and distal genes, respectively. $\hat{\pi}_b$, the average number of pairwise differences per site between the two genes, is the average of hb. Let d be the average of $\hat{D}$. Only the data for the synonymous sites of … are used here, giving $\hat{\pi}_{wp}$ = 0.0315 and $\hat{\pi}_{wd}$ = 0.0302. Then, the average number of pairwise differences within a gene, $\hat{\pi}_w$, is 0.0309. In a similar way, the average number of pairwise differences between the two genes is $\hat{\pi}_b$ = 0.0452, and the average linkage disequilibrium between the two genes, d, is 0.000452.
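If θ, C, and R are constant across all the sites, the expectations of $\hat{\pi}_w$, $\hat{\pi}_b$, and d are given by
$$E(\hat{\pi}_w) = E(h_w), \qquad E(\hat{\pi}_b) = E(h_b), \qquad E(d) = E(D), \tag{26}$$
where E(hw), E(hb), and E(D) are given by (16c), (17), and (14), respectively.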
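The region-wide statistics are per-site averages of the site-level quantities. A minimal sketch, assuming that monomorphic sites contribute zero to every average, so that sums over polymorphic sites are divided by the total number of sites considered; `region_averages` and its input format are hypothetical. For instance, averages of 0.0315 and 0.0302 for the two genes give a within-gene average of 0.03085, which rounds to the reported 0.0309.

```python
def region_averages(site_values: list[dict[str, float]],
                    total_sites: int) -> dict[str, float]:
    """Average per-site summaries over all sites in a region.

    site_values holds one dict per polymorphic site with keys
    'hwp', 'hwd', 'hb', 'D'; monomorphic sites are implicit zeros.
    """
    def avg(key: str) -> float:
        return sum(s[key] for s in site_values) / total_sites

    pi_wp, pi_wd = avg("hwp"), avg("hwd")
    return {"pi_wp": pi_wp, "pi_wd": pi_wd,
            "pi_w": (pi_wp + pi_wd) / 2,   # average within-gene diversity
            "pi_b": avg("hb"),             # between-gene diversity
            "d": avg("D")}                 # average linkage disequilibrium


if __name__ == "__main__":
    # Hypothetical region of 10 sites containing one polymorphic site.
    one_site = {"hwp": 0.5, "hwd": 0.2, "hb": 0.6, "D": 0.04}
    print(region_averages([one_site], total_sites=10))
```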
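Since $E(\hat{\pi}_w)$, $E(\hat{\pi}_b)$, and E(d) are functions of θ, C, and R, it may be possible to estimate these parameters from $\hat{\pi}_w$, $\hat{\pi}_b$, and d, although the equations for $E(\hat{\pi}_w)$, $E(\hat{\pi}_b)$, and E(d) are too complicated to solve analytically for θ, C, and R. One way to obtain estimates is to find the set of θ, C, and R that minimizes x:
$$x = \frac{[\hat{\pi}_w - E(\hat{\pi}_w)]^2}{\mathrm{Var}(\hat{\pi}_w)} + \frac{[\hat{\pi}_b - E(\hat{\pi}_b)]^2}{\mathrm{Var}(\hat{\pi}_b)} + \frac{[d - E(d)]^2}{\mathrm{Var}(d)}. \tag{27}$$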
Although we do not have analytical expressions for $\mathrm{Var}(\hat{\pi}_w)$, $\mathrm{Var}(\hat{\pi}_b)$, and Var(d), we may be able to use (22), (19), and (21) in their place, respectively, because these quantities serve only as weighting factors in (27). Note that the equations for Var(hw), Var(hb), and Var(D) are based on the two-locus model. The variances of $\hat{\pi}_w$, $\hat{\pi}_b$, and d could be smaller than those of hw, hb, and D, because $\hat{\pi}_w$, $\hat{\pi}_b$, and d are averages over the many sites of DNA sequence data. For the Amy genes of D. melanogaster, given $\hat{\pi}_w$ = 0.0309, $\hat{\pi}_b$ = 0.0452, and d = 0.000452, the minimum x is obtained when θ = …, C = …, and R = …. Unfortunately, it is not possible to evaluate the variances of these estimates. They might depend on θ, C, R, and the sample size (n). Recombination within each gene may decrease the variances.
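The minimization in (27) can be sketched as a brute-force grid search. This is a hypothetical illustration only: it assumes x is a variance-weighted sum of squared deviations, the grid search stands in for whatever numerical optimizer was actually used, and `toy_model` is an invented placeholder for a function that would evaluate (16c), (17), and (14) for the expectations and (22), (19), and (21) for the weights.

```python
from itertools import product
from typing import Callable, Sequence

Triple = tuple[float, float, float]
Model = Callable[[float, float, float], tuple[Triple, Triple]]


def objective(observed: Triple, expected: Triple, variances: Triple) -> float:
    """Variance-weighted sum of squared deviations (assumed form of x)."""
    return sum((o - e) ** 2 / v for o, e, v in zip(observed, expected, variances))


def grid_fit(observed: Triple, model: Model, thetas: Sequence[float],
             cs: Sequence[float], rs: Sequence[float]) -> tuple[float, float, float, float]:
    """Return (x, theta, C, R) minimizing the objective over the grid."""
    best = None
    for theta, c, r in product(thetas, cs, rs):
        expected, variances = model(theta, c, r)
        x = objective(observed, expected, variances)
        if best is None or x < best[0]:
            best = (x, theta, c, r)
    return best


def toy_model(theta: float, c: float, r: float) -> tuple[Triple, Triple]:
    """Hypothetical stand-in for E(pi_w), E(pi_b), E(d) and their variances."""
    expected = (theta, theta * (1 + 1 / (1 + c)), theta * c / (10 * (1 + r)))
    return expected, (1e-4, 1e-4, 1e-8)


if __name__ == "__main__":
    obs, _ = toy_model(0.02, 1.0, 2.0)  # synthetic "observed" statistics
    print(grid_fit(obs, toy_model, [0.01, 0.02, 0.03], [0.5, 1.0, 2.0], [0.0, 2.0, 5.0]))
    # (0.0, 0.02, 1.0, 2.0)
```

A grid keeps the sketch dependency-free; with real expectation functions a finer grid or a general-purpose optimizer would be the natural next step.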
| DISCUSSION |
|---|
A simple two-locus gene conversion model was considered to investigate the amounts of DNA variation and linkage disequilibrium in small multigene families. The exact solutions for the expectations and variances of the amounts of variation within and between two loci were obtained. It was shown that gene conversion increases the amount of variation within each locus and that the increase is large when the gene conversion rate is relatively small. On the other hand, gene conversion decreases the amount of variation between two loci, and when the gene conversion rate is very large there is almost no difference between the amounts of variation within and between loci. The effect of recombination on the amounts of variation within and between two loci is relatively small. The expectation and variance of the amount of linkage disequilibrium were also obtained. Gene conversion generates positive linkage disequilibrium, and the degree of linkage disequilibrium decreases as the recombination rate increases.
The model considered here is a special case of OHTA's (1982) general model, and the theoretical results obtained in this article are consistent with her results. … corresponds to the model of this study. … and c1 … c2 … 1 - E(hb). … were obtained without this assumption by a diffusion method. This method is useful for obtaining the variances of hw, hb, and D. The transient equations for the second-order identity coefficients are too complicated to solve (…).
Using the theoretical results, a method for estimating the mutation, gene conversion, and recombination parameters was developed. The method was applied to the data of the Amy multigene family of D. melanogaster (…). The estimate of θ for synonymous sites is 0.0172, which is close to the average for this species (0.0135; …). … approximately 60-fold larger than the estimate of the mutation rate for synonymous sites. The amount of variation within a locus is much larger than the estimate of θ because of a high rate of gene conversion.
Similar results are obtained from recent data of the same region (Table 1). … Estimates of θ for synonymous sites and for the total coding region are 0.0302 and 0.0089, respectively. C for the total coding region is estimated to be 3.88, which is similar to that for synonymous sites (4.08). The similarity of the two estimates of C is consistent with the mechanism of gene conversion, because a single conversion event usually transfers a tract of DNA spanning many sites rather than a single nucleotide. For the Japanese sample, because a negative d is observed, the estimation was conducted assuming free recombination. The results are very similar to those of the Kenyan sample: the estimate of θ for synonymous sites is about fourfold larger than that for the total coding region, while the two estimates of C are similar. Estimates of θ and C for the Kenyan sample are larger than those for the Japanese sample, probably because of a difference in population size.
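Because the scaled parameters share the factor 4N (assuming θ = 4Nµ and C = 4Nc), the ratio of the gene conversion rate to the mutation rate is simply C/θ. A small sketch applying this to the estimates quoted above, reading 0.0302 and 0.0089 as the θ estimates paired with C = 4.08 (synonymous sites) and C = 3.88 (total coding region); the function name is hypothetical.

```python
def conversion_to_mutation_ratio(C: float, theta: float) -> float:
    """c / mu, assuming C = 4Nc and theta = 4N*mu so that the 4N factor cancels."""
    return C / theta


if __name__ == "__main__":
    print(round(conversion_to_mutation_ratio(4.08, 0.0302)))  # 135 (synonymous sites)
    print(round(conversion_to_mutation_ratio(3.88, 0.0089)))  # 436 (total coding region)
```

The larger ratio for the total coding region reflects the smaller θ there, since the two estimates of C are similar.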
To estimate θ, C, and R, these parameters are assumed to be constant across the region, so the obtained estimates should be regarded as averages over all the sites considered. Because the Amy genes of D. melanogaster are inversely duplicated, R could vary considerably across the region. Assuming that the recombination rate per site is constant (R ≈ 1 per kb), R is ~4.5 for the first position and ~7.5 for the last position, because the length of the region between the two Amy genes is ~4.5 kb. Because the effect of R on $\hat{D}$ is relatively large, the effect of heterogeneity in the recombination rate on $\hat{D}$ was investigated (Figure 2). Almost no correlation was detected, suggesting that the heterogeneity of R may not have a large effect on the estimates.
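The R values quoted for the first and last positions follow from the geometry of an inverted duplication. A sketch under stated assumptions: homologous sites at distance x (in kb) from the first position of each gene are separated by the ~4.5-kb intergenic region plus 2x, each coding region spans ~1.5 kb (the length implied by going from ~4.5 to ~7.5), and R accumulates at 1 per kb. The helper `r_between_copies` and its defaults are hypothetical.

```python
def r_between_copies(offset_kb: float, gap_kb: float = 4.5,
                     r_per_kb: float = 1.0) -> float:
    """Scaled recombination R between homologous sites of inverted copies.

    offset_kb: distance of the site from the first position of each gene.
    In this inverted arrangement both copies extend away from the gap, so
    the physical distance between homologous sites grows by 2 * offset_kb.
    """
    return r_per_kb * (gap_kb + 2 * offset_kb)


if __name__ == "__main__":
    print(r_between_copies(0.0))   # 4.5 (first position)
    print(r_between_copies(1.5))   # 7.5 (last position, assuming ~1.5-kb genes)
```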
The method considered here ignores the effect of selection, and the estimates might be biased if selection is acting. Purifying selection decreases the amounts of variation within and between two loci. The effect of selection is large and complicated when some form of balancing selection maintains two different alleles in a population: the amount of variation between two loci increases dramatically as selection intensity increases, whereas the amount of variation within each locus increases when selection is relatively weak and almost no variation is observed when selection is very strong (H. INNAN, unpublished results).
| ACKNOWLEDGMENTS |
|---|
The author thanks M. Nordborg for comments. This study was supported in part by a fellowship from the Japan Society for the Promotion of Science.
Manuscript received May 15, 2001; Accepted for publication March 4, 2002.
| APPENDIX |
|---|
At equilibrium, letting g = $p^3$, $p^2q$, and $pD$ in (2) and (6), we obtain three equations for $E(p^3)$, $E(p^2q)$, and $E(pD)$,
(A1)
(A2)
and
(A3)
because $E(p^3) = E(q^3)$, $E(p^2q) = E(pq^2)$, and $E(pD) = E(qD)$. From (A1)-(A3), we have the solutions for $E(p^3)$, $E(p^2q)$, and $E(pD)$. To show the solutions, it is helpful to introduce the following quantities:
…
Then, $E(p^3)$, $E(p^2q)$, and $E(pD)$ are given by
(A4)
(A5)
and
(A6)
In a similar way, we obtain the following six equations by letting g = $p^4$, $p^3q$, $p^2q^2$, $p^2D$, $pqD$, and $D^2$:
(A7)
(A8)
(A9)
(A10)
(A11)
and
(A12)
From (A7)-(A12), we obtain the equilibrium expectations of $p^4$, $p^3q$, $p^2q^2$, $D^2$, $p^2D$, and $pqD$. Because these solutions are very complicated, it is helpful to introduce the following quantities:
…
Then, $E(p^4)$, $E(p^3q)$, $E(p^2q^2)$, $E(D^2)$, $E(p^2D)$, and $E(pqD)$ are given by
(A13)
(A14)
(A15)
(A16)
(A17)
(A18)
| LITERATURE CITED |
|---|
ARAKI, H., N. INOMATA, and T. YAMAZAKI, 2001 Molecular evolution of duplicated amylase gene regions in Drosophila melanogaster: evidence of positive selection in the coding regions and selective constraints in the cis-regulatory regions. Genetics 157:667-677.
BAHN, E., 1967 Crossing over in the chromosomal region determining amylase isozymes in Drosophila melanogaster. Hereditas 58:1-12.
BALTIMORE, D., 1981 Gene conversion: some implications for immunoglobulin genes. Cell 24:592-594.
BASTEN, C. J., and B. S. WEIR, 1990 Effect of gene conversion on variances of digenic identity measures. Theor. Popul. Biol. 38:125-148.
BIRKY, C. W., JR., and R. V. SKAVARIL, 1976 Maintenance of genetic homogeneity in systems with multiple genomes. Genet. Res. 27:249-265.
BLACK, J. A., and D. GIBSON, 1974 Neutral evolution and immunoglobulin diversity. Nature 250:327-328.
DOVER, G., and E. COEN, 1981 Springcleaning ribosomal DNA: a model for multigene evolution? Nature 290:731-732.
EDELMAN, G. M., and J. A. GALLY, 1970 Arrangement and evolution of eukaryotic genes, pp. 962-972 in Neurosciences: Second Study Program, edited by F. O. SCHMITT. Rockefeller University Press, New York.
INOMATA, N., H. SHIBATA, E. OKUYAMA, and T. YAMAZAKI, 1995 Evolutionary relationships and sequence variation of α-amylase variants encoded by duplicated genes in the Amy locus of Drosophila melanogaster. Genetics 141:237-244.
KIMURA, M., 1964 Diffusion models in population genetics. J. Appl. Probab. 1:117-232.
MORIYAMA, E. N., and J. R. POWELL, 1996 Intraspecific nuclear DNA variation in Drosophila. Mol. Biol. Evol. 13:261-277.
NAGYLAKI, T., 1984a Evolution of multigene families under interchromosomal gene conversion. Proc. Natl. Acad. Sci. USA 81:3796-3800.
NAGYLAKI, T., 1984b The evolution of multigene families under intrachromosomal gene conversion. Genetics 106:529-548.
NAGYLAKI, T., and T. D. PETES, 1982 Intrachromosomal gene conversion and the maintenance of sequence homogeneity among repeated genes. Genetics 100:315-337.
NEI, M., and A. K. ROYCHOUDHURY, 1974 Sampling variances of heterozygosity and genetic distance. Genetics 76:379-390.
OHTA, T., 1976 Simple model for treating evolution of multigene families. Nature 263:74-76.
OHTA, T., 1977 On the gene conversion model as a mechanism for maintenance of homogeneity in systems with multiple genomes. Genet. Res. 30:89-91.
OHTA, T., 1978 Theoretical population genetics of repeated genes forming a multigene family. Genetics 88:845-861.
OHTA, T., 1981 Genetic variation in small multigene families. Genet. Res. 37:133-149.
OHTA, T., 1982 Allelic and nonallelic homology of a supergene family. Proc. Natl. Acad. Sci. USA 79:3251-3254.
OHTA, T., 1983 On the evolution of multigene families. Theor. Popul. Biol. 23:216-240.
OHTA, T., 1984 Some models of gene conversion for treating the evolution of multigene families. Genetics 106:517-528.
OHTA, T., 1985 Variances and covariances of identity coefficients of a multigene family. Proc. Natl. Acad. Sci. USA 82:829-833.
OHTA, T., and M. KIMURA, 1969a Linkage disequilibrium due to random genetic drift. Genet. Res. 13:47-55.
OHTA, T., and M. KIMURA, 1969b Linkage disequilibrium at steady state determined by random genetic drift and recurrent mutation. Genetics 63:229-238.
PAYANT, V., S. ABUKASHAWA, M. SASSEVILLE, B. F. BENKEL, D. A. HICKEY et al., 1988 Evolutionary conservation of the chromosomal configuration and regulation of amylase genes among eight species of the Drosophila melanogaster species subgroup. Mol. Biol. Evol. 5:560-567.
SHIBATA, H., and T. YAMAZAKI, 1995 Molecular evolution of the duplicated Amy locus in the Drosophila melanogaster species subgroup: concerted evolution only in the coding region and an excess of nonsynonymous substitutions in speciation. Genetics 141:223-236.
SMITH, G. P., 1974 Unequal crossover and the evolution of multigene families. Cold Spring Harbor Symp. Quant. Biol. 38:507-513.
SMITH, G. P., 1976 Evolution of repeated DNA sequences by unequal crossover. Science 191:528-535.
WALSH, J. B., 1988 Unusual behaviour of linkage disequilibrium in two-locus gene conversion models. Genet. Res. 51:55-58.