We carried out mutation screen experiments to understand the rate and molecular nature of spontaneous de novo mutations in Drosophila melanogaster, which are crucial for many evolutionary issues, but still poorly understood. We screened for eye-color and body-color mutations that occurred in the germline cells of the first generation offspring of wild-caught females. The offspring were from matings that had occurred in the field and therefore had a genetic composition close to that of flies in natural populations. We employed 1554 F1 individuals from 374 wild-caught females for the experiments to avoid biased contributions of any particular genotype. From ∼8.6 million alleles screened, we obtained 10 independent mutants: two point mutations (one for each sex), a single deletion of ∼6 kb in a male, a single transposable element insertion in a female, five large deletions ranging in size from 40 to 500 kb in females, and a single mutation of unknown nature in a male. The five large deletions were presumably generated by nonallelic homologous recombination (NAHR) between transposable elements at different locations, illustrating the mutagenic nature of recombination. The high occurrence of NAHR that we observed has important consequences for genome evolution through the production of segmental duplications.
MUTATION rate and its variability are central to theoretical population genetics and evolutionary studies, but are still poorly understood. For example, the rate for mildly deleterious mutations affecting the viability of Drosophila was estimated to be as high as one per diploid genome per generation (Mukai 1964; Mukai et al. 1972). However, this high rate has been challenged by using different methods and analyses, and the issue is not completely settled (e.g., García-Dorado et al. 1998; Fry et al. 1999; Charlesworth et al. 2004; Fry 2004). The mutation rate has also been estimated on the basis of the sequence divergence between species (e.g., Kondrashov and Crow 1993; Eyre-Walker and Keightley 1999; Keightley and Eyre-Walker 2000). These estimates, however, are subject to considerable uncertainty about divergence times, generation lengths, and neutrality of the sites studied.
As well as an accurate estimate of mutation rate, understanding the nature of the mutations is also crucial. Without knowing the nature of the mutations (their molecular spectrum), one cannot understand the evolutionary and practical impacts of mutations. Nevertheless, only a few studies have focused on the molecular characterization of spontaneous mutations in eukaryotes (e.g., Yamaguchi et al. 1994; Yang et al. 2001). Recently, sequencing of mutation accumulation (MA) lines in several model organisms has been applied to obtain a direct estimate of mutation rate and the spectrum of the mutations (Denver et al. 2004; Haag-Liautard et al. 2007; Lynch et al. 2008). Although sequencing is a powerful means for finding mutations, this approach is not without its problems. Due to the significant variability in rate and pattern of mutation among progenitor genotypes (Haag-Liautard et al. 2007), surveys of many individuals are necessary. However, this approach is difficult to apply to many independent MA lines. Furthermore, complete deletions of a locus and probably duplication would escape detection by a PCR-based approach.
Segmental duplication and deletion (or copy number variation, CNV) have been recognized as an important source of genetic variability in the past few years (e.g., Iafrate et al. 2004; Sebat et al. 2004; Perry et al. 2006; Dopman and Hartl 2007; Graubert et al. 2007; Wong et al. 2007; Emerson et al. 2008) and they have the potential to affect human health (Lupski 1998). These structural variations can be generated by nonallelic homologous recombination (NAHR; ectopic recombination) between repeated sequences (e.g., Goldberg et al. 1983; Roeder 1983; Chance et al. 1994). Despite their importance, relatively little is known about the rate and pattern of spontaneous large-scale structural mutations. Obviously, more data of this type are required.
We carried out large-scale mutation screening experiments aimed at understanding the rate and nature of spontaneous mutations. In the design of the experimental scheme, we took special care to reduce bias due to genetic background and between-individual heterogeneity. We screened for eye-color and body-color mutations that occur in germline cells of the first-generation offspring (F1) of wild-caught inseminated females. The offspring were from matings that had occurred in the field and therefore had a genetic composition close to that of flies in a natural population. We employed 1554 F1 individuals from 374 wild-caught females for the screening experiments, avoiding bias toward any particular genotype. Male and female F1 offspring were individually crossed to tester strains for F2 offspring screening, making a between-sex comparison possible. From ∼8.6 million alleles, we obtained 10 independent eye-color mutants. The most striking finding in this study is that five of the seven female mutations characterized were large deletions ranging in size from 40 to 500 kb, presumably generated by NAHR between transposable elements (TEs) at different locations. The high occurrence of NAHR has important consequences for genome evolution.
MATERIALS AND METHODS
Flies and culture conditions:
We collected 135 and 239 inseminated females of Drosophila melanogaster at Kyoto, Japan, in 2005 and 2006, respectively. Flies were bred in 2.6 × 10 cm vials. Each vial contained 10 ml of food (7 g agar, 80 g dried yeast, 40 g cornmeal, 100 g glucose/liter of water with 4 ml of propionic acid and 3 ml of 10% p-hydroxybutylbenzoate) seeded with a few grains of live baker's yeast. Flies were kept at 25° on a 12 hr light/12 hr dark cycle.
The mutant strains dpov1 cn1 bw1 and b1 cn1 bw1 were obtained from the Drosophila Genetic Resource Center at the Kyoto Institute of Technology and used as tester strains. FM6 and SM1 chromosomes were used for chromosome extraction. For rapid confirmation of X-linked mutations, we employed an attached-X strain (C(1)RM, y wa; TT35). An inbred y w strain (TT16) carrying the standard gene arrangement was used for cytogenetic analysis of salivary-gland polytene chromosomes. To prevent mutations during the chromosome-extraction procedure as much as possible, the genetic backgrounds of the tester strains, balancer (SM1/Sp), and attached-X strains were replaced with that of an inbred strain originating from a female collected in Kyoto, Japan, in 2002. The Drosophila Genetic Resource Center at the Kyoto Institute of Technology also provided mutant strains used in the complementation tests: y1 pn1; cm1; ras1 dy1; v1; g1; bw1; e1; Df(2R)bw-HB132, Frd1/SM6a; and Df(2R)bw5, bw5 sp2/T(2;3)apXa, apXa.
The mating scheme for detecting visible mutations is summarized in Figure 1. Each wild-caught female (P) was placed individually in a vial to produce progeny from any mating that had occurred in nature. We collected 1–10 virgin F1 females and males from each vial without anesthesia. These F1 flies were immediately and individually crossed with tester flies (three to four tester males for each F1 female and two to three tester females for each F1 male). The average number of F1 females and males used for the crosses per P female was 2.6 and 1.5, respectively. The flies were transferred every 3 days to a new vial, nine times. Efforts were made to keep the number of tester flies in a vial constant during the experiment.
The F2 offspring from these crosses were screened for eye-color and body-color phenotypes. The dumpy (dp) locus is characterized by three distinct mutant phenotypes (Carlson 1959): truncated wings (designated o), thoracic vortices (v), and embryonic lethal (l). Screening was performed for wing shape, but not satisfactorily for bristle and thoracic phenotypes. While we identified a single wing-shape mutant at dp in this screen, some dp mutants were probably missed. To avoid possible bias due to this complexity, we decided to focus on genes affecting eye color and body color. In addition to recessive mutations at the autosomal mutant loci of the tester strain [black (b), cinnabar (cn), and brown (bw)] and dominant mutations, this scheme should identify X-linked recessive mutations in male offspring from crosses of an F1 female × tester-strain males.
If an F1 fly is heterozygous for a loss-of-function allele at b, cn, or bw, 50% of the F2 offspring should show the mutant phenotype. In addition, sibships of mutants and the parental F1 flies were used as controls for molecular study, by which de novo mutations can be distinguished from variation in the natural population.
To estimate the total number of F2 flies, we counted the total number of adult flies that emerged from 137 randomly sampled crosses (137 × 10 = 1370 vials; ∼10% of all the crosses). The average number of F2 offspring per vial was 106 from crosses of an F1 female with tester males and 160 from crosses of an F1 male with tester females. Due to the large scope of the study and the need for freshly collected flies, four sets of experiments were performed during the course of 11 months. Tester strains used were dp cn bw for the first two sets of experiments and b cn bw for the latter two sets.
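The scaling from the counted sample to the whole screen can be sketched as follows. This is a rough illustration rather than the authors' exact tally: it assumes that every cross produced the full series of ten vials, and it takes the numbers of F1 females (983) and F1 males (571) from the totals reported in the results. The function and variable names are hypothetical.

```python
# Rough estimate of the number of F2 flies screened, scaled up from the
# per-vial averages. Assumes each cross yielded all ten vials.

VIALS_PER_CROSS = 10  # initial vial plus nine transfers


def estimated_f2(n_crosses, mean_f2_per_vial, vials_per_cross=VIALS_PER_CROSS):
    """Expected number of F2 adults from a set of single-fly crosses."""
    return n_crosses * vials_per_cross * mean_f2_per_vial


f2_from_f1_females = estimated_f2(983, 106)  # F1 female x tester males
f2_from_f1_males = estimated_f2(571, 160)    # F1 male x tester females
total_f2 = f2_from_f1_females + f2_from_f1_males

print(f2_from_f1_females, f2_from_f1_males, total_f2)
```

Under these assumptions the screen covers roughly 2 million F2 flies, about half of them from each type of cross.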
Isolation of mutant alleles:
F2 males showing any mutant phenotypes were crossed individually to the same tester strain and then to the attached-X strain to check the phenotypes in the progeny. For “transmitted” second-chromosome mutations, lines were established from the mutant male progeny. X-chromosomal mutations were maintained with the attached-X chromosome and were subjected to genetic complementation tests with the yellow (y), prune (pn), white (w), carmine (cm), raspberry (ras), vermilion (v), and garnet (g) mutants.
F2 females with a mutant phenotype were mated individually to the second-chromosome balancer strain, from which we established at least four second-chromosome lines. The phenotype was confirmed and the affected locus was identified by crossing the lines to tester strains of cn and bw.
In total, 422 suspicious F2 flies were tested and 12 transmitted mutants and a single mosaic mutant were obtained. In subsequent studies, we focused on 10 transmitted mutants, for which we successfully identified the affected loci.
De novo mutations were studied by PCR-based analyses. We extracted DNA from F1 parental flies, F2 mutants, and their wild-type sibships individually and from a few flies of homozygous, hemizygous, and heterozygous mutant strains, using the GenElute Mammalian Genomic DNA kits (Sigma-Aldrich, St. Louis). A series of primer sets were designed for amplification of the entire coding region of each mutant locus with Primer3 software (Rozen and Skaletsky 2000). TaKaRa LA Taq (TAKARA BIO, Otsu, Japan) was used for PCR reactions. MultiScreen-FB filter plates (Millipore, Billerica, MA) and QIAquick PCR purification kits (QIAGEN KK, Tokyo) were used for purification of PCR products. BigDye Terminator Cycle sequencing kit version 3 (Applied Biosystems, Foster City, CA) was used for the sequencing reactions and the products were run on an ABI 3100 Automated Sequencer (Applied Biosystems).
PCR products from homozygous and hemizygous flies were directly sequenced after purification. The bw3 and bw4 chromosomes were homozygous lethal and were therefore maintained over the SM1 chromosome. PCR products from these SM1/mutant heterozygous flies were also directly sequenced en masse. Because cn1 and bw1 on the tester chromosomes are a deletion and an insertion mutation, respectively (Crosby et al. 2007), PCR products from flies heterozygous for these mutations were gel-fractionated to isolate products from de novo mutant alleles and purified with QIAquick gel extraction kits (QIAGEN KK). The sequences of two parental alleles were inferred from multiple sequences obtained from wild-type sibships of mutants together with the sequences of the parental flies.
Reference sequences were derived from NCBI (http://www.ncbi.nlm.nih.gov/), Ensembl (http://www.ensembl.org/index.html), and FlyBase (http://flybase.bio.indiana.edu/; D. melanogaster genome R5.5; Crosby et al. 2007) genome databases. The accession numbers of the other reference sequences are L23543 for bw, U56245 for cn, AF222049 for Transpac, AY180917 for roo, and AY180918 for nomad. Primer sequences are provided in supplemental Table 1.
Mutation detection rate:
From an observed nonsense-mutation frequency, we estimated the point mutation rate, assuming that all nonsense mutations are detected, as in Drake (1991). Therefore, the correction factor is the reciprocal of the fraction of nonsense mutations among all possible single-nucleotide changes. This fraction was calculated from the sequences of all three autosomal and seven X-linked genes studied here. We also took into account transition bias, which was estimated to be four by Haag-Liautard et al. (2007). We obtained a rate of 0.010 (0.012 under no transition bias), which is less than one-fourth of the 3/64 (0.047) rate used by Drake (1991).
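The detection rate described above can be sketched as a count over all single-nucleotide changes in a coding sequence. The code below is an illustration under stated assumptions, not the authors' program: the function names are hypothetical, and the transition bias is modeled by giving each transition a weight `ts_weight` relative to each transversion, which is only one possible reading of "a transition bias of four".

```python
# Fraction of single-nucleotide changes in a coding sequence that create
# a stop codon, with transitions optionally weighted above transversions.

STOP_CODONS = {"TAA", "TAG", "TGA"}
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def nonsense_fraction(cds, ts_weight=1.0):
    """Weighted share of single-base substitutions that yield a stop codon.

    cds       -- in-frame coding sequence (A/C/G/T), terminal stop excluded
    ts_weight -- weight of each transition relative to each transversion
    """
    cds = cds.upper()
    nonsense = 0.0
    total = 0.0
    # Walk over complete codons only; a trailing partial codon is ignored.
    for start in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[start:start + 3]
        for pos, old in enumerate(codon):
            for new in "ACGT":
                if new == old:
                    continue
                weight = ts_weight if (old, new) in TRANSITIONS else 1.0
                mutant = codon[:pos] + new + codon[pos + 1:]
                total += weight
                if mutant in STOP_CODONS:
                    nonsense += weight
    return nonsense / total


def correction_factor(cds, ts_weight=1.0):
    """Reciprocal of the nonsense fraction."""
    return 1.0 / nonsense_fraction(cds, ts_weight)


print(nonsense_fraction("TGG"), nonsense_fraction("TGG", ts_weight=4.0))
```

For the single codon TGG, two of the nine possible changes (to TAG and TGA) create a stop codon, and both are transitions, so weighting transitions raises the fraction from 2/9 to 8/18. Applied to the full coding sequences of the genes screened, the same count would give the gene-wide detection rate.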
Converting per-gene rates into per-genome rates:
A per-gene rate of single-gene mutations, including point mutations, insertions, and small deletions, was converted into a rate per diploid genome by simply multiplying by twice the number of genes (∼14,000; Crosby et al. 2007). For a rate of occurrence of large deletions spanning multiple genes, we first calculated the probability (Phit) that any nucleotide within a gene of size R is hit by a deletion of size D occurring randomly in a genome of size G. Since the number of different deletions of size D (i.e., with different breakpoints) that hit any nucleotide of the gene is D + R − 1, this is approximately expressed as Phit ≈ (D + R)/G. Here, the genome size G = 120 Mb and R = 5 kb (the average size of the 10 genes studied; FlyBase, http://flybase.bio.indiana.edu/; D. melanogaster genome R5.5; Crosby et al. 2007). Then a rate per diploid genome can be obtained by multiplying a per-gene rate by twice the reciprocal of Phit.
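How close the approximation Phit ≈ (D + R)/G is to a direct count of breakpoint positions can be checked with a short sketch. It treats the genome as a single linear sequence and ignores chromosome ends, which is a simplification; the function names are hypothetical, and the deletion size of 200 kb is the average reported in the results.

```python
# Compare the exact count-based probability with the approximation
# Phit ~ (D + R) / G for a deletion of size D and a gene of size R.


def p_hit_exact(deletion_bp, gene_bp, genome_bp):
    """Probability that a randomly placed deletion overlaps the gene.

    A deletion of length D has G - D + 1 possible start positions on a
    linear genome, of which D + R - 1 overlap a gene of length R
    (assuming the gene lies far enough from either end).
    """
    return (deletion_bp + gene_bp - 1) / (genome_bp - deletion_bp + 1)


def p_hit_approx(deletion_bp, gene_bp, genome_bp):
    """Approximation used in the text: (D + R) / G."""
    return (deletion_bp + gene_bp) / genome_bp


D, R, G = 200_000, 5_000, 120_000_000
exact = p_hit_exact(D, R, G)
approx = p_hit_approx(D, R, G)
print(exact, approx, (exact - approx) / approx)
```

With these values the two expressions differ by well under 1%, so the approximation has a negligible effect on the resulting per-genome rate.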
Lower and upper 95% confidence limits of the mutation rate were calculated using the tables from Gehrels (1986).
RESULTS
We screened ∼8.6 million alleles for spontaneous de novo visible mutations that occurred in the germline cells of F1 females and males from field-inseminated (P) females of D. melanogaster (Table 1). The screening procedure was designed to detect recessive mutations at the autosomal b, cn, and bw loci. In addition, male progeny of the F1 females (about one-fourth of total screened flies, or ∼517,000) were screened for X-linked mutations that affected eye or body color. The total numbers of parental (P) females, F1 females, and F1 males used in the experiments were 374, 983, and 571, respectively.
The numbers of alleles screened for each locus and the identified mutations are shown in Table 1. In total, we found 12 de novo mutations. Among them, two are mutations (x1 and x2) that occurred at unknown loci and are not further considered here. All mutations were derived from different F1 flies except for two bw mutants (bw3's) that originated from a single F1 female and seemed to have the same lesion, a deletion of ∼100 kb. Because the data suggest their independent mutational origins as shown below, we counted both of these two bw3 mutations as independent mutations in the subsequent rate calculation. There was no clear maternal age effect (Table 2).
The average rate of visible mutations was estimated to be 1.35 × 10−6 per gene (95% confidence interval, 0.28–3.94 × 10−6) for the three autosomal loci in males and 1.10 × 10−6 (95% confidence interval, 0.44–2.27 × 10−6) for the three autosomal and seven X-linked loci in females, yielding an average for all the mutations of 1.16 × 10−6 (0.56–2.14 × 10−6) (Table 1). There was a single nonsense mutation, which can be converted into a point mutation rate of 6.6 × 10−9 using the correction factor (97.4; see materials and methods) and the average length of the coding regions of the 10 genes studied (1730 nucleotides).
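Gehrels (1986) tabulates confidence limits for small numbers of events under Poisson statistics. Assuming that the intervals above are two-sided exact Poisson limits on the mutant counts, divided by the number of alleles screened, they can be approximated numerically as in the sketch below. The helper names are hypothetical, and the bisection stands in for the published tables.

```python
import math


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu); returns 0 for k < 0."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total


def _solve_mean(k, target, upper_bound):
    """Find mu with poisson_cdf(k, mu) == target by bisection.

    The CDF decreases monotonically in mu, so bisection is safe.
    """
    lo, hi = 0.0, upper_bound
    for _ in range(200):
        mid = (lo + hi) / 2
        if poisson_cdf(k, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def poisson_limits(n_events, confidence=0.95):
    """Two-sided exact confidence limits on a Poisson mean."""
    alpha = 1.0 - confidence
    bound = n_events + 10 * math.sqrt(n_events + 1) + 10
    lower = 0.0 if n_events == 0 else _solve_mean(n_events - 1, 1 - alpha / 2, bound)
    upper = _solve_mean(n_events, alpha / 2, bound)
    return lower, upper


ALLELES_SCREENED = 8.6e6  # approximate total from the text
low, high = poisson_limits(10)  # 10 mutants in the whole screen
print(low / ALLELES_SCREENED, high / ALLELES_SCREENED)
```

For 10 events the limits on the count are about 4.8 and 18.4, which correspond to roughly 0.56 × 10−6 and 2.14 × 10−6 per allele with the rounded allele total, in line with the overall interval quoted above.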
Molecular nature of the mutations:
We characterized 9 of the 10 mutations (Table 2): two point mutations, one at the bw locus (bw1; a nonsense mutation from TGG to TAG) and the other at the v locus [v1; a missense mutation from TTT (Phe) to TCT (Ser)]; an insertion of a retrotransposon at cn; and six deletions, one at cn, two at w, and three at bw. The remaining mutation is bw2 (Table 2). For this mutant, we could not find a change in either the flanking regions (over a total length of 1 kb) or the coding regions, suggesting a change in another regulatory region.
As for cn2, we obtained a longer PCR fragment from the mutant fly than from its normal sibships, indicating an insertion mutation. We determined the sequences surrounding the insertion site and found the insertion of a retrotransposon, Transpac, in exon 3 (supplemental Figure 1). Transpac is a member of the gypsy group of LTR retrotransposons in Drosophila (Bowen and McDonald 2001). The insertion was ∼5 kb long, which is comparable to the full length of the Transpac element (5249 bp; AF222049). Duplication of the four-nucleotide target sequence, 5′-ATAT-3′, was observed at the boundary, which is also consistent with the previous study (Bowen and McDonald 2001).
Only one (cn1 in Table 2) of the six deletions was derived from a male parent, although the deletion rate did not differ between females (0.58 × 10−6) and males (0.45 × 10−6). The deletion was 5523 bp long and included the entire cn gene and parts of 5′-UTRs of two adjacent genes, CG30497 and Calcineurin B2 (CanB2). The mutation had a seven-nucleotide insertion, 5′-AAAGGAC-3′, at the boundary. This motif was not found in the 2-kb regions around the breakpoints in the wild-type sequences. The mutant sequence matched 100% to one of parental sequences except for the deletion and the seven-nucleotide insertion. We could not find any repeat sequences, such as TEs and simple repeats, around the boundaries in the parental sequence.
We failed to amplify DNA sequences of two w mutants (w1 and w2) in all seven primer sets covering the entire w coding region and 5′ flanking region, suggesting a large deletion. We successfully amplified DNA fragments encompassing the deletions from w1 (supplemental Figure 2) and w2 (data not shown) by using a pair of primers placed on the flanking regions of the deletion. The sizes of the two deletions were ∼44 kb (w1) and ∼272 kb (w2). Both deletions carried a retrotransposon at the boundary: roo for w1 (supplemental Figure 3) and nomad for w2 (supplemental Figure 4). In both cases, the same retrotransposon was found at the proximal breakpoint on one parental chromosome and at the distal breakpoint on the other homologous chromosome. These retrotransposons are in the same orientation. Furthermore, LTR sequences of these retrotransposons indicated that recombination occurred within the retrotransposons or the internal region (if any) in both cases (see the legends to supplemental Figures 3 and 4). In short, it is very likely that these deletions were caused by interchromosomal NAHR between the retrotransposons located at different chromosomal sites.
Three bw mutant chromosomes, two bw3's and bw4, were lethal in homozygotes and in heterozygotes with Df(2R)bw-HB132 and with Df(2R)bw5. They carried large deletions (Figure 2). The two bw3 mutants were cytologically indistinguishable. They originated from a single parental female, but from different vials, vials 1 and 7 (Table 2). On the basis of the presence or absence of heterozygous sites in the PCR products, we estimated the deletion size of ∼100 kb for the bw3's and ∼500 kb for the bw4. We amplified both proximal and distal regions of the breakpoints and determined the sequences for these mutants and their multiple wild-type sibships. We thus found that recombination had occurred in small regions surrounding the deletions in all three mutants (in positions in the D. melanogaster genome R5.5, 19,279,000–19,454,000 in bw3's and 19,114,000–19,667,000 in bw4; data not shown). These deletions seem to have also been generated by NAHR. These recombination events also suggest the meiotic origins of the mutations. The two bw3 mutants are likely to have independent mutational origins, even if they have exactly the same lesion.
Per-genome mutation rate:
Genomic rates should be estimated separately for single-gene mutations (point mutations, insertions, and small deletions) and large deletions affecting multiple genes simultaneously. For the former, the rate per diploid genome was estimated to be 0.0163 (see materials and methods). For the latter, by using the average size of the five deletions of ∼200 kb, we obtained an estimate of 0.0009 as a rate per diploid genome in females (see materials and methods) and, when divided by two, an estimate of 0.0005 as a sex-average rate. In addition to deletions, NAHR produces duplications. Assuming the same rate for deletions and duplications, their rate per diploid genome was estimated to be 0.0009. In summary, we obtained 0.017 for an estimate of the mutation rate per diploid genome.
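The arithmetic behind these per-genome estimates can be retraced with a short sketch. Several inputs are assumptions rather than values stated in this paragraph: the five single-gene mutants (two point mutations, the TE insertion, the ∼5.5-kb deletion, and the uncharacterized bw2) are taken over the ∼8.6 million alleles screened, and the number of alleles screened in females (6,365,000) is taken from the duplication-rate calculation in the discussion. All names are hypothetical.

```python
# Per-genome rates from per-gene rates, following the conversions
# described in materials and methods.

N_GENES = 14_000
GENOME_BP = 120_000_000
GENE_BP = 5_000


def single_gene_genomic_rate(per_gene_rate):
    """Per-gene rate times twice the number of genes."""
    return per_gene_rate * 2 * N_GENES


def large_deletion_genomic_rate(per_gene_rate, deletion_bp):
    """Per-gene rate times twice the reciprocal of Phit ~ (D + R) / G."""
    p_hit = (deletion_bp + GENE_BP) / GENOME_BP
    return per_gene_rate * 2 / p_hit


# Assumed inputs (see the paragraph above).
single_gene_per_gene = 5 / 8_600_000
large_deletion_per_gene_female = 5 / 6_365_000

single = single_gene_genomic_rate(single_gene_per_gene)
deletion_female = large_deletion_genomic_rate(large_deletion_per_gene_female, 200_000)
deletion_sex_average = deletion_female / 2
structural = 2 * deletion_sex_average  # deletions plus duplications
total = single + structural

print(single, deletion_female, deletion_sex_average, total)
```

Under these assumptions the sketch gives about 0.0163 for single-gene mutations, 0.0009 for large deletions in females, and 0.017 in total, matching the rounded figures in the text.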
DISCUSSION
Rate and spectrum of spontaneous mutations:
This study attempted to obtain estimates of the rate and spectrum of loss-of-function mutations in many different genetic backgrounds found in a natural population. The estimated rate (1.2 × 10−6) is smaller than the previous estimates of the null mutation rate obtained in MA experiments (10.3 × 10−6/locus/generation in Mukai and Cockerham 1977; 3.86 × 10−6/locus/generation in Voelker et al. 1980; 13.0 × 10−6/locus/generation in Harada et al. 1993) and estimates of the visible mutation rate in specific locus tests (7.5 × 10−6 in Woodruff et al. 1983; 5.3 × 10−6 in Yang et al. 2001). The heterogeneity of the estimates, at least partially, stems from the activity levels of TEs. Chromosomal aberrations and mobilization of TEs occurred at high frequencies in the MA lines in Mukai and Cockerham (1977) and Voelker et al. (1980) (Yamaguchi and Mukai 1974; Harada et al. 1990). Indeed, insertional mutations were identified as a major class of mutations in Harada et al. (1993)'s MA experiment (Yamaguchi et al. 1994; 5 of the 6 mutations analyzed) and the single-generation specific locus test by Yang et al. (2001; 5 of the 13 mutations).
Given such a large amount of variation in mutation rate, including the mobility of TEs among different genetic backgrounds, our fundamental interest is to understand the features of mutations that take place in the genomic composition of a natural population. In this study, we screened for loss-of-function mutations that occurred in germ cells of the first-generation offspring of wild-caught inseminated females. The offspring were produced from matings that had taken place in the field and therefore carried genotypes close to those of flies in nature. Thus this screening (as well as that by Yang et al. 2001) satisfies the above conditions and we can place some confidence in these estimates.
The detection rate of point mutations in this study was calculated to be ∼1.0%, which is much lower than the rate used by Drake (1991), 3/64 = 4.7%. The detection rates were uniform across the 10 genes studied, and they varied only from 0.009 to 0.013. This clearly indicates a deficiency of codons that are different in a single codon position from the stop codons. By using this detection rate, we obtained an estimate of the point mutation rate for coding regions of 6.6 × 10−9/bp/generation. This rate is comparable to the point mutation rates estimated from the mutation accumulation studies [8.5 × 10−9 calculated by Drake et al. (1998) based on Mukai and Cockerham (1977)'s data and 5.8 × 10−9 in Haag-Liautard et al. (2007)].
We obtained 0.017 for an estimate of mutation rate per diploid genome per generation. This is clearly an underestimate because we can identify only mutations that had a substantial functional effect. By multiplying the above estimate of the point mutation rate by two times the number of genes times the average gene size, we can obtain the per-genome mutation rate for coding regions. This becomes (6.6 × 10−9) × 2 (14,000) × (1500) = 0.3 (Harrison et al. 2003; Crosby et al. 2007). This rate seems to be comparable with the recent estimate of the deleterious mutation rate by MA experiments (≥0.12 in Charlesworth et al. 2004), but not with the much smaller estimates by García-Dorado et al. (1998) and Fry et al. (1999). Assuming the same mutation rate for coding and noncoding regions (Haag-Liautard et al. 2007) and using the genome size of 120 Mb, we also estimated the whole-genome mutation rate to be (6.6 × 10−9) × 2 × (120 × 10^6) = 1.6. An estimate of 1.2 mutations/diploid genome was also obtained from a recent molecular analysis of MA lines (Haag-Liautard et al. 2007).
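The point mutation rate and its scaling to the genome can be retraced as follows. This is a sketch with hypothetical names; it uses the rounded allele total of ∼8.6 million, so the per-base rate comes out near 6.5 × 10−9 rather than exactly the reported 6.6 × 10−9, and the reported value is then used for the two genome-wide products.

```python
# Point mutation rate from a single nonsense mutation, and its scaling
# to coding regions and to the whole diploid genome.


def point_mutation_rate(n_nonsense, n_alleles, correction, coding_bp):
    """Nonsense frequency times the correction factor, per coding base."""
    return n_nonsense / n_alleles * correction / coding_bp


# One nonsense mutation, ~8.6 million alleles (rounded), correction
# factor 97.4, and an average coding length of 1730 nucleotides.
rate_from_rounded_inputs = point_mutation_rate(1, 8_600_000, 97.4, 1730)

REPORTED_RATE = 6.6e-9  # per bp per generation, as given in the text
coding_genomic_rate = REPORTED_RATE * 2 * 14_000 * 1_500
whole_genome_rate = REPORTED_RATE * 2 * 120e6

print(rate_from_rounded_inputs, coding_genomic_rate, whole_genome_rate)
```

The two products are about 0.28 and 1.58, which round to the 0.3 and 1.6 quoted above.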
In contrast to humans and other mammals (Haldane 1947; for review, see Crow 2000), mutation rates in Drosophila do not differ much between the sexes (Bauer and Aquadro 1997; see Woodruff et al. 1983 for review). In this study, there was no significant sex difference in the mutation rate at the three autosomal loci: 1.46 × 10−6 (0.40–3.73 × 10−6) in females and 1.35 × 10−6 (0.28–3.94 × 10−6) in males. However, the spectrum of mutations differed between the two sexes. Five of the seven mutations that occurred in female germ cells were large deletions associated with interchromosomal recombination. Two of them (w1 and w2) were caused by NAHR between TEs at different locations; the remaining three (two bw3's and bw4) were also likely to be due to NAHR. Although we found a 5.5-kb deletion at the cn locus (cn1) that arose in a male by an unknown mechanism, no large deletion of this recombination-associated type occurred in male germ cells. This finding also raises the question of whether there is a sex difference in the point mutation rate in Drosophila. Although there is no evidence for an enhanced mutation rate in males from the comparisons of sequence divergence between the X chromosome and autosomes (Bauer and Aquadro 1997; Begun et al. 2007), future study may shed light on this possibility.
In this study, we identified five large deletions ranging from ∼40 to 500 kb. All of them seemed to be generated by NAHR between TEs. Although Yang et al. (2001) did not report any large deletions in their mutation screen, there are two uncharacterized mutations in their study that might be large structural variants. Although NAHR between interspersed TEs has long been appreciated as a source of variation in Drosophila (Goldberg et al. 1983; Davis et al. 1987; Lim 1988), to the best of our knowledge, this study provides the first estimate of the rate under wild-type conditions (0.0009/diploid genome/generation).
The frequent occurrence of structural variants via NAHR between interspersed TEs has a significant impact on genome evolution in two respects: (1) it affects the abundance and distribution of TEs themselves, and (2) large segmental duplication gives rise to gene duplication, which is an important long-term evolutionary force (Ohno 1970).
Although TEs are ubiquitous throughout genomes, their individual frequencies are quite low in the euchromatic portions of the Drosophila genome (Montgomery and Langley 1983; Biémont et al. 1994). Selection appears to control the transpositional spread and distribution of TEs, but what type of selection is the main driving force? Selection against the deleterious effects of TE insertions per se is an obvious possibility (Charlesworth and Charlesworth 1983; Hoogland and Biémont 1996). Alternatively, the abundance and distribution of TEs may be controlled indirectly by selection against structural changes caused by NAHR between TEs (Montgomery et al. 1987; Langley et al. 1988). While the debate between these two selection models remains unresolved, due in part to a large amount of heterogeneity in the distribution between TE families (Hoogland and Biémont 1996; Charlesworth et al. 1997), our results also provide qualitative support for the model of selection against NAHR-mediated structural variants (Montgomery et al. 1991; Charlesworth et al. 1992). Future study based on the estimated rate should assess the impact of NAHR quantitatively.
In addition to deletions, NAHR events produce segmental duplications, and their potential consequence is gene duplications. Using the NAHR rate as a basis, we obtained the estimate of the per-gene duplication rate of 0.4 × 10−6 [=(5/6,365,000)/2]. On the other hand, from sequence analyses of duplicated genes in the genome sequence, the origination rate of gene duplication is indirectly estimated as 0.001 × 10−6/gene/generation for Drosophila (Lynch 2007). This is only ∼1/400 of the present estimate. What is the reason for this large discrepancy?
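The size of this discrepancy follows directly from the numbers given. The sketch below uses hypothetical names; the division by two is read here as a sex average, as in the per-genome calculation, which is an interpretation rather than something stated in this paragraph.

```python
# Per-gene duplication rate implied by the NAHR deletions, compared with
# the indirect estimate from genome sequence analyses.

N_NAHR_DELETIONS = 5
ALLELES_SCREENED_IN_FEMALES = 6_365_000

duplication_rate = N_NAHR_DELETIONS / ALLELES_SCREENED_IN_FEMALES / 2
indirect_estimate = 0.001e-6  # per gene per generation (Lynch 2007)
fold_difference = duplication_rate / indirect_estimate

print(duplication_rate, fold_difference)
```

The direct estimate is about 3.9 × 10−7 per gene, close to 400 times the indirect one.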
A simple explanation is that the duplication rate reported here is not typical for the genome as a whole and the true rate is much lower. However, this is very unlikely because our estimate may even be an underestimate for several reasons. It is conceivable that we failed to find X-linked rearrangements due to hemizygous lethality. Indeed, there are essential genes in the vicinity of the target loci, for example, innexin 2 close to the cm locus and lethal(1)discs degenerate 4 immediately upstream of the g locus (Crosby et al. 2007). Large autosomal deletions may also be missed due to low viability. For example, two ribosomal protein genes (RpL37b and RpL12), which could be haplo-insufficient, are located ∼500 kb proximal and distal to the bw locus, respectively. In addition, some genes, such as y, w, and cn, are located in regions of low recombination, where NAHR might be reduced concomitantly with normal recombination. Taken together, the true rate is likely to be several times larger. Despite being obtained under specific mutant conditions, previous studies report the following estimates of duplication rate in females: 2.7 × 10−6 at the marron-like locus (Shapira and Finnerty 1986) and 16 × 10−6 and 170 × 10−6 at the rosy locus (Gelbart and Chovnick 1979; Shapira and Finnerty 1986). These estimates are even larger than the present one. Shapira and Finnerty (1986) further reported the occurrence of duplication in the male germline. In summary, the available data are not consistent with the hypothesis that the true duplication rate is as small as 0.001 × 10−6.
Alternatively, the origination rate from the genome sequence analyses may be biased downward. However, gene conversion could lead to an underestimation of the age of duplication through a homogenization process, which, in turn, may result in an overestimation of origination rate (Gao and Innan 2004). Therefore, a much higher rate compared with the above estimate is less likely.
Consequently, we conclude that a large fraction of duplications are deleterious and are rapidly eliminated from the population by purifying selection. Many such duplications would be missed in the genome analyses, resulting in lower estimates of duplication rate. Recently, Lynch et al. (2008) reported a direct estimate of gene duplication of ∼1.5 × 10−6/gene/cell division in yeast. This is again much higher than the indirect estimates from genome sequences (Lynch 2007).
The chromosomal distribution of recent segmental duplications and CNVs in the human genome is nonrandom, and genes associated with immunity and defense, membrane surface interactions, drug detoxification, and growth/development were particularly enriched (Bailey et al. 2002). Although positive selection may play a significant role in later evolutionary phases of these segmental duplications, gene dosage effects must be more important in the early phase of evolution. Imbalanced gene dosage is expected to be less harmful for these genes enriched with segmental duplications.
The data for the origination rate of duplication are still limited to a few cases. Future study may uncover regional heterogeneity in the rate. A genomewide estimate of the duplication rate and its comparison with a distribution of duplication in the genome will give important insights into the evolutionary and medical impact of structural variants and the role of selection in genome evolution.
We thank Yuriko Ishii and Kimiko Suzuki for technical assistance, Yasushi Hiromi and Yutaka Inoue for valuable help, and the Drosophila Genetic Resource Center at the Kyoto Institute of Technology for stocks. The comments of the reviewers and editor are also greatly appreciated. This work was supported in part by a Grant-in-Aid for Scientific Research (C) from the Ministry of Education, Culture, Sports, Science and Technology of Japan (T.T.-S.), a grant from the Mitsubishi Foundation (T.T.-S.), and the National Institute of Genetics Cooperative Research Program (M.I., T.T.-S.).
1 Present address: Technology and Development Team for Mammalian Cellular Dynamics, RIKEN (Institute of Physical and Chemical Research) BioResource Center, Tsukuba 305-0074, Japan.
Communicating editor: M. Aguadé
- Received July 1, 2008.
- Accepted December 23, 2008.
- Copyright © 2009 by the Genetics Society of America