Targeting induced local lesions in genomes (TILLING) is a reverse-genetic method for identifying point mutations in chemically mutagenized populations. For functional genomics, it is ideal to have a stable collection of heavily mutagenized lines that can be screened over an extended period of time. However, long-term storage is impractical for Drosophila, so mutant strains must be maintained by continual propagation of live cultures. Here we evaluate a strategy in which ethylmethane sulfonate (EMS) mutagenized chromosomes were maintained as heterozygotes with balancer chromosomes for >100 generations before screening. The strategy yielded a spectrum of point mutations similar to those found in previous studies of EMS-induced mutations, as well as 2.4% indels (insertions and deletions). Our analysis of 2008 point mutations in 148 targets showed evidence for selection against deleterious lesions and differential retention of lesions among targets on the basis of their position relative to balancer breakpoints, leading to a broad distribution of mutational densities. Despite selection and differential retention, the success of a user-funded service based on screening a large collection several years after mutagenesis indicates sufficient stability for use as a long-term reverse-genetic resource. Our study has implications for the use of balancer chromosomes to maintain mutant lines and provides the first large-scale quantitative assessment of the limitations of using breeding populations for repositories of genetic variability.
CHEMICAL mutagenesis has been the traditional means of obtaining genetic lesions for forward-genetic studies, but has also been widely used for reverse genetics (McCallum et al. 2000a; Draper et al. 2004; Smits et al. 2004; Bentley 2006; Cuppen et al. 2007; Till et al. 2007; Cooper et al. 2008). Whereas in forward-genetic screens lesions are identified phenotypically and are selectively retained for further study, reverse-genetic screens require the retention of all lines for a long enough period of time to allow for genotypic screening to identify the lesion. In seed plants, recovery of genetic lesions at a later time is usually simple because storage of seeds provides a nearly ideal long-term genetic repository (Comai and Henikoff 2006). Similarly, the ability to store whole animals or sperm by freezing allows for the maintenance of a permanent reverse-genetic resource (Draper et al. 2004; Cuppen et al. 2007). In contrast, many animal species lack a practical germline storage form. For example, Drosophila needs to be maintained continuously in breeding populations because attempts to store them frozen in a permanent repository have met with only limited success (Steponkus et al. 1990; Mazur et al. 2008). In practice, long-term storage is unfeasible for large numbers of mutagenized lines.
Two general strategies can be used for recovery of lesions by reverse-genetic approaches in organisms such as Drosophila. In the past, screening was done during the initial generations following mutagenesis, allowing for the recovery of lesions before interbreeding causes loss by selection or drift. In two previous reports, several thousands of lines were screened within a few generations following mutagenesis (Bentley et al. 2000; Winkler et al. 2005). This strategy was practical for the focus on a small number of targets and to minimize efforts involved in long-term maintenance of thousands of lines. Alternatively, mutagenized Drosophila chromosomes can be continuously maintained as interbreeding lines in heterozygous condition with balancer chromosomes to retain recessive sterile, subviable, or lethal alleles (Koundakjian et al. 2004). A potential disadvantage is that any genetic changes due to recombination, spontaneous mutation, or other means that occur subsequent to mutagenesis may accumulate in these collections, resulting in heterogeneity within a line or loss of mutant alleles. However, to make a reverse-genetic resource that can serve the general community of researchers working on an organism, a long-term maintenance strategy is needed. Just how long balanced lines retain induced mutations without the complicating effects of extraneous changes is unknown.
Previous studies described the Zuker collection (Z-lines), a forward-genetic resource of ∼6000 Drosophila melanogaster chromosome 2 and ∼6000 chromosome 3 balanced lines that were established and maintained for phenotypic screening (Koundakjian et al. 2004; Wakimoto et al. 2004). These balanced lines could potentially provide a reverse-genetic resource for the Drosophila community. Toward this end, we have used the Z-lines to establish a user-fee-funded Drosophila targeting induced local lesions in genomes (TILLING) service (Fly-TILL, http://tilling.fhcrc.org/fly/).
Over the course of 3 years, Fly-TILL has delivered a total of 2008 mutations in 148 user-selected targets. Some of these mutant alleles have been sequence verified by users and have been described in published studies (Lake et al. 2007), suggesting that balanced lines are sufficiently stable to be useful for reverse genetics. In this report, we analyze the full set of mutations discovered in the Z-lines by Fly-TILL, and we evaluate the long-term stability of these lines. We report on the mutational density and spectrum of mutant lesions identified in the Z-lines and use the data to deduce the effects of selection, drift, and meiotic recombination on chromosomes maintained in continuous culture with balancer chromosomes. We conclude that, despite some losses of mutations due to selection and differential retention, the Z-lines are sufficiently stable to be used over the course of several years as the basis of a reverse-genetic resource.
MATERIALS AND METHODS
Properties of the Z-lines, which consist of 12,336 lines, were previously described by Koundakjian et al. (2004). As the original aim for the collection was to obtain viable mutations affecting adult behavior and vision, the collection consists of lines containing chromosomes that were heavily mutagenized by exposure to 25 mM EMS but permitted survival of homozygous adults in the F3 generation after mutagenesis. The chromosome 2 collection (Z2) was established in 1996–1997 with treated cn bw chromosomes maintained in stock with the CyO balancer. The chromosome 3 collection (Z3) was established in 1997–1998 with treated third chromosome in a bw; st background maintained in stock with TM6B-Tb. Both collections have been continuously maintained at 18°. For TILLING, at least 10 adult males were collected from 11,741 of the lines, including 439 of the PMM lines, which were previously described (Koundakjian et al. 2004). For 20% of the Z3 lines, the collected males were homozygous for the mutagenized chromosome. For the remaining lines, the collected males were heterozygous for the balancer chromosome.
DNAs were prepared using the Fastprep DNA kit (QBiogene/MP Biomedical, Irvine, CA) as previously described (Till et al. 2003a). DNAs were quantitated on 1.5% agarose gels by comparison to phage λ-derived DNA markers and normalized for concentration prior to pooling eightfold.
Minor modifications were made to the Arabidopsis TILLING method in which labeled PCR primers are used to amplify pooled genomic DNA, followed by the formation of heteroduplexes and digestion with a nuclease that cleaves specifically at mismatches. Users identified regions likely to be important for function in their gene of interest using the input utility CODDLE (McCallum et al. 2000b). The program identified primers to amplify ∼1.5-kb targets. Pooled DNA from the Z-lines was screened using PCR amplification as previously described (Till et al. 2003b) or in a two-step method. For the first step, 5 μl PCR mix (1× Takara Ex-Taq buffer, 3 mM MgCl2, 0.4 mM each dNTP, 0.09 μM each unlabeled primer, 0.5 units Takara Ex-Taq polymerase) was added to 5 μl of 0.075 ng/μl pooled genomic DNA and incubated for 8 cycles of 94° for 20 sec, 73° for 30 sec (increment −1°/cycle), ramp to 72° at 0.5°/sec, 72° for 1 min, followed by 16 cycles of 94° for 20 sec, 65° for 30 sec, ramp to 72° at 0.5°/sec for 1 min, followed by 5 min at 72°. For the second step, 2 μl of 0.84 μM each labeled primer was added to the PCR reactions and incubated at 94° for 2 min followed by 25 cycles of 94° for 20 sec, 65° for 30 sec, and ramp to 72° at 0.5°/sec for 1 min, followed by 5 min at 72°. Celery juice extract digestion of heteroduplexed DNA was performed to cleave at base mismatches. The products were separated by electrophoresis on denaturing gels using LI-COR 4200 or 4300 instruments (Lincoln, NE). Gel images were analyzed using GelBuddy (Zerr and Henikoff 2005). Pools containing putative mutant individuals were rescreened to identify the individuals, and then the target was sequenced for confirmation of the lesion and to determine zygosity as previously described (Greene et al. 2003; Till et al. 2003b). Not all targets were screened on every chromosome 2 or 3 line. For each target locus, either ∼3000 or ∼6000 lines were screened.
Thus, the data can be divided into results from screening the first half (Z2-A or Z3-A) or second half (Z2-B or Z3-B) of each population. Results from screening each half of the populations were compared to each other and to length-matched sets of Arabidopsis data (supplemental Table S1).
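The first-step PCR program described above is a touchdown profile: the annealing temperature starts at 73° and drops by 1° per cycle for 8 cycles before holding at 65° for 16 cycles. The following minimal sketch lists the annealing temperature for each cycle. The function name and its list representation are illustrative assumptions and are not part of the published protocol.

```python
def annealing_schedule(start=73, step=-1, touchdown_cycles=8,
                       plateau=65, plateau_cycles=16):
    """Return the annealing temperature (deg C) for each first-step PCR cycle."""
    # Touchdown phase: the temperature changes by `step` after every cycle.
    touchdown = [start + step * i for i in range(touchdown_cycles)]
    # Plateau phase: fixed annealing temperature for the remaining cycles.
    return touchdown + [plateau] * plateau_cycles


schedule = annealing_schedule()
print(schedule[:8])   # annealing temperatures during the touchdown phase
print(len(schedule))  # total number of cycles in the first step
```

Writing the schedule out this way makes it easy to check that the last touchdown cycle (66°) sits one step above the plateau temperature.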
Estimation of mutation frequency:
Mutation frequency was calculated as (number of mutations discovered)/[(total number of bases screened) × (total number of individuals screened)]. Each target had 200 bp subtracted from the length due to inefficient screening of the ends of amplicons (Greene et al. 2003). Twelve lines that contained multiple synonymous polymorphisms in several targets were eliminated from analysis as they were likely to have been contaminants or harbored a mutation that accelerated their rate of spontaneous mutation.
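As a hedged illustration of the calculation described above, the sketch below sums the effective bases screened over targets (trimmed length multiplied by the number of lines screened) and divides the mutation count by that total. The function name, the per-target summation, and the example inputs are assumptions made for illustration; the inputs are not data from this study.

```python
def mutation_density(n_mutations, target_lengths_bp, lines_screened,
                     end_trim_bp=200):
    """Mutations per base screened, after trimming poorly screened amplicon ends."""
    if len(target_lengths_bp) != len(lines_screened):
        raise ValueError("need one line count per target")
    # Effective bases screened: (trimmed target length) x (lines screened),
    # summed over all targets.
    bases_screened = sum(
        (length - end_trim_bp) * lines
        for length, lines in zip(target_lengths_bp, lines_screened)
    )
    return n_mutations / bases_screened


# Hypothetical example: one 1440-bp target screened on 3000 lines, 10 mutations.
density = mutation_density(10, [1440], [3000])
print(f"1 mutation per {1 / density / 1000:.0f} kb")
```

Expressing the result as kilobases per mutation matches the way densities are reported in the text (for example, ∼1/380 kb).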
RESULTS
Mutational spectrum and categories of protein-coding lesions:
The Z-lines were established over a period of 3 years (from 1996 to 1998) (Koundakjian et al. 2004). We obtained the chromosome 3 collection for the isolation of DNA in 2003 and the chromosome 2 collection in 2004. The time interval from mutagenesis to DNA collection ranged from 5 to 8 years, which is estimated to be 120–215 fly generations. We performed TILLING on eightfold pools of DNA using the mismatch-cleavage protocol that we previously described for Arabidopsis TILLING (Till et al. 2003b). For chromosome 3, we have screened 104 user-selected targets and have sequence verified 1487 single-nucleotide mutations. For chromosome 2, we have screened 44 user-selected targets and have sequence verified 521 single-nucleotide mutations. For both chromosomes, the average target size was ∼1440 bp. This implies a density for discovered mutations of ∼1/380 kb for the Z3 lines and ∼1/480 kb for the Z2 lines (supplemental Table S1).
Seventy-four percent of point mutations that we identified were G/C to A/T transitions (Table 1). This preponderance of G/C to A/T transitions is consistent with previous studies of EMS-induced mutations in Drosophila (Bentley et al. 2000; Winkler et al. 2005) and is expected on the basis of the guanine alkylating activity of EMS. By contrast, TILLING of EMS-mutagenized Arabidopsis lines resulted in 99.5% G/C to A/T transitions, suggestive of species-specific differences in repair of EMS-induced lesions (Greene et al. 2003). A/T to T/A transversions comprised the second largest class of Drosophila mutations (16%). The remaining 10% of point mutations were distributed among various categories. Interestingly, we also recovered 48 indels (2.4%), ranging from 1 to 538 bp with a median size of 3 bp. A similar rate of indel recovery was reported for EMS-induced mutations that were identified as noncomplementers of a nanos mutation in a forward-genetic screen (one deletion and one insertion among 60 alleles) (Arrizabalaga and Lehmann 1999). In contrast, no confirmed indels were detected among ∼2000 Arabidopsis mutations (Greene et al. 2003) or among ∼1000 Caenorhabditis elegans mutations (Cuppen et al. 2007).
In nearly every case, the target was selected to maximize the discovery of nonsynonymous mutations in coding exons that are likely to damage protein function. On the basis of the predicted amino acid sequences of these 148 chromosome 2 and 3 targets, we expected to discover 52% missense, 43% silent, and 5% truncation (stop codons and splice junction changes) mutations (Table 2). We observed 53% missense mutations and only 3% truncation mutations. This significant (P < 0.05) reduction in observed relative to expected truncation mutations suggests selection against deleterious lesions following mutagenesis.
Distribution of mutations among TILLed loci:
Our previous analysis of 2000 EMS-induced Arabidopsis mutations in 190 targets did not uncover any evidence that some targets were more heavily mutagenized than others (Greene et al. 2003). The discovery of 6000 more mutations in 400 additional targets has still not revealed any evidence for differentially mutagenized loci (http://tilling.fhcrc.org and our unpublished data). Insofar as this appears to be the largest data set of EMS-induced mutations that has been analyzed, it seems clear that mutagenesis with EMS is random and the underlying distribution of EMS-induced mutations in our Arabidopsis population is normal (Gaussian). Given this expectation of a normal distribution, we were surprised to find that the number of mutations was much higher for some Drosophila targets than for others (Figure 1, top). Compared to Arabidopsis (bottom), the distribution of mutations per target for chromosome 3 has a wider range (standard deviation ∼5.2 vs. 3.8; supplemental Table S1). Comparing the distributions of discovered mutations from chromosome 3 to Arabidopsis by the nonparametric Kolmogorov–Smirnov test, we find that the distributions are significantly different (D = 0.220, P = 0.001). The distribution from chromosome 2 also has a wider range than that from Arabidopsis; however, this difference is not statistically significant (D = 0.162, P = 0.31), which reflects the smaller sample size and greater variability in the chromosome 2 distribution (supplemental Figure S1).
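For readers unfamiliar with the two-sample Kolmogorov–Smirnov comparison used above, the statistic D is the largest vertical gap between the empirical cumulative distributions of the two samples. The sketch below computes D only and does not compute the associated P-value (a library routine such as `scipy.stats.ks_2samp` reports both). The function name and the example counts are hypothetical and are not the mutation counts analyzed here.

```python
from bisect import bisect_right


def ks_two_sample_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: largest gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # Both empirical CDFs are step functions, so the largest gap
    # must occur at one of the observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical mutations-per-target counts for two populations.
broad = [2, 5, 9, 14, 20, 27]
narrow = [8, 9, 10, 11, 12, 13]
print(ks_two_sample_statistic(broad, narrow))
```

Because D depends only on the ranks of the pooled observations, the test makes no assumption about the shape of either distribution, which is why it suits a comparison in which one distribution is suspected of being nonnormal.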
We considered the possibility that the apparent difference in the Drosophila and Arabidopsis distributions reflects a bias in detection. Indeed, some Drosophila targets were more difficult to screen due to differences between primer pairs in their ability to amplify targets, especially chromosome 2 targets where amplification was generally more difficult. Poor amplification is likely to cause failure to detect mutations (false negatives). These difficulties might account for some of the targets that yielded very few mutations. To minimize these sources of technical variation, we limited our analysis to the more heavily screened chromosome 3 lines. By evaluating the detection of mutations from independent screens of three overlapping amplicons, we estimate a false-negative rate of ∼30% (91/132 detected), comparable to the rate observed in Arabidopsis (25%; Greene et al. 2003). Given the similarities in false-negative rates for Arabidopsis and Drosophila TILLING, the bias in mutation distribution is likely not due to failure to detect mutations.
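The false-negative estimate above follows directly from the reported counts, reading 91/132 as the number of mutation observations detected out of those that overlapping screens should have detected (that reading is our assumption about how the counts were tallied). A minimal sketch, with a hypothetical function name:

```python
def false_negative_rate(detected, total_known):
    """Fraction of known mutations that a screen failed to detect."""
    if total_known <= 0:
        raise ValueError("total_known must be positive")
    return 1 - detected / total_known


rate = false_negative_rate(detected=91, total_known=132)
print(f"{rate:.0%}")  # 31%
```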
Retention of induced mutations is nonrandom along chromosomes:
We next considered the possibility that the nonnormal Drosophila distribution of mutations was inherent to the Z-lines. Given the evidence that EMS mutagenesis of Arabidopsis thaliana and C. elegans is random (Greene et al. 2003; Cuppen et al. 2007), it seemed unlikely that the distribution we observe in Drosophila occurred as a direct consequence of exposure to EMS. However, it is possible that the nonnormal distribution of mutations resulted from differential selection of mutations during establishment or maintenance of the lines. The Arabidopsis lines were not subject to any artificial selection other than the ability to produce seeds. However, ∼80% of the Drosophila lines initially established after mutagenesis failed to yield homozygotes and were discarded (Koundakjian et al. 2004). Because lines with homozygous lethal chromosomes were intentionally removed, we expected that the distribution would be more restricted than that observed for Arabidopsis and were surprised to find the opposite. As this selection of viable lines does not explain the increased variation observed in the distribution of mutations, we are left with the possibility that the bias was introduced during maintenance of the lines. The Z-lines have been maintained in live culture for many generations, unlike Arabidopsis, where the third generation of mutant lines has been stored as seed stock for years. For chromosome 2, lines were maintained for 190–215 generations prior to collection for screening and for chromosome 3, lines were maintained for 120–145 generations.
One way that long-term maintenance of stocks might result in differential retention of mutations is if recombination occurred between the mutagenized chromosome and the balancer chromosome, leading to loss of mutant alleles by drift or selection. If so, then we would expect that some lines that are demonstrably heterozygous because they retain the balancer chromosome will harbor homozygous mutations. Consistent with this possibility, we discovered 91 homozygous mutations in balanced chromosome 3 lines. These homozygous mutations are likely to have been induced, because they show a similar proportion of G/C to A/T transitions (2/3) as were found for the entire population. Therefore, loss by recombination is a plausible mechanism that would result in a variable distribution of number of alleles recovered per locus in Drosophila.
The prevention of recombination and loss of mutant alleles is the major reason that balancer chromosomes contain multiple inversions, because rearrangement breakpoints suppress recombination (Gong et al. 2005). It follows that alleles closer to rearrangement breakpoints will be less likely to undergo reciprocal recombination or gene conversion in heterozygotes. Accordingly, we can test the possibility that recombination contributes to the observed distribution of Drosophila mutations by asking whether there is evidence for increased frequency of mutant alleles located in regions that contain breakpoints on the balancer chromosome compared to mutations located in regions devoid of nearby breakpoints on the balancer. Indeed, when we divide the chromosome 3 map into intervals separated by balancer inversion breakpoints (Figure 2), where mutant alleles are considered close if they are within 3 cM (as defined on the standard genetic map) of the nearest breakpoint, we find a trend consistent with this possibility (supplemental Figure S2). For the 97 screens with amplicon targets that are close to a balancer inversion breakpoint, we recovered an average of 10 mutations per amplicon, whereas for the 67 screens with similarly sized targets that are far from a breakpoint, we recovered an average of 7.5 mutations per amplicon (Table 3). While not statistically significant (P < 0.1), this suggests that ∼1/4 of the mutations far from an inversion breakpoint have been lost by recombination.
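The close/far classification and the resulting estimate of loss can be sketched as follows. Targets are labeled close when they lie within 3 cM of the nearest balancer breakpoint, and the implied fraction lost is one minus the ratio of the far and close averages. The function names, breakpoint positions, and screen records below are hypothetical placeholders; only the 3-cM threshold and the averages of 10 and 7.5 come from the text.

```python
def nearest_breakpoint_distance(position_cm, breakpoints_cm):
    """Genetic-map distance (cM) from a target to the nearest breakpoint."""
    return min(abs(position_cm - b) for b in breakpoints_cm)


def mean_mutations_by_class(screens, breakpoints_cm, threshold_cm=3.0):
    """Average mutations per screen for targets close to or far from a breakpoint.

    `screens` is a list of (map_position_cM, mutations_found) pairs.
    """
    groups = {"close": [], "far": []}
    for position, count in screens:
        distance = nearest_breakpoint_distance(position, breakpoints_cm)
        groups["close" if distance <= threshold_cm else "far"].append(count)
    return {k: (sum(v) / len(v) if v else None) for k, v in groups.items()}


# Hypothetical breakpoints and screens, for illustration only.
means = mean_mutations_by_class(
    screens=[(11.0, 10), (12.5, 12), (30.0, 7), (40.0, 8)],
    breakpoints_cm=[10.0, 50.0],
)
print(means)

# Using the averages reported in the text (10 close vs. 7.5 far):
implied_loss = 1 - 7.5 / 10
print(implied_loss)  # 0.25
```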
Recombination by itself will not cause loss of mutant alleles. However, with an effective population size of <25 individuals per vial (on the basis of the approximate number of individuals transferred each generation during stock maintenance) extinction of some mutant alleles will inevitably occur, a process that is exacerbated by bottlenecks during routine maintenance of fly stocks. In addition to population drift, we expect that deleterious alleles will be lost by purifying selection, consistent with our finding that truncation mutations are the most sensitive to loss. To evaluate this possibility, we asked whether truncation mutations show evidence of differential retention depending on distance from the nearest balancer chromosome inversion breakpoint. We found 68 truncation mutations for chromosome 3, and 23 for chromosome 2. Considering only chromosome 3, the 97 screens with amplicon targets that are close to inversion breakpoints yielded 51 truncation mutations, whereas the 67 screens with similarly sized targets that are far from inversion breakpoints yielded only 17 truncation mutations (P < 0.05; Table 3). This finding supports the interpretation that the higher number of alleles per locus discovered in amplicons close to balancer rearrangement breakpoints can be accounted for by recombination between mutagenized chromosome 3 and the balancer chromosome.
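The text does not state which test produced the reported significance level for truncation mutations. As one plausible check, and purely as an assumption on our part, the sketch below applies a chi-square goodness-of-fit test (1 degree of freedom) under the null hypothesis that truncation mutations are distributed in proportion to the number of screens in each class (97 close, 67 far). The function name is hypothetical.

```python
import math


def chi_square_two_groups(observed, weights):
    """Chi-square goodness of fit (1 d.f.) of two counts against given weights."""
    total = sum(observed)
    # Expected counts if the total were split in proportion to the weights.
    expected = [total * w / sum(weights) for w in weights]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Truncation mutations (close, far) and number of screens (close, far).
chi2, p = chi_square_two_groups(observed=[51, 17], weights=[97, 67])
print(f"chi2 = {chi2:.2f}, P = {p:.4f}")
```

Under this particular null hypothesis the excess of truncation mutations near breakpoints falls below the 0.05 threshold, in agreement with the direction of the reported result; a different null (for example, proportionality to all mutations recovered in each class) would give a different value.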
DISCUSSION
We have shown that a point-mutation resource based on screening a continuously breeding population is a practical reverse-genetic strategy. Despite the fact that over a decade has passed since the Drosophila Z-lines were established, we are still able to recover useful allelic series for the large majority of genes subject to TILLING. Nevertheless, our study has revealed that the number of EMS-induced mutations recovered per Drosophila target is more variable than what we have observed for EMS-induced mutations in a long-term Arabidopsis reverse-genetic resource, where the distribution of mutations was as expected for random mutagenesis. This variability appears to be a general feature of the Z-lines, as it was seen independently for the chromosome 2 and chromosome 3 lines. We attribute most of the variability in number of mutations recovered to gradual attrition of mutations during stock maintenance.
In addition to attrition during maintenance, we expect that the Z-lines have been subject to other processes that might have affected the density of mutations following EMS mutagenesis. The overall mutation rate that we observed is about half that observed in a previous TILLING screen (Winkler et al. 2005), and at least some of this difference can be accounted for by attrition during stock maintenance. Another likely contribution to this difference was the use of a higher dose of EMS in the earlier study (50 mM for Winkler et al. 2005 vs. 25 mM for the Z-lines). In addition, the generation of mosaic offspring or selection during the first few generations following mutagenesis might have contributed to the lower mutation rate that we observed (Koundakjian et al. 2004). For example, nearly all of the homozygous lethal chromosomes were culled out during the F3 generation, with the aim of establishing a forward-genetic resource for phenotypic screening of adult phenotypes. However, this procedure might have reduced the overall mutation density by favoring the lines derived from adult males that had ingested relatively less EMS during mutagenesis. This selection step would have reduced the mutation density, but neither the removal of lethals nor the retention of mosaic lines would be expected to affect the distribution of mutations, including those that are silent (supplemental Table S2).
We also expect that some spontaneous mutations have accumulated during stock maintenance. A study of spontaneous mutation accumulation in lines diverged by ∼200 generations discovered 25 point mutations by screening ∼20 million bp, or 6 × 10⁻⁹ mutations/bp/generation (Haag-Liautard et al. 2007). For chromosome 3, we observed 2.6 × 10⁻⁶ mutations/bp after ∼120 generations. Therefore, if the rate of spontaneous mutation varies little between strains and culture conditions, then we might predict a maximum of ∼30% of the mutations that we identified to have accumulated spontaneously. However, it is likely that far fewer spontaneous mutations have accumulated in the Z-lines, because the rate estimated by Haag-Liautard et al. (2007) is based on sib-mated lines, so mutations had a higher chance of being fixed. Although G/C to A/T transitions comprised the largest fraction of spontaneous mutations (11/25), the second largest fraction (6/25) consisted of A/T to G/C transitions (Haag-Liautard et al. 2007), a more than sixfold excess over what we observed in the Z-lines (3.9%). Therefore, even if all of the Z-line A/T to G/C mutations were spontaneous, spontaneous mutations would be expected to account for only ∼15% of the total, placing an upper limit on the spontaneous mutation accumulation rate. In any case, spontaneous mutations would be expected to occur without bias in favor of particular genes, making it unlikely that their accumulation contributes to the variability we observed.
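The two upper bounds in the preceding paragraph can be reproduced from the numbers quoted there. The sketch below is a worked calculation only; the function names are hypothetical, and the small differences from the rounded values in the text (∼30% and ∼15%) reflect rounding.

```python
def spontaneous_fraction(n_spont, bp_screened, generations_diverged,
                         generations_maintained, observed_density):
    """Upper bound on the share of observed mutations that arose spontaneously."""
    # Spontaneous rate per bp per generation from the accumulation study.
    rate = n_spont / (bp_screened * generations_diverged)
    # Spontaneous mutations expected per bp over the maintenance period.
    expected = rate * generations_maintained
    return expected / observed_density


def bound_from_spectrum(share_in_collection, share_among_spontaneous):
    """Bound implied if one mutation class in the collection were all spontaneous."""
    return share_in_collection / share_among_spontaneous


by_rate = spontaneous_fraction(25, 20e6, 200, 120, 2.6e-6)
by_spectrum = bound_from_spectrum(0.039, 6 / 25)
print(f"bound from rate: {by_rate:.0%}")
print(f"bound from A/T to G/C share: {by_spectrum:.0%}")
```

The second bound is the tighter of the two because it uses the mutational spectrum rather than the absolute rate, and so does not depend on the generation estimate.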
We also observed homozygosity for 6% of the chromosome 3 mutant alleles in lines that were heterozygous for a balancer chromosome, evidence for double crossing over or gene conversion. We then tested the possibility that such recombination events can account for the variability in the distribution of mutations. We found that truncation mutations were significantly more likely to be in genes close to balancer breakpoints, which suggests that they were better protected from loss due to recombination. Local recombination with balancer chromosomes will gradually reduce the utility of lines during long-term maintenance: mutations that are lost before isolation of DNA for screening will result in the discovery of fewer mutations than expected for a gene, and mutations lost after DNA isolation will represent unrecoverable false positives. Although several years had passed between mutagenesis and DNA isolation, and screening has continued for >3 years, the Z-lines remain a valuable, if imperfect, reverse-genetic resource.
A variety of different reverse-genetic resources have proliferated for Drosophila and other model organisms (Alonso and Ecker 2006; Barstead and Moerman 2006; Dietzl et al. 2007; Sivasubbu et al. 2007), but there is still a strong demand for allelic series of point mutations. Knockouts are often too severe to be useful for studying vital functions, and RNAi-based knockdowns raise issues of tissue specificity and off-target effects. However, despite incremental efficiency improvements in our TILLING pipeline, screening a large enough population to identify an allelic series remains relatively time consuming and expensive, requiring thousands of dollars per target to screen 3000 lines for an average yield of ∼9 chromosome 3 point mutations. Because of polymorphic differences between the mutagenized chromosome and the balancer, TILLING balanced fly lines is inherently less efficient than TILLING Arabidopsis lines, which are derived from a single genome. Using our heteroduplex mismatch-cleavage method, gel bands produced by common polymorphisms in many individuals can mask new mutations present in single individuals (Slade et al. 2005; Cooper et al. 2008), and unrecognized primer mismatches likely increase the failure rate. Fortunately, massively parallel sequencing following oligonucleotide-based capture has the potential to make TILLING faster and cheaper without relying on heteroduplex-based screening (E. Greene, J. Cooper and S. Henikoff, unpublished results). As a result, we envision the simultaneous interrogation of dozens or hundreds of gene regions. For this prospect to be practical, a long-term point mutation resource needs to be established, and in the case of Drosophila, this means a reasonably stable collection. As we have shown here, such a long-term resource is feasible, and we expect that our analysis of Z-line mutation retention will provide a useful guide for such an effort.
We are grateful to Charles Zuker for generously supporting the development and maintenance of the Zuker collection, to Lorraine Chuman and Andrea Lougheed for maintaining and distributing the Z-lines to Fly-TILL users, to members of the Zuker lab for helping Fly-TILL get started, and to Luca Comai for support during the early phases of this project. We thank past and present members of the Fly-TILL Project, including Elisabeth Bowers, Maggie Darlow, Aaron Holm, Robert Laport, Lindsay Soetaert, and Kim Young, and Kathleen Wilson for help with fly collections. We also thank the many users in the fly community for supporting Fly-TILL with their orders. This work was partially supported by National Science Foundation grant 0234960 to S.H., by National Institutes of Health grant U54 HD42454 and a Washington Research Foundation Professorship to B.W., and by the Howard Hughes Medical Institute.
1 Present address: FAO/IAEA Agricultural and Biotechnology Laboratory, A-2444 Seibersdorf, Austria.
Communicating editor: R. S. Hawley
- Received June 9, 2008.
- Accepted July 20, 2008.
- Copyright © 2008 by the Genetics Society of America