Chromosome 4, the smallest autosome (~5 Mb in length) in Drosophila melanogaster, contains two major regions. The centromeric domain (~4 Mb) is heterochromatic and consists primarily of short, satellite repeats. The remaining ~1.2 Mb, which constitutes the banded region (101E–102F) on salivary gland polytene chromosomes and contains the identified genes, is the region mapped in this study. Chromosome walking was hindered by the abundance of moderately repeated sequences dispersed along the chromosome, so we used many entry points to recover overlapping cosmid and BAC clones. In situ hybridization of probes from the two ends of the map to polytene chromosomes confirmed that the cloned region spanned the 101E–102F interval. Our BAC clones comprised three contigs; one gap was positioned distally in 102EF and the other was located proximally at 102B. Twenty-three genes, representing about half of our revised estimate of the total number of genes on chromosome 4, were positioned on the BAC contigs. A minimal tiling set of the clones we have mapped will facilitate both the assembly of the DNA sequence of the chromosome and a functional analysis of its genes.
The fruit fly Drosophila melanogaster has two sex chromosomes and three autosomes. The smallest chromosome, 4, is ~5 Mb in length (Locke and McDermid 1993) and consists of two major regions. The centromeric region is heterochromatic and the three bands, h59, h60, and h61, identified by Hoechst-33258 staining and N banding, contain primarily short satellite repeats (Lohe et al. 1993). This region forms part of the highly condensed, underreplicated chromocenter seen in polytene chromosome spreads. The other region, which contains the remaining ~1.2 Mb, constitutes cytogenetic bands 101E–102F on polytene salivary gland chromosomes. Of the ~75 genes Hochman (1976) predicted on chromosome 4, few have been mapped but all are likely in the 101E–102F region. Most Drosophila species have a similar, presumably homologous, small chromosome that appears as a “dot” in metaphase spreads. It is referred to as the F element in Muller's description of chromosome segments in Drosophila species (Muller 1940).
In addition to its small size, chromosome 4 is atypical in several other respects, all of which relate to the long-standing presumption that this chromosome is, in some ways, heterochromatic in nature. First, the banded region of chromosome 4 often shows a diffuse and poorly defined appearance that is similar to regions of heterochromatin. A second similarity to heterochromatin was revealed when it was shown that the chromosomal protein HP1, thought to be an important constituent of heterochromatin, bound to several sites along the chromosome (Eissenberg et al. 1992). Third, repetitive DNA sequences normally confined to heterochromatin are distributed throughout the banded region of 4 (Miklos et al. 1988). Recently, a new very short repeat sequence, DINE-1, has been found distributed along chromosome 4 (Locke et al. 1999a,b). Fourth, similarities to heterochromatin are indicated by the behavior of P-element transgenes inserted into this region since they frequently show variegated expression of a white+ marker gene (Wallrath and Elgin 1995; Wallrath et al. 1996). This is a typical characteristic of insertions near heterochromatic boundaries. Finally, chromosome 4 normally fails to undergo crossing over during female meiosis (Bridges 1935).
The small size of chromosome 4 makes cytogenetic localization of its genes difficult, while the lack of crossing over precludes the construction of a typical genetic map. Consequently, many of the genes on 4 are uncharacterized at present. Hochman (1976), who undertook a systematic mutational screen to catalog the loci on 4, recovered lethal alleles in 36 essential loci and estimated that an additional 30 loci were missed in his screens. The mutants were roughly mapped to one of three regions using the two existing deletions. When we initiated this study, relatively few of the genes had been accurately mapped. More recently, however, several cloned chromosome 4 genes have been localized using in situ hybridization. We undertook the construction of a physical map of the chromosome to facilitate an ordering of the genes.
Using known, unique chromosome 4 sequences as probes, we isolated cosmid clones and initiated several chromosome walks with the goal of forming one cosmid contig spanning the gene-containing region defined cytologically as 101EF–102F. Despite screening one existing cosmid library and building three others, our cosmid map contained several gaps. However, by screening three new bacterial artificial chromosome (BAC) libraries, all but two of these gaps were filled. We now have clones that cover 1.1 Mb of chromosome 4. The physical map and its construction are presented here.
MATERIALS AND METHODS
Maintenance of stocks: Drosophila were grown at room temperature on a standard yeast-sucrose medium described in Nash and Bell (1968). In the later stages of the project, a standard cornmeal and molasses medium was used.
D. melanogaster cosmid libraries: Drosophila genomic DNA was prepared as described in Locke and Tartof (1994) from adults of the Oregon-R strain. A Sau3AI cosmid library was constructed by ligating partially digested, size selected, and dephosphorylated genomic DNA fragments into BamHI linearized pDO193 (Ahmed 1994). The cloning site of pDO193 was modified slightly to introduce flanking promoter sites for T7 and T3 RNA polymerase. The ligated products were packaged in vitro using Gigapack XL extracts (Stratagene, La Jolla, CA). Insert-containing colonies were picked into 384-well microtiter plates, stamped onto Genescreen Plus membranes, and processed using standard procedures (NEN Life Science Products, Inc.). A similar procedure was used to construct a KpnI cosmid library and a HindIII fosmid library, using the vector pBeloBAC11 (Wang et al. 1997). A Drosophila (iso-1) genomic cosmid library, made with Sau3AI (Tamkun et al. 1992), was also screened.
D. melanogaster BAC libraries: Three BAC genomic libraries were screened during the course of this work. The RPCI-98 BAC library (EcoRI partial digest) was constructed by Aaron Mammoser and Kazutoyo Osoegawa at Roswell Park Cancer Institute, in collaboration with the Drosophila Genome Project at Lawrence Berkeley National Laboratory. The DrosBACN library (NdeII partial digest) and the DrosBACH library (HindIII partial digest) were both constructed by Alain Billaud at the Centre d'Etude du Polymorphisme Humaine (CEPH) and were distributed by the UK Human Genome Mapping Project (HGMP) Resource Centre. The BAC libraries were obtained as high-density gridded filters.
Escherichia coli host strains: The Sau3AI library was plated in E. coli host strain DH5α (GIBCO BRL Life Technologies, Gaithersburg, MD). The KpnI cosmid library and the HindIII fosmid library were plated in E. coli host strain DH10B (GIBCO BRL Life Technologies). The Sau3AI and KpnI cosmid libraries were plated additionally in E. coli host strain PMC107 (Raman et al. 1997). All of the BAC clones were grown in host strain DH10B.
Radiolabeling of probes: DNA probes (cDNAs or genomic fragments) were labeled by random priming with [α-32P]dCTP as described by Feinberg and Vogelstein (1984). Fragments used as probes for screening library filters were shown to be free of repetitive DNA by hybridizing to blots of genomic DNA digested with a restriction endonuclease. RNA transcripts labeled with [α-32P]UTP were made using T7 and T3 RNA polymerases following the manufacturer's recommendations (Boehringer Mannheim, Indianapolis). Transcripts likewise were shown to be unique before being used to screen the library filters.
Screening of library filters: DNA-DNA hybridization reactions were performed in 50 mM Na3PO4 (pH 7.2), 6× SSC, 1% SDS at 65°. RNA-DNA hybridization reactions were performed in 50% formamide, 6× SSC, 1% SDS at 42°. High stringency washes consisted of a 30-min wash in 2× SSC, 0.1% SDS at 65°, a 30-min wash in 0.2× SSC, 0.1% SDS at 65°, and a 30-min wash in 0.1× SSC, 0.1% SDS at 65°.
Assembly of clones into contigs: Clones that were deemed positive for a particular probe were propagated in E. coli. Cosmid DNA was extracted by alkaline lysis. BAC clone DNA was isolated with a modified alkaline lysis protocol from the UK HGMP Resource Centre using ProCipitate reagent (Ligochem, Inc.). The clones were then assembled into contigs using a combination of cross-screening (Locke et al. 1996), restriction fragment analysis, and hybridization reactions using unique end fragments. In situ hybridization to polytene squashes was then used to confirm the contig location on chromosome 4.
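The grouping logic behind cross-screening can be pictured as follows. This Python sketch is illustrative only: the clone and probe names are hypothetical, and the union-find grouping is an assumption about how hybridization results could be tabulated, not the procedure actually used in this study. It treats any two clones that share a unique probe as overlapping, which ignores repeats and false positives.

```python
from collections import defaultdict


def group_into_contigs(hits):
    """Group clones into candidate contigs.

    hits maps a probe name to the set of clones it hybridized to.
    Clones that share at least one probe are merged into one group.
    """
    parent = {}

    def find(x):
        # Return the representative clone of x's group.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for clones in hits.values():
        ordered = sorted(clones)
        if not ordered:
            continue
        find(ordered[0])  # register clones hit by a single-clone probe
        for other in ordered[1:]:
            union(ordered[0], other)

    contigs = defaultdict(set)
    for clone in list(parent):
        contigs[find(clone)].add(clone)
    return sorted((sorted(c) for c in contigs.values()), key=lambda c: c[0])


# Hypothetical hybridization table: probe -> positive clones.
example_hits = {
    "probe1": {"cosA", "cosB"},
    "probe2": {"cosB", "cosC"},
    "probe3": {"cosD"},
}
print(group_into_contigs(example_hits))
```

In this toy table, cosA, cosB, and cosC fall into one group because probe2 links cosB to cosC, while cosD remains a separate single-clone contig.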
In situ hybridization to salivary chromosomes: Polytene chromosome squashes were prepared from larval salivary glands using the translocation T(1;4)wm5 (Lindsley and Zimm 1992), which reciprocally exchanges the banded region of chromosome 4 (101EF–102F) with the tip of the X chromosome (1A–3C). This translocation moves chromosome 4 away from the chromocenter thereby aiding in situ cytology. DNA probes were made by labeling whole BAC clones with either digoxigenin-11-dUTP or biotin-16-dUTP, and these were hybridized to polytene squashes using standard procedures (Boehringer Mannheim). Digoxigenin-labeled probes were detected with anti-digoxigenin rhodamine, Fab fragments. Biotin-labeled probes were detected by binding with streptavidin fluorescein, followed by a signal amplification step using biotinylated anti-streptavidin and a final binding step with streptavidin fluorescein. DAPI (4′,6-diamidino-2-phenylindole) was used as a counterstain. Rhodamine and fluorescein signals were observed using a fluorescence microscope with the appropriate filters.
RESULTS
Construction of cosmid contigs: Cosmid libraries were initially screened with unique probes mapped to chromosome 4: cubitus interruptus (ci), zfh-2, and eyeless (ey). Clones recovered using these initial probes furnished fragments that were used to identify adjacent cosmid clones in the libraries. As additional unique chromosome 4 probes were obtained, each was used to initiate a new chromosome walk or was mapped to clones in an existing contig. Chromosome walks were extended as long as unique terminal fragments could be identified or until a gap was reached. Using this methodology, we were able to obtain clones that covered about 800 kb of the estimated 1200 kb of the cytological region 101EF–102F (Figure 1). They comprised six contigs of which five were three cosmids or fewer in length. The sixth, which was estimated to be ~400 kb long, covered the region 102B to 102D as determined by in situ hybridization of the end clones to polytene chromosomes (data not shown). Unique probes from many of the genes shown at the top of Figure 1 were mapped to various cosmid clones, although a number (e.g., 1339, spa and pho) were not represented in the cosmid libraries.
Construction of BAC contigs: When the three BAC clone libraries became available (see MATERIALS AND METHODS), we began by screening them with selected unique probes from the existing cosmid clones. We did this first to confirm the cosmid contigs and second to obtain BACs that spanned the gaps between the cosmid contigs. Using the BAC clones, we were able to coalesce the six cosmid contigs into two BAC contigs (Figure 1). Furthermore, the BAC libraries contained clones that hybridized to calcium-activated protein secretion (CAPS) (Walent et al. 1992), a probe that had not been used to screen for cosmid clones. This produced an additional BAC contig located distally on the chromosome that consisted of three overlapping clones (BACH59K20, BACN40B03, and BACN5O16), two of which are shown in Figure 1.
The three BAC contigs (Figure 1) extend over the cytological region 101EF to 102F, the full length of the visible polytene chromosome. We used the terminal BAC clones (BACR5L22 and BACN5O16) that defined the proximal and distal limits of the three contigs as probes to polytene chromosomes to map their cytological location. Clone BACR5L22 hybridizes to the most proximal region on chromosome 4, 101EF, while the clone BACN5O16 hybridizes to the distal-most band at 102F and should include the telomere region (Figure 2A) since BACN5O16 also hybridizes to the telomeres of other chromosomes.
The distal gap is substantially larger than the proximal gap. When its flanking clones, BACR22J20 and BACH59K20, are used as probes to polytene chromosomes, there is a visible gap that spans several polytene bands (Figure 2B). On the basis of the width of the adjacent BAC clone hybridization signals we estimate this gap to be about one or two BACs in length (about 50–200 kb). We have a probe derived from a chromosome 4 P-element insertion, 118-E15 (Wallrath and Elgin 1995), that we mapped to within this gapped region by in situ hybridization. This probe also cross-hybridizes to a single site on the X chromosome. We have used this fragment to screen the BAC libraries and recovered >20 clones, all of which overlap and correspond to the X-chromosome site. This provides a direct comparison of the likelihood for recovering chromosome 4 vs. X-chromosome BAC clones and represents what may be a typical bias against chromosome 4 sequences in this, and possibly other, clone libraries.
The proximal gap, flanked by clones BACN4K08 and BACR1E21, is very small. Using these BACs as probes to polytene chromosomes reveals no visible gap between the two regions of hybridization (Figure 2C). Probes near the ends of both BAC clones identify common-sized restriction fragments on Southern transfers of genomic DNA. From their size we estimate that this gap is less than 1 kb. Although the very ends of both BACs BACN4K08 and BACR1E21 are repetitious, single-copy sequences somewhat removed from the gap in both clones were identified and used to probe the three BAC libraries. However, these probes failed to identify any new BACs that would have closed the gap. The repetitive sequences at the BAC clone ends also precluded us from using PCR methods to span this gap.
Localization of previously identified loci: During the construction of this physical map, we obtained a set of 23 cloned sequences from genes or anonymous cDNAs that had been mapped to chromosome 4 by various methods. These were positioned on the BAC and cosmid clones and their unambiguous order along the minimal tiling set of clones is shown at the top of Figure 1. Where known, the distal-proximal orientation of genes is shown. The location shown for the highly repetitious clone described as “Dr. D” (Miklos et al. 1988), which contains sequences of both DINE-1 and the Hoppel transposon (Locke et al. 1999a,b), was determined using a single-copy subfragment. The unique probes also hybridized to BAC clones not shown in Figure 1 that are listed in the appendix. The number of clones that hybridized to a particular probe varied from 1 to 25, but on average 5 clones were identified by each unique probe from the three libraries.
Distribution of Sau3AI restriction fragments in regions represented by or lacking cosmid clones: Some regions, for example, those represented by the clone 1339, were absent from our cosmid libraries but present in BAC clones, suggesting a bias may have existed in the distribution of cosmid clones. A bias of Sau3AI sites (used to make the largest cosmid library) could have distorted the production and/or recovery of the smaller cosmid clones. Since we now have BAC clones from regions lacking cosmid clones, we can ask whether the frequency of Sau3AI sites (as revealed by the size of Sau3AI fragments) differs in these regions compared to those represented in the cosmid libraries. We took three BAC clones from a cosmid-poor region (BACN28B18, BACN4K08, BACR1E21) and three from a cosmid-rich region (BACR44L03, BACR21N24, BACR30L15) and digested their DNA with Sau3AI (Figure 3A). The restriction fragments of both groups are similar in size suggesting no difference in the distribution of Sau3AI sites.
To show that we could detect a restriction enzyme cleavage bias through restriction fragment size differences, we used two other restriction enzymes, MseI and HhaI. Whereas Sau3AI cleaves at the sequence GATC, in which AT and GC content are equal, MseI cleaves at an AT-rich site (TTAA), while HhaI cleaves at a GC-rich site (GCGC). A comparison of the size of restriction fragments produced by these enzymes in the same BAC clones shows a difference in cleavage frequency. MseI produces mostly small fragments of <1 kb (Figure 3C), whereas HhaI produces fragments that are mostly greater than 1 kb (Figure 3B). The easily observable difference in fragment lengths indicates we could have detected differences in Sau3AI restriction if they had existed. Thus we can rule out a bias in Sau3AI cleavage sites as the cause of the incomplete coverage of the chromosome by the cosmid clones. Furthermore, these data support the idea that chromosome 4, at least as it is represented by these BAC clones, is AT rich.
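The comparison above was made on gels, but the same reasoning can be sketched in silico. The Python example below is a minimal illustration, not part of the original analysis: the 20-bp sequence is made up, and the cut positions within each recognition site (before the G of GATC for Sau3AI, after the first T of TTAA for MseI, after the third base of GCGC for HhaI) are the commonly cited ones and should be treated as assumptions to check against an enzyme database.

```python
def digest(seq, site, cut_offset):
    """Return fragment lengths after cutting seq at every occurrence of site.

    cut_offset is the cut position within the recognition site
    (0 means immediately before the first base of the site).
    """
    seq = seq.upper()
    cuts = []
    idx = seq.find(site)
    while idx != -1:
        cuts.append(idx + cut_offset)
        idx = seq.find(site, idx + 1)  # step by 1 so overlapping sites are found
    bounds = [0] + cuts + [len(seq)]
    # Drop zero-length pieces that arise when a cut falls at a sequence end.
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


# Recognition site and assumed cut offset for each enzyme.
ENZYMES = {
    "Sau3AI": ("GATC", 0),
    "MseI": ("TTAA", 1),
    "HhaI": ("GCGC", 3),
}

toy = "AAGATCTTTTAACCGCGCAA"  # made-up sequence with one site for each enzyme
for name, (site, offset) in ENZYMES.items():
    print(name, digest(toy, site, offset))
```

Applied to real BAC sequence, the distribution of the returned lengths would play the role of the lanes in Figure 3: similar distributions for two groups of clones would argue against a site-frequency bias.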
DISCUSSION
The presence of many gaps in our initial cosmid contig is not unexpected because several factors are known to bias the recovery of genomic regions using cosmid vectors. These include a bias in the distribution of cleavage sites for the restriction enzyme used to construct the libraries, the inability of a fragment to be propagated in E. coli, and the chance probability of a clone not being included. Attempts to walk across several of the gaps were confounded by the presence of repeated sequences at the termini of the flanking clones. Locke et al. (1999a,b) estimated that approximately half the restriction fragments from chromosome 4 cosmid clones contain repeated sequences. To minimize this problem we screened BAC libraries, which contained inserts about three times as long as those in the cosmid clones, and recovered clones that reduced the number of gaps to two. It appears that sequences from these two gaps are not present in any of the three BAC libraries, although these libraries do contain a fair representation of other chromosome 4 sequences. The appendix shows that cloned regions vary in coverage from 1 to 25 BACs, with an average of ~5 clones per unique probe region.
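To put the "chance probability of a clone not being included" in perspective, a rough calculation is sketched below in Python. It assumes that clones are sampled randomly and uniformly, so that the number of clones covering a given point follows a Poisson distribution with the observed mean of ~5 clones per probe. That assumption is an idealization introduced here for illustration; the observed range of 1 to 25 clones per probe shows that real coverage is far from uniform.

```python
import math


def prob_region_missing(mean_clones):
    """Poisson probability that a region is covered by zero clones."""
    return math.exp(-mean_clones)


# ~5 clones per unique probe is the average reported from the three BAC libraries.
p_missing = prob_region_missing(5.0)
print(f"P(no clone | mean coverage 5) = {p_missing:.4f}")
```

Under this idealized model a given region would be absent by chance less than 1% of the time, which fits the view that the two remaining gaps reflect something other than sampling alone.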
Absence of clones in libraries built using partial digestion with Sau3AI might be due to a bias in the distribution of its cleavage site, GATC. This possibility was discarded for our partial Sau3AI cosmid library because BAC clones from the gapped regions of the cosmid contigs had Sau3AI fragment sizes equivalent to those of BAC clones from regions represented by cosmid clones. Thus these sequences must be absent from these cosmid libraries for other reasons.
Previous in situ hybridization data have shown that chromosome 4 has a sequence distribution that differs from the other chromosomes (Pardue et al. 1987; Lowenhaupt et al. 1989). Probing with the simple sequence (dC-dA)n·(dG-dT)n showed hybridization to the euchromatic arms of all the chromosomes except chromosome 4. The (dC-dA)n·(dG-dT)n probe distribution parallels that of the presence of crossing over (all euchromatic arms except chromosome 4). A large-scale genomic DNA sequence available for chromosome 4 was obtained from two cosmids located at some distance from each other (Ahmed and Podemski 1997; Locke et al. 1999a,b). Both sequences are very AT rich and together show only 34.8% G + C. This is substantially less than the genome-wide average of 43% (Laird and McCarthy 1969) or the value of 40.8% for an ~2.9-Mbp section of euchromatin from chromosome 2L (Ashburner et al. 1999). The large reduction in GC content on chromosome 4 is further reflected in the prevalence of the restriction site for MseI (TTAA) compared to that for HhaI (GCGC) in the BAC clones we tested. The sequence peculiarities in chromosome 4 DNA [e.g., AT richness and lack of (dC-dA)n·(dG-dT)n] may be relevant to the lack of crossing over on this chromosome.
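The link between base composition and restriction-site frequency can be made concrete with a simple calculation. The Python sketch below assumes that bases occur independently, with G and C equally frequent and A and T equally frequent; that is a textbook simplification introduced here, not a model used in this study. It gives the expected spacing between sites at the 34.8% G + C measured for the two chromosome 4 cosmids and at the 43% genome-wide average.

```python
def site_probability(site, gc):
    """Probability that a random position starts the given site.

    Assumes independent bases with P(G) = P(C) = gc / 2 and
    P(A) = P(T) = (1 - gc) / 2.
    """
    base_prob = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    prob = 1.0
    for base in site:
        prob *= base_prob[base]
    return prob


def expected_fragment_length(site, gc):
    """Mean distance in bp between consecutive sites under the same model."""
    return 1.0 / site_probability(site, gc)


SITES = (("MseI", "TTAA"), ("Sau3AI", "GATC"), ("HhaI", "GCGC"))

for gc in (0.348, 0.43):
    for enzyme, site in SITES:
        length = expected_fragment_length(site, gc)
        print(f"GC = {gc:.3f}  {enzyme:7s} {site}  ~{length:.0f} bp")
```

At 34.8% G + C this model predicts a TTAA site roughly every 90 bp and a GCGC site roughly every 1.1 kb, in the same direction as the small MseI fragments and larger HhaI fragments seen in the BAC digests.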
We have localized 23 genes and anonymous cDNAs to the clone set shown in Figure 1. Included among the genes are four (ci, ey, spa/sv, and bt) of the seven visible mutations known to lie on the chromosome. Hochman (1976) described eight visible mutations on chromosome 4; however, it has recently been shown that spa and sv are alleles at a single complex locus (Fu et al. 1998). We have placed this locus within the region of overlap between BACR17F10 and BACR22J20 (Figure 1). Hochman (1976) used an approximation to the Poisson distribution for those loci that mutated either once or twice to predict that ~72 loci were present on the chromosome. However, a reexamination of his data, to include all genes with fewer than six alleles (25 in total), suggests the total number of loci on the chromosome is ~48. Another estimate of gene number, one protein coding gene per 13 kb (Ashburner et al. 1999), suggests chromosome 4 should contain 92 genes. But the analysis of two chromosome 4 cosmids containing 63 kb (Locke et al. 1999a,b) indicates that the gene density on chromosome 4 may not be as high as that on 2L and supports the revised lower estimates. These estimates may soon be out of date since the two major autosomes have been mapped (Hoskins et al. 2000) and the Drosophila genome sequence is being determined. The annotation of chromosome 4 sequences will provide a better estimate of gene number.
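The two kinds of estimate discussed here can be written out explicitly. In the Python sketch below, the density calculation uses the figures given in the text (a ~1200-kb banded region and one gene per 13 kb). The Poisson function shows one common way to estimate the number of loci that were never hit from the numbers of loci hit once and twice; it is not necessarily the exact calculation Hochman used, and the allele counts passed to it are hypothetical values chosen only to show the arithmetic.

```python
def genes_from_density(region_kb, kb_per_gene):
    """Expected gene count for a region at a given gene density."""
    return region_kb / kb_per_gene


def poisson_total_loci(n_one, n_two, n_observed):
    """Estimate the total number of loci from a mutational screen.

    For a Poisson distribution, n_two / n_one = lam / 2 and
    n_zero = n_one / lam, where n_zero is the number of loci with no alleles.
    """
    lam = 2.0 * n_two / n_one  # estimated mean number of alleles per locus
    n_zero = n_one / lam       # estimated number of loci missed entirely
    return n_observed + n_zero


# Figures from the text: ~1200 kb of banded region, one gene per 13 kb.
print(round(genes_from_density(1200, 13)))

# Hypothetical counts, for illustration only (not Hochman's data).
print(poisson_total_loci(n_one=10, n_two=5, n_observed=20))
```

The density calculation reproduces the figure of 92 genes. The Poisson estimate depends strongly on which allele classes are used to fit the mean, which is one reason a reexamination of the same data can yield a different total.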
The repetitive sequences within the ~2.9 Mb region of 2L consist of six LINE elements, 11 retrotransposons, and one short (~171 bp) repeat (Ashburner et al. 1999). While our contig for chromosome 4 has not been probed for all the different Drosophila transposable elements, we do know that, on average, chromosome 4 clones have a repeated sequence every other restriction fragment (Locke et al. 1999a,b). While not truly quantitative, this observation clearly differentiates chromosome 4 from chromosome 2L, on which 1.8% of the sequence is repetitive (Ashburner et al. 1999), and from the Drosophila euchromatin in general, in which the repetitive DNA content is 9%. If the sequence from the two chromosome 4 cosmids provides an accurate view, then we know that the nature and interspersion pattern of repetitive DNA on chromosome 4 differs from that on the other chromosomes. Analysis of >63 kb showed that four transposable elements (including three Hoppel elements) and 18 of the short repeats we have called DINE-1 were present (Locke et al. 1999a,b). The euchromatic distribution of both Hoppel and DINE-1 is almost exclusively confined to chromosome 4 and explains why extending chromosome walks on this chromosome is more difficult than elsewhere in the genome (Bender et al. 1983a,b; Garber et al. 1983; Scott et al. 1983). Our physical map will assist in the assembly of the DNA sequence for the entire chromosome. There are perhaps as many as 25 genes represented by lethal alleles and three more by visible mutations that have yet to be coupled to specific sequenced genes. We (Locke et al. 1999a,b) and others (Miklos et al. 1988) have invoked the relative richness of repeats on chromosome 4 to explain its partly heterochromatic properties. The BAC clones will provide a useful resource for studying the genes on chromosome 4 and the functional significance of the interspersion pattern of its repetitive elements on the expression of these genes.
We thank B. Hepperle, G. Rairdan, D. Adams, and E. Woloshyn for excellent technical assistance. We also thank J. Bell, D. Pilgrim, K. Roy, D. Nash, H. McDermid, and A. Ahmed for their earlier support and interest. In addition, we thank the many researchers who provided chromosome 4 sequence probes used to identify the cosmid and BAC clones in this work: K. Arora, B. Berwin, P. Callaerts, A. R. Campos, H. Clevers, J. L. Couderc, R. Dubreuil, S. C. R. Elgin, M. Forte, E. Frei, E. A. Fyrberg, C. S. Goodman, Y. Grau, T. E. Haerry, M.-L. A. Joiner, J. A. Kassis, S. Kunes, R. K. Kutty, P. Lasko, R. W. Levis, J. Mehrens, M. Noll, J. N. Noordermeer, C. J. O'Kane, S. N. Robinow, S. Russel, A. Scully, I. Sidén-Kiamos, B. Stronach, C. Sung, S. Sweeney, L. L. Wallrath, A. Whitworth, M. Winberg, A. Worthington, the Drosophila Genome Project at Lawrence Berkeley National Laboratory, and the UK Human Genome Mapping Project Resource Centre. L.P. conducted or supervised the experimental work with the help of N.A. and H.K. J.L. and R.H. jointly conceived the project, designed the experimental approach, and participated in the data analysis. J.L. wrote the article with substantial contributions from R.H. and L.P. This work was funded by grants from the Canadian Genome Analysis and Technology Program, the Medical Research Council of Canada, and the Natural Sciences and Engineering Research Council of Canada.
Communicating editor: R. S. Hawley
- Received December 1, 1999.
- Accepted March 22, 2000.
- Copyright © 2000 by the Genetics Society of America