Genomic Organization and Characterization of the white Locus of the Mediterranean Fruitfly, Ceratitis capitata
L. M. Gomulski, R. J. Pitts, S. Costa, G. Saccone, C. Torti, L. C. Polito, G. Gasperi, A. R. Malacrida, F. C. Kafatos, L. J. Zwiebel

Abstract

An ∼14-kb region of genomic DNA encoding the wild-type white eye (w+) color gene from the medfly, Ceratitis capitata, has been cloned and characterized at the molecular level. Comparison of the intron-exon organization of this locus among several dipteran insects reveals distinct organizational patterns that are consistent with the phylogenetic relationships of these flies and with a dendrogram of the predicted white amino acid sequences. The expression of w+ during medfly development is broadly similar to that reported for white gene homologues in Drosophila melanogaster and other insects. Interestingly, we have detected two phenotypically neutral allelic forms of the locus that arose from an apparently novel insertion or deletion event in the large first intron of the medfly white locus. Cloning and sequencing of two mutant white alleles, w1 and w2, from the we,wp and M245 strains, respectively, indicate that the mutant phenotypes in these strains result from independent events: a frameshift mutation in exon 6 for w1 and a deletion including a large part of exon 2 for w2.

The Mediterranean fruit fly Ceratitis capitata (medfly) is a major pest of many important agricultural products. Originally native to sub-Saharan Africa, over the last 100 years it has exploited trading activities to expand its range throughout the world, including the Mediterranean basin, the Americas, and Australia. In the middle of the 20th century the medfly was introduced into South and Central America, where it has since spread rapidly, threatening North American fruit production (Voss 1992). Population genetic studies have confirmed this worldwide colonization process (Malacrida et al. 1998; He and Haymer 1999). Recently, considerable effort has been devoted to preventing its spread to the continental United States, especially California (Kahn et al. 1990; Carey 1991; Haymer et al. 1997; Davies et al. 1999). The medfly is the most notorious of the tephritid fruit flies because of its unusually wide host range and its ability to adapt to both temperate and tropical zones (Robinson 1989).

Given this economic imperative, several studies have been undertaken to design novel biological control programs by which the medfly’s negative impact might be reduced (International Atomic Energy Agency 1998). To facilitate both the design and implementation of such programs, and to advance ongoing genetic studies of the medfly, considerable efforts have been made to establish and improve germline transformation methods in this agricultural pest (Loukeris et al. 1995; Handler et al. 1998). A significant aspect of these transgenic technologies is the choice of genetic markers that can be used to identify stable lines of transformed insects. Many medfly mutant strains are available (Rossler and Rosenthal 1992; Rossler et al. 1994; Gubb et al. 1998), including eye color mutants. The pigment transporter gene white and other genes responsible for adult eye color have often been the phenotypic markers of choice in Drosophila and other insect transformation systems (Rubin and Spradling 1982; Coates et al. 1998; Jasinskiene et al. 1998). Moreover, two independent allelic eye color mutations of the white locus, which abolish eye pigmentation, have been described in the medfly (Rossler and Rosenthal 1992; Torti et al. 1994). The wild-type eye color of C. capitata is a metallic emerald green. Like white mutations in Drosophila melanogaster, the two spontaneous medfly mutations affect the pigmentation of the larval and adult Malpighian tubules, the ocelli, and the adult compound eyes, which appear white. However, unlike in Drosophila, the medfly white eye mutation does not affect the testis sheath, which retains its yellow coloration. The two mutations have been shown to be allelic by genetic tests (Rossler et al. 1994; Gubb et al. 1998), and the locus is homologous to the Drosophila white gene (Zwiebel et al. 1995). The white locus of C. capitata is located on chromosome 5 (Torti et al. 1994), where most of the genes homologous to X-linked genes of Drosophila are located (Malacrida et al. 1986; Zacharopoulou et al. 1992). Furthermore, and most importantly, partial phenotypic rescue of the mutants was achieved by germline transformation using the medfly white cDNA as a dominant marker (Loukeris et al. 1995; Handler et al. 1998).

The white gene has been the subject of extensive genetic and molecular characterization in D. melanogaster (reviewed in Hazelrigg 1987), and homologues have been identified in several other insects of economic and medical importance. Initially, a partial sequence was obtained from the blowfly Lucilia cuprina by cross-hybridization with a Drosophila white probe (Elizur et al. 1990). This sequence made it possible to align white sequences and design oligonucleotide primers against conserved amino acid domains, which were then used in a PCR cloning approach to isolate a white cDNA from C. capitata (Zwiebel et al. 1995). Subsequently, a full-length cDNA sequence from the blowfly (Garcia et al. 1996) was reported, as were genomic clones encompassing the entire white locus from three mosquito species, Anopheles albimanus (Ke et al. 1997), A. gambiae s.s. (Besansky et al. 1995), and Aedes aegypti (Coates et al. 1997), and from the Queensland fruit fly Bactrocera tryoni (Bennett and Frommer 1997). More recently, cDNAs encoding human and mouse white proteins have also been isolated (Croop et al. 1997).

Successful gene transfer into the medfly with transformation vectors based on the Minos (Loukeris et al. 1995) and piggyBac (Handler et al. 1998) transposable elements was achieved using two independent spontaneous medfly white eye color mutants as recipient strains. In both cases, the allelic mutations were rescued by a wild-type white cDNA minigene within the transposon construct. Here we report the cloning and characterization of the wild-type locus and of the two spontaneous mutant white alleles currently used in transformation studies. These studies provide a detailed analysis of these alleles as well as an opportunity to examine the genomic organization and evolution of an entire locus within a group of dipteran insects of academic, economic, and medical importance.

MATERIALS AND METHODS

White eye alleles: Three white alleles were examined: w+, w1, and w2. w+ is the wild-type allele, while w1 and w2 are two independent spontaneous mutations that abolish all eye pigmentation. For nomenclature we adopt the rules used for Drosophila, as suggested by Gubb et al. (1998). w1 was previously described as we (Rossler and Rosenthal 1992), while w2 (Gubb et al. 1998) was previously described as w (Rossler et al. 1994; Zwiebel et al. 1995).

Medfly strains: Three independently derived medfly laboratory strains were used in this study: Benakeion, which is associated with the w+ allele and has wild-type eye color, and two white eye mutant strains, we,wp, which is associated with the w1 allele, and M245, which is associated with the w2 allele.

The wild-type eye color strain Benakeion was originally established in the laboratory by P. A. Mourikis (Benakeion Institute of Phytopathology, Athens, Greece) with flies from the Southern Peloponnese (Greece) and Palermo (Italy; Rina and Savakis 1991). M245 was derived from the white eye strain (Torti et al. 1994), which displays intra- and interstrain dysgenic traits such as gonadal sterility, chromosomal breakage, and white instability. The M245 line is homozygous for the dark pupae (dp) mutation, which affects the color of the pupal case (Rossler and Koltin 1976). The we,wp strain is homozygous for the white pupae (wp) mutation, which also affects the color of the pupal case (Rossler 1979).

All strains were maintained in quarantine facilities at either the Institute for Molecular Biology and Biotechnology, Heraklion, Crete or at the University of Pavia, Pavia, Italy. Standard larval and adult rearing methods were used (Saul 1982).

Isolation of nucleic acids: Total genomic DNA was prepared from pooled collections (for libraries) or single adult medflies (for Southern blots) according to standard protocols for Drosophila (Ashburner 1989). Individual phage DNA was prepared from single plaques using plate lysates according to Sambrook et al. (1989). For Northern blot and reverse transcriptase (RT)-PCR analysis, total RNA was extracted from embryonic, larval, pupal, or adult medflies using RNeasy kits (QIAGEN, Chatsworth, CA) according to the manufacturer’s instructions. Poly(A)+ RNA was isolated using a Pharmacia (Piscataway, NJ) mRNA purification kit.

Isolation and subcloning of wild-type w+ and white eye mutant w2 genomic phage: The w+ allele: Approximately 100,000 clones from a Benakeion genomic phage library were screened with a 400-bp PCR product generated with degenerate oligonucleotide primers designed from an alignment of the D. melanogaster and L. cuprina white (w+) sequences (Zwiebel et al. 1995). Four positive plaques were identified, purified, and mapped by restriction digestion and Southern blotting, which distinguished two classes of w+ gene-containing genomic phage, designated C1 and D1, with inserts of ∼17 and 15.5 kb, respectively. Both classes of phage yielded identical EcoRI fragments of 1.8, 2.9, 0.8, and 3.0 kb, as well as unique 7.0-kb (D1) and 8.5-kb (C1) fragments, which were subsequently subcloned into pBluescript II (KS) (Stratagene, La Jolla, CA) vectors by shotgun cloning (Sambrook et al. 1989).
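As a quick consistency check, the EcoRI fragment sizes listed above can be summed for each phage class. The sketch below assumes, for illustration only, that the listed fragments tile each insert with no overlaps or missing pieces; the fragment sizes come from the text, while the function and variable names are hypothetical.

```python
# EcoRI fragments (kb) shared by both phage classes, as listed in the text.
SHARED_KB = [1.8, 2.9, 0.8, 3.0]

# Fragment (kb) unique to each phage class.
UNIQUE_KB = {"D1": 7.0, "C1": 8.5}


def insert_size_kb(phage: str) -> float:
    """Sum the shared and class-specific EcoRI fragments for one phage class.

    Assumes the fragments tile the insert exactly; rounding to one decimal
    place removes floating-point noise from the sum.
    """
    return round(sum(SHARED_KB) + UNIQUE_KB[phage], 1)


for phage in ("C1", "D1"):
    print(phage, insert_size_kb(phage), "kb")
```

Under that assumption the totals (17.0 and 15.5 kb) match the reported insert sizes, and their 1.5-kb difference equals the size of the intron 1 indel that distinguishes C1 from D1.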

The w2 mutant allele: A genomic phage library was constructed from total genomic DNA from adult w2 mutant flies from the M245 strain. MboI-digested M245 DNA was size fractionated on a sucrose gradient and fragments ranging from 16 to 20 kb were ligated into λ DASH II BamHI vector arms and packaged into phage using Gigapack II gold packaging extracts. Recombinant phage were assayed according to the manufacturer’s protocols (Stratagene cloning systems).

This library was screened using a mixture of probes derived from Sau3A digestion of a 5.5-kb EcoRI/XhoI subclone of the Benakeion w+ 8.5-kb (C1) EcoRI clone. Two positive plaques were obtained, which appeared to be identical on the basis of restriction analysis. EcoRI digestion restriction fragments of the insert were cloned into the pBluescript II KS vector (Stratagene). The subclones obtained were 2.1-kb EcoRI, 0.8-kb EcoRI, and ∼10-kb EcoRI.

The w1 mutant allele: Appropriate PCR primers, based on the Benakeion and M245 genomic sequences, were used to amplify four fragments that together encompassed the entire coding region of the w1 allele except for most of intron 1. These fragments were cloned into the pCR2.1-TOPO vector using the TOPO TA cloning kit (Invitrogen, San Diego). At least three independent clones, derived from different individuals, were analyzed for each fragment.

DNA sequencing: Each clone or subclone was sequenced by a combination of manual reactions using Sequenase version 2.0 (United States Biochemical, Cleveland) and automated analysis by the EMBL sequencing service. In addition, automated sequencing was carried out on ABI 377 and ABI 310 instruments with the ABI Prism BigDye Terminator cycle sequencing ready reaction kit (Perkin Elmer, Norwalk, CT). All sequence data were compiled and analyzed with Sequencher version 3.0 software (Perkin Elmer).

Sequence comparisons and phylogenetic analysis: Sequences were compared with the DDBJ/EMBL/GenBank and SWISS-PROT databases using the Genetics Computer Group software (Devereux et al. 1984). Alignments and dendrograms were constructed with the CLUSTAL W software suite (Thompson et al. 1994). Sequence relationships were analyzed with the maximum-parsimony option of the phylogenetic analysis using parsimony (PAUP) software suite (Swofford 1991). Confidence levels for phyletic groupings were obtained by bootstrapping (Felsenstein 1985) with a heuristic search, using 1000 replicates.
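Bootstrapping of this kind builds each pseudo-replicate by resampling alignment columns with replacement and then records how often a grouping recurs across replicates. The sketch below illustrates only the resampling step. As a deliberate simplification, it replaces PAUP's heuristic parsimony search with a nearest-neighbor check on uncorrected p-distances, and the toy alignment and all function names are hypothetical.

```python
import random
from typing import Dict, Iterator

Alignment = Dict[str, str]  # taxon name -> aligned sequence (equal lengths)


def bootstrap_replicates(aln: Alignment, n_reps: int, seed: int = 0) -> Iterator[Alignment]:
    """Yield pseudo-alignments made by sampling columns with replacement."""
    rng = random.Random(seed)
    length = len(next(iter(aln.values())))
    for _ in range(n_reps):
        cols = [rng.randrange(length) for _ in range(length)]
        yield {name: "".join(seq[c] for c in cols) for name, seq in aln.items()}


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites that differ (no gap handling)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nearest_neighbor(aln: Alignment, taxon: str) -> str:
    """Simplified stand-in for a tree search: closest taxon by p-distance."""
    others = [t for t in aln if t != taxon]
    return min(others, key=lambda t: p_distance(aln[taxon], aln[t]))


def bootstrap_support(aln: Alignment, taxon: str, partner: str, n_reps: int = 1000) -> float:
    """Fraction of replicates in which `partner` is the nearest neighbor of `taxon`."""
    hits = sum(nearest_neighbor(rep, taxon) == partner
               for rep in bootstrap_replicates(aln, n_reps))
    return hits / n_reps


# Hypothetical toy alignment (not real white sequences).
toy = {
    "A": "MKVLAAGHRE",
    "B": "MKVLAAGHRE",
    "C": "MRILSTGNKD",
    "D": "MRIMSTGNQD",
}
print(bootstrap_support(toy, "A", "B"))  # 1.0, since A and B are identical
```

A real analysis would rebuild a full tree for every replicate and count clade occurrences. The column-resampling loop, however, is the part that defines the bootstrap.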

Southern and Northern blotting: Southern blot transfers of genomic DNA or white subclones to Hybond N+ (Amersham Life Science, Arlington Heights, IL) were carried out under alkaline conditions according to the manufacturer’s instructions. Hybridizations were carried out at high stringency under aqueous conditions according to the protocol of Church and Gilbert (1984). Poly(A)+ RNA (15 μg per lane) was run on 1.2% formaldehyde/agarose gels and transferred to Zeta-Probe GT membranes (Bio-Rad, Richmond, CA). Probes for hybridization were prepared from the inserts derived from appropriate medfly w+ genomic or cDNA subclones and a 650-bp fragment of the medfly tubulin gene (the kind gift of Drs. B. Arca and S. Brogna, Institute of Molecular Biology and Biotechnology) after gel purification using Qiaquick gel extraction protocols (QIAGEN). Purified DNAs were radiolabeled with [α-32P]dCTP by random hexamer labeling (Feinberg and Vogelstein 1983) using a High Prime DNA labeling kit (Boehringer Mannheim, Indianapolis). Hybridization was performed at 42° in 5× SSPE, 2× Denhardt’s, 0.1% SDS in 50% formamide. The first wash was performed in 1× SSC, 0.2% SDS at room temperature; the final two washes were in 0.2× SSC, 0.1% SDS at 65°. Autoradiography was carried out for between 12 hr and 5 days at -80°.

PCR, RT-PCR, and 5′ Rapid Amplification of cDNA Ends: PCR was performed on either a Perkin Elmer 9600 or 9700 thermal cycler with AmpliTaq enzyme (Perkin Elmer) under standard reaction conditions in 10 mM Tris-HCl, pH 8.3, 50 mM KCl, 200 μM each dNTP, and 1.5 mM MgCl2. Long PCR was performed using appropriate primers and the GeneAmp XL PCR kit (Perkin Elmer) according to the manufacturer’s instructions. 5′ rapid amplification of cDNA ends (RACE) was performed on total RNA prepared from adults using a MARATHON RACE kit (CLONTECH, Palo Alto, CA) according to the manufacturer’s instructions, with an oligonucleotide primer designated W13 (5′ gcgttctactgctttagtatctac). For RT-PCR analysis, C. capitata total RNA was isolated from 1-day-old pupae of wild-type (Benakeion), w1, and w2 mutants using the RNeasy RNA isolation kit (QIAGEN) following the manufacturer’s protocol. RT-PCR was then performed with the Titan One-Tube RT-PCR kit (Roche Molecular Biochemicals) following the manufacturer’s protocol, except that reactions were scaled down from 50 μl to 25 μl by halving the amount of each reagent. Approximately 0.5 μg of each RNA sample and a 0.2 μM final concentration of the White5 (5′ gcagtagaacgccatagag) and White4 (5′ acgctgtgtgccatgaacg) oligonucleotide primers, corresponding to sequences within exons 1 and 3, respectively, were used in each reaction. First-strand synthesis was performed at 50° for 30 min. This step was followed by 10 cycles of 94° for 30 sec, 53° for 30 sec, and 68° for 45 sec, and then by 30 cycles of the same profile in which the 68° step was lengthened by 5 sec in each successive cycle. Reactions were then incubated at 68° for 7 min before a final hold at 4°. Five microliters of each reaction were analyzed on a 1.5% agarose gel.
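The two-stage cycling program above can be tabulated with a short script. This sketch assumes that the 5-sec increment applies to the 68° extension step and begins with the first cycle of the second stage, so that stage runs from 50 to 195 sec. The text does not say exactly when the increment starts, and the function name is hypothetical.

```python
def extension_times(n_fixed=10, n_ramped=30, base_sec=45, step_sec=5):
    """Return the 68-degree extension time (sec) for every cycle.

    Stage 1: n_fixed cycles at base_sec.
    Stage 2: n_ramped cycles, each lasting base_sec plus step_sec times the
    cycle's 1-based index within stage 2 (an assumed reading of the
    "5 sec per cycle" increment).
    """
    stage1 = [base_sec] * n_fixed
    stage2 = [base_sec + step_sec * k for k in range(1, n_ramped + 1)]
    return stage1 + stage2


times = extension_times()
print(len(times), times[10], times[-1], sum(times))  # 40 50 195 4125
```

Under this reading, the extension steps alone total 4125 sec (about 69 min) across the 40 cycles.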

RESULTS

The complete genomic sequence of the wild-type white (w+) gene: Approximately 100,000 plaques (at least two genome equivalents) of a C. capitata total genomic library derived from the wild-type Benakeion laboratory strain (Zwiebel et al. 1995) were probed with a 400-bp white PCR product that had previously been generated in our laboratory from Benakeion genomic DNA. Four positive plaques were characterized by restriction mapping and Southern blotting. The restriction maps (Figure 1) showed that the four phage fell into two subclasses that differed only in the length of one EcoRI fragment (7.0 or 8.5 kb). One representative phage from each subclass, designated D1 and C1, respectively, was chosen for further study. DNA sequencing showed that the only difference between the subclasses was the presence or absence of an ∼1.5-kb sequence at the extreme 3′ end of the large first intron (Figure 1). Southern blot analysis of EcoRI-digested genomic DNA from individual Benakeion w+ flies, using a cDNA-derived probe (Zwiebel et al. 1995), demonstrated that this polymorphism is also present in the strain. As shown in Figure 2, individual flies can be homozygous for either the 8.5- or the 7.0-kb allele, or heterozygous. The polymorphic indel present in C1 (lowercase italics in Figure 3) is flanked by a direct repeat of a 12-bp sequence (5′-taattttgttta-3′) that occurs only once in the D1 allele. Notably, this 12-bp repeat forms part of the 3′ acceptor site, including the branch point. In the longer C1 allele the branch point is ATCTAAT, whereas in the D1 allele it is AATTAAT (heavily underscored in Figure 3). A BLAST search (Altschul et al. 1990) of all available databases revealed no significant similarity to the polymorphic indel, which contains a very long array of AT-rich repeats reminiscent of an imperfect (TAAAA)n microsatellite.
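The "at least two genome equivalents" estimate can be checked with the standard Clarke-Carbon relation for library completeness, P = 1 - (1 - f)^N. Here f is the insert size divided by the genome size and N is the number of clones screened. This sketch assumes inserts of ∼15 kb (close to the smaller of the two phage inserts described in Materials and Methods) and the ∼5 × 10⁸-bp medfly genome size cited in the Discussion. Both inputs are approximations, and the function names are illustrative.

```python
import math


def genome_equivalents(n_clones: int, insert_bp: float, genome_bp: float) -> float:
    """Total cloned DNA divided by haploid genome size."""
    return n_clones * insert_bp / genome_bp


def prob_sequence_present(n_clones: int, insert_bp: float, genome_bp: float) -> float:
    """Clarke-Carbon probability that a given sequence is in the library."""
    f = insert_bp / genome_bp
    return 1.0 - (1.0 - f) ** n_clones


N, INSERT, GENOME = 100_000, 15_000, 5e8  # assumed approximate values
print(round(genome_equivalents(N, INSERT, GENOME), 2))  # 3.0
print(round(prob_sequence_present(N, INSERT, GENOME), 3))
# Cross-check with the Poisson approximation, P ~ 1 - exp(-N * f).
print(round(1 - math.exp(-genome_equivalents(N, INSERT, GENOME)), 3))
```

With these assumptions, the screen covers about three genome equivalents, which gives roughly a 95% chance that any particular sequence is represented. This is consistent with the "at least two" stated above.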

Further upstream in intron 1 a 96-/97-bp direct repeat is present at positions 8833-8928 and 11421-11517 (not shown; see GenBank sequence). These sequences, separated by almost 2.5 kb, share 82% identity. The direct repeats and the sequence between them show no homology to any database entry.
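The repeat coordinates above can be checked with simple interval arithmetic. The helpers below are hypothetical and assume 1-based inclusive coordinates, as in the GenBank record.

```python
def span_len(start: int, end: int) -> int:
    """Length of a 1-based, inclusive coordinate interval."""
    return end - start + 1


def gap_between(first_end: int, second_start: int) -> int:
    """Number of bases strictly between two non-overlapping intervals."""
    return second_start - first_end - 1


repeat1 = (8833, 8928)
repeat2 = (11421, 11517)
print(span_len(*repeat1), span_len(*repeat2), gap_between(repeat1[1], repeat2[0]))
# 96 97 2492
```

The two copies are thus 96 and 97 bp long and are separated by 2492 bp, which agrees with the "almost 2.5 kb" spacing given above.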

Another intriguing feature of the intron 1 sequence is a 412-bp segment (positions 5277 to 5689; see GenBank sequence) that shows high similarity (∼93% DNA identity) to the transposase coding region of p12 and p19, two members of the mellifera subfamily of mariner transposable elements originally characterized in D. erecta (Lohe et al. 1995). Like the D. erecta p12 and p19 mariner copies, this mariner-like element (MLE) sequence appears to be inactive, owing to deletions and several stop codons within the transposase open reading frame (ORF). Its inverted terminal repeat sequences are not discernible.

Figure 1.

White alleles. Schematic diagram of the organization of four alleles of the white locus showing the size polymorphism between the wild-type C1 and D1 alleles and the respective deletion and frameshift mutations in the w2 and w1 mutant alleles.

The published 2252-bp medfly white cDNA has a 2031-bp ORF extending from the initiation codon ATG to the stop codon TGA (underlined in Figure 3; Zwiebel et al. 1995). These codons correspond to positions 4087 and 16,167 of the genomic sequence, and the coding region between them is interrupted by six introns. Within the 3′ untranslated region (UTR), a consensus polyadenylation hexanucleotide, AATAAA (dashed underline in Figure 3), lies 14 bp upstream of the known polyadenylation site. The putative 5′ end of the white transcript has been mapped to position 3384 of the genomic sequence by 5′-RACE cloning (data not shown). The pentamer ACAGT occurs 32 and 81 nucleotides upstream of this putative end (not shown). This is pertinent because the eukaryotic cap-site consensus sequence TCAGT occurs within 10 bp of the transcription start site in 25% of arthropod RNA polymerase II-transcribed promoters, and this sequence or its (A, G, C)CAGT variants may act as initiators (Cherbas and Cherbas 1993). No CCAAT or TATA box homologies were found within 500 bp upstream of the ACAGT motifs in the genomic sequence. In addition, a BLASTN search (Altschul et al. 1990) of the Eukaryotic Promoter Database (release 45) at the National Center for Biotechnology Information revealed no clear matches to known promoter elements in this region.
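The promoter-region and 3′ UTR observations above amount to simple motif scans. The sketch below runs on a short hypothetical sequence rather than the medfly locus. It finds initiator-like pentamers that match the TCAGT consensus or its (A, G, C)CAGT variants (that is, any base followed by CAGT), as well as the AATAAA polyadenylation signal. It also converts the 2031-bp ORF into a residue count, assuming that this length includes the TGA stop codon. The function names are illustrative.

```python
import re
from typing import List

INITIATOR = r"[ACGT]CAGT"  # TCAGT consensus plus (A,G,C)CAGT variants
POLYA_SIGNAL = "AATAAA"


def motif_positions(seq: str, pattern: str) -> List[int]:
    """0-based start positions of all (possibly overlapping) regex matches."""
    return [m.start() for m in re.finditer(f"(?={pattern})", seq.upper())]


def residues_from_orf(orf_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of orf_bp nucleotides."""
    if orf_bp % 3:
        raise ValueError("ORF length is not a multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


toy = "GGACAGTTTTTCAGTAAAATAAACC"  # hypothetical sequence
print(motif_positions(toy, INITIATOR))     # [2, 10]
print(motif_positions(toy, POLYA_SIGNAL))  # [17]
print(residues_from_orf(2031))             # 676
```

The zero-width lookahead in `motif_positions` is there so that overlapping occurrences are also reported, which matters when scanning short, repetitive motifs.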

Figure 2.

Medfly DNA blot. Wild-type allele size variants, C1 and D1, in the Benakeion strain visualized by hybridization of EcoRI-digested genomic DNA with a w+ cDNA probe.

Organization of the white locus: The exon/intron positions of the medfly white locus were initially located on the restriction maps of the w+ genomic phage clones by Southern blotting with cDNA probes. Their positions were then confirmed by direct DNA sequencing and comparison with the C. capitata white cDNA sequence (Zwiebel et al. 1995). As indicated in Figure 4a, the coding region of the medfly white gene consists of seven exons ranging in size from 75 to 657 bp, and all six introns conform to the GT-AG splice rule (Breathnach and Chambon 1981). A comparative analysis of white gene organization among dipteran insects from the Tephritidae [C. capitata, B. tryoni (Bennett and Frommer 1997)], Calliphoridae [L. cuprina (Garcia et al. 1996)], Drosophilidae [D. melanogaster (O’Hare et al. 1984)], and Culicidae [A. gambiae (Besansky et al. 1995) and A. albimanus (Ke et al. 1997)] families (Figure 4a) suggests, as expected, that the white locus of C. capitata is most closely related to that of B. tryoni. Both are organized into seven exons that divide the coding region at identical amino acid positions, and both have characteristically large first introns (9.7/8.2 kb and 12 kb, respectively). The white locus of L. cuprina is organized similarly, although its intron 1 is much shorter (4.5 kb), while the D. melanogaster locus has a still shorter (3.1 kb) first intron and only six exons. The white loci of two anopheline mosquitoes, A. gambiae and A. albimanus, are even more diverged, with only four exons; in A. gambiae, a novel 2.6-kb intron also splits the long 5′ UTR (Besansky et al. 1995). Figure 4b shows a dendrogram derived from a maximum-parsimony analysis of a CLUSTAL W alignment (Thompson et al. 1994) of the conceptual white amino acid sequences from these insects. The dendrogram, rooted with the two Culicidae species, is fully consistent with the above pattern of organizational divergence and with the taxonomic distances among the four dipteran families from which these sequences derive.

Figure 3.

Partial nucleotide sequence of the wild-type white locus. The additional intron 1 sequence in the C1 allele is shown in lowercase italics flanked by the 12-bp direct repeats (in boldface), with the 3′ branch point heavily underscored. Exons are shaded gray (uppercase), while introns are unshaded (lowercase). Start and stop codons are underlined. The putative polyadenylation signal is found near the end of exon 6 (dashed underline). Numbers indicate nucleotide positions within the complete sequence of the C1 allele. The sequences have been deposited in GenBank with the accession nos. AF318275 (w+C1) and AF318276 (w+D1).

Developmental expression: Northern blots of poly(A)+ RNA prepared from embryonic, larval, pupal, and adult stages were probed with a full-length white cDNA and with a fragment of the medfly tubulin gene as a loading control (Figure 5). An ∼2.9-kb RNA species was detected at a low level in 0- to 8-hr embryos but was absent from mid- and late-stage embryos. White expression rises to low but measurable levels in first instar larvae, falls below the limit of detection in later larval stages, and rises again in pupae. The highest levels of white expression are observed in adults, the stage at which its phenotypic effects are most easily seen (Figure 5). Allowing for a full-length poly(A) tract, which can typically extend several hundred nucleotides, the size of this band is consistent with that predicted for the full-length white transcript.
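The size comparison in the last sentence reduces to a simple subtraction. Assuming, for illustration, that the published 2252-bp cDNA lacks most of the poly(A) tail, and treating the band size of ∼2.9 kb as approximate, the remaining length is:

```latex
\[
2900\ \text{nt} - 2252\ \text{nt} \approx 650\ \text{nt}
\]
```

This falls within the "several hundred nucleotides" expected for a full-length poly(A) tract.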

Figure 4.

White organization by species. (a) Exon/intron organization of white genes in dipteran insects of the families Tephritidae (C. capitata and B. tryoni), Calliphoridae (L. cuprina), Drosophilidae (D. melanogaster), and Culicidae (A. albimanus, A. gambiae). The lengths of the first intron are shown in kilobases and the lengths of the exons in amino acid residues. (b) Maximum-parsimony dendrogram based on an alignment of the white amino acid sequences of the six dipteran species, rooted with the two Culicidae species. Bootstrap values derived from 1000 resamplings of the data set are shown at the nodes. The tree has a length of 513 steps and a consistency index of 0.9649.
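The consistency index is conventionally defined as CI = M/S, where M is the minimum number of character changes the data could require on any tree and S is the length of the tree. Taking the caption's values at face value, and noting that programs may compute CI with or without uninformative characters, the implied minimum is:

```latex
\[
\mathrm{CI} = \frac{M}{S}
\quad\Longrightarrow\quad
M = \mathrm{CI} \times S = 0.9649 \times 513 \approx 495\ \text{steps}
\]
```

On this reading, only about 18 of the 513 steps (513 - 495) reflect homoplasy.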

Molecular characterization of two mutant alleles (w1 and w2) of the white gene: To date, two independent white eye mutant alleles have been identified in the medfly, and both have been phenotypically rescued by germline transformation with wild-type medfly white cDNA constructs (Loukeris et al. 1995; Handler et al. 1998). To better understand these rescue experiments, and because both mutant strains are likely to serve as host strains for future transgenic medfly lines, we undertook a molecular analysis of the two white mutant alleles.

For w1, sequencing of four independent w1 PCR clones covering exons 1 through 6, excluding most of intron 1, revealed no gross deletions or insertions. However, a frameshift mutation caused by a single-base duplication was observed in exon 6 (corresponding to position 15,679 in the w+ sequence; Figures 1 and 6a). This frameshift creates a premature stop codon and would presumably result in a truncated polypeptide. Long PCR showed intron 1 of w1 to be ∼10.5 kb in length (Figure 1). As for the w2 allele described below, the first intron of w1 appears more similar to that of the w+ C1 allele than to that of the D1 allele, because the available sequence for the 3′ end of the intron is identical to that of w+ C1. Apart from the frameshift, the w1 exons showed no amino acid substitutions with respect to the w+ C1 allele. Four silent substitutions were present (one in exon 1 and three in exon 2).
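The effect of a single-base duplication on the reading frame can be illustrated with a toy example. The sketch below uses a short hypothetical ORF, not the medfly exon 6 sequence, together with the standard genetic code. Duplicating one base shifts every downstream codon, and in this toy case an in-frame TGA stop codon appears immediately afterward. The function names are illustrative.

```python
# Standard genetic code, codons ordered TCAG x TCAG x TCAG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate from the first base until the first stop codon (or the end)."""
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[pos:pos + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def duplicate_base(seq: str, index: int) -> str:
    """Insert a second copy of the base at `index` immediately after it."""
    return seq[: index + 1] + seq[index] + seq[index + 1 :]


wild_type = "ATGGCTGATAAAGGCTGA"       # hypothetical toy ORF
mutant = duplicate_base(wild_type, 4)  # duplicate the C at index 4
print(translate(wild_type))  # MADKG
print(translate(mutant))     # MA
```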

Figure 5.

Medfly Northern blot. Poly(A)+ RNA prepared from embryonic (1, 2, and 3 days old), larval (first, second, and third instar), pupal, and adult stages of wild-type (Benakeion strain) individuals probed with a full-length white cDNA and a fragment of the medfly tubulin gene used as a loading control.

In the case of w2, direct sequencing of three independent EcoRI fragments derived from a genomic library of the M245 strain revealed that the w2 white mutant allele is similar in size to the w+ C1 allele (as opposed to the D1 counterpart). This is explained by the presence in the w2 allele of a large part of the polymorphic indel sequence in the w+ C1 allele. The major difference between the w2 and the w+ C1 allele is a 377-bp deletion extending from within intron 1 and including 225 bp of exon 2, such that the mutant w2 allele retains only the last 19 bp of exon 2 (Figures 1 and 6b). The 3′ 12-bp direct repeat, which forms part of the 3′ splicing acceptor site, is also included in the deletion. The other exons are complete and conserved with respect to the w+ sequence. Furthermore, the exons of the w2 mutant sequence contained a total of three amino acid substitutions with respect to the w+ C1 sequence: one in exon 2, histidine (H) to tyrosine (Y) near the deletion site; one in exon 5, alanine (A) to serine (S); and one in exon 6, glycine (G) to aspartic acid (D). In addition, a total of 26 silent substitutions are also present (eight in exon 3, one in exon 4a, three in exon 4b, one in exon 5, and 13 in exon 6).
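The deletion figures above imply two further quantities. Assuming that the 225 deleted exonic bases and the 19 retained bases together make up all of exon 2, they are:

```latex
\[
377 - 225 = 152\ \text{bp of intron 1 removed},
\qquad
225 + 19 = 244\ \text{bp (implied length of exon 2)}
\]
```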

To further confirm the sequencing data obtained for the mutant white alleles, RT-PCR analysis was carried out on RNA prepared from wild-type, w1, and w2 early pupae. In these experiments (Figure 7), primers located in exons 1 and 3 amplify the expected 844-bp white product from mRNA of both wild-type (Benakeion) and w1 mutant pupae. For w1, this result indicates that the white transcript is not grossly altered, which is consistent with a single-base frameshift. In contrast, and in keeping with the sequencing data presented above, mRNA from the w2 mutant strain yields several smaller products ranging from ∼700 to 370 bp. These products most likely derive from cryptic splicing variants, RNA degradation products, or a combination of both.

DISCUSSION

This study extends our previous report on the cloning and functional examination of the medfly white cDNA (Zwiebel et al. 1995) to include (1) the genomic organization of the locus and its developmental expression; (2) the molecular evolution of the locus with respect to representative dipteran insects from the Tephritidae, Calliphoridae, Drosophilidae, and Culicidae families; and (3) the characterization of two mutant white alleles currently used as markers for medfly transformation.

White gene organization in the medfly: The medfly white locus contains seven exons extending over ∼12 kb. Its most obvious characteristic is an extremely long and variable first intron. Long introns tend to accumulate insertions such as transposable elements and repetitive sequences, and they tend to be more polymorphic in size than short introns (Stephan et al. 1994). Extensive size variability in a long intron has previously been described in the medfly Adh1 gene (Gomulski et al. 1997). The long first intron of medfly white contains a truncated mariner element, long direct repeat sequences, and a microsatellite-bearing indel flanked by 12-bp direct repeats. The intron is highly polymorphic, ranging in size from 8.2 kb in the w+ D1 allele to 9.7 kb in the w+ C1 and w2 alleles and ∼10.5 kb in the w1 mutant allele. Part of this polymorphism is explained by the presence or absence of the 1.5-kb indel and by size variation in the microsatellite it contains.

A possible explanation for the origin of the C1/D1 polymorphism is that ectopic recombination between the two 12-bp direct repeat sequences in the C1 variant resulted in the deletion of the 1.5-kb sequence, giving rise to the shorter D1 variant (Figure 1). The counterpart recombinant variant would have contained an extra copy of the delimited sequence, giving rise to another, longer, variant with three copies of the 12-bp direct repeat. In any case, being entirely within the intron, these differences are presumably neutral as both alleles give rise to wild-type white cDNA products.
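The recombination model above can be illustrated with a toy sequence. The sketch assumes a crossover between the first copy of the 12-bp repeat on one molecule and the second copy on a misaligned partner molecule. Only the repeat sequence is taken from the text; the flanking and intervening sequences are hypothetical placeholders, and the function name is illustrative.

```python
from typing import Tuple

REPEAT = "taattttgttta"  # 12-bp direct repeat flanking the C1 indel


def unequal_crossover(seq: str, repeat: str) -> Tuple[str, str]:
    """Return the (deletion, duplication) products of recombination between
    the first and second copies of a direct repeat on misaligned molecules."""
    first = seq.find(repeat)
    second = seq.find(repeat, first + len(repeat))
    if first < 0 or second < 0:
        raise ValueError("sequence must contain two copies of the repeat")
    # Deletion: sequence left of copy 1 on one molecule, joined to
    # copy 2 and everything to its right on the partner molecule.
    deletion = seq[:first] + seq[second:]
    # Reciprocal product: everything left of copy 2 on the partner,
    # joined to copy 1 and everything to its right on the first molecule.
    duplication = seq[:second] + seq[first:]
    return deletion, duplication


# Hypothetical C1-like allele: flank + repeat + indel stand-in + repeat + flank.
c1_like = "ggatcc" + REPEAT + "cgcgcgcg" + REPEAT + "ctgcag"
d1_like, longer = unequal_crossover(c1_like, REPEAT)
print(d1_like.count(REPEAT), longer.count(REPEAT))  # 1 3
```

The deletion product retains a single repeat copy, as in the D1 allele. The reciprocal product carries three copies, which corresponds to the hypothetical longer variant described above.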

Evolution of the white locus: An updated comparison of gene organization among white homologues from the six dipteran species for which genomic sequences are available (Figure 4a) reveals several noteworthy features. Most apparently, the white loci of the tephritid fruit flies B. tryoni and C. capitata are distinguished by extremely long first introns, roughly 1.8 to 2.7 times as long as the next longest first intron, that of the calliphorid blowfly L. cuprina. Several studies have suggested a positive correlation between intron length and genome size (Moriyama et al. 1998; Deutsch and Long 1999). This may be relevant here, because the medfly and B. tryoni have similarly sized genomes of ∼5 × 10⁸ bp (Torti et al. 1998; P. Atkinson, personal communication) and might therefore maintain similarly sized introns. Smaller introns would then be expected in A. gambiae and D. melanogaster, whose genome sizes are 2.6 × 10⁸ bp (Besansky and Powell 1992) and 1.7 × 10⁸ bp (Crain et al. 1976), respectively. In any case, given their taxonomic relatedness, it is not surprising that the C. capitata and B. tryoni white loci display the high degree of similarity described here.

Figure 6.

White mutations. (a) Comparison of w+ (C1) sequence and w2 sequence showing deletion including part of exon 2. (b) Comparison of w+ sequence and w1 sequence showing frameshift in exon 6. The sequences have been deposited in GenBank with the accession nos. AF315648 (w2), AF315646, and AF315647 (w1).

Figure 7.

RT-PCR analysis of medfly white. cDNA templates were derived from pupal RNAs prepared from Benakeion (wild-type) and w1 and w2 mutant strains.

More interestingly, the number of exons increases from the lower diptera (Culicidae) to the higher diptera (Calliphoridae and Tephritidae) as a result of intron acquisition. The Culicidae, A. gambiae and A. albimanus, each possess four exons. D. melanogaster (Drosophilidae) possesses six exons, the result of the acquisition of extra introns in what were exons 2 and 4. Finally, the two Tephritidae flies, C. capitata and B. tryoni, share with the Calliphoridae, L. cuprina, the presence of an additional intron in exon 4, giving rise to a total of seven exons. Notably, the dendrogram shown in Figure 4b, which is based on an alignment of the conceptual translation products of the white genes of these diptera, fully reflects the relationships derived from the comparison of the genomic organization of this locus shown in Figure 4a. Among the higher diptera, the white products of the Tephritidae (Acalyptrate) appear to be more closely related to that of the Calliphoridae (Calyptrate) than to that of the Drosophilidae (Acalyptrate). Overall, the white products of the Culicidae (lower diptera) appear to be the most divergent. The topology of the dendrogram agrees with that derived from glucose-6-phosphate dehydrogenase (G6pdh) gene sequences (Soto-Adames et al. 1994) and supports the evolutionary hypothesis proposed by Crampton (1944), in which the Tephritidae are closer to the Calyptrate Calliphoridae than to the Acalyptrate Drosophilidae. Clearly, the white intron/exon organization represents a useful character for higher-order dipteran phylogenetic analysis. In the future it will be interesting to extend this analysis to the intron organization of those white loci for which only the cDNA sequence is currently known, such as those of humans and mice (Croop et al. 1997).
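Intron presence or absence can be treated as a set of binary phylogenetic characters. The sketch below encodes one reading of the text: three introns shared by all six species, two further introns (in ancestral exons 2 and 4) present in D. melanogaster and assumed here to be shared by the higher flies, and one more exon-4 intron shared by the Tephritidae and Calliphoridae. It computes pairwise Hamming distances and clusters the species by UPGMA. The character coding, species keys, and function names are hypothetical, and UPGMA on intron characters is a swapped-in illustration, not the method used to build the protein-based dendrogram in Figure 4b.

```python
from itertools import combinations

# Hypothetical binary intron characters (1 = intron present), in the order:
# three introns shared by all species, the exon-2 and exon-4 introns present in
# D. melanogaster, and the additional exon-4 intron of Tephritidae/Calliphoridae.
INTRONS = {
    "A_gambiae": (1, 1, 1, 0, 0, 0),
    "A_albimanus": (1, 1, 1, 0, 0, 0),
    "D_melanogaster": (1, 1, 1, 1, 1, 0),
    "C_capitata": (1, 1, 1, 1, 1, 1),
    "B_tryoni": (1, 1, 1, 1, 1, 1),
    "L_cuprina": (1, 1, 1, 1, 1, 1),
}

# Exon count = intron count + 1, giving 4, 6, and 7 exons as described above
EXONS = {name: sum(profile) + 1 for name, profile in INTRONS.items()}


def hamming(a, b):
    """Number of intron characters that differ between two species."""
    return sum(x != y for x, y in zip(a, b))


def upgma(profiles):
    """Average-linkage (UPGMA) clustering; returns the tree as nested tuples."""
    sizes = {name: 1 for name in profiles}
    trees = {name: name for name in profiles}
    dist = {frozenset(p): float(hamming(profiles[p[0]], profiles[p[1]]))
            for p in combinations(profiles, 2)}
    while len(trees) > 1:
        # Closest pair of clusters; ties are broken alphabetically
        a, b = min((tuple(sorted(k)) for k in dist),
                   key=lambda p: (dist[frozenset(p)], p))
        merged = a + "+" + b
        # Size-weighted average distance from the new cluster to each other cluster
        for other in trees:
            if other not in (a, b):
                dist[frozenset((merged, other))] = (
                    sizes[a] * dist[frozenset((a, other))]
                    + sizes[b] * dist[frozenset((b, other))]
                ) / (sizes[a] + sizes[b])
        # Drop every distance that involves one of the merged clusters
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        trees[merged] = (trees.pop(a), trees.pop(b))
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(trees.values()))


print(EXONS)
print(upgma(INTRONS))
```

Because the three higher flies have identical profiles under this coding, intron characters alone cannot order C. capitata, B. tryoni, and L. cuprina relative to one another; the nesting within that group reflects only the alphabetical tie-break. The coding does, however, place the Tephritidae at distance 0 from L. cuprina and at distance 1 from D. melanogaster.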

Developmental expression: In Drosophila, white transcripts are 2.6 kb long and rare, having been estimated to make up as little as 0.0005% of total poly(A)+ RNA from pupae or adults (O’Hare et al. 1983). Although we have not measured the absolute abundance of white transcripts in C. capitata, we have used RNA blot analysis to examine their developmental expression pattern. In these studies, C. capitata white transcripts appear to be expressed at relatively low levels in early embryonic and larval stages, rising to higher levels in pupal and adult stages. This developmental variation in Ceratitis white expression is consistent with the fluctuating levels observed for B. tryoni white transcripts during larval and pupal stages (Bennett and Frommer 1997), whereas expression in Drosophila appears to remain constant throughout development (Pirrotta and Brockl 1984). Furthermore, those definitive Drosophila experiments detected several minor transcripts, among them a smaller male-specific transcript thought to be responsible for gonadal pigmentation. In C. capitata, although similar pigmentation is observed in both male and female Malpighian tubules (data not shown), we did not observe any transcript size variants. We cannot, however, rule out the possibility that a more sensitive expression study using RNase protection or reverse transcriptase-PCR might uncover such tissue-specific variation in white transcription.
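For scale, the Drosophila estimate converts as follows. The second expression assumes, for illustration only, that the percentage refers to a mass fraction of poly(A)+ RNA:

```latex
0.0005\% = \frac{0.0005}{100} = 5\times10^{-6} = \frac{1}{200\,000},
\qquad
1\ \mu\text{g} \times 5\times10^{-6} = 5\times10^{-12}\ \text{g} = 5\ \text{pg}
```

Under that assumption, a microgram of pupal or adult poly(A)+ RNA would contain only about 5 pg of white mRNA.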

Molecular characterization of the mutant white alleles: As in Drosophila, the medfly white mutant alleles examined here did not arise from a single mutational event. The w1 allele contains a frameshift mutation in exon 6 caused by a single-base duplication (Figures 1 and 6b). This frameshift allele produces a transcript of the expected size (Figure 7) that contains a premature stop codon, which would result in a truncated translation product. Such truncation would presumably seriously disrupt protein function, as exon 6 is thought to encode the transmembrane helices (Garcia et al. 1996). The two amino acid substitutions found in exons 5 and 6 of the w1 mutant sequence are unlikely to be important, as they are also present in the wild-type white cDNA sequence (Zwiebel et al. 1995); furthermore, they are not located in the highly conserved motifs thought to be critical for protein function. The second mutant allele, w2, from the M245 strain, is very similar to the longer Benakeion C1 intron variant but lacks 152 bp of the 3′ end of intron 1 and 225 bp of exon 2 (Figures 1 and 6a). This deletion, which interrupts the coding sequence, is almost certainly the cause of the white mutation in this strain, although we cannot exclude the possibility that other disruptive mutations are also present. Furthermore, the RT-PCR data presented here (Figure 7) support the hypothesis that the loss of the 5′ part of exon 2 and of the 3′ splice site of intron 1 might lead to the use of cryptic or nonproductive alternative splice sites. The nature of these two mutant alleles may have important implications for their suitability for use in rescue experiments. One would expect the large deletion of part of the w2 coding sequence to render this allele very stable.
The w1 mutant allele, which results from a frameshift caused by a single-base duplication, would conversely be more susceptible to reversion, because the loss of one copy of the duplicated base would restore the wild-type reading frame. Given the importance of efficient and reliable markers in transformation systems, a strain homozygous for w2 would therefore be a more appropriate host. This matters because reversion of the mutant white allele to a functional wild-type allele in the host strain would give rise to false transformation events, whereas reversion in a transformed line could mask subsequent instability of the integrations. Furthermore, revertants would be at a strong selective advantage, since eye color mutants are less competitive in mating than their wild-type counterparts in both Drosophila and C. capitata, presumably owing to impaired visual response or discrimination (Greer and Green 1962; Conforti et al. 1999).
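The contrast in expected stability can be illustrated with a toy coding sequence. The sketch below uses a hypothetical open reading frame and hypothetical function names, not the medfly white sequence. It shows that duplicating one base shifts the downstream reading frame and exposes a premature stop codon, and that deleting either copy of the duplicated base restores the original sequence exactly. No comparable single-base event could restore the 377 bp (152 + 225 bp) missing from w2.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop(cds: str) -> Optional[int]:
    """Return the codon index of the first in-frame stop codon, or None."""
    for start in range(0, len(cds) - 2, 3):
        if cds[start:start + 3] in STOP_CODONS:
            return start // 3
    return None


def duplicate_base(seq: str, pos: int) -> str:
    """Single-base duplication: insert a second copy of the base at `pos`."""
    return seq[:pos + 1] + seq[pos] + seq[pos + 1:]


def delete_base(seq: str, pos: int) -> str:
    """Single-base deletion at `pos`."""
    return seq[:pos] + seq[pos + 1:]


# Hypothetical wild-type ORF: ATG AAT AAC CCC TAG (stop is codon index 4)
wild_type = "ATGAATAACCCCTAG"
mutant = duplicate_base(wild_type, 3)   # reads ATG AAA TAA ...: premature stop
revertant = delete_base(mutant, 4)      # remove one copy of the duplicated A

print(first_stop(wild_type), first_stop(mutant), revertant == wild_type)
```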

Despite its widespread use, the white marker has a drawback in insect transformation systems: transformants often display different levels of rescue of the eye color phenotype. Previous studies have shown that in the medfly, as in Drosophila, this variation is due to position effects that suppress expression of the white marker minigene (Pirrotta et al. 1985; Hazelrigg 1987; Loukeris et al. 1995; Handler et al. 1998). Given the importance of visible markers, not only for the initial identification of transgenic insects but also for assessing their stability, new or improved markers will continue to be an important factor in the development of transformation systems. In the case of endogenous marker genes such as white in particular, the stability of the mutant allele carried by the recipient strain should not be overlooked.

Acknowledgments

We thank Dr. C. Louis and the entire faculty and staff at the Institute for Molecular Biology and Biotechnology (Heraklion, Crete, Greece) where this work was initiated, as well as A. N. Fox, C. E. Merrill, and other members of the Zwiebel laboratory for helpful discussions. This research was supported by the U.S. Department of Agriculture-CSRS (92-37302-8237; to L.J.Z.); a National Science Foundation (United States)-NATO fellowship (9255297; to L.J.Z.); a short-term European Molecular Biology Organization fellowship (to S.C.); the John D. and Catherine T. MacArthur Foundation (to F.C.K.); the European Communities Commission (IC18-CT96-0100); the Italian Ministry of Agriculture and the International Atomic Energy Agency, Vienna (to A.M.); and in part by PRIN98 “Molecular Regulation of Development” (to G.S.).

Footnotes

  • Communicating editor: T. C. Kaufman

  • Received August 8, 2000.
  • Accepted November 15, 2000.

LITERATURE CITED
