High-throughput screens allow us to understand how transcription factors trigger developmental processes, including cell specification. A major challenge is identification of their binding sites because feedback loops and homeostatic interactions may mask the direct impact of those factors in transcriptome analyses. Moreover, this approach dissects the downstream signaling cascades and facilitates identification of conserved transcriptional programs. Here we show the results and the validation of a DNA adenine methyltransferase identification (DamID) genome-wide screen that identifies the direct targets of Glide/Gcm, a potent transcription factor that controls glia, hemocyte, and tendon cell differentiation in Drosophila. The screen identifies many genes that had not been previously associated with Glide/Gcm and highlights three major signaling pathways interacting with Glide/Gcm: Notch, Hedgehog, and JAK/STAT, which all involve feedback loops. Furthermore, the screen identifies effector molecules that are necessary for cell-cell interactions during late developmental processes and/or in ontogeny. Typically, immunoglobulin (Ig) domain–containing proteins control cell adhesion and axonal navigation. This shows that early and transiently expressed fate determinants not only control other transcription factors that, in turn, implement a specific developmental program but also directly affect late developmental events and cell function. Finally, while the mammalian genome contains two orthologous Gcm genes, their function has been demonstrated in vertebrate-specific tissues, placenta, and parathyroid glands, begging questions on the evolutionary conservation of the Gcm cascade in higher organisms. Here we provide the first evidence for the conservation of Gcm direct targets in humans. In sum, this work uncovers novel aspects of cell specification and sets the basis for further understanding of the role of conserved Gcm gene regulatory cascades.
Understanding the molecular signature of a developmental pathway is a major challenge in modern biology. Transcription factors specify cell fates by inducing the expression of specific genes. For instance, the zinc finger transcription factor glial cell deficient/glial cell missing (Glide/Gcm, or Gcm for the sake of simplicity) is expressed transiently at early stages (Bernardoni et al. 1997; Laneve et al. 2013; Flici et al. 2014) and controls Drosophila glial and blood development (Hosoya et al. 1995; Jones et al. 1995; Vincent et al. 1996; Bernardoni et al. 1997; Egger et al. 2002; Freeman et al. 2003; Soustelle et al. 2004; Altenhein et al. 2006). Gcm is also expressed in tendon and peritracheal cells (Soustelle et al. 2004; Laneve et al. 2013), showing that fate determinants have a much broader role than expected and likely trigger the expression of target genes depending on the transcriptional and epigenetic environment of the different cell types. Expression profiling data and computational predictions were used previously to gain a better understanding of the Gcm regulatory network (Egger et al. 2002; Freeman et al. 2003; Altenhein et al. 2006), but these approaches did not allow genome-wide identification of the direct targets. Genes directly targeted by transcription factors are commonly identified by chromatin immunoprecipitation (ChIP) using antibodies specific to the transcription factors. Because no efficient antibody is available for Gcm (Popkova et al. 2012; Laneve et al. 2013), we decided to use DNA adenine methyltransferase identification (DamID) to identify the Gcm direct targets in Drosophila.
The DamID chromatin profiling is a methylation-based tagging method used to identify the direct genomic loci bound by transcription factors (van Steensel and Henikoff 2000; van Steensel et al. 2001). The approach is based on the fusion of a bacterial Dam methylase to a protein of interest to mark the factor’s genomic binding sites by adenine methylation. The DamID screen allowed us to identify 1031 targets, only some of which have already been associated with a Gcm-dependent cascade. Several targets belong to the Notch (N), JAK/STAT, and Hedgehog (Hh) pathways and suggest the presence of feedback loops. Because these pathways were previously shown to affect the cell populations depending on Gcm, the DamID data provide a molecular frame to clarify the observed mutant phenotypes (Hosoya et al. 1995; Jones et al. 1995; Bernardoni et al. 1997). The DamID screen also brought to light two key features of the Gcm pathway.
First, we address the late role of fate determinants beyond their ability to trigger novel transcriptional programs that are subsequently maintained by other factors [reviewed in Cattenoz and Giangrande (2015)]. The transiently expressed Gcm transcription factor is known to induce the expression of Reverse polarity (Repo), Tramtrack (Ttk), and Pointed (Pnt) transcription factors that will ensure and maintain the glial-specific differentiation program (Flici et al. 2014) [reviewed in Cattenoz and Giangrande (2015)], and many Gcm targets identified by the DamID screen code for transcription factors. In addition, however, we found a significantly high number of effector genes, including numerous members of the Ig domain–containing protein family. These are molecules that affect cell function or late developmental events, including cell migration, a key feature of glia and hemocytes (Schmucker et al. 2000; Watson et al. 2005; Kumar et al. 2015) [reviewed in Schwabe et al. (2009)]. This suggests that early genes such as gcm may have a much broader impact than expected in cell specification/physiology.
Second, the Gcm pathway is conserved in evolution. The Gcm protein is structurally conserved, as are most key developmental factors present in the fly genome. Like the fly ortholog, murine Gcm1 (mGcm1) and Gcm2 (mGcm2) are important transcription factors because their deletion is lethal (Anson-Cartwright et al. 2000; Gunther et al. 2000). However, the main role of the mammalian genes, including the human genes, is, respectively, in the placenta and the parathyroid glands, two tissues that do not exist in invertebrates (Kim et al. 1998; Basyuk et al. 1999, 2009; Gordon et al. 2001; Correa et al. 2002; Chen et al. 2004; Mannstadt et al. 2008, 2011; Doyle et al. 2012; Yi et al. 2012; Park et al. 2013; Mitsui et al. 2014). The DamID data allow us to identify direct targets that are common to flies and vertebrates. To the best of our knowledge, this is the first evidence of functional conservation and sets the basis to further understand the Gcm network in mammals.
Materials and Methods
The pUASTattB-NDam construct was made by cloning the Dam-Myc cassette from pNDam-Myc (van Steensel and Henikoff 2000; van Steensel et al. 2001), using EcoRI and BglII, into pUASTattB. To produce the Dam-Gcm fusion construct, the gcm full-length coding sequence was cloned into pUASTattB-NDam (Choksi et al. 2006) using KpnI and NotI sites. The two constructs were used to produce UAS Dam and UAS Dam-Gcm flies, respectively, employing the docking site attP-22A (Bischof et al. 2007). Stage 10–11 embryos [4–7 hr after egg laying (AEL)] were collected from the two strains. DNA isolation, processing, and amplification were performed as described previously (Choksi et al. 2006). The Dam-only and Dam-Gcm samples were labeled and hybridized together on a whole-genome 2.1 million–feature tiling array with 50- to 75-mer oligonucleotides spaced at ∼55-bp intervals (Nimblegen Systems). Arrays were scanned and intensities extracted (Nimblegen Systems). Three biological replicates (with one dye swap) were performed. Log2 ratios of each spot were median normalized.
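The median normalization of log2 ratios mentioned above can be illustrated with a minimal sketch. The intensity values and the function name median_normalize are hypothetical and chosen only for illustration; the sketch assumes that normalization means subtracting the array-wide median from every log2(Dam-Gcm/Dam-only) ratio, which is the usual reading of the term but is not spelled out in the text.

```python
import math
from statistics import median

def median_normalize(log2_ratios):
    """Subtract the array-wide median so that normalized ratios centre on 0."""
    m = median(log2_ratios)
    return [r - m for r in log2_ratios]

# Hypothetical (Dam-Gcm, Dam-only) intensity pairs for five probes.
pairs = [(200.0, 100.0), (400.0, 100.0), (100.0, 100.0), (800.0, 100.0), (50.0, 100.0)]

# Log2 ratio of fusion signal over Dam-only control for each probe.
raw = [math.log2(fusion / control) for fusion, control in pairs]

normalized = median_normalize(raw)
print(normalized)
```

After this step, a probe with a normalized value of 1 carries twice the median enrichment of the array.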
A peak-finding algorithm with false-discovery-rate (FDR) analysis was developed to identify significant binding sites (PERL script available on request). All peaks spanning eight or more consecutive probes (∼900 bp) over a twofold ratio change were assigned an FDR value. To assign an FDR value, the frequency of a range of small peak heights (0.1–1.25 log2 increase) was calculated within a randomized data set (for each chromosome arm) using 20 iterations for each peak size. This was repeated for a range of peak widths (6–15 consecutive probes). All these data were used to model the exponential decay of the FDR with respect to increasing peak height and peak width, therefore enabling extrapolation of FDR values for higher and broader peaks. This analysis was performed independently for each replicate data set. Each peak was assigned the highest FDR value from the three replicates. Genes were defined as targets where a binding event (with FDR < 0.1%) occurred within 5 kb of the transcriptional unit (depending on the proximity of adjacent genes).
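The peak definition above (a run of at least eight consecutive probes above a twofold ratio, with the most conservative FDR across replicates) can be sketched as follows. This is not the PERL script used in the study: the function names, the inclusive threshold, and the example values are assumptions made for illustration, and the FDR modeling step itself is not reproduced.

```python
def find_peaks(log2_ratios, min_probes=8, min_log2=1.0):
    """Return (start, end, mean_height) for every run of at least min_probes
    consecutive probes with log2 ratio >= min_log2 (twofold).
    Indices are 0-based and end is inclusive."""
    peaks = []
    start = None
    for i, value in enumerate(log2_ratios):
        if value >= min_log2:
            if start is None:
                start = i          # a new run begins here
        else:
            if start is not None and i - start >= min_probes:
                run = log2_ratios[start:i]
                peaks.append((start, i - 1, sum(run) / len(run)))
            start = None           # the run (if any) is closed
    # Close a run that extends to the last probe of the chromosome arm.
    if start is not None and len(log2_ratios) - start >= min_probes:
        run = log2_ratios[start:]
        peaks.append((start, len(log2_ratios) - 1, sum(run) / len(run)))
    return peaks

def is_significant(fdr_per_replicate, threshold=0.001):
    """Keep the highest (worst) FDR among replicates and compare it to 0.1%."""
    return max(fdr_per_replicate) < threshold

# Hypothetical probe signal: one run of nine probes, one run of only four.
signal = [0.2] * 3 + [1.5] * 9 + [0.1] * 2 + [1.2] * 4 + [0.0]
peaks = find_peaks(signal)
print(peaks)
print(is_significant([0.0002, 0.0005, 0.0009]))
```

Taking the maximum FDR across replicates means that a peak is retained only if it is significant in every replicate.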
Conservation of the Gcm binding sites located in DamID peaks
The Drosophila genome (version BDGP R5/Dm3) was scanned for the canonical Gcm binding sites (GBSs) listed in Figure 1B. For each GBS, the conservation score, which was calculated from 12 Drosophila species, mosquito, honeybee, and red flour beetle (Blanchette et al. 2004; Siepel et al. 2005), was taken from the Conservation track (multiz15way) on the University of California Santa Cruz (UCSC) Genome Browser. The GBSs located within 1 kb of DamID peaks were compared with the whole population of GBSs (Figure 1E). An F-test was used to compare the variance of the two populations, and a t-test for unequal sample variance was used to calculate the P-value.
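The two statistics named above can be sketched in a few lines. The conservation scores below are hypothetical placeholders, and the function names are chosen for illustration; the sketch computes only the variance ratio (the F statistic) and the t statistic with Welch–Satterthwaite degrees of freedom for unequal variances, and leaves the conversion to a P-value to a statistics library.

```python
from math import sqrt
from statistics import mean, variance

def f_ratio(a, b):
    """Ratio of sample variances, the statistic of an F-test for equal variance."""
    return variance(a) / variance(b)

def welch_t(a, b):
    """t statistic and Welch-Satterthwaite degrees of freedom
    for two samples with unequal variances."""
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical conservation scores: GBSs near DamID peaks vs. all GBSs.
near_peaks = [1.0, 2.0, 3.0, 4.0, 5.0]
genome_wide = [2.0, 4.0, 6.0, 8.0, 10.0]

t_stat, dof = welch_t(near_peaks, genome_wide)
print(f_ratio(near_peaks, genome_wide), t_stat, dof)
```

In practice, a library routine such as SciPy's scipy.stats.ttest_ind with equal_var=False returns the same statistic together with the P-value.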
Comparison with expression profiling data
The data set from Freeman et al. (2003) was retrieved directly from the publication. For Egger’s data set, the raw data were retrieved and analyzed as described in the paper (Egger et al. 2002) (intensities >50 and fold change >1.5) with a more restrictive P-value (<0.001). The data set from Altenhein et al. (2006) comprising the filtered and tested genes for the gcm gain of function (GOF) and gcm loss of function (LOF) was retrieved, and all genes giving nonspecific or negative in situ hybridizations (ISHs) were removed to make the Venn diagrams in Figure 2. The R package VennDiagram was used to draw the diagrams in Figure 2, B and C (Chen and Boutros 2011). The gene names for all data sets then were converted to FlyBase gene numbers (FBgn) for comparison with the DamID genes using the FlyBase conversion tool (dos Santos et al. 2015).
The expression profiles of the DamID genes were compared to the expression profile of gcm in embryos using the in situ data produced by the Berkeley Drosophila Genome Project on embryos (Tomancak et al. 2002, 2007; Hammonds et al. 2013) (Table 1, Table 2, and Supporting Information, Table S1, column O).
Fly strains and immunolabeling
Flies were raised on standard medium at 25°. The following strains were used: gcmGal4, UASmCD8GFP/CyO, Tb1 (gcmGal4, UASGFP in the text) (Soustelle and Giangrande 2007), y1v1; P(TRiP.JF01075)attP2 (UASgcmRNAi in the text) (Bloomington B#31519), repoGal80 (gift of B. Altenhein), and enGal4 (Bloomington B#30564). enGal4 flies were crossed with Oregon R flies to generate enGal4/+ or with UASgcm F18A flies (Bernardoni et al. 1998) to generate enGal4/+; UASgcm/+ flies (Figure 4), and gcmGal4, UASmCD8GFP was recombined with repoGal80 to produce the line gcmGal4, UASmCD8GFP, repoGal80/CyO. Overnight lays of Drosophila embryos were used for Figure 4. In Figure 5, Drosophila central nervous systems (CNSs) were dissected and labeled as described previously (Ceron et al. 2001). The primary antibodies used were rat anti-Ci [1:100; supernatant from the Developmental Studies Hybridoma Bank (DSHB)], mouse anti-Ptc (1:100; supernatant from the DSHB), mouse anti-Smo (1:100; supernatant from the DSHB), rat anti-Elav (1:200; supernatant from the DSHB), chicken anti-GFP (1:1000; Abcam #13970), and rabbit anti-Dh31 [1:500; kindly provided by J. Veenstra (Veenstra et al. 2008; Veenstra 2009)]. Secondary antibodies conjugated with FITC, Cy3, or Cy5 (Jackson) were used at 1:500. DAPI was used at 100 ng/ml for nuclear counterstaining. Embryos and brains were mounted in VECTASHIELD (Vector) mounting medium and analyzed by confocal microscopy (Leica SP5) using identical settings between controls and mutants (gcm GOF and hypomorph).
Gene Ontology (GO) term and protein domain enrichment analysis
Luciferase assay in S2 cells
For CG30002, CycA, E(spl)m8, and ptc (Figure 3, A–D), sense and antisense oligonucleotides covering the GBSs in each gene were synthesized with flanking restriction sites for KpnI at the 5′ extremity and for NheI at the 3′ extremity. Each pair of oligonucleotides was designed with the wild-type (WT) GBS and a mutated GBS that is not bound by Gcm (mutated for nucleotides 2, 3, 6, and/or 7). In Table 3, the GBS and restriction sites are indicated by capital letters. For each GBS, the WT and mutant double-stranded probes were prepared as follows: 1 nmol of forward probe and 1 nmol of reverse probe were combined in 10 mM Tris, pH 7.5, 1 mM EDTA, and 50 mM NaCl in 100 μl of total solution. The mix was incubated for 1 min at 95° in a heating block, and then the heating block was turned off and allowed to cool to 25°. Then 2 μg of annealed oligonucleotides was digested with 20 units of KpnI [New England Biolabs (NEB) #R3142S] and 20 units of NheI (NEB #R3131S) in CutSmart buffer (NEB #B7204S) for 90 min at 37°. The digested double-stranded probes then were cleaned using a PCR Clean-Up Kit [Macherey-Nagel (MN) #740609] according to the manufacturer’s instructions.
Then 1 μg of luciferase reporter plasmid pGL4.23[luc2/minP] (pGL4.23) (Promega #E841A) was digested with KpnI and NheI as described previously; after 90 min at 37°, 20 units of alkaline phosphatase, calf intestinal (CIP; Promega #M0290S) was added to the plasmid and incubated for 1 hr at 37°. The plasmid then was cleaned using a PCR Clean-Up Kit (MN #740609).
Then 50 ng of digested luciferase plasmid was combined with the digested annealed probes (ratio plasmid/probe = 1:6), 400 units of ligase (NEB #M0202S), and ligation buffer (NEB #B0202S) and incubated overnight at 18°. The ligated plasmids then were dialyzed for 30 min on membrane filters (Millipore #VSWP02500) and amplified using the Plasmid DNA Purification Kit (MN #740410) according to the manufacturer’s instructions.
Transfections of Drosophila S2 cells were carried out in 12-well plates using Effectene transfection reagent (Qiagen #301427) according to the manufacturer’s instructions. Cells were transfected with 0.5 μg pPac-lacZ, 0.5 μg pGL4.23 carrying the indicated GBS, and either 0.5 μg pPac-gcm (Miller et al. 1998) or 0.5 μg pPac (Krasnow et al. 1989). Then, 48 hr after transfection, cells were collected, washed once in cold PBS, and resuspended in 100 μl of lysis buffer (25 mM Tris-phosphate, pH 7.8, 2 mM EDTA, 1 mM DTT, 10% glycerol, and 1% Triton X-100). The suspensions were frozen and thawed four times in liquid nitrogen and centrifuged for 30 min at 4° at 13,000 × g. The luciferase and LacZ activities were measured in triplicate for each sample. For LacZ measurements, 20 μl of lysate was mixed with 50 μl of β-galactosidase assay buffer (60 mM Na2HPO4, 40 mM NaH2PO4, 10 mM KCl, 1 mM MgCl2, and 50 mM β-mercaptoethanol) and 20 μl of ortho-nitrophenyl-β-galactoside (ONPG, 4 mg/ml) and incubated at 37° for 20 min. The reaction was stopped by adding 50 μl of 1 M Na2CO3, and the optical density at 415 nm was measured. For luciferase activity, 10 μl of protein lysate was analyzed on an opaque 96-well plate (Packard Instruments #6005290) with a Berthold Microluminat LB96P Luminometer by injecting 50 μl of luciferase buffer (20 mM Tris-phosphate, pH 7.8, 1 mM MgCl2, 2.5 mM MgSO4, 0.1 mM EDTA, 0.5 mM ATP, 0.5 mM luciferin, 0.3 mM coenzyme A, and 30 mM DTT). For both LacZ and luciferase assays, background levels were estimated using lysate from nontransfected S2 cells. The relative luciferase levels were calculated as follows: first, the background was subtracted from each value, and then the average values of the technical triplicate were calculated. From there, the luciferase activity of each sample was normalized to the LacZ activity (luciferase activity/LacZ activity) to correct for transfection efficiency variability, and the ratio luciferase with Gcm/luciferase without Gcm was calculated. For each WT and mutant GBS, biological triplicates were carried out.
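The calculation of relative luciferase levels described above (background subtraction, averaging of technical triplicates, normalization to LacZ, then the ratio with Gcm over without Gcm) can be written as a short sketch. All readings and function names below are hypothetical and serve only to make the order of operations explicit.

```python
from statistics import mean

def background_corrected_mean(technical_replicates, background):
    """Subtract the background from each reading, then average the triplicate."""
    return mean([value - background for value in technical_replicates])

def relative_luciferase(luc_gcm, lacz_gcm, luc_ctrl, lacz_ctrl, luc_bg, lacz_bg):
    """Luciferase/LacZ with Gcm divided by luciferase/LacZ without Gcm."""
    norm_gcm = (background_corrected_mean(luc_gcm, luc_bg)
                / background_corrected_mean(lacz_gcm, lacz_bg))
    norm_ctrl = (background_corrected_mean(luc_ctrl, luc_bg)
                 / background_corrected_mean(lacz_ctrl, lacz_bg))
    return norm_gcm / norm_ctrl

# Hypothetical readings for one reporter (luminescence units and OD at 415 nm).
fold = relative_luciferase(
    luc_gcm=[1100.0, 1000.0, 900.0],   # reporter + pPac-gcm
    lacz_gcm=[0.55, 0.50, 0.45],
    luc_ctrl=[300.0, 350.0, 250.0],    # reporter + empty pPac
    lacz_ctrl=[0.45, 0.40, 0.50],
    luc_bg=100.0,                      # nontransfected S2 lysate
    lacz_bg=0.05,
)
print(fold)
```

With these made-up numbers, the reporter is induced about fourfold by Gcm once differences in transfection efficiency are accounted for.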
S2 cell FACS and quantitative PCR (qPCR)
S2 cells were plated in six-well plates, 6 million cells per well, in 1.5 ml of Schneider medium complemented with 10% fetal calf serum (FCS) and 0.5% penicillin and 0.5% streptomycin (PS). Cells were transfected 12 hr after plating using Effectene Transfection Reagent (Qiagen). Briefly, 2 µg of pPac-gal4 vector and 1 µg of pUAS-GFP for the negative control and 2 µg of pPac-gcm (Miller et al. 1998) and 1 µg of 4.3kb repo-GFP (repoGFP) (Laneve et al. 2013) for the gcm GOF were mixed with 90 µl of EC buffer and 24 µl of enhancer and incubated for 5 min at room temperature, and then 25 µl of Effectene was added, and the mix was incubated at room temperature for 20 min. Then 600 µl of Schneider medium + 10% FCS + 0.5% PS was added to the mix before spreading it on the cells. At 48 hr after transfection, cells were sorted on a BD FACSAria according to GFP expression to obtain more than 80% of transfected cells in the sample. The RNA then was extracted using TRI Reagent (Sigma), and 1 µg of RNA per sample was DNAse treated with RNAse-free DNAse 1 (Thermo Fisher) and reverse transcribed with Superscript II (Invitrogen). qPCR was performed on a LightCycler 480 (Roche) with SYBR master (Roche) on the equivalent of 5 ng of reverse-transcribed RNA with the primer pairs listed in Table S3. Each PCR was carried out in triplicate on at least three biological replicates. The quantity of each transcript was normalized to the quantity of the housekeeping genes Glyceraldehyde 3 phosphate dehydrogenase 1 (Gapdh1) and Actin 5c (Act5c). The P-values were measured comparing the control with the transfected cells using Student’s t-test (bars = SEM).
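The normalization of each transcript to the two housekeeping genes can be illustrated with a minimal sketch. The text does not state which quantification model was used, so the sketch assumes a standard ΔCt approach with 100% amplification efficiency, in which averaging the reference Ct values corresponds to taking the geometric mean of the reference quantities. The Ct values and function names are hypothetical.

```python
from statistics import mean

def normalized_quantity(ct_target, ct_references):
    """Target quantity relative to the reference genes, assuming the signal
    doubles at every cycle: 2 ** (mean reference Ct - target Ct)."""
    return 2.0 ** (mean(ct_references) - ct_target)

def fold_change(ct_target_test, ct_refs_test, ct_target_ctrl, ct_refs_ctrl):
    """Normalized quantity in transfected cells over that in control cells."""
    return (normalized_quantity(ct_target_test, ct_refs_test)
            / normalized_quantity(ct_target_ctrl, ct_refs_ctrl))

# Hypothetical Ct values; the references stand for Gapdh1 and Act5c.
induction = fold_change(
    ct_target_test=24.0, ct_refs_test=[18.0, 20.0],   # gcm GOF cells
    ct_target_ctrl=27.0, ct_refs_ctrl=[18.0, 20.0],   # control cells
)
print(induction)
```

Here the target crosses the threshold three cycles earlier in the transfected cells while the references are unchanged, which corresponds to an eightfold induction under the stated assumption.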
qPCR on gcm-overexpressing embryos
RNA extraction was carried out on 50 to 100 enGal4/+ or enGal4/+; UASgcm/+ stage 13–14 embryos (at 25°, 2 hr of egg laying, 9 hr and 20 min of incubation before collection) using TRI Reagent (Sigma). RNA extraction and qPCR were performed as described for the S2 cells in triplicate.
Conversion to human orthologs
The Drosophila RNAi Screening Center (DRSC) integrative ortholog prediction tool (Hu et al. 2011) was used to retrieve the human orthologs of all Gcm targets identified in Drosophila by the DamID screen. All human genes with a weighted score > 1 were selected.
qPCR in HeLa cells
HeLa cells were plated in six-well plates, 400,000 cells per well, in 1.6 ml of Dulbecco’s Modification of Eagle’s Medium (DMEM) complemented with 5% FCS and gentamycin. Cells were transfected 12 hr after plating using Effectene Transfection Reagent (Qiagen). Briefly, 1 µg of pCIG vector, 1 µg of pCIG vector expressing mouse Gcm1 (pCIG-mGcm1) (Soustelle et al. 2007), or 1 µg of pCIG vector expressing mouse Gcm2 (pCIG-mGcm2) were mixed with 100 µl of EC buffer and 8 µl of enhancer and incubated for 5 min at room temperature, and then 10 µl of Effectene was added, and the mix was incubated at room temperature for 20 min. Then 200 µl of DMEM + 5% FCS + gentamycin was added to the mix before spreading it on the cells, and 48 hr after transfection, the RNA was extracted using TRI Reagent (Sigma). Then 1 µg of RNA per sample was DNAse treated with RNAse-free DNAse 1 (Thermo Fisher) and reverse transcribed with Superscript II (Invitrogen). qPCR was performed on a LightCycler 480 (Roche) with SYBR master (Roche) on the equivalent of 5 ng of reverse-transcribed RNA with the primer pairs listed in Table S3. Each PCR was carried out in triplicate on at least three biological replicates. The quantity of each transcript was normalized to the quantity of the housekeeping genes Glyceraldehyde 3 phosphate dehydrogenase (GAPDH) and Actin Beta (ACTB). The P-values were measured comparing the control with the transfected cells using Student’s t-test (bars = SEM).
Luciferase assay in HeLa cells
For GCM1 and GCM2 (Figure 9), oligonucleotides surrounding the GBSs were designed with flanking restriction sites for KpnI at the 5′ extremity and for NheI at the 3′ extremity (in Table 4, the GBS and restriction sites are indicated by lowercase letters). Each pair of oligonucleotides was used to amplify the genomic region encompassing the GBSs on HeLa genomic DNA using Expand High Fidelity System DNA polymerase (Roche). The amplicons were digested with 20 units of KpnI (NEB #R3142S) and 20 units of NheI (NEB #R3131S) in CutSmart buffer (NEB #B7204S) for 90 min at 37°. The digested amplicons then were cleaned using a PCR Clean-Up Kit (MN #740609) according to the manufacturer’s instructions. Then 1 μg of luciferase reporter plasmid pGL4.23[luc2/minP] (pGL4.23) (Promega #E841A) was digested with KpnI and NheI as described previously. After 90 min at 37°, 20 units of CIP (Promega #M0290S) was added to the plasmid and incubated for 1 hr at 37°. The plasmid then was cleaned using a PCR Clean-Up Kit (MN #740609) according to the manufacturer’s instructions, and 50 ng of digested luciferase plasmid was combined with the digested amplicons (ratio of plasmid/probe = 1:6), 400 units of ligase (NEB #M0202S), and ligation buffer (NEB #B0202S) and incubated overnight at 18°. The ligated plasmids then were dialyzed for 30 min on membrane filters (Millipore #VSWP02500) and amplified using the Plasmid DNA Purification Kit (MN #740410) according to the manufacturer’s instructions. These plasmids were used for the luciferase assay (Figure 9) and as templates for mutagenesis. To mutagenize the reporters, primers overlapping the GBSs were designed with mutations for nucleotides at position 2, 3, 6, and/or 7 in the GBSs (in Table 4, the GBS and restriction sites are indicated by lowercase letters).
For each gene, PCR was performed using 5 ng of pGL4.23 containing the WT locus with Expand High Fidelity System DNA polymerase (Roche). A first round of PCRs was carried out to generate the amplicon containing the first and second mutated GBSs and the amplicon containing the second and third mutated GBSs with the following primer pairs: GBS1mut forward/GBS2mut reverse and GBS2mut forward/GBS3mut reverse. The two amplicons then were combined using the primers GBS1mut forward and GBS3mut reverse and inserted into pGL4.23.
HeLa cells were plated in 24-well plates, 60,000 cells per well, in 350 µl of DMEM complemented with 5% FCS and gentamycin. Cells were transfected 12 hr after plating using Effectene Transfection Reagent (Qiagen). Briefly, 2.5 ng of pGL4.75 vector, 250 ng of pCIG vector expressing either mouse Gcm1 (pCIG-mGcm1) (Soustelle et al. 2007) or mouse Gcm2 (pCIG-mGcm2) or empty, and 250 ng of pGL4.23 vector containing the GBS WT or mutant were mixed with 60 µl of EC buffer and 4 µl of enhancer and incubated 5 min at room temperature; then 5 µl of Effectene was added, and the mix was incubated at room temperature for 20 min. Then 100 µl of DMEM + 5% FCS + gentamycin was added to the mix before spreading it on the cells, and 48 hr after transfection, the luciferase assay was performed using the Dual-Luciferase Reporter Assay System (Promega) according to the manufacturer’s instructions with a Berthold Microluminat LB96P Luminometer.
All data are provided with the publication as Supporting Information.
The DamID screen identifies loci containing GBSs
To identify the genes directly regulated by Gcm, we mapped its binding sites using a genome-wide DamID screen (van Steensel and Henikoff 2000; van Steensel et al. 2001). Briefly, the Escherichia coli DNA adenine methyltransferase (Dam) was fused N-terminal to the full-length Gcm coding sequence. Thus, wherever Gcm binds, the Dam methylates the surrounding DNA. The methylated DNA then can be identified by microarray. In our case, the Dam-Gcm screen was performed on Drosophila embryos at stage 11, when Gcm expression peaks. Because Gcm is expressed in several cell types, namely glia, hemocytes, and tendon cells (Soustelle et al. 2004), as well as neuronal (Chotard et al. 2005; Soustelle and Giangrande 2007) and peritracheal cell subsets (Laneve et al. 2013), we decided to search for all its direct targets and did not restrict expression of the Dam-Gcm fusion to a specific cell type.
Overall, 4863 DamID peaks were identified. Motif enrichment analysis using the MICRA tool (Southall and Brand 2009) revealed enrichment for the motif ATGCGGG at the loci bound by the Dam-Gcm fusion (Figure 1A). This motif is closely related to most of the GBSs previously described and validated functionally (Figure 1B) (Akiyama et al. 1996; Schreiber et al. 1997; Miller et al. 1998; Ragone et al. 2003). Up to 83% of the loci identified in the screen contain a canonical GBS(s) within 1 kb of the peak (Figure 1C), and the average density of the GBS(s) present at the DamID peaks (0.693 GBS/kb) is significantly higher than the average GBS density over the whole genome (0.138 GBS/kb) (P = 1.496 × 10⁻¹⁴⁸; Wilcoxon test = 0) (Figure 1D). Finally, because numerous GBSs are present throughout the genome but may not all be relevant to the Gcm cascade, we asked whether those that are under a DamID peak are more likely to be directly associated with Gcm. Indeed, the GBSs present under DamID peaks are significantly more conserved than the GBSs in the whole genome (12 Drosophila species, mosquito, honeybee, and red flour beetle were used for the comparative analysis), thus adding strength to the DamID data (P = 0.00273) (Figure 1E).
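The notion of GBS density (sites per kb) used above can be made concrete with a small sketch that counts occurrences of a motif on both strands of a sequence. Only the enriched motif ATGCGGG is used here; the full list of canonical GBSs is given in Figure 1B and is not reproduced. The toy sequence and the function names are hypothetical, and this is not the scanning procedure of the study.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def count_sites(seq, motif):
    """Count motif occurrences on either strand (overlapping windows allowed).
    A palindromic motif would be counted once per position."""
    seq = seq.upper()
    rc = reverse_complement(motif)
    width = len(motif)
    count = 0
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if window == motif or window == rc:
            count += 1
    return count

def density_per_kb(seq, motif):
    """Number of sites per 1000 bp of sequence."""
    return 1000.0 * count_sites(seq, motif) / len(seq)

# Toy 37-bp sequence with one forward site and one reverse-strand site.
toy = "TT" + "ATGCGGG" + "ACGT" * 5 + "CCCGCAT" + "A"
n_sites = count_sites(toy, "ATGCGGG")
print(n_sites, density_per_kb(toy, "ATGCGGG"))
```

Applied to the reported averages, 0.693 GBS/kb at DamID peaks against 0.138 GBS/kb genome-wide corresponds to roughly a fivefold enrichment.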
The DamID screen identifies genes previously characterized as Gcm interactors
The 4863 DamID peaks are located in the vicinity of (<5 kb) or within 1031 genes (Figure 1F shows the overall peak locations). To assess the specificity of the DamID screen, known targets of Gcm were examined more closely. For instance, gcm itself contains several GBSs upstream of its transcription start site (TSS) and is known to autoregulate, and the strongest GBS was determined previously to be GBS 3/C, which is located 3 kb upstream of the gcm TSS (Miller et al. 1998; Ragone et al. 2003) (Figure 2A). A DamID peak was detected on top of this GBS (Figure 2A). Other examples include sna, AGO1, brat, and lola, which were all identified as gcm interactors in a genetic screen (Popkova et al. 2012) (Figure S1, A–D). All of them contain at least one significant DamID peak in their promoter regions. Moreover, the gene loco involved in late glial cell differentiation is controlled directly by Gcm (Granderath et al. 2000) and has five canonical GBSs, three of which are located within a DamID peak (Figure S1E). Importantly, the three GBSs located under the peak are critical for the expression pattern of loco in glial cells (Granderath et al. 2000). The gene pnt, involved in glial development (Chen et al. 1992; Klambt 1993) and the immune response (Zettervall et al. 2004), was described as downstream of gcm (Giesen et al. 1997) and contains one canonical GBS within one DamID peak (Figure S1F). Two other genes were extensively described as targets of Gcm during glial cell development: ttk and repo. Ttk is a transcriptional repressor inhibiting the neuronal fate in neural stem cells (Giesen et al. 1997). While containing two canonical GBSs within a DamID peak, ttk was not identified as a direct target of Gcm by our screen because the peak is located far (9.2 kb) from the TSS (Figure S1G). The Repo homeodomain transcription factor is required for the late differentiation of lateral glial cells (Campbell et al. 1994; Xiong et al. 1994; Halter et al. 1995) and is directly activated by Gcm through the 11 GBSs present in the 4.3-kb region upstream of the TSS (Lee and Jones 2005). However, repo was not selected in our screen because the observed DamID peak did not pass the enrichment threshold to be considered significant by the algorithm (Figure S1H). This indicates that the criteria for the identification of the Gcm direct targets are extremely stringent.
Three teams had previously performed genome-wide screens for the Gcm downstream targets (Egger et al. 2002; Freeman et al. 2003; Altenhein et al. 2006). Egger et al. (2002) compared WT stage 11 embryos to those expressing Gcm ectopically in the neuroectoderm (GOF embryos) and identified 356 genes significantly enriched in gcm GOF compared to WT animals (P < 0.001). Freeman et al. (2003) combined computational prediction, expression profiling analyses in WT and gcm GOF, and ISH in WT, gcm GOF, and gcm LOF animals to identify and validate 48 genes as downstream targets of Gcm. Finally, Altenhein et al. (2006) tracked Gcm downstream targets in stage 9–16 embryos in WT, gcm LOF, and gcm GOF animals and validated 119 genes by ISH. Together these studies identified 471 downstream targets of Gcm, but the overlap between the three data sets is quite weak, with only 42 genes identified by two of the studies and 5 genes identified by all three studies (Figure 2B). Cross-referencing the three data sets with the DamID peaks allowed us to considerably restrict the number of targets identified by the expression profiling analyses and revealed that 47 genes identified as downstream targets of Gcm in at least one of these studies are direct targets of Gcm according to DamID (Figure 2, C and D). Of note, in the first part of their study, Freeman et al. (2003) developed an algorithm to predict 384 direct targets of Gcm based on the presence of a cluster of eight GBSs in the surrounding regions. Among the predicted targets, only 8.3% (17 of 204 tested) were confirmed in Gcm GOF or Gcm LOF embryos (Freeman et al. 2003). Cross-referencing the DamID data set with these bioinformatics data returned 47 genes of 384 (12%) predicted to be direct targets by the Freeman et al. (2003) algorithm and confirmed as direct targets by our screen. Among these 47 genes, only 7 were previously validated in vivo (Table S1). Together with the observed evolutionary conservation of the GBSs under DamID peaks, this underlines the importance of scoring for occupied binding sites.
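The counts quoted above are mutually consistent, as a short inclusion-exclusion check shows. The sketch assumes that the 42 genes "identified by two of the studies" are those found in exactly two data sets, an interpretation that reproduces the reported total of 471; the variable names are chosen for illustration.

```python
# Number of genes reported by each expression-profiling study.
egger, freeman, altenhein = 356, 48, 119

in_exactly_two = 42   # assumption: shared by exactly two studies
in_all_three = 5      # shared by all three studies

# A gene found in two lists is counted twice in the raw sum (remove 1 copy);
# a gene found in three lists is counted three times (remove 2 copies).
distinct_targets = (egger + freeman + altenhein
                    - in_exactly_two - 2 * in_all_three)

# Validation rates of the GBS-cluster predictions.
confirmed_by_expression = round(100 * 17 / 204, 1)   # tested predictions
confirmed_by_damid = round(100 * 47 / 384)           # all predictions

print(distinct_targets, confirmed_by_expression, confirmed_by_damid)
```

The same bookkeeping applies to any three-way Venn diagram: the union equals the sum of the three sets minus the pairwise-only overlaps and minus twice the triple overlap.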
Finally, expression profiling and DamID analyses provide complementary information because the first approach tells the direct and indirect transcriptional consequences of a mutation, whereas the second tells where the transcription factor binds in the genome. We thus verified that the direct targets identified in our screen are differentially represented in the expression profiling data. Because expression of the direct targets is induced just after Gcm starts being expressed (stage 10), we expect them to be enriched in the expression profiling data relative to the early stages. We thus analyzed the data of Altenhein et al. (2006) and found that for most of the downregulated genes identified in the LOF expression profiling (119 in these data sets), expression starts being affected over a large time window, between stages 11 and 13 (Figure 2E). In contrast, when we performed the same analysis on the gene subset that also was detected in the DamID screen (13 genes), we found that most of these targets start to be downregulated at earlier stages (Figure 2F). Together these findings indicate that the DamID screen is an efficient method to identify the direct targets of Gcm.
The DamID screen identifies new direct targets of Gcm
Among the 1031 genes identified by the DamID screen, more than 900 are new. The interaction between Gcm and four new target genes was validated by luciferase assays in Drosophila S2 cells. These genes were selected to be representative of the different locations of the DamID peaks. They include genes showing a DamID peak in the promoter and carrying canonical GBSs, such as CG30002; genes for which the closest GBSs are near the DamID peak but do not overlap with it, such as CycA; and genes for which the DamID peak and the GBSs are located within the transcribed region of the gene, such as CycA, Enhancer of split m8 (E(spl)m8), and ptc. For each gene, the regions containing the GBS under the DamID peak or the closest GBS to the DamID peak were cloned in a luciferase reporter plasmid. For DamID peaks covering two GBSs (CG30002 and ptc), one reporter was built per GBS. The constructs then were transfected in S2 cells with or without the Gcm expression vector ppacGcm. In parallel, the same regions were mutated for their GBSs and analyzed similarly (Figure 3, A–D). The gene CG30002 contains a significant DamID peak in its promoter region and two GBSs at the position of the peak (Figure 3A). The luciferase assays indicate that both GBSs induce transcription of the reporter on cotransfection with Gcm, and no induction is observed when the GBSs are mutated. Similar observations were made on CycA, E(spl)m8, and ptc, even though the DamID peaks are located within the coding sequences of these genes (Figure 3, B–D).
To confirm the data obtained with the reporter plasmids, we analyzed the effects of Gcm on the endogenous genes and measured the levels of their transcripts in S2 cells by qPCR. S2 cells were transfected with ppacGcm because Gcm is expressed at extremely low levels in those cells (Cherbas et al. 2011). We were able to show significant induction of CycA and E(spl)m8 expression, but no induction was observed for CG30002 or ptc (Figure 3E). Such a negative result might be due to the facts that S2 cells do not contain the appropriate cofactors, the genes are in a repressed chromatin state, and/or S2 cells have low transfection efficiency. Indeed, FACS analysis of S2 cells transfected with the Gal4 expression vector ppacGal4 and UAS-GFP vectors revealed that only 4.92% of the cells express GFP. This means that only a minority of the cell population contains the two plasmids (Figure 3, G and H). To improve the readout of the assay, S2 cells were transfected with ppacGcm and the reporter plasmid repoGFP, which was used previously to trace Gcm activity (Lee and Jones 2005; Laneve et al. 2013; Flici et al. 2014). This allowed us to sort the cells that express the GFP produced under the repo promoter. Based on the FACS analysis, our transfection protocol allows the detection of GFP in 2.11% of S2 cells (Figure 3I). These GFP+ cells were enriched to reach at least 80% purity, and the preceding target genes were analyzed by qPCR. First, we monitored the levels of repo endogenous transcripts to assess the efficiency of our protocol. Without FACS sorting, no change in repo levels was observable on Gcm expression, whereas we could see a 30-fold increase on adding the FACS sorting step (Figure 3F). This step also allowed us to detect the induction of CG30002 and ptc expression (Figure 3F) and greatly improved the detection of E(spl)m8 and CycA transcript induction.
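The benefit of the sorting step can be illustrated with a simple mixing calculation. The sketch below is not part of the analysis reported here; it assumes that a bulk qPCR measurement is the average of induced and uninduced cells, that uninduced cells stay at basal level, and that the GFP+ fraction approximates the fraction of cells with active Gcm. The function names are hypothetical.

```python
def bulk_fold_change(fraction_induced: float, per_cell_fold: float) -> float:
    """Fold change measured on a mixed population, relative to basal level.

    Assumption: induced cells express the transcript at per_cell_fold times
    basal level and all other cells remain at basal level (1.0).
    """
    return fraction_induced * per_cell_fold + (1.0 - fraction_induced)


def per_cell_fold_from_bulk(bulk_fold: float, fraction_induced: float) -> float:
    """Invert the mixing model to estimate the induction in induced cells."""
    return 1.0 + (bulk_fold - 1.0) / fraction_induced


# Values taken from the text: about 80% purity after sorting, 2.11% GFP+
# cells before sorting, and a 30-fold increase of repo in sorted cells.
sorted_purity = 0.80
unsorted_fraction = 0.0211
observed_sorted_fold = 30.0

per_cell = per_cell_fold_from_bulk(observed_sorted_fold, sorted_purity)
expected_unsorted = bulk_fold_change(unsorted_fraction, per_cell)

print(f"estimated induction per induced cell: {per_cell:.2f}-fold")
print(f"expected signal without sorting: {expected_unsorted:.2f}-fold")
```

Under these assumptions, a strong induction in a small fraction of cells is diluted to less than a twofold shift in the unsorted population, a difference that could plausibly be lost in measurement variability.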
Finally, we tested DamID target genes in vivo. Because ptc is strongly required in the epidermis at the level of muscle attachment sites, where Gcm is also expressed and required, we drove epithelial Gcm expression using the engrailedGal4 (enGal4) driver (Figure 4I), which induces expression of tendon cell markers (Soustelle et al. 2004). Because several other members of the Hh pathway also were identified in the DamID screen, including rdx, smo, ci, and Pka-C1, we analyzed these genes as well. First, we performed qPCR assays and found increased expression for some of them (Pka-C1 and rdx) (Figure 4J). To complement this approach, we performed qualitative analyses by immunolabeling on Gcm-overexpressing embryos and found significantly increased expression for Ptc, Ci, and Smo (Figure 4, A–H′). In agreement with previously obtained data (Soustelle et al. 2004), the expression of Repo did not increase, likely owing to the lack of cell-specific factors, which are known to affect Gcm activity strongly (De Iaco et al. 2006). Conversely, the expression of Repo increases on overexpression of Gcm in the neurogenic region, whereas that of Ci, Ptc, and Smo does not (data not shown). In sum, the in vivo findings validate those in S2 cell transfection assays, which thus provide a simple and sensitive approach.
Moreover, the targets identified by the DamID screen are not necessarily expressed in embryos at stage 11, the stage at which the screen was performed. Indeed, comparison of DamID and modENCODE transcriptome data on stage 10–11 embryos reveals that 8.7% of the genes identified in our screen are not expressed at this stage (Figure 3J). For example, the Diuretic hormone 31 (Dh31) gene is not detected in embryos on ISH (Tomancak et al. 2002, 2007; Hammonds et al. 2013) and starts to be expressed at stage 17 according to modENCODE data (Graveley et al. 2011). Nevertheless, the Dh31 locus contains two DamID peaks, and Gcm induces Dh31 expression in S2 cells (Figure 5A). We therefore analyzed later developmental stages and found colocalization of Dh31 and Gcm in a single cell of the larval brain hemisphere (Figure 5, B–C″). At that stage, Gcm is expressed in the de novo–produced glial cells of the optic lobe, in the lamina, and in medulla neurons (Chotard et al. 2005), as well as in two groups of neurons of the central brain, the so-called dorsolateral and medial clusters (Soustelle et al. 2007). The double Gcm/Dh31+ cell belongs to the dorsolateral cluster, and colocalization is affected in gcm mutant animals. The gcmGal4, UASGFP line allowed us to trace gcm expression in WT and hypomorphic conditions obtained by using gcmGal4 homozygous or heterozygous animals carrying a UASgcmRNAi construct (Figure 5, D–F). These data strongly suggest that the Dam construct is present and can bind sites in cells in which Gcm and/or its targets are not yet expressed. In sum, we have shown that S2 cells can be used to validate direct targets of Gcm on cell sorting, that the identified targets are actually induced by Gcm in vivo, and that the Dam-Gcm fusion likely identifies most Gcm direct targets in the fly genome.
Protein domain enrichment analysis
To annotate the genes identified in the DamID screen, we first performed a protein domain enrichment analysis using DAVID bioinformatics resources (Huang et al. 2009a,b). The analysis showed enrichment for genes coding for proteins containing basic Helix-Loop-Helix (bHLH) domains (17 genes) and Homeobox domains (26 genes), which are characteristic of transcription factors, but the most enriched family is that of the Ig domain–containing proteins (DAVID protein domain enrichment: 7.79-fold; P = 1.3 × 10⁻¹⁴; FDR = 2.0 × 10⁻¹¹) (Figure 6A). Most of the Ig genes are involved in or at least expressed during nervous system development (Figure 6B) and code for guidance molecules [reviewed in Patel and Van Vactor (2002)]. This includes two fibroblast growth factor receptors (Htl and Btl), two netrin receptors (Unc-5 and Fra), six members of the Beaten path family (Beat), three members of the Down syndrome cell adhesion molecule family (Dscam), two fasciclins (Fas), three roundabout proteins (Robo), and several others, including seven members of the defective proboscis extension response family (Dpr) and two Dpr interactors (Ozkan et al. 2013) (Figure 6B and Table 1). The chemoreceptor family of genes dpr has been poorly characterized so far, but these genes are also expressed in glial cells (DeSalvo et al. 2014), suggesting that Gcm may regulate these genes during gliogenesis.
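For readers unfamiliar with enrichment statistics, the sketch below shows how a fold enrichment and an over-representation P value can be computed from four counts. It is an illustration only: DAVID reports a more conservative modified Fisher exact P value (the EASE score), so the plain hypergeometric tail used here will not reproduce the values above, and the counts in the example are hypothetical rather than taken from the screen.

```python
from math import comb


def fold_enrichment(hits: int, list_size: int,
                    background_hits: int, background_size: int) -> float:
    """Ratio of the annotated fraction in the gene list to that in the background."""
    return (hits / list_size) / (background_hits / background_size)


def hypergeom_upper_tail(hits: int, list_size: int,
                         background_hits: int, background_size: int) -> float:
    """P(X >= hits) when list_size genes are drawn without replacement from a
    background of background_size genes, background_hits of which are annotated."""
    max_hits = min(list_size, background_hits)
    favourable = sum(
        comb(background_hits, k)
        * comb(background_size - background_hits, list_size - k)
        for k in range(hits, max_hits + 1)
    )
    return favourable / comb(background_size, list_size)


# Hypothetical counts, for illustration only.
example_hits = 12          # genes in the list carrying the domain
example_list_size = 200    # genes in the list
example_bg_hits = 60       # genes in the background carrying the domain
example_bg_size = 5000     # genes in the background

print(fold_enrichment(example_hits, example_list_size,
                      example_bg_hits, example_bg_size))
print(hypergeom_upper_tail(example_hits, example_list_size,
                           example_bg_hits, example_bg_size))
```

Exact integer arithmetic with `math.comb` avoids floating-point underflow for very small P values, at the cost of speed for genome-sized backgrounds.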
To confirm induction of this class of genes by Gcm, the levels of expression of 12 of them were assayed in S2 cells in basal conditions or on transfection with a Gcm expression vector (Figure 6C). The expression of 8 genes of 12 is significantly induced by Gcm in S2 cells, the strongest induction being observed for htl (>100-fold increase) (Figure 6C).
GO term enrichment analysis
Gcm direct targets are involved in nervous system development:
Following protein domain enrichment analysis, we carried out a GO term enrichment analysis for biological function using DAVID. The analysis retrieved 230 genes involved in nervous system development (8.8-fold enrichment; P = 2.4 × 10⁻¹²; FDR = 3.9 × 10⁻⁹). This subset of genes was then further analyzed using an enrichment analysis for molecular function. As expected from a cell fate determinant, the major class of genes regulated by Gcm and involved in nervous system development is transcription factors (67 genes; 6.7-fold enrichment; P = 1.0 × 10⁻³²; FDR = 1.3 × 10⁻²⁹) (list in Table S1, column J). More specifically, we found genes involved in (1) neural stem cell (also called neuroblast) regulation, (2) embryonic glial cell development, and (3) larval optic lobe development (Figure 7A and Table 1).
Up to 34 genes regulate neural stem cells, including genes that likely allow the transition from stem cell to glial identity. Interestingly, Pros also was identified as a positive regulator of Gcm (Freeman and Doe 2001; Ragone et al. 2001; Choksi et al. 2006), and both Brat and Lola interact with Gcm genetically (Popkova et al. 2012). These data suggest the presence of feedback loops in the establishment of glial fate.
Numerous targets are directly linked to glial cell development, as expected given the gliogenic role of Gcm. For example, Hkb controls glial subtype specification by reinforcing Gcm autoregulation in a specific glial lineage and interacts with Gcm genetically as well as biochemically (De Iaco et al. 2006; Popkova et al. 2012). In addition, several genes are required in longitudinal glia precursor division (Figure 7A and Table 1).
The identification of genes involved in optic lobe development is in line with the finding that Gcm is necessary for both neuronal and glial cell development within the larval optic lobe (Chotard et al. 2005; Yoshida et al. 2005; Soustelle et al. 2007). Of note, one of the targets, Tll, has been shown recently to be necessary for specification of lamina neuronal precursors, showing a mutant phenotype similar to that induced by the lack of Gcm (Guillermin et al. 2015).
Among the targets, we identified several members of signaling pathways that control neural development at different steps: Hh, Egfr/Ras, and Fat/Hippo pathways (ft, Egfr, Ras85D, rho, ds, d, CycE, th, and dally) and, finally, the N pathway (N, Ser, Dl, l(1)sc, ase, and eight genes of the E(spl) complex, or E(spl)-C). Because of their peculiar organization, we further validated the genes of E(spl)-C, which spans over 50 kb and is located on the right arm of chromosome 3 (Figure 7, B and C). Its members are all induced in FACS-sorted S2 cells transfected with Gcm, whereas the gene directly adjacent to the complex, gro, is devoid of a DamID peak and is not induced, indicating that Gcm activity is specific to the complex (Figure 7C). N and Dl also were validated (Figure 8B).
Interestingly, several genes targeted by Gcm are involved in the function of fully differentiated glia. Our screen identified several genes acting in the blood-brain barrier (BBB) (Figure 7A and Table 1). Overlap between our screen and transcriptome data of the BBB reveals that Gcm targets 61 genes enriched in the BBB compared to all glial cells (Table S1, column L) (DeSalvo et al. 2014; Limmer et al. 2014). Gcm also targets several genes controlling axon ensheathment and glial cell migration (Figure 7A and Table 1). In sum, the screen reveals the molecular role and mode of action of Gcm in neural development and function.
Gcm direct targets are involved in immune system development:
In addition to its role in the nervous system, Gcm is also required for differentiation and proliferation of embryonic plasmatocytes. Embryos mutant for gcm and its paralog gcm2 present a decreased number of plasmatocytes, and the plasmatocytes do not complete the differentiation process (Bernardoni et al. 1997; Alfonso and Jones 2002) [reviewed in Kammerer and Giangrande (2001), Evans and Banerjee (2003), and Waltzer et al. (2010)]. Plasmatocytes are macrophages that can differentiate into another type of hemocyte called a lamellocyte on immune challenge (Rizki 1957; Stofanko et al. 2010), a process that Gcm helps to repress (Jacques et al. 2009). The DamID screen identified 68 genes known to regulate the immune system (Figure 8A, Table 2, and Table S1, column M), and similar to the nervous system, transcription factors are quite prominent (14 transcription factors) (Table S1, column J).
In addition to gcm itself and gcm2 (Bernardoni et al. 1997; Kammerer and Giangrande 2001; Alfonso and Jones 2002), Gcm targets genes that promote the differentiation of prohemocytes into plasmatocytes (Figure 8A and Table 2). Several targets inhibit the JAK/STAT pathway, whose activation leads to lamellocyte differentiation (Figure 8A and Table 2). Pen and ush inhibit the formation of lamellocytes and so-called melanotic tumors, which are masses of aggregated hemocytes enriched in lamellocytes (Kussel and Frasch 1995; Sorrentino et al. 2007). In addition, Gcm targets genes characterized by their role in hemocyte proliferation or migration (Figure 8A and Table 2).
Similar to the nervous system, Gcm also targets genes that are involved in function of the mature immune system. These include genes involved in (1) coagulation, (2) phagocytosis, (3) autophagy, (4) the inflammatory cascade mediated by Wnt, and (5) melanotic encapsulation of foreign targets (Figure 8A and Table 2). Finally, DamID targets also include genes that tune the immune response based on the nature of the pathogen. Thus, based on the screen, Gcm induces the expression of genes involved in the response to fungi, in the response to gram-negative bacteria, and more broadly, in the antimicrobial humoral response (Figure 8A and Table 2). In addition, we assayed 11 genes involved in hemocyte biology using S2 cells transfected by a Gcm expression vector and confirmed the induction of 10 of them (Figure 8B).
Gcm direct targets are involved in tendon cell and peritracheal cell development:
Tendon cells link muscles to the exoskeleton of the organism, and Gcm expression in these cells is required for proper muscle attachment (Soustelle et al. 2004). Several Gcm targets are involved in the maturation of tendon cells, such as Hh signaling pathway proteins, Wg, and Egfr. The Hh signaling pathway and Wg are necessary in the early development of tendon cells (Hatini and DiNardo 2001), whereas the Egfr pathway promotes terminal differentiation after the junction between the migrating muscle and the tendon has been established (Yarnitzky et al. 1997).
Other targets expressed in tendon cells are directly involved in muscle migration toward the tendon. Sli and Sdc are guidance cues that attract the muscle (Kramer et al. 2001; Steigemann et al. 2004). Leucine-rich tendon-specific protein (Lrt) interacts with Robo expressed in the muscle to arrest muscle migratory behavior (Wayburn and Volk 2009). Dnt and Hbs control muscle attachment site selection (Dworak et al. 2001; Lahaye et al. 2012). A third class of targets controls the later step of building the junction between muscles and tendons. Mew, If, Dys, and Wech are core components of the junction (Prokop et al. 1998; Loer et al. 2008) [reviewed in Charvet et al. (2012)]; Short stop (Shot) is an actin-tubulin cross-linker involved in junction stabilization (Bottenberg et al. 2009), and Sema-5C is a transmembrane protein so far poorly characterized in tendon cells (Bahri et al. 2001).
Finally, Gcm is expressed in peritracheal cells (Laneve et al. 2013), endocrine cells located along the trachea that secrete ecdysis-triggering hormone (O’Brien and Taghert 1998). Only two markers of peritracheal cells have been characterized so far: the bHLH transcription factor Dimm (Hewes et al. 2003) and the neuropeptide biosynthetic enzyme peptidylglycine-α-hydroxylating monooxygenase (Phm) (O’Brien and Taghert 1998). Gcm acts upstream of dimm (Laneve et al. 2013), and dimm is known to control the expression of phm (Park et al. 2008), but neither dimm nor phm is present in the DamID screen, suggesting that Gcm does not directly induce these genes in peritracheal cells. While little is known about peritracheal cells, dimm and phm were further characterized in neuroendocrine cells: 212 downstream targets of Dimm were recently identified by combining CHiP and transcriptome analyses (Hadzic et al. 2015). Comparison of this data set with the DamID screen revealed that 27 targets are common to Dimm and Gcm (Table S1), suggesting a potential contribution of Gcm to the Dimm regulatory pathway. In addition, based on our screen, Gcm targets a gene involved in the endocrine function of Dimm+ cells: syt-β is controlled by Dimm and is involved in calcium-dependent exocytosis (Adolfsen et al. 2004; Park et al. 2014). To conclude, Gcm is necessary for the development of peritracheal cells expressing Dimm, and our screen identified genes involved in the Dimm pathway. This sets the stage for future analysis of these genes in peritracheal cells. Overall, our screen reveals the full extent of Gcm function in the diverse cell types in which this transcription factor is expressed.
Conservation of the Gcm molecular cascade in mammals
The mammalian genome contains two orthologs of Drosophila gcm genes, which are named GCM1 and GCM2 in humans. GCM1 is required for differentiation of trophoblasts in the developing placenta (Basyuk et al. 1999, 2009); GCM2 is expressed and required mainly in the parathyroid glands (Kim et al. 1998; Gordon et al. 2001). Because few targets have been identified for GCM1 and GCM2, we searched for a conserved Gcm regulatory cascade by retrieving the mammalian orthologs of the Drosophila targets identified in the DamID screen (Table S1). To start a comparative analysis, we chose genes that have a functional significance in mammals. A GO term enrichment analysis identified orthologs that are associated with parathyroid gland or placenta development, which allowed us to restrict the list to 29 genes potentially targeted by GCM genes in mammals (Table S2). We further analyzed the impact of murine Gcm genes on the expression of GCM1 and GCM2; T-box transcription factor (TBX1); GATA3, GATA4, and GATA6; FGFR1 and FGFR2; and Delta-like 1 (DLL1).
GCM genes regulate their own expression
GCM1 and GCM2 contain multiple GBSs in their promoters (Figure 9, A and B). This suggests the existence of a positive-feedback loop that has not been documented in mammals but that is very well characterized for Drosophila gcm genes (Miller et al. 1998; Ragone et al. 2003). To validate the autoregulation of GCM1 and GCM2 in mammals, we monitored their levels of expression in HeLa cells transfected with expression vectors for mGcm1 and mGcm2. The use of mouse orthologs allowed us to design qPCR primers specific for the human transcripts and to specifically quantify their levels of expression. This set of experiments shows that in HeLa cells, GCM1 expression is induced by both mGcm1 and mGcm2 (Figure 9C), and GCM2 is induced only by mGcm2 (Figure 9D).
To demonstrate that the induction of GCM1 and GCM2 expression was carried out via the GBSs, we arbitrarily selected promoter fragments containing three GBSs with at least 75% similarity with the canonical GBS ATG(A/C)GGG(T/C) (Yu et al. 2002). These regions were cloned in luciferase reporters (Figure 9, A and B, red). For each region, we also built the reporter with the mutated GBSs. The reporters were then transfected in HeLa cells with or without mGcm1 or mGcm2. For GCM1, we could observe an induction by mGcm1 and not by mGcm2, suggesting that the region inserted in the reporter allows for Gcm1-mediated induction. However, the three GBSs are not required for induction mediated by mGcm1 because their mutagenesis leaves it unaffected (Figure 9E). Further scrutiny of the region revealed the presence of two other GBSs with <75% similarity with the canonical GBS but still containing nucleotides 2, 3, 6, and 7, which were determined to be indispensable for Gcm binding (Schreiber et al. 1998). For GCM2, we showed that this gene is able to induce expression of the luciferase reporter carrying the GCM2 promoter and that this promoter is inactive when the canonical GBSs are mutated (Figure 9F). This demonstrates that GCM2 is able to regulate its own expression via the GBSs inserted in the luciferase reporter. To conclude, mGcm1 and mGcm2 induce GCM1 expression via a region that remains to be defined, and mGcm2 activates GCM2 transcription via a region covering the first exon-intron junction. These experiments demonstrate that positive autoregulation is conserved between Drosophila and mammalian gcm genes.
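The two site classes discussed above (sites with at least 75% similarity to the canonical GBS and weaker sites that retain nucleotides 2, 3, 6, and 7) can be searched for with a short script. The sketch below is illustrative and is not the procedure used in this study: it assumes that similarity is the fraction of the eight positions matching the consensus, it scans both strands, and all function names are hypothetical.

```python
# Canonical GBS ATG(A/C)GGG(T/C) (Yu et al. 2002), one set of allowed bases per position.
CANONICAL_GBS = ["A", "T", "G", "AC", "G", "G", "G", "TC"]
# Positions (1-based) reported as indispensable for Gcm binding (Schreiber et al. 1998).
CORE_POSITIONS = (2, 3, 6, 7)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def similarity(site: str) -> float:
    """Fraction of positions matching the consensus (assumed definition)."""
    matches = sum(base in allowed for base, allowed in zip(site, CANONICAL_GBS))
    return matches / len(CANONICAL_GBS)


def core_intact(site: str) -> bool:
    """True when positions 2, 3, 6, and 7 all match the consensus."""
    return all(site[p - 1] in CANONICAL_GBS[p - 1] for p in CORE_POSITIONS)


def scan(sequence: str, min_similarity: float = 0.75):
    """List (start, strand, site, similarity, core_intact) for every window,
    on either strand, whose similarity reaches min_similarity."""
    sequence = sequence.upper()
    width = len(CANONICAL_GBS)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        for strand, site in (("+", window), ("-", reverse_complement(window))):
            score = similarity(site)
            if score >= min_similarity:
                hits.append((start, strand, site, score, core_intact(site)))
    return hits


# Toy sequence containing one perfect site.
for hit in scan("CCATGCGGGTAA"):
    print(hit)
```

Lowering `min_similarity` and keeping only hits whose last field is `True` retrieves weaker sites that still carry the four core nucleotides, analogous to the two additional GBSs found in the GCM1 region.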
Other DamID targets are conserved in mammals
TBX1 was shown previously to be coexpressed with GCM2 during formation of the parathyroid glands (Manley et al. 2004; Reeh et al. 2014) and contains 46 GBSs in its promoter. To assess whether the GCM transcription factors are able to induce TBX1 expression in mammals, we analyzed the levels of TBX1 transcripts in HeLa cells transfected with expression vectors coding for mouse orthologs mGcm1 or mGcm2. This assay indicated that expression of TBX1 is specifically induced by mGcm2 (Figure 9G).
The three GATA transcription factors GATA-3, GATA-4, and GATA-6 control numerous developmental processes in mammals [reviewed in Molkentin (2000), Cantor and Orkin (2005), Zaytouni et al. (2011), and Chlon and Crispino (2012)]. The three genes contain several GBSs in their promoters, and expression of both GATA3 and GATA6 is induced by mGcm2 in HeLa cells (Figure 9, H–J).
FGFR1 and FGFR2 are the mammalian orthologs of the DamID targets btl and htl. FGFRs are widely described for their roles in angiogenesis, cancer development, and organogenesis [reviewed in Bates (2011), Kelleher et al. (2013), and Katoh and Nakagama (2014)]. They are also required for building the placental vascular system and are expressed in trophoblasts (Anteby et al. 2005; Pfarrer et al. 2006). Both genes contain GBSs in their promoters. In HeLa cells, mGcm1 induces the expression of FGFR1, and mGcm2 induces the expression of both FGFR1 and FGFR2 (Figure 9, K and L).
Finally, DLL1 is one of the ligands of the N receptor (Shimizu et al. 2000). The N signaling pathway plays a critical role in cell fate determination [reviewed in Artavanis-Tsakonas et al. (1999) and Schwanbeck et al. (2011)], including trophoblast development (Zhao and Lin 2012; Rayon et al. 2014; Massimiani et al. 2015), and DLL1 is expressed in trophoblasts (Herr et al. 2011; Gasperowicz et al. 2013). In humans, the DLL1 promoter contains GBSs, and DLL1 expression is induced specifically by mGcm1 in HeLa cells (Figure 9M).
The Drosophila orthologs of mammalian target genes work in pathways known to depend on Gcm
The preceding data indicate that TBX1, GATA factors, the FGFRs, and the N ligand DLL1 are regulated by Gcm in mammals. As mentioned earlier, no tissue has been found so far for which Gcm function is required in both mammals and Drosophila. This suggests that instead of a conservation of Gcm in similar tissues, we should look for conserved Gcm cascades. To further test this hypothesis, we validated the impact of Drosophila Gcm transcription factor on the Drosophila orthologs of the four families of genes identified earlier.
The Drosophila orthologs of TBX1 are the T-box transcription factors Bi, H15 (Porsch et al. 1998), and Doc1, which are required for ganglion mother cell (GMC) differentiation during development of the embryonic nervous system (Choksi et al. 2006; Leal et al. 2009). We validated the role of Gcm on Doc1 in S2 cells (Figure 9N).
The Drosophila ortholog of the GATA factor Pnr regulates postembryonic tendon cell differentiation (Ghazi et al. 2003). There is a significant induction of pnr expression by Gcm in S2 cells (Figure 9N), but further experiments are required to demonstrate the impact of Pnr on embryonic tendon cells or the impact of Gcm on postembryonic tendon cells.
The ortholog of the FGFRs, htl, is involved in ensheathing glia morphogenesis (Stork et al. 2014), and Gcm is required for differentiation of these cells (Awasaki et al. 2008). Gcm strongly induces expression of htl in S2 cells (Figure 6C). A second GATA factor ortholog, grn, presents the same expression pattern as gcm in the developing embryonic CNS at stage 11 (Lin et al. 1995), indicating that Gcm may regulate grn expression in this tissue.
Finally, the DLL1 Drosophila ortholog Dl is required as part of the N pathway for the development of embryonic glia (Udolph et al. 2001; Van de Bor and Giangrande 2001; Umesono et al. 2002; Edenfeld et al. 2007), the larval optic lobe (Egger et al. 2010; Yasugi et al. 2010; Wang et al. 2011), and tendon cells (Nabel-Rosen et al. 1999; Ghazi et al. 2003). Dl expression is induced by Gcm in S2 cells (Figure 8B).
The DamID approach allowed us to identify the direct targets of Gcm in Drosophila in a genome-wide fashion and to extend these findings to mammals. It also allowed us to recognize key molecular pathways and developmental processes that depend on Gcm. The improvement of the transfection assays on FACS sorting provides a rapid and sensitive tool to characterize molecular pathways and genetic interactions. This versatile approach is particularly useful to study genes that are expressed at weak levels, in very few cells, or for which the target tissues are still unknown.
Feedback loops between Gcm and signaling pathways
Gcm is widely described for its role in nervous system development and hemocyte differentiation, and indeed, many of the genes identified in the screen are involved in these two processes (Table 1 and Table 2). The screen allowed us to establish a direct link between Gcm and three major signaling pathways, the vast majority of whose members are targeted by Gcm.
The Hh pathway and Gcm are necessary for tendon cell differentiation as well as for lamina neuron proliferation and differentiation, but the relation between Gcm and the Hh pathway remained vague (Chotard et al. 2005; Umetsu et al. 2006). Validation of the DamID screen in cells and in vivo reveals that Gcm can control five members of the Hh pathway (only Cos2 was not identified in the screen) (Figure 10A) in tendon cells. These include Hh itself; its receptor, Ptc; the signal-transducing proteins Smo, Pka-C1, and Ci; and an inhibitor of the signaling pathway, Rdx. Interestingly, Chotard et al. (2005) proposed that Gcm interacts with the Hh pathway. Together with the finding that the Hh pathway is also required upstream of Gcm in tendon cells for definition of their precursor (Soustelle et al. 2004), these data suggest the presence of a feedback loop.
The JAK/STAT signaling pathway triggers lamellocyte differentiation [reviewed in Agaisse and Perrimon (2004) and Myllymaki and Ramet (2014)] and was shown previously to be repressed by Gcm (Jacques et al. 2009). The DamID screen shows the direct interaction between Gcm and two major inhibitors of the JAK/STAT pathway, Ptp61F and Socs36E, providing a molecular explanation for the observed genetic interactions. Interestingly, Gcm also targets Stat92E and Os (basically all the major players of the JAK/STAT pathway were identified in the screen) (Figure 10B), and the JAK/STAT pathway was shown to induce gcm expression in the optic lobe (Wang et al. 2013). This suggests the existence of a feedback loop in this cascade as well.
Finally, the N pathway was described as an activator or inhibitor of Gcm in glial cell differentiation depending on the context (Udolph et al. 2001; Van de Bor and Giangrande 2001; Umesono et al. 2002). The DamID screen shows that Gcm interacts with 30 genes of the N pathway, including seven inhibitors and seven activators of that pathway (Table S1); all the genes in the N pathway were found in the screen (Figure 10C).
Thus, the DamID screen helps to clarify the impact of Gcm on regulatory pathways at the molecular level and paves the way for future studies assessing the biological relevance of these interactions in vivo. These pathways had not been characterized previously as regulated by Gcm, nor had they been identified by the three transcriptome assays (Egger et al. 2002; Freeman et al. 2003; Altenhein et al. 2006). This is most likely due to the fact that these pathways involve feedback loops and/or are required in most, if not all, tissues during embryogenesis. Future studies will assess whether Gcm targets different members of a signaling pathway in specific tissues.
Gcm regulates genes organized in clusters
Analysis of the DamID screen reveals the regulation of genes organized in a cluster, as illustrated by E(spl)-C. This observation is in line with the hypothesis that chromatin conformation plays an important role in transcription factor–mediated induction of gene expression. Over a region of 0.8 Mb, the DamID peaks are exclusively concentrated in a region of 50 kb that contains the 12 members of E(spl)-C (Figure 7B), whose expression is upregulated by Gcm. In contrast, the unrelated gro gene is located at the boundary of the complex, is not controlled by Gcm, and does not contain a DamID peak (Figure 7C). The precise targeting of Gcm to genes within the folded region of E(spl)-C correlates well with chromosome conformation capture (3C) analyses, which revealed long-range interactions within the 50-kb complex but very few interactions with regions surrounding it (Schaaf et al. 2013). Further development of 3C technology will allow full comprehension of the impact of the chromatin three-dimensional structure on gene expression.
Gcm regulates genes required for the final function of differentiating tissue
Gcm is generally described as a cell fate determinant. Accordingly, many Gcm direct targets are involved in the early differentiation step, e.g., implementation of glial fate at the expense of neuronal fate in the nervous system (Figure 7A and Figure 8A, pink-shaded area). Surprisingly, a number of genes identified by the screen are necessary at late developmental stages or for function of fully differentiated cells (Figure 7A and Figure 8A, green-shaded area). Typical examples are provided by the genes expressed specifically in the BBB involved in amino acid, sugar, and water exchange and by those coding for septate junction proteins, which are necessary for the filtering function of the BBB (Deligiannaki et al. 2015) [reviewed in Hindle and Bainton (2014)]. Similarly, a number of genes acting in the immune system are involved in antigen-specific immune response and the encapsulation of foreign targets. Thus, an early and transiently expressed gene directly targets genes involved in physiologic responses. The fact that early genes play a broader role than initially thought may not be so uncommon because this was also observed for the transcription factor Pros in the nervous system. Pros promotes the switch from self-renewal to differentiation. DamID and transcriptome analyses of the downstream targets of Pros revealed that in addition to promoting genes involved in repression of stem cell self-renewal, it also promotes expression of genes involved later on in terminally differentiated neurons (Choksi et al. 2006).
Gcm downstream targets are conserved in mammals
The use of simple organisms such as Drosophila to understand the GCM regulatory network in vertebrates has been limited to few studies (Iwasaki et al. 2003; Soustelle and Giangrande 2007). This was mostly due to the known requirement of mammalian Gcm genes in placenta or parathyroid glands, two tissues that have no fly equivalent. Despite the disparity of tissues in which Gcm genes are expressed, we were able to find conservation in the target genes. Because in humans GCM2 downregulation and mutations are associated with parathyroid adenomas and hypoparathyroidism, respectively (Mannstadt et al. 2008; Doyle et al. 2012; Yi et al. 2012; Park et al. 2013; Mitsui et al. 2014), while GCM1 downregulation is associated with preeclampsia (Chen et al. 2004), the downstream genes identified in this work, including TBX1, GATA factors, and FGFR, represent interesting candidates to dissect the molecular mechanisms underlying these pathologies. Typically, TBX1 mutations in humans result in DiGeorge syndrome, which includes parathyroid aplasia (Jerome and Papaioannou 2001; Merscher et al. 2001). Also, GATA3 regulates GCM2 in the parathyroid gland (Grigorieva et al. 2010), and GATA3, GATA4, and GATA6 are required during trophoblast development [reviewed by Bai et al. (2013)]. Our data suggest a feedback loop between GCM2 and GATA3 in the parathyroid and point to a hitherto unknown role of GATA6 in this tissue, in line with recent immunolabeling data (Uhlen et al. 2015). Finally, FGFR1 is misregulated in hyperparathyroidism (Komaba et al. 2010).
Finally, the impact of Gcm on the N pathway seems to be conserved to the largest extent. Of the 30 genes identified by the screen, we confirmed 9 by qPCR, including N, E(spl)-C, and Dl. In humans, we show that DLL1 is regulated by GCM1, and in the mouse, the E(spl)-C ortholog Hes5 was reported to be regulated by mGcm1 and mGcm2 (Hitoshi et al. 2011). This gives strong support to the hypothesis that Gcm regulates the N pathway in both Drosophila and mammals. However, the effect of Gcm most likely depends on the tissue. Indeed, opposite outcomes have already been observed for the effect of Notch on Gcm in Drosophila (Udolph et al. 2001; Van de Bor and Giangrande 2001; Umesono et al. 2002), and both activators and inhibitors of the N pathway were identified in the DamID screen (Table S1). This means that the interaction between Gcm and the N pathway will need to be studied case by case. Overall, our data indicate that even though the main sites of gcm expression may be different in mammals and Drosophila, the Gcm cascade is at least partially conserved. In this study, we discovered 980 new potential direct targets of Gcm, demonstrated the direct interaction for 36 of them, and the use of Drosophila allowed us to discover eight new targets of the GCMs in humans, which include the characterization of a feedback loop for GCMs on themselves.
We thank the Developmental Studies Hybridoma Bank and the Bloomington Stock Center for reagents and flies, as well as J. Veenstra (INCIA UMR5287 CNRS, France) for the gift of the anti-DH31 antibody and B. Altenhein (University of Mainz, Germany) for fly strains. We thank K. Jamet for initial bioinformatics analyses. We thank C. Diebold, C. Delaporte, and the Institut de Génétique et de Biologie Moléculaire et Cellulaire for technical assistance. We also thank the members of our laboratory for valuable input and comments on the manuscript. This work was supported by grants from the Institut National de la Santé et de la Recherche Médicale, the Centre National de la Recherche Scientifique, the University of Strasbourg, Hôpital de Strasbourg, the Fondation Association pour la Recherche sur le Cancer, the Ligue Nationale Contre le Cancer, the French National Cancer Institute, and the Agence Nationale de la Recherche (ANR). A.P. and P.C. were funded by the Fondation pour la Recherche Médicale and by the ANR, respectively. A.P. also benefited from a short development traveling fellowship to visit the laboratory of A. Brand in Cambridge (UK). The Institut de Génétique et de Biologie Moléculaire et Cellulaire was also supported by a French state fund through the ANR Laboratoire d'excellence. T.D.S. and A.H.B. were funded by Wellcome Trust Programme grants 068055 and 092545 to A.H.B. A.H.B. acknowledges core funding to the Gurdon Institute from the Wellcome Trust (092096) and Cancer Research UK (C6946/A14492).
Communicating editor: P. K. Geyer
Supporting information is available online at www.genetics.org/lookup/suppl/doi:10.1534/genetics.115.182154/-/DC1
- Received August 21, 2015.
- Accepted November 3, 2015.
- Copyright © 2016 Cattenoz et al.
Available freely online through the author-supported open access option. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.