

* Institute for Plant Genomics and Biotechnology, Texas A&M University, College Station, Texas 77843
Department of Biochemistry and Biophysics, Texas A&M University, College Station, Texas 77843
Department of Horticultural Sciences, Texas A&M University, College Station, Texas 77843
1 Corresponding author: Institute for Plant Genomics and Biotechnology, Norman E. Borlaug Center MS 2123, College Station, TX 77843.
E-mail: jmullet{at}tamu.edu
| ABSTRACT |
|---|
|
|
|---|
85-fold reduction of dhn2 RNA in sorghum shoots. Overall, we conclude that phylogenetic analysis of gene families among rice, sorghum, and maize will help identify regulatory sequences in the noncoding regions of genes and contribute to our understanding of grass gene regulatory networks.
5% of these genomes are under functional constraint (WATERSTON et al. 2002). Surprisingly, only ~1.5% of the sequences under selection correspond to protein-coding sequences, underscoring the importance of noncoding regulatory sequences in genome function. Partly in response to this finding, the human genome project ENCODE was initiated to identify and elucidate the functions of the noncoding regulatory portions of the human genome sequence (COLLINS et al. 2003). Recent progress on sequencing plant genomes is creating a similar opportunity to identify and understand the function of noncoding regulatory sequences that regulate plant genes (HAO et al. 1998; ARABIDOPSIS GENOME INITIATIVE 2000; CHANDLER and BRENDEL 2002; RICE CHROMOSOME 10 SEQUENCING CONSORTIUM 2003).
The noncoding regulatory portion of eukaryotic genomes controls gene function through modulation of transcription initiation, RNA processing, RNA stability, translation, and chromatin structure. Promoter cis-regulatory elements that provide binding sites for transcription factors (TFs) are of particular interest because they regulate gene transcription, guide development, and form the basis of gene regulatory networks (DAVIDSON et al. 2003). Like animal promoters, plant promoters contain regulatory modules composed of combinations of cis-elements that mediate changes in transcription in response to internal and external input. For example, an ~350-bp region of the promoter of maize rab17 contains a minimum of nine TF-binding sites that mediate responses to ABA and dehydration and regulate gene expression during seed and vegetative development (BUSK et al. 1997). Cis-elements are also important to define because phenotypic variation can be caused by mutations in these sequences. For example, sequence differences in the teosinte branched-1 promoter are correlated with changes in gene expression, morphology, and development associated with the evolution of cultivated maize from teosinte (WANG et al. 1999; CLARK et al. 2004). Similarly, sequence differences in a putative cis-element of the AP1 promoter have been proposed to be responsible for variation in vernalization requirements in wheat (YAN et al. 2003).
The Arabidopsis thaliana genome encodes ~1500 transcription factors, of which ~45% are unique to plants (RIECHMANN et al. 2000). Information about the binding sites for plant transcription factors is increasing rapidly (see the TRANSFAC database at http://www.gene-regulation.com/; PLACE at http://www.dna.affrc.go.jp/htdocs/PLACE/; PlantCARE at http://intra.psb.ugent.be:8080/PlantCARE/; and AGRIS at http://arabidopsis.med.ohio-state.edu). The discovery and characterization of TF-binding sites often involve electrophoretic mobility shift assays, DNase I footprinting analysis, and site-directed mutation studies. Scaling these biochemical approaches for genome-wide analysis of cis-elements is challenging. Noncoding regulatory elements can also be identified through computational analysis of promoters of coregulated genes (TAVAZOIE and CHURCH 1999; HUGHES et al. 2001). An increasing number of microarray-based gene expression studies in plants are helping to identify regulons and the underlying cis-element modules that mediate gene expression patterns in plants (HARMER et al. 2000; SUNG et al. 2001).
A complementary way to identify noncoding regulatory sequences involves phylogenetic analysis of promoter sequences of homologous genes from species of varying divergence (ANSARI-LARI et al. 1998; THACKER et al. 1999; HARDISON 2000). The rationale for this approach is based on the finding that expression of homologous genes in different species is often similar, which suggests the retention of common regulatory elements. The regulatory sequences associated with homologous genes from diverged species can be identified because they are more conserved than the surrounding nonfunctional sequences. Computational approaches have been developed to facilitate phylogenetic searches for regulatory sequences (FICKETT and WASSERMAN 2000; TOMPA 2001; HALFON et al. 2002; REBEIZ et al. 2002; LENHARD et al. 2003; ROMBAUTS et al. 2003; FRITH et al. 2004). Successful implementation of these search programs requires an understanding of species phylogeny and an initial assessment of useful search parameters suitable for comparison of genes from diverged species to reduce the incidence of random sequence matching among nonfunctionally conserved sequences (TAUTZ 2000). This depends on a number of factors, but species separated for 15–430 MY have been successfully analyzed using phylogenetic analysis (COLINAS et al. 2002; MUELLER et al. 2002). Comparison of highly diverged species reduces the problem of random sequence matching; however, studies of more closely related species often provide the most information since extended evolution of regulatory regions and biological functions reduces the ability to detect regulatory sequences (COLINAS et al. 2002).
Phylogenetic analysis has been used to identify conserved noncoding sequences (CNS) in plant genes in a number of studies. A study of 22 cruciferous species spanning ~45 MY of divergence allowed the identification of CNS corresponding to known cis-elements associated with Chs and Apetala (KOCH et al. 2001). Phylogenetic shadowing of AGAMOUS genes in 29 Brassicaceae species identified several known and putative cis-elements in introns (HONG et al. 2003). A study of orthologous gene sequences from A. thaliana and cauliflower, species separated for 14.5–20.4 MY (COLINAS et al. 2002), identified approximately one highly significant 25-bp CNS (75% conserved) per gene.
Similar results have also been obtained through phylogenetic analysis of genes from grass species. Comparative analysis of phytochrome A gene promoters from sorghum, maize, and rice revealed CNS that spanned known cis-regulatory sequences (MORISHIGE et al. 2002). KAPLINSKY et al. (2002) compared the noncoding sequences of seven orthologous genes from rice, maize, and other grasses representing ~50 MY of divergence and concluded that plant CNS are generally shorter than mammalian CNS from species of similar divergence. A follow-up study by this group on 52 homologous maize/rice gene pairs found that CNS spanning >14 bp are often located in introns and associated with regulatory genes (INADA et al. 2003). Similarly, a study involving >300 grass gene comparisons concluded that 20 bp (with 70% sequence matching or greater) was the minimal length needed to identify significant CNS among grass orthologs (GUO and MOOSE 2003). Unfortunately, the CNS identified in the studies above often did not span TF-binding sites known to regulate the target genes. The known TF-binding sites were missed because the size and conservation of these sites (6–10 bp) were below the sequence lengths used to eliminate random matching among sequences.
The goal of this study was to determine how to use phylogenetic analysis to identify cis-elements including 6- to 10-bp TF-binding sites that control gene expression in grass species. To do this, we developed a modified phylogenetic approach that facilitates the discovery of regulatory elements using a multi-stage process that includes analysis of several members of a gene family. The study focused on a family of ABA-responsive rab16/17 genes from sorghum, maize, and rice, species separated for ~16–20 MY (sorghum, maize) and ~50 MY (sorghum, maize vs. rice; DOEBLEY et al. 1990). The rab16/17 genes encode a group of related ~16- to 17-kD dehydrins that help protect plants from injury during dehydration (CLOSE 1997). Maize rab17, a well-characterized ABA-responsive gene (BUSK et al. 1997; BUSK and PAGES 1998; KIZIS and PAGES 2002), was used as a reference to determine if phylogenetic analysis was producing useful results. The identification of previously discovered and several new putative regulatory elements in the current phylogenetic study of rab16/17 genes indicates that this approach will be useful for annotation of sorghum, maize, and rice gene regulatory sequences.
| MATERIALS AND METHODS |
|---|
|
|
|---|
Acquisition of gene sequences related to rab16/17:
The sequence of the maize rab17 gene used in this study was previously reported (Zmrab17; X15994). The sorghum and rice ESTs most related to maize rab17 were identified using The Institute for Genomic Research's eukaryotic gene ortholog database (ortholog cluster 476665; http://www.tigr.org/tbd/tgi/ego/). The sorghum EST sequence (AW747029; e-52) was used to identify sorghum BAC 21O3 by hybridization to a BAC library derived from IS3620C. Sorghum BAC 21O3 was sheared (Gene Machines, San Carlos, CA) into ~2-kb fragments and subcloned into pBluescriptII (Stratagene, La Jolla, CA). Clones that hybridized to sorghum EST AW747029 were sequenced from both ends using T3 and T7 primers. Sequences were assembled into ~5x deep contigs containing ~1000 bp of flanking 5' and 3' DNA using Sequencher software (Gene Codes, Ann Arbor, MI). The resulting genomic sequence matched a sequence of this gene previously named Sbdhn2 (GenBank U63831). Therefore the BAC-derived gene sequence obtained in this study was also named Sbdhn2 and the genomic sequence was deposited in GenBank (AY177889), where 5'-noncoding sequences correspond to nucleotides 1–1049. The rice EST with the highest sequence similarity to maize rab17 (AU091664; e-55) identified five related rice genomic sequences: rab16A–D and a genomic sequence from the whole genome shotgun (WGS) database. The WGS rab sequence (AAAA01012244) was very similar to Osrab16A (97% nucleotide identity) so it was designated Osrab16A2. The 5'-noncoding sequence of the Osrab16A2 gene was included in this study (5080–6140 bp). 5'-noncoding sequences of four other members of the rice rab16 family used in this study had been previously reported (Osrab16A: Y00842, 1–1599 bp; Osrab16B: X52422, 1–1395 bp; Osrab16C: X52423, 1–1476 bp; Osrab16D: X52424, 1–685 bp).
Analysis of mRNA abundance:
RNA was isolated from root and shoot tissue separately using Trizol reagent with the suggested modification for plants (Molecular Research Center, Cincinnati). Seed RNA was extracted from dry seeds using Concert Reagent (Invitrogen, Carlsbad, CA). First-strand cDNA was made by reverse transcribing 1 µg of total RNA with random hexamers using the TAQMAN reverse transcription reagents (Applied Biosystems, Branchburg, NJ). Quantitative real-time PCR was performed on an Applied Biosystems 7900HT machine using SYBR chemistry for Zmrab17, Osrab16A2, and Osrab16C. The generation of specific PCR products was confirmed both by melting curve and by gel analysis. FAM/TAM probes were required for specific detection of Sbdhn2, Osrab16A, Osrab16B, and Osrab16D (Synthegen, Houston). Primers and probes were designed using Primer Express software (Applied Biosystems) to allow amplification of ~100-bp products of similar GC and Tm characteristics.
Thermal-cycling conditions were 2 min at 50° and 10 min at 95° followed by 47 cycles at 95° for 15 sec and 60° for 1 min. Assays were performed in triplicate and data were analyzed using the ABI PRISM 7900HT SDS software (Applied Biosystems). Quantification was achieved using the comparative cycle threshold (CT) method (BIECHE et al. 1999), which normalizes the number of target gene copies to an endogenous reference gene (i.e., 18S rRNA, detected using the ribosomal TAQMAN kit supplied by Applied Biosystems). Fold inductions were calculated as 2^(dCTcontrol - dCTABA).
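As a minimal sketch of the comparative CT arithmetic described above, the following Python function assumes the standard base-2 form of the calculation (perfect doubling per cycle). The function name, argument names, and CT values are hypothetical and are not data from this study.

```python
def fold_induction(ct_target_control, ct_ref_control, ct_target_aba, ct_ref_aba):
    """Fold induction by the comparative CT method (assumes doubling each cycle)."""
    # dCT normalizes the target gene CT to the endogenous reference (e.g., 18S rRNA)
    dct_control = ct_target_control - ct_ref_control
    dct_aba = ct_target_aba - ct_ref_aba
    # A lower dCT after ABA treatment means more transcript, so the exponent is positive
    return 2.0 ** (dct_control - dct_aba)


# Hypothetical CT values: the target crosses threshold 10 cycles earlier after ABA
print(fold_induction(30.0, 10.0, 20.0, 10.0))  # 1024.0
```

A 10-cycle shift therefore corresponds to roughly a 1000-fold change, which is the order of magnitude reported for ABA induction in this study.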
Primer and probe sequences are as follows:
To determine the relative abundance of 16A, 16B, and 16D mRNA, RT-PCR was performed on known amounts of templates. Rice BAC OSJNBb34E03, which encodes the rab16A, rab16B, rab16C, and rab16D genes, was serially diluted and used as template for rab16A, rab16B, and rab16D primer/probe sets. These standard curves were then used to calculate primer efficiency and adjust dCT values to relative expression values.
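The text does not give the formula used to derive primer efficiency from the dilution series. One common convention, assumed here purely for illustration, is to regress CT on log10 of template amount and take the per-cycle amplification factor as 10^(-1/slope). The function name and the dilution values below are hypothetical.

```python
import math


def amplification_factor(log10_template, ct_values):
    """Per-cycle amplification factor from a standard curve (2.0 = perfect doubling)."""
    n = len(ct_values)
    mean_x = sum(log10_template) / n
    mean_y = sum(ct_values) / n
    # Ordinary least-squares slope of CT versus log10(template amount)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_template, ct_values))
    sxx = sum((x - mean_x) ** 2 for x in log10_template)
    slope = sxy / sxx
    return 10.0 ** (-1.0 / slope)


# Hypothetical 10-fold dilution series with ideal doubling (about 3.32 cycles per dilution)
logs = [0.0, -1.0, -2.0, -3.0]
cts = [20.0 - x * math.log2(10) for x in logs]
print(round(amplification_factor(logs, cts), 3))  # approximately 2.0
```

Under this convention, a primer/probe set with a factor below 2.0 would need its dCT values rescaled before expression levels of different genes are compared.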
Phylogenetic analysis:
The FootPrinter program (http://bio.cs.washington.edu/software.html) was used to identify conserved sequences among the rab genes analyzed. During optimization a wide range of search parameters were tested. Most comparisons used the following parameters: motif size, 8; maximum number of mutations, 1; maximum number of mutations per branch, 0; subregion size, 50 bp; subregion change cost, 1; allow for regulatory losses, no, except for sorghum and maize comparisons, which utilized a motif search size of 10 with no allowable mutations.
| RESULTS |
|---|
|
|
|---|
500 bp upstream of the coding region.
|
500 bp upstream. The initial alignments seeded by sequence matches identified by FootPrinter were then manually edited to maximize overall alignment. The results of the alignment process are shown in Figure 1 where all matching sequences were initially colored blue. Regions of extended homology with at least a 7/8 bp match between sorghum or maize and rice were defined as CNS and highlighted in yellow (rationale provided below).
Overall, this process allowed ~70% of the sorghum/maize 5'-noncoding sequences analyzed to be aligned, whereas only ~30–50% of the sorghum dhn2 or maize rab17 5'-noncoding sequences could be aligned to the rice rab16A2 sequence. Two large INDELs spanning 14 and 50–54 bp in the Osrab16A2 5'-noncoding region relative to Sbdhn2/Zmrab17 were the primary cause for loss of overall alignment. The extent of sequence alignment in the Osrab16A2 promoter vs. Zmrab17 or Sbdhn2 promoters declined to ~30% in the region 300–400 bp upstream from the translation start site. Sequences >400 bp upstream of the translation start sites became difficult to align, in part due to an increase in AT-rich sequences (data not shown). A number of INDELs ranging in size from 1 to 54 bp were used to create the alignments between maize, sorghum, and rice 5'-noncoding sequences (Figure 1). While many of these INDELs were probably introduced as an arbitrary consequence of the alignment process, overall the analysis revealed islands of conserved sequence surrounded by stretches of less-conserved sequence that have been modified extensively by insertions/deletions over the past ~50 MY.
Analysis of known maize rab17 regulatory elements:
The overlap between the maize rab17 cis-elements previously defined through biochemical analysis and CNS elements was investigated as a first step toward understanding the limits of phylogenetic analysis of noncoding sequences from rice, sorghum, and maize. Aligned sequences that spanned each maize rab17 cis-element were compared in cross-species analysis (Table 1). In four of the nine elements, sequence conservation was high (7/8–8/8 bp) among rab17, dhn2, and rab16A2 (ABRE1, DRE2, ABRE2, and ABRE4). In contrast, only 4/8 bp of the DRE1 cis-element identified in maize were conserved in comparisons of rice/sorghum or rice/maize even though the core binding sequence (ACCG) for this element was present in all three species. Furthermore, Osrab16A2 apparently lacks sequences that would align to ABRE3a/3b and SPH present in the promoters of Zmrab17 and Sbdhn2; therefore, only sorghum/maize alignments were useful for detecting these regulatory elements. Similarly, the sequence corresponding to GRA was not present in sorghum in the aligned region; therefore, a sequence match was observed only between rice and maize. These results are consistent with the expectation that loss, gain, or significant change in regulatory elements among homologous genes after species separation will cause regulatory elements to be missed using phylogenetic analysis (false negatives). Information about these regulatory elements can often be obtained by carrying out phylogenetic analysis on homologous genes from more than two species spanning a range of divergence (i.e., GRA was detected in rice/maize comparisons; ABRE3a/3b and SPH were detected in sorghum/maize comparisons).
|
50 MYA, because the probability of retaining a 7-bp sequence by chance in a sequence that is identical by descent in these species pairs is reasonably low (P
0.002; KAPLINSKY et al. 2002). Furthermore, initial searches for 7/8-bp CNS in pairs of genes were restricted to aligned portions of the 5'-noncoding region that occur in the same relative order to increase the probability that comparisons of sequences that are identical by descent were made. TF-binding sites that were not present in the same relative order due to insertions, deletions, or rearrangements were identified in a separate search (see below). Using this approach, 17 7/8-bp CNS were located in the sorghum/rice or maize/rice pairwise comparisons of 5'-noncoding regions (Figure 1, sequences highlighted in yellow and numbered 1–17). Eight of the 7/8-bp CNS were present in all three species (Figure 1, CNS 2, 6, 7, 9, 11, 14, 15, 16) whereas 9 7/8-bp CNS were present only in sorghum/rice or maize/rice comparisons (Figure 1, CNS 1, 3, 4, 5, 8, 10, 12, 13, 17). Of these latter 9 CNS, 6 contained 6/8-bp matches among all three species (Figure 1, CNS 1, 3, 4, 8, 13, 17). Furthermore, the longest exact sequence match in the regions spanned by each CNS was identified to determine if CNS were part of much larger stretches of conserved sequence. The consecutive number of conserved bases per CNS ranged from 4 to 10 bp with an average identical sequence match of 6.5 bp. The rapid loss of alignment outside of CNS, including sequences flanking the nine known maize rab17 cis-elements, indicates good discrimination of 7/8-bp CNS from surrounding putative nonfunctionally constrained sequences.
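The 7/8-bp criterion and the longest-exact-match measurement can be illustrated with a short sliding-window sketch over a pairwise alignment. This is not the FootPrinter algorithm, which works on a phylogenetic tree; it is a simplified stand-in, and the function names and toy sequences are hypothetical.

```python
def find_cns_windows(aligned_a, aligned_b, window=8, min_match=7):
    """Start columns of alignment windows with >= min_match identical, ungapped positions."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    hits = []
    for start in range(len(aligned_a) - window + 1):
        pairs = zip(aligned_a[start:start + window], aligned_b[start:start + window])
        matches = sum(1 for x, y in pairs if x == y and x != "-")
        if matches >= min_match:
            hits.append(start)
    return hits


def longest_exact_run(aligned_a, aligned_b):
    """Length of the longest stretch of consecutive identical, ungapped columns."""
    best = run = 0
    for x, y in zip(aligned_a, aligned_b):
        run = run + 1 if (x == y and x != "-") else 0
        best = max(best, run)
    return best


# Toy aligned promoter fragments (not sequences from this study)
seq_a = "AAACACGTGGCTTT"
seq_b = "CCCCACGTGTCGGG"
print(find_cns_windows(seq_a, seq_b))   # [3]
print(longest_exact_run(seq_a, seq_b))  # 6
```

In the toy pair, a single 8-column window contains 7 matches, and the longest uninterrupted match within it is 6 bp, comparable to the 6.5-bp average reported above.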
The 5'-noncoding regions of sorghum dhn2 and maize rab17 genes were also subjected to phylogenetic analysis to see if useful information about regulatory elements could be obtained from this analysis. Because sorghum and maize diverged only ~16 MYA, a scan for CNS >19 bp would be required to achieve the same discrimination as that obtained by a screen for 7-bp sequences retained in sorghum/rice or maize/rice. However, only one CNS spanning at least 20 bp was present in the sorghum/maize alignment (CNS 27). Therefore, the aligned 5'-noncoding sequences of sorghum/maize were searched for CNS that were >9 bp even though the probability of a random occurrence of a 10-bp sequence match between these species is 0.05. This search identified 11 CNS ranging in size from 10 to 20 bp with an average sequence match spanning 13 bp, much larger than most known TF-binding sites (Figure 1, CNS 18–28). The Sbdhn2/Zmrab17 CNS spanned seven of the nine known cis-elements but two cis-elements were missed using the >9-bp CNS search parameter (Figure 1, GRA, SPH). While insufficient divergence has occurred between sorghum and maize to accurately discriminate TF-binding sites, it was possible that CNS >9 bp identified in comparisons of these species might span recently evolved cis-elements. Therefore, the Sbdhn2/Zmrab17 CNS were screened for known TF-binding sites, and these sequences were retained for further downstream analysis as described below.
Correspondence between rab17 CNS and TF-binding sites:
The relationship between 7/8-bp CNS identified through alignment of sorghum/rice and maize/rice rab 5'-noncoding sequences, known cis-elements, and putative TF-binding sites is shown in Table 2. In previous biochemical studies, nine TF-binding sites were identified in the Zmrab17 sequence region spanning -173 to -315 (BUSK et al. 1997). A scan for 7/8-bp CNS among sorghum/rice or maize/rice identified five of the nine previously identified TF-binding sites (Figure 1; ABRE4, GRA, ABRE2, DRE2, ABRE1; Table 2, CNS 11, 12, 14, 15, 16). All but one of these TF-binding sites were identified in both the sorghum/rice and maize/rice comparisons, indicating a high degree of conservation. CNS 12, which matched the TF-binding site GRA, was identified only in the rice/maize alignment due to a deletion in sorghum (Figure 1). The GRA TF-binding site in maize includes the sequence (GCCGCC) that matches the binding site for AP2 factors involved in responses to jasmonate and ethylene (BROWN et al. 2003). DRE1, SPH, and ABRE3a/3b were missed using the 7/8-bp criterion although the core DRE-binding sequence (ACCG) is perfectly conserved in all three species (BUSK et al. 1997).
|
The 11 >9-bp CNS identified in comparisons of sorghum/maize were also searched for putative TF-binding sites (Table 2; CNS 18–28). Sbdhn2/Zmrab17 CNS 20, 22, 25, 26, and 27 overlap CNS 4, 11, 14, 15, and 16 in searches of sorghum/rice and maize/rice, respectively, and were therefore not analyzed further. Sbdhn2/Zmrab17 CNS 18 contains the C-ABRE-containing sequence (GCCGTG) similar to CNS 9, while CNS 19 spans a DBF-like binding sequence (CACAAG; KIZIS and PAGES 2002). CNS 21 spans the sequence CCTAATCC that has a core TAAT motif often found in HD-ZIP protein-binding sites (WOLBERGER 1996), while CNS 23 contains a Myb3-like motif (CTAACCA) in a reverse orientation (ABE et al. 1997). CNS 24 corresponds to ABRE3a/3b identified through biochemical analysis (BUSK et al. 1997). CNS 28 spans sequences that contain the DRE core-binding site (ACCG) recognized by some AP2 transcription factors (CACCGG).
A search for TF-binding sequences was also performed by scanning the entire 5'-region of each gene for matches to TF-binding sites in the TRANSFAC and PLACE databases (http://www.gene-regulation.com/pub/databases.html#transfac; http://www.dna.affrc.go.jp/htdocs/PLACE/) to identify putative cis-elements that were missed in sorghum/rice or maize/rice analyses due to the loss or creation of regulatory elements after species separation. This search identified four additional putative TF-binding sites: a (GCCGCC) AP2-ERF binding sequence (BROWN et al. 2003) immediately downstream of DRE2 in rice, a DRE-like sequence upstream of CNS 28 (ACCGAC in both maize and sorghum), and two DRE-like/AP2 binding sequences in rice (CACCGT, CACCGG) that partially overlap the GRA and SPH cis-elements in maize (Figure 1, labeled beneath Osrab16A2 sequence).
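A scan of this kind can be sketched as a simple two-strand pattern search. The motif strings below are taken from the text (GCCGCC, ACCGAC) plus the bHLH MYC-like consensus CANNTG mentioned elsewhere in this article; the promoter fragment, the motif labels, and the function names are hypothetical, and the code does not query TRANSFAC or PLACE.

```python
import re

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA string; N stays N."""
    return seq.translate(COMPLEMENT)[::-1]


def scan_motifs(sequence, motifs):
    """Return (name, position, strand) for every motif occurrence; N matches any base."""
    hits = []
    for name, motif in motifs.items():
        patterns = [("+", motif)]
        rc = reverse_complement(motif)
        if rc != motif:  # palindromic motifs would otherwise be reported twice
            patterns.append(("-", rc))
        for strand, pattern in patterns:
            # Lookahead lets overlapping occurrences be found
            regex = re.compile("(?=(" + pattern.replace("N", "[ACGT]") + "))")
            for match in regex.finditer(sequence):
                hits.append((name, match.start(), strand))
    return sorted(hits, key=lambda hit: hit[1])


motifs = {"ERF": "GCCGCC", "DRE-like": "ACCGAC", "MYC-like": "CANNTG"}
promoter = "TTGCCGCCAAGTCGGTAACACGTGTT"  # toy fragment, not a real promoter
print(scan_motifs(promoter, motifs))
# [('ERF', 2, '+'), ('DRE-like', 10, '-'), ('MYC-like', 18, '+')]
```

Short motifs such as these occur frequently by chance, which is why the study treats database-only matches as putative sites separate from those supported by CNS.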
Overall, the implementation of phylogenetic analysis and TF-binding site searches described above identified 17 CNS from the Sbdhn2/Osrab16A2 or Zmrab17/Osrab16A2 searches, 6 additional unique Zmrab17/Sbdhn2 CNS that span known or putative TF-binding sites, plus 4 other putative TF-binding sites that are not supported through CNS discovery from phylogenetic analysis of either sorghum/maize by rice or sorghum by maize. Thus, a total of 27 possible CNS/TF-binding sequences were identified in the ~400-bp 5'-noncoding region upstream of the homologous rab genes, corresponding to one putative regulatory element every ~14 bp.
Phylogenetic analysis of additional genes related to maize rab16/17:
Gene families are created by gene duplication and therefore family members share a degree of sequence conservation and common regulation reflective of the time of divergence and forces of selection. While expression of many members of a gene family is often regulated through common regulatory pathways, specific genes of the family exhibit divergent expression under selected conditions. Therefore, phylogenetic analysis of gene families could help validate the presence of common regulatory elements and provide pairs of genes that differ by a limited subset of the regulatory elements that differentiate expression of specific members of the gene family. In these latter cases, correlation between variation in regulatory element composition and differences in gene expression could help elucidate the function of regulatory sequences. On the basis of this idea, we tested if phylogenetic analysis of four additional members of the rice rab16 gene family (rab16A-D) together with the cluster of related dhn2, rab17, and rab16A2 genes from sorghum, maize, and rice would provide useful information about CNS function.
BLASTN searches of the maize rab17 EST sequence against the nonredundant database identified several rab17 gene homologs, including rab16A (e-19), rab16B (e-19), rab16C (e-23), and rab16D (e-19). Rice rab16A–D genes are organized in close proximity to each other in a tandem array consistent with derivation by duplication (YAMAGUCHI-SHINOZAKI et al. 1989). The proteins encoded by rab16A–D are ~65–92% similar in amino acid sequence and have domains and motifs common to the dehydrins (CLOSE 1997). CLUSTAL analysis of protein-coding regions was used to estimate the extent of divergence among Osrab16A–D, Osrab16A2, Zmrab17, and Sbdhn2 (Figure 2). This analysis showed that three pairs of RAB proteins, encoded by Sbdhn2/Zmrab17, Osrab16A2/Osrab16A, and Osrab16B/Osrab16C, are most similar to each other and incrementally diverged from the other pairs of proteins (Figure 2). The sorghum dhn2 and maize rab17 genes diverged ~16 MYA, providing an estimate of the time and extent of divergence between this pair of genes and other genes with similar divergence. This analysis also showed that the proteins encoded by Osrab16D and Osrab16B/C have diverged to a similar extent as Sbdhn2 and Osrab16A2 (~50 MY). Overall, divergence among the RAB16 proteins was greater than among the initial set of RAB proteins analyzed (SbDHN2, ZmRAB17, and OsRAB16A2), suggesting that sufficient evolution had occurred to apply similar criteria for phylogenetic analysis to selected pairs of the larger set of rab16/17 gene family members.
|
50 MY) were aligned and searched for 7/8-bp CNS using FootPrinter. Following CNS discovery through pairwise analysis of genes, CNS common to more than two genes were aligned where possible. The results of this analysis are shown in Figure 3, where 7/8-bp CNS are highlighted with various colors (CNS without biochemical support, yellow; ABREs, blue; non-ABRE biochemically defined elements, brown, green, pink, and gray) and numbered above the corresponding sequence. TF-binding sites identified through biochemical analysis of Zmrab17 and Osrab16B are boxed and labeled above the Zmrab17 sequence (Figure 3, DRE1, ABRE1, C-ABRE, etc.; ONO et al. 1996; BUSK et al. 1997), while sequences related to known TF-binding sites that reside outside CNS are colored red and labeled below the set of seven rab sequences [Figure 3, C-ABRE (CGTG; ONO et al. 1996), SPH (CATGC; BUSK et al. 1997), CBF1 (CCGAC; STOCKINGER et al. 1997; MEDINA et al. 1999), ERF (GCCGCC; BROWN et al. 2003), DRE (ACCG-core), AP2 (GCCGGT; NIU et al. 2002), bHLH MYC-like binding site (CANNTG; ABE et al. 1997), MYB (AAACAAT, CCAACC; LI and PARISH 1995); PLACE/TRANSFAC databases]. Some motifs were found in the reverse orientation and are indicated by the addition of (-) following the motif name. In total, four new CNS were identified through phylogenetic analysis with the additional rice rab paralogs 16A–16D (Figure 3, CNS 11.1, 11.2, 12.1, and 13.1). CNS 11.1 identified the biochemically defined SPH element, while the remaining new CNS appear novel.
|
|
Figure 4 shows that among the rab genes analyzed, mRNA levels increased ~100- to 10,000-fold in roots following treatment with ABA and ~10- to 1000-fold in shoots and that the level of rab16/17 mRNA in seeds is ~50- to 1000-fold higher than that in roots of control vegetative plants. Induction of the rab16/17 genes by ABA is consistent with the presence of one or more ABRE sequences in all of the rab genes and "coupling elements" such as DRE2 in six of the seven genes analyzed (SHEN et al. 1996). However, while all of the genes responded to ABA and all are expressed in seeds, significant variation in rab16/17 gene mRNA abundance was observed. For example, Sbdhn2 showed greater induction by ABA in shoots compared to Zmrab17, and rab16D had the smallest difference in seed vs. root mRNA level among the rab genes analyzed (Figure 4).
|
85 times lower than Zmrab17 (Figure 5A). Therefore, the increased induction of Sbdhn2 mRNA by ABA in shoots compared to Zmrab17 mRNA (Figure 4) was due primarily to relatively low levels of Sbdhn2 mRNA in control shoots. Sbdhn2 and Zmrab17 have 14/17 CNS/TF-binding sites in common; however, Sbdhn2 lacks CNS 5, CNS 10, and CNS 12 (GRA; Figure 3). It has previously been demonstrated that the GRA element contributes significantly to basal Zmrab17 gene expression in unperturbed shoots (BUSK et al. 1997), consistent with the results presented here.
|
|
| DISCUSSION |
|---|
|
|
|---|
15–25 bp CNS discovered through these approaches were often located within introns and considered likely to regulate gene expression (INADA et al. 2003), although their location and size are inconsistent with TF-binding sites. In this study, phylogenetic analysis was carried out on a group of ABA-responsive genes related to maize rab17 that are induced in response to plant dehydration during seed development. The goal was to investigate the utility of phylogenetic methods for identifying 5'-noncoding regulatory sequences including TF-binding sites among grass genes.
Useful phylogenetic CNS search parameters based on several considerations were developed. First, the promoters of most genes contain TF-binding sites that are 6–10 bp long with only a subset of these bases under strong selection. Phylogenetic searches for CNS larger than TF-binding sites would require conservation of base pairs that are not under selection, leading to a high level of false negatives consistent with prior results (INADA et al. 2003). Second, analysis of rice, sorghum, and maize sequences spanning known TF-binding sites in rab17 indicated that 7/8-bp sequence matches in aligned regions would identify most of the binding sites that are common to the genes and the species being compared. Third, on the basis of mutation rates in the grasses (GAUT et al. 1996), the probability that species such as sorghum, maize, and rice, separated for ~16–50 MY, have retained a 7-bp match at random in a DNA sequence that is identical by descent is relatively low (KAPLINSKY et al. 2002). Moreover, comparisons of sorghum/rice and maize/rice sequences allowed good discrimination of CNS from other sequences in the promoters. On average, searches for 7/8-bp CNS identified identical sequence matches that spanned 6.5 bp and sequences surrounding CNS were usually much less conserved due to mutations, deletions, and insertions. In contrast, searches for CNS among sorghum and maize identified much longer identical sequences (13 bp) and resulted in a higher false-positive rate.
A prior phylogenetic study of grass genes concluded that it would be difficult to identify 7-bp CNS due to random sequence matches, especially among AT-rich sequences (GUO and MOOSE 2003). This complication was minimized in the current study in two ways. First, the search for overall sequence alignment and CNS was done incrementally, starting from the translation initiation codon and terminating when the degree of alignment and rate of CNS discovery declined significantly. Among the rab16/17 genes analyzed, overall sequence alignment and the rate of 7/8-bp CNS discovery decreased in sequences >400–450 bp upstream of the site of translation initiation. The region farther upstream contained many 7-bp AT-rich sequences similar to those reported by GUO and MOOSE (2003). Second, 7/8-bp CNS were required to occur in the same order relative to the translation start sites of the genes being compared, increasing the probability that the sequences were identical by descent. The final step in our approach involved searching the CNS and all other 5'-noncoding sequences for known TF-binding sites. This was done to identify additional TF-binding sites that were missed due to DNA insertions, deletions, or rearrangements since species divergence.
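The same-order requirement can be expressed as a collinearity filter on candidate matches. The sketch below is one possible formalization, assumed here for illustration and not the procedure used in this study: given candidate matches as (position in gene A, position in gene B) pairs, it keeps the largest subset whose positions increase in both genes, using a longest-increasing-subsequence search. All names and coordinates are hypothetical.

```python
from bisect import bisect_left


def collinear_subset(matches):
    """Largest subset of (pos_a, pos_b) matches that is strictly increasing in both genes."""
    # Sort by pos_a; ties are ordered by descending pos_b so they cannot chain together
    ordered = sorted(matches, key=lambda m: (m[0], -m[1]))
    tails = []       # tails[k]: index of the last match in the best chain of length k + 1
    tail_vals = []   # pos_b of that match, kept sorted for binary search
    prev = [None] * len(ordered)
    for i, (_, pos_b) in enumerate(ordered):
        k = bisect_left(tail_vals, pos_b)
        if k == len(tails):
            tails.append(i)
            tail_vals.append(pos_b)
        else:
            tails[k] = i
            tail_vals[k] = pos_b
        prev[i] = tails[k - 1] if k > 0 else None
    # Walk back-pointers from the end of the longest chain
    chain = []
    i = tails[-1] if tails else None
    while i is not None:
        chain.append(ordered[i])
        i = prev[i]
    return chain[::-1]


# Hypothetical match coordinates; (60, 5) is out of order in gene B and is dropped
candidates = [(10, 50), (40, 80), (60, 5), (90, 120)]
print(collinear_subset(candidates))  # [(10, 50), (40, 80), (90, 120)]
```

Matches removed by such a filter are not necessarily nonfunctional; in this study, sites displaced by insertions, deletions, or rearrangements were recovered by the separate TF-binding site search.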
Application of the phylogenetic approach developed in this study for CNS discovery in sorghum dhn2, maize rab17, and rice rab16A2 genes identified 17 7/8-bp CNS in the 5'-noncoding region of these genes. In the rab17 promoter, five of the nine TF-binding sites previously defined by biochemical approaches were identified in the initial CNS alignment step, while four sites (DRE1, ABRE3a, ABRE3b, and SPH) were identified through analysis of rice rab16 paralogs or in searches for TF-binding sites (discussed below). Furthermore, 5 CNS identified in all three genes contained potential transcription-factor-binding sites identified through database searches: CNS 4 (DRE core; NIU et al. 2002), CNS 17 (putative TF-binding site in embryos; BUSK et al. 1997), CNS 9 [ABRE half site (CGTGC; IZAWA et al. 1993)], CNS 5 (bHLH MYC-like binding site), and a TATA-box. Two additional CNS were identified in the phylogenetic comparisons that did not span known TF-binding sequences (CNS 8 and 10). Overall, a total of 28 possible CNS/TF-binding sequences, or approximately one putative regulatory element every 14 bp, were identified in the ~400-bp 5'-noncoding region upstream of these three rab genes. Similar results were obtained with biochemical analysis of the maize rab17 region spanning −184 to −305, where nine cis-elements, or one cis-element every 13 bp, were discovered (BUSK et al. 1997).
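As a quick check on the two density estimates quoted above, taking the 5'-noncoding region as roughly 400 bp and the maize rab17 interval as the 121 bp between positions 184 and 305 upstream:

```latex
\[
\frac{400\ \text{bp}}{28\ \text{elements}} \approx 14.3\ \text{bp per element},
\qquad
\frac{(305 - 184)\ \text{bp}}{9\ \text{elements}} = \frac{121}{9} \approx 13.4\ \text{bp per element}.
\]
```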
Phylogenetic analysis of 5'-noncoding sequences among the rab17/16 gene family:
Analysis of homologous genes from widely diverged species will not detect regulatory elements that have been gained or lost by the genes being compared since divergence. This loss of information can be avoided to some extent by analyzing orthologs from more than two species or through phylogenetic "shadowing" of numerous species, including those diverged within the past 20 MY (BOFFELLI et al. 2003; HONG et al. 2003). In the present study, we tested an additional way to identify 5'-noncoding regulatory sequences by analyzing several members of a gene family. We assumed that members of gene families will have some regulatory elements in common and that differences in selected regulatory elements are present in specific members of the gene family. This idea is consistent with information showing that plants activate subsets of rab/dhn genes in response to different types of abiotic stress and in a range of tissues and developmental stages via specific complements of TF/ABRE interactions (YAMAGUCHI-SHINOZAKI et al. 1989; KIM et al. 1997; CHOI et al. 2000; UNO et al. 2000). Moreover, we thought that differences in CNS content among gene family members could be related to variation in gene expression, providing tentative connections between CNS content and expression patterns.
To test this approach, phylogenetic analysis was carried out on five members of the rice rab16 gene family plus maize rab17 and sorghum dhn2. Phylogenetic analysis of 7/8-bp CNS among the larger group of rab/dhn genes identified many of the CNS/TF-binding sites found through analysis of three genes from rice, sorghum, and maize, providing increased evidence for the functional significance of these sequences. In addition, the analysis of the larger set of rab16/17 genes detected five CNS that were not identified in comparisons of Sbdhn2/Zmrab17 vs. Osrab16A2: a CNS located in the predicted 5'-UTR (CNTCGATC; data not shown); CNS 11.1 that spans the SPH element; and CNS 11.2, 12.1, and 13.1. On the basis of these results, we conclude that discovery of regulatory sequences by phylogenetic analysis is improved by the combined analysis of paralogs and orthologs from species spanning a range of divergence.
The alignment of CNS/TF-binding sites among the seven rab16/17 genes revealed several additional features regarding CNS composition and organization. First, the number of CNS shared by pairs of genes decreases as divergence among the genes increases. This trend probably reflects divergence in gene regulation and the accumulation of mutations that reduce our ability to detect CNS. Second, the conservation of sequences in and around CNS shared by gene family members is not perfect and could potentially contribute to differences in gene expression. For example, although ABRE1, -2, -3, and -4 all contain the same five-base ABRE core sequence (ACGTG), these binding sites differ in flanking nucleotides. Variation in sequences flanking ABRE core sequences is known to influence the interactions and binding affinities of these regulatory elements with different members of the bZIP family of transcription factors (IZAWA et al. 1993; HATTORI et al. 2002). Third, while the order of CNS/TF-binding sites in a region of the promoter was often conserved among the group of rab genes analyzed here, mutations, deletions, and insertions caused significant variation in the sequences and spacing between CNS.
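The point about flanking nucleotides can be made concrete with a small sketch that lists every occurrence of the ACGTG core in a sequence together with its immediate flanks. The function name `core_sites_with_flanks`, the 3-bp flank width, and the example sequence are assumptions for illustration; only the forward strand is scanned.

```python
import re


def core_sites_with_flanks(seq, core="ACGTG", flank=3):
    """Return (start, left_flank, right_flank) for each occurrence of `core`."""
    seq = seq.upper()
    sites = []
    # A zero-width lookahead reports overlapping occurrences as well.
    for m in re.finditer(f"(?={re.escape(core)})", seq):
        start = m.start()
        end = start + len(core)
        left = seq[max(0, start - flank):start]
        right = seq[end:end + flank]
        sites.append((start, left, right))
    return sites


# Hypothetical promoter fragment containing two ABRE cores with different flanks.
for site in core_sites_with_flanks("GGCACGTGGCTTTACGTGTCA"):
    print(site)
```

Tabulating flanks in this way across gene family members gives a simple view of which ABRE copies share a sequence context and which have diverged.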
Association of CNS content and rab gene expression:
The final part of our study assessed various methods for relating variation in CNS composition to differences in gene expression. rab genes are regulated by ABA, NaCl, cold, and other perturbations and are expressed in a wide range of cells, tissues, and developmental stages. In addition, ABA-responsive gene mRNA levels are regulated at the levels of transcription and RNA stability through regulatory elements located in the promoter as well as other regions of these genes not surveyed in this study (FINKELSTEIN et al. 2002; XIONG et al. 2002; HIMMELBACH et al. 2003). Therefore, because data on rab16/17 mRNA abundance were collected only from roots, shoots, and seeds and from control and ABA-treated vegetative tissues, the associations between CNS and gene expression identified in this study are necessarily incomplete. However, these data allowed the utility of methods for making associations between CNS content and gene expression to be explored and several associations to be tentatively identified for follow-up study.
Plots of fold changes in mRNA abundance induced by ABA or between tissues (seeds and roots) helped identify variation in rab16/17 gene expression. For example, ABA-induced expression of Sbdhn2 mRNA in shoots was greater than that of the other rab genes analyzed (Figure 4). Furthermore, analysis of the ratio of Sbdhn2 to Zmrab17 mRNA levels in basal and ABA-induced states showed that Sbdhn2 mRNA levels were low relative to Zmrab17 specifically in control shoots (Figure 5). This difference in expression was correlated with the lack of GRA and CNS 5 in Sbdhn2 relative to Zmrab17. This observation is consistent with previous work in maize, in which mutations in the GCCGCC motif in the GRA element resulted in reduced basal expression of Zmrab17 in leaves (BUSK et al. 1997). The transcription factors that bind to this element in maize or sorghum have not yet been identified. However, the (GCCGCC) ERF motif that is part of the Zmrab17 GRA binds AP2/EREBP factors that are involved in jasmonic acid/ethylene regulation in other plants (HAO et al. 1998; BROWN et al. 2003).
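For readers unfamiliar with how fold changes are derived from real-time PCR data, the following sketch shows one common model. It assumes threshold-cycle (Ct) values that are already normalized for input RNA and an amplification efficiency of 2 (perfect doubling per cycle); the function name `fold_change` and the Ct values are hypothetical, and the calculation used in this study may differ.

```python
import math


def fold_change(ct_control, ct_treated, efficiency=2.0):
    """Fold change in template abundance (treated relative to control),
    assuming exponential amplification with the given per-cycle efficiency."""
    return efficiency ** (ct_control - ct_treated)


# Hypothetical Ct values: the treated sample crosses threshold 6 cycles earlier.
fc = fold_change(ct_control=28.0, ct_treated=22.0)
print(fc, math.log2(fc))
```

Because the same primer pair is used for both samples, an error in the assumed efficiency affects all conditions for that gene in the same way, which is why within-gene fold changes are comparatively robust.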
The ratio of expression of very closely related rab genes such as Osrab16A and Osrab16A2 was similar in all basal and ABA-induced states examined (Figure 6). This result is consistent with the fact that these genes had 16/16 CNS in common. The ratios of Zmrab17 and Osrab16A2 mRNA levels were also similar under all conditions studied except in seeds. However, the CNS/TF-binding site composition of this pair of genes varies in several ways. Osrab16A2 lacks ABRE3a/3b, SPH, and CNS 12.1, contains modified GRA and DRE2 sequences, and has CNS 13.1, CBF1, and ERF sequences not present in Zmrab17 (Figure 3). This suggests that there is redundancy and/or compensating changes in the regulatory elements in these two genes.
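Differences in element content between two genes can be summarized with simple set operations. The sketch below uses only a partial, illustrative subset of the elements named in the text for Zmrab17 and Osrab16A2 (it is not the full content shown in Figure 3), and the function name `compare_elements` is an assumption.

```python
def compare_elements(a, b):
    """Split two collections of element names into shared and gene-specific sets."""
    a, b = set(a), set(b)
    return {
        "shared": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
    }


# Partial element lists taken from the text, for illustration only.
zmrab17 = {"CNS 4", "CNS 9", "CNS 17", "ABRE3a", "ABRE3b", "SPH", "CNS 12.1"}
osrab16a2 = {"CNS 4", "CNS 9", "CNS 17", "CNS 13.1", "CBF1", "ERF"}

result = compare_elements(zmrab17, osrab16a2)
print(result["only_a"])
print(result["only_b"])
```

A presence/absence summary of this kind does not capture modified elements, such as the altered GRA and DRE2 sequences noted above, which would need a separate annotation.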
Analysis of differences in ABA-induced expression and ratios of gene expression among pairs of genes can be done without correction for primer efficiencies. However, elements contributing to consistent differences in mRNA abundance in all tissues and states will not be detected in these analyses. Therefore, the abundance of Osrab16A, -16B, and -16D mRNAs was compared after correcting for differences in primer efficiency (Figure 6). This analysis showed that Osrab16A was expressed at higher levels than Osrab16B in all states examined. In addition, Osrab16A mRNA increased more than Osrab16B in ABA-treated roots and shoots between 3 and 27 hr. These differences in expression are correlated with the presence of GRA, CNS 13, 9, and 5, as well as loss of SPH and CNS 12.1 in Osrab16A relative to Osrab16B. Continued accumulation of Osrab16A mRNA in ABA-treated plants between 3 and 27 hr might be associated with DRE-like sequences in the DRE1 and GRA regions of this gene, which are not present in Osrab16B. The quantitative analysis of rab mRNA levels also showed that Osrab16D was expressed at relatively low levels in shoots and seeds but at levels comparable to Osrab16B in roots. Both of these genes contain ABRE1 and CNS 17, which may help explain similar levels of ABA-induced expression in roots. However, Osrab16D lacks DRE1, DRE2, ABRE2, and ABRE4 and CNS 9, 12.1, and 11.2, subsets of which are important determinants of Zmrab17 gene expression in shoots and embryos (Table 3). Interestingly, the ABRE1 element in Osrab16D is flanked by several SPH-like sequences, which have been found to mediate ABA responses in a similar configuration in the napA promoter (EZCURRA et al. 1999). Osrab16D also contains putative MYC-like and MYB-binding sequences immediately upstream of CNS 10 (Figure 3). 
While these elements are not phylogenetically conserved among the rab genes analyzed, it is well established that some ABA-responsive genes are modulated by bHLH transcription factors (ABE et al. 1997). Further biochemical assays will be required to test the significance of these latter putative binding sequences in Osrab16D.
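The need for primer-efficiency correction when comparing different genes can be illustrated with a short sketch. It assumes the common exponential model in which template abundance is proportional to E raised to the power −Ct, where E is the per-cycle amplification efficiency of a primer pair; the function names, efficiencies, and Ct values are hypothetical, and this section does not state the exact correction that was applied.

```python
def relative_abundance(ct, efficiency=2.0):
    """Template abundance relative to an arbitrary threshold amount,
    assuming exponential amplification with the given per-cycle efficiency."""
    return efficiency ** (-ct)


def abundance_ratio(ct_a, ct_b, eff_a=2.0, eff_b=2.0):
    """Ratio of gene A to gene B abundance using primer-specific efficiencies."""
    return relative_abundance(ct_a, eff_a) / relative_abundance(ct_b, eff_b)


# Identical Ct values, but the primers for gene B amplify less efficiently.
uncorrected = abundance_ratio(20.0, 20.0)                    # assumes E = 2 for both
corrected = abundance_ratio(20.0, 20.0, eff_a=2.0, eff_b=1.9)
print(uncorrected, corrected)
```

In this example the uncorrected ratio is 1, whereas the corrected ratio is well below 1, showing how a modest difference in efficiency can distort a between-gene comparison even when Ct values agree.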
An even wider phylogenetic analysis of rab and dhn gene family members among grass species could elucidate stepwise changes in gene expression, CNS/TF-binding sites, and associated phenotypes that have occurred during the ~50 MY of evolution of the grass family. A complete analysis of the rab/dhn gene family in rice, sorghum, and maize could also help determine if differences in rab/dhn gene content and expression contribute to variation in drought tolerance among these grasses. Comparisons among orthologs from highly divergent species are most useful for TF-binding site identification, whereas phylogenetic analysis of more closely related species and gene families within species will be useful for identifying sequence regions containing more recently evolved regulatory elements. The overall grass gene CNS annotation process would benefit greatly from in-depth analysis of gene expression, better definition of TF-binding sites, and global mapping of TF-promoter associations through genome-wide chromatin immunoprecipitation assays (LEE et al. 2002). Above all, the collection of a complete set of gene sequences from sorghum and maize will be required to extract the full benefit of phylogenetic analysis of these grasses.
| LITERATURE CITED |
|---|
ABE, H., K. YAMAGUCHI-SHINOZAKI, T. URAO, T. IWASAKI, D. HOSOKAWA et al., 1997 Role of Arabidopsis MYC and MYB homologs in drought- and abscisic acid-regulated gene expression. Plant Cell 9: 1859–1868.
ANSARI-LARI, M. A., J. C. OELTJEN, S. SCHWARTZ, Z. ZHANG, D. M. MUZNY et al., 1998 Comparative sequence analysis of a gene-rich cluster at human chromosome 12p13 and its syntenic region in mouse chromosome 6. Genome Res. 8: 29–40.
ARABIDOPSIS GENOME INITIATIVE, 2000 Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408: 796–815.
BIECHE, I., I. LAURENDEAU, S. TOZLU, M. OLIVI, D. VIDAUD et al., 1999 Quantitation of MYC gene expression in sporadic breast tumors with a real-time reverse transcription-PCR assay. Cancer Res. 59: 2759–2765.
BLANCHETTE, M., and M. TOMPA, 2002 Discovery of regulatory elements by a computational method for phylogenetic footprinting. Genome Res. 12: 739–748.
BLANCHETTE, M., B. SCHWIKOWSKI and M. TOMPA, 2002 Algorithms for phylogenetic footprinting. J. Comput. Biol. 9: 211–223.
BOFFELLI, D., J. MCAULIFFE, D. OVCHARENKO, K. D. LEWIS, I. OVCHARENKO et al., 2003 Phylogenetic shadowing of primate sequences to find functional regions of the human genome. Science 299: 1391–1394.
BROWN, R. L., K. KAZAN, K. C. MCGRATH, D. J. MACLEAN and J. M. MANNERS, 2003 A role for the GCC-box in jasmonate-mediated activation of the PDF1. Plant Physiol. 132: 1020–1032.
BUELL, C. R., 2002 Current status of the sequence of the rice genome and prospects for finishing the first monocot genome. Plant Physiol. 130: 1585–1586.
BUSK, P. K., and M. PAGES, 1998 Regulation of abscisic acid-induced transcription. Plant Mol. Biol. 37: 425–435.
BUSK, P. K., A. B. JENSEN and M. PAGES, 1997 Regulatory elements in vivo in the promoter of the abscisic acid responsive gene rab17 from maize. Plant J. 11: 1285–1295.
CHANDLER, V. L., and V. BRENDEL, 2002 The Maize Genome Sequencing Project. Plant Physiol. 130: 1594–1597.
CHOI, H.-I., J.-H. HONG, J.-O. HA, J.-Y. KANG and S. Y. KIM, 2000 ABFs, a family of ABA-responsive element binding factors. J. Biol. Chem. 275: 1723–1730.
CLARK, R. M., E. LINTON, J. MESSING and J. F. DOEBLEY, 2004 Pattern of diversity in the genomic region near the maize domestication gene tb1. Proc. Natl. Acad. Sci. USA 101: 700–707.
CLOSE, T. J., 1997 Dehydrins: a commonality in the response of plants to dehydration and low temperature. Physiol. Plant. 100: 291–296.
COLINAS, J., K. BIRNBAUM and P. N. BENFEY, 2002 Using cauliflower to find conserved non-coding regions in Arabidopsis. Plant Physiol. 129: 451–454.
COLLINS, F. S., E. D. GREEN, A. E. GUTTMACHER and M. S. GUYER, 2003 A vision for the future of genomics research. Nature 422: 835–847.
DAVIDSON, E. H., D. R. MCCLAY and L. HOOD, 2003 Regulatory gene networks and the properties of the developmental process. Proc. Natl. Acad. Sci. USA 100: 1475–1480.
DOEBLEY, J., M. DURBIN, E. M. GOLENBERG, M. T. CLEGG and D. P. MA, 1990 Evolutionary analysis of the large subunit of carboxylase (rbcL) nucleotide sequence among the grasses (Gramineae). Evolution 44: 1097–1108.
EZCURRA, I., M. ELLERSTROM, P. WYCLIFFE, K. STALBERG and L. RASK, 1999 Interaction between composite elements in the napA promoter: both the B-box ABA-responsive complex and the RY/G complex are necessary for seed-specific expression. Plant Mol. Biol. 40: 699–709.
FICKETT, J. W., and W. W. WASSERMAN, 2000 Discovery and modeling of transcriptional regulatory regions. Curr. Opin. Biotechnol. 11: 19–24.
FINKELSTEIN, R. R., S. S. L. GAMPALA and C. D. ROCK, 2002 Abscisic acid signaling in seeds and seedlings. Plant Cell 14: S15–S45.
FRITH, M. C., U. HANSEN, J. L. SPOUGE and Z. WENG, 2004 Finding functional sequence elements by multiple local alignment. Nucleic Acids Res. 32: 189–200.
GAUT, B. S., B. R. MORTON, B. C. MCCAIG and M. T. CLEGG, 1996 Substitution rate comparisons between grasses and palms: synonymous rate differences at the nuclear gene Adh parallel rate differences at the plastid gene rbcL. Proc. Natl. Acad. Sci. USA 93: 10274–10279.
GUO, H., and S. P. MOOSE, 2003 Conserved noncoding sequences among cultivated cereal genomes identify candidate regulatory sequence elements and patterns of promoter evolution. Plant Cell 15: 1143–1158.
HALFON, M. S., Y. GRAD, G. M. CHURCH and A. M. MICHELSON, 2002 Computation-based discovery of related transcriptional regulatory modules and motifs using an experimentally validated combinatorial model. Genome Res. 12: 1019–1028.
HAO, D., M. OHME-TAKAGI and A. SARAI, 1998 Unique mode of GCC box recognition by the DNA-binding domain of ethylene-responsive element-binding factor (ERF domain) in plant. J. Biol. Chem. 273: 26857–26861.
HARDISON, R. C., 2000 Conserved noncoding sequences are reliable guides to regulatory elements. Trends Genet. 16: 369–372.
H