Abstract
Determination and interpretation of fungal gene expression profiles based on digital reconstruction of expressed sequence tags (ESTs) are reported. A total of 51,524 DNA sequence files processed with PipeOnline resulted in 9775 singlet and 5660 contig unique ESTs, 31.2% of a typical fungal transcriptome. Half of the unique ESTs shared homology with genes in public databases, of which 35.8% are functionally defined and 64.2% are unclear or unknown. In Aspergillus nidulans, 86% of transcripts associate with intermediate metabolism functions, mainly related to carbohydrate, amino acid, protein, and peptide biosynthesis. During asexual development, A. nidulans unexpectedly accumulates stress response and inducer-dependent transcripts in the absence of an inducer. Stress response genes in A. nidulans ESTs total 1039 transcripts, contrasting with 117 in Neurospora crassa, a 14.3-fold difference. A total of 5.6% of A. nidulans ESTs implicate inducer-dependent cell wall degradation or amino acid acquisition, 3.5-fold higher than in N. crassa. Accumulation of stress response and inducer-dependent transcripts suggests general derepression of cis-regulation during terminal asexual development.
ANALYSIS of gene expression was pioneered by the Northern blot technique in 1977 (Alwine et al. 1977). Expression patterns are investigated by the interpretation of mRNA (Berk and Sharp 1977) or protein-derived expression maps. mRNA expression maps are currently constructed from expressed sequence tags (ESTs; Schmitt et al. 1999; Ohlrogge and Benning 2000) or evaluated from microarray chips (Cho and Campbell 2000; Richmond and Somerville 2000).
The study of differences in gene expression patterns is a promising approach for genetic, biochemical, cellular, and morphogenetic systems. Because it is now technologically feasible to measure global gene expression levels, it is possible to determine precise gene/function outlines affected by broad environmental cues, determine the onset of morphogenesis, or establish gene clusters involved in cellular processes and regulatory networks (Doebley and Lukens 1998; Tavazoie et al. 1999).
Comprehensive gene expression survey methodologies fall into two classes, analog and digital, according to how they resolve gene expression levels. Analog methods are based on physical measurements of DNA/DNA or RNA/DNA hybridization between probes and tags (gene-specific DNA fragments or oligonucleotide collections), whereas digital methods derive expression levels from absolute counts of randomly generated tags from large condition-, stage-, organ-, or tissue-specific cDNA populations (Audic and Claverie 1997).
EST-based gene expression analysis requires a few assumptions. Underlying digital expression profiling is the assumption that in vivo populations of a given gene transcript from a particular tissue or organ of origin are proportionally represented in in vitro synthesized cDNA libraries from which tags are randomly sampled and sequenced. Thus, counting ESTs and relating them to the total sequenced population of ESTs provides absolute estimates of mRNA expression levels (Kozian and Kirschbaum 1999). The requirement for EST expression maps is random sequencing of nonnormalized libraries. As a result, qualified EST data sets provide exact transcriptional profiles and quantitatively compare mRNA expression levels of different organisms.
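The counting principle described above can be illustrated with a minimal sketch. The gene labels and the ten-clone library are hypothetical and serve only to show how a tag count becomes a relative expression estimate once it is divided by the total number of sequenced tags.

```python
from collections import Counter


def relative_abundance(tag_assignments):
    """Return each gene's share of all sequenced tags in one library."""
    counts = Counter(tag_assignments)
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}


# Hypothetical library: each entry is the gene assigned to one sequenced clone.
library = ["geneA"] * 6 + ["geneB"] * 3 + ["geneC"]
profile = relative_abundance(library)

for gene, share in sorted(profile.items()):
    print(f"{gene}: {share:.2f} of sequenced tags")
```

The estimate is only as good as the assumption stated in the text: clones must be sampled at random from a nonnormalized library, otherwise the shares no longer track in vivo transcript proportions.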
The September 8, 2000 release of dbEST (database of Expressed Sequence Tags) holds more than 5,609,655 entries, sampled from more than 5318 libraries (as of June 21, 2000). This public source of data retains information of genetic significance supporting the understanding of specific biological processes and phenomena.
Variation in the relative frequency of condition-, stage-, tissue-, or organ-specific tags (of any kind) is the basis for determining differential gene expression levels in one, a few, or all genes of a given genome. Accuracy and range of expression profiles depend on limitations imposed by the method of choice. For digital expression profiles, the size of the EST collection under consideration needs to be validated through statistical models (Audic and Claverie 1997).
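As an illustration of the kind of statistical model referred to here, the sketch below evaluates the conditional probability published by Audic and Claverie (1997): the chance of observing y tags for a gene in a second library of N2 clones, given x tags in a first library of N1 clones, if expression is unchanged. The function name and the example counts are illustrative assumptions; the present study cites that work but does not state which computation was applied.

```python
import math


def audic_claverie_p(x, y, n1, n2):
    """P(y | x) under equal expression, for libraries of n1 and n2 clones.

    p(y|x) = (n2/n1)^y * (x+y)! / (x! * y! * (1 + n2/n1)^(x+y+1))
    Evaluated in log space so that large counts do not overflow.
    """
    r = n2 / n1
    log_p = (
        y * math.log(r)
        + math.lgamma(x + y + 1)
        - math.lgamma(x + 1)
        - math.lgamma(y + 1)
        - (x + y + 1) * math.log(1 + r)
    )
    return math.exp(log_p)


# Illustrative counts: 5 tags in a 12,485-clone library, y tags in a 20,172-clone one.
n1, n2, x = 12_485, 20_172, 5
tail = sum(audic_claverie_p(x, y, n1, n2) for y in range(30, 500))
print(f"P(y >= 30 | x = {x}) is about {tail:.4g}")
```

Summing the probabilities over a tail, as in the last lines, gives a one-sided significance estimate for a difference in tag counts between two collections of unequal size.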
Fungi have simple and mostly nonredundant genomes, with Saccharomyces cerevisiae being the exception (Seoighe and Wolfe 1998). Most fungal genomes are haploid in the predominant phases of the life cycle; gene redundancy is observed but is not common, and pseudogenes are rare. Thus, a single-pass DNA sequence derived from RNA is a reliable source for quantitative and qualitative analysis.
The number of genes encoded by fungal genomes has been estimated by a number of methods and varies from 5800 for S. cerevisiae to 8100 or 9200 for Aspergillus nidulans and Neurospora crassa, respectively (Kupfer et al. 1997; Kelkar et al. 2001; Smulian et al. 2001).
It has become increasingly apparent that expression of morphogenetic programs, activation or inactivation of physiological processes, and structural reorganization of cellular components are exerted through rearrangements of gene expression levels of entire transcriptomes, and where observations have been made, they seem to follow modular implementation (Doebley and Lukens 1998). Modular implementation of transcripts coding structural and metabolic functions points toward the predominance of coordination through cis-regulatory elements.
In this study partitioned cellular functions are correlated with expression levels for genes with rationalized activities. Transcripts with unclear or unknown functional assignments are not considered. This information is examined, and metabolic conditional profiles are reconstructed from various fungal tissues on the basis of digital associations of function with digital expression data gathered from large EST collections.
Initially, different fungi are shown to parse broad functional categories into quantitatively similar clusters, regardless of tissue, organ, or physiological origin. However, when specific functional clusters are analyzed within a functional category, a significant number of transcripts expressed during A. nidulans asexual development are found to be related to stress responses and metabolism, suggesting preventive or protective roles.
Interestingly, there is a significant and revealing number of inducer-dependent transcripts in the absence of inducer at the time the mRNA sample was collected. To explain inducer-dependent transcripts in the absence of inducer and stress response transcripts in the absence of an obvious source of stress, it is proposed that during the final stages of asexual development, vegetative cis-acting regulatory networks are no longer functional and result in general derepression. These findings indicate that during asexual development, A. nidulans overrides vegetative regulatory controls and produces a series of transcripts that may add protective and adaptive advantages to the dormant spore.
MATERIALS AND METHODS
Fungal EST data manipulation: ESTs and source descriptions were obtained from the National Center for Biotechnology Information dbEST using batch-ENTREZ (Schuler et al. 1996). Only EST collections of fungal origin exceeding 1000 records were used in this study. ENTREZ-downloaded, FASTA-formatted files were processed using PipeOnline, an EST-optimized DNA sequence analysis and database construction package with automated gene function annotation (described by Ayoubi, and available from the http://aspergillus-genomics.org website). Edited, assembled, and annotated ESTs can be inspected using GeneBrowser and surveyed with simple or structured queries, and results can be retrieved as FASTA- or TAB-delimited files. Databases used for this study are available from the aspergillus-genomics.org website.
Functional annotation: PipeOnline utilizes the ontological functional organization and the gene names (function) from the Metabolic Pathways database (MPW; Selkov et al. 1998; Overbeek et al. 2000). The MPW dictionary is grouped into six basic categories, each of which is subdivided into two to nine subcategories containing 2727 nonredundant standard enzyme, protein, and gene names at the lowest level. Functional assignment of records in a PipeOnline database is accomplished through a predefined gene name/gene index lookup table. The functional annotating algorithm identifies a function independent from BLASTX homolog description matching. A detailed description of the functional sorting process is being published elsewhere and is available from the http://aspergillus-genomics.org website.
Quantitative analysis and functional distribution: Each file (also identified as a clone in this study) downloaded from the dbEST database was treated as being a single-pass cDNA sequence derived from a randomly chosen clone of a nonnormalized source cDNA library. Table 1 contains additional biological information about the libraries. The term “contig” means multiple input files from a single organism and/or library assembled into a single, homologous overlapping consensus output sequence using default PHRAP arguments; i.e., minmatch = 14, minscore = 30, and maxgap = 30 (compared with other assembly programs in Chen and Skiena 2000). Singlets (clones) depict single-pass, nonhomologous, and nonoverlapping sequencing reads.
Clone equivalency from differentially sized clone collections: Quantitative comparisons between N. crassa and A. nidulans whose EST collections are of different sizes were done by using a clone equivalency conversion factor on the basis of the discussion by Audic and Claverie (1997). One clone from the A. nidulans EST collection (12,485) equals 1.5 clones from the N. crassa (20,172) library. Conversely, one clone from the N. crassa library equals 0.7 clones in A. nidulans. These equivalency factors fall within a confidence interval >0.993 and <0.994.
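For orientation, the sketch below computes the plain ratio of the two collection sizes quoted above. This is an assumption about the simplest possible scaling, not the authors' procedure: the plain ratio comes out near 1.62 and 0.62, close to but not identical with the reported factors of 1.5 and 0.7, so the published factors evidently involve more than the raw size ratio.

```python
# Collection sizes as quoted in the text.
A_NIDULANS_CLONES = 12_485
N_CRASSA_CLONES = 20_172


def size_ratio(from_size, to_size):
    """Number of clones in the target collection per clone in the source collection."""
    return to_size / from_size


a_to_n = size_ratio(A_NIDULANS_CLONES, N_CRASSA_CLONES)
n_to_a = size_ratio(N_CRASSA_CLONES, A_NIDULANS_CLONES)

print(f"1 A. nidulans clone ~ {a_to_n:.2f} N. crassa clones (plain size ratio)")
print(f"1 N. crassa clone ~ {n_to_a:.2f} A. nidulans clones (plain size ratio)")
```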
Assembly of digital expression profiles: The FASTA- and TAB-delimited export features of PipeOnline were used extensively to download data from functionally categorized queries and upload into a local spreadsheet program for human expert verification, validation, and final tag counting.
RESULTS
Evaluation of EST-derived DNA sequence information from fungal origin: Table 1 provides a summarized overview of all publicly available fungal EST collections (retrievable from GenBank, dbEST subset). Using PipeOnline (available from the http://aspergillus-genomics.org website), 15,435 unique (nonhomologous) sequence files were recovered from 51,524 submitted ESTs belonging to eight fungal species; of these, 5660 (36%) were contigs and 9775 (64%) were singlets.
Table 1. EST-derived DNA sequence information from fungal origin in the public domain
Table 2. Automated functional annotation of fungal ESTs
The unigene (U/dGb) and multiple sequence assembly (C/dGb) ratios are indicators of DNA sequence diversity and redundancy, respectively. For example, from a total of 51,524 fungal EST files (clones), 30% (U ratio) are unique, and 11% (C ratio) have been sequenced at least twice. U and C ratio comparisons among EST collections account for representation differences between cDNA libraries and the randomness of each sequencing effort, respectively. U and C ratios are noninformative when the total number of EST files is low. The N. crassa (U/dGb 0.11), Mycosphaerella graminicola (U/dGb 0.69), and Magnaporthe grisea (U/dGb 0.69) EST collections deviate from the average U/dGb ratio of 0.46.
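The pooled figures quoted above can be reproduced with a short calculation. The sketch assumes that the U ratio is the number of unique sequences (contigs plus singlets) divided by the number of submitted EST files and that the C ratio is the number of contigs divided by the same total; these definitions are inferred from the quoted percentages rather than stated explicitly.

```python
# Pooled counts for all fungal EST collections, as quoted in the text.
TOTAL_EST_FILES = 51_524
CONTIGS = 5_660
SINGLETS = 9_775

unique = CONTIGS + SINGLETS           # unique (nonhomologous) sequences
u_ratio = unique / TOTAL_EST_FILES    # assumed definition of U/dGb
c_ratio = CONTIGS / TOTAL_EST_FILES   # assumed definition of C/dGb

print(f"unique sequences: {unique}")
print(f"U ratio: {u_ratio:.2f}")
print(f"C ratio: {c_ratio:.2f}")
```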
Table 2 displays an assessment reproducing probable transcriptome coverage with fungal ESTs and aggregation of biochemical information assigned to these DNA sequences on the basis of the PipeOnline automated functional assignment algorithm. On average, 31.2% of a typical fungal transcriptome is represented in the listed databases, whereas the A. nidulans EST collection retains the largest number of the predicted transcriptome (56.7%) and M. grisea keeps the least, at 8.6%. On average, half (50.1%) of the unique ESTs share homology [high scoring pair (hsp) >100] with other entries in GenBank; 35.8% of those are functionally annotated by PipeOnline, and 64.2% of unique ESTs remain unclear or have no homolog (hsp < 100).
Through the analysis of the functional outline of EST collections, this study detects qualitative and quantitative differences in functional content between collections and suggests stage-specific properties unique to A. nidulans asexual development and vegetative growth in N. crassa. All potential functions that could not be found in the A. nidulans EST library employing the PipeOnline functional sorting algorithm were retrieved and are displayed in Table 3. It was found that the vast majority of missing functions are related to active transport of amino acids, nucleotides, sugars, and other basic components missing from the minimal medium from which the source tissue was harvested. Functions other than transport genes that were found missing include DNA adenine methylation, tRNA anabolism genes, seven group tRNA methyltransferases, nitrate respiration, signal peptide trimming, and catabolism genes.
At first, it was surprising to discover that transport functions were missing in A. nidulans ESTs; however, the tissues that served as sequencing templates were grown in mineral, glucose-only-containing medium (Aramayo and Timberlake 1990), affirming this unexpected result. Moreover, this strongly indicates that false annotations are not common, thus authenticating digital profiling based on transcript counting and automated annotation.
Metabolic profile of A. nidulans: Table 4 shows the overall metabolic activity distributed into functional categories among the three largest EST collections, A. nidulans, N. crassa, and Schizosaccharomyces pombe. All three collections sorted roughly into a similar functional pattern, with intermediate metabolism accounting for ∼84% (82% the lowest and 86% the highest) of all surveyed transcripts. Information pathways, transmembrane transport, signal transduction, and electron transport accounted for 8, 2, 3, and 3% of the transcripts, respectively.
Figure 1 shows that in A. nidulans 86% of all the transcripts correlate with functions that fall within the category of intermediate metabolism. Within that category, nearly half (47%) of the transcripts encode functions related to synthesis and degradation of carbohydrate and 16% encode functions related to amino acid, peptide, and protein metabolism.
Table 3. Biochemical functions missing in the A. nidulans ESTs
Table 4. Metabolic activity distribution of three fungi among major functional categories
Carbohydrate synthesis (58%) is the most active portion of the intermediate metabolism category. Main carbohydrate pathways account for 21% of the activity, and production of mono-, di-, and polysaccharides accounts for 17, 4, and 9%, respectively. Anabolic processing of sugars, such as the production of sugar alcohol, alcohols, and organic acids, accounts for 37% of carbohydrate-related activity. Production of aminosugars and other carbohydrates fills the remaining 10%.
Figure 1. Intermediate metabolism functional distribution of A. nidulans asexually developing cultures. Metabolic activity is expressed as a function of the number of ESTs (all clones) sequenced with functions that classify into the depicted major metabolic groups. Bars with arrowheads indicate the biochemical trend (anabolism or catabolism) within the groups that make up a category. Blunt-ended bars indicate the total contribution of a given category.
Amino acid anabolism is another predominant activity (∼10% of intermediate metabolism), while a moderate amount (4.3%) of recycling via proteolysis can be detected. The contribution of the other categories (nitrogen, sulfur, and phosphorus metabolism, fatty acids membrane and related metabolite production, vitamins, heme, coenzymes, and other prosthetic groups) is modest, amounting to 35% of the overall metabolic activity.
Misappropriated gene expression: During the functional distribution analysis of the A. nidulans ESTs, transcripts were noticed whose predicted function indicates a requirement for a specific inducer that was not present in the medium from which the EST source tissue was collected. These transcripts would therefore not be expected, yet they are clearly present.
Table 5. Misappropriated inducer-dependent transcripts in A. nidulans and N. crassa
Table 5 reports a survey of inducer-regulated activities detected in A. nidulans and compared with N. crassa. In A. nidulans, a total of 700 misappropriated inducer-dependent transcripts (MIT), i.e., transcripts whose expression (based on the predicted function) requires a specific inducer, were found, amounting to 5.6% of the entire clone collection. N. crassa contains 336 such transcripts, representing 1.7% of that EST collection. Furthermore, these unexpected transcripts fall into two major groups: inducer-dependent and glucose-derepression-like transcripts (derepression of transcription by the absence of glucose). Inducer-dependent transcripts are frequent in A. nidulans, with 392 transcripts defining eight discrete functions, and rare in vegetative N. crassa tissues, with only 52 transcripts (26 amino acid oxidase, 16 chitin synthase, and 10 glucanase transcripts). A. nidulans transcripts produced in response to an inducer (56%) are implicated in plant cell wall degradation (7.1%), remodeling of the spore cell wall (22.6%), or acquisition of amino acids (26.3%). Glucose-derepression-like transcripts are detectable in A. nidulans and N. crassa, and they represent 44 and 84.5% of MIT transcripts, respectively.
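The percentages in this paragraph follow from the transcript counts and the collection sizes given in MATERIALS AND METHODS. The sketch below recomputes them, assuming that glucose-derepression-like transcripts are simply the MIT that are not inducer dependent; the dictionary keys are illustrative names, not terms from the study.

```python
# Counts quoted in the text; collection sizes from MATERIALS AND METHODS.
collections = {
    "A. nidulans": {"clones": 12_485, "mit": 700, "inducer_dependent": 392},
    "N. crassa": {"clones": 20_172, "mit": 336, "inducer_dependent": 52},
}


def mit_summary(c):
    """Percent shares of MIT within a collection and of the two MIT groups."""
    # Assumption: whatever is not inducer dependent is glucose-derepression-like.
    derepression_like = c["mit"] - c["inducer_dependent"]
    return {
        "mit_pct_of_collection": 100 * c["mit"] / c["clones"],
        "inducer_dependent_pct_of_mit": 100 * c["inducer_dependent"] / c["mit"],
        "derepression_like_pct_of_mit": 100 * derepression_like / c["mit"],
    }


summary = {name: mit_summary(c) for name, c in collections.items()}
for name, s in summary.items():
    print(name, {k: round(v, 1) for k, v in s.items()})
```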
Stress response genes are differentially expressed: Another observation was the frequent detection of stress-related genes in the A. nidulans asexually developing EST collection. Thus, A. nidulans stress-related transcripts were scored and compared with the N. crassa vegetative EST collection. A summary of the findings is shown in Figure 2.
Heat shock, DNA repair, trehalose recycling, and starvation response genes are predominant in A. nidulans (development), with 10.0-, 6.2-, 4.2-, and 3.0-fold higher expression levels when compared to N. crassa (vegetative). Sorbitol recycling (7.1×), homeostasis (3.6×), oxygen radical removal (3.2×), and proton flux (5.0×) were dominant activities detected in N. crassa. Stress genes account for 9.9% of all transcripts in A. nidulans, and they appear to be a major group present in asexually developing A. nidulans tissue. In N. crassa, however, similar stress genes account for only 4.85% of the total transcripts, a 2.0-fold reduction. Moreover, stress transcripts prevalent in A. nidulans (heat shock, starvation, trehalose, and DNA repair) total 1039 transcripts, which contrasts with 117 transcripts in N. crassa, a 14.3-fold difference. Thus, in A. nidulans representation of stress-related transcripts is dramatically increased in relation to N. crassa.
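The 14.3-fold figure can be recovered by dividing each transcript count by the size of its EST collection before taking the ratio. That this is how the value was derived is an inference from the numbers, not something the text states; the function name is illustrative.

```python
def normalized_fold(count_a, size_a, count_b, size_b):
    """Ratio of two tag frequencies, each count scaled by its own collection size."""
    return (count_a / size_a) / (count_b / size_b)


# Stress transcripts prevalent in A. nidulans versus the same classes in N. crassa.
fold = normalized_fold(1039, 12_485, 117, 20_172)
raw = 1039 / 117  # unnormalized ratio, shown for comparison

print(f"raw count ratio: {raw:.1f}")
print(f"size-normalized fold difference: {fold:.1f}")
```

Comparing the raw and normalized ratios shows why collection size has to enter the comparison: the larger N. crassa collection would otherwise understate the difference.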
Figure 2. Digital gene expression comparison of stress response genes in A. nidulans and N. crassa. Open and solid boxes indicate gene clusters associated with N. crassa and A. nidulans ESTs, respectively.
DISCUSSION
In this study, digital gene expression profiles were evaluated for asexually developing A. nidulans tissues and compared to N. crassa vegetative tissue using equivalency criteria to account for population differences. The analysis of other fungal cDNA libraries was used to gather information about functional distribution and consistency of the digital information recovered from PipeOnline databases (Table 1).
EST collections obtained from different fungi by extraction of mRNA from tissues exposed to numerous physiological conditions did not result in extreme variability (Tables 1 and 2). Library redundancy and representation of clones was comparable if the size of the EST collection was considered (Table 1). In addition, functional annotation by PipeOnline produced EST subsets with functional annotation that corresponds on average to 35.8% of all the fungal collections, 44.6% being the highest and 27.5% the lowest (Table 2). Thus, dbEST fungal ESTs are useful for quantitative analysis, producing results with biological significance (Audic and Claverie 1997; Ewing et al. 1999).
ESTs from different organisms cluster similarly into the main cellular metabolic and structural groups (Table 4). These results corroborate the findings by Ewing and Claverie (2000) that gene clusters, representing metabolic pathways, may be compared with clusters of other organisms to render significant multiconditional gene expression information. Thus, A. nidulans clusters, when compared with N. crassa EST clusters, should reveal differences significant to vegetative growth or asexual development.
Here, the focus is on the identification of gene clusters important in A. nidulans asexual development by employing N. crassa ESTs to compare the vegetative state. We found two transcript clusters of interest, which we analyze and discuss in detail: (1) stress response genes and (2) misappropriated inducible genes whose expression requires an inducer absent at the time the tissue was harvested.
A. nidulans accumulates significant levels of these misappropriated transcripts during conidiation (Table 5). Under vegetative growth conditions, these transcripts are expressed only if an inducer is present. Accumulation of these transcripts may occur late during spore maturation, and the transcripts may be translated during germination. These pretranscribed mRNAs could confer a significant advantage if the spore germinates on a substratum on which free glucose is not readily available. An interesting aspect of these misappropriated transcripts is that all of them encode functions for plant cell wall or protein degradation, substrates likely to be abundant in natural habitats where Aspergilli are commonly found. Whether these misappropriated transcripts encode functional proteins remains unclear.
Figure 3. Model illustrating the accumulation of stress response and misappropriated transcripts during asexual reproduction. Open and solid boxes indicate gene clusters associated with vegetative and asexual reproductive tissue ESTs, respectively. Lines connecting boxes indicate associations defined through EST function clustering.
Finally, the involvement of low mRNA levels encoding catabolic functions has been suggested to explain regulation of cellulases and other plant-cell-wall-degrading enzymes (Torigoi et al. 1996; Carle-Urioste et al. 1997). Furthermore, implication of specific activities localized to the spore, or development in fungi, has also been suggested (Kubicek 1987; Bagga et al. 1989; Messner et al. 1991).
Accumulation of stress response mRNAs during conidiation produced another cluster for which a large number of transcripts has been determined (Figure 2). Heat-shock transcripts account for 7.6% (919 transcripts) of all A. nidulans ESTs, 10.0-fold higher than in N. crassa. Other stress-related clusters expressed at higher levels during asexual reproduction include DNA repair (62 transcripts, 6.2× higher), trehalose recycling (34 transcripts, 4.2× higher), and starvation response genes.
Not all stress response clusters were overrepresented in A. nidulans. Proton flux (427 transcripts, 5.0×), oxygen radical removal (241 transcripts, 3.2×), metal homeostasis (151 transcripts, 3.6×), and sorbitol recycling (43 transcripts, 7.1×) were expressed at higher levels in N. crassa, even though the overall difference was <2.6-fold.
Association of stress responses, specifically reactive oxygen removal, with reproduction in A. nidulans has been reported frequently and in some cases corroborated by strong experimental evidence (Skromne et al. 1995; Navarro et al. 1996; Kawasaki et al. 1997; Hafker et al. 1998; Navarro and Aguirre 1998). Digital clustering is based on quantitative measurements (Audic and Claverie 1997; Ewing and Claverie 2000); thus, the heat-shock cluster needs to be considered as meaningful. Interestingly, oxygen radical removal was not found as a dominant cluster in A. nidulans even though 7 catalase A (developmental specific) and 4 catalase B transcripts were detected (data not shown, available from the http://aspergillus-genomics.org website). In fact, 3.21 times more transcripts in the N. crassa oxygen radical removal cluster were counted, including 30 catalase A transcripts and no catalase B.
Heat-shock treatments in A. nidulans germlings have been reported to dramatically increase trehalose, mannitol, and catalase A mRNA levels (Noventa-Jordao et al. 1999). Moreover, treatment with hydrogen peroxide increases germling viability after heat shock, and catalase-A-deficient mutants are heat sensitive. Thus, there seems to be a genetic link between these two gene clusters.
Vegetative tissues are specialized in rapid growth and environmental occupation (Wessels 1994; Bartnicki-Garcia et al. 1995; Harold 1999). Polar growth happens at hyphal tips, and components are delivered mainly through a vesicular delivery system (Kamada et al. 1991). Asexual development in filamentous fungi entails a three-step process in which vegetative cells initiate asexual reproduction by making a decision usually noticed by switching the mode of cell division from polar to budding (Wieser et al. 1994; Yu et al. 1996; Kaminskyj and Hamer 1998; Ye et al. 1999). Conidiophore assembly and conidium production in most cases involve more than one cell type specialized in channeling haploid mitotic nuclei and amplifying the outcome of the reproductive process (Timberlake 1991; Prade and Timberlake 1993; Karos and Fischer 1996). The final stage entails maturation of the newly synthesized conidium by the addition of protective cell wall layers to the spore (Stringer et al. 1991; Timberlake 1991; Prade and Timberlake 1994).
Figure 3 shows a model that summarizes our findings. It is likely that during the later stages of development stress response and misappropriated transcript cluster genes are deposited in the conidium. These transcripts may be advantageous during the germination process. Simple carbon sources may not always be present, and induction of enzymes that enable assimilation and metabolism of alternate carbon sources is essential and may require the presence of low levels of the mRNA. Thus, deposition of these misappropriated transcripts during the reproductive process may be essential to the future survival of the spore. Deposition of stress response genes can be explained similarly. Spores may germinate under conditions where temperatures, salt concentrations, or water potential are not ideal.
Detection of inducer-dependent transcripts in the absence of inducer and stress response transcripts in the absence of an obvious source of stress late in development may indicate that vegetative cis-acting regulatory networks are no longer functional and result in derepression. Thus, during asexual development, A. nidulans overrides vegetative regulatory controls and produces a series of transcripts that may add protective and adaptive advantages to the dormant spore. These observations lead to the conclusion that during terminal asexual development, vegetative cis-acting regulatory networks are no longer functional and result in general derepression that may result in augmented survivability upon germination.
Acknowledgments
We thank Bruce Roe and Doris Kupfer, whose scientific interest resulted in the first public, free-of-charge, and unfettered access to large-scale DNA sequence information to the fungal community. We also thank Eduardo Misawa and co-workers from the Oklahoma State University bioinformatics laboratory for expert assistance in the PipeOnline software package development and implementation as well as administration of local computational resources. The reported research was partially funded with grants from the National Science Foundation (NSF 98-13360), the United States Department of Agriculture (USDA 97-35303-4459), and an industrial consortium: Genencor International (USA), Glaxo Wellcome (Spain), Gist-brocades (The Netherlands), Novo Nordisk (Denmark), Kikkoman (Japan), and Amano (Japan).
Footnotes
- Communicating editor: J. Arnold
- Received October 10, 2000.
- Accepted December 15, 2000.
- Copyright © 2001 by the Genetics Society of America