The purpose of this project was to identify large numbers of Arabidopsis genes with essential functions during seed development. More than 120,000 T-DNA insertion lines were generated following Agrobacterium-mediated transformation. Transgenic plants were screened for defective seeds and putative mutants were subjected to detailed analysis in subsequent generations. Plasmid rescue and TAIL-PCR were used to recover plant sequences flanking insertion sites in tagged mutants. More than 4200 mutants with a wide range of seed phenotypes were identified. Over 1700 of these mutants were analyzed in detail. The 350 tagged embryo-defective (emb) mutants identified to date represent a significant advance toward saturation mutagenesis of EMB genes in Arabidopsis. Plant sequences adjacent to T-DNA borders in mutants with confirmed insertion sites were used to map genome locations and establish tentative identities for 167 EMB genes with diverse biological functions. The frequency of duplicate mutant alleles recovered is consistent with a relatively small number of essential (EMB) genes with nonredundant functions during seed development. Other functions critical to seed development in Arabidopsis may be protected from deleterious mutations by extensive genome duplications.
SEED development has become a popular subject for genetic analysis because it plays a critical role in the life cycle of flowering plants (Meinke 1995; Laux and Jürgens 1997; Mordhorst et al. 1997). Many genes must be expressed as the zygote divides in a regulated manner, completes morphogenesis, and differentiates into a mature embryo capable of surviving desiccation and producing a viable plant. Arabidopsis thaliana has long been used as a model system for mutant analysis of seed development (Meinke and Sussex 1979). Hundreds of embryo-defective (emb) mutants have been identified and used to address fundamental questions in plant biology. These recessive mutants have been recovered either by screening immature siliques for abnormal seeds (Meinke 1994) or by screening at the seedling stage for defects indicative of a disruption of normal embryogenesis (Jürgens et al. 1994). Included among the many published mutants are examples of defects in growth, morphogenesis, cell division, cell differentiation, and pattern formation (Meinke 1994; Yadegari et al. 1994). Some of these embryo mutants are allelic to mutants known in other labs for their aberrations in vegetative development (Conklin et al. 1999; Lukowitz et al. 2001). Others are identical to pattern mutants isolated elsewhere at the seedling stage (Jürgens et al. 1994). A few mutants appear to be weak alleles of female gametophytic factors (Springer et al. 1995). The common feature of all these mutants is that embryo development becomes abnormal after fertilization and before germination.
Genetic analysis of seed development has progressed in recent years from descriptions of mutant phenotypes to identification of mutant genes. Efforts to isolate genes have emphasized collections of insertion mutants (Feldmann 1991; Bouchez et al. 1993). Several labs have participated in screens for T-DNA mutants defective in embryogenesis (Castle et al. 1993; Yadegari et al. 1994; Devic et al. 1996; Lindsey et al. 1996), while others have established systems to identify genes through transposon tagging (Altmann et al. 1995; Osborne et al. 1995; Wisman et al. 1998; Parinov et al. 1999; Speulman et al. 1999; Tissier et al. 1999) and gene trapping (Topping et al. 1994; Sundaresan et al. 1995; Martienssen 1998). Advances have also been made in methods for map-based cloning (Lukowitz et al. 2000). The result has been a substantial increase in the number of EMB genes identified. Examples include mutants defective in transcription factors and associated proteins (Aida et al. 1997; Grossniklaus et al. 1998; Hardtke and Berleth 1998; Li and Thomas 1998; Lotan et al. 1998; Luerssen et al. 1998), metabolic enzymes (Patton et al. 1998; Lukowitz et al. 2001), transport and trafficking proteins (Lukowitz et al. 1996; Assaad et al. 2001), factors required for replication and translation (Springer et al. 1995; Tsugeki et al. 1996; Zhang and Somerville 1997; Uwer et al. 1998), and proteins with other essential functions (Albert et al. 1999; Jang et al. 2000; Schrick et al. 2000; Boisson et al. 2001).
We describe here the results of a large-scale project designed to approach saturation for T-DNA-tagged mutants defective in seed development. A companion article (Budziszewski et al. 2001) presents a similar analysis of genes required for seedling development. Together these articles represent a coordinated effort to establish a large collection of insertion mutants defective in genes with essential functions during plant growth and development. Such knockouts of essential genes will be an important component of international efforts to build upon the Arabidopsis Genome Initiative (2000) and determine the biological function of every gene in Arabidopsis before the end of the decade (Somerville and Dangl 2000).
MATERIALS AND METHODS
Plant maintenance: T-DNA insertion lines were generated in the Columbia (COL) and Wassilewskija (WS) ecotypes of A. thaliana. Plants used for transformation experiments were grown in soil (Super Fine Germinating Mix; Conrad Fafard, Agawam, MA), watered from below once a week, and maintained under fluorescent lights at 22° ± 4° with 50% relative humidity and 16-hr light/8-hr dark cycles. Transgenic (T1) plants screened for mutant phenotypes at Syngenta were grown in Aratrays (Lehle Seeds, Round Rock, TX) and covered with Aracons (Lehle) at bolting to minimize cross-pollination. Lines chosen for genetic analysis at Oklahoma State University were grown in pots containing a mixture of vermiculite, soil, and sand (Heath et al. 1986) and placed in a growth room at 24° ± 2° under fluorescent lights with 16-hr light/8-hr dark cycles. Plants were watered daily from below with a fertilizer solution (Peters 15-16-17; Scotts, Chicago, IL). Heterozygous plants were identified by screening immature siliques for the presence of 25% defective seeds following self-pollination (Meinke 1994).
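Self-pollinated emb/EMB plants are expected to produce 1 EMB/EMB : 2 emb/EMB : 1 emb/emb seeds, hence ~25% defective seeds per silique. The text does not say whether silique counts were tested formally; the sketch below assumes a Pearson chi-square goodness-of-fit test against 3 normal : 1 defective (1 degree of freedom, P = 0.05). Function names and the example counts are hypothetical.

```python
"""Sketch: does a silique count fit the 3:1 ratio expected for a recessive emb mutation?

Assumption: a Pearson chi-square goodness-of-fit test; the article only states
that heterozygotes were recognized by the presence of ~25% defective seeds.
"""

# Upper 5% point of the chi-square distribution with 1 degree of freedom.
CHI2_CRITICAL_1DF_P05 = 3.841


def chi_square_3_to_1(normal: int, defective: int) -> float:
    """Pearson chi-square statistic for 3 normal : 1 defective seeds."""
    total = normal + defective
    if total <= 0:
        raise ValueError("at least one seed must be scored")
    expected_normal = 0.75 * total
    expected_defective = 0.25 * total
    return ((normal - expected_normal) ** 2 / expected_normal
            + (defective - expected_defective) ** 2 / expected_defective)


def fits_recessive_segregation(normal: int, defective: int) -> bool:
    """True if the count does not depart significantly from 3:1 at P = 0.05."""
    return chi_square_3_to_1(normal, defective) < CHI2_CRITICAL_1DF_P05


if __name__ == "__main__":
    # Illustrative counts only, not data from this study.
    print(fits_recessive_segregation(normal=148, defective=52))
    print(fits_recessive_segregation(normal=180, defective=20))
```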
Plant transformation: A number of binary transformation vectors were used in this study. Initial populations were generated with the activation tagging vectors pPCVICEn4HPT (Hayashi et al. 1992) and pSKI015 (Weigel et al. 2000), which contain multiple enhancers adjacent to the right border and plasmid rescue elements for subsequent recovery of flanking sequences. Because this project did not emphasize the creation of gain-of-function mutations, most insertion lines were made with simple T-DNA vectors (pCSA104, pDAP101, pCSA110) carrying a BASTA-resistance gene, a plasmid rescue element, and minimal borders designed for production of gene knockouts. The pCSA110 vector contained an additional fragment with a GUS reporter gene driven by the LAT52 pollen-specific promoter (Eyal et al. 1995). This vector was transformed into qrt/qrt plants (Preuss et al. 1994) to facilitate segregation analysis in pollen for related projects. Vectors were electroporated into disarmed Agrobacterium tumefaciens strain C58C1Rif containing the helper plasmid pMP90RK (Koncz and Schell 1986). Colonies with the inserted binary vector were selected on Luria plates supplemented with carbenicillin (100 μg/ml) and rifampicin (400 μg/ml). Single colonies were transferred to 5-ml liquid cultures of Luria broth with antibiotics (Sigma, St. Louis) and grown overnight at 28° with shaking. The entire culture was then transferred to 500 ml of fresh medium and grown to an OD600 of 2.
Transformed plants were generated with the vacuum infiltration method (Bechtold and Pelletier 1998) with modifications (Patton et al. 1996) and with the floral dip method (Clough and Bent 1998). Transformation efficiency ranged from 1 to 3% of progeny seeds. The dip method gave the best results. Flowering plants were submerged up to the basal rosette for 10–30 sec in 500 ml of 10 mM MgCl2 and 0.01% Silwet L-77 (Lehle) containing Agrobacterium cells from a 500-ml overnight culture. The same treatment was repeated 5 days later. Seeds from these plants were bulk harvested at maturity. Transformants carrying the pPCVICEn4HPT vector were selected on agar plates supplemented with Murashige and Skoog (1962) salts and 25 μg/ml hygromycin B (Sigma). All other transformants were selected in soil by spraying seedlings with 240 μg/ml glufosinate ammonium (BASTA; Hoechst, Frankfurt, Germany) in 0.005% Silwet L-77. Bulk seeds were mixed with filtered sand and spread over the soil surface with a salt shaker to a final density of 40 seeds/cm² to allow for optimal selection. Seedlings were sprayed three times over a 2-week period with a Preval disposable paint applicator (Precision Valve, Yonkers, NY) and once after transplantation to Aratrays. Immature siliques of resistant transplants were examined under a dissecting microscope for defective seeds. Mature siliques were harvested from individual plants that appeared to be heterozygous for a seed mutation.
Genetic analysis: Ratios of resistant to sensitive (R:S) seedlings were obtained for each mutant by plating 50 mature seeds (T2 or occasionally T3 generation) on each of two 100 × 15-mm petri plates containing the inorganic salts of Murashige and Skoog (1962), 3% glucose, 0.8% (w/v) purified agar, and either 30 mg/liter hygromycin (Sigma) or 50 mg/liter glufosinate ammonium (Crescent Chemical, Hauppauge, NY). Seeds were surface sterilized as described by Errampalli et al. (1991) and transferred to specified locations on the agar surface using sterile forceps. Plates were stored at 4° for 2 days to improve germination rates and then placed beneath fluorescent lights and maintained under the same environmental conditions described above for plants grown in pots. Seedlings were scored for resistance 10–14 days after plating. Resistant seedlings selected for transplantation were removed from agar plates with fine forceps, transferred to soil, and covered with plastic for 2 days to maintain high humidity. Seedling mortality was <5% with this method. The number of resistant seedlings transplanted per transgenic line depended on the R:S ratio observed: 36 plants for low-ratio lines (≤5:1), enough to resolve tagging status directly, and 27 plants for moderate-ratio lines (5–25:1), from which 10 heterozygous transplants were identified and their mature seeds harvested. Mutants from parental lines with low R:S ratios were classified as tagged if no wild-type plants were identified among surviving transplants, possibly tagged if one to three wild types were found, and not tagged if four or more wild types were encountered. Mature T3 seeds were harvested as individual siliques from at least two heterozygotes per mutant and bulk harvested from the remaining heterozygotes to provide sufficient seed stocks for subsequent analysis. Plant tissue from tagged mutants was frozen in liquid nitrogen, stored at −80°, and sent to Syngenta on dry ice for molecular analysis.
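The transplant rules above can be restated as a small decision procedure. This is a sketch, not the project's software (data were tracked in an Access database); the function names are hypothetical, and returning zero transplants for lines above 25:1 is an assumption based on how high-ratio lines were handled later in the project.

```python
"""Sketch of the transplant-based tagging rules described above.

Thresholds come from the text; function names and the treatment of
high-ratio lines (>25:1) are hypothetical.
"""


def transplants_needed(rs_ratio: float) -> int:
    """Resistant seedlings transplanted per line, by observed R:S ratio."""
    if rs_ratio <= 0:
        raise ValueError("R:S ratio must be positive")
    if rs_ratio <= 5:        # low-ratio line: likely one insertion site
        return 36
    if rs_ratio <= 25:       # moderate-ratio line: likely two unlinked sites
        return 27
    return 0                 # assumption: high-ratio lines not transplanted


def tagging_status(wild_types_among_survivors: int) -> str:
    """Classify a low-ratio mutant from the number of wild-type transplants."""
    if wild_types_among_survivors < 0:
        raise ValueError("count cannot be negative")
    if wild_types_among_survivors == 0:
        return "tagged"
    if wild_types_among_survivors <= 3:
        return "possibly tagged"
    return "not tagged"


if __name__ == "__main__":
    # Illustrative (ratio, wild-type count) pairs, not data from this study.
    for ratio, wild_types in [(2.1, 0), (3.4, 2), (2.8, 9)]:
        print(ratio, transplants_needed(ratio), tagging_status(wild_types))
```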
Phenotypic analysis: Mutants were placed into broad phenotypic classes on the basis of terminal seed phenotypes observed under a dissecting microscope (Meinke 1994). This was first done with primary transformants and later confirmed by analysis of progeny plants. Mutant seeds that lacked a visible embryo were classified as zygotic (very small seed) or preglobular (intermediate seed size) even when a globular embryo was later found with Nomarski optics (Meinke 1994). Mutants classified as globular contained a spherical arrested embryo that was visible upon dissection; transition mutants produced a range of globular, triangular, and heart-shaped embryos; and cotyledon mutants contained embryos with more advanced cotyledons. The remaining emb mutants were classified as variable (wide range of seed sizes and embryo stages) or other (unusual features not found in standard classes). Seeds with a glassy appearance characteristic of titan mutants (Liu and Meinke 1998) were examined further with Nomarski optics. Pigment mutants without defects in seed morphogenesis were classified as albino (extensive loss of seed pigmentation), pale mature (moderate loss of seed pigmentation), or fusca (inappropriate accumulation of anthocyanin). All genetic and phenotypic data were assembled into an Access (Microsoft, Redmond, WA) database designed to facilitate data tracking.
Plasmid rescue: During the initial phase of this project, plasmid rescue was used to recover border sequences flanking T-DNA inserts (Feldmann 1992). Genomic plant DNA (~1 μg) isolated according to Reiter et al. (1992) was digested with a restriction enzyme (New England Biolabs, Beverly, MA), and the resulting fragments were ligated and electroporated into Escherichia coli cells (XL-1 Blue; Stratagene) using standard protocols. Resistant colonies were selected on 100 μg/ml ampicillin, plasmid DNA was isolated using QIAGEN (Valencia, CA) or Promega (Madison, WI) kits, and selected clones were analyzed by restriction digestion and sequencing.
Thermal asymmetric interlaced PCR: Most border sequences were recovered in a high-throughput manner using a modified thermal asymmetric interlaced (TAIL)-PCR protocol (Liu et al. 1995). DNA from lyophilized plant tissue was obtained using the Puregene DNA isolation kit (Gentra Systems, Minneapolis) with the following modifications. After resuspension of the DNA pellet in hydration buffer, samples were treated with RNase I, incubated at 37° for 30 min, extracted with phenol:chloroform:isoamyl alcohol (25:24:1), and extracted again with chloroform:isoamyl alcohol. DNA was precipitated with 0.1 volume 3 M sodium acetate, pH 5.2, and an equal volume of isopropyl alcohol. Samples were then centrifuged at 20,000 × g for 5 min, and the pellet was rinsed with 70% ethanol, air dried, resuspended in TE buffer at pH 8.0, and analyzed by agarose gel electrophoresis. Several modifications were made to the TAIL-PCR method of Liu et al. (1995): the amount of genomic DNA was reduced to 1 ng per primary reaction, T-DNA-specific primers with a higher Tm were designed (Table 1), and the tertiary reaction volume was reduced to 50 μl. With PE 9600 thermocyclers (Applied Biosystems, Foster City, CA), reaction mixes were held on ice until the block warmed to 30°. When MJ Research (Waltham, MA) Tetrad DNA Engines were used, a 2-min hold at 4° was added to the beginning of the program and reactions were placed on the thermocycler block during this refrigerated dwell. The slow ramp program step used with these machines was 0.2°/sec to 72°.
To improve the efficiency of TAIL-PCR, an abbreviated protocol (TAIL2K) was developed in which the secondary and tertiary reactions were replaced with a single 40-μl reaction subjected to modified cycling conditions. Initial linear amplification with the secondary T-DNA primer was carried out for 5 cycles (94°, 10 sec; 64°, 1 min; 72°, 2.5 min), followed by 15 “super” cycles (94°, 10 sec; 64°, 1 min; 72°, 2.5 min; 94°, 10 sec; 64°, 1 min; 72°, 2.5 min; 94°, 10 sec; 44°, 1 min; 72°, 2.5 min). Reactions were then subjected to a low annealing temperature PCR for 5 cycles (94°, 10 sec; 44°, 1 min; 72°, 3 min). A final extension at 72° for 5 min was followed by a hold at 4°. The MJ Research thermocycler was run in calculated temperature-control mode so that its cycling emulated that of the PE 9600.
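Writing the TAIL2K program out as data makes its structure explicit and lets the hold time be tallied. In this sketch temperatures are assumed to be °C, and ramp times and the 4° holds are ignored, so the computed time is only a lower bound on run time; the class and function names are illustrative and not part of any thermocycler software.

```python
"""Sketch: the TAIL2K cycling program written out as data.

Temperatures (assumed °C) and hold times follow the text; ramps and the
4° holds are omitted, so the total is a lower bound on run time.
"""
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Step:
    temp: float      # block temperature
    seconds: int     # hold time at that temperature


@dataclass(frozen=True)
class Segment:
    name: str
    steps: Tuple[Step, ...]
    repeats: int

    def hold_seconds(self) -> int:
        return self.repeats * sum(step.seconds for step in self.steps)


HIGH_STRINGENCY = (Step(94, 10), Step(64, 60), Step(72, 150))
LOW_STRINGENCY = (Step(94, 10), Step(44, 60), Step(72, 150))

TAIL2K: List[Segment] = [
    Segment("linear amplification", HIGH_STRINGENCY, 5),
    # Each "super" cycle: two high-stringency cycles, then one at 44°.
    Segment("super cycles", HIGH_STRINGENCY * 2 + LOW_STRINGENCY, 15),
    Segment("low-annealing PCR", (Step(94, 10), Step(44, 60), Step(72, 180)), 5),
    Segment("final extension", (Step(72, 300),), 1),
]


def total_hold_minutes(program: List[Segment]) -> float:
    """Sum of all hold times in minutes (ramps not included)."""
    return sum(segment.hold_seconds() for segment in program) / 60


def count_steps_at(program: List[Segment], temp: float) -> int:
    """Number of times the block holds at a given temperature."""
    return sum(segment.repeats * sum(1 for s in segment.steps if s.temp == temp)
               for segment in program)


if __name__ == "__main__":
    print(f"hold time ≈ {total_hold_minutes(TAIL2K):.0f} min (ramps excluded)")
    print("44° annealing steps:", count_steps_at(TAIL2K, 44))
```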
Twelve parallel TAIL-PCR reaction sets were performed on DNA recovered from each tagged mutant. Six degenerate (AD) primers (Table 1) were used in combination with two sets of specific primers, one set for each T-DNA border. A 5-μl sample of each completed TAIL-PCR reaction was subjected to electrophoresis on a 2% agarose gel. Successful TAIL-PCR reactions (10 μl) were incubated at 37° for 1 hr with 2 units of exonuclease I (United States Biochemicals, Cleveland) and 0.5 units of shrimp alkaline phosphatase (United States Biochemicals) added in 2 μl of 1× PCR buffer, followed by a 10-min 70° heat inactivation step. The treated reaction mixture was used as template for direct DNA sequencing (ABI Prism 377) with Big Dye terminator chemistry (Applied Biosystems), using the tertiary border primer.
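Pairing six degenerate AD primers with the nested primer set for each of the two T-DNA borders gives the 12 reaction sets per mutant. The sketch enumerates them and scales the exonuclease I/shrimp alkaline phosphatase cleanup to a batch of successful reactions; the primer labels are hypothetical placeholders for those in Table 1, and the optional pipetting overage is an assumption rather than part of the protocol.

```python
"""Sketch: the 12 TAIL-PCR reaction sets per mutant and the cleanup mix.

Primer labels are hypothetical placeholders for the primers in Table 1;
the 10% pipetting overage is an assumption, not part of the protocol.
"""
from itertools import product
from typing import Dict, List, Tuple

AD_PRIMERS = [f"AD{i}" for i in range(1, 7)]        # six degenerate primers
BORDER_PRIMER_SETS = ["LB set", "RB set"]           # one set per T-DNA border


def reaction_sets(borders: List[str], ad_primers: List[str]) -> List[Tuple[str, str]]:
    """Every border-primer set paired with every AD primer."""
    return list(product(borders, ad_primers))


def cleanup_mix(n_reactions: int, overage: float = 0.10) -> Dict[str, float]:
    """ExoI/SAP master mix for n successful 10-µl reactions.

    Per reaction (from the text): 2 U exonuclease I and 0.5 U shrimp alkaline
    phosphatase in 2 µl of 1x PCR buffer; then 37° for 1 hr and 70° for 10 min.
    """
    scale = n_reactions * (1 + overage)
    return {"exo_I_units": 2.0 * scale, "sap_units": 0.5 * scale, "volume_ul": 2.0 * scale}


if __name__ == "__main__":
    sets = reaction_sets(BORDER_PRIMER_SETS, AD_PRIMERS)
    print(len(sets), "reaction sets per mutant")
    print(cleanup_mix(len(sets)))
```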
Border confirmation and second border rescue: Borders recovered through plasmid rescue were confirmed using Southern blots probed with PCR products generated from Arabidopsis sequences near the insertion point. The presence of a polymorphic hybridizing band with DNA from heterozygous plants was used as evidence that a legitimate border had been identified. With borders isolated through TAIL-PCR, insertion points identified by BLAST (Altschul et al. 1997) comparison to Arabidopsis genomic sequence were confirmed by direct PCR, using a genome-specific primer (GSP) located 100–500 bases from the insertion site (oriented toward the insertion) in combination with the secondary or tertiary T-DNA primer (Table 1). All GSPs were designed as pairs flanking an insertion point. Each GSP was then paired with the appropriate border primer in confirmation reactions, using DNA from heterozygous plants. A wild-type reaction with GSPs alone was run as a control. When TAIL-PCR failed to recover both sides of an insertion, two additional GSPs more distant (~2 and 5 kb) from the insertion site and pointing toward the expected second border were designed. Long PCR conditions (Advantage 2; Clontech, Palo Alto, CA) were then employed following the manufacturer's directions. Seven PCR reactions were performed using appropriate pairs of genomic and T-DNA border primers. When this method failed, distant GSPs were paired with internal T-DNA primers designed at 1-kb intervals in both orientations across the entire T-DNA. Any borders recovered with this approach were classified as abnormal because they lacked the expected 24-bp T-DNA imperfect repeat characteristic of right and left borders. Mutants for which this abnormal second-border rescue also failed were set aside.
Border sequence analysis: Sequence trace files for each tagged mutant were imported into a mutant-specific Sequencher project (Gene Codes, Ann Arbor, MI). Trace files were typically examined for overall quality and manually trimmed. Consensus sequences assembled from multiple PCR products were then compared with the Arabidopsis genome using BLAST (Altschul et al. 1997). All sequence information was entered into a work prioritization database where border recovery status was tracked.
RESULTS
Identification of mutants: The results of our efforts to saturate the genome with seed mutations are summarized in Table 2. More than 120,000 independent insertion lines were generated, and siliques of resistant transplants were screened for defective seeds. This strategy made more efficient use of growth space than screening sibling plants in the next generation. Of the 4200 putative seed mutants identified through our initial screen, ~70% were defective in embryo morphology and the remainder were altered in seed pigmentation. The frequency of embryo defectives identified (2.5% of lines screened) was similar to that reported with smaller populations (Castle et al. 1993; Devic et al. 1996). To estimate the degree of saturation achieved, we first examined segregation of the selectable marker in several hundred lines chosen at random. The results indicated that T1 plants averaged 1.5 insertion sites and that the entire population therefore contained at least 180,000 independent insertions.
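The population-level figures can be cross-checked with simple arithmetic: 120,000 lines averaging 1.5 insertion sites give 180,000 insertions, and an embryo-defective frequency of 2.5% of lines screened roughly agrees with ~70% of the 4200 putative seed mutants being embryo defectives. All inputs are from the text; the helper names are illustrative, and the comment on why the insertion count is a minimum is a hedged interpretation.

```python
"""Back-of-envelope check of the population figures quoted above.

All inputs come from the text; the function names are illustrative.
"""


def estimated_insertions(n_lines: int, mean_sites_per_line: float) -> int:
    # Marker segregation presumably counts genetically separable, expressed
    # sites, so tightly linked or silenced inserts would be missed ("at least").
    return round(n_lines * mean_sites_per_line)


def expected_count(n: int, fraction: float) -> int:
    return round(n * fraction)


if __name__ == "__main__":
    lines = 120_000
    print(estimated_insertions(lines, 1.5))      # insertion sites in the population
    print(expected_count(lines, 0.025))          # emb lines at 2.5% of lines screened
    print(expected_count(4_200, 0.70))           # ~70% of 4200 putative seed mutants
```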
Genetic characterization of mutants: The strategy developed for high-throughput genetic analysis of mutants recovered from this screen is outlined in Figure 1. During the first phase of the project, T2 seeds from each putative mutant were planted in soil and grown to maturity to confirm the presence of a seed phenotype and then plated on selection medium to estimate the number of unlinked insertion sites and resolve tagging status. To maximize throughput in later phases of the project, we eliminated screening of T2 plants in soil except when seed was limited. Tagging status was resolved by transplanting resistant seedlings from putative mutants with a low R:S ratio indicative of a single insertion site and screening the resulting plants for defective seeds. Expected responses of tagged and untagged mutants are summarized in Table 3. Mutants with T-DNA inserts linked to the mutation were expected to give intermediate results. Tagged mutants with a single functional insert should produce only heterozygous (emb/EMB) transplants because wild-type (EMB/EMB) seedlings lack the resistance (T-DNA) marker associated with the mutation and are eliminated during selection.
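The expectations for tagged and untagged mutants can be derived from simple Mendelian ratios. Assuming a single, fully expressed insertion locus, complete abortion of emb/emb seeds, and no recombination between a causal insert and the mutation, viable seeds from a selfed heterozygote are 1 EMB/EMB : 2 emb/EMB. A tagged mutant should then give a 2:1 R:S ratio and only heterozygous resistant transplants, whereas an untagged mutant with one unlinked insert should give 3:1, with one-third of resistant transplants wild type. The sketch (hypothetical function names) also computes how likely an untagged mutant is to be scored as tagged by chance.

```python
"""Sketch: Mendelian expectations behind the tagging test.

Assumes a single, fully expressed T-DNA locus, complete abortion of
emb/emb seeds, and no recombination between a causal insert and the
mutation. Viable seeds from a selfed emb/EMB plant: 1 EMB/EMB : 2 emb/EMB.
"""
from fractions import Fraction
from math import comb


def resistant_fraction(tagged: bool) -> Fraction:
    # Tagged: resistance marks the emb allele, so only the 2/3 heterozygotes
    # are resistant. Untagged: the insert segregates independently, 3/4 resistant.
    return Fraction(2, 3) if tagged else Fraction(3, 4)


def expected_rs_ratio(tagged: bool) -> Fraction:
    f = resistant_fraction(tagged)
    return f / (1 - f)


def p_untagged_scored_as_tagged(n_transplants: int, max_wild_types: int = 0) -> float:
    """P(at most max_wild_types EMB/EMB plants among n resistant transplants)
    for an untagged mutant, where each resistant transplant is EMB/EMB with p = 1/3."""
    p_wt = Fraction(1, 3)
    total = sum(comb(n_transplants, k) * p_wt**k * (1 - p_wt)**(n_transplants - k)
                for k in range(max_wild_types + 1))
    return float(total)


if __name__ == "__main__":
    print(expected_rs_ratio(True), expected_rs_ratio(False))
    print(p_untagged_scored_as_tagged(36))                   # "tagged" call: zero wild types
    print(p_untagged_scored_as_tagged(36, max_wild_types=3)) # at most "possibly tagged"
```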
The difference between R:S ratios expected for tagged and untagged mutants could not be used to resolve tagging status given the small number of seedlings scored. This is illustrated in Figure 2 with a histogram of ratios obtained for a large collection of mutants recovered from lines with a low R:S ratio. Although the average ratio for tagged mutants (2.2 ± 1.2) was lower than that for untagged mutants (3.0 ± 1.0), consistent with results predicted from Table 3, there was sufficient overlap to require analysis of transplants to resolve tagging status. Resistant seedlings from mutants with a moderate ratio indicative of two unlinked insertion sites were also transplanted to soil but in this case progeny seeds from 10 heterozygotes were plated again to identify subfamilies without the second insert. Analysis was usually terminated if the desired subfamily with a low R:S ratio was not identified. Mutants with high ratios indicative of multiple insertion sites required more extensive work to resolve tagging status and were not usually examined further.
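The overlap between tagged and untagged ratios is expected from sampling error alone. Assuming binomial sampling of n scored seedlings (the protocol plated 100 seeds per line) and a hypothetical cutoff of 2.5:1 between the expected 2:1 and 3:1 ratios, this sketch estimates how often each kind of line would fall on the wrong side of the cutoff, and how that error shrinks only with far larger seedling counts.

```python
"""Sketch: why seedling R:S ratios alone could not resolve tagging status.

Assumes n scored seedlings, binomial sampling, and a hypothetical cutoff
ratio between the expected 2:1 (tagged) and 3:1 (untagged) ratios.
"""
from math import exp, lgamma, log


def binom_pmf(k: int, n: int, p: float) -> float:
    # Computed in log space so that large n does not overflow.
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))
    return exp(log_pmf)


def misclassification(n: int, cutoff_ratio: float = 2.5):
    """Return (P tagged line looks untagged, P untagged line looks tagged).

    A line is called 'untagged' when resistant / sensitive >= cutoff_ratio,
    i.e. when resistant * (1 + cutoff) >= cutoff * n.
    """
    def looks_untagged(k: int) -> bool:
        return k * (1 + cutoff_ratio) >= cutoff_ratio * n

    tagged_err = sum(binom_pmf(k, n, 2 / 3) for k in range(n + 1) if looks_untagged(k))
    untagged_err = sum(binom_pmf(k, n, 3 / 4) for k in range(n + 1) if not looks_untagged(k))
    return tagged_err, untagged_err


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        print(n, misclassification(n))
```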
More than 2100 putative seed mutants identified through the initial screen of parental lines were subjected to genetic analysis using this approach. Included in this collection were 1501 emb and 204 pigment mutants with confirmed seed phenotypes, ~20 mutants with defects seen only after germination, and ~220 putative mutants from high-ratio lines where seed phenotypes were not confirmed because resistant seedlings were not transplanted. Another 220 lines with no apparent phenotype were found, most of them early in the project before screening of parental (T1) transplants had been perfected. Forty lines segregating for two different seed mutations were also encountered. Half of these lines carried two different embryo defectives. Most of the others had one pigment mutation and one embryo defective. As shown in Table 4, approximately one-half of the confirmed mutants contained a single insertion site on the basis of genetic data, another 20–30% appeared to contain two unlinked insertions, and the remainder gave high ratios indicative of multiple insertion sites. The proportion of mutants assigned to each ratio class was consistent between embryo defectives and pigment mutants.
The tagging status of >1000 confirmed seed mutants has been resolved to date (Table 5). This includes >90% of the mutants recovered from parental lines with one or two insertion sites. A total of 354 tagged mutants defective in embryo development and another 48 tagged mutants altered in seed pigmentation have been identified (Table 2). This represents a significant advance when compared to past studies (Castle et al. 1993; Yadegari et al. 1994; Devic et al. 1996). Ratios were calculated for ~450,000 seedlings on selection plates and >30,000 resistant transplants were screened for defective seeds. Most of the unresolved mutants contained multiple insertion sites. The number of untagged mutants identified was about twice the number of tagged mutants. The frequency of tagged mutants (~35% of total mutants resolved) was independent of the initial number of insertion sites. In other words, tagged mutants were identified from parental lines with one and two insertion sites at about the same frequency. The tagging status of some mutants classified as possibly tagged was resolved by comparing border sequences obtained from heterozygous and sibling wild-type transplants.
Phenotypic characterization of mutants: Terminal phenotypes of mutant embryos were first noted for parental (T1) plants and later confirmed in subsequent generations. A simple classification system based on seed size and embryo morphology as viewed through a dissecting microscope was used to assign mutants to phenotypic classes. Detailed examination of cleared seeds with Nomarski optics was reserved for mutants with exceptional defects and mutants with glassy seeds indicative of a titan phenotype (Liu and Meinke 1998; McElver et al. 2000). The distribution of terminal phenotypes observed (Table 6) was similar to that reported for other mutant collections (Castle et al. 1993). The large number of tagged mutants identified here made it possible to compare terminal phenotypes of tagged mutants to those of the entire collection. If most mutants with phenotypes extending beyond the globular stage were weak alleles of genes required at early stages of development, then tagged mutants should on average arrest at somewhat earlier stages, because many T-DNA insertions are likely to be null alleles (Krysan et al. 1999). The results in Table 6 do not support this prediction: the distributions of phenotypic classes for tagged mutants were not significantly different from those observed with the entire collection.
Recovery of border sequences: A classification system was devised to summarize the status of border sequence identification for tagged mutants. The number of mutants assigned to each class is shown in Table 7. Class A mutants had sequence information from both sides of the insert and were therefore assigned the highest confidence level for gene identification. Most of the class B mutants, with sequence data from only one side of the insert, were assumed to have a truncated or rearranged insert that interfered with recovery of the second border. Although preliminary gene assignments for many of these mutants are likely to be correct, an extended deletion adjacent to an aberrant border could in some cases remove a linked gene responsible for the mutant phenotype. Mutants assigned to this class have therefore been excluded from the functional analyses described later. Class Cs mutants had sequence information from both sides of the insert but appeared to contain a deletion that spanned several adjacent genes. These mutants could nevertheless be included with class A and B mutants on the physical map because the phenotype was associated with a defined chromosomal locus. Class C mutants had sequence data from two or more linked borders, indicative of chromosomal rearrangements or multiple linked inserts. Mutants with even more complex patterns of unlinked insertion sites were assigned to class D. With further work, some of these mutants could be resolved and upgraded.
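The class definitions can be summarized as a decision procedure. The record fields and the order in which rules are applied are assumptions made for illustration, since the article defines the classes in prose; the two helper functions simply restate which classes were placed on the physical map (A, B, Cs) and which entered the functional analyses (A).

```python
"""Sketch of the border-recovery classes (A, B, Cs, C, D) described above.

The record fields and rule order are hypothetical; the article defines
the classes in prose only.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class BorderRecord:
    flanks_sequenced: int         # sides of a single insert with flanking sequence (0-2)
    extra_linked_borders: int     # additional borders at the same locus (rearrangement)
    unlinked_sites: int           # insertion sites at separate loci
    deletion_spans_genes: bool    # flanks imply a deletion covering several genes


def border_class(r: BorderRecord) -> str:
    if r.unlinked_sites > 1:
        return "D"       # complex pattern of unlinked insertion sites
    if r.extra_linked_borders > 0:
        return "C"       # rearrangement or multiple linked inserts
    if r.flanks_sequenced == 2:
        return "Cs" if r.deletion_spans_genes else "A"
    if r.flanks_sequenced == 1:
        return "B"       # second border presumably truncated or rearranged
    return "unresolved"


def on_physical_map(cls: str) -> bool:
    # Classes A, B, and Cs define a single chromosomal location.
    return cls in {"A", "B", "Cs"}


def usable_for_functional_analysis(cls: str) -> bool:
    # Class B mutants were excluded; detailed analyses used class A.
    return cls == "A"
```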
Border sequence assignments: The process of assigning border sequences to a specific genomic locus is illustrated in Table 8. Results are presented for 12 class A mutants representative of the collection. In each case, consensus flanking sequences from both sides of the insert matched the same locus. The next most similar locus differed enough in sequence to confirm that the correct gene had been identified. Rare mismatches between consensus border sequences and the corresponding genomic locus were not unexpected given the high-throughput nature of the project. Mutants for which such a definitive genome assignment could not be made were set aside for further analysis.
Distribution of target genes: One hundred and ninety-one tagged mutants with sequenced insertions at a single chromosomal location were chosen for detailed analysis. These mutants were from classes A, B, and Cs as defined above. The 167 EMB genes disrupted in these 191 mutants are distributed throughout the genome. The number of target genes identified on each chromosome (Table 9) is proportional to the amount of DNA present and the total number of predicted genes (Arabidopsis Genome Initiative 2000). A sequence-based physical map of EMB genes identified to date is shown in Figure 3. Results are shown for genes represented by a single mutant allele (left line of each chromosome set) and more than one mutant allele (right line). The distribution of target genes along the length of each chromosome appears to be random with the exception of centromeric regions, which are known to contain a lower density of genes. The combined results of genetic (Franzmann et al. 1995) and physical maps support the conclusion that EMB genes are dispersed throughout the genome. The relationship between these gene locations and the presence or absence of localized genome duplications remains to be determined.
Identification of target genes: Predicted functions of the 12 EMB genes noted in Table 8 are listed in Table 10. Included are a variety of metabolic enzymes; chromosome scaffold proteins; enzymes required for protein synthesis, folding, and turnover; and proteins of unknown function. The remaining gene identities will be disclosed through a National Science Foundation-funded 2010 project designed to facilitate the identification and analysis of essential genes in Arabidopsis. Project deliverables include synthesized information on 300 EMB and 75 seed pigment genes; duplicate mutant alleles for many of these genes identified through a combination of forward and reverse genetics; distribution quantities of mutant seeds available to the community; information on gene expression patterns, mutant phenotypes, and related sequences; and a database accessed via the internet. Target gene identities for most of the class A, B, and Cs mutants described here will be made public according to the schedule presented on the project web page (http://www.seedgenes.org).
Duplicate mutant alleles: Included in the collection of 191 class A, B, and Cs mutants presented here are 146 genes disrupted once, 20 genes represented by two mutant alleles, and 1 gene with five separate alleles. This ~8-kb gene has been identified before (A. Ray, personal communication) and is known as SUS1 (Schwartz et al. 1994), SIN1 (Ray et al. 1996), and CAF1 (Jacobsen et al. 1999). Other published genes found here include FUS6 (Castle and Meinke 1994), EDD (Uwer et al. 1998), RSW1 (Arioli et al. 1998), SLP (Apuya et al. 2001), KEULE (Assaad et al. 2001), and RML1/CAD2 (Vernoux et al. 2000). The BIO1 gene inferred from biochemical analysis (Shellhammer and Meinke 1990) and molecular complementation (Patton et al. 1996) of mutant embryos was also recovered. The frequency of duplicate mutant alleles identified is similar to that described for another collection of mutants examined in detail (Franzmann et al. 1995). Seed mutations in that project were mapped relative to visible markers and complementation tests were performed between mutants that mapped to similar regions of the genome. In both of these studies, the occurrence of duplicate mutant alleles is consistent with the presence of ~500–750 target genes essential for seed development. Table 11 compares the results obtained here with those expected for a genome containing 500–1000 target EMB genes with equal rates of mutation. Although this method has obvious limitations (Franzmann et al. 1995) related to differences in gene size and susceptibility to T-DNA disruption, the results obtained support the conclusion that a rather small number of Arabidopsis genes are readily disrupted by mutation to give a seed phenotype.
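Expectations of the kind compared in Table 11 can be generated with a binomial model in which each of the 191 mutants falls independently and with equal probability into one of N target genes. This is a sketch of one such calculation, not necessarily the one used for Table 11; like the original comparison, it ignores differences in gene size and susceptibility to T-DNA disruption, and the function names are hypothetical.

```python
"""Sketch: expected number of genes hit k times under equal mutation rates.

Assumes n_mutants independent hits spread uniformly over n_genes target
genes (binomial model, p = 1/n_genes), ignoring gene size and T-DNA
target preferences.
"""
from math import comb
from typing import Dict


def expected_allele_classes(n_genes: int, n_mutants: int) -> Dict[int, float]:
    """Map k -> expected number of genes with exactly k mutant alleles."""
    p = 1 / n_genes
    return {k: n_genes * comb(n_mutants, k) * p**k * (1 - p)**(n_mutants - k)
            for k in range(n_mutants + 1)}


def summarize(n_genes: int, n_mutants: int = 191) -> Dict[str, float]:
    e = expected_allele_classes(n_genes, n_mutants)
    return {
        "genes hit": n_genes - e[0],
        "one allele": e[1],
        "two alleles": e[2],
        "three or more": sum(v for k, v in e.items() if k >= 3),
    }


if __name__ == "__main__":
    # Observed here: 146 genes with one allele, 20 with two, 1 with five (191 mutants).
    for n_genes in (500, 750, 1000):
        print(n_genes, {key: round(val, 1) for key, val in summarize(n_genes).items()})
```

Printing the expectations for several values of N allows them to be compared directly with the observed 146 single-allele and 20 two-allele genes.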
One issue that could be addressed for the first time with this large collection of tagged mutants was whether EMB genes disrupted by T-DNA insertion were larger than average. Our analysis revealed that the average size (4.4 ± 2.6 kb) of the predicted genomic coding region for 21 EMB genes represented by duplicate mutant alleles was indeed larger than the average (2.0 ± 1.5 kb) reported for the entire genome (Arabidopsis Genome Initiative 2000). This is consistent with the accepted conclusion that large genes are more frequent targets for T-DNA insertion than small genes (Krysan et al. 1999). We were more surprised to find that EMB genes (class A) represented by a single mutant allele were also somewhat larger than average (2.9 ± 2.1 kb). This suggests that forward genetic screens for seed mutants have to date favored the identification of somewhat larger than average genes and may have missed knockouts in many small genes with essential functions during seed development. Estimates of target EMB gene numbers in Arabidopsis may therefore need to be increased somewhat as additional data are obtained from saturation mutagenesis and reverse genetic experiments. Whether essential genes are larger or smaller on average than other Arabidopsis genes also remains to be resolved.
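To gauge how far apart the two size averages are, a Welch t statistic can be computed from the summary values given (4.4 ± 2.6 kb for 21 genes vs. 2.0 ± 1.5 kb genome-wide). This is only a sketch under stated assumptions: the ± values are taken to be standard deviations, the genome-wide n is approximated by the ~26,000 genes cited in this article, and because gene sizes are right-skewed, a rank-based test on the raw sizes would be more robust.

```python
"""Sketch: Welch's t statistic from summary statistics (mean, SD, n)."""
from math import sqrt
from typing import Tuple


def welch_t(m1: float, s1: float, n1: int,
            m2: float, s2: float, n2: int) -> Tuple[float, float]:
    """Return (t, Welch-Satterthwaite degrees of freedom)."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (m1 - m2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


if __name__ == "__main__":
    # EMB genes with duplicate alleles vs. genome-wide average (values from the text;
    # n2 = 26,000 is an approximation of the total gene count).
    t, df = welch_t(4.4, 2.6, 21, 2.0, 1.5, 26_000)
    print(f"t = {t:.2f}, df = {df:.1f}")
```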
Functional diversity of genes identified: All EMB sequences were searched against the GenBank nonredundant peptide, nucleotide, and EST databases and the SwissProt database using BLAST 2.0 with default parameters. An expect value (E-value) of ≤1e−25 was required for a positive match. For small proteins that could not reach this level even with a perfect match, a threshold of 70% amino acid identity was used instead. Detailed analyses were performed on 110 EMB genes (class A) with confirmed insertion sites. Forty-six of these (42%) could not be assigned to a functional class, a proportion consistent with that observed for the entire genome (Arabidopsis Genome Initiative 2000). The remaining 64 proteins spanned a wide range of functional classes: metabolism (15), cell growth (8), transcription (17), translation (9), protein fate (11), protein transport and traffic (2), and plant defense (2). On the basis of TargetP analysis (Emanuelsson et al. 2000), >25% of the 110 EMB proteins included in this study appear to be targeted to chloroplasts, about twice the percentage expected from genome-wide predictions. In contrast, the proportion of EMB proteins predicted to enter the secretory pathway is lower than expected.
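The match criteria described above can be written as a small decision rule. This sketch assumes that BLAST output has already been parsed into an E-value and a percent identity for the best hit, and that the best E-value a protein could possibly achieve (for example, that of a self-hit against the same database) is known; the function, constant, and parameter names are hypothetical.

```python
E_VALUE_CUTOFF = 1e-25
MIN_IDENTITY_SMALL_PROTEIN = 70.0  # percent amino acid identity


def is_positive_match(evalue: float, percent_identity: float, best_possible_evalue: float) -> bool:
    """Apply the E-value cutoff, falling back to identity for small proteins.

    best_possible_evalue is the E-value a perfect match would receive
    (e.g. a self-hit). If even that cannot reach the cutoff, the protein is
    treated as small and judged by percent identity instead.
    """
    if best_possible_evalue > E_VALUE_CUTOFF:
        return percent_identity >= MIN_IDENTITY_SMALL_PROTEIN
    return evalue <= E_VALUE_CUTOFF
```

For a typical protein, `is_positive_match(1e-40, 35.0, 1e-200)` accepts the hit on E-value alone, whereas a short peptide whose self-hit only reaches 1e-15 is judged by identity.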
Most EMB proteins with assigned functions have apparent homologs in other organisms. In contrast, most EMB proteins with uncertain functions are plant-specific according to the BLAST criteria described above. Six of these proteins have so far been identified only in Arabidopsis. Another two unassigned proteins are found only in plants and bacteria; one appears to be localized to chloroplasts and the other to mitochondria. About one-half of the unassigned proteins contain at least one PROSITE (Hofmann et al. 1999) or Pfam (Bateman et al. 2000) motif that, with further analysis, could provide some insight into protein function. The other half may require more detailed experimentation before protein function can be predicted.
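Scanning a protein for a PROSITE pattern amounts to matching the pattern against its sequence. The sketch below translates a subset of PROSITE pattern syntax (single residues, `x`, `[...]`, `{...}`, repeat counts, and `<`/`>` terminal anchors) into a Python regular expression. It is a simplified illustration, not the ScanProsite service, and it rejects constructs it does not handle, such as `>` inside brackets; the function names and the zinc-finger-style example are illustrative.

```python
import re

# One PROSITE element: residue, x, [allowed], or {forbidden}, with optional (n) or (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simplified PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


def find_motifs(pattern: str, sequence: str) -> list[int]:
    """Return 0-based start positions of non-overlapping matches in sequence."""
    return [m.start() for m in re.finditer(prosite_to_regex(pattern), sequence)]


if __name__ == "__main__":
    zinc_finger_like = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H."
    print(prosite_to_regex(zinc_finger_like))
```

A motif hit found this way is only a starting point; as noted above, many unassigned proteins will still need experimental work before a function can be predicted.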
Assigning functions to all 26,000 Arabidopsis genes is a monumental task that will require complementary research strategies and extensive multinational coordination. Part of this effort must be devoted to characterizing gene expression patterns and validating the biochemical functions of protein products within the plant cell. A second critical component will be to determine the biological significance of each gene product and its relevance to plant growth and development. This will require forward genetics and saturation mutagenesis on a genome-wide scale. When gene functions are not redundant, this objective can be met by determining which gene disruptions result in a mutant phenotype.
We describe here an efficient system for identifying large numbers of genes with essential and nonredundant functions during seed development. More than 120,000 T-DNA insertion lines were screened for defects in seed development, 4200 seed mutants with a wide range of defects were identified, 1705 confirmed mutants were analyzed in detail, 354 tagged embryo-defective mutants and 48 tagged pigment mutants were recovered, 24 examples of duplicate alleles were discovered, and 167 EMB genes with confirmed insertion sites were placed on the physical map. Related efforts to identify genes required for seedling development are described in a companion article (Budziszewski et al. 2001). When combined with results from other laboratories working at different stages of the life cycle, these projects should help to identify the full complement of essential Arabidopsis genes and define the minimal gene set required to produce a flowering plant.
One question that needs to be addressed is whether mutants defective in essential genes should by definition (a) fail to complete the life cycle, (b) exhibit an obvious phenotype, or (c) display any phenotype, from dramatic to subtle. We believe the first definition is too narrow and too dependent on growth conditions, whereas the third is too broad and requires that a variety of methods be used to search for minor defects. We have therefore adopted the second definition of an essential gene for this project: one that is needed for normal growth and development and gives rise to a visible phenotype when disrupted. We believe that this definition provides a practical standard for dealing with a diverse collection of plant genes. A distinction must also be made between essential genes, which are amenable to knockout analysis because their functions and expression patterns are not duplicated elsewhere in the genome, and essential functions, some of which are encoded by redundant gene families. The essential genes described in this report therefore represent only a subset of the minimal gene set needed to fulfill every function required for seed development. Alternative strategies will be needed in the future to determine the consequences of disrupting groups of genes with related and/or overlapping functions.
Considerable progress has already been made in the identification of essential genes in Saccharomyces (Winzeler et al. 1999; Tohe and Oguchi 2000) and Caenorhabditis (Fraser et al. 2000; Gonczy et al. 2000). The systematic approach used to analyze the yeast genome cannot be duplicated with Arabidopsis at present because efficient systems of homologous recombination and gene replacement are not available. The RNA interference methods used to dissect gene function in Caenorhabditis (Fraser et al. 2000) show promise with plant systems (Chuang and Meyerowitz 2000; Mette et al. 2000) but may be more difficult to exploit given the presence of extensive gene duplications. The best approach available at present is therefore to screen large collections of insertion lines for mutant phenotypes and accumulate information on essential genes associated with a disruption of normal growth and development.
Several different classes of seed mutants must be included to meet this objective: gametophytic mutants where gene expression occurs before fertilization but also affects seed development (Springer et al. 1995); embryo defectives characterized by abnormal morphology (Meinke 1994); seed pigment mutants with normal morphology but altered pigmentation (Norris et al. 1995); high chlorophyll fluorescence mutants (Meurer et al. 1996); and other types of mutants with biochemical defects that exhibit a cellular phenotype without altering seed morphology or pigmentation. We chose to focus on embryo defectives and a limited number of pigment mutants for this study. Additional examples of mutants with pigment defects that become apparent before and after germination are described by Budziszewski et al. (2001).
Saturating the Arabidopsis genome with mutations affecting seed development presents a significant challenge because of the large number of target genes involved. Confirmation of gene identities through molecular complementation also requires considerable effort and multiple crosses when working with lethals. The identification of duplicate mutant alleles will therefore play an important role both in assessing the degree of saturation and in confirming gene identities inferred from the analysis of a single allele. These additional alleles should come from two major sources: forward genetic screens where the same gene has been disrupted more than once and reverse genetic analysis where an attempt has been made to find another T-DNA insertion within the same gene. The public availability of research facilities designed to expedite reverse genetics in Arabidopsis (Krysan et al. 1999) should reduce the effort required to confirm most of the target genes found in this project.
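One rough way to gauge saturation from allele counts is a moment estimator under the equal-mutation-rate assumption. With m alleles spread uniformly over N target genes, the expected number of distinct genes hit is N[1 − (1 − 1/N)^m], so N can be estimated by solving that expression for the observed number of distinct genes, and the expected fraction of targets already hit is 1 − (1 − 1/N)^m. The function names below are hypothetical, and the estimate inherits the gene-size and insertion-bias limitations discussed earlier. For the tagged collection described here (191 alleles in 167 distinct genes), this sketch gives roughly 690 target genes, with about a quarter of them hit so far under the model's assumptions.

```python
import math


def expected_distinct_genes(n_genes: float, n_alleles: int) -> float:
    """Expected number of distinct genes hit when n_alleles fall uniformly on n_genes."""
    return n_genes * (1.0 - (1.0 - 1.0 / n_genes) ** n_alleles)


def estimate_target_genes(n_alleles: int, n_distinct: int, tol: float = 1e-6) -> float:
    """Solve expected_distinct_genes(N, n_alleles) == n_distinct for N by bisection."""
    if not 1 <= n_distinct <= n_alleles:
        raise ValueError("need 1 <= n_distinct <= n_alleles")
    if n_distinct == n_alleles:
        return math.inf  # no duplicate alleles: no finite estimate
    low, high = float(n_distinct), 2.0 * n_distinct
    while expected_distinct_genes(high, n_alleles) < n_distinct:
        high *= 2.0  # expand until the root is bracketed
    while high - low > tol:
        mid = (low + high) / 2.0
        if expected_distinct_genes(mid, n_alleles) < n_distinct:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


def fraction_saturated(n_genes: float, n_alleles: int) -> float:
    """Expected fraction of target genes hit at least once."""
    return 1.0 - (1.0 - 1.0 / n_genes) ** n_alleles


if __name__ == "__main__":
    n_genes = estimate_target_genes(n_alleles=191, n_distinct=167)
    print(round(n_genes), round(fraction_saturated(n_genes, 191), 2))
```

Bisection is used because the expected number of distinct genes increases monotonically with N, which makes the root easy to bracket without extra dependencies.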
Biological activities of genes reported here extend across a broad spectrum of protein functions and cellular compartments. This is not surprising in light of the many biochemical functions required to make a differentiated plant embryo from a heterotrophic zygote. Nevertheless, many genes known to be essential for growth in other organisms have not yet been identified as EMB genes in Arabidopsis. The probable explanations are that saturation has not been achieved and that redundancy (genomic and biochemical) has compensated for the loss of many gene products. Detailed comparison of EMB gene locations relative to known genome duplications may provide further insight into why the loss of certain gene products is tolerated whereas disruption of others interferes with normal seed development. Some interesting trends are already beginning to emerge. The large number of EMB proteins predicted to be targeted to chloroplasts, for example, is consistent with the conclusion that loss of chloroplast function results in embryonic lethality, whereas disruption of photosynthetic function alters seed pigmentation but not morphogenesis. Proteins required for transcription, translation, and protein destination also appear to be more common in the present collection than across the whole genome, whereas those involved in intracellular traffic and signaling are underrepresented. The abundance of plant-specific proteins with unclassified functions suggests that plants have diverged from other major groups of organisms with respect to some essential gene functions. Making sense of these functions within a cellular, organismal, and evolutionary context, as stated in Arabidopsis community objectives for the coming decade, will require much more detailed analysis and experimental confirmation in the years ahead.
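Statements about over- or underrepresentation of a class can be checked with a one-sided binomial test that compares the observed count in a functional or targeting class with a genome-wide background proportion. The sketch below is hypothetical. Its example inputs (28 of 110 proteins, i.e. just over 25%, against an assumed 13% background) are illustrative values chosen to resemble the chloroplast comparison, not figures reported in this study, and it makes no correction for testing several classes at once.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def binomial_lower_tail(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p); use this for underrepresentation."""
    return 1.0 - binomial_upper_tail(k + 1, n, p)


def enrichment(observed: int, total: int, background: float) -> tuple[float, float]:
    """Fold enrichment and one-sided p-value for overrepresentation."""
    fold = (observed / total) / background
    return fold, binomial_upper_tail(observed, total, background)


if __name__ == "__main__":
    fold, p_value = enrichment(observed=28, total=110, background=0.13)
    print(f"fold = {fold:.2f}, one-sided p = {p_value:.2g}")
```

A binomial model treats the 110 proteins as independent draws from the genome; a hypergeometric test would be the natural alternative if the background counts were taken from a finite annotated gene set.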
This was a large-scale project that involved many dedicated individuals at both Syngenta and Oklahoma State University. For assistance with plating, screening, and plant maintenance at OSU, we thank (from past to present) Linda Franzmann, Forrest Bath, Amy Davis, Jennifer Stanfield, Julie Guertin, James Stanfield, Mike Rumbaugh, Justin Rineer, Jia Qian Wu, Dylan Jackson, Ryan Bobsein, Steven Hutchens, Joel Davis, Cathy Sonleitner, Julie Aylward, Stephanie Blochowiak, Laura Meinke, Thomas Showalter, Anna Schissel, Shkelzen Shabani, and Audrey Martinez. For assistance with transformation, screening, and plant maintenance at Syngenta, we thank Karin Nelson, Ernie Madhavan, Jill Dunn, Eddie Cates, Lisa Davis, Katie Stokes, Caroline Durkin, and Amy Gregowski. Sharon Potter-Lewis provided the plasmid rescue protocol, Sandy Volrath performed some of the initial plasmid rescue experiments, Laura Weislo developed TAIL-PCR primers for the pSKI015 vector, Ken Phillips and Greg Budziszewski participated in helpful discussions, and Daphne Preuss provided qrt/qrt seeds and the LAT52:GUS fragment for construction of pCSA110. Special thanks to Eric Ward for guidance during the initial phase of this project.
Communicating editor: C. S. Gasser
- Received April 24, 2001.
- Accepted September 28, 2001.
- Copyright © 2001 by the Genetics Society of America